Fixed #35, clarifying consistency and static typing.

This commit is contained in:
Andy J. Ko 2019-04-23 20:16:14 -07:00
parent 6f38aed125
commit 364ef9585e

View file

@ -112,10 +112,25 @@
<p>So how do developers go about answering these questions? Studies comparing experts and novices show that experts use prior knowledge that they have about architecture, design patterns, and the problem domain a program is built for to know what questions to ask and how to answer them, whereas novices use surface features of code, which leads them to spend considerable time reading code that is irrelevant to a question (<a href="#vonmay">von Mayrhauser & Vans 1994</a>, <a href="latoza2">LaToza et al. 2007</a>). Reading and comprehending source code is fundamentally different from those of reading and comprehending natural language (<a href="#binkley">Binkley et al. 2013</a>); what experts are doing is ultimately reasoning about <strong>dependencies</strong> between code (<a href="#weiser">Weiser 1981</a>). Dependencies include things like <strong>data dependencies</strong> (where a variable is used to compute something, what modifies a data structure, how data flows through a program, etc.) and <strong>control dependencies</strong> (which components call which functions, which events can trigger a function to be called, how a function is reached, etc.). All of the questions above fundamentally get at different types of data and control dependencies. In fact, theories of how developers navigate code by following these dependencies are highly predictive of what information a developer will seek next (<a href="#fleming">Fleming et al. 2013</a>), suggesting that expert behavior is highly procedural. This work, and work explicitly investigating the role of identifier names (<a href="#lawrie">Lawrie et al. 2006</a>), finds that names are actually critical to facilitating higher level comprehension of program behavior.</p> <p>So how do developers go about answering these questions? Studies comparing experts and novices show that experts use prior knowledge that they have about architecture, design patterns, and the problem domain a program is built for to know what questions to ask and how to answer them, whereas novices use surface features of code, which leads them to spend considerable time reading code that is irrelevant to a question (<a href="#vonmay">von Mayrhauser & Vans 1994</a>, <a href="latoza2">LaToza et al. 2007</a>). Reading and comprehending source code is fundamentally different from those of reading and comprehending natural language (<a href="#binkley">Binkley et al. 2013</a>); what experts are doing is ultimately reasoning about <strong>dependencies</strong> between code (<a href="#weiser">Weiser 1981</a>). Dependencies include things like <strong>data dependencies</strong> (where a variable is used to compute something, what modifies a data structure, how data flows through a program, etc.) and <strong>control dependencies</strong> (which components call which functions, which events can trigger a function to be called, how a function is reached, etc.). All of the questions above fundamentally get at different types of data and control dependencies. In fact, theories of how developers navigate code by following these dependencies are highly predictive of what information a developer will seek next (<a href="#fleming">Fleming et al. 2013</a>), suggesting that expert behavior is highly procedural. This work, and work explicitly investigating the role of identifier names (<a href="#lawrie">Lawrie et al. 2006</a>), finds that names are actually critical to facilitating higher level comprehension of program behavior.</p>
<p>While much of program comprehension is skill, some of it is determined by design. For example, some programming languages result in programs that are more comprehensible. One framework called the <em>Cognitive Dimensions of Notations</em> (<a href="#green">Green 1989</a>) lays out some of the tradeoffs in programming language design that result in these differences in comprehensibility. For example, one of the dimensions in the framework is <strong>consistency</strong>, which refers to how much of a notation can be guessed based on an initial understanding of a language. JavaScript is a low-consistency language because of operators like <code>==</code>, which behave differently depending on what the type of the left and right operands are. Knowing the behavior for Booleans doesn't tell you the behavior for a Boolean being compared to an integer. In contrast, Java is a high consistency language: <code>==</code> is only ever valid when both operands are of the same type.</p> <p>
While much of program comprehension is skill, some of it is determined by design.
<p>These differences in notation have real impact. Encapsulation through data structures leads to better comprehension that monolithic or purely functional languages (<a href="#woodfield">Woodfield et al. 1981</a>, <a href="#bhattacharya">Bhattacharya & Neamtiu 2011</a>). Declarative programming paradigms (like the JavaScript view framework <a href="https://facebook.github.io/react/">React</a>) have greater comprehensibility than imperative languages (<a href="#salvaneschi">Salvaneschi et al. 2014</a>). In general, languages that are statically typed result in fewer defects (<a href="#ray">Ray et la. 2014</a>), better comprehensibility because of the ability to construct better documentation (<a href="#endrikat">Endrikat et al. 2014</a>), and result in easier debugging (<a href="#hanenberg">Hanenberg et al. 2013</a>). In fact, studies of more dynamic languages like JavaScript and Smalltalk (<a href="#callau">Calla&uacute et al. 2013</a>) show that the dynamic features of these languages aren't really used all that much anyway. All of this evidence suggests that that the more you tell a compiler about what your code means (by declaring types, writing functional specifications, etc.), the more it helps the other developers know what it means too.</p> For example, some programming languages result in programs that are more comprehensible.
One framework called the <em>Cognitive Dimensions of Notations</em> (<a href="#green">Green 1989</a>) lays out some of the tradeoffs in programming language design that result in these differences in comprehensibility.
For example, one of the dimensions in the framework is <strong>consistency</strong>, which refers to how much of a notation can be <em>guessed</em> based on an initial understanding of a language.
JavaScript has low consistency because of operators like <code>==</code>, which behave differently depending on what the type of the left and right operands are.
Knowing the behavior for Booleans doesn't tell you the behavior for a Boolean being compared to an integer.
In contrast, Java is a high consistency language: <code>==</code> is only ever valid when both operands are of the same type.
</p>
<p>
These differences in notation have real impact.
Encapsulation through data structures leads to better comprehension that monolithic or purely functional languages (<a href="#woodfield">Woodfield et al. 1981</a>, <a href="#bhattacharya">Bhattacharya & Neamtiu 2011</a>).
Declarative programming paradigms (like CSS or HTML) have greater comprehensibility than imperative languages (<a href="#salvaneschi">Salvaneschi et al. 2014</a>).
In general, statically typed languages like Java (which require developers to declare the data type of all variables) result in fewer defects (<a href="#ray">Ray et la. 2014</a>), better comprehensibility because of the ability to construct better documentation (<a href="#endrikat">Endrikat et al. 2014</a>), and result in easier debugging (<a href="#hanenberg">Hanenberg et al. 2013</a>).
In fact, studies of more dynamic languages like JavaScript and Smalltalk (<a href="#callau">Calla&uacute et al. 2013</a>) show that the dynamic features of these languages aren't really used all that much anyway.
All of this evidence suggests that that the more you tell a compiler about what your code means (by declaring types, writing functional specifications, etc.), the more it helps the other developers know what it means too.
</p>
<p>Code editors, development environments, and program comprehension tools can also be helpful. Early evidence showed that simple features like syntax highlighting and careful typographic choices can improve the speed of program comprehension (<a href="#baecker">Baecker 1988</a>). I have also worked on several tools to support program comprehension, including the Whyline, which automates many of the more challenging aspects of navigating dependencies in code, and visualizes them (<a href="#ko">Ko & Myers 2009</a>):</p> <p>Code editors, development environments, and program comprehension tools can also be helpful. Early evidence showed that simple features like syntax highlighting and careful typographic choices can improve the speed of program comprehension (<a href="#baecker">Baecker 1988</a>). I have also worked on several tools to support program comprehension, including the Whyline, which automates many of the more challenging aspects of navigating dependencies in code, and visualizes them (<a href="#ko">Ko & Myers 2009</a>):</p>
<p class="embed-responsive embed-responsive-16by9"> <p class="embed-responsive embed-responsive-16by9">