diff --git a/comprehension.html b/comprehension.html index ecf7871..7cb4647 100644 --- a/comprehension.html +++ b/comprehension.html @@ -21,7 +21,6 @@

Back to table of contents

- Credit: public domain @@ -113,10 +112,26 @@

So how do developers go about answering these questions? Studies comparing experts and novices show that experts use prior knowledge about architecture, design patterns, and the problem domain a program is built for to know what questions to ask and how to answer them, whereas novices use surface features of code, which leads them to spend considerable time reading code that is irrelevant to a question (von Mayrhauser & Vans 1994, LaToza et al. 2007). Reading and comprehending source code is fundamentally different from reading and comprehending natural language (Binkley et al. 2013); what experts are doing is ultimately reasoning about dependencies in code (Weiser 1981). Dependencies include things like data dependencies (where a variable is used to compute something, what modifies a data structure, how data flows through a program, etc.) and control dependencies (which components call which functions, which events can trigger a function to be called, how a function is reached, etc.). All of the questions above fundamentally get at different types of data and control dependencies. In fact, theories of how developers navigate code by following these dependencies are highly predictive of what information a developer will seek next (Fleming et al. 2013), suggesting that expert behavior is highly procedural. This work, and work explicitly investigating the role of identifier names (Lawrie et al. 2006), finds that names are critical to facilitating higher-level comprehension of program behavior.
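To make these two kinds of dependencies concrete, here is a minimal sketch in Java (all of the names are invented for illustration), with comments marking the dependencies a developer might trace when asking "what modifies this data?" or "how is this function reached?":

    public class Account {
        private int balance = 0;               // data: written by deposit(), read by report()

        public void deposit(int amount) {
            balance += amount;                 // data dependency: modifies 'balance'
            log("deposited " + amount);        // control dependency: deposit() calls log()
        }

        public String report() {
            return "balance = " + balance;     // data dependency: reads 'balance'
        }

        private void log(String message) {     // control dependency: only reachable via deposit()
            System.out.println(message);
        }
    }

Answering a question like "what could change balance?" amounts to tracing data dependencies backward from a use to every write; answering "how can log be reached?" amounts to tracing control dependencies back through its callers.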

-

While much of program comprehension is skill, some of it is determined by design. For example, some programming languages result in programs that are more comprehensible. One framework called the Cognitive Dimensions of Notations (Green 1989) lays out some of the tradeoffs in programming language design that result in these differences in comprehensibility. For example, one of the dimensions in the framework is consistency, which refers to how much of a notation can be guessed based on an initial understanding of a language. JavaScript is a low-consistency language because of operators like ==, which behave differently depending on what the type of the left and right operands are. Knowing the behavior for Booleans doesn't tell you the behavior for a Boolean being compared to an integer. In contrast, Java is a high consistency language: == is only ever valid when both operands are of the same type.

- -

These differences in notation have real impact. Encapsulation through data structures leads to better comprehension that monolithic or purely functional languages (Woodfield et al. 1981, Bhattacharya & Neamtiu 2011). Declarative programming paradigms (like the JavaScript view framework React) have greater comprehensibility than imperative languages (Salvaneschi et al. 2014). In general, languages that are statically typed result in fewer defects (Ray et la. 2014), better comprehensibility because of the ability to construct better documentation (Endrikat et al. 2014), and result in easier debugging (Hanenberg et al. 2013). In fact, studies of more dynamic languages like JavaScript and Smalltalk (Callaú et al. 2013) show that the dynamic features of these languages aren't really used all that much anyway. All of this evidence suggests that that the more you tell a compiler about what your code means (by declaring types, writing functional specifications, etc.), the more it helps the other developers know what it means too.

+

+ While much of program comprehension is skill, some of it is determined by design.
+ For example, some programming languages result in programs that are more comprehensible.
+ One framework called the Cognitive Dimensions of Notations (Green 1989) lays out some of the tradeoffs in programming language design that result in these differences in comprehensibility.
+ One of the dimensions in the framework, for instance, is consistency, which refers to how much of a notation can be guessed based on an initial understanding of a language.
+ JavaScript has low consistency because of operators like ==, which behave differently depending on the types of the left and right operands.
+ Knowing the behavior for Booleans doesn't tell you the behavior for a Boolean being compared to an integer.
+ In contrast, Java is a high-consistency language: == is only ever valid when both operands are of the same type.
+
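As a hypothetical illustration of this difference, consider how the two languages treat a comparison across types; in Java, the inconsistent case cannot even be written, because the compiler rejects == on incomparable types:

    public class Consistency {
        public static void main(String[] args) {
            int count = 1;
            boolean flag = true;

            System.out.println(count == 1);    // fine: both operands are ints
            System.out.println(flag == true);  // fine: both operands are booleans

            // In Java, comparing across unrelated types is a compile-time error,
            // so this line would not compile:
            // System.out.println(count == flag);  // error: incomparable types int and boolean

            // In JavaScript, the analogous expression 1 == true evaluates to true,
            // because == silently coerces its operands before comparing them.
        }
    }

Because the compiler enforces the rule, a reader's initial understanding of == in Java generalizes to every use of it, which is exactly what the consistency dimension describes.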

+

+ These differences in notation can have some impact.
+ Encapsulation through data structures leads to better comprehension than monolithic or purely functional languages (Woodfield et al. 1981, Bhattacharya & Neamtiu 2011).
+ Declarative programming paradigms (like CSS or HTML) have greater comprehensibility than imperative languages (Salvaneschi et al. 2014).
+ Statically typed languages like Java (which require developers to declare the data types of all variables) result in fewer defects (Ray et al. 2014), better comprehensibility because of the ability to construct better documentation (Endrikat et al. 2014), and easier debugging (Hanenberg et al. 2013).
+ In fact, studies of more dynamic languages like JavaScript and Smalltalk (Callaú et al. 2013) show that the dynamic features of these languages aren't really used all that much anyway.
+ Despite all of these measurable differences, the impact of notation seems to be modest in practice (Ray et al. 2014).
+ All of this evidence suggests that the more you tell a compiler about what your code means (by declaring types, writing functional specifications, etc.), the more it helps other developers know what it means too, but that this doesn't translate into huge differences in defects.
+
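As a small, hypothetical illustration of that last point, consider how much a statically typed signature communicates before a reader ever looks at the body:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class WordCount {
        // The declared types act as compiler-checked documentation: a reader knows
        // the input is a list of strings and the output maps each word to a count,
        // without tracing any callers or running any tests.
        public static Map<String, Integer> countWords(List<String> lines) {
            Map<String, Integer> counts = new HashMap<>();
            for (String line : lines) {
                for (String word : line.split("\\s+")) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
            return counts;
        }
    }

In a dynamically typed language the same function would run just as well, but a reader would have to infer these facts from the body, the callers, or the tests.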

+

Code editors, development environments, and program comprehension tools can also be helpful. Early evidence showed that simple features like syntax highlighting and careful typographic choices can improve the speed of program comprehension (Baecker 1988). I have also worked on several tools to support program comprehension, including the Whyline, which automates many of the more challenging aspects of navigating dependencies in code, and visualizes them (Ko & Myers 2009):

diff --git a/process.html b/process.html index e0265ad..f2d4146 100644 --- a/process.html +++ b/process.html @@ -29,7 +29,7 @@

So you know what you're going to build and how you're going to build it. What process should you use to go about building it? Who's going to build what? What order should you build it in? How do you make sure everyone is in sync while you're building it? And most importantly, how do you make sure you build it well and on time? These are fundamental questions in software engineering with many potential answers. Unfortunately, we still don't know which of those answers are right.

-

At the foundation of all of these questions are basic matters of project management: plan, execute, and monitor. But developers in the 1970's and on found that traditional project management ideas didn't seem to work. The earliest process ideas followed a "waterfall" model, in which a project begins by identifying requirements, writing specifications, implementing, testing, and releasing, all under the assumption that every stage could be fully tested and verified. (Recognize this? It's the order of topics we're discussing in this class!) Many managers seemed to like the waterfall model because it seemed structured and predictable; however, because most managers were originally software developers, they preferred a structured approach to project management (Weinberg 1982). The reality, however, was that no matter how much verification one did of each of these steps, there always seemed to be more information in later steps that caused a team to reconsider it's earlier decision (e.g., imagine a customer liked a requirement when it was described in the abstract, but when it was actually built, they rejected it, because they finally saw what the requirement really meant).

+

At the foundation of all of these questions are basic matters of project management: plan, execute, and monitor. But developers in the 1970s and on found that traditional project management ideas didn't seem to work. The earliest process ideas followed a "waterfall" model, in which a project begins by identifying requirements, writing specifications, implementing, testing, and releasing, all under the assumption that every stage could be fully tested and verified. (Recognize this? It's the order of topics we're discussing in this class!) Many managers seemed to like the waterfall model because it seemed structured and predictable; this was perhaps because most managers were originally software developers, and so preferred a structured approach to project management (Weinberg 1982). The reality, however, was that no matter how much verification one did of each of these steps, there always seemed to be more information in later steps that caused a team to reconsider its earlier decisions (e.g., imagine a customer who liked a requirement when it was described in the abstract, but rejected it once it was actually built, because they finally saw what the requirement really meant).

In 1988, Barry Boehm proposed an alternative to waterfall called the Spiral model (Boehm 1988): rather than trying to verify every step before proceeding to the next level of detail, prototype every step along the way, getting partial validation and iteratively converging through a series of prototypes toward both an acceptable set of requirements and an acceptable product. Throughout, risk assessment is key, encouraging a team to reflect on and revise its process based on what it is learning. What was important about these ideas was not the particulars of Boehm's proposed process, but the disruptive idea that iteration and process improvement are critical to engineering great software.

@@ -47,7 +47,16 @@

Because of the importance of awareness and communication, the distance between teammates is also a critical factor. This is most visible in companies that hire remote developers, building distributed teams. The primary motivation for doing this is to reduce costs or gain access to engineering talent that is distant from a team's geographical center, but over time, companies have found that doing so necessitates significant investments in travel and socialization to ensure quality, minimizing geographical, temporal, and cultural separation (Smite 2010). Researchers have found that there appear to be fundamental tradeoffs between productivity, quality, and profits in these settings (Ramasubbu et al. 2011). For example, more distance appears to lead to slower communication (Wagstrom & Datta 2014). Despite these tradeoffs, most rigorous studies of the cost of distributed development have found that when companies work hard to minimize temporal and cultural separation, the actual impact on defects is small (Kocaguneli et al. 2013). Some researchers have begun to explore even more extreme models of distributed development, hiring contract developers to complete microtasks over a few days without hiring them as employees; early studies suggest that these models have the worst outcomes, with greater costs, poor scalability, and more significant quality issues (Stol & Fitzgerald 2014).

-

While all of these research was being conducted, industry explored its own ideas about process, devising frameworks that addressed issues of distance, pace, ownership, awareness, and process improvement. Extreme Programming (Beck 1999) was one of these frameworks and it was full of ideas:

+

+ A critical part of ensuring that a team is successful is having someone responsible for managing these factors of distance, pace, ownership, awareness, and overall process.
+ The most obvious person to oversee this is, of course, a project manager.
+ Research on the skills software engineering project managers need suggests that while some technical knowledge is necessary, it is the soft skills necessary for managing all of these factors of communication and coordination that distinguish great managers (Kalliamvakou et al. 2017).
+

+ +

+ While all of this research has strong implications for practice, industry has largely explored its own ideas about process, devising frameworks that addressed issues of distance, pace, ownership, awareness, and process improvement.
+ Extreme Programming (Beck 1999) was one of these frameworks, and it was full of ideas:
+