From 475bd9331933d30262f843e75bf39b7537126f99 Mon Sep 17 00:00:00 2001 From: Andy Ko Date: Tue, 18 Apr 2017 08:45:50 -0700 Subject: [PATCH] Corrected a range of typos. --- comprehension.html | 10 ++++------ process.html | 24 ++++++++++++++++++++---- 2 files changed, 24 insertions(+), 10 deletions(-) diff --git a/comprehension.html b/comprehension.html index 44d395e..95bd27f 100644 --- a/comprehension.html +++ b/comprehension.html @@ -28,11 +28,9 @@

Program Comprehension

Andrew J. Ko
-

Despite all of the different activities that we've talked about so far—communicating, coordinating, planning, designing, architecting—really, most of a software engineers time is spent reading code (Maalej et al. 2014). Sometimes this is their own code, which makes this reading easier. Most of the time, it is someone else's code, whether it's a teammate's, or part of a library or API you're using. We call this reading program comprehension.

+

Despite all of the activities that we've talked about so far—communicating, coordinating, planning, designing, architecting—really, most of a software engineers time is spent reading code (Maalej et al. 2014). Sometimes this is their own code, which makes this reading easier. Most of the time, it is someone else's code, whether it's a teammate's, or part of a library or API you're using. We call this reading program comprehension.

Being good at program comprehension is a critical skill. You need to be able to read a function and know what it will do with its inputs; you need to be able to read a class and understand its state and functionality; you also need to be able to comprehend a whole implementation, understanding its architecture. Without these skills, you can't test well, you can't debug well, and you can fix or enhance the systems you're building or maintaining. In fact, studies of software engineers' first year at their first job show that a significant majority of their time is spent trying to simply comprehend the architecture of the system they are building or maintaining and understanding the processes that are being followed to modify and enhance them (Dagenais et al. 2010).

- -

What's going on when developers comprehend code? Usually, developers are trying to answer questions about code that help them build larger models of how a program works. Because program comprehension is hard, they avoid it when they can, relying on explanations from other developers rather than trying to build precise models of how a program works on their own (Roehm et al. 2012). When they do try to comprehend code, developers are usually trying to answer questions. Several studies have many general questions that developers must be able to answer in order to understand programs (Sillito et al. 2006, LaToza & Myers 2010). Here are over forty common questions that developers ask:

@@ -111,13 +109,13 @@ -

If you think about the massive diversity in this list, you can see why program comprehension requires expertise. You not only need to understand programming languages quite well, but you also need to have strategies for answering all of the questions above (and more) quickly, effectively, and accurately.

+

If you think about the diversity of questions in this list, you can see why program comprehension requires expertise. You not only need to understand programming languages quite well, but you also need to have strategies for answering all of the questions above (and more) quickly, effectively, and accurately.

-

So how do developers go about answering these questions? Studies comparing experts and novices show that experts use prior knowledge that they have about architecture, design patterns, and the problem domain a program is built for to know what questions to ask and how to answer them, whereas novices use surface features of code, which leads them to spend considerable time reading code that is irrelevant to a question ((von Mayrhauser & Vans 1994), LaToza et al. 2007). Fundamentally, reading and comprehending source code is fundamentally different from those of reading and comprehending natural language (Binkley et al. 2013); what experts are doing is ultimately reasoning about dependencies between code (Weiser 1981). Dependencies include things like data dependencies (where a variable is used to compute something, what modifies a data structure, how data flows through a program, etc.) and control dependencies (which components call which functions, which events can trigger a function to be called, how a function is reached, etc.). All of the questions above fundamentally get at different types of data and control dependencies. In fact, theories of how developers navigate code by following these dependencies are highly predictive of what information a developer will seek next (Fleming et al. 2013), suggesting that expert behavior is highly routine.

+

So how do developers go about answering these questions? Studies comparing experts and novices show that experts use prior knowledge that they have about architecture, design patterns, and the problem domain a program is built for to know what questions to ask and how to answer them, whereas novices use surface features of code, which leads them to spend considerable time reading code that is irrelevant to a question (von Mayrhauser & Vans 1994, LaToza et al. 2007). Reading and comprehending source code is fundamentally different from those of reading and comprehending natural language (Binkley et al. 2013); what experts are doing is ultimately reasoning about dependencies between code (Weiser 1981). Dependencies include things like data dependencies (where a variable is used to compute something, what modifies a data structure, how data flows through a program, etc.) and control dependencies (which components call which functions, which events can trigger a function to be called, how a function is reached, etc.). All of the questions above fundamentally get at different types of data and control dependencies. In fact, theories of how developers navigate code by following these dependencies are highly predictive of what information a developer will seek next (Fleming et al. 2013), suggesting that expert behavior is highly procedural.

While much of program comprehension is skill, some of it is determined by design. For example, some programming languages result in programs that are more comprehensible. One framework called the Cognitive Dimensions of Notations (Green 1989) lays out some of the tradeoffs in programming language design that result in these differences in comprehensibility. For example, one of the dimensions in the framework is consistency, which refers to how much of a notation can be guessed based on an initial understanding of a language. JavaScript is a low-consistency language because of operators like ==, which behave differently depending on what the type of the left and right operands are. Knowing the behavior for Booleans doesn't tell you the behavior for a Boolean being compared to an integer. In contrast, Java is a high consistency language: == is only ever valid when both operands are of the same type.

-

These differences in notation have real impact. Encapsulation through data structures leads to better comprehension that monolithic or purely functional languages (Woodfield et al. 1981, Bhattacharya & Neamtiu). Declarative programming paradigms (like the JavaScript React) have greater comprehensibility than imperative languages (Salvaneschi et al. 2014). In general, languages that are statically typed result in fewer defects (Ray et la. 2014), better comprehensibility because of the ability to construct better documentation (Endrikat et al. 2014), and result in easier debugging (Hanenberg et al. 2013). In fact, studies of more dynamic languages like JavaScript and Smalltalk (Callaú et al. 2013) show that the dynamic features of these languages aren't really used all that much anywhere. It appears then that the more you tell a compiler about what your code means, the more it helps the other developers know what it means too.

+

These differences in notation have real impact. Encapsulation through data structures leads to better comprehension that monolithic or purely functional languages (Woodfield et al. 1981, Bhattacharya & Neamtiu). Declarative programming paradigms (like the JavaScript view framework React) have greater comprehensibility than imperative languages (Salvaneschi et al. 2014). In general, languages that are statically typed result in fewer defects (Ray et la. 2014), better comprehensibility because of the ability to construct better documentation (Endrikat et al. 2014), and result in easier debugging (Hanenberg et al. 2013). In fact, studies of more dynamic languages like JavaScript and Smalltalk (Callaú et al. 2013) show that the dynamic features of these languages aren't really used all that much anyway. All of this evidence suggests that that the more you tell a compiler about what your code means (by declaring types, writing functional specifications, etc.), the more it helps the other developers know what it means too.

Code editors, development environments, and program comprehension tools can also be helpful. Early evidence showed that simple features like syntax highlighting and careful typographic choices can improve the speed of program comprehension (Baecker 1988). I have also worked on several tools to support program comprehension, including the Whyline, which automates many of the more challenging aspects of navigating dependencies in code, and visualizes them (Ko & Myers 2009):

diff --git a/process.html b/process.html index be76cd3..ed2bbb9 100644 --- a/process.html +++ b/process.html @@ -36,7 +36,7 @@ -

Around the same time, two influential books were published. Fred Brooks wrote The Mythical Man Month (Brooks 1995), a book about software project management, full of provocative ideas that would be tested over the next three decades, including the idea that adding more people to a project would not necessarily increase productivity. Tom DeMarco and Timothy Lister wrote another famous book, Peopleware: Productive Projects and Teams (DeMarco 1987), arguing that the major challenges in software engineering are human, not technical. Both of these works still represent some of the most widely-read statements of the problem of managing software development processes.

+

Around the same time, two influential books were published. Fred Brooks wrote The Mythical Man Month (Brooks 1995), a book about software project management, full of provocative ideas that would be tested over the next three decades, including the idea that adding more people to a project would not necessarily increase productivity. Tom DeMarco and Timothy Lister wrote another famous book, Peopleware: Productive Projects and Teams (DeMarco 1987), arguing that the major challenges in software engineering are human, not technical. Both of these works still represent some of the most widely-read statements of the problem of managing software development.

These early ideas in software project management led to a wide variety of other discoveries about process. For example, organizations of all sizes can improve their process if they are very aware of what the people in the organization know, what it's capable of learning, and if it builds robust processes to actually continually improve process (Dybȧ 2002, Dybȧ 2003). This might mean monitoring the pace of work, incentivizing engineers to reflect on inefficiencies in process, and teaching engineers how to be comfortable with process change.

@@ -48,13 +48,28 @@

Because of the importance of awareness and communication, the distance between teammates is also a critical factor. This is most visible in companies that hire remote developers, building distributed teams. The primary motivation for doing this is to reduce costs or gain access to engineering talent that is distant from a team's geographical center, but over time, companies have found that doing so necessitates significant investments in travel and socialization to ensure quality, minimizing geographical, temporal and cultural separation (Smite 2010). Researchers have found that there appear to be fundamental tradeoffs between productivity, quality, and/or profits in these settings (Ramasubbu et al. 2011). For example, more distance appears to lead to slower communication (Wagstrom & Datta 2014). Despite these tradeoffs, most rigorous studies of the cost of distributed development have found that when companies work hard to minimize temporal and cultural separation, the actual impact on defects was small (Kocaguneli et al. 2013). Some researchers have begun to explore even more extreme models of distributed development, hiring contract developers to complete microtasks over a few days without hiring them as employees; early studies suggest that these models have the worst of outcomes, with greater costs, poor scalability, and more significant quality issues (Stol & Fitzgerald 2014).

-

While all of these research was being conducted, industry explored its own ideas about process, devising frameworks that addressed issues of distance, pace, ownership, awareness, and process improvement. Extreme Programming (Beck 1999) was one of these frameworks and it was full of ideas: be iterative, do small releases, keep design simple, write unit tests, refactor to iterate, use pair programming, integrate continuously, everyone owns everything, use an open workspace, work sane hours. Beck described in his original proposal that these ideas were best for "outsourced or in-house development of small- to medium-sized systems where requirements are vague and likely to change", but as industry often does, it began hyping it as a universal solution to software project management woes and adopted all kinds of combinations of these ideas, adapting them to their existing processes. In reality, the value of XP appears to depend on highly project-specific factors (Müller & Padberk 2013), while the core ideas that industry has adopted are valuing feedback, communication, simplicity, and respect for individuals and the team (Sharp & Robinson 2004).

+

While all of these research was being conducted, industry explored its own ideas about process, devising frameworks that addressed issues of distance, pace, ownership, awareness, and process improvement. Extreme Programming (Beck 1999) was one of these frameworks and it was full of ideas:

+ + + +

Note that none of these had any empirical evidence to back them. Moreover, Beck described in his original proposal that these ideas were best for "outsourced or in-house development of small- to medium-sized systems where requirements are vague and likely to change", but as industry often does, it began hyping it as a universal solution to software project management woes and adopted all kinds of combinations of these ideas, adapting them to their existing processes. In reality, the value of XP appears to depend on highly project-specific factors (Müller & Padberk 2013), while the core ideas that industry has adopted are valuing feedback, communication, simplicity, and respect for individuals and the team (Sharp & Robinson 2004). Researchers continue to investigate the merits of the list above; for example, numerous studies have investigated the effects of pair programming on defects, finding small but measurable benefits (di Bella et al. 2012)

At the same time, Beck began also espousing the idea of "Agile" methods, which celebrated many of the values underlying Extreme Programming, such as focusing on individuals, keeping things simple, collaborating with customers, and being iterative. This idea of begin agile was even more popular and spread widely in industry and research, even though many of the same ideas appeared much earlier in Boehm's work on the Spiral model. Researchers found that Agile methods can increase developer enthusiasm (Syed-Abdulla et al. 2006), that agile teams need different roles such as Mentor, Co-ordinator, Translator, Champion, Promoter, and Terminator (Hoda et al. 2010), and that teams are combing agile methods with all kinds of process ideas from other project management frameworks such as Scrum (meet daily to plan work, plan two-week sprints, maintain a backlog of work) and Kanban (visualize the workflow, limit work-in-progress, manage flow, make policies explicit, and implement feedback loops) (Al-Baik & Miller 2015). I don't define any of these ideas here because there aren't standard definitions to share.

-

Ultimately, all of this energy around process ideas in industry is exciting, but disorganized. None of these efforts really get to the core of what makes software projects difficult to manage. One effort in research to get to this core by contributing new theories that explain these difficulties. The first is Herbsleb's Socio-Technical Theory of Coordination (STTC). The idea of the theory is quite simple: dependencies in engineering decisions (e.g., this function calls this other function, this data type stores this other data type) define the social constraints that the organization must solve in a variety of ways to build and maintain software (Herbsleb & Mockus 2003, Herbsleb 2016). The better the organization builds processes and awareness tools to ensure that the people who own those engineering dependencies are communicating and aware of each others' work, the fewer defects that will occur. Herbsleb referred this alignment as sociotechnical congruence, and conducted a number of studies demonstrating its predictive and explanatory power.

+

Ultimately, all of this energy around process ideas in industry is exciting, but disorganized. None of these efforts really get to the core of what makes software projects difficult to manage. One effort in research to get to this core by contributing new theories that explain these difficulties. The first is Herbsleb's Socio-Technical Theory of Coordination (STTC). The idea of the theory is quite simple: technical dependencies in engineering decisions (e.g., this function calls this other function, this data type stores this other data type) define the social constraints that the organization must solve in a variety of ways to build and maintain software (Herbsleb & Mockus 2003, Herbsleb 2016). The better the organization builds processes and awareness tools to ensure that the people who own those engineering dependencies are communicating and aware of each others' work, the fewer defects that will occur. Herbsleb referred this alignment as sociotechnical congruence, and conducted a number of studies demonstrating its predictive and explanatory power.

-

In my recent work (Ko 2017), I extend this idea to congruence with beliefs about product value, claiming that successful software products require the constant, collective communication and agreement of a coherent proposition of a product's value across UX, design, engineering, product, marketing, sales, support, and even customers. A team needs to achieve Herbsleb's sociotechnical congruence to have a successful product, but that alone is not enough: the rest of the organization has to have a consistent understanding of what is being built and why, even as that understanding evolves over time.

+

In my recent work (Ko 2017), I extend this idea to congruence with beliefs about product value, claiming that successful software products require the constant, collective communication and agreement of a coherent proposition of a product's value across UX, design, engineering, product, marketing, sales, support, and even customers. A team needs to achieve Herbsleb's sociotechnical congruence to have a successful product, but that alone is not enough: the rest of the organization has to have a consistent understanding of what is being built and why, even as that understanding evolves over time.

Next chapter: Comprehension
@@ -67,6 +82,7 @@

Christian Bird, Nachiappan Nagappan, Brendan Murphy, Harald Gall, and Premkumar Devanbu. 2011. Don't touch my code! Examining the effects of ownership on software quality. In Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering (ESEC/FSE '11). ACM, New York, NY, USA, 4-14.

Boehm, B. W. (1988). A spiral model of software development and enhancement. Computer, 21(5), 61-72.

Brooks, F.P. (1995). The Mythical Man Month.

+

di Bella, E., Fronza, I., Phaphoom, N., Sillitti, A., Succi, G., & Vlasenko, J. (2013). Pair Programming and Software Defects--A Large, Industrial Case Study. IEEE Transactions on Software Engineering, 39(7), 930-953.

DeMarco, T. and Lister, T. (1987). Peopleware: Productive Projects and Teams.

Tore Dybȧ. 2003. Factors of software process improvement success in small and large organizations: an empirical study in the scandinavian context. In Proceedings of the 9th European software engineering conference held jointly with 11th ACM SIGSOFT international symposium on Foundations of software engineering (ESEC/FSE-11). ACM, New York, NY, USA, 148-157.

Dybȧ, T. (2002). Enabling software process improvement: an investigation of the importance of organizational issues. Empirical Software Engineering, 7(4), 387-390.