diff --git a/comprehension.html b/comprehension.html index 8419238..a49f3e1 100644 --- a/comprehension.html +++ b/comprehension.html @@ -111,7 +111,7 @@

If you think about the diversity of questions in this list, you can see why program comprehension requires expertise. You not only need to understand programming languages quite well, but you also need to have strategies for answering all of the questions above (and more) quickly, effectively, and accurately.

-

So how do developers go about answering these questions? Studies comparing experts and novices show that experts use prior knowledge that they have about architecture, design patterns, and the problem domain a program is built for to know what questions to ask and how to answer them, whereas novices use surface features of code, which leads them to spend considerable time reading code that is irrelevant to a question (von Mayrhauser & Vans 1994, LaToza et al. 2007). Reading and comprehending source code is fundamentally different from those of reading and comprehending natural language (Binkley et al. 2013); what experts are doing is ultimately reasoning about dependencies between code (Weiser 1981). Dependencies include things like data dependencies (where a variable is used to compute something, what modifies a data structure, how data flows through a program, etc.) and control dependencies (which components call which functions, which events can trigger a function to be called, how a function is reached, etc.). All of the questions above fundamentally get at different types of data and control dependencies. In fact, theories of how developers navigate code by following these dependencies are highly predictive of what information a developer will seek next (Fleming et al. 2013), suggesting that expert behavior is highly procedural.

+

So how do developers go about answering these questions? Studies comparing experts and novices show that experts use prior knowledge that they have about architecture, design patterns, and the problem domain a program is built for to know what questions to ask and how to answer them, whereas novices use surface features of code, which leads them to spend considerable time reading code that is irrelevant to a question (von Mayrhauser & Vans 1994, LaToza et al. 2007). Reading and comprehending source code is fundamentally different from those of reading and comprehending natural language (Binkley et al. 2013); what experts are doing is ultimately reasoning about dependencies between code (Weiser 1981). Dependencies include things like data dependencies (where a variable is used to compute something, what modifies a data structure, how data flows through a program, etc.) and control dependencies (which components call which functions, which events can trigger a function to be called, how a function is reached, etc.). All of the questions above fundamentally get at different types of data and control dependencies. In fact, theories of how developers navigate code by following these dependencies are highly predictive of what information a developer will seek next (Fleming et al. 2013), suggesting that expert behavior is highly procedural. This work, and work explicitly investigating the role of identifier names (Lawrie et al. 2006), finds that names are actually critical to facilitating higher level comprehension of program behavior.

While much of program comprehension is skill, some of it is determined by design. For example, some programming languages result in programs that are more comprehensible. One framework called the Cognitive Dimensions of Notations (Green 1989) lays out some of the tradeoffs in programming language design that result in these differences in comprehensibility. For example, one of the dimensions in the framework is consistency, which refers to how much of a notation can be guessed based on an initial understanding of a language. JavaScript is a low-consistency language because of operators like ==, which behave differently depending on what the type of the left and right operands are. Knowing the behavior for Booleans doesn't tell you the behavior for a Boolean being compared to an integer. In contrast, Java is a high consistency language: == is only ever valid when both operands are of the same type.

@@ -142,6 +142,7 @@

Ko, A. J., & Myers, B. A. (2009, April). Finding causes of program output with the Java Whyline. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1569-1578).

Thomas D. LaToza and Brad A. Myers. 2010. Developers ask reachability questions. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1 (ICSE '10), Vol. 1. ACM, New York, NY, USA, 185-194.

Thomas D. LaToza, David Garlan, James D. Herbsleb, and Brad A. Myers. 2007. Program comprehension as fact finding. In Proceedings of the the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering (ESEC-FSE '07). ACM, New York, NY, USA, 361-370.

+

Lawrie, D., Morrell, C., Feild, H., & Binkley, D. (2006, June). What's in a name? a study of identifiers. IEEE International Conference on Program Comprehension, 3-12.

Walid Maalej, Rebecca Tiarks, Tobias Roehm, and Rainer Koschke. 2014. On the Comprehension of Program Comprehension. ACM Transactions on Software Engineering and Methodology. 23, 4, Article 31 (September 2014), 37 pages.

A. von Mayrhauser and A. M. Vans. 1994. Comprehension processes during large scale maintenance. In Proceedings of the 16th international conference on Software engineering (ICSE '94). IEEE Computer Society Press, Los Alamitos, CA, USA, 39-48.

Baishakhi Ray, Daryl Posnett, Vladimir Filkov, and Premkumar Devanbu. 2014. A large scale study of programming languages and code quality in GitHub. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2014). ACM, New York, NY, USA, 155-165.