commit 5b15ce12a6a735de00f28b818aa0f07488f2dc6e Author: Andy Ko Date: Sun Apr 16 11:59:49 2017 -0700 First check-in of book. diff --git a/architecture.html b/architecture.html new file mode 100644 index 0000000..4dcf91c --- /dev/null +++ b/architecture.html @@ -0,0 +1,91 @@ + + + + + + + + + + + + + + + + + + Architecture + + + +

Back to table of contents

+ + + + Credit: Creative Commons 0 + + +

Architecture

+
Andrew J. Ko
+ +

Once you have a sense of what your design must do (in the form of requirements or other less formal specifications), the next big problem is one of organization. How will you organize all of the different data, algorithms, and control implied by your requirements? With a small program of a few hundred lines, you can get away without much organization, but as programs scale, they quickly become impossible to manage alone, let alone with multiple developers. Much of this challenge occurs because requirements change, and every time they do, code has to change to accommodate them. The more code there is and the more entangled it is, the harder it is to change and the more likely you are to break things.

+ +

This is where architecture comes in. Architecture is a way of organizing code, just like building architecture is a way of organizing space. The idea of software architecture has at its foundation a principle of information hiding: the less a part of a program knows about other parts of a program, the easier it is to change. The most popular information hiding strategy is encapsulation: this is the idea of designing self-contained abstractions with well-defined interfaces that separate different concerns in a program. Programming languages offer encapsulation support through things like functions and classes, which encapsulate data and functionality together. Another programming language encapsulation method is scoping, which hides variables and other names from parts of a program outside a scope. All of these strategies attempt to encourage developers to maximize information hiding and separation of concerns. If you get your encapsulation right, you should be able to easily make changes to a program's behavior without having to change everything about its implementation.
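As a minimal sketch of encapsulation (the `Inventory` class and its method names are hypothetical, chosen only for illustration), the class below hides how it stores stock counts behind a small interface. Callers depend only on `add` and `count`, so the internal dictionary could later be replaced with something else without changing them.

```python
class Inventory:
    """Hypothetical component that hides how stock counts are stored."""

    def __init__(self):
        # The leading underscore marks this as an internal detail by convention.
        self._counts = {}

    def add(self, item, quantity=1):
        """Record that `quantity` more units of `item` are in stock."""
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._counts[item] = self._counts.get(item, 0) + quantity

    def count(self, item):
        """Return how many units of `item` are in stock (0 if unknown)."""
        return self._counts.get(item, 0)


inventory = Inventory()
inventory.add("widget", 3)
inventory.add("widget")
print(inventory.count("widget"))  # 4
print(inventory.count("gadget"))  # 0
```

Python does not enforce the hiding; the underscore is only a convention, whereas languages like Java can enforce it with access modifiers.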

+ +

When encapsulation strategies fail, one can end up with what some affectionately call a "ball of mud" architecture or "spaghetti code". A more precise concept is cross-cutting concerns, which are things like features and functionality that span multiple different components of a system, or even an entire system. There is some evidence that cross-cutting concerns can lead to difficulties in program comprehension and long-term design degradation (Walker et al. 2012), all of which reduce productivity and increase the risk of defects. As long-lived systems get harder to change, they can take on technical debt, which is the degree to which an implementation is out of sync with a team's understanding of what a product is intended to be. Many developers view such debt as emerging primarily from poor architectural decisions (Ernst et al. 2015). Over time, this debt can further result in organizational challenges (Khadka et al. 2014), making change even more difficult.

+ +

The preventative solution to these problems is to try to design architecture up front, mitigating the various risks that come from cross-cutting concerns (defects, low modifiability, etc.) (Fairbanks 2010). A popular method in the 1990s was the Unified Modeling Language (UML), which was a series of notations for expressing the architectural design of a system before implementing it. Recent studies show that UML is generally not used, and where it is used, it is not used universally (Petre 2013). More recently, developers have investigated ideas of architectural styles, which are patterns of interactions and information exchange between encapsulated components. Some common architectural styles include:

+ + + +

Architectural styles come in all shapes and sizes. Some are smaller design patterns of information sharing (Beck et al. 1996), whereas others are ubiquitous but specialized patterns such as the architectures required to support undo and cancel in user interfaces (Bass et al. 2004).

+ +

One fundamental unit of which an architecture is composed is a component. This is basically a word that refers to any abstraction (any code, really) that attempts to encapsulate some well-defined functionality or behavior separate from other functionality and behavior. A component has an interface that determines how it can communicate with other components. It might be a class, a data structure, a set of functions, a library, or even something like a web service. All of these are abstractions that encapsulate interrelated computation and state. The second fundamental unit of architecture is the connector. Connectors are abstractions (code) that transmit information between components. They're brokers that connect components, but do not necessarily have meaningful computation or state of their own. Connectors can be things like function calls, web service API calls, events, requests, and so on.
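The distinction can be sketched in a few lines. In this illustration, every name (`EventBus`, `Thermometer`, `Display`) is hypothetical: two components each encapsulate their own behavior, and a connector does nothing but pass messages between them.

```python
class EventBus:
    """Hypothetical connector: forwards messages, does no real computation."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)


class Thermometer:
    """Hypothetical component that produces temperature readings."""

    def __init__(self, bus):
        self._bus = bus

    def report(self, celsius):
        self._bus.publish({"celsius": celsius})


class Display:
    """Hypothetical component that formats readings for a user."""

    def __init__(self):
        self.lines = []

    def on_reading(self, message):
        self.lines.append(f"{message['celsius']:.1f} C")


bus = EventBus()
display = Display()
bus.subscribe(display.on_reading)
Thermometer(bus).report(21.456)
print(display.lines)  # ['21.5 C']
```

Neither component refers to the other directly. Each knows only the connector and the shape of the message, which is the kind of information hiding that makes one side replaceable without touching the other.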

+ +

Even with carefully selected architectures, systems can still be difficult to put together, leading to architectural mismatch (Garlan et al. 1995). When mismatch occurs, connecting two styles can require large amounts of code, imposing significant risk of defects and cost of maintenance. One common example of mismatch occurs with the ubiquitous use of database schemas in client/server web applications. A single change in a database schema can often result in dramatic changes in an application, as every line of code that uses that part of the schema either directly or indirectly must be updated (Qiu et al. 2013). This kind of mismatch occurs because the component that manages data (the database) and the component that renders data (the user interface) are highly "coupled" through the database schema: the user interface needs to know a lot about the data, its meaning, and its structure in order to render it meaningfully.
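To make the coupling concrete, here is a small sketch that uses a plain dictionary to stand in for a database row. The `first_nm` and `last_nm` column names and all function names are invented for illustration. The first rendering function knows the column names directly; the second depends only on a small adapter, so a schema change would be absorbed in one place.

```python
# A row as it might come back from a hypothetical `users` table.
row = {"first_nm": "Ada", "last_nm": "Lovelace"}


def render_greeting_coupled(row):
    # Tightly coupled: the rendering code knows the column names.
    # Renaming `first_nm` in the schema breaks every function like this one.
    return f"Hello, {row['first_nm']} {row['last_nm']}!"


def to_user(row):
    # The only place that knows the schema's column names.
    return {"full_name": f"{row['first_nm']} {row['last_nm']}"}


def render_greeting(user):
    # Depends only on the `full_name` field, not on the schema.
    return f"Hello, {user['full_name']}!"


print(render_greeting_coupled(row))   # Hello, Ada Lovelace!
print(render_greeting(to_user(row)))  # Hello, Ada Lovelace!
```

The adapter does not remove the dependency on the schema; it only concentrates it, which is often the most a design can do.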

+ +

The most common approach to dealing with both architectural mismatch and the changing of requirements over time is refactoring, which means changing the architecture of an implementation without changing its behavior. Refactoring is something most developers do as part of changing a system (Murphy-Hill et al. 2009, Silva et al. 2016). Refactoring code to eliminate mismatch and technical debt can simplify change in the future, saving time (Ng et al. 2006) and preventing future defects (Kim et al. 2012). + +
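A small before-and-after sketch shows what "without changing its behavior" means in practice. The pricing rules, rates, and function names here are all hypothetical. Amounts are in integer cents so that both versions perform exactly the same arithmetic.

```python
def invoice_total_before(prices_in_cents, is_member):
    # Original version: discount and tax rules are tangled into one function.
    total = 0
    for price in prices_in_cents:
        total += price
    if is_member:
        total = total - total * 10 // 100
    total = total + total * 8 // 100
    return total


# Refactored version: same behavior, but each rule has a name and a home.
MEMBER_DISCOUNT_PERCENT = 10  # hypothetical rate
SALES_TAX_PERCENT = 8         # hypothetical rate


def apply_discount(cents, is_member):
    if not is_member:
        return cents
    return cents - cents * MEMBER_DISCOUNT_PERCENT // 100


def apply_tax(cents):
    return cents + cents * SALES_TAX_PERCENT // 100


def invoice_total_after(prices_in_cents, is_member):
    return apply_tax(apply_discount(sum(prices_in_cents), is_member))


print(invoice_total_before([1000, 2500], True))  # 3402
print(invoice_total_after([1000, 2500], True))   # 3402
```

Checking that both versions agree on a range of inputs, as a test suite would, is what gives a developer confidence that a change really was a refactoring and not a change in behavior.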

Next chapter: Specifications
+ +

Further reading

+ + + +

Len Bass, Bonnie E. John, Natalia Juristo, and Maria-Isabel Sanchez-Segura. 2004. Usability-Supporting Architectural Patterns. In Proceedings of the 26th International Conference on Software Engineering (ICSE '04). IEEE Computer Society, Washington, DC, USA, 716-717.

+

Kent Beck, Ron Crocker, Gerard Meszaros, John Vlissides, James O. Coplien, Lutz Dominick, and Frances Paulisch. 1996. Industrial experience with design patterns. In Proceedings of the 18th international conference on Software engineering (ICSE '96). IEEE Computer Society, Washington, DC, USA, 103-114.

+

Jürgen Cito, Philipp Leitner, Thomas Fritz, and Harald C. Gall. 2015. The making of cloud applications: an empirical study on software development for the cloud. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). ACM, New York, NY, USA, 393-403.

+

Neil A. Ernst, Stephany Bellomo, Ipek Ozkaya, Robert L. Nord, and Ian Gorton. 2015. Measure it? Manage it? Ignore it? Software practitioners and technical debt. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). ACM, New York, NY, USA, 50-60.

+

Fairbanks, G. (2010). Just enough software architecture: a risk-driven approach. Marshall & Brainerd.

+

Garlan, D., Allen, R., & Ockerbloom, J. (1995). Architectural mismatch or why it's hard to build systems out of existing parts. In Proceedings of the 17th international conference on Software engineering (pp. 179-185).

+

Ravi Khadka, Belfrit V. Batlajery, Amir M. Saeidi, Slinger Jansen, and Jurriaan Hage. 2014. How do professionals perceive legacy systems and software modernization? In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 36-47.

+

Miryung Kim, Thomas Zimmermann, and Nachiappan Nagappan. 2012. A field study of refactoring challenges and benefits. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering (FSE '12). ACM, New York, NY, USA, Article 50, 11 pages.

+

Emerson Murphy-Hill, Chris Parnin, and Andrew P. Black. 2009. How we refactor, and how we know it. In Proceedings of the 31st International Conference on Software Engineering (ICSE '09). IEEE Computer Society, Washington, DC, USA, 287-297.

+

T. H. Ng, S. C. Cheung, W. K. Chan, and Y. T. Yu. 2006. Work experience versus refactoring to design patterns: a controlled experiment. In Proceedings of the 14th ACM SIGSOFT international symposium on Foundations of software engineering (SIGSOFT '06/FSE-14). ACM, New York, NY, USA, 12-22.

+

Marian Petre. 2013. UML in practice. In Proceedings of the 2013 International Conference on Software Engineering (ICSE '13). IEEE Press, Piscataway, NJ, USA, 722-731.

+

Danilo Silva, Nikolaos Tsantalis, and Marco Tulio Valente. 2016. Why we refactor? Confessions of GitHub contributors. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, New York, NY, USA, 858-870.

+

Dong Qiu, Bixin Li, and Zhendong Su. 2013. An empirical analysis of the co-evolution of schema and code in database applications. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2013). ACM, New York, NY, USA, 125-135.

+

Robert J. Walker, Shreya Rawal, and Jonathan Sillito. 2012. Do crosscutting concerns cause modularity problems? In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering (FSE '12). ACM, New York, NY, USA, Article 49, 11 pages.

+ +
+ +

Podcasts

+ + + +

Software Engineering Daily, React JS with Sebastian Markbage and Christopher Chedeau

+ +
+ + + + + + + diff --git a/communication.html b/communication.html new file mode 100644 index 0000000..82d2826 --- /dev/null +++ b/communication.html @@ -0,0 +1,91 @@ + + + + + + + + + + + + + + + + + + + Communication + + + +

Back to table of contents

+ + + Credit: public domain + +

Communication

+
Andrew J. Ko
+ +

Because software engineering often distributes work across multiple people, a fundamental challenge in software engineering is ensuring that everyone on a team has the same understanding of what is being built and why. In the seminal book "The Mythical Man-Month", Fred Brooks argued that good software needs to have conceptual integrity, both in how it is designed and in how it is implemented (Brooks 1995). When multiple people are responsible for implementing a single coherent idea, how can they ensure they are building the same idea?

+ +

The solution is effective communication. When communication is poor and teams are disconnected, software defects are the result (Bettenburg & Hassan 2013). The social relationships in a team also play a large role in structuring how projects evolve (Zhou & Mockus 2011). Perhaps the most notable theory underlying these ideas is Conway's Law (Conway 1968), which argues that any designed system (software included) will reflect the social relationships behind its design. For example, look at any online banking website: the way the application is designed, how information is organized, the terminology that is used, and even the visual design are all reflections of how the teams inside that bank are organized and socially connected.

+ +

Because communication is so central, software engineers are constantly seeking information to further their work, going to their coworkers' desks, emailing them, chatting via messaging platforms, and even using social media (Ko et al. 2007). Some of the information that developers seek is easier to find than the rest. For example, in the study I just cited, it was pretty trivial to find information about who wrote a line of code or whether a build was done, but when the information they needed resided in someone else's head (e.g., why a particular line of code was written), it was slow or often impossible to retrieve it. Sometimes it's not even possible to find out who has the information. Researchers have investigated tools for trying to quantify expertise by automatically analyzing the code that developers have written, building platforms to help developers search for other developers who might know what they need to know (Mockus & Herbsleb 2002, Begel et al. 2010).
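The general idea of estimating expertise from change history can be sketched very simply. This is not the algorithm used by either of the cited tools; it is a toy illustration that assumes a hypothetical list of (author, file) change records and ranks authors by how often they changed each file.

```python
from collections import Counter, defaultdict

# Hypothetical change history: (author, file changed).
changes = [
    ("ana", "parser.py"),
    ("ana", "parser.py"),
    ("ben", "parser.py"),
    ("ben", "render.py"),
    ("ana", "parser.py"),
]


def likely_experts(changes):
    """Map each file to its authors, ordered by number of changes."""
    counts = defaultdict(Counter)
    for author, path in changes:
        counts[path][author] += 1
    return {
        path: [author for author, _ in counter.most_common()]
        for path, counter in counts.items()
    }


experts = likely_experts(changes)
print(experts["parser.py"])  # ['ana', 'ben']
print(experts["render.py"])  # ['ben']
```

A count like this says who touched the code, not why it was written that way, which is exactly the kind of knowledge the study above found hardest to retrieve.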

+ +

Communication is not always effective. In fact, there are many kinds of communication that are highly problematic in software engineering teams. For example, Perlow (1999) conducted an ethnography of one team and found a highly dysfunctional use of interruptions in which the most expert members of a team were constantly interrupted to "fight fires" (immediately address critical problems) in other parts of the organization, and then the organization rewarded them for their heroics. This not only made the most expert engineers less productive, but it also disincentivized the rest of the organization from finding effective ways of preventing the disasters from occurring in the first place. Not all interruptions are bad, and they can increase productivity, but they do increase stress (Mark et al. 2008).

+ +

Communication isn't just about transmitting information; it's also about relationships and identity. For example, the dominant culture of many software engineering work environments—and even the perceived culture—is one that can deter many people from even pursuing careers in computer science. Modern work environments are still dominated by men, who speak loudly, out of turn, and disrespectfully, with some even bordering on sexual harassment. These are not the conditions for trusting, effective communication.

+ +

When communication is effective, it still takes time. One of the key strategies for reducing the amount of communication necessary is knowledge sharing tools, which broadly refers to any information system that stores facts that developers would normally have to retrieve from a person. By storing them in a database and making them easy to search, teams can avoid interruptions. The most common knowledge sharing tools in software teams are issue trackers, which are often at the center of communication not only between developers, but also with every other part of a software organization (Bertram et al. 2010). Community portals, such as GitHub pages or Slack teams, can also be effective ways of sharing documents and archiving decisions (Treude & Storey 2011). Perhaps the most popular knowledge sharing tool in software engineering today is Stack Overflow, which archives facts about programming language and API usage.

+ +

Because all of this knowledge is so critical to progress, when developers leave an organization and haven't archived their knowledge somewhere, it can be quite disruptive to progress. Organizations often have single points of failure, in which a single developer may be critical to a team's ability to maintain and enhance a software product (Rigby et al. 2016). When newcomers join a team and lack the right knowledge, they introduce defects (Foucault et al. 2015). Some companies try to mitigate this by rotating developers between projects, "cross-training" them to ensure that the necessary knowledge to maintain a project is distributed across multiple engineers.

+ +

What does all of this mean for you as an individual developer? To put it simply, don't underestimate the importance of talking. Know who you need to talk to, talk to them frequently, and to the extent that you can, write down what you know, both to lessen the demand for talking and mitigate the risk of you not being available, and to make your knowledge more precise and accessible in the future. It often takes decades for engineers to excel at communication. The very fact that you know why communication is important gives you a critical head start.

+ +
Next chapter: Productivity
+ +

Further reading

+ + + +

Salah Bendifallah and Walt Scacchi. 1989. Work structures and shifts: an empirical analysis of software specification teamwork. In Proceedings of the 11th international conference on Software engineering (ICSE '89). ACM, New York, NY, USA, 260-270.

+

Andrew Begel, Yit Phang Khoo, and Thomas Zimmermann. 2010. Codebook: discovering and exploiting relationships in software repositories. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1 (ICSE '10), Vol. 1. ACM, New York, NY, USA, 125-134.

+

Dane Bertram, Amy Voida, Saul Greenberg, and Robert Walker. 2010. Communication, collaboration, and bugs: the social nature of issue tracking in small, collocated teams. In Proceedings of the 2010 ACM conference on Computer supported cooperative work (CSCW '10). ACM, New York, NY, USA, 291-300.

+

Bettenburg, N., & Hassan, A. E. (2013). Studying the impact of social interactions on software quality. Empirical Software Engineering, 18(2), 375-431.

+

Brooks, F. P. (1995). The Mythical Man-Month: Essays on Software Engineering, Addison-Wesley.

+

Conway, M. E. (1968). How do committees invent? Datamation, 14(4), 28-31.

+

Torgeir Dingsøyr and Emil Røyrvik. 2003. An empirical study of an informal knowledge repository in a medium-sized software consulting company. In Proceedings of the 25th International Conference on Software Engineering (ICSE '03). IEEE Computer Society, Washington, DC, USA, 84-92.

+

Matthieu Foucault, Marc Palyart, Xavier Blanc, Gail C. Murphy, and Jean-Rémy Falleri. 2015. Impact of developer turnover on quality in open-source software. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). ACM, New York, NY, USA, 829-841.

+

Andrew J. Ko, Robert DeLine, and Gina Venolia. 2007. Information Needs in Collocated Software Development Teams. In Proceedings of the 29th international conference on Software Engineering (ICSE '07). IEEE Computer Society, Washington, DC, USA, 344-353.

+

Mark, G., Gudith, D., & Klocke, U. (2008, April). The cost of interrupted work: more speed and stress. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems (pp. 107-110).

+

Audris Mockus. 2010. Organizational volatility and its effects on software defects. In Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering (FSE '10). ACM, New York, NY, USA, 117-126.

+

Audris Mockus and James D. Herbsleb. 2002. Expertise browser: a quantitative approach to identifying expertise. In Proceedings of the 24th International Conference on Software Engineering (ICSE '02). ACM, New York, NY, USA, 503-512.

+

Perlow, L. A. (1999). The time famine: Toward a sociology of work time. Administrative science quarterly, 44(1), 57-81.

+

Pikkarainen, M., Haikara, J., Salo, O., Abrahamsson, P., & Still, J. (2008). The impact of agile practices on communication in software development. Empirical Software Engineering, 13(3), 303-337.

+

Peter C. Rigby, Yue Cai Zhu, Samuel M. Donadelli, and Audris Mockus. 2016. Quantifying and mitigating turnover-induced knowledge loss: case studies of chrome and a project at Avaya. In Proceedings of the 38th International Conference on Software Engineering (ICSE '16). ACM, New York, NY, USA, 1006-1016.

+

Ronnie E. S. Santos, Fabio Q. B. da Silva, Cleyton V. C. de Magalhães, and Cleviton V. F. Monteiro. 2016. Building a theory of job rotation in software engineering from an instrumental case study. In Proceedings of the 38th International Conference on Software Engineering (ICSE '16). ACM, New York, NY, USA, 971-981.

+

Sfetsos, P., Stamelos, I., Angelis, L., & Deligiannis, I. (2009). An experimental investigation of personality types impact on pair effectiveness in pair programming. Empirical Software Engineering, 14(2), 187.

+

Christoph Treude and Margaret-Anne Storey. 2011. Effective communication of software development knowledge through community portals. In Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering (ESEC/FSE '11). ACM, New York, NY, USA, 91-101.

+

Christoph Treude and Margaret-Anne Storey. 2009. How tagging helps bridge the gap between social and technical aspects in software development. In Proceedings of the 31st International Conference on Software Engineering (ICSE '09). IEEE Computer Society, Washington, DC, USA, 12-22.

+

Keiji Uemura and Miki Ohori. 1984. A cooperative approach to software development by application engineers and software engineers. In Proceedings of the 7th international conference on Software engineering (ICSE '84). IEEE Press, Piscataway, NJ, USA, 86-96.

+

Minghui Zhou and Audris Mockus. 2011. Does the initial environment impact the future of developers? In Proceedings of the 33rd International Conference on Software Engineering (ICSE '11). ACM, New York, NY, USA, 271-280.

+ +
+ +

Podcasts

+ + + +

Software Engineering Daily. Female Pursuit of Computer Science with Jennifer Wang.

+

Software Engineering Daily. The State of Programming with Stack Overflow Co-Founder Jeff Atwood.

+ +
+ + + + + + + diff --git a/comprehension.html b/comprehension.html new file mode 100644 index 0000000..44d395e --- /dev/null +++ b/comprehension.html @@ -0,0 +1,173 @@ + + + + + + + + + + + + + + + + + + Comprehension + + + +

Back to table of contents

+ + + + Credit: public domain + +

Program Comprehension

+
Andrew J. Ko
+ +

Despite all of the different activities that we've talked about so far (communicating, coordinating, planning, designing, architecting), most of a software engineer's time is really spent reading code (Maalej et al. 2014). Sometimes this is their own code, which makes this reading easier. Most of the time, it is someone else's code, whether it's a teammate's, or part of a library or API you're using. We call this reading program comprehension.

+ +

Being good at program comprehension is a critical skill. You need to be able to read a function and know what it will do with its inputs; you need to be able to read a class and understand its state and functionality; you also need to be able to comprehend a whole implementation, understanding its architecture. Without these skills, you can't test well, you can't debug well, and you can't fix or enhance the systems you're building or maintaining. In fact, studies of software engineers' first year at their first job show that a significant majority of their time is spent trying to simply comprehend the architecture of the system they are building or maintaining and understanding the processes that are being followed to modify and enhance them (Dagenais et al. 2010).

+ + + +

What's going on when developers comprehend code? Usually, developers are trying to answer questions about code that help them build larger models of how a program works. Because program comprehension is hard, they avoid it when they can, relying on explanations from other developers rather than trying to build precise models of how a program works on their own (Roehm et al. 2012). Several studies have identified many general questions that developers must be able to answer in order to understand programs (Sillito et al. 2006, LaToza & Myers 2010). Here are dozens of common questions that developers ask:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Which type represents this domain concept or this UI element or action?
Where in the code is the text in this error message or UI element?
Where is there any code involved in the implementation of this behavior?
Is there an entity named something like this in that unit (for example in a project, package or class)?
What are the parts of this type?
Which types is this type a part of?
Where does this type fit in the type hierarchy?
Does this type have any siblings in the type hierarchy?
Where is this field declared in the type hierarchy?
Who implements this interface or these abstract methods?
Where is this method called or type referenced?
When during the execution is this method called?
Where are instances of this class created?
Where is this variable or data structure being accessed?
What data can we access from this object?
What does the declaration or definition of this look like?
What are the arguments to this function?
What are the values of these arguments at runtime?
What data is being modified in this code?
How are instances of these types created and assembled?
How are these types or objects related?
How is this feature or concern (object ownership, UI control, etc) implemented?
What in this structure distinguishes these cases?
What is the "correct" way to use or access this data structure?
How does this data structure look at runtime?
How can data be passed to (or accessed at) this point in the code?
How is control getting (from here to) here?
Why isn't control reaching this point in the code?
Which execution path is being taken in this case?
Under what circumstances is this method called or exception thrown?
What parts of this data structure are accessed in this code?
How does the system behavior vary over these types or cases?
What are the differences between these files or types?
What is the difference between these similar parts of the code (e.g., between sets of methods)?
What is the mapping between these UI types and these model types?
How can we know this object has been created and initialized correctly?
+ +

If you think about the massive diversity in this list, you can see why program comprehension requires expertise. You not only need to understand programming languages quite well, but you also need to have strategies for answering all of the questions above (and more) quickly, effectively, and accurately.

+ +

So how do developers go about answering these questions? Studies comparing experts and novices show that experts use prior knowledge that they have about architecture, design patterns, and the problem domain a program is built for to know what questions to ask and how to answer them, whereas novices use surface features of code, which leads them to spend considerable time reading code that is irrelevant to a question (von Mayrhauser & Vans 1994, LaToza et al. 2007). Reading and comprehending source code is fundamentally different from reading and comprehending natural language (Binkley et al. 2013); what experts are doing is ultimately reasoning about dependencies between code (Weiser 1981). Dependencies include things like data dependencies (where a variable is used to compute something, what modifies a data structure, how data flows through a program, etc.) and control dependencies (which components call which functions, which events can trigger a function to be called, how a function is reached, etc.). All of the questions above fundamentally get at different types of data and control dependencies. In fact, theories of how developers navigate code by following these dependencies are highly predictive of what information a developer will seek next (Fleming et al. 2013), suggesting that expert behavior is highly routine.
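To make the idea of a data dependency concrete, here is a small sketch that answers one of the questions above ("Where is this variable or data structure being accessed?") mechanically, using Python's standard `ast` module. The `shipping_cost` function and the `reads_and_writes` helper are hypothetical examples, and this is far simpler than what real program comprehension tools do.

```python
import ast

SOURCE = """\
def shipping_cost(weight, express):
    rate = 5
    if express:
        rate = rate * 2
    return weight * rate
"""


def reads_and_writes(source, name):
    """Return sorted line numbers where `name` is written and where it is read."""
    writes, reads = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id == name:
            if isinstance(node.ctx, ast.Store):
                writes.add(node.lineno)
            else:
                reads.add(node.lineno)
    return sorted(writes), sorted(reads)


print(reads_and_writes(SOURCE, "rate"))  # ([2, 4], [4, 5])
```

The result says that the value returned on line 5 has a data dependency on the assignments on lines 2 and 4. Line 4 in turn has a control dependency on the `if express:` test on line 3, because it only runs when that condition is true. An expert reading this function traces the same relationships by eye.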

+ +

While much of program comprehension is skill, some of it is determined by design. For example, some programming languages result in programs that are more comprehensible. One framework called the Cognitive Dimensions of Notations (Green 1989) lays out some of the tradeoffs in programming language design that result in these differences in comprehensibility. For example, one of the dimensions in the framework is consistency, which refers to how much of a notation can be guessed based on an initial understanding of a language. JavaScript is a low-consistency language because of operators like ==, which behave differently depending on the types of the left and right operands. Knowing the behavior for Booleans doesn't tell you the behavior for a Boolean being compared to an integer. In contrast, Java is a high-consistency language: == is only ever valid when both operands have compatible types, so a Boolean cannot be compared to an integer at all.

+ +

These differences in notation have real impact. Encapsulation through data structures leads to better comprehension than monolithic or purely functional languages (Woodfield et al. 1981, Bhattacharya & Neamtiu 2011). Declarative programming paradigms (like the JavaScript library React) have greater comprehensibility than imperative ones (Salvaneschi et al. 2014). In general, languages that are statically typed result in fewer defects (Ray et al. 2014), better comprehensibility because of the ability to construct better documentation (Endrikat et al. 2014), and easier debugging (Hanenberg et al. 2013). In fact, studies of more dynamic languages like JavaScript and Smalltalk (Callaú et al. 2013) show that the dynamic features of these languages aren't really used all that much anywhere. It appears then that the more you tell a compiler about what your code means, the more it helps the other developers know what it means too.

+ +

Code editors, development environments, and program comprehension tools can also be helpful. Early evidence showed that simple features like syntax highlighting and careful typographic choices can improve the speed of program comprehension (Baecker 1988). I have also worked on several tools to support program comprehension, including the Whyline, which automates many of the more challenging aspects of navigating dependencies in code, and visualizes them (Ko & Myers 2009):

+ +

+ +

+ +

The path from novice to expert in program comprehension is one that involves understanding programming language semantics exceedingly well and reading a lot of code, design patterns, and architectures. Anticipate that as you develop these skills, it will take you time to build robust understandings of what a program is doing, slowing down your writing, testing, and debugging.

+ +
Next chapter: Verification
+ +

Further reading

+ + + +

R. Baecker. 1988. Enhancing program readability and comprehensibility with tools for program visualization. In Proceedings of the 10th international conference on Software engineering (ICSE '88). IEEE Computer Society Press, Los Alamitos, CA, USA, 356-366.

+

Pamela Bhattacharya and Iulian Neamtiu. 2011. Assessing programming language impact on development and maintenance: a study on C and C++. In Proceedings of the 33rd International Conference on Software Engineering (ICSE '11). ACM, New York, NY, USA, 171-180.

+

Binkley, D., Davis, M., Lawrie, D., Maletic, J. I., Morrell, C., & Sharif, B. (2013). The impact of identifier style on effort and comprehension. Empirical Software Engineering, 18(2), 219-276.

+

Callaú, O., Robbes, R., Tanter, É., & Röthlisberger, D. (2013). How (and why) developers use the dynamic features of programming languages: the case of Smalltalk. Empirical Software Engineering, 18(6), 1156-1194. +

Barthélémy Dagenais, Harold Ossher, Rachel K. E. Bellamy, Martin P. Robillard, and Jacqueline P. de Vries. 2010. Moving into a new software project landscape. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1 (ICSE '10), Vol. 1. ACM, New York, NY, USA, 275-284.

+

Stefan Endrikat, Stefan Hanenberg, Romain Robbes, and Andreas Stefik. 2014. How do API documentation and static typing affect API usability? In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 632-642.

Green, T. R. (1989). Cognitive dimensions of notations. People and computers V, 443-460.

+

Fleming, S. D., Scaffidi, C., Piorkowski, D., Burnett, M., Bellamy, R., Lawrance, J., & Kwan, I. (2013). An information foraging theory perspective on tools for debugging, refactoring, and reuse tasks. ACM Transactions on Software Engineering and Methodology (TOSEM), 22(2), 14.

+

Stefan Hanenberg, Sebastian Kleinschmager, Romain Robbes, Éric Tanter, Andreas Stefik. An empirical study on the impact of static typing on software maintainability. Empirical Software Engineering. 2013.

+

Ko, A. J., & Myers, B. A. (2009, April). Finding causes of program output with the Java Whyline. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1569-1578).

+

Thomas D. LaToza and Brad A. Myers. 2010. Developers ask reachability questions. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1 (ICSE '10), Vol. 1. ACM, New York, NY, USA, 185-194.

+

Thomas D. LaToza, David Garlan, James D. Herbsleb, and Brad A. Myers. 2007. Program comprehension as fact finding. In Proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering (ESEC-FSE '07). ACM, New York, NY, USA, 361-370.

+

Walid Maalej, Rebecca Tiarks, Tobias Roehm, and Rainer Koschke. 2014. On the Comprehension of Program Comprehension. ACM Transactions on Software Engineering and Methodology. 23, 4, Article 31 (September 2014), 37 pages.

+

A. von Mayrhauser and A. M. Vans. 1994. Comprehension processes during large scale maintenance. In Proceedings of the 16th international conference on Software engineering (ICSE '94). IEEE Computer Society Press, Los Alamitos, CA, USA, 39-48.

+

Baishakhi Ray, Daryl Posnett, Vladimir Filkov, and Premkumar Devanbu. 2014. A large scale study of programming languages and code quality in GitHub. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2014). ACM, New York, NY, USA, 155-165.

+

Tobias Roehm, Rebecca Tiarks, Rainer Koschke, and Walid Maalej. 2012. How do professional developers comprehend software? In Proceedings of the 34th International Conference on Software Engineering (ICSE '12). IEEE Press, Piscataway, NJ, USA, 255-265.

+

Guido Salvaneschi, Sven Amann, Sebastian Proksch, and Mira Mezini. 2014. An empirical study on program comprehension with reactive programming. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2014). ACM, New York, NY, USA, 564-575.

+

Jonathan Sillito, Gail C. Murphy, and Kris De Volder. 2006. Questions programmers ask during software evolution tasks. In Proceedings of the 14th ACM SIGSOFT international symposium on Foundations of software engineering (SIGSOFT '06/FSE-14). ACM, New York, NY, USA, 23-34.

+

S. N. Woodfield, H. E. Dunsmore, and V. Y. Shen. 1981. The effect of modularization and comments on program comprehension. In Proceedings of the 5th international conference on Software engineering (ICSE '81). IEEE Press, Piscataway, NJ, USA, 215-223.

+

Andreas Stefik and Susanna Siebert. 2013. An Empirical Investigation into Programming Language Syntax. ACM Transactions on Computing Education 13, 4, Article 19 (November 2013), 40 pages.

+

Yida Tao, Yingnong Dang, Tao Xie, Dongmei Zhang, and Sunghun Kim. 2012. How do software engineers understand code changes? An exploratory study in industry. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering (FSE '12). ACM, New York, NY, USA, Article 51, 11 pages.

+

Mark Weiser. 1981. Program slicing. In Proceedings of the 5th international conference on Software engineering (ICSE '81). IEEE Press, Piscataway, NJ, USA, 439-449.

+ +
+ +

Podcasts

+ + + +

Software Engineering Daily, Language Design with Brian Kernighan.

+ +
+ + + + + + + diff --git a/debugging.html b/debugging.html new file mode 100644 index 0000000..0139823 --- /dev/null +++ b/debugging.html @@ -0,0 +1,112 @@ + + + + + + + + + + + + + + + + + + Debugging + + + +

Back to table of contents

+ + + Credit: public domain + +

Debugging

+
Andrew J. Ko
+ +

Despite all of your hard work at design, implementation, and verification, your software has failed. Somewhere in its implementation there's a line of code, or multiple lines of code, that, given a particular set of inputs, causes the program to fail. How do you find those defective lines of code?

+ +

To start, you have to reproduce the failure. Failure reproduction is a matter of identifying inputs to the program (whether data it receives upon being executed, user inputs, network traffic, or any other form of input) that cause the failure to occur. If you found this failure while you were executing the program, then you're lucky: you should be able to repeat whatever you just did and identify the inputs or series of inputs that caused the problem, giving you a way of testing that the program no longer fails once you've fixed the defect. If someone else was the one executing the program (for example, a user, or someone on your team), you'd better hope that they reported clear steps for reproducing the problem. When bug reports lack clear reproduction steps, bugs often can't be fixed (Bettenburg et al. 2008).

+ +

If you can reproduce the problem, the next challenge is to localize the defect, trying to identify the cause of the failure in code. There are many different strategies for localizing defects. At the highest level, one can think of this process as a hypothesis testing activity (Gilmore 1991):

+ +
  1. Observe failure
  2. Form hypothesis of cause of failure
  3. Devise a way to test hypothesis, such as analyzing the code you believe caused it or executing the program with the reproduction steps and stopping at the line you believe is wrong.
  4. If the hypothesis was supported (meaning the program failed for the reason you thought it did), stop. Otherwise, return to 1.
+ +

The problems with the strategy above are numerous. First, what if you can't think of a possible cause? Second, what if your hypothesis is way off? You could spend hours generating hypotheses that are completely off base, effectively analyzing all of your code before finding the defect.

+ +

Another strategy is working backwards (Ko & Myers 2008):

+ +
  1. Observe failure
  2. Identify the line of code that caused the failing output
  3. Identify the lines of code that caused the line of code in step 2 and any data used on the line in step 2
  4. Repeat step 3 recursively, analyzing all lines of code for defects along the chain of causality
+ +

The nice thing about this strategy is that you're guaranteed to find the defect if you can accurately identify the causes of each line of code contributing to the failure. It still requires you to analyze each line of code, and potentially to run the program up to that line to inspect what might be wrong, but it requires potentially less work than guessing. My doctoral dissertation work investigated how to automate this strategy, allowing you to simply click on the faulty output and then immediately see all upstream causes of it (Ko & Myers 2008).
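
The backward walk can be sketched in a few lines of Python. This is only an illustration, not how the Whyline works: the `depends_on` table is a hypothetical, hand-written map from each line number to the lines that produced its data or decided whether it ran. A real tool would have to derive that table from the program's execution.

```python
def upstream_causes(failing_line, depends_on):
    """Collect every line that could have influenced `failing_line`.

    `depends_on` is an assumed, hand-built map: line number -> list of line
    numbers that produced its data or controlled whether it executed.
    """
    visited = set()
    to_visit = [failing_line]
    while to_visit:
        line = to_visit.pop()
        if line in visited:
            continue  # already analyzed; avoids looping on cyclic dependencies
        visited.add(line)
        to_visit.extend(depends_on.get(line, []))
    return sorted(visited)


# Hypothetical program: line 5 printed the wrong output.
example_dependencies = {
    6: [5],     # unrelated to the failure: it runs after line 5
    5: [3, 4],  # the failing output used values computed on lines 3 and 4
    4: [2],
    3: [1],
    2: [1],
}
lines_to_inspect = upstream_causes(5, example_dependencies)
```

Every line in `lines_to_inspect` is a candidate for the defect, while line 6 is ruled out because nothing on the causal chain depends on it.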

+ +

Yet another strategy called delta debugging is to compare successful and failing executions of the program (Zeller 2002):

+ +
  1. Identify a successful set of inputs
  2. Identify a failing set of inputs
  3. Compare the differences in state from the successful and failing executions
  4. Identify a change to input that minimizes the differences in states between the two executions
  5. Variables and values that are different in these two executions contain the defect
+ +

This is a powerful strategy, but only when you have successful inputs and when you can automate comparing runs and identifying changes to inputs.
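
To give a feel for the automation involved, here is a minimal sketch of a related and simpler idea: shrinking a failing input until only the parts needed to trigger the failure remain. It is not the state-comparison algorithm from Zeller (2002). The `fails` function is an assumed stand-in for "run the program on this input and report whether it failed".

```python
def minimize_failing_input(failing_input, fails):
    """Greedily remove chunks of a failing input while it keeps failing.

    `fails` is an assumed callback: it takes a list and returns True when the
    program under test fails on it.
    """
    current = list(failing_input)
    if not fails(current):
        raise ValueError("the starting input must reproduce the failure")
    chunk = max(len(current) // 2, 1)
    while chunk >= 1:
        start = 0
        while start < len(current):
            candidate = current[:start] + current[start + chunk:]
            if fails(candidate):
                current = candidate  # the removed chunk was irrelevant
            else:
                start += chunk       # the chunk matters; keep it and move on
        chunk //= 2                  # retry with finer-grained removals
    return current


# Hypothetical failure: the program breaks whenever both 3 and 7 are present.
def example_fails(values):
    return 3 in values and 7 in values

smallest = minimize_failing_input(list(range(10)), example_fails)
```

The difference between `smallest` and any passing input is then a much shorter list of suspects than the original ten-element input.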

+ +

One of the simplest strategies is to work forward:

+ +
  1. Execute the program with the reproduction steps
  2. Step forward one instruction at a time until the program deviates from intended behavior
  3. The step that deviates, or one of the previous steps, caused the failure
+ +

This strategy is easy to follow, but can take a long time because there are so many instructions that can execute.
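
Part of the stepping can be automated when you can state what "intended behavior" means as a check on the program's variables. The sketch below uses Python's `sys.settrace` hook to run a function line by line and report the first point at which the check no longer holds. The function `running_total` and the check `total_is_non_negative` are hypothetical examples invented for illustration.

```python
import sys


def first_deviation(func, args, invariant):
    """Run func(*args) line by line and report where `invariant` first breaks.

    Returns the offset (relative to the function's `def` line) of the first
    traced step reached with the invariant already violated, or None.
    """
    deviations = []

    def tracer(frame, event, arg):
        if frame.f_code is not func.__code__:
            return None  # do not step into other functions
        if event in ("line", "return") and not deviations:
            if not invariant(dict(frame.f_locals)):
                deviations.append(frame.f_lineno - func.__code__.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return deviations[0] if deviations else None


def running_total(values):
    total = 0
    for value in values:
        total -= value  # defect: should be +=
    return total


def total_is_non_negative(local_vars):
    return local_vars.get("total", 0) >= 0


offset = first_deviation(running_total, [[1, 2]], total_is_non_negative)
```

As in the manual strategy, the reported step is only where the deviation became visible. The defective line is that step or one that ran before it.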

+ +

For particularly complex software, it can sometimes be necessary to debug with the help of teammates, who can help generate hypotheses, identify more effective search strategies, or rule out the influence of particular components in a bug (Aranda and Venolia 2009).

+ +

Ultimately, all of these strategies are essentially search algorithms, seeking the events that occurred while a program executed with a particular set of inputs that caused its output to be incorrect. Because programs execute millions and potentially billions of instructions, these strategies are necessary to reduce the scope of your search.

+ +

Once you've found the defect, what do you do? It turns out that there are usually many ways to repair a defect. How professional developers fix defects depends a lot on the circumstances: if they're near a release, they may not even fix it if it's too risky; if there's no pressure, and the fix requires major changes, they may refactor or even redesign the program to prevent the failure (Murphy-Hill et al. 2013). This can be a delicate, risky process: in one study of open source operating systems bug fixes, 27% of the incorrect fixes were made by developers who had never before touched the source code files they changed, suggesting that a key to correct fixes is a deep comprehension of exactly how the defective code is intended to behave (Yin et al. 2011).

+ +

Further reading

+ + + +

Jorge Aranda and Gina Venolia. 2009. The secret life of bugs: Going past the errors and omissions in software repositories. In Proceedings of the 31st International Conference on Software Engineering (ICSE '09). IEEE Computer Society, Washington, DC, USA, 298-308.

+

Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, and Thomas Zimmermann. 2008. What makes a good bug report? In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering (SIGSOFT '08/FSE-16). ACM, New York, NY, USA, 308-318.

+

Gilmore, D. (1991). Models of debugging. Acta Psychologica, 78, 151-172.

+

Andrew J. Ko and Brad A. Myers. 2008. Debugging reinvented: asking and answering why and why not questions about program behavior. In Proceedings of the 30th international conference on Software engineering (ICSE '08). ACM, New York, NY, USA, 301-310.

+

Emerson Murphy-Hill, Thomas Zimmermann, Christian Bird, and Nachiappan Nagappan. 2013. The design of bug fixes. In Proceedings of the 2013 International Conference on Software Engineering (ICSE '13). IEEE Press, Piscataway, NJ, USA, 332-341.

+

Zuoning Yin, Ding Yuan, Yuanyuan Zhou, Shankar Pasupathy, and Lakshmi Bairavasundaram. 2011. How do fixes become bugs? In Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering (ESEC/FSE '11). ACM, New York, NY, USA, 26-36.

+

Andreas Zeller. 2002. Isolating cause-effect chains from computer programs. In Proceedings of the 10th ACM SIGSOFT symposium on Foundations of software engineering (SIGSOFT '02/FSE-10). ACM, New York, NY, USA, 1-10.

+ +
+ +

Podcasts

+ + + +

Software Engineering Daily, Debugging Stories with Haseeb Qureshi

+ +
+ + + + + + + + diff --git a/evolution.html b/evolution.html new file mode 100644 index 0000000..e94933b --- /dev/null +++ b/evolution.html @@ -0,0 +1,75 @@ + + + + + + + + + + + + + + + + + + Evolution + + + +

Back to table of contents

+ + + Credit: public domain + +

Evolution

+
Andrew J. Ko
+ +

Programs change. You find bugs, you fix them. You discover a new requirement, you add a feature. A requirement changes because the world changes, you revise a feature. The simple fact about programs is that they're rarely stable; rather, they are constantly changing, living documents that shift as much as the world around them shifts.

+ +

Nowhere is this constant evolution more apparent than in our daily encounters with software updates. The apps on our phones are constantly being updated to improve our experiences, while the web sites we visit potentially change every time we visit them, without us noticing. These different models have different notions of who controls changes to user experience: should software companies control when your experience changes or should you? And with systems with significant backend dependencies, is it even possible to give users control over when things change?

+ +

To manage change, developers use all kinds of tools and practices.

+ +

One of the most common ways of managing change is to refactor code. Refactoring helps developers modify the architecture of a program while keeping its behavior the same, enabling them to implement or modify functionality more easily. For example, one of the most common and simple refactorings is to rename a variable (renaming its definition and all of its uses). This doesn't change the architecture of a program at all, but does improve its readability. Other refactors can be more complex. For example, consider adding a new parameter to a function: all calls to that function need to pass that new parameter, which means you need to go through each call and decide on a value to send from that call site. Studies of refactoring in practice have found that refactorings can be big and small, that they don't always preserve the behavior of a program, and that developers perceive them as involving substantial costs and risks (Kim et al. 2012).
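
As a small illustration of the add-a-parameter refactoring, consider the hypothetical function below. All names here are invented for the example. The original version hard-codes a currency symbol, and the refactored version takes it as a parameter, so each call site has to be revisited and given a value.

```python
# Before the refactoring: the currency symbol is hard-coded.
def format_price_before(amount):
    return f"${amount:.2f}"


# After the refactoring: the symbol is a new, required parameter.
def format_price(amount, currency_symbol):
    return f"{currency_symbol}{amount:.2f}"


# Every call site must now decide which value to pass.
def receipt_line(item, amount):
    # This call site keeps the old behavior by passing "$" explicitly.
    return f"{item}: {format_price(amount, '$')}"


def euro_receipt_line(item, amount):
    # A new call site can use the added flexibility.
    return f"{item}: {format_price(amount, '€')}"
```

In Python, giving the new parameter a default value (`currency_symbol="$"`) would let existing calls keep working unchanged. That lowers the cost of the change, but it can also hide call sites that should have been reconsidered.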

+ +

Another fundamental way that developers manage change is version control systems. As you know, they help developers track changes to code, allowing them to revert, merge, fork, and clone projects in a way that is traceable and reliable. While today the most popular version control system is Git, there are actually many types. Some are centralized, representing one single ground truth of a project's code, usually stored on a server. Commits to centralized repositories become immediately available to everyone else on a project. Other version control systems are distributed, such as Git, allowing one copy of a repository on every local machine. Commits to these local copies don't automatically go to everyone else; rather, they are pushed to some central copy, from which others can pull updates. Research comparing centralized and distributed revision control systems mostly reveals tradeoffs rather than a clear winner. Distributed version control, for example, appears to lead to commits that are smaller and more scoped to single changes, since developers can manage their own history of commits to their local repository (Brindescu et al. 2014). Google uses one big centralized version control repository for all of its projects, however, because it offers one source of truth, simplified dependency management, large-scale refactoring, and flexible team boundaries (Potvin & Levenberg 2016).

+ +

When code changes, you need to test it, which often means you need to build it, compiling source, data, and other resources into an executable format suitable for testing (and possibly release). Build systems can be as simple as nothing (e.g., loading an HTML file in a web browser interprets the HTML and displays it, requiring no special preparation) and as complex as hundreds and thousands of lines of build script code, compiling, linking, and managing files in a manner that prepares a system for testing, such as those used to build operating systems like Windows or Linux. To write these complex build procedures, developers use build automation tools like make, ant, gulp and dozens of others, each helping to automate builds. In large companies, there are whole teams that maintain build automation scripts to ensure that developers can always quickly build and test. In these teams, most of the challenges are social and not technical: teams need to clarify role ambiguity, knowledge sharing, communication, trust, and conflict in order to be productive, just like other software engineering teams (Phillips et al. 2014).

+ +

Perhaps the most modern form of build practice is continuous integration. This is the idea of completely automating not only builds, but also the running of a collection of tests, every time a bundle of changes is pushed to a central version control repository. The claimed benefit of continuous integration is that every major change is quickly built, tested, and ready for deployment, shortening the time between a change and the discovery of failures. This only works if builds are fast. For example, some large projects like Windows can take a whole day to build, making continuous integration of the whole operating system infeasible. When builds and tests are fast, continuous integration can accelerate development, especially in projects with large numbers of contributors (Vasilescu et al. 2015).
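
At its core, a continuous integration run is a fixed sequence of commands that stops at the first failure. The sketch below shows that loop using Python's `subprocess` module. The step names and commands are placeholders invented for illustration; a real project would invoke its own build tool and test runner, typically through a hosted CI service rather than a hand-written script.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run (name, command) steps in order, stopping at the first failure.

    Returns (True, None) if every step exits with status 0, otherwise
    (False, name_of_the_failed_step).
    """
    for name, command in steps:
        completed = subprocess.run(command, capture_output=True)
        if completed.returncode != 0:
            return False, name  # later steps, such as deployment, never run
    return True, None


# Placeholder steps: each one just starts a Python interpreter.
PYTHON = sys.executable
example_steps = [
    ("build", [PYTHON, "-c", "print('compiling')"]),
    ("test", [PYTHON, "-c", "import sys; sys.exit(1)"]),  # simulated test failure
    ("deploy", [PYTHON, "-c", "print('deploying')"]),
]
result = run_pipeline(example_steps)
```

Because the simulated test step fails, the deploy step is skipped, which is the property that shortens the time between a change and the discovery of a failure.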

One last problem with changes in software is managing the releases of software. Good release management should archive new versions of software, automatically post the version online, make the version accessible to users, keep a history of who accesses the new version, and provide clear release notes describing changes from the previous version (van der Hoek et al. 1997). By default, all of this is quite manual, but many of these steps can be automated, streamlining how teams release changes to the world. You've probably encountered these most in the form of software updates to applications and operating systems.

+ +
Next chapter: Debugging
+ +

Further reading

+ + + +

Caius Brindescu, Mihai Codoban, Sergii Shmarkatiuk, and Danny Dig. 2014. How do centralized and distributed version control systems impact software changes? In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 322-333.

+

Miryung Kim, Thomas Zimmermann, and Nachiappan Nagappan. 2012. A field study of refactoring challenges and benefits. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering (FSE '12).

+

Shaun Phillips, Thomas Zimmermann, and Christian Bird. 2014. Understanding and improving software build teams. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 735-744.

+

Potvin, R., & Levenberg, J. (2016). Why Google stores billions of lines of code in a single repository. Communications of the ACM, 59(7), 78-87.

+

Bogdan Vasilescu, Yue Yu, Huaimin Wang, Premkumar Devanbu, and Vladimir Filkov. 2015. Quality and productivity outcomes relating to continuous integration in GitHub. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). ACM, New York, NY, USA, 805-816.

+

André van der Hoek, Richard S. Hall, Dennis Heimbigner, and Alexander L. Wolf. 1997. Software release management. In Proceedings of the 6th European SOFTWARE ENGINEERING conference held jointly with the 5th ACM SIGSOFT international symposium on Foundations of software engineering (ESEC '97/FSE-5), Mehdi Jazayeri and Helmut Schauer (Eds.). Springer-Verlag New York, Inc., New York, NY, USA, 159-175.

+ +
+ +

Podcasts

+ + + +

Software Engineering Daily, Continuous Delivery with David Rice.

+ +
+ + + + + + + + diff --git a/history.html b/history.html new file mode 100644 index 0000000..bd1d60b --- /dev/null +++ b/history.html @@ -0,0 +1,82 @@ + + + + + + + + + + + + + + + + + + + The history of software engineering + + + +

Back to table of contents

+ + + + Credit: unknown + + +

A brief history of software engineering

+
Andrew J. Ko
+ +

Computers haven't been around for long. If you read one of the many histories of computing and information, such as James Gleick's The Information, or Jonathan Grudin's History of HCI, you'll learn that before digital computers, computers were people, calculating things manually. And that after digital computers, programming wasn't something that many people did. It was reserved for whoever had access to the mainframe and they wrote their programs on punchcards like the one above. Computing was in no way a ubiquitous, democratized activity—it was reserved for the few that could afford and maintain a room-sized machine.

+ +

Because programming required such painstaking planning in machine code and computers were slow, most programs were not that complex. Their value was in calculating things faster than a person could do by hand, which meant thousands of calculations in a minute rather than one calculation in a minute. Computer programmers were not solving problems that had no solutions; they were translating existing solutions (for example, a quadratic formula) into the notation a computer understood. Their power wasn't in creating new realities or facilitating new tasks; it was in accelerating old tasks.

+ +

The birth of software engineering, therefore, did not come until programmers started solving problems that didn't have existing solutions, or were new ideas entirely. Most of these were done in academic contexts to develop things like basic operating systems and methods of input and output. These were complex projects, but as research, they didn't need to scale; they just needed to work. It wasn't until the late 1960s that the first truly large software projects were attempted commercially, and software had to actually work.

+ +

The IBM 360 operating system was one of the first big projects of this kind. Suddenly, there were multiple people working on multiple components, all of which interacted. Different parts of the program needed to coordinate, which meant different people needed to coordinate, and the term software engineering was born. Programmers and academics from around the world, especially those that were working on big projects, began holding conferences so they could meet and discuss their challenges. In the first software engineering conference in 1968, attendees speculated about why projects were shipping late, why they were over budget, and what to do about it.

+ +

In these early days of software engineering, programmers, managers, and researchers discovered many problems that had no clear solutions:

+ + + +

These questions are at the foundation of the field of software engineering and are the core content of this course. Some of them have pretty good answers. For example, the research community rapidly converged toward the concepts of version control systems, software testing, and a wide array of high-level programming languages such as Fortran (Metcalf 2002), LISP (McCarthy 1978), C++ (Stroustrup 1996), and Smalltalk (Kay 1996), all of which were precursors to today's modern languages such as Java, Python, and JavaScript.

+ +

Other questions, particularly those concerning the human aspects of software engineering, were hopelessly difficult to understand and improve. One of the seminal books on these issues was Fred P. Brooks, Jr.'s The Mythical Man Month. In it, he presented hundreds of claims about software engineering. For example, he hypothesized that adding more programmers to a project would actually make productivity worse at some level, not better, because of the added burden of knowledge sharing. He also claimed that the first implementation of a solution is usually terrible and should be treated like a prototype: used to learn and then discarded. These and other claims have been the foundation of decades of research, all in search of some deeper answer to the questions above.

+ +

If we step beyond software engineering and think more broadly about the role that software is playing in society today, there are also other, newer questions that we've only begun to answer. If every part of society now runs on code, what responsibility do software engineers have to ensure that code is right? What responsibility do software engineers have to avoid algorithmic bias? If our cars are to soon drive us around, who's responsible for the first death: the car, the driver, or the software engineers who built it? These ethical questions are in some ways the future of software engineering, likely to shape its regulatory context, its processes, and its responsibilities.

+ +

There are also economic roles that software plays in society that it didn't before. Around the world, it's a major source of job growth, but also a major source of automation, eliminating jobs that people used to do. These larger forces that software is playing on the world demand that software engineers have a stronger understanding of the roles that software plays in society, as the decisions that engineers make can have profoundly impactful unintended consequences.

+ +

We're nowhere close to having deep answers about these questions, neither the old ones nor the new ones. We know a lot about programming languages and a lot about testing. These are areas amenable to automation and so computer science has rapidly improved and accelerated these parts of software engineering. The rest of it, as we shall see, has not made much progress. In this class, we'll discuss what we know and the much larger space of what we don't.

+ +
Next chapter: Organizations
+ +

Further reading

+ +

Brooks Jr, F. P. (1995). The Mythical Man-Month (anniversary ed.). Chicago

+

Gleick, James (2011). The Information: A History, A Theory, A Flood. Pantheon Books.

+

Grudin, Jonathan (2017). From Tool to Partner: The Evolution of Human-Computer Interaction.

+

Kay, A. C. (1996, January). The early history of Smalltalk. In History of programming languages---II (pp. 511-598). ACM.

+

Ko, A. J. (2016). Interview with Andrew Ko on Software Engineering Daily about Software Engineering Research and Practice.

+

McCarthy, J. (1978, June). History of LISP. In History of programming languages I (pp. 173-185). ACM.

+

Metcalf, M. (2002, December). History of Fortran. In ACM SIGPLAN Fortran Forum (Vol. 21, No. 3, pp. 19-20). ACM.

+

Stroustrup, B. (1996, January). A history of C++: 1979--1991. In History of programming languages---II (pp. 699-769). ACM.

+ + + + + + + diff --git a/images/atsign.png b/images/atsign.png new file mode 100644 index 0000000..96cc0b3 Binary files /dev/null and b/images/atsign.png differ diff --git a/images/blueprint.jpg b/images/blueprint.jpg new file mode 100644 index 0000000..10a55db Binary files /dev/null and b/images/blueprint.jpg differ diff --git a/images/check.png b/images/check.png new file mode 100644 index 0000000..54c2eb5 Binary files /dev/null and b/images/check.png differ diff --git a/images/church.jpg b/images/church.jpg new file mode 100644 index 0000000..e22d434 Binary files /dev/null and b/images/church.jpg differ diff --git a/images/code.jpg b/images/code.jpg new file mode 100644 index 0000000..5456b03 Binary files /dev/null and b/images/code.jpg differ diff --git a/images/communication.png b/images/communication.png new file mode 100644 index 0000000..269acfa Binary files /dev/null and b/images/communication.png differ diff --git a/images/flow.jpg b/images/flow.jpg new file mode 100644 index 0000000..f177716 Binary files /dev/null and b/images/flow.jpg differ diff --git a/images/network.png b/images/network.png new file mode 100644 index 0000000..b6a7124 Binary files /dev/null and b/images/network.png differ diff --git a/images/police.jpg b/images/police.jpg new file mode 100644 index 0000000..2d64969 Binary files /dev/null and b/images/police.jpg differ diff --git a/images/pomegranate.jpg b/images/pomegranate.jpg new file mode 100644 index 0000000..c806503 Binary files /dev/null and b/images/pomegranate.jpg differ diff --git a/images/productivity.jpg b/images/productivity.jpg new file mode 100644 index 0000000..96dd7e9 Binary files /dev/null and b/images/productivity.jpg differ diff --git a/images/punchcard.jpg b/images/punchcard.jpg new file mode 100644 index 0000000..0c95a97 Binary files /dev/null and b/images/punchcard.jpg differ diff --git a/images/scaffolding.jpg b/images/scaffolding.jpg new file mode 100644 index 0000000..f740c94 Binary files /dev/null and 
b/images/scaffolding.jpg differ diff --git a/images/spiral.png b/images/spiral.png new file mode 100644 index 0000000..1405a75 Binary files /dev/null and b/images/spiral.png differ diff --git a/images/swatter.png b/images/swatter.png new file mode 100644 index 0000000..27d673f Binary files /dev/null and b/images/swatter.png differ diff --git a/images/team.jpg b/images/team.jpg new file mode 100755 index 0000000..5531f8b Binary files /dev/null and b/images/team.jpg differ diff --git a/index.html b/index.html new file mode 100644 index 0000000..788e670 --- /dev/null +++ b/index.html @@ -0,0 +1,86 @@ + + + + + + + + + + + + + + + + + + Cooperative Software Development + + + + + + Credit: Creative Commons + +

Cooperative Software Design

+
Andrew J. Ko
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+

After teaching software engineering for many years, I've been frustrated by the lack of a simple, concise, and practical introduction to the human aspects of software engineering for students interested in becoming software engineers.

+ +

In response, I've distilled my lectures from the past decade into these brief writings. They don't represent everything we know about software engineering (in particular, I don't discuss the deep technical contributions from the field), but the chapters do synthesize the broad evidence we have about how teams have to work together to succeed.

+ +

I hope you enjoy! If you see something missing or wrong, please send me feedback.

+
Chapter 1. History of software engineering
Chapter 2. Software engineering organizations
Chapter 3. Communication
Chapter 4. Productivity
Chapter 5. Software quality
Chapter 6. Requirements engineering
Chapter 7. Architecture
Chapter 8. Functional specifications
Chapter 9. Process
Chapter 10. Comprehension
Chapter 11. Verification
Chapter 12. Monitoring
Chapter 13. Evolution
Chapter 14. Debugging
+ + + + + + + diff --git a/monitoring.html b/monitoring.html new file mode 100644 index 0000000..eb57110 --- /dev/null +++ b/monitoring.html @@ -0,0 +1,111 @@ + + + + + + + + + + + + + + + + + + + Monitoring + + + +

Back to table of contents

+ + + Credit: public domain + +

Monitoring

+
Andrew J. Ko
+ +

The first application I ever wrote was a complete and utter failure.

+ +

I was an eager eighth grader, full of wonder and excitement about the infinite possibilities in code, with an insatiable desire to build, build, build. I'd made plenty of little games and widgets for myself, but now was my chance to create something for someone else: my friend and I were making a game and he needed a tool to create pixel art for it. We had no money for fancy Adobe licenses, and so I decided to make a tool.

+ +

In designing the app, I made every imaginable software engineering mistake. I didn't talk to him about requirements. I didn't test on his computer before sending the finished app. I certainly didn't conduct any usability tests, performance tests, or acceptance tests. The app I ended up shipping was a pure expression of what I wanted to build, not what he needed to be creative or productive. As a result, it was buggy, slow, confusing, and useless, and blinded by my joy of coding, I had no clue.

+ +

Now, ideally my "customer" would have reported any of these problems to me right away, and I would have learned some tough lessons about software engineering. But this customer was my best friend, and also a very nice guy. He wasn't about to trash all of my hard work. Instead, he suffered in silence. He struggled to install, struggled to use, and worst of all struggled to create. He produced some amazing art a few weeks after I gave him the app, but it was only after a few months of progress on our game that I learned he hadn't used my app for a single asset, preferring instead to suffer through Microsoft Paint. My app was too buggy, too slow, and too confusing to be useful. I was devastated.

+ +

Why didn't I know it was such a complete failure? Because I wasn't looking. I'd ignored the ultimate test suite: my customer. I'd learned that the only way to really know whether software requirements are right is by watching how the software executes in the world through monitoring.

+ +

Discovering Failures

+ +

Of course, this is easier said than done. That's because the (ideally) massive number of people executing your software is not easily observable. Moreover, each software quality you might want to monitor (performance, functional correctness, usability) requires entirely different methods of observation and analysis. Let's talk about some of the most important qualities to monitor and how to monitor them.

+ +

Crashes, kernel panics, and hangs are some of the easiest failures to detect because they are overt and unambiguous. Microsoft was one of the first organizations to do this comprehensively, building what eventually became known as Windows Error Reporting (Glerum et al. 2009). It turns out that actually capturing these errors at scale and mining them for repeating, reproducible failures is quite complex, requiring classification, progressive data collection, and many statistical techniques to extract signal from noise. In fact, Microsoft has a dedicated team of data scientists and engineers whose sole job is to manage the error reporting infrastructure, monitor and triage incoming errors, and use trends in errors to make decisions about improvements to future releases and release processes. This is now standard practice in most companies and organizations, including other big software companies (Google, Apple, IBM, etc.), as well as open source projects (e.g., Mozilla). In fact, many application development platforms now include this as a standard operating system feature.
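To make the classification step concrete, here is a minimal sketch of one common idea: grouping crash reports into buckets by the top few frames of their stack traces, then ranking buckets by frequency. This is an illustration only, not how Windows Error Reporting actually works; the report format, the `depth` parameter, and the function names are all assumptions.

```python
from collections import Counter


def crash_signature(stack_frames, depth=3):
    """Build a bucket key from the top `depth` frames of a stack trace."""
    return " | ".join(stack_frames[:depth])


def bucket_crashes(reports, depth=3):
    """Count crash reports per signature, most frequent bucket first."""
    counts = Counter(crash_signature(r["stack"], depth) for r in reports)
    return counts.most_common()


# Hypothetical reports: each has a stack trace, innermost frame first.
reports = [
    {"stack": ["render_tile", "draw_layer", "paint", "main"]},
    {"stack": ["render_tile", "draw_layer", "paint", "event_loop"]},
    {"stack": ["save_file", "write_bytes", "main"]},
]

buckets = bucket_crashes(reports)
for signature, count in buckets:
    print(count, signature)
```

The first two reports land in the same bucket because they share their top three frames, which is the kind of repetition a team would look for when deciding which failures to fix first.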

+ +

Performance, like crashes, kernel panics, and hangs, is easily observable in software, but a bit trickier to characterize as good or bad. How slow is too slow? How bad is it if something is slow occasionally? You'll have to define acceptable thresholds for different use cases to be able to identify problems automatically. Some experts in industry still view this as an art.
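As a rough sketch of what "defining acceptable thresholds for different use cases" might look like, the code below compares a high percentile of observed latencies against a per-use-case limit. The use case names, the threshold values, and the choice of the 95th percentile are all assumptions made for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list of numbers (0 < p <= 100)."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


# Hypothetical thresholds, in milliseconds, for two use cases.
THRESHOLDS_MS = {"page_load": 2000, "search": 500}


def slow_use_cases(latencies_ms, thresholds=THRESHOLDS_MS, p=95):
    """Return the use cases whose p-th percentile latency exceeds its threshold."""
    return sorted(
        name
        for name, samples in latencies_ms.items()
        if samples and percentile(samples, p) > thresholds[name]
    )


observed = {
    "page_load": [800, 900, 1200, 1500, 1900],
    "search": [100, 120, 130, 150, 900],
}
print(slow_use_cases(observed))
```

Using a percentile rather than an average is one way to answer "how bad is it if something is slow occasionally?": a single slow search out of five is enough to flag the use case here, while an average would hide it.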

+ +

It's also hard to monitor performance without actually harming performance. Many tools and services (e.g., New Relic) are getting better at reducing this overhead and offering real time data about performance problems through sampling.
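One way to picture sampling is a timer that only measures a fraction of calls, so that most calls pay almost no monitoring cost. The sketch below is a generic illustration, not the approach of any particular product; `sampled_timer`, `rate`, and `sink` are hypothetical names.

```python
import random
import time
from functools import wraps


def sampled_timer(rate, sink, rng=random.random):
    """Decorator that times roughly `rate` (0.0 to 1.0) of calls to a function.

    Timings are appended to `sink` as (function name, seconds) pairs.
    """
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if rng() >= rate:
                # Unsampled call: run the function with no timing overhead.
                return fn(*args, **kwargs)
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                sink.append((fn.__name__, time.perf_counter() - start))
        return wrapper
    return decorate


timings = []


@sampled_timer(rate=1.0, sink=timings)  # rate=1.0 samples every call
def add(a, b):
    return a + b


add(1, 2)
print(len(timings))
```

In practice the rate would be much lower than 1.0; the tradeoff is that rare slow paths take longer to show up in the data.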

+ +

Monitoring for data breaches, identity theft, and other security and privacy concerns is an incredibly important part of running a service, but also very challenging. This is partly because the tools for doing this monitoring are not yet well integrated, requiring each team to develop its own practices and monitoring infrastructure. But it's also because protecting data and identity involves more than just detecting and blocking malicious payloads; it also requires recovering from ones that get through, developing reliable data streams about application network activity, monitoring for anomalies and trends in those streams, and developing practices for tracking and responding to warnings that your monitoring system might generate. Researchers are still actively inventing more scalable, usable, and deployable techniques for all of these activities.

+ +

Discovering Missing Requirements

+ +

Usability problems and missing features, unlike some of the preceding problems, are even harder to detect or observe, because the only true indicator that something is hard to use is in a user's mind. That said, there are a couple of approaches to detecting the possibility of usability problems.

+ +

One is by monitoring application usage. Assuming your users will tolerate being watched, there are many techniques for automatically instrumenting applications for user interaction events, for mining these events for problematic patterns, and for browsing and analyzing these patterns for more subjective issues (Ivory & Hearst 2001). Modern tools and services like Intercom make it easier to capture, store, and analyze this usage data, although they still require you to have some upfront intuition about what to monitor. More advanced, experimental techniques in research automatically analyze undo events as indicators of usability problems (Akers et al. 2009).
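To illustrate the idea of mining interaction events, the sketch below computes, for each command in an event log, the fraction of its uses that were immediately followed by an undo. It is loosely inspired by the idea of treating undo events as usability signals, but it is not the analysis from the cited study; the log format and command names are assumptions.

```python
from collections import Counter


def undo_rates(events):
    """Map each command to the fraction of its uses that were immediately undone.

    `events` is an ordered list of command names, where "undo" marks an undo.
    """
    uses = Counter()
    undone = Counter()
    previous = None
    for name in events:
        if name == "undo":
            if previous is not None:
                undone[previous] += 1
            previous = None  # consecutive undos are not attributed to a command
        else:
            uses[name] += 1
            previous = name
    return {command: undone[command] / uses[command] for command in uses}


# Hypothetical log from a pixel art editor.
log = ["crop", "undo", "crop", "undo", "fill", "crop", "save"]
rates = undo_rates(log)
print(rates)
```

A command with an unusually high undo rate (here, `crop`) is only a hint that something may be confusing; as the next paragraph notes, usage data says what happened, not why.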

+ +

All of the usage data above can tell you what your users are doing, but not why. For this, you'll need to get explicit feedback from support tickets, support forums, product reviews, and other critiques of user experience. Some of these types of reports go directly to engineering teams, becoming part of bug reporting systems, while others end up in customer service or marketing departments. While all of this data is valuable for monitoring user experience, most companies still do a bad job of using anything but bug reports to improve user experience, overlooking the rich insights in customer service interactions (Chilana et al. 2011).

+ +

Although bug reports are widely used, they have significant problems as a way to monitor: for developers to fix a problem, they need detailed steps to reproduce the problem, or stack traces or other state to help them track down the cause of a problem (Bettenburg et al. 2008); these are precisely the kinds of information that are hard for users to find and submit, given that most people aren't trained to produce reliable, precise information for failure reproduction. Additionally, once the information is recorded in a bug report, even interpreting the information requires social, organizational, and technical knowledge, meaning that if a problem is not addressed soon, an organization's ability to even interpret what the failure was and what caused it can decay over time (Aranda & Venolia 2009). All of these issues can lead to intractable debugging challenges.

+ +

Larger software organizations now employ data scientists to help mitigate these challenges of analyzing and maintaining monitoring data and bug reports. Most of them try to answer questions such as (Begel & Zimmermann 2014):

+ + + +

The most mature data science roles in software engineering teams even have multiple distinct roles, including Insight Providers, who gather and analyze data to inform decisions, Modeling Specialists, who use their machine learning expertise to build predictive models, and Platform Builders, who create the infrastructure necessary for gathering data (Kim et al. 2016). Of course, smaller organizations may have individuals who take on all of these roles.

+ +

All of this effort to capture and maintain user feedback can be messy to analyze because it usually comes in the form of natural language text. Services like AnswerDash (a company I co-founded) structure this data by organizing requests around frequently asked questions. AnswerDash imposes a little widget on every page in a web application, making it easy for users to submit questions and find answers to previously asked questions. This generates data about the features and use cases that are leading to the most confusion, which types of users are having this confusion, and where in an application the confusion is happening most frequently. This product was based on several years of research in my lab (Chilana et al. 2013).

+ +
Next chapter: Evolution
+ +

Further reading

+ + + +

David Akers, Matthew Simpson, Robin Jeffries, and Terry Winograd. 2009. Undo and erase events as indicators of usability problems. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '09). ACM, New York, NY, USA, 659-668.

+

Jorge Aranda and Gina Venolia. 2009. The secret life of bugs: Going past the errors and omissions in software repositories. In Proceedings of the 31st International Conference on Software Engineering (ICSE '09). IEEE Computer Society, Washington, DC, USA, 298-308.

+

Begel, A., & Zimmermann, T. (2014). Analyze this! 145 questions for data scientists in software engineering. In Proceedings of the 36th International Conference on Software Engineering (pp. 12-23). +

Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, and Thomas Zimmermann. 2008. What makes a good bug report? In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering (SIGSOFT '08/FSE-16). ACM, New York, NY, USA, 308-318.

+

Chilana, P. K., Ko, A. J., Wobbrock, J. O., & Grossman, T. (2013). A multi-site field study of crowdsourced contextual help: usage and perspectives of end users and software teams. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 217-226).

+

Parmit K. Chilana, Andrew J. Ko, Jacob O. Wobbrock, Tovi Grossman, and George Fitzmaurice. 2011. Post-deployment usability: a survey of current practices. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '11). ACM, New York, NY, USA, 2243-2246.

+

Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, Gabriel Aul, Vince Orgovan, Greg Nichols, David Grant, Gretchen Loihle, and Galen Hunt. 2009. Debugging in the (very) large: ten years of implementation and experience. In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles (SOSP '09). ACM, New York, NY, USA, 103-116.

+

Ivory M.Y., Hearst, M.A. (2001). The state of the art in automating usability evaluation of user interfaces. ACM Computing Surveys, 33(4).

+

Miryung Kim, Thomas Zimmermann, Robert DeLine, and Andrew Begel. 2016. The emerging role of data scientists on software development teams. In Proceedings of the 38th International Conference on Software Engineering (ICSE '16). ACM, New York, NY, USA, 96-107.

+ +
+ +

Podcasts

+ + + +

Software Engineering Daily, Performance Monitoring with Andi Grabner

+

Software Engineering Daily, The Art of Monitoring with James Turnbull

+

Software Engineering Daily, Debugging Stories with Haseeb Qureshi + + + + + + + + + + diff --git a/organizations.html b/organizations.html new file mode 100644 index 0000000..0b37652 --- /dev/null +++ b/organizations.html @@ -0,0 +1,104 @@ + + + + + + + + + + + + + + + + + + + Software engineering organizations + + + +

Back to table of contents

+ + + Credit: Andrew J. Ko + +

Software organizations

+
Andrew J. Ko
+ +

The photo above is a candid shot of some of the software engineers of AnswerDash, a company I co-founded in 2012. There are a few things to notice. First, you see one of the employees explaining something, while others are diligently working off to the side. It's not a huge team; just a few engineers, plus several employees in other parts of the organization in another room. This, as simple as it looks, is pretty much what all software engineering work looks like. Some organizations have one of these teams; others have thousands.

+ +

What you can't see is just how much complexity underlies this work. You can't see the organizational structures that exist to manage this complexity. Inside this room and the rooms around it were processes, standards, reviews, workflows, managers, values, culture, decision making, analytics, marketing, sales. And at the center of it were people executing all of these things as well as they could to achieve the organization's goal.

+ +

Organizations are a much bigger topic than I could possibly address here. To deeply understand them, you'd need to learn about organizational studies, organizational behavior, information systems, and business in general.

+ +

The subset of this knowledge that's critical to understand about software engineering is limited to a few important concepts. The first and most important concept is that even in software organizations, the point of the company is rarely to make software; it's to provide value (Osterwalder et al. 2015). Software is sometimes the central means to providing that value, but more often than not, it's the information flowing through that software that's the truly valuable piece. Requirements, which we will discuss in more detail soon, help engineers organize how software will provide value.

+ +

The individuals in a software organization take on different roles to achieve that value. These roles are sometimes spread across different people and sometimes bundled up into one person, depending on how the organization is structured, but the roles are always there. Let's go through each one in detail so you understand how software engineers relate to each role. + +

+ +

As I noted above, sometimes the roles above get merged into individuals. When I was CTO at AnswerDash, I had software engineering roles, design roles, product roles, sales roles, and support roles. This was partly because it was a small company when I was there. As organizations grow, these roles tend to be divided into smaller pieces. This division often means that different parts of the organization don't share knowledge, even when it would be advantageous (Chilana et al. 2011).

+ +

Note that in the division of responsibilities above, software engineers really aren't the designers by default. They don't decide what product is made or what problems that product solves. They may have opinions—and a great deal of power to enforce their opinions, as the people building the product—but it's not ultimately their decision.

+ +

There are other roles you might be thinking of that I haven't mentioned:

+ + + +

Every decision made in a software team is under uncertainty, and so another important concept in organizations is risk (Boehm 1991). It's rarely possible to predict the future, and so organizations must take risks. Much of an organization's function is to mitigate the consequences of risks. Data scientists and researchers mitigate risk by increasing confidence in an organization's understanding of the market and its consumers. Engineers manage risk by trying to avoid defects and moving fast.

+ +

Open source communities are organizations too. The core activities of design, engineering, and support still exist in these, but how much a community is engaged in marketing and sales depends entirely on the purpose of the community. Big, established open source projects like Mozilla have revenue, buildings, and a CEO, and while they don't sell anything, they do market. Others like Linux (Lee & Cole 2003) rely heavily on contributions from both volunteers (Ye & Kishida 2003) and paid employees from companies that depend on Linux, like IBM, Google, and others. In these settings, there are still all of the challenges that come with software engineering, but fewer of the constraints that come from a for-profit or non-profit motive.

+ +

All of the above has some important implications for what it means to be a software engineer:

+ + + +

All that said, without engineers, products wouldn't exist. They ensure that every detail about a product reflects the best knowledge of the people in their organization, and so attention to detail is paramount. In future chapters, we'll discuss all of the ways that software engineers manage this detail, mitigating the burden on their memories with tools and processes.

+ +
Next chapter: Communication
+ +

Further reading

+ +

Begel, A., & Zimmermann, T. (2014, May). Analyze this! 145 questions for data scientists in software engineering. In Proceedings of the 36th International Conference on Software Engineering (pp. 12-23). + +

Boehm, B. W. (1991). Software risk management: principles and practices. IEEE software, 8(1), 32-41.

+ +

Chilana, P. K., Ko, A. J., Wobbrock, J. O., Grossman, T., & Fitzmaurice, G. (2011, May). Post-deployment usability: a survey of current practices. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 2243-2246). ACM. + +

Clegg, S. and Bailey, J.R. (2008). International Encyclopedia of Organization Studies. Sage Publications.

+ +

Ko, Andrew J. (2017). A Three-Year Participant Observation of Software Startup Software Evolution. International Conference on Software Engineering, Software Engineering in Practice, to appear.

+ +

Lee, G. K., & Cole, R. E. (2003). From a firm-based to a community-based model of knowledge creation: The case of the Linux kernel development. Organization science, 14(6), 633-649.

+ +

Li, Paul, Ko, Andrew J., and Begel, Andrew (2017). Collaborating with Software Engineers: Perspectives from Non-Software Experts. In review.

+ +

A. Osterwalder, Y. Pigneur, G. Bernarda, & A. Smith (2015). Value proposition design: how to create products and services customers want. John Wiley & Sons.

+ +

Yunwen Ye and Kouichi Kishida. 2003. Toward an understanding of the motivation Open Source Software developers. In Proceedings of the 25th International Conference on Software Engineering (ICSE '03). IEEE Computer Society, Washington, DC, USA, 419-429. + + + + + + + + diff --git a/process.html b/process.html new file mode 100644 index 0000000..be76cd3 --- /dev/null +++ b/process.html @@ -0,0 +1,107 @@ + + + + + + + + + + + + + + + + + + Process + + + +

Back to table of contents

+ + + Credit: public domain + + +

Process

+
Andrew J. Ko
+ +

So you know what you're going to build and how you're going to build it. What process should you follow to build it? Who's going to build what? What order should you build it in? How do you make sure everyone is in sync while you're building it? And most importantly, how do you make sure you build well and on time? These are fundamental questions in software engineering with many potential answers. Unfortunately, we still don't know which of those answers are right.

+ +

At the foundation of all of these questions are basic matters of project management: plan, execute, and monitor. But developers in the 1970s and on found that traditional project management ideas didn't seem to work. The earliest process ideas followed a "waterfall" model, in which a project begins by identifying requirements, writing specifications, implementing, testing, and releasing, all under the assumption that every stage could be fully tested and verified. (Recognize this? It's the order of topics we're discussing in this class!). Many managers seemed to like the waterfall model because it seemed structured and predictable; and because most managers were originally software developers, they preferred a structured approach to project management (Weinberg 1982). The reality, however, was that no matter how much verification one did of each of these steps, there always seemed to be more information in later steps that caused a team to reconsider its earlier decisions (e.g., imagine a customer liked a requirement when it was described in the abstract, but when it was actually built, they rejected it, because they finally saw what the requirement really meant).

+ +

In 1988, Barry Boehm proposed an alternative to waterfall called the Spiral model (Boehm 1988): rather than trying to verify every step before proceeding to the next level of detail, prototype every step along the way, getting partial validation, iteratively converging through a series of prototypes toward both an acceptable set of requirements and an acceptable product. Throughout, risk assessment is key, encouraging a team to reflect and revise process based on what they are learning. What was important about these ideas was not the particulars of Boehm's proposed process, but the disruptive idea that iteration and process improvement are critical to engineering great software.

+ + + +

Around the same time, two influential books were published. Fred Brooks wrote The Mythical Man Month (Brooks 1995), a book about software project management, full of provocative ideas that would be tested over the next three decades, including the idea that adding more people to a project would not necessarily increase productivity. Tom DeMarco and Timothy Lister wrote another famous book, Peopleware: Productive Projects and Teams (DeMarco & Lister 1987), arguing that the major challenges in software engineering are human, not technical. Both of these works still represent some of the most widely-read statements of the problem of managing software development processes.

+ +

These early ideas in software project management led to a wide variety of other discoveries about process. For example, organizations of all sizes can improve their process if they are very aware of what the people in the organization know and what the organization is capable of learning, and if they build robust processes to continually improve process (Dybå 2002, Dybå 2003). This might mean monitoring the pace of work, incentivizing engineers to reflect on inefficiencies in process, and teaching engineers how to be comfortable with process change.

+ +

Beyond process improvement, other factors emerged. For example, researchers discovered that critical to team productivity was awareness of teammates' work (Ko et al. 2007). Teams need tools like dashboards to help make them aware of changing priorities and tools like feeds to coordinate short term work (Treude & Storey 2010). Moreover, researchers found that engineers tended to favor non-social sources such as documentation for factual information, but social sources for information to support problem solving (Milewski 2007). Decades ago, developers used tools like email and IRC for awareness; now they use tools like Slack, Trello, GitHub, and JIRA, which have the same basic functionality, but are much more polished, streamlined, and customizable.

+ +

In addition to awareness, ownership is a critical idea in process. This is the idea that for every line of code, someone is responsible for its quality. The owner might be the person who originally wrote the code, but it could also shift to new team members. Studies of code ownership on Windows Vista and Windows 7 found that the less a component had a clear owner, the more pre-release defects it had and the more post-release failures were reported by users (Bird et al. 2011). This means that in addition to getting code written, having clear ownership and clear processes for transfer of ownership are key to functional correctness.
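One simple way a team might estimate how clear ownership is for a component is to look at what share of its commits came from its most active contributor. The sketch below assumes a hypothetical mapping from component names to lists of commit authors; it is a rough proxy for illustration, not necessarily the measure used in the cited study, and the 0.5 cutoff is arbitrary.

```python
from collections import Counter


def top_owner_share(commit_authors):
    """Return (author, share) for the author with the most commits.

    `commit_authors` is a non-empty list with one author name per commit.
    Ties go to whichever author appears first in the list.
    """
    counts = Counter(commit_authors)
    owner, commits = counts.most_common(1)[0]
    return owner, commits / len(commit_authors)


# Hypothetical commit history per component.
history = {
    "renderer": ["ana", "ana", "ana", "bo"],
    "exporter": ["ana", "bo", "cy", "di"],
}

# Flag components where no single person wrote at least half the commits.
unclear = [
    name for name, authors in history.items()
    if top_owner_share(authors)[1] < 0.5
]
print(unclear)
```

Here the `exporter` component has four equal contributors and no clear owner, which is the kind of component the findings above suggest deserves extra attention.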

+ +

Pace is another factor that affects quality. Clearly, there's a tradeoff between how fast a team works and the quality of the product it can build. In fact, interview studies of engineers at Google, Facebook, Microsoft, Intel, and other large companies found that the pressure to reduce "time to market" harmed nearly every aspect of teamwork: the availability and discoverability of information, clear communication, planning, integration with others' work, and code ownership (Rubin & Rinard 2016). Not only did a fast pace reduce quality, but it also reduced engineers' personal satisfaction with their job and their work. I encountered similar issues as CTO of my startup: while racing to market, I was often asked to meet impossible deadlines with zero defects and had to constantly communicate to the other executives in the company why this was not possible (Ko 2017).

+ +

Because of the importance of awareness and communication, the distance between teammates is also a critical factor. This is most visible in companies that hire remote developers, building distributed teams. The primary motivation for doing this is to reduce costs or gain access to engineering talent that is distant from a team's geographical center, but over time, companies have found that doing so necessitates significant investments in travel and socialization to ensure quality, minimizing geographical, temporal and cultural separation (Smite et al. 2010). Researchers have found that there appear to be fundamental tradeoffs between productivity, quality, and/or profits in these settings (Ramasubbu et al. 2011). For example, more distance appears to lead to slower communication (Wagstrom & Datta 2014). Despite these tradeoffs, most rigorous studies of the cost of distributed development have found that when companies work hard to minimize temporal and cultural separation, the actual impact on defects was small (Kocaguneli et al. 2013). Some researchers have begun to explore even more extreme models of distributed development, hiring contract developers to complete microtasks over a few days without hiring them as employees; early studies suggest that these models have the worst of outcomes, with greater costs, poor scalability, and more significant quality issues (Stol & Fitzgerald 2014).

+ +

While all of this research was being conducted, industry explored its own ideas about process, devising frameworks that addressed issues of distance, pace, ownership, awareness, and process improvement. Extreme Programming (Beck 1999) was one of these frameworks and it was full of ideas: be iterative, do small releases, keep design simple, write unit tests, refactor to iterate, use pair programming, integrate continuously, everyone owns everything, use an open workspace, work sane hours. Beck described in his original proposal that these ideas were best for "outsourced or in-house development of small- to medium-sized systems where requirements are vague and likely to change", but as industry often does, it began hyping it as a universal solution to software project management woes and adopted all kinds of combinations of these ideas, adapting them to their existing processes. In reality, the value of XP appears to depend on highly project-specific factors (Müller & Padberg 2003), while the core ideas that industry has adopted are valuing feedback, communication, simplicity, and respect for individuals and the team (Sharp & Robinson 2004).

+ +

At the same time, Beck also began espousing the idea of "Agile" methods, which celebrated many of the values underlying Extreme Programming, such as focusing on individuals, keeping things simple, collaborating with customers, and being iterative. This idea of being agile was even more popular and spread widely in industry and research, even though many of the same ideas appeared much earlier in Boehm's work on the Spiral model. Researchers found that Agile methods can increase developer enthusiasm (Syed-Abdullah et al. 2006), that agile teams need different roles such as Mentor, Co-ordinator, Translator, Champion, Promoter, and Terminator (Hoda et al. 2010), and that teams are combining agile methods with all kinds of process ideas from other project management frameworks such as Scrum (meet daily to plan work, plan two-week sprints, maintain a backlog of work) and Kanban (visualize the workflow, limit work-in-progress, manage flow, make policies explicit, and implement feedback loops) (Al-Baik & Miller 2015). I don't define any of these ideas here because there aren't standard definitions to share.

+ +

Ultimately, all of this energy around process ideas in industry is exciting, but disorganized. None of these efforts really get to the core of what makes software projects difficult to manage. One effort in research tries to get to this core by contributing new theories that explain these difficulties. The first is Herbsleb's Socio-Technical Theory of Coordination (STTC). The idea of the theory is quite simple: dependencies in engineering decisions (e.g., this function calls this other function, this data type stores this other data type) define the social constraints that the organization must solve in a variety of ways to build and maintain software (Herbsleb & Mockus 2003, Herbsleb 2016). The better the organization builds processes and awareness tools to ensure that the people who own those engineering dependencies are communicating and aware of each other's work, the fewer defects will occur. Herbsleb referred to this alignment as sociotechnical congruence, and conducted a number of studies demonstrating its predictive and explanatory power.
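To make the idea of congruence a little more concrete, here is a toy sketch that treats it as the fraction of technical dependencies between differently-owned components whose owners actually communicate. This is a simplification for illustration, not the formal measure from the cited studies; the component names, owners, and the `communicates` set are all hypothetical.

```python
def congruence(dependencies, owners, communicates):
    """Fraction of needed owner-to-owner coordination that actually happens.

    dependencies: iterable of (component, component) pairs that depend on each other
    owners:       dict mapping each component to the person who owns it
    communicates: set of frozenset({person, person}) pairs who talk regularly
    """
    needed = set()
    for a, b in dependencies:
        pair = frozenset((owners[a], owners[b]))
        if len(pair) == 2:  # a dependency within one owner's code needs no coordination
            needed.add(pair)
    if not needed:
        return 1.0
    met = sum(1 for pair in needed if pair in communicates)
    return met / len(needed)


dependencies = [("ui", "api"), ("api", "db"), ("ui", "db")]
owners = {"ui": "ana", "api": "bo", "db": "cy"}
communicates = {frozenset(("ana", "bo")), frozenset(("bo", "cy"))}

score = congruence(dependencies, owners, communicates)
print(round(score, 2))
```

In this example the `ui` code depends on the `db` code, but their owners never talk, so one of the three coordination needs goes unmet. The theory predicts that gaps like this are where defects are more likely to appear.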

+ +

In my recent work (Ko 2017), I extend this idea to congruence with beliefs about product value, claiming that successful software products require the constant, collective communication and agreement of a coherent proposition of a product's value across UX, design, engineering, product, marketing, sales, support, and even customers. A team needs to achieve Herbsleb's sociotechnical congruence to have a successful product, but that alone is not enough: the rest of the organization has to have a consistent understanding of what is being built and why, even as that understanding evolves over time.

+ +
Next chapter: Comprehension
+ +

Further reading

+ + + +

Al-Baik, O., & Miller, J. (2015). The kanban approach, between agility and leanness: a systematic review. Empirical Software Engineering, 20(6), 1861-1897.

+

Beck, K. (1999). Embracing change with extreme programming. Computer, 32(10), 70-77.

+

Christian Bird, Nachiappan Nagappan, Brendan Murphy, Harald Gall, and Premkumar Devanbu. 2011. Don't touch my code! Examining the effects of ownership on software quality. In Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering (ESEC/FSE '11). ACM, New York, NY, USA, 4-14. +

Boehm, B. W. (1988). A spiral model of software development and enhancement. Computer, 21(5), 61-72.

+

Brooks, F.P. (1995). The Mythical Man Month.

+

DeMarco, T. and Lister, T. (1987). Peopleware: Productive Projects and Teams.

+

Tore Dybå. 2003. Factors of software process improvement success in small and large organizations: an empirical study in the scandinavian context. In Proceedings of the 9th European software engineering conference held jointly with 11th ACM SIGSOFT international symposium on Foundations of software engineering (ESEC/FSE-11). ACM, New York, NY, USA, 148-157.

+

Dybå, T. (2002). Enabling software process improvement: an investigation of the importance of organizational issues. Empirical Software Engineering, 7(4), 387-390.

+

James D. Herbsleb and Audris Mockus. 2003. Formulation and preliminary test of an empirical theory of coordination in software engineering. In Proceedings of the 9th European software engineering conference held jointly with 11th ACM SIGSOFT international symposium on Foundations of software engineering (ESEC/FSE-11). ACM, New York, NY, USA, 138-147.

+

James Herbsleb. 2016. Building a socio-technical theory of coordination: why and how. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, New York, NY, USA, 2-10.

+

Rashina Hoda, James Noble, and Stuart Marshall. 2010. Organizing self-organizing teams. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1 (ICSE '10), Vol. 1. ACM, New York, NY, USA, 285-294.

+

Andrew J. Ko, Robert DeLine, and Gina Venolia. 2007. Information Needs in Collocated Software Development Teams. In Proceedings of the 29th international conference on Software Engineering (ICSE '07). IEEE Computer Society, Washington, DC, USA, 344-353.

+

Andrew J. Ko (2017). A Three-Year Participant Observation of Software Startup Software Evolution. International Conference on Software Engineering (ICSE), Software Engineering in Practice, to appear. +

Ekrem Kocaguneli, Thomas Zimmermann, Christian Bird, Nachiappan Nagappan, and Tim Menzies. 2013. Distributed development considered harmful? In Proceedings of the 2013 International Conference on Software Engineering (ICSE '13). IEEE Press, Piscataway, NJ, USA, 882-890.

+

Milewski, A. E. (2007). Global and task effects in information-seeking among software engineers. Empirical Software Engineering, 12(3), 311-326. +

Matthias M. Müller and Frank Padberg. 2003. On the economic evaluation of XP projects. In Proceedings of the 9th European software engineering conference held jointly with 11th ACM SIGSOFT international symposium on Foundations of software engineering (ESEC/FSE-11). ACM, New York, NY, USA, 168-177.

+

Narayan Ramasubbu, Marcelo Cataldo, Rajesh Krishna Balan, and James D. Herbsleb. 2011. Configuring global software teams: a multi-company analysis of project productivity, quality, and profits. In Proceedings of the 33rd International Conference on Software Engineering (ICSE '11). ACM, New York, NY, USA, 261-270.

Sharp, H., & Robinson, H. (2004). An ethnographic study of XP practice. Empirical Software Engineering, 9(4), 353-375.

+

Julia Rubin and Martin Rinard. 2016. The challenges of staying together while moving fast: an exploratory study. In Proceedings of the 38th International Conference on Software Engineering (ICSE '16). ACM, New York, NY, USA, 982-993.

+

Smite, D., Wohlin, C., Gorschek, T., & Feldt, R. (2010). Empirical evidence in global software engineering: a systematic review. Empirical software engineering, 15(1), 91-118.

+

Klaas-Jan Stol and Brian Fitzgerald. 2014. Two's company, three's a crowd: a case study of crowdsourcing software development. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 187-198.

+

Syed-Abdullah, S., Holcombe, M., & Gheorge, M. (2006). The impact of an agile methodology on the well being of development teams. Empirical Software Engineering, 11(1), 143-167.

+

Christoph Treude and Margaret-Anne Storey. 2010. Awareness 2.0: staying aware of projects, developers and tasks using dashboards and feeds. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1 (ICSE '10), Vol. 1. ACM, New York, NY, USA, 365-374.

+

Patrick Wagstrom and Subhajit Datta. 2014. Does latitude hurt while longitude kills? Geographical and temporal separation in a large scale software development project. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 199-210.

+

Gerald M. Weinberg. 1982. Over-structured management of software engineering. In Proceedings of the 6th international conference on Software engineering (ICSE '82). IEEE Computer Society Press, Los Alamitos, CA, USA, 2-8.

+ +
+ +

Podcasts

+ + + +

Software Engineering Daily (2016). Git Workflows with Tim Pettersen.

+

Software Engineering Daily (2017). Engineering Management with Mike Borozdin.

+

Software Engineering Daily (2017). Tech Leadership with Jeff Norris.

+ +
+ + + + + + + diff --git a/productivity.html b/productivity.html new file mode 100644 index 0000000..8dbb290 --- /dev/null +++ b/productivity.html @@ -0,0 +1,84 @@ + + + + + + + + + + + + + + + + + + + Productivity + + + +

Back to table of contents

+ + + Credit: unknown + +

Productivity

+
Andrew J. Ko
+ +

When we think of productivity, we usually have a vague concept of a rate of work per unit time. Where it gets tricky is in defining "work". On an individual level, work can be easier to define, because developers often have specific concrete tasks that they're assigned. But until those tasks are done, it's not easy to define progress (well, it's not that easy to define "done" sometimes either, but that's a topic for a later chapter). When you start considering work at the scale of a team or an organization, productivity gets even harder to define, since an individual's productivity might be increased by ignoring every critical request from a teammate, harming the team's overall productivity.

+ +

Despite the challenge in defining productivity, there are numerous factors that affect productivity. For example, at the individual level, having the right tools can result in an order of magnitude difference in speed at accomplishing a task. One study I ran found that developers using the Eclipse IDE spent a third of their time just physically navigating between source files (Ko et al. 2005). With the right navigation aids, developers could be writing code and fixing bugs 30% faster. In fact, some tools like Mylyn automatically bring relevant code to the developer rather than making them navigate to it, greatly increasing the speed with which developers can accomplish a task (Kersten & Murphy 2006). Long gone are the days when developers should be using bare command lines and text editors to write code: IDEs can and do greatly increase productivity when used and configured with speed in mind.

+ +

That said, productivity is not just about individual developers. Because communication is a key part of team productivity, an individual's productivity is determined as much by their ability to collaborate and communicate with other developers as by how quickly they work alone. In a study spanning dozens of interviews with senior software engineers, Li et al. found that the majority of critical attributes for software engineering skill (productivity included) concerned their interpersonal skills, their communication skills, and their ability to be resourceful within their organization (Li et al. 2015). Similarly, LaToza et al. found that the primary bottleneck in productivity was communication with teammates, primarily because waiting for replies was slower than just looking something up (LaToza et al. 2006). Of course, looking something up has its own problems. While StackOverflow is an incredible resource for missing documentation (Mamykina et al. 2011), it is also full of misleading and incorrect information contributed by developers without sufficient expertise to answer questions (Barua et al. 2014). Finally, because communication is such a critical part of retrieving information, adding more developers to a team has surprising effects. One study found that adding people to a team slowly enough to allow them to onboard effectively could reduce defects, but adding them too fast led to increases in defects (Meneely et al. 2011).

+ +

Another dimension of productivity is learning. Great engineers are resourceful, quick learners (Li et al. 2015); new engineers must be even more resourceful, even though their instincts are often to hide their lack of expertise from exactly the people they need help from (Begel & Simon 2008). Experienced developers know that learning is important and now rely heavily on social media such as Twitter to follow industry changes, build learning relationships, and discover new concepts and platforms to learn (Singer et al. 2014).

+ +

Unfortunately, learning is no easy task. One of my earliest studies as a researcher investigated the barriers to learning new programming languages and systems, finding six distinct types of content that are challenging (Ko et al. 2004). To use a programming platform successfully, developers need to overcome design barriers, which are the abstract computational problems that must be solved, independent of the languages and APIs. They need to overcome selection barriers, which involve finding the right abstractions or APIs to achieve the design they have identified. They need to overcome use and coordination barriers, which involve operating and coordinating different parts of a language or API together to achieve novel functionality. They need to overcome comprehension barriers, which involve knowing what can go wrong when using part of a language or API. And finally, they need to overcome information barriers, which are posed by the limited ability of tools to inspect a program's behavior at runtime during debugging. Every single one of these barriers has its own challenges, and developers encounter them every time they are learning a new platform, regardless of how much expertise they have.

+ +

Aside from individual and team factors, productivity is also influenced by the particular features of a project's code and how the project is managed (Vosburgh et al. 1984, DeMarco & Lister 1985). In fact, these might actually be the biggest factors in determining developer productivity. This means that even a developer who is highly productive individually cannot rescue a poorly structured team working on poorly architected code. This might be why highly productive developers are so difficult to recruit to poorly managed teams.

+ +
Next chapter: Quality
+ +

Further reading

+ + + +

Barua, A., Thomas, S. W., & Hassan, A. E. (2014). What are developers talking about? an analysis of topics and trends in stack overflow. Empirical Software Engineering, 19(3), 619-654.

+

Begel, A., & Simon, B. (2008, September). Novice software developers, all over again. In Proceedings of the Fourth international Workshop on Computing Education Research (pp. 3-14). ACM.

+

Casey Casalnuovo, Bogdan Vasilescu, Premkumar Devanbu, and Vladimir Filkov. 2015. Developer onboarding in GitHub: the role of prior social links and language experience. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). ACM, New York, NY, USA, 817-828.

+

Jan Chong and Tom Hurlbutt. 2007. The Social Dynamics of Pair Programming. In Proceedings of the 29th international conference on Software Engineering (ICSE '07). IEEE Computer Society, Washington, DC, USA, 354-363.

+

Tom DeMarco and Tim Lister. 1985. Programmer performance and the effects of the workplace. In Proceedings of the 8th international conference on Software engineering (ICSE '85). IEEE Computer Society Press, Los Alamitos, CA, USA, 268-272.

+

Ekwa Duala-Ekoko and Martin P. Robillard. 2012. Asking and answering questions about unfamiliar APIs: an exploratory study. In Proceedings of the 34th International Conference on Software Engineering (ICSE '12). IEEE Press, Piscataway, NJ, USA, 266-276.

+

Paul Luo Li, Andrew J. Ko, and Jiamin Zhu. 2015. What makes a great software engineer?. In Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE '15), Vol. 1. IEEE Press, Piscataway, NJ, USA, 700-710.

+

Brittany Johnson, Rahul Pandita, Emerson Murphy-Hill, and Sarah Heckman. 2015. Bespoke tools: adapted to the concepts developers know. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). ACM, New York, NY, USA, 878-881.

+

Mik Kersten and Gail C. Murphy. 2006. Using task context to improve programmer productivity. In Proceedings of the 14th ACM SIGSOFT international symposium on Foundations of software engineering (SIGSOFT '06/FSE-14). ACM, New York, NY, USA, 1-11.

+

Ko, A. J., Myers, B. A., & Aung, H. H. (2004, September). Six learning barriers in end-user programming systems. In Visual Languages and Human Centric Computing, 2004 IEEE Symposium on (pp. 199-206). IEEE.

+

Andrew J. Ko, Htet Aung, and Brad A. Myers. 2005. Eliciting design requirements for maintenance-oriented IDEs: a detailed study of corrective and perfective maintenance tasks. In Proceedings of the 27th international conference on Software engineering (ICSE '05). ACM, New York, NY, USA, 126-135.

+

Thomas D. LaToza, Gina Venolia, and Robert DeLine. 2006. Maintaining mental models: a study of developer work habits. In Proceedings of the 28th international conference on Software engineering (ICSE '06). ACM, New York, NY, USA, 492-501.

+

Mamykina, L., Manoim, B., Mittal, M., Hripcsak, G., & Hartmann, B. (2011, May). Design lessons from the fastest q&a site in the west. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 2857-2866).

+

Andrew Meneely, Pete Rotella, and Laurie Williams. 2011. Does adding manpower also affect quality? An empirical, longitudinal analysis. In Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering (ESEC/FSE '11). ACM, New York, NY, USA, 81-90.

+

André N. Meyer, Thomas Fritz, Gail C. Murphy, and Thomas Zimmermann. 2014. Software developers' perceptions of productivity. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2014). ACM, New York, NY, USA, 19-29.

+

Leif Singer, Fernando Figueira Filho, and Margaret-Anne Storey. 2014. Software engineering at the speed of light: how developers stay current using twitter. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 211-221.

+

Jeffrey Stylos and Brad A. Myers. 2008. The implications of method placement on API learnability. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering (SIGSOFT '08/FSE-16). ACM, New York, NY, USA, 105-112.

+

J. Vosburgh, B. Curtis, R. Wolverton, B. Albert, H. Malec, S. Hoben, and Y. Liu. 1984. Productivity factors and programming environments. In Proceedings of the 7th international conference on Software engineering (ICSE '84). IEEE Press, Piscataway, NJ, USA, 143-152.

+ +
+ +

Podcasts

+ + + +

Software Engineering Daily, Reflections of an Old Programmer

+

Software Engineering Daily, Hiring Engineers with Ammon Bartram

+ +
+ + + + + + + diff --git a/quality.html b/quality.html new file mode 100644 index 0000000..d82ef71 --- /dev/null +++ b/quality.html @@ -0,0 +1,124 @@ + + + + + + + + + + + + + + + + + + + Software Quality + + + + +

Back to table of contents

+ + + Credit: Anton Croos + +

Software Quality

+
Andrew J. Ko
+ +

There are numerous ways a software project can fail: projects can be over budget, they can ship late, they can fail to be useful, or they can simply not be useful enough. Evidence clearly shows that success is highly contextual and stakeholder-dependent: success might be financial, social, physical and even emotional, suggesting that software engineering success is a multifaceted variable that cannot be explained simply by user satisfaction, profitability or meeting requirements, budgets and schedules (Ralph & Kelly 2014).

+ +

One of the central reasons for this is that there are many distinct software qualities that software can have and depending on the stakeholders, each of these qualities might have more or less importance. For example, a safety critical system such as flight automation software should be reliable and defect-free, but it's okay if it's not particularly learnable—that's what training is for. A video game, however, should probably be fun and learnable, but it's fine if it ships with a few defects, as long as they don't interfere with fun (Murphy-Hill et al. 2014).

+ +

There are a surprisingly large number of software qualities (Boehm 1976):

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Correctness: The extent to which a program behaves according to its specification. If your specifications are ambiguous, correctness is ambiguous.
Reliability: The extent to which a program behaves the same way over time. If your online banking app crashes sometimes, it's not reliable. (It's probably also not correct, unless its specification said that it should crash randomly.)
Robustness: The extent to which a program behaves similarly in different operating environments. A touch screen that stops working consistently in the rain is less robust. Web forms that reject my last name as invalid are not robust.
Performance: The extent to which a program uses computing resources economically. Synonymous with "fast" and "zippy". Performance is directly determined by how many instructions a program has to execute to accomplish its operations, but it is difficult to measure because operations, inputs, and the operating environment can vary widely.
Learnability: The ease with which a person can learn to operate a program. Learnability is multi-dimensional and can be difficult to measure (Grossman et al. 2009).
User efficiency: The speed with which a person can perform tasks with a program. For example, think about how many taps and keystrokes it takes you to log in to an app on your phone compared to using a fingerprint sensor like Apple's TouchID.
Accessibility: The diversity of physical or cognitive abilities that can successfully operate software. For example, something that can only be used with a mouse is less accessible than something that can be used with a mouse, keyboard, or speech. Software can be designed for all abilities, and even automatically adapted for individual abilities (Wobbrock et al. 2011).
Usefulness: The extent to which software solves a problem. Usefulness is often the most important quality because it subsumes all of the other lower-level qualities software can have (e.g., part of what makes a messaging app useful is that it's performant, user efficient, and reliable). That also makes it less useful as a concept, because it can be so difficult to measure for most problems. That said, usefulness is not always the most important quality. For example, if you can sell a product to a customer and get a one-time payment of their money, it might not matter that the product has low usefulness.
Verifiability: The effort required to verify that software does what it is intended to do. For example, it is hard to verify a safety critical system without either proving it correct or testing it in a safety-critical context (which isn't safe). Take driverless cars, for example: for Google to test their software, they've had to set up thousands of paid drivers to monitor and report problems on the road. In contrast, verifying that a simple static HTML web page works correctly is as simple as opening it in a browser.
Maintainability: The extent to which software can be corrected, adapted, or perfected. This depends mostly on how comprehensible the implementation of a program is.
Reusability: The extent to which a program's components can be used for unintended purposes. APIs are quite reusable, whereas black box embedded software (like the software built into your car's traction systems) is not.
Portability: The extent to which an implementation can run on different platforms and environments.
Interoperability: The extent to which a system uses standard interfaces.
Security: The extent to which a system prevents access to information that is restricted to a certain population.
+ +

Although the list above is not complete, you might have already noticed some tradeoffs between different qualities. A secure system is necessarily going to be less learnable, because there will be more to learn to operate it. A robust system will likely be less maintainable because it will likely have more code to account for its diverse operating environments. Because one cannot achieve all software qualities, and achieving each quality takes significant time, it is necessary to prioritize qualities for each project.

+ +

These external notions of quality are not the only qualities that matter. For example, developers often view projects as successful if they offer intrinsically rewarding work (Procaccino et al. 2005). That may sound selfish, but if developers aren't enjoying their work, they're probably not going to achieve any of the qualities very well. Moreover, there are many organizational factors that can inhibit developers' ability to obtain these rewards. Project complexity, internal and external dependencies that are out of a developer's control, process barriers, budget limitations, deadlines, poor HR planning, and pressure to ship can all interfere with project success (Lavallee & Robillard 2015).

+ +

As I've noted before, the person most responsible for isolating developers from these organizational problems, and most responsible for prioritizing software qualities, is a product manager. Check out the podcast below for one product manager's perspectives on the challenges of balancing these different priorities.

+ +
Next chapter: Requirements
+ +

Further reading

+ +

Boehm, B.W. 1976. Software Engineering, IEEE Transactions on Computers, 25(12), 1226-1241.

+

Grossman, T., Fitzmaurice, G., & Attar, R. (2009, April). A survey of software learnability: metrics, methodologies and guidelines. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 649-658).

+

Emerson Murphy-Hill, Thomas Zimmermann, and Nachiappan Nagappan. 2014. Cowboys, ankle sprains, and keepers of quality: how is video game development different from software development? In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 1-11.

+

Procaccino, J. D., Verner, J. M., Shelfer, K. M., & Gefen, D. (2005). What do software practitioners really think about project success: an exploratory study. Journal of Systems and Software, 78(2), 194-203.

+

Paul Ralph and Paul Kelly. 2014. The dimensions of software engineering success. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 24-35.

+

Mathieu Lavallee and Pierre N. Robillard. 2015. Why good developers write bad code: an observational case study of the impacts of organizational factors on software quality. In Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE '15), Vol. 1. IEEE Press, Piscataway, NJ, USA, 677-687.

+

Wobbrock, J. O., Kane, S. K., Gajos, K. Z., Harada, S., & Froehlich, J. (2011). Ability-based design: Concept, principles and examples. ACM Transactions on Accessible Computing (TACCESS), 3(3), 9.

+ +

Podcasts

+ +

Software Engineering Daily, Product Management with Suzie Prince

+ + + + + + + diff --git a/requirements.html b/requirements.html new file mode 100644 index 0000000..8600641 --- /dev/null +++ b/requirements.html @@ -0,0 +1,79 @@ + + + + + + + + + + + + + + + + + + Requirements + + + +

Back to table of contents

+ + + Credit: public domain + +

Requirements

+
Andrew J. Ko
+ +

Once you have a problem, a solution, and a design specification, it's entirely reasonable to start thinking about code. What libraries should we use? What platform is best? Who will build what? After all, there's no better way to test the feasibility of an idea than to build it, deploy it, and find out if it works. Right?

+ +

It depends. This mentality towards product design works fine when building and deploying something is cheap and getting feedback has no consequences. Simple consumer applications often benefit from this simplicity, especially early stage ones, because there's little to lose. But what if a beta isn't cheap to build? What if your product only has one shot at adoption? What if you're building something for a client and they want to define success? Worse yet, what if your product could kill people if it's not built properly? In these settings, software teams take an approach of translating a design into a specific explicit set of goals that must be satisfied in order for the implementation to be complete. We call these goals requirements and we call this process requirements engineering (Sommerville & Sawyer 1997).

+ +

The relationship between requirements and design is somewhat murky. In design disciplines, designers tend to express requirements in the form of prototypes and mockups. These implicitly state requirements, because they suggest what the software is supposed to do without saying it directly. For some types of requirements, they actually imply nothing. For example, how responsive should a web page be? A prototype doesn't really say; a requirement of an average page load time of less than 1 second is quite explicit. Requirements can therefore be thought of more like an architect's blueprint: they provide explicit definitions and scaffolding of project success.
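One way to see what makes the page load requirement explicit is that it can be checked mechanically. The sketch below assumes load times have already been measured somehow and are given in milliseconds; the function name, the default limit parameter, and the sample measurements are all hypothetical and only illustrate the idea.

```javascript
// Hypothetical check for the requirement
// "average page load time is less than 1 second".
function meetsLoadTimeRequirement(loadTimesMs, limitMs = 1000) {
	if (loadTimesMs.length === 0) {
		throw new RangeError("Need at least one measurement to compute an average.");
	}
	const total = loadTimesMs.reduce((sum, time) => sum + time, 0);
	return total / loadTimesMs.length < limitMs;
}

// Made-up measurements, in milliseconds.
console.log(meetsLoadTimeRequirement([640, 910, 1200, 730]));
```

A prototype offers nothing comparable to run: there is no number in a mockup to compare a measurement against.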

+ +

And yet, like design, requirements come from the world and the people in it and not from software (Jackson 2001). Therefore, the methods that people use to do requirements engineering are quite similar to design methods. Requirements engineers do interviews, conduct user research, create prototypes, and iteratively converge toward requirements (van Lamsweerde 2008). The big difference between design and requirements engineering is that requirements engineers take the process one step further than designers, enumerating in detail every property that the software must satisfy. They sometimes even use formal methods to specify requirements, allowing them to automatically identify conflicting requirements, so they don't end up proposing a design that can't possibly exist. Some even use systems to make requirements "traceable", meaning the high level requirement can be linked directly to the code that meets that requirement (Mäder & Egyed 2015). All of this formality has tradeoffs: not only does it take more time to be so precise, but it can negatively affect creativity in concept generation as well (Mohanani et al. 2014).

+ +

Expressing requirements in natural language can mitigate these effects, at the expense of some precision. Natural language requirements still have to be complete, precise, non-conflicting, and verifiable. For example, consider a design for a simple to-do list application. Its requirements might be something like the following:

+ + + +

Let's review these requirements against the criteria for good requirements that I listed above:

+ + + +

Now, the flaws above don't make the requirements "wrong". They just make them "less good." The more complete, precise, non-conflicting, and testable your requirements are, the easier it is to anticipate risk, estimate work, and evaluate progress, since requirements essentially give you a to do list for building and testing your code.

+ +
Next chapter: Architecture
+ +

Further reading

+ + + +

Jackson, Michael (2001). Problem Frames. Addison-Wesley.

+

Axel van Lamsweerde. 2008. Requirements engineering: from craft to discipline. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering (SIGSOFT '08/FSE-16). ACM, New York, NY, USA, 238-249.

+

Mäder, P., & Egyed, A. (2015). Do developers benefit from requirements traceability when evolving and maintaining a software system? Empirical Software Engineering, 20(2), 413-441.

+

Rahul Mohanani, Paul Ralph, and Ben Shreeve. 2014. Requirements fixation. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 895-906.

+

Sommerville, I., & Sawyer, P. (1997). Requirements engineering: a good practice guide. John Wiley & Sons, Inc.

+ +
+ + + + + + + diff --git a/specifications.html b/specifications.html new file mode 100644 index 0000000..778f6bc --- /dev/null +++ b/specifications.html @@ -0,0 +1,117 @@ + + + + + + + + + + + + + + + + + + Functional Specifications + + + +

Back to table of contents

+ + + Credit: public domain + +

Functional Specifications

+
Andrew J. Ko
+ +

When you make something with code, you're probably used to figuring out a design as you go. You write a function, you choose some arguments, and if you don't like what you see, perhaps you add a new argument to that function and test again. This cowboy coding, as some people like to call it, can be great fun! It allows systems to emerge more organically: as you iteratively see your front-end design emerge, the design of your implementation emerges too, co-evolving with how you're feeling about the final product.

+ +

As you've probably noticed by now, this type of process doesn't really scale, even when you're working with just a few other people. That argument you added? You just broke a bunch of functions one of your teammates was planning, and when she commits her code, she gets merge conflicts that cost her an hour to fix because she has to catch up to whatever design change you made. This lack of planning quickly turns into an uncoordinated mess of individual decision making. Suddenly you're spending all of your time cleaning up coordination messes instead of writing code.

+ +

The techniques we've discussed so far for avoiding this boil down to specifying what code should do, so everyone can write code according to a plan. We've talked about requirements specifications, which are declarations of what software must do from a user's perspective. We've also talked about architectural specifications, which are high-level declarations of how code will be organized, encapsulated, and coordinated. At the lowest level are functional specifications, which are declarations about the properties of input and output of functions in a program. + +

In its simplest form, a functional specification can be just some natural language that says what a function is supposed to do:

+ +
+// Return the smaller of the two numbers, or if they're equal, the second number.
+function min(a, b) {
+	return a < b ? a : b;
+}		
+		
+ +

This comment achieves the core purpose of a specification: to help other developers understand what the requirements and intended behavior of a function are. As long as everyone sticks to this "plan" (everyone calls the function with only numbers and the function always returns the smaller of them), then there shouldn't be any problems.

+ +

The comment above is okay, but it's not very precise. It says what is returned and what properties it has, but it only implies that numbers are allowed without saying anything about what kind of numbers. Are decimals allowed or just integers? What about not-a-number (the result of dividing 0 by 0)? Or infinity (the result of dividing 1 by 0)?
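To see why the kind of number matters, here is a small sketch of how the min() function above behaves on a few of these edge cases in JavaScript. The function is copied from the example above; the edgeCases name is hypothetical and used only for this illustration.

```javascript
// Same function as above: returns the smaller of the two numbers,
// or if they're equal, the second number.
function min(a, b) {
	return a < b ? a : b;
}

// Hypothetical name, used only for this illustration.
const edgeCases = {
	decimals: min(0.5, 1),               // decimals compare as expected
	negativeInfinity: min(-Infinity, 0), // infinities compare as expected too
	nanFirst: min(NaN, 1),               // NaN < 1 is false, so the second input is returned
	nanSecond: min(1, NaN),              // 1 < NaN is also false, so NaN is returned
};
console.log(edgeCases);
```

Every comparison with not-a-number is false, so the result depends on which argument it arrives in, a behavior the comment never mentions.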

+ +

To make these clearer, many languages use static typing to allow developers to specify types explicitly:

+ +
+// Return the smaller of the two integers, or if they're equal, the second number.
+function min(int a, int b) {
+	return a < b ? a : b;
+}		
+		
+ +

Because an int is well-defined in most languages, the two inputs to the function are well-defined.

+ +

Of course, if the above were JavaScript code (JavaScript doesn't support static typing), nothing would actually verify that the data given to min() are integers. JavaScript is entirely fine with someone sending a string and an object. This probably won't do what you intended, leading to defects.
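As a small sketch of what "won't do what you intended" can look like, here is the untyped min() from earlier called with strings and with an object. The variable names are hypothetical; the behavior follows from JavaScript's standard comparison rules.

```javascript
// The untyped function from earlier in the chapter.
function min(a, b) {
	return a < b ? a : b;
}

// Two strings are compared character by character, so "10" sorts before "9".
const fromStrings = min("10", "9");

// An object converts to not-a-number in a numeric comparison, so the
// comparison is false and the second argument comes back unchanged.
const fromObject = min({}, 1);

console.log(fromStrings, fromObject);
```

Neither call raises an error, which is exactly the problem: the caller gets an answer, just not a meaningful one.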

+ +

This brings us to a second purpose of writing functional specifications: to help verify that functions, their input, and their output are correct. There are many ways to use specifications to verify correctness. One of the most widely used is the assertion (Clarke & Rosenblum 2006). Assertions consist of two things: 1) a check on some property of a function's input or output and 2) some action to notify about violations of these properties. For example, if we wanted to verify that the JavaScript function above had integer values as inputs, we would do this:

+ +
+// Return the smaller of the two numbers, or if they're equal, the second number.
+function min(a, b) {
+	if(!Number.isInteger(a)) alert("First input to min() isn't an integer!");
+	if(!Number.isInteger(b)) alert("Second input to min() isn't an integer!");
+	return a < b ? a : b;
+}		
+		
+ +

These two new lines of code are essentially functional specifications that declare "If either of those inputs is not an integer, the caller of this function is doing something wrong". This is useful to declare, but assertions have a bunch of problems: if your program can send a non-integer value to min, but you never test it in a way that does, you'll never see those alerts. This form of dynamic verification is therefore very limited, since it only provides guarantees about the executions you actually run. That said, a study of the use of assertions in a large database of GitHub projects shows that use of assertions is related to fewer defects (Casalnuovo et al. 2015) (though note that I said "related": we have no evidence that assertions actually prevent defects. It may be possible that developers who use assertions are just better at avoiding defects.)

+ +

Assertions are related to the broader category of error handling language features. Error handling includes assertions, but also programming language features like exceptions and exception handlers. Error handling is a form of specification in that checking for errors usually entails explicitly specifying the conditions that determine an error. For example, in the code above, the condition Number.isInteger(a) specifies that the parameter a must be an integer. Other exception handling code such as the Java throws clause indicates the cases in which errors can occur and the corresponding catch block indicates what is to be done about errors. It is difficult to implement good exception handling that provides granular, clear ways of recovering from errors (Chen et al. 2009). Evidence shows that modern developers are still exceptionally bad at designing for errors; one study found that errors are not designed for, few errors are tested for, and exception handling is often overly general, providing little ability for users to understand errors or for developers to debug them (Ebert et al. 2015). These difficulties appear to be because it is difficult to imagine the vast range of errors that can occur (Maxion & Olszewski 2000).
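To make the connection between exceptions and specification concrete, here is a minimal sketch of the same integer check written with JavaScript exceptions instead of alerts. The describeMin caller is hypothetical, and using the built-in TypeError is just one reasonable choice, not the only one.

```javascript
// Throws a TypeError, rather than showing an alert, when an input isn't an integer.
function min(a, b) {
	if (!Number.isInteger(a)) throw new TypeError("First input to min() isn't an integer: " + String(a));
	if (!Number.isInteger(b)) throw new TypeError("Second input to min() isn't an integer: " + String(b));
	return a < b ? a : b;
}

// Hypothetical caller: handles only the error it knows how to recover from
// and rethrows anything else so that it isn't silently swallowed.
function describeMin(a, b) {
	try {
		return "The smaller value is " + min(a, b);
	} catch (error) {
		if (error instanceof TypeError) {
			return "Can't compare: " + error.message;
		}
		throw error;
	}
}

console.log(describeMin(3, 2));
console.log(describeMin(1.5, 2));
```

Catching one specific error type and rethrowing the rest is a small example of the granular handling described above; a bare catch that hides every error would be the overly general kind.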

+ +

Researchers have invented many forms of specification that require more work and more thought to write, but can be used to make stronger guarantees about correctness (Woodcock et al. 2009). For example, many languages support the expression of formal pre-conditions and post-conditions that represent contracts that must be kept. Because these contracts are essentially mathematical promises, we can build tools that automatically read a function's code and verify that what it computes exhibits those mathematical properties using automated theorem proving systems. For example, suppose we wrote some formal specifications for our example above to replace our assertions (using a fictional notation for illustration purposes):

+ +
+// pre-conditions: a in Integers, b in Integers
+// post-conditions: result <= a and result <= b
+function min(a, b) {
+	return a < b ? a : b;
+}		
+		
+ +

The annotations above require that, no matter what, the inputs have to be integers and the output has to be less than or equal to both values. The automatic theorem prover can then start with the claim that result is always less than or equal to both and begin searching for a counterexample. Can you find a counterexample?
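A theorem prover reasons about all integers at once. A much weaker substitute, sketched here, is to search a small range exhaustively for a violation of the post-condition. The names findCounterexample, satisfiesPostCondition, and brokenMin are hypothetical, and finding nothing in a bounded range is evidence rather than proof.

```javascript
function min(a, b) {
	return a < b ? a : b;
}

// Deliberately wrong variant, included only to show what a violation looks like.
function brokenMin(a, b) {
	return a > b ? a : b;
}

// Post-condition from the annotation: result <= a and result <= b.
function satisfiesPostCondition(a, b, result) {
	return result <= a && result <= b;
}

// Try every pair of integers in [-limit, limit] and return the first pair
// that violates the post-condition, or null if none exists in that range.
function findCounterexample(candidate, limit) {
	for (let a = -limit; a <= limit; a++) {
		for (let b = -limit; b <= limit; b++) {
			if (!satisfiesPostCondition(a, b, candidate(a, b)))
				return { a: a, b: b };
		}
	}
	return null;
}
```

Calling findCounterexample(brokenMin, 50) returns a violating pair almost immediately, while findCounterexample(min, 50) reports whether any pair in that range breaks the contract.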

+ +

There are definite tradeoffs with writing detailed, formal specifications. The benefits are clear: many companies have written formal functional specifications in order to make completely unambiguous the required behavior of their code, particularly systems that are capable of killing people or losing money, such as flight automation software, banking systems, and even compilers that create executables from code (Woodcock et al. 2009). In these settings, it's worth the effort of being 100% certain that the program is correct because if it's not, people can die.

+ +

When the consequences aren't so high, other factors dominate: writing functional specifications is very hard and very time consuming, you need tools to verify the annotations themselves, and you have to maintain annotations. These barriers deter many developers from writing them (Schiller et al. 2014). Some forms of specifications, like the UML diagrams we described when discussing architecture, lack the benefits of formal specifications and require a lot of work to create, leading many practitioners to find them not worth the effort (Petre 2013).

+ +

Specifications can have other benefits. The very act of writing down what you expect a function to do in the form of test cases can slow developers down, causing them to reflect more carefully and systematically about exactly what they expect a function to do (Fucci et al. 2016). Perhaps if this is true in general, there's value in simply stepping back before you write a function, mapping out pre-conditions and post-conditions in the form of simple natural language comments, and then writing the function to match your intentions.
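As a small illustration of that habit, here is a sketch in which the comments were written first and the body second. The function indexOfMax is a hypothetical example, not one from this chapter.

```javascript
// Pre-conditions: values is a non-empty array of numbers.
// Post-conditions: returns an index i such that values[i] >= every element;
//                  when there are ties, i is the first such index.
function indexOfMax(values) {
	let best = 0;
	for (let i = 1; i < values.length; i++) {
		// Strictly greater, so an earlier tie keeps its place.
		if (values[i] > values[best])
			best = i;
	}
	return best;
}
```

Writing the tie-breaking rule down before coding is what forces the choice between > and >= in the loop to be deliberate.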

+ +
Next chapter: Process
+ +

Further reading

+ +

Casey Casalnuovo, Prem Devanbu, Abilio Oliveira, Vladimir Filkov, and Baishakhi Ray. 2015. Assert use in GitHub projects. In Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE '15), Vol. 1. IEEE Press, Piscataway, NJ, USA, 755-766.

+

Chen, Chien-Tsun, Yu Chin Cheng, Chin-Yun Hsieh, and I-Lang Wu. "Exception handling refactorings: Directed by goals and driven by bug fixing." Journal of Systems and Software 82, no. 2 (2009): 333-345.

+

Clarke, L. A., & Rosenblum, D. S. (2006). A historical perspective on runtime assertion checking in software development. ACM SIGSOFT Software Engineering Notes, 31(3), 25-37.

+

Ebert, F., Castor, F., and Serebrenik, A. (2015). An exploratory study on exception handling bugs in Java programs. Journal of Systems and Software, 106, 82-101.

+

Fucci, D., Erdogmus, H., Turhan, B., Oivo, M., & Juristo, N. (2016). A Dissection of Test-Driven Development: Does It Really Matter to Test-First or to Test-Last?. IEEE Transactions on Software Engineering.

+

Maxion, Roy A., and Robert T. Olszewski. "Eliminating exception handling errors with dependability cases: a comparative, empirical study." IEEE Transactions on Software Engineering 26, no. 9 (2000): 888-906.

+

Schiller, T. W., Donohue, K., Coward, F., & Ernst, M. D. (2014, May). Case studies and tools for contract specifications. In Proceedings of the 36th International Conference on Software Engineering (pp. 596-607). +

Marian Petre. 2013. UML in practice. In Proceedings of the 2013 International Conference on Software Engineering (ICSE '13). IEEE Press, Piscataway, NJ, USA, 722-731.

+

Jim Woodcock, Peter Gorm Larsen, Juan Bicarregui, and John Fitzgerald. 2009. Formal methods: Practice and experience. ACM Comput. Surv. 41, 4, Article 19 (October 2009), 36 pages.

+ + + + + + + + + + diff --git a/style.css b/style.css new file mode 100644 index 0000000..05a416a --- /dev/null +++ b/style.css @@ -0,0 +1,10 @@ +body { + max-width: 800px; + padding: 3em; + margin-left: auto; + margin-right: auto; +} + +table.text-left td { + text-align: left; +} \ No newline at end of file diff --git a/verification.html b/verification.html new file mode 100644 index 0000000..562d926 --- /dev/null +++ b/verification.html @@ -0,0 +1,131 @@ + + + + + + + + + + + + + + + + + + Verification + + + +

Back to table of contents

+ + + Credit: public domain + +

Verification

+
Andrew J. Ko
+ +

How do you know a program does what you intended? Part of this is being clear about what you intended (by writing specifications, for example), but your intents, however clear, are not enough: you need evidence that your intents were correctly expressed computationally. To get this evidence, we do verification.

+ +

There are many ways to verify code. A reasonable first instinct is to simply run your program. After all, what better way to check whether you expressed your intents than to see with your own eyes what your program does? This empirical approach is called testing. Some testing is manual, in that a human executes a program and verifies that it does what was intended. Some testing is automated, in that the test is run automatically by a computer. Another way to verify code is to analyze it, using logic to verify its correct operation. As with testing, some analysis is manual, since humans do it. We call this manual analysis inspection, whereas other analysis is automated, since computers do it. We call this program analysis. This leads to a nice complementary set of verification techniques along two axes:

+ + + + + + + + + + + + + + + + + +
           | manual         | automatic
empirical  | manual testing | automated testing
analytical | inspection     | program analysis
+ +

To discuss each of these and their tradeoffs, first we have to cover some theory about verification. The first and simplest ideas are some terminology:

+ + + +

Note that because defects are defined relative to intent, whether a behavior is a failure depends entirely on the definition of intent. If that intent is vague, whether something is a defect is vague. Moreover, you can define intents that result in behaviors that seem like failures: for example, I can write a program that intentionally crashes. A crash isn't a failure if it was intended! This might be pedantic, but you'd be surprised how many times I've seen professional developers in bug triage meetings say:

+ +

"Well, it's worked this way for a long time, and people have built up a lot of workarounds for this bug. It's also really hard to fix. Let's just call this by design. Closing this bug as won't fix."

+ +

Testing

+ +

So how do you find defects in a program? Let's start with testing. Testing is generally the easiest kind of verification to do, but as a practice, it has questionable efficacy. Empirical studies of testing find that it is related to fewer defects in the future, but not strongly related, and it's entirely possible that it's not the testing itself that results in fewer defects, but that other activities (such as more careful implementation) lead to both fewer defects and more testing (Ahmed et al. 2016). At the same time, modern developers don't test as much as they think they do (Beller et al. 2015). Moreover, students are often not convinced of the return on investment of automated tests and often opt for laborious manual tests (even though they regret it later) (Pham et al. 2014). Testing is therefore in a strange place: it's a widespread activity in industry, but it's not done very systematically, and it doesn't seem to help very much when we try to measure its benefits.

+ +

Why is this? One possibility is that no amount of testing can prove a program correct with respect to its specifications. Why? It boils down to the same limitations that exist in science: with empiricism, we can provide evidence that a program does have defects, but we can't provide complete evidence that a program doesn't have defects. This is because even simple programs can execute in an infinite number of different ways.

+ +

Consider this JavaScript program:

+ +
+function count(input) {
+	while(input > 0)
+		input--;
+	return input;
+}
+ +

The function should always return 0, right? How many possible values of input do we have to try manually to verify that it always does? Well, if input is an integer, then there are 2^32 possible integer values. That's not infinite, but that's a lot. But what if input is a string? There are an infinite number of possible strings because they can have any sequence of characters of any length. Now we have to manually test an infinite number of possible inputs. So if we restrict ourselves to testing, we will never know that the program is correct for all possible inputs. In this case, automatic testing doesn't even help, since there are an infinite number of tests to run.
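Even a handful of automated tests can reveal defects here, though they cannot rule out others. The following sketch runs count on a few hand-picked inputs and collects those that break the "always returns 0" expectation. The names observations and failures, and the choice of inputs, are assumptions made for illustration.

```javascript
function count(input) {
	while(input > 0)
		input--;
	return input;
}

// A few hand-picked inputs paired with the value count returned for each.
const observations = [5, 1, 0, -3, "abc"].map(function (input) {
	return { input: input, output: count(input) };
});

// Keep only the observations that violate "always returns 0".
const failures = observations.filter(function (observation) {
	return observation.output !== 0;
});
```

Here failures contains the entries for -3 and "abc", both of which are returned unchanged. The three passing inputs say nothing about the inputs that were never tried.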

+ +

There are some ideas in testing that can improve how well we can find defects. For example, rather than just testing the inputs you can think of, focus on all of the lines of code in your program. If you find a set of tests that can cause all of the lines of code to execute, you have one notion of test coverage. Of course, lines of code aren't enough, because an individual line can contain multiple different paths in it (e.g., value ? getResult1() : getResult2()). So another notion of coverage is executing all of the possible control flow paths through the various conditionals in your program. Executing all of the possible paths is hard, of course, because every independent conditional in your program can double the number of possible paths (you have 200 if statements in your program? That's up to 2^200 possible paths, roughly 10^60, which is far more than any test suite could ever execute).
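The gap between line coverage and path coverage shows up even in a tiny function. In this sketch, the function describe and the visited set are hypothetical. Each call records which branch it took at each of two conditionals, so the set holds the distinct paths exercised so far.

```javascript
// Records one two-letter string per distinct path taken through describe.
const visited = new Set();

// Two independent conditionals give 2 x 2 = 4 possible paths.
function describe(n) {
	let path = "";
	let label = "";
	if (n < 0) {
		label += "negative ";
		path += "A";
	} else {
		path += "a";
	}
	if (n % 2 === 0) {
		label += "even";
		path += "B";
	} else {
		label += "odd";
		path += "b";
	}
	visited.add(path);
	return label;
}

// These two calls execute every line of describe at least once...
describe(-2); // takes path "AB"
describe(3);  // takes path "ab"
// ...but visited holds only 2 of the 4 paths; "Ab" and "aB" never ran.
```

Adding describe(-3) and describe(2) would cover the remaining two paths. With more conditionals, the number of such calls needed can grow exponentially.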

+ +

There are many types of testing that are common in software engineering:

+ + + +

Which tests you should write depends on what risks you want to take. Don't care about failures? Don't write any tests. If failures of a particular kind are highly consequential to your team, you should probably write tests that check for those failures. As we noted above, you can't write enough tests to catch all bugs, so deciding which tests to write and maintain is a key challenge.

+ +

Analysis

+ +

Now, you might be thinking that it's obvious that the program above is defective for some integers and strings. How did you know? You analyzed the program rather than executing it with specific inputs. For example, when I read (analyzed) the program, I thought:

+ +

"if we assume input is an integer, then there are only three types of values to meaningfully consider with respect to the > in the loop condition: positive, zero, and negative. Positive numbers will always decrement to 0 and return 0. Zero will return zero. And negative numbers just get returned as is, since they're less than zero, which is wrong with respect to the specification. And in JavaScript, a string that can't be converted to a number is never greater than 0 (as if that even makes sense), so the string is returned, which is wrong."

+ +

The above is basically an informal proof, using logic to divide the possible states of input and their effect on the program's behavior. It used symbolic execution to verify all possible paths through the function, finding the paths that result in correct and incorrect values. The strategy was an inspection because we did it manually. If we had written a program that read the program to perform this deduction automatically, we would have called it program analysis.

+ +

The benefit of analysis is that it can demonstrate that a program is correct in all cases. This is because it can handle infinite spaces of possible inputs by mapping those infinite inputs onto a finite space of possible executions. It's not always possible to do this in practice, since many kinds of programs can execute in infinitely many possible ways, but it gets us closer.
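The mapping from infinite inputs to a finite set of cases can be written down directly. This sketch hand-encodes the reasoning about count for integer inputs, using three sign categories in place of individual values. The names SIGNS, analyzeCount, and violations are hypothetical, and a real program analysis would derive these cases from the code rather than have them typed in.

```javascript
// Three abstract categories stand in for the many possible integer inputs.
const SIGNS = ["positive", "zero", "negative"];

// The sign of what count() returns for any integer of the given sign,
// derived by reading the loop rather than running it.
function analyzeCount(sign) {
	if (sign === "positive")
		return "zero";     // the loop decrements by 1 until input is exactly 0
	if (sign === "zero")
		return "zero";     // the loop condition is false immediately
	return "negative";     // the loop never runs, so input is returned unchanged
}

// The specification says the result is always zero; list the cases that break it.
function violations() {
	return SIGNS.filter(function (sign) {
		return analyzeCount(sign) !== "zero";
	});
}
```

Three cases are enough to cover every integer, and violations() reports that only the negative case breaks the specification.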

+ +

One popular type of automatic program analysis tool is the static analysis tool. These tools read programs and identify potential defects using proofs like the one above. They typically result in a set of warnings, each one requiring inspection by a developer to verify, since some of the warnings may be false positives (something the tool thought was a defect, but wasn't). Although static analysis tools can find all kinds of defects, developers don't yet view them as that useful, because the false positives are often large in number and the way they are presented makes them difficult to understand (Johnson et al. 2013).
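A toy example makes false positives concrete. This sketch flags lines of source text that use loose equality (==). The function findLooseEquality is hypothetical and far cruder than real static analysis tools, which parse the program instead of matching text.

```javascript
// Toy static check: flag lines that appear to use loose equality (==).
// It only reads the source text and never runs the program it is given.
function findLooseEquality(source) {
	const warnings = [];
	source.split("\n").forEach(function (line, index) {
		// Match "==" that is not preceded by "=" or "!" and not followed by "=".
		if (/(^|[^=!])==(?!=)/.test(line))
			warnings.push({ line: index + 1, text: line.trim() });
	});
	return warnings;
}

const program = [
	"if (a == b) return 1;",
	"if (a === b) return 2;",
	"var note = \"a == b is loose\";"
].join("\n");

const warnings = findLooseEquality(program);
```

The check warns about lines 1 and 3. Line 1 is a real use of ==, but line 3 only mentions == inside a string literal, so that warning is a false positive that a developer would have to inspect and dismiss.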

+ +

Not all analytical techniques rely entirely on logic. In fact, one of the most popular methods of verification in industry is the code review, also known as inspection. The basic idea of an inspection is to read the program analytically, following the control and data flow inside the code to look for defects. This can be done alone, in groups, and even included as part of the process of integrating changes, to verify them before they are committed to a branch. Modern code reviews, while informal, help find defects, stimulate knowledge transfer between developers, increase team awareness, and help identify alternative implementations that can improve quality (Bacchelli & Bird 2013). One study found that measures of how much a developer knows about an architecture can increase 66% to 150% depending on the project (Rigby & Bird 2013). That said, not all reviews are created equal: the best ones are thorough and conducted by a reviewer with strong familiarity with the code (Kononenko et al. 2016); including reviewers who do not know each other or do not know the code can result in longer reviews, especially when run as meetings (Seaman & Basili 1997). Soliciting reviews asynchronously by allowing developers to request reviews from their peers is generally much more scalable (Rigby & Storey 2011), but this requires developers to be careful about which reviews they invest in.

+ +

Beyond these more technical considerations around verifying a program's correctness are organizational issues around different software qualities. For example, different organizations have different sensitivities to defects. If a $0.99 game on the app store has a defect, that might not hurt its sales much, unless that defect prevents a player from completing the game. If Boeing's flight automation software has a defect, hundreds of people might die. The game developer might do a little manual play testing, release, and see if anyone reports a defect. Boeing will spend years proving mathematically with automatic program analysis that every line of code does what is intended, and repeating this verification every time a line of code changes. What type of verification is right for your team depends entirely on what you're building, who's using it, and how they're depending on it.

+ +
Next chapter: Monitoring
+ +

Further reading

+ + +

Iftekhar Ahmed, Rahul Gopinath, Caius Brindescu, Alex Groce, and Carlos Jensen. 2016. Can testedness be effectively measured? In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, New York, NY, USA, 547-558.

+

Alberto Bacchelli and Christian Bird. 2013. Expectations, outcomes, and challenges of modern code review. In Proceedings of the 2013 International Conference on Software Engineering (ICSE '13). IEEE Press, Piscataway, NJ, USA, 712-721.

+

Moritz Beller, Georgios Gousios, Annibale Panichella, and Andy Zaidman. 2015. When, how, and why developers (do not) test in their IDEs. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). ACM, New York, NY, USA, 179-190.

+

Brittany Johnson, Yoonki Song, Emerson Murphy-Hill, and Robert Bowdidge. 2013. Why don't software developers use static analysis tools to find bugs? In Proceedings of the 2013 International Conference on Software Engineering (ICSE '13). IEEE Press, Piscataway, NJ, USA, 672-681.

+

Oleksii Kononenko, Olga Baysal, and Michael W. Godfrey. 2016. Code review quality: how developers see it. In Proceedings of the 38th International Conference on Software Engineering (ICSE '16). ACM, New York, NY, USA, 1028-1038.

+

Raphael Pham, Stephan Kiesling, Olga Liskin, Leif Singer, and Kurt Schneider. 2014. Enablers, inhibitors, and perceptions of testing in novice software teams. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2014). ACM, New York, NY, USA, 30-40.

+

Peter C. Rigby and Margaret-Anne Storey. 2011. Understanding broadcast based peer review on open source software projects. In Proceedings of the 33rd International Conference on Software Engineering (ICSE '11). ACM, New York, NY, USA, 541-550.

+

Peter C. Rigby and Christian Bird. 2013. Convergent contemporary software peer review practices. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2013). ACM, New York, NY, USA, 202-212.

+

Carolyn B. Seaman and Victor R. Basili. 1997. An empirical study of communication in code inspections. In Proceedings of the 19th international conference on Software engineering (ICSE '97). ACM, New York, NY, USA, 96-106.

+ +
+ + + + + + +