Minor revisions.

Andy Ko 2017-04-26 20:36:15 -07:00
parent ed6435e8e7
commit 0ae870795d


@ -35,11 +35,13 @@
<p>One of the most common ways of managing change is to <strong>refactor</strong> code. Refactoring helps developers modify the <em>architecture</em> of a program while keeping its behavior the same, enabling them to implement or modify functionality more easily. For example, one of the most common and simple refactorings is to rename a variable (renaming its definition and all of its uses). This doesn't change the architecture of a program at all, but does improve its readability. Other refactors can be more complex. For example, consider adding a new parameter to a function: all calls to that function need to pass that new parameter, which means you need to go through each call and decide on a value to send from that call site. Studies of refactoring in practice have found that refactorings can be big and small, that they don't always preserve the behavior of a program, and that developers perceive them as involving substantial costs and risks (<a href="#kim">Kim et al. 2012</a>).</p>
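<p>As a minimal sketch of the two refactorings described above (renaming a variable and adding a parameter), consider the following Python. The function names <code>total_before</code> and <code>total_after</code>, the <code>tax_rate</code> parameter, and the 10% rate are hypothetical and chosen only for illustration.</p>

```python
# Before: a terse variable name and a tax rate hard-coded in the body.
def total_before(prices):
    t = 0.0
    for p in prices:
        t += p
    return t * 1.1


# After: `t` is renamed to `subtotal` (behavior unchanged, readability improved)
# and a new `tax_rate` parameter replaces the hard-coded rate.
def total_after(prices, tax_rate):
    subtotal = 0.0
    for price in prices:
        subtotal += price
    return subtotal * (1 + tax_rate)


# Adding the parameter forces a decision at every call site:
# each caller must now choose which rate to pass.
cart_total = total_after([10.0, 20.0], 0.1)
```

<p>Passing <code>0.1</code> at this call site preserves the old behavior; a different call site might reasonably pass a different rate, which is why each one has to be inspected.</p>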
<p>Another fundamental way that developers manage change is <strong>version control</strong> systems. As you know, they help developers track changes to code, allowing them to revert, merge, fork, and clone projects in a way that is traceable and reliable. While today the most popular version control system is Git, there are actually many types. Some are <em>centralized</em>, representing one single ground truth of a project's code, usually stored on a server. Commits to centralized repositories become immediately available to everyone else on a project. Other version control systems are <em>distributed</em>, such as Git, allowing one copy of a repository on every local machine. Commits to these local copies don't automatically go to everyone else; rather, they are pushed to some central copy, from which others can pull updates.</p>
<p>Research comparing centralized and distributed version control systems mostly reveals tradeoffs rather than a clear winner. Distributed version control, for example, appears to lead to commits that are smaller and more scoped to single changes, since developers can manage their own history of commits to their local repository (<a href="#brindescu">Brindescu et al. 2014</a>). Google uses one big centralized version control repository for all of its projects, however, because it offers one source of truth, simplified dependency management, large-scale refactoring, and flexible team boundaries (<a href="#potvin">Potvin & Levenberg 2016</a>).</p>
<p>When code changes, you need to test it, which often means you need to <strong>build</strong> it, compiling source, data, and other resources into an executable format suitable for testing (and possibly release). Build systems can be as simple as nothing (e.g., loading an HTML file in a web browser interprets the HTML and displays it, requiring no special preparation) and as complex as hundreds or thousands of lines of build script code that compile, link, and manage files to prepare a system for testing, such as the scripts used to build operating systems like Windows or Linux. To write these complex build procedures, developers use build automation tools like <code>make</code>, <code>ant</code>, <code>gulp</code>, and dozens of others. In large companies, there are whole teams that maintain build automation scripts to ensure that developers can always quickly build and test. In these teams, most of the challenges are social and not technical: teams need to resolve role ambiguity and manage knowledge sharing, communication, trust, and conflict in order to be productive, just like other software engineering teams (<a href="#phillips">Phillips et al. 2014</a>).</p>
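<p>One core idea behind tools like <code>make</code> is to rebuild an output only when it is missing or older than one of its inputs. The sketch below illustrates that idea in Python; the function name <code>needs_rebuild</code> is hypothetical, and real build tools track far more than file timestamps.</p>

```python
import os


def needs_rebuild(target, sources):
    """Return True if `target` is missing or older than any file in `sources`."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    # A source modified after the target was produced makes the target stale.
    return any(os.path.getmtime(source) > target_time for source in sources)
```

<p>A build script could call this check before each compile step and skip steps whose outputs are still up to date, which is one way large builds avoid redoing unnecessary work.</p>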
<p>Perhaps the most modern form of build practice is <strong>continuous integration</strong>. This is the idea of completely automating not only builds, but also the running of a collection of tests, every time a bundle of changes is pushed to a central version control repository. The claimed benefit of continuous integration is that every major change is quickly built, tested, and ready for deployment, shortening the time between a change and the discovery of failures. This only works if builds are fast. For example, some large projects like Windows can take a whole day to build, making continuous integration of the whole operating system infeasible. When builds and tests are fast, continuous integration can accelerate development, especially in projects with large numbers of contributors (<a href="#vasilescu">Vasilescu et al. 2015</a>).</p>
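<p>The core loop of continuous integration can be sketched as a pipeline that runs on every push and stops at the first failing step. This is only an illustration: <code>run_pipeline</code> and the step names are hypothetical, and real continuous integration services add scheduling, isolation, and reporting on top of this idea.</p>

```python
def run_pipeline(steps):
    """Run (name, step) pairs in order, stopping at the first failure.

    Each step is a zero-argument callable that returns True on success.
    Returns (True, None) if every step passes, else (False, failed_step_name).
    """
    for name, step in steps:
        if not step():
            return False, name
    return True, None


# Hypothetical steps standing in for a real build and a real test suite.
def build():
    return True


def run_tests():
    return True


# A CI server would trigger this on each push to the central repository.
passed, failed_step = run_pipeline([("build", build), ("test", run_tests)])
```

<p>Because the pipeline runs in full on every push, its total running time bounds how quickly a failure can be discovered, which is why slow builds undermine the practice.</p>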
<p>One last problem with changes in software is managing the <strong>releases</strong> of software. Good release management should archive new versions of software, automatically post the version online, make the version accessible to users, keep a history of who accesses the new version, and provide clear release notes describing changes from the previous version (<a href="#vanderhoek">van der Hoek et al. 1997</a>). By default, all of this is quite manual, but many of these steps can be automated, streamlining how teams release changes to the world. You've probably encountered these most in the form of software updates to applications and operating systems.</p>
@ -54,7 +56,7 @@
<p id="phillips">Shaun Phillips, Thomas Zimmermann, and Christian Bird. 2014. <a href="http://dx.doi.org/10.1145/2568225.2568274">Understanding and improving software build teams</a>. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 735-744.</p>
<p id="potvin">Potvin, R., & Levenberg, J. (2016). <a href="">Why Google stores billions of lines of code in a single repository</a>. Communications of the ACM, 59(7), 78-87.</p>
<p id="vasilescu">Bogdan Vasilescu, Yue Yu, Huaimin Wang, Premkumar Devanbu, and Vladimir Filkov. 2015. <a href="https://doi.org/10.1145/2786805.2786850">Quality and productivity outcomes relating to continuous integration in GitHub</a>. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). ACM, New York, NY, USA, 805-816.</p>
<p id="vanderhoek">Andr&eacute; van der Hoek, Richard S. Hall, Dennis Heimbigner, and Alexander L. Wolf. 1997. <a href="https://doi.org/10.1145/267896.267909">Software release management</a>. ACM SIGSOFT International Symposium on Foundations of Software Engineering, 159-175.</p>
</small>