<html><head>
<title>
Chistory
</title></head><body>
<H1>The Development of the C Language*
</H1>
<DL><DD><I>Dennis M. Ritchie<br>
Bell Labs/Lucent Technologies<br>
Murray Hill, NJ 07974 USA<br>
<br>&#32;<br>
<TT>dmr@bell-labs.com</TT>
</I></DL>
<DL><DD><H4>ABSTRACT</H4>
The C programming language was devised in the early 1970s as a system
implementation language for the nascent Unix operating system.
Derived from the typeless language BCPL, it evolved
a type structure;
created on a tiny machine as a tool to improve
a meager programming environment, it has become
one of the dominant languages of today.
This paper studies its evolution.
</DL>
<H4>Introduction
</H4>
<P>
</P>
<DL>
<DT>&#32;<DD><DT><DD>
NOTE:<I> *Copyright 1993 Association for Computing Machinery, Inc.
This electronic reprint made available by the author as a courtesy.
For further publication rights contact ACM or the author.
This article was presented at Second History of Programming
Languages conference, Cambridge, Mass., April, 1993.
</I><DT>&#32;<DD></dl>
<br>
This paper is about the development of the C programming language,
the influences on it,
and the conditions under which it was created.
For the sake of brevity, I omit full descriptions of C itself,
its parent B [Johnson 73] and its grandparent BCPL [Richards 79],
and instead concentrate on characteristic elements
of each language and how they evolved.
<P>
C came into being in the years 1969-1973,
in parallel with the early development
of the Unix operating system;
the most creative period occurred during 1972.
Another spate of changes peaked between 1977 and 1979,
when portability of the Unix system was being demonstrated.
In the middle of this second period, the first widely available description
of the language appeared:
<I>The C Programming Language,</I>
often called the `white book' or `K&amp;R' [Kernighan 78].
Finally, in the middle 1980s, the language was officially standardized
by the ANSI X3J11 committee, which made further changes.
Until the early 1980s, although compilers existed for a variety
of machine architectures and operating systems, the language was almost exclusively
associated with Unix;
more recently, its use has spread much more widely, and today it
is among the languages most commonly used throughout the computer industry.
</P>
<H4>History: the setting
</H4>
<P>
The late 1960s were a turbulent era
for computer systems research at Bell Telephone Laboratories
[Ritchie 78] [Ritchie 84].
The company was pulling
out of the Multics project [Organick 75], which had started as a joint venture
of MIT, General Electric, and Bell Labs; by 1969, Bell Labs management,
and even the researchers, came to believe
that the promises of Multics could be fulfilled
only too late and too expensively. Even before the GE-645 Multics
machine was removed from the premises, an informal group, led
primarily by Ken Thompson, had begun investigating alternatives.
</P>
<P>
Thompson wanted to create a comfortable computing
environment constructed according to his own design, using whatever
means were available. His plans, it is evident in retrospect,
incorporated many of the innovative aspects of Multics, including an
explicit notion of a process as a locus of control,
a tree-structured file system, a command interpreter
as user-level program, simple representation of text files, and generalized
access to devices.
They excluded others, such as unified access to
memory and to files. At the start, moreover, he and the rest
of us deferred another pioneering (though not original)
element of Multics, namely writing almost exclusively in a
higher-level language.
PL/I, the implementation language of Multics, was not
much to our tastes, but we were also using other languages, including BCPL, and
we regretted losing the advantages of writing programs in a
language above the level of assembler, such as
ease of writing and clarity
of understanding.
At the time we did not put much weight
on portability; interest in this arose later.
</P>
<P>
Thompson was faced with
a hardware environment cramped and spartan even for the time:
the DEC PDP-7 on which he started in 1968 was a machine with 8K 18-bit words
of memory and no software useful to him.
While wanting to use a higher-level language,
he wrote the original Unix system in PDP-7 assembler.
At the start, he did not even program
on the PDP-7 itself, but instead used a set of macros
for the GEMAP assembler on a
GE-635 machine.
A postprocessor
generated a paper tape readable by the PDP-7.
</P>
<P>
These tapes were
carried from the GE machine to the PDP-7 for testing until a primitive Unix kernel,
an editor, an assembler, a simple shell (command interpreter), and a few utilities
(like
the Unix
<I>rm, cat, cp</I>
commands)
were completed. After this point, the operating system was self-supporting:
programs could be written and tested without resort to paper tape,
and development continued on the PDP-7 itself.
</P>
<P>
Thompson's PDP-7 assembler outdid even DEC's in simplicity;
it evaluated expressions and emitted the corresponding bits.
There were no libraries,
no loader or link editor: the entire source of a program was presented to
the assembler, and the output file (with a fixed name) that emerged was directly
executable.
(This name,
<I>a.out</I>,
explains a bit of Unix etymology;
it is the output of the assembler.
Even after the system gained a linker
and a means of specifying another name explicitly,
it was retained as the default executable
result of a compilation.)
</P>
<P>
Not long after Unix first ran on the PDP-7, in 1969, Doug McIlroy created
the new system's first higher-level language: an implementation of
McClure's TMG [McClure 65]. TMG is a language for writing compilers
(more generally, TransMoGrifiers) in a top-down, recursive-descent
style that combines context-free syntax notation with
procedural elements. McIlroy and Bob Morris had used TMG to write the early
PL/I compiler for Multics.
</P>
<P>
Challenged by McIlroy's feat in reproducing TMG,
Thompson decided that Unix&#151;possibly it had not even been
named yet&#151;needed a system programming language.
After a rapidly scuttled attempt at Fortran,
he created instead a language of his own,
which he called B.
B can be thought of
as C without types; more accurately, it is BCPL squeezed into 8K bytes of memory
and filtered through Thompson's brain.
Its name most probably represents
a contraction of BCPL, though
an alternate theory holds that
it derives from Bon [Thompson 69],
an unrelated language created by
Thompson during the Multics days.
Bon in turn was named either after his wife Bonnie,
or (according to an encyclopedia quotation in its manual),
after a religion whose rituals involve the murmuring of magic formulas.
</P>
<H4>Origins: the languages
</H4>
<P>
BCPL was
designed by Martin Richards in the mid-1960s while he was visiting MIT,
and was used during the early 1970s
for several interesting projects, among them the OS6 operating system
at Oxford [Stoy 72], and parts of the seminal Alto work at Xerox PARC [Thacker 79].
We became familiar with it
because the
MIT CTSS system [Corbato 62] on which Richards worked was used for Multics development.
The original BCPL compiler was transported both to Multics and to the GE-635
GECOS system
by Rudd Canaday
and others at Bell Labs [Canaday 69];
during the final throes of Multics's life at Bell Labs
and immediately after, it was the language of choice
among the group of people who would later become involved with Unix.
</P>
<P>
BCPL, B, and C all fit firmly in the traditional
procedural family typified by Fortran and Algol 60.
They are particularly oriented towards
system programming, are small and compactly described,
and are amenable to translation by simple compilers. They are `close
to the machine' in that the abstractions they introduce are readily
grounded in the concrete data types and operations supplied by
conventional computers, and they rely on library routines
for input-output and other interactions with an operating system.
With less success, they also use library procedures to specify interesting
control constructs such as coroutines and procedure
closures. At the same time, their abstractions lie at a sufficiently
high level that, with care, portability between machines can
be achieved.
</P>
<P>
BCPL, B and C differ syntactically in many details, but broadly
they are similar.
Programs consist of a sequence of
global declarations and function (procedure) declarations.
Procedures can be nested in BCPL, but may not refer to non-static
objects defined in containing procedures.
B and C avoid this restriction by imposing a more severe one:
no nested procedures at all.
Each of the languages (except for the earliest versions of B)
recognizes
separate compilation, and provides a means for including
text from named files.
</P>
<P>
Several syntactic and lexical mechanisms of BCPL are
more elegant and regular than those of B and C.
For example, BCPL's procedure and data declarations
have a more uniform structure, and it supplies a more complete
set of looping constructs.
Although BCPL programs are notionally supplied from an undelimited
stream of characters, clever rules allow most semicolons to be elided
after statements that end on a line boundary.
B and C omit this convenience, and end
most statements with semicolons.
In spite of the differences, most of the statements and operators of BCPL map
directly into corresponding B and C.
</P>
<P>
Some of the structural differences between BCPL and B
stemmed from limitations on intermediate memory.
For example, BCPL declarations may take the form
<DL><DT><DD><TT><PRE>
let P1 be <I>command</I>
and P2 be <I>command</I>
and P3 be <I>command</I>
...
</PRE></TT></DL>
where the program text represented by the commands
contains whole procedures.
The subdeclarations
connected by
<TT>and</TT>
occur simultaneously, so the name
<TT>P3</TT>
is known inside procedure
<TT>P1</TT>.
Similarly, BCPL can package a group of declarations and statements into
an expression that yields a value, for example
<DL><DT><DD><TT><PRE>
E1 := valof $( <I>declarations</I> ; <I>commands</I> ; resultis E2 $) + 1
</PRE></TT></DL>
The BCPL compiler readily handled such constructs
by storing and analyzing a parsed representation of the entire
program in memory before producing output.
Storage limitations on the B compiler demanded a one-pass technique in which
output was generated as soon as possible, and the syntactic
redesign that made this possible was carried forward into C.
</P>
<P>
Certain less pleasant aspects of BCPL owed to its own technological
problems and were consciously
avoided in the design of B.
For example, BCPL uses a `global vector' mechanism for communicating
between separately compiled programs.
In this scheme,
the programmer explicitly associates the name of each externally visible
procedure and data object with a numeric offset in the global
vector; the linkage is accomplished in the compiled code
by using these numeric offsets.
B evaded this inconvenience initially by insisting that the entire
program be presented all at once to the compiler.
Later implementations of B, and all those of C, use a conventional
linker to resolve external names occurring in files compiled separately,
instead of placing the burden of assigning offsets on the programmer.
</P>
<P>
Other fiddles in the transition from BCPL to B were introduced as
a matter of taste, and some remain controversial, for example the decision
to use the single character
<TT>=</TT>
for assignment instead of
<TT>:=</TT>.
Similarly, B uses
<TT>/**/</TT>
to enclose comments, where BCPL uses
<TT>//</TT>
to ignore text up to the end of the line.
The legacy of PL/I is evident here.
(C++ has resurrected the BCPL comment convention.)
Fortran influenced the syntax
of declarations:
B declarations begin with a specifier
like
<TT>auto</TT>
or
<TT>static</TT>,
followed by a list of names, and C
not only followed this style but ornamented it by
placing its type keywords at the start of declarations.
</P>
<P>
Not every difference between the BCPL language documented in
Richards's book
[Richards 79]
and B was deliberate;
we started from an earlier version of BCPL [Richards 67].
For example, the
<TT>endcase</TT>
that escapes from a BCPL
<TT>switchon</TT>
statement was not present in the language when we learned it
in the 1960s,
and so the overloading of the
<TT>break</TT>
keyword
to escape from the B and C
<TT>switch</TT>
statement
owes to divergent
evolution rather than conscious change.
</P>
<P>
In contrast to the pervasive syntax variation that occurred
during the creation of B, the core semantic content of BCPL (its
type structure and expression evaluation rules) remained intact.
Both languages are typeless, or rather have a single data type,
the `word,' or `cell,' a fixed-length bit pattern. Memory in these languages
consists of a linear array of such cells, and the meaning of
the contents of a cell depends on the operation applied.
The
<TT>+</TT>
operator, for example, simply adds its operands using the machine's
integer add instruction, and the other arithmetic
operations are equally unconscious of the actual meaning
of their operands. Because memory is a linear array, it is possible
to interpret the value in a cell as an index in this array,
and BCPL supplies an operator for this purpose. In the
original language it was spelled
<TT>rv</TT>,
and later
<TT>!</TT>,
while B uses the unary
<TT>*</TT>.
Thus, if
<TT>p</TT>
is a cell containing
the index of (or address of, or pointer to) another cell,
<TT>*p</TT>
refers to the contents of the pointed-to cell, either
as a value in an expression or as the target of
an assignment.
</P>
<P>
Because pointers in BCPL and B are merely integer indices
in the memory array, arithmetic on them is meaningful:
if
<TT>p</TT>
is the address of a cell, then
<TT>p+1</TT>
is the address of
the next cell.
This convention is the basis for the semantics
of arrays in both languages. When in BCPL one writes
<DL><DT><DD><TT><PRE>
let V = vec 10
</PRE></TT></DL>
or in B,
<DL><DT><DD><TT><PRE>
auto V[10];
</PRE></TT></DL>
the effect is the same: a cell named
<TT>V</TT>
is allocated, then
another group of 10 contiguous cells is set aside, and the memory index
of the first of these is placed into
<TT>V</TT>.
By a general rule, in B the expression
<DL><DT><DD><TT><PRE>
*(V+i)
</PRE></TT></DL>
adds
<TT>V</TT>
and
<TT>i</TT>,
and refers to the
<TT>i</TT>-th
location after
<TT>V</TT>.
BCPL and B each add special notation to sweeten such array accesses;
in B an equivalent expression is
<DL><DT><DD><TT><PRE>
V[i]
</PRE></TT></DL>
and in BCPL
<DL><DT><DD><TT><PRE>
V!i
</PRE></TT></DL>
This approach to arrays was unusual even at the time;
C would later assimilate it in an
even less conventional way.
</P>
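<P>
Modern C keeps the same general rule: adding an integer to a pointer yields the address of a later element, a subscript is defined in terms of that addition, and the result of the indirection may serve either as a value or as the target of an assignment. The following is only a small illustrative sketch in today's C; the function names are hypothetical.
</P>

```c
#include <stddef.h>

/* Store through the explicit pointer-arithmetic form:
   *(v + i) is the target of the assignment. */
void set_via_pointer(int *v, size_t i, int x)
{
    *(v + i) = x;
}

/* Read back through the subscript form, which C defines as *(v + i). */
int get_via_subscript(const int *v, size_t i)
{
    return v[i];
}
```

<P>
Because the subscript is defined by the addition, <TT>v[3]</TT>, <TT>*(v+3)</TT>, and even the curious <TT>3[v]</TT> all designate the same element.
</P>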
<P>
None of BCPL, B, or C supports character data strongly
in the language; each treats strings
much like vectors of integers and supplements general rules by
a few conventions.
In both BCPL and B a string literal denotes the address of a
static area initialized with the characters of the string,
packed into cells.
In BCPL, the first packed byte contains the number of characters
in the string;
in B, there is no count and strings are terminated by
a special character, which B spelled
`<TT>*e</TT>'.
This change was made partly to avoid the limitation
on the length of a string caused by holding the count
in an 8- or 9-bit slot, and partly because maintaining
the count seemed, in our experience, less convenient than using a
terminator.
</P>
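<P>
The two string representations can be sketched in C. This is a minimal illustration that assumes an 8-bit count byte for the BCPL-style form and uses C's <TT>'\0'</TT> terminator in place of B's <TT>*e</TT>; the function names are hypothetical. With a one-byte count no string can exceed 255 characters, while a terminated string has no such limit but must be scanned to find its length.
</P>

```c
#include <stddef.h>

/* BCPL-style string: the first byte holds the character count,
   so an 8-bit count limits the string to 255 characters. */
size_t counted_length(const unsigned char *s)
{
    return s[0];
}

/* Terminated string: scan until the terminator (NUL in C). */
size_t terminated_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```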
<P>
Individual characters in a BCPL string were usually manipulated
by spreading the string out into another array, one character per cell,
and then repacking it later;
B provided corresponding routines, but people more often used
other library functions that accessed or replaced individual
characters in a string.
</P>
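<P>
The spread-and-repack technique, and the alternative of fetching a single character directly, might look like this in C. The packing layout (two 8-bit characters per 16-bit cell, the first in the low byte) and all names are assumptions chosen for illustration; they do not reproduce any actual BCPL or B library.
</P>

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: two 8-bit characters per 16-bit cell,
   the first character of each pair in the low byte. */

/* Fetch character i directly from the packed string. */
unsigned char_at(const uint16_t *packed, size_t i)
{
    return (packed[i / 2] >> (8 * (i % 2))) & 0xFFu;
}

/* Spread n packed characters out, one character per cell. */
void unpack(const uint16_t *packed, size_t n, uint16_t *cells)
{
    for (size_t i = 0; i < n; i++)
        cells[i] = (uint16_t)char_at(packed, i);
}

/* Repack n one-per-cell characters into the two-per-cell form. */
void repack(const uint16_t *cells, size_t n, uint16_t *packed)
{
    for (size_t i = 0; i < n; i += 2) {
        unsigned lo = cells[i] & 0xFFu;
        unsigned hi = (i + 1 < n) ? (cells[i + 1] & 0xFFu) : 0u;
        packed[i / 2] = (uint16_t)(lo | (hi << 8));
    }
}
```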
<H4>More History
</H4>
<P>
After the TMG version of B was working, Thompson rewrote B in itself
(a bootstrapping step).
During development, he continually struggled against memory limitations:
each language addition
inflated the compiler so it could barely fit, but each
rewrite taking advantage of the feature reduced its size.
For example, B introduced generalized assignment operators, using
<TT>x=+y</TT>
to add
<TT>y</TT>
to
<TT>x</TT>.
The notation came from
Algol 68 [Wijngaarden 75] via McIlroy, who had incorporated
it into his version of TMG.
(In B and early C, the operator was spelled
<TT>=+</TT>
instead of
<TT>+=</TT>;
this mistake, repaired in 1976, was induced by a seductively easy
way of handling the first form in B's lexical analyzer.)
</P>
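<P>
The reversal is visible in today's C: the compound operator is spelled <TT>+=</TT>, and the old B spelling <TT>x=+y</TT> is now read as <TT>=</TT> followed by a unary <TT>+</TT>, which is an ordinary assignment of <TT>y</TT>. A small sketch follows; the function names are hypothetical.
</P>

```c
/* Modern spelling: x becomes x + y. */
int add_compound(int x, int y)
{
    x += y;
    return x;
}

/* The old B spelling now lexes as '=' then unary '+':
   x = (+y), so x simply receives y. */
int old_spelling(int x, int y)
{
    x =+ y;
    return x;
}
```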
<P>
Thompson went a step further by inventing the
<TT>++</TT>
and
<TT>--</TT>
operators, which increment or decrement;
their prefix
or postfix position determines whether the alteration
occurs before or after noting the value of the operand.
They were not in the earliest versions of B, but appeared
along the way.
People often guess that
they were created to use the auto-increment and
auto-decrement address modes provided by the DEC PDP-11 on which C and Unix
first became popular.
This is historically impossible, since there was no PDP-11
when B was developed.
The PDP-7, however,
did have a few `auto-increment' memory cells, with the property
that an indirect memory reference through them incremented the cell.
This feature probably suggested such operators to Thompson;
the generalization to make them both prefix and postfix
was his own.
Indeed, the auto-increment cells were not used directly in implementation of the
operators, and a stronger motivation for the innovation was probably
his observation that
the translation of
<TT>++x</TT>
was smaller than that of
<TT>x=x+1</TT>.
</P>
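<P>
The distinction between prefix and postfix position can be shown with a short C sketch; the helper names are hypothetical. The prefix form yields the value after the increment, the postfix form the value noted before it.
</P>

```c
/* Prefix: increment first, then yield the new value. */
int use_prefix(int *x)
{
    return ++*x;
}

/* Postfix: note the old value, then increment. */
int use_postfix(int *x)
{
    return (*x)++;
}
```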
<P>
The B compiler on the PDP-7 did not generate machine instructions,
but instead `threaded code' [Bell 72], an interpretive scheme in which
the compiler's output consists
of a sequence of addresses of code fragments that perform the
elementary operations.
The operations typically (in particular for B) act on a simple stack machine.
</P>
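<P>
The threaded-code idea can be sketched in C with function pointers. In this hypothetical example the "compiled" program is an array of addresses of code fragments, with inline operands interleaved, and a small driver loop calls each fragment in turn; the fragments operate on a simple stack. The names, the operand encoding, and the NULL terminator are assumptions for illustration, not a description of the actual B run-time system.
</P>

```c
#include <stddef.h>

typedef void (*fragment)(void);

/* One cell of threaded code: a fragment address or an inline operand. */
typedef union {
    fragment op;
    long value;
} cell;

static long stack[16];
static int sp;              /* number of values on the stack */
static const cell *ip;      /* next cell of threaded code to execute */

/* Code fragments for the elementary operations. */
static void push(void) { stack[sp++] = (ip++)->value; }   /* operand follows */
static void add(void)  { sp--; stack[sp - 1] += stack[sp]; }
static void mul(void)  { sp--; stack[sp - 1] *= stack[sp]; }

/* Driver: call each fragment in turn until a NULL address is reached. */
long run(const cell *code)
{
    sp = 0;
    ip = code;
    while (ip->op != NULL) {
        fragment f = (ip++)->op;
        f();
    }
    return stack[sp - 1];
}

/* Threaded code for (2 + 3) * 4. */
long run_example(void)
{
    static const cell prog[] = {
        { .op = push }, { .value = 2 },
        { .op = push }, { .value = 3 },
        { .op = add },
        { .op = push }, { .value = 4 },
        { .op = mul },
        { .op = NULL },
    };
    return run(prog);
}
```

<P>
Here <TT>run_example()</TT> evaluates (2 + 3) * 4 and returns 20.
</P>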
<P>
On the PDP-7 Unix system, only a few things were written in B except B itself,
because the machine was too small and too slow to do more than
experiment; rewriting the operating system and the utilities
wholly into B was too expensive a step to
seem feasible.
At some point Thompson relieved the address-space crunch by offering a
`virtual B' compiler that allowed the interpreted program to occupy more than 8K bytes
by paging the code and data within the interpreter,
but it was too slow to be practical for the common utilities.
Still, some utilities written in B appeared, including an early version of
the variable-precision calculator
<I>dc</I>
familiar to Unix users [McIlroy 79].
The most ambitious enterprise I undertook was a genuine
cross-compiler that translated B to GE-635 machine instructions, not threaded code.
It was a small
<I>tour de force</I>:
a full B compiler, written in its
own language and generating code for a 36-bit mainframe,
that ran on an 18-bit machine with 4K words of user address space.
This project was possible only because of the simplicity
of the B language and its run-time system.
</P>
<P>
Although we entertained occasional thoughts
about implementing one of the major languages of the time like Fortran,
PL/I, or Algol 68, such a project seemed hopelessly large for our resources:
much simpler and smaller tools were called for.
All these languages influenced our work,
but it was more fun to do things on our own.
</P>
<P>
By 1970, the Unix project had shown enough promise that we were
able to acquire the new DEC PDP-11.
The processor was among the first of its line delivered by DEC, and three months
passed before its disk arrived.
Making B programs
run on it using the threaded technique
required only writing the code fragments for the operators,
and a simple assembler which I coded in B;
soon,
<I>dc</I>
became the first
interesting program to be tested, before any operating system, on our PDP-11.
Almost as rapidly, still waiting for the disk, Thompson recoded
the Unix kernel and some basic commands in PDP-11 assembly language.
Of the 24K bytes of memory on the machine, the earliest PDP-11 Unix system
used 12K bytes for the operating system,
a tiny space for user programs, and the remainder as a RAM disk.
This version was only for testing, not for real work;
the machine marked time by enumerating closed knight's
tours on chess boards of various sizes.
Once its disk appeared, we quickly migrated to it after
transliterating assembly-language commands to the PDP-11 dialect, and
porting those already in B.
</P>
<P>
By 1971, our miniature computer center was beginning to have users.
We all wanted to create interesting software more easily.
Using assembler was dreary enough that B, despite its performance
problems, had been supplemented by a small library of useful service routines
and was being used for more and more new programs.
Among the more notable results of this period was Steve Johnson's
first version of the
<I>yacc</I>
parser-generator [Johnson 79a].
</P>
<H4>The Problems of B
</H4>
<P>
The machines on which we first used BCPL and then B were word-addressed,
and these languages' single data type, the `cell,' comfortably
equated with the hardware machine word.
The advent of the PDP-11 exposed several inadequacies of B's semantic model.
First, its character-handling mechanisms, inherited with few changes from BCPL,
were clumsy:
using library procedures to spread packed strings into individual
cells and then repack, or to access and replace
individual characters,
began to feel awkward, even silly, on a byte-oriented machine.
</P>
<P>
Second, although the original PDP-11 did not provide for floating-point
arithmetic,
the manufacturer promised that it would soon be available.
Floating-point operations had been added to BCPL
in our Multics and GCOS compilers by defining
special operators, but the mechanism was possible
only because on the relevant machines, a single word
was large enough to contain a floating-point number;
this was not true on the 16-bit PDP-11.
</P>
<P>
Finally, the B and BCPL model implied overhead in dealing
with pointers: the language rules, by defining a pointer
as an index in an array of words, forced pointers to be represented
as word indices.
Each pointer reference
generated a run-time scale conversion from the pointer to the
byte address expected by the hardware.
</P>
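<P>
The overhead can be illustrated with a minimal C sketch, assuming 16-bit words of two bytes each (as on the PDP-11) and treating a B-style pointer as a word index; the names are hypothetical. Every use of such a pointer would first need a conversion like this one before the hardware could address memory.
</P>

```c
/* Assumption: 2 bytes per 16-bit word, as on the PDP-11. */
enum { BYTES_PER_WORD = 2 };

/* A B-style pointer is an index into an array of words, but the hardware
   expects a byte address, so the index is scaled on every reference. */
unsigned long word_index_to_byte_address(unsigned long word_index)
{
    return word_index * BYTES_PER_WORD;   /* equivalently, a left shift by 1 */
}
```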
<P>
For all these reasons, it seemed that a typing scheme
was necessary to cope
with characters and byte addressing, and to prepare for the
coming floating-point hardware.
Other issues, particularly type safety and interface checking, did not
seem as important then as they became later.
</P>
<P>
Aside from the problems with the language itself, the B compiler's
threaded-code technique yielded programs
so much slower than their assembly-language counterparts
that we discounted the possibility of recoding the
operating system or its central utilities in B.
</P>
<P>
In 1971 I began to extend the B language by adding a character type
and also rewrote its compiler to generate PDP-11 machine instructions
instead of threaded code.
Thus the transition from B to C
was contemporaneous with the creation of a compiler
capable of producing programs fast and small enough
to compete with assembly language.
I called the slightly-extended language NB, for `new B.'
</P>
<H4>Embryonic C
</H4>
<P>
NB existed so briefly that no full description of
it was written.
It supplied the types
<TT>int</TT>
and
<TT>char</TT>,
arrays of them, and pointers to them, declared in a style typified by
<DL><DT><DD><TT><PRE>
int i, j;
char c, d;
int iarray[10];
int ipointer[];
char carray[10];
char cpointer[];
</PRE></TT></DL>
The semantics of arrays remained exactly as in B and BCPL:
the declarations of
<TT>iarray</TT>
and
<TT>carray</TT>
create cells dynamically initialized with a value pointing to the
first of a sequence of 10 integers and characters respectively.
The declarations for
<TT>ipointer</TT>
and
<TT>cpointer</TT>
omit the size, to assert that no storage should be allocated automatically.
Within procedures, the language's interpretation of
the pointers was identical to that of the array variables:
a pointer declaration created a cell differing from
an array declaration only in that the programmer was expected to assign
a referent, instead of letting the compiler allocate the space
and initialize the cell.
</P>
<P>
Values stored in the cells bound to
array and pointer names
were the machine addresses,
measured in bytes, of the corresponding storage area.
Therefore, indirection through a pointer implied no
run-time overhead to scale the pointer from word to byte offset.
On the other hand, the machine code for array subscripting and pointer arithmetic
now depended on the type of the array or the pointer:
to compute
<TT>iarray[i]</TT>
or
<TT>ipointer+i</TT>
implied scaling the addend
<TT>i</TT>
by the size of the object referred to.
</P>
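<P>
Modern C performs the same scaling. In the following sketch (function names hypothetical) the byte distance between <TT>p</TT> and <TT>p+i</TT> is <TT>i</TT> times the size of the pointed-to type, so it depends on whether the pointer refers to <TT>int</TT> or to <TT>char</TT>.
</P>

```c
#include <stddef.h>

/* The compiler scales i by sizeof(int), so this is i * sizeof(int). */
ptrdiff_t int_offset_in_bytes(int *p, ptrdiff_t i)
{
    return (char *)(p + i) - (char *)p;
}

/* sizeof(char) is 1, so no scaling is visible. */
ptrdiff_t char_offset_in_bytes(char *p, ptrdiff_t i)
{
    return (p + i) - p;
}
```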
<P>
These semantics represented an easy transition from B,
and I experimented with them for some months.
Problems became evident when I tried to extend the type notation, especially
to add structured (record) types.
Structures, it seemed, should map in an intuitive way
onto memory in the machine,
but in a
structure containing an array, there was no good place to stash the
pointer containing the base of the array, nor any
convenient way to arrange that it be initialized.
For example, the directory entries of early Unix systems
might be described in C as
<DL><DT><DD><TT><PRE>
struct {
    int  inumber;
    char name[14];
};
</PRE></TT></DL>
I wanted the structure not merely to characterize an abstract object
but also to describe a collection of bits that might be read from
a directory.
Where could the compiler hide the pointer to
<TT>name</TT>
that the semantics demanded?
Even if structures were thought of more abstractly,
and the space for pointers could be hidden somehow,
how could I handle the technical problem of properly initializing
these pointers when allocating a complicated object, perhaps one that specified
structures containing arrays containing structures to arbitrary depth?
</P>
<P>
The solution constituted the crucial jump
in the evolutionary chain between typeless BCPL and typed C.
It eliminated the
materialization of the pointer in storage, and instead caused the
creation of the pointer when the array name is mentioned in an expression.
The rule, which survives in today's C, is that values of array
type are converted, when they appear in expressions, into
pointers to the first of the objects making up the array.
</P>
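<P>
A short C sketch of the rule, reusing the directory-entry fields from the example above under a hypothetical structure tag: the array <TT>name</TT> is stored inline with no hidden pointer, and a pointer to its first element comes into existence only when <TT>name</TT> appears in an expression. The operand of <TT>sizeof</TT> is one of the exceptions where the conversion does not occur.
</P>

```c
#include <stddef.h>

/* Hypothetical tag; the fields follow the directory-entry example.
   The array is stored inline: no pointer to name is kept anywhere. */
struct dirent_sketch {
    int  inumber;
    char name[14];
};

/* In an expression, d->name converts to a pointer to its first element. */
char *name_start(struct dirent_sketch *d)
{
    return d->name;          /* same as &d->name[0] */
}
```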
<P>
This invention enabled most existing B code to continue
to work, despite the underlying shift in the language's semantics.
The few programs that assigned new values to
an array name to adjust its origin (possible in B and BCPL,
meaningless in C) were easily repaired.
More important, the new language retained a coherent and workable (if unusual)
explanation of the semantics of arrays, while opening the way to a
more comprehensive type structure.
</P>
<P>
The second innovation that most clearly
distinguishes C from its predecessors is
this fuller type structure and especially its expression in the syntax of declarations.
NB offered the basic types
<TT>int</TT>
and
<TT>char</TT>,
together with arrays of them, and pointers to them,
but no further ways of composition.
Generalization was required:
given an object of any type, it should
be possible to describe a new object that gathers several into an array,
yields it from a function, or is a pointer to it.
</P>
<P>
For each object of such a composed type, there
was already a way to mention the underlying object:
index the array,
call the function, use the indirection operator on the pointer.
Analogical reasoning led to a declaration syntax for names
mirroring that of the expression syntax in which the names typically appear.
Thus,
<DL><DT><DD><TT><PRE>
int i, *pi, **ppi;
</PRE></TT></DL>
declare an integer, a pointer to an integer, a pointer to
a pointer to an integer.
The syntax of these declarations reflects the
observation that
<TT>i</TT>,
<TT>*pi</TT>,
and
<TT>**ppi</TT>
all yield an
<TT>int</TT>
type when used in an expression. Similarly,
<DL><DT><DD><TT><PRE>
int f(), *f(), (*f)();
</PRE></TT></DL>
declare a function returning an integer, a function returning
a pointer to an integer, a pointer to a function returning
an integer;
<DL><DT><DD><TT><PRE>
int *api[10], (*pai)[10];
</PRE></TT></DL>
declare an array of pointers to integers, and a pointer to
an array of integers.
In all these cases the declaration of a variable resembles
its usage in an expression whose type is the one named at the head of
the declaration.
</P>
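<P>
The declarations above can be combined into one small, hypothetical C function in which each variable is used exactly as it is declared, and each such use yields an <TT>int</TT>. The sketch uses the later ANSI prototype form <TT>(void)</TT> for the function and the function pointer.
</P>

```c
/* A hypothetical helper so there is a function to point at. */
static int answer(void)
{
    return 42;
}

int declarators_demo(void)
{
    int i = 7, *pi = &i, **ppi = &pi;  /* i, *pi and **ppi are all ints */
    int (*pf)(void) = answer;           /* (*pf)() is an int */
    int row[10] = {0};
    int *api[10];                       /* *api[k] is an int */
    int (*pai)[10] = &row;              /* (*pai)[k] is an int */

    api[0] = &i;
    row[2] = 5;
    /* 7 + 42 + 7 + 5 */
    return **ppi + (*pf)() + *api[0] + (*pai)[2];
}
```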
<P>
The scheme of type composition adopted by C owes considerable debt
to Algol 68, although it did not, perhaps, emerge in a form
that Algol's adherents would approve of.
The central notion I captured from Algol was a type structure
based on atomic
types (including structures), composed into arrays, pointers (references),
and functions (procedures).
Algol 68's concept of unions
and casts also had an influence that appeared later.
</P>
<P>
After creating the type system, the associated
syntax, and the compiler for the new language,
I felt that it deserved a new name;
NB seemed insufficiently distinctive.
I decided to follow the single-letter style and called it C,
leaving open the question whether the name represented
a progression through the alphabet or through the letters in BCPL.
</P>
<H4>Neonatal C
</H4>
<P>
Rapid changes continued after the language had been named,
for example
the introduction of the
<TT>&amp;&amp;</TT>
and
<TT>||</TT>
operators.
In BCPL and B, the evaluation of expressions depends
on context: within
<TT>if</TT>
and other conditional statements that compare
an expression's value with zero,
these languages place a special interpretation on the
<TT>and</TT>
(<TT>&amp;</TT>)
and
<TT>or</TT>
(<TT>|</TT>)
operators.
In ordinary contexts, they operate bitwise, but
in the B statement
<DL><DT><DD><TT><PRE>
if (e1 &amp; e2) ...
</PRE></TT></DL>
the compiler must evaluate
<TT>e1</TT>
and if it is non-zero, evaluate
<TT>e2</TT>,
and if it too is non-zero, elaborate the statement dependent on
the
<TT>if</TT>.
The requirement descends recursively on
<TT>&amp;</TT>
and
<TT>|</TT>
operators within
<TT>e1</TT>
and
<TT>e2</TT>.
The short-circuit semantics of the Boolean operators in such
`truth-value' context seemed desirable,
but the overloading of the operators was difficult to explain and use.
At the suggestion of Alan Snyder,
I introduced the
<TT>&amp;&amp;</TT>
and
<TT>||</TT>
operators
to make the mechanism more explicit.
</P>
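<P>
In today's C the two meanings belong to separate operators. The minimal sketch below (names hypothetical) counts how often the right operand is evaluated: bitwise <TT>&amp;</TT> always evaluates both operands, whereas <TT>&amp;&amp;</TT> skips the right operand when the left one is zero.
</P>

```c
static int calls;   /* how many times rhs() has run */

static int rhs(void)
{
    calls++;
    return 1;
}

/* Bitwise &: both operands are always evaluated. */
int eval_bitwise(int lhs)
{
    return lhs & rhs();
}

/* Logical &&: rhs() is evaluated only if lhs is non-zero. */
int eval_logical(int lhs)
{
    return lhs && rhs();
}
```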
<P>
Their tardy introduction explains an
infelicity of C's precedence rules. In B one writes
<DL><DT><DD><TT><PRE>
if (a==b &amp; c) ...
</PRE></TT></DL>
to check whether
<TT>a</TT>
equals
<TT>b</TT>
and
<TT>c</TT>
is non-zero;
in such a conditional expression it is better that
<TT>&amp;</TT>
have lower precedence than
<TT>==</TT>.
In converting from B to C, one wants to replace
<TT>&amp;</TT>
by
<TT>&amp;&amp;</TT>
in such a statement;
to make the conversion less painful,
we decided to keep the precedence of the
<TT>&amp;</TT>
operator the same relative to
<TT>==</TT>,
and merely split the precedence of
<TT>&amp;&amp;</TT>
slightly from
<TT>&amp;</TT>.
Today, it seems that it would have been preferable to move
the relative precedences of
<TT>&amp;</TT>
and
<TT>==</TT>,
and thereby simplify a common C idiom:
to test a masked value
against another value, one must write
<DL><DT><DD><TT><PRE>
if ((a&amp;mask) == b) ...
</PRE></TT></DL>
where the inner parentheses are required but easily forgotten.
</P>
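<P>
The pitfall can be demonstrated directly. In this sketch, the function names are illustrative. Without the inner parentheses, the compiler reads <TT>a &amp; mask == b</TT> as <TT>a &amp; (mask == b)</TT>, which can give the wrong answer in either direction. Many compilers can warn about the unparenthesized form, but the language accepts it.
</P>
<DL><DT><DD><TT><PRE>
#include &lt;stdio.h&gt;

/* Parsed as a &amp; (mask == b): the comparison happens first. */
static int masked_equal_wrong(unsigned a, unsigned mask, unsigned b)
{
    return (int)(a &amp; mask == b);
}

/* The intended test: mask a, then compare the result with b. */
static int masked_equal_right(unsigned a, unsigned mask, unsigned b)
{
    return (a &amp; mask) == b;
}

int main(void)
{
    /* 0x2C &amp; 0x0F is 0x0C, so the intended answer is 1,
       but mask == b is 0 and a &amp; 0 is 0. */
    printf("%d %d\n", masked_equal_wrong(0x2C, 0x0F, 0x0C),
                      masked_equal_right(0x2C, 0x0F, 0x0C));  /* 0 1 */

    /* 0x2D &amp; 0x0F is 0x0D, not 0x0F, so the intended answer is 0,
       but mask == b is 1 and 0x2D &amp; 1 is 1. */
    printf("%d %d\n", masked_equal_wrong(0x2D, 0x0F, 0x0F),
                      masked_equal_right(0x2D, 0x0F, 0x0F));  /* 1 0 */
    return 0;
}
</PRE></TT></DL>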
<P>
Many other changes occurred around 1972-3, but the most important
was the introduction of the preprocessor,
partly at the urging of Alan Snyder [Snyder 74],
but also in recognition of the utility of the
file-inclusion mechanisms available in BCPL and PL/I.
Its original version was exceedingly simple,
and provided only included files and
simple string replacements:
<TT>#include</TT>
and
<TT>#define</TT>
of parameterless macros.
Soon thereafter, it was extended, mostly by Mike Lesk
and then by John Reiser,
to incorporate macros with arguments and conditional
compilation.
The preprocessor was originally considered an optional adjunct
to the language itself. Indeed, for some years,
it was not even invoked unless the source program contained
a special signal at its beginning.
This attitude persisted, and explains
both the incomplete integration of the syntax of the
preprocessor with the rest of the language
and the imprecision of its description in early reference
manuals.
</P>
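<P>
The stages of the preprocessor's growth can be illustrated in a few lines of Standard C. This is a sketch, not code from the period. <TT>BUFSIZE</TT>, <TT>MAX</TT>, <TT>TRACE</TT>, and the <TT>DEBUG</TT> flag are hypothetical names chosen for the example.
</P>
<DL><DT><DD><TT><PRE>
#include &lt;stdio.h&gt;

/* A parameterless macro: the earliest form, plain text replacement. */
#define BUFSIZE 512

/* A macro with arguments (a later extension).  Each parameter is
   parenthesized so that expression arguments keep their grouping. */
#define MAX(x, y) ((x) &gt; (y) ? (x) : (y))

/* Conditional compilation: choose code at translation time.
   DEBUG is a hypothetical flag, e.g. supplied with cc -DDEBUG. */
#ifdef DEBUG
#define TRACE(msg) fprintf(stderr, "trace: %s\n", msg)
#else
#define TRACE(msg) ((void)0)
#endif

int main(void)
{
    char buf[BUFSIZE];
    TRACE("starting");
    printf("buffer holds %zu bytes\n", sizeof buf);  /* 512 */
    printf("MAX(3, 7) = %d\n", MAX(3, 7));           /* 7 */
    return 0;
}
</PRE></TT></DL>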
<H4>Portability
</H4>
<P>
By early 1973, the essentials of
modern C were complete.
The language and compiler were strong enough to permit us to
rewrite the Unix kernel for the PDP-11 in C during the summer of that year.
(Thompson had made a brief attempt in 1972 to produce a system coded in an early version of
C, before structures were added, but gave up the effort.)
Also during this period, the compiler was retargeted to other nearby machines,
particularly the Honeywell 635 and IBM 360/370;
because the language could not live in isolation,
the prototypes for the modern libraries
were developed.
In particular, Lesk wrote a `portable I/O package' [Lesk 72]
that was later reworked to become the C `standard I/O' routines.
In 1978 Brian Kernighan and I published
<I>The C Programming Language</I>
[Kernighan 78].
Although it did not describe some additions
that soon became common, this book served as the language
reference until a formal standard was adopted more than
ten years later.
Although we worked closely together on this book, there was a clear division of labor:
Kernighan wrote almost all the expository material, while
I was responsible for the appendix containing the reference manual and
the chapter on interfacing with the Unix system.
</P>
<P>
During 1973-1980,
the language grew a bit:
the type structure gained unsigned, long, union, and enumeration types,
and structures became nearly first-class objects
(lacking only a notation for literals).
Equally important developments appeared in its environment and the accompanying
technology.
Writing the Unix kernel in C had given us enough confidence in the language's
usefulness and efficiency that we began to recode the
system's utilities and tools as well,
and then to move the most interesting among them to the other
platforms.
As described in [Johnson 78a], we discovered that the hardest problems
in propagating Unix tools lay not in the
interaction of the C language with new hardware,
but in adapting to the existing software of other
operating systems.
Thus Steve Johnson began to work on
<I>pcc</I>,
a C compiler intended to be easy to retarget to new machines [Johnson 78b],
while he, Thompson, and I began to move the Unix system itself to
the Interdata 8/32 computer.
</P>
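<P>
The following Standard C sketch shows what `nearly first-class' means in practice. It assigns a whole structure and passes and returns structures by value. The type <TT>struct point</TT> and the function <TT>add</TT> are illustrative. The members are filled in one at a time, because the language of this period still lacked a notation for structure literals.
</P>
<DL><DT><DD><TT><PRE>
#include &lt;stdio.h&gt;

struct point {
    int x;
    int y;
};

/* Structures are passed to and returned from functions by value. */
static struct point add(struct point a, struct point b)
{
    struct point sum;
    sum.x = a.x + b.x;
    sum.y = a.y + b.y;
    return sum;
}

int main(void)
{
    struct point p, q, r;
    p.x = 1;  p.y = 2;      /* no literal notation: set members one by one */
    q.x = 10; q.y = 20;

    r = add(p, q);          /* structure returned by value */
    p = r;                  /* whole-structure assignment */
    printf("(%d, %d)\n", p.x, p.y);   /* (11, 22) */
    return 0;
}
</PRE></TT></DL>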
<P>
The language changes during this period, especially around 1977,
were largely focused on considerations of portability and type safety,
in an effort to cope with the problems we foresaw and observed
in moving a considerable body of code to the new Interdata
platform.
C at that time still manifested strong signs of its typeless
origins.
Pointers, for example, were barely distinguished from
integral memory indices in early language manuals or extant code;
the similarity of the arithmetic properties of
character pointers and unsigned integers made it hard
to resist the temptation to identify them.
The
<TT>unsigned</TT>
types were added to make unsigned arithmetic available
without confusing it with pointer manipulation.
Similarly, the early language condoned assignments between
integers and pointers, but this practice began to be discouraged;
a notation for type conversions (called `casts' from the example of Algol 68)
was invented to specify type conversions more explicitly.
Beguiled by the example of PL/I, early C
did not tie structure pointers firmly to the structures
they pointed to, and permitted programmers to write
<TT>pointer-&#62;member</TT>
almost without regard to the type of
<TT>pointer</TT>;
such an expression was taken uncritically as a reference
to a region of memory designated by the pointer, while the member
name specified only an offset and a type.
</P>
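<P>
The old view of <TT>pointer-&#62;member</TT> as `base address plus offset' can still be spelled out by hand in Standard C. This helps show what early compilers were doing. In this sketch, <TT>struct rec</TT> and <TT>read_value_via_offset</TT> are illustrative names. <TT>offsetof</TT> is the standard macro from <TT>&lt;stddef.h&gt;</TT>. Unlike early C, a modern compiler accepts <TT>p-&#62;value</TT> only because <TT>p</TT> is declared to point to a <TT>struct rec</TT>.
</P>
<DL><DT><DD><TT><PRE>
#include &lt;stddef.h&gt;
#include &lt;stdio.h&gt;

struct rec {
    char tag;
    int  value;
};

/* Computes the member's address as "start of structure plus offset"
   and reads an int there: essentially all that early C checked. */
static int read_value_via_offset(struct rec *p)
{
    return *(int *)((char *)p + offsetof(struct rec, value));
}

int main(void)
{
    struct rec r;
    r.tag = 'a';
    r.value = 42;

    /* The offset itself is implementation-defined (padding may follow tag). */
    printf("offset of value: %zu\n", offsetof(struct rec, value));
    printf("r.value = %d, via offset = %d\n", r.value, read_value_via_offset(&amp;r));
    return 0;
}
</PRE></TT></DL>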
<P>
Although the first edition of K&amp;R described most of the
rules that brought C's type structure to its present form,
many programs written in the older, more relaxed style
persisted, and so did compilers that tolerated it.
To encourage people to pay more attention to the
official language rules, to detect legal but suspicious constructions,
and to help find interface mismatches
undetectable with simple mechanisms for separate compilation,
Steve Johnson adapted his
<I>pcc</I>
compiler to produce
<I>lint</I>
[Johnson 79b],
which scanned a set of files and remarked on dubious constructions.
</P>
<H4>Growth in Usage
</H4>
<P>
The success of our portability experiment on the
Interdata 8/32 soon led to another by Tom London and John Reiser
on the DEC VAX 11/780.
This machine became much more popular than the Interdata, and
Unix and the C language began to spread rapidly, both within AT&amp;T and
outside.
Although by the middle 1970s
Unix was in use by
a variety of projects within the Bell System
as well as a small group of research-oriented
industrial, academic, and government organizations outside our company,
its real growth began only after portability had been achieved.
Of particular note were the System III and System V
versions of the system from the emerging Computer Systems division of AT&amp;T, based
on work by the company's development and research groups,
and the BSD series of releases by the University
of California at Berkeley that derived from research
organizations in Bell Laboratories.
</P>
<P>
During the 1980s the use of the C language spread widely,
and compilers became available on nearly every machine architecture
and operating system; in particular it became popular as a
programming tool for personal computers, both for manufacturers
of commercial software for these machines, and for end-users
interested in programming.
At the start of the decade, nearly every compiler was based on Johnson's
<I>pcc</I>;
by 1985 there were many independently-produced compiler products.
</P>
<H4>Standardization
</H4>
<P>
By 1982 it was clear that C needed formal standardization.
The best approximation to a standard,
the first edition of K&amp;R, no longer described the language in actual use;
in particular, it mentioned neither the
<TT>void</TT>
nor
<TT>enum</TT>
types.
While it foreshadowed the newer approach to structures, only after
it was published did the language support assigning them, passing them
to and from functions, and associating the names of members firmly
with the structure or union containing them.
Although compilers distributed by AT&amp;T incorporated these changes,
and most of the purveyors of compilers not based on
<I>pcc</I>
quickly picked them up, there remained no complete, authoritative
description of the language.
</P>
<P>
The first edition of K&amp;R was also insufficiently precise on many details
of the language, and it became increasingly impractical to regard
<I>pcc</I>
as a `reference compiler;'
it did not perfectly
embody even the language described by K&amp;R, let alone subsequent extensions.
Finally, the incipient use of C in projects subject to commercial
and government contract meant that the imprimatur of an official
standard was important.
Thus (at the urging of M. D. McIlroy), ANSI established the X3J11
committee under the direction of CBEMA
in the summer of 1983, with the goal of producing
a C standard.
X3J11 produced its report [ANSI 89] at the end of 1989,
and subsequently this standard was accepted by ISO as
ISO/IEC 9899-1990.
</P>
<P>
From the beginning, the X3J11 committee took a cautious,
conservative view of language extensions.
Much to my
satisfaction, they took seriously their goal:
`to develop a clear, consistent, and unambiguous Standard
for the C programming language which codifies the common,
existing definition of C and which promotes the portability
of user programs across C language environments.' [ANSI 89]
The committee realized that mere promulgation of a standard
does not make the world change.
</P>
<P>
X3J11 introduced only one genuinely important change to the language itself:
it incorporated the types of formal arguments in the type
signature of a function, using syntax borrowed from C++ [Stroustrup 86].
In the old style, external functions were declared like this:
<DL><DT><DD><TT><PRE>
double sin();
</PRE></TT></DL>
which says only that
<TT>sin</TT>
is a function returning a
<TT>double</TT>
(that is, double-precision floating-point) value.
In the new style, this is better rendered
<DL><DT><DD><TT><PRE>
double sin(double);
</PRE></TT></DL>
to make the argument type explicit
and thus encourage better type checking and appropriate conversion.
Even this addition, though it produced a noticeably better language,
caused difficulties.
The committee justifiably felt that simply outlawing
`old-style' function definitions and declarations was not
feasible, yet also agreed that the new forms were better.
The inevitable compromise was as good as it
could have been, though the language definition is complicated by
permitting both forms, and writers of portable software must contend
with compilers not yet brought up to standard.
</P>
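<P>
This small sketch shows why the argument types matter. With the prototype in scope, the compiler converts the integer argument to <TT>double</TT>. With only the old-style declaration <TT>double half();</TT>, the integer would be passed unconverted and the result would be undefined. The function <TT>half</TT> is an illustrative stand-in for <TT>sin</TT>, chosen so the result is easy to check.
</P>
<DL><DT><DD><TT><PRE>
#include &lt;stdio.h&gt;

/* New-style declaration: the parameter type is part of the function's
   type, so calls are checked and arguments are converted. */
static double half(double x);

int main(void)
{
    printf("%g\n", half(3));   /* 3 is converted to 3.0; prints 1.5 */
    return 0;
}

static double half(double x)
{
    return x / 2.0;
}
</PRE></TT></DL>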
<P>
X3J11 also introduced
a host of smaller additions and adjustments, for example,
the type qualifiers
<TT>const</TT>
and
<TT>volatile</TT>,
and slightly different type promotion rules.
Nevertheless, the standardization process did not change the character
of the language.
In particular, the C standard did not attempt to specify formally
the language semantics, and so there can be dispute over fine points;
nevertheless, it successfully accounted for changes in
usage since the original description, and is sufficiently precise to
base implementations on it.
</P>
<P>
Thus the core C language escaped nearly unscathed from the
standardization process, and the Standard emerged more
as a better, careful codification than a new invention.
More important changes took place in the language's surroundings:
the preprocessor and the library.
The preprocessor performs macro substitution, using conventions
distinct from the rest of the language.
Its
interaction with the compiler had never
been well-described, and X3J11 attempted to remedy the
situation.
The result is noticeably better than the explanation in the first edition of K&amp;R;
besides being more comprehensive, it provides
operations, like token concatenation, previously available
only by accidents of implementation.
</P>
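<P>
Standard C added two operators for this purpose. <TT>##</TT> pastes tokens together, and <TT>#</TT> turns a macro argument into a string literal. The following is a minimal sketch, in which <TT>COUNTER</TT>, <TT>SHOW</TT>, and the variables are illustrative names.
</P>
<DL><DT><DD><TT><PRE>
#include &lt;stdio.h&gt;

/* ## pastes two tokens into one, building identifiers such as
   counter_reads from a common prefix. */
#define COUNTER(name) counter_ ## name

/* # turns the macro argument into a string literal. */
#define SHOW(expr) printf("%s = %d\n", #expr, (expr))

static int COUNTER(reads) = 3;    /* declares counter_reads */
static int COUNTER(writes) = 5;   /* declares counter_writes */

int main(void)
{
    SHOW(counter_reads);                   /* counter_reads = 3 */
    SHOW(counter_reads + counter_writes);  /* counter_reads + counter_writes = 8 */
    return 0;
}
</PRE></TT></DL>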
<P>
X3J11 correctly believed that a full and careful
description of a standard C library was as important as its
work on the language itself.
The C language itself does not provide for input-output
or any other interaction with the outside world, and thus
depends on a set of standard procedures.
At the time of publication of K&amp;R, C was thought of mainly
as the system programming language of Unix; although we
provided examples of library routines intended to be readily transportable
to other operating systems, underlying support from Unix was implicitly
understood.
Thus, the X3J11 committee spent much of its time designing
and documenting a set
of library routines required to be available in all
conforming implementations.
</P>
<P>
By the rules of the standards process, the current activity of the X3J11
committee is confined to issuing interpretations on the existing
standard.
However, an informal group originally convened by Rex Jaeschke
as NCEG (Numerical C Extensions Group) has been officially accepted
as subgroup X3J11.1,
and they continue to consider extensions to C.
As the name implies, many of these possible extensions are intended to make the language
more suitable for numerical use: for example, multi-dimensional arrays
whose bounds are dynamically determined, incorporation of facilities
for dealing with IEEE arithmetic, and making the language more effective on machines
with vector or other advanced architectural features.
Not all the possible extensions are specifically numerical; they
include a notation for structure literals.
</P>
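<P>
Two of the ideas mentioned above, arrays with dynamically determined bounds and a notation for structure literals, were later adopted in the 1999 revision of the C standard as variable-length arrays and compound literals. The following sketch is therefore written in C99, not in the language this paper describes. The names <TT>sum_matrix</TT>, <TT>manhattan</TT>, and <TT>struct point</TT> are illustrative.
</P>
<DL><DT><DD><TT><PRE>
#include &lt;stdio.h&gt;

struct point { int x, y; };

/* A variable-length array parameter: rows and cols are ordinary
   run-time values, yet m[i][j] still indexes correctly. */
static long sum_matrix(int rows, int cols, int m[rows][cols])
{
    long total = 0;
    for (int i = 0; i &lt; rows; i++)
        for (int j = 0; j &lt; cols; j++)
            total += m[i][j];
    return total;
}

static int manhattan(struct point p)
{
    return (p.x &lt; 0 ? -p.x : p.x) + (p.y &lt; 0 ? -p.y : p.y);
}

int main(void)
{
    int rows = 2, cols = 3;
    int m[rows][cols];                 /* bounds chosen at run time */
    for (int i = 0; i &lt; rows; i++)
        for (int j = 0; j &lt; cols; j++)
            m[i][j] = i * cols + j;    /* 0 1 2 / 3 4 5 */

    printf("sum = %ld\n", sum_matrix(rows, cols, m));              /* 15 */

    /* A compound literal: a structure value written in place. */
    printf("distance = %d\n", manhattan((struct point){3, -4}));   /* 7 */
    return 0;
}
</PRE></TT></DL>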
<H4>Successors
</H4>
<P>
C and even B have several direct descendants, though they
do not rival Pascal in generating progeny.
One side branch developed early.
When Steve Johnson visited the University of Waterloo on sabbatical
in 1972,
he brought B with him. It became popular
on the Honeywell machines there, and later spawned Eh and Zed
(the Canadian answers to `what follows B?').
When Johnson returned to Bell Labs in 1973, he was disconcerted to
find that the language whose seeds he brought to Canada
had evolved back home;
even his own
<I>yacc</I>
program had been rewritten in C, by Alan Snyder.
</P>
<P>
More recent descendants of C proper include Concurrent C [Gehani 89],
Objective C [Cox 86], C* [Thinking 90],
and especially C++ [Stroustrup 86].
The language is also widely used as an intermediate
representation (essentially, as a portable assembly language)
for a wide variety of compilers, both for direct descendants
like C++, and independent languages like
Modula 3 [Nelson 91] and
Eiffel
[Meyer 88].
</P>
<H4>Critique
</H4>
<P>
Two ideas are most characteristic of C among languages of its class:
the relationship between arrays and pointers,
and the way in which declaration syntax mimics expression syntax.
They are also among its most frequently criticized features,
and often serve as stumbling blocks to the beginner.
In both cases, historical accidents or mistakes have exacerbated
their difficulty.
The most important of these has been the tolerance of C compilers
to errors in type.
As should be clear from the history above, C evolved from typeless
languages.
It did not suddenly appear to its earliest
users and developers as an entirely new language with its own rules;
instead we continually had to adapt existing programs as the
language developed, and make allowance for an existing body
of code. (Later, the ANSI X3J11 committee standardizing C would
face the same problem.)
</P>
<P>
Compilers in 1977, and even well after,
did not complain about usages such as assigning between integers
and pointers or using objects of the wrong type to refer
to structure members.
Although the language definition presented in the first edition of K&amp;R
was reasonably (though not completely) coherent in its treatment of type rules,
that book admitted that existing compilers didn't enforce them.
Moreover, some rules designed to ease early transitions
contributed to later confusion.
For example, the empty square brackets in the function declaration
<DL><DT><DD><TT><PRE>
int f(a) int a[]; { ... }
</PRE></TT></DL>
are a living fossil, a remnant of NB's way of declaring a pointer;
<TT>a</TT>
is, in this special case only, interpreted in C as a pointer.
The notation survived in part for the sake of
compatibility, in part under the rationalization
that it would allow programmers
to communicate to their readers
an intent to pass
<TT>f</TT>
a pointer generated from an array, rather than a reference to a single
integer.
Unfortunately, it serves as much to confuse the learner
as to alert the reader.
</P>
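<P>
The rule survives in Standard C. This sketch uses a prototype rather than the old-style definition quoted above, and the function names are illustrative. <TT>int a[]</TT> in a parameter list declares a pointer. As a result, <TT>sizeof a</TT> yields the size of a pointer, and changes made through <TT>a</TT> affect the caller's array.
</P>
<DL><DT><DD><TT><PRE>
#include &lt;stdio.h&gt;

/* In a parameter list, "int a[]" is adjusted to "int *a":
   the function receives a pointer, not a copy of the array. */
static size_t param_size(int a[])
{
    return sizeof a;    /* size of a pointer, not of the caller's array */
}

static void bump_first(int a[])
{
    a[0]++;             /* modifies the caller's array through the pointer */
}

int main(void)
{
    int v[10] = {0};
    printf("sizeof v in caller: %zu\n", sizeof v);        /* 10 * sizeof(int) */
    printf("sizeof a in callee: %zu\n", param_size(v));   /* sizeof(int *) */
    bump_first(v);
    printf("v[0] after call: %d\n", v[0]);                /* 1 */
    return 0;
}
</PRE></TT></DL>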
<P>
In K&amp;R C, supplying arguments of the proper type to a function call
was the responsibility of the programmer, and the extant compilers
did not check for type agreement.
The failure of the original language to include argument types
in the type signature of a function
was a significant weakness,
indeed the one that required the X3J11 committee's boldest and most painful
innovation to repair.
The early design is explained (if not justified) by my avoidance of technological
problems, especially cross-checking between separately-compiled source files,
and my incomplete assimilation of the implications of moving from
an untyped to a typed language.
The
<I>lint</I>
program,
mentioned above,
tried to alleviate the problem:
among its other functions,
<I>lint</I>
checks the consistency and coherency of a whole program by scanning a set
of source files,
comparing the types of function arguments used in calls with those
in their definitions.
</P>
<P>
An accident of syntax contributed to the perceived complexity of the language.
The indirection operator, spelled
<TT>*</TT>
in C, is syntactically a unary prefix operator, just as in BCPL and B.
This works well in simple expressions, but in more complex cases,
parentheses are required to direct the parsing.
For example, to distinguish indirection through the value
returned by a function from calling a function designated by
a pointer, one writes
<TT>*fp()</TT>
and
<TT>(*pf)()</TT>
respectively.
The style used in expressions carries through to declarations, so the names might be
declared
<DL><DT><DD><TT><PRE>
int *fp();
int (*pf)();
</PRE></TT></DL>
In more ornate but still realistic cases, things become worse:
<DL><DT><DD><TT><PRE>
int *(*pfp)();
</PRE></TT></DL>
is a pointer to a function returning a pointer to an integer.
There are two effects occurring.
Most important, C has a relatively rich set of ways of
describing types (compared, say, with Pascal).
Declarations in languages
as expressive as C (Algol 68, for example) describe objects equally hard
to understand, simply because the objects themselves are complex.
A second effect owes to details of the syntax.
Declarations in C must be read
in an `inside-out' style that many find
difficult to grasp [Anderson 80].
Sethi [Sethi 81] observed that many of the nested declarations
and expressions would become simpler
if the indirection operator had been taken as a postfix operator
instead of prefix, but by then it was too late to change.
</P>
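<P>
The following is a compilable version of the declarations above, written in Standard C with prototypes. The functions <TT>fp</TT> and <TT>seven</TT> and the variable <TT>value</TT> are illustrative. Each use in an expression mirrors its declaration.
</P>
<DL><DT><DD><TT><PRE>
#include &lt;stdio.h&gt;

static int value = 7;

/* fp: a function returning a pointer to int. */
static int *fp(void) { return &amp;value; }

/* An ordinary function returning int, to be called through a pointer. */
static int seven(void) { return 7; }

int main(void)
{
    int (*pf)(void) = seven;     /* pf: pointer to function returning int */
    int *(*pfp)(void) = fp;      /* pfp: pointer to function returning pointer to int */

    printf("%d\n", *fp());       /* call fp, then indirect: 7 */
    printf("%d\n", (*pf)());     /* indirect through pf, then call: 7 */
    printf("%d\n", *(*pfp)());   /* indirect, call, indirect again: 7 */
    return 0;
}
</PRE></TT></DL>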
<P>
In spite of its difficulties,
I believe that C's approach to declarations remains plausible,
and am comfortable with it; it is a useful unifying principle.
</P>
<P>
The other characteristic feature of C, its treatment of arrays,
is more suspect on practical grounds, though it also has
real virtues.
Although the relationship between pointers and arrays
is unusual, it can be learned.
Moreover, the language shows considerable power to describe important
concepts, for example, vectors whose length varies at run time,
with only a few basic rules and conventions.
In particular, character strings are handled by the same mechanisms
as any other array, plus the convention that a null character
terminates a string.
It is interesting to compare C's approach with that of two
nearly contemporaneous languages, Algol 68 and Pascal [Jensen 74].
Arrays in Algol 68 either have fixed bounds, or are `flexible:'
considerable mechanism is required both in the language
definition, and in compilers, to accommodate flexible arrays
(and not all compilers fully implement them).
Original Pascal had only fixed-sized arrays and strings,
and this proved confining [Kernighan 81].
Later, this was partially fixed, though the resulting
language is not yet universally available.
</P>
<P>
C treats strings as arrays of characters
conventionally terminated by a marker.
Aside from one special rule about initialization by string literals,
the semantics of strings are fully subsumed by more general
rules governing all arrays, and
as a result the language is simpler to describe and
to translate than one incorporating the string as a unique
data type.
Some costs accrue from its approach:
certain string operations are more expensive than in other designs
because application code or
a library routine must occasionally search for the end of a string,
because few built-in operations are available, and because
the burden of storage management for strings falls more
heavily on the user.
Nevertheless, C's approach to strings works well.
</P>
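<P>
The following Standard C sketch illustrates the convention. The hypothetical function <TT>string_length</TT> behaves like the library's <TT>strlen</TT> and shows why finding the end of a string requires a search. The array <TT>word</TT> illustrates the one special rule for initialization by a string literal.
</P>
<DL><DT><DD><TT><PRE>
#include &lt;stdio.h&gt;

/* A string is a char array ending in '\0'.  Finding its length
   means walking to the terminator, so the cost grows with the length. */
static size_t string_length(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

int main(void)
{
    /* The special initialization rule: a string literal may initialize
       a char array, and the terminating '\0' is included. */
    char word[] = "hello";

    printf("length %zu, array size %zu\n",
           string_length(word), sizeof word);   /* length 5, array size 6 */
    return 0;
}
</PRE></TT></DL>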
<P>
On the other hand, C's treatment of arrays in general (not just strings)
has unfortunate implications both for optimization
and for future extensions.
The prevalence of pointers in C programs, whether those declared
explicitly or arising from arrays, means that
optimizers must be cautious, and must use careful dataflow techniques
to achieve good results.
Sophisticated compilers can understand what most pointers
can possibly change, but some important usages remain difficult
to analyze.
For example, functions with pointer arguments derived from
arrays are hard to compile into efficient code on
vector machines, because it is seldom possible to determine
that one argument pointer does not overlap data also
referred to by another argument, or accessible externally.
More fundamentally, the definition of C so specifically describes
the semantics of arrays that
changes or extensions treating arrays as more
primitive objects, and permitting operations on them as wholes,
become hard to fit into the existing language.
Even extensions to permit the declaration and use of multidimensional arrays whose
size is determined dynamically are not entirely straightforward [MacDonald 89]
[Ritchie 90],
although they would make it much easier to write numerical
libraries in C.
Thus, C covers the most important uses of strings and arrays
arising in practice by a uniform and simple mechanism,
but leaves problems for highly efficient implementations and for extensions.
</P>
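<P>
This small sketch illustrates the overlap problem, using an illustrative function <TT>add_one</TT>. When the destination overlaps the source, each iteration reads the element written by the previous one. A compiler that loaded several source elements at once, as a vector machine would, would therefore compute a different answer. For this reason, the compiler must assume the worst unless it can prove that the pointers do not overlap. The 1999 standard later added the <TT>restrict</TT> qualifier, which lets the programmer promise that no such overlap occurs.
</P>
<DL><DT><DD><TT><PRE>
#include &lt;stdio.h&gt;

/* Stores src[i] + 1 into dst[i].  If dst and src overlap, each
   iteration may read a value written by the previous one. */
static void add_one(int *dst, const int *src, int n)
{
    for (int i = 0; i &lt; n; i++)
        dst[i] = src[i] + 1;
}

int main(void)
{
    int a[5] = {0, 0, 0, 0, 0};

    /* dst = a + 1 overlaps src = a: a[1] = a[0] + 1, a[2] = a[1] + 1, ...
       A version that read a[0..3] before writing would give 0 1 1 1 1. */
    add_one(a + 1, a, 4);

    for (int i = 0; i &lt; 5; i++)
        printf("%d ", a[i]);      /* 0 1 2 3 4 */
    printf("\n");
    return 0;
}
</PRE></TT></DL>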
<P>
Many smaller infelicities exist in the language
and its description
besides those discussed above, of course.
There are also
general criticisms to be lodged that transcend detailed points.
Chief among these is that the language and its generally-expected
environment provide little help for writing very large systems.
The naming structure provides only two main levels,
`external' (visible everywhere) and `internal' (within
a single procedure).
An intermediate level
of visibility (within a single file of data and procedures)
is weakly tied to the language definition.
Thus, there is little direct support for modularization,
and project designers are forced to create their own conventions.
</P>
<P>
Similarly, C itself provides two durations of storage:
`automatic' objects that exist while control resides in or below
a procedure, and `static,' existing throughout execution of a program.
Off-stack, dynamically-allocated storage is provided only
by a library routine and
the burden of managing it is placed on
the programmer: C is hostile to automatic garbage collection.
</P>
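<P>
The following sketch shows the two storage durations and the library-based alternative. <TT>next_id</TT> and <TT>make_squares</TT> are illustrative names. <TT>malloc</TT> and <TT>free</TT> are the standard library routines.
</P>
<DL><DT><DD><TT><PRE>
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

/* A static object exists for the whole run, so it keeps its
   value between calls. */
static int next_id(void)
{
    static int counter = 0;
    return ++counter;
}

/* Automatic (local) objects vanish when the function returns; storage
   that must outlive the call comes from malloc, and the caller is
   responsible for freeing it. */
static int *make_squares(int n)
{
    int *v = malloc((size_t)n * sizeof *v);
    if (v == NULL)
        return NULL;
    for (int i = 0; i &lt; n; i++)
        v[i] = i * i;
    return v;
}

int main(void)
{
    int first = next_id();
    int second = next_id();
    printf("%d %d\n", first, second);   /* 1 2 */

    int *sq = make_squares(4);
    if (sq == NULL)
        return 1;
    printf("%d\n", sq[3]);              /* 9 */
    free(sq);                           /* nothing reclaims it automatically */
    return 0;
}
</PRE></TT></DL>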
<H4>Whence Success?
</H4>
<P>
C has become successful to an extent far surpassing any early
expectations. What qualities contributed to its widespread use?
</P>
<P>
Doubtless the success of Unix itself was the most important factor;
it made the language available to hundreds of thousands of people.
Conversely, of course, Unix's use of C and its consequent
portability to a wide variety of machines
was important in the system's success.
But the language's invasion of other environments suggests more
fundamental merits.
</P>
<P>
Despite some aspects mysterious to the beginner and
occasionally even to the adept,
C remains a simple and small language, translatable with simple and small compilers.
Its types and operations are
well-grounded in those provided by
real machines, and for
people used to how computers work,
learning the idioms for generating time- and space-efficient programs
is not difficult.
At the same time the language is sufficiently abstracted from machine
details that program portability can be achieved.
</P>
<P>
Equally important, C and its central library support always
remained in touch with a real environment.
It was not designed in isolation to prove a point, or to serve
as an example, but as a tool to write programs that did
useful things; it was always meant to interact with a larger
operating system, and was regarded as a
tool to build larger tools.
A parsimonious, pragmatic approach influenced the things that went into C:
it covers
the essential needs of many programmers,
but does not try to supply too much.
</P>
<P>
Finally, despite the changes that it has undergone since its first
published description, which was admittedly informal
and incomplete, the actual C language as seen by millions of users
using many different compilers has remained remarkably stable
and unified compared to those of similarly widespread currency,
for example Pascal and Fortran.
There are differing dialects of C (most noticeably, those described by
the older K&amp;R and the newer Standard C), but on the whole, C has remained
freer of proprietary extensions than other languages.
Perhaps the most significant extensions are the `far' and `near'
pointer qualifications intended to deal with peculiarities
of some Intel processors.
Although C was not originally designed with portability
as a prime goal, it succeeded in expressing
programs, even including operating systems,
on machines ranging from the smallest personal
computers through the mightiest supercomputers.
</P>
<P>
C is quirky, flawed, and an enormous success.
While accidents of history surely helped,
it evidently satisfied a need for a system implementation language
efficient enough
to displace assembly language, yet sufficiently abstract and fluent to
describe algorithms and interactions in a wide variety of environments.
</P>
<H4>Acknowledgments
</H4>
<P>
It is worth summarizing compactly the roles of the direct contributors to today's
C language.
Ken Thompson created the B language in 1969-70; it was derived directly
from Martin Richards's BCPL.
Dennis Ritchie turned B into C during 1971-73, keeping most of B's syntax
while adding types and many other changes, and writing the
first compiler.
Ritchie, Alan Snyder, Steven C. Johnson, Michael Lesk, and Thompson contributed language
ideas during 1972-1977,
and Johnson's portable compiler remains widely used.
During this period, the collection of library routines grew
considerably, thanks to these people and many others at Bell Laboratories.
In 1978, Brian Kernighan and Ritchie wrote the book that
became the language definition for several years.
Beginning in 1983, the ANSI X3J11 committee standardized
the language. Especially notable in keeping its
efforts on track were its officers
Jim Brodie, Tom Plum, and P. J. Plauger, and the successive draft redactors,
Larry Rosler and Dave Prosser.
</P>
<P>
I thank Brian Kernighan, Doug McIlroy, Dave Prosser, Peter
Nelson, Rob Pike, Ken Thompson, and HOPL's referees
for advice in the preparation of this paper.
</P>
<H4>References
</H4>
<br>&#32;<br>
<DL>
<DT>&#32;<DD><DT>[ANSI 89]<DD>
American National Standards Institute,
<I>American National Standard for Information Systems: Programming Language C,</I>
X3.159-1989.
<DT>&#32;<DD><DT>[Anderson 80]<DD>
B. Anderson,
`Type syntax in the language C: an object lesson in syntactic innovation,'
SIGPLAN Notices
<B>15</B>
(3), March, 1980, pp. 21-27.
<DT>&#32;<DD><DT>[Bell 72]<DD>
J. R. Bell, `Threaded Code,' C. ACM
<B>16</B>
(6), pp. 370-372.
<DT>&#32;<DD><DT>[Canaday 69]<DD>
R. H. Canaday and D. M. Ritchie,
`Bell Laboratories BCPL,'
AT&amp;T Bell Laboratories internal memorandum, May, 1969.
<DT>&#32;<DD><DT>[Corbato 62]<DD>
F. J. Corbato, M. Merwin-Dagget, R. C. Daley,
`An Experimental Time-sharing System,' AFIPS Conf. Proc. SJCC,
1962, pp. 335-344.
<DT>&#32;<DD><DT>[Cox 86]<DD>
B. J. Cox and A. J. Novobilski,
<I>Object-Oriented Programming: An Evolutionary Approach,</I>
Addison-Wesley: Reading, Mass., 1986. Second edition, 1991.
<DT>&#32;<DD><DT>[Gehani 89]<DD>
N. H. Gehani and W. D. Roome,
<I>Concurrent C,</I>
Silicon Press: Summit, NJ, 1989.
<DT>&#32;<DD><DT>[Jensen 74]<DD>
K. Jensen and N. Wirth,
<I>Pascal User Manual and Report,</I>
Springer-Verlag: New York, Heidelberg, Berlin. Second Edition, 1974.
<DT>&#32;<DD><DT>[Johnson 73]<DD>
S. C. Johnson and B. W. Kernighan, `The Programming Language B,'
Comp. Sci. Tech. Report #8, AT&amp;T Bell Laboratories (January 1973).
<DT>&#32;<DD><DT>[Johnson 78a]<DD>
S. C. Johnson and D. M. Ritchie,
`Portability of C Programs and the UNIX System,'
Bell Sys. Tech. J.
<B>57</B>
(6) (part 2), July-Aug, 1978.
<DT>&#32;<DD><DT>[Johnson 78b]<DD>
S. C. Johnson,
`A Portable Compiler: Theory and Practice,'
Proc. 5th ACM POPL Symposium (January 1978).
<DT>&#32;<DD><DT>[Johnson 79a]<DD>
S. C. Johnson, `Yet another compiler-compiler,' in
<I>Unix Programmer's Manual,</I>
Seventh Edition, Vol. 2A, M. D. McIlroy and B. W. Kernighan, eds.
AT&amp;T Bell Laboratories: Murray Hill, NJ, 1979.
<DT>&#32;<DD><DT>[Johnson 79b]<DD>
S. C. Johnson, `Lint, a Program Checker,' in
<I>Unix Programmer's Manual,</I>
Seventh Edition, Vol. 2B, M. D. McIlroy and B. W. Kernighan, eds.
AT&amp;T Bell Laboratories: Murray Hill, NJ, 1979.
<DT>&#32;<DD><DT>[Kernighan 78]<DD>
B. W. Kernighan and D. M. Ritchie,
<I>The C Programming Language,</I>
Prentice-Hall: Englewood Cliffs, NJ, 1978.
Second edition, 1988.
<DT>&#32;<DD><DT>[Kernighan 81]<DD>
B. W. Kernighan,
`Why Pascal is not my favorite programming language,'
Comp. Sci. Tech. Rep. #100, AT&amp;T Bell Laboratories, 1981.
<DT>&#32;<DD><DT>[Lesk 72]<DD>
M. E. Lesk, `A Portable I/O Package,'
AT&amp;T Bell Laboratories internal memorandum ca. 1972.
<DT>&#32;<DD><DT>[MacDonald 89]<DD>
T. MacDonald,
`Arrays of variable length,'
J. C Lang. Trans.
<B>1</B>
(3), Dec. 1989, pp. 215-233.
<DT>&#32;<DD><DT>[McClure 65]<DD>
R. M. McClure, `TMG: A Syntax Directed Compiler,'
Proc. 20th ACM National Conf. (1965), pp. 262-274.
<DT>&#32;<DD><DT>[McIlroy 60]<DD>
M. D. McIlroy, `Macro Instruction Extensions of Compiler Languages,'
C. ACM
<B>3</B>
(4), pp. 214-220.
<DT>&#32;<DD><DT>[McIlroy 79]<DD>
M. D. McIlroy and B. W. Kernighan, eds,
<I>Unix Programmer's Manual,</I>
Seventh Edition, Vol. I,
AT&amp;T Bell Laboratories: Murray Hill, NJ, 1979.
<DT>&#32;<DD><DT>[Meyer 88]<DD>
B. Meyer,
<I>Object-oriented Software Construction,</I>
Prentice-Hall: Englewood Cliffs, NJ, 1988.
<DT>&#32;<DD><DT>[Nelson 91]<DD>
G. Nelson,
<I> Systems Programming with Modula-3,</I>
Prentice-Hall: Englewood Cliffs, NJ, 1991.
<DT>&#32;<DD><DT>[Organick 75]<DD>
E. I. Organick,
<I>The Multics System: An Examination of its Structure,</I>
MIT Press: Cambridge, Mass., 1975.
<DT>&#32;<DD><DT>[Richards 67]<DD>
M. Richards, `The BCPL Reference Manual,'
MIT Project MAC Memorandum M-352, July 1967.
<DT>&#32;<DD><DT>[Richards 79]<DD>
M. Richards and C. Whitby-Strevens,
<I>BCPL: The Language and its Compiler,</I>
Cambridge Univ. Press: Cambridge, 1979.
<DT>&#32;<DD><DT>[Ritchie 78]<DD>
D. M. Ritchie, `UNIX: A Retrospective,' Bell Sys. Tech. J.
<B>57</B>
(6) (part 2), July-Aug, 1978.
<DT>&#32;<DD><DT>[Ritchie 84]<DD>
D. M. Ritchie, `The Evolution of the UNIX Time-sharing System,'
AT&amp;T Bell Labs. Tech. J.
<B>63</B>
(8) (part 2), Oct. 1984.
<DT>&#32;<DD><DT>[Ritchie 90]<DD>
D. M. Ritchie,
`Variable-size arrays in C,'
J. C Lang. Trans.
<B>2</B>
(2), Sept. 1990, pp. 81-86.
<DT>&#32;<DD><DT>[Sethi 81]<DD>
R. Sethi,
`Uniform syntax for type expressions and declarators,'
Softw. Prac. and Exp.
<B>11</B>
(6), June 1981, pp. 623-628.
<DT>&#32;<DD><DT>[Snyder 74]<DD>
A. Snyder,
<I>A Portable Compiler for the Language C,</I>
MIT: Cambridge, Mass., 1974.
<DT>&#32;<DD><DT>[Stoy 72]<DD>
J. E. Stoy and C. Strachey, `OS6: An experimental operating
system for a small computer. Part I: General principles and structure,'
Comp J.
<B>15</B>,
(Aug. 1972), pp. 117-124.
<DT>&#32;<DD><DT>[Stroustrup 86]<DD>
B. Stroustrup,
<I>The C++ Programming Language,</I>
Addison-Wesley: Reading, Mass., 1986.
Second edition, 1991.
<DT>&#32;<DD><DT>[Thacker 79]<DD>
C. P. Thacker, E. M. McCreight, B. W. Lampson, R. F. Sproull,
D. R. Boggs, `Alto: A Personal Computer,' in
<I>Computer Structures: Principles and Examples,</I>
D. Siewiorek, C. G. Bell, A. Newell,
McGraw-Hill: New York, 1982.
<DT>&#32;<DD><DT>[Thinking 90]<DD>
<I>C* Programming Guide,</I>
Thinking Machines Corp.: Cambridge Mass., 1990.
<DT>&#32;<DD><DT>[Thompson 69]<DD>
K. Thompson, `Bon: an Interactive Language,' undated AT&amp;T Bell Laboratories
internal memorandum (ca. 1969).
<DT>&#32;<DD><DT>[Wijngaarden 75]<DD>
A. van Wijngaarden, B. J. Mailloux, J. E. Peck, C. H. Koster, M. Sintzoff,
C. Lindsey, L. G. Meertens, R. G. Fisker, `Revised report on the algorithmic
language Algol 68,' Acta Informatica
<B>5</B>,
pp. 1-236.
</dl>
<br>&#32;<br>
<A href=http://www.lucent.com/copyright.html>
Copyright</A> &#169; 1996 Lucent Technologies Inc. All rights reserved.
</body></html>