<section data-type="chapter">
  <h1 id="chapter-10-neural-networks">Chapter 10. Neural Networks</h1>
  <div class="chapter-opening-quote">
    <blockquote data-type="epigraph">
      <p>The human brain has 100 billion neurons,</p>
      <p>each neuron connected to 10 thousand</p>
      <p>other neurons. Sitting on your shoulders</p>
      <p>is the most complicated object</p>
      <p>in the known universe.</p>
      <div class="chapter-opening-quote-source">
        <p>—Michio Kaku</p>
      </div>
    </blockquote>
  </div>
  <div class="chapter-opening-figure">
    <figure>
      <img src="images/10_nn/10_nn_1.jpg" alt="" />
      <figcaption></figcaption>
    </figure>
    <h3
      id="khipu-on-display-at-the-machu-picchu-museum-cusco-peru-photo-by-pi3124"
    >
      Khipu on display at the Machu Picchu Museum, Cusco, Peru (photo by
      Pi3.124)
    </h3>
    <p>
      The <em>khipu</em> (or <em>quipu</em>) is an ancient Incan device used for
      recordkeeping and communication. It comprised a complex system of knotted
      cords to encode and transmit information. Each colored string and knot
      type and pattern represented specific data, such as census records or
      calendrical information. Interpreters, known as <em>quipucamayocs</em>,
      acted as a kind of accountant and decoded the stringed narrative into
      understandable information.
    </p>
  </div>
  <p>
    I began with inanimate objects living in a world of forces, and I gave them
    desires, autonomy, and the ability to take action according to a system of
    rules. Next, I allowed those objects, now called <em>creatures</em>, to live
    in a population and evolve over time. Now I’d like to ask, What is each
    creature’s decision-making process? How can it adjust its choices by
    learning over time? Can a computational entity process its environment and
    generate a decision?
  </p>
  <p>
    To answer these questions, I’ll once again look to nature for
    inspiration—specifically, the human brain. A brain can be described as a
    biological <strong>neural network</strong>, an interconnected web of neurons
    transmitting elaborate patterns of electrical signals. Within each neuron,
    dendrites receive input signals, and based on those inputs, the neuron fires
    an output signal via an axon (see Figure 10.1). Or something like that. How
    the human brain actually works is an elaborate and complex mystery, one that
    I’m certainly not going to attempt to unravel in rigorous detail in this
    chapter.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_2.png"
      alt="Figure 10.1: A neuron with dendrites and an axon connected to another neuron"
    />
    <figcaption>
      Figure 10.1: A neuron with dendrites and an axon connected to another
      neuron
    </figcaption>
  </figure>
  <p>
    Fortunately, as you’ve seen throughout this book, developing engaging
    animated systems with code doesn’t require scientific rigor or accuracy.
    Designing a smart rocket isn’t rocket science, and neither is designing an
    artificial neural network brain science. It’s enough to simply be inspired
    by the <em>idea</em> of brain function.
  </p>
  <p>
    In this chapter, I’ll begin with a conceptual overview of the properties and
    features of neural networks and build the simplest possible example of one,
    a network that consists of a single neuron. I’ll then introduce you to more
    complex neural networks by using the ml5.js library. This will serve as a
    foundation for <a href="/neuroevolution#">Chapter 11</a>, the grand finale
    of this book, where I’ll combine GAs with neural networks for physics
    simulation.
  </p>
  <h2 id="introducing-artificial-neural-networks">
    Introducing Artificial Neural Networks
  </h2>
  <p>
    Computer scientists have long been inspired by the human brain. In 1943,
    Warren S. McCulloch, a neuroscientist, and Walter Pitts, a logician,
    developed the first conceptual model of an artificial neural network. In
    their paper “A Logical Calculus of the Ideas Immanent in Nervous Activity,”
    they describe a <strong>neuron</strong> as a single computational cell
    living in a network of cells that receives inputs, processes those inputs,
    and generates an output.
  </p>
  <p>
    Their work, and the work of many scientists and researchers who followed,
    wasn’t meant to accurately describe how the biological brain works. Rather,
    an <em>artificial</em> neural network (hereafter referred to as just a
    <em>neural network</em>) was intended as a computational model based on the
    brain, designed to solve certain kinds of problems that were traditionally
    difficult for computers.
  </p>
  <p>
    Some problems are incredibly simple for a computer to solve but difficult
    for humans like you and me. Finding the square root of 964,324 is an
    example. A quick line of code produces the value 982, a number my computer
    can compute in less than a millisecond, but if you asked me to calculate
    that number myself, you’d be in for quite a wait. On the other hand, certain
    problems are incredibly simple for you or me to solve, but not so easy for a
    computer. Show any toddler a picture of a kitten or puppy, and they’ll
    quickly be able to tell you which one is which. Listen to a conversation in
    a noisy café and focus on just one person’s voice, and you can effortlessly
    comprehend their words. But what if you need a machine to perform one of
    these tasks? Scientists have spent entire careers researching and
    implementing complex solutions, and neural networks are one of them.
  </p>
  <p>
    Here are some of the easy-for-a-human, difficult-for-a-machine applications
    of neural networks in software today:
  </p>
  <ul>
    <li>
      <strong>Pattern recognition:</strong> Neural networks are well suited to
      problems in which the aim is to detect, interpret, and classify features or
      patterns within a dataset. This includes everything from identifying
      objects (like faces) in images, to optical character recognition, to more
      complex tasks like gesture recognition.
    </li>
    <li>
      <strong>Time-series prediction and anomaly detection:</strong> Neural
      networks are utilized both in forecasting, such as predicting stock market
      trends or weather patterns, and in recognizing anomalies, which can be
      applied to areas like cyberattack detection and fraud prevention.
    </li>
    <li>
      <strong>Natural language processing (NLP):</strong> One of the biggest
      developments in recent years has been the use of neural networks for
      processing and understanding human language. They’re used in various tasks
      including machine translation, sentiment analysis, and text summarization,
      and are the underlying technology behind many digital assistants and
      chatbots.
    </li>
    <li>
      <strong>Signal processing and soft sensors:</strong> Neural networks play
      a crucial role in devices like cochlear implants and hearing aids by
      filtering noise and amplifying essential sounds. They’re also involved in
      <em>soft sensors</em>, software systems that process data from multiple
      sources to give a comprehensive analysis of the environment.
    </li>
    <li>
      <strong>Control and adaptive decision-making systems:</strong> These
      applications range from autonomous vehicles like self-driving cars and
      drones to adaptive decision-making used in game playing, pricing models,
      and recommendation systems on media platforms.
    </li>
    <li>
      <strong>Generative models:</strong> The rise of novel neural network
      architectures has made it possible to generate new content. These systems
      can synthesize images, enhance image resolution, transfer style between
      images, and even generate music and video.
    </li>
  </ul>
  <p>
    Covering the full gamut of applications for neural networks would merit an
    entire book (or series of books), and by the time that book was printed, it
    would probably be out of date. Hopefully, this list gives you an overall
    sense of the features and possibilities.
  </p>
  <h3 id="how-neural-networks-work">How Neural Networks Work</h3>
  <p>
    In some ways, neural networks are quite different from other computer
    programs. The computational systems I’ve been writing so far in this book
    are <strong>procedural</strong>: a program starts at the first line of code,
    executes it, and goes on to the next, following instructions in a linear
    fashion. By contrast, a true neural network doesn’t follow a linear path.
    Instead, information is processed collectively, in parallel, throughout a
    network of nodes, with each node representing a neuron. In this sense, a
    neural network is considered a <strong>connectionist</strong> system.
  </p>
  <p>
    In other ways, neural networks aren’t so different from some of the programs
    you’ve seen. A neural network exhibits all the hallmarks of a complex
    system, much like a cellular automaton or a flock of boids. Remember how
    each individual boid was simple to understand, yet by following only three
    rules—separation, alignment, cohesion—it contributed to complex behaviors?
    Each individual element in a neural network is equally simple to understand.
    It reads an input (a number), processes it, and generates an output (another
    number). That’s all there is to it, and yet a network of many neurons can
    exhibit incredibly rich and intelligent behaviors, echoing the complex
    dynamics seen in a flock of boids.
  </p>
  <div class="half-width-right">
    <figure>
      <img
        src="images/10_nn/10_nn_3.png"
        alt="Figure 10.2: A neural network is a system of neurons and connections."
      />
      <figcaption>
        Figure 10.2: A neural network is a system of neurons and connections.
      </figcaption>
    </figure>
  </div>
  <p>
    In fact, a neural network isn’t just a complex system, but a complex
    <em>adaptive</em> system, meaning it can change its internal structure based
    on the information flowing through it. In other words, it has the ability to
    learn. Typically, this is achieved by adjusting <strong>weights</strong>. In
    Figure 10.2, each arrow represents a connection between two neurons and
    indicates the pathway for the flow of information. Each connection has a
    weight, a number that controls the signal between the two neurons. If the
    network generates a <em>good</em> output (which I’ll define later), there’s
    no need to adjust the weights. However, if the network generates a
    <em>poor</em> output—an error, so to speak—then the system adapts, altering
    the weights with the hope of improving subsequent results.
  </p>
  <p>
    Neural networks may use a variety of strategies for learning, and I’ll focus
    on one of them in this chapter:
  </p>
  <ul>
    <li>
      <strong>Supervised learning:</strong> Essentially, this strategy involves
      a teacher that’s smarter than the network itself. Take the case of facial
      recognition. The teacher shows the network a bunch of faces, and the
      teacher already knows the name associated with each face. The network
      makes its guesses; then the teacher provides the network with the actual
      names. The network can compare its answers to the known correct ones and
      make adjustments according to its errors. The neural networks in this
      chapter follow this model.
    </li>
    <li>
      <strong>Unsupervised learning:</strong> This technique is required when
      you don’t have an example dataset with known answers. Instead, the network
      works on its own to uncover hidden patterns in the data. An application of
      this is clustering: a set of elements is divided into groups according to
      an unknown pattern. I won’t be showing any instances of unsupervised
      learning, as the strategy is less relevant to the book’s examples.
    </li>
    <li>
      <strong>Reinforcement learning:</strong> This strategy is
      built on observation: a learning agent makes decisions and looks to its
      environment for the results. It’s rewarded for good decisions and
      penalized for bad decisions, such that it learns to make better decisions
      over time. I’ll discuss this strategy in more detail in
      <a href="/neuroevolution#">Chapter 11</a>.
    </li>
  </ul>
  <p>
    The ability of a neural network to learn, to make adjustments to its
    structure over time, is what makes it so useful in the field of
    <strong>machine learning</strong>. This term can be traced back to the 1959
    paper “Some Studies in Machine Learning Using the Game of Checkers,” in
    which computer scientist Arthur Lee Samuel outlines a “self-learning”
    program for playing checkers. The concept of an algorithm enabling a
    computer to learn without explicit programming is the foundation of machine
    learning.
  </p>
  <p>
    Think about what you’ve been doing throughout this book: coding! In
    traditional programming, a computer program takes inputs and, based on the
    rules you’ve provided, produces outputs. Machine learning, however, turns
    this approach upside down. Instead of you writing the rules, the system is
    given example inputs and outputs, and generates the rules itself! Many
    algorithms can be used to implement machine learning, and a neural network
    is just one of them.
  </p>
  <p>
    Machine learning is part of the broad, sweeping field of
    <strong>artificial intelligence (AI)</strong>, although the terms are
    sometimes used interchangeably. In their thoughtful and friendly primer
    <em>A People’s Guide to AI</em>, Mimi Onuoha and Diana Nucera (aka Mother
    Cyborg) define AI as “the theory and development of computer systems able to
    perform tasks that normally require human intelligence.” Machine learning
    algorithms are one approach to these tasks, but not all AI systems feature a
    self-learning component.
  </p>
  <h3 id="machine-learning-libraries">Machine Learning Libraries</h3>
  <p>
    Today, leveraging machine learning in creative coding and interactive media
    isn’t only feasible but increasingly common, thanks to third-party libraries
    that handle a lot of the neural network implementation details under the
    hood. While the vast majority of machine learning development and research
    is done in Python, the world of web development has seen the emergence of
    powerful JavaScript-based tools. Two libraries of note are TensorFlow.js and
    ml5.js.
  </p>
  <p>
    TensorFlow.js is an open source library that lets you
    define, train, and run neural networks directly in the browser using
    JavaScript, without the need to install or configure complex environments.
    It’s part of the TensorFlow ecosystem, which is maintained and developed by
    Google. TensorFlow.js is a powerful tool, but its low-level operations and
    highly technical API can be intimidating to beginners. Enter ml5.js, a
    library built on top of TensorFlow.js and designed specifically for use with
    p5.js. Its goal is to be beginner friendly and make machine learning
    approachable for a broad audience of artists, creative coders, and students.
    I’ll demonstrate how to use ml5.js in
    <a href="#machine-learning-with-ml5js">“Machine Learning with ml5.js”</a>.
  </p>
  <p>
    A benefit of libraries like TensorFlow.js and ml5.js is that you can use
    them to run pretrained models. A machine learning <strong>model</strong> is
    a specific setup of neurons and connections, and a
    <strong>pretrained</strong> model is one that has already been prepared for
    a particular task. For example, popular pretrained models are used for
    classifying images, identifying body poses, recognizing facial landmarks or
    hand positions, and even analyzing the sentiment expressed in a text. You
    can use such a model as is or treat it as a starting point for additional
    learning (commonly referred to as <strong>transfer learning</strong>).
  </p>
  <p>
    Before I get to exploring the ml5.js library, however, I’d like to try my
    hand at building the simplest of all neural networks from scratch, using
    only p5.js, to illustrate how the concepts of neural networks and machine
    learning are implemented in code.
  </p>
  <h2 id="the-perceptron">The Perceptron</h2>
  <p>
    A <strong>perceptron</strong> is the simplest neural network possible: a
    computational model of a single neuron. Invented in 1957 by Frank Rosenblatt
    at the Cornell Aeronautical Laboratory, a perceptron consists of one or more
    inputs, a processor, and a single output, as shown in Figure 10.3.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_4.png"
      alt="Figure 10.3: A simple perceptron with two inputs and one output"
    />
    <figcaption>
      Figure 10.3: A simple perceptron with two inputs and one output
    </figcaption>
  </figure>
  <p>
    A perceptron follows the <strong>feed-forward</strong> model: data passes
    (feeds) through the network in one direction. The inputs are sent into the
    neuron, are processed, and result in an output. This means the one-neuron
    network diagrammed in Figure 10.3 reads from left to right (forward): inputs
    come in, and output goes out.
  </p>
  <p>
    Say I have a perceptron with two inputs, the values 12 and 4. In machine
    learning, it’s customary to denote each input with an
    <span data-type="equation">x</span>, so I’ll call these inputs
    <span data-type="equation">x_0</span> and
    <span data-type="equation">x_1</span>:
  </p>
  <table>
    <thead>
      <tr>
        <th style="width: 100px">Input</th>
        <th>Value</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td><span data-type="equation">x_0</span></td>
        <td>12</td>
      </tr>
      <tr>
        <td><span data-type="equation">x_1</span></td>
        <td>4</td>
      </tr>
    </tbody>
  </table>
  <h3 id="perceptron-steps">Perceptron Steps</h3>
  <p>
    To get from these inputs to an output, the perceptron follows a series of
    steps.
  </p>
  <h4 id="step-1-weight-the-inputs">Step 1: Weight the Inputs</h4>
  <p>
    Each input sent into the neuron must first be weighted, meaning it’s
    multiplied by a value, often a number from –1 to +1. When creating a
    perceptron, the inputs are typically assigned random weights. I’ll call my
    weights <span data-type="equation">w_0</span> and
    <span data-type="equation">w_1</span>:
  </p>
  <table>
    <thead>
      <tr>
        <th style="width: 100px">Weight</th>
        <th>Value</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td><span data-type="equation">w_0</span></td>
        <td>0.5</td>
      </tr>
      <tr>
        <td><span data-type="equation">w_1</span></td>
        <td>–1</td>
      </tr>
    </tbody>
  </table>
  <p>Each input needs to be multiplied by its corresponding weight:</p>
  <table>
    <thead>
      <tr>
        <th style="width: 100px">Input</th>
        <th style="width: 100px">Weight</th>
        <th>
          Input <span data-type="equation">\boldsymbol{\times}</span> Weight
        </th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>12</td>
        <td>0.5</td>
        <td>6</td>
      </tr>
      <tr>
        <td>4</td>
        <td>–1</td>
        <td>–4</td>
      </tr>
    </tbody>
  </table>
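  <p>
    As a quick check of the arithmetic in this table, here’s a minimal sketch
    in plain JavaScript (no p5.js required) that multiplies each input by its
    corresponding weight. The helper name <code>weightInputs()</code> is
    hypothetical, chosen only for this illustration:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Hypothetical helper: multiply each input by the weight at the same index.
function weightInputs(inputs, weights) {
  return inputs.map((x, i) => x * weights[i]);
}

let weighted = weightInputs([12, 4], [0.5, -1]);
// weighted holds [6, -4], matching the last column of the table.
console.log(weighted);</pre
  >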
  <h4 id="step-2-sum-the-inputs">Step 2: Sum the Inputs</h4>
  <p>The weighted inputs are then added together:</p>
  <div data-type="equation">6 + (-4) = 2</div>
  <h4 id="step-3-generate-the-output">Step 3: Generate the Output</h4>
  <p>
    The output of a perceptron is produced by passing the sum through an
    <strong>activation function</strong> that reduces the output to one of two
    possible values. Think of this binary output as an LED that’s only
    <em>off</em> or <em>on</em>, or as a neuron in an actual brain that either
    fires or doesn’t fire. The activation function determines whether the
    perceptron should “fire.”
  </p>
  <p>
    Activation functions can get a little bit hairy. If you start reading about
    them in an AI textbook, you may soon find yourself reaching in turn for a
    calculus textbook. However, your new friend the simple perceptron provides
    an easier option that still demonstrates the concept. I’ll make the
    activation function the sign of the sum. If the sum is a positive number,
    the output is 1; if it’s negative, the output is –1:
  </p>
  <div data-type="equation">\text{sign}(2) = +1</div>
  <h3 id="putting-it-all-together-1">Putting It All Together</h3>
  <p>
    Putting the preceding three parts together, here are the steps of the
    <strong>perceptron algorithm</strong>:
  </p>
  <ol>
    <li>For every input, multiply that input by its weight.</li>
    <li>Sum all the weighted inputs.</li>
    <li>
      Compute the output of the perceptron by passing that sum through an
      activation function (the sign of the sum).
    </li>
  </ol>
  <p>
    I can start writing this algorithm in code by using two arrays of values,
    one for the inputs and one for the weights:
  </p>
  <pre class="codesplit" data-code-language="javascript">
let inputs = [12, 4];
let weights = [0.5, -1];</pre
  >
  <p>
    The “for every input” in step 1 implies a loop that multiplies each input by
    its corresponding weight. To obtain the sum, the results can be added up in
    that same loop:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Steps 1 and 2: Add up all the weighted inputs.
let sum = 0;
for (let i = 0; i &#x3C; inputs.length; i++) {
  sum += inputs[i] * weights[i];
}</pre
  >
  <p>With the sum, I can then compute the output:</p>
  <pre class="codesplit" data-code-language="javascript">
// Step 3: Pass the sum through an activation function.
let output = activate(sum);

// The activation function
function activate(sum) {
  //{!5} Return a 1 if positive, –1 if negative.
  if (sum > 0) {
    return 1;
  } else {
    return -1;
  }
}</pre
  >
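  <p>
    Assembled into one piece, the three snippets above can run on their own.
    Here’s a sketch along those lines; the wrapper name
    <code>perceptronOutput()</code> is hypothetical, and the loop iterates with
    <code>entries()</code> instead of an index counter, but it computes the
    same weighted sum and passes it through the same activation function:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// The activation function: +1 if the sum is positive, -1 otherwise.
function activate(sum) {
  if (sum > 0) {
    return 1;
  } else {
    return -1;
  }
}

// Hypothetical wrapper for steps 1 through 3 of the perceptron algorithm.
function perceptronOutput(inputs, weights) {
  // Steps 1 and 2: weight each input and add up the results.
  let sum = 0;
  for (let [i, input] of inputs.entries()) {
    sum += input * weights[i];
  }
  // Step 3: pass the sum through the activation function.
  return activate(sum);
}

// The running example: inputs 12 and 4, weights 0.5 and -1.
let output = perceptronOutput([12, 4], [0.5, -1]);
console.log(output); // 1, because the sum is 6 + (-4) = 2</pre
  >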
  <p>
    You might be wondering how I’m handling the value of 0 in the activation
    function. Is 0 positive or negative? The deep philosophical implications of
    this question aside, I’m choosing here to arbitrarily return a –1 for 0, but
    I could easily change the <code>></code> to <code>>=</code> to go the other
    way. Depending on the application, this decision could be significant, but
    for demonstration purposes here, I can just pick one.
  </p>
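  <p>
    Here’s a small sketch of the two options side by side. It also shows why
    JavaScript’s built-in <code>Math.sign()</code> isn’t a drop-in
    replacement: it returns <code>0</code> for an input of 0, a third value
    that a two-state perceptron has no use for. The function names are
    hypothetical:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Strict comparison: a sum of 0 counts as negative.
function activateStrict(sum) {
  return sum > 0 ? 1 : -1;
}

// Inclusive comparison: a sum of 0 counts as positive.
function activateInclusive(sum) {
  return sum >= 0 ? 1 : -1;
}

console.log(activateStrict(0)); // -1
console.log(activateInclusive(0)); // 1
console.log(Math.sign(0)); // 0, neither +1 nor -1</pre
  >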
  <p>
    Now that I’ve explained the computational process of a perceptron, let’s
    look at an example of one in action.
  </p>
  <h3 id="simple-pattern-recognition-using-a-perceptron">
    Simple Pattern Recognition Using a Perceptron
  </h3>
  <p>
    I’ve mentioned that neural networks are commonly used for pattern
    recognition. The scenarios outlined earlier require more complex networks,
    but even a simple perceptron can demonstrate a fundamental type of pattern
    recognition in which data points are classified as belonging to one of two
    groups. For instance, imagine you have a dataset of plants and want to
    identify them as either <em>xerophytes</em> (plants that have evolved to
    survive in an environment with little water and lots of sunlight, like the
    desert) or <em>hydrophytes</em> (plants that have adapted to living
    submerged in water, with reduced light). That’s how I’ll use my perceptron
    in this section.
  </p>
  <p>
    One way to approach classifying the plants is to plot their data on a 2D
    graph and treat the problem as a spatial one. On the x-axis, plot the amount
    of daily sunlight received by the plant, and on the y-axis, plot the amount
    of water. Once all the data has been plotted, it’s easy to draw a line
    across the graph, with all the xerophytes on one side and all the
    hydrophytes on the other, as in Figure 10.4. (I’m simplifying a little here.
    Real-world data would probably be messier, making the line harder to draw.)
    That’s how each plant can be classified. Is it below the line? Then it’s a
    xerophyte. Is it above the line? Then it’s a hydrophyte.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_5.png"
      alt="Figure 10.4: A collection of points in 2D space divided by a line, representing plant categories according to their water and sunlight intake"
    />
    <figcaption>
      Figure 10.4: A collection of points in 2D space divided by a line,
      representing plant categories according to their water and sunlight intake
    </figcaption>
  </figure>
  <p>
    In truth, I don’t need a neural network—not even a simple perceptron—to tell
    me whether a point is above or below a line. I can see the answer for myself
    with my own eyes, or have my computer figure it out with simple algebra. But
    just like solving a problem with a known answer—“to be or not to be”—was a
    convenient first test for the GA in
    <a href="/genetic-algorithms#">Chapter 9</a>, training a perceptron to
    categorize points as being on one side of a line versus the other will be a
    valuable way to demonstrate the algorithm of the perceptron and verify that
    it’s working properly.
  </p>
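  <p>
    For comparison, here’s what that simple algebra might look like, with no
    neural network involved. The line’s slope and intercept are made-up
    values, the function name <code>classifyPlant()</code> is hypothetical, and
    the coordinates follow the usual math convention (y increases upward)
    rather than the p5.js canvas, where y increases downward:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// A made-up dividing line y = m * x + b, in standard math coordinates.
const m = 0.5;
const b = 10;

// Compare the point's y (water) to the line's y at the same x (sunlight).
function classifyPlant(sunlight, water) {
  let lineY = m * sunlight + b;
  if (water > lineY) {
    return "hydrophyte"; // above the line
  } else {
    return "xerophyte"; // on or below the line
  }
}

console.log(classifyPlant(20, 40)); // hydrophyte: 40 is above 0.5 * 20 + 10 = 20
console.log(classifyPlant(60, 15)); // xerophyte: 15 is below 0.5 * 60 + 10 = 40</pre
  >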
  <p>
    To solve this problem, I’ll give my perceptron two inputs:
    <span data-type="equation">x_0</span> is the x-coordinate of a point,
    representing a plant’s amount of sunlight, and
    <span data-type="equation">x_1</span> is the y-coordinate of that point,
    representing the plant’s amount of water. The perceptron then guesses the
    plant’s classification according to the sign of the weighted sum of these
    inputs. If the sum is positive, the perceptron outputs a +1, signifying a
    hydrophyte (above the line). If the sum is negative, it outputs a –1,
    signifying a xerophyte (below the line). Figure 10.5 shows this perceptron
    (note the shorthand of <span data-type="equation">w_0</span> and
    <span data-type="equation">w_1</span> for the weights).
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_6.png"
      alt="Figure 10.5: A perceptron with two inputs (x_0 and x_1), a weight for each input (w_0 and w_1), and a processing neuron that generates the output"
    />
    <figcaption>
      Figure 10.5: A perceptron with two inputs (<span data-type="equation"
        >x_0</span
      >
      and <span data-type="equation">x_1</span>), a weight for each input (<span
        data-type="equation"
        >w_0</span
      >
      and <span data-type="equation">w_1</span>), and a processing neuron that
      generates the output
    </figcaption>
  </figure>
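  <p>
    To make the sign-of-the-sum idea concrete, here’s a sketch with
    hand-picked (not learned) weights, <span data-type="equation">w_0 = -0.5</span>
    and <span data-type="equation">w_1 = 1</span>. With these values, the
    weighted sum is water – 0.5 × sunlight, which is positive exactly when the
    point lies above the line y = 0.5x. The weights and the function name
    <code>guessPlant()</code> are hypothetical:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// The same sign-based activation used by the perceptron.
function activate(sum) {
  return sum > 0 ? 1 : -1;
}

// Hand-picked weights: w0 = -0.5 for sunlight (x0), w1 = 1 for water (x1).
const weights = [-0.5, 1];

function guessPlant(sunlight, water) {
  let sum = sunlight * weights[0] + water * weights[1];
  // +1 means hydrophyte (above the line), -1 means xerophyte (below).
  return activate(sum) === 1 ? "hydrophyte" : "xerophyte";
}

console.log(guessPlant(20, 40)); // hydrophyte: -10 + 40 = 30 is positive
console.log(guessPlant(60, 15)); // xerophyte: -30 + 15 = -15 is negative</pre
  >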
  <p>
    This scheme has a pretty significant problem, however. What if my data point
    is (0, 0), and I send this point into the perceptron as inputs
    <span data-type="equation">x_0 = 0</span> and
    <span data-type="equation">x_1 = 0</span>? No matter what the weights are,
    multiplication by 0 is 0. The weighted inputs are therefore still 0, and
    their sum will be 0 too. And the sign of 0 is . . . hmmm, there’s that deep
    philosophical quandary again. Regardless of how I feel about it, the point
    (0, 0) could certainly be above or below various lines in a 2D world. How is
    the perceptron supposed to interpret it accurately?
  </p>
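  <p>
    The problem is easy to confirm in code. This sketch computes the weighted
    sum for the point (0, 0) with several randomly chosen pairs of weights. It
    uses <code>Math.random()</code> in place of p5.js’s
    <code>random(-1, 1)</code> so it runs without p5.js, and the helper
    <code>weightedSum()</code> is hypothetical:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Weighted sum for any number of inputs.
function weightedSum(inputs, weights) {
  let sum = 0;
  for (let [i, input] of inputs.entries()) {
    sum += input * weights[i];
  }
  return sum;
}

// Five trials, each with a fresh pair of weights between -1 and 1.
let results = Array.from({ length: 5 }, () => {
  let weights = [Math.random() * 2 - 1, Math.random() * 2 - 1];
  return weightedSum([0, 0], weights);
});

// Every entry is 0, no matter which weights were drawn.
console.log(results);</pre
  >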
  <p>
    To avoid this dilemma, the perceptron requires a third input, typically
    referred to as a <strong>bias</strong> input. This extra input always has
    the value of 1 and is also weighted. Figure 10.6 shows the perceptron with
    the addition of the bias.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_7.png"
      alt="Figure 10.6: Adding a bias input, along with its weight, to the perceptron"
    />
    <figcaption>
      Figure 10.6: Adding a bias input, along with its weight, to the perceptron
    </figcaption>
  </figure>
  <p>How does this affect point (0, 0)?</p>
  <table>
    <thead>
      <tr>
        <th style="width: 100px">Phrase</th>
        <th style="width: 100px">Phrase</th>
        <th>Result</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>0</td>
        <td><span data-type="equation">w_0</span></td>
        <td>0</td>
      </tr>
      <tr>
        <td>0</td>
        <td><span data-type="equation">w_1</span></td>
        <td>0</td>
      </tr>
      <tr>
        <td>1</td>
        <td><span data-type="equation">w_\text{bias}</span></td>
        <td><span data-type="equation">w_\text{bias}</span></td>
      </tr>
    </tbody>
  </table>
  <p>
    The sum of the weighted inputs is then
    <span data-type="equation">0 + 0 + w_\text{bias}</span>, and the output is
    the sign of that sum. Therefore, the bias by itself answers the question of
    where (0, 0) is in relation to the line.
    If the bias’s weight is positive, (0, 0) is above the line; if negative,
    it’s below. The extra input and its weight <em>bias</em> the perceptron’s
    understanding of the line’s position relative to (0, 0)!
  </p>
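  <p>
    Repeating the (0, 0) check with the bias included shows the sum landing
    exactly on the bias weight. The weight values here are made up for
    illustration, and <code>weightedSum()</code> is a hypothetical helper:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Weighted sum for any number of inputs.
function weightedSum(inputs, weights) {
  let sum = 0;
  for (let [i, input] of inputs.entries()) {
    sum += input * weights[i];
  }
  return sum;
}

// The point (0, 0) plus the bias input, which is always 1.
let inputs = [0, 0, 1];

// Made-up weights: w0, w1, and then the bias weight.
let sumPositive = weightedSum(inputs, [0.3, -0.8, 0.6]);
let sumNegative = weightedSum(inputs, [0.3, -0.8, -0.25]);

console.log(sumPositive); // 0.6: positive, so (0, 0) reads as above the line
console.log(sumNegative); // -0.25: negative, so (0, 0) reads as below the line</pre
  >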
  <h3 id="the-perceptron-code">The Perceptron Code</h3>
  <p>
    I’m now ready to assemble the code for a <code>Perceptron</code> class. The
    perceptron needs to track only the input weights, which I can store using an
    array:
  </p>
  <div class="snip-below">
    <pre class="codesplit" data-code-language="javascript">
class Perceptron {
  constructor() {
    this.weights = [];
  }</pre
    >
  </div>
  <p>
    The constructor can receive an argument indicating the number of inputs (in
    this case, three: <span data-type="equation">x_0</span>,
    <span data-type="equation">x_1</span>, and a bias) and size the
    <code>weights</code> array accordingly, filling it with random values to
    start:
  </p>
  <div class="snip-above snip-below">
    <pre class="codesplit" data-code-language="javascript">
  // The argument n determines the number of inputs (including the bias).
  constructor(n) {
    this.weights = [];
    for (let i = 0; i &#x3C; n; i++) {
      //{!1} The weights are picked randomly to start.
      this.weights[i] = random(-1, 1);
    }
  }</pre
    >
  </div>
  <p>
    A perceptron’s job is to receive inputs and produce an output. These
    requirements can be packaged together in a
    <code>feedForward()</code> method. In this example, the perceptron’s inputs
    are an array (which should be the same length as the array of weights), and
    the output is a number, +1 or –1, as returned by the activation function
    based on the sign of the sum:
  </p>
  <div class="snip-above">
    <pre class="codesplit" data-code-language="javascript">
  feedForward(inputs) {
    let sum = 0;
    for (let i = 0; i &#x3C; this.weights.length; i++) {
      sum += inputs[i] * this.weights[i];
    }
    //{!1} The result is the sign of the sum, –1 or +1.
    // Here the perceptron is making a guess:
    // Is it on one side of the line or the other?
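    return this.activate(sum);
  }

  // The activation function from earlier, now a method of the class
  activate(sum) {
    // Return 1 if positive, –1 otherwise.
    if (sum > 0) {
      return 1;
    } else {
      return -1;
    }
  }
}</pre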
    >
  </div>
  <p>
    Presumably, I could now create a <code>Perceptron</code> object and ask it
    to make a guess for any given point, as in Figure 10.7.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_8.png"
      alt="Figure 10.7: An (x, y) coordinate from the 2D space is the input to the perceptron."
    />
    <figcaption>
      Figure 10.7: An (<em>x</em>, <em>y</em>) coordinate from the 2D space is
      the input to the perceptron.
    </figcaption>
  </figure>
  <p>Here’s the code to generate a guess:</p>
  <pre class="codesplit" data-code-language="javascript">
// Create the perceptron.
let perceptron = new Perceptron(3);
// The input is three values: x, y, and the bias.
let inputs = [50, -12, 1];
// The answer!
let guess = perceptron.feedForward(inputs);</pre
  >
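  <p>
    For a version you can run outside a p5.js sketch, here’s the class with
    its pieces assembled, including the activation function as a method.
    Because p5.js’s <code>random()</code> isn’t available there, this sketch
    substitutes a hypothetical <code>randomBetween()</code> helper built on
    <code>Math.random()</code>; inside p5.js you’d call
    <code>random(-1, 1)</code> as before:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Hypothetical stand-in for p5.js's random(min, max).
function randomBetween(min, max) {
  return min + Math.random() * (max - min);
}

class Perceptron {
  // n is the number of inputs, including the bias.
  constructor(n) {
    // The weights are picked randomly to start.
    this.weights = Array.from({ length: n }, () => randomBetween(-1, 1));
  }

  feedForward(inputs) {
    let sum = 0;
    for (let [i, weight] of this.weights.entries()) {
      sum += inputs[i] * weight;
    }
    return this.activate(sum);
  }

  // Sign activation: +1 if positive, -1 otherwise.
  activate(sum) {
    return sum > 0 ? 1 : -1;
  }
}

let perceptron = new Perceptron(3);
// The input is three values: x, y, and the bias.
let guess = perceptron.feedForward([50, -12, 1]);
// guess is either 1 or -1, depending on the random starting weights.
console.log(perceptron.weights, guess);</pre
  >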
  <p>
    Did the perceptron get it right? Maybe yes, maybe no. At this point, the
    perceptron has no better than a 50/50 chance of arriving at the correct
    answer, since each weight starts out as a random value. A neural network
    isn’t a magic tool that can automatically guess correctly on its own. I need
    to teach it how to do so!
  </p>
  <p>
    To train a neural network to answer correctly, I’ll use the supervised
    learning method I described earlier in the chapter. Remember, this technique
    involves giving the network inputs with known answers. This enables the
    network to check whether it has made a correct guess. If not, the network
    can learn from its mistake and adjust its weights. The process is as
    follows:
  </p>
  <ol>
    <li>
      Provide the perceptron with inputs for which there is a known answer.
    </li>
    <li>Ask the perceptron to guess an answer.</li>
    <li>Compute the error. (Did it get the answer right or wrong?)</li>
    <li>Adjust all the weights according to the error.</li>
    <li>Return to step 1 and repeat!</li>
  </ol>
  <p>
    This process can be packaged into a method on the
    <code>Perceptron</code> class, but before I can write it, I need to examine
    steps 3 and 4 in more detail. How do I define the perceptron’s error? And
    how should I adjust the weights according to this error?
  </p>
  <p>
    The perceptron’s error can be defined as the difference between the desired
    answer and its guess:
  </p>
  <div data-type="equation">
    \text{error} = \text{desired output} - \text{guess output}
  </div>
  <p>
    Does this formula look familiar? Think back to the formula for a vehicle’s
    steering force that I worked out in
    <a href="/autonomous-agents#">Chapter 5</a>:
  </p>
  <div data-type="equation">
    \text{steering} = \text{desired velocity} - \text{current velocity}
  </div>
  <p>
    This is also a calculation of an error! The current velocity serves as a
    guess, and the error (the steering force) indicates how to adjust the
    velocity in the correct direction. Adjusting a vehicle’s velocity to follow
    a target is similar to adjusting the weights of a neural network toward the
    correct answer.
  </p>
  <p>
    For the perceptron, the output has only two possible values: +1 or –1.
    Therefore, only three errors are possible. If the perceptron guesses the
    correct answer, the guess equals the desired output and the error is 0. If
    the correct answer is –1 and the perceptron guessed +1, then the error is
    –2. If the correct answer is +1 and the perceptron guessed –1, then the
    error is +2. Here’s that process summarized in a table:
  </p>
  <table>
    <thead>
      <tr>
        <th style="width: 100px">Desired</th>
        <th style="width: 100px">Guess</th>
        <th>Error</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>–1</td>
        <td>–1</td>
        <td>0</td>
      </tr>
      <tr>
        <td>–1</td>
        <td>+1</td>
        <td>–2</td>
      </tr>
      <tr>
        <td>+1</td>
        <td>–1</td>
        <td>+2</td>
      </tr>
      <tr>
        <td>+1</td>
        <td>+1</td>
        <td>0</td>
      </tr>
    </tbody>
  </table>
  <p>
    The error is the determining factor in how the perceptron’s weights should
    be adjusted. For any given weight, what I’m looking to calculate is the
    change in weight, often called
    <span data-type="equation">\Delta\text{weight}</span> (or
    <em>delta weight</em>, <span data-type="equation">\Delta</span> being the
    Greek letter delta):
  </p>
  <div data-type="equation">
    \text{new weight} = \text{weight} + \Delta\text{weight}
  </div>
  <p>
    To calculate <span data-type="equation">\Delta\text{weight}</span>, I need
    to multiply the error by the input:
  </p>
  <div data-type="equation">
    \Delta\text{weight} = \text{error} \times \text{input}
  </div>
  <p>Therefore, the new weight is calculated as follows:</p>
  <div data-type="equation">
    \text{new weight} = \text{weight} + \text{error} \times \text{input}
  </div>
  <p>
    To understand why this works, think again about steering. A steering force
    is essentially an error in velocity. By applying a steering force as an
    acceleration (or <span data-type="equation">\Delta\text{velocity}</span>),
    the velocity is adjusted to move in the correct direction. This is what I
    want to do with the neural network’s weights. I want to adjust them in the
    right direction, as defined by the error.
  </p>
  <p>
    With steering, however, I had an additional variable that controlled the
    vehicle’s ability to steer: the maximum force. A high maximum force allowed
    the vehicle to accelerate and turn quickly, while a lower force resulted in
    a slower velocity adjustment. The neural network will use a similar strategy
    with a variable called the <strong>learning constant</strong>:
  </p>
  <div data-type="equation">
    \text{new weight} = \text{weight} + (\text{error} \times \text{input})
    \times \text{learning constant}
  </div>
  <p>
    A high learning constant causes the weight to change more drastically. This
    may help the perceptron arrive at a solution more quickly, but it also
    increases the risk of overshooting the optimal weights. A small learning
    constant will adjust the weights more slowly and require more training time,
    but will allow the network to make small adjustments that could improve
    overall accuracy.
  </p>
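  <p>
    To put some made-up numbers to this, suppose a weight is currently 0.3, its
    input is 50, and the perceptron guessed –1 when the desired answer was +1,
    for an error of +2. With a learning constant of 0.01, the weight jumps by a
    full 1:
  </p>
  <div data-type="equation">
    \text{new weight} = 0.3 + (2 \times 50) \times 0.01 = 0.3 + 1 = 1.3
  </div>
  <p>With a learning constant of 0.0001, the same mistake nudges the weight much less:</p>
  <div data-type="equation">
    \text{new weight} = 0.3 + (2 \times 50) \times 0.0001 = 0.3 + 0.01 = 0.31
  </div>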
  <p>
    Assuming the addition of a <code>learningConstant</code> property to the
    <code>Perceptron</code> class, I can now write a training method for the
    perceptron following the steps I outlined earlier:
  </p>
  <pre class="codesplit" data-code-language="javascript">
  // Step 1: Provide the inputs and known answer.
  // These are passed in as arguments to <code>train()</code>.
  train(inputs, desired) {
    // Step 2: Guess according to those inputs.
    let guess = this.feedforward(inputs);

    // Step 3: Compute the error (the difference between <code>desired</code> and <code>guess</code>).
    let error = desired - guess;

    //{!3} Step 4: Adjust all the weights according to the error and learning constant.
    for (let i = 0; i &#x3C; this.weights.length; i++) {
      this.weights[i] = this.weights[i] + error * inputs[i] * this.learningConstant;
    }
  }</pre
  >
  <p>Here’s the <code>Perceptron</code> class as a whole:</p>
  <pre class="codesplit" data-code-language="javascript">
class Perceptron {
  constructor(totalInputs, learningRate = 0.01) {
    //{!2} The perceptron stores its weights and learning constant (0.01 unless another value is passed in).
    this.weights = [];
    this.learningConstant = learningRate;
    //{!3} The weights start off random.
    for (let i = 0; i &#x3C; totalInputs; i++) {
      this.weights[i] = random(-1, 1);
    }
  }

  //{!7} Return an output based on inputs.
  feedforward(inputs) {
    let sum = 0;
    for (let i = 0; i &#x3C; this.weights.length; i++) {
      sum += inputs[i] * this.weights[i];
    }
    return this.activate(sum);
  }

  // The output is a +1 or –1.
  activate(sum) {
    if (sum > 0) {
      return 1;
    } else {
      return -1;
    }
  }

  //{!7} Train the network against known data.
  train(inputs, desired) {
    let guess = this.feedforward(inputs);
    let error = desired - guess;
    for (let i = 0; i &#x3C; this.weights.length; i++) {
      this.weights[i] = this.weights[i] + error * inputs[i] * this.learningConstant;
    }
  }
}</pre
  >
  <p>
    To train the perceptron, I need a set of inputs with known answers. However,
    I don’t happen to have a real-world dataset (or time to research and collect
    one) for the xerophytes and hydrophytes scenario. In truth, though, the
    purpose of this demonstration isn’t to show you how to classify plants. It’s
    about how a perceptron can learn whether points are above or below a line on
    a graph, and so any set of points will do. In other words, I can just make
    up the data.
  </p>
  <p>
    What I’m describing is an example of <strong>synthetic data</strong>,
    artificially generated data that’s often used in machine learning to create
    controlled scenarios for training and testing. In this case, my synthetic
    data will consist of a set of random input points, each with a known answer
    indicating whether the point is above or below a line. To define the line
    and generate the data, I’ll use simple algebra. This approach allows me to
    clearly demonstrate the training process and show how the perceptron learns.
  </p>
  <p>
    The question therefore becomes, how do I pick a point and know whether it’s
    above or below a line (without a neural network, that is)? A line can be
    described as a collection of points, where each point’s y-coordinate is a
    function of its x-coordinate:
  </p>
  <div data-type="equation">y = f(x)</div>
  <p>
    For a straight line (specifically, a linear function), the relationship can
    be written like this:
  </p>
  <div data-type="equation">y = mx + b</div>
  <p>
    Here <em>m</em> is the slope of the line, and <em>b</em> is the value of
    <em>y</em> when <em>x</em> is 0 (the y-intercept). Here’s a specific
    example, with the corresponding graph in Figure 10.8.
  </p>
  <div data-type="equation">y = \frac{1}{2}x - 1</div>
  <figure>
    <img
      src="images/10_nn/10_nn_9.png"
      alt="Figure 10.8: A graph of y = (1/2)x - 1"
    />
    <figcaption>
      Figure 10.8: A graph of
      <span data-type="equation">y = \frac{1}{2}x - 1</span>
    </figcaption>
  </figure>
  <p>
    I’ll arbitrarily choose that as the equation for my line, and write a
    function accordingly:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// A function to calculate <code>y</code> based on <code>x</code> along a line
function f(x) {
  return 0.5 * x - 1;
}</pre
  >
  <p>
    Now there’s the matter of the p5.js canvas defaulting to (0, 0) in the
    top-left corner with the y-axis pointing down. For this discussion, I’ll
    assume I’ve built the following into the code to reorient the canvas to
    match a more traditional Cartesian space:
  </p>
  <pre
    class="codesplit"
    data-code-language="javascript"
  >// Move the origin <code>(0, 0)</code> to the center.
translate(width / 2, height / 2);
// Flip the y-axis orientation (positive points up!).
scale(1, -1);</pre>
  <p>I can now pick a random point in the 2D space:</p>
  <pre class="codesplit" data-code-language="javascript">
let x = random(-100, 100);
let y = random(-100, 100);</pre
  >
  <p>
    How do I know if this point is above or below the line? The line function
    <em>f</em>(<em>x</em>) returns the <em>y</em> value on the line for that
    x-position. I’ll call that <span data-type="equation">y_\text{line}</span>:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// The <code>y</code> position on the line
let yline = f(x);</pre
  >
  <p>
    If the <em>y</em> value I’m examining is above the line, it will be greater
    than <span data-type="equation">y_\text{line}</span>, as in Figure 10.9.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_10.png"
      alt="Figure 10.9: If y_line is less than y, the point is above the line."
    />
    <figcaption>
      Figure 10.9: If <span data-type="equation">y_\text{line}</span> is less
      than <em>y</em>, the point is above the line.
    </figcaption>
  </figure>
  <p>Here’s the code for that logic:</p>
  <pre class="codesplit" data-code-language="javascript">
// Start with a value of –1.
let desired = -1;
if (y > yline) {
  //{!1} The answer becomes +1 if <code>y</code> is above the line.
  desired = 1;
}</pre
  >
  <p>
    I can then make an input array to go with the <code>desired</code> output:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Don’t forget to include the bias!
let trainingInputs = [x, y, 1];</pre
  >
  <p>
    Assuming that I have a <code>perceptron</code> variable, I can train it by
    providing the inputs along with the desired answer:
  </p>
  <pre class="codesplit" data-code-language="javascript">
perceptron.train(trainingInputs, desired);</pre
  >
  <p>
    If I train the perceptron on a new random point (and its answer) for each
    cycle through <code>draw()</code>, it will gradually get better at
    classifying the points as above or below the line.
  </p>
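  <p>
    If you’d like to check that claim without watching the animation, here’s a
    headless sketch in plain JavaScript (no p5.js). It trains a compact stand-in
    for the <code>Perceptron</code> class on fresh random points and then
    measures its accuracy on a separate batch of points it never trained on.
    Several details are assumptions made for this sketch, not part of the
    example: the seeded random number generator (in the style of mulberry32, so
    runs repeat exactly), the helper names <code>randomRange()</code>,
    <code>makePoint()</code>, and <code>accuracy()</code>, the 10,000 training
    points, and the 1,000 test points.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// A small seeded random number generator (mulberry32 style), used in place
// of p5.js's random() so the sketch runs without p5.js and repeats exactly.
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), seed | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
let rng = mulberry32(42);

// Hypothetical helper: a random number between min and max.
function randomRange(min, max) {
  return min + rng() * (max - min);
}

// The line from the text: y = 0.5x - 1.
function f(x) {
  return 0.5 * x - 1;
}

// A random point (with the bias input of 1) and its known answer.
function makePoint() {
  let x = randomRange(-100, 100);
  let y = randomRange(-100, 100);
  return { inputs: [x, y, 1], desired: y > f(x) ? 1 : -1 };
}

// A compact stand-in for the Perceptron class: random starting weights,
// sign activation, and the same error * input * learning constant update.
let weights = [randomRange(-1, 1), randomRange(-1, 1), randomRange(-1, 1)];
let learningConstant = 0.01;

function feedforward(inputs) {
  let sum = inputs.reduce((total, input, i) => total + input * weights[i], 0);
  return sum > 0 ? 1 : -1;
}

function train(inputs, desired) {
  let error = desired - feedforward(inputs);
  weights = weights.map((w, i) => w + error * inputs[i] * learningConstant);
}

// Fraction of points whose guess matches the known answer.
function accuracy(points) {
  let correct = points.filter((p) => feedforward(p.inputs) === p.desired);
  return correct.length / points.length;
}

// Separate test points the perceptron never trains on.
let testPoints = Array.from({ length: 1000 }, makePoint);
let before = accuracy(testPoints);

// Train on 10,000 fresh random points, one at a time.
let trainingPoints = Array.from({ length: 10000 }, makePoint);
for (let point of trainingPoints) {
  train(point.inputs, point.desired);
}
let after = accuracy(testPoints);
console.log(`Accuracy before: ${before}, after: ${after}`);</pre>
  <p>
    The accuracy before training depends on the random starting weights; after
    training, the perceptron should classify nearly all of the test points
    correctly.
  </p>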
  <div data-type="example">
    <h3 id="example-101-the-perceptron">Example 10.1: The Perceptron</h3>
    <figure>
      <div
        data-type="embed"
        data-p5-editor="https://editor.p5js.org/natureofcode/sketches/sMozIaMCW"
        data-example-path="examples/10_nn/10_1_perceptron_with_normalization"
      >
        <img
          src="examples/10_nn/10_1_perceptron_with_normalization/screenshot.png"
        />
      </div>
      <figcaption></figcaption>
    </figure>
  </div>
  <pre class="codesplit" data-code-language="javascript">// The perceptron
let perceptron;
//{!1} An array for training data
let training = [];
// A counter to track training data points one by one
let count = 0;

//{!3} The formula for a line
function f(x) {
  return 0.5 * x - 1;
}

function setup() {
  createCanvas(640, 240);

  // The perceptron has three inputs (including bias) and a learning rate of 0.0001.
  perceptron = new Perceptron(3, 0.0001);

  //{!1} Make 2,000 training data points.
  for (let i = 0; i &#x3C; 2000; i++) {
    let x = random(-width / 2, width / 2);
    let y = random(-height / 2, height / 2);
    training[i] = [x, y, 1];
  }
}

function draw() {
  background(255);
  // Reorient the canvas to match a traditional Cartesian plane.
  translate(width / 2, height / 2);
  scale(1, -1);

  // Draw the line.
  stroke(0);
  strokeWeight(2);
  line(-width / 2, f(-width / 2), width / 2, f(width / 2));

  // Get the current <code>(x, y)</code> of the training data.
  let x = training[count][0];
  let y = training[count][1];
  // What is the desired output?
  let desired = -1;
  if (y > f(x)) {
    desired = 1;
  }
  // Train the perceptron.
  perceptron.train(training[count], desired);

  // For animation, train one point at a time.
  count = (count + 1) % training.length;

  // Draw all the points and color according to the output of the perceptron.
  for (let dataPoint of training) {
    let guess = perceptron.feedforward(dataPoint);
    if (guess > 0) {
      fill(127);
    } else {
      fill(255);
    }
    strokeWeight(1);
    stroke(0);
    circle(dataPoint[0], dataPoint[1], 8);
  }
}</pre>
  <p>
    In Example 10.1, the training data is visualized alongside the target
    solution line. Each point represents a piece of training data, and its color
    is determined by the perceptron’s current classification—gray for +1 or
    white for –1. I use a small learning constant (0.0001) to slow down how the
    system refines its classifications over time.
  </p>
  <p>
    An intriguing aspect of this example lies in the relationship between the
    perceptron’s weights and the characteristics of the line dividing the
    points—specifically, the line’s slope and y-intercept (the <em>m</em> and
    <em>b</em> in <em>y</em> = <em>mx</em> + <em>b</em>). The weights in this
    context aren’t just arbitrary or “magic” values; they bear a direct
    relationship to the geometry of the dataset. In this case, I’m using just 2D
    data, but for many machine learning applications, the data exists in much
    higher-dimensional spaces. The weights of a neural network help navigate
    these spaces, defining <em>hyperplanes</em> or decision boundaries that
    segment and classify the data.
  </p>
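  <p>
    To see that relationship concretely, call the perceptron’s three weights
    <span data-type="equation">w_0</span>, <span data-type="equation">w_1</span>,
    and <span data-type="equation">w_2</span> (multiplying <em>x</em>,
    <em>y</em>, and the bias input of 1, respectively). The sign activation
    flips its output exactly where the weighted sum crosses zero, so that
    boundary can be solved for <em>y</em> (assuming
    <span data-type="equation">w_1</span> isn’t zero):
  </p>
  <div data-type="equation">
    w_0x + w_1y + w_2 = 0 \quad\Longrightarrow\quad y = -\frac{w_0}{w_1}x - \frac{w_2}{w_1}
  </div>
  <p>
    In other words, <span data-type="equation">-w_0/w_1</span> plays the role of
    the slope <em>m</em>, and <span data-type="equation">-w_2/w_1</span> plays
    the role of the y-intercept <em>b</em>.
  </p>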
  <div data-type="exercise">
    <h3 id="exercise-101">Exercise 10.1</h3>
    <p>
      Modify the code from Example 10.1 to also draw the perceptron’s current
      decision boundary during the training process—its best guess for where the
      line should be. Hint: Use the perceptron’s current weights to calculate
      the line’s equation.
    </p>
  </div>
  <p>
    While this perceptron example offers a conceptual foundation, real-world
    datasets often feature more diverse and dynamic ranges of input values. For
    the simplified scenario here, the range of values for <em>x</em> is larger
    than that for <em>y</em> because of the canvas size of 640<span
      data-type="equation"
      >\times</span
    >240. Despite this, the example still works—after all, the sign activation
    function doesn’t rely on specific input ranges, and it’s such a
    straightforward binary classification task.
  </p>
  <p>
    However, real-world data often has much greater complexity in terms of input
    ranges. To this end, <strong>data normalization</strong> is a critical step
    in machine learning. Normalizing data involves mapping the training data to
    ensure that all inputs (and outputs) conform to a uniform range—typically 0
    to 1, or perhaps –1 to 1. This process can improve training efficiency and
    prevent individual inputs from dominating the learning process. In the next
    section, using the ml5.js library, I’ll build data normalization into the
    process.
  </p>
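  <p>
    As a minimal sketch of what that mapping involves, here’s a hypothetical
    <code>normalize()</code> helper that linearly rescales a value from its
    known range to –1 to 1. (The p5.js <code>map()</code> function can do the
    same job; the helper is written out in plain JavaScript so the arithmetic is
    visible.) The ranges come from Example 10.1’s 640×240 canvas, and the bias
    input stays at 1:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Hypothetical helper: linearly rescale value from [min, max] to [-1, 1].
function normalize(value, min, max) {
  return ((value - min) / (max - min)) * 2 - 1;
}

// Example 10.1's canvas: x spans -320 to 320, y spans -120 to 120.
let x = 160;
let y = -60;
// The bias input is left at 1; only the coordinates are rescaled.
let normalizedInputs = [normalize(x, -320, 320), normalize(y, -120, 120), 1];
console.log(normalizedInputs);</pre>
  <p>
    Both coordinates now share the same scale, so neither dominates the weighted
    sum simply because the canvas is wider than it is tall.
  </p>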
  <div data-type="exercise">
    <h3 id="exercise-102">Exercise 10.2</h3>
    <p>
      Instead of using supervised learning, can you train the neural network to
      find the right weights by using a GA?
    </p>
  </div>
  <div data-type="exercise">
    <h3 id="exercise-103">Exercise 10.3</h3>
    <p>
      Incorporate data normalization into the example. Does this improve the
      learning efficiency?
    </p>
  </div>
  <h2 id="putting-the-network-in-neural-network">
    Putting the “Network” in Neural Network
  </h2>
  <p>
    A perceptron can have multiple inputs, but it’s still just a single, lonely
    neuron. Unfortunately, that limits the range of problems it can solve. The
    true power of neural networks comes from the <em>network</em> part. Link
    multiple neurons together and you’re able to solve problems of much greater
    complexity.
  </p>
  <p>
    If you read an AI textbook, it will say that a perceptron can solve only
    <strong>linearly separable</strong> problems. If a dataset is linearly
    separable, you can graph it and classify it into two groups simply by
    drawing a straight line (see Figure 10.10, left). Classifying plants as
    xerophytes or hydrophytes is a linearly separable problem.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_11.png"
      alt="Figure 10.10: Data points that are linearly separable (left) and data points that are nonlinearly separable, as a curve is required to separate the points (right)"
    />
    <figcaption>
      Figure 10.10: Data points that are linearly separable (left) and data
      points that are nonlinearly separable, as a curve is required to separate
      the points (right)
    </figcaption>
  </figure>
  <p>
    Now imagine you’re classifying plants according to soil acidity (x-axis) and
    temperature (y-axis). Some plants might thrive in acidic soils but only
    within a narrow temperature range, while other plants prefer less acidic
    soils but tolerate a broader range of temperatures. A more complex
    relationship exists between the two variables, so a straight line can’t be
    drawn to separate the two categories of plants, <em>acidophilic</em> and
    <em>alkaliphilic</em> (see Figure 10.10, right). A lone perceptron can’t
    handle this type of <strong>nonlinearly separable</strong> problem. (Caveat
    here: I’m making up these scenarios. If you happen to be a botanist, please
    let me know if I’m anywhere close to reality.)
  </p>
  <p>
    One of the simplest examples of a nonlinearly separable problem is XOR
    (exclusive or). This is a logical operator, similar to the more familiar AND
    and OR. For <em>A</em> AND <em>B</em> to be true, both <em>A</em> and
    <em>B</em> must be true. With OR, either <em>A</em> or <em>B</em> (or both)
    can be true. These are both linearly separable problems. The truth tables in
    Figure 10.11 show their solution space. Each true or false value in the
    table shows the output for a particular combination of true or false inputs.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_12.png"
      alt="Figure 10.11: Truth tables for the AND and OR logical operators. The true and false outputs can be separated by a line."
    />
    <figcaption>
      Figure 10.11: Truth tables for the AND and OR logical operators. The true
      and false outputs can be separated by a line.
    </figcaption>
  </figure>
  <p>
    See how you can draw a straight line to separate the true outputs from the
    false ones?
  </p>
  <p>
    The XOR operator is the equivalent of (OR) AND (NOT AND). In other words,
    <em>A</em> XOR <em>B</em> evaluates to true only if exactly one of the
    inputs is true. If both inputs are false or both are true, the output is
    false. To
    illustrate, let’s say you’re having pizza for dinner. You love pineapple on
    pizza, and you love mushrooms on pizza, but put them together, and yech! And
    plain pizza, that’s no good either!
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_13.png"
      alt="Figure 10.12: The “truth” table for whether you want to eat the pizza (left) and XOR (right). Note how the true and false outputs can’t be separated by a single line."
    />
    <figcaption>
      Figure 10.12: The “truth” table for whether you want to eat the pizza
      (left) and XOR (right). Note how the true and false outputs can’t be
      separated by a single line.
    </figcaption>
  </figure>
  <p>
    The XOR truth table in Figure 10.12 isn’t linearly separable. Try to draw a
    straight line to separate the true outputs from the false ones—you can’t!
  </p>
  <p>
    The fact that a perceptron can’t even solve something as simple as XOR may
    seem extremely limiting. But what if I made a network out of two
    perceptrons? If one perceptron can solve the linearly separable OR and one
    perceptron can solve the linearly separable NOT AND, then the two combined
    (with their outputs feeding into one more perceptron that performs the AND)
    can solve the nonlinearly separable XOR.
  </p>
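  <p>
    Here’s a minimal sketch of that idea with hand-picked weights rather than
    trained ones. It’s an illustration under assumptions: the
    <code>neuron()</code> helper and every weight and bias below are made up for
    this example, the inputs and outputs use 0 and 1 (false and true) instead of
    –1 and +1, and the step activation outputs 1 for a positive sum and 0
    otherwise. Combining the OR and NOT AND results takes one more neuron,
    performing AND, which plays the role of an output layer:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Hypothetical helper: a hand-wired neuron with a step activation.
// Returns 1 if the weighted sum plus the bias is positive, otherwise 0.
function neuron(inputs, weights, bias) {
  let sum = inputs.reduce((total, input, i) => total + input * weights[i], bias);
  return sum > 0 ? 1 : 0;
}

// Hand-picked (not trained) weights and biases for three linearly separable operators.
function orNeuron(a, b) {
  return neuron([a, b], [1, 1], -0.5);
}
function nandNeuron(a, b) {
  return neuron([a, b], [-1, -1], 1.5);
}
function andNeuron(a, b) {
  return neuron([a, b], [1, 1], -1.5);
}

// Hidden layer: OR and NOT AND. Output layer: AND of the two hidden results.
function xor(a, b) {
  return andNeuron(orNeuron(a, b), nandNeuron(a, b));
}

// Print the full truth table (0 = false, 1 = true).
for (let a of [0, 1]) {
  for (let b of [0, 1]) {
    console.log(`${a} XOR ${b} = ${xor(a, b)}`);
  }
}</pre>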
  <p>
    When you combine multiple perceptrons, you get a
    <strong>multilayered perceptron</strong>, a network of many neurons (see
    Figure 10.13). Some are input neurons and receive the initial inputs, some
    are part of what’s called a <strong>hidden layer</strong> (as they’re
    connected to neither the inputs nor the outputs of the network directly),
    and then there are the output neurons, from which the results are read.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_14.png"
      alt="Figure 10.13: A multilayered perceptron has the same inputs and output as the simple perceptron, but now it includes a hidden layer of neurons."
    />
    <figcaption>
      Figure 10.13: A multilayered perceptron has the same inputs and output as
      the simple perceptron, but now it includes a hidden layer of neurons.
    </figcaption>
  </figure>
  <p>
    Up until now, I’ve been visualizing a singular perceptron with one circle
    representing a neuron processing its input signals. Now, as I move on to
    larger networks, it’s more typical to represent all the elements (inputs,
    neurons, outputs) as circles, with arrows that indicate the flow of data. In
    Figure 10.13, you can see the inputs and bias flowing into the hidden layer,
    which then flows to the output.
  </p>
  <p>
    Training a simple perceptron is pretty straightforward: you feed the data
    through and evaluate how to change the input weights according to the error.
    With a multilayered perceptron, however, the training process becomes more
    complex. The overall output of the network is still generated in essentially
    the same manner as before: the inputs multiplied by the weights are summed
    and fed forward through the various layers of the network. And you still use
    the network’s guess to calculate the error (desired result – guess). But now
    so many connections exist between layers of the network, each with its own
    weight. How do you know how much each neuron or connection contributed to
    the overall error of the network, and how it should be adjusted?
  </p>
  <p>
    The solution to optimizing the weights of a multilayered network is
    <strong>backpropagation</strong>. This process takes the error and feeds it
    backward through the network so it can adjust the weights of all the
    connections in proportion to how much they’ve contributed to the total
    error. The details of backpropagation are beyond the scope of this book. The
    algorithm relies on smooth, differentiable activation functions (one classic
    example is the sigmoid function) in place of the perceptron’s sign function,
    as well as some calculus. If you’re interested in
    continuing down this road and learning more about how backpropagation works,
    you can find my
    <a href="https://thecodingtrain.com/neural-network"
      >“Toy Neural Network” project at the Coding Train website with
      accompanying video tutorials</a
    >. They go through all the steps of solving XOR using a multilayered
    feed-forward network with backpropagation. For this chapter, however, I’d
    instead like to get some help and phone a friend.
  </p>
  <h2 id="machine-learning-with-ml5js">Machine Learning with ml5.js</h2>
  <p>
    That friend is ml5.js. This machine learning library can manage the details
    of complex processes like backpropagation so you and I don’t have to worry
    about them. As I mentioned earlier in the chapter, ml5.js aims to provide a
    friendly entry point for those who are new to machine learning and neural
    networks, while still harnessing the power of Google’s TensorFlow.js behind
    the scenes.
  </p>
  <p>
    To use ml5.js in a sketch, you must import it via a
    <code>&#x3C;script></code> element in your <em>index.html</em> file, much as
    you did with Matter.js and Toxiclibs.js in
    <a href="/physics-libraries#">Chapter 6</a>:
  </p>
  <pre class="codesplit" data-code-language="html">
&#x3C;script src="https://unpkg.com/ml5@latest/dist/ml5.min.js">&#x3C;/script></pre
  >
  <p>
    My goal for the rest of this chapter is to introduce ml5.js by developing a
    system that can recognize mouse gestures. This will prepare you for
    <a href="/neuroevolution#">Chapter 11</a>, where I’ll add a neural network
    “brain” to an autonomous steering agent and tie machine learning back into
    the story of the book. First, however, I’d like to talk more generally
    through the steps of training a multilayered neural network model using
    supervised learning. Outlining these steps will highlight important
    decisions you’ll have to make before developing a learning model, introduce
    the syntax of the ml5.js library, and provide you with the context you’ll
    need before training your own machine learning models.
  </p>
  <h3 id="the-machine-learning-life-cycle">The Machine Learning Life Cycle</h3>
  <p>
    The life cycle of a machine learning model is typically broken into seven
    steps:
  </p>
  <ol>
    <li>
      <strong>Collect the data.</strong> Data forms the foundation of any
      machine learning task. This stage might involve running experiments,
      manually inputting values, sourcing public data, or a myriad of other
      methods (like generating synthetic data).
    </li>
    <li>
      <strong>Prepare the data.</strong> Raw data often isn’t in a format
      suitable for machine learning algorithms. It might also have duplicate or
      missing values, or contain outliers that skew the data. Such
      inconsistencies may need to be manually adjusted. Additionally, as I
      mentioned earlier, neural networks work best with normalized data, which
      has values scaled to fit within a standard range. Another key part of
      preparing data is separating it into distinct sets: training, validation,
      and testing. The training data is used to teach the model (step 4), while
      the validation and testing data (the distinction is subtle—more on this
      later) are set aside and reserved for evaluating the model’s performance
      (step 5).
    </li>
    <li>
      <strong>Choose a model.</strong> Design the architecture of the neural
      network. Different models are more suitable for certain types of data and
      outputs.
    </li>
    <li>
      <strong>Train the model.</strong> Feed the training portion of the data
      through the model and allow the model to adjust the weights of the neural
      network based on its errors. This process is known as
      <strong>optimization</strong>: the model tunes the weights to minimize its
      errors.
    </li>
    <li>
      <strong>Evaluate the model.</strong> Remember the testing data that was
      set aside in step 2? Since that data wasn’t used in training, it provides
      a means to evaluate how well the model performs on new, unseen data.
    </li>
    <li>
      <strong>Tune the parameters.</strong> The training process is influenced
      by a set of parameters (often called <strong>hyperparameters</strong>)
      such as the learning rate, which dictates how much the model should adjust
      its weights based on errors in prediction. I called this the
      <code>learningConstant</code> in the perceptron example. By fine-tuning
      these parameters and revisiting steps 4 (training), 3 (model selection),
      and even 2 (data preparation), you can often improve the model’s
      performance.
    </li>
    <li>
      <strong>Deploy the model.</strong> Once the model is trained and its
      performance is evaluated satisfactorily, it’s time to use the model out in
      the real world with new data!
    </li>
  </ol>
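  <p>
    The split into training, validation, and testing sets from step 2 can be
    sketched in a few lines. This is a hypothetical <code>splitData()</code>
    helper, not part of ml5.js, and the 80/10/10 proportions are a common choice
    rather than a rule. Shuffling first (here with a Fisher–Yates shuffle) keeps
    any ordering in the original data, such as all the examples of one label
    coming first, from piling up in a single set:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Hypothetical helper: shuffle a copy of the data, then slice it into three sets.
function splitData(data, trainFraction = 0.8, validationFraction = 0.1) {
  // Fisher–Yates shuffle on a copy so the original array is untouched.
  let shuffled = data.slice();
  for (let i = shuffled.length - 1; i > 0; i--) {
    let j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  let trainEnd = Math.round(shuffled.length * trainFraction);
  let validationEnd = trainEnd + Math.round(shuffled.length * validationFraction);
  return {
    training: shuffled.slice(0, trainEnd),
    validation: shuffled.slice(trainEnd, validationEnd),
    // Whatever remains is held back for testing.
    testing: shuffled.slice(validationEnd),
  };
}

// Usage: 100 placeholder data points, each just an index here.
let data = Array.from({ length: 100 }, (_, i) => i);
let { training, validation, testing } = splitData(data);
console.log(training.length, validation.length, testing.length);</pre>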
  <p>
    These steps are the cornerstone of supervised machine learning. However,
    even though 7 is a truly excellent number, I think I missed one more
    critical step. I’ll call it step 0.
  </p>
  <ol>
    <li value="0">
      <strong>Identify the problem.</strong> This initial step defines the
      problem that needs solving. What is the objective? What are you trying to
      accomplish or predict with your machine learning model?
    </li>
  </ol>
  <p>
    This zeroth step informs all the other steps in the process. After all, how
    are you supposed to collect your data and choose a model without knowing
    what you’re even trying to do? Are you predicting a number? A category? A
    sequence? Is it a binary choice, or are there many options? These sorts of
    questions often boil down to choosing between two types of tasks that the
    majority of machine learning applications fall into: classification and
    regression.
  </p>
  <h3 id="classification-and-regression">Classification and Regression</h3>
  <p>
    <strong>Classification</strong> is a type of machine learning problem that
    involves predicting a <strong>label</strong> (also called a
    <strong>category</strong> or <strong>class</strong>) for a piece of data. If
    this sounds familiar, that’s because it is: the simple perceptron in Example
    10.1 was trained to classify points as above or below a line. To give
    another example, an image classifier might try to guess if a photo is of a
    cat or a dog and assign the corresponding label (see Figure 10.14).
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_15.png"
      alt="Figure 10.14: Labeling images as cats or dogs"
    />
    <figcaption>Figure 10.14: Labeling images as cats or dogs</figcaption>
  </figure>
  <p>
    Classification doesn’t happen by magic. The model must first be shown many
    examples of dogs and cats with the correct labels in order to properly
    configure the weights of all the connections. This is the training part of
    supervised learning.
  </p>
  <p>
    The classic “Hello, world!” demonstration of machine learning and supervised
    learning is a classification problem of the MNIST dataset. Short for
    <em>Modified National Institute of Standards and Technology</em>,
    <strong>MNIST</strong> is a dataset that was collected and processed by Yann
    LeCun (Courant Institute, NYU), Corinna Cortes (Google Labs), and
    Christopher J.C. Burges (Microsoft Research). Widely used for training and
    testing in the field of machine learning, this dataset consists of 70,000
    handwritten digits from 0 to 9; each is a 28<span data-type="equation"
      >\times</span
    >28-pixel grayscale image (see Figure 10.15 for examples). Each image is
    labeled with its corresponding digit.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_16.png"
      alt="Figure 10.15: A selection of handwritten digits 0–9 from the MNIST dataset (courtesy of Suvanjanprasai)"
    />
    <figcaption>
      Figure 10.15: A selection of handwritten digits 0–9 from the MNIST dataset
      (courtesy of Suvanjanprasai)
    </figcaption>
  </figure>
  <p>
    MNIST is a canonical example of a training dataset for image classification:
    the model has a discrete number of categories to choose from (10 to be
    exact—no more, no less). After the model is trained on the labeled images
    (the standard split reserves 60,000 of the 70,000 for training and 10,000
    for testing), the goal is for it to classify new images and assign the
    appropriate label, a digit from 0 to 9.
  </p>
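  <p>
    One common way to connect those 10 categories to a network’s outputs is
    <em>one-hot encoding</em>: each label becomes an array with a 1 in the slot
    for its category and 0 everywhere else, so a digit classifier ends up with
    one output per digit. Here’s a minimal plain-JavaScript sketch of the idea.
    The <code>oneHot()</code> helper is hypothetical, written only for
    illustration; it isn’t an ml5.js function.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Hypothetical helper (not part of ml5.js): convert a category index
// into a one-hot array. For MNIST, numCategories would be 10.
function oneHot(index, numCategories) {
  const encoded = new Array(numCategories).fill(0);
  encoded[index] = 1;
  return encoded;
}

// The digit 3 as a one-hot array
const encodedThree = oneHot(3, 10);
console.log(encodedThree.join(", ")); // 0, 0, 0, 1, 0, 0, 0, 0, 0, 0</pre
  >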
  <p>
    <strong>Regression</strong>, on the other hand, is a machine learning task
    for which the prediction is a continuous value, typically a floating-point
    number. A regression problem can involve multiple outputs, but thinking
    about just one is often simpler to start. For example, consider a machine
    learning model that predicts the daily electricity usage of a house based on
    input factors like the number of occupants, the size of the house, and the
    temperature outside (see Figure 10.16).
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_17.png"
      alt="Figure 10.16: Factors like weather and the size and occupancy of a home can influence its daily electricity usage."
    />
    <figcaption>
      Figure 10.16: Factors like weather and the size and occupancy of a home
      can influence its daily electricity usage.
    </figcaption>
  </figure>
  <p>
    Rather than picking from a discrete set of output options, the goal of the
    neural network is now to guess a number—any number. Will the house use 30.5
    kilowatt-hours of electricity that day? Or 48.7 kWh? Or 100.2 kWh? The
    output prediction could be any value from a continuous range.
  </p>
  <h3 id="network-design">Network Design</h3>
  <p>
    Knowing what problem you’re trying to solve (step 0) also has a significant
    bearing on the design of the neural network—in particular, on its input and
    output layers. I’ll demonstrate with another classic “Hello, world!”
    classification example from the field of data science and machine learning:
    the iris dataset. This dataset, which can be found in the Machine Learning
    Repository at the University of California, Irvine, originated from the work
    of American botanist Edgar Anderson.
  </p>
  <p>
    Anderson collected flower data over many years across multiple regions of
    the United States and Canada. For more on the origins of this famous
    dataset, see
    <a href="https://academic.oup.com/jrssig/article/18/6/26/7038520"
      >“The Iris Data Set: In Search of the Source of <em>Virginica</em>”</a>
    by Antony Unwin and Kim Kleinman. After carefully analyzing the data,
    Anderson built a table to classify iris flowers into three distinct species:
    <em>Iris setosa</em>, <em>Iris virginica</em>, and
    <em>Iris versicolor</em> (see Figure 10.17).
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_18.png"
      alt="Figure 10.17: Three distinct species of iris flowers"
    />
    <figcaption>
      Figure 10.17: Three distinct species of iris flowers
    </figcaption>
  </figure>
  <p>
    Anderson included four numeric attributes for each flower: sepal length,
    sepal width, petal length, and petal width, all measured in centimeters. (He
    also recorded color information, but that data appears to have been lost.)
    Each record is then paired with the appropriate iris categorization:
  </p>
  <table>
    <thead>
      <tr>
        <th>Sepal Length (cm)</th>
        <th>Sepal Width (cm)</th>
        <th>Petal Length (cm)</th>
        <th>Petal Width (cm)</th>
        <th>Classification</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>5.1</td>
        <td>3.5</td>
        <td>1.4</td>
        <td>0.2</td>
        <td><em>Iris setosa</em></td>
      </tr>
      <tr>
        <td>4.9</td>
        <td>3.0</td>
        <td>1.4</td>
        <td>0.2</td>
        <td><em>Iris setosa</em></td>
      </tr>
      <tr>
        <td>7.0</td>
        <td>3.2</td>
        <td>4.7</td>
        <td>1.4</td>
        <td><em>Iris versicolor</em></td>
      </tr>
      <tr>
        <td>6.4</td>
        <td>3.2</td>
        <td>4.5</td>
        <td>1.5</td>
        <td><em>Iris versicolor</em></td>
      </tr>
      <tr>
        <td>6.3</td>
        <td>3.3</td>
        <td>6.0</td>
        <td>2.5</td>
        <td><em>Iris virginica</em></td>
      </tr>
      <tr>
        <td>5.8</td>
        <td>2.7</td>
        <td>5.1</td>
        <td>1.9</td>
        <td><em>Iris virginica</em></td>
      </tr>
    </tbody>
  </table>
  <p>
    In this dataset, the first four columns (sepal length, sepal width, petal
    length, petal width) serve as inputs to the neural network. The output is
    the classification provided in the fifth column. Figure 10.18 depicts a
    possible architecture for a neural network that can be trained on this data.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_19.png"
      alt="Figure 10.18: A possible network architecture for iris classification"
    />
    <figcaption>
      Figure 10.18: A possible network architecture for iris classification
    </figcaption>
  </figure>
  <p>
    On the left are the four inputs to the network, corresponding to the first
    four columns of the data table. On the right are three possible outputs,
    each representing one of the iris species labels. In between is the hidden
    layer, which, as mentioned earlier, adds the complexity the network needs to
    handle nonlinearly separable data. Each node in the hidden layer is connected
    to every node in the layers before and after it. This is commonly called a
    <strong>fully connected</strong> or <strong>dense</strong> layer.
  </p>
  <p>
    You might also notice the absence of explicit bias nodes in this diagram.
    While biases play an important role in the output of each neuron, they’re
    often left out of visual representations to keep the diagrams clean and
    focused on the primary data flow. (The ml5.js library will ultimately manage
    the biases for me internally.)
  </p>
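  <p>
    To make “fully connected” concrete: each hidden node takes a weighted sum of
    every input, adds its bias, and passes the result through an activation
    function. The plain-JavaScript sketch below computes one such layer of five
    nodes for the first flower in the iris table. The weights, the biases, the
    ReLU activation, and the <code>denseLayer()</code> helper are all
    illustrative assumptions; ml5.js builds and trains the real layers for you.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// A sketch of one fully connected (dense) layer with made-up weights.
// weights[j][i] is the weight from input i to node j; biases[j] belongs to node j.
function denseLayer(inputs, weights, biases, activation) {
  return weights.map((nodeWeights, j) => {
    let sum = biases[j];
    for (let i = 0; i < inputs.length; i++) {
      sum += nodeWeights[i] * inputs[i];
    }
    return activation(sum);
  });
}

// ReLU, one common activation function for hidden layers
const relu = (v) => Math.max(0, v);

// First row of the iris table: sepal length, sepal width, petal length, petal width
const flower = [5.1, 3.5, 1.4, 0.2];

// Five hidden nodes, each with four weights (one per input) and one bias.
// These numbers are arbitrary placeholders, not trained values.
const hiddenWeights = [
  [0.2, -0.1, 0.4, 0.0],
  [-0.3, 0.5, 0.1, 0.2],
  [0.1, 0.1, -0.2, 0.3],
  [0.0, -0.4, 0.3, 0.1],
  [0.25, 0.05, 0.0, -0.1],
];
const hiddenBiases = [0.1, 0.0, -0.2, 0.05, 0.0];

const hidden = denseLayer(flower, hiddenWeights, hiddenBiases, relu);
console.log(hidden); // Five numbers, one per hidden node</pre
  >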
  <p>
    The neural network’s goal is to “activate” the correct output for the input
    data, just as the perceptron would output a +1 or –1 for its single binary
    classification. In this case, the output values are like signals that help
    the network decide which iris species label to assign. The output with the
    highest computed value signifies the network’s best guess about the
    classification.
  </p>
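  <p>
    Picking the highest output is often called taking the <em>argmax</em>. As a
    rough sketch, with made-up output values and a hypothetical
    <code>bestGuess()</code> helper, the selection step might look like this.
    It’s only an illustration of the idea, not code you need to write when
    using ml5.js.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// The three iris labels, in the same order as the network's outputs
const labels = ["iris-setosa", "iris-virginica", "iris-versicolor"];

// Return the label whose output value is highest (an "argmax").
function bestGuess(outputs, labels) {
  let bestIndex = 0;
  for (let i = 1; i < outputs.length; i++) {
    if (outputs[i] > outputs[bestIndex]) {
      bestIndex = i;
    }
  }
  return labels[bestIndex];
}

// Hypothetical raw outputs for one flower
console.log(bestGuess([0.12, 0.81, 0.07], labels)); // iris-virginica</pre
  >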
  <p>
    The key takeaway here is that a classification network should have as many
    inputs as there are values for each item in the dataset, and as many outputs
    as there are categories. As for the hidden layer, the design is much less
    set in stone. The hidden layer in Figure 10.18 has five nodes, but this
    number is entirely arbitrary. Neural network architectures can vary greatly,
    and the number of hidden nodes is often determined through trial and error
    or other educated guessing methods (called <em>heuristics</em>). In the
    context of this book, I’ll be relying on ml5.js to automatically configure
    the architecture based on the input and output data.
  </p>
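  <p>
    That rule of thumb is mechanical enough to compute. The hypothetical
    <code>networkShape()</code> helper below counts the numeric values in one
    record for the inputs and the distinct labels for the outputs, using a few
    rows of the iris table. It deliberately says nothing about the hidden layer,
    which is where the trial and error comes in.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Hypothetical helper: infer input and output layer sizes for a
// classification dataset. Assumes each record is an array of numbers
// followed by one string label in the last position.
function networkShape(records) {
  const inputs = records[0].length - 1;
  const labels = [...new Set(records.map((record) => record[record.length - 1]))];
  return { inputs, outputs: labels.length, labels };
}

// A few rows from the iris table above
const irisSample = [
  [5.1, 3.5, 1.4, 0.2, "Iris setosa"],
  [4.9, 3.0, 1.4, 0.2, "Iris setosa"],
  [7.0, 3.2, 4.7, 1.4, "Iris versicolor"],
  [6.3, 3.3, 6.0, 2.5, "Iris virginica"],
];

const shape = networkShape(irisSample);
console.log(shape.inputs, shape.outputs); // 4 3</pre
  >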
  <p>
    What about the inputs and outputs in a regression scenario, like the
    household electricity consumption example I mentioned earlier? I’ll go ahead
    and make up a dataset for this scenario, with values representing the
    occupants and size of the house, the day’s temperature, and the
    corresponding electricity usage. This is much like a synthetic dataset,
    given that it’s not data collected for a real-world scenario—but whereas
    synthetic data is generated automatically, here I’m manually inputting
    numbers from my own imagination:
  </p>
  <table>
    <thead>
      <tr>
        <th>Occupants</th>
        <th>Size (m²)</th>
        <th>Temperature Outside (°C)</th>
        <th>Electricity Usage (kWh)</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>4</td>
        <td>150</td>
        <td>24</td>
        <td>25.3</td>
      </tr>
      <tr>
        <td>2</td>
        <td>100</td>
        <td>25.5</td>
        <td>16.2</td>
      </tr>
      <tr>
        <td>1</td>
        <td>70</td>
        <td>26.5</td>
        <td>12.1</td>
      </tr>
      <tr>
        <td>4</td>
        <td>120</td>
        <td>23</td>
        <td>22.1</td>
      </tr>
      <tr>
        <td>2</td>
        <td>90</td>
        <td>21.5</td>
        <td>15.2</td>
      </tr>
      <tr>
        <td>5</td>
        <td>180</td>
        <td>20</td>
        <td>24.4</td>
      </tr>
      <tr>
        <td>1</td>
        <td>60</td>
        <td>18.5</td>
        <td>11.7</td>
      </tr>
    </tbody>
  </table>
  <p>
    The neural network for this problem should have three input nodes
    corresponding to the first three columns (occupants, size, temperature).
    Meanwhile, it should have one output node representing the fourth column,
    the network’s guess about the electricity usage. And I’ll arbitrarily say
    the network’s hidden layer should have four nodes rather than five. Figure
    10.19 shows this network architecture.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_20.png"
      alt="Figure 10.19: A possible network architecture for three inputs and one regression output"
    />
    <figcaption>
      Figure 10.19: A possible network architecture for three inputs and one
      regression output
    </figcaption>
  </figure>
  <p>
    Unlike the iris classification network, which is choosing from three labels
    and therefore has three outputs, this network is trying to predict just one
    number, so it has only one output. I’ll note, however, that a single output
    isn’t a requirement of regression. A machine learning model can also perform
    a regression that predicts multiple continuous values, in which case the
    model would have multiple outputs.
  </p>
  <h3 id="ml5js-syntax">ml5.js Syntax</h3>
  <p>
    The ml5.js library is a collection of machine learning models that can be
    accessed using the syntax <code>ml5.<em>functionName</em>()</code>. For
    example, to use a pretrained model that detects hand
    positions, you can use <code>ml5.handpose()</code>. For classifying images,
    you can use <code>ml5.imageClassifier()</code>. While I encourage you to
    explore all that ml5.js has to offer (I’ll reference some of these
    pretrained models in upcoming exercise ideas), for this chapter I’ll focus
    on only one function in ml5.js, <code>ml5.neuralNetwork()</code>, which
    creates an empty neural network for you to train.
  </p>
  <p>
    To use this function, you must first create a JavaScript object that will
    configure the model being created. Here’s where some of the big-picture
    factors I just discussed—is this a classification or a regression task? How
    many inputs and outputs?—come into play. I’ll begin by specifying the task I
    want the model to perform (<code>"regression"</code> or
    <code>"classification"</code>):
  </p>
  <pre class="codesplit" data-code-language="javascript">
let options = { task: "classification" };
let classifier = ml5.neuralNetwork(options);</pre
  >
  <p>
    This, however, gives ml5.js little to go on in terms of designing the
    network architecture. Adding the inputs and outputs will complete the rest
    of the puzzle. The iris flower classification has four inputs and three
    possible output labels. This can be configured as part of the
    <code>options</code> object with a single integer for the number of inputs
    and an array of strings listing the output labels:
  </p>
  <pre class="codesplit" data-code-language="javascript">
let options = {
  inputs: 4,
  outputs: ["iris-setosa", "iris-virginica", "iris-versicolor"],
  task: "classification",
};
let irisClassifier = ml5.neuralNetwork(options);</pre
  >
  <p>
    The electricity regression scenario had three input values (occupants, size,
    temperature) and one output value (usage in kWh). With regression, there are
    no string output labels, so only an integer indicating the number of outputs
    is required:
  </p>
  <pre class="codesplit" data-code-language="javascript">
let options = {
  inputs: 3,
  outputs: 1,
  task: "regression",
};
let energyPredictor = ml5.neuralNetwork(options);</pre
  >
  <p>
    You can set many other properties of the model through the
    <code>options</code> object. For example, you could specify the number of
    hidden layers between the inputs and outputs (there are typically several),
    the number of neurons in each layer, which activation functions to use, and
    more. In most cases, however, you can leave out these extra settings and let
    ml5.js make its best guess on how to design the model based on the task and
    data at hand.
  </p>
  <h2 id="building-a-gesture-classifier">Building a Gesture Classifier</h2>
  <p>
    I’ll now walk through the steps of the machine learning life cycle with an
    example problem well suited for p5.js, building all the code for each step
    along the way using ml5.js. I’ll begin at step 0 by articulating the
    problem. Imagine for a moment that you’re working on an interactive
    application that responds to gestures. Maybe the gestures are ultimately
    meant to be recorded via body tracking, but you want to start with something
    much simpler—a single stroke of the mouse (see Figure 10.20).
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_21.png"
      alt="Figure 10.20: A single mouse gesture as a vector between a start and end point"
    />
    <figcaption>
      Figure 10.20: A single mouse gesture as a vector between a start and end
      point
    </figcaption>
  </figure>
  <p>
    Each gesture could be recorded as a vector extending from the start to the
    end point of a mouse movement. The x- and y-components of the vector will be
    the model’s inputs. The model’s task could be to predict one of four
    possible labels for the gesture: <em>up</em>, <em>down</em>, <em>left</em>,
    or <em>right</em>. With a discrete set of possible outputs, this sounds like
    a classification problem. The four labels will be the model’s outputs.
  </p>
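  <p>
    Here’s a minimal sketch of turning a mouse drag into those two inputs. It
    scales the vector to a length of 1 so that both components land between –1
    and +1. In p5.js you might do the same with <code>createVector()</code>,
    <code>p5.Vector.sub()</code>, and <code>normalize()</code>; the
    plain-JavaScript version below, with a hypothetical
    <code>gestureToVector()</code> helper, just spells out the arithmetic.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Hypothetical helper: turn the start and end points of a mouse drag
// into a unit-length vector, so both components fall between -1 and +1.
function gestureToVector(start, end) {
  const dx = end.x - start.x;
  const dy = end.y - start.y;
  const length = Math.hypot(dx, dy);
  if (length === 0) {
    return null; // A click without movement has no direction.
  }
  return { x: dx / length, y: dy / length };
}

// A drag from (100, 200) to (100, 50) on the canvas points up (negative y).
console.log(gestureToVector({ x: 100, y: 200 }, { x: 100, y: 50 })); // { x: 0, y: -1 }</pre
  >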
  <p>
    Much like some of the GA demonstrations in
    <a href="/genetic-algorithms#">Chapter 9</a>—and like the simple perceptron
    example earlier in this chapter—the problem I’m selecting here has a known
    solution and could be solved more easily and efficiently without a neural
    network. The direction of a vector can be classified with the
    <code>heading()</code> function and a series of <code>if</code> statements!
    However, by using this seemingly trivial scenario, I hope to explain the
    process of training a machine learning model in an understandable and
    friendly way. Additionally, this example will make it easy to check that the
    code is working as expected. When I’m done, I’ll provide some ideas about
    how to expand the classifier to a scenario that couldn’t use simple
    <code>if</code> statements.
  </p>
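  <p>
    For comparison, here’s roughly what the non-neural-network solution could
    look like. In p5.js, <code>heading()</code> returns the angle of a vector
    (computed with <code>Math.atan2()</code>), so this sketch calls
    <code>Math.atan2()</code> directly in order to run outside p5.js. Because
    canvas y-values increase downward, an angle near +π/2 means down. The
    45-degree boundaries and the <code>classifyDirection()</code> name are
    illustrative choices.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Classify a gesture vector by its angle, with no neural network involved.
// Math.atan2(y, x) returns an angle in radians between -PI and PI.
// In canvas coordinates, y grows downward, so +PI/2 points down.
function classifyDirection(x, y) {
  const angle = Math.atan2(y, x);
  const fortyFive = Math.PI / 4; // Allow 45 degrees on either side of each direction.
  if (angle > -fortyFive && angle <= fortyFive) return "right";
  if (angle > fortyFive && angle <= 3 * fortyFive) return "down";
  if (angle > -3 * fortyFive && angle <= -fortyFive) return "up";
  return "left"; // Everything within 45 degrees of PI (or -PI)
}

console.log(classifyDirection(0.99, 0.02)); // right
console.log(classifyDirection(-0.1, -0.8)); // up</pre
  >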
  <h3 id="collecting-and-preparing-the-data">
    Collecting and Preparing the Data
  </h3>
  <p>
    With the problem established, I can turn to steps 1 and 2: collecting and
    preparing the data. In the real world, these steps can be tedious,
    especially when the raw data you collect is messy and needs a lot of initial
    processing. You can think of this like having to organize, wash, and chop
    all your ingredients before you can start cooking a meal from scratch.
  </p>
  <p>
    For simplicity, I’d instead like to take the approach of ordering a machine
    learning “meal kit,” with the ingredients (data) already portioned and
    prepared. This way, I’ll get straight to the cooking itself, the process of
    training the model. After all, this is really just an appetizer for what
    will be the ultimate meal in <a href="/neuroevolution#">Chapter 11</a>, when
    I apply neural networks to steering agents.
  </p>
  <p>
    With that in mind, I’ll handcode some example data and manually keep it
    normalized within a range of –1 to +1. I’ll organize the data into an array
    of objects, pairing the x- and y-components of a vector with a string label.
    I’m picking values that I feel clearly point in a specific direction and
    assigning the appropriate label—two examples per label:
  </p>
  <pre class="codesplit" data-code-language="javascript">
let data = [
  { x: 0.99, y: 0.02, label: "right" },
  { x: 0.76, y: -0.1, label: "right" },
  { x: -1.0, y: 0.12, label: "left" },
  { x: -0.9, y: -0.1, label: "left" },
  { x: 0.02, y: 0.98, label: "down" },
  { x: -0.2, y: 0.75, label: "down" },
  { x: 0.01, y: -0.9, label: "up" },
  { x: -0.1, y: -0.8, label: "up" },
];</pre
  >
  <p>Figure 10.21 shows the same data expressed as arrows.</p>
  <figure>
    <img
      src="images/10_nn/10_nn_22.png"
      alt="Figure 10.21: The input data visualized as vectors (arrows)"
    />
    <figcaption>
      Figure 10.21: The input data visualized as vectors (arrows)
    </figcaption>
  </figure>
  <p>
    In a more realistic scenario, I’d probably have a much larger dataset that
    would be loaded in from a separate file, instead of written directly into
    the code. For example, JavaScript Object Notation (JSON) and comma-separated
    values (CSV) are two popular formats for storing and loading data. JSON
    stores data in key-value pairs using a syntax closely modeled on JavaScript
    object literals (though JSON is stricter; for example, keys must be in
    double quotes). CSV is a file format that stores tabular data
    (like a spreadsheet). You could use numerous other data formats, depending
    on your needs and the programming environment you’re working with.
  </p>
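  <p>
    As a small illustration, here are two of the gesture records written out
    both ways, along with plain-JavaScript code that turns the text back into
    objects. In a p5.js sketch, you’d more likely load such files with
    <code>loadJSON()</code> or <code>loadTable()</code>. The
    <code>parseSimpleCSV()</code> helper is hypothetical and deliberately naive:
    it assumes no quoted fields and no commas inside values.
  </p>
  <pre class="codesplit" data-code-language="javascript">
const data = [
  { x: 0.99, y: 0.02, label: "right" },
  { x: -1.0, y: 0.12, label: "left" },
];

// JSON: the same array as text (note the double-quoted keys).
const jsonText = JSON.stringify(data);
console.log(jsonText);
// [{"x":0.99,"y":0.02,"label":"right"},{"x":-1,"y":0.12,"label":"left"}]
const fromJson = JSON.parse(jsonText);

// CSV: one header row, then one row per record.
const csvText = `x,y,label
0.99,0.02,right
-1.0,0.12,left`;

// Naive parser: split into lines, then split each line on commas.
function parseSimpleCSV(text) {
  const [headerLine, ...rows] = text.trim().split("\n");
  const headers = headerLine.split(",");
  return rows.map((row) => {
    const values = row.split(",");
    const record = {};
    headers.forEach((header, i) => {
      const asNumber = Number(values[i]);
      // Keep strings (like labels) as strings; convert everything else to numbers.
      record[header] = Number.isNaN(asNumber) ? values[i] : asNumber;
    });
    return record;
  });
}

const fromCsv = parseSimpleCSV(csvText);
console.log(fromCsv[1].label); // left</pre
  >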
  <p>
    In the real world, the values in that larger dataset would actually come
    from somewhere. Maybe I would collect the data by asking users to perform
    specific gestures and recording their inputs, or by writing an algorithm to
    automatically generate larger amounts of synthetic data that represent the
    idealized versions of the gestures I want the model to recognize. In either
    case, the key would be to collect a diverse set of examples that adequately
    represent the variations in how the gestures might be performed. For now,
    however, let’s see how it goes with just a few servings of data.
  </p>
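  <p>
    As a rough sketch of the synthetic-data route, the function below produces
    unit vectors near each of the four idealized directions, each rotated by a
    small random angle so the examples aren’t all identical.
    <code>generateSyntheticGestures()</code> is a hypothetical helper (not part
    of ml5.js or p5.js), and the default jitter of 0.3 radians is an arbitrary
    assumption that a real project would need to tune against recorded human
    gestures.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Hypothetical generator (not part of ml5.js or p5.js): unit vectors near
// each idealized direction, rotated by a random angle of up to ±jitter radians.
// In canvas coordinates, y grows downward, so "down" is +PI/2.
function generateSyntheticGestures(countPerLabel, jitter = 0.3, random = Math.random) {
  const directions = { right: 0, down: Math.PI / 2, left: Math.PI, up: -Math.PI / 2 };
  const examples = [];
  for (const [label, baseAngle] of Object.entries(directions)) {
    for (let i = 0; i < countPerLabel; i++) {
      const angle = baseAngle + (random() * 2 - 1) * jitter;
      // cos and sin always stay within -1 to +1, so no extra normalizing is needed.
      examples.push({ x: Math.cos(angle), y: Math.sin(angle), label });
    }
  }
  return examples;
}

const synthetic = generateSyntheticGestures(50);
console.log(synthetic.length); // 200</pre
  >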
  <div data-type="exercise">
    <h3 id="exercise-104">Exercise 10.4</h3>
    <p>
      Create a p5.js sketch that collects gesture data from users and saves it
      to a JSON file. You can use <code>mousePressed()</code> and
      <code>mouseReleased()</code> to mark the start and end of each gesture,
      and <code>saveJSON()</code> to download the data into a file.
    </p>
  </div>
  <h3 id="choosing-a-model">Choosing a Model</h3>
  <p>
    I’ve now come to step 3 of the machine learning life cycle, selecting a
    model. This is where I’m going to start letting ml5.js do the heavy lifting
    for me. To create the model with ml5.js, all I need to do is specify the
    task, the inputs, and the outputs:
  </p>
  <pre class="codesplit" data-code-language="javascript">
let options = {
  task: "classification",
  inputs: 2,
  outputs: ["up", "down", "left", "right"],
  debug: true
};
let classifier = ml5.neuralNetwork(options);</pre
  >
  <p>
    That’s it! I’m done! Thanks to ml5.js, I can bypass a host of complexities
    such as the number of layers and neurons per layer to have, the kinds of
    activation functions to use, and how to set up the algorithms for training
    the network. The library will make these decisions for me.
  </p>
  <p>
    Of course, the default ml5.js model architecture may not be perfect for all
    cases. I encourage you to read the ml5.js documentation for additional
    details on how to customize the model. I’ll also point out that ml5.js is
    able to infer the inputs and outputs from the data, so those properties
    aren’t entirely necessary to include here in the
    <code>options</code> object. However, for the sake of clarity (and since
    I’ll need to specify them for later examples), I’m including them here.
  </p>
  <p>
    The <code>debug</code> property, when set to <code>true</code>, turns on a
    visual interface for the training process. It’s a helpful tool for spotting
    potential issues during training and for getting a better understanding of
    what’s happening behind the scenes. You’ll see what this interface looks
    like later in the chapter.
  </p>
  <h3 id="training-the-model">Training the Model</h3>
  <p>
    Now that I have the data in a <code>data</code> variable and a neural
    network initialized in the <code>classifier</code> variable, I’m ready to
    train the model. That process starts with adding the data to the model. And
    for that, it turns out I’m not quite done with preparing the data.
  </p>
  <p>
    Right now, my data is neatly organized in an array of objects, each
    containing the x- and y-components of a vector and a corresponding string
    label. This is a typical format for training data, but it isn’t directly
    consumable by ml5.js. (Sure, I could have initially organized the data into
    a format that ml5.js recognizes, but I’m including this extra step because
    it will likely be necessary when you’re using a dataset that has been
    collected or sourced elsewhere.) To add the data to the model, I need to
    separate the inputs from the outputs so that the model understands which are
    which.
  </p>
  <p>
    The ml5.js library offers a fair amount of flexibility in the kinds of
    formats it will accept, but I’ll choose to use arrays—one for the
    <code>inputs</code> and one for the <code>outputs</code>. I can use a loop
    to reorganize each data item and add it to the model:
  </p>
  <pre class="codesplit" data-code-language="javascript">
for (let item of data) {
  // An array of two numbers for the inputs
  let inputs = [item.x, item.y];
  // A single string label for the output
  let outputs = [item.label];
  //{!1} Add the training data to the classifier.
  classifier.addData(inputs, outputs);
}</pre
  >
  <p>
    What I’ve done here is set the <strong>shape</strong> of the data. In
    machine learning, this term describes the data’s dimensions and structure.
    It indicates how the data is organized in terms of rows, columns, and
    potentially even deeper, into additional dimensions. Understanding the shape
    of your data is crucial because it determines the way the model should be
    structured.
  </p>
  <p>
    Here, the input data’s shape is a 1D array containing two numbers
    (representing <em>x</em> and <em>y</em>). The output data, similarly, is a
    1D array containing just a single string label. Every piece of data going in
    and out of the network will follow this pattern. While this is a small and
    simple example, it nicely mirrors many real-world scenarios in which the
    inputs are numerically represented in an array, and the outputs are string
    labels.
  </p>
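  <p>
    One way to make “shape” concrete is to record the length at each level of
    nesting. The hypothetical <code>shapeOf()</code> helper below (not an ml5.js
    function) does this for rectangular arrays, where every element at a given
    depth has the same length. A single input has shape [2]; stacking several
    inputs into one array gives [number of examples, 2].
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Report the shape of a (possibly nested) array, assuming it's rectangular.
function shapeOf(value) {
  const shape = [];
  while (Array.isArray(value)) {
    shape.push(value.length);
    value = value[0]; // Step one level deeper.
  }
  return shape;
}

const allInputs = [[0.99, 0.02], [0.76, -0.1], [-1.0, 0.12]];
console.log(shapeOf([0.99, 0.02])); // [ 2 ]
console.log(shapeOf(allInputs)); // [ 3, 2 ]
console.log(shapeOf(["right"])); // [ 1 ]</pre
  >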
  <p>
    Once the data has been added to the <code>classifier</code>, ml5.js provides
    a helper function, <code>normalizeData()</code>, to normalize it. As I’ve
    mentioned, normalizing data
    (adjusting the scale to a standard range) is a critical step in the machine
    learning process:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Normalize the data.
classifier.normalizeData();</pre
  >
  <p>
    In this case, the handcoded data was limited to a range of –1 to +1 from the
    get-go, so calling <code>normalizeData()</code> here is likely redundant.
    Still, this function call is important to demonstrate. Normalizing your data
    ahead of time as part of the preprocessing step will absolutely work, but
    the auto-normalization feature of ml5.js is a big help!
  </p>
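  <p>
    Min-max scaling is one common way to normalize a column of numbers: subtract
    the smallest value, divide by the range, and optionally stretch the result
    into a different interval. This sketch applies it to the house sizes from
    the electricity table. It isn’t necessarily what
    <code>normalizeData()</code> does internally; it’s an illustration of the
    idea, using a hypothetical <code>normalize()</code> helper.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Min-max normalization: rescale values so the smallest maps to newMin
// and the largest maps to newMax (0 and 1 by default).
function normalize(values, newMin = 0, newMax = 1) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min;
  return values.map((v) =>
    // If every value is identical, there's no range to scale by.
    range === 0 ? newMin : newMin + ((v - min) / range) * (newMax - newMin)
  );
}

// House sizes (m²) from the electricity table earlier in the chapter
const sizes = [150, 100, 70, 120, 90, 180, 60];
console.log(normalize(sizes).map((v) => v.toFixed(2)).join(" "));
// 0.75 0.33 0.08 0.50 0.25 1.00 0.00</pre
  >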
  <p>
    Now for the heart of the machine learning process: actually training the
    model. Here’s the code:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// The <code>train()</code> method initiates the training process.
classifier.train(finishedTraining);

// A callback function for when the training is complete
function finishedTraining() {
  console.log("Training complete!");
}</pre
  >
  <p>
    Yes, that’s it! After all, the hard work has already been completed. The
    data was collected, prepared, and fed into the model. All that remains is to
    call the <code>train()</code> method, sit back, and let ml5.js do its thing.
  </p>
  <p>
    In truth, it isn’t <em>quite</em> that simple. If I were to run the code as
    written and then test the model, the results would probably be inadequate.
    Here’s where another key term in machine learning comes into play:
    <strong>epochs</strong>. The <code>train()</code> method tells the neural
    network to start the learning process. But how long should it train for? You
    can think of an epoch as one round of practice, one cycle of using the
    entire training dataset to update the weights of the neural network.
    Generally speaking, the more epochs you go through, the better the network
    will perform, but at a certain point you’ll have diminishing returns. The
    number of epochs can be set by passing an <code>options</code> object to
    <code>train()</code>:
  </p>
  <pre class="codesplit" data-code-language="javascript">
//{!1} Set the number of epochs for training.
let options = { epochs: 25 };
classifier.train(options, finishedTraining);</pre
  >
  <p>
    The number of epochs is an example of a hyperparameter, a global setting for
    the training process. You can set others through the
    <code>options</code> object (the learning rate, for example), but I’m going
    to stick with the defaults. You can read more about customization options in
    the ml5.js documentation.
  </p>
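  <p>
    To make the idea of an epoch concrete, here’s a deliberately tiny stand-in
    that isn’t a neural network at all: a single weight and bias fit to four
    points on the line <span data-type="equation">y = 2x + 1</span> with plain
    gradient descent. Each pass through the outer loop is one epoch, and the
    average loss is recorded once per pass. The learning rate (0.05), the 500
    epochs, and the <code>trainLinear()</code> helper are illustrative
    assumptions, not what ml5.js does internally.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// A toy illustration of epochs (not what ml5.js runs internally):
// fit y = w * x + b with plain gradient descent on mean squared error.
function trainLinear(data, epochs, learningRate) {
  let w = 0;
  let b = 0;
  const lossHistory = [];
  for (let epoch = 0; epoch < epochs; epoch++) {
    // One epoch = one pass through the entire training dataset.
    let loss = 0;
    let gradW = 0;
    let gradB = 0;
    for (const { x, y } of data) {
      const error = w * x + b - y;
      loss += error * error;
      gradW += 2 * error * x;
      gradB += 2 * error;
    }
    const n = data.length;
    lossHistory.push(loss / n);
    // Nudge the parameters against the average gradient.
    w -= learningRate * (gradW / n);
    b -= learningRate * (gradB / n);
  }
  return { w, b, lossHistory };
}

// Points that lie exactly on y = 2x + 1
const points = [
  { x: 0, y: 1 },
  { x: 1, y: 3 },
  { x: 2, y: 5 },
  { x: 3, y: 7 },
];
const result = trainLinear(points, 500, 0.05);
console.log(result.w.toFixed(2), result.b.toFixed(2)); // 2.00 1.00
console.log(result.lossHistory[0]); // 21 (the loss before any learning)</pre
  >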
  <p>
    The second argument to <code>train()</code> is optional, but it’s good to
    include it. It specifies a callback function that runs when the training
    process is complete: in this case, <code>finishedTraining()</code>. (See the
    “Callbacks” box for more on callback functions.) This is useful for knowing
    when you can proceed to the next steps in your code. Another optional
    callback, which I usually name <code>whileTraining()</code>, is triggered
    after each epoch. However, for my purposes, knowing when the training is
    done is plenty!
  </p>
  <div data-type="note">
    <h3 id="callbacks">Callbacks</h3>
    <p>
      A <strong>callback function</strong> in JavaScript is a function you don’t
      actually call yourself. Instead, you provide it as an argument to another
      function, intending for it to be <em>called back</em> automatically at a
      later time (typically associated with an event, like a mouse click).
      You’ve seen this before when working with Matter.js in
      <a href="/physics-libraries#">Chapter 6</a>, where you specified a
      function to call whenever a collision was detected.
    </p>
    <p>
      Callbacks are commonly used for <strong>asynchronous</strong> operations, when
      you want your code to continue along with animating or doing other things
      while waiting for another task (like training a machine learning model) to
      finish. A classic example of this in p5.js is loading data into a sketch
      with <code>loadJSON()</code>.
    </p>
    <p>
      JavaScript also provides a more recent approach for handling asynchronous
      operations known as <strong>promises</strong>. With promises, you can use
      keywords like <code>async</code> and <code>await</code> to make your
      asynchronous code look more like traditional synchronous code. While
      ml5.js also supports this style, I’ll stick to using callbacks to stay
      aligned with p5.js style.
    </p>
  </div>
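  <p>
    To see the two styles side by side without any machine learning involved,
    here’s a small sketch. A hypothetical <code>pretendTrain()</code> function
    stands in for a slow task by counting off epochs with
    <code>setTimeout()</code>. It’s used first with callbacks and then wrapped
    in a promise so it can be used with <code>await</code>. None of these
    functions come from ml5.js.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Hypothetical stand-in for a slow task such as training (not an ml5.js function).
// It "runs" one epoch per timer tick, calling onEpoch after each one and
// onFinished at the end.
function pretendTrain(epochs, onEpoch, onFinished) {
  let epoch = 0;
  function step() {
    epoch++;
    if (onEpoch) onEpoch(epoch);
    if (epoch < epochs) {
      setTimeout(step, 0); // Yield so other code can run between epochs.
    } else if (onFinished) {
      onFinished();
    }
  }
  setTimeout(step, 0);
}

// Callback style: hand the functions over and let them be called back later.
pretendTrain(3, whileTraining, finishedTraining);
console.log("This line logs first; no epoch has run yet.");

function whileTraining(epoch) {
  console.log(`Epoch ${epoch} done`);
}

function finishedTraining() {
  console.log("Training complete!");
}

// Promise style: wrap the callback version so it can be awaited.
function pretendTrainAsync(epochs) {
  return new Promise((resolve) => pretendTrain(epochs, null, resolve));
}

async function run() {
  await pretendTrainAsync(2);
  console.log("Awaited training complete!");
}
run();</pre
  >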
  <h3 id="evaluating-the-model">Evaluating the Model</h3>
  <p>
    If <code>debug</code> is set to <code>true</code> in the initial call to
    <code>ml5.neuralNetwork()</code>, a visual interface should appear after
    <code>train()</code> is called, covering most of the p5.js page and canvas
    (see Figure 10.22). This interface, called the <em>Visor</em>, represents
    the evaluation step.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_23.png"
      alt="Figure 10.22: The Visor, with a graph of the loss function and model details"
    />
    <figcaption>
      Figure 10.22: The Visor, with a graph of the loss function and model
      details
    </figcaption>
  </figure>
  <p>
    The Visor comes from tfjs-vis, a visualization library that accompanies
    TensorFlow.js (which underlies ml5.js), and includes a graph that provides
    real-time feedback on the progress of the training. This
    graph plots the loss of the model on the y-axis against the number of epochs
    along the x-axis. <strong>Loss</strong> is a measure of how far off the
    model’s predictions are from the correct outputs provided by the training
    data. It quantifies the model’s total error. When training begins, it’s
    common for the loss to be high because the model has yet to learn anything.
    Ideally, as the model trains through more epochs, it should get better at
    its predictions, and the loss should decrease. If the graph goes down as the
    epochs increase, this is a good sign!
  </p>
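  <p>
    As an example of how a loss value can quantify error, here’s categorical
    cross-entropy, a loss function commonly used for classification. For a
    single example, it’s the negative logarithm of the probability the model
    gave to the correct label, so confident correct guesses cost little and
    confident wrong ones cost a lot. Treat this as an illustration of the
    concept rather than a description of ml5.js’s exact configuration;
    <code>crossEntropy()</code> is a hypothetical helper.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Categorical cross-entropy for one example: target is a one-hot array,
// predicted holds the model's probability for each label.
function crossEntropy(target, predicted) {
  let loss = 0;
  for (let i = 0; i < target.length; i++) {
    // Clamp to avoid Math.log(0), which would be -Infinity.
    loss -= target[i] * Math.log(Math.max(predicted[i], 1e-12));
  }
  return loss;
}

// Labels in the order up, down, left, right; the correct answer is "up".
const target = [1, 0, 0, 0];
console.log(crossEntropy(target, [0.9, 0.05, 0.03, 0.02]).toFixed(3)); // 0.105 (confident and right)
console.log(crossEntropy(target, [0.25, 0.25, 0.25, 0.25]).toFixed(3)); // 1.386 (no idea)
console.log(crossEntropy(target, [0.05, 0.9, 0.03, 0.02]).toFixed(3)); // 2.996 (confident and wrong)</pre
  >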
  <p>
    Running the training for the 200 epochs depicted in Figure 10.22 might
    strike you as a bit excessive. In a real-world scenario with more extensive
    data, I would probably use fewer epochs, like the 25 I specified in the
    original code snippet. However, because the dataset here is so tiny, the
    higher number of epochs helps the model get enough practice with the data.
    Remember, this is a toy example, aiming to make the concepts clear rather
    than to produce a sophisticated machine learning model.
  </p>
  <p>
    Below the graph, the Visor shows a Model Summary table with details on the
    lower-level TensorFlow.js model architecture created behind the scenes. The
    summary includes layer names, neuron counts per layer (in the Output Shape
    column), and a parameter count: the total number of weights (one for each
    connection between two neurons) plus one bias for each neuron. In this case,
    dense_Dense1 is the
    hidden layer with 16 neurons (a number chosen by ml5.js), and dense_Dense2
    is the output layer with 4 neurons, one for each classification category.
    (TensorFlow.js doesn’t think of the inputs as a distinct layer; rather,
    they’re merely the starting point of the data flow.) The <em>batch</em> in
    the Output Shape column doesn’t refer to a specific number but indicates
    that the model can process a variable amount of training data (a batch) for
    any single cycle of model training.
  </p>
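  <p>
    The parameter arithmetic is easy to reproduce. Assuming every dense layer
    includes biases (the TensorFlow.js default), each layer contributes one
    weight per incoming connection plus one bias per neuron. The hypothetical
    <code>countDenseParams()</code> helper below applies that rule to the 2 → 16
    → 4 network described above. If your Visor shows a different hidden-layer
    size, the totals will differ accordingly.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Count the trainable parameters of a stack of dense layers:
// each layer has (inputs × units) weights plus one bias per unit.
function countDenseParams(inputSize, layerUnits) {
  let total = 0;
  let previous = inputSize;
  const perLayer = layerUnits.map((units) => {
    const params = previous * units + units;
    previous = units; // This layer's outputs feed the next layer.
    total += params;
    return params;
  });
  return { perLayer, total };
}

// 2 inputs (x, y) → 16 hidden neurons → 4 outputs
console.log(countDenseParams(2, [16, 4])); // { perLayer: [ 48, 68 ], total: 116 }

// The iris network from Figure 10.18: 4 inputs → 5 hidden → 3 outputs
console.log(countDenseParams(4, [5, 3]).total); // 43</pre
  >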
  <p>
    Before moving on from the evaluation stage, I have a loose end to tie up.
    When I first outlined the steps of the machine learning life cycle, I
    mentioned that preparing the data typically involves splitting the dataset
    into three parts to help with the evaluation process:
  </p>
  <ul>
    <li>
      <strong>Training:</strong> The primary dataset used to train the model
    </li>
    <li>
      <strong>Validation:</strong> A subset of the data used to check the model
      during training, typically at the end of each epoch
    </li>
    <li>
      <strong>Testing:</strong> Additional untouched data never considered
      during the training process, for determining the model’s final performance
      after the training is completed
    </li>
  </ul>
  <p>
    You may have noticed that I never did this. For simplicity, I’ve instead
    used the entire dataset for training. After all, my dataset has only eight
    records; it’s much too small to divide into three sets! With a large dataset,
    this three-way split would be more appropriate.
  </p>
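  <p>
    If you do have enough data, the split itself is straightforward: shuffle the
    records, then slice them into three groups. Here’s a minimal sketch in plain
    JavaScript. The <code>splitDataset()</code> name and the 70/15/15
    proportions are illustrative assumptions, not ml5.js features or a fixed
    rule.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// A minimal sketch of a shuffled three-way split. The function name and the
// default 70/15/15 proportions are illustrative, not part of ml5.js.
function splitDataset(data, trainFraction = 0.7, validationFraction = 0.15) {
  // Copy the data so the original array keeps its order.
  let shuffled = data.slice();
  // Fisher-Yates shuffle: swap each element with a random earlier (or same) one.
  for (let i = shuffled.length - 1; i > 0; i--) {
    let j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  // Work out where each split ends; whatever is left over goes to testing.
  let trainEnd = Math.round(shuffled.length * trainFraction);
  let validationEnd = trainEnd + Math.round(shuffled.length * validationFraction);
  return {
    training: shuffled.slice(0, trainEnd),
    validation: shuffled.slice(trainEnd, validationEnd),
    testing: shuffled.slice(validationEnd),
  };
}

// 100 placeholder records, each tagged with an id.
let records = Array.from({ length: 100 }, (_, i) => ({ id: i }));
let { training, validation, testing } = splitDataset(records);
console.log(training.length, validation.length, testing.length); // 70 15 15</pre>
  <p>
    Shuffling first matters if your records were collected in order (say, all
    the <code>"up"</code> examples, then all the <code>"down"</code> ones);
    otherwise, one split could end up missing a label entirely. With the eight
    records in this example, the same proportions would leave just one record
    each for validation and testing.
  </p>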
  <p>
    Using such a small dataset risks the model <strong>overfitting</strong> the
    data, however: the model becomes so tuned to the specific peculiarities of
    the training data that it’s much less effective when working with new,
    unseen data. The main reason to use a validation set is to monitor the model
    during the training process. As training progresses, if the model’s accuracy
    improves on the training data but deteriorates on the validation data, it’s
    a strong indicator that overfitting might be occurring. (The testing set is
    reserved strictly for the final evaluation, one more chance after training
    is complete to gauge the model’s performance.)
  </p>
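  <p>
    To make that warning sign concrete, here’s a rough heuristic that compares
    the two loss curves over the last few epochs. It uses loss rather than
    accuracy (so lower is better) and flags possible overfitting when training
    loss has fallen while validation loss has risen. The
    <code>looksOverfit()</code> function, its <code>span</code> parameter, and
    the loss values are all made up for illustration; this isn’t a feature of
    ml5.js.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// A rough heuristic, not an ml5.js feature: over the last `span` epochs,
// did training loss fall while validation loss rose?
function looksOverfit(trainingLoss, validationLoss, span = 5) {
  let n = Math.min(trainingLoss.length, validationLoss.length);
  // Not enough history to compare yet.
  if (span >= n) return false;
  let from = n - 1 - span;
  let to = n - 1;
  let trainingImproved = trainingLoss[from] > trainingLoss[to];
  if (!trainingImproved) return false;
  // Training got better, so check whether validation got worse.
  return validationLoss[to] > validationLoss[from];
}

// Made-up loss histories, one value per epoch (lower is better).
let trainLoss = [1.2, 0.9, 0.7, 0.5, 0.4, 0.3, 0.2];
let valLoss = [1.3, 1.0, 0.8, 0.7, 0.75, 0.85, 0.95];
console.log(looksOverfit(trainLoss, valLoss, 3)); // true</pre>
  <p>
    Real loss curves are noisy, so you’d probably want a longer span or a
    smoothed curve before drawing conclusions from a single comparison.
  </p>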
  <p>
    For more realistic scenarios, ml5.js provides a way to split up the data, as
    well as automatic features for employing validation data. If you’re inclined
    to go further,
    <a href="http://ml5js.org/"
      >you can explore the full set of neural network examples on the ml5.js
      website</a
    >.
  </p>
  <h3 id="tuning-the-parameters">Tuning the Parameters</h3>
  <p>
    After the evaluation step, there’s typically an iterative process of
    adjusting hyperparameters and going through training again to achieve the
    best performance from the model. While ml5.js offers capabilities for
    hyperparameter tuning (which you can learn about in the library’s reference), it
    isn’t really geared toward making low-level, fine-grained adjustments to a
    model. Using TensorFlow.js directly might be your best bet if you want to
    explore this step in more detail, since it offers a broader suite of tools
    and allows for lower-level control over the training process.
  </p>
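  <p>
    One common, if brute-force, way to run that adjust-and-retrain loop is a
    grid search: try every combination from a list of candidate values and keep
    whichever scores best on validation data. The sketch below shows only the
    search loop. The <code>trainAndEvaluate()</code> function is a hypothetical
    stand-in that returns a made-up loss so the code runs; in a real project, it
    would train a fresh model with the given settings and report its validation
    loss. The setting names <code>learningRate</code> and
    <code>hiddenUnits</code> are illustrative.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Hypothetical stand-in: a real version would train a new model with these
// settings and return its validation loss. This formula is made up.
function trainAndEvaluate({ learningRate, hiddenUnits }) {
  return Math.abs(learningRate - 0.2) + Math.abs(hiddenUnits - 16) / 100;
}

// Try every combination of candidate values and keep the lowest loss.
function gridSearch(grid) {
  let best = null;
  for (let learningRate of grid.learningRate) {
    for (let hiddenUnits of grid.hiddenUnits) {
      let params = { learningRate, hiddenUnits };
      let loss = trainAndEvaluate(params);
      if (best === null || best.loss > loss) {
        best = { params, loss };
      }
    }
  }
  return best;
}

let best = gridSearch({
  learningRate: [0.01, 0.1, 0.2, 0.5],
  hiddenUnits: [8, 16, 32],
});
console.log(best.params); // { learningRate: 0.2, hiddenUnits: 16 }</pre>
  <p>
    The number of training runs multiplies with each setting you add (4 × 3 = 12
    runs here), which is one reason grid searches get expensive quickly.
  </p>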
  <p>
    In this case, tuning the parameters isn’t strictly necessary. The graph in
    the Visor shows a loss all the way down at 0.1, which is low enough for my
    purposes. I’m happy to move on.
  </p>
  <h3 id="deploying-the-model">Deploying the Model</h3>
  <p>
    It’s finally time to deploy the model and see the payoff of all that hard
    work. This typically involves integrating the model into a separate
    application to make predictions or decisions based on new, previously unseen
    data. For this, ml5.js offers the convenience of a
    <code>save()</code> function to download the trained model to a file from
    one sketch and a <code>load()</code> function to load it for use in a
    completely different sketch. This saves you from having to retrain the model
    from scratch every single time you need it.
  </p>
  <p>
    While a model would typically be deployed to a different sketch from the one
    where it was trained, I’m going to deploy the model in the same sketch for
    the sake of simplicity. In fact, once the training process is complete, the
    resulting model is, in essence, already deployed in the current sketch. It’s
    saved in the <code>classifier</code> variable and can be used to make
    predictions by passing the model new data through the
    <code>classify()</code> method. The shape of the data sent to
    <code>classify()</code> should match that of the input data used in
    training—in this case, two floating-point numbers, representing the x- and
    y-components of a direction vector:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Manually create a vector.
let direction = createVector(1, 0);
// Convert the x- and y-components into an input array.
let inputs = [direction.x, direction.y];
// Ask the model to classify the inputs.
classifier.classify(inputs, gotResults);</pre
  >
  <p>
    The second argument to <code>classify()</code> is another callback function
    for accessing the results:
  </p>
  <pre class="codesplit" data-code-language="javascript">
function gotResults(results) {
  console.log(results);
}</pre
  >
  <p>
    The model’s prediction arrives in the argument to the callback, which I’m
    calling <code>results</code> in the code. Inside, you’ll find an array of
    the possible labels, sorted by <strong>confidence</strong>, a probability
    value that the model assigns to each label. These probabilities represent
    how sure the model is of that particular prediction. They range from 0 to 1,
    with values closer to 1 indicating higher confidence and values near 0
    suggesting lower confidence:
  </p>
  <pre class="codesplit" data-code-language="json">
[
  {
    "label": "right",
    "confidence": 0.9669702649116516
  },
  {
    "label": "up",
    "confidence": 0.01878807507455349
  },
  {
    "label": "down",
    "confidence": 0.013948931358754635
  },
  {
    "label": "left",
    "confidence": 0.00029277068097144365
  }
]</pre
  >
  <p>
    In this example output, the model is highly confident (approximately 96.7
    percent) that the correct label is <code>"right"</code>, while it has
    minimal confidence (0.03 percent) in the <code>"left"</code> label. The
    confidence values are normalized and add up to 100 percent.
  </p>
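  <p>
    Confidence values that all fall between 0 and 1 and sum to 1 are exactly
    what the softmax function produces, and classification networks commonly
    apply softmax to their output layer. I’m assuming, rather than confirming
    from the ml5.js source, that this is how the values above are computed.
    Here’s a sketch of softmax applied to made-up raw output scores (not from a
    real model), followed by the same sort-by-confidence step that produces an
    array shaped like the one above.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Softmax: exponentiate each score, then divide by the total so the results
// are all between 0 and 1 and add up to 1.
function softmax(scores) {
  // Subtracting the largest score first avoids overflow without changing the result.
  let max = Math.max(...scores);
  let exps = scores.map((s) => Math.exp(s - max));
  let sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Made-up raw scores for the four labels.
let labels = ["up", "down", "left", "right"];
let scores = [0.5, 0.2, -4.0, 4.2];
let results = softmax(scores)
  .map((confidence, i) => ({ label: labels[i], confidence }))
  // Sort from most to least confident.
  .sort((a, b) => b.confidence - a.confidence);

console.log(results[0].label); // right</pre>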
  <p>
    All that remains now is to fill out the sketch with code so the model can
    receive live input from the mouse. The first step is to signal the
    completion of the training process so the user knows the model is ready.
    I’ll include a global <code>status</code> variable to track the training
    process and ultimately display the predicted label on the canvas. The
    variable is initialized to <code>"training"</code> but updated to
    <code>"ready"</code> through the <code>finishedTraining()</code> callback:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// When the sketch starts, it will show a status of <code>training</code>.
let status = "training";

function draw() {
  background(255);
  textAlign(CENTER, CENTER);
  textSize(64);
  text(status, width / 2, height / 2);
}

// This is the callback for when training is complete, and the message changes to <code>ready</code>.
function finishedTraining() {
  status = "ready";
}</pre
  >
  <p>
    Finally, I’ll use p5.js’s mouse functions to build a vector while the mouse
    is being dragged and call <code>classifier.classify()</code> on that vector
    when the mouse is released.
  </p>
  <div data-type="example">
    <h3 id="example-102-gesture-classifier">
      Example 10.2: Gesture Classifier
    </h3>
    <figure>
      <div
        data-type="embed"
        data-p5-editor="https://editor.p5js.org/natureofcode/sketches/SbfSv_GhM"
        data-example-path="examples/10_nn/10_2_gesture_classifier"
      >
        <img src="examples/10_nn/10_2_gesture_classifier/screenshot.png" />
      </div>
      <figcaption></figcaption>
    </figure>
  </div>
  <pre class="codesplit" data-code-language="javascript">
// The start and end points of the current gesture.
let start, end;

// Store the start of a gesture when the mouse is pressed.
function mousePressed() {
  start = createVector(mouseX, mouseY);
  // Reset the end too, so a click without a drag doesn't reuse an old gesture.
  end = start.copy();
}

// Update the end of a gesture as the mouse is dragged.
function mouseDragged() {
  end = createVector(mouseX, mouseY);
}

// The gesture is complete when the mouse is released.
function mouseReleased() {
  // Calculate and normalize a direction vector.
  let dir = p5.Vector.sub(end, start);
  dir.normalize();
  // Convert to an input array and classify.
  let inputs = [dir.x, dir.y];
  classifier.classify(inputs, gotResults);
}

// Store the resulting label in the <code>status</code> variable for showing in the canvas.
function gotResults(results) {
  status = results[0].label;
}</pre
  >
  <p>
    Since the <code>results</code> array is sorted by confidence, if I just want
    to use a single label as the prediction, I can access the first element of
    the array with <code>results[0].label</code>, as in the
    <code>gotResults()</code> function in Example 10.2. This label is passed to
    the <code>status</code> variable to be displayed on the canvas.
  </p>
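  <p>
    Always trusting <code>results[0]</code> means the sketch will commit to a
    label even when the model is unsure, for example, after a short, ambiguous
    drag. One option is to check the top confidence against a threshold first.
    The <code>pickLabel()</code> helper and its 0.6 cutoff are hypothetical
    choices you’d tune for your own data.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Hypothetical helper: return the top label only if the model is confident enough.
function pickLabel(results, threshold = 0.6) {
  // results is assumed to be sorted by confidence, highest first.
  let top = results[0];
  if (!top || threshold > top.confidence) {
    return "unsure";
  }
  return top.label;
}

// The example output from earlier in this section.
let example = [
  { label: "right", confidence: 0.9669702649116516 },
  { label: "up", confidence: 0.01878807507455349 },
  { label: "down", confidence: 0.013948931358754635 },
  { label: "left", confidence: 0.00029277068097144365 },
];
console.log(pickLabel(example)); // right
console.log(pickLabel([{ label: "up", confidence: 0.4 }])); // unsure</pre>
  <p>
    Inside the callback, you could then write
    <code>status = pickLabel(results);</code> instead of reading
    <code>results[0].label</code> directly.
  </p>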
  <div data-type="exercise">
    <h3 id="exercise-105">Exercise 10.5</h3>
    <p>
      Divide Example 10.2 into three sketches: one for collecting data, one for
      training, and one for deployment. Use the
      <code>ml5.neuralNetwork</code> functions <code>save()</code> and
      <code>load()</code> for saving and loading the model to and from a file,
      respectively.
    </p>
  </div>
  <div data-type="exercise">
    <h3 id="exercise-106">Exercise 10.6</h3>
    <p>
      Expand the gesture-recognition model to classify a sequence of vectors,
      capturing more accurately the path of a longer mouse movement. Remember,
      your input data must have a consistent shape, so you’ll have to decide how
      many vectors to use to represent a gesture and store no more and no less
      for each data point. While this approach can work, other machine learning
      models (such as recurrent neural networks) are specifically designed to
      handle sequential data and might offer more flexibility and potential
      accuracy.
    </p>
  </div>
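  <p>
    Here’s one possible starting point for that decision, sketched under a few
    assumptions: the gesture is recorded as an array of points with
    <code>x</code> and <code>y</code> properties (plain objects or
    <code>p5.Vector</code> objects pushed into an array in
    <code>mouseDragged()</code>, for example), and it’s reduced to a fixed count
    of 8 evenly spaced samples. The <code>resamplePath()</code> and
    <code>pathToInputs()</code> helpers are hypothetical, and sampling by array
    index is a simplification that assumes points arrive at a fairly steady
    rate.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Pick `count` points at evenly spaced positions in the recorded path.
function resamplePath(path, count) {
  return Array.from({ length: count }, (_, i) => {
    let index = Math.round((i * (path.length - 1)) / (count - 1));
    return path[index];
  });
}

// Turn the samples into a flat array of normalized directions:
// (count - 1) vectors, each contributing an x- and a y-component.
function pathToInputs(path, count = 8) {
  let points = resamplePath(path, count);
  let inputs = [];
  points.slice(1).forEach((point, i) => {
    let previous = points[i];
    let dx = point.x - previous.x;
    let dy = point.y - previous.y;
    let len = Math.hypot(dx, dy);
    // Guard against repeated points, which would otherwise divide by zero.
    inputs.push(len > 0 ? dx / len : 0, len > 0 ? dy / len : 0);
  });
  return inputs;
}

// A made-up path: 20 points moving right, then 20 points moving down.
let rightward = Array.from({ length: 20 }, (_, i) => ({ x: i * 5, y: 0 }));
let downward = Array.from({ length: 20 }, (_, i) => ({ x: 95, y: (i + 1) * 5 }));
let path = rightward.concat(downward);
let inputs = pathToInputs(path, 8);
console.log(inputs.length); // 14</pre>
  <p>
    With 8 samples, every gesture becomes 7 normalized direction vectors, or 14
    inputs, no matter how many points the mouse actually recorded.
  </p>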
  <div data-type="exercise">
    <h3 id="exercise-107">Exercise 10.7</h3>
    <p>
      One of the pretrained models in ml5.js is called <em>Handpose</em>. The
      input of the model is an image, and the prediction is a list of 21 key
      points—x- and y-positions, also known as <em>landmarks</em>—that describe
      a hand.
    </p>
    <figure>
      <img src="images/10_nn/10_nn_24.png" alt="" />
      <figcaption></figcaption>
    </figure>
    <p>
      Can you use the outputs of the <code>ml5.handpose()</code> model as the
      inputs to an <code>ml5.neuralNetwork()</code> and classify various hand
      gestures (like a thumbs-up or thumbs-down)? For hints, you can watch my
      <a href="https://thecodingtrain.com/pose-classifier"
        >video tutorial that walks you through this process for body poses in
        the machine learning track on the Coding Train website</a
      >.
    </p>
  </div>
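  <p>
    Whatever the hand-tracking model returns, the neural network needs a flat
    array of numbers with the same length every time. Assuming each detected
    hand provides its 21 key points as objects with <code>x</code> and
    <code>y</code> properties (check the ml5.js reference for the exact
    structure of the results), a hypothetical helper like
    <code>keypointsToInputs()</code> could flatten them into 42 inputs.
    Measuring each point relative to the first one makes the inputs describe
    the hand’s shape rather than where it sits in the frame.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Hypothetical helper: flatten hand key points into one input array.
// Assumes keypoints is an array of { x, y } objects; positions are measured
// relative to the first key point so moving the hand around doesn't change them.
function keypointsToInputs(keypoints) {
  let origin = keypoints[0];
  let inputs = [];
  for (let point of keypoints) {
    inputs.push(point.x - origin.x, point.y - origin.y);
  }
  return inputs;
}

// Fake data for illustration: 21 points along a diagonal line.
let fakeHand = Array.from({ length: 21 }, (_, i) => ({ x: 100 + i, y: 200 + 2 * i }));
let inputs = keypointsToInputs(fakeHand);
console.log(inputs.length); // 42</pre>
  <p>
    You might also divide by the hand’s overall size so that moving closer to
    or farther from the camera matters less.
  </p>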
  <div data-type="project">
    <h3 id="the-ecosystem-project-11">The Ecosystem Project</h3>
    <p>
      Incorporate machine learning into your ecosystem to enhance the behavior
      of creatures. How could classification or regression be applied?
    </p>
    <ul>
      <li>
        Can you classify the creatures of your ecosystem into multiple
        categories? What if you use an initial population as a training dataset,
        and as new creatures are born, the system classifies them according to
        their features? What are the inputs and outputs for your system?
      </li>
      <li>
        Can you use a regression to predict the life span of a creature based on
        its properties? Think about how size and speed affected the life span of
        the bloops from <a href="/genetic-algorithms#">Chapter 9</a>. Could you
        analyze how well the regression model’s predictions align with the
        actual outcomes?
      </li>
    </ul>
    <figure>
      <img src="images/10_nn/10_nn_25.png" alt="" />
      <figcaption></figcaption>
    </figure>
  </div>
</section>