<section data-type="chapter">
<h1 id="chapter-10-neural-networks">Chapter 10. Neural Networks</h1>
<div class="chapter-opening-quote">
<blockquote data-type="epigraph">
<p>The human brain has 100 billion neurons,</p>
<p>each neuron connected to 10 thousand</p>
<p>other neurons. Sitting on your shoulders</p>
<p>is the most complicated object</p>
<p>in the known universe.</p>
<div class="chapter-opening-quote-source">
<p>—Michio Kaku</p>
</div>
</blockquote>
</div>
<div class="chapter-opening-figure">
<figure>
<img src="images/10_nn/10_nn_1.jpg" alt="">
<figcaption></figcaption>
</figure>
<h3 id="khipu-on-display-at-the-machu-picchu-museum-cusco-peru-photo-by-pi3124">Khipu on display at the Machu Picchu Museum, Cusco, Peru (photo by Pi3.124)</h3>
<p>The <em>khipu</em> (or <em>quipu</em>) is an ancient Incan device used for recordkeeping and communication. It comprised a complex system of knotted cords to encode and transmit information. Each colored string and knot type and pattern represented specific data, such as census records or calendrical information. Interpreters, known as <em>quipucamayocs</em>, acted as a kind of accountant and decoded the stringed narrative into understandable information.</p>
</div>
<p>I began with inanimate objects living in a world of forces, and I gave them desires, autonomy, and the ability to take action according to a system of rules. Next, I allowed those objects, now called <em>creatures</em>, to live in a population and evolve over time. Now I'd like to ask, What is each creature's decision-making process? How can it adjust its choices by learning over time? Can a computational entity process its environment and generate a decision?</p>
<p>To answer these questions, I'll once again look to nature for inspiration, specifically the human brain. A brain can be described as a biological <strong>neural network</strong>, an interconnected web of neurons transmitting elaborate patterns of electrical signals. Within each neuron, dendrites receive input signals, and based on those inputs, the neuron fires an output signal via an axon (see Figure 10.1). Or something like that. How the human brain actually works is an elaborate and complex mystery, one that I'm certainly not going to attempt to unravel in rigorous detail in this chapter.</p>
<figure>
<img src="images/10_nn/10_nn_2.png" alt="Figure 10.1: A neuron with dendrites and an axon connected to another neuron">
<figcaption>Figure 10.1: A neuron with dendrites and an axon connected to another neuron</figcaption>
</figure>
<p>Fortunately, as you've seen throughout this book, developing engaging animated systems with code doesn't require scientific rigor or accuracy. Designing a smart rocket isn't rocket science, and neither is designing an artificial neural network brain science. It's enough to simply be inspired by the <em>idea</em> of brain function.</p>
<p>In this chapter, I'll begin with a conceptual overview of the properties and features of neural networks and build the simplest possible example of one, a network that consists of a single neuron. I'll then introduce you to more complex neural networks by using the ml5.js library. This will serve as a foundation for <a href="/neuroevolution#">Chapter 11</a>, the grand finale of this book, where I'll combine GAs with neural networks for physics simulation.</p>
<h2 id="introducing-artificial-neural-networks">Introducing Artificial Neural Networks</h2>
<p>Computer scientists have long been inspired by the human brain. In 1943, Warren S. McCulloch, a neuroscientist, and Walter Pitts, a logician, developed the first conceptual model of an artificial neural network. In their paper “A Logical Calculus of the Ideas Immanent in Nervous Activity,” they describe a <strong>neuron</strong> as a single computational cell living in a network of cells that receives inputs, processes those inputs, and generates an output.</p>
<p>Their work, and the work of many scientists and researchers who followed, wasn't meant to accurately describe how the biological brain works. Rather, an <em>artificial</em> neural network (hereafter referred to as just a <em>neural network</em>) was intended as a computational model based on the brain, designed to solve certain kinds of problems that were traditionally difficult for computers.</p>
<p>Some problems are incredibly simple for a computer to solve but difficult for humans like you and me. Finding the square root of 964,324 is an example. A quick line of code produces the value 982, a number my computer can compute in less than a millisecond, but if you asked me to calculate that number myself, you'd be in for quite a wait. On the other hand, certain problems are incredibly simple for you or me to solve, but not so easy for a computer. Show any toddler a picture of a kitten or puppy, and they'll quickly be able to tell you which one is which. Listen to a conversation in a noisy café and focus on just one person's voice, and you can effortlessly comprehend their words. But need a machine to perform one of these tasks? Scientists have spent entire careers researching and implementing complex solutions, and neural networks are one of them.</p>
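<p>For reference, the “quick line of code” could be as small as the following sketch, which relies on JavaScript's built-in <code>Math.sqrt()</code>. The variable name <code>root</code> is only illustrative.</p>

```javascript
// Compute the square root of 964,324 with the built-in Math.sqrt().
const root = Math.sqrt(964324);
console.log(root); // 982
```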
<p>Here are some of the easy-for-a-human, difficult-for-a-machine applications of neural networks in software today:</p>
<ul>
<li><strong>Pattern recognition:</strong> Neural networks are well suited to problems when the aim is to detect, interpret, and classify features or patterns within a dataset. This includes everything from identifying objects (like faces) in images, to optical character recognition, to more complex tasks like gesture recognition.</li>
<li><strong>Time-series prediction and anomaly detection:</strong> Neural networks are utilized both in forecasting, such as predicting stock market trends or weather patterns, and in recognizing anomalies, which can be applied to areas like cyberattack detection and fraud prevention.</li>
<li><strong>Natural language processing (NLP):</strong> One of the biggest developments in recent years has been the use of neural networks for processing and understanding human language. They're used in various tasks including machine translation, sentiment analysis, and text summarization, and are the underlying technology behind many digital assistants and chatbots.</li>
<li><strong>Signal processing and soft sensors:</strong> Neural networks play a crucial role in devices like cochlear implants and hearing aids by filtering noise and amplifying essential sounds. They're also involved in <em>soft sensors</em>, software systems that process data from multiple sources to give a comprehensive analysis of the environment.</li>
<li><strong>Control and adaptive decision-making systems:</strong> These applications range from autonomous vehicles like self-driving cars and drones to adaptive decision-making used in game playing, pricing models, and recommendation systems on media platforms.</li>
<li><strong>Generative models:</strong> The rise of novel neural network architectures has made it possible to generate new content. These systems can synthesize images, enhance image resolution, transfer style between images, and even generate music and video.</li>
</ul>
<p>Covering the full gamut of applications for neural networks would merit an entire book (or series of books), and by the time that book was printed, it would probably be out of date. Hopefully, this list gives you an overall sense of the features and possibilities.</p>
<h3 id="how-neural-networks-work">How Neural Networks Work</h3>
<p>In some ways, neural networks are quite different from other computer programs. The computational systems I've been writing so far in this book are <strong>procedural</strong>: a program starts at the first line of code, executes it, and goes on to the next, following instructions in a linear fashion. By contrast, a true neural network doesn't follow a linear path. Instead, information is processed collectively, in parallel, throughout a network of nodes, with each node representing a neuron. In this sense, a neural network is considered a <strong>connectionist</strong> system.</p>
<p>In other ways, neural networks aren't so different from some of the programs you've seen. A neural network exhibits all the hallmarks of a complex system, much like a cellular automaton or a flock of boids. Remember how each individual boid was simple to understand, yet by following only three rules (separation, alignment, cohesion), it contributed to complex behaviors? Each individual element in a neural network is equally simple to understand. It reads an input (a number), processes it, and generates an output (another number). That's all there is to it, and yet a network of many neurons can exhibit incredibly rich and intelligent behaviors, echoing the complex dynamics seen in a flock of boids.</p>
<div class="half-width-right">
<figure>
<img src="images/10_nn/10_nn_3.png" alt="Figure 10.2: A neural network is a system of neurons and connections.">
<figcaption>Figure 10.2: A neural network is a system of neurons and connections.</figcaption>
</figure>
</div>
<p>In fact, a neural network isn't just a complex system, but a complex <em>adaptive</em> system, meaning it can change its internal structure based on the information flowing through it. In other words, it has the ability to learn. Typically, this is achieved by adjusting <strong>weights</strong>. In Figure 10.2, each arrow represents a connection between two neurons and indicates the pathway for the flow of information. Each connection has a weight, a number that controls the signal between the two neurons. If the network generates a <em>good</em> output (which I'll define later), there's no need to adjust the weights. However, if the network generates a <em>poor</em> output (an error, so to speak), then the system adapts, altering the weights with the hope of improving subsequent results.</p>
<p>Neural networks may use a variety of strategies for learning, and I'll focus on one of them in this chapter:</p>
<ul>
<li><strong>Supervised learning:</strong> Essentially, this strategy involves a teacher that's smarter than the network itself. Take the case of facial recognition. The teacher shows the network a bunch of faces, and the teacher already knows the name associated with each face. The network makes its guesses; then the teacher provides the network with the actual names. The network can compare its answers to the known correct ones and make adjustments according to its errors. The neural networks in this chapter follow this model.</li>
<li><strong>Unsupervised learning:</strong> This technique is required when you don't have an example dataset with known answers. Instead, the network works on its own to uncover hidden patterns in the data. An application of this is clustering: a set of elements is divided into groups according to an unknown pattern. I won't be showing any instances of unsupervised learning, as the strategy is less relevant to the book's examples.</li>
<li><strong>Reinforcement learning:</strong> This strategy is built on observation: a learning agent makes decisions and looks to its environment for the results. It's rewarded for good decisions and penalized for bad decisions, such that it learns to make better decisions over time. I'll discuss this strategy in more detail in <a href="/neuroevolution#">Chapter 11</a>.</li>
</ul>
<p>The ability of a neural network to learn, to make adjustments to its structure over time, is what makes it so useful in the field of <strong>machine learning</strong>. This term can be traced back to the 1959 paper “Some Studies in Machine Learning Using the Game of Checkers,” in which computer scientist Arthur Lee Samuel outlines a “self-learning” program for playing checkers. The concept of an algorithm enabling a computer to learn without explicit programming is the foundation of machine learning.</p>
<p>Think about what you've been doing throughout this book: coding! In traditional programming, a computer program takes inputs and, based on the rules you've provided, produces outputs. Machine learning, however, turns this approach upside down. Instead of you writing the rules, the system is given example inputs and outputs, and generates the rules itself! Many algorithms can be used to implement machine learning, and a neural network is just one of them.</p>
<p>Machine learning is part of the broad, sweeping field of <strong>artificial intelligence (AI)</strong>, although the terms are sometimes used interchangeably. In their thoughtful and friendly primer <em>A People's Guide to AI</em>, Mimi Onuoha and Diana Nucera (aka Mother Cyborg) define AI as “the theory and development of computer systems able to perform tasks that normally require human intelligence.” Machine learning algorithms are one approach to these tasks, but not all AI systems feature a self-learning component.</p>
<h3 id="machine-learning-libraries">Machine Learning Libraries</h3>
<p>Today, leveraging machine learning in creative coding and interactive media isn't only feasible but increasingly common, thanks to third-party libraries that handle a lot of the neural network implementation details under the hood. While the vast majority of machine learning development and research is done in Python, the world of web development has seen the emergence of powerful JavaScript-based tools. Two libraries of note are TensorFlow.js and ml5.js.</p>
<p>TensorFlow.js is an open source library that lets you define, train, and run neural networks directly in the browser using JavaScript, without the need to install or configure complex environments. It's part of the TensorFlow ecosystem, which is maintained and developed by Google. TensorFlow.js is a powerful tool, but its low-level operations and highly technical API can be intimidating to beginners. Enter ml5.js, a library built on top of TensorFlow.js and designed specifically for use with p5.js. Its goal is to be beginner friendly and make machine learning approachable for a broad audience of artists, creative coders, and students. I'll demonstrate how to use ml5.js in <a href="#machine-learning-with-ml5js">“Machine Learning with ml5.js”</a>.</p>
<p>A benefit of libraries like TensorFlow.js and ml5.js is that you can use them to run pretrained models. A machine learning <strong>model</strong> is a specific setup of neurons and connections, and a <strong>pretrained</strong> model is one that has already been prepared for a particular task. For example, popular pretrained models are used for classifying images, identifying body poses, recognizing facial landmarks or hand positions, and even analyzing the sentiment expressed in a text. You can use such a model as is or treat it as a starting point for additional learning (commonly referred to as <strong>transfer learning</strong>).</p>
<p>Before I get to exploring the ml5.js library, however, I'd like to try my hand at building the simplest of all neural networks from scratch, using only p5.js, to illustrate how the concepts of neural networks and machine learning are implemented in code.</p>
<h2 id="the-perceptron">The Perceptron</h2>
<p>A <strong>perceptron</strong> is the simplest neural network possible: a computational model of a single neuron. Invented in 1957 by Frank Rosenblatt at the Cornell Aeronautical Laboratory, a perceptron consists of one or more inputs, a processor, and a single output, as shown in Figure 10.3.</p>
<figure>
<img src="images/10_nn/10_nn_4.png" alt="Figure 10.3: A simple perceptron with two inputs and one output">
<figcaption>Figure 10.3: A simple perceptron with two inputs and one output</figcaption>
</figure>
<p>A perceptron follows the <strong>feed-forward</strong> model: data passes (feeds) through the network in one direction. The inputs are sent into the neuron, are processed, and result in an output. This means the one-neuron network diagrammed in Figure 10.3 reads from left to right (forward): inputs come in, and output goes out.</p>
<p>Say I have a perceptron with two inputs, the values 12 and 4. In machine learning, it's customary to denote each input with an <span data-type="equation">x</span>, so I'll call these inputs <span data-type="equation">x_0</span> and <span data-type="equation">x_1</span>:</p>
<table>
<thead>
<tr>
<th style="width:100px">Input</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><span data-type="equation">x_0</span></td>
<td>12</td>
</tr>
<tr>
<td><span data-type="equation">x_1</span></td>
<td>4</td>
</tr>
</tbody>
</table>
<h3 id="perceptron-steps">Perceptron Steps</h3>
<p>To get from these inputs to an output, the perceptron follows a series of steps.</p>
<h4 id="step-1-weight-the-inputs">Step 1: Weight the Inputs</h4>
<p>Each input sent into the neuron must first be weighted, meaning it's multiplied by a value, often a number from -1 to +1. When creating a perceptron, the inputs are typically assigned random weights. I'll call my weights <span data-type="equation">w_0</span> and <span data-type="equation">w_1</span>:</p>
<table>
<thead>
<tr>
<th style="width:100px">Weight</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><span data-type="equation">w_0</span></td>
<td>0.5</td>
</tr>
<tr>
<td><span data-type="equation">w_1</span></td>
<td>-1</td>
</tr>
</tbody>
</table>
<p>Each input needs to be multiplied by its corresponding weight:</p>
<table>
<thead>
<tr>
<th style="width:100px">Input</th>
<th style="width:100px">Weight</th>
<th>Input <span data-type="equation">\boldsymbol{\times}</span> Weight</th>
</tr>
</thead>
<tbody>
<tr>
<td>12</td>
<td>0.5</td>
<td>6</td>
</tr>
<tr>
<td>4</td>
<td>-1</td>
<td>-4</td>
</tr>
</tbody>
</table>
<h4 id="step-2-sum-the-inputs">Step 2: Sum the Inputs</h4>
<p>The weighted inputs are then added together:</p>
<div data-type="equation">6 + -4 = 2</div>
<h4 id="step-3-generate-the-output">Step 3: Generate the Output</h4>
<p>The output of a perceptron is produced by passing the sum through an <strong>activation function</strong> that reduces the output to one of two possible values. Think of this binary output as an LED that's only <em>off</em> or <em>on</em>, or as a neuron in an actual brain that either fires or doesn't fire. The activation function determines whether the perceptron should “fire.”</p>
<p>Activation functions can get a little bit hairy. If you start reading about them in an AI textbook, you may soon find yourself reaching in turn for a calculus textbook. However, your new friend the simple perceptron provides an easier option that still demonstrates the concept. I'll make the activation function the sign of the sum. If the sum is a positive number, the output is +1; if it's negative, the output is -1:</p>
<div data-type="equation">\text{sign}(2) = +1</div>
<h3 id="putting-it-all-together-1">Putting It All Together</h3>
<p>Putting the preceding three parts together, here are the steps of the <strong>perceptron algorithm</strong>:</p>
<ol>
<li>For every input, multiply that input by its weight.</li>
<li>Sum all the weighted inputs.</li>
<li>Compute the output of the perceptron by passing that sum through an activation function (the sign of the sum).</li>
</ol>
<p>I can start writing this algorithm in code by using two arrays of values, one for the inputs and one for the weights:</p>
<pre class="codesplit" data-code-language="javascript">let inputs = [12, 4];
let weights = [0.5, -1];</pre>
<p>The “for every input” in step 1 implies a loop that multiplies each input by its corresponding weight. To obtain the sum, the results can be added up in that same loop:</p>
<pre class="codesplit" data-code-language="javascript">// Steps 1 and 2: Add up all the weighted inputs.
let sum = 0;
for (let i = 0; i &#x3C; inputs.length; i++) {
sum += inputs[i] * weights[i];
}</pre>
<p>With the sum, I can then compute the output:</p>
<pre class="codesplit" data-code-language="javascript">// Step 3: Pass the sum through an activation function.
let output = activate(sum);
// The activation function
function activate(sum) {
//{!5} Return a 1 if positive, -1 if negative.
if (sum > 0) {
return 1;
} else {
return -1;
}
}</pre>
<p>You might be wondering how I'm handling the value of 0 in the activation function. Is 0 positive or negative? The deep philosophical implications of this question aside, I'm choosing here to arbitrarily return a -1 for 0, but I could easily change the <code>></code> to <code>>=</code> to go the other way. Depending on the application, this decision could be significant, but for demonstration purposes here, I can just pick one.</p>
<p>Now that I've explained the computational process of a perceptron, let's look at an example of one in action.</p>
<h3 id="simple-pattern-recognition-using-a-perceptron">Simple Pattern Recognition Using a Perceptron</h3>
<p>I've mentioned that neural networks are commonly used for pattern recognition. The scenarios outlined earlier require more complex networks, but even a simple perceptron can demonstrate a fundamental type of pattern recognition in which data points are classified as belonging to one of two groups. For instance, imagine you have a dataset of plants and want to identify them as either <em>xerophytes</em> (plants that have evolved to survive in an environment with little water and lots of sunlight, like the desert) or <em>hydrophytes</em> (plants that have adapted to living submerged in water, with reduced light). That's how I'll use my perceptron in this section.</p>
<p>One way to approach classifying the plants is to plot their data on a 2D graph and treat the problem as a spatial one. On the x-axis, plot the amount of daily sunlight received by the plant, and on the y-axis, plot the amount of water. Once all the data has been plotted, it's easy to draw a line across the graph, with all the xerophytes on one side and all the hydrophytes on the other, as in Figure 10.4. (I'm simplifying a little here. Real-world data would probably be messier, making the line harder to draw.) That's how each plant can be classified. Is it below the line? Then it's a xerophyte. Is it above the line? Then it's a hydrophyte.</p>
<figure>
<img src="images/10_nn/10_nn_5.png" alt="Figure 10.4: A collection of points in 2D space divided by a line, representing plant categories according to their water and sunlight intake ">
<figcaption>Figure 10.4: A collection of points in 2D space divided by a line, representing plant categories according to their water and sunlight intake</figcaption>
</figure>
<p>In truth, I don't need a neural network, not even a simple perceptron, to tell me whether a point is above or below a line. I can see the answer for myself with my own eyes, or have my computer figure it out with simple algebra. But just like solving a problem with a known answer (“to be or not to be”) was a convenient first test for the GA in <a href="/genetic-algorithms#">Chapter 9</a>, training a perceptron to categorize points as being on one side of a line versus the other will be a valuable way to demonstrate the algorithm of the perceptron and verify that it's working properly.</p>
<p>To solve this problem, I'll give my perceptron two inputs: <span data-type="equation">x_0</span> is the x-coordinate of a point, representing a plant's amount of sunlight, and <span data-type="equation">x_1</span> is the y-coordinate of that point, representing the plant's amount of water. The perceptron then guesses the plant's classification according to the sign of the weighted sum of these inputs. If the sum is positive, the perceptron outputs a +1, signifying a hydrophyte (above the line). If the sum is negative, it outputs a -1, signifying a xerophyte (below the line). Figure 10.5 shows this perceptron (note the shorthand of <span data-type="equation">w_0</span> and <span data-type="equation">w_1</span> for the weights).</p>
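<p>As a point of comparison, here is a minimal sketch of the “simple algebra” approach. It assumes a dividing line of the form <span data-type="equation">y = mx + b</span>. The slope <code>m</code>, the intercept <code>b</code>, and the helper name <code>classify()</code> are hypothetical and chosen only for illustration. A point exactly on the line is arbitrarily treated as below it.</p>

```javascript
// Hypothetical dividing line y = m * x + b (values chosen only for illustration).
const m = 0.5;
const b = -1;

// Return +1 if the point (x, y) is above the line, -1 otherwise.
function classify(x, y) {
  // Height of the line at this x-coordinate
  const lineY = m * x + b;
  return y > lineY ? 1 : -1;
}

console.log(classify(2, 3));  // 1 (the line is at y = 0 when x = 2, and 3 is above it)
console.log(classify(2, -3)); // -1
```

<p>This function produces the known answers that a perceptron can later be trained and checked against.</p>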
<figure>
<img src="images/10_nn/10_nn_6.png" alt="Figure 10.5: A perceptron with two inputs (x_0 and x_1), a weight for each input (w_0 and w_1), and a processing neuron that generates the output">
<figcaption>Figure 10.5: A perceptron with two inputs (<span data-type="equation">x_0</span> and <span data-type="equation">x_1</span>), a weight for each input (<span data-type="equation">w_0</span> and <span data-type="equation">w_1</span>), and a processing neuron that generates the output</figcaption>
</figure>
<p>This scheme has a pretty significant problem, however. What if my data point is (0, 0), and I send this point into the perceptron as inputs <span data-type="equation">x_0 = 0</span> and <span data-type="equation">x_1=0</span>? No matter what the weights are, multiplication by 0 is 0. The weighted inputs are therefore still 0, and their sum will be 0 too. And the sign of 0 is . . . hmmm, there's that deep philosophical quandary again. Regardless of how I feel about it, the point (0, 0) could certainly be above or below various lines in a 2D world. How is the perceptron supposed to interpret it accurately?</p>
<p>To avoid this dilemma, the perceptron requires a third input, typically referred to as a <strong>bias</strong> input. This extra input always has the value of 1 and is also weighted. Figure 10.6 shows the perceptron with the addition of the bias.</p>
<figure>
<img src="images/10_nn/10_nn_7.png" alt="Figure 10.6: Adding a bias input, along with its weight, to the perceptron">
<figcaption>Figure 10.6: Adding a bias input, along with its weight, to the perceptron</figcaption>
</figure>
<p>How does this affect point (0, 0)?</p>
<table>
<thead>
<tr>
<th style="width:100px">Input</th>
<th style="width:100px">Weight</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td><span data-type="equation">w_0</span></td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td><span data-type="equation">w_1</span></td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td><span data-type="equation">w_\text{bias}</span></td>
<td><span data-type="equation">w_\text{bias}</span></td>
</tr>
</tbody>
</table>
<p>The output is then the sum of the weighted results: <span data-type="equation">0 + 0 + w_\text{bias}</span>. Therefore, the bias by itself answers the question of where (0, 0) is in relation to the line. If the bias's weight is positive, (0, 0) is above the line; if negative, it's below. The extra input and its weight <em>bias</em> the perceptron's understanding of the line's position relative to (0, 0)!</p>
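<p>The effect of the bias on (0, 0) can be checked with a small sketch. The standalone <code>feedForward()</code> helper and the weight values here are assumptions for illustration only. The first two weights never matter for this point, because they are multiplied by 0.</p>

```javascript
// Weighted sum followed by the sign activation (a standalone, illustrative helper)
function feedForward(inputs, weights) {
  let sum = 0;
  for (let i = 0; i < weights.length; i++) {
    sum += inputs[i] * weights[i];
  }
  return sum > 0 ? 1 : -1;
}

// The point (0, 0) plus the bias input, which is always 1
const origin = [0, 0, 1];

// Only the third (bias) weight differs between these two calls.
const positiveBias = feedForward(origin, [0.7, -0.3, 0.5]);
const negativeBias = feedForward(origin, [0.7, -0.3, -0.5]);

console.log(positiveBias); // 1 (above the line)
console.log(negativeBias); // -1 (below the line)
```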
<h3 id="the-perceptron-code">The Perceptron Code</h3>
<p>I'm now ready to assemble the code for a <code>Perceptron</code> class. The perceptron needs to track only the input weights, which I can store using an array:</p>
<div class="snip-below">
<pre class="codesplit" data-code-language="javascript">class Perceptron {
constructor() {
this.weights = [];
}</pre>
</div>
<p>The constructor can receive an argument indicating the number of inputs (in this case, three: <span data-type="equation">x_0</span>, <span data-type="equation">x_1</span>, and a bias) and size the <code>weights</code> array accordingly, filling it with random values to start:</p>
<div class="snip-above snip-below">
<pre class="codesplit" data-code-language="javascript"> // The argument <code>n</code> determines the number of inputs (including the bias).
constructor(n) {
this.weights = [];
for (let i = 0; i &#x3C; n; i++) {
//{!1} The weights are picked randomly to start.
this.weights[i] = random(-1, 1);
}
}</pre>
</div>
<p>A perceptron's job is to receive inputs and produce an output. These requirements can be packaged together in a <code>feedForward()</code> method. In this example, the perceptron's inputs are an array (which should be the same length as the array of weights), and the output is a number, +1 or -1, as returned by the activation function based on the sign of the sum:</p>
<div class="snip-above">
<pre class="codesplit" data-code-language="javascript"> feedForward(inputs) {
let sum = 0;
for (let i = 0; i &#x3C; this.weights.length; i++) {
sum += inputs[i] * this.weights[i];
}
//{!1} The result is the sign of the sum, -1 or +1.
// Here the perceptron is making a guess:
// Is it on one side of the line or the other?
return this.activate(sum);
}
}</pre>
</div>
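<p>For readers who want to run the class outside p5.js, here is a self-contained sketch of the pieces so far. Two things in it are assumptions rather than part of the snippets above: <code>Math.random() * 2 - 1</code> stands in for p5.js's <code>random(-1, 1)</code>, and the <code>activate()</code> function from earlier in the chapter is attached to the class as a method so that the call to <code>this.activate()</code> resolves.</p>

```javascript
class Perceptron {
  // The argument n is the number of inputs (including the bias).
  constructor(n) {
    this.weights = [];
    for (let i = 0; i < n; i++) {
      // Assumption: plain JavaScript stand-in for p5.js random(-1, 1)
      this.weights[i] = Math.random() * 2 - 1;
    }
  }

  // Weight the inputs, sum them, and return the sign of the sum.
  feedForward(inputs) {
    let sum = 0;
    for (let i = 0; i < this.weights.length; i++) {
      sum += inputs[i] * this.weights[i];
    }
    return this.activate(sum);
  }

  // The sign activation from earlier: 1 if positive, -1 otherwise
  activate(sum) {
    return sum > 0 ? 1 : -1;
  }
}
```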
<p>Presumably, I could now create a <code>Perceptron</code> object and ask it to make a guess for any given point, as in Figure 10.7.</p>
<figure>
<img src="images/10_nn/10_nn_8.png" alt="Figure 10.7: An (x, y) coordinate from the 2D space is the input to the perceptron. ">
<figcaption>Figure 10.7: An (<em>x</em>, <em>y</em>) coordinate from the 2D space is the input to the perceptron.</figcaption>
</figure>
<p>Here's the code to generate a guess:</p>
<pre class="codesplit" data-code-language="javascript">// Create the perceptron.
let perceptron = new Perceptron(3);
// The input is three values: x, y, and the bias.
let inputs = [50, -12, 1];
// The answer!
let guess = perceptron.feedForward(inputs);</pre>
<p>Did the perceptron get it right? Maybe yes, maybe no. At this point, the perceptron has no better than a 50/50 chance of arriving at the correct answer, since each weight starts out as a random value. A neural network isn't a magic tool that can automatically guess correctly on its own. I need to teach it how to do so!</p>
<p>To train a neural network to answer correctly, I'll use the supervised learning method I described earlier in the chapter. Remember, this technique involves giving the network inputs with known answers. This enables the network to check whether it has made a correct guess. If not, the network can learn from its mistake and adjust its weights. The process is as follows:</p>
<ol>
<li>Provide the perceptron with inputs for which there is a known answer.</li>
<li>Ask the perceptron to guess an answer.</li>
<li>Compute the error. (Did it get the answer right or wrong?)</li>
<li>Adjust all the weights according to the error.</li>
<li>Return to step 1 and repeat!</li>
</ol>
<p>This process can be packaged into a method on the <code>Perceptron</code> class, but before I can write it, I need to examine steps 3 and 4 in more detail. How do I define the perceptron's error? And how should I adjust the weights according to this error?</p>
<p>The perceptron's error can be defined as the difference between the desired answer and its guess:</p>
<div data-type="equation">\text{error} = \text{desired output} - \text{guess output}</div>
<p>Does this formula look familiar? Think back to the formula for a vehicle's steering force that I worked out in <a href="/autonomous-agents#">Chapter 5</a>:</p>
<div data-type="equation">\text{steering} = \text{desired velocity} - \text{current velocity}</div>
<p>This is also a calculation of an error! The current velocity serves as a guess, and the error (the steering force) indicates how to adjust the velocity in the correct direction. Adjusting a vehicle's velocity to follow a target is similar to adjusting the weights of a neural network toward the correct answer.</p>
<p>For the perceptron, the output has only two possible values: +1 or -1. Therefore, only three errors are possible. If the perceptron guesses the correct answer, the guess equals the desired output and the error is 0. If the correct answer is -1 and the perceptron guessed +1, then the error is -2. If the correct answer is +1 and the perceptron guessed -1, then the error is +2. Here's that process summarized in a table:</p>
<table>
<thead>
<tr>
<th style="width:100px">Desired</th>
<th style="width:100px">Guess</th>
<th>Error</th>
</tr>
</thead>
<tbody>
<tr>
<td>-1</td>
<td>-1</td>
<td>0</td>
</tr>
<tr>
<td>-1</td>
<td>+1</td>
<td>-2</td>
</tr>
<tr>
<td>+1</td>
<td>-1</td>
<td>+2</td>
</tr>
<tr>
<td>+1</td>
<td>+1</td>
<td>0</td>
</tr>
</tbody>
</table>
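<p>The four combinations of desired output and guess can also be enumerated in code, applying the error formula to each pair. This is only a sketch, and the names <code>outputs</code> and <code>rows</code> are illustrative.</p>

```javascript
// The two values a perceptron can output
const outputs = [-1, 1];

// Build one row per (desired, guess) pair, with error = desired - guess.
const rows = [];
for (const desired of outputs) {
  for (const guess of outputs) {
    rows.push({ desired, guess, error: desired - guess });
  }
}

// Only three distinct errors ever appear: 0, -2, and 2.
console.log(rows);
```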
<p>The error is the determining factor in how the perceptron's weights should be adjusted. For any given weight, what I'm looking to calculate is the change in weight, often called <span data-type="equation">\Delta\text{weight}</span> (or <em>delta weight</em>, <span data-type="equation">\Delta</span> being the Greek letter delta):</p>
<div data-type="equation">\text{new weight} = \text{weight} + \Delta\text{weight}</div>
<p>To calculate <span data-type="equation">\Delta\text{weight}</span>, I need to multiply the error by the input:</p>
<div data-type="equation">\Delta\text{weight} = \text{error} \times \text{input}</div>
<p>Therefore, the new weight is calculated as follows:</p>
<div data-type="equation">\text{new weight} = \text{weight} + \text{error} \times \text{input}</div>
<p>To understand why this works, think again about steering. A steering force is essentially an error in velocity. By applying a steering force as an acceleration (or <span data-type="equation">\Delta\text{velocity}</span>), the velocity is adjusted to move in the correct direction. This is what I want to do with the neural network’s weights. I want to adjust them in the right direction, as defined by the error.</p>
<p>With steering, however, I had an additional variable that controlled the vehicle’s ability to steer: the maximum force. A high maximum force allowed the vehicle to accelerate and turn quickly, while a lower force resulted in a slower velocity adjustment. The neural network will use a similar strategy with a variable called the <strong>learning constant</strong>:</p>
<div data-type="equation">\text{new weight} = \text{weight} + (\text{error} \times \text{input}) \times \text{learning constant}</div>
<p>A high learning constant causes the weight to change more drastically. This may help the perceptron arrive at a solution more quickly, but it also increases the risk of overshooting the optimal weights. A small learning constant will adjust the weights more slowly and require more training time, but will allow the network to make small adjustments that could improve overall accuracy.</p>
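<p>To make the formula concrete, here’s a small worked sketch of a single weight update in plain JavaScript. The starting weight, the input, and the two learning constants are values I’ve made up for illustration, and <code>updatedWeight()</code> is a hypothetical helper, not part of the <code>Perceptron</code> class.</p>

```javascript
// One update of a single weight, following
// new weight = weight + (error * input) * learning constant.
// updatedWeight() is a hypothetical helper used only for this illustration.
function updatedWeight(weight, error, input, learningConstant) {
  return weight + error * input * learningConstant;
}

// Made-up scenario: the correct answer was +1, but the perceptron guessed -1.
const desired = 1;
const guess = -1;
const error = desired - guess; // +2

// Made-up starting weight and input value.
const weight = 0.5;
const input = 10;

// The same error and input with two different learning constants.
const smallStep = updatedWeight(weight, error, input, 0.01); // 0.5 + 2 * 10 * 0.01
const largeStep = updatedWeight(weight, error, input, 0.1); // 0.5 + 2 * 10 * 0.1

console.log(smallStep, largeStep);
```

<p>With the smaller constant, the weight moves from 0.5 to roughly 0.7. With the larger one, it jumps to roughly 2.5, a much more drastic change from the very same error.</p>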
<p>Assuming the addition of a <code>learningConstant</code> property to the <code>Perceptron</code> class, I can now write a training method for the perceptron following the steps I outlined earlier:</p>
<pre class="codesplit" data-code-language="javascript"> // Step 1: Provide the inputs and known answer.
// These are passed in as arguments to <code>train()</code>.
train(inputs, desired) {
// Step 2: Guess according to those inputs.
let guess = this.feedforward(inputs);
// Step 3: Compute the error (the difference between <code>desired</code> and <code>guess</code>).
let error = desired - guess;
//{!3} Step 4: Adjust all the weights according to the error and learning constant.
for (let i = 0; i &#x3C; this.weights.length; i++) {
this.weights[i] = this.weights[i] + error * inputs[i] * this.learningConstant;
}
}</pre>
<p>Here’s the <code>Perceptron</code> class as a whole:</p>
<pre class="codesplit" data-code-language="javascript">class Perceptron {
constructor(totalInputs, learningConstant = 0.01) {
//{!2} The perceptron stores its weights and learning constant.
this.weights = [];
this.learningConstant = learningConstant;
//{!3} The weights start off random.
for (let i = 0; i &#x3C; totalInputs; i++) {
this.weights[i] = random(-1, 1);
}
}
//{!7} Return an output based on inputs.
feedforward(inputs) {
let sum = 0;
for (let i = 0; i &#x3C; this.weights.length; i++) {
sum += inputs[i] * this.weights[i];
}
return this.activate(sum);
}
// The output is a +1 or -1.
activate(sum) {
if (sum > 0) {
return 1;
} else {
return -1;
}
}
//{!7} Train the network against known data.
train(inputs, desired) {
let guess = this.feedforward(inputs);
let error = desired - guess;
for (let i = 0; i &#x3C; this.weights.length; i++) {
this.weights[i] = this.weights[i] + error * inputs[i] * this.learningConstant;
}
}
}</pre>
<p>To train the perceptron, I need a set of inputs with known answers. However, I don’t happen to have a real-world dataset (or time to research and collect one) for the xerophytes and hydrophytes scenario. In truth, though, the purpose of this demonstration isn’t to show you how to classify plants. It’s about how a perceptron can learn whether points are above or below a line on a graph, and so any set of points will do. In other words, I can just make up the data.</p>
<p>What I’m describing is an example of <strong>synthetic data</strong>, artificially generated data that’s often used in machine learning to create controlled scenarios for training and testing. In this case, my synthetic data will consist of a set of random input points, each with a known answer indicating whether the point is above or below a line. To define the line and generate the data, I’ll use simple algebra. This approach allows me to clearly demonstrate the training process and show how the perceptron learns.</p>
<p>The question therefore becomes, how do I pick a point and know whether it’s above or below a line (without a neural network, that is)? A line can be described as a collection of points, where each point’s y-coordinate is a function of its x-coordinate:</p>
<div data-type="equation">y = f(x)</div>
<p>For a straight line (specifically, a linear function), the relationship can be written like this:</p>
<div data-type="equation">y = mx + b</div>
<p>Here <em>m</em> is the slope of the line, and <em>b</em> is the value of <em>y</em> when <em>x</em> is 0 (the y-intercept). Here’s a specific example, with the corresponding graph in Figure 10.8.</p>
<div data-type="equation">y = \frac{1}2x - 1</div>
<figure>
<img src="images/10_nn/10_nn_9.png" alt="Figure 10.8: A graph of y = \frac{1}2x - 1">
<figcaption>Figure 10.8: A graph of <span data-type="equation">y = \frac{1}2x - 1</span></figcaption>
</figure>
<p>I’ll arbitrarily choose that as the equation for my line, and write a function accordingly:</p>
<pre class="codesplit" data-code-language="javascript">// A function to calculate <code>y</code> based on <code>x</code> along a line
function f(x) {
return 0.5 * x - 1;
}</pre>
<p>Now there’s the matter of the p5.js canvas defaulting to (0, 0) in the top-left corner with the y-axis pointing down. For this discussion, I’ll assume I’ve built the following into the code to reorient the canvas to match a more traditional Cartesian space:</p>
<pre class="codesplit" data-code-language="javascript">// Move the origin <code>(0, 0)</code> to the center.
translate(width / 2, height / 2);
// Flip the y-axis orientation (positive points up!).
scale(1, -1);</pre>
<p>I can now pick a random point in the 2D space:</p>
<pre class="codesplit" data-code-language="javascript">let x = random(-100, 100);
let y = random(-100, 100);</pre>
<p>How do I know if this point is above or below the line? The line function <em>f</em>(<em>x</em>) returns the <em>y</em> value on the line for that x-position. I’ll call that <span data-type="equation">y_\text{line}</span>:</p>
<pre class="codesplit" data-code-language="javascript">// The <code>y</code> position on the line
let yline = f(x);</pre>
<p>If the <em>y</em> value I’m examining is above the line, it will be greater than <span data-type="equation">y_\text{line}</span>, as in Figure 10.9.</p>
<figure>
<img src="images/10_nn/10_nn_10.png" alt="Figure 10.9: If y_\text{line} is less than y, the point is above the line.">
<figcaption>Figure 10.9: If <span data-type="equation">y_\text{line}</span> is less than <em>y</em>, the point is above the line.</figcaption>
</figure>
<p>Here’s the code for that logic:</p>
<pre class="codesplit" data-code-language="javascript">// Start with a value of -1.
let desired = -1;
if (y > yline) {
//{!1} The answer becomes +1 if <code>y</code> is above the line.
desired = 1;
}</pre>
<p>I can then make an input array to go with the <code>desired</code> output:</p>
<pre class="codesplit" data-code-language="javascript">// Don't forget to include the bias!
let trainingInputs = [x, y, 1];</pre>
<p>Assuming that I have a <code>perceptron</code> variable, I can train it by providing the inputs along with the desired answer:</p>
<pre class="codesplit" data-code-language="javascript">perceptron.train(trainingInputs, desired);</pre>
<p>If I train the perceptron on a new random point (and its answer) for each cycle through <code>draw()</code>, it will gradually get better at classifying the points as above or below the line.</p>
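<p>Before moving to the animated version, here’s a rough way to check that claim without a canvas. This is a sketch under my own assumptions rather than the book’s example: it runs outside p5.js, so <code>randomBetween()</code> stands in for <code>random()</code>, and <code>makePoint()</code> and <code>accuracy()</code> are hypothetical helpers. The learning constant of 0.01 and the counts of 500 test points and 20,000 training steps are arbitrary choices.</p>

```javascript
// Stand-in for p5.js random(min, max) so this sketch runs outside p5.js.
function randomBetween(min, max) {
  return min + Math.random() * (max - min);
}

// A compact version of the perceptron described in this section.
class Perceptron {
  constructor(totalInputs, learningConstant) {
    this.learningConstant = learningConstant;
    this.weights = [];
    for (let i = 0; i < totalInputs; i++) {
      this.weights[i] = randomBetween(-1, 1);
    }
  }

  // Weighted sum followed by the sign activation (+1 or -1).
  feedforward(inputs) {
    let sum = 0;
    for (let i = 0; i < this.weights.length; i++) {
      sum += inputs[i] * this.weights[i];
    }
    return sum > 0 ? 1 : -1;
  }

  // Nudge every weight according to the error and the learning constant.
  train(inputs, desired) {
    const error = desired - this.feedforward(inputs);
    for (let i = 0; i < this.weights.length; i++) {
      this.weights[i] += error * inputs[i] * this.learningConstant;
    }
  }
}

// The line from the text: y = 0.5x - 1.
function f(x) {
  return 0.5 * x - 1;
}

// Hypothetical helper: one random point (plus bias) with its known answer.
function makePoint() {
  const x = randomBetween(-100, 100);
  const y = randomBetween(-100, 100);
  return { inputs: [x, y, 1], desired: y > f(x) ? 1 : -1 };
}

// Hypothetical helper: the fraction of points classified correctly.
function accuracy(perceptron, points) {
  let correct = 0;
  for (const point of points) {
    if (perceptron.feedforward(point.inputs) === point.desired) correct++;
  }
  return correct / points.length;
}

const testPoints = Array.from({ length: 500 }, makePoint);
const perceptron = new Perceptron(3, 0.01);
const before = accuracy(perceptron, testPoints);

// Train on a new random point each step, as draw() would each frame.
for (let i = 0; i < 20000; i++) {
  const point = makePoint();
  perceptron.train(point.inputs, point.desired);
}

const after = accuracy(perceptron, testPoints);
console.log(`accuracy before: ${before}, after: ${after}`);
```

<p>Because the weights start out random, the numbers differ from run to run, but the accuracy after training should typically land well above where it started.</p>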
<div data-type="example">
<h3 id="example-101-the-perceptron">Example 10.1: The Perceptron</h3>
<figure>
<div data-type="embed" data-p5-editor="https://editor.p5js.org/natureofcode/sketches/sMozIaMCW" data-example-path="examples/10_nn/10_1_perceptron_with_normalization"><img src="examples/10_nn/10_1_perceptron_with_normalization/screenshot.png"></div>
<figcaption></figcaption>
</figure>
</div>
<pre class="codesplit" data-code-language="javascript">// The perceptron
let perceptron;
//{!1} An array for training data
let training = [];
// A counter to track training data points one by one
let count = 0;
//{!3} The formula for a line
function f(x) {
return 0.5 * x - 1;
}
function setup() {
createCanvas(640, 240);
// The perceptron has three inputs (including bias) and a learning constant of 0.0001.
perceptron = new Perceptron(3, 0.0001);
//{!1} Make 2,000 training data points.
for (let i = 0; i &#x3C; 2000; i++) {
let x = random(-width / 2, width / 2);
let y = random(-height / 2, height / 2);
training[i] = [x, y, 1];
}
}
function draw() {
background(255);
// Reorient the canvas to match a traditional Cartesian plane.
translate(width / 2, height / 2);
scale(1, -1);
// Draw the line.
stroke(0);
strokeWeight(2);
line(-width / 2, f(-width / 2), width / 2, f(width / 2));
// Get the current <code>(x, y)</code> of the training data.
let x = training[count][0];
let y = training[count][1];
// What is the desired output?
let desired = -1;
if (y > f(x)) {
desired = 1;
}
// Train the perceptron.
perceptron.train(training[count], desired);
// For animation, train one point at a time.
count = (count + 1) % training.length;
// Draw all the points and color according to the output of the perceptron.
for (let dataPoint of training) {
let guess = perceptron.feedforward(dataPoint);
if (guess > 0) {
fill(127);
} else {
fill(255);
}
strokeWeight(1);
stroke(0);
circle(dataPoint[0], dataPoint[1], 8);
}
}</pre>
<p>In Example 10.1, the training data is visualized alongside the target solution line. Each point represents a piece of training data, and its color is determined by the perceptron’s current classification: gray for +1 or white for −1. I use a small learning constant (0.0001) to slow down how the system refines its classifications over time.</p>
<p>An intriguing aspect of this example lies in the relationship between the perceptron’s weights and the characteristics of the line dividing the points, specifically the line’s slope and y-intercept (the <em>m</em> and <em>b</em> in <em>y</em> = <em>mx</em> + <em>b</em>). The weights in this context aren’t just arbitrary or “magic” values; they bear a direct relationship to the geometry of the dataset. In this case, I’m using just 2D data, but for many machine learning applications, the data exists in much higher-dimensional spaces. The weights of a neural network help navigate these spaces, defining <em>hyperplanes</em> or decision boundaries that segment and classify the data.</p>
<div data-type="exercise">
<h3 id="exercise-101">Exercise 10.1</h3>
<p>Modify the code from Example 10.1 to also draw the perceptron’s current decision boundary during the training process, its best guess for where the line should be. Hint: Use the perceptron’s current weights to calculate the line’s equation.</p>
</div>
<p>While this perceptron example offers a conceptual foundation, real-world datasets often feature more diverse and dynamic ranges of input values. For the simplified scenario here, the range of values for <em>x</em> is larger than that for <em>y</em> because of the canvas size of 640<span data-type="equation">\times</span>240. Despite this, the example still works. After all, the sign activation function doesn’t rely on specific input ranges, and it’s such a straightforward binary classification task.</p>
<p>However, real-world data often has much greater complexity in terms of input ranges. To this end, <strong>data normalization</strong> is a critical step in machine learning. Normalizing data involves mapping the training data to ensure that all inputs (and outputs) conform to a uniform range, typically 0 to 1, or perhaps −1 to 1. This process can improve training efficiency and prevent individual inputs from dominating the learning process. In the next section, using the ml5.js library, I’ll build data normalization into the process.</p>
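<p>As a minimal sketch of what that mapping looks like, here’s a plain JavaScript helper that rescales a value from its original range into a new one, the same idea as the p5.js <code>map()</code> function. The name <code>normalize()</code> and the sample point are my own assumptions for illustration. The ranges come from the 640<span data-type="equation">\times</span>240 canvas centered on the origin.</p>

```javascript
// Rescale a value from the range [min, max] to [newMin, newMax].
// normalize() is a hypothetical helper; it defaults to the range 0 to 1.
function normalize(value, min, max, newMin = 0, newMax = 1) {
  return newMin + ((value - min) / (max - min)) * (newMax - newMin);
}

// A made-up point on the 640x240 canvas with the origin at the center.
const x = 160; // x ranges from -320 to 320
const y = -60; // y ranges from -120 to 120

// Map both coordinates to the same range of -1 to 1.
const xNorm = normalize(x, -320, 320, -1, 1);
const yNorm = normalize(y, -120, 120, -1, 1);

console.log(xNorm, yNorm); // 0.5 -0.5
```

<p>After normalization, both coordinates live in the same range, so neither one dominates the weighted sum simply because the canvas happens to be wider than it is tall.</p>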
<div data-type="exercise">
<h3 id="exercise-102">Exercise 10.2</h3>
<p>Instead of using supervised learning, can you train the neural network to find the right weights by using a GA?</p>
</div>
<div data-type="exercise">
<h3 id="exercise-103">Exercise 10.3</h3>
<p>Incorporate data normalization into the example. Does this improve the learning efficiency?</p>
</div>
<h2 id="putting-the-network-in-neural-network">Putting the “Network” in Neural Network</h2>
<p>A perceptron can have multiple inputs, but it’s still just a single, lonely neuron. Unfortunately, that limits the range of problems it can solve. The true power of neural networks comes from the <em>network</em> part. Link multiple neurons together and you’re able to solve problems of much greater complexity.</p>
<p>If you read an AI textbook, it will say that a perceptron can solve only <strong>linearly separable</strong> problems. If a dataset is linearly separable, you can graph it and classify it into two groups simply by drawing a straight line (see Figure 10.10, left). Classifying plants as xerophytes or hydrophytes is a linearly separable problem.</p>
<figure>
<img src="images/10_nn/10_nn_11.png" alt="Figure 10.10: Data points that are linearly separable (left) and data points that are nonlinearly separable, as a curve is required to separate the points (right)">
<figcaption>Figure 10.10: Data points that are linearly separable (left) and data points that are nonlinearly separable, as a curve is required to separate the points (right)</figcaption>
</figure>
<p>Now imagine you’re classifying plants according to soil acidity (x-axis) and temperature (y-axis). Some plants might thrive in acidic soils but only within a narrow temperature range, while other plants prefer less acidic soils but tolerate a broader range of temperatures. A more complex relationship exists between the two variables, so a straight line can’t be drawn to separate the two categories of plants, <em>acidophilic</em> and <em>alkaliphilic</em> (see Figure 10.10, right). A lone perceptron can’t handle this type of <strong>nonlinearly separable</strong> problem. (Caveat here: I’m making up these scenarios. If you happen to be a botanist, please let me know if I’m anywhere close to reality.)</p>
<p>One of the simplest examples of a nonlinearly separable problem is XOR (exclusive or). This is a logical operator, similar to the more familiar AND and OR. For <em>A</em> AND <em>B</em> to be true, both <em>A</em> and <em>B</em> must be true. With OR, either <em>A</em> or <em>B</em> (or both) can be true. These are both linearly separable problems. The truth tables in Figure 10.11 show their solution space. Each true or false value in the table shows the output for a particular combination of true or false inputs.</p>
<figure>
<img src="images/10_nn/10_nn_12.png" alt="Figure 10.11: Truth tables for the AND and OR logical operators. The true and false outputs can be separated by a line.">
<figcaption>Figure 10.11: Truth tables for the AND and OR logical operators. The true and false outputs can be separated by a line.</figcaption>
</figure>
<p>See how you can draw a straight line to separate the true outputs from the false ones?</p>
<p>The XOR operator is the equivalent of (OR) AND (NOT AND). In other words, <em>A</em> XOR <em>B</em> evaluates to true only if exactly one of the inputs is true. If both inputs are false or both are true, the output is false. To illustrate, let’s say you’re having pizza for dinner. You love pineapple on pizza, and you love mushrooms on pizza, but put them together, and yech! And plain pizza, that’s no good either!</p>
<figure>
<img src="images/10_nn/10_nn_13.png" alt="Figure 10.12: The “truth” table for whether you want to eat the pizza (left) and XOR (right). Note how the true and false outputs can’t be separated by a single line.">
<figcaption>Figure 10.12: The “truth” table for whether you want to eat the pizza (left) and XOR (right). Note how the true and false outputs can’t be separated by a single line.</figcaption>
</figure>
<p>The XOR truth table in Figure 10.12 isn’t linearly separable. Try to draw a straight line to separate the true outputs from the false ones. You can’t!</p>
<p>The fact that a perceptron can’t even solve something as simple as XOR may seem extremely limiting. But what if I made a network out of two perceptrons? If one perceptron can solve the linearly separable OR and one perceptron can solve the linearly separable NOT AND, then the two combined in a network can solve the nonlinearly separable XOR.</p>
<p>When you combine multiple perceptrons, you get a <strong>multilayered perceptron</strong>, a network of many neurons (see Figure 10.13). Some are input neurons and receive the initial inputs, some are part of what’s called a <strong>hidden layer</strong> (as they’re connected to neither the inputs nor the outputs of the network directly), and then there are the output neurons, from which the results are read.</p>
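<p>Here’s a small sketch of that idea in plain JavaScript. To be clear about my assumptions: nothing here is trained, the weights are hand-picked by me, and the units output 1 or 0 (true or false) rather than the +1 or −1 of the earlier perceptron. Following the (OR) AND (NOT AND) recipe, the outputs of the OR and NOT AND units are passed through one more unit that performs the final AND.</p>

```javascript
// A perceptron-style unit with fixed weights: a weighted sum of two inputs
// plus a bias input of 1, followed by a step activation (1 or 0).
function unit(weights, a, b) {
  const sum = a * weights[0] + b * weights[1] + 1 * weights[2];
  return sum > 0 ? 1 : 0;
}

// Hand-picked weights [weightA, weightB, biasWeight]. These are assumptions
// chosen for illustration, not values learned through training.
const OR_WEIGHTS = [1, 1, -0.5]; // on when at least one input is 1
const NAND_WEIGHTS = [-1, -1, 1.5]; // on unless both inputs are 1
const AND_WEIGHTS = [1, 1, -1.5]; // on only when both inputs are 1

// XOR = (A OR B) AND (NOT (A AND B))
function xor(a, b) {
  const orOutput = unit(OR_WEIGHTS, a, b);
  const nandOutput = unit(NAND_WEIGHTS, a, b);
  return unit(AND_WEIGHTS, orOutput, nandOutput);
}

// Print the full truth table.
for (const a of [0, 1]) {
  for (const b of [0, 1]) {
    console.log(`${a} XOR ${b} = ${xor(a, b)}`);
  }
}
```

<p>Each individual unit draws just one straight line through its inputs. XOR only becomes possible when the outputs of the first two units are fed into another unit.</p>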
<figure>
<img src="images/10_nn/10_nn_14.png" alt="Figure 10.13: A multilayered perceptron has the same inputs and output as the simple perceptron, but now it includes a hidden layer of neurons.">
<figcaption>Figure 10.13: A multilayered perceptron has the same inputs and output as the simple perceptron, but now it includes a hidden layer of neurons.</figcaption>
</figure>
<p>Up until now, I’ve been visualizing a singular perceptron with one circle representing a neuron processing its input signals. Now, as I move on to larger networks, it’s more typical to represent all the elements (inputs, neurons, outputs) as circles, with arrows that indicate the flow of data. In Figure 10.13, you can see the inputs and bias flowing into the hidden layer, which then flows to the output.</p>
<p>Training a simple perceptron is pretty straightforward: you feed the data through and evaluate how to change the input weights according to the error. With a multilayered perceptron, however, the training process becomes more complex. The overall output of the network is still generated in essentially the same manner as before: the inputs multiplied by the weights are summed and fed forward through the various layers of the network. And you still use the network’s guess to calculate the error (desired result − guess). But now so many connections exist between layers of the network, each with its own weight. How do you know how much each neuron or connection contributed to the overall error of the network, and how it should be adjusted?</p>
<p>The solution to optimizing the weights of a multilayered network is <strong>backpropagation</strong>. This process takes the error and feeds it backward through the network so it can adjust the weights of all the connections in proportion to how much they’ve contributed to the total error. The details of backpropagation are beyond the scope of this book. The algorithm uses a variety of activation functions (one classic example is the sigmoid function) as well as some calculus. If you’re interested in continuing down this road and learning more about how backpropagation works, you can find my <a href="https://thecodingtrain.com/neural-network">“Toy Neural Network” project at the Coding Train website with accompanying video tutorials</a>. They go through all the steps of solving XOR using a multilayered feed-forward network with backpropagation. For this chapter, however, I’d instead like to get some help and phone a friend.</p>
<h2 id="machine-learning-with-ml5js">Machine Learning with ml5.js</h2>
<p>That friend is ml5.js. This machine learning library can manage the details of complex processes like backpropagation so you and I don’t have to worry about them. As I mentioned earlier in the chapter, ml5.js aims to provide a friendly entry point for those who are new to machine learning and neural networks, while still harnessing the power of Google’s TensorFlow.js behind the scenes.</p>
<p>To use ml5.js in a sketch, you must import it via a <code>&#x3C;script></code> element in your <em>index.html</em> file, much as you did with Matter.js and Toxiclibs.js in <a href="/physics-libraries#">Chapter 6</a>:</p>
<pre class="codesplit" data-code-language="html">&#x3C;script src="https://unpkg.com/ml5@latest/dist/ml5.min.js">&#x3C;/script></pre>
<p>My goal for the rest of this chapter is to introduce ml5.js by developing a system that can recognize mouse gestures. This will prepare you for <a href="/neuroevolution#">Chapter 11</a>, where I’ll add a neural network “brain” to an autonomous steering agent and tie machine learning back into the story of the book. First, however, I’d like to talk more generally through the steps of training a multilayered neural network model using supervised learning. Outlining these steps will highlight important decisions you’ll have to make before developing a learning model, introduce the syntax of the ml5.js library, and provide you with the context you’ll need before training your own machine learning models.</p>
<h3 id="the-machine-learning-life-cycle">The Machine Learning Life Cycle</h3>
<p>The life cycle of a machine learning model is typically broken into seven steps:</p>
<ol>
<li><strong>Collect the data.</strong> Data forms the foundation of any machine learning task. This stage might involve running experiments, manually inputting values, sourcing public data, or a myriad of other methods (like generating synthetic data).</li>
<li><strong>Prepare the data.</strong> Raw data often isn’t in a format suitable for machine learning algorithms. It might also have duplicate or missing values, or contain outliers that skew the data. Such inconsistencies may need to be manually adjusted. Additionally, as I mentioned earlier, neural networks work best with normalized data, which has values scaled to fit within a standard range. Another key part of preparing data is separating it into distinct sets: training, validation, and testing. The training data is used to teach the model (step 4), while the validation and testing data (the distinction is subtle; more on this later) are set aside and reserved for evaluating the model’s performance (step 5).</li>
<li><strong>Choose a model.</strong> Design the architecture of the neural network. Different models are more suitable for certain types of data and outputs.</li>
<li><strong>Train the model.</strong> Feed the training portion of the data through the model and allow the model to adjust the weights of the neural network based on its errors. This process is known as <strong>optimization</strong>: the model tunes the weights so they result in the fewest number of errors.</li>
<li><strong>Evaluate the model.</strong> Remember the testing data that was set aside in step 2? Since that data wasn’t used in training, it provides a means to evaluate how well the model performs on new, unseen data.</li>
<li><strong>Tune the parameters.</strong> The training process is influenced by a set of parameters (often called <strong>hyperparameters</strong>) such as the learning rate, which dictates how much the model should adjust its weights based on errors in prediction. I called this the <code>learningConstant</code> in the perceptron example. By fine-tuning these parameters and revisiting steps 4 (training), 3 (model selection), and even 2 (data preparation), you can often improve the model’s performance.</li>
<li><strong>Deploy the model.</strong> Once the model is trained and its performance is evaluated satisfactorily, it’s time to use the model out in the real world with new data!</li>
</ol>
<p>These steps are the cornerstone of supervised machine learning. However, even though 7 is a truly excellent number, I think I missed one more critical step. I’ll call it step 0.</p>
<ol>
<li value="0"><strong>Identify the problem.</strong> This initial step defines the problem that needs solving. What is the objective? What are you trying to accomplish or predict with your machine learning model?</li>
</ol>
<p>This zeroth step informs all the other steps in the process. After all, how are you supposed to collect your data and choose a model without knowing what you’re even trying to do? Are you predicting a number? A category? A sequence? Is it a binary choice, or are there many options? These sorts of questions often boil down to choosing between two types of tasks that the majority of machine learning applications fall into: classification and regression.</p>
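<p>To make one piece of the life cycle more tangible, here’s a sketch of the data-splitting part of step 2 in plain JavaScript. The 70/15/15 proportions, the placeholder records, and the helper names <code>shuffled()</code> and <code>splitData()</code> are all assumptions of mine. Real projects choose their own proportions, and libraries often handle this step for you.</p>

```javascript
// Return a shuffled copy of an array (Fisher-Yates shuffle), so the
// three sets aren't biased by the order in which data was collected.
function shuffled(items) {
  const copy = items.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Hypothetical helper: split records into training, validation, and testing
// sets. The default fractions (70% / 15% / remainder) are arbitrary.
function splitData(records, trainFraction = 0.7, validationFraction = 0.15) {
  const data = shuffled(records);
  const trainEnd = Math.round(data.length * trainFraction);
  const validationEnd = trainEnd + Math.round(data.length * validationFraction);
  return {
    training: data.slice(0, trainEnd),
    validation: data.slice(trainEnd, validationEnd),
    testing: data.slice(validationEnd),
  };
}

// 100 placeholder records standing in for a real dataset.
const records = Array.from({ length: 100 }, (_, i) => ({ id: i }));
const { training, validation, testing } = splitData(records);

console.log(training.length, validation.length, testing.length); // 70 15 15
```

<p>The important property is that the three sets don’t overlap: the model never sees the validation or testing records during training.</p>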
<h3 id="classification-and-regression">Classification and Regression</h3>
<p><strong>Classification</strong> is a type of machine learning problem that involves predicting a <strong>label</strong> (also called a <strong>category</strong> or <strong>class</strong>) for a piece of data. If this sounds familiar, that’s because it is: the simple perceptron in Example 10.1 was trained to classify points as above or below a line. To give another example, an image classifier might try to guess if a photo is of a cat or a dog and assign the corresponding label (see Figure 10.14).</p>
<figure>
<img src="images/10_nn/10_nn_15.png" alt="Figure 10.14: Labeling images as cats or dogs">
<figcaption>Figure 10.14: Labeling images as cats or dogs</figcaption>
</figure>
<p>Classification doesn’t happen by magic. The model must first be shown many examples of dogs and cats with the correct labels in order to properly configure the weights of all the connections. This is the training part of supervised learning.</p>
<p>The classic “Hello, world!” demonstration of machine learning and supervised learning is a classification problem of the MNIST dataset. Short for <em>Modified National Institute of Standards and Technology</em>, <strong>MNIST</strong> is a dataset that was collected and processed by Yann LeCun (Courant Institute, NYU), Corinna Cortes (Google Labs), and Christopher J.C. Burges (Microsoft Research). Widely used for training and testing in the field of machine learning, this dataset consists of 70,000 handwritten digits from 0 to 9; each is a 28<span data-type="equation">\times</span>28-pixel grayscale image (see Figure 10.15 for examples). Each image is labeled with its corresponding digit.</p>
<figure>
<img src="images/10_nn/10_nn_16.png" alt="Figure 10.15: A selection of handwritten digits 0–9 from the MNIST dataset (courtesy of Suvanjanprasai)">
<figcaption>Figure 10.15: A selection of handwritten digits 0–9 from the MNIST dataset (courtesy of Suvanjanprasai)</figcaption>
</figure>
<p>MNIST is a canonical example of a training dataset for image classification: the model has a discrete number of categories to choose from (10 to be exact—no more, no less). After the model is trained on the 70,000 labeled images, the goal is for it to classify new images and assign the appropriate label, a digit from 0 to 9.</p>
<p><strong>Regression</strong>, on the other hand, is a machine learning task for which the prediction is a continuous value, typically a floating-point number. A regression problem can involve multiple outputs, but thinking about just one is often simpler to start. For example, consider a machine learning model that predicts the daily electricity usage of a house based on input factors like the number of occupants, the size of the house, and the temperature outside (see Figure 10.16).</p>
<figure>
<img src="images/10_nn/10_nn_17.png" alt="Figure 10.16: Factors like weather and the size and occupancy of a home can influence its daily electricity usage.">
<figcaption>Figure 10.16: Factors like weather and the size and occupancy of a home can influence its daily electricity usage.</figcaption>
</figure>
<p>Rather than picking from a discrete set of output options, the goal of the neural network is now to guess a number—any number. Will the house use 30.5 kilowatt-hours of electricity that day? Or 48.7 kWh? Or 100.2 kWh? The output prediction could be any value from a continuous range.</p>
<h3 id="network-design">Network Design</h3>
<p>Knowing what problem you’re trying to solve (step 0) also has a significant bearing on the design of the neural network, in particular on its input and output layers. I’ll demonstrate with another classic “Hello, world!” classification example from the field of data science and machine learning: the iris dataset. This dataset, which can be found in the Machine Learning Repository at the University of California, Irvine, originated from the work of American botanist Edgar Anderson.</p>
<p>Anderson collected flower data over many years across multiple regions of the United States and Canada. For more on the origins of this famous dataset, see <a href="https://academic.oup.com/jrssig/article/18/6/26/7038520">“The Iris Data Set: In Search of the Source of <em>Virginica</em>”</a> by Antony Unwin and Kim Kleinman. After carefully analyzing the data, Anderson built a table to classify iris flowers into three distinct species: <em>Iris setosa</em>, <em>Iris virginica</em>, and <em>Iris versicolor</em> (see Figure 10.17).</p>
<figure>
<img src="images/10_nn/10_nn_18.png" alt="Figure 10.17: Three distinct species of iris flowers">
<figcaption>Figure 10.17: Three distinct species of iris flowers</figcaption>
</figure>
<p>Anderson included four numeric attributes for each flower: sepal length, sepal width, petal length, and petal width, all measured in centimeters. (He also recorded color information, but that data appears to have been lost.) Each record is then paired with the appropriate iris categorization:</p>
<table>
<thead>
<tr>
<th>Sepal Length</th>
<th>Sepal Width</th>
<th>Petal Length</th>
<th>Petal Width</th>
<th>Classification</th>
</tr>
</thead>
<tbody>
<tr>
<td>5.1</td>
<td>3.5</td>
<td>1.4</td>
<td>0.2</td>
<td><em>Iris setosa</em></td>
</tr>
<tr>
<td>4.9</td>
<td>3.0</td>
<td>1.4</td>
<td>0.2</td>
<td><em>Iris setosa</em></td>
</tr>
<tr>
<td>7.0</td>
<td>3.2</td>
<td>4.7</td>
<td>1.4</td>
<td><em>Iris versicolor</em></td>
</tr>
<tr>
<td>6.4</td>
<td>3.2</td>
<td>4.5</td>
<td>1.5</td>
<td><em>Iris versicolor</em></td>
</tr>
<tr>
<td>6.3</td>
<td>3.3</td>
<td>6.0</td>
<td>2.5</td>
<td><em>Iris virginica</em></td>
</tr>
<tr>
<td>5.8</td>
<td>2.7</td>
<td>5.1</td>
<td>1.9</td>
<td><em>Iris virginica</em></td>
</tr>
</tbody>
</table>
<p>In this dataset, the first four columns (sepal length, sepal width, petal length, petal width) serve as inputs to the neural network. The output is the classification provided in the fifth column. Figure 10.18 depicts a possible architecture for a neural network that can be trained on this data.</p>
<figure>
<img src="images/10_nn/10_nn_19.png" alt="Figure 10.18: A possible network architecture for iris classification">
<figcaption>Figure 10.18: A possible network architecture for iris classification</figcaption>
</figure>
<p>On the left are the four inputs to the network, corresponding to the first four columns of the data table. On the right are three possible outputs, each representing one of the iris species labels. In between is the hidden layer, which, as mentioned earlier, adds complexity to the networks architecture, necessary for handling nonlinearly separable data. Each node in the hidden layer is connected to every node that comes before and after it. This is commonly called a <strong>fully connected</strong> or <strong>dense </strong>layer.</p>
<p>You might also notice the absence of explicit bias nodes in this diagram. While biases play an important role in the output of each neuron, theyre often left out of visual representations to keep the diagrams clean and focused on the primary data flow. (The ml5.js library will ultimately manage the biases for me internally.)</p>
<p>The neural networks goal is to “activate” the correct output for the input data, just as the perceptron would output a +1 or 1 for its single binary classification. In this case, the output values are like signals that help the network decide which iris species label to assign. The highest computed value activates to signify the networks best guess about the classification.</p>
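<p>Here’s a minimal sketch of that final “highest value wins” step in plain JavaScript. The three output values are numbers I’ve made up, not the result of a trained network, and <code>pickLabel()</code> is a hypothetical helper rather than anything from ml5.js.</p>

```javascript
// One label per output node, in the same order as the network's outputs.
const labels = ["Iris setosa", "Iris virginica", "Iris versicolor"];

// Hypothetical helper: return the label whose output value is the highest.
function pickLabel(outputs, labels) {
  let best = 0;
  for (let i = 1; i < outputs.length; i++) {
    if (outputs[i] > outputs[best]) {
      best = i;
    }
  }
  return labels[best];
}

// Made-up output values standing in for a network's three output nodes.
const outputs = [0.1, 0.2, 0.7];

console.log(pickLabel(outputs, labels)); // Iris versicolor
```

<p>The order of the labels matters: each position in the <code>labels</code> array has to line up with the output node it describes.</p>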
<p>The key takeaway here is that a classification network should have as many inputs as there are values for each item in the dataset, and as many outputs as there are categories. As for the hidden layer, the design is much less set in stone. The hidden layer in Figure 10.18 has five nodes, but this number is entirely arbitrary. Neural network architectures can vary greatly, and the number of hidden nodes is often determined through trial and error or other educated guessing methods (called <em>heuristics</em>). In the context of this book, I’ll be relying on ml5.js to automatically configure the architecture based on the input and output data.</p>
<p>What about the inputs and outputs in a regression scenario, like the household electricity consumption example I mentioned earlier? I’ll go ahead and make up a dataset for this scenario, with values representing the occupants and size of the house, the day’s temperature, and the corresponding electricity usage. This is much like a synthetic dataset, given that it’s not data collected for a real-world scenario—but whereas synthetic data is generated automatically, here I’m manually inputting numbers from my own imagination:</p>
<table>
<tbody>
<tr>
<td><strong>Occupants</strong></td>
<td><strong>Size (m²)</strong></td>
<td><strong>Temperature Outside (°C)</strong></td>
<td><strong>Electricity Usage (kWh)</strong></td>
</tr>
<tr>
<td>4</td>
<td>150</td>
<td>24</td>
<td>25.3</td>
</tr>
<tr>
<td>2</td>
<td>100</td>
<td>25.5</td>
<td>16.2</td>
</tr>
<tr>
<td>1</td>
<td>70</td>
<td>26.5</td>
<td>12.1</td>
</tr>
<tr>
<td>4</td>
<td>120</td>
<td>23</td>
<td>22.1</td>
</tr>
<tr>
<td>2</td>
<td>90</td>
<td>21.5</td>
<td>15.2</td>
</tr>
<tr>
<td>5</td>
<td>180</td>
<td>20</td>
<td>24.4</td>
</tr>
<tr>
<td>1</td>
<td>60</td>
<td>18.5</td>
<td>11.7</td>
</tr>
</tbody>
</table>
<p>The neural network for this problem should have three input nodes corresponding to the first three columns (occupants, size, temperature). Meanwhile, it should have one output node representing the fourth column, the network’s guess about the electricity usage. And I’ll arbitrarily say the network’s hidden layer should have four nodes rather than five. Figure 10.19 shows this network architecture.</p>
<figure>
<img src="images/10_nn/10_nn_20.png" alt="Figure 10.19: A possible network architecture for three inputs and one regression output">
<figcaption>Figure 10.19: A possible network architecture for three inputs and one regression output</figcaption>
</figure>
<p>Unlike the iris classification network, which is choosing from three labels and therefore has three outputs, this network is trying to predict just one number, so it has only one output. I’ll note, however, that a single output isn’t a requirement of regression. A machine learning model can also perform a regression that predicts multiple continuous values, in which case the model would have multiple outputs.</p>
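<p>To make the input/output split concrete, a few rows of the made-up electricity table could be written as an array of objects and then separated into three-number input arrays and one-number output arrays. The property names (<code>occupants</code>, <code>size</code>, <code>temperature</code>, <code>usage</code>) are my own choices for this sketch:</p>

```javascript
// A few rows of the made-up electricity dataset
const houses = [
  { occupants: 4, size: 150, temperature: 24, usage: 25.3 },
  { occupants: 2, size: 100, temperature: 25.5, usage: 16.2 },
  { occupants: 1, size: 70, temperature: 26.5, usage: 12.1 },
];

// Three input values per house
const inputs = houses.map((h) => [h.occupants, h.size, h.temperature]);
// One output value per house
const outputs = houses.map((h) => [h.usage]);

console.log(inputs[0], outputs[0]);
```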
<h3 id="ml5js-syntax">ml5.js Syntax</h3>
<p>The ml5.js library is a collection of machine learning models that can be accessed using the syntax <code>ml5.</code><code><em>functionName</em></code><code>()</code>. For example, to use a pretrained model that detects hand positions, you can use <code>ml5.handpose()</code>. For classifying images, you can use <code>ml5.imageClassifier()</code>. While I encourage you to explore all that ml5.js has to offer (I’ll reference some of these pretrained models in upcoming exercise ideas), for this chapter I’ll focus on only one function in ml5.js, <code>ml5.neuralNetwork()</code>, which creates an empty neural network for you to train.</p>
<p>To use this function, you must first create a JavaScript object that will configure the model being created. Here’s where some of the big-picture factors I just discussed—is this a classification or a regression task? How many inputs and outputs?—come into play. I’ll begin by specifying the task I want the model to perform (<code>"regression"</code> or <code>"classification"</code>):</p>
<pre class="codesplit" data-code-language="javascript">let options = { task: "classification" };
let classifier = ml5.neuralNetwork(options);</pre>
<p>This, however, gives ml5.js little to go on in terms of designing the network architecture. Adding the inputs and outputs will complete the rest of the puzzle. The iris flower classification has four inputs and three possible output labels. This can be configured as part of the <code>options</code> object with a single integer for the number of inputs and an array of strings listing the output labels:</p>
<pre class="codesplit" data-code-language="javascript">let options = {
  inputs: 4,
  outputs: ["iris-setosa", "iris-virginica", "iris-versicolor"],
  task: "classification",
};
let irisClassifier = ml5.neuralNetwork(options);</pre>
<p>The electricity regression scenario had three input values (occupants, size, temperature) and one output value (usage in kWh). With regression, there are no string output labels, so only an integer indicating the number of outputs is required:</p>
<pre class="codesplit" data-code-language="javascript">let options = {
  inputs: 3,
  outputs: 1,
  task: "regression",
};
let energyPredictor = ml5.neuralNetwork(options);</pre>
<p>You can set many other properties of the model through the <code>options</code> object. For example, you could specify the number of hidden layers between the inputs and outputs (there are typically several), the number of neurons in each layer, which activation functions to use, and more. In most cases, however, you can leave out these extra settings and let ml5.js make its best guess on how to design the model based on the task and data at hand.</p>
<h2 id="building-a-gesture-classifier">Building a Gesture Classifier</h2>
<p>I’ll now walk through the steps of the machine learning life cycle with an example problem well suited for p5.js, building all the code for each step along the way using ml5.js. I’ll begin at step 0 by articulating the problem. Imagine for a moment that you’re working on an interactive application that responds to gestures. Maybe the gestures are ultimately meant to be recorded via body tracking, but you want to start with something much simpler—a single stroke of the mouse (see Figure 10.20).</p>
<figure>
<img src="images/10_nn/10_nn_21.png" alt="Figure 10.20: A single mouse gesture as a vector between a start and end point">
<figcaption>Figure 10.20: A single mouse gesture as a vector between a start and end point</figcaption>
</figure>
<p>Each gesture could be recorded as a vector extending from the start to the end point of a mouse movement. The x- and y-components of the vector will be the model’s inputs. The model’s task could be to predict one of four possible labels for the gesture: <em>up</em>, <em>down</em>, <em>left</em>, or <em>right</em>. With a discrete set of possible outputs, this sounds like a classification problem. The four labels will be the model’s outputs.</p>
<p>Much like some of the GA demonstrations in <a href="/genetic-algorithms#">Chapter 9</a>—and like the simple perceptron example earlier in this chapter—the problem I’m selecting here has a known solution and could be solved more easily and efficiently without a neural network. The direction of a vector can be classified with the <code>heading()</code> function and a series of <code>if</code> statements! However, by using this seemingly trivial scenario, I hope to explain the process of training a machine learning model in an understandable and friendly way. Additionally, this example will make it easy to check that the code is working as expected. When I’m done, I’ll provide some ideas about how to expand the classifier to a scenario that couldn’t use simple <code>if</code> statements.</p>
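<p>For comparison, here’s roughly what that non-machine-learning solution could look like. This sketch uses <code>Math.atan2()</code> in place of <code>heading()</code> so it runs without p5.js, and the <code>classifyDirection()</code> name and the 45-degree boundaries between directions are my own assumptions:</p>

```javascript
// Classify a direction from its x- and y-components with if statements.
// As in p5.js, the y-axis points down, so a positive y means "down."
function classifyDirection(x, y) {
  // The angle of the vector in radians, from -PI to PI
  const angle = Math.atan2(y, x);
  const quarter = Math.PI / 4;
  if (angle > -quarter && angle <= quarter) {
    return "right";
  } else if (angle > quarter && angle <= 3 * quarter) {
    return "down";
  } else if (angle > -3 * quarter && angle <= -quarter) {
    return "up";
  } else {
    return "left";
  }
}

console.log(classifyDirection(1, 0));
```

<p>A function like this is also a handy way to double-check the labels a trained model produces.</p>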
<h3 id="collecting-and-preparing-the-data">Collecting and Preparing the Data</h3>
<p>With the problem established, I can turn to steps 1 and 2: collecting and preparing the data. In the real world, these steps can be tedious, especially when the raw data you collect is messy and needs a lot of initial processing. You can think of this like having to organize, wash, and chop all your ingredients before you can start cooking a meal from scratch.</p>
<p>For simplicity, I’d instead like to take the approach of ordering a machine learning “meal kit,” with the ingredients (data) already portioned and prepared. This way, I’ll get straight to the cooking itself, the process of training the model. After all, this is really just an appetizer for what will be the ultimate meal in <a href="/neuroevolution#">Chapter 11</a>, when I apply neural networks to steering agents.</p>
<p>With that in mind, I’ll handcode some example data and manually keep it normalized within a range of −1 to +1. I’ll organize the data into an array of objects, pairing the x- and y-components of a vector with a string label. I’m picking values that I feel clearly point in a specific direction and assigning the appropriate label—two examples per label:</p>
<pre class="codesplit" data-code-language="javascript">let data = [
  { x: 0.99, y: 0.02, label: "right" },
  { x: 0.76, y: -0.1, label: "right" },
  { x: -1.0, y: 0.12, label: "left" },
  { x: -0.9, y: -0.1, label: "left" },
  { x: 0.02, y: 0.98, label: "down" },
  { x: -0.2, y: 0.75, label: "down" },
  { x: 0.01, y: -0.9, label: "up" },
  { x: -0.1, y: -0.8, label: "up" },
];</pre>
<p>Figure 10.21 shows the same data expressed as arrows.</p>
<figure>
<img src="images/10_nn/10_nn_22.png" alt="Figure 10.21: The input data visualized as vectors (arrows)">
<figcaption>Figure 10.21: The input data visualized as vectors (arrows)</figcaption>
</figure>
<p>In a more realistic scenario, I’d probably have a much larger dataset that would be loaded in from a separate file, instead of written directly into the code. For example, JavaScript Object Notation (JSON) and comma-separated values (CSV) are two popular formats for storing and loading data. JSON stores data in key-value pairs and follows the same exact format as JavaScript object literals. CSV is a file format that stores tabular data (like a spreadsheet). You could use numerous other data formats, depending on your needs and the programming environment you’re working with.</p>
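<p>As a small illustration of the difference, here are two of the gesture records written both ways and parsed with plain JavaScript. The inline strings stand in for the contents of hypothetical files, and the hand-rolled CSV parsing assumes a simple file with a header row and no quoted commas:</p>

```javascript
// Two records as JSON text
const jsonText =
  '[{"x": 0.99, "y": 0.02, "label": "right"}, {"x": -1.0, "y": 0.12, "label": "left"}]';
const fromJSON = JSON.parse(jsonText);

// The same two records as CSV text with a header row
const csvText = "x,y,label\n0.99,0.02,right\n-1.0,0.12,left";
// Skip the header row, then split each remaining row on commas.
const rows = csvText.split("\n").slice(1);
const fromCSV = rows.map((row) => {
  const [x, y, label] = row.split(",");
  // CSV values are read as strings, so the numbers need converting.
  return { x: Number(x), y: Number(y), label: label };
});

console.log(fromJSON, fromCSV);
```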
<p>In the real world, the values in that larger dataset would actually come from somewhere. Maybe I would collect the data by asking users to perform specific gestures and recording their inputs, or by writing an algorithm to automatically generate larger amounts of synthetic data that represent the idealized versions of the gestures I want the model to recognize. In either case, the key would be to collect a diverse set of examples that adequately represent the variations in how the gestures might be performed. For now, however, let’s see how it goes with just a few servings of data.</p>
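<p>Here’s one way the synthetic option might be sketched: start from an ideal angle for each label and add a bit of random variation. The <code>generateExamples()</code> helper, the 25 examples per label, and the half-radian spread are all arbitrary assumptions on my part:</p>

```javascript
// Hypothetical helper: make `count` noisy examples around an ideal angle.
function generateExamples(label, idealAngle, count) {
  const examples = [];
  for (let i = 0; i < count; i++) {
    // Vary the angle by up to 0.5 radians in either direction.
    const angle = idealAngle + (Math.random() - 0.5);
    // A unit vector, so x and y stay between -1 and +1
    examples.push({ x: Math.cos(angle), y: Math.sin(angle), label: label });
  }
  return examples;
}

// The y-axis points down in p5.js, so "down" is a positive angle.
const syntheticData = [
  ...generateExamples("right", 0, 25),
  ...generateExamples("down", Math.PI / 2, 25),
  ...generateExamples("left", Math.PI, 25),
  ...generateExamples("up", -Math.PI / 2, 25),
];

console.log(syntheticData.length);
```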
<div data-type="exercise">
<h3 id="exercise-104">Exercise 10.4</h3>
<p>Create a p5.js sketch that collects gesture data from users and saves it to a JSON file. You can use <code>mousePressed()</code> and <code>mouseReleased()</code> to mark the start and end of each gesture, and <code>saveJSON()</code> to download the data into a file.</p>
</div>
<h3 id="choosing-a-model">Choosing a Model</h3>
<p>I’ve now come to step 3 of the machine learning life cycle, selecting a model. This is where I’m going to start letting ml5.js do the heavy lifting for me. To create the model with ml5.js, all I need to do is specify the task, the inputs, and the outputs:</p>
<pre class="codesplit" data-code-language="javascript">let options = {
  task: "classification",
  inputs: 2,
  outputs: ["up", "down", "left", "right"],
  debug: true
};
let classifier = ml5.neuralNetwork(options);</pre>
<p>That’s it! I’m done! Thanks to ml5.js, I can bypass a host of complexities such as the number of layers and neurons per layer to have, the kinds of activation functions to use, and how to set up the algorithms for training the network. The library will make these decisions for me.</p>
<p>Of course, the default ml5.js model architecture may not be perfect for all cases. I encourage you to read the ml5.js documentation for additional details on how to customize the model. I’ll also point out that ml5.js is able to infer the inputs and outputs from the data, so those properties aren’t entirely necessary to include here in the <code>options</code> object. However, for the sake of clarity (and since I’ll need to specify them for later examples), I’m including them here.</p>
<p>The <code>debug</code> property, when set to <code>true</code>, turns on a visual interface for the training process. It’s a helpful tool for spotting potential issues during training and for getting a better understanding of what’s happening behind the scenes. You’ll see what this interface looks like later in the chapter.</p>
<h3 id="training-the-model">Training the Model</h3>
<p>Now that I have the data in a <code>data</code> variable and a neural network initialized in the <code>classifier</code> variable, I’m ready to train the model. That process starts with adding the data to the model. And for that, it turns out I’m not quite done with preparing the data.</p>
<p>Right now, my data is neatly organized in an array of objects, each containing the x- and y-components of a vector and a corresponding string label. This is a typical format for training data, but it isn’t directly consumable by ml5.js. (Sure, I could have initially organized the data into a format that ml5.js recognizes, but I’m including this extra step because it will likely be necessary when you’re using a dataset that has been collected or sourced elsewhere.) To add the data to the model, I need to separate the inputs from the outputs so that the model understands which are which.</p>
<p>The ml5.js library offers a fair amount of flexibility in the kinds of formats it will accept, but I’ll choose to use arrays—one for the <code>inputs</code> and one for the <code>outputs</code>. I can use a loop to reorganize each data item and add it to the model:</p>
<pre class="codesplit" data-code-language="javascript">for (let item of data) {
  // An array of two numbers for the inputs
  let inputs = [item.x, item.y];
  // A single string label for the output
  let outputs = [item.label];
  //{!1} Add the training data to the classifier.
  classifier.addData(inputs, outputs);
}</pre>
<p>What I’ve done here is set the <strong>shape</strong> of the data. In machine learning, this term describes the data’s dimensions and structure. It indicates how the data is organized in terms of rows, columns, and potentially even deeper, into additional dimensions. Understanding the shape of your data is crucial because it determines the way the model should be structured.</p>
<p>Here, the input data’s shape is a 1D array containing two numbers (representing <em>x</em> and <em>y</em>). The output data, similarly, is a 1D array containing just a single string label. Every piece of data going in and out of the network will follow this pattern. While this is a small and simple example, it nicely mirrors many real-world scenarios in which the inputs are numerically represented in an array, and the outputs are string labels.</p>
<p>After passing the data into the <code>classifier</code>, ml5.js provides a helper function to normalize it. As I’ve mentioned, normalizing data (adjusting the scale to a standard range) is a critical step in the machine learning process:</p>
<pre class="codesplit" data-code-language="javascript">// Normalize the data.
classifier.normalizeData();</pre>
<p>In this case, the handcoded data was limited to a range of −1 to +1 from the get-go, so calling <code>normalizeData()</code> here is likely redundant. Still, this function call is important to demonstrate. Normalizing your data ahead of time as part of the preprocessing step will absolutely work, but the auto-normalization feature of ml5.js is a big help!</p>
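<p>For intuition, here’s a sketch of one common normalization technique, min-max scaling, applied to the house sizes from the earlier electricity table. It illustrates the general idea only; I’m not claiming it matches what <code>normalizeData()</code> does internally, and the <code>minMaxNormalize()</code> helper is hypothetical:</p>

```javascript
// Scale every value in an array to the range 0 to 1.
// This assumes the values aren't all identical (which would divide by zero).
function minMaxNormalize(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return values.map((v) => (v - min) / (max - min));
}

// House sizes in square meters from the electricity table
const sizes = [150, 100, 70, 120, 90, 180, 60];
const normalizedSizes = minMaxNormalize(sizes);

console.log(normalizedSizes);
```

<p>The smallest house maps to 0, the largest to 1, and everything else falls proportionally in between.</p>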
<p>Now for the heart of the machine learning process: actually training the model. Here’s the code:</p>
<pre class="codesplit" data-code-language="javascript">// The <code>train()</code> method initiates the training process.
classifier.train(finishedTraining);
// A callback function for when the training is complete
function finishedTraining() {
  console.log("Training complete!");
}</pre>
<p>Yes, that’s it! After all, the hard work has already been completed. The data was collected, prepared, and fed into the model. All that remains is to call the <code>train()</code> method, sit back, and let ml5.js do its thing.</p>
<p>In truth, it isn’t <em>quite</em> that simple. If I were to run the code as written and then test the model, the results would probably be inadequate. Here’s where another key term in machine learning comes into play: <strong>epochs</strong>. The <code>train()</code> method tells the neural network to start the learning process. But how long should it train for? You can think of an epoch as one round of practice, one cycle of using the entire training dataset to update the weights of the neural network. Generally speaking, the more epochs you go through, the better the network will perform, but at a certain point you’ll have diminishing returns. The number of epochs can be set by passing in an <code>options</code> object into <code>train()</code>:</p>
<pre class="codesplit" data-code-language="javascript">//{!1} Set the number of epochs for training.
let options = { epochs: 25 };
classifier.train(options, finishedTraining);</pre>
<p>The number of epochs is an example of a hyperparameter, a global setting for the training process. You can set others through the <code>options</code> object (the learning rate, for example), but I’m going to stick with the defaults. You can read more about customization options in the ml5.js documentation.</p>
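<p>To see what epochs and the learning rate do, here’s a toy training loop in plain JavaScript. It’s a deliberately tiny stand-in, a single weight learning the rule <em>y</em> = 2<em>x</em>, and not a description of how ml5.js trains a network. All the names and numbers are made up for illustration:</p>

```javascript
// Toy training data that follows y = 2 * x
const samples = [
  { x: 1, y: 2 },
  { x: 2, y: 4 },
  { x: 3, y: 6 },
];

let weight = 0;
const learningRate = 0.05;
const epochs = 25;
const lossHistory = [];

for (let epoch = 0; epoch < epochs; epoch++) {
  let totalError = 0;
  // One epoch is one pass through every training example.
  for (let sample of samples) {
    const guess = weight * sample.x;
    const error = sample.y - guess;
    // Nudge the weight in the direction that reduces the error.
    weight += learningRate * error * sample.x;
    totalError += error * error;
  }
  // Record the average squared error for this epoch.
  lossHistory.push(totalError / samples.length);
}

console.log(weight, lossHistory);
```

<p>Each pass shrinks the error, with the biggest improvements coming early. That’s the diminishing-returns pattern in miniature.</p>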
<p>The second argument to <code>train()</code> is optional, but it’s good to include one. It specifies a callback function that runs when the training process is complete—in this case, <code>finishedTraining()</code>. (See the “Callbacks” box for more on callback functions.) This is useful for knowing when you can proceed to the next steps in your code. Another optional callback, which I usually name <code>whileTraining()</code>, is triggered after each epoch. However, for my purposes, knowing when the training is done is plenty!</p>
<div data-type="note">
<h3 id="callbacks">Callbacks</h3>
<p>A <strong>callback function</strong> in JavaScript is a function you don’t actually call yourself. Instead, you provide it as an argument to another function, intending for it to be <em>called back</em> automatically at a later time (typically associated with an event, like a mouse click). You’ve seen this before when working with Matter.js in <a href="/physics-libraries#">Chapter 6</a>, where you specified a function to call whenever a collision was detected.</p>
<p>Callbacks are needed for <strong>asynchronous</strong> operations, when you want your code to continue along with animating or doing other things while waiting for another task (like training a machine learning model) to finish. A classic example of this in p5.js is loading data into a sketch with <code>loadJSON()</code>.</p>
<p>JavaScript also provides a more recent approach for handling asynchronous operations known as <strong>promises</strong>. With promises, you can use keywords like <code>async</code> and <code>await</code> to make your asynchronous code look more like traditional synchronous code. While ml5.js also supports this style, I’ll stick to using callbacks to stay aligned with p5.js style.</p>
</div>
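<p>To illustrate the two kinds of training callbacks described earlier, here’s a hypothetical stand-in for a training function. It isn’t ml5.js code, and unlike real training it runs synchronously to keep the example short. The point is only to show who calls what, and when:</p>

```javascript
// A made-up training function that accepts two callbacks.
function simulateTraining(epochs, whileTraining, finished) {
  for (let epoch = 0; epoch < epochs; epoch++) {
    // ...one epoch of training would happen here...
    // Call back after each epoch.
    whileTraining(epoch);
  }
  // Call back once when everything is done.
  finished();
}

let epochsSeen = 0;
let done = false;

// I never call these two functions myself; simulateTraining() does.
simulateTraining(
  5,
  (epoch) => {
    epochsSeen++;
  },
  () => {
    done = true;
  }
);

console.log(epochsSeen, done);
```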
<h3 id="evaluating-the-model">Evaluating the Model</h3>
<p>If <code>debug</code> is set to <code>true</code> in the initial call to <code>ml5.neuralNetwork()</code>, a visual interface should appear after <code>train()</code> is called, covering most of the p5.js page and canvas (see Figure 10.22). This interface, called the <em>Visor</em>, represents the evaluation step.</p>
<figure>
<img src="images/10_nn/10_nn_23.png" alt="Figure 10.22: The Visor, with a graph of the loss function and model details">
<figcaption>Figure 10.22: The Visor, with a graph of the loss function and model details</figcaption>
</figure>
<p>The Visor comes from TensorFlow.js (which underlies ml5.js) and includes a graph that provides real-time feedback on the progress of the training. This graph plots the loss of the model on the y-axis against the number of epochs along the x-axis. <strong>Loss</strong> is a measure of how far off the model’s predictions are from the correct outputs provided by the training data. It quantifies the model’s total error. When training begins, it’s common for the loss to be high because the model has yet to learn anything. Ideally, as the model trains through more epochs, it should get better at its predictions, and the loss should decrease. If the graph goes down as the epochs increase, this is a good sign!</p>
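<p>Loss can be calculated in many ways. As one illustration, here’s a sketch of cross-entropy, a loss commonly used for classification; I’m assuming that choice for the sake of example rather than describing exactly what the Visor plots. The <code>crossEntropy()</code> helper and the probabilities are made up:</p>

```javascript
// Cross-entropy loss for one example: the negative log of the
// probability the model assigned to the correct label.
function crossEntropy(probabilities, correctIndex) {
  return -Math.log(probabilities[correctIndex]);
}

// An untrained model might spread its guess evenly over four labels.
const earlyLoss = crossEntropy([0.25, 0.25, 0.25, 0.25], 0);
// A trained model should put most of the probability on the right label.
const laterLoss = crossEntropy([0.9, 0.05, 0.03, 0.02], 0);

console.log(earlyLoss, laterLoss);
```

<p>The more probability the model gives to the correct label, the closer the loss gets to zero.</p>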
<p>Running the training for the 200 epochs depicted in Figure 10.22 might strike you as a bit excessive. In a real-world scenario with more extensive data, I would probably use fewer epochs, like the 25 I specified in the original code snippet. However, because the dataset here is so tiny, the higher number of epochs helps the model get enough practice with the data. Remember, this is a toy example, aiming to make the concepts clear rather than to produce a sophisticated machine learning model.</p>
<p>Below the graph, the Visor shows a Model Summary table with details on the lower-level TensorFlow.js model architecture created behind the scenes. The summary includes layer names, neuron counts per layer (in the Output Shape column), and a parameters count, which is the total number of weights, one for each connection between two neurons. In this case, dense_Dense1 is the hidden layer with 16 neurons (a number chosen by ml5.js), and dense_Dense2 is the output layer with 4 neurons, one for each classification category. (TensorFlow.js doesn’t think of the inputs as a distinct layer; rather, they’re merely the starting point of the data flow.) The <em>batch</em> in the Output Shape column doesn’t refer to a specific number but indicates that the model can process a variable amount of training data (a batch) for any single cycle of model training.</p>
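<p>A parameter count for this architecture can be worked out by hand. This sketch assumes each dense layer has one weight per incoming connection plus one bias per neuron, the usual arrangement for a dense layer; the <code>countParameters()</code> helper is hypothetical:</p>

```javascript
// Count the parameters of one dense layer, assuming a bias per neuron.
function countParameters(inputCount, neuronCount) {
  const weights = inputCount * neuronCount;
  const biases = neuronCount;
  return weights + biases;
}

// Hidden layer: 2 inputs feeding 16 neurons
const hiddenParams = countParameters(2, 16);
// Output layer: 16 hidden neurons feeding 4 outputs
const outputParams = countParameters(16, 4);

console.log(hiddenParams, outputParams, hiddenParams + outputParams);
```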
<p>Before moving on from the evaluation stage, I have a loose end to tie up. When I first outlined the steps of the machine learning life cycle, I mentioned that preparing the data typically involves splitting the dataset into three parts to help with the evaluation process:</p>
<ul>
<li><strong>Training:</strong> The primary dataset used to train the model</li>
<li><strong>Validation:</strong> A subset of the data used to check the model during training, typically at the end of each epoch</li>
<li><strong>Testing:</strong> Additional untouched data never considered during the training process, for determining the model’s final performance after the training is completed</li>
</ul>
<p>You may have noticed that I never did this. For simplicity, I’ve instead used the entire dataset for training. After all, my dataset has only eight records; it’s much too small to divide into three sets! With a large dataset, this three-way split would be more appropriate.</p>
<p>Using such a small dataset risks the model <strong>overfitting</strong> the data, however: the model becomes so tuned to the specific peculiarities of the training data that it’s much less effective when working with new, unseen data. The main reason to use a validation set is to monitor the model during the training process. As training progresses, if the model’s accuracy improves on the training data but deteriorates on the validation data, it’s a strong indicator that overfitting might be occurring. (The testing set is reserved strictly for the final evaluation, one more chance after training is complete to gauge the model’s performance.)</p>
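<p>With a larger dataset, the three-way split could be done by hand along these lines. The <code>splitData()</code> helper, the placeholder records, and the 70/15/15 proportions are my own assumptions for illustration, not a recommendation from ml5.js:</p>

```javascript
// Shuffle a copy of the data (Fisher-Yates), then slice it into three parts.
function splitData(data, trainFraction, validationFraction) {
  const shuffled = [...data];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const trainEnd = Math.round(shuffled.length * trainFraction);
  const validationEnd =
    trainEnd + Math.round(shuffled.length * validationFraction);
  return {
    training: shuffled.slice(0, trainEnd),
    validation: shuffled.slice(trainEnd, validationEnd),
    // Whatever is left over becomes the testing set.
    testing: shuffled.slice(validationEnd),
  };
}

// 100 placeholder records standing in for a larger dataset
const records = Array.from({ length: 100 }, (_, i) => ({ id: i }));
const sets = splitData(records, 0.7, 0.15);

console.log(sets.training.length, sets.validation.length, sets.testing.length);
```

<p>Shuffling first matters: if the records were sorted by label, an unshuffled split could leave an entire category out of the training set.</p>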
<p>For more realistic scenarios, ml5.js provides a way to split up the data, as well as automatic features for employing validation data. If you’re inclined to go further, <a href="http://ml5js.org/">you can explore the full set of neural network examples on the ml5.js website</a>.</p>
<h3 id="tuning-the-parameters">Tuning the Parameters</h3>
<p>After the evaluation step, there’s typically an iterative process of adjusting hyperparameters and going through training again to achieve the best performance from the model. While ml5.js offers capabilities for parameter tuning (which you can learn about in the library’s reference), it isn’t really geared toward making low-level, fine-grained adjustments to a model. Using TensorFlow.js directly might be your best bet if you want to explore this step in more detail, since it offers a broader suite of tools and allows for lower-level control over the training process.</p>
<p>In this case, tuning the parameters isn’t strictly necessary. The graph in the Visor shows a loss all the way down at 0.1, which is plenty accurate for my purposes. I’m happy to move on.</p>
<h3 id="deploying-the-model">Deploying the Model</h3>
<p>It’s finally time to deploy the model and see the payoff of all that hard work. This typically involves integrating the model into a separate application to make predictions or decisions based on new, previously unseen data. For this, ml5.js offers the convenience of a <code>save()</code> function to download the trained model to a file from one sketch and a <code>load()</code> function to load it for use in a completely different sketch. This saves you from having to retrain the model from scratch every single time you need it.</p>
<p>While a model would typically be deployed to a different sketch from the one where it was trained, I’m going to deploy the model in the same sketch for the sake of simplicity. In fact, once the training process is complete, the resulting model is, in essence, already deployed in the current sketch. It’s saved in the <code>classifier</code> variable and can be used to make predictions by passing the model new data through the <code>classify()</code> method. The shape of the data sent to <code>classify()</code> should match that of the input data used in training—in this case, two floating-point numbers, representing the x- and y-components of a direction vector:</p>
<pre class="codesplit" data-code-language="javascript">// Manually create a vector.
let direction = createVector(1, 0);
// Convert the x- and y-components into an input array.
let inputs = [direction.x, direction.y];
// Ask the model to classify the inputs.
classifier.classify(inputs, gotResults);</pre>
<p>The second argument to <code>classify()</code> is another callback function for accessing the results:</p>
<pre class="codesplit" data-code-language="javascript">function gotResults(results) {
  console.log(results);
}</pre>
<p>The model’s prediction arrives in the argument to the callback, which I’m calling <code>results</code> in the code. Inside, you’ll find an array of the possible labels, sorted by <strong>confidence</strong>, a probability value that the model assigns to each label. These probabilities represent how sure the model is of that particular prediction. They range from 0 to 1, with values closer to 1 indicating higher confidence and values near 0 suggesting lower confidence:</p>
<pre class="codesplit" data-code-language="json">[
  {
    "label": "right",
    "confidence": 0.9669702649116516
  },
  {
    "label": "up",
    "confidence": 0.01878807507455349
  },
  {
    "label": "down",
    "confidence": 0.013948931358754635
  },
  {
    "label": "left",
    "confidence": 0.00029277068097144365
  }
]</pre>
<p>In this example output, the model is highly confident (approximately 96.7 percent) that the correct label is <code>"right"</code>, while it has minimal confidence (0.03 percent) in the <code>"left"</code> label. The confidence values are normalized and add up to 100 percent.</p>
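<p>Confidence values that add up to 1 are typically produced by a function called <em>softmax</em>, a common final step in classification networks. I’m assuming that’s the mechanism at work here; the sketch below shows the general technique with made-up raw output values, not the internals of ml5.js:</p>

```javascript
// Turn any list of raw output values into probabilities that sum to 1.
function softmax(values) {
  // Subtracting the largest value first keeps Math.exp() from overflowing.
  const max = Math.max(...values);
  const exps = values.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Made-up raw outputs for the four labels
const confidences = softmax([4.2, 0.3, 0.0, -3.9]);

console.log(confidences);
```

<p>Larger raw values get a disproportionately larger share of the total, which is why one label often dominates.</p>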
<p>All that remains now is to fill out the sketch with code so the model can receive live input from the mouse. The first step is to signal the completion of the training process so the user knows the model is ready. I’ll include a global <code>status</code> variable to track the training process and ultimately display the predicted label on the canvas. The variable is initialized to <code>"training"</code> but updated to <code>"ready"</code> through the <code>finishedTraining()</code> callback:</p>
<pre class="codesplit" data-code-language="javascript">// When the sketch starts, it will show a status of <code>training</code>.
let status = "training";
function draw() {
  background(255);
  textAlign(CENTER, CENTER);
  textSize(64);
  text(status, width / 2, height / 2);
}
// This is the callback for when training is complete, and the message changes to <code>ready</code>.
function finishedTraining() {
  status = "ready";
}</pre>
<p>Finally, I’ll use p5.js’s mouse functions to build a vector while the mouse is being dragged and call <code>classifier.classify()</code> on that vector when the mouse is released.</p>
<div data-type="example">
<h3 id="example-102-gesture-classifier">Example 10.2: Gesture Classifier</h3>
<figure>
<div data-type="embed" data-p5-editor="https://editor.p5js.org/natureofcode/sketches/SbfSv_GhM" data-example-path="examples/10_nn/10_2_gesture_classifier"><img src="examples/10_nn/10_2_gesture_classifier/screenshot.png"></div>
<figcaption></figcaption>
</figure>
</div>
<pre class="codesplit" data-code-language="javascript">// The start and end points of the current gesture
let start, end;
// Store the start of a gesture when the mouse is pressed.
function mousePressed() {
  start = createVector(mouseX, mouseY);
  // Until the mouse is dragged, the gesture ends where it began.
  end = start.copy();
}
// Update the end of a gesture as the mouse is dragged.
function mouseDragged() {
  end = createVector(mouseX, mouseY);
}
// The gesture is complete when the mouse is released.
function mouseReleased() {
  // Calculate and normalize a direction vector.
  let dir = p5.Vector.sub(end, start);
  dir.normalize();
  // Convert to an input array and classify.
  let inputs = [dir.x, dir.y];
  classifier.classify(inputs, gotResults);
}
// Store the resulting label in the <code>status</code> variable for showing in the canvas.
function gotResults(results) {
  status = results[0].label;
}</pre>
<p>Since the <code>results</code> array is sorted by confidence, if I just want to use a single label as the prediction, I can access the first element of the array with <code>results[0].label</code>, as in the <code>gotResults()</code> function in Example 10.2. This label is passed to the <code>status</code> variable to be displayed on the canvas.</p>
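<p>One possible refinement is to accept the top label only when the model is reasonably sure of it. This sketch assumes the <code>results</code> array has the shape shown earlier; the <code>labelOrUnsure()</code> helper and the 0.8 cutoff are hypothetical choices of mine:</p>

```javascript
// An arbitrary cutoff for how confident the model must be
const CONFIDENCE_THRESHOLD = 0.8;

// Return the top label, or "unsure" if its confidence is too low.
function labelOrUnsure(results) {
  const top = results[0];
  return top.confidence >= CONFIDENCE_THRESHOLD ? top.label : "unsure";
}

// The example output from earlier in this section
const sampleResults = [
  { label: "right", confidence: 0.9669702649116516 },
  { label: "up", confidence: 0.01878807507455349 },
  { label: "down", confidence: 0.013948931358754635 },
  { label: "left", confidence: 0.00029277068097144365 },
];

console.log(labelOrUnsure(sampleResults));
```

<p>A diagonal gesture, for instance, might split its confidence between two labels and fall below the cutoff.</p>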
<div data-type="exercise">
<h3 id="exercise-105">Exercise 10.5</h3>
<p>Divide Example 10.2 into three sketches: one for collecting data, one for training, and one for deployment. Use the <code>ml5.neuralNetwork</code> functions <code>save()</code> and <code>load()</code> for saving and loading the model to and from a file, respectively.</p>
</div>
<div data-type="exercise">
<h3 id="exercise-106">Exercise 10.6</h3>
<p>Expand the gesture-recognition model to classify a sequence of vectors, capturing more accurately the path of a longer mouse movement. Remember, your input data must have a consistent shape, so you’ll have to decide how many vectors to use to represent a gesture and store no more and no less for each data point. While this approach can work, other machine learning models (such as recurrent neural networks) are specifically designed to handle sequential data and might offer more flexibility and potential accuracy.</p>
</div>
<div data-type="exercise">
<h3 id="exercise-107">Exercise 10.7</h3>
<p>One of the pretrained models in ml5.js is called <em>Handpose</em>. The input of the model is an image, and the prediction is a list of 21 key points—x- and y-positions, also known as <em>landmarks</em>—that describe a hand.</p>
<figure>
<img src="images/10_nn/10_nn_24.png" alt="">
<figcaption></figcaption>
</figure>
<p>Can you use the outputs of the <code>ml5.handpose()</code> model as the inputs to an <code>ml5.neuralNetwork()</code> and classify various hand gestures (like a thumbs-up or thumbs-down)? For hints, you can watch my <a href="https://thecodingtrain.com/pose-classifier">video tutorial that walks you through this process for body poses in the machine learning track on the Coding Train website</a>.</p>
</div>
<div data-type="project">
<h3 id="the-ecosystem-project-11">The Ecosystem Project</h3>
<p>Incorporate machine learning into your ecosystem to enhance the behavior of creatures. How could classification or regression be applied?</p>
<ul>
<li>Can you classify the creatures of your ecosystem into multiple categories? What if you use an initial population as a training dataset, and as new creatures are born, the system classifies them according to their features? What are the inputs and outputs for your system?</li>
<li>Can you use a regression to predict the life span of a creature based on its properties? Think about how size and speed affected the life span of the bloops from <a href="/genetic-algorithms#">Chapter 9</a>. Could you analyze how well the regression model’s predictions align with the actual outcomes?</li>
</ul>
<figure>
<img src="images/10_nn/10_nn_25.png" alt="">
<figcaption></figcaption>
</figure>
</div>
<p></p>
</section>