mirror of
https://github.com/nature-of-code/noc-book-2
synced 2024-11-17 07:49:05 +01:00
1648 lines
No EOL
151 KiB
HTML
<section data-type="chapter">
<h1 id="chapter-10-neural-networks">Chapter 10. Neural Networks</h1>
<blockquote data-type="epigraph">
<p>“The human brain has 100 billion neurons, each neuron connected to 10 thousand other neurons. Sitting on your shoulders is the most complicated object in the known universe.”</p>
<p>— Michio Kaku</p>
</blockquote>
<p>I began with inanimate objects living in a world of forces, and gave them desires, autonomy, and the ability to take action according to a system of rules. Next, I allowed those objects, now called creatures, to live in a population and evolve over time. Now I’d like to ask: What is each creature’s decision-making process? How can it adjust its choices by learning over time? Can a computational entity process its environment and generate a decision?</p>
<p>The human brain can be described as a biological neural network—an interconnected web of neurons transmitting elaborate patterns of electrical signals. Dendrites receive input signals and, based on those inputs, fire an output signal via an axon. Or something like that. How the human brain actually works is an elaborate and complex mystery, one that I certainly am not going to attempt to tackle in rigorous detail in this chapter.</p>
<figure>
<img src="images/10_nn/10_nn_1.png" alt="Figure 10.1 An illustration of a neuron with dendrites and an axon connected to another neuron.">
<figcaption>Figure 10.1 An illustration of a neuron with dendrites and an axon connected to another neuron.</figcaption>
</figure>
<p>The good news is that developing engaging animated systems with code does not require scientific rigor or accuracy, as you've learned throughout this book. You can simply be inspired by the idea of brain function.</p>
<p>In this chapter, I'll begin with a conceptual overview of the properties and features of neural networks and build the simplest possible example of one (a network that consists of a single neuron). I’ll then introduce you to more complex neural networks using the ml5.js library. Finally, I'll cover “neuroevolution”, a technique that combines genetic algorithms with neural networks to create a “Brain” object that can be inserted into the <code>Vehicle</code> class and used to calculate steering.</p>
<h2 id="artificial-neural-networks-introduction-and-application">Artificial Neural Networks: Introduction and Application</h2>
<p>Computer scientists have long been inspired by the human brain. In 1943, Warren S. McCulloch, a neuroscientist, and Walter Pitts, a logician, developed the first conceptual model of an artificial neural network. In their paper, “A logical calculus of the ideas immanent in nervous activity,” they describe the concept of a neuron, a single cell living in a network of cells that receives inputs, processes those inputs, and generates an output.</p>
<p>Their work, and the work of many scientists and researchers that followed, was not meant to accurately describe how the biological brain works. Rather, an <em>artificial</em> neural network (hereafter referred to as a “neural network”) was designed as a computational model based on the brain to solve certain kinds of problems.</p>
<p>It’s probably pretty obvious to you that there are problems that are incredibly simple for a computer to solve, but difficult for you. Take the square root of 964,324, for example. A quick line of code produces the value 982, a number your computer computed in less than a millisecond. There are, on the other hand, problems that are incredibly simple for you or me to solve, but not so easy for a computer. Show any toddler a picture of a kitten or puppy and they’ll be able to tell you very quickly which one is which. Say “hello” and shake my hand one morning and you should be able to pick me out of a crowd of people the next day. But need a machine to perform one of these tasks? Scientists have already spent entire careers researching and implementing complex solutions.</p>
<p>The most prevalent use of neural networks in computing today involves these “easy-for-a-human, difficult-for-a-machine” tasks known as pattern recognition. These encompass a wide variety of problem areas, where the aim is to detect, interpret, and classify data. This includes everything from identifying objects in images, recognizing spoken words, understanding and generating human-like text, and even more complex tasks such as predicting your next favorite song or movie, teaching a machine to win at complex games, and detecting unusual cyber activities.</p>
<div class="half-width-right">
<figure>
<img src="images/10_nn/10_nn_2.png" alt="Figure 10.2: A neural network is a system of neurons and connections.">
<figcaption>Figure 10.2: A neural network is a system of neurons and connections.</figcaption>
</figure>
</div>
<p>One of the key elements of a neural network is its ability to learn. A neural network is not just a complex system, but a complex <strong>adaptive</strong> system, meaning it can change its internal structure based on the information flowing through it. Typically, this is achieved through the adjusting of weights. In the diagram above, each line represents a connection between two neurons and indicates the pathway for the flow of information. Each connection has a <strong>weight</strong>, a number that controls the signal between the two neurons. If the network generates a “good” output (which I'll define later), there is no need to adjust the weights. However, if the network generates a “poor” output—an error, so to speak—then the system adapts, altering the weights in order to improve subsequent results.</p>
<p>There are several strategies for learning, and I'll examine two of them in this chapter.</p>
<ul>
<li><strong>Supervised Learning</strong> —Essentially, a strategy that involves a teacher that is smarter than the network itself. For example, let’s take the facial recognition example. The teacher shows the network a bunch of faces, and the teacher already knows the name associated with each face. The network makes its guesses, then the teacher provides the network with the answers. The network can then compare its answers to the known “correct” ones and make adjustments according to its errors. Our first neural network in the next section will follow this model.</li>
<li><strong>Unsupervised Learning</strong> —Required when there isn’t an example data set with known answers. Imagine searching for a hidden pattern in a data set. An application of this is clustering, i.e. dividing a set of elements into groups according to some unknown pattern. I won’t be showing any examples of unsupervised learning in this chapter, as this strategy is less relevant for the examples in this book.</li>
<li><strong>Reinforcement Learning</strong> —A strategy built on observation. Think of a little mouse running through a maze. If it turns left, it gets a piece of cheese; if it turns right, it receives a little shock. (Don’t worry, this is just a pretend mouse.) Presumably, the mouse will learn over time to turn left. Its neural network makes a decision with an outcome (turn left or right) and observes its environment (yum or ouch). If the observation is negative, the network can adjust its weights in order to make a different decision the next time. Reinforcement learning is common in robotics. At time <code>t</code>, the robot performs a task and observes the results. Did it crash into a wall or fall off a table? Or is it unharmed? I'll showcase how reinforcement learning works in the context of our simulated steering vehicles.</li>
</ul>
<p>Reinforcement learning comes in many variants and styles. In this chapter, while I will lay the groundwork of neural networks using supervised learning, my primary focus will be a technique related to reinforcement learning known as <em>neuroevolution</em>. This method builds upon the code from chapter 9 and "evolves" the weights (and in some cases, the structure itself) of a neural network over generations of "trial and error" learning. It is especially effective in environments where the learning rules are not precisely defined or the task is complex with numerous potential solutions. And yes, it can indeed be applied to simulated steering vehicles!</p>
<p>A neural network itself is a “connectionist” computational system. The computational systems I have been writing in this book are procedural; a program starts at the first line of code, executes it, and goes on to the next, following instructions in a linear fashion. A true neural network does not follow a linear path. Rather, information is processed collectively, in parallel throughout a network of nodes (the nodes, in this case, being neurons).</p>
<p>Here I am showing yet another example of a complex system, much like the ones seen throughout this book. Remember how the individual boids in a flocking system, following only three rules—separation, alignment, and cohesion—created complex behaviors? The individual elements of a neural network are equally simple to understand. They read an input, a number, process it, and generate an output, another number. A network of many neurons, however, can exhibit incredibly rich and intelligent behaviors, echoing the complex dynamics seen in a flock of boids.</p>
<p>This ability of a neural network to learn, to make adjustments to its structure over time, is what makes it so useful in the field of artificial intelligence. Here are some standard uses of neural networks in software today.</p>
<ul>
<li><strong>Pattern Recognition</strong> — As I’ve discussed, this is one of the most common applications, with examples that range from facial recognition and optical character recognition to more complex tasks like gesture recognition.</li>
<li><strong>Time Series Prediction and Anomaly Detection</strong> — Neural networks are utilized both in forecasting, such as predicting stock market trends or weather patterns, and in recognizing anomalies, which can be applied to areas like cyberattack detection and fraud prevention.</li>
<li><strong>Natural Language Processing (or “NLP” for short)</strong> — One of the biggest developments in recent years has been the use of neural networks for processing and understanding human language. They are used in various tasks including machine translation, sentiment analysis, and text summarization, and are the underlying technology behind many digital assistants and chatbots.</li>
<li><strong>Signal Processing and Soft Sensors</strong> — Neural networks play a crucial role in devices like cochlear implants and hearing aids by filtering noise and amplifying essential sounds. They're also involved in “soft sensor” scenarios, where they process data from multiple sources to give a comprehensive analysis of the environment.</li>
<li><strong>Control and Adaptive Decision-Making Systems</strong> — These applications range from autonomous systems like self-driving cars and drones, to adaptive decision-making used in game playing, pricing models, and recommendation systems on media platforms.</li>
<li><strong>Generative Models</strong> — The rise of novel neural network architectures has made it possible to generate new content. They are used for synthesizing images, enhancing image resolution, style transfer between images, and even generating music and video.</li>
</ul>
<p>This is by no means a comprehensive list of applications of neural networks. But hopefully it gives you an overall sense of the features and possibilities. Today, leveraging machine learning in creative coding and interactive media is not only feasible, but increasingly common. Two libraries that you may want to consider exploring further for working with neural networks are tensorflow.js and ml5.js. TensorFlow.js is an open-source library that lets you define, train, and run machine learning models in JavaScript. It's part of the TensorFlow ecosystem, which is maintained and developed by Google. ml5.js is a library built on top of tensorflow.js designed specifically for use with p5.js. Its goal is to be beginner-friendly and make machine learning approachable for a broad audience of artists, creative coders, and students.</p>
<p>One of the more common things to do with tensorflow.js and ml5.js is to use something known as a “pre-trained model.” A “model” in machine learning is a specific setup of neurons and connections, and a “pre-trained” model is one that has already been trained on a dataset for a particular task. It can be used “as is” or as a starting point for additional learning (commonly referred to as “transfer learning”).</p>
<p>Examples of popular pre-trained models are ones that can classify images, identify body poses, recognize facial landmarks or hand positions, or even analyze the sentiment expressed in a text. Covering the full gamut of possibilities in this rapidly expanding and evolving space probably merits an entire additional book, maybe a series of books. And by the time that book was printed, it would probably be out of date.</p>
<p>So instead, as I embark on this last hurrah in the Nature of Code, I’ll stick to just two things. First, I’ll look at how to build the simplest of all neural networks from scratch using only p5.js. The goal is to gain an understanding of how the concepts of neural networks and machine learning are implemented in code. Second, I’ll explore one library, specifically ml5.js, which offers the ability to create more sophisticated neural network models and use them to drive simulated vehicles.</p>
<h2 id="the-perceptron">The Perceptron</h2>
<p>Invented in 1957 by Frank Rosenblatt at the Cornell Aeronautical Laboratory, a perceptron is the simplest neural network possible: a computational model of a single neuron. A perceptron consists of one or more inputs, a processor, and a single output.</p>
<figure>
<img src="images/10_nn/10_nn_3.png" alt="Figure 10.3: A simple perceptron with two inputs and one output.">
<figcaption>Figure 10.3: A simple perceptron with two inputs and one output.</figcaption>
</figure>
<p>A perceptron follows the “feed-forward” model, meaning inputs are sent into the neuron, are processed, and result in an output. In the diagram above, this means the network (one neuron) reads from left to right: inputs come in, output goes out.</p>
<p>Let’s follow each of these steps in more detail.</p>
<p><span class="highlight">Step 1: Receive inputs.</span></p>
<p>Say I have a perceptron with two inputs—let’s call them <em>x0</em> and <em>x1</em>.</p>
<table>
<thead>
<tr>
<th>Input</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>x0</td>
<td>12</td>
</tr>
<tr>
<td>x1</td>
<td>4</td>
</tr>
</tbody>
</table>
<p><span class="highlight">Step 2: Weight inputs.</span></p>
<p>Each input sent into the neuron must first be weighted, meaning it is multiplied by some value, often a number between -1 and 1. When creating a perceptron, the inputs are typically assigned random weights. Let’s give the example inputs the following weights:</p>
<table>
<thead>
<tr>
<th>Weight</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>w0</td>
<td>0.5</td>
</tr>
<tr>
<td>w1</td>
<td>-1</td>
</tr>
</tbody>
</table>
<p>The next step is to take each input and multiply it by its weight.</p>
<table>
<thead>
<tr>
<th>Input</th>
<th>Weight</th>
<th>Weight <span data-type="equation">\times</span> Input</th>
</tr>
</thead>
<tbody>
<tr>
<td>12</td>
<td>0.5</td>
<td>6</td>
</tr>
<tr>
<td>4</td>
<td>-1</td>
<td>-4</td>
</tr>
</tbody>
</table>
<p><span class="highlight">Step 3: Sum inputs.</span></p>
<p>The weighted inputs are then summed.</p>
<p><span data-type="equation">6 + -4 = 2</span></p>
<p><span class="highlight">Step 4: Generate output.</span></p>
<p>The output of a perceptron is produced by passing the sum through an activation function. Think about a “binary” output, one that is only “off” or “on” like an LED. In this case, the activation function determines whether the perceptron should "fire" or not. If it fires, the light turns on; otherwise, it remains off.</p>
<p>Activation functions can get a little bit hairy. If you start reading about activation functions in artificial intelligence textbooks, you may find yourself reaching for a calculus textbook. However, with your new friend the simple perceptron, there’s an easy option which demonstrates the concept. Let’s make the activation function the sign of the sum. In other words, if the sum is a positive number, the output is 1; if it is negative, the output is -1.</p>
<p><span data-type="equation">\text{sign}(2) = +1</span></p>
<p>Let’s review and condense these steps and translate them into code.</p>
<p><strong>The Perceptron Algorithm:</strong></p>
<ol>
<li>For every input, multiply that input by its weight.</li>
<li>Sum all of the weighted inputs.</li>
<li>Compute the output of the perceptron based on that sum passed through an activation function (the sign of the sum).</li>
</ol>
<p>I can start writing this algorithm in code using two arrays of values, one for the inputs and one for the weights.</p>
<pre class="codesplit" data-code-language="javascript">let inputs = [12, 4];
let weights = [0.5, -1];</pre>
<p>Step #1, “for every input,” implies a loop that multiplies each input by its corresponding weight. To obtain the sum, the results can be added up in that same loop.</p>
<pre class="codesplit" data-code-language="javascript">// Steps 1 and 2: Add up all the weighted inputs.
let sum = 0;
for (let i = 0; i < inputs.length; i++) {
  sum += inputs[i] * weights[i];
}</pre>
<p>With the sum, I can then compute the output.</p>
<pre class="codesplit" data-code-language="javascript">// Step 3: Passing the sum through an activation function
let output = activate(sum);

// The activation function
function activate(sum) {
  //{!5} Return a 1 if positive, -1 if negative.
  if (sum > 0) {
    return 1;
  } else {
    return -1;
  }
}</pre>
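<p>Assembled into one sketch, the snippets above produce the output for the example values from the tables. Nothing new is introduced here: this is just the preceding code combined so it can be run as is.</p>
<pre class="codesplit" data-code-language="javascript">

```javascript
// The example inputs and weights from the tables above
let inputs = [12, 4];
let weights = [0.5, -1];

// Steps 1 and 2: Add up all the weighted inputs.
let sum = 0;
for (let i = 0; i < inputs.length; i++) {
  sum += inputs[i] * weights[i];
}

// The activation function: the sign of the sum
function activate(s) {
  return s > 0 ? 1 : -1;
}

// Step 3: The sum is 6 + -4 = 2, so the output is sign(2) = +1.
let output = activate(sum);
console.log(output); // 1
```

</pre>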
<h3 id="simple-pattern-recognition-using-a-perceptron">Simple Pattern Recognition Using a Perceptron</h3>
<p>Now that I have explained the computational process of a perceptron, let's take a look at an example of one in action. As I mentioned earlier, neural networks are commonly used for pattern recognition applications, such as facial recognition. Even simple perceptrons can demonstrate the fundamentals of classification. Let’s demonstrate with the following scenario.</p>
<div class="half-width-right">
<figure>
<img src="images/10_nn/10_nn_4.png" alt="Figure 10.4: A collection of points in two dimensional space divided by a line.">
<figcaption>Figure 10.4: A collection of points in two-dimensional space divided by a line.</figcaption>
</figure>
</div>
<p>Consider a line in two-dimensional space. Points in that space can be classified as living on either one side of the line or the other. While this is a somewhat silly example (since there is clearly no need for a neural network; on which side a point lies can be determined with some simple algebra), it shows how a perceptron can be trained to recognize points on one side versus another.</p>
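<p>To make the “simple algebra” point concrete, here is a small sketch (not part of the perceptron itself) that classifies a point against a hypothetical line. The slope (2) and y-intercept (1) are arbitrary values chosen for illustration.</p>
<pre class="codesplit" data-code-language="javascript">

```javascript
// For a line y = m * x + b, a point (x, y) is above the line when y > m * x + b.
// The slope (2) and y-intercept (1) here are arbitrary illustration values.
function sideOfLine(x, y) {
  let lineY = 2 * x + 1;
  return y > lineY ? 1 : -1; // +1 above the line, -1 below (or on) it
}

console.log(sideOfLine(1, 5)); // 5 > 3, so this prints 1 (above)
console.log(sideOfLine(1, 2)); // 2 < 3, so this prints -1 (below)
```

</pre>
<p>This is exactly the classification the perceptron will have to learn by adjusting weights, rather than being told the line’s formula.</p>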
<p>Let’s say a perceptron has two inputs: the <span data-type="equation">x,y</span> coordinates of a point. When using a sign activation function, the output will be either -1 or 1. The input data are classified according to the sign of the output, the weighted sum of inputs. In the above diagram, you can see how each point is either below the line (-1) or above (+1).</p>
<p>The perceptron itself can be diagrammed as follows. In machine learning, <span data-type="equation">x</span>’s are typically the notation for inputs and <span data-type="equation">y</span> is typically the notation for an output. To keep with this convention, I’ll note in the diagram the inputs as <span data-type="equation">x_0</span> and <span data-type="equation">x_1</span>. <span data-type="equation">x_0</span> will correspond to the x coordinate and <span data-type="equation">x_1</span> to the y. I name the output simply “<span data-type="equation">\text{output}</span>”.</p>
<figure>
<img src="images/10_nn/10_nn_5.png" alt="Figure 10.5 A perceptron with two inputs (x_0 and x_1), a weight for each input (\text{weight}_0 and \text{weight}_1) as well as a processing neuron that generates the output.">
<figcaption>Figure 10.5 A perceptron with two inputs (<span data-type="equation">x_0</span> and <span data-type="equation">x_1</span>), a weight for each input (<span data-type="equation">\text{weight}_0</span> and <span data-type="equation">\text{weight}_1</span>), as well as a processing neuron that generates the output.</figcaption>
</figure>
<p>There is a pretty significant problem in Figure 10.5, however. Let’s consider the point <span data-type="equation">(0,0)</span>. What if I send this point into the perceptron as its input: <span data-type="equation">x_0 = 0</span> and <span data-type="equation">x_1 = 0</span>? What will the sum of its weighted inputs be? No matter what the weights are, the sum will always be 0! But this can’t be right—after all, the point <span data-type="equation">(0,0)</span> could certainly be above or below various lines in this two-dimensional world.</p>
<p>To avoid this dilemma, the perceptron requires a third input, typically referred to as a <strong>bias</strong> input. A bias input always has the value of 1 and is also weighted. Here is the perceptron with the addition of the bias:</p>
<figure>
<img src="images/10_nn/10_nn_6.png" alt="Figure 10.6: Adding a “bias” input, along with its weight to the Perceptron.">
<figcaption>Figure 10.6: Adding a “bias” input, along with its weight, to the perceptron.</figcaption>
</figure>
<p>Let’s go back to the point <span data-type="equation">(0,0)</span>.</p>
<table>
<thead>
<tr>
<th>Input Value</th>
<th>Weight</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td><span data-type="equation">w_0</span></td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td><span data-type="equation">w_1</span></td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td><span data-type="equation">w_\text{bias}</span></td>
<td><span data-type="equation">w_\text{bias}</span></td>
</tr>
</tbody>
</table>
<p>The output is then the sum of the above three results: <span data-type="equation">0 + 0 + w_\text{bias}</span>. Therefore, the bias, by itself, answers the question of where <span data-type="equation">(0,0)</span> is in relation to the line. If the bias's weight is positive, then <span data-type="equation">(0,0)</span> is above the line; if negative, it is below. Its weight <strong>biases</strong> the perceptron's understanding of the line's position relative to <span data-type="equation">(0,0)</span>!</p>
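<p>To see the bias at work, here is a quick sketch (a standalone illustration, not yet part of a <code>Perceptron</code> class) of the weighted sum with the always-1 bias input. For the point <span data-type="equation">(0,0)</span>, the first two terms vanish no matter what the weights are, leaving only the bias weight. The weight values are arbitrary choices for illustration.</p>
<pre class="codesplit" data-code-language="javascript">

```javascript
// Weighted sum of a two-input point plus an always-1 bias input.
// The weight values below are arbitrary, chosen only for illustration.
function weightedSum(x0, x1, w0, w1, wBias) {
  return x0 * w0 + x1 * w1 + 1 * wBias;
}

// For (0, 0), only the bias term survives.
console.log(weightedSum(0, 0, 0.5, -1, 0.3));  // 0.3: positive, so (0,0) is above the line
console.log(weightedSum(0, 0, 0.5, -1, -0.3)); // -0.3: negative, so (0,0) is below the line
```

</pre>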
<h3 id="coding-the-perceptron">Coding the Perceptron</h3>
<p>I am now ready to assemble the code for a <code>Perceptron</code> class. The perceptron only needs to track the input weights, which I can store using an array.</p>
<pre class="codesplit" data-code-language="javascript">class Perceptron {
  constructor() {
    this.weights = [];
  }</pre>
<p>The constructor could receive an argument indicating the number of inputs (in this case three: <span data-type="equation">x_0</span>, <span data-type="equation">x_1</span>, and a bias) and size the array accordingly.</p>
<pre class="codesplit" data-code-language="javascript">  // The argument "n" determines the number of inputs (including the bias)
  constructor(n) {
    this.weights = [];
    for (let i = 0; i < n; i++) {
      //{!1} The weights are picked randomly to start.
      this.weights[i] = random(-1, 1);
    }
  }</pre>
<p>A perceptron’s job is to receive inputs and produce an output. These requirements can be packaged together in a <code>feedForward()</code> function. In this example, the perceptron's inputs are an array (which should be the same length as the array of weights), and the output is a number, <span data-type="equation">+1</span> or <span data-type="equation">-1</span>, depending on the sign returned by the activation function.</p>
<pre class="codesplit" data-code-language="javascript">  feedForward(inputs) {
    let sum = 0;
    for (let i = 0; i < this.weights.length; i++) {
      sum += inputs[i] * this.weights[i];
    }
    //{!1} Result is the sign of the sum, -1 or +1.
    // Here the perceptron is making a guess.
    // Is it on one side of the line or the other?
    return this.activate(sum);
  }</pre>
<p>I’ll note that the name “feed forward” comes from a term commonly used in neural networks to describe the process of data passing through the network. The name relates to the way the data <em>feeds</em> directly <em>forward</em> through the network, read from left to right in a neural network diagram.</p>
<p>Presumably, I could now create a <code>Perceptron</code> object and ask it to make a guess for any given point.</p>
<figure>
<img src="images/10_nn/10_nn_7.png" alt="Figure 10.7: An xy coordinate from the two-dimensional space is the input to the perceptron. ">
<figcaption>Figure 10.7: An <span data-type="equation">xy</span> coordinate from the two-dimensional space is the input to the perceptron.</figcaption>
</figure>
<pre class="codesplit" data-code-language="javascript">// Create the Perceptron.
let perceptron = new Perceptron(3);
// The input is 3 values: x, y, and bias.
let inputs = [50, -12, 1];
// The answer!
let guess = perceptron.feedForward(inputs);</pre>
<p>Did the perceptron get it right? At this point, the perceptron has no better than a 50/50 chance of arriving at the right answer. Remember, when I created it, I gave each weight a random value. A neural network is not a magic tool that can guess things correctly on its own. I need to teach it how to do so!</p>
<p>To train a neural network to answer correctly, I will use the method of <em>supervised learning</em>, which I described in section 10.1. In this method, the network is provided with inputs for which there is a known answer. This enables the network to determine if it has made a correct guess. If it is incorrect, the network can learn from its mistake and adjust its weights. The process is as follows:</p>
<ol>
<li>Provide the perceptron with inputs for which there is a known answer.</li>
<li>Ask the perceptron to guess an answer.</li>
<li>Compute the error. (Did it get the answer right or wrong?)</li>
<li>Adjust all the weights according to the error.</li>
<li>Return to Step 1 and repeat!</li>
</ol>
<p>Steps 1 through 4 can be packaged into a function. Before I can write the entire function, however, I need to examine Steps 3 and 4 in more detail. How do I define the perceptron’s error? And how should I adjust the weights according to this error?</p>
<p>The perceptron’s error can be defined as the difference between the desired answer and its guess.</p>
<div data-type="equation">\text{error} = \text{desired output} - \text{guess output}</div>
<p>Does the above formula look familiar to you? Maybe you are thinking what I’m thinking? What was that formula for a steering force again?</p>
<div data-type="equation">\text{steering} = \text{desired velocity} - \text{current velocity}</div>
<p>This is also a calculation of an error! The current velocity serves as a guess, and the error (the steering force) indicates how to adjust the velocity in the correct direction. In a moment, you will see how adjusting a vehicle's velocity to follow a target is similar to adjusting the weights of a neural network to arrive at the correct answer.</p>
<p>In the case of the perceptron, the output has only two possible values: <span data-type="equation">+1</span> or <span data-type="equation">-1</span>. This means there are only three possible errors.</p>
<p>If the perceptron guesses the correct answer, then the guess equals the desired output and the error is 0. If the correct answer is -1 and it guessed +1, then the error is -2. If the correct answer is +1 and it guessed -1, then the error is +2.</p>
<table>
<thead>
<tr>
<th>Desired</th>
<th>Guess</th>
<th>Error</th>
</tr>
</thead>
<tbody>
<tr>
<td><span data-type="equation">-1</span></td>
<td><span data-type="equation">-1</span></td>
<td><span data-type="equation">0</span></td>
</tr>
<tr>
<td><span data-type="equation">-1</span></td>
<td><span data-type="equation">+1</span></td>
<td><span data-type="equation">-2</span></td>
</tr>
<tr>
<td><span data-type="equation">+1</span></td>
<td><span data-type="equation">-1</span></td>
<td><span data-type="equation">+2</span></td>
</tr>
<tr>
<td><span data-type="equation">+1</span></td>
<td><span data-type="equation">+1</span></td>
<td><span data-type="equation">0</span></td>
</tr>
</tbody>
</table>
<p>The error is the determining factor in how the perceptron’s weights should be adjusted. For any given weight, what I am looking to calculate is the change in weight, often called <span data-type="equation">\Delta\text{weight}</span> (or “delta” weight, delta being the Greek letter <span data-type="equation">\Delta</span>).</p>
<div data-type="equation">\text{new weight} = \text{weight} + \Delta\text{weight}</div>
<p><span data-type="equation">\Delta\text{weight}</span> is calculated as the error multiplied by the input.</p>
<div data-type="equation">\Delta\text{weight} = \text{error} \times \text{input}</div>
<p>Therefore:</p>
<div data-type="equation">\text{new weight} = \text{weight} + \text{error} \times \text{input}</div>
<p>To understand why this works, I will again return to steering. A steering force is essentially an error in velocity. By applying a steering force as an acceleration (or <span data-type="equation">\Delta\text{velocity}</span>), the velocity is adjusted to move in the correct direction. This is what I want to do with the neural network’s weights. I want to adjust them in the right direction, as defined by the error.</p>
<p>With steering, however, I had an additional variable that controlled the vehicle’s ability to steer: the <em>maximum force</em>. A high maximum force allowed the vehicle to accelerate and turn quickly, while a lower force resulted in a slower velocity adjustment. The neural network will use a similar strategy with a variable called the "learning constant."</p>
<div data-type="equation">\text{new weight} = \text{weight} + (\text{error} \times \text{input}) \times \text{learning constant}</div>
<p>Note that a high learning constant causes the weight to change more drastically. This may help the perceptron arrive at a solution more quickly, but it also increases the risk of overshooting the optimal weights. A small learning constant will adjust the weights more slowly and require more training time, but allow the network to make the small refinements that could improve overall accuracy.</p>
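<p>To make the rule concrete, here's a quick arithmetic check in plain JavaScript (the helper function and the numbers are mine, just for illustration; they aren't part of the <code>Perceptron</code> class):</p>

```javascript
// One application of the perceptron's update rule:
// new weight = weight + (error * input) * learning constant
function updateWeight(weight, error, input, learningConstant) {
  return weight + error * input * learningConstant;
}

// Suppose a weight of 0.5 contributed to a guess of +1 when the
// answer was -1 (an error of -2), with an input of 0.3 and a
// learning constant of 0.01.
let adjusted = updateWeight(0.5, -2, 0.3, 0.01);
// The weight nudges down slightly, to roughly 0.494.
```

<p>Note how an error of 0 leaves the weight untouched: a correct guess requires no adjustment at all.</p>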
<p>Assuming the addition of a <code>this.learningConstant</code> property to the <code>Perceptron</code> class, I can now write a training function for the perceptron following the above steps.</p>
<pre class="codesplit" data-code-language="javascript">// Step 1: Provide the inputs and known answer.
// These are passed in as arguments to train().
train(inputs, desired) {
  // Step 2: Guess according to those inputs.
  let guess = this.feedforward(inputs);
  // Step 3: Compute the error (difference between desired and guess).
  let error = desired - guess;
  //{!3} Step 4: Adjust all the weights according to the error and learning constant.
  for (let i = 0; i < this.weights.length; i++) {
    this.weights[i] += error * inputs[i] * this.learningConstant;
  }
}</pre>
<p>Here’s the <code>Perceptron</code> class as a whole.</p>
<pre class="codesplit" data-code-language="javascript">class Perceptron {
  constructor(n, learningRate) {
    //{!2} The Perceptron stores its weights and learning constant.
    this.weights = [];
    this.learningConstant = learningRate;
    //{!3} Weights start off random.
    for (let i = 0; i < n; i++) {
      this.weights[i] = random(-1, 1);
    }
  }

  //{!7} Return an output based on inputs.
  feedforward(inputs) {
    let sum = 0;
    for (let i = 0; i < this.weights.length; i++) {
      sum += inputs[i] * this.weights[i];
    }
    return this.activate(sum);
  }

  // Output is a +1 or -1.
  activate(sum) {
    if (sum > 0) {
      return 1;
    } else {
      return -1;
    }
  }

  //{!7} Train the network against known data.
  train(inputs, desired) {
    let guess = this.feedforward(inputs);
    let error = desired - guess;
    for (let i = 0; i < this.weights.length; i++) {
      this.weights[i] += error * inputs[i] * this.learningConstant;
    }
  }
}</pre>
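<p>Before building a full p5.js sketch around the class, here's a standalone way to convince yourself that the training loop converges. This is my own test harness, not part of the book's examples: it re-implements the class in plain JavaScript, swapping p5.js's <code>random()</code> for a small deterministic generator, and trains against the line <span data-type="equation">y = 2x + 1</span> with points kept in the range -1 to 1.</p>

```javascript
// A deterministic stand-in for p5.js's random(-1, 1), so this
// check produces the same result on every run
let seed = 42;
function pseudoRandom() {
  // Linear congruential generator, scaled to the range -1 to 1
  seed = (seed * 1664525 + 1013904223) % 4294967296;
  return (seed / 4294967296) * 2 - 1;
}

// The same algorithm as the Perceptron class, minus p5.js
class TestPerceptron {
  constructor(n, learningRate) {
    this.learningConstant = learningRate;
    this.weights = [];
    for (let i = 0; i < n; i++) {
      this.weights[i] = pseudoRandom();
    }
  }

  feedforward(inputs) {
    let sum = 0;
    for (let i = 0; i < this.weights.length; i++) {
      sum += inputs[i] * this.weights[i];
    }
    return sum > 0 ? 1 : -1;
  }

  train(inputs, desired) {
    let error = desired - this.feedforward(inputs);
    for (let i = 0; i < this.weights.length; i++) {
      this.weights[i] += error * inputs[i] * this.learningConstant;
    }
  }
}

// The known answer: -1 above the line y = 2x + 1, +1 below
function known(x, y) {
  return y < 2 * x + 1 ? -1 : 1;
}

// Train on 15,000 points, then measure accuracy on 1,000 new ones.
let p = new TestPerceptron(3, 0.01);
for (let i = 0; i < 15000; i++) {
  let x = pseudoRandom();
  let y = pseudoRandom();
  p.train([x, y, 1], known(x, y));
}

let correct = 0;
for (let i = 0; i < 1000; i++) {
  let x = pseudoRandom();
  let y = pseudoRandom();
  if (p.feedforward([x, y, 1]) === known(x, y)) correct++;
}
```

<p>After training, the perceptron should classify the vast majority of new points correctly, despite never being told the line's formula. Any points it gets wrong tend to sit very close to the line itself.</p>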
<p>To train the perceptron, I need a set of inputs with a known answer. Now the question becomes, how do I pick a point and know whether it is above or below a line? Let’s start with the formula for a line, where <span data-type="equation">y</span> is calculated as a function of <span data-type="equation">x</span>:</p>
<div data-type="equation">y = f(x)</div>
<p>In generic terms, a line can be described as:</p>
<div data-type="equation">y = ax + b</div>
<p>Here’s a specific example:</p>
<div data-type="equation">y = 2x + 1</div>
<p>I can then write a function with this in mind.</p>
<pre class="codesplit" data-code-language="javascript">// A function to calculate y based on x along a line
function f(x) {
  return 2 * x + 1;
}</pre>
<p>So, if I make up a point:</p>
<pre class="codesplit" data-code-language="javascript">let x = random(width);
let y = random(height);</pre>
<p>How do I know if this point is above or below the line? The line function <span data-type="equation">f(x)</span> returns the <span data-type="equation">y</span> value on the line for that <span data-type="equation">x</span> position. Let’s call that <span data-type="equation">y_\text{line}</span>.</p>
<pre class="codesplit" data-code-language="javascript">// The y position on the line
let yline = f(x);</pre>
<p>If the <span data-type="equation">y</span> value I am examining is above the line, it will be less than <span data-type="equation">y_\text{line}</span>.</p>
<figure>
<img src="images/10_nn/10_nn_8.png" alt="Figure 10.8: If y is less than y_\text{line} then it is above the line. Note this is only true for a p5.js canvas where the y axis points down in the positive direction.">
<figcaption>Figure 10.8: If <span data-type="equation">y</span> is less than <span data-type="equation">y_\text{line}</span> then it is above the line. Note this is only true for a p5.js canvas where the y axis points down in the positive direction.</figcaption>
</figure>
<pre class="codesplit" data-code-language="javascript">// Start with the value of +1
let desired = 1;
if (y < yline) {
  //{!1} The answer is -1 if y is above the line.
  desired = -1;
}</pre>
<p>I can then make an inputs array to go with the <code>desired</code> output.</p>
<pre class="codesplit" data-code-language="javascript">// Don't forget to include the bias!
let trainingInputs = [x, y, 1];</pre>
<p>Assuming that I have a <code>perceptron</code> variable, I can train it by providing the inputs along with the desired answer.</p>
<pre class="codesplit" data-code-language="javascript">perceptron.train(trainingInputs, desired);</pre>
<p>Now, it’s important to remember that this is just a demonstration. Remember the Shakespeare-typing monkeys? I asked the genetic algorithm to solve for “to be or not to be”—an answer I already knew. I did this to make sure the genetic algorithm worked properly. The same reasoning applies to this example. I don’t need a perceptron to tell me whether a point is above or below a line; I can do that with simple math. By using an example that I can easily solve without a perceptron, I can both demonstrate the algorithm of the perceptron and verify that it is working properly.</p>
<p>Let’s look at the perceptron trained with an array of many points.</p>
<div data-type="example">
|
||
<h3 id="example-101-the-perceptron">Example 10.1: The Perceptron</h3>
|
||
<figure>
|
||
<div data-type="embed" data-p5-editor="https://editor.p5js.org/natureofcode/sketches/sMozIaMCW" data-example-path="examples/10_nn/10_1_perceptron_with_normalization"><img src="examples/10_nn/10_1_perceptron_with_normalization/screenshot.png"></div>
|
||
<figcaption></figcaption>
|
||
</figure>
|
||
</div>
|
||
<pre class="codesplit" data-code-language="javascript">// The Perceptron
let perceptron;
//{!1} 2,000 training points
let training = [];
// A counter to track training points one by one
let count = 0;

//{!3} The formula for a line
function f(x) {
  return 2 * x + 1;
}

function setup() {
  createCanvas(640, 240);

  // The perceptron has 3 inputs (including the bias) and a learning constant of 0.01.
  perceptron = new Perceptron(3, 0.01);

  //{!1} Make 2,000 training points.
  for (let i = 0; i < 2000; i++) {
    let x = random(-width / 2, width / 2);
    let y = random(-height / 2, height / 2);
    //{!2} Is the correct answer 1 or -1?
    let desired = 1;
    if (y < f(x)) {
      desired = -1;
    }
    training[i] = {
      input: [x, y, 1],
      output: desired,
    };
  }
}

function draw() {
  background(255);
  translate(width / 2, height / 2);

  //{!1} For animation, we are training one point at a time.
  perceptron.train(training[count].input, training[count].output);
  count = (count + 1) % training.length;

  for (let i = 0; i < count; i++) {
    stroke(0);
    let guess = perceptron.feedforward(training[i].input);
    //{!2} Show the classification—no fill for -1, black for +1.
    if (guess > 0) noFill();
    else fill(0);
    ellipse(training[i].input[0], training[i].input[1], 8, 8);
  }
}</pre>
<p>One more note about this example: the training points here are raw pixel coordinates spanning hundreds of units. Neural networks generally behave better when inputs are “normalized” to a small, consistent range such as -1 to 1. I’ll come back to normalization later in this chapter when preparing data for machine learning with ml5.js.</p>
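<p>Neural network inputs often work best when normalized to a range like -1 to 1, and the core idea is just a linear mapping from one range to another, much like p5.js's <code>map()</code> function. Here's a sketch of that idea in plain JavaScript (the function name and the 640-pixel range are my own illustrative choices):</p>

```javascript
// Map a value from the range [min, max] to the range [-1, 1]
function normalize(value, min, max) {
  // Scale to 0-1 first, then stretch and shift to -1 to 1
  return ((value - min) / (max - min)) * 2 - 1;
}

// A pixel position on a 640-pixel-wide canvas
let normalized = normalize(480, 0, 640); // 0.5
```

<p>With every input kept in the same small range, no single input can dominate the weighted sum simply because its raw values happen to be larger.</p>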
<div data-type="exercise">
<h3 id="exercise-101">Exercise 10.1</h3>
<p>Instead of using the supervised learning model above, can you train the neural network to find the right weights by using a genetic algorithm?</p>
</div>
<div data-type="exercise">
<h3 id="exercise-102">Exercise 10.2</h3>
<p>Visualize the perceptron itself. Draw the inputs, the processing node, and the output.</p>
</div>
<h2 id="its-a-network-remember">It’s a “Network,” Remember?</h2>
<p>Yes, a perceptron can have multiple inputs, but it is still a lonely neuron. The power of neural networks comes in the networking itself. Perceptrons are, sadly, incredibly limited in their abilities. If you read an AI textbook, it will say that a perceptron can only solve <strong>linearly separable</strong> problems. What’s a linearly separable problem? Let’s take a look at the first example, which determined whether points were on one side of a line or the other.</p>
<figure>
<img src="images/10_nn/10_nn_9.png" alt="Figure 10.9: On the left, a collection of points that is linearly separable. On the right, non-linearly separable data where a curve is required to separate the points.">
<figcaption>Figure 10.9: On the left, a collection of points that is linearly separable. On the right, non-linearly separable data where a curve is required to separate the points.</figcaption>
</figure>
<p>On the left of Figure 10.9 is an example of classic linearly separable data. Graph all of the possibilities; if you can classify the data with a straight line, then it is linearly separable. On the right, however, is non-linearly separable data. You can’t draw a straight line to separate the black dots from the gray ones.</p>
<p>One of the simplest examples of a non-linearly separable problem is <em>XOR</em>, or “exclusive or.” I’m guessing, as someone who works with coding and p5.js, you are familiar with a logical <span data-type="equation">\text{AND}</span>. For <span data-type="equation">A \text{ AND } B</span> to be true, both <span data-type="equation">A</span> and <span data-type="equation">B</span> must be true. With <span data-type="equation">\text{OR}</span>, either <span data-type="equation">A</span> or <span data-type="equation">B</span> can be true for <span data-type="equation">A \text{ OR } B</span> to evaluate as true. These are both linearly separable problems. Let’s look at the solution space, a “truth table.”</p>
<figure>
<img src="images/10_nn/10_nn_10.png" alt="Figure 10.10: Truth tables for the AND and OR logical operators; the true and false outputs can be separated by a line.">
<figcaption>Figure 10.10: Truth tables for the AND and OR logical operators; the true and false outputs can be separated by a line.</figcaption>
</figure>
<p>See how you can draw a line to separate the true outputs from the false ones?</p>
<p><span data-type="equation">\text{XOR}</span> (“exclusive” or) is the equivalent of <span data-type="equation">\text{OR}</span> AND <span data-type="equation">\text{NOT AND}</span>. In other words, <span data-type="equation">A \text{ XOR } B</span> only evaluates to true if exactly one of them is true. If both are false or both are true, the result is false. Take a look at the following truth table.</p>
<figure>
<img src="images/10_nn/10_nn_11.png" alt="Figure 10.11: Truth table for XOR (“exclusive or”); the true and false outputs cannot be separated by a single line.">
<figcaption>Figure 10.11: Truth table for XOR (“exclusive or”); the true and false outputs cannot be separated by a single line.</figcaption>
</figure>
<p>This is not linearly separable. Try to draw a straight line to separate the true outputs from the false ones—you can’t!</p>
<p>So perceptrons can’t even solve something as simple as <span data-type="equation">\text{XOR}</span>. But what if we made a network out of two perceptrons? If one perceptron can solve <span data-type="equation">\text{OR}</span> and one perceptron can solve <span data-type="equation">\text{NOT AND}</span>, then two perceptrons combined can solve <span data-type="equation">\text{XOR}</span>.</p>
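<p>To see this in action, here's a plain JavaScript sketch (not the book's <code>Perceptron</code> class) with hand-picked weights rather than trained ones. One neuron computes OR, one computes NOT AND, and one more output neuron combines their results with AND, exactly the arrangement of a tiny two-layer network. True and false are encoded as +1 and -1 to match the activation function from earlier.</p>

```javascript
// A single neuron: weighted sum plus a bias, then the same
// +1 / -1 activation used by the perceptron
function neuron(inputs, weights, bias) {
  let sum = bias;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  return sum > 0 ? 1 : -1;
}

// Hand-picked weights; no training required for this demonstration
const OR = (a, b) => neuron([a, b], [1, 1], 1);
const NOTAND = (a, b) => neuron([a, b], [-1, -1], 1);
const AND = (a, b) => neuron([a, b], [1, 1], -1);

// XOR: feed the outputs of the first layer into one more neuron.
const XOR = (a, b) => AND(OR(a, b), NOTAND(a, b));

// XOR(1, -1) and XOR(-1, 1) return 1; XOR(1, 1) and XOR(-1, -1) return -1
```

<p>No single <code>neuron()</code> call can reproduce this truth table, but layering them makes it trivial, which is exactly the point of a multi-layered network.</p>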
<figure>
<img src="images/10_nn/10_nn_12.png" alt="Figure 10.12: A multi-layered perceptron, same inputs and output as the simple Perceptron, but now including a hidden layer of neurons.">
<figcaption>Figure 10.12: A multi-layered perceptron, same inputs and output as the simple Perceptron, but now including a hidden layer of neurons.</figcaption>
</figure>
<p>The above diagram is known as a <em>multi-layered perceptron</em>, a network of many neurons. Some are input neurons and receive the inputs, some are part of what’s called a “hidden” layer (as they are connected to neither the inputs nor the outputs of the network directly), and then there are the output neurons, from which the results are read.</p>
<p>Training these networks is more complex. With the simple perceptron, you could easily evaluate how to change the weights according to the error. But here there are so many different connections, each in a different layer of the network. How does one know how much each neuron or connection contributed to the overall error of the network?</p>
<p>The solution to optimizing the weights of a multi-layered network is known as <strong>backpropagation</strong>. In this process, the output of the network is generated in the same manner as a perceptron. The inputs multiplied by the weights are summed and fed forward through the network. The difference here is that they pass through additional layers of neurons before reaching the output. Training the network (i.e., adjusting the weights) also involves taking the error (desired result - guess). The error, however, must be fed backward through the network. The final error ultimately adjusts the weights of all the connections.</p>
<p>Backpropagation is beyond the scope of this book and involves a variety of different activation functions (one classic example is the “sigmoid” function) as well as some calculus. If you are interested in continuing down this road and learning more about how backpropagation works, you can find <a href="https://github.com/CodingTrain/Toy-Neural-Network-JS">my “toy neural network” project at github.com/CodingTrain</a> with links to accompanying video tutorials. They go through all the steps of solving <span data-type="equation">\text{XOR}</span> using a multi-layered feed-forward network with backpropagation. For this chapter, however, I’d like to get some help and phone a friend.</p>
<h2 id="machine-learning-with-ml5js">Machine Learning with ml5.js</h2>
|
||
<p>That friend is ml5.js. Inspired by the philosophy of p5.js, ml5.js is a JavaScript library that aims to make machine learning accessible to a wide range of artists, creative coders, and students. It is built on top of TensorFlow.js, Google's open-source library that runs machine learning models directly in the browser without the need to install or configure complex environments. TensorFlow.js's low-level operations and highly technical API, however, can be intimidating to beginners. That's where ml5.js comes in, providing a friendly entry point for those who are new to machine learning and neural networks.</p>
|
||
<p>Before I get to my goal of adding a "neural network" brain to a steering agent and tying ml5.js back into the story of the book, I would like to demonstrate step-by-step how to train a neural network model with "supervised learning." There are several key terms and concepts important to cover, namely “classification”, “regression”, “inputs”, and “outputs”. By walking through the full process of a supervised learning scenario, I hope to define these terms, explore other foundational concepts, introduce the syntax of the ml5.js library, and provide the tools to train your first machine learning model with your own data.</p>
|
||
<h3 id="classification-and-regression">Classification and Regression</h3>
|
||
<p>The majority of machine learning tasks fall into one of two categories: classification and regression. Classification is probably the easier of the two to understand at the start. It involves predicting a “label” (or “category” or “class”) for a piece of data. For example, an image classifier might try to guess if a photo is of a cat or a dog and assign the corresponding label.</p>
|
||
<figure>
|
||
<img src="images/10_nn/10_nn_13.jpg" alt="Figure 10.13: CAT OR DOG OR BIRD OR MONKEY OR ILLUSTRATIONS ASSIGNED A LABEL???">
|
||
<figcaption>Figure 10.13: <strong><em>CAT OR DOG OR BIRD OR MONKEY OR ILLUSTRATIONS ASSIGNED A LABEL???</em></strong></figcaption>
|
||
</figure>
|
||
<p>This doesn’t happen by magic, however. The model must first be shown many examples of dogs and cats with the correct labels in order to properly configure the weights of all the connections. This is the supervised learning training process.</p>
|
||
<p>The classic “Hello, World” demonstration of machine learning and supervised learning is known as “MNIST”. MNIST, short for “Modified National Institute of Standards and Technology,” is a dataset that was collected and processed by Yann LeCun and Corinna Cortes (AT&T Labs) and Christopher J.C. Burges (Microsoft Research). It is widely used for training and testing in the field of machine learning and consists of 70,000 handwritten digits from 0 to 9, with each one being a 28x28 pixel grayscale image.</p>
|
||
<figure>
|
||
<img src="images/10_nn/10_nn_14.png" alt="Figure 10.14 https://commons.wikimedia.org/wiki/File:MnistExamplesModified.png">
|
||
<figcaption>Figure 10.14 <a href="https://commons.wikimedia.org/wiki/File:MnistExamplesModified.png">https://commons.wikimedia.org/wiki/File:MnistExamplesModified.png</a></figcaption>
|
||
</figure>
|
||
<p>While I won't be building a complete MNIST model with ml5.js (you could if you wanted to!), it serves as a canonical example of a training dataset for image classification: 70,000 images each assigned one of 10 possible labels. This idea of a “label” is fundamental to classification, where the output of a model involves a fixed number of discrete options. There are only 10 possible digits that the model can guess, no more and no less. After the data is used to train the model, the goal is to classify new images and assign the appropriate label.</p>
|
||
<p>Regression, on the other hand, is a machine learning task where the prediction is a continuous value, typically a floating point number. A regression problem can involve multiple outputs, but when beginning it’s often simpler to think of it as just one. Consider a machine learning model that predicts the daily electricity usage of a house based on any number of factors like number of occupants, size of house, temperature outside. Here, rather than a goal of the neural network picking from a discrete set of options, it makes more sense for the neural network to guess a number. Will the house use 30.5 kilowatt-hours of energy that day? 48.7 kWh? 100.2 kWh? The output is therefore a continuous value that the model attempts to predict.</p>
<h3 id="inputs-and-outputs">Inputs and Outputs</h3>
<p>Once the task has been determined, the next step is to finalize the configuration of inputs and outputs of the neural network. In the case of MNIST, each image is a collection of 28x28 grayscale pixels and each pixel can be represented as a single value (ranging from 0-255). The total number of pixels is <span data-type="equation">28 \times 28 = 784</span>. The grayscale value of each pixel is an input to the neural network.</p>
<figure>
<img src="images/10_nn/10_nn_15.jpg" alt="Figure 10.16: Place holder figure (borrowed and adapted from https://ml4a.github.io/ml4a/looking_inside_neural_nets/)">
<figcaption>Figure 10.16: Place holder figure (borrowed and adapted from <a href="https://ml4a.github.io/ml4a/looking_inside_neural_nets/">https://ml4a.github.io/ml4a/looking_inside_neural_nets/</a>)</figcaption>
</figure>
<p>Since there are 10 possible digits 0-9, the output of the neural network is a prediction of one of 10 labels.</p>
<figure>
<img src="images/10_nn/10_nn_16.png" alt="Figure 10.17: Place holder figure (borrowed and adapted from https://ml4a.github.io/ml4a/looking_inside_neural_nets/)">
<figcaption>Figure 10.17: Place holder figure (borrowed and adapted from <a href="https://ml4a.github.io/ml4a/looking_inside_neural_nets/">https://ml4a.github.io/ml4a/looking_inside_neural_nets/</a>)</figcaption>
</figure>
<p>Consider the regression scenario of predicting the electricity usage of a house. Let’s assume you have a table with the following data:</p>
<table>
<tbody>
<tr>
<td><strong>Occupants</strong></td>
<td><strong>Size (m²)</strong></td>
<td><strong>Temperature Outside (°C)</strong></td>
<td><strong>Electricity Usage (kWh)</strong></td>
</tr>
<tr>
<td>4</td>
<td>150</td>
<td>24</td>
<td>25.3</td>
</tr>
<tr>
<td>2</td>
<td>100</td>
<td>25.5</td>
<td>16.2</td>
</tr>
<tr>
<td>1</td>
<td>70</td>
<td>26.5</td>
<td>12.1</td>
</tr>
<tr>
<td>4</td>
<td>120</td>
<td>23</td>
<td>22.1</td>
</tr>
<tr>
<td>2</td>
<td>90</td>
<td>21.5</td>
<td>15.2</td>
</tr>
<tr>
<td>5</td>
<td>180</td>
<td>20</td>
<td>24.4</td>
</tr>
<tr>
<td>1</td>
<td>60</td>
<td>18.5</td>
<td>11.7</td>
</tr>
</tbody>
</table>
<p>In this table, the inputs to the neural network are the first three columns (occupants, size, temperature). The fourth column on the right is what the neural network is expected to guess, or the output.</p>
<figure>
<img src="images/10_nn/10_nn_17.jpg" alt="Figure 10.18: Possible network architecture for 3 inputs and 1 regression output">
<figcaption>Figure 10.18: Possible network architecture for 3 inputs and 1 regression output</figcaption>
</figure>
<h3 id="setting-up-the-neural-network-with-ml5js">Setting up the Neural Network with ml5.js</h3>
<p>In a typical machine learning scenario, the next step after establishing the inputs and outputs is to configure the architecture of the neural network. This involves specifying the number of hidden layers between the inputs and outputs, the number of neurons in each layer, which activation functions to use, and more! While all of this is possible with ml5.js, it will make its best guess and design a model for you based on the task and data.</p>
<p>As demonstrated with Matter.js and toxiclibs.js in chapter 6, you can import the ml5.js library into your <strong>index.html</strong> file.</p>
<pre class="codesplit" data-code-language="javascript"><script src="https://unpkg.com/ml5@latest/dist/ml5.min.js"></script></pre>
<p>The ml5.js library is a collection of machine learning models that can be accessed using the syntax <code>ml5.functionName()</code>. For example, to use a pre-trained model that detects hands, you can use <code>ml5.handpose()</code>. For classifying images, you can use <code>ml5.imageClassifier()</code>. While I encourage exploring all that ml5.js has to offer (I will reference some of these pre-trained models in upcoming exercise ideas), for this chapter, I will focus on only one function in ml5.js: <code>ml5.neuralNetwork()</code>, which creates an empty neural network for you to train.</p>
<p>To create a neural network, you must first create a JavaScript object that will configure the model. While there are many properties that you can set, most of them are optional, as the network will use default values. Let’s begin by specifying the “task” that you intend the model to perform: “regression” or “classification.”</p>
<pre class="codesplit" data-code-language="javascript">let options = { task: "classification" };
let classifier = ml5.neuralNetwork(options);</pre>
<p>This, however, gives ml5.js very little to go on in terms of designing the network architecture. Adding the inputs and outputs will complete the rest of the puzzle for it. In the case of MNIST, there are 784 inputs (grayscale pixel colors) and 10 possible output labels (digits “0” through “9”). This can be configured in ml5.js with a single integer for the number of inputs and an array of strings for the list of output labels.</p>
<pre class="codesplit" data-code-language="javascript">let options = {
  inputs: 784,
  outputs: ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
  task: "classification",
};
let digitClassifier = ml5.neuralNetwork(options);</pre>
<p>The electricity regression scenario involved 3 input values (occupants, size, temperature) and 1 output value (usage in kWh).</p>
<pre class="codesplit" data-code-language="javascript">let options = {
  inputs: 3,
  outputs: 1,
  task: "regression",
};
let energyPredictor = ml5.neuralNetwork(options);</pre>
<p>While the MNIST and energy predictor scenarios are useful starting points for understanding how machine learning works, it's important to note that they are simplified versions of what you might encounter in a “real-world” machine learning application. Depending on the problem, there could be significantly higher levels of complexity both in terms of the network architecture and the scale and preparation of data. Instead of a neatly packaged dataset like MNIST, you might be dealing with enormous amounts of messy data. This data might need to be processed and refined before it can be effectively used. You can think of it like organizing, washing, and chopping ingredients before you can start cooking with them.</p>
<p>The “lifecycle” of a machine learning model is typically broken down into seven steps.</p>
<ol>
<li><strong>Data Collection</strong>: Data forms the foundation of any machine learning task. This stage might involve running experiments, manually inputting values, sourcing public data, or a myriad of other methods.</li>
<li><strong>Data Preparation</strong>: Raw data often isn't in a format suitable for machine learning algorithms. It might also have duplicate or missing values, or contain outliers that skew the data. Such inconsistencies may need to be manually adjusted. Additionally, neural networks work best with “normalized” data. While this term might remind you of normalizing vectors, it's important to understand that it carries a slightly different meaning in the context of data preparation. A “normalized” vector’s length is set to a fixed value, usually 1, with the direction intact. However, data normalized for machine learning involves adjusting the values so that they fit within a specific range, generally between 0 and 1 or -1 and 1. Another key part of preparing data is separating it into distinct sets: training, validation, and testing. The training data is used to teach the model (step 4). On the other hand, the validation and testing data (the distinction is subtle, more on this later) are set aside and reserved for evaluating the model's performance (step 5).</li>
<li><strong>Choosing a Model:</strong> This step involves designing the architecture of the neural network. Different models are more suitable for certain types of data and outputs.</li>
<li><strong>Training</strong>: This step involves feeding the "training" data through the model, allowing the model to adjust the weights of the neural network based on its errors. This process is known as “optimization,” where the model tunes the weights to <em>optimize</em> for the least amount of error.</li>
<li><strong>Evaluation</strong>: Remember that “testing” data that was set aside in step 2? Since that data wasn’t used in training, it provides a means to evaluate how well the model performs on new, unseen data.</li>
<li><strong>Parameter Tuning:</strong> The training process is influenced by a set of parameters (often called “hyperparameters”), such as the "learning rate," which dictates how much the model should adjust its weights based on errors in prediction. By fine-tuning these parameters and revisiting step 4 (training), step 3 (choosing a model), or even step 2 (data preparation), you can often improve the model's performance.</li>
<li><strong>Deployment:</strong> Once the model is trained and its performance is evaluated satisfactorily, it’s time to actually use the model out in the real world with new data!</li>
</ol>
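<p>The "separating" idea in step 2 can be sketched in a few lines of plain JavaScript (an illustration of the concept, not ml5.js syntax; the function name and the 20 percent holdout are my own choices):</p>

```javascript
// Shuffle a copy of the data (Fisher-Yates), then hold out a
// fraction of it for testing; the rest is used for training.
function splitData(data, testFraction = 0.2) {
  let shuffled = data.slice();
  for (let i = shuffled.length - 1; i > 0; i--) {
    let j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  let testCount = Math.floor(shuffled.length * testFraction);
  return {
    testing: shuffled.slice(0, testCount),
    training: shuffled.slice(testCount),
  };
}
```

<p>Shuffling first matters: if the data happens to be sorted (say, all the examples of one label grouped together), slicing off the end would produce a testing set that doesn't represent the whole.</p>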
<h2 id="building-a-gesture-classifier">Building a Gesture Classifier</h2>
|
||
<p>I’d like to now follow the 7 steps outlined with an example problem well suited for p5.js and build all the code for each step using ml5.js. However, even though 7 is a truly excellent number, I think I missed a critical step. Let’s call it step 0.</p>
|
||
<ol>
|
||
<li><strong>Identify the Problem</strong>: This initial step involves defining the problem that needs solving. What is the objective? What are you trying to accomplish or predict with your machine learning model?</li>
|
||
</ol>
|
||
<p>After all, how are you supposed to collect your data without knowing what you are even trying to do? Are you predicting a number? A category? A sequence? Is it a binary choice, or are there multiple options? These considerations about your inputs (the data fed into the model) and outputs (the predictions) are critical for every other step of the machine learning journey.</p>
|
||
<p>Let’s take a crack at step 0 for an example problem of training your first machine learning model with ml5.js and p5.js. Imagine for a moment, you’re working on an interactive application that responds to a gesture, maybe that gesture is ultimately meant to be classified via body tracking, but you want to start with something much simpler—one single stroke of the mouse.</p>
|
||
<figure>
|
||
<img src="images/10_nn/10_nn_18.jpg" alt="[POSSIBLE ILLUSTRATION OF A SINGLE MOUSE SWIPE AS A GESTURE: basically can the paragraph below be made into a drawing?]">
|
||
<figcaption><strong><em>[POSSIBLE ILLUSTRATION OF A SINGLE MOUSE SWIPE AS A GESTURE: basically can the paragraph below be made into a drawing?]</em></strong></figcaption>
|
||
</figure>
|
||
<p>Each gesture could be recorded as a vector (extending from the start to the end points of a mouse movement) and the model’s task could be to predict one of four options: “up”, “down”, “left”, or “right.” Perfect! I’ve now got the objective and boiled it down into inputs and outputs!</p>
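<p>Recording one of these gestures might look like the following (a plain JavaScript sketch of the idea; in p5.js the start and end points would come from <code>mousePressed()</code> and <code>mouseReleased()</code>, and the function name is my own):</p>

```javascript
// Turn a mouse drag into a direction vector with x and y
// components between -1 and 1
function gestureVector(startX, startY, endX, endY) {
  let dx = endX - startX;
  let dy = endY - startY;
  // Dividing by the magnitude normalizes the vector
  // (this assumes the start and end points are not identical)
  let magnitude = Math.sqrt(dx * dx + dy * dy);
  return { x: dx / magnitude, y: dy / magnitude };
}

// A drag from (100, 100) to (300, 100) points to the right
let v = gestureVector(100, 100, 300, 100); // { x: 1, y: 0 }
```

<p>Normalizing here conveniently keeps the model's inputs in the -1 to 1 range, regardless of how long the mouse stroke was.</p>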
<h3 id="data-collection-and-preparation">Data Collection and Preparation</h3>
<p>Next, I’ve got steps 1 and 2: data collection and preparation. Here, I’d like to take the approach of ordering a machine learning “meal-kit,” where the ingredients (data) come pre-portioned and prepared. This way, I’ll get straight to the cooking itself, the process of training the model. After all, this is really just an appetizer for what will be the ultimate meal later in this chapter when I get to applying neural networks to steering agents.</p>
<p>For this step, I’ll hard-code the data itself and manually keep it normalized within a range of -1 and 1. Here it is directly written into the code, rather than loaded from a separate file. It is organized into an array of objects, pairing the <span data-type="equation">x,y</span> components of a vector with a string label.</p>
<pre class="codesplit" data-code-language="javascript">let data = [
  { x: 0.99, y: 0.02, label: "right" },
  { x: 0.76, y: -0.1, label: "right" },
  { x: -1.0, y: 0.12, label: "left" },
  { x: -0.9, y: -0.1, label: "left" },
  { x: 0.02, y: 0.98, label: "down" },
  { x: -0.2, y: 0.75, label: "down" },
  { x: 0.01, y: -0.9, label: "up" },
  { x: -0.1, y: -0.8, label: "up" },
];</pre>
<p>In truth, it would likely be better to collect example data by asking users to perform specific gestures and recording their inputs, or by creating synthetic data that represents the idealized versions of the gestures I want the model to recognize. In either case, the key is to collect a diverse set of examples that adequately represent the variations in how the gestures might be performed. But let’s see how it goes with just a few servings of data.</p>
|
||
<div data-type="exercise">
<h3 id="exercise-103">Exercise 10.3</h3>
<p>
Create a p5.js sketch that collects gesture data from users and saves it to a JSON file. You can use <code>mousePressed()</code> and <code>mouseReleased()</code> to mark the start and end of each gesture and <code>saveJSON()</code> to download the data into a file.
<em>JSON (JavaScript Object Notation) and CSV (Comma-Separated Values) are two popular formats for storing and loading data. JSON stores data in key-value pairs and follows the exact same format as JavaScript objects. CSV is a file format that stores “tabular” data (like a spreadsheet). There are numerous other data formats you could use depending on your needs and what programming environment you are working with.</em>
</p>
</div>
<p>I’ll also note that, much like some of the genetic algorithm demonstrations in Chapter 9, I am selecting a problem here that has a known solution and could have been solved more easily and efficiently without a neural network. The direction of a vector can be classified with the <code>heading()</code> function and a series of <code>if</code> statements! However, by using this seemingly trivial scenario, I hope to explain the process of training a machine learning model in an understandable and friendly way. Additionally, it will make it easy to check that the code is working as expected! When I’m done, I’ll provide some ideas about how to expand the classifier to a scenario where <code>if</code> statements would not apply.</p>
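<p>To make that known solution concrete, here is a sketch of the if-statement classifier in plain JavaScript, using <code>Math.atan2()</code> in place of p5’s <code>heading()</code> (the function name <code>classifyDirection</code> is my own):</p>

```javascript
// Classify a 2D direction vector into one of four labels by its angle.
// Math.atan2() returns the heading in radians, from -PI to PI.
// Note: in p5's coordinate system, positive y points down the canvas.
function classifyDirection(x, y) {
  let angle = Math.atan2(y, x);
  let quarter = Math.PI / 4;
  // Within 45 degrees of the positive x-axis
  if (angle > -quarter && angle < quarter) return "right";
  // Between 45 and 135 degrees (pointing down the canvas)
  if (angle >= quarter && angle <= 3 * quarter) return "down";
  // Between -45 and -135 degrees (pointing up the canvas)
  if (angle <= -quarter && angle >= -3 * quarter) return "up";
  return "left";
}

console.log(classifyDirection(0.99, 0.02)); // "right"
```

<p>Every record in the hard-coded dataset agrees with this function, which is exactly what makes the scenario easy to verify.</p>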
<h3 id="choosing-a-model">Choosing a Model</h3>
<p>This is where I am going to let ml5.js do the heavy lifting for me. To create the model with ml5.js, all I need to do is specify the task, the inputs, and the outputs!</p>
<pre class="codesplit" data-code-language="javascript">let options = {
  task: "classification",
  inputs: 2,
  outputs: ["up", "down", "left", "right"],
  debug: true
};
let classifier = ml5.neuralNetwork(options);</pre>
<p>That's it! I'm done! Thanks to ml5.js, I can bypass a host of complexities related to the manual configuration and setup of the model. This includes decisions about the network architecture, such as how many layers and neurons per layer to have, the kind of activation functions to use, and the setup of algorithms for training the network. Keep in mind that the default model architecture selected by ml5.js may not be perfect for all cases. I encourage you to read the ml5.js reference for additional details on how to customize the model.</p>
<p>I’ll also point out that ml5.js is able to infer the inputs and outputs from the data itself, so those properties are not entirely necessary to include here in the <code>options</code> object. However, for the sake of clarity (and since I’ll need to specify those for later examples), I’m including them here.</p>
<p>The <code>debug</code> property, when set to <code>true</code>, enables a visual interface for the training process. It’s a helpful tool for spotting potential issues during training and for getting a better understanding of what's happening behind the scenes.</p>
<h3 id="training">Training</h3>
<p>Now that I have the data and a neural network initialized in the <code>classifier</code> variable, I’m ready to train the model! The thing is, I’m not really done with the data. In the “Data Collection and Preparation” section, I organized the data neatly into an array of objects, representing the <span data-type="equation">x,y</span> components of a vector paired with a string label. This format, while typical, isn't directly consumable by ml5.js for training. I need to specify which elements of the data are the inputs and which are the outputs for training the model. I could have initially organized the data into a format that ml5.js recognizes, but I'm including this extra step because it's more likely to be what happens when using a "real" dataset that has been collected or sourced elsewhere.</p>
<p>The ml5.js library offers a fair amount of flexibility in the kinds of formats it will accept; I will choose to use arrays—one for the <code>inputs</code> and one for the <code>outputs</code>.</p>
<pre class="codesplit" data-code-language="javascript">for (let i = 0; i < data.length; i++) {
  let item = data[i];
  // An array of 2 numbers for the inputs
  let inputs = [item.x, item.y];
  // A single string "label" for the output
  let outputs = [item.label];
  //{!1} Add the training data to the classifier
  classifier.addData(inputs, outputs);
}</pre>
<p>A term you will often hear when talking about data in machine learning is “shape.” What is the “shape” of your data?</p>
<p>The "shape" of data in machine learning describes its dimensions and structure. It indicates how the data is organized in terms of rows, columns, and potentially even deeper, into additional dimensions. In the context of machine learning, understanding the shape of your data is crucial because it determines how the model should be structured.</p>
<p>Here, the input data's shape is a one-dimensional array containing 2 numbers (representing x and y). The output data, similarly, is an array but just contains a single string label. While this is a very small and simple example, it nicely mirrors many real-world scenarios where input features are numerically represented in an array, and outputs are string labels.</p>
<p>Oh dear, another term to unpack—features! In machine learning, the individual pieces of information used to make predictions are often called <strong>features</strong>. The term “feature” is chosen because it underscores the distinct characteristics of the data that are most salient for the prediction. This will come into focus more clearly in future examples in this chapter.</p>
<p>After passing the data into the <code>classifier</code>, ml5.js provides a helper function to normalize it.</p>
<pre class="codesplit" data-code-language="javascript">// Normalize the data
classifier.normalizeData();</pre>
<p>As I’ve mentioned, normalizing data (adjusting the scale to a standard range) is a critical step in the machine learning process. However, if you recall the data collection process, the hand-coded data was written with values that already range between -1 and 1. So, while calling <code>normalizeData()</code> here is likely redundant, it’s worth demonstrating. Normalizing your data as part of the pre-processing step will absolutely work, but the auto-normalization feature of ml5.js is quite a convenient alternative.</p>
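<p>To demystify what <code>normalizeData()</code> does conceptually, here is a sketch of the underlying min-max mapping (the helper name <code>normalize</code> is my own; ml5.js computes the minimum and maximum from the data itself):</p>

```javascript
// Map a value from the range [min, max] into [-1, 1],
// the same range used by the hand-coded gesture data.
function normalize(value, min, max) {
  return ((value - min) / (max - min)) * 2 - 1;
}

// For example, a pixel position on a 640-pixel-wide canvas:
console.log(normalize(0, 0, 640));   // -1
console.log(normalize(320, 0, 640)); //  0
console.log(normalize(640, 0, 640)); //  1
```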
<p>Ok, this subsection is called training. So now it’s time to train! Here’s the code:</p>
<pre class="codesplit" data-code-language="javascript">// The "train" method initiates the training process
classifier.train(finishedTraining);

// A callback function for when the training is complete
function finishedTraining() {
  console.log("Training complete!");
}</pre>
<p>Yes, that’s it! After all, the hard work has already been completed! The data was collected, prepared, and fed into the model. However, if I were to run the above code and then test the model, the results would probably be inadequate. Here is where it’s important to introduce another key term in machine learning: <strong>epoch.</strong> The <code>train()</code> method tells the neural network to start the learning process. But how long should it train for? You can think of an epoch as one round of practice, one cycle of using the entire dataset to update the weights of the neural network. Generally speaking, the longer you train, the better the network will perform, but at a certain point there are diminishing returns. The number of epochs can be set by passing an options object into <code>train()</code>.</p>
<pre class="codesplit" data-code-language="javascript">//{!1} Setting the number of epochs for training
let options = { epochs: 200 };
classifier.train(options, finishedTraining);</pre>
<p>There are other "hyperparameters" that you can set in the <code>options</code> variable (the learning rate is one of them!), but I'm going to stick with the defaults. You can read more about customization options in the ml5.js reference.</p>
<p>The second argument, <code>finishedTraining()</code>, is optional, but it's good to include because it's a callback that runs when the training process is complete. This is useful for knowing when you can proceed to the next steps in your code. There is even another optional callback, which I usually name <code>whileTraining()</code>, that is triggered after each epoch. However, for my purposes, knowing when the training is done is plenty!</p>
<div data-type="note">
<h3 id="callbacks">Callbacks</h3>
<p>If you've worked with p5.js, you're already familiar with the concept of a callback even if you don't know it by that name. Think of the <code>mousePressed()</code> function. You define what should happen inside it, and p5.js takes care of <em>calling</em> it at the right moment, when the mouse is pressed.</p>
<p>A callback function in JavaScript operates on a similar principle. It's a function that you provide as an argument to another function, intending for it to be “called back” at a later time. Callbacks are needed for “asynchronous” operations, where you want your code to continue animating or doing other things while waiting for another task to finish. A classic example of this in p5.js is loading data into a sketch with <code>loadJSON()</code>.</p>
<p>In JavaScript, there's also a more recent approach for handling asynchronous operations known as "Promises." With Promises, you can use keywords like <code>async</code> and <code>await</code> to make your asynchronous code look more like traditional synchronous code. While ml5.js also supports this style, I’ll stick to using callbacks to stay aligned with p5.js style.</p>
</div>
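<p>Here is a minimal callback in plain JavaScript, outside of any p5.js or ml5.js context, to illustrate the pattern (the function names are my own):</p>

```javascript
// A callback is just a function passed as an argument, for the
// receiving function to "call back" at the right moment.
function forEachNumber(numbers, callback) {
  for (let i = 0; i < numbers.length; i++) {
    callback(numbers[i]);
  }
}

let total = 0;
// The arrow function here is the callback.
forEachNumber([1, 2, 3], (n) => {
  total += n;
});
console.log(total); // 6
```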
<h3 id="evaluation">Evaluation</h3>
<p>If <code>debug</code> is set to true in the initial call to <code>ml5.neuralNetwork()</code>, once <code>train()</code> is called, a visual interface appears covering most of the p5.js page and canvas.</p>
<figure>
<img src="images/10_nn/10_nn_19.png" alt="Figure 10.19: The TensorFlow.js “visor” with a graph of the loss function and model details.">
<figcaption>Figure 10.19: The TensorFlow.js “visor” with a graph of the loss function and model details.</figcaption>
</figure>
<p>This panel, called the "Visor," represents the evaluation step, as shown in Figure 10.19. The Visor is part of TensorFlow.js and includes a graph that provides real-time feedback on the progress of the training. Let’s take a moment to focus on the "loss" plotted on the y-axis against the number of epochs along the x-axis.</p>
<p>So, what exactly is this "loss"? Loss is a measure of how far off the model's predictions are from the “correct” outputs provided by the training data. It quantifies the model’s total error. When training begins, it's common for the loss to be high because the model has yet to learn anything. As the model trains through more epochs, it should, ideally, get better at its predictions, and the loss should decrease. If the graph goes down as the epochs increase, this is a good sign!</p>
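<p>To make "loss" a little less abstract, here is a sketch of one common loss function for classification, cross-entropy (whether this is the exact loss TensorFlow.js computes behind the scenes here is an assumption on my part; the principle is the same for any loss function):</p>

```javascript
// Cross-entropy loss for a single example: the negative log of the
// probability the model assigned to the correct label. A confident,
// correct prediction yields a loss near 0; a wrong one, a large loss.
function crossEntropy(predictedProbabilities, correctIndex) {
  return -Math.log(predictedProbabilities[correctIndex]);
}

let prediction = [0.967, 0.019, 0.014, 0.0003];
// The model is 96.7% confident and correct: low loss
let lowLoss = crossEntropy(prediction, 0);
// Suppose the correct label were actually index 1: high loss
let highLoss = crossEntropy(prediction, 1);
```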
<p>Running the training for 200 epochs might strike you as a bit excessive. In a real-world scenario with more extensive data, I would probably use fewer epochs. However, because the dataset here is so tiny, the higher number of epochs helps the model get enough "practice" with the data. Remember, this is a "toy" example, aiming to make the concepts clear rather than to produce a sophisticated machine learning model.</p>
<p>Below the graph, you will find a "model summary" table that provides details on the lower-level TensorFlow.js model architecture created behind the scenes. The summary includes layer names, neuron counts per layer, and a "parameters" count, which is the total number of weights, one for each connection between two neurons.</p>
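<p>The "parameters" count in the summary can be reproduced by hand. Here is a sketch that counts one weight per connection plus one bias per neuron (the single hidden layer of 16 neurons is an assumption based on typical ml5.js defaults, not something the summary above guarantees):</p>

```javascript
// Count the weights and biases of a fully connected network,
// given the number of neurons in each layer.
function countParameters(layerSizes) {
  let total = 0;
  for (let i = 1; i < layerSizes.length; i++) {
    // Every neuron in the previous layer connects to every neuron
    // in this one, and each neuron here also has one bias.
    total += layerSizes[i - 1] * layerSizes[i] + layerSizes[i];
  }
  return total;
}

// 2 inputs -> 16 hidden neurons -> 4 output labels
console.log(countParameters([2, 16, 4])); // 116
```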
<p>Now, before moving on, I’d like to refer back to the data preparation step. There I mentioned the idea of splitting the data between “training,” “validation,” and “testing.”</p>
<ol>
<li><strong><em>training</em></strong>: the primary dataset used to train the model</li>
<li><strong><em>validation</em></strong>: a subset of the data used to check the model during training</li>
<li><strong><em>testing</em></strong>: additional untouched data never considered during the training process, used to determine the model’s final performance</li>
</ol>
<p>With ml5.js, it’s possible to incorporate all three categories of data. However, I’m simplifying things here and focusing only on the training dataset. After all, my dataset has only 8 records; it’s much too small to divide into three different sets! Using such a small dataset risks the model “overfitting” the data. Overfitting is a term that describes when a machine learning model has learned the training data <em>too well</em>. In this case, it has become so “tuned” to the specific peculiarities of the training data that it is much less effective when working with new, unseen data. The best way to combat overfitting is to use validation data during the training process! If the model performs well on the training data but poorly on the validation data, it's a strong indicator that overfitting might be occurring.</p>
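<p>If you do want to hold some data back, the split itself is simple to sketch. The 80/20 proportion below is a common convention, not an ml5.js requirement, and the function name is my own:</p>

```javascript
// Shuffle a copy of the dataset, then split off a fraction
// for validation; the rest remains for training.
function splitData(data, validationFraction) {
  let shuffled = data.slice();
  // Fisher-Yates shuffle
  for (let i = shuffled.length - 1; i > 0; i--) {
    let j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  let splitIndex = Math.floor(shuffled.length * (1 - validationFraction));
  return {
    training: shuffled.slice(0, splitIndex),
    validation: shuffled.slice(splitIndex),
  };
}

// With 10 records and a 20% validation fraction: 8 train, 2 validate
let { training, validation } = splitData([...Array(10).keys()], 0.2);
```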
<p>ml5.js provides some automatic features to employ validation data; if you are inclined to go further, you can explore the full set of neural network examples at <a href="http://ml5js.org/">ml5js.org</a>.</p>
<h3 id="parameter-tuning">Parameter Tuning</h3>
<p>After the evaluation step, there is typically an iterative process of adjusting "hyperparameters" to achieve the best performance from the model. The ml5.js library is designed to provide a higher-level, user-friendly interface to machine learning. So while it does offer some capabilities for parameter tuning (which you can explore in the reference), it is not as geared towards low-level, fine-grained adjustments as some other frameworks might be. Using TensorFlow.js directly might be your best bet since it offers a broader suite of tools and allows for lower-level control over the training process. For this demonstration—seeing a loss all the way down to 0.1 on the evaluation graph—I am satisfied with the result and happy to move on to deployment!</p>
<h3 id="deployment">Deployment</h3>
<p>This is it, all that hard work has paid off! Now it’s time to deploy the model. This typically involves integrating it into a separate application to make predictions or decisions based on new, unseen data. For this, ml5.js offers the convenience of a <code>save()</code> and <code>load()</code> function. After all, there’s no reason to re-train a model every single time you use it! You can download the model to a file in one sketch and then load it for use in a completely different one. However, for simplicity, I’m going to demonstrate deploying and utilizing the model in the same sketch where it was trained.</p>
<p>Once the training process is complete, the resulting model is saved in the <code>classifier</code> variable and is, in essence, deployed. You can detect the completion of the training process using the <code>finishedTraining()</code> callback and use a boolean variable or other logic to initiate the prediction stage of the code. For this example, I’ll include a global variable <code>status</code> to track the training process and ultimately display the predicted label on the canvas.</p>
<pre class="codesplit" data-code-language="javascript">// When the sketch starts, it will show a status of "training"
let status = "training";

function draw() {
  background(255);
  textAlign(CENTER, CENTER);
  textSize(64);
  text(status, width / 2, height / 2);
}

// This is the callback for when training is complete, and the message changes to "ready"
function finishedTraining() {
  status = "ready";
}</pre>
<p>Once the model is trained, the <code>classify()</code> method can be called to send new data into the model for prediction. The format of the data sent to <code>classify()</code> should match the format of the data used in training, in this case two floating point numbers, representing the <code>x</code> and <code>y</code> components of a direction vector.</p>
<pre class="codesplit" data-code-language="javascript">// Manually creating a vector
let direction = createVector(1, 0);
// Converting the x and y components into an input array
let inputs = [direction.x, direction.y];
// Asking the model to classify the inputs
classifier.classify(inputs, gotResults);</pre>
<p>The second argument of the <code>classify()</code> function is a callback. Although it would be more convenient to receive the results immediately and move on to the next line of code, the results are returned later through a separate callback event (just as with model loading and training).</p>
<pre class="codesplit" data-code-language="javascript">function gotResults(results) {
  console.log(results);
}</pre>
<p>The model’s prediction arrives in the argument to the callback, which I’m calling <code>results</code> in the code. Inside, you’ll find an array of the labels, sorted by “confidence.” Confidence refers to the probability assigned by the model to each label, representing how sure it is of that particular prediction. It ranges from 0 to 1, with values closer to 1 indicating higher confidence and values near 0 suggesting lower confidence.</p>
<pre class="codesplit" data-code-language="json">[
  {
    "label": "right",
    "confidence": 0.9669702649116516
  },
  {
    "label": "up",
    "confidence": 0.01878807507455349
  },
  {
    "label": "down",
    "confidence": 0.013948931358754635
  },
  {
    "label": "left",
    "confidence": 0.00029277068097144365
  }
]</pre>
<p>In the example output here, the model is highly confident (approximately 96.7%) that the correct label is "right," while it has minimal confidence in the "left" label, 0.03%. The confidence values are normalized and add up to 100%.</p>
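<p>That the confidences are all positive and sum to 1 is the signature of a "softmax" output layer. Here is a sketch of the idea (an illustration of the math, not ml5.js’s internal code; the raw scores are made up):</p>

```javascript
// Softmax turns arbitrary raw scores into probabilities:
// exponentiate each score, then divide by the sum, so the
// results are all positive and add up to exactly 1.
function softmax(scores) {
  let exps = scores.map((s) => Math.exp(s));
  let sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Hypothetical raw scores for the four direction labels
let confidences = softmax([3.0, -1.0, -1.3, -5.1]);
```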
<div data-type="example">
<h3 id="example-102-gesture-classifier">Example 10.2: Gesture Classifier</h3>
<figure>
<div data-type="embed" data-p5-editor="https://editor.p5js.org/natureofcode/sketches/SbfSv_GhM" data-example-path="examples/10_nn/10_2_gesture_classifier"></div>
<figcaption></figcaption>
</figure>
</div>
<pre class="codesplit" data-code-language="javascript">// Storing the start of a gesture when the mouse is pressed
function mousePressed() {
  start = createVector(mouseX, mouseY);
}

// Updating the end of a gesture as the mouse is dragged
function mouseDragged() {
  end = createVector(mouseX, mouseY);
}

// The gesture is complete when the mouse is released
function mouseReleased() {
  // Calculate and normalize a direction vector
  let dir = p5.Vector.sub(end, start);
  dir.normalize();
  // Convert to an inputs array and classify
  let inputs = [dir.x, dir.y];
  classifier.classify(inputs, gotResults);
}

// Store the resulting label in the status variable for showing in the canvas
function gotResults(results) {
  status = results[0].label;
}</pre>
<p>Since the array is sorted by confidence, if I just want to use a single label as the prediction, I can access the first element of the array with <code>results[0].label</code> as in the <code>gotResults()</code> function in Example 10.2.</p>
<div data-type="note">
<h3 id="exercise-104">Exercise 10.4</h3>
<p>Divide Example 10.2 into three different sketches: one for collecting data, one for training, and one for deployment. Use the <code>ml5.neuralNetwork</code> functions <code>save()</code> and <code>load()</code> for saving and loading the model to and from a file.</p>
</div>
<div data-type="note">
<h3 id="exercise-105">Exercise 10.5</h3>
<p>Expand the gesture recognition to classify a sequence of vectors, capturing more accurately the path of a longer mouse movement. Remember, your input data must have a consistent shape! You’ll have to decide how many vectors to use to represent a gesture and store no more and no less for each data point. While this approach can work, other machine learning models (such as Recurrent Neural Networks) are specifically designed to handle sequential data and might offer more flexibility and potential accuracy.</p>
</div>
<div data-type="note">
<h3 id="exercise-106">Exercise 10.6</h3>
<p><strong><em>[Exercise around hand pose classifier?]</em></strong></p>
</div>
<h2 id="reinforcement-learning">Reinforcement Learning</h2>
<p>There is so much more to working with data, machine learning, ml5.js, and beyond. I’ve only scratched the surface. As I close out this book, my goal is to tie the foundational machine learning concepts I’ve covered back into animated, interactive p5.js sketches that simulate physics and complex systems. Let’s see if I can bring as many concepts from the entire book back together for one last hurrah!</p>
<p>Towards the start of this chapter, I referenced an approach to incorporating machine learning into a simulated environment called “reinforcement learning.” Imagine embedding a neural network into any of the example objects (walker, mover, particle, vehicle) and calculating a force or some other action. The neural network could receive inputs related to the environment (such as distance to an obstacle) and produce a decision that requires a choice from a set of discrete options (e.g., move “left” or “right”) or a set of continuous values (e.g., magnitude and direction of a steering force). This is starting to sound familiar: it’s a neural network that receives inputs and performs classification or regression!</p>
<p>Here is where things take a turn, however. To better illustrate the concept, let’s start with a hopefully easy to understand and possibly familiar scenario, the game “Flappy Bird.” The game is deceptively simple. You control a small bird that continually moves horizontally across the screen. With each tap or click, the bird flaps its wings and rises upward. The challenge? A series of vertical pipes spaced apart at irregular intervals emerges from the right. The pipes have gaps, and your primary objective is to navigate the bird safely through these gaps. If you hit one, it’s game over. As you progress, the game’s speed increases, and the more pipes you navigate, the higher your score.</p>
<figure>
<img src="images/10_nn/10_nn_20.png" alt="Figure 10.x - redraw a simplified version? (from https://yourstory.com/2023/04/flappy-bird-rise-fall-viral-mobile-gaming-phenomenon)">
<figcaption>Figure 10.x - redraw a simplified version? (from <a href="https://yourstory.com/2023/04/flappy-bird-rise-fall-viral-mobile-gaming-phenomenon">https://yourstory.com/2023/04/flappy-bird-rise-fall-viral-mobile-gaming-phenomenon</a>)</figcaption>
</figure>
<p>Suppose you wanted to automate the gameplay, and instead of a human tapping, a neural network will make the decision as to whether to “flap” or not. Could machine learning work here? Skipping over the “data” steps for a moment, let’s think about “choosing a model.” What are the inputs and outputs of the neural network?</p>
<p>Let’s begin with the inputs. This is quite the intriguing question because there isn’t a definitive answer! In a scenario where you want to see if you could train an automated neural network player without any knowledge of the game itself, it might make the most sense to have the inputs be all the pixels of the game screen. Maybe you don’t want to put your thumb on the scale in terms of what aspects of the game are important. This approach attempts to feed <em>everything</em> about the game into the model.</p>
<p>As for me, I understand the Flappy Bird game quite well and believe I can identify the important data points needed to make a decision. I can bypass all the pixels and boil the essence of the game down into the important <strong>features</strong> that define it. Remember the discussion about features in the context of the gesture classifier? It applies here as well. These features are not arbitrary aspects of the game; they represent the distinct characteristics of Flappy Bird that are most salient for the neural network's decisions.</p>
<ol>
<li><span data-type="equation">y</span> position of the bird.</li>
<li><span data-type="equation">y</span> velocity of the bird.</li>
<li><span data-type="equation">y</span> position of the next pipe’s top opening.</li>
<li><span data-type="equation">y</span> position of the next pipe’s bottom opening.</li>
<li><span data-type="equation">x</span> distance to the next pipes.</li>
</ol>
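<p>Assembled into code, those five features might look something like the following sketch. The <code>bird</code> and <code>pipe</code> object shapes here are my own assumption, anticipating the classes defined later in this chapter, and dividing by the canvas dimensions keeps each input in a normalized range:</p>

```javascript
// Gather the five Flappy Bird features into one inputs array,
// scaled by the canvas dimensions to keep values roughly 0-1.
function getInputs(bird, pipe, width, height) {
  return [
    bird.y / height,           // y position of the bird
    bird.velocity / height,    // y velocity of the bird
    pipe.top / height,         // y of the next pipe's top opening
    pipe.bottom / height,      // y of the next pipe's bottom opening
    (pipe.x - bird.x) / width, // x distance to the next pipe
  ];
}

let inputs = getInputs(
  { x: 50, y: 120, velocity: 0 },
  { x: 250, top: 100, bottom: 180 },
  640,
  240
);
```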
<figure>
<img src="images/10_nn/10_nn_21.jpg" alt="Figure 10.x - with features added a simplified version? (from https://yourstory.com/2023/04/flappy-bird-rise-fall-viral-mobile-gaming-phenomenon)">
<figcaption>Figure 10.x - with features added a simplified version? (from <a href="https://yourstory.com/2023/04/flappy-bird-rise-fall-viral-mobile-gaming-phenomenon">https://yourstory.com/2023/04/flappy-bird-rise-fall-viral-mobile-gaming-phenomenon</a>)</figcaption>
</figure>
<p>These are the inputs to the neural network. But what about the outputs? Is the problem a "classification" or "regression" one? This may seem like an odd question to ask in the context of a game like Flappy Bird, but it's actually incredibly important and relates to how the game is controlled. Tapping the screen, pressing a button, or using keyboard controls are all examples of classification. After all, there is only a discrete set of choices: tap or not; press 'w', 'a', 's', or 'd' on the keyboard. On the other hand, using an analog controller like a joystick leans towards regression. A joystick can be tilted in varying degrees in any direction, translating to continuous output values for both its horizontal and vertical axes.</p>
<p>For Flappy Bird, it’s a classification decision with only two choices:</p>
<ol>
<li>flap</li>
<li>don’t flap</li>
</ol>
<figure>
<img src="images/10_nn/10_nn_22.png" alt="Figure 10.22: The neural network as ml5.js might design it">
<figcaption>Figure 10.22: The neural network as ml5.js might design it</figcaption>
</figure>
<p>This gives me the information needed to choose the model and I can let ml5.js build it.</p>
<pre class="codesplit" data-code-language="javascript">let options = {
  inputs: 5,
  outputs: ["flap", "no flap"]
};
let birdBrain = ml5.neuralNetwork(options);</pre>
<p>Now if I were to continue this line of thinking further, I’d have to go back to steps 1 and 2 of the machine learning process: data collection and preparation. How exactly would that work here? One idea would be to scour the earth for the greatest flappy bird player of all time and record them playing for hours. I could log all of the input features for every moment of gameplay along with whether the player flapped or not. Feed all that data into the model, train it, and I can see the headlines already: “Artificial Intelligence Bot Defeats Flappy Bird.”</p>
<p>But um, wait a second here, has an agent really learned to play Flappy Bird on its own or has it really just learned to mirror the play of a human? What if that human missed a key aspect of flappy bird strategy? The automated player would never discover it. Not to mention the fact that collecting all that data would be an incredibly tedious and laborious process.</p>
<p>This is where reinforcement learning comes in. Reinforcement learning is a type of machine learning where an agent learns through interacting with the environment and receiving feedback in the form of rewards or penalties. Unlike supervised learning, where the “correct” answers are provided by a training dataset, the agent in reinforcement learning learns the answers, the optimal decisions, through trial and error. For example, in Flappy Bird, the bird could receive a positive reward every time it successfully navigates a pipe, but a negative reward if it hits a pipe or the ground. The agent's goal is to figure out which actions lead to the most cumulative rewards over time.</p>
<p>At the start, the Flappy Bird agent won't know the best time to flap its wings, leading to many crashes. But as it accrues more and more feedback from countless play-throughs, it begins to refine its actions and develop the optimal strategy to navigate the pipes without crashing, maximizing its total reward. This process of "learning by doing" and optimizing based on feedback is the essence of reinforcement learning.</p>
<p>In the next section, I'll explore the principles I’m outlining here with a twist. Traditional techniques in reinforcement learning involve defining something called a “policy” and a corresponding “reward function.” Instead of going down this road, however, I will introduce a related technique that is baked into ml5.js: <strong>neuroevolution</strong>. This technique combines the evolutionary algorithms from Chapter 9 with neural networks. By evolving the weights of a neural network, I’ll demonstrate how the bird can perfect its journey through the pipes! I'll then finish off the chapter with a variation of Craig Reynolds's steering behaviors from Chapter 5 using neuroevolution.</p>
<h2 id="evolving-neural-networks-is-neat">Evolving Neural Networks is NEAT!</h2>
<p>Instead of traditional backpropagation to train the weights in a neural network, neuroevolution applies principles of genetic algorithms and natural selection: the best-performing neural networks are "selected" and their "genes" (or weights) are combined and mutated to create the next generation of networks.</p>
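<p>The mutation half of that sentence can be sketched directly on a flat array of network weights, echoing the <code>mutate()</code> functions of Chapter 9 (the uniform "small random nudge" and the clamp to [-1, 1] are my own choices, not a fixed ml5.js behavior):</p>

```javascript
// Mutate an array of neural network weights: each weight has a
// mutationRate chance of being nudged by a small random offset,
// constrained to stay within [-1, 1].
function mutate(weights, mutationRate) {
  return weights.map((w) => {
    if (Math.random() < mutationRate) {
      let nudged = w + (Math.random() * 0.2 - 0.1);
      return Math.max(-1, Math.min(1, nudged));
    }
    return w;
  });
}

// With a rate of 0, the weights pass through unchanged
let unchanged = mutate([0.5, -0.25, 1.0], 0);
```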
<p>One of the first examples of neuroevolution can be found in the 1994 paper "<a href="https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.3139">Genetic Lander: An experiment in accurate neuro-genetic control</a>" by Edmund Ronald and Marc Schoenauer. In the 1990s traditional neural network training methods were still nascent, and this work explored an alternative approach. The paper describes how a simulated spacecraft—in a game aptly named "Lunar Lander"—can learn how to safely descend and land on a surface. Rather than use hand-crafted rules or labeled datasets, the researchers opted for genetic algorithms to evolve and train neural networks over multiple generations. And it worked!</p>
<p>In 2002, Kenneth O. Stanley and Risto Miikkulainen expanded on earlier neuroevolutionary approaches with their paper titled "<a href="https://direct.mit.edu/evco/article-abstract/10/2/99/1123/Evolving-Neural-Networks-through-Augmenting?redirectedFrom=fulltext">Evolving Neural Networks Through Augmenting Topologies</a>." Unlike the lunar lander method that focused on evolving the weights of a neural network, Stanley and Miikkulainen introduced a method that also evolved the network's structure itself! The “NEAT” algorithm—NeuroEvolution of Augmenting Topologies—starts with simple networks and progressively refines their topology through evolution. As a result, NEAT can discover network architectures tailored to specific tasks, often yielding more optimized and effective solutions.</p>
<p>A comprehensive NEAT implementation would require going deeper into the neural network architecture with TensorFlow.js directly. My goal here is to emulate Ronald and Schoenauer’s research in the modern context of the web browser with ml5.js. Rather than use the lunar lander game, I’ll give this a try with Flappy Bird!</p>
<h2 id="coding-flappy-bird">Coding Flappy Bird</h2>
<p>The game Flappy Bird was created by Vietnamese game developer Dong Nguyen in 2013. In January 2014, it became the most downloaded app on the Apple App Store. However, on February 8th, Nguyen announced that he was removing the game due to its addictive nature. Since then, it has been one of the most cloned games in history. Flappy Bird is a perfect example of "Nolan's Law," an aphorism attributed to the founder of Atari and creator of Pong, Nolan Bushnell: "All the best games are easy to learn and difficult to master.”</p>
|
||
<p>Flappy Bird is also a terrific game for beginner coders to recreate as a learning exercise, and it fits perfectly with the concepts in this book. To create the game with p5.js, I’ll start with by defining a <code>Bird</code> class. Now, I’m going to do something that may shock you here, but I’m going to skip using <code>p5.Vector</code> for this demonstration and instead use separate <code>x</code> and <code>y</code> properties for the bird’s position. Since the bird only moves along the vertical axis in the game, <code>x</code> remains constant! Therefore, the <code>velocity</code> (and all of the relevant forces) can be a single scalar value for just the y-axis. To simplify things even further, I’ll add the forces directly to the bird's velocity instead of accumulating them into an acceleration variable. In addition to the usual <code>update()</code>, I’ll include a <code>flap()</code> method for the bird to fly upward. The <code>show()</code> method is not included below as it remains the same and draws only a circle.</p>
|
||
<pre class="codesplit" data-code-language="javascript">class Bird {
  constructor() {
    // The bird's position (x will be constant)
    this.x = 50;
    this.y = 120;

    // Velocity and forces are scalar since the bird only moves along the y-axis
    this.velocity = 0;
    this.gravity = 0.5;
    this.flapForce = -10;
  }

  // The bird flaps its wings
  flap() {
    this.velocity += this.flapForce;
  }

  update() {
    // Add gravity
    this.velocity += this.gravity;
    this.y += this.velocity;
    // Dampen velocity
    this.velocity *= 0.95;

    // Handle the "floor"
    if (this.y > height) {
      this.y = height;
      this.velocity = 0;
    }
  }
}</pre>
<p>The other primary elements of the game are the pipes that the bird must navigate through. I’ll create a <code>Pipe</code> class to describe a pair of rectangles, one that emanates from the top of the canvas and one from the bottom. Just as the bird only moves vertically, the pipes slide along only the horizontal axis, so the properties can also be scalar values rather than vectors. The pipes move at a constant speed and don’t experience any physics.</p>
<pre class="codesplit" data-code-language="javascript">class Pipe {
  constructor() {
    // The size of the opening between the two parts of the pipe
    this.spacing = 100;
    // A random height for the top of the pipe
    this.top = random(height - this.spacing);
    // The starting position of the bottom pipe (based on the top)
    this.bottom = this.top + this.spacing;
    // The pipe starts at the edge of the canvas
    this.x = width;
    // Width of the pipe
    this.w = 20;
    // Horizontal speed of the pipe
    this.velocity = 2;
  }

  // Draw the two pipes
  show() {
    fill(0);
    noStroke();
    rect(this.x, 0, this.w, this.top);
    rect(this.x, this.bottom, this.w, height - this.bottom);
  }

  // Update the pipe's horizontal position
  update() {
    this.x -= this.velocity;
  }
}</pre>
<p>To be clear, the "reality" depicted in the game is a bird flying through pipes. The bird is moving along two dimensions while the pipes remain stationary. However, it is simpler in terms of code to consider the bird as stationary in its horizontal position and treat the pipes as moving.</p>
<p>With a <code>Bird</code> and <code>Pipe</code> class written, I'm almost set to run the game. However, there remains a key missing piece: collisions. The whole game rides on the bird attempting to avoid the pipes! This is nothing new; you’ve seen many examples of objects checking their positions against others throughout this book.</p>
<p>Now, there's a design choice to make. A function to check collisions could logically be placed in either the <code>Bird</code> class (to check if the bird hits a pipe) or in the <code>Pipe</code> class (to check if a pipe hits the bird). Either can be justified depending on your point of view. I'll place it in the <code>Pipe</code> class and call it <code>collides()</code>.</p>
<p>It's a little trickier than you might think at first glance, as the function needs to check both the top and bottom rectangles of a pipe against the position of the bird. There are a variety of ways to approach this; one way is to first check if the bird is vertically within the bounds of either rectangle (above the bottom edge of the top pipe or below the top edge of the bottom one). But it's only actually colliding with the pipe if the bird is also horizontally within the boundaries of the pipe's width. An elegant way to write this is to combine each of these checks with a logical "and."</p>
<pre class="codesplit" data-code-language="javascript">  collides(bird) {
    // Is the bird within the vertical range of the top or bottom pipe?
    let verticalCollision = bird.y < this.top || bird.y > this.bottom;
    // Is the bird within the horizontal range of the pipes?
    let horizontalCollision = bird.x > this.x && bird.x < this.x + this.w;
    //{!1} If it's both a vertical and horizontal hit, it's a hit!
    return verticalCollision && horizontalCollision;
  }</pre>
<p>The algorithm currently treats the bird as a single point and does not take into account its size. This is something that should be improved for a more realistic version of the game.</p>
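<p>As a sketch of that improvement (not part of the example code, and the function name here is my own), the bird could be treated as a circle with a radius. A standard circle-rectangle check finds the point on the rectangle closest to the circle's center and compares its distance to the radius:</p>

```javascript
// A hypothetical size-aware collision check: the bird as a circle (cx, cy, r)
// against one rectangle (rx, ry, rw, rh), such as the top or bottom pipe
function circleRectCollides(cx, cy, r, rx, ry, rw, rh) {
  // The point on the rectangle closest to the circle's center
  let closestX = Math.max(rx, Math.min(cx, rx + rw));
  let closestY = Math.max(ry, Math.min(cy, ry + rh));
  // It's a hit if that point is within one radius of the center
  let dx = cx - closestX;
  let dy = cy - closestY;
  return dx * dx + dy * dy <= r * r;
}
```

<p>Calling this once for the top rectangle and once for the bottom one would replace the point-based check in <code>collides()</code>.</p>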
<p>All that’s left to do is write <code>setup()</code> and <code>draw()</code>. I need a single variable for the bird and an array for the list of pipes. The interaction is just a single press of the mouse. Rather than build a fully functional game with a score, end screen, and other usual elements, I’ll just make sure things are working by drawing the text “OOPS!” near any pipe when there is a collision. The code also assumes an additional <code>offscreen()</code> method in the <code>Pipe</code> class for when a pipe has moved beyond the left edge of the canvas.</p>
<div data-type="example">
<h3 id="example-103-flappy-bird-clone">Example 10.3: Flappy Bird Clone</h3>
<figure>
<div data-type="embed" data-p5-editor="https://editor.p5js.org/natureofcode/sketches/Pv-JlO0cl" data-example-path="examples/10_nn/10_3_flappy_bird"><img src="examples/10_nn/10_3_flappy_bird/screenshot.png"></div>
<figcaption></figcaption>
</figure>
</div>
<pre class="codesplit" data-code-language="javascript">let bird;
let pipes = [];

function setup() {
  createCanvas(640, 240);
  //{!2} Create a bird and start with one pipe
  bird = new Bird();
  pipes.push(new Pipe());
}

//{!3} The bird flaps its wings when the mouse is pressed
function mousePressed() {
  bird.flap();
}

function draw() {
  background(255);
  // Handle all of the pipes
  for (let i = pipes.length - 1; i >= 0; i--) {
    pipes[i].show();
    pipes[i].update();
    if (pipes[i].collides(bird)) {
      text("OOPS!", pipes[i].x, pipes[i].top + 20);
    }
    if (pipes[i].offscreen()) {
      pipes.splice(i, 1);
    }
  }
  // Update and show the bird
  bird.update();
  bird.show();
  //{!3} Add a new pipe every 75 frames
  if (frameCount % 75 == 0) {
    pipes.push(new Pipe());
  }
}</pre>
<p>The trickiest aspect of the above code lies in spawning the pipes at regular intervals with the <code>frameCount</code> variable and modulo operator <code>%</code>. In p5.js, <code>frameCount</code> is a system variable that tracks the number of frames rendered since the sketch began, incrementing with each cycle of the <code>draw()</code> loop. The modulo operator, denoted by <code><strong>%</strong></code>, returns the remainder of a division operation. For example, <code>7 % 3</code> would yield <code>1</code> because when dividing 7 by 3, the result is 2 with a remainder of 1. The boolean expression <code>frameCount % 75 == 0</code> therefore checks if the current <code>frameCount</code> value, when divided by 75, has a remainder of 0. This condition is true every 75 frames and at those frame counts, a new pipe is spawned and added to the <code>pipes</code> array.</p>
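<p>The interval logic can be verified on its own, outside of p5.js, by simulating a frame counter and recording which frames would trigger a spawn:</p>

```javascript
// Simulate 300 frames and record the ones where frameCount % 75 == 0
let spawnFrames = [];
for (let frameCount = 1; frameCount <= 300; frameCount++) {
  if (frameCount % 75 == 0) {
    spawnFrames.push(frameCount);
  }
}
// spawnFrames now holds [75, 150, 225, 300]: one new pipe every 75 frames
```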
<div data-type="note">
<h3 id="exercise-107">Exercise 10.7</h3>
<p>Implement a scoring system that awards points for successfully navigating through each set of pipes. Feel free to add your own visual design elements for the bird, pipes, and environment!</p>
</div>
<h2 id="neuroevolution-flappy-bird">Neuroevolution Flappy Bird</h2>
<p>The game, as it currently stands, is controlled by mouse clicks. The first step to implementing neuroevolution is to give each bird a brain so that it can decide on its own whether or not to flap its wings.</p>
<h3 id="the-bird-brain">The Bird Brain</h3>
<p>In the previous section on reinforcement learning, I established a list of input features that comprise the bird's decision-making process. I’m going to use that same list with one simplification. Since the size of the opening between the pipes will remain constant, there’s no need to include both the <span data-type="equation">y</span> positions of the top and bottom; one will suffice.</p>
<ol>
<li><span data-type="equation">y</span> position of the bird.</li>
<li><span data-type="equation">y</span> velocity of the bird.</li>
<li><span data-type="equation">y</span> position of the next pipe’s top (or the bottom!) opening.</li>
<li><span data-type="equation">x</span> distance to the next pipe.</li>
</ol>
<p>The outputs have just two options: to flap or not to flap! With the inputs and outputs set, I can add a <code>brain</code> property to the bird’s constructor with the appropriate configuration. Just to demonstrate a different style here, I’ll skip including a separate <code>options</code> variable and pass the properties as an object literal directly into the <code>ml5.neuralNetwork()</code> function. Note the addition of a <code>neuroEvolution</code> property set to <code>true</code>. This is necessary to enable some of the features I’ll be using later in the code.</p>
<pre class="codesplit" data-code-language="javascript">  constructor() {
    this.brain = ml5.neuralNetwork({
      // A bird's brain receives 4 inputs and classifies them into one of two labels
      inputs: 4,
      outputs: ["flap", "no flap"],
      task: "classification",
      //{!1} A new property necessary to enable neuroevolution functionality
      neuroEvolution: true
    });
  }</pre>
<p>Next, I’ll add a new method called <code>think()</code> to the <code>Bird</code> class where all of the necessary inputs for the bird are calculated. The first two are easy, as they are simply the <code>y</code> and <code>velocity</code> properties of the bird itself. However, for inputs 3 and 4, I need to determine which pipe is the “next” pipe.</p>
<p>At first glance, it might seem that the next pipe is always the first one in the array, since the pipes are added one at a time to the end of the array. However, once a pipe passes the bird, it is no longer relevant. I need to find the first pipe in the array whose right edge (x-position plus width) is greater than the bird’s x-position.</p>
<pre class="codesplit" data-code-language="javascript">  think(pipes) {
    let nextPipe = null;
    for (let pipe of pipes) {
      //{!4} The next pipe is the one that hasn't passed the bird yet.
      if (pipe.x + pipe.w > this.x) {
        nextPipe = pipe;
        break;
      }
    }</pre>
<p>Once I have the next pipe, I can create the four inputs:</p>
<pre class="codesplit" data-code-language="javascript">    let inputs = [
      // y-position of bird
      this.y,
      // y-velocity of bird
      this.velocity,
      // top opening of next pipe
      nextPipe.top,
      //{!1} distance from the next pipe to the bird
      nextPipe.x - this.x,
    ];</pre>
<p>However, I have forgotten a critical step! The range of all input values is determined by the dimensions of the canvas. The neural network, however, expects values in a standardized range, such as 0 to 1. One method to normalize these values is to divide the inputs related to vertical properties by <code>height</code>, and those related to horizontal ones by <code>width</code>.</p>
<pre class="codesplit" data-code-language="javascript">    let inputs = [
      //{!4} All of the inputs are now normalized by width and height
      this.y / height,
      this.velocity / height,
      nextPipe.top / height,
      (nextPipe.x - this.x) / width,
    ];</pre>
<p>With the inputs in hand, I’m ready to pass them to the neural network’s <code>classify()</code> method. There is, however, one small problem. Remember, <code>classify()</code> is asynchronous! This means I need to implement a callback inside the <code>Bird</code> class to process the decision! Unfortunately, doing so adds a level of complexity to the code that is entirely unnecessary here. Asynchronous callbacks with machine learning functions in ml5.js are typically necessary due to the time required to process a large amount of data in a model. Without a callback, the code might have to wait a long time, and if it’s in the context of a p5.js animation, it could severely impact the smoothness of the animation. The neural network here, however, only has four floating point inputs and two output labels! It’s tiny and can run so fast there’s no reason to implement this asynchronously.</p>
<p>For completeness, I will include a version of the example on this book’s website that implements neuroevolution with asynchronous callbacks. For the discussion here, however, I’m going to use a feature of ml5.js that allows me to take a shortcut. The method <code>classifySync()</code> is identical to <code>classify()</code>, but it runs synchronously, meaning that the code stops and waits for the results before moving on. You should be very careful when using this version of the method as it can cause problems in other contexts, but it will work well for this scenario. Here is the end of the <code>think()</code> method with <code>classifySync()</code>.</p>
<pre class="codesplit" data-code-language="javascript">    let results = this.brain.classifySync(inputs);
    if (results[0].label == "flap") {
      this.flap();
    }
  }</pre>
<p>The neural network's prediction is in the same format as the gesture classifier and the decision can be made by checking the first element of the <code>results</code> array. If the output label is <code>"flap"</code>, then call <code>flap()</code>.</p>
<p>Now is where the real challenge begins: teaching the bird to win the game and flap its wings at the right moment! Recalling the discussion of genetic algorithms from Chapter 9, there are three key principles that underpin Darwinian evolution: <strong>Variation</strong>, <strong>Selection</strong>, and <strong>Heredity</strong>. Let’s go through each of these principles, implementing all the steps of the genetic algorithm itself with neural networks.</p>
<h3 id="variation-a-flock-of-flappy-birds">Variation: A Flock of Flappy Birds</h3>
<p>A single bird with a randomly initialized neural network isn’t likely to have any success at all. That lone bird will most likely jump incessantly and fly way offscreen or sit perched at the bottom of the canvas awaiting collision after collision with the pipes. This erratic and nonsensical behavior is a reminder: a randomly initialized neural network lacks any knowledge or experience! The bird is essentially making wild guesses for its actions and success is going to be very rare.</p>
<p>This is where the first key principle of genetic algorithms comes in: <strong>variation</strong>. The hope is that by introducing as many different neural network configurations as possible, a few might perform slightly better than the rest. The very first step towards variation is to add an array of many birds.</p>
<pre class="codesplit" data-code-language="javascript">// Population size
let populationSize = 200;
// Array of birds
let birds = [];

function setup() {
  //{!3} Create the bird population
  for (let i = 0; i < populationSize; i++) {
    birds[i] = new Bird();
  }

  //{!1} Run the computations on the "cpu" for better performance
  ml5.setBackend("cpu");
}

function draw() {
  for (let bird of birds) {
    //{!1} This is the new method for the bird to make a decision to flap or not
    bird.think(pipes);
    bird.update();
    bird.show();
  }
}</pre>
<p>You might notice a peculiar line of code that's crept into <code>setup()</code>: <code>ml5.setBackend("cpu")</code>. When running neural networks, a lot of the heavy computational lifting is often offloaded to the GPU. This is the default behavior, and especially critical for larger pre-trained models included as part of ml5.js.</p>
<div data-type="note">
<h3 id="gpu-vs-cpu">GPU vs. CPU</h3>
<ul>
<li><strong>GPU (Graphics Processing Unit)</strong>: Originally designed for rendering graphics, GPUs are adept at handling a massive number of operations in parallel. This makes them excellent for the kind of math operations and computations that machine learning models frequently perform.</li>
<li><strong>CPU (Central Processing Unit)</strong>: Often considered the "brain" or general-purpose heart of a computer, a CPU handles a wider variety of tasks than the specialized GPU.</li>
</ul>
</div>
<p>But there's a catch! Transferring data to and from the GPU introduces some overhead. In most cases, the gains from the GPU's parallel processing offset this overhead. However, for such a tiny model like the one here, copying data to the GPU and back slows things down more than it helps.</p>
<p>This is where <code>ml5.setBackend("cpu")</code> comes in. By specifying <code>"cpu"</code>, the neural network computations will instead run on the “Central Processing Unit”—the general-purpose heart of your computer—which handles the operations more efficiently for a population of many tiny bird brains.</p>
<h3 id="selection-flappy-bird-fitness">Selection: Flappy Bird Fitness</h3>
<p>Once I’ve got a diverse population of birds, each with their own neural network, the next step in the genetic algorithm is <strong>selection</strong>. Which birds should pass on their genes (in this case, neural network weights) to the next generation? In the world of Flappy Bird, the measure of success is the ability to stay alive the longest by avoiding the pipes. This is the bird's "fitness." A bird that dodges many pipes is considered more "fit" than one that crashes into the first one it encounters.</p>
<p>To track the bird’s fitness, I am going to add two properties to the <code>Bird</code> class: <code>fitness</code> and <code>alive</code>.</p>
<pre class="codesplit" data-code-language="javascript">  constructor() {
    // The bird's fitness
    this.fitness = 0;
    //{!1} Keeping track if the bird is alive or not
    this.alive = true;
  }</pre>
<p>I’ll assign the fitness a numeric value that increases by 1 every cycle through <code>draw()</code>, as long as the bird remains alive. The birds that survive longer should have a higher fitness.</p>
<pre class="codesplit" data-code-language="javascript">  update() {
    //{!1} Incrementing the fitness each time through update
    this.fitness++;
  }</pre>
<p>The <code>alive</code> property is a <code>boolean</code> flag that is initially set to <code>true</code>. However, when a bird collides with a pipe, it is set to <code>false</code>. Only birds that are still alive are updated and drawn to the canvas.</p>
<pre class="codesplit" data-code-language="javascript">function draw() {
  // There is now an array of birds!
  for (let bird of birds) {
    //{!1} Only operate on the birds that are still alive
    if (bird.alive) {
      // Make a decision based on the pipes
      bird.think(pipes);
      // Update and show the bird
      bird.update();
      bird.show();

      //{!4} Has the bird hit a pipe? If so, it's no longer alive.
      for (let pipe of pipes) {
        if (pipe.collides(bird)) {
          bird.alive = false;
        }
      }
    }
  }
}</pre>
<p>In Chapter 9, I demonstrated two techniques for running an evolutionary simulation. The first involved a population living for a fixed amount of time each generation. The same approach would likely work here as well, but I want to allow the birds to accumulate the highest fitness possible and not arbitrarily stop them based on a time limit. The second technique, demonstrated with the "bloops" example, involved eliminating the fitness score entirely and setting a random probability for cloning the birds that are still alive. However, this approach could become messy and risks overpopulation or all the birds dying out completely. Instead, I propose combining elements of both approaches. I will allow a generation to continue as long as at least one bird is still alive. When all the birds have died, I will select parents for the reproduction step and start anew.</p>
<p>Let’s begin by writing a function to check if all the birds have died.</p>
<pre class="codesplit" data-code-language="javascript">function allBirdsDead() {
  for (let bird of birds) {
    //{!3} If a single bird is alive, they are not all dead!
    if (bird.alive) {
      return false;
    }
  }
  //{!1} If the loop completes without finding a living bird, they are all dead
  return true;
}</pre>
<p>When all the birds have died, then it’s time for selection! In the previous genetic algorithm examples I demonstrated a technique for giving a fair shot to all members of a population, but increasing the chances of selection for those with higher fitness scores. I’ll use that same <code>weightedSelection()</code> function here.</p>
<pre class="codesplit" data-code-language="javascript">//{!1} See chapter 9 for a detailed explanation of this algorithm
|
||
function weightedSelection() {
|
||
let index = 0;
|
||
let start = random(1);
|
||
while (start > 0) {
|
||
start = start - birds[index].fitness;
|
||
index++;
|
||
}
|
||
index--;
|
||
//{!1} Instead of returning the entire Bird object, just the brain is returned
|
||
return birds[index].brain;
|
||
}</pre>
|
||
<p>However, for this algorithm to function properly, I need to first normalize the fitness values of the birds so that they collectively sum to 1. This way, each bird's fitness is equal to its probability of being selected.</p>
<pre class="codesplit" data-code-language="javascript">function normalizeFitness() {
  // Sum the total fitness of all birds
  let sum = 0;
  for (let bird of birds) {
    sum += bird.fitness;
  }
  //{!3} Divide each bird's fitness by the sum
  for (let bird of birds) {
    bird.fitness = bird.fitness / sum;
  }
}</pre>
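<p>To see how normalization and weighted selection work together, here's a self-contained sketch with plain objects standing in for the birds and <code>Math.random()</code> standing in for p5.js's <code>random()</code>. The fitness values are invented for illustration:</p>

```javascript
// Three stand-in "birds" with raw fitness scores
let birds = [{ fitness: 60 }, { fitness: 30 }, { fitness: 10 }];

// Normalize the fitness values so they sum to 1
let sum = 0;
for (let bird of birds) {
  sum += bird.fitness;
}
for (let bird of birds) {
  bird.fitness = bird.fitness / sum;
}

// The same "wheel of fortune" algorithm as weightedSelection()
function weightedSelection() {
  let index = 0;
  let start = Math.random();
  while (start > 0) {
    start = start - birds[index].fitness;
    index++;
  }
  index--;
  return birds[index];
}

// Sample many times: the fittest bird should be picked most often
let counts = [0, 0, 0];
for (let i = 0; i < 10000; i++) {
  counts[birds.indexOf(weightedSelection())]++;
}
```

<p>With 10,000 samples, the counts should land near 6,000, 3,000, and 1,000, mirroring the normalized probabilities of 0.6, 0.3, and 0.1.</p>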
<h3 id="heredity-baby-birds">Heredity: Baby Birds</h3>
<p>There’s only one step left in the genetic algorithm—reproduction. In Chapter 9, I explored in great detail the two-step process for generating a “child” element: crossover and mutation. Crossover is where the third key principle of <strong>heredity</strong> arrives. After selecting the DNA of two parents, they are combined to form the child’s DNA. At first glance, the idea of inventing an algorithm for crossover of two neural networks might seem daunting. Yet, it’s actually quite straightforward. Think of the individual “genes” of a bird’s brain as the weights within the network. Mixing two such brains boils down to creating a new neural network, where each weight is chosen by a virtual coin flip—picking a value from the first or second parent.</p>
<pre class="codesplit" data-code-language="javascript">// Picking two parents and creating a child with crossover
let parentA = weightedSelection();
let parentB = weightedSelection();
let child = parentA.crossover(parentB);</pre>
<p>As you can see, today is my lucky day, as ml5.js includes a <code>crossover()</code> method that manages the algorithm for mixing the two neural networks. I can happily move on to the mutation step.</p>
<pre class="codesplit" data-code-language="javascript">// Mutating the child
child.mutate(0.01);</pre>
<p>The ml5.js library also provides a <code>mutate()</code> method that accepts a "mutation rate" as its primary argument. The rate determines how often a weight will be altered. For example, a rate of 0.01 indicates a 1% chance that any given weight will mutate. During mutation, ml5.js adjusts the weight slightly by adding a small random number to it, rather than selecting a completely new random value. This behavior mimics real-world genetic mutations, which typically introduce minor changes rather than entirely new traits. Although this default approach works for many cases, ml5.js offers more control over the process by allowing the use of a "custom" function as an optional second argument to <code>mutate()</code>.</p>
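<p>ml5.js handles both operations internally, but the underlying idea can be sketched with plain arrays standing in for a network's weights. The functions here are illustrative, not the library's actual implementation: crossover as a per-weight coin flip, mutation as a small random nudge applied with some probability:</p>

```javascript
// An illustrative sketch of crossover and mutation on plain weight arrays
function crossover(weightsA, weightsB) {
  // A virtual coin flip for each weight: take it from one parent or the other
  return weightsA.map((w, i) => (Math.random() < 0.5 ? w : weightsB[i]));
}

function mutate(weights, rate) {
  return weights.map((w) => {
    // With probability `rate`, nudge the weight by a small random amount
    if (Math.random() < rate) {
      return w + (Math.random() * 0.2 - 0.1);
    }
    return w;
  });
}

let parentA = [0.1, 0.2, 0.3, 0.4];
let parentB = [0.5, 0.6, 0.7, 0.8];
let child = mutate(crossover(parentA, parentB), 0.01);
```

<p>Every weight in <code>child</code> is within a small nudge of one parent's corresponding weight, which is the behavior the chapter describes.</p>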
<p>These crossover and mutation steps are repeated for the size of the population to create an entirely new generation of birds. This is accomplished by populating an empty local array <code>nextBirds</code> with the new birds. Once the population is full, the global <code>birds</code> array is then updated to this fresh generation.</p>
<pre class="codesplit" data-code-language="javascript">function reproduction() {
  //{!1} Start with a new empty array
  let nextBirds = [];
  for (let i = 0; i < populationSize; i++) {
    // Pick 2 parents
    let parentA = weightedSelection();
    let parentB = weightedSelection();
    // Create a child with crossover
    let child = parentA.crossover(parentB);
    // Apply mutation
    child.mutate(0.01);
    //{!1} Create the new bird object
    nextBirds[i] = new Bird(child);
  }
  //{!1} The next generation is now the current one!
  birds = nextBirds;
}</pre>
<p>If you look closely at the <code>reproduction()</code> function, you may notice that I’ve slipped in another new feature of the <code>Bird</code> class, specifically an argument to the constructor. When I first introduced the idea of a bird “brain,” each new <code>Bird</code> object was created with a brand new brain—a fresh neural network courtesy of ml5.js. However, I now want the new birds to “inherit” a child brain that was generated through the processes of crossover and mutation.</p>
<p>To make this possible, I’ll subtly change the <code>Bird</code> constructor to look for an “optional” argument named, of course, <code>brain</code>.</p>
<pre class="codesplit" data-code-language="javascript">  constructor(brain) {
    //{!1} Check if a brain was passed in
    if (brain) {
      this.brain = brain;
    //{!1} If not, proceed as usual
    } else {
      this.brain = ml5.neuralNetwork({
        inputs: 4,
        outputs: ["flap", "no flap"],
        task: "classification",
        neuroEvolution: true,
      });
    }
  }</pre>
<p>Here’s the magic: if no <code>brain</code> is provided when a new bird is created, the <code>brain</code> argument remains <code>undefined</code>. In JavaScript, <code>undefined</code> evaluates as falsy, so the code moves on to the <code>else</code> branch and calls <code>ml5.neuralNetwork()</code>. On the other hand, if I do pass in an existing neural network, <code>brain</code> evaluates as truthy and is assigned directly to <code>this.brain</code>. This elegant trick allows the constructor to handle different scenarios.</p>
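<p>The pattern is easy to try out in isolation with a hypothetical class standing in for <code>Bird</code> (the names and object shapes here are mine, for demonstration only):</p>

```javascript
// A stand-in class demonstrating the optional-argument pattern
class Creature {
  constructor(brain) {
    if (brain) {
      this.brain = brain;
    } else {
      // No argument passed: brain is undefined (falsy), so create a new one
      this.brain = { source: "new" };
    }
  }
}

let fresh = new Creature(); // gets a brand-new brain
let inherited = new Creature({ source: "child" }); // keeps the passed-in brain
```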
<p>And with that, the example is complete. All that is left to do is call <code>normalizeFitness()</code> and <code>reproduction()</code> in <code>draw()</code> at the end of each generation when all the birds have died out.</p>
<div data-type="example">
<h3 id="example-104-flappy-bird-neuroevolution">Example 10.4: Flappy Bird NeuroEvolution</h3>
<figure>
<div data-type="embed" data-p5-editor="https://editor.p5js.org/natureofcode/sketches/PEUKc5dpZ" data-example-path="examples/10_nn/10_4_flappy_bird_neuro_evolution"><img src="examples/10_nn/10_4_flappy_bird_neuro_evolution/screenshot.png"></div>
<figcaption></figcaption>
</figure>
</div>
<pre class="codesplit" data-code-language="javascript">function draw() {
  //{inline} all the rest of draw

  //{!4} Create the next generation when all the birds have died
  if (allBirdsDead()) {
    normalizeFitness();
    reproduction();
  }
}</pre>
<p>Example 10.4 also adjusts the behavior of birds so that they die when they leave the canvas, either by crashing into the ground or soaring too high above the top.</p>
<div data-type="note">
<h3 id="exercise-108">Exercise 10.8</h3>
<p>It takes a long time to watch the birds evolve. Add a feature to speed up the simulation, such as running multiple cycles of the simulation per frame of animation. Additionally, annotate the process by displaying information about the evolution, such as the current generation and the number of birds still alive.</p>
</div>
<div data-type="note">
<h3 id="exercise-109">Exercise 10.9</h3>
<p>Expand the example to save the brain of the bird that performs the best, and add the ability to load a previously saved brain so you can watch a single trained bird play the game on its own.</p>
</div>
<h2 id="steering-the-neuroevolutionary-way">Steering the Neuroevolutionary Way</h2>
<p>Having explored neuroevolution with Flappy Bird, I’d like to shift the focus back to the realm of simulation, specifically the steering agents introduced in Chapter 5. What if, instead of dictating the rules for an algorithm to calculate a steering force, a simulated creature could evolve its own strategy? Drawing inspiration from Craig Reynolds’ aim of “life-like and improvisational” behaviors, my goal is not to use neuroevolution to engineer the perfect creature that can flawlessly execute a task. Instead, I hope to create a captivating world of simulated life, where the quirks, nuances, and happy accidents of evolution unfold on the canvas.</p>
<p>Let’s begin with adapting the Smart Rockets example from Chapter 9. In that example, the genetic code for each rocket was an array of vectors.</p>
<pre class="codesplit" data-code-language="javascript">this.genes = [];
for (let i = 0; i < lifeSpan; i++) {
  //{!2} Each gene is a vector with random direction and magnitude
  this.genes[i] = p5.Vector.random2D();
  this.genes[i].mult(random(0, this.maxforce));
}</pre>
<p>I propose adapting the above to instead use a neural network to "predict" the vector or steering force, transforming the <code>genes</code> into a <code>brain</code>.</p>
<pre class="codesplit" data-code-language="javascript">this.brain = ml5.neuralNetwork({
  inputs: 2,
  outputs: 2,
  task: "regression",
  neuroEvolution: true,
});</pre>
<p>But what are the inputs and outputs? In the original example, the vectors from the <code>genes</code> array were applied sequentially, querying the array with a <code>counter</code> variable.</p>
<pre class="codesplit" data-code-language="javascript">this.applyForce(this.genes[this.counter]);</pre>
<p>Now, instead of an array lookup, I want the neural network to return a vector with <code>predictSync()</code>.</p>
<pre class="codesplit" data-code-language="javascript">// Get the outputs from the neural network
let outputs = this.brain.predictSync(inputs);
// Use one output for the angle
let angle = outputs[0].value * TWO_PI;
// Use the other output for the magnitude
let magnitude = outputs[1].value * this.maxforce;
// Create and apply the force
let force = p5.Vector.fromAngle(angle).setMag(magnitude);
this.applyForce(force);</pre>
<p>The neural network brain outputs two values: one for the angle of the vector and one for the magnitude. You might think to use these outputs directly for the vector’s <span data-type="equation">x</span> and <span data-type="equation">y</span> components. However, the default output range for an ml5.js neural network is between 0 and 1, and I want the forces to be capable of pointing in both positive and negative directions! Mapping one output to an angle offers the full range.</p>
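<p>Here's the same mapping written without p5.js, using <code>Math.cos()</code> and <code>Math.sin()</code> in place of <code>p5.Vector.fromAngle()</code>, assuming both outputs arrive in the network's default 0 to 1 range:</p>

```javascript
// Map two 0-1 outputs to a force vector: one becomes a full-circle angle,
// the other a magnitude scaled by a maximum force
function outputsToForce(output0, output1, maxforce) {
  let angle = output0 * Math.PI * 2; // the full circle, like TWO_PI
  let magnitude = output1 * maxforce;
  return { x: magnitude * Math.cos(angle), y: magnitude * Math.sin(angle) };
}

// An angle output of 0.5 maps to PI radians: a force pointing in the
// negative x direction, which the raw 0-1 range could never produce
let force = outputsToForce(0.5, 1, 0.1);
```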
<p>You may have noticed that the code includes a variable called <code>inputs</code> that I have yet to declare or initialize. Defining the inputs to the neural network is where you, as the designer of the system, can be the most creative, considering the simulated biology and capabilities of your creatures.</p>
<p>As a first try, I’ll assign something very basic for the inputs and see if it works. Since the Smart Rockets environment is static, with fixed obstacles and targets, what if the brain could learn and estimate a "flow field" to navigate toward its goal? A flow field receives a position and returns a vector, so the neural network can mirror this functionality and use the rocket's position as input (normalizing the x and y values according to the canvas dimensions).</p>
<pre class="codesplit" data-code-language="javascript">let inputs = [this.position.x / width, this.position.y / height];</pre>
<p>That’s it! Everything else from the original example can remain unchanged: the population, the fitness function, and the selection process. The only other small adjustment is to use ml5.js’s <code>crossover()</code> and <code>mutate()</code> functions, eliminating the need for a separate <code>DNA</code> class with implementations of these steps.</p>
<div data-type="example">
<h3 id="example-105-smart-rockets-neuroevolution">Example 10.5: Smart Rockets Neuroevolution</h3>
<figure>
<div data-type="embed" data-p5-editor="https://editor.p5js.org/natureofcode/sketches/KkV4lTS4H" data-example-path="examples/10_nn/10_5_smart_rockets_neuro_evolution"><img src="examples/10_nn/10_5_smart_rockets_neuro_evolution/screenshot.png"></div>
<figcaption></figcaption>
</figure>
</div>
<pre class="codesplit" data-code-language="javascript">  reproduction() {
    let nextPopulation = [];
    // Create the next population
    for (let i = 0; i < this.population.length; i++) {
      // Spin the wheel of fortune to pick two parents
      let parentA = this.weightedSelection();
      let parentB = this.weightedSelection();
      let child = parentA.crossover(parentB);
      //{!1} Apply mutation
      child.mutate(this.mutationRate);
      nextPopulation[i] = new Rocket(320, 220, child);
    }
    //{!1} Replace the old population
    this.population = nextPopulation;
    this.generations++;
  }</pre>
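<p>The <code>weightedSelection()</code> method isn’t shown above; it’s carried over from Chapter 9. As a refresher, here’s one standalone way (plain JavaScript, no p5.js) to sketch fitness-proportionate “wheel of fortune” selection. The book’s version may differ in implementation detail, but the idea is the same: members are picked with probability proportional to their fitness.</p>

```javascript
// One way to implement "wheel of fortune" selection: assumes each
// population member has a numeric fitness property.
function weightedSelection(population) {
  // Total up the fitness of the whole population
  let total = 0;
  for (let member of population) {
    total += member.fitness;
  }
  // Pick a random point on the wheel, then walk through the
  // population subtracting each fitness until the point is passed
  let threshold = Math.random() * total;
  for (let member of population) {
    threshold -= member.fitness;
    if (threshold <= 0) return member;
  }
  // Fallback for floating point edge cases
  return population[population.length - 1];
}

// A member with 3x the fitness should be picked about 3 out of 4 times.
let population = [{ fitness: 1 }, { fitness: 3 }];
let counts = 0;
for (let i = 0; i < 10000; i++) {
  if (weightedSelection(population) === population[1]) counts++;
}
console.log(counts / 10000); // ≈ 0.75
```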
<p><strong>Exercise:</strong> Rather than predicting the steering force directly, try having the neural network output a <em>desired</em> velocity and calculating the steering force from it, following Reynolds’ formula. You could also experiment with including the rocket’s current velocity in the inputs.</p>
<h3 id="a-changing-world">A Changing World</h3>
<p>In the Smart Rockets example, the environment was static. This made the rocket's task of finding the target easy to accomplish using only its position as input. However, what if the target and the obstacles in the rocket's path were moving? To handle a more complex and changing environment, I need to expand the neural network's inputs and consider additional "features" of the environment. This is similar to what I did with Flappy Bird, where I identified the key data points of the environment to guide the bird's decision-making process.</p>
<p>Let’s begin with the simplest version of this scenario, almost identical to the Smart Rockets example but with the obstacles removed and the fixed target replaced by a randomly drifting “Perlin noise” walker. In this world, I’ll rename the <code>Rocket</code> class to <code>Creature</code> and write a new <code>Glow</code> class to represent a gentle, drifting orb. Imagine that the creature’s goal is to reach the light source and dance in its radiant embrace as long as it can.</p>
<pre class="codesplit" data-code-language="javascript">class Glow {
  constructor() {
    //{!2} Two different Perlin noise offsets
    this.xoff = 0;
    this.yoff = 1000;
    this.position = createVector();
    this.r = 24;
  }

  update() {
    //{!2} Assign the position according to Perlin noise
    this.position.x = noise(this.xoff) * width;
    this.position.y = noise(this.yoff) * height;
    //{!2} Move along the Perlin noise space
    this.xoff += 0.01;
    this.yoff += 0.01;
  }

  show() {
    stroke(0);
    strokeWeight(2);
    fill(200);
    circle(this.position.x, this.position.y, this.r * 2);
  }
}</pre>
<p>As the glow moves, the creature should take the glow’s position into account, as an input to its brain. However, it is not sufficient to know only the light’s position; it’s the position relative to the creature’s own that is key. A nice way to synthesize this information as an input feature is to calculate a vector that points from the creature to the glow. Here is where I can reinvent the <code>seek()</code> method from Chapter 5 using a neural network to estimate the steering force.</p>
<pre class="codesplit" data-code-language="javascript">  seek(target) {
    //{!1} Calculate a vector from the position to the target
    let v = p5.Vector.sub(target, this.position);</pre>
<p>This is a good start, but the components of the vector do not fall within a normalized input range. I could divide <code>v.x</code> by <code>width</code> and <code>v.y</code> by <code>height</code>, but since my canvas is not a perfect square, it may skew the data. Another solution is to normalize the vector, but with that, I would lose any measure of the distance to the glow itself. After all, if the creature is sitting on top of the glow, it should steer differently than if it were very far away. There are multiple approaches I could take here. I’ll go with saving the distance in a separate variable before normalizing and plan to use it as an additional input feature.</p>
<pre class="codesplit" data-code-language="javascript">  seek(target) {
    let v = p5.Vector.sub(target, this.position);
    // Save the distance in a variable (one input)
    let distance = v.mag();
    // Normalize the vector pointing from position to target (two inputs)
    v.normalize();</pre>
<p>Now, if you recall, a key element of Reynolds’ steering formula involves comparing the desired velocity to the current velocity. How the vehicle is currently moving plays a significant role in how it should steer! For the creature to consider its own velocity as part of its decision-making, I can include the velocity vector in the inputs as well. To normalize these values, it works beautifully to divide the vector’s components by the <code>maxspeed</code> property. This retains both the direction and magnitude of the vector. The rest of the code follows the same pattern, with the output of the neural network synthesized into a force to be applied to the creature.</p>
<pre class="codesplit" data-code-language="javascript">  seek(target) {
    let v = p5.Vector.sub(target.position, this.position);
    let distance = v.mag();
    v.normalize();
    // Compiling the features into an inputs array
    let inputs = [
      v.x,
      v.y,
      distance / width,
      this.velocity.x / this.maxspeed,
      this.velocity.y / this.maxspeed,
    ];
    //{!5} Predicting the force to apply
    let outputs = this.brain.predictSync(inputs);
    let angle = outputs[0].value * TWO_PI;
    let magnitude = outputs[1].value;
    let force = p5.Vector.fromAngle(angle).setMag(magnitude);
    this.applyForce(force);
  }</pre>
<p>Enough has changed here from the rockets that it is also worth reconsidering the fitness function. Previously, fitness was calculated based on the rocket's distance from the target at the end of each generation. However, since this new target is moving, I prefer to accumulate the amount of time the creature is able to catch the glow as the measure of fitness. This can be achieved by checking the distance between the creature and the glow in the <code>update()</code> method and incrementing a <code>fitness</code> value when they are intersecting. Both the <code>Glow</code> and <code>Creature</code> classes include a radius property <code>r</code> which can be used to determine collision.</p>
<pre class="codesplit" data-code-language="javascript">  update(target) {
    //{inline} the usual updating of position, velocity, acceleration

    //{!4} Increase the fitness whenever the creature reaches the glow
    let d = p5.Vector.dist(this.position, target.position);
    if (d < this.r + target.r) {
      this.fitness++;
    }
  }</pre>
<p>Now, one thing you may have noticed about these examples is that testing them requires a delightful exercise in patience as you watch the slow crawl of the simulation play out generation after generation. This is part of the point—I want to watch the process! It’s also a nice excuse to take a break, which is to be encouraged. Head outside, enjoy some non-simulated nature, perhaps a small cup of soothing tea while you wait? Take comfort in the fact that you only have to wait billions of milliseconds rather than the billions of years required for actual biology.</p>
<p>Nevertheless, for the system to evolve, there’s no inherent requirement that you draw and animate the world. Hundreds of generations could be completed in the blink of an eye if you could skip all that time spent rendering the scene.</p>
<p>One way to avoid tearing your hair out every time you change a small parameter and find yourself waiting what seems like hours to see if it had any effect is to render the environment, well, <em>less often</em>. In other words, you can compute multiple simulation steps per <code>draw()</code> cycle with a <code>for</code> loop.</p>
<p>Here is where I can make use of one of my favorite features of p5.js: the ability to quickly create standard interface elements. You saw this before with <code>createButton()</code> in the interactive selection example from Chapter 9. In the following code, a "range" slider controls how many simulation steps are computed per frame. Only the code for the new time slider is shown here, excluding the other global variables and their initializations in <code>setup()</code>. Remember, you will also need to separate the code for visuals from the physics to ensure that rendering still occurs only once.</p>
<pre class="codesplit" data-code-language="javascript">//{!1} A variable to hold the slider
let timeSlider;

function setup() {
  //{!1} Creating the slider with a min and max range, and starting value
  timeSlider = createSlider(1, 20, 1);
}

function draw() {
  //{!5} All of the drawing code happening just once!
  background(255);
  glow.show();
  for (let creature of creatures) {
    creature.show();
  }

  //{!8} All of the simulation code running multiple times according to the slider
  for (let i = 0; i < timeSlider.value(); i++) {
    for (let creature of creatures) {
      creature.seek(glow);
      creature.update(glow);
    }
    glow.update();
    lifeCounter++;
  }
}</pre>
<p>In p5.js, a slider is defined with three arguments: a minimum value (for when the slider is all the way to the left), a maximum value (for when the slider is all the way to the right), and a starting value (for when the page first loads). This allows the simulation to run at 20X speed to reach the results of evolution more quickly, then slow back down to bask in the glory of the intelligent behaviors on display. Here is the final version of the example with a new <code>Creature</code> constructor to create a neural network. Everything else has remained the same from the Flappy Bird example code.</p>
<div data-type="example">
<h3 id="example-106-neuroevolution-steering">Example 10.6: Neuroevolution Steering</h3>
<figure>
<div data-type="embed" data-p5-editor="https://editor.p5js.org/natureofcode/sketches/fZDfxxVrf" data-example-path="examples/10_nn/10_6_neuro_evolution_steering_seek"><img src="examples/10_nn/10_6_neuro_evolution_steering_seek/screenshot.png"></div>
<figcaption></figcaption>
</figure>
</div>
<pre class="codesplit" data-code-language="javascript">class Creature {
  constructor(x, y, brain) {
    this.position = createVector(x, y);
    this.velocity = createVector(0, 0);
    this.acceleration = createVector(0, 0);
    this.r = 4;
    this.maxspeed = 4;
    this.fitness = 0;

    if (brain) {
      this.brain = brain;
    } else {
      this.brain = ml5.neuralNetwork({
        inputs: 5,
        outputs: 2,
        task: "regression",
        neuroEvolution: true,
      });
    }
  }

  //{inline} seek() predicts a steering force as described previously

  //{inline} update() increments the fitness if the glow is reached as described previously

}</pre>
<h3 id="neuroevolution-ecosystem">Neuroevolution Ecosystem</h3>
<p>If I’m being honest here, this chapter is getting kind of long. My goodness, this book is incredibly long. Are you really still here reading? I’ve been working on it for over ten years and right now, at this very moment as I type these letters, I feel like stopping. But I cannot. I will not. There is one more thing I must demonstrate, something I am obligated to include and cannot tolerate skipping. So bear with me just a little longer. I hope it will be worth it.</p>
<p>There are two key elements of what I’ve demonstrated so far that don’t fit into my dream of the Ecosystem Project that has been the through-line of this book. The first is something I covered in Chapter 9 with the introduction of the bloops: a system of creatures that all live and die together, starting completely over with each subsequent generation, is not how the biological world works! I’d like to examine this again in the context of neuroevolution.</p>
<p>But even more so, there’s a major flaw in the way I am extracting features from a scene. The creatures in Example 10.6 are all-knowing. They know exactly where the glow is, regardless of how far away it is or what might be blocking their vision or senses. Yes, it may be reasonable to assume they are aware of their current velocity, but I didn’t introduce any limits to their perception of external elements in the environment.</p>
<p>A common approach in reinforcement learning simulations is to attach sensors to an agent. For example, consider a simulated mouse in a maze searching for cheese in the dark. Its whiskers might act as proximity sensors to detect walls and turns. The mouse can’t see the entire maze, only its immediate surroundings. Another example is a bat using echolocation to navigate, or a car on a winding road that can only see what is projected in front of its headlights.</p>
<p>I’d like to build on this idea of the whiskers (or, more formally, the “vibrissae”) found in mice, cats, and other mammals. In the real world, animals use their vibrissae to navigate and detect nearby objects, especially in dark or obscured environments.</p>
<figure>
<img src="images/10_nn/10_nn_23.jpg" alt="ILLUSTRATION OF A MOUSE OR CAT OR FICTIONAL CREATURE SENSING ITS ENVIRONMENT WITH ITS WHISKERS (image temporarily from https://upload.wikimedia.org/wikipedia/commons/thumb/9/96/Cat_whiskers_closeup.jpg/629px-Cat_whiskers_closeup.jpg?20120309014158)">
<figcaption><strong><em>ILLUSTRATION OF A MOUSE OR CAT OR FICTIONAL CREATURE SENSING ITS ENVIRONMENT WITH ITS WHISKERS (image temporarily from </em></strong><a href="https://upload.wikimedia.org/wikipedia/commons/thumb/9/96/Cat_whiskers_closeup.jpg/629px-Cat_whiskers_closeup.jpg?20120309014158=">https://upload.wikimedia.org/wikipedia/commons/thumb/9/96/Cat_whiskers_closeup.jpg/629px-Cat_whiskers_closeup.jpg?20120309014158</a>)</figcaption>
</figure>
<p>I’ll keep the generic class name <code>Creature</code>, but think of them now as the circular “bloops” of Chapter 9, enhanced with whisker-like sensors that emanate from their center in all directions.</p>
<pre class="codesplit" data-code-language="javascript">class Creature {
  constructor(x, y) {
    // The creature has a position and radius
    this.position = createVector(x, y);
    this.r = 16;
    // The creature has an array of sensors
    this.sensors = [];

    // The creature has 5 sensors
    let totalSensors = 5;
    for (let i = 0; i < totalSensors; i++) {
      // First, calculate a direction for the sensor
      let angle = map(i, 0, totalSensors, 0, TWO_PI);
      // Create a vector a little bit longer than the radius as the sensor
      this.sensors[i] = p5.Vector.fromAngle(angle).mult(this.r * 1.5);
    }
  }
}</pre>
<p>The code creates a series of vectors that each describe the direction and length of one “whisker” sensor attached to the creature. However, just the vector is not enough. I want the sensor to include a <code>value</code>, a numeric representation of what it is sensing. This <code>value</code> can be thought of as analogous to the intensity of touch. Just as a cat's whisker might detect a faint touch from a distant object or a stronger push from a closer one, the virtual sensor's value could range to represent proximity. Let’s assume there is a <code>Food</code> class to describe a circle of deliciousness that the creature wants to find.</p>
<pre class="codesplit" data-code-language="javascript">class Food {
  //{!4} A piece of food has a random position and fixed radius
  constructor() {
    this.position = createVector(random(width), random(height));
    this.r = 50;
  }

  show() {
    noStroke();
    fill(0, 100);
    circle(this.position.x, this.position.y, this.r * 2);
  }
}</pre>
<p>A <code>Food</code> object is a circle drawn according to a position and radius. I’ll assume the creature in my simulation has no vision and relies on sensors to detect if there is food nearby. This begs the question: how can I determine if a sensor is touching the food? One approach is to use a technique called “raycasting.” This method is commonly employed in computer graphics to project rays (often representing light) from an origin point in a scene to determine what objects they intersect with. Raycasting is useful for visibility and collision checks, exactly what I am doing here!</p>
<p>Although raycasting is a robust solution, it requires more involved mathematics than I'd like to delve into here. For those interested, an explanation and implementation are available in Coding Challenge #145 on <a href="http://thecodingtrain.com/">thecodingtrain.com</a>. For this example, I will opt for a more straightforward approach and check whether the endpoint of a sensor lies inside the food circle.</p>
<figure>
<img src="images/10_nn/10_nn_24.jpg" alt="Figure 10.x: Endpoint of sensor is inside or outside of the food based on distance to center of food.">
<figcaption>Figure 10.x: Endpoint of sensor is inside or outside of the food based on distance to center of food.</figcaption>
</figure>
<p>Since I want the sensor to store a value for its sensing along with the sensing algorithm itself, it makes sense to encapsulate these elements into a <code>Sensor</code> class.</p>
<pre class="codesplit" data-code-language="javascript">class Sensor {
  constructor(v) {
    this.v = v.copy();
    //{!1} The sensor also stores a value for the proximity of what it is sensing
    this.value = 0;
  }

  sense(position, food) {
    //{!1} Find the "tip" (or endpoint) of the sensor by adding position
    let end = p5.Vector.add(position, this.v);
    //{!1} How far is it from the food center?
    let d = end.dist(food.position);
    //{!1} If it is within the radius, light up the sensor
    if (d < food.r) {
      //{!1} The farther into the food's center, the more the sensor activates
      this.value = map(d, 0, food.r, 1, 0);
    } else {
      this.value = 0;
    }
  }
}</pre>
<p>Notice how the sensing mechanism gauges how deep inside the food’s radius the endpoint is with the <code>map()</code> function. When the sensor's endpoint is just touching the outer boundary of the food, the <code>value</code> starts at 0. As the endpoint moves closer to the center of the food, the value increases, maxing out at 1. If the sensor isn't touching the food at all, its value remains at 0. This gradient of feedback mirrors the varying intensity of touch or pressure in the real world.</p>
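<p>To trace this gradient by hand, here’s a standalone version (plain JavaScript) of the sensor’s <code>map()</code> call. The helper reimplements p5.js’s linear re-mapping of <code>d</code> from the range [0, <code>food.r</code>] to [1, 0].</p>

```javascript
// Standalone reimplementation of p5.js's map() for linear re-mapping
function map(value, start1, stop1, start2, stop2) {
  return start2 + (stop2 - start2) * ((value - start1) / (stop1 - start1));
}

const r = 50; // the food's radius, as in the Food class
console.log(map(50, 0, r, 1, 0)); // 0   - tip just touching the edge
console.log(map(25, 0, r, 1, 0)); // 0.5 - tip halfway in
console.log(map(0, 0, r, 1, 0));  // 1   - tip at the food's center
```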
<p>Let’s test the sensors with one bloop (controlled by the mouse) and one piece of food (placed at the center of the canvas). When the sensors touch the food, they light up, getting brighter the closer they are to the center.</p>
<div data-type="example">
<h3 id="example-107-bloops-with-sensors">Example 10.7: Bloops with Sensors</h3>
<figure>
<div data-type="embed" data-p5-editor="https://editor.p5js.org/natureofcode/sketches/vCTMtXXSS" data-example-path="examples/10_nn/10_7_creature_sensors"><img src="examples/10_nn/10_7_creature_sensors/screenshot.png"></div>
<figcaption></figcaption>
</figure>
</div>
<pre class="codesplit" data-code-language="javascript">let bloop, food;

function setup() {
  createCanvas(640, 240);
  //{!2} One bloop, one piece of food
  bloop = new Creature();
  food = new Food();
}

function draw() {
  background(255);

  // Temporarily control the bloop with the mouse
  bloop.position.x = mouseX;
  bloop.position.y = mouseY;
  // Draw the food and the bloop
  food.show();
  bloop.show();
  // The bloop senses the food
  bloop.sense(food);
}

class Creature {
  constructor(x, y) {
    this.position = createVector(x, y);
    this.r = 16;

    //{!8} Create the sensors for the creature
    this.sensors = [];
    let totalSensors = 15;
    for (let i = 0; i < totalSensors; i++) {
      let a = map(i, 0, totalSensors, 0, TWO_PI);
      let v = p5.Vector.fromAngle(a);
      v.mult(this.r * 2);
      this.sensors[i] = new Sensor(v);
    }
  }

  //{!4} Call the sense() method for each sensor
  sense(food) {
    for (let i = 0; i < this.sensors.length; i++) {
      this.sensors[i].sense(this.position, food);
    }
  }

  //{inline} see book website for the drawing code
}</pre>
<p>Are you thinking what I’m thinking? What if the values of those sensors are the inputs to a neural network?! Assuming I bring back all of the necessary physics bits in the <code>Creature</code> class, I could write a new <code>think()</code> method that processes the sensor values through the neural network “brain” and outputs a steering force, just as in the previous two examples.</p>
<pre class="codesplit" data-code-language="javascript">  think() {
    // Build an input array from the sensor values
    let inputs = [];
    for (let i = 0; i < this.sensors.length; i++) {
      inputs[i] = this.sensors[i].value;
    }

    // Predicting a steering force from the sensors
    let outputs = this.brain.predictSync(inputs);
    let angle = outputs[0].value * TWO_PI;
    let magnitude = outputs[1].value;
    let force = p5.Vector.fromAngle(angle).setMag(magnitude);
    this.applyForce(force);
  }</pre>
<p>The logical next step would be to incorporate all the usual parts of the genetic algorithm: writing a fitness function (how much food did each creature eat?) and performing selection after a fixed generation time period. But this is a great opportunity to test out the principles of a "continuous" ecosystem with a more sophisticated environment and set of potential behaviors for the creatures themselves.</p>
<p>Instead of a fixed lifespan cycle for the population, I will introduce the concept of <code>health</code> for each creature. For every cycle through <code>draw()</code> that a creature lives, its health deteriorates.</p>
<pre class="codesplit" data-code-language="javascript">class Creature {
  constructor() {
    //{inline} All of the creature's properties

    // The health starts at 100
    this.health = 100;
  }

  update() {
    //{inline} the usual updating of position, velocity, acceleration

    // Losing some health!
    this.health -= 0.25;
  }
}</pre>
<p>Now in <code>draw()</code>, if any bloop’s health drops below zero, it dies and is deleted from the array. And for reproduction, instead of performing the usual crossover and mutation all at once, each bloop (with a health greater than zero) will have a 0.1% chance of reproducing on each cycle.</p>
<pre class="codesplit" data-code-language="javascript">function draw() {
  for (let i = bloops.length - 1; i >= 0; i--) {
    if (bloops[i].health < 0) {
      bloops.splice(i, 1);
    } else if (random(1) < 0.001) {
      let child = bloops[i].reproduce();
      bloops.push(child);
    }
  }
}</pre>
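<p>These numbers interact in an important way. As a quick back-of-the-envelope check (assuming the health and reproduction values above, and a bloop that never eats), here is how long an unfed bloop survives and how likely it is to ever clone itself.</p>

```javascript
// Health starts at 100 and drops 0.25 per update, so an unfed bloop
// survives 100 / 0.25 = 400 updates. With a 0.001 chance of
// reproducing per update, what is the chance it clones itself at
// least once before dying?
const updates = 100 / 0.25;
const pAtLeastOnce = 1 - Math.pow(1 - 0.001, updates);
console.log(updates, pAtLeastOnce.toFixed(2)); // 400 0.33
```

Without food, a bloop has only about a one-in-three chance of ever reproducing, so the population would dwindle; eating is what buys bloops the extra time they need.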
<p>This approach does away with the <code>crossover()</code> functionality and instead uses the <code>copy()</code> method. The reproductive process in this case is cloning rather than mating. A higher mutation rate isn’t always ideal, but here it helps introduce variation in the absence of mixing weights between parents. However, I encourage you to consider ways that you could also incorporate crossover.</p>
<pre class="codesplit" data-code-language="javascript">  reproduce() {
    //{!2} Copy and mutate rather than crossover and mutate
    let brain = this.brain.copy();
    brain.mutate(0.1);
    return new Creature(this.position.x, this.position.y, brain);
  }</pre>
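<p>ml5.js handles <code>mutate()</code> internally, but conceptually it can be sketched as a pass over the network’s weights, where each weight has a <code>mutationRate</code> chance of being nudged by a small random amount. The following is a standalone illustration of that idea, not ml5’s actual implementation (the size of the nudge here is an arbitrary choice).</p>

```javascript
// Conceptual sketch of mutation: each weight has a mutationRate
// chance of being perturbed by a small random offset.
function mutate(weights, mutationRate) {
  return weights.map((w) => {
    if (Math.random() < mutationRate) {
      // Nudge the weight by a random offset between -0.1 and 0.1
      return w + (Math.random() * 2 - 1) * 0.1;
    }
    return w;
  });
}

const brainWeights = [0.2, -0.5, 0.9];
const childWeights = mutate(brainWeights, 0.1);
console.log(childWeights.length); // 3
```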
<p>Now, for this to work, some bloops should live longer than others. By consuming food, a bloop’s health increases, giving it more time to reproduce. I’ll manage this in an <code>eat()</code> method of the <code>Creature</code> class.</p>
<pre class="codesplit" data-code-language="javascript">  eat(food) {
    // If the bloop is close to the food, increase its health!
    let d = p5.Vector.dist(this.position, food.position);
    if (d < this.r + food.r) {
      this.health += 0.5;
    }
  }</pre>
<p>Is this enough for the system to evolve and find its equilibrium? I could dive deeper, tweaking parameters and behaviors in pursuit of the ultimate evolutionary system. The allure of that infinite rabbit hole is one I cannot easily escape, but I will explore it on my own time. For the purposes of this book, I invite you to run the example, experiment, and draw your own conclusions.</p>
<div data-type="example">
<h3 id="example-108-neuroevolution-ecosystem">Example 10.8: Neuroevolution Ecosystem</h3>
<figure>
<div data-type="embed" data-p5-editor="https://editor.p5js.org/natureofcode/sketches/IQbcREjUK" data-example-path="examples/10_nn/10_8_neuroevolution_ecosystem"><img src="examples/10_nn/10_8_neuroevolution_ecosystem/screenshot.png"></div>
<figcaption></figcaption>
</figure>
</div>
<pre class="codesplit" data-code-language="javascript">let bloops = [];
let timeSlider;
let food = [];

function setup() {
  createCanvas(640, 240);
  ml5.setBackend("cpu");
  for (let i = 0; i < 20; i++) {
    bloops[i] = new Creature(random(width), random(height));
  }
  for (let i = 0; i < 8; i++) {
    food[i] = new Food();
  }
  timeSlider = createSlider(1, 20, 1);
}

function draw() {
  background(255);
  for (let k = 0; k < timeSlider.value(); k++) {
    for (let i = bloops.length - 1; i >= 0; i--) {
      bloops[i].think();
      bloops[i].eat();
      bloops[i].update();
      bloops[i].borders();
      if (bloops[i].health < 0) {
        bloops.splice(i, 1);
      } else if (random(1) < 0.001) {
        let child = bloops[i].reproduce();
        bloops.push(child);
      }
    }
  }
  for (let treat of food) {
    treat.show();
  }
  for (let bloop of bloops) {
    bloop.show();
  }
}</pre>
<p>The final example also includes a few additional features that you’ll find in the accompanying code, such as an array of food that shrinks as it gets eaten (re-spawning when it is depleted). Additionally, the bloops shrink as their health deteriorates.</p>
<div data-type="project">
<h3 id="the-ecosystem-project-9">The Ecosystem Project</h3>
<p>Step 10 Exercise:</p>
<p>Try incorporating the concept of a “brain” into the creatures in your world!</p>
<ul>
<li>What are each creature’s inputs and outputs?</li>
<li>How do the creatures perceive? Do they “see” everything, or are they limited by sensors?</li>
<li>How can you find balance in your system?</li>
</ul>
</div>
<h3 id="the-end">The end</h3>
<p>If you’re still reading, thank you! You’ve reached the end of the book. But for as much material as this book contains, I’ve barely scratched the surface of the physical world we inhabit and of techniques for simulating it. It’s my intention for this book to live as an ongoing project, and I hope to continue adding new tutorials and examples to the book’s website as well as expand and update accompanying video tutorials on <a href="https://thecodingtrain.com/">thecodingtrain.com</a>. Your feedback is truly appreciated, so please get in touch via email at <code>daniel@shiffman.net</code> or by contributing to the GitHub repository at <a href="https://github.com/nature-of-code">github.com/nature-of-code</a>, in keeping with the open-source spirit of the project. Share your work. Keep in touch. Let’s be two with nature.</p>
</section>