<section data-type="chapter">
<h1 id="chapter-10-neural-networks">Chapter 10. Neural Networks</h1>
<div class="chapter-opening-quote">
<blockquote data-type="epigraph">
<p>The human brain has 100 billion neurons,</p>
<p>each neuron connected to 10 thousand</p>
<p>other neurons. Sitting on your shoulders</p>
<p>is the most complicated object</p>
<p>in the known universe.</p>
<div class="chapter-opening-quote-source">
<p>—Michio Kaku</p>
</div>
</blockquote>
</div>
<div class="chapter-opening-figure">
<figure>
<img src="images/10_nn/10_nn_1.jpg" alt="" />
<figcaption></figcaption>
</figure>
<h3
id="khipu-on-display-at-the-machu-picchu-museum-cusco-peru-photo-by-pi3124"
>
Khipu on display at the Machu Picchu Museum, Cusco, Peru (photo by
Pi3.124)
</h3>
<p>
The <em>khipu</em> (or <em>quipu</em>) is an ancient Incan device used for
recordkeeping and communication. It comprised a complex system of knotted
cords to encode and transmit information. Each colored string and knot
type and pattern represented specific data, such as census records or
calendrical information. Interpreters, known as <em>quipucamayocs</em>,
acted as a kind of accountant and decoded the stringed narrative into
understandable information.
</p>
</div>
<p>
I began with inanimate objects living in a world of forces, and I gave them
desires, autonomy, and the ability to take action according to a system of
rules. Next, I allowed those objects, now called <em>creatures</em>, to live
in a population and evolve over time. Now I’d like to ask, What is each
creature’s decision-making process? How can it adjust its choices by
learning over time? Can a computational entity process its environment and
generate a decision?
</p>
<p>
To answer these questions, I’ll once again look to nature for
inspiration—specifically, the human brain. A brain can be described as a
biological <strong>neural network</strong>, an interconnected web of neurons
transmitting elaborate patterns of electrical signals. Within each neuron,
dendrites receive input signals, and based on those inputs, the neuron fires
an output signal via an axon (see Figure 10.1). Or something like that. How
the human brain actually works is an elaborate and complex mystery, one that
I’m certainly not going to attempt to unravel in rigorous detail in this
chapter.
</p>
<figure>
<img
src="images/10_nn/10_nn_2.png"
alt="Figure 10.1: A neuron with dendrites and an axon connected to another neuron"
/>
<figcaption>
Figure 10.1: A neuron with dendrites and an axon connected to another
neuron
</figcaption>
</figure>
<p>
Fortunately, as you’ve seen throughout this book, developing engaging
animated systems with code doesn’t require scientific rigor or accuracy.
Designing a smart rocket isn’t rocket science, and neither is designing an
artificial neural network brain science. It’s enough to simply be inspired
by the <em>idea</em> of brain function.
</p>
<p>
In this chapter, I’ll begin with a conceptual overview of the properties and
features of neural networks and build the simplest possible example of one,
a network that consists of a single neuron. I’ll then introduce you to more
complex neural networks by using the ml5.js library. This will serve as a
foundation for <a href="/neuroevolution#">Chapter 11</a>, the grand finale
of this book, where I’ll combine GAs with neural networks for physics
simulation.
</p>
<h2 id="introducing-artificial-neural-networks">
Introducing Artificial Neural Networks
</h2>
<p>
Computer scientists have long been inspired by the human brain. In 1943,
Warren S. McCulloch, a neuroscientist, and Walter Pitts, a logician,
developed the first conceptual model of an artificial neural network. In
their paper “A Logical Calculus of the Ideas Immanent in Nervous Activity,”
they describe a <strong>neuron</strong> as a single computational cell
living in a network of cells that receives inputs, processes those inputs,
and generates an output.
</p>
<p>
Their work, and the work of many scientists and researchers who followed,
wasn’t meant to accurately describe how the biological brain works. Rather,
an <em>artificial</em> neural network (hereafter referred to as just a
<em>neural network</em>) was intended as a computational model based on the
brain, designed to solve certain kinds of problems that were traditionally
difficult for computers.
</p>
<p>
Some problems are incredibly simple for a computer to solve but difficult
for humans like you and me. Finding the square root of 964,324 is an
example. A quick line of code produces the value 982, a number my computer
can compute in less than a millisecond, but if you asked me to calculate
that number myself, you’d be in for quite a wait. On the other hand, certain
problems are incredibly simple for you or me to solve, but not so easy for a
computer. Show any toddler a picture of a kitten or puppy, and they’ll
quickly be able to tell you which one is which. Listen to a conversation in
a noisy café and focus on just one person’s voice, and you can effortlessly
comprehend their words. But what if you need a machine to perform one of these tasks?
Scientists have spent entire careers researching and implementing complex
solutions, and neural networks are one of them.
</p>
<p>
Here are some of the easy-for-a-human, difficult-for-a-machine applications
of neural networks in software today:
</p>
<ul>
<li>
<strong>Pattern recognition:</strong> Neural networks are well suited to
problems where the aim is to detect, interpret, and classify features or
patterns within a dataset. This includes everything from identifying
objects (like faces) in images, to optical character recognition, to more
complex tasks like gesture recognition.
</li>
<li>
<strong>Time-series prediction and anomaly detection:</strong> Neural
networks are utilized both in forecasting, such as predicting stock market
trends or weather patterns, and in recognizing anomalies, which can be
applied to areas like cyberattack detection and fraud prevention.
</li>
<li>
<strong>Natural language processing (NLP):</strong> One of the biggest
developments in recent years has been the use of neural networks for
processing and understanding human language. They’re used in various tasks
including machine translation, sentiment analysis, and text summarization,
and are the underlying technology behind many digital assistants and
chatbots.
</li>
<li>
<strong>Signal processing and soft sensors:</strong> Neural networks play
a crucial role in devices like cochlear implants and hearing aids by
filtering noise and amplifying essential sounds. They’re also involved in
<em>soft sensors</em>, software systems that process data from multiple
sources to give a comprehensive analysis of the environment.
</li>
<li>
<strong>Control and adaptive decision-making systems:</strong> These
applications range from autonomous vehicles like self-driving cars and
drones to adaptive decision-making used in game playing, pricing models,
and recommendation systems on media platforms.
</li>
<li>
<strong>Generative models:</strong> The rise of novel neural network
architectures has made it possible to generate new content. These systems
can synthesize images, enhance image resolution, transfer style between
images, and even generate music and video.
</li>
</ul>
<p>
Covering the full gamut of applications for neural networks would merit an
entire book (or series of books), and by the time that book was printed, it
would probably be out of date. Hopefully, this list gives you an overall
sense of the features and possibilities.
</p>
<h3 id="how-neural-networks-work">How Neural Networks Work</h3>
<p>
In some ways, neural networks are quite different from other computer
programs. The computational systems I’ve been writing so far in this book
are <strong>procedural</strong>: a program starts at the first line of code,
executes it, and goes on to the next, following instructions in a linear
fashion. By contrast, a true neural network doesn’t follow a linear path.
Instead, information is processed collectively, in parallel, throughout a
network of nodes, with each node representing a neuron. In this sense, a
neural network is considered a <strong>connectionist</strong> system.
</p>
<p>
In other ways, neural networks aren’t so different from some of the programs
you’ve seen. A neural network exhibits all the hallmarks of a complex
system, much like a cellular automaton or a flock of boids. Remember how
each individual boid was simple to understand, yet by following only three
rules—separation, alignment, cohesion—it contributed to complex behaviors?
Each individual element in a neural network is equally simple to understand.
It reads an input (a number), processes it, and generates an output (another
number). That’s all there is to it, and yet a network of many neurons can
exhibit incredibly rich and intelligent behaviors, echoing the complex
dynamics seen in a flock of boids.
</p>
<div class="half-width-right">
<figure>
<img
src="images/10_nn/10_nn_3.png"
alt="Figure 10.2: A neural network is a system of neurons and connections."
/>
<figcaption>
Figure 10.2: A neural network is a system of neurons and connections.
</figcaption>
</figure>
</div>
<p>
In fact, a neural network isn’t just a complex system, but a complex
<em>adaptive</em> system, meaning it can change its internal structure based
on the information flowing through it. In other words, it has the ability to
learn. Typically, this is achieved by adjusting <strong>weights</strong>. In
Figure 10.2, each arrow represents a connection between two neurons and
indicates the pathway for the flow of information. Each connection has a
weight, a number that controls the signal between the two neurons. If the
network generates a <em>good</em> output (which I’ll define later), there’s
no need to adjust the weights. However, if the network generates a
<em>poor</em> output—an error, so to speak—then the system adapts, altering
the weights with the hope of improving subsequent results.
</p>
<p>
Neural networks may use a variety of strategies for learning, and I’ll focus
on one of them in this chapter:
</p>
<ul>
<li>
<strong>Supervised learning:</strong> Essentially, this strategy involves
a teacher that’s smarter than the network itself. Take the case of facial
recognition. The teacher shows the network a bunch of faces, and the
teacher already knows the name associated with each face. The network
makes its guesses; then the teacher provides the network with the actual
names. The network can compare its answers to the known correct ones and
make adjustments according to its errors. The neural networks in this
chapter follow this model.
</li>
<li>
<strong>Unsupervised learning:</strong> This technique is required when
you don’t have an example dataset with known answers. Instead, the network
works on its own to uncover hidden patterns in the data. An application of
this is clustering: a set of elements is divided into groups according to
an unknown pattern. I won’t be showing any instances of unsupervised
learning, as the strategy is less relevant to the book’s examples.
</li>
<li>
<strong>Reinforcement learning:</strong> This strategy is
built on observation: a learning agent makes decisions and looks to its
environment for the results. It’s rewarded for good decisions and
penalized for bad decisions, such that it learns to make better decisions
over time. I’ll discuss this strategy in more detail in
<a href="/neuroevolution#">Chapter 11</a>.
</li>
</ul>
<p>
The ability of a neural network to learn, to make adjustments to its
structure over time, is what makes it so useful in the field of
<strong>machine learning</strong>. This term can be traced back to the 1959
paper “Some Studies in Machine Learning Using the Game of Checkers,” in
which computer scientist Arthur Lee Samuel outlines a “self-learning”
program for playing checkers. The concept of an algorithm enabling a
computer to learn without explicit programming is the foundation of machine
learning.
</p>
<p>
Think about what you’ve been doing throughout this book: coding! In
traditional programming, a computer program takes inputs and, based on the
rules you’ve provided, produces outputs. Machine learning, however, turns
this approach upside down. Instead of you writing the rules, the system is
given example inputs and outputs, and generates the rules itself! Many
algorithms can be used to implement machine learning, and a neural network
is just one of them.
</p>
<p>
Machine learning is part of the broad, sweeping field of
<strong>artificial intelligence (AI)</strong>, although the terms are
sometimes used interchangeably. In their thoughtful and friendly primer
<em>A People’s Guide to AI</em>, Mimi Onuoha and Diana Nucera (aka Mother
Cyborg) define AI as “the theory and development of computer systems able to
perform tasks that normally require human intelligence.” Machine learning
algorithms are one approach to these tasks, but not all AI systems feature a
self-learning component.
</p>
<h3 id="machine-learning-libraries">Machine Learning Libraries</h3>
<p>
Today, leveraging machine learning in creative coding and interactive media
isn’t only feasible but increasingly common, thanks to third-party libraries
that handle a lot of the neural network implementation details under the
hood. While the vast majority of machine learning development and research
is done in Python, the world of web development has seen the emergence of
powerful JavaScript-based tools. Two libraries of note are TensorFlow.js and
ml5.js.
</p>
<p>
TensorFlow.js is an open source library that lets you
define, train, and run neural networks directly in the browser using
JavaScript, without the need to install or configure complex environments.
It’s part of the TensorFlow ecosystem, which is maintained and developed by
Google. TensorFlow.js is a powerful tool, but its low-level operations and
highly technical API can be intimidating to beginners. Enter ml5.js, a
library built on top of TensorFlow.js and designed specifically for use with
p5.js. Its goal is to be beginner friendly and make machine learning
approachable for a broad audience of artists, creative coders, and students.
I’ll demonstrate how to use ml5.js in
<a href="#machine-learning-with-ml5js">“Machine Learning with ml5.js”</a>.
</p>
<p>
A benefit of libraries like TensorFlow.js and ml5.js is that you can use
them to run pretrained models. A machine learning <strong>model</strong> is
a specific setup of neurons and connections, and a
<strong>pretrained</strong> model is one that has already been prepared for
a particular task. For example, popular pretrained models are used for
classifying images, identifying body poses, recognizing facial landmarks or
hand positions, and even analyzing the sentiment expressed in a text. You
can use such a model as is or treat it as a starting point for additional
learning (commonly referred to as <strong>transfer learning</strong>).
</p>
<p>
Before I get to exploring the ml5.js library, however, I’d like to try my
hand at building the simplest of all neural networks from scratch, using
only p5.js, to illustrate how the concepts of neural networks and machine
learning are implemented in code.
</p>
<h2 id="the-perceptron">The Perceptron</h2>
<p>
A <strong>perceptron</strong> is the simplest neural network possible: a
computational model of a single neuron. Invented in 1957 by Frank Rosenblatt
at the Cornell Aeronautical Laboratory, a perceptron consists of one or more
inputs, a processor, and a single output, as shown in Figure 10.3.
</p>
<figure>
<img
src="images/10_nn/10_nn_4.png"
alt="Figure 10.3: A simple perceptron with two inputs and one output"
/>
<figcaption>
Figure 10.3: A simple perceptron with two inputs and one output
</figcaption>
</figure>
<p>
A perceptron follows the <strong>feed-forward</strong> model: data passes
(feeds) through the network in one direction. The inputs are sent into the
neuron, are processed, and result in an output. This means the one-neuron
network diagrammed in Figure 10.3 reads from left to right (forward): inputs
come in, and output goes out.
</p>
<p>
Say I have a perceptron with two inputs, the values 12 and 4. In machine
learning, it’s customary to denote each input with an
<span data-type="equation">x</span>, so I’ll call these inputs
<span data-type="equation">x_0</span> and
<span data-type="equation">x_1</span>:
</p>
<table>
<thead>
<tr>
<th style="width: 100px">Input</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><span data-type="equation">x_0</span></td>
<td>12</td>
</tr>
<tr>
<td><span data-type="equation">x_1</span></td>
<td>4</td>
</tr>
</tbody>
</table>
<h3 id="perceptron-steps">Perceptron Steps</h3>
<p>
To get from these inputs to an output, the perceptron follows a series of
steps.
</p>
<h4 id="step-1-weight-the-inputs">Step 1: Weight the Inputs</h4>
<p>
Each input sent into the neuron must first be weighted, meaning it’s
multiplied by a value, often a number from –1 to +1. When creating a
perceptron, the inputs are typically assigned random weights. I’ll call my
weights <span data-type="equation">w_0</span> and
<span data-type="equation">w_1</span>:
</p>
<table>
<thead>
<tr>
<th style="width: 100px">Weight</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><span data-type="equation">w_0</span></td>
<td>0.5</td>
</tr>
<tr>
<td><span data-type="equation">w_1</span></td>
<td>–1</td>
</tr>
</tbody>
</table>
<p>Each input needs to be multiplied by its corresponding weight:</p>
<table>
<thead>
<tr>
<th style="width: 100px">Input</th>
<th style="width: 100px">Weight</th>
<th>
Input <span data-type="equation">\boldsymbol{\times}</span> Weight
</th>
</tr>
</thead>
<tbody>
<tr>
<td>12</td>
<td>0.5</td>
<td>6</td>
</tr>
<tr>
<td>4</td>
<td>–1</td>
<td>–4</td>
</tr>
</tbody>
</table>
<h4 id="step-2-sum-the-inputs">Step 2: Sum the Inputs</h4>
<p>The weighted inputs are then added together:</p>
<div data-type="equation">6 + -4 = 2</div>
<h4 id="step-3-generate-the-output">Step 3: Generate the Output</h4>
<p>
The output of a perceptron is produced by passing the sum through an
<strong>activation function</strong> that reduces the output to one of two
possible values. Think of this binary output as an LED that’s only
<em>off</em> or <em>on</em>, or as a neuron in an actual brain that either
fires or doesn’t fire. The activation function determines whether the
perceptron should “fire.”
</p>
<p>
Activation functions can get a little bit hairy. If you start reading about
them in an AI textbook, you may soon find yourself reaching in turn for a
calculus textbook. However, your new friend the simple perceptron provides
an easier option that still demonstrates the concept. I’ll make the
activation function the sign of the sum. If the sum is a positive number,
the output is +1; if it’s negative, the output is –1:
</p>
<div data-type="equation">\text{sign}(2) = +1</div>
<h3 id="putting-it-all-together-1">Putting It All Together</h3>
<p>
Putting the preceding three parts together, here are the steps of the
<strong>perceptron algorithm</strong>:
</p>
<ol>
<li>For every input, multiply that input by its weight.</li>
<li>Sum all the weighted inputs.</li>
<li>
Compute the output of the perceptron by passing that sum through an
activation function (the sign of the sum).
</li>
</ol>
<p>
I can start writing this algorithm in code by using two arrays of values,
one for the inputs and one for the weights:
</p>
<pre class="codesplit" data-code-language="javascript">
let inputs = [12, 4];
let weights = [0.5, -1];</pre
>
<p>
The “for every input” in step 1 implies a loop that multiplies each input by
its corresponding weight. To obtain the sum, the results can be added up in
that same loop:
</p>
<pre class="codesplit" data-code-language="javascript">
// Steps 1 and 2: Add up all the weighted inputs.
let sum = 0;
for (let i = 0; i &#x3C; inputs.length; i++) {
  sum += inputs[i] * weights[i];
}</pre
>
<p>With the sum, I can then compute the output:</p>
<pre class="codesplit" data-code-language="javascript">
// Step 3: Pass the sum through an activation function.
let output = activate(sum);
// The activation function
function activate(sum) {
  //{!5} Return a 1 if positive, -1 if negative.
  if (sum > 0) {
    return 1;
  } else {
    return -1;
  }
}</pre
>
<p>
You might be wondering how I’m handling the value of 0 in the activation
function. Is 0 positive or negative? The deep philosophical implications of
this question aside, I’m choosing here to arbitrarily return a –1 for 0, but
I could easily change the <code>></code> to <code>>=</code> to go the other
way. Depending on the application, this decision could be significant, but
for demonstration purposes here, I can just pick one.
</p>
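<p>
To see how much that one comparison matters, here’s a small sketch of the two
conventions side by side. The names <code>activateStrict()</code> and
<code>activateInclusive()</code> are hypothetical, introduced only for this
comparison. The first uses <code>></code>, as in the <code>activate()</code>
function above; the second uses <code>>=</code>. They disagree on exactly one
input: a sum of 0.
</p>

```javascript
// Hypothetical helpers comparing two ways of handling a sum of exactly 0.

// Uses >, like activate() above: a sum of 0 maps to -1.
function activateStrict(sum) {
  return sum > 0 ? 1 : -1;
}

// Uses >= instead: a sum of 0 maps to +1.
function activateInclusive(sum) {
  return sum >= 0 ? 1 : -1;
}

// The two conventions agree everywhere except at 0.
for (const sum of [-3, 0, 2]) {
  console.log(sum, activateStrict(sum), activateInclusive(sum));
}
```

<p>
The loop prints matching outputs for –3 and 2, and different outputs for 0,
where the strict version returns –1 and the inclusive version returns +1.
</p>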
<p>
Now that I’ve explained the computational process of a perceptron, let’s
look at an example of one in action.
</p>
<h3 id="simple-pattern-recognition-using-a-perceptron">
Simple Pattern Recognition Using a Perceptron
</h3>
<p>
I’ve mentioned that neural networks are commonly used for pattern
recognition. The scenarios outlined earlier require more complex networks,
but even a simple perceptron can demonstrate a fundamental type of pattern
recognition in which data points are classified as belonging to one of two
groups. For instance, imagine you have a dataset of plants and want to
identify them as either <em>xerophytes</em> (plants that have evolved to
survive in an environment with little water and lots of sunlight, like the
desert) or <em>hydrophytes</em> (plants that have adapted to living
submerged in water, with reduced light). That’s how I’ll use my perceptron
in this section.
</p>
<p>
One way to approach classifying the plants is to plot their data on a 2D
graph and treat the problem as a spatial one. On the x-axis, plot the amount
of daily sunlight received by the plant, and on the y-axis, plot the amount
of water. Once all the data has been plotted, it’s easy to draw a line
across the graph, with all the xerophytes on one side and all the
hydrophytes on the other, as in Figure 10.4. (I’m simplifying a little here.
Real-world data would probably be messier, making the line harder to draw.)
That’s how each plant can be classified. Is it below the line? Then it’s a
xerophyte. Is it above the line? Then it’s a hydrophyte.
</p>
<figure>
<img
src="images/10_nn/10_nn_5.png"
alt="Figure 10.4: A collection of points in 2D space divided by a line, representing plant categories according to their water and sunlight intake "
/>
<figcaption>
Figure 10.4: A collection of points in 2D space divided by a line,
representing plant categories according to their water and sunlight intake
</figcaption>
</figure>
<p>
In truth, I don’t need a neural network—not even a simple perceptron—to tell
me whether a point is above or below a line. I can see the answer for myself
with my own eyes, or have my computer figure it out with simple algebra. But
just like solving a problem with a known answer—“to be or not to be”—was a
convenient first test for the GA in
<a href="/genetic-algorithms#">Chapter 9</a>, training a perceptron to
categorize points as being on one side of a line versus the other will be a
valuable way to demonstrate the algorithm of the perceptron and verify that
it’s working properly.
</p>
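<p>
The “simple algebra” amounts to comparing a point against the line’s equation.
Here’s a minimal sketch under a few assumptions: the dividing line is a
hypothetical <span data-type="equation">y = 0.5x + 10</span>, coordinates follow
the math convention where y increases upward (the p5.js canvas flips this), and
<code>classifyPlant()</code> is an illustrative name. It returns +1 for a point
above the line (hydrophyte) and –1 for a point below (xerophyte), the same two
values a perceptron outputs.
</p>

```javascript
// Hypothetical dividing line y = m * x + b (slope and intercept chosen for illustration).
const m = 0.5;
const b = 10;

// Returns +1 (hydrophyte) if the point is above the line, -1 (xerophyte) otherwise.
// x is daily sunlight and y is water, in math coordinates where y grows upward.
function classifyPlant(x, y) {
  const lineY = m * x + b;
  return y > lineY ? 1 : -1;
}

console.log(classifyPlant(20, 50)); // the line is at y = 20 here, so 50 is above: 1
console.log(classifyPlant(80, 30)); // the line is at y = 50 here, so 30 is below: -1
```

<p>
Because a function like this always knows the right answer, it can supply known
answers for a perceptron to learn from.
</p>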
<p>
To solve this problem, I’ll give my perceptron two inputs:
<span data-type="equation">x_0</span> is the x-coordinate of a point,
representing a plant’s amount of sunlight, and
<span data-type="equation">x_1</span> is the y-coordinate of that point,
representing the plant’s amount of water. The perceptron then guesses the
plant’s classification according to the sign of the weighted sum of these
inputs. If the sum is positive, the perceptron outputs a +1, signifying a
hydrophyte (above the line). If the sum is negative, it outputs a –1,
signifying a xerophyte (below the line). Figure 10.5 shows this perceptron
(note the shorthand of <span data-type="equation">w_0</span> and
<span data-type="equation">w_1</span> for the weights).
</p>
<figure>
<img
src="images/10_nn/10_nn_6.png"
alt="Figure 10.5: A perceptron with two inputs (x_0 and x_1), a weight for each input (w_0 and w_1), and a processing neuron that generates the output"
/>
<figcaption>
Figure 10.5: A perceptron with two inputs (<span data-type="equation"
>x_0</span
>
and <span data-type="equation">x_1</span>), a weight for each input (<span
data-type="equation"
>w_0</span
>
and <span data-type="equation">w_1</span>), and a processing neuron that
generates the output
</figcaption>
</figure>
<p>
This scheme has a pretty significant problem, however. What if my data point
is (0, 0), and I send this point into the perceptron as inputs
<span data-type="equation">x_0 = 0</span> and
<span data-type="equation">x_1 = 0</span>? No matter what the weights are,
multiplication by 0 is 0. The weighted inputs are therefore still 0, and
their sum will be 0 too. And the sign of 0 is . . . hmm, there’s that deep
philosophical quandary again. Regardless of how I feel about it, the point
(0, 0) could certainly be above or below various lines in a 2D world. How is
the perceptron supposed to interpret it accurately?
</p>
<p>
To avoid this dilemma, the perceptron requires a third input, typically
referred to as a <strong>bias</strong> input. This extra input always has
the value of 1 and is also weighted. Figure 10.6 shows the perceptron with
the addition of the bias.
</p>
<figure>
<img
src="images/10_nn/10_nn_7.png"
alt="Figure 10.6: Adding a bias input, along with its weight, to the perceptron"
/>
<figcaption>
Figure 10.6: Adding a bias input, along with its weight, to the perceptron
</figcaption>
</figure>
<p>How does this affect point (0, 0)?</p>
<table>
<thead>
<tr>
<th style="width: 100px">Input</th>
<th style="width: 100px">Weight</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td><span data-type="equation">w_0</span></td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td><span data-type="equation">w_1</span></td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td><span data-type="equation">w_\text{bias}</span></td>
<td><span data-type="equation">w_\text{bias}</span></td>
</tr>
</tbody>
</table>
<p>
The output is then the sum of the weighted results:
<span data-type="equation">0 + 0 + w_\text{bias}</span>. Therefore, the bias
by itself answers the question of where (0, 0) is in relation to the line.
If the bias’s weight is positive, (0, 0) is above the line; if negative,
it’s below. The extra input and its weight <em>bias</em> the perceptron’s
understanding of the line’s position relative to (0, 0)!
</p>
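<p>
A quick numeric check confirms this. The sketch below uses hypothetical fixed
weights so the result is reproducible, and <code>weightedSum()</code> is an
illustrative helper covering steps 1 and 2 of the perceptron algorithm. Without
a bias input, the point (0, 0) sums to 0; with the bias input of 1 appended, the
sum equals the bias weight exactly.
</p>

```javascript
// Hypothetical helper: steps 1 and 2 of the perceptron algorithm.
function weightedSum(inputs, weights) {
  let sum = 0;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  return sum;
}

// Illustrative weights for x_0, x_1, and the bias.
const weights = [0.8, -0.3, 0.25];

// Without a bias input, (0, 0) sums to 0 whatever the weights are.
const noBias = weightedSum([0, 0], weights.slice(0, 2));

// With the bias input (always 1), the sum is just the bias weight.
const withBias = weightedSum([0, 0, 1], weights);

console.log(noBias, withBias); // 0 0.25
```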
<h3 id="the-perceptron-code">The Perceptron Code</h3>
<p>
I’m now ready to assemble the code for a <code>Perceptron</code> class. The
perceptron needs to track only the input weights, which I can store using an
array:
</p>
<div class="snip-below">
<pre class="codesplit" data-code-language="javascript">
class Perceptron {
  constructor() {
    this.weights = [];
  }</pre
>
</div>
<p>
The constructor can receive an argument indicating the number of inputs (in
this case, three: <span data-type="equation">x_0</span>,
<span data-type="equation">x_1</span>, and a bias) and size the
<code>weights</code> array accordingly, filling it with random values to
start:
</p>
<div class="snip-above snip-below">
<pre class="codesplit" data-code-language="javascript">
  // The argument n determines the number of inputs (including the bias).
  constructor(n) {
    this.weights = [];
    for (let i = 0; i &#x3C; n; i++) {
      //{!1} The weights are picked randomly to start.
      this.weights[i] = random(-1, 1);
    }
  }</pre
>
</div>
<p>
A perceptron’s job is to receive inputs and produce an output. These
requirements can be packaged together in a
<code>feedForward()</code> method. In this example, the perceptron’s inputs
are an array (which should be the same length as the array of weights), and
the output is a number, +1 or –1, as returned by the activation function
based on the sign of the sum:
</p>
<div class="snip-above">
<pre class="codesplit" data-code-language="javascript">
  feedForward(inputs) {
    let sum = 0;
    for (let i = 0; i &#x3C; this.weights.length; i++) {
      sum += inputs[i] * this.weights[i];
    }
    //{!1} The result is the sign of the sum, -1 or +1.
    // Here the perceptron is making a guess:
    // Is it on one side of the line or the other?
    return this.activate(sum);
  }

  // The activation function: +1 if the sum is positive, -1 otherwise
  activate(sum) {
    if (sum > 0) {
      return 1;
    } else {
      return -1;
    }
  }
}</pre
>
</div>
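<p>
Assembled, these pieces form a complete class. The sketch below is one way to
run it outside p5.js (in Node, for example), with one assumption: p5’s
<code>random(-1, 1)</code> isn’t available there, so a hypothetical
<code>randomRange()</code> helper built on <code>Math.random()</code> stands in
for it. Inside a p5.js sketch, you could keep <code>random(-1, 1)</code>.
</p>

```javascript
// Hypothetical stand-in for p5.js's random(min, max), for use outside p5.js.
function randomRange(min, max) {
  return min + Math.random() * (max - min);
}

class Perceptron {
  // n is the number of inputs, including the bias.
  constructor(n) {
    this.weights = [];
    for (let i = 0; i < n; i++) {
      // Random starting weights between -1 and 1.
      this.weights[i] = randomRange(-1, 1);
    }
  }

  // Steps 1-3: weight the inputs, sum them, and pass the sum through activate().
  feedForward(inputs) {
    let sum = 0;
    for (let i = 0; i < this.weights.length; i++) {
      sum += inputs[i] * this.weights[i];
    }
    return this.activate(sum);
  }

  // Sign activation: +1 for a positive sum, -1 otherwise (including 0).
  activate(sum) {
    return sum > 0 ? 1 : -1;
  }
}

// With random weights, the guess is +1 or -1, right or wrong by chance.
const perceptron = new Perceptron(3);
console.log(perceptron.feedForward([50, -12, 1]));
```

<p>
Setting the weights by hand, for instance to <code>[0.5, -1, 0]</code>, makes
the guess reproducible: 50 × 0.5 + (–12) × (–1) + 1 × 0 = 37, so the output is +1.
</p>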
<p>
Presumably, I could now create a <code>Perceptron</code> object and ask it
to make a guess for any given point, as in Figure 10.7.
</p>
<figure>
<img
src="images/10_nn/10_nn_8.png"
alt="Figure 10.7: An (x, y) coordinate from the 2D space is the input to the perceptron. "
/>
<figcaption>
Figure 10.7: An (<em>x</em>, <em>y</em>) coordinate from the 2D space is
the input to the perceptron.
</figcaption>
</figure>
<p>Heres the code to generate a guess:</p>
<pre class="codesplit" data-code-language="javascript">
// Create the perceptron.
let perceptron = new Perceptron(3);
// The input is three values: x, y, and the bias.
let inputs = [50, -12, 1];
// The answer!
let guess = perceptron.feedForward(inputs);</pre
>
<p>
Did the perceptron get it right? Maybe yes, maybe no. At this point, the
perceptron has no better than a 50/50 chance of arriving at the correct
answer, since each weight starts out as a random value. A neural network
isn’t a magic tool that can automatically guess correctly on its own. I need
to teach it how to do so!
</p>
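<p>
That 50/50 figure can be checked with a quick experiment. This sketch is an
illustration built on assumptions, not part of the chapter’s code: a
hypothetical line <span data-type="equation">y = 0.5x + 10</span> supplies the
right answers, fresh random weights are drawn for every guess, and
<code>Math.random()</code> stands in for p5’s <code>random()</code>. Because
weights are equally likely to be positive or negative, and flipping the sign of
every weight flips the guess, a random guess lands on the correct side about
half the time.
</p>

```javascript
// Hypothetical experiment: how often does a perceptron with random weights guess right?
// Math.random() stands in for p5.js's random() so this runs outside a sketch.
function randomRange(min, max) {
  return min + Math.random() * (max - min);
}

// Known answers from an illustrative line y = 0.5x + 10: +1 above, -1 below.
function knownAnswer(x, y) {
  return y > 0.5 * x + 10 ? 1 : -1;
}

// One guess from fresh random weights for x, y, and the bias, with sign activation.
function randomGuess(x, y) {
  const weights = [randomRange(-1, 1), randomRange(-1, 1), randomRange(-1, 1)];
  const sum = x * weights[0] + y * weights[1] + 1 * weights[2];
  return sum > 0 ? 1 : -1;
}

// Score many random points, each guessed with new random weights.
const trials = 100000;
let correct = 0;
for (let t = 0; t < trials; t++) {
  const x = randomRange(0, 100);
  const y = randomRange(0, 100);
  if (randomGuess(x, y) === knownAnswer(x, y)) {
    correct++;
  }
}

const accuracy = correct / trials;
console.log(accuracy); // near 0.5; the exact value varies from run to run
```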
<p>
To train a neural network to answer correctly, I’ll use the supervised
learning method I described earlier in the chapter. Remember, this technique
involves giving the network inputs with known answers. This enables the
network to check whether it has made a correct guess. If not, the network
can learn from its mistake and adjust its weights. The process is as
follows:
</p>
<ol>
<li>
Provide the perceptron with inputs for which there is a known answer.
</li>
<li>Ask the perceptron to guess an answer.</li>
<li>Compute the error. (Did it get the answer right or wrong?)</li>
<li>Adjust all the weights according to the error.</li>
<li>Return to step 1 and repeat!</li>
</ol>
<p>
This process can be packaged into a method on the
<code>Perceptron</code> class, but before I can write it, I need to examine
steps 3 and 4 in more detail. How do I define the perceptron’s error? And
how should I adjust the weights according to this error?
</p>
<p>
The perceptron’s error can be defined as the difference between the desired
answer and its guess:
</p>
<div data-type="equation">
\text{error} = \text{desired output} - \text{guess output}
</div>
<p>
Does this formula look familiar? Think back to the formula for a vehicle’s
steering force that I worked out in
<a href="/autonomous-agents#">Chapter 5</a>:
</p>
<div data-type="equation">
\text{steering} = \text{desired velocity} - \text{current velocity}
</div>
<p>
This is also a calculation of an error! The current velocity serves as a
guess, and the error (the steering force) indicates how to adjust the
velocity in the correct direction. Adjusting a vehicle’s velocity to follow
a target is similar to adjusting the weights of a neural network toward the
correct answer.
</p>
<p>
For the perceptron, the output has only two possible values: +1 or –1.
Therefore, only three errors are possible. If the perceptron guesses the
correct answer, the guess equals the desired output and the error is 0. If
the correct answer is –1 and the perceptron guessed +1, then the error is
–2. If the correct answer is +1 and the perceptron guessed –1, then the
error is +2. Here’s that process summarized in a table:
</p>
<table>
<thead>
<tr>
<th style="width: 100px">Desired</th>
<th style="width: 100px">Guess</th>
<th>Error</th>
</tr>
</thead>
<tbody>
<tr>
<td>–1</td>
<td>–1</td>
<td>0</td>
</tr>
<tr>
<td>–1</td>
<td>+1</td>
<td>–2</td>
</tr>
<tr>
<td>+1</td>
<td>–1</td>
<td>+2</td>
</tr>
<tr>
<td>+1</td>
<td>+1</td>
<td>0</td>
</tr>
</tbody>
</table>
<p>
The error is the determining factor in how the perceptron’s weights should
be adjusted. For any given weight, what I’m looking to calculate is the
change in weight, often called
<span data-type="equation">\Delta\text{weight}</span> (or
<em>delta weight</em>, <span data-type="equation">\Delta</span> being the
Greek letter delta):
</p>
<div data-type="equation">
\text{new weight} = \text{weight} + \Delta\text{weight}
</div>
<p>
To calculate <span data-type="equation">\Delta\text{weight}</span>, I need
to multiply the error by the input:
</p>
<div data-type="equation">
\Delta\text{weight} = \text{error} \times \text{input}
</div>
<p>Therefore, the new weight is calculated as follows:</p>
<div data-type="equation">
\text{new weight} = \text{weight} + \text{error} \times \text{input}
</div>
<p>
To understand why this works, think again about steering. A steering force
is essentially an error in velocity. By applying a steering force as an
acceleration (or <span data-type="equation">\Delta\text{velocity}</span>),
the velocity is adjusted to move in the correct direction. This is what I
want to do with the neural network’s weights. I want to adjust them in the
right direction, as defined by the error.
</p>
<p>
With steering, however, I had an additional variable that controlled the
vehicle’s ability to steer: the maximum force. A high maximum force allowed
the vehicle to accelerate and turn quickly, while a lower force resulted in
a slower velocity adjustment. The neural network will use a similar strategy
with a variable called the <strong>learning constant</strong>:
</p>
<div data-type="equation">
\text{new weight} = \text{weight} + (\text{error} \times \text{input})
\times \text{learning constant}
</div>
<p>
A high learning constant causes the weight to change more drastically. This
may help the perceptron arrive at a solution more quickly, but it also
increases the risk of overshooting the optimal weights. A small learning
constant will adjust the weights more slowly and require more training time,
but will allow the network to make small adjustments that could improve
overall accuracy.
</p>
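<p>
To see the update rule in action, here’s one training step worked through
by hand in plain JavaScript. It’s a minimal sketch: the starting weights, the
input point, and the learning constant of 0.1 are made-up numbers, chosen so
the perceptron guesses wrong and something actually changes.
</p>
<pre class="codesplit" data-code-language="javascript">
// One hand-worked training step for a perceptron with inputs x, y, and a bias of 1.
// All numbers here are hypothetical, chosen for illustration.
const weights = [0.5, -0.3, 0.1];
const inputs = [1, 4, 1];
const desired = 1; // the known answer
const learningConstant = 0.1;

// Weighted sum: 0.5 * 1 + -0.3 * 4 + 0.1 * 1 = -0.6, so the guess is -1.
const sum = inputs.reduce((total, input, i) => total + input * weights[i], 0);
const guess = sum > 0 ? 1 : -1;

// The guess is wrong: error = 1 - (-1) = 2.
const error = desired - guess;

// new weight = weight + error * input * learning constant
const newWeights = weights.map((w, i) => w + error * inputs[i] * learningConstant);
console.log(guess, error, newWeights); // newWeights is about [0.7, 0.5, 0.3]
</pre>
<p>
Notice that the weight attached to the largest input (the <em>y</em> value
of 4) moves the most, which is exactly what multiplying by the input is
meant to do.
</p>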
<p>
Assuming the addition of a <code>learningConstant</code> property to the
<code>Perceptron</code> class, I can now write a training method for the
perceptron following the steps I outlined earlier:
</p>
<pre class="codesplit" data-code-language="javascript">
// Step 1: Provide the inputs and known answer.
// These are passed in as arguments to <code>train()</code>.
train(inputs, desired) {
// Step 2: Guess according to those inputs.
let guess = this.feedforward(inputs);
// Step 3: Compute the error (the difference between <code>desired</code> and <code>guess</code>).
let error = desired - guess;
//{!3} Step 4: Adjust all the weights according to the error and learning constant.
for (let i = 0; i &#x3C; this.weights.length; i++) {
this.weights[i] = this.weights[i] + error * inputs[i] * this.learningConstant;
}
}</pre
>
<p>Here’s the <code>Perceptron</code> class as a whole:</p>
<pre class="codesplit" data-code-language="javascript">
class Perceptron {
constructor(totalInputs, learningConstant = 0.01) {
//{!2} The perceptron stores its weights and learning constant.
this.weights = [];
this.learningConstant = learningConstant;
//{!3} The weights start off random.
for (let i = 0; i &#x3C; totalInputs; i++) {
this.weights[i] = random(-1, 1);
}
}
//{!7} Return an output based on inputs.
feedforward(inputs) {
let sum = 0;
for (let i = 0; i &#x3C; this.weights.length; i++) {
sum += inputs[i] * this.weights[i];
}
return this.activate(sum);
}
// The output is a +1 or -1.
activate(sum) {
if (sum > 0) {
return 1;
} else {
return -1;
}
}
//{!7} Train the network against known data.
train(inputs, desired) {
let guess = this.feedforward(inputs);
let error = desired - guess;
for (let i = 0; i &#x3C; this.weights.length; i++) {
this.weights[i] = this.weights[i] + error * inputs[i] * this.learningConstant;
}
}
}</pre
>
<p>
To train the perceptron, I need a set of inputs with known answers. However,
I don’t happen to have a real-world dataset (or time to research and collect
one) for the xerophytes and hydrophytes scenario. In truth, though, the
purpose of this demonstration isn’t to show you how to classify plants. It’s
about how a perceptron can learn whether points are above or below a line on
a graph, and so any set of points will do. In other words, I can just make
up the data.
</p>
<p>
What I’m describing is an example of <strong>synthetic data</strong>,
artificially generated data that’s often used in machine learning to create
controlled scenarios for training and testing. In this case, my synthetic
data will consist of a set of random input points, each with a known answer
indicating whether the point is above or below a line. To define the line
and generate the data, I’ll use simple algebra. This approach allows me to
clearly demonstrate the training process and show how the perceptron learns.
</p>
<p>
The question therefore becomes, how do I pick a point and know whether it’s
above or below a line (without a neural network, that is)? A line can be
described as a collection of points, where each point’s y-coordinate is a
function of its x-coordinate:
</p>
<div data-type="equation">y = f(x)</div>
<p>
For a straight line (specifically, a linear function), the relationship can
be written like this:
</p>
<div data-type="equation">y = mx + b</div>
<p>
Here <em>m</em> is the slope of the line, and <em>b</em> is the value of
<em>y</em> when <em>x</em> is 0 (the y-intercept). Here’s a specific
example, with the corresponding graph in Figure 10.8.
</p>
<div data-type="equation">y = \frac{1}2x - 1</div>
<figure>
<img
src="images/10_nn/10_nn_9.png"
alt="Figure 10.8: A graph of y = \frac{1}2x - 1"
/>
<figcaption>
Figure 10.8: A graph of
<span data-type="equation">y = \frac{1}2x - 1</span>
</figcaption>
</figure>
<p>
I’ll arbitrarily choose that as the equation for my line and write a
function accordingly:
</p>
<pre class="codesplit" data-code-language="javascript">
// A function to calculate <code>y</code> based on <code>x</code> along a line
function f(x) {
return 0.5 * x - 1;
}</pre
>
<p>
Now there’s the matter of the p5.js canvas defaulting to (0, 0) in the
top-left corner with the y-axis pointing down. For this discussion, I’ll
assume I’ve built the following into the code to reorient the canvas to
match a more traditional Cartesian space:
</p>
<pre
class="codesplit"
data-code-language="javascript"
>// Move the origin <code>(0, 0)</code> to the center.
translate(width / 2, height / 2);
// Flip the y-axis orientation (positive points up!).
scale(1, -1);</pre>
<p>I can now pick a random point in the 2D space:</p>
<pre class="codesplit" data-code-language="javascript">
let x = random(-100, 100);
let y = random(-100, 100);</pre
>
<p>
How do I know if this point is above or below the line? The line function
<em>f</em>(<em>x</em>) returns the <em>y</em> value on the line for that
x-position. I’ll call that <span data-type="equation">y_\text{line}</span>:
</p>
<pre class="codesplit" data-code-language="javascript">
// The <code>y</code> position on the line
let yline = f(x);</pre
>
<p>
If the <em>y</em> value I’m examining is above the line, it will be greater
than <span data-type="equation">y_\text{line}</span>, as in Figure 10.9.
</p>
<figure>
<img
src="images/10_nn/10_nn_10.png"
alt="Figure 10.9: If y_\text{line} is less than y, the point is above the line."
/>
<figcaption>
Figure 10.9: If <span data-type="equation">y_\text{line}</span> is less
than <em>y</em>, the point is above the line.
</figcaption>
</figure>
<p>Here’s the code for that logic:</p>
<pre class="codesplit" data-code-language="javascript">
// Start with a value of -1.
let desired = -1;
if (y > yline) {
//{!1} The answer becomes +1 if <code>y</code> is above the line.
desired = 1;
}</pre
>
<p>
I can then make an input array to go with the <code>desired</code> output:
</p>
<pre class="codesplit" data-code-language="javascript">
// Don’t forget to include the bias!
let trainingInputs = [x, y, 1];</pre
>
<p>
Assuming that I have a <code>perceptron</code> variable, I can train it by
providing the inputs along with the desired answer:
</p>
<pre class="codesplit" data-code-language="javascript">
perceptron.train(trainingInputs, desired);</pre
>
<p>
If I train the perceptron on a new random point (and its answer) for each
cycle through <code>draw()</code>, it will gradually get better at
classifying the points as above or below the line.
</p>
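<p>
If you’d like to check that claim with numbers rather than by watching a
canvas, here’s a minimal headless sketch in plain JavaScript. It isn’t p5.js
code: it uses <code>Math.random()</code> in place of p5’s
<code>random()</code>, and the helper names (<code>makePoint()</code>,
<code>trainingData</code>, <code>testData</code>) are my own, chosen for
illustration. It trains on one set of random points and then measures
accuracy on a second set the perceptron has never seen.
</p>
<pre class="codesplit" data-code-language="javascript">
// Return a random number between min and max (a stand-in for p5's random()).
function randomRange(min, max) {
  return min + Math.random() * (max - min);
}

// The target line from the text: y = 0.5x - 1
function f(x) {
  return 0.5 * x - 1;
}

// A labeled data point: inputs [x, y, bias] and the known answer.
function makePoint() {
  const x = randomRange(-100, 100);
  const y = randomRange(-100, 100);
  return { inputs: [x, y, 1], desired: y > f(x) ? 1 : -1 };
}

// The same logic as the Perceptron class's feedforward() and train().
const weights = [randomRange(-1, 1), randomRange(-1, 1), randomRange(-1, 1)];
const learningConstant = 0.01;

function feedforward(inputs) {
  const sum = inputs.reduce((total, input, i) => total + input * weights[i], 0);
  return sum > 0 ? 1 : -1;
}

function train(inputs, desired) {
  const error = desired - feedforward(inputs);
  inputs.forEach((input, i) => {
    weights[i] += error * input * learningConstant;
  });
}

const trainingData = Array.from({ length: 2000 }, makePoint);
const testData = Array.from({ length: 500 }, makePoint);

// Make 10 passes (epochs) through the training data.
for (let epoch = 10; epoch > 0; epoch--) {
  trainingData.forEach((point) => train(point.inputs, point.desired));
}

// Accuracy on points the perceptron was never trained on
const correct = testData.filter((point) => feedforward(point.inputs) === point.desired).length;
const accuracy = correct / testData.length;
console.log(`Test accuracy: ${(accuracy * 100).toFixed(1)}%`);
</pre>
<p>
The exact percentage varies from run to run because the points are random,
but for linearly separable data like this it should come out high.
</p>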
<div data-type="example">
<h3 id="example-101-the-perceptron">Example 10.1: The Perceptron</h3>
<figure>
<div
data-type="embed"
data-p5-editor="https://editor.p5js.org/natureofcode/sketches/sMozIaMCW"
data-example-path="examples/10_nn/10_1_perceptron_with_normalization"
>
<img
src="examples/10_nn/10_1_perceptron_with_normalization/screenshot.png"
/>
</div>
<figcaption></figcaption>
</figure>
</div>
<pre class="codesplit" data-code-language="javascript">// The perceptron
let perceptron;
//{!1} An array for training data
let training = [];
// A counter to track training data points one by one
let count = 0;
//{!3} The formula for a line
function f(x) {
return 0.5 * x - 1;
}
function setup() {
createCanvas(640, 240);
// The perceptron has three inputs (including bias) and a learning rate of 0.0001.
perceptron = new Perceptron(3, 0.0001);
//{!1} Make 2,000 training data points.
for (let i = 0; i &#x3C; 2000; i++) {
let x = random(-width / 2, width / 2);
let y = random(-height / 2, height / 2);
training[i] = [x, y, 1];
}
}
function draw() {
background(255);
// Reorient the canvas to match a traditional Cartesian plane.
translate(width / 2, height / 2);
scale(1, -1);
// Draw the line.
stroke(0);
strokeWeight(2);
line(-width / 2, f(-width / 2), width / 2, f(width / 2));
// Get the current <code>(x, y)</code> of the training data.
let x = training[count][0];
let y = training[count][1];
// What is the desired output?
let desired = -1;
if (y > f(x)) {
desired = 1;
}
// Train the perceptron.
perceptron.train(training[count], desired);
// For animation, train one point at a time.
count = (count + 1) % training.length;
// Draw all the points and color according to the output of the perceptron.
for (let dataPoint of training) {
let guess = perceptron.feedforward(dataPoint);
if (guess > 0) {
fill(127);
} else {
fill(255);
}
strokeWeight(1);
stroke(0);
circle(dataPoint[0], dataPoint[1], 8);
}
}</pre>
<p>
In Example 10.1, the training data is visualized alongside the target
solution line. Each point represents a piece of training data, and its color
is determined by the perceptron’s current classification: gray for +1 or
white for −1. I use a small learning constant (0.0001) to slow down how the
system refines its classifications over time.
</p>
<p>
An intriguing aspect of this example lies in the relationship between the
perceptron’s weights and the characteristics of the line dividing the
points: specifically, the line’s slope and y-intercept (the <em>m</em> and
<em>b</em> in <em>y</em> = <em>mx</em> + <em>b</em>). The weights in this
context aren’t just arbitrary or “magic” values; they bear a direct
relationship to the geometry of the dataset. In this case, I’m using just 2D
data, but for many machine learning applications, the data exists in much
higher-dimensional spaces. The weights of a neural network help navigate
these spaces, defining <em>hyperplanes</em> or decision boundaries that
segment and classify the data.
</p>
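<p>
Here’s one way to make that relationship concrete for this example, using my
own notation: call the weights for the inputs <em>x</em>, <em>y</em>, and
the bias <span data-type="equation">w_0</span>,
<span data-type="equation">w_1</span>, and
<span data-type="equation">w_2</span>, in the same order as the
<code>[x, y, 1]</code> input array. The guess flips between +1 and −1 where
the weighted sum crosses 0, so (assuming
<span data-type="equation">w_1 \neq 0</span>) solving for <em>y</em> puts
the perceptron’s decision boundary in slope-intercept form:
</p>
<div data-type="equation">
w_0x + w_1y + w_2 = 0 \quad\Longrightarrow\quad y = -\frac{w_0}{w_1}x - \frac{w_2}{w_1}
</div>
<p>
In other words, the learned slope is
<span data-type="equation">m = -w_0/w_1</span> and the learned y-intercept
is <span data-type="equation">b = -w_2/w_1</span>. As training proceeds,
these should drift toward the target line’s values, though a perceptron only
has to find <em>a</em> line that separates the training points, not
necessarily the exact one.
</p>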
<div data-type="exercise">
<h3 id="exercise-101">Exercise 10.1</h3>
<p>
Modify the code from Example 10.1 to also draw the perceptron’s current
decision boundary during the training process (its best guess for where
the line should be). Hint: Use the perceptron’s current weights to
calculate the line’s equation.
</p>
</div>
<p>
While this perceptron example offers a conceptual foundation, real-world
datasets often feature more diverse and dynamic ranges of input values. For
the simplified scenario here, the range of values for <em>x</em> is larger
than that for <em>y</em> because of the canvas size of 640<span
data-type="equation"
>\times</span
>240. Despite this, the example still works: after all, the sign activation
function doesn’t rely on specific input ranges, and it’s such a
straightforward binary classification task.
</p>
<p>
However, real-world data often has much greater complexity in terms of input
ranges. To this end, <strong>data normalization</strong> is a critical step
in machine learning. Normalizing data involves mapping the training data to
ensure that all inputs (and outputs) conform to a uniform range, typically 0
to 1 or perhaps −1 to 1. This process can improve training efficiency and
prevent individual inputs from dominating the learning process. In the next
section, using the ml5.js library, I’ll build data normalization into the
process.
</p>
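<p>
As a preview of what normalization involves, here’s a minimal sketch in
plain JavaScript that rescales a canvas coordinate to the range −1 to 1. The
<code>normalize()</code> function is my own illustration (it does the same
linear rescaling as p5’s <code>map()</code>, with default target bounds),
and the point (200, −60) is an arbitrary example.
</p>
<pre class="codesplit" data-code-language="javascript">
// Linearly rescale value from the range [min, max] to [newMin, newMax].
function normalize(value, min, max, newMin = -1, newMax = 1) {
  return newMin + ((value - min) / (max - min)) * (newMax - newMin);
}

// With a 640x240 canvas centered on (0, 0), x spans 640 units but y spans only 240.
const canvasWidth = 640;
const canvasHeight = 240;
const x = 200;
const y = -60;

const normalizedInputs = [
  normalize(x, -canvasWidth / 2, canvasWidth / 2),
  normalize(y, -canvasHeight / 2, canvasHeight / 2),
  1, // The bias input stays at 1.
];
console.log(normalizedInputs); // [0.625, -0.5, 1]
</pre>
<p>
Both inputs now occupy the same range, so neither dominates just because the
canvas is wider than it is tall. The desired answer would still need to be
computed from the original coordinates (or the line rescaled to match).
</p>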
<div data-type="exercise">
<h3 id="exercise-102">Exercise 10.2</h3>
<p>
Instead of using supervised learning, can you train the neural network to
find the right weights by using a GA?
</p>
</div>
<div data-type="exercise">
<h3 id="exercise-103">Exercise 10.3</h3>
<p>
Incorporate data normalization into the example. Does this improve the
learning efficiency?
</p>
</div>
<h2 id="putting-the-network-in-neural-network">
Putting the “Network” in Neural Network
</h2>
<p>
A perceptron can have multiple inputs, but it’s still just a single, lonely
neuron. Unfortunately, that limits the range of problems it can solve. The
true power of neural networks comes from the <em>network</em> part. Link
multiple neurons together and you’re able to solve problems of much greater
complexity.
</p>
<p>
If you read an AI textbook, it will say that a perceptron can solve only
<strong>linearly separable</strong> problems. If a dataset is linearly
separable, you can graph it and classify it into two groups simply by
drawing a straight line (see Figure 10.10, left). Classifying plants as
xerophytes or hydrophytes is a linearly separable problem.
</p>
<figure>
<img
src="images/10_nn/10_nn_11.png"
alt="Figure 10.10: Data points that are linearly separable (left) and data points that are nonlinearly separable, as a curve is required to separate the points (right)"
/>
<figcaption>
Figure 10.10: Data points that are linearly separable (left) and data
points that are nonlinearly separable, as a curve is required to separate
the points (right)
</figcaption>
</figure>
<p>
Now imagine you’re classifying plants according to soil acidity (x-axis) and
temperature (y-axis). Some plants might thrive in acidic soils but only
within a narrow temperature range, while other plants prefer less acidic
soils but tolerate a broader range of temperatures. A more complex
relationship exists between the two variables, so a straight line can’t be
drawn to separate the two categories of plants, <em>acidophilic</em> and
<em>alkaliphilic</em> (see Figure 10.10, right). A lone perceptron can’t
handle this type of <strong>nonlinearly separable</strong> problem. (Caveat
here: I’m making up these scenarios. If you happen to be a botanist, please
let me know if I’m anywhere close to reality.)
</p>
<p>
One of the simplest examples of a nonlinearly separable problem is XOR
(exclusive or). This is a logical operator, similar to the more familiar AND
and OR. For <em>A</em> AND <em>B</em> to be true, both <em>A</em> and
<em>B</em> must be true. With OR, either <em>A</em> or <em>B</em> (or both)
can be true. These are both linearly separable problems. The truth tables in
Figure 10.11 show their solution space. Each true or false value in the
table shows the output for a particular combination of true or false inputs.
</p>
<figure>
<img
src="images/10_nn/10_nn_12.png"
alt="Figure 10.11: Truth tables for the AND and OR logical operators. The true and false outputs can be separated by a line."
/>
<figcaption>
Figure 10.11: Truth tables for the AND and OR logical operators. The true
and false outputs can be separated by a line.
</figcaption>
</figure>
<p>
See how you can draw a straight line to separate the true outputs from the
false ones?
</p>
<p>
The XOR operator is the equivalent of (OR) AND (NOT AND). In other words,
<em>A</em> XOR <em>B</em> evaluates to true only if exactly one of the
inputs is true. If both inputs are false or both are true, the output is
false. To illustrate, let’s say you’re having pizza for dinner. You love
pineapple on pizza, and you love mushrooms on pizza, but put them together,
and yech! And plain pizza, that’s no good either!
</p>
<figure>
<img
src="images/10_nn/10_nn_13.png"
alt="Figure 10.12: The “truth” table for whether you want to eat the pizza (left) and XOR (right). Note how the true and false outputs can’t be separated by a single line."
/>
<figcaption>
Figure 10.12: The “truth” table for whether you want to eat the pizza
(left) and XOR (right). Note how the true and false outputs can’t be
separated by a single line.
</figcaption>
</figure>
<p>
The XOR truth table in Figure 10.12 isn’t linearly separable. Try to draw a
straight line to separate the true outputs from the false ones. You can’t!
</p>
<p>
The fact that a perceptron can’t even solve something as simple as XOR may
seem extremely limiting. But what if I made a network out of two
perceptrons? If one perceptron can solve the linearly separable OR and one
perceptron can solve the linearly separable NOT AND, then two perceptrons
combined (with their outputs feeding one more neuron that performs the AND)
can solve the nonlinearly separable XOR.
</p>
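<p>
Here’s a minimal sketch of that idea in plain JavaScript. The weights and
biases are hand-picked, not learned, and the neurons use 0 for false and 1
for true (rather than the −1 and +1 of the <code>Perceptron</code> class) to
match the truth tables. Two perceptrons compute OR and NOT AND, and a third
perceptron ANDs their outputs together. The <code>makePerceptron()</code>
helper is my own, for illustration.
</p>
<pre class="codesplit" data-code-language="javascript">
// Build a two-input perceptron with a step activation: output 1 if the
// weighted sum plus bias is positive, otherwise 0.
function makePerceptron(weightA, weightB, bias) {
  return (a, b) => (a * weightA + b * weightB + bias > 0 ? 1 : 0);
}

// Hand-picked weights (hypothetical, chosen so each neuron matches its truth table)
const or = makePerceptron(1, 1, -0.5); // 1 if at least one input is 1
const nand = makePerceptron(-1, -1, 1.5); // 1 unless both inputs are 1
const and = makePerceptron(1, 1, -1.5); // 1 only if both inputs are 1

// The hidden layer is [or, nand]; the output neuron ANDs them together.
function xor(a, b) {
  return and(or(a, b), nand(a, b));
}

// Prints 0, 1, 1, 0 for the four rows of the truth table.
for (const [a, b] of [[0, 0], [0, 1], [1, 0], [1, 1]]) {
  console.log(`${a} XOR ${b} = ${xor(a, b)}`);
}
</pre>
<p>
The hard part is finding weights like these automatically instead of by
hand.
</p>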
<p>
When you combine multiple perceptrons, you get a
<strong>multilayered perceptron</strong>, a network of many neurons (see
Figure 10.13). Some are input neurons and receive the initial inputs, some
are part of what’s called a <strong>hidden layer</strong> (as they’re
connected to neither the inputs nor the outputs of the network directly),
and then there are the output neurons, from which the results are read.
</p>
<figure>
<img
src="images/10_nn/10_nn_14.png"
alt="Figure 10.13: A multilayered perceptron has the same inputs and output as the simple perceptron, but now it includes a hidden layer of neurons."
/>
<figcaption>
Figure 10.13: A multilayered perceptron has the same inputs and output as
the simple perceptron, but now it includes a hidden layer of neurons.
</figcaption>
</figure>
<p>
Up until now, I’ve been visualizing a singular perceptron with one circle
representing a neuron processing its input signals. Now, as I move on to
larger networks, it’s more typical to represent all the elements (inputs,
neurons, outputs) as circles, with arrows that indicate the flow of data. In
Figure 10.13, you can see the inputs and bias flowing into the hidden layer,
which then flows to the output.
</p>
<p>
Training a simple perceptron is pretty straightforward: you feed the data
through and evaluate how to change the input weights according to the error.
With a multilayered perceptron, however, the training process becomes more
complex. The overall output of the network is still generated in essentially
the same manner as before: the inputs multiplied by the weights are summed
and fed forward through the various layers of the network. And you still use
the network’s guess to calculate the error (desired result − guess). But now
so many connections exist between layers of the network, each with its own
weight. How do you know how much each neuron or connection contributed to
the overall error of the network, and how it should be adjusted?
</p>
<p>
The solution to optimizing the weights of a multilayered network is
<strong>backpropagation</strong>. This process takes the error and feeds it
backward through the network so it can adjust the weights of all the
connections in proportion to how much they’ve contributed to the total
error. The details of backpropagation are beyond the scope of this book. The
algorithm relies on differentiable activation functions (one classic example
is the sigmoid function) as well as some calculus. If you’re interested in
continuing down this road and learning more about how backpropagation works,
you can find my
<a href="https://thecodingtrain.com/neural-network"
>“Toy Neural Network” project at the Coding Train website with
accompanying video tutorials</a
>. They go through all the steps of solving XOR using a multilayered
feed-forward network with backpropagation. For this chapter, however, I’d
instead like to get some help and phone a friend.
</p>
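<p>
For a taste of why the sigmoid comes up, here’s a small sketch in plain
JavaScript. Unlike the perceptron’s all-or-nothing <code>activate()</code>,
the sigmoid changes smoothly, so it has a slope (derivative) everywhere, and
backpropagation relies on derivatives like this one to work out how much
each weight should change. This is only the activation function and its
derivative, not an implementation of backpropagation.
</p>
<pre class="codesplit" data-code-language="javascript">
// The sigmoid squashes any number into the range between 0 and 1.
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// The sigmoid's derivative can be written in terms of the sigmoid itself.
function sigmoidDerivative(x) {
  const s = sigmoid(x);
  return s * (1 - s);
}

console.log(sigmoid(0)); // 0.5
console.log(sigmoidDerivative(0)); // 0.25
console.log(sigmoid(10)); // close to 1
console.log(sigmoid(-10)); // close to 0
</pre>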
<h2 id="machine-learning-with-ml5js">Machine Learning with ml5.js</h2>
<p>
That friend is ml5.js. This machine learning library can manage the details
of complex processes like backpropagation so you and I don’t have to worry
about them. As I mentioned earlier in the chapter, ml5.js aims to provide a
friendly entry point for those who are new to machine learning and neural
networks, while still harnessing the power of Google’s TensorFlow.js behind
the scenes.
</p>
<p>
To use ml5.js in a sketch, you must import it via a
<code>&#x3C;script></code> element in your <em>index.html</em> file, much as
you did with Matter.js and Toxiclibs.js in
<a href="/physics-libraries#">Chapter 6</a>:
</p>
<pre class="codesplit" data-code-language="html">
&#x3C;script src="https://unpkg.com/ml5@latest/dist/ml5.min.js">&#x3C;/script></pre
>
<p>
My goal for the rest of this chapter is to introduce ml5.js by developing a
system that can recognize mouse gestures. This will prepare you for
<a href="/neuroevolution#">Chapter 11</a>, where I’ll add a neural network
“brain” to an autonomous steering agent and tie machine learning back into
the story of the book. First, however, I’d like to talk more generally
through the steps of training a multilayered neural network model using
supervised learning. Outlining these steps will highlight important
decisions you’ll have to make before developing a learning model, introduce
the syntax of the ml5.js library, and provide you with the context you’ll
need before training your own machine learning models.
</p>
<h3 id="the-machine-learning-life-cycle">The Machine Learning Life Cycle</h3>
<p>
The life cycle of a machine learning model is typically broken into seven
steps:
</p>
<ol>
<li>
<strong>Collect the data.</strong> Data forms the foundation of any
machine learning task. This stage might involve running experiments,
manually inputting values, sourcing public data, or a myriad of other
methods (like generating synthetic data).
</li>
<li>
<strong>Prepare the data.</strong> Raw data often isn’t in a format
suitable for machine learning algorithms. It might also have duplicate or
missing values, or contain outliers that skew the data. Such
inconsistencies may need to be manually adjusted. Additionally, as I
mentioned earlier, neural networks work best with normalized data, which
has values scaled to fit within a standard range. Another key part of
preparing data is separating it into distinct sets: training, validation,
and testing. The training data is used to teach the model (step 4), while
the validation and testing data (the distinction is subtle; more on this
later) are set aside and reserved for evaluating the model’s performance
(step 5).
</li>
<li>
<strong>Choose a model.</strong> Design the architecture of the neural
network. Different models are more suitable for certain types of data and
outputs.
</li>
<li>
<strong>Train the model.</strong> Feed the training portion of the data
through the model and allow the model to adjust the weights of the neural
network based on its errors. This process is known as
<strong>optimization</strong>: the model tunes the weights so they result
in the fewest number of errors.
</li>
<li>
<strong>Evaluate the model.</strong> Remember the testing data that was
set aside in step 2? Since that data wasn’t used in training, it provides
a means to evaluate how well the model performs on new, unseen data.
</li>
<li>
<strong>Tune the parameters.</strong> The training process is influenced
by a set of parameters (often called <strong>hyperparameters</strong>)
such as the learning rate, which dictates how much the model should adjust
its weights based on errors in prediction. I called this the
<code>learningConstant</code> in the perceptron example. By fine-tuning
these parameters and revisiting steps 4 (training), 3 (model selection),
and even 2 (data preparation), you can often improve the model’s
performance.
</li>
<li>
<strong>Deploy the model.</strong> Once the model is trained and its
performance is evaluated satisfactorily, it’s time to use the model out in
the real world with new data!
</li>
</ol>
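<p>
The split into training, validation, and testing sets described in step 2
is simple enough to sketch directly. Here’s a minimal version in plain
JavaScript that shuffles a copy of the data (with the Fisher-Yates
algorithm) and slices it. The <code>splitData()</code> name and the 70/15/15
proportions are my own assumptions, a common starting point rather than a
rule.
</p>
<pre class="codesplit" data-code-language="javascript">
// Shuffle a copy of the data (Fisher-Yates), then slice it into three sets.
function splitData(data, trainFraction = 0.7, validationFraction = 0.15) {
  const shuffled = [...data];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const trainEnd = Math.round(shuffled.length * trainFraction);
  const validationEnd = trainEnd + Math.round(shuffled.length * validationFraction);
  return {
    training: shuffled.slice(0, trainEnd),
    validation: shuffled.slice(trainEnd, validationEnd),
    testing: shuffled.slice(validationEnd), // whatever remains
  };
}

const data = Array.from({ length: 100 }, (_, i) => i);
const { training, validation, testing } = splitData(data);
console.log(training.length, validation.length, testing.length); // 70 15 15
</pre>
<p>
Shuffling first matters: if the data were sorted (say, all of one iris
species first), slicing without shuffling could leave whole categories out
of the training set.
</p>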
<p>
These steps are the cornerstone of supervised machine learning. However,
even though 7 is a truly excellent number, I think I missed one more
critical step. I’ll call it step 0.
</p>
<ol>
<li value="0">
<strong>Identify the problem.</strong> This initial step defines the
problem that needs solving. What is the objective? What are you trying to
accomplish or predict with your machine learning model?
</li>
</ol>
<p>
This zeroth step informs all the other steps in the process. After all, how
are you supposed to collect your data and choose a model without knowing
what you’re even trying to do? Are you predicting a number? A category? A
sequence? Is it a binary choice, or are there many options? These sorts of
questions often boil down to choosing between two types of tasks that the
majority of machine learning applications fall into: classification and
regression.
</p>
<h3 id="classification-and-regression">Classification and Regression</h3>
<p>
<strong>Classification</strong> is a type of machine learning problem that
involves predicting a <strong>label</strong> (also called a
<strong>category</strong> or <strong>class</strong>) for a piece of data. If
this sounds familiar, that’s because it is: the simple perceptron in Example
10.1 was trained to classify points as above or below a line. To give
another example, an image classifier might try to guess if a photo is of a
cat or a dog and assign the corresponding label (see Figure 10.14).
</p>
<figure>
<img
src="images/10_nn/10_nn_15.png"
alt="Figure 10.14: Labeling images as cats or dogs"
/>
<figcaption>Figure 10.14: Labeling images as cats or dogs</figcaption>
</figure>
<p>
Classification doesn’t happen by magic. The model must first be shown many
examples of dogs and cats with the correct labels in order to properly
configure the weights of all the connections. This is the training part of
supervised learning.
</p>
<p>
The classic “Hello, world!” demonstration of machine learning and supervised
learning is a classification problem of the MNIST dataset. Short for
<em>Modified National Institute of Standards and Technology</em>,
<strong>MNIST</strong> is a dataset that was collected and processed by Yann
LeCun (Courant Institute, NYU), Corinna Cortes (Google Labs), and
Christopher J.C. Burges (Microsoft Research). Widely used for training and
testing in the field of machine learning, this dataset consists of 70,000
handwritten digits from 0 to 9; each is a 28<span data-type="equation"
>\times</span
>28-pixel grayscale image (see Figure 10.15 for examples). Each image is
labeled with its corresponding digit.
</p>
<figure>
<img
src="images/10_nn/10_nn_16.png"
alt="Figure 10.15: A selection of handwritten digits 0 to 9 from the MNIST dataset (courtesy of Suvanjanprasai)"
/>
<figcaption>
Figure 10.15: A selection of handwritten digits 0 to 9 from the MNIST dataset
(courtesy of Suvanjanprasai)
</figcaption>
</figure>
<p>
MNIST is a canonical example of a training dataset for image classification:
the model has a discrete number of categories to choose from (10 to be
exact, no more, no less). The 70,000 images are conventionally split into
60,000 for training and 10,000 for testing. After the model is trained on
the labeled training images, the goal is for it to classify new images and
assign the appropriate label, a digit from 0 to 9.
</p>
<p>
<strong>Regression</strong>, on the other hand, is a machine learning task
for which the prediction is a continuous value, typically a floating-point
number. A regression problem can involve multiple outputs, but thinking
about just one is often simpler to start. For example, consider a machine
learning model that predicts the daily electricity usage of a house based on
input factors like the number of occupants, the size of the house, and the
temperature outside (see Figure 10.16).
</p>
<figure>
<img
src="images/10_nn/10_nn_17.png"
alt="Figure 10.16: Factors like weather and the size and occupancy of a home can influence its daily electricity usage."
/>
<figcaption>
Figure 10.16: Factors like weather and the size and occupancy of a home
can influence its daily electricity usage.
</figcaption>
</figure>
<p>
Rather than picking from a discrete set of output options, the goal of the
neural network is now to guess a number—any number. Will the house use 30.5
kilowatt-hours of electricity that day? Or 48.7 kWh? Or 100.2 kWh? The
output prediction could be any value from a continuous range.
</p>
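<p>
To see the difference in code, here’s a minimal sketch in plain JavaScript
of a single neuron set up for regression. The important change from the
perceptron is that no step activation squashes the weighted sum into +1 or
−1: the sum itself is the prediction. The weights, bias, and property names
are invented for illustration, not fit to any real electricity data.
</p>
<pre class="codesplit" data-code-language="javascript">
// Hypothetical weights, one per input factor, plus a bias
const weights = { occupants: 2.5, sizeSqFt: 0.01, temperatureC: 0.4 };
const bias = 5;

// No activation function: the weighted sum is returned as a continuous number.
function predictUsage(house) {
  return (
    house.occupants * weights.occupants +
    house.sizeSqFt * weights.sizeSqFt +
    house.temperatureC * weights.temperatureC +
    bias
  );
}

const prediction = predictUsage({ occupants: 3, sizeSqFt: 1500, temperatureC: 20 });
console.log(`${prediction} kWh`); // 35.5 kWh (from made-up weights)
</pre>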
<h3 id="network-design">Network Design</h3>
<p>
Knowing what problem you’re trying to solve (step 0) also has a significant
bearing on the design of the neural network, in particular on its input and
output layers. I’ll demonstrate with another classic “Hello, world!”
classification example from the field of data science and machine learning:
the iris dataset. This dataset, which can be found in the Machine Learning
Repository at the University of California, Irvine, originated from the work
of American botanist Edgar Anderson.
</p>
<p>
Anderson collected flower data over many years across multiple regions of
the United States and Canada. For more on the origins of this famous
dataset, see
<a href="https://academic.oup.com/jrssig/article/18/6/26/7038520"
>“The Iris Data Set: In Search of the Source of <em>Virginica</em>” by
Antony Unwin and Kim Kleinman</a
>. After carefully analyzing the data, Anderson built a table to classify
iris flowers into three distinct species: <em>Iris setosa</em>,
<em>Iris virginica</em>, and <em>Iris versicolor</em> (see Figure 10.17).
</p>
<figure>
<img
src="images/10_nn/10_nn_18.png"
alt="Figure 10.17: Three distinct species of iris flowers"
/>
<figcaption>
Figure 10.17: Three distinct species of iris flowers
</figcaption>
</figure>
<p>
Anderson included four numeric attributes for each flower: sepal length,
sepal width, petal length, and petal width, all measured in centimeters. (He
also recorded color information, but that data appears to have been lost.)
Each record is then paired with the appropriate iris categorization:
</p>
<table>
<thead>
<tr>
<th>Sepal Length</th>
<th>Sepal Width</th>
<th>Petal Length</th>
<th>Petal Width</th>
<th>Classification</th>
</tr>
</thead>
<tbody>
<tr>
<td>5.1</td>
<td>3.5</td>
<td>1.4</td>
<td>0.2</td>
<td><em>Iris setosa</em></td>
</tr>
<tr>
<td>4.9</td>
<td>3.0</td>
<td>1.4</td>
<td>0.2</td>
<td><em>Iris setosa</em></td>
</tr>
<tr>
<td>7.0</td>
<td>3.2</td>
<td>4.7</td>
<td>1.4</td>
<td><em>Iris versicolor</em></td>
</tr>
<tr>
<td>6.4</td>
<td>3.2</td>
<td>4.5</td>
<td>1.5</td>
<td><em>Iris versicolor</em></td>
</tr>
<tr>
<td>6.3</td>
<td>3.3</td>
<td>6.0</td>
<td>2.5</td>
<td><em>Iris virginica</em></td>
</tr>
<tr>
<td>5.8</td>
<td>2.7</td>
<td>5.1</td>
<td>1.9</td>
<td><em>Iris virginica</em></td>
</tr>
</tbody>
</table>
<p>
In this dataset, the first four columns (sepal length, sepal width, petal
length, petal width) serve as inputs to the neural network. The output is
the classification provided in the fifth column. Figure 10.18 depicts a
possible architecture for a neural network that can be trained on this data.
</p>
<figure>
<img
src="images/10_nn/10_nn_19.png"
alt="Figure 10.18: A possible network architecture for iris classification"
/>
<figcaption>
Figure 10.18: A possible network architecture for iris classification
</figcaption>
</figure>
<p>
On the left are the four inputs to the network, corresponding to the first
four columns of the data table. On the right are three possible outputs,
each representing one of the iris species labels. In between is the hidden
layer, which, as mentioned earlier, adds complexity to the network’s
architecture, necessary for handling nonlinearly separable data. Each node
in the hidden layer is connected to every node that comes before and after
it. This is commonly called a <strong>fully connected</strong> or
<strong>dense</strong> layer.
</p>
<p>
You might also notice the absence of explicit bias nodes in this diagram.
While biases play an important role in the output of each neuron, they’re
often left out of visual representations to keep the diagrams clean and
focused on the primary data flow. (The ml5.js library will ultimately manage
the biases for me internally.)
</p>
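<p>
To make the idea of a fully connected layer concrete, here’s a minimal
sketch in plain JavaScript (no p5.js or ml5.js required) of one dense
layer’s forward pass. The function name <code>denseForward()</code> and the
weight layout (one row of weights per neuron) are my own illustrative
choices, not part of ml5.js. Each neuron sums every input multiplied by its
weight, adds its own bias (the part the diagram leaves out), and passes the
result through an activation function, here a ReLU chosen only for
illustration.
</p>
<pre class="codesplit" data-code-language="javascript">
// Hypothetical helper: the forward pass of one dense (fully connected) layer.
// weights[j][i] is the weight from input i to neuron j; biases[j] is neuron j's bias.
function denseForward(inputs, weights, biases, activation) {
  return weights.map((row, j) => {
    // Start from the bias, then add every weighted input.
    let sum = biases[j];
    row.forEach((w, i) => {
      sum += w * inputs[i];
    });
    return activation(sum);
  });
}

// ReLU activation: negative sums become 0.
function relu(x) {
  return Math.max(0, x);
}

// Two inputs feeding two neurons (made-up weights and biases)
let hidden = denseForward([1, 2], [[1, 1], [-1, 0]], [0.5, 0], relu);
// hidden is [3.5, 0]</pre>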
<p>
The neural network’s goal is to “activate” the correct output for the input
data, just as the perceptron would output a +1 or −1 for its single binary
classification. In this case, the output values are like signals that help
the network decide which iris species label to assign. The highest computed
value activates to signify the network’s best guess about the
classification.
</p>
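<p>
Picking the highest computed value is known as an <em>argmax</em>: find the
index of the largest output, then look up the label at that index. The
following plain JavaScript sketch shows the idea. The output numbers are
invented for illustration, and the label order is an assumption; it only has
to match the order of the network’s output nodes.
</p>
<pre class="codesplit" data-code-language="javascript">
// The three output labels, in the same order as the network's output nodes
let labels = ["Iris setosa", "Iris versicolor", "Iris virginica"];

// Hypothetical raw output values for one flower (made up for illustration)
let outputs = [0.1, 0.7, 0.2];

// Return the index of the largest value (the first one, in case of a tie).
function argmax(values) {
  return values.reduce((best, v, i) => (v > values[best] ? i : best), 0);
}

let guess = labels[argmax(outputs)];
// guess is "Iris versicolor"</pre>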
<p>
The key takeaway here is that a classification network should have as many
inputs as there are values for each item in the dataset, and as many outputs
as there are categories. As for the hidden layer, the design is much less
set in stone. The hidden layer in Figure 10.18 has five nodes, but this
number is entirely arbitrary. Neural network architectures can vary greatly,
and the number of hidden nodes is often determined through trial and error
or other educated guessing methods (called <em>heuristics</em>). In the
context of this book, I’ll be relying on ml5.js to automatically configure
the architecture based on the input and output data.
</p>
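<p>
That rule of thumb fits in a few lines of plain JavaScript. The helper
below, <code>describeNetwork()</code>, is hypothetical (it isn’t an ml5.js
function): given an array of records, the names of the input columns, and
the name of the label column, it counts the inputs and collects the distinct
labels. The records are rows from the iris table, with property names I
chose. Notice that it says nothing about the hidden layer, which is the part
left to trial and error (or to ml5.js).
</p>
<pre class="codesplit" data-code-language="javascript">
// A few rows from the iris table (property names are my own choice)
let irisData = [
  { sepalLength: 5.1, sepalWidth: 3.5, petalLength: 1.4, petalWidth: 0.2, species: "Iris setosa" },
  { sepalLength: 7.0, sepalWidth: 3.2, petalLength: 4.7, petalWidth: 1.4, species: "Iris versicolor" },
  { sepalLength: 6.3, sepalWidth: 3.3, petalLength: 6.0, petalWidth: 2.5, species: "Iris virginica" },
  { sepalLength: 5.8, sepalWidth: 2.7, petalLength: 5.1, petalWidth: 1.9, species: "Iris virginica" },
];

// Hypothetical helper: one input per feature column, one output per distinct label
function describeNetwork(data, inputKeys, labelKey) {
  // A Set keeps each label once, in the order it first appears.
  let labels = [...new Set(data.map((row) => row[labelKey]))];
  return { inputs: inputKeys.length, outputs: labels };
}

let shape = describeNetwork(
  irisData,
  ["sepalLength", "sepalWidth", "petalLength", "petalWidth"],
  "species"
);
// shape.inputs is 4; shape.outputs is ["Iris setosa", "Iris versicolor", "Iris virginica"]</pre>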
<p>
What about the inputs and outputs in a regression scenario, like the
household electricity consumption example I mentioned earlier? I’ll go ahead
and make up a dataset for this scenario, with values representing the
occupants and size of the house, the day’s temperature, and the
corresponding electricity usage. This is much like a synthetic dataset,
given that it’s not data collected for a real-world scenario—but whereas
synthetic data is generated automatically, here I’m manually inputting
numbers from my own imagination:
</p>
<table>
<thead>
<tr>
<th>Occupants</th>
<th>Size (m²)</th>
<th>Temperature Outside (°C)</th>
<th>Electricity Usage (kWh)</th>
</tr>
</thead>
<tbody>
<tr>
<td>4</td>
<td>150</td>
<td>24</td>
<td>25.3</td>
</tr>
<tr>
<td>2</td>
<td>100</td>
<td>25.5</td>
<td>16.2</td>
</tr>
<tr>
<td>1</td>
<td>70</td>
<td>26.5</td>
<td>12.1</td>
</tr>
<tr>
<td>4</td>
<td>120</td>
<td>23</td>
<td>22.1</td>
</tr>
<tr>
<td>2</td>
<td>90</td>
<td>21.5</td>
<td>15.2</td>
</tr>
<tr>
<td>5</td>
<td>180</td>
<td>20</td>
<td>24.4</td>
</tr>
<tr>
<td>1</td>
<td>60</td>
<td>18.5</td>
<td>11.7</td>
</tr>
</tbody>
</table>
<p>
The neural network for this problem should have three input nodes
corresponding to the first three columns (occupants, size, temperature).
Meanwhile, it should have one output node representing the fourth column,
the network’s guess about the electricity usage. And I’ll arbitrarily say
the network’s hidden layer should have four nodes rather than five. Figure
10.19 shows this network architecture.
</p>
<figure>
<img
src="images/10_nn/10_nn_20.png"
alt="Figure 10.19: A possible network architecture for three inputs and one regression output"
/>
<figcaption>
Figure 10.19: A possible network architecture for three inputs and one
regression output
</figcaption>
</figure>
<p>
Unlike the iris classification network, which is choosing from three labels
and therefore has three outputs, this network is trying to predict just one
number, so it has only one output. I’ll note, however, that a single output
isn’t a requirement of regression. A machine learning model can also perform
a regression that predicts multiple continuous values, in which case the
model would have multiple outputs.
</p>
<h3 id="ml5js-syntax">ml5.js Syntax</h3>
<p>
The ml5.js library is a collection of machine learning models that can be
accessed using the syntax <code>ml5.<em>functionName</em>()</code>. For
example, to use a pretrained model that detects hand positions, you can use
<code>ml5.handpose()</code>. For classifying images, you can use
<code>ml5.imageClassifier()</code>. While I encourage you to explore all that
ml5.js has to offer (I’ll reference some of these pretrained models in
upcoming exercise ideas), for this chapter I’ll focus on only one function in
ml5.js, <code>ml5.neuralNetwork()</code>, which creates an empty neural
network for you to train.
</p>
<p>
To use this function, you must first create a JavaScript object that will
configure the model being created. Here’s where some of the big-picture
factors I just discussed—is this a classification or a regression task? How
many inputs and outputs?—come into play. I’ll begin by specifying the task I
want the model to perform (<code>"regression"</code> or
<code>"classification"</code>):
</p>
<pre class="codesplit" data-code-language="javascript">
let options = { task: "classification" };
let classifier = ml5.neuralNetwork(options);</pre
>
<p>
This, however, gives ml5.js little to go on in terms of designing the
network architecture. Adding the inputs and outputs will complete the rest
of the puzzle. The iris flower classification has four inputs and three
possible output labels. This can be configured as part of the
<code>options</code> object with a single integer for the number of inputs
and an array of strings listing the output labels:
</p>
<pre class="codesplit" data-code-language="javascript">
let options = {
inputs: 4,
outputs: ["iris-setosa", "iris-virginica", "iris-versicolor"],
task: "classification",
};
let irisClassifier = ml5.neuralNetwork(options);</pre
>
<p>
The electricity regression scenario had three input values (occupants, size,
temperature) and one output value (usage in kWh). With regression, there are
no string output labels, so only an integer indicating the number of outputs
is required:
</p>
<pre class="codesplit" data-code-language="javascript">
let options = {
inputs: 3,
outputs: 1,
task: "regression",
};
let energyPredictor = ml5.neuralNetwork(options);</pre
>
<p>
You can set many other properties of the model through the
<code>options</code> object. For example, you could specify the number of
hidden layers between the inputs and outputs (there are typically several),
the number of neurons in each layer, which activation functions to use, and
more. In most cases, however, you can leave out these extra settings and let
ml5.js make its best guess on how to design the model based on the task and
data at hand.
</p>
<h2 id="building-a-gesture-classifier">Building a Gesture Classifier</h2>
<p>
I’ll now walk through the steps of the machine learning life cycle with an
example problem well suited for p5.js, building all the code for each step
along the way using ml5.js. I’ll begin at step 0 by articulating the
problem. Imagine for a moment that you’re working on an interactive
application that responds to gestures. Maybe the gestures are ultimately
meant to be recorded via body tracking, but you want to start with something
much simpler—a single stroke of the mouse (see Figure 10.20).
</p>
<figure>
<img
src="images/10_nn/10_nn_21.png"
alt="Figure 10.20: A single mouse gesture as a vector between a start and end point"
/>
<figcaption>
Figure 10.20: A single mouse gesture as a vector between a start
and end point
</figcaption>
</figure>
<p>
Each gesture could be recorded as a vector extending from the start to the
end point of a mouse movement. The x- and y-components of the vector will be
the model’s inputs. The model’s task could be to predict one of four
possible labels for the gesture: <em>up</em>, <em>down</em>, <em>left</em>,
or <em>right</em>. With a discrete set of possible outputs, this sounds like
a classification problem. The four labels will be the model’s outputs.
</p>
<p>
Much like some of the GA demonstrations in
<a href="/genetic-algorithms#">Chapter 9</a>—and like the simple perceptron
example earlier in this chapter—the problem I’m selecting here has a known
solution and could be solved more easily and efficiently without a neural
network. The direction of a vector can be classified with the
<code>heading()</code> function and a series of <code>if</code> statements!
However, by using this seemingly trivial scenario, I hope to explain the
process of training a machine learning model in an understandable and
friendly way. Additionally, this example will make it easy to check that the
code is working as expected. When I’m done, I’ll provide some ideas about
how to expand the classifier to a scenario that couldn’t use simple
<code>if</code> statements.
</p>
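<p>
For comparison, here’s roughly what that non-neural-network solution could
look like. It’s written in plain JavaScript with <code>Math.atan2(y, x)</code>,
which computes the same angle (in radians) that p5.js’s
<code>heading()</code> returns for a vector. The function name
<code>classifyDirection()</code> is my own, and the 45-degree boundaries
between labels are an assumption. Because y increases downward on a p5.js
canvas, a positive y-component means “down.”
</p>
<pre class="codesplit" data-code-language="javascript">
// Hypothetical rule-based classifier: the known solution, no neural network needed
function classifyDirection(x, y) {
  // Same angle as p5.js heading(): between -PI and PI, measured from the +x axis
  let angle = Math.atan2(y, x);
  let q = Math.PI / 4;
  if (Math.abs(angle) >= 3 * q) {
    return "left";
  } else if (angle >= q) {
    // y grows downward on the canvas, so a positive angle points down.
    return "down";
  } else if (angle > -q) {
    return "right";
  } else {
    return "up";
  }
}

// classifyDirection(1, 0) returns "right"; classifyDirection(0, -1) returns "up".</pre>
<p>
Running it on the eight handcoded examples that follow reproduces every
label, which makes it a handy sanity check for the trained model’s answers.
</p>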
<h3 id="collecting-and-preparing-the-data">
Collecting and Preparing the Data
</h3>
<p>
With the problem established, I can turn to steps 1 and 2: collecting and
preparing the data. In the real world, these steps can be tedious,
especially when the raw data you collect is messy and needs a lot of initial
processing. You can think of this like having to organize, wash, and chop
all your ingredients before you can start cooking a meal from scratch.
</p>
<p>
For simplicity, I’d instead like to take the approach of ordering a machine
learning “meal kit,” with the ingredients (data) already portioned and
prepared. This way, I’ll get straight to the cooking itself, the process of
training the model. After all, this is really just an appetizer for what
will be the ultimate meal in <a href="/neuroevolution#">Chapter 11</a>, when
I apply neural networks to steering agents.
</p>
<p>
With that in mind, I’ll handcode some example data and manually keep it
normalized within a range of −1 to +1. I’ll organize the data into an array
of objects, pairing the x- and y-components of a vector with a string label.
I’m picking values that I feel clearly point in a specific direction and
assigning the appropriate label—two examples per label:
</p>
<pre class="codesplit" data-code-language="javascript">
let data = [
{ x: 0.99, y: 0.02, label: "right" },
{ x: 0.76, y: -0.1, label: "right" },
{ x: -1.0, y: 0.12, label: "left" },
{ x: -0.9, y: -0.1, label: "left" },
{ x: 0.02, y: 0.98, label: "down" },
{ x: -0.2, y: 0.75, label: "down" },
{ x: 0.01, y: -0.9, label: "up" },
{ x: -0.1, y: -0.8, label: "up" },
];</pre
>
<p>Figure 10.21 shows the same data expressed as arrows.</p>
<figure>
<img
src="images/10_nn/10_nn_22.png"
alt="Figure 10.21: The input data visualized as vectors (arrows)"
/>
<figcaption>
Figure 10.21: The input data visualized as vectors (arrows)
</figcaption>
</figure>
<p>
In a more realistic scenario, I’d probably have a much larger dataset that
would be loaded in from a separate file, instead of written directly into
the code. For example, JavaScript Object Notation (JSON) and comma-separated
values (CSV) are two popular formats for storing and loading data. JSON
stores data in key-value pairs and closely mirrors the syntax of JavaScript
object literals (with a few stricter rules, such as requiring keys to be
double-quoted strings). CSV is a file format that stores tabular data
(like a spreadsheet). You could use numerous other data formats, depending
on your needs and the programming environment you’re working with.
</p>
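<p>
As a sketch of what loading from CSV involves, here’s a plain JavaScript
function that turns CSV text into the same array-of-objects format as
<code>data</code>. The name <code>parseGestureCSV()</code> and the column
names <code>x,y,label</code> are assumptions, and this naive version doesn’t
handle quoted fields or commas inside values. For real files, p5.js’s
<code>loadTable()</code> or a dedicated CSV library would be the safer choice.
</p>
<pre class="codesplit" data-code-language="javascript">
// Hypothetical CSV text with a header row (in practice, this would come from a file)
let csv = `x,y,label
0.99,0.02,right
-1.0,0.12,left
0.02,0.98,down
0.01,-0.9,up`;

// Naive parser: assumes no quoted fields and no commas inside values
function parseGestureCSV(text) {
  let lines = text.trim().split("\n");
  let header = lines[0].split(",");
  return lines.slice(1).map((line) => {
    let values = line.split(",");
    let record = {};
    header.forEach((key, i) => {
      // Convert numeric columns to numbers; keep the label as a string.
      record[key] = key === "label" ? values[i] : parseFloat(values[i]);
    });
    return record;
  });
}

let parsed = parseGestureCSV(csv);
// parsed[0] is { x: 0.99, y: 0.02, label: "right" }</pre>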
<p>
In the real world, the values in that larger dataset would actually come
from somewhere. Maybe I would collect the data by asking users to perform
specific gestures and recording their inputs, or by writing an algorithm to
automatically generate larger amounts of synthetic data that represent the
idealized versions of the gestures I want the model to recognize. In either
case, the key would be to collect a diverse set of examples that adequately
represent the variations in how the gestures might be performed. For now,
however, let’s see how it goes with just a few servings of data.
</p>
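<p>
Here’s one way the synthetic route could look, as a plain JavaScript sketch
under my own assumptions: each example is a unit vector pointing in one of
the four directions, rotated by a small random angle (up to ±20 degrees, an
arbitrary choice) to mimic the variation in real gestures. The function name
<code>generateGestureData()</code> is hypothetical. Since every vector has a
length of 1, the values already fall within −1 to +1.
</p>
<pre class="codesplit" data-code-language="javascript">
// Ideal angle for each label on a p5.js canvas (y points down)
let idealAngles = {
  right: 0,
  down: Math.PI / 2,
  left: Math.PI,
  up: -Math.PI / 2,
};

// Hypothetical generator: count examples per label, each jittered by up to maxJitter radians
function generateGestureData(count, maxJitter = Math.PI / 9) {
  return Object.keys(idealAngles).flatMap((label) =>
    Array.from({ length: count }, () => {
      // Random offset between -maxJitter and +maxJitter
      let angle = idealAngles[label] + (Math.random() * 2 - 1) * maxJitter;
      return { x: Math.cos(angle), y: Math.sin(angle), label };
    })
  );
}

// 50 examples per label, 200 in total
let syntheticData = generateGestureData(50);</pre>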
<div data-type="exercise">
<h3 id="exercise-104">Exercise 10.4</h3>
<p>
Create a p5.js sketch that collects gesture data from users and saves it
to a JSON file. You can use <code>mousePressed()</code> and
<code>mouseReleased()</code> to mark the start and end of each gesture,
and <code>saveJSON()</code> to download the data into a file.
</p>
</div>
<h3 id="choosing-a-model">Choosing a Model</h3>
<p>
I’ve now come to step 3 of the machine learning life cycle, selecting a
model. This is where I’m going to start letting ml5.js do the heavy lifting
for me. To create the model with ml5.js, all I need to do is specify the
task, the inputs, and the outputs:
</p>
<pre class="codesplit" data-code-language="javascript">
let options = {
task: "classification",
inputs: 2,
outputs: ["up", "down", "left", "right"],
  debug: true,
};
let classifier = ml5.neuralNetwork(options);</pre
>
<p>
That’s it! I’m done! Thanks to ml5.js, I can bypass a host of complexities
such as the number of layers and neurons per layer to have, the kinds of
activation functions to use, and how to set up the algorithms for training
the network. The library will make these decisions for me.
</p>
<p>
Of course, the default ml5.js model architecture may not be perfect for all
cases. I encourage you to read the ml5.js documentation for additional
details on how to customize the model. I’ll also point out that ml5.js is
able to infer the inputs and outputs from the data, so those properties
aren’t entirely necessary to include here in the
<code>options</code> object. However, for the sake of clarity (and since
I’ll need to specify them for later examples), I’m including them here.
</p>
<p>
The <code>debug</code> property, when set to <code>true</code>, turns on a
visual interface for the training process. It’s a helpful tool for spotting
potential issues during training and for getting a better understanding of
what’s happening behind the scenes. You’ll see what this interface looks
like later in the chapter.
</p>
<h3 id="training-the-model">Training the Model</h3>
<p>
Now that I have the data in a <code>data</code> variable and a neural
network initialized in the <code>classifier</code> variable, I’m ready to
train the model. That process starts with adding the data to the model. And
for that, it turns out I’m not quite done with preparing the data.
</p>
<p>
Right now, my data is neatly organized in an array of objects, each
containing the x- and y-components of a vector and a corresponding string
label. This is a typical format for training data, but it isn’t directly
consumable by ml5.js. (Sure, I could have initially organized the data into
a format that ml5.js recognizes, but I’m including this extra step because
it will likely be necessary when you’re using a dataset that has been
collected or sourced elsewhere.) To add the data to the model, I need to
separate the inputs from the outputs so that the model understands which are
which.
</p>
<p>
The ml5.js library offers a fair amount of flexibility in the kinds of
formats it will accept, but I’ll choose to use arrays—one for the
<code>inputs</code> and one for the <code>outputs</code>. I can use a loop
to reorganize each data item and add it to the model:
</p>
<pre class="codesplit" data-code-language="javascript">
for (let item of data) {
// An array of two numbers for the inputs
let inputs = [item.x, item.y];
// A single string label for the output
let outputs = [item.label];
//{!1} Add the training data to the classifier.
classifier.addData(inputs, outputs);
}</pre
>
<p>
What I’ve done here is set the <strong>shape</strong> of the data. In
machine learning, this term describes the data’s dimensions and structure.
It indicates how the data is organized in terms of rows, columns, and
potentially even deeper, into additional dimensions. Understanding the shape
of your data is crucial because it determines the way the model should be
structured.
</p>
<p>
Here, the input data’s shape is a 1D array containing two numbers
(representing <em>x</em> and <em>y</em>). The output data, similarly, is a
1D array containing just a single string label. Every piece of data going in
and out of the network will follow this pattern. While this is a small and
simple example, it nicely mirrors many real-world scenarios in which the
inputs are numerically represented in an array, and the outputs are string
labels.
</p>
<p>
After passing the data into the <code>classifier</code>, ml5.js provides a
helper function to normalize it. As I’ve mentioned, normalizing data
(adjusting the scale to a standard range) is a critical step in the machine
learning process:
</p>
<pre class="codesplit" data-code-language="javascript">
// Normalize the data.
classifier.normalizeData();</pre
>
<p>
In this case, the handcoded data was limited to a range of −1 to +1 from the
get-go, so calling <code>normalizeData()</code> here is likely redundant.
Still, this function call is important to demonstrate. Normalizing your data
ahead of time as part of the preprocessing step will absolutely work, but
the auto-normalization feature of ml5.js is a big help!
</p>
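<p>
Since the gesture data doesn’t really need it, the electricity table is a
better place to see what normalization does. One common approach is
<em>min-max scaling</em>, which maps a column’s smallest value to 0 and its
largest to 1. The plain JavaScript sketch below applies it to the Size
column. Whether <code>normalizeData()</code> in ml5.js uses exactly this
scheme is an assumption worth checking in the ml5.js documentation; the
function name <code>minMaxNormalize()</code> is my own.
</p>
<pre class="codesplit" data-code-language="javascript">
// Size (m²) column from the electricity table
let sizes = [150, 100, 70, 120, 90, 180, 60];

// Min-max scaling: the smallest value becomes 0, the largest becomes 1.
function minMaxNormalize(values) {
  let min = Math.min(...values);
  let max = Math.max(...values);
  // Guard against a column where every value is the same.
  let range = max - min || 1;
  return values.map((v) => (v - min) / range);
}

let normalizedSizes = minMaxNormalize(sizes);
// 60 becomes 0, 180 becomes 1, 120 becomes 0.5, and 150 becomes 0.75</pre>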
<p>
Now for the heart of the machine learning process: actually training the
model. Here’s the code:
</p>
<pre class="codesplit" data-code-language="javascript">
// The <code>train()</code> method initiates the training process.
classifier.train(finishedTraining);
// A callback function for when the training is complete
function finishedTraining() {
console.log("Training complete!");
}</pre
>
<p>
Yes, that’s it! After all, the hard work has already been completed. The
data was collected, prepared, and fed into the model. All that remains is to
call the <code>train()</code> method, sit back, and let ml5.js do its thing.
</p>
<p>
In truth, it isn’t <em>quite</em> that simple. If I were to run the code as
written and then test the model, the results would probably be inadequate.
Here’s where another key term in machine learning comes into play:
<strong>epochs</strong>. The <code>train()</code> method tells the neural
network to start the learning process. But how long should it train for? You
can think of an epoch as one round of practice, one cycle of using the
entire training dataset to update the weights of the neural network.
Generally speaking, the more epochs you go through, the better the network
will perform, but at a certain point you’ll have diminishing returns. The
number of epochs can be set by passing an <code>options</code> object
to <code>train()</code>:
</p>
<pre class="codesplit" data-code-language="javascript">
//{!1} Set the number of epochs for training.
let options = { epochs: 25 };
classifier.train(options, finishedTraining);</pre
>
<p>
The number of epochs is an example of a hyperparameter, a global setting for
the training process. You can set others through the
<code>options</code> object (the learning rate, for example), but I’m going
to stick with the defaults. You can read more about customization options in
the ml5.js documentation.
</p>
<p>
The second argument to <code>train()</code> is optional, but it’s good to
include one. It specifies a callback function that runs when the training
process is complete—in this case, <code>finishedTraining()</code>. (See the
“Callbacks” box for more on callback functions.) This is useful for knowing
when you can proceed to the next steps in your code. Another optional
callback, which I usually name <code>whileTraining()</code>, is triggered
after each epoch. However, for my purposes, knowing when the training is
done is plenty!
</p>
<div data-type="note">
<h3 id="callbacks">Callbacks</h3>
<p>
A <strong>callback function</strong> in JavaScript is a function you don’t
actually call yourself. Instead, you provide it as an argument to another
function, intending for it to be <em>called back</em> automatically at a
later time (typically associated with an event, like a mouse click).
You’ve seen this before when working with Matter.js in
<a href="/physics-libraries#">Chapter 6</a>, where you specified a
function to call whenever a collision was detected.
</p>
<p>
Callbacks are needed for <strong>asynchronous</strong> operations, when
you want your code to continue along with animating or doing other things
while waiting for another task (like training a machine learning model) to
finish. A classic example of this in p5.js is loading data into a sketch
with <code>loadJSON()</code>.
</p>
<p>
JavaScript also provides a more recent approach for handling asynchronous
operations known as <strong>promises</strong>. With promises, you can use
keywords like <code>async</code> and <code>await</code> to make your
asynchronous code look more like traditional synchronous code. While
ml5.js also supports this style, I’ll stick to using callbacks to stay
aligned with p5.js style.
</p>
</div>
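<p>
To make “epoch” less abstract, here’s a deliberately tiny stand-in for what
happens during training, written in plain JavaScript. It is <em>not</em>
what ml5.js does internally (ml5.js builds a TensorFlow.js network with a
hidden layer); it’s a single-layer softmax classifier trained with plain
gradient descent on the eight gesture examples. Each call to
<code>trainEpoch()</code> passes over every example once, which is one
epoch, and <code>learningRate</code> is another hyperparameter. All names,
the zero starting weights, and the learning rate of 0.5 are my own choices.
</p>
<pre class="codesplit" data-code-language="javascript">
// The eight handcoded gesture examples
let data = [
  { x: 0.99, y: 0.02, label: "right" },
  { x: 0.76, y: -0.1, label: "right" },
  { x: -1.0, y: 0.12, label: "left" },
  { x: -0.9, y: -0.1, label: "left" },
  { x: 0.02, y: 0.98, label: "down" },
  { x: -0.2, y: 0.75, label: "down" },
  { x: 0.01, y: -0.9, label: "up" },
  { x: -0.1, y: -0.8, label: "up" },
];
let labels = ["up", "down", "left", "right"];

// One weight pair (for x and y) and one bias per label, all starting at 0
let weights = labels.map(() => [0, 0]);
let biases = labels.map(() => 0);

// Turn raw scores into probabilities that sum to 1.
function softmax(scores) {
  let maxScore = Math.max(...scores);
  let exps = scores.map((s) => Math.exp(s - maxScore));
  let total = exps.reduce((sum, e) => sum + e, 0);
  return exps.map((e) => e / total);
}

// Probabilities for each label, given one (x, y) input
function predict(x, y) {
  return softmax(weights.map((w, k) => w[0] * x + w[1] * y + biases[k]));
}

// One epoch: a gradient-descent update for every example in the dataset.
// Returns the average cross-entropy loss measured during the epoch.
function trainEpoch(learningRate) {
  let totalLoss = 0;
  for (let item of data) {
    let probs = predict(item.x, item.y);
    let target = labels.indexOf(item.label);
    totalLoss += -Math.log(probs[target]);
    probs.forEach((p, k) => {
      // Gradient for label k's score: its probability, minus 1 if it's the correct label
      let error = p - (k === target ? 1 : 0);
      weights[k][0] -= learningRate * error * item.x;
      weights[k][1] -= learningRate * error * item.y;
      biases[k] -= learningRate * error;
    });
  }
  return totalLoss / data.length;
}

// Train for 200 epochs with a learning rate of 0.5, keeping each epoch's loss.
let losses = Array.from({ length: 200 }, () => trainEpoch(0.5));
console.log(losses[0], losses[losses.length - 1]);</pre>
<p>
The first epoch’s loss should land below ln(4) ≈ 1.39, the loss of a model
that gives all four labels equal probability, and the later losses should
shrink toward 0: the same downward trend a training graph is meant to show.
</p>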
<h3 id="evaluating-the-model">Evaluating the Model</h3>
<p>
If <code>debug</code> is set to <code>true</code> in the initial call to
<code>ml5.neuralNetwork()</code>, a visual interface should appear after
<code>train()</code> is called, covering most of the p5.js page and canvas
(see Figure 10.22). This interface, called the <em>Visor</em>, represents
the evaluation step.
</p>
<figure>
<img
src="images/10_nn/10_nn_23.png"
alt="Figure 10.22: The Visor, with a graph of the loss function and model details"
/>
<figcaption>
Figure 10.22: The Visor, with a graph of the loss function and model
details
</figcaption>
</figure>
<p>
The Visor comes from TensorFlow.js (which underlies ml5.js) and includes a
graph that provides real-time feedback on the progress of the training. This
graph plots the loss of the model on the y-axis against the number of epochs
along the x-axis. <strong>Loss</strong> is a measure of how far off the
model’s predictions are from the correct outputs provided by the training
data. It quantifies the model’s total error. When training begins, it’s
common for the loss to be high because the model has yet to learn anything.
Ideally, as the model trains through more epochs, it should get better at
its predictions, and the loss should decrease. If the graph goes down as the
epochs increase, this is a good sign!
</p>
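<p>
For classification, one common way to measure loss is <em>categorical
cross-entropy</em>: the negative natural logarithm of the confidence the
model assigns to the correct label, usually averaged over all the examples.
Whether the Visor plots exactly this quantity depends on how ml5.js
configures the model, so treat the following plain JavaScript as an
illustration of the idea. The function name and confidence values are made
up.
</p>
<pre class="codesplit" data-code-language="javascript">
// Cross-entropy loss for one prediction: -ln(confidence assigned to the correct label)
function crossEntropy(confidences, correctIndex) {
  return -Math.log(confidences[correctIndex]);
}

// Made-up confidences for ["up", "down", "left", "right"]; the correct label is "right".
let early = [0.25, 0.25, 0.25, 0.25]; // an untrained model guessing evenly
let later = [0.02, 0.01, 0.01, 0.96]; // a trained model

let earlyLoss = crossEntropy(early, 3); // about 1.386
let laterLoss = crossEntropy(later, 3); // about 0.041</pre>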
<p>
Running the training for the 200 epochs depicted in Figure 10.22 might
strike you as a bit excessive. In a real-world scenario with more extensive
data, I would probably use fewer epochs, like the 25 I specified in the
original code snippet. However, because the dataset here is so tiny, the
higher number of epochs helps the model get enough practice with the data.
Remember, this is a toy example, aiming to make the concepts clear rather
than to produce a sophisticated machine learning model.
</p>
<p>
Below the graph, the Visor shows a Model Summary table with details on the
lower-level TensorFlow.js model architecture created behind the scenes. The
summary includes layer names, neuron counts per layer (in the Output Shape
column), and a parameters count, which is the total number of trainable
values in each layer: one weight for each connection between two neurons,
plus one bias for each neuron. In this case, dense_Dense1 is the
hidden layer with 16 neurons (a number chosen by ml5.js), and dense_Dense2
is the output layer with 4 neurons, one for each classification category.
(TensorFlow.js doesn’t think of the inputs as a distinct layer; rather,
they’re merely the starting point of the data flow.) The <em>batch</em> in
the Output Shape column doesn’t refer to a specific number but indicates
that the model can process a variable amount of training data (a batch) for
any single cycle of model training.
</p>
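<p>
The parameter counts can be checked by hand: a dense layer with
<em>n</em> inputs and <em>m</em> neurons has <em>n</em> × <em>m</em> weights
plus <em>m</em> biases. The plain JavaScript below does that arithmetic for
the layer sizes described for this model (two inputs, 16 hidden neurons,
4 outputs); the function name is my own, and it’s worth comparing the
results with the numbers in your own Visor.
</p>
<pre class="codesplit" data-code-language="javascript">
// Weights (one per connection) plus biases (one per neuron) in a dense layer
function denseParamCount(inputCount, neuronCount) {
  return inputCount * neuronCount + neuronCount;
}

let hiddenParams = denseParamCount(2, 16); // 2 * 16 + 16 = 48
let outputParams = denseParamCount(16, 4); // 16 * 4 + 4 = 68
let totalParams = hiddenParams + outputParams; // 116</pre>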
<p>
Before moving on from the evaluation stage, I have a loose end to tie up.
When I first outlined the steps of the machine learning life cycle, I
mentioned that preparing the data typically involves splitting the dataset
into three parts to help with the evaluation process:
</p>
<ul>
<li>
<strong>Training:</strong> The primary dataset used to train the model
</li>
<li>
<strong>Validation:</strong> A subset of the data used to check the model
during training, typically at the end of each epoch
</li>
<li>
<strong>Testing:</strong> Additional untouched data never considered
during the training process, for determining the model’s final performance
after the training is completed
</li>
</ul>
<p>
You may have noticed that I never did this. For simplicity, I’ve instead
used the entire dataset for training. After all, my dataset has only eight
records; it’s much too small to divide into three sets! With a large dataset,
this three-way split would be more appropriate.
</p>
<p>
Using such a small dataset risks the model <strong>overfitting</strong> the
data, however: the model becomes so tuned to the specific peculiarities of
the training data that it’s much less effective when working with new,
unseen data. The main reason to use a validation set is to monitor the model
during the training process. As training progresses, if the model’s accuracy
improves on the training data but deteriorates on the validation data, it’s
a strong indicator that overfitting might be occurring. (The testing set is
reserved strictly for the final evaluation, one more chance after training
is complete to gauge the model’s performance.)
</p>
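<p>
With a bigger dataset, the split itself is straightforward to sketch. The
plain JavaScript below shuffles a copy of the data with a Fisher-Yates
shuffle (so each set gets a random mix of labels) and slices it into
training, validation, and testing sets. The function name
<code>splitData()</code> and the 70/15/15 proportions are my own
assumptions, not ml5.js defaults.
</p>
<pre class="codesplit" data-code-language="javascript">
// Hypothetical helper: shuffle a copy of the data, then slice it into three sets.
function splitData(data, trainFraction = 0.7, validationFraction = 0.15) {
  let shuffled = [...data];
  // Fisher-Yates shuffle: swap each element with a random element at or before it.
  for (let i = shuffled.length - 1; i > 0; i--) {
    let j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  let trainEnd = Math.round(shuffled.length * trainFraction);
  let validationEnd = trainEnd + Math.round(shuffled.length * validationFraction);
  return {
    training: shuffled.slice(0, trainEnd),
    validation: shuffled.slice(trainEnd, validationEnd),
    // Whatever remains is held back for the final test.
    testing: shuffled.slice(validationEnd),
  };
}

// Example: 100 numbered records split 70/15/15
let records = Array.from({ length: 100 }, (_, i) => i);
let sets = splitData(records);
// sets.training has 70 items; sets.validation and sets.testing have 15 each</pre>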
<p>
For more realistic scenarios, ml5.js provides a way to split up the data, as
well as automatic features for employing validation data. If you’re inclined
to go further,
<a href="http://ml5js.org/"
>you can explore the full set of neural network examples on the ml5.js
website</a
>.
</p>
<h3 id="tuning-the-parameters">Tuning the Parameters</h3>
<p>
After the evaluation step, there’s typically an iterative process of
adjusting hyperparameters and going through training again to achieve the
best performance from the model. While ml5.js offers capabilities for
parameter tuning (which you can learn about in the library’s reference), it
isn’t really geared toward making low-level, fine-grained adjustments to a
model. Using TensorFlow.js directly might be your best bet if you want to
explore this step in more detail, since it offers a broader suite of tools
and allows for lower-level control over the training process.
</p>
<p>
In this case, tuning the parameters isn’t strictly necessary. The graph in
the Visor shows a loss all the way down at 0.1, which is low enough for
my purposes. I’m happy to move on.
</p>
<h3 id="deploying-the-model">Deploying the Model</h3>
<p>
It’s finally time to deploy the model and see the payoff of all that hard
work. This typically involves integrating the model into a separate
application to make predictions or decisions based on new, previously unseen
data. For this, ml5.js offers the convenience of a
<code>save()</code> function to download the trained model to a file from
one sketch and a <code>load()</code> function to load it for use in a
completely different sketch. This saves you from having to retrain the model
from scratch every single time you need it.
</p>
<p>
While a model would typically be deployed to a different sketch from the one
where it was trained, I’m going to deploy the model in the same sketch for
the sake of simplicity. In fact, once the training process is complete, the
resulting model is, in essence, already deployed in the current sketch. It’s
saved in the <code>classifier</code> variable and can be used to make
predictions by passing the model new data through the
<code>classify()</code> method. The shape of the data sent to
<code>classify()</code> should match that of the input data used in
training—in this case, two floating-point numbers, representing the x- and
y-components of a direction vector:
</p>
<pre class="codesplit" data-code-language="javascript">
// Manually create a vector.
let direction = createVector(1, 0);
// Convert the x- and y-components into an input array.
let inputs = [direction.x, direction.y];
// Ask the model to classify the inputs.
classifier.classify(inputs, gotResults);</pre
>
<p>
The second argument to <code>classify()</code> is another callback function
for accessing the results:
</p>
<pre class="codesplit" data-code-language="javascript">
function gotResults(results) {
console.log(results);
}</pre
>
<p>
The model’s prediction arrives in the argument to the callback, which I’m
calling <code>results</code> in the code. Inside, you’ll find an array of
the possible labels, sorted by <strong>confidence</strong>, a probability
value that the model assigns to each label. These probabilities represent
how sure the model is of that particular prediction. They range from 0 to 1,
with values closer to 1 indicating higher confidence and values near 0
suggesting lower confidence:
</p>
<pre class="codesplit" data-code-language="json">
[
{
"label": "right",
"confidence": 0.9669702649116516
},
{
"label": "up",
"confidence": 0.01878807507455349
},
{
"label": "down",
"confidence": 0.013948931358754635
},
{
"label": "left",
"confidence": 0.00029277068097144365
}
]</pre
>
<p>
In this example output, the model is highly confident (approximately 96.7
percent) that the correct label is <code>"right"</code>, while it has
minimal confidence (0.03 percent) in the <code>"left"</code> label. The
confidence values are normalized and add up to 100 percent.
</p>
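<p>
Confidences that add up to 1 like this are typically produced by a
<em>softmax</em> function applied to the network’s raw output scores. I’m
assuming that’s the mechanism here, since the library handles it
internally. The plain JavaScript below sketches how raw scores could become
an array of label and confidence pairs, sorted from most to least confident;
the function name <code>toResults()</code> and the raw scores are made up.
</p>
<pre class="codesplit" data-code-language="javascript">
// Softmax: exponentiate each score, then divide by the total so the results sum to 1.
function softmax(scores) {
  // Subtracting the largest score first avoids overflow without changing the result.
  let maxScore = Math.max(...scores);
  let exps = scores.map((s) => Math.exp(s - maxScore));
  let total = exps.reduce((sum, e) => sum + e, 0);
  return exps.map((e) => e / total);
}

// Hypothetical helper: pair labels with confidences and sort by confidence, highest first.
function toResults(labels, scores) {
  return softmax(scores)
    .map((confidence, i) => ({ label: labels[i], confidence }))
    .sort((a, b) => b.confidence - a.confidence);
}

// Made-up raw scores for one input
let results = toResults(["up", "down", "left", "right"], [0.5, 0.2, -3.0, 4.0]);
// results[0].label is "right"; results[3].label is "left"</pre>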
<p>
All that remains now is to fill out the sketch with code so the model can
receive live input from the mouse. The first step is to signal the
completion of the training process so the user knows the model is ready.
I’ll include a global <code>status</code> variable to track the training
process and ultimately display the predicted label on the canvas. The
variable is initialized to <code>"training"</code> but updated to
<code>"ready"</code> through the <code>finishedTraining()</code> callback:
</p>
<pre class="codesplit" data-code-language="javascript">
// When the sketch starts, it will show a status of <code>training</code>.
let status = "training";
function draw() {
background(255);
textAlign(CENTER, CENTER);
textSize(64);
text(status, width / 2, height / 2);
}
// This is the callback for when training is complete, and the message changes to <code>ready</code>.
function finishedTraining() {
status = "ready";
}</pre
>
<p>
Finally, I’ll use p5.js’s mouse functions to build a vector while the mouse
is being dragged and call <code>classifier.classify()</code> on that vector
when the mouse is released.
</p>
<div data-type="example">
<h3 id="example-102-gesture-classifier">
Example 10.2: Gesture Classifier
</h3>
<figure>
<div
data-type="embed"
data-p5-editor="https://editor.p5js.org/natureofcode/sketches/SbfSv_GhM"
data-example-path="examples/10_nn/10_2_gesture_classifier"
>
<img src="examples/10_nn/10_2_gesture_classifier/screenshot.png" />
</div>
<figcaption></figcaption>
</figure>
</div>
<pre class="codesplit" data-code-language="javascript">
// Global variables for the start and end points of a gesture
let start, end;
// Store the start of a gesture when the mouse is pressed.
function mousePressed() {
  start = createVector(mouseX, mouseY);
}
// Update the end of a gesture as the mouse is dragged.
function mouseDragged() {
  end = createVector(mouseX, mouseY);
}
// The gesture is complete when the mouse is released.
function mouseReleased() {
  // Calculate and normalize a direction vector.
  let dir = p5.Vector.sub(end, start);
  dir.normalize();
  // Convert to an input array and classify.
  let inputs = [dir.x, dir.y];
  classifier.classify(inputs, gotResults);
}
// Store the resulting label in the <code>status</code> variable for showing in the canvas.
function gotResults(results) {
  status = results[0].label;
}</pre
>
<p>
Since the <code>results</code> array is sorted by confidence, if I just want
to use a single label as the prediction, I can access the first element of
the array with <code>results[0].label</code>, as in the
<code>gotResults()</code> function in Example 10.2. This label is passed to
the <code>status</code> variable to be displayed on the canvas.
</p>
<div data-type="exercise">
<h3 id="exercise-105">Exercise 10.5</h3>
<p>
Divide Example 10.2 into three sketches: one for collecting data, one for
training, and one for deployment. Use the
<code>ml5.neuralNetwork</code> functions <code>save()</code> and
<code>load()</code> for saving and loading the model to and from a file,
respectively.
</p>
</div>
<div data-type="exercise">
<h3 id="exercise-106">Exercise 10.6</h3>
<p>
Expand the gesture-recognition model to classify a sequence of vectors,
capturing the path of a longer mouse movement more accurately. Remember,
your input data must have a consistent shape, so you’ll have to decide how
many vectors to use to represent a gesture and store no more and no less
for each data point. While this approach can work, other machine learning
models (such as recurrent neural networks) are specifically designed to
handle sequential data and might offer more flexibility and potential
accuracy.
</p>
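<p>
One possible way to give every gesture the same shape, sketched here under
assumptions rather than as the exercise’s answer: record the mouse path as an
array of <code>{ x, y }</code> points while dragging, then sample it at evenly
spaced indices to produce a fixed number of normalized direction vectors. The
constant <code>SEGMENTS</code> and the function <code>pathToInputs()</code>
are hypothetical names.
</p>

```javascript
// Hypothetical number of direction vectors per gesture; every data point
// must use the same value so the inputs always have the same length.
const SEGMENTS = 4;

// Convert a recorded path (an array of { x, y } points, at least 2 long)
// into a flat array of normalized direction vectors:
// [dx0, dy0, dx1, dy1, ...], always 2 * segments numbers long.
function pathToInputs(path, segments = SEGMENTS) {
  const inputs = [];
  const last = path.length - 1;
  for (let i = 0; i < segments; i++) {
    // Pick evenly spaced points along the path (by sample index, not distance).
    const a = path[Math.round((i * last) / segments)];
    const b = path[Math.round(((i + 1) * last) / segments)];
    let dx = b.x - a.x;
    let dy = b.y - a.y;
    const len = Math.hypot(dx, dy);
    // Normalize; a segment with no movement stays (0, 0).
    if (len > 0) {
      dx /= len;
      dy /= len;
    }
    inputs.push(dx, dy);
  }
  return inputs;
}

// An L-shaped drag: right, then down (y increases downward on the canvas).
const lShape = [
  { x: 0, y: 0 },
  { x: 10, y: 0 },
  { x: 20, y: 0 },
  { x: 20, y: 10 },
  { x: 20, y: 20 },
];
console.log(pathToInputs(lShape)); // 8 numbers: 1, 0, 1, 0, 0, 1, 0, 1
```

<p>
Sampling by index assumes the mouse moves at a roughly steady speed;
resampling by distance along the path would be more robust to changes in
speed, at the cost of a little more code.
</p>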
</div>
<div data-type="exercise">
<h3 id="exercise-107">Exercise 10.7</h3>
<p>
One of the pretrained models in ml5.js is called <em>Handpose</em>. The
input of the model is an image, and the prediction is a list of 21 key
points—x- and y-positions, also known as <em>landmarks</em>—that describe
a hand.
</p>
<figure>
<img src="images/10_nn/10_nn_24.png" alt="" />
<figcaption></figcaption>
</figure>
<p>
Can you use the outputs of the <code>ml5.handpose()</code> model as the
inputs to an <code>ml5.neuralNetwork()</code> and classify various hand
gestures (like a thumbs-up or thumbs-down)? For hints, you can watch my
<a href="https://thecodingtrain.com/pose-classifier"
>video tutorial that walks you through this process for body poses in
the machine learning track on the Coding Train website</a
>.
</p>
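<p>
A possible starting point for turning a hand into inputs, assuming the 21
landmarks are available as an array of <code>{ x, y }</code> objects with the
wrist first (check the Handpose documentation for the exact structure your
version of ml5.js returns). The hypothetical <code>handToInputs()</code>
measures every point relative to the wrist and scales by the farthest point,
so the same gesture gives similar numbers wherever the hand is in the frame
and however close it is to the camera.
</p>

```javascript
// Hypothetical helper: turn 21 hand keypoints into 42 numbers for
// ml5.neuralNetwork(). Assumes keypoints is an array of { x, y } objects
// with the wrist first (keypoint 0 is the wrist in the MediaPipe hand model).
function handToInputs(keypoints) {
  const wrist = keypoints[0];
  // Positions relative to the wrist, so the hand's location in the frame doesn't matter.
  const relative = keypoints.map((p) => ({ x: p.x - wrist.x, y: p.y - wrist.y }));
  // Scale by the farthest keypoint so a near or far hand gives similar numbers.
  let maxDist = 0;
  for (const p of relative) {
    maxDist = Math.max(maxDist, Math.hypot(p.x, p.y));
  }
  // Flatten into [x0, y0, x1, y1, ...], each value between -1 and 1.
  const inputs = [];
  for (const p of relative) {
    if (maxDist > 0) {
      inputs.push(p.x / maxDist, p.y / maxDist);
    } else {
      inputs.push(0, 0);
    }
  }
  return inputs;
}
```

<p>
The resulting array of 42 numbers could then serve as the inputs both when
adding training data and when classifying, with labels such as
<code>"thumbs-up"</code> and <code>"thumbs-down"</code> as the outputs.
</p>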
</div>
<div data-type="project">
<h3 id="the-ecosystem-project-11">The Ecosystem Project</h3>
<p>
Incorporate machine learning into your ecosystem to enhance the behavior
of creatures. How could classification or regression be applied?
</p>
<ul>
<li>
Can you classify the creatures of your ecosystem into multiple
categories? What if you use an initial population as a training dataset,
and as new creatures are born, the system classifies them according to
their features? What are the inputs and outputs for your system?
</li>
<li>
Can you use regression to predict the life span of a creature based on
its properties? Think about how size and speed affected the life span of
the bloops from <a href="/genetic-algorithms#">Chapter 9</a>. Could you
analyze how well the regression model’s predictions align with the
actual outcomes?
</li>
</ul>
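<p>
For the last question, one common way to measure how closely a regression
model’s predictions match actual outcomes is the <em>mean absolute
error</em> (MAE): the average of the absolute differences between predicted
and actual values. The related <em>root mean squared error</em> (RMSE)
squares each difference before averaging, so large misses count for more. A
minimal sketch, assuming you’ve recorded a predicted and an actual life span
for each creature; the function names and the numbers below are made up for
illustration.
</p>

```javascript
// Check that both arrays are non-empty and line up one-to-one.
function checkLengths(predicted, actual) {
  if (predicted.length !== actual.length || predicted.length === 0) {
    throw new Error("Need two non-empty arrays of the same length.");
  }
}

// Mean absolute error: the average size of a miss, in the data's own units.
function meanAbsoluteError(predicted, actual) {
  checkLengths(predicted, actual);
  let sum = 0;
  for (let i = 0; i < predicted.length; i++) {
    sum += Math.abs(predicted[i] - actual[i]);
  }
  return sum / predicted.length;
}

// Root mean squared error: like MAE, but large misses are weighted more heavily.
function rootMeanSquaredError(predicted, actual) {
  checkLengths(predicted, actual);
  let sum = 0;
  for (let i = 0; i < predicted.length; i++) {
    const diff = predicted[i] - actual[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum / predicted.length);
}

// Hypothetical life spans (in frames) for four bloops.
const predictedSpans = [300, 450, 200, 600];
const actualSpans = [320, 400, 210, 640];
console.log(meanAbsoluteError(predictedSpans, actualSpans)); // 30
console.log(rootMeanSquaredError(predictedSpans, actualSpans)); // about 33.9
```

<p>
Both measures are in the same units as the life spans, so an MAE of 30 means
the predictions miss by 30 frames on average.
</p>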
<figure>
<img src="images/10_nn/10_nn_25.png" alt="" />
<figcaption></figcaption>
</figure>
</div>
</section>