<section data-type="chapter">
|
||
<h1 id="chapter-10-neural-networks">Chapter 10. Neural Networks</h1>
|
||
<div class="chapter-opening-quote">
|
||
<blockquote data-type="epigraph">
|
||
<p>The human brain has 100 billion neurons,</p>
|
||
<p>each neuron connected to 10 thousand</p>
|
||
<p>other neurons. Sitting on your shoulders</p>
|
||
<p>is the most complicated object</p>
|
||
<p>in the known universe.</p>
|
||
<div class="chapter-opening-quote-source">
|
||
<p>—Michio Kaku</p>
|
||
</div>
|
||
</blockquote>
|
||
</div>
|
||
<div class="chapter-opening-figure">
|
||
<figure>
|
||
<img src="images/10_nn/10_nn_1.jpg" alt="" />
|
||
<figcaption></figcaption>
|
||
</figure>
|
||
<h3
|
||
id="khipu-on-display-at-the-machu-picchu-museum-cusco-peru-photo-by-pi3124"
|
||
>
|
||
Khipu on display at the Machu Picchu Museum, Cusco, Peru (photo by
|
||
Pi3.124)
|
||
</h3>
|
||
<p>
|
||
The <em>khipu</em> (or <em>quipu</em>) is an ancient Incan device used for
|
||
recordkeeping and communication. It comprised a complex system of knotted
|
||
cords to encode and transmit information. Each colored string and knot
|
||
type and pattern represented specific data, such as census records or
|
||
calendrical information. Interpreters, known as <em>quipucamayocs</em>,
|
||
acted as a kind of accountant and decoded the stringed narrative into
|
||
understandable information.
|
||
</p>
|
||
</div>
|
||
<p>
|
||
I began with inanimate objects living in a world of forces, and I gave them
|
||
desires, autonomy, and the ability to take action according to a system of
|
||
rules. Next, I allowed those objects, now called <em>creatures</em>, to live
|
||
in a population and evolve over time. Now I’d like to ask, What is each
|
||
creature’s decision-making process? How can it adjust its choices by
|
||
learning over time? Can a computational entity process its environment and
|
||
generate a decision?
|
||
</p>
|
||
<p>
|
||
To answer these questions, I’ll once again look to nature for
|
||
inspiration—specifically, the human brain. A brain can be described as a
|
||
biological <strong>neural network</strong>, an interconnected web of neurons
|
||
transmitting elaborate patterns of electrical signals. Within each neuron,
|
||
dendrites receive input signals, and based on those inputs, the neuron fires
|
||
an output signal via an axon (see Figure 10.1). Or something like that. How
|
||
the human brain actually works is an elaborate and complex mystery, one that
|
||
I’m certainly not going to attempt to unravel in rigorous detail in this
|
||
chapter.
|
||
</p>
|
||
<figure>
|
||
<img
|
||
src="images/10_nn/10_nn_2.png"
|
||
alt="Figure 10.1: A neuron with dendrites and an axon connected to another neuron"
|
||
/>
|
||
<figcaption>
|
||
Figure 10.1: A neuron with dendrites and an axon connected to another
|
||
neuron
|
||
</figcaption>
|
||
</figure>
|
||
<p>
|
||
Fortunately, as you’ve seen throughout this book, developing engaging
|
||
animated systems with code doesn’t require scientific rigor or accuracy.
|
||
Designing a smart rocket isn’t rocket science, and neither is designing an
|
||
artificial neural network brain science. It’s enough to simply be inspired
|
||
by the <em>idea</em> of brain function.
|
||
</p>
|
||
<p>
|
||
In this chapter, I’ll begin with a conceptual overview of the properties and
|
||
features of neural networks and build the simplest possible example of one,
|
||
a network that consists of a single neuron. I’ll then introduce you to more
|
||
complex neural networks by using the ml5.js library. This will serve as a
|
||
foundation for <a href="/neuroevolution#">Chapter 11</a>, the grand finale
|
||
of this book, where I’ll combine GAs with neural networks for physics
|
||
simulation.
|
||
</p>
|
||
<h2 id="introducing-artificial-neural-networks">
|
||
Introducing Artificial Neural Networks
|
||
</h2>
|
||
<p>
|
||
Computer scientists have long been inspired by the human brain. In 1943,
|
||
Warren S. McCulloch, a neuroscientist, and Walter Pitts, a logician,
|
||
developed the first conceptual model of an artificial neural network. In
|
||
their paper “A Logical Calculus of the Ideas Immanent in Nervous Activity,”
|
||
they describe a <strong>neuron </strong>as a single computational cell
|
||
living in a network of cells that receives inputs, processes those inputs,
|
||
and generates an output.
|
||
</p>
|
||
<p>
|
||
Their work, and the work of many scientists and researchers who followed,
|
||
wasn’t meant to accurately describe how the biological brain works. Rather,
|
||
an <em>artificial</em> neural network (hereafter referred to as just a
|
||
<em>neural network</em>) was intended as a computational model based on the
|
||
brain, designed to solve certain kinds of problems that were traditionally
|
||
difficult for computers.
|
||
</p>
|
||
<p>
|
||
Some problems are incredibly simple for a computer to solve but difficult
|
||
for humans like you and me. Finding the square root of 964,324 is an
|
||
example. A quick line of code produces the value 982, a number my computer
|
||
can compute in less than a millisecond, but if you asked me to calculate
|
||
that number myself, you’d be in for quite a wait. On the other hand, certain
|
||
problems are incredibly simple for you or me to solve, but not so easy for a
|
||
computer. Show any toddler a picture of a kitten or puppy, and they’ll
|
||
quickly be able to tell you which one is which. Listen to a conversation in
|
||
a noisy café and focus on just one person’s voice, and you can effortlessly
|
||
comprehend their words. But need a machine to perform one of these tasks?
|
||
Scientists have spent entire careers researching and implementing complex
|
||
solutions, and neural networks are one of them.
|
||
</p>
|
||
<p>
|
||
Here are some of the easy-for-a-human, difficult-for-a-machine applications
|
||
of neural networks in software today:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<strong>Pattern recognition:</strong> Neural networks are well suited to
|
||
problems when the aim is to detect, interpret, and classify features or
|
||
patterns within a dataset. This includes everything from identifying
|
||
objects (like faces) in images, to optical character recognition, to more
|
||
complex tasks like gesture recognition.
|
||
</li>
|
||
<li>
|
||
<strong>Time-series prediction and anomaly detection: </strong>Neural
|
||
networks are utilized both in forecasting, such as predicting stock market
|
||
trends or weather patterns, and in recognizing anomalies, which can be
|
||
applied to areas like cyberattack detection and fraud prevention.
|
||
</li>
|
||
<li>
|
||
<strong>Natural language processing (NLP):</strong> One of the biggest
|
||
developments in recent years has been the use of neural networks for
|
||
processing and understanding human language. They’re used in various tasks
|
||
including machine translation, sentiment analysis, and text summarization,
|
||
and are the underlying technology behind many digital assistants and
|
||
chatbots.
|
||
</li>
|
||
<li>
|
||
<strong>Signal processing and soft sensors:</strong> Neural networks play
|
||
a crucial role in devices like cochlear implants and hearing aids by
|
||
filtering noise and amplifying essential sounds. They’re also involved in
|
||
<em>soft sensors</em>, software systems that process data from multiple
|
||
sources to give a comprehensive analysis of the environment.
|
||
</li>
|
||
<li>
|
||
<strong>Control and adaptive decision-making systems: </strong>These
|
||
applications range from autonomous vehicles like self-driving cars and
|
||
drones to adaptive decision-making used in game playing, pricing models,
|
||
and recommendation systems on media platforms.
|
||
</li>
|
||
<li>
|
||
<strong>Generative models:</strong> The rise of novel neural network
|
||
architectures has made it possible to generate new content. These systems
|
||
can synthesize images, enhance image resolution, transfer style between
|
||
images, and even generate music and video.
|
||
</li>
|
||
</ul>
|
||
<p>
|
||
Covering the full gamut of applications for neural networks would merit an
|
||
entire book (or series of books), and by the time that book was printed, it
|
||
would probably be out of date. Hopefully, this list gives you an overall
|
||
sense of the features and possibilities.
|
||
</p>
|
||
<h3 id="how-neural-networks-work">How Neural Networks Work</h3>
|
||
<p>
|
||
In some ways, neural networks are quite different from other computer
|
||
programs. The computational systems I’ve been writing so far in this book
|
||
are <strong>procedural</strong>: a program starts at the first line of code,
|
||
executes it, and goes on to the next, following instructions in a linear
|
||
fashion. By contrast, a true neural network doesn’t follow a linear path.
|
||
Instead, information is processed collectively, in parallel, throughout a
|
||
network of nodes, with each node representing a neuron. In this sense, a
|
||
neural network is considered a <strong>connectionist </strong>system.
|
||
</p>
|
||
<p>
|
||
In other ways, neural networks aren’t so different from some of the programs
|
||
you’ve seen. A neural network exhibits all the hallmarks of a complex
|
||
system, much like a cellular automaton or a flock of boids. Remember how
|
||
each individual boid was simple to understand, yet by following only three
|
||
rules—separation, alignment, cohesion—it contributed to complex behaviors?
|
||
Each individual element in a neural network is equally simple to understand.
|
||
It reads an input (a number), processes it, and generates an output (another
|
||
number). That’s all there is to it, and yet a network of many neurons can
|
||
exhibit incredibly rich and intelligent behaviors, echoing the complex
|
||
dynamics seen in a flock of boids.
|
||
</p>
|
||
<div class="half-width-right">
|
||
<figure>
|
||
<img
|
||
src="images/10_nn/10_nn_3.png"
|
||
alt="Figure 10.2: A neural network is a system of neurons and connections."
|
||
/>
|
||
<figcaption>
|
||
Figure 10.2: A neural network is a system of neurons and connections.
|
||
</figcaption>
|
||
</figure>
|
||
</div>
|
||
<p>
|
||
In fact, a neural network isn’t just a complex system, but a complex
|
||
<em>adaptive</em> system, meaning it can change its internal structure based
|
||
on the information flowing through it. In other words, it has the ability to
|
||
learn. Typically, this is achieved by adjusting <strong>weights</strong>. In
|
||
Figure 10.2, each arrow represents a connection between two neurons and
|
||
indicates the pathway for the flow of information. Each connection has a
|
||
weight, a number that controls the signal between the two neurons. If the
|
||
network generates a <em>good</em> output (which I’ll define later), there’s
|
||
no need to adjust the weights. However, if the network generates a
|
||
<em>poor</em> output—an error, so to speak—then the system adapts, altering
|
||
the weights with the hope of improving subsequent results.
|
||
</p>
|
||
<p>
|
||
Neural networks may use a variety of strategies for learning, and I’ll focus
|
||
on one of them in this chapter:
|
||
</p>
|
||
<ul>
|
||
<li>
|
||
<strong>Supervised learning:</strong> Essentially, this strategy involves
|
||
a teacher that’s smarter than the network itself. Take the case of facial
|
||
recognition. The teacher shows the network a bunch of faces, and the
|
||
teacher already knows the name associated with each face. The network
|
||
makes its guesses; then the teacher provides the network with the actual
|
||
names. The network can compare its answers to the known correct ones and
|
||
make adjustments according to its errors. The neural networks in this
|
||
chapter follow this model.
|
||
</li>
|
||
<li>
|
||
<strong>Unsupervised learning:</strong> This technique is required when
|
||
you don’t have an example dataset with known answers. Instead, the network
|
||
works on its own to uncover hidden patterns in the data. An application of
|
||
this is clustering: a set of elements is divided into groups according to
|
||
an unknown pattern. I won’t be showing any instances of unsupervised
|
||
learning, as the strategy is less relevant to the book’s examples.
|
||
</li>
|
||
<li>
|
||
<strong>Reinforcement learning:</strong> This strategy is
|
||
built on observation: a learning agent makes decisions and looks to its
|
||
environment for the results. It’s rewarded for good decisions and
|
||
penalized for bad decisions, such that it learns to make better decisions
|
||
over time. I’ll discuss this strategy in more detail in
|
||
<a href="/neuroevolution#">Chapter 11</a>.
|
||
</li>
|
||
</ul>
|
||
<p>
|
||
The ability of a neural network to learn, to make adjustments to its
|
||
structure over time, is what makes it so useful in the field of
|
||
<strong>machine learning</strong>. This term can be traced back to the 1959
|
||
paper “Some Studies in Machine Learning Using the Game of Checkers,” in
|
||
which computer scientist Arthur Lee Samuel outlines a “self-learning”
|
||
program for playing checkers. The concept of an algorithm enabling a
|
||
computer to learn without explicit programming is the foundation of machine
|
||
learning.
|
||
</p>
|
||
<p>
|
||
Think about what you’ve been doing throughout this book: coding! In
|
||
traditional programming, a computer program takes inputs and, based on the
|
||
rules you’ve provided, produces outputs. Machine learning, however, turns
|
||
this approach upside down. Instead of you writing the rules, the system is
|
||
given example inputs and outputs, and generates the rules itself! Many
|
||
algorithms can be used to implement machine learning, and a neural network
|
||
is just one of them.
|
||
</p>
|
||
<p>
|
||
Machine learning is part of the broad, sweeping field of
|
||
<strong>artificial intelligence (AI)</strong>, although the terms are
|
||
sometimes used interchangeably. In their thoughtful and friendly primer
|
||
<em>A People’s Guide to AI</em>, Mimi Onuoha and Diana Nucera (aka Mother
|
||
Cyborg) define AI as “the theory and development of computer systems able to
|
||
perform tasks that normally require human intelligence.” Machine learning
|
||
algorithms are one approach to these tasks, but not all AI systems feature a
|
||
self-learning component.
|
||
</p>
|
||
<h3 id="machine-learning-libraries">Machine Learning Libraries</h3>
|
||
<p>
|
||
Today, leveraging machine learning in creative coding and interactive media
|
||
isn’t only feasible but increasingly common, thanks to third-party libraries
|
||
that handle a lot of the neural network implementation details under the
|
||
hood. While the vast majority of machine learning development and research
|
||
is done in Python, the world of web development has seen the emergence of
|
||
powerful JavaScript-based tools. Two libraries of note are TensorFlow.js and
|
||
ml5.js.
|
||
</p>
|
||
<p>
|
||
TensorFlow.js is an open source library that lets you
|
||
define, train, and run neural networks directly in the browser using
|
||
JavaScript, without the need to install or configure complex environments.
|
||
It’s part of the TensorFlow ecosystem, which is maintained and developed by
|
||
Google. TensorFlow.js is a powerful tool, but its low-level operations and
|
||
highly technical API can be intimidating to beginners. Enter ml5.js, a
|
||
library built on top of TensorFlow.js and designed specifically for use with
|
||
p5.js. Its goal is to be beginner friendly and make machine learning
|
||
approachable for a broad audience of artists, creative coders, and students.
|
||
I’ll demonstrate how to use ml5.js in
|
||
<a href="#machine-learning-with-ml5js">“Machine Learning with ml5.js”</a>.
|
||
</p>
|
||
<p>
|
||
A benefit of libraries like TensorFlow.js and ml5.js is that you can use
|
||
them to run pretrained models. A machine learning <strong>model</strong> is
|
||
a specific setup of neurons and connections, and a
|
||
<strong>pretrained</strong> model is one that has already been prepared for
|
||
a particular task. For example, popular pretrained models are used for
|
||
classifying images, identifying body poses, recognizing facial landmarks or
|
||
hand positions, and even analyzing the sentiment expressed in a text. You
|
||
can use such a model as is or treat it as a starting point for additional
|
||
learning (commonly referred to as <strong>transfer learning</strong>).
|
||
</p>
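<p>
  As a rough sketch of what running a pretrained model can look like, here’s
  one way a MobileNet image classifier might be used with ml5.js. (The image
  filename is just a placeholder, and the exact callback arguments vary
  between ml5.js versions, so treat this as an outline rather than
  copy-and-paste code.)
</p>
<pre class="codesplit" data-code-language="javascript">
let classifier;
let img;

function preload() {
  // Load a pretrained MobileNet image classifier and an image to label.
  classifier = ml5.imageClassifier("MobileNet");
  img = loadImage("dog.jpg");
}

function setup() {
  createCanvas(640, 240);
  image(img, 0, 0);
  // Ask the pretrained model for its best guesses about the image.
  classifier.classify(img, gotResults);
}

function gotResults(results) {
  // Each result includes a label and a confidence value.
  console.log(results[0].label, results[0].confidence);
}</pre>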
|
||
<p>
|
||
Before I get to exploring the ml5.js library, however, I’d like to try my
|
||
hand at building the simplest of all neural networks from scratch, using
|
||
only p5.js, to illustrate how the concepts of neural networks and machine
|
||
learning are implemented in code.
|
||
</p>
|
||
<h2 id="the-perceptron">The Perceptron</h2>
|
||
<p>
|
||
A <strong>perceptron</strong> is the simplest neural network possible: a
|
||
computational model of a single neuron. Invented in 1957 by Frank Rosenblatt
|
||
at the Cornell Aeronautical Laboratory, a perceptron consists of one or more
|
||
inputs, a processor, and a single output, as shown in Figure 10.3.
|
||
</p>
|
||
<figure>
|
||
<img
|
||
src="images/10_nn/10_nn_4.png"
|
||
alt="Figure 10.3: A simple perceptron with two inputs and one output"
|
||
/>
|
||
<figcaption>
|
||
Figure 10.3: A simple perceptron with two inputs and one output
|
||
</figcaption>
|
||
</figure>
|
||
<p>
|
||
A perceptron follows the <strong>feed-forward</strong> model: data passes
|
||
(feeds) through the network in one direction. The inputs are sent into the
|
||
neuron, are processed, and result in an output. This means the one-neuron
|
||
network diagrammed in Figure 10.3 reads from left to right (forward): inputs
|
||
come in, and output goes out.
|
||
</p>
|
||
<p>
|
||
Say I have a perceptron with two inputs, the values 12 and 4. In machine
|
||
learning, it’s customary to denote each input with an
|
||
<span data-type="equation">x</span>, so I’ll call these inputs
|
||
<span data-type="equation">x_0</span> and
|
||
<span data-type="equation">x_1</span>:
|
||
</p>
|
||
<table>
|
||
<thead>
|
||
<tr>
|
||
<th style="width: 100px">Input</th>
|
||
<th>Value</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr>
|
||
<td><span data-type="equation">x_0</span></td>
|
||
<td>12</td>
|
||
</tr>
|
||
<tr>
|
||
<td><span data-type="equation">x_1</span></td>
|
||
<td>4</td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
<h3 id="perceptron-steps">Perceptron Steps</h3>
|
||
<p>
|
||
To get from these inputs to an output, the perceptron follows a series of
|
||
steps.
|
||
</p>
|
||
<h4 id="step-1-weight-the-inputs">Step 1: Weight the Inputs</h4>
|
||
<p>
|
||
Each input sent into the neuron must first be weighted, meaning it’s
|
||
multiplied by a value, often a number from –1 to +1. When creating a
|
||
perceptron, the inputs are typically assigned random weights. I’ll call my
|
||
weights <span data-type="equation">w_0</span> and
|
||
<span data-type="equation">w_1</span>:
|
||
</p>
|
||
<table>
|
||
<thead>
|
||
<tr>
|
||
<th style="width: 100px">Weight</th>
|
||
<th>Value</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr>
|
||
<td><span data-type="equation">w_0</span></td>
|
||
<td>0.5</td>
|
||
</tr>
|
||
<tr>
|
||
<td><span data-type="equation">w_1</span></td>
|
||
<td>–1</td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
<p>Each input needs to be multiplied by its corresponding weight:</p>
|
||
<table>
|
||
<thead>
|
||
<tr>
|
||
<th style="width: 100px">Input</th>
|
||
<th style="width: 100px">Weight</th>
|
||
<th>
|
||
Input <span data-type="equation">\boldsymbol{\times}</span> Weight
|
||
</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr>
|
||
<td>12</td>
|
||
<td>0.5</td>
|
||
<td>6</td>
|
||
</tr>
|
||
<tr>
|
||
<td>4</td>
|
||
<td>–1</td>
|
||
<td>–4</td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
<h4 id="step-2-sum-the-inputs">Step 2: Sum the Inputs</h4>
|
||
<p>The weighted inputs are then added together:</p>
|
||
<div data-type="equation">6 + -4 = 2</div>
|
||
<h4 id="step-3-generate-the-output">Step 3: Generate the Output</h4>
|
||
<p>
|
||
The output of a perceptron is produced by passing the sum through an
|
||
<strong>activation function</strong> that reduces the output to one of two
|
||
possible values. Think of this binary output as an LED that’s only
|
||
<em>off</em> or <em>on</em>, or as a neuron in an actual brain that either
|
||
fires or doesn’t fire. The activation function determines whether the
|
||
perceptron should “fire.”
|
||
</p>
|
||
<p>
|
||
Activation functions can get a little bit hairy. If you start reading about
|
||
them in an AI textbook, you may soon find yourself reaching in turn for a
|
||
calculus textbook. However, your new friend the simple perceptron provides
|
||
an easier option that still demonstrates the concept. I’ll make the
|
||
activation function the sign of the sum. If the sum is a positive number,
|
||
the output is 1; if it’s negative, the output is –1:
|
||
</p>
|
||
<div data-type="equation">\text{sign}(2) = +1</div>
|
||
<h3 id="putting-it-all-together-1">Putting It All Together</h3>
|
||
<p>
|
||
Putting the preceding three parts together, here are the steps of the
|
||
<strong>perceptron algorithm</strong>:
|
||
</p>
|
||
<ol>
|
||
<li>For every input, multiply that input by its weight.</li>
|
||
<li>Sum all the weighted inputs.</li>
|
||
<li>
|
||
Compute the output of the perceptron by passing that sum through an
|
||
activation function (the sign of the sum).
|
||
</li>
|
||
</ol>
|
||
<p>
|
||
I can start writing this algorithm in code by using two arrays of values,
|
||
one for the inputs and one for the weights:
|
||
</p>
|
||
<pre class="codesplit" data-code-language="javascript">
|
||
let inputs = [12, 4];
|
||
let weights = [0.5, -1];</pre
|
||
>
|
||
<p>
|
||
The “for every input” in step 1 implies a loop that multiplies each input by
|
||
its corresponding weight. To obtain the sum, the results can be added up in
|
||
that same loop:
|
||
</p>
|
||
<pre class="codesplit" data-code-language="javascript">
|
||
// Steps 1 and 2: Add up all the weighted inputs.
|
||
let sum = 0;
|
||
for (let i = 0; i < inputs.length; i++) {
|
||
sum += inputs[i] * weights[i];
|
||
}</pre
|
||
>
|
||
<p>With the sum, I can then compute the output:</p>
|
||
<pre class="codesplit" data-code-language="javascript">
|
||
// Step 3: Pass the sum through an activation function.
|
||
let output = activate(sum);
|
||
|
||
// The activation function
|
||
function activate(sum) {
|
||
//{!5} Return a 1 if positive, –1 if negative.
|
||
if (sum > 0) {
|
||
return 1;
|
||
} else {
|
||
return -1;
|
||
}
|
||
}</pre
|
||
>
|
||
<p>
|
||
You might be wondering how I’m handling the value of 0 in the activation
|
||
function. Is 0 positive or negative? The deep philosophical implications of
|
||
this question aside, I’m choosing here to arbitrarily return a –1 for 0, but
|
||
I could easily change the <code>></code> to <code>>=</code> to go the other
|
||
way. Depending on the application, this decision could be significant, but
|
||
for demonstration purposes here, I can just pick one.
|
||
</p>
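<p>
  For reference, here’s the alternative version of the activation function,
  with a sum of exactly 0 now counting as positive:
</p>
<pre class="codesplit" data-code-language="javascript">
// A variant of the activation function that treats a sum of exactly 0 as a +1
function activate(sum) {
  // The only change is >= instead of >.
  if (sum >= 0) {
    return 1;
  } else {
    return -1;
  }
}</pre>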
|
||
<p>
|
||
Now that I’ve explained the computational process of a perceptron, let’s
|
||
look at an example of one in action.
|
||
</p>
|
||
<h3 id="simple-pattern-recognition-using-a-perceptron">
|
||
Simple Pattern Recognition Using a Perceptron
|
||
</h3>
|
||
<p>
|
||
I’ve mentioned that neural networks are commonly used for pattern
|
||
recognition. The scenarios outlined earlier require more complex networks,
|
||
but even a simple perceptron can demonstrate a fundamental type of pattern
|
||
recognition in which data points are classified as belonging to one of two
|
||
groups. For instance, imagine you have a dataset of plants and want to
|
||
identify them as either <em>xerophytes</em> (plants that have evolved to
|
||
survive in an environment with little water and lots of sunlight, like the
|
||
desert) or <em>hydrophytes</em> (plants that have adapted to living
|
||
submerged in water, with reduced light). That’s how I’ll use my perceptron
|
||
in this section.
|
||
</p>
|
||
<p>
|
||
One way to approach classifying the plants is to plot their data on a 2D
|
||
graph and treat the problem as a spatial one. On the x-axis, plot the amount
|
||
of daily sunlight received by the plant, and on the y-axis, plot the amount
|
||
of water. Once all the data has been plotted, it’s easy to draw a line
|
||
across the graph, with all the xerophytes on one side and all the
|
||
hydrophytes on the other, as in Figure 10.4. (I’m simplifying a little here.
|
||
Real-world data would probably be messier, making the line harder to draw.)
|
||
That’s how each plant can be classified. Is it below the line? Then it’s a
|
||
xerophyte. Is it above the line? Then it’s a hydrophyte.
|
||
</p>
|
||
<figure>
|
||
<img
|
||
src="images/10_nn/10_nn_5.png"
|
||
alt="Figure 10.4: A collection of points in 2D space divided by a line, representing plant categories according to their water and sunlight intake "
|
||
/>
|
||
<figcaption>
|
||
Figure 10.4: A collection of points in 2D space divided by a line,
|
||
representing plant categories according to their water and sunlight intake
|
||
</figcaption>
|
||
</figure>
|
||
<p>
|
||
In truth, I don’t need a neural network—not even a simple perceptron—to tell
|
||
me whether a point is above or below a line. I can see the answer for myself
|
||
with my own eyes, or have my computer figure it out with simple algebra. But
|
||
just like solving a problem with a known answer—“to be or not to be”—was a
|
||
convenient first test for the GA in
|
||
<a href="/genetic-algorithms#">Chapter 9</a>, training a perceptron to
|
||
categorize points as being on one side of a line versus the other will be a
|
||
valuable way to demonstrate the algorithm of the perceptron and verify that
|
||
it’s working properly.
|
||
</p>
|
||
<p>
|
||
To solve this problem, I’ll give my perceptron two inputs:
|
||
<span data-type="equation">x_0</span> is the x-coordinate of a point,
|
||
representing a plant’s amount of sunlight, and
|
||
<span data-type="equation">x_1</span> is the y-coordinate of that point,
|
||
representing the plant’s amount of water. The perceptron then guesses the
|
||
plant’s classification according to the sign of the weighted sum of these
|
||
inputs. If the sum is positive, the perceptron outputs a +1, signifying a
|
||
hydrophyte (above the line). If the sum is negative, it outputs a –1,
|
||
signifying a xerophyte (below the line). Figure 10.5 shows this perceptron
|
||
(note the shorthand of <span data-type="equation">w_0</span> and
|
||
<span data-type="equation">w_1</span> for the weights).
|
||
</p>
|
||
<figure>
|
||
<img
|
||
src="images/10_nn/10_nn_6.png"
|
||
alt="Figure 10.5: A perceptron with two inputs (x_0 and x_1), a weight for each input (w_0 and w_1), and a processing neuron that generates the output"
|
||
/>
|
||
<figcaption>
|
||
Figure 10.5: A perceptron with two inputs (<span data-type="equation"
|
||
>x_0</span
|
||
>
|
||
and <span data-type="equation">x_1</span>), a weight for each input (<span
|
||
data-type="equation"
|
||
>w_0</span
|
||
>
|
||
and <span data-type="equation">w_1</span>), and a processing neuron that
|
||
generates the output
|
||
</figcaption>
|
||
</figure>
|
||
<p>
|
||
This scheme has a pretty significant problem, however. What if my data point
|
||
is (0, 0), and I send this point into the perceptron as inputs
|
||
<span data-type="equation">x_0 = 0</span> and
|
||
<span data-type="equation">x_1=0</span>? No matter what the weights are,
|
||
multiplication by 0 is 0. The weighted inputs are therefore still 0, and
|
||
their sum will be 0 too. And the sign of 0 is . . . hmmm, there’s that deep
|
||
philosophical quandary again. Regardless of how I feel about it, the point
|
||
(0, 0) could certainly be above or below various lines in a 2D world. How is
|
||
the perceptron supposed to interpret it accurately?
|
||
</p>
|
||
<p>
|
||
To avoid this dilemma, the perceptron requires a third input, typically
|
||
referred to as a <strong>bias</strong> input. This extra input always has
|
||
the value of 1 and is also weighted. Figure 10.6 shows the perceptron with
|
||
the addition of the bias.
|
||
</p>
|
||
<figure>
|
||
<img
|
||
src="images/10_nn/10_nn_7.png"
|
||
alt="Figure 10.6: Adding a bias input, along with its weight, to the perceptron"
|
||
/>
|
||
<figcaption>
|
||
Figure 10.6: Adding a bias input, along with its weight, to the perceptron
|
||
</figcaption>
|
||
</figure>
|
||
<p>How does this affect point (0, 0)?</p>
|
||
<table>
|
||
<thead>
|
||
<tr>
|
||
<th style="width: 100px">Input</th>
|
||
<th style="width: 100px">Weight</th>
|
||
<th>Result</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr>
|
||
<td>0</td>
|
||
<td><span data-type="equation">w_0</span></td>
|
||
<td>0</td>
|
||
</tr>
|
||
<tr>
|
||
<td>0</td>
|
||
<td><span data-type="equation">w_1</span></td>
|
||
<td>0</td>
|
||
</tr>
|
||
<tr>
|
||
<td>1</td>
|
||
<td><span data-type="equation">w_\text{bias}</span></td>
|
||
<td><span data-type="equation">w_\text{bias}</span></td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
<p>
|
||
The output is then the sum of the weighted results:
|
||
<span data-type="equation">0 + 0 + w_\text{bias}</span>. Therefore, the bias
|
||
by itself answers the question of where (0, 0) is in relation to the line.
|
||
If the bias’s weight is positive, (0, 0) is above the line; if negative,
|
||
it’s below. The extra input and its weight <em>bias</em> the perceptron’s
|
||
understanding of the line’s position relative to (0, 0)!
|
||
</p>
|
||
<h3 id="the-perceptron-code">The Perceptron Code</h3>
|
||
<p>
|
||
I’m now ready to assemble the code for a <code>Perceptron</code> class. The
|
||
perceptron needs to track only the input weights, which I can store using an
|
||
array:
|
||
</p>
|
||
<div class="snip-below">
|
||
<pre class="codesplit" data-code-language="javascript">
|
||
class Perceptron {
|
||
constructor() {
|
||
this.weights = [];
|
||
}</pre
|
||
>
|
||
</div>
|
||
<p>
|
||
The constructor can receive an argument indicating the number of inputs (in
|
||
this case, three: <span data-type="equation">x_0</span>,
|
||
<span data-type="equation">x_1</span>, and a bias) and size the
|
||
<code>weights</code> array accordingly, filling it with random values to
|
||
start:
|
||
</p>
|
||
<div class="snip-above snip-below">
|
||
<pre class="codesplit" data-code-language="javascript">
|
||
// The argument <code>n</code> determines the number of inputs (including the bias).
|
||
constructor(n) {
|
||
this.weights = [];
|
||
for (let i = 0; i < n; i++) {
|
||
//{!1} The weights are picked randomly to start.
|
||
this.weights[i] = random(-1, 1);
|
||
}
|
||
}</pre
|
||
>
|
||
</div>
|
||
<p>
|
||
A perceptron’s job is to receive inputs and produce an output. These
|
||
requirements can be packaged together in a
|
||
<code>feedForward()</code> method. In this example, the perceptron’s inputs
|
||
are an array (which should be the same length as the array of weights), and
|
||
the output is a number, +1 or –1, as returned by the activation function
|
||
based on the sign of the sum:
|
||
</p>
|
||
<div class="snip-above">
|
||
<pre class="codesplit" data-code-language="javascript">
|
||
feedForward(inputs) {
|
||
let sum = 0;
|
||
for (let i = 0; i < this.weights.length; i++) {
|
||
sum += inputs[i] * this.weights[i];
|
||
}
|
||
//{!1} The result is the sign of the sum, –1 or +1.
|
||
// Here the perceptron is making a guess:
|
||
// Is it on one side of the line or the other?
|
||
return this.activate(sum);
|
||
}
|
||
}</pre
|
||
>
|
||
</div>
|
||
<p>
|
||
Presumably, I could now create a <code>Perceptron</code> object and ask it
|
||
to make a guess for any given point, as in Figure 10.7.
|
||
</p>
|
||
<figure>
|
||
<img
|
||
src="images/10_nn/10_nn_8.png"
|
||
alt="Figure 10.7: An (x, y) coordinate from the 2D space is the input to the perceptron. "
|
||
/>
|
||
<figcaption>
|
||
Figure 10.7: An (<em>x</em>, <em>y</em>) coordinate from the 2D space is
|
||
the input to the perceptron.
|
||
</figcaption>
|
||
</figure>
|
||
<p>Here’s the code to generate a guess:</p>
|
||
<pre class="codesplit" data-code-language="javascript">
|
||
// Create the perceptron.
|
||
let perceptron = new Perceptron(3);
|
||
// The input is three values: x, y, and the bias.
|
||
let inputs = [50, -12, 1];
|
||
// The answer!
|
||
let guess = perceptron.feedForward(inputs);</pre
|
||
>
|
||
<p>
|
||
Did the perceptron get it right? Maybe yes, maybe no. At this point, the
|
||
perceptron has no better than a 50/50 chance of arriving at the correct
|
||
answer, since each weight starts out as a random value. A neural network
|
||
isn’t a magic tool that can automatically guess correctly on its own. I need
|
||
to teach it how to do so!
|
||
</p>
|
||
<p>
|
||
To train a neural network to answer correctly, I’ll use the supervised
|
||
learning method I described earlier in the chapter. Remember, this technique
|
||
involves giving the network inputs with known answers. This enables the
|
||
network to check whether it has made a correct guess. If not, the network
|
||
can learn from its mistake and adjust its weights. The process is as
|
||
follows:
|
||
</p>
|
||
<ol>
|
||
<li>
|
||
Provide the perceptron with inputs for which there is a known answer.
|
||
</li>
|
||
<li>Ask the perceptron to guess an answer.</li>
|
||
<li>Compute the error. (Did it get the answer right or wrong?)</li>
|
||
<li>Adjust all the weights according to the error.</li>
|
||
<li>Return to step 1 and repeat!</li>
|
||
</ol>
|
||
<p>
|
||
This process can be packaged into a method on the
|
||
<code>Perceptron</code> class, but before I can write it, I need to examine
|
||
steps 3 and 4 in more detail. How do I define the perceptron’s error? And
|
||
how should I adjust the weights according to this error?
|
||
</p>
|
||
<p>
|
||
The perceptron’s error can be defined as the difference between the desired
|
||
answer and its guess:
|
||
</p>
|
||
<div data-type="equation">
|
||
\text{error} = \text{desired output} - \text{guess output}
|
||
</div>
|
||
<p>
|
||
Does this formula look familiar? Think back to the formula for a vehicle’s
|
||
steering force that I worked out in
|
||
<a href="/autonomous-agents#">Chapter 5</a>:
|
||
</p>
|
||
<div data-type="equation">
|
||
\text{steering} = \text{desired velocity} - \text{current velocity}
|
||
</div>
|
||
<p>
|
||
This is also a calculation of an error! The current velocity serves as a
|
||
guess, and the error (the steering force) indicates how to adjust the
|
||
velocity in the correct direction. Adjusting a vehicle’s velocity to follow
|
||
a target is similar to adjusting the weights of a neural network toward the
|
||
correct answer.
|
||
</p>
|
||
<p>
|
||
For the perceptron, the output has only two possible values: +1 or –1.
|
||
Therefore, only three errors are possible. If the perceptron guesses the
|
||
correct answer, the guess equals the desired output and the error is 0. If
|
||
the correct answer is –1 and the perceptron guessed +1, then the error is
|
||
–2. If the correct answer is +1 and the perceptron guessed –1, then the
|
||
error is +2. Here’s that process summarized in a table:
|
||
</p>
|
||
<table>
|
||
<thead>
|
||
<tr>
|
||
<th style="width: 100px">Desired</th>
|
||
<th style="width: 100px">Guess</th>
|
||
<th>Error</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr>
|
||
<td>–1</td>
|
||
<td>–1</td>
|
||
<td>0</td>
|
||
</tr>
|
||
<tr>
|
||
<td>–1</td>
|
||
<td>+1</td>
|
||
<td>–2</td>
|
||
</tr>
|
||
<tr>
|
||
<td>+1</td>
|
||
<td>–1</td>
|
||
<td>+2</td>
|
||
</tr>
|
||
<tr>
|
||
<td>+1</td>
|
||
<td>+1</td>
|
||
<td>0</td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
<p>
|
||
The error is the determining factor in how the perceptron’s weights should
|
||
be adjusted. For any given weight, what I’m looking to calculate is the
|
||
change in weight, often called
|
||
<span data-type="equation">\Delta\text{weight}</span> (or
|
||
<em>delta weight</em>, <span data-type="equation">\Delta</span> being the
|
||
Greek letter delta):
|
||
</p>
|
||
<div data-type="equation">
|
||
\text{new weight} = \text{weight} + \Delta\text{weight}
|
||
</div>
|
||
<p>
|
||
To calculate <span data-type="equation">\Delta\text{weight}</span>, I need
|
||
to multiply the error by the input:
|
||
</p>
|
||
<div data-type="equation">
|
||
\Delta\text{weight} = \text{error} \times \text{input}
|
||
</div>
|
||
<p>Therefore, the new weight is calculated as follows:</p>
|
||
<div data-type="equation">
|
||
\text{new weight} = \text{weight} + \text{error} \times \text{input}
|
||
</div>
|
||
<p>
|
||
To understand why this works, think again about steering. A steering force
|
||
is essentially an error in velocity. By applying a steering force as an
|
||
acceleration (or <span data-type="equation">\Delta\text{velocity}</span>),
|
||
the velocity is adjusted to move in the correct direction. This is what I
|
||
want to do with the neural network’s weights. I want to adjust them in the
|
||
right direction, as defined by the error.
|
||
</p>
|
||
<p>
|
||
With steering, however, I had an additional variable that controlled the
|
||
vehicle’s ability to steer: the maximum force. A high maximum force allowed
|
||
the vehicle to accelerate and turn quickly, while a lower force resulted in
|
||
a slower velocity adjustment. The neural network will use a similar strategy
|
||
with a variable called the <strong>learning constant</strong>:
|
||
</p>
|
||
<div data-type="equation">
|
||
\text{new weight} = \text{weight} + (\text{error} \times \text{input})
|
||
\times \text{learning constant}
|
||
</div>
|
||
<p>
|
||
A high learning constant causes the weight to change more drastically. This
|
||
may help the perceptron arrive at a solution more quickly, but it also
|
||
increases the risk of overshooting the optimal weights. A small learning
|
||
constant will adjust the weights more slowly and require more training time,
|
||
but will allow the network to make small adjustments that could improve
|
||
overall accuracy.
|
||
</p>
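<p>
  To make this concrete, here’s a quick numerical sketch of one weight
  update. It reuses the inputs and weights from the earlier example and
  assumes, hypothetically, that the desired answer was –1 (so the error is
  –2); the learning constant of 0.01 is just an illustrative value:
</p>
<pre class="codesplit" data-code-language="javascript">
// A worked example with made-up numbers: inputs [12, 4] and weights [0.5, -1]
// produced a guess of +1, but the desired answer is -1, so the error is -2.
let inputs = [12, 4];
let weights = [0.5, -1];
let error = -2;
let learningConstant = 0.01;

// Adjust each weight according to the error, the input, and the learning constant.
for (let i = 0; i < inputs.length; i++) {
  weights[i] += error * inputs[i] * learningConstant;
}

// Without the learning constant, the first weight would have jumped by
// -2 * 12 = -24. With it, the weights are now roughly [0.26, -1.08].</pre>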
|
||
<p>
|
||
Assuming the addition of a <code>learningConstant</code> property to the
|
||
<code>Perceptron</code> class, I can now write a training method for the
|
||
perceptron following the steps I outlined earlier:
|
||
</p>
|
||
<pre class="codesplit" data-code-language="javascript">
|
||
// Step 1: Provide the inputs and known answer.
|
||
// These are passed in as arguments to <code>train()</code>.
|
||
train(inputs, desired) {
|
||
// Step 2: Guess according to those inputs.
|
||
let guess = this.feedForward(inputs);
|
||
|
||
// Step 3: Compute the error (the difference between <code>desired</code> and <code>guess</code>).
|
||
let error = desired - guess;
|
||
|
||
//{!3} Step 4: Adjust all the weights according to the error and learning constant.
|
||
for (let i = 0; i < this.weights.length; i++) {
|
||
this.weights[i] = this.weights[i] + error * inputs[i] * this.learningConstant;
|
||
}
|
||
}</pre
|
||
>
|
||
<p>Here’s the <code>Perceptron</code> class as a whole:</p>
|
||
<pre class="codesplit" data-code-language="javascript">
|
||
class Perceptron {
|
||
constructor(totalInputs, learningConstant) {
|
||
//{!2} The perceptron stores its weights and learning constant.
|
||
this.weights = [];
|
||
this.learningConstant = learningConstant;
|
||
//{!3} The weights start off random.
|
||
for (let i = 0; i < totalInputs; i++) {
|
||
this.weights[i] = random(-1, 1);
|
||
}
|
||
}
|
||
|
||
//{!7} Return an output based on inputs.
|
||
feedForward(inputs) {
|
||
let sum = 0;
|
||
for (let i = 0; i < this.weights.length; i++) {
|
||
sum += inputs[i] * this.weights[i];
|
||
}
|
||
return this.activate(sum);
|
||
}
|
||
|
||
// The output is a +1 or –1.
|
||
activate(sum) {
|
||
if (sum > 0) {
|
||
return 1;
|
||
} else {
|
||
return -1;
|
||
}
|
||
}
|
||
|
||
//{!7} Train the network against known data.
|
||
train(inputs, desired) {
|
||
let guess = this.feedForward(inputs);
|
||
let error = desired - guess;
|
||
for (let i = 0; i < this.weights.length; i++) {
|
||
this.weights[i] = this.weights[i] + error * inputs[i] * this.learningConstant;
|
||
}
|
||
}
|
||
}</pre
|
||
>
|
||
<p>
|
||
To train the perceptron, I need a set of inputs with known answers. However,
|
||
I don’t happen to have a real-world dataset (or time to research and collect
|
||
one) for the xerophytes and hydrophytes scenario. In truth, though, the
|
||
purpose of this demonstration isn’t to show you how to classify plants. It’s
|
||
about how a perceptron can learn whether points are above or below a line on
|
||
a graph, and so any set of points will do. In other words, I can just make
|
||
up the data.
|
||
</p>
|
||
<p>
|
||
What I’m describing is an example of <strong>synthetic data</strong>,
|
||
artificially generated data that’s often used in machine learning to create
|
||
controlled scenarios for training and testing. In this case, my synthetic
|
||
data will consist of a set of random input points, each with a known answer
|
||
indicating whether the point is above or below a line. To define the line
|
||
and generate the data, I’ll use simple algebra. This approach allows me to
|
||
clearly demonstrate the training process and show how the perceptron learns.
|
||
</p>
|
||
<p>
|
||
The question therefore becomes, how do I pick a point and know whether it’s
|
||
above or below a line (without a neural network, that is)? A line can be
|
||
described as a collection of points, where each point’s y-coordinate is a
|
||
function of its x-coordinate:
|
||
</p>
|
||
<div data-type="equation">y = f(x)</div>
|
||
<p>
|
||
For a straight line (specifically, a linear function), the relationship can
|
||
be written like this:
|
||
</p>
|
||
<div data-type="equation">y = mx + b</div>
|
||
<p>
|
||
Here <em>m</em> is the slope of the line, and <em>b</em> is the value of
|
||
<em>y</em> when <em>x</em> is 0 (the y-intercept). Here’s a specific
|
||
example, with the corresponding graph in Figure 10.8.
|
||
</p>
|
||
<div data-type="equation">y = \frac{1}2x - 1</div>
|
||
<figure>
|
||
<img
|
||
src="images/10_nn/10_nn_9.png"
|
||
alt="Figure 10.8: A graph of y = \frac{1}2x - 1"
|
||
/>
|
||
<figcaption>
|
||
Figure 10.8: A graph of
|
||
<span data-type="equation">y = \frac{1}2x - 1</span>
|
||
</figcaption>
|
||
</figure>
|
||
<p>
|
||
I’ll arbitrarily choose that as the equation for my line, and write a
|
||
function accordingly:
|
||
</p>
|
||
<pre class="codesplit" data-code-language="javascript">
|
||
// A function to calculate <code>y</code> based on <code>x</code> along a line
|
||
function f(x) {
|
||
return 0.5 * x - 1;
|
||
}</pre
|
||
>
|
||
<p>
|
||
Now there’s the matter of the p5.js canvas defaulting to (0, 0) in the
|
||
top-left corner with the y-axis pointing down. For this discussion, I’ll
|
||
assume I’ve built the following into the code to reorient the canvas to
|
||
match a more traditional Cartesian space:
|
||
</p>
|
||
<pre
|
||
class="codesplit"
|
||
data-code-language="javascript"
|
||
>// Move the origin <code>(0, 0)</code> to the center.
|
||
translate(width / 2, height / 2);
|
||
// Flip the y-axis orientation (positive points up!).
|
||
scale(1, -1);</pre>
|
||
<p>I can now pick a random point in the 2D space:</p>
|
||
<pre class="codesplit" data-code-language="javascript">
|
||
let x = random(-100, 100);
|
||
let y = random(-100, 100);</pre
|
||
>
|
||
<p>
|
||
How do I know if this point is above or below the line? The line function
|
||
<em>f</em>(<em>x</em>) returns the <em>y</em> value on the line for that
|
||
x-position. I’ll call that <span data-type="equation">y_\text{line}</span>:
|
||
</p>
|
||
<pre class="codesplit" data-code-language="javascript">
|
||
// The <code>y</code> position on the line
|
||
let yline = f(x);</pre
|
||
>
|
||
<p>
|
||
If the <em>y</em> value I’m examining is above the line, it will be greater
|
||
than <span data-type="equation">y_\text{line}</span>, as in Figure 10.9.
|
||
</p>
|
||
<figure>
|
||
<img
|
||
src="images/10_nn/10_nn_10.png"
|
||
alt="Figure 10.9: If y_\text{line} is less than y, the point is above the line."
|
||
/>
|
||
<figcaption>
|
||
Figure 10.9: If <span data-type="equation">y_\text{line}</span> is less
|
||
than <em>y</em>, the point is above the line.
|
||
</figcaption>
|
||
</figure>
|
||
<p>Here’s the code for that logic:</p>
|
||
<pre class="codesplit" data-code-language="javascript">
|
||
// Start with a value of –1.
|
||
let desired = -1;
|
||
if (y > yline) {
|
||
//{!1} The answer becomes +1 if <code>y</code> is above the line.
|
||
desired = 1;
|
||
}</pre
|
||
>
|
||
<p>
|
||
I can then make an input array to go with the <code>desired</code> output:
|
||
</p>
|
||
<pre class="codesplit" data-code-language="javascript">
|
||
// Don’t forget to include the bias!
|
||
let trainingInputs = [x, y, 1];</pre
|
||
>
|
||
<p>
|
||
Assuming that I have a <code>perceptron</code> variable, I can train it by
|
||
providing the inputs along with the desired answer:
|
||
</p>
|
||
<pre class="codesplit" data-code-language="javascript">
|
||
perceptron.train(trainingInputs, desired);</pre
|
||
>
|
||
<p>
|
||
If I train the perceptron on a new random point (and its answer) for each
|
||
cycle through <code>draw()</code>, it will gradually get better at
|
||
classifying the points as above or below the line.
|
||
</p>
|
||
<div data-type="example">
|
||
<h3 id="example-101-the-perceptron">Example 10.1: The Perceptron</h3>
|
||
<figure>
|
||
<div
|
||
data-type="embed"
|
||
data-p5-editor="https://editor.p5js.org/natureofcode/sketches/sMozIaMCW"
|
||
data-example-path="examples/10_nn/10_1_perceptron_with_normalization"
|
||
>
|
||
<img
|
||
src="examples/10_nn/10_1_perceptron_with_normalization/screenshot.png"
|
||
/>
|
||
</div>
|
||
<figcaption></figcaption>
|
||
</figure>
|
||
</div>
|
||
<pre class="codesplit" data-code-language="javascript">// The perceptron
|
||
let perceptron;
|
||
//{!1} An array for training data
|
||
let training = [];
|
||
// A counter to track training data points one by one
|
||
let count = 0;
|
||
|
||
//{!3} The formula for a line
|
||
function f(x) {
|
||
return 0.5 * x + 1;
|
||
}
|
||
|
||
function setup() {
|
||
createCanvas(640, 240);
|
||
|
||
// The perceptron has three inputs (including bias) and a learning rate of 0.0001.
|
||
perceptron = new Perceptron(3, 0.0001);
|
||
|
||
//{!1} Make 2,000 training data points.
|
||
for (let i = 0; i < 2000; i++) {
|
||
let x = random(-width / 2, width / 2);
|
||
let y = random(-height / 2, height / 2);
|
||
training[i] = [x, y, 1];
|
||
}
|
||
}
|
||
|
||
function draw() {
|
||
background(255);
|
||
// Reorient the canvas to match a traditional Cartesian plane.
|
||
translate(width / 2, height / 2);
|
||
scale(1, -1);
|
||
|
||
// Draw the line.
|
||
stroke(0);
|
||
strokeWeight(2);
|
||
line(-width / 2, f(-width / 2), width / 2, f(width / 2));
|
||
|
||
// Get the current <code>(x, y)</code> of the training data.
|
||
let x = training[count][0];
|
||
let y = training[count][1];
|
||
// What is the desired output?
|
||
let desired = -1;
|
||
if (y > f(x)) {
|
||
desired = 1;
|
||
}
|
||
// Train the perceptron.
|
||
perceptron.train(training[count], desired);
|
||
|
||
// For animation, train one point at a time.
|
||
count = (count + 1) % training.length;
|
||
|
||
// Draw all the points and color according to the output of the perceptron.
|
||
for (let dataPoint of training) {
|
||
let guess = perceptron.feedForward(dataPoint);
|
||
if (guess > 0) {
|
||
fill(127);
|
||
} else {
|
||
fill(255);
|
||
}
|
||
strokeWeight(1);
|
||
stroke(0);
|
||
circle(dataPoint[0], dataPoint[1], 8);
|
||
}
|
||
}</pre>
|
||
<p>
|
||
In Example 10.1, the training data is visualized alongside the target
|
||
solution line. Each point represents a piece of training data, and its color
|
||
is determined by the perceptron’s current classification—gray for +1 or
|
||
white for –1. I use a small learning constant (0.0001) to slow down how the
|
||
system refines its classifications over time.
|
||
</p>
|
||
<p>
|
||
An intriguing aspect of this example lies in the relationship between the
|
||
perceptron’s weights and the characteristics of the line dividing the
|
||
points—specifically, the line’s slope and y-intercept (the <em>m</em> and
|
||
<em>b</em> in <em>y</em> = <em>mx</em> + <em>b</em>). The weights in this
|
||
context aren’t just arbitrary or “magic” values; they bear a direct
|
||
relationship to the geometry of the dataset. In this case, I’m using just 2D
|
||
data, but for many machine learning applications, the data exists in much
|
||
higher-dimensional spaces. The weights of a neural network help navigate
|
||
these spaces, defining <em>hyperplanes</em> or decision boundaries that
|
||
segment and classify the data.
|
||
</p>
|
||
<div data-type="exercise">
|
||
<h3 id="exercise-101">Exercise 10.1</h3>
|
||
<p>
|
||
Modify the code from Example 10.1 to also draw the perceptron’s current
|
||
decision boundary during the training process—its best guess for where the
|
||
line should be. Hint: Use the perceptron’s current weights to calculate
|
||
the line’s equation.
|
||
</p>
|
||
</div>
|
||
<p>
|
||
While this perceptron example offers a conceptual foundation, real-world
|
||
datasets often feature more diverse and dynamic ranges of input values. For
|
||
the simplified scenario here, the range of values for <em>x</em> is larger
|
||
than that for <em>y</em> because of the canvas size of 640<span
|
||
data-type="equation"
|
||
>\times</span
|
||
>240. Despite this, the example still works—after all, the sign activation
|
||
function doesn’t rely on specific input ranges, and it’s such a
|
||
straightforward binary classification task.
|
||
</p>
|
||
<p>
|
||
However, real-world data often has much greater complexity in terms of input
|
||
ranges. To this end, <strong>data normalization</strong> is a critical step
|
||
in machine learning. Normalizing data involves mapping the training data to
|
||
ensure that all inputs (and outputs) conform to a uniform range—typically 0
|
||
to 1, or perhaps –1 to 1. This process can improve training efficiency and
|
||
prevent individual inputs from dominating the learning process. In the next
|
||
section, using the ml5.js library, I’ll build data normalization into the
|
||
process.
|
||
</p>
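<p>
  As a minimal sketch of what that mapping could look like for this
  chapter’s synthetic data (assuming the <code>x</code>, <code>y</code>, and
  <code>desired</code> variables from Example 10.1’s training loop), the
  p5.js <code>map()</code> function can rescale each input to the range
  –1 to 1 before training:
</p>
<pre class="codesplit" data-code-language="javascript">
// One possible normalization: map each coordinate from its canvas-sized
// range into the range -1 to 1 before handing it to the perceptron.
let xNormalized = map(x, -width / 2, width / 2, -1, 1);
let yNormalized = map(y, -height / 2, height / 2, -1, 1);
// The bias input stays at 1.
let normalizedInputs = [xNormalized, yNormalized, 1];
perceptron.train(normalizedInputs, desired);</pre>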
|
||
<div data-type="exercise">
|
||
<h3 id="exercise-102">Exercise 10.2</h3>
|
||
<p>
|
||
Instead of using supervised learning, can you train the neural network to
|
||
find the right weights by using a GA?
|
||
</p>
|
||
</div>
|
||
<div data-type="exercise">
|
||
<h3 id="exercise-103">Exercise 10.3</h3>
|
||
<p>
|
||
Incorporate data normalization into the example. Does this improve the
|
||
learning efficiency?
|
||
</p>
|
||
</div>
|
||
<h2 id="putting-the-network-in-neural-network">
|
||
Putting the “Network” in Neural Network
|
||
</h2>
|
||
<p>
|
||
A perceptron can have multiple inputs, but it’s still just a single, lonely
|
||
neuron. Unfortunately, that limits the range of problems it can solve. The
|
||
true power of neural networks comes from the <em>network</em> part. Link
|
||
multiple neurons together and you’re able to solve problems of much greater
|
||
complexity.
|
||
</p>
|
||
<p>
|
||
If you read an AI textbook, it will say that a perceptron can solve only
|
||
<strong>linearly separable</strong> problems. If a dataset is linearly
|
||
separable, you can graph it and classify it into two groups simply by
|
||
drawing a straight line (see Figure 10.10, left). Classifying plants as
|
||
xerophytes or hydrophytes is a linearly separable problem.
|
||
</p>
|
||
<figure>
|
||
<img
|
||
src="images/10_nn/10_nn_11.png"
|
||
alt="Figure 10.10: Data points that are linearly separable (left) and data points that are nonlinearly separable, as a curve is required to separate the points (right)"
|
||
/>
|
||
<figcaption>
|
||
Figure 10.10: Data points that are linearly separable (left) and data
|
||
points that are nonlinearly separable, as a curve is required to separate
|
||
the points (right)
|
||
</figcaption>
|
||
</figure>
|
||
<p>
|
||
Now imagine you’re classifying plants according to soil acidity (x-axis) and
|
||
temperature (y-axis). Some plants might thrive in acidic soils but only
|
||
within a narrow temperature range, while other plants prefer less acidic
|
||
soils but tolerate a broader range of temperatures. A more complex
|
||
relationship exists between the two variables, so a straight line can’t be
|
||
drawn to separate the two categories of plants, <em>acidophilic</em> and
|
||
<em>alkaliphilic</em> (see Figure 10.10, right). A lone perceptron can’t
|
||
handle this type of <strong>nonlinearly separable</strong> problem. (Caveat
|
||
here: I’m making up these scenarios. If you happen to be a botanist, please
|
||
let me know if I’m anywhere close to reality.)
|
||
</p>
|
||
<p>
|
||
One of the simplest examples of a nonlinearly separable problem is XOR
|
||
(exclusive or). This is a logical operator, similar to the more familiar AND
|
||
and OR. For <em>A</em> AND <em>B </em>to be true, both <em>A</em> and
|
||
<em>B</em> must be true. With OR, either <em>A</em> or <em>B</em> (or both)
|
||
can be true. These are both linearly separable problems. The truth tables in
|
||
Figure 10.11 show their solution space. Each true or false value in the
|
||
table shows the output for a particular combination of true or false inputs.
|
||
</p>
|
||
<figure>
|
||
<img
|
||
src="images/10_nn/10_nn_12.png"
|
||
alt="Figure 10.11: Truth tables for the AND and OR logical operators. The true and false outputs can be separated by a line."
|
||
/>
|
||
<figcaption>
|
||
Figure 10.11: Truth tables for the AND and OR logical operators. The true
|
||
and false outputs can be separated by a line.
|
||
</figcaption>
|
||
</figure>
|
||
<p>
|
||
See how you can draw a straight line to separate the true outputs from the
|
||
false ones?
|
||
</p>
|
||
<p>
|
||
The XOR operator is the equivalent of (OR) AND (NOT AND). In other words,
|
||
<em>A</em> XOR <em>B </em>evaluates to true only if one of the inputs is
|
||
true. If both inputs are false or both are true, the output is false. To
|
||
illustrate, let’s say you’re having pizza for dinner. You love pineapple on
|
||
pizza, and you love mushrooms on pizza, but put them together, and yech! And
|
||
plain pizza, that’s no good either!
|
||
</p>
|
||
<figure>
|
||
<img
|
||
src="images/10_nn/10_nn_13.png"
|
||
alt="Figure 10.12: The “truth” table for whether you want to eat the pizza (left) and XOR (right). Note how the true and false outputs can’t be separated by a single line."
|
||
/>
|
||
<figcaption>
|
||
Figure 10.12: The “truth” table for whether you want to eat the pizza
|
||
(left) and XOR (right). Note how the true and false outputs can’t be
|
||
separated by a single line.
|
||
</figcaption>
|
||
</figure>
|
||
<p>
|
||
The XOR truth table in Figure 10.12 isn’t linearly separable. Try to draw a
|
||
straight line to separate the true outputs from the false ones—you can’t!
|
||
</p>
|
||
<p>
|
||
The fact that a perceptron can’t even solve something as simple as XOR may
|
||
seem extremely limiting. But what if I made a network out of two
|
||
perceptrons? If one perceptron can solve the linearly separable OR and one
|
||
perceptron can solve the linearly separable NOT AND, then two perceptrons
|
||
combined can solve the nonlinearly separable XOR.
|
||
</p>
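<p>
  To make that composition concrete, here’s the same logical identity,
  (OR) AND (NOT AND), written with plain JavaScript boolean operators rather
  than perceptrons:
</p>
<pre class="codesplit" data-code-language="javascript">
// XOR expressed as a combination of simpler operators: (OR) AND (NOT AND)
function xor(a, b) {
  return (a || b) && !(a && b);
}

// The four input combinations from the truth table
console.log(xor(false, false)); // false
console.log(xor(false, true));  // true
console.log(xor(true, false));  // true
console.log(xor(true, true));   // false</pre>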
  <p>
    When you combine multiple perceptrons, you get a
    <strong>multilayered perceptron</strong>, a network of many neurons (see
    Figure 10.13). Some are input neurons and receive the initial inputs, some
    are part of what’s called a <strong>hidden layer</strong> (as they’re
    connected to neither the inputs nor the outputs of the network directly),
    and then there are the output neurons, from which the results are read.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_14.png"
      alt="Figure 10.13: A multilayered perceptron has the same inputs and output as the simple perceptron, but now it includes a hidden layer of neurons."
    />
    <figcaption>
      Figure 10.13: A multilayered perceptron has the same inputs and output as
      the simple perceptron, but now it includes a hidden layer of neurons.
    </figcaption>
  </figure>
  <p>
    Up until now, I’ve been visualizing a singular perceptron with one circle
    representing a neuron processing its input signals. Now, as I move on to
    larger networks, it’s more typical to represent all the elements (inputs,
    neurons, outputs) as circles, with arrows that indicate the flow of data. In
    Figure 10.13, you can see the inputs and bias flowing into the hidden layer,
    which then flows to the output.
  </p>
  <p>
    Training a simple perceptron is pretty straightforward: you feed the data
    through and evaluate how to change the input weights according to the error.
    With a multilayered perceptron, however, the training process becomes more
    complex. The overall output of the network is still generated in essentially
    the same manner as before: the inputs multiplied by the weights are summed
    and fed forward through the various layers of the network. And you still use
    the network’s guess to calculate the error (desired result – guess). But now
    so many connections exist between layers of the network, each with its own
    weight. How do you know how much each neuron or connection contributed to
    the overall error of the network, and how it should be adjusted?
  </p>
  <p>
    The solution to optimizing the weights of a multilayered network is
    <strong>backpropagation</strong>. This process takes the error and feeds it
    backward through the network so it can adjust the weights of all the
    connections in proportion to how much they’ve contributed to the total
    error. The details of backpropagation are beyond the scope of this book. The
    algorithm uses a variety of activation functions (one classic example is the
    sigmoid function) as well as some calculus. If you’re interested in
    continuing down this road and learning more about how backpropagation works,
    you can find my
    <a href="https://thecodingtrain.com/neural-network"
      >“Toy Neural Network” project at the Coding Train website with
      accompanying video tutorials</a
    >. They go through all the steps of solving XOR using a multilayered
    feed-forward network with backpropagation. For this chapter, however, I’d
    instead like to get some help and phone a friend.
  </p>
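  <p>
    Even without the calculus, the sigmoid function itself is easy to look at.
    Here’s a small sketch of it (my own illustration, not code from the Toy
    Neural Network project): it squashes any number into the range between 0
    and 1, which is one reason it’s such a popular activation function.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// The sigmoid activation function maps any input to a value between 0 and 1.
// Backpropagation also relies on its derivative, sigmoid(x) * (1 - sigmoid(x)).
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

console.log(sigmoid(-5)); // close to 0
console.log(sigmoid(0)); // exactly 0.5
console.log(sigmoid(5)); // close to 1</pre
  >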
  <h2 id="machine-learning-with-ml5js">Machine Learning with ml5.js</h2>
  <p>
    That friend is ml5.js. This machine learning library can manage the details
    of complex processes like backpropagation so you and I don’t have to worry
    about them. As I mentioned earlier in the chapter, ml5.js aims to provide a
    friendly entry point for those who are new to machine learning and neural
    networks, while still harnessing the power of Google’s TensorFlow.js behind
    the scenes.
  </p>
  <p>
    To use ml5.js in a sketch, you must import it via a
    <code>&lt;script&gt;</code> element in your <em>index.html</em> file, much
    as you did with Matter.js and Toxiclibs.js in
    <a href="/physics-libraries#">Chapter 6</a>:
  </p>
  <pre class="codesplit" data-code-language="html">
&lt;script src="https://unpkg.com/ml5@latest/dist/ml5.min.js"&gt;&lt;/script&gt;</pre
  >
  <p>
    My goal for the rest of this chapter is to introduce ml5.js by developing a
    system that can recognize mouse gestures. This will prepare you for
    <a href="/neuroevolution#">Chapter 11</a>, where I’ll add a neural network
    “brain” to an autonomous steering agent and tie machine learning back into
    the story of the book. First, however, I’d like to talk more generally
    through the steps of training a multilayered neural network model using
    supervised learning. Outlining these steps will highlight important
    decisions you’ll have to make before developing a learning model, introduce
    the syntax of the ml5.js library, and provide you with the context you’ll
    need before training your own machine learning models.
  </p>
  <h3 id="the-machine-learning-life-cycle">The Machine Learning Life Cycle</h3>
  <p>
    The life cycle of a machine learning model is typically broken into seven
    steps:
  </p>
  <ol>
    <li>
      <strong>Collect the data.</strong> Data forms the foundation of any
      machine learning task. This stage might involve running experiments,
      manually inputting values, sourcing public data, or a myriad of other
      methods (like generating synthetic data).
    </li>
    <li>
      <strong>Prepare the data.</strong> Raw data often isn’t in a format
      suitable for machine learning algorithms. It might also have duplicate or
      missing values, or contain outliers that skew the data. Such
      inconsistencies may need to be manually adjusted. Additionally, as I
      mentioned earlier, neural networks work best with normalized data, which
      has values scaled to fit within a standard range. Another key part of
      preparing data is separating it into distinct sets: training, validation,
      and testing. The training data is used to teach the model (step 4), while
      the validation and testing data (the distinction is subtle—more on this
      later) are set aside and reserved for evaluating the model’s performance
      (step 5).
    </li>
    <li>
      <strong>Choose a model.</strong> Design the architecture of the neural
      network. Different models are more suitable for certain types of data and
      outputs.
    </li>
    <li>
      <strong>Train the model.</strong> Feed the training portion of the data
      through the model and allow the model to adjust the weights of the neural
      network based on its errors. This process is known as
      <strong>optimization</strong>: the model tunes the weights so they result
      in the fewest number of errors.
    </li>
    <li>
      <strong>Evaluate the model.</strong> Remember the testing data that was
      set aside in step 2? Since that data wasn’t used in training, it provides
      a means to evaluate how well the model performs on new, unseen data.
    </li>
    <li>
      <strong>Tune the parameters.</strong> The training process is influenced
      by a set of parameters (often called <strong>hyperparameters</strong>)
      such as the learning rate, which dictates how much the model should adjust
      its weights based on errors in prediction. I called this the
      <code>learningConstant</code> in the perceptron example. By fine-tuning
      these parameters and revisiting steps 4 (training), 3 (model selection),
      and even 2 (data preparation), you can often improve the model’s
      performance.
    </li>
    <li>
      <strong>Deploy the model. </strong>Once the model is trained and its
      performance is evaluated satisfactorily, it’s time to use the model out in
      the real world with new data!
    </li>
  </ol>
  <p>
    These steps are the cornerstone of supervised machine learning. However,
    even though 7 is a truly excellent number, I think I missed one more
    critical step. I’ll call it step 0.
  </p>
  <ol>
    <li value="0">
      <strong>Identify the problem.</strong> This initial step defines the
      problem that needs solving. What is the objective? What are you trying to
      accomplish or predict with your machine learning model?
    </li>
  </ol>
  <p>
    This zeroth step informs all the other steps in the process. After all, how
    are you supposed to collect your data and choose a model without knowing
    what you’re even trying to do? Are you predicting a number? A category? A
    sequence? Is it a binary choice, or are there many options? These sorts of
    questions often boil down to choosing between two types of tasks that the
    majority of machine learning applications fall into: classification and
    regression.
  </p>
  <h3 id="classification-and-regression">Classification and Regression</h3>
  <p>
    <strong>Classification</strong> is a type of machine learning problem that
    involves predicting a <strong>label</strong> (also called a
    <strong>category</strong> or <strong>class</strong>) for a piece of data. If
    this sounds familiar, that’s because it is: the simple perceptron in Example
    10.1 was trained to classify points as above or below a line. To give
    another example, an image classifier might try to guess if a photo is of a
    cat or a dog and assign the corresponding label (see Figure 10.14).
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_15.png"
      alt="Figure 10.14: Labeling images as cats or dogs"
    />
    <figcaption>Figure 10.14: Labeling images as cats or dogs</figcaption>
  </figure>
  <p>
    Classification doesn’t happen by magic. The model must first be shown many
    examples of dogs and cats with the correct labels in order to properly
    configure the weights of all the connections. This is the training part of
    supervised learning.
  </p>
  <p>
    The classic “Hello, world!” demonstration of machine learning and supervised
    learning is a classification problem of the MNIST dataset. Short for
    <em>Modified National Institute of Standards and Technology</em>,
    <strong>MNIST</strong> is a dataset that was collected and processed by Yann
    LeCun (Courant Institute, NYU), Corinna Cortes (Google Labs), and
    Christopher J.C. Burges (Microsoft Research). Widely used for training and
    testing in the field of machine learning, this dataset consists of 70,000
    handwritten digits from 0 to 9; each is a 28<span data-type="equation"
      >\times</span
    >28-pixel grayscale image (see Figure 10.15 for examples). Each image is
    labeled with its corresponding digit.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_16.png"
      alt="Figure 10.15: A selection of handwritten digits 0–9 from the MNIST dataset (courtesy of Suvanjanprasai)"
    />
    <figcaption>
      Figure 10.15: A selection of handwritten digits 0–9 from the MNIST dataset
      (courtesy of Suvanjanprasai)
    </figcaption>
  </figure>
  <p>
    MNIST is a canonical example of a training dataset for image classification:
    the model has a discrete number of categories to choose from (10 to be
    exact—no more, no less). After the model is trained on the 70,000 labeled
    images, the goal is for it to classify new images and assign the appropriate
    label, a digit from 0 to 9.
  </p>
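  <p>
    It’s worth pausing on what those images mean for a network’s inputs. A
    common approach with a fully connected network like the ones in this
    chapter is to unroll each 28<span data-type="equation">\times</span>28
    image into a flat list of 784 grayscale values, one input per pixel. Here’s
    a rough sketch of that flattening step (an illustration only, not code from
    the MNIST dataset or ml5.js):
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Unroll a 28x28 grayscale image (a 2D array of brightness values)
// into one flat array of 28 * 28 = 784 numbers, scaled to the range 0 to 1.
function flattenImage(pixels) {
  let inputs = [];
  for (let row of pixels) {
    for (let value of row) {
      // Assuming each brightness value runs from 0 to 255
      inputs.push(value / 255);
    }
  }
  // For a 28x28 image, this array has 784 elements.
  return inputs;
}</pre
  >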
  <p>
    <strong>Regression</strong>, on the other hand, is a machine learning task
    for which the prediction is a continuous value, typically a floating-point
    number. A regression problem can involve multiple outputs, but thinking
    about just one is often simpler to start. For example, consider a machine
    learning model that predicts the daily electricity usage of a house based on
    input factors like the number of occupants, the size of the house, and the
    temperature outside (see Figure 10.16).
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_17.png"
      alt="Figure 10.16: Factors like weather and the size and occupancy of a home can influence its daily electricity usage."
    />
    <figcaption>
      Figure 10.16: Factors like weather and the size and occupancy of a home
      can influence its daily electricity usage.
    </figcaption>
  </figure>
  <p>
    Rather than picking from a discrete set of output options, the goal of the
    neural network is now to guess a number—any number. Will the house use 30.5
    kilowatt-hours of electricity that day? Or 48.7 kWh? Or 100.2 kWh? The
    output prediction could be any value from a continuous range.
  </p>
  <h3 id="network-design">Network Design</h3>
  <p>
    Knowing what problem you’re trying to solve (step 0) also has a significant
    bearing on the design of the neural network—in particular, on its input and
    output layers. I’ll demonstrate with another classic “Hello, world!”
    classification example from the field of data science and machine learning:
    the iris dataset. This dataset, which can be found in the Machine Learning
    Repository at the University of California, Irvine, originated from the work
    of American botanist Edgar Anderson.
  </p>
  <p>
    Anderson collected flower data over many years across multiple regions of
    the United States and Canada. For more on the origins of this famous
    dataset, see
    <a href="https://academic.oup.com/jrssig/article/18/6/26/7038520"
      >“The Iris Data Set: In Search of the Source of <em>Virginica</em>” by
      Antony Unwin and Kim Kleinman</a
    >. After carefully analyzing the data, Anderson built a table to classify
    iris flowers into three distinct species: <em>Iris setosa</em>,
    <em>Iris virginica</em>, and <em>Iris versicolor </em>(see Figure 10.17).
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_18.png"
      alt="Figure 10.17: Three distinct species of iris flowers"
    />
    <figcaption>
      Figure 10.17: Three distinct species of iris flowers
    </figcaption>
  </figure>
  <p>
    Anderson included four numeric attributes for each flower: sepal length,
    sepal width, petal length, and petal width, all measured in centimeters. (He
    also recorded color information, but that data appears to have been lost.)
    Each record is then paired with the appropriate iris categorization:
  </p>
  <table>
    <thead>
      <tr>
        <th>Sepal Length</th>
        <th>Sepal Width</th>
        <th>Petal Length</th>
        <th>Petal Width</th>
        <th>Classification</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>5.1</td>
        <td>3.5</td>
        <td>1.4</td>
        <td>0.2</td>
        <td><em>Iris setosa</em></td>
      </tr>
      <tr>
        <td>4.9</td>
        <td>3.0</td>
        <td>1.4</td>
        <td>0.2</td>
        <td><em>Iris setosa</em></td>
      </tr>
      <tr>
        <td>7.0</td>
        <td>3.2</td>
        <td>4.7</td>
        <td>1.4</td>
        <td><em>Iris versicolor</em></td>
      </tr>
      <tr>
        <td>6.4</td>
        <td>3.2</td>
        <td>4.5</td>
        <td>1.5</td>
        <td><em>Iris versicolor</em></td>
      </tr>
      <tr>
        <td>6.3</td>
        <td>3.3</td>
        <td>6.0</td>
        <td>2.5</td>
        <td><em>Iris virginica</em></td>
      </tr>
      <tr>
        <td>5.8</td>
        <td>2.7</td>
        <td>5.1</td>
        <td>1.9</td>
        <td><em>Iris virginica</em></td>
      </tr>
    </tbody>
  </table>
  <p>
    In this dataset, the first four columns (sepal length, sepal width, petal
    length, petal width) serve as inputs to the neural network. The output is
    the classification provided in the fifth column. Figure 10.18 depicts a
    possible architecture for a neural network that can be trained on this data.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_19.png"
      alt="Figure 10.18: A possible network architecture for iris classification"
    />
    <figcaption>
      Figure 10.18: A possible network architecture for iris classification
    </figcaption>
  </figure>
  <p>
    On the left are the four inputs to the network, corresponding to the first
    four columns of the data table. On the right are three possible outputs,
    each representing one of the iris species labels. In between is the hidden
    layer, which, as mentioned earlier, adds complexity to the network’s
    architecture, necessary for handling nonlinearly separable data. Each node
    in the hidden layer is connected to every node that comes before and after
    it. This is commonly called a <strong>fully connected</strong> or
    <strong>dense </strong>layer.
  </p>
  <p>
    You might also notice the absence of explicit bias nodes in this diagram.
    While biases play an important role in the output of each neuron, they’re
    often left out of visual representations to keep the diagrams clean and
    focused on the primary data flow. (The ml5.js library will ultimately manage
    the biases for me internally.)
  </p>
  <p>
    The neural network’s goal is to “activate” the correct output for the input
    data, just as the perceptron would output a +1 or –1 for its single binary
    classification. In this case, the output values are like signals that help
    the network decide which iris species label to assign. The highest computed
    value activates to signify the network’s best guess about the
    classification.
  </p>
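  <p>
    In code, reading off that “activation” amounts to finding the output neuron
    with the highest value. Here’s a small sketch of the idea with made-up
    output values (an illustration only; ml5.js does this bookkeeping for you
    and reports the winning label directly):
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Hypothetical values coming out of the three output neurons
let outputs = [0.08, 0.85, 0.07];
let labels = ["Iris setosa", "Iris virginica", "Iris versicolor"];

// Find the index of the highest output value: the best guess.
let best = 0;
for (let i = 1; i &lt; outputs.length; i++) {
  if (outputs[i] > outputs[best]) {
    best = i;
  }
}
console.log(labels[best]); // "Iris virginica"</pre
  >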
  <p>
    The key takeaway here is that a classification network should have as many
    inputs as there are values for each item in the dataset, and as many outputs
    as there are categories. As for the hidden layer, the design is much less
    set in stone. The hidden layer in Figure 10.18 has five nodes, but this
    number is entirely arbitrary. Neural network architectures can vary greatly,
    and the number of hidden nodes is often determined through trial and error
    or other educated guessing methods (called <em>heuristics</em>). In the
    context of this book, I’ll be relying on ml5.js to automatically configure
    the architecture based on the input and output data.
  </p>
  <p>
    What about the inputs and outputs in a regression scenario, like the
    household electricity consumption example I mentioned earlier? I’ll go ahead
    and make up a dataset for this scenario, with values representing the
    occupants and size of the house, the day’s temperature, and the
    corresponding electricity usage. This is much like a synthetic dataset,
    given that it’s not data collected for a real-world scenario—but whereas
    synthetic data is generated automatically, here I’m manually inputting
    numbers from my own imagination:
  </p>
  <table>
    <thead>
      <tr>
        <th>Occupants</th>
        <th>Size (m²)</th>
        <th>Temperature Outside (°C)</th>
        <th>Electricity Usage (kWh)</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>4</td>
        <td>150</td>
        <td>24</td>
        <td>25.3</td>
      </tr>
      <tr>
        <td>2</td>
        <td>100</td>
        <td>25.5</td>
        <td>16.2</td>
      </tr>
      <tr>
        <td>1</td>
        <td>70</td>
        <td>26.5</td>
        <td>12.1</td>
      </tr>
      <tr>
        <td>4</td>
        <td>120</td>
        <td>23</td>
        <td>22.1</td>
      </tr>
      <tr>
        <td>2</td>
        <td>90</td>
        <td>21.5</td>
        <td>15.2</td>
      </tr>
      <tr>
        <td>5</td>
        <td>180</td>
        <td>20</td>
        <td>24.4</td>
      </tr>
      <tr>
        <td>1</td>
        <td>60</td>
        <td>18.5</td>
        <td>11.7</td>
      </tr>
    </tbody>
  </table>
  <p>
    The neural network for this problem should have three input nodes
    corresponding to the first three columns (occupants, size, temperature).
    Meanwhile, it should have one output node representing the fourth column,
    the network’s guess about the electricity usage. And I’ll arbitrarily say
    the network’s hidden layer should have four nodes rather than five. Figure
    10.19 shows this network architecture.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_20.png"
      alt="Figure 10.19: A possible network architecture for three inputs and one regression output"
    />
    <figcaption>
      Figure 10.19: A possible network architecture for three inputs and one
      regression output
    </figcaption>
  </figure>
  <p>
    Unlike the iris classification network, which is choosing from three labels
    and therefore has three outputs, this network is trying to predict just one
    number, so it has only one output. I’ll note, however, that a single output
    isn’t a requirement of regression. A machine learning model can also perform
    a regression that predicts multiple continuous values, in which case the
    model would have multiple outputs.
  </p>
  <h3 id="ml5js-syntax">ml5.js Syntax</h3>
  <p>
    The ml5.js library is a collection of machine learning models that can be
    accessed using the syntax <code>ml5.</code><code><em>functionName</em></code
    ><code>()</code>. For example, to use a pretrained model that detects hand
    positions, you can use <code>ml5.handpose()</code>. For classifying images,
    you can use <code>ml5.imageClassifier()</code>. While I encourage you to
    explore all that ml5.js has to offer (I’ll reference some of these
    pretrained models in upcoming exercise ideas), for this chapter I’ll focus
    on only one function in ml5.js, <code>ml5.neuralNetwork()</code>, which
    creates an empty neural network for you to train.
  </p>
  <p>
    To use this function, you must first create a JavaScript object that will
    configure the model being created. Here’s where some of the big-picture
    factors I just discussed—is this a classification or a regression task? How
    many inputs and outputs?—come into play. I’ll begin by specifying the task I
    want the model to perform (<code>"regression"</code> or
    <code>"classification"</code>):
  </p>
  <pre class="codesplit" data-code-language="javascript">
let options = { task: "classification" };
let classifier = ml5.neuralNetwork(options);</pre
  >
  <p>
    This, however, gives ml5.js little to go on in terms of designing the
    network architecture. Adding the inputs and outputs will complete the rest
    of the puzzle. The iris flower classification has four inputs and three
    possible output labels. This can be configured as part of the
    <code>options</code> object with a single integer for the number of inputs
    and an array of strings listing the output labels:
  </p>
  <pre class="codesplit" data-code-language="javascript">
let options = {
  inputs: 4,
  outputs: ["iris-setosa", "iris-virginica", "iris-versicolor"],
  task: "classification",
};
let irisClassifier = ml5.neuralNetwork(options);</pre
  >
  <p>
    The electricity regression scenario had three input values (occupants, size,
    temperature) and one output value (usage in kWh). With regression, there are
    no string output labels, so only an integer indicating the number of outputs
    is required:
  </p>
  <pre class="codesplit" data-code-language="javascript">
let options = {
  inputs: 3,
  outputs: 1,
  task: "regression",
};
let energyPredictor = ml5.neuralNetwork(options);</pre
  >
  <p>
    You can set many other properties of the model through the
    <code>options</code> object. For example, you could specify the number of
    hidden layers between the inputs and outputs (there are typically several),
    the number of neurons in each layer, which activation functions to use, and
    more. In most cases, however, you can leave out these extra settings and let
    ml5.js make its best guess on how to design the model based on the task and
    data at hand.
  </p>
  <h2 id="building-a-gesture-classifier">Building a Gesture Classifier</h2>
  <p>
    I’ll now walk through the steps of the machine learning life cycle with an
    example problem well suited for p5.js, building all the code for each step
    along the way using ml5.js. I’ll begin at step 0 by articulating the
    problem. Imagine for a moment that you’re working on an interactive
    application that responds to gestures. Maybe the gestures are ultimately
    meant to be recorded via body tracking, but you want to start with something
    much simpler—a single stroke of the mouse (see Figure 10.20).
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_21.png"
      alt="Figure 10.20: A single mouse gesture as a vector between a start and end point"
    />
    <figcaption>
      Figure 10.20: A single mouse gesture as a vector between a start and end
      point
    </figcaption>
  </figure>
  <p>
    Each gesture could be recorded as a vector extending from the start to the
    end point of a mouse movement. The x- and y-components of the vector will be
    the model’s inputs. The model’s task could be to predict one of four
    possible labels for the gesture: <em>up</em>, <em>down</em>, <em>left</em>,
    or <em>right</em>. With a discrete set of possible outputs, this sounds like
    a classification problem. The four labels will be the model’s outputs.
  </p>
  <p>
    Much like some of the GA demonstrations in
    <a href="/genetic-algorithms#">Chapter 9</a>—and like the simple perceptron
    example earlier in this chapter—the problem I’m selecting here has a known
    solution and could be solved more easily and efficiently without a neural
    network. The direction of a vector can be classified with the
    <code>heading()</code> function and a series of <code>if</code> statements!
    However, by using this seemingly trivial scenario, I hope to explain the
    process of training a machine learning model in an understandable and
    friendly way. Additionally, this example will make it easy to check that the
    code is working as expected. When I’m done, I’ll provide some ideas about
    how to expand the classifier to a scenario that couldn’t use simple
    <code>if</code> statements.
  </p>
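  <p>
    For comparison, here’s roughly what that non-machine-learning solution
    could look like. This is just a sketch of the idea, using the
    <code>heading()</code> method (which reports the angle of a vector in
    radians) and a few <code>if</code> statements; the exact angle boundaries
    are my own arbitrary choice:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// A hand-rolled classifier: no machine learning, just a little trigonometry
function classifyDirection(v) {
  // heading() returns the angle of the vector in radians.
  // In p5.js, the y-axis points down, so a positive angle points downward.
  let angle = v.heading();
  if (abs(angle) > 3 * QUARTER_PI) return "left";
  if (angle > QUARTER_PI) return "down";
  if (angle >= -QUARTER_PI) return "right";
  return "up";
}

console.log(classifyDirection(createVector(1, 0))); // "right"</pre
  >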
  <h3 id="collecting-and-preparing-the-data">
    Collecting and Preparing the Data
  </h3>
  <p>
    With the problem established, I can turn to steps 1 and 2: collecting and
    preparing the data. In the real world, these steps can be tedious,
    especially when the raw data you collect is messy and needs a lot of initial
    processing. You can think of this like having to organize, wash, and chop
    all your ingredients before you can start cooking a meal from scratch.
  </p>
  <p>
    For simplicity, I’d instead like to take the approach of ordering a machine
    learning “meal kit,” with the ingredients (data) already portioned and
    prepared. This way, I’ll get straight to the cooking itself, the process of
    training the model. After all, this is really just an appetizer for what
    will be the ultimate meal in <a href="/neuroevolution#">Chapter 11</a>, when
    I apply neural networks to steering agents.
  </p>
  <p>
    With that in mind, I’ll handcode some example data and manually keep it
    normalized within a range of –1 to +1. I’ll organize the data into an array
    of objects, pairing the x- and y-components of a vector with a string label.
    I’m picking values that I feel clearly point in a specific direction and
    assigning the appropriate label—two examples per label:
  </p>
  <pre class="codesplit" data-code-language="javascript">
let data = [
  { x: 0.99, y: 0.02, label: "right" },
  { x: 0.76, y: -0.1, label: "right" },
  { x: -1.0, y: 0.12, label: "left" },
  { x: -0.9, y: -0.1, label: "left" },
  { x: 0.02, y: 0.98, label: "down" },
  { x: -0.2, y: 0.75, label: "down" },
  { x: 0.01, y: -0.9, label: "up" },
  { x: -0.1, y: -0.8, label: "up" },
];</pre
  >
  <p>Figure 10.21 shows the same data expressed as arrows.</p>
  <figure>
    <img
      src="images/10_nn/10_nn_22.png"
      alt="Figure 10.21: The input data visualized as vectors (arrows)"
    />
    <figcaption>
      Figure 10.21: The input data visualized as vectors (arrows)
    </figcaption>
  </figure>
  <p>
    In a more realistic scenario, I’d probably have a much larger dataset that
    would be loaded in from a separate file, instead of written directly into
    the code. For example, JavaScript Object Notation (JSON) and comma-separated
    values (CSV) are two popular formats for storing and loading data. JSON
    stores data in key-value pairs and follows the same exact format as
    JavaScript object literals. CSV is a file format that stores tabular data
    (like a spreadsheet). You could use numerous other data formats, depending
    on your needs and the programming environment you’re working with.
  </p>
  <p>
    In the real world, the values in that larger dataset would actually come
    from somewhere. Maybe I would collect the data by asking users to perform
    specific gestures and recording their inputs, or by writing an algorithm to
    automatically generate larger amounts of synthetic data that represent the
    idealized versions of the gestures I want the model to recognize. In either
    case, the key would be to collect a diverse set of examples that adequately
    represent the variations in how the gestures might be performed. For now,
    however, let’s see how it goes with just a few servings of data.
  </p>
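  <p>
    That said, if I ever wanted more than a few servings, the synthetic-data
    idea mentioned above can be automated in just a few lines of p5.js. Here’s
    one possible sketch: it samples random unit vectors and labels each one
    with the hand-rolled <code>classifyDirection()</code> helper from earlier
    in the chapter (my own illustrative function, not part of ml5.js):
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Generate labeled training examples by sampling random unit vectors.
function generateSyntheticData(count) {
  let examples = [];
  for (let i = 0; i &lt; count; i++) {
    // A random vector of length 1, so the values stay between -1 and +1
    let v = p5.Vector.random2D();
    examples.push({ x: v.x, y: v.y, label: classifyDirection(v) });
  }
  return examples;
}

let syntheticData = generateSyntheticData(200);</pre
  >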
  <div data-type="exercise">
    <h3 id="exercise-104">Exercise 10.4</h3>
    <p>
      Create a p5.js sketch that collects gesture data from users and saves it
      to a JSON file. You can use <code>mousePressed()</code> and
      <code>mouseReleased()</code> to mark the start and end of each gesture,
      and <code>saveJSON()</code> to download the data into a file.
    </p>
  </div>
  <h3 id="choosing-a-model">Choosing a Model</h3>
  <p>
    I’ve now come to step 3 of the machine learning life cycle, selecting a
    model. This is where I’m going to start letting ml5.js do the heavy lifting
    for me. To create the model with ml5.js, all I need to do is specify the
    task, the inputs, and the outputs:
  </p>
  <pre class="codesplit" data-code-language="javascript">
let options = {
  task: "classification",
  inputs: 2,
  outputs: ["up", "down", "left", "right"],
  debug: true
};
let classifier = ml5.neuralNetwork(options);</pre
  >
  <p>
    That’s it! I’m done! Thanks to ml5.js, I can bypass a host of complexities
    such as the number of layers and neurons per layer to have, the kinds of
    activation functions to use, and how to set up the algorithms for training
    the network. The library will make these decisions for me.
  </p>
  <p>
    Of course, the default ml5.js model architecture may not be perfect for all
    cases. I encourage you to read the ml5.js documentation for additional
    details on how to customize the model. I’ll also point out that ml5.js is
    able to infer the inputs and outputs from the data, so those properties
    aren’t entirely necessary to include here in the
    <code>options</code> object. However, for the sake of clarity (and since
    I’ll need to specify them for later examples), I’m including them here.
  </p>
  <p>
    The <code>debug</code> property, when set to <code>true</code>, turns on a
    visual interface for the training process. It’s a helpful tool for spotting
    potential issues during training and for getting a better understanding of
    what’s happening behind the scenes. You’ll see what this interface looks
    like later in the chapter.
  </p>
  <h3 id="training-the-model">Training the Model</h3>
  <p>
    Now that I have the data in a <code>data</code> variable and a neural
    network initialized in the <code>classifier</code> variable, I’m ready to
    train the model. That process starts with adding the data to the model. And
    for that, it turns out I’m not quite done with preparing the data.
  </p>
  <p>
    Right now, my data is neatly organized in an array of objects, each
    containing the x- and y-components of a vector and a corresponding string
    label. This is a typical format for training data, but it isn’t directly
    consumable by ml5.js. (Sure, I could have initially organized the data into
    a format that ml5.js recognizes, but I’m including this extra step because
    it will likely be necessary when you’re using a dataset that has been
    collected or sourced elsewhere.) To add the data to the model, I need to
    separate the inputs from the outputs so that the model understands which are
    which.
  </p>
  <p>
    The ml5.js library offers a fair amount of flexibility in the kinds of
    formats it will accept, but I’ll choose to use arrays—one for the
    <code>inputs</code> and one for the <code>outputs</code>. I can use a loop
    to reorganize each data item and add it to the model:
  </p>
  <pre class="codesplit" data-code-language="javascript">
for (let item of data) {
  // An array of two numbers for the inputs
  let inputs = [item.x, item.y];
  // A single string label for the output
  let outputs = [item.label];
  //{!1} Add the training data to the classifier.
  classifier.addData(inputs, outputs);
}</pre
  >
  <p>
    What I’ve done here is set the <strong>shape</strong> of the data. In
    machine learning, this term describes the data’s dimensions and structure.
    It indicates how the data is organized in terms of rows, columns, and
    potentially even deeper, into additional dimensions. Understanding the shape
    of your data is crucial because it determines the way the model should be
    structured.
  </p>
  <p>
    Here, the input data’s shape is a 1D array containing two numbers
    (representing <em>x</em> and <em>y</em>). The output data, similarly, is a
    1D array containing just a single string label. Every piece of data going in
    and out of the network will follow this pattern. While this is a small and
    simple example, it nicely mirrors many real-world scenarios in which the
    inputs are numerically represented in an array, and the outputs are string
    labels.
  </p>
  <p>
    After passing the data into the <code>classifier</code>, ml5.js provides a
    helper function to normalize it. As I’ve mentioned, normalizing data
    (adjusting the scale to a standard range) is a critical step in the machine
    learning process:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Normalize the data.
classifier.normalizeData();</pre
  >
  <p>
    In this case, the handcoded data was limited to a range of –1 to +1 from the
    get-go, so calling <code>normalizeData()</code> here is likely redundant.
    Still, this function call is important to demonstrate. Normalizing your data
    ahead of time as part of the preprocessing step will absolutely work, but
    the auto-normalization feature of ml5.js is a big help!
  </p>
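  <p>
    If you’re curious what <code>normalizeData()</code> is doing conceptually,
    it’s essentially a linear remapping of each input from its original range
    into a standard one, much like p5.js’s <code>map()</code> function. Here’s
    a small sketch of the idea done by hand, using the made-up house-size
    numbers from the regression table (the exact target range ml5.js uses
    internally is an implementation detail):
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Remap a raw value from its original range into the range 0 to 1.
// House sizes in the made-up dataset run from 60 to 180 square meters.
let size = 120;
let normalized = map(size, 60, 180, 0, 1);
console.log(normalized); // 0.5</pre
  >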
  <p>
    Now for the heart of the machine learning process: actually training the
    model. Here’s the code:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// The <code>train()</code> method initiates the training process.
classifier.train(finishedTraining);

// A callback function for when the training is complete
function finishedTraining() {
  console.log("Training complete!");
}</pre
  >
  <p>
    Yes, that’s it! After all, the hard work has already been completed. The
    data was collected, prepared, and fed into the model. All that remains is to
    call the <code>train()</code> method, sit back, and let ml5.js do its thing.
  </p>
  <p>
    In truth, it isn’t <em>quite</em> that simple. If I were to run the code as
    written and then test the model, the results would probably be inadequate.
    Here’s where another key term in machine learning comes into play:
    <strong>epochs</strong>. The <code>train()</code> method tells the neural
    network to start the learning process. But how long should it train for? You
    can think of an epoch as one round of practice, one cycle of using the
    entire training dataset to update the weights of the neural network.
    Generally speaking, the more epochs you go through, the better the network
    will perform, but at a certain point you’ll have diminishing returns. The
    number of epochs can be set by passing in an <code>options</code> object
    into <code>train()</code>:
  </p>
  <pre class="codesplit" data-code-language="javascript">
//{!1} Set the number of epochs for training.
let options = { epochs: 25 };
classifier.train(options, finishedTraining);</pre
  >
  <p>
    The number of epochs is an example of a hyperparameter, a global setting for
    the training process. You can set others through the
    <code>options</code> object (the learning rate, for example), but I’m going
    to stick with the defaults. You can read more about customization options in
    the ml5.js documentation.
  </p>
  <p>
    The second argument to <code>train()</code> is optional, but it’s good to
    include one. It specifies a callback function that runs when the training
    process is complete—in this case, <code>finishedTraining()</code>. (See the
    “Callbacks” box for more on callback functions.) This is useful for knowing
    when you can proceed to the next steps in your code. Another optional
    callback, which I usually name <code>whileTraining()</code>, is triggered
    after each epoch. However, for my purposes, knowing when the training is
    done is plenty!
  </p>
  <div data-type="note">
    <h3 id="callbacks">Callbacks</h3>
    <p>
      A <strong>callback function</strong> in JavaScript is a function you don’t
      actually call yourself. Instead, you provide it as an argument to another
      function, intending for it to be <em>called back</em> automatically at a
      later time (typically associated with an event, like a mouse click).
      You’ve seen this before when working with Matter.js in
      <a href="/physics-libraries#">Chapter 6</a>, where you specified a
      function to call whenever a collision was detected.
    </p>
    <p>
      Callbacks are needed for <strong>asynchronous</strong> operations, when
      you want your code to continue along with animating or doing other things
      while waiting for another task (like training a machine learning model) to
      finish. A classic example of this in p5.js is loading data into a sketch
      with <code>loadJSON()</code>.
    </p>
    <p>
      JavaScript also provides a more recent approach for handling asynchronous
      operations known as <strong>promises</strong>. With promises, you can use
      keywords like <code>async</code> and <code>await</code> to make your
      asynchronous code look more like traditional synchronous code. While
      ml5.js also supports this style, I’ll stick to using callbacks to stay
      aligned with p5.js style.
    </p>
  </div>
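  <p>
    If callbacks are new to you, here’s a tiny standalone example of the
    pattern, unrelated to ml5.js: one function is handed to another as an
    argument and invoked later, while the rest of the program keeps going.
  </p>
  <pre class="codesplit" data-code-language="javascript">
// A function intended to be "called back" later
function onDone(message) {
  console.log("Callback says: " + message);
}

// A function that accepts a callback and invokes it when its work finishes
function doSomethingSlow(callback) {
  // setTimeout stands in for a slow task, like training a model.
  setTimeout(function () {
    callback("all done!");
  }, 1000);
}

doSomethingSlow(onDone);
console.log("This line runs first, while the slow task is still going.");</pre
  >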
  <h3 id="evaluating-the-model">Evaluating the Model</h3>
  <p>
    If <code>debug</code> is set to <code>true</code> in the initial call to
    <code>ml5.neuralNetwork()</code>, a visual interface should appear after
    <code>train()</code> is called, covering most of the p5.js page and canvas
    (see Figure 10.22). This interface, called the <em>Visor</em>, represents
    the evaluation step.
  </p>
  <figure>
    <img
      src="images/10_nn/10_nn_23.png"
      alt="Figure 10.22: The Visor, with a graph of the loss function and model details"
    />
    <figcaption>
      Figure 10.22: The Visor, with a graph of the loss function and model
      details
    </figcaption>
  </figure>
  <p>
    The Visor comes from TensorFlow.js (which underlies ml5.js) and includes a
    graph that provides real-time feedback on the progress of the training. This
    graph plots the loss of the model on the y-axis against the number of epochs
    along the x-axis. <strong>Loss</strong> is a measure of how far off the
    model’s predictions are from the correct outputs provided by the training
    data. It quantifies the model’s total error. When training begins, it’s
    common for the loss to be high because the model has yet to learn anything.
    Ideally, as the model trains through more epochs, it should get better at
    its predictions, and the loss should decrease. If the graph goes down as the
    epochs increase, this is a good sign!
  </p>
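  <p>
    Loss can be calculated in several ways, and the formula TensorFlow.js picks
    depends on the task. One of the simplest to reason about is the mean
    squared error, so here’s a rough sketch of what “quantifying total error”
    can look like in code (an illustration of the concept, not the exact loss
    function used for this classifier):
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Mean squared error: the average of the squared differences between
// the model's guesses and the known correct answers
function meanSquaredError(guesses, targets) {
  let sum = 0;
  for (let i = 0; i &lt; guesses.length; i++) {
    let error = targets[i] - guesses[i];
    sum += error * error;
  }
  return sum / guesses.length;
}

console.log(meanSquaredError([0.9, 0.2], [1, 0])); // 0.025</pre
  >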
  <p>
    Running the training for the 200 epochs depicted in Figure 10.22 might
    strike you as a bit excessive. In a real-world scenario with more extensive
    data, I would probably use fewer epochs, like the 25 I specified in the
    original code snippet. However, because the dataset here is so tiny, the
    higher number of epochs helps the model get enough practice with the data.
    Remember, this is a toy example, aiming to make the concepts clear rather
    than to produce a sophisticated machine learning model.
  </p>
  <p>
    Below the graph, the Visor shows a Model Summary table with details on the
    lower-level TensorFlow.js model architecture created behind the scenes. The
    summary includes layer names, neuron counts per layer (in the Output Shape
    column), and a parameters count, which is the total number of weights, one
    for each connection between two neurons. In this case, dense_Dense1 is the
    hidden layer with 16 neurons (a number chosen by ml5.js), and dense_Dense2
    is the output layer with 4 neurons, one for each classification category.
    (TensorFlow.js doesn’t think of the inputs as a distinct layer; rather,
    they’re merely the starting point of the data flow.) The <em>batch</em> in
    the Output Shape column doesn’t refer to a specific number but indicates
    that the model can process a variable amount of training data (a batch) for
    any single cycle of model training.
  </p>
  <p>
    Before moving on from the evaluation stage, I have a loose end to tie up.
    When I first outlined the steps of the machine learning life cycle, I
    mentioned that preparing the data typically involves splitting the dataset
    into three parts to help with the evaluation process:
  </p>
  <ul>
    <li>
      <strong>Training:</strong> The primary dataset used to train the model
    </li>
    <li>
      <strong>Validation:</strong> A subset of the data used to check the model
      during training, typically at the end of each epoch
    </li>
    <li>
      <strong>Testing:</strong> Additional untouched data never considered
      during the training process, for determining the model’s final performance
      after the training is completed
    </li>
  </ul>
  <p>
    You may have noticed that I never did this. For simplicity, I’ve instead
    used the entire dataset for training. After all, my dataset has only eight
    records; it’s much too small to divide into three sets! With a large
    dataset, this three-way split would be more appropriate.
  </p>
  <p>
    Using such a small dataset risks the model <strong>overfitting</strong> the
    data, however: the model becomes so tuned to the specific peculiarities of
    the training data that it’s much less effective when working with new,
    unseen data. The main reason to use a validation set is to monitor the model
    during the training process. As training progresses, if the model’s accuracy
    improves on the training data but deteriorates on the validation data, it’s
    a strong indicator that overfitting might be occurring. (The testing set is
    reserved strictly for the final evaluation, one more chance after training
    is complete to gauge the model’s performance.)
  </p>
  <p>
    For more realistic scenarios, ml5.js provides a way to split up the data, as
    well as automatic features for employing validation data. If you’re inclined
    to go further,
    <a href="http://ml5js.org/"
      >you can explore the full set of neural network examples on the ml5.js
      website</a
    >.
  </p>
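  <p>
    As a rough idea of what such a split can look like in plain JavaScript,
    here’s a sketch that shuffles the dataset and holds out a portion for
    testing before the rest is added to the model. The 80/20 proportion is just
    a common convention, not a rule, and this isn’t the mechanism ml5.js uses
    internally:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Shuffle the data, then hold out 20 percent for testing.
// p5.js's shuffle() returns a randomized copy of an array.
let shuffled = shuffle(data);
let splitIndex = floor(shuffled.length * 0.8);
let trainingData = shuffled.slice(0, splitIndex);
let testingData = shuffled.slice(splitIndex);

// Only the training portion is added to the model.
for (let item of trainingData) {
  classifier.addData([item.x, item.y], [item.label]);
}</pre
  >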
  <h3 id="tuning-the-parameters">Tuning the Parameters</h3>
  <p>
    After the evaluation step, there’s typically an iterative process of
    adjusting hyperparameters and going through training again to achieve the
    best performance from the model. While ml5.js offers capabilities for
    parameter tuning (which you can learn about in the library’s reference), it
    isn’t really geared toward making low-level, fine-grained adjustments to a
    model. Using TensorFlow.js directly might be your best bet if you want to
    explore this step in more detail, since it offers a broader suite of tools
    and allows for lower-level control over the training process.
  </p>
  <p>
    In this case, tuning the parameters isn’t strictly necessary. The graph in
    the Visor shows a loss all the way down at 0.1, which is plenty accurate for
    my purposes. I’m happy to move on.
  </p>
  <h3 id="deploying-the-model">Deploying the Model</h3>
  <p>
    It’s finally time to deploy the model and see the payoff of all that hard
    work. This typically involves integrating the model into a separate
    application to make predictions or decisions based on new, previously unseen
    data. For this, ml5.js offers the convenience of a
    <code>save()</code> function to download the trained model to a file from
    one sketch and a <code>load()</code> function to load it for use in a
    completely different sketch. This saves you from having to retrain the model
    from scratch every single time you need it.
  </p>
  <p>
    While a model would typically be deployed to a different sketch from the one
    where it was trained, I’m going to deploy the model in the same sketch for
    the sake of simplicity. In fact, once the training process is complete, the
    resulting model is, in essence, already deployed in the current sketch. It’s
    saved in the <code>classifier</code> variable and can be used to make
    predictions by passing the model new data through the
    <code>classify()</code> method. The shape of the data sent to
    <code>classify()</code> should match that of the input data used in
    training—in this case, two floating-point numbers, representing the x- and
    y-components of a direction vector:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// Manually create a vector.
let direction = createVector(1, 0);
// Convert the x- and y-components into an input array.
let inputs = [direction.x, direction.y];
// Ask the model to classify the inputs.
classifier.classify(inputs, gotResults);</pre
  >
  <p>
    The second argument to <code>classify()</code> is another callback function
    for accessing the results:
  </p>
  <pre class="codesplit" data-code-language="javascript">
function gotResults(results) {
  console.log(results);
}</pre
  >
  <p>
    The model’s prediction arrives in the argument to the callback, which I’m
    calling <code>results</code> in the code. Inside, you’ll find an array of
    the possible labels, sorted by <strong>confidence</strong>, a probability
    value that the model assigns to each label. These probabilities represent
    how sure the model is of that particular prediction. They range from 0 to 1,
    with values closer to 1 indicating higher confidence and values near 0
    suggesting lower confidence:
  </p>
  <pre class="codesplit" data-code-language="json">
[
  {
    "label": "right",
    "confidence": 0.9669702649116516
  },
  {
    "label": "up",
    "confidence": 0.01878807507455349
  },
  {
    "label": "down",
    "confidence": 0.013948931358754635
  },
  {
    "label": "left",
    "confidence": 0.00029277068097144365
  }
]</pre
  >
  <p>
    In this example output, the model is highly confident (approximately 96.7
    percent) that the correct label is <code>"right"</code>, while it has
    minimal confidence (0.03 percent) in the <code>"left"</code> label. The
    confidence values are normalized and add up to 100 percent.
  </p>
  <p>
    All that remains now is to fill out the sketch with code so the model can
    receive live input from the mouse. The first step is to signal the
    completion of the training process so the user knows the model is ready.
    I’ll include a global <code>status</code> variable to track the training
    process and ultimately display the predicted label on the canvas. The
    variable is initialized to <code>"training"</code> but updated to
    <code>"ready"</code> through the <code>finishedTraining()</code> callback:
  </p>
  <pre class="codesplit" data-code-language="javascript">
// When the sketch starts, it will show a status of <code>training</code>.
let status = "training";

function draw() {
  background(255);
  textAlign(CENTER, CENTER);
  textSize(64);
  text(status, width / 2, height / 2);
}

// This is the callback for when training is complete, and the message changes to <code>ready</code>.
function finishedTraining() {
  status = "ready";
}</pre
  >
  <p>
    Finally, I’ll use p5.js’s mouse functions to build a vector while the mouse
    is being dragged and call <code>classifier.classify()</code> on that vector
    when the mouse is clicked.
  </p>
  <div data-type="example">
    <h3 id="example-102-gesture-classifier">
      Example 10.2: Gesture Classifier
    </h3>
    <figure>
      <div
        data-type="embed"
        data-p5-editor="https://editor.p5js.org/natureofcode/sketches/SbfSv_GhM"
        data-example-path="examples/10_nn/10_2_gesture_classifier"
      >
        <img src="examples/10_nn/10_2_gesture_classifier/screenshot.png" />
      </div>
      <figcaption></figcaption>
    </figure>
  </div>
  <pre class="codesplit" data-code-language="javascript">
// Store the start of a gesture when the mouse is pressed.
function mousePressed() {
  start = createVector(mouseX, mouseY);
}

// Update the end of a gesture as the mouse is dragged.
function mouseDragged() {
  end = createVector(mouseX, mouseY);
}

// The gesture is complete when the mouse is released.
function mouseReleased() {
  // Calculate and normalize a direction vector.
  let dir = p5.Vector.sub(end, start);
  dir.normalize();
  // Convert to an input array and classify.
  let inputs = [dir.x, dir.y];
  classifier.classify(inputs, gotResults);
}

// Store the resulting label in the <code>status</code> variable for showing in the canvas.
function gotResults(error, results) {
  status = results[0].label;
}</pre
  >
  <p>
    Since the <code>results</code> array is sorted by confidence, if I just want
    to use a single label as the prediction, I can access the first element of
    the array with <code>results[0].label</code>, as in the
    <code>gotResults()</code> function in Example 10.2. This label is passed to
    the <code>status</code> variable to be displayed on the canvas.
  </p>
  <div data-type="exercise">
    <h3 id="exercise-105">Exercise 10.5</h3>
    <p>
      Divide Example 10.2 into three sketches: one for collecting data, one for
      training, and one for deployment. Use the
      <code>ml5.neuralNetwork</code> functions <code>save()</code> and
      <code>load()</code> for saving and loading the model to and from a file,
      respectively.
    </p>
  </div>
  <div data-type="exercise">
    <h3 id="exercise-106">Exercise 10.6</h3>
    <p>
      Expand the gesture-recognition model to classify a sequence of vectors,
      capturing more accurately the path of a longer mouse movement. Remember,
      your input data must have a consistent shape, so you’ll have to decide how
      many vectors to use to represent a gesture and store no more and no less
      for each data point. While this approach can work, other machine learning
      models (such as recurrent neural networks) are specifically designed to
      handle sequential data and might offer more flexibility and potential
      accuracy.
    </p>
  </div>
  <div data-type="exercise">
    <h3 id="exercise-107">Exercise 10.7</h3>
    <p>
      One of the pretrained models in ml5.js is called <em>Handpose</em>. The
      input of the model is an image, and the prediction is a list of 21 key
      points—x- and y-positions, also known as <em>landmarks</em>—that describe
      a hand.
    </p>
    <figure>
      <img src="images/10_nn/10_nn_24.png" alt="" />
      <figcaption></figcaption>
    </figure>
    <p>
      Can you use the outputs of the <code>ml5.handpose()</code> model as the
      inputs to an <code>ml5.neuralNetwork()</code> and classify various hand
      gestures (like a thumbs-up or thumbs-down)? For hints, you can watch my
      <a href="https://thecodingtrain.com/pose-classifier"
        >video tutorial that walks you through this process for body poses in
        the machine learning track on the Coding Train website</a
      >.
    </p>
  </div>
  <div data-type="project">
    <h3 id="the-ecosystem-project-11">The Ecosystem Project</h3>
    <p>
      Incorporate machine learning into your ecosystem to enhance the behavior
      of creatures. How could classification or regression be applied?
    </p>
    <ul>
      <li>
        Can you classify the creatures of your ecosystem into multiple
        categories? What if you use an initial population as a training dataset,
        and as new creatures are born, the system classifies them according to
        their features? What are the inputs and outputs for your system?
      </li>
      <li>
        Can you use a regression to predict the life span of a creature based on
        its properties? Think about how size and speed affected the life span of
        the bloops from <a href="/genetic-algorithms#">Chapter 9</a>. Could you
        analyze how well the regression model’s predictions align with the
        actual outcomes?
      </li>
    </ul>
    <figure>
      <img src="images/10_nn/10_nn_25.png" alt="" />
      <figcaption></figcaption>
    </figure>
  </div>
  <p></p>
</section>