Birds, books, and matrices: a brief adventure in artificial intelligence and neural networks
Thomas Pietraho
Spring, 2018
I am an algebraist
Neural networks: major successes
Neural nets can recognize images
car ball bridge burrito
Current accuracy ≈ 95%
Neural nets can translate
Polish: mój poduszkowiec jest pełen węgorzy
English: my hovercraft is full of eels
Google’s version is very close to human translation for a number of languages. Not Chinese.
Neural nets can play games
Top ranked Go player defeated by AlphaGo (4-1). AlphaGo destroyed by AlphaGo Zero (100-0).
Image by Saran Poroong
Neural networks: minor successes
Neural nets can judge a book by its cover
history science romance sports
Problem: Predict book genre based on its cover.
Accuracy 76%.
with Parikshit Sharma, ’17, IndieBio
Neural nets can identify birds
cardinal wood duck anhinga chickadee
Problem: Predict species of bird based on image.
Accuracy 87%. (P., 2017)
american crow fish crow common raven
Neural nets can be useful to an algebraist?
From The Accountant
What are neural nets?
Neural nets are functions
x → y
2 → 4
3 → 9
4 → 16
Input: 1.00 4.98 7.21 9.89 1.01 2.30
Output (regression): 3.72 2.67 22.01 1.92 3.70
Output (classification): 0 0 0 1 0
In this form, neural nets can carry out
• regression, or
• classification
Image courtesy of JD Cruzan
Neural nets are made up of “neurons”
Two parameters: laziness and loudness.
These specify a neuron’s activation function.
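As a rough illustration of a neuron with two knobs, the sketch below assumes that "laziness" is a threshold the input must exceed before the neuron responds, and that "loudness" scales how strongly it responds once it does. Both readings, the function name `neuron`, and the rectified-linear shape are assumptions made for illustration, not definitions taken from the talk.

```python
def neuron(x, laziness, loudness):
    """Hypothetical neuron: silent until x exceeds `laziness`, then responds
    in proportion to the excess, scaled by `loudness`."""
    excess = x - laziness
    return loudness * excess if excess > 0 else 0.0


# A lazy, quiet neuron ignores a small input; an eager, loud one amplifies it.
quiet = neuron(1.0, laziness=2.0, loudness=0.5)
loud = neuron(1.0, laziness=0.0, loudness=3.0)
print(quiet, loud)
```

Changing either parameter changes the shape of the response curve, which is the sense in which the two numbers pin down the activation function in this sketch.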
Neural nets are networks of neurons
output
input
Neural nets are universal
Theorem (G. Cybenko, 1989)
Every continuous function can be approximated arbitrarily well by a neural network.
Examples of functions: image classification, language translation, etc.
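Stated a little more carefully, the result is an approximation theorem. The following is a standard formulation for networks with one hidden layer; the symbols $\sigma$, $\alpha_j$, $w_j$, $\theta_j$, and $I_n$ are notation introduced here rather than taken from the slides.

```latex
Let $\sigma$ be a continuous sigmoidal function, i.e. $\sigma(t) \to 1$ as
$t \to \infty$ and $\sigma(t) \to 0$ as $t \to -\infty$. For every continuous
$f \colon I_n = [0,1]^n \to \mathbb{R}$ and every $\varepsilon > 0$ there exist
$N \in \mathbb{N}$, $\alpha_j, \theta_j \in \mathbb{R}$, and
$w_j \in \mathbb{R}^n$ such that
\[
G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left(w_j^{\mathsf T} x + \theta_j\right)
\quad\text{satisfies}\quad
|G(x) - f(x)| < \varepsilon \ \text{ for all } x \in I_n .
\]
```

The theorem guarantees that suitable parameters exist; it says nothing about how to find them.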
Question: Why no self-driving cars in the 1990s?
Learning with neural nets
Procedure:
• assemble a neural network (craft)
• adjust laziness and loudness for each neuron (math)
• measure error based on a sample of data and repeat (fast processors)
Advances in all three parts of this process are responsible for the machine learning revolution since 2012.
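The three steps of the procedure can be sketched for the smallest possible network: a single neuron computing `w * x + b`, tuned by gradient descent on the mean squared error. The target line y = 3x - 1, the learning rate, and the step count are illustrative assumptions; real networks have many more parameters and rely on automatic differentiation rather than hand-written gradients.

```python
def train(samples, steps=2000, lr=0.05):
    """Fit y ≈ w * x + b to (x, y) samples by plain gradient descent."""
    w, b = 0.0, 0.0                      # step 1: assemble the (tiny) network
    n = len(samples)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in samples:             # step 3: measure error on the sample
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / n             # step 2: adjust the parameters
        b -= lr * grad_b / n
    return w, b


# Hypothetical training data drawn from the line y = 3x - 1.
samples = [(x, 3.0 * x - 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
w, b = train(samples)
print(w, b)
```

Each pass through the loop is one round of "measure error and repeat"; the printed parameters should end up close to 3 and -1.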
Image courtesy of Kaiming He
We don’t completely understand why neural nets work
Image courtesy of Elsayed et al.
A problem in algebra
Matrix multiplication
\[
\begin{pmatrix} 0.98 & 0.23 & 0.12 \\ 0.12 & 0.34 & 0.67 \\ 0.11 & 0.54 & 0.18 \end{pmatrix}
\cdot
\begin{pmatrix} 0.56 & 0.09 & 0.10 \\ 0.99 & 0.45 & 0.41 \\ 0.39 & 0.02 & 0.11 \end{pmatrix}
=
\begin{pmatrix} 0.82 & 0.19 & 0.21 \\ 0.67 & 0.18 & 0.23 \\ 0.67 & 0.26 & 0.25 \end{pmatrix}
\]
This process is a mess, involving lots of ordinary addition and multiplication. But it is an important mess.
Goal: minimize number of ordinary multiplications: “rank”

matrix size     rank
2 × 2           8 → 7 (Strassen, 1969)
3 × 3           27 → 23 (Laderman, 1976)
4 × 4           64 → 49 (Strassen, 1969) → 48 (Stothers, 2012)
1000 × 1000     10^9 → 264M (Strassen, 1969) → 238M (Stothers, 2012)
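To make the counting concrete, the sketch below multiplies matrices the schoolbook way while tallying ordinary multiplications, and then carries out the 2 × 2 product with Strassen's seven products. The helper names `schoolbook` and `strassen_2x2` are hypothetical; the seven products follow the standard presentation of Strassen's construction.

```python
def schoolbook(A, B):
    """Multiply square matrices the usual way, counting scalar multiplications."""
    n = len(A)
    count = 0
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
                count += 1
    return C, count


def strassen_2x2(A, B):
    """Multiply 2 x 2 matrices using 7 scalar multiplications instead of 8."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
product, multiplications = schoolbook(A, B)
print(product, multiplications)
print(strassen_2x2(A, B))
```

The saving of one multiplication looks small, but it compounds when the scheme is applied recursively to blocks of a large matrix.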
A little insight
A neural network can model matrix multiplication:
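\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\cdot
\begin{pmatrix} e & f \\ g & h \end{pmatrix}
=
\begin{pmatrix} i & j \\ k & l \end{pmatrix}
\]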
Question: can our methods learn this network?
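One way to picture such a network, offered as an assumption rather than as the architecture used in the talk: a single hidden layer in which each unit multiplies a linear combination of the first matrix's entries by a linear combination of the second's, followed by a linear output layer. The number of hidden units then equals the number of ordinary multiplications. The names `bilinear_net`, `schoolbook_weights`, `U`, `V`, and `W` are hypothetical, and the weights below simply encode the schoolbook algorithm with 8 hidden units.

```python
def bilinear_net(U, V, W, a, b):
    """Forward pass: hidden unit r computes (U[r]·a) * (V[r]·b);
    each output is a linear combination W[o]·hidden."""
    hidden = [
        sum(u * x for u, x in zip(U[r], a)) * sum(v * y for v, y in zip(V[r], b))
        for r in range(len(U))
    ]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W]


def schoolbook_weights(n=2):
    """Weights with n**3 hidden units: one unit per product A[i][k] * B[k][j]."""
    U, V = [], []
    W = [[0] * n**3 for _ in range(n * n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                r = len(U)  # index of the hidden unit being added
                U.append([1 if p == i * n + k else 0 for p in range(n * n)])
                V.append([1 if p == k * n + j else 0 for p in range(n * n)])
                W[i * n + j][r] = 1
    return U, V, W


U, V, W = schoolbook_weights()
first = [1, 2, 3, 4]    # entries a, b, c, d read row by row
second = [5, 6, 7, 8]   # entries e, f, g, h read row by row
print(bilinear_net(U, V, W, first, second))  # entries i, j, k, l
```

In this picture, asking whether the product can be learned with fewer multiplications amounts to asking whether a network of the same shape with fewer hidden units can be trained to zero error.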
Will it learn?
Thanks: Dj and HPC
[Figure: error over learning time]

matrix size     rank
2 × 2           8 X   7 X   6 X
2 × 3           11 X  10 X
3 × 2           15 X  14 X
3 × 3           23 X  22 X
4 × 4           49 X  48 X  47 X  46 X  45 X  44 X

Upshot: This result reduces the computational cost for 1000 × 1000 matrix multiplication from 238M to 172M ordinary multiplications!
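The 1000 × 1000 figures can be reproduced approximately under one assumption: that a scheme using r multiplications for 4 × 4 matrices is applied recursively to blocks, so an n × n product costs about n^(log_4 r) multiplications. The function name `recursive_cost` is hypothetical, and since 1000 is not a power of 4 the result is only an estimate.

```python
import math


def recursive_cost(n, block, rank):
    """Estimated scalar multiplications for an n x n product when a scheme with
    `rank` multiplications for block x block matrices is applied recursively."""
    return n ** math.log(rank, block)


# Print the estimate, in millions, for several 4 x 4 ranks.
for rank in (64, 49, 48, 45):
    millions = recursive_cost(1000, 4, rank) / 1e6
    print(f"rank {rank}: about {millions:.0f}M multiplications")
```

Under this assumption, ranks 49 and 48 give roughly 264M and 238M, and rank 45 gives a little under 173M, which is consistent with the 172M quoted in the upshot.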
I am an algebraist
Luckily (for algebraists), the neural network solution is only an approximation.
Question: how can one obtain an exact solution?
Hint: algebra