Index of subjects
Neural networks, Artificial Intelligence, Machine learning
If:
you are an experienced Python programmer and/or AI scientist, you probably don't need this page
else:
This may be your first step to becoming a nerd (you also need pizza and a lot of coffee).
Why do I do this? In the early sixties I had a short-lived career in the rock 'n' roll scene (hence my interest in all things guitar). After that I got a serious job and by chance rolled into IT, becoming a real-time machine language programmer on (for that time) huge and advanced message switching computers. That was extreme fun.
My interest in guitar as well as IT developments stayed with me all my life.
We are entering a new era of information technology, driven by Artificial Intelligence. So I decided to pick up that old interest again and educate myself on AI and on the Python language, which is popular in AI circles.
I hope to spark some interest with some of you, as especially in Holland education in this field is lagging behind (some universities now actually have a numerus clausus, the ultimate stupidity).
AI (or ML, or deep learning, take your pick) is about software structures based on/emulating the human brain, learning either from experience and/or from large datasets.
So AI learns by itself, instead of being preprogrammed for a particular purpose by fallible humans.
We have already seen that by feeding biased data, AI can learn the wrong things. Like teaching a kid only stuff from Fox news.
NOTE: I AM USING WINDOWS 10, PYTHON VERSIONS 3.6 AND 3.7 WITH NUMPY; SO THAT'S WHAT YOU NEED (SEE WWW.PYTHON.ORG). Easy to install.
LINUX (UBUNTU) WILL WORK AS WELL IF NOT BETTER, BUT I HAVE CHOSEN WINDOWS 10 FOR PRACTICAL REASONS: laziness.
The mythical NAND gate and why mention it here
The NAND gate is an interesting starting point and easy to use as a simple exercise to emulate a single artificial neuron.
Also it is the basic building block of THE universal computer.
Available as simple electronic integrated circuits, with 4 or more gates on a chip (you can buy them cheaply, for example the TI 7401 chips).
A NAND (NOT-AND) gate (the chip) will output a ZERO (zero volts) only if both inputs are ONE (5 volts); in all other cases it outputs a ONE.
Look it up; there are also AND, OR, XOR and NOT gates, doing other things with ones and zeros.
With multiple NAND gates we can perform binary additions, compare binary numbers, and make flip-flops as memory cells and thus registers.
The building blocks of a computer are memory, registers, arithmetic units (AU), comparators and counters, which can all be constructed from NAND gates; therefore a whole computer can be constructed using NAND gates.
You would need a lot of them, but it's definitely doable and it has actually been done.
If you are interested, Google the TI 7401 chips, containing 4 NAND gates in silicon, 14-pin DIL, still generally available.
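Before wiring NAND into a neuron, it helps to see how little logic a NAND gate actually is. A plain Python sketch (the function name nand_logic is mine, just for illustration):

```python
# plain logical NAND: the output is 0 only when both inputs are 1
def nand_logic(a, b):
    return 0 if (a == 1 and b == 1) else 1

# print the full truth table
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", nand_logic(a, b))
```

Later we will get exactly the same table out of a perceptron.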
An artificial software perceptron/neuron with two inputs.

This is to get into some of the very basic basics to get you started with the AI concepts as well as with Python to play around with, before we get into the real thing.
A neuron (or perceptron) will take inputs and produce a particular output. This happens all the time in your brain, but here we will create the software model of a neuron/perceptron.
By applying WEIGHTS to the inputs, we can make a particular input more or less important than another; and by adding a so-called BIAS (number) we can make the neuron easier or less easy to trigger.
After summing the products of the inputs and weights we apply a TRIGGER (also called ACTIVATION FUNCTION) to the result, being the function that lets the neuron fire or remain quiet.
(important summary) The functions of the perceptron are thus:
· Input the x1 … xn values (variables or array)
· Do the summing algorithm with weights w1 … wn and bias b: w1 * x1 + w2 * x2 + ... + wn * xn + b
· Perform the trigger or activation function on the result of the sum
· Output the end result to the world
At this stage we use a simple binary activation: if the resulting sum is <= 0 then the output becomes 0, else if the sum > 0 the output = 1 (variables or array).
In the model in this chapter, the values of weight and bias are not preset, so you can still play around with various values!
X1 to xn are the input values, and output is the value we are trying to achieve, using the weights and bias. Here we use only two inputs, x1 and x2. Start with bias 3, weights -2 and -2, and two inputs x1 and x2 (use 1 or 0). These values will let the neuron behave as a NAND gate.
In Python speak this becomes:
print("defining a perceptron")
x1 = int(input("number x1: "))
x2 = int(input("number x2: "))
w1 = int(input("number weight 1: "))
w2 = int(input("number weight 2: "))
b = int(input("number for bias: "))
output = (x1*w1) + (x2*w2) + b
if output > 0:
    output = 1
else:
    output = 0
print(output)
You can paste the above into IDLE (the Python shell/editor), save it with a .py extension in the directory where python.exe lives and run it under Python.
(Alarm!) Python can get very angry if you misuse indenting; and don't forget brackets!
By all means try out various weights and biases; nothing gets damaged, except your ego perhaps.
Use 0 or 1 for the inputs; we are still talking binary.
Here is a screenshot of what actually happens.
As expected, with x1 and x2 being 1 the result is 0; that's what a NAND gate does.
Try some other x1, x2, weight and bias input values. Just to pass the time.
Perceptron/neuron weighted (hardcoded) to function as a NAND gate, defined as a Python function:
This is the same program as in the previous chapter, but as we want to use the NAND gate repeatedly in the next chapters, it's useful to define it as a Python function.
A function can be called repeatedly with input parameters, instead of retyping the same code all the time.
For our purpose this particular perceptron is parameterized (hard coded) to work as a NAND gate, now with only x1 and x2 as variable input.
Copy the program lines below into IDLE (the handy Python shell), save as e.g. nandfunction.py or some name you fancy, in the folder where python.exe lives, start the Windows command shell, cd to that folder and run it.
Now do it:
def nand(x1, x2):
    print("defining a perceptron NAND as function")
    # NAND defined by weights -2 and bias 3
    w1 = -2
    w2 = -2
    b = 3
    output = (x1*w1) + (x2*w2) + b
    if output > 0:
        output = 1
    else:
        output = 0
    return output

print("nand is now a function")
x1 = int(input("number 1 or 0: "))
x2 = int(input("number 1 or 0: "))
print("Output NAND is: ", nand(x1, x2))
This is the result in the command shell:
Try the whole NAND truth table to see if it works.
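Instead of typing the inputs by hand you can loop over the whole truth table in one go. Note the weights are -2 and -2 with bias 3: with those values the sum only stays above zero when at least one input is 0, which is exactly NAND (this loop version is my own sketch):

```python
# the same NAND perceptron, weights -2 and -2, bias 3
def nand(x1, x2):
    output = (x1 * -2) + (x2 * -2) + 3
    return 1 if output > 0 else 0

# run all four input combinations at once
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", nand(x1, x2))
```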
Now a program using the NAND neural function to emulate a 1-bit binary adder:
If you have seen enough of the NAND gates, you may skip to the Neural Network section (the real thing).
The previous chapter was just about one simple single NAND gate.
The next step is to implement a binary adder for two one-bit numbers plus a carry from a previous adder.
You need 9 NAND perceptrons in total for the sum and the carry bit (o1 up to o7, o and cout in the code below).
The output is 2 bits: 1 sum bit plus a carry.
The binary truth table for this is then:
0 + 0 = 0 carry 0 result shows as 0 0 (decimal 0)
0 + 1 = 1 carry 0 result shows as 0 1 (decimal 1)
1 + 1 = 0 carry 1 result shows as 1 0 (decimal 2)
If there was a carry from a previous stage, the end result for 1 + 1 would be 11 (3).
Without further ado I have defined the add1bit as a function, which calls the nand function multiple times.
The program structure is simple: a. define the nand function, b. define an add1bit function, then c. (main) the code lines to input the values you want to add and run the add1bit function, which in turn runs the nand function, then d. output the result. The lines under c. are what other languages like C or Pascal would call 'main'.
The program lines are copied directly from IDLE; you can copy/paste them back through IDLE or paste them directly into a WordPad file and save with extension .py.
Save in the location where python.exe lives and run through the Windows command shell. For Linux the principle is the same.
The variables o1 up to o7, cin, cout and o refer to the NAND gates (i.e. neurons!) in the diagram, so you can see what happens.
# using the nand function 9 times to create an AU
# adding two one bit binary numbers, carry in and carry out
def nand(a, b):
    w1 = -2
    w2 = -2
    bias = 3
    out = (a*w1) + (b*w2) + bias
    if out > 0:
        out = 1
    else:
        out = 0
    return out

def add1bit(a, b, cin):
    o1 = nand(a, b)
    o2 = nand(a, o1)
    o3 = nand(b, o1)
    o4 = nand(o2, o3)
    o5 = nand(cin, o4)
    o6 = nand(cin, o5)
    o7 = nand(o4, o5)
    o = nand(o6, o7)
    cout = nand(o1, o5)
    return cout, o

# input binary numbers to be added
print("Enter two 1-bit binary numbers")
a0 = int(input("number 1 or 0: "))
b0 = int(input("number 1 or 0: "))
cin = int(input("carry in, number 1 or 0: "))
# now add 'm up
result = add1bit(a0, b0, cin)
print("Result now is carry - sum: ", result)
Here is the screendump of the process.
Adding two two-bit binary numbers
I got carried away slightly, so skip to the neural network chapters if this is enough for you.
The diagram below is a 4-bit adder. As we are doing this as an exercise we only implement a two-bit adder in Python. Doing a full 4-bit adder is more of the same.
This is the embryonic beginning of a real computer.
An AU (arithmetic unit) is used for several purposes, such as adding numbers or functioning as the program counter (which points to the address of the next instruction to be fetched from memory and executed).
Once this is done, adding 4- or 8-bit numbers is just more of the same (won't do that here, promise).
The truth table for addition of two twobit numbers is:
00 + 00 = 00
01 + 00 = 01
01 + 01 = 10
10 + 01 = 11
10 + 10 = 00 plus carry 1 (100)
11 + 01 = 00 plus carry 1 (100)
11 + 10 = 01 plus carry 1 (101)
11 + 11 = 10 plus carry 1 (110)
Output is 2 bits plus carry
Values change when the carry in from a previous stage is set. We use the nand function defined above again as the building block.
Here it is in Python speak:
# a multi-bit binary number adder using the nand function as basic block
# each two-bit-plus-carry adder will be used as a function
# numbers represented by a1 a0 and b1 b0, c is carry
def nand(a, b):
    w1 = -2
    w2 = -2
    bias = 3
    out = (a*w1) + (b*w2) + bias
    if out > 0:
        out = 1
    else:
        out = 0
    return out

def add1bit(a, b, cin):
    o1 = nand(a, b)
    o2 = nand(a, o1)
    o3 = nand(b, o1)
    o4 = nand(o2, o3)
    o5 = nand(cin, o4)
    o6 = nand(cin, o5)
    o7 = nand(o4, o5)
    o = nand(o6, o7)
    cout = nand(o1, o5)
    return cout, o

def add2bits(a1, a0, b1, b0, cin):
    # first stage: add the low bits
    cout, o0 = add1bit(a0, b0, cin)
    # second stage: add the high bits plus the carry of the first stage
    cin = cout
    cout, o1 = add1bit(a1, b1, cin)
    return o1, o0, cout

# input binary numbers to be added
print("Enter two 2-bit binary numbers")
a1 = int(input("a1 number 1 or 0: "))
a0 = int(input("a0 number 1 or 0: "))
b1 = int(input("b1 number 1 or 0: "))
b0 = int(input("b0 number 1 or 0: "))
cin = int(input("carry in, number 1 or 0: "))
# now add 'm up
bit1, bit0, carry = add2bits(a1, a0, b1, b0, cin)
print("carry: ", carry, "bit1: ", bit1, "bit0: ", bit0)
Below you see what happens in the command shell.
The result shows binary 111, which is decimal 7, as we added 11 (3) + 11 (3) + 1 = 111 (7)
A flip-flop memory latch from NAND gates: utter madness
Just for the sake of utter madness you could attempt to create a memory latch (or flip-flop) for one bit with neurons.
Obviously totally bonkers, as we use an Intel i7 desktop, Windows 10 and Python to emulate a single $1 chip, to create a one-bit memory.
I will skip this for now as we are ready to move to more intelligent neurons.
Perhaps nice as an exercise in Python programming sometime, as the flip-flop has an extra issue: it's clocked.
Read the outputs of the diagram as: red LED is 1 and green LED is 0. The NOT gate is easy: use the NAND gate and short-circuit the inputs (works in Python as well).
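For the curious, here is what that exercise could look like for the unclocked version (the plain SR latch): two cross-coupled NAND perceptrons, with the outputs fed back as inputs until they settle. The sr_latch function and its settle loop are my own sketch, not the clocked flip-flop of the diagram; s and r are active-low (0 means 'do it').

```python
# the NAND perceptron from the earlier chapters
def nand(a, b):
    out = (a * -2) + (b * -2) + 3
    return 1 if out > 0 else 0

# two cross-coupled NANDs; feed the outputs back until they settle
def sr_latch(s, r, q=0, qbar=1):
    for _ in range(4):
        q = nand(s, qbar)
        qbar = nand(r, q)
    return q, qbar

print(sr_latch(0, 1))  # set: q becomes 1 -> (1, 0)
print(sr_latch(1, 0))  # reset: q becomes 0 -> (0, 1)
```

With s and r both 1 the latch simply holds whatever q already was: that is the one bit of memory.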
Sigmoid neurons, the first derivative and matrix multiplication: don't panic.
YOU CAN SKIP THIS SECTION AND COME BACK WHEN YOU THINK, OY, WHATSTHIS?!
Some mathematical shit here which is good to have heard of.
The issue with the activation function so far is that it flips hard between 0 and 1 and can become unstable.
The difference between perceptrons and sigmoid neurons is that sigmoid neurons don't just output 1 or 0. They can output any real number between 0 and 1, so values such as 0.486654 and 0.9988321 are legitimate outputs, as we will see later.
The sigmoid function is easily built in Python with the numpy toolset.
In Python speak: if you use the 'as' option you can refer to numpy as 'np', a lot shorter. Like this:

import numpy as np

def sigmoid(z):
    return 1.0/(1.0 + np.exp(-z))
I won't go into the deep end with Euler's number and more like that. Look it up; there is lots of stuff on the internet.
In order to play around with AI you only have to know how to import numpy and how to call the function.
Rather than a very steep transition between 0 and 1, the sigmoid will give a sliding (non-linear) scale of numbers, for example:
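For instance, a quick sketch of that sliding scale, using the sigmoid function defined above:

```python
import numpy as np

# the sigmoid from above
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# a few points on the sliding scale; sigmoid(0) is exactly 0.5
for z in (-4, -1, 0, 1, 4):
    print(z, "->", round(float(sigmoid(z)), 4))
```

Large negative inputs land close to 0, large positive inputs close to 1, and everything in between slides smoothly.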
Derivative
One more important concept you have to be aware of is the derivative (in Dutch: afgeleide). Sorry about that.
You only have to be aware of what its use is; understanding it is another matter.
The first derivative of a function gives the slope of the tangent line at a particular point of the original function. If the derivative is negative the curve slopes downward at that point, if positive it slopes upward; its magnitude tells you how steep the curve is at that particular point of the graph.
It is used to check whether we should adjust parameters up or down when training a neural network. This is an oversimplification, so by all means check on the web.
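For the sigmoid there is a handy shortcut we will use later: its derivative is simply sigmoid(z) * (1 - sigmoid(z)). A small sketch to check that claim numerically (the helper names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# the shortcut: sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
def sigmoid_deriv(z):
    s = sigmoid(z)
    return s * (1 - s)

# compare with a finite-difference approximation at z = 0.7
z, h = 0.7, 1e-6
approx = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
print(sigmoid_deriv(z), approx)  # the two values agree to many digits
```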
Matrices and multiplying matrices
In the NAND gate examples I coded each neuron as a function.
In neural networks a more efficient way is to use matrices and matrix multiplication.
Neurons and synapses (the weights) in a network are represented by matrices. For our purpose here the matrices are stored in numpy arrays, and the multiplication etc. is done in one fell swoop for all neurons using numpy dot multiplication of arrays (= matrices). There is a load of stuff on the internet on the subject, but here is a small summary so you can understand the code lines that follow.
array1([[0, 0, 1],
        [0, 1, 1],
        [1, 0, 1],
        [1, 1, 1]])
Matrix 4 x 3, i.e. 4 rows, 3 columns

array2([[2],
        [2],
        [2]])
Matrix 3 x 1, i.e. 3 rows, 1 column

When we multiply these two matrices the result will be array3:

array3([[2],
        [4],
        [4],
        [6]])
The multiplication gives a matrix 4 x 1, i.e. 4 rows, 1 column
The rule in general is :
· The number of columns of the 1st matrix must equal the number of rows of the 2nd matrix.
· And the result will have the same number of rows as the 1st matrix, and the same number of columns as the 2nd matrix.
Don't worry too much. It's nice if you can understand it; numpy will solve the whole thing for you, and fast.
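Here is the example above in numpy, so you can check the rule and the result for yourself:

```python
import numpy as np

# the 4 x 3 matrix from above
a1 = np.array([[0, 0, 1],
               [0, 1, 1],
               [1, 0, 1],
               [1, 1, 1]])
# the 3 x 1 matrix of twos
a2 = np.array([[2], [2], [2]])

# numpy does the whole multiplication in one fell swoop
a3 = np.dot(a1, a2)
print(a3)        # the column 2, 4, 4, 6
print(a3.shape)  # (4, 1): 4 rows, 1 column
```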
A 2-layer neural network to play with
I first have to apologize to all those clever people (PhDs, professors and the like) who have also published on this subject. I've reused (uh, loaned?) ideas and Python solutions and after a good deal of mixing and shuffling came up with the easy examples below.
I hope I added some value by placing it in the context and sequence of this, uhh, tutorial and making it accessible to morons like myself.
What I also noticed is that there is a naming convention issue here; you will see a 2-layer neural network also referred to as single layer. I see a network as multiple connected things, so a minimum useful network will consist of TWO LAYERS OF NEURONS.
Left is a TWO-LAYER network, consisting of neurons forming the input layer, and in this case a single neuron as OUTPUT LAYER, taking the synapses from the input layer and doing its thing with those.
At the right a three-layer network, with again an input layer and an output layer, but in between a so-called hidden layer that does its thing with the input layer synapses.
A THREE-layer network can do cleverer things than a two-layer network. Remember each neuron consists of the functions described above.
OK, at this stage I went back to my Python 2-layer neural network to clean it up for publication here.
In the program below we will work with a simple two layer network, see the picture:
The program will do the following, conforming to the picture:
· Triple-neuron input layer with values
· Synapses (weights) as an array between the input neurons and the output neuron
· The summing algorithm for the synapses will do the sigmoid activation and generate layer 1, which is the output
· The output will be compared with a separate array and the difference (the error) is used to change the weights
· The cycle will be repeated a large number of times, and then the last layer1 array is printed, which will closely match the target (we hope)
Here is the program. Have a look, and mind the indents when you copy it; Python is strict about those.
import numpy as np

# sigmoid function - deriv flag indicates to yield sigmoid or its derivative
def nonlin(x, deriv=False):
    if deriv == True:
        return x*(1-x)
    else:
        return 1/(1+np.exp(-x))

# input dataset
x = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]])
# output dataset that we try to achieve
y = np.array([[0,0,1,1]]).T
# seed random numbers to make the calculation deterministic
np.random.seed(1)
# initialize weights randomly with mean 0
syn0 = 2*np.random.random((3,1)) - 1
# set number of training iterations
for iter in range(10000):
    # forward propagation, do the sums with the weights
    l0 = x
    l1 = nonlin(np.dot(l0, syn0))
    # compare result with wanted output
    l1_error = y - l1
    # multiply how much we missed by the
    # slope = derivative of the sigmoid at the values in l1
    l1_delta = l1_error * nonlin(l1, True)
    # update weights
    syn0 += np.dot(l0.T, l1_delta)

print("Output After Training:")
print(l1)
print("The actual synapse/weights for this result: ")
print(syn0)
So run the program in the usual way from the command shell. After some thinking (10,000 iterations!) the result will show the following.
Remember the output we wanted the program to learn was [ 0, 0, 1, 1]
The result is [0.009], [0.007], [0.993], [0.992] and this result was obtained with the synapse containing the weights:
[9.672], [-0.207], [-4.629], give or take a few more digits.
So what has happened? The program has taught itself to recognize the pattern of bits in the input x array as 0011. So if the input x was a picture (bit pattern) of a three, the program would now have learned to recognize this as 0011, binary 3.
So2: if we now take the synapse (weights) and use them in a much more simplified non-learning program to read the same x array, it should yield the same output.
So3: what we have achieved then is that we created a (very simple) neural network which recognizes this pattern as a 3.
Think about that. That's amazing. At least, the first time I did this I found it amazing.
I will demonstrate with the following tiny program (the non-learning version of the previous one) that indeed this particular x array, combined with these weights, will be recognised as representing a three.
Note that the actual result is not precisely 0011, but the zeros come out as numbers very close to 0, while the ones come out distinctly close to 1.
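If you want to read such an output as clean bits, you can simply threshold it at 0.5; a small sketch, with numbers like the ones the network produced:

```python
import numpy as np

# output values like the ones after training
l1 = np.array([[0.009], [0.007], [0.993], [0.992]])

# everything above 0.5 counts as a one, the rest as zero
bits = (l1 > 0.5).astype(int)
print(bits.ravel())  # [0 0 1 1]
```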
You will find that when screwing around with the input arrays x and y the result sometimes becomes unstable; a lot of combinations however are very clear and consistent.
Here is the little program that takes the input array x and applies the weights we found above, in order to demonstrate that without further learning the result will be the same 0011.
print("This program uses the weights of a training cycle;")
print("then applies these to the input matrix;")
print("which should then result in the target output dataset")

import numpy as np

# sigmoid function or derivative
def nonlin(x, deriv=False):
    if deriv == True:
        return x*(1-x)
    else:
        return 1/(1+np.exp(-x))

# input dataset matrix 4x3
x = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]])
# output dataset 4x1 to be learned
y = np.array([[0,0,1,1]]).T
# initialize weights with the learned weights
syn0 = np.array([[9.67299303], [-0.2078435], [-4.62963669]])
# forward propagation, generate output layer l1
l0 = x
l1 = nonlin(np.dot(l0, syn0))

print("input dataset x")
print(x)
print("target y was")
print(y)
print("Output Layer After Training:")
print(l1)
print("final synapse, for this x/y combination")
print(syn0)
Load and run in the same way as the previous examples. The output layer l1 clearly represents 0011.
Sufficient stuff here to play around with. For example, try shortening the weights to e.g. -4.62 and so on. It will work just fine.
Test by changing the input array; this will result in a non-match!
The screendump of the command shell now will look as follows:
A 3-layer network, going deeper
We have done the simple stuff, which should give you an idea of what a neuron can do and some background on the math used.
We've seen that the layers are represented by l0 (the input) with syn0 (the weights), and l1 for the output.
So basically if we add the following:
syn1 and l2
we would have a 3-layer network with l0 as input layer, l1 as the so-called hidden layer and l2 as output.
A 3 layer network can solve more complex things because of the extra step in between.
First let's look at the picture (I like pictures): note that I have only drawn 4 neurons in the hidden layer and 4 in the output. The program will have 5 input neurons, 32 neurons in the hidden layer and 1 in the output. The principle is the same.
Program-wise the expansion from a two-layer to a three-layer network is not extremely difficult if we understand the 2-layer version. It's about adding the extra arrays and the syn1 handling.
The program as shown below will have the following matrices during its runtime:
(Matrix is shown as rows x columns)
1. Input l0 (= X): 6 x 5 matrix
2. syn0: 5 x 32 (5 x hiddenSize)
3. Layer l1 is the result of (l0 . syn0): 6 x 32
4. syn1: 32 x 1
5. Output l2 is the result of (l1 . syn1): 6 x 1
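That chain of shapes is easy to check with numpy before running the real thing; a sketch with dummy all-zero matrices (hiddenSize 32, as in the program):

```python
import numpy as np

hiddenSize = 32
X = np.zeros((6, 5))              # input l0: 6 x 5
syn0 = np.zeros((5, hiddenSize))  # syn0: 5 x hiddenSize
syn1 = np.zeros((hiddenSize, 1))  # syn1: hiddenSize x 1

# columns of the first matrix match rows of the second, so both dots work
l1 = np.dot(X, syn0)
l2 = np.dot(l1, syn1)
print(l1.shape, l2.shape)  # (6, 32) (6, 1)
```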
Once you understand this and get it to run, it's not very difficult to use a larger input matrix, e.g. 12 x 10, and see what happens.
But first the program with a 6 by 5 matrix:
# 3 Layer Neural Network:
# with variable hidden layer size
import numpy as np

hiddenSize = 32

# sigmoid and derivative function
def nonlin(x, deriv=False):
    if deriv == True:
        return x*(1-x)
    else:
        return 1/(1+np.exp(-x))

# input dataset format 6 rows x 5 columns
X = np.array([[0,1,1,0,0],
              [1,0,0,1,0],
              [0,0,1,0,0],
              [0,1,0,0,0],
              [1,0,0,0,0],
              [1,1,1,1,1]])
# output dataset 6 rows x 1 column
y = np.array([[0],[0],[0],[0],[1],[0]])
# seed random numbers
np.random.seed(1)
# randomly initialize our weights with mean 0
syn0 = 2*np.random.random((5,hiddenSize)) - 1
syn1 = 2*np.random.random((hiddenSize,1)) - 1
# now learn
for j in range(60000):
    # Feed forward through layers 0, 1, and 2
    l0 = X
    l1 = nonlin(np.dot(l0,syn0))
    l2 = nonlin(np.dot(l1,syn1))
    # how much did we miss the target value?
    l2_error = y - l2
    # if (j % 10000) == 0:
    #     print("Error:" + str(np.mean(np.abs(l2_error))))
    # in what direction is the target value?
    l2_delta = l2_error*nonlin(l2,deriv=True)
    # how much did each l1 value contribute to the l2 error (according to the weights)?
    l1_error = l2_delta.dot(syn1.T)
    # in what direction is the target l1?
    # were we really sure? if so, don't change too much.
    l1_delta = l1_error * nonlin(l1,deriv=True)
    syn1 += l1.T.dot(l2_delta)
    syn0 += l0.T.dot(l1_delta)

# to use the learned weights separately we need to save syn0 and syn1
print("input matrix is: ")
print(X)
print("output after training: ")
print(l2)
print("output needed was")
print(y)
The input matrix vaguely resembles a digit 2 (as far as possible in a 6x5 matrix); the output y to be achieved I have set as 000010, which would be binary 2; it can also be seen as the second bit from the right (I will come back to that issue later).
Now all the program has to do is recognize the input matrix as a 2; let's see what the command shell says:
Hm .....
These numbers may seem a bit mysterious, but they are in scientific notation.
This is m * 10 to the power of n. If n is a negative number, the value is actually m divided by 10 to the power of n (without the sign).
So the more negative n is, the smaller the number, QED.
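Python understands this notation directly, so you can check it for yourself:

```python
# scientific notation: m times 10 to the power n
print(2e-3)     # prints 0.002, i.e. 2 divided by 10 to the power 3
print(9.5e-01)  # prints 0.95
print(1.2e-05)  # equal to 0.000012 (Python keeps the e-notation here)
```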
One number is clearly a lot larger than the others by virtue of its exponent being e-01; the second largest is e-03.
That's a lot smaller.
So yes, eureka, indeed, the program has recognized the input quasi-picture of a 2 and linked it to the number 2.
Now play around with different input and output matrices and see where the program will fail. I would set the number of iterations at 60,000 or less (try it); that's a lot faster.
You can also play with hiddenSize and see where the program dies.
The following screendump I did with a digit 6 (of sorts) and two different ways of output: the first one where the 6th bit corresponds with the input number 6, the second one with binary six (110) corresponding with the input number.
In general the method with one bit works best. You will also see that the quality of the input determines a correct output, try it.
In the case of this input matrix both versions yielded a correct output; more study is needed to find out why and whether this can be made consistent.
I will make a version with input matrix 10 x 8 and output 10 x 1, and do the full set of digits from 0 to 9, with a single bit out of 10 as output, where the 0 (zero) should give bit 10. Probably it's best to raise the hiddenSize to 64. We'll see.
Following here come the results of a larger matrix inputting 10 digits. Is it consistent?
 time passes ....
Pausing for a while  time passes 14/8/2018
I will do some more thinking and continue this page asap.
Wanted to get it on the website
We've done a very simple 2-layer network. BY ALL MEANS PLAY AROUND WITH IT, CHANGE ALL PARAMETERS (ALWAYS PROTECT YOUR WAY BACK WHEN IT GETS MESSY).
Following will be:
a two/three-layer neural network that can recognize some very, very simple
I need to do some more work for myself as well, still learning.