The perceptron was developed in the 1950s and 1960s by the scientist Frank Rosenblatt, inspired by earlier work by Warren McCulloch and Walter Pitts. It is one of the earliest and most basic algorithms for supervised learning of binary classifiers. A binary classifier is a function that decides whether or not an input, represented by a vector of numbers, belongs to a specific class.

### How Do They Work?

As shown in the figure below, a perceptron takes several inputs x1, x2, x3, ..., xn and produces a single binary output. Weights w1, w2, ..., wn express the importance of the respective inputs to the output. The neuron outputs either 0 or 1, depending on whether the weighted sum ∑wjxj is less than or greater than some threshold value.

In algebraic terms:
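A sketch of that rule, reconstructed from the description above (sending the boundary case, where the weighted sum equals the threshold, to 0 is an assumption):

$$
\text{output} =
\begin{cases}
0 & \text{if } \sum_j w_j x_j \le \text{threshold} \\
1 & \text{if } \sum_j w_j x_j > \text{threshold}
\end{cases}
$$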

By varying the weights and the threshold, we get different models of decision making. It is like weighing evidence to make a decision. What does that mean?

For example, suppose you are applying to a certain university and there are three factors (inputs) to consider in making your decision: the climate (x1), what a senior says about the university (x2), and a third factor (x3).

Now we can assign binary values: x1 = 1 for a good climate and x1 = 0 for a bad one. Similarly, x2 = 1 if a senior extols the university and x2 = 0 for a not-so-good review, and the same goes for x3. Now for the weights: you really like this university, and the only thing that could stop you is the weather, as you abhor bad weather. Assigning the weights w1 = 6, w2 = 2, w3 = 1 means that w1, i.e. the climate, has the most say in your decision. Suppose your threshold is 5. When the climate is bad (x1 = 0), the weighted sum is at most 2 + 1 = 3, so the output is 0 no matter what the other inputs are. Conversely, when the climate is suitable (x1 = 1), the weighted sum is at least 6, so the output is always 1.

Thus by varying weights and thresholds, decisions can be changed.
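The university example can be sketched in a few lines of Python. The function name `perceptron` and the choice to output 0 when the weighted sum exactly equals the threshold are assumptions made for illustration; the weights and threshold are the ones from the example.

```python
from itertools import product

def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

weights = [6, 2, 1]   # w1 (climate), w2 (senior's review), w3 (third factor)
threshold = 5

# Try every combination of the three binary inputs.
for x in product([0, 1], repeat=3):
    print(x, "->", perceptron(x, weights, threshold))
```

With these numbers the decision always equals x1. Lowering the threshold to 2, for instance, lets the other two factors carry the decision when the climate is bad: `perceptron((0, 1, 1), weights, 2)` returns 1.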

### Multilayer Perceptrons

The first column is known as the first layer of perceptrons. Here there are 3 nodes, which make three simple decisions based on the weights and inputs. The second layer is a hidden layer that makes more complex decisions: its inputs are the outputs of the first layer. As the number of layers increases, the decisions become more complex.
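A minimal sketch of such a layered network follows. The 3 first-layer nodes come from the text; the 2 hidden nodes, the single output node, and every weight and threshold are hypothetical values chosen only for illustration.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

def layer(inputs, weight_rows, thresholds):
    """Feed the same inputs to one perceptron per (weights, threshold) pair."""
    return [perceptron(inputs, w, t) for w, t in zip(weight_rows, thresholds)]

# Hypothetical parameters (not from the article).
first_weights = [[6, 2, 1], [1, 1, 1], [0, 3, 3]]   # 3 nodes, 3 inputs each
first_thresholds = [5, 2, 4]
second_weights = [[2, 1, 1], [1, 2, 2]]             # 2 nodes, fed by the 3 first-layer outputs
second_thresholds = [2, 3]
output_weights = [1, 1]
output_threshold = 1

def network(inputs):
    h1 = layer(inputs, first_weights, first_thresholds)   # simple decisions
    h2 = layer(h1, second_weights, second_thresholds)     # decisions about decisions
    return perceptron(h2, output_weights, output_threshold)

print(network([1, 1, 1]))
print(network([0, 0, 1]))
```

Each layer only ever sees the outputs of the layer before it, which is what lets later layers combine the earlier, simpler decisions.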

In algebraic terms:
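A sketch of the bias form of the rule, assuming the usual convention in which $w \cdot x = \sum_j w_j x_j$ and $b$ is the bias:

$$
\text{output} =
\begin{cases}
0 & \text{if } w \cdot x + b \le 0 \\
1 & \text{if } w \cdot x + b > 0
\end{cases}
$$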

Here a bias is used in place of the threshold: moving the threshold to the other side of the inequality gives bias = -threshold. The bias is just a constant value (or a constant vector, with one entry per neuron) added to the weighted sum of the inputs. It offsets the result.

### Why bias is used

Consider a simple example: you have a feed-forward perceptron with 2 input nodes, x1 and x2, and 1 output node, y. x1 and x2 are binary features, and suppose both are 0 (x1 = x2 = 0). Multiply x1 and x2 by whatever weights you like, w1 and w2, sum the products, and pass the result through whatever activation function you prefer. Without a bias node, only one output value is possible, which may yield a very poor fit. No matter what the weights are, the weighted sum stays at 0, so the output is stuck at the activation of 0 (with a step activation, the node never fires).
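This can be checked numerically. The sketch below assumes a sigmoid activation (any activation would show the same effect); the helper names `sigmoid` and `neuron` are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

x = [0, 0]

# Without a bias, the weights are multiplied by zero and have no effect.
for w in ([1.0, 1.0], [-5.0, 3.0], [100.0, -40.0]):
    print(w, neuron(x, w))  # always sigmoid(0) = 0.5

# With a bias, the output for the same input can move in either direction.
print(neuron(x, [1.0, 1.0], bias=2.0))   # above 0.5
print(neuron(x, [1.0, 1.0], bias=-2.0))  # below 0.5
```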

Figure: without bias.

Figure: with bias (the function is shifted to fit the data better).

*output = activation_function(dot_product(weights, inputs) + bias)*

The bias allows the activation function to be shifted to the left or right to better fit the data. Changes to the weights alter the steepness of the sigmoid curve, whilst the bias offsets it, shifting the entire curve so that it fits better. Note also that the bias only influences the output values; it does not interact with the actual input data.
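A small sketch of both effects for a single input, assuming a sigmoid activation. The function name `output` and the sample values of `w` and `b` are chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def output(x, w, b):
    """Single-input neuron: sigmoid(w * x + b)."""
    return sigmoid(w * x + b)

xs = [-2.0, 0.0, 2.0, 4.0]

# Baseline curve: crosses 0.5 at x = 0.
print([round(output(x, 1.0, 0.0), 3) for x in xs])

# Same weight, bias of -3: the whole curve moves right and now crosses 0.5 at x = 3.
print([round(output(x, 1.0, -3.0), 3) for x in xs])

# Larger weight, no bias: still crosses 0.5 at x = 0, but rises more steeply.
print([round(output(x, 4.0, 0.0), 3) for x in xs])
```

In general the curve crosses 0.5 where w * x + b = 0, that is at x = -b / w, so the bias sets where the transition happens and the weight sets how sharp it is.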

Think of it in terms of linear regression. Without a bias (the intercept), the fitted line is forced to pass through the origin, so the model can only learn relationships of the form y = wx. With the introduction of a bias, the model becomes more flexible and can fit data that does not pass through the origin.
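The linear regression comparison can be sketched with ordinary least squares in plain Python. The data points below are made up for illustration (they lie exactly on y = 2x + 5), and the function names are hypothetical.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope for the model y = w * x (no bias)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def fit_with_bias(xs, ys):
    """Least-squares slope and intercept for the model y = w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    w = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - w * mean_x
    return w, b

def squared_error(xs, ys, w, b=0.0):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))

# Made-up data lying on the line y = 2x + 5.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [7.0, 9.0, 11.0, 13.0]

w0 = fit_through_origin(xs, ys)
w1, b1 = fit_with_bias(xs, ys)

print("no bias:   w =", w0, "error =", squared_error(xs, ys, w0))
print("with bias: w =", w1, "b =", b1, "error =", squared_error(xs, ys, w1, b1))
```

The model without a bias has to tilt its line to compensate for the missing offset, so its error stays above zero, while the model with a bias recovers the line exactly.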

Next up: Sigmoid Function.