
Perceptron Notebook Reflections

· 5 min read
Ross Bulat
Full Stack Engineer

This post briefly reflects on three neural network notebooks: a simple perceptron, a perceptron trained for the AND operator, and a multi-layer perceptron for a more complex non-linear problem.

Simple perceptron

Google Colab notebook: Open in Google Colab

The first notebook introduces the basic structure of a perceptron using NumPy arrays. The inputs are represented as an array, the weights are represented as another array, and the weighted sum is calculated using the dot product.

The main process is:

  1. Define input values.
  2. Define weights.
  3. Multiply inputs by weights using a dot product.
  4. Pass the result into a step function.
  5. Return either 1 or 0 depending on whether the weighted sum reaches the threshold.

This shows the core idea behind a perceptron: it converts numeric inputs into a binary decision. In the notebook, changing the weights changes the final output, which demonstrates that the model's behaviour depends directly on the chosen parameters.
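The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the notebook's own code: the input values, both weight vectors, and the threshold of 1.0 are assumptions chosen so the arithmetic is easy to check by hand.

```python
import numpy as np

def step(weighted_sum, threshold=1.0):
    """Return 1 if the weighted sum reaches the threshold, otherwise 0."""
    return 1 if weighted_sum >= threshold else 0

def perceptron(inputs, weights, threshold=1.0):
    """Dot product of inputs and weights, passed through the step function."""
    return step(np.dot(inputs, weights), threshold)

# Illustrative values only; the notebook's actual numbers may differ.
inputs = np.array([1.0, 0.0, 1.0])
weights_a = np.array([0.5, 0.25, 0.75])   # weighted sum = 0.5 + 0.75 = 1.25
weights_b = np.array([0.25, 0.25, 0.5])   # weighted sum = 0.25 + 0.5 = 0.75

output_a = perceptron(inputs, weights_a)  # 1.25 reaches the threshold
output_b = perceptron(inputs, weights_b)  # 0.75 does not
print(output_a, output_b)
```

The same input vector produces a different decision under each weight vector, which is the dependence on parameters described above.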

The dataset here is deliberately simple: only one input vector is used. This makes the calculation easy to follow, but it also exposes a limitation. A single calculation with hand-picked weights involves no learning; it is just one step of what training a real machine learning model involves.

Perceptron AND operator

Google Colab notebook: Open in Google Colab

The second notebook extends the idea by using the full truth table for the AND operator:

Input 1   Input 2   Output
0         0         0
0         1         0
1         0         0
1         1         1

Here, the inputs are stored as a matrix, the outputs are stored as a target array, and the weights begin at zero. The notebook then defines a training loop. For each row, the perceptron predicts an output, compares it with the expected output, calculates the error, and updates the weights when the prediction is wrong.

This is a clearer machine learning example because the model is not just given fixed weights. It improves through repeated training until the total error becomes zero. After training, the final weights can correctly classify all four AND cases.
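A training loop of this kind might look like the sketch below. It follows the description above (zero initial weights, predict, compare, update on error, stop when the total error is zero), but the learning rate of 0.1, the fixed threshold of 1, and the function names are assumptions and may not match the notebook.

```python
import numpy as np

# Full AND truth table: inputs as a matrix, expected outputs as a target array.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

def step(weighted_sum, threshold=1.0):
    """Step activation; the threshold value is an assumption."""
    return 1 if weighted_sum >= threshold else 0

def train(X, y, learning_rate=0.1, max_epochs=100):
    """Perceptron learning rule: nudge the weights whenever a prediction is wrong."""
    weights = np.zeros(X.shape[1])          # weights begin at zero
    for epoch in range(1, max_epochs + 1):
        total_error = 0
        for inputs, target in zip(X, y):
            prediction = step(np.dot(inputs, weights))
            error = target - prediction     # 0 when correct, +1 or -1 when wrong
            weights = weights + learning_rate * error * inputs
            total_error += abs(error)
        if total_error == 0:                # every row classified correctly
            return weights, epoch
    return weights, max_epochs

weights, epochs_used = train(X, y)
predictions = [step(np.dot(row, weights)) for row in X]
print(weights, epochs_used, predictions)
```

Because a wrong prediction only ever occurs on the (1, 1) row in this setup, each update raises both weights by the learning rate until their sum reaches the threshold.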

The important point is that AND is linearly separable. A single perceptron can solve it because one straight decision boundary can separate the positive case (1, 1) from the other three cases. This dataset is useful for learning, but it is also very small and artificial. It does not include noise, ambiguity, missing values, or overlapping classes.
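To make the separability claim concrete, here is a small check using one hand-picked boundary, the line x1 + x2 = 1.5. The weights and threshold are hypothetical values chosen for illustration, not the ones the notebook learns.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
and_targets = np.array([0, 0, 0, 1])

# Hypothetical boundary: points with x1 + x2 >= 1.5 are classified as 1.
weights = np.array([1.0, 1.0])
threshold = 1.5

scores = X @ weights                           # weighted sum for each row
predictions = (scores >= threshold).astype(int)
print(scores, predictions)
```

Only (1, 1) has a weighted sum above 1.5, so this single straight line puts the positive case on one side and the other three cases on the other.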

Multi-layer perceptron

Google Colab notebook: Open in Google Colab

The third notebook introduces a multi-layer perceptron. This is needed because the target pattern is more complex: the outputs match the XOR operator, where (0, 1) and (1, 0) should return 1, while (0, 0) and (1, 1) should return 0.

A single perceptron cannot solve XOR because the classes are not linearly separable. The notebook addresses this by adding a hidden layer between the input layer and the output layer. It also replaces the step function with the sigmoid activation function, which produces values between 0 and 1.
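One way to see the limitation is to try many candidate boundaries and count how many XOR rows each one gets right. The sketch below is an illustration, not part of the notebook: the grid of weights and the bias term are assumptions, and a grid search is only a demonstration, not a proof.

```python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
xor_targets = np.array([0, 1, 1, 0])

# Candidate values for w1, w2 and the bias: -2.0 to 2.0 in steps of 0.25.
grid = np.linspace(-2.0, 2.0, 17)

best_correct = 0
for w1, w2, bias in itertools.product(grid, repeat=3):
    # A single perceptron: predict 1 when the weighted sum plus bias is non-negative.
    predictions = (X @ np.array([w1, w2]) + bias >= 0).astype(int)
    correct = int((predictions == xor_targets).sum())
    best_correct = max(best_correct, correct)

print(best_correct)  # number of rows the best single boundary classifies correctly
```

No combination classifies all four rows. The algebra agrees: XOR would need bias < 0, w1 + bias >= 0, w2 + bias >= 0 and w1 + w2 + bias < 0, but adding the middle two conditions gives w1 + w2 + bias >= -bias > 0, which contradicts the last one.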

The training process is more involved:

  1. Send the inputs through the first set of weights into the hidden layer.
  2. Apply the sigmoid function to the hidden-layer values.
  3. Send the hidden-layer outputs through a second set of weights into the output layer.
  4. Compare the predicted outputs with the target outputs.
  5. Use the sigmoid derivative to calculate error signals.
  6. Update both sets of weights through backpropagation.
  7. Repeat this process for many epochs until the error becomes very small.
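The seven steps above can be sketched as a small NumPy network. This is a minimal sketch under stated assumptions, not the notebook's implementation: the four hidden units, the bias terms, the learning rate of 0.5, the 10,000 epochs, and the fixed random seed are all illustrative choices.

```python
import numpy as np

def sigmoid(z):
    """Squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(activation):
    """Derivative of the sigmoid, written in terms of its output."""
    return activation * (1.0 - activation)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

# Assumed architecture: 2 inputs -> 4 hidden units -> 1 output, with biases.
rng = np.random.default_rng(seed=0)
hidden_units = 4
w_hidden = rng.uniform(-1, 1, size=(2, hidden_units))
b_hidden = np.zeros((1, hidden_units))
w_output = rng.uniform(-1, 1, size=(hidden_units, 1))
b_output = np.zeros((1, 1))

learning_rate = 0.5
losses = []
for epoch in range(10000):
    # Steps 1-3: forward pass through the hidden layer and the output layer.
    hidden = sigmoid(X @ w_hidden + b_hidden)
    output = sigmoid(hidden @ w_output + b_output)

    # Step 4: compare predictions with targets.
    error = y - output
    losses.append(float(np.mean(error ** 2)))

    # Step 5: error signals, scaled by the sigmoid derivative.
    delta_output = error * sigmoid_derivative(output)
    delta_hidden = (delta_output @ w_output.T) * sigmoid_derivative(hidden)

    # Step 6: update both sets of weights (and biases) by backpropagation.
    w_output += learning_rate * hidden.T @ delta_output
    b_output += learning_rate * delta_output.sum(axis=0, keepdims=True)
    w_hidden += learning_rate * X.T @ delta_hidden
    b_hidden += learning_rate * delta_hidden.sum(axis=0, keepdims=True)

print(losses[0], losses[-1])
print(output.round(3))
```

How closely the final outputs approach the targets depends on the seed, learning rate, and epoch count, so it is worth printing the loss at the start and end rather than assuming convergence.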

This notebook shows why a pattern that is not linearly separable needs a model with more capacity. The hidden layer allows the network to learn an internal representation of the data rather than relying on a single straight boundary. The trade-off is that training becomes more computationally expensive and depends on choices such as random weight initialisation, learning rate, number of epochs, and activation function.

Learning reflection

A simple weighted sum can demonstrate the mechanics of a perceptron, but it does not learn. The AND dataset can be learned by a single perceptron because the classes are linearly separable. The XOR-style dataset requires a multi-layer network because the pattern cannot be separated by one linear boundary.

This helps explain the applicability and challenges of machine learning algorithms. Simple datasets can make algorithms look very effective, but real-world datasets may contain noise, missing values, non-linear relationships, class imbalance, and ambiguous boundaries. As the dataset becomes more complex, the model may need more capacity, more training time, and more careful tuning.

The key takeaway is that neural networks build from simple ideas: inputs, weights, activations, errors, and weight updates. The notebooks show this progression, moving from a fixed perceptron calculation to a trainable perceptron, and finally to a multi-layer network that can learn a non-linear pattern.