CNN Object-Recognition Notebook: Findings

June 21, 2026 · 2 min read

Full Stack Engineer

This post summarises my CNN object-recognition notebook: the model setup, the training results, a single-image test prediction and the key limitations of using CIFAR-10 for image classification.

Google Colab notebook: Open in Google Colab

The notebook trains a convolutional neural network on CIFAR-10, a balanced dataset of 32×32 colour images across ten classes. I retained the original architecture: two convolution-and-pooling blocks, a 256-unit dense layer and a ten-class softmax output. The model has 225,610 trainable parameters. The data was divided into 40,000 training images, 10,000 validation images and 10,000 test images, with pixel values normalised to float32 values between zero and one.
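The parameter count can be checked by hand. The post does not state the filter counts or kernel sizes, so the sketch below assumes 32 filters and 4×4 kernels in both convolution layers, no padding, and 2×2 max pooling. These are my assumptions rather than values taken from the notebook, but they are one configuration that reproduces the reported total.

```python
def conv_output(size, kernel):
    """Spatial size after a stride-1 convolution with no padding."""
    return size - kernel + 1


def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weight matrix plus one bias per unit."""
    return inputs * units + units


# Assumed hyperparameters (not stated in the post).
KERNEL = 4
FILTERS = 32

size, channels = 32, 3  # CIFAR-10 images are 32x32 with 3 colour channels
layer_params = {}
for name in ("conv_1", "conv_2"):
    layer_params[name] = conv_params(KERNEL, channels, FILTERS)
    size = conv_output(size, KERNEL) // 2  # 2x2 max pooling halves, rounding down
    channels = FILTERS

flat = size * size * channels  # length of the flattened feature vector
layer_params["dense_256"] = dense_params(flat, 256)
layer_params["softmax_10"] = dense_params(256, 10)
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name:<12}{count:>8,}")
print(f"{'total':<12}{total_params:>8,}")
```

Under these assumptions the dense layer holds the large majority of the weights, which is typical for small CNNs that flatten straight into a wide dense layer.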

Early stopping ended training after seven epochs. Training accuracy rose to 73.25%, while validation accuracy reached 63.61%. Validation loss was lowest at epoch five (1.0692) and then increased for two epochs, even as training performance continued to improve. This divergence, together with a gap of almost ten percentage points between training and validation accuracy, suggests overfitting. On the unseen test set, the model achieved 63.58% accuracy with a loss of 1.1105.
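The stopping behaviour described above is consistent with a patience of two epochs, although the post does not state the patience value. The sketch below is a plain-Python illustration of that rule, not the notebook's own callback. Only the epoch-five loss of 1.0692 comes from the results; every other loss in the list is a made-up placeholder chosen to show the mechanism.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Placeholder losses; only the fifth value (1.0692) is from the post.
illustrative_val_losses = [1.50, 1.30, 1.20, 1.10, 1.0692, 1.08, 1.09, 1.20]

stop_epoch, best_epoch = early_stopping_epoch(illustrative_val_losses, patience=2)
print(f"stopped after epoch {stop_epoch}, best epoch was {best_epoch}")
```

If the weights from the best epoch are not restored when training stops, the final model is the one from the last epoch, which here is two epochs past the validation minimum.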

For the required single-image experiment, I changed the selected test image from index 16 to index 12. The true class was dog, and the model also predicted dog, so the prediction was correct. However, its confidence was only 34.59%, which means the model assigned roughly 65% of the probability to the other nine classes. This is a useful reminder that the highest-scoring class can still represent substantial uncertainty.
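To make the point about confidence concrete, here is a small sketch of how a softmax output turns into a label and a confidence score. The probability vector is hypothetical: only the 0.3459 for dog comes from the notebook, and the other nine values are placeholders that make the vector sum to one. The class order is the standard CIFAR-10 label order.

```python
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]


def summarise_prediction(probs, class_names):
    """Return (label, confidence, probability assigned to all other classes)."""
    top = max(range(len(probs)), key=probs.__getitem__)  # argmax
    confidence = probs[top]
    return class_names[top], confidence, 1.0 - confidence


# Hypothetical softmax output; only the dog entry (0.3459) is from the post.
probs = [0.02, 0.01, 0.08, 0.28, 0.09, 0.3459, 0.06, 0.0841, 0.02, 0.01]

label, confidence, elsewhere = summarise_prediction(probs, CIFAR10_CLASSES)
print(f"predicted {label} with confidence {confidence:.2%}")
print(f"probability assigned to other classes: {elsewhere:.2%}")
```

The argmax alone reports "dog" whether the top probability is 0.35 or 0.99, so it is worth reporting the confidence alongside the label.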

Performance also varied by class. Automobile recall was 0.80, whereas cat precision was only 0.39. The confusion matrix showed considerable overlap between visually similar animal classes: 266 of the 1,000 test dogs were classified as cats, while 163 of the 1,000 test cats were classified as dogs.

CIFAR-10 is useful for comparing algorithms because it is small, balanced and quick to train, but its low resolution removes detail and its ten broad categories do not represent the variety of real deployment data. A correct result on one image therefore cannot establish reliability. Data augmentation, regularisation, restoring the best validation weights and testing on a more representative dataset would provide a stronger evaluation.
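Per-class precision and recall, as quoted above, both come from the confusion matrix. The sketch below shows the arithmetic on a toy three-class matrix. The counts are invented for illustration and are not the notebook's results. Rows are true classes and columns are predicted classes.

```python
def precision_recall(matrix, class_index):
    """Precision and recall for one class of a square confusion matrix.

    matrix[i][j] counts samples whose true class is i and predicted class is j.
    """
    true_positive = matrix[class_index][class_index]
    predicted_as_class = sum(row[class_index] for row in matrix)  # column sum
    actually_class = sum(matrix[class_index])                     # row sum
    precision = true_positive / predicted_as_class if predicted_as_class else 0.0
    recall = true_positive / actually_class if actually_class else 0.0
    return precision, recall


# Toy counts for illustration only (not the notebook's confusion matrix).
classes = ["automobile", "cat", "dog"]
toy_matrix = [
    [90,  4,  6],   # true automobile
    [ 5, 60, 35],   # true cat
    [ 5, 45, 50],   # true dog
]

for index, name in enumerate(classes):
    precision, recall = precision_recall(toy_matrix, index)
    print(f"{name:<11} precision={precision:.2f} recall={recall:.2f}")
```

In the toy matrix, cat precision is pulled down by the dogs predicted as cats (the cat column), while cat recall is pulled down by the cats predicted as dogs (the cat row). That is the same pattern as the cat and dog confusion reported in the notebook.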