About this project
This is a first attempt at building and training a Convolutional Neural Network (CNN) to classify images. With PyTorch and ChatGPT, this has become a one-afternoon project; the hardest part, by far, was porting the demo to a web app.
Try it online
Draw a smiley face (either happy or frowning)! You will see the predictions in real time thanks to the ONNX Runtime Web API. The dataset used to train the model consists of images almost as large as the canvas, so try drawing big smileys for better results. If you get odd results, it is a side effect of the model being trained only on my drawings (and thus being highly biased towards my drawing style).
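For the browser demo, the trained PyTorch model has to be converted to the ONNX format so that ONNX Runtime Web can load it. Below is a minimal sketch of such an export, assuming a 28x28 single-channel input and using a stand-in module in place of the actual trained network; the file name and tensor names are assumptions as well.

```python
import torch
import torch.nn as nn

# Stand-in for the trained network; in the real project this would be the
# CNN described in the Implementation section, loaded with its trained weights.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 2))
model.eval()

# Assumed input shape: (batch, channels, height, width) for a 28x28 rasterisation of the canvas.
dummy_input = torch.randn(1, 1, 28, 28)

torch.onnx.export(
    model,
    dummy_input,
    "smiley.onnx",               # the file that ONNX Runtime Web would load in the browser
    input_names=["input"],
    output_names=["logits"],
)
```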
Implementation
The model was trained on a dataset of about 550 images of smiley faces, half of them happy and the other half frowning. The images were hand-drawn with a small Python script built on pygame (a sketch of such a script appears below, after the architecture description). The architecture of the model is inspired by a CNN known to give great results on the MNIST dataset (a code sketch follows the list):
- The first layer is a convolutional layer with 1 input channel (the images are black and white).
- Second layer is another convolutional layer with 32 output channels.
- Third layer is a convolutional layer with 64 output channels.
- Fourth layer is a fully connected layer with 128 neurons.
- Fifth layer is the output layer with 2 neurons for classification.
- Between layers, ReLU (Rectified Linear Unit) activation functions are applied for non-linearity.
- Max pooling is used after the first convolutional layer with a kernel size of 2.
- A dropout layer is added with a rate of 0.25 to prevent overfitting.
- The second convolutional layer also has ReLU activation.
- Before passing to the fully connected layers, the feature map is flattened.
- After the first fully connected layer, another ReLU activation is applied.
- Another dropout layer is used with a rate of 0.25.
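A minimal PyTorch sketch of one plausible reading of this architecture, following the well-known PyTorch MNIST example it is inspired by. The kernel sizes and the 28x28 input resolution are assumptions, not taken from the project's actual code; the dropout placement follows the order of the list above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmileyNet(nn.Module):
    """One plausible reading of the architecture described above."""

    def __init__(self, image_size: int = 28):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)   # 1 input channel -> 32 output channels
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)  # 32 -> 64 output channels
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.25)
        # Flattened size after conv1, a 2x2 max pool, and conv2 (depends on the assumed input size).
        feat = (image_size - 2) // 2 - 2
        self.fc1 = nn.Linear(64 * feat * feat, 128)    # fully connected layer with 128 neurons
        self.fc2 = nn.Linear(128, 2)                   # output layer: happy vs. frowning

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2)            # max pooling (kernel size 2) after the first convolution
        x = self.dropout1(x)
        x = F.relu(self.conv2(x))
        x = torch.flatten(x, 1)           # flatten before the fully connected layers
        x = F.relu(self.fc1(x))
        x = self.dropout2(x)
        return self.fc2(x)                # raw logits for the two classes
```

With a 28x28 input the flattened feature map has 64 x 11 x 11 values; a different input resolution simply changes that size.

The training images themselves were hand-drawn with a small pygame script, as mentioned above. The actual script is not shown here; the following is a minimal sketch of how such a collection tool could look, where the canvas size, brush radius, output folder, and key bindings are all assumptions.

```python
import os
import pygame

SIZE, BRUSH, OUT_DIR = 280, 8, "dataset"          # assumed canvas size and output folder
for label in ("happy", "frown"):
    os.makedirs(os.path.join(OUT_DIR, label), exist_ok=True)

pygame.init()
screen = pygame.display.set_mode((SIZE, SIZE))
screen.fill((0, 0, 0))
counts = {"happy": 0, "frown": 0}

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        elif event.type == pygame.MOUSEMOTION and event.buttons[0]:
            # Draw in white while the left mouse button is held down.
            pygame.draw.circle(screen, (255, 255, 255), event.pos, BRUSH)
        elif event.type == pygame.KEYDOWN and event.key in (pygame.K_h, pygame.K_f):
            # 'h' saves the drawing as happy, 'f' as frowning, then clears the canvas.
            label = "happy" if event.key == pygame.K_h else "frown"
            pygame.image.save(screen, os.path.join(OUT_DIR, label, f"{counts[label]}.png"))
            counts[label] += 1
            screen.fill((0, 0, 0))
    pygame.display.flip()

pygame.quit()
```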
To make the network more robust, data augmentation (random rotation, translation, and stretching of the images) was used during training. The model was trained for 10 epochs with a batch size of 64, using cross-entropy loss and the Adam optimizer with a learning rate of 0.001. It reached an accuracy of 99.4% on the test set, so no further fine-tuning was needed.
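Below is a condensed sketch of this training setup, reusing the SmileyNet class from the architecture sketch above. The hyperparameters (cross-entropy loss, Adam with learning rate 0.001, batch size 64, 10 epochs) are the ones quoted in the text; the dataset layout, image resolution, and exact augmentation ranges are assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Data augmentation: random rotation, translation, and stretching (scaling),
# as described above; the exact ranges here are assumptions.
train_transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((28, 28)),
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.8, 1.2)),
    transforms.ToTensor(),
])

# Assumed folder layout: dataset/happy/*.png and dataset/frown/*.png
train_set = datasets.ImageFolder("dataset", transform=train_transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

model = SmileyNet()                                   # the CNN sketched above
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(10):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)       # cross-entropy over the two classes
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch + 1}: last batch loss = {loss.item():.4f}")
```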