About this project

This is a first try at building and training a Convolutional Neural Network (CNN) to classify images. As of today, with PyTorch and ChatGPT, this has become a one-afternoon project; the hardest part, by far, was porting the demo to a web app.

Try it online

Draw a smiley face (either happy or frowning)! You will see predictions in real time thanks to the ONNX Runtime Web API. The dataset used to train the model consists of images almost as big as the canvas, so draw big smileys for better results. If you get odd results, it is a side effect of the model being trained only on my drawings (and thus being highly biased towards my drawing style).


Implementation

The model was trained on a dataset of about 550 images of smiley faces (half happy, half frowning). The images were hand-drawn using a small Python script built with pygame. The architecture of the model is inspired by a CNN known to give great results on the MNIST dataset (a PyTorch sketch of it follows the list):

  • The first convolutional layer takes 1 input channel (images are black and white) and has 32 output channels.
  • The second convolutional layer has 64 output channels.
  • ReLU (Rectified Linear Unit) activation functions are applied after both convolutional layers for non-linearity.
  • Max pooling with a kernel size of 2 is used after the first convolutional layer.
  • A dropout layer with a rate of 0.25 is added to prevent overfitting.
  • Before the fully connected layers, the feature map is flattened.
  • The first fully connected layer has 128 neurons and is followed by another ReLU activation and a second dropout layer with a rate of 0.25.
  • The output layer has 2 neurons, one per class.
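
For concreteness, here is a minimal PyTorch sketch of the architecture described above. The 3×3 kernel size and the 28×28 input resolution are assumptions borrowed from the classic MNIST reference model, not values taken from this project's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmileyNet(nn.Module):
    """Sketch of the architecture listed above. Kernel sizes (3x3) and
    the 28x28 input size are assumptions, not this project's values."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)   # 1 input channel (black and white)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)  # 64 output channels
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.25)
        # With 28x28 inputs: 28 -> 26 (conv1) -> 13 (pool) -> 11 (conv2),
        # so the flattened feature map has 64 * 11 * 11 = 7744 values.
        self.fc1 = nn.Linear(64 * 11 * 11, 128)
        self.fc2 = nn.Linear(128, 2)  # two classes: happy / frowning

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # pooling after the first conv
        x = F.relu(self.conv2(x))
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.dropout2(F.relu(self.fc1(x)))
        return self.fc2(x)  # raw logits; cross-entropy loss handles softmax
```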

To make the network more robust, data augmentation techniques such as random rotation, translation, and stretching of the images were employed. The model was trained for 10 epochs with a batch size of 64, using cross-entropy loss and the Adam optimizer with a learning rate of 0.001. It reached an accuracy of 99.4% on the test set, which made further fine-tuning unnecessary. A sketch of this training setup is shown below.
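
Only the augmentation categories, epoch count, batch size, loss, and optimizer settings come from the text above; the augmentation ranges and the `smileys/` folder layout are illustrative assumptions.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Rotation, translation, and stretching as described above; the exact
# ranges here are illustrative guesses, not the project's values.
augment = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),   # black-and-white inputs
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1),
                            scale=(0.8, 1.2), shear=10),
    transforms.ToTensor(),
])

# Hypothetical layout: smileys/happy/ and smileys/frowning/ subfolders.
train_set = datasets.ImageFolder("smileys/", transform=augment)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

model = SmileyNet()  # the sketch from earlier in this section
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for epoch in range(10):  # 10 epochs
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```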

Source

The JavaScript code was adapted from this demo, replacing onnx.js with the more recent ONNX Runtime Web API and adding support for touch devices. Everything else (Python code and dataset) is available on my GitHub. A sketch of the PyTorch-to-ONNX export step that bridges the two is shown below.
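
For completeness, here is a minimal sketch of how a PyTorch model can be exported to ONNX for use with ONNX Runtime Web. The file name, tensor names, and input size are assumptions, not this project's actual values.

```python
import torch

model = SmileyNet()  # the sketch from the Implementation section
model.eval()

# A dummy input fixes the graph's input shape; 1x1x28x28 is an assumed size.
dummy = torch.zeros(1, 1, 28, 28)
torch.onnx.export(model, dummy, "smiley.onnx",
                  input_names=["input"], output_names=["logits"])
```

The resulting .onnx file is what ONNX Runtime Web loads in the browser to run predictions on the canvas drawing.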