Normally, I would share photos of our DLPy team in the recording studio. However, COVID-19 made the studio off-limits, so we recorded our latest DLPy video from home.
There are times when it is necessary to understand how similar images are to one another based on their features. In our example, we distinguish cats from birds. This is an easy task for us humans, but a much harder one for a computer vision model. How do we get started with this type of classification task?
We are going to use an embedding model. An embedding model is a way to reduce the dimensionality of input data, such as images. Consider this to be a type of data preparation applied to image analysis. When an embedding model is used, input images are converted into low-dimensional vectors that can be more easily used by other computer vision tasks. The key to good embedding is to train the model so that similar images are converted to similar vectors.
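To make "similar images are converted to similar vectors" concrete, here is a toy sketch in numpy. The three-dimensional vectors are invented for illustration; a real embedding model would produce (typically longer) vectors from the input images.

```python
import numpy as np

# Invented 3-dimensional embedding vectors for three images.
cat_1 = np.array([0.9, 0.1, 0.2])
cat_2 = np.array([0.8, 0.2, 0.1])
bird_1 = np.array([0.1, 0.9, 0.8])

def euclidean(a, b):
    """Distance between two embedding vectors: small means similar images."""
    return float(np.linalg.norm(a - b))

# A well-trained embedding puts the two cat vectors closer together
# than a cat vector and a bird vector.
assert euclidean(cat_1, cat_2) < euclidean(cat_1, bird_1)
```

Downstream tasks then operate on these short vectors instead of the raw pixels.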
An embedding model is composed of two components: a feature extractor network and an embedding layer. We chose ResNet18, which is built into the DLPy API, as the feature extractor. We then built the model with the EmbeddingModel API, passing as arguments the feature extractor, the embedding layer, and the type of embedding model.
We chose to use a triplet network. A triplet network has three identical copies of the embedding model. These three parallel networks enable three images to be embedded simultaneously. The input to the first network contains anchor examples (for reference), the second contains positive examples (images similar to the anchors), and the third contains negative examples (images dissimilar to the anchors). The embedding loss layer then collects these three embeddings and compares the results. The weights are updated so that similar images move closer together in the embedding space and dissimilar images are pushed apart. This means the trained embedding should place bird embeddings near other bird embeddings and cat embeddings near other cat embeddings.
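The "pull similar together, push dissimilar apart" update is typically driven by a triplet loss of the form max(d(anchor, positive) − d(anchor, negative) + margin, 0). A minimal numpy sketch (the vectors and margin here are invented for illustration, not DLPy's internals):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Penalize the anchor being closer to the negative than to the
    positive by less than `margin`; zero loss once they are separated."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return float(max(d_pos - d_neg + margin, 0.0))

anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])  # similar to the anchor
negative = np.array([3.0, 0.0])  # dissimilar to the anchor

# The negative is already farther away than the margin requires,
# so this triplet contributes zero loss.
print(triplet_loss(anchor, positive, negative))  # 0.0
```

Training minimizes this loss over many such triplets, which is what gradually shapes the embedding space.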
The output of the embedding layer can be passed on to other machine learning techniques such as clustering, k-nearest-neighbor analysis, and so on.
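As one example of such a downstream step, a simple 2-means clustering pass over the embedding vectors can separate the two groups. This is a hand-rolled Lloyd's-algorithm sketch on invented, well-separated 2-D vectors standing in for the cat and bird embeddings, not DLPy's own clustering tooling:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented 2-D embedding vectors: two well-separated clouds standing in
# for the cat embeddings and the bird embeddings.
cats = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(20, 2))
birds = rng.normal(loc=[3.0, 3.0], scale=0.1, size=(20, 2))
points = np.vstack([cats, birds])

def kmeans(x, k=2, iters=20):
    """Plain Lloyd's algorithm: assign each point to its nearest
    centroid, then recompute centroids, repeated for `iters` rounds."""
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every centroid.
        d = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centroids = np.array([x[labels == j].mean(axis=0) for j in range(k)])
    return labels

labels = kmeans(points)
# With well-separated embeddings, all cats land in one cluster
# and all birds in the other.
```

The same idea carries over to real embeddings: if the triplet training worked, unsupervised clustering recovers the classes.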
The video below provides a walkthrough of creating an image embedding model with DLPy by training a triplet network and then using that model for machine learning via k-means clustering to separate the two classes, cats and birds.
In case you missed them, here are the previous blogs with videos on DLPy: