Vectors for classification
Vectors are re-framing how we are approaching traditional deep learning problems.
Last updated
Vectors are re-framing how we are approaching traditional deep learning problems.
Last updated
Required Knowledge: Vectors, Encoding, Classification problems Audience: Data scientists, Vector enthusiasts, Statisticians, Machine learning engineers Reading time: 5 minutes
Defining Multi-Classification
Classification refers to when a model is used to predict 2 or multiple labels (more than 2). As an example, this could be when given an animal image, the individual is required to label the category that the animal belongs to from a list of pre-defined categories.
Traditional Classification
In the classification example above, the image is read and fed through a neural network. The neural network, trained on predicting whether an image is a dog/cat/rabbit/emu is then given a probability that it can belong in each class.
Vectors reframe traditional classification into a vector search problem
Let us reframe our example of image classification (labeling an image based on the given captions to identify the best category) using vector search. Instead of predicting the most likely label using a neural network (which is how it was previously done), the labels were, instead, encoded using a deep learning model. The images were also encoded and then a vector search was performed on the projections of these encodings to identify the most similar images to the labels.
Let us now quickly compare the advantages and disadvantages of each approach.
Advantages and Disadvantages of Vector Similarity Approach
There are several advantages to this approach:
Resolves the cold-start issue (in traditional approaches, classification neural networks would have to be re-trained in order to adapt to new categories)
Reduced cost of data science experiments - using excellent out-of-the-box vectors/similarity search that resolves this issue means you can then reduce the cost of initial data science experiments and bring data to value quickly
Key Disadvantage:
If you require vectors to fit well on pre-defined, it requires more data science expertise to finetune these vectors compared to traditional multi-classification approaches.