How to turn text into vectors
An introduction to turning data into vectors.
Assumed Knowledge: Vectors
Target Audience: Python developers, general developers
Reading Time: 3 minutes
To help transform data into vectors, we open-sourced a library called VectorHub (you can explore the hub at hub.vctr.ai). You will need Python to use it, and everything below can be run on Colab.
The library is published on PyPI under the name vectorhub and can be installed with pip. Once it is installed, you can load a model in Python and use it to encode text.
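The original install command and example are not reproduced on this page, so the following is a minimal sketch rather than the official snippet. It assumes the USE2Vec encoder, its import path, and the encoders-text-tfhub install extra as they appear in the VectorHub README; these names may have changed, so check the README for the version you install. The encode_sentence helper is a hypothetical wrapper added for illustration.

```python
# Install first (shell command; the extra name is assumed from the VectorHub README):
#   pip install vectorhub[encoders-text-tfhub]


def encode_sentence(text):
    """Return a vector for `text` using a VectorHub text encoder.

    Hypothetical helper: the encoder class and import path are assumptions
    based on the VectorHub README, not confirmed by this page.
    """
    # Imported lazily so this file still loads where VectorHub is not installed.
    from vectorhub.encoders.text.tfhub import USE2Vec

    encoder = USE2Vec()  # loading the model may download weights the first time
    return encoder.encode(text)


# Usage, once VectorHub and the extra are installed:
# vector = encode_sentence("I enjoy taking long walks on the beach with my dog.")
# print(len(vector))  # the dimensionality depends on the chosen model
```

Other encoders in the library should follow the same pattern of creating the encoder and then calling encode, but each one may need its own install extra.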
This gives you a vector that can be indexed and stored for search. If you would like to know what happens under the hood, or you want to write your own library for this, read on.
What is occurring under the hood?
We vectorise a sentence by first tokenizing the text into subwords (these are called tokens). Each token is mapped to its own vector, and these vectors are then fed through the model, which produces the vector for the sentence. Note: models are not necessarily trained to produce the best vectors or representation space for every task, so a suitable model needs to be identified for each use case. If there is a use case you would like supported, feel free to message us in our Discord.
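The tokenize, look up, and combine steps described above can be sketched in plain Python. This is a toy illustration, not VectorHub's implementation: the vocabulary, the 4-dimensional hash-based token vectors, and the function names are all hypothetical, and mean pooling stands in for the model. A real model runs the token vectors through a trained neural network before producing the sentence vector.

```python
import hashlib

# Hypothetical toy vocabulary; real models learn tens of thousands of subwords.
# Pieces starting with "##" continue a word rather than start one.
VOCAB = {"vector", "search", "token", "##s", "##ise", "##ize", "##ing", "[UNK]"}
DIM = 4  # toy vector size; real models use hundreds of dimensions


def tokenize(text):
    """Split text into subwords with a greedy longest-match search."""
    tokens = []
    for word in text.lower().split():
        pieces, start = [], 0
        while start < len(word):
            end, match = len(word), None
            while end > start:
                piece = word[start:end]
                if start > 0:
                    piece = "##" + piece
                if piece in VOCAB:
                    match = piece
                    break
                end -= 1
            if match is None:
                # No subword fits: mark the whole word as unknown.
                pieces = ["[UNK]"]
                break
            pieces.append(match)
            start = end
        tokens.extend(pieces)
    return tokens


def embed(token):
    """Map a token to a fixed vector (a stand-in for a learned embedding table)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(DIM)]


def encode(text):
    """Average the token vectors (a stand-in for running them through a model)."""
    vectors = [embed(token) for token in tokenize(text)]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


print(tokenize("Vectorise tokens"))  # ['vector', '##ise', 'token', '##s']
print(len(encode("vector search")))  # 4
```

Because the token vectors here come from a hash rather than from training, the resulting sentence vectors carry no meaning. The sketch only shows the shape of the pipeline.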