πŸ’» How to Turn Text into Vectors

An introduction to turning data into vectors.

Assumed Knowledge: Vectors
Target Audience: Python developers, general developers
Reading Time: 3 minutes

To help transform data into vectors, we open-sourced a library called VectorHub (you can explore the hub at hub.vctr.ai). You will need Python, and you can run all of the code below on Colab.

The library can be installed via pip:

$ pip install vectorhub[encoders-text-tfhub]

Once installed, you can use a model in Python. For example:

from vectorhub.encoders.text.tfhub import USE2Vec

enc = USE2Vec()
text = "How do you encode data?"
vector = enc.encode(text)

This returns a vector that can be indexed and stored away for search. If you are interested in what happens under the hood, or want to write your own library for this, take a look below.
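As a sketch of how such a vector might be indexed and searched, here is a minimal in-memory example. The documents, vectors, and helper names are illustrative, not part of VectorHub, and the toy 3-dimensional vectors stand in for the much larger vectors a real encoder returns:

```python
import math

# A minimal in-memory "index": each entry pairs a document with its vector.
# These 3-dimensional vectors are toy examples; a real text encoder
# typically returns vectors with hundreds of dimensions.
index = [
    ("How do you encode data?", [0.9, 0.1, 0.0]),
    ("Best pizza in town",      [0.0, 0.2, 0.9]),
]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_vector, index):
    """Return indexed documents ranked by similarity to the query vector."""
    return sorted(
        index,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )

# A query vector pointing in roughly the same direction as the first document.
results = search([1.0, 0.0, 0.0], index)
print(results[0][0])  # prints "How do you encode data?"
```

Production systems replace the sorted list with a dedicated vector index so that search stays fast as the collection grows, but the ranking idea is the same.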

What happens under the hood?

We vectorize a sentence by first tokenizing the text into subwords (these are called tokens). Each token is mapped to its own vector, and these vectors are then fed through the model. Note: models are not necessarily trained for the best vectors and representation space, so specific models will need to be identified for different use cases. If there is a use case you would like, feel free to message us in our Discord.
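The tokenize-then-embed pipeline described above can be sketched in a few lines. The vocabulary, the greedy subword splitter, and the random embedding values are all invented for illustration; real models use learned tokenizers and learned embeddings:

```python
import random

# A toy subword vocabulary; real tokenizers learn theirs from large corpora.
vocabulary = ["how", "do", "you", "en", "code", "data", "?"]

# Each token is mapped to its own vector (here: random 4-dimensional values).
random.seed(0)
embeddings = {tok: [random.random() for _ in range(4)] for tok in vocabulary}

def tokenize(text):
    """Naive whitespace split plus greedy subword matching."""
    tokens = []
    for word in text.lower().replace("?", " ?").split():
        if word in vocabulary:
            tokens.append(word)
        else:
            # Greedily match known subwords, e.g. "encode" -> "en", "code".
            i = 0
            while i < len(word):
                for sub in sorted(vocabulary, key=len, reverse=True):
                    if word.startswith(sub, i):
                        tokens.append(sub)
                        i += len(sub)
                        break
                else:
                    i += 1  # skip characters with no matching subword
    return tokens

def encode(text):
    """Average the token vectors into one sentence vector (mean pooling)."""
    vectors = [embeddings[t] for t in tokenize(text)]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

print(tokenize("How do you encode data?"))
# ['how', 'do', 'you', 'en', 'code', 'data', '?']
print(len(encode("How do you encode data?")))  # 4
```

A real model like USE2Vec replaces the averaging step with a trained neural network, which is what makes the resulting vectors useful for search.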
