💻 How to turn data into Vectors (code)

A guide to turning data into vectors with VectorHub.

Assumed Knowledge: Vectors
Target Audience: Python developers, General developers
Reading Time: 3 minutes

Turning structured or unstructured data (Excel spreadsheets, videos, images, Word documents, PDFs) into vectors can involve quite complicated pipelines.

To help transform data into vectors, we open-sourced a library called VectorHub (you can explore the hub at hub.vctr.ai). VectorHub requires Python 3 (tested on Python 3.6 and Python 3.7).

The library can be installed via pip:

$ pip install vectorhub

Once the library is installed, you can import a model in Python. For example:

from vectorhub.encoders.text import ViText2Vec

You can instantiate the model as follows:

from vectorai import request_api_key
# Request an API key for your username and email address.
username = input("What is your username? ")
email = input("What is your email? ")
api_key = request_api_key(username, email, description="Trying out VectorHub.")
# Create the text encoder with your credentials.
vi = ViText2Vec(username, api_key)

Transforming your data into vectors is as simple as the following:

text = "My dog loves taking long walks on the beach!"
vector = vi.encode(text)
# Voila, you have your vector!

What is happening under the hood in VectorHub?

VectorHub abstracts away a few complexities to make encoding smooth, while still leaving a lot of room for customisation.

To go over the diagram quickly: as data is passed through VectorHub, it is converted into a NumPy array. This array is fed through the model, the model's outputs are pooled together, and the result is transformed into a native Python object.
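The steps above can be sketched in a few lines of Python. This is only an illustration of the general flow, not VectorHub's actual internals: `toy_model` and `encode` are hypothetical names, the "model" is a fixed random lookup table standing in for a real neural network, and mean pooling is assumed as the pooling step.

```python
import numpy as np


def toy_model(token_ids: np.ndarray, dim: int = 4) -> np.ndarray:
    """Hypothetical stand-in for a real model: returns one vector per input token."""
    rng = np.random.default_rng(0)  # fixed seed so the output is repeatable
    embedding_table = rng.normal(size=(256, dim))
    return embedding_table[token_ids]


def encode(text: str) -> list:
    """Illustrative encode flow: data -> NumPy array -> model -> pooling -> Python list."""
    if not text:
        raise ValueError("text must not be empty")
    # 1. Convert the raw data into a NumPy array (here: its UTF-8 byte values).
    token_ids = np.frombuffer(text.encode("utf-8"), dtype=np.uint8)
    # 2. Feed the array through the model (shape: number of bytes x dim).
    token_vectors = toy_model(token_ids)
    # 3. Pool the per-token vectors into a single vector (mean pooling assumed).
    pooled = token_vectors.mean(axis=0)
    # 4. Transform the result into a native Python object.
    return pooled.tolist()


vector = encode("My dog loves taking long walks on the beach!")
print(type(vector), len(vector))
```

Returning a plain list of floats rather than a NumPy array means the result can be serialised to JSON or stored without the caller needing NumPy.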

For each data type, we have a section explaining how it is converted into vectors, so users can understand what happens at each step of the process.

These pages can be found below if you are interested:
