Guide To Vectors

How to turn text into Vectors

An introduction to turning data into vectors.


Last updated 4 years ago


Assumed Knowledge: Vectors
Target Audience: Python developers, general developers
Reading Time: 3 minutes

To help transform data into vectors, we open-sourced a library called VectorHub (you can explore the hub at hub.vctr.ai). You will need Python to use it, and everything below can be run on Colab.

The library can be installed via pip:

$ pip install vectorhub[encoders-text-tfhub]

Once you install via pip, you can then use a model in Python. For example:

from vectorhub.encoders.text.tfhub import USE2Vec

# Instantiate the text encoder
enc = USE2Vec()
text = "How do you encode data?"
# Encode the text into a vector
vector = enc.encode(text)

You now have a vector that can be indexed and stored for search. If you are interested in what happens under the hood, or want to write your own library for this, take a look below.
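As a rough illustration of what "indexed and stored for search" can mean, here is a minimal sketch that ranks stored vectors by cosine similarity to a query vector. The `cosine_similarity` and `search` helpers, the in-memory `index` dictionary and the 3-dimensional vectors are all made up for this example; real encoder output has many more dimensions, and a production system would use a dedicated vector index rather than a Python dictionary.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def search(query_vector, index, top_k=1):
    """Return the top_k (text, score) pairs, best match first."""
    scored = [
        (text, cosine_similarity(query_vector, vec))
        for text, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up 3-dimensional vectors standing in for real encoder output.
index = {
    "How do you encode data?": [0.9, 0.1, 0.0],
    "What is the weather today?": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]

results = search(query_vector, index, top_k=1)
print(results[0][0])  # the stored text closest to the query
```

In practice you would replace the made-up vectors with the output of an encoder such as the one shown earlier, and store them alongside the original text.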

What is occurring under the hood?

We vectorise a sentence by first tokenizing the text into subwords (called tokens). Each token is mapped to its own vector, and these vectors are then fed through the model.

Note: models are not necessarily trained to produce the best vectors or representation space for every task, so you will need to identify a suitable model for each use case. If there is a use case you would like covered, feel free to message us in our Discord.

Pipeline for text2vec
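
The pipeline described above (tokenize into subwords, map each token to a vector, feed the vectors through a model) can be sketched in a few lines of plain Python. This is a toy illustration, not how VectorHub or any particular model implements it: the vocabulary, the 2-dimensional embeddings and the `tokenize` and `encode` functions are hypothetical, the tokenizer is a simple greedy longest-match splitter, and the "model" is replaced by averaging the token vectors.

```python
# Hypothetical subword vocabulary mapped to made-up 2-dimensional vectors.
EMBEDDINGS = {
    "en": [1.0, 0.0],
    "code": [0.0, 1.0],
    "data": [1.0, 1.0],
    "[UNK]": [0.0, 0.0],  # placeholder vector for unknown pieces
}


def tokenize(text):
    """Split text into subword tokens by greedy longest match."""
    tokens = []
    for word in text.lower().split():
        start = 0
        while start < len(word):
            # Try the longest remaining piece first, then shrink it.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in EMBEDDINGS:
                    tokens.append(piece)
                    start = end
                    break
            else:
                # No known piece starts here: mark the rest as unknown.
                tokens.append("[UNK]")
                break
    return tokens


def encode(text):
    """Map each token to its vector, then average (stand-in for a model)."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no tokens")
    vectors = [EMBEDDINGS[token] for token in tokens]
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


tokens = tokenize("encode data")  # ["en", "code", "data"]
vector = encode("encode data")
print(tokens, vector)
```

A trained model does far more than average its inputs, but the shape of the pipeline is the same: text goes in, tokens are looked up, and a single fixed-length vector comes out.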