Semantic NLP search with FAISS and VectorHub

Last updated 4 years ago

Assumed Knowledge: Vectors
Target Audience: Data scientists, Python developers
Reading Time: 3 minutes

The following guide uses VectorHub and FAISS (by Facebook) to show an example of how to use vectors for search.

Step 0) Getting the right Python and requirements

Here, we use Python 3.6/3.7. We have tested the code on Colab, so you can run it even if you do not have Python installed locally. If you are interested in running the code in Colab, click here.

Step 1) Encoding Data With Vectors

First, we install VectorHub, which makes it easy to encode data with pretrained models. We install the encoders-text-tfhub extra because we want to use VectorHub's BERT model. You can find out more about the BERT model here. BERT is a model released by Google that produces bidirectional encodings using attention layers, and it led to a significant improvement in NLP performance.

%%capture
!pip install vectorhub[encoders-text-tfhub]

Next, we instantiate the model and start encoding. VectorHub reduces the dependency requirements to simple installation steps like the one above, and uses the model and default pooling strategy that performed best in our own tests. You can read more about Bert2Vec on the VectorHub model card here.

from vectorhub.encoders.text.tfhub import Bert2Vec

# Instantiate the BERT encoder
bert_enc = Bert2Vec()

# Sentences to encode
words = [
    'How can I design my own post-graduate education?',
    'How could water be produced on Mars?',
    'How can I fall in love?',
    'How can India improve in corruption?'
]

vectors = []
# A list comprehension would be more concise, but we use a
# plain loop here to keep the demo easy to read
for word in words:
    vector = bert_enc.encode(word)  # one fixed-length vector per sentence
    vectors.append(vector)
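VectorHub chooses the pooling strategy for you, but it helps to see what pooling means. BERT produces one vector per token, and a pooler collapses those token vectors into a single sentence vector. The sketch below shows mean pooling, one common strategy; it is an illustration only and not necessarily what Bert2Vec does internally. The function name mean_pool and the tiny 3-dimensional token vectors are made up for the example.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    """Average per-token vectors into a single sentence vector (mean pooling)."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    sums = [0.0] * dim
    for vec in token_vectors:
        if len(vec) != dim:
            raise ValueError("all token vectors must have the same length")
        # Accumulate each dimension separately
        for j, value in enumerate(vec):
            sums[j] += value
    # Divide each dimension's total by the number of tokens
    return [s / len(token_vectors) for s in sums]


# Hypothetical token embeddings for a 3-token sentence (real BERT vectors are much longer)
tokens = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0]]
print(mean_pool(tokens))  # [2.0, 2.0, 2.0]
```

Whatever the pooling strategy, the result is one fixed-length vector per sentence, which is what the FAISS index in the next step expects.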

Step 2) Building An Index

Next, we add our vectors to a FAISS index. FAISS stores only the vectors, not the sentences, so we keep the words list and use each result's position to look up its sentence. A FAISS index can be built in several ways; here we use IndexFlatL2, a flat index that measures Euclidean (L2) distance, and then add the vectors. FAISS expects float32 numpy arrays, so we convert the vectors before inserting them.

import numpy as np
import faiss

# FAISS expects a 2-D float32 array of shape (number of vectors, dimension)
xb = np.array(vectors).astype('float32')
vector_length = xb.shape[1]
index = faiss.IndexFlatL2(vector_length)  # exact (brute-force) index using L2 distance
index.add(xb)  # add vectors to the index
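IndexFlatL2 is an exact (brute-force) index: at query time it compares the query against every stored vector and returns the k vectors with the smallest squared L2 distance. The pure-Python sketch below mimics that behaviour for a single query on toy 2-D vectors so you can see what the index computes. The function names squared_l2 and flat_l2_search are our own, not part of FAISS, and the sketch trades FAISS's optimised kernels for readability.

```python
from typing import List, Tuple


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance, the quantity IndexFlatL2 reports."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(database: List[List[float]], query: List[float],
                   k: int) -> Tuple[List[float], List[int]]:
    """Return (distances, ids) of the k nearest stored vectors, closest first."""
    scored = [(squared_l2(vec, query), idx) for idx, vec in enumerate(database)]
    scored.sort()  # sort by distance, then by id for ties
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


# Toy database of four 2-D vectors
database = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0]]
D, I = flat_l2_search(database, [0.4, 0.1], k=2)
print(I)  # [3, 0]
```

FAISS's index.search does the same for a whole batch of queries at once, which is why it returns 2-D arrays of shape (number of queries, k) rather than flat lists.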

Step 3) Searching Our Index

Once the index is built, we encode the search query with the same model and ask FAISS for the stored vectors closest to the query vector.

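num_of_results = 3  # number of results to return
# Encode the search query with the same model used for the sentences
search_term = 'Building a better government'
query_vector = bert_enc.encode(search_term)
# D holds the squared L2 distances, I the positions of the matches in the index
D, I = index.search(np.array([query_vector]).astype('float32'), num_of_results)
# Print the results from closest to furthest
for i in range(num_of_results):
    print(words[I[0][i]])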
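The index above ranks results by L2 distance. Sentence embeddings are often compared with cosine similarity instead. If every vector is scaled to unit length, ranking by L2 distance gives the same order as ranking by cosine similarity, because for unit vectors the squared L2 distance equals 2 - 2·cos(a, b). In FAISS this is usually done by normalising vectors (for example with faiss.normalize_L2) before adding and querying. The pure-Python sketch below checks the idea on made-up toy vectors; whether cosine ranking works better than raw L2 for Bert2Vec vectors is something to test on your own data.

```python
import math
from typing import List


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy vectors: doc_b points in nearly the same direction as the query but is much longer
query = [1.0, 0.0]
docs = {"doc_a": [0.9, 0.9], "doc_b": [10.0, 1.0]}

# Raw L2 prefers the nearby vector; cosine prefers the one pointing the same way
raw_l2_rank = sorted(docs, key=lambda name: squared_l2(docs[name], query))
cosine_rank = sorted(docs, key=lambda name: -cosine(docs[name], query))
# After normalising both sides, the L2 ranking matches the cosine ranking
unit_l2_rank = sorted(docs, key=lambda name: squared_l2(normalize(docs[name]), normalize(query)))
print(raw_l2_rank, cosine_rank, unit_l2_rank)
# ['doc_a', 'doc_b'] ['doc_b', 'doc_a'] ['doc_b', 'doc_a']
```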

Voila! You have built a very basic semantic search with FAISS. From here, you can add more vectors to the index, improve the search, or use your own datasets. FAISS, however, offers limited support for more advanced search options such as searching with filters, multi-vector search and personalised search. For these requirements (as well as online storage), we recommend reading Vector Search With Vector AI, which covers our cloud-based vector search solution.
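One common workaround for the missing filter support in a plain FAISS setup is post-filtering: ask the index for more candidates than you need, then drop those that fail a metadata check. The pure-Python sketch below illustrates the idea with a brute-force search standing in for index.search and a hypothetical per-vector metadata list with a "topic" field. The names search_ids, filtered_search and overfetch are our own. This is not how Vector AI implements filters, and post-filtering can return fewer than k results if too few candidates pass the filter.

```python
from typing import Callable, Dict, List


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_ids(database: List[List[float]], query: List[float], k: int) -> List[int]:
    """Brute-force stand-in for index.search: ids of the k closest vectors."""
    order = sorted(range(len(database)), key=lambda i: squared_l2(database[i], query))
    return order[:k]


def filtered_search(database: List[List[float]], metadata: List[Dict[str, str]],
                    query: List[float], k: int,
                    keep: Callable[[Dict[str, str]], bool],
                    overfetch: int = 4) -> List[int]:
    """Post-filtering: fetch k * overfetch candidates, keep those passing the filter."""
    candidates = search_ids(database, query, k * overfetch)
    return [i for i in candidates if keep(metadata[i])][:k]


database = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [3.0, 3.0]]
# Hypothetical metadata, one dict per stored vector
metadata = [{"topic": "travel"}, {"topic": "politics"},
            {"topic": "travel"}, {"topic": "politics"}]
print(filtered_search(database, metadata, [0.0, 0.0], k=2,
                      keep=lambda m: m["topic"] == "politics"))  # [1, 3]
```

Raising overfetch makes it less likely that the filter empties the candidate list, at the cost of scoring more vectors per query.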
