💻Semantic NLP search with FAISS and VectorHub
Assumed Knowledge: Vectors
Target Audience: Data scientists, Python developers
Reading Time: 3 minutes
The following guide uses VectorHub and FAISS (by Facebook) to show an example of how to use vectors for search.
Step 0) Getting the right Python and requirements
This guide uses Python 3.6 or 3.7. We have tested the code on Colab, so it works even if you do not have Python installed locally. If you are interested in running the code in Colab, click here.
Step 1) Encoding Data With Vectors
First, we install VectorHub, which makes it easy to encode data with pretrained models. We install the encoders-text-tfhub extra requirement because we want to use VectorHub's BERT model. You can find more about the BERT model here. BERT is a model released by Google that uses attention layers to encode text bidirectionally, which led to a significant improvement in NLP performance.
Next, we instantiate the model and start encoding. VectorHub reduces the dependency requirements to a simple installation step like the one above, and it selects the model and default pooling strategy that performed best in our own tests. You can read more about Bert2Vec on the VectorHub model card here.
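The encoding step can be sketched roughly as follows. This is a minimal sketch, not the guide's original code: the import path `vectorhub.encoders.text.tfhub` and the `encode` method are assumed from the VectorHub README, and `encode_words`, `load_bert2vec`, and the example word list are hypothetical names introduced for illustration.

```python
# Install first (shell or notebook cell):
#   pip install vectorhub[encoders-text-tfhub]


def load_bert2vec():
    """Instantiate VectorHub's Bert2Vec encoder.

    Assumption: the class lives at this import path, as shown in the
    VectorHub README. The import is done here so the rest of the
    module loads even when VectorHub is not installed.
    """
    from vectorhub.encoders.text.tfhub import Bert2Vec
    return Bert2Vec()


def encode_words(model, words):
    """Encode each word into a vector using the model's encode method.

    `model` can be any object with an `encode(text)` method that
    returns a sequence of floats.
    """
    return [model.encode(word) for word in words]


# Usage (the model weights are downloaded on first use):
# model = load_bert2vec()
# words = ["dog", "cat", "car", "bicycle"]  # hypothetical example data
# word_vectors = encode_words(model, words)
```

Keeping `words` and `word_vectors` in the same order matters later, because a search returns row positions rather than the words themselves.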
Step 2) Building An Index
We then add our vectors to the FAISS index, keeping the associated words in the same order so that search results can be mapped back to them. The FAISS index can be instantiated in a number of different ways. In this case, we instantiate an L2 (Euclidean distance) index and then add the vectors. Because FAISS requires its input as float32 NumPy arrays, we convert the vectors to that format before inserting them into the index.
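A minimal sketch of the index-building step is shown here, assuming the `faiss-cpu` package is installed and that the vectors arrive as a list of equal-length lists of floats. The helper names `to_faiss_matrix` and `build_l2_index` are hypothetical; `faiss.IndexFlatL2` is FAISS's exact (brute-force) L2 index and is only one of several index types that could be used.

```python
import numpy as np


def to_faiss_matrix(vectors):
    """Convert vectors to the C-contiguous float32 matrix FAISS expects."""
    matrix = np.ascontiguousarray(vectors, dtype="float32")
    if matrix.ndim != 2:
        raise ValueError("expected an array of shape (n_vectors, dim)")
    return matrix


def build_l2_index(vectors):
    """Build an exact L2 index containing all the given vectors."""
    # Imported here so to_faiss_matrix works without FAISS installed.
    import faiss  # pip install faiss-cpu

    matrix = to_faiss_matrix(vectors)
    index = faiss.IndexFlatL2(matrix.shape[1])  # dimension of each vector
    index.add(matrix)  # row i of the matrix gets id i in the index
    return index


# Usage, with word_vectors being the encoded vectors:
# index = build_l2_index(word_vectors)
# print(index.ntotal)  # number of vectors stored
```

The index stores only the vectors, so the original list of words has to be kept alongside it to interpret the ids that a search returns.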
Step 3) Searching Our Index
Once the index is built, we encode the query into a vector and then locate the vectors in the index that are closest to it.
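The search step might look like the sketch below. It assumes `index` is a FAISS index (or any object with the same `search(queries, k)` method) and `words` is the list whose order matches the rows added to the index. The function name `search_words` and the example query are hypothetical.

```python
import numpy as np


def search_words(index, words, query_vector, k=3):
    """Return the k nearest words to query_vector as (word, distance) pairs."""
    # FAISS searches a batch of queries, so wrap the single query in a
    # float32 matrix of shape (1, dim).
    query = np.ascontiguousarray([query_vector], dtype="float32")
    distances, ids = index.search(query, k)
    # FAISS fills missing results with id -1 when the index holds
    # fewer than k vectors, so those entries are skipped.
    return [
        (words[i], float(d))
        for d, i in zip(distances[0], ids[0])
        if i != -1
    ]


# Usage, with model being the encoder used to build the index:
# query_vector = model.encode("puppy")
# for word, distance in search_words(index, words, query_vector, k=3):
#     print(word, distance)
```

With an L2 index, smaller distances mean closer matches; note that `IndexFlatL2` reports squared Euclidean distances.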
Voila! You have built a very basic semantic search with FAISS. From here, you may add more to the index, build improved search or use your own datasets. FAISS search, however, is limited in its ability to provide support for more advanced search options (searching with filters, multi-vector search, personalised search). For these additional requirements (as well as online storage), we recommend reading Vector Search With Vector AI which is our cloud-based vector search solution.