Scaling Up Search with Vector Databases

In the previous lesson, we built a semantic search pipeline over Paul Graham's collection of online essays. We used OpenAI's Embedding API to embed each paragraph of each essay and then performed an exhaustive search to find the 5 paragraphs most similar to a given query. However, we concluded by noting that (a) we are not persisting our embeddings anywhere and (b) our search would not scale well to millions of documents. In this lesson, we are going to address these limitations.
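As a reminder of what exhaustive search involves, here is a minimal sketch, not the previous lesson's actual code. It assumes cosine similarity as the similarity measure and uses random vectors as a stand-in for real paragraph embeddings; the names `top_k_exhaustive`, `paragraph_embeddings`, and `query_embedding` are hypothetical.

```python
import numpy as np

def top_k_exhaustive(query, embeddings, k=5):
    """Return the indices and cosine scores of the k rows most similar to query."""
    # Normalize so that a plain dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    m = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # One dot product per stored paragraph: the cost grows linearly with N.
    scores = m @ q
    top = np.argsort(-scores)[:k]  # highest scores first
    return top, scores[top]

# Stand-in data (assumption): 1,000 random 64-dimensional "embeddings".
rng = np.random.default_rng(0)
paragraph_embeddings = rng.normal(size=(1000, 64))

# A query that is a slightly perturbed copy of paragraph 42.
query_embedding = paragraph_embeddings[42] + 0.01 * rng.normal(size=64)

indices, scores = top_k_exhaustive(query_embedding, paragraph_embeddings)
print(indices, scores)
```

Every query compares against every stored vector, and the embeddings live only in memory for the duration of the script. Those are the two limitations this lesson addresses.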