Generating Contextual Embeddings with BERT

Introduction

In the previous lesson, A Primer on Text Embeddings, we covered the theory behind word and text embeddings, similarity scoring, and the principles of semantic search, as well as two of the earliest text embedding models, word2vec and doc2vec.

In recent years, however, both word2vec and doc2vec have been superseded by far more capable transformer-based models. In this lesson, we'll learn about one of the most important transformer-based embedding models, BERT, and how to use it with the Hugging Face transformers library. We'll then put the model to work by building a semantic search pipeline over a custom Wikipedia dataset.
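To preview where we're headed, here is a minimal sketch of generating a contextual embedding with a pretrained BERT model via the transformers library. The model name ("bert-base-uncased") and the mean-pooling step are illustrative choices for this preview; we'll walk through the details, and the full search pipeline, later in the lesson.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load a pretrained BERT tokenizer and model (illustrative choice of checkpoint)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentence = "Transformers produce contextual embeddings."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.last_hidden_state has shape (batch, tokens, hidden_size);
# mean-pooling over the token dimension gives one fixed-size vector per sentence
sentence_embedding = outputs.last_hidden_state.mean(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])
```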
