This course offers an in-depth exploration of text retrieval using text embeddings, a combination often referred to as semantic search. It covers the theory behind text embeddings, how to generate embeddings with both traditional and state-of-the-art models, and how to leverage those embeddings to build a text retrieval system. Finally, it explores how to scale these systems using high-performance vector databases.
A brief introduction to the course and the problem space of text retrieval. We'll define the problem, survey the history of approaches to solving it, and explain why it's so important to the field of NLP.
A deep dive into the intuition and theory behind text embeddings. We'll look at how embeddings work, the original text embedding models, and how to build semantic search pipelines.
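To make the pipeline concrete, here is a minimal sketch of the core ranking step in a semantic search pipeline, assuming queries and documents have already been converted to embedding vectors (the NumPy arrays here stand in for real model output):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query_vec: np.ndarray, doc_vecs: list, docs: list, top_k: int = 3):
    """Score every document against the query and return the top matches."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return ranked[:top_k]
```

Real pipelines replace this brute-force scan with an approximate nearest-neighbor index, which is exactly the scaling problem the final module addresses.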
A guide to generating powerful contextual text embeddings for semantic search using the breakthrough BERT model.
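As a rough illustration of what that looks like in practice, the sketch below mean-pools BERT's token embeddings into a single vector per sentence using Hugging Face's transformers library; the model name and the pooling strategy are illustrative choices, not necessarily the course's exact recipe:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts: list) -> torch.Tensor:
    """Return one mean-pooled BERT embedding per input string."""
    # Pad and truncate so a batch of texts can be processed at once.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool the token embeddings, masking out padding positions.
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    summed = (outputs.last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts
```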
A primer on using OpenAI's Embedding API to generate state-of-the-art embeddings for text retrieval and improve the performance of semantic search pipelines.
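For orientation, here is a minimal sketch of calling the Embedding API with the current openai Python SDK; the model name text-embedding-3-small is one of several available options:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list, model: str = "text-embedding-3-small") -> list:
    """Return one embedding vector per input string."""
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]
```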
An in-depth introduction to using vector databases like ChromaDB to scale up semantic search pipelines.
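As a taste of what this module covers, the sketch below uses ChromaDB's in-memory client; when documents are added without precomputed vectors, Chroma embeds them with its default embedding function, so no model setup appears here:

```python
import chromadb

client = chromadb.Client()  # in-memory; use a persistent client for real workloads
collection = client.create_collection(name="docs")

# Chroma embeds these documents automatically on insertion.
collection.add(
    ids=["1", "2"],
    documents=["Chroma stores embeddings", "Vector search scales retrieval"],
)

# The query text is embedded the same way, then matched against the index.
results = collection.query(query_texts=["how do I scale search?"], n_results=1)
print(results["documents"])
```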