Edocti
Advanced Technical Training for the Software Engineer of Tomorrow

Practical Vector Embeddings & Database Integration

Intermediate
7 h
4.9 (42 reviews)

Scheduled sessions

No sessions are available at the moment.

A modern AI data bootcamp: move beyond basic relational queries and unlock the power of semantic search using vector embeddings.

Learn how to map text, images, and complex data into high-dimensional vector spaces using modern embedding models (OpenAI, HuggingFace).

Master pgvector: Transform PostgreSQL into a highly efficient vector database. Understand indexing strategies (IVFFlat, HNSW) to balance query speed and recall accuracy.

Gain practical experience via ~70% hands-on labs, building a production-ready semantic search engine from scratch.

How this helps: Essential for building RAG (Retrieval-Augmented Generation) systems, recommendation engines, and advanced search features without relying on expensive managed vector DBs.

Who it’s for: Software Engineers, Data Engineers, and Database Administrators looking to integrate AI capabilities into their existing PostgreSQL infrastructure.

Skills You Will Learn

  • Vector Mathematics
  • Embedding Generation (Python)
  • PostgreSQL & pgvector
  • Cosine Similarity & L2 Distance
  • HNSW & IVFFlat Indexing
  • Hybrid Search (Semantic + Text)
  • RAG Data Ingestion Pipelines

Curriculum

Demystifying Vector Embeddings

  • What are embeddings? The transition from keyword search (BM25) to semantic search
  • High-dimensional vector spaces and distance metrics (Cosine Similarity, L2 distance, Inner Product)
  • Generating embeddings in Python: Using OpenAI APIs vs. local open-source models (SentenceTransformers/HuggingFace)
  • Mini-lab: Generating and comparing embeddings for text similarity in memory
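The in-memory part of this mini-lab can be sketched in plain Python. The toy 4-dimensional vectors below are hypothetical stand-ins for real model output (SentenceTransformers or OpenAI models produce hundreds to thousands of dimensions); only the distance metrics themselves are shown:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|); close to 1.0 means the vectors point the same way
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def l2_distance(a, b):
    # Euclidean (L2) distance; smaller means more similar
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy "embeddings" for three concepts
cat    = [0.90, 0.10, 0.00, 0.20]
kitten = [0.85, 0.15, 0.05, 0.25]
car    = [0.10, 0.90, 0.30, 0.00]

# "cat" should sit closer to "kitten" than to "car" under both metrics
print(cosine_similarity(cat, kitten), cosine_similarity(cat, car))
```

Swapping the toy lists for vectors returned by a real embedding model is all that changes when moving from this sketch to the lab.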

Introducing pgvector and PostgreSQL Integration

  • Why use PostgreSQL for vectors? ACID compliance + vector search
  • Installing and configuring the pgvector extension via Docker
  • Defining vector columns, inserting high-dimensional data, and basic exact nearest neighbor (k-NN) queries
  • Lab: Building a basic semantic search engine over a product catalog
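The shape of this lab can be sketched as the SQL a Python client would send to PostgreSQL. The `products` table, its columns, and the 3-dimensional vector size are illustrative assumptions (real embeddings use 384+ dimensions), and executing the statements via a driver such as psycopg is left out; the pgvector literal format and the `<->` L2-distance operator are as the extension defines them:

```python
def to_pgvector(vec):
    # pgvector accepts a bracketed literal such as '[0.1,0.2,0.3]'
    return "[" + ",".join(str(float(x)) for x in vec) + "]"

# Hypothetical schema for the product-catalog lab
DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE products (
    id        bigserial PRIMARY KEY,
    name      text,
    embedding vector(3)
);
"""

def knn_query(query_vec, k=5):
    # Exact k-NN: ORDER BY distance scans every row (no index needed yet);
    # <-> is L2 distance, <=> would be cosine distance
    sql = (
        "SELECT name, embedding <-> %s AS distance "
        "FROM products ORDER BY distance LIMIT " + str(int(k))
    )
    return sql, (to_pgvector(query_vec),)

sql, params = knn_query([0.1, 0.2, 0.3], k=3)
```

Passing `sql` and `params` to a parameterized `cursor.execute()` call keeps the vector literal safely quoted by the driver.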

Approximate Nearest Neighbor (ANN) Indexing

  • The scaling problem: Why exact k-NN is too slow for production
  • IVFFlat (Inverted File Flat) index: Concepts, building, and parameter tuning (lists, probes)
  • HNSW (Hierarchical Navigable Small World) index: The state-of-the-art for speed and recall
  • Lab: Benchmarking IVFFlat vs HNSW on a large dataset (speed vs. accuracy trade-offs)
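The scaling problem motivating this module can be seen in a minimal brute-force sketch: exact k-NN must compare the query against every stored vector, O(n · d) per query, which is exactly the scan that IVFFlat and HNSW indexes avoid. The corpus size, dimensionality, and table name here are illustrative:

```python
import heapq
import random

def exact_knn(corpus, query, k):
    # Full scan: compute the distance to EVERY vector, keep the k smallest.
    # This is what pgvector does with no index, and why it gets slow at scale.
    def l2sq(i):
        return sum((x - y) ** 2 for x, y in zip(corpus[i], query))
    return heapq.nsmallest(k, range(len(corpus)), key=l2sq)

random.seed(0)
dim = 32
corpus = [[random.random() for _ in range(dim)] for _ in range(2000)]
query = [random.random() for _ in range(dim)]
top = exact_knn(corpus, query, k=5)

# In pgvector, an ANN index replaces this scan with an approximate one, e.g.:
#   CREATE INDEX ON products USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
#   CREATE INDEX ON products USING hnsw (embedding vector_l2_ops);
```

Doubling the corpus doubles the work of `exact_knn`; the lab's benchmark measures how much of that cost IVFFlat and HNSW trade away, and how much recall each gives back.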

Building a Complete RAG Retriever Pipeline

  • Chunking strategies for long documents (Token splitters, semantic chunking)
  • Hybrid Search: Combining Full-Text Search (tsvector) with Semantic Search (pgvector) for superior results
  • Handling metadata filtering (e.g., semantic search within a specific date range or category)
  • Lab: End-to-end integration – From PDF ingestion to a functioning hybrid search API
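The simplest chunking strategy from this module, fixed-size windows with overlap so that no sentence is lost at a boundary, can be sketched as follows (chunk sizes and the character-based split are illustrative baselines; token-based and semantic chunking refine the same idea):

```python
def chunk_text(text, chunk_size=200, overlap=40):
    # Slide a fixed-size window over the text; consecutive chunks share
    # `overlap` characters so context spanning a boundary survives in both.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "word " * 100          # 500 characters of stand-in document text
parts = chunk_text(doc, chunk_size=200, overlap=40)
```

Each chunk is then embedded and inserted as one row, so retrieval granularity is decided here, before anything touches the database.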

Optional modules

Optional — Image and Multimodal Embeddings

  • Introduction to CLIP (Contrastive Language-Image Pretraining)
  • Generating image embeddings and querying them via pgvector
  • Building a reverse image search engine

Course Day Structure

  • Part 1: Concepts & Generation: 09:00–10:30
  • Break: 10:30–10:45
  • Part 2: DB Integration: 10:45–12:15
  • Lunch break: 12:15–13:15
  • Part 3: Indexing & Tuning: 13:15–15:15
  • Break: 15:15–15:30
  • Part 4: Real-world Lab: 15:30–17:30

Want to find out more? We are here to help!

Email us directly at training@edocti.com.