Practical Vector Embeddings & Database Integration
Modern AI Data bootcamp: Move beyond basic relational queries and unlock the power of semantic search using vector embeddings.
Learn how to map text, images, and complex data into high-dimensional vector spaces using modern embedding models (OpenAI, HuggingFace).
Master pgvector: Transform PostgreSQL into a highly efficient vector database. Understand indexing strategies (IVFFlat, HNSW) to balance query speed and recall accuracy.
Gain practical experience in a format that is roughly 70% hands-on labs, building a production-ready semantic search engine from scratch.
How this helps: Essential for building RAG (Retrieval-Augmented Generation) systems, recommendation engines, and advanced search features without relying on expensive managed vector DBs.
Who it’s for: Software Engineers, Data Engineers, and Database Administrators looking to integrate AI capabilities into their existing PostgreSQL infrastructure.
Curriculum
Demystifying Vector Embeddings
- What are embeddings? The transition from keyword search (BM25) to semantic search
- High-dimensional vector spaces and distance metrics (Cosine Similarity, L2 distance, Inner Product)
- Generating embeddings in Python: Using OpenAI APIs vs. local open-source models (SentenceTransformers/HuggingFace)
- Mini-lab: Generating and comparing embeddings for text similarity in memory (see the sketch after this list)
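A minimal sketch of what the mini-lab covers, assuming the sentence-transformers package is installed; the model name and example sentences are illustrative, not part of the course materials:

```python
# Minimal sketch: embed a few sentences with a local open-source model and
# compare them with cosine similarity. Model choice is an illustrative example.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small local model, 384-dim output

sentences = [
    "How do I reset my password?",
    "I forgot my login credentials.",
    "Best pizza places in Naples.",
]
embeddings = model.encode(sentences)  # numpy array of shape (3, 384)

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the two vectors divided by their norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings[0], embeddings[1]))  # related sentences -> high score
print(cosine_similarity(embeddings[0], embeddings[2]))  # unrelated -> low score
```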
Introducing pgvector and PostgreSQL Integration
- Why use PostgreSQL for vectors? ACID compliance + vector search
- Installing and configuring the pgvector extension via Docker
- Defining vector columns, inserting high-dimensional data, and basic exact nearest neighbor (k-NN) queries
- Lab: Building a basic semantic search engine over a product catalog (a worked sketch follows this list)
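A minimal sketch of the module's workflow, assuming a local PostgreSQL instance with the pgvector extension available and the psycopg and pgvector Python packages installed; the connection string, table schema, and random vectors are illustrative placeholders:

```python
# Minimal sketch: enable pgvector, define a vector column, insert a row, and
# run an exact nearest-neighbor query. Names and dimensions are illustrative.
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=shop user=postgres", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # lets psycopg send/receive vector values (e.g. numpy arrays)

conn.execute("""
    CREATE TABLE IF NOT EXISTS products (
        id bigserial PRIMARY KEY,
        name text,
        embedding vector(384)   -- must match the embedding model's output dimension
    )
""")

# In the lab this embedding would come from the model in the previous module.
embedding = np.random.rand(384).astype(np.float32)
conn.execute(
    "INSERT INTO products (name, embedding) VALUES (%s, %s)",
    ("waterproof hiking boots", embedding),
)

# Exact k-NN: `<->` is L2 distance; `<=>` would order by cosine distance instead.
query_vector = np.random.rand(384).astype(np.float32)
rows = conn.execute(
    "SELECT name FROM products ORDER BY embedding <-> %s LIMIT 5",
    (query_vector,),
).fetchall()
print(rows)
```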
Approximate Nearest Neighbor (ANN) Indexing
- The scaling problem: Why exact k-NN is too slow for production
- IVFFlat (Inverted File Flat) index: Concepts, building, and parameter tuning (lists, probes)
- HNSW (Hierarchical Navigable Small World) index: The state-of-the-art for speed and recall
- Lab: Benchmarking IVFFlat vs HNSW on a large dataset (speed vs. accuracy trade-offs; see the index-tuning sketch after this list)
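A minimal sketch of building and tuning both index types on the table from the previous lab; the parameter values are illustrative starting points rather than tuned recommendations, and in the benchmarking lab you would build one index at a time:

```python
# Minimal sketch: create an IVFFlat and an HNSW index on the `products` table
# and set their query-time knobs. Values are illustrative, not recommendations.
import psycopg

conn = psycopg.connect("dbname=shop user=postgres", autocommit=True)

# IVFFlat: partitions vectors into `lists` clusters; at query time only
# `probes` clusters are scanned, trading recall for speed. Build after loading data.
conn.execute("""
    CREATE INDEX IF NOT EXISTS products_embedding_ivfflat
    ON products USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)
""")
conn.execute("SET ivfflat.probes = 10")  # per session: more probes = better recall, slower

# HNSW: graph-based index; typically better recall/speed trade-off, slower to build.
conn.execute("""
    CREATE INDEX IF NOT EXISTS products_embedding_hnsw
    ON products USING hnsw (embedding vector_l2_ops) WITH (m = 16, ef_construction = 64)
""")
conn.execute("SET hnsw.ef_search = 40")  # per session: size of the candidate list at query time

# The same ORDER BY embedding <-> ... LIMIT k query now uses an index scan.
```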
Building a Complete RAG Retriever Pipeline
- Chunking strategies for long documents (Token splitters, semantic chunking)
- Hybrid Search: Combining Full-Text Search (tsvector) with Semantic Search (pgvector) for superior results
- Handling metadata filtering (e.g., semantic search within a specific date range or category)
- Lab: End-to-end integration, from PDF ingestion to a functioning hybrid search API (see the hybrid query sketch after this list)
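A minimal sketch of one way to combine full-text ranking, vector similarity, and metadata filtering in a single query; the extra columns (category, created_at), the score weights, and the simple weighted-sum fusion are illustrative assumptions (production systems often use reciprocal rank fusion instead):

```python
# Minimal sketch: hybrid search = ts_rank (keyword) + cosine similarity (semantic),
# restricted by metadata filters. Schema and weights are illustrative.
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=shop user=postgres", autocommit=True)
register_vector(conn)

query_text = "lightweight tent for winter camping"
query_embedding = np.random.rand(384).astype(np.float32)  # embed query_text in practice

rows = conn.execute(
    """
    SELECT name,
           ts_rank(to_tsvector('english', name), plainto_tsquery('english', %(q)s)) AS text_score,
           1 - (embedding <=> %(vec)s) AS semantic_score
    FROM products
    WHERE category = %(cat)s                -- metadata filter: category
      AND created_at >= %(since)s           -- metadata filter: date range
    ORDER BY 0.3 * ts_rank(to_tsvector('english', name), plainto_tsquery('english', %(q)s))
           + 0.7 * (1 - (embedding <=> %(vec)s)) DESC
    LIMIT 10
    """,
    {"q": query_text, "vec": query_embedding, "cat": "outdoor", "since": "2024-01-01"},
).fetchall()
print(rows)
```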
Optional modules
Optional — Image and Multimodal Embeddings
- Introduction to CLIP (Contrastive Language-Image Pretraining)
- Generating image embeddings and querying them via pgvector
- Building a reverse image search engine (see the CLIP sketch after this list)
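A minimal sketch of embedding an image and a text query into the same CLIP space using the sentence-transformers CLIP wrapper; the file name and model choice are illustrative:

```python
# Minimal sketch: CLIP maps images and text into one vector space, so a text
# query can be compared against stored image embeddings. Inputs are illustrative.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")  # 512-dimensional CLIP embeddings

image_embedding = model.encode(Image.open("product_photo.jpg"))
text_embedding = model.encode("a red running shoe")

# High cosine similarity means the caption matches the image; stored in a
# vector(512) column, the same pgvector queries power reverse image search.
print(util.cos_sim(image_embedding, text_embedding))
```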
Course Day Structure
- Part 1: Concepts & Generation: 09:00–10:30
- Break: 10:30–10:45
- Part 2: DB Integration: 10:45–12:15
- Lunch break: 12:15–13:15
- Part 3: Indexing & Tuning: 13:15–15:15
- Break: 15:15–15:30
- Part 4: Real-world Lab: 15:30–17:30