RAG Prototype - Part 1

A robot holding a schematic for a robot dog with the title "Augmented Retriever"

A peer recently inspired me to see how quickly I could build a RAG (Retrieval-Augmented Generation) system implementing a standard LLM chat interface, with responses limited to content from a collection of private technical documents.

As luck would have it, an older TalkPython podcast had introduced me to Gradio, an easy-to-use library focused on building ML-centric web apps. Combining that with the PostgreSQL extension pgvector would let me quickly build a prototype on my favorite database platform.

A secondary goal was to explore the limitations of a pure RAG-based approach. Various ML forums include complaints that RAG really stumbles when there's no nicely self-contained answer to the user's question. It also struggles when the user's request lacks adequately unique keywords, when the retrieved context is truncated, or when the problem is complex enough that the LLM would need to perform multiple rounds of retrieval.

Initial Technology List

The plan was to first build this in my local lab, then formalize a GitHub or GitLab CI/CD pipeline to regenerate the backend DB whenever new source material was published. Ideally, the result would then be shared via a free tier on Hugging Face or Google Colab, letting me use it on the road for non-sensitive data sources.

  • UI
    • Gradio
  • Data storage and search
    • PostgreSQL with the pgvector extension
  • Models
    • Original build:
      • text-embedding-qwen3-embedding-8b for embedding (4.7 GB)
      • qwen3.6-35b-a3b for chat (22.1 GB)
    • Post-optimization pass to improve accuracy and reduce GPU requirements:
      • text-embedding-granite-embedding-107m-multilingual for embedding (121 MB)
      • granite-4.1-30b for chat (17.5 GB)
  • CI/CD
    • GitHub - I'm rusty on GitHub, and this was a good opportunity to refresh my knowledge of it. After being spoiled by the paid version of GitLab at two prior employers, I was pleasantly surprised to see how much progress had been made in GitHub workflows.
      • The pull request process also feels far more intuitive now; my recollection is that it used to require alternating between the web UI and the git CLI to make sure you properly understood the expected outcome of the merge.

Database Setup

As a general rule, I assume that whatever I build will end up being used in ways that I can't conceive of, and will have a longer lifespan than originally planned. Therefore, dependencies need to be from a reasonably trustworthy source to minimize my maintenance effort and keep vulnerabilities to a bare minimum.

As I'm a fan of containers, I located a pre-built Docker image with PostgreSQL + the pgvector extension pre-installed, published by the author of the extension.

https://github.com/pgvector
https://hub.docker.com/r/pgvector/pgvector

If we review the layers of the container image, we can verify how it was constructed. jq is used here to filter the JSON-formatted output of docker history down to the literal commands that were executed during the image build.

# Grab the image we want to use
docker pull pgvector/pgvector:pg18-trixie

# Also grab the official postgres image so we can compare it
docker pull postgres:18-trixie

# Export each layer's "CreatedBy" field into a text file
docker history pgvector/pgvector:pg18-trixie --no-trunc --format json | jq .CreatedBy > vector.txt

docker history postgres:18-trixie --no-trunc --format json | jq .CreatedBy > postgres.txt

# Compare the vector layers against the postgres layers
diff vector.txt postgres.txt

Commands to compare docker images

We can see the pgvector image's layers are built identically to the vanilla PostgreSQL layers, with the exception of three added layers (the first three lines of the diff output). The only other difference is an unimportant creation timestamp on the base layer of each image.

1,3d0

< "RUN |1 PG_MAJOR=18 /bin/sh -c apt-get update &&   
  apt-mark hold locales &&   
  apt-get install -y --no-install-recommends build-essential postgresql-server-dev-$PG_MAJOR &&
  cd /tmp/pgvector &&
  make clean &&
  make OPTFLAGS=\"\" &&
  make install &&
  mkdir /usr/share/doc/pgvector &&
  cp LICENSE README.md /usr/share/doc/pgvector &&
  rm -r /tmp/pgvector &&
  apt-get remove -y build-essential postgresql-server-dev-$PG_MAJOR &&
  apt-get autoremove -y &&
  apt-mark unhold locales &&
  rm -rf /var/lib/apt/lists/* # buildkit"

< "ADD https://github.com/pgvector/pgvector.git#v0.8.2 /tmp/pgvector # buildkit"

< "ARG PG_MAJOR=18"

27c24
< "# debian.sh --arch 'amd64' out/ 'trixie' '@1771804800'"
---
> "# debian.sh --arch 'amd64' out/ 'trixie' '@1777939200'"

Diff output (formatted for readability)

For reference, docker history lists the most recent layers first, so reading bottom-up: the build argument PG_MAJOR was set, the pgvector source code was copied into the /tmp directory, and then the actual installation was performed. The installation layer follows a best practice of chaining many commands into a single RUN, which minimizes the total number of layers in the image and keeps the layer from being bloated with unneeded files: the build toolchain is removed in the same layer that installed it.
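To see why the single chained RUN matters, consider a naive alternative that splits the steps across several RUN instructions. This is a hypothetical anti-pattern sketch, not the actual pgvector build file: each RUN produces its own immutable layer, so tools "removed" in a later layer still ship in the image.

```dockerfile
# Hypothetical anti-pattern: each RUN creates a separate, immutable layer
RUN apt-get update && apt-get install -y build-essential   # layer A: build tools added
RUN cd /tmp/pgvector && make && make install               # layer B: compiled extension
RUN apt-get remove -y build-essential && apt-get autoremove -y   # layer C: tools hidden

# Layer C removes the tools from the final filesystem view, but layer A
# is already baked in -- the image still carries its full download size.
# Chaining install, build, and cleanup in ONE RUN (as pgvector does)
# means the tools never persist in any layer.
```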

Docker Compose

After reading through the pgvector and base postgres image docs, we can construct a minimal docker compose file that keeps the DB password out of our source code and gives us an external volume to persist the database files across image upgrades.

A shared memory size of 128 MB covers PostgreSQL's default needs and is suitable for a small development lab, though it will slow index builds and vacuum (maintenance) operations on much larger input datasets. I'm also using a named volume here: I develop on both Windows and Linux, and I dislike fussing with OS-specific file paths until I'm much closer to production-ready code.

services:

  vectordb:
    image: pgvector/pgvector:pg18-trixie
    environment:
      # Must define PG_PASSWORD=xyz in a .env      
      # or set as an env variable for the first launch      
      POSTGRES_PASSWORD: ${PG_PASSWORD}
    ports:
      - "5432:5432"
    restart: unless-stopped
    shm_size: 128mb
    volumes:
      - pgdata:/var/lib/postgresql

volumes:
  pgdata:

Basic docker-compose.yml file for the pgvector image

After defining PG_PASSWORD in a .env file in the same directory (and making sure .env is in our .gitignore), a container instance is spun up and an empty database is initialized with our specified password. It can then be accessed with a database IDE such as pgAdmin or DBeaver, using the default username postgres, the specified password, and the default database name postgres on port 5432.

docker compose up -d

Shell command to launch the container

SELECT version();

SQL query for the PostgreSQL version

PostgreSQL 18.3 (Debian 18.3-1.pgdg13+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 14.2.0-19) 14.2.0, 64-bit

Version reported by SQL query
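The same check can also be scripted, which will be handy once the Gradio app needs a connection. A minimal sketch, assuming the psycopg driver and the same PG_PASSWORD environment variable the compose file uses; pg_dsn is a hypothetical helper, not part of any library:

```python
import os

def pg_dsn(host: str = "localhost", port: int = 5432,
           user: str = "postgres", dbname: str = "postgres") -> str:
    """Build a libpq-style connection string from the compose defaults.

    PG_PASSWORD is the same variable our .env file defines; it is read
    at call time so the password never lands in source code.
    """
    password = os.environ["PG_PASSWORD"]
    return (f"host={host} port={port} user={user} "
            f"dbname={dbname} password={password}")

# Usage (requires the running container and psycopg installed):
# import psycopg
# with psycopg.connect(pg_dsn()) as conn:
#     print(conn.execute("SELECT version();").fetchone()[0])
```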

SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
LIMIT 10;

-- This will return zero results

SQL query to verify the database has no tables
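Since the whole point of this image is the vector extension, it's also worth confirming the extension is actually available before moving on. One way to check is PostgreSQL's pg_available_extensions catalog view; note the extension still has to be enabled per-database with CREATE EXTENSION before use.

```sql
-- Confirm the pgvector extension ships with this image
-- (it is listed as 'vector', not 'pgvector')
SELECT name, default_version
FROM pg_available_extensions
WHERE name = 'vector';
```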

Now that I'm confident we have a viable database, it's time to define the vector table and put some data into it. Continued in RAG Prototype - Part 2.