Andrew Flett
WTF

An agentic RAG experiment across 20 years of professional history

WTF is a personal agent I built for my freelance work. It ingests twenty years of pitches, project plans, research and retrospectives into a vector store, then lets me ask questions about my own back catalogue. Two modes: explore the corpus to find what I already did and have inevitably forgotten, or generate a new pitch or plan structured against past work so I am not starting from a blank page at 11pm the night before.

Every answer comes with citations back to the source passage, which matters because I do not entirely trust myself, never mind an LLM, to remember what actually happened on a project from 2009.

How it works

A question is matched against the corpus by both meaning and keyword simultaneously. The semantic search finds passages that are conceptually related; the keyword search finds ones that contain the actual words. Both results are fused into a single ranked list using Reciprocal Rank Fusion, which is a fancy way of saying the passages that show up near the top of both lists get promoted. This matters because pure vector search tends to miss exact terminology, and pure keyword search misses meaning. Combining them gets more of the right stuff to the top.
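The fusion step itself is tiny. A sketch, assuming each retriever returns passage ids in rank order; the ids and the conventional k = 60 smoothing constant here are illustrative, not the app's actual values:

```typescript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per passage,
// so passages ranked highly in BOTH lists accumulate the largest scores.
function rrfFuse(rankedLists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// "p2" sits near the top of both lists, so it wins the fused ordering
// even though neither list ranks it first and second respectively.
const semantic = ["p1", "p2", "p3"];
const keyword = ["p2", "p4", "p1"];
console.log(rrfFuse([semantic, keyword])); // → ["p2", "p1", "p4", "p3"]
```

The high k relative to list length flattens the contribution curve, so a passage's presence in both lists matters more than its exact position in either.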

That ranked list goes to the agent, which reads it and makes a decision: enough context to answer, or not. If not, it rewrites the query and searches again, up to three times before it gives up and admits it does not know. When it does have enough, it writes the answer directly from the retrieved passages and attaches an inline citation to each claim so you can see exactly where it came from. This is less a trust mechanism than a sanity check, given that the source material is my own work and I should probably know what is in it.
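Stripped of the framework, the loop is simple. A minimal sketch with hypothetical search, assess and generate helpers standing in for the LangGraphJS nodes (names and signatures are assumptions):

```typescript
type Passage = { id: string; text: string };

// Sketch of the assess-and-retry loop: search, judge sufficiency, optionally
// rewrite the query, and give up honestly after maxAttempts.
async function answerWithRetries(
  question: string,
  search: (q: string) => Promise<Passage[]>,
  assess: (q: string, ctx: Passage[]) => Promise<{ sufficient: boolean; betterQuery?: string }>,
  generate: (q: string, ctx: Passage[]) => Promise<string>,
  maxAttempts = 3,
): Promise<string> {
  let query = question;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const passages = await search(query);
    const verdict = await assess(question, passages);
    if (verdict.sufficient) return generate(question, passages);
    query = verdict.betterQuery ?? query; // sharpen the query and go again
  }
  return "I don't know — the corpus has nothing close enough.";
}
```

The assessment always judges against the original question, not the rewritten query, so successive rewrites cannot drift the answer away from what was actually asked.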

Stage 01

Document ingestion and vector indexing

Every PDF, article and case study is ingested, broken into passages, and encoded as a high-dimensional vector, building a searchable memory of everything produced.

Stage 02

Hybrid retrieval with RRF re-ranking

Questions are matched against the corpus by meaning and by keyword simultaneously, then fused into a single ranked list of the most relevant passages.

Stage 03

Agentic reasoning loop

The agent reads what it found and decides whether to search again with a sharper query, looping until it has enough to give a confident answer.

Stage 04

Grounded synthesis with citations

The response is written directly from the retrieved passages, with every claim linked back to the source document it came from.
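The chunking in stage one can be sketched in a few lines: split each document into overlapping passages before embedding, so context straddling a boundary is not lost. The sizes are illustrative, not the app's real strategy:

```typescript
// Split a document into fixed-size passages with some overlap, so a sentence
// falling on a chunk boundary still appears whole in at least one passage.
// Sizes are illustrative assumptions, not the app's actual chunking config.
function chunk(text: string, size = 800, overlap = 100): string[] {
  const passages: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    passages.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // final passage reached
  }
  return passages;
}
```

Each passage, not each document, is what gets embedded and cited, which is why answers can point at a specific source paragraph rather than a whole PDF.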

What the agent knows

The corpus is structured, not just a pile of PDFs scraped off a hard drive. At ingestion an LLM extracts metadata for each document (client, year, type, sector, outcome) and stores it alongside the embeddings. That is what makes the thing useful rather than a fancy search box. The agent can reason about what worked, what shape of problem this resembles, and which old project is worth pulling forward.
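The app validates that extracted metadata with a Zod schema; a plain-types sketch of the shape and of why it helps, where the field names mirror the list above but the exact schema is an assumption:

```typescript
// Hypothetical shape of the per-document metadata extracted at ingestion.
// The real app validates this with Zod; field names are assumptions.
interface DocMeta {
  client: string;
  year: number;
  type: "pitch" | "plan" | "research" | "retrospective";
  sector: string;
  outcome: string;
}

// Storing metadata alongside the embeddings means retrieval can pre-filter
// before similarity search, e.g. "pitches from the last few years":
function filterDocs(docs: DocMeta[], since: number, type: DocMeta["type"]): DocMeta[] {
  return docs.filter((d) => d.year >= since && d.type === type);
}
```

It is the filter step that turns "find passages like this" into "find pitches like this that actually won", which is the difference between a search box and something worth asking questions.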

The corpus panel summarises what is in memory: counts by type, distribution across years, and a list of the most recently ingested documents.

Tech stack

The interesting part is the AI stack, not the web stack. Everything below sits behind a thin Next.js app with the usual frontend bits, which I will spare you.

Retrieval
Voyage AI (voyage-3)

1024-dim vectors for semantic search

pgvector (cosine)

Vector store on Supabase Postgres, IVFFlat index

Postgres FTS (tsvector)

Keyword retrieval fused with semantic via RRF

Agent
LangGraphJS

Explicit retrieve → assess → generate loop

Zod + AI SDK

Structured outputs for assessment and metadata

Custom SSE

Streams LangGraph state updates to the client

Inference
Groq (openai/gpt-oss-20b)

Default. Fast enough for an agentic loop

Anthropic (claude-sonnet-4-5)

Swappable via AI_PROVIDER env var
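The custom SSE piece amounts to serialising each LangGraph state update into a text frame for the browser's EventSource to consume. A minimal sketch; the event name and payload shape are assumptions, not the app's actual protocol:

```typescript
// Serialise one agent state update as a Server-Sent Events frame.
// SSE frames are plain text: an event name line, a data line, a blank line.
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// e.g. streaming the assessment step's verdict mid-loop:
const frame = sseFrame("assess", { sufficient: false, attempt: 1 });
```

Because each LangGraph node emits its state as it runs, the UI can show "thinking → hybrid_search → assess → answering" live rather than spinning until the final answer lands.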