End to End RAG with Postgres
And how Timescale is fixing RAG pipelines using its pgai ecosystem
In the world of Retrieval-Augmented Generation (RAG) systems built on internal data, embeddings need to be stored in a vector database. Documents are extracted, embedded, and loaded into that store, which echoes the ETL pipelines of the Big Data world.
And as with ETL workflows, batch updates that run at a cadence (nightly, weekly, etc.) lead to out-of-sync data. Internal sources such as wikis and documentation change frequently, so the vector database needs to be refreshed just as frequently.
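To make the staleness problem concrete, here is a toy sketch of how a batch pipeline might detect which documents have changed since the last embedding run. This is purely illustrative (it is not how pgai works); the `docs` and `embedded` dictionaries and all document ids are made up.

```python
# Illustrative sketch: find documents whose embeddings are stale because
# the source text changed after the last batch sync. Not pgai's mechanism.
import hashlib

def content_hash(text: str) -> str:
    """Fingerprint of the source text at embedding time."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def stale_doc_ids(docs: dict, embedded_hashes: dict) -> list:
    """Return ids whose source text no longer matches the hash
    captured when it was last embedded."""
    return [
        doc_id for doc_id, text in docs.items()
        if embedded_hashes.get(doc_id) != content_hash(text)
    ]

# Current wiki contents (hypothetical data)
docs = {
    "wiki/onboarding": "Step 1: request access...",
    "wiki/deploy": "Use the CI pipeline.",
}
# Hashes captured at the last nightly sync; "wiki/deploy" was edited since.
embedded = {
    "wiki/onboarding": content_hash("Step 1: request access..."),
    "wiki/deploy": content_hash("Old deploy instructions."),
}
print(stale_doc_ids(docs, embedded))  # -> ['wiki/deploy']
```

Every batch pipeline ends up maintaining bookkeeping like this by hand; the point of pgai Vectorizer is that Postgres tracks inserts and updates itself, so no such sync job is needed.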
External vector databases such as Pinecone or Milvus need an integration with frameworks like LlamaIndex or LangChain. Updating them also means testing and validating, among other tasks.
So the question is
What if the vector data store could handle it natively?
Enter Postgres.
One of my personal favourite videos on Postgres is from “Art of the Terminal” named “Wait... PostgreSQL can do WHAT?”
Postgres as an ecosystem has everything you could ask for: a REST API with PostgREST, a vector store with pgvector, and more. And that is just the tip of the iceberg. This is what allows Postgres to act as a smart data store that can handle updates itself.
Timescale has released an extension, pgai Vectorizer. This extension handles:
- Data storage (it is Postgres, so storage is a solved problem).
- Inserting, updating, and embedding data in Postgres itself.
SELECT ai.create_vectorizer(
<my_table_name>::regclass,
destination => <embedding_table_name>,
-- Embedding model to use like `text-embedding-3-small` with dimension number
embedding => ai.embedding_openai(<model_name>, <dimensions>),
-- Chunking strategy
chunking => ai.chunking_recursive_character_text_splitter(<column_name>)
);
- Implementing RAG with a SQL function (just one!).
-- 1. Create a reusable RAG function
CREATE OR REPLACE FUNCTION my_rag_response(query_text TEXT)
RETURNS TEXT AS $$
DECLARE
    context_chunks TEXT;
    response TEXT;
BEGIN
    -- Perform a similarity search to find the most relevant wiki chunks
    SELECT string_agg(title || ': ' || chunk, ' ') INTO context_chunks
    FROM (
        SELECT title, chunk
        FROM wiki_embedding
        ORDER BY embedding <=> ai.openai_embed('text-embedding-3-small', query_text)
        LIMIT 3
    ) AS relevant_chunks;
    -- Generate an answer using gpt-4o-mini
    SELECT ai.openai_chat_complete(
        'gpt-4o-mini',
        jsonb_build_array(
            jsonb_build_object('role', 'system', 'content', 'You are a helpful assistant. Use only the context provided to answer the question. Also mention the titles of the wiki pages you use to answer the question.'),
            jsonb_build_object('role', 'user', 'content', format(E'Context: %s\n\nUser Question: %s\n\nAssistant:', context_chunks, query_text))
        )
    )->'choices'->0->'message'->>'content' INTO response;
    RETURN response;
END;
$$ LANGUAGE plpgsql;
-- 2. Execute the RAG function
SELECT my_rag_response('How do I onboard this new service?');
- Approximate nearest-neighbor (ANN) search via pgvectorscale, which RAG applications need to stay fast at scale.
For a real demo, see the one Timescale has shared here:
Closing thoughts
So try it out in your Postgres instance. My closing thought is simply:
“Postgres is awesome”