End to End RAG with Postgres

And how Timescale is fixing RAG pipelines using its pgai ecosystem

In the world of Retrieval-Augmented Generation (RAG) systems that serve specific internal data, embeddings need to be stored in a vector database. This ingestion step echoes the ETL stage of Big Data pipelines.

And similar to ETL workflows, data falls out of sync because batch updates happen on a cadence (nightly, weekly, etc.). Internal data sources such as wikis or documentation change frequently, so the vector database must be refreshed just as often.

External vector databases such as Pinecone or Milvus require integration with frameworks like LlamaIndex or LangChain. Keeping them updated also means testing and validation, among other tasks.

So the question is:

What if the vector data store could handle it natively?

Enter Postgres.

One of my personal favourite videos on Postgres is from “Art of the Terminal” named “Wait... PostgreSQL can do WHAT?”

Postgres as an ecosystem has everything you could ask for: a REST API with PostgREST, a vector store with pgvector, and more. That is just the tip of the iceberg. It is what allows Postgres to be a smart data store that can handle updates itself.
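To give a taste of the pgvector side, here is a minimal sketch (the `wiki` table and the embedding dimension are illustrative assumptions, not from this article):

```sql
-- Enable pgvector and store embeddings next to the source text
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical table: wiki pages with a 1536-dimension embedding column
CREATE TABLE wiki (
    id        BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title     TEXT NOT NULL,
    content   TEXT NOT NULL,
    embedding VECTOR(1536)
);

-- Similarity search: <=> is pgvector's cosine-distance operator
SELECT title
FROM wiki
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector  -- placeholder query vector
LIMIT 3;
```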

Timescale has released an extension, pgai Vectorizer. This extension handles:

  1. Data storage (it is Postgres, so this comes for free).

  2. Inserting and updating embeddings in Postgres itself.

SELECT ai.create_vectorizer( 
    <my_table_name>::regclass, 
    destination => <embedding_table_name>,
    -- Embedding model to use like `text-embedding-3-small` with dimension number
    embedding => ai.embedding_openai(<model_name>, <dimensions>),
    -- Chunking strategy
    chunking => ai.chunking_recursive_character_text_splitter(<column_name>)
);
  3. Implement RAG with SQL statements (just one function!).
-- 1. Create a reusable RAG function
CREATE OR REPLACE FUNCTION my_rag_response(query_text TEXT)
RETURNS TEXT AS $$
DECLARE
   context_chunks TEXT;
   response TEXT;
BEGIN
   -- Perform a similarity search to find the most relevant wiki chunks
   SELECT string_agg(title || ': ' || chunk, ' ') INTO context_chunks
   FROM (
       SELECT title, chunk
       FROM wiki_embedding
       ORDER BY embedding <=> ai.openai_embed('text-embedding-3-small', query_text)
       LIMIT 3
   ) AS relevant_posts;

   -- Generate a summary using gpt-4o-mini
   SELECT ai.openai_chat_complete(
       'gpt-4o-mini',
       jsonb_build_array(
           jsonb_build_object('role', 'system', 'content', 'You are a helpful assistant. Use only the context provided to answer the question. Also mention the titles of the blog posts you use to answer the question.'),
           jsonb_build_object('role', 'user', 'content', format(E'Context: %s\n\nUser Question: %s\n\nAssistant:', context_chunks, query_text))
       )
   )->'choices'->0->'message'->>'content' INTO response;

   RETURN response;
END;
$$ LANGUAGE plpgsql;

-- 2. Execute the RAG function
SELECT my_rag_response('How do I onboard this new service?');
  4. Use pgvectorscale for approximate nearest neighbour (ANN) search, which RAG applications need at scale.
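A minimal sketch of what that looks like, assuming pgvectorscale is installed and using the `wiki_embedding` table from the function above:

```sql
-- Enable pgvectorscale (CASCADE pulls in pgvector as a dependency)
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;

-- Build a StreamingDiskANN index for fast approximate
-- nearest-neighbour search over the embedding column
CREATE INDEX wiki_embedding_diskann_idx
    ON wiki_embedding
    USING diskann (embedding vector_cosine_ops);
```

Once the index is in place, the `ORDER BY embedding <=> ...` query in the RAG function uses it automatically.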

For a real demo, see the one Timescale has shared here:

Resources:

  1. Timescale Blog

  2. PgAI Github repository

Closing thoughts

So try it out in your Postgres instance. My closing thought is:

“Postgres is awesome”
