RAG service

Now, we'll put together everything we've learnt so far.

I have a file, doc1.txt, with the following content:

cat says meow

And I have another file, doc2.txt, with the following content:

dog will bark

Let's create a simple function that converts the query into an embedding and finds the matching documents.

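TypeScript Code:

async function queryEmbedding(query: string) {
  // Convert the query text into an embedding vector
  const embedding = await createEmbeddingOpenAI(query);
  // Call the query_documents database function through Supabase RPC
  const { data, error } = await supabase.rpc("query_documents", {
    query_embedding: embedding,
  });
  // Surface RPC failures instead of silently returning null
  if (error) throw error;

  return data;
}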

Python Code:

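def query_embedding(query: str):
    # Convert the query text into an embedding vector
    embedding = create_embedding_openai(query)
    # Call the query_documents database function through Supabase RPC
    response = supabase.rpc("query_documents", {"query_embedding": embedding}).execute()
    # Return only the matching rows, as the TypeScript version does
    return response.data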

Now we'll put everything together for querying:

Read the files
Create embeddings for the contents of those files
Get the query
Create embedding for the query
Find the matching documents

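TypeScript Code:

async function main() {
  // Create embeddings for both files, then search them with a one-word query
  await createEmbeddingFromFiles(["data/doc1.txt", "data/doc2.txt"]);
  const query = "dog";
  const results = await queryEmbedding(query);
  console.log(results);
}

// Log any failure instead of leaving the promise rejection unhandled
main().catch(console.error);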

Python Code:

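def main():
    # Create embeddings for both files, then search them with a one-word query
    create_embedding_from_files(["data/doc1.txt", "data/doc2.txt"])
    response = query_embedding("dog")
    print(response)

if __name__ == "__main__":
    main()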

When I execute the above code, I get the following result in the console:

[
  { doc_name: 'data/doc1.txt', similarity: 0.360020222665688 },
  { doc_name: 'data/doc2.txt', similarity: 0.545840669796527 }
]

As you can see, doc2.txt is about a dog, which is why its similarity score (about 0.55) is higher than that of doc1.txt (about 0.36).
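
To make the similarity score a bit more concrete, here is a minimal sketch in Python. It assumes the score returned by query_documents is a cosine similarity, which is a common choice for embeddings but is not confirmed by the code in this section. The helper names cosine_similarity and best_match are made up for illustration, and the 2-dimensional vectors are toy values, not real embeddings. The rows are copied from the console output above.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(rows: list[dict]) -> dict:
    """Return the row with the highest similarity score (hypothetical helper)."""
    return max(rows, key=lambda row: row["similarity"])

# Rows copied from the console output shown earlier
rows = [
    {"doc_name": "data/doc1.txt", "similarity": 0.360020222665688},
    {"doc_name": "data/doc2.txt", "similarity": 0.545840669796527},
]

print(best_match(rows)["doc_name"])  # data/doc2.txt

# Toy vectors: identical directions score 1.0, perpendicular ones score 0.0
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Under that assumption, a higher score means the query vector and the document vector point in a more similar direction, so picking the row with the largest similarity gives the closest document.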