Querying
Querying is a two-step process:
- Convert the query text to an embedding.
- Run a similarity search that compares the query embedding with the document embeddings stored in the database.
In the previous section, we learned how to convert file content to an embedding. The process does not change; the only difference is that we now embed the query text instead of the file content.
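As a minimal offline sketch of step 1, the code below replaces `createEmbeddingOpenAI` (which calls the OpenAI API) with a hypothetical toy `embed` function that counts words from a small fixed vocabulary. It is not a real embedding model. It only illustrates that documents and queries pass through the same function and come out as vectors of the same dimension, which is what makes them comparable.

```python
import math

# Hypothetical toy vocabulary; a real setup uses an embedding model such as
# OpenAI's, which returns 1536-dimensional vectors.
VOCAB = ["cat", "dog", "pet", "car", "engine", "road"]


def embed(text: str) -> list[float]:
    """Toy embedding: word counts over VOCAB, scaled to unit length."""
    words = text.lower().split()
    counts = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(c * c for c in counts))
    # Text with no vocabulary words stays an all-zero vector
    return [c / norm for c in counts] if norm else counts


# Step 1 is identical for documents and queries: same function, same dimension.
doc_vec = embed("my dog is a good pet")
query_vec = embed("pet dog")
print(len(doc_vec), len(query_vec))
```

Because both vectors live in the same space, the second step can compare them directly.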
Semantic Search
In an earlier section, we mentioned that when text is converted to embeddings, semantically related words and sentences end up close to each other in the vector space.
So we can compute the distance between the query embedding and each document embedding and turn that distance into a similarity score: the smaller the distance, the more similar the document.
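To make the distance-to-score step concrete, here is a minimal pure-Python sketch of cosine similarity and cosine distance (distance = 1 - similarity). The vectors are tiny hand-picked examples, not real 1536-dimensional embeddings, and the functions assume non-zero vectors.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector lengths (non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance, the quantity pgvector's <=> operator returns."""
    return 1.0 - cosine_similarity(a, b)


query = [1.0, 0.0]
close_doc = [0.9, 0.1]  # points in nearly the same direction as the query
far_doc = [0.0, 1.0]    # orthogonal to the query

# Smaller distance means more similar; similarity = 1 - distance.
for doc in (close_doc, far_doc):
    d = cosine_distance(query, doc)
    print(f"distance={d:.3f} similarity={1 - d:.3f}")
```

Cosine similarity ignores vector length and compares only direction, so it ranges from -1 (opposite) through 0 (unrelated) to 1 (same direction).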
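The Postgres function below does this with the pgvector extension. Its `<=>` operator returns the cosine distance between two vectors, so `1 - distance` gives the cosine similarity: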
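CREATE OR REPLACE FUNCTION query_documents(query_embedding vector(1536))
RETURNS TABLE (doc_name TEXT, similarity FLOAT) AS $$
BEGIN
  RETURN QUERY
  -- <=> is pgvector's cosine distance; 1 - distance = cosine similarity
  SELECT d.doc_name, 1 - (d.embedding <=> query_embedding) AS similarity
  FROM docs d
  -- Most similar documents (smallest distance) first
  ORDER BY d.embedding <=> query_embedding;
END;
$$ LANGUAGE plpgsql;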
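For intuition, the same search can be sketched in memory. This hypothetical Python `query_documents` takes the documents as a dict mapping `doc_name` to its embedding, scores each one with cosine similarity, and returns `(doc_name, similarity)` pairs ranked highest first. The optional `top_k` parameter is an illustrative addition that the SQL function does not have; in practice the database does this work instead of a Python loop.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def query_documents(
    query_embedding: list[float],
    docs: dict[str, list[float]],
    top_k: Optional[int] = None,
) -> list[tuple[str, float]]:
    """Score every document against the query and rank by similarity."""
    scored = [
        (doc_name, cosine_similarity(embedding, query_embedding))
        for doc_name, embedding in docs.items()
    ]
    # Highest similarity (smallest cosine distance) first
    scored.sort(key=lambda row: row[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


# Hypothetical 3-dimensional document embeddings
docs = {
    "pets.txt": [0.9, 0.1, 0.0],
    "cars.txt": [0.0, 0.2, 0.9],
    "mixed.txt": [0.5, 0.5, 0.5],
}
print(query_documents([1.0, 0.0, 0.0], docs, top_k=2))
```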
Now we convert the query text to an embedding and call the Postgres function through Supabase RPC to get the similarity scores.
TypeScript code:
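async function queryEmbedding(query: string) {
  // Embed the query text the same way the documents were embedded
  const embedding = await createEmbeddingOpenAI(query);
  // Call the query_documents Postgres function via Supabase RPC
  const { data, error } = await supabase.rpc("query_documents", {
    query_embedding: embedding,
  });
  if (error) throw error;
  return data;
}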
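Python code:
def query_embedding(query: str):
    # Embed the query text the same way the documents were embedded
    embedding = create_embedding_openai(query)
    # Call the query_documents Postgres function via Supabase RPC
    response = supabase.rpc("query_documents", {"query_embedding": embedding}).execute()
    # Return the rows, matching what the TypeScript version returns
    return response.data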
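The RPC returns one row per document, each with `doc_name` and `similarity` fields. A common next step is to keep only the most relevant rows. The sketch below uses a hypothetical `best_matches` helper with an arbitrary 0.75 threshold and made-up sample rows, so tune both for your own data.

```python
from typing import Any


def best_matches(
    rows: list[dict[str, Any]],
    min_similarity: float = 0.75,
    limit: int = 3,
) -> list[str]:
    """Return the names of the most similar documents above a threshold."""
    # Keep rows whose similarity clears the (arbitrary, tunable) threshold
    relevant = [row for row in rows if row["similarity"] >= min_similarity]
    # Rank by similarity, highest first, in case the rows arrive unsorted
    relevant.sort(key=lambda row: row["similarity"], reverse=True)
    return [row["doc_name"] for row in relevant[:limit]]


# Hypothetical rows shaped like the query_documents output
rows = [
    {"doc_name": "pricing.md", "similarity": 0.62},
    {"doc_name": "refunds.md", "similarity": 0.91},
    {"doc_name": "shipping.md", "similarity": 0.78},
]
print(best_matches(rows))
```

A threshold avoids returning documents that are merely the least-bad match when nothing relevant exists.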