Querying

Querying is a two-step process:

1. Convert the query to an embedding.
2. Run a similarity search that compares the query embedding with the embeddings stored in the database.

In the previous section, we learned how to convert file content to an embedding. The process is unchanged here; the only difference is that we convert the query text, rather than file content, to an embedding.

Semantic Search

In an earlier section, we mentioned that when text is converted to embeddings, words and sentences with related meanings end up close to each other.

So we can calculate the distance between the query embedding and each document embedding, and use it to derive a similarity score: the smaller the distance, the more similar the document is to the query.

The Postgres function below computes this score for every row in the docs table:

CREATE OR REPLACE FUNCTION query_documents(query_embedding vector(1536))
RETURNS TABLE (doc_name TEXT, similarity FLOAT) AS $$
BEGIN
    RETURN QUERY
    SELECT d.doc_name, 1 - (d.embedding <=> query_embedding) AS similarity
    FROM docs d;
END;
$$ LANGUAGE plpgsql;
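
The <=> operator used in the function above is pgvector's cosine distance operator, so 1 - (a <=> b) is the cosine similarity of the two vectors. As a rough illustration of what that score means, here is a minimal pure-Python sketch of the same calculation. The 3-dimensional vectors are made up for readability (the real embeddings have 1536 dimensions), and the helper name cosine_similarity is ours, not part of any library.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return dot(a, b) / (|a| * |b|), i.e. 1 minus the cosine distance."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only.
query = [1.0, 0.0, 1.0]
doc_close = [2.0, 0.0, 2.0]  # points in the same direction as the query
doc_far = [0.0, 1.0, 0.0]    # orthogonal to the query

print(cosine_similarity(query, doc_close))  # close to 1: very similar
print(cosine_similarity(query, doc_far))    # 0: unrelated
```

Note that the score depends only on the direction of the vectors, not on their length, which is why doc_close scores as a near-perfect match even though it is twice as long as the query.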

Next, we convert the query text to an embedding and call the Postgres function to calculate the similarity scores.

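TypeScript Code:

async function queryEmbedding(query: string) {
  // Convert the query text to an embedding
  const embedding = await createEmbeddingOpenAI(query);

  // Call the Postgres function defined above
  const { data, error } = await supabase.rpc("query_documents", {
    query_embedding: embedding,
  });

  // Surface failures instead of silently returning null
  if (error) {
    throw error;
  }

  return data;
}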

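Python Code:

def query_embedding(query: str):
    # Convert the query text to an embedding
    embedding = create_embedding_openai(query)
    # Call the Postgres function defined above
    response = supabase.rpc("query_documents", {"query_embedding": embedding}).execute()
    # Return only the rows, matching the TypeScript version
    return response.data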
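
Since query_documents has no ORDER BY or LIMIT, it returns a score for every document in no guaranteed order. One way to turn that into search results is to sort and trim the rows on the client. The sketch below assumes the rows arrive as a list of dicts with the doc_name and similarity keys from the function's RETURNS TABLE clause; the helper top_matches, the file names, and the scores are all hypothetical.

```python
from typing import TypedDict


class Match(TypedDict):
    doc_name: str
    similarity: float


def top_matches(rows: list[Match], k: int = 3, min_similarity: float = 0.0) -> list[Match]:
    """Keep rows at or above min_similarity and return the k best, highest first."""
    kept = [row for row in rows if row["similarity"] >= min_similarity]
    return sorted(kept, key=lambda row: row["similarity"], reverse=True)[:k]


# Made-up rows standing in for the data returned by query_documents.
sample_rows: list[Match] = [
    {"doc_name": "pricing.md", "similarity": 0.71},
    {"doc_name": "changelog.md", "similarity": 0.12},
    {"doc_name": "billing-faq.md", "similarity": 0.83},
]

for match in top_matches(sample_rows, k=2):
    print(match["doc_name"], match["similarity"])
```

For a large table it would likely be better to do the sorting and limiting inside the SQL function instead, so that only the best rows are sent back.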
