KDB.AI
KDB.AI is a knowledge-based vector database and search engine for building scalable, reliable AI applications on real-time data. It provides advanced search, recommendation and personalization.
This example demonstrates how to use KDB.AI to run semantic search on unstructured text documents.
To obtain your endpoint and API keys, sign up for KDB.AI.
To set up your development environment, follow the instructions on the KDB.AI pre-requisites page.
The following examples demonstrate some of the ways you can interact with KDB.AI through LangChain.
To use this integration, install langchain-community with pip install -qU langchain-community
Import required packages
import os
import time
from getpass import getpass
import kdbai_client as kdbai
import pandas as pd
import requests
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import KDBAI
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
KDBAI_ENDPOINT = input("KDB.AI endpoint: ")
KDBAI_API_KEY = getpass("KDB.AI API key: ")
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API Key: ")
KDB.AI endpoint: https://ui.qa.cld.kx.com/instance/pcnvlmi860
KDB.AI API key: ········
OpenAI API Key: ········
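The prompts above can be skipped in automated runs by checking environment variables first. The following is a minimal sketch of that pattern. The helper name get_secret and the variable name KDBAI_API_KEY are assumptions made for illustration; they are not part of kdbai_client or LangChain.

```python
import os
from getpass import getpass


def get_secret(name, prompt, env=os.environ, ask=getpass):
    """Return env[name] if it is set and non-empty, otherwise prompt for it.

    `env` and `ask` are injectable so the helper can be exercised
    without touching the real environment or blocking on input.
    """
    value = env.get(name)
    if not value:
        value = ask(prompt)
    return value


# Demo with a fake environment: no prompt is shown because the key is present.
# "KDBAI_API_KEY" is a hypothetical variable name chosen for this sketch.
demo_env = {"KDBAI_API_KEY": "not-a-real-key"}
print(get_secret("KDBAI_API_KEY", "KDB.AI API key: ", env=demo_env))
```

With the default arguments, get_secret falls back to getpass exactly as the cell above does when the variable is missing.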
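TEMP = 0.0  # LLM sampling temperature passed to ChatOpenAI below
K = 3  # number of documents the retriever returns per query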
Create a KDB.AI Session
print("Create a KDB.AI session...")
session = kdbai.Session(endpoint=KDBAI_ENDPOINT, api_key=KDBAI_API_KEY)
Create a KDB.AI session...
Create a table
print('Create table "documents"...')
schema = {
    "columns": [
        {"name": "id", "pytype": "str"},
        {"name": "text", "pytype": "bytes"},
        {
            "name": "embeddings",
            "pytype": "float32",
            "vectorIndex": {"dims": 1536, "metric": "L2", "type": "hnsw"},
        },
        {"name": "tag", "pytype": "str"},
        {"name": "title", "pytype": "bytes"},
    ]
}
table = session.create_table("documents", schema)
Create table "documents"...
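The vectorIndex entry in the schema asks for an HNSW index over 1536-dimensional vectors compared with the L2 (Euclidean) metric. HNSW is an approximate nearest-neighbour index; the sketch below shows the exact, exhaustive version of the same ranking on toy 3-dimensional vectors, to make clear what "nearest under L2" means. It is plain Python written for illustration and does not reflect how KDB.AI implements search; the function names are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, rows, k):
    """Return the k (distance, id) pairs closest to `query`, by full scan."""
    scored = sorted((l2_distance(query, vec), row_id) for row_id, vec in rows.items())
    return scored[:k]


# Toy data: three stored vectors keyed by id.
rows = {
    "a": [0.0, 0.0, 1.0],
    "b": [1.0, 0.0, 0.0],
    "c": [0.9, 0.1, 0.0],
}
for distance, row_id in nearest([1.0, 0.0, 0.0], rows, k=2):
    print(row_id, round(distance, 3))
```

The length check mirrors the role of dims in the schema: every stored and query vector has to share the index's dimensionality.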
%%time
URL = "https://www.conseil-constitutionnel.fr/node/3850/pdf"
PDF = "Déclaration_des_droits_de_l_homme_et_du_citoyen.pdf"
open(PDF, "wb").write(requests.get(URL).content)
CPU times: user 44.1 ms, sys: 6.04 ms, total: 50.2 ms
Wall time: 213 ms
562978
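The download cell writes whatever bytes the server returns. A slightly more defensive variant can check that the payload looks like a PDF before saving it. The sketch below assumes only that PDF files begin with the marker %PDF-; the helper name save_pdf is hypothetical, and the demo uses placeholder bytes rather than a real download.

```python
import tempfile
from pathlib import Path


def save_pdf(content: bytes, path: str) -> int:
    """Write `content` to `path` if it starts with the PDF header marker.

    Returns the number of bytes written, like the write() call above.
    """
    if not content.startswith(b"%PDF-"):
        raise ValueError("payload does not look like a PDF")
    return Path(path).write_bytes(content)


# Demo with placeholder bytes written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "sample.pdf"
    written = save_pdf(b"%PDF-1.4\n% placeholder body\n", str(target))
    print(written, target.exists())
```

In the notebook this could be used as save_pdf(response.content, PDF) after calling response.raise_for_status() on the requests response, so that an HTML error page is not silently stored under a .pdf name.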
Read a PDF
%%time
print("Read a PDF...")
loader = PyPDFLoader(PDF)
pages = loader.load_and_split()
len(pages)
Read a PDF...
CPU times: user 156 ms, sys: 12.5 ms, total: 169 ms
Wall time: 183 ms
3
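load_and_split reads the PDF and returns it as a list of smaller documents, three in this run. To illustrate the general idea of splitting text into chunks before embedding, here is a deliberately simplified fixed-size splitter with overlap. It is an assumption-laden sketch, not LangChain's actual splitting logic, and split_text is a hypothetical name.

```python
def split_text(text, chunk_size, overlap):
    """Split `text` into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears whole in at least one chunk when it is
    shorter than the overlap.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))
```

Each chunk then becomes one row in the vector table, which is why the number of chunks determines the number of embeddings created in the next step.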
Create a Vector Database from PDF Text
%%time
print("Create a Vector Database from PDF text...")
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
texts = [p.page_content for p in pages]
metadata = pd.DataFrame(index=list(range(len(texts))))
metadata["tag"] = "law"
metadata["title"] = "Déclaration des Droits de l'Homme et du Citoyen de 1789".encode(
    "utf-8"
)
vectordb = KDBAI(table, embeddings)
vectordb.add_texts(texts=texts, metadatas=metadata)
Create a Vector Database from PDF text...
CPU times: user 211 ms, sys: 18.4 ms, total: 229 ms
Wall time: 2.23 s
['3ef27d23-47cf-419b-8fe9-5dfae9e8e895',
'd3a9a69d-28f5-434b-b95b-135db46695c8',
'd2069bda-c0b8-4791-b84d-0c6f84f4be34']
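The metadata frame above carries one row per text chunk, with tag stored as a string and title encoded to bytes to match the "str" and "bytes" column types declared in the table schema. The same shape can be sketched with plain Python records, which makes the one-row-per-chunk alignment explicit. The helper build_metadata is hypothetical and exists only for this illustration.

```python
def build_metadata(texts, tag, title):
    """Return one metadata record per text chunk.

    `title` is encoded to UTF-8 bytes because the schema declares the
    title column with pytype "bytes"; `tag` stays a str.
    """
    encoded_title = title.encode("utf-8")
    return [{"tag": tag, "title": encoded_title} for _ in texts]


chunks = ["first chunk", "second chunk", "third chunk"]
records = build_metadata(
    chunks, "law", "Déclaration des Droits de l'Homme et du Citoyen de 1789"
)

# Each chunk must have exactly one metadata record.
assert len(records) == len(chunks)
print(records[0]["tag"], type(records[0]["title"]).__name__)
```

Passing such a list of records to pd.DataFrame yields a frame with the same tag and title columns as the metadata object built in the cell above.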
Create LangChain Pipeline
%%time
print("Create LangChain Pipeline...")
qabot = RetrievalQA.from_chain_type(
    chain_type="stuff",
    llm=ChatOpenAI(model="gpt-3.5-turbo-16k", temperature=TEMP),
    retriever=vectordb.as_retriever(search_kwargs=dict(k=K)),
    return_source_documents=True,
)
Create LangChain Pipeline...
CPU times: user 40.8 ms, sys: 4.69 ms, total: 45.5 ms
Wall time: 44.7 ms
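The "stuff" chain type places all K retrieved documents into a single prompt and sends that prompt to the LLM in one call. The sketch below shows that idea in plain Python. The prompt wording and the function name build_stuff_prompt are assumptions for illustration only; LangChain uses its own prompt template.

```python
def build_stuff_prompt(question, documents):
    """Concatenate every retrieved document into one prompt string."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder strings stand in for the K = 3 chunks a retriever would return.
retrieved = ["chunk one text", "chunk two text", "chunk three text"]
prompt = build_stuff_prompt("What does the document say?", retrieved)
print(prompt)
```

Because every retrieved chunk ends up in one prompt, the combined text has to fit within the model's context window, which is one reason K is kept small.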