Couchbase
Couchbase is an award-winning distributed NoSQL cloud database that delivers unmatched versatility, performance, scalability, and financial value for all of your cloud, mobile, AI, and edge computing applications. Couchbase embraces AI with coding assistance for developers and vector search for their applications.
Vector Search is a part of the Full Text Search Service (Search Service) in Couchbase.
This tutorial explains how to use Vector Search in Couchbase. You can work with either Couchbase Capella and your self-managed Couchbase Server.
Setup
To access the CouchbaseVectorStore
you first need to install the langchain-couchbase
partner package:
pip install -qU langchain-couchbase
Credentials
Head over to the Couchbase website and create a new connection, making sure to save your database username and password:
import getpass
COUCHBASE_CONNECTION_STRING = getpass.getpass(
"Enter the connection string for the Couchbase cluster: "
)
DB_USERNAME = getpass.getpass("Enter the username for the Couchbase cluster: ")
DB_PASSWORD = getpass.getpass("Enter the password for the Couchbase cluster: ")
If you want to get best in-class automated tracing of your model calls you can also set your LangSmith API key by uncommenting below:
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()
Initialization
Before instantiating we need to create a connection.
Create Couchbase Connection Object
We create a connection to the Couchbase cluster initially and then pass the cluster object to the Vector Store.
Here, we are connecting using the username and password from above. You can also connect using any other supported way to your cluster.
For more information on connecting to the Couchbase cluster, please check the documentation.
from datetime import timedelta
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
auth = PasswordAuthenticator(DB_USERNAME, DB_PASSWORD)
options = ClusterOptions(auth)
cluster = Cluster(COUCHBASE_CONNECTION_STRING, options)
# Wait until the cluster is ready for use.
cluster.wait_until_ready(timedelta(seconds=5))
We will now set the bucket, scope, and collection names in the Couchbase cluster that we want to use for Vector Search.
For this example, we are using the default scope & collections.
BUCKET_NAME = "langchain_bucket"
SCOPE_NAME = "_default"
COLLECTION_NAME = "default"
SEARCH_INDEX_NAME = "langchain-test-index"
For details on how to create a Search index with support for Vector fields, please refer to the documentation.
Simple Instantiation
Below, we create the vector store object with the cluster information and the search index name.
- OpenAI
- HuggingFace
- Fake Embedding
pip install -qU langchain-openai
import getpass
os.environ["OPENAI_API_KEY"] = getpass.getpass()
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
pip install -qU langchain-huggingface
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model="sentence-transformers/all-mpnet-base-v2")
pip install -qU langchain-core
from langchain_core.embeddings import FakeEmbeddings
embeddings = FakeEmbeddings(size=4096)
from langchain_couchbase.vectorstores import CouchbaseVectorStore
vector_store = CouchbaseVectorStore(
cluster=cluster,
bucket_name=BUCKET_NAME,
scope_name=SCOPE_NAME,
collection_name=COLLECTION_NAME,
embedding=embeddings,
index_name=SEARCH_INDEX_NAME,
)
Specify the Text & Embeddings Field
You can optionally specify the text & embeddings field for the document using the text_key
and embedding_key
fields.
vector_store_specific = CouchbaseVectorStore(
cluster=cluster,
bucket_name=BUCKET_NAME,
scope_name=SCOPE_NAME,
collection_name=COLLECTION_NAME,
embedding=embeddings,
index_name=SEARCH_INDEX_NAME,
text_key="text",
embedding_key="embedding",
)