IBM Db2 Vector Store and Vector Search

LangChain's Db2 integration (langchain-db2) provides vector store and vector search capabilities for IBM Db2, IBM's relational database, version 12.1.2 and later. The package is distributed under the MIT license. You can use the provided implementations as-is or customize them for specific needs. Key features include:

Vector storage with metadata
Vector similarity search and max marginal relevance search, with metadata filtering options
Support for dot product, cosine, and Euclidean distance metrics
Performance optimization through index creation and approximate nearest neighbor search (coming soon)
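The three distance metrics listed above can be written out in plain Python for intuition. The sketch below uses the standard textbook definitions. It is not Db2's implementation: Db2 computes distances inside the database, and its exact conventions may differ (for example, whether a dot-product result is negated so that smaller means closer).

```python
import math


def dot_product(a: list[float], b: list[float]) -> float:
    """Sum of element-wise products; a larger value means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, 2 for opposite directions."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return 1.0 - dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between the two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
doc = [3.0, 4.0]
print(dot_product(query, doc))         # 3.0
print(cosine_distance(query, doc))     # 0.4 (1 - 3/5)
print(euclidean_distance(query, doc))  # ~4.472 (sqrt(20))
```

Cosine distance ignores vector length. Dot product and Euclidean distance do not, which is why the same query can rank documents differently in the three stores created below.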

Setup

Prerequisites for using LangChain with Db2 Vector Store and Search

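Install langchain-db2, the integration package for the Db2 LangChain vector store and search.

Installing the package should also install its dependencies, such as langchain-core and ibm_db. The examples below also import from langchain-community and load a Hugging Face embedding model, which requires sentence-transformers.

# pip install -U langchain-db2
# pip install -U langchain-community sentence-transformers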

Connect to Db2 Vector Store

The following sample code shows how to connect to a Db2 database. In addition to the dependencies above, you need a running Db2 instance at version 12.1.2 or later, which provides the vector data type.

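import ibm_db_dbi

# Fill in your Db2 database name (or catalog alias) and credentials
database = ""
username = ""
password = ""

try:
    # ibm_db_dbi returns a DB-API connection, passed below to DB2VS as `client`
    connection = ibm_db_dbi.connect(database, username, password)
    print("Connection successful!")
except Exception as e:
    # Report the cause and stop: the rest of this guide needs `connection`
    print(f"Connection failed: {e}")
    raise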
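If the database is not cataloged on the client machine, the ibm_db drivers also accept a full connection string as the first argument to connect, with empty user and password arguments. The sketch below assembles such a string. The helper name build_db2_dsn and all values in it (database, host, port, credentials) are hypothetical placeholders. Port 50000 is a common Db2 default, but yours may differ.

```python
def build_db2_dsn(
    database: str, hostname: str, port: int, username: str, password: str
) -> str:
    """Hypothetical helper: assemble a Db2 connection string in KEY=VALUE; form."""
    parts = {
        "DATABASE": database,
        "HOSTNAME": hostname,
        "PORT": str(port),
        "PROTOCOL": "TCPIP",
        "UID": username,
        "PWD": password,
    }
    # dicts keep insertion order, so the keys appear in the order listed above
    return "".join(f"{key}={value};" for key, value in parts.items())


# Placeholder values for illustration only
dsn = build_db2_dsn("testdb", "db2.example.com", 50000, "db2inst1", "secret")
print(dsn)
# DATABASE=testdb;HOSTNAME=db2.example.com;PORT=50000;PROTOCOL=TCPIP;UID=db2inst1;PWD=secret;
```

With a real server, you would then connect with `ibm_db_dbi.connect(dsn, "", "")`.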

Import the required dependencies

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores.utils import DistanceStrategy
from langchain_core.documents import Document
from langchain_db2 import db2vs
from langchain_db2.db2vs import DB2VS


Initialization

Create Documents

# Define a list of documents
documents_json_list = [
    {
        "id": "doc_1_2_P4",
        "text": "Db2 handles LOB data differently than other kinds of data. As a result, you sometimes need to take additional actions when you define LOB columns and insert the LOB data.",
        "link": "https://www.ibm.com/docs/en/db2-for-zos/12?topic=programs-storing-lob-data-in-tables",
    },
    {
        "id": "doc_11.1.0_P1",
        "text": "Db2® column-organized tables add columnar capabilities to Db2 databases, which include data that is stored with column organization and vector processing of column data. Using this table format with star schema data marts provides significant improvements to storage, query performance, and ease of use through simplified design and tuning.",
        "link": "https://www.ibm.com/docs/en/db2/11.1.0?topic=organization-column-organized-tables",
    },
    {
        "id": "id_22.3.4.3.1_P2",
        "text": "Data structures are elements that are required to use Db2®. You can access and use these elements to organize your data. Examples of data structures include tables, table spaces, indexes, index spaces, keys, views, and databases.",
        "link": "https://www.ibm.com/docs/en/zos-basic-skills?topic=concepts-db2-data-structures",
    },
    {
        "id": "id_3.4.3.1_P3",
        "text": "Db2® maintains a set of tables that contain information about the data that Db2 controls. These tables are collectively known as the catalog. The catalog tables contain information about Db2 objects such as tables, views, and indexes. When you create, alter, or drop an object, Db2 inserts, updates, or deletes rows of the catalog that describe the object.",
        "link": "https://www.ibm.com/docs/en/zos-basic-skills?topic=objects-db2-catalog",
    },
]

# Create LangChain Documents, keeping each record's id and link as metadata

documents_langchain = []

for doc in documents_json_list:
    metadata = {"id": doc["id"], "link": doc["link"]}
    doc_langchain = Document(page_content=doc["text"], metadata=metadata)
    documents_langchain.append(doc_langchain)

Create Vector Stores with different distance metrics

First, create three vector stores, each with a different distance strategy.

(If you connect to the Db2 database manually, you will see three tables: Documents_DOT, Documents_COSINE, and Documents_EUCLIDEAN.)

# Create Db2 Vector Stores using different distance strategies

# For large document sets, initialize the vector store with a subset of your documents
# through from_documents(), then add the remaining documents incrementally with add_texts().
# Loading in smaller batches avoids overloading the system and keeps processing efficient.

model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

vector_store_dot = DB2VS.from_documents(
    documents_langchain,
    model,
    client=connection,
    table_name="Documents_DOT",
    distance_strategy=DistanceStrategy.DOT_PRODUCT,
)
vector_store_max = DB2VS.from_documents(
    documents_langchain,
    model,
    client=connection,
    table_name="Documents_COSINE",
    distance_strategy=DistanceStrategy.COSINE,
)
vector_store_euclidean = DB2VS.from_documents(
    documents_langchain,
    model,
    client=connection,
    table_name="Documents_EUCLIDEAN",
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
)
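The code comments above recommend seeding a store with a subset of documents and adding the rest incrementally. A minimal batching helper might look like the sketch below. The name iter_batches, the batch size of 100, and the table name Documents_BATCHED are assumptions. The commented-out lines show how the batches could feed the DB2VS calls used in this guide; they need a live Db2 connection.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Hypothetical helper: yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


# Example: split ten items into batches of four
print([list(b) for b in iter_batches(list(range(10)), 4)])
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# Possible use with the API in this guide (needs a live Db2 connection):
# batches = iter_batches(documents_langchain, 100)
# vector_store = DB2VS.from_documents(
#     list(next(batches)), model, client=connection,
#     table_name="Documents_BATCHED",  # hypothetical table name
#     distance_strategy=DistanceStrategy.COSINE,
# )
# for batch in batches:
#     vector_store.add_texts(
#         [d.page_content for d in batch], [d.metadata for d in batch]
#     )
```

Choose the batch size based on embedding cost and memory. Smaller batches mean more round trips to Db2, but each call stays light.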

Manage vector store

Demonstrate add and delete operations for texts, along with a basic similarity search

def manage_texts(vector_stores):
    """
    Adds two sample texts to each vector store, reporting an error if they already
    exist (for example, when this code is run twice), deletes them again by their
    'id' values, and runs a basic similarity search on each store.

    Args:
    - vector_stores (list): A list of DB2VS instances.
    """
    texts = ["Rohan", "Shailendra"]
    metadata = [
        {"id": "100", "link": "Document Example Test 1"},
        {"id": "101", "link": "Document Example Test 2"},
    ]

    for i, vs in enumerate(vector_stores, start=1):
        # Adding texts; a duplicate add is reported instead of stopping the loop
        try:
            vs.add_texts(texts, metadata)
            print(f"\nAdd texts complete for vector store {i}")
        except Exception as ex:
            print(f"\nExpected error on duplicate add for vector store {i}: {ex}")

        # Deleting texts using the value of 'id'
        vs.delete([metadata[0]["id"], metadata[1]["id"]])
        print(f"\nDelete texts complete for vector store {i}")

        # Similarity search: return the 2 documents closest to the query
        results = vs.similarity_search("How are LOBS stored in Db2 Database", 2)
        print(f"\nSimilarity search results for vector store {i}: {results}")


vector_store_list = [
    vector_store_dot,
    vector_store_max,
    vector_store_euclidean,
]
manage_texts(vector_store_list)

Query vector store

Demonstrate advanced searches on vector stores, with and without attribute filtering

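With filtering, only the document whose metadata id is 101 is selected. Because manage_texts above deleted the documents with ids 100 and 101, the filtered searches may return an empty list unless that document is added again.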
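The filter used below, {"id": ["101"]}, can be read as "keep documents whose metadata value for each key is one of the listed values". The plain-Python sketch below models that reading to make the semantics concrete. It is an assumption about the behavior, not the langchain-db2 implementation, and matches_filter is a hypothetical name.

```python
from typing import Any


def matches_filter(metadata: dict[str, Any], filter_criteria: dict[str, list[Any]]) -> bool:
    """True if, for every filter key, the metadata value is one of the allowed values."""
    return all(metadata.get(key) in allowed for key, allowed in filter_criteria.items())


docs_metadata = [
    {"id": "100", "link": "Document Example Test 1"},
    {"id": "101", "link": "Document Example Test 2"},
    {"id": "doc_1_2_P4"},
]
filter_criteria = {"id": ["101"]}
print([m["id"] for m in docs_metadata if matches_filter(m, filter_criteria)])
# ['101']
```

Under this reading, a document with no id in its metadata never matches. Listing several values, such as {"id": ["100", "101"]}, widens the match.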

# Conduct advanced searches
def conduct_advanced_searches(vector_stores):
    query = "How are LOBS stored in Db2 Database"
    # Constructing a filter for direct comparison against document metadata
    # This filter aims to include documents whose metadata 'id' is exactly '101'
    filter_criteria = {"id": ["101"]}  # Direct comparison filter

    for i, vs in enumerate(vector_stores, start=1):
        print(f"\n--- Vector Store {i} Advanced Searches ---")
        # Similarity search without a filter
        print("\nSimilarity search results without filter:")
        print(vs.similarity_search(query, 2))

        # Similarity search with a filter
        print("\nSimilarity search results with filter:")
        print(vs.similarity_search(query, 2, filter=filter_criteria))

        # Similarity search with relevance score
        print("\nSimilarity search with relevance score:")
        print(vs.similarity_search_with_score(query, 2))

        # Similarity search with relevance score with filter
        print("\nSimilarity search with relevance score with filter:")
        print(vs.similarity_search_with_score(query, 2, filter=filter_criteria))

        # Max marginal relevance search
        print("\nMax marginal relevance search results:")
        print(vs.max_marginal_relevance_search(query, 2, fetch_k=20, lambda_mult=0.5))

        # Max marginal relevance search with filter
        print("\nMax marginal relevance search results with filter:")
        print(
            vs.max_marginal_relevance_search(
                query, 2, fetch_k=20, lambda_mult=0.5, filter=filter_criteria
            )
        )


conduct_advanced_searches(vector_store_list)
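In LangChain's convention, max_marginal_relevance_search fetches fetch_k candidates by similarity, then greedily picks k of them. Each pick balances relevance to the query against redundancy with the results already picked. lambda_mult=1 favors pure relevance, and lambda_mult=0 favors maximum diversity. The stdlib-only sketch below shows standard greedy MMR on small vectors using cosine similarity. It illustrates the technique and is not necessarily how langchain-db2 computes it.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query: list[float], candidates: list[list[float]], k: int, lambda_mult: float = 0.5) -> list[int]:
    """Greedy maximal marginal relevance; returns indices into candidates."""
    relevance = [cosine_similarity(query, c) for c in candidates]
    selected: list[int] = []

    def marginal_score(i: int) -> float:
        # Penalize similarity to the closest already-selected candidate
        redundancy = max(
            (cosine_similarity(candidates[i], candidates[j]) for j in selected),
            default=0.0,
        )
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best = max(remaining, key=marginal_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.5]
candidates = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]  # 0 and 1 are near-duplicates
print(mmr(query, candidates, k=2, lambda_mult=1.0))  # [1, 0]: pure relevance
print(mmr(query, candidates, k=2, lambda_mult=0.5))  # [1, 2]: skips the duplicate
```

With lambda_mult=0.5, as in the searches above, the near-duplicate candidate 0 loses to the less relevant but more distinct candidate 2.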

Usage for retrieval-augmented generation
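A retrieval-augmented generation (RAG) flow retrieves the most relevant chunks from the vector store and places them in the prompt sent to an LLM. This assumes DB2VS inherits LangChain's standard VectorStore interface. In that case, vector_store.as_retriever(search_kwargs={"k": 2}) returns a retriever whose invoke(question) method returns a list of Document objects. The runnable sketch below covers only the prompt-assembly step and leaves out the LLM call. FakeDoc (a stand-in for Document) and build_rag_prompt are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeDoc:
    """Stand-in for langchain_core.documents.Document (page_content + metadata)."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def build_rag_prompt(question: str, docs: list[FakeDoc]) -> str:
    """Number each retrieved chunk, cite its link, and append the question."""
    context = "\n".join(
        f"[{i}] {doc.page_content} (source: {doc.metadata.get('link', 'unknown')})"
        for i, doc in enumerate(docs, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# With a live store, docs would come from something like:
#   retriever = vector_store_euclidean.as_retriever(search_kwargs={"k": 2})
#   docs = retriever.invoke(question)
question = "How are LOBS stored in Db2 Database"
docs = [
    FakeDoc(
        "Db2 handles LOB data differently than other kinds of data.",
        {"link": "https://www.ibm.com/docs/en/db2-for-zos/12?topic=programs-storing-lob-data-in-tables"},
    )
]
print(build_rag_prompt(question, docs))
```

Keeping each chunk's link in the prompt lets the model cite the source passage, which the metadata created earlier in this guide makes possible.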

API reference
