RunPod
RunPod provides GPU cloud infrastructure, including Serverless endpoints optimized for deploying and scaling AI models.
This guide covers how to use the langchain-runpod integration package to connect LangChain applications to models hosted on RunPod Serverless.
The integration offers interfaces for both standard Language Models (LLMs) and Chat Models.
Installation
Install the dedicated partner package:
%pip install -qU langchain-runpod
Setup
1. Deploy an Endpoint on RunPod
- Navigate to your RunPod Serverless Console.
- Create a "New Endpoint", selecting an appropriate GPU and template (e.g., vLLM, TGI, text-generation-webui) compatible with your model and the expected input/output format (see component guides or the package README).
- Configure settings and deploy.
- Crucially, copy the Endpoint ID after deployment.
2. Set API Credentials
The integration needs your RunPod API Key and the Endpoint ID. Set them as environment variables for secure access:
import getpass
import os
os.environ["RUNPOD_API_KEY"] = getpass.getpass("Enter your RunPod API Key: ")
os.environ["RUNPOD_ENDPOINT_ID"] = input("Enter your RunPod Endpoint ID: ")
(Optional) If you use different endpoints for the LLM and Chat models, set RUNPOD_CHAT_ENDPOINT_ID as well, or pass the ID directly during initialization, as sketched below.
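A minimal sketch of passing credentials directly; the endpoint_id and api_key parameter names follow the package README, and the ID values here are placeholders:
from langchain_runpod import ChatRunPod, RunPod
# Placeholder values for illustration; replace with your own.
llm = RunPod(
    endpoint_id="your-llm-endpoint-id",  # overrides RUNPOD_ENDPOINT_ID
    api_key="your-runpod-api-key",  # overrides RUNPOD_API_KEY
)
chat = ChatRunPod(
    endpoint_id="your-chat-endpoint-id",  # a separate endpoint for chat models
    api_key="your-runpod-api-key",
)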
Components
This package provides two main components:
1. LLM
For interacting with standard text completion models.
See the RunPod LLM Integration Guide for detailed usage.
from langchain_runpod import RunPod
# Example initialization (uses environment variables)
llm = RunPod(model_kwargs={"max_new_tokens": 100}) # Add generation params here
# Example invocation
try:
    response = llm.invoke("Write a short poem about the cloud.")
    print(response)
except Exception as e:
    print(
        f"Error invoking LLM: {e}. Ensure endpoint ID and API key are correct and endpoint is active."
    )
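Because RunPod implements the standard LangChain LLM interface, it also composes with runnables such as prompt templates. A minimal sketch (the prompt and input text are illustrative):
from langchain_core.prompts import PromptTemplate
# Reuses the `llm` instance defined above.
prompt = PromptTemplate.from_template("Summarize this in one sentence: {text}")
chain = prompt | llm
result = chain.invoke({"text": "RunPod Serverless autoscales GPU workers per request."})
print(result)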
2. Chat Model
For interacting with conversational models.
See the RunPod Chat Model Integration Guide for detailed usage and feature support.
from langchain_core.messages import HumanMessage
from langchain_runpod import ChatRunPod
# Example initialization (uses environment variables)
chat = ChatRunPod(model_kwargs={"temperature": 0.8}) # Add generation params here
# Example invocation
try:
    response = chat.invoke(
        [HumanMessage(content="Explain RunPod Serverless in one sentence.")]
    )
    print(response.content)
except Exception as e:
    print(
        f"Error invoking Chat Model: {e}. Ensure endpoint ID and API key are correct and endpoint is active."
    )
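ChatRunPod likewise slots into LCEL chains. A minimal sketch combining a chat prompt template and a string output parser (the system prompt and question are illustrative):
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
# Reuses the `chat` instance defined above.
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You answer in exactly one sentence."),
        ("human", "{question}"),
    ]
)
chain = prompt | chat | StrOutputParser()
print(chain.invoke({"question": "What is RunPod Serverless?"}))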