Skip to main content
Open In ColabOpen on GitHub

vLLM Chat

vLLM can be deployed as a server that mimics the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API. This server can be queried in the same format as OpenAI API.

Overviewโ€‹

This will help you getting started with vLLM chat models, which leverage the langchain-openai package. For detailed documentation of all ChatOpenAI features and configurations head to the API reference.

Integration detailsโ€‹

ClassPackageLocalSerializableJS supportPackage downloadsPackage latest
ChatOpenAIlangchain_openaiโœ…betaโŒPyPI - DownloadsPyPI - Version

Model featuresโ€‹

Specific model features-- such as tool calling, support for multi-modal inputs, support for token-level streaming, etc.-- will depend on the hosted model.

Setupโ€‹

See the vLLM docs here.

To access vLLM models through LangChain, you'll need to install the langchain-openai integration package.

Credentialsโ€‹

Authentication will depend on specifics of the inference server.

To enable automated tracing of your model calls, set your LangSmith API key:

# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")

Installationโ€‹

The LangChain vLLM integration can be accessed via the langchain-openai package:

%pip install -qU langchain-openai

Instantiationโ€‹

Now we can instantiate our model object and generate chat completions:

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts.chat import (
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
)
from langchain_openai import ChatOpenAI
inference_server_url = "http://localhost:8000/v1"

llm = ChatOpenAI(
model="mosaicml/mpt-7b",
openai_api_key="EMPTY",
openai_api_base=inference_server_url,
max_tokens=5,
temperature=0,
)

Invocationโ€‹

messages = [
SystemMessage(
content="You are a helpful assistant that translates English to Italian."
),
HumanMessage(
content="Translate the following sentence from English to Italian: I love programming."
),
]
llm.invoke(messages)
AIMessage(content=' Io amo programmare', additional_kwargs={}, example=False)

Chainingโ€‹

We can chain our model with a prompt template like so:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate(
[
(
"system",
"You are a helpful assistant that translates {input_language} to {output_language}.",
),
("human", "{input}"),
]
)

chain = prompt | llm
chain.invoke(
{
"input_language": "English",
"output_language": "German",
"input": "I love programming.",
}
)
API Reference:ChatPromptTemplate

API referenceโ€‹

For detailed documentation of all features and configurations exposed via langchain-openai, head to the API reference: https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html

Refer to the vLLM documentation as well.


Was this page helpful?