Ollama
Ollama allows you to run open-source large language models, such as gpt-oss, locally.
Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile. It optimizes setup and configuration details, including GPU usage. For a complete list of supported models and model variants, see the Ollama model library.
See this guide for more details on how to use Ollama with LangChain.
Installation and Setup
Ollama installation
Follow these instructions to set up and run a local Ollama instance.
Ollama starts as a background service automatically. If this is disabled, run:
# export OLLAMA_HOST=127.0.0.1:11434 # environment variable to set the ollama host and port
ollama serve
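To confirm from Python that the server is reachable, a minimal sketch like the following may help. It assumes the server answers plain HTTP GET requests at its root URL and that `OLLAMA_HOST`, when set, already includes a port. The helper names `ollama_base_url` and `is_ollama_running` are illustrative and not part of any Ollama or LangChain API.

```python
import os
import urllib.error
import urllib.request

DEFAULT_HOST = "127.0.0.1:11434"  # Ollama's default bind address


def ollama_base_url(env=None):
    """Build a base URL from OLLAMA_HOST, falling back to the default address.

    Assumption: OLLAMA_HOST, if set, already includes the port.
    """
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "").strip() or DEFAULT_HOST
    if "://" not in host:
        host = "http://" + host  # no scheme given, assume plain HTTP
    return host.rstrip("/")


def is_ollama_running(base_url=None, timeout=2.0):
    """Return True if a GET on the server's root URL succeeds with HTTP 200."""
    base_url = base_url or ollama_base_url()
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar errors
        return False
```

Calling `is_ollama_running()` before constructing a LangChain model gives a clearer failure than a connection error raised mid-request.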
After starting ollama, run ollama pull <name-of-model> to download a model from the Ollama model library:
ollama pull gpt-oss:20b
- This command downloads the 20b tagged version of the model. If you omit the tag, Ollama downloads the default tagged version, which typically points to the latest, smallest-parameter model.
- To view all pulled (downloaded) models, use ollama list.
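The same check can be made programmatically. The sketch below assumes the local server exposes its pulled models at the `/api/tags` endpoint as a JSON object with a `models` list whose entries carry a `name` field, and that an untagged model name resolves to the `latest` tag. The helper names are hypothetical, and the tag logic only handles simple library names such as `gpt-oss:20b`.

```python
import json
import urllib.request

DEFAULT_TAG = "latest"  # tag assumed when a model name carries none


def with_default_tag(name):
    """Return `name` unchanged if it has a tag, else append the default tag."""
    return name if ":" in name else f"{name}:{DEFAULT_TAG}"


def model_names(tags_payload):
    """Extract model names from a decoded /api/tags response."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def is_pulled(name, tags_payload):
    """Check whether `name` (with the default tag applied) is already downloaded."""
    return with_default_tag(name) in model_names(tags_payload)


def fetch_tags(base_url="http://127.0.0.1:11434", timeout=5.0):
    """Fetch the list of local models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

With a server running, `is_pulled("gpt-oss:20b", fetch_tags())` reports whether the model still needs an `ollama pull`.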
We're now ready to install the langchain-ollama partner package and run a model.
Ollama LangChain partner package install
Install the integration package with:
pip install langchain-ollama
LLM
from langchain_ollama.llms import OllamaLLM
See the notebook example here.
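As a rough illustration of how the class is typically used, the sketch below wraps `OllamaLLM` in two small helpers. It assumes a local Ollama server is running and that `gpt-oss:20b` has been pulled. `make_llm` and `ask` are hypothetical names introduced here, and the import is placed inside the function only so the module loads where `langchain-ollama` is not installed.

```python
MODEL = "gpt-oss:20b"  # assumed to be pulled already; swap in any local model


def make_llm(model=MODEL, temperature=0.0):
    """Create a text-completion LLM backed by the local Ollama server."""
    # Imported here so the module still loads where langchain-ollama is absent.
    from langchain_ollama.llms import OllamaLLM

    return OllamaLLM(model=model, temperature=temperature)


def ask(question, llm=None):
    """Send a plain-string prompt and return the model's text reply."""
    llm = llm if llm is not None else make_llm()
    return llm.invoke(question)
```

For example, `print(ask("Why is the sky blue?"))` sends one prompt to the local model and prints the reply as a string.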
Chat Models
Chat Ollama
from langchain_ollama.chat_models import ChatOllama
See the notebook example here.
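A chat model takes a list of role-tagged messages rather than a single string. The following sketch shows one way to call `ChatOllama`, assuming a running local server with `gpt-oss:20b` pulled. The helpers `make_chat`, `build_messages`, and `chat_once`, as well as the system prompt text, are illustrative choices rather than part of the package.

```python
def make_chat(model="gpt-oss:20b", temperature=0.0):
    """Create a chat model backed by the local Ollama server."""
    # Imported here so the module still loads where langchain-ollama is absent.
    from langchain_ollama.chat_models import ChatOllama

    return ChatOllama(model=model, temperature=temperature)


def build_messages(user_text, system_text="You are a concise assistant."):
    """Build a list of (role, content) pairs, a form chat models accept in invoke()."""
    return [("system", system_text), ("human", user_text)]


def chat_once(user_text, chat=None):
    """Send one user message and return the text of the model's reply."""
    chat = chat if chat is not None else make_chat()
    reply = chat.invoke(build_messages(user_text))  # an AIMessage
    return reply.content
```

For example, `chat_once("Translate 'good morning' to French.")` returns the reply text from the message object.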
Ollama tool calling
Ollama tool calling uses the OpenAI-compatible web server specification and can be used with the default BaseChatModel.bind_tools() method, as described here.
Make sure to select an Ollama model that supports tool calling.
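To make the flow concrete, the sketch below binds a plain Python function as a tool and dispatches whatever tool calls the model returns. It assumes `gpt-oss:20b` supports tool calling in your Ollama installation. The `multiply` tool, the `TOOLS` registry, and the helper functions are hypothetical examples, not part of LangChain.

```python
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the product."""
    return a * b


TOOLS = {"multiply": multiply}  # name -> callable registry used for dispatch


def make_tool_model(model="gpt-oss:20b"):
    """Create a chat model with the registered tools bound to it."""
    # Imported here so the module still loads where langchain-ollama is absent.
    from langchain_ollama.chat_models import ChatOllama

    return ChatOllama(model=model).bind_tools(list(TOOLS.values()))


def run_tool_calls(tool_calls, tools=TOOLS):
    """Execute each requested tool call and return the results in order."""
    return [tools[call["name"]](**call["args"]) for call in tool_calls]


def answer_with_tools(question, model=None):
    """Ask the model, then run any tool calls attached to its reply."""
    model = model if model is not None else make_tool_model()
    reply = model.invoke(question)
    return run_tool_calls(reply.tool_calls)
```

The model only proposes calls; nothing runs until `run_tool_calls` looks each name up in the registry, which keeps execution limited to functions you have explicitly listed.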
Embedding models
from langchain_ollama.embeddings import OllamaEmbeddings
See the notebook example here.
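One common use of embeddings is ranking documents by similarity to a query. The sketch below uses `OllamaEmbeddings` from the `langchain-ollama` package with its `embed_query` and `embed_documents` methods. The model name `nomic-embed-text` is an assumption (any embedding model you have pulled should work), and `make_embedder`, `cosine_similarity`, and `most_similar` are illustrative helpers.

```python
import math


def make_embedder(model="nomic-embed-text"):
    """Create an embedding model; the default model name is an assumption."""
    # Imported here so the module still loads where langchain-ollama is absent.
    from langchain_ollama import OllamaEmbeddings

    return OllamaEmbeddings(model=model)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def most_similar(query, documents, embedder=None):
    """Return the document whose embedding is closest to the query's."""
    embedder = embedder if embedder is not None else make_embedder()
    query_vec = embedder.embed_query(query)          # one vector
    doc_vecs = embedder.embed_documents(documents)   # one vector per document
    scores = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
    best = max(range(len(documents)), key=scores.__getitem__)
    return documents[best]
```

For larger collections a vector store would be the usual choice; the linear scan here is only meant to show what the two embedding calls return.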