Async programming with LangChain
LLM-based applications often involve many I/O-bound operations, such as API calls to language models, databases, or other services. Asynchronous programming (or async programming) is a paradigm that lets a program make progress on multiple tasks concurrently instead of waiting for each one to finish, which improves efficiency and responsiveness, particularly for I/O-bound work.
This guide assumes that you are already familiar with asynchronous programming in Python. If you are not, please find appropriate resources online to learn it first. The guide focuses specifically on what you need to know to work with LangChain in an asynchronous context.
LangChain asynchronous APIs
Many LangChain APIs are designed to be asynchronous, allowing you to build efficient and responsive applications.
Typically, any method that may perform I/O operations (e.g., making API calls, reading files) will have an asynchronous counterpart.
In LangChain, async implementations are located in the same classes as their synchronous counterparts, with the asynchronous methods having an "a" prefix. For example, the synchronous invoke method has an asynchronous counterpart called ainvoke.
Many components of LangChain implement the Runnable Interface, which includes support for asynchronous execution. This means that you can run Runnables asynchronously using the await keyword in Python.
await some_runnable.ainvoke(some_input)
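The naming convention can be sketched without LangChain installed. In the example below, EchoRunnable is a hypothetical stand-in written for illustration, not a LangChain class; it only mirrors the invoke/ainvoke pairing described above, and asyncio.sleep(0) stands in for real network I/O.

```python
import asyncio


class EchoRunnable:
    """Hypothetical stand-in for a Runnable (not a real LangChain class)."""

    def invoke(self, text: str) -> str:
        # Synchronous entry point.
        return text.upper()

    async def ainvoke(self, text: str) -> str:
        # Asynchronous counterpart: same name with an "a" prefix.
        await asyncio.sleep(0)  # stands in for awaiting real network I/O
        return text.upper()


async def main() -> list:
    some_runnable = EchoRunnable()
    # Awaiting several ainvoke calls together lets their I/O waits overlap.
    return await asyncio.gather(
        some_runnable.ainvoke("hello"),
        some_runnable.ainvoke("world"),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With a real Runnable, the same pattern should apply: call ainvoke inside a coroutine and await it, or pass several calls to asyncio.gather to run them concurrently.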
Other components, such as Embedding Models and VectorStores, do not implement the Runnable Interface but usually follow the same rule: the asynchronous version of each method lives in the same class and has an "a" prefix.
For example,
await some_vectorstore.aadd_documents(documents)
Runnables created using the LangChain Expression Language (LCEL) can also be run asynchronously as they implement the full Runnable Interface.
For more information, please review the API reference for the specific component you are using.
Delegation to sync methods
Most popular LangChain integrations provide native asynchronous support for their APIs. For example, the ainvoke method of many ChatModel implementations uses httpx.AsyncClient to make asynchronous HTTP requests to the model provider's API.
When an asynchronous implementation is not available, LangChain tries to provide a default implementation, even if it incurs a slight overhead.
By default, LangChain will delegate the execution of unimplemented asynchronous methods to the synchronous counterparts. LangChain almost always assumes that the synchronous method should be treated as a blocking operation and should be run in a separate thread.
This is done using asyncio.loop.run_in_executor functionality provided by the asyncio library. LangChain uses the default executor provided by the asyncio library, which lazily initializes a thread pool executor with a default number of threads that is reused in the given event loop. While this strategy incurs a slight overhead due to context switching between threads, it guarantees that every asynchronous method has a default implementation that works out of the box.
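The delegation strategy described above can be illustrated with the standard library alone. This is a minimal sketch of the pattern, not LangChain's actual code: SyncOnlyComponent is a hypothetical class whose ainvoke hands the blocking invoke to the event loop's default executor by passing None as the executor argument.

```python
import asyncio
import threading


class SyncOnlyComponent:
    """Hypothetical component that only has a real synchronous implementation."""

    def invoke(self, text: str) -> tuple:
        # Blocking work; also report which thread executed it.
        return text[::-1], threading.current_thread().name

    async def ainvoke(self, text: str) -> tuple:
        # Default async implementation: run the blocking call in the event
        # loop's default executor (None) so the loop itself stays responsive.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self.invoke, text)


async def main() -> tuple:
    result, worker_thread = await SyncOnlyComponent().ainvoke("abc")
    # Return the result plus both thread names for comparison.
    return result, worker_thread, threading.current_thread().name


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The two thread names differ, which shows that the synchronous method ran off the event loop thread. That thread hop is the source of the small overhead mentioned above.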
Performance
Async code in LangChain generally performs well out of the box, with minimal overhead, and is unlikely to be a bottleneck in most applications.
The two main sources of overhead are:
- Cost of context switching between threads when delegating to synchronous methods. This can be addressed by providing a native asynchronous implementation.
- In LCEL, any "cheap functions" that appear as part of the chain are either scheduled as tasks on the event loop (if they are async) or run in a separate thread (if they are sync), rather than being run inline.
The latency overhead you should expect from these ranges from tens of microseconds to a few milliseconds.
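If you want a rough feel for this overhead on your own machine, the sketch below compares calling a trivial function inline with sending it through the default executor. It uses only the standard library; cheap and measure are hypothetical names, and the numbers it reports depend on your hardware and load, so treat them as indicative only.

```python
import asyncio
import time


def cheap(x: int) -> int:
    # A trivially cheap synchronous function.
    return x + 1


async def measure(n: int = 200) -> dict:
    loop = asyncio.get_running_loop()

    start = time.perf_counter()
    for i in range(n):
        cheap(i)  # run inline on the event loop thread
    inline = (time.perf_counter() - start) / n

    start = time.perf_counter()
    for i in range(n):
        await loop.run_in_executor(None, cheap, i)  # run in a worker thread
    threaded = (time.perf_counter() - start) / n

    # Average seconds per call for each strategy.
    return {"inline_s": inline, "threaded_s": threaded}


if __name__ == "__main__":
    print(asyncio.run(measure()))
```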
A more common source of performance issues arises from users accidentally blocking the event loop by calling synchronous code in an async context (e.g., calling invoke rather than ainvoke).
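The effect of blocking the event loop is easy to reproduce with the standard library. In this sketch, time.sleep stands in for a synchronous call such as invoke and asyncio.sleep stands in for an awaited call such as ainvoke; the function names are hypothetical.

```python
import asyncio
import time


async def blocking_call(delay: float) -> None:
    # Mistake: a synchronous wait inside a coroutine stalls the whole event loop.
    time.sleep(delay)


async def non_blocking_call(delay: float) -> None:
    # Awaiting yields control, so other tasks can run during the wait.
    await asyncio.sleep(delay)


async def timed(call, delay: float = 0.1, n: int = 3) -> float:
    # Run n calls "concurrently" and return the total elapsed seconds.
    start = time.perf_counter()
    await asyncio.gather(*(call(delay) for _ in range(n)))
    return time.perf_counter() - start


async def main() -> tuple:
    blocked = await timed(blocking_call)
    overlapped = await timed(non_blocking_call)
    return blocked, overlapped


if __name__ == "__main__":
    blocked, overlapped = asyncio.run(main())
    print(f"blocking: {blocked:.2f}s, non-blocking: {overlapped:.2f}s")
```

The blocking variant takes roughly the sum of the delays because each call holds the loop until it finishes, while the non-blocking variant takes roughly one delay because the waits overlap.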
Compatibility
LangChain is only compatible with the asyncio library, which is distributed as part of the Python standard library. It will not work with other async libraries like trio or curio.
In Python 3.9 and 3.10, asyncio tasks do not accept a context parameter. Due to this limitation, LangChain cannot automatically propagate the RunnableConfig down the call chain in certain scenarios.
If you are experiencing issues with streaming, callbacks or tracing in async code and are using Python 3.9 or 3.10, this is a likely cause.
Please read Propagating RunnableConfig to learn how to pass the RunnableConfig down the call chain manually (or upgrade to Python 3.11 or later, where this is no longer an issue).
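The underlying limitation can be illustrated with the standard library. The sketch below is not LangChain's code: current_config and child are hypothetical names. On Python 3.11 and later, asyncio.create_task accepts a context argument, so a value stored in a context variable can be handed to the new task; on earlier versions the sketch falls back to passing the value explicitly, which is analogous to propagating the config manually.

```python
import asyncio
import contextvars
import sys

# Hypothetical context variable standing in for an ambient config.
current_config = contextvars.ContextVar("current_config", default=None)


async def child(explicit_config=None):
    # Prefer an explicitly passed config; otherwise read the ambient one.
    if explicit_config is not None:
        return explicit_config
    return current_config.get()


async def main():
    config = {"tags": ["demo"]}
    # Build a separate context in which the config is set, leaving the
    # caller's own context untouched.
    ctx = contextvars.copy_context()
    ctx.run(current_config.set, config)

    if sys.version_info >= (3, 11):
        # Python 3.11+: run the task inside the prepared context.
        task = asyncio.create_task(child(), context=ctx)
    else:
        # Python 3.9 / 3.10: no context parameter, so pass the config by hand.
        task = asyncio.create_task(child(explicit_config=config))

    seen_by_child = await task
    # The caller's context never had the config set.
    return seen_by_child, current_config.get()


if __name__ == "__main__":
    print(asyncio.run(main()))
```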
How to use in IPython and Jupyter notebooks
As of IPython 7.0, IPython supports asynchronous REPLs. This means that you can use the await keyword in the IPython REPL and Jupyter Notebooks without any additional setup. For more information, see the IPython blog post.
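For comparison, a standalone script has no running event loop, so it has to start one itself. The sketch below shows the script form; main is a hypothetical coroutine, and asyncio.sleep(0) is a placeholder for an awaited call such as ainvoke.

```python
import asyncio


async def main() -> str:
    await asyncio.sleep(0)  # placeholder for an awaited call such as ainvoke
    return "done"


# In a script, start the event loop explicitly with asyncio.run.
# In an IPython or Jupyter cell you would write `result = await main()` instead.
if __name__ == "__main__":
    result = asyncio.run(main())
    print(result)
```

In a Jupyter notebook, an event loop is typically already running, so use the top-level await form there rather than asyncio.run.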