Docling
Docling parses PDF, DOCX, PPTX, HTML, and other formats into a rich, unified representation that includes document layout, tables, and more, making them ready for generative AI workflows such as RAG.
This integration provides Docling's capabilities via the DoclingLoader document loader.
Installation and Setup
Simply install langchain-docling from your package manager, e.g. pip:
pip install langchain-docling
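To confirm that the install landed in the environment you are actually running, a quick check along these lines may help. It is only a sketch: the helper name is_installed is hypothetical, and it relies solely on the standard library. Note that the pip package is named langchain-docling, while the importable module is langchain_docling.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in the current environment."""
    # is_installed is a hypothetical helper, not part of langchain-docling.
    return importlib.util.find_spec(module_name) is not None


# The import name uses an underscore, unlike the pip package name.
print("langchain_docling available:", is_installed("langchain_docling"))
```

If this prints False, the package was most likely installed into a different interpreter or virtual environment than the one running the script.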
Document Loader
The DoclingLoader class in langchain-docling seamlessly integrates Docling into LangChain, enabling you to:
- use various document types in your LLM applications with ease and speed, and
- leverage Docling's rich representation for advanced, document-native grounding.
Basic usage looks as follows:
from langchain_docling import DoclingLoader
FILE_PATH = ["https://arxiv.org/pdf/2408.09869"] # Docling Technical Report
loader = DoclingLoader(file_path=FILE_PATH)
docs = loader.load()
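Each item returned by loader.load() is a LangChain Document, which carries its text in page_content and a metadata dictionary. The sketch below shows one way to preview such a list. To stay runnable without installing anything or downloading the PDF, it uses a hypothetical StubDocument stand-in that mirrors those two attributes; the preview helper and the presence of a "source" key in the metadata are likewise assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass
class StubDocument:
    """Hypothetical stand-in with the two attributes of a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def preview(docs: Iterable[Any], width: int = 60) -> list:
    """Return one summary line per document: index, source, truncated snippet."""
    lines = []
    for i, doc in enumerate(docs):
        # Assumption: the loader records the origin under a "source" key.
        source = doc.metadata.get("source", "<unknown>")
        # Collapse runs of whitespace so the snippet fits on a single line.
        snippet = " ".join(doc.page_content.split())[:width]
        lines.append(f"[{i}] {source}: {snippet}")
    return lines


# In real use, pass the list returned by loader.load() instead of these stubs.
sample = [
    StubDocument(
        "Docling  converts PDF\ndocuments.",
        {"source": "https://arxiv.org/pdf/2408.09869"},
    ),
    StubDocument("A chunk without metadata."),
]
for line in preview(sample):
    print(line)
```

Because preview only reads page_content and metadata, the same function should work unchanged on the docs list from the basic usage above.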
For end-to-end usage check out this example.