
UnstructuredXMLLoader

This notebook provides a quick overview for getting started with the UnstructuredXMLLoader document loader. UnstructuredXMLLoader loads XML (.xml) files; the page content of each resulting document is the text extracted from the XML tags.

Overview

Integration details

Class | Package | Local | Serializable | JS support
UnstructuredXMLLoader | langchain_community | ✅ | ❌ | ✅

Loader features

Source | Document Lazy Loading | Native Async Support
UnstructuredXMLLoader | ✅ | ❌

Setup

To access the UnstructuredXMLLoader document loader, you'll need to install the langchain-community integration package.

Credentials

No credentials are needed to use the UnstructuredXMLLoader.

If you want automated, best-in-class tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:

# import getpass
# import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Installation

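Install langchain_community. The loader parses files with the unstructured package, so that package must also be available in your environment.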

%pip install -qU langchain_community

Initialization

Now we can instantiate the loader object:

from langchain_community.document_loaders import UnstructuredXMLLoader

loader = UnstructuredXMLLoader(
    "./example_data/factbook.xml",
)

Load

docs = loader.load()
docs[0]
Document(metadata={'source': './example_data/factbook.xml'}, page_content='United States\n\nWashington, DC\n\nJoe Biden\n\nBaseball\n\nCanada\n\nOttawa\n\nJustin Trudeau\n\nHockey\n\nFrance\n\nParis\n\nEmmanuel Macron\n\nSoccer\n\nTrinidad & Tobado\n\nPort of Spain\n\nKeith Rowley\n\nTrack & Field')
print(docs[0].metadata)
{'source': './example_data/factbook.xml'}
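
To make the shape of the page content above more concrete, here is a minimal sketch that uses only Python's standard library. It is not how UnstructuredXMLLoader works internally (the loader delegates parsing to the unstructured package); it only illustrates the idea of collecting the text found inside XML tags. The sample XML, its element names, and the extract_text helper are assumptions made for illustration and do not reflect the real schema of factbook.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: element names are assumed, not taken from factbook.xml.
SAMPLE_XML = """<factbook>
  <country>
    <name>United States</name>
    <capital>Washington, DC</capital>
  </country>
  <country>
    <name>Canada</name>
    <capital>Ottawa</capital>
  </country>
</factbook>"""


def extract_text(xml_string: str) -> str:
    """Collect the non-empty text of every element, joined by blank lines."""
    root = ET.fromstring(xml_string)
    # itertext() walks the tree in document order and yields every text fragment,
    # including the whitespace between tags, which is filtered out here.
    pieces = [text.strip() for text in root.itertext() if text.strip()]
    return "\n\n".join(pieces)


page_content = extract_text(SAMPLE_XML)
print(page_content)
```

The tag names are discarded and only the text survives, which matches the general pattern visible in the page content printed earlier.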

Lazy Load

page = []
for doc in loader.lazy_load():
    page.append(doc)
    if len(page) >= 10:
        # do some paged operation, e.g.
        # index.upsert(page)

        page = []
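
The paging loop above only acts on full pages of 10, so any documents left over at the end are never processed. A minimal sketch of one way to handle that is shown here, using a generator that also yields the final partial page. The names fake_lazy_load and in_pages are hypothetical helpers for this illustration; fake_lazy_load merely stands in for loader.lazy_load() so the example runs without any XML file or third-party package.

```python
from typing import Iterable, Iterator, List


def fake_lazy_load(n: int) -> Iterator[str]:
    """Hypothetical stand-in for loader.lazy_load(); yields n placeholder items."""
    for i in range(n):
        yield f"doc-{i}"


def in_pages(docs: Iterable[str], page_size: int = 10) -> Iterator[List[str]]:
    """Group a lazily produced stream of documents into pages of page_size."""
    page: List[str] = []
    for doc in docs:
        page.append(doc)
        if len(page) >= page_size:
            yield page
            page = []
    # Yield whatever is left so the last partial page is not dropped.
    if page:
        yield page


pages = list(in_pages(fake_lazy_load(25), page_size=10))
for p in pages:
    # A paged operation such as index.upsert(p) would go here.
    print(len(p))
```

With a real loader, the call would presumably become in_pages(loader.lazy_load()), and each yielded page could be passed to whatever paged operation is needed.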

API reference

For detailed documentation of all UnstructuredXMLLoader features and configurations, head to the API reference: https://python.langchain.com/v0.2/api_reference/community/document_loaders/langchain_community.document_loaders.xml.UnstructuredXMLLoader.html

