Two things are certain in business: documents, and communications. Even as businesses transition to powerful agentic workflows—with AI agents, robots, and people working together—documents, communications, and other types of content will continue to underlie every enterprise process.
Intelligent document processing (IDP) tools are firmly established within many businesses. Traditionally, IDP has helped businesses extract useful data from structured documents to deepen business and customer insights, and to drive automation in document-based processes.
However, IDP has evolved to become even more useful and powerful. In this article I’ll explain what’s driving this evolution, and how IDP is using the latest generative AI (GenAI) models to turn even the most complex unstructured documents into clean, structured data.
A significant amount of work in the enterprise depends on complicated unstructured content. Think of complex documents such as legal contracts, brokerage statements, and physician statements, or unstructured communications such as product feedback and customer queries.
In fact, McKinsey estimates that 90% of all business data is unstructured, with a large portion coming in the form of complex documents.
Traditionally, IDP solutions have struggled to understand and process complex, unstructured documents. This is because IDP has long depended on rules-based methods or specialized AI models pre-trained to understand and process documents in a consistent, structured schema. Think of an invoice—many clearly structured ‘fields’ where important information like customer names, dates, and invoice numbers can be found.
Compare this to a complex contract or a brokerage statement, where key information is buried within masses of text, often without a standardized format. These complex documents are typically very long, often running to hundreds of pages, and their structures vary. They can include complex elements like nested tables, charts, and images.
Pre-trained IDP models have another downside: data annotation. This is where people in the business (called ‘subject matter experts’) manually label data to help the IDP model recognize the correct fields to extract. While data annotation is very useful for fine-tuning model performance, it can be time-consuming and resource-intensive. Models that rely solely on user annotations are hard to scale to long and complex documents, because annotating them is slow and laborious.
An inference-first approach using generative models, in which the model extracts data from instructions rather than from a large set of labeled examples, is critical here to ensure fast time to value and scalability. Being able to turn unstructured content into structured data, and then pass that data to an AI agent, will be incredibly valuable for enterprises and allow them to automate vastly more processes. Fortunately, a new generation of IDP solutions is helping businesses do just that!
Reading a document may be relatively simple for people, but it’s a very hard task for machines to do. Processing a document isn’t just about being able to extract data from document fields and complex elements like graphs and tables. You also need to understand how those individual fields relate to each other and to the document as a whole.
Ultimately, the goal in processing documents is to transform unstructured information into a well-structured schema to support analysis and automation of processes. This requires real reasoning and understanding. That’s why traditional, specialized IDP models, trained to extract data into one specific schema, won’t always perform at the level expected.
However, in the last couple of years, an answer has been found. IDP solutions have begun to combine specialized AI with state-of-the-art GenAI models. These are usually large language models (LLMs) similar to those powering UiPath Autopilot and Anthropic’s Claude. These LLMs are capable not just of absorbing large amounts of multimodal data, but also of reasoning about and interpreting it to understand its meaning. These capabilities make LLMs ideal for more interpretive tasks where the input isn’t always fixed. That’s why they’re such a powerful tool for understanding unstructured documents and extracting their key data for insight and automation.
By combining structured, specialized AI with flexible GenAI, IDP solutions enable businesses to process the bulk of their enterprise data, whether it’s structured, semi-structured, or unstructured. When combined with AI agents, robots, and people, entire processes involving the most complex and unstructured documents and communications can be automated end-to-end.
However, the latest IDP solutions still need to give businesses precise control over GenAI outputs. When automating document-heavy workflows, the goal is often to populate a specific format or schema with data extracted from those documents. Yet LLM outputs can be unpredictable, which is counterproductive in settings where consistency and reliability are key. Getting an LLM to generate outputs in a consistent format (such as a table) requires time-consuming prompt engineering. This is where subject matter experts experiment with giving the model various instructions until the desired output is achieved. However, as business needs change and use cases scale, teams will soon find themselves with a massive instruction set that’s difficult to maintain.
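To make the idea of populating a fixed schema concrete, here is a minimal Python sketch of checking a model's output before it reaches downstream automation. It is not how any particular IDP product works: the `SCHEMA` fields, the `validate_extraction` function, and the assumption that the model returns a JSON object are all hypothetical, chosen only for illustration.

```python
import json
from datetime import date

# Hypothetical target schema for an invoice-like extraction.
# Each field maps to a function that parses (and thereby validates) its value.
SCHEMA = {
    "customer_name": str,
    "invoice_number": str,
    "invoice_date": date.fromisoformat,  # expects "YYYY-MM-DD"
    "total_amount": float,
}


def validate_extraction(raw_output: str) -> tuple[dict, list[str]]:
    """Parse a model's JSON output and check it against SCHEMA.

    Returns (record, problems): the successfully parsed fields and a list of
    human-readable problems. An empty problem list means the output conforms.
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return {}, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return {}, ["top-level value is not a JSON object"]

    record, problems = {}, []
    for field, parse in SCHEMA.items():
        if field not in data:
            problems.append(f"missing field: {field}")
            continue
        try:
            record[field] = parse(data[field])
        except (TypeError, ValueError):
            problems.append(f"bad value for {field}: {data[field]!r}")

    # Fields the schema does not define are reported rather than silently kept.
    for extra in sorted(set(data) - set(SCHEMA)):
        problems.append(f"unexpected field: {extra}")
    return record, problems
```

A check like this does not make the model more predictable, but it turns format drift into an explicit list of problems that can be retried or sent to a person, instead of letting malformed data flow into an automated process.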
Furthermore, LLMs can make mistakes. They’ve been known to hallucinate (make up outputs) and to struggle to cite sources reliably. In IDP, this could take the form of extracting the wrong information or even outputting made-up data. They also don’t provide reliable confidence levels for their extractions, which increases the need for human review. Businesses need controls and targeted human-in-the-loop validation to ensure GenAI outputs are correct, so that they can be used consistently and reliably with a minimal amount of human review.
UiPath has provided AI-enabled document processing and communications mining for years. But our approach has evolved and now incorporates advanced GenAI.
We call our new IDP capability UiPath IXP (Intelligent Xtraction and Processing), where the ‘X’ not only represents the idea of ‘extraction’ but also a growing number of diverse content types, including complex and unstructured documents and communications. UiPath IXP is built for fast time to value and a seamless user experience to take the pain out of complex document processing of all types.
UiPath combines the use of specialized AI models for structured documents with state-of-the-art GenAI for complex unstructured content. Our inference-first experience means you can provide the model instructions (just like a prompt) and it will extract the information you need and put it into the format you specify.
At the same time, UiPath IXP provides precise controls to ensure the accuracy of our IDP outputs. Users have the ability to write prompt instructions at the level of individual fields and, crucially, are given tools to evaluate model accuracy. They can engage in closed-loop learning to improve model performance and ensure outputs meet the exact requirements. This is enhanced by custom validation experiences that are tuned to specific use cases. Our models give confidence scores for every prediction so you can manually review when the model isn’t entirely sure. Finally, our models also provide attribution, giving sources and references for all their predictions.
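The role of confidence scores and attribution in limiting manual review can be sketched in a few lines of Python. This is an illustration of the general pattern only, not the UiPath IXP API: the `Prediction` class, its fields, the `route` function, and the 0.85 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One extracted field (hypothetical shape, for illustration only)."""
    field: str
    value: str
    confidence: float  # assumed to be a score between 0 and 1
    source_page: int   # attribution: page where the value was found


# Illustrative threshold; in practice it would be tuned per use case.
REVIEW_THRESHOLD = 0.85


def route(predictions, threshold=REVIEW_THRESHOLD):
    """Split predictions into auto-accepted ones and ones needing human review."""
    accepted, needs_review = [], []
    for p in predictions:
        if p.confidence >= threshold:
            accepted.append(p)
        else:
            needs_review.append(p)
    return accepted, needs_review


if __name__ == "__main__":
    predictions = [
        Prediction("policy_number", "PN-1", 0.97, source_page=1),
        Prediction("claim_amount", "4,200", 0.62, source_page=14),
    ]
    accepted, needs_review = route(predictions)
    for p in needs_review:
        # The attribution tells the reviewer where to look in a long document.
        print(f"Review {p.field!r} (confidence {p.confidence}) on page {p.source_page}")
```

The design point is that a reviewer only sees the low-confidence fields, and the attribution tells them where in a document of hundreds of pages to check the value.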
Learn how UiPath IXP enables enterprises to automate more and extend end-to-end automation (both agentic and traditional) into areas previously untouched. Watch our on-demand session from UiPath Agentic AI Summit.
SVP, Product Management, UiPath