Automate schema generation for intelligent document processing | Amazon Web Services
Before you can extract information from documents using intelligent document processing (IDP) techniques, you need a schema for each document class that defines what to extract. But how do you create schemas when you have thousands of documents and don’t know what classes exist? Defining schemas by hand at that scale takes substantial manual effort, which makes downstream IDP initiatives difficult to justify.
In this post, we’ll show you how our multi-document discovery feature solves this problem. It serves as an automated pre-processing step, analyzing unknown documents, clustering them by type, and generating schemas ready for the IDP Accelerator. You’ll learn how the new capability uses visual embeddings for automatic clustering and agents for schema generation. We’ll also walk you through running the solution on your own document collections.
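To make the clustering idea concrete, here is a minimal sketch of grouping documents by embedding similarity. It is not the IDP Accelerator's implementation: the excerpt does not say which embedding model or clustering algorithm the feature uses, so the greedy threshold approach, the function names `cosine` and `cluster`, the `threshold` value, and the toy three-dimensional vectors are all assumptions chosen for illustration. In practice the vectors would come from a visual embedding model applied to each document.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(embeddings, threshold=0.9):
    """Greedy clustering (illustrative only): assign each vector to the most
    similar existing centroid if similarity >= threshold, else start a new
    cluster. Returns a list of clusters, each a list of document indices."""
    clusters = []   # document indices per cluster
    centroids = []  # running mean vector per cluster
    for i, vec in enumerate(embeddings):
        best, best_sim = None, threshold
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
            centroids.append(list(vec))
        else:
            clusters[best].append(i)
            n = len(clusters[best])
            # Update the running mean with the newly added vector.
            centroids[best] = [
                (m * (n - 1) + v) / n for m, v in zip(centroids[best], vec)
            ]
    return clusters


# Toy stand-ins for visual embeddings of five documents (hypothetical values).
embeddings = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0],
]

groups = cluster(embeddings)
print(groups)
```

Each resulting group would then be a candidate document class for which a schema can be generated. A greedy pass like this depends on input order and on the chosen threshold, so a production pipeline would more likely use an established clustering algorithm.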
IDP Accelerator
The IDP Accelerator is a scalable, serverless, open-source solution for automated document processing and information extraction. To customize the solution to...