How to extract specific information from documents with Python?
Tutorial

In this tutorial, you will learn how to use the Document Queries API in five minutes with Python. Eden AI provides a developer-friendly API for extracting specific information from your documents.

What is Document Queries API?

A document queries API, also known as a custom document parser API, is a tool that extracts specific information from unstructured text-based documents, including PDFs and web pages, so that the data can be analyzed and processed further.

With Custom Document Parsing, you submit queries that describe the data you need. The system uses optical character recognition (OCR) to scan the document and natural language processing (NLP) models to interpret the queries and extract the relevant information automatically.

The API is particularly useful for extracting data quickly and accurately from large volumes of documents, such as invoices or legal documents. Businesses can use it to automate data extraction, which saves time and improves operational efficiency.

Get Started with Document Queries API

The first step is to install Python's requests package, which will allow you to call the Eden AI API.

Next, you'll need Python's json module to read and print the result of the API request. It is part of the standard library, so there is nothing to install: you only have to import it.
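As a quick sanity check before going further, the sketch below verifies that the required packages can be imported. The helper name `missing_packages` is hypothetical and only used for illustration; `requests` is a third-party package (installed with `pip install requests`), while `json` ships with Python.

```python
import importlib.util
import json  # standard library: no installation needed


def missing_packages(names):
    """Return the package names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["requests"])
if missing:
    print("Install the missing packages first: pip install " + " ".join(missing))
else:
    print("requests is available, and json is built in.")
```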

How to extract specific information from documents with Python

You are now ready to send your file to the Eden AI Document Queries API. It accepts .pdf, .jpg, and .png files, and documents in many languages.
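If you want to catch unsupported files before uploading them, a small local check can help. This is a minimal sketch: `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical names, and the set only contains the extensions named in this tutorial, so the API may accept more.

```python
from pathlib import Path

# Extensions named in this tutorial; the API itself may accept others.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".png"}


def is_supported(path):
    """Return True if the file extension is one of those listed above."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported("invoice.pdf"))
print(is_supported("notes.docx"))
```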

1. Get a Document Queries API Key on Eden AI

To use Document Queries, create a free account on Eden AI. You can then get your API key directly from the homepage, along with the free credits offered by Eden AI.
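Once you have the key, it is safer to keep it out of your source code. The sketch below reads it from an environment variable and builds the request headers. Two assumptions to check against the Eden AI documentation: the key is sent as a bearer token in the `Authorization` header, and `EDENAI_API_KEY` is a hypothetical variable name chosen for this example.

```python
import os


def build_headers(api_key):
    """Build the HTTP headers for an authenticated request.

    Assumption: the key is passed as a bearer token.
    """
    if not api_key:
        raise ValueError("An Eden AI API key is required")
    return {"Authorization": f"Bearer {api_key}"}


# EDENAI_API_KEY is a hypothetical environment variable name.
api_key = os.environ.get("EDENAI_API_KEY", "")
if api_key:
    headers = build_headers(api_key)
    print("Headers are ready.")
else:
    print("Set the EDENAI_API_KEY environment variable before calling the API.")
```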

2. Let’s extract specific information from documents

Now that you have imported the packages and obtained your API key, you can extract specific information from your document. Eden AI lets you choose among a range of engines for Document Queries. The list of available Document Queries providers is in our documentation.

Here is the Python script to write in your notebook:
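A minimal sketch of such a script follows. Several details are assumptions to verify against the Eden AI documentation: the endpoint URL, the bearer-token header, and the `providers`, `queries`, and `file` form fields. The function names and the provider names `provider_a` and `provider_b` are placeholders.

```python
import json

# Assumption: endpoint of the asynchronous Document Queries feature.
DOCUMENT_QUERIES_URL = "https://api.edenai.run/v2/ocr/custom_document_parsing_async"


def build_form_data(providers, queries):
    """Build the form fields sent with the document.

    providers: list of provider names, e.g. ["provider_a", "provider_b"]
    queries: list of dicts such as {"query": "...", "pages": "1-*"}
    Assumption: the API expects comma-separated providers and JSON-encoded queries.
    """
    return {
        "providers": ",".join(providers),
        "queries": json.dumps(queries),
    }


def launch_job(api_key, file_path, providers, queries):
    """Upload the document and return the decoded JSON response."""
    import requests  # third-party package: pip install requests

    headers = {"Authorization": f"Bearer {api_key}"}
    with open(file_path, "rb") as document:
        response = requests.post(
            DOCUMENT_QUERIES_URL,
            headers=headers,
            data=build_form_data(providers, queries),
            files={"file": document},
            timeout=60,
        )
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


# Example call with placeholder values:
# launch_response = launch_job(
#     "YOUR_API_KEY",
#     "invoice.pdf",
#     ["provider_a", "provider_b"],
#     [{"query": "What is the invoice total?", "pages": "1-*"}],
# )
# print(launch_response)
```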

In this example, we call two different Document Queries engines. The Eden AI API then returns the results from each of those providers in its JSON response.

The Eden AI Document Queries API is asynchronous, which means the first response contains an ID for your request:
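Reading that ID from the decoded response could look like the sketch below. The key name `public_id` is an assumption to confirm in the Eden AI documentation, `extract_job_id` is a hypothetical helper, and the sample dictionary is a placeholder rather than real API output.

```python
def extract_job_id(launch_response):
    """Return the request ID from the decoded launch response.

    Assumption: the ID is stored under the "public_id" key.
    """
    job_id = launch_response.get("public_id")
    if not job_id:
        raise ValueError("No ID found in the response")
    return job_id


# Placeholder response, for illustration only.
sample_response = {"public_id": "example-job-id"}
print(extract_job_id(sample_response))
```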

Then perform a GET request to check the status of the API request (processing, finished, or failed):
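One way to do this is to poll the API until the request is no longer processing. The sketch below assumes that the status is read from a `status` field, that the job is fetched by appending the ID to the endpoint URL, and that the key is sent as a bearer token; `fetch_job` and `wait_for_job` are hypothetical helper names.

```python
import time

# Assumption: same endpoint as the launch request, with the ID appended.
DOCUMENT_QUERIES_URL = "https://api.edenai.run/v2/ocr/custom_document_parsing_async"


def fetch_job(api_key, job_id):
    """Send one GET request and return the decoded JSON response."""
    import requests  # third-party package: pip install requests

    response = requests.get(
        f"{DOCUMENT_QUERIES_URL}/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()


def wait_for_job(fetch, interval_seconds=2.0, max_attempts=30):
    """Call fetch() until the status is no longer "processing".

    fetch: a function with no arguments that returns the decoded response.
    """
    for _ in range(max_attempts):
        job = fetch()
        if job.get("status") != "processing":
            return job
        time.sleep(interval_seconds)
    raise TimeoutError("The request did not finish in time")


# Example call with placeholder values:
# job = wait_for_job(lambda: fetch_job("YOUR_API_KEY", "example-job-id"))
# print(job["status"])
```

Passing the fetch step as a function keeps the polling loop independent of the HTTP call, so you can try it with a fake response before spending credits.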

You will first get this response:

Once the request is done (status: finished), you can print the result:
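A small helper can guard against printing too early. This sketch assumes the decoded response exposes `status` and `results` fields, which should be checked against the Eden AI documentation; `format_results` is a hypothetical name.

```python
import json


def format_results(job):
    """Return the results as indented JSON, or a status message if not finished.

    Assumption: the response has "status" and "results" fields.
    """
    if job.get("status") != "finished":
        return f"Request not finished yet (status: {job.get('status')})"
    return json.dumps(job.get("results", {}), indent=2, ensure_ascii=False)


# Placeholder response, for illustration only.
print(format_results({"status": "processing"}))
```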

Here is an example of a result for a Document Queries task:

Benefits of using Document Queries API with Eden AI

Using Document Queries with Eden AI API is quick and easy.

Save time and cost

We offer a unified API for all providers: simple and standard to use, with quick switching between providers and access to each provider's specific features.

Easy to integrate

The JSON output format is the same for all providers thanks to Eden AI's standardisation work. The response elements are also standardised thanks to Eden AI's matching algorithms.

Customization

With Eden AI, you can integrate a third-party platform: we can quickly develop connectors. To go further and customize your Document Queries request with specific parameters, check out our documentation.
