Quickly and easily extract tables from documents and transform them into CSV in just a few simple steps!
CSV is a simple, human-readable, and widely supported format for tabular data. That makes it the go-to choice for data manipulation, analysis, and integration, especially for businesses that rely on spreadsheets, databases, and data warehouses for decision-making.
That is precisely why we are offering a Python-based solution for converting JSON responses from the Eden AI OCR table API into CSV format. By following these steps, you’ll learn how to streamline data processing and integration and get the most out of your digitized content.
NOTE: This tutorial concentrates on simple tables that map cleanly to .csv. Tables with many row and column spans are an entirely different challenge to represent in such a simple format.
First things first: we need to parse our document into JSON using the Eden AI API.
The API is asynchronous: a request starts a job and returns right away, so we can launch several requests at once without waiting for earlier ones to finish. This is useful for documents spanning multiple pages, which can take a long time to process.
However, for the purpose of this example, we will just send a very simple table that can be found here.
Here is a code snippet to show you how to launch the job:
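The original snippet is not reproduced here, so what follows is a minimal sketch rather than the exact code. It assumes the endpoint `https://api.edenai.run/v2/ocr/ocr_tables_async`, a Bearer API key, a JSON body with `providers` and `file_url` fields, and the `amazon` provider name; check all of these against the Eden AI documentation. The helper names `build_launch_request` and `launch_job` are made up for illustration.

```python
import json
import urllib.request

# Assumed endpoint for the asynchronous OCR table feature (verify in the docs).
OCR_TABLES_URL = "https://api.edenai.run/v2/ocr/ocr_tables_async"


def build_launch_request(api_key, file_url, provider="amazon"):
    """Build (but do not send) the POST request that starts an OCR table job."""
    # Assumed payload fields: the provider to use and a public URL of the document.
    payload = {"providers": provider, "file_url": file_url}
    return urllib.request.Request(
        OCR_TABLES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def launch_job(request, opener=urllib.request.urlopen):
    """Send the request and return the public_id of the created job."""
    with opener(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["public_id"]
```

A call would look like `public_id = launch_job(build_launch_request(API_KEY, TABLE_URL))`, where `API_KEY` and `TABLE_URL` are your own values. The `opener` parameter only exists so the function can be exercised without network access.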
The API returns a public_id that we can now use to get the result of the job. Since we don’t know when it will finish, we will poll the job and check its status every 5 seconds.
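The polling loop could be sketched as follows. This assumes the job is read back with a GET on the same endpoint followed by the `public_id`, and that the response carries a `status` field whose values include `finished` and `failed`; these details are assumptions to confirm in the Eden AI documentation. `fetch_job` and `wait_for_job` are hypothetical helper names, and the `max_attempts` cap is an addition so the loop cannot run forever.

```python
import json
import time
import urllib.request

# Assumed endpoint; the job's public_id is appended to retrieve its state.
OCR_TABLES_URL = "https://api.edenai.run/v2/ocr/ocr_tables_async"


def fetch_job(api_key, public_id, opener=urllib.request.urlopen):
    """GET the current state of a job and return it as a dict."""
    request = urllib.request.Request(
        f"{OCR_TABLES_URL}/{public_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_job(fetch, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch() every `interval` seconds until the job finishes or fails."""
    for _ in range(max_attempts):
        job = fetch()
        status = job.get("status")  # assumed field name and values
        if status == "finished":
            return job
        if status == "failed":
            raise RuntimeError(f"OCR job failed: {job.get('error')}")
        sleep(interval)
    raise TimeoutError(f"job still not finished after {max_attempts} checks")
```

Typical use would be `job = wait_for_job(lambda: fetch_job(API_KEY, public_id))`. Passing the fetch step as a callable keeps the waiting logic independent of how the HTTP call is made.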
Now that we have the table, we need to format it as a list of rows, with each row being a list of strings.
Here is how to do it:
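As a sketch of this step, the code below assumes the finished job stores tables under `results -> <provider> -> pages -> tables`, and that each table has `rows`, each row has `cells`, and each cell has a `text` field. That layout is an assumption about the response, so print your own JSON and adjust the keys if it differs. `first_table` and `table_to_rows` are illustrative names.

```python
def first_table(job, provider):
    """Pick the first table of the first page from a finished job (assumed layout)."""
    return job["results"][provider]["pages"][0]["tables"][0]


def table_to_rows(table):
    """Turn one parsed table into a list of rows, each a list of cell strings."""
    rows = []
    for row in table.get("rows", []):
        cells = row.get("cells", [])
        # Missing or null text becomes an empty string so every cell is a str.
        rows.append([(cell.get("text") or "").strip() for cell in cells])
    return rows
```

Row and column spans are ignored on purpose here, in line with the simple tables this tutorial targets.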
Finally, we just need to create a CSV file and write the data into it:
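A minimal way to do this uses Python's standard `csv` module. The function name `write_csv` and the UTF-8 encoding are choices made for this sketch, not something the API requires.

```python
import csv


def write_csv(rows, path):
    """Write a list of rows (lists of strings) to a CSV file at `path`."""
    # newline="" lets the csv module control line endings itself,
    # which avoids blank lines between rows on Windows.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
```

For example, `write_csv(rows, "table.csv")` writes the rows produced in the previous step. The writer quotes any cell that contains a comma or a quote character, so such cells survive a round trip through a CSV reader.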
Here is the resulting CSV file:
And there it is! We have successfully parsed a table document and transformed it into a CSV file. It is straightforward to do with Python, and it shouldn’t be hard to implement in other languages.