Resume parsing (OCR): Which solution to choose?

Introduction: 

Enterprises receive thousands of resumes per year on average. Traditionally, each one was screened and shortlisted manually: recruiters had to read every resume separately and extract the relevant data to judge whether the candidate fit the position. This process takes an immense amount of time and can come at the cost of candidate quality. Resume parsing, also called OCR resume, addresses most of the problems of manual resume screening. In this article, we focus on this technology.

What is resume parsing: 

This technology is built on two steps. The first step is text extraction with OCR (Optical Character Recognition), which extracts the text from various file types: PDF, DOCX, JPEG, PNG, etc. The goal of this step is only to get the text in the document, without dealing with the document's structure.

The second step is data categorization, which classifies the extracted text into keys and tags such as personal information and skills. It is based on deep learning algorithms and NER (named entity recognition).

The final result of the parsing is a structured, machine-readable format such as JSON or XML, which makes it easy to store in a database and analyze automatically.
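To make the two steps concrete, here is a minimal sketch in Python. It assumes the OCR step has already produced raw text (the `raw_text` sample is invented), and it uses simple regular expressions as a stand-in for the deep learning and NER models a real parser would use. The key names in the output are illustrative, not those of any specific provider.

```python
import json
import re

# Hypothetical output of the OCR step: raw text with no structure.
raw_text = """Jane Doe
jane.doe@example.com
+1 555 010 9999
Skills: Python, SQL, Docker"""


def categorize(text: str) -> dict:
    """Toy categorization step: regexes stand in for a trained NER model."""
    email = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
    phone = re.search(r"\+?\d[\d ]{7,}\d", text)
    skills = re.search(r"Skills:\s*(.+)", text)
    return {
        "personal_information": {
            "email": email.group(0) if email else None,
            "phone": phone.group(0) if phone else None,
        },
        "skills": [s.strip() for s in skills.group(1).split(",")] if skills else [],
    }


parsed = categorize(raw_text)
# Serialize to JSON so the result can be stored in a database or analyzed.
print(json.dumps(parsed, indent=2))
```

A production parser would replace `categorize` with a trained model, but the overall shape (unstructured text in, keyed structure out) stays the same.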

Use case: 

OCR resume, or resume parsing, is mostly used in the human resources and recruitment field. As of today, the recruitment industry is worth more than $200B globally and grows year on year. Millions of people upload resumes and apply for jobs every day on thousands of employment platforms.

Computers and artificial intelligence have changed the recruitment process by introducing software such as the applicant tracking system (ATS), of which resume parsers are now a part. The recruiter sets criteria for the job, and candidates who do not match them can be filtered out quickly and automatically. The recruiter begins by uploading, automatically or manually, all applications for the position to the ATS. Once they are uploaded, the resume parser scans each document and extracts the relevant information, such as work experience, skills, contact information, education and professional certifications. Recruiters can even customize fields that may not be included in a traditional resume but are needed for the position.
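The filtering step described above can be sketched as follows. This is only an illustration: the `ParsedResume` class, its fields and the criteria (required skills and minimum years of experience) are hypothetical and do not come from any particular ATS.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ParsedResume:
    """Hypothetical structured output of a resume parser."""
    name: str
    skills: Set[str] = field(default_factory=set)
    years_experience: float = 0.0


def matches(resume: ParsedResume, required_skills: Set[str], min_years: float) -> bool:
    """Return True if the resume has every required skill and enough experience."""
    return required_skills <= resume.skills and resume.years_experience >= min_years


candidates: List[ParsedResume] = [
    ParsedResume("Alice", {"python", "sql"}, 4),
    ParsedResume("Bob", {"excel"}, 10),
    ParsedResume("Chloe", {"python", "sql", "docker"}, 1),
]

# Criteria set by the recruiter for this position.
shortlist = [c.name for c in candidates if matches(c, {"python", "sql"}, min_years=2)]
print(shortlist)
```

Because the parser has already turned each resume into structured fields, the filter itself is a plain comparison rather than a text search.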

Resume Parsing API engines: 

For our study of resume parser APIs, we chose 3 providers whose APIs offer high performance according to many blog articles and rankings.

  • HrFlow
  • Affinda
  • Sovren

These are the provider APIs we are going to test. Note that other solutions, including open source ones, also exist.

Use case tested: 

As mentioned previously, resume parsing APIs are mainly used with ATS software for recruitment. In this article, we test different resume parser APIs on more than 40 resumes.

We ran all 3 APIs on each of these resumes, which were provided by a company. Of course, for a real project you will need to test on a representative sample of your own database to get an accurate view of how the providers perform.

In our benchmark, we compared the performance of the APIs on the following fields: full name, phone number, address, education, work experience (description and dates) and skills.

Note that some providers extract more than 100 fields from a resume, but since we only wanted the traditional information about the candidate, we focused on these fields.

The API response is JSON, from which specific keys are extracted to build a custom ATS.

Challenges of parsing APIs (Eden AI):

While using different parsing APIs, we met some challenges. Some providers perform well on basic information such as name, address and skills but do not retrieve experience, while others perform well on experience and education but not on basic information.

Another challenge concerns the returned keys. Some APIs return the first name and last name separately, while others return a single full-name key. The same goes for work experience: in some APIs, the description, title and dates are included in the same key.
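To illustrate the key mismatch, here is a small sketch of the kind of normalization you would otherwise have to write yourself. The two response shapes and the key names (`first_name`, `last_name`, `full_name`) are assumptions made for the example, not the actual schemas of the providers tested.

```python
def normalize_name(raw: dict) -> str:
    """Map either naming scheme onto a single full-name string."""
    if raw.get("full_name"):
        return raw["full_name"].strip()
    # Fall back to joining the separate parts, skipping missing or empty ones.
    parts = [raw.get("first_name") or "", raw.get("last_name") or ""]
    return " ".join(p.strip() for p in parts if p.strip())


# Hypothetical responses from two providers for the same resume.
provider_a = {"first_name": "Jane", "last_name": "Doe"}
provider_b = {"full_name": "Jane Doe "}

print(normalize_name(provider_a))
print(normalize_name(provider_b))
```

Each additional field (work experience, education, dates) needs its own mapping of this kind, which is why comparing raw provider responses by hand quickly becomes tedious.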

To bypass these challenges, we used Eden AI to call APIs from different vendors. Eden AI returns the results of multiple resume parser APIs from a single request, with a standardized response for all of them.

No preprocessing is needed to compare them, so combining results from multiple providers takes only a few lines of code.

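import requests

key = "YOUR_API_KEY"  # placeholder: replace with your Eden AI API key
URL = "https://api.edenai.run/v1/pretrained/ocr/ocr_cv"
resume_path = "test.pdf"

# Authenticate with a bearer token
headers = {"Authorization": "Bearer " + key}
# Providers to query, sent as a stringified list
form_data = {"providers": str(["affinda"])}

# Open the resume in binary mode; the file is closed once the request completes
with open(resume_path, "rb") as resume_file:
    response = requests.post(URL, data=form_data, files={"files": resume_file}, headers=headers)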

Alternatively, use the web interface, where you import the resume and choose the providers you want to test.

You can also track and estimate your costs for each available resume parsing provider. Since pricing is shown per request, this gives you an idea of the budget for your project.

Results:

Please note that the results represent the percentage of resumes for which the result is accurate. A prediction that is close to the real field without being exact is counted as a bad prediction, e.g. predicting the last name but not the first name.

Warning: These results are not a definitive measure of performance; performance will always depend on your dataset.
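The scoring rule above can be sketched as a per-field exact-match accuracy. This is an illustration rather than the script used for the benchmark: the sample data is invented, and ignoring case and surrounding whitespace when comparing is an assumption of this sketch.

```python
from typing import Dict, List, Optional


def _norm(value: Optional[str]) -> str:
    """Assumed normalization: ignore case and surrounding whitespace."""
    return (value or "").strip().casefold()


def field_accuracy(predictions: List[Dict], ground_truth: List[Dict], field: str) -> float:
    """Percentage of resumes whose predicted field exactly matches the truth."""
    correct = sum(
        1
        for pred, truth in zip(predictions, ground_truth)
        if _norm(pred.get(field)) == _norm(truth.get(field))
    )
    return 100.0 * correct / len(ground_truth)


# Invented example: the second prediction has the last name only,
# so it counts as a bad prediction even though it is close.
truth = [{"full_name": "Jane Doe"}, {"full_name": "John Smith"}]
preds = [{"full_name": "jane doe"}, {"full_name": "Smith"}]

accuracy = field_accuracy(preds, truth, "full_name")
print(accuracy)
```

Running the same function once per field and per provider yields the kind of comparison grid a benchmark like this relies on.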

Conclusion: 

The best way to obtain the highest performance always depends on the data used. For some use cases, the API from provider A will be the best; for others, the API from provider B will be better. For a more complex use case, a combination may be needed. Sometimes, a custom model can be better.

With Eden AI, you get fast access to results from various providers, so you can get a better idea of which solution fits you best.

The decision making is as follows:

First, you run your data on Eden AI to benchmark the solutions available on the market.

Then, either the results lead you to choose the one API that fits your needs, or different providers give good results on different fields, in which case you can build your own custom model by combining multiple providers.
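Combining providers field by field could look like the sketch below. The accuracy scores are made-up numbers used only for illustration (they are not results from this benchmark), and the provider names and response shapes are hypothetical.

```python
from typing import Any, Dict


def best_provider_per_field(scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each field, pick the provider with the highest benchmark accuracy."""
    return {f: max(by_provider, key=by_provider.get) for f, by_provider in scores.items()}


def combine(results: Dict[str, Dict[str, Any]], routing: Dict[str, str]) -> Dict[str, Any]:
    """Build one record by taking each field from its best provider."""
    return {f: results[provider].get(f) for f, provider in routing.items()}


# Made-up per-field accuracies (in %) from a benchmark on your own data.
scores = {
    "full_name": {"provider_a": 90.0, "provider_b": 75.0},
    "skills": {"provider_a": 60.0, "provider_b": 85.0},
}

# Hypothetical standardized responses for one resume.
results = {
    "provider_a": {"full_name": "Jane Doe", "skills": ["Python"]},
    "provider_b": {"full_name": "Jane", "skills": ["Python", "SQL"]},
}

routing = best_provider_per_field(scores)
combined = combine(results, routing)
print(combined)
```

The routing table is computed once from the benchmark and then reused for every new resume, so the extra cost is one request per provider you keep.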

This process helps you make the right choice for your project. Eden AI is a universal API that gives you flexibility in using all these resume parsing engines, so you can aim for the best performance/cost ratio.

If you want to know more about Eden AI or simply to talk with us, feel free to book an appointment or write to us!
