Docparser is web-based software that lets you extract data from PDF documents and convert it into easy-to-handle structured data. https://docparser.com/blog/extract-data-from-pdf/
In this screencast we will cover the basics and show you how to get started with Docparser. In other screencasts, we cover specific use-cases, show you how to download your parsed data and leverage our cloud integrations to fully automate your workflow.
Creating your free trial account is as quick as clicking the “Start Free Trial” button and filling in the signup form.
Now we create our first Document Parser.
Choose the category that's the closest match to the document type you are converting. In this screencast we will create a Document Parser for a simple PDF form. Our other screencasts cover how to extract data from popular document types, such as invoices, purchase orders and standardized contracts.
You will then be prompted to upload a few sample documents which are used as blueprints for setting up our parsing rules. Simply navigate to the files, or drag and drop.
Docparser also provides the option to email files to your parser, import documents using our API, or automatically fetch them from cloud storage platforms such as Box, Google Drive and Dropbox. More on this use case in future videos.
Once your sample documents are uploaded, it’s time to create parsing rules for our document layout.
A parsing rule is a set of instructions that tells our algorithms how to extract and format your data. Typically, you will create one parsing rule for every data field you want to extract.
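To make the idea of "one parsing rule per data field" concrete, here is a minimal Python sketch. It is only an illustration of the concept: the `ParsingRule` class, the `parse_document` function and the sample text are hypothetical and are not Docparser's internal data model or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical model of a parsing rule (not Docparser's actual implementation).
@dataclass
class ParsingRule:
    field_name: str                               # the data field this rule fills
    extract: Callable[[str], str]                 # how to pull the raw text out
    formatter: Callable[[str], str] = str.strip   # how to tidy the raw text

    def apply(self, document_text: str) -> str:
        return self.formatter(self.extract(document_text))


def parse_document(document_text: str, rules: List[ParsingRule]) -> Dict[str, str]:
    """Run every rule once and collect one value per data field."""
    return {rule.field_name: rule.apply(document_text) for rule in rules}


# Made-up sample text standing in for an uploaded document.
sample = "Name:  Jane Doe \nDate: 2017-03-05"

rules = [
    ParsingRule("name", lambda t: t.splitlines()[0].split(":", 1)[1]),
    ParsingRule("date", lambda t: t.splitlines()[1].split(":", 1)[1]),
]

record = parse_document(sample, rules)
print(record)
```

Each rule yields exactly one field, so two rules produce a record with two fields.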
Based on the document category you chose in the first step, Docparser suggests a couple of popular parsing rule presets. In our case we are seeing parsing rule presets related to PDF form processing.
Docparser offers parsing rule presets for all kinds of use-cases. These presets allow you to extract any kind of data from fixed or variable positions in your document.
We will now create our first parsing rule to extract data from PDF form fields.
Just select the preset, draw a rectangle around the position where the form answer is located and confirm.
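Drawing a rectangle amounts to selecting the text that falls inside a fixed region of the page. The Python sketch below shows that general idea, assuming the document text is already available as words with page coordinates. The `Word` and `Rect` types, the coordinates and the `text_in_rect` helper are all hypothetical; how Docparser does this internally is not covered here.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # horizontal position of the word on the page
    y: float  # vertical position of the word on the page


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def text_in_rect(words: List[Word], rect: Rect) -> str:
    """Return the words located inside the rectangle, in reading order."""
    inside = [
        w for w in words
        if rect.left <= w.x <= rect.right and rect.top <= w.y <= rect.bottom
    ]
    inside.sort(key=lambda w: (w.y, w.x))  # top to bottom, then left to right
    return " ".join(w.text for w in inside)


# Made-up page content: two form labels and their answers.
words = [
    Word("Name:", 50, 100), Word("Jane", 150, 100), Word("Doe", 190, 100),
    Word("Date:", 50, 130), Word("05/03/2017", 150, 130),
]

# The rectangle "drawn" around the answer to the first form field.
answer_box = Rect(left=140, top=90, right=300, bottom=110)
name_text = text_in_rect(words, answer_box)
print(name_text)
```

Because the rule is tied to a position rather than to the content, the same rectangle can be applied to every document that shares the layout.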
You are now presented with the text data located at the position you defined in the previous step.
You can choose to add additional text filters to further process the extracted text data. We will skip this step and continue creating more parsing rules.
Our next parsing rule uses a pre-built filter which automatically identifies, extracts and formats dates. Just draw a rectangle around the approximate location of your date and confirm. As you can see, Docparser was able to isolate the date and already presents it in a standardized format.
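As a rough illustration of what "identify, extract and format a date" can mean, the Python sketch below searches a piece of text for a few common date patterns and returns the first hit in ISO format. The pattern list, the assumption that slash dates are month/day/year, and the `normalize_date` name are choices made for this example only; they are not a description of Docparser's date filter.

```python
import re
from datetime import datetime
from typing import Optional

# (regular expression, strptime format) pairs; an assumed, incomplete list.
DATE_PATTERNS = [
    (r"\d{4}-\d{2}-\d{2}", "%Y-%m-%d"),          # 2017-03-05
    (r"\d{1,2}/\d{1,2}/\d{4}", "%m/%d/%Y"),      # 3/5/2017 (assumed month/day/year)
    (r"[A-Z][a-z]+ \d{1,2}, \d{4}", "%B %d, %Y"),  # March 5, 2017 (English month names)
]


def normalize_date(text: str) -> Optional[str]:
    """Find the first recognizable date in the text and return it as YYYY-MM-DD."""
    for pattern, date_format in DATE_PATTERNS:
        match = re.search(pattern, text)
        if match is None:
            continue
        try:
            return datetime.strptime(match.group(0), date_format).strftime("%Y-%m-%d")
        except ValueError:
            continue  # looked like a date but was not valid, try the next pattern
    return None


print(normalize_date("Signed on March 5, 2017 by the applicant"))
```

The surrounding words are ignored, which is why an approximate selection around the date is enough.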
Again, should you need to further filter or format the data, you could do so by chaining multiple text and table filters.
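Chaining filters means feeding the output of one filter into the next. The Python sketch below shows the pattern with three small text filters. The `chain` helper and the filter functions are hypothetical examples, not the filters available in the Docparser rule editor.

```python
from functools import reduce
from typing import Callable

TextFilter = Callable[[str], str]


def chain(*filters: TextFilter) -> TextFilter:
    """Combine several filters into one that applies them left to right."""
    def run(text: str) -> str:
        return reduce(lambda value, f: f(value), filters, text)
    return run


def remove_label(text: str) -> str:
    """Drop everything up to and including the first colon."""
    return text.split(":", 1)[-1]


def collapse_whitespace(text: str) -> str:
    """Trim the text and squeeze runs of whitespace into single spaces."""
    return " ".join(text.split())


# Three filters applied in order: strip the label, tidy spacing, fix casing.
clean_name = chain(remove_label, collapse_whitespace, str.title)
print(clean_name("Name:   jane    DOE "))
```

Order matters here: each filter only sees what the previous one returned.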
That’s it. We just created two parsing rules: one extracts text from a fixed position, and the other extracts a date from an approximate location inside the document.
We are now leaving the parsing rule editor to view our parsed data. Our parsing rules are working as expected and we see two extracted data fields for each document.
With just a few easy steps we created our first Document Parser capable of batch converting PDF documents into structured data.
Stay tuned for more videos where we will build parsers for specific use-cases, show you how to download your parsed data and leverage our cloud integrations to fully automate your workflow.