In this video we will show you how to extract tables from PDFs or scanned documents. You will learn how to convert native PDFs to Excel and how to apply OCR in case you are dealing with scanned documents. As sample documents we are using invoices and purchase orders that follow a fixed format. The technique of converting a PDF into Excel format can also be applied to other document types.
Check out our related blog articles:
PDF to Excel: https://docparser.com/blog/convert-pdf-to-excel/
PDF to CSV: https://docparser.com/blog/convert-pdf-csv-extract-text-from-pdf/
PDF to Google Sheets: https://docparser.com/blog/pdf-to-google-sheets/
Let’s jump right in and create a new Document Parser. In this screencast we will choose ‘Purchase Orders’ as the document format. After confirming our document type we will upload a couple of sample documents. Once we are done uploading our sample documents we can proceed to our new Document Parser. As with invoice processing, Docparser comes with powerful data extraction algorithms tailored to parsing purchase orders. As we can see, Docparser already created three parsing rules which extracted the purchase order number, the purchase order date and the totals. We will now create a new parsing rule to extract the line-items from our purchase orders. Selecting the ‘Line-Items’ preset will bring us to the table extraction toolkit. All you need to do is move, add or delete column spacers as needed to define the columns of your table. Then select the approximate outer boundary of the data table you seek to extract. When you proceed, you will see your parsed table rows in raw unfiltered format at the top. At this point we have basically already converted our PDF into spreadsheet data. You can now chain up multiple filters to further process your PDF table data until you end up with clean table rows. Docparser offers a variety of table manipulation filters.
For example, you can add a filter which groups rows together when a line-item description runs over two lines. Adding this filter gives us a clean table representation of the line items in our purchase order. Let’s hit save and leave the parsing rule editor. We can now flip through the sample purchase orders which we uploaded previously. As you can see, all purchase order data is extracted correctly, including all line-items. That’s it for today! We just showed you how easy it is to extract data from PDF to Excel, CSV or Google Sheets. Stay tuned for more screencasts covering advanced parsing setup and automating your workflow thanks to our Docparser integration partners.
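For readers who want to see the idea behind the two table steps in this screencast (cutting text at column spacers, then grouping rows whose description wraps onto a second line), here is a minimal Python sketch. It is an illustration only and not Docparser's implementation: the function names, the character-offset spacers and the sample purchase order rows are all assumptions made for the example.

```python
from typing import List

def split_at_spacers(line: str, spacers: List[int]) -> List[str]:
    """Cut one line of fixed-width text into cells at the given character offsets."""
    starts = [0, *sorted(spacers)]
    ends = starts[1:] + [None]          # None lets the last cell run to the end of the line
    return [line[a:b].strip() for a, b in zip(starts, ends)]

def group_wrapped_rows(rows: List[List[str]], key_col: int = 0,
                       text_col: int = 1) -> List[List[str]]:
    """Merge a row into the previous one when its key column is empty (assumed rule)."""
    merged: List[List[str]] = []
    for row in rows:
        if merged and not row[key_col]:
            # Continuation line: append its text to the previous row's description.
            merged[-1][text_col] = (merged[-1][text_col] + " " + row[text_col]).strip()
        else:
            merged.append(list(row))
    return merged

# Hypothetical raw text lines: quantity (4 chars), description (22 chars), unit price.
sample = [("2", "Steel bracket, zinc", "4.50"),
          ("", "plated, 40 mm", ""),
          ("10", "Wood screws", "0.12")]
raw_lines = [f"{qty:<4}{desc:<22}{price}" for qty, desc, price in sample]

raw_rows = [split_at_spacers(line, [4, 26]) for line in raw_lines]
clean_rows = group_wrapped_rows(raw_rows)
for row in clean_rows:
    print(row)
```

The grouping rule here (an empty first column marks a continuation line) is just one plausible heuristic; a real filter may use a different signal to decide which rows belong together.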
Views: 21604 Docparser
Docparser is web-based software that allows you to extract data from PDF documents and convert them into easy-to-handle structured data. https://docparser.com/blog/extract-data-from-pdf/ In this screencast we will cover the basics and show you how to get started with Docparser. In other screencasts, we cover specific use-cases, show you how to download your parsed data and leverage our cloud integrations to fully automate your workflow. Creating your free trial account is as quick as clicking the “Start Free Trial” button and filling in the signup information. Now we create our first Document Parser. Choose the category that's the closest match to the document type you are converting. In this screencast we will create a Document Parser for a simple PDF form. Our other screencasts cover how to extract data from popular document types, such as invoices, purchase orders and standardized contracts. You will then be prompted to upload a few sample documents which are used as blueprints for setting up our parsing rules. Simply navigate to the files, or drag and drop. Docparser also provides the option to email files to your parser, import documents using our API, or automatically fetch them from cloud storage platforms such as Box, Google Drive and Dropbox. More on this use case in future videos. Once your sample documents are uploaded, it’s time to create parsing rules for our document layout. A parsing rule is basically a set of instructions that tells our algorithms how to extract and format your data. Typically, you will create one parsing rule for every data field you want to extract. Based on the document category you chose in the first step, Docparser suggests a couple of popular parsing rule presets. In our case we are seeing parsing rule presets related to PDF form processing. Docparser offers parsing rule presets for all kinds of use-cases. Those presets allow you to extract any kind of data from fixed or variable positions in your document.
We will now create our first parsing rule to extract data from PDF form fields. Just select the preset, draw a rectangle around the position where the form answer is located and confirm. You are now presented with the text data located at the position you defined in the previous step. You can choose to add additional text filters to further process the extracted text data. We will skip this step and continue creating more parsing rules. Our next parsing rule uses a pre-built filter which automatically identifies, extracts and formats dates. Just draw a rectangle around the approximate location of your date and confirm. As you can see, Docparser was able to isolate the date and already presents it in a standardized format. Again, should you need to further filter or format the data, you could do so by chaining up multiple text and table filters. That’s it, we just created two parsing rules which extract text from a fixed position and a date from an approximate location inside the document. We are now leaving the parsing rule editor to view our parsed data. Our parsing rules are working as expected and we see two extracted data fields for each document. With just a few easy steps we created our first Document Parser capable of batch converting PDF documents into structured data. Stay tuned for more videos where we will build parsers for specific use-cases, show you how to download your parsed data and leverage our cloud integrations to fully automate your workflow.
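To make the date step more concrete, here is a small Python sketch of what "identify, extract and format a date" can look like for the text found inside a selected rectangle. This is not Docparser's filter. The function name, the three recognised date patterns, the month-first reading of slash dates and the ISO output format are all assumptions chosen for the illustration, and the month-name pattern relies on English month names.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed set of patterns, each paired with the strptime format used to read it.
_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%m/%d/%Y"),   # assumes month first
    (re.compile(r"\b[A-Z][a-z]+ \d{1,2}, \d{4}\b"), "%B %d, %Y"),
]

def extract_date(text: str) -> Optional[str]:
    """Return the first recognisable date in `text` as YYYY-MM-DD, or None."""
    for pattern, fmt in _PATTERNS:
        match = pattern.search(text)
        if not match:
            continue
        try:
            return datetime.strptime(match.group(0), fmt).date().isoformat()
        except ValueError:
            # Looked like a date but was not a valid one (e.g. month 13); try the next pattern.
            continue
    return None

# Hypothetical text captured from an approximate region of a form.
print(extract_date("Date of issue: March 7, 2017 (net 30)"))
print(extract_date("Due 03/15/2017"))
```

Because the search runs over whatever text falls inside the region, the rectangle only needs to be approximate, which matches the behaviour described in the screencast.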
Views: 13641 Docparser
Welcome to another Docparser screencast. In this video we’ll show you how easy it is to convert PDF invoices into Excel spreadsheets. https://docparser.com/blog/invoice-scanning/ With just a couple of clicks we’ll create a Document Parser which is capable of extracting invoice numbers, invoice totals and dates, and then returns the data as an Excel spreadsheet. The Document Parser we are about to build will work for PDF invoices with various layouts and, thanks to the OCR capabilities of Docparser, also works for scanned invoices in multiple languages. Let’s start by creating our Document Parser. In the first step, we select “Invoice” as the document type. Docparser comes with powerful invoice data extraction algorithms which are automatically put in place when creating an Invoice Document Parser. All that is left to do is to upload a couple of sample invoices. Later on, invoices can be uploaded manually with our app, sent in as email attachments, imported through our API or fetched from your cloud storage provider. Once we are done uploading our sample invoices we can proceed to our new Document Parser. Let’s have a look at our uploaded sample documents and check what kind of data Docparser extracted. As you can see, three parsing rules were created automatically which extract the invoice date, invoice number and the invoice totals. In upcoming videos we’ll show you how to tweak existing parsing rules and add more as needed. As a last step we will create a download link that gives us the data from all uploaded invoices as an Excel file. We are presented with different file format options and we will choose the preferred Excel format. By default, download links will include the data of the most recently uploaded documents. Let’s change the settings to something like “Documents uploaded this month” or “Documents upload since last download”. In the advanced settings, we can define which fields our Excel file should include and what type of data we are dealing with.
This step is optional but saves you from manually formatting your data in Excel. Let’s hit save and download our Excel file. Everything is looking as expected and we are done with the setup. We just created an invoice parser which converts invoices into an easy-to-handle Excel file. Stay tuned for more videos where we explain how to extract individual line-items from invoices and how to fully automate your workflow with our cloud integration options.
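The effect of choosing fields and data types before export can be sketched in Python as follows. This is a rough illustration, not how Docparser builds its download files: the field names, the `FIELD_TYPES` mapping, the `select_and_type` helper and the sample invoice records are all hypothetical.

```python
from datetime import date
from decimal import Decimal
from typing import Any, Callable, Dict, List

# Hypothetical export settings: which fields to keep, and how to read each one.
FIELD_TYPES: Dict[str, Callable[[str], Any]] = {
    "invoice_number": str,
    "invoice_date": date.fromisoformat,
    "total": Decimal,
}

def select_and_type(documents: List[Dict[str, str]],
                    field_types: Dict[str, Callable[[str], Any]]) -> List[Dict[str, Any]]:
    """Keep only the configured fields and convert each value to its configured type."""
    return [{name: cast(doc[name]) for name, cast in field_types.items()}
            for doc in documents]

# Hypothetical parsed invoices; every extracted value starts out as text.
parsed = [
    {"invoice_number": "INV-0042", "invoice_date": "2017-03-07",
     "total": "1250.00", "vendor_notes": "net 30"},
    {"invoice_number": "INV-0043", "invoice_date": "2017-03-15",
     "total": "89.90", "vendor_notes": ""},
]

rows = select_and_type(parsed, FIELD_TYPES)
print(sum(row["total"] for row in rows))  # totals can be summed because they are numbers now
```

Once values carry a real type, a spreadsheet writer can store them as dates and numbers instead of text, which is the manual formatting work the optional step is meant to save.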
Views: 9647 Docparser
In this screencast we will show you how to convert form-based PDF contracts into easy-to-handle structured data. We will create a Document Parser that extracts names, dates and checkbox values from a standardized lease agreement. Let’s start by creating our new Document Parser. Docparser offers a variety of templates for popular document types. By choosing a matching document type we make sure that appropriate default settings are in place. When prompted for the type of document, we choose “Standardized Contracts”. In the next step we will upload a couple of sample contracts which are used as blueprints for setting up our parsing rules. Simply navigate to the files, or drag and drop. Once the sample documents are uploaded it’s time to create the parsing rules. We will start by creating a parsing rule to extract the signatory’s name. We just draw a square around the location where the data can be found and confirm the selection. As you can see, a text manipulation filter was already added which splits the full name presented in our PDF contract into the first and family name. Our second parsing rule will extract the signatory’s date of birth. We select the “Date” preset and draw a square around the approximate location of the date inside our contract document. Once the location is confirmed, Docparser will return the date in a standardized format. We can keep the standardized format or choose to further process the extracted data by chaining up filters. Our third parsing rule will extract the value of a checkbox which is located at the bottom of our contracts. As with the other parsing rules, we first draw a square around the approximate position of the checkbox and hit confirm. As we can see, Docparser was able to identify the checkbox and determine whether or not it is checked. That’s it! We just created three parsing rules which convert our form-based PDF contracts into easy-to-handle structured data.
We are now leaving the parsing rule editor and we will flip through the sample contracts which we uploaded previously. All data fields have been extracted correctly. Now that our parsing rules are in place, we are ready to upload more contracts which will then be converted in real-time. Docparser also offers various options to fully automate your workflow. You can, for example, send new documents to Docparser via email or link a specific folder in Dropbox, Google Drive or Box to your Document Parser. Furthermore, Docparser can send the extracted data to other cloud-applications in real-time thanks to our integration partners. Stay tuned for more screencasts on how to automate your document-based workflows with Docparser.
Views: 3201 Docparser
This PDF parsing video will cover the basics, from account set-up to simple PDF to Text conversion rules. You can navigate the app to find download options from PDF to CSV, Excel, JSON and XML. Additionally, you can use our API, native integrations, or connect with Zapier to send parsed PDF data to hundreds of places. Docparser is a fantastic solution for anyone needing to extract PDF data from Purchase Orders, Invoices, Reports, Shipping & Delivery PDFs, Applications, Price Lists and much more.
Views: 3437 Docparser