
Extract Tables From PDF to Excel, CSV or Google Sheet with Docparser

In this video we will show you how to extract tables from PDFs or scanned documents. You will learn how to convert native PDFs to Excel and how to apply OCR in case you are dealing with scanned documents. As sample documents we are using invoices and purchase orders that follow a fixed format. The technique of converting a PDF into Excel format can also be applied to other document types.

Check out our related blog articles:
PDF to Excel: https://docparser.com/blog/convert-pdf-to-excel/
PDF to CSV: https://docparser.com/blog/convert-pdf-csv-extract-text-from-pdf/
PDF to Google Sheets: https://docparser.com/blog/pdf-to-google-sheets/

Let’s jump right in and create a new Document Parser. In this screencast we will choose ‘Purchase Orders’ as the document format. After confirming our document type we will upload a couple of sample documents. Once we are done uploading our sample documents we can proceed to our new Document Parser.

As with invoice processing, Docparser comes with powerful data extraction algorithms tailored to parsing purchase orders. As we can see, Docparser has already created three parsing rules, which extracted the purchase order number, the purchase order date and the totals.

We will now create a new parsing rule to extract the line-items from our purchase orders. Selecting the ‘Line-Items’ preset will bring us to the table extraction toolkit. All you need to do is move, add or delete column spacers as needed to define the columns of your table. Then select the approximate outer boundary of the data table you want to extract. When you proceed, you will see your parsed table rows in raw, unfiltered format at the top. At this point we have basically already converted our PDF into spreadsheet data.

You can now chain multiple filters to further process your PDF table data until you end up with clean table rows. Docparser offers a variety of table manipulation filters. For example, you can add a filter which groups rows together in case a line-item description runs over two lines. Adding this filter gives us a clean table representation of the line items in our purchase order.

Let’s hit save and leave the parsing rule editor. We can now flip through the sample purchase orders which we uploaded previously. As you can see, all purchase order data is extracted correctly, including all line-items.

That’s it for today! We just showed you how easy it is to extract data from PDF to Excel, CSV or Google Sheets. Stay tuned for more screencasts covering advanced parsing setup and automating your workflow thanks to our Docparser integration partners.
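For readers who want to see the idea behind column spacers and the row-grouping filter in code, here is a minimal Python sketch. It is not Docparser's implementation or API. It assumes the table text has already been extracted as fixed-width lines, and every name in it (`split_by_spacers`, `group_wrapped_rows`, `to_csv`, the sample data and the spacer offsets) is hypothetical and chosen only for illustration.

```python
import csv
import io
from typing import List


def split_by_spacers(line: str, spacers: List[int]) -> List[str]:
    """Cut one text line into cells at the given character offsets."""
    bounds = [0] + sorted(spacers) + [None]  # None = up to the end of the line
    return [line[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def group_wrapped_rows(rows: List[List[str]], key_col: int = 0,
                       text_col: int = 1) -> List[List[str]]:
    """Merge a row into the previous one when its key column is empty.

    Assumption: a wrapped description shows up as a row whose first
    column (here the quantity) is blank.
    """
    merged: List[List[str]] = []
    for row in rows:
        if merged and not row[key_col]:
            prev = merged[-1]
            prev[text_col] = (prev[text_col] + " " + row[text_col]).strip()
        else:
            merged.append(list(row))
    return merged


def to_csv(rows: List[List[str]]) -> str:
    """Serialize the cleaned rows as CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def fixed_width(qty: str, desc: str, price: str = "", total: str = "") -> str:
    """Build a sample line with column widths 6, 28, 8 (made-up layout)."""
    return f"{qty:<6}{desc:<28}{price:<8}{total}"


# Made-up purchase order lines; the second one is a wrapped description.
RAW_LINES = [
    fixed_width("2", "Widget, blue, large", "4.50", "9.00"),
    fixed_width("", "with mounting bracket"),
    fixed_width("1", "Gasket set", "3.25", "3.25"),
]
SPACERS = [6, 34, 42]  # offsets matching the widths used above

raw_rows = [split_by_spacers(line, SPACERS) for line in RAW_LINES]
clean_rows = group_wrapped_rows(raw_rows)
print(to_csv(clean_rows))
```

The raw rows correspond to the unfiltered table shown at the top of the rule editor, and `group_wrapped_rows` plays the role of one filter in the chain: three raw rows become two clean line-items before the CSV export.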
Text Comments (1)
LESLEY CHETTY (10 months ago)
how can i get to use this software
