How to extract tables from emails
In this article we'll see how to set up Parseur to extract tables from emails and parse result into structured data.
Extracting tables from emails is very useful, for example, when you want to process shopping orders coming in your mailbox. There is typically varying number of items in each email and you don't want to create a template for each different number of ordered items. For each ordered item, you want to extract several data point such as item name, quantity and price.
This article will assume you already have a Parseur account. If not, you can sign up for free here. Also, we will assume you already know how to create templates to parse simple emails with Parseur. If not, we recommend you to first read this Getting Started with Parseur guide.
Prefer a video?
Extracting a table from your emails, automatically
For the purpose of this demonstration, let's assume we want to extract details about book orders on Amazon. Whenever we order on Amazon, Amazon sends a confirmation email containing order detail. An Amazon confirmation email looks like this:
We want to create a template that can extract order details, including for each item the book title and the book price. We want to create a single template that will extract all data, irrespective of the number of books ordered.
Here is what you get after processing this email in Parseur:
|Date||Sunday, October 7|
|Customer Address||Jane Doe FAIRVIEW HEIGHTS VAN HOUTEN HILL LONDON, ZZ9 ABC United Kingdom|
|Items.Title||Fear and Loathing in Las Vegas|
|Items.Title||The Hitchhiker's Guide to the Galaxy|
And here is the same result, viewed in JSON format:
Staying true with our philosophy, capturing a table in Parseur is as simple as point and click.
Step 1: Create a table field
- In the template editor, select the whole table you want to capture.
- Click New Table Field to create a new field for that selection.
The table option panel opens on the right-hand side and the table gets highlighted in green.
Note: for best results, make sure the document you use for the template contains at least 2 rows (i.e. in our example, 2 books).
Step 2: Set table name and options
Use the Table option panel on the right-hand side to customize your field:
- Name: Enter table field name
- Output format: leave as is ("Table")
- Input format: In most cases, leave as is ("Vertical HTML Table"). Parseur will assume your table rows are laid out vertically, top to bottom.
- If your table rows are laid out horizontally from left to right, select "Horizontal Table"
- If your table is Text only, choose "Vertical Text Table" (or "Horizontal Text Table" respectively).
- Table Header: If the first row of your table contains the column names and Parseur was able to correctly identify columns in the result preview, tick the box "Table headers included in selection". Otherwise, go to Step 3 below to teach Parseur how to identify your columns.
Step 3: Create table columns
At the bottom of your document, Parseur shows you a live preview of the table parsing. We are going to create column fields to tell Parseur which information we want to extract. And then, we're going to points those columns across multiple rows until the preview result is what we want.
To do this:
- Select a piece of text you want to extract on the first row of the table
- Click New Table Column
- Enter a name for that new column and hit Enter (or click the tick button)
Repeat steps 1. to 3. for each column field you want to extract.
Now we need to teach Parseur how to split rows:
- Select similar fields on the following rows and assign it to the existing field.
- Do this until the result preview suits what you need.
When the table preview at the bottom of the document shows the desired result you are done!
Advanced usage: Table parsing setting using Regular Expressions. Parseur can reliably extract table data from documents just by pointing and clicking the fields as described here. In case you need to do some advanced table parsing, you can set your own table tow separators and table cell separators as Regexps in the Advanced options section.
Step 4: Save your changes
Click the Save Field button when you're done editing the table field.
Then, continue editing the template adding new fields and finally click Create Template when completed.
Step 5: Export your table data
You have several strategies available when it comes to exporting your table data. This section lists the different strategies depending on the type of export.
Download and Google Sheet
There are 2 options to export your table data:
- The default export keeps the "1 document per sheet row" convention: table field rows will be added as columns.
- The table export uses the "1 table field item per sheet row" convention: each table field item will be on its own sheet row and other fields extracted from the document will be repeated across rows.
There are 3 main ways to send table data to Zapier
1. New Table Processed trigger
Send the table directly through Zapier, each row in this table will turn into a row (or item) in the target application (for example, a row in Google Spreadsheet). Non-table fields in the document are repeated for each row. This is the trigger you will most likely need.
2. New Document Processed
Send each document as a whole with all tables flattened. Flattened means each document looks like the table mode result on the right of your document. It also means that the number of columns varies with the number of rows in the largest table. For this reason, This trigger is usually not recommended when using tables.
New Table Processed should provide the best compatibility with other apps. If you have more advanced needs we have one more solution, read on!
3. New Document Processed (With List Items)
Send the document with nested tables. This is similar to the JSON mode result on the right of your document. The number of columns does not vary with the number of rows. But the receiving app must be capable of handling item lists. This is the best choices for apps that can handle list items in Zapier, Freshbooks being one of them for example.
There are 3 options to export your table fields via webhooks:
- Document processed: this is the default option. JSON payload is a deep key/value object with table fields values sent as an array (equivalent to the JSON screen capture at the beginning of this article)
- Document processed (flattened): JSON payload is a flat key/value object with keys named as in the default table export (e.g. "Items.Title")
- Table item processed: JSON payload is an array of table item objects where other fields are repeated across table items (similar to the table export for Google Sheets).