How to extract tables from emails

In this article we'll see how to set up Parseur to extract tables from emails and parse result into structured data.

Extract tables from emails cover image

Extracting tables from emails is very useful, for example, when you want to parse shopping orders coming in your mailbox. There is typically varying number of items in each email and you don't want to create a template for each different number of ordered items. For each ordered item, you want to extract several data points such as item name, quantity and price.

This article will assume you already have a Parseur account. If not, you can sign up for free here. Also, we will assume you already know how to create templates to parse simple emails with Parseur. If not, we recommend you to first read this Getting Started with Parseur guide.

Prefer a video?

Extracting a table from your emails, automatically

For the purpose of this article, let's assume we want to extract details about book orders on Amazon. Whenever we order on Amazon, Amazon sends a confirmation email containing order detail. An Amazon confirmation email looks like this:

Example of an Amazon confirmation email containing the list of books ordered

Example of an Amazon confirmation email containing the list of books ordered

We want to create a template that can extract order details, including for each item the book title and the book price. We want to create a single template that will extract all data, irrespective of the number of books ordered.

Here is what you get after processing this email in Parseur:

Field Value
Reference 123-4567890-1234567
Date Sunday, October 7
Customer Address
Jane Doe
FAIRVIEW HEIGHTS
VAN HOUTEN HILL
LONDON, ZZ9 ABC
United Kingdom
Items[0].Title Fear and Loathing in Las Vegas
Items[0].Price £12.99
Items[1].Title The Tempest
Items[1].Price £9.99
Items[2].Title The Hitchhiker's Guide to the Galaxy
Items[2].Price £14.99
Total £37.97

And here is the same result, viewed in JSON format:

Data extracted by Parseur, in JSON

Data extracted by Parseur, in JSON

Staying true with our philosophy, capturing a table in Parseur is as simple as point and click.

Step 1: Create a table field

  • In the template editor, select the whole table you want to capture.
  • Click New Table Field to create a new field for that selection.

The table option panel opens on the right-hand side and the table gets highlighted in green.

Note: for best results, make sure the document you use for the template contains at least 2 rows (i.e. in our example, 2 books).

Select the whole table section and create a new table field

Select the whole table section and create a new table field

Step 2: Set table name and options

Use the Table option panel on the right-hand side to customize your field:

  • Name: Enter a name for the table field
  • Output format: leave as is ("Table")
  • Input format: In most cases, leave as is ("Vertical HTML Table"). Parseur will assume your table rows are laid out vertically, top to bottom.
    • If your table rows are laid out horizontally from left to right, select "Horizontal Table"
    • If your table is Text only, choose "Vertical Text Table" (or "Horizontal Text Table" respectively).
  • Table Header: If the first row of your table contains the column names and Parseur was able to correctly identify columns in the result preview, tick the box "Table headers included in selection". Otherwise, go to Step 3 below to teach Parseur how to identify your columns.

Step 3: Create table columns

At the bottom of your document, Parseur shows you a live preview of the table parsing.

You have several ways to customize Parseur table parsing to your needs.

Use default parsing

If the table preview shows the data as you want it, great!

To name the columns:

  • Click on the column name
  • Select the column name from the list or type a new one
Click on the column header to edit its name

Click on the column header to edit its name

Customize table parsing and columns

If Parseur didn't get the table parsed the way you wanted, you can improve the parsing result by telling Parseur which data you want to extract.

In this example, we are going to create column fields to capture the information we want to extract. And then, we're going to point those columns across multiple rows until the preview result is what we want.

To do this:

  1. Select a piece of text you want to extract on the first row of the table
  2. Click New Table Column
  3. Enter a name for that new column and hit Enter (or click the tick button)

Repeat steps 1. to 3. for each column field you want to extract.

Now we need to teach Parseur how to split rows:

  • Select similar fields on the following rows and assign it to the existing field.
  • Do this until the result preview suits what you need.

When the table preview at the bottom of the document shows the desired result you are done!

Create column fields and teach Parseur where they are in the table until you get the desired result

Create column fields and teach Parseur where they are in the table until you get the desired result

Advanced: Further customize table parsing with separators

If you need to do some advanced table parsing, you can set your own table row and cell separators.

Parseur uses regular expressions to split the table into rows and the rows into cells.

You will find the regular expressions generated by Parseur in the Advanced Options panel, under Row Separators and Cell Separators. Tick the Manually set separators box to customize the separators.

Open the Advanced Options panel to manually set separators

Open the Advanced Options panel to manually set Separators

The separators are regular expressions that Parseur will use to slice the table into cells.

Examples of using separators:

  • To slice the table into rows based on the HTML </tr> attribute, enter </tr> in row separator
  • To slice the table into rows based on a new line, enter \n in row separator
  • To slice the rows into cells based on either </th> or </td>, enter </th>|</td> in cell separator (| means or in regexps)
  • To slice the rows into cells for every 2 or more spaces (common for text-based tables), enter \s{2,} in cell separator (\s means any space character and {2,} means previous pattern needs to be matched 2 or more times)

You can get very far with regular expressions (including using lookaheads and lookbehinds). Check out the following link for the full reference: https://docs.python.org/3.6/library/re.html>

Step 4 (optional): Filter table rows

Parseur can filter out unwanted rows. For example you may want to keep rows that only match a specific value in the certain column. You can do that by setting constraints on column fields.

To filter table rows:

  1. Click on the edit button of the column field you want to set a constraint on
  2. Open the Advanced options panel
  3. In the contraint option, enter what the column needs to match in order to be kept. The constraint input accepts regular expressions.
  4. To setup multi-column constrains, save this field and edit other columns constraints.

Examples:

  • To filter on rows with John Smith in the name colunm, enter John Smith in the constraint input of the name column.
  • To filter on rows where column ID end with ABC, enter ABC$ in the constraint input of the ID column

Note: By default, if the filter returns no row, Parseur will set the document to "New Template Needed". If you want to allow tables with empty rows, check the Allow empty tables box when editing the table field.

Step 5: Save your changes

Click the Save Field button when you're done editing the table field.

Then, continue editing the template adding new fields and finally click Create Template when completed.

Step 6: Export your table data

You have several strategies available when it comes to exporting your table data. This section lists the different strategies depending on the type of export.

Download and Google Sheet

There are 2 options to export your table data:

  • The default export keeps the "1 document per sheet row" convention: table field rows will be added as columns.
  • The table export uses the "1 table field item per sheet row" convention: each table field item will be on its own sheet row and other fields extracted from the document will be repeated across rows.
Choose between default export and table export depending on how you want your data to be formatted

Choose between default export and table export depending on how you want your data to be formatted

Zapier

There are 3 main ways to send table data to Zapier

1. New Table Processed trigger

Send the table directly through Zapier, each row in this table will turn into a row (or item) in the target application (for example, a row in Google Spreadsheet). Non-table fields in the document are repeated for each row. This is the trigger you will most likely need.

Parseur Table Processed Trigger on Zapier

Parseur Table Processed Trigger on Zapier

2. New Document Processed

Send each document as a whole with all tables flattened. Flattened means each document looks like the table mode result on the right of your document. It also means that the number of columns varies with the number of rows in the largest table. For this reason, This trigger is usually not recommended when using tables.

Parseur Document Processed Trigger on Zapier

Parseur Document Processed Trigger on Zapier

New Table Processed should provide the best compatibility with other apps. If you have more advanced needs we have one more solution, read on!

To access the last option, click on the show less common options link

To access the last option, click on the show less common options link

3. New Document Processed (With List Items)

Send the document with nested tables. This is similar to the JSON mode result on the right of your document. The number of columns does not vary with the number of rows. But the receiving app must be capable of handling item lists. This is the best choices for apps that can handle list items in Zapier, Freshbooks being one of them for example.

Parseur Document Processed (With List Items) Trigger on Zapier

Parseur Document Processed (With List Items) Trigger on Zapier

Webhooks

There are 3 options to export your table fields via webhooks:

  • Document processed: this is the default option. JSON payload is a deep key/value object with table fields values sent as an array (equivalent to the JSON screen capture at the beginning of this article)
  • Document processed (flattened): JSON payload is a flat key/value object with keys named as in the default table export (e.g. "Items[1].Title")
  • Table item processed: JSON payload is an array of table item objects where other fields are repeated across table items (similar to the table export for Google Sheets).

Last updated: