At Parseur, we started by extracting data from emails. But Parseur’s ultimate mission is to extract data from the Internet as a whole. In this article, we describe how to use Parseur to parse a webpage from a link in an email.
Step 1: Create a Parseur mailbox
If you haven’t already done so, create your Parseur account and you’ll get started on our free forever plan. Create your first mailbox and forward an email containing a link you want to extract data from.
Note: Parseur works best with machine-generated emails.
Step 2: Create a template to capture the email link
Once your email is in Parseur, create your template just like any other template: it’s as easy as Point & Click.
Here, the only information we are interested in is the link, so let’s select it and create a new field.
Note for geeks: if the URL you want to capture is inside an HTML link, switch to the Source View to create your field, and select only the URL inside the link’s href attribute.
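To make it clear which piece of the source you are after, here is a minimal Python sketch using the standard library’s `html.parser`: it pulls the value of the `href` attribute out of each `<a>` tag. The email snippet and URL are made up for illustration, and this only shows the concept; it is not how Parseur creates fields.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hrefs(html):
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


# Hypothetical email snippet, as it might appear in the Source View
snippet = '<p>Your order is ready: <a href="https://example.com/orders/1234">View order</a></p>'
print(extract_hrefs(snippet))  # ['https://example.com/orders/1234']
```

The part you want in your field is only `https://example.com/orders/1234`, not the surrounding tag or the link text “View order”.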
Now click the edit button to the right of the field name, and change the format to “Linked Document”.
Click Update then Create to save the template.
Two things are going to happen:
- Your email will be parsed and the link extracted.
- After a few seconds, you’ll see the newly downloaded webpage appear in the document queue.
The status of the document is “New template needed” because we now need to tell Parseur what information we need from this webpage. We do that by creating a new template.
Step 3: Create a template for the fetched webpage
Now create a template for the webpage by clicking on the plus button.
Click Create and…
Step 4: Watch Parseur parse a webpage and profit!
Creating the template will parse your document and extract the relevant data from it.
Now, every time you send a similar email with a link, the webpage will be fetched, and if it matches one of your existing templates, its data will be parsed and extracted automatically.
Some closing remarks
Parseur is not limited to extracting links from emails. Any template field with the “Linked Document” format is used to download a document and extract data from it. That means you can fetch webpages from links in email attachments as well as from links in other webpages!
Parseur charges based on the number of successfully processed documents, which means that fetching a webpage from an email link and parsing that webpage counts as 2 credits: one for the email and one for the webpage.
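As a back-of-the-envelope check, here is a small Python helper that estimates credit usage. It assumes one credit per successfully processed document and that every email and every linked webpage is processed successfully. The function name and the `links_per_email` parameter are illustrative only (for example, a template with several “Linked Document” fields); check your plan for actual pricing.

```python
def estimate_credits(num_emails: int, links_per_email: int = 1) -> int:
    """Rough credit estimate, assuming one credit per successfully
    processed document: the email itself plus each fetched webpage."""
    documents_per_email = 1 + links_per_email  # the email + each linked webpage
    return num_emails * documents_per_email


print(estimate_credits(1))      # 2
print(estimate_credits(100))    # 200
print(estimate_credits(50, 3))  # 200
```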
Infinite loop warning
Since the address of a fetched document appears in its subject, avoid creating a template that extracts the subject and fetches it as a link: it would create a new document from the same link again and again…
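To see why this loops, here is a toy Python model of a document queue in which every extracted link is fetched as a new document. It is not a description of Parseur’s internals or of any setting it offers: all names are hypothetical, `extract_links` stands in for a template’s “Linked Document” fields, a fetched document’s subject is assumed to be its URL, and `max_documents` is just a safety cap so the demo terminates. Keeping a set of already-fetched addresses is one common way to break such a cycle.

```python
from collections import deque


def process(first_url, extract_links, max_documents=10, skip_seen=True):
    """Fetch first_url, then every link extracted from each fetched
    document. Returns the URLs fetched, in order (hypothetical model)."""
    queue = deque([first_url])
    seen = set()
    fetched = []
    while queue and len(fetched) < max_documents:
        url = queue.popleft()
        if skip_seen and url in seen:
            continue  # already fetched this address: don't create another document
        seen.add(url)
        fetched.append(url)
        document = {"subject": url}  # assumption: subject holds the document's address
        queue.extend(extract_links(document))
    return fetched


def subject_as_link(document):
    # The template to avoid: it extracts the subject, i.e. the page's own URL.
    return [document["subject"]]


print(process("https://example.com/page", subject_as_link, skip_seen=False))
# the same URL 10 times, stopped only by max_documents
print(process("https://example.com/page", subject_as_link))
# ['https://example.com/page']
```

Without the `seen` check, the only thing stopping the chain is the cap; with it, the page is fetched once.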