How to extract email metadata using Parseur
Parseur's flagship feature is extracting custom data from document content. However, you may also want to extract email metadata and include it in your parsed data. Let's see how to do that and which metadata fields are available in Parseur.
What is email metadata?
Email metadata is all the information that surrounds the content of the email or document itself, for example:
- Subject, sender (From), recipient (To), carbon copy (CC) and blind carbon copy (BCC) information
- Email date and time of reception
- Information regarding the mail servers and network routing
In addition to parsing custom data from email and document content, you can also extract metadata using Parseur.
How to extract email metadata in Parseur?
In Parseur, metadata fields are called "Extra fields", as opposed to the "Custom fields" that you create when building templates.
To add Extra fields in Parseur:
- Open the Parseur App
- Make sure you have created a mailbox, or create a new one (if you're new to Parseur, head over to our getting started article)
- Open your Parseur mailbox
- Click on the Fields section on the left-hand side menu
- This section lists all available metadata Extra fields below your Custom fields
- Click on the Extra fields you need. You can also hover over them to get more information about each one.
Parseur can parse different types of document metadata.
Let's go through them.
Date and Time metadata
Metadata fields about dates and times:
- Received: date and time when Parseur received the document
- ReceivedDate: date when Parseur received the document
- ReceivedTime: time of the day when Parseur received the document
These fields are formatted according to your Date and Time formatting preferences. Head over to your User Preferences to change them.
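Because these fields follow your formatting preferences, any downstream code that reads them must use the same format. The Python sketch below combines ReceivedDate and ReceivedTime into a single datetime. It assumes, purely for illustration, that your preferences produce dates like `2024-03-15` and times like `14:05`; the `DATE_FORMAT` and `TIME_FORMAT` constants and the `received_at` helper are hypothetical, so adjust them to match your own settings.

```python
from datetime import datetime

# Hypothetical formats: replace them with the ones set in your User Preferences.
DATE_FORMAT = "%Y-%m-%d"
TIME_FORMAT = "%H:%M"


def received_at(parsed: dict) -> datetime:
    """Combine the ReceivedDate and ReceivedTime extra fields into one datetime."""
    combined = f"{parsed['ReceivedDate']} {parsed['ReceivedTime']}"
    return datetime.strptime(combined, f"{DATE_FORMAT} {TIME_FORMAT}")


# Example parsed data; the values are made up for illustration.
doc = {"ReceivedDate": "2024-03-15", "ReceivedTime": "14:05"}
print(received_at(doc).isoformat())  # 2024-03-15T14:05:00
```

If your preferences include seconds or a 12-hour clock, only the two format strings need to change.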
Email address metadata
Metadata fields about email addresses:
- Sender: the email address that sent the email to Parseur. This is usually the same address as the OriginalRecipient address, unless your mailbox receives emails from different aliases (or is a catch-all)
- Recipient: the email address that receives the email. It is your Parseur mailbox address (in the form <your-mailbox-name>@in.parseur.com)
- To: the "To" field of the email. The "To" field can contain several email addresses.
- CC: the "CC" field of the email. The "CC" field can contain several email addresses.
- BCC: the "BCC" field of the email. The "BCC" field can contain several email addresses.
- ReplyTo: the email address to reply to (if set)
- RecipientSuffix: the recipient suffix (or alias suffix) that you used. Say you have created a mailbox named mymailbox@in.parseur.com. You can send emails to mymailbox+test123@in.parseur.com or mymailbox+id456@in.parseur.com and all emails will land in the same mailbox. When you use such aliases, the RecipientSuffix field contains what comes after the + (test123 and id456 in these examples). This is particularly useful if you forward emails from different sources and want to know which source sent which email.
- OriginalRecipient: the email address that originally received the email before it was forwarded to Parseur. Note: this only works once you have set up automatic forwarding of your emails (otherwise it is equal to the Recipient)
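To illustrate how RecipientSuffix relates to the Recipient address, here is a minimal Python sketch that extracts the part after the + and uses it to tell sources apart. It also splits a multi-address To, CC or BCC value into individual addresses, assuming (not confirmed against Parseur's output) that these fields hold standard comma-separated header syntax. The `SOURCES` mapping, the helper names and the sample addresses are hypothetical.

```python
from email.utils import getaddresses

# Hypothetical mapping from alias suffix to the source that forwards to it.
SOURCES = {"test123": "Test inbox", "id456": "Orders inbox"}


def recipient_suffix(address: str) -> str:
    """Return the text between '+' and '@' in an address, or '' if there is none."""
    local_part = address.split("@", 1)[0]
    return local_part.partition("+")[2]


def split_addresses(field_value: str) -> list:
    """Split a To/CC/BCC value into bare email addresses (assumes standard header syntax)."""
    return [addr for _, addr in getaddresses([field_value]) if addr]


recipient = "mymailbox+id456@in.parseur.com"
print(SOURCES.get(recipient_suffix(recipient), "Unknown source"))  # Orders inbox
print(split_addresses("Alice <alice@example.com>, bob@example.com"))
```

In practice you would simply read the RecipientSuffix field that Parseur already provides; the helper only shows which part of the address that field corresponds to.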
Document Content metadata
Metadata fields about document content:
- Subject: the title of the document. Depending on the type of document, this is one of:
  - the subject of the email
  - the filename of the attachment
  - the URL of the linked web page
- HtmlDocument: the full content of the document, including HTML formatting
- TextDocument: the full content of the document as plain text (without any HTML formatting)
- LastReply: the content of the last reply in the email chain (in plain text). Note: this field is limited to English text replies and is currently tested on the following email platforms: Yahoo, iCloud, Gmail, Outlook.com, iOS Mail, Apple Mail, Microsoft Outlook (Windows & Mac), and Mozilla Thunderbird. Parseur makes a best-effort attempt to parse all inbound replies. HTML parts of replies cannot be used to populate this field, so it is only filled in when the reply contains a plain-text part.
These fields are useful if you set up a trigger for when a document cannot be parsed. This way, not only do you get a real-time notification when parsing a document fails, but you can also check the title and content of the document without having to log in to Parseur.
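As an illustration of that use case, the Python sketch below turns the Subject and TextDocument fields into a short alert message. It assumes the trigger delivers the parsed data as a dictionary keyed by field name; the `failure_notification` function and its 200-character preview limit are our own hypothetical choices, not part of Parseur.

```python
def failure_notification(parsed: dict, preview_chars: int = 200) -> str:
    """Build a plain-text alert from the Subject and TextDocument extra fields."""
    subject = parsed.get("Subject") or "(no subject)"
    # Collapse all whitespace (including line breaks) so the preview reads as one line.
    text = " ".join((parsed.get("TextDocument") or "").split())
    if len(text) > preview_chars:
        text = text[:preview_chars].rstrip() + "..."
    return f"Parsing failed for: {subject}\n{text}"


alert = failure_notification(
    {"Subject": "Invoice #1042", "TextDocument": "Hello,\n\nPlease find attached..."}
)
print(alert)
```

You could then send `alert` through whichever notification channel your trigger is connected to.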
Parseur-specific metadata
Metadata fields specific to Parseur:
- DocumentID: a unique ID that identifies the document in Parseur
- ParentID: ID of the parent of the document (if any). For example, if you send an email with attachments, the attachment ParentID will be the email DocumentID.
- DocumentURL: a link to the document in the Parseur app. This is useful if you have an integration from which you want to quickly open the app and check the document. The link redirects to the Parseur app and therefore requires you to be logged in to Parseur to access it.
- PublicDocumentURL: a public link to the document. Be very careful when sharing this link: anyone who has it can access your document without any authentication.
- Template: the name of the Parseur template that was used to parse the document.
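Because an attachment's ParentID is the DocumentID of the email it arrived with, you can rebuild that relationship on your side. Here is a minimal Python sketch, assuming you have collected several parsed documents as dictionaries that include the DocumentID and ParentID extra fields. The IDs below are made up, and the `group_attachments` helper is hypothetical.

```python
from collections import defaultdict


def group_attachments(documents: list) -> dict:
    """Map each parent DocumentID to the DocumentIDs of its children (e.g. attachments)."""
    children = defaultdict(list)
    for doc in documents:
        parent = doc.get("ParentID")
        if parent:  # documents without a parent (e.g. the email itself) are skipped
            children[parent].append(doc["DocumentID"])
    return dict(children)


docs = [
    {"DocumentID": "email-1", "ParentID": None},
    {"DocumentID": "att-1", "ParentID": "email-1"},
    {"DocumentID": "att-2", "ParentID": "email-1"},
]
print(group_attachments(docs))  # {'email-1': ['att-1', 'att-2']}
```

The IDs are treated as opaque values, so the same code works whether they come through as strings or numbers.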