An email parser is a software tool that converts a raw email into a readable format. There are two main categories of email parsers. First, low-level email MIME parsers decode raw emails into a readable text format. Second, high-level email content parsers convert the content of emails into structured data. Structured data is data organized in a well-defined format, i.e. understandable by a machine. Structured data can usually be visualized in Excel or used as input for other software (as part of an automated business workflow, for instance).
Infographic: What is an email parser?
We'll get into more details about email parsers. But first...
Let's define parse, what parsing is and what parsers do
Wait, maybe what put you off in the first place was the word "parser".
So what is a parser?
Etymologically, the verb to parse comes from the Latin pars, which means part (as in pars orationis, "part of speech"). So a parser has something to do with identifying parts of something.
In fact, a parser is a tool that can analyze and identify meaningful parts in a text. Using fancy words, data parsing means the process of analyzing a string of symbols, either in natural language or in computer languages, that conform to the rules of a formal grammar (thank you Wikipedia for making us look smart here).
A parser is a computer program whose source code defines a set of instructions to analyze input sentences and transform them into data structures. This usually involves lexical and syntactic analysis, and the result is often represented as a parse tree.
Let's take an example if this is still too obscure. While you are reading this exact sentence, a sequence of letters on a screen, your brain makes sense of the meaning of it. Your brain acts as a parser:
- It first identifies a sequence of letters to make words. That's called lexical analysis.
- Then, it uses grammar and context to understand the meaning of the words put together to make a sentence. That's syntactic analysis.
You're doing parsing right now!
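To make the two steps concrete, here is a minimal sketch of a parser for simple arithmetic expressions. The function names (tokenize, parse and friends) and the tiny grammar are made up for this illustration. The first function does the lexical analysis (characters to tokens), the others do the syntactic analysis (tokens to a tree, written here as nested tuples).

```python
import re

# Lexical analysis: turn a string of characters into tokens (numbers, operators).
TOKEN_PATTERN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    tokens = []
    position = 0
    text = text.rstrip()
    while position < len(text):
        match = TOKEN_PATTERN.match(text, position)
        if match is None:
            raise ValueError(f"Unexpected character near position {position}")
        tokens.append(match.group(1))
        position = match.end()
    return tokens


# Syntactic analysis: apply grammar rules to the tokens to build a tree.
def parse_expression(tokens):
    """expression := term (('+' | '-') term)*"""
    node = parse_term(tokens)
    while tokens and tokens[0] in ("+", "-"):
        operator = tokens.pop(0)
        node = (operator, node, parse_term(tokens))
    return node


def parse_term(tokens):
    """term := factor (('*' | '/') factor)*"""
    node = parse_factor(tokens)
    while tokens and tokens[0] in ("*", "/"):
        operator = tokens.pop(0)
        node = (operator, node, parse_factor(tokens))
    return node


def parse_factor(tokens):
    """factor := NUMBER | '(' expression ')'"""
    if not tokens:
        raise ValueError("Unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        node = parse_expression(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("Missing closing parenthesis")
        return node
    if token.isdigit():
        return int(token)
    raise ValueError(f"Unexpected token {token!r}")


def parse(text):
    tokens = tokenize(text)
    tree = parse_expression(tokens)
    if tokens:
        raise ValueError(f"Unexpected token {tokens[0]!r}")
    return tree


print(tokenize("2 + 3 * 4"))  # ['2', '+', '3', '*', '4']
print(parse("2 + 3 * 4"))     # ('+', 2, ('*', 3, 4))
```

Notice how the tree already encodes that the multiplication happens before the addition: that knowledge comes from the grammar, not from the raw sequence of characters.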
Parsers in computer science
In computer science, a parser is what makes it possible for a machine to understand what programmers mean when they type code in their programming language of choice. The parser reads the code and, through several layers of processing, the code is ultimately converted into a set of 0s and 1s, which will trigger things appearing on screen or data being sent through the internet.
The world of parsing in computer science has a deep and rich theoretical background, along with jargon like lexical analysis, Chomsky's grammars, Backus–Naur form, etc. For more information, have a look at this introduction (PDF) to Grammar and Parsing Techniques. It's a lot of fun!
Now that this has hopefully been cleared up, let's get back to our email parsers.
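As an illustration, Python lets you peek at its own parser through the standard ast module. The sketch below is specific to Python (other languages expose this differently, if at all), and the variable names price, quantity and total are invented for the example. Note that Python compiles the parsed tree to bytecode for its interpreter rather than directly to machine code.

```python
import ast

source = "total = price * quantity"

# Parse the source code into an abstract syntax tree.
tree = ast.parse(source)
assignment = tree.body[0]
print(type(assignment).__name__)        # Assign
print(type(assignment.value).__name__)  # BinOp

# The exact text of the dump varies between Python versions.
print(ast.dump(tree))

# Compile the tree to bytecode and run it with some sample values.
code = compile(tree, "<example>", "exec")
namespace = {"price": 3, "quantity": 4}
exec(code, namespace)
print(namespace["total"])  # 12
```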
What is a MIME parser?
Audience: MIME parsers are intended for people with a technical / programming background.
MIME (for Multipurpose Internet Mail Extensions) is the internet standard format in which emails are encoded. The MIME format supports different character sets, non-text attachments (such as pictures or audio) and multi-part message bodies, which make it possible to combine all of these in a single message. Like most internet standards, MIME has been defined through a set of RFCs (Requests for Comments) by the IETF: mainly RFC 2045, RFC 2046, RFC 2047, RFC 4288, RFC 4289 and RFC 2049.
Email MIME parsers are used to decode emails encoded in MIME. Such a parser can extract the header (which includes the sender email, recipient email, subject, date, etc.), the body of the email and any attachments.
There is a large variety of open-source libraries that provide email MIME parsing in most programming languages. For example:
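Here is a minimal sketch of what this looks like with the email package from Python's standard library. The raw message, addresses and file name are made up for the example; a real email would contain many more headers.

```python
from email import message_from_string, policy

# A small hand-written MIME message (sample data, not a real email).
RAW_EMAIL = """\
From: Alice <alice@example.com>
To: Bob <bob@example.com>
Subject: =?utf-8?q?Caf=C3=A9_order?=
Date: Mon, 01 Jan 2024 10:00:00 +0000
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain; charset="utf-8"

Hello Bob, your order is attached.
--BOUNDARY
Content-Type: text/csv; name="order.csv"
Content-Disposition: attachment; filename="order.csv"

item,quantity
coffee,2
--BOUNDARY--
"""

# policy.default returns an EmailMessage and decodes encoded headers for us.
message = message_from_string(RAW_EMAIL, policy=policy.default)

# Header: sender, recipient, subject.
sender = message["From"].addresses[0].addr_spec
recipient = message["To"].addresses[0].addr_spec
subject = str(message["Subject"])

# Body: pick the plain-text part of the message.
body_part = message.get_body(preferencelist=("plain",))
body_text = body_part.get_content().strip()

# Attachments: file name and decoded content of each one.
attachments = [
    (part.get_filename(), part.get_content().strip())
    for part in message.iter_attachments()
]

print(sender, recipient, subject)
print(body_text)
print(attachments)
```

The subject line shows why a MIME parser is needed at all: in the raw email it is stored as an encoded word, and the parser turns it back into readable text.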
- Python: Email library
- Ruby: Mail gem
- C/C++: Mimetic or VMime
- Java: Apache Commons Email
- PHP: MailParse
There are also a number of online SaaS platforms that offer MIME parsing as a service.
What is an email parser?
Audience: email parsers are intended for people with a business process automation background. Email parsers are great for automating email data entry processes.
A major issue with emails is that they are just a flow of unstructured text by nature. Machines usually don't like unstructured data, which makes it difficult for somebody to include incoming emails into an automation workflow.
An email parser (aka email scraper, email data extractor, or content email parser) is for people who need to extract pieces of text from their emails and put them into an Excel spreadsheet or feed them to other software for processing/tracking. In other words, an email parser takes the unstructured text of an email and transforms it into structured data.
These email parsers are especially useful for processing large amounts of machine-generated emails.
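As a rough sketch of the idea, the snippet below pulls a few fields out of a made-up order confirmation email using regular expressions, then writes them out as CSV, which Excel can open. The email text, the field names and the patterns are all assumptions for this example. Real content parsers usually offer more robust ways to define extraction rules.

```python
import csv
import io
import re

# Sample machine-generated email body (invented for this example).
EMAIL_BODY = """\
Thank you for your order!

Order number: 10482
Customer: Jane Doe
Total: $59.90
Ship to: 12 Main Street, Springfield
"""

# Hypothetical field names, each mapped to the pattern that captures its value.
FIELD_PATTERNS = {
    "order_number": re.compile(r"^Order number:[ \t]*(\d+)$", re.MULTILINE),
    "customer": re.compile(r"^Customer:[ \t]*(.+)$", re.MULTILINE),
    "total": re.compile(r"^Total:[ \t]*\$([\d.]+)$", re.MULTILINE),
    "ship_to": re.compile(r"^Ship to:[ \t]*(.+)$", re.MULTILINE),
}


def parse_order_email(body):
    """Turn unstructured email text into a dictionary of named fields."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(body)
        record[field] = match.group(1).strip() if match else None
    return record


def records_to_csv(records):
    """Serialize parsed records as CSV text, one row per email."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(FIELD_PATTERNS), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


record = parse_order_email(EMAIL_BODY)
csv_text = records_to_csv([record])
print(record)
print(csv_text)
```

This approach works because machine-generated emails follow a fixed template. If the sender changes the template, the patterns need to be updated accordingly.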
When to use an email parser?
Businesses in a wide range of domains use email parsers to help automate their processes.
Some use case examples for a content email parser include:
- Parse e-commerce confirmation emails (from marketplaces like Amazon, eBay, Etsy, Craigslist, etc.). Then, feed them to a simple spreadsheet or a complex logistics management software like SAP in order to manage and track order processing
- Parse real estate notification emails coming from different real estate ad listing websites. Then, consolidate them all into a spreadsheet or your CRM software of choice (e.g. Salesforce, Pipedrive, Zoho)
- Parse travel confirmation emails (e.g. flight confirmations, hotel confirmations, rental confirmations). Then, feed them to a corporate travel management software or just use them to create a travel map
- Parse network and system monitoring reports (e.g. Pingdom, New Relic, Dynatrace). Consolidate all alerts in the same data warehouse in order to keep track of and detect any problem automatically and centrally
- Parse social notification emails (e.g. from Twitter, Facebook, LinkedIn, Pinterest). Then, keep track of them and make sure new followers are thanked / onboarded / nurtured
- And many more, the sky is the limit! Machine-generated emails are everywhere and contain a wealth of data that businesses rely on.