Howdy, I'm Sylvain, building software here at Parseur. We just released a major feature: a new system to parse PDF files visually.

New: Extract data from PDF visually
Parsing PDF documents using OCR is the most requested feature on our feature upvote page.
Improved reliability for complex documents
We used to convert PDF documents into text, trying to preserve the original layout of the pages. It worked great for simple documents (and that is why we are keeping the text engine along with the new one).
However, this conversion step made it particularly difficult for our legacy, text-based engine to reliably extract data from complex PDF documents.
That is why we are introducing a new parsing engine, called OCR (for Optical Character Recognition). The OCR template editor allows you to create templates by drawing boxes around the text you want to extract. You can also define labels that act as landmarks or anchors in your document, helping the engine position the fields on the page.
You'll find more detailed information on our support page here: Create your first OCR template.
Optional fields, at last!
This new engine allows you to define optional fields, and is more resilient to small changes in the document layout. Templates are also faster to build and easier to adjust, without having to recreate them from scratch, because you can attach several samples to a given template. This is also what lets you define fields that show up on some documents but not all.
Complete retro-compatibility
All the current features, such as tables, metadata, post-processing and static fields, keep working with the new engine. The output data format is the same, and webhooks are unchanged.
This new engine works alongside the current one, and you can even mix and match the templates from both engines in the same mailbox, to get the best of both worlds.
If you have both text-based and OCR templates in your mailbox, the template with the most fields will take priority over the others.
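To make the priority rule concrete, here is a minimal sketch in Python. It is only an illustration, not Parseur's actual code: the Template class, its attributes and the pick_template function are hypothetical names, and the sketch assumes the choice is made among the templates that match the incoming document. How ties are resolved is not stated here, so the sketch simply keeps the first of the tied templates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Template:
    """Hypothetical stand-in for a mailbox template (not a Parseur API)."""
    name: str
    engine: str        # "text" or "ocr"
    field_count: int   # number of fields defined in the template


def pick_template(matching: List[Template]) -> Optional[Template]:
    """Return the template with the most fields, whatever its engine.

    Assumption: `matching` only holds templates that matched the document.
    Ties keep the first template in the list; the real tie rule is unknown.
    """
    if not matching:
        return None
    return max(matching, key=lambda t: t.field_count)


text_tpl = Template("invoice (text)", "text", 4)
ocr_tpl = Template("invoice (OCR)", "ocr", 7)
chosen = pick_template([text_tpl, ocr_tpl])
print(chosen.name)
```

In this example the OCR template is chosen because it defines seven fields against four, not because of its engine.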
Per-page pricing
One credit is now counted for each successfully parsed page. If a document is not made up of pages (like a long email or a spreadsheet), then just one credit is counted when that document gets successfully processed, regardless of the document's length, as before.
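As a rough illustration of this rule, the short Python sketch below computes the credits for a single document. The function name and its parameters are hypothetical and only restate the rule above; this is not how Parseur's billing is implemented.

```python
from typing import Optional


def credits_used(successful_pages: Optional[int] = None,
                 processed: bool = True) -> int:
    """Credits counted for one document (illustrative sketch only).

    successful_pages: number of successfully parsed pages for a paged
        document such as a PDF, or None for a document without pages
        (a long email, a spreadsheet).
    processed: whether a document without pages was successfully
        processed; ignored for paged documents.
    """
    if successful_pages is None:
        # Documents without pages: one credit, regardless of length.
        return 1 if processed else 0
    # Paged documents: one credit per successfully parsed page.
    return max(successful_pages, 0)


pdf_credits = credits_used(successful_pages=3)   # a 3-page PDF, all pages parsed
email_credits = credits_used()                   # a long email, processed successfully
```

Under these assumptions, a three-page PDF counts three credits, while a long email still counts a single credit.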
Beta starts now
We are now starting the beta phase: to try the new engine, simply ask us in the chat or by email at support@parseur.com. The goal here is to collect your feedback so we can improve the system, fix bugs and implement the features that you need. If you decide to join the beta program, expect bugs and imperfections, but please report them to us; we'll fix them as quickly as we can.
What's next?
After the beta phase is over and the new OCR engine is available to everyone, we plan to make it work with all HTML documents such as emails and web pages.