Multi-Engine Document Parsing
The right parsing engine for every document
Vision AI for visual layouts, Text AI for plain text, templates for fixed forms. All three engines run in the same mailbox.
What's included
Vision AI extraction
Vision models read pages as images rather than as text. The AI sees the document the way a human reader would, with full layout and visual context.
- Best for rich PDFs, scans, and forms with complex structure
- Captures handwriting, checkboxes, stamps, and layout cues
- Set up with plain-English instructions, no template required
Text AI extraction
Documents are first converted to plain text, using OCR when the source has no native text layer. The AI then parses the extracted text alone, ignoring layout and images.
- Best for emails, plain PDFs, and other text-first documents
- Useful when visual layout adds no information
- Set up with plain-English instructions, no template required
Template-based extraction
Add as many templates per mailbox as you need. Parseur automatically picks the best match for each document and produces the same output every time, with no AI involved.
- Best for standardized forms and machine-generated emails
- The most reliable extraction method when layouts never change
- Set up with a visual template editor, one per document layout
Table and line item extraction
Each row in a table becomes its own data record, not a single merged field. Works across all three parsing engines. For native spreadsheets, table parsing is automatic.
- Handles variable row counts from one document to the next
- Supports tables that span multiple pages
- AI engines can parse complex, multi-line rows into separate fields
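As a rough illustration of "one record per row", the sketch below turns table rows into separate records and joins rows from a table that continues onto a second page. The function name `rows_to_records`, the column names, and the sample values are hypothetical; they are not Parseur's actual output format.

```python
def rows_to_records(header: list[str], rows: list[list[str]]) -> list[dict[str, str]]:
    """Turn each table row into its own record keyed by column name."""
    return [dict(zip(header, row)) for row in rows]


# Hypothetical line-item table split across two pages.
header = ["description", "quantity", "unit_price"]
page_1_rows = [["Widget", "2", "9.50"], ["Gadget", "1", "24.00"]]
page_2_rows = [["Shipping", "1", "5.00"]]

# Rows from the continuation page are appended, so the row count can vary
# from one document to the next without changing the record shape.
line_items = rows_to_records(header, page_1_rows + page_2_rows)

for item in line_items:
    print(item)
```

Each line item stays addressable on its own, which is what lets a downstream system create one spreadsheet row or one database entry per item instead of unpacking a merged field.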
OCR for scanned documents and images
Optical Character Recognition (OCR) reads text from scans, phone photos, and image-only PDFs. It feeds the Text AI and template engines when a document has no native text layer.
- Works on scans, phone photos, and image-only PDFs
- Multilingual OCR across 200+ languages, including handwriting
- Template engine uses zonal and dynamic OCR for fixed or shifting layouts
Document pre-processing
Accurate extraction starts with cleaning up and repairing incoming documents. Parseur's pre-processing has been refined over 100M+ documents and a decade of real-world edge cases.
- Deskews tilted scans and re-runs OCR on garbled text
- Repairs corrupted PDFs, broken email encoding, and malformed HTML
- Detects country-specific date and number formats automatically
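To make the number-format point concrete, here is a minimal sketch of one possible heuristic: treat the rightmost of `.` or `,` as the decimal separator and the other as a grouping separator. This is an assumption for illustration only, not Parseur's detection logic, and `normalize_number` is a hypothetical name. The heuristic is ambiguous for values with a single separator, such as `1,234`, where real detection would need more context (for example the document's country or other numbers on the page).

```python
from decimal import Decimal


def normalize_number(raw: str) -> Decimal:
    """Parse a number written with either '.' or ',' as the decimal separator."""
    # Drop regular and non-breaking spaces, which some locales use for grouping.
    cleaned = raw.strip().replace(" ", "").replace("\u00a0", "")
    last_dot = cleaned.rfind(".")
    last_comma = cleaned.rfind(",")
    if last_dot == -1 and last_comma == -1:
        return Decimal(cleaned)
    # Assumption: the separator that appears last is the decimal separator.
    if last_comma > last_dot:
        decimal_sep, group_sep = ",", "."
    else:
        decimal_sep, group_sep = ".", ","
    cleaned = cleaned.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


print(normalize_number("1.234,56"))  # European-style grouping
print(normalize_number("1,234.56"))  # US-style grouping
print(normalize_number("1 234,56"))  # space-grouped
```

All three inputs normalize to the same value, so downstream systems receive one consistent representation regardless of how the source document wrote the amount.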
How Document Parsing works
What just happened
Document Intake
Documents were uploaded or arrived automatically from email, API, or connected storage.
Pre-process
Each document goes through a cleanup pass first. Parseur fixes page orientation, straightens tilted scans, and repairs garbled or misordered content as needed.
OCR
For scans, phone photos, and image-only PDFs, Parseur runs OCR to extract the text. Documents that already carry a native text layer skip this step.
Pick the engine
Parseur picks the right engine for each document automatically. Template-based parsing takes priority when a matching template exists. Otherwise, Vision AI handles image-rich pages and Text AI handles plain-text content.
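The priority order described above can be sketched as a simple decision function. The `Document` fields, the `Engine` enum, and `pick_engine` are hypothetical names used for illustration; how Parseur actually detects a template match or an image-rich page is not covered here.

```python
from dataclasses import dataclass
from enum import Enum


class Engine(Enum):
    TEMPLATE = "template"
    VISION_AI = "vision_ai"
    TEXT_AI = "text_ai"


@dataclass
class Document:
    # Hypothetical flags; a real system would derive them from the document.
    has_matching_template: bool
    is_image_rich: bool


def pick_engine(doc: Document) -> Engine:
    """Apply the stated priority: template first, then Vision AI, then Text AI."""
    if doc.has_matching_template:
        return Engine.TEMPLATE
    if doc.is_image_rich:
        return Engine.VISION_AI
    return Engine.TEXT_AI


scanned_form = Document(has_matching_template=False, is_image_rich=True)
print(pick_engine(scanned_form).value)
```

Checking the template match first reflects the stated priority: a matching template wins even when the page is image-rich.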
Extract
The selected parsing engine pulls structured fields out of the document, mapped to the schema you defined in your mailbox. From here, every field flows into normalization for formatting and validation.
What happens next
Data Normalization and Validation
Extracted fields are validated, formatted, and shaped for downstream workflows.