Data Normalization and Validation
Same shape, clean data for every document
From mailbox schemas to post-processing, every extracted value lands clean, validated, and ready for downstream systems.
What's included
Mailbox-level schemas
A consistent schema is what makes downstream integrations and automations actually reliable. Define your fields once and every document the mailbox processes maps to the same shape.
- Standard fields for single values, table fields for repeating data
- Plain-English instructions tell the AI what to capture for each field
- Adjust fields anytime through the UI, or programmatically via the API
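As an illustration only, a mailbox schema of this kind can be pictured as a small data structure. The class and attribute names below (Field, TableField, MailboxSchema, instruction, choices) are hypothetical and are not Parseur's actual schema format or API; the sketch just shows standard fields, a table field, and per-field plain-English instructions living in one definition.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Field:
    """A single-value field (hypothetical structure, for illustration)."""
    name: str
    instruction: str                  # plain-English hint describing what to capture
    required: bool = False
    choices: Optional[list] = None    # allowed values, if the field is a choice field


@dataclass
class TableField:
    """A repeating field: one row per line item, each row sharing the same columns."""
    name: str
    columns: list                     # list of Field


@dataclass
class MailboxSchema:
    fields: list = field(default_factory=list)
    tables: list = field(default_factory=list)

    def output_keys(self) -> list:
        # Every document processed by the mailbox is expected to expose these keys.
        return [f.name for f in self.fields] + [t.name for t in self.tables]


invoice_schema = MailboxSchema(
    fields=[
        Field("invoice_number", "The invoice's unique identifier", required=True),
        Field("invoice_date", "The date the invoice was issued", required=True),
        Field("currency", "Three-letter currency code", choices=["USD", "EUR", "GBP"]),
    ],
    tables=[
        TableField("line_items", [
            Field("description", "Text describing the item"),
            Field("amount", "Line total"),
        ]),
    ],
)
```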
Field-level formatting
Built-in formats normalize dates, numbers, addresses, and more. The right format is inferred from document context, with mailbox-level defaults as fallback.
- Date fields handle any order, separator, or month name across languages
- Number fields handle any decimal/thousands separator across regional formats
- Address fields geolocate and split addresses into structured parts
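To make the number case concrete, here is a minimal sketch of separator inference with a configurable fallback. It is not Parseur's implementation: the function name parse_number and the default_decimal parameter (standing in for a mailbox-level default) are assumptions, and the heuristic is deliberately simple. When the value itself is ambiguous, such as "1,234", the fallback decides.

```python
from decimal import Decimal


def parse_number(text: str, default_decimal: str = ".") -> Decimal:
    """Normalize a regionally formatted number (illustrative heuristic only)."""
    # Drop spaces, non-breaking spaces, and apostrophes used as group separators.
    s = text.strip().replace(" ", "").replace("\u00a0", "").replace("'", "")
    has_dot, has_comma = "." in s, "," in s

    if has_dot and has_comma:
        # Both present: whichever appears last is the decimal separator.
        decimal_sep = "." if s.rfind(".") > s.rfind(",") else ","
    elif has_dot or has_comma:
        sep = "." if has_dot else ","
        tail = s.rsplit(sep, 1)[1]
        if s.count(sep) > 1:
            # A repeated separator can only be a thousands separator.
            decimal_sep = "," if sep == "." else "."
        elif len(tail) == 3:
            # Ambiguous ("1,234"): fall back to the configured default.
            decimal_sep = default_decimal
        else:
            decimal_sep = sep
    else:
        decimal_sep = default_decimal

    thousands_sep = "," if decimal_sep == "." else "."
    return Decimal(s.replace(thousands_sep, "").replace(decimal_sep, "."))
```

For example, parse_number("1.234,56") and parse_number("1,234.56") both yield Decimal("1234.56"), while parse_number("1,234", default_decimal=",") yields Decimal("1.234").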
Data validation
Every extracted result is verified against the mailbox schema. Failures surface in the UI, trigger an email notification, and fire a webhook, so both ops teams and tools hear about them.
- Schema check confirms the AI result matches the field shape
- Required-field check catches missing values at the source
- Choice-field check flags values outside the allowed list
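The three checks above can be sketched in a few lines. This is a simplified illustration under assumed data shapes: the schema is a plain dict with hypothetical keys (required, choices, table), and the error strings are invented for the example. Notification by UI, email, or webhook is out of scope here.

```python
def validate(result: dict, schema: dict) -> list:
    """Return a list of human-readable failures; an empty list means the result passed."""
    errors = []

    # Schema check: the result must not contain fields the schema does not define.
    for key in result:
        if key not in schema:
            errors.append(f"unexpected field: {key}")

    for name, spec in schema.items():
        value = result.get(name)

        # Required-field check: missing or empty values fail if the field is required.
        if value in (None, ""):
            if spec.get("required"):
                errors.append(f"missing required field: {name}")
            continue

        # Schema check: table fields must hold a list of rows.
        if spec.get("table") and not isinstance(value, list):
            errors.append(f"{name} should be a list of rows")

        # Choice-field check: the value must be in the allowed list.
        choices = spec.get("choices")
        if choices is not None and value not in choices:
            errors.append(f"invalid choice for {name}: {value!r}")

    return errors


schema = {
    "invoice_number": {"required": True},
    "currency": {"choices": ["USD", "EUR", "GBP"]},
    "line_items": {"table": True},
}
```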
Post-processing rules
When standard formatting and validation aren't enough, drop in a small Python script. Rules run after extraction to reshape values or run custom validation against your business logic.
- Combine, split, or compute new fields from extracted values
- Apply business logic, lookups, or conditional transforms
- Available on Pro plan and above
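A post-processing rule of the kind described might look like the sketch below. The entry point post_process, its dict-in/dict-out shape, the field names, and the VAT_RATES lookup table are all assumptions made for illustration; the interface Parseur actually exposes to scripts is not described here and may differ.

```python
from decimal import Decimal

# Hypothetical reference data for the lookup step.
VAT_RATES = {"FR": Decimal("0.20"), "DE": Decimal("0.19")}


def post_process(data: dict) -> dict:
    """Reshape one extracted document (illustrative rule, assumed interface)."""
    out = dict(data)

    # Combine: build one field from two extracted values, skipping empty parts.
    parts = (data.get("first_name"), data.get("last_name"))
    out["full_name"] = " ".join(p for p in parts if p)

    # Compute: sum the table rows into a new field.
    net = sum(
        (Decimal(str(row["amount"])) for row in data.get("line_items", [])),
        Decimal("0"),
    )
    out["net_total"] = net

    # Lookup plus conditional transform: add VAT only when the country is known.
    rate = VAT_RATES.get(data.get("country"))
    if rate is not None:
        out["vat"] = (net * rate).quantize(Decimal("0.01"))

    return out
```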
How Data Normalization works
What just happened
Multi-Engine Document Parsing
Vision AI, Text AI, templates, or OCR pulled structured fields from each document.
Map to schema
Extracted values are mapped to the fixed set of fields defined for the mailbox. Every document, no matter the source layout, ends up with the same column shape on output.
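The idea of a fixed output shape can be sketched as follows. The helper map_to_schema is hypothetical, and the choice to fill absent fields with None and drop unknown keys is an assumption for the example, not a statement of how Parseur handles either case.

```python
def map_to_schema(extracted: dict, field_names: list) -> dict:
    """Project extracted values onto a fixed, ordered set of field names."""
    # Absent fields become None; keys outside the schema are dropped (assumed behavior).
    return {name: extracted.get(name) for name in field_names}


FIELDS = ["invoice_number", "invoice_date", "total"]

doc_a = map_to_schema({"invoice_number": "INV-1", "total": "99.00"}, FIELDS)
doc_b = map_to_schema({"invoice_date": "2024-03-01", "po_box": "12"}, FIELDS)
```

Both doc_a and doc_b end up with the same three keys in the same order, whatever the source layout provided.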
Format
Each field runs through its configured format. Dates and numbers normalize across regional variations using document context, names split into first/middle/last, and addresses parse into structured parts.
Validate
Each result runs through the validation checks before moving on. Documents that pass continue to post-processing; the rest are flagged so nothing leaves Parseur unnoticed.
Post-process
Optional Python rules run last, applying business logic that field-level formatting can't express. Combine fields, look up reference data, or shape output to match an exact downstream contract.
What happens next
Real-time Exports and Integrations
Normalized data is delivered to your CRM, accounting system, or database in real time.