Zonal OCR: easily convert documents into structured data

Zonal OCR goes one step further than traditional Optical Character Recognition (OCR). It lets you extract text at specific locations ('zones') on document pages. Zonal OCR is effectively the simplest way to transform raw document content produced by OCR into structured data.

How does Zonal OCR work?

Using Zonal OCR in Parseur is very easy and intuitive. Creating fields to extract text is as simple as 1, 2, 3, repeat!


Draw a zone on the page

Locate a bit of text you want to extract and draw a box over it with your mouse.

Name your field

Click the "Create Field" button and give your new field a meaningful name.

Set field options

Optionally, customize your field options, such as its format (date, time, location, contact name) or whether it is required.


That's it! Repeat the operation for every field you want to extract. Parseur will then extract the data from the zones you drew on every document you upload that has a similar layout.
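The idea behind these steps can be sketched in a few lines of Python. This is a minimal illustration and not Parseur's implementation: it assumes an OCR engine has already returned every word with its bounding box, and the names `Word`, `Zone` and `extract_zone`, as well as the sample coordinates, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


@dataclass
class Zone:
    name: str  # the field name you chose
    x: float
    y: float
    w: float
    h: float


def extract_zone(words, zone):
    """Return the text of all words whose center falls inside the zone."""
    hits = []
    for word in words:
        cx = word.x + word.w / 2
        cy = word.y + word.h / 2
        if zone.x <= cx <= zone.x + zone.w and zone.y <= cy <= zone.y + zone.h:
            hits.append(word)
    # Order by top edge, then left edge, to approximate reading order
    hits.sort(key=lambda wd: (wd.y, wd.x))
    return " ".join(wd.text for wd in hits)


# Hypothetical OCR output for one invoice page
words = [
    Word("Invoice", 50, 40, 80, 20),
    Word("INV-001", 140, 40, 90, 20),
    Word("Total", 50, 400, 60, 20),
    Word("$120.00", 400, 400, 80, 20),
]

# One zone per field, as drawn on the page
zones = [
    Zone("invoice_number", 130, 30, 120, 40),
    Zone("total", 380, 390, 120, 40),
]

fields = {zone.name: extract_zone(words, zone) for zone in zones}
print(fields)  # {'invoice_number': 'INV-001', 'total': '$120.00'}
```

The same list of zones can then be applied to every document that shares the layout, which is what makes a set of zones behave like a reusable template.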

Differences between OCR and Zonal OCR

Zonal OCR goes a step beyond traditional OCR: it lets you reuse individual data points trapped in your documents in your business workflows.

Traditional OCR

Convert document to plain text

OCR identifies all characters from a document and converts them into plain text.

Traditional OCR is best for indexing document content and making it searchable. But it won't let you easily reuse individual data points in other applications, because the data remains unstructured (it's just plain text).

Zonal OCR

Convert document into structured data

Zonal OCR extracts text from specific zones that you define on the page and converts it into well-formed data, such as JSON.

Zonal OCR is best for transforming documents (unstructured by nature) into structured data. As drawing zones on documents is a visual process, Zonal OCR is easy to work with and troubleshoot.
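To make the contrast concrete, here is a small Python sketch of the two kinds of output. The invoice text, field names and values are made up for illustration.

```python
import json

# Traditional OCR: one unstructured string for the whole page
plain_text = "Invoice INV-001\nDate 03/15/2024\nTotal $120.00"

# Zonal OCR: one value per named zone (hypothetical field names)
fields = {
    "invoice_number": "INV-001",
    "date": "03/15/2024",
    "total": "$120.00",
}

# Named fields serialize directly to JSON for use in other applications
structured = json.dumps(fields, indent=2)
print(structured)
```

With the plain text, another application would have to search the string to find the total. With the structured version, it can simply read the `total` key.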

Should you use Zonal OCR? Pros and Cons.

Zonal OCR is the simplest way to extract structured data from documents. But you should keep in mind its limitations for real-life use.

Advantages of Zonal OCR

Full control

Zonal OCR allows you to extract the exact data you are interested in, name the data points in a way that makes sense for your workflow, and normalize their content (dates, numbers, addresses, and so on).
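As an illustration of what normalizing a field can mean, the Python sketch below turns a raw date and a raw amount into standard representations. The helper names `normalize_date` and `normalize_amount` are hypothetical, and the code assumes US-style dates (month/day/year) and dollar amounts. It does not describe how Parseur formats fields internally.

```python
from datetime import datetime
from decimal import Decimal


def normalize_date(raw, fmt="%m/%d/%Y"):
    """Parse a date written in the given format and return it as ISO 8601."""
    return datetime.strptime(raw.strip(), fmt).date().isoformat()


def normalize_amount(raw):
    """Strip the dollar sign and thousands separators, return a Decimal."""
    cleaned = raw.strip().replace("$", "").replace(",", "")
    return Decimal(cleaned)


print(normalize_date(" 03/15/2024 "))  # 2024-03-15
print(normalize_amount("$1,250.50"))   # 1250.50
```

Normalized values like these are unambiguous for downstream tools, whatever format the original document used.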

Ease of setup

Creating fields with Zonal OCR couldn't be easier: just draw a box over each of the data points you need. No need to tinker with parsing rules or regular expressions.

Easy to troubleshoot and adjust

Zonal fields are easy to reason about. When something goes wrong, just overlay the field's zone on the current document to check whether its position is correct, and adjust it if it is not.

Limitations of Zonal OCR

Cannot handle fields that "move"

By design, Zonal OCR extracts text at a fixed position on a document page. If a field position moves from one document to the next, you may end up capturing partial or unrelated data.
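The short Python sketch below shows how this failure happens. It is a simplified model with made-up coordinates: words are `(text, x, y, width, height)` tuples as an OCR engine might return them, and the hypothetical `text_in_zone` helper keeps the words whose center lies inside a fixed zone.

```python
def text_in_zone(words, zone):
    """words: (text, x, y, w, h) tuples; zone: (x, y, w, h)."""
    zx, zy, zw, zh = zone
    picked = [
        text
        for text, x, y, w, h in words
        if zx <= x + w / 2 <= zx + zw and zy <= y + h / 2 <= zy + zh
    ]
    return " ".join(picked)


# Zone drawn around the total on the first document
total_zone = (380, 390, 120, 40)

doc_a = [("Total", 50, 400, 60, 20), ("$120.00", 400, 400, 80, 20)]

# Same layout, but an extra line item pushes the total 60 units down
doc_b = [
    ("Widget", 50, 400, 70, 20), ("$15.00", 400, 400, 70, 20),
    ("Total", 50, 460, 60, 20), ("$135.00", 400, 460, 80, 20),
]

result_a = text_in_zone(doc_a, total_zone)
result_b = text_in_zone(doc_b, total_zone)
print(result_a)  # $120.00
print(result_b)  # $15.00 (a line item price, not the total)
```

On the second document the zone still returns a plausible-looking amount, so this kind of error can go unnoticed unless the extracted values are checked.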

Cannot handle fields of varying size

For the same reason as above, fields captured with Zonal OCR have a fixed width and height. Capturing variable-sized data like addresses or tables with Zonal OCR is challenging.

Cannot usually handle badly scanned documents

Pages in poorly scanned documents can vary in scale and orientation. That can make Zonal OCR unreliable for those documents, as the position of each data point to extract varies slightly from one scan to another.

Intelligent data extraction with Dynamic OCR

Parseur's powerful OCR capabilities overcome the limitations of Zonal OCR with Dynamic OCR, multiple templates and automatic layout detection.

Dynamic OCR

With Dynamic OCR, easily extract text from fields that move horizontally or vertically, or that change size, from one document to the next.

Powerful template engine

Extract data from various layouts by creating multiple templates and using automatic layout detection.

Best-in-class OCR software

Parseur's OCR accuracy is the best on the market. It supports most languages, including handwritten text, and is blazingly fast.

