Zonal OCR: easily convert documents into structured data
Zonal OCR goes one step further than traditional Optical Character Recognition. It lets you extract text at specific locations ('zones') on document pages. Zonal OCR is effectively the simplest way to transform raw document content produced by OCR into structured data.
How does Zonal OCR work?
Using Zonal OCR in Parseur is very easy and intuitive. Creating fields to extract text is as simple as 1, 2, 3, repeat.
1. Draw a zone on the page: Locate the text you want to extract and draw a box over it with your mouse.
2. Name your field: Click the "Create Field" button and give your new field a meaningful name.
3. Set field options: Optionally, customize the field's options, such as its format (date, time, location, contact name) or whether it is required.
Repeat: Repeat these steps for every field you want to extract. Parseur will then extract the data from the zones you drew, for every document you upload with a similar layout.
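To make the idea concrete, here is a minimal sketch of zone-based extraction. It assumes an OCR engine has already returned each word with a bounding box; the `Box`, `Word`, and `extract_zone` names and the sample invoice data are hypothetical and are not Parseur's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (origin at top-left)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Word:
    """One word as reported by an OCR engine, with its bounding box."""
    text: str
    box: Box


def extract_zone(words: list[Word], zone: Box) -> str:
    """Return the text of all words whose center lies inside the zone."""
    hits = []
    for w in words:
        cx = (w.box.left + w.box.right) / 2
        cy = (w.box.top + w.box.bottom) / 2
        if zone.contains(cx, cy):
            hits.append(w)
    # Simplified reading order: top-to-bottom, then left-to-right.
    hits.sort(key=lambda w: (w.box.top, w.box.left))
    return " ".join(w.text for w in hits)


# Hypothetical OCR output for an invoice page.
words = [
    Word("Invoice", Box(50, 40, 120, 55)),
    Word("INV-0042", Box(130, 40, 210, 55)),
    Word("Total:", Box(50, 300, 100, 315)),
    Word("118.50", Box(110, 300, 170, 315)),
]

# Steps 1 and 2: one named zone per field.
zones = {
    "invoice_number": Box(125, 35, 220, 60),
    "total": Box(105, 295, 180, 320),
}

fields = {name: extract_zone(words, zone) for name, zone in zones.items()}
print(fields)  # {'invoice_number': 'INV-0042', 'total': '118.50'}
```

The same `zones` dictionary can be reused for every page that shares the layout, which is what makes the draw-once, extract-many workflow possible.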
Differences between OCR and Zonal OCR
Zonal OCR is a step forward from traditional OCR. Rather than extracting the full text from a document, Zonal OCR extracts structured data that can be used in your business workflows.
Traditional OCR
Convert document to plain text
OCR identifies all characters from a document and converts them into plain text.
Traditional OCR is best for indexing the content of documents and making them searchable. But it won't let you easily reuse the data in other applications, because the data remains unstructured (it's just plain text).
Zonal OCR
Convert document into structured data
Zonal OCR extracts text at specific zones that you define on the page and converts them into well-formed data, such as JSON.
Zonal OCR is best for transforming documents (unstructured by nature) into structured data. Because drawing zones on documents is a visual process, Zonal OCR is easy to work with.
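The contrast between the two outputs can be illustrated with a short sketch. The word list, coordinates, and field names below are made up for the example, and each word is reduced to its center point for brevity.

```python
import json

# Hypothetical OCR result: (text, x, y), where (x, y) is the word's center.
ocr_words = [
    ("ACME", 60, 30), ("Corp", 110, 30),
    ("Date:", 60, 80), ("2024-03-15", 140, 80),
    ("Total:", 60, 120), ("118.50", 140, 120),
]

# Traditional OCR: every word, as one unstructured string.
plain_text = " ".join(text for text, _, _ in ocr_words)

# Zonal OCR: named zones given as (left, top, right, bottom).
zones = {
    "vendor": (40, 20, 160, 40),
    "date": (100, 70, 200, 90),
    "total": (100, 110, 200, 130),
}


def in_zone(x, y, zone):
    left, top, right, bottom = zone
    return left <= x <= right and top <= y <= bottom


structured = {
    name: " ".join(t for t, x, y in ocr_words if in_zone(x, y, zone))
    for name, zone in zones.items()
}

print(plain_text)  # ACME Corp Date: 2024-03-15 Total: 118.50
print(json.dumps(structured, indent=2))
```

A downstream application can read `structured["total"]` directly, whereas the plain-text version would first need to be searched or parsed.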
Should you use Zonal OCR? Pros and Cons.
Zonal OCR is the easiest way to extract structured data from documents. But you should keep in mind its limitations for real-life use.
Advantages of Zonal OCR
- ✅ Full control: Zonal OCR lets you extract exactly the data you are interested in, name the fields in a way that makes sense for your workflow, and normalize their content (dates, numbers, addresses, and so on).
- ✅ Ease of setup: Creating fields with Zonal OCR couldn't be easier: just draw a box over each of the fields you need. No need to tinker with brittle parsing rules or regular expressions.
- ✅ Easy to debug and adjust: Fields extracted by Zonal OCR are easy to reason about. When something goes wrong, visually overlay the field's zone on the current document to check whether it is in the right place, and adjust it if it is not.
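As an illustration of what normalizing a field's content can involve, the sketch below converts raw captures into consistent values. The accepted date formats, the US-style number convention, and the function names are assumptions made for this example, not a description of Parseur's field options.

```python
from datetime import datetime
from decimal import Decimal

# Assumed input formats; a real setup would list the ones its documents use.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def normalize_date(raw: str) -> str:
    """Return the date in ISO 8601 form, trying each assumed format in turn."""
    cleaned = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def normalize_amount(raw: str) -> Decimal:
    """Strip a leading currency symbol and US-style thousands separators."""
    cleaned = raw.strip().lstrip("$€£").replace(",", "")
    return Decimal(cleaned)


def apply_field(raw: str, normalizer, required: bool):
    """Normalize a raw capture; an empty capture is an error only if required."""
    if not raw.strip():
        if required:
            raise ValueError("required field is empty")
        return None
    return normalizer(raw)


print(apply_field("15/03/2024", normalize_date, required=True))   # 2024-03-15
print(apply_field("$1,118.50", normalize_amount, required=True))  # 1118.50
print(apply_field("", normalize_date, required=False))            # None
```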
Limitations of Zonal OCR
- ❌ Cannot handle fields that "move": By design, Zonal OCR extracts text at a fixed position on a document page. If a field's position changes from one document to the next, you may end up capturing partial or unrelated data.
- ❌ Cannot handle fields of varying size: For the same reason, fields captured with Zonal OCR have a fixed width and height. Capturing variable-sized data such as addresses or tables with Zonal OCR is challenging.
- ❌ Cannot usually handle badly scanned documents: Pages in poorly scanned documents can vary in scale and orientation. That can make Zonal OCR unreliable for those documents, because the position of each field varies slightly from one scan to another.
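The first and third limitations can be reproduced with a small sketch. The coordinates are invented for the example, and each OCR word is reduced to its center point; the fixed zone is correct for the reference page only.

```python
def capture(words, zone):
    """Join the words whose center falls inside zone = (left, top, right, bottom)."""
    left, top, right, bottom = zone
    return " ".join(t for t, x, y in words if left <= x <= right and top <= y <= bottom)


# Zone drawn around the total on the reference document.
total_zone = (100, 110, 200, 130)

reference = [
    ("Subtotal:", 60, 90), ("100.00", 140, 90),
    ("Total:", 60, 120), ("118.50", 140, 120),
]

# Same layout, but extra content above pushed every line 30 units down.
shifted = [(t, x, y + 30) for t, x, y in reference]

# Same layout, scanned at 125% scale.
scaled = [(t, x * 1.25, y * 1.25) for t, x, y in reference]

print(capture(reference, total_zone))  # 118.50
print(capture(shifted, total_zone))    # 100.00 (the subtotal, not the total)
print(capture(scaled, total_zone))     # 100.00 (the subtotal, not the total)
```

In both altered pages the zone still returns a plausible-looking number, which is why this kind of failure is easy to miss without checking the zone against the page.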
Intelligent Data Extraction with Dynamic OCR
Parseur's OCR capabilities overcome the limitations of Zonal OCR by combining Dynamic OCR with multiple templates and automatic layout detection.
Dynamic OCR
With Dynamic OCR, you can easily extract text from fields that move horizontally or vertically, or that change size, from one document to the next.
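How Dynamic OCR works internally is not described here. One common way to follow a field that moves is to locate it relative to a nearby label rather than at fixed coordinates; the sketch below shows that general technique under made-up data, and `value_right_of` and its parameters are hypothetical.

```python
def value_right_of(words, label, max_gap=120, line_tolerance=5):
    """Return the words to the right of `label` on the same text line.

    `words` is a list of (text, x, y) tuples, where (x, y) is the word's center.
    Returns None when the label is not found on the page.
    """
    anchor = next(((x, y) for t, x, y in words if t == label), None)
    if anchor is None:
        return None
    ax, ay = anchor
    hits = [
        (x, t) for t, x, y in words
        if abs(y - ay) <= line_tolerance and 0 < x - ax <= max_gap
    ]
    # Sort by x so multi-word values read left to right.
    return " ".join(t for _, t in sorted(hits))


reference = [
    ("Subtotal:", 60, 90), ("100.00", 140, 90),
    ("Total:", 60, 120), ("118.50", 140, 120),
]
# The same fields, pushed 30 units down the page.
shifted = [(t, x, y + 30) for t, x, y in reference]

print(value_right_of(reference, "Total:"))  # 118.50
print(value_right_of(shifted, "Total:"))    # 118.50
```

Because the search starts from the label, the value is found in both pages even though its absolute position changed.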
Powerful template engine
Extract data from various layouts by creating multiple templates and using automatic layout detection.
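As a rough illustration of choosing among several templates, the sketch below scores each template by how many of its marker phrases appear in a page's text. This is an assumed, simplified stand-in for layout detection; the template names, markers, and zones are invented for the example.

```python
def pick_template(page_text, templates):
    """Return the name of the template whose marker phrases best match the page.

    Returns None when no template has any marker on the page.
    """
    text = page_text.lower()
    best_name, best_score = None, 0.0
    for name, template in templates.items():
        markers = template["markers"]
        # Fraction of this template's markers found in the page text.
        score = sum(m.lower() in text for m in markers) / len(markers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical templates: each pairs detection markers with its own zones.
templates = {
    "acme_invoice": {
        "markers": ["ACME Corp", "Invoice"],
        "zones": {"total": (100, 110, 200, 130)},
    },
    "globex_receipt": {
        "markers": ["Globex", "Receipt"],
        "zones": {"total": (300, 500, 420, 520)},
    },
}

print(pick_template("GLOBEX Ltd. Receipt #881", templates))   # globex_receipt
print(pick_template("ACME Corp Invoice INV-0042", templates)) # acme_invoice
print(pick_template("Unrelated letter", templates))           # None
```

Once a template is selected, its zones are the ones applied to the page, so each layout keeps its own set of field positions.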
Best-in-class OCR software
Parseur's OCR accuracy is the best on the market. It supports most languages, including handwritten text, and is blazingly fast.