OCR Pages


The OCR Pages action performs optical character recognition (OCR) on documents. Two OCR engines are available in PDF-Tools: the Default OCR engine and the Enhanced OCR engine, an optional add-on introduced in PDF-Tools version 8. The Enhanced OCR engine is faster, more accurate and more dynamic than the Default OCR engine, and it also offers some extra features. Further information about the Enhanced OCR engine is available here. If you have purchased the Enhanced OCR engine, you can use the OCR preferences detailed here to switch between the Default and Enhanced OCR engines.

 

This action contains the following customizable parameters:

 


Figure 1. OCR Pages Action Options

 

Select an option in the dropdown menu to determine the action taken when input documents contain text:

Select OCR Document to perform OCR on input documents even if they already contain text.

Select Do not OCR but continue processing to omit the OCR process from the operation and continue with the remaining actions.

Select Skip processing the document to exclude the document from any further processing.
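To make the three choices concrete, the following Python sketch models the decision for a single input document. It is a hypothetical illustration only: `TextHandling` and `plan_for_document` are invented names and are not part of PDF-Tools, and the sketch assumes that documents without any text are always passed to OCR, since the dropdown applies only when input documents contain text.

```python
from enum import Enum


class TextHandling(Enum):
    # Hypothetical names mirroring the three dropdown choices; not a PDF-Tools API.
    OCR_DOCUMENT = "OCR Document"
    CONTINUE_WITHOUT_OCR = "Do not OCR but continue processing"
    SKIP_DOCUMENT = "Skip processing the document"


def plan_for_document(has_text: bool, choice: TextHandling) -> tuple[bool, bool]:
    """Return (run_ocr, continue_processing) for one input document.

    Assumption: a document with no text is always OCRed, because the
    dropdown only governs documents that already contain text.
    """
    if not has_text:
        return (True, True)
    if choice is TextHandling.OCR_DOCUMENT:
        return (True, True)    # OCR anyway, then run the remaining actions
    if choice is TextHandling.CONTINUE_WITHOUT_OCR:
        return (False, True)   # skip OCR, but run the remaining actions
    return (False, False)      # SKIP_DOCUMENT: exclude the document entirely
```

For example, `plan_for_document(True, TextHandling.CONTINUE_WITHOUT_OCR)` returns `(False, True)`: the document bypasses OCR but still passes through the rest of the action sequence.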

Click More Options to view/edit all options. The OCR Pages dialog box will open, as detailed below.

Select the Show setup dialog while running box to open the OCR Pages dialog box, so that settings can be customized, each time this action runs. Clear this box to prevent the OCR Pages dialog box from opening each time the action runs, which is useful when the same settings are used consistently.

 

Note that the options in the OCR Pages dialog box depend on the OCR engine being used:

 

Default OCR Engine

 


Figure 2. OCR Pages Dialog Box

 

Use the Page Range settings to determine the page range for OCR:

Select All to specify all pages.

The Current Page option is not yet available. It will be added in a future build.

Select Pages to specify a custom page range. Further information on defining page ranges is available here.

If a custom range is specified, then select an option in the Subset dropdown menu to determine a subset of pages.

Select the Skip pages that already contain text content items box to exclude pages that already contain text content from the OCR process.

 

Use the Recognition settings to determine the language and accuracy of the OCR process. Higher accuracy increases the time the process takes, and lower accuracy reduces it. Note also that setting the accuracy to high may produce unusual output if the document contains imperfections, because the software searches to a greater depth and may attempt to recognize those imperfections as text. Click More Languages to view available language packs.

 

Click OK to save settings.

 

Enhanced OCR Engine

 


Figure 3. OCR Pages (Enhanced) Dialog Box

 

The options in this dialog box are the same as those shown in Figure 2, with the addition of the Output Options dropdown menu:

 

Select Searchable Image to retain the image-based content on which OCR is performed and add an invisible text layer containing the text recognized during the operation. This makes the source text selectable and searchable in the same manner as ordinary text.

Select Editable Text and Images to replace image-based text in source documents with the text recognized during OCR. This converts image-based text into editable text.

Select Fine Page Content to replace the content of source documents with new content that contains only the text and images recognized during OCR.

 

Click OK to OCR documents.