OCR Pages

Click OCR Pages to perform optical character recognition on documents:

Figure 1. Convert Tab, OCR Pages

The OCR process in PDF-XChange Editor analyzes image-based documents, recognizes text, and then places an invisible text layer that duplicates the recognized text on top of the page image. This makes the source text selectable and searchable in the same manner as ordinary text. When this option is selected, the OCR Pages dialog box opens:

Figure 2. OCR Pages Dialog Box

Use the Page Range settings to determine the page range for OCR:

•Select All to specify all pages.

•Select Current Page to specify the current page.

•Select Pages to specify a custom page range. Further information on defining page ranges is available here. Use the Subset dropdown menu to restrict the selected range to a subset of its pages: All Pages, Odd Pages Only or Even Pages Only.
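
The way the Page Range and Subset settings combine can be illustrated with a short Python sketch. This is not how PDF-XChange Editor implements the feature: the function name `select_pages`, its parameters, and the simplified range syntax ("1-3,7") are assumptions made for illustration, and the sketch assumes that Odd and Even refer to the parity of the page number.

```python
def select_pages(page_count, pages="all", subset="all", current_page=1):
    """Return the 1-based page numbers chosen by Page Range style settings.

    Hypothetical helper for illustration only.
    pages:  "all", "current", or a simplified range string such as "1-3,7".
    subset: "all", "odd", or "even" (assumed to mean page-number parity).
    """
    if pages == "all":
        chosen = list(range(1, page_count + 1))
    elif pages == "current":
        chosen = [current_page]
    else:
        chosen = []
        for part in pages.split(","):
            part = part.strip()
            if "-" in part:
                # "a-b" expands to every page from a to b inclusive
                start, end = (int(x) for x in part.split("-", 1))
                chosen.extend(range(start, end + 1))
            else:
                chosen.append(int(part))

    # Ignore page numbers that fall outside the document.
    chosen = [p for p in chosen if 1 <= p <= page_count]

    # Apply the Subset filter to whatever the range selected.
    if subset == "odd":
        chosen = [p for p in chosen if p % 2 == 1]
    elif subset == "even":
        chosen = [p for p in chosen if p % 2 == 0]
    return chosen


if __name__ == "__main__":
    # Custom range 2-6 restricted to even pages in a 10-page document.
    print(select_pages(10, pages="2-6", subset="even"))
```

In this sketch the Subset filter is applied after the range is expanded, so a range of 2-6 combined with Even Pages Only selects pages 2, 4 and 6.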

Use the Recognition settings to determine the language and accuracy of the OCR process. Increasing the accuracy also increases the time that the process takes, and decreasing it shortens the process. Note that setting the accuracy to high may produce unusual output if the document contains imperfections, because the software searches to a greater depth and may attempt to recognize the imperfections as text. Click More Languages to view available language packs.

Use the Output options to determine the format and quality of output from the OCR process:

•Select an option in the Output Type dropdown menu to determine the output format:

•Select Create New Searchable PDF to create a duplicate of the current document in which text is searchable and selectable. OCR inserts an invisible text layer that contains the recognized text over the document. The layer is structured to match the layout of the image regions that were identified as text during the process, so image-based content becomes searchable and selectable while the layer itself remains invisible. Note that text identified during the OCR process can only be searched and selected; it cannot be edited.

•Select Preserve Original Content and Add Text Layer to add the invisible text layer detailed above to the source document, as opposed to creating a new document.

•Use the Quality dropdown menu to determine the resolution of new documents when the Create New Searchable PDF option is used.

•Select the Auto Deskew box to deskew documents automatically when the Create New Searchable PDF option is used. Deskewing straightens images that were photographed or scanned at an angle.

•Select the Do not OCR pages that already contain text content items box to omit pages that contain text-based content items from the process.
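
The dependencies between the Output options described above can be summarized in a small Python sketch. It is a model of the rules stated in this topic, not the application's code: the names `Page`, `pages_to_ocr`, `effective_options`, the option strings, and the quality value are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    has_text: bool  # True if the page already contains text content items


def pages_to_ocr(pages, skip_pages_with_text):
    """Models the 'Do not OCR pages that already contain text content items' box."""
    return [p.number for p in pages
            if not (skip_pages_with_text and p.has_text)]


def effective_options(output_type, quality, auto_deskew):
    """Quality and Auto Deskew take effect only when a new searchable PDF is created.

    output_type is a hypothetical label: "create_new_searchable_pdf" or
    "preserve_original_and_add_text_layer".
    """
    creates_new = output_type == "create_new_searchable_pdf"
    return {
        "output_type": output_type,
        "quality": quality if creates_new else None,
        "auto_deskew": auto_deskew and creates_new,
    }


if __name__ == "__main__":
    doc = [Page(1, has_text=False), Page(2, has_text=True), Page(3, has_text=False)]
    print(pages_to_ocr(doc, skip_pages_with_text=True))
    # "example-quality" is a placeholder, not an actual Quality menu entry.
    print(effective_options("preserve_original_and_add_text_layer",
                            quality="example-quality", auto_deskew=True))
```

With the box selected, page 2 in this example is omitted because it already contains text, and the Quality and Auto Deskew settings are ignored because the text layer is added to the source document instead of a new one.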

Click OK to run OCR on the selected pages.