Troubleshooting scanning/OCR for troublesome PDF files

Our software, powered by the world-class OmniPage engine, handles a huge variety of PDF files. In the unlikely event that you come across a PDF that does not scan/OCR well (we have identified three in ten years), here is a technique that might help.

  1. Open the PDF with ClaroRead Plus/Pro/Scan2Text as normal.
  2. Save as a TIFF image file, not a PDF or Word document.
  3. Now open this TIFF image file and save the output as PDF or Word.

Rendering the PDF into an image file, then performing OCR on that image, circumvents some of the potential PDF complexities that might be causing problems.

As a last resort: if the PDF still does not open or scan properly, but does open in Adobe Reader or Acrobat Reader DC, then you can open it there and use ClaroRead’s Scan from Screen option to read back the text.