Tesseract




Author: Jerry Chae

Tesseract is one of the most widely used open-source OCR solutions.

https://en.wikipedia.org/wiki/Tesseract_(software)

Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License. Originally developed by Hewlett-Packard as proprietary software in the 1980s, it was released as open source in 2005 and development has been sponsored by Google since 2006.

Need help?

For technical support, contact tech@argos-labs.com





Prerequisite


You must have the local Tesseract module installed.

The installer file name is: tesseract-ocr-w32-setup-v4.1.0.20190314.exe


If the local Tesseract module has not been installed, the plugin fails to run and displays an error message asking you to install it. The message contains the installer file name and the URL to download it from.

 


Here is the download URL.

https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w32-setup-v4.1.0.20190314.exe
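Before running the plugin, it can help to confirm that the local module is actually present. The following is a minimal sketch, not the plugin's own check: it assumes the executable is named tesseract (tesseract.exe on Windows) and that the installer used one of the two directories listed in DEFAULT_DIRS, which are assumptions and may differ on your machine.

```python
import shutil
from pathlib import Path

# Assumed install directories for the Windows installer; adjust if you
# chose a different location during setup.
DEFAULT_DIRS = [
    Path(r"C:\Program Files\Tesseract-OCR"),
    Path(r"C:\Program Files (x86)\Tesseract-OCR"),
]


def find_tesseract(extra_dirs=DEFAULT_DIRS):
    """Return the path to the tesseract executable, or None if not found."""
    # First look on the PATH.
    found = shutil.which("tesseract")
    if found:
        return found
    # Then look in the assumed install directories.
    for directory in extra_dirs:
        candidate = Path(directory) / "tesseract.exe"
        if candidate.is_file():
            return str(candidate)
    return None


if __name__ == "__main__":
    path = find_tesseract()
    print(path or "Tesseract not found; install it from the download URL above.")
```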



Contents

Operations and how to set parameters

  1. Get Version
  2. List of Languages
  3. OCR




Operations and how to set parameters

1. Get Version

This operation returns the version number of the locally installed Tesseract module.
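For reference, a similar result can be obtained outside the plugin with the tesseract command line. This is a minimal sketch, not the plugin's implementation: get_version and parse_version are hypothetical helper names, and the parser assumes the first output line has the form "tesseract <version>", optionally with a leading "v".

```python
import re
import subprocess


def parse_version(output):
    """Extract the version from the first line of `tesseract --version` output."""
    lines = output.strip().splitlines()
    if not lines:
        return None
    # Assumed format: "tesseract v4.1.0.20190314" or "tesseract 4.1.1".
    match = re.match(r"tesseract\s+v?(\S+)", lines[0].strip())
    return match.group(1) if match else None


def get_version(executable="tesseract"):
    """Run `tesseract --version` and return the parsed version string."""
    # Some builds print version information to stderr, so merge both streams.
    proc = subprocess.run(
        [executable, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_version(proc.stdout)
```

Calling get_version() requires Tesseract to be installed; parse_version can be tried on any captured output.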





2. List of Languages
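The plugin's output for this operation is not described here. As a point of comparison, the tesseract command line lists installed language data with the --list-langs option. The sketch below assumes the output starts with a header line beginning "List of available languages" followed by one language code per line; list_languages and parse_languages are hypothetical helper names.

```python
import subprocess


def parse_languages(output):
    """Return language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Drop the assumed header line, keep the language codes.
    return [
        line for line in lines
        if not line.lower().startswith("list of available languages")
    ]


def list_languages(executable="tesseract"):
    """Run `tesseract --list-langs` and return the installed language codes."""
    # Merge stderr into stdout because some builds print the list to stderr.
    proc = subprocess.run(
        [executable, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_languages(proc.stdout)
```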







3. OCR

The return value contains the OCR result as a string.




The page segmentation mode can be chosen from 14 options, numbered 0 to 13.


  •   0    Orientation and script detection (OSD) only.
  •   1    Automatic page segmentation with OSD.
  •   2    Automatic page segmentation, but no OSD, or OCR.
  •   3    Fully automatic page segmentation, but no OSD. (Default)
  •   4    Assume a single column of text of variable sizes.
  •   5    Assume a single uniform block of vertically aligned text.
  •   6    Assume a single uniform block of text.
  •   7    Treat the image as a single text line.
  •   8    Treat the image as a single word.
  •   9    Treat the image as a single word in a circle.
  •  10    Treat the image as a single character.
  •  11    Sparse text. Find as much text as possible in no particular order.
  •  12    Sparse text with OSD.
  •  13    Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
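The mode numbers above correspond to the --psm option of the tesseract command line. The following sketch shows one way to validate a chosen mode and turn it into command-line arguments; PSM_MODES and psm_args are hypothetical names used only for illustration, and the descriptions are shortened from the list above.

```python
# Page segmentation modes, keyed by the number passed to --psm.
PSM_MODES = {
    0: "OSD only",
    1: "Automatic page segmentation with OSD",
    2: "Automatic page segmentation, no OSD or OCR",
    3: "Fully automatic page segmentation, no OSD (default)",
    4: "Single column of text of variable sizes",
    5: "Single uniform block of vertically aligned text",
    6: "Single uniform block of text",
    7: "Single text line",
    8: "Single word",
    9: "Single word in a circle",
    10: "Single character",
    11: "Sparse text",
    12: "Sparse text with OSD",
    13: "Raw line",
}


def psm_args(mode=3):
    """Return the command-line arguments selecting a page segmentation mode."""
    if mode not in PSM_MODES:
        raise ValueError("page segmentation mode must be between 0 and 13")
    return ["--psm", str(mode)]


if __name__ == "__main__":
    # A cropped image containing one line of text might use mode 7.
    print(psm_args(7), "->", PSM_MODES[7])
```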

The OCR engine can be chosen from 4 options.

  •   0    Original Tesseract only.
  •   1    Neural nets LSTM only.
  •   2    Tesseract + LSTM.
  •   3    Default, based on what is available.
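Taken together, the language, page segmentation, and engine settings map onto the -l, --psm, and --oem options of the tesseract command line. The sketch below builds and runs such a command; it is an illustration under stated assumptions and not how the plugin is implemented. build_ocr_command and run_ocr are hypothetical names, "eng" is an assumed default language, and the output is assumed to be UTF-8 text.

```python
import subprocess


def build_ocr_command(image_path, lang="eng", psm=3, oem=3, executable="tesseract"):
    """Build a tesseract command line that writes recognized text to stdout."""
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if not 0 <= oem <= 3:
        raise ValueError("oem must be between 0 and 3")
    # Using "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file.
    return [
        executable, str(image_path), "stdout",
        "-l", lang,
        "--psm", str(psm),
        "--oem", str(oem),
    ]


def run_ocr(image_path, **options):
    """Run OCR on an image and return the recognized text as a string."""
    proc = subprocess.run(
        build_ocr_command(image_path, **options),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return proc.stdout.decode("utf-8")
```

For example, run_ocr("invoice.png", psm=6, oem=1) would treat a hypothetical image file invoice.png as a single block of text and use the LSTM engine only.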





