How to install a language in Tesseract OCR
I have installed Tesseract OCR, and it has only 'eng' and 'osd' in its language list. I need German. I tried the following command:
brew install tesseract-ocr-deu
but I get this error:
Error: No available formula with the name "tesseract-ocr-deu"
==> Searching for a previously deleted formula (in the last month)...
Warning: homebrew/core is shallow clone. To get complete history run:
git -C "$(brew --repo homebrew/core)" fetch --unshallow
Error: No previously deleted formula found.
==> Searching for similarly named formulae...
Error: No similarly named formulae found.
==> Searching taps...
==> Searching taps on GitHub...
Error: No formulae found in taps.
Solution 1:[1]
On macOS, run:
brew install tesseract-lang
This installs all languages. You can check which ones are available with:
tesseract --list-langs
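To check for one specific language from a script rather than by eye, the listing can be filtered. A minimal sketch, assuming the first line of `tesseract --list-langs` output is a header (such as `List of available languages (2):`) followed by one code per line; `has_tesseract_lang` is a hypothetical helper name:

```shell
# has_tesseract_lang CODE  (hypothetical helper)
# Reads `tesseract --list-langs` output on stdin and exits 0 if CODE
# appears as a whole line. Line 1 is assumed to be a header and is skipped.
has_tesseract_lang() {
  awk -v code="$1" 'NR > 1 && $0 == code { found = 1 } END { exit !found }'
}
```

For example: `tesseract --list-langs 2>&1 | has_tesseract_lang deu && echo 'German is installed'`. The `2>&1` is there because some older Tesseract versions may print the list to stderr.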
Solution 2:[2]
On macOS Mojave (10.14.3), this works:
brew install tesseract-lang
Solution 3:[3]
For completeness, I am adding an answer on how to install and use a non-English language with Tesseract OCR on Linux, since this was the first result I got on Google and it may help someone.
To install the German language data on Ubuntu/Debian:
$ sudo apt-get install tesseract-ocr-deu
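Other languages follow the same package naming pattern, so several can be installed in one command. A small sketch, assuming the `tesseract-ocr-<code>` scheme with underscores in codes such as `chi_sim` turned into hyphens (Debian package names cannot contain underscores); `deb_pkgs_for_langs` is a hypothetical helper:

```shell
# deb_pkgs_for_langs CODE...  (hypothetical helper)
# Prints one Debian/Ubuntu package name per language code, assuming the
# tesseract-ocr-<code> naming scheme with '_' replaced by '-'.
deb_pkgs_for_langs() {
  for code in "$@"; do
    printf 'tesseract-ocr-%s\n' "$(printf '%s' "$code" | tr '_' '-')"
  done
}
```

For example, `sudo apt-get install $(deb_pkgs_for_langs deu ita)` expands to `sudo apt-get install tesseract-ocr-deu tesseract-ocr-ita`. The unquoted `$(...)` is intentional so each name becomes a separate argument.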
Language codes of all supported languages can be found here:
https://github.com/tesseract-ocr/tessdoc/blob/master/Data-Files-in-different-versions.md
To specify the language for the OCR engine, use the option -l lang, e.g. for German:
$ tesseract -l deu 'imagename' 'stdout'
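Tesseract can also load several languages at once by joining their codes with `+` (e.g. `-l deu+eng` for a document mixing German and English). A minimal sketch of building that argument from a list; `join_langs` is a hypothetical helper name:

```shell
# join_langs CODE...  (hypothetical helper)
# Joins language codes with '+', the separator the -l option accepts
# for loading several languages. IFS is changed only inside a subshell.
join_langs() {
  (IFS='+'; printf '%s\n' "$*")
}
```

For example: `tesseract 'imagename' stdout -l "$(join_langs deu eng)"`.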
Solution 4:[4]
You can download them from the Tesseract repository.
At the moment, tessdata for 4.0 is available here and tessdata for 3.04 here.
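For 4.0 and later, the trained models are split across three repositories in the tesseract-ocr organization: `tessdata`, `tessdata_best` (slower, more accurate) and `tessdata_fast` (faster, smaller). A sketch of building a direct download URL for one language, assuming each repository serves files from its `main` branch via GitHub's `/raw/` path, as in the `ita` URL used in Solution 5; `tessdata_url` is a hypothetical helper:

```shell
# tessdata_url CODE [REPO]  (hypothetical helper)
# Prints a raw GitHub URL for CODE.traineddata.
# REPO: tessdata (default), tessdata_best or tessdata_fast.
# Assumption: every repository uses a 'main' branch.
tessdata_url() {
  code=$1
  repo=${2:-tessdata}
  case $repo in
    tessdata|tessdata_best|tessdata_fast) ;;
    *) echo "unknown repository: $repo" >&2; return 1 ;;
  esac
  printf 'https://github.com/tesseract-ocr/%s/raw/main/%s.traineddata\n' "$repo" "$code"
}
```

For example, `wget -O "$TESSDATA_PREFIX/deu.traineddata" "$(tessdata_url deu tessdata_best)"` would fetch the German "best" model.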
Solution 5:[5]
I had to install Italian, but installing tesseract-lang costs 164 files (654.0 MB) and gives the less precise "fast" models rather than the "best" ones, so I decided to install it manually.
Add the path to your shell profile (if you installed Tesseract with Homebrew on a Mac, find the path with brew info tesseract):
export TESSDATA_PREFIX=/usr/local/Cellar/tesseract/5.1.0/share/tessdata/
Reload the profile (if you use zsh):
source ~/.zshrc
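Since `source ~/.zshrc` only picks up the variable if the `export` line is actually in that file, it can help to append it once and skip it on later runs. A sketch under these assumptions: the profile is `~/.zshrc`, and `persist_tessdata_prefix` is a hypothetical helper. Using `$(brew --prefix tesseract)/share/tessdata` instead of the versioned `Cellar/tesseract/5.1.0` path may avoid editing the profile after each upgrade, because `brew --prefix tesseract` points at the `opt` symlink for the installed version:

```shell
# persist_tessdata_prefix RCFILE DIR  (hypothetical helper)
# Appends `export TESSDATA_PREFIX="DIR"` to RCFILE unless an
# `export TESSDATA_PREFIX=` line is already there, so re-running is harmless.
persist_tessdata_prefix() {
  rcfile=$1
  dir=$2
  touch "$rcfile" || return 1
  if grep -q '^export TESSDATA_PREFIX=' "$rcfile"; then
    echo "TESSDATA_PREFIX already set in $rcfile" >&2
    return 0
  fi
  printf 'export TESSDATA_PREFIX="%s"\n' "$dir" >> "$rcfile"
}
```

For example: `persist_tessdata_prefix ~/.zshrc "$(brew --prefix tesseract)/share/tessdata" && source ~/.zshrc`.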
Download the language data, in my case the best version of ita:
wget -O $TESSDATA_PREFIX/ita.traineddata https://github.com/tesseract-ocr/tessdata/raw/main/ita.traineddata
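If the download URL points at a GitHub `/blob/` page instead of `/raw/`, the saved `.traineddata` file is actually HTML and Tesseract cannot load it. A rough heuristic check, assuming a real model file never starts with an HTML tag; `check_traineddata` is a hypothetical helper, and it does not validate the model format itself:

```shell
# check_traineddata FILE  (hypothetical helper, heuristic only)
# Fails if FILE is missing/empty or its first 512 bytes look like HTML,
# which usually means a web page was saved instead of the model file.
check_traineddata() {
  f=$1
  if [ ! -s "$f" ]; then
    echo "missing or empty: $f" >&2
    return 1
  fi
  if head -c 512 "$f" | grep -qiE '<!doctype html|<html'; then
    echo "looks like an HTML page, not model data: $f" >&2
    return 1
  fi
}
```

For example: `check_traineddata "$TESSDATA_PREFIX/ita.traineddata"`. On macOS, which ships curl but not wget, the download itself can be done with `curl -fL -o "$TESSDATA_PREFIX/ita.traineddata" https://github.com/tesseract-ocr/tessdata/raw/main/ita.traineddata`.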
Now you should see the added language
tesseract --list-langs
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Sahana M |
Solution 2 | weivall |
Solution 3 | Marko Lalovic |
Solution 4 | Dmitrii Z. |
Solution 5 | Ax_ |