Tesseract returns nothing for Arabic words/letters
I have installed pytesseract and it works perfectly on French/English text and on numbers. But when I try to read any Arabic text or letter, it doesn't return anything.
Here is the code I have used:
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
print(pytesseract.image_to_string(Image.open('maroc.jpg'), lang='ara'))
Here is the letter I'm trying to read: د
If anyone has managed to read it using another method, please help. Thanks!
Solution 1:[1]
Code:
import pytesseract

# With extension='hocr' the result is bytes holding hOCR (HTML) markup
hocr = pytesseract.image_to_pdf_or_hocr('test.png', lang='ara', extension='hocr')
print(hocr.decode('utf-8'))
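With `extension='hocr'`, `image_to_pdf_or_hocr` returns hOCR, an HTML-based format in which Tesseract wraps each recognised word in an element with the class `ocrx_word`. If you only want the plain words back, a minimal sketch using the standard library's `html.parser` could look like this; `HocrWordParser` and `words_from_hocr` are hypothetical helper names, not part of pytesseract.

```python
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect the text of every element whose class list contains 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._depth = 0  # > 0 while we are inside an ocrx_word element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            # Nested tag (e.g. <strong>) inside a word: track it so we know when the word ends
            self._depth += 1
        elif "ocrx_word" in classes:
            self._depth = 1
            self.words.append("")

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.words[-1] += data


def words_from_hocr(hocr):
    """Return the recognised words from hOCR given as bytes or str."""
    if isinstance(hocr, bytes):
        hocr = hocr.decode("utf-8")
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()  # flush any buffered text
    return [w.strip() for w in parser.words if w.strip()]
```

Usage would be `words_from_hocr(pytesseract.image_to_pdf_or_hocr('test.png', lang='ara', extension='hocr'))`. If this returns an empty list, Tesseract found no words at all, which points at the model or the image rather than at how the output is printed.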
Download the new Arabic tessdata from here:
Solution 2:[2]
If you want to recognise Arabic words, download the Arabic trained model from the link below, then save it in the tessdata folder of your Tesseract installation:
C:\Program Files\Tesseract-OCR\tessdata or C:\Program Files (x86)\Tesseract-OCR\tessdata
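To check that the model actually ended up in one of those folders, a small sketch like the following looks for `<lang>.traineddata` (here `ara.traineddata`) in a list of candidate folders. `find_tessdata_with` and `WINDOWS_TESSDATA_DIRS` are hypothetical names, and the default list assumes a standard install in one of the two Windows locations above.

```python
from pathlib import Path

# The two default Windows locations named in this answer (assumes a standard install).
WINDOWS_TESSDATA_DIRS = [
    r"C:\Program Files\Tesseract-OCR\tessdata",
    r"C:\Program Files (x86)\Tesseract-OCR\tessdata",
]


def find_tessdata_with(lang, candidates=WINDOWS_TESSDATA_DIRS):
    """Return the first folder in candidates that contains <lang>.traineddata, else None."""
    for folder in candidates:
        folder = Path(folder)
        if (folder / f"{lang}.traineddata").is_file():
            return folder
    return None
```

Calling `find_tessdata_with("ara")` should return the folder you copied the file into; `None` means `lang='ara'` has no model to load from these default locations.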
Solution 3:[3]
On a Raspberry Pi 4, just download the model from Eliyaz KL's answer and put it in /usr/share/tesseract-ocr/4.00/tessdata/. I don't know which operating system you are using; this is what worked in my case.
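The folder can also be passed to Tesseract explicitly with its `--tessdata-dir` option, which pytesseract forwards through the `config` argument. This helps when the model lives somewhere Tesseract does not search by default. A minimal sketch, assuming a POSIX system such as Raspberry Pi OS and the `4.00` path from this answer (the version folder may differ on your install); `tessdata_config`, `read_arabic` and `PI_TESSDATA` are hypothetical names.

```python
import shlex
from pathlib import Path

# Raspberry Pi path from the answer above; the "4.00" part depends on your Tesseract version.
PI_TESSDATA = Path("/usr/share/tesseract-ocr/4.00/tessdata")


def tessdata_config(tessdata_dir=PI_TESSDATA):
    """Build a config string pointing Tesseract at tessdata_dir (POSIX-style quoting)."""
    return f"--tessdata-dir {shlex.quote(str(tessdata_dir))}"


def read_arabic(image_path, tessdata_dir=PI_TESSDATA):
    """OCR an image with the Arabic model stored in tessdata_dir."""
    # Imported lazily so tessdata_config() can be used without pytesseract/Pillow installed
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(
            img, lang="ara", config=tessdata_config(tessdata_dir)
        )
```

`shlex.quote` is used because pytesseract splits the config string into arguments; quoting keeps a folder path containing spaces in one piece on POSIX systems.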
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Eliyaz KL |
Solution 2 | Feisal Aswad |
Solution 3 |