Pytesseract - OCR on image with colored text
I'm trying to use Pytesseract to read some text from an image. However, the text is orange and the background contains both black and white regions. I have tried several options, but so far I have been unable to read the text with Pytesseract. Below is a sample of the image:
Here is the code I have arrived at:
import pytesseract
from PIL import Image, ImageOps
import numpy as np

img = Image.open("OCR.png").convert("L")
img = ImageOps.invert(img)
# img.show()
threshold = 240
table = []
pixelArray = img.load()
for y in range(img.size[1]):  # binarize it
    List = []
    for x in range(img.size[0]):
        if pixelArray[x, y] < threshold:
            List.append(0)
        else:
            List.append(255)
    table.append(List)
img = Image.fromarray(np.array(table, dtype="uint8"))  # load the image from array.
# img.show()
print(pytesseract.image_to_string(img))
The code above results in an all-black image; the text becomes black as well.
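A possible explanation, sketched under assumptions: after the grayscale conversion and inversion, a threshold of 240 sends both the orange text and the white background to 0, so the two cannot be told apart. The snippet below first reproduces that arithmetic per pixel, then shows one alternative, selecting pixels by hue and saturation instead of brightness. The orange value (255, 165, 0), the hue/saturation/value bounds, and the helper names `original_pipeline`, `is_orange` and `ocr_orange_text` are all hypothetical and would need tuning against the real image. `to_gray` only approximates Pillow's "L" conversion.

```python
import colorsys


def to_gray(r, g, b):
    """Approximate Pillow's "L" conversion (ITU-R 601-2 luma weights)."""
    return (r * 299 + g * 587 + b * 114) // 1000


def original_pipeline(rgb, threshold=240):
    """Per-pixel equivalent of the code above: gray -> invert -> threshold."""
    inverted = 255 - to_gray(*rgb)
    return 0 if inverted < threshold else 255


def is_orange(r, g, b):
    """Rough orange test in HSV space; the bounds are assumed, not measured."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return 15 / 360 <= h <= 50 / 360 and s >= 0.5 and v >= 0.5


def ocr_orange_text(path):
    """Build a black-text-on-white mask from orange pixels and run OCR on it."""
    # Third-party imports stay local so the pixel helpers above run without them.
    from PIL import Image
    import pytesseract

    img = Image.open(path).convert("RGB")
    src = img.load()
    mask = Image.new("L", img.size, 255)  # start with a white canvas
    dst = mask.load()
    width, height = img.size
    for y in range(height):
        for x in range(width):
            if is_orange(*src[x, y]):
                dst[x, y] = 0  # paint text pixels black
    return pytesseract.image_to_string(mask)


# Why the original threshold loses the text (assumed sample colours):
print(original_pipeline((255, 165, 0)))    # orange text      -> 0
print(original_pipeline((255, 255, 255)))  # white background -> 0
print(original_pipeline((0, 0, 0)))        # black background -> 255
```

With a mask like this, `print(ocr_orange_text("OCR.png"))` would replace the last line of the original script. Anti-aliased edges of the glyphs may fall outside the assumed bounds, so widening the hue range or lowering the saturation cutoff may be necessary.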
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow