Identify author name, book name from a string obtained with OCR
After running tess4j (an OCR library) on a photo of a book preface, I obtain the text from the image. Now I want to identify the author name and the book name in that string. I don't want to look the strings up in a database, because I need this to be efficient.
For example, the text is:
First example: some text here, Jack is the author some text here of "Jungle Book" some text here
Second example: "Jungle Book" was written by Jack another text here.
And the output should be:
- author: Jack
- book name: Jungle Book
I don't know how to do this. Can you give me some hints?
Solution 1:[1]
I'm not very familiar with tess4j, but I'm pretty sure it gives you the location of each piece of text it finds. The title of the book will usually be centrally located. If the text contains the word "by" or a "-", the author's name will usually be under the title; otherwise, if the text contains an apostrophe (for example a possessive), the author's name will usually be above the title.
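A minimal sketch of that layout heuristic, in plain Java and without any tess4j calls. The `TextLine` and `BookInfo` classes, the `guess` method, and the "tallest roughly centered line is the title" rule are assumptions made for illustration, not part of tess4j. tess4j does appear to expose bounding boxes (for instance through `getWords`), but check its documentation before wiring that in.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CoverHeuristic {

    /** Hypothetical holder for one OCR text line and its bounding box, in pixels. */
    static final class TextLine {
        public final String text;
        public final int x, y, width, height;

        TextLine(String text, int x, int y, int width, int height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        int centerX() {
            return x + width / 2;
        }
    }

    /** Hypothetical result holder; a field is null when nothing was found. */
    static final class BookInfo {
        public final String title;
        public final String author;

        BookInfo(String title, String author) {
            this.title = title;
            this.author = author;
        }
    }

    public static BookInfo guess(List<TextLine> lines, int pageWidth) {
        // Assumption: the title is the tallest line whose center lies
        // within a quarter of the page width of the horizontal center.
        int tolerance = pageWidth / 4;
        TextLine title = lines.stream()
                .filter(l -> Math.abs(l.centerX() - pageWidth / 2) <= tolerance)
                .max(Comparator.comparingInt((TextLine l) -> l.height))
                .orElse(null);
        if (title == null) {
            return new BookInfo(null, null);
        }

        // Order lines top to bottom so "above" and "below" are neighbours.
        List<TextLine> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparingInt((TextLine l) -> l.y));
        int idx = sorted.indexOf(title);

        String author = null;
        // Case 1: the line directly below the title starts with "by " or "-".
        if (idx + 1 < sorted.size()) {
            String below = sorted.get(idx + 1).text.trim();
            if (below.regionMatches(true, 0, "by ", 0, 3)) {
                author = below.substring(3).trim();
            } else if (below.startsWith("-")) {
                author = below.substring(1).trim();
            }
        }
        // Case 2: the line directly above the title ends with a possessive "'s".
        if (author == null && idx > 0) {
            String above = sorted.get(idx - 1).text.trim();
            if (above.endsWith("'s")) {
                author = above.substring(0, above.length() - 2).trim();
            }
        }
        return new BookInfo(title.text.trim(), author);
    }
}
```

This only works on pages laid out like a cover or title page. For running text such as the preface sentences in the question, positions carry little information, so a rule on the words themselves would probably be needed instead.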
PS: I know this is a very old question, but this might help someone :).
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | JayantH Bellam |