Java Tesseract: create byte[] instead of PDF file - tessInstance.createDocuments()

Is it possible to have Tess4J generate an OCR'd PDF as a byte[] instead of a physical file?

I need to make PDF files searchable via OCR. Writing the result to disk works, but I would like to avoid that step.

Tesseract tessInst = new Tesseract();
tessInst.setDatapath("C:\\Tess4J");
List<RenderedFormat> list = new ArrayList<>();
list.add(RenderedFormat.PDF);
// I don't want to create this file; I just need a byte[]!
tessInst.createDocuments(inputFile.getPath(), "C:\\a\\b\\b\\Tess4J\\filename", list);

Thanks!



Solution 1:[1]

No, Tesseract does not support this. TessPDFRendererCreate expects a file path string (the output base name) as input, so the PDF is always written to disk.

https://tesseract-ocr.github.io/tessapi/5.x/a00008.html
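
If a short-lived file on disk is acceptable, one possible workaround is to let the renderer write into a temporary directory, read the bytes back, and delete the file right away. The following is only a sketch under stated assumptions: PdfBytes, PdfWriter, and renderToBytes are hypothetical names invented for this example, not part of Tess4J, and it assumes the renderer appends ".pdf" to the output base it is given.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Tess4J.
final class PdfBytes {

    /** Hypothetical callback that is expected to write "<outputBase>.pdf" to disk. */
    @FunctionalInterface
    interface PdfWriter {
        void writeTo(String outputBase) throws Exception;
    }

    /** Runs the writer against a temp directory and returns the PDF content as bytes. */
    static byte[] renderToBytes(PdfWriter writer) throws Exception {
        Path dir = Files.createTempDirectory("ocr-");
        Path base = dir.resolve("out");    // output base, without extension
        Path pdf = dir.resolve("out.pdf"); // assumption: ".pdf" is appended to the base
        try {
            writer.writeTo(base.toString());
            return Files.readAllBytes(pdf);
        } finally {
            // Remove whatever was written, then the temp directory itself.
            File[] leftovers = dir.toFile().listFiles();
            if (leftovers != null) {
                for (File f : leftovers) {
                    f.delete();
                }
            }
            Files.deleteIfExists(dir);
        }
    }
}
```

With the variables from the question, the call might look like: byte[] pdf = PdfBytes.renderToBytes(base -> tessInst.createDocuments(inputFile.getPath(), base, list)); A file is still created briefly, so this hides the step rather than removing it.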

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 nguyenq