How can I fill empty space with null in a PDF using PDFBox?
I am using PDFBox (Java) to read a PDF.
It is a very long PDF with more than 40 pages, and I need to extract more than 100 elements from each page. Doing it manually using coordinates would take me forever.
Is there a way to get the page text row by row, with each empty cell filled with some null value?
For example, when I parse a table using this code:
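// Assumes `document` is an already loaded PDDocument.
PDFTextStripper stripper = new PDFTextStripper();
stripper.setSortByPosition(true); // emit text in visual reading order
stripper.setStartPage(30);        // restrict extraction to page 30 only
stripper.setEndPage(30);
LOG.info("page 30 \n{}", stripper.getText(document));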
I get this:
016 1 300
030 17 994 41 629 15 712
042 676 676
The problem is that I can't tell whether a row contains one value or two, or which column each value belongs to.
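To make the desired result concrete, here is a minimal sketch of the kind of output I am after, written without PDFBox so it runs on its own. It assumes each text fragment comes with the x-coordinate where it starts (in PDFBox this could presumably be collected by overriding `PDFTextStripper.writeString(String, List<TextPosition>)` and reading `TextPosition.getXDirAdj()`), and that the left boundary of every column is known. The `ColumnFiller` and `Token` names and all coordinates are invented for illustration; they are not taken from the real PDF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnFiller {

    /** Hypothetical holder: a text fragment and the x-coordinate where it starts. */
    static final class Token {
        final String text;
        final float x;

        Token(String text, float x) {
            this.text = text;
            this.x = x;
        }
    }

    /**
     * Builds one table row. Each token goes into the last column whose left
     * boundary is at or before the token's x. Columns that receive no token
     * stay null. columnStarts must be sorted in ascending order.
     */
    static String[] toRow(List<Token> tokens, float[] columnStarts) {
        String[] row = new String[columnStarts.length];
        for (Token t : tokens) {
            int col = 0; // tokens left of the first boundary fall into column 0
            for (int i = 0; i < columnStarts.length; i++) {
                if (t.x >= columnStarts[i]) {
                    col = i;
                }
            }
            // Fragments landing in the same cell (e.g. "1" and "300") are joined.
            row[col] = (row[col] == null) ? t.text : row[col] + " " + t.text;
        }
        return row;
    }

    public static void main(String[] args) {
        // Invented boundaries: one code column followed by three value columns.
        float[] columnStarts = {50f, 120f, 220f, 320f};

        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("016", 52f));
        tokens.add(new Token("1", 230f));
        tokens.add(new Token("300", 240f));

        System.out.println(Arrays.toString(toRow(tokens, columnStarts)));
        // prints [016, null, 1 300, null]
    }
}
```

With rows in this shape, an empty cell is an explicit null rather than a missing token, so a row such as "042 676 676" would no longer be ambiguous. The column boundaries still have to come from somewhere, for example from the header row or from measuring one page by hand.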
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow