How can I fill empty space with null in a PDF using PDFBox?

I am using PDFBox in Java to read a PDF.

It is a long PDF with more than 40 pages, and I need to extract more than 100 elements from each page. Doing it manually with coordinates would take me forever.

Is there a way to get the text of a PDF page row by row, with each empty cell filled with some null value?

For example, when I parse this table (image omitted)

using the code:

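    // `document` is assumed to be an already loaded PDDocument; LOG is the class logger.
    PDFTextStripper stripper = new PDFTextStripper();
    // Emit text ordered by position on the page rather than by content-stream order.
    stripper.setSortByPosition(true);

    // Page numbers are 1-based and inclusive, so this extracts page 30 only.
    stripper.setStartPage(30);
    stripper.setEndPage(30);
    LOG.info("page 30 \n{}", stripper.getText(document));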

I get this:

016         1 300 
030        17 994        41 629        15 712 
042           676           676 

The problem is that when a row has only one or two values, I can't tell which columns they belong to.
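One possible direction, sketched here under assumptions rather than taken from a confirmed answer: the plain text loses the horizontal position of each value, but PDFBox exposes it through `TextPosition` (for instance `getXDirAdj()` plus `getWidthDirAdj()` for the right edge), which a subclass of `PDFTextStripper` can collect by overriding `writeString(String, List<TextPosition>)`. Once each value has an x coordinate, it can be dropped into a fixed set of columns and the gaps left as `null`. The sketch below shows only that second step in plain Java. The `ColumnFiller` and `Cell` names, the column boundaries, and the coordinates in `main` are all made up for illustration; real boundaries would have to be measured from the actual page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ColumnFiller {

    /** Hypothetical holder: a piece of text and the x coordinate of its right edge. */
    static final class Cell {
        final String text;
        final float rightX;

        Cell(String text, float rightX) {
            this.text = text;
            this.rightX = rightX;
        }
    }

    /**
     * Puts each cell into the first column whose right boundary is at or beyond
     * the cell's right edge. Columns that receive no cell stay null. A cell to
     * the right of the last boundary is ignored, and a later cell overwrites an
     * earlier one that landed in the same column.
     */
    static String[] toRow(List<Cell> cells, float[] columnRightEdges) {
        String[] row = new String[columnRightEdges.length];
        for (Cell cell : cells) {
            for (int i = 0; i < columnRightEdges.length; i++) {
                if (cell.rightX <= columnRightEdges[i]) {
                    row[i] = cell.text;
                    break;
                }
            }
        }
        return row;
    }

    public static void main(String[] args) {
        // Assumed layout: one code column followed by three value columns.
        float[] edges = {60f, 160f, 260f, 360f};

        // Invented coordinates for a row that has only two values.
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("042", 50f));
        cells.add(new Cell("676", 255f));
        cells.add(new Cell("676", 355f));

        System.out.println(Arrays.toString(toRow(cells, edges)));
        // prints [042, null, 676, 676]
    }
}
```

The right edge is used instead of the left one because the numbers in the output look right-aligned, so the right edge should be more stable across values of different lengths. That is a guess based on the sample output and may not hold for every table.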



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source