Tabula read pdf - CalledProcessError

I am using tabula to read tables from a pdf. The documents I'm extracting data from are really large, so I'm using a for-loop to run through the different pages:

from tabula import read_pdf

for i in range(45, endofdoc):
    df = read_pdf('D:\\XXXXX.pdf', pages=i, pandas_options={'header': None}, java_options="-Xmx512m")

This has worked for many files. For the file I'm currently working on, it worked up to page 195, where it raises the error pasted below. Interestingly, the next page works again. I checked the PDF, and the layout of this page is no different from any of the others. What could be going wrong? And more importantly, how can I fix it? Thanks in advance!

File "C:\Users\Kirsten\anaconda3\lib\subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,

CalledProcessError: Command '['java', '-Xmx512m', '-Dfile.encoding=UTF8', '-jar', 'C:\\Users\\Kirsten\\anaconda3\\lib\\site-packages\\tabula\\tabula-1.0.5-jar-with-dependencies.jar', '--pages', '195', '--guess', '--format', 'JSON', 'D:\\XXXXX.pdf']' returned non-zero exit status 1.
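
The traceback only reports that Java exited with status 1; the actual Java error message is written to the child process's stderr. One way to see it is to re-run the same command for page 195 with stderr captured. The following is a minimal sketch, not part of tabula-py: `tabula_command`, `run_and_capture`, and `show_failure` are hypothetical helper names, and the command-line flags are copied from the traceback above.

```python
import subprocess
import sys


def tabula_command(jar_path, pdf_path, page, heap="-Xmx512m"):
    """Rebuild the java command shown in the traceback for a single page."""
    return [
        "java", heap, "-Dfile.encoding=UTF8",
        "-jar", jar_path,
        "--pages", str(page),
        "--guess", "--format", "JSON",
        pdf_path,
    ]


def run_and_capture(cmd):
    """Run cmd and return (exit code, stdout, stderr) without raising."""
    proc = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
    return proc.returncode, proc.stdout, proc.stderr


def show_failure(cmd):
    """Print the child's stderr if it exited with a non-zero status."""
    code, _, err = run_and_capture(cmd)
    if code != 0:
        print(f"exit status {code}", file=sys.stderr)
        print(err, file=sys.stderr)
    return code
```

Calling `show_failure(tabula_command(jar, pdf, 195))` with the jar and PDF paths from the traceback should print whatever Java reported for that page. If that output turns out to mention `java.lang.OutOfMemoryError`, a larger `-Xmx` value would be the first thing to try; that is a guess, since nothing in the traceback confirms the cause.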

My versions:

Python version:
    3.9.7 (default, Sep 16 2021, 16:59:28) [MSC v.1916 64 bit (AMD64)]
Java version:
    java version "1.8.0_181"
    Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
    Java HotSpot(TM) Client VM (build 25.181-b13, mixed mode, sharing)
tabula-py version: 2.3.0
platform: Windows-10-10.0.19044-SP0
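
Since the page after 195 works again, one workaround is to keep the loop running when a single page fails and record which pages need another look. This is a sketch under assumptions rather than a fix for the underlying error: `read_pages` and `read_with_tabula` are hypothetical helpers, and the `read_pdf` arguments are the ones from the loop in the question.

```python
import subprocess


def read_pages(read_page, pages):
    """Call read_page(page) for every page, skipping pages where Java fails.

    Returns (results, failed): results maps page -> return value,
    failed maps page -> exit status of the failed Java process.
    """
    results = {}
    failed = {}
    for page in pages:
        try:
            results[page] = read_page(page)
        except subprocess.CalledProcessError as exc:
            failed[page] = exc.returncode
    return results, failed


def read_with_tabula(page):
    """Read one page with the same options as in the question."""
    # Imported here so read_pages can be used without tabula-py installed.
    from tabula import read_pdf
    return read_pdf('D:\\XXXXX.pdf', pages=page,
                    pandas_options={'header': None},
                    java_options="-Xmx512m")
```

With this, `results, failed = read_pages(read_with_tabula, range(45, endofdoc))` processes every page and leaves the problem pages in `failed`, where they can be retried individually with different settings.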

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
