PyPDF2 PdfFileWriter has no attribute stream
I am trying to split a PDF into its pages and save each page as a new PDF. I tried the method from a previous question and the PyPDF2 split example, both without success. EDIT: I can see in my files that the first page is written successfully; the PDF for the second page is then created but is empty.
Here is the code I am trying to run:
from PyPDF2 import PdfFileWriter, PdfFileReader
inputpdf = PdfFileReader(open("my_pdf.pdf", "rb"))
for i in range(inputpdf.numPages):
    output = PdfFileWriter()
    output.addPage(inputpdf.getPage(i))
    with open("document-page%s.pdf" % i, "wb") as outputStream:
        output.write(outputStream)
Here is the full error message:
Traceback (most recent call last):
File "pdf_functions.py", line 9, in <module>
output.write(outputStream)
File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 482, in write
self._sweepIndirectReferences(externalReferenceMap, self._root)
File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 572, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 548, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 572, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 548, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 557, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, data[i])
File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 572, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 548, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 575, in _sweepIndirectReferences
if data.pdf.stream.closed:
AttributeError: 'PdfFileWriter' object has no attribute 'stream'
I also tried this and confirmed that I can indeed extract a single page.
from PyPDF2 import PdfFileWriter, PdfFileReader
inputpdf = PdfFileReader(open("/home/ubuntu/inputs/cityshape/form5.pdf", "rb"))
#for i in range(inputpdf.numPages):
output = PdfFileWriter()
output.addPage(inputpdf.getPage(2))
with open("document-page2.pdf", "wb") as outputStream:
    output.write(outputStream)
Solution 1:[1]
The same thing happened to me.
I was able to solve it by moving the following line inside the loop:
inputpdf = PdfFileReader(open("/home/ubuntu/inputs/cityshape/form5.pdf", "rb"))
I believe some versions of PyPDF2 have a bug where invoking the PdfFileWriter.write method interferes with the PdfFileReader instance. Recreating the PdfFileReader instance after each write bypasses this bug.
The following code should work (untested):
from PyPDF2 import PdfFileWriter, PdfFileReader
pdf_in_file = open("my_pdf.pdf", 'rb')
inputpdf = PdfFileReader(pdf_in_file)
pages_no = inputpdf.numPages
for i in range(pages_no):
    inputpdf = PdfFileReader(pdf_in_file)
    output = PdfFileWriter()
    output.addPage(inputpdf.getPage(i))
    with open("document-page%s.pdf" % i, "wb") as outputStream:
        output.write(outputStream)
pdf_in_file.close()
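The same workaround can be wrapped in a reusable function. The following is a minimal sketch rather than a tested fix: it assumes the legacy PyPDF2 API used in the question (PdfFileReader, PdfFileWriter, numPages, getPage, addPage), and the names split_pdf and page_filename are hypothetical helpers introduced only for illustration. It reopens the input file and builds a fresh reader for every page, which is the "move the line inside the loop" variant described above.

```python
import os


def page_filename(index, pattern="document-page%s.pdf"):
    """Build the output name for a zero-based page index (pattern from the question)."""
    return pattern % index


def split_pdf(input_path, out_dir="."):
    """Write each page of input_path to its own PDF and return the paths written.

    Assumes the legacy PyPDF2 API used in the question.
    """
    # Imported here so page_filename stays usable without PyPDF2 installed.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    # Read the page count once, then close the file again.
    with open(input_path, "rb") as pdf_in_file:
        page_count = PdfFileReader(pdf_in_file).numPages

    written = []
    for i in range(page_count):
        # Reopen the input and build a fresh reader for every page, so an
        # earlier write() cannot affect the reader used for this page.
        with open(input_path, "rb") as pdf_in_file:
            inputpdf = PdfFileReader(pdf_in_file)
            output = PdfFileWriter()
            output.addPage(inputpdf.getPage(i))
            out_path = os.path.join(out_dir, page_filename(i))
            # Write while the input file is still open, since page data
            # may be read lazily from it.
            with open(out_path, "wb") as output_stream:
                output.write(output_stream)
        written.append(out_path)
    return written
```

Calling split_pdf("my_pdf.pdf") would then produce document-page0.pdf, document-page1.pdf, and so on in the current directory. Reopening the file per page costs some extra I/O, but it keeps every reader independent of the writes that came before it.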
Solution 2:[2]
I solved the error "AttributeError: 'PdfFileWriter' object has no attribute 'stream'" by reopening the PDF before each write.
My old code:
from PyPDF2 import PdfFileMerger, PdfFileReader, PdfFileWriter
pdf = PdfFileReader('arq.pdf')
pagi = 14
pagf = 20
dic = PdfFileMerger()
for i in range(pagi - 1, pagf):
    pag = PdfFileWriter()
    pag.addPage(pdf.getPage(i))
    with open('pag.pdf', 'wb') as split:
        pag.write(split)
    pag = PdfFileReader('pag.pdf')
    dic.append(pag)
with open(f'PDF ({pagi} - {pagf}).pdf', 'wb') as split:
    dic.write(split)
!rm pag.pdf
My new code:
from PyPDF2 import PdfFileMerger, PdfFileReader, PdfFileWriter
pdf = PdfFileReader('arq.pdf')
pagi = 14
pagf = 20
dic = PdfFileMerger()
for i in range(pagi - 1, pagf):
    pag = PdfFileWriter()
    pag.addPage(pdf.getPage(i))
    with open('pag.pdf', 'wb') as split:
        pdf = PdfFileReader('arq.pdf')  # Adding pdf again
        pag.write(split)
    pag = PdfFileReader('pag.pdf')
    dic.append(pag)
with open(f'PDF ({pagi} - {pagf}).pdf', 'wb') as split:
    dic.write(split)
!rm pag.pdf
Hugs!
Solution 3:[3]
I ran into this problem today. I found a lot of code just like mine that runs without errors, so I suspected a version issue. I was using PyPDF2 version 1.27.3; downgrading to 1.25.0 fixed the error.
pip install pypdf2==1.25.0
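Before downgrading, it can help to confirm which version is actually installed. A small sketch using only the standard library (Python 3.8+); the helper names installed_version and version_tuple are made up for this example, and version_tuple assumes a purely numeric version string such as 1.27.3:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="PyPDF2"):
    """Return the installed version string of package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(version_string):
    """Turn '1.27.3' into (1, 27, 3) for comparison; assumes numeric parts only."""
    return tuple(int(part) for part in version_string.split("."))


if __name__ == "__main__":
    found = installed_version()
    print("PyPDF2 version:", found)
    if found is not None and version_tuple(found) == version_tuple("1.27.3"):
        # 1.27.3 is the version reported as failing in this answer.
        print("This is the version that showed the error for me.")
```

Whether other versions are affected is not something this answer establishes; the check only tells you if you are on the version reported here.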
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | slee423 |
Solution 2 | Wallef Santos |
Solution 3 | s gong |