Why does PyPDF2.PdfFileWriter forget changes I made to a document?
I am trying to modify text in a PDF file. The text can be an operand of a Tj or BDC operator. I find the correct operations, and if I read them back directly after changing them, they show the updated values.
But if I pass the complete page to PdfFileWriter, the change is lost. I might be updating a copy and not the real object: I checked the id() and it was different. Does anyone have an idea how to fix this?
from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2.generic import TextStringObject, NameObject, ContentStream
from PyPDF2.utils import b_

reader = PdfFileReader("some.pdf")
writer = PdfFileWriter()

for page_idx in range(0, 1):
    # Get the current page and its contents
    page = reader.getPage(page_idx)
    content_object = page["/Contents"].getObject()
    content = ContentStream(content_object, reader)

    for operands, operator in content.operations:
        if operator == b_("BDC"):
            operands[1][NameObject("/Contents")] = TextStringObject("xyz")
        if operator == b_("Tj"):
            operands[0] = TextStringObject("xyz")

    writer.addPage(page)

# Write the stream
with open("output.pdf", "wb") as fp:
    writer.write(fp)
Solution 1:[1]
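The solution is to assign the ContentStream that you iterated over and changed back to the page before passing the page to PdfFileWriter. ContentStream parses the page's content into a new object, so edits to its operations do not reach the page until you store that object under /Contents: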
page[NameObject('/Contents')] = content
writer.addPage(page)
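To see why the assignment matters, here is a minimal sketch in plain Python. It does not use PyPDF2 at all: ParsedStream and write are hypothetical stand-ins for ContentStream and the writer, and the string format is made up for illustration. It only models the idea that parsing produces a new object while the page keeps pointing at the original data.

```python
class ParsedStream:
    """Hypothetical stand-in for ContentStream: parsing builds a NEW object."""

    def __init__(self, raw):
        # Splitting creates a fresh list; `raw` itself is left untouched.
        self.operations = raw.split(";")


def write(page):
    """Hypothetical stand-in for the writer: serializes whatever the page holds."""
    contents = page["/Contents"]
    if isinstance(contents, ParsedStream):
        return ";".join(contents.operations)
    return contents


page = {"/Contents": "(old) Tj;(old) Tj"}

# Parse and edit, as in the question.
content = ParsedStream(page["/Contents"])
content.operations = [op.replace("old", "xyz") for op in content.operations]

lost = write(page)            # page still references the original data
page["/Contents"] = content   # the fix: assign the edited object back
kept = write(page)

print(lost)  # (old) Tj;(old) Tj
print(kept)  # (xyz) Tj;(xyz) Tj
```

The first call returns the old text because the page was never told about the edited object; the second returns the new text once the edited object is assigned back.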
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Martin Thoma |