How do I know my file is attached in my PDF using PyPDF2?

I am trying to attach an .exe file to a PDF using PyPDF2.

I ran the code below, but my PDF file is still the same size.
I don't know if my file was attached or not.

from PyPDF2 import PdfFileWriter, PdfFileReader

writer = PdfFileWriter()
reader = PdfFileReader("doc1.pdf")

# check whether the PDF can be read
print("doc1 has %d pages" % reader.getNumPages())

writer.addAttachment("doc1.pdf", "client.exe")

What am I doing wrong?



Solution 1:[1]

First of all, you have to use the PdfFileWriter class properly.

Use appendPagesFromReader to copy the pages from the source PDF ("doc1.pdf") into the writer. Then call addAttachment: the 1st parameter is the filename to give the attachment, and the 2nd parameter is the attachment data (it's not clear from the docs, but it has to be a bytes-like object). To get the attachment data, open the .exe file in binary mode and read() it. Finally, call write to save the PdfFileWriter object to an actual PDF file (e.g. "out.pdf"). Until you do that, nothing is written to disk, which is why your PDF stayed the same size.

Here is a working example:

from PyPDF2 import PdfFileReader, PdfFileWriter

reader = PdfFileReader("doc1.pdf")
writer = PdfFileWriter()

writer.appendPagesFromReader(reader)

with open("client.exe", "rb") as exe:
    writer.addAttachment("client.exe", exe.read())

with open("out.pdf", "wb") as f:
    writer.write(f)

Next, to check whether the attachment was added, you can use os.stat(path).st_size to compare the file sizes (in bytes) before and after attaching the .exe file.

Here is the same example with checking for file sizes:
(I'm using Python 3.6+ for f-strings)

import os
from PyPDF2 import PdfFileReader, PdfFileWriter


reader = PdfFileReader("doc1.pdf")
writer = PdfFileWriter()
writer.appendPagesFromReader(reader)

with open("client.exe", "rb") as exe:
    writer.addAttachment("client.exe", exe.read())

with open("out.pdf", "wb") as f:
    writer.write(f)

# Check result
print(f"size of SOURCE: {os.stat('doc1.pdf').st_size}")
print(f"size of EXE: {os.stat('client.exe').st_size}")
print(f"size of OUTPUT: {os.stat('out.pdf').st_size}")

The above code prints out

size of SOURCE: 42942
size of EXE: 989744
size of OUTPUT: 1031773

...which suggests that the .exe file was added to the PDF: the output size is close to the sum of the source and .exe sizes (42942 + 989744 = 1032686).
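File size is only indirect evidence. A more direct check is to read the output PDF back and list the names stored under the catalog's /Names -> /EmbeddedFiles entry, which is where PDF attachments are registered. Below is a minimal sketch of that idea, not an official PyPDF2 recipe. The helper names list_attachment_names and attachments_in_pdf are made up for illustration, and the code assumes a PyPDF2 version that still provides PdfFileReader and a flat name array with the names stored directly, which is what I would expect addAttachment to produce.

```python
def list_attachment_names(catalog):
    """Return attachment names from a PDF catalog (any dict-like mapping).

    Assumes a flat name tree: /Names -> /EmbeddedFiles -> /Names is an
    array that alternates [name, file spec, name, file spec, ...].
    Name trees split into /Kids are not handled in this sketch.
    """
    if "/Names" not in catalog:
        return []
    names_dict = catalog["/Names"]
    if "/EmbeddedFiles" not in names_dict:
        return []
    embedded = names_dict["/EmbeddedFiles"]
    if "/Names" not in embedded:
        return []
    entries = embedded["/Names"]
    # Every even index holds a name; the odd indexes hold the file specs.
    return [str(entry) for entry in entries[0::2] if isinstance(entry, str)]


def attachments_in_pdf(path):
    """Hypothetical wrapper: list attachment names in the PDF at `path`."""
    # Imported here so the helper above also works without PyPDF2 installed.
    from PyPDF2 import PdfFileReader

    with open(path, "rb") as f:
        reader = PdfFileReader(f)
        # trailer["/Root"] is the document catalog.
        return list_attachment_names(reader.trailer["/Root"])
```

With the example above, attachments_in_pdf("out.pdf") should include "client.exe" if the attachment was written; an empty list means no attachments were found in the flat name array.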

Of course, you can also check it manually by opening the PDF in Adobe Reader and looking at its Attachments panel.

As a side note, I am not sure why you want to attach .exe files to a PDF. It seems you can attach them, but Adobe treats them as security risks and may refuse to open them. You can use the same code above to attach another PDF file (or other documents) instead of an executable file, and it should still work.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Martin Thoma