Read PDF metadata using PyPDF2
I've tried to extract metadata with PyPDF2 and pdfminer.six. With PyPDF2 I ran:
from PyPDF2 import PdfFileReader
reader = PdfFileReader("example.pdf")
info = reader.getDocumentInfo()
This returns:
{'/Title': IndirectObject(38, 0), '/Author': IndirectObject(40, 0), '/Subject': IndirectObject(41, 0), '/Producer': IndirectObject(39, 0), '/Creator': IndirectObject(42, 0), '/CreationDate': IndirectObject(43, 0), '/ModDate': IndirectObject(43, 0)}
Using pdfrw
With pdfrw, it worked like this:
>>> from pdfrw import PdfReader
>>> PdfReader(<filename>).Info
Solution 1:[1]
This is now part of the PyPDF2 docs:
from PyPDF2 import PdfFileReader
reader = PdfFileReader("example.pdf")
info = reader.getDocumentInfo()
print(reader.numPages)
# All of the following could be None!
print(info.author)
print(info.creator)
print(info.producer)
print(info.subject)
print(info.title)
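Since any of these fields can be None (and a PDF may carry no metadata at all), it can help to copy them into a plain dict with a default value. The following is only a minimal sketch, not part of the PyPDF2 docs: `metadata_to_dict` and `read_metadata` are hypothetical helper names, and `read_metadata` assumes the `pypdf` package (the successor to PyPDF2, where `PdfFileReader` was renamed to `PdfReader` and `getDocumentInfo()` became the `metadata` property) is installed.

```python
from types import SimpleNamespace

# The five standard fields printed in the answer above.
FIELDS = ("author", "creator", "producer", "subject", "title")


def metadata_to_dict(info, default=""):
    """Copy the standard fields of a document-info object into a plain dict.

    `info` may be None (no metadata at all), and any single field may be
    None or missing; both cases fall back to `default`.
    """
    result = {}
    for name in FIELDS:
        value = getattr(info, name, None) if info is not None else None
        result[name] = default if value is None else str(value)
    return result


def read_metadata(path):
    """Hypothetical helper: return (page count, metadata dict) for a PDF.

    Assumption: uses pypdf rather than the older PyPDF2 API shown above.
    """
    from pypdf import PdfReader  # imported lazily; needs `pip install pypdf`

    reader = PdfReader(path)
    return len(reader.pages), metadata_to_dict(reader.metadata)


# Stand-in object so the sketch runs without a PDF file at hand.
fake_info = SimpleNamespace(author="Jane Doe", title=None)
print(metadata_to_dict(fake_info))
```

With a real file, `read_metadata("example.pdf")` would return the page count together with the cleaned-up dict, so callers never have to check each field for None.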
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Martin Thoma |