Merging PDFs in memory and saving in PDF/A format with Ghostscript
I am trying to merge two PDFs in memory with PyPDF2 and save the resulting PDF in PDF/A format. Converting a PDF to PDF/A with Ghostscript works, but only when the source and target are both paths on disk. It does not work with the merged PDF held in memory.
The following code produces the error:

import os, subprocess
from PyPDF2 import PdfFileMerger
from io import BytesIO

def convertPDF2PDFA(sourceFile, targetFile):
    ghostScriptExec = ['gs', '-dPDFA=2', '-dBATCH', '-dNOPAUSE', '-sProcessColorModel=DeviceCMYK',
                       '-sDEVICE=pdfwrite', '-dPDFACompatibilityPolicy=3']
    cwd = os.getcwd()
    os.chdir(os.path.dirname(targetFile))
    try:
        subprocess.check_output(ghostScriptExec +
                                ['-sOutputFile=' + os.path.basename(targetFile), sourceFile])
    except subprocess.CalledProcessError as e:
        raise RuntimeError("command '{}' return with error (code {}): {}".format(e.cmd, e.returncode, e.output))
    os.chdir(cwd)

path1 = 'path1'
path2 = 'path2'
save_path = 'result_path'
paths = [path1, path2]
merger = PdfFileMerger()
tmp = BytesIO()
for path in paths:
    merger.append(path, import_bookmarks=False)
merger.write(tmp)
convertPDF2PDFA(tmp.getvalue(), save_path)

The error I get is ValueError: embedded null byte.
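For context, the error can be reproduced without Ghostscript or PyPDF2. The sketch below assumes the cause is that `tmp.getvalue()` returns the raw PDF bytes, which are then handed to `subprocess` as if they were a file name. Command-line arguments cannot contain NUL bytes, and binary PDF data usually does. The `fake_pdf_bytes` value and the `try_as_argument` helper are made up for illustration.

```python
import subprocess
import sys

# Stand-in for tmp.getvalue(): raw PDF content, which contains NUL bytes.
fake_pdf_bytes = b"%PDF-1.4\n\x00\x01binary data"


def try_as_argument(data):
    """Use data as a command-line argument and return the error message, if any."""
    try:
        # The Python interpreter stands in for 'gs' here; the process is
        # never actually started because argument validation fails first.
        subprocess.check_output([sys.executable, "-c", "pass", data])
    except ValueError as exc:
        return str(exc)
    return None


message = try_as_argument(fake_pdf_bytes)
print(message)
```

On a POSIX system this should report the same kind of ValueError as in the question, which suggests that Ghostscript itself is never reached.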
Edit: I changed the Ghostscript parameters to:
ghostScriptExec = ['gs', '-dPDFA=2', '-dBATCH', '-dNOPAUSE', '-dNOSAFER', '-sProcessColorModel=DeviceRGB',
'-sDEVICE=pdfwrite', '-dPDFACompatibilityPolicy=2']
I also added PDFA_def.ps, in which I included the AdobeRGB1998.icc profile.

def convertPDF2PDFA(sourceFile, targetFile):
    cwd = os.getcwd()
    os.chdir(os.path.dirname(targetFile))
    pdfa_def_path = '/Users/mazze/Desktop/PDFA_def.ps'
    try:
        subprocess.check_output(ghostScriptExec +
                                ['-sOutputFile=' + os.path.basename(targetFile), pdfa_def_path, sourceFile])
    except subprocess.CalledProcessError as e:
        raise RuntimeError("command '{}' return with error (code {}): {}".format(e.cmd, e.returncode, e.output))
    os.chdir(cwd)

Converting a PDF to PDF/A now works, or at least Adobe always confirms the PDF/A format. However, it only works if I first save the merged file to disk, pass that path to convertPDF2PDFA, and then save the result under a new path. Do I always have to take this extra step?
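One way to keep the intermediate file out of the calling code is to let the conversion function write the in-memory bytes to a temporary file and delete it afterwards. This is only a sketch under stated assumptions, not a confirmed answer: the function name `convert_pdf_bytes_to_pdfa` and its `command` parameter are hypothetical, `gs` is assumed to be on the PATH, and all paths are assumed to be absolute so that no `os.chdir` is needed. A file still exists on disk for the duration of the call; it is just managed automatically.

```python
import os
import subprocess
import tempfile

# Same Ghostscript options as in the edited question.
GS_COMMAND = ['gs', '-dPDFA=2', '-dBATCH', '-dNOPAUSE', '-dNOSAFER',
              '-sProcessColorModel=DeviceRGB', '-sDEVICE=pdfwrite',
              '-dPDFACompatibilityPolicy=2']


def convert_pdf_bytes_to_pdfa(pdf_bytes, target_file, pdfa_def_path,
                              command=GS_COMMAND):
    """Write pdf_bytes to a temporary file and convert that file to PDF/A."""
    # Ghostscript is given a real path here instead of the raw bytes.
    fd, tmp_path = tempfile.mkstemp(suffix='.pdf')
    try:
        with os.fdopen(fd, 'wb') as tmp_file:
            tmp_file.write(pdf_bytes)
        subprocess.check_output(
            command + ['-sOutputFile=' + target_file, pdfa_def_path, tmp_path],
            stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as e:
        raise RuntimeError("command '{}' returned with error (code {}): {}"
                           .format(e.cmd, e.returncode, e.output))
    finally:
        # Remove the temporary input file whether or not the conversion worked.
        os.remove(tmp_path)
```

With the merger from the question, the call would then look like `convert_pdf_bytes_to_pdfa(tmp.getvalue(), save_path, pdfa_def_path)`, where `save_path` and `pdfa_def_path` are absolute paths.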
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow