Merging PDFs in memory and saving in PDF/A format with Ghostscript

I am trying to merge two PDFs in memory with PyPDF2 and save the result in PDF/A format. Converting a PDF to PDF/A with Ghostscript works, but only when the source and the target are both paths on the hard disk. It does not work when the merged PDF exists only in memory.

The following code produces the error:

import os, subprocess
from PyPDF2 import PdfFileMerger
from io import BytesIO

def convertPDF2PDFA(sourceFile, targetFile):
    ghostScriptExec = ['gs', '-dPDFA=2', '-dBATCH', '-dNOPAUSE', '-sProcessColorModel=DeviceCMYK',
                       '-sDEVICE=pdfwrite', '-dPDFACompatibilityPolicy=3']
    cwd = os.getcwd()
    # Run Ghostscript from the target directory so -sOutputFile can be a bare file name.
    os.chdir(os.path.dirname(targetFile))
    try:
        subprocess.check_output(ghostScriptExec +
                                ['-sOutputFile=' + os.path.basename(targetFile), sourceFile])
    except subprocess.CalledProcessError as e:
        raise RuntimeError("command '{}' returned with error (code {}): {}".format(e.cmd, e.returncode, e.output))
    finally:
        # Restore the working directory even if Ghostscript fails.
        os.chdir(cwd)

path1 = 'path1'
path2 = 'path2'
save_path = 'result_path'

paths = [path1, path2]
merger = PdfFileMerger()
tmp = BytesIO()

for path in paths:
    merger.append(path, import_bookmarks=False)

merger.write(tmp)

convertPDF2PDFA(tmp.getvalue(), save_path)

The error I get is ValueError: embedded null byte.
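
A likely cause, offered as a reading of the error rather than a confirmed diagnosis: tmp.getvalue() returns the raw PDF bytes, and subprocess places every list element on the command line as an argument. Arguments reach the operating system as C strings, which cannot contain a null byte, and binary PDF data usually does. The minimal sketch below reproduces the same kind of exception on a POSIX system without Ghostscript. The payload and the use of sys.executable as a stand-in command are illustrative assumptions, not part of the original code.

```python
import subprocess
import sys

# Hypothetical stand-in for tmp.getvalue(): binary data containing a null byte.
pdf_bytes = b"%PDF-1.4\n\x00binary payload"

message = ""
try:
    # The bytes are handed over as a command-line argument, not as file content,
    # so Python rejects them before the child process is started.
    subprocess.check_output([sys.executable, "-c", "pass", pdf_bytes])
except ValueError as exc:
    message = str(exc)

print(message)
```

If this reading is right, the function needs either a real file path or a different channel (such as standard input) for the PDF data.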

Edit: I changed the Ghostscript parameters to:

ghostScriptExec = ['gs', '-dPDFA=2', '-dBATCH', '-dNOPAUSE', '-dNOSAFER', '-sProcessColorModel=DeviceRGB',
                   '-sDEVICE=pdfwrite', '-dPDFACompatibilityPolicy=2']

I also added PDFA_def.ps, in which I included the AdobeRGB1998.icc profile.

def convertPDF2PDFA(sourceFile, targetFile):
    cwd = os.getcwd()
    # Run Ghostscript from the target directory so -sOutputFile can be a bare file name.
    os.chdir(os.path.dirname(targetFile))
    pdfa_def_path = '/Users/mazze/Desktop/PDFA_def.ps'
    try:
        subprocess.check_output(ghostScriptExec +
                                ['-sOutputFile=' + os.path.basename(targetFile), pdfa_def_path, sourceFile])
    except subprocess.CalledProcessError as e:
        raise RuntimeError("command '{}' returned with error (code {}): {}".format(e.cmd, e.returncode, e.output))
    finally:
        # Restore the working directory even if Ghostscript fails.
        os.chdir(cwd)

Converting a PDF to PDF/A now works, or at least Adobe always confirms the PDF/A format. However, it only works if I first save the merged file to disk, pass that path to convertPDF2PDFA, and then save the result under a new path. Do I always have to take this extra step?
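
One possible way to avoid the intermediate file, sketched under assumptions: Ghostscript accepts "-" as an input file name meaning "read from standard input", and subprocess.run can feed bytes to the child process through its input parameter. The helper names build_gs_command and convert_bytes_to_pdfa are hypothetical, and the flags are copied from the edited command above. As far as I know, Ghostscript needs random access to PDF data and buffers piped input in its own temporary file, so this mainly moves the temporary file out of the Python code; whether it behaves well may depend on the Ghostscript version.

```python
import subprocess


def build_gs_command(output_path, pdfa_def_path=None):
    """Build the Ghostscript argument list; '-' makes gs read the PDF from stdin."""
    cmd = ['gs', '-dPDFA=2', '-dBATCH', '-dNOPAUSE', '-dNOSAFER',
           '-sProcessColorModel=DeviceRGB', '-sDEVICE=pdfwrite',
           '-dPDFACompatibilityPolicy=2', '-sOutputFile=' + output_path]
    if pdfa_def_path:
        # The PDF/A definition file is processed before the actual input.
        cmd.append(pdfa_def_path)
    cmd.append('-')
    return cmd


def convert_bytes_to_pdfa(pdf_bytes, output_path, pdfa_def_path=None):
    """Pipe in-memory PDF bytes to Ghostscript instead of passing a source path."""
    cmd = build_gs_command(output_path, pdfa_def_path)
    try:
        subprocess.run(cmd, input=pdf_bytes, stdout=subprocess.PIPE,
                       stderr=subprocess.STDOUT, check=True)
    except subprocess.CalledProcessError as e:
        raise RuntimeError("command '{}' returned with error (code {}): {}".format(
            e.cmd, e.returncode, e.output))
```

With the merged document from above, the call would look like convert_bytes_to_pdfa(tmp.getvalue(), save_path, '/Users/mazze/Desktop/PDFA_def.ps'). If piping turns out to be unreliable, writing tmp.getvalue() to a tempfile.NamedTemporaryFile and passing its name is the more conservative alternative.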



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
