How to install Poppler for use on AWS Lambda

I need to run pdf2image in my Python Lambda function on AWS, but it requires poppler and poppler-utils to be installed on the machine.

I have searched in many places for how to do that, but could not find anything, or anyone who has done it with Lambda functions.

Does anyone know how to build the poppler binaries, include them in my Lambda package, and tell Lambda to use them?

Thank you all.



Solution 1:[1]

AWS Lambda runs in an execution environment that includes a fixed set of software and libraries. If something you need is not there, you have to install it yourself as part of what you deploy. See the link below for more information: https://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html

For poppler, follow these steps to build your own binary: https://github.com/skylander86/lambda-text-extractor/blob/master/BuildingBinaries.md

Solution 2:[2]

My approach was to use the Amazon Linux 2 image as a base to ensure maximum compatibility with the Lambda environment, compile openjpeg and poppler during the container build, and build a zip containing the required binaries and libraries, which can then be used as a layer.

This lets you write your code in its own Lambda function, which pulls in the poppler dependencies as a layer, simplifying build and deployment.

The contents of the layer are unpacked into /opt/. This means they are automatically available, because by default in the Lambda environment:

  • $PATH is /usr/local/bin:/usr/bin/:/bin:/opt/bin
  • $LD_LIBRARY_PATH is /lib64:/usr/lib64:$LAMBDA_RUNTIME_DIR:$LAMBDA_RUNTIME_DIR/lib:$LAMBDA_TASK_ROOT:$LAMBDA_TASK_ROOT/lib:/opt/lib
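To confirm that the layer's binaries are actually reachable on those search paths, a small check can be run inside the function. This is a minimal sketch using only the standard library; the helper names `find_poppler_tools` and `missing_tools` are made up for illustration, and the tool list assumes the three Poppler utilities that pdf2image normally shells out to.

```python
import os
import shutil

# Assumed tool list: the Poppler utilities pdf2image typically invokes.
POPPLER_TOOLS = ("pdftoppm", "pdfinfo", "pdftocairo")


def find_poppler_tools(tools=POPPLER_TOOLS, search_path=None):
    """Map each tool name to its resolved path, or None if it is not found.

    search_path is a PATH-style string; None means use the process's own PATH.
    """
    return {tool: shutil.which(tool, path=search_path) for tool in tools}


def missing_tools(found):
    """Return the sorted names of tools that could not be resolved."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_poppler_tools()
    print("PATH =", os.environ.get("PATH", ""))
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", ""))
    for name, path in found.items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

Logging this once from the handler makes it easy to tell a missing binary apart from a missing shared library.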

Dockerfile :

# https://www.petewilcock.com/using-poppler-pdftotext-and-other-custom-binaries-on-aws-lambda/

ARG POPPLER_VERSION="21.10.0"
ARG POPPLER_DATA_VERSION="0.4.11"
ARG OPENJPEG_VERSION="2.4.0"


FROM amazonlinux:2

ARG POPPLER_VERSION
ARG POPPLER_DATA_VERSION
ARG OPENJPEG_VERSION

WORKDIR /root

RUN yum update -y
RUN yum install -y \
   cmake \
   cmake3 \
   fontconfig-devel \
   gcc \
   gcc-c++ \
   gzip \
   libjpeg-devel \
   libpng-devel \
   libtiff-devel \
   make \
   tar \
   xz \
   zip

RUN curl -o poppler.tar.xz https://poppler.freedesktop.org/poppler-${POPPLER_VERSION}.tar.xz
RUN tar xf poppler.tar.xz
RUN curl -o poppler-data.tar.gz https://poppler.freedesktop.org/poppler-data-${POPPLER_DATA_VERSION}.tar.gz
RUN tar xf poppler-data.tar.gz
RUN curl -o openjpeg.tar.gz https://codeload.github.com/uclouvain/openjpeg/tar.gz/refs/tags/v${OPENJPEG_VERSION}
RUN tar xf openjpeg.tar.gz

WORKDIR poppler-data-${POPPLER_DATA_VERSION}
RUN make install

WORKDIR /root
RUN mkdir openjpeg-${OPENJPEG_VERSION}/build
WORKDIR openjpeg-${OPENJPEG_VERSION}/build
RUN cmake .. -DCMAKE_BUILD_TYPE=Release
RUN make
RUN make install

WORKDIR /root
RUN mkdir poppler-${POPPLER_VERSION}/build
WORKDIR poppler-${POPPLER_VERSION}/build
RUN cmake3 .. -DCMAKE_BUILD_TYPE=release -DBUILD_GTK_TESTS=OFF -DBUILD_QT5_TESTS=OFF -DBUILD_QT6_TESTS=OFF \
    -DBUILD_CPP_TESTS=OFF -DBUILD_MANUAL_TESTS=OFF -DENABLE_BOOST=OFF -DENABLE_CPP=OFF -DENABLE_GLIB=OFF \
    -DENABLE_GOBJECT_INTROSPECTION=OFF -DENABLE_GTK_DOC=OFF -DENABLE_QT5=OFF -DENABLE_QT6=OFF \
    -DENABLE_LIBOPENJPEG=openjpeg2 -DENABLE_CMS=none  -DBUILD_SHARED_LIBS=OFF
RUN make
RUN make install


WORKDIR /root
RUN mkdir -p package/{lib,bin,share}
RUN cp -d /usr/lib64/libexpat* package/lib
RUN cp -d /usr/lib64/libfontconfig* package/lib
RUN cp -d /usr/lib64/libfreetype* package/lib
RUN cp -d /usr/lib64/libjbig* package/lib
RUN cp -d /usr/lib64/libjpeg* package/lib
RUN cp -d /usr/lib64/libpng* package/lib
RUN cp -d /usr/lib64/libtiff* package/lib
RUN cp -d /usr/lib64/libuuid* package/lib
RUN cp -d /usr/lib64/libz* package/lib
RUN cp -rd /usr/local/lib/* package/lib
RUN cp -rd /usr/local/lib64/* package/lib
RUN cp -d /usr/local/bin/* package/bin
RUN cp -rd /usr/local/share/poppler package/share

WORKDIR package
RUN zip -r9 ../package.zip *

And to run...

docker build -t poppler .
docker run --name poppler -d -t poppler cat
docker cp poppler:/root/package.zip .

Then upload package.zip as a layer using the console or the AWS CLI.
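If you would rather script the upload, the sketch below first checks that the zip has bin/ and lib/ at its top level (so they land in /opt/bin and /opt/lib) and then publishes it with boto3. The function names, the layer name, and the runtime value are placeholders chosen for illustration, and the upload assumes boto3 is installed with AWS credentials and a region configured.

```python
import zipfile


def top_level_dirs(zip_path):
    """Return the set of top-level directory names inside the zip."""
    with zipfile.ZipFile(zip_path) as zf:
        return {name.split("/", 1)[0] for name in zf.namelist() if "/" in name}


def check_layer_layout(zip_path, required=("bin", "lib")):
    """Return the required top-level directories missing from the zip."""
    return sorted(set(required) - top_level_dirs(zip_path))


def publish_layer(zip_path, layer_name, runtimes):
    """Publish the zip as a new layer version and return its ARN.

    boto3 is imported lazily so the layout checks above work without it.
    """
    import boto3  # assumption: installed, with credentials and region set

    missing = check_layer_layout(zip_path)
    if missing:
        raise ValueError(f"layer zip is missing top-level dirs: {missing}")
    with open(zip_path, "rb") as fh:
        response = boto3.client("lambda").publish_layer_version(
            LayerName=layer_name,
            Description="Poppler binaries and shared libraries",
            Content={"ZipFile": fh.read()},
            CompatibleRuntimes=list(runtimes),
        )
    return response["LayerVersionArn"]
```

A call might look like `publish_layer("package.zip", "poppler", ["python3.9"])`. For archives too large for a direct upload, the same API also accepts an S3 bucket and key in `Content` instead of `ZipFile`.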

Solution 3:[3]

Straightforward Build Instructions for Poppler on Lambda using Docker

To put Poppler on Lambda, we will build a zip archive containing poppler and add it as a layer. Follow these steps on an EC2 instance running Amazon Linux 2 (a t2.micro is plenty).

  1. Set up the machine

Install docker on the EC2 machine. Instructions here

mkdir -p poppler_binaries

  2. Create a Dockerfile

Use this link or copy/paste from below.

FROM ubuntu:18.04

# Install dependencies (update and install in one layer so the package index is never stale)
RUN apt-get update && apt-get install -y \
        locate \
        libopenjp2-7 \
        poppler-utils

RUN rm -rf /poppler_binaries;  mkdir /poppler_binaries;
RUN updatedb
RUN cp $(locate libpoppler.so) /poppler_binaries/.
RUN cp $(which pdftoppm) /poppler_binaries/.
RUN cp $(which pdfinfo) /poppler_binaries/.
RUN cp $(which pdftocairo) /poppler_binaries/.
RUN cp $(locate libjpeg.so.8 ) /poppler_binaries/.
RUN cp $(locate libopenjp2.so.7 ) /poppler_binaries/.
RUN cp $(locate libpng16.so.16 ) /poppler_binaries/.
RUN cp $(locate libz.so.1 ) /poppler_binaries/.

  3. Build the Docker image and create a zip file

Running the commands below will produce a zip file in your home directory.

docker build -t poppler-build .
# Run the container
docker run -d --name poppler-build-cont poppler-build sleep 20 
#docker exec poppler-build-cont 
sudo docker cp poppler-build-cont:/poppler_binaries .
# Cleaning up
docker kill poppler-build-cont
docker rm poppler-build-cont
docker image rm poppler-build
cd poppler_binaries
zip -r9 ../poppler.zip .
cd ..

  4. Make and add your Lambda layer

Download your zip file or upload it to S3. Head to the Lambda Console page to create a Layer and then add it to your function. Information about layers here.

  5. Add an environment variable to Lambda

To avoid adding unnecessary folder structure to the zip, as described here, we will add an environment variable that points to our dependency:

PYTHONPATH: /opt/

And voilà! You now have a working Lambda function with Poppler!

Note: Credit to these two articles which helped me piece this together

Warning: do not try to add pdf2image to the same layer. I am not sure why, but when they are in the same layer, pdf2image cannot find poppler.
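With the flat layout built above, the binaries and the .so files all end up directly in /opt/, while the default PATH and LD_LIBRARY_PATH quoted in Solution 2 list /opt/bin and /opt/lib rather than /opt itself. If the tools are not picked up, one option is to call a binary by its full path and pass /opt on the search paths of the child process. This is an assumption and not part of the original answer; the helper names below are hypothetical.

```python
import os
import subprocess


def env_with_layer_dir(layer_dir="/opt", base_env=None):
    """Return a copy of the environment with layer_dir prepended to the
    executable and shared-library search paths."""
    env = dict(os.environ if base_env is None else base_env)
    for var in ("PATH", "LD_LIBRARY_PATH"):
        parts = [p for p in env.get(var, "").split(os.pathsep) if p]
        if layer_dir not in parts:
            parts.insert(0, layer_dir)
        env[var] = os.pathsep.join(parts)
    return env


def pdf_page_count(pdf_path, layer_dir="/opt"):
    """Run pdfinfo from the layer and parse the 'Pages:' line of its output."""
    result = subprocess.run(
        [os.path.join(layer_dir, "pdfinfo"), pdf_path],
        env=env_with_layer_dir(layer_dir),
        capture_output=True,
        text=True,
        check=True,
    )
    for line in result.stdout.splitlines():
        if line.startswith("Pages:"):
            return int(line.split(":", 1)[1])
    raise ValueError("pdfinfo output had no 'Pages:' line")
```

The environment is changed only for the child process, so the rest of the function keeps Lambda's defaults.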

Solution 4:[4]

Hi @Alex Albracht, thanks for compiling these easy instructions! They helped a lot. But I really struggled to get the Lambda function to find the poppler path, so I'll try to add that here in an effort to make it clear.

The binary files should go in a zip file with the structure poppler.zip -> bin/poppler, where the poppler folder contains the binary files. This zip file can then be uploaded as a layer in AWS Lambda.

For pdf2image to work, it needs the poppler path. This should be passed in the Lambda function as "/opt/bin/poppler".

For example:

poppler_path = "/opt/bin/poppler"
pages = convert_from_path(PDF_file, 500, poppler_path=poppler_path)
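Put together as a handler, that could look roughly like the sketch below. The `resolve_poppler_path` helper, the candidate directory list, and the `pdf_path` event field are all hypothetical. The pdf2image import sits inside the handler so the module still loads where pdf2image is not installed.

```python
import os

# Assumed locations: the layout from this answer first, then plain /opt/bin.
CANDIDATE_DIRS = ("/opt/bin/poppler", "/opt/bin")


def resolve_poppler_path(candidates=CANDIDATE_DIRS, tool="pdftoppm"):
    """Return the first directory containing an executable `tool`, else None."""
    for directory in candidates:
        candidate = os.path.join(directory, tool)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return directory
    return None


def handler(event, context):
    # Imported lazily; pdf2image must be packaged with the function itself.
    from pdf2image import convert_from_path

    pdf_file = event["pdf_path"]  # hypothetical event field
    pages = convert_from_path(
        pdf_file,
        dpi=500,
        # None makes pdf2image fall back to searching PATH.
        poppler_path=resolve_poppler_path(),
    )
    return {"page_count": len(pages)}
```

Resolving the directory at runtime instead of hard-coding it means the same code keeps working if the layer layout changes between bin/poppler and bin.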

Solution 5:[5]

I used the pre-built AWS Lambda layer https://github.com/jeylabs/aws-lambda-poppler-layer/releases and it worked!

You can use this solution if you just want to run the function, but if you want to specify the version and have more control, I'd recommend the container image solution.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Subrata Fouzdar
Solution 2 scytale
Solution 3
Solution 4 Saylee M.
Solution 5 Charanjit Singh