'Unable to connect oracle database using cx_oracle from AWS Glue
I am trying to connect oracle database from AWS glue using cx_oracle but i am getting this error message
DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle Client library: "libclntsh.so: cannot open shared object file: No such file or directory". See https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html for help
I tried to downoad the so files as per the document and stored it in S3 which has been linked to Glue with --extra-files parameter but still getting same error message
I have tried this stackoverflow question and also tried to set rpath with s3 url but no success. Any idea would really help
Solution 1:[1]
Credits
This answer is a compilation of this and this and a lot of discussion around the former in the comments. Credit for the rpath
patching solution goes to @harjeet-singh, the original author of the abovementioned answer and for the libaio
to @good-will, but there are still some steps around these solutions that keep confusing people, so that is why I am going to consolidate everything in a single step-by-step answer here.
Background
In order to connect to an Oracle database using cx-Oracle
from a Python shell AWS Glue job, we need to bundle the oracle client libraries with it. Furthermore, the libraries have to be patched with a correct rpath
in order to load correctly, because in Glue runtime we have filesystem write access only to /tmp
, which is where our archive will be, but cx-Oracle
can't know that and expects a different directory by default. And the LD_LIBRARY_PATH
hack cannot be implemented because we have no control over how the Glue job is launched.
Step-By-Step Guide
- Download Instant Client Basic ZIP package for x86-64 Linux from here. This guide uses version 21.5.0.0.0
wget https://download.oracle.com/otn_software/linux/instantclient/instantclient-basic-linuxx64.zip
- Unzip the archive
unzip instantclient-basic-linuxx64.zip
- Remove symlinks from the archive and move the file they point to (in this case
libclntsh.so.21.1
to the one that is going to be looked up when you usecx-Oracle
:libclntsh.so
). This is done because whatever loads these libraries dynamically apparently does not resolve symlinks. Maybe it will in the future, but I had to do this to make it work.
cd instantclient_21_5/
find . -type l -name "libclntsh.so*" -delete
mv libclntsh.so.21.1 libclntsh.so
Do the same with the other files with symlinks if after completing the whole guide and running your job you still have problems with libraries like
DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle Client library: "libsomething.so: cannot open shared object file: No such file or directory"
- Patch the
rpath
to point to a static directory that we are going to be using from inside the Glue job For example, if your archive is namedinstant-client-basic-linux.x64-21.5.0.0.0
and it contains a folder namedinstantclient_21_5
with all the libraries. When the job runs, this archive is going to be available in a random directory under/tmp
(more on that below. We need to find our archive in one of those directories and extract it to a static directory under/tmp
, for example/tmp/libs
. So then, yourrpath
would be/tmp/libs/instant-client-basic-linux.x64-21.5.0.0.0/instantclient_21_5
, because that in the absolute path to client libraries.
sudo apt-get update
sudo apt-get install patchelf -y
patchelf --set-rpath /tmp/libs/instant-client-basic-linux.x64-21.5.0.0.0/instantclient_21_5 libclntsh.so
- Put
libaio.so.1
in the archive
cd ..
wget https://src.fedoraproject.org/lookaside/pkgs/libaio/libaio-0.3.110.tar.gz/2a35602e43778383e2f4907a4ca39ab8/libaio-0.3.110.tar.gz
tar xzvf libaio-0.3.110.tar.gz
cd libaio-0.3.110
make prefix=`pwd`/usr install
find ./usr/lib/ -type l -name "libclntsh.so*" -delete
mv ./usr/lib/libaio.so.1.0.1 ../instantclient_21_5/libaio.so.1
Note: you might want to check with a newer version of libaio
if there is one.
6. Zip the archive
cd ..
zip -T -r instantclient-basic-linuxx64_patched.zip instantclient_21_5/
- Download
cx-Oracle
wheel
wget https://files.pythonhosted.org/packages/a9/b7/c2d0223fb4f1013b090cf82f3ce56f36f33b79a48f9c33b36717c2977b04/cx_Oracle-8.3.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl
- Upload the archive and
cx-Oracle
to S3
aws s3 cp instantclient-basic-linuxx64_patched.zip s3://<mybucket>/glue_libs
aws s3 cp cx_Oracle-8.3.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl s3://<mybucket>/glue_libs
- Configure your Glue job
I assume your glue job is already created and we are going to just configure it.
- Put the S3 URL to the archive in "Referenced files path" configuration parameter
- Put the S3 URL to the
cx-Oracle
wheel in "Python library path" configuration parameter
- Add a bit of code to the glue job to set up the libraries.
This code must be executed before any usage of
cx-Oracle
. It can be executed after import, but before the usage.
Iterate through random directories in /tmp
, find an archive you created and extract it to a static directory that we set in rpath
previously. Then initialize the cx-Oracle
client and you are good to go.
Here is an example of an implementation:
import zipfile
from pathlib import Path
import cx_Oracle
filename = 'instantclient-basic-linuxx64_patched.zip'
oracle_archive = next(Path('./tmp').glob(f'**/{filename}'))
with zipfile.ZipFile(oracle_archive, 'r') as f:
Path('./tmp/libs').mkdir()
f.extractall('./tmp/libs')
cx_Oracle.init_oracle_client(lib_dir=f'/tmp/libs/{filename}/instantclient_21_5')
TLDR
Run this on your Linux machine (substitute your bucket name in the end):
wget https://download.oracle.com/otn_software/linux/instantclient/instantclient-basic-linuxx64.zip
unzip instantclient-basic-linuxx64.zip
cd instantclient_21_5/
find . -type l -name "libclntsh.so*" -delete
mv libclntsh.so.21.1 libclntsh.so
sudo apt-get update
sudo apt-get install patchelf -y
patchelf --set-rpath /tmp/libs/instant-client-basic-linux.x64-21.5.0.0.0/instantclient_21_5 instantclient_21_5/libclntsh.so
cd ..
wget https://src.fedoraproject.org/lookaside/pkgs/libaio/libaio-0.3.110.tar.gz/2a35602e43778383e2f4907a4ca39ab8/libaio-0.3.110.tar.gz
tar xzvf libaio-0.3.110.tar.gz
cd libaio-0.3.110
make prefix=`pwd`/usr install
find ./usr/lib/ -type l -name "libclntsh.so*" -delete
mv ./usr/lib/libaio.so.1.0.1 ../instantclient_21_5/libaio.so.1
cd ..
zip -T -r instantclient-basic-linuxx64_patched.zip instantclient_21_5/
wget https://files.pythonhosted.org/packages/a9/b7/c2d0223fb4f1013b090cf82f3ce56f36f33b79a48f9c33b36717c2977b04/cx_Oracle-8.3.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl
aws s3 cp instantclient-basic-linuxx64_patched.zip s3://<mybucket>/glue_libs/
aws s3 cp cx_Oracle-8.3.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl s3://<mybucket>/glue_libs/
Follow steps 9 and 10 above to configure your job.
Hope anyone who reads this will get this right on the first try, because I sure didn't.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | rhymefororange |