How to set preferred encoding in WSGI to UTF-8

Feeling a bit crazy here. I've got Apache set up with mod_wsgi, but I can't get the encoding to work properly. I have:

  • tested that mod_wsgi is running in daemon mode
  • read Graham Dumpleton's blog post about setting up the lang and locale settings for the WSGIDaemonProcess directive.
  • created a minimal test that seems to demonstrate the issue; its output follows

# I recompiled the mod_wsgi file to get the Python version correct
sys.version = '3.8.6 (default, Sep 24 2020, 21:54:23) \n[GCC 8.3.0]'
sys.prefix = '/usr/local'
sys.path = ['/usr/local/lib/python38.zip', '/usr/local/lib/python3.8', '/usr/local/lib/python3.8/lib-dynload', '/usr/local/lib/python3.8/site-packages', '/usr/local/src/scorched']

# This seems to be a timing thing? Not sure, but possibly problematic
locale.getlocale() = (None, None)
# This was fixed by setting lang or locale (not sure which)
locale.getdefaultlocale() = ('en_US', 'UTF-8')
sys.getdefaultencoding() = 'utf-8'

# These seem like a problem...
sys.getfilesystemencoding() = 'ascii'
locale.getpreferredencoding(False) = 'ANSI_X3.4-1968'

# It's daemon mode
mod_wsgi.process_group = 'cl'
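
For anyone who wants to reproduce output like the above, here is a minimal sketch of a WSGI test script. It is not the exact script used here; the helper name encoding_report is made up for illustration, and the only mod_wsgi-specific piece is the 'mod_wsgi.process_group' key that mod_wsgi places in the request environ.

```python
import locale
import sys


def encoding_report():
    """Collect the interpreter's encoding-related settings as text lines."""
    return [
        f"sys.version = {sys.version!r}",
        f"locale.getlocale() = {locale.getlocale()!r}",
        f"sys.getdefaultencoding() = {sys.getdefaultencoding()!r}",
        f"sys.getfilesystemencoding() = {sys.getfilesystemencoding()!r}",
        f"locale.getpreferredencoding(False) = {locale.getpreferredencoding(False)!r}",
    ]


def application(environ, start_response):
    """WSGI entry point: return the report as a plain-text response."""
    lines = encoding_report()
    # This key is read from the WSGI environ; it is absent outside mod_wsgi.
    group = environ.get("mod_wsgi.process_group")
    lines.append(f"mod_wsgi.process_group = {group!r}")
    body = "\n".join(lines).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

Pointing WSGIScriptAlias at a file like this and loading the page should show what the daemon process actually sees, which can differ from what an interactive shell on the same machine reports.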

My WSGI configs look like this:

    WSGIScriptAlias / /opt/courtlistener/docker/apache/wsgi-configs/python_version_test.py
    WSGIDaemonProcess cl \
      threads=10 \
      processes=64 \
      python-path=/usr/local/lib/python3.8/site-packages/ \
      lang='en_US.UTF-8' \
      locale='en_US.UTF-8'
    WSGIProcessGroup cl
    WSGIApplicationGroup %{GLOBAL}
    WSGIPassAuthorization On

When I log into the server and start python in the terminal, this line works fine, but it fails when it runs via mod_wsgi:

from reporters_db import REPORTERS

All that line does is import a json file that has some utf-8 content in it. Here's the code behind that import:

db_root = os.path.dirname(os.path.realpath(__file__))
with open(os.path.join(db_root, "data", "reporters.json")) as f:
    REPORTERS = json.load(f, object_hook=datetime_parser)

Since the open() call above doesn't specify an encoding, Python falls back to the locale's preferred encoding, which is ASCII here, and the read fails:

 Traceback (most recent call last):
   File "/opt/courtlistener/docker/apache/wsgi-configs/python_version_test.py", line 6, in <module>
     from reporters_db import REPORTERS
   File "/usr/local/lib/python3.8/site-packages/reporters_db/__init__.py", line 22, in <module>
     REPORTERS = json.load(f, object_hook=datetime_parser)
   File "/usr/local/lib/python3.8/json/__init__.py", line 293, in load
     return loads(fp.read(),
   File "/usr/local/lib/python3.8/encodings/ascii.py", line 26, in decode
     return codecs.ascii_decode(input, self.errors)[0]
 UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 441720: ordinal not in range(128)
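
The failure can be reproduced without mod_wsgi by forcing the ASCII codec on a file containing a non-ASCII character. The sketch below also shows the per-call workaround of passing encoding="utf-8" to open(). The helper name load_json_utf8 and the sample file are made up for illustration, and the object_hook from the real code is left out. This only helps where you control the open() call, so it does not fix third-party packages.

```python
import json
import os
import tempfile


def load_json_utf8(path):
    """Read a JSON file as UTF-8 regardless of the process locale."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")

    # U+201C and U+201D (curly quotes) start with byte 0xe2 in UTF-8,
    # the same byte the traceback complains about.
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"quote": "\u201chello\u201d"}, f, ensure_ascii=False)

    # Simulate what happens when the preferred encoding is ASCII.
    try:
        with open(path, encoding="ascii") as f:
            json.load(f)
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # An explicit encoding sidesteps the locale entirely.
    data = load_json_utf8(path)

print("ASCII read failed:", ascii_failed)
print("UTF-8 read returned:", data)
```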

How can I tell it (and the rest of my codebase) to use utf-8 like sane adults?


Edit 1

Perhaps it is important to mention that I'm running apache with the following command:

exec apache2ctl -D FOREGROUND "$@"

I thought that would source the /etc/apache2/envvars file, so I appended the following to that file:

export LANG="en_US.UTF-8"

And I tried tweaking my startup command to:

LANG="en_US.UTF-8" exec apache2ctl -D FOREGROUND "$@"

I was hopeful, but no. Still no progress.



Solution 1:[1]

Well, I finally figured this out by searching for every time Graham Dumpleton mentioned the word "lang" on the Internet. That eventually turned up this thread, which mentioned that it was possible to not have a locale installed. I was able to check that by running locale -a inside my Ubuntu Docker image, which revealed:

locale -a
C
C.UTF-8
POSIX

So that's the issue! mod_wsgi doesn't know what I'm asking for when I ask for en_US.UTF-8, because that locale isn't installed, and it doesn't throw an error either. Swapping both the lang and locale settings to C.UTF-8 fixed this immediately.
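
The same check can be done from Python, which is handy when you can't easily get a shell in the container. This is a rough sketch: locale_is_available is a made-up helper, and it relies on locale.setlocale() raising locale.Error for a locale the C library can't find. Note that setlocale() changes process-wide state, so this belongs in a one-off script, not in a running multi-threaded app.

```python
import locale


def locale_is_available(name):
    """Return True if the C library can activate the named locale."""
    # Calling setlocale() with no second argument only queries the current value.
    previous = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    finally:
        # Put the original LC_CTYPE back whether or not the attempt worked.
        locale.setlocale(locale.LC_CTYPE, previous)
    return True


for candidate in ("en_US.UTF-8", "C.UTF-8", "C"):
    print(candidate, locale_is_available(candidate))
```

On an image like mine, I'd expect en_US.UTF-8 to come back False and C.UTF-8 to come back True, matching the locale -a listing.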

I'm running a slim docker image, so that must be why I lack locales. I also don't have a file at /etc/default/locale that a lot of other answers in this general area refer to.

I've filed this as a bug.

Solution 2:[2]

I had a similar UnicodeDecodeError when parsing a YAML file containing Unicode characters on Debian 11 with Apache2 and mod_wsgi.

Setting the WSGIDaemonProcess locale to C.UTF-8 was enough to make the error go away. This is the single line I changed in my /etc/apache2/sites-available/000-default.conf:

     WSGIDaemonProcess my_app locale='C.UTF-8'

In the question, mlissner listed several other settings they tried, but I didn't need any of them.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mlissner
Solution 2 printomi