'properly logging unicode & utf-8 exceptions in python 2

I'm trying to log various exceptions from libraries in python 2.7. I find that sometimes the exceptions contain a unicode string and sometimes a utf8 bytestring. I thought that logging.exception(e) was the right approach to log them, but the following doesn't seem to work:

# encoding: utf-8
import logging
try:
    raise Exception('jörn')
except Exception as e:
    logging.exception(e)

try:
    raise Exception(u'jörn')
except Exception as e:
    logging.exception(e)

saving this into a file and running it results in this:

$ python test.py
ERROR:root:jörn
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    raise Exception('jörn')
Exception: jörn
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/lib/python2.7/logging/__init__.py", line 859, in emit
    msg = self.format(record)
  File "/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/lib/python2.7/logging/__init__.py", line 732, in format
    return fmt.format(record)
  File "/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/lib/python2.7/logging/__init__.py", line 474, in format
    s = self._fmt % record.__dict__
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 1: ordinal not in range(128)
Logged from file test.py, line 12

So as you see the utf8 exception worked fine, but the unicode exception broke logging, swallowing the real exception and hiding it behind a UnicodeEncodeError.

Is there some standard logging facility for exceptions that won't break my code? What am i missing?



Solution 1:[1]

Actually, i think i finally found the mistake and a proper way myself: I seem to have used logging.exception('msg') wrong the whole time. You're not meant to pass in the exception, but a message:

# encoding: utf-8
import logging
try:
    raise Exception('jörn')
except Exception as e:
    logging.exception('exception occurred')

try:
    raise Exception(u'jörn')
except Exception as e:
    logging.exception('exception occurred')

running the above correctly logs the exception:

$ python test.py
ERROR:root:exception occurred
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    raise Exception('jörn')
Exception: jörn
ERROR:root:exception occurred
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    raise Exception(u'jörn')
Exception: j\xf6rn

The reason why logging.exception(e) seems to fail is that it passes the exception e up into logging.Formatter.format() where the it arrives as record.message variable which still is an Exception object.

Then in Line 474 the following happens:

s = self._fmt % record.__dict__

which is equivalent to the following:

s = '%(levelname)s:%(name)s:%(message)s' % {
   'levelname': 'ERROR',
   'name': 'ROOT',
   'message': Exception(u'jörn')
}

It turns out this is why if message is one of ['jörn', u'jörn', Exception('jörn')] it works and not if it is Exception(u'jörn'):

>>> 'foo %s' % 'jörn'
'foo j\xc3\xb6rn'
>>> 'foo %s' % u'jörn'
u'foo j\xf6rn'
>>> 'foo %s' % Exception('jörn')
'foo j\xc3\xb6rn'
>>> 'foo %s' % Exception(u'jörn')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 1: ordinal not in range(128)

As you can see there is an automatic upcasting that happens for unicode strings which is why the following works:

>>> logging.error('jörn')
ERROR:root:jörn
>>> logging.error(u'jörn')
ERROR:root:jörn

This transformation into unicode fails when attempting it with an Exception object that doesn't properly handle the encoding of its message (which sadly seems to be the case in a lot of libraries).

The logging.exception(msg) call seems to properly use the repr() for formatting the exception for logging and prefixes it with your msg. So if you don't make the mistake and pass the exception to logging.exception it will correctly log it.

So long story short:

Don't use logging.exception(e) but logging.exception('exception occurred'). It will automagically and correctly append the formatted exception to your log. If you really want to use the exception's message without assuming some encoding, the safest you can do is logging.exception(repr(e)).

Solution 2:[2]

It's not the logging that fails to deal with unicode, it's the Exception.__str__ method which does not support unicode strings as exception arguments. When you invoke logging.exception(e) it will do something like logging.exception(str(e)) which in turn does something like str(self.args) on the exception instance. That's where the error comes from, your self.args is a unicode string which cannot be encoded in ascii. You have two options, either do logging.exception(unicode(e)) or implement your own exception class that provides a __str__ method that can deal with unicode objects in self.args.

The reason why your first test passes is that the editor encodes the string into UTF-8 and Python sees a string instance with encoded unicode characters.

Solution 3:[3]

There's a lot of ways of doing, but most simple is to "patch" logger in order for it to pre-convert any non unicode string so logging won't fail.

Patching sounds bad, but fortunately there is a mechanism called a contextfilter that does the job without modifying the code.

TL;DR: I have built the following code as a package. Use it with pip install ofunctions.logger_utils Usage:

from ofunctions.logger_utils import logger_get_logger

logger = logger_get_logger(log_file='somepath')
logger.info('Ûnicode café")

TL;DR end

Consider the following code which sole purpose is to call function safe_string_convert() on a given logged message:

<!-- lang: python -->
class FixPython2Logging(logging.Filter):
    def __init__(self):
        self._worst_level = logging.INFO
        if sys.version_info[0] < 3:
            # pylint: disable=E1003 (bad-super-call)
            super(logging.Filter, self).__init__()
        else:
            super().__init__()

    def filter(self, record):
        # type: (str) -> bool
        # Fix python2 unicodedecodeerrors when non unicode strings are sent to logger
        if sys.version_info[0] < 3:
            record.msg = safe_string_convert(record.msg)
        return True

Of course we'd have to design a fool proof (or unicode decode error proof) function that makes sure we'll have a string that can be logged:

<!-- lang: python -->
def safe_string_convert(string):
    """
    Allows to encode strings for hacky UTF-8 logging in python 2.7
    """

    try:
        return string.decode("utf8")
    except Exception:  # noqa
        try:
            return string.decode("unicode-escape")
        except Exception:  # noqa
            try:
                return string.decode("latin1")
            except Exception:  # noqa
                if sys.version_info[0] < 3:
                    # pylint: disable=E0602 (undefined-variable)
                    if isinstance(string, unicode):  # noqa
                        return string
                try:
                    return (
                        b"Cannot convert logged string. Passing it as binary blob: "
                        + bytes(string)
                    )
                except Exception:  # noqa
                    return string

Now we can make sure FixPython2Logging filter will get executed by any logger call. Let's just add it as filter to our standard logger class:

<!-- lang: python -->
    log_filter = FixPython2Logging()
    logger = logging.getLogger()

    # Remove earlier handlers if exist
    while _logger.handlers:
        _logger.handlers.pop()

    # Add context filter
    logger.addFilter(log_filter)

    # Try it
    logger.info("Ûnicode café")

Here we go ;) Of course the contextfilter works for your initial logger.exception() question ;)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Community
Solution 2 Bernhard
Solution 3