'Python print Unicode string via 'Git Bash' gets 'UnicodeEncodeError'

in test.py i have

print('Привет мир')

with cmd worked as normal

> python test.py
?????? ???

with Git Bash got error

$ python test.py
Traceback (most recent call last):
  File "test.py", line 2, in <module>
    print('\u041f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440')
  File "C:\Users\raksa\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>

enter image description here

Does anyone know the reason behind of getting error when execute python code via Git Bash?



Solution 1:[1]

Python 3.6 directly uses the Windows API to write Unicode to the console, so is much better about printing non-ASCII characters. But Git Bash isn't the standard Windows console so it falls back to previous behavior, encoding Unicode string in the terminal encoding (in your case, cp1252). cp1252 doesn't support Cyrillic, so it fails. This is "normal". You'll see the same behavior in Python 3.5 and older.

In the Windows console Python 3.6 should print the actual Cyrillic characters, so what is surprising is your "?????? ???". That is not "normal", but perhaps you don't have a font selected that supports Cyrillic. I have a couple of Python versions installed:

C:\>py -3.6 --version
Python 3.6.2

C:\>py -3.6 test.py
?????? ???

C:\>py -3.3 --version
Python 3.3.5

C:\>py -3.3 test.py
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    print('\u041f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440 \u4f60\u597d')
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>

Solution 2:[2]

Had this problem with python 3.9

import sys, locale
print("encoding", sys.stdout.encoding)
print("local preferred", locale.getpreferredencoding())
print("fs encoding", sys.getfilesystemencoding())

If this returns "cp1252" and not "utf-8" then print() doesn't work with unicode.

This was fixed by changing the windows system locale.

Region settings > Additional settings > Administrative > Change system locale > Beta: Use Unicode UTF-8 for worldwide language support

Solution 3:[3]

Since Python 3.7 you can do

import sys
sys.stdout.reconfigure(encoding='utf-8')

This mostly fixes the git bash problem for me with Chinese characters. They still don't print correctly to standard out on the console, but it doesn't crash, and when redirected to a file the correct unicode characters are present.

Credit to sth in this answer.

Solution 4:[4]

Set the the environment variable PYTHONUTF8=1, or Use -Xutf8 command line option.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 David Stephan
Solution 3 Adam Burke
Solution 4 Jeremy Caney