Python print Unicode string via 'Git Bash' gets 'UnicodeEncodeError'
In test.py I have:
print('Привет мир')
With cmd it runs without an error:
> python test.py
?????? ???
With Git Bash I get an error:
$ python test.py
Traceback (most recent call last):
File "test.py", line 2, in <module>
print('\u041f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440')
File "C:\Users\raksa\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>
Does anyone know why I get this error when running Python code via Git Bash?
Solution 1:[1]
Python 3.6 directly uses the Windows API to write Unicode to the console, so it is much better at printing non-ASCII characters. But Git Bash isn't the standard Windows console, so Python falls back to the previous behavior: encoding Unicode strings in the terminal encoding (in your case, cp1252). cp1252 doesn't support Cyrillic, so it fails. This is "normal". You'll see the same behavior in Python 3.5 and older.
In the Windows console Python 3.6 should print the actual Cyrillic characters, so what is surprising is your "?????? ???". That is not "normal", but perhaps you don't have a font selected that supports Cyrillic. I have a couple of Python versions installed:
C:\>py -3.6 --version
Python 3.6.2
C:\>py -3.6 test.py
?????? ???
C:\>py -3.3 --version
Python 3.3.5
C:\>py -3.3 test.py
Traceback (most recent call last):
File "test.py", line 1, in <module>
print('\u041f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440 \u4f60\u597d')
File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>
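The fallback described in this answer can be reproduced without any terminal, by encoding the string by hand. This is only a sketch: the variable names are made up for illustration, and it shows the codec behavior, not what any particular console does.

```python
text = "Привет мир"

# cp1252 has no Cyrillic letters, so strict encoding fails,
# which is the same failure print() hit in the traceback.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError as exc:
    cp1252_ok = False
    print("cp1252 failed:", exc.reason)

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = text.encode("utf-8")
print(len(utf8_bytes), "bytes in UTF-8")

# errors="replace" substitutes "?" for each character the codec lacks.
replaced = text.encode("cp1252", errors="replace")
print(replaced)
```

With a lossy error handler such as "replace" the program no longer crashes, but the original characters are gone from the output.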
Solution 2:[2]
I had this problem with Python 3.9. To check which encodings are in use:
import sys, locale
print("encoding", sys.stdout.encoding)
print("local preferred", locale.getpreferredencoding())
print("fs encoding", sys.getfilesystemencoding())
If these report "cp1252" rather than "utf-8", print() fails on any character that cp1252 cannot encode.
I fixed this by changing the Windows system locale:
Region settings > Additional settings > Administrative > Change system locale > Beta: Use Unicode UTF-8 for worldwide language support
Solution 3:[3]
Since Python 3.7 you can do:
import sys
sys.stdout.reconfigure(encoding='utf-8')
This mostly fixes the Git Bash problem for me with Chinese characters. They still don't print correctly to standard output on the console, but it doesn't crash, and when the output is redirected to a file the correct Unicode characters are present.
Credit to sth in this answer.
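A slightly more defensive variant of the same idea might look like the sketch below. The helper name force_utf8 is hypothetical, and the in-memory stream only stands in for a cp1252 terminal so the effect can be seen without Git Bash.

```python
import io


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it offers reconfigure() (Python 3.7+).

    Returns True if the stream was reconfigured, False otherwise.
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # For example, stdout was replaced by an object without reconfigure().
        return False
    reconfigure(encoding="utf-8")
    return True


# Demo: a text stream that starts out as cp1252, like stdout under Git Bash.
raw = io.BytesIO()
demo = io.TextIOWrapper(raw, encoding="cp1252")

changed = force_utf8(demo)
demo.write("Привет мир")  # would raise UnicodeEncodeError without the switch
demo.flush()
written = raw.getvalue()
```

In a real script the call would be force_utf8(sys.stdout) near the top of test.py, before the first print().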
Solution 4:[4]
Set the environment variable PYTHONUTF8=1, or use the -Xutf8 command-line option.
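One way to confirm that either switch took effect is to ask a child interpreter for sys.flags.utf8_mode, which exists since Python 3.7 (the release that introduced both switches). The sketch below assumes the current interpreter is 3.7 or newer; the helper name utf8_mode_flag is made up for illustration.

```python
import os
import subprocess
import sys

# One-liner run in the child interpreter; prints 1 when UTF-8 mode is on.
PROBE = "import sys; print(sys.flags.utf8_mode)"


def utf8_mode_flag(extra_args=(), extra_env=None):
    """Run a child Python and return its sys.flags.utf8_mode as a string."""
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)  # start from a known state
    if extra_env:
        env.update(extra_env)
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", PROBE],
        capture_output=True,
        text=True,
        env=env,
        check=True,
    )
    return result.stdout.strip()


with_option = utf8_mode_flag(extra_args=["-X", "utf8"])
with_env = utf8_mode_flag(extra_env={"PYTHONUTF8": "1"})
print("-X utf8:", with_option)
print("PYTHONUTF8=1:", with_env)
```

In Git Bash the environment variable can be set for a single run with PYTHONUTF8=1 python test.py.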
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | David Stephan |
Solution 3 | Adam Burke |
Solution 4 | Jeremy Caney |