Write data in an unknown encoding
Is it possible to write data to a file in an unknown encoding? I cannot decode email headers such as Message-ID. If I use the "ignore" or "replace" error handler (https://docs.python.org/3/library/codecs.html#error-handlers), a header that violates the RFC becomes RFC-compliant, so the antispam filter no longer raises the spam score for it.
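The effect of those error handlers can be checked directly. The sketch below assumes the header arrives as the ISO-8859-2 bytes for "žlutý" (as in the iconv example further down) and is decoded as UTF-8. "ignore" and "replace" alter the data irreversibly. "surrogateescape" instead keeps each undecodable byte as a lone surrogate, so the original bytes can be restored.

```python
# ISO-8859-2 bytes for "žlutý", as produced by the iconv example
raw = "žlutý".encode("iso8859-2")  # b'\xbelut\xfd'

# "ignore" and "replace" discard information when decoding as UTF-8
ignored = raw.decode("utf-8", errors="ignore")    # 'lut'
replaced = raw.decode("utf-8", errors="replace")  # '\ufffdlut\ufffd'

# "surrogateescape" maps each undecodable byte to a lone surrogate (U+DC80..U+DCFF)
escaped = raw.decode("utf-8", errors="surrogateescape")  # '\udcbelut\udcfd'

# Only the surrogate-escaped string converts back to the original bytes
assert ignored.encode("utf-8") != raw
assert replaced.encode("utf-8") != raw
assert escaped.encode("utf-8", errors="surrogateescape") == raw
```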
I receive the strings from Postfix via the milter protocol. I cannot save this data unchanged for the antispam filter, because Python raises a UnicodeEncodeError. Examples:
```
$ cat savefile
#!/usr/bin/python3
import sys
fh = open('test', 'w+')
fh.write(sys.argv[1])

$ echo žlutý | xargs ./savefile && cat test
žlutý

$ echo žlutý | iconv -f UTF-8 -t ISO8859-2 - | xargs ./savefile
Traceback (most recent call last):
  File "/root/./savefile", line 5, in <module>
    fh.write(sys.argv[1])
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcbe' in position 0: surrogates not allowed
```
The input may arrive in many different, unknown encodings. The same milter application works fine under Python 2.
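The traceback can be reproduced without a shell. This sketch assumes a UTF-8 locale (matching the 'utf-8' codec in the traceback) and simulates the argv decoding explicitly; reproduce_write_error is a hypothetical name. The "surrogateescape" decoding turns the byte 0xBE into the lone surrogate '\udcbe', which the strict UTF-8 encoder behind a text-mode file refuses to write. Under Python 2, argv entries and file contents are both byte strings, so there is no encoding step that could fail, which is presumably why the Python 2 milter works.

```python
from typing import Optional


def reproduce_write_error(raw: bytes) -> Optional[UnicodeEncodeError]:
    """Mimic savefile: decode raw like argv, then encode strictly like a text file."""
    # On a UTF-8 Unix system, Python 3 decodes each argv entry like this
    arg = raw.decode("utf-8", errors="surrogateescape")
    try:
        # open('test', 'w+') writes with the locale encoding (UTF-8 assumed here)
        # and the default "strict" error handler
        arg.encode("utf-8")
    except UnicodeEncodeError as exc:
        return exc
    return None


# ISO-8859-2 bytes for "žlutý": prints the same message as the traceback above
print(reproduce_write_error(b"\xbelut\xfd"))
# Valid UTF-8 input encodes fine: prints None
print(reproduce_write_error("žlutý".encode("utf-8")))
```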
Solution 1:[1]
You want to handle raw bytes then, not strings. Open the output file in binary mode. Note this, from the documentation of sys.argv:

> Note: On Unix, command line arguments are passed by bytes from OS. Python decodes them with filesystem encoding and “surrogateescape” error handler. When you need original bytes, you can get it by [os.fsencode(arg) for arg in sys.argv].
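The round trip described in the quoted note can be checked without a shell. A minimal sketch, assuming a Unix system (on Windows the filesystem codec uses a different error handler and these bytes would not decode): os.fsdecode stands in for the decoding Python applies to argv, and os.fsencode reverses it.

```python
import os

# Simulate what Python does to a raw argv entry on Unix:
# decode with the filesystem encoding and the "surrogateescape" handler
raw = b"\xbelut\xfd"  # "žlutý" in ISO-8859-2
arg = os.fsdecode(raw)

# os.fsencode applies the same encoding and handler in reverse,
# so the original bytes come back unchanged
assert os.fsencode(arg) == raw
print(ascii(arg))
```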
So:
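```python
#!/usr/bin/python3
import os
import sys

# Binary mode plus os.fsencode() writes the argument's original bytes,
# undoing the "surrogateescape" decoding Python applied to sys.argv
with open('test', 'wb+') as fh:
    fh.write(os.fsencode(sys.argv[1]))
```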
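If a text-mode file is preferred, another option (not part of the answer above) is to let open() undo the escaping by passing errors="surrogateescape". A sketch, assuming Unix and that the value was decoded with the filesystem encoding as the quoted note says; save_text_preserving_bytes is a hypothetical helper, not a library API.

```python
import os
import sys
import tempfile


def save_text_preserving_bytes(path: str, value: str) -> None:
    """Write value so that surrogate-escaped bytes are restored unchanged (hypothetical helper)."""
    # Same codec and error handler os.fsencode() uses on Unix;
    # newline="" turns off newline translation so no bytes are altered
    with open(path, "w", encoding=sys.getfilesystemencoding(),
              errors="surrogateescape", newline="") as fh:
        fh.write(value)


# Demo: write the ISO-8859-2 bytes for "žlutý" and read them back as bytes
raw = b"\xbelut\xfd"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test")
    save_text_preserving_bytes(path, os.fsdecode(raw))
    with open(path, "rb") as fh:
        written = fh.read()

print(written == raw)  # True
```

This keeps str-based code unchanged, but the bytes are only restored exactly if the same codec was used to decode them in the first place.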
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | deceze |