Parsing Outlook .msg files with Python to get the HTML body
I looked around on Stack Overflow and couldn't find a satisfactory answer for getting the HTML body from a .msg file. Does anyone know how to parse .msg files from Outlook with Python?
I've tried using extract_msg and msg_parser with no luck. Help would be greatly appreciated!
I have also used chardet.detect to detect the encoding, but it sometimes misbehaves on bullet points, turning characters like . into ·
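Example code:
    import chardet
    import extract_msg
    from msg_parser import MsOxMessage

    msg = extract_msg.openMsg('test.msg')
    msg_obj = MsOxMessage('test.msg')
    html = ''
    try:
        # htmlBody is bytes (or None); guess its encoding before decoding
        body_encoding = chardet.detect(msg.htmlBody)['encoding']
        html = msg.htmlBody.decode(body_encoding) if msg.htmlBody else ''
    except Exception:
        html = None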
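An output like `·` is typical of UTF-8 bytes being decoded as Windows-1252 (the UTF-8 encoding of `·` is the two bytes C2 B7). A minimal sketch of a decoding step that relies less on a guessed encoding, assuming the body is either UTF-8 or a Windows code page: try strict UTF-8 first and only then fall back. The name `decode_html_body` and its `fallbacks` parameter are hypothetical, not part of extract_msg or chardet.

```python
from typing import Optional, Sequence


def decode_html_body(
    raw: Optional[bytes],
    fallbacks: Sequence[str] = ("cp1252",),
) -> str:
    """Decode an HTML body, preferring strict UTF-8 over a guessed encoding."""
    if not raw:
        return ""
    try:
        # Strict UTF-8 rejects most text that is really in a legacy code page,
        # so a successful decode is a strong hint that the bytes are UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    for encoding in fallbacks:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: never raise, mark undecodable bytes instead.
    return raw.decode("utf-8", errors="replace")


# A middle dot stored as UTF-8 is decoded as-is rather than as two characters.
print(decode_html_body("<li>· item</li>".encode("utf-8")))
```

If the HTML declares its own charset in a `<meta>` tag, passing that name first in `fallbacks` is probably a better choice than a fixed code page.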
Solution 1:[1]
Keep in mind that MSG files created by Outlook do not contain the PR_HTML MAPI property (unlike messages in an Outlook store that natively supports HTML); the HTML is encoded inside the PR_RTF_COMPRESSED property, which contains a compressed RTF stream. Take a look at an MSG file with OutlookSpy (I am its author): click "More functions | OpenIMsgOnIStg".
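As a rough sketch of how that HTML could be recovered in Python: HTML stored this way is "encapsulated" in the RTF, and the RTF header carries a `\fromhtml1` control word. The third-party RTFDE package is one way to reverse the encapsulation. The helper names below are hypothetical, and the extract_msg `rtfBody` attribute and the RTFDE `DeEncapsulator` calls are assumptions that should be checked against the installed versions.

```python
from typing import Optional


def looks_like_encapsulated_html(rtf: Optional[bytes]) -> bool:
    """Return True if the RTF header carries the \\fromhtml control word."""
    # The control word sits in the RTF header, so in practice scanning
    # the first few hundred bytes is enough.
    return bool(rtf) and b"\\fromhtml" in rtf[:512]


def html_from_msg(path: str):
    """Hypothetical helper: pull encapsulated HTML out of an MSG file."""
    # Imported lazily so the check above works without these packages.
    import extract_msg  # third-party
    from RTFDE.deencapsulate import DeEncapsulator  # third-party

    msg = extract_msg.openMsg(path)
    rtf = msg.rtfBody  # assumed: decompressed RTF bytes, or None
    if not looks_like_encapsulated_html(rtf):
        return None
    rtf_obj = DeEncapsulator(rtf)
    rtf_obj.deencapsulate()
    return rtf_obj.html if rtf_obj.content_type == "html" else None


sample = b"{\\rtf1\\ansi\\ansicpg1252\\fromhtml1 \\deff0{\\fonttbl}}"
print(looks_like_encapsulated_html(sample))  # True
```

A message whose RTF lacks the control word was probably composed as RTF or plain text, so there is no original HTML to recover.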
You can use the Outlook Object Model to call Namespace.OpenSharedItem and then read the MailItem.HTMLBody property, but the Outlook Object Model cannot be used in a service (such as IIS), and a temporary message will be created in the default store, which means Outlook needs to log on to a profile first.
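From Python, the Outlook Object Model route might look like the sketch below. It assumes Windows, an installed Outlook with a usable profile, and the pywin32 package; `read_html_via_outlook` is a hypothetical helper name.

```python
import os


def read_html_via_outlook(path: str) -> str:
    """Hypothetical helper: open an MSG file through Outlook and return its HTML."""
    # pywin32 is imported lazily; it is only available on Windows.
    import win32com.client

    outlook = win32com.client.Dispatch("Outlook.Application")
    namespace = outlook.GetNamespace("MAPI")
    # A full path is passed, since Outlook does not share this
    # script's working directory.
    item = namespace.OpenSharedItem(os.path.abspath(path))
    try:
        return item.HTMLBody
    finally:
        # Close without saving (olDiscard = 1) to release the file.
        item.Close(1)
```

Because this drives a desktop Outlook instance, it suits interactive scripts rather than server-side processing, for the reasons given above.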
If using Redemption is an option (I am its author; it is an Extended MAPI wrapper that can be used from a service in any language), you can use RDOSession.GetMessageFromMsgFile and then read the RDOMail.HTMLBody property.
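A minimal sketch of the Redemption route from Python, assuming Windows, a registered Redemption library, and the pywin32 package. `read_html_via_redemption` is a hypothetical helper name, and the `Redemption.RDOSession` ProgID is the one used for a standard registered install.

```python
import os


def read_html_via_redemption(path: str) -> str:
    """Hypothetical helper: read the HTML body of an MSG file via Redemption."""
    # pywin32 is imported lazily; it is only available on Windows.
    import win32com.client

    session = win32com.client.Dispatch("Redemption.RDOSession")
    # GetMessageFromMsgFile opens the file directly as an RDOMail object.
    msg = session.GetMessageFromMsgFile(os.path.abspath(path))
    return msg.HTMLBody
```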
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Dmitry Streblechenko |