Parsing Outlook .msg files with Python to get HTML body

I looked around on Stack Overflow and couldn't find a satisfactory answer for getting the HTML body from a .msg file. Does anyone know how to parse Outlook .msg files with Python?

I've tried using extract_msg and msg_parser with no luck. Help would be greatly appreciated!

I have also used chardet.detect to detect the encoding, but it sometimes guesses wrong and mangles characters such as bullet points (for example, . to ·).

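Example code:

import chardet
import extract_msg
from msg_parser import MsOxMessage

msg = extract_msg.openMsg('test.msg')
msg_obj = MsOxMessage('test.msg')  # msg_parser attempt; not used below

raw_html = msg.htmlBody  # bytes, or None when no HTML body is found
html = ''
if raw_html:
    try:
        # chardet only guesses; fall back to UTF-8 if it reports no encoding
        body_encoding = chardet.detect(raw_html)['encoding'] or 'utf-8'
        html = raw_html.decode(body_encoding)
    except (LookupError, UnicodeDecodeError):
        html = None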


Solution 1:[1]

Keep in mind that MSG files created by Outlook do not contain the PR_HTML MAPI property (unlike messages in an Outlook store that natively supports HTML). The HTML is instead encoded inside the PR_RTF_COMPRESSED property, which contains a compressed RTF stream. To see this, take a look at an MSG file with OutlookSpy (I am its author): click "More functions | OpenIMsgOnIStg".
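To illustrate the point about HTML living inside the RTF, here is a minimal sketch that checks whether an already decompressed RTF body wraps HTML. It assumes the RTF bytes were obtained elsewhere (PyPI packages such as compressed_rtf and RTFDE are commonly used for decompression and de-encapsulation, but their APIs are not shown here). The function name and the 2048-byte scan window are illustrative choices, not part of any library; the check relies on the \fromhtml1 control word that encapsulated-HTML RTF carries in its header.

```python
def looks_like_encapsulated_html(rtf: bytes) -> bool:
    """Return True if decompressed RTF appears to wrap an HTML body.

    Hypothetical helper: it only inspects the start of the stream for the
    \\fromhtml1 control word and does not extract the HTML itself.
    """
    head = rtf.lstrip()[:2048]  # scan window size is an arbitrary assumption
    return head.startswith(b"{\\rtf") and b"\\fromhtml1" in head


if __name__ == "__main__":
    sample = b"{\\rtf1\\ansi\\ansicpg1252\\fromhtml1 \\deff0{\\fonttbl}}"
    print(looks_like_encapsulated_html(sample))
```

If the check succeeds, the HTML can in principle be recovered by de-encapsulating the RTF; if it fails, the message most likely has only an RTF or plain-text body.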

You can use the Outlook Object Model to call Namespace.OpenSharedItem and then read the MailItem.HTMLBody property. However, the Outlook Object Model cannot be used in a service (such as IIS), and a temporary message will be created in the default store, which means Outlook needs to log on to a profile first.
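A minimal sketch of the Outlook Object Model route from Python, assuming Windows, an installed Outlook with a configured profile, and the pywin32 package. The function name and the optional outlook_app parameter are hypothetical; the parameter exists only so the COM object can be supplied or substituted by the caller.

```python
import os


def read_html_body_via_outlook(msg_path, outlook_app=None):
    """Open a .msg file through Outlook and return its HTML body as str."""
    if outlook_app is None:
        # Imported lazily because pywin32 is only available on Windows.
        import win32com.client
        outlook_app = win32com.client.Dispatch("Outlook.Application")
    namespace = outlook_app.GetNamespace("MAPI")
    # OpenSharedItem is given an absolute path here to avoid ambiguity.
    item = namespace.OpenSharedItem(os.path.abspath(msg_path))
    try:
        return item.HTMLBody
    finally:
        item.Close(1)  # 1 = olDiscard: close without saving changes
```

Because Outlook returns HTMLBody as a string, no chardet guess or manual decode step is involved on this path.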

If using Redemption is an option (I am its author; it is an Extended MAPI wrapper that can be used from a service in any language), you can use RDOSession.GetMessageFromMsgFile and then read the RDOMail.HTMLBody property.
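The Redemption route can be sketched in the same way, assuming Windows, the pywin32 package, and a Redemption installation registered under the ProgID "Redemption.RDOSession". The function name and the optional session parameter are hypothetical and only let the caller pass in an existing session object.

```python
import os


def read_html_body_via_redemption(msg_path, session=None):
    """Open a .msg file through Redemption and return its HTML body."""
    if session is None:
        # Imported lazily because pywin32 is only available on Windows.
        import win32com.client
        session = win32com.client.Dispatch("Redemption.RDOSession")
    # GetMessageFromMsgFile returns an RDOMail object for the given file.
    message = session.GetMessageFromMsgFile(os.path.abspath(msg_path))
    return message.HTMLBody
```

A call such as read_html_body_via_redemption('test.msg') would then return the HTML without going through Outlook itself.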

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Dmitry Streblechenko