Read Outlook emails in Databricks
I would like to read emails from Microsoft Outlook using Python and run the script on a Databricks cluster.
I'm using win32com on my local machine and can read emails there. However, when I try to install the same package on Databricks, the installation fails with this error:
DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: org.apache.spark.SparkException: Process List(/databricks/python/bin/pip, install, pywin32, --disable-pip-version-check) exited with code 1. ERROR: Could not find a version that satisfies the requirement pywin32 ERROR: No matching distribution found for pywin32
Sample code is as follows:
import win32com.client
import pandas as pd

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI").Folders
emails_list = [
    '[email protected]'
]
subjects = []
categories = []
body_content = []
names = []
for id, name in enumerate(emails_list):
    folder = outlook(name)
    #print('Accessing email - ' , folder)
    inbox = folder.Folders("Inbox")
    message = inbox.Items
    message = message.GetFirst()
    body_content.append(message.Body)
    subjects.append(message.Subject)
    categories.append(message.Categories)
    names.append(name)
df = pd.DataFrame(list(zip(names,subjects,categories,body_content)),
                  columns=['names','subjects','categories','body_content'])
df.head(3)
Solution 1:[1]
Databricks clusters run Linux (specifically, Ubuntu), so you can't use a COM library, which is designed for Windows. That is also why pip finds no matching distribution for pywin32 there. You may be able to access your Office 365 mailbox over the IMAP protocol or something similar (see docs). Python has a built-in imaplib library that can be used for that purpose, for example as in the following article.
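To make the IMAP suggestion concrete, here is a minimal sketch using only the standard library (imaplib and email). It is not taken from the answer: the host outlook.office365.com, port 993, and the helper names parse_message and fetch_first_message are assumptions for illustration, and it assumes your tenant allows IMAP with a username and password. Many Office 365 tenants disable basic authentication, in which case an OAuth2-based login would be needed instead.

```python
import email
import imaplib
from email import policy

# Assumed Office 365 IMAP endpoint; confirm the host and port for your tenant.
IMAP_HOST = "outlook.office365.com"
IMAP_PORT = 993


def parse_message(raw_bytes):
    """Parse raw RFC 822 bytes into a dict with subject, sender, and plain-text body."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    # Pick the text/plain part; returns None if the message has no such part.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "subject": str(msg.get("Subject", "")),
        "from": str(msg.get("From", "")),
        "body": body,
    }


def fetch_first_message(user, password, mailbox="INBOX"):
    """Log in over IMAP/SSL and return the first message in the mailbox, or None if empty."""
    with imaplib.IMAP4_SSL(IMAP_HOST, IMAP_PORT) as conn:
        conn.login(user, password)
        # readonly=True avoids marking messages as seen.
        conn.select(mailbox, readonly=True)
        _status, data = conn.search(None, "ALL")
        ids = data[0].split()
        if not ids:
            return None
        _status, fetched = conn.fetch(ids[0], "(RFC822)")
        # fetched[0] is a (header, raw message bytes) tuple.
        return parse_message(fetched[0][1])
```

The dict returned by fetch_first_message can be appended to a list and passed to pd.DataFrame, much like the lists in the question. On Databricks, the credentials would typically come from a secret scope rather than being hard-coded.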
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Alex Ott |