'Fastest way to move outlook emails to postgresql db using Npgsql

I have around 200 000 emails in outlooks public folders.

Exporting to pst is a little bit fast but I don't know if psts are reliable. Also decoding it perfectly(with attachments) became a big headache. python lib doesn’t save attachments. Java lib not importing html body for all emails.

So, I thought of saving to postgresql db.

I am using Npgsql but it is slow(with the way I am using it).

My current slow way:

sql = "insert into tablename (a,b,c) values (:aa, :bb, :cc)";  
cmd = (sql, con) #I am skipping full commands as I am reading from mind, dont think I made mistake in my code.  

NpgsqlParameter pa = new NpgsqlParameter(":aa", sometype)
#same for other 3

pa.value = somevalue

cmd.Parameters.Add(pa)
#add others also

If I comment the db save part, it is really fast(like 1000 mails in 10 seconds), but with db part it is around 10 mails for 2 seconds.

I think reason is, I have 10 columns (htmlbody, body, headerstring etc) and I am creating new NpgsqlParameter for these 10 columns, setting value for it for every mail which is making it slow I feel.

I am unable to simply do like

"...values('"+htmlbody+"', '"+header_string+"'..

because of errors

syntax error at or near..so I started using npgsqlparameter.  

Also, I am saving below fields only in case parsing the header_string fails(its failing for some emails sadly particularly the ones having 'microsoft ..version 2' at the beginning of the headerstring). Are they enough to represent full emails just in case headerstring is useless?

    sender_entry_id 
    foldername text,
    subject text,

    headerstring text,
    num_of_attachments int,

    body text,
    body_html text,

    bcc text,
    tooo text,
    frm text,
    cc text,
    conversation_id text,
    conversation_index text,
    conversation_topic text,
    message_class text,
    senton timestamp

Also, I am getting this error

System.Runtime.InteropServices.COMException (0x80040304): The operation failed.

at Microsoft.Office.Interop.Outlook.PropertyAccessorClass.GetProperty(String SchemaName)

while saving some attachments as byte[] variable. Here is the code:

 const string PR_ATTACH_DATA_BIN =
"http://schemas.microsoft.com/mapi/proptag/0x37010102";

 byte[] attachmentData =
 (byte[])attachment.PropertyAccessor.GetProperty(
     PR_ATTACH_DATA_BIN);  #attachment is Outlook.Attachment


Solution 1:[1]

The GetProperty method of the PropertyAccessor class can't read large PT_BINARY values (> 64kB). That's why you get an exception in the code.

You can use a low-level code (Extended MAPI) to open the attachment data as IStream:

IAttach::OpenProperty(PR_ATTACH_DATA_BIN, IID_IStream).

Solution 2:[2]

You can save the attachment to a temporary file first (Attachment.SaveAsFile), then read its data.

As Eugene mentioned, you can access the attachment data as IStream using Extended MAPI, but it is not accessible in C#, only in C++ or Delphi.

If using Redemption (I am its author) is an option, you can use the AsText, AsArray, and AsStream properties exposed both by the RDOAttachment and Attachment object (returned by the Safe*Item objects).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Eugene Astafiev
Solution 2