'data time Format recognition in exported excel with xlsxwriter
I didn't find a solution for this:
From a dataframe I generate an excel and some columns need to be in format hh:mm:ss (with no limit to 24h, for example a value can be '28:39:13'.
When generating the excel everything looks okay. But when operating with the values of the cells isn't working properly until I apply the "text to columns" option.
## Example code
import pandas as pd
import numpy as np
from xlsxwriter.utility import xl_rowcol_to_cell
example_list = ["12:35:25", "", "", "", "27:36:11"]
example_df = pd.DataFrame(example_list)
writer = pd.ExcelWriter('enhanced.xlsx', engine='xlsxwriter')
example_df.to_excel(writer, index=False, header=False, sheet_name='report')
workbook = writer.book
worksheet = writer.sheets['report']
total_hformat = workbook.add_format({'num_format': '[h]:mm:ss', 'bold': False})
worksheet.set_column('A:C', 12, total_hformat)
writer.save()
writer.close()
Before applying "text to columns":
A1+A2+A3+A4+A5+A6 returns correct result.
sum(A1:A6) returns wrong result
Applying "text to columns":
After applying "text to columns":
Why does this happen? How can I solve it from xlsxwriter and avoid the need of converting all the columns after exporting an excel??
Thanks.
EDIT: I SOLVED IT!! Excel has own numeric time format which is composed of a integer (days) + fraction (percentage time of the day). Example 1,25 = 30 hours. Then we just need the time in seconds (unix format) then divide the time between the amount of seconds of a day (60*60*24). That is the value which sould be in the dataframe. When exporting with xlsxwriter, just apply the format '[h]:mm:ss' like i shown in the example above.
Solution 1:[1]
I SOLVED IT: Excel has own numeric time format which is composed of a integer (days) + fraction (percentage time of the day). Example 1,25 = 30 hours. Then we just need the time in seconds (unix format) then divide the time between the amount of seconds of a day (60*60*24). That is the value which sould be in the dataframe. When exporting with xlsxwriter, just apply the format '[h]:mm:ss' like i shown in the example above.
Solution 2:[2]
The best way to solve this would be to convert the time strings to datetime objects and then set '[h]:mm:ss' as the datetime_format in pandas.ExcelWriter. However, I don't think it is possible to parse 24h+ times with datetime.strptime().
In which case it is probably easiest to convert the time string to an Excel serial date. Something like this:
import pandas as pd
import re
example_list = ['12:35:25', '', '', '', '27:36:11']
# Convert the time string to Excel serial dates.
for i, date in enumerate(example_list):
m = re.match('^(\d+):(\d+):([\d\.]+)$', date)
if m:
seconds = (int(m.group(1)) * 60 * 60 +
int(m.group(2)) * 60 +
float(m.group(3)))
serial_date = seconds / (24 * 60 * 60)
example_list[i] = serial_date
example_df = pd.DataFrame(example_list)
writer = pd.ExcelWriter('enhanced.xlsx', engine='xlsxwriter')
example_df.to_excel(writer, index=False, header=False, sheet_name='report')
workbook = writer.book
worksheet = writer.sheets['report']
total_hformat = workbook.add_format({'num_format': '[h]:mm:ss', 'bold': False})
worksheet.set_column('A:C', 12, total_hformat)
writer.close()
Output:
Apart from the for() loop the rest of the code is the same as yours. I've allowed for decimal seconds which may not be necessary.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | jmcnamara |