H2O Python - How to convert an H2OFrame back to a DataFrame with correct characters and datetimes
I have a CSV file and want to use H2O for deep learning. The data contains Chinese text and datetime values, and when I finish training and need to save the output to CSV, I can't get the original data back.
Here is a small example that shows the problem (月 means "month" and 周六 means "Saturday"):
In[1]: df = pd.DataFrame({'datetime':['2016-12-17 00:00:00'],'time':['00:00:30'],'month':['月'], 'weekend':['周六']})
print(df.dtypes)
df
out[1]:
datetime    object
time        object
month       object
weekend     object
dtype: object
              datetime      time month weekend
0  2016-12-17 00:00:00  00:00:30     月      周六
In[2]: h2o_frame = h2o.H2OFrame(df);h2o_frame ;h2o_frame.types ;h2o_frame
C:\Users\thi\Anaconda3\lib\site-packages\h2o\utils\shared_utils.py:170: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead. data = _handle_python_lists(python_obj.as_matrix().tolist(), -1)[1]
out[2]: Parse progress: |█████████████████████████████████████████████████████████| 100%
datetime time month weekend
2016-12-17 00:00:00 1970-01-01 00:00:30 <0xA4EB> <0xA9>P<0xA4BB>
I want the time column to stay as just 00:00:30. Is there any way to fix that?
For month and weekend I couldn't find any way to make the Chinese text display, but I still finished my deep learning.
However, when I convert the H2OFrame back to a DataFrame and save it to a CSV file, it saves <0xA4EB> instead of 月, and the datetime column changes to integers:
In[3]: dff = h2o_frame.as_data_frame();dff
out[3]: datetime time month weekend
0 1481932800000 30000 <0xA4EB> <0xA9>P<0xA4BB>
- How do I correctly get the character (Chinese text) columns back from an H2OFrame into a DataFrame?
- How do I correctly get the datetime columns back from an H2OFrame into a DataFrame?
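The escape markers are not random bytes: 0xA4EB is the Big5 (Windows code page 950) encoding of 月, and 0xA9 0x50 0xA4BB (0x50 is the printable "P") is 周六. That suggests the strings reached H2O in the Traditional Chinese Windows locale encoding instead of UTF-8. As a stopgap only, a hypothetical helper like the one below can turn text of that exact shape back into Chinese. It assumes that as_data_frame() returns the markers literally as the text "<0x...>", and that everything between markers is plain ASCII; both are assumptions, not documented H2O behavior.

```python
import re

# Matches markers such as "<0xA4EB>", the hex form H2O printed for undecodable bytes
_HEX_MARKER = re.compile(r"<0x([0-9A-Fa-f]+)>")


def repair_big5_markers(text: str) -> str:
    """Hypothetical helper: rebuild the raw bytes behind "<0x...>" markers and decode as cp950.

    Assumes the characters between markers are single-byte ASCII, like the "P"
    in "<0xA9>P<0xA4BB>", which is the byte 0x50.
    """
    raw = bytearray()
    pos = 0
    for match in _HEX_MARKER.finditer(text):
        raw += text[pos:match.start()].encode("ascii")  # plain bytes between markers
        raw += bytes.fromhex(match.group(1))            # the escaped bytes themselves
        pos = match.end()
    raw += text[pos:].encode("ascii")
    return raw.decode("cp950")  # Big5 as used by Traditional Chinese Windows


print(repair_big5_markers("<0xA4EB>"))         # 月
print(repair_big5_markers("<0xA9>P<0xA4BB>"))  # 周六
```

Repairing after the fact is fragile; the more robust direction is to make sure the text is UTF-8 before H2O parses it.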
Solution 1:[1]
One of the simplest ways to solve this is to pass the column_types argument when you convert the pandas frame to an H2OFrame, as below:
In [69]: col_types
Out[69]: ['categorical', 'categorical', 'categorical', 'categorical']
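The answer does not show how col_types was built. The frame printed below lists its columns as datetime, month, time, weekend, which is not the order used in the question, so a positional list can pair a type with the wrong column (with four identical entries it makes no difference here). H2OFrame's column_types also accepts a dict of column name to type. A minimal sketch that builds such a dict, assuming every text column should become categorical:

```python
import pandas as pd

df = pd.DataFrame({"datetime": ["2016-12-17 00:00:00"], "time": ["00:00:30"],
                   "month": ["月"], "weekend": ["周六"]})

# Map every text column to "categorical" by name, so the mapping
# does not depend on the order in which the columns end up
col_types = {col: "categorical" for col in df.columns
             if pd.api.types.is_string_dtype(df[col])}

# With a running H2O cluster this would then be passed as:
# h2o_frame = h2o.H2OFrame(df, column_types=col_types)
print(col_types)
```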
In [70]: h2o_frame = h2o.H2OFrame(df,column_types=col_types);h2o_frame ;h2o_frame.types ;h2o_frame
Parse progress: |█████████████████████████████████████████████████████████| 100%
Out[70]:
datetime month time weekend
------------------- ------- -------- ---------
2016-12-17 00:00:00 月 00:00:30 周六
[1 row x 4 columns]
In [71]: dff = h2o_frame.as_data_frame();dff
Out[71]:
datetime month time weekend
0 2016-12-17 00:00:00 月 00:00:30 周六
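For the last step of the question, saving the recovered DataFrame to CSV, pandas writes UTF-8 by default. Excel on a Chinese-locale Windows machine typically assumes the legacy code page unless the file starts with a byte-order mark, which encoding="utf-8-sig" adds. A minimal sketch, using a temporary directory as a placeholder for the real output path:

```python
import os
import tempfile

import pandas as pd

dff = pd.DataFrame({"datetime": ["2016-12-17 00:00:00"], "time": ["00:00:30"],
                    "month": ["月"], "weekend": ["周六"]})

path = os.path.join(tempfile.mkdtemp(), "output.csv")  # placeholder output path
# "utf-8-sig" writes a UTF-8 byte-order mark so Excel recognizes the encoding
dff.to_csv(path, index=False, encoding="utf-8-sig")

# Reading it back with the same encoding returns the original Chinese text
check = pd.read_csv(path, encoding="utf-8-sig")
print(check.loc[0, "month"], check.loc[0, "weekend"])
```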
Solution 2:[2]
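Alternatively, load the CSV files with h2o.import_file and, after as_data_frame(), convert the datetime column back from milliseconds:
import h2o
import pandas as pd
h2o.init()  # connect to (or start) a local H2O cluster
allfiles = h2o.import_file(path='data/', pattern=".csv")
df = allfiles.as_data_frame()
df['datetime'] = pd.to_datetime(df["datetime"], unit='ms')  # H2O time values are milliseconds since the epoch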
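The same idea extends to every column H2O typed as "time", including the time-of-day column from the question (30000 ms is 00:00:30). A sketch that assumes the types come from a dict shaped like h2o_frame.types and that the cluster interpreted the timestamps as UTC, which matches the question's output (1481932800000 ms is 2016-12-17 00:00:00 UTC). The clock_cols parameter is hypothetical: it lists columns that held only a time of day and formats them back to "HH:MM:SS" strings.

```python
import pandas as pd


def restore_h2o_times(df, h2o_types, clock_cols=()):
    """Convert columns that H2O typed as "time" back from epoch milliseconds.

    h2o_types: dict shaped like H2OFrame.types, e.g. {"datetime": "time", ...}
    clock_cols: columns holding only a time of day; formatted as "HH:MM:SS"
    (only meaningful for values below 24 hours).
    """
    out = df.copy()
    for col, h2o_type in h2o_types.items():
        if h2o_type != "time" or col not in out.columns:
            continue
        stamps = pd.to_datetime(out[col], unit="ms")  # naive timestamps, read as UTC
        out[col] = stamps.dt.strftime("%H:%M:%S") if col in clock_cols else stamps
    return out


# Values shaped like the question's as_data_frame() output and h2o_frame.types
raw = pd.DataFrame({"datetime": [1481932800000], "time": [30000],
                    "month": ["月"], "weekend": ["周六"]})
types = {"datetime": "time", "time": "time", "month": "enum", "weekend": "enum"}

fixed = restore_h2o_times(raw, types, clock_cols=["time"])
print(fixed)
```

Working on a copy keeps the raw millisecond values available if a different timezone interpretation turns out to be needed.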
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Henry Ecker |
| Solution 2 | user1098761 |