How to efficiently extract time-series data from a netCDF file?
I want to extract time series of data from a single netCDF file. I need three time series of daily temperatures for each of more than 500 cities, from 2004 to 2016 (more precisely, one time series at each of three coordinate points per city).
The following program works, but it is very slow: it takes more than 8 hours to produce the time series for a single location. I have already tried splitting the coordinates into several CSV files and running the program separately on each of them, but that is not very efficient. Maybe I should chunk the netCDF file (5 GB) into smaller files to reduce the time spent reading, but I don't know how to do that.
from datetime import datetime
from netCDF4 import Dataset
import pandas as pd
import os
import numpy as np

os.chdir('D:PATH/tmp/')
date_range = pd.date_range(start = "2004-01-01", end = "2016-12-31", freq ='D')
df = pd.DataFrame(0.0, columns = ['Temp1','Temp2','Temp3'], index = date_range)
cities = pd.read_csv(r'D:\PATH\cities_coordinates.csv', sep =',')
cities['NUTS_ID']= cities['NUTS_ID'].map(str)

for index, row in cities.iterrows():
    location = row['NUTS_ID']
    location_latitude1 = row['lat1']
    location_longitude1 = row['lon1']
    location_latitude2 = row['lat2']
    location_longitude2 = row['lon2']
    location_latitude3 = row['lat3']
    location_longitude3 = row['lon3']

    # Note: `day` is never used, so this loop re-opens the file and repeats
    # the whole extraction below once per day
    for day in date_range:
        data = Dataset("D:/PATH/temperature.nc",'r')

        # Store the lat and lon data of the netCDF file in variables
        lat = data.variables['latitude'][:]
        lon = data.variables['longitude'][:]

        # Squared difference between the specified lat, lon and the lat, lon of the netCDF
        sq_diff_lat1 = (lat - location_latitude1)**2
        sq_diff_lon1 = (lon - location_longitude1)**2
        sq_diff_lat2 = (lat - location_latitude2)**2
        sq_diff_lon2 = (lon - location_longitude2)**2
        sq_diff_lat3 = (lat - location_latitude3)**2
        sq_diff_lon3 = (lon - location_longitude3)**2

        # Identify the index of the min value for lat and lon
        min_index_lat1 = sq_diff_lat1.argmin()
        min_index_lon1 = sq_diff_lon1.argmin()
        min_index_lat2 = sq_diff_lat2.argmin()
        min_index_lon2 = sq_diff_lon2.argmin()
        min_index_lat3 = sq_diff_lat3.argmin()
        min_index_lon3 = sq_diff_lon3.argmin()

        # Accessing the temperature data
        tx = data.variables['tx']
        start = '2004-01-01'
        end = '2016-12-31'
        d_range = pd.date_range(start = start, end = end, freq='D')

        # One read from the file per day and per point
        for t_index in np.arange(0, len(d_range)):
            print('Recording the value for: '+str(d_range[t_index]))
            # Single .loc call (row, column) instead of chained indexing,
            # so the value is written to df itself and not to a copy
            df.loc[d_range[t_index], 'Temp1'] = tx[t_index, min_index_lat1, min_index_lon1]
            df.loc[d_range[t_index], 'Temp2'] = tx[t_index, min_index_lat2, min_index_lon2]
            df.loc[d_range[t_index], 'Temp3'] = tx[t_index, min_index_lat3, min_index_lon3]

    df.to_csv(location +'.csv')
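For comparison, here is a minimal sketch of the same extraction with the file opened once and each point's whole time axis read in a single slice, rather than one value per day. It assumes, as the indexing above does, that `tx` is ordered (time, latitude, longitude) and that the coordinate variables are named `latitude` and `longitude`. The function names `nearest_index`, `extract_series` and `extract_from_file` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def nearest_index(axis_values, target):
    """Index of the value in a 1-D coordinate axis closest to target."""
    return int(np.abs(np.asarray(axis_values) - target).argmin())


def extract_series(tx, lat, lon, points):
    """Return {name: 1-D float array over time} for (name, lat, lon) points.

    tx is anything indexable as tx[time, lat, lon], for example a NumPy
    array or a netCDF4 variable. Each point costs one slice read instead
    of one read per day.
    """
    series = {}
    for name, p_lat, p_lon in points:
        i = nearest_index(lat, p_lat)
        j = nearest_index(lon, p_lon)
        # netCDF4 may return a masked array; replace masked cells with NaN
        values = np.ma.asarray(tx[:, i, j]).astype(float)
        series[name] = values.filled(np.nan)
    return series


def extract_from_file(nc_path, points):
    """Open the file once and pull every requested series from it."""
    from netCDF4 import Dataset  # imported lazily; only needed for real files

    with Dataset(nc_path, "r") as data:
        lat = data.variables["latitude"][:]
        lon = data.variables["longitude"][:]
        return extract_series(data.variables["tx"], lat, lon, points)
```

For one city, `points` would be something like `[("Temp1", row["lat1"], row["lon1"]), ("Temp2", row["lat2"], row["lon2"]), ("Temp3", row["lat3"], row["lon3"])]`, and the returned dictionary could be passed to `pd.DataFrame(series, index=date_range)` before calling `to_csv`, provided the file's time axis really covers exactly 2004-01-01 to 2016-12-31. Whether this is fast enough depends on how the file is chunked on disk, which this sketch does not address.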
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow