Using the variance of a time series as an input feature for time series clustering

I have a time series dataset: a data frame with 2000 rows and 1000 columns. Each row corresponds to one specific id and has a unique pattern. I want to cluster this data into multiple classes. Below is a sample of my dataset and the work I have done so far.

Here is my data frame:

import pandas as pd 

df = pd.DataFrame()
df['val1'] = [1, 3, 95,34,5]
df['val2'] = [1, 2, 95,84,15]
df['val3'] = [1, 3, 85,74,25]
df['val4'] = [1, 2, 75,64,5]
df['val5'] = [1, 1, 65,24,35]
df['val6'] = [1, 6, 55,14,45]
df['val7'] = [1, 3, 45,34,5]
df['val8'] = [1, 9, 35,44,55]
df['val9'] = [1, 3, 25,24,75]
df['val10'] = [1, 9, 5,14,25]

I am using tslearn.clustering for the clustering as follows, and it works:

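from tslearn.clustering import TimeSeriesKMeans
import numpy as np

n_clusters = 5

# One row per series: shape (n_series, series_length)
ts = df.to_numpy()

# tslearn expects shape (n_series, series_length, n_dimensions)
ts_input = ts[:, :, np.newaxis]

sz, len_series, d = ts_input.shape
km = TimeSeriesKMeans(n_clusters=n_clusters, metric="euclidean", max_iter=50,
                      max_iter_barycenter=200)

# Cluster label for each row of df
km_pred = km.fit_predict(ts_input)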

However, I want to use the variance and the maximum of each row as input features to help the clustering, but I don't know how to include them. Could you please help me incorporate these two features into my method? Thanks.
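One possible way to do this, sketched here under some assumptions, is to compute the two summary features per row, scale them, and append them as extra columns to the series before clustering. This sketch uses scikit-learn's KMeans in place of TimeSeriesKMeans, on the assumption that with metric="euclidean" the two behave much the same on flattened series. The global scaling of the raw values, the `feature_weight` knob, and the choice of `n_clusters = 2` are illustrative assumptions, not part of the original method.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Same toy data as in the question: 5 series (rows) x 10 time steps (columns).
df = pd.DataFrame({
    'val1': [1, 3, 95, 34, 5],
    'val2': [1, 2, 95, 84, 15],
    'val3': [1, 3, 85, 74, 25],
    'val4': [1, 2, 75, 64, 5],
    'val5': [1, 1, 65, 24, 35],
    'val6': [1, 6, 55, 14, 45],
    'val7': [1, 3, 45, 34, 5],
    'val8': [1, 9, 35, 44, 55],
    'val9': [1, 3, 25, 24, 75],
    'val10': [1, 9, 5, 14, 25],
})

ts = df.to_numpy(dtype=float)

# Per-row summary features. Note: NumPy's var uses ddof=0 (population variance),
# whereas pandas' df.var(axis=1) defaults to ddof=1.
row_var = ts.var(axis=1)
row_max = ts.max(axis=1)
features = np.column_stack([row_var, row_max])

# Scale the raw values with one global mean/std so each series keeps its shape
# (assumption: the series share the same unit).
ts_scaled = (ts - ts.mean()) / ts.std()

# Standardize each summary feature so neither dominates the distance.
features_scaled = StandardScaler().fit_transform(features)

# Hypothetical knob: values > 1 make the summary features count more.
feature_weight = 1.0

# Combined input: 10 time steps + 2 summary features per row.
X = np.hstack([ts_scaled, feature_weight * features_scaled])

n_clusters = 2  # illustrative choice for this tiny sample
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
labels = km.fit_predict(X)
print(labels)
```

If you prefer to stay with tslearn, the combined matrix `X` could presumably be passed to TimeSeriesKMeans after adding the trailing axis (`X[:, :, np.newaxis]`), as in the question. That only makes sense with the euclidean metric, though: an alignment-based metric such as DTW would treat the two appended columns as extra time steps.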



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
