How to remove a 2D array from a 3D array if it contains NA values
I am working on a seq2seq machine learning problem with Conv1D and LSTM. To do this I must produce an input tensor of shape (samples, timesteps, features). Aside from the problems I was having with the LSTM layer (a different topic), I find myself struggling to delete a 2D slice of my 3D input tensor if it contains NA value(s). I want to delete the entire sample if any feature, at any timestep, is NA.
Up until now, to keep it simple, I was working with univariate data, and my solution was to transform my array into a pandas DataFrame and use df.dropna(axis=0) to drop the entire sample. However, that function only works with 2D DataFrames. I've tried looping over my samples to produce 2D arrays that I can then convert into pandas DataFrames, but got stuck trying to stack the 2D arrays back together. And I figured there has GOT to be a cleaner way to go about this. So I found this example:
import numpy as np

x = np.array([[[1,2,3], [4,5,np.nan]], [[7,8,9], [10,11,12]]])
print("Original array:")
print(x)
print("Remove all non-numeric elements of the said array")
print(x[~np.isnan(x).any(axis=2)])
This works for 2D arrays, and I assumed it would work with any number of dimensions, but I was wrong. I don't understand what I'm doing wrong here. For completeness' sake, here is my function that successfully deletes an input and its corresponding output from X_train and y_train if either X_train OR y_train contains NA value(s). It only works for univariate data, because the third dimension of the X_train tensor then has size 1 and can be dropped:
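import numpy as np
import pandas as pd

def drop_days_with_na(df, df1):
    df_shape = df.shape
    # Drop the singleton feature axis so X becomes 2D: (samples, timesteps).
    df = df.reshape(df.shape[0], df.shape[1])
    # Put X and y side by side so a single dropna removes both together.
    df = np.concatenate((df, df1), axis=1)
    df = pd.DataFrame(df)
    na_index = df.isna()
    df = df.dropna(axis=0)
    df = np.array(df)
    df = df.reshape(df.shape[0], df.shape[1], 1)
    # Split the columns back into y (df1) and X (df).
    df1 = df[:, df_shape[1]:, :]
    df1 = df1.reshape(df1.shape[0], df1.shape[1])
    df = df[:, :df_shape[1], :]
    return df, df1, na_index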
Solution 1:[1]
This solved my problem:
import numpy as np

def remove_nan(X, Y):
    x = []
    y = []
    for sample in range(X.shape[0]):
        # Skip the whole sample if its input or its target contains any NaN.
        if np.isnan(X[sample, :, :]).any() or np.isnan(Y[sample, :]).any():
            continue
        x.append(X[sample, :, :])
        y.append(Y[sample, :])
    x = np.array(x)
    y = np.array(y)
    return x, y

x_train, y_train = remove_nan(x_train, y_train)
x_test, y_test = remove_nan(x_test, y_test)
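For what it's worth, the same filtering can probably be done without a Python loop. The mask in the question reduces only over axis=2, so it has shape (samples, timesteps) and indexing with it drops individual timesteps rather than whole samples. Reducing over every axis except the first gives one flag per sample instead. The sketch below assumes X is 3D and Y is 2D, as in the code above; the name remove_nan_vectorized and the toy y array are made up for illustration.

```python
import numpy as np

def remove_nan_vectorized(X, Y):
    """Drop every sample whose X slice or Y row contains at least one NaN.

    Hypothetical helper; assumes X has shape (samples, timesteps, features)
    and Y has shape (samples, outputs).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Reduce over all axes except the sample axis: one boolean per sample.
    bad_x = np.isnan(X).any(axis=tuple(range(1, X.ndim)))
    bad_y = np.isnan(Y).any(axis=tuple(range(1, Y.ndim)))
    keep = ~(bad_x | bad_y)
    # Boolean indexing on the first axis keeps or drops whole samples.
    return X[keep], Y[keep], keep

# Toy data: the array from the question plus an invented target array.
x = np.array([[[1, 2, 3], [4, 5, np.nan]],
              [[7, 8, 9], [10, 11, 12]]])
y = np.array([[0.0, 1.0], [2.0, 3.0]])

x_clean, y_clean, keep = remove_nan_vectorized(x, y)
print(x_clean.shape)  # (1, 2, 3)
print(y_clean.shape)  # (1, 2)
```

The returned keep mask plays a role similar to na_index in the question's function: it records which samples survived, so the same selection can be applied to any other array aligned with X.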
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |