How to remove a 2D array from a 3D array if it contains NA values

I am working on a seq2seq machine learning problem with Conv1D and LSTM layers. To do this, I must produce an input tensor of shape (samples, timesteps, features). Aside from the problems I was having with the LSTM layer (a different topic), I am struggling to delete a 2D slice of my 3D input tensor if it contains NA value(s). I want to delete the entire sample if any feature at any timestep is NA.
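A NumPy-only sketch of that rule: build one boolean flag per sample by reducing `np.isnan` over both the timestep and feature axes, then index the first axis with it. It assumes `X` is a float array shaped (samples, timesteps, features); the values are made up for illustration. The last lines show that reducing only over `axis=2`, as the example further down does, yields one flag per (sample, timestep) pair, so boolean indexing returns the surviving timestep rows flattened into a 2D array rather than whole samples.

```python
import numpy as np

# Made-up input: 3 samples, 2 timesteps, 2 features
X = np.array([
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, np.nan], [7.0, 8.0]],
    [[9.0, 10.0], [11.0, 12.0]],
])

# One flag per sample: True if any timestep/feature of that sample is NaN
sample_has_nan = np.isnan(X).any(axis=(1, 2))
X_clean = X[~sample_has_nan]
print(sample_has_nan)  # [False  True False]
print(X_clean.shape)   # (2, 2, 2)

# Reducing only axis 2 gives one flag per (sample, timestep) pair instead;
# indexing with that 2D mask returns surviving timestep rows, flattened to 2D
step_has_nan = np.isnan(X).any(axis=2)
print(X[~step_has_nan].shape)  # (5, 2)
```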

Until now, to keep it simple, I was working with univariate data, and my solution was to convert my array into a pandas DataFrame and use its df.dropna(axis=0) method to drop the entire sample. However, that method only works on 2D DataFrames. I tried looping over my samples to produce 2D arrays that I could then convert into pandas DataFrames, but got stuck trying to stack the 2D arrays back together. I figured there has GOT to be a cleaner way to go about this, so I found this example:

import numpy as np

x = np.array([[[1,2,3], [4,5,np.nan]], [[7,8,9], [10,11,12]]])
print("Original array:")
print(x)
print("Remove all non-numeric elements of the said array")
print(x[~np.isnan(x).any(axis=2)])

which works for 2D arrays. I assumed it would work with any number of dimensions, but I was wrong, and I don't understand what I'm doing wrong here. For completeness' sake, here is my function that successfully deletes an input and its corresponding output from X_train and y_train if either X_train or y_train contains NA value(s). However, it only works for univariate data, because the third dimension of the X_train tensor has size 1 and can therefore be dropped:

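import numpy as np
import pandas as pd

def drop_days_with_na(df, df1):
    # df: X_train of shape (samples, timesteps, 1); df1: y_train of shape (samples, outputs)
    df_shape = df.shape
    # Drop the size-1 feature axis so X can be joined column-wise with y
    df = df.reshape(df.shape[0], df.shape[1])
    df = np.concatenate((df, df1), axis=1)
    df = pd.DataFrame(df)
    # NA mask of the combined data, taken before any rows are dropped
    na_index = df.isna()
    # Drop every sample (row) with at least one NA in its inputs or outputs
    df = df.dropna(axis=0)
    df = np.array(df)
    df = df.reshape(df.shape[0], df.shape[1], 1)
    # Split the columns back into outputs (df1) and inputs (df)
    df1 = df[:,df_shape[1]:,:]
    df1 = df1.reshape(df1.shape[0], df1.shape[1])
    df = df[:,:df_shape[1],:]
    return df, df1, na_index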
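The same pandas idea can be extended to multivariate input by flattening each sample's (timesteps, features) window into one row before appending the targets, then restoring the shape after `dropna`. A sketch, assuming `X` is (samples, timesteps, features) and `y` is 2D (samples, outputs); `drop_samples_with_na` is a hypothetical name, not from the original post.

```python
import numpy as np
import pandas as pd

def drop_samples_with_na(X, y):
    """Hypothetical multivariate variant of drop_days_with_na.

    X: (samples, timesteps, features); y: (samples, outputs).
    """
    n, steps, feats = X.shape
    # One row per sample: flattened input window followed by its targets
    combined = pd.DataFrame(np.concatenate((X.reshape(n, steps * feats), y), axis=1))
    # NA mask of the combined data, taken before any rows are dropped
    na_index = combined.isna()
    # Drop every sample with a NaN anywhere in its inputs or targets
    kept = combined.dropna(axis=0).to_numpy()
    # Split the columns and restore the (samples, timesteps, features) layout
    X_clean = kept[:, : steps * feats].reshape(-1, steps, feats)
    y_clean = kept[:, steps * feats :]
    return X_clean, y_clean, na_index

X = np.arange(24, dtype=float).reshape(3, 4, 2)
y = np.arange(6, dtype=float).reshape(3, 2)
X[0, 1, 1] = np.nan  # sample 0 has a NaN input
y[2, 0] = np.nan     # sample 2 has a NaN target
X_clean, y_clean, _ = drop_samples_with_na(X, y)
print(X_clean.shape, y_clean.shape)  # (1, 4, 2) (1, 2)
```

Flattening with `reshape` and restoring it afterwards only reinterprets the row-major layout, so no values move between samples; a 3D `y` would need the same flatten/restore treatment.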


Solution 1:[1]

This solved my problem:

import numpy as np

def remove_nan(X, Y):
    # Keep only samples where neither the input window nor the target contains NaN
    x = []
    y = []
    for sample in range(X.shape[0]):
        if np.isnan(X[sample, :, :]).any() or np.isnan(Y[sample, :]).any():
            continue  # skip this sample
        x.append(X[sample, :, :])
        y.append(Y[sample, :])
    return np.array(x), np.array(y)
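The loop can also be replaced by one boolean mask per array, combined with `|`. A sketch, assuming `X` and `Y` share their first (sample) axis; reducing over `tuple(range(1, a.ndim))` means a 3D `Y`, as seq2seq targets often are, works too. `remove_nan_vectorized` is a hypothetical name.

```python
import numpy as np

def remove_nan_vectorized(X, Y):
    """Hypothetical vectorized counterpart of remove_nan."""
    # Per-sample NaN flags: reduce over every axis except the sample axis
    x_bad = np.isnan(X).any(axis=tuple(range(1, X.ndim)))
    y_bad = np.isnan(Y).any(axis=tuple(range(1, Y.ndim)))
    keep = ~(x_bad | y_bad)
    # Boolean indexing on axis 0 keeps the trailing shape, even if no sample survives
    return X[keep], Y[keep]

rng = np.random.default_rng(0)
X = rng.random((4, 5, 3))
Y = rng.random((4, 2))
X[1, 2, 0] = np.nan  # sample 1 has a NaN input
Y[3, 1] = np.nan     # sample 3 has a NaN target
x_clean, y_clean = remove_nan_vectorized(X, Y)
print(x_clean.shape, y_clean.shape)  # (2, 5, 3) (2, 2)
```

One practical difference from the loop: if every sample is dropped, `np.array([])` in the loop version has shape (0,), while boolean indexing returns shape (0, timesteps, features), which downstream Keras code can still inspect.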

x_train, y_train = remove_nan(x_train, y_train)
x_test, y_test = remove_nan(x_test, y_test)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1