Why doesn't mean squared error work in the case of angular data?

Suppose the following is a dataset for a regression problem:

H   -9.118   5.488   5.166   4.852   5.164   4.943   8.103  -9.152   7.470   6.452   6.069   6.197   6.434   8.264   9.047   2.222
H    5.488   5.166   4.852   5.164   4.943   8.103  -9.152  -8.536   6.452   6.069   6.197   6.434   8.264   9.047  11.954   2.416
C    5.166   4.852   5.164   4.943   8.103  -9.152  -8.536   5.433   6.069   6.197   6.434   8.264   9.047  11.954   6.703   3.028
C    4.852   5.164   4.943   8.103  -9.152  -8.536   5.433   4.924   6.197   6.434   8.264   9.047  11.954   6.703   6.407  -1.235
C    5.164   4.943   8.103  -9.152  -8.536   5.433   4.924   5.007   6.434   8.264   9.047  11.954   6.703   6.407   6.088  -0.953
H    4.943   8.103  -9.152  -8.536   5.433   4.924   5.007   5.057   8.264   9.047  11.954   6.703   6.407   6.088   6.410   2.233
H    8.103  -9.152  -8.536   5.433   4.924   5.007   5.057   5.026   9.047  11.954   6.703   6.407   6.088   6.410   6.206   2.313
H   -9.152  -8.536   5.433   4.924   5.007   5.057   5.026   5.154  11.954   6.703   6.407   6.088   6.410   6.206   6.000   2.314
H   -8.536   5.433   4.924   5.007   5.057   5.026   5.154   5.173   6.703   6.407   6.088   6.410   6.206   6.000   6.102   2.244
H    5.433   4.924   5.007   5.057   5.026   5.154   5.173   5.279   6.407   6.088   6.410   6.206   6.000   6.102   6.195   2.109

The left-most column is a class label; the remaining columns are all angular features, except the right-most column, which is the regression target.

My initial setup for the model was as follows:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential

LEARNING_RATE = 0.001  # global referenced below; value as used in Solution 3

def create_model(n_hidden_1, n_hidden_2, num_features):
    # create the model
    model = Sequential()
    model.add(tf.keras.layers.InputLayer(input_shape=(num_features,)))
    model.add(tf.keras.layers.Dense(n_hidden_1, activation='relu'))
    model.add(tf.keras.layers.Dense(n_hidden_2, activation='relu'))
    model.add(tf.keras.layers.Dense(1))  # single linear output for regression

    # instantiate the optimizer
    opt = keras.optimizers.Adam(learning_rate=LEARNING_RATE)

    # compile the model
    model.compile(
        loss="mean_squared_error",
        optimizer=opt,
        metrics=["mean_squared_error"]
    )

    return model

This model didn't produce accurate predictions.

Someone told me that MSE doesn't work with angular data, and that I need to use a custom output layer and a custom error function.

Why doesn't mean squared error work in the case of angular data?

How can I solve this issue?



Solution 1:[1]

Data that represent angles, like 180 degrees, cause problems for most loss functions because those functions have no notion of degrees or radians. MSE computes a huge error between 0 and 359 even though 0 = 360: it simply doesn't understand that angles wrap around.

There are a number of ways to fix this, depending on what you want to predict. The easiest is to transform your data with the sine function and train on the transformed values; you then apply the inverse function to your predictions.
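
For example, a minimal sketch of that round trip (illustrative values, angles in radians; note that arcsine only recovers angles in [-π/2, π/2], so predicting a sine/cosine pair is often the safer variant):

import numpy as np

# Targets in radians (illustrative values only).
angles = np.array([0.1, 1.2, -0.5])

# Transform the targets before training ...
y_transformed = np.sin(angles)

# ... and apply the inverse to the model's predictions afterwards.
y_recovered = np.arcsin(y_transformed)

print(np.allclose(angles, y_recovered))  # True for angles in [-pi/2, pi/2]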

The other option is to customise the MSE loss function itself so that it transforms its inputs with the sine function.
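
A minimal sketch of such a custom loss, assuming targets and predictions are angles in radians and comparing their sine/cosine embeddings on the unit circle (the name angular_mse is illustrative, not a Keras built-in):

import tensorflow as tf

def angular_mse(y_true, y_pred):
    # Embed both angles on the unit circle and compare component-wise,
    # so that 0 and 2*pi yield zero error instead of a huge one.
    sin_err = tf.sin(y_true) - tf.sin(y_pred)
    cos_err = tf.cos(y_true) - tf.cos(y_pred)
    return tf.reduce_mean(tf.square(sin_err) + tf.square(cos_err))

# Usage: model.compile(loss=angular_mse, optimizer=opt)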

Solution 2:[2]

I am assuming that by "angular" you mean some representation of an angle. If so, MSE does not work well because it has no concept of 0 == 360 (or the equivalent wrap-around in radians): predicting 359.999999 for a correct label of 0 creates a huge error, when it should produce a tiny one.
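
A quick numeric illustration of that point; wrapping the difference into [-180, 180) is one common way to measure circular distance (this snippet is only a sketch of the comparison):

# Squared error as MSE sees it: 359.999999 and 0 look maximally far apart.
plain = (359.999999 - 0.0) ** 2                    # ~129600

# Squared error on the wrapped (circular) difference: almost zero.
diff = (359.999999 - 0.0 + 180.0) % 360.0 - 180.0  # -> about -1e-6
wrapped = diff ** 2                                # ~1e-12

print(plain, wrapped)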

Solution 3:[3]

import tensorflow as tf
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def create_model(n_hidden_1, n_hidden_2, num_features):
    # create the model
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.InputLayer(input_shape=(num_features,)))
    model.add(tf.keras.layers.Dense(n_hidden_1, activation='relu'))
    # ReLU zeroes out negative activations and can discard information here,
    # so the second hidden layer uses sigmoid instead.
    model.add(tf.keras.layers.Dense(n_hidden_2, activation='sigmoid'))
    model.add(tf.keras.layers.Dense(1))

    # instantiate the optimizer
    opt = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)

    # compile the model with the optimizer instantiated above
    model.compile(
        loss="mean_squared_error",
        optimizer=opt,
    )

    return model

ss = [['H',-9.118,5.488,5.166,4.852,5.164,4.943,8.103,-9.152,7.470,6.452,6.069,6.197,6.434,8.264,9.047, 2.222],
['H',5.488,5.166,4.852,5.164,4.943,8.103,-9.152,-8.536,6.452,6.069,6.197,6.434,8.264,9.047,11.954, 2.416],
['C',5.166,4.852,5.164,4.943,8.103,-9.152,-8.536,5.433,6.069,6.197,6.434,8.264,9.047,11.954,6.703, 3.028],
['C',4.852,5.164,4.943,8.103,-9.152,-8.536,5.433,4.924,6.197,6.434,8.264,9.047,11.954,6.703,6.407,-1.235],
['C',5.164,4.943,8.103,-9.152,-8.536,5.433,4.924,5.007,6.434,8.264,9.047,11.954,6.703,6.407,6.088,-0.953],
['H',4.943,8.103,-9.152,-8.536,5.433,4.924,5.007,5.057,8.264,9.047,11.954,6.703,6.407,6.088,6.410, 2.233],
['H',8.103,-9.152,-8.536,5.433,4.924,5.007,5.057,5.026,9.047,11.954,6.703,6.407,6.088,6.410,6.206, 2.313],
['H',-9.152,-8.536,5.433,4.924,5.007,5.057,5.026,5.154,11.954,6.703,6.407,6.088,6.410,6.206,6.000, 2.314],
['H',-8.536,5.433,4.924,5.007,5.057,5.026,5.154,5.173,6.703,6.407,6.088,6.410,6.206,6.000,6.102, 2.244],
['H',5.433,4.924,5.007,5.057,5.026,5.154,5.173,5.279,6.407,6.088,6.410,6.206,6.000,6.102,6.195, 2.109]]

data = pd.DataFrame(ss)
y = data.iloc[:, -1:]   # last column: regression target
x = data.iloc[:, 1:-1]  # middle columns: angular features (class label dropped)

# scale features and targets with separate scalers to speed up convergence
x_scaler = MinMaxScaler().fit(x)
x = x_scaler.transform(x)

y_scaler = MinMaxScaler().fit(y)
y = y_scaler.transform(y)

# model 
LEARNING_RATE = 0.001
model = create_model(n_hidden_1=4, n_hidden_2=2, num_features=15)

model.fit(x, y, epochs=1000)

# predict
y_pre = model.predict(x)
print('predict: ', y_scaler.inverse_transform(y_pre))
print('y: ', y_scaler.inverse_transform(y))
predict:  [[ 2.2238724 ]
 [ 2.415551  ]
 [ 3.0212667 ]
 [-1.1861311 ]
 [-0.98702306]
 [ 2.2277246 ]
 [ 2.3132346 ]
 [ 2.3148017 ]
 [ 2.235104  ]
 [ 2.1206288 ]]
y:  [[ 2.222]
 [ 2.416]
 [ 3.028]
 [-1.235]
 [-0.953]
 [ 2.233]
 [ 2.313]
 [ 2.314]
 [ 2.244]
 [ 2.109]]

I wrote and annotated the code above. Judging by the results, the difference between the predictions and the targets is very small. A friendly tip: watch out for overfitting, since the model is trained and evaluated on the same ten rows.
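
As a hedge against that risk, one common pattern is early stopping on a held-out split (a sketch only; with ten rows the validation split is tiny, so treat it as illustrative):

# Stop training when validation loss stops improving and keep the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=50, restore_best_weights=True)

model.fit(x, y, epochs=1000, validation_split=0.2, callbacks=[early_stop])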

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: former_Epsilon
Solution 2: lejlot
Solution 3: lazy