Using hmmlearn to forecast a time series always returns the same value

I am trying to roughly reproduce the results of this paper: https://users.cs.duke.edu/~bdhingra/papers/stock_hmm.pdf using the hmmlearn package in Python. In short, the paper uses a Gaussian mixture hidden Markov model (GMHMM) to predict the Close value of a stock given its Open value for that day. In more detail, the authors use historical data for the Open, High, Low and Close values of a stock to define the vector of fractional changes:

[Image: definition of the observation vector (fracChange, fracHigh, fracLow)]
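As a minimal illustration of these quantities, the sketch below computes the three fractional changes for a single made-up trading day. It assumes the same sign convention as the code further down in this question (fracChange is (Open - Close)/Open); the prices and the helper name `fractional_changes` are hypothetical.

```python
def fractional_changes(open_, high, low, close):
    """Return (fracChange, fracHigh, fracLow) for one day, relative to the open."""
    # Sign convention assumed from the code in this question.
    frac_change = (open_ - close) / open_
    frac_high = (high - open_) / open_
    frac_low = (open_ - low) / open_
    return frac_change, frac_high, frac_low


# Hypothetical prices, for illustration only.
example = fractional_changes(open_=100.0, high=103.0, low=98.0, close=101.0)
print(example)
```

Each component is expressed relative to the opening price, so the three values are comparable across days with different price levels.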

Then they train the GMHMM on the historical data and finally use a maximum a posteriori (MAP) method to predict the next Close value, after observing the Open value for a given day and the data for the previous n days (specifically, they take n=10). Mathematically, this is equivalent to the maximization problem:

[Image: MAP maximization over the next-day observation vector]
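The formula from the image is not reproduced here. A common way to write such a MAP step is sketched below, assuming $O_1, \dots, O_d$ are the observed vectors in the window, $O_{d+1}$ is the candidate next-day vector, and $\lambda$ denotes the trained model parameters. This notation is an assumption and is not taken from the image.

```latex
\hat{O}_{d+1} = \arg\max_{O_{d+1}} \; P(O_1, \dots, O_d, O_{d+1} \mid \lambda)

% The joint log-likelihood splits into a window term and a candidate term:
\log P(O_1, \dots, O_d, O_{d+1} \mid \lambda)
  = \log P(O_1, \dots, O_d \mid \lambda)
  + \log P(O_{d+1} \mid O_1, \dots, O_d, \lambda)
```

The first term on the right does not depend on the candidate, so maximizing the joint likelihood over $O_{d+1}$ is the same as maximizing the conditional likelihood of the candidate given the window.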

For the last step, in practice, they discretize the space of possible values of the three variables fracChange, fracHigh and fracLow, then estimate the log-likelihood using the forward-backward algorithm for every possible discrete value, and select the one that maximizes it as the prediction for the observation vector of day d+1.
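The grid search described above can be sketched independently of any HMM library. In this sketch, `score` stands for any function that returns the log-likelihood of a sequence (for example, a fitted model's `.score` method). The helper names and the toy scorer are assumptions for illustration, not the paper's implementation.

```python
from itertools import product


def build_candidates(change_grid, high_grid, low_grid):
    """Enumerate every (fracChange, fracHigh, fracLow) combination on the grid."""
    return [list(c) for c in product(change_grid, high_grid, low_grid)]


def map_predict(score, window, candidates):
    """Return the candidate that maximizes score(window + [candidate])."""
    return max(candidates, key=lambda c: score(window + [c]))


def toy_score(seq):
    # Hypothetical stand-in for a model's log-likelihood:
    # it favours a last observation close to the one before it.
    prev, last = seq[-2], seq[-1]
    return -sum((a - b) ** 2 for a, b in zip(prev, last))


candidates = build_candidates([-0.02, 0.0, 0.02], [0.0, 0.01], [0.0, 0.01])
window = [[0.0, 0.0, 0.0], [0.02, 0.01, 0.0]]
print(map_predict(toy_score, window, candidates))
```

Keeping the scorer as a parameter makes it easy to swap the toy function for a real model and to test the selection logic on its own.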

Now to my issues. In Python, I first load and reshape the historical data (downloaded from Yahoo Finance, as in the paper, for Apple stock: 10 Feb 2003 to 10 Sept 2004 for the training set and 13 Sept 2004 to 21 Jan 2005 for the test set, saved as the CSV files "AAPL_train_set.csv" and "AAPL_test_set.csv") and then successfully train a GMHMM on it. Next, I construct a list of discretized candidate predictions for the next day and, given the observed values for the 10 previous days, I want to select the candidate that maximizes the log-likelihood. However, whichever 10 days of data I use, I always get the same prediction for the next day, which makes no sense. To calculate the log-likelihood I use the .score() method. In detail, here is my code:

import numpy as np
import pandas as pd
from hmmlearn import hmm

# Load train data and process. Set correct filepath
filepath_train= "...(link to).../AAPL_train_set.csv"
df_train=pd.read_csv(filepath_train)
obs_train_unprocessed=df_train[["Open", "High", "Low", "Close"]]
open_train=obs_train_unprocessed["Open"]
# Fractional changes relative to the opening price, one row per day
trainData=pd.DataFrame({
    'fracChange':(open_train-obs_train_unprocessed["Close"])/open_train,
    'fracHigh':(obs_train_unprocessed["High"]-open_train)/open_train,
    'fracLow':(open_train-obs_train_unprocessed["Low"])/open_train,
}).to_numpy()

# Load test data and process
filepath_test="...(link to).../AAPL_test_set.csv"
df_test=pd.read_csv(filepath_test)
obs_test_unprocessed=df_test[["Open", "High", "Low", "Close"]]
open_test=obs_test_unprocessed["Open"]
# Fractional changes relative to the opening price, one row per day
testData=pd.DataFrame({
    'fracChange':(open_test-obs_test_unprocessed["Close"])/open_test,
    'fracHigh':(obs_test_unprocessed["High"]-open_test)/open_test,
    'fracLow':(open_test-obs_test_unprocessed["Low"])/open_test,
}).to_numpy()

# Train the model
model = hmm.GMMHMM(n_components=3, n_mix=3, covariance_type="full", n_iter=1000)
modelTrained=model.fit(trainData)

# List of potential prediction values
potential_prediction = [np.linspace(-0.1,0.1,51), np.linspace(0, 0.1, 11), np.linspace(0, 0.1, 11)]
# Cartesian product of the three grids: one candidate [fracChange, fracHigh, fracLow] per row
list_of_potential_predictions = np.stack(np.meshgrid(*potential_prediction, indexing="ij"), axis=-1).reshape(-1, 3)

# For the test set and a window of 10 days, choose the most probable value from the list of potential predictions using the .score() method.
predictions=[]
for j in range(5):
    window=testData[j:(j+10), 0:3]
    # Log-likelihood of the 10-day window followed by each candidate
    scores=np.array([modelTrained.score(np.vstack([window, [i]])) for i in list_of_potential_predictions])
    maxScoreIndex=np.argmax(scores)
    predictions.append(list_of_potential_predictions[maxScoreIndex])
predictions=np.array(predictions)

However, all the predictions I get are the same, no matter what the past data is. At this point I am confused and not sure whether there is a mistake in my code or whether I am misusing the .score() method from the hmmlearn package. Could someone help me fix this? Thank you in advance.
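One way to investigate this, sketched under assumptions: subtracting the window's own log-likelihood from the score of the window plus a candidate isolates the part of the score that depends on the candidate. If that conditional score is nearly the same function of the candidate for every window, the best candidate will be the same too. Here `score` is any log-likelihood function (such as a fitted model's `.score`), and `toy_score` is a hypothetical memoryless scorer on scalar observations, used only to show the effect. Whether this is what happens with the model above is not confirmed.

```python
def conditional_scores(score, window, candidates):
    """Log-likelihood of each candidate given the window (up to float error)."""
    base = score(window)  # constant across candidates
    return [score(window + [c]) - base for c in candidates]


def toy_score(seq):
    # Hypothetical memoryless log-likelihood: each observation is scored
    # independently, so the past cannot influence the best next value.
    return -sum(x ** 2 for x in seq)


candidates = [-0.02, 0.0, 0.02]
for window in ([0.05, 0.04, 0.03], [-0.05, -0.04, -0.03]):
    cond = conditional_scores(toy_score, window, candidates)
    best = candidates[cond.index(max(cond))]
    print(window, best)
```

Running the same comparison with the real model's `.score` on a few different windows would show whether the conditional scores change with the history at all, or whether the grid is too coarse to reveal the difference.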



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
