Upsampling using SMOTE in Python

I am trying to use SMOTE in Python to handle a highly imbalanced data set. After splitting the data into train and test sets, I generate synthetic samples with SMOTE and then fit an XGBoost classifier on the SMOTE-generated data. The model is supposed to predict probabilities for the original data set, but SMOTE has increased the number of samples. How do I get back to the original data set to predict the probabilities? My code is below:

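from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(X_final, Y_final, test_size=0.1, random_state=27)
# Oversample the training split only (older imbalanced-learn releases: ratio=1.0 and fit_sample)
sm = SMOTE(random_state=27, sampling_strategy=1.0)
X_final_sm, Y_final_sm = sm.fit_resample(X_train, y_train)
smote_xgb = XGBClassifier().fit(X_final_sm, Y_final_sm)
# These predictions cover the resampled rows, not the original data set
smote_pred = smote_xgb.predict(X_final_sm)
smote_pred_prob = smote_xgb.predict_proba(X_final_sm)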

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
