'Poor accuarcy score for Semi-Supervised Support Vector machine
I am using a Semi-Supervised approach for Support Vector Machine in Python for the image classification from PASCAL VOC 2007 data. I have tried with the default parameters from the libraries and also tuned them but it get extremely bad accuracy of about only ~ 2%.
Below is my code:
import pandas as pd
import numpy as np
from sklearn import decomposition
from sklearn.model_selection import train_test_split
from numpy import concatenate
import numpy as np
from sklearn import datasets
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn import decomposition
import warnings
warnings.filterwarnings("ignore")
color_layout_features = pd.read_pickle("color_layout_descriptor.pkl")
bow_surf = pd.read_pickle("bow_surf.pkl")
color_hist_features = pd.read_pickle("hist.pkl")
labels = pd.read_pickle("labels.pkl")
# Feat. Scaling
def scale(X, x_min, x_max):
nom = (X-X.min(axis=0))*(x_max-x_min)
denom = X.max(axis=0) - X.min(axis=0)
denom[denom==0] = 1
return x_min + nom/denom
# normalization
def normalize(x):
return (x - np.min(x))/(np.max(x) - np.min(x))
color_layout_features_scaled = scale(color_layout_features, 0, 1)
color_hist_features_scaled = scale(color_hist_features, 0, 1)
bow_surf_scaled = scale(bow_surf, 0, 1)
features = np.hstack([color_layout_features_scaled, color_hist_features_scaled, bow_surf_scaled])
# define dataset
X, Y = features, labels
X = normalize(X)
pca = decomposition.PCA(n_components=100)
pca.fit(X)
X = pca.transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.30, random_state=1, stratify=Y)
# split train into labeled and unlabeled
X_train_lab, X_test_unlab, y_train_lab, y_test_unlab = train_test_split(X_train, y_train, test_size=0.30, random_state=1, stratify=y_train)
# create the training dataset input
X_train_mixed = concatenate((X_train_lab, X_test_unlab))
# create "no label" for unlabeled data
nolabel = [-1 for _ in range(len(y_test_unlab))]
# recombine training dataset labels
y_train_mixed = concatenate((y_train_lab, nolabel))
from semisupervised import S3VM
model = S3VM(kernel="Linear", C = 1e-2, gamma = 0.5, lamU = 1.0, probability=True)
#model.fit(X_train_mixed, _train_mixed)
model.fit(np.vstack((X_train_lab, X_test_unlab)), np.append(y_train_lab, nolabel))
#model.fit(np.vstack((label_X_train, unlabel_X_train)), np.append(label_y_train, unlabel_y))
# predict
predict = model.predict(X_test)
acc = metrics.accuracy_score(y_test, predict)
# metric
print("accuracy", acc*100)
accuracy 2.6692291266282298
I am using a Transductive version of SVM (TSVM) from the semisupervised
library. But not sure what am I doing wrong so that even after tweaking the parameters I still get the same result. Any inputs would be helpful.
I refer https://github.com/rosefun/SemiSupervised/blob/master/semisupervised/TSVM.py to make the implementation. Any inputs would be helpful.
Solution 1:[1]
according to link Documentation "The unlabeled samples should be labeled as -1" . are you consider this ?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Ali karimi |