machine learning - Is there a need to normalise the input vector for prediction in SVM?
I understand that for input data of different scales, the values used to train the classifier must be normalized for correct classification (SVM). So does the input vector for prediction also need to be normalized?

The scenario I have: the training data is normalized, serialized, and saved in a database. When a prediction is to be made, the serialized data is deserialized back into a normalized numpy array, the numpy array is fit on the classifier, and the input vector for prediction is passed in. Does this input vector need to be normalized as well? If so, how do I do it, since at prediction time I don't have the actual training data to normalize against?
I am also normalizing along axis=0, i.e. along the columns.
My code for normalizing:

preprocessing.normalize(data, norm='l2', axis=0)
Is there a way to serialize preprocessing.normalize?
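One way to see the problem with serializing this step: `sklearn.preprocessing.normalize` is a stateless function, so there are no fitted parameters to save. A minimal sketch with made-up data:

```python
import numpy as np
from sklearn import preprocessing

# Made-up data: rows are samples, columns are features.
data = np.array([[1.0, 200.0],
                 [3.0, 400.0],
                 [5.0, 600.0]])

# axis=0 scales each *column* to unit L2 norm.  normalize() keeps no
# learned parameters -- the column norms are recomputed from whatever
# array you pass in, so there is nothing to serialize, and it cannot
# be reapplied consistently to a single new input vector on its own.
normalized = preprocessing.normalize(data, norm='l2', axis=0)
```

This is why a stateful scaler (fit once on training data, reused at prediction time) is what you want instead.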
For SVMs, scaling the features is recommended for several reasons:
- Many optimization methods behave better when all features are on the same scale.
- Many kernel functions internally use the Euclidean distance to compare two samples (in the Gaussian kernel, the Euclidean distance appears in the exponential term). If every feature has a different scale, the Euclidean distance is dominated by the features with the largest scale.
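The second point can be seen numerically. A small sketch with hypothetical values, where one feature is on a ~1000x larger scale than the other:

```python
import numpy as np

# Two hypothetical samples: feature 1 is around ~1, feature 2 around ~1000.
a = np.array([1.0, 1000.0])
b = np.array([2.0, 2000.0])

# Raw Euclidean distance: the large-scale feature dominates almost entirely.
raw = np.linalg.norm(a - b)              # sqrt(1**2 + 1000**2) ~= 1000.0005

# After standardizing each feature (subtract mean, divide by std over the
# two samples), both features contribute equally to the distance.
X = np.vstack([a, b])
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
scaled = np.linalg.norm(Xs[0] - Xs[1])   # sqrt(2**2 + 2**2) ~= 2.83
```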
To put the features on the same scale, you must subtract the mean and divide by the standard deviation:
x_i -> (x_i - mu_i) / sigma_i
You must store the mean and standard deviation of every feature in the training set, so you can apply the same operations to future data.
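Since the question mentions storing things in a database: a fitted scaler holds exactly those per-feature means and standard deviations, and it can be pickled alongside the model. A sketch with made-up training values (the filename `scaler.pkl` is arbitrary):

```python
import pickle
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training matrix (made-up values): rows are samples, columns features.
X_train = np.array([[1.0, 200.0],
                    [3.0, 400.0],
                    [5.0, 600.0]])

scaler = StandardScaler().fit(X_train)   # learns per-feature mean_ and scale_

# Persist the fitted scaler (e.g. next to the model in your database).
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)

# At prediction time, restore it and apply the *training* statistics
# to the new input vector.
with open('scaler.pkl', 'rb') as f:
    restored = pickle.load(f)

new_sample = np.array([[2.0, 300.0]])
scaled_sample = restored.transform(new_sample)
```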
In Python, scikit-learn has a class that does this for you:

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
To obtain the means and standard deviations:

scaler = preprocessing.StandardScaler().fit(X)
To normalize the training set (X is a matrix where every row is a sample and every column a feature):

X = scaler.transform(X)
After training, you must normalize all future data in the same way before classification:
newdata = scaler.transform(newdata)
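Putting the whole flow together: fit the scaler on the training data only, train the SVM on the scaled data, then reuse the same fitted scaler on every new input vector before predicting. A sketch on synthetic (made-up) data where feature 2 is on a much larger scale:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Synthetic two-class data: feature 2 is on a ~100x larger scale.
X_train = rng.randn(40, 2) * [1.0, 100.0]
X_train[20:] += [3.0, 300.0]                    # shift the second class
y_train = np.repeat([0, 1], 20)

scaler = StandardScaler().fit(X_train)          # statistics from training data only
clf = SVC(kernel='rbf').fit(scaler.transform(X_train), y_train)

# At prediction time: transform the new vector with the *same* fitted scaler.
new_sample = np.array([[3.0, 310.0]])
pred = clf.predict(scaler.transform(new_sample))
```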