python - sklearn: Naive Bayes classifier gives low accuracy
I have a dataset that includes 200,000 labelled training examples. Each training example has 10 features, including both continuous and discrete ones. I'm trying to use the sklearn
package in Python in order to train a model and make predictions, but I am having troubles (and questions too).
First, let me write the code I have written so far:
    from sklearn.naive_bayes import GaussianNB

    # data contains 200 000 examples
    # targets contain the corresponding label for each training example
    gnb = GaussianNB()
    gnb.fit(data, targets)
    predicted = gnb.predict(data)
The problem is the low accuracy (too many misclassified labels) - around 20%. I am not quite sure whether there is a problem with the data (e.g. more data is needed, or something else) or with the code.
Is this the proper way to implement a Naive Bayes classifier given a dataset with both discrete and continuous features?
Furthermore, in machine learning I know that the dataset should be split into training and validation/testing sets. Is this automatically performed by sklearn, or should I fit the model using the training dataset and call predict using the validation set?
Any thoughts or suggestions are appreciated.
> The problem is the low accuracy (too many misclassified labels) - around 20%. I am not quite sure whether there is a problem with the data (e.g. more data is needed, or something else) or with the code.
This is not a big error for Naive Bayes. It is an extremely simple classifier and you should not expect it to be strong; more data won't help. Your Gaussian estimators are probably already good; it is the naive independence assumptions that are the problem. Use a stronger model. You can start with a random forest, since it is easy to use even for non-experts in the field.
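A minimal sketch of that suggestion, using a synthetic dataset as a stand-in for the asker's 200,000 examples (`data` and `targets` here are generated, not the real data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset: 10 features, as in the question
# (smaller sample here so the sketch runs quickly).
data, targets = make_classification(n_samples=2000, n_features=10,
                                    n_informative=6, random_state=0)

# Hold out a test set so accuracy is measured on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    data, targets, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # held-out accuracy
```

Random forests handle mixed continuous/discrete features without any distributional assumptions, which is exactly where Naive Bayes struggles here.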
> Is this the proper way to implement a Naive Bayes classifier given a dataset with both discrete and continuous features?
No, it is not. You should use different distributions for the discrete features, but scikit-learn does not support that out of the box, so you would have to do it manually. As said before - I would change the model.
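One way to do it manually, sketched here under assumed data (the column split into continuous and discrete blocks is hypothetical): fit `GaussianNB` on the continuous columns and `CategoricalNB` on the discrete ones, then combine the per-model log-posteriors, subtracting the class log-prior once so it is not counted twice. The per-sample evidence terms differ from the true joint by a constant per row, so the argmax over classes is unaffected.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB, GaussianNB

# Hypothetical split: 6 continuous columns, 4 discrete (small-integer) columns.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 500)
X_cont = rng.normal(loc=y[:, None], size=(500, 6))   # continuous features
X_disc = rng.integers(0, 3, (500, 4))                # discrete features

gnb = GaussianNB().fit(X_cont, y)
cnb = CategoricalNB().fit(X_disc, y)

# Add log-posteriors from both models; subtract the doubly counted log-prior.
log_prior = np.log(gnb.class_prior_)
joint = (gnb.predict_log_proba(X_cont)
         + cnb.predict_log_proba(X_disc)
         - log_prior)
pred = gnb.classes_[np.argmax(joint, axis=1)]
```

`CategoricalNB` expects the discrete features to be encoded as non-negative integers; `OrdinalEncoder` can produce that encoding for string-valued categories.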
> Furthermore, in machine learning I know that the dataset should be split into training and validation/testing sets. Is this automatically performed by sklearn, or should I fit the model using the training dataset and call predict using the validation set?
Nothing is done automatically in this manner; you need to do it on your own (scikit-learn has lots of tools for this - see the cross-validation packages).