python - sklearn: Naive Bayes classifier gives low accuracy


I have a dataset of 200,000 labelled training examples. Each training example has 10 features, including both continuous and discrete ones. I'm trying to use Python's sklearn package to train a model and make predictions, but I have some trouble (and some questions too).

First, here is the code I have written so far:

    from sklearn.naive_bayes import GaussianNB

    # data contains 200 000 examples
    # targets contains the corresponding label for each training example
    gnb = GaussianNB()
    gnb.fit(data, targets)
    predicted = gnb.predict(data)

The problem is low accuracy (too many misclassified labels), around 20%. I'm not quite sure whether the problem is with the data (e.g. more data is needed, or something else) or with the code.

Is this the proper way to implement a naive Bayes classifier for a dataset with both discrete and continuous features?

Furthermore, I know that in machine learning the dataset should be split into training and validation/testing sets. Is this performed automatically by sklearn, or should I fit the model on the training set and then call predict on the validation set?

Any thoughts or suggestions would be appreciated.

> The problem is low accuracy (too many misclassified labels), around 20%. I'm not quite sure whether the problem is with the data (e.g. more data is needed, or something else) or with the code.

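This is not a big error for naive Bayes. It is an extremely simple classifier and you should not expect it to be strong, so more data most likely won't help. The Gaussian estimates are probably already good; the naive assumption that features are independent given the class is the problem. Use a stronger model. You can start with a random forest, since it is easy to use even for non-experts in the field.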
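As a rough sketch of that suggestion, the snippet below fits GaussianNB and a random forest on the same split and compares held-out accuracy. The data is a synthetic stand-in generated with make_classification (an assumption, since the original dataset is not shown), so the printed numbers say nothing about the real problem; only the workflow carries over.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the real data (assumption): 10 features, 5 classes.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=6, n_redundant=2,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# Hold out 25% of the examples so accuracy is measured on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "GaussianNB": GaussianNB(),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the held-out set
    print(f"{name}: held-out accuracy = {scores[name]:.3f}")
```

Swapping in the real data and targets arrays for X and y should be the only change needed, assuming the discrete features are already encoded as numbers.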

> Is this the proper way to implement a naive Bayes classifier for a dataset with both discrete and continuous features?

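No, it is not. You should use different distributions for the discrete features, and scikit-learn does not support mixing distributions within a single naive Bayes model, so you would have to do it manually. As I said before, change the model.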
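If you still want to try the manual route, one possible sketch is to fit GaussianNB on the continuous columns and CategoricalNB on the discrete ones, then add their log-posteriors and subtract the class log-prior once, because each model has already included it. The data, the column split, and the names X_cont and X_disc are all hypothetical; the discrete features are assumed to be integer-encoded starting at 0, with every category present in the training part.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB, GaussianNB

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 3, size=n)  # 3 hypothetical classes

# Hypothetical features: 4 continuous columns and 3 discrete columns (codes 0..3).
X_cont = rng.normal(loc=y[:, None], scale=1.0, size=(n, 4))
X_disc = (y[:, None] + rng.integers(0, 2, size=(n, 3))) % 4

train, test = slice(0, 800), slice(800, None)

# One naive Bayes model per feature type.
gnb = GaussianNB().fit(X_cont[train], y[train])
cnb = CategoricalNB().fit(X_disc[train], y[train])

# Each model returns log P(y | its own features). Their sum contains the class
# prior twice, so subtract it once. The result is an unnormalised log-posterior,
# which is enough for picking the most likely class.
joint = (
    gnb.predict_log_proba(X_cont[test])
    + cnb.predict_log_proba(X_disc[test])
    - np.log(gnb.class_prior_)
)
predicted = gnb.classes_[np.argmax(joint, axis=1)]
accuracy = np.mean(predicted == y[test])
print(f"combined naive Bayes accuracy on held-out rows: {accuracy:.3f}")
```

This still relies on the naive independence assumption across all features, so it addresses the choice of distribution but not the weakness of the model itself.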

> Furthermore, I know that in machine learning the dataset should be split into training and validation/testing sets. Is this performed automatically by sklearn, or should I fit the model on the training set and then call predict on the validation set?

Nothing is done automatically in this respect; you need to do it on your own (scikit-learn has lots of tools for this, see the cross-validation utilities).
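A minimal sketch of doing the split yourself, using train_test_split for a single hold-out set and cross_val_score for k-fold cross-validation, both from sklearn.model_selection. The data and targets arrays here are synthetic placeholders (an assumption) standing in for the arrays from the question.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB

# Placeholder data (assumption): replace with the real data and targets arrays.
data, targets = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=0
)

# Option 1: a single hold-out split. Fit on the training part only,
# then evaluate on the validation part the model has never seen.
X_train, X_val, y_train, y_val = train_test_split(
    data, targets, test_size=0.2, random_state=0
)
gnb = GaussianNB()
gnb.fit(X_train, y_train)
val_accuracy = gnb.score(X_val, y_val)
print(f"validation accuracy: {val_accuracy:.3f}")

# Option 2: 5-fold cross-validation. The model is refit on each training fold
# and scored on the corresponding held-out fold.
cv_scores = cross_val_score(GaussianNB(), data, targets, cv=5)
print(f"cross-validation accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

Note that the code in the question calls predict on the same data it was fitted on, so the accuracy it reports is training accuracy rather than an estimate of performance on new examples.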

