scikit-learn - Python Data Prediction
I have data on Intel processors that I am presenting in different types of graphs.
I want to make a regression that fits an exponential function, so that I can "predict" the transistor count for years after the last year I have in the data (max: 2014), e.g. 2019, 2021, 2030.
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    data = pd.read_csv('data.csv', delimiter=',')
    x = np.array(data['year'], dtype=int)
    y = np.array(data['count'], dtype=int)
    print(x)
    print(y)

    plt.plot(x, y, 'bo')
    plt.xlabel('year')
    plt.ylabel('transistor count')
    plt.yscale('log')
    plt.grid(True)
    plt.show()

This code shows:
transistor count (image)
What I have tried is to use scikit-learn to create the regression, but I end up with the wrong settings for an exponential function. I have also tried online regression tools, but they aren't accurate.
I have looked at the sklearn cheat sheet to find the correct classifier to use, but I am not very familiar with sklearn yet. I have tried by myself for 2 days, with a lot of searching on both Google and Stack Overflow, but I haven't found anything that works for my data.
This is the data inside the 2 arrays:

    [1971 1972 1974 1976 1978 1982 1985 1989 1993 1995 1997 1999 2000 2002 2006 2008 2010 2011 2012 2013 2014 2014]
    [ 2300 3500 4400 6500 29000 134000 275000 1180235 3100000 5500000 7500000 9500000 42000000 220000000 291000000 731000000 1170000000 2270000000 3100000000 1860000000 4310000000 5560000000]
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.svm import SVC

    data = pd.read_csv('data.csv', delimiter=',')
    x = np.array(data['year'], dtype=int)
    y = np.array(data['count'], dtype=int)
    x = np.reshape(x, (x.size, 1))

    clf = SVC()
    clf.fit(x, y)

    for i in range(1971, 2030, 1):
        print(i, ':', clf.predict([[i]]))

With this code, the program predicts the same value for every year after 2014 as it does for 2014, i.e. (2015-2030).value === 2014.value.
I am not sure whether there are settings I can add to the classifier, or whether I just don't have enough knowledge of ML for this.
In general, you can perform extrapolation, but you should be wary of the results. I wouldn't trust them blindly. For example, if you had 3 years of data, would you put a lot of credence in an extrapolation of 1000 years? It's possible.
But why use a classifier if you're performing regression? Try using linear regression instead, since there's a near-linear fit on the log scale. Model it as log(y) = a + bx.
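To spell out why a straight-line fit on the log scale corresponds to an exponential in the original units, here is a short derivation. It assumes log is the natural logarithm (as in np.log); the doubling time T_2 is a symbol introduced here for illustration and is not part of the original answer.

```latex
\begin{align*}
\log y &= a + b x \\
y &= e^{a + b x} = e^{a}\, e^{b x} \\
\frac{y(x + T_2)}{y(x)} &= e^{b T_2} = 2
  \quad\Longrightarrow\quad T_2 = \frac{\ln 2}{b}
\end{align*}
```

So the fitted slope b is the exponential growth rate per year, and e^a is the scale factor.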
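    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression  # linear regression

    data = pd.read_csv('data.csv', delimiter=',')
    x = np.array(data['year'], dtype=int)
    y = np.array(data['count'], dtype=float)  # float, since we take the log
    x = np.reshape(x, (x.size, 1))

    # Keep the intercept (the default) so the fit matches log(y) = a + bx;
    # fit_intercept=False would force a = 0.
    clf = LinearRegression()
    clf.fit(x, np.log(y))  # fit the log of y

    years = range(1971, 2030, 1)
    pred = []
    for i in years:
        # predict() returns log(count); exponentiate to get the count back
        pred.append(np.exp(clf.predict([[i]])[0]))

    plt.plot(years, pred, 'bo')
    plt.xlabel('year')
    plt.ylabel('transistor count')
    plt.yscale('log')  # same log axis as the plot in the question
    plt.grid(True)
    plt.show()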
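For readers who want to see what the log-linear fit does without installing anything, here is a minimal sketch using only the standard library. It applies the closed-form least-squares formulas to the data posted in the question and prints predictions for the years the question mentions (2019, 2021, 2030). The helper names fit_log_linear and predict_count are made up for this illustration, and the last step (refitting on data up to 2000 to see how well it would have predicted 2014) is an added sanity check, not something from the original answer.

```python
import math

# Data copied from the question: year and transistor count.
years = [1971, 1972, 1974, 1976, 1978, 1982, 1985, 1989, 1993, 1995, 1997,
         1999, 2000, 2002, 2006, 2008, 2010, 2011, 2012, 2013, 2014, 2014]
counts = [2300, 3500, 4400, 6500, 29000, 134000, 275000, 1180235, 3100000,
          5500000, 7500000, 9500000, 42000000, 220000000, 291000000,
          731000000, 1170000000, 2270000000, 3100000000, 1860000000,
          4310000000, 5560000000]


def fit_log_linear(xs, ys):
    """Ordinary least-squares fit of log(y) = a + b*x; returns (a, b)."""
    log_ys = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_ly = sum(log_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_ly) for x, ly in zip(xs, log_ys))
    b = sxy / sxx
    a = mean_ly - b * mean_x
    return a, b


def predict_count(a, b, year):
    """Map the fitted line back to a transistor count: y = exp(a + b*x)."""
    return math.exp(a + b * year)


a, b = fit_log_linear(years, counts)
doubling_time = math.log(2) / b  # years needed for the fitted curve to double
print(f"slope b = {b:.4f} per year, doubling time = {doubling_time:.2f} years")

for year in (2019, 2021, 2030):
    print(year, f"{predict_count(a, b, year):.3g}")

# Sanity check on extrapolation: fit only on data up to 2000, then compare
# the 2014 prediction with the counts actually observed in 2014.
early = [(x, y) for x, y in zip(years, counts) if x <= 2000]
a_early, b_early = fit_log_linear([x for x, _ in early], [y for _, y in early])
print("2014 predicted from data up to 2000:",
      f"{predict_count(a_early, b_early, 2014):.3g}",
      "| observed in 2014: 4.31e+09 and 5.56e+09")
```

The gap between the held-out prediction and the observed 2014 values gives a rough feel for how far such an extrapolation can drift, which is the reason for the caution above.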