python - Creating a matrix from a CSV file with numpy
I'm relatively new to Python.
I have a CSV file with contents like this:
    dsa dds fsdf dasdsa
    1  1  32.2  9   4
    1  2  53.2  8   2
    1  3  44.2  0   1
    1  4  12.3  3   2
    1  5  15.6  4   3
    2  1  12.3  3   2
    2  2  91.3  4   11
    2  3  32.3  5   33
    2  4  44.2  3   2
    2  5  55.2  4   1
    3  1  60.2  4   2
    3  2  80.2  1   15
    3  3  10.2  4   1
    3  4  99.2  8   3
    3  5  13.1  10  2
    4  1  32.3  19  2
    4  2  10.3  12  3
    4  3  52.3  22  4
    .  .  .     .   .
    .  .  .     .   .
I want the output to look like this:
         1     2     3     4     .  .  .
    1    32.2  53.2  44.2  12.3  .  .
    2    12.3  91.3  32.3  44.2  .  .
    3    60.2  80.2  10.2  99.2  .  .
    4    32.3  10.3  52.3  .     .  .
    .    .     .     .     .     .  .
    .    .     .     .     .     .  .
As you can see, I'm using only the first 3 columns of the CSV file, and I skipped the first row (rubbish data).
I'd like to use numpy for this, and thought this code would do the trick:
    from scipy.sparse import coo_matrix
    import numpy as np

    # Load the file, skip the header row, and keep the first three columns
    l, c, v = np.loadtxt('test.csv', skiprows=1, delimiter=',').T[:3, :]
    m = coo_matrix((v, (l-1, c-1)), shape=(l.max(), c.max()))
    print(m.toarray())
This works, but the first 2 columns of the CSV file are excluded from the output. The result turns out to be:
[32.2 53.2 44.2 12.3 12.3 91.3 32.3 44.2 60.2 80.2 10.2 99.2 32.3 10.3 52.3 .]
Any thoughts on how I can generate the matrix I need (the output shown above)? The CSV file is huge (it's got around 10k rows and columns), and I only need to use the first 3 columns.
Thanks heaps!
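For reference, the coo_matrix idea from the question can be sketched so that only the first three columns are read and the row/column labels are cast to integers before they are used as indices. This is a minimal sketch, not a definitive fix: the in-memory `sample` text is a made-up stand-in for test.csv, and it assumes the labels are 1-based integers as in the sample data.

```python
import io

import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical stand-in for test.csv: one header row to skip, then
# row label, column label, value, and two extra columns we ignore.
sample = io.StringIO(
    "a,b,c,d,e\n"
    "1,1,32.2,9,4\n"
    "1,2,53.2,8,2\n"
    "2,1,12.3,3,2\n"
    "2,2,91.3,4,11\n"
)

# usecols reads only the first three columns; unpack=True returns
# them as three separate 1-D arrays.
l, c, v = np.loadtxt(sample, delimiter=",", skiprows=1,
                     usecols=(0, 1, 2), unpack=True)

# loadtxt yields floats, so convert the 1-based labels to 0-based ints.
rows = l.astype(int) - 1
cols = c.astype(int) - 1

shape = (int(rows.max()) + 1, int(cols.max()) + 1)
matrix = coo_matrix((v, (rows, cols)), shape=shape).toarray()
print(matrix)
```

Cells with no entry in the file come out as 0 here, and repeated (row, column) pairs are summed when the sparse matrix is converted to a dense array, so this only matches the desired output if each pair appears at most once.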
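    import pandas as pd

    data = pd.read_csv('data.txt', sep=r'\s+')
    # Fold the value column into rows of 5 (assumes 5 entries per row label);
    # -1 lets numpy work out the number of rows
    data2 = data['dds'].to_numpy().reshape(-1, 5)
    df = pd.DataFrame(data2, columns=range(1, 6), index=range(1, data2.shape[0] + 1))
    print(df)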
Update:
Without the 'rubbish data' header row:
    import pandas as pd

    names_ = list(range(1, 6))
    data = pd.read_csv('data.txt', sep=r'\s+', names=names_)
    # Column 3 holds the values; -1 lets numpy work out the number of rows
    data2 = data[3].to_numpy().reshape(-1, 5)
    df = pd.DataFrame(data2, columns=names_, index=range(1, data2.shape[0] + 1))
    print(df)
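The reshape above relies on every row label having exactly 5 entries, in order. If that can't be guaranteed (the sample's last group shows only 3), a pivot on the first two columns might be a safer route. The following is a rough sketch under assumptions: the `sample` text stands in for data.txt without its header row, and the column names `row`, `col` and `value` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.txt without the header row.
sample = io.StringIO(
    "1 1 32.2 9 4\n"
    "1 2 53.2 8 2\n"
    "1 3 44.2 0 1\n"
    "2 1 12.3 3 2\n"
    "2 2 91.3 4 11\n"
)

# Read only the first three columns of the whitespace-separated file.
data = pd.read_csv(sample, sep=r"\s+", header=None, usecols=[0, 1, 2])
data.columns = ["row", "col", "value"]  # illustrative names

# Spread each (row, col, value) triple into a 2-D table.
df = data.pivot(index="row", columns="col", values="value")
print(df)
```

Missing (row, col) combinations show up as NaN rather than shifting the remaining values. Note that `pivot` raises an error if the same (row, col) pair occurs twice; `pivot_table` with an aggregation function is the usual alternative in that case.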