Python - pandas - binning data and getting 2 columns
I have a simple DataFrame. There are 2 columns: day_created (int, could change to datetime) and suspended (int, could change to boolean). I can change the data if that makes it easier to work with.
       day created  suspended
    0           12          0
    1            6          1
    2           24          0
    3            8          0
    4          100          1
    5           30          0
    6            1          1
    7            6          0
The day_created column is an integer giving the day the account was created (counted from a start date), starting at 1 and increasing. The suspended column is 1 for a suspension and 0 for no suspension.
What I would like to do is bin these accounts into groups of 30 days or months, where each bin has the total number of accounts for that month and the number of suspended accounts that were created in that month. I then plan on creating a bar graph with 2 bars for each month.
How should I go about this? I don't use pandas often. I assume I need to do some tricks with resample and count.
Use
    df.index = start_date + pd.to_timedelta(df['day created'], unit='D')
to give the DataFrame an index of timestamps representing when the accounts were created.
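To see what that index looks like, here is a minimal sketch using the sample data from the question. The start date of 2016-01-01 is only an assumption for illustration; substitute the real one.

```python
import pandas as pd

df = pd.DataFrame({'day created': [12, 6, 24, 8, 100, 30, 1, 6],
                   'suspended': [0, 1, 0, 0, 1, 0, 1, 0]})

# Assumed start date, for illustration only.
start_date = pd.Timestamp('2016-01-01')

# Turn each integer day offset into a timestamp and use it as the index.
df.index = start_date + pd.to_timedelta(df['day created'], unit='D')

print(df.index[0])  # day 12  -> 2016-01-13
print(df.index[4])  # day 100 -> 2016-04-10
```

Note that with this arithmetic day 1 lands on start_date plus one day. If day 1 should be the start date itself, subtracting 1 from the column before converting would be one way to handle it.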
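Then you can use
    result = df.groupby(pd.Grouper(freq='ME'))['suspended'].agg(['count', 'sum'])
to group the rows of the DataFrame by month according to the timestamps in the index. (pd.TimeGrouper from old pandas versions has been replaced by pd.Grouper, and pandas older than 2.2 spells the month-end alias 'M' rather than 'ME'.) .agg(['count', 'sum'])
computes the number of accounts (the count) and the number of suspended accounts (the sum) for each group.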
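As a quick check of the count/sum idea, here is a small sketch that groups by monthly period instead of a month-end grouper. This is a swapped-in variant, not the grouper call above: it sidesteps the alias spelling difference between pandas versions, but it only lists months that actually contain accounts. The start date and the names created and months are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'day created': [12, 6, 24, 8, 100, 30, 1, 6],
                   'suspended': [0, 1, 0, 0, 1, 0, 1, 0]})
start_date = pd.Timestamp('2016-01-01')  # assumed start date

# Creation timestamp for each account.
created = start_date + pd.to_timedelta(df['day created'], unit='D')

# Calendar month of each account, e.g. 2016-01.
months = created.dt.to_period('M').rename('month')

# count = accounts created in the month, sum = how many of those were suspended.
result = df.groupby(months)['suspended'].agg(['count', 'sum'])
result = result.rename(columns={'sum': 'suspended'})
print(result)
```

With the sample data, January holds 7 accounts (2 suspended) and April holds 1 account (1 suspended).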
Then
    result.plot(kind='bar', ax=ax)
plots the bar graph:
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.DataFrame(
        {'day created': [12, 6, 24, 8, 100, 30, 1, 6],
         'suspended': [0, 1, 0, 0, 1, 0, 1, 0]})
    start_date = pd.Timestamp('2016-01-01')
    # Index each account by its creation timestamp
    df.index = start_date + pd.to_timedelta(df['day created'], unit='D')
    # Month-end bins; pandas older than 2.2 spells this alias 'M'
    result = df.groupby(pd.Grouper(freq='ME'))['suspended'].agg(['count', 'sum'])
    result = result.rename(columns={'sum': 'suspended'})

    fig, ax = plt.subplots()
    result.plot(kind='bar', ax=ax)
    # Label each group of bars with its bin date
    locs, labels = plt.xticks()
    plt.xticks(locs, result.index.strftime('%Y-%m-%d'))
    fig.autofmt_xdate()
    plt.show()
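If fixed 30-day groups are preferred over calendar months (the question mentions both), integer division on the day number may be enough and no timestamps are needed. This is only a sketch under that assumption; the column name bin and the output column total are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'day created': [12, 6, 24, 8, 100, 30, 1, 6],
                   'suspended': [0, 1, 0, 0, 1, 0, 1, 0]})

# Bin 0 holds days 1-30, bin 1 holds days 31-60, and so on.
df['bin'] = (df['day created'] - 1) // 30

# total = accounts created in the bin, suspended = how many of those were suspended.
result = df.groupby('bin')['suspended'].agg(total='count', suspended='sum')

# Add the empty bins back in so that every 30-day group appears in the output.
result = result.reindex(range(result.index.max() + 1), fill_value=0)
print(result)
```

Calling result.plot(kind='bar') on this frame should give the same kind of chart, with 2 bars per 30-day bin.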