Calculate Different Types of Spend - Pandas/Numpy - Python -
i have 2 dataframes :
df1 +------------+-------------+------+ | product id | cost method | rate | +------------+-------------+------+ | 10 | cpm | 10 | | 20 | cpc | 0.3 | | 30 | cpcv | 0.4 | | 40 | flf | 100 | | 50 | vad | 0 | | 60 | cpm | 0.1 | +------------+-------------+------+ df2 +--------+------------+-------------+--------+-----------------+ | date | product id | impressions | clicks | completed views | +--------+------------+-------------+--------+-----------------+ | 01-jan | 10 | 300 | 4 | 0 | | 02-jan | 20 | 30 | 3 | 0 | | 03-jan | 30 | 200 | 4 | 20 | | 02-jan | 40 | 300 | 4 | 0 | | 02-jan | 40 | 500 | 4 | 0 | | 03-jan | 40 | 200 | 3 | 0 | | 04-jan | 90 | 3000 | 3 | 0 | | 05-jan | 50 | 3000 | 5 | 0 | +--------+------------+-------------+--------+-----------------+
the ideal output this:
+--------+------------+-------------+--------+-----------------+--------+ | date | product id | impressions | clicks | completed views | spend | +--------+------------+-------------+--------+-----------------+--------+ | 01-jan | 10 | 300 | 4 | 0 | $3 | | 02-jan | 20 | 30 | 3 | 0 | $1 | | 03-jan | 30 | 200 | 4 | 20 | $8 | | 02-jan | 40 | 300 | 4 | 0 | $50 | | 02-jan | 40 | 500 | 4 | 0 | $50 | | 03-jan | 40 | 200 | 3 | 0 | $- | | 04-jan | 90 | 3000 | 3 | 0 | $- | | 05-jan | 50 | 3000 | 5 | 0 | $- | +--------+------------+-------------+--------+-----------------+--------+
where :
- product matched id in case id can't matched, product spend calculated @ 0
- where flf calculated sum of total impressions product per day, , if sums on minimum limit, e.g. 600 impressions, rate applied. if there 2 or more entries same day, rate divided equally count of times appears in same day
- where, if product vad, spend 0
- where cpc calculated rate times number of clicks
- where cpm calculated rate*(impression / 1000)
i'm going answer though shouldn't really. you're new on stack overflow (so), let educational post. rest assured tone of post isn't trying condescending or harsh.
first, ask proper question (read this please) need 2 things:
- explain have tried (provide code sample!) , explain problem is. question in current format doesn't comply. there's 5 or 6 different things in it, , feels you're asking homework.
- provide workable example.
for workable example, kind of did this, format choose annoying since 1 cannot directly use pd.read_clipboard()
load data. people here volunteering time, , if have spend 5 or 10 minutes recreating data won't it.
here's how have done it:
here first dataframe, use df1 = pd.read_clipboard(index_col=0)
load it:
productid costmethod rate 10 cpm 10.0 20 cpc 0.3 30 cpcv 0.4 40 flf 100.0 50 vad 0.0 60 cpm 0.1
here second dataframe, use df2 = pd.read_clipboard(index_col=0)
load it:
productid date impressions clicks completedviews 10 01-jan 300 4 0 20 02-jan 30 3 0 30 03-jan 200 4 20 40 02-jan 300 4 0 40 02-jan 500 4 0 40 03-jan 200 3 0 90 04-jan 3000 3 0 50 05-jan 3000 5 0
now, far doing homework, here's proposed solution. trust try understand code , not reuse it.
step 1: merge both dataframes
i'm merging left on df2, that's important. read more in pandas documentation on merging
df3 = df2.merge(df1, left_index=true, right_index=true, how='left') df3
step 2: calculate spend
we're going write custom function , dataframe.apply
def calc_spend(row): """ accepts row of dataframe (df3.apply(calc_spend, axis=1)), , computes spend according these rules: * if costmethod nan, 0 * flf calculated sum of total impressions product per day, , if sums on minimum limit, e.g. 600 impressions, rate applied. if there 2 or more entries same day, rate divided equally count of times appears in same day * where, if product vad, spend 0 * cpc calculated rate times number of clicks * cpm calculated rate*(impression / 1000) """ if row.costmethod == 'flf': # calc sum of total impressions product # i'm using boolean indexing select rows both productid , date # same current row filterdateproductid = (df3.date == row.date) & (df3.index == row.name) total_impressions = df3.ix[filterdateproductid, 'impressions'].sum() if total_impressions < 600: spend = total_impressions else: count = df3.ix[filterdateproductid].shape[0] rate = row.rate / count # if use python 2.7 make sure "from future import division" spend = rate * total_impressions / 1000.0 elif row.costmethod == 'vad': spend = 0 elif row.costmethod == 'cpc': spend = row.rate * row.clicks elif row.costmethod == 'cpm': spend = row.rate * row.impressions / 1000.0 else: # includes case costmethod na spend = 0 return spend
now can apply function itself:
df3['spend'] = df3.apply(calc_spend, axis=1) df3
you'll perhaps notice "spend" calculated isn't same yours, because initial specs on how calculate weren't great. easy change calc_spend
function match requirements.
Comments
Post a Comment