Calculate Different Types of Spend - Pandas/Numpy - Python -


i have 2 dataframes :

df1 +------------+-------------+------+ | product id | cost method | rate | +------------+-------------+------+ |         10 | cpm         | 10   | |         20 | cpc         | 0.3  | |         30 | cpcv        | 0.4  | |         40 | flf         | 100  | |         50 | vad         | 0    | |         60 | cpm         | 0.1  | +------------+-------------+------+  df2 +--------+------------+-------------+--------+-----------------+ |  date  | product id | impressions | clicks | completed views | +--------+------------+-------------+--------+-----------------+ | 01-jan |         10 |         300 |      4 |               0 | | 02-jan |         20 |          30 |      3 |               0 | | 03-jan |         30 |         200 |      4 |              20 | | 02-jan |         40 |         300 |      4 |               0 | | 02-jan |         40 |         500 |      4 |               0 | | 03-jan |         40 |         200 |      3 |               0 | | 04-jan |         90 |        3000 |      3 |               0 | | 05-jan |         50 |        3000 |      5 |               0 | +--------+------------+-------------+--------+-----------------+ 

the ideal output this:

 +--------+------------+-------------+--------+-----------------+--------+ |  date  | product id | impressions | clicks | completed views | spend  | +--------+------------+-------------+--------+-----------------+--------+ | 01-jan |         10 |         300 |      4 |               0 |  $3    | | 02-jan |         20 |          30 |      3 |               0 |  $1    | | 03-jan |         30 |         200 |      4 |              20 |  $8    | | 02-jan |         40 |         300 |      4 |               0 |  $50   | | 02-jan |         40 |         500 |      4 |               0 |  $50   | | 03-jan |         40 |         200 |      3 |               0 |  $-    | | 04-jan |         90 |        3000 |      3 |               0 |  $-    | | 05-jan |         50 |        3000 |      5 |               0 |  $-    | +--------+------------+-------------+--------+-----------------+--------+ 

where :

  1. product matched id in case id can't matched, product spend calculated @ 0
  2. where flf calculated sum of total impressions product per day, , if sums on minimum limit, e.g. 600 impressions, rate applied. if there 2 or more entries same day, rate divided equally count of times appears in same day
  3. where, if product vad, spend 0
  4. where cpc calculated rate times number of clicks
  5. where cpm calculated rate*(impression / 1000)

i'm going answer though shouldn't really. you're new on stack overflow (so), let educational post. rest assured tone of post isn't trying condescending or harsh.


first, ask proper question (read this please) need 2 things:

  • explain have tried (provide code sample!) , explain problem is. question in current format doesn't comply. there's 5 or 6 different things in it, , feels you're asking homework.
  • provide workable example.

for workable example, kind of did this, format choose annoying since 1 cannot directly use pd.read_clipboard() load data. people here volunteering time, , if have spend 5 or 10 minutes recreating data won't it.

here's how have done it:

here first dataframe, use df1 = pd.read_clipboard(index_col=0) load it:

productid      costmethod   rate  10               cpm   10.0 20               cpc    0.3 30              cpcv    0.4 40               flf  100.0 50               vad    0.0 60               cpm    0.1 

here second dataframe, use df2 = pd.read_clipboard(index_col=0) load it:

productid  date  impressions  clicks  completedviews 10         01-jan          300       4               0 20         02-jan           30       3               0 30         03-jan          200       4              20 40         02-jan          300       4               0 40         02-jan          500       4               0 40         03-jan          200       3               0 90         04-jan         3000       3               0 50         05-jan         3000       5               0 

now, far doing homework, here's proposed solution. trust try understand code , not reuse it.

step 1: merge both dataframes

i'm merging left on df2, that's important. read more in pandas documentation on merging

df3 = df2.merge(df1, left_index=true, right_index=true, how='left') df3 

merged df3

step 2: calculate spend

we're going write custom function , dataframe.apply

def calc_spend(row):     """     accepts row of dataframe (df3.apply(calc_spend, axis=1)),     , computes spend according these rules:     * if costmethod nan, 0     * flf calculated sum of total impressions product per day,          , if sums on minimum limit,          e.g. 600 impressions, rate applied.          if there 2 or more entries same day,          rate divided equally count of times appears in same day     * where, if product vad, spend 0     * cpc calculated rate times number of clicks     * cpm calculated rate*(impression / 1000)     """      if row.costmethod == 'flf':         # calc sum of total impressions product         # i'm using boolean indexing select rows both productid , date         # same current row         filterdateproductid = (df3.date == row.date) & (df3.index == row.name)         total_impressions = df3.ix[filterdateproductid, 'impressions'].sum()         if total_impressions < 600:             spend = total_impressions         else:             count = df3.ix[filterdateproductid].shape[0]             rate = row.rate / count # if use python 2.7 make sure "from future import division"             spend = rate * total_impressions / 1000.0      elif row.costmethod == 'vad':         spend = 0      elif row.costmethod == 'cpc':         spend = row.rate * row.clicks      elif row.costmethod == 'cpm':         spend = row.rate * row.impressions / 1000.0      else: # includes case costmethod na         spend = 0      return spend 

now can apply function itself:

df3['spend'] = df3.apply(calc_spend, axis=1) df3 

final result

you'll perhaps notice "spend" calculated isn't same yours, because initial specs on how calculate weren't great. easy change calc_spend function match requirements.


Comments

Popular posts from this blog

aws api gateway - SerializationException in posting new Records via Dynamodb Proxy Service in API -

asp.net - Problems sending emails from forum -