r - How to remove rows matching criteria and rows adjacent to them -


i have following sample data:

data <- data.table(id = c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4),                   date = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6),                  score = c(4,3,2,2,4,1,5,5,5,2,1,4,2,1,5,5,5,3,5,5,5,2,4,5))     id date score  1:  1    1     4  2:  1    2     3  3:  1    3     2  4:  1    4     2  5:  1    5     4  6:  1    6     1  7:  2    1     5  8:  2    2     5  9:  2    3     5 10:  2    4     2 11:  2    5     1 12:  2    6     4 13:  3    1     2 14:  3    2     1 15:  3    3     5 16:  3    4     5 17:  3    5     5 18:  3    6     3 19:  4    1     5 20:  4    2     5 21:  4    3     5 22:  4    4     2 23:  4    5     4 24:  4    6     5     id date score 

i want eliminate rows, , change others, based in part on position in table. have 2 criteria, each id:

  1. if row has date == 1 , score == 5, want remove row , subsequent rows have score==5 follow after row, until score not 5. (so, example, i == 4, want keep data dates 4,5,6).

  2. for other dates score == 5, want replace score average of preceding 2 scores (or previous score, if have 1 prior score).

so, table want end is:

   id date score  1:  1    1   4.0  2:  1    2   3.0  3:  1    3   2.0  4:  1    4   2.0  5:  1    5   4.0  6:  1    6   1.0  7:  2    4   2.0  8:  2    5   1.0  9:  2    6   4.0 10:  3    1   2.0 11:  3    2   1.0 12:  3    3   1.5 13:  3    4   1.5 14:  3    5   1.5 15:  3    6   3.0 16:  4    4   2.0 17:  4    5   4.0 18:  4    6   3.0   

what best way go this? imagine it's combination of shift , .i haven't been able put together.

# find rows satisfying 1st condition torm = data[, if(score[1] == 5 & date[1] == 1) .i             , = .(id, rleid(score), cumsum(date == 1))]$v1  library(catools) # running mean  data[-torm    # remove rows    # add running mean    ][, mn := runmean(score, 2, endrule = 'keep', align = 'right'), = id    # compute new score - little care needed here in case have 5's in group    ][, new.score := ifelse(score == 5, mn[which(score != 5)[1]], score)      , = .(id, cumsum(score != 5))][] #    id date score  mn new.score # 1:  1    1     4 4.0       4.0 # 2:  1    2     3 3.5       3.0 # 3:  1    3     2 2.5       2.0 # 4:  1    4     2 2.0       2.0 # 5:  1    5     4 3.0       4.0 # 6:  1    6     1 2.5       1.0 # 7:  2    4     2 2.0       2.0 # 8:  2    5     1 1.5       1.0 # 9:  2    6     4 2.5       4.0 #10:  3    1     2 2.0       2.0 #11:  3    2     1 1.5       1.0 #12:  3    3     5 3.0       1.5 #13:  3    4     5 5.0       1.5 #14:  3    5     5 5.0       1.5 #15:  3    6     3 4.0       3.0 #16:  4    4     2 2.0       2.0 #17:  4    5     4 3.0       4.0 #18:  4    6     5 4.5       3.0 

Comments

Popular posts from this blog

sql server - Cannot query correctly (MSSQL - PHP - JSON) -

php - trouble displaying mysqli database results in correct order -

C++ Linked List -