r - Extract first and last values among a number of columns in data frame -


i have observed subjects a-d during 2-4 years , each year reported numerical value. want extract first , last value of each subject, ignoring nas. how create new variables first_value , last_value? in example, have included desired result:

df <- data.frame(subject = c("a","b","c","d"),                   year1 = c(1, 2, na, na),                  year2 = c(3, 4, na, 5),                  year3 = c(6, 7, 8, na),                  year4 = c(9, 10, na, 11),                  first_value <- c(1, 2, 8, 5),                  last_value <- c(9, 10, 8, 11)) 

and solution if variables year1-year4 categorical?

using data.table package:

library(data.table) setdt(df)[, `:=` (first_value = na.omit(unlist(.sd))[1],                    last_value = tail(na.omit(unlist(.sd)),1)),            = subject][] 

which gives:

   subject year1 year2 year3 year4 first_value last_value 1:           1     3     6     9           1          9 2:       b     2     4     7    10           2         10 3:       c    na    na     8    na           8          8 4:       d    na     5    na    11           5         11 

following suggestion of @alexis_laz, can use max.col follows repective values:

f <- max.col(!is.na(df[c("year1", "year2", "year3", "year4")]), 'first') l <- max.col(!is.na(df[c("year1", "year2", "year3", "year4")]), 'last')  df$first_value <- sapply(seq_along(f), function(i) df[,-1][i,f[i]]) df$last_value <- sapply(seq_along(l), function(i) df[,-1][i,l[i]]) 

which same result. suggested @alexis_laz in comments, can further improved to:

m <- as.matrix(df[c("year1", "year2", "year3", "year4")])  f <- max.col(!is.na(m), 'first') l <- max.col(!is.na(m), 'last')  df$first_value <- df[-1][cbind(1:nrow(df), f)] df$last_value <- df[-1][cbind(1:nrow(df), l)] 

and using dplyr , tidyr packages:

library(dplyr) library(tidyr)  df %>%    gather(year, val, 2:5) %>%    filter(!is.na(val)) %>%    group_by(subject) %>%    summarise(first_value = first(val),             last_value = last(val)) %>%    left_join(df, ., = 'subject') 

warning: variation of without using filter , using na.omit(val) (or val[!is.na(val)]) in summarise:

df %>%    gather(year, val, 2:5) %>%    group_by(subject) %>%    summarise(first_value = first(na.omit(val)),             last_value = last(na.omit(val))) %>%    left_join(df, ., = 'subject') 

won't work result of bugs reported here , here.


Comments

Popular posts from this blog

asynchronous - C# WinSCP .NET assembly: How to upload multiple files asynchronously -

aws api gateway - SerializationException in posting new Records via Dynamodb Proxy Service in API -

asp.net - Problems sending emails from forum -