R - Extract 2 parts of a string
Assume I have the following string (a filename):

a <- "x/zheb100/tkn_var29380_timely_p1.txt"

which consists of several parts (here part p1), or one with only 1 part (so there is no need for a part label p):

b <- "x/zheb100/zhn_var29380_timely.txt"

How can I extract the identifier, i.e. the 3 letters before varxxxxx (tkn in case 1, zhn in case 2), plus the part identifier, if available?

So the result should be:

case 1: tkn_p1
case 2: zhn

I know how to extract the first identifier, but I cannot handle the second one at the same time.
My approach so far:

sub(".*(.{3})_var29380_timely(.{3}).*","\\1\\2", a)
sub(".*(.{3})_var29380_timely(.{3}).*","\\1\\2", b)

but it incorrectly adds .tx in the second case.
You are not using anchors, and you are matching the 3 characters right after timely without checking what they are (. matches any character), so in the second case (.{3}) grabs .tx from the file extension.
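To see the failure concretely, here is the question's own pattern run on both inputs at once. This is only a reproduction of the call from the question; the names old_pat and old_res are made up for illustration.

```r
a <- "x/zheb100/tkn_var29380_timely_p1.txt"
b <- "x/zheb100/zhn_var29380_timely.txt"

# Pattern from the question: (.{3}) after "timely" accepts ANY three characters,
# so for b it swallows ".tx" from the extension and the trailing .* eats "t"
old_pat <- ".*(.{3})_var29380_timely(.{3}).*"
old_res <- sub(old_pat, "\\1\\2", c(a, b))
old_res
# old_res is c("tkn_p1", "zhn.tx")
```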
I suggest

sub("^.*/([a-z]{3})_var\\d+_timely(_[^_.]+)?\\.[^.]*$", "\\1\\2", a)

Details:
^ - start of string
.*/ - any part of the string up to and including the last /
([a-z]{3}) - 3 lowercase ASCII letters captured into Group 1
_var\\d+_timely - _var, 1 or more digits, then _timely
(_[^_.]+)? - an optional Group 2 capturing _ followed by 1 or more chars other than _ and .
\\. - a literal dot
[^.]* - 0 or more chars other than .
$ - end of string.
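One way to check the breakdown above is to feed the pattern a few extra filenames. The inputs below are hypothetical, not from the question: a longer part label with a different extension, a name without a part label, and a string without any /, which does not match at all, so sub returns it unchanged.

```r
pat <- "^.*/([a-z]{3})_var\\d+_timely(_[^_.]+)?\\.[^.]*$"

# Hypothetical inputs (not from the question)
files <- c("x/y/abc_var1_timely_p12.csv",  # longer part label, other extension
           "x/y/abc_var1_timely.dat",      # no part label: Group 2 is skipped
           "notes.txt")                    # no "/" at all: no match
res <- sub(pat, "\\1\\2", files)
res
# res is c("abc_p12", "abc", "notes.txt")
```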
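The replacement pattern contains 2 backreferences, \\1 and \\2, which insert the contents of both capturing groups into the resulting string. When the optional Group 2 does not match (as in b), \\2 inserts nothing.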
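If you need the identifier and the part label separately rather than glued together, the same pattern can be used with just one backreference at a time. This is a small sketch; the names id and part are illustrative only.

```r
a <- "x/zheb100/tkn_var29380_timely_p1.txt"
b <- "x/zheb100/zhn_var29380_timely.txt"
pat <- "^.*/([a-z]{3})_var\\d+_timely(_[^_.]+)?\\.[^.]*$"

# "\\1" keeps only Group 1, "\\2" keeps only Group 2
id   <- sub(pat, "\\1", c(a, b))  # 3-letter identifiers
part <- sub(pat, "\\2", c(a, b))  # part labels; empty when Group 2 did not match
# id is c("tkn", "zhn"), part is c("_p1", "")
```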
a <- "x/zheb100/tkn_var29380_timely_p1.txt"
a2 <- sub("^.*/([a-z]{3})_var\\d+_timely(_[^_.]+)?\\.[^.]*$", "\\1\\2", a)
a2
[1] "tkn_p1"
b <- "x/zheb100/zhn_var29380_timely.txt"
b2 <- sub("^.*/([a-z]{3})_var\\d+_timely(_[^_.]+)?\\.[^.]*$", "\\1\\2", b)
b2
[1] "zhn"
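Since sub is vectorized, there is no need to call it once per string. A possible variation, sketched below under the assumption that some inputs may be bare file names without a directory, strips the directory with basename() first and anchors the pattern at the start of the file name instead of after the last /. The helper name extract_id and the third input are hypothetical.

```r
# Hypothetical helper: drop the directory part, then match the file name only
extract_id <- function(paths) {
  sub("^([a-z]{3})_var\\d+_timely(_[^_.]+)?\\.[^.]*$", "\\1\\2", basename(paths))
}

ids <- extract_id(c("x/zheb100/tkn_var29380_timely_p1.txt",
                    "x/zheb100/zhn_var29380_timely.txt",
                    "zhn_var29380_timely.txt"))  # bare file name (hypothetical)
# ids is c("tkn_p1", "zhn", "zhn")
```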