r - Extract 2 parts of a string
Assume I have the following string (a filename):
a <- "x/zheb100/tkn_var29380_timely_p1.txt"
which consists of several parts (here, part p1), or of just one:
b <- "x/zheb100/zhn_var29380_timely.txt"
which consists of only 1 part (so there is no need for a p label).
How can I extract the identifier, i.e. the 3 letters before varxxxxx (tkn in case 1, zhn in case 2), plus the part identifier, if available?
So the result should be:
case 1: tkn_p1
case 2: zhn
I know how to extract the first identifier, but I cannot handle the second one at the same time.
My approach so far:
sub(".*(.{3})_var29380_timely(.{3}).*", "\\1\\2", a)
sub(".*(.{3})_var29380_timely(.{3}).*", "\\1\\2", b)
but this incorrectly adds .tx in the second case.
You are not using anchors, and you are matching any 3 characters right after timely without checking what those characters are (. matches any character).
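To see what goes wrong, the original pattern can be run on both strings and the second capture group inspected. This is a small sketch using only base R; the variable names pattern and groups_b are chosen here for illustration.

```r
a <- "x/zheb100/tkn_var29380_timely_p1.txt"
b <- "x/zheb100/zhn_var29380_timely.txt"

# The pattern from the question: the second group is just "any 3 characters"
pattern <- ".*(.{3})_var29380_timely(.{3}).*"

res_a <- sub(pattern, "\\1\\2", a)  # "tkn_p1"
res_b <- sub(pattern, "\\1\\2", b)  # "zhn.tx"

# regexec() returns the full match followed by each capture group
groups_b <- regmatches(b, regexec(pattern, b))[[1]]
groups_b[3]  # ".tx": the start of the extension, not a part label
```

In the second string there is no _p1 part, so (.{3}) simply consumes the first 3 characters of ".txt".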
I suggest:
sub("^.*/([a-z]{3})_var\\d+_timely(_[^_.]+)?\\.[^.]*$", "\\1\\2", a)
Details:
^ - start of string
.*/ - any part of the string up to and including the last /
([a-z]{3}) - 3 ASCII lowercase letters, captured into group 1
_var\\d+_timely - _var + 1 or more digits + _timely
(_[^_.]+)? - optional group 2 capturing _ + 1 or more chars other than _ and .
\\. - a dot
[^.]* - 0 or more chars other than .
$ - end of string.
The replacement pattern contains 2 backreferences, one for each capturing group, which insert the groups' contents into the resulting string.
a <- "x/zheb100/tkn_var29380_timely_p1.txt"
a2 <- sub("^.*/([a-z]{3})_var\\d+_timely(_[^_.]+)?\\.[^.]*$", "\\1\\2", a)
a2
[1] "tkn_p1"
b <- "x/zheb100/zhn_var29380_timely.txt"
b2 <- sub("^.*/([a-z]{3})_var\\d+_timely(_[^_.]+)?\\.[^.]*$", "\\1\\2", b)
b2
[1] "zhn"
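If several filenames need processing, the same pattern can be wrapped in a small helper, since sub() is vectorized. This is a minimal sketch, assuming every path contains at least one / and follows the xxx_var<digits>_timely[_part].ext layout. The function name extract_id is hypothetical, and returning NA for non-matching input is a choice made here, not part of the answer above.

```r
# Hypothetical helper: returns "tkn_p1"-style identifiers, or NA if no match
extract_id <- function(paths) {
  pattern <- "^.*/([a-z]{3})_var\\d+_timely(_[^_.]+)?\\.[^.]*$"
  matched <- grepl(pattern, paths)
  out <- sub(pattern, "\\1\\2", paths)
  # sub() returns the input unchanged when nothing matches, so mark those as NA
  out[!matched] <- NA_character_
  out
}

files <- c("x/zheb100/tkn_var29380_timely_p1.txt",
           "x/zheb100/zhn_var29380_timely.txt",
           "other.txt")
ids <- extract_id(files)
ids  # "tkn_p1" "zhn" NA
```

Marking non-matches explicitly avoids silently passing an unchanged path through as if it were an identifier.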