regex - Sanitizing strings with filenames and extensions in Java
I have four types of file names:
- filename with a double extension
- filename with no extension
- filename with a dot at the end and no extension
- filename with a proper name.
Like this:
    String doubleexsension = "doubleexsension.pdf.pdf";
    String noextension = "noextension";
    String namewithdot = "namewithdot.";
    String propername = "propername.pdf";
    String extension = "pdf";
My aim is to sanitize all of these types and output filename.filetype
properly. I made a small script to illustrate this for the post:
    ArrayList<String> app = new ArrayList<String>();
    app.add(doubleexsension);
    app.add(propername);
    app.add(noextension);
    app.add(namewithdot);
    System.out.println("------------");
    for (String i : app) {
        // Name ends with a dot: append the extension and move on
        if (i.endsWith(".")) {
            String m = i + extension;
            System.out.println(m);
            continue;
        }
        // Double extension: collapse a repeated trailing extension into one
        String p = i.replaceAll("(\\.\\w+)\\1+$", "$1");
        System.out.println(p);
    }
This outputs:
    ------------
    doubleexsension.pdf
    propername.pdf
    noextension
    namewithdot.pdf
I don't know how to handle the noextension
case. How can I do it? When there's no extension, it should take the extension
value and append it to the end of the string.
My desired output would be:
    ------------
    doubleexsension.pdf
    propername.pdf
    noextension.pdf
    namewithdot.pdf
Thanks in advance.
You may add alternatives to the regex to match all of these scenarios:
    (?:(\.\w+)\1*|\.|([^.]))$
and replace with $2.pdf. Note that this replaces whatever extension is matched with .pdf. See the regex demo.
Edit: in case the extensions that can be duplicated are known, you may use a whitelisting approach via an alternation group:
    (?:(\.(?:pdf|gif|jpe?g))\1*|\.|([^.]))$
See the regex demo.
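As a rough illustration of the whitelisting variant, the sketch below applies that pattern with the same $2.pdf replacement (an assumption, since the edit does not restate the replacement). The class name WhitelistSanitizer, the helper sanitize, and the sample file names are hypothetical and not part of the original answer.

```java
import java.util.Arrays;
import java.util.List;

public class WhitelistSanitizer {
    // Whitelisted pattern from the answer: only pdf, gif, jpg and jpeg count as extensions.
    static final String WHITELIST_REGEX = "(?:(\\.(?:pdf|gif|jpe?g))\\1*|\\.|([^.]))$";

    // Hypothetical helper: replaces a trailing whitelisted extension (possibly repeated)
    // or a lone trailing dot with ".pdf"; otherwise appends ".pdf" after the last char.
    static String sanitize(String name) {
        return name.replaceAll(WHITELIST_REGEX, "$2.pdf");
    }

    public static void main(String[] args) {
        // Sample names are made up for illustration.
        List<String> names = Arrays.asList("scan.pdf.pdf", "photo.jpeg", "notes.txt", "report.");
        for (String name : names) {
            System.out.println(name + " -> " + sanitize(name));
        }
    }
}
```

With these inputs, scan.pdf.pdf and report. both end up with a single .pdf, photo.jpeg becomes photo.pdf because the matched extension is replaced, and notes.txt becomes notes.txt.pdf because .txt is not whitelisted and is therefore treated as part of the name.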
Details:
(?: - start of the outer grouping; the $ end of string anchor is applied to all the alternatives below (they must be at the end of the string)
(\.\w+)\1* - duplicated (or not) extensions (a . + 1+ word chars, repeated 0 or more times), captured into Group 1 (with the whitelisting approach, only the indicated extensions are taken into account: (?:pdf|gif|jpe?g) matches pdf, gif, jpeg, jpg, etc. if more alternatives are added)
| - or
\. - a dot
| - or
([^.]) - any char that is not a dot, captured into Group 2
) - end of the outer grouping
$ - end of string.
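To make the breakdown above more concrete, here is a small sketch that reports which of the three alternatives matched for a given name by checking the capture groups. The class AlternativeInspector and the method describe are hypothetical names introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternativeInspector {
    // Same pattern as in the answer, compiled once.
    static final Pattern TAIL = Pattern.compile("(?:(\\.\\w+)\\1*|\\.|([^.]))$");

    // Hypothetical helper: tells which alternative matched at the end of the name.
    static String describe(String name) {
        Matcher m = TAIL.matcher(name);
        if (!m.find()) {
            return "no match";                  // e.g. an empty string
        }
        if (m.group(1) != null) {
            return "extension " + m.group(1);   // first alternative: (\.\w+)\1*
        }
        if (m.group(2) != null) {
            return "no extension, last char " + m.group(2); // third alternative: ([^.])
        }
        return "trailing dot";                  // second alternative: \.
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "doubleexsension.pdf.pdf", "noextension", "namewithdot.", "propername.pdf");
        for (String name : names) {
            System.out.println(name + ": " + describe(name));
        }
    }
}
```

Group 2 is only set when the name ends in a non-dot character with no extension before it, which is why $2 in the replacement is empty in the other two cases.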
See the Java demo:
    import java.util.Arrays;
    import java.util.List;

    List<String> strs = Arrays.asList("doubleexsension.pdf.pdf", "noextension", "namewithdot.", "propername.pdf");
    for (String str : strs)
        System.out.println(str.replaceAll("(?:(\\.\\w+)\\1*|\\.|([^.]))$", "$2.pdf"));
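Since the question keeps the target extension in a variable, the same replacement could be wrapped in a small helper that takes the extension as a parameter. This is only a sketch under that assumption: the class ExtensionSanitizer and the method sanitize are hypothetical, and Matcher.quoteReplacement is used so that a $ or backslash in the extension is not treated as a group reference.

```java
import java.util.regex.Matcher;

public class ExtensionSanitizer {
    // Pattern from the answer: extension (possibly repeated), lone dot, or last non-dot char.
    static final String TAIL_REGEX = "(?:(\\.\\w+)\\1*|\\.|([^.]))$";

    // Hypothetical helper: makes the name end with exactly one ".<extension>".
    static String sanitize(String name, String extension) {
        // "$2" stays a group reference; the extension itself is inserted literally.
        String replacement = "$2." + Matcher.quoteReplacement(extension);
        return name.replaceAll(TAIL_REGEX, replacement);
    }

    public static void main(String[] args) {
        String extension = "pdf";
        String[] names = {"doubleexsension.pdf.pdf", "propername.pdf", "noextension", "namewithdot."};
        System.out.println("------------");
        for (String name : names) {
            System.out.println(sanitize(name, extension));
        }
    }
}
```

As with the one-liner, any existing trailing extension is replaced by the one passed in, so this fits best when all names are expected to share the same file type.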