regex - Sanitizing strings with filenames and extension in Java


I have 4 types of file names:

  1. a filename with a double extension
  2. a filename with no extension
  3. a filename with a dot at the end and no extension
  4. a filename with a proper name.

Like this:

    String doubleexsension = "doubleexsension.pdf.pdf";
    String noextension = "noextension";
    String namewithdot = "namewithdot.";
    String propername = "propername.pdf";

    String extension = "pdf";

My aim is to sanitize all of these types and output filename.filetype properly. I made a little stupid script in order to make this post:

    ArrayList<String> app = new ArrayList<String>();
    app.add(doubleexsension);
    app.add(propername);
    app.add(noextension);
    app.add(namewithdot);

    System.out.println("------------");

    for (String i : app) {

        // ends with .
        if (i.endsWith(".")) {
            String m = i + extension;
            System.out.println(m);
            break;
        }

        // double extension
        String p = i.replaceAll("(\\.\\w+)\\1+$", "$1");
        System.out.println(p);
    }

This outputs:

    ------------
    doubleexsension.pdf
    propername.pdf
    noextension
    namewithdot.pdf

I don't know how I can handle the noextension one. How can I do it? When there's no extension, it should take the extension value and append it to the end of the string.

My desired output would be:

    ------------
    doubleexsension.pdf
    propername.pdf
    noextension.pdf
    namewithdot.pdf

Thanks in advance.

You may add alternatives to the regex to match all kinds of scenarios:

(?:(\.\w+)\1*|\.|([^.]))$ 

and replace with $2.pdf. See the regex demo.

Edit: In case the extensions that can be duplicated are known, you may use a whitelisting approach via an alternation group:

(?:(\.(?:pdf|gif|jpe?g))\1*|\.|([^.]))$ 

See the regex demo.
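As a rough sketch of how the whitelisted pattern behaves in Java (the class name `WhitelistSanitizer`, the helper `sanitize` and the extra `notes.txt` input are assumptions made for illustration, not part of the original post):

```java
import java.util.Arrays;
import java.util.List;

final class WhitelistSanitizer {
    // Whitelisted pattern from above: only pdf, gif, jpg and jpeg are treated as extensions.
    private static final String TAIL = "(?:(\\.(?:pdf|gif|jpe?g))\\1*|\\.|([^.]))$";

    // Hypothetical helper; ".pdf" is hardcoded as the target extension, as in the answer.
    static String sanitize(String name) {
        // Group 2 is empty unless the "no extension" alternative matched.
        return name.replaceAll(TAIL, "$2.pdf");
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "doubleexsension.pdf.pdf", "noextension", "namewithdot.",
                "propername.pdf", "notes.txt");
        for (String name : names) {
            System.out.println(name + " -> " + sanitize(name));
        }
    }
}
```

Note that `.txt` is not in the whitelist, so `notes.txt` is treated as a name without an extension and becomes `notes.txt.pdf`, while any whitelisted extension (for example `.gif`) is replaced with `.pdf`.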

Details:

  • (?: - start of a non-capturing grouping; the $ end-of-string anchor is applied to all the alternatives below (they must be at the end of the string)
    • (\.\w+)\1* - duplicated (or not) extensions (a . followed by 1+ word chars, repeated 0 or more times) (with the whitelisting approach, only the indicated extensions are taken into account - (?:pdf|gif|jpe?g) will match pdf, gif, jpeg, jpg, etc. if more alternatives are added)
    • | - or
    • \. - a dot
    • | - or
    • ([^.]) - any char that is not a dot, captured into Group 2
  • ) - end of the outer grouping
  • $ - end of string.
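To see which alternative fires for a given name, the groups can be inspected with a `Matcher`. This is only a sketch: the class name `BranchInspector` and the helper `describe` are hypothetical names introduced here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BranchInspector {
    private static final Pattern TAIL =
            Pattern.compile("(?:(\\.\\w+)\\1*|\\.|([^.]))$");

    // Hypothetical helper: reports which alternative matched the end of the name.
    static String describe(String name) {
        Matcher m = TAIL.matcher(name);
        if (!m.find()) {
            return "no match"; // e.g. the empty string
        }
        if (m.group(1) != null) {
            // First alternative: an extension, possibly repeated.
            return "extension " + m.group(1) + " (matched text: " + m.group() + ")";
        }
        if (m.group(2) != null) {
            // Third alternative: the last char is not a dot.
            return "no extension, last char " + m.group(2);
        }
        // Second alternative: a lone trailing dot, no group is set.
        return "trailing dot";
    }

    public static void main(String[] args) {
        for (String name : new String[] {
                "doubleexsension.pdf.pdf", "noextension", "namewithdot.", "propername.pdf"}) {
            System.out.println(name + ": " + describe(name));
        }
    }
}
```

Group 2 is only set in the "no extension" case, which is why the replacement `$2.pdf` puts the last character back before appending the extension, and inserts nothing extra in the other two cases.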

See the Java demo:

    List<String> strs = Arrays.asList("doubleexsension.pdf.pdf", "noextension", "namewithdot.", "propername.pdf");
    for (String str : strs)
        System.out.println(str.replaceAll("(?:(\\.\\w+)\\1*|\\.|([^.]))$", "$2.pdf"));
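If the extension comes from a variable, as with `extension` in the question, the replacement string can be built at run time. A minimal sketch, assuming a hypothetical class `FileNameSanitizer` with a helper `sanitize(name, extension)`:

```java
import java.util.regex.Matcher;

final class FileNameSanitizer {
    private static final String TAIL = "(?:(\\.\\w+)\\1*|\\.|([^.]))$";

    // Hypothetical helper: 'extension' is passed without the leading dot, e.g. "pdf".
    static String sanitize(String name, String extension) {
        // quoteReplacement keeps any '$' or '\' in the extension from being
        // interpreted as a group reference or an escape in the replacement.
        return name.replaceAll(TAIL, "$2." + Matcher.quoteReplacement(extension));
    }

    public static void main(String[] args) {
        String extension = "pdf";
        for (String name : new String[] {
                "doubleexsension.pdf.pdf", "noextension", "namewithdot.", "propername.pdf"}) {
            System.out.println(sanitize(name, extension));
        }
    }
}
```

With the generic `\w+` pattern, any existing extension is replaced, so `sanitize("report.doc", "pdf")` gives `report.pdf`; the whitelisted pattern is the safer choice when other extensions must be kept.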
