csv - Split columns that are separated with tabs and spaces
I have a weird file format here that uses tabs and spaces in any amount to separate fields (even trailing and leading ones). The special feature is that fields can contain spaces, in which case they are escaped in a CSV-like manner.
One example:

    0 "some string" 234 23947 123 ""some escaped"string""

I am trying to parse such columns with awk, and I need to have every item in an array, e.g.

    foo[0] -> 0
    foo[1] -> "some string"
    foo[2] -> 234
    foo[3] -> 23947
    foo[4] -> 123
    foo[5] -> ""some escaped"string""

Is this possible? I read http://web.archive.org/web/20120531065332/http://backreference.org/2010/04/17/csv-parsing-with-awk/, which says that parsing CSV is hard. (For a start it would be enough to parse normal quoted strings with spaces; the escaped variant is rare.)
Before I mess around with this for a long time: is there a way to do this in awk, or would I be better off using another language?
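With GNU awk for FPAT:

    $ cat tst.awk
    # A field is a run of non-whitespace, a "quoted string", or a ,marker-wrapped, string
    BEGIN { FPAT="\\S+|\"[^\"]+\"|,[^,]+," }
    {
        # Protect literal "@" and ",", then turn each "" into a single "," marker
        gsub(/@/,"@A")
        gsub(/,/,"@B")
        gsub(/""/,",")
        for (i=1; i<=NF; i++) {
            # Undo the substitutions inside each field
            gsub(/,/,"\"\"",$i)
            gsub(/@B/,",",$i)
            gsub(/@A/,"@",$i)
            print i, $i
        }
    }

    $ awk -f tst.awk file
    1 0
    2 "some string"
    3 234
    4 23947
    5 123
    6 ""some escaped"string""

To understand what that's doing, see https://stackoverflow.com/a/40512703/1745001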
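
FPAT is a GNU awk extension, so the script above will not run in other awks. As a rough sketch only, the same placeholder idea can probably be written for a POSIX awk by peeling fields off the front of the line with match(). The function name split_fields is made up for this illustration, and only tabs and spaces are treated as separators. Note one difference: this version tries the marker-wrapped form first, then the quoted form, then a bare run of non-blanks, instead of taking the longest match as FPAT does, so oddities such as "a"b may be split differently.

```shell
# Hypothetical helper: reads lines on stdin, prints "index field" pairs.
split_fields() {
  awk '
  {
      line = $0
      # Same placeholder trick as the FPAT script: protect "@" and ",",
      # then replace each doubled quote with a single "," marker.
      gsub(/@/, "@A", line)
      gsub(/,/, "@B", line)
      gsub(/""/, ",", line)

      n = 0
      sub(/^[ \t]+/, "", line)                  # drop leading separators
      while (line != "") {
          # Try ,marker-wrapped, first, then "quoted", then a bare run.
          if (!match(line, /^,[^,]+,/) && !match(line, /^"[^"]+"/))
              match(line, /^[^ \t]+/)
          field = substr(line, 1, RLENGTH)
          line  = substr(line, RLENGTH + 1)

          # Undo the placeholders inside the extracted field.
          gsub(/,/, "\"\"", field)
          gsub(/@B/, ",", field)
          gsub(/@A/, "@", field)

          print ++n, field
          sub(/^[ \t]+/, "", line)              # drop separators before the next field
      }
  }'
}
```

It would be used as split_fields < file. To fill an array instead of printing, the print line could be replaced by foo[n++] = field, which gives the 0-based indices asked for in the question.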