Encoding error in sed
I've been trying to remove duplicate character sequences in social media text using the following code:
sed 's/\([a-zA-Z]\)\1\1\1*/\1\1\1/g'
The code works fine on regular ASCII lines but breaks on non-ASCII text with the error "sed: RE error: illegal byte sequence". Example:
you 💩
For what it's worth, I'm running Mac OS X. Do I need to reset an encoding variable?
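
One workaround often suggested for this error is to run sed in the C locale so that it treats its input as raw bytes rather than decoding it as multibyte text. The sketch below assumes that approach; it is not something confirmed in this thread. The function name squeeze_repeats is a hypothetical helper introduced only for illustration, and the emoji is written as octal byte escapes so the example does not depend on the terminal's encoding.

```shell
# Hypothetical helper: collapse runs of 3 or more identical ASCII letters
# down to exactly 3. Setting LC_ALL=C for this one command is an assumed
# workaround: sed then processes bytes, so non-ASCII sequences pass through
# unchanged instead of being rejected as an illegal byte sequence.
squeeze_repeats() {
  LC_ALL=C sed 's/\([a-zA-Z]\)\1\1\1*/\1\1\1/g'
}

# Sample input: "yooooou" followed by the UTF-8 bytes of the emoji above.
printf 'yooooou \360\237\222\251\n' | squeeze_repeats
```

Because the pattern only targets ASCII letters, matching byte by byte should not change which runs get collapsed. The trade-off is that accented or other non-ASCII letters would never match under the C locale, whatever bracket expression is used.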