Encoding error in sed


I've been trying to remove duplicate character sequences in social media text using the following code:

sed 's/\([a-zA-Z]\)\1\1\1*/\1\1\1/g'

The code works fine on regular ASCII lines, but it breaks on non-ASCII text with the error sed: RE error: illegal byte sequence. Example:

you 💩 

For what it's worth, I'm running Mac OS X. Do I need to reset some encoding variable?
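
One thing that may be worth trying, offered as a sketch rather than a confirmed fix: the error usually means sed is validating the input against the current locale's character encoding and finding bytes it considers invalid. Overriding the locale for the single sed call with LC_ALL=C makes sed treat the input as plain bytes. The sample input below is hypothetical, and the approach assumes the pattern only needs to match ASCII letters, which holds for [a-zA-Z].

```shell
# Hypothetical sample input: a stretched word followed by a non-ASCII emoji.
input='yoooooou 💩'

# Assumption: forcing the C locale for this one command makes sed process
# the input byte by byte, so the multibyte emoji is passed through untouched
# instead of being rejected as an illegal byte sequence.
output=$(printf '%s\n' "$input" | LC_ALL=C sed 's/\([a-zA-Z]\)\1\1\1*/\1\1\1/g')

printf '%s\n' "$output"
```

Setting the variable on the command line only affects that one sed invocation, so the rest of the shell session keeps its UTF-8 locale. If the pattern ever needs to match non-ASCII letters, the byte-wise C locale would no longer be suitable.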

