unicode - What is the alternative for the MultiByteToWideChar and WideCharToMultiByte functions in .NET?

February 15, 2012

I am trying to migrate code from VC++ to .NET. The VC++ code uses the MultiByteToWideChar and WideCharToMultiByte functions provided by the WinAPI. I tried using the System.Text.Encoding class in .NET, but it is not working for all encodings. Is there any other way to do the conversion? What is wrong in the code snippet below?

Here is the C# code:

    public static string MultiByteToWideChar(string input, int codepage)
    {
        Encoding e1 = Encoding.GetEncoding(codepage);
        Encoding e2 = Encoding.Unicode;

        //byte[] source = e1.GetBytes(input);
        byte[] source = MbcsToByte(input);
        byte[] target = Encoding.Convert(e1, e2, source);

        return e2.GetString(target);
    }

    public static string WideCharToMultiByte(string input, int codepage)
    {
        Encoding e1 = Encoding.Unicode;
        Encoding e2 = Encoding.GetEncoding(codepage);

        byte[] source = e1.GetBytes(input);
        byte[] target = Encoding.Convert(e1, e2, source);

        return Encoding.GetEncoding(codepage).GetString(target);
    }

    private static byte[] MbcsToByte(string s)
    {
        byte[] b = new byte[s.Length];
        int i = 0;
        foreach (char c in s)
            b[i++] = (byte)c;
        return b;
    }

MultiByteToWideChar is working for codepage 1255, but not for 866.

WideCharToMultiByte is not working for codepage 1251.

MultiByteToWideChar() converts encoded bytes (not characters!) to Unicode characters.

WideCharToMultiByte() converts Unicode characters to encoded bytes (not characters!).

In .NET, the string type is a sequence of Unicode characters (in UTF-16 encoding). Using a string to hold encoded bytes is just plain wrong.
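To make the distinction concrete, here is a small sketch showing that one Unicode character has different byte representations depending on the encoding. The class name EncodedBytesDemo and the helper ToHex are made up for illustration. The RegisterProvider call is an assumption about your runtime: it is needed on .NET Core / .NET 5+ and can be dropped on the classic .NET Framework, where these code pages are built in.

```csharp
using System;
using System.Text;

public static class EncodedBytesDemo
{
    static EncodedBytesDemo()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // such as 866 and 1251 must be registered before GetEncoding() finds them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical helper: encodes text in the given code page and
    // returns the resulting bytes as a hex string such as "81".
    public static string ToHex(string text, int codepage)
    {
        byte[] bytes = Encoding.GetEncoding(codepage).GetBytes(text);
        return BitConverter.ToString(bytes);
    }

    public static void Main()
    {
        string text = "\u0411"; // CYRILLIC CAPITAL LETTER BE, a single UTF-16 code unit

        // The in-memory UTF-16 (little-endian) form of the string.
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(text))); // 11-04

        // The same character as encoded bytes in two different code pages.
        Console.WriteLine(ToHex(text, 866));  // 81
        Console.WriteLine(ToHex(text, 1251)); // C1
    }
}
```

The string only ever holds U+0411. The values 0x81 and 0xC1 exist only in the byte[] arrays produced by a specific encoding.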

In your MultiByteToWideChar() function, you are assuming the input string contains Unicode characters that are 16-bit representations of codepage-encoded 8-bit bytes. You are translating those Unicode characters as-is to a byte[] array, then converting that presumably codepage-encoded array to a UTF-16 byte[] array, and then converting that to a UTF-16 string. That will work fine if, and only if, your initial assumption is true to begin with. That is usually not the case, unless the input was corrupted to begin with.
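A rough sketch of why that assumption matters, reusing the logic of the question's MbcsToByte. The class name CastDemo and the Decode helper are made up for illustration, and the RegisterProvider call is again an assumption that only applies to .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

public static class CastDemo
{
    static CastDemo()
    {
        // Assumption: needed on .NET Core / .NET 5+ only, to make code page 866 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Same logic as the question's MbcsToByte: keeps only the low 8 bits of each char.
    public static byte[] MbcsToByte(string s)
    {
        byte[] b = new byte[s.Length];
        for (int i = 0; i < s.Length; i++)
            b[i] = unchecked((byte)s[i]);
        return b;
    }

    // Hypothetical helper: the question's approach reduced to its essence.
    public static string Decode(string input, int codepage)
    {
        return Encoding.GetEncoding(codepage).GetString(MbcsToByte(input));
    }

    public static void Main()
    {
        // Real Unicode text: U+0411 is truncated to 0x11, which is not the CP866 byte 0x81.
        Console.WriteLine(BitConverter.ToString(MbcsToByte("\u0411"))); // 11

        // "Bytes smuggled in a string": the char U+0081 carries the CP866 byte 0x81,
        // so this is the one situation where the cast happens to give the right result.
        Console.WriteLine(Decode("\u0081", 866) == "\u0411"); // True
    }
}
```

So the cast only gives usable bytes when every char in the string is really a byte value in disguise, which is exactly the kind of input a proper string should never contain.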

In your WideCharToMultiByte() function, you are converting the input string to a UTF-16 byte[] array, then converting that array to a codepage-encoded byte[] array. So far so good (though you can use Encoding.GetBytes() to go from a UTF-16 string directly to a codepage-encoded byte[] without using Encoding.Convert() at all). But then you are using the same codepage to convert the codepage-encoded byte[] array back to a UTF-16 string, un-doing everything you had just done. The output string will have the same value as the input string (provided the specified codepage supports all of the Unicode characters in the input string, otherwise you will have data loss during the first codepage conversion).
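Here is a minimal sketch of that round trip. The class name RoundTripDemo and the RoundTrip helper are made up for illustration. I pass explicit "?" replacement fallbacks so the lossy case behaves predictably; that is my choice for the demo, not something the question's code does. The RegisterProvider call is an assumption that only applies to .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    static RoundTripDemo()
    {
        // Assumption: needed on .NET Core / .NET 5+ only, to make code page 1251 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encodes to the code page and immediately decodes with the same code page,
    // which is effectively what the question's WideCharToMultiByte does.
    public static string RoundTrip(string input, int codepage)
    {
        Encoding enc = Encoding.GetEncoding(
            codepage,
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));

        byte[] encoded = enc.GetBytes(input);
        return enc.GetString(encoded);
    }

    public static void Main()
    {
        // Russian text is fully representable in code page 1251, so nothing changes.
        string russian = "\u041F\u0440\u0438\u0432\u0435\u0442";
        Console.WriteLine(RoundTrip(russian, 1251) == russian); // True

        // Hebrew letters are not in code page 1251, so each one is replaced by '?'.
        string hebrew = "\u05E9\u05DC\u05D5\u05DD";
        Console.WriteLine(RoundTrip(hebrew, 1251)); // ????
    }
}
```

Either way the caller never sees the codepage-encoded bytes, which is the whole point of WideCharToMultiByte().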

That being said, the correct code should be more like this instead:

    public static string MultiByteToWideChar(byte[] input, int codepage)
    {
        return Encoding.GetEncoding(codepage).GetString(input);
    }

    public static byte[] WideCharToMultiByte(string input, int codepage)
    {
        return Encoding.GetEncoding(codepage).GetBytes(input);
    }

Don't use a string to hold encoded bytes, use an actual byte[] array instead.
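As a usage sketch, the two byte[]-based functions can be chained to transcode text from one code page to another, with a string only ever holding the decoded Unicode text in between. The class name TranscodeDemo and the Transcode helper are made up for illustration, and the RegisterProvider call is an assumption that only applies to .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

public static class TranscodeDemo
{
    static TranscodeDemo()
    {
        // Assumption: needed on .NET Core / .NET 5+ only, to make code pages 866 and 1251 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encoded bytes in, Unicode string out.
    public static string MultiByteToWideChar(byte[] input, int codepage)
    {
        return Encoding.GetEncoding(codepage).GetString(input);
    }

    // Unicode string in, encoded bytes out.
    public static byte[] WideCharToMultiByte(string input, int codepage)
    {
        return Encoding.GetEncoding(codepage).GetBytes(input);
    }

    // Hypothetical helper: decode from one code page, then encode into another.
    public static byte[] Transcode(byte[] input, int fromCodepage, int toCodepage)
    {
        string text = MultiByteToWideChar(input, fromCodepage);
        return WideCharToMultiByte(text, toCodepage);
    }

    public static void Main()
    {
        // The Russian word "Привет" encoded in code page 866.
        byte[] cp866 = { 0x8F, 0xE0, 0xA8, 0xA2, 0xA5, 0xE2 };

        string text = MultiByteToWideChar(cp866, 866);
        Console.WriteLine(text == "\u041F\u0440\u0438\u0432\u0435\u0442"); // True

        // The same word re-encoded in code page 1251.
        Console.WriteLine(BitConverter.ToString(Transcode(cp866, 866, 1251))); // CF-F0-E8-E2-E5-F2
    }
}
```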
