
September 7, 2006

Unicode-y-ness UTF-16 Nonsense

UTF-16 is an abomination. It is a lot of wasted code and space, doesn't really solve any real-world issues, and is just generally bad (not to mention responsible for probably billions of dollars of wasted effort fitting and retrofitting code). Everyone should just stick to UTF-8. Strings are not randomly accessible data structures (unless you want to REALLY waste memory with UTF-32) - get over it. Adjust your coding practices appropriately, and for God's sake don't convert everything to "mostly" two-byte character strings. UTF-16 isn't fixed-width either: anything outside the Basic Multilingual Plane needs a surrogate pair, i.e. four bytes.
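To illustrate "strings are not randomly accessible," here is a minimal Python sketch that walks UTF-8 bytes sequentially and reports the byte offset where each code point starts. The function name `utf8_code_points` is hypothetical, and the sketch assumes mostly well-formed input (it checks lead bytes and truncation but does not validate continuation bytes). Python's built-in `bytes.decode("utf-8")` does all of this for real; the point is only to show that you find the Nth character by scanning, not by indexing.

```python
def utf8_code_points(data: bytes):
    """Yield (byte_offset, code_point) pairs by scanning UTF-8 bytes in order.

    Sketch only: lead bytes and truncation are checked, but continuation
    bytes (10xxxxxx) are not validated.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:               # 0xxxxxxx: ASCII, 1 byte
            length, cp = 1, lead
        elif lead >> 5 == 0b110:      # 110xxxxx: 2-byte sequence
            length, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:     # 1110xxxx: 3-byte sequence
            length, cp = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:    # 11110xxx: 4-byte sequence
            length, cp = 4, lead & 0x07
        else:
            raise ValueError(f"invalid UTF-8 lead byte at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated UTF-8 sequence at offset {i}")
        # Each continuation byte contributes its low 6 bits.
        for b in data[i + 1:i + length]:
            cp = (cp << 6) | (b & 0x3F)
        yield i, cp
        i += length


text = "a\u20ac\U0001F600"  # 'a', euro sign, grinning-face emoji
encoded = text.encode("utf-8")
print([(off, hex(cp)) for off, cp in utf8_code_points(encoded)])
# [(0, '0x61'), (1, '0x20ac'), (4, '0x1f600')]
print(len(text.encode("utf-16-le")))  # 8: the emoji takes a 4-byte surrogate pair
```

The third character starts at byte 4, not byte 2, so there is no constant-time "jump to character N" in UTF-8. UTF-16 doesn't escape this either: the emoji needs a surrogate pair there too.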

(And yes, UTF-8 takes a bit more space for Far East languages, typically three bytes per CJK character versus two in UTF-16, but it is still just better - for example, do you know which endianness your UTF-16 is using?)
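Both halves of that tradeoff are easy to see from Python's standard codecs. The sketch below compares sizes for a short Japanese string and shows that the same character has two different UTF-16 byte sequences depending on byte order. The helper `detect_utf16_byte_order` is a hypothetical name for illustration; it only inspects a leading byte order mark (BOM) and cannot tell anything about BOM-less data. It also doesn't distinguish a UTF-32-LE BOM, which begins with the same two bytes.

```python
import codecs


def detect_utf16_byte_order(data: bytes) -> str:
    """Guess UTF-16 byte order from a leading BOM, if there is one."""
    if data.startswith(codecs.BOM_UTF16_LE):  # b'\xff\xfe'
        return "little"
    if data.startswith(codecs.BOM_UTF16_BE):  # b'\xfe\xff'
        return "big"
    return "unknown"  # no BOM: the byte order has to come from somewhere else


far_east = "日本語"
print(len(far_east.encode("utf-8")))      # 9: three bytes per character
print(len(far_east.encode("utf-16-le")))  # 6: two bytes per character

# Same character, two incompatible byte sequences.
print("A".encode("utf-16-le"))  # b'A\x00'
print("A".encode("utf-16-be"))  # b'\x00A'

print(detect_utf16_byte_order(codecs.BOM_UTF16_BE + "A".encode("utf-16-be")))  # big
print(detect_utf16_byte_order("A".encode("utf-16-le")))                        # unknown
```

UTF-8 has a single byte order, so the question never comes up. With UTF-16, every file, socket, or API boundary has to agree on endianness or carry a BOM.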

P.S. This only applies if you're writing code, not scripts and their high-level equivalents.

2 comments:

Anonymous said...

Sree, you da BOM!

http://www.unicode.org/notes/tn12/

Sree Kotay said...

Yeah - legacy apps should really never go UTF-16 (IMHO, natch :))