I cannot believe what I’m seeing. Unicode handling in browser JavaScript engines failed miserably.

There’s a character in the Turkish alphabet (Unicode U+0130): the capital form of “i”, which looks just like “I” but with a dot on top. Why am I describing it instead of just showing it? Well, WordPress also seems unable to handle it. From now on, [ci] refers to this special character in this post.

Chrome bug

toLowerCase() in Chrome 5.0.375.99 is buggy. If a string contains [ci], the resulting string has artifacts. For instance, take the original string “[ci]ello”, which is 5 characters long. When you feed it to toLowerCase(), the resulting string is 6 characters long, with an extra “garbage” character after the converted [ci]. The result is something like i\xxxello. You can see the garbage after the “i” via charCodeAt().
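To see exactly what ends up in the lowercased string, something like the sketch below can dump every code unit. It assumes [ci] is written as the escape \u0130 (so the source file's encoding does not matter), and codeUnits is just a helper name made up for this illustration.

```javascript
// Hypothetical helper: list the UTF-16 code units of a string as 4-digit hex.
function codeUnits(s) {
  const units = [];
  for (let i = 0; i < s.length; i++) {
    units.push(s.charCodeAt(i).toString(16).padStart(4, "0"));
  }
  return units;
}

const original = "\u0130ello"; // [ci] followed by "ello", 5 code units
const lowered = original.toLowerCase();

// Print length and code units before and after lowercasing.
console.log(original.length, codeUnits(original));
console.log(lowered.length, codeUnits(lowered));
```

If the extra unit prints as 0307, that is U+0307 COMBINING DOT ABOVE, which Unicode's special casing rules place after the “i” when lowercasing U+0130, so it may be deliberate rather than random data.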

Firefox is plagued too

Then I tried case-insensitive RegExp matching. Trying to match “[ci]ello” against “/iello/i” failed on both Chrome 5.0.375.99 and Firefox 3.6.8.
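The matching test can be reproduced with a few lines. The sketch below also includes one possible workaround that the tests above did not cover: replacing [ci] with a plain “i” before matching. matchesIgnoringDottedI is a hypothetical helper name, and the approach is an assumption about what is acceptable for your data, not a general fix.

```javascript
const subject = "\u0130ello"; // [ci] followed by "ello"
const pattern = /iello/i;

// The case-insensitive match from the post.
console.log(pattern.test(subject));

// For comparison: an ASCII capital I does match.
console.log(pattern.test("Iello")); // true

// Hypothetical workaround: map [ci] (U+0130) to "i" before matching.
function matchesIgnoringDottedI(re, s) {
  return re.test(s.replace(/\u0130/g, "i"));
}

console.log(matchesIgnoringDottedI(pattern, subject)); // true
```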

Internet Explorer 8 kicked ass, or what?

Even though I, very objectively, hate the IE series too, I should note that IE8’s toLowerCase() and RegExp matching work perfectly. In your face, all Microsoft haters :)

What about toUpperCase()?

On all browsers, toUpperCase() converts “i” to “I” and not to “[ci]”. That is defensible: the context (language) is not known and English is naturally assumed to be the one in use, so converting “i” to “I” is perfectly acceptable. That doesn’t make it right, though. In Turkish, uppercasing “i” should return “[ci]”, which makes JavaScript look inadequate at handling internationalization.
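For comparison, here is a small sketch of the two behaviors side by side. It assumes an engine where toLocaleUpperCase() accepts a locale tag such as "tr", which was not something the browsers tested above offered; upperTurkish is a hypothetical helper name.

```javascript
// Default uppercasing does not depend on language.
const plain = "i".toUpperCase();
console.log(plain); // "I"

// Hypothetical helper: uppercase using Turkish rules where the engine supports them.
function upperTurkish(s) {
  return s.toLocaleUpperCase("tr");
}

// Print the code point of the result; 130 would mean [ci] (U+0130).
const turkish = upperTurkish("i");
console.log(turkish.charCodeAt(0).toString(16));
```

On engines without locale support the argument may simply be ignored, in which case the second result is the same plain “I”.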