Tuesday, July 26, 2011

Inserting Unicode characters

The other day I had to enter Unicode characters in an existing text file. Specifically, I needed to insert word joiner characters (U+2060) between existing characters so that they wouldn't get split by line wrapping. This isn't something I do on a daily basis, so I had to do a bit of digging online to find out how (credit goes to this site I came across after some Google searching).
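If you'd rather script a change like this than type it by hand, a minimal Python sketch can put a word joiner (U+2060) between every pair of adjacent characters. The helper name `join_with_word_joiner` is hypothetical and not from the original post; it simply relies on `str.join`.

```python
# U+2060 WORD JOINER: a zero-width character that prohibits a line break at its position.
WORD_JOINER = "\u2060"

def join_with_word_joiner(text: str) -> str:
    """Return text with a word joiner inserted between every pair of adjacent characters."""
    return WORD_JOINER.join(text)

if __name__ == "__main__":
    joined = join_with_word_joiner("abc")
    # Show the code points so the invisible joiners are visible.
    print([f"U+{ord(c):04X}" for c in joined])
    # ['U+0061', 'U+2060', 'U+0062', 'U+2060', 'U+0063']
```

Since the word joiner is zero-width, the text looks unchanged on screen; printing the code points is the easiest way to confirm the joiners are there.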

Inserting Unicode characters in Vim is quite straightforward: in insert mode, press ctrl+v, then u, and then type the character's code point in hexadecimal (up to four digits). For example, to enter a non-breaking space (U+00A0), you'd enter insert mode, press ctrl+v, type u, and then type 00A0.
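The digits typed after u are just the character's code point written in hexadecimal. As a rough illustration in Python (not Vim itself), turning those digits into the character amounts to `chr(int(digits, 16))`. Vim's u accepts up to four hex digits; for characters beyond U+FFFF, Vim uses a capital U followed by up to eight digits. The helper names below are hypothetical.

```python
def char_from_hex(digits: str) -> str:
    """Turn the hex digits typed after ctrl+v u (e.g. "00A0") into the character."""
    return chr(int(digits, 16))

def vim_sequence(ch: str) -> str:
    """Describe Vim insert-mode keys for ch: u takes up to 4 hex digits, U up to 8."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        return f"ctrl+v u {cp:04X}"
    return f"ctrl+v U {cp:08X}"

if __name__ == "__main__":
    nbsp = char_from_hex("00A0")
    print(repr(nbsp))          # '\xa0'
    print(vim_sequence(nbsp))  # ctrl+v u 00A0
```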

Below is an example of doing just this. I've entered the name of one of my favorite novels in runic.

Runic letters entered using Unicode characters

Of course, this is a somewhat contrived example, because I can't imagine anyone writing out a long string of characters this way; for that, it would behoove one to use an IME (input method editor) instead. I'd imagine using this sort of Unicode input more sparingly, such as for the occasional non-breaking space, word joiner, or oh-so-cool math symbol (yes, I'm talking about you, proper subset symbol, ⊊).
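When you know a character's name but not its code point, Python's standard `unicodedata` module can look it up. A small sketch follows; the helper `code_point_for` is hypothetical, and the names used are the official Unicode names (the non-breaking space is formally called NO-BREAK SPACE, and ⊊ is SUBSET OF WITH NOT EQUAL TO).

```python
import unicodedata

def code_point_for(name: str) -> str:
    """Return the U+XXXX code point for an official Unicode character name.

    unicodedata.lookup raises KeyError if the name is not recognized.
    """
    return f"U+{ord(unicodedata.lookup(name)):04X}"

if __name__ == "__main__":
    for name in ["NO-BREAK SPACE", "WORD JOINER", "SUBSET OF WITH NOT EQUAL TO"]:
        print(f"{name}: {code_point_for(name)}")
    # NO-BREAK SPACE: U+00A0
    # WORD JOINER: U+2060
    # SUBSET OF WITH NOT EQUAL TO: U+228A
```

The hex digits after "U+" are what you'd type after ctrl+v u in Vim.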

More generally, entering Unicode characters in Ubuntu uses the following key sequence: ctrl+shift+u, the character's hexadecimal code point, and then Enter. Certain text editors have their own methods for Unicode entry: Vim was already mentioned above, and Emacs has yet another (C-x 8 RET, followed by the character's name or hex code point). Different operating systems also have their own mechanisms: entering characters in Windows and Mac will likely differ from the sequence mentioned above.
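Because these methods all take the same hexadecimal code point, a tiny cheat-sheet helper can spell out the keystrokes for a given character. This is a sketch under a few assumptions: the helper name `key_sequences` is hypothetical, the Ubuntu sequence is the ctrl+shift+u method (provided by the GTK/IBus input layer, so it may not work in every application), and the Emacs sequence assumes the C-x 8 RET binding, which prompts for a character name or hex code point.

```python
def key_sequences(ch: str) -> dict[str, str]:
    """Return keystrokes for inserting ch in Ubuntu and Emacs (hypothetical helper)."""
    hex_cp = f"{ord(ch):04X}"  # at least four hex digits, uppercase
    return {
        "Ubuntu (ctrl+shift+u)": f"ctrl+shift+u {hex_cp} Enter",
        "Emacs (C-x 8 RET)": f"C-x 8 RET {hex_cp} RET",
    }

if __name__ == "__main__":
    for where, keys in key_sequences("\u2060").items():
        print(f"{where}: {keys}")
```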