Friday, April 16, 2010

Joel on Software and UNICODE and HTML charset

I wanted to post a note on the joelonsoftware.com page on UNICODE basics.

That page has a link to i18nguy.com which has useful pages linked.

My own habit is to use a variety of pages at fileformat.info such as the P-page and the S-page when I want a Pilcrow sign ( ¶ ) or the planet Saturn symbol or some such.

Some of my favourites are balanced quotation marks ( “ ” ) and a variety of foreign quotation marks ( »« ).

The reason for the quotation marks is so that when doing poetry markup in Curl I will not have to "escape" the keyboard double-quote character ( " ), which is used by the programming language but which I don't want to use in text markup.

I seem to use the paired single-quotes ( ‘ ’ ) less often.

Here are German-style quotes: „Sein“ (will be better in a suitable font.)

Other favourites of mine are  § Ç ß ä ö ü à é and ê ë è ï î ö ô along with Ä Ö Ü.  The last characters should have been capital A O U with umlaut diacritics: if you see something else, you may have to change a browser VIEW setting for Encoding for your web browser.
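
To see why the capitals with umlauts can turn into something else, here is a small Python sketch. It is only an illustration: it assumes the page bytes are UTF-8 and that the browser wrongly guesses Windows-1252 (cp1252); your browser may guess a different encoding and show different garbage.

```python
# Illustration only: the same UTF-8 bytes read with the right and the wrong encoding.
text = "Ä Ö Ü"
utf8_bytes = text.encode("utf-8")        # each umlaut capital becomes two bytes

# Assumed wrong guess: the viewer treats the bytes as Windows-1252.
garbled = utf8_bytes.decode("cp1252")

# Correct setting: the viewer treats the bytes as UTF-8.
restored = utf8_bytes.decode("utf-8")

print(garbled)    # two odd characters per umlaut
print(restored)   # the original three capitals
```

Switching the Encoding setting to UTF-8 corresponds to the second decode call.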

If you are a Windows user who wants to copy and save these to a Notepad cheatsheet, remember to choose UTF-8 as the encoding in the SAVE AS dialog.  When you need an  é  in a hurry, holding down the ALT-key, typing 1 3 0 on the numeric keypad and then releasing the ALT-key will place U+00E9 into most text fields.
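
The ALT-key number and the UNICODE code point are not the same number, which can be confusing. A short Python sketch of the relationship, assuming the keypad code is looked up in the old OEM code page 437 (code page 850 has é at the same position):

```python
# Assumption: ALT+130 is interpreted through OEM code page 437.
oem_byte = bytes([130])
char = oem_byte.decode("cp437")

# The character's UNICODE code point, written in the usual U+XXXX form.
code_point = f"U+{ord(char):04X}"

# The bytes that end up in a file saved as UTF-8.
utf8_bytes = char.encode("utf-8")

print(char, code_point, utf8_bytes)
```

So 130 is a code-page position, 00E9 is the code point, and the UTF-8 file holds two bytes for that one character.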

If you are a Windows user with an editor which will not display UNICODE with the default Arial Unicode MS font, well, there is always Notepad++ ... and it will allow you to save the file with or without the initial BOM bytes that indicate the encoding (which default Notepad will not suppress.)
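
For the curious, the BOM is just three extra bytes at the front of the file. The Python sketch below shows the difference between the two ways of saving; it uses Python's own "utf-8-sig" codec as a stand-in for what an editor does, and is not a description of how Notepad or Notepad++ work internally.

```python
import codecs

text = "é"

# "utf-8-sig" writes the BOM first; plain "utf-8" does not.
with_bom = text.encode("utf-8-sig")
without_bom = text.encode("utf-8")

# The only difference is the three BOM bytes EF BB BF at the start.
bom_only = with_bom[: len(codecs.BOM_UTF8)]

# Reading with "utf-8-sig" strips the BOM; reading with plain "utf-8"
# leaves it in the text as the invisible character U+FEFF.
clean = with_bom.decode("utf-8-sig")
leftover = with_bom.decode("utf-8")

print(with_bom, without_bom)
```

That leftover U+FEFF is why some tools prefer files saved without the BOM.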

The minimal HTML header follows:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
For more details, here is a char-encoding link.

Note:  the above links have been set to open in a tab or window other than this blog page.
