ITnews Wednesday, 20 August, 2008. 5:57 pm. London time.  
       

 


International Support for Character Sets

   

It is very important that the character encoding of every XML or (X)HTML document is clearly labelled; otherwise a client has to guess how to turn the document's bytes into characters. The encoding can be labelled in the following ways:

Use the charset parameter of the HTTP Content-Type header. Example:

Content-Type: text/html; charset=UTF-8 
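As an illustration, here is a minimal Python sketch of writing and reading that header (Python is chosen only for the examples; the article itself uses no programming language). The helper names make_content_type and charset_from_content_type are hypothetical. The parsing relies on the standard library's email.message.Message, which understands MIME-style header parameters and returns the charset lower-cased.

```python
from email.message import Message
from typing import Optional


# Hypothetical helper: build a Content-Type value carrying a charset parameter.
def make_content_type(media_type: str, charset: str) -> str:
    return f"{media_type}; charset={charset}"


# Hypothetical helper: read the charset parameter back out of a Content-Type
# value. Message.get_content_charset() unquotes and lower-cases the label and
# returns None when no charset parameter is present.
def charset_from_content_type(header_value: str) -> Optional[str]:
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


header = make_content_type("text/html", "UTF-8")
print(header)                                  # text/html; charset=UTF-8
print(charset_from_content_type(header))       # utf-8
print(charset_from_content_type("text/html"))  # None
```

Lower-casing is harmless here because charset labels are case-insensitive.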

For XML, use the encoding pseudo-attribute in the XML declaration at the start of a document, or in the text declaration at the start of an external entity. Example:

<?xml version="1.0" encoding="UTF-8" ?> 
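A short sketch of the same idea using Python's standard xml.etree.ElementTree module (the xml_declaration argument assumes Python 3.8 or later). It serialises a small document as ISO-8859-1, lets the library write the matching encoding declaration, and parses the bytes back. The parser reads the declaration to decode the text. ElementTree writes the declaration with single quotes, which XML accepts as readily as double quotes.

```python
import xml.etree.ElementTree as ET

# Build a tiny document containing a non-ASCII character.
root = ET.Element("p")
root.text = "café"

# Serialise it as ISO-8859-1; xml_declaration=True makes ElementTree emit
# an XML declaration that names this encoding.
data = ET.tostring(root, encoding="ISO-8859-1", xml_declaration=True)
print(data)

# When given bytes, the parser honours the encoding declaration, so the
# non-ASCII character survives the round trip.
parsed = ET.fromstring(data)
print(parsed.text)  # café
```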

For HTML, use the <meta> tag inside <head>. Example:

<meta http-equiv="Content-Type" 
	content="text/html; charset=utf-8" >
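To show how a client might read that <meta> element, here is a hedged sketch built on Python's html.parser module. MetaCharsetFinder and find_meta_charset are hypothetical names. The sketch works on text that has already been decoded; a real client would scan the start of the raw bytes (for example, decoded provisionally with an ASCII-compatible encoding) before choosing the final decoding.

```python
from email.message import Message
from html.parser import HTMLParser
from typing import Optional


class MetaCharsetFinder(HTMLParser):
    """Hypothetical helper: records the charset from the first
    <meta http-equiv="Content-Type"> element it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling us.
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "content-type":
            return
        # Reuse the MIME parameter parser to read charset=... from content.
        msg = Message()
        msg["Content-Type"] = attributes.get("content") or ""
        self.charset = msg.get_content_charset()


def find_meta_charset(html_text: str) -> Optional[str]:
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    finder.close()
    return finder.charset


page = """<html><head>
<meta http-equiv="Content-Type"
      content="text/html; charset=utf-8" >
<title>Example</title>
</head><body></body></html>"""
print(find_meta_charset(page))  # utf-8
```

Because HTMLParser routes self-closing tags through handle_starttag as well, the same sketch also finds the XHTML form ending in "/>".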

For XHTML, the empty <meta> element must be closed with a slash at the end of the tag. Example:

<meta http-equiv="Content-Type" 
	content="text/html; charset=utf-8" />
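The slash matters because XHTML is parsed as XML, where every element must be closed and an empty element such as <meta> is written with "/>". The sketch below uses Python's xml.etree.ElementTree as a stand-in XML parser; is_well_formed is a hypothetical helper that wraps each fragment in a <head> element before parsing. The HTML form without the slash is rejected.

```python
import xml.etree.ElementTree as ET

xhtml_meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
html_meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" >'


# Hypothetical helper: wrap the fragment in <head> and try to parse it as XML.
def is_well_formed(fragment: str) -> bool:
    try:
        ET.fromstring(f"<head>{fragment}</head>")
    except ET.ParseError:
        return False
    return True


print(is_well_formed(xhtml_meta))  # True
print(is_well_formed(html_meta))   # False: the <meta> element is never closed
```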

With this information, clients can map the declared encoding to Unicode and decode the document's characters correctly. In practice, a few encodings will be preferred, most likely ISO-8859-1 (Latin-1), US-ASCII, UTF-8, UTF-16, the other encodings in the ISO-8859 series, ISO-2022-JP, EUC-KR, and so on.
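The mapping step can be sketched in Python: the declared label selects a decoder, and the decoder turns bytes into Unicode characters. decode_declared is a hypothetical helper. codecs.lookup resolves many common labels to Python codecs but raises LookupError for labels it does not recognise, and Python's set of labels is not identical to any browser's. The example also shows why the label matters: the character é is one byte in ISO-8859-1 but two bytes in UTF-8.

```python
import codecs


# Hypothetical helper: decode bytes to Unicode text using a declared label.
# codecs.lookup() normalises labels such as "ISO-8859-1" or "UTF-8" to a
# Python codec and raises LookupError for unknown labels.
def decode_declared(data: bytes, label: str) -> str:
    codec = codecs.lookup(label)
    return data.decode(codec.name)


latin1_bytes = b"caf\xe9"    # "café" in ISO-8859-1: é is the single byte 0xE9
utf8_bytes = b"caf\xc3\xa9"  # "café" in UTF-8: é is the two bytes 0xC3 0xA9

print(decode_declared(latin1_bytes, "ISO-8859-1"))  # café
print(decode_declared(utf8_bytes, "UTF-8"))         # café

# A wrong label does not raise an error here; it silently yields the
# wrong characters.
print(decode_declared(utf8_bytes, "ISO-8859-1"))    # cafÃ©
```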

International Character Sets
Character set                 Netscape   Internet Explorer
Arabic
  iso-8859-6 / asmo-708       no         yes
  x-mac-arabic                no         yes
  windows-1256                no         yes
Baltic
  iso-8859-4                  no         yes
  windows-1257                no         yes
Central European
  iso-8859-2 / latin2         yes        yes
  x-mac-ce                    yes        yes
  windows-1250                yes        yes
Chinese Simplified
  euc-cn                      no         yes
  gb2312                      yes        yes
  hz-gb-2312                  yes        yes
  x-mac-chinesesimp           no         yes
  cp-936                      no         no
Chinese Traditional
  big5                        yes        yes
  x-mac-chinesetrad           no         yes
  cp-950                      no         no
  euc-tw                      yes        no
Cyrillic
  iso-8859-5                  yes        yes
  koi8-r                      yes        yes
  koi8                        no         yes
  x-mac-cyrillic              yes        yes
  windows-1251                yes        yes
Greek
  iso-8859-7                  yes        yes
  x-mac-greek                 yes        yes
  windows-1253                yes        yes
Hebrew
  iso-8859-8*                 no         yes
  iso-8859-8-i                no         yes
  x-mac-hebrew                no         yes
  windows-1255                no         yes
Icelandic
  x-mac-icelandic             no         yes
Japanese
  euc-jp                      yes        yes
  iso-2022-jp                 yes        yes
  x-mac-japanese              no         yes
  shift_jis                   yes        yes
  cp-932                      no         no
Korean
  ks_c_5601-1987 / ksc5601    yes        yes
  euc-kr                      yes        yes
  iso-2022-kr                 yes        yes
  x-mac-korean                no         yes
Latin 9
  iso-8859-15 / latin9        no         yes
Maltese
  iso-8859-3 / latin3         no         yes
Thai
  windows-874 / tis-620       no         yes
Turkish
  iso-8859-9 / latin5         yes        yes
  x-mac-turkish               yes        yes
  windows-1254                no         yes
Unicode
  utf-7                       yes        yes
  utf-8                       yes        yes
  iso-10646-ucs-2             yes        yes
  us-ascii                    yes        yes
Vietnamese
  windows-1258                no         yes
Western European
  iso-8859-1 / latin1         yes        yes
  x-mac-roman                 yes        no
  macintosh                   no         yes
  windows-1252                no         yes
Other
  isiri-3342                  no         no
   
