It is very important that the character encoding of any XML or (X)HTML document is clearly labelled. This can be done in the following ways:
Use the charset parameter in the Content-Type header of HTTP. Example:
Content-Type: text/html; charset=UTF-8
For XML, use the encoding pseudo-attribute in the XML declaration at the start of a document or the text declaration at the start of an entity. Example:
<?xml version="1.0" encoding="UTF-8" ?>
For HTML, use the <meta> tag inside <head>. Example:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
With this information, clients can easily map these encodings to Unicode. In practice, a few encodings will be preferred, most likely: ISO-8859-1 (Latin-1), US-ASCII, UTF-8, UTF-16, the other encodings in the ISO-8859 series, iso-2022-jp, euc-kr, and so on.
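
To illustrate how a client might use these labels, here is a minimal Python sketch that looks for a declared encoding in the three places listed above and then decodes the bytes to Unicode. The function names (charset_from_content_type, declared_encoding, to_unicode), the 1024-byte scan window, and the UTF-8 fallback are assumptions made for this illustration and are not taken from any specification. The sketch also assumes that the HTTP header takes precedence over in-document declarations and that the document starts in an ASCII-compatible encoding, so it would not find a declaration inside a UTF-16 document.

```python
import codecs
import re
from email.message import Message
from typing import Optional

# Matches the encoding pseudo-attribute of an XML or text declaration.
XML_DECL = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)
# Matches charset=... inside a <meta> tag (http-equiv form or charset attribute).
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9._-]+)""", re.IGNORECASE
)


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def declared_encoding(body: bytes, content_type: Optional[str] = None) -> Optional[str]:
    """Find the declared encoding: HTTP header first, then the document itself."""
    if content_type:
        charset = charset_from_content_type(content_type)
        if charset:
            return charset
    head = body[:1024]  # assumption: declarations appear near the start
    match = XML_DECL.search(head) or META_CHARSET.search(head)
    if match:
        return match.group(1).decode("ascii").lower()
    return None


def to_unicode(body: bytes, content_type: Optional[str] = None,
               fallback: str = "utf-8") -> str:
    """Decode body to a Unicode string using the declared encoding."""
    label = declared_encoding(body, content_type) or fallback
    codecs.lookup(label)  # raises LookupError if the label is unknown
    return body.decode(label)
```

For example, to_unicode(raw_bytes, "text/html; charset=ISO-8859-1") decodes raw_bytes as Latin-1, while calling it without a header falls back to whatever the XML declaration or <meta> tag says. An unlabelled document leaves the client guessing, which is why the sketch needs a fallback at all.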