Copyright 2001-2002 Printer Working Group, All Rights Reserved.
The Character Repertoires activity within the PWG seeks to agree on a set of repertoires to be used in a wide range of printing products.
The current document summarizes existing work in this area.
This document is informative only. It has been neither reviewed nor approved by PWG Members. It is not a stable document and may not be cited as a normative reference from another document.
Public discussion of Character Repertoires takes place on the mailing list cr@pwg.org (archive). To subscribe, send an email to majordomo@pwg.org with the words "subscribe cr" in the body. You must be subscribed to the mailing list to post there. Please report errors in this document to one of the editors listed above or to the mailing list.
A list of current PWG Standards and other technical documents can be found at http://www.pwg.org/standards.html.
TBD...
Section 12.2.3 of [BPP] summarizes the Character Repertoires Supported field, with values taken from this table:
Bit Number | Character Repertoire | Description |
Bit0 | ISO-8859-1 | Latin alphabet No. 1 |
Bit1 | ISO-8859-2 | Latin alphabet No. 2 |
Bit2 | ISO-8859-3 | Latin alphabet No. 3 |
Bit3 | ISO-8859-4 | Latin alphabet No. 4 |
Bit4 | ISO-8859-5 | Latin/Cyrillic alphabet |
Bit5 | ISO-8859-6 | Latin/Arabic alphabet |
Bit6 | ISO-8859-7 | Latin/Greek alphabet |
Bit7 | ISO-8859-8 | Latin/Hebrew alphabet |
Bit8 | ISO-8859-9 | Latin alphabet No. 5 |
Bit9 | ISO-8859-10 | Latin alphabet No. 6 |
Bit10 | ISO-8859-13 | Latin alphabet No. 7 |
Bit11 | ISO-8859-14 | Latin alphabet No. 8 |
Bit12 | ISO-8859-15 | Latin alphabet No. 9 |
Bit13 | GB_2312-80 | Chinese (People’s Republic of China) |
Bit14 | Shift_JIS | Japanese |
Bit15 | KS_C_5601-1987 | Korean |
Bit16 | Big5 | Chinese (Taiwan) |
Bit17 | TIS-620 | Thai |
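As an illustration of how a client might read this field, the following sketch decodes a bit mask into repertoire names using the table above. It assumes the field is held as an integer with Bit0 as the least significant bit; the actual wire layout is defined in [BPP] and is not restated here. The names REPERTOIRE_BITS, decode_repertoires, and encode_repertoires are hypothetical.

```python
# Repertoire names in bit order, copied from the table above.
REPERTOIRE_BITS = [
    "ISO-8859-1", "ISO-8859-2", "ISO-8859-3", "ISO-8859-4", "ISO-8859-5",
    "ISO-8859-6", "ISO-8859-7", "ISO-8859-8", "ISO-8859-9", "ISO-8859-10",
    "ISO-8859-13", "ISO-8859-14", "ISO-8859-15",
    "GB_2312-80", "Shift_JIS", "KS_C_5601-1987", "Big5", "TIS-620",
]


def decode_repertoires(mask):
    """Return the repertoire names whose bits are set in mask.

    Assumption: Bit0 is the least significant bit of the integer.
    """
    return [name for bit, name in enumerate(REPERTOIRE_BITS) if mask & (1 << bit)]


def encode_repertoires(names):
    """Build a bit mask from repertoire names (inverse of decode_repertoires)."""
    mask = 0
    for name in names:
        mask |= 1 << REPERTOIRE_BITS.index(name)
    return mask
```

For example, a printer supporting Latin alphabet No. 1 and Japanese would, under this assumption, report a mask with Bit0 and Bit14 set.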
Most of these repertoires were defined independently of Unicode. We therefore rely on Unicode documents that map each character set into Unicode, which yields the list of Unicode values needed to support that repertoire.
[ISO-8859] defines various Latin-based alphabets (each up to 256 characters in size), while [Unicode-8859] is a set of mappings from ISO codes to Unicode values.
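The idea of deriving a repertoire from a mapping table can be sketched as follows. This example uses Python's built-in single-byte codecs as a stand-in for the [Unicode-8859] mapping files, which is an assumption made for convenience; the function name repertoire_code_points is hypothetical. Note that the result includes the control-character positions that the codec maps, not only graphic characters.

```python
def repertoire_code_points(codec_name):
    """Return the sorted Unicode code points reachable from a single-byte codec.

    Each byte value 0x00-0xFF is decoded on its own; bytes that the
    character set leaves unassigned are skipped.
    """
    code_points = set()
    for byte in range(0x100):
        try:
            char = bytes([byte]).decode(codec_name)
        except UnicodeDecodeError:
            continue  # unassigned position in this character set
        code_points.add(ord(char))
    return sorted(code_points)


# Code points needed to support Latin alphabet No. 2 and No. 9.
latin2 = repertoire_code_points("iso8859_2")
latin9 = repertoire_code_points("iso8859_15")
```

The multi-byte repertoires in the table (GB_2312-80, Shift_JIS, KS_C_5601-1987, Big5) would need a different enumeration strategy and are not covered by this sketch.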
As part of its OpenType specification, Microsoft defines the WGL4.0 character set, which is expressed in terms of Unicode. It has 652 characters, covering many of the characters from the ISO Latin sets as well as quite a few symbols. Any Microsoft client is likely to assume these characters are available.
[XHTML-Chars] defines a number of pre-defined character entities, in these groups:
These total 254 entries.
You can compare the ISO-8859, Microsoft, and XHTML repertoires side by side here.
These are the relevant fields in the [Unihan] database:
For Thai, use ISO-8859-11, which is equivalent to TIS 620-2533 (1990) with the addition of 0xA0 NO-BREAK SPACE.
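A minimal sketch of the Thai repertoire described above, assuming the commonly published layout in which bytes 0xA1-0xDA and 0xDF-0xFB map to U+0E01-U+0E3A and U+0E3F-U+0E5B (a constant offset of 0x0D60). The function name thai_code_points and the include_nbsp parameter are hypothetical.

```python
THAI_OFFSET = 0x0D60  # 0xA1 maps to U+0E01, so the offset is 0x0E01 - 0xA1


def thai_code_points(include_nbsp=True):
    """Return {byte: Unicode code point} for the upper half of the Thai set.

    With include_nbsp=True the result models ISO-8859-11; with False it
    models TIS-620 as described in the text (no 0xA0 NO-BREAK SPACE).
    """
    mapping = {}
    if include_nbsp:
        mapping[0xA0] = 0x00A0  # NO-BREAK SPACE, the ISO-8859-11 addition
    # Two assigned runs; 0xDB-0xDE and 0xFC-0xFF are left unassigned.
    for byte in list(range(0xA1, 0xDB)) + list(range(0xDF, 0xFC)):
        mapping[byte] = byte + THAI_OFFSET
    return mapping


iso_8859_11 = thai_code_points()
tis_620 = thai_code_points(include_nbsp=False)
```

Under these assumptions the two mappings differ only at byte 0xA0, which matches the statement above.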