User:Kukkurovaca/Unicode Consortium clips

Q: I cannot find the "half forms" of Devanagari letters (or of any other Indic script) in the Unicode charts. These characters are needed to form words such as "patni".

A: Unicode does not encode half or subjoined letters for the scripts of India. As in the ISCII standard, Unicode forms all "consonant clusters" (such as the "tn" in "patni") by inserting the character "virama" (or "halant") between the two relevant consonant letters. For instance, the Devanagari syllable "tna" ("त्न") is encoded with the following code points:

U+0924 DEVANAGARI LETTER TA
U+094D DEVANAGARI SIGN VIRAMA (= halant)
U+0928 DEVANAGARI LETTER NA
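As a minimal sketch, the sequence above can be built and inspected in Python with the standard `unicodedata` module. The names `TNA` and `describe` are illustrative only and do not come from the FAQ.

```python
import unicodedata

# Code points taken from the text: TA + VIRAMA + NA encode the syllable "tna".
TNA = "\u0924\u094D\u0928"

def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for line in describe(TNA):
    print(line)
```

The string holds three code points however the font draws it; whether a ligature appears is a rendering matter, not an encoding one.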

These three characters are normally displayed as a single glyph, the tna ligature. They may also be displayed as a half ta glyph followed by a full na glyph, or even as a full ta glyph combined with a virama glyph and followed by a full na glyph.

Which form is actually displayed is decided by an underlying software module called a "display engine", which bases its decision on the glyphs available in the font.
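The fallback just described can be pictured with a toy model. This is only a sketch under stated assumptions: the glyph names (`tna_ligature`, `half_ta`, and so on) and the function `choose_rendering` are hypothetical, the preference order is inferred from the answer's wording, and real display engines use font tables rather than a set of names.

```python
def choose_rendering(font_glyphs):
    """Pick glyphs for TA + VIRAMA + NA from a set of hypothetical glyph names.

    Assumed preference order: ligature, then half form, then visible virama.
    """
    if "tna_ligature" in font_glyphs:
        return ["tna_ligature"]            # single conjunct glyph
    if "half_ta" in font_glyphs:
        return ["half_ta", "na"]           # half ta followed by full na
    return ["ta", "virama", "na"]          # full ta with visible virama, then na

print(choose_rendering({"ta", "na", "virama", "half_ta", "tna_ligature"}))
print(choose_rendering({"ta", "na", "virama", "half_ta"}))
print(choose_rendering({"ta", "na", "virama"}))
```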

If the sequence U+0924, U+094D is not followed by another consonant letter (such as "na"), it is always displayed as a full ta glyph combined with the virama glyph ("त्").

Unicode provides a way to force the display engine to show a half letter form. To do this, an invisible character called ZERO WIDTH JOINER should be inserted after the virama:

U+0924 DEVANAGARI LETTER TA
U+094D DEVANAGARI SIGN VIRAMA (= halant)
U+200D ZERO WIDTH JOINER
U+0928 DEVANAGARI LETTER NA

This sequence is always displayed as a half ta glyph followed by a full na glyph. Even if the consonant "na" is not present, the sequence U+0924, U+094D, U+200D is displayed as a half ta glyph.
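A short sketch of how such a sequence might be assembled in Python. The helper name `request_half_form` is an assumption made for illustration; only the code points come from the text.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER

def request_half_form(first, second=""):
    """Build first + VIRAMA + ZWJ, optionally followed by a second consonant."""
    return first + VIRAMA + ZWJ + second

TA, NA = "\u0924", "\u0928"
half_ta_na = request_half_form(TA, NA)  # half ta followed by full na
half_ta = request_half_form(TA)         # half ta on its own

print([f"U+{ord(c):04X}" for c in half_ta_na])
# ZWJ is a format character (general category Cf), so it has no glyph of its own.
print(unicodedata.category(ZWJ))
```

Because the joiner is invisible, two strings that look alike on screen can differ in length; comparing code points is more reliable than comparing rendered text.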

Unicode also provides a way to force the display engine to show the virama glyph. To do this, an invisible character called ZERO WIDTH NON-JOINER should be inserted after the virama:

U+0924 DEVANAGARI LETTER TA
U+094D DEVANAGARI SIGN VIRAMA (= halant)
U+200C ZERO WIDTH NON-JOINER
U+0928 DEVANAGARI LETTER NA

This sequence is always displayed as a full ta glyph combined with a virama glyph and followed by a full na glyph.
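The cases covered in this answer can be summarized by looking at the code point that follows the virama. The sketch below is a simplification under stated assumptions: `requested_form` is a hypothetical helper, it inspects only the first virama in the string, and it treats only U+0915 through U+0939 as Devanagari consonant letters.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

def is_consonant(ch):
    """Assumption: only the basic range KA..HA counts as a consonant letter."""
    return len(ch) == 1 and "\u0915" <= ch <= "\u0939"

def requested_form(text):
    """Classify what follows the first virama in text."""
    i = text.find(VIRAMA)
    if i == -1:
        return "no virama"
    nxt = text[i + 1:i + 2]  # empty string if the virama is last
    if nxt == ZWJ:
        return "half form"
    if nxt == ZWNJ:
        return "visible virama"
    if is_consonant(nxt):
        return "display engine decides"
    return "visible virama"  # virama not followed by a consonant

for s in ("\u0924\u094D\u0928", "\u0924\u094D\u200D\u0928",
          "\u0924\u094D\u200C\u0928", "\u0924\u094D"):
    print([f"U+{ord(c):04X}" for c in s], "->", requested_form(s))
```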

For more detailed information, see Chapter 9 of the Unicode Standard, "South and Southeast Asian Scripts". For related issues, see "Where is My Character?" [MC]