
Alphanumeric characters

Basic Set of Alphanumeric Characters

  • Latin digits (0 - 9)
  • Upper- & lowercase Latin letters (a - z, A - Z)
  • Uppercase Greek letters Α - Ω plus the nabla ∇ and the variant of theta ϴ given by U+03F4
  • Lowercase Greek letters α - ω plus the partial differential sign ∂ and glyph variants of ε, θ, κ, φ, ρ, and π
  • Only unaccented forms of letters are used

Mathematical notation uses a basic set of mathematical alphanumeric characters which consists of:

  • set of basic Latin digits (0 - 9) (U+0030 – U+0039)
  • set of basic upper- and lowercase Latin letters (a - z, A - Z) (U+0061 – U+007A, U+0041 – U+005A)
  • uppercase Greek letters Α - Ω (U+0391 – U+03A9), plus the nabla ∇ (U+2207) and the variant of theta ϴ given by U+03F4
  • lowercase Greek letters α - ω (U+03B1 – U+03C9), plus the partial differential sign ∂ (U+2202) and the six glyph variants of ε, θ, κ, φ, ρ, and π (ϵ, ϑ, ϰ, ϕ, ϱ, and ϖ), given by U+03F5, U+03D1, U+03F0, U+03D5, U+03F1, and U+03D6.
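
As a rough illustration, the basic set listed above can be enumerated from its code point ranges. The sketch below uses Python's standard unicodedata module; the names DIGITS, EXTRAS, and basic_math_alphanumerics are made up for this example and are not part of any standard API.

```python
import unicodedata

# Ranges and single code points taken from the list above.
DIGITS = range(0x0030, 0x003A)        # 0 - 9
UPPER_LATIN = range(0x0041, 0x005B)   # A - Z
LOWER_LATIN = range(0x0061, 0x007B)   # a - z
UPPER_GREEK = range(0x0391, 0x03AA)   # Alpha - Omega (U+03A2 is unassigned)
LOWER_GREEK = range(0x03B1, 0x03CA)   # alpha - omega (includes final sigma)
EXTRAS = [
    0x2207, 0x03F4,                   # nabla, capital theta variant
    0x2202,                           # partial differential sign
    0x03F5, 0x03D1, 0x03F0,           # variants of epsilon, theta, kappa
    0x03D5, 0x03F1, 0x03D6,           # variants of phi, rho, pi
]


def basic_math_alphanumerics():
    """Return the assigned code points of the basic set (hypothetical helper)."""
    candidates = [*DIGITS, *UPPER_LATIN, *LOWER_LATIN,
                  *UPPER_GREEK, *LOWER_GREEK, *EXTRAS]
    # Skip unassigned positions such as U+03A2, which have no character name.
    return [cp for cp in candidates if unicodedata.name(chr(cp), "")]


# Print the characters that lie outside the contiguous ranges.
for cp in EXTRAS:
    print(f"U+{cp:04X} {chr(cp)} {unicodedata.name(chr(cp))}")
```

Filtering on the character name is just one simple way to drop the unassigned gap in the uppercase Greek range; a hard-coded exclusion would work equally well.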

Only unaccented forms of the letters are used for mathematical notation, because general accents such as the acute accent would interfere with common mathematical diacritics. Examples of common mathematical diacritics that can interfere with general accents are the circumflex, macron, or the single or double dot above, the latter two of which are used in physics to denote derivatives with respect to the time variable. Mathematical symbols with diacritics are always represented by combining character sequences, except as required by normalization.
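
The following sketch shows what "combining character sequences, except as required by normalization" can look like in practice. It assumes the dot and double dot above are written with U+0307 and U+0308, and it uses Python's standard unicodedata module; the variable names are illustrative only.

```python
import unicodedata

DOT_ABOVE = "\u0307"      # COMBINING DOT ABOVE (first time derivative)
DIAERESIS = "\u0308"      # COMBINING DIAERESIS (assumed here for the double dot)

# A plain x with a dot above is a two-code-point combining sequence.
x_dot = "x" + DOT_ABOVE
x_ddot = "x" + DIAERESIS

# NFC normalization replaces the sequence with a precomposed letter
# where one exists; this is the "except as required by normalization" case.
nfc_dot = unicodedata.normalize("NFC", x_dot)

# A mathematical italic x has no precomposed dotted form, so the
# combining sequence survives NFC unchanged.
math_x_dot = "\U0001D465" + DOT_ABOVE
nfc_math_dot = unicodedata.normalize("NFC", math_x_dot)

print(len(x_dot), len(nfc_dot))            # code points before and after NFC
print(len(math_x_dot), len(nfc_math_dot))
```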

In addition to this basic set, mathematical notation also uses the four Hebrew-derived characters (U+2135 – U+2138). Occasional uses of other alphabetic and numeric characters are known. Examples include U+0428 cyrillic capital letter sha, U+306E hiragana letter no, and Eastern Arabic-Indic digits (U+06F0 – U+06F9). However, these characters are used in only the basic form.

Math Alphanumeric Characters

  • Math needs various Latin and Greek alphabets like normal, bold, italic, script, Fraktur, and open-face
  • May appear to be font variations, but have distinct semantics
  • Without these distinctions, you get gibberish, violating Unicode rule: plain text must contain enough info to permit the text to be rendered legibly, and nothing more
  • Plain-text searches should distinguish between alphabets, e.g., search for script H shouldn't match H, etc.
  • Reduces markup verbosity

Mathematics needs a number of Latin and Greek alphabets that at first glance appear to be mere font variations of one another, e.g., normal, bold, italic, and script H. However, in any given document these characters have distinct mathematical semantics: a normal H represents a different variable from a bold H, and so on. If these distinctions are dropped in plain text, the result is gibberish. For example, instead of the well-known Hamiltonian formula ℋ = ∫dτ(εE² + μH²), you would get the integral equation H = ∫dτ(εE² + μH²).

Accordingly, the STIX project requests adding normal, bold, italic, script, etc., Latin and Greek alphabets. Straight encoding leads to 996 characters. Some useful common information is lost: for example, it may no longer be trivial to recognize all variants of H as H's. But this approach lets plain text retain the proper character semantics, and it allows simple (non-rich) search methods to work. For example, when you search for a script uppercase H, you generally don't want to find any other kind of H.
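
A small sketch of the search behavior described above, assuming the script H is encoded as U+210B (SCRIPT CAPITAL H) in a plain-text version of the Hamiltonian formula. The variable names are illustrative. It also shows how compatibility normalization (NFKC) erases the distinction, which is why such folding is usually unwanted for mathematical text.

```python
import unicodedata

SCRIPT_H = "\u210B"  # SCRIPT CAPITAL H

# Plain-text form of the formula, with a script H on the left-hand side.
text = "\u210B = \u222B d\u03C4 (\u03B5E\u00B2 + \u03BCH\u00B2)"

# A simple code-point search keeps the two kinds of H apart.
script_hits = [i for i, ch in enumerate(text) if ch == SCRIPT_H]
plain_hits = [i for i, ch in enumerate(text) if ch == "H"]

# NFKC maps the script H to a plain H (and the superscript 2 to "2"),
# so the two variables can no longer be told apart.
folded = unicodedata.normalize("NFKC", text)

print(script_hits, plain_hits)
print(folded)
```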

Plain a-z, A-Z, 0-9, α-ω, Α-Ω
Bold a-z, A-Z, 0-9, α-ω, Α-Ω
Italic a-z, A-Z, α-ω, Α-Ω
Bold italic a-z, A-Z, α-ω, Α-Ω
Script a-z, A-Z
Bold script a-z, A-Z
Fraktur a-z, A-Z
Bold Fraktur a-z, A-Z
Double struck a-z, A-Z, 0-9
Sans-serif a-z, A-Z, 0-9
Sans-serif bold a-z, A-Z, 0-9, α-ω, Α-Ω
Sans-serif italic a-z, A-Z
Sans-serif bold italic a-z, A-Z, α-ω, Α-Ω
Monospace a-z, A-Z, 0-9

All these alphanumeric characters are in Plane 1, except for the plain characters and the italic, script, calligraphic, and Fraktur characters already contained in the Unicode Letterlike Symbols block. Note that the choice of fonts for these characters is beyond the scope of plain text.
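
To make the split between Plane 1 and the Letterlike Symbols block concrete, here is a minimal sketch for the script capitals. It assumes the Plane 1 script capitals start at U+1D49C and that eight letters are taken from the Letterlike Symbols block instead; the function name script_capital is hypothetical.

```python
import unicodedata

SCRIPT_CAPITAL_A = 0x1D49C  # MATHEMATICAL SCRIPT CAPITAL A (Plane 1)

# Script capitals that live in the Letterlike Symbols block; their
# Plane 1 positions are reserved and must not be used.
LETTERLIKE_SCRIPT_CAPITALS = {
    "B": 0x212C, "E": 0x2130, "F": 0x2131, "H": 0x210B,
    "I": 0x2110, "L": 0x2112, "M": 0x2133, "R": 0x211B,
}


def script_capital(letter: str) -> str:
    """Map an ASCII capital letter to its script counterpart (illustrative)."""
    if len(letter) != 1 or not "A" <= letter <= "Z":
        raise ValueError("expected a single ASCII capital letter")
    offset = ord(letter) - ord("A")
    code_point = LETTERLIKE_SCRIPT_CAPITALS.get(letter, SCRIPT_CAPITAL_A + offset)
    return chr(code_point)


for letter in "AH":
    ch = script_capital(letter)
    print(f"{letter} -> U+{ord(ch):04X} {unicodedata.name(ch)}")
```

A lookup table for the exceptions plus an offset for everything else keeps the mapping short; the other styled alphabets have their own exceptions (for example, the italic small h), which this sketch does not cover.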

The upper- and lowercase Greek letters represented by Α-Ω and α-ω are those defined in the basic set above.

Note that accented characters are achieved in mathematical text via combining-mark sequences. There are many more accented characters in mathematics than in Latin and Greek text and these sequences give a uniform display of such characters.

©2003-2007 Microsoft Corporation. All rights reserved.