
Basic concepts of collation

By Cathy Wissink & Michael S.Kaplan - Windows Globalization, Microsoft Corporation

Sometimes called sorting, ordering or even 'alphabetizing', collation is the culturally expected order of linguistic characters in a particular language. In other words, speakers of a particular language have certain expectations of where to find a string, relative to other strings, in a collated (or sorted) list. To take an easy example, an English speaker expects a word starting with Q to sort after all words beginning with P and before all words starting with R.

Outside of technology, people use linguistic sorting every day:

  • Searching for the name of an individual in a telephone book;
  • Looking for a subject in a book index;
  • Searching for a word in a dictionary;
  • Using the (still frequently used) library card catalog.

Basically, any time a user orders data or searches for data in a logical fashion within any kind of structure, collation is being used. If the set of data is sorted correctly for the language, the user will find the needed data quickly and efficiently. Conversely, if the list is sorted in a way the user does not expect, it will take more time and effort to find items in the list.
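The difference between an expected and an unexpected order can be sketched in a few lines of Python. This is only an illustration, not how any particular platform implements collation: `simple_collation_key` is a hypothetical helper that ignores case and accents on a first pass and uses the original string as a tiebreaker, and the word list is made up for the example.

```python
import unicodedata

def simple_collation_key(text):
    """Hypothetical two-level key: base letters first, original string as tiebreaker."""
    # Decompose accented letters into base letter + combining mark (NFD),
    # then drop the combining marks so that "é" compares like "e".
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), text)

# "\u00e9" is the precomposed character é (made-up sample data).
words = ["zebra", "Quilt", "\u00e9clair", "pear", "Rose", "apple"]

# Plain code point order: uppercase before lowercase, é after z.
print(sorted(words))
# ['Quilt', 'Rose', 'apple', 'pear', 'zebra', 'éclair']

# Order closer to what an English speaker expects.
print(sorted(words, key=simple_collation_key))
# ['apple', 'éclair', 'pear', 'Quilt', 'Rose', 'zebra']
```

In the first list a reader looking for "apple" or "éclair" has to scan past entries that appear to be out of place; in the second, Q falls between P and R as expected.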

An interesting aspect of linguistic sorting is the speaker's (generally) subconscious knowledge of it. Native speakers have clear expectations of where to find data in a collated list; that is, they can easily identify a sorted list as correct or incorrect for their culture. In addition, users can generally describe the simple qualities of their collation; for example, most users can easily describe how the basic linguistic characters (e.g., "letters") of their language sort. However, when it comes to more complicated phenomena, such as "accented" characters (diacritics, matras), interaction with punctuation (e.g., hyphens), or compressions (e.g., the Spanish CH, which traditionally sorts as a single letter), it is often harder for users to explain how a sort works.
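As a rough sketch of what a compression involves, the Python below treats "ch" as one sorting unit that falls after "c" and before "d", as in traditional Spanish ordering. The names `UNITS`, `WEIGHTS` and `traditional_spanish_key` are hypothetical, the word list is made up, and the code assumes unaccented input; it is meant to show the idea rather than a complete Spanish collation.

```python
# Sorting units in order; "ch" is a single unit placed after "c".
# Assumption: only the letters a-z are handled; other letters are left out.
UNITS = list("abc") + ["ch"] + list("defghijklmnopqrstuvwxyz")
WEIGHTS = {unit: weight for weight, unit in enumerate(UNITS)}

def traditional_spanish_key(word):
    """Hypothetical key that maps a word to a list of per-unit weights."""
    word = word.lower()
    weights = []
    i = 0
    while i < len(word):
        # Prefer the two-letter unit "ch" over the single letter "c".
        unit = "ch" if word.startswith("ch", i) else word[i]
        # Unknown characters sort after every known unit in this sketch.
        weights.append(WEIGHTS.get(unit, len(UNITS)))
        i += len(unit)
    return weights

spanish_words = ["dama", "chico", "cuna", "cena", "chal"]

# Letter-by-letter order places "ch" words between "ce" and "cu".
print(sorted(spanish_words))
# ['cena', 'chal', 'chico', 'cuna', 'dama']

# With the compression, every "ch" word follows every other "c" word.
print(sorted(spanish_words, key=traditional_spanish_key))
# ['cena', 'cuna', 'chal', 'chico', 'dama']
```

A speaker who expects the second order may find it hard to say why "cuna" comes before "chal", which is the kind of rule that tends to stay subconscious.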

©2003-2007 Microsoft Corporation. All rights reserved.