
Meet Mr. Bob Eaton
Bob Eaton: Bridging Language and Language Computing
A graduate in Electrical Engineering with a Master’s degree in Computer Science, Bob Eaton began his career as an embedded systems developer. A chance visit to India captivated him and planted the seed of a sustained interest in linguistics and its merger with computing. With the advent of Unicode, he immersed himself in converting programs to support it. Currently pursuing a Ph.D. in Linguistics at the University of Texas at Arlington, with Kangri as his research subject, Bob has been actively involved with Indic computing. He has been instrumental in developing SIL Converters, which help users convert text in legacy-encoded fonts to Unicode.
Tell us something about your formative years. Where were you brought up and which university did you graduate from?
B: Like many Americans, my family moved a lot when I was growing up to follow the job market. I was born in Evanston, Illinois, and before going to college, I lived in Chicago, Minnesota, and finally Ohio.
I attended Cleveland State University in Ohio and graduated in 1988 with a degree in Electrical Engineering. I first visited India in 1987, as part of a cultural training program between my junior and senior years.
What made you take up Electrical Engineering and choose a career as a Software Developer?
B: We have this joke in my family because my father is an engineer and my mother is a nurse. My father-in-law is an engineer and my mother-in-law is a nurse. So I became an engineer and married a nurse. In reality, I didn't know anything about engineering. I just always did well in school and felt that I should pursue something challenging. I was always better in math and science than English, so engineering seemed like a good option.
While pursuing my degree, I worked as a "co-op". The co-op program is where you alternate a semester of school with a semester of work. The work is usually at a local company in your field of study. So the student gets good experience and can make good money as well. And the company gets less expensive help to do more menial tasks, so it's a win-win situation.
While working as a co-op, I had the opportunity to maintain and enhance a number of computer programs and really began to think about working in software rather than hardware. When I finally graduated, Allen-Bradley Co, the company I worked as a co-op for, offered me a job doing embedded systems/firmware development. A few years later, I decided to get my Master's degree in Computer Science and shifted to another group doing full-time software development.
Software development and linguistics were long regarded as being as different as chalk and cheese. Yet they co-exist today. In this context, your pursuit of a Ph.D. in Linguistics at the University of Texas at Arlington indicates a great deal of foresight. What drove you to merge the two in your career?
B: During my first trip to India, one of my Indian friends from Kerala told me, "Malayalam is the heavenly language. In heaven, we'll all speak Malayalam!" His friend said, "When we all get to heaven, we'll have to speak English, because the Americans can't learn anything else." Duly chastised, I decided to try to learn a little bit from each place we visited and found that I really had a knack for language learning.
Later, when a friend told us about the work he was doing in language development with the Summer Institute of Linguistics, it sounded really interesting, and eventually we joined as well. Of course, to do language work, you have to have some training. So I started a degree in Linguistics while working part-time as a software consultant. It was only later, when we were trying to update our corporate software to support Unicode, that I really started merging the two fields. I had worked with Unicode-enabled apps in a past job and knew how to make it work. So I asked some colleagues for the source code and started converting a few of the programs I used most often to support Unicode.
You have been living in India for the past few years. How do you think the diverse culture of the nation has affected the numerous dialects and languages?
B: The answer is definitely different in different areas of the country. In the south, there is more of an affinity with English rather than Hindi, and in that setting, the regional varieties are very stable.
In the north, where much of the population knows Hindi well (either as their mother-tongue or at least as a state/school language), the numerous dialects are beginning to decrease in use. This is both a good and bad thing: it's good in that having a single language of wider communication will help foster a sense of unity among people (something which India needs). On the bad side, however, the loss of a language--especially since it is so tied in with culture--is a sad thing.
It is one of the reasons why my interest is specifically in the northern areas (besides the fact that Himachal Pradesh is perhaps the most beautiful place on earth).
How did your interest in Hindi and Kangri languages come about? What special feature of these two languages interests you the most?
B: The interest in Hindi was for pragmatic reasons: knowing Hindi helps you get around India. Also, it is probably the single most widely known language in India and relatively easy to learn (compared with English, for example).
Regarding Kangri, I had to choose a language in which to do research for my Ph.D., and Kangri was a great fit. It has a fair number of speakers, who have a positive attitude toward their language, and it has not been written about as much as other major Indian languages.
I read a survey of the language situation in Himachal and took a bus trip to Palampur to see what the climate and people were like. When we came over the hill and saw the Dhauladhar mountain range, and met the friendliest, most welcoming people, I felt that my choice was made for me!
How can languages like Kangri be enriched by enabling their scripting through the multilingual support of Windows XP?
B: That's a difficult question. Kangri is not (yet) written much. There are some poetry books and cultural stories, but there is a lack of standardization in using the Devanagari script. As the local proverb goes, "The language changes every 12 km." Also, not many people have computers yet.
The good part, though, is that a few of us (myself and a few local scholars) have worked to analyze the sound system of Kangri to make sure we have all the diacritics needed to represent the sounds in Unicode and have come up with what we hope will be the beginning of a standard orthography, based on the Devanagari script.
Is the merger of standards like ISCII and Unicode necessary in the current scenario? If yes, why?
B: Yes. For example, Kangri has a tone which derives from the 'h' sound. Historically, people have used the halant with the 'h' to represent this sound (i.e. ह्‌). But of course, this violates the semantics of the halant in Unicode, where it marks a consonant as the first member of a consonant conjunct. So we're encouraging people to use the nukta instead of the halant (i.e. ह़) so that it doesn't cause problems with Unicode in the future.
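To make the difference concrete, here is a minimal Python sketch (an illustration, not a tool described in the interview) of how existing text could be migrated from HA + halant (U+0939 U+094D) to HA + nukta (U+0939 U+093C). It assumes, hypothetically, that the tone spelling shows up either followed by a zero width non-joiner or with no consonant after it, so that genuine conjuncts such as ह्म are left untouched; the function names are invented for this example.

```python
import re
import unicodedata

HA = "\u0939"      # DEVANAGARI LETTER HA (ह)
VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA, i.e. the halant (्)
NUKTA = "\u093c"   # DEVANAGARI SIGN NUKTA (़)
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER

# HA + halant, either followed by ZWNJ (which is dropped) or NOT followed by a
# Devanagari consonant. The negative lookahead keeps real conjuncts like ह्म intact.
_TONE_HALANT = re.compile(
    HA + VIRAMA + "(?:" + ZWNJ + "|(?![\u0915-\u0939\u0958-\u095f]))"
)


def halant_tone_to_nukta(text: str) -> str:
    """Rewrite the legacy HA+halant tone spelling as HA+nukta (hypothetical rule)."""
    return _TONE_HALANT.sub(HA + NUKTA, text)


def describe(text: str) -> list[str]:
    """List each code point with its Unicode character name, for inspection."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]
```

For example, `describe(halant_tone_to_nukta("\u0939\u094d"))` lists `U+0939 DEVANAGARI LETTER HA` and `U+093C DEVANAGARI SIGN NUKTA`, while a word like ब्रह्म passes through unchanged because its halant is followed by a consonant.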
I've also been working with the folks in the Dogri Department of the Jammu University on this same issue to try to make as much overlap with Kangri as possible. We're also trying to get another character added to the Devanagari range of Unicode for a different diacritic that Dogri uses for tone. So, we're trying to make sure that languages like Kangri and Dogri can move forward using the Unicode standard, so that as they continue to take off as written languages, they will have a more workable solution into the future.
What is the precise difference between non-web and web ISFOC fonts? How and why does their support for encoding classes of programming languages like VB and VS.NET vary?
B: The web-based versus non-web-based ISFOC encodings are different from each other in the same way that ISFOC is different from Shusha or Annapurna font encodings. These are all "legacy" encodings that do not conform to any particular standard (e.g. ISCII or Unicode). Consequently, there is no code page support for any of these encodings. Because they don't have code page support, developers must use specialized converters for converting them into one of the standard encodings. So, for example, the Font2Iscii website (http://www.iiit.net/ltrc/FC-1.0/fc.html) has exe-based converters to convert between many different Indic legacy encodings (such as ISFOC) and ISCII. Once the data is in the ISCII encoding, then a programmer can use the appropriate ISCII code page to convert that data into Unicode.
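Python's standard library ships no ISCII codec, so the code-page step can be illustrated with a small lookup table. The sketch below is a partial, hypothetical decoder: the byte values are believed to follow the ISCII-91 Devanagari layout but should be checked against IS 13194:1991 before use, only a handful of entries are included, and ISCII's special sequences (such as the double halant or the ATR/EXT escapes) are ignored.

```python
# Partial ISCII-91 (Devanagari) -> Unicode table; a few entries only.
# Verify every value against IS 13194:1991 before relying on it.
ISCII_TO_UNICODE = {
    0xA4: "\u0905",  # LETTER A
    0xB3: "\u0915",  # LETTER KA
    0xC8: "\u092A",  # LETTER PA
    0xCC: "\u092E",  # LETTER MA
    0xCF: "\u0930",  # LETTER RA
    0xD8: "\u0939",  # LETTER HA
    0xDA: "\u093E",  # VOWEL SIGN AA
    0xDB: "\u093F",  # VOWEL SIGN I (ISCII, like Unicode, stores it after its consonant)
    0xE8: "\u094D",  # SIGN VIRAMA (halant)
    0xE9: "\u093C",  # SIGN NUKTA
}


def iscii_to_unicode(data: bytes) -> str:
    """Decode ISCII bytes with the partial table above."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))  # ISCII keeps ASCII in the lower half
        else:
            out.append(ISCII_TO_UNICODE.get(b, "\ufffd"))  # U+FFFD for unmapped bytes
    return "".join(out)
```

With this table, `iscii_to_unicode(bytes([0xD8, 0xCC]))` yields हम (U+0939 U+092E). Because ISCII and Unicode both store text in logical order, this stage is a straight byte-for-character lookup; the hard work of reordering happens in the legacy-font stage before it.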
But as I mentioned, the only support that programming languages normally give is between the code pages and Unicode. So this leaves users of non-standard encodings, such as ISFOC, without a native converter solution. This is where the package that I've been helping develop (SILConverters) comes in. Because SIL is working all over the world in different language groups, and because we've been using such legacy-encoded fonts for a while, the Non-Roman Script Initiative in SIL has developed a conversion tool called TECkit. This tool allows users to write maps that specify how to convert the characters from a legacy font, such as DV-TTYogesh, into Unicode. This way, makers of legacy fonts (or their users) can write TECkit "maps" to convert their non-standard encoding to Unicode. Of course, it's not easy to write a TECkit map, so it isn't for the average user. Map writers have to understand encodings, characters, glyphs, code points, re-ordering, etc.
Also, the TECkit conversion engine is a DLL interface, which can be difficult to use directly in some programming languages. So my piece of the puzzle was to write a .NET wrapper for that DLL interface so that it could be more easily used in different programming environments. After developing the wrapper for TECkit, we realized that the same interface could be used for converters based on the Consistent Changes (CC) conversion engine (another SIL product) as well as ICU (an IBM package for Unicode converters and transliterators). And most recently, I've added wrappers for ITrans and the Font2Iscii converters mentioned above, so that they can be used with the same programming API.
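One of the things a map writer must handle is re-ordering. Many legacy Devanagari fonts store the short-i vowel sign (ि) in visual order, before its consonant, while Unicode stores it after. The following Python sketch imitates what such a map does; it is not TECkit's map syntax, every legacy byte value in it is hypothetical, and it handles only a single following consonant, not conjunct clusters.

```python
# Hypothetical legacy-font byte -> Unicode character table.
LEGACY_TO_UNICODE = {
    0x6B: "\u0915",  # hypothetical slot for KA (क)
    0x6D: "\u092E",  # hypothetical slot for MA (म)
    0x61: "\u093E",  # hypothetical slot for vowel sign AA (ा)
}
PRE_BASE_I = 0x66        # hypothetical slot for the ि glyph, typed BEFORE its consonant
VOWEL_SIGN_I = "\u093F"  # Unicode stores ि AFTER its consonant


def legacy_to_unicode(data: bytes) -> str:
    """Map bytes to characters, moving a pre-base i-matra after the next character."""
    out: list[str] = []
    pending_i = False
    for b in data:
        if b == PRE_BASE_I:
            pending_i = True  # hold it back until the consonant has been emitted
            continue
        out.append(LEGACY_TO_UNICODE.get(b, "\ufffd"))  # U+FFFD for unmapped bytes
        if pending_i:
            out.append(VOWEL_SIGN_I)
            pending_i = False
    if pending_i:  # a trailing ि with nothing after it is kept as-is
        out.append(VOWEL_SIGN_I)
    return "".join(out)
```

Here `legacy_to_unicode(bytes([0x66, 0x6B]))` returns कि (U+0915 U+093F), even though the ि glyph came first in the legacy byte stream. A real map also has to move the matra past whole conjuncts, which is exactly why writing one requires understanding glyphs versus characters.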
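The benefit of a single wrapper API is that calling code does not care which engine sits underneath. A minimal Python sketch of that design follows. The class and method names (`Converter`, `ConverterRepository`, `convert`) are hypothetical and are not the actual SILConverters API; the two toy engines merely stand in for map-driven back ends (like TECkit or CC) and wrapped external tools.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class Converter(ABC):
    """One calling convention shared by every engine (hypothetical API)."""

    @abstractmethod
    def convert(self, text: str) -> str: ...


class TableConverter(Converter):
    """Stands in for a map-driven engine; here just a character table."""

    def __init__(self, table: Dict[str, str]) -> None:
        self._table = str.maketrans(table)

    def convert(self, text: str) -> str:
        return text.translate(self._table)


class FunctionConverter(Converter):
    """Wraps any existing callable, e.g. a bridge to a DLL or command-line tool."""

    def __init__(self, func: Callable[[str], str]) -> None:
        self._func = func

    def convert(self, text: str) -> str:
        return self._func(text)


class ConverterRepository:
    """Looks converters up by name so callers never touch engine details."""

    def __init__(self) -> None:
        self._converters: Dict[str, Converter] = {}

    def add(self, name: str, converter: Converter) -> None:
        self._converters[name] = converter

    def convert(self, name: str, text: str) -> str:
        return self._converters[name].convert(text)
```

Usage: after `repo.add("toy-map", TableConverter({"k": "\u0915"}))`, a call such as `repo.convert("toy-map", "k")` returns क, and a wrapped function registered under another name is called the same way. Adding a new engine means writing one more `Converter` subclass; no calling code changes.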
You have been instrumental in enabling Unicode support in different linguistic software packages. Can C# be used to convert data to Unicode?
B: C#, of course, already supports Unicode. That is, all string data in C# is already Unicode-encoded. For existing legacy data in files, C# can convert it to Unicode if it is encoded in a standard code page encoding (e.g. ISCII Devanagari). Otherwise, developers have to use a custom solution, such as SILConverters, to convert their legacy data to Unicode.
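The same split between a standard code page and a custom converter can be sketched in Python, whose `str` type is likewise Unicode. The encoding name `x-toy-legacy` and its one-character converter are hypothetical placeholders for something like a TECkit map; `codecs.lookup` is the standard-library call that raises `LookupError` for encodings it does not know.

```python
import codecs
from typing import Callable, Dict

# Hypothetical custom converters for encodings the codec registry does not know.
CUSTOM_CONVERTERS: Dict[str, Callable[[bytes], str]] = {
    "x-toy-legacy": lambda data: data.decode("latin-1").replace("k", "\u0915"),
}


def to_unicode(data: bytes, encoding: str) -> str:
    """Use a standard codec when one exists, else fall back to a custom converter."""
    try:
        codecs.lookup(encoding)  # raises LookupError for unknown encoding names
    except LookupError:
        converter = CUSTOM_CONVERTERS.get(encoding)
        if converter is None:
            raise  # neither a codec nor a custom converter: re-raise LookupError
        return converter(data)
    return data.decode(encoding)
```

For example, `to_unicode("é".encode("cp1252"), "cp1252")` goes through the built-in codec, while `to_unicode(b"k", "x-toy-legacy")` is routed to the custom converter.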
Is the INSCRIPT keyboard layout competent enough to adapt to the rapidly changing scenario of Indic Language Computing?
B: I've never used the INSCRIPT keyboard, so I'm not familiar with it.
How does the support of UTF for various languages like VB, .NET etc vary and why?
B: I'm not sure I understand this question. UTF formats don't vary. But perhaps you mean that VB dealt primarily with UTF-8, while VB.Net deals primarily with UTF-16. But I'm not even sure that's true. I think even VB6 strings were UTF-16 encoded, though, of course, the different form controls couldn't receive UTF-16 keyboard input directly.
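The point that UTF formats don't vary can be checked directly: UTF-8 and UTF-16 serialize the same code points into different bytes. This short Python sketch (an illustration, not tied to VB or .NET) compares them for the word हिन्दी, whose six code points all lie in the Devanagari block.

```python
word = "\u0939\u093f\u0928\u094d\u0926\u0940"  # हिन्दी: 6 code points


def encoding_report(text: str) -> dict[str, int]:
    """Count code points and the bytes each UTF form needs for them."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),       # 3 bytes per Devanagari code point
        "utf16_bytes": len(text.encode("utf-16-le")),  # 2 bytes each; "-le" adds no BOM
    }
```

`encoding_report(word)` returns `{"code_points": 6, "utf8_bytes": 18, "utf16_bytes": 12}`, and decoding either byte string gives back the identical text, which is the sense in which the formats themselves don't differ between languages.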
Has the time come for the use of advanced bilingual tools in the support of Indian languages for various applications?
B: Yes! Having interface components in languages that people know well is a requirement for computer use to really take off in India.
Organizations like TDIL, CDAC and CIIL have also been working on Indic Language Computing and e-learning for quite some time now. What in your opinion should their role be in the coming years?
B: I'm not sure that I'm qualified to comment, but I know that TDIL is doing a great job of making sure that the Unicode standard supports all the languages in India. CDAC has a long history of providing legacy solutions for Indic computing needs. It would be good for them to add direct support for Unicode-encoding in their products (perhaps they are already moving in that direction). And, of course, CIIL has been the prime mover in the linguistic scene in India. I hope they will promote the use of Unicode for the publications and grammars they write, and continue to make data (in Unicode-encoding) available on the internet. This will be a great help to the global linguistic community.
As information about Indian languages on the internet increases, Unicode is perhaps the single most important issue for making data accessible to users.
The introduction of the Unicode Standard and the improved multilingual support of Windows XP have provided a major boost for Indic Language Computing, leading to the release of Language Interface Packs (LIPs) and Office 2003 in many Indian languages by Microsoft. Is this the defining moment for the spread of technology to the grass-roots level?
B: Definitely! Most educated people in India know English well enough to use English versions of the OS and tools (which I imagine is why it took so long to have a Hindi version of the OS). But the remainder of the population will probably not be very interested in computers unless they have a regional-language interface.
I believe some work is still needed to standardize the user-interface "glossary". For example, we've not had a successful thread on BhashaIndia regarding a systematic listing of menu equivalents (e.g. "File" is फ़ाइल, which is probably okay, but it's not clear to me that a lot of testing was done on the current equivalents). I think it would be great if some graduate student would do some testing on these with less educated folks to see what they understand and to see whether better translations are possible.
Indian languages have come a long way from just being spoken in dialects and existing in scripts, to being applied and used on the Web. What according to you is the scope for further development of Indic Language Computing and what needs to be done?
B: As I mentioned above, job number 1 (in my opinion) is to get data into Unicode. This single thing will give the most benefit towards making applications user-friendly for the long term. Not having the correct font on the system or in a document is perhaps one of the most confusing things for new computer users.
Even with Unicode, however, not all the pain will be relieved. Another great help would be pre-configured versions of Windows (perhaps at a more reasonable price), with Indic support turned on, an appropriate input method, and perhaps even a regional-language LIP enabled by default.
Indic support and input methods account for a fair percentage of the queries posted to BhashaIndia. If these things could be pre-configured on a Windows installation CD, that would be a very positive step forward.

©2003-2007 Microsoft Corporation. All rights reserved.