Meet Mr. Ibrahim Shareef
Developing Intelligent Indic Applications
A programmer with the Sarma Group of Companies, Mr. Ibrahim Shareef has over seven years of experience in developing secure, safe and user-friendly computer applications.
With a Bachelor of Arts in Economics and pursuing a Masters in Psychology, at first glance Mr. Ibrahim Shareef would appear to be miles from the field of Indic Computing.
However, a candid interview with Mr. Shareef showed us that a passion for knowledge is all one needs to dive into development. Mr. Shareef shared with BhashaIndia his experiences of developing intelligent Indic language computing products.
Tell us more about yourself. How did your interest in Indic languages originate?
IS: I am 25 years old and hail from Thanjavur District in Tamil Nadu. I did my initial schooling in Thiruvarur District, with Electronics as my major subject in the higher-secondary course (1996-98). This spurred in me a keen interest in computers, both hardware and software. I picked up a rudimentary knowledge of computers in 1995 and moved into core programming by 1999. I have completed my Bachelors in Economics and am now pursuing my Masters in Psychology. For the past 7 years I have been working extensively on data processing, system utilities (Microsoft platform), basic accounting and other packages for local customers (MS platform and Internet based), image processing utilities, Microsoft Office development utilities, mobile-based utilities (test case), security models and tests, evaluating Microsoft operating systems, using SDKs in development, and data structure design and data analysis with various types of RDBMS. For the last 5 years, since the end of 2000, I have been collecting information about Artificial Intelligence, and this is being used in Indian language development projects. Since 2001, I have been working with the Sarma Group of Companies as a programmer.
At the beginning of 2005, the Sarma Group envisaged a business proposal to create a digital library. Some initial spadework was done, but there was some delay in pursuing the project. The basic field-level knowledge and information gained for that project prompted us to move a little deeper into projects relating to Indian languages. This is the background to my present Indian language development projects.
You have a unique educational background, with a B.A. in Economics and an M.A. in Psychology in progress. It is interesting that you have such an intense interest in Indic languages and computing. How did the interest start?
IS: "Customer satisfaction is important in all services". Every principle of economics is guided by two basic factors: demand and supply of goods and services. These factors depend directly on the psychological state of mind of the consumer. Hence it is essential that these two empirical sciences be studied in unison and not in separate compartments. The study of psychology, I hope, will sharpen my understanding of men and matter.
As for my interest in Indic language computing, the current boom in the IT sector the world over is drawing every section of society into its fold. Every individual has a burning desire to be an active player in this field. However, for many, a lack of proficiency in the languages used for specific commands is the main handicap. IT penetration at the ground level will be much faster if every operation, such as commands, data input, results and user manuals, is designed in one’s native language. This understanding has prompted me to go further into Indian language development projects. My basic knowledge of A.I has also given me an impetus to use the structured rules of grammar, both in Tamil and Sanskrit, in my projects, for which I hope to make use of some of the principles of A.I in Machine Translation.
Tell us more about your Sanskrit Spell Checker and Basic Grammar Checker?
IS: Currently I am working on a Tamil Spell Checker with Basic Grammar Checking. The two ancient and classical languages, namely Tamil and Sanskrit, have a well-defined grammatical structure for both text and speech. This product primarily targets Microsoft Office products and the Microsoft platform, keeping in mind their user friendliness and widespread use. The Tamil Spell Checker functions as a normal MS Office spell check plug-in. When standard texts are created in Tamil or other Indian languages, their basic grammatical correctness, barring colloquial or slang usages, has to be taken care of. My utility takes care of all the above requirements.
  • The Grammar Checking Structure is based on the classical treatise on Tamil Grammar viz. Tholkappiyam and Nannool.
  • It is capable of checking the errors in any font used and suggesting alternatives for erroneous inputs, both exact & nearest, also in the same font.
  • The vocabulary used is a repository of around 2.1 million (21,00,000) words.
  • It has an in-built English-Tamil Lexicon of 82,000 Words.
  • It supports all MS-Platforms (Win98 to Win2003) including all MS-Office Versions (Office 97 to Office 2003).
  • The structure that has been designed can be adapted to suit any Indian language.
  • A Sanskrit spell checker with grammar checking is in the making, for which an analysis of about 0.5 million (5,00,000) Sanskrit words is in progress. A complete product is still some months away.
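The "exact and nearest" suggestions mentioned in the list above can be illustrated with a minimal sketch. The interview does not describe how the product ranks candidates, and its checker is said to be grammar-based, so the approach here (plain edit distance over a word list) is a swapped-in, generic technique. The names `edit_distance`, `suggest` and the three-word `LEXICON` are hypothetical, and the comparison works on Unicode code points rather than on full Tamil letters.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def suggest(word, lexicon, max_distance=2, limit=5):
    """Return the word itself on an exact hit, else the nearest lexicon entries."""
    if word in lexicon:
        return [word]
    scored = sorted((edit_distance(word, candidate), candidate)
                    for candidate in lexicon)
    return [candidate for dist, candidate in scored
            if dist <= max_distance][:limit]


# Hypothetical toy lexicon; the real product is described as holding
# around 2.1 million words.
LEXICON = {"அம்மா", "அப்பா", "தமிழ்"}

if __name__ == "__main__":
    print(suggest("அம்ம", LEXICON))  # a misspelling with the final vowel sign missing
```

A word list of the size quoted in the interview would need an index (for example a trie or a precomputed deletion table) rather than a linear scan like this one.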
How exactly does the Tamil OCR and Speech Recognition that you are developing work?
IS: The Tamil OCR and Speech Recognition program under development is based on all the basic rules of Tamil grammar in script and speech. Initially, the Tamil OCR targets printed Tamil texts, and we hope to extend it to hand-written manuscripts. At present about 3000 styles are being analyzed. Preliminary work on speech recognition is just taking off. This process, needless to say, will also be based on the rules of Tamil grammar, including certain peculiar rules applicable to Sanskrit words used in Tamil. For starters, we plan to analyze female voices for the Speech Recognition Project.
Considering your versatility in computer languages, how have you been able to apply it in your work and in creating innovative and effective software?
IS: Every computer programming language has its own unique features. Microsoft development products (computer language development tools, compilers, concepts and IDEs) and the Microsoft platform help the programmer create fully standalone and faster programs. Further, MS products enable the programmer to incorporate the special features of programs written in other languages in order to deploy safe, secure and faster packages to end-users. In that respect, I am using more than 9 MS languages and concepts along with the Microsoft Platform SDK in my work. Hence I am able to provide error-free, secure and faster products to end-users.
Since you seem to have considerable knowledge of Microsoft and its products, do give us your insights on the platforms you have worked on and on your experiences. Are there any suggestions that you might have?
IS: My initiation into computers began with MS-DOS 5.0 and MS Windows 3.x in 1995. I have now come up to Windows Vista, including troubleshooting and platform development. In my opinion Windows 98 SE was very easy to use, error-free, stable and user-friendly. I hope Vista will also be user-friendly like Windows 98 and secure like Windows 2003. I guess we can hope for a more vibrant GUI with Vista. I have already forwarded my suggestion on Windows Vista about a flexible Language Interface Package (LIP) to
How necessary do you think Indian Language Development is? How relevant is it in present day India?
IS: Digital penetration encompasses all of the Indian languages. There are about 18 languages recognized by the Indian Constitution and about 800 dialects. As every language group wants to reap the benefits of the IT boom, programs developed in their native languages will have an immediate reception. Therefore, packages like ours will get a good response from the people, and this will definitely accelerate India’s progress in the IT sector. I hope that Microsoft will also publish its product documentation and manuals in Indian languages to further this trend.
What does the future hold for India in the field of R&D in Artificial Intelligence?
IS: To the best of my knowledge, not much R&D in A.I is being done in India now. Having said this, though, I must acknowledge that India boasts a huge pool of talented programmers and developers whose expertise is being tapped by every country in the world today, and the day may not be too far off when Indians foray into Artificial Intelligence on a grand scale.
Where do you think India features globally in this field, and what in your opinion should India do to reach the standards of the leading countries and stay on par with them?
IS: The Japanese are very strong in robotics, an integral part of A.I. The Germans are ages ahead in neuron-based A.I. The USA and other countries are developing A.I in contextual and other methods. India, in my opinion, has a better chance of developing A.I in language-based contexts. I say this because all Indian languages are phonetic in nature and have a technically structured and systematic grammar. The main concept of A.I is to analyze, categorize and create structures and choose the apt one for the user. By nature, Indians have a flair for languages and are able to learn other languages easily. This inherent strength can take India to the pinnacle of language-based A.I development.
Give us an insight into your 'Maths & Science Formulae Calculation'. Does it cater to the needs of complicated, high-level research standards?
IS: At present, MS-Word does not have a calculation feature. This utility was developed to work inside MS-Word, using MS-Excel functions from within MS-Word. In addition to calculation, our package also includes some basic Maths, Science and Electronics formulae. My aim in developing this package is to enable programmers or end-users to use Tamil instead of English, as is the case at present. So far, no Indian language has been developed for issuing commands on text or numerals in computers. This product is a step in that direction: the end-user can execute commands and formulae in Tamil and get the results in Tamil numerals. For now, this product has been developed only for basic-level calculations.
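As a rough illustration of "results in Tamil numerals", here is a minimal sketch of a calculator that reads and writes Tamil digits. It is not the product's implementation, which the interview describes as running inside MS-Word on top of MS-Excel functions. The function name `calculate` and the "number operator number" input format are assumptions, and the sketch treats the Unicode Tamil digits (U+0BE6 to U+0BEF) as ordinary positional decimal digits.

```python
import operator

# Tamil digits zero to nine occupy U+0BE6..U+0BEF in Unicode.
TAMIL_DIGITS = "".join(chr(0x0BE6 + i) for i in range(10))
TO_ASCII = str.maketrans(TAMIL_DIGITS, "0123456789")
TO_TAMIL = str.maketrans("0123456789", TAMIL_DIGITS)

# Only integer arithmetic, in keeping with "basic level calculations".
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def calculate(expression):
    """Evaluate 'number operator number' written with Tamil digits.

    The three parts must be separated by spaces; the result is returned
    as a string of Tamil digits (with a leading '-' if negative).
    """
    left, symbol, right = expression.split()
    a = int(left.translate(TO_ASCII))
    b = int(right.translate(TO_ASCII))
    result = OPERATORS[symbol](a, b)
    return str(result).translate(TO_TAMIL)


if __name__ == "__main__":
    twelve = "12".translate(TO_TAMIL)
    three = "3".translate(TO_TAMIL)
    print(calculate(f"{twelve} + {three}"))
```

Keeping the digit conversion at the input and output edges means the arithmetic itself stays ordinary integer code; Tamil command words could be mapped to the entries of `OPERATORS` in the same way.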
What are the features of 'Text-Speech in Natural Human Voice for Tamil'?
IS: The Tamil Text to Speech program under development is based on all the basic rules of Tamil grammar in speech. It will have features such as normal human voice modulation with pitch adjustability, especially for the Tamil language. The Tamil vocabulary we have (0.5 million words) is being utilized for my speech analysis.
What prompted you to take up Transliteration and Font Conversion? What is new that your expertise has been able to bring into this field?
IS: When we started our Tamil Spell Checking Project, we had to cover all existing Tamil fonts in use. Hence, we gathered a large number of the Tamil fonts in use at present, and the count went up to 183 fonts. After thorough analysis, it came down to 54+1. The special feature of this product is that it enables the user to convert any type of font to and from any other font type, including Unicode. We have made 32,000 random checks with this font conversion utility. Before it was deployed on the web as a free package, we also published the core DLL so that users could convert any existing data in their databases to Unicode. This package also functions as an MS Word plug-in for conversion within the MS-Word application.
As the Sarma Group of Companies is involved in DTP publishing in some Indian languages (such as Tamil, Hindi, Sanskrit, Malayalam and Kannada), there is a general interest in understanding the meaning of those texts. Most of us whose mother tongue is Tamil also have some proficiency in another Indian language, and we are able to understand and converse in those languages without any knowledge of their scripts. We will be able to understand and enjoy the writings in such languages if the text is transliterated into a script we can read. We have developed our tool for transliterating texts in 9 Indian languages supported by Unicode. This has led us to develop a unique structure for conversion of any Indian language font into Unicode or ISCII.
Give us an insight into your 'E-Mail Client with Tamil Interface and Multiple Tamil Fonts Support'.
IS: At present, mails can be sent only in a single font encoding, and the person at the other end cannot view the mail without the specific font in use. This tool helps to overcome this drawback. It enables the sender to set up an address book with the font encoding each receiver prefers, so that the user can write a mail in his or her own input font encoding and the utility converts the message to the receiver’s choice before sending it. For example, a user knows only the Bamini font encoding and has installed it on his/her system. In addition, he/she has an address book with the receiver’s address and font choice. When he/she sends the mail using this tool, it automatically converts the mail into the font of the receiver’s choice. Even users with limited knowledge of computers can send mails with this utility (full Tamil interface, POP-3/SMTP based), and my next plan is to expand this feature into voice mail.
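One simple way to approximate script-to-script transliteration in Unicode is sketched below. The interview does not say how the tool works, so this is an assumption-laden illustration and not the author's method. It relies on the fact that the major Indic blocks in Unicode follow a parallel layout inherited from ISCII, so a letter can be moved to another script by shifting its code point to the same offset in the target block. The choice of nine scripts and the names `BLOCK_START` and `transliterate` are hypothetical.

```python
# Start of each script's 128-code-point block in Unicode.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def transliterate(text, source, target):
    """Shift every character of the source script into the target block.

    Characters outside the source block (spaces, Latin letters,
    punctuation) are copied unchanged.
    """
    src = BLOCK_START[source]
    dst = BLOCK_START[target]
    out = []
    for ch in text:
        code_point = ord(ch)
        if src <= code_point < src + BLOCK_SIZE:
            out.append(chr(code_point - src + dst))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Devanagari KA, MA, LA shifted into the Tamil block.
    print(transliterate("\u0915\u092e\u0932", "devanagari", "tamil"))
```

The shift is only a first approximation: scripts do not share identical letter inventories (Tamil, for instance, has no separate aspirated consonants), so some shifted code points are unassigned in the target block and a real tool would need per-script fallback rules.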
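The address-book idea described above can be sketched as a conversion through Unicode as a pivot: the sender's text is decoded to Unicode, then re-encoded for the receiver. This is a minimal sketch under stated assumptions and not the tool's actual design. The encodings `font_a` and `font_b` and their tables are invented for illustration and do not describe Bamini or any real font; the function names and the sample address are hypothetical too.

```python
# Hypothetical legacy encodings: each maps one legacy character to one
# Unicode Tamil character. Real font encodings are far larger.
ENCODINGS = {
    "font_a": {"a": "\u0b85", "k": "\u0b95", "m": "\u0bae"},
    "font_b": {"1": "\u0b85", "2": "\u0b95", "3": "\u0bae"},
}


def to_unicode(text, encoding):
    """Decode legacy-font text to Unicode; unknown characters pass through."""
    if encoding == "unicode":
        return text
    table = ENCODINGS[encoding]
    return "".join(table.get(ch, ch) for ch in text)


def from_unicode(text, encoding):
    """Encode Unicode text into a legacy font encoding."""
    if encoding == "unicode":
        return text
    reverse = {uni: legacy for legacy, uni in ENCODINGS[encoding].items()}
    return "".join(reverse.get(ch, ch) for ch in text)


def convert_for_recipient(body, sender_encoding, recipient, address_book):
    """Convert a message body into the encoding stored for the recipient."""
    target = address_book[recipient]
    if target == sender_encoding:
        return body
    return from_unicode(to_unicode(body, sender_encoding), target)


if __name__ == "__main__":
    # Hypothetical address book: recipient address -> preferred encoding.
    book = {"reader@example.com": "font_b"}
    print(convert_for_recipient("akm", "font_a", "reader@example.com", book))
```

Using Unicode as the pivot means N encodings need N tables instead of a table for every pair. Real legacy Tamil fonts also need multi-character sequences and reordering of vowel signs, which this one-to-one sketch does not handle.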
How effective do you think 'MSN Search Engine’s Tamil Interface' has been?
IS: I have already mentioned that more people will benefit if the UI is made available in their native language. In addition, there is no search engine for the Tamil language with grammar support and a Tamil interface. However, MSN has an excellent search engine with support for 4 international languages. More people will use the internet if a search engine complete with an Indian language interface and grammar support is available. Moreover, there is no facility available for saving the search results onto the local hard disk or the Internet. Hence, we have designed this Tamil Interface Search Engine as a stand-alone tool for Microsoft users. It will help to search Tamil Unicode data with Tamil grammar help and to save the search results to the local hard disk, all through an integrated Tamil interface. This utility is based on the Microsoft Network Search Engine.
Do you have a technical support team to back you up?
IS: With regard to designing, programming, testing, data management, research and development, it is a one-man show, namely me.
What do you think of the BhashaIndia project and how do you see it enhancing Indic computing?
IS: I often visit It is very helpful for writing and publishing my opinions and my expectations of Microsoft. Reading others’ feedback and requirements for their language development is also good for me. It is good that Microsoft evinces keen interest in the development of Indic language computing through wherein opinions, suggestions and feedback can be given in any Indian language (Unicode supported) of one’s choice. Moreover, my expectation from is FAQ support on MS products in Indian languages.

