Adventures in Hindi Part 3: India Shining, Internet, and the entertainment override
December 23, 2011
Going back to the drawing board regarding my Hindi project, I decided to look at some basics. Who, exactly, were these Hindi speakers I was concerned about? Given that Hindi came in so many forms, could a single approach work for most persons? I felt the answer was yes, mainly because of the success of Bollywood and also the way Hindi words have crept into so many of the advertisements.
(This is part 3 of a four-post entry; you can see the earlier parts here: Adventures in Hindi Part 1: A mother-tongue fading behind a veil and Adventures in Hindi Part 2: The failed experiment of Have-English-can-translate-to-Hindi)
To paraphrase an old ad, Kisko Hindi Mangta?
But I was not making movies or ad jingles; I was thinking about websites. Who visited Hindi websites, and for what? What sort of websites existed, and whom did they cater to? How did one locate Hindi websites of interest?
From what I understand, when people say they know Hindi, they usually mean they speak Hindi and watch movies and all that, but do they read Hindi? Even if they can read Hindi, do they want to read Hindi, do they read articles in Hindi if they chance upon them, do they *look* actively for such articles? How many Hindi-reading people use the Internet and how do they use it?
One keeps hearing of the Internet revolution and of Indians being experts in computers—does that have any relevance at all with respect to Hindi material on the web?
The two things go hand-in-hand: having online material available on a topic and having people who look for it. If there is no material, people will not look; if there are no people looking for material, other people will not create it. How has this balance panned out for Hindi? What’s the future? (If any of you know of any papers on this, please add the information to comments below).
Anyway, this post is not a coherent collation of thought-out stuff; it is a collage of random tidbits from the weeks I spent thrashing around for information and ideas and playing around with whatever I could find.
Here’s one tidbit: There’s an interesting site that contains some world-wide statistics on the Internet at www.internetworldstats.com.
And here is the link where we learn that there is no Indian language (other than English) in the top ten Internet languages (see http://www.internetworldstats.com/stats7.htm).
Some interesting numbers on Internet users: China 485 million, India 100 million, USA 245 million.
And some more: percentage population penetration (% of population using Internet) is 8.4% in India as against 36.3 in China, 78.4 in Japan, 1.1 in Bangladesh, 10.9 in Pakistan, and 78.2 in USA.
We’ve got some catching up to do, India Shining…
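A quick sanity check on these numbers: dividing the number of users by the penetration percentage gives the population each pair of figures implies. The sketch below uses only the figures quoted above; the function and variable names are made up for illustration, and the results are rough because the inputs are rounded.

```python
# Figures quoted above from internetworldstats.com:
# (Internet users in millions, penetration as % of population).
stats = {
    "China": (485, 36.3),
    "India": (100, 8.4),
    "USA": (245, 78.2),
}


def implied_population(users_millions, penetration_pct):
    """Population (in millions) implied by a user count and a penetration rate."""
    return users_millions / (penetration_pct / 100)


for country, (users, pct) in stats.items():
    population = implied_population(users, pct)
    print(f"{country}: {users} million users at {pct}% -> about {population:,.0f} million people")
```

If the implied populations look about right for each country, the user counts and the penetration percentages are at least consistent with each other.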
Curious about how interested the Internet-savvy users are in reading Hindi, I posted a “question” on Facebook, and so far, this is what the 34 responses are:
|Never thought of it||10 votes|
|Wondered about it, but have no idea how to do it||4 votes|
|I speak Hindi and am comfortable with it, but I don’t read Hindi||5 votes|
|I didn’t even know there is Hindi stuff on the Internet||0 votes|
|I guess I might just read Hindi articles if I chance upon them||7 votes|
|Yes, I sometimes search for Hindi articles by adding “Hindi” to the search||5 votes|
|I use the Hindi typing facility to search for Hindi articles||0 votes|
|I don’t know Hindi||3 votes|
I had requested responses from around 100 persons who were moderately active on Facebook, including many who listed Hindi as their language, and some who even wrote in Hindi (and used the Google transliterate for that). Surprisingly, some persons who are pretty enthusiastic about writing in Hindi did not respond; that does not necessarily mean they were not interested, because they could have been busy or may not have noticed my request. But on the plus side, I got some responses from people outside my “friends” list and some people also added some comments to explain their answers.
Asking a question on Facebook is hardly an unbiased survey. For one, these are all persons who use English to use the Internet and are unlikely to be too concerned about getting stuff in Hindi. And anyway, 34 responses do not constitute a significant sample-size…
Even so, the results are interesting. Almost everyone who responded knew Hindi (only 3 of 34 said they did not know Hindi), though of the 31 who knew Hindi, 5 said they did not read it.
Not surprisingly, Hindi articles were not top-of-mind for most respondents, and 10 of the 34 respondents had never even thought about reading Hindi articles on the Internet. I was slightly more surprised by the 4 who had wondered about it but had no idea how to do it, especially as it is not too difficult to just try. Some of these 14 were tech folks who designed websites.
What was clear was that very few people actually looked for Hindi articles; they read Hindi articles only if they happened to get the link from someone. Even the few who looked for articles in Hindi typically did it by typing the word “Hindi” along with a search string. One respondent indicated through a comment that she searched for Hindi stuff on the Internet by using Hindi words typed out in Roman, the way one searches for songs on youtube. None of the persons who responded, not even those who were using the Hindi script for their Facebook and other posts, were using Hindi words typed in Devnagari to search the web.
And here I had been thinking that people would be using the Hindi typing facility to actively search for Hindi stuff. Ah well, that is why one needs surveys.
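The poll arithmetic above is easy to redo. Here is a minimal Python sketch that tallies the votes and turns them into percentages. The vote counts are the ones in the table above; the option labels are shortened, and the names `poll` and `share` are only illustrative.

```python
# Vote counts from the Facebook poll above (option labels shortened).
poll = {
    "Never thought of it": 10,
    "Wondered about it, but no idea how": 4,
    "Speak Hindi, but don't read it": 5,
    "Didn't know there is Hindi stuff online": 0,
    "Might read Hindi articles if I chance upon them": 7,
    "Search by adding 'Hindi' to the search": 5,
    "Use the Hindi typing facility to search": 0,
    "Don't know Hindi": 3,
}

total = sum(poll.values())
knows_hindi = total - poll["Don't know Hindi"]


def share(votes):
    """Votes as a percentage of all responses."""
    return 100 * votes / total


for option, votes in poll.items():
    print(f"{votes:2d} votes ({share(votes):4.1f}%)  {option}")

print(f"Total responses: {total}, of whom {knows_hindi} know Hindi")
```

With a sample this small, the percentages are only a convenience for reading the table, not evidence of anything wider.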
- Type in Hindi on the Internet using Google transliteration available here: http://www.google.com/transliterate
- Download the tool for your computer’s word processing here: http://www.google.com/ime/transliteration/
- Use Google Advanced search options to search for pages in a particular language (there is a Language option); check out http://www.google.co.in/advanced_search
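For the last tip, the Language option ends up as a parameter in the search URL, so such a search can also be built by hand. The sketch below assumes that Google's web search accepts a language restriction of the form `lr=lang_hi` (this is what the advanced search form appeared to produce; treat the parameter as an assumption that may change). The function name `hindi_search_url` is hypothetical.

```python
from urllib.parse import urlencode


def hindi_search_url(query, lang="hi"):
    """Build a search URL restricted to pages in one language.

    Assumption: the search engine honours an `lr=lang_<code>` parameter.
    """
    params = {"q": query, "lr": f"lang_{lang}"}
    # urlencode percent-encodes Devanagari text as UTF-8 bytes.
    return "https://www.google.com/search?" + urlencode(params)


print(hindi_search_url("cricket"))
print(hindi_search_url("क्रिकेट"))
```

The same function works whether the query is typed in Roman or in Devnagari; only the encoded form of the query differs.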
So well, not very encouraged, I went back to try and understand what people in India use the Internet for, and what sort of searches people do using Hindi.
Luckily for the curious, Google has a number of tools to show you what others use Google searches for. These tools are intended for people who want to put out Google ads and need to decide which keywords to attach their ads to, and who therefore want to know which keywords are used more often in their specific areas of interest.
I played around somewhat with two of these tools.
The first was http://www.google.com/insights/search/.
On the day I checked it, I learned that the top search terms from India in the last 30 days had been: download, facebook, songs, youtube, song, google, games, videos, gmail and yahoomail, and that the rising searches were kolaveri, kolaveri di, kolaveri song, veena malik, sunny leon, sunny leone, aieee, facebook log in, bigg boss 5, and desi boyz.
And in the category of health, the results I got for top search terms were: and, hospital, health, cancer, medicine, eye, aids, tablets, heart, diabetes, with the rising searches being: ivf surrogacy, surrogacy, amri hospital, aids day, world aids day, surrogate, ivf, aids, hiv aids, icsi.
(I think we have Aamir Khan to thank for the surge in surrogacy…)
Running the health category search for 12 months gives a slightly more stable picture, with the top search terms being: hospital, medical, health, medicine, cancer, eye, heart, breast, diabetes, doctor and the rising searches being: delhi belly, xxy, epf, kpsc, defloration, gk, and mbbs (I am quite clueless about half of these but I suspect delhi belly refers to the movie and not the stomach infection).
The picture gets even more interesting if one types a word as a search term. I tried typing (what else?) “dementia”. The regional pattern showed that the interest was mainly in Kerala, Karnataka, Andhra Pradesh, Tamil Nadu, Maharashtra, Delhi, West Bengal, Punjab and Uttar Pradesh, and the list of search terms used across India was not even long enough to fill the 10 slots allowed. Senile was a commonly sought term, sought as much as dementia, and all the hype of World Alzheimer’s Day notwithstanding, there was no peak in September. The search for Alzheimer’s was even more amusing, with just one term listed (“alzheimers disease”), the message “not enough search volume to show results”, and all searches coming only from Karnataka.
Compare this with the all-India enthusiasm and spread seen if you type the search term “cricket” instead, and the contrast is striking. Or even “amitabh”…
And what if we use Hindi to search? I typed cricket in Devnagari script (क्रिकेट) to check. The regional searches are now clearly in the Hindi belt, peaking in Madhya Pradesh, and it is clear that quite a few people use Devnagari to type cricket and search the Internet. The enthusiasm for searching with the Devnagari word for health, स्वास्थ्य, is markedly less, with “not enough search volume to show results”. बीमारी and दवाई (Devnagari for illness and medicine) are no better, but रोग (another word for illness) is a mite better, with the term sought being स्त्री रोग.
The keyword tool traffic estimator at https://adwords.google.com is also interesting for seeing what people look for. It allows selecting country, language, and device used to access the Internet.
Looking for category health, for one month for India, Hindi, as accessed from laptops/ desktops, the top term is ayurvedic plants, with 6600 global searches of which 5400 are from India. The next is dabur shilajit gold, with a global count of 1900, all from India.
Type cricket in Devnagari, and it shoots up to 40,500 in a month, all from India.
So yes, there are the beginnings of profiles and patterns, if only one could understand them…
On a different line of thinking… (I did warn you this was a scattered post)
We hear a lot of hype about the Internet revolution and the IT industry and technical expertise of India, but I’ve often wondered how many of the IT people actually use the Internet at work. I think many IT offices ban surfing within the premises, either to ensure security of client data, or to prevent people wasting time during working hours. So these tech-savvy employees get to use Internet only when they are back home, tired after work and long commutes, and naturally youtube, songs and kolaveri di are more important then. How many people use Internet as a primary resource for information, and how many use it in Hindi? How many use it to stay informed about health?
I would suspect that Internet is a major source of entertainment, and not used as often (to put it diplomatically) for information…
Another bit of playing around I tried was seeing how helpful the Google search engine is while searching in Devnagari.
Suppose in a Google search box, we type “crickit” (Roman script, misspelt word) — Google recognizes a possible problem, saying:
Showing results for cricket
Search instead for crickit
And even if we are searching only for Hindi pages, this search for “crickit” suggests “About 18,900,000 results (0.32 seconds)”.
Type the word in Devnagari as क्रिकेट and the results of Hindi pages are “About 20,300,000 results (0.30 seconds)”
But then, type क्रीकेट (a mis-spelling in Devnagari) and the results are drastically different, to wit, “About 1,300 results (0.31 seconds)”
Even for a word like cricket which seems popular enough, Google fails to recognize a spelling mistake. I don’t think Google currently recognizes possible spelling mistakes in Hindi.
This has major implications for surfers who want to use Devnagari to search the web. If we search for बीमारी we get 2,200,000 results, but बिमारी gives only 62,200. This also means that these 62000+ pages with the mis-spelled word may not be visible to those who spell the word correctly.
Hindi is, of course, pretty tricky. Sometimes we have words where the mis-spelling is easier to type than the correct spelling. Sometimes we have words with no correct spelling decided upon–like English words adopted in Hindi or used in Hindi.
Take the word for patient, mareez. Spell it as मरीज and you get “About 1,200,000 results (0.29 seconds)” but spell it as मरीज़ and you get fewer and different results “About 52,700 results (0.28 seconds)”. Now Hindi dictionaries give the correct spelling as मरीज़ but Microsoft Word puts a red squiggly under that, and seems to approve मरीज (In case you haven’t noticed, one of them uses ज and the other uses ज़ – note the dot)
Or take my favourite word, dementia. It can be written as डिमेंशिया or डिमेन्शिया and these, for Google, are two different words. So if I write a page using डिमेंशिया, then as I understand it, this page will not be visible to someone surfing for डिमेन्शिया in the current state of the search engine.
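To see why such pairs count as different words, and how a search tool or a site owner might fold them together, here is a rough Python sketch. It is only an illustration of one possible approach, not how Google or any other search engine works. It strips the nukta (the dot), rewrites a half न before a consonant as an anusvara, and optionally ignores the short/long difference in the i and u vowel signs. These rules are crude assumptions and will sometimes merge words that are genuinely different; the name `search_key` is hypothetical.

```python
import re
import unicodedata

NUKTA = "\u093c"     # the dot below, as in ज़
VIRAMA = "\u094d"    # halant, which makes a half consonant
ANUSVARA = "\u0902"  # the dot above, as in डिमेंशिया
NA = "\u0928"        # न

# Half न (न + virama) directly followed by another Devanagari consonant.
HALF_NA = re.compile(NA + VIRAMA + "(?=[\u0915-\u0939])")

# Long vowel signs mapped to their short forms (ी -> ि, ू -> ु).
VOWEL_FOLD = str.maketrans({"\u0940": "\u093f", "\u0942": "\u0941"})


def search_key(word, fold_vowel_length=False):
    """Return a crude matching key for a Hindi word (not for display)."""
    # NFD splits precomposed letters such as U+095B into base letter + nukta.
    text = unicodedata.normalize("NFD", word)
    text = text.replace(NUKTA, "")
    text = HALF_NA.sub(ANUSVARA, text)
    if fold_vowel_length:
        text = text.translate(VOWEL_FOLD)
    return text


pairs = [("मरीज", "मरीज़"), ("डिमेंशिया", "डिमेन्शिया"), ("बीमारी", "बिमारी")]
for first, second in pairs:
    strict = search_key(first) == search_key(second)
    loose = search_key(first, True) == search_key(second, True)
    print(first, second, "same key:", strict, "| with vowel folding:", loose)
```

A key like this would only be used behind the scenes for matching; the page itself would still show whichever spelling its author chose. It also does nothing for words that are transliterated in entirely different ways.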
Now look at English words written in Hindi. Take Alzheimer’s, for example. When looking for a good transcription for it in Hindi some months ago, I found a number of well-translated pamphlets and videos where the word was written as एल्ज़ाइमर्ज़. That sounded correct, too. I therefore assumed this would be a good way to use the word and used it thus in a few trial blog entries I made. But when I did this playing around, a search on एल्ज़ाइमर्ज़ yielded only 8 results, mostly thanks to me 😦 On the other hand, एलजा़इमर gives 2,010 results, अल्जाइमर gives 78,400 (this is the spelling the newspapers use) and अल्ज़ाइमर gives 1870.
Clearly, search engines are not really too sophisticated for searching for keywords in Hindi yet.
A related area of thinking… how do people know where to find material in Hindi? There are newspapers that are probably regularly visited by people who know of them; there are bloggers who blog in Hindi. They know of each other, I guess, partly through places like indiblogger.in. I read somewhere that there are also aggregator sites that list sites in Hindi, but the few names I got were all defunct sites.
I gave up searching at this point, mainly because it struck me that if Hindi searching was still in its infancy, then studying it for trends was pointless.
I think the Hindi usage trends are still much in their infancy right now. They will probably change erratically for a while till things stabilize and mature. One major brand of mobile phone with an uber-friendly Hindi keyboard could swing things.
What I wonder about is, by the time people are equipped for searching effectively in Hindi, will there be enough Hindi material available that could be reached through such searches? Right now the main source of Hindi material is newspapers on the web… that would have to change to satisfy a (hopefully) growing audience.
And any approach I decided on would have to factor in that many Hindi speakers were not Hindi readers, that different people spoke and understood different types of Hindi / Hinglish, and that searching habits and reader preferences were still at such a formative stage that they could well evolve to parallel those in English, or turn out very different. It was also possible that the Internet revolution would never percolate that deep, or that the Internet would remain more a mode of entertainment (as it currently seems to be) and visitors might never really consider it a source of knowledge.
Or perhaps the persons who would get more interested in using the Internet may also be the profile that wants to use it in English (howsoever difficult they find it) rather than in Hindi.
I chanced upon this rather interesting article: Dispatch from a far flung corner of India (This article, folks, is about the Wikipedia Hindi project and is worth a read).
…Several of the middle-class non-Wikipedian locals I spoke to didn’t know there was a Hindi-language version of the Wikipedia but thought it made sense and one journalist even said he’s considered looking at the community of Indian Wikipedians himself. Of course, when a debate came up about the ages of Bollywood stars this didn’t stop anyone from searching in English on their mobiles.
… When locals do use computers … they do so in English and mostly as communication devices. These young people aren’t searching the web for information, they’re simply logging on to connect with friends. That the technology they have is in English is telling.
Enough of my thought-jumble for this week. More next week. And if you have any comments or any relevant data/ projections/ analysis, please share.
This is part 3 of a four-part post that shares observations on the use of Hindi on the Internet and suggests ways to reach out to Hindi speakers over the web. The other parts are:
- Adventures in Hindi Part 1: A mother-tongue fading behind a veil
- Adventures in Hindi Part 2: The failed experiment of Have-English-can-translate-to-Hindi
- Adventures in Hindi Part 4: In the end is the beginning, or, more observations, a summing up and a way forward.
If you feel this “Adventures in Hindi” series of blog posts could be of interest to someone, please share the link.
Subsequent to this series of posts, I began work on a Hindi website for dementia care, and also created and uploaded more videos in Hindi. A brief update on the experience so far, along with links to the various uploaded material, is here: Creating online dementia care material in Hindi: my experience so far.