IBM’s Spoken Web
The Web can be a lifeline for economically disadvantaged people: with access to information (even from a nearby internet café), they can improve their lot in life through education and by finding jobs. But what good is a PC and a Web connection to people who are physically challenged, visually impaired, or have little or no literacy? Researchers at IBM are trying to bridge this gap with a network they call the Spoken Web, and IBM’s India Research Lab was honored with the National Award for Technological Innovation for this work. Thanks to IBM for this kind of humanitarian innovation, which is rare to see!
For those of us with computers and Internet access, the World Wide Web has provided unfettered access to information, opened new business and employment opportunities, transformed the way we communicate, helped eliminate geographical barriers, and paved the way for global collaboration and integration. Unfortunately, it is estimated that more than five billion people [1] (or 75 percent of the world’s population) still do not have computers or Internet connectivity. What they do have is mobile phones! In India alone, mobile phone use has skyrocketed: there are over 360 million mobile phone users, and service providers are adding millions of customers every month.
Under a partnership between IBM and the Karnataka Vocational Training and Skill Development Corporation (KVTSDC), the two organizations aim to make mobile devices better job-hunting tools. Karnataka is India’s fastest-growing state. According to McKinsey, only 7 percent of India’s population has Web access, while mobile phones are far more widespread. Noting this massive penetration, researchers at the India Research Lab decided to bring the Web to the people. Their project, the Spoken Web, helps people who are physically challenged or visually impaired reap the benefits of the World Wide Web through their mobile or landline phones. People with little or no literacy, a constituency of 900 million people worldwide (35 percent of whom live in India), also benefit significantly because the project uses the spoken word rather than text as its interface.
The Spoken Web creates a system comparable to the World Wide Web using speech technology and the telephone. It lets people create voice sites using their telephones. Each voice site gets a unique phone number, the equivalent of a URL, and other users who call that number hear whatever content has been uploaded there. To support navigating by voice, IBM created a new technology called the HyperSpeech Transfer Protocol (HSTP). For now there is no direct link between the voice sites in the project and conventional Web sites, but the two may eventually be linked.
The Spoken Web, or the World Wide Telecom Web (“WWTW”, “TelecomWeb”, “T-Web”), is the vision of a voice-driven ecosystem parallel and complementary to that of the WWW. The WWTW is a network of VoiceSites: voice-driven applications that are created by users themselves and hosted in the network.
- WWTW is defined as an information and services space in which the items of interest, referred to as VoiceSites, are identified by global identifiers called VoiNumbers and may be interconnected through VoiLinks.
- A VoiNumber is a virtual phone number that maps either onto a physical phone number or onto another uniform resource identifier, such as a SIP URI.
- A VoiceSite is a voice-driven application consisting of one or more voice pages (e.g. VoiceXML files) hosted in the telecom infrastructure. VoiceSites are accessed by calling the associated VoiNumber and interacting with its underlying application flow, primarily through a telephony interface.
- A VoiLink is a hyperlink from one VoiceSite to another through which a caller interacting with the source VoiceSite can be transferred to the target VoiceSite in the context of the underlying application.
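Taken together, these four definitions describe a small data model. The Python sketch below is a hypothetical illustration of how VoiNumbers, VoiceSites and VoiLinks relate; the class and method names (`VoiceSite`, `TelecomWebRegistry`, `resolve`, `call`, `follow`) and the sample numbers are invented for this example and are not IBM’s implementation or the HSTP protocol.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VoiceSite:
    """A voice-driven application: one or more voice pages plus outgoing VoiLinks."""
    title: str
    voice_pages: list[str]  # e.g. VoiceXML file names (hypothetical)
    # VoiLinks: spoken link label -> VoiNumber of the target VoiceSite
    voilinks: dict[str, str] = field(default_factory=dict)


class TelecomWebRegistry:
    """Hypothetical registry of VoiNumbers; not IBM's actual implementation."""

    def __init__(self) -> None:
        self._targets: dict[str, str] = {}  # VoiNumber -> phone number or SIP URI
        self._sites: dict[str, VoiceSite] = {}  # VoiNumber -> hosted VoiceSite

    def register(self, voinumber: str, target: str, site: VoiceSite) -> None:
        self._targets[voinumber] = target
        self._sites[voinumber] = site

    def resolve(self, voinumber: str) -> str:
        """Return the physical phone number or SIP URI the VoiNumber maps onto."""
        return self._targets[voinumber]

    def call(self, voinumber: str) -> VoiceSite:
        """Calling a VoiNumber connects the caller to its VoiceSite."""
        return self._sites[voinumber]

    def follow(self, source: VoiceSite, label: str) -> VoiceSite:
        """Transfer a caller along one of the source site's VoiLinks."""
        return self.call(source.voilinks[label])


# Made-up identifiers for illustration only.
registry = TelecomWebRegistry()
market = VoiceSite("Grain prices", ["prices.vxml"])
farm = VoiceSite("Farm advice", ["welcome.vxml", "pests.vxml"], {"market": "VN-2002"})
registry.register("VN-1001", "+910000001001", farm)
registry.register("VN-2002", "sip:market@example.org", market)
print(registry.follow(registry.call("VN-1001"), "market").title)  # Grain prices
```

Keeping the VoiNumber-to-target mapping separate from the hosted site mirrors the definition above: the same VoiNumber can point at an ordinary phone line or a SIP endpoint without changing how callers reach the VoiceSite.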
Four years after the first prototype was released, the Spoken Web is part of everyday life for users in four Indian states and in parts of Thailand and Brazil. They use it to learn about things such as local grain prices or job opportunities. In an eight-month pilot in a village in rural India, IBM offered publishing of content in four broad categories: agriculture, healthcare, education, and professional services. More than 6,500 people accessed the voice site over 114,000 times. Enabling people with disabilities, and others, to access and share information, perform business transactions, or create social networks using just their voices and their telephones opens a whole new world for so many. It’s no wonder the government of India gave its highest technology award to the IBM team working on the Spoken Web.
Now the project is going through a developmental stage that mirrors one from the regular Web’s history: the debut of search as a way to navigate a growing body of content. “As the number of voice sites grows, and they get more content, people need a way to find what they want quickly,” says Nitendra Rajput, a senior researcher with IBM Research India. Rajput was an early collaborator on the Spoken Web with project founder Arun Kumar.
A voice site has some structure: for example, when a person calls in to set up a site, they interact with an automated telephone system that accepts voice commands and prompts them to give the site a title and add sections of different information. However, listening to long voice messages is inefficient and costly, says Rajput. “We want you to be able to speak a pesticide name, for example, to quickly find content about that,” he says.
But designing a search engine that works like that is far from simple. Voice-recognition technology can be used to take a person’s search term and match it against a previously processed index of recorded voice sites, but presenting the results is a challenge. “We can’t have it read out a list of 20 results. It would take too long, and people would not remember them all,” says Rajput. “Instead it [must] tell the user it has that many, and ask how to narrow them down.” The user is asked which category to filter the results by: for example, the name of the person who owns the site, the place it was created, or whether the search term was found in a particular type of section, such as one announcing news, asking a question, or answering one. This step is repeated until five or fewer results remain, at which point they are all read out and the user can choose which one to “browse” to.
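The narrowing dialogue Rajput describes can be sketched as a loop. This is a minimal illustration under stated assumptions: voice recognition has already matched the search term and produced a list of hits, each tagged with the site owner, the place it was created and the section type, and the `choose` callback stands in for the spoken question-and-answer with the caller. The names `Hit`, `narrow` and `scripted_caller`, and the sample data, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product

MAX_READ_OUT = 5  # the article: results are read out once five or fewer remain
FACETS = ("owner", "place", "section_type")  # filters named in the article


@dataclass(frozen=True)
class Hit:
    """One search hit; assumes voice recognition has already matched the query."""
    site: str
    owner: str
    place: str
    section_type: str  # e.g. "news", "question", "answer"


def narrow(hits: list[Hit], choose) -> list[Hit]:
    """Ask the caller to filter until MAX_READ_OUT or fewer hits remain.

    `choose(count, options)` stands in for the spoken dialogue: it is told how many
    hits there are and, per facet, which values would split them, and it returns
    the (facet, value) pair the caller picked.
    """
    while len(hits) > MAX_READ_OUT:
        # Offer only facets whose values actually differ across the current hits.
        options = {f: sorted({getattr(h, f) for h in hits}) for f in FACETS}
        options = {f: vals for f, vals in options.items() if len(vals) > 1}
        if not options:
            break  # nothing left to filter on; read out what remains
        facet, value = choose(len(hits), options)
        hits = [h for h in hits if getattr(h, facet) == value]
    return hits


def scripted_caller(count: int, options: dict[str, list[str]]) -> tuple[str, str]:
    """Hypothetical caller that always picks the first offered facet and value."""
    facet = next(iter(options))
    return facet, options[facet][0]


# Eight made-up hits: every combination of two owners, two places, two section types.
hits = [Hit(f"site{i}", o, p, k) for i, (o, p, k) in enumerate(
    product(("Asha", "Ravi"), ("Village A", "Village B"), ("news", "question")))]
shortlist = narrow(hits, scripted_caller)
print(len(shortlist), {h.owner for h in shortlist})  # 4 {'Asha'}
```

Offering only facets that split the current results means every answer shrinks the list (as long as the caller picks one of the offered values), so the loop always ends.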
Trials involving 40 farmers in the Indian state of Gujarat validated this design, which is to be rolled out across the whole Spoken Web. More navigation aids are still needed, though: as the Spoken Web grows, says Rajput, it becomes important to find more ways to help people navigate its content, just as similar mechanisms have been developed for the text-based Web.
Another improvement in the works provides a way to skim through voice sites. Users can already use a fast-forward function to hear a site at increased speed: the feature plays audio at 10 times normal speed, rendering words too fast to make out, but it slows down for certain important words or phrases. The effect is similar to a person skim-reading a text out loud, says Rajput, and it lets a person find what they want very rapidly.
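Skimming can be modeled as a per-word playback rate. This sketch assumes the recording has already been aligned to words with known durations (for example by a speech recognizer) and that the set of important words is given; only the 10x speed comes from the article, while the function names `playback_rate` and `skim_duration` and the timings are made up.

```python
from __future__ import annotations

FAST_RATE = 10.0   # skim speed reported in the article
NORMAL_RATE = 1.0  # important words are played at normal speed


def playback_rate(word: str, important: set[str]) -> float:
    """Skim at FAST_RATE but drop to normal speed on an important word."""
    return NORMAL_RATE if word.lower() in important else FAST_RATE


def skim_duration(segments: list[tuple[str, float]], important: set[str]) -> float:
    """Seconds needed to skim word-aligned audio given as (word, seconds) pairs."""
    return sum(seconds / playback_rate(word, important) for word, seconds in segments)


# Made-up word timings for a short recorded message.
segments = [("The", 0.2), ("price", 0.5), ("of", 0.2), ("wheat", 0.5), ("rose", 0.4)]
print(round(skim_duration(segments, {"price", "wheat"}), 2))  # 1.08
```

In this made-up example the five-word message takes 1.08 seconds to skim instead of 1.8 seconds at normal speed, while the two important words are still heard clearly.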
The researchers think the system could learn which words or phrases are important by looking at which phrases lead users to switch from fast-forward to normal speed. “We are currently collecting the statistics from the users we have in order to know which words are important,” says Rajput.
“So many people in the world have no idea how to use the Web or even to understand the text on it,” says Naushad UzZaman, a researcher at the University of Rochester, New York. “Although you cannot remove the digital divide, making it possible to get the benefits of the Web by voice is an example of how we can narrow it.”
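Rajput’s plan to learn which words matter from users switching out of fast-forward could, as one possible approach, come down to counting. In this hypothetical sketch, each logged session lists the words a caller heard while fast-forwarding and whether they slowed down right after each one; a word counts as important when its slow-down rate reaches a threshold. The log format, the `threshold` and `min_heard` parameters, and the sample data are all assumptions, not IBM’s method.

```python
from __future__ import annotations

from collections import Counter


def switch_rates(sessions: list[list[tuple[str, bool]]], min_heard: int = 1) -> dict[str, float]:
    """Fraction of fast-forward hearings of each word after which the caller slowed down.

    Each session is a list of (word heard while fast-forwarding, switched_to_normal)
    pairs. Words heard fewer than `min_heard` times are skipped as too noisy.
    """
    heard = Counter()
    switched = Counter()
    for session in sessions:
        for word, switched_here in session:
            heard[word] += 1
            if switched_here:
                switched[word] += 1
    return {w: switched[w] / n for w, n in heard.items() if n >= min_heard}


def important_words(sessions, threshold: float = 0.5, min_heard: int = 3) -> set[str]:
    """Words whose slow-down rate reaches `threshold` (both knobs are assumptions)."""
    return {w for w, r in switch_rates(sessions, min_heard).items() if r >= threshold}


# Made-up logs from three skimming sessions.
sessions = [
    [("the", False), ("urea", True)],
    [("urea", True), ("price", False)],
    [("the", False), ("price", True), ("urea", False)],
]
print(sorted(important_words(sessions, min_heard=2)))  # ['price', 'urea']
```

The `min_heard` cutoff keeps a word that happened to trigger one slow-down in a single session from being flagged; raising it to 3 here leaves only "urea".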
Rajput says that, for now, IBM’s Spoken Web is completely separate from the World Wide Web, and that most users are mainly interested in local concerns. However, it is possible that the twain could meet. “If there is relevant information on the real Web, we can pull it into the Spoken Web using API calls and text-to-speech technology,” says Rajput. “But it needs to be converted to the correct language, and support for that is not good outside U.S. English.”