Conversational Systems for Human/Computer Interaction

Principal Investigator James Glass

Project Website http://groups.csail.mit.edu/sls/research/interface.shtml

As computers increasingly permeate our daily lives, our demand for online information is skyrocketing. Growing numbers of us turn to the Internet to catch up on the latest news, sports, and weather, obtain stock quotes, reserve airline flights, conduct research, or check out what's playing at local theaters. Unfortunately, navigating through vast amounts of data to obtain useful information can require technical savvy and a time-consuming series of keyboard entries and mouse clicks. But there is a more efficient, more flexible tool available for human-computer interaction, something that even the most technically challenged of us could use anywhere, any time: spoken language.

For humans to speak to computers, a conversational interface is needed. A conversational interface enables humans to converse with machines (in much the same way we communicate with one another) in order to create, access, and manage information and to solve problems. It is what Hollywood and every "vision of the future" tell us we must have. Since 1989, getting computers to communicate the way people do -- by speaking and listening -- has been the objective of the Spoken Language Systems (SLS) Group at MIT's Computer Science and Artificial Intelligence Laboratory.

Imagine talking to a computer to find a needle-in-a-haystack job listing, or showtimes of a movie premiere at the closest theater. Today, obtaining such information online requires a programmed transaction between the user, who clicks through a pre-determined sequence of options and views results, and the computer, which retrieves user-selected data. With spoken language systems, however, user and machine can engage in a spontaneous, interactive conversation, incrementally arriving at the desired information in far fewer steps.

Many speech-based interfaces can be considered conversational, and they may be differentiated by the degree to which the system maintains an active role in the conversation, or by the complexity of the potential dialogue. At one extreme are system-initiative, or "directed-dialogue", transactions, in which the computer takes complete control of the interaction by requiring that the user answer a set of prescribed questions, much like the touch-tone implementation of interactive voice response (IVR) systems. In the case of air travel planning, for example, a directed-dialogue system could ask the user to "Please say just the departure city." Since the user's options are severely restricted, successful completion of such transactions is easier to attain, and indeed some successful demonstrations and commercial deployments of such systems have been made. At the other extreme are user-initiative systems, in which the user has complete freedom in what they say to the system (e.g., "I want to visit my grandmother") while the system remains relatively passive, asking only for clarification when necessary. In this case, the user may feel uncertain as to what capabilities exist and may, as a consequence, stray quite far from the system's domain of competence, leading to great frustration because nothing is understood. Lying between these two extremes are systems that incorporate a "mixed-initiative", goal-oriented dialogue, in which both the user and the computer participate actively to solve a problem interactively using a conversational paradigm. It is this latter mode of interaction that is the primary focus of our research.
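To make the contrast between directed and mixed-initiative dialogue concrete, the following is a minimal Python sketch using the air travel example. It is an illustration only, not the SLS Group's implementation: the city list, the slot names, and the functions parse_directed, parse_mixed, and next_prompt are all hypothetical, and a simple keyword pattern stands in for real speech recognition and language understanding.

```python
import re

# Hypothetical toy vocabulary; a real system would use a full recognizer.
CITIES = ("boston", "denver", "chicago")


def parse_directed(slot, utterance):
    """Directed dialogue: accept only a bare city name for the slot just asked."""
    word = utterance.strip().lower()
    return {slot: word} if word in CITIES else {}


def parse_mixed(utterance):
    """Mixed initiative: pick up any slots the user volunteers, in any order."""
    text = utterance.lower()
    slots = {}
    for slot, prep in (("departure", "from"), ("destination", "to")):
        match = re.search(rf"\b{prep}\s+({'|'.join(CITIES)})\b", text)
        if match:
            slots[slot] = match.group(1)
    return slots


def next_prompt(slots):
    """The system takes the initiative only for slots that are still missing."""
    for slot in ("departure", "destination"):
        if slot not in slots:
            return f"What is the {slot} city?"
    return None  # nothing left to ask; the request is complete


request = "I want to fly from Boston to Denver"
print(parse_directed("departure", request))  # the directed parser rejects the full sentence
print(parse_mixed(request))                  # the mixed parser fills both slots at once
print(next_prompt(parse_mixed("I want to visit my grandmother")))
```

In this sketch the directed parser handles only the single answer it asked for, while the mixed-initiative parser accepts whatever the user offers and the system then asks only for what is still missing. An out-of-domain request such as "I want to visit my grandmother" fills no slots, so the system steers the user back with a specific question.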

Raising the Level of Human to Computer Conversation -- Although tremendous progress has been made over the last decade in developing advanced conversational spoken language technology, much additional progress must be achieved before conversational interfaces approach the level of naturalness of human-human conversations. Today SLS researchers are refining core human language technologies and are combining speech with other kinds of natural input modalities such as pen and gesture. They are working to improve the efficiency and naturalness of application-specific conversations, improve new-word detection and learning during speech recognition, increase the portability of core technologies, and develop new applications. As the SLS Group continues to address these issues, it brings us closer to the day when anyone, anywhere, any time, can interact easily with computers.