Tom Chi  

Voice Driven Interfaces

September 2nd, 2005 by Tom Chi :: see related comic

I’m sure we’ve all had experiences with voice systems like the one above. Usually they are deployed as cost-saving measures to reduce customer support staff. Unfortunately, this is the only experience most people have with voice systems. Although not often covered in the HCI literature, there is an incredible amount that voice interfaces can do. These options are also becoming more viable as the base of consumers with voice-enabled devices (usually phones) grows.

One idea would be to have location-aware voice-based assistance relayed through an earbud as one navigated in an unfamiliar space. The setting could be a drive along an unfamiliar route, or just a walk downtown. The information could range from educational to entertainment to cultural to (of course) commercial. This type of interaction would allow hands-free operation and, more importantly, leave the eyes open to experience the world instead of being focused on a tiny screen.

In the commercial space, there are a few companies that are advancing voice-based interfaces. TellMe is a good example of a company applying small changes to greatly improve the usability of their system. These changes range from saying the action button after the description, to optimizing out unneeded pauses in speech. They also use an approach to menus that prompts for speech selections as soon as they are heard, which makes the interaction feel quicker, while also providing temporal context to help bolster voice recognition accuracy.

So now for the readers to weigh in… what other voice interfaces have you experienced? Which interaction techniques work?

17 Responses to “Voice Driven Interfaces”
Michele Dupont wrote:

As someone with speech problems, I can’t stand voice-only interfaces. Ones where you can speak or use the phone are ok, but anytime I run into a voice-only interface I just sit there in silence until I get a human. ;)

Jay wrote:

Usually, I try to avoid voice interfaces like these. However, I did have a good experience with one once when I was in Italy and was trying to make calling card calls from my hotel room. The room only had a pulse tone phone, making the process pretty painful. However, one of my long distance carriers (Telus) had a voice recognition system which made the experience much more palatable.

Greg wrote:

The best voice interface is an actual human. Other than that, most voice systems are linear menus that force you to hop around options and sub-options. They never allow you to perform an arbitrary action/interaction. A voice system can work well for predictable subsets of interactions but I’ve never seen it work in the broad general sense.

Curt Sampson wrote:

When they work well, I find them to work better than touch-tone interfaces. I recently phoned up to change the billing address on one of my credit cards, and even using the voice-based automated system, it was a reasonably pleasant experience. I was quite surprised!

But sitting through long lists of menus to get to the option I want (inevitably at the end!) I find extremely frustrating.

Mike wrote:

I recently had a great experience with a cinema chain’s phone system. It recognised the name of the film & the time. It was quick but clear. It actually put me in a good mood!

Once the system had taken my information, it redirected me to the payment system…

…which was the old touchtone system of my local cinema. I then had to go through the booking process again. By the time I finished, the system informed me it was shut!!

You see, the old system (local to my cinema) shut at 8pm. Although I phoned at ten to, it took me so long to use the system that by the time I was done I was out of time.

So I’m all for modern voice recognition, as long as it’s well thought out & the system has actually been finished :)

Andy wrote:

I’ve had many good experiences with voice based phone systems. In general I think they have been improving steadily. However, I also think it is always important to give people the option of using the buttons on a phone. As an example, the majority of my calls to places using voice recognition phone systems involve some sort of business transaction. When calling these places I’m often in a situation where I don’t want to be reciting my account or credit card information out loud! In cases like these, systems that can seamlessly switch between receiving button or voice input are the way to go.

Scott wrote:


Please forgive my using a comment field. I couldn’t find an email address.

I love your web site. Do you happen to know if there is a mailing list, newsgroup or web site that HCI
students in the U.S. regularly interact with? I was thinking of posting
the note below to students.

Thanks very much.




San Francisco’s Director of Elections has become very concerned about
the usability problems in voting equipment and wrote a letter to the
CEOs of voting equipment manufacturers about his concerns. His office
released this press release:

One letter is a good start, but the CEOs of voting equipment
manufacturers need to receive many more letters from election officials,
who are their customers, to take seriously the usability of their voting
equipment. Unfortunately, most election officials are unaware of what
usability is and are inclined to put off writing more letters. One
solution to this problem is to have people who are HCI/usability
professionals and students reach out to election officials in their
area. The “reach out” scenario could probably be something like:

1. usability person identifies election official for his/her area

2. usability person contacts election official and explains
that the San Francisco Director of Elections on his web page is
encouraging election officials to write letters to voting equipment
manufacturers about the need for usability. the web page is at:

3. usability person offers to meet with election official for
1 hour pro bono to explain what usability is and how it is relevant to
voting equipment

4. if election official accepts offer, usability person meets
with election official and explains what usability is. (if it
is helpful, San Francisco has put the powerpoint slides for
the introductory usability and voting equipment lecture
online at:

note: if these powerpoint slides are used, the down page button needs
to be used instead of the mouse)

usability person also goes over the simulation that
Diebold has put online of its touchscreen equipment and explains
the various usability problems. the simulation is online at:

5. usability person shows that a Word document with
the letter from San Francisco’s Director of Elections is online

and can be used as a basis for other letters

6. usability person offers to advise election official on writing
about need for usability in the letter

It would be great if 100 election officials wrote letters to voting
equipment manufacturers.

So why would a usability person want to reach out to his/her local
election official? First, the usability person would be helping
society address a social problem, i.e. usability problems in the voting
process. Another reason is that it would help increase the public
awareness of the usability field and how it can benefit them.

Thanks for your help.


PS If you do get an election official to write a letter to voting
equipment manufacturers, please encourage them to write a passionate
letter. (For various reasons, it was decided that SF’s letter would
be a little more on the mild side.)

PPS I’d appreciate knowing which election officials are writing
letters so we can see how the progress towards 100 letters is going.

James wrote:

Generally, I hate voice interfaces. I feel like a dork saying ‘new account’ or whatever it is, especially at work or in public. I think that’s a big problem: it removes a lot of privacy, as Andy commented. Personally I’d rather listen for an option listing then hit a number on the phone. Actually no, I’d rather do anything to do with service providers online, and rely on the telephone for when I need to talk to a human. As Greg said, the best voice interface is another human. Unless they’re an off-shore telemarketer. Or any telemarketer for that matter.

Bob Salmon wrote:

The posts above point to a lack of imagination on the part of the system designers (or maybe technical or financial constraints), similar to omnivores asking vegetarians “But what do you eat?” imagining their usual food but with a hole where the meat used to be.

Instead of mimicking the menu structure of DTMF or GUI interfaces, speech works best as a delegation interface. It allows you to specify your goal in user speak, rather than you needing to learn the designer’s / system’s model of the world. So, ideally, you say

“Give me times of trains between London and Cambridge, arriving in Cambridge around 4 p.m. on a weekday”

and the system recognises the speech, parses all the information, slots it into the query on its database and prompts the user if any slots are still empty and then gives the answers back. This requires a lot of flexibility on the part of the system, which takes serious dialogue design (e.g. interviewing people and/or listening to recordings of people buying tickets at stations etc.)

Notice there were no menus involved in the stuff above - you state your goal in the way you want without having to say “I want to select London from the From menu which is over there on the screen, Cambridge from the To menu…”
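The slot-filling loop described above can be sketched roughly as follows. This is a minimal illustration only: it assumes the speech has already been recognised as text, and the slot names, regular expressions and prompt wording are all hypothetical stand-ins for the much richer grammar and dialogue design a real system would need.

```python
import re

# Hypothetical slots and patterns (illustrative assumptions, not a real grammar).
SLOT_PATTERNS = {
    "origin": re.compile(r"\b(?:from|between)\s+([A-Z][a-z]+)"),
    "destination": re.compile(r"\b(?:to|and)\s+([A-Z][a-z]+)"),
    "arrive_by": re.compile(r"\baround\s+(\d{1,2}(?::\d{2})?\s*[ap]\.?m\.?)"),
    "day_type": re.compile(r"\bon a\s+(weekday|weekend)"),
}

# Hypothetical follow-up questions, one per slot.
PROMPTS = {
    "origin": "Where are you travelling from?",
    "destination": "Where are you travelling to?",
    "arrive_by": "Around what time do you want to arrive?",
    "day_type": "Is that on a weekday or at the weekend?",
}


def fill_slots(utterance, slots=None):
    """Copy any slot values found in the utterance into the query slots.

    Slots that are already filled are left untouched, so the function can
    be called again with the user's answer to a follow-up prompt.
    """
    slots = dict(slots or {})
    for name, pattern in SLOT_PATTERNS.items():
        if name not in slots:
            match = pattern.search(utterance)
            if match:
                slots[name] = match.group(1)
    return slots


def next_prompt(slots):
    """Return the question for the first empty slot, or None if all are filled."""
    for name in SLOT_PATTERNS:
        if name not in slots:
            return PROMPTS[name]
    return None


full = fill_slots(
    "Give me times of trains between London and Cambridge, "
    "arriving in Cambridge around 4 p.m. on a weekday"
)
partial = fill_slots("Trains to Cambridge please")
followed_up = fill_slots("from London", partial)
```

With the full request every slot is filled in one pass and `next_prompt(full)` returns `None`; with the partial request the system only has a destination, so it asks for the origin and carries on from there.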

This is all likely to be over the phone, which is incredibly hard: a huge population of users with all sorts of accents, no training of the system to their voices, background noise, users talking to other people near them, users breathing into the phone / scraping their beard on it etc. The phone is also a terrible audio instrument - the part of the signal outside the range 300 Hz to 3.5 kHz is just dropped and speech goes up to at least 8 kHz (the letters S and F differ only above 3.5 kHz).

Also the vocab of a native speaker can be up to 40,000 words, so each word is a 1 in 40,000 choice. You can only narrow this down by using the grammar of the language, semantic context and so on - a native speaker gets it down to only 1 in 32 or so. But encoding all this extra info into a computer system is hard (AI complete, in fact).
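To put the two figures above on a common scale, here is a back-of-the-envelope calculation. It assumes, as a simplification, that every candidate word is equally likely, in which case a 1-in-N choice carries log2(N) bits of uncertainty; the function name is just illustrative.

```python
import math


def bits_per_word(equally_likely_choices):
    """Uncertainty, in bits, of one choice among N equally likely words."""
    return math.log2(equally_likely_choices)


# 1 in 40,000: no help from grammar or context (about 15.3 bits per word).
unconstrained = bits_per_word(40_000)

# 1 in 32: what a native speaker reportedly narrows it down to (5 bits per word).
with_context = bits_per_word(32)

# Bits per word that grammar and context have to supply (about 10.3).
supplied_by_context = unconstrained - with_context
```

On these assumptions, roughly two thirds of the work of identifying each word comes from grammar and context rather than from the sound itself, which is the part that is so hard to encode in a computer system.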

Bob Salmon wrote:

Sorry to continue the rant, but this was my day job in a former existence. The speech interface that I haven’t used (if it exists at all) but I’d like to is the one-button phone. I work in a fairly large office with a standard phone system. It normally works OK until I want to transfer a call to someone or conference them in. There is a set of 8 identical small buttons up the side of the phone number pad, with a printed legend alongside that often falls off. On top of that there are about 150 people in my office, each with an extension number. We have the phone book on the intranet, which I’m not usually looking at before I make a phone call.

What I’d like is to be on the phone to someone and then press a “Wake up phone!” button and speak the instruction “transfer this to Neil”. It would recognise the command and work out Neil’s phone number. I don’t need to know that it’s small identical button 3 (or, worse still, on an analogue phone system something like *34#*).

Maybe my children will use this kind of thing if they end up in an office and then think our current junk quaint and old fashioned like starting handles on cars.

Reed H wrote:

(Off topic, but in response to Bob’s post…. how many cold cold mornings with a now-useless battery, have I wished that my car had a starting handle/crank!)

Scott M. wrote:

The best voice-driven interfaces I’ve used are:

- Amtrak (1-800-USA-RAIL): “Julie” is extremely helpful, and knows when to transfer you to a real person.

- 511 (works within the Bay Area only): This transit information system is fantastic. Depending on what you ask for, it may help you itself or transfer you to the appropriate transit agency.

Jeff K wrote:

^^ I agree. Julie is able to process voice-activated credit card payments for train tickets.

andrew r wrote:

I have used voice interfaces and I stutter so this use of software is not my favorite. But, it does have value and hopefully will get better for all users that do or do not stutter.

Ken wrote:

As far as I’m concerned none of it matters because voice activated systems are totally annoying. It was bad enough having to go through selecting numbers on the keys but I can understand these are huge systems and they need to automate things to some extent. But I really can’t stand some recording of some dope trying to act chipper for me when all I want to know is how much money is in my friggin bank account, or why my cable isn’t working. Very often I’ll have a question that is not covered under the selections and I have to sit there saying ‘let me talk to a person’ about five times before it gives up and transfers me. Often I wind up in service when I want accounts, and it’s ridiculous that I have to have this aggravation often when I’m actually calling to find out how I can spend money with some company.

Daniel wrote:

Jeeeenny I’ve got your number..

I’ve not really ever had any problems with voice-driven systems. I suppose I speak clearly enough for them to get what I’m trying to say.. I’ve also never had one ask me for my pin number.. (the price of a cheese pizza and a large soda back where I used to work) :)

Walter Rolandi, Ph.D. wrote:

I have to echo Ken’s sentiments completely.

As a human factors consultant specializing in voice user interfaces, I constantly encounter poorly designed and annoying animated speech applications. In fact, I maintain that the entire “persona” movement in the speech application space has seriously inhibited the emergence of more usable speech systems. Ignoring user needs and basic ease of use, designers have instead pursued entertainment variables and marketing considerations.


OK/Cancel is a comic strip collaboration co-written and co-illustrated by Kevin Cheng and Tom Chi. Our subject matter focuses on interfaces, good and bad and the people behind the industry of building interfaces - usability specialists, interaction designers, human-computer interaction (HCI) experts, industrial designers, etc.