Just as the prospect of speech-driven interfaces conjures the idea of a maddeningly cacophonous workplace, the idea of full-arm gestural interfaces (à la Minority Report) has me re-envisioning the workplace as an aerobics class. Probably the biggest win for this kind of interface is that it films well.
More seriously though, I am a fan of smaller-scale gestural interfaces. The human hand has far more dexterity and muscle memory than a regular mouse could ever take advantage of. When I play the piano, my fingers can fly through complex passages which I learned 14 years ago (you can see the same virtuosity in users of vi). Beyond muscle memory, fingers and hands also have a reasonable dynamic range in the force they can apply. Here's a rudimentary example of how you might use this:

Imagine a mouse which can sense how hard you are pressing down on it (this could be done a number of ways: a pressure sensor, a spring with a distance sensor, etc.). By virtue of being able to sense how hard you are pressing, the mouse now has a rudimentary sense of depth. This could be used in countless ways, but let's take the simple example of window management in a WIMP interface. By pressing down on the mouse you could reach down and see windows which are behind the one you are looking at; maybe by clicking and holding you could drag one back to the top.
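To make the idea concrete, here's a minimal sketch of how a pressure reading might map onto window-stack depth. Everything here is hypothetical: no real mouse API is assumed, and the normalized pressure value and clamping are my own simplifying choices.

```python
# Sketch: mapping a pressure-sensing mouse's reading to a depth in the
# window stack. All names are hypothetical; no real driver API exists.

def depth_from_pressure(pressure: float, stack_size: int) -> int:
    """Map a normalized pressure reading (0.0-1.0) to a window index.

    0.0 (resting) selects the top window (index 0); full pressure
    reaches the bottom of the stack. Clamping keeps noisy sensor
    values inside the valid range.
    """
    pressure = min(max(pressure, 0.0), 1.0)
    # Scale into [0, stack_size - 1] so each window in the stack gets
    # an equal band of the pressure range.
    return round(pressure * (stack_size - 1))
```

With a five-window stack, a light touch would keep you on the top window, half pressure would reveal the middle of the stack, and a firm press would reach the bottom.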
Anyway, I'm sure you can think of more examples, so why haven't gestural interfaces caught on? Well, first off, not everyone likes the idea of richer input devices. Apple, for one, has staunchly resisted adding a second button to its mouse, preferring that users hold down a key while clicking to get right-click functionality (real simple, huh?). Secondly, making gestural devices and rich interfaces requires considerable cooperation between OS people and hardware manufacturers; it's definitely not something that can be phased in tweak by tweak. Finally, we have been stuck in a WIMP world for a while. The metaphor of a pointer moving on a 2D surface in response to a hand moving on a 2D surface is a powerful one that people have grown accustomed to. What they might not see is how WIMP reintroduces the idea of physical space into a context which doesn't always need it. The mouse wanders like a little bug, place to place, in a continuous path. In its travels it might spawn a couple of commands, drop a cursor, or set focus somewhere. This is great, but a mouse can miss, or mistrack, or double-clicks may not be timed right. To tackle these problems, HCI people end up studying click-target sizes (Fitts's Law, anyone?), tracking accuracy, drag and drop, and so on, to solve motion problems for this little bug. Ultimately, though, it is the desired actions which are important, not the motion of the mouse. So let's break free of thinking about tracking mice better and consider other ways of doing business.
Let's just imagine this for a second: Create two new buttons on your keyboard. Button-A will cycle through applications (just go to next), and Button-B will cycle through different instances of that application. So I press Button-A, and I see focus (a glowy thing, perhaps) move across my taskbar. I press a couple of times, moving focus to my browsers; this brings all the browser windows to the fore. Now I press Button-B until the browser with the article I want is on top. Because the windows I am cycling through come to the top as I press the button, I can make use of my fast visual system to find the article (visually, I can quickly discern between CNN, The Onion, Salon, and BBspot). This might take me, say, 2.5 seconds.
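The two-button cycling logic can be sketched in a few lines. The `Switcher` class below is an invented model, not any real window manager's API; a real implementation would also raise the focused window so the visual search described above can happen.

```python
# Sketch of the Button-A / Button-B cycling logic. The window-manager
# model is invented for illustration; a real WM would raise windows
# on screen as focus moves.

class Switcher:
    def __init__(self, windows):
        # windows: list of (app_name, window_title) pairs
        self.windows = list(windows)
        self.apps = []  # distinct apps, in first-seen order
        for app, _ in windows:
            if app not in self.apps:
                self.apps.append(app)
        self.app_i = 0  # which app has focus
        self.win_i = 0  # which window of that app is on top

    def button_a(self):
        """Cycle focus to the next application; return its name."""
        self.app_i = (self.app_i + 1) % len(self.apps)
        self.win_i = 0
        return self.apps[self.app_i]

    def button_b(self):
        """Cycle to the next window of the focused application."""
        wins = [t for a, t in self.windows if a == self.apps[self.app_i]]
        self.win_i = (self.win_i + 1) % len(wins)
        return wins[self.win_i]
```

Pressing Button-A repeatedly walks the taskbar app by app; once you land on the browser, Button-B flips through its windows one at a time.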
Now that is way faster than using a mouse to displace windows until I find the one buried window I want, and it is likely faster than moving the mouse down to the taskbar and trying to parse the text in that list of browser windows (today, recognition relies on reading the title text of the web page instead of parsing the visuals of the page). You have a similar story on the Mac. Which big auto-magnifying browser icon is it? Um…
Exactly. Now I’ll make no claims that my interface suggestion is incredible, but it is a good example of how problems which people run into 100 times a day can be addressed if we are just willing to rethink how we use our hands.
To close, here's a link to a company (FingerWorks) which is trying harder than most to get gestures into our input lexicon.
“Let’s just imagine this for a second: Create two new buttons on your keyboard. Button-A will cycle through applications (just go to next), and Button-B will cycle through different instances of that application.”
My Mac already does that. Command-Tab cycles through my open apps (OS X uses the dock but LiteSwitch X does it better), and Command-~ cycles through my app windows. On top of that, Panther, the upcoming version of OS X, will offer even more with sweet window management features.
Ah. Definitely good to know. People on Windows might also note that Windows' Alt-Tab equals the Mac's Command-Tab, which equals my "Button-A". So given that this feature is awesome, which we both agree to be the case, why do we still live in a world where designers continue to be so mouse-centric? If something as simple as Button-A can obliterate a lot of useless wrist fumbling, then why not add more goodness?
As designers we still live in a mouse dominated world (since WIMP relies so heavily on it). We still get chastised by Tog for not obeying the Law. The point of the post was more an appeal to start pushing those bounds again (because they are artificial), and not so much a lament over the absence of Buttons A and B.
Great cartoon! Good point of discussion.
My three cents:
- designers are pushing the physical interface; just take a look at what a mouse can now do: http://www.microsoft.com/hardware/mouseandkeyboard/features/tiltwheel.mspx#History
- I agree that the keyboard is a tried and true UI, and a fast one at that, which is probably why Unix/Linux/command-line fanatics shun WIMP. The greater untapped market for physical UI development is everyone else, the lay person, which brings me to:
- in marketing and usability terms I would argue that a faster interface is not the best interface; rather, for the mass market, simplicity and usability reign supreme, which is why moving around a mouse (a true WYSIWYG interface) will always be easier than a few keystrokes
Looking forward to more cartoons!
Yes, you make the important distinction between faster vs. better for users. The topic really deserves an entire essay, but in brief, it’s true that faster is not necessarily better. The continuum goes something like this: (all numbers are approximate)
0-10 ms : very fast. Anything in this range feels instantaneous (though 10ms is about 10 million instructions these days). Any UI optimization to speed things up faster than 10ms goes pretty much unnoticed.
10-100 ms : fast. A lot of apps exist in this range. System feels very responsive and you’d only notice the lag if you were in a situation where you want true realtime (e.g. gaming, playing a software synth, etc).
100ms - 2sec : medium. These can feel a little slow, but not slow enough for you to think about going to do other things while it executes. Plenty of apps are in here too.
2-40sec : crappy. This is a terrible range because it is a long time to wait for an action, but not enough time for you to switch and do something meaningful while you wait. Most of the web is here. Argh.
40sec-5min : Slow… but now a funny thing starts to happen. When tasks get this long, you no longer think of them in terms of responsive/unresponsive. You just fire them off and get back to them sometime later. Examples of this are using "Find" to locate a file in the system, or replicating a big chunk of email. Because you are not waiting for them, they are actually less of a mental hassle than the 2-40sec zone.
5-60min : These tasks are interesting because they are long enough for you to do some meaningful thinking in the time they are running. When I did astrophysical research I would run these batched modelling processes, and in the 20 minutes one took to run, I'd be reviewing and analyzing previous results. Sometimes I think they even made me more productive, because it was like a 20-minute egg timer telling me that I should have finished 20 minutes of work; if it finished before I could review the last results, I felt like I needed to speed up.
60min+ : well you get the idea.
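The continuum above is easy to encode as a lookup. The boundaries and labels below are just the post's approximate numbers, expressed in seconds:

```python
# Rough encoding of the response-time continuum described above.
# Band boundaries are the post's approximate figures, in seconds.

def feel(latency_s: float) -> str:
    """Classify a latency into the post's perceptual bands."""
    bands = [
        (0.010,  "very fast: feels instantaneous"),
        (0.100,  "fast: system feels responsive"),
        (2.0,    "medium: a little slow, but not worth task-switching"),
        (40.0,   "crappy: too long to wait, too short to switch away"),
        (300.0,  "slow: fire it off and come back later"),
        (3600.0, "batch: long enough to do real work meanwhile"),
    ]
    for upper_bound, label in bands:
        if latency_s < upper_bound:
            return label
    return "very long: well, you get the idea"

feel(0.05)  # "fast: system feels responsive"
feel(10.0)  # "crappy: too long to wait, too short to switch away"
```

Note how a typical web page load of the era lands squarely in the worst band, which is the post's complaint.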
Now getting back to the idea of user interfaces, I think ideally you want to gravitate to the good zones of this continuum and stay out of the 2-40sec range. In most cases it makes more sense to gravitate toward faster - thus the logic behind my comments on gestural interfaces. Making things more understandable is also important, but I don’t feel like window management is inherently that understandable. Window management allows me to place windows in extremely precise positions, but really in the 99% case, I don’t care for it to sit in an exact spot. Consequently window “docking” has been created to simplify the task of arranging windows, but even then, things are always in the way.
The reason I like Buttons A and B is that they are like the Channel+ button on a TV remote. I posit that moving to an app is like moving to another channel on the TV. When you move there you want to focus on that one thing only. Thus, having 5 applications up is like having a TV with 5 "picture in picture" views all messily scattered about. To me, the cognitive load generated by visually parsing all that would seem to outweigh the "simpleness" of the window metaphor.
Windows (well, many Windows applications) also does the cycle windows within an application thing, with Ctrl-Tab.
I’m being nit-picky here now and we’re digressing from Tom’s point … but switching in Windows is Alt-Tab and Shift-Alt-Tab. The implementation of which could be vastly improved …
dunno if anyone is still reading this, but on a Mac, hold down Shift while scrolling and it'll scroll the window left and right. FYI.
I think a huge part of gestural interfacing is that it has to become intuitive. The comment about the steering wheel is a huge point. Who wants to switch from one type of pattern-recognition interaction to another? The interface has to become something that learns what the user wants to do. I move differently than someone with MS, and if they can't do the gestures that I can, then they are out of luck. The machine has to compensate for that. Maybe Sony is onto something with their EyeToy. It's pretty basic, but it doesn't matter what gesture you make, as long as it knows where you are going.
I understand that pattern recognition is pretty much impossible to stray from in a commercial environment, but it was just a thought. Great article : ).
Regardless of how app switching is done today it’s clear that it was designed as a power user feature. A standard set of two buttons on all keyboards and all systems would be much clearer than the many obscure key combos that are quoted above.
As to your point, the intuitiveness of gestural interfaces is another huge topic. Looking through the history books one might say that making something intuitive guarantees little. For example, I would say a touch screen is more intuitive than a mouse. But even as that technology has gotten cheaper it has not made any real headway.
On the flip side of that, there are many key combos, like Ctrl-V = paste and Ctrl-Z = undo, which don't really have any grounding in intuition and have done quite well. Though we all love intuitiveness ('it just makes sense!'), there are clearly more factors at play.
As people have said, alt-tab/alt-shift-tab, and ctrl-tab/ctrl-shift-tab do your Button A and Button B things, the first of which is vastly improved in XP when you install the PowerToys, which give you an enhanced alt-tab interface with thumbnails of the application you are switching to in the task selector.
There’s an interesting effect of response times in program development. Now we have nice IDEs, fast machines to run our compilers on etc, the write / compile / fix cycle can be very short. This can encourage hackery - hack it till it works. A colleague of mine worked somewhere at the beginning of his career where having compile time errors more than X times was a sacking offence. This is extreme, but I have often realised I was bypassing my brain when coding, and stuck in Just One More Compile syndrome (see chapter 4 of Code Complete by Steve McConnell).
So, an unresponsive UI may actually make you more productive rather than less, if you measure productivity in bug-free lines of code per week.
OK/Cancel is a comic strip collaboration co-written and co-illustrated by Kevin Cheng and Tom Chi. Our subject matter focuses on interfaces, good and bad, and the people behind the industry of building interfaces: usability specialists, interaction designers, human-computer interaction (HCI) experts, industrial designers, etc.