Designing and Coding for Voice

Summary: An overview of simple interactions for voice command or speech recognition software users, and the accessibility barriers that can emerge. Adapted from an internal Ad Hoc Technology Blog post.

Introduction

One area of accessibility that I find is frequently overlooked is voice accessibility, or speech recognition accessibility. Improving voice support comes up frequently on VA.gov, since it’s something even experienced accessibility specialists may not be testing for or thinking about.

Modern operating systems include basic voice tools bundled with them (Windows, macOS, Android, iOS), with a lot of variation in the quality of experience. Historically, the gold standard for voice interactions has been Dragon by Nuance (formerly Dragon NaturallySpeaking and DragonDictate), which was recently acquired by Microsoft. I’ll also mention Talon as a much newer offering, which combines spoken commands with eye-tracking. (Anecdotally, I’ve heard good things about Talon’s user experience, but I don’t have any first-hand experience.)

Voice accessibility benefits users who don’t use a keyboard or a mouse as their primary input method. They may be unable to use these input devices (permanently, temporarily, or situationally), or they may wish to limit how often they use them, e.g., to avoid repetitive stress injuries. Like many assistive technologies, these interaction patterns have been adopted for other hands-free use cases as well. The natural language processing algorithms behind Dragon also power interactions with Apple’s Siri, and Dragon is widely used by health care providers for transcribing notes.

For the purposes of this blog post, I’ve recorded demos using Dragon. Specific commands may be different (or not supported at all) using other software, but the design and coding principles should be consistent.

Common voice interactions

As a Dragon user browsing the web, I expect to be able to speak a command like “Click [name of element].”

Transcript:

Brian: Wake up
Computer: Microphone icon turns green
Brian: Click Example Button
Computer: Button is clicked, alert opens
Brian: Click OK
Computer: Alert OK button is clicked, alert closes
Brian: Click Ad Hoc
Computer: Link is clicked, frame attempts to navigate to Ad Hoc website but is blocked from loading by CodePen.
Brian: Go to sleep
Computer: Microphone icon turns blue

When multiple clickable elements have the same name, Dragon will highlight all of the matches and prompt me to select the one that I want by number. Note: If you’re using an aria-label to provide unique accessible names for each element for screen reader users, that’s great! But remember that sighted voice users will never see the aria-label, so the elements are still visual duplicates that need to be distinguished for voice users.
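As a sketch of how to distinguish visually duplicate links (the markup and link targets here are illustrative, not taken from VA.gov), the distinguishing words can be part of the visible link text itself rather than hidden in an aria-label:

```html
<!-- Hypothetical example: two cards that would otherwise both read
     "Learn more". Because the distinguishing words are visible, a voice
     user can say "Click learn more about dental benefits" directly,
     and screen reader users get the same unique accessible name. -->
<article>
  <h2>Dental benefits</h2>
  <a href="/dental">Learn more about dental benefits</a>
</article>
<article>
  <h2>Travel pay</h2>
  <a href="/travel">Learn more about travel pay</a>
</article>
```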

I might also have trouble pronouncing link text, or run into trouble with Dragon recognizing my pronunciation. As a fallback, I can speak the command “Click link,” which will highlight all of the links on the page with the same kind of interaction pattern as when there are multiple matches.

Transcript:

Brian: Wake up
Computer: Microphone icon turns green
Brian: Click V.A.
Computer: Annotates each “VA” link with a number
Brian: Choose 2
Computer: Link #2 is clicked, frame navigates to Virginia tourism website
Brian: Press alt left
Computer: Frame navigates back to CodePen example
Brian: Click link
Computer: Annotates each link with a number
Brian: Choose 6
Computer: Link #6 is clicked, frame navigates to Virginia tourism website
Brian: Go to sleep
Computer: Microphone icon turns blue

(Note that “Press alt left” is speaking a command to use a Chrome keyboard shortcut. Dragon plays nice with some browsers when you say “Go back,” but not others. Full-time voice users just have to get used to those kinds of quirks.)

Broken voice interactions

Those basic voice interactions should work for all common UI elements — links, buttons, form fields — provided the product you’re interacting with has been coded accessibly. Things become more challenging when you don’t use semantic HTML and don’t follow accessibility best practices.

In this example, we have a div styled to look like a button, with an onClick event to make it work like a button for mouse users. It even has an aria-label to give it an accessible name. But it doesn’t have the right role, it isn’t keyboard accessible, and its only content is an icon with no visible text.

Dragon will match against the aria-label, so if you know what the aria-label says, you’re good! But the aria-label isn’t exposed visually, so sighted voice users are left to guess what the correct command is.
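The CodePen markup isn’t reproduced here, but a fake button like the one described might look something like this (a hypothetical sketch based on the description above; the class name and handler are made up, not the exact demo code):

```html
<!-- Anti-pattern: a div acting as a button. It has an accessible name
     via aria-label, but no role, no keyboard support, and no visible
     text, so sighted voice users have to guess the magic words. -->
<div class="fake-button" onclick="openCommentForm()" aria-label="Leave a comment">
  <svg aria-hidden="true"><!-- word-bubble icon --></svg>
</div>

<!-- Semantic alternative: a real button with visible text. A voice user
     can say "Click leave a comment" because the name is on screen, and
     "Click button" will annotate it along with the other buttons. -->
<button type="button" onclick="openCommentForm()">
  <svg aria-hidden="true"><!-- word-bubble icon --></svg>
  Leave a comment
</button>
```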

Transcript:

Brian: Wake up
Computer: Microphone icon turns green
Brian: Click “Leave a comment”
Computer: Button is clicked, alert opens
Brian: Click OK
Computer: Alert OK button is clicked, alert closes
Brian: Click icon
Computer: Nothing happens
Brian: Click word bubble
Computer: Nothing happens
Brian: Click button
Computer: All of the semantic buttons are annotated with numbers, but the fake button is not
Brian: Go to sleep
Computer: Microphone icon turns blue

A user interaction based on guessing magic words is, well, not great.

When stuck with a bad interaction, Dragon provides users with some additional fallbacks, like the MouseGrid feature or the excruciatingly slow option to move your mouse by voice. But these add significant friction to the interaction.

Transcript:

Brian: Wake up
Computer: Microphone icon turns green
Brian: MouseGrid
Computer: Screen subdivides into a three-by-three grid, labeled 1 through 9. The fake button is contained by region 4.
Brian: 4
Computer: Region 4 is highlighted and subdivided into a three-by-three grid, labeled 1 through 9. The fake button is contained by region 7.
Brian: 7
Computer: Region 7 is highlighted and subdivided into a three-by-three grid, labeled 1 through 9.
Brian: Click
Computer: Button is clicked, alert opens
Brian: Click OK
Computer: Alert OK button is clicked, alert closes
Brian: (Moves mouse away from the fake button) Move mouse left
Computer: The mouse slowly moves left until it is just above the fake button
Brian: Stop. Move mouse down
Computer: The mouse slowly moves down until it is inside the fake button
Brian: Mouse click
Computer: Button is clicked, alert opens
Brian: Click OK
Computer: Alert OK button is clicked, alert closes
Brian: Go to sleep
Computer: Microphone icon turns blue

Three important notes about these interactions:

  1. They depend on users being familiar with all of the features of Dragon. This is not likely to be true for someone with a recently-acquired disability who’s re-learning how their computer works. “MouseGrid” is a useful tool but not a command you’re likely to discover on your own just through trial and error.
  2. Not all voice command software supports these kinds of interactions, or may implement them differently. The Dragon interaction shown here is arguably a best-case scenario for an inaccessibly-coded button.
  3. If your users have to MouseGrid their way through every interaction, you’re significantly increasing the number of spoken commands required to complete a task.

Real-world example

The VA’s design system is based on the US Web Design System version 1, which is now on version 3. Our Design System Team is working on getting us back into sync with the USWDS, as well as contributing some of the original work happening at the VA back to the USWDS for other agencies to use.

As part of that effort, I was asked to do a review of the USWDS file input component. As currently coded, it’s not voice accessible.

Transcript:

Brian: Wake up
Computer: Microphone icon turns green
Brian: Click choose from folder
Computer: Nothing happens
Brian: Click Drag file here or choose from folder
Computer: Nothing happens
Brian: Click link
Computer: Annotates each link with a number, but “choose from folder” is not annotated
Brian: Click No file selected
Computer: File upload dialog opens
Brian: Click cancel
Computer: File upload dialog closes
Brian: Go to sleep
Computer: Microphone icon turns blue

So what happened here?

First, I tried clicking the element styled as a link: “choose from folder,” and then the whole line of text. But it’s not really a link! Instead, it’s a span styled to look like a link, placed on top of the input using absolute positioning. Besides the CSS positioning, it’s not associated with the input itself in any way. Here’s the markup for the component:

<div class="usa-file-input">
    <div class="usa-file-input__target">
        <div class="usa-file-input__instructions" aria-hidden="true">
            <span class="usa-file-input__drag-text">
                Drag file here or 
            </span>
            <span class="usa-file-input__choose">
                choose from folder
            </span>
        </div>
        <div class="usa-file-input__box"></div>
        <input id="file-input-single" class="usa-file-input__input" type="file" name="file-input-single" aria-live="polite" aria-label="No file selected" data-default-aria-label="No file selected">
    </div>
</div>

The goal of the USWDS team was to provide a visual hint to users for where they should be clicking, but in the process they misled voice users about what the name of the interactive element was.

The correct command to get to the input was “Click No file selected,” because Dragon was able to match the aria-label. But like the fake “Leave a comment” CodePen example, that aria-label isn’t visible to users. If you don’t already know the magic words, you’re stuck.

The USWDS team is incredibly receptive to accessibility feedback and is already working on a fix. I’m sharing this not to shame them, but to highlight that even something that’s pretty darn good for accessibility like the USWDS can have broken interactions for voice users.
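The USWDS team’s actual fix may look different, but one possible approach would be to let the visible instruction text double as the input’s accessible name, for example via aria-labelledby (a sketch only; the `id` added to the instructions is my own invention):

```html
<!-- Sketch: the visible instructions are no longer aria-hidden and now
     name the input via aria-labelledby, so voice commands like
     "Click choose from folder" or "Click drag file here or choose from
     folder" can match what's actually on screen. -->
<div class="usa-file-input">
    <div class="usa-file-input__target">
        <div class="usa-file-input__instructions" id="file-input-instructions">
            <span class="usa-file-input__drag-text">
                Drag file here or
            </span>
            <span class="usa-file-input__choose">
                choose from folder
            </span>
        </div>
        <div class="usa-file-input__box"></div>
        <input id="file-input-single" class="usa-file-input__input" type="file" name="file-input-single" aria-labelledby="file-input-instructions">
    </div>
</div>
```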

Some takeaways

As you build the things that you build, here are some things to remember for voice users:

  • Don’t make your users guess the magic words. Make sure each interactive element has on-screen text that is visually and programmatically associated with the element.
    • If you have both visible text and an aria-label on an element to give it a unique accessible name, make sure they have consistent words and a consistent grammatical structure. You’re more likely to get a match between the spoken command and the element name.
  • Maintain material honesty. If I’m having trouble getting something to work and it looks like a link, the next thing I’m going to try is “Click link.” If it’s a button that looks like a link, or a span that looks like a link, etc., my next choice for interacting with the element won’t work.
  • It may not be practical for you to test with Dragon or your computer’s voice tools. But before you deploy your code, say the buttons, links, and form labels out loud. Are there impossible-to-pronounce acronyms or unintentional tongue twisters? If it’s hard for you to say, it’s probably going to be a bad voice interaction.
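As a quick illustration of matching visible text and accessible names (a made-up example; the button labels are hypothetical):

```html
<!-- Risky: the visible text and aria-label diverge. A voice user who
     says "Click download" may not get a match, because the accessible
     name doesn't start with the words they can see. -->
<button aria-label="Get the 2023 annual report">Download</button>

<!-- Safer: the accessible name begins with the visible label, so
     "Click download" still matches while the name stays unique.
     This also satisfies WCAG 2.5.3 (Label in Name). -->
<button aria-label="Download 2023 annual report">Download</button>
```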