Can You Hear Me Now?

Posted by Ace on May 31st, 2009 filed in letters from Ace

While it won’t seem any different to anyone reading this, I dictated this entire journal entry rather than typing it using the speech recognition software in Windows Vista. As a casual fan of the original Battlestar Galactica series, where Commander Adama spoke into that thin boom mike and watched his words scroll across the CRT screen, I have to say it rocks pretty hard.

Was it faster than typing it by hand? Not even close. And it’s made about a million mistakes. But I’m willing to work with it and see what it can do, if only because it’s one of the many tantalizing promises made to us by science fiction and fantasy that it seems like we can finally fulfill to some degree.

Here’s to yesterday’s future!


7 Responses to “Can You Hear Me Now?”

  1. Ace Says:

    I am not able, however, to use it to call up Firefox, switch to WordPress, log in, paste the text in from WordPad and publish.

    Yet. :)

  2. yoko Says:

    Awesome. The recognition should get better the more you use it, yes?

  3. Neuro Says:

    In terms of dictating being faster than typing, check out this hard-to-locate demo from Dragon NaturallySpeaking:

    http://www.nuance.com/talk/

    (it seems like Nuance is trying to bury this, but I think it is one of the best selling points of their product; I guess they just bought the Dragon company?)

    I think that I am actually getting faster results by dictating into Windows than typing right now, but maybe you type at 100 words per minute?

  4. Ace Says:

    After some more experimenting, my initial flush of enthusiasm has given way to moderate disillusionment. There are a number of reasons why:

    1] Currently, I would estimate that the program has an accuracy rate of around 60% when transcribing my voice. Yes, Yoko, it is supposed to improve the more I work with it. But so far I haven’t seen much improvement (and frankly, a higher target percentage to start with would’ve been nice.)

    2] The list of things that can cause the transcription process to go other than as intended is not limited to the program misinterpreting what I’m saying. It also includes the computer failing to understand my speech or respond at all. And the computer identifying background noises such as the air-conditioner, the fan, the budgies chirping, or my breathing, coughing or sneezing as commands or speech (it particularly likes to interpret stray noises as the word “its”; if I do nothing and just look at the screen for the length of time it takes to read what I’ve dictated, it invariably tacks on some “its”-es for good measure). And my screwing up the command structure, causing the computer to interpret commands as dictation, or vice versa.

    3] Very little that I want to write takes the form of a straight-forward dictation. Or perhaps more accurately, I rarely know ahead of time precisely what I want to say. Instead I type and think and delete and revise and debate and struggle and type some more and think some more, and generally do just about anything but spin out a first draft in a single go, then go back and expertly hammer it into final form with a few swift revisions. I will grant that this is not terribly efficient. The writing teacher’s admonition not to get caught up in editing while you’re trying to nail down your first draft predates typewriters and word processors alike. But whether it’s good or bad, the reality for me is that some part of the clarification process is tied up in all the shuffling around, and working with voice makes it seem like either I have to do a lot of extra work, or that I have to do most of it in my head before I even start, which is crazy. I mean, c’mon: how often do you type a letter the content of which is as direct as, “Bob, I just found this great dictation program. Come on over and I’ll show it to you. Steve”?

    I don’t actually type all that quickly, Neuro– 40-60 wpm, according to a couple of on-line tests I took, and depending on the material– but when you factor in the time lost to all the things listed above on top of the dictation time, the typing still seems faster. To me, anyway.

    (BTW– did you know that the QWERTY keyboard we’re all using was originally designed in the late 1800s in order to slow typists down, because the early typewriters jammed if you struck keys too fast or struck adjacent keys in close succession? Experiments in the 1950s showed that there are far more efficient keyboard arrangements, ones that make it possible to type at double the speed with half the effort– but between the business schools and the typewriter manufacturers (and now the computer manufacturers), QWERTY is so entrenched that no-one wants to hazard the temporary disadvantages of trying to change it!)

    Also: take a closer look at that Dragon Naturally Speaking video, and you might notice a few other things. See any commas in the transcription? Or question marks? Or exclamation points? Any punctuation at all except a period? (Hear him say any sentences that would require them?) See anything in the environment except him and the mike and the screen? Like the network printer in the next cubicle running while he’s trying to talk? Or his co-workers punching him in the head because he keeps saying out loud over and over “Switch to Microsoft Office! Switch to Microsoft Office! Switch to Microsoft Office!”? ;)

  5. Neuro Says:

    OK, just for the heck of it I am going to dictate this response entirely using my windows expertise voice recognition system. I am also timing the amount of time it takes to speak this with my little timer program. I am speaking in a room with a large isolating box and running while I am typing, and so far this evening it has not been much of a problem. You’re mentioning that the birds chirping or random stray noises is causing your system to pick that up as a speech sound suggests that your microphone may be sent to hi. The Mike I am using is a very old and bad one that I bought from my friend and about 1989, but it seems to work fairly well on my laptop. OK, I hope this is a reasonable test, and I am going to now stop and check the time and we’ll be back in one second.

    (Ok, typing now. That took about 2 min to think/speak those 158 words. It got 6 words wrong (not counting capitalization), which I left in. This is ~96% accuracy at ~80wpm, w/ 1 training session, and a big box fan running. I think something is amiss with your setup. Did you configure the mic and do the practice? I am on Win XP, but I bet it is the same basic software on Vista)

    (And I agree that the process of writing is pretty tightly coupled with typing for many of us–see my next attempt).

    (Okay one more try but this time faster….)

    OK, here we go. Not willing to speak as quickly as I can within reason and hope that the program catches most of it. It is difficult as you said to write while speaking or to speak while writing and you can see that I’m getting a bit confused even trying to do it here. There is a strange feeling of racing against one’s own fonts when one knows that the words are being written down as quickly as I am able to generate them out of my mind and into my larynx. I’m sure the computer did not just catch the word larynx appropriately but I mean the voice box of the throat and to what tried out to finish up what I’ve been talking about.

    (Stats: 127 wds, ~1 min, ~6 errors = ~95% accuracy at ~127 wds/min. Interestingly, I said “racing against one’s own thoughts” but I almost prefer the mistake the software made!)
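
The stats for both attempts can be reproduced with a few lines of Python. This is only a minimal sketch, not something from the thread: the function name `dictation_stats` is made up for illustration, it assumes accuracy is simply the share of words transcribed correctly, and the durations are the rounded ones quoted above ("about 2 min" and "~1 min").

```python
def dictation_stats(words, errors, minutes):
    """Return (accuracy_percent, words_per_minute) for one dictation run."""
    # Assumption: accuracy = correctly transcribed words / total words.
    accuracy = 100.0 * (words - errors) / words
    wpm = words / minutes
    return accuracy, wpm


# Figures quoted in the comment above; the durations are approximate.
first = dictation_stats(words=158, errors=6, minutes=2)
second = dictation_stats(words=127, errors=6, minutes=1)

print(f"first:  {first[0]:.1f}% at {first[1]:.0f} wpm")    # first:  96.2% at 79 wpm
print(f"second: {second[0]:.1f}% at {second[1]:.0f} wpm")  # second: 95.3% at 127 wpm
```

Both results line up with the rounded figures in the comment (~96% at ~80 wpm, ~95% at ~127 wpm). Note that this counts only dictation time, not any time spent correcting mistakes afterwards.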

  6. Church Says:

    :)

    (/me watches the discussion from the sidelines.)

  7. Ace Says:

    Nice job with the quick reality check, Neuro. In that spirit, I experimented a little myself, and determined the following:

    1] I cannot directly test whether or not the recording level of the mike is too high, because there is no way I can control the recording level except by running Set Up Microphone under Speech Configuration options. I tried lowering it manually through the Properties and Levels tabs associated with the headset mike under Sounds in the Control Panel. The SR program ignored those settings. I verified it was ignoring them by returning to Control Panel> Sounds, reducing the recording level of the headset to zero and unchecking the boxes allowing programs to “exclusively use” the microphone, then reactivating the SR program. I also tried muting it with the mute button and then reactivating. In each case, the SR program overrode the settings, turned on the mike and returned the recording level to exactly what it was before. “Advanced Speech Settings” in the Speech Configuration panel is similarly useless, as it either sends you right back to Control Panel> Sounds, or right back to Set Up Microphone!

    2] As an indirect test, I ran Set Up Microphone under Speech Configuration options, and screamed the test sentence into the mike, trying to force it to turn the recording level down to compensate. The desired effect occurred, and the change in recording level persisted. It did seem to mitigate the “its” problem somewhat, but it did not totally eliminate it.

    3] The microphone, even at a recording level of 0 (which seems to be a misnomer, as it is apparently still recording sound when set there), is picking up the budgies. If I activate the Set Up Microphone function, hold my breath and sit there silently watching the VU meter, I can observe it spiking in time with them as they chirp– with the spikes filling from 20-40% of the meter! Apparently there are just certain noises they make that are exactly the right pitch or frequency to cut through all other extant noise, including my speech!

    4] I cannot verify, however, that the budgies are responsible for the “its” phenomenon; some of the “its”-es appear in time with the chirps/spikes, while others seem to appear spontaneously, without any spikes.

    Frustrating.