The Contenders

When I first made the decision to try out speech recognition, there was an overwhelming favorite on the market: Dragon NaturallySpeaking. I had never used it before, but I'd heard about it and it was generally well-regarded. I picked up a copy of Dragon NaturallySpeaking 8.0 Preferred and commenced using it. The training process took about 20 minutes, another 20 or 30 minutes was spent scanning my documents for words and speech patterns, and then it was basically done and I was ready to start dictating. I've now been using Dragon NaturallySpeaking for several months, and during that time training has further improved the accuracy.

Dragon isn't a particularly cheap piece of software, but when you consider the versatility it offers and the fact that I've already spent about $700 on a desk, chair, and keyboard in an attempt to make an "ergonomic workspace," spending another $100-$200 is hardly a concern. The $100 Standard version has reduced functionality, though apparently the only major difference is that it lacks the ability to transcribe recordings. For home and personal use, you can get a discount on the Preferred version and buy it for $160. Unless that extra $60 is really important to you, I would recommend going with the Preferred version -- you never know when the ability to transcribe a recording will come in handy.

Of course, Microsoft Office 2003 also has built-in speech recognition. I had never heard anyone really talk about it, and I had never tried it myself, but having become familiar with Dragon NaturallySpeaking I figured it was only fair that I give Microsoft's product a shot. After all, practically every business in the world has a copy of Microsoft Office 2003 installed, so perhaps there isn't even a need to go out and purchase separate speech recognition software. One other item that may be of interest is how much processing time each product needs. Speech recognition may or may not benefit from dual core processors, but there's only one way to find out.

I conducted testing on several systems, but eventually settled on using one for the actual benchmarking. If there's interest, I can go back and look at performance on other systems, but for the most part I have found that modern Pentium/Athlon systems are sufficient - with a few exceptions that I'll get to in a moment.


Test System:
AMD Athlon X2 3800+ @ 2.60 GHz (10x260HTT)
2x1024MB Patriot SBLK @ DDR-433 (CPU/12)
Western Digital 250GB 16MB SATA-2 HDD


I began using Dragon NaturallySpeaking on a single core Athlon 64 3200+ socket 754 Newcastle (@2.42 GHz) -- my old primary system, which I have been using for about 18 months. I finally broke down recently and decided it was time to move on to a dual core setup for my main system. Both systems are of course overclocked, because that's the type of user I am. Since this is a look at a software technology as opposed to a hardware article, the system clock speed isn't particularly relevant except as a guideline of what level of performance you can expect.

The major reason for the upgrade is gaming - the old AGP 6800GT wasn't cutting it anymore, and the only reasonable upgrade required PCI Express. (That should tell you something about the amount of processing power most business tasks require - the 754 platform is still more than sufficient for most people!) I figured since I was already switching to socket 939, there was no reason not to add a second processor core. That extra core does help out when I'm trying to do multiple things at once, and Dragon does tend to consume a decent amount of resources. MMO gamers might find it useful as a way of chatting without having to type (and it might just cut down on the use of annoying abbreviations if more people did it, but I digress...). When I'm only dictating, though, I don't really notice the difference between my old system and my new system as far as speech recognition is concerned.

So how do you test and benchmark speech recognition packages? The more real world a test is the better, and what could be more real world than an article written for our web site? How about this very article? I'm going to take the first two pages of the article in their present form (minus the Isaac Asimov quote and potentially some later edits) and dictate the text into a sound file. All punctuation will be dictated, and I will edit the final sound file to remove any speech errors. The final sound file will be played back for both speech recognition packages, and with 1181 words of text we can come up with an accuracy rating.
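As a rough illustration of how such an accuracy rating could be computed, here is a minimal Python sketch. It assumes the score is one minus the word-level edit distance (substitutions, insertions, and deletions) divided by the number of words in the original text. The article does not spell out its scoring rules, so the function names and the choice to ignore punctuation and capitalization are assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation.

    Assumption: punctuation and capitalization are not scored.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def word_errors(reference, hypothesis):
    """Word-level edit distance between the original text and the transcript."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    # prev[j] holds the distance between the reference words seen so far
    # and the first j transcript words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def accuracy(reference, hypothesis):
    """Fraction of reference words recognized correctly (hypothetical metric)."""
    ref_len = len(tokenize(reference))
    if ref_len == 0:
        raise ValueError("reference text is empty")
    return 1.0 - word_errors(reference, hypothesis) / ref_len
```

With 1181 words in the reference text, each misrecognized word costs roughly 0.085 percentage points of accuracy under this scheme.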

This first sound file is basically my "dictation voice". There are two elements to training a speech recognition program: first, it learns to recognize your voice; second, you learn to adapt your voice to improve accuracy. After creating this first sound file, I realized that my voice didn't sound very normal to me. I'm okay with that, but I decided a second sound file was needed to stress test the software packages. I read the text a second time for this sound file, with a few minor updates to the text, but this time I spoke in a more natural voice and I didn't go back to correct any errors. I won't count any of my errors against the accuracy score, but this will hopefully provide additional insight into how these two voice recognition packages perform.


38 Comments


  • Googer - Saturday, April 22, 2006

    Speech recognition in the BMW 7 Series is about 50-75% accurate (my guess), and some users have more luck with it than others.
  • Googer - Friday, April 21, 2006

    I think you should re-benchmark these on a system that is not overclocked. Overclocking may have contributed to erroneous test results. It is possible that some of the benchmarks could have been better on a normal system. Also, I am surprised this was not tested on an Intel system. Perhaps one of the programs may benefit from the NetBurst architecture, with or without dual core.

    Also, I would love to download the Dictation and Normal Voice WAV files so I can understand the difference between them. Thanks for the article; it came at the perfect time. Someone who is handicapped was asking me about this last night.
  • JarredWalton - Friday, April 21, 2006

    I'll see about putting up some MP3s of the wave files -- of course, that will open the door for all of you to make fun of how I speak. LOL

    In case this wasn't entirely clear in the article, this was all done on the system that I use every day for work. It's overclocked, and it's been that way for six months. I run stress tests (Folding at Home -- on both cores) all the time. I would be very surprised if the overclock has done anything to affect accuracy, especially considering that I did run some tests on a couple of other systems that were not overclocked. I removed those results from the article because they would have taken more time to write up and didn't give me any new information.

    It's pretty obvious that neither of these algorithms benefits from multiple processing cores -- HyperThreading, dual core, SMP, whatever. I also wasn't sure how much interest there would be from people in this topic, but if a lot of people want to know how this runs on Intel systems I could go back and look at one. One thing worth noting is that SysMark 2004 does include Dragon NaturallySpeaking version 6.5 as one of the tests. Of course, the results are buried in the composite scores.
  • JarredWalton - Friday, April 21, 2006

    MP3 links available:

    http://www.anandtech.com/multimedia/showdoc.aspx?i...

    Note that Dragon NaturallySpeaking (DNS) only uses WAV files (AFAICT), but uploading 45MB WAV files seems pointless. Convert the MP3s back to WAV if you want to try them with Dragon.
  • Googer - Saturday, April 22, 2006

    Excellent job on the dictation/WAV files. You are a very good reader and have a nice, clear, and concise voice. ;ThumbsUP)
  • stelleg151 - Friday, April 21, 2006

    Cool article. I hope that voice recognition continues to improve, for I think it could be incredibly useful for areas like HTPC or, as you said, messaging while doing other things (gaming).
  • Zerhyn - Friday, April 21, 2006

    Have you ever tried out speech recognition and been underwhelmed? To you yearn to play the role of Scotty and call out..

    ?
  • PrinceGaz - Friday, April 21, 2006

    Yes, that was the first thing I noticed before I even started reading the article. Maybe they used speech-recognition software to enter that.

    I think they should have an editor (or at least let another contributor read what others have written) who has to approve an article before it goes live as the current number of tyops is unforgiveable ;)
  • JarredWalton - Friday, April 21, 2006

    I'm doing my best to catch typos before anything goes live, but after being up all night trying to finish off this article, I went to post and realized I didn't have a title or intro. So, I put one in using Dragon, but my diction goes to pot when I'm tired, as does my eyesight and proofing ability. One typo in a 44-word intro (I didn't proof/edit it at all) isn't too bad for the software. Bad for me? Maybe, but mistakes do happpen. :)
  • johnsonx - Friday, April 21, 2006

    One nice thing about Dragon, despite the high CPU utilization shown in the article, is that it will run quite happily on very lowly systems. I have a customer who uses it all day long on Pentium III-850s with only 512MB RAM (the max for those particular systems). The heaviest user there recently upgraded to a low-end Sempron64 with a gig of RAM, and he says the overall system is far more responsive (of course), but Dragon's operation isn't radically better; it worked great on the PIII, and works great now.
