Tuesday, March 29, 2016

A review of Google's voice typing feature

It was only yesterday that I first saw that Google Docs now has a voice typing feature. After doing some research, I found that the feature has apparently been available since September. I've been playing with it for several hours now, and this is what I think.


Let me add a little background. I do a lot of typing, so I've tried a lot of audio transcription programs: Dragon Naturally Speaking, the built-in transcription solutions in Windows 7 and Windows 8, and a few cloud-based ones. They all seem to be surprisingly just on the other side of useful for me. In fact, I abandoned the idea of using voice transcription software sometime last year, as quickly as I had picked it up. I did end up using voice recording for writing, then transcribing it myself, manually. While I feel like that might have produced a better end product, giving me more options to edit as I entered the text, it took a lot more time than just typing out whatever I was working on from the get-go.


And just a tiny little digression, when did it start being called Google Docs again? Remember when they changed the name to Drive? I was pretty sure the Google Docs Android app's name was changed to Drive too, but now it looks like there are two different apps, one for Docs and one for Drive.

So, I've used it for a few hours now. What do I think? Like any other audio transcription engine, Google's voice typing is pretty annoying. When it's wrong, it's annoyingly wrong. One example of this annoyance is the word "useful". Even though Google's voice typing feature entered the word "useful", it's currently underlined in red. That usually denotes a spelling or grammar error. I right-click on the word, and Docs suggests that "useful" needs another "L" at the end, to become "usefull". Very strange, especially because "useful" was used many other times in this paragraph alone and no other instance has that suggested correction.


I had fairly high hopes for Google's transcription service, for two reasons. The first is that its grammar checking is pretty spot-on. The second is that the voice recognition in Android is pretty good too. In truth, Google Docs' voice typing feature seems to be incredibly smart, perhaps too smart for its own good sometimes. For instance, if I end a sentence with the phrase "you know?", it will try to understand why the sentence is a question, sometimes making some very weird changes. In one instance it changed "sorry" to "Sony", as if that would somehow make the sentence make more sense as a question.


To test speed, I dictated something from my phone's bookmarks, Status 451's entry on "Budgeting for Millennials". Here's what it came up with:
Personal finance is hard. Most of the people I know struggle with it. It's one of the most important skills of Modern Life, and yet there aren't very many resources to help you learn it. This is one of them.


I like Simple Solutions. Simple Solutions are real of the best, but they're simple. Simplicity makes them easier to understand, easier to act on Simple Solutions are good in the blood Feud with the perfect.

I feel like I read this aloud fairly fast, as fast as I could while still being understood. The actual words chosen are almost all dead-on. But notice all the strange capitalizations: "Simple Solutions", "Modern Life". Are these brand names that I'm not aware of? Is there any reason why Google feels like these phrases should be capitalized? But the words are right, and that's the biggest deal. When proofreading, it's much easier to change an uppercase letter to a lowercase letter than it is to work out what you actually meant when your voice recognition software wrote "banana dollars trenchcoat diaper". That's not to say that the software doesn't get any words wrong. It actually got the word "wrote" wrong. I said "voice recognition software wrote", and it typed in "voice recognition software throat". And then as I was entering all that text, it made another typo. It's by no means perfect, shocking as that may be.


It also does something that a lot of other voice recognition programs do, which is highlight words that it thinks it might have gotten wrong. Instead of the red underline it uses for spelling or grammatical errors, it underlines these words with a faint dotted line. Right-clicking on such a word gives you a few other suggestions. I haven't had any typos so off base that I couldn't figure out what I meant to say, but I am curious to see whether those suggestions will actually be able to jog my memory about what I thought I said when that eventually happens.

I will also point out, however, that there were a few times when it made choices about how to interpret my speech without flagging them. For instance, my use of the phrase "one for" above was interpreted as the number 1 and the number 4, which I'll go into later. There was no highlighting there that might have let me choose any of the other options. And I know there were other options, because I had used "one for" twice in that sentence and it interpreted it in a completely different way each time, as you'll see near the bottom of this post.


When you are using the voice typing feature, it puts symbols like dots and colons in your current line, a placeholder to let you know that it is thinking about what words you used. It actually works really well, because if you don't use the microphone for long enough it'll shut itself off, and when you start talking again the missing symbols tell you immediately that it's not listening. It's a very clever design. I've had that problem with other software turning off and not letting me know about it.


After using the software for a few hours, I realized that it definitely works better if you speak in whole sentences. Maybe this is something that I've done wrong in other software as well, but it seems that if I pause every few words to check its work, it makes more mistakes. The reason I pause after every few words is to wait for it to finish thinking and see what words it enters into the text field. The problem is that some mistakes don't get corrected until the very end of the sentence. Sometimes, even after I finish a sentence and start on the next one, it turns the whole previous sentence into those weird colon-like "thinking" symbols, and when it decides on a new phrasing it is usually more correct than it was before.


Instead of making you all suffer through my further impressions in paragraph form, I will go ahead and put them in a bulleted list: (it read "further" here as "brother" originally, but by the end of the sentence it had replaced it on its own with "further", which is in fact what I said/meant.)
  • As you can guess because it's Google, no swearing is allowed. "Fucking" is "F******" as spelled by the voice keyboard. There's a worse word that it won't even present that way. Can you guess what it is? I'll give you a clue: Docs tried to substitute the word "account". This strange prudery is somehow creepier to me than the data mining I know Google is currently performing on all my documents. There's no option to enable swearing that I could find, and to be honest, enabling swearing on its Android keyboard still results in the keyboard doing everything possible to not type those words.
  • Capitalization seems to be the bane of Docs' existence. I think I mentioned in my Docs proofreading review that it doesn't flag uncapitalized proper names as incorrect. Starting fresh on a new line sometimes doesn't result in a capitalized first word. Sometimes resuming in the middle of a sentence does result in a capitalized new word. And as I said above, random words and phrases are sometimes capitalized for no discernible reason. That happened for a lot more words than I'll mention here: "Fairly", for instance, and "Behavior".
  • I said "comma" and it added a comma (","), as I wanted it to.  Then it quickly changed it to the word "comma" for some reason.  This happens a lot.  When I'm not writing about transcription services, I rarely if ever use the word "comma".
  • It seems to struggle, or at least pause, when I give it commands such as "select this phrase" or "delete that word". It's clearly trying to get context: is he still dictating text, or is that a command? Why don't we make up a nonsense word and use that to signal voice commands? Even when saying "delete that" does work (sometimes it just types the words "delete that"), it takes too long.
  • Above when I said there were two Android apps, "one for Docs and one for Drive" I got this: "14 docs and 1/4 Drive."  Usually when I say Docs or Drive without saying "Google" in front of them they're lowercase, another inconsistency.
  • While the voice typing system does include some basic punctuation, it cannot insert quotation marks or parentheses. Parentheses I can understand, but quotation marks are a pretty big deal in my world. There's no way to insert spaces, but it does insert a hyphen when asked. It inserted an actual colon once, and inserted the word "colon" every other time I tried.
  • For a while I had to use OCR programs at work to scan in documents, and I got really frustrated with them trying to turn quotation marks into superscript ones or twos. I was shocked that professional OCR software did not give the option to exclude superscript entirely from the working set of characters, or to delete certain characters from the set. In the same way, I wish that I could tell Google Docs to stop inserting the word "comma". Another word it keeps trying to insert is "Kama". I do not know what "Kama" is and I have never used it before, so it's a little frustrating when I keep seeing it pop up in my bodies of text.
  • I will also note that I added a two-word name, a screen name, to my personal dictionary, and after a little more playing the voice typing system recognized it pretty well. It wasn't always capitalized correctly, because it consists of two common words, but it was inserted surprisingly well.
  • Spelling words out works fairly well, but it does have problems with some letters. It constantly wants to hear the letter I as the letter A, even if I do a fairly good job, in my opinion, of spacing the letters out and enunciating each one properly. So "lisp" becomes "lasp", and then after the sentence is finished it thinks again, sometimes turning it into something sillier like "lispd". I have yet to figure out how to get it to do capital letters on voice command.
  • Google may not like my vocabulary; it had problems with "terse" and "interject". It got "interject" when I tried it a second time, but still doesn't seem to be able to work out "terse". And when it didn't get those words right, it also didn't suggest them in the context menu, so it was really off.
  • I can't just import an audio recording and have the software analyze it that way. That was one of Dragon's features which I liked a lot.
  • It's not available in the Android version of Docs. I tried it in Chrome on Android and was able to enable it in desktop mode, but got strange results that make me think it does some weird things to the microphone channels, perhaps using every microphone on the phone at once (which could be three mics: one on my headset, the primary mic, and a noise-cancelling mic).
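
To make the nonsense-word idea from the list above a bit more concrete, here is a rough Python sketch. It is not how Docs handles commands, and I have no idea what Google does internally. The trigger word "zorp", the function name apply_utterance, and the two commands are all made up for illustration. The point is only that anything not prefixed by the trigger is always treated as dictation, so there is nothing to guess.

```python
# Hypothetical trigger word; any nonsense word unlikely to appear in
# normal dictation would do.
TRIGGER = "zorp"


def apply_utterance(words, utterance):
    """Return a new word list after handling one spoken utterance.

    If the utterance starts with TRIGGER, the rest is treated as a
    command; otherwise the whole utterance is treated as dictated text.
    """
    tokens = utterance.split()
    if not tokens:
        return list(words)
    if tokens[0].lower() != TRIGGER:
        # No trigger word: plain dictation, even if it looks like a command.
        return list(words) + tokens
    command = " ".join(tokens[1:]).lower()
    if command == "delete that":
        # Made-up command: drop the last dictated word.
        return list(words[:-1])
    if command == "new line":
        # Made-up command: insert a line break.
        return list(words) + ["\n"]
    # Unknown command: leave the text alone rather than typing it out.
    return list(words)


doc = []
for spoken in ["please delete that file", "zorp delete that", "tomorrow"]:
    doc = apply_utterance(doc, spoken)
print(" ".join(doc))
```

Here the words "delete that" in the first utterance stay in the text because they weren't prefixed, while the second utterance removes the word "file".
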

So I've used this voice typing feature to type most of this blog post, though I also did a manual editing pass with a physical keyboard. The best thing I can say about this software is that after sinking this amount of time into other solutions I uninstalled them immediately. I was sick and tired of them. Dragon Naturally Speaking is a pretty good piece of software, but I just couldn't get over the few problems I had with it to actually use it productively. I could see myself using Docs for this on a regular basis. It's not perfect, and I'm not sure if this is the sort of thing where the more I use it the better it will get, but it seems to work pretty well.


I can't say that this voice typing feature alleviates the biggest problem with using voice typing: you have to pay too much attention to what's going on on the screen, which distracts you from what you need to say next. It's a very low-trust process. I almost wish it were possible to use speech from an audio file, replaying the audio file and editing the document as I went. I still think that would be faster and easier on the hands than typing everything out manually, and it would take a lot of the trust issues out of the equation.

I have a few other ideas on how to use this software while not having to worry about trusting it so much. If I have any pronounced success with it I may report on it later.


Till then,

David
