Saturday, April 17, 2010

Goodbye PortAudio! Long live QtMultimedia!

For a long time, simon's sound stack was a source of many issues. This was mainly because it relied on PortAudio, which sadly isn't supported very well by the sound configuration of, e.g., Ubuntu, because it interferes with Ubuntu's PulseAudio setup. Long story short: Ubuntu users often had completely unusable simon installations because simon crashed frequently and seemingly at random. Because those crashes happened in PortAudio's code and not in simon's, there was little we could do.

Last week I finally found some time to throw out all the old sound-handling code and replace it with a completely new, QtMultimedia-based system. QtMultimedia is still a very young library and has its own issues, but I expect those to be fixed fairly quickly.

While I was at it, I also implemented a much cleaner way to stream audio to simond. Older versions used Julius' libsent for this because of its voice activity detection. We have now implemented a similar system (configurable, level-based voice activity detection) in simon itself and therefore have complete control over the audio stream. Building on the new implementation, I also added the option to keep recognition samples, complete with their recognition results, on the server. This could, for example, be used to gather training data during normal usage: all you'd need to do is check whether the words were recognized correctly and add them to the model.
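This post doesn't describe how simon's level-based voice activity detection works internally, so the following is only a minimal sketch of the general technique, assuming 16-bit PCM input cut into fixed-size frames. The parameter names `threshold`, `head_frames` and `tail_frames` and their defaults are hypothetical, not simon's actual settings. A frame counts as speech when its peak amplitude reaches the threshold; an utterance starts at the first loud frame (plus a few preceding frames, so soft word onsets aren't cut off) and ends once more than `tail_frames` consecutive quiet frames have been seen.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class VadConfig:
    # Hypothetical parameter names and defaults, for illustration only.
    threshold: int = 2000  # peak amplitude (16-bit PCM) that counts as speech
    head_frames: int = 2   # quiet frames kept before the first loud frame
    tail_frames: int = 3   # quiet frames tolerated before an utterance ends


def split_frames(samples: Sequence[int], frame_size: int) -> List[Sequence[int]]:
    """Cut a flat list of PCM samples into fixed-size frames (the last may be short)."""
    return [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]


def frame_level(frame: Sequence[int]) -> int:
    """Peak absolute amplitude of one frame."""
    return max((abs(s) for s in frame), default=0)


def detect_utterances(frames: Sequence[Sequence[int]], cfg: VadConfig) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, of detected utterances."""
    segments = []
    start = None  # first frame of the current utterance, or None while silent
    quiet = 0     # consecutive quiet frames inside the current utterance
    last_end = 0  # end of the previous segment; head margins never overlap it
    for i, frame in enumerate(frames):
        loud = frame_level(frame) >= cfg.threshold
        if start is None:
            if loud:
                # Speech begins: include a few preceding frames as a head margin.
                start = max(last_end, i - cfg.head_frames)
                quiet = 0
        elif loud:
            quiet = 0
        else:
            quiet += 1
            if quiet > cfg.tail_frames:
                # Too much silence: close the utterance, keeping tail_frames quiet frames.
                segments.append((start, i))
                last_end = i
                start = None
    if start is not None:
        # The input ended in the middle of an utterance.
        segments.append((start, len(frames)))
    return segments
```

Because the loop only looks at one frame at a time, the same state machine could run on a live stream, handing each segment to the recognizer as soon as it closes.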

Because all sound input and output now goes through a central point, I implemented a fairly primitive sound server that correctly handles multiple simultaneous streams. Recordings made while simon is activated now start much faster (because the sound device handle simply stays open) and are, of course, completely stable. You even get automatic pausing and resuming of interrupted streams: if, for example, you start recording one sample and then start recording another one before the first is finished, the first recording pauses until you are done with the second.
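The pausing behaviour described above can be modeled as a stack of clients sharing one input device. Here is a minimal sketch of that idea; `SoundServer`, `SoundClient` and their methods are hypothetical names for illustration, not simon's actual classes. The audio device itself isn't modeled: `on_device_data` stands in for the callback of a device handle that stays open for the server's whole lifetime.

```python
from typing import List, Optional


class SoundClient:
    """A hypothetical recording client that collects the chunks routed to it."""

    def __init__(self, name: str):
        self.name = name
        self.paused = False
        self.received: List[bytes] = []

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> None:
        self.paused = False

    def process(self, chunk: bytes) -> None:
        self.received.append(chunk)


class SoundServer:
    """Routes input from one shared, already open device to the newest client.

    Older clients are paused while a newer one records and resumed when it stops.
    """

    def __init__(self):
        self._clients: List[SoundClient] = []  # stack: the last element is active

    def register(self, client: SoundClient) -> None:
        if self._clients:
            self._clients[-1].pause()  # interrupt whoever was recording
        self._clients.append(client)
        client.resume()

    def deregister(self, client: SoundClient) -> None:
        was_active = bool(self._clients) and self._clients[-1] is client
        self._clients.remove(client)
        if was_active and self._clients:
            self._clients[-1].resume()  # hand the stream back to the previous client

    def active(self) -> Optional[SoundClient]:
        return self._clients[-1] if self._clients else None

    def on_device_data(self, chunk: bytes) -> None:
        # Called for every chunk read from the device; the device is never reopened.
        client = self.active()
        if client is not None:
            client.process(chunk)
```

In this sketch, starting a new recording only pushes a client onto the stack and no device has to be reopened, which mirrors the speed-up described in the post.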

The new implementation also integrates a much better level meter into the recording widget, so you can check your current microphone volume while you record. If the recording starts to clip, simon now automatically displays a warning telling you to re-record the sample.
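A level meter and a clipping warning can both be driven by simple measurements on each buffer of samples. The following is a minimal sketch, assuming signed 16-bit PCM; the clipping threshold of 32700 and the rule of warning only after at least 3 clipped samples are hypothetical values, not taken from simon.

```python
import math
from typing import Sequence

FULL_SCALE = 32767  # largest positive value of signed 16-bit PCM


def peak_level(samples: Sequence[int]) -> float:
    """Peak level as a fraction of full scale, clamped to 0.0..1.0 for the meter."""
    if not samples:
        return 0.0
    return min(1.0, max(abs(s) for s in samples) / FULL_SCALE)


def peak_dbfs(samples: Sequence[int]) -> float:
    """Peak level in dBFS: 0.0 at full scale, -inf for pure silence."""
    level = peak_level(samples)
    return 20 * math.log10(level) if level > 0 else -math.inf


def count_clipped(samples: Sequence[int], threshold: int = 32700) -> int:
    """Number of samples at or beyond the (hypothetical) clipping threshold."""
    return sum(1 for s in samples if abs(s) >= threshold)


def should_warn_clipping(samples: Sequence[int], threshold: int = 32700,
                         min_clipped: int = 3) -> bool:
    """True if enough samples clipped that the user should re-record."""
    return count_clipped(samples, threshold) >= min_clipped
```

Requiring several near-full-scale samples rather than one keeps a single isolated peak from triggering a warning.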

By the way, QtMultimedia also works on, e.g., Symbian devices, so a simond client for mobile phones should now be trivial to build.

All of this has already been merged into the master branch and works very well in my tests. However, like any new code, it might contain bugs, so try it at your own risk :).

Sunday, April 11, 2010

Usability

simon was designed to be as easy to use as possible, yet someone who just downloads and installs it might say that we failed.

To many new users, the concepts behind simon are at first too complicated, and simply getting recognition to work seems needlessly hard.

However, those who stick with it seem to "get" the UI pretty soon, and it proves very powerful for expert users.

This is why one of the goals for 0.3 was to flatten this initial learning curve as much as possible and get simon up and running quickly.

After I released the first alpha of 0.3 about a month ago, I posted a review request to the KDE Usability mailing list asking for ideas on how to improve our interface. I got great feedback (thanks!), and it quickly became clear that it would be best if simon provided an assistant on first start to guide new users through the initial setup.

Some users might remember this concept from 0.1. Back then we had a first-run wizard, but it took ages to complete because it consisted of dozens of pages with quite complicated instructions.

In 0.3, however, with the introduction of scenarios and base models, we designed a new wizard. It consists of just 5 pages (including the welcome and finish pages) and provides short but precise instructions for every step. Users who follow it through are rewarded with a fully functional simon within minutes.



We are still fine-tuning the wizard (updating the descriptions; everything is already fully functional), which is why it still resides in its own branch ("hci") for those of you who want to check it out.

The wizard will be included in the next release.

Thanks again to the KDE usability team for their valuable input!