Thursday, 30 July 2009

Look out - cool stuff coming your way!

Ok, I have way too little time at the moment for simon development, let alone regular blog updates.

However here is a quick overview of the latest updates:

  • simon can now import dictionaries to the active lexicon. While you obviously do not want the whole BOMP or Voxforge dictionary in your active dictionary, it is a small step towards easy export and import of the speech model.

  • The URL to the BOMP has been corrected - they had moved.

  • simon can now import prompts files through the import training data wizard.

  • simon can now be launched through the ksimond context menu.

  • Some phoneme segmentation issues have been fixed.

And finally: A new application has been added to the simon suite: "sam".

sam stands for simon acoustic modeller and is an application targeted towards power users to tweak and test their speech models. Of course sam is nowhere near usable right now but the first lines of code have been written so I thought I should mention it here.

Monday, 20 July 2009

simon 0.3: One Week In

About a week ago, I announced the simon 0.2 stable release. Fueled by this milestone and a lot of positive feedback all around, simon 0.3 development has already started ... and is already showing results!

I'll start small: simon now supports a "Power Training" mode which starts the recording as soon as the text to say is shown. When you proceed to the next page, the recording is automatically stopped and saved, and the next one starts. This simple change really makes training of large texts a lot faster!

Ok, but that alone is not blog worthy, right? Right! One of the most awaited features has made its appearance: Confidence scores.

The recognition server now provides information about how confident it was in the recognition result.

Moreover, it provides simon not only with the most likely result but with the ten most probable ones. simon now ranks them based on the recognition confidence and can ignore any result the recognition was just not sure enough about (with a configurable threshold).
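To make the ranking and threshold idea concrete, here is a minimal Python sketch. It is not simon's actual code: the `RecognitionResult` type, the `filter_results` function, the 0.0 to 1.0 confidence scale and the example sentences are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RecognitionResult:
    # Hypothetical container; simon's real data structures may differ.
    sentence: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def filter_results(results, threshold=0.5):
    """Drop results below the threshold and sort the rest, most confident first."""
    accepted = [r for r in results if r.confidence >= threshold]
    return sorted(accepted, key=lambda r: r.confidence, reverse=True)


# Made-up candidates standing in for the list sent by the recognition server.
candidates = [
    RecognitionResult("open browse", 0.42),
    RecognitionResult("close browser", 0.63),
    RecognitionResult("open browser", 0.91),
]

ranked = filter_results(candidates, threshold=0.6)
print([r.sentence for r in ranked])
```

With a threshold of 0.6 the low-confidence candidate is dropped and the remaining two come back ordered by confidence.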

Now the cool part: If two (or more) results are very likely and simon cannot determine which one you meant, simon will simply display a nice list from which you can select (of course with your voice) what you meant.

This looks like this:

The feature is already quite stable and works well in combination with other plugins. There are of course safeguards in place to prevent recursive "did-you-mean-popups".
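One way such a "did you mean" decision could work is sketched below in Python. This is an illustration under assumptions, not the logic simon actually uses: the function name `candidates_to_offer`, the `margin` parameter and the rule "within a fixed margin of the best score" are all hypothetical.

```python
def candidates_to_offer(results, threshold=0.5, margin=0.1):
    """Decide which sentences are plausible enough to act on or ask about.

    results is a list of (sentence, confidence) pairs.
    Returns an empty list if nothing passes the threshold, a single
    sentence if there is a clear winner, or several sentences if the
    user should be asked to pick one.
    """
    accepted = sorted(
        (r for r in results if r[1] >= threshold),
        key=lambda r: r[1],
        reverse=True,
    )
    if not accepted:
        return []
    best_confidence = accepted[0][1]
    # Keep every candidate that is close enough to the best one.
    return [s for s, c in accepted if best_confidence - c <= margin]


close_call = [("open browser", 0.80), ("open browse", 0.75), ("close browser", 0.55)]
clear_winner = [("open browser", 0.90), ("close browser", 0.60)]

print(candidates_to_offer(close_call))    # two entries: show the selection list
print(candidates_to_offer(clear_winner))  # one entry: just execute it
```

A list with more than one entry would trigger the selection popup; a single entry can be executed directly.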

Of course the confidence scores of the results are also relayed to the plugins, and if they want to, they can even retrieve the whole list of recognition results including the phonetic transcription of each result. This brings even more flexibility to plugin developers without making plugin development more complicated (the base classes have appropriate implementations that you don't need to override if you don't want the additional information).
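The pattern of "base class with a sensible default that plugins may override" can be sketched as follows. This is Python rather than simon's real plugin API; the class names, method names, dictionary keys and the phoneme string are all invented for illustration.

```python
class CommandPlugin:
    """Hypothetical base class, not simon's real interface."""

    def trigger(self, sentence):
        raise NotImplementedError

    def trigger_with_results(self, results):
        # Default: ignore the extra information and forward only the
        # best sentence. results is assumed to be sorted, best first.
        return self.trigger(results[0]["sentence"])


class LauncherPlugin(CommandPlugin):
    # Only cares about the recognized text; inherits the default above.
    def trigger(self, sentence):
        return f"launch: {sentence}"


class LoggingPlugin(CommandPlugin):
    # Opts in to the full result list, including the transcription.
    def trigger(self, sentence):
        return f"log: {sentence}"

    def trigger_with_results(self, results):
        best = results[0]
        return (
            f"log: {best['sentence']} [{best['phonemes']}] "
            f"({best['confidence']:.2f})"
        )


# Made-up result list; the phoneme string is purely illustrative.
results = [
    {"sentence": "open browser", "phonemes": "ow p ah n", "confidence": 0.91},
    {"sentence": "open browse", "phonemes": "ow p ah n", "confidence": 0.42},
]

print(LauncherPlugin().trigger_with_results(results))
print(LoggingPlugin().trigger_with_results(results))
```

The simple plugin never sees the extra data, while the second one gets everything by overriding a single method.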

If you are running an svn snapshot and are upgrading: You will need to manually copy the file `kde4-config --prefix`/share/apps/simond/default.jconf to ~/.kde/share/apps/simond/models//active/julius.jconf (overwriting the old one), as simon(d) will not do that automatically.
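If you prefer to script that copy, a small Python helper could look like the sketch below. The function name `copy_default_jconf` is made up, and both directories are passed in explicitly: the KDE prefix is whatever `kde4-config --prefix` prints on your system, and the active model directory is the one from the path above.

```python
import shutil
from pathlib import Path


def copy_default_jconf(kde_prefix, active_model_dir):
    """Copy simond's default.jconf over the julius.jconf of the active model.

    kde_prefix:       output of `kde4-config --prefix`
    active_model_dir: the "active" directory of your simond model
    """
    source = Path(kde_prefix) / "share" / "apps" / "simond" / "default.jconf"
    target = Path(active_model_dir) / "julius.jconf"
    shutil.copyfile(source, target)  # silently overwrites the old file
    return target
```

Call it once with your two directories after upgrading; it returns the path of the file it wrote.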

Friday, 10 July 2009

simon 0.2 released

Almost three years after the start of the development, the first stable version of the open source speech recognition suite simon has finally been released: simon 0.2 is ready for download.

With simon you can control your computer with your voice. You can open programs, URLs, type configurable text snippets, simulate shortcuts, control the mouse and more.

Because of simon's architecture, it is not bound to a specific language and can be used with any dialect. It is also specifically designed to handle speech impairments, which makes simon a viable alternative to conventional input methods for physically disabled people.

simon 0.2 is based on the open source Julius speech recognition engine and the HTK (which, due to licensing restrictions, has to be installed separately).

In comparison to the 0.1 series that never made it past alpha quality, simon 0.2 does not only bring stability improvements.

simon 0.2 is now based on KDE 4 and thus perfectly integrates in every KDE setup. This move also brings KIO to simon which allows for network transparency, transparent compression and more.

The separate Juliusd application has been discontinued and replaced by the much more advanced simond, which features network audio streaming, centralized model management with automatic backups and more. simond is a command line application, which makes it easy to set up a central simon server without the heavy X dependencies. For users of graphical environments the front-end ksimond has been introduced.

Moreover, the command architecture has been completely overhauled and now uses a much more flexible plugin architecture and supports individual triggers per plugin. New plugins include the list plugin (which can be used to display options), the composite plugin (similar to "macros"), a number input plugin and an artificial intelligence. Combined with the improved commands of previous simon versions this makes a total of 10 command plugins out of the box!

The import of the shadow dictionary now also supports PLS and SPHINX dictionaries which opens the door for dictionaries like the German GPL dictionary from Voxforge.

Because of the growing user base simon has been translated to English, German and French and also partly to Spanish, Dutch and Czech.

simon 0.2 is also the first version of simon ever to ship complete with an extensive user manual - available in English and German.

Besides the source package, the release is also available in convenient binary packages for 32-bit and 64-bit users of both GNU/Linux (Ubuntu and OpenSUSE) and Microsoft Windows, and can be downloaded from the SourceForge project page.

Thursday, 9 July 2009

Two Final Issues

The last round of testing of the simon 0.2 codebase turned up only two bugs.

The first one is quite annoying in that it essentially limits simon functionality. The HTK does not like words that start with the character "'". That makes "words" like "'em" (short version of "them") fail during the model compilation with a confusing error message.

As I really don't want to mess with the wordlist code (we would have to escape special characters under certain conditions) so late in the development process, I delayed that fix for the 0.3 series. In the meantime, just stay away from apostrophes at the beginning of a word, please. Words like "that's" are no problem, though (as the "'" is not at the beginning of the word).
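Until the fix lands, you can check a word list for offenders yourself. The Python snippet below is only a sketch of such a check; the function name and the sample words are made up, and it does nothing more than look for a leading apostrophe.

```python
def words_with_leading_apostrophe(words):
    """Return the words that start with "'" and would trip up the model compilation."""
    return [word for word in words if word.startswith("'")]


sample_words = ["'em", "that's", "them", "'til"]
print(words_with_leading_apostrophe(sample_words))
```

Anything this reports should be removed or respelled before compiling the model; words with the apostrophe elsewhere, like "that's", pass the check.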

The second bug was a rather strange one: Some people reported that over time, the recognition became slower and slower for them. All of the users that reported that bug were using Windows. During testing, I found out that using the pseudo device called "SoundMapper" (or similar) caused this - when using the hardware device everything was working. So if you experience this issue, please check that you use the appropriate hardware device instead of meta-devices.

For users that don't read the blog, I added entries for both problems in the troubleshooting guide on our wiki.

And yes, I know that these are hardly the last two bugs in the 0.2 code - but they are the last to be fixed before the stable release which makes them kinda special ... for me anyways :)