Simon: Open-Source Speech Recognition: August 2009

Friday, August 28, 2009

Calculator Plugin and Keyboard Plugin

Thanks to the "Österreichische Forschungsförderungsgesellschaft" (literally: "Austrian Research Funding Association"), the SIMON listens team has been expanded with two summer interns, Mario Strametz and Dominik Neumeister.

After some general testing and getting to know the system, they are now working on two promising command plugins: a calculator plugin and a keyboard plugin.

The calculator plugin is a natural extension of the existing input-number-plugin.

As the screenshot shows, it is still quite basic but already usable to a certain extent. However, it is under heavy development and we expect the first stable versions by the end of next week.

The calculator is - besides the obvious - also targeted at school kids doing their math homework, so upon pressing OK it offers the option to write out not only the result but also the calculation leading up to it (e.g. "1+1=2" instead of just "2"). The finished version will also include formatting options, such as formatting the output as an amount of money.
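To illustrate the output options described above, here is a minimal sketch in Python. It is not the plugin's actual code: the function name `format_output` and the mode names are assumptions made up for this example, and rounding money to two decimal places is just one plausible formatting choice.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_output(expression: str, result: Decimal, mode: str = "result") -> str:
    """Return the text a calculator could type out (hypothetical helper).

    mode "result"      -> just the result, e.g. "2"
    mode "calculation" -> the calculation leading up to it, e.g. "1+1=2"
    mode "money"       -> the result rounded to two decimal places (assumed format)
    """
    if mode == "result":
        return str(result)
    if mode == "calculation":
        return f"{expression}={result}"
    if mode == "money":
        amount = result.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        return str(amount)
    raise ValueError(f"unknown output mode: {mode}")


plain = format_output("1+1", Decimal("2"))
with_steps = format_output("1+1", Decimal("2"), "calculation")
as_money = format_output("1+1", Decimal("2"), "money")
```

Here `plain` is "2", `with_steps` is "1+1=2" and `as_money` is "2.00". Using `Decimal` instead of floats avoids rounding surprises when the output is meant to be an amount of money.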

At the same time, the two are working on a keyboard plugin (no screenshot yet, as development has just started). However, our "keyboard" will not just be a regular on-screen keyboard.

The keyboard plugin will not have a fixed number of fields (keys), nor will their values be fixed to those of a QWERTY keyboard.

Instead, the user will be able to configure them as desired in configuration sets (sensible defaults will of course be provided) and even spread the keys out across multiple tabs.

While this may seem overly complicated on paper, it makes advanced configurations possible: for example, a text-snippet tab that collects the user's most frequently used text snippets, or special characters that matter to a particular user (e.g. currency symbols for an accountant) placed right where they are wanted.
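To make the idea of configuration sets, tabs and freely defined keys a bit more concrete, here is a rough sketch in Python. Development has only just started, so none of this reflects the real plugin: the class names `Key`, `Tab` and `KeyboardSet`, their fields and the `lookup` method are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Key:
    trigger: str  # word the user speaks to "press" the key (hypothetical field)
    output: str   # text that gets typed; may be a whole snippet


@dataclass
class Tab:
    name: str
    keys: List[Key] = field(default_factory=list)


@dataclass
class KeyboardSet:
    name: str
    tabs: List[Tab] = field(default_factory=list)

    def lookup(self, tab_name: str, trigger: str) -> Optional[str]:
        """Return the output of the key with this trigger on the given tab, if any."""
        for tab in self.tabs:
            if tab.name != tab_name:
                continue
            for key in tab.keys:
                if key.trigger == trigger:
                    return key.output
        return None


# An example set for an accountant: one tab of currency symbols, one of snippets.
accountant = KeyboardSet(
    name="accountant",
    tabs=[
        Tab("symbols", [Key("euro", "€"), Key("dollar", "$")]),
        Tab("snippets", [Key("regards", "Kind regards,")]),
    ],
)

symbol = accountant.lookup("symbols", "euro")
```

Because a key is just a trigger plus arbitrary output text, the same structure covers single characters and whole text snippets, and the number of keys per tab is not fixed.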

I will update this blog as the development progresses so check back!

Publicity

Hi fellow readers! Long time no see!

As some of you might have seen, there was an article about simon on the dot. Thanks to Troy Unrau for making that happen!

The article spawned a lot of discussion and interest, and several sites picked it up, most notably lwn, where the discussion focused on the license issues. The article also hit digg (50 diggs), osnews, several twitter/identi.ca feeds and a lot of blogs everywhere.

Of course this also showed in our download statistics. We had more downloads in the last week than in the whole month before that! The forum has also been noticeably busier than usual, but the low number of support requests shows that the extensive documentation of simon 0.2 really helps a lot.

The simon homepage runs google analytics, so we have some interesting data about our (newly found) user base:

55% of all visitors were running GNU/Linux (Windows: 39%; Mac: 5%)

Our 5000 hits were spread across 106 countries and 75 different languages; the most used languages were English (2500), German (1000), French (500) and Chinese (200).

In the open source scene, firefox rules the browser battle (58%)

More people are using konqueror (9%) than Internet Explorer (7%) (of course this is because of the KDE-specific audience this month, but I still found it interesting; konqueror was actually in 2nd place after firefox)

Thursday, August 6, 2009

sam

I already mentioned it in the last post: a new application has been added to the simon application suite: sam.

sam is targeted towards power users who want to tweak and improve their acoustic model manually to improve recognition rates even further.

sam will include a sophisticated testing framework that gives immediate feedback on changes to the model configuration. In fact, while optimizing models manually, I realized that IMHO a well-working, automated model testing framework is the most essential part of manual optimization, as it makes the impact of changes immediately visible.

In contrast to simon, sam will not hide any of the internal workings from the user (due to the different target group) so the logs of both the building and the testing of the model are displayed and the whole operation can be double-checked for errors or warnings.

An initial, working version is already available through SVN.

Selecting the input files:

Building the model:

Testing the model:

Test results:

As you can see, simon will run the recognition with the generated models on the training samples to see if simon correctly recognizes their contents. The algorithm already recognizes and considers the confidence scores of the recognition results, which is why in the screenshot you can see the recognition rate of e.g. "NULL" not being 100% even though every instance of it was recognized correctly (5/5).
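One way such a confidence-aware rate could come out below 100% despite 5/5 correct results is sketched below in Python. This is an assumption for illustration only: the exact formula sam uses is not given here, and both the function `recognition_rate` and the confidence values are made up. The sketch credits each correctly recognized sample with its confidence score and each wrong one with zero.

```python
from typing import List, Tuple


def recognition_rate(results: List[Tuple[bool, float]]) -> float:
    """Confidence-weighted recognition rate (assumed formula, not sam's actual one).

    Each result is (recognized_correctly, confidence between 0 and 1).
    Correct results count with their confidence, wrong results count as 0.
    """
    if not results:
        return 0.0
    credited = sum(conf for correct, conf in results if correct)
    return credited / len(results)


# Five samples of "NULL", all recognized correctly, with made-up confidences.
null_results = [(True, 0.98), (True, 0.95), (True, 1.0), (True, 0.90), (True, 0.97)]
rate = recognition_rate(null_results)
```

With these invented numbers the rate comes out at roughly 0.96 rather than 1.0, even though every sample was recognized correctly.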

Btw: This is a well trained, rather small model which really works very well in practice so don't be alarmed by the very high recognition rate...

Greetings,
Peter