Wednesday, 20 May 2009

Ubuntu 9.04 and simon

Ubuntu Jaunty Jackalope and simon are not exactly best friends. If you tried it, you most likely experienced some very strange sound issues.

simon hangs, something else hangs, everything hangs...

I installed Kubuntu 9.04 today and had the very same issues.

Turns out the problem is a pretty old portaudio version.
The current snapshot fixed all those problems for me. You can get it on the portaudio website. Download the pa_snapshot.tgz, compile and install.

I haven't yet found a pre-built Debian package to spare you the compiling part. If you know of one, please let me know in the comments.

Also, it worked for me but obviously YMMV. Still issues? Again: let me know in the comments.

Friday, 15 May 2009

XML - Again...

OK, it seems my last blog post has triggered quite a reply.

However, I think there is still a bit of confusion. Let's try again...

Why am I so into XML-based standards? Because I understand them.

Even if this was not your intention, this is a bit misleading, as it suggests that I don't understand XML files and that if I did, I would share your point of view.

I can assure you that I understand the principles of XML files very well. Hell, a lot of parts of simon even use XML files (take a look at how the commands are stored for example).

But at the moment it just doesn't make sense to use a custom PLS modification (adding terminal tags). We need the lexicon to be readable by Julius and HTK. This implies that we have to store the lexicon in their formats (or add PLS support to Julius; HTK is essentially closed-source, so we would still have to keep a separate HTK lexicon around). It essentially does not matter what I believe is the better format for the job.

So the only possible way to incorporate an XML-based dictionary storage format would be to add an additional layer. This, however, means that the supported features can only be the lowest common denominator of both formats. So no fancy IPA (no support in HTK), no nice multiple graphemes per word (HTK could be compared to 1NF if you are familiar with database normalization), etc. In the end this additional layer would bring nothing beneficial to the table because we can't use its nice features as long as we have to stay HTK-compliant too. All it would do is introduce another source of errors.

All these considerations become irrelevant once we take Julius and HTK out of the equation. Then, adopting and modifying PLS is not such a bad idea (although I would like to store commands and the dictionary in the same file for the upcoming package-based structure). Removing the dependency on HTK is something I would like very much, but it doesn't seem feasible right now or in the near future.

And, by the way: I don’t like to read SAMPA. I prefer the IPA when editing the pronouncing dictionary.

HTK does not support UTF-8. However, I would prefer using SAMPA even if it did. I find SAMPA much easier to read and learn (especially if you speak German). Also, I prefer to be able to transcribe my words with the keyboard instead of using symbol tables to pick out the characters.

As IPA and X-SAMPA can be converted to and from each other without losing anything, I don't really see a problem there.
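
To illustrate why the conversion is lossless, here is a minimal sketch in Python. It assumes a one-to-one table and covers only a handful of single-character symbols common in German transcriptions; the table and the function names are made up for this example and are not part of simon. Multi-character X-SAMPA symbols would need a proper tokenizer.

```python
# Small, deliberately incomplete X-SAMPA -> IPA table (single characters only).
XSAMPA_TO_IPA = {
    "S": "ʃ", "Z": "ʒ", "N": "ŋ", "C": "ç", "@": "ə", "6": "ɐ",
    "E": "ɛ", "O": "ɔ", "I": "ɪ", "U": "ʊ", "Y": "ʏ",
    "2": "ø", "9": "œ", ":": "ː", "?": "ʔ",
}

# Because the table is one-to-one, it can simply be inverted.
IPA_TO_XSAMPA = {ipa: xsampa for xsampa, ipa in XSAMPA_TO_IPA.items()}


def to_ipa(xsampa):
    """Convert an X-SAMPA string to IPA; unknown characters pass through."""
    return "".join(XSAMPA_TO_IPA.get(ch, ch) for ch in xsampa)


def to_xsampa(ipa):
    """Convert an IPA string back to X-SAMPA; unknown characters pass through."""
    return "".join(IPA_TO_XSAMPA.get(ch, ch) for ch in ipa)


if __name__ == "__main__":
    word = "S2:n"  # German "schön" in X-SAMPA
    print(to_ipa(word))
    print(to_xsampa(to_ipa(word)))
```

Characters such as "n" or "x" are the same in both alphabets, so they pass through unchanged in both directions.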

Sometimes, I ask myself the question: why don’t they switch from SAMPA to the IPA? Why don’t they switch their homepage from ISO-8895-1 to UTF-8?

This confused me a bit. Our homepage is UTF-8 encoded. So are all the files produced by simon (except where that is not possible because of third-party products that don't support it)...

Then I saw that you linked to the SPHINX homepage and not to our homepage...

export functionality is a low priority feature

OK. From my point of view, Voxforge needs an export functionality.

This is especially confusing as I was talking about an export functionality of simon. You are talking about an export functionality of Voxforge, which would be an import feature from simon's point of view. And as I stated in my previous blog post, this is something that I am indeed very interested in.

I don’t know about the exotic BOMP standard, I couldn’t find an entry in the Wikipedia. So I assume that BOMP is not a relevant standard.

BOMP is not a standard at all. It is a dictionary following the HADIFIX "standard". HADIFIX is a speech synthesis project that uses phonetic dictionaries to know how to pronounce words. Those dictionaries have to follow a specific format, which could be called the "HADIFIX standard" (I have not found a definition of it anywhere).
The import functionality was implemented because a very large, high-quality phonetic dictionary (the "BOMP" dictionary) exists in that format.

Simon allows me to record just single words, not utterances. I am not convinced by that concept.

I wouldn't be either. Fortunately, this is not true. Take a look at the Training module. You can easily import "normal" texts. Try importing a text file containing this: "I am an utterance. And here comes another.". Even the standard examples shipped with simon contain sentences and not individual words, btw.

You see, there are several aspects. The world is not just about simon. It is about Voxforge, too.

Please don't lecture me. It is disrespectful and unnecessary. I am very grateful for the effort that Ken and all the contributors put into Voxforge, and I actively promote participation when people ask me about dictation with simon.

I am also investigating how best to use the Voxforge model with simon and have stated on several occasions that I intend to integrate the possibility to contribute to Voxforge from within simon.

Followup 16.05.2009
Today ralfherzog contacted me by e-mail and explained the misunderstanding.

Thursday, 14 May 2009

XML Standards: Clarification

ralfherzog, one of the largest contributors to the German Voxforge acoustic model and one of the main contributors to the German GPL lexicon, keeps posting about simon's (missing) import / export functionalities in his "testing simon" blog. Normally I answer him directly by e-mail, but I think this warrants a blog entry as it might be interesting to other readers as well.

First off some facts:

  • simon does support importing PLS dictionaries

  • simon does not support any explicit export functionality whatsoever. There are no export functions for the training data, the lexicon, the vocabulary or anything else.

  • simon does not support the import of training data based on a supplied prompts file - be that in plain text or XML.



None of those features are missing for ideological reasons; they are missing mostly because of time constraints. However, I am not as convinced as ralfherzog that they are that essential.

As far as I know, simon is the only application using PLS dictionaries, so an export functionality is a low-priority feature. The same goes for the training data. An integration with Voxforge is planned for the future, which would in my opinion be the only practical use case for export features right now anyway.

Some might wonder why we don't use PLS as the default dictionary format in the first place, but the answer is very simple: the PLS standard does not allow any terminal information to be stored with the dictionary. The current storage format is a standard Julius vocabulary file and an accompanying HTK dictionary. Those are the respective file formats of the underlying components, and as they are not (yet) exchangeable I see no reason to introduce new file formats.
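
For readers who have never seen one: a Julius vocabulary file groups its words under terminal headers, which is exactly the information PLS has no place for. The following Python sketch shows the idea. It assumes the usual layout of "% TERMINAL" header lines followed by "WORD phoneme phoneme ..." entries; the sample entries and the function name are invented for illustration, and this is not the code simon uses.

```python
from collections import defaultdict


def parse_voca(text):
    """Group the words of a Julius-style vocabulary by their terminal."""
    terminals = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip empty lines
        if line.startswith("%"):
            # A header line such as "% NOUN" starts a new terminal.
            current = line[1:].strip()
            continue
        if current is None:
            raise ValueError("word entry before any terminal header")
        # First field is the word, the remaining fields are its phonemes.
        word, *phonemes = line.split()
        terminals[current].append((word, phonemes))
    return dict(terminals)


# Invented sample data, only to show the layout.
SAMPLE = """\
% NOUN
HOUSE  h aU s
MOUSE  m aU s
% VERB
OPEN   @U p @ n
"""

vocabulary = parse_voca(SAMPLE)

if __name__ == "__main__":
    for terminal, entries in vocabulary.items():
        print(terminal, [word for word, _ in entries])
```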

The import of training data is included in simon 0.2, but only in a very basic form. In its current state it is usable if you have training data gathered by a previous simon installation. Everything else is not yet supported. I would personally like to see importing of a "normal" HTK prompts file but don't see the advantage of SSML. SSML is not designed for that particular usage and just introduces unnecessary overhead. Yes, content validation is a nice thing that makes XML a very good choice for many, many things, but prompts are imho not one of them. So we might see an import function for SSML-formatted prompts for data that is already gathered and stored in that format, but making it the primary storage format for prompts in simon is probably not going to happen anytime soon. It's the same as with PLS: HTK expects the prompts in its own plain-text format, so why introduce an additional source of errors by adding another conversion step?
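
To show how little there is to validate in a plain prompts file, here is a minimal Python sketch of a reader. It assumes the common layout of one utterance per line, with a sample identifier followed by the words; the sample lines and the function name are made up for this example and do not describe simon's actual import code.

```python
def parse_prompts(text):
    """Map each sample identifier to the list of words that were spoken."""
    prompts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip empty lines
        # First field is the sample identifier, the rest is the utterance.
        sample_id, *words = line.split()
        prompts[sample_id] = words
    return prompts


# Invented sample data, only to show the layout.
SAMPLE = """\
*/sample1 I AM AN UTTERANCE
*/sample2 AND HERE COMES ANOTHER
"""

prompts = parse_prompts(SAMPLE)

if __name__ == "__main__":
    for sample_id, words in prompts.items():
        print(sample_id, " ".join(words))
```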

Homepage

We finally decided to bring our homepage up to speed.

Much of the information on it was (and still is) outdated and sometimes even plain wrong. So we decided to restructure it a bit to make the core information more accessible to new users and to get rid of the outdated content.

Obviously, there is still a lot to do. But even just after the new menu was implemented, it is already much easier to find what you are looking for.

Personally, I don't like the external links to the wiki. The howtos, tips & tricks, etc. are still on the simon wiki but are linked from the main homepage. This is of course a bit confusing when you click a link in the sidebar and suddenly end up on a completely different site. However, we don't want to lose the advantages of having the (ever-changing) content on our wiki, as it is much easier to update.

So dear lazyweb: Is there a convenient solution (like a MediaWiki plugin) for embedding wiki content in a TYPO3 page?

Ideas and of course feedback for the homepage would be much appreciated!

Microblogging

Yes, I finally gave in to all that peer pressure :)

You can now follow me on identi.ca/bedahr.