I need to create an application that takes audio files (WAV, MP3, or another format) containing about two minutes of speech in which somebody reads out numbers, probably digit by digit. I need to recognise all of those digits and write them to an XML file (or some other format). The numbers are twelve digits long, and the last digit is used for error checking (a check digit). About thirteen words need to be recognised reliably, spoken by many different people: the digits from zero to nine and a few commands such as "start", "next number", and "end". The language is Polish, not English. Can you suggest any tutorials or technologies I could use?
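To make the target output more concrete, here is a rough Python sketch of the post-processing step I have in mind. It is only an illustration and covers no speech recognition at all: it assumes some recognizer has already produced a list of word tokens. The token spellings (Polish digit words, plus "start", "next", and "end" as single-token commands), the function names, and the XML layout are all my own placeholders, and the check digit is not validated because that rule is a separate matter.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: Polish digit words mapped to digit characters.
DIGITS = {
    "zero": "0", "jeden": "1", "dwa": "2", "trzy": "3", "cztery": "4",
    "pięć": "5", "sześć": "6", "siedem": "7", "osiem": "8", "dziewięć": "9",
}

def tokens_to_numbers(tokens):
    """Group recognized tokens into digit strings, split on the commands."""
    numbers, current, active = [], [], False
    for tok in tokens:
        if tok == "start":
            # Begin a new session and discard anything collected before it.
            active, current = True, []
        elif tok in ("next", "end"):
            # Close the number collected so far, if any.
            if active and current:
                numbers.append("".join(current))
            current = []
            if tok == "end":
                active = False
        elif active and tok in DIGITS:
            current.append(DIGITS[tok])
        # Any other token is ignored in this sketch.
    return numbers

def numbers_to_xml(numbers):
    """Serialize digit strings to XML, flagging those that are not 12 digits."""
    root = ET.Element("numbers")
    for n in numbers:
        item = ET.SubElement(root, "number")
        item.set("complete", "true" if len(n) == 12 else "false")
        item.text = n
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    sample = ["start"] + ["jeden", "dwa", "trzy"] * 4 + ["end"]
    print(numbers_to_xml(tokens_to_numbers(sample)))
```

The `complete` attribute is just a way to flag numbers where the recognizer probably dropped or inserted a digit, so they can be reviewed by hand.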
Thanks very much for your help in advance!