slackbuilds_ponce/desktop/simon/README.setup

50 lines
2.5 KiB
Text
Raw Normal View History

You may want to install the Hidden Markov Model Toolkit (HTK) which is
covered by a license which does not permit free distribution. However,
you need HTK if you want to train your acoustic model. You can obtain
HTK from here (but only after registering): http://htk.eng.cam.ac.uk/
If you are creating solutions which will be used by more than one user, or
simply don't have the time to train the system, you can use static base models.
Static models are used as-is and are not modified by simon in any way.
Because of this, it is important that the selected base model matches your
voice as closely as possible.
Even if you use a static model, you NEED to get an acoustic model from the web.
You can download some prebuilt models at http://www.voxforge.org/
BEGINNER GUIDE:
If you are a beginner and you don't know exactly how a speech recognition works,
but want just to enable this "cool feature", you may want to follow these steps
(static model), in order to make simon operative (English).
This is to help you to your first approach to this program, next you will
be able to customize more and more!
0. Browse acoustic models from
http://www.repository.voxforge1.org/downloads/Nightly_Builds/current/
Download "HTK_AcousticModel-2010-12-16_16kHz_16bit_MFCC_O_D.tgz"
1. Uncompress the model where you want.
2. Run "ksimond" (not from root). You need to have the daemon simond running.
3. Configure "simond". (ksimond -> configuration -> simond). Add a username and a
password which are going to be used by simon.
4. Run "simon". An assistant will appear. Click "Next" once to jump to "Scenarios"
section of the assistant.
5. Get some scenario. You need at least one, download a scenario in English.
6. Configure base model. Choose "Static model" type. From the uncompressed acoustic
model of step 1 choose:
- "hmmdefs" file for HMM definition
- "tiedlist" file for Tiedlist
- "macros" file for Macros
- "stats" file for Stats
Click "Ok", then "Next".
7. "Server" and "Sound devices" sections configuration depends on what hardware
and software you're going to use.
You can safely just press "Next" to leave them unchanged.
8. Adjust the volume of your microphone (or any input device you're going to use)
I suggest you to get the rumor at few percentage (3%-4%) and to get
"Volume correct" while speaking (I boosted my microphone for that)
9. Optionally perform a training of speechable texts of your scenario to put your
voice in training data for a better recognition.
10. Speak!