================ General =========================

Q: Which measure should I use, Accuracy or WER?
A: Either is fine (they carry the same information), but choose one and
   do not mix them.

Q: How is Accuracy calculated?
A: See Section 17.19.1 of the HTK manual.

Q: I've got a "No token survived" message.
A: This means HVite failed to find a best path and gave no recognition
   output for that input. The problem is that HResults leaves the
   corresponding utterance out of its analysis, so the statistics it
   reports are not accurate.

   This does not necessarily mean you should discard the recognition
   experiment. It is acceptable if the number of such errors is very
   small (e.g. 1), as long as you understand the risk described below.

   It is a good idea to know how much influence the error can have on
   the word accuracy. The following is a simulation. We have two test
   data sets, "si_dt5a.scp" (368 utterances) and "si_dt5a-div3.scp"
   (123 utterances), the latter called "small". Assume that N utterances
   were not recognised because of the error and the rest were recognised
   without errors. HResults will then report an accuracy of 100%. The
   actual (worst-case) accuracy once the unrecognised utterances are
   taken into account, i.e. (total - N) / total, is as follows:

       test set \ N  |    1     5    10
       ----------------------------------
       si_dt5a       |  99.7  98.6  97.3
       si_dt5a-div3  |  99.2  95.9  91.9
       ----------------------------------
       (assuming all utterances have the same length in words)

Q: How can I try experiments using different feature vectors?
A: You need to calculate new feature vectors from the speech wave files.
   Sample scripts that were used to obtain the current data can be found
   in the Org/scripts directory. They are:
       run_wave2featurevectors.sh
       wave2featurevectors.sh

Q: I got an error message like '"models/R9/hmm11/MODELS" already exists'.
A: This happens because your training script tried to overwrite the
   existing model.
   As a result, training did not finish. There are two ways to resolve
   this:
   Option A: Remove the existing model and its directory.
   Option B: Rename the directory of either the existing model or the
             new one.

================= Monophone models =======================

================= Triphone models =======================

Q: How can I obtain the number of clusters?
A: See logs/R9/log-hhed

Q: How can I change the number of tied-state triphone models?
A: You cannot specify the number directly, but you can control it with
   the "TB" value. See Section 17.8.1 of the HTK manual for details.

Q: What range of TB should I try?
A: One extreme would be a TB value that yields roughly as many clusters
   as there are in the monophone models.
   (NB: clustering is done state-wise rather than phone-model-wise.)

Q: In decision tree-based tied-state triphone models, how can I write a
   script that carries out sets of experiments (clustering, training,
   and recognition) for different clustering thresholds? (21/Mar)
A: See ShellScriptExamples/ex-tied-triphones.sh as an example.

================= MLLR-based speaker adaptation =========

Q: Should the number of regression classes be a power of 2?
A: No, it can be any natural number. This is because clustering
   proceeds by splitting one node into two, so each split increases
   the number of leaf nodes by 1.
   (See HTK Manual sections 9.1.4 and 10.7)

Q: I set the number of regression classes to 8, only to find 4 classes.
   Why?
A: It suggests that only 4 leaf nodes (clusters) received a sufficient
   amount of data and the others did not.
   (See HTK Manual sections 9.1.4 and 10.7)
   If this is the case, you will need more complex HMMs (e.g. those with
   more Gaussian mixture components or more contexts) to run
   experiments with larger numbers of regression classes.
   * The same thing can happen when you try to increase the number of
     Gaussian mixture components.

Q: In MLLR speaker adaptation, how can I change the amount of
   adaptation data?
A: Use a subset of the adaptation data listed in file_lists/dev-01.scp.
   To do this, create another list, e.g. file_lists/dev-01-subset.scp,
   and copy a subset of the lines of file_lists/dev-01.scp into it.
   Recall that the speaker ID is part of the data file name (its first
   three characters). For example, it is "c3l" for c3la010b.mfc.
   file_lists/dev-01.scp covers 20 speakers with 18 utterances each.
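
   Building the subset list by hand is error-prone, so here is a minimal
   sketch of one way to automate it. It is not part of the provided
   scripts. It assumes that the .scp file holds one feature-file path per
   line and that the speaker ID is the first three characters of the file
   name; the function names (speaker_of, subset_scp, write_subset) are
   made up for this illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def speaker_of(entry):
    """Return the speaker ID of one .scp entry.

    Assumption: the ID is the first three characters of the file name,
    e.g. "c3l" for ".../c3la010b.mfc".
    """
    return PurePosixPath(entry.strip()).name[:3]


def subset_scp(lines, utts_per_speaker):
    """Keep at most `utts_per_speaker` entries per speaker, in file order."""
    kept = []
    seen = defaultdict(int)  # speaker ID -> number of entries kept so far
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        spk = speaker_of(entry)
        if seen[spk] < utts_per_speaker:
            seen[spk] += 1
            kept.append(entry)
    return kept


def write_subset(src, dst, utts_per_speaker):
    """Read the list `src` and write the reduced list to `dst`."""
    with open(src) as f:
        subset = subset_scp(f, utts_per_speaker)
    with open(dst, "w") as f:
        for entry in subset:
            f.write(entry + "\n")
```

   For example, write_subset("file_lists/dev-01.scp",
   "file_lists/dev-01-subset.scp", 5) would keep the first 5 utterances
   of each speaker, i.e. 100 of the 360 entries. Keeping the same number
   of utterances for every speaker makes results comparable across
   speakers when you vary the amount of adaptation data.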