A Review of the SPHINX Speech Recognition System


Presentation Transcript

Slide 1

An Overview of the SPHINX Speech Recognition System
Jie Zhou, Zheng Gong, Lingli Wang, Tiantian Ding
M.Sc in CMHE, Spoken Language Processing Module
Presentation on the speech recognition system, 27 February 2004

Slide 2

Abstract
SPHINX is a system that demonstrates the feasibility of accurate, large-vocabulary, speaker-independent, continuous speech recognition. SPHINX is based on discrete hidden Markov models (HMMs) with LPC-derived parameters. Its goals are:
to provide speaker independence;
to cope with coarticulation in continuous speech;
to adequately represent a large vocabulary.
SPHINX achieved word accuracies of 71, 94, and 96 percent on a 997-word task.

Slide 3

Introduction
SPHINX is a system that tries to overcome three constraints:
1) speaker dependence,
2) isolated words,
3) small vocabulary.

Slide 4

Introduction
Speaker independence: the system is trained on less appropriate training data, but much more data can be obtained, which may compensate for the less suitable training material.
Difficulties of continuous speech recognition: word boundaries are hard to locate; coarticulatory effects are much stronger in continuous speech; content words are often emphasized, while function words are poorly articulated.
Large vocabulary: 1000 words or more.

Slide 5

Introduction
To improve speaker independence: additional knowledge was introduced through multiple vector-quantized codebooks, and the recognizer was enhanced with carefully designed models and word-duration modeling.
To cope with coarticulation in continuous speech: function-word-dependent phone models and generalized triphone models.
SPHINX achieved speaker-independent word recognition accuracies of 71, 94, and 96 percent on the 997-word DARPA resource management task with grammars of perplexity 997, 60, and 20.

Slide 6

The Baseline SPHINX System
This system uses standard HMM techniques.
Speech processing: sample rate 16 kHz; frame span 20 ms, with successive frames overlapping by 10 ms; each frame is multiplied by a Hamming window; the LPC coefficients are computed; 12 LPC-derived cepstral coefficients are obtained; the 12 LPC cepstrum coefficients are vector quantized into one of 256 prototype vectors.
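A minimal sketch of this front end in Python/NumPy, assuming a mono 16 kHz `signal` array and a pretrained 256-entry `codebook` of 12-dimensional prototype vectors (both hypothetical names). The autocorrelation-method LPC, the LPC-to-cepstrum recursion, the analysis order, and the Euclidean quantization distance are standard textbook choices, not necessarily the exact SPHINX implementation.

```python
import numpy as np

def lpc(frame, order=14):
    """Autocorrelation-method LPC: solve the normal equations R a = r.
    (Analysis order is an assumption, not taken from the slides.)"""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R + 1e-6 * np.eye(order), r[1:order + 1])
    return a  # predictor coefficients: x[n] ~ sum_k a[k] * x[n-k-1]

def lpc_cepstrum(a, n_cep=12):
    """Convert predictor coefficients to 12 LPC-derived cepstral coefficients."""
    p = len(a)
    c = np.zeros(n_cep)
    for n in range(1, n_cep + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c

def frontend(signal, codebook, sample_rate=16000):
    """20 ms Hamming-windowed frames every 10 ms -> 12 LPC cepstra -> VQ label."""
    frame_len, hop = int(0.020 * sample_rate), int(0.010 * sample_rate)
    window = np.hamming(frame_len)
    labels = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        cep = lpc_cepstrum(lpc(frame))
        # vector-quantize against the 256 prototype vectors (codebook: 256 x 12);
        # Euclidean distance is used here for simplicity
        labels.append(int(np.argmin(np.linalg.norm(codebook - cep, axis=1))))
    return np.array(labels)
```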

Slide 7

Task and Database
The Resource Management task: SPHINX was evaluated on the DARPA resource management task. Three difficult grammars were used with SPHINX: a null grammar (perplexity 997), a word-pair grammar (perplexity 60), and a bigram grammar (perplexity 20).
The TIRM database: 80 "training" speakers, 40 "development test" speakers, 40 "evaluation" speakers.

Slide 8

Task and Database
Phonetic hidden Markov models. HMMs are parametric models particularly suitable for describing speech events. Each HMM represents a phone; a total of 46 phones are used for English.
{s}: a set of states.
{a_ij}: a set of transitions, where a_ij is the probability of a transition from state i to state j.
{b_ij(k)}: the output probability matrix.
See the phonetic HMM topology figure (Slide 9).
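The slide only lists the model components, so the following data structure is an illustrative assumption: a discrete phone HMM with a small left-to-right topology, a 256-symbol output alphabet matching the VQ codebook, and output probabilities attached to transitions as described above.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class PhoneHMM:
    """A discrete HMM for one phone: states {s}, transitions {a_ij},
    and output probabilities {b_ij(k)} attached to transitions."""
    name: str                 # e.g. "AA", one of the 46 English phones
    n_states: int = 5         # assumed small left-to-right topology
    n_symbols: int = 256      # VQ codebook size from the front end
    a: np.ndarray = None      # a[i, j] = P(next state j | state i)
    b: np.ndarray = None      # b[i, j, k] = P(codeword k | transition i -> j)

    def __post_init__(self):
        if self.a is None:
            # simple left-to-right initialization: stay or advance one state
            self.a = np.zeros((self.n_states, self.n_states))
            for i in range(self.n_states):
                if i + 1 < self.n_states:
                    self.a[i, i] = 0.5
                    self.a[i, i + 1] = 0.5
                else:
                    self.a[i, i] = 1.0
        if self.b is None:
            # uniform output distribution on every transition before training
            self.b = np.full((self.n_states, self.n_states, self.n_symbols),
                             1.0 / self.n_symbols)
```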

Slide 9

Phonetic HMM topology (figure).

Slide 10

Task and Database
Training. A set of 46 phone models was used to initialize the parameters. The forward-backward algorithm was run on the resource management training sentences. A sentence model is built from word models, which are in turn concatenated from phone models. The trained transition probabilities are used directly in recognition. The output probabilities are smoothed with a uniform distribution. The SPHINX recognition search is a standard time-synchronous Viterbi beam search.
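A minimal sketch of a time-synchronous Viterbi beam search over a flattened composite model, under some simplifying assumptions: output probabilities are attached to destination states rather than transitions, the model is given as dense `log_a`/`log_b` matrices, and the beam is a fixed log-probability margin. None of these names come from SPHINX itself.

```python
import numpy as np

def viterbi_beam_search(obs, log_a, log_b, log_init, beam=200.0):
    """Time-synchronous Viterbi beam search over a flattened state graph.

    obs      : sequence of VQ codeword indices (one per 10 ms frame)
    log_a    : [S, S] log transition probabilities of the composite model
    log_b    : [S, K] log output probabilities per destination state
    log_init : [S] log probability of starting in each state
    beam     : prune states scoring more than `beam` below the frame's best
    """
    score = log_init + log_b[:, obs[0]]
    backptr = []
    for o in obs[1:]:
        # expand only states that survived pruning at the previous frame
        active = np.where(score > score.max() - beam)[0]
        cand = score[active, None] + log_a[active, :]   # [A, S]
        best_prev = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_b[:, o]
        backptr.append(active[best_prev])
    # trace back the best path from the best final state
    path = [int(score.argmax())]
    for bp in reversed(backptr):
        path.append(int(bp[path[-1]]))
    return list(reversed(path)), float(score.max())
```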

Slide 11

Task and Database
The results with the baseline SPHINX system, using 15 new speakers with 10 sentences each for evaluation, are shown in Table I. The baseline system is inadequate for any reasonable large-vocabulary application without incorporating knowledge and contextual modeling.

Slide 12

Adding Knowledge to SPHINX
Fixed-width speech parameters. Lexical/phonological improvements. Word duration modeling. Results.

Slide 13

Fixed-Width Speech Parameters
Bilinear transform on the cepstrum coefficients. Differenced cepstrum coefficients. Power and differenced power. Integrating fixed-width parameters in multiple codebooks.
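One possible reading of these feature streams in NumPy, assuming `cep` is a [T, 12] array of (bilinear-transformed) cepstra and `frames` the corresponding [T, N] array of windowed samples; the ±2-frame difference span and the exact three-stream grouping are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def fixed_width_parameters(cep, frames, span=2):
    """Build three feature streams for separate VQ codebooks:
    (1) cepstrum, (2) differenced cepstrum, (3) power + differenced power."""
    T = cep.shape[0]
    # differenced cepstra: difference across +/- `span` frames (assumed span)
    idx_lo = np.clip(np.arange(T) - span, 0, T - 1)
    idx_hi = np.clip(np.arange(T) + span, 0, T - 1)
    dcep = cep[idx_hi] - cep[idx_lo]
    # power: log frame energy; differenced power over the same span
    power = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    dpower = power[idx_hi] - power[idx_lo]
    stream_cep = cep                                  # codebook 1
    stream_dcep = dcep                                # codebook 2
    stream_pow = np.stack([power, dpower], axis=1)    # codebook 3
    return stream_cep, stream_dcep, stream_pow
```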

Slide 14

Lexical/Phonological Improvements
This set of changes involved modification of the phone set and the pronunciation dictionary. These changes lead to more accurate assumptions about how words are articulated, without changing our assumption that each word has a single pronunciation.
The first step was to replace the baseform pronunciation with the most likely pronunciation.
To improve the appropriateness of the word pronunciation dictionary, a small set of rules was created to: merge closure-stop pairs into optional compound phones when appropriate; change /t/'s and /d/'s into /dx/ when appropriate; reduce nasal /t/'s when appropriate; and perform other mappings, such as /t s/ to /ts/.
Finally, there is the question of which HMM topology is best for phones in general, and which topology is best for each individual phone.
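A toy illustration of applying such pair-merging rules to a baseform pronunciation; the rule table and phone symbols below are hypothetical examples, not the actual SPHINX rule set.

```python
# hypothetical pair-merge rules: adjacent phone pairs replaced by a compound phone
MERGE_RULES = {("t", "s"): "ts", ("d", "cl"): "dcl"}   # example entries only

def apply_merge_rules(phones):
    """Single left-to-right pass that merges adjacent phone pairs."""
    out, i = [], 0
    while i < len(phones):
        if i + 1 < len(phones) and (phones[i], phones[i + 1]) in MERGE_RULES:
            out.append(MERGE_RULES[(phones[i], phones[i + 1])])
            i += 2
        else:
            out.append(phones[i])
            i += 1
    return out

# e.g. apply_merge_rules(["ih", "t", "s"]) -> ["ih", "ts"]
```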

Slide 15

Word Duration Modeling
HMMs model the duration of events with transition probabilities, which leads to a geometric distribution for the length of state occupancy. We incorporated word duration into SPHINX as part of the Viterbi search. The duration of a word is modeled by a univariate Gaussian distribution, with the mean and variance estimated from a supervised Viterbi segmentation of the training set.
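In formula form, and as a sketch only (the weight gamma is an assumed tuning parameter not mentioned on the slide), the duration term added to a word-ending path score in the Viterbi search could be written as:

```latex
\[
\log P(d \mid w) \;=\; -\tfrac{1}{2}\log\!\big(2\pi\sigma_w^{2}\big)
\;-\; \frac{(d-\mu_w)^{2}}{2\sigma_w^{2}},
\qquad
\text{score}' \;=\; \text{score}_{\mathrm{Viterbi}} \;+\; \gamma\,\log P(d \mid w),
\]
```

where d is the word's duration in frames and mu_w, sigma_w^2 are the mean and variance estimated from the supervised Viterbi segmentation of the training set.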

Slide 16

Results
We have presented various approaches for adding knowledge to SPHINX. Consistent with earlier results, we found that bilinear-transformed coefficients improved the recognition rates. A much greater improvement came from the use of differenced coefficients, power, and differenced power in three separate codebooks. Next, we improved the dictionary and the phone set, a step that led to a considerable improvement. Finally, the addition of duration information significantly improved SPHINX's accuracy when no grammar was used, but was not helpful with a grammar.

Slide 17

Context Modeling in SPHINX
Previously proposed units of speech. Function-word-dependent phones. Generalized triphones. Smoothing detailed models.

Slide 18

Previously Proposed Units of Speech
Because there is no sharing across words, word models are not practical for large-vocabulary speech recognition; in order to improve trainability, some subword unit must be used.
Word-dependent phones: a compromise between word modeling and phone modeling.
Context-dependent phones: the triphone model; rather than modeling a phone within a word, they model a phone within its context.

Slide 19

Function-Word-Dependent Phones
Function words are particularly problematic in continuous speech recognition because they are typically unstressed, and the phones in function words are distorted. Function-word-dependent phones are the same as word-dependent phones, except that they are used only for function words.

Slide 20

Generalized Triphones
Triphone models are poorly trained and consume substantial memory. Combining similar triphones improves trainability and reduces memory storage. Generalized triphones are created by merging contexts with an agglomerative clustering procedure. To determine the similarity between two models, a distance metric over their output distributions is used.
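The slide's own formula is not included in the transcript. As an illustration of the kind of metric used for this agglomerative clustering, a count-weighted entropy increase over the discrete output distributions could be written as follows; this is an assumption, not necessarily the exact SPHINX metric.

```latex
\[
D(a,b) \;=\; (N_a + N_b)\,H\!\big(P_{a\cup b}\big) \;-\; N_a\,H(P_a) \;-\; N_b\,H(P_b),
\qquad
H(P) \;=\; -\sum_{k=1}^{K} P(k)\log P(k),
\]
```

where P_a and P_b are the output distributions of the two triphone models, N_a and N_b are their training counts, and P_{a∪b} is the count-weighted merged distribution; the pair of models with the smallest D is merged at each clustering step.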

Slide 21

Generalized Triphones
In measuring the distance between two models, we consider only the output probabilities and ignore the transition probabilities, which are of secondary importance. This context-generalization algorithm provides an ideal means of finding the balance between trainability and sensitivity.

Slide 22

Smoothing Detailed Models
Detailed models are accurate, but they are less robust because many output probabilities will be zero, which can be disastrous for recognition. The solution is to combine these detailed models with other, more robust ones. An ideal solution for weighting different estimates of the same event is deleted interpolated estimation. Strategies for combining the detailed and robust models include smoothing the distribution with a uniform distribution.
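A sketch of what the smoothed output distribution looks like under this scheme; the three-way combination with a uniform floor follows the slide's description, while the weight-estimation detail is only summarized.

```latex
\[
\hat{b}(k) \;=\; \lambda_1\, b_{\mathrm{detailed}}(k) \;+\; \lambda_2\, b_{\mathrm{robust}}(k) \;+\; \lambda_3\,\frac{1}{K},
\qquad
\lambda_1+\lambda_2+\lambda_3 = 1,\;\; \lambda_i \ge 0,
\]
```

where K is the codebook size and the weights lambda_i are estimated by deleted interpolation, i.e., chosen to maximize the likelihood of held-out ("deleted") portions of the training data.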

Slide 23

Entire Training Procedure
A summary of the entire training procedure is outlined in Figure 2.

Slide 24

Summary of Results
The six versions correspond to the following descriptions, with incremental improvements:
1) the baseline system, which uses only LPC cepstral parameters in one codebook;
2) the addition of differenced LPC cepstral coefficients, power, and differenced power in one codebook;
3) all four feature sets used in three separate codebooks;
4) tuning of the phone models and the pronunciation dictionary, and the use of word duration modeling;
5) function-word-dependent phone modeling;
6) generalized triphone modeling.

Slide 25

Results of five versions of SPHINX (results table).

Slide 26

Conclusion
Given a fixed amount of training data, model specificity and model trainability pose a trade-off.
