CMU Sphinx Speech Recognition Engine


Presentation Transcript

Slide 1

CMU Sphinx Speech Recognition Engine. Presenter: Chun-Feng Liao, NCCU Dept. of Computer Science, Intelligent Media Lab

Slide 2

Purposes of This Project: Find out how an efficient speech recognition engine can be implemented. Examine the source code of Sphinx2 to discover the role and function of each component. Read key chapters of Dr. Mosur K. Ravishankar's thesis as a reference. Some demo programs will be shown during the oral presentation.

Slide 3

Presentation Agenda: Project Summary/Agenda/Goal (in English). Introduction. Basics of Speech Recognition. Architecture of CMU Sphinx. Acoustic Model and HMM. Language Model. Java™ Platform Issues. Demo. Conclusion.

Slide 4

Voice Technologies: In the mid-to-late 1990s, personal computers started to become powerful enough to support automatic speech recognition (ASR). The two key underlying technologies behind these advances are speech recognition (SR) and text-to-speech synthesis (TTS).

Slide 5

Basics of Speech Recognition

Slide 6

Speech Recognition: Capturing speech (analog) signals. Digitizing the sound waves and converting them to basic language units, or phonemes. Constructing words from phonemes, and analyzing the words in context to ensure correct spelling for words that sound alike (such as "write" and "right").

Slide 7

Speech Recognition Process Flow. Source: Microsoft Speech.NET Home.

Slide 8

Recognition Process Flow Summary: Step 1: User Input. The system captures the user's voice as an analog acoustic signal. Step 2: Digitization. Digitize the analog acoustic signal. Step 3: Phonetic Breakdown. Break the signal into phonemes.
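Step 2 can be illustrated with a minimal sketch. The 16 kHz sample rate, the 16-bit depth, and the function name `digitize` are assumptions chosen for illustration; the slides do not state the values Sphinx uses.

```python
import math

def digitize(signal, duration_s, sample_rate=16000, bits=16):
    """Sample a continuous signal and quantize it to signed integers.

    signal: function mapping time in seconds to an amplitude in [-1.0, 1.0].
    sample_rate and bits are illustrative defaults, not Sphinx settings.
    """
    max_amp = 2 ** (bits - 1) - 1           # 32767 for 16-bit audio
    n = round(duration_s * sample_rate)     # number of samples to take
    samples = []
    for i in range(n):
        value = signal(i / sample_rate)     # sampling: read the signal at time t
        value = max(-1.0, min(1.0, value))  # clip out-of-range amplitudes
        samples.append(round(value * max_amp))  # quantization
    return samples

# Usage: 10 ms of a 440 Hz tone
tone = digitize(lambda t: math.sin(2 * math.pi * 440 * t), 0.01)
```

The result is a plain list of integers, which is the form the later steps (phonetic breakdown and statistical modeling) would operate on.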

Slide 9

Recognition Process Flow Summary (2): Step 4: Statistical Modeling. Map phonemes to their phonetic representation using a statistical model. Step 5: Matching. According to the grammar, the phonetic representation, and the dictionary, the system returns an n-best list (i.e., words, each with a confidence score). Grammar: the set of words or phrases that constrains the range of input or output in the voice application. Dictionary: the mapping table between phonetic representations and words (e.g., "thu" and "thee" both map to "the").
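The matching step can be sketched as a toy lookup. Everything here is hypothetical: the `LEXICON` entries, the phoneme symbols, and the use of `difflib.SequenceMatcher` similarity as a stand-in for a real confidence score. It is not how Sphinx computes its n-best list.

```python
from difflib import SequenceMatcher

# Hypothetical dictionary: word -> list of pronunciations (phoneme tuples).
LEXICON = {
    "the": [("DH", "AH"), ("DH", "IY")],   # two pronunciations, one word
    "right": [("R", "AY", "T")],
    "write": [("R", "AY", "T")],           # sounds the same as "right"
    "ride": [("R", "AY", "D")],
}

def n_best(phonemes, grammar, n=3):
    """Return up to n (word, score) pairs for the words the grammar allows."""
    scored = []
    for word in grammar:                   # the grammar constrains the candidates
        pronunciations = LEXICON.get(word, [])
        score = max(
            (SequenceMatcher(None, phonemes, p).ratio() for p in pronunciations),
            default=0.0,
        )
        scored.append((word, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # best score first
    return scored[:n]

# Usage: the grammar allows "right" but not "write", which resolves the homophone.
result = n_best(("R", "AY", "T"), grammar=["right", "ride", "the"], n=2)
```

Because "write" is absent from the grammar, only "right" is returned with a perfect score, which mirrors how a grammar narrows the choice between words that sound alike.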

Slide 10

Architecture of CMU Sphinx.

Slide 11

Introduction to CMU Sphinx: A speech recognition system developed at Carnegie Mellon University. Consists of a set of libraries: core speech recognition functions and low-level audio capture. Continuous speech decoding. Speaker-independent.

Slide 12

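Brief History of CMU Sphinx: Sphinx-I (1987): The first speaker-independent, high-performance ASR system in the world. Written in C by Kai-Fu Lee (Dr. Kai-Fu Lee, currently chief technical advisor and vice president at Microsoft Asia). Sphinx-II (1992): Written in C by Xuedong Huang (Dr. Xuedong Huang, currently leader of the Microsoft Speech.NET team). 5-state HMM/N-gram LM. (We can surmise that the core technology of CMU Sphinx strongly influenced the Microsoft Speech SDK.)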

Slide 13

Brief History of CMU Sphinx (2): Sphinx 3 (1996): Built by Eric Thayer and Mosur Ravishankar. Slower than Sphinx-II, but the design is more flexible. Sphinx 4 (originally Sphinx 3j): Refactored from Sphinx 3. Fully implemented in Java. Not finished yet.

Slide 14

Components of CMU Sphinx

Slide 15

Front End: libsphinx2fe.lib/libsphinx2ad.lib. Low-level audio access. Continuous listening and silence filtering. Front End API overview.
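Silence filtering can be illustrated with a simple energy threshold. This is a minimal sketch under stated assumptions: the frame size, the threshold, and the function names are hypothetical, and the actual Sphinx2 front end may use a different method.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def filter_silence(samples, frame_size=4, threshold=100.0):
    """Keep only the frames whose energy reaches the threshold."""
    kept = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if frame_energy(frame) >= threshold:   # loud enough to count as speech
            kept.extend(frame)
    return kept

# Usage: quiet frame, loud frame, quiet frame
audio = [0, 1, -1, 0, 50, -60, 40, -30, 1, 0, 0, -1]
speech = filter_silence(audio)
```

Only the middle frame survives, so the decoder would not spend time scoring the near-silent samples around it.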

Slide 16

Knowledge Base: The data that drives the decoder. Three sets of data: Acoustic Model. Language Model. Lexicon (Dictionary).

Slide 17

Acoustic Model: /model/hmm/6k. A database of statistical models. Each statistical model represents a phoneme. Acoustic models are trained by analyzing large amounts of speech data.

Slide 18

HMM in Acoustic Model: HMMs represent each unit of speech in the acoustic model. Typical HMMs use 3-5 states to model a phoneme. Each state of an HMM is represented by a set of Gaussian mixture density functions. Sphinx2 default phone set.
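A 3-state phoneme HMM can be sketched with the forward algorithm. The transition values in `TRANS` and the requirement to start in the first state and end in the last are illustrative assumptions, not the parameters of any Sphinx model.

```python
# Left-to-right topology: each state may loop on itself or advance to the next.
# The probabilities are made up for illustration.
TRANS = [
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]

def forward_likelihood(emissions, trans=TRANS):
    """Likelihood of a frame sequence under the HMM (forward algorithm).

    emissions[t][s] is the likelihood of frame t under state s, which in a
    real system would come from that state's Gaussian mixture.
    """
    n = len(trans)
    alpha = [emissions[0][0]] + [0.0] * (n - 1)   # assume we start in state 0
    for frame in emissions[1:]:
        alpha = [
            frame[j] * sum(alpha[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return alpha[-1]                               # assume we end in the last state

# Usage: three frames that fit every state equally well
likelihood = forward_likelihood([[1.0, 1.0, 1.0]] * 3)
```

With only two frames the result is zero, because a left-to-right model with three states needs at least three frames to reach its final state.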

Slide 19

Gaussian Mixtures: Refer to the textbook, p. 33, eq. 38. They represent each state in an HMM. Each set of Gaussian mixtures is called a "senone". HMMs can share senones.
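A Gaussian mixture density and the idea of shared senones can be sketched as follows. This is a one-dimensional simplification with made-up weights, means, and variances; real acoustic models use multi-dimensional feature vectors, and the names `SENONES` and `STATE_TO_SENONE` are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_density(x, components):
    """Weighted sum of Gaussians; components is a list of (weight, mean, variance)."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

# Hypothetical senones: each one is a set of mixture components.
SENONES = {
    "senone_0": [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)],
    "senone_1": [(1.0, 0.0, 1.0)],
}

# Two states of different HMMs point at the same senone, so they share parameters.
STATE_TO_SENONE = {
    ("AH", 0): "senone_0",
    ("AE", 0): "senone_0",
    ("AH", 1): "senone_1",
}

def state_likelihood(state, x):
    """Likelihood of observation x under the senone tied to this HMM state."""
    return mixture_density(x, SENONES[STATE_TO_SENONE[state]])
```

Sharing senones means that states which sound alike are trained from pooled data, which keeps the number of parameters manageable.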

Slide 21

Language Model: Describes what is likely to be spoken in a particular context. Word transitions are defined in terms of transition probabilities. Helps to constrain the search space. See examples of LM.

Slide 22

N-gram Language Model: The probability of word N depends on words N-1, N-2, ... Bigrams and trigrams are most commonly used. Used for large-vocabulary applications such as dictation. Typically trained on a very large corpus (millions of words).
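A bigram model can be estimated from counts, as in this minimal sketch. The three-sentence corpus and the `<s>`/`</s>` boundary markers are illustrative choices, and the estimate is unsmoothed, which a practical model trained on millions of words would not be.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count word histories and word pairs in a whitespace-tokenized corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen histories."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

# Usage with a toy corpus
corpus = ["write a letter", "turn right", "write a note"]
uni, bi = train_bigrams(corpus)
p_letter = bigram_prob("a", "letter", uni, bi)    # 1 of the 2 uses of "a"
p_right = bigram_prob("turn", "right", uni, bi)   # the only successor of "turn"
```

Even this tiny model prefers "right" after "turn" and never proposes "write" there, which is how a language model constrains the search space.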

Slide 23

Decoder: Selects the next set of likely states. Scores incoming features against these states. Drops low-scoring states. Generates results.
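The four decoder steps can be sketched as a beam-pruned Viterbi search. This is a generic illustration, not the Sphinx decoder: the function name `beam_decode`, the beam width, and the two-state example model are all assumptions.

```python
import math

def beam_decode(frames, trans, beam=5.0):
    """Beam-pruned Viterbi search over log scores.

    frames[t][s]: log score of frame t against state s.
    trans[i]: dict mapping each successor state j to log P(j | i).
    Returns (best log score, best state path). Decoding starts in state 0.
    """
    active = {0: (frames[0][0], [0])}            # state -> (score, path)
    for frame in frames[1:]:
        candidates = {}
        for state, (score, path) in active.items():
            # Step 1: select the next set of likely states.
            for nxt, logp in trans[state].items():
                # Step 2: score the incoming frame against that state.
                new_score = score + logp + frame[nxt]
                if nxt not in candidates or new_score > candidates[nxt][0]:
                    candidates[nxt] = (new_score, path + [nxt])
        # Step 3: drop states that fall too far below the current best.
        best = max(score for score, _ in candidates.values())
        active = {s: v for s, v in candidates.items() if v[0] >= best - beam}
    # Step 4: generate the result from the best surviving state.
    return max(active.values(), key=lambda v: v[0])

# Usage: a two-state model where the frames clearly move from state 0 to state 1
trans = {0: {0: math.log(0.5), 1: math.log(0.5)}, 1: {1: 0.0}}
frames = [[0.0, -10.0], [-10.0, 0.0], [-10.0, 0.0]]
score, path = beam_decode(frames, trans)
```

A narrower beam prunes more states and runs faster, at the risk of discarding the path that would have turned out best.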

Slide 24

Speech in Java™ Platform

Slide 25

Sun Java Speech API: First released on October 26, 1998. The Java™ Speech API allows Java applications to incorporate speech technology into their user interfaces. Defines a cross-platform API to support command-and-control recognizers, dictation systems, and speech synthesizers.

Slide 26

Implementations of Java Speech API: Open source: FreeTTS/CMU Sphinx4. IBM Speech for Java. Cloud Garden. L&H TTS for Java Speech API. Conversa Web 3.0.

Slide 27

FreeTTS: Fully implemented in Java. Based on Flite 1.1, a small run-time speech synthesis engine developed at CMU. Partial support for JSAPI 1.0. Speech recognition functions. JSML.

Slide 28

Sphinx 4 (Sphinx 3j): Fully implemented in Java. Speed is equal to or faster than Sphinx3. The acoustic model and language model are under construction. Source code is available via CVS (but you cannot run any applications without models!). For example, to check out Sphinx4, you can use the following command: cvs -z3 co sphinx4

Slide 29

Java™ Platform Issues: GC makes managing data much easier. Native engines typically optimize inner loops for the CPU; that cannot be done on the Java platform. Native engines arrange data to optimize cache hits; that cannot really be done either.

Slide 30

DEMO: Sphinx-II batch mode. Sphinx-II live mode. Sphinx-II client/server mode. A simple FreeTTS application. (Java-based) TTS vs. (C-based) SR. Motion Planner with FreeTTS, using Java Web Start™. (This is the GRA course final project.)

Slide 31

Summary: Sphinx is an open-source speech recognition engine developed at CMU. The FE, KB, and Decoder form the core of an SR system. The FE receives and processes the speech signal. The Knowledge Base provides data for the Decoder. The Decoder searches the states and returns the results. Speech recognition is a challenging problem for the Java platform.

Slide 32

Reference: Mosur K. Ravishankar, Efficient Algorithms for Speech Recognition, CMU, 1996. Mosur K. Ravishankar, Kevin A. Lenzo, Sphinx-II User Guide, CMU, 2001. Xuedong Huang, Alex Acero, Hsiao-Wuen Hon, Spoken Language Processing, Prentice Hall, 2000.

Slide 33

Reference (on-line): On-line documents of the Java™ Speech API. On-line documents of FreeTTS. On-line documents of Sphinx-II.

Slide 34

Q & A