74.406 Natural Language Processing - Speech Processing -


Presentation Transcript

Slide 1

74.406 Natural Language Processing - Speech Processing - Spoken Language Processing: from speech to text, to syntax and semantics, to discourse. Speech Recognition: human speech recognition and production; acoustics; signal analysis; phonetics; recognition methods (HMMs). Review

Slide 3

Speech Production & Reception. Sound and Hearing: change in air pressure → sound wave; reception through the inner-ear membrane / microphone; decomposition into frequency components: receptors in the cochlea / mathematical frequency analysis (e.g. Fast Fourier Transform, FFT) → frequency spectrum; perception/recognition of phonemes and thereby words (e.g. Neural Networks, Hidden Markov Models)

Slide 5

Speech Recognizer Architecture (Fig. 7.2)

Slide 6

Speech Signal. The speech signal is composed of different sine waves with different frequencies and amplitudes. Frequency - waves/second, corresponds to pitch. Amplitude - height of wave, corresponds to loudness. Plus noise (not a sine wave, non-harmonic). The speech signal is a composite signal containing different frequency components.
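The composite signal described on this slide can be sketched numerically: a sum of sine waves plus non-harmonic noise. All parameters (frequencies, amplitudes, duration) are illustrative choices, not values from the slides.

```python
import numpy as np

# Hypothetical parameters: a speech-like signal modeled as a sum of sine
# waves of different frequencies and amplitudes, plus non-harmonic noise.
rate = 10_000                      # samples per second
t = np.arange(0, 0.1, 1 / rate)    # 100 ms of signal

# Two harmonic components: a 120 Hz "fundamental" and a 720 Hz overtone.
signal = 1.0 * np.sin(2 * np.pi * 120 * t) \
       + 0.4 * np.sin(2 * np.pi * 720 * t)

# Additive noise: not a sine wave, spread across all frequencies.
noise = 0.1 * np.random.default_rng(0).standard_normal(t.size)
composite = signal + noise

print(composite.shape)  # 1000 samples = 0.1 s at 10 kHz
```

Varying the component amplitudes changes loudness; varying the frequencies changes pitch, as the slide notes.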

Slide 7

Waveform (Fig. 7.20). Amplitude/Pressure over Time. "She just had a baby."

Slide 8

Waveform for Vowel ae (Fig. 7.21). Amplitude/Pressure over Time

Slide 9

Speech Signal Analysis. Analog-digital conversion of the acoustic signal: sampling in time frames ("windows"); frequency = zero-crossings per time unit, e.g. 2 crossings/second is 1 Hz (1 wave); e.g. 10 kHz requires a sampling rate of 20 kHz; measure amplitudes of the signal in the time frame → digitized waveform; separate the different frequency components → FFT (Fast Fourier Transform) → spectrogram; other frequency-based representations: LPC (linear predictive coding), Cepstrum
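The windowing-plus-FFT steps above can be sketched with plain NumPy. The frame and hop sizes, and the 440 Hz test tone, are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Sketch of the analysis steps: window the digitized wave, FFT each window.
rate = 10_000                          # 10 kHz sampling rate
t = np.arange(0, 1.0, 1 / rate)
wave = np.sin(2 * np.pi * 440 * t)     # a 440 Hz test tone

frame_len, hop = 256, 128              # illustrative window/hop sizes
frames = [wave[i:i + frame_len]
          for i in range(0, len(wave) - frame_len + 1, hop)]

# Magnitude spectrum per frame: the columns of a spectrogram.
window = np.hanning(frame_len)
spectrogram = np.array([np.abs(np.fft.rfft(f * window)) for f in frames])

# The strongest bin should sit near 440 Hz (bin width = rate / frame_len).
freqs = np.fft.rfftfreq(frame_len, d=1 / rate)
peak = freqs[spectrogram[0].argmax()]
print(peak)
```

Stacking the per-frame spectra over time is exactly the spectrogram of Figs. 7.20/7.23; LPC and cepstral features are alternative representations computed from the same frames.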

Slide 10

Waveform and Spectrogram (Figs. 7.20, 7.23)

Slide 11

Waveform and LPC Spectrum for Vowel ae (Figs. 7.21, 7.22). Axis labels: Amplitude/Pressure over Time (waveform); Energy over Frequency, with Formants marked (LPC spectrum)

Slide 12

Speech Signal Characteristics. From the signal representation derive, e.g.: formants - dark stripes in the spectrum, strong frequency components; characterize particular vowels and the gender of the speaker. pitch - fundamental frequency, baseline for higher-frequency sounds such as formants; a gender characteristic. change in frequency distribution - characteristic of e.g. plosives (manner of articulation)
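Pitch, the fundamental frequency mentioned above, can be estimated from a signal frame; a common approach (not spelled out on the slides, so treat this as an illustrative method) is autocorrelation, where the first strong peak after lag 0 marks the pitch period. The 125 Hz fundamental and the search range are assumed values.

```python
import numpy as np

# Synthetic voiced frame: a 125 Hz fundamental (typical male pitch) plus
# a weaker second harmonic. All parameters are illustrative.
rate = 10_000
t = np.arange(0, 0.05, 1 / rate)
f0 = 125
frame = np.sin(2 * np.pi * f0 * t) + 0.3 * np.sin(2 * np.pi * 2 * f0 * t)

# Autocorrelation; keep non-negative lags only.
ac = np.correlate(frame, frame, mode="full")[frame.size - 1:]

# Search for the peak between 400 Hz and 50 Hz (plausible pitch range).
lo = int(rate / 400)
hi = int(rate / 50)
lag = lo + ac[lo:hi].argmax()
print(rate / lag)   # estimated pitch in Hz
```

Formants, by contrast, show up as peaks of the spectral envelope (e.g. of the LPC spectrum in Fig. 7.22) rather than as a single periodicity.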

Slide 15

Video of glottis and speech signal in lingWAVES (from http://www.lingcom.de)

Slide 19

Phoneme Recognition. Process based on features extracted from spectral analysis, phonological rules, and statistical properties of language/pronunciation. Recognition methods: Hidden Markov Models, Neural Networks, pattern classification in general

Slide 20

Pronunciation Networks/Word Models as Probabilistic FAs (Fig 5.12)

Slide 21

Pronunciation Network for "about" (Fig 5.13)

Slide 22

Word Recognition with Probabilistic FA/Markov Chain (Fig 5.14)
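A pronunciation network like those in Figs. 5.12-5.14 can be sketched as a probabilistic FA: states are phones, edges carry transition probabilities. The phones below loosely follow the "about" example (ax b aw t), but the variant phones and all probabilities are invented for illustration, not taken from Fig. 5.13.

```python
# Hypothetical word model for "about" as a weighted FA (probabilities invented).
word_model = {
    "start": [("ax", 1.0)],
    "ax":    [("b", 1.0)],
    "b":     [("aw", 1.0)],
    "aw":    [("t", 0.7), ("dx", 0.3)],   # two pronunciation variants
    "t":     [("end", 1.0)],
    "dx":    [("end", 1.0)],
}

def path_probability(path):
    """Probability of one path through the word model (0.0 if impossible)."""
    p = 1.0
    for s, s_next in zip(path, path[1:]):
        p *= dict(word_model[s]).get(s_next, 0.0)
    return p

print(path_probability(["start", "ax", "b", "aw", "t", "end"]))   # 0.7
print(path_probability(["start", "ax", "b", "aw", "dx", "end"]))  # 0.3
```

Word recognition then amounts to finding the pronunciation path that best matches the observed phone sequence, which is what the Viterbi algorithm on the next slides does.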

Slide 23

Viterbi Algorithm - Overview (cf. Jurafsky Ch. 5). The Viterbi algorithm finds an optimal sequence of states in continuous Speech Recognition, given an observation sequence of phones and a probabilistic (weighted) FA (state graph). The algorithm returns the path through the automaton which has maximum probability and accepts the observation sequence. a[s,s'] is the transition probability (in the phonetic word model) from current state s to next state s', and b[s',o_t] is the observation likelihood of s' given o_t. b[s',o_t] is 1 if the observation symbol matches the state, and 0 otherwise.

Slide 24

Viterbi Algorithm (Fig. 5.19)

function VITERBI(observations of len T, state-graph) returns best-path
  num-states ← NUM-OF-STATES(state-graph)
  Create a path probability matrix viterbi[num-states+2, T+2]
  viterbi[0,0] ← 1.0
  for each time step t from 0 to T do
    for each state s from 0 to num-states do
      for each transition s' from s in state-graph
        new-score ← viterbi[s,t] * a[s,s'] * b[s',o_t]
        if ((viterbi[s',t+1] = 0) || (new-score > viterbi[s',t+1])) then
          viterbi[s',t+1] ← new-score
          back-pointer[s',t+1] ← s
  Backtrace from the highest-probability state in the final column of viterbi[] and return path

(Figure labels: word model; observation; speech recognizer)
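The Viterbi pseudocode above can be made runnable; this sketch keeps the best path per state directly instead of a separate back-pointer matrix, which is equivalent for small examples. The toy "need"/"neat" word model and its probabilities are invented for illustration.

```python
# Viterbi over a weighted FA: a[s][s'] are transition probabilities,
# b(s, o) the 0/1 observation likelihoods, as on the slide.
def viterbi(observations, a, b, start="start"):
    # best[s] holds (probability, path) of the best path reaching state s.
    best = {start: (1.0, [start])}
    for o in observations:
        nxt = {}
        for s, (p, path) in best.items():
            for s2, trans in a.get(s, {}).items():
                score = p * trans * b(s2, o)
                if score > nxt.get(s2, (0.0, None))[0]:
                    # Keeping the winning path replaces the back-pointers.
                    nxt[s2] = (score, path + [s2])
        best = nxt
    # Backtrace step: return the highest-probability surviving path.
    return max(best.values(), default=(0.0, []))

# Toy word model: "need" (n iy d) vs. "neat" (n iy t), invented numbers.
a = {"start": {"n": 1.0}, "n": {"iy": 1.0},
     "iy": {"d": 0.6, "t": 0.4}, "d": {}, "t": {}}
b = lambda state, obs: 1.0 if state == obs else 0.0

prob, path = viterbi(["n", "iy", "d"], a, b)
print(prob, path)
```

With the 0/1 observation likelihoods of the slide, only paths that match the phone sequence survive, and the transition probabilities decide between competing pronunciations.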

Slide 25

Viterbi Algorithm Explanation (cf. Jurafsky Ch. 5). The Viterbi algorithm sets up a probability matrix, with one column for each time index t and one row for each state in the state graph. Each column has a cell for each state q_i in the single combined automaton for the competing words (in the recognition process). The algorithm first creates N+2 state columns. The first column is an initial pseudo-observation, the second corresponds to the first observation phone, the third to the second observation, and so on. The final column again represents a pseudo-observation. In the first column, the probability of the Start state is initially set to 1.0; the other probabilities are 0. Then we move to the next step: for each state in column 0, we compute the probability of moving into each state in column 1. The value viterbi[t,j] is computed by taking the maximum over the extensions of all the paths that lead to the current cell. An extension of a path from state i at time t-1 is computed by multiplying three factors: the previous path probability from the previous cell, viterbi[t-1,i]; the transition probability a_ij from previous state i to current state j; and the observation likelihood b_jt that current state j matches observation symbol t. b_jt is 1 if the observation symbol matches the state; 0 otherwise.

Slide 26

Speech Recognition pipeline:
acoustic/sound wave → filtering, sampling → spectral analysis (FFT) → frequency spectrum → features (phonemes; context) [signal processing/analysis]
→ phoneme recognition (HMM, Neural Networks) → phonemes
→ grammar or statistics → phoneme sequences/words
→ grammar or statistics for likely word sequences → word sequence/sentence

Slide 27

Speech Recognizer Architecture (Fig. 7.2)

Slide 28

Speech Processing - Important Types and Characteristics: isolated word vs. continuous speech; unlimited vs. large vs. small vocabulary; speaker-dependent vs. speaker-independent; training (or not); Speech Recognition vs. Speaker Identification

Slide 29

Natural Language and Speech Processing. Natural Language Processing: written text as input; sentences (well-formed). Speech Recognition: acoustic signal as input; transformation into written words. Spoken Language Understanding: analysis of spoken language (transcribed speech)

Slide 31

Speech & Natural Language Processing. Areas in Natural Language Processing: morphology; grammar & parsing (syntactic analysis); semantics; pragmatics; discourse/dialog; spoken language understanding. Areas in Speech Recognition: signal processing; phonetics; word recognition

Slide 32

Additional References. Huang, X., A. Acero & H. Hon: Spoken Language Processing. A Guide to Theory, Algorithm and System Development. Prentice-Hall, NJ, 2001. Figures taken from: Jurafsky, D. & J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000, Chapters 5 and 7. lingWAVES: http://www.lingcom.de. NL and Speech Resources and Tools: German Demonstration Center for Speech and Language Technologies: http://www.lt-demo.org/

Slide 33

Speech Recognition Phases. Speech Recognition: acoustic signal as input; signal analysis - spectrogram; feature extraction; phoneme recognition; word recognition; transformation into written words