An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops

Presentation Transcript

Slide 1

An Exemplar-based Approach to Automatic Burst Detection in Voiceless Stops YAO UC BERKELEY YAOYAO@BERKELEY.EDU http://linguistics.berkeley.edu/~yaoyao JULY 25, 2008

Slide 2

Overview: Background; Data; Methodology (Algorithm, Tuning the model, Testing); Results; General Discussion

Slide 3

Background. Purpose of the study: to find the point of burst in a word-initial voiceless stop (i.e. [p], [t], [k]). [Figure labels: closure, release, vowel onset] Existing approach: detecting the point of maximal energy change (cf. Niyogi and Ramesh, 1998; Liu, 1996).

Slide 4

Background. Our approach: Compare the spectrogram of the target token at every time point against that of fricatives and silence. Assess how "fricative-like" and "silence-like" the spectrogram is at each time point. Find the point where "fricative-ness" suddenly rises and "silence-ness" suddenly drops → point of burst.

Slide 5

Background. Our approach (cont'd). What do we need? Spectral features of a given time frame. Spectral templates of fricatives and silence, specific to the speaker and the recording environment. A way to measure and compare fricative-ness and silence-ness. An algorithm to find the most likely point of release. Advantages: easy to implement; no worries about variation in the environment and individual differences.

Slide 6

Data. Buckeye corpus (Pitt, M. et al. 2005): 40 speakers, all residents of Columbus, Ohio; balanced in gender and age; one-hour interview each; transcribed at word and phone level; 19 speakers used in the current study. Target tokens: transcribed word-initial voiceless stops (e.g. [p], [t], [k]).

Slide 7

Methodology: spectral measures. Spectral vector: 20ms Hamming window; Mel scale; 1 × 60 array. Spectral template: speaker-specific and phone-specific. Ignore tokens shorter than the average duration of that phone for the speaker. For the remaining tokens, calculate a spectral vector for the middle 20ms window, then average over the spectral vectors.
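The template procedure on this slide can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: the triangular Mel filterbank, the log compression, the 16 kHz sampling rate used in the usage note, and all function names (`mel_filterbank`, `spectral_vector`, `spectral_template`) are hypothetical choices that the slides do not specify.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_bands, n_fft, sr):
    """Triangular filters evenly spaced on the Mel scale (an assumed design)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_bands + 2))
    bank = np.zeros((n_bands, freqs.size))
    for i in range(n_bands):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def spectral_vector(frame, sr, n_bands=60):
    """1 x 60 Mel-scale vector for one Hamming-windowed frame."""
    windowed = frame * np.hamming(frame.size)
    power = np.abs(np.fft.rfft(windowed)) ** 2
    bank = mel_filterbank(n_bands, frame.size, sr)
    # Log compression is an assumption; the slides only say "Mel scale".
    return np.log(bank @ power + 1e-10)

def spectral_template(tokens, sr, win_s=0.020):
    """Average the middle-window vectors of all tokens of one phone, one speaker."""
    n = int(round(win_s * sr))
    mean_len = np.mean([t.size for t in tokens])
    vectors = []
    for t in tokens:
        # Skip tokens shorter than the average duration (or than one window).
        if t.size < mean_len or t.size < n:
            continue
        start = (t.size - n) // 2
        vectors.append(spectral_vector(t[start:start + n], sr))
    if not vectors:
        raise ValueError("no token is long enough to contribute")
    return np.mean(vectors, axis=0)
```

As a usage example, `spectral_template(sh_tokens, sr=16000)` would return one 60-element template for a speaker's [sh] tokens, where `sh_tokens` is a hypothetical list of 1-D sample arrays.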

Slide 8

Methodology: spectral template. [Figure panels: [a] of F01; [f] of F01; silence of F01]

Slide 9

Methodology: similarity scores. Similarity between spectral vectors x and u: $D_{x,u} =$ [distance formula missing from the transcript]; $S_{x,u} = e^{-0.005 D_{x,u}}$. The given acoustic data can be compared against any spectral template of that speaker. Step size = 5ms.
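A small sketch of the similarity score may help here. The exponential mapping with the constant 0.005 comes from the slide, but the distance formula did not survive in the transcript, so the Euclidean distance used below is an assumption, as are the function names `similarity` and `score_track`.

```python
import numpy as np

def similarity(x, u, scale=0.005):
    """S = exp(-scale * D), with D assumed to be the Euclidean distance."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    d = np.linalg.norm(x - u)  # assumption: the slide's D formula is not recoverable
    return float(np.exp(-scale * d))

def score_track(vectors, template, scale=0.005):
    """Similarity of each frame (one vector per 5 ms step) to one template."""
    return np.array([similarity(v, template, scale) for v in vectors])
```

Identical vectors score 1, and the score decays toward 0 as the distance grows, so a track of scores against the [sh] template reads as "fricative-ness" over time.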

Slide 10

Similarity scores. Formulae: $D_{x,t} =$ [distance formula missing from the transcript]; $S_{x,t} = e^{-0.005 D_{x,t}}$. Step size = 5ms. [Figure legend: [s] score; <sil> score]

Slide 11

Methodology: finding the release point. Basic idea: near the release point, the fricative similarity score rises and the silence similarity score drops. [Figure labels: closure, release, vowel onset] Q1: Which fricative to use? Q2: Which period of rise or drop to pick?

Slide 12

Methodology: finding the release point. Slope is a better predictor than absolute score value. The end point of a period with maximal slope → the release point. Which fricative? The [sh] score is more consistent than those of other fricatives. [Figure legend: [h], [s], [sh], <sil> similarity scores]

Slide 13

Methodology: finding the release point. [Figure panels: initial [t] in "doing"; initial [k] in "nations". Legend for each panel: [h], [s], [sh], <sil>]

Slide 14

Methodology: finding the release point. Original algorithm: Find the end point of a period of fastest increase in the <sh> score. Find the end point of a period of fastest decrease in the <sil> score. Return the midpoint of the two end points as the point of release. If either or both end points cannot be found within the duration of the stop, return NULL.
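The original algorithm on this slide can be sketched as follows. This is an illustration only, under two assumptions not stated in the slides: a "period" is taken to be a single 5 ms step between consecutive score frames, and an end point counts as "not found" when the score never moves in the required direction. The names `end_of_steepest_change` and `estimate_release` are hypothetical.

```python
import numpy as np

STEP_S = 0.005  # 5 ms step size, from the slides

def end_of_steepest_change(scores, rising):
    """Frame index at the end of the steepest rise (or drop), or None."""
    diffs = np.diff(np.asarray(scores, dtype=float))
    if diffs.size == 0:
        return None
    if not rising:
        diffs = -diffs  # turn the steepest drop into the largest value
    i = int(np.argmax(diffs))
    if diffs[i] <= 0:
        return None  # the score never moves in the required direction
    return i + 1  # frame at the end of the steepest step

def estimate_release(sh_scores, sil_scores, start_s=0.0, step_s=STEP_S):
    """Midpoint of the two end points, in seconds, or None (NULL)."""
    p1 = end_of_steepest_change(sh_scores, rising=True)
    p2 = end_of_steepest_change(sil_scores, rising=False)
    if p1 is None or p2 is None:
        return None
    return start_s + step_s * (p1 + p2) / 2.0
```

Both score tracks are assumed to cover only the labeled stop span, with `start_s` giving the time of the first frame.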

Slide 15

Methodology: finding the release point. Select two speakers' data to tune the model. Hand-tag the release point for all tokens in the test set. If the stop does not appear to have a release point on the spectrogram, mark it as a problematic case and take the end point of the stop as the release point for the purpose of calculating error.

Slide 16

Methodology: problematic cases. No burst; no closure; weak and double release(??). [Figure legend: [sh], <sil>]

Slide 17

Methodology: finding the release point. Calculate the difference between the hand-tagged release point and the estimated one (i.e. the error) for each case. The RMS (Root Mean Square) of the error is used to measure the performance of the algorithm.
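The error measure described here is the standard root mean square of per-token differences. A minimal sketch, assuming that tokens for which the algorithm returned no estimate (`None`) are left out of the average; the slides do not say how such cases were handled, and the function name `rms_error` is hypothetical.

```python
import math

def rms_error(hand_tagged, estimated):
    """RMS of (estimated - hand_tagged), skipping tokens with no estimate."""
    pairs = [(h, e) for h, e in zip(hand_tagged, estimated) if e is not None]
    if not pairs:
        raise ValueError("no estimated release points to evaluate")
    return math.sqrt(sum((e - h) ** 2 for h, e in pairs) / len(pairs))
```

For example, errors of 3 ms and 4 ms on two tokens give an RMS of sqrt((9 + 16) / 2) ≈ 3.54 ms.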

Slide 18

Methodology: error analysis. F07 (n=231 tokens): RMS = 7.22ms; 4.85ms after adding 5ms to the estimate. M08 (n=261 tokens): RMS = 13.11ms; 14.ms after adding 5ms to the estimate. [Figure axes: true release vs. estimate]

Slide 19

Methodology: tuning the algorithm. 1st Rejection Rule: a target token will be rejected if the changes in scores are not drastic enough. E.g. [figure: [sh] and <sil> scores] Insignificant rise → Reject!

Slide 20

Methodology: tuning the algorithm. Applying the 1st Rejection Rule. Rejecting 4 cases in F07: RMS(+5ms) = 4.19ms. Rejecting 28 cases in M08, covering most of the problematic cases: RMS(+5ms) = 9.27ms. Error analysis in M08 after the 1st rejection rule: RMS(+5ms) = 14ms → 9.27ms.

Slide 21

Methodology: tuning the algorithm. Still a problem… Multiple releases: each may correspond to a rise/drop of the scores. [Figure: initial [k] in "cause" of M08; legend: [sh], <sil>]

Slide 22

Methodology: tuning the algorithm. 2nd Rejection Rule: a target token will be rejected if the points found in the <sh> and <sil> scores are too far apart (>20ms). This partly solves the multiple-release problem. The ideal way would be to identify all candidate release points and return the first.

Slide 23

Methodology: tuning the algorithm. Applying the 2nd Rejection Rule. Rejecting 3 cases in F07: RMS(+5ms) = 3.22ms. Rejecting 20 cases in M08, after which only 2 problematic cases remain: RMS(+5ms) = 3.44ms. Error analysis in M08 after the 2nd rejection rule: RMS(+5ms) = 9.26ms → 3.44ms. Compare: the optimal error is 2.5ms given the 5ms step size…

Slide 24

Methodology: tuning the algorithm. F07: rejection rate 3.03%. M08: rejection rate 15.05%.

Slide 25

Methodology: testing the algorithm. Select a random sample of 50 tokens from all speakers. Hand-tag the release point. Use the current algorithm together with the two rejection rules to find the estimated release. Compare the hand-tagged point and the estimated one. 4 rejections by the 1st rule (3 were genuine). 3 rejections by the 2nd rule (2 were genuine). 43 accepted cases. RMS(error) < 5ms.

Slide 26

Methodology: summary (flowchart). (1) Calculate the <silence> score and the <sh> score. (2) Calculate the slope in the <silence> score and the <sh> score. (3) In a labeled voiceless stop span, (i) find the time point of largest positive slope in the <sh> score and store it in p1; (ii) find the time point of smallest negative slope in the <silence> score and store it in p2. (4) p1 = null or p2 = null? Y: reject the case. N: continue. (5) slope(p1) < 0.02 and slope(p2) > 0.04? Y: reject the case. N: continue. (6) |p1–p2| >= 0.02 s? Y: reject the case. N: return (p1+p2)/2 + 0.005.
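The summary flowchart can be read as the following sketch, which combines the release estimate with both rejection rules and the 5 ms correction. Several points are assumptions rather than facts from the slides: slopes are taken as score differences per 5 ms step, the 0.04 threshold is read as the size of the drop in the <silence> score (the sign is unclear in the transcript), and the name `detect_release` is hypothetical.

```python
import numpy as np

STEP_S = 0.005        # 5 ms step size, from the slides
MIN_SH_RISE = 0.02    # 1st rule threshold on the <sh> slope (units assumed per step)
MIN_SIL_DROP = 0.04   # 1st rule threshold on the <silence> drop (sign assumed)
MAX_GAP_STEPS = 4     # 2nd rule: |p1 - p2| >= 0.02 s, i.e. 4 steps of 5 ms
OFFSET_S = 0.005      # correction added to the estimate, from the slides

def detect_release(sh_scores, sil_scores, start_s=0.0):
    """Return the estimated release time in seconds, or None if rejected."""
    sh_slope = np.diff(np.asarray(sh_scores, dtype=float))
    sil_slope = np.diff(np.asarray(sil_scores, dtype=float))
    if sh_slope.size == 0 or sil_slope.size == 0:
        return None
    i1 = int(np.argmax(sh_slope))   # steepest rise in the <sh> score
    i2 = int(np.argmin(sil_slope))  # steepest drop in the <silence> score
    # p1 or p2 is null: no rise or no drop inside the stop span.
    if sh_slope[i1] <= 0 or sil_slope[i2] >= 0:
        return None
    # 1st rejection rule: changes in both scores are not drastic enough.
    if sh_slope[i1] < MIN_SH_RISE and -sil_slope[i2] < MIN_SIL_DROP:
        return None
    # 2nd rejection rule: the two points are too far apart.
    if abs(i1 - i2) >= MAX_GAP_STEPS:
        return None
    p1 = start_s + STEP_S * (i1 + 1)  # end point of the steepest rise
    p2 = start_s + STEP_S * (i2 + 1)  # end point of the steepest drop
    return (p1 + p2) / 2.0 + OFFSET_S
```

The gap between the two points is compared in whole steps rather than in seconds to avoid floating-point rounding at exactly 0.02 s.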

Slide 27

Results: grand means. Rejection rates (2 rules combined): vary from 3.03% to 30.5% (mean = 13.3%, sd = 8.6%) across speakers. VOT and closure duration.

Slide 28

Results: VOT by speaker

Slide 29

General Discussion. Echoing previous findings: Byrd (1993): closure duration and VOT in read speech. Shattuck-Hufnagel & Veilleux (2007): 13% of landmarks missing in spontaneous speech.

Slide 30

General Discussion. Future work: Fine-tune the 2nd rejection rule. Generalize the exemplar-based method to other automatic phonetic processing problems?

Slide 31

Acknowledgment: Anonymous speakers; Buckeye corpus developers; Prof. Keith Johnson; members of the phonology lab at UC Berkeley. Thank you! Any comments are welcome.

Slide 32

References
Byrd, D. (1993) 54,000 American stops. UCLA Working Papers in Phonetics. No 83, pp: 97-116.
Johnson, K. (2006) Acoustic attribute scoring: A preliminary report.
Liu, S. (1996) Landmark detection for distinctive feature-based speech recognition. J. Acoust. Soc. Amer. Vol 100, pp: 3417-3430.
Niyogi, P., Ramesh, P. (1998) Incorporating voice onset time to improve letter recognition accuracies. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '98. Vol 1, pp: 13-16.
Pitt, M. et al. (2005) The Buckeye Corpus of conversational speech: labeling conventions and a test of transcriber reliability. Speech Communication. Vol 45, pp: 90-95.
Shattuck-Hufnagel, S., Veilleux, N.M. (2007) Robustness of acoustic landmarks in spontaneously-spoken American English. Proceedings of International Congress of Phonetic Science 2007, Saarbrucken, August 2007.
Zue, V.W. (1976) Acoustic characteristics of stop consonants: A controlled study. Sc.D. thesis. MIT, Cambridge, MA.