Overview of the TDT 2001 Evaluation and Results


Presentation Transcript

Slide 1

Overview of the TDT 2001 Evaluation and Results
Jonathan Fiscus
Gaithersburg Holiday Inn, Gaithersburg, Maryland
November 12-13, 2001

Slide 2

Outline
TDT Evaluation Overview
2001 TDT Evaluation Result Summaries: First Story Detection (FSD), Topic Detection, Topic Tracking, Link Detection
Other Investigations
www.nist.gov/TDT

Slide 3

TDT 101: "Applications for organizing content"
Terabytes of unorganized data
5 TDT Applications: Story Segmentation, Topic Tracking, Topic Detection, First Story Detection, Link Detection

Slide 4

TDT's Research Domain
Technology challenge: develop applications that organize and locate relevant stories from a continuous feed of news stories
Research driven by evaluation tasks
Composite applications built from: Automatic Speech Recognition, Story Segmentation, Document Retrieval

Slide 5

Definitions
A topic is … a seminal event or activity, along with all directly related events and activities.
A story is … a topically cohesive segment of news that includes two or more DECLARATIVE independent clauses about a single event.

Slide 6

Example Topic
Title: Mountain Hikers Lost
WHAT: 35 or 40 young mountain hikers were lost in an avalanche in France around the 20th of January.
WHERE: Orres, France
WHEN: January 1998
RULES OF INTERPRETATION: Rule #5. Accidents

Slide 7

TDT 2001 Evaluation Corpus
TDT3 + Supplemental Corpora used for the evaluation * †
TDT3 Corpus
  Third consecutive use for evaluations
  XXX stories, 4th Qtr. 1998
  Used for Tracking and Link Detection development test
Supplement of 35K stories added to TDT3
  No annotations
  Data added from both 3rd and 4th Qtr. 1998
  Not used for FSD tests
LDC Annotations †
  120 fully annotated topics: divided into published and withheld sets
  120 partially annotated topics
  FSD used all 240 topics
  Topic Detection used the 120 fully annotated topics
  Tracking and Link Detection used the 60 fully annotated withheld topics
* see www.nist.gov/speech/tests/tdt/tdt2001 for details
† see www.ldc.upenn.edu/Projects/TDT3/ for details

Slide 8

TDT3 Topic Division
TDT 2000 Systems
Two topic sets: Published topics, Withheld topics
Selection criteria:
  60 topics per set
  30 of the 1999 topics
  30 of the 2000 topics
  Balanced by number of on-topic stories

Slide 9

TDT Evaluation Methodology
Evaluation tasks are cast as detection tasks: YES there is a target, or NO there is not
Performance is measured in terms of detection cost: "a weighted sum of missed detection and false alarm probabilities"
$C_{Det} = C_{Miss} \cdot P_{Miss} \cdot P_{target} + C_{FA} \cdot P_{FA} \cdot (1 - P_{target})$
$C_{Miss} = 1$ and $C_{FA} = 0.1$ are preset costs
$P_{target} = 0.02$ is the a priori probability of a target
Detection cost is normalized to generally lie between 0 and 1:
$(C_{Det})_{Norm} = C_{Det} / \min\{C_{Miss} \cdot P_{target},\ C_{FA} \cdot (1 - P_{target})\}$
When based on the YES/NO decisions, it is referred to as the actual decision cost
Detection Error Tradeoff (DET) curves graphically depict the performance tradeoff between $P_{Miss}$ and $P_{FA}$
  They make use of the likelihood scores attached to the YES/NO decisions
  The minimum DET point is the best score a system could achieve with proper thresholds
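The detection cost on this slide can be written out in a few lines of Python. The following is a minimal sketch, not the official NIST scoring tool: the function names and the `(score, is_target)` input format are assumptions made for illustration, and the threshold sweep is just one simple way to locate the minimum DET point. Only the cost constants and the two formulas come from the slide.

```python
def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Weighted sum of miss and false alarm probabilities (constants from the slide)."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def normalized_detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Detection cost divided by the cheaper of the two trivial systems (always NO / always YES)."""
    floor = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return detection_cost(p_miss, p_fa, c_miss, c_fa, p_target) / floor


def min_det_cost(scored_trials):
    """Hypothetical helper: best normalized cost over all decision thresholds.

    scored_trials is an assumed format: a list of (score, is_target) pairs,
    where a higher score means "more likely a target". The list must contain
    at least one target and one non-target trial.
    """
    n_target = sum(1 for _, is_target in scored_trials if is_target)
    n_nontarget = len(scored_trials) - n_target
    # Candidate thresholds: every distinct score, plus one above the maximum (all NO).
    thresholds = sorted({score for score, _ in scored_trials})
    thresholds.append(thresholds[-1] + 1.0)
    best = float("inf")
    for th in thresholds:
        # A trial is answered YES when its score is at or above the threshold.
        misses = sum(1 for s, t in scored_trials if t and s < th)
        false_alarms = sum(1 for s, t in scored_trials if not t and s >= th)
        cost = normalized_detection_cost(misses / n_target, false_alarms / n_nontarget)
        best = min(best, cost)
    return best
```

With the preset costs, a system that answers NO to every trial scores a normalized cost of exactly 1.0, while one that answers YES to every trial scores about 4.9, which is why the normalized cost only "generally" lies between 0 and 1.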

Slide 10

TDT: Experimental Control
Good research requires experimental controls
Conditions that affect performance in TDT:
  Newswire vs. Broadcast News
  Manual vs. automatic transcription of Broadcast News
  Manual vs. automatic story segmentation
  Mono vs. multilingual language material
  Topic training amounts and languages
  Default automatic English translations of Mandarin vs. native Mandarin orthography
  Decision deferral periods

Slide 11

Outline
TDT Evaluation Overview
2001 TDT Evaluation Result Summaries: First Story Detection (FSD), Topic Detection, Topic Tracking, Link Detection
Other Investigations

Slide 12

First Story Detection Results
System Goal: To detect the first story that discusses each topic
Evaluating "part" of a Topic Detection system, i.e., when to start a new cluster
[Figure: First Stories on two topics (Topic 1, Topic 2) vs. Not First Stories]

Slide 13

2001 TDT Primary FSD Results
Newswire + BNews ASR, English texts, automatic story boundaries, 10 File Deferral

Slide 14

TDT Topic Detection Task
System Goal: To detect topics in terms of the (clusters of) stories that discuss them.
"Unsupervised" topic training
New topics must be detected as the incoming stories are processed.
Input stories are then associated with one of the topics.
[Figure: Story Stream sorted into Topic 1 and Topic 2]
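To make the task concrete, here is a minimal sketch of single-pass clustering over a story stream. It is not the method of any TDT 2001 participant: the names `cosine` and `detect_topics`, the whitespace tokenization, the raw term counts, and the threshold of 0.2 are all assumptions chosen for illustration, and the deferral period is not modeled. A story that starts a new cluster plays the role of a "first story" in the FSD task.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_topics(stories, threshold=0.2):
    """Assign each incoming story to a cluster index, creating clusters on the fly.

    stories: iterable of story texts, in arrival order.
    threshold: assumed similarity cutoff below which a new cluster is started.
    """
    clusters = []     # one Counter of summed term counts per cluster
    assignments = []  # cluster index chosen for each story, in order
    for text in stories:
        vec = Counter(text.lower().split())
        scores = [cosine(vec, cluster) for cluster in clusters]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < threshold:
            # No existing cluster is similar enough: this story starts a new topic.
            clusters.append(Counter(vec))
            assignments.append(len(clusters) - 1)
        else:
            # Fold the story into the closest cluster.
            clusters[best].update(vec)
            assignments.append(best)
    return assignments
```

Raising the threshold makes the sketch start new clusters more readily (fewer false alarms inside a cluster, more missed stories), which is the kind of tradeoff the detection cost is meant to score.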

Slide 15

Primary Topic Detection Sys.
Newswire + BNasr, Multilingual, Auto Boundaries, Deferral=10
Chart labels: Native Mandarin, Translated Mandarin

Slide 16

Topic Tracking Task
System Goal: To detect stories that discuss the target topic, in multiple source streams.
Supervised Training: given N_t sample stories that discuss a given target topic
Testing: find all subsequent stories that discuss the target topic
[Figure: training data (on-topic) followed by test data (unknown)]

Slide 17

Primary Tracking Results
Newswire + BNman, English Training: 1 Positive, 0 Negative

Slide 18

TDT Link Detection Task
System Goal: To detect whether a pair of stories discuss the same topic.
(Can be thought of as a "primitive operator" to build a variety of applications)

Slide 19

Primary Link Det. Results
Newswire + BNasr, Deferral=10
NTU's thresholding is unusual
Chart labels: Native Mandarin, Translated Mandarin

Slide 20

Outline
TDT Evaluation Overview
2001 TDT Evaluation Result Summaries: First Story Detection (FSD), Topic Detection, Topic Tracking, Link Detection
Other Investigations

Slide 21

Primary Topic Detection Sys.
Newswire + BNasr, Multilingual, Auto Boundaries, Deferral=10

Slide 22

Topic Detection: False Alarm Visualization
Systems behave differently
IMHO a user would not like to use a high FA rate system
Perhaps false alarms should get more weight in the cost function
Legend:
  Outer Circle: number of stories in a cluster (Light => cluster was mapped to a reference topic; Blue => unmapped cluster)
  Inner Circle: number of on-topic stories
[Figure panels: UMass1 and TNO1-late; axes: Topic ID vs. system clusters, ordered by size]

Slide 23

Topic Detection: 2000 vs. 2001 Index Files
Multilingual Text, Newswire + Broadcast News, Auto Boundaries, Deferral=10
The 2000 test corpus covered 3 months
The 2001 corpus covered 6 months: 35K more stories
Could affect performance, BUT appears not to.

Slide 24

Topic Detection Evaluation via a Link-Style Metric
Motivation:
  There is instability of measured performance during system tuning
  Likely to be a direct result of the need to map reference topic clusters to system-defined clusters
  We would like to avoid the assumption of independent topics

Slide 25

Topic Detection Evaluation via a Link-Style Metric
Evaluation Criterion: "Do this pair of stories discuss the same topic?"
If a story pair is on the same topic:
  A missed detection is declared if the system put the stories in different clusters
  Otherwise, it's a correct detection
If a pair of stories is not on the same topic:
  A false alarm is declared if the system put the stories in the same cluster
  Otherwise, it's a correct non-detection
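The pairwise criterion on this slide translates directly into code. The sketch below is illustrative only: the function name `link_style_error_rates` and the dictionary input format are assumptions, and it simplifies by assuming every story carries exactly one reference topic label and one system cluster label.

```python
from itertools import combinations


def link_style_error_rates(reference_topics, system_clusters):
    """Score a clustering by checking every pair of stories.

    reference_topics: dict mapping story id -> reference topic label (assumed format).
    system_clusters:  dict mapping story id -> system cluster label (same keys).
    Returns (p_miss, p_fa).
    """
    misses = targets = false_alarms = nontargets = 0
    for a, b in combinations(sorted(reference_topics), 2):
        same_topic = reference_topics[a] == reference_topics[b]
        same_cluster = system_clusters[a] == system_clusters[b]
        if same_topic:
            targets += 1
            if not same_cluster:
                misses += 1        # on-topic pair split across clusters
        else:
            nontargets += 1
            if same_cluster:
                false_alarms += 1  # off-topic pair merged into one cluster
    p_miss = misses / targets if targets else 0.0
    p_fa = false_alarms / nontargets if nontargets else 0.0
    return p_miss, p_fa
```

Because every pair is scored on its own, no mapping from reference topics to system clusters is needed. The pair count grows quadratically with the number of stories, so a corpus-scale implementation would likely count pairs per cluster rather than enumerate them.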

Slide 26

Link-Based vs. Topic Detection Metrics: Parameter Optimization Sweep
System 1: 62K Test Stories, 98 Topics
System 2: 27K Test Stories, 31 Topics
The link curve is less erratic for System 1
Link curve is higher: What does this mean?

Slide 27

What can be learned?
Are all the experimental controls necessary?
Tracking performance degrades 50% going from manual to automatic transcription, and an additional 50% going to automatic boundaries
Cross-language issues still not solved
Most systems used only the required deferral period
Progress was modest: did the lack of a new evaluation corpus impede research?

Slide 28

Summary
TDT Evaluation Overview
2001 TDT Evaluation Results
Evaluating Topic Detection with the Link-based metric is feasible, but inconclusive
The TDT3 corpus annotations are now public!