Learning Semantic Descriptions of Web Information Sources

Presentation Transcript

Slide 1

Learning Semantic Descriptions of Web Information Sources

* I am currently looking for a postdoc position in Europe, somewhere near northern Italy …

Slide 2

Mediators & Source Definitions

Outline: Motivation, Approach, Search, Scoring, Experiments, Related Work, Conclusions

Explosion of online information sources
Mediators run queries over multiple sources
Mediators require declarative source definitions
New service: model it automatically?

Diagram:
Query: SELECT MIN(price) FROM flight WHERE depart="MXP" AND arrive="HYD"
Mediator, using source definitions for: Orbitz Flight Search, KLM Online, Qantas Specials
Reformulated query: lowestFare("MXP","HYD")
Reformulated query: calcPrice("MXP","HYD","economy")
New service: Alitalia. Generate a model of the service?

IJCAI-07

Slide 3

Modeling Sources: an Example

Known Source 1: source1($zip, lat, long) :- centroid(zip, lat, long).
Known Source 2: source2($lat1, $long1, $lat2, $long2, dist) :- greatCircleDist(lat1, long1, lat2, long2, dist).
Known Source 3: source3($dist1, dist2) :- convertKm2Mi(dist1, dist2).

New Source 4: source4($startZip, $endZip, division)
Semantic type labels in the diagram: zipcode, distance

Step 1: classify the input & output semantic types, using:
Metadata (labels)
Data (content)

Slide 4

Modeling Sources: Step 2

Known Source 1: source1($zip, lat, long) :- centroid(zip, lat, long).
Known Source 2: source2($lat1, $long1, $lat2, $long2, dist) :- greatCircleDist(lat1, long1, lat2, long2, dist).
Known Source 3: source3($dist1, dist2) :- convertKm2Mi(dist1, dist2).

Step 2: model functionality by:
generating plausible definitions

Candidate definition in terms of the known sources:
source4($zip1, $zip2, dist) :- source1(zip1, lat1, long1), source1(zip2, lat2, long2), source2(lat1, long1, lat2, long2, dist2), source3(dist2, dist).

The same definition unfolded into domain predicates:
source4($zip1, $zip2, dist) :- centroid(zip1, lat1, long1), centroid(zip2, lat2, long2), greatCircleDist(lat1, long1, lat2, long2, dist2), convertKm2Mi(dist2, dist).
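The candidate definition on this slide chains the three known sources. A minimal Python sketch of executing such a chain is given here for illustration only: the centroid table and its rough coordinates are hypothetical stand-ins for source1, the haversine formula is assumed as one way to realise greatCircleDist, and the function names simply mirror the slide.

```python
import math

# Hypothetical centroid table standing in for source1; the coordinates
# are rough illustrative values, not data from the presentation.
CENTROIDS = {
    "90292": (33.98, -118.45),
    "10001": (40.75, -73.99),
}

def source1(zip_code):
    """centroid(zip, lat, long): look up an approximate centroid."""
    return CENTROIDS[zip_code]

def source2(lat1, long1, lat2, long2):
    """greatCircleDist: haversine distance in kilometres (assumed formula)."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(long2 - long1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def source3(dist_km):
    """convertKm2Mi: kilometres to statute miles."""
    return dist_km / 1.609344

def source4_candidate(zip1, zip2):
    """Execute the candidate definition: source1, source1, source2, source3."""
    lat1, long1 = source1(zip1)
    lat2, long2 = source1(zip2)
    dist2 = source2(lat1, long1, lat2, long2)
    return source3(dist2)
```

Calling source4_candidate("90292", "10001") returns a distance in miles that can then be compared with what the new source reports for the same pair of zipcodes.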

Slide 5

Modeling Sources: Step 2

source4($zip1, $zip2, dist) :- source1(zip1, lat1, long1), source1(zip2, lat2, long2), source2(lat1, long1, lat2, long2, dist2), source3(dist2, dist).

Step 2: model functionality by:
generating plausible definitions
comparing the output they produce

Diagram label: match

Slide 6

Summary - Modeling Sources

Step 1: Semantic Labeling
Classify the input & output semantic types, using:
Labels: metadata
Content: output data
Previous work: Lerman, Plangprasopchok and Knoblock. Automatically labeling data used by web services. AAAI'06.

Step 2: Functional Modeling
Model the functionality of the service by:
Search: generating plausible definitions
Scoring: comparing the output they produce

Diagram:
Example Inputs are used to invoke the New Source, giving Target Tuples
The same inputs are used to execute the definition over the Known Sources, giving Candidate Tuples
Compare outputs

Slide 7

Searching for Definitions

Search the space of conjunctive queries:
target(X) :- source1(X1), source2(X2), …
Expressive language
Sufficient for modeling most online sources

Best-first search through the space of candidate definitions:

    Invoke target with set of random inputs;      (sample the new source)
    Add empty clause to queue;
    while (queue not empty)
        v := best definition from queue;
        forall (v' in Expand(v))
            if (Eval(v') > Eval(v))
                insert v' into queue;
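The loop on this slide can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the expand and evaluate functions below are toy stand-ins (a clause is a tuple of source names and the score measures agreement with a made-up TARGET), and the max_steps cap and the tracking of the best clause seen so far are additions for the demo rather than part of the slide's pseudocode.

```python
import heapq
from itertools import count

def best_first_search(expand, evaluate, max_steps=1000):
    """Best-first search over candidate definitions, mirroring the slide's loop."""
    empty = ()                          # the empty clause
    tie = count()                       # tie-breaker so the heap never compares clauses
    best, best_score = empty, evaluate(empty)
    queue = [(-best_score, next(tie), empty)]
    for _step in range(max_steps):      # safety cap (an assumption, not on the slide)
        if not queue:
            break
        neg_score, _order, v = heapq.heappop(queue)   # best definition from queue
        score_v = -neg_score
        if score_v > best_score:
            best, best_score = v, score_v
        for v2 in expand(v):
            score_v2 = evaluate(v2)
            if score_v2 > score_v:      # insert only candidates that improve on v
                heapq.heappush(queue, (-score_v2, next(tie), v2))
    return best, best_score

# Toy problem (hypothetical): clauses are tuples of source names.
SOURCES = ["source1", "source2", "source3"]
TARGET = ("source1", "source1", "source2", "source3")

def expand(clause):
    """Specialize a clause by appending one source predicate."""
    if len(clause) >= len(TARGET):
        return []
    return [clause + (s,) for s in SOURCES]

def evaluate(clause):
    """Toy score: reward positions that agree with TARGET, penalise the rest."""
    matches = sum(1 for a, b in zip(clause, TARGET) if a == b)
    return matches / len(TARGET) - 0.01 * (len(clause) - matches)

best_clause, best_value = best_first_search(expand, evaluate)
```

In the real system Eval would invoke the sources and compare tuples, so each evaluation is expensive; the toy score only exists to make the control flow visible.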

Slide 8

Invoking the Target

Invoke the source with randomly generated tuples
Use the distribution if available
If no output is produced, try invoking other sources

New Source 5: source5($zip1, $dist1, zip2, dist2)

Diagram: randomly generated input tuples are sent to New Source 5, giving either a Non-empty Result or an Empty Result

Slide 9

Top-down Generation of Candidates

Start with the empty clause & specialize it by:
Adding a predicate from the set of sources
Checking that the definition is not redundant

New Source 5: source5($zip1,$dist1,zip2,dist2)

source5(_,_,_,_).

Expand:
source5(zip1,_,_,_) :- source4(zip1,zip1,_).
source5(zip1,_,zip2,dist2) :- source4(zip2,zip1,dist2).
source5(_,dist1,_,dist2) :- <(dist2,dist1).
…

Slide 10

Best-first Enumeration of Candidates

Evaluate the clauses & expand the best one

New Source 5: source5($zip1,$dist1,zip2,dist2)

source5(_,_,_,_).

Expand:
source5(zip1,_,_,_) :- source4(zip1,zip1,_).
source5(zip1,_,zip2,dist2) :- source4(zip2,zip1,dist2).
source5(_,dist1,_,dist2) :- <(dist2,dist1).
…

Expand:
source5(zip1,dist1,zip2,dist2) :- source4(zip2,zip1,dist2), source4(zip1,zip2,dist1).
source5(zip1,dist1,zip2,dist2) :- source4(zip2,zip1,dist2), <(dist2,dist1).
…

Slide 11

Limiting the Search

Extremely large search space!
Constrained by the use of semantic types

Limit the search by:
Maximum clause length
Maximum predicate repetition
Maximum number of existential variables
Definition must be executable
Maximum variable repetition within a literal

Diagram labels: Standard techniques; Non-standard technique

Slide 12

Scoring Candidates

Need to score candidates to direct the best-first search
Score definitions based on overlap

Diagram labels: No Overlap; No Overlap!

Slide 13

Scoring Candidates II

Sources may return multiple tuples and may not be complete:
Use Jaccard similarity as the fitness function
Average the results across different inputs

    forall (tuple in InputTuples)
        T_target = invoke(target, tuple)
        T_clause = execute(clause, tuple)
        if not (|T_target| = 0 and |T_clause| = 0)
            fitness = Jaccard similarity
    return average(fitness)

Notes:
At least half of the input tuples are non-empty invocations of the target
Average the results only when output is returned
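A minimal Python sketch of this scoring loop follows. It assumes each source call returns a set of hashable output tuples and uses exact set membership; the names fitness, invoke_target and execute_clause are illustrative, and returning 0.0 when no input produces any output is an assumption the slide does not address.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of tuples."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def fitness(invoke_target, execute_clause, input_tuples):
    """Average Jaccard similarity over inputs where at least one side returned output."""
    scores = []
    for tup in input_tuples:
        t_target = invoke_target(tup)     # tuples returned by the new source
        t_clause = execute_clause(tup)    # tuples produced by the candidate definition
        if not (len(t_target) == 0 and len(t_clause) == 0):
            scores.append(jaccard(t_target, t_clause))
    # Assumption: score 0.0 when every input gave empty results on both sides.
    return sum(scores) / len(scores) if scores else 0.0

# Toy stand-ins (hypothetical): the target returns two tuples per positive
# input, the candidate clause reproduces only one of them.
def toy_target(tup):
    x = tup[0]
    return {(x, 2 * x), (x, 3 * x)} if x > 0 else set()

def toy_clause(tup):
    x = tup[0]
    return {(x, 2 * x)} if x > 0 else set()

score = fitness(toy_target, toy_clause, [(1,), (2,), (0,)])
```

Here the input (0,) is skipped because both sides are empty, and the two remaining inputs each contribute a similarity of 1/2.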

Slide 14

Approximating Equality

Allow flexibility in values from different sources

Numeric types like distance
Error bounds (e.g. +/- 1%)
10.6 km ≈ 10.54 km

Nominal types like company
String distance metrics (e.g. JaroWinkler score > 0.9)
Google Inc. ≈ Google Incorporated

Complex types like date
Hand-written equality checking procedures
Mon, 31. July 2006 ≈ 7/31/06
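Two of these checks can be sketched with the Python standard library. This is a hedged illustration, not the presenters' code: measuring the 1% bound relative to the larger magnitude is an assumption, the two date formats are just the ones shown on the slide, and the function names are hypothetical. The JaroWinkler comparison is left out because it is not part of the standard library.

```python
from datetime import datetime

def approx_numeric(a, b, rel_tol=0.01):
    """True if a and b differ by at most rel_tol of the larger magnitude (assumed convention)."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))

# Only the two date layouts that appear on the slide; a hand-written
# procedure in practice would need to cover many more.
DATE_FORMATS = ("%a, %d. %B %Y", "%m/%d/%y")

def parse_date(text):
    """Try each known layout in turn and return a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")

def approx_date(a, b):
    """Two date strings are considered equal if they denote the same calendar day."""
    return parse_date(a) == parse_date(b)

same_distance = approx_numeric(10.6, 10.54)
same_day = approx_date("Mon, 31. July 2006", "7/31/06")
```

Note that the weekday and month names are parsed with the process locale, so the first format assumes English names.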

Slide 15

Experimental Setup

25 problems
35 known sources
All real services
Time limit of 20 minutes

Inductive search bias:
Max clause length: 7
Predicate repetition: 2
Max variable level: 5
Executable candidates
No variable repetition

Equality approximations:
1% for distance, speed, temperature & price
0.002 degrees for latitude & longitude
JaroWinkler > 0.85 for company, hotel & airport
Hand-written procedure for date

Slide 16

Actual Learned Examples

1 GetDistanceBetweenZipCodes ($zip0, $zip1, dis2):- GetCentroid (zip0, lat1, lon2), GetCentroid (zip1, lat4, lon5), GetDistance (lat1, lon2, lat4, lon5, dis10), ConvertKm2Mi (dis10, dis2).

2 USGSElevation ($lat0, $lon1, dis2):- ConvertFt2M (dis2, dis1), Altitude (lat0, lon1, dis1).

3 YahooWeather ($zip0, cit1, sta2, , lat4, lon5, day6, dat7,tem8, tem9, sky10) :- WeatherForecast (cit1,sta2,,lat4,lon5,,day6,dat7,tem9,tem8,,,sky10,,,), GetCityState (zip0, cit1, sta2).

4 GetQuote ($tic0,pri1,dat2,tim3,pri4,pri5,pri6,pri7,cou8,,pri10,,,pri13,,com15) :- YahooFinance (tic0, pri1, dat2, tim3, pri4, pri5, pri6,pri7, cou8), GetCompanyName (tic0,com15,,), Add (pri5,pri13,pri10), Add (pri4,pri10,pri1).

5 YahooAutos ($zip0, $mak1, dat2, yea3, mod4, , pri7, ) :- GoogleBaseCars (zip0, mak1, , mod4, pri7, , yea3), ConvertTime (dat2, , dat10, , ), GetCurrentTime ( , dat10, ).

Callouts:
Distinguished forecast from current conditions
current price = yesterday's close + change

Slide 17

Experimental Results

Overall results:
Average precision: 88%
Average recall: 69%

Results for different domains:

Slide 18

Related Work

Semantic Labeling:
Metadata-based service classification (Hess & Kushmerick, '03)
Woogle: web service clustering (Dong et al., 2004)
Neither system produces sufficient information for integration

Functional Modeling:
Category Translation (Perkowitz & Etzioni 1995)
Less complicated (single input, single output) definitions
iMAP: complex schema matcher (Dhamankar et al. 2004)
Many-to-1, not many-to-many, mappings
Type-specific search algorithms
Not designed for live information sources

Slide 19

Government Hotel List Great Circle Distance Centroid of Zipcode Hotels By Zipcod
