Achieving Efficient Access to Large Integrated Sets of Semantic Data in Web Applications

Presentation Transcript

Slide 1

Achieving Efficient Access to Large Integrated Sets of Semantic Data in Web Applications
Pieter Bellekens, Kees van der Sluijs, William van Woensel, Sven Casteleyn, Geert-Jan Houben
ICWE 2008, 07/16/2008

Slide 2

Introduction
Context
  The Semantic Web (SW) is growing
  Ever more SW datasets are available
  Legacy data can typically be transformed into SW terms
  A class of Web applications is emerging that needs to exploit the potential of these Web-sized SW datasets
Practical application
  iFanzy, a commercial personalized EPG (electronic program guide)
  Uses large sets of integrated SW data
  Runner-up application in the SW challenge; it showed that the class of large-scale SW applications needs more research

Slide 3

iFanzy
Personalized EPG
Multi-platform
  Set-top box, focus on experience
  Web, focus on interactivity
Central server architecture
Integrating data from various heterogeneous sources
  TV grabbers, IMDb, movie trailers, etc.
Context-sensitive recommendations
  Recommendations based on the semantic structure of, e.g., the genre ontology, connections between programs, etc.

Slide 4


Slide 5

iFanzy Datasets (Live)
Heterogeneous data sources
  Online TV guides, XMLTV format
    sample: 1.278.718, updated daily
  Online movie databases, IMDB text dumps
    currently 53.268.369 (full), 7.986.199 (trimmed)
    trailers from (API)
  Broadcast descriptions, BBC-backstage, TV-Anytime format (domain model)
    sample: 91.447, updated daily
Various vocabularies and ontologies

Slide 6

iFanzy Datasets cont.

Slide 7

Converting TV Metadata in RDF/OWL
Input source 1:
<program title="Match of the Day">
  <channel>BBC One</channel>
  <start>2008-03-09T19:45:00Z</start>
  <duration>PT01H15M00S</duration>
  <genre>sport</genre>
</program>
Input source 2:
<program channel="NED1">
  <source></source>
  <title>Sportjournaal</title>
  <start>20080309184500</start>
  <end>20080309190000</end>
  <genre>sport nieuws</genre>
</program>
Translation to TV-Anytime in RDF/OWL:
<TVA:ProgramInformation ID="crid://">
  <hasTitle> Sportjournaal </hasTitle>
  <hasGenre rdf:resource=" TVAGenres: "/>
</TVA:ProgramInformation>
<TVA:Schedule ID="TVA:Schedule_0001">
  <serviceIDRef> NED1 </serviceIDRef>
  <hasProgram crid="crid://"/>
  <startTime rdf:resource="TIME:TimeDesc_0001"/>
</TVA:Schedule>
<TIME:TimeDescription ID= "TIME:TimeDesc_0001">
  <year> 2008 </year>
  <month> 3 </month>
  <day> 9 </day>
  <hour> 18 </hour>
  <minute> 45 </minute>
  <second> 0 </second>
</TIME:TimeDescription>
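
The two input sources write their start times in different notations, while the translated output splits the time into separate fields. A minimal sketch of that normalization step is given below. The function names `parse_start` and `time_description` are illustrative assumptions, not part of iFanzy, and the compact notation is treated as having no time zone because the slide does not state one.

```python
from datetime import datetime, timezone


def parse_start(value: str) -> datetime:
    """Parse either of the two start-time notations shown in the slide."""
    if "T" in value:
        # ISO 8601 style with a trailing Z (UTC), e.g. 2008-03-09T19:45:00Z
        parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return parsed.replace(tzinfo=timezone.utc)
    # Compact digit style, e.g. 20080309184500. No time zone is given,
    # so the result stays a naive datetime (assumption).
    return datetime.strptime(value, "%Y%m%d%H%M%S")


def time_description(moment: datetime) -> dict:
    """Split a datetime into the fields used by TIME:TimeDescription."""
    return {
        "year": moment.year,
        "month": moment.month,
        "day": moment.day,
        "hour": moment.hour,
        "minute": moment.minute,
        "second": moment.second,
    }


description = time_description(parse_start("20080309184500"))
print(description)
```

For the Sportjournaal start time this yields the same year, month, day, hour, minute and second values as the TIME:TimeDescription element in the slide.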

Slide 8

Converting Vocabularies in RDF/OWL
<Term termID="3.1">
  <Name xml:lang="en">NON-FICTION/INFORMATION</Name>
  <Term termID="3.1.1">
    <Name xml:lang="en">News</Name>
    <Term termID="">
      <Name xml:lang="en">Sport News</Name>
      <Definition xml:lang="en">News of sports</Definition>
    </Term>
  </Term>
</Term>
<Term termID="3.2">
  <Name xml:lang="en">SPORTS</Name>
  <Term termID="3.2.1">
    <Name xml:lang="en">Athletics</Name>
    <Term termID=""> … </Term>
  </Term>
</Term>
Translation of TV-Anytime genres to RDF/OWL using SKOS:
<TVAGenres:genre ID="TVAGenres:">
  <rdfs:label>Sport News</rdfs:label>
  <skos:broader rdf:resource="TVAGenres:3.1.1"/>
  <skos:related rdf:resource="TVAGenres:3.2"/>
</TVAGenres:genre>
<TVAGenres:genre ID="TVAGenres:3.2">
  <rdfs:label>Sport</rdfs:label>
  <skos:related rdf:resource="TVAGenres:"/>
</TVAGenres:genre>
<TVAGenres:genre ID="TVAGenres:3.1.1">
  <rdfs:label>News</rdfs:label>
  <skos:narrower rdf:resource="TVAGenres:"/>
  <skos:broader rdf:resource="TVAGenres:3.1"/>
</TVAGenres:genre>
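
The nesting of Term elements carries the broader/narrower information that ends up as SKOS relations. The sketch below shows one way such relations could be derived from the nesting. It is not the authors' converter: the wrapper element `Terms`, the function name `skos_relations`, and the identifier `3.1.1.x` (a placeholder for the identifier left empty in the transcript) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Small sample in the shape of the slide. "3.1.1.x" is a hypothetical
# placeholder; the real identifier is missing from the transcript.
SAMPLE = """
<Terms>
  <Term termID="3.1">
    <Name xml:lang="en">NON-FICTION/INFORMATION</Name>
    <Term termID="3.1.1">
      <Name xml:lang="en">News</Name>
      <Term termID="3.1.1.x">
        <Name xml:lang="en">Sport News</Name>
      </Term>
    </Term>
  </Term>
</Terms>
"""


def skos_relations(parent, relations=None):
    """Walk nested Term elements and emit broader/narrower pairs."""
    relations = [] if relations is None else relations
    parent_id = parent.get("termID")
    for child in parent.findall("Term"):
        child_id = child.get("termID")
        if parent_id is not None:
            # A nested term is narrower than the term that contains it.
            relations.append((child_id, "skos:broader", parent_id))
            relations.append((parent_id, "skos:narrower", child_id))
        skos_relations(child, relations)
    return relations


triples = skos_relations(ET.fromstring(SAMPLE))
for triple in triples:
    print(triple)
```

The skos:related links in the slide are not derivable from nesting alone, so this sketch leaves them out.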

Slide 9

Aligning and Enriching Vocabularies
Alignment of genre vocabularies
  The content sources use several different genre vocabularies
    XMLTV:documentaire -> TVA:"Documentary"
    IMDB:Thriller -> TVA:"Thriller"
    IMDB:Sci-Fi -> TVA:"Science Fiction"
Semantic enrichment of the genre vocabulary
  Via SKOS narrower, broader and related relations
    News -skos:narrower-> Sport News => original term hierarchy
    Sport News -skos:related-> Sport => partial label match
    Skating -skos:related-> 'Ice skating' => partial label match
    'American Football' -skos:related-> Rugby => domain expert
Enrichment of the user model
  Import of social network profiles, including interests in programs, people (actors, directors, ...), locations, etc.
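
The alignment and the partial label matches can be illustrated with a small sketch. The mapping table reuses the three examples from the slide. The word-subset rule in `partial_label_matches` is only an assumed, simplified stand-in for whatever matching the authors used, and both function names are hypothetical.

```python
from itertools import combinations

# Source-specific genre labels mapped to TV-Anytime genre labels
# (the three examples from the slide).
GENRE_ALIGNMENT = {
    ("XMLTV", "documentaire"): "Documentary",
    ("IMDB", "Thriller"): "Thriller",
    ("IMDB", "Sci-Fi"): "Science Fiction",
}


def align_genre(source, label):
    """Return the aligned TV-Anytime label, or None if no mapping exists."""
    return GENRE_ALIGNMENT.get((source, label))


def partial_label_matches(labels):
    """Suggest skos:related candidates when the words of one label are a
    proper subset of the words of another (assumed matching rule)."""
    candidates = []
    for first, second in combinations(labels, 2):
        first_words = set(first.lower().split())
        second_words = set(second.lower().split())
        if first_words < second_words or second_words < first_words:
            candidates.append((first, second))
    return candidates


print(align_genre("IMDB", "Sci-Fi"))
print(partial_label_matches(["Sport News", "Sport", "Skating", "Ice skating", "Rugby"]))
```

A relation such as 'American Football' to Rugby shares no words, which is consistent with the slide attributing it to a domain expert rather than to label matching.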

Slide 10

Aligning and Enriching Vocabularies
Semantic enrichment of TV metadata with IMDB movie descriptions
  Programs are matched across sources
    "Buono, il brutto, il cattivo, Il (1966)" -> "The Good, the Bad and the Ugly"
Use part-of relations in a geographical hierarchy to relate locations in the different sources
    "White Plains" -> "New York" -> "USA"
Alignment of date/time descriptions to Time ontology concepts to allow temporal reasoning
    "2006-01-01T12:00:00" -> <time:year>2006</time:year> <time:day>01</time:day> <time:hour>12</time:hour>

Slide 11

Using the Semantic Graph
Recommendations are generated based on usage data, the RDF/OWL graph and behavior analysis
Search functionality uses the graph to show connections between items
  Showing semantically related content by following the relations
Interface representation
  Genres and locations in the interface can be browsed based on their relations to other concepts

Slide 12

Scalability & Performance Issues
Large-scale SW applications face performance issues with current-day SW tools
  Current RDF databases are not yet mature in terms of performance
    Especially for complex queries
  Inference is time-consuming or space-intensive
  RDF databases are generic; they do not use specific knowledge about the sources
Target: efficient access to our data
  Low latency; users expect a fast response from Web applications
  Web 2.0 allows asynchronous updates
  We should be able to scale to a large number of users

Slide 13

Technologies and Strategies
Technological choices
  RDF database: Sesame (version 1 and 2)
  Query language: SeRQL
We looked at different data decomposition strategies
  Vertical decomposition
  Horizontal decomposition
We applied several application-specific optimizations
  Using a relational database where possible
  Using a free-text search engine

Slide 14

Natural Solution: One big dataset
All sources in one repository
Pro:
  Data is highly integrated
  One query to get all data
Con:
  Maintenance can be hard
  The bigger the repository, the longer the query execution times (i.e. also for simple queries)
Some typical iFanzy queries together with execution times:
  Query1: All programs with genre "drama" (or one of its subgenres)
  Query2: All programs with genre "drama" and a keyword in the program metadata (title, synopsis and keywords)
  Query3: All programs with a keyword in the program metadata (title, synopsis and keywords)
  Query4: All programs with genre "drama" and a keyword in the program metadata or the person metadata (person name)

Slide 15

Decomposition
Table X can be decomposed into: x1, x2, …, xn
Vertical decomposition (splitting properties)
  Query results from the decomposition need to be joined to obtain the final result
  Building the final result gets more complicated as more tables are involved
Horizontal decomposition (splitting instances)
  Query results from the decomposition need to be combined via a UNION to obtain the final result
  If the result set needs to be ordered, the ordering has to be done after all queries have executed
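
The difference between the two ways of recombining results can be sketched on plain Python data. This is only an illustration of join versus union, not the Sesame setup from the talk. The identifiers `prog1` and `prog2`, and the function names, are assumptions. The titles, genres and start times are taken from the example programs on slide 7.

```python
def vertical_join(*parts):
    """Join property tables that share the same identifiers."""
    shared = set(parts[0]).intersection(*parts[1:])
    return {key: tuple(part[key] for part in parts) for key in sorted(shared)}


def horizontal_union(*parts, order_by=None):
    """Union instance tables; ordering is only possible after merging."""
    merged = [row for part in parts for row in part]
    return sorted(merged, key=order_by) if order_by is not None else merged


# Vertical: one table per property, keyed by a hypothetical program id.
titles = {"prog1": "Match of the Day", "prog2": "Sportjournaal"}
genres = {"prog1": "sport", "prog2": "sport nieuws"}
joined = vertical_join(titles, genres)

# Horizontal: one table per source, each holding (start, title) rows.
bbc = [("2008-03-09T19:45", "Match of the Day")]
xmltv = [("2008-03-09T18:45", "Sportjournaal")]
ordered = horizontal_union(bbc, xmltv, order_by=lambda row: row[0])

print(joined)
print(ordered)
```

The join needs every property table to contribute to each result row, whereas the union only concatenates rows, which is why ordering is the one step that has to wait for all partial results.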

Slide 16

Vertical Decomposition
Splitting the data sources based on properties
  Genres, Geo and Synonyms (WordNet) are split off
  Relations between sources are not broken, because URIs are unique
Result of one query is the input of the next in the query pipeline
  E.g. synonyms found in WordNet are used to query the data
Different strategies influence performance greatly (see table)
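
The query pipeline, where the result of one query feeds the next, might look roughly as follows. The synonym table is a hypothetical stand-in for the WordNet store, the program titles are invented for the example, and the function names are assumptions. The real system queries separate RDF repositories rather than Python lists.

```python
# Hypothetical stand-in for the separate synonym (WordNet) store.
SYNONYMS = {"football": ["soccer"]}

# Hypothetical program titles standing in for the main data store.
PROGRAMS = ["Football Weekly", "Soccer Night", "Cooking Today"]


def expand_keyword(keyword):
    """Stage 1: query the synonym store for the keyword."""
    return [keyword] + SYNONYMS.get(keyword, [])


def search_programs(terms):
    """Stage 2: query the program data with the output of stage 1."""
    return [
        title
        for title in PROGRAMS
        if any(term in title.lower() for term in terms)
    ]


results = search_programs(expand_keyword("football"))
print(results)
```

Because each stage touches only one small store, the order and number of stages is a tuning choice, which fits the slide's remark that different strategies affect performance.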

Slide 17

Horizontal Decomposition
Splitting the data sources based on instances
  The BBC and XMLTV datasets (which have identical structures) are separated into two tables
  Joining the results is a simple UNION
  Retrieve from one source until enough results are found
  Queries to the split sources can be executed in parallel
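
Both access patterns named above, stopping once enough results are found and querying the split sources in parallel, can be sketched with the standard library. The in-memory lists stand in for the two repositories, the genre labels are simplified to a single value, and all function names are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-ins for the two split sources (simplified genre labels).
BBC = [{"title": "Match of the Day", "genre": "sport"}]
XMLTV = [{"title": "Sportjournaal", "genre": "sport"}]


def query_source(rows, genre):
    """Run the same query against one split source."""
    return [row["title"] for row in rows if row["genre"] == genre]


def query_until_enough(sources, genre, limit):
    """Query the sources one by one and stop once enough results exist."""
    results = []
    for rows in sources:
        results.extend(query_source(rows, genre))
        if len(results) >= limit:
            break
    return results[:limit]


def query_parallel(sources, genre):
    """Query all sources concurrently and UNION the partial results."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        parts = pool.map(lambda rows: query_source(rows, genre), sources)
        return [title for part in parts for title in part]


print(query_until_enough([BBC, XMLTV], "sport", limit=1))
print(query_parallel([BBC, XMLTV], "sport"))
```

The first variant avoids touching the second source at all when the first one already satisfies the limit. The second trades that saving for lower latency when all sources are needed.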

Slide 18

Horizontal Decomposition cont.
The biggest data source (the IMDb set) is also responsible for the biggest delay in responsiveness
While it contains almost one million movies, only a fraction of them are known to the general public
  Indicator: the more votes a movie received, the better known it is
Trimming the IMDb database based on the number of votes (see table)
  Filtering all movies which have more than 500 votes resulted in 11.500 movies or 7.986.199 triples in the database
  Query time was also reduced drastically
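
The trimming rule itself is a simple threshold filter. The sketch below uses the 500-vote threshold from the slide. The movie records and their vote counts are made-up illustration data, and `trim_by_votes` is a hypothetical name.

```python
# Made-up records; only the 500-vote threshold comes from the slide.
MOVIES = [
    {"title": "Widely known movie", "votes": 120000},
    {"title": "Borderline movie", "votes": 500},
    {"title": "Obscure movie", "votes": 12},
]


def trim_by_votes(movies, min_votes=500):
    """Keep only movies with more than `min_votes` votes."""
    return [movie for movie in movies if movie["votes"] > min_votes]


trimmed = trim_by_votes(MOVIES)
print([movie["title"] for movie in trimmed])
```

The comparison is strict, following the slide's wording "more than 500 votes", so a movie with exactly 500 votes is dropped.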

Slide 19

Reasoning optimization
In RDF, we can reason over facts to derive new facts
  Inference can be pre-computed -> more triples in the database
  Inference can be done while querying -> much more complex query
Inference for sublocations ("California": 8877 sublocations)
Inference for subgenres ("Action": 10 subgenres)
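
The trade-off between pre-computed and query-time inference can be shown on a small part-of hierarchy. The sketch reuses the "White Plains" / "New York" / "USA" chain from the earlier slide and assumes the hierarchy has no cycles. The function names are illustrative and not taken from Sesame.

```python
# Stored facts: each location points to the location it is part of.
PART_OF = {"White Plains": "New York", "New York": "USA"}


def ancestors(location, part_of):
    """Follow part-of links upward (assumes the hierarchy is acyclic)."""
    found = []
    while location in part_of:
        location = part_of[location]
        found.append(location)
    return found


def materialize(part_of):
    """Pre-compute every direct and derived part-of pair."""
    return {
        (child, ancestor)
        for child in part_of
        for ancestor in ancestors(child, part_of)
    }


def is_part_of(location, region, part_of):
    """Query-time inference: walk the hierarchy for each lookup."""
    return region in ancestors(location, part_of)


closure = materialize(PART_OF)
print(sorted(closure))
print(is_part_of("White Plains", "USA", PART_OF))
```

Two stored facts become three pairs once materialized. That growth is the storage cost of pre-computation, while the query-time variant stores nothing extra but repeats the walk on every lookup.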

Slide 20

Further optimization
Different types of databases
  Some well-structured data repositories can be stored in relational databases
Different versions of Sesame, or diff