Past SQL: Organized Information Recovery by Positioning

Beyond sql structured data retrieval by ranking l.jpg
1 / 52
0
0
1363 days ago, 631 views
PowerPoint PPT Presentation
Locate the best inn bargains by value, separation, and so on. Interactive media ... Propose a lodging to stay and an exhibition hall to visit: B. R. 20. Preparing Ranking Queries in ...

Presentation Transcript

Slide 1

Past SQL: Structured Data Retrieval by Ranking Chengkai Li October 25, 2006 CS511 visitor address

Slide 2

About Myself Advisor: Kevin Chang (UIUC) Research Interests Structured Data Retrieval and Ranking Query Processing Web Information Integration XML Query Processing Misc. http://www.ews.uiuc.edu/~cli cli [at] uiuc [dot] edu

Slide 3

Boolean Semantics of SQL: Success and Barrier Example: SELECT * FROM Homes H WHERE 400K<price<600K AND #bedroom = 4 Problems: Query Semantics: True or False , nothing in the center What do we need?: likeness, importance, inclination, and so forth . Inquiry Results Presentation: Too numerous answers ("data over-burden" issue): hard to process and investigate; or excessively few answers : no enough alternatives to consider What do we need?: a positioned list

Slide 4

"Google" on Databases: From Boolean Query to Fuzzy Retrieval

Slide 5

Fuzzy Retrieval by Ranking (Top-k) Queries Top k comes about by a positioning capacity with various criteria SELECT * FROM Homes H Order By 0.3*big(H.size)+0.7*cheap(H.price) Limit 5

Slide 6

Beyond Boolean Semantics: Ranking Useful and vital in some true database applications: E-Commerce Find the best lodging bargains by value, separate, and so on. Sight and sound Databases Find the most comparative pictures by shading, shape, surface, and so forth. Content Retrieval and Document Management Find the most significant records/reports/pages. OLAP, Decision Support Find the top beneficial clients.

Slide 7

Application 1: E-trade The home inquiry illustration More points of interest later

Slide 8

Application 2: Decision Support What are the main 5 territories to promote another protection item? SELECT zipcode, AVG(income*w1+age*w2+credit*w3) as score FROM client WHERE occupation='student' GROUP BY zipcode ORDER BY score LIMIT 5

Slide 9

Application 3: Multimedia Databases Find the most comparative pictures by shading, shape, surface, and so on. Online demo: http://amazon.ece.utexas.edu/~qasim/research.htm

Slide 10

Application 4: Keyword Queries Text information exist together with organized information IR-style database questions [slides affability of Vagelis Hristidis]

Slide 11

Example – Complaints Database Schema

Slide 12

Example - Complaints Database Data Complaints Customers Products

Slide 13

Example – Keyword Query [ Maxtor Netvista ] Complaints Customers Products

Slide 14

Keyword Query Semantics (meaning of "report" in databases) Keywords are : in same tuple in same connection in tuples associated through essential remote key connections Score of result : separation of watchwords inside a tuple remove between catchphrases as far as essential outside key associations IR-style score of result tree

Slide 15

Example – Keyword Query [ Maxtor Netvista ] Complaints Customers Products Results: (1) c 3, (2) p 2  c 3, (3) p 1  c 1

Slide 16

Great Interest in Database Community Top-k Queries Top-k to range inquiry change: [ChaudhuriG99] Indexing: [ChangBC+00], [TsaparasPK+03] View: [HristidisKP01], [YiYY+03], [DasGK+06] Query administrators and calculations: [CareyK97], [NatsevCS+01], [IlyasAE03], [IlyasSA+04], [ZhangHC+06] Ranking polynomial math : [ChaudhuriRW05] [LiCI+05] Ranking total inquiries: [LiCI06] Other Types of Queries Keyword inquiries : [BhalotiaHN+02], [AgrawalCD02], [HristidisP02] IR-style TFIDF positioning: [AgrawalCD+03] Ranking of qualities: [DasHK+06] Preference questions : [AgrawalW00], [Kießling02] Skyline questions : [BorzsonyiSK01] DBRank07

Slide 17

Theme Life Cycles: Bursting Topics(Current Bursting) [slide politeness of Chengixang Zhai]

Slide 18

RankSQL: a RDBMS with Efficient Support of Ranking Queries Algebraic Foundation and Optimization Framework SPJ inquiries (SELECT … FROM … WHERE … ORDER BY … ) Ad-Hoc Ranking Aggregate Queries beat k gathers rather than tuples (SELECT … FROM … WHERE … GROUP BY … ORDER BY … )

Slide 19

Example: Trip Planning Suggest a lodging to stay and an exhibition hall to visit: Select * From Hotel h, Museum m Where h.star=3 AND h.area=m.area Order By cheap(h.price) + close(h.addr, "BWI airplane terminal") + related(m.collection,"dinosaur") Limit 5 participation measurement: Boolean predicates, Boolean capacity B arrange measurement: positioning predicates, scoring capacity R

Slide 20

5 comes about R sort cheap+close+related B h.area=m.area σ h.star=3 scan(m) scan(h) Processing Ranking Queries in Traditional RDBMS Select * From Hotel h, Museum m Where B Order By R Limit 5

Slide 21

5 comes about R B Problems of Traditional Approach Naïve Materialize-then-Sort plot Overkill total request of all outcomes; just 5 best results are asked. Exceptionally wasteful Scan substantial base tables Join huge middle results Evaluate each positioning on each tuple Full sorting

Slide 22

Therefore the issue is: Unlike Boolean builds, positioning is menial. Positioning is prepared as a Monolithic part ( R ), constantly after the Boolean segment ( B ).

Slide 23

σ h.star=3 AND h.area=m.area How could we have been able to we make Boolean "top of the line"? Select * From Hotel h, Museum m Where h.star=3 AND h.area=m.area Hotel X Museum (1) appear then-channel

Slide 24

h.area=m.area σ h.star=3 scan(m) scan(h) Splitting and Interleaving σ h.star=3 AND h.area=m.area Select * From Hotel h, Museum m Where h.star=3 AND h.area=m.area Hotel X Museum (1) emerge then-channel (2) B is part into joins and determinations , which interleave with each other.

Slide 25

Relational Algebra Enables Such Splitting and Interleaving Algebra : Arabic word, " unite broken parts ". Administrators : to break separated the Boolean condition ( part ) determination, projection, join, and so forth. Logarithmic Laws : to unite them ( interleaving ) pushdown determination: σ c(R Χ S) = σ c(R) Χ S split choice: Σ c1 Λ c2 (R) = σ c1( σ c2 (R)) Relational polynomial math is the establishment of question enhancement

Slide 26

Rank-Relational Algebra By: Splitting positioning capacity Interleaving positioning with Boolean operations We accomplish: Support positioning as a top of the line inquiry sort Integrate positioning with conventional Boolean develops Enable effective inquiry arranges that were inaccessible

Slide 27

5 comes about 5 comes about R μ close sort cheap+close+related rank-join h.area=m.area B μ related σ h.star=3 h.area=m.area σ h.star=3 rank-check( h, shoddy ) scan(m) scan(m) scan(h) Ranking Query Plan split and interleave : decrease of middle results, along these lines preparing cost appear then-sort : gullible, pointless excess

Slide 28

Possibly requests of greatness change Implementation in PostgreSQL plan1: customary emerge then-sort arrange plan2-4: new positioning question arranges Observations: an amplified arrange space with arrangements of different expenses.

Slide 29

μ p3 μ p2 Scan p1 (H) Example Ranking capacity split, predicates assessed incrementally. Iterator structure. Yield arrange: how "encouraging" as indicated by the assessed predicates. H p1, p2, p3 H p1, p2 Select * From Hotel H Order By p1+p2+p3 Limit 1 GetNext() H p1

Slide 30

inn upper-bound μ p3 lodging upper-bound μ p2 inn upper-bound Scan p1 (H) Example H p1, p2, p3 H p1, p2 Select * From Hotel H Order By p1+p2+p3 Limit 1 H p1

Slide 31

H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1

Slide 32

H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1

Slide 33

H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example h2 2.9 Select * From Hotel H Order By p1+p2+p3 Limit 1

Slide 34

H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1

Slide 35

H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1

Slide 36

H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1

Slide 37

H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1

Slide 38

H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1

Slide 39

H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1

Slide 40

H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1

Slide 41

H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1

Slide 42

interestingly: emerge then Sort p1+p2+p3 Select * From Hotel H Order By p1+p2+p3 Limit 1 Scan(H)

Slide 43

Data Model: of Rank-Relation Two Logical Properties Membership of the tuples: assessed Boolean predicates Order among the tuples: assessed positioning predciates rank-connection Membership( B ) : h.area=m.area, h.star=3 Order( R ) : close(h.addr, "BWI air terminal") related(m.collection,"dinosaur") cheap(h.price) B Membership( B ) : h.star=3 Order( R ) : cheap(h.price) A rank-connection

Slide 44

Ranking Principle: upper-bound decides the request F=cheap + close + related Processing in the " promising " arrange, maintaining a strategic distance from superfluous handling. B tuple h 1 tuple h 2 Without further handling h1, we can't yield any outcome; A

Slide 45

Ranking Operators To accomplish part and interleaving: New administrator : μ : assess positioning predicates piece by piece. Amplified administrators: rank-choice rank-join rank-examine rank-union, rank-convergence.

Slide 46

parser, rewriter administrators and arithmetical laws Logical Query Plan analyzer Plan enumerator cost show usage of administrators Physical Query Plan Results agent Impact of Rank-Relational Algebra Ranking Query .:tslid

SPONSORS