Past SQL: Structured Data Retrieval by Ranking Chengkai Li October 25, 2006 CS511 visitor address
Slide 2About Myself Advisor: Kevin Chang (UIUC) Research Interests Structured Data Retrieval and Ranking Query Processing Web Information Integration XML Query Processing Misc. http://www.ews.uiuc.edu/~cli cli [at] uiuc [dot] edu
Slide 3Boolean Semantics of SQL: Success and Barrier Example: SELECT * FROM Homes H WHERE 400K<price<600K AND #bedroom = 4 Problems: Query Semantics: True or False , nothing in the center What do we need?: likeness, importance, inclination, and so forth . Inquiry Results Presentation: Too numerous answers ("data over-burden" issue): hard to process and investigate; or excessively few answers : no enough alternatives to consider What do we need?: a positioned list
Slide 4"Google" on Databases: From Boolean Query to Fuzzy Retrieval
Slide 5Fuzzy Retrieval by Ranking (Top-k) Queries Top k comes about by a positioning capacity with various criteria SELECT * FROM Homes H Order By 0.3*big(H.size)+0.7*cheap(H.price) Limit 5
Slide 6Beyond Boolean Semantics: Ranking Useful and vital in some true database applications: E-Commerce Find the best lodging bargains by value, separate, and so on. Sight and sound Databases Find the most comparative pictures by shading, shape, surface, and so forth. Content Retrieval and Document Management Find the most significant records/reports/pages. OLAP, Decision Support Find the top beneficial clients.
Slide 7Application 1: E-trade The home inquiry illustration More points of interest later
Slide 8Application 2: Decision Support What are the main 5 territories to promote another protection item? SELECT zipcode, AVG(income*w1+age*w2+credit*w3) as score FROM client WHERE occupation='student' GROUP BY zipcode ORDER BY score LIMIT 5
Slide 9Application 3: Multimedia Databases Find the most comparative pictures by shading, shape, surface, and so on. Online demo: http://amazon.ece.utexas.edu/~qasim/research.htm
Slide 10Application 4: Keyword Queries Text information exist together with organized information IR-style database questions [slides affability of Vagelis Hristidis]
Slide 11Example – Complaints Database Schema
Slide 12Example - Complaints Database Data Complaints Customers Products
Slide 13Example – Keyword Query [ Maxtor Netvista ] Complaints Customers Products
Slide 14Keyword Query Semantics (meaning of "report" in databases) Keywords are : in same tuple in same connection in tuples associated through essential remote key connections Score of result : separation of watchwords inside a tuple remove between catchphrases as far as essential outside key associations IR-style score of result tree
Slide 15Example – Keyword Query [ Maxtor Netvista ] Complaints Customers Products Results: (1) c 3, (2) p 2 c 3, (3) p 1 c 1
Slide 16Great Interest in Database Community Top-k Queries Top-k to range inquiry change: [ChaudhuriG99] Indexing: [ChangBC+00], [TsaparasPK+03] View: [HristidisKP01], [YiYY+03], [DasGK+06] Query administrators and calculations: [CareyK97], [NatsevCS+01], [IlyasAE03], [IlyasSA+04], [ZhangHC+06] Ranking polynomial math : [ChaudhuriRW05] [LiCI+05] Ranking total inquiries: [LiCI06] Other Types of Queries Keyword inquiries : [BhalotiaHN+02], [AgrawalCD02], [HristidisP02] IR-style TFIDF positioning: [AgrawalCD+03] Ranking of qualities: [DasHK+06] Preference questions : [AgrawalW00], [Kießling02] Skyline questions : [BorzsonyiSK01] DBRank07
Slide 17Theme Life Cycles: Bursting Topics(Current Bursting) [slide politeness of Chengixang Zhai]
Slide 18RankSQL: a RDBMS with Efficient Support of Ranking Queries Algebraic Foundation and Optimization Framework SPJ inquiries (SELECT … FROM … WHERE … ORDER BY … ) Ad-Hoc Ranking Aggregate Queries beat k gathers rather than tuples (SELECT … FROM … WHERE … GROUP BY … ORDER BY … )
Slide 19Example: Trip Planning Suggest a lodging to stay and an exhibition hall to visit: Select * From Hotel h, Museum m Where h.star=3 AND h.area=m.area Order By cheap(h.price) + close(h.addr, "BWI airplane terminal") + related(m.collection,"dinosaur") Limit 5 participation measurement: Boolean predicates, Boolean capacity B arrange measurement: positioning predicates, scoring capacity R
Slide 205 comes about R sort cheap+close+related B h.area=m.area σ h.star=3 scan(m) scan(h) Processing Ranking Queries in Traditional RDBMS Select * From Hotel h, Museum m Where B Order By R Limit 5
Slide 215 comes about R B Problems of Traditional Approach Naïve Materialize-then-Sort plot Overkill total request of all outcomes; just 5 best results are asked. Exceptionally wasteful Scan substantial base tables Join huge middle results Evaluate each positioning on each tuple Full sorting
Slide 22Therefore the issue is: Unlike Boolean builds, positioning is menial. Positioning is prepared as a Monolithic part ( R ), constantly after the Boolean segment ( B ).
Slide 23σ h.star=3 AND h.area=m.area How could we have been able to we make Boolean "top of the line"? Select * From Hotel h, Museum m Where h.star=3 AND h.area=m.area Hotel X Museum (1) appear then-channel
Slide 24h.area=m.area σ h.star=3 scan(m) scan(h) Splitting and Interleaving σ h.star=3 AND h.area=m.area Select * From Hotel h, Museum m Where h.star=3 AND h.area=m.area Hotel X Museum (1) emerge then-channel (2) B is part into joins and determinations , which interleave with each other.
Slide 25Relational Algebra Enables Such Splitting and Interleaving Algebra : Arabic word, " unite broken parts ". Administrators : to break separated the Boolean condition ( part ) determination, projection, join, and so forth. Logarithmic Laws : to unite them ( interleaving ) pushdown determination: σ c(R Χ S) = σ c(R) Χ S split choice: Σ c1 Λ c2 (R) = σ c1( σ c2 (R)) Relational polynomial math is the establishment of question enhancement
Slide 26Rank-Relational Algebra By: Splitting positioning capacity Interleaving positioning with Boolean operations We accomplish: Support positioning as a top of the line inquiry sort Integrate positioning with conventional Boolean develops Enable effective inquiry arranges that were inaccessible
Slide 275 comes about 5 comes about R μ close sort cheap+close+related rank-join h.area=m.area B μ related σ h.star=3 h.area=m.area σ h.star=3 rank-check( h, shoddy ) scan(m) scan(m) scan(h) Ranking Query Plan split and interleave : decrease of middle results, along these lines preparing cost appear then-sort : gullible, pointless excess
Slide 28Possibly requests of greatness change Implementation in PostgreSQL plan1: customary emerge then-sort arrange plan2-4: new positioning question arranges Observations: an amplified arrange space with arrangements of different expenses.
Slide 29μ p3 μ p2 Scan p1 (H) Example Ranking capacity split, predicates assessed incrementally. Iterator structure. Yield arrange: how "encouraging" as indicated by the assessed predicates. H p1, p2, p3 H p1, p2 Select * From Hotel H Order By p1+p2+p3 Limit 1 GetNext() H p1
Slide 30inn upper-bound μ p3 lodging upper-bound μ p2 inn upper-bound Scan p1 (H) Example H p1, p2, p3 H p1, p2 Select * From Hotel H Order By p1+p2+p3 Limit 1 H p1
Slide 31H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1
Slide 32H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1
Slide 33H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example h2 2.9 Select * From Hotel H Order By p1+p2+p3 Limit 1
Slide 34H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1
Slide 35H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1
Slide 36H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1
Slide 37H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1
Slide 38H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1
Slide 39H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1
Slide 40H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1
Slide 41H p1, p2, p3 μ p3 H p1, p2 μ p2 H p1 Scan p1 (H) Example Select * From Hotel H Order By p1+p2+p3 Limit 1
Slide 42interestingly: emerge then Sort p1+p2+p3 Select * From Hotel H Order By p1+p2+p3 Limit 1 Scan(H)
Slide 43Data Model: of Rank-Relation Two Logical Properties Membership of the tuples: assessed Boolean predicates Order among the tuples: assessed positioning predciates rank-connection Membership( B ) : h.area=m.area, h.star=3 Order( R ) : close(h.addr, "BWI air terminal") related(m.collection,"dinosaur") cheap(h.price) B Membership( B ) : h.star=3 Order( R ) : cheap(h.price) A rank-connection
Slide 44Ranking Principle: upper-bound decides the request F=cheap + close + related Processing in the " promising " arrange, maintaining a strategic distance from superfluous handling. B tuple h 1 tuple h 2 Without further handling h1, we can't yield any outcome; A
Slide 45Ranking Operators To accomplish part and interleaving: New administrator : μ : assess positioning predicates piece by piece. Amplified administrators: rank-choice rank-join rank-examine rank-union, rank-convergence.
Slide 46parser, rewriter administrators and arithmetical laws Logical Query Plan analyzer Plan enumerator cost show usage of administrators Physical Query Plan Results agent Impact of Rank-Relational Algebra Ranking Query .:tslid
SPONSORS
SPONSORS
SPONSORS