Past Relations as Sets: Multisets, Complex Questions, Successions, and Streams

Beyond relations as sets multisets complex objects sequences and streams l.jpg
1 / 51
0
0
1364 days ago, 423 views
PowerPoint PPT Presentation
For every date in 97, locate the ten least expensive stocks. Figure the % change of every stock amid 97, and afterward ... the expenses of all autos before sorting to decide the 10 least expensive. ...

Presentation Transcript

Slide 1

Past Relations as Sets: Multisets, Complex Objects, Sequences, and Streams Equivalences Among Relational Expressions; Aho, Sagiv, and Ullman, SIAM J. Processing, 1979 Four Views of Complex Objects: A Sophisticate's Introduction; Hull, LNCS 361 Nested Relations and Complex Objects, 1987 SRQL: Sorted Relational Query Language; Ramakrishnan et al., SSDBM 1998 Raghu Ramakrishnan

Slide 2

What's there Beyond Relations As Sets? Relations as multisets (genuine SQL) Object id (Object DBMSs) Nested structure (Object-Relational DBMSs) Partial structure (e.g., XML) Sequences (SQL:1999) Temporal databases (adaptations, "substantial" time) Streams (information encourages, ready frameworks) information/inquiry duality

Slide 3

Multiset Relations

Slide 4

Relations-As-Multisets (Bags) What if the quantity of duplicates of a tuple matters? E.g., Find the normal pay actually, a connection is a multiset in SQL, as a matter of course. The outcomes for conjunctive question regulation over relations-as-sets do not hold anymore! Regulation: No more extended in NP (Chaudhuri-Vardi); comparability is still in NP! Regulation of unions of conjunctive questions: Undecidable (Ioannidis-Ramakrishnan) Lesson: Small changes in the information model can have a major effect!

Slide 5

Sequences, Top-N

Slide 6

Querying Sequences in SQL:1999 Trend examination is hard to do in SQL-92: Find the % change in month to month deals Find the main 5 item by aggregate deals Find the trailing n - day moving normal of offers The initial two inquiries can be communicated with trouble, yet the third can't be communicated in SQL-92 if n is a parameter of the inquiry. The WINDOW proviso in SQL:1999 permits us to compose such questions over a table saw as a succession (verifiably, in light of client indicated sort keys)

Slide 7

Stocks(date, image, close) Trans(cust, image, date, offers) Motivation Find the trailing week after week moving normal for every stock. For every date in '97, locate the ten least expensive stocks. Figure the % change of every stock amid '97, and afterward discover stocks in the main 5% (those that changed the most). For every week, locate the stock in which Joe had the most contributed.

Slide 8

Sequences Simple succession: A connection (coherently) sorted on a key; key characteristics are called sequencing qualities. Composite arrangement: A connection that is initially divided utilizing gathering qualities , then sorted inside every segment by sequencing characteristics. ord(t): Ordinal # of tuple inside its gathering

Slide 9

Sequence Algebra Relational variable based math, stretched out to work over multisets of tuples, is the establishment for SQL. We extend the social administrators to take a shot at sorted relations. Furthermore, we characterize some new administrators over successions. (Stand out of these new administrators is essential; the rest are for accommodation.)

Slide 10

Extending Relational Algebra The administrators are: s, p, x, u, - , unmistakable s can utilize conditions over ord attr of contribution To extend these operations to take a shot at sorted relations, we should characterize the gathering and sequencing properties of the outcome. We do this essentially: the gathering and sequencing quality records are unfilled for the yield of the above operations. I.e., the outcome is unordered.

Slide 11

Extending the Join Operator Since cross-item ( x ) does not spread the ord estimations of the info table to its yield, we should characterize join as a primitive operation! Join ( ): Can allude to ord estimations of data sources. Left-external join ( ): Left tuples with no coordinating right tuple show up in result with NULLs in right tuple attrs. Result has exhaust gathering/sequencing records.

Slide 12

Creating a Sequence The sequencing administrator y is our major augmentation to rel variable based math. Connected to a table R, with gathering attrs g and sequencing attrs s, it gives back the relating composite arrangement.

Slide 13

ShiftAll Operator D (R,I) joins each tuple t in arrangement R with all tuples t' in a similar gathering to such an extent that ord(t') = ord(t)+I. In the event that such t' does not exist, utilize NULLs. Can be characterized as left-external join took after by sequencing; included for comfort. Can be stretched out to utilize an altered ordinal, for example, FIRST or LAST, furthermore to adjust each tuple t to more than one other tuple t'.

Slide 14

Shift Operator d Modifies each tuple t of R by including the estimation of the sequencing properties of tuple t' (whose ord esteem is moved by I). Contrasts from ShiftAll (trailed by projection of undesirable segments) if R has copies in the sequencing segments. Like ShiftAll, Shift can be characterized regarding left-external join and sequencing.

Slide 15

SELECT S.g,S.t,S.v, SHIFTALL (S,- 1).v, SHIFT (S,1).t FROM R GROUP BY g, SEQUENCE BY t AS S SHIFTALL offers access to all attrs of the tuple(s) at the moved ordinal position. Move gives the sequencing attr estimations of the tuple at the moved ord position.

Slide 16

SELECT t, AVG(v) OVER 0 TO 1 FROM R SEQUENCE BY t POSITIONAL AGGREGATE VALUE-BASED AGGREGATE SELECT t, AVG(v) OVER VALUES 0 TO 1 FROM R SEQUENCE BY t

Slide 17

SELECT expr-list FROM table-list WHERE predicate GROUP BY expr-list SEQUENCE BY expr-list WITH predicate HAVING predicate ORDER BY expr-list SRQL Queries FROM provision can contain expressions including sequencing; first assess these and discover cross-push. At that point apply determinations in WHERE proviso, and sort according to GROUP BY and SEQUENCE BY provisos to create a succession. For each tuple (and each agg expr in SELECT ), recognize window and apply agg fn. Apply HAVING proviso, at long last ORDER BY .

Slide 18

SELECT V.name, Q.name FROM Volcano AS V, Quake SEQUENCE BY time AS Q WHERE Q.time <= V.time AND (SHIFT(Q,1).time > V.time OR SHIFT(Q,1).time IS NULL) AND Q.magnitude > 7 For every fountain of liquid magma emission where the latest shudder was > 7, discover the name of this tremor: Using some syntactic sugar: SELECT V.name, Q.name FROM Volcano AS V, Quake SEQUENCE BY time AS Q WHERE Q.time PRECEDES= V.time AND Q.magnitude > 7

Slide 19

Aggregate Functions As in SQL, we have to extend the variable based math to permit the utilization of total capacities like MIN and SUM. Must characterize "window" for every use of an agg fn. In SQL, window is a parcel made by GROUP BY; agg operation connected to every segment, and yields one esteem for every segment. In SRQL, for each agg fn, can have an alternate window for each tuple.

Slide 20

Window Aggregate Op w (R, p, f, V): R is an arrangement, f is an agg fn, V is an attr of R, p is a determination predicate; result is a succession with same gathering/seq records as R For every info tuple t, yield contains < t, f( p v ( s t.g=R.g and p(t) (R) ) > I.e., for each tuple t of R, apply p(t) to discover its "window" inside the gathering of t, apply agg fn f to the multiset of V-values in window, and incorporate with t in result.

Slide 21

SELECT t, AVG(v) OVER 0 TO 1 FROM R SEQUENCE BY t POSITIONAL AGGREGATE VALUE-BASED AGGREGATE SELECT t, AVG(v) OVER VALUES 0 TO 1 FROM R SEQUENCE BY t

Slide 22

SELECT day, AVG(profit) OVER 0 TO 1 FROM Sales SEQUENCE BY day WITH vol>100 Find 2-day moving normal of the benefit made on deals volume > 100:

Slide 23

SELECT day, AVG(profit) OVER 0 TO 1 FROM Sales SEQUENCE BY day WHERE vol>100 Considering just days with deals volume > 100, discover 2-day moving normal of the benefit: Placing choice in WHERE proviso influences window for positional collection!

Slide 24

SELECT item, day, AVG(vol) OVER 0 TO 1 FROM Sales GROUP BY item SEQUENCE BY day Find the 2-day moving normal of volume sold for every item: as a result, makes a succession by day for every item, and figures the moving normal over each of these arrangements. Watch how this sums up SQL's GROUP BY : shows force of composite arrangements and collection.

Slide 25

The WINDOW Clause SELECT L.state, T.month, AVG(S.sales) OVER W AS movavg FROM Sales S, Times T, Locations L WHERE S.timeid=T.timeid AND S.locid=L.locid WINDOW W AS (PARTITION BY L.state ORDER BY T.month RANGE BETWEEN INTERVAL `1' MONTH PRECEDING AND INTERVAL `1' MONTH FOLLOWING ) Let the consequence of the FROM and WHERE provisions be "Temp". (Thoughtfully) Temp is divided by PARTITION BY proviso. Like GROUP BY, however the answer has one column for every line in a segment, not one line for every parcel! Every segment is sorted by ORDER BY provision. For every line in a segment, the WINDOW statement makes a "window" of close-by (going before or succeeding) tuples. Can be esteem based, as in case, utilizing RANGE Can be founded on number of columns to incorporate into the window, utilizing ROWS provision The total capacity is assessed for every line in the parcel utilizing the relating window. New total capacities that are valuable with windowing incorporate RANK (position of a column inside its segment) and its variations DENSE_RANK, PERCENT_RANK, CUME_DIST.

Slide 26

Top N Queries If you need to discover the 10 (or something like that) least expensive autos, it would be pleasant if the DB could abstain from processing the expenses of all autos before sorting to decide the 10 least expensive. Thought: Guess at a cost c with the end goal that the 10 least expensive all cost not as much as c, and that not very numerous more cost less. At that point include the determination cost<c and assess the inquiry. In the event that the figure is correct, awesome, we maintain a strategic distance from calculation for autos that cost more than c. On the off chance that the figure isn't right, need to reset the choice and recompute the first inquiry.

Slide 27

Top N Queries SELECT P.pid, P.pname, S.sales FROM Sales S, Products P WHERE S.pid=P.pid AND S.locid=1 AND S.timeid=3 ORDER BY S.sales DESC OPTIMIZE FOR 10 ROWS OPTIMIZE FOR develop is not in SQL:1999! Cut-off esteem c is picked by streamlining agent. SELECT P.pid, P.pname, S.sales FROM Sales S, Products P WHERE S.pid=P.pid AND S.locid=1 AND S.timeid=3 AND S.sales > c ORDER BY S.sales DESC

Slide 28

Streams

Slide 29

Streams are Forever … So no idea of question over "whole" stream Queries m

SPONSORS