Cutting edge Information Systems


Presentation Transcript

Slide 1

Cutting edge Information Systems
Avi Silberschatz
Department of Computer Science
Yale University
URL:

Slide 2

The Digital Age
Digital information forms the glue for blending the fields of computing, communication and entertainment.
At the center of this revolution is data that is stored, accessed and delivered in digital format.
Some of the major issues surrounding this type of data are:
  Data is to be available to users at any time and anywhere, with the desired QoS.
  Data access must adhere to privacy and security policies.
  Data interoperability.
  Fast access to data, which implies support for queries with approximate answers.
  Data analysis and mining capabilities over large datasets.
Many of the advances in information systems are due to the development of new technologies. These advances, in turn, are pushing the development of even newer technologies.

Slide 3

Research Challenges
Storage, retrieval and delivery of multimedia data
  Storage system issues
  QoS issues of continuous media data (e.g., video and audio)
Approximate answers
  useful for huge data sets
  useful for Web searching
Data mining
  Discovering "interesting" patterns in large data sets
  Discovering "interesting" patterns from incomplete data
Interoperability
Privacy and security
Next generation networks
  Converged networks
  Network management

Slide 4

Multimedia Data
Regular data
  text, binary, image
Database data
  tuples, objects
Continuous media data
  Video data
    The display (playback) of the data must be continuous with a fixed rate, which is typically 30 frames/second.
    A viewer may wish to control the way the data is displayed by applying various VCR-type operations to the video data.
  Audio data
    The playback must be continuous with a fixed rate, which depends on the sample rate.
    A listener may wish to control the way the data is played back.

Slide 5

Storage System Issues
Rapid growth in storage capacity demand
  worldwide installed storage: 738 PetaByte in 2000
  more than 75% per year storage capacity increase over the next 5 years
  reaches ZettaByte in 2009
  data stored at Global 2500 companies doubles at regular intervals
  data stored at e-commerce companies grows at 400% a year
Management
  40-50% of company IT budget is spent on storage
  fraction of IT budget spent on storage is expected to grow
  cost of storage management exceeds cost of storage hardware
    management: $300 per GB per year
    low-end storage: $14 - $50 per GB (bundled, controlled, arranged)
  management cost is expected to grow
Storage requirements
  24 x 7
  Disaster recovery

Slide 6

Storage is Moving Into the Network
Motivation
  Use commodity IP-based networks
  IT staff know-how
  Distance and universal access
Applications
  Disaster recovery
  Archiving
  Backups
  Content distribution
  Managed storage
  Value-added storage services
  Consolidation of storage

Slide 7

IP-Based Network Storage
Storage is managed, possibly by different domains
Storage devices are connected over networking infrastructure
[Diagram labels: Client site #1 LAN; Client site #2 LAN; Metro/WAN; file server; LAN; file servers; file servers; SAN]

Slide 8

IP-based Network Storage (Cont.)
IETF standards are being drafted
  Most popular: iSCSI and FCIP
  Almost all networking and storage companies are participating in these standards
Issues
  Performance
  Reliability
Future
  end-to-end iSCSI; end-to-end IP storage networking? demise of FC?
  Hybrid?
    FC (InfiniBand) SAN islands connected over IP networks
    FC SANs in data centers accessed by IP networks

Slide 9

Network Storage Security
Customers may not trust the storage service provider (SSP)
Storage consolidation over multiple customers is essential to make storage outsourcing economical. However, customers may not trust each other
Threat model
  Disclosure of data to an eavesdropper intercepting communication
  Disclosure of data to the storage service provider (SSP) and to other customers of the SSP
  Manipulation of communication by an attacker
  Manipulation of data by the SSP or other customers of the SSP
Challenges
  high-throughput encryption (e.g., 1 Gbps, 10 Gbps)
  security without hindering performance

Slide 10

Multimedia Storage and Delivery Issues
The size of some databases is huge, especially those that are used for data mining (e.g., cash register transactions).
  30 terabytes: largest commercial database
Some information sources generate data at an astonishing rate (e.g., satellite images).
  EOS: 1-2 terabytes per day
The BBC is planning to digitize the last 50 years of programming.
Continuous media data is voluminous:
  100-minute MPEG-1 video requires 1.125 GB
  100-minute HDTV video requires 15 GB
Continuous media data require support for QoS.
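The two video sizes on this slide can be checked with simple arithmetic. The sketch below assumes constant bit rates of 1.5 Mbps for MPEG-1 and 20 Mbps for HDTV; those rates are not stated on the slide and are chosen here only because they reproduce its figures. The function name `video_size_gb` is illustrative.

```python
def video_size_gb(minutes, bitrate_mbps):
    """Storage in GB (10**9 bytes) for a constant-bit-rate stream."""
    bits = minutes * 60 * bitrate_mbps * 1_000_000  # total bits in the clip
    return bits / 8 / 1_000_000_000                 # bits -> bytes -> GB


# Bit rates below are assumptions, not values taken from the slide.
print(video_size_gb(100, 1.5))  # 100-minute MPEG-1 clip
print(video_size_gb(100, 20))   # 100-minute HDTV clip
```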

Slide 11

System Resources to be Managed for QoS
Storage Server Resources
Tertiary Storage
I/O Bus
Secondary Storage
I/O Bus
Buffer Space
Processor(s)
Network

Slide 12

Research Issues
Admission control
Disk scheduling
Buffer management
Storage management
  data layout
  varying disk transfer rates
  disk striping
  meta data
  fault tolerance
Tertiary storage

Slide 13

Cycle-based Scheduling
Let T be the length of a service cycle.
Maintain a queue of requests, each corresponding to a request to view a CM clip.
Each request has an associated rate r_i.
For each request, a buffer is allocated of size [formula missing in transcript].
Requests in the queue are served in cyclic order using double buffering.
In each cycle i:
  get data from disk into buffer (i mod 2)
  transfer data from buffer ((i + 1) mod 2) to the client
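The double-buffering loop for a single request can be sketched as follows. This is a simplified simulation, not the system's implementation: the "disk" is a plain list, `chunk` stands in for the amount read per cycle (assumed here to be T times r_i), and the fetch and the transfer run one after the other, whereas a real server would overlap them. The name `run_cycles` is hypothetical.

```python
def run_cycles(clip, chunk, cycles):
    """Simulate double buffering for one request.

    clip   : list of data units stored on 'disk' (stand-in for a CM clip)
    chunk  : units read per service cycle (assumed to be T * r_i)
    cycles : number of service cycles to simulate
    """
    buffers = [[], []]
    delivered = []
    pos = 0
    for i in range(cycles):
        # Send the data fetched in the previous cycle to the client.
        delivered.extend(buffers[(i + 1) % 2])
        buffers[(i + 1) % 2] = []
        # Fetch the next chunk from disk into buffer (i mod 2).
        buffers[i % 2] = clip[pos:pos + chunk]
        pos += chunk
    return delivered
```

Data read in cycle i reaches the client in cycle i + 1, so playback starts one cycle after admission.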

Slide 14

Disk Scheduling
Requests are serviced in service cycles (rounds).
At the beginning of a service cycle, requests are ordered in C-SCAN order.
At the beginning of each service cycle, it is ensured that [condition missing in transcript] holds (where [symbols missing in transcript] are the rotational delay, settle time, and seek time, respectively, and B is the buffer pool size).
The value of T is adjusted depending on the workload.
In each service cycle, [amount missing in transcript] bits of data are retrieved for each request.
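C-SCAN ordering at the start of a cycle can be illustrated with a short sketch. It assumes each request is a `(request_id, track)` pair and that the head sweeps toward higher track numbers before wrapping around; the representation and the name `cscan_order` are illustrative, not taken from the slides.

```python
def cscan_order(requests, head):
    """Order (request_id, track) pairs for one C-SCAN sweep starting at `head`.

    Requests at or beyond the head position are served first in increasing
    track order; the head then wraps around and serves the remaining
    requests, again in increasing track order.
    """
    ahead = sorted((r for r in requests if r[1] >= head), key=lambda r: r[1])
    behind = sorted((r for r in requests if r[1] < head), key=lambda r: r[1])
    return ahead + behind
```

Because the head always moves in one direction while reading, the per-request seek overhead within a cycle stays small and predictable.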

Slide 15

Admission Control
The queue is limited by an admission control scheme.
For each request, the service time is estimated.
A request is admitted only if the sum of the estimated service times for all admitted requests does not exceed the duration of service cycle T.
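The admission test described here amounts to a running-sum check. A minimal sketch, assuming service times and the cycle length are given in the same unit (milliseconds in the usage below); the class and method names are hypothetical:

```python
class AdmissionController:
    """Admit a request only if all estimated service times fit in one cycle."""

    def __init__(self, cycle_length):
        self.cycle_length = cycle_length  # duration of service cycle T
        self.admitted = []                # estimated service times of admitted requests

    def try_admit(self, service_time):
        """Return True and record the request if it still fits within T."""
        if sum(self.admitted) + service_time <= self.cycle_length:
            self.admitted.append(service_time)
            return True
        return False


ac = AdmissionController(cycle_length=1000)
print(ac.try_admit(400), ac.try_admit(500), ac.try_admit(200))
```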

Slide 16

Admission Control (cont.)
Reserve a fraction of service cycle T, say [fraction missing in transcript], for continuous media requests.
A request (real-time or non-real-time) is admitted if [condition missing in transcript].
A real-time request is admitted if [condition missing in transcript].
The above scheme ensures that both continuous and non-continuous media requests are allocated time during a service cycle.
Any time during a service cycle left unused by continuous media requests is allocated to non-continuous media requests.

Slide 17

Length of T
What about the length of T?

Slide 18

Buffer Space Constraints
Let B be the available buffer size
Let N be the number of admitted clients
Assume unlimited disk bandwidth
Requirements:
[Plot axes: N versus T]
For a given buffer size B, the larger T is, the fewer clients can be admitted.

Slide 19

Disk Bandwidth Constraints
Assume infinite buffer space
Use C-SCAN disk scheduling
Requirements:
[Plot axes: N versus T]
The larger T is, the larger N is

Slide 20

Combining Disk & Buffer Constraints
[Plot: N versus T, with curves labeled "disk constraint" and "buffer constraint"]
The optimal T is obtained by solving a quadratic equation derived from the disk and buffer space constraints.
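One way to see where a quadratic equation comes from is the sketch below. The constraint formulas did not survive in the transcript, so the model here is an assumption: all N clients share one rate r, each client holds a double buffer of 2·T·r bits (so N·2·T·r ≤ B), and each client costs a fixed overhead tau plus T·r/R of transfer time per cycle on a disk with transfer rate R (so N·(tau + T·r/R) ≤ T). Setting the two limits equal gives 2·r·T² − (B·r/R)·T − B·tau = 0. The symbols R and tau and all function names are introduced for illustration only.

```python
import math


def max_clients_buffer(T, B, r):
    """Buffer limit (assumed model): each client holds 2*T*r bits."""
    return B / (2.0 * T * r)


def max_clients_disk(T, r, R, tau):
    """Disk limit (assumed model): each client needs tau + T*r/R seconds per cycle."""
    return T / (tau + T * r / R)


def optimal_cycle_length(B, r, R, tau):
    """Positive root of 2*r*T^2 - (B*r/R)*T - B*tau = 0, where both limits meet."""
    a = 2.0 * r
    b = -B * r / R
    c = -B * tau
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


# Example values are assumptions: 1 Gbit buffer pool, 1.5 Mbps streams,
# 80 Mbps disk transfer rate, 20 ms overhead per request.
T = optimal_cycle_length(B=1e9, r=1.5e6, R=80e6, tau=0.02)
print(T, max_clients_buffer(T, 1e9, 1.5e6), max_clients_disk(T, 1.5e6, 80e6, 0.02))
```

Under this model the buffer limit falls as T grows while the disk limit rises, so the number of admitted clients is largest where the two curves cross.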

Slide 21

Minimizing Response Time
Under some workloads (e.g., requests with small r_i, such as 64 Kbps), the value of T that maximizes throughput can be high (e.g., 20 secs.). This may yield high response times.
Solution:
  Maintain small T values.
  In order not to degrade throughput, for each request R_i data is prefetched from disk every k_i service cycles (instead of every service cycle).
  The maximum amount of data prefetched is [formula missing in transcript].
  Buffer space allocated to R_i is [formula missing in transcript].

Slide 22

Minimizing Response Time (contd.)
Issues:
  Calculation of the k_i's
  Admission control: service cycles to manage
  For a request R_i, finding the least loaded service cycles
To reduce response time, start a new request R_i in the first possible service cycle and then move it incrementally to the selected least loaded service cycle.
This solution also provides higher throughput for workloads with small r_i's.
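Finding the least loaded service cycles for a request served every k cycles can be sketched as a search over the k possible starting offsets. The load model is an assumption made for illustration: `loads[j]` is the total estimated service time already committed to cycle j over some planning window, and "least loaded" is taken to mean the smallest peak load among the cycles the request would use. The slides do not define either, and the name `least_loaded_offset` is hypothetical.

```python
def least_loaded_offset(loads, k):
    """Pick the offset in [0, k) whose cycles have the smallest peak load.

    A request with period k and offset o is served in cycles
    o, o + k, o + 2k, ... within the window described by `loads`.
    Requires len(loads) >= k.
    """
    def peak(offset):
        return max(loads[offset::k])

    return min(range(k), key=peak)
```

For example, with `loads = [5, 2, 7, 1, 6, 3]` and `k = 2`, the even cycles peak at 7 and the odd cycles at 3, so offset 1 is chosen.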

Slide 23

Querying Huge Data Sets
Give me all objects (e.g., images) that look like this.
If we are dealing with PetaBytes of data, this may take days or weeks.
One solution is to capture "meta data" information about the stored objects as the objects are stored in the database.
  Querying is done against the "meta data".
  Major issue: quality of the meta data.
Another solution is to provide support for "approximate answers".

Slide 24

Providing Approximate Answers
Traditional databases give exact answers to queries, however...
  In huge data environments, this can take minutes to hours due to disk I/Os
  In distributed environments, data may be remote or currently unavailable
  In real-time environments, even a single I/O may be too slow

Slide 25

Providing Approximate Answers (Cont.)
Trade off accuracy for performance:
  e.g., 30 minutes for an exact answer vs. 3 seconds for an approximate answer with 5% error
Examples where fast approximate answers are preferred:
  drill-down query sequence in data mining: searching for the "interesting" queries
  tentative answer when base data is unavailable
  leading digits suffice (e.g., 3.5 million vs. 3.512 million)
Can proceed to the exact answer, if desired
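The accuracy-for-performance trade-off can be illustrated with uniform random sampling, one common way to produce approximate aggregates. This is a generic sketch, not a description of how any particular system computes its answers; the function name, the fixed seed, and the normal-approximation error bound are all choices made for the example.

```python
import math
import random


def approximate_sum(values, sample_size, seed=0):
    """Estimate sum(values) from a uniform random sample.

    Returns (estimate, half_width), where half_width is the half-width of
    an approximate 95% confidence interval (normal approximation).
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    n, total_count = len(sample), len(values)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    estimate = total_count * mean
    half_width = 1.96 * total_count * math.sqrt(variance / n)
    return estimate, half_width


values = list(range(1, 1_000_001))
estimate, half_width = approximate_sum(values, sample_size=10_000)
print(estimate, half_width, sum(values))
```

The estimate reads 1% of the values, and the reported half-width plays the role of the error bound returned alongside an approximate answer.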

Slide 26

The AQUA System
Approximate Query Engine for data warehousing
[Diagram labels: Browser; Excel; HTML; XML; Network; SQL Query Q; SQL Query Q'; Result (w/ error bounds); (Fast) query on the Aqua synopses; (Slow) query on the warehouse data; DBMS for Large Data Warehouse; Aqua synopses]
Aqua precomputes and maintains small synopses of the data
Aqu