Tools for High Performance Network Monitoring


Presentation Transcript

Slide 1

Tools for High Performance Network Monitoring. Les Cottrell. Presented at the Internet2 Fall Members Meeting, Philadelphia, Sep 2005. sep05.ppt. Partially funded by DOE/MICS for Internet End-to-end Performance Monitoring (IEPM).

Slide 2

Outline. Data intensive sciences (e.g. HEP) need to move large volumes of data worldwide. This requires understanding and effective use of fast networks, which in turn requires continuous monitoring. Outline of talk: What does monitoring provide? Active E2E measurements today and their challenges. Visualization, forecasting, problem identification. Passive monitoring: Netflow, SNMP. Conclusions.

Slide 3

Uses of Measurements. Automated problem identification & troubleshooting: alerts for network administrators (e.g. bandwidth changes in time series, iperf, SNMP) and alerts for systems people (OS/host metrics). Forecasts for Grid middleware, e.g. replica manager, data placement. Engineering, planning, SLA (set & verify). Security: spot anomalies, intrusion detection. Accounting.

Slide 4

Active E2E Monitoring

Slide 5

Using Active IEPM-BW measurements. Focus on high performance for a few hosts needing to send data to a small number of collaborator sites, e.g. the HEP tiered model. Makes regular measurements with tools: ping (RTT, connectivity), traceroute; pathchirp, ABwE, pathload (packet pair dispersion); iperf (single & multi-stream), thrulay; bbftp, bbcp (file transfer applications). Looking at GridFTP, but it is complex and requires renewing certificates. Lots of analysis and visualization. Running at major HEP sites: CERN, SLAC, FNAL, BNL, Caltech, to about 40 remote sites.

Slide 6

Ping/traceroute. Ping still useful (plus ça reste …). Is the path connected? RTT, loss, jitter. Great for low performance links (e.g. Digital Divide), e.g. AMP (NLANR)/PingER (SLAC). Nothing to install, but blocking. OWAMP/I2 is similar but One Way; however it needs a server installed at the other end and good timers. Traceroute: needs good visualization (traceanal/SLAC). Little use for dedicated λ at layer 1 or 2. However, still need to know the topology of paths.

Slide 7

Packet Pair Dispersion. [Figure: packet pair crossing a bottleneck; minimum spacing is set at the bottleneck, and the spacing is preserved on higher speed links.] Send packets with known separation and see how the separation changes due to the bottleneck. Can be low network intrusive, e.g. ABwE uses only 20 packets/direction, and is also fast (< 1 sec). From the PAM paper, pathchirp is more accurate than ABwE, but takes ten times as long (10s vs 1s) and generates more network traffic (~factor of 10). Pathload is a factor of 10 more again. IEPM-BW now supports ABwE, pathchirp, pathload.
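The packet pair principle above can be sketched in a few lines. This is a minimal illustration of the generic technique, not the algorithm used by ABwE, pathchirp or pathload: it assumes two equal-sized packets sent back to back, so that the spacing seen at the receiver is the time the bottleneck took to serialize one packet. The function name and the use of a median are choices made for this sketch.

```python
from statistics import median


def bottleneck_capacity_bps(packet_size_bytes, dispersions_s):
    """Estimate bottleneck capacity (bits/s) from packet-pair spacings.

    dispersions_s holds the receiver-side time gaps, in seconds, between
    the two packets of each pair. The median is used so that a few pairs
    stretched by cross-traffic do not dominate the estimate.
    """
    valid = [d for d in dispersions_s if d > 0]
    if not valid:
        raise ValueError("need at least one positive dispersion sample")
    return packet_size_bytes * 8 / median(valid)
```

For example, 1500-byte packets arriving about 120 microseconds apart point to a bottleneck of roughly 100 Mbits/s.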

Slide 8

BUT … Packet pair dispersion relies on accurate timing of inter-packet separation. At > 1Gbps this is getting beyond the resolution of Unix clocks. AND 10GE NICs are offloading function: interrupt coalescing, Large Send & Receive Offload, TOE. Need to work with TOE vendors: turn off offload (Neterion supports multiple channels, can eliminate offload to get more accurate timing in the host), or do timing in the NICs. No standards for interfaces.
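The timing problem can be made concrete with a little arithmetic. The sketch below computes the spacing of back-to-back packets on a link and compares it with a clock resolution; the 1 microsecond resolution and the 1500-byte packet size used in the example are assumptions for illustration, not figures from the talk.

```python
def back_to_back_gap_s(packet_size_bytes, link_bps):
    """Time in seconds to serialize one packet, i.e. the spacing of a back-to-back pair."""
    return packet_size_bytes * 8 / link_bps


def relative_timing_error(gap_s, clock_resolution_s):
    """Worst-case error of one gap measurement, as a fraction of the gap."""
    return clock_resolution_s / gap_s


# Assumed values: 1500-byte packets, 1 microsecond clock resolution.
gap_1g = back_to_back_gap_s(1500, 1e9)     # 12 microseconds
gap_10g = back_to_back_gap_s(1500, 10e9)   # 1.2 microseconds
err_10g = relative_timing_error(gap_10g, 1e-6)
```

Under these assumptions the gap at 10 Gbits/s is about the same size as the clock tick, so a single host-side timestamp difference carries an error comparable to the quantity being measured.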

Slide 9

Achievable Throughput. Use TCP or UDP to send as much data as possible, memory to memory, from source to destination. Tools: iperf (bwctl/I2), netperf, thrulay (from Stas Shalunov/I2), udpmon … Pseudo file copy: bbcp and GridFTP also have a memory to memory mode.

Slide 10

Iperf vs Thrulay. Iperf has multiple streams. Thrulay is more manageable & gives RTT. They agree well. Throughput ~ 1/avg(RTT). [Figure: achievable throughput (Mbits/s) plotted against average RTT (ms), with minimum and maximum RTT also shown.]
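The slide states only the proportionality between throughput and 1/avg(RTT). One standard way to see why it holds is the window-limited TCP model, in which a sender can have at most one window of data in flight per round trip. The sketch below assumes that model and a fixed window; it is an illustration, not the analysis behind the plot.

```python
def window_limited_throughput_bps(window_bytes, rtt_s):
    """Throughput (bits/s) of a sender limited to one window per round trip."""
    if rtt_s <= 0:
        raise ValueError("RTT must be positive")
    return window_bytes * 8 / rtt_s
```

With a fixed window, doubling the RTT (for example because queues fill along the path) halves the achievable throughput, which matches the 1/avg(RTT) trend.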

Slide 11

BUT … At 10Gbits/s on a transoceanic path, slow start takes over 6 seconds. To get 90% of the measurement in congestion avoidance, need to measure for 1 minute (about 52.5 GBytes at 7Gbits/s, today's typical performance). Needs scheduling to scale, and even then … It's not disk-to-disk or application-to-application, so use bbcp, bbftp, or GridFTP.
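The figures on this slide follow from simple arithmetic, sketched below. The function names are made up for this example, and the byte count is an upper bound because it treats the whole minute as running at the steady rate, ignoring the lower rate during slow start.

```python
def min_measurement_duration_s(slow_start_s, fraction_in_congestion_avoidance):
    """Shortest test in which slow start is at most (1 - fraction) of the run."""
    if not 0 <= fraction_in_congestion_avoidance < 1:
        raise ValueError("fraction must be in [0, 1)")
    return slow_start_s / (1 - fraction_in_congestion_avoidance)


def bytes_transferred(rate_bps, duration_s):
    """Bytes sent at a constant rate (bits/s) over the given duration."""
    return rate_bps * duration_s / 8


duration = min_measurement_duration_s(6, 0.9)   # about 60 seconds
volume = bytes_transferred(7e9, 60)             # about 52.5e9 bytes
```

A 6 second slow start with 90% of the run in congestion avoidance gives a 60 second test, and 60 seconds at 7 Gbits/s moves about 52.5 GBytes, which is why such tests need scheduling.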

Slide 12

AND … For testbeds such as UltraLight, UltraScienceNet etc., need to reserve the path. So the measurement infrastructure needs to add the capability to reserve the path (so need an API to the reservation application). OSCARS from ESnet is developing a web services interface. For lightweight measurements, have a "persistent" capability. For more intrusive ones, must reserve just before making the measurement.

Slide 13

Visualization & Forecasting

Slide 14

Visualization. MonALISA: Caltech tool for drill down & visualization. Access to recent (last 30 days) data. For IEPM-BW, PingER and monitor host specific parameters. Adding web service access to ML SLAC data. Navigation: MonALISA Client => Start MonALISA GUI => Groups => Test => Click on IEPM-SLAC.

Slide 15

ML example

Slide 16

Changes in network topology (BGP) can result in dramatic changes in performance.
[Figure: snapshot of the traceroute summary table (remote host by hour), with samples of traceroute trees generated from the table; Los-Nettos (100Mbps) is marked.]
Notes: 1. Caltech misrouted via Los-Nettos 100Mbps commercial net 14:00-17:00. 2. ESnet/GEANT working on routes from 2:00 to 14:00. 3. A previous occurrence went unnoticed for 2 months. 4. Next step is to auto detect and notify.
[Figure: ABwE measurements, one/minute for 24 hours, Thurs Oct 9 9:00am to Fri Oct 10 9:01am, in Mbits/s. Curves: dynamic BW capacity (DBC), cross-traffic (XT), available BW = (DBC-XT). Annotations: drop in performance (from original path SLAC-CENIC-Caltech to SLAC-ESnet-LosNettos (100Mbps)-Caltech); ESnet-LosNettos segment in the path (100 Mbits/s); back to original path. Changes detected by IEPM-Iperf and ABwE.]
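The relation available BW = DBC - XT and the idea of automatically detecting a sustained drop can be sketched as follows. This is not the IEPM detection algorithm: the sliding-median comparison, the window sizes and the 50% threshold are all assumptions chosen for illustration.

```python
from statistics import median


def available_bw(dbc_mbps, xt_mbps):
    """Available bandwidth = dynamic capacity minus cross-traffic, floored at 0."""
    return max(dbc_mbps - xt_mbps, 0.0)


def find_drop(series, history=20, recent=5, threshold=0.5):
    """Return the first index i at which the median of series[i:i+recent]
    falls below threshold times the median of the preceding `history`
    samples, or None if no such index exists.

    Medians are used so that a single bad sample does not raise an alert.
    """
    for i in range(history, len(series) - recent + 1):
        baseline = median(series[i - history:i])
        current = median(series[i:i + recent])
        if baseline > 0 and current < threshold * baseline:
            return i
    return None
```

Applied to one-per-minute samples, a step such as the reroute onto the 100 Mbits/s segment would be flagged within a few minutes, while isolated dips would be ignored.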

Slide 17

Forecasting. Over-provisioned paths should have pretty flat time series, which can be handled with short/local term smoothing, long term linear trends, and seasonal smoothing. But seasonal trends (diurnal, weekly) need to be accounted for on about 10% of our paths. Use Holt-Winters triple exponential weighted moving averages.
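A compact sketch of additive Holt-Winters (triple exponential smoothing) is given below to show how the level, trend and seasonal terms interact. The initialization from the first two seasons and the default smoothing constants are common textbook choices and are assumptions here; the talk does not specify the parameters used in IEPM-BW.

```python
def holt_winters_additive(series, season_len, alpha=0.5, beta=0.1, gamma=0.1):
    """Return one-step-ahead forecasts for series[season_len:].

    forecasts[i] is the prediction for series[season_len + i], made
    before that sample is seen.
    """
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")
    first = series[:season_len]
    second = series[season_len:2 * season_len]
    level = sum(first) / season_len
    # Average per-step change between the first two seasons.
    trend = (sum(second) - sum(first)) / season_len ** 2
    seasonal = [x - level for x in first]
    forecasts = []
    for t in range(season_len, len(series)):
        s = seasonal[t % season_len]
        forecasts.append(level + trend + s)
        x = series[t]
        prev_level = level
        level = alpha * (x - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (x - level) + (1 - gamma) * s
    return forecasts
```

For hourly samples with a weekly pattern, season_len would be 168; a large gap between a forecast and the next observation is then a candidate event.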

Slide 18

Alerting. Have false positives down to a reasonable level, so now sending alerts. Experimental. Typically a few per week. Currently by email to network administrators. Adding pointers to extra information to help the administrator further diagnose the problem, including: traceroutes, monitoring host parameters, time series for RTT, pathchirp, thrulay etc. Plan to add on-demand measurements (excited about perfSONAR).

Slide 19

Integration. Integrate IEPM-BW and PingER measurements with MonALISA to provide additional access. Working to make traceanal a callable module. Integrating with AMP. When comfortable with forecasting, event detection will generalize.

Slide 20

Passive - Netflow

Slide 21

Netflow et al. Switch identifies a flow by src/dst ports, protocol. Cuts a record for each flow: src, dst, ports, protocol, TOS, start, end time. Collect records and analyze. Can be a lot of data to collect each day (hundreds of MBytes to GBytes), needs lots of CPU. No intrusive traffic; sees real traffic, collaborators, applications. No accounts/pwds/certs/keys. No reservations etc. Characterize traffic: top talkers, applications, flow lengths etc. Internet2 backbone weekly/SLAC:
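The flow record described above, and the "top talkers" characterization, can be sketched as a small data structure. The field names are illustrative rather than the NetFlow export format, and the `octets` byte counter is an addition for this sketch (the slide lists only src, dst, ports, protocol, TOS, start and end time).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    src: str
    dst: str
    src_port: int
    dst_port: int
    protocol: int   # IP protocol number, e.g. 6 for TCP
    tos: int
    start: float    # seconds since the epoch
    end: float
    octets: int     # bytes carried by the flow (assumed field)


def top_talkers(records, n=10):
    """Return the n source addresses that sent the most bytes, largest first."""
    totals = Counter()
    for r in records:
        totals[r.src] += r.octets
    return totals.most_common(n)
```

The same pattern, keyed on `dst_port` or on flow duration (`end - start`), gives the application and flow-length breakdowns mentioned on the slide.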

Slide 22

Typical day's flows. Very much work in progress. Look at the SLAC border. Typical day: ~28K flows/day larger than 100KB; ~75 sites with > 100KByte bulk data flows; a few hundred flows > GByte.

Slide 23

Forecasting? Collect records for several weeks. Filter: 40 major collaborator sites, big (> 100KBytes) flows, bulk transport applications/ports (bbcp, bbftp, iperf, thrulay, scp, ftp). Divide by remote site, aggregate parallel streams. Fold data onto one week, see bands at known capacities and RTTs. ~500K flows/mo.
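Two of the steps above, aggregating parallel streams and folding the data onto one week, can be sketched as follows. The sketch assumes the flows passed in already belong to one remote site and one application, and it treats flows that overlap in time as parallel streams of a single transfer; both are simplifications made for illustration.

```python
from datetime import datetime, timezone


def aggregate_parallel(flows):
    """Merge time-overlapping flows into single transfers.

    flows is a list of (start_s, end_s, octets) tuples. Returns a list of
    merged (start_s, end_s, octets) tuples sorted by start time.
    """
    merged = []
    for start, end, octets in sorted(flows):
        if merged and start <= merged[-1][1]:
            m_start, m_end, m_octets = merged[-1]
            merged[-1] = (m_start, max(m_end, end), m_octets + octets)
        else:
            merged.append((start, end, octets))
    return merged


def throughput_mbps(start, end, octets):
    """Average throughput of one transfer in Mbits/s."""
    if end <= start:
        raise ValueError("transfer must have positive duration")
    return octets * 8 / (end - start) / 1e6


def hour_of_week(epoch_s):
    """Fold a timestamp onto one week: 0 is Monday 00:00 UTC, 167 is Sunday 23:00."""
    t = datetime.fromtimestamp(epoch_s, tz=timezone.utc)
    return t.weekday() * 24 + t.hour
```

Plotting `throughput_mbps` of each merged transfer against `hour_of_week` of its start time is one way to produce the folded weekly view in which bands at known capacities would show up.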

Slide 24

Netflow et al. Peaks at known capacities and RTTs. RTTs might suggest windows are not optimized.

Slide 25

How many sites have enough flows? In May '05 found 15 sites at the SLAC border with > 1440 (1/30 mins) flows, enough for time series forecasting of seasonal effects. Three sites (Caltech, BNL, CERN) were actively monitored; the rest were "free". Only 10% of sites have big seasonal effects in active measurement; the remainder need fewer flows. So promising.
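The 1440 threshold corresponds to one flow every 30 minutes over 30 days. A small sketch of that count and of the site filter follows; the function names and the per-site counts in the usage note are hypothetical.

```python
def flows_needed(days, interval_minutes):
    """Number of flows for one sample per interval over the given number of days."""
    return days * 24 * 60 // interval_minutes


def sites_with_enough_flows(flow_counts, days=30, interval_minutes=30):
    """Return the sites whose flow count exceeds the sampling threshold, sorted by name."""
    needed = flows_needed(days, interval_minutes)
    return sorted(site for site, n in flow_counts.items() if n > needed)
```

For instance, with made-up counts `{"site-a": 2000, "site-b": 900}`, only `site-a` passes the 1440-flow threshold.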

Slide 26

Compare active with passive. Predict flow throughputs from Netflow data for SLAC to Padova for May '05. Compare with E2E active ABwE measurements.

Slide 27

Netflow limitations. Use of dynamic ports: GridFTP, bbcp, bbftp can use fixed ports; P2P often uses dynamic ports. Discriminate type of flow based on headers (not relying on ports). Types: bulk data, interactive … Discriminators: inter-arrival time, length of flow, packet length, volume of flow. Use machine learning/neural nets to cluster flows. Aggregation of parallel flows (not difficult). SCAMPI/FFPF/MAPI allows more flexible flow definition. See application logs (OK if small number).

Slide 28

More challenges. Throughputs often depend on non-network factors: host interface speeds (DSL, 10Mbps Enet, wireless); configurations (window sizes, hosts); applications (disk/file vs mem-to-mem). Looking at distributions by site, they are often multi-modal. Predictions may have large standard deviations. How much to report to the application?

Slide 29

Conclusions. Traceroute is dead for dedicated paths. Some things continue to work: ping, owamp; iperf, thrulay, bbftp … but packet pair dispersion needs work, its time may be over. Passive looks promising with Netflow. SNMP needs AS to make accessible. Capture is expensive: ~$100K (Joerg Micheel) for OC192Mon.

Slide 30

More information. Comparisons of active infrastructures: Some active public measurement infrastructures: at 10Gbits/s (DAG), w