
Presentation Transcript

Slide 1

Superscalar Processors
Kasi L.K. Anbumony
Department of Electrical and Computer Engineering
Auburn University
Auburn, AL 36849

Slide 2

Outline
– Pipelining: Motivation
– Pipeline Hazards
– Advanced Pipelining
– Instruction Level Parallelism (ILP)
– Multiple Issue (MIPS Superscalar)
– Static Multiple Issue (SW driven)
– Dynamic Multiple Issue (HW driven)
– Superscalar Processor
– Conclusion

Slide 3

Pipelining: Motivation
Multiple instructions are overlapped in execution to exploit instruction-level parallelism (ILP). Pipelining is one of the methods used to make processors fast.
Some terms: stages, task order, throughput.
In a pipeline the stages operate simultaneously (in parallel). This is possible only if we have separate resources for each stage.

Slide 4

Sequential Laundry: Non-pipelined
Sequential laundry takes 6 hours for 4 loads. If they learned pipelining, how long would laundry take?
[Figure: timeline from 6 PM to midnight showing loads A, B, C and D in task order, each passing through stages of 30, 40 and 20 minutes, one load at a time]

Slide 5

Pipelined Laundry: Start Work ASAP
Pipelined laundry takes 3.5 hours for 4 loads.
[Figure: overlapped timeline from 6 PM to midnight showing loads A to D in task order, with stage times of 30, 40 and 20 minutes]
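The two laundry timelines can be reproduced with a small simulation. This is a minimal sketch under three assumptions: there is one machine per stage, the stage times are 30, 40 and 20 minutes as read from the figures, and a finished load may wait between stages. Each load starts a stage as soon as both the load and that stage's machine are free.

```python
def laundry_finish_times(stage_minutes, loads):
    """Finish time of each load when every stage starts as soon as possible."""
    machine_free = [0] * len(stage_minutes)  # when each stage's machine is next free
    done = []
    for _ in range(loads):
        ready = 0  # when this load can enter its next stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, machine_free[s])  # wait for the load and the machine
            machine_free[s] = start + minutes
            ready = machine_free[s]
        done.append(ready)
    return done


stages = [30, 40, 20]  # minutes per stage, as read from the laundry figures
sequential = 4 * sum(stages)                    # one load at a time
pipelined = laundry_finish_times(stages, 4)[-1]
print(sequential / 60, pipelined / 60)          # hours: 6.0 3.5
```

The last load finishes at 210 minutes (3.5 hours) instead of 360 minutes (6 hours). After the first load, one load completes every 40 minutes, which is the length of the slowest stage.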

Slide 6

Pipelining: Lessons
[Figure: pipelined laundry timeline from 6 PM to 9 PM, loads A to D in task order]
– Pipelining does not reduce the time to complete a single load; it improves the throughput of the entire workload.
– The pipeline rate is limited by the slowest pipeline stage.
– Multiple tasks operate simultaneously.
– Potential speedup = number of pipe stages.
– Unbalanced lengths of pipe stages reduce speedup.
– The time to "fill" the pipeline and the time to "drain" it reduce speedup.

Slide 7

Comparison: Example
Consider a non-pipelined machine with 5 execution steps of lengths 200 ps, 100 ps, 200 ps, 200 ps and 100 ps. Because of clock skew and setup, pipelining adds 5 ps of overhead to each instruction stage. Ignoring latency impact, how much speedup in the instruction execution rate will we gain from a pipeline?

Slide 8

Sequential versus Pipelined Execution
[Figure: sequential execution, where each instruction takes 800 ps, versus pipelined execution of the same instructions with their 200 ps and 100 ps steps overlapped]
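The arithmetic behind this example can be checked in a few lines of Python. This is a minimal sketch with two assumptions. First, every instruction in the non-pipelined machine passes through all five steps (800 ps). Second, the pipelined clock period equals the slowest step plus the 5 ps overhead. Fill and drain time is ignored, as the question specifies.

```python
steps_ps = [200, 100, 200, 200, 100]  # execution steps of the example machine
overhead_ps = 5                       # clock skew + setup added to every stage

unpipelined = sum(steps_ps)           # assumed: every instruction uses all 5 steps
cycle = max(steps_ps) + overhead_ps   # the slowest stage sets the pipelined clock
speedup = unpipelined / cycle         # fill/drain ignored, as the question asks

print(unpipelined, cycle, round(speedup, 2))  # 800 205 3.9
```

Under these assumptions the pipeline completes one instruction every 205 ps instead of every 800 ps, a speedup of about 3.9. That is below the 5 suggested by the stage count, because the steps are unbalanced and the overhead is paid on every cycle.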

Slide 9

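Speed Up Equation for Pipelining

$$\text{Speedup from pipelining} = \frac{\text{Average instruction time unpipelined}}{\text{Average instruction time pipelined}} = \frac{\text{CPI}_{\text{unpipelined}} \times \text{Clock cycle}_{\text{unpipelined}}}{\text{CPI}_{\text{pipelined}} \times \text{Clock cycle}_{\text{pipelined}}} = \frac{\text{CPI}_{\text{unpipelined}}}{\text{CPI}_{\text{pipelined}}} \times \frac{\text{Clock cycle}_{\text{unpipelined}}}{\text{Clock cycle}_{\text{pipelined}}}$$

$$\text{Ideal CPI}_{\text{pipelined}} = \frac{\text{CPI}_{\text{unpipelined}}}{\text{Pipeline depth}}$$

$$\text{Speedup} = \frac{\text{Ideal CPI}_{\text{pipelined}} \times \text{Pipeline depth}}{\text{CPI}_{\text{pipelined}}} \times \frac{\text{Clock cycle}_{\text{unpipelined}}}{\text{Clock cycle}_{\text{pipelined}}}$$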
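The last form of the speedup equation translates directly into a function. This is a minimal sketch with illustrative parameter names. CPI_unpipelined is recovered from the relation Ideal CPI_pipelined = CPI_unpipelined / Pipeline depth.

```python
def pipeline_speedup(ideal_cpi, depth, cpi_pipelined, cycle_unpipelined, cycle_pipelined):
    """Speedup = (Ideal CPI x Pipeline depth / CPI pipelined) x (cycle unpipelined / cycle pipelined)."""
    cpi_unpipelined = ideal_cpi * depth  # from Ideal CPI pipelined = CPI unpipelined / depth
    return (cpi_unpipelined / cpi_pipelined) * (cycle_unpipelined / cycle_pipelined)


print(pipeline_speedup(1.0, 5, 1.0, 1.0, 1.0))   # 5.0: ideal, balanced, no stalls
print(pipeline_speedup(1.0, 5, 1.25, 1.0, 1.0))  # 4.0: hypothetical stall cycles raise CPI
```

Take an ideal CPI of 1, a depth of 5, no stalls and equal clock cycles: the speedup equals the pipeline depth. A hypothetical CPI_pipelined of 1.25, for example from stalls caused by hazards, drops the speedup to 4.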

Slide 10

Speed Up Equation for Pipelining

Slide 11

It's Not That Easy for Computers: Limitations
Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle.
– Structural hazards: the hardware cannot support this combination of instructions that must execute in the same clock cycle (washer + dryer).
– Data hazards: an instruction depends on the result of a prior instruction still in the pipeline (one sock missing).
– Control hazards: pipelining of branches and other instructions that change the flow of control.
The common solution is to stall the pipeline until the hazard "bubbles" through the pipeline.

Slide 12

Instruction Level Parallelism
Longer pipeline. Laundry analogy: divide our washer into three machines that perform the wash, rinse and spin steps of a traditional machine. To get the full speedup, we need to rebalance the remaining steps so that they are all the same length. The amount of parallelism exploited is higher, since more operations are being overlapped.
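Comparing steady-state throughput shows why rebalancing is needed: throughput is set by the slowest stage. This is a minimal sketch that ignores fill and drain time. The original 30/40/20-minute stages come from the laundry timeline; the other stage lengths below are hypothetical.

```python
def cycle_time(stage_minutes):
    # Loads advance from stage to stage at the pace of the slowest stage.
    return max(stage_minutes)


def loads_per_hour(stage_minutes):
    # Steady-state throughput once the pipeline is full (fill/drain ignored).
    return 60 / cycle_time(stage_minutes)


original = [30, 40, 20]              # wash, dry, fold as read from the timeline
split_washer = [10, 10, 10, 40, 20]  # hypothetical: wash/rinse/spin of 10 min each
rebalanced = [10] * 9                # hypothetical: every step rebalanced to 10 min

for name, stages in [("original", original), ("split washer", split_washer), ("rebalanced", rebalanced)]:
    print(name, len(stages), "stages:", loads_per_hour(stages), "loads/hour")
```

Splitting only the washer deepens the pipeline from 3 to 5 stages but leaves throughput at 1.5 loads per hour, because the 40-minute dryer still sets the pace. Throughput rises to 6 loads per hour only when every step is rebalanced to 10 minutes (the same 90 minutes of total work).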

Slide 13

Advanced Pipelining: Techniques
Motivation: to further exploit instruction-level parallelism (ILP).
– Multiple issue: replicate the internal components of the computer so that it can launch multiple instructions in every pipeline stage.
– Dynamic pipeline scheduling (also called dynamic pipelining or dynamic multiple issue): scheduling by hardware to avoid pipeline hazards.

Slide 14

Multiple Issue: Superscalar
Launch multiple instructions in parallel. A superscalar laundry would replace our household washer and dryer with, say, three washers and three dryers. It would also need three assistants to fold and put away three times as much laundry in the same amount of time. The drawback is the extra work needed to keep all the machines busy and to transfer loads to the next pipeline stage. Superscalar is defined as executing more than one instruction per clock cycle.

Slide 15

Performance Metrics: CPI & IPC
The instruction execution rate can exceed the clock rate. Example: a 6 GHz, four-way multiple-issue microprocessor can execute at a peak rate of 24 billion instructions per second and has a best-case CPI of 0.25. Its instructions per clock cycle (IPC) is 4. With a 5-stage pipeline, such a processor would have 20 instructions in execution at any given time.
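The numbers in this example follow from the clock rate, the issue width and the pipeline depth. This is a minimal sketch that assumes the ideal case, where every issue slot is filled on every cycle.

```python
def peak_metrics(clock_hz, issue_width, pipeline_stages):
    """Best-case figures for a multiple-issue pipeline with every issue slot filled."""
    peak_ips = clock_hz * issue_width          # instructions completed per second
    best_cpi = 1 / issue_width                 # clock cycles per instruction
    ipc = issue_width                          # instructions per clock cycle
    in_flight = issue_width * pipeline_stages  # one issue group in every stage
    return peak_ips, best_cpi, ipc, in_flight


print(peak_metrics(6e9, 4, 5))  # (24000000000.0, 0.25, 4, 20)
```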

Slide 16

Multiple-Issue Processor: Decision Strategy
– Static multiple issue: decisions are made before execution; software based (compiler scheduling); VLIW (Very Long Instruction Word).
– Dynamic multiple issue: decisions are made at run (execution) time by the processor; dynamic scheduling; hardware based.

Slide 17

Static Multiple-Issue Processor
Issue packet: a set of instructions that can be paired to form one large instruction with multiple operations (VLIW). This approach relies on the compiler to take responsibility for handling data and control hazards. Some of the compiler's responsibilities may be static branch prediction and code scheduling.

Slide 18

Getting CPI < 1: Static 2-Issue Pipeline
Superscalar MIPS: 2 instructions per cycle, 1 ALU and 1 load instruction
– Fetch 64 bits/clock cycle; ALU on left, load on right
– Can only issue the second instruction if the first instruction issues

Type              Pipe stages
ALU instruction   IF  ID  EX  MEM WB
Load instruction  IF  ID  EX  MEM WB
ALU instruction       IF  ID  EX  MEM WB
Load instruction      IF  ID  EX  MEM WB
ALU instruction           IF  ID  EX  MEM WB
Load instruction          IF  ID  EX  MEM WB
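The two issue rules on this slide can be written as a small decision function. This is a minimal sketch. The operation sets, the function name and the `ready` flags are assumptions made for illustration, not part of the slide; the flags stand in for whatever hazard checks the pipeline performs.

```python
ALU_OR_BRANCH = {"addu", "addi", "sub", "slti", "bne"}  # ops allowed in the left slot (assumed set)
LOAD_STORE = {"lw", "sw"}                               # ops allowed in the right slot


def instructions_issued(left_op, right_op, left_ready=True, right_ready=True):
    """Instructions (0, 1 or 2) issued this cycle from one 64-bit fetch pair.

    left_ready/right_ready stand in for the pipeline's hazard checks.
    """
    if not left_ready:
        return 0  # the second instruction may only issue if the first one issues
    if left_op in ALU_OR_BRANCH and right_op in LOAD_STORE and right_ready:
        return 2  # ALU/branch on the left, load/store on the right: dual issue
    return 1      # the first issues alone; the second waits for a later cycle


print(instructions_issued("addu", "lw"))                    # 2
print(instructions_issued("lw", "addu"))                    # 1
print(instructions_issued("addu", "lw", left_ready=False))  # 0
```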

Slide 19

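Static Multiple Issue: Datapath
[Figure: static two-issue datapath with instruction memory (IM), a register file, an ALU for ALU/branch instructions and a separate ALU for lw/sw address calculation]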

Slide 20

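Example: Multiple-Issue Code Scheduling
Loop: lw   $t0, 0($s1)       # $t0 = array element
      addu $t0, $t0, $s2     # add scalar in $s2
      sw   $t0, 0($s1)       # store result
      addi $s1, $s1, -4      # decrement pointer
      bne  $s1, $zero, Loop  # branch if $s1 != 0
After reordering the instructions based on their dependences, we get a CPI of 0.8 (IPC = 1.25).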
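The CPI and IPC figures can be checked by writing the reordered loop as a two-slot schedule. The slot assignment below is an assumption. It is chosen to match the slide's CPI of 0.8 and to leave a one-cycle delay before a loaded value is used. Because addi moves ahead of sw, the store offset becomes 4($s1), so the store still addresses the same element.

```python
# One row per clock cycle: (ALU/branch slot, load/store slot); None = empty slot.
# Assumed slot assignment consistent with the slide's CPI of 0.8.
schedule = [
    (None,                    "lw   $t0, 0($s1)"),
    ("addi $s1, $s1, -4",     None),
    ("addu $t0, $t0, $s2",    None),
    ("bne  $s1, $zero, Loop", "sw   $t0, 4($s1)"),  # offset adjusted for the earlier addi
]


def cpi_and_ipc(schedule):
    cycles = len(schedule)
    instructions = sum(slot is not None for row in schedule for slot in row)
    return cycles / instructions, instructions / cycles


print(cpi_and_ipc(schedule))  # (0.8, 1.25)
```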

Slide 21

Loop Unrolling: 4 Iterations
Multiple copies of the loop body are made, exposing more ILP by overlapping instructions from different iterations. CPI = 8/14 = 0.57.
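Loop unrolling itself can be sketched as a small code generator for the loop from Slide 20. This hypothetical helper assumes the iteration count is a multiple of the unroll factor. Each copy gets its own temporary register, so the copies do not share $t0, and the pointer is decremented once per unrolled iteration. The generator does not schedule the code into issue packets; the 8-cycle figure comes from the slide.

```python
def unroll(factor):
    """Hypothetical helper: the Slide 20 loop body repeated `factor` times."""
    temps = [f"$t{i}" for i in range(factor)]  # one temporary register per copy
    code = []
    for i, t in enumerate(temps):
        code.append(f"lw   {t}, {-4 * i}($s1)")   # element i below the pointer
    for t in temps:
        code.append(f"addu {t}, {t}, $s2")        # add the scalar in $s2
    for i, t in enumerate(temps):
        code.append(f"sw   {t}, {-4 * i}($s1)")   # store back to the same element
    code.append(f"addi $s1, $s1, {-4 * factor}")  # one pointer update per unrolled iteration
    code.append("bne  $s1, $zero, Loop")
    return code


body = unroll(4)
print("\n".join(body))
print(len(body), round(8 / len(body), 2))  # 14 0.57
```

Unrolling four times yields 14 instructions: 4 loads, 4 adds, 4 stores, one addi and one bne. The slide's 8-cycle schedule therefore gives CPI = 8/14 ≈ 0.57.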

Slide 22

Dynamic Multiple-Issue Processors
Instructions are issued in order, and the processor decides whether zero, one or more instructions can issue in a given clock cycle. Achieving good performance still requires the compiler to schedule instructions so that dependences are moved apart, which improves the instruction issue rate.

Slide 23

Dynamic Scheduling: Definition
Dynamic pipeline scheduling goes past stalls to find later instructions to execute while waiting for the stall to be resolved. It chooses which instruction to execute next, reordering instructions to avoid stalls (dynamic issue decisions).
      lw   $t0, 20($s2)
      addu $t1, $t0, $s2     # needs $t0 from the lw, so it must wait
      sub  $s4, $s4, $t3     # independent of the lw: can execute while addu waits
      slti $t5, $s4, 20      # depends only on sub
      bne  $s1, $zero, Loop
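Suppose the lw has not yet delivered $t0. Which later instructions could the hardware start? This short Python sketch answers that question. It tracks only true (read-after-write) dependences and assumes no register is rewritten inside the window. The tuple encoding and the function name are hypothetical.

```python
# (text, destination register, source registers); bne writes no register.
code = [
    ("lw   $t0, 20($s2)",     "$t0", ["$s2"]),
    ("addu $t1, $t0, $s2",    "$t1", ["$t0", "$s2"]),
    ("sub  $s4, $s4, $t3",    "$s4", ["$s4", "$t3"]),
    ("slti $t5, $s4, 20",     "$t5", ["$s4"]),
    ("bne  $s1, $zero, Loop", None,  ["$s1", "$zero"]),
]


def can_run_ahead(code, stalled=0):
    """Mnemonics of later instructions that do not need the stalled instruction's result."""
    pending = {code[stalled][1]}   # registers whose values are still on the way
    ready = []
    for text, dest, srcs in code[stalled + 1:]:
        if pending.intersection(srcs):
            if dest is not None:
                pending.add(dest)  # this result will be late as well
        else:
            ready.append(text.split()[0])
    return ready


print(can_run_ahead(code))  # ['sub', 'slti', 'bne']
```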

Slide 24

HW Schemes: Why?
Why do this in hardware at run time?
– It works when true dependences cannot be known at compile time.
– The compiler is simpler.
– Code compiled for one machine runs well on another.

Slide 25

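Dynamic Pipeline Scheduling: Model
[Figure: an instruction fetch and decode unit (in order) sends instructions to reservation stations in front of the functional units (integer, floating point, load/store, ...), which execute out of order; a commit unit with a reorder buffer (in order) retires the results]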

Slide 26

HW Units: Working
– The instruction fetch/decode unit fetches instructions, decodes them and sends each instruction to the corresponding functional unit of the execute stage.
– There are 5 to 10 functional units, each with buffers called reservation stations that hold the operands and the operation.
– As soon as the buffer contains all of its operands, the functional unit executes and the result is calculated.
– It is up to the commit unit to decide when it is safe to put the result into the register file or, for a store, into memory.

Slide 27

Dynamic Scheduling: In-Order Completion
To make programs behave as if they were running on a simple non-pipelined computer, two rules apply. The instruction fetch and decode unit must issue instructions in order, and the commit unit must write results to registers and memory in program execution order (in-order completion). Hence, if an exception occurs, the computer can point to the last instruction executed, and the only registers updated will be those written by instructions before the exception.
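The split between out-of-order execution and in-order commit can be simulated in a few lines. This is a minimal sketch with several assumptions: unlimited functional units, all instructions already sitting in reservation stations at cycle 0, a hypothetical 3-cycle latency for lw and 1 cycle for everything else. It reuses the instruction sequence from the dynamic scheduling slide. An instruction finishes once its operands are ready plus its latency, and it commits only when it and every older instruction have finished.

```python
# Instruction sequence from the dynamic scheduling slide: (text, dest, sources).
code = [
    ("lw   $t0, 20($s2)",     "$t0", ["$s2"]),
    ("addu $t1, $t0, $s2",    "$t1", ["$t0", "$s2"]),
    ("sub  $s4, $s4, $t3",    "$s4", ["$s4", "$t3"]),
    ("slti $t5, $s4, 20",     "$t5", ["$s4"]),
    ("bne  $s1, $zero, Loop", None,  ["$s1", "$zero"]),
]
LATENCY = {"lw": 3}  # hypothetical; every other operation takes 1 cycle


def execute_and_commit(code, latency):
    """Idealized out-of-order execution (unlimited functional units) with in-order commit."""
    ready_at = {}  # register -> cycle at which its newest value is available
    finish = []
    for text, dest, srcs in code:
        op = text.split()[0]
        start = max((ready_at.get(r, 0) for r in srcs), default=0)  # wait only for operands
        done = start + latency.get(op, 1)
        finish.append(done)
        if dest is not None:
            ready_at[dest] = done
    # Commit unit: an instruction retires only when it and all older instructions are done.
    commit, last = [], 0
    for done in finish:
        last = max(last, done)
        commit.append(last)
    return finish, commit


finish, commit = execute_and_commit(code, LATENCY)
order = [code[i][0].split()[0] for i in sorted(range(len(code)), key=lambda i: (finish[i], i))]
print(finish)  # [3, 4, 1, 2, 1]
print(commit)  # [3, 4, 4, 4, 4]
print(order)   # ['sub', 'bne', 'slti', 'lw', 'addu']
```

The instructions finish in the order sub, bne, slti, lw, addu, but they commit in program order. An exception taken at any point therefore sees registers updated only by older instructions.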

Slide 28

Dynamic Scheduling: Speculation
Speculative execution: dynamic scheduling can be combined with branch prediction, so that after a mispredicted branch the commit unit is able to discard all the results in the execution units. Dynamic scheduling can also be combined with superscalar execution, so each unit may commit 4 to 6 instructions per cycle.
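Discarding wrong-path results fits naturally into the in-order commit step. This is a minimal sketch of a reorder buffer (ROB) commit step. The entry fields, the function name and the default commit width of 4 are assumptions; the width is taken from the low end of the slide's 4 to 6 range.

```python
from dataclasses import dataclass


@dataclass
class RobEntry:
    name: str
    done: bool = True           # result has been produced by a functional unit
    mispredicted: bool = False  # set on a branch whose prediction turned out wrong


def commit_step(rob, width=4):
    """Commit up to `width` finished entries in program order.

    Returns (committed, discarded, remaining). A mispredicted branch commits,
    and every younger (wrong-path) entry is discarded instead of committed.
    """
    committed = []
    for i, entry in enumerate(rob):
        if len(committed) == width or not entry.done:
            return committed, [], rob[i:]  # stop at the width limit or an unfinished entry
        committed.append(entry.name)
        if entry.mispredicted:
            return committed, [e.name for e in rob[i + 1:]], []
    return committed, [], []


rob = [RobEntry("lw"), RobEntry("bne", mispredicted=True), RobEntry("addu"), RobEntry("sw", done=False)]
committed, discarded, remaining = commit_step(rob)
print(committed, discarded)  # ['lw', 'bne'] ['addu', 'sw']
```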

Slide 29

Superscalar Processor

Slide 30

Conclusion: Several Steps of ILP Exploitation
