Flexible Reference-Counting-Based Hardware Acceleration for Garbage Collection


Presentation Transcript

Slide 1

Flexible Reference-Counting-Based Hardware Acceleration for Garbage Collection
José A. Joao*, Onur Mutlu‡, Yale N. Patt*
* HPS Research Group, University of Texas at Austin
‡ Computer Architecture Laboratory, Carnegie Mellon University

Slide 2

Motivation: Garbage Collection
Garbage collection (GC) is a key component of managed languages. It automatically frees memory blocks that are no longer in use, which eliminates bugs and improves security. GC identifies dead (unreachable) objects and makes their blocks available to the memory allocator.
Significant overheads: processor cycles, cache pollution, and pauses/delays in the application.

Slide 3

Software Garbage Collectors
Tracing collectors: recursively follow every pointer, starting with global, stack and register variables, scanning each object for pointers. They run explicit collections that visit all live objects.
Reference counting: tracks the number of references to each object. Reclamation is immediate, but the technique is expensive and cannot collect cyclic data structures.
State of the art: generational collectors. Young objects are more likely to die than old objects. Generations: nursery (new) and mature (older) regions.
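The two properties of reference counting named on this slide (immediate reclamation, and the inability to collect cycles) can be illustrated with a minimal sketch. This is a toy model, not the collector used in the talk; the names `Obj`, `store`, `dec_ref` and `freed` are hypothetical and exist only for this illustration.

```python
class Obj:
    """A toy heap object with a reference count and named pointer fields."""
    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.fields = {}

freed = []  # names of reclaimed objects, in reclamation order

def dec_ref(obj):
    """Drop one reference; reclaim the object as soon as its count hits zero."""
    obj.rc -= 1
    if obj.rc == 0:
        freed.append(obj.name)
        # Reclaiming an object also drops the references it held.
        for child in list(obj.fields.values()):
            dec_ref(child)
        obj.fields.clear()

def store(src, field, dst):
    """Write pointer dst into src.field, adjusting both reference counts."""
    old = src.fields.get(field)
    dst.rc += 1  # increment first so that storing the same pointer twice is safe
    src.fields[field] = dst
    if old is not None:
        dec_ref(old)

root = Obj("root")
root.rc = 1  # assumption: pinned, standing in for a stack or global reference

# Acyclic chain root -> a -> b: dropping the root pointer frees both at once.
a, b = Obj("a"), Obj("b")
store(root, "x", a)
store(a, "next", b)
dec_ref(root.fields.pop("x"))

# Cycle c <-> d: after the root pointer is dropped, each still has count 1.
c, d = Obj("c"), Obj("d")
store(root, "y", c)
store(c, "next", d)
store(d, "next", c)
dec_ref(root.fields.pop("y"))
```

After this runs, `freed` holds only `a` and `b`, while `c` and `d` keep a count of 1 each even though nothing reachable points to them. That leftover garbage is the kind a tracing collection still has to find.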

Slide 4

Overhead of Garbage Collection [chart not reproduced in the transcript]

Slide 5

Hardware Garbage Collectors
Hardware GC in general-purpose processors? It ties one GC algorithm into the ISA and the microarchitecture, has a high cost due to major changes to the processor and/or the memory system, and misses opportunities at the software level, e.g. locality improvement. It is a rigid trade-off: reduced flexibility for higher performance on specific applications.
Transistors are available: build accelerators for commonly used functionality.
How much hardware and how much software for GC?

Slide 6

Our Goal
Architectural and hardware acceleration support for GC that reduces the overhead of software GC, keeps the flexibility of software GC, and works with any existing software GC algorithm.

Slide 7

Basic Idea
Simple but incomplete hardware garbage collection runs until the heap is full. Software GC then runs and collects the remaining dead objects. The overhead of GC is reduced.

Slide 8

Hardware-assisted Automatic Memory Management (HAMM)
Hardware-software cooperative acceleration for GC:
Reference count tracking, to find dead objects without software GC.
Memory block reuse handling, to provide available blocks to the software allocator.
Together these reduce the frequency and overhead of software GC.
Key characteristics: the software memory allocator is in control; software GC still runs and makes high-level decisions; HAMM can stay simple, because it does not need to track all objects.

Slide 9

ISA Extensions for HAMM
Memory allocation: REALLOCMEM, ALLOCMEM
Pointer tracking (store pointer): MOVPTR, MOVPTROVR, PUSHPTR, POPPTR, POPPTROVR
Garbage collection

Slide 10

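Overview of HAMM [block diagram not reproduced in the transcript]
Labels in the diagram: Core 0, Core 1, …, Core N; in each core, an LD/ST unit that sends RC updates to an L1 Reference Count Coalescing Buffer (RCCB), and an L1 Available Block Table (ABT) that holds block addresses; L2 RCCB and L2 ABT; CPU Chip 0, CPU Chip 1, …, CPU Chip M; main memory, holding the RC of live objects, reusable blocks, and the ABT.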

Slide 11

Modified Allocator
    addr ← REALLOCMEM size
    if (addr == 0) then
        // ABT does not have a free block → regular software allocator
        addr ← bump_pointer
        bump_pointer ← bump_pointer + size
        …
    else
        // use address provided by ABT
    end if
    // Initialize block starting at addr
    ALLOCMEM object_addr, size
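The allocator's control flow can be modeled in software as follows. This is a rough sketch under stated assumptions: `AvailableBlockTable`, `Allocator`, exact-size matching of reusable blocks, and the `MemoryError` on a full heap are all hypothetical choices made for illustration. Only the "REALLOCMEM returns 0 when no block is available, then fall back to bump-pointer allocation" behavior comes from the slide.

```python
class AvailableBlockTable:
    """Toy stand-in for the ABT: reusable block addresses, grouped by size."""
    def __init__(self):
        self.free = {}  # size -> list of reusable block addresses

    def add(self, addr, size):
        """Record that the block at addr was found dead and can be reused."""
        self.free.setdefault(size, []).append(addr)

    def reallocmem(self, size):
        """Model of REALLOCMEM: a reusable address, or 0 if none is available."""
        blocks = self.free.get(size)
        return blocks.pop() if blocks else 0


class Allocator:
    """Bump-pointer allocator that asks the ABT for a reusable block first."""
    def __init__(self, heap_start, heap_end, abt):
        self.bump_pointer = heap_start  # must be nonzero: 0 means "no block"
        self.heap_end = heap_end
        self.abt = abt

    def allocate(self, size):
        addr = self.abt.reallocmem(size)
        if addr == 0:
            # ABT has no free block: regular software allocation path.
            if self.bump_pointer + size > self.heap_end:
                raise MemoryError("heap full: software GC would run here")
            addr = self.bump_pointer
            self.bump_pointer += size
        # The slide's ALLOCMEM step (initializing the block) is not modeled.
        return addr


abt = AvailableBlockTable()
alloc = Allocator(heap_start=0x1000, heap_end=0x1100, abt=abt)
first = alloc.allocate(64)   # bump-pointer path
second = alloc.allocate(64)  # bump-pointer path
abt.add(first, 64)           # assumption: hardware found `first` dead
third = alloc.allocate(64)   # reuses the block that `first` occupied
```

In this run the third allocation returns the address of the first block, and the bump pointer does not move, so the heap fills more slowly. That is how block reuse delays the next software collection.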

Slide 12

Example of HAMM [animated diagram not reproduced in the transcript]
As far as the surviving labels show, the example follows object A through the structures of the overview: the LD/ST unit issues incRC A and decRC A updates, the L1 RCCB coalesces them into a net count for A, and evictions carry that count to the L2 RCCB and then to the RC kept in main memory. Once A's RC reaches 0, A is dead: its block becomes reusable, and its address goes into the ABT and is prefetched into the L2 and L1 ABT.
Instructions shown: MOV R3, 0x50; ALLOCMEM A, size; MOVPTR R3, A; MOV addr1, 0x50; REALLOCMEM R2, size; MOV addr2, 0x020; PUSHPTR A.
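The coalescing step can be sketched in software. This is a simplified single-level model, not the hardware design: the class name `CoalescingBuffer`, the LRU eviction policy, the dictionary standing in for reference counts in memory, and the plain list standing in for the ABT are all assumptions made for illustration.

```python
from collections import OrderedDict

class CoalescingBuffer:
    """Toy RCCB: accumulates net reference-count deltas per block address."""
    def __init__(self, capacity, memory_rc, abt):
        self.capacity = capacity
        self.deltas = OrderedDict()  # address -> pending net delta, LRU first
        self.memory_rc = memory_rc   # address -> reference count in "memory"
        self.abt = abt               # addresses of dead, reusable blocks
        self.memory_updates = 0      # number of writes to memory_rc

    def update(self, addr, delta):
        """Record an incRC (+1) or decRC (-1), merging with a pending entry."""
        if addr in self.deltas:
            self.deltas[addr] += delta
            self.deltas.move_to_end(addr)
        else:
            if len(self.deltas) >= self.capacity:
                self.evict()
            self.deltas[addr] = delta

    def evict(self):
        """Apply the least recently used pending delta to the count in memory."""
        addr, delta = self.deltas.popitem(last=False)
        rc = self.memory_rc.get(addr, 0) + delta
        self.memory_rc[addr] = rc
        self.memory_updates += 1
        if rc == 0:
            self.abt.append(addr)  # block is dead: make it reusable

    def flush(self):
        """Drain every pending delta to memory."""
        while self.deltas:
            self.evict()


A, B, C = 0x100, 0x200, 0x300
rccb = CoalescingBuffer(capacity=2, memory_rc={}, abt=[])
rccb.update(A, +1)
rccb.update(A, +1)
rccb.update(A, -1)  # three updates to A coalesce into one pending +1
rccb.update(B, +1)
rccb.update(C, +1)  # buffer full: A's pending +1 is evicted to memory
rccb.update(A, -1)  # buffer full again: B's pending +1 is evicted
rccb.flush()        # C goes to memory, then A's count drops to 0
```

Six reference-count updates reach memory as four writes, and A's address ends up in the reusable list once its count in memory reaches zero. A model with several buffers would also have to make sure a delayed increment is never overtaken by a decrement, which this single-buffer sketch does not address.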

Slide 13

ISA Extensions for HAMM
Memory allocation: REALLOCMEM, ALLOCMEM
Pointer tracking (store pointer): MOVPTR, MOVPTROVR, PUSHPTR, POPPTR, POPPTROVR
Garbage collection: FLUSHRC

Slide 14

Methodology
Benchmarks: DaCapo suite on the Jikes Research Virtual Machine with its best GC, GenMS.
Simulator: Simics + cycle-accurate x86 simulator.
64 KB, 2-way, 2-cycle I-cache
16 KB perceptron predictor
Minimum 20-cycle branch misprediction penalty
Wide-issue core with a 128-entry instruction window
64 KB, 4-way, 2-cycle, 64B-line L1 D-cache
4 MB, 8-way, 16-cycle, 64B-line unified L2 cache
150-cycle minimum memory latency
Different methodologies for the two components:
GC time: estimated based on actual garbage collection work over the whole benchmark.
Application: cycle-accurate simulation with microarchitectural modifications on 200M-instruction slices.

Slide 15

GC Time Reduction [chart not reproduced in the transcript]

Slide 16

Application Performance [chart not reproduced in the transcript]
Since GC time is reduced by 29%, HAMM is a win.

Slide 17

Why does HAMM work?
HAMM reduces GC time because it:
Eliminates collections: 52%/50% of nursery/full-heap collections.
Enables memory block reuse for 69% of all new objects in the nursery and 38% of allocations into the older generation.
Reduces GC work: 21%/49% for nursery/full-heap collections.
HAMM does not slow down the application significantly:
Maximum L1 cache miss increase: 4%
Maximum L2 cache miss increase: 3.5%
HAMM itself is responsible for only 1.4% of all L2 misses.

Slide 18

Conclusion
Garbage collection is very useful, but it is also a significant source of overhead. Improvements on pure software GC or hardware GC are limited.
We propose HAMM, a cooperative hardware-software technique:
Simplified hardware-assisted reference counting and block reuse.
Reduces GC time by 29%.
Does not significantly affect application performance.
Reasonable cost (67KB on a 4-core chip) for an architectural accelerator of an important functionality.
HAMM can be an enabler encouraging developers to use managed languages.

Slide 19

Thank You! Questions?
