Silent stores

Date post: 16-Feb-2017
Upload: harish-chetty
SILENT STORES: HARISH CHETTY, SUJAY GANDHAM & POORNA CHANDRA VELADI
Transcript
Page 1: Silent stores

SILENT STORES

HARISH CHETTY, SUJAY GANDHAM & POORNA CHANDRA VELADI

Page 2: Silent stores

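SILENT STORE: every byte written matches the data already at the destination

255 0 0 0 0 0 0 0
255 0 0 0 0 0 0 0

SILENT BYTES: only some of the bytes written match the data already at the destination (here 7 of 8)

147 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0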
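The distinction on this slide can be sketched in a few lines of Python. This is only an illustration, assuming each store is available as two equal-length byte sequences (the bytes already at the destination and the bytes being written). The function names are hypothetical and are not taken from the original tooling.

```python
def silent_byte_count(old, new):
    """Count byte positions where the value written equals the value already stored."""
    if len(old) != len(new):
        raise ValueError("old and new data must have the same length")
    return sum(1 for o, n in zip(old, new) if o == n)


def is_silent_store(old, new):
    """A store is silent when every byte it writes matches what is already there."""
    return silent_byte_count(old, new) == len(new)


# The two cases pictured on the slide (which row is old vs. new is an assumption).
same = [255, 0, 0, 0, 0, 0, 0, 0]
print(is_silent_store(same, same))                              # True

old, new = [147, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]
print(is_silent_store(old, new), silent_byte_count(old, new))   # False 7
```

A non-silent store can therefore still contribute silent bytes, which is why the two quantities are reported separately.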

Page 3: Silent stores

RESEARCH QUESTIONS

1] Determine the ratio of silent stores to total stores in different benchmarks.

2] Determine the clustering and pattern behavior of silent stores:
   - clustering behavior of silent stores only
   - clustering behavior of silent and non-silent stores together

Page 4: Silent stores

MODIFICATIONS

We had to make two modifications to acquire the required data.

1] Modified lsq_unit_impl.hh to write the data to a file (Store.txt). This file contains 2 lines for each store:
   - Line 1: the address the store writes to
   - Line 2: the data the store is about to write

2] Modified packet.hh to write the data to a file (Cache.txt). This file contains 4 lines for each packet:
   - Line 1: the address the packet writes to
   - Line 2: the number of bytes being written
   - Line 3: the old data at the destination
   - Line 4: the new data being written to the destination

Page 5: Silent stores

Store.txt

Addr : 0x1d5cf8
Data : 0x0
Addr : 0x1d5cf0
Data : 0x248
Addr : 0x1d5ce8
Data : 0x231
Addr : 0x1d5ce0
Data : 0x0
Addr : 0x1d5cd8
Data : 0x0

Cache.txt

Addr : 0x1d5cf8
Size : 8
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Addr : 0x1d5cf0
Size : 8
0 0 0 0 0 0 0 0
248 0 0 0 0 0 0 0
Addr : 0x1d5ce8
Size : 8
0 0 0 0 0 0 0 0
231 0 0 0 0 0 0 0
Addr : 0x1d5ce0
Size : 8
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Addr : 0x1d5cd8
Size : 8
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
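Because Cache.txt uses a fixed four-line record (address, size, old data, new data), it can be read one record at a time without loading the whole file. The sketch below is one possible reader, not the authors' code. It assumes the line layout shown in the sample, and the names `CacheRecord` and `read_cache_records` are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class CacheRecord(NamedTuple):
    addr: int
    size: int
    old: List[int]
    new: List[int]


def read_cache_records(lines: Iterable[str]) -> Iterator[CacheRecord]:
    """Lazily yield one record per four non-blank lines (Addr, Size, old, new)."""
    it = (ln.strip() for ln in lines)
    it = (ln for ln in it if ln)
    # zip over the same iterator four times groups consecutive lines into records;
    # a trailing incomplete record is silently dropped.
    for addr_ln, size_ln, old_ln, new_ln in zip(it, it, it, it):
        yield CacheRecord(
            addr=int(addr_ln.split(":", 1)[1].strip(), 16),
            size=int(size_ln.split(":", 1)[1].strip()),
            old=[int(b) for b in old_ln.split()],
            new=[int(b) for b in new_ln.split()],
        )


# Two records copied from the Cache.txt sample.
sample = """\
Addr : 0x1d5cf8
Size : 8
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Addr : 0x1d5cf0
Size : 8
0 0 0 0 0 0 0 0
248 0 0 0 0 0 0 0
""".splitlines()

records = list(read_cache_records(sample))
silent = sum(1 for r in records if r.old == r.new)
print(len(records), silent)  # 2 1
```

For a real trace the same generator can be fed an open file object (`with open("Cache.txt") as f: ...`), so only one record is held in memory at a time.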

Page 6: Silent stores

SETUP

All benchmarks were run with an 8 KB L1 cache (4-way set associative, 64-byte line size).

All tests were carried out on the detailed CPU.
- Each test took an enormous amount of time to run.
- To speed things up, we used cloud computers to run the benchmarks in parallel.
- Each machine had 4 cores, 8 GB RAM and an 80 GB SSD.
- Completion times ranged from 33 minutes (soplex) to 3897 minutes (omnetpp).
- Many benchmarks did not complete (running time > 6000 minutes).

Page 7: Silent stores

PROCESSING DATA

Processing the data was very difficult!
- The files were much larger than main memory, so it was impossible to read them in whole and carry out any sort of mapping or modification.
- File sizes were > 25 GB for some benchmarks.

A lot of coding was required:
- Two different forms of lazy reading
- Sampling logic for plotting
- Lazy selective sorting

Page 8: Silent stores

SILENT STORE RATIO

Configuration                Total Stores   Silent Stores   Ratio       Status
specrand_i_X86_8KB_4_64      11993059       5939535         0.495248    Completed
povray_X86_8KB_4_64          2006962        1060460         0.528391    Completed
soplex_X86_8KB_4_64          5855911        2174472         0.371329    Completed
perlbench_X86_8KB_4_64       322898         77418           0.23976     Completed
gobmk_X86_8KB_4_64           3320091        3195427         0.962452    Completed
libquantum_X86_8kBKB_4_64    13980555       2106616         0.150682    Completed
bzip2_X86_8kB_4_64           226490984      24108946        0.106445    Completed
gamess_X86_8kB_4_64          333742515      60388278        0.180943    Completed
omnetpp_X86_8KB_4_64         333742515      60388278        0.180943    Completed
gcc_X86_8KB_4_64             72284661       31533701        0.436243    Aborted
namd_X86_8KB_4_64            169225684      76474519        0.451908    Incomplete
lbm_X86_8KB_4_64             371077787      172324400       0.464389    Incomplete
mcf_X86_8kBKB_4_64           98439312       22986711        0.233511    Incomplete
milc_X86_8kBKB_4_64          286986509      17784410        0.0619694   Incomplete

Page 9: Silent stores

SILENT BYTE RATIO

Configuration                Total Store Bytes   Silent Bytes   Ratio      Status
specrand_i_X86_8KB_4_64      76518563            61601237       0.80505    Completed
povray_X86_8KB_4_64          15282175            13221672       0.86517    Completed
soplex_X86_8KB_4_64          36234532            24984419       0.68952    Completed
perlbench_X86_8KB_4_64       2311327             1738174        0.752024   Completed
gobmk_X86_8KB_4_64           21637745            21249594       0.982061   Completed
libquantum_X86_8kBKB_4_64    109697353           96032613       0.875432   Completed
bzip2_X86_8kB_4_64           742892581           458953854      0.617793   Completed
gamess_X86_8kB_4_64          2422301950          1704319167     0.703595   Completed
omnetpp_X86_8KB_4_64         535292751           434227897      0.811197   Completed
gcc_X86_8KB_4_64             2422301950          1704319167     0.703595   Aborted
namd_X86_8KB_4_64            1082700667          903980569      0.834931   Incomplete
lbm_X86_8KB_4_64             2911874103          1978336222     0.679403   Incomplete
mcf_X86_8kBKB_4_64           752304117           573852760      0.762794   Incomplete
milc_X86_8kBKB_4_64          ???                 ???            ???        Incomplete

Page 10: Silent stores

PLOTTING DATA

Plotting the stores was necessary to determine clustering behavior.

The first idea was to plot each and every store against its store number.
- This was impossible, as the number of stores was enormous.
- We did not have enough main memory to create such a plot.
- Even if we had been able to plot it, the information would have been practically useless due to the scale.

So we created a sampling technique:
- Divided the entire store sequence into 500 subparts
- Plotted only the first store in each subpart
- Created the charts from these samples using Python
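The sampling step can be sketched as follows. This is a minimal illustration rather than the original script: it assumes the total number of stores is already known (for example from an earlier counting pass), and the function name `sample_first_of_each_part` is hypothetical.

```python
import math
from itertools import islice


def sample_first_of_each_part(records, total, parts=500):
    """Yield the first record of each of `parts` roughly equal consecutive chunks.

    `records` may be any iterable (including a lazy file reader); it is walked once.
    """
    if total <= 0 or parts <= 0:
        return
    step = math.ceil(total / parts)
    # Keeping every `step`-th item, starting at index 0, is the same as taking
    # the first item of each chunk of length `step`.
    yield from islice(records, 0, total, step)


# 1000 stores split into 10 parts: indices 0, 100, ..., 900 are kept.
print(list(sample_first_of_each_part(range(1000), total=1000, parts=10)))
```

The sampled values can then be passed to any plotting library. Only the selected points, at most `parts` of them, ever need to be held in memory.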

There was still one major problem!!!

Page 11: Silent stores

hmmer_X86_8KB_4_64

Page 12: Silent stores

gobmk_X86_8KB_4_64

Page 13: Silent stores

lbm_X86_8KB_4_64

Page 14: Silent stores

[Figure: sampled sequence 1, 0, 1, 0, labeled "Incorrect Sequence"]

Page 15: Silent stores

RUN LENGTH ENCODING

We needed a new idea to identify clusters.

We noticed that a store can be in only 2 conditions:
- Silent vs non-silent
- This is equivalent to a true/false condition (1's and 0's).
- Thus, logically, our data was a very large string of binary data.
- This is similar to JPEG images, where data compression is applied to this kind of data.

It was possible to apply the same idea here: Run Length Encoding (RLE).
- Since storing the entire RLE was also not feasible, we kept only the top 200 runs.
- To make sure silent stores were not dominated by non-silent ones, we did 2 forms of RLE:

1] Top 200 RLE of both silent and non-silent stores
2] Top 200 RLE of only silent stores

Page 16: Silent stores

Example RLE of size 5

Input
1111111111000001111111111000111111111100000111110001110001111111111111111111100000000000000

Run-length encoded
10 X 1, 05 X 0, 10 X 1, 03 X 0, 10 X 1, 05 X 0, 05 X 1, 03 X 0, 03 X 1, 03 X 0, 20 X 1, 14 X 0

Sorted
20 X 1, 14 X 0, 10 X 1, 10 X 1, 10 X 1, 05 X 0, 05 X 0, 05 X 1, 03 X 0, 03 X 0, 03 X 0, 03 X 1

Trimmed
20 X 1, 14 X 0, 10 X 1, 10 X 1, 10 X 1
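The encode, sort and trim steps can be combined so that the full RLE is never stored. The sketch below is one way to do this with the Python standard library and is not necessarily how the original tool was written. The function names and the `only` parameter are hypothetical, and 1 is assumed to mean silent.

```python
import heapq
from itertools import groupby


def run_lengths(bits):
    """Lazily yield (bit, run_length) for each maximal run in a 0/1 stream."""
    for bit, group in groupby(bits):
        yield bit, sum(1 for _ in group)


def top_runs(bits, k=200, only=None):
    """Return the k longest runs, longest first; only=1 keeps silent runs only."""
    runs = run_lengths(bits)
    if only is not None:
        runs = (r for r in runs if r[0] == only)
    # nlargest keeps at most k runs in memory while consuming the stream.
    return heapq.nlargest(k, runs, key=lambda r: r[1])


# The runs from the slide's example, rebuilt as a bit stream.
example = [(1, 10), (0, 5), (1, 10), (0, 3), (1, 10), (0, 5),
           (1, 5), (0, 3), (1, 3), (0, 3), (1, 20), (0, 14)]
bits = [b for b, n in example for _ in range(n)]

print(top_runs(bits, k=5))
# [(1, 20), (0, 14), (1, 10), (1, 10), (1, 10)]
print(top_runs(bits, k=3, only=1))
# [(1, 20), (1, 10), (1, 10)]
```

Because `run_lengths` is a generator and `heapq.nlargest` retains only `k` candidates, the input can be a lazy stream of silent/non-silent flags read from disk.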

Page 17: Silent stores

Longest runs over both silent (Type 1) and non-silent (Type 0) stores. T = total stores, S = silent stores.

bzip2              specrand           gobmk              mcf
T 226490984        T 11993059         T 3320091          T 98439312
S 24108946         S 5939535          S 3195427          S 22986711

Type Length        Type Length        Type Length        Type Length
0    1865497       0    263           1    1560002       0    102406
0    1799967       1    152           1    1560002       1    84450
0    1465497       0    39            1    22889         0    23942
0    1399967       0    30            1    22528         0    11987
0    1065499       0    28            0    12341         0    11986
0    999969        0    25            0    8823          1    11973
0    999967        0    23            0    5289          1    11973
0    740025        0    22            0    1368          1    11973
0    674501        0    19            0    1368          1    11973
0    366149        0    18            0    1368          1    11973
0    342447        0    17            0    1368          1    11973

Page 18: Silent stores

Longest runs over silent stores only (Type 1). T = total stores, S = silent stores.

bzip2              specrand           gobmk              mcf
T 226490984        T 11993059         T 3320091          T 98439312
S 24108946         S 5939535          S 3195427          S 22986711

Type Length        Type Length        Type Length        Type Length
1    65538         1    152           1    1560002       1    84450
1    5576          1    15            1    1560002       1    11973
1    5460          1    15            1    22889         1    11973
1    4200          1    14            1    22528         1    11973
1    3288          1    14            1    152           1    11973
1    3260          1    14            1    107           1    11973
1    3138          1    14            1    58            1    11973
1    3094          1    14            1    58            1    11973
1    2965          1    14            1    58            1    11973
1    2962          1    14            1    58            1    11972
1    2814          1    14            1    58            1    11972

Page 19: Silent stores

CONCLUSION

- The number of silent stores is significant in almost all benchmarks.
- Silent bytes also deserve attention.
- Silent stores do show some observable patterns within programs.
- More evaluation is necessary to determine in which phase of the program these sequences happen.
- It is also necessary to evaluate how the nature of the program impacts silent stores.

