Neural Network Assisted Tile Size Selection Mohammed Rahman, Louis-Noël Pouchet and P. Sadayappan Dept. of Computer Science and Engineering Ohio State University June 22, 2010 iWAPT 2010 Workshop Berkeley, USA
Transcript
  • Neural Network Assisted Tile Size Selection

    Mohammed Rahman, Louis-Noël Pouchet and P. Sadayappan

    Dept. of Computer Science and Engineering, Ohio State University

    June 22, 2010

    iWAPT 2010 Workshop, Berkeley, USA

  • Introduction: iWAPT’10

    Overview

    Situation:
    - New advances in parametric tiling → more user code to be tuned
    - The problem of tile size selection is complex and unsolved!

    Our approach:
    - Use machine learning to build a predictor of performance as a function of tile size, for a specific program
    - Rely on the distribution shape to extract promising subspaces for empirical search
    - Outcome: < 2% of the space traversed → 90+% of maximal speedup achieved

    Ohio State 2

  • Problem Statement: iWAPT’10

    Tiling

    - Tiling partitions the computation into blocks
    - Note that we consider only rectangular tiling here
    - For tiling to be legal, such a partitioning must preserve the program's data dependences

  • Problem Statement: iWAPT’10

    Parametric Tiling

    Automatic parametric tiling [ICS’09, CGO’10]:
    - Produces code where the tile dimensions are parameters
    - Seamlessly finds/applies all required transformations to make the code tilable
    - Actual tile sizes are given at run-time
    - Very useful for tile size selection (no need to recompile)
    - Recent progress has generalized the approach:
      - Operates on arbitrary affine-control loops (imperfectly nested)
      - Produces good quality code
      - Even exposes pipeline parallelism if needed
    - Software (from OSU): Pluto, PrimeTile/DynTile/PTile
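To make "tile dimensions are parameters" concrete, here is a minimal hand-written sketch of a rectangularly tiled matrix product in Python. It is not output from Pluto or PrimeTile; the function names and the choice of kernel are assumptions made for illustration. The point is only that `ti`, `tj`, `tk` are ordinary run-time arguments, so trying another tile size requires no code regeneration.

```python
def matmul_tiled(a, b, n, ti, tj, tk):
    """C = A * B for n x n lists of lists, using rectangular ti x tj x tk tiles."""
    c = [[0.0] * n for _ in range(n)]
    # Outer loops enumerate tile origins; the tile sizes are run-time parameters.
    for i0 in range(0, n, ti):
        for j0 in range(0, n, tj):
            for k0 in range(0, n, tk):
                # Inner loops execute one tile; min() clips partial tiles at the border.
                for i in range(i0, min(i0 + ti, n)):
                    for k in range(k0, min(k0 + tk, n)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + tj, n)):
                            c[i][j] += aik * b[k][j]
    return c


def matmul_reference(a, b, n):
    """Untiled product, used only to check the tiled version."""
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

Any positive tile sizes give the same result as the untiled loop nest; only the traversal order, and therefore the cache behaviour, changes.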

  • Problem Statement: iWAPT’10

    Tile Size Selection

    Problem: how to select the tile size to have the best performance?

    - data reuse within the execution of a tile;
    - data reuse between tiles;
    - the layout in memory of the data used in a tile;
    - the relative penalty of misses at each level of the hierarchy, which is machine-dependent;
    - the cache replacement policy;
    - the interaction with other units, such as prefetching;
    - the interaction with vectorization, to enable a profitable steady-state for the vectorized loop(s);
    - ...

  • Problem Statement: iWAPT’10

    Performance Distribution

    [Figure: performance distribution of fdtd-2d and syr2k. Panels: "fdtd-2d: Performance distribution with Tile Size configurations" and "dsyr2k: Performance Distribution with Tile Size Configurations". x-axis: tile sizes (Ti:Tj:Tk); y-axis: execution time in seconds.]

    - Search space: 10648 possible tile sizes (22 candidate values per dimension, 22^3 = 10648)
      - {1, 2, 4, 6, 8, 10, 12, 16, 30, 32, 40, 48, 64, 100, 128, 150, 200, 256, 300, 400, 500, 600}
    - Machine: Core i7 (1 thread)
    - 2 "standard" distribution shapes
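The size of the search space can be checked by enumeration. The sketch below assumes that each of Ti, Tj and Tk is drawn independently from the 22 values listed on the slide, which is consistent with the stated total of 10648; the names `TILE_VALUES` and `tile_size_space` are illustrative.

```python
from itertools import product

# The 22 candidate values listed on the slide.
TILE_VALUES = [1, 2, 4, 6, 8, 10, 12, 16, 30, 32, 40, 48, 64, 100, 128,
               150, 200, 256, 300, 400, 500, 600]


def tile_size_space(values=TILE_VALUES, depth=3):
    """Return every (Ti, Tj, Tk) tuple, assuming independent dimensions."""
    return list(product(values, repeat=depth))


space = tile_size_space()
print(len(TILE_VALUES), len(space))
```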

  • Performance Prediction: iWAPT’10

    Objectives

    Correlate execution time with tile sizes

    - (Static) performance models do exist...
    - ... but fail to capture the interplay between all hardware components
    - Usually better suited for well-known problems (e.g., uniform reuse + square tiles)
    - Another view: pruning the space of poor-performing tile sizes

    Our approach:
    - Build a neural network to model the performance distribution
    - Focus directly on the execution time
    - ANN dedicated to a specific program + dataset size

  • Performance Prediction: iWAPT’10

    Neural Network

    Layout:
    - Fully connected, multi-layer perceptron (MLP)
    - Input layer: the tile sizes (Ti, Tj, Tk)
    - Output layer: predicted execution time
    - One hidden layer consisting of 30 hidden neurons
    - Uses the Stuttgart Neural Network Simulator library

    Training:
    - Select 5% (530 tuples) from the search space of 10648
    - Run the program on the machine using the tile sizes specified by the tuples
    - Train with resilient back-propagation (rprop), using the actual execution time for a tuple
    - Standard 10% cross-validation procedure
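The network layout above (3 inputs, 30 hidden neurons, 1 output) and a sign-based Rprop update can be sketched in plain Python. This is not the SNNS implementation used by the authors. The tanh activation, the log-scaled input encoding, the initial weights and every step-size constant are assumptions chosen for illustration, and the update rule is the common iRprop- variant.

```python
import math
import random

H = 30  # hidden neurons, as stated on the slide


def encode(tile):
    # Assumption: log-scale the tile sizes so that inputs fall roughly in [0, 1].
    return [math.log2(t) / 10.0 for t in tile]


def init_params(rng):
    # Flat list: 4 values per hidden neuron (3 weights + bias),
    # then H output weights and 1 output bias.
    return [rng.uniform(-0.5, 0.5) for _ in range(5 * H + 1)]


def forward(p, x):
    hidden = [math.tanh(p[4 * h] * x[0] + p[4 * h + 1] * x[1]
                        + p[4 * h + 2] * x[2] + p[4 * h + 3])
              for h in range(H)]
    out = sum(p[4 * H + h] * hidden[h] for h in range(H)) + p[5 * H]
    return out, hidden


def loss_and_grad(p, data):
    """Half mean squared error and its gradient over data = [(x, time), ...]."""
    grad = [0.0] * len(p)
    loss = 0.0
    for x, t in data:
        y, hidden = forward(p, x)
        err = y - t
        loss += 0.5 * err * err
        grad[5 * H] += err
        for h in range(H):
            grad[4 * H + h] += err * hidden[h]
            # Back-propagate through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
            dh = err * p[4 * H + h] * (1.0 - hidden[h] ** 2)
            grad[4 * h] += dh * x[0]
            grad[4 * h + 1] += dh * x[1]
            grad[4 * h + 2] += dh * x[2]
            grad[4 * h + 3] += dh
    n = len(data)
    return loss / n, [g / n for g in grad]


def train_rprop(p, data, epochs=100, eta_plus=1.2, eta_minus=0.5,
                step_min=1e-6, step_max=0.1):
    """Full-batch iRprop-: only the sign of each partial derivative is used."""
    step = [0.01] * len(p)
    prev = [0.0] * len(p)
    for _ in range(epochs):
        _, grad = loss_and_grad(p, data)
        for i, g in enumerate(grad):
            if g * prev[i] > 0:      # same sign as last epoch: grow the step
                step[i] = min(step[i] * eta_plus, step_max)
            elif g * prev[i] < 0:    # sign flipped: shrink the step, skip the update
                step[i] = max(step[i] * eta_minus, step_min)
                g = 0.0
            if g > 0:
                p[i] -= step[i]
            elif g < 0:
                p[i] += step[i]
            prev[i] = g
    return p
```

In use, `data` would hold `(encode(tile), measured_time)` pairs for the sampled 5% of the space, with part of the samples held out to monitor the prediction error.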

  • Performance Prediction: iWAPT’10

    Performance Prediction [1/2]

    [Figure: predicted versus actual performance for fdtd-2d and dsyr2k. Panels: "fdtd-2d: Predicted versus Actual Performance" and "dsyr2k: Predicted versus Actual Performance". x-axis: tile sizes (Ti:Tj:Tk); y-axis: execution time in seconds; series: ExTime (Actual), ExTime (Predicted).]

  • Performance Prediction: iWAPT’10

    Performance Prediction [2/2]

    [Figure: predicted versus actual performance for lu and dgemm. Panels: "lu: Predicted versus Actual Performance" and "dgemm: Predicted versus Actual Performance". x-axis: tile sizes (Ti:Tj:Tk); y-axis: execution time in seconds; series: ExTime (Actual), ExTime (Predicted).]

  • Performance Prediction: iWAPT’10

    Discussions

    - For trmm, lu, 2d-jacobi, syr2k and doitgen, the ANN predicts more than 90% of our search space with less than 10% deviation from the actual execution time
    - Across all benchmarks, it predicts 80% or more of the search space with less than 10% deviation
    - Usually smaller deviation for the best tile sizes

    → These ANNs are able to model the performance distribution

    Openings:
    - Program classifier w.r.t. performance distribution
    - Training: do not "fit" the training points so closely?
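The "fraction of the space predicted within 10% deviation" figure can be computed as follows. The slides do not spell out the deviation formula; this sketch assumes the relative error |predicted - actual| / actual, and `fraction_within` is a hypothetical helper name.

```python
def fraction_within(actual, predicted, tolerance=0.10):
    """Fraction of tile sizes whose predicted time is within `tolerance`
    (relative error, an assumed definition) of the measured time."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for a, p in zip(actual, predicted)
               if abs(p - a) / a < tolerance)
    return hits / len(actual)


# Three of these four predictions deviate by less than 10%.
print(fraction_within([1.0, 2.0, 0.5, 4.0], [1.05, 2.5, 0.52, 3.9]))
```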

  • Tile Size Selection: iWAPT’10

    Selecting the Best Tile Size

    The performance distribution can drive the empirical search to focus on promising subspaces

    Tile size selection:
    - Random approach has a huge variability on some distribution shapes
    - Exhaustive search is likely not needed
    - Need for an intermediate solution:
      - Low number of empirical runs
      - Good convergence, low variability
      - General enough to work on arbitrary user codes

  • Tile Size Selection: iWAPT’10

    Overview of the Algorithm

    1 Generate a parametrically tiled code
    2 Randomly select x% of the tile size space, and run them on the machine
    3 Train an ANN using this data
    4 Use the ANN to predict the performance of the entire space
    5 Collect the y tile sizes that are predicted best and have not already been run
    6 Run the y tile sizes on the machine, output the best found
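Steps 2 to 6 of the algorithm can be sketched as one function with pluggable parts. Everything named here is hypothetical: `measure` stands for running the parametrically tiled program with a given tile size, `fit` stands for training the predictor, and `fake_time` is a synthetic cost function. To keep the example short and self-contained, a nearest-neighbour predictor is swapped in for the ANN; it is not the authors' model.

```python
import math
import random


def assisted_search(space, measure, fit, x_percent, y, rng):
    """Sample x% of the space, fit a predictor, then run the y best unseen candidates."""
    n_samples = max(1, round(len(space) * x_percent / 100.0))
    timings = {t: measure(t) for t in rng.sample(space, n_samples)}  # step 2
    predict = fit(timings)                                           # step 3
    unseen = [t for t in space if t not in timings]
    unseen.sort(key=predict)                                         # step 4
    for t in unseen[:y]:                                             # steps 5 and 6
        timings[t] = measure(t)
    best = min(timings, key=timings.get)
    return best, timings[best], len(timings)


def fit_nearest_neighbour(timings):
    # Stand-in predictor (NOT an ANN): return the time of the closest sampled tile.
    points = [([math.log2(v) for v in t], time) for t, time in timings.items()]

    def predict(tile):
        q = [math.log2(v) for v in tile]
        nearest = min(points, key=lambda pt: sum((a - b) ** 2 for a, b in zip(pt[0], q)))
        return nearest[1]

    return predict


def fake_time(tile):
    # Synthetic stand-in for a timed run; fastest at (32, 32, 32).
    return 1.0 + sum((math.log2(v) - 5.0) ** 2 for v in tile)


values = [1, 2, 4, 8, 16, 32, 64, 128]
space = [(i, j, k) for i in values for j in values for k in values]
best, best_time, runs = assisted_search(space, fake_time, fit_nearest_neighbour,
                                        x_percent=5, y=20, rng=random.Random(0))
print(best, best_time, runs)
```

The total number of empirical runs is the sample size plus y, which is what keeps the traversed fraction of the space small.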

  • Tile Size Selection: iWAPT’10

    Experimental Setup

    - Studied various kernels (perfectly/imperfectly nested, BLAS & stencils)
    - Only focused on single-threaded execution, on an Intel Core i7
    - Comparison: simple random search (R), ANN search (ANN)
    - Repeat each experiment 100 times, for various sampling rates
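The repeated-experiment protocol for the random baseline (R) can be sketched as below. The slides do not define the reported percentage; this sketch assumes efficiency = (best time in the whole space) / (best time found) × 100, and `fake_time`, `random_search` and `summarize` are hypothetical names. Computing the optimum requires an exhaustive run, which is only realistic when evaluating the method.

```python
import math
import random


def fake_time(tile):
    # Synthetic stand-in for a timed run; minimum of 1.0 at (32, 32, 32).
    return 1.0 + sum((math.log2(v) - 5.0) ** 2 for v in tile)


def random_search(space, measure, n_runs, rng):
    """Baseline R: time n_runs randomly chosen tile sizes and keep the fastest."""
    return min(measure(t) for t in rng.sample(space, n_runs))


def summarize(space, measure, n_runs, repeats=100, seed=0):
    """Best, average and worst efficiency over `repeats` independent searches."""
    rng = random.Random(seed)
    optimum = min(measure(t) for t in space)  # exhaustive, for evaluation only
    eff = [100.0 * optimum / random_search(space, measure, n_runs, rng)
           for _ in range(repeats)]
    return {"best": max(eff), "average": sum(eff) / len(eff), "worst": min(eff)}


values = [1, 2, 4, 8, 16, 32, 64, 128]
space = [(i, j, k) for i in values for j in values for k in values]
stats = summarize(space, fake_time, n_runs=10)
print(stats)
```

The gap between the "best" and "worst" entries is the variability that the comparison between R and ANN is meant to expose.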

  • Tile Size Selection: iWAPT’10

    Experimental Results (y = 50)

    Sampling  Method       doitgen  gemm    syr2k   lu      2d-jacobi  fdtd-2d
    1%        R-best       100%     99.86%  98.15%  99.89%  99.91%     97.75%
    1%        R-average    98.71%   96.29%  94.80%  92.19%  94.10%     84.15%
    1%        R-worst      95.35%   69.64%  89.81%  40.63%  17.69%     31.02%
    1%        ANN-best     100%     99.86%  100%    100%    99.91%     100%
    1%        ANN-average  98.89%   96.35%  96.01%  92.62%  98.51%     84.50%
    1%        ANN-worst    97.26%   82.93%  89.79%  79.68%  94.23%     66.53%

    2%        R-best       99.97%   99.86%  98.71%  99.89%  100%       100%
    2%        R-average    98.71%   96.42%  94.80%  92.87%  97.60%     84.10%
    2%        R-worst      86.49%   67.89%  88.20%  45.29%  55.98%     27.30%
    2%        ANN-best     100%     99.86%  100%    100%    100%       100%
    2%        ANN-average  98.89%   96.76%  96.69%  95.34%  98.55%     88.61%
    2%        ANN-worst    97.26%   89.83%  89.65%  85.80%  94.17%     60.65%

    3%        R-best       99.97%   99.86%  98.71%  99.89%  100%       100%
    3%        R-average    98.77%   96.47%  94.80%  94.27%  98.39%     85.47%
    3%        R-worst      94.89%   63.58%  87.99%  61.24%  84.54%     47.99%
    3%        ANN-best     99.97%   99.86%  100%    100%    100%       100%
    3%        ANN-average  98.93%   97.14%  97.17%  95.34%  98.74%     91.45%
    3%        ANN-worst    97.64%   91.01%  92.27%  85.80%  94.50%     63.34%

    4%        R-best       99.97%   99.86%  98.71%  99.89%  100%       100%
    4%        R-average    98.80%   96.65%  94.93%  92.19%  98.41%     85.55%
    4%        R-worst      96.86%   69.73%  88.57%  52.03%  82.47%     43.74%
    4%        ANN-best     100%     99.86%  100%    100%    100%       100%
    4%        ANN-average  98.99%   97.67%  97.20%  95.79%  98.90%     93.55%
    4%        ANN-worst    98.28%   93.65%  92.66%  85.80%  94.50%     79.26%

  • Tile Size Selection: iWAPT’10

    Some Related Work

    Epshteyn et al. [LCPC’05]:
    - Search-oriented contribution
    - Uses regression curves to approximate the performance distribution
    - Uses active learning to select good candidates for empirical evaluation
    - Good results for BLAS kernels

    Yuki et al. [CGO’10]:
    - Aims at selecting/combining between different static models
    - Uses program features to characterize accesses, train ANN
    - Results demonstrated for matrix-like kernels

  • Tile Size Selection: iWAPT’10

    Conclusions and Future Work

    ANN is a candidate approach to connect tile sizes with performance
    - Good prediction quality
    - Deviation usually smaller for the good points
    - Combined search heuristic proposed:
      - Strong variability improvement over naive random approach
      - 90+% efficiency using < 2% of the space, likely can be improved further

    Future work:
    - Generalization!
      - Categorize benchmarks regarding the performance distribution shape
      - Dataset size
    - Do not try to fit the random samples during training
    - Reduce the training time
    - Problem: ANN configuration

  • Acknowledgements: iWAPT’10

    Acknowledgements

    This work was funded in part by the U.S. National Science Foundation through award 0926688 and the Defense Advanced Research Projects Agency through AFRL Contract FA8650-09-C-7915. The opinions and findings in this document do not necessarily reflect the views of either the United States Government or the Ohio State University.
