Overview: Let’s get started!


> Outline
1. Quick Introduction
2. PLINQ Hands-On
3. Performance Tips

> Prerequisites
1. .NET and C#
2. LINQ
3. Threading Basics

Multi-Core and .NET 4: In the words of developers

> “Getting an hour-long computation done in 10 minutes changes how we work.” - Carl Kadie, Microsoft’s eScience Research Group

> “.NET 4 has made it practical and cost-effective to implement parallelism where it may have been hard to justify in the past.” - Kieran Mockford, MSBuild

> “I do believe the .NET Framework 4 will change the way developers think about parallel programming.” - Gastón C. Hillar, independent IT consultant and freelance author

Visual Studio 2010: Tools, programming models and runtimes

[Diagram: the Visual Studio 2010 parallel stack.
Managed (.NET Framework 4): Parallel LINQ and the Task Parallel Library as programming models, running over the ThreadPool with its Task Scheduler and Resource Manager, plus parallel data structures.
Native (Visual C++ 10): the Parallel Pattern Library and Agents Library over the Concurrency Runtime (Task Scheduler, Resource Manager), plus parallel data structures and UMS Threads.
Tooling (Visual Studio IDE): Parallel Debugger Tool Windows and the Concurrency Visualizer.
Everything runs on operating system threads (Windows).]

Parallel LINQ

From LINQ to Objects to PLINQ: An easy change

> LINQ to Objects query:

int[] output = arr
    .Select(x => Foo(x))
    .ToArray();

> PLINQ query:

int[] output = arr.AsParallel()
    .Select(x => Foo(x))
    .ToArray();

PLINQ Hands-On: coding walkthrough

Array Mapping

int[] input = ...
bool[] output = input.AsParallel()
    .Select(x => IsPrime(x))
    .ToArray();

[Diagram: threads 1..N each run Select (IsPrime) over a static partition of the input array, for example 6, 3, 8, 2, 7, and write the true/false results directly into the output array.]

Array to array mapping is simple and efficient.
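The array-mapping query above can be sketched as a complete program. The `IsPrime` trial-division helper is illustrative (the deck never shows its body), and `AsOrdered()` is added here so the printed results line up with the input positions deterministically:

```csharp
using System;
using System.Linq;

class ArrayMapping
{
    // Illustrative primality test; deliberately CPU-bound,
    // which is the case where PLINQ pays off.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        int[] input = { 6, 3, 8, 2, 7 };

        // PLINQ partitions the array statically; each worker thread
        // writes its results directly into the output array.
        bool[] output = input.AsParallel()
                             .AsOrdered()   // keep results aligned with input
                             .Select(x => IsPrime(x))
                             .ToArray();

        Console.WriteLine(string.Join(",", output)); // False,True,False,True,True
    }
}
```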

Sequence Mapping

IEnumerable<int> input = Enumerable.Range(1, 100);
bool[] output = input.AsParallel()
    .Select(x => IsPrime(x))
    .ToArray();

[Diagram: threads 1..N pull elements from the shared input enumerator under a lock, run Select, and append to per-thread result buffers (Results 1..N) that feed the output array.]

Each thread processes a partition of inputs and stores results into a buffer.

Buffers are combined into one array.
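A runnable version of the sequence query, again with an illustrative `IsPrime` helper, counting how many of 1..100 are prime:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SequenceMapping
{
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        // An IEnumerable<> has no indexer, so PLINQ workers take chunks
        // from the shared enumerator under a lock, buffer their results,
        // and the buffers are concatenated at the end.
        IEnumerable<int> input = Enumerable.Range(1, 100);
        bool[] output = input.AsParallel()
                             .Select(x => IsPrime(x))
                             .ToArray();

        Console.WriteLine(output.Count(b => b)); // 25 primes up to 100
    }
}
```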

Asynchronous Mapping

var q = input.AsParallel()
    .Select(x => IsPrime(x));
foreach (var x in q) { ... }

[Diagram: threads 1..N read from the input enumerator under a lock, run Select, and fill per-thread result buffers (Results 1..N). The main thread's foreach polls the output enumerator, whose MoveNext hands out results as they arrive.]

In this query, the foreach loop starts consuming results as they are getting computed.
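A complete sketch of the streaming pattern: no `ToArray()`, so the main thread consumes results while worker threads are still producing them (the `IsPrime` helper is again illustrative):

```csharp
using System;
using System.Linq;

class AsyncMapping
{
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        var input = Enumerable.Range(1, 1000);

        // Lazy query: the foreach below drives execution and polls
        // the output buffers as workers fill them.
        var q = input.AsParallel()
                     .Select(x => IsPrime(x));

        int primes = 0;
        foreach (var isPrime in q)
            if (isPrime) primes++;

        Console.WriteLine(primes); // 168 primes up to 1000
    }
}
```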

Async Ordered Mapping or Filter

var q = input.AsParallel().AsOrdered()
    .Select(x => IsPrime(x));
foreach (var x in q) { ... }

[Diagram: same as the asynchronous mapping, except that results pass through a reordering buffer in the output enumerator before the main thread's foreach sees them.]

When ordering is turned on, PLINQ orders elements in a reordering buffer before yielding them to the foreach loop.
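An ordered filter makes the effect visible: without `AsOrdered()` the primes below could come out in any order. A minimal sketch, with the same illustrative `IsPrime`:

```csharp
using System;
using System.Linq;

class OrderedFilter
{
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        // AsOrdered() makes PLINQ track input positions and reorder
        // results before yielding them, at some buffering cost.
        var primes = Enumerable.Range(1, 30)
            .AsParallel().AsOrdered()
            .Where(x => IsPrime(x))
            .ToArray();

        Console.WriteLine(string.Join(",", primes)); // 2,3,5,7,11,13,17,19,23,29
    }
}
```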

Aggregation

int result = input.AsParallel()
    .Aggregate(
        0,
        (a, e) => a + Foo(e),
        (a1, a2) => a1 + a2,
        a => a);

[Diagram: threads 1..N each fold their partition of the input into a local result (res1..resN); the local results are then combined into the final result.]

Each thread computes a local result.

The local results are combined into a final result.
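A runnable instance, using a sum of squares as a stand-in for `Foo` (which the deck leaves undefined). Note that `ParallelEnumerable.Aggregate` with a per-thread combiner is the four-argument overload, which also takes a final result selector:

```csharp
using System;
using System.Linq;

class Aggregation
{
    static void Main()
    {
        int result = Enumerable.Range(1, 10).AsParallel()
            .Aggregate(
                0,                      // each thread starts from this seed
                (a, e) => a + e * e,    // local fold over the thread's partition
                (a1, a2) => a1 + a2,    // combine per-thread results
                a => a);                // final projection

        Console.WriteLine(result); // 385, the sum of squares 1..10
    }
}
```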

Search

int result = input.AsParallel().AsOrdered()
    .Where(x => IsPrime(x))
    .First();

[Diagram: threads 1..N scan their partitions for a match, polling a shared resultFound flag. The thread that finds the earliest match sets the flag and the result, so threads working on later partitions can stop early.]
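A concrete version of the search query, again with an illustrative `IsPrime` helper:

```csharp
using System;
using System.Linq;

class Search
{
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        // With AsOrdered(), First() returns the match that comes earliest
        // in input order; other threads stop once an earlier match is known.
        int result = Enumerable.Range(100, 900)
            .AsParallel().AsOrdered()
            .Where(x => IsPrime(x))
            .First();

        Console.WriteLine(result); // 101, the first prime >= 100
    }
}
```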

More complex query

int[] output = input.AsParallel()
    .Where(x => IsPrime(x))
    .GroupBy(x => x % 5)
    .Select(g => ProcessGroup(g))
    .ToArray();

[Diagram: the query runs in two phases. First, threads 1..N read from the input enumerator under a lock and run Where and GroupBy, building per-thread groups (Groups 1..N). Then the threads run Select over the groups, producing per-thread results (Results 1..N) that are merged into the output array.]

PLINQ PERFORMANCE TIPS

Performance Tip #1: Avoid memory allocations

> When the delegate allocates memory, GC and memory allocations can become the bottleneck
> Then, your algorithm is only as scalable as the GC
> Mitigations:
> Reduce memory allocations
> Turn on server GC

Performance Tip #2: Avoid true and false sharing

> Modern CPUs exploit locality
> Recently accessed memory locations are stored in a fast cache
> Multiple cores: each core has its own cache
> When a memory location is modified, it is invalidated in all caches
> In fact, the entire cache line is invalidated
> A cache line is usually 64 or 128 bytes

[Diagram: four cores, each running one thread, each with its own cache holding a copy of the same cache line (values 5, 7, 3, 2). When thread 1 writes to the line, the copies in the other cores' caches are invalidated and must be re-fetched from memory.]

If cores continue stomping on each other’s caches, most reads and writes will go to the main memory!
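A common way this bites in practice: worker threads incrementing adjacent elements of a shared array. A minimal sketch of the mitigation, using a trivial counting loop as stand-in work (both variants compute the same totals; the difference is purely in cache traffic):

```csharp
using System;
using System.Threading.Tasks;

class FalseSharing
{
    static void Main()
    {
        const int N = 1000000;

        // False sharing: totals[0..3] sit on the same cache line, so every
        // increment by one thread invalidates the line in the other caches.
        long[] totals = new long[4];
        Parallel.For(0, 4, t =>
        {
            for (int i = 0; i < N; i++)
                totals[t]++;          // contended cache line
        });

        // Mitigation: accumulate into a local variable and write the
        // shared slot once at the end.
        long[] totals2 = new long[4];
        Parallel.For(0, 4, t =>
        {
            long local = 0;           // thread-private, no shared line
            for (int i = 0; i < N; i++)
                local++;
            totals2[t] = local;
        });

        Console.WriteLine(totals2[0]); // 1000000
    }
}
```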

Performance Tip #3: Use expensive delegates

> A computationally expensive delegate is the best case for PLINQ
> A cheap delegate over a long sequence may also scale, but:
> Overheads reduce the benefit of scaling
> MoveNext and Current virtual method calls on the enumerator
> Virtual method calls to execute delegates
> Reading a long input sequence may be limited by memory throughput

Performance Tip #4: Write simple PLINQ queries

> PLINQ can execute all LINQ queries
> Simple queries are easier to reason about
> Break up complex queries so that only the expensive data-parallel part is in PLINQ:

src.Select(x => Foo(x))
   .TakeWhile(x => Filter(x))
   .AsParallel()
   .Select(x => Bar(x))
   .ToArray();

Performance Tip #5: Choose appropriate partitioning

> Partitioning algorithms vary in:
> Overhead
> Load-balancing
> The required input representation
> By default:
> Array and IList<> are partitioned statically
> Other IEnumerable<> types are partitioned on demand in chunks
> Custom partitioning supported via Partitioner
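As a sketch of the last bullet: `Partitioner.Create` over an array can request load balancing, trading some overhead for better behavior when the per-element work is uneven (the identity `Select` below is just a stand-in for uneven work):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

class Partitioning
{
    static void Main()
    {
        int[] data = Enumerable.Range(0, 1000).ToArray();

        // By default an array is partitioned statically. A load-balancing
        // partitioner hands out elements on demand instead.
        var partitioner = Partitioner.Create(data, loadBalance: true);

        int sum = partitioner.AsParallel()
                             .Select(x => x)   // stand-in for uneven work
                             .Sum();

        Console.WriteLine(sum); // 499500, the sum 0..999
    }
}
```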

Performance Tip #6: Use PLINQ with thought and care

> Measure, measure, measure!
> Find the bottleneck in your code
> If the bottleneck fits a data-parallel pattern, try PLINQ
> Measure again to validate the improvement
> If no improvement, check performance tips 1-5
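The measure-then-validate loop can be as simple as a `Stopwatch` around the sequential and parallel versions of the same query; the `IsPrime` helper and input size are illustrative, and the timings will vary by machine:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class Measure
{
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        int[] input = Enumerable.Range(2, 200000).ToArray();

        var sw = Stopwatch.StartNew();
        int seq = input.Count(x => IsPrime(x));
        Console.WriteLine("sequential: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        int par = input.AsParallel().Count(x => IsPrime(x));
        Console.WriteLine("parallel:   {0} ms", sw.ElapsedMilliseconds);

        // Validate the result first, then compare the timings.
        Console.WriteLine(seq == par); // True
    }
}
```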

More Information

> Parallel Computing Dev Center
> http://msdn.com/concurrency
> Code samples
> http://code.msdn.microsoft.com/ParExtSamples
> Team Blogs
> Managed: http://blogs.msdn.com/pfxteam
> Tools: http://blogs.msdn.com/visualizeparallel

> Forums
> http://social.msdn.microsoft.com/Forums/en-US/category/parallelcomputing
> My blog
> http://igoro.com/

YOUR FEEDBACK IS IMPORTANT TO US!

Please fill out session evaluation forms online at MicrosoftPDC.com

Learn More On Channel 9

> Expand your PDC experience through Channel 9

> Explore videos, hands-on labs, sample code and demos through the new Channel 9 training courses

channel9.msdn.com/learn

Built by Developers for Developers…

© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.