
TensorFlow - Marco Serafini · Control Flow •How do enable dynamic control flow with static...

Date post: 12-Jul-2020
TensorFlow Marco Serafini COMPSCI 590S Lecture 22
Transcript

TensorFlow

Marco Serafini

COMPSCI 590S Lecture 22

Motivations
• DistBelief: previous iteration, based on a parameter server
• Limitations:
  • Monolithic layers, difficult to define new ones
  • Difficult to offload computation with complex dependencies to parameter servers
    • E.g., applying updates based on gradients accumulated over multiple iterations
  • Fixed execution pattern
    • Read data, compute loss function (forward pass), compute gradients for parameters (backward pass), write gradients to parameter server
  • Not optimized for single workstations and GPUs

TensorFlow
• Dataflow graph of operators, but not a DAG
  • Loops and conditionals
• Deferred (lazy) execution
  • Enables optimizations, e.g., pipelining
• Composable, simple basic operators
  • Matrix multiplication, convolution
  • Can be combined into more complex operators
• Stateful operators
  • For shared parameters
• Concept of devices
  • CPUs, GPUs, mobile devices
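The following toy sketch in Python illustrates deferred execution and composition. Building the graph only records operators, and nothing is computed until the graph is run. A "complex" operator is simply a function that wires basic operators together. It loosely mirrors TensorFlow 1.x graph mode. The `Graph`, `Node`, `op`, `run`, and `affine` names are hypothetical and are not TensorFlow's API.

```python
from typing import Callable, Dict, List


class Node:
    """An operator in a toy dataflow graph (hypothetical, not TensorFlow's API)."""

    def __init__(self, name: str, fn: Callable[..., float], inputs: List["Node"]):
        self.name = name
        self.fn = fn
        self.inputs = inputs


class Graph:
    def __init__(self) -> None:
        self.executed: List[str] = []  # names of operators actually run, in order

    def op(self, name: str, fn: Callable[..., float], *inputs: Node) -> Node:
        # Deferred execution: building the graph only records the operator.
        return Node(name, fn, list(inputs))

    def run(self, fetch: Node) -> float:
        cache: Dict[str, float] = {}

        def evaluate(node: Node) -> float:
            if node.name not in cache:
                args = [evaluate(i) for i in node.inputs]
                self.executed.append(node.name)
                cache[node.name] = node.fn(*args)
            return cache[node.name]

        return evaluate(fetch)


def affine(g: Graph, prefix: str, w: Node, x: Node, b: Node) -> Node:
    # A "complex" operator composed from two basic ones (multiply, then add).
    return g.op(prefix + "/add", lambda p, q: p + q,
                g.op(prefix + "/mul", lambda p, q: p * q, w, x), b)


g = Graph()
w = g.op("w", lambda: 2.0)
x = g.op("x", lambda: 5.0)
b = g.op("b", lambda: 1.0)
y = affine(g, "affine", w, x, b)
assert g.executed == []  # nothing has run while building the graph
print(g.run(y))  # 11.0
```

Because execution is deferred, the runtime sees the whole graph before running anything. That global view is what makes optimizations such as pipelining possible.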

Example

Tensors
• Format
  • n-dimensional arrays
  • Elements have primitive types (including byte arrays)
• Tensors are dense
  • All elements are represented
  • User must find ways to encode sparse data efficiently
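Because tensors are dense, sparse data must be encoded using dense tensors. A common choice is the coordinate format, which is similar in spirit to TensorFlow's `tf.sparse.SparseTensor` with its `indices`, `values`, and `dense_shape`. It stores a dense tensor of indices, a dense tensor of values, and the dense shape. The sketch below is pure Python, with nested lists standing in for tensors. The `to_coo` and `from_coo` helpers are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def to_coo(dense: Matrix) -> Tuple[List[List[int]], List[float], List[int]]:
    """Encode a 2-D matrix as (indices, values, dense_shape), keeping only non-zeros."""
    indices: List[List[int]] = []
    values: List[float] = []
    for r, row in enumerate(dense):
        for c, v in enumerate(row):
            if v != 0:
                indices.append([r, c])
                values.append(v)
    shape = [len(dense), len(dense[0]) if dense else 0]
    return indices, values, shape


def from_coo(indices: List[List[int]], values: List[float], shape: List[int]) -> Matrix:
    """Decode back to a dense matrix; every element is materialized again."""
    dense = [[0.0] * shape[1] for _ in range(shape[0])]
    for (r, c), v in zip(indices, values):
        dense[r][c] = v
    return dense


m = [[0.0, 0.0, 3.0],
     [0.0, 0.0, 0.0],
     [1.5, 0.0, 0.0]]
idx, vals, shape = to_coo(m)
print(idx, vals, shape)  # [[0, 2], [2, 0]] [3.0, 1.5] [3, 3]
assert from_coo(idx, vals, shape) == m
```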

Operations
• Inputs and outputs are tensors
• State is kept through stateful operators
• Operations to handle variables (also tensors)
  • Variable op: returns a unique reference handle
  • Read op: takes a reference handle, produces the value of the variable
  • Write ops: take a reference and a value, and update the variable; several different write operations exist
• Queues are also stateful operators
  • Get a reference handle, modify through operations
  • Blocking semantics, backpressure, synchronization
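The variable operations can be mimicked with a toy resource store. Under this sketch's assumptions:

- a "Variable" op creates a buffer and returns an opaque integer handle;
- a "Read" op turns a handle into a value;
- two write ops (plain assignment and add-and-assign) update the buffer in place.

The function names are hypothetical and only loosely echo TensorFlow's op names.

```python
import itertools
from typing import Dict, List

_buffers: Dict[int, List[float]] = {}  # handle -> mutable buffer (the "state")
_next_handle = itertools.count()


def variable_op(initial: List[float]) -> int:
    """Create a buffer and return a unique reference handle."""
    handle = next(_next_handle)
    _buffers[handle] = list(initial)
    return handle


def read_op(handle: int) -> List[float]:
    """Produce the current value of the variable (a copy, like a fresh tensor)."""
    return list(_buffers[handle])


def assign_op(handle: int, value: List[float]) -> None:
    """Write op: overwrite the buffer contents."""
    _buffers[handle][:] = value


def assign_add_op(handle: int, delta: List[float]) -> None:
    """Another write op: add delta elementwise, e.g. to apply a gradient update."""
    buf = _buffers[handle]
    for i, d in enumerate(delta):
        buf[i] += d


w = variable_op([1.0, 2.0])
assign_add_op(w, [0.5, -1.0])
print(read_op(w))  # [1.5, 1.0]
assign_op(w, [0.0, 0.0])
print(read_op(w))  # [0.0, 0.0]
```

Only the handle flows through the graph. This is why the variable can be shared by many operations and updated in place.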

Execution Model
• We have a computation graph
• Step: the client executes a subgraph by indicating:
  • Edges to feed the subgraph with input tensors
  • Edges to fetch the output tensors
• The runtime prunes the graph to remove unnecessary operations
• Can invoke multiple concurrent steps
  • Example: concurrent batches for data-parallel training
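The pruning step can be sketched as a backward traversal. It starts from the fetched nodes and walks input edges. It stops at any node whose output the client feeds, because that node's producers do not need to run. The graph representation (a dict from node name to input names) and the node names are assumptions for illustration, not TensorFlow's internal format.

```python
from typing import Dict, List, Set


def prune(graph: Dict[str, List[str]], feeds: Set[str], fetches: List[str]) -> Set[str]:
    """Return the nodes that must run to compute `fetches` given `feeds`.

    `graph` maps each node name to the names of its input nodes.
    Fed nodes are replaced by client-supplied tensors, so neither they
    nor their upstream producers are executed.
    """
    needed: Set[str] = set()
    stack = list(fetches)
    while stack:
        node = stack.pop()
        if node in needed or node in feeds:
            continue
        needed.add(node)
        stack.extend(graph.get(node, []))
    return needed


graph = {
    "read_file": [],
    "decode": ["read_file"],
    "weights": [],
    "logits": ["decode", "weights"],
    "loss": ["logits"],
    "summary": ["loss"],
}
# Feed the decoded input directly and fetch only the loss.
print(sorted(prune(graph, feeds={"decode"}, fetches=["loss"])))
# ['logits', 'loss', 'weights']
```

Neither `read_file` (upstream of a fed edge) nor `summary` (not needed by the fetch) runs in this step.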

Example
• Data-parallel training looks like this
[Figure: data-parallel training with stateful queues, stateful variables, and concurrent steps for data parallelism]

Scheduling: Tasks and Devices
• Tasks: named processes that send messages
  • PS tasks: store parameters, but can also run computations
  • Worker tasks: the rest
  • Note: “informal” categories, not enforced by TensorFlow
• Devices: CPU, GPU, TPU, mobile, …
  • CPU is the host device
  • A device executes a kernel for each operation assigned to it
    • The same operation (e.g., matrix multiplication) has different kernels for different devices
• Requirements for a device
  • Must accept kernels for execution
  • Must allocate memory for inputs and outputs
  • Must transfer data to and from host memory
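The idea that one operation has several device-specific kernels can be sketched as a registry keyed by (operation, device type). The runtime looks up the kernel for the device the operation was placed on. The registry, the decorator, and the device names are hypothetical. The "GPU" kernel here is just another Python function standing in for real device code.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]
KERNELS: Dict[Tuple[str, str], Callable[..., Matrix]] = {}


def register_kernel(op: str, device: str):
    """Decorator that registers an implementation of `op` for `device`."""
    def wrap(fn):
        KERNELS[(op, device)] = fn
        return fn
    return wrap


@register_kernel("MatMul", "CPU")
def matmul_cpu(a: Matrix, b: Matrix) -> Matrix:
    # Straightforward triple loop.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


@register_kernel("MatMul", "GPU")
def matmul_gpu(a: Matrix, b: Matrix) -> Matrix:
    # Stand-in only: a real GPU kernel would launch device code on inputs
    # already copied from host memory. Here we compute row-by-column dots.
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def execute(op: str, device: str, *inputs: Matrix) -> Matrix:
    """Run `op` with the kernel registered for the device it was placed on."""
    try:
        kernel = KERNELS[(op, device)]
    except KeyError:
        raise RuntimeError(f"no {op} kernel registered for {device}") from None
    return kernel(*inputs)


a, b = [[1.0, 2.0]], [[3.0], [4.0]]
print(execute("MatMul", "CPU", a, b), execute("MatMul", "GPU", a, b))  # [[11.0]] [[11.0]]
```

An operation can only be placed on a device for which a kernel is registered. A missing entry is therefore one of the implicit constraints on placement.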

Placement
• TensorFlow runtime places operations on devices
  • Implicit constraints: a stateful operation is placed on the same device as its state
  • Explicit constraints: dictated by the user
  • Optimal placement is still an open question
• Obtain per-device subgraphs
  • Each subgraph contains all operations assigned to that device
  • Send and Receive operations replace edges across devices
    • Specialized per-device implementations
      • CPU – GPU: CUDA memory copy
      • Across tasks: TCP or RDMA
• Placement is preserved throughout the session
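Partitioning a placed graph into per-device subgraphs can be sketched as follows. Every edge whose endpoints sit on different devices is cut. It is replaced by a Send node on the producer's device and a matching Recv node on the consumer's device. The edge-list representation and the `send_*`/`recv_*` naming scheme are assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (producer, consumer)


def partition(placement: Dict[str, str], edges: List[Edge]):
    """Split a placed graph into per-device node and edge lists.

    Cross-device edges are replaced by a Send/Recv pair, so each
    device's subgraph contains only local edges.
    """
    nodes: Dict[str, List[str]] = defaultdict(list)
    local_edges: Dict[str, List[Edge]] = defaultdict(list)
    for node, device in placement.items():
        nodes[device].append(node)
    for src, dst in edges:
        src_dev, dst_dev = placement[src], placement[dst]
        if src_dev == dst_dev:
            local_edges[src_dev].append((src, dst))
            continue
        send, recv = f"send_{src}_to_{dst}", f"recv_{src}_to_{dst}"
        nodes[src_dev].append(send)
        nodes[dst_dev].append(recv)
        local_edges[src_dev].append((src, send))
        local_edges[dst_dev].append((recv, dst))
    return dict(nodes), dict(local_edges)


placement = {"input": "CPU:0", "weights": "CPU:0", "matmul": "GPU:0", "loss": "GPU:0"}
edges = [("input", "matmul"), ("weights", "matmul"), ("matmul", "loss")]
nodes, local = partition(placement, edges)
print(local["CPU:0"])
# [('input', 'send_input_to_matmul'), ('weights', 'send_weights_to_matmul')]
print(local["GPU:0"])
# [('recv_input_to_matmul', 'matmul'), ('recv_weights_to_matmul', 'matmul'), ('matmul', 'loss')]
```

This simple version creates one pair per cut edge. A more careful partitioner could share one pair among several consumers of the same tensor on the same device.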

Control Flow
• How do we enable dynamic control flow with a static graph?
  • Example: recurrent neural network
  • Train the network on sequences of variable length without unrolling
• Conditional: Switch and Merge
[Figure: Switch takes a data input and a control input and sends the data down one of two branches of ops; Merge receives one live input and one dead input and outputs the one non-dead input]
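The Switch/Merge semantics can be simulated with an explicit "dead" marker:

- Switch forwards its data input to one of two outputs, depending on the control input, and marks the other output dead.
- Ordinary ops propagate deadness without computing.
- Merge forwards whichever input is not dead.

This is a toy interpreter in Python, not TensorFlow's executor. The `DEAD` sentinel and the `op` and `cond` helpers are hypothetical.

```python
from typing import Any, Callable, Tuple

DEAD = object()  # marks a tensor on the branch that was not taken


def switch(data: Any, pred: bool) -> Tuple[Any, Any]:
    """Return (output_false, output_true); the untaken output is dead."""
    return (DEAD, data) if pred else (data, DEAD)


def op(fn: Callable[[Any], Any], x: Any) -> Any:
    """An ordinary op: if its input is dead, it does not run and its output is dead."""
    return DEAD if x is DEAD else fn(x)


def merge(a: Any, b: Any) -> Any:
    """Forward the one non-dead input."""
    live = [v for v in (a, b) if v is not DEAD]
    if len(live) != 1:
        raise ValueError("Merge expects exactly one live input")
    return live[0]


def cond(pred: bool, x: float) -> float:
    # "x + 1 if pred else x * 2", expressed as a fixed graph with both branches.
    false_in, true_in = switch(x, pred)
    true_out = op(lambda v: v + 1, true_in)
    false_out = op(lambda v: v * 2, false_in)
    return merge(true_out, false_out)


print(cond(True, 3.0), cond(False, 3.0))  # 4.0 6.0
```

Both branches exist in the static graph, but only the ops on the live branch compute anything.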

Loops
• Uses three additional operators: Enter, Exit, and NextIteration
[Figure: data input → Enter → op → op → Exit, with NextIteration feeding values back into the loop body]
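Combining the three loop operators with Switch and Merge yields a while loop. The sketch below steps through one arrangement:

- Enter brings the initial value into the loop.
- Merge picks either that value or the value arriving on the NextIteration back edge.
- Switch routes the merged value into the body while the condition holds, and to Exit otherwise.

It is a hypothetical toy interpreter, not TensorFlow's implementation. It runs iterations sequentially.

```python
from typing import Any, Callable, Tuple

DEAD = object()  # output of the branch that was not taken


def switch(data: Any, pred: bool) -> Tuple[Any, Any]:
    """Return (output_false, output_true); the untaken output is dead."""
    return (DEAD, data) if pred else (data, DEAD)


def merge(*inputs: Any) -> Any:
    """Forward the single live input."""
    live = [v for v in inputs if v is not DEAD]
    if len(live) != 1:
        raise ValueError("Merge expects exactly one live input")
    return live[0]


def while_loop(cond: Callable[[Any], bool], body: Callable[[Any], Any], init: Any) -> Any:
    enter_out = init        # Enter: the initial value enters the loop frame once
    next_iteration = DEAD   # nothing has come around the back edge yet
    while True:
        value = merge(enter_out, next_iteration)
        enter_out = DEAD    # Enter fires only on the first iteration
        to_exit, to_body = switch(value, cond(value))
        if to_body is DEAD:
            return to_exit  # Exit: leave the loop frame with the final value
        next_iteration = body(to_body)  # NextIteration: feed the result back to Merge


print(while_loop(lambda i: i < 10, lambda i: i + 3, 0))  # 12
```

The Python `while True` stands in for the runtime re-scheduling the loop body. The graph itself stays the same size however many iterations execute, which is why variable-length sequences need no unrolling.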

Scaling to Large Models
• Parameter server approach to avoid moving terabytes of parameters every time
  • Gather: reads tensor data from a shard and computes
  • Part: partitions the input across shards of parameters
  • Stitch: aggregates all partitions
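The Part / Gather / Stitch pattern for a sharded parameter, such as a large embedding table, can be sketched in three steps:

- Part splits the requested row ids by shard.
- Each shard runs Gather locally and returns only the requested rows.
- Stitch puts the results back in the original request order.

Sharding by `id % num_shards`, the dict-per-shard representation, and the single-process execution are assumptions of this sketch.

```python
from typing import Dict, List, Tuple

Row = List[float]


def part(ids: List[int], num_shards: int) -> List[List[Tuple[int, int]]]:
    """Partition (position, id) pairs by shard, using id % num_shards."""
    buckets: List[List[Tuple[int, int]]] = [[] for _ in range(num_shards)]
    for pos, i in enumerate(ids):
        buckets[i % num_shards].append((pos, i))
    return buckets


def gather(shard: Dict[int, Row], ids: List[int]) -> List[Row]:
    """Runs next to the shard: read only the requested rows."""
    return [shard[i] for i in ids]


def stitch(n: int, pieces: List[Tuple[List[int], List[Row]]]) -> List[Row]:
    """Reassemble per-shard results into the original order."""
    out: List[Row] = [[] for _ in range(n)]
    for positions, rows in pieces:
        for pos, row in zip(positions, rows):
            out[pos] = row
    return out


def sharded_lookup(shards: List[Dict[int, Row]], ids: List[int]) -> List[Row]:
    pieces = []
    for shard, bucket in zip(shards, part(ids, len(shards))):
        positions = [pos for pos, _ in bucket]
        rows = gather(shard, [i for _, i in bucket])  # only these rows travel
        pieces.append((positions, rows))
    return stitch(len(ids), pieces)


# A 6-row table split across 2 shards: even ids on shard 0, odd ids on shard 1.
table = {i: [float(i), float(i) * 10] for i in range(6)}
shards = [{i: r for i, r in table.items() if i % 2 == s} for s in range(2)]
print(sharded_lookup(shards, [5, 2, 3]))  # [[5.0, 50.0], [2.0, 20.0], [3.0, 30.0]]
```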

Fault Tolerance
• Long-running tasks face failures and pre-emption
  • Sometimes run at night on idle machines
• Small operations, no need to tolerate individual failures
  • Even RDDs are overkill
• User invokes Save for checkpointing
  • Each variable in a task is connected to the same Save operation, for batching
  • Checkpoints are not consistent
  • Other use cases: transfer learning
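User-level checkpointing can be sketched as a single batched Save that writes all of a task's variables at once. A matching Restore may load only a subset, for example to reuse pretrained layers for transfer learning. The JSON file format and the `save`/`restore` helpers are hypothetical. In a real system, variables may keep changing while they are being saved, which is why such checkpoints need not be consistent. This sketch does not model that concurrency.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, List, Optional

Variables = Dict[str, List[float]]


def save(variables: Variables, path: str) -> None:
    """Write every variable of the task into one file (one batched Save)."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(variables, f)
    os.replace(tmp, path)  # replace any previous checkpoint in one step


def restore(variables: Variables, path: str, only: Optional[Iterable[str]] = None) -> None:
    """Load all variables, or just the names in `only` (e.g. for transfer learning)."""
    with open(path) as f:
        saved = json.load(f)
    names = saved.keys() if only is None else only
    for name in names:
        variables[name] = saved[name]


ckpt = os.path.join(tempfile.mkdtemp(), "model.json")
trained = {"encoder/w": [0.3, -0.7], "head/w": [1.2]}
save(trained, ckpt)

# Transfer learning: reuse the encoder, keep a freshly initialized head.
new_model = {"encoder/w": [0.0, 0.0], "head/w": [0.0, 0.0, 0.0]}
restore(new_model, ckpt, only=["encoder/w"])
print(new_model)  # {'encoder/w': [0.3, -0.7], 'head/w': [0.0, 0.0, 0.0]}
```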

Synchronous Coordination
• Use blocking queues for synchrony
• Redundant tasks for stragglers
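A synchronous step with redundant (backup) workers can be sketched with a blocking queue. In this sketch, n workers each push a gradient, and the aggregator blocks until the first m arrive. It then proceeds without waiting for the stragglers. Python's `queue.Queue` and threads stand in for TensorFlow's queue operators and tasks. The worker count, the delays, and `sync_step` are illustrative assumptions.

```python
import queue
import threading
import time
from typing import Callable, List, Tuple


def sync_step(delays: List[float], grad_fn: Callable[[int], float], m: int) -> Tuple[List[int], float]:
    """One synchronous step with len(delays) workers, of which only m are needed.

    Worker i pushes its gradient after delays[i] seconds (simulating a
    straggler). The aggregator blocks on the queue until m gradients have
    arrived, so the n - m redundant workers mask up to n - m stragglers.
    """
    grads = queue.Queue()  # unbounded; a bounded queue would add backpressure

    def worker(wid: int, delay: float) -> None:
        time.sleep(delay)
        grads.put((wid, grad_fn(wid)))

    for wid, delay in enumerate(delays):
        threading.Thread(target=worker, args=(wid, delay), daemon=True).start()

    received = [grads.get() for _ in range(m)]  # blocking get acts as the barrier
    used = sorted(wid for wid, _ in received)
    return used, sum(g for _, g in received) / m


# Four workers, three needed; worker 3 is a straggler and gets ignored.
used, avg = sync_step([0.0, 0.01, 0.02, 0.5], grad_fn=lambda wid: float(wid + 1), m=3)
print(used, avg)  # [0, 1, 2] 2.0
```

Late gradients are simply left in this step's queue, which is discarded afterwards. A longer-running version would have to drop stale gradients explicitly.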

Implementation

Single-Machine Performance
• Four convolutional models using one GPU

Synchronous Microbenchmarks
• Null training steps
• Sparse performance is close to optimal (scalar)

Scalability
• Scalability is bounded by access to PS tasks (7)

