Date post: | 24-May-2015 |
Category: | Technology |
Upload: | amd-developer-central |
CE-4030: OPTIMIZING PHOTO EDITING APPLICATION
FOR AMD HETEROGENEOUS SYSTEM ARCHITECTURE
CYBERLINK MARKETING MANAGER
STANLEY LAM
AGENDA
Why Photo Editing Application – PhotoDirector?
Photo Editing Pipelines (RAW processing)
How AMD HSA helps in Photo Editing?
Proof of Concept: HSA Performance Showcase
Key Takeaways
| PRESENTATION TITLE | NOVEMBER 19, 2013 | CONFIDENTIAL
Why Photo Editing Software – PhotoDirector?
WHY PHOTO EDITING SOFTWARE?
THE RIGHT APPLICATION FOR HSA
• CyberLink Multimedia Software
‒ Media Playback: PowerDVD
‒ Video Editing: PowerDirector
‒ Photo Editing: PhotoDirector
• Why Photo Editing Software?
‒ Many editing tasks can be parallelized
‒ Processing / decoding RAW files is time-consuming
‒ RAW image editing can be both computationally and memory intensive
• How AMD HSA helps in Photo Editing?
‒ Utilize GPU compute units to speed up performance
‒ Eliminate overheads and memory copy bottlenecks between HOST and DEVICE memories
Model                   Resolution (M)   Width   Height   MEM Space
Nikon D3S               24               6034    4012     193,667,264
Nikon D4                24               6048    4032     195,084,288
Nikon D70S              24               6034    4028     194,439,616
Nikon D800E             36               7378    4924     290,634,176
Nikon D90               36               7360    4912     289,218,560
Canon EOS 20D           21               5616    3744     168,210,432
Canon EOS 5D Mark III   21               5616    3744     168,210,432
Canon EOS 600D          22               5760    3840     176,947,200
Canon EOS 7D            20               5472    3648     159,694,848
Samsung NX11            20               5472    3648     159,694,848
Samsung DSLR-A700       20               5472    3648     159,694,848
Sony SLT-A77V           24               6000    4000     192,000,000
Sony DSLR-A850          24               6000    4000     192,000,000
Sony DSLR-A900          24               6048    4032     195,084,288
Sony NEX-5N             24               6048    4032     195,084,288
Sony DSC-RX100          24               6000    4000     192,000,000
Sony DSC-RX1            20               5472    3648     159,694,848
Sony DSC-F828           24               6000    4000     192,000,000
Pentax K-5 II           40               7264    5440     316,129,280
Phase One P 20          22               4096    5456     178,782,208
Phase One P 30          22               4096    5456     178,782,208
Phase One P40+          32               6526    4904     256,028,032
Phase One P 45+         39               7246    5444     315,577,792
Phase One P65+          39               7246    5444     315,577,792
Phase One DSLR-A100     60               8984    6732     483,842,304
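Every MEM Space value in the table equals width × height × 8. That would correspond to 8 bytes per pixel (for example, four 16-bit channels) if the unit is bytes. The slides state neither the unit nor the pixel layout, so both are assumptions in the minimal sketch below, and the names `BYTES_PER_PIXEL`, `frame_bytes` and `ROWS` are hypothetical.

```python
# Assumption: 8 bytes per pixel, inferred from the table (not stated in the slides).
BYTES_PER_PIXEL = 8


def frame_bytes(width: int, height: int, bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Size of one uncompressed full-scale frame under the assumed pixel layout."""
    return width * height * bytes_per_pixel


# Three rows copied from the table: (width, height, listed MEM Space).
ROWS = [
    (6034, 4012, 193_667_264),
    (7378, 4924, 290_634_176),
    (8984, 6732, 483_842_304),
]

for width, height, listed in ROWS:
    computed = frame_bytes(width, height)
    # The listed value matches the formula for each of these rows.
    assert computed == listed
    print(f"{width} x {height}: {computed:,}")
```

Under this reading, a single full-scale frame ranges from roughly 160 million to over 480 million bytes, which is why buffer placement matters for this workload.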
Photo Editing Pipeline
PHOTO EDITING PIPELINE
RAW PROCESSING
[Diagram: IMG_0077.CR2 → RAW Decoder → Photo Retouch (Full Scale Size / Preview Size) → JPEG Encoder → NEW.JPG]
• KEY area for potential performance improvement
‒ RAW Decoder
  ‒ Decode time is long for complex RAW formats
• RAW decode is necessary at every stage of the editing pipeline
‒ When generating a FULL SCALE preview
‒ When entering the Retouch module for the first time
‒ When resuming from previous editing
‒ When exporting to JPG/TIFF files
Test Platform
CPU: AMD A10-4655M
RAM: 4GB
OS: Windows 7 32-bit
Test Tool
PhotoDirector 5
Camera Model          RAW Decode Time (single photo)
Canon 1D-X            7.347 seconds
Canon 1Ds MK3         8.400 seconds
Panasonic DMC FZ100   7.916 seconds
Phase One P25         10.475 seconds
Phase One P30         12.495 seconds
Phase One P45         13.049 seconds
Samsung NX10          6.263 seconds
Samsung NX100         5.280 seconds
Sony A700             5.522 seconds
Sony F828             6.996 seconds
PHOTO EDITING PIPELINE
OPENCL AND MEMORY MANAGEMENT
[Diagram: IMG_0077.CR2 → RAW Decoder (GPU) → Photo Retouch (CPU & GPU) → JPEG Encoder (CPU) → NEW.JPG; Frame Buffers in HOST Memory and DEVICE Memory are linked by MAP / UN-MAP operations]
• Performance can be improved by utilizing GPU compute power (OpenCL 1.x)
‒ Improve RAW decode performance
‒ Improve EDITING (Retouch) performance
‒ OpenCL 1.x is great, however…
MEMORY SPACE AND PERFORMANCE
RELATIVE KERNEL VS. BUFFER PERFORMANCE ANALYSIS
• OpenCL 1.x can speed up performance substantially, yet it creates new challenges
‒ Buffering between HOST and DEVICE creates overheads
  ‒ Sometimes the overheads take up a large portion of execution time
‒ DEVICE memory space is limited
  ‒ 512MB can only hold one 36MP photo, or two 24MP photos
  ‒ Creates more reads and writes between HOST and DEVICE memories
[Diagram: 36MP Frame Buffer and 512MB DEVICE memory; Tiling leads to More Reads and More Writes]
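The 512MB figure can be checked against the frame sizes listed earlier. The sketch below assumes 8 bytes per pixel (inferred from the MEM Space column, not stated in the slides) and takes 512MB as 512 × 2^20 bytes. It uses one 36MP row and one 24MP row from the table, and the function names are hypothetical. The tile count is only a lower bound, since it ignores any other buffers the DEVICE needs.

```python
import math

DEVICE_BYTES = 512 * 1024 * 1024  # 512MB, read as 512 MiB (assumption)
BYTES_PER_PIXEL = 8               # assumption inferred from the MEM Space column


def frames_that_fit(width, height, device_bytes=DEVICE_BYTES):
    """Number of whole frames that fit in device memory at the same time."""
    return device_bytes // (width * height * BYTES_PER_PIXEL)


def min_tiles(width, height, resident_frames, device_bytes=DEVICE_BYTES):
    """Fewest equal-sized tiles so that `resident_frames` tile buffers fit together."""
    needed = width * height * BYTES_PER_PIXEL * resident_frames
    return math.ceil(needed / device_bytes)


print(frames_that_fit(7378, 4924))               # 36MP row -> 1
print(frames_that_fit(6000, 4000))               # 24MP row -> 2
print(min_tiles(7378, 4924, resident_frames=2))  # input + output buffers -> 2
```

Each additional tile implies at least one more upload and one more download, which is the "more reads, more writes" effect noted on the slide.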
How AMD HSA helps in Photo Editing?
OPTIMIZING PERFORMANCE WITH AMD HSA
THE ADVANTAGE OF ADOPTING HSA WITH OPENCL
[Diagram: IMG_0077.CR2 → RAW Decoder → Photo Retouch → JPEG Encoder → NEW.JPG, with Frame Buffers in HOST Memory and DEVICE Memory]
• Using AMD HSA to improve performance over OpenCL 1.x
‒ Shared virtual memory removes the boundary between CPU and GPU
‒ Reduces the overhead of moving data
‒ Uses the AMD APU platform to achieve true Heterogeneous Computing
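To make the data-movement saving concrete, here is a toy model. It is not CyberLink's implementation. It assumes that with separate HOST and DEVICE buffers a full frame is copied each time consecutive pipeline stages run on different processors, and that with shared virtual memory no copy is needed. The stage placement (Retouch treated as a GPU-only stage) and all names are hypothetical. The frame size is the 6000 × 4000 figure from the earlier table, read as bytes.

```python
FRAME_BYTES = 192_000_000  # 6000 x 4000 frame from the table; unit assumed to be bytes

# Hypothetical placement: (stage name, processor that runs it).
PIPELINE = [
    ("RAW Decoder", "GPU"),
    ("Photo Retouch", "GPU"),
    ("JPEG Encoder", "CPU"),
]


def bytes_copied(pipeline, frame_bytes, shared_memory):
    """Bytes moved between HOST and DEVICE under the toy model described above."""
    if shared_memory:
        return 0  # both processors address the same buffer, so nothing is copied
    # Count the hand-offs where the next stage runs on a different processor.
    crossings = sum(
        1 for (_, prev), (_, cur) in zip(pipeline, pipeline[1:]) if prev != cur
    )
    return crossings * frame_bytes


print(bytes_copied(PIPELINE, FRAME_BYTES, shared_memory=False))
print(bytes_copied(PIPELINE, FRAME_BYTES, shared_memory=True))
```

The model ignores map/unmap synchronization cost and partial transfers, so it only illustrates why fewer HOST/DEVICE hand-offs translate into less data movement.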
3 LEVELS OF SHARED VIRTUAL MEMORY
CHOOSING SHARED VIRTUAL MEMORY
• 3 levels of Shared Virtual Memory support (can be configured during initialization)
‒ Coarse Grain Buffer
  ‒ Ability to share virtual pointers between HOST and DEVICE
‒ Fine Grain Buffer
  ‒ Ability to share buffer space between HOST and DEVICE
‒ Fine Grain System Buffer
  ‒ Ability to allow DEVICE to access the entire HOST address space
  ‒ Eliminates the need to specify explicit SVM pointers
• Coding Complexity
‒ Complexity: Coarse Grain > Fine Grain > Fine Grain System
COARSE GRAIN SHARED BUFFER
OPENCL BUFFER VS. HSA BUFFER
• Because PhotoDirector’s existing code base does not contain excessive pointers, we are able to choose the buffer type that gives the best performance
[Diagram: Standard OCL Buffers vs. HSA Coarse Grain Buffers, each showing Buffer 1 and Buffer 2 on HOST and DEVICE]
Proof of concept: HSA Performance Showcase
AMD HSA BUFFER TYPES
RELATIVE PERFORMANCE COMPARISON
• Our proof-of-concept code showed a potential performance difference between buffer types
‒ Good potential performance when using Coarse Grain Buffers
‒ Results show a roughly 2x difference between the Coarse Grain and Fine Grain implementations
[Chart: Performance Index of Applying Hue Change to RAW Photo, Coarse Grain vs. Fine Grain]
Test Platform
CPU: AMD KAVERI
RAM: 4GB
OS: Windows 7 64-bit
Test Tool
PhotoDirector 5 Testbed
Key Takeaways
KEY TAKEAWAY
AMD HSA SHOWS GREAT POTENTIAL
• AMD HSA shows great potential for photo editing applications – CyberLink PhotoDirector
‒ Many more photo editing tasks can leverage the performance advantage of AMD HSA platforms
‒ It’s important to experiment and work with the most suitable HSA buffer type
‒ Potential performance improvements for parallelizable and memory-intensive applications
DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap
changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software
changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD
reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of
such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION
CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices,
Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names
are for informational purposes only and may be trademarks of their respective owners.