
Performance Analysis of Machine Learning Algorithms for Phylanx: An Asynchronous Array Processing Toolkit

Weile Wei, Rod Tohid, Bibek Wagle, Shahrzad Shirzad, Parsa Amini, Bita Hasheminezhad, Katy Williams, Adrian Serio, Hartmut Kaiser

Louisiana State University

Abstract

Phylanx is an asynchronous array processing toolkit that transforms Python and NumPy operations into code that can be executed in parallel on HPC resources. It does so by mapping Python and NumPy functions and variables into a dependency tree executed by HPX, a general-purpose, parallel, task-based runtime system written in C++. In this poster, we present early results that compare our implementations of widely used machine learning algorithms against accepted NumPy reference implementations.

Figure 1: Phylanx program flow (decorated Python code and transformation rules → frontend → AST → optimizer → compiler → execution tree → executor → HPX tasks → result/output).

Phylanx program flow: The Phylanx frontend generates an AST (PhySL) from the decorated Python code. The AST can be passed directly to the compiler to generate the execution tree or, optionally, fed to the optimizer first and then to the compiler. Once the kernel is invoked, Phylanx triggers the evaluation of the execution tree on HPX. After the evaluation finishes, the result is returned to Python.
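As a concrete illustration of this flow, decorated code might look like the sketch below. The exact import path of the `Phylanx` decorator and the set of NumPy primitives supported inside decorated functions may differ between versions, so treat the names here as assumptions rather than the canonical API.

```python
# Minimal sketch, assuming the decorator is importable as `from phylanx import Phylanx`
# (import path and supported NumPy primitives may vary between Phylanx versions).
import numpy as np
from phylanx import Phylanx

@Phylanx                       # the frontend turns this function into a PhySL AST
def weighted_sum(a, b):
    return np.sum(a * b)       # NumPy operations become nodes of the execution tree

x = np.random.rand(1000)
y = np.random.rand(1000)
print(weighted_sum(x, y))      # invoking the kernel evaluates the tree on HPX
```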

Phylanx and its External Libraries

Figure 2: The Phylanx toolkit (frontend, optimizer, backend compiler, executor, performance counters) and its interactions with external libraries: Python and NumPy via pybind11, Blaze, HPX, and visualization tools.

- Phylanx's data structures rely on the high-performance, open-source C++ library Blaze, which supports HPX as a parallelization back-end and maps its data directly to Python data structures.

- To avoid data copies between Python and C++, we take advantage of the Python buffer protocol through the pybind11 library (see the sketch below).

- Each Python list is mapped to a C++ vector, and 1-D and 2-D NumPy arrays are mapped to a Blaze vector and a Blaze matrix, respectively.
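Because these points hinge on the Python buffer protocol, the standalone NumPy sketch below (plain Python, not Phylanx or pybind11 code) illustrates the zero-copy aliasing it provides; the C++ side wraps the same buffer in a Blaze vector or matrix instead of another NumPy array.

```python
import numpy as np

# Standalone illustration of the Python buffer protocol: a consumer can view an
# array's memory without copying it, which is the same mechanism pybind11 uses
# to hand NumPy buffers to Blaze on the C++ side.
a = np.arange(12, dtype=np.float64).reshape(3, 4)  # 2-D array -> Blaze matrix in Phylanx
view = memoryview(a)                               # buffer-protocol view, no copy
b = np.asarray(view)                               # second NumPy array over the same buffer

b[0, 0] = 42.0
assert a[0, 0] == 42.0         # both arrays alias the same memory
assert not b.flags.owndata     # b does not own (or copy) the underlying buffer
```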

Visualization of AST using Traveler Tool

Figure 3: Visualization of the AST using the Traveler tool.

References

[1] R. Tohid, Bibek Wagle, Shahrzad Shirzad, Patrick Diehl, Adrian Serio, Alireza Kheirkhahan, Parsa Amini, Katy Williams, Kate Isaacs, Kevin Huck, et al. Asynchronous execution of Python code on task-based runtime systems. arXiv preprint arXiv:1810.07591, 2018.

Acknowledgments

- This material is based upon work supported by the National Science Foundation under Grant No. 1737785. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

- This work is supported by the Defense Technical Information Center under DTIC Contract FA8075-14-D-0002/0007.

Performance Results

Figure 4: Comparing the reference implementation of the Logistic Regression algorithm in NumPy with the corresponding PhySL code. This experiment was performed on a node consisting of two Intel Xeon E5-2660 v3 CPUs clocked at 2.6 GHz, with 10 cores (20 threads) each, providing a total of 20 cores, and 128 GB of DDR4 memory.
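For context, a NumPy reference implementation of logistic regression by batch gradient descent typically follows the pattern below. This is an illustrative sketch only, not necessarily the exact benchmark code behind Figure 4.

```python
import numpy as np

def logistic_regression(X, y, iterations=100, alpha=0.1):
    """Plain batch gradient descent on the logistic loss (illustrative only)."""
    weights = np.zeros(X.shape[1])
    for _ in range(iterations):
        pred = 1.0 / (1.0 + np.exp(-X @ weights))   # sigmoid of the linear model
        gradient = X.T @ (pred - y) / X.shape[0]    # gradient of the mean log-loss
        weights -= alpha * gradient
    return weights

# Example: 10,000 samples with 100 features of random data
X = np.random.rand(10_000, 100)
y = (np.random.rand(10_000) > 0.5).astype(np.float64)
w = logistic_regression(X, y)
```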

Figure 5: Comparing the reference implementation of the Alternating Least Squares algorithm in NumPy with the corresponding PhySL code. All experiments listed below were performed on nodes consisting of two Intel Xeon E5-2450 CPUs clocked at 2.10 GHz, providing a total of 16 cores, and 48 GB of 1333 MHz DDR3 memory.

Figure 6: Comparing the reference implementation of the K-Means algorithm in NumPy with the corresponding PhySL code.
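Similarly, a NumPy reference implementation of K-Means (Lloyd's algorithm) is typically along the lines of the sketch below; again, this is illustrative only and not necessarily the exact benchmark code.

```python
import numpy as np

def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm in plain NumPy (illustrative only)."""
    rng = np.random.default_rng(seed)
    # initialize centroids from k distinct input points
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # distance of every point to every centroid, then assign the nearest one
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # recompute each centroid as the mean of its assigned points
        centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return centroids, labels

points = np.random.rand(1_000, 2)
centroids, labels = kmeans(points, k=3)
```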

Figure 7: Comparing the reference implementation of the Neural Networks algorithm in NumPy with the corresponding PhySL code.

Conclusion

- Our early results show that the Alternating Least Squares and Logistic Regression algorithms outperform the NumPy implementations in a few cases.

- While the Neural Networks and K-Means implementations still need significant improvement, we are confident that with new features and further performance tuning we will be able to match or outperform these NumPy benchmarks.

https://github.com/STEllAR-GROUP/phylanx [email protected]
