
Suyash Thesis Presentation

Date post: 22-Jan-2018

Technische Universität München

Efficient Parallelization of Robustness Validation for Digital Circuits

Suyash Shukla

Institute for Electronic Design Automation
Univ.-Prof. Dr.-Ing. Ulf Schlichtmann


Agenda

- Basic concepts of parallel computing
- OpenMP Interface
- Existing serial code for evaluating robustness
- Ideology
- Changes made in the existing code
- Results of parallel computing
- Further improvements
- Conclusion

04.01.2013 Zuverlässigkeit von CMOS-Schaltungen (Reliability of CMOS Circuits)


Basic concepts of parallel computing

- A block of code is executed concurrently by multiple threads.
- This utilizes the multiple processor cores of a single machine.
- Large functions are broken into smaller discrete parts which are executed concurrently (here: iterate_d()).
- Instructions from each part execute simultaneously on different processors, depending on the number of threads.
- Each thread has its own copy of the input data tg.
- The OpenMP interface was chosen for parallel computing.

Note: tg points to a dynamically allocated object of class TG.


OpenMP Interface

OpenMP (Open Multi-Processing) is an Application Programming Interface that supports shared-memory parallel programming in C, C++ and Fortran.

It consists of its own:
1. Directives
2. Execution environment routines
3. Timing routines
4. Environment variables

A parallel region is created by the directive #pragma omp parallel. Code within this region is executed by multiple threads simultaneously, but in no guaranteed thread order.
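A minimal sketch of such a parallel region, assuming a compiler with OpenMP support (for example g++ with -fopenmp). The function name countThreadsInRegion is hypothetical and not part of the thesis code. Every thread of the team executes the block once, and the reduction clause adds up their contributions:

```cpp
// Hypothetical helper: returns how many threads executed the parallel region.
// If the compiler ignores the pragma (no OpenMP support), the result is 1.
int countThreadsInRegion()
{
    int count = 0;
    #pragma omp parallel reduction(+ : count)
    {
        count += 1;   // executed once by every thread in the team
    }
    return count;
}
```

Calling countThreadsInRegion() on a 4-core machine would typically return 4, but the exact value depends on the runtime settings.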


The number of threads is set by the runtime library routine void omp_set_num_threads (int num_threads) (or by the environment variable OMP_NUM_THREADS). At the start of a parallel region, the master thread “forks” a team of that many threads.

Each thread has its own thread ID, which can be obtained by calling int omp_get_thread_num (void). At the end of the parallel region, the threads “join” back into the one master thread.

Note: In my project, the number of threads is defined by NUM_THREADS, which can be set on the command line with --numthreads <int NUM_THREADS>

[Figure: fork-join model. The master thread (thread ID = 0) “forks” into several threads that execute the code in the parallel region and then “join” back into the master thread.]
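The fork-join behaviour can be illustrated with a small sketch. The function collectThreadIds is hypothetical, and the serial fallbacks in the #else branch are an assumption added only so that the sketch also builds without OpenMP support. Each thread writes its own ID into its own slot of a shared vector:

```cpp
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Assumed serial fallbacks so the sketch builds without -fopenmp.
inline void omp_set_num_threads(int) {}
inline int omp_get_thread_num() { return 0; }
inline int omp_get_num_threads() { return 1; }
#endif

// Hypothetical helper: forks a team, lets every thread record its ID, then joins.
std::vector<int> collectThreadIds(int requested)
{
    omp_set_num_threads(requested);          // upper limit for the next team
    std::vector<int> ids;
    #pragma omp parallel                     // "fork"
    {
        #pragma omp single
        ids.assign(static_cast<std::size_t>(omp_get_num_threads()), -1);
        // Implicit barrier after 'single': the vector is sized before anyone writes.
        ids[omp_get_thread_num()] = omp_get_thread_num();
    }                                        // "join"
    return ids;
}
```

The runtime may provide fewer threads than requested, so the returned vector can be shorter than the requested size. Slot 0 always belongs to the master thread.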


Existing serial code for evaluating Robustness

Robustness is calculated by:
1. Verifying the specification points (verifySpec(), verifyPoint()).
2. Checking whether any of the 4 points is violated (SpecViolated).
3. Iterating (iterate_d()) from these 4 points and calculating a validPoint.

[Figure: specification region in the Temp-Volt plane, bounded by tempReq.first and tempReq.second on the Temp axis and by voltReq.first and voltReq.second on the Volt axis; iteration directions ‘t’ and ‘v’.]

4. The 8 valid points are stored in a vector ‘m_validPoints’.
5. The function probs() iterates through the vector m_validPoints and returns a doublePair value. This value is passed to the function robustness(), which returns a double, “robustnessprobValue”.

The area is calculated by ‘dichotomy’, where the iterations in the 8 directions take place serially, one after the other, which is very time-consuming.

The iterate_d function calls verifyPoint() and verifyTolerance(), which use *tg to access the useProfile, updateArrivalTime() and getSinkArrivalTime().
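The slides do not show the body of iterate_d(), so the following is only a sketch of what one dichotomy (bisection) run in a single direction might look like. The function bisectTowards, the isValid predicate (standing in for verifyPoint() and verifyTolerance()) and the (temperature, voltage) layout of doublePair are all assumptions. It also assumes that the start point is valid:

```cpp
#include <cmath>
#include <functional>
#include <utility>

using doublePair = std::pair<double, double>;   // assumed layout: (temperature, voltage)

// Hypothetical stand-in for iterate_d(): bisect between a valid start point and a
// boundary point until the remaining interval is no longer than intervalLimit.
doublePair bisectTowards(doublePair start, doublePair boundary, double intervalLimit,
                         const std::function<bool(const doublePair&)>& isValid)
{
    if (isValid(boundary)) return boundary;      // the whole segment is valid
    while (std::hypot(boundary.first - start.first,
                      boundary.second - start.second) > intervalLimit) {
        doublePair mid((start.first + boundary.first) / 2.0,
                       (start.second + boundary.second) / 2.0);
        if (isValid(mid)) start = mid;           // push the valid end outwards
        else              boundary = mid;        // pull the invalid end inwards
    }
    return start;                                // last point known to be valid
}
```

Each of the 8 directions is an independent call of this kind, which is what makes the serial version slow and the problem a natural candidate for parallel execution.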


Ideology

My main aim was to speed up the process of calculating robustness, especially for larger circuits.

To make this possible, I had to run the iterate_d() function in parallel, so that all 8 validPoints can be computed concurrently.

[Figure: the validPoints plotted in the voltage (V) versus temperature (T) plane.]

If I parallelize the function iterate_d(), multiple threads also execute the other functions, such as verifyPoint() and verifyTolerance(), simultaneously.

Each of these function calls needs its own copy of *tg.

Hence, depending on the number of threads, NUM_THREADS, I create that many copies of ‘tg’ and store them in a vector named ‘TGVec’.

Each instance of tg can then be accessed by TGVec[thread ID]; tg0 is stored in TGVec[0], tg1 is stored in TGVec[1] and so on.
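The idea of one copy per thread, selected by thread ID, can be sketched as follows. TimingGraphStub is a hypothetical stand-in for the class TG, evaluateAll is a made-up function, and the vector holds objects by value rather than TG* pointers to keep the sketch short. The serial fallback in the #else branch is an assumption so that the code builds without OpenMP support:

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
inline int omp_get_thread_num() { return 0; }   // assumed serial fallback
#endif

struct TimingGraphStub { int evaluations = 0; }; // hypothetical stand-in for TG

// Runs numTasks evaluations on numThreads private copies and returns the total.
// numThreads is expected to be at least 1.
int evaluateAll(int numThreads, int numTasks)
{
    std::vector<TimingGraphStub> copies(numThreads);   // one copy per thread
    #pragma omp parallel for num_threads(numThreads)
    for (int task = 0; task < numTasks; ++task) {
        TimingGraphStub& mine = copies[omp_get_thread_num()];
        mine.evaluations += 1;   // no race: only this thread touches 'mine'
    }
    int total = 0;
    for (const TimingGraphStub& c : copies) total += c.evaluations;
    return total;
}
```

Because no two threads share a copy, no locking is needed. The price is the memory and setup time for NUM_THREADS instances.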


Changes made in the existing code

1. TG *tg; tg = new TG;

Instead of declaring just one tg, I now create several copies of tg, one per thread (NUM_THREADS), and store them in a vector.

    std::vector<TG*> TGVec;
    for (int j = 0; j < NUM_THREADS; j++)
    {
        TG *tg = new TG;        // one TG instance per thread
        TGVec.push_back(tg);
    }


2. tg->loadTimingLib(xmllib);

Since there are now several copies of tg, each setup function has to be called on every copy. This applies to loadTimingLib() as well as to:

- TGVec[i]->loadOA(oalib, oadesign, oaview);
- TGVec[i]->loadConstraintsLib(xmlconstrlib);
- TGVec[i]->set_useProfile(prof1);
- TGVec[i]->getSourceNodes();
- TGVec[i]->getSinkNodes();
- TGVec[i]->getNodes();

    for (std::size_t i = 0; i < TGVec.size(); i++)   // size_t avoids a signed/unsigned comparison
    {
        TGVec[i]->loadTimingLib(xmllib);
    }


3. useProfile *oldProf = m_timer->get_useProfile(); useProfile newProf;

m_timer was declared as ‘static TG *m_timer;’. Since this call now executes in parallel inside verifyPoint() and verifyTolerance(), a single shared m_timer is no longer sufficient. Each thread needs its own copy of ‘m_timer’ (one of the tgs), selected by its thread ID:

    useProfile *oldProf = m_TGVec[omp_get_thread_num()]->get_useProfile();

[Figure: calculateRobustness(), iterate_d(), verifyPoint() and verifyTolerance() operating on the per-thread copies tg 0, tg 1, tg 2 and tg 3.]

4. startPoint = doublePair (m_tempReq.first, m_voltReq.first);
   boundary = doublePair (m_tempReq.first, m_tempLimit.first);
   iterate_d (startPoint, boundary, 't', m_intervalLimit);

The function iterate_d() is no longer a void function. It now returns a doublePair value (start.first, start.second).

The 8 iterations run in parallel sections. At the barrier directive, each thread waits until the rest have finished their computation.

The 8 values are then stored in the vector ‘m_validPoints’. It is important to maintain the order of the iteration values, starting with the iteration from (voltReq.first, tempReq.first) along the temperature axis.


    #pragma omp parallel                // Parallel region begins here
    {
        #pragma omp sections            // Sections are distributed over the threads
        {
            #pragma omp section
            {
                // Declared inside the section so that concurrently running sections do not share them
                doublePair startPoint = doublePair(m_tempReq.first, m_voltReq.first);
                doublePair boundary = doublePair(m_tempReq.first, m_tempLimit.first);
                temp1 = iterate_d(startPoint, boundary, 't', m_intervalLimit);
                printf("Iteration from 1st point thread %d\n", omp_get_thread_num());
            }
            // The remaining sections follow the same pattern (temp2, temp3, ...)
        }
        #pragma omp barrier             // All threads wait here for each other
    }                                   // Parallel region ends here
    m_validPoints.push_back(doublePair(temp1.first, temp1.second));
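A self-contained sketch of the same pattern with four sections is given below. iterateStub is a hypothetical stand-in for iterate_d(), and collectValidPoints is a made-up name. Each section writes to its own fixed slot, and the results are appended only after the parallel region, so their order does not depend on which thread finishes first:

```cpp
#include <array>
#include <utility>
#include <vector>

using doublePair = std::pair<double, double>;

// Hypothetical stand-in for iterate_d(): returns a value that identifies the direction.
doublePair iterateStub(int direction)
{
    return doublePair(direction, direction * 0.5);
}

std::vector<doublePair> collectValidPoints()
{
    std::array<doublePair, 4> temp;      // one fixed slot per section
    #pragma omp parallel sections
    {
        #pragma omp section
        { temp[0] = iterateStub(0); }
        #pragma omp section
        { temp[1] = iterateStub(1); }
        #pragma omp section
        { temp[2] = iterateStub(2); }
        #pragma omp section
        { temp[3] = iterateStub(3); }
    }   // implicit barrier: all sections have finished here
    // Appended serially, so the order is always temp[0], temp[1], ...
    return std::vector<doublePair>(temp.begin(), temp.end());
}
```

The sections construct already ends with an implicit barrier, so an explicit #pragma omp barrier after it is harmless but not strictly required.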

[Figure: inside #pragma omp parallel, each #pragma omp section runs iterate_d() and returns a doublePair (temp1 to temp5). After #pragma omp barrier, the values are appended to the vector “m_validPoints” with m_validPoints.push_back; temp1 is stored first and temp5 last.]

5. To analyze the results, I used an OpenMP timing routine, double omp_get_wtime(void);

This returns the elapsed wall-clock (real) time in seconds, so the difference between two calls gives the time taken by the computations or functions in between.

    double OmpStart;
    double OmpEnd;

    OmpStart = omp_get_wtime();
    { … calculateRobustness ( ) … }
    OmpEnd = omp_get_wtime();

    std::cout << "Robustness at " << age << " years calculated in ";
    std::cout << (OmpEnd - OmpStart) << " OpenMP real time seconds! ";
    std::cout << std::endl;


6. To set the number of threads for parallel computing, I could hard-code it, e.g. ‘#define NUM_THREADS 2’ or any other int value such as 4 or 8. Instead, I declared it as a command-line argument.

The number of threads can now be set with --numthreads <int> on the command line. The advantage is that the user does not have to recompile the program every time the number of threads changes.

    TCLAP::ValueArg<int> numThreadsArg("", "numthreads", "Override Number of threads usage", false, 1, "int", cmd);

If nothing is given on the command line, the number of threads defaults to 1.
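To show the intended behaviour without depending on the TCLAP library, here is a hand-written stand-in (not the thesis code). The function parseNumThreads is hypothetical. It looks for --numthreads followed by an integer and falls back to 1, mirroring the default described above:

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical replacement for the TCLAP argument: returns the value that
// follows "--numthreads", or 1 if the option is missing or not a positive integer.
int parseNumThreads(int argc, const char* const argv[])
{
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "--numthreads") == 0) {
            const int n = std::atoi(argv[i + 1]);
            return n > 0 ? n : 1;
        }
    }
    return 1;   // default: serial execution
}
```

The result would then be passed on to omp_set_num_threads() and used to size TGVec.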


Results of parallel computing

The program was compiled with OpenMP support (#include <omp.h>) and run with multiple threads.

As a result, the computation time for robustness was much lower. The calculation ran almost 2 times faster.

The program utilizes all available cores. The real (wall-clock) computation time decreases as the number of threads increases, even though the CPU computation time increases.

On the Rechner machines (4-core processors), 8 threads gave the best results, and 12 threads were observed to be the maximum.


The program ‘Robustness’ utilizes both processor cores.

CPU usage is 157%, which implies that both CPUs are busy with the same program.

As the number of processors increases, the %CPU usage increases too.


Results for:
– NangateDesign: cell c1908_i89 (NUM_THREADS = 2)
– Dimensions: 3D
– Machine: Rechner2

Results for:
– NangateDesigns: cell c1908_i89
– Dimension: 2D (Age 10 years)
– Machine: Rein (2 processors)

As NUM_THREADS increases, the run time is reduced while the robustness value remains unchanged.

Num_threads   Robustness value   CPU Time   OpenMP Real Time
1             0.317204           47.15      47.1668 sec
2             0.317204           48.6       28.8759 sec
4             0.317204           48.29      28.1237 sec
8             0.317204           48.47      27.5828 sec


Results for:
– NangateDesigns: cell c1908_i89
– Dimension: 2D (Age 10 years)
– Machine: Rechner3 (4 processors)

The results here are as expected too.

Num_threads   Robustness value   CPU Time   OpenMP Real Time
1             0.317204           43.55      43.6704 sec
2             0.317204           43.53      25.3362 sec
4             0.317204           44.19      16.4133 sec
8             0.317204           47.97      11.563 sec
16            0.317204           50.06      11.7313 sec
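The speed-up implied by these measurements can be worked out directly from the OpenMP real times. The helper names below are hypothetical, and the definitions used (speed-up = serial time / parallel time, efficiency = speed-up / number of threads) are the usual textbook ones rather than something stated on the slides:

```cpp
// Speed-up of a parallel run relative to the single-thread run.
double speedup(double serialSeconds, double parallelSeconds)
{
    return serialSeconds / parallelSeconds;
}

// Fraction of the ideal (linear) speed-up that was actually achieved.
double efficiency(double serialSeconds, double parallelSeconds, int threads)
{
    return speedup(serialSeconds, parallelSeconds) / threads;
}
```

With the Rechner3 values, speedup(43.6704, 11.563) is about 3.78 for 8 threads, which is close to the 4 available processors. With the Rein values, speedup(47.1668, 27.5828) is about 1.71 on 2 processors.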


Graphical representation of OpenMP real time

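Based on the values from the previous slides.

The elapsed computation time decreases as the number of OpenMP threads increases, and levels off once the number of threads exceeds the number of processors (on Rechner3, 16 threads are marginally slower than 8).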


Further Improvements

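The number of OpenMP threads can also be set as an environment variable in the Linux terminal with the command

    export OMP_NUM_THREADS=<int>

(without spaces around the ‘=’). int NUM_THREADS could then be defined as

    int NUM_THREADS = omp_get_max_threads();

Note that omp_get_num_threads() is not suitable here: it reports the size of the current team and therefore returns 1 when called outside a parallel region.

With this working, the command-line input for the number of threads would no longer be needed.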
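A small sketch of how the thread count could be picked up from the environment. The two wrapper names are hypothetical, and the serial fallbacks in the #else branch are an assumption so that the code builds without OpenMP support. omp_get_max_threads() reflects OMP_NUM_THREADS, whereas omp_get_num_threads() called outside a parallel region always reports a team of one:

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Assumed serial fallbacks so the sketch builds without -fopenmp.
inline int omp_get_max_threads() { return 1; }
inline int omp_get_num_threads() { return 1; }
#endif

// Upper bound on the team size of the next parallel region (honours OMP_NUM_THREADS).
int defaultThreadCount()
{
    return omp_get_max_threads();
}

// Size of the current team; outside a parallel region this is always 1.
int threadsOutsideRegion()
{
    return omp_get_num_threads();
}
```

NUM_THREADS, and with it the size of TGVec, would then be taken from defaultThreadCount().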


Conclusion

Parallelism has been employed for many years, mainly in high-performance computers, but with multi-core computers being so common these days, interest in it has grown massively.

Parallel programs are more difficult to write than sequential programs, because they require more planning and more skill to troubleshoot concurrency bugs such as race conditions.

The objective of this bachelor thesis, however, has been met. Parts of the robustness validation program are now efficiently parallelized, which speeds up the whole process of robustness calculation.
