Technische Universität München
Efficient Parallelization of Robustness Validation for Digital
Circuits
Suyash Shukla
Institute for Electronic Design Automation
Univ.-Prof. Dr.-Ing. Ulf Schlichtmann
Agenda
- Basic concepts of parallel computing
- OpenMP Interface
- Existing serial code for evaluating robustness
- Ideology
- Changes made in the existing code
- Results of parallel computing
- Further improvements
- Conclusion
04.01.2013 Zuverlässigkeit von CMOS-Schaltungen 2
Basic concepts of parallel computing
- A block of code is executed concurrently by multiple threads.
- This makes use of the multiple processor cores of a single machine.
- Large functions are broken into smaller discrete parts which are executed concurrently (here: iterate_d()).
- Instructions from each part execute simultaneously on different processors, depending on the number of threads.
- Each thread has its own copy of the input data tg.
- The OpenMP interface was chosen for parallel computing.
Note: tg is a dynamically allocated object of class TG, through which the functions of TG are accessed.
OpenMP Interface
OpenMP (Open Multi-Processing) is an Application Programming Interface that supports shared-memory parallel programming in C, C++ and Fortran.
It consists of its own:
1. Directives
2. Execution environment routines
3. Timing routines
4. Environment variables
A parallel region is created by the directive #pragma omp parallel. Code within this region is executed by multiple threads simultaneously, but the order in which the threads run is not deterministic.
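A parallel region can be illustrated with a minimal sketch. The function name count_region_executions is a hypothetical helper, not part of the project code. It assumes the compiler is invoked with OpenMP enabled (for example -fopenmp with GCC); without that flag the pragmas are ignored and the body runs once.

```cpp
#include <cassert>

// Runs one parallel region and returns how many threads executed its body.
int count_region_executions() {
    int executions = 0;
    #pragma omp parallel
    {
        // Every thread of the team runs this block exactly once.
        // The atomic directive protects the shared counter from a data race.
        #pragma omp atomic
        executions++;
    }
    // At least the master thread has run the block.
    assert(executions >= 1);
    return executions;
}
```

The returned count equals the size of the thread team, which the OpenMP runtime chooses unless it is set explicitly.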
The number of threads is set with the runtime library routine void omp_set_num_threads(int num_threads). At the start of a parallel region, the master thread "forks" into a team of that many threads.
Each thread has its own thread ID, which can be queried with int omp_get_thread_num(void). At the end of the parallel region, the threads "join" back into one master thread.
Note: In my project, the number of threads is defined by NUM_THREADS, which can be set on the command line with --numthreads <int NUM_THREADS>
[Figure: fork-join model. The master thread (thread ID = 0) "forks" into threads that execute the code in the parallel region and then "joins" them again.]
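The fork-join behaviour can be sketched as follows. The helper fork_and_collect_ids is hypothetical and only serves as an illustration; the serial fallbacks in the #else branch are an assumption added so that the sketch also builds when OpenMP is disabled.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks (assumption): used only when OpenMP is not enabled.
inline void omp_set_num_threads(int) {}
inline int omp_get_thread_num() { return 0; }
inline int omp_get_num_threads() { return 1; }
#endif

// Forks a team of at most `requested` threads and marks every thread ID that ran.
// The result has one entry per thread of the team.
std::vector<int> fork_and_collect_ids(int requested) {
    assert(requested >= 1);
    omp_set_num_threads(requested);
    std::vector<int> seen;
    #pragma omp parallel
    {
        // One thread sizes the vector; the implicit barrier at the end of
        // `single` makes sure it is ready before any thread writes to it.
        #pragma omp single
        seen.assign(omp_get_num_threads(), 0);

        // Each thread writes only to its own slot, so no race occurs.
        seen[omp_get_thread_num()] = 1;
    }
    // The threads have joined here; only the master thread continues.
    return seen;
}
```

The runtime may provide fewer threads than requested, so the size of the result is an upper-bounded value rather than a guarantee.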
Existing serial code for evaluating Robustness
Robustness is calculated by:
1. Verifying specification points (verifySpec(), verifyPoint()).
2. Checking whether any of the 4 points is violated (SpecViolated).
3. Iterating (iterate_d()) from these 4 points and calculating a validPoint.
[Figure: specification region in the Temp-Volt plane, bounded by tempReq.first, tempReq.second, voltReq.first and voltReq.second, with the iteration directions 'v' and 't'.]
4. The 8 valid points are stored in a vector 'm_validPoints'.
5. The function probs() iterates through the vector m_validPoints and returns a doublePair value. This value is passed to the function robustness(), which returns a double, "robustnessprobValue".
The area is calculated by 'dichotomy'. The iterations in the 8 directions take place serially, one after the other, which is very time-consuming.
The iterate_d() function calls verifyPoint() and verifyTolerance(), which use *tg to access the useProfile, updateArrivalTime() and getSinkArrivalTime().
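One search direction can be sketched as below, assuming that 'dichotomy' refers to interval bisection and that the valid region is contiguous along the axis. This is not the project's iterate_d(): the names bisect_valid_point and with_coordinate, the (temperature, voltage) layout of DoublePair and the is_valid predicate (standing in for verifyPoint() and verifyTolerance()) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <utility>

using DoublePair = std::pair<double, double>;  // assumed layout: (temperature, voltage)

// Returns a copy of `p` with the coordinate on `axis` ('t' or 'v') replaced.
DoublePair with_coordinate(DoublePair p, char axis, double value) {
    if (axis == 't') p.first = value; else p.second = value;
    return p;
}

// Bisects along one axis between a valid start point and an outer limit and
// returns the outermost point that the predicate still accepts.
DoublePair bisect_valid_point(const DoublePair& start, double limit, char axis,
                              double interval_limit,
                              const std::function<bool(const DoublePair&)>& is_valid) {
    assert(interval_limit > 0.0);  // otherwise the loop might never terminate
    double valid = (axis == 't') ? start.first : start.second;  // known to pass
    double rejected = limit;                                     // outer bound
    if (is_valid(with_coordinate(start, axis, rejected))) {
        return with_coordinate(start, axis, rejected);  // the whole range passes
    }
    // Halve the interval until it is narrower than the requested resolution.
    while (std::fabs(rejected - valid) > interval_limit) {
        const double mid = 0.5 * (valid + rejected);
        if (is_valid(with_coordinate(start, axis, mid))) valid = mid;
        else rejected = mid;
    }
    return with_coordinate(start, axis, valid);
}
```

Each call evaluates the predicate many times, which is why running 8 such searches one after the other is slow when every evaluation involves a timing analysis.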
Ideology
My main aim was to speed up the process of calculating robustness, especially for larger circuits.
To make this possible, I had to run the iterate_d() function in parallel. This lets me compute all 8 validPoints concurrently.
[Figure: validPoints plotted in the T-V plane.]
If I parallelize the function iterate_d(), multiple threads also execute the functions it calls, such as verifyPoint() and verifyTolerance(), simultaneously.
Each of these function calls needs its own copy of *tg.
Hence, depending on the number of threads, NUM_THREADS, I create that many copies of 'tg' and store them in a vector named 'TGVec'.
Each instance of tg can then be accessed by TGVec[thread ID]: tg0 is stored in TGVec[0], tg1 is stored in TGVec[1], and so on.
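The idea of one copy per thread, selected by thread ID, can be sketched as follows. TimingGraphStub is a hypothetical stand-in for the TG class, and evaluate_all is an illustrative helper; neither appears in the project. The serial fallback in the #else branch is an assumption so that the sketch builds without OpenMP.

```cpp
#include <cassert>
#include <memory>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback (assumption): used only when OpenMP is not enabled.
inline int omp_get_thread_num() { return 0; }
#endif

// Hypothetical stand-in for TG: it carries mutable state, so sharing a
// single instance between threads would be a data race.
struct TimingGraphStub {
    double profile = 0.0;
    double evaluate(double x) { profile = x; return 2.0 * profile; }
};

// Evaluates all inputs in parallel, with one private stub per thread.
std::vector<double> evaluate_all(const std::vector<double>& inputs, int thread_count) {
    assert(thread_count >= 1);
    std::vector<std::unique_ptr<TimingGraphStub>> copies;
    for (int i = 0; i < thread_count; ++i) {
        copies.push_back(std::make_unique<TimingGraphStub>());
    }
    std::vector<double> results(inputs.size());
    // The num_threads clause caps the team size, so every thread ID is a
    // valid index into `copies`.
    #pragma omp parallel for num_threads(thread_count)
    for (int i = 0; i < static_cast<int>(inputs.size()); ++i) {
        TimingGraphStub& mine = *copies[omp_get_thread_num()];
        results[i] = mine.evaluate(inputs[i]);
    }
    return results;
}
```

Using std::unique_ptr here is a design choice of the sketch: the copies are released automatically, whereas raw pointers created with new need a matching delete.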
Changes made in the existing code
1. Original: TG *tg; tg = new TG;
Instead of declaring just one tg, I now create one copy of tg per thread, NUM_THREADS copies in total, and store the pointers in a vector.
std::vector<TG*> TGVec;
for (int j = 0; j < NUM_THREADS; j++)
{
    TG *tg = new TG;       // one TG instance per thread
    TGVec.push_back(tg);
}
2. Original: tg->loadTimingLib(xmllib);
Since there are several copies of tg now, every setup function has to be called on each copy:
for (std::size_t i = 0; i < TGVec.size(); i++)
{
    TGVec[i]->loadTimingLib(xmllib);
}
The same applies to the following calls:
- TGVec[i]->loadOA(oalib, oadesign, oaview);
- TGVec[i]->loadConstraintsLib(xmlconstrlib);
- TGVec[i]->set_useProfile(prof1);
- TGVec[i]->getSourceNodes();
- TGVec[i]->getSinkNodes();
- TGVec[i]->getNodes();
3. Original: useProfile *oldProf = m_timer->get_useProfile(); useProfile newProf;
m_timer was declared as 'static TG *m_timer;'. Since this code now executes in parallel in verifyPoint() and verifyTolerance(), a single m_timer is no longer sufficient: each thread needs its own copy of 'm_timer', that is, its own tg. The copy is selected by the thread ID:
useProfile *oldProf = m_TGVec[omp_get_thread_num()]->get_useProfile();
[Figure: calculateRobustness(), iterate_d(), verifyPoint() and verifyTolerance(), with one copy of tg per thread (tg 0, tg 1, tg 2, tg 3).]
4. Original:
startPoint = doublePair(m_tempReq.first, m_voltReq.first);
boundary = doublePair(m_tempReq.first, m_tempLimit.first);
iterate_d(startPoint, boundary, 't', m_intervalLimit);
The function iterate_d() is no longer a void function. It now returns a doublePair value (start.first, start.second).
The 8 iterations run in parallel sections. At the barrier directive, each thread waits for the rest to finish their computation.
The 8 values are then stored in the vector 'm_validPoints'. It is important to maintain the order of the iteration values, starting from voltReq.first, tempReq.first and iterating with respect to the temperature axis.
#pragma omp parallel                  // Parallel region begins here
{
    #pragma omp sections              // Sections are distributed over the threads
    {
        #pragma omp section
        {
            // Declared inside the section so that sections do not share them
            doublePair startPoint = doublePair(m_tempReq.first, m_voltReq.first);
            doublePair boundary = doublePair(m_tempReq.first, m_tempLimit.first);
            temp1 = iterate_d(startPoint, boundary, 't', m_intervalLimit);
            printf("Iteration from 1st point thread %d\n", omp_get_thread_num());
        }
        // The remaining sections follow the same pattern, each with its own temp variable
    }
    #pragma omp barrier               // All threads wait here for each other
}                                     // Parallel region ends here
m_validPoints.push_back(doublePair(temp1.first, temp1.second));
[Figure: inside #pragma omp parallel, each #pragma omp section calls iterate_d() and returns a doublePair (temp1 to temp5). After #pragma omp barrier, the results are appended to the vector "m_validPoints" with m_validPoints.push_back: temp1 is stored first in this vector and temp5 is stored at the end.]
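The pattern of computing results in parallel sections and storing them afterwards in a fixed order can be sketched in a self-contained form. The function search is a hypothetical stand-in for iterate_d(), and run_sections uses only three sections to keep the sketch short; both are assumptions for illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using DoublePair = std::pair<double, double>;

// Hypothetical stand-in for iterate_d(): a computation without side effects.
DoublePair search(double start, double step) {
    assert(step > 0.0);
    return DoublePair(start, start + step);
}

// Computes three results in parallel sections and stores them in a fixed order.
std::vector<DoublePair> run_sections() {
    DoublePair temp1, temp2, temp3;  // each is written by exactly one section
    #pragma omp parallel sections
    {
        #pragma omp section
        temp1 = search(1.0, 0.1);
        #pragma omp section
        temp2 = search(2.0, 0.2);
        #pragma omp section
        temp3 = search(3.0, 0.3);
    }  // implicit barrier: all sections have finished here

    // The vector is filled serially, so the order never depends on
    // which thread happened to finish first.
    std::vector<DoublePair> valid_points;
    valid_points.push_back(temp1);
    valid_points.push_back(temp2);
    valid_points.push_back(temp3);
    return valid_points;
}
```

Because the sections construct ends with an implicit barrier, an additional explicit barrier is not strictly required before the results are read.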
5. To analyze the results, I used an OpenMP timing routine, double omp_get_wtime(void);
It returns the wall-clock time in seconds, so the difference between two calls gives the real time elapsed for any kind of computation or function.
double OmpStart;
double OmpEnd;
OmpStart = omp_get_wtime();
{ … calculateRobustness ( ) … }
OmpEnd = omp_get_wtime();
std::cout << "Robustness at " << age << " years calculated in ";
std::cout << (OmpEnd - OmpStart) << " OpenMP real time seconds! ";
std::cout << std::endl;
6. To set the number of threads for parallel computing, I could have hard-coded it, for example '#define NUM_THREADS 2' (or any other int value such as 4 or 8). Instead, I declared it as a command-line argument.
The number of threads can now be set with --numthreads <int> on the command line. The advantage is that the user does not need to recompile the program to change the number of threads.
TCLAP::ValueArg<int> numThreadsArg("", "numthreads", "Override Number of threads usage", false, 1, "int", cmd);
If nothing is given on the command line, the number of threads is set to 1 by default.
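The behaviour of the option (an integer value, defaulting to 1) can be mimicked with the standard library alone. The sketch below is not TCLAP and not the project's parser; parse_num_threads is a hypothetical helper that only illustrates the default-to-1 logic.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Returns the value that follows "--numthreads", or 1 when the option is
// absent or its value is not a positive integer.
int parse_num_threads(int argc, const char* const argv[]) {
    assert(argv != nullptr);
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "--numthreads") == 0) {
            const int value = std::atoi(argv[i + 1]);
            return value > 0 ? value : 1;
        }
    }
    return 1;  // default: run with a single thread
}
```

The parsed value would then typically be passed on to omp_set_num_threads() before the first parallel region is entered.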
Results of parallel computing
The program was compiled with OpenMP support (#include <omp.h>) and run with multiple threads.
As a result, the computation time for robustness was considerably lower: the calculation was sped up by almost a factor of 2.
The program utilizes all the available resources. The real computation time decreases as the number of threads increases, even though the CPU computation time increases.
On the Rechner machines (4-core processors), 8 threads gave the best results and 12 threads was observed to be the maximum.
Both processor cores are used to run the program 'Robustness'.
CPU usage is 157%, which implies that both CPUs are busy with the same program.
With more processors, the %CPU usage increases too.
Results for:
- Nangate
- Design: cell 1908_i89 (NUM_THREADS = 2)
- Dimensions: 3D
- Machine: Rechner2
Results for:
- Nangate
- Designs: cell c1908_i89
- Dimension: 2D (Age 10 years)
- Machine: Rein (2 processors)

Num_threads   Robustness value   CPU Time   OpenMP Real Time (sec)
1             0.317204           47.15      47.1668
2             0.317204           48.6       28.8759
4             0.317204           48.29      28.1237
8             0.317204           48.47      27.5828

Increasing NUM_THREADS reduces the run time while the robustness value remains unchanged. On this 2-processor machine, most of the gain comes from the second thread: the real time drops from 47.17 sec to 28.88 sec with 2 threads and only to 27.58 sec with 8 threads.
Results for:
- Nangate
- Designs: cell c1908_i89
- Dimension: 2D (Age 10 years)
- Machine: Rechner3 (4 processors)

Num_threads   Robustness value   CPU Time   OpenMP Real Time (sec)
1             0.317204           43.55      43.6704
2             0.317204           43.53      25.3362
4             0.317204           44.19      16.4133
8             0.317204           47.97      11.563
16            0.317204           50.06      11.7313

The results here are as expected too: the real time drops from 43.67 sec with 1 thread to 11.56 sec with 8 threads, and there is no further gain with 16 threads.
Graphical representation of OpenMP real time
Based on the values from the previous slides.
The elapsed computation time decreases as the number of OpenMP threads increases.
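The measured real times can also be expressed as a speedup relative to the single-thread run. The following is a minimal sketch: the Measurement struct and the function names are hypothetical, and the numbers are the OpenMP real times reported for the 4-processor machine (Rechner3).

```cpp
#include <cassert>
#include <vector>

// One row of a timing table: thread count and measured wall-clock time.
struct Measurement {
    int threads;
    double real_seconds;
};

// OpenMP real times for the 4-processor machine, copied from the results table.
std::vector<Measurement> rechner3_runs() {
    return {{1, 43.6704}, {2, 25.3362}, {4, 16.4133}, {8, 11.563}, {16, 11.7313}};
}

// Speedup of each run relative to the first (single-thread) run.
std::vector<double> speedups(const std::vector<Measurement>& runs) {
    assert(!runs.empty());
    std::vector<double> result;
    for (const Measurement& run : runs) {
        result.push_back(runs.front().real_seconds / run.real_seconds);
    }
    return result;
}
```

With these numbers the speedup is roughly 1.7 with 2 threads and about 3.8 with 8 threads, and it does not improve further with 16 threads.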
Further Improvements
The number of OpenMP threads can also be set as an environment variable in the Linux terminal with the command
export OMP_NUM_THREADS=<int #>
(without spaces around the '=' sign).
int NUM_THREADS could then be defined as
int NUM_THREADS = omp_get_max_threads();
Note that omp_get_num_threads() is not suitable here, because it returns 1 when called outside a parallel region.
With this working, I wouldn't need the command line input for the number of threads.
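The difference between the two query routines can be checked with a short sketch. ThreadCounts and query_thread_counts are hypothetical names used only for illustration, and the serial fallbacks in the #else branch are an assumption so that the sketch builds without OpenMP.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks (assumption): used only when OpenMP is not enabled.
inline int omp_get_max_threads() { return 1; }
inline int omp_get_num_threads() { return 1; }
#endif

// Thread counts as seen from different places in the program.
struct ThreadCounts {
    int outside;      // omp_get_num_threads() in the serial part
    int max_threads;  // omp_get_max_threads(), honours OMP_NUM_THREADS
    int inside;       // omp_get_num_threads() inside a parallel region
};

ThreadCounts query_thread_counts() {
    ThreadCounts counts;
    counts.outside = omp_get_num_threads();      // 1 outside any parallel region
    counts.max_threads = omp_get_max_threads();  // team size the next region may use
    counts.inside = 1;
    #pragma omp parallel
    {
        // One thread records the actual team size.
        #pragma omp single
        counts.inside = omp_get_num_threads();
    }
    assert(counts.inside >= 1);
    return counts;
}
```

When OMP_NUM_THREADS is exported before the program starts, max_threads reflects that value, which is what makes it usable as the size of TGVec.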
Conclusion
Parallelism has been employed for many years, mainly in high-performance computing, but with multi-core computers now so common, interest in it has grown massively.
Parallel programs are more difficult to write than sequential programs, because they require more planning and the skill to troubleshoot software bugs such as race conditions.
The objective of this bachelor thesis, however, has been met. Parts of the robustness validation program are now efficiently parallelized, which speeds up the whole process of robustness calculation.