Static Branch Frequency and
Program Profile Analysis
Divino César Soares Lucas
Laboratório de Sistemas de Computação
Instituto de Computação
UNICAMP
Youfeng Wu
Intel Labs
James R. Larus
University of Wisconsin
Outline
1. Introduction
2. Related Work
3. Key Idea
4. Branch Prediction
5. Branch Probabilities
6. Combining Predictions
7. Local Block and Edge Frequency
8. From Local to Global Frequencies
9. Results
10. Conclusion
11. References
Introduction
• What is a program profile?
    • Dynamic profile
    • Static profile
• Why do we need a profile?
    • Instruction scheduling
    • Identifying program bottlenecks
    • Enhancing memory locality
Related Work
• Dynamic profile
    • Work centered on reducing profiling overhead [3, 6]
• Static profile
    • Simple estimation heuristics [4]
    • Estimation based on Markov models [5]
Key Idea [1]
• Predict branches
    • Use heuristics
• Compute probabilities
    • Use heuristic hit rates
• Compute frequencies
    • Use probabilities
Branch Prediction
• A branch prediction states whether a branch will be taken or
not taken. It’s a binary decision!
• Some static heuristics [2]:
    • LBH - Loop Branch Heuristic
    • PH - Pointer Heuristic
    • OH - Opcode Heuristic
    • GH - Guard Heuristic
    • LEH - Loop Exit Heuristic
    • LHH - Loop Header Heuristic
    • CH - Call Heuristic
    • SH - Store Heuristic
    • RH - Return Heuristic
Branch Probabilities
• A branch probability is an estimate of how likely the branch
is to be taken. It’s a continuous value in [0, 1].
Heuristic                 Hit Rate
Loop Branch Heuristic     88%
Pointer Heuristic         60%
Opcode Heuristic          84%
Guard Heuristic           62%
Loop Exit Heuristic       80%
Loop Header Heuristic     75%
Call Heuristic            78%
Store Heuristic           55%
Return Heuristic          72%
• We will use these Hit Rates as
branch probabilities.
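As a minimal sketch of this step, the hit rates in the table can be stored in a lookup table and turned into a pair of probabilities for a branch where exactly one heuristic applies. The names `HIT_RATE` and `branch_probability` are hypothetical and chosen only for illustration.

```python
# Hit rates from the table above, used directly as branch probabilities.
# HIT_RATE and branch_probability are illustrative names, not from the paper.
HIT_RATE = {
    "LBH": 0.88, "PH": 0.60, "OH": 0.84, "GH": 0.62, "LEH": 0.80,
    "LHH": 0.75, "CH": 0.78, "SH": 0.55, "RH": 0.72,
}


def branch_probability(heuristic, predicts_taken):
    """Return (P(taken), P(not taken)) when a single heuristic applies."""
    p = HIT_RATE[heuristic]
    # The predicted direction receives the hit rate, the other one the rest.
    return (p, 1.0 - p) if predicts_taken else (1.0 - p, p)


print(branch_probability("LBH", True))
print(branch_probability("RH", False))
```

The predicted direction receives the hit rate and the other direction receives the remainder, so the two values always sum to 1.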
Combining Predictions
• What happens if two or more heuristics are applicable?
if (k < 0) then
k = y;
else
return ;
end-if
• OH predicts the then part! (With an 84% hit rate.)
• RH predicts the else part! (With a 72% hit rate.)
• In these situations we use the Dempster-Shafer
algorithm…
Combining Predictions
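• Each branch has a set of possible targets. In our case there are
two, taken or not taken:
$$B = \{t_1, t_2\}$$
• Each heuristic provides evidence that an outcome will happen:
$$m_1(t_1) = a, \qquad m_1(t_2) = 1 - a$$
$$m_2(t_1) = b, \qquad m_2(t_2) = 1 - b$$
• The Dempster-Shafer algorithm combines this evidence:
$$(m_1 \oplus m_2)(t_1) = \frac{m_1(t_1)\, m_2(t_1)}{m_1(t_1)\, m_2(t_1) + m_1(t_2)\, m_2(t_2)}$$
$$(m_1 \oplus m_2)(t_2) = \frac{m_1(t_2)\, m_2(t_2)}{m_1(t_1)\, m_2(t_1) + m_1(t_2)\, m_2(t_2)}$$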
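The combination rule above can be sketched in a few lines of Python and applied to the `if (k < 0)` example, where OH favours the then part with 0.84 and RH favours the else part with 0.72. The function name `combine` and the choice of treating the then part as the first outcome are assumptions made for illustration.

```python
def combine(m1, m2):
    """Combine two evidence pairs (m(t1), m(t2)) for a two-outcome branch."""
    t1 = m1[0] * m2[0]  # both heuristics support outcome t1
    t2 = m1[1] * m2[1]  # both heuristics support outcome t2
    norm = t1 + t2      # normalisation term shared by both formulas
    if norm == 0.0:
        raise ValueError("completely conflicting evidence cannot be combined")
    return (t1 / norm, t2 / norm)


# Here t1 is the 'then' part and t2 is the 'else' part (an assumption).
oh = (0.84, 0.16)  # OH predicts 'then' with an 84% hit rate
rh = (0.28, 0.72)  # RH predicts 'else' with a 72% hit rate
then_p, else_p = combine(oh, rh)
print(f"then: {then_p:.3f}, else: {else_p:.3f}")  # then: 0.671, else: 0.329
```

The stronger heuristic (OH) still wins, but the conflicting evidence from RH pulls the estimate for the then part down from 0.84 to about 0.67.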
Combining Predictions
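Example:
$$m_1(t_1) = 0.5, \qquad m_1(t_2) = 0.5$$
$$m_2(t_1) = 0.7, \qquad m_2(t_2) = 0.3$$
$$m_3(t_1) = 0.6, \qquad m_3(t_2) = 0.4$$
$$(m_1 \oplus m_2)(t_1) = \frac{0.5 \times 0.7}{0.5 \times 0.7 + 0.5 \times 0.3} = 0.7$$
$$(m_1 \oplus m_2)(t_2) = \frac{0.5 \times 0.3}{0.5 \times 0.7 + 0.5 \times 0.3} = 0.3$$
$$(m_2 \oplus m_3)(t_1) = \frac{0.7 \times 0.6}{0.7 \times 0.6 + 0.3 \times 0.4} = 0.778$$
$$(m_2 \oplus m_3)(t_2) = \frac{0.3 \times 0.4}{0.7 \times 0.6 + 0.3 \times 0.4} = 0.222$$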
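The numbers in this example can be checked with a short script. It also folds the rule over all three pieces of evidence with `functools.reduce`, which is one plausible way to handle more than two applicable heuristics. The helper `combine` is a hypothetical name, not taken from the paper.

```python
from functools import reduce


def combine(m1, m2):
    """Combine two evidence pairs (m(t1), m(t2)) for a two-outcome branch."""
    t1 = m1[0] * m2[0]
    t2 = m1[1] * m2[1]
    norm = t1 + t2
    return (t1 / norm, t2 / norm)


m1, m2, m3 = (0.5, 0.5), (0.7, 0.3), (0.6, 0.4)

# Pairwise combinations from the example.
print([round(x, 3) for x in combine(m1, m2)])  # [0.7, 0.3]
print([round(x, 3) for x in combine(m2, m3)])  # [0.778, 0.222]

# Folding over all three: the uninformative 0.5/0.5 evidence changes nothing.
all_three = reduce(combine, [m1, m2, m3])
print([round(x, 3) for x in all_three])        # [0.778, 0.222]
```

Evidence of 0.5/0.5 acts as a neutral element, and because the rule is commutative and associative the order in which heuristics are combined does not affect the result.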
Local Block and Edge Frequency
• The block/edge frequency is an estimate of how often a
block is executed or an edge is taken.
• We calculate local block/edge frequencies by propagating
branch probabilities, that is:
$$\mathit{bfreq}(b_i) = \begin{cases} 1 & \text{if } b_i \text{ is the entry block} \\ \sum_{b_p \in \mathit{pred}(b_i)} \mathit{freq}(b_p \to b_i) & \text{otherwise} \end{cases}$$
$$\mathit{freq}(b_i \to b_j) = \mathit{bfreq}(b_i) \cdot \mathit{prob}(b_i \to b_j)$$
• But these formulas don’t work when the control-flow graph has a cycle!
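For a control-flow graph without cycles, the two propagation formulas can be sketched as a single pass in topological order. The function `propagate`, its edge-dictionary input, and the block names in the example are all hypothetical. The sketch relies on `graphlib.TopologicalSorter` from the Python standard library (3.9+), which raises `CycleError` on a cyclic graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def propagate(entry, prob):
    """Compute block and edge frequencies for an acyclic CFG.

    prob maps each edge (src, dst) to its branch probability.
    """
    preds, succs = {}, {}
    for (src, dst) in prob:
        preds.setdefault(dst, set()).add(src)
        preds.setdefault(src, set())
        succs.setdefault(src, []).append(dst)

    bfreq, freq = {}, {}
    # static_order() yields every block after all of its predecessors.
    for b in TopologicalSorter(preds).static_order():
        if b == entry:
            bfreq[b] = 1.0
        else:
            bfreq[b] = sum(freq[(p, b)] for p in preds[b])
        for s in succs.get(b, []):
            freq[(b, s)] = bfreq[b] * prob[(b, s)]
    return bfreq, freq


# Diamond-shaped CFG: entry branches to 'then' or 'else', both reach 'exit'.
edges = {
    ("entry", "then"): 0.84, ("entry", "else"): 0.16,
    ("then", "exit"): 1.0, ("else", "exit"): 1.0,
}
bfreq, freq = propagate("entry", edges)
print(bfreq)
```

In this acyclic case every block frequency stays at or below 1, which is exactly what breaks down once a back edge feeds frequency into a loop head.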
Local Block and Edge Frequency
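$$
\begin{aligned}
\mathit{bfreq}(b_0) &= \mathit{in\_freq}(b_0) + \sum_{i=1}^{k} \mathit{freq}(b_i \to b_0) \\
&= \mathit{in\_freq}(b_0) + \sum_{i=1}^{k} \mathit{bfreq}(b_i)\, \mathit{prob}(b_i \to b_0) \\
&= \mathit{in\_freq}(b_0) + \sum_{i=1}^{k} \mathit{bfreq}(b_0)\, r_i\, \mathit{prob}(b_i \to b_0) \\
&= \mathit{in\_freq}(b_0) + \mathit{bfreq}(b_0) \sum_{i=1}^{k} r_i\, \mathit{prob}(b_i \to b_0)
\end{aligned}
$$
(the third step writes $\mathit{bfreq}(b_i) = r_i\, \mathit{bfreq}(b_0)$)
Let
$$\mathit{cp}(b_0) = \sum_{i=1}^{k} r_i\, \mathit{prob}(b_i \to b_0)$$
$$\mathit{bfreq}(b_0) = \mathit{in\_freq}(b_0) + \mathit{bfreq}(b_0)\, \mathit{cp}(b_0)$$
$$\mathit{bfreq}(b_0) = \frac{\mathit{in\_freq}(b_0)}{1 - \mathit{cp}(b_0)}$$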
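The closed form at the end of this derivation can be sketched as a small function. The name `loop_head_frequency`, the input layout, and the decision to reject cp(b0) ≥ 1 with an exception are assumptions of this sketch, not something the slides specify.

```python
def loop_head_frequency(in_freq, back_edges):
    """Return bfreq(b0) = in_freq(b0) / (1 - cp(b0)).

    back_edges is a list of (r_i, prob(b_i -> b0)) pairs, one per edge
    that returns to the loop head b0.
    """
    cp = sum(r * p for r, p in back_edges)  # cyclic probability cp(b0)
    if cp >= 1.0:
        # Assumption of this sketch: a loop that never exits is rejected.
        raise ValueError("cyclic probability must be below 1")
    return in_freq / (1.0 - cp)


# One back edge taken with probability 0.88 (the LBH hit rate), r_1 = 1.
print(round(loop_head_frequency(1.0, [(1.0, 0.88)]), 2))  # 8.33
```

With a single back edge taken 88% of the time, the loop head is estimated to run about 8.33 times per entry into the loop.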
From Local to Global Frequencies
• Considering one invocation of f, the frequency with which a
function f calls another function g can be expressed as:
$$\mathit{lfreq}(f, g) = \sum_{b_i \in f} \mathit{bfreq}(b_i) \cdot \mathit{calls}(b_i, g)$$
• The global frequency of f calling g is:
$$\mathit{gfreq}(f, g) = \mathit{cfreq}(f) \cdot \mathit{lfreq}(f, g)$$
• Where:
$$\mathit{cfreq}(f) = \begin{cases} 1 & \text{if } f \text{ is the main function} \\ \sum_{p \in \mathit{pred}(f)} \mathit{freq}(p, f) & \text{otherwise} \end{cases}$$
• Global block/edge frequency can be calculated by multiplying
function execution frequency by local block/edge frequency.
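A minimal sketch of the local-to-global step follows, assuming an acyclic call graph and assuming that freq(p, f) in the definition of cfreq denotes the global call frequency gfreq(p, f). The function `global_call_freq`, its input layout, and the example call graph are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def global_call_freq(main, lfreq):
    """Compute cfreq and gfreq for an acyclic call graph.

    lfreq maps (caller, callee) to the local call frequency, that is,
    the number of calls per single invocation of the caller.
    """
    preds = {}
    for (f, g) in lfreq:
        preds.setdefault(g, set()).add(f)
        preds.setdefault(f, set())

    cfreq, gfreq = {}, {}
    # Visit callers before callees so every gfreq(p, f) is already known.
    for f in TopologicalSorter(preds).static_order():
        if f == main:
            cfreq[f] = 1.0
        else:
            cfreq[f] = sum(gfreq[(p, f)] for p in preds[f])
        for (caller, callee), lf in lfreq.items():
            if caller == f:
                gfreq[(caller, callee)] = cfreq[f] * lf
    return cfreq, gfreq


# main calls f twice and g once; each invocation of f calls g three times.
local = {("main", "f"): 2.0, ("main", "g"): 1.0, ("f", "g"): 3.0}
cfreq, gfreq = global_call_freq("main", local)
print(cfreq)  # {'main': 1.0, 'f': 2.0, 'g': 7.0}
```

A global block frequency then follows by multiplying `cfreq[f]` by the local `bfreq` of each block in `f`. Recursive call graphs contain cycles and are outside the scope of this sketch.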
Results
• Results came from SPECint92 C benchmarks and some
Unix applications.
• The system used was a Sequent S2000/750 with i486
processors and the Sequent DYNIX/ptx C compiler 2.1.
• Accuracy was measured with Wall’s [5] weighted and unweighted match scores.
Conclusion
• A new technique for static profiling was presented.
• The technique introduced a new way to combine multiple
pieces of evidence for a branch outcome.
• Although the heuristic hit rates were measured in another
environment, they still produced good results.
References
[1] Y. Wu and J. R. Larus. Static Branch Frequency and Program Profile Analysis.
In Proceedings of the 27th Annual International Symposium on Microarchitecture.
pages 1-11, 1994.
[2] T. Ball and J. R. Larus. Branch prediction for free. In SIGPLAN Conference on
Programming Language Design and Implementation. pages 300-313, 1993.
[3] T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM
Transactions on Programming Languages and Systems. 16(4):1319-1360, July
1994.
[4] T. A. Wagner, V. Maverick, S. L. Graham, and M. A. Harrison. Accurate static
estimators for program optimization. In Proceedings of the ACM SIGPLAN’94
conference on Programming Language Design and Implementation. pages 85-96.
ACM Press, 1994.
[5] D. W. Wall. Predicting Program Behavior Using Real or Estimated Profiles.
Proceedings of ACM SIGPLAN’91 Conference on Programming Language Design
and Implementation. pages 59-70, 1991.
[6] V. Sarkar. Determining average program execution times and their variance. In
SIGPLAN Conference on Programming Language Design and Implementation.
pages 298-312, 1989.