BP= C.(1-Ap).b.IR [wasted clock / instruction] (4)
Where we denoted:
C = number of penalty cycles wasted due to a branch miss-prediction;
Ap = prediction accuracy;
b = the ratio of the number of branches reported to the number of total instructions;
IR = the average number of instructions that are executed per cycle.
CPIreal = CPIideal + BP [clock cycle / instruction] (5)
Where:CPIideal represents the average number of cycles per instruction considering a perfect branch prediction (Ap=100% => BP=0). It is obviously that CPIideal < 1.
CPIreal represents the average number of cycles per instruction considering a real branch prediction (Ap<100% => BP>0 => CPIreal > CPIideal).
Therefore, the real processing rate (the average number of instructions executed per cycle) results immediately from formula:
IRreal = [instruction / clock cycle] (6)
The relation (6) proves the non-linearly correlation between processing rate (IR) and prediction accuracy (Ap). With these metrics, we adapted the model to our results presented during this paragraph. Further, we use the following notations:
x = the ratio of biased context instances
1 - x = the ratio of unbiased context instances
Since Apglobal represents a weighted mean among predictions accuracies applied both to bias and unbiased branches, the biased prediction accuracy, Apbiased, can be determined .Apglobal = x.Apbiased + (1-x).Apunbiased (7)
Let's consider the follwing example:
Apglobal = 0.936 = 0.8253*Abiased + 0.1747*0.722 (where x = 0.8253)
It results that Apbiased = 0.9813.
Therefore, further we determined how much is influenced the branch penalty (BP) by the increasing of context length and what is the speed-up in these conditions. For this, we softly modified the initial model proposed by Chang [2] by substituting Ap with our Apglobal, according to formula (7). Thus, it is considered a penalty introduced for miss-prediction of biased branches – the term (1-Apbiased).x, respectively for considered wrong prediction of all unbiased branches (Apunbiased=0) – the term (1-x).
Model proposed by Chang | Our “modified” model |
BP= C.(1-Ap).b.IR | BP = C.b.IR.[1– x.Apbiased] (8) |
In our previous research [3] we proved a decrease of unbiased branches (1-x) by extending the context length that leads to a reduction of branch penalty (BP) according to (8), and implicitly to a greater IR according to (6). It can be written the followings relations ( means a growth):
Next, we computed the IR relative speedup when varying the context length. Starting from the well-known metric
, where L is the feature’s length, L {20, 24, and 28}, we obtained the relative speedup:
Relative Speedup .
The baseline processor model has an IRideal of 4 [instruction / clock cycle]. The number of penalty cycles wasted due to a branch miss-prediction considered in our model is 7. The ratio of the number of simulated branches over the number of total simulated instructions is b=8% (generated by our previous simulations [3]).
Figure 1 illustrates not only the necessity of a greater number of prediction features to improve the processor’s performance, but also the necessity of new performing branch predictors that could consider a larger amount of information in generating predictions (but without scaling exponentially with input features set length).
The analytical model described above is implemented by ABPS.
ABPS computes:
ABPS computes the above metrics using the following mathematical background:
Note: The Relative speedup (S) represents the Issue Rate gain, relative to a specified configuration (above, IR(16) means the IR for a configuration having 16 bits of history length; L can be in this case: 20, 24, 28, like in Figure 1)
References:
[1] Vinţan L., Prediction Techniques in Advanced Computing Architectures (in english), MatrixRom Publishing House, Bucharest, ISBN 978-973-755-137-5, 2007.
[2] Chang P.-Y., Hao E., Yeh T.-Y., Patt Y. N., Branch Classification: A New Mechanism for Improving Branch Predictor Performance, Proceedings of the 27th International Symposium on Microarchitecture, San Jose, California, 1994.
[3] Vinţan L, Gellert A., Florea A., Oancea M., Egan C., Understanding Prediction Limits through Unbiased Branches, Eleventh Asia-Pacific Computer Systems Architecture Conference, Shanghai 6-8th, September, 2006 (“Lecture Notes in Computer Science” vol. 4186, pp. 480-487, Springer-Verlag, Berlin / Heidelberg, 2006)