Function always returns 1 - C++

I'm trying to write a simple branch predictor that should output either TAKEN (1) or NOT_TAKEN (0) depending on history stored in an int. However, it always outputs TAKEN instead of dynamically changing the prediction.
#define PHT_CTR_MAX 3
#define PHT_CTR_INIT 2
class PREDICTOR{
private:
UINT32 counter;
public:
PREDICTOR(void);
bool GetPrediction(UINT64 PC);
void UpdatePredictor(UINT64 PC, OpType opType, bool resolveDir, bool predDir, UINT64 branchTarget);
};
PREDICTOR::PREDICTOR(void){
counter = PHT_CTR_INIT;
}
bool PREDICTOR::GetPrediction(UINT64 PC){
if(counter > (PHT_CTR_MAX/2)){
return TAKEN;
}else{
return NOT_TAKEN;
}
}
void PREDICTOR::UpdatePredictor(UINT64 PC, OpType opType, bool resolveDir, bool predDir, UINT64 branchTarget){
if(resolveDir == TAKEN){
SatIncrement(counter, PHT_CTR_MAX);
}else{
SatDecrement(counter);
}
}
PREDICTOR::PREDICTOR is used to "build" the predictor (create arrays, set initial values, ...); it is called right at the beginning.
PREDICTOR::GetPrediction should return either TAKEN (when counter = 3 or 2) or NOT_TAKEN (when counter = 0 or 1).
PREDICTOR::UpdatePredictor is called after GetPrediction. It updates the predictor via resolveDir - resolveDir is the actual direction of the branch.
If resolveDir = 1 it does a saturated increment of the counter (saturated means it never exceeds PHT_CTR_MAX).
If resolveDir = 0 it decrements the counter.
Although this predictor is really simple, it does not work. It produces exactly the same results as if I just did GetPrediction{return TAKEN;}, which is obviously wrong. My coding skills aren't really great, so I might have done something wrong - probably in the GetPrediction or UpdatePredictor function.
Here is an example of a predictor that works just fine, although this one is a little bit more complex:
#define PHT_CTR_MAX 3
#define PHT_CTR_INIT 2
#define HIST_LEN 17
class PREDICTOR{
private:
UINT32 ghr; // global history register
UINT32 *pht; // pattern history table
UINT32 historyLength; // history length
UINT32 numPhtEntries; // entries in pht
public:
PREDICTOR(void);
bool GetPrediction(UINT64 PC);
void UpdatePredictor(UINT64 PC, OpType opType, bool resolveDir, bool predDir, UINT64 branchTarget);
};
PREDICTOR::PREDICTOR(void){
historyLength = HIST_LEN;
ghr = 0;
numPhtEntries = (1<< HIST_LEN);
pht = new UINT32[numPhtEntries];
for(UINT32 ii=0; ii< numPhtEntries; ii++){
pht[ii]=PHT_CTR_INIT;
}
}
bool PREDICTOR::GetPrediction(UINT64 PC){
UINT32 phtIndex = (PC^ghr) % (numPhtEntries);
UINT32 phtCounter = pht[phtIndex];
if(phtCounter > (PHT_CTR_MAX/2)){
return TAKEN;
}
else{
return NOT_TAKEN;
}
}
void PREDICTOR::UpdatePredictor(UINT64 PC, OpType opType, bool resolveDir, bool predDir, UINT64 branchTarget){
UINT32 phtIndex = (PC^ghr) % (numPhtEntries);
UINT32 phtCounter = pht[phtIndex];
if(resolveDir == TAKEN){
pht[phtIndex] = SatIncrement(phtCounter, PHT_CTR_MAX);
}else{
pht[phtIndex] = SatDecrement(phtCounter);
}
// update the GHR
ghr = (ghr << 1);
if(resolveDir == TAKEN){
ghr++;
}
}
This predictor works in the same way as my simple one, except that it uses an array of counters instead of a single one. When GetPrediction is called, the array is indexed by the last 17 bits of the branch history (the global history register, or ghr, built from past resolveDir values) XORed with the PC (address of the current branch). This selects the appropriate counter from the array, which is then used to make the prediction. UpdatePredictor works the same way: the array is indexed and a counter is chosen. The counter is updated with the information from resolveDir. Lastly, the global history register (ghr, branch history, call it what you want) is also updated.
Code of the SatIncrement and SatDecrement functions:
static inline UINT32 SatIncrement(UINT32 x, UINT32 max)
{
if(x<max) return x+1;
return x;
}
static inline UINT32 SatDecrement(UINT32 x)
{
if(x>0) return x-1;
return x;
}
Thanks for the help.

The reason the code doesn't work as expected is that SatIncrement and SatDecrement take their argument by value and return the new value, which then must be assigned back to the variable that is supposed to be incremented/decremented.
SatIncrement(counter, PHT_CTR_MAX);
will pass the value of counter but will not modify counter itself. The returned new value is not used, so effectively this line does nothing. The same is true for SatDecrement(counter);.
Therefore your branch predictor never changes state and always returns the same prediction.
Fix it by following the other code example:
counter = SatIncrement(counter, PHT_CTR_MAX);
and
counter = SatDecrement(counter);
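Applied to your simple predictor, the corrected UpdatePredictor (your own function, just with the return values assigned back) would be:
void PREDICTOR::UpdatePredictor(UINT64 PC, OpType opType, bool resolveDir, bool predDir, UINT64 branchTarget){
    if(resolveDir == TAKEN){
        counter = SatIncrement(counter, PHT_CTR_MAX);  // store the saturated increment back into counter
    }else{
        counter = SatDecrement(counter);               // store the saturated decrement back into counter
    }
}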
Given that this is an exercise, you probably cannot change SatIncrement and SatDecrement. In practice, however, one would probably let these functions take their argument by reference, so that they can modify the passed variable directly, avoiding the repetition of counter at the call site:
static inline void SatIncrement(UINT32& x, UINT32 max)
{
if(x<max) x++;
}
If you keep the original signature, then since C++17 you can add the [[nodiscard]] attribute to the function to make the compiler emit a warning if the return value is not used:
[[nodiscard]] static inline UINT32 SatIncrement(UINT32 x, UINT32 max)
{
if(x<max) return x+1;
return x;
}
It would have warned you here and made the problem clearer.


How to safely compare two unsigned integer counters?

We have two unsigned counters, and we need to compare them to check for some error conditions:
uint32_t a, b;
// a increased in some conditions
// b increased in some conditions
if (a/2 > b) {
perror("Error happened!");
return -1;
}
The problem is that a and b will overflow some day. If a overflows, it's still OK. But if b overflows, it would be a false alarm. How can I make this check bulletproof?
I know making a and b uint64_t would delay this false alarm, but it still would not completely fix the issue.
===============
Let me clarify a little bit: the counters are used to track memory allocations, and this problem was found in dmalloc/chunk.c:
#if LOG_PNT_SEEN_COUNT
/*
* We divide by 2 here because realloc which returns the same
* pointer will seen_c += 2. However, it will never be more than
* twice the iteration value. We divide by two to not overflow
* iter_c * 2.
*/
if (slot_p->sa_seen_c / 2 > _dmalloc_iter_c) {
dmalloc_errno = ERROR_SLOT_CORRUPT;
return 0;
}
#endif
I think you misinterpreted the comment in the code:
We divide by two to not overflow iter_c * 2.
No matter where the values are coming from, it is safe to write a/2, but it is not safe to write a*2. Whatever unsigned type you are using, you can always divide a number by two, while multiplying may result in overflow.
If the condition would be written like this:
if (slot_p->sa_seen_c > _dmalloc_iter_c * 2) {
then for roughly half of the possible inputs the multiplication would overflow and the condition would be wrong. That being said, if you worry about the counters overflowing, you could wrap them in a class:
#include <algorithm> // for std::min

class check {
    unsigned a = 0;
    unsigned b = 0;
    bool odd = true;
    // keep both counters small by subtracting their common minimum
    void normalize() {
        auto m = std::min(a, b);
        a -= m;
        b -= m;
    }
public:
    // counts only every second call, so a effectively holds a/2
    void incr_a() {
        if (odd) ++a;
        odd = !odd;
        normalize();
    }
    void incr_b() {
        ++b;
        normalize();
    }
    // equivalent to the original a/2 > b test
    bool exceeded() const { return a > b; }
};
Note that to avoid overflow completely you have to take additional measures, but if a and b are increased by more or less the same amount, this might be fine already.
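A minimal usage sketch (the condition functions are placeholders for whatever increments your real counters):
check c;
// instead of a++ in some conditions:
if (some_condition_for_a()) c.incr_a();
// instead of b++ in some conditions:
if (some_condition_for_b()) c.incr_b();
if (c.exceeded()) {
    perror("Error happened!");
    return -1;
}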
The posted code actually doesn’t seem to use counters that may wrap around.
What the comment in the code is saying is that it is safer to compare a/2 > b instead of a > 2*b, because the latter could potentially overflow while the former cannot. This is particularly true if the type of a is larger than the type of b.
Note overflows as they occur.
uint32_t a, b;
bool aof = false;
bool bof = false;
if (condition_to_increase_a()) {
a++;
aof = a == 0;
}
if (condition_to_increase_b()) {
b++;
bof = b == 0;
}
if (!bof && a/2 + aof*0x80000000 > b) {
perror("Error happened!");
return -1;
}
Each of a and b independently has 2^32 + 1 different states reflecting its value and the conditional increment. Somehow, more than a uint32_t of information is needed. You could use uint64_t, variant code paths, or an auxiliary variable like the bool here.
Normalize the values as soon as they wrap by forcing them both to wrap at the same time. Maintain the difference between the two when they wrap.
Try something like this:
uint32_t a, b;
// a increased in some conditions
// b increased in some conditions
if (a == UINT32_MAX || b == UINT32_MAX) { // a or b is about to wrap
if (a > b)
{
a = a-b; b = 0;
}
else
{
b = b-a; a = 0;
}
}
if (a/2 > b) {
perror("Error happened!");
return -1;
}
If even using 64 bits is not enough, then you need to code your own "var increase" method instead of overloading the ++ operator (which may mess up your code if you are not careful).
The method would just reset the variable to 0 or some other meaningful value.
If your intention is to ensure that action x happens no more than twice as often as action y, I would suggest doing something like:
uint32_t x_count = 0;
uint32_t scaled_y_count = 0;
void action_x(void)
{
if ((uint32_t)(scaled_y_count - x_count) > 0xFFFF0000u)
fault();
x_count++;
}
void action_y(void)
{
if ((uint32_t)(scaled_y_count - x_count) < 0xFFFF0000u)
scaled_y_count+=2;
}
In many cases, it may be desirable to reduce the constants in the comparison used when incrementing scaled_y_count so as to limit how many action_y operations can be "stored up". The above, however, should work precisely in cases where the operations remain anywhere close to balanced in a 2:1 ratio, even if the number of operations exceeds the range of uint32_t.

Random function call from multiple threads in Qt/C++

I have a multi-threaded Qt application that sometimes needs a random alphanumeric string from one of its threads (some threads start at application startup, others start or die during the application's lifetime), and I would like to obtain that by calling a function defined in a common header, to avoid code duplication.
Here is a code snippet:
QString generateRandomAlphanumericString(int length)
{
qsrand(static_cast<uint>(QTime::currentTime().msec())); //bad
QString randomAS = QString();
static const char alphanum[] =
"0123456789"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz";
for (int i = 0; i < length; ++i)
randomAS[i] = alphanum[qrand() % (sizeof(alphanum) - 1)];
return randomAS;
}
I initially made some mistakes.
At the beginning I called qsrand(static_cast<uint>(QTime::currentTime().msec())); in the main function, but I've learned that it should be done per thread.
Then I put the qsrand call in the function above, but that's not correct either.
Please consider that at program startup many threads start "together", so if I initialize the seed with the current time in msec, the seed is the same for all of them.
Is there a way to change that function accordingly without modifying all the points in my application where a thread starts its life?
Any implementation done in pure C++ (without the use of Qt) is fine. Could the new C++11 random library help in some way to achieve my task?
void InitSeedForThread(uint globalSeed, int myThreadIndex)
{
qsrand(globalSeed);
for (int i = 0; i < myThreadIndex; ++i)
qrand();
}
auto GetRandom(int numThreads)
{
for (int i = 0; i < numThreads - 1; ++i)
qrand();
return qrand();
}
Given an ordered list of numbers A, B, C, D, E, F, G, H, ..., this splits it into n lists. If n were 4, you would get:
1. A, E, I, ...
2. B, F, J, ...
3. C, G, K, ...
4. D, H, L, ...
Con: doing RNG is somewhat expensive, and you're repeating a lot of work. However, since you're doing Qt (UI-bound), I'm assuming that performance isn't an issue.
Alternatively, you could do a global random function with a mutex, but that ain't free either.
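Regarding the pure C++ option you mentioned: with C++11's <random> you can give each thread its own generator via thread_local, which avoids both the shared qsrand state and a mutex. A sketch; the seeding scheme (mixing std::random_device with the thread id) is just one reasonable choice:
#include <random>
#include <string>
#include <thread>

std::string generateRandomAlphanumericString(int length)
{
    static const char alphanum[] =
        "0123456789"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz";

    // one generator per thread, seeded once on first use in that thread
    thread_local std::mt19937 gen(
        std::random_device{}() ^
        static_cast<unsigned int>(std::hash<std::thread::id>()(std::this_thread::get_id())));

    std::uniform_int_distribution<int> dist(0, static_cast<int>(sizeof(alphanum)) - 2); // -2 skips the trailing '\0'

    std::string result;
    result.reserve(length);
    for (int i = 0; i < length; ++i)
        result += alphanum[dist(gen)];
    return result;
}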
I finally found a good solution (thanks to everybody who contributed with comments):
enum ThreadData {TD_SEED};
static QThreadStorage<QHash<ThreadData, uint> *> cache;
inline void insertIntoCache(ThreadData data, uint value)
{
if (!cache.hasLocalData())
cache.setLocalData(new QHash<ThreadData, uint>);
cache.localData()->insert(data, value);
}
inline void removeFromCache(ThreadData data)
{
if (cache.hasLocalData())
cache.localData()->remove(data);
}
inline bool hasInCache(ThreadData data)
{
if (!cache.hasLocalData()) return false;
return cache.localData()->contains(data);
}
inline uint getCachedData(ThreadData data)
{
if (cache.hasLocalData() && cache.localData()->contains(data))
return cache.localData()->value(data);
return 0;
}
inline int getThRandom()
{
uint seed = 0;
if (!hasInCache(TD_SEED))
{
seed = QDateTime::currentMSecsSinceEpoch() % 100000000;
#ifdef Q_OS_WIN
seed += GetCurrentThreadId();
#else
seed += QThread::currentThreadId();
#endif
qsrand(static_cast<uint>(seed));
insertIntoCache(TD_SEED, seed);
}
else {
seed = getCachedData(TD_SEED);
}
return qrand();
}
Basically, as suggested by Igor, I've made use of QThreadStorage to store a seed for each thread. I've used a hash for future extensions.
Then, I've made use of QDateTime::currentMSecsSinceEpoch() instead of QTime::currentTime().msec() to have a different number across multiple application starts (if, for example, the generated random value is stored in a file/db and should be different).
Then, I've added an offset, as suggested by UKMonkey, using the thread ID.
So, my original function will be:
QString generateRandomAlphanumericString(int length)
{
QString randomAS = QString();
static const char alphanum[] =
"0123456789"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz";
for (int i = 0; i < length; ++i)
randomAS[i] = alphanum[getThRandom() % (sizeof(alphanum) - 1)];
return randomAS;
}
I've run some tests, producing thousands of alphanumeric strings from different threads, storing them in multiple files, and double-checking for duplicates among them and across multiple application runs.

Why do local arrays in functions seem to prevent TCO?

It looks like having a local array in your function prevents tail-call optimization on it, on every compiler I've checked:
int foo(int*);
int tco_test() {
// int arr[5]={1, 2, 3, 4, 5}; // <-- variant 1
// int* arr = new int[5]; // <-- variant 2
int x = foo(arr);
return x > 0 ? tco_test() : x;
}
When variant 1 is active, there is a real call to tco_test() at the end (gcc tries to do some unrolling first, but it still calls the function in the end). Variant 2 gets TCO as expected.
Is there something in local arrays which makes it impossible to optimize tail calls?
If the compiler still performed TCO, then all of the external foo(arr) calls would receive the same pointer. That would be a visible change in semantics, and thus no longer a pure optimization.
The fact that the local variable in question is an array is probably a red herring here; it is its visibility to the outside via a pointer that is important.
Consider this program:
#include <stdio.h>
int *valptr[7], **curptr = valptr, **endptr = valptr + 7;
void reset(void)
{
curptr = valptr;
}
int record(int *ptr)
{
if (curptr >= endptr)
return 1;
*curptr++ = ptr;
return 0;
}
int tally(void)
{
int **pp;
int count = 0;
for (pp = valptr; pp < curptr; pp++)
count += **pp;
return count;
}
int tail_function(int x)
{
return record(&x) ? tally() : tail_function(x + 1);
}
int main(void)
{
printf("tail_function(0) = %d\n", tail_function(0));
return 0;
}
As the tail_function recurses, which it does via a tail call, the record function records the addresses of different instances of the local variable x. When it runs out of room, it returns 1, and that triggers tail_function to call tally and return. tally sweeps through the recorded memory locations and adds their values.
If the tail call in tail_function were optimized, then there would just be one instance of x. Effectively, it would be this:
int tail_function(int x)
{
tail:
if (record(&x))
return tally();
x = x + 1;
goto tail;
}
And so now, record is recording the same location over and over again, causing tally to calculate an incorrect value instead of the expected 21.
The logic of record and tally depends on x actually being instantiated on each activation of the scope, and on outer activations of the scope having a lifetime which endures until the inner ones terminate. That requirement precludes tail_function from recursing in constant space; it must allocate separate x instances.

Efficient way to retrieve the count of how many times a flag was set in the last n seconds

I need to track how many times a flag is enabled in the last n seconds. Below is the example code I could come up with. StateHandler maintains the value of the flag in the active array for the last n (360 here) seconds. In my case the update function is called from outside once every second. So when I need to know how many times the flag was set in the last 360 seconds, I call getEnabledInLast360Seconds. Is it possible to do this more efficiently, e.g. without using an array of n booleans?
#include <map>
#include <iostream>
class StateHandler
{
bool active[360];
int index;
public:
StateHandler() :
index(0),
active()
{
}
void update(bool value)
{
if (index >= 360)
{
index = 0;
}
active[index % 360] = value;
index++;
}
int getEnabledInLast360Seconds()
{
int value = 0;
for (int i = 0; i < 360; i++)
{
if (active[i])
{
value++;
}
}
return value;
}
};
int main()
{
StateHandler handler;
handler.update(true);
handler.update(true);
handler.update(true);
std::cout << handler.getEnabledInLast360Seconds();
}
Yes. Use the fact that numberOfOccurrences(0,360) and numberOfOccurrences(1,361) have 359 common terms. So remember the sum, calculate the common term, and calculate the new sum.
void update(bool value)
{
if (index >= 360)
{
index = 0;
}
// invariant: count reflects t-360...t-1
if (active[index]) count--;
// invariant: count reflects t-359...t-1
active[index] = value;
if (value) count++;
// invariant: count reflects t-359...t
index++;
}
(Note that the if block resetting index removes the need for the modulo operator %, so I removed that.)
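For completeness, here is a sketch of the whole class with the running count member added (same 360-entry window and interface as your code):
class StateHandler
{
    bool active[360];
    int index;
    int count;   // number of true entries currently in the window
public:
    StateHandler() :
        active(),
        index(0),
        count(0)
    {
    }
    void update(bool value)
    {
        if (index >= 360)
        {
            index = 0;
        }
        if (active[index]) count--;   // drop the value falling out of the window
        active[index] = value;
        if (value) count++;           // add the new value
        index++;
    }
    int getEnabledInLast360Seconds() const
    {
        return count;
    }
};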
Another approach would be to use subset sums:
subsum[0] = count(0...19)
subsum[1] = count(20...39)
...
subsum[17] = count(340...359)
Now you only have to add 18 numbers each time, and you can entirely replace a subsum every 20 seconds.
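A sketch of that idea (note that a whole 20-second slice is discarded at once, so the window effectively covers between 340 and 360 of the most recent seconds):
int subsum[18] = {};   // one bucket per 20-second slice
int second = 0;        // 0..359, position within the window

void update(bool value)
{
    int bucket = second / 20;
    if (second % 20 == 0)
        subsum[bucket] = 0;   // entering a new slice: throw away its old contents
    if (value)
        subsum[bucket]++;
    second = (second + 1) % 360;
}

int getEnabledInLast360Seconds()
{
    int total = 0;
    for (int i = 0; i < 18; i++)
        total += subsum[i];
    return total;
}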
Instead of a fixed buffer, you can simply use std::set<timestamp> (or perhaps std::queue). Every time you check, pop off the elements older than 360 seconds and count the remaining ones.
If you check rarely but update often, you might want to add the "popping" to the update itself, to prevent the container from growing too big.
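A sketch of that timestamp variant, using std::deque and assuming the caller can pass in a monotonically increasing time in seconds:
#include <deque>
#include <cstdint>

class FlagTracker
{
    std::deque<uint64_t> times;   // timestamps (in seconds) at which the flag was set
public:
    void update(bool value, uint64_t now)
    {
        if (value)
            times.push_back(now);
    }
    int enabledInLast(uint64_t now, uint64_t windowSeconds = 360)
    {
        // drop everything that is older than the window
        while (!times.empty() && times.front() + windowSeconds <= now)
            times.pop_front();
        return static_cast<int>(times.size());
    }
};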

C++ method sometimes returns unexpectedly high values

I've traced a bug down to a function which should be returning float values between 20 and 100 or so, but is sometimes (1 time in 10) returning values much, much higher than that. The problem occurs when I have an expression in the last line of the method, like this:
return snap(baseNumber, targets) + (octave * NOTES_PER_OCTAVE);
If I store the return value in a variable first, then return that variable, the problem goes away:
float ret = snap(baseNumber, targets) + (octave * NOTES_PER_OCTAVE);
return ret;
Here's the complete method:
static inline float octaveSnap(float number, std::vector<float>* targets){
static const int NOTES_PER_OCTAVE = 12;
int octave = number / NOTES_PER_OCTAVE;
float baseNumber = number - (octave * NOTES_PER_OCTAVE);
float ret = snap(baseNumber, targets) + (octave * NOTES_PER_OCTAVE);
return ret;
}
and here's 'snap':
// given a single value and a list of values (a scale), return the member of the list which is closest to the single value
static inline float snap(float number, std::vector<float>* targets){
float ret;
float leastDistance = -1;
for(int i = 0; i<targets->size(); i++){
float distance = targets->at(i) - number;
if(distance < 0){
distance = -distance;
}
if(leastDistance == -1){
leastDistance = distance;
}
if(distance < leastDistance){
leastDistance = distance;
ret = targets->at(i);
}
}
return ret;
}
I'm completely baffled by this. Any idea why the first explodes and the second works perfectly?
My psychic debugging powers tell me that when you use the temp variable the problem only appears to go away, and that either you're accidentally doing targets[<foo>] inside snap, or you use it correctly but rarely run off the end, returning garbage.
EDIT for comment:
I should elaborate a bit: targets is a pointer to a vector, so using [] on it will select one of several vectors, NOT elements from the vector. That said, I can't understand how you could call .at on such a pointer, so I suspect the code in your program is not the code you showed us.
In snap() the local variable ret is never initialized, so if the input vector is zero-sized, or the "found" element is the first one, your return value is unspecified.
Try modifying snap to be:
static inline float snap(float number, std::vector<float>* targets){
float ret = 0;
float leastDistance = -1;
for(int i = 0; i<targets->size(); i++){
float distance = targets->at(i) - number;
if(distance < 0){
distance = -distance;
}
if(leastDistance == -1){
leastDistance = distance;
ret = targets->at(i);
}
else if(distance < leastDistance){
leastDistance = distance;
ret = targets->at(i);
}
}
return ret;
}
and see if that fixes things.
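For reference, a slightly more defensive variant (just a sketch) that drops the -1 sentinel and handles an empty or null vector explicitly:
#include <cmath>
#include <cstddef>
#include <vector>

static inline float snap(float number, const std::vector<float>* targets)
{
    if (targets == nullptr || targets->empty())
        return number;   // nothing to snap to; choose whatever policy fits your code

    float ret = targets->at(0);
    float leastDistance = std::fabs(targets->at(0) - number);
    for (std::size_t i = 1; i < targets->size(); ++i) {
        float distance = std::fabs(targets->at(i) - number);
        if (distance < leastDistance) {
            leastDistance = distance;
            ret = targets->at(i);
        }
    }
    return ret;
}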
Edit: I realized this doesn't address why adding a temporary variable appears to fix things in the original question. The uninitialized ret will probably take on whatever value is left on the stack: this, of course, is unspecified and system/platform dependent. When a new local variable is added to store the result of snap(), however, this shifts the stack such that ret has a different position and, most likely, a different uninitialized value. The returned result is still "wrong", but it may simply appear "less wrong" due to whatever uninitialized value ret happens to have.