Exceeding built-in data type ranges - c++

I have been using C++ for quite some time by now and literally took things for granted.
Recently, I asked myself how can the compiler return accurate values{always} when I use out of range values for calculation.
I understand the 2^n{n = bits} concept.
For example: If I would like to add two int's which are out of range such as:
10e6, I would expect the compiler to return a result that is wrong as the bits are overwritten and ultimately represent a wrong integer. But this is never seen to happen.
Can anyone shed some light over this.
Thanks.

Related

Detecting bad input for `boost::math::tools::brent_find_minima()`

This documentation page of boost::math::tools::brent_find_minima says about its first argument:
The function to minimise: a function object (or C++ lambda) ... with no maxima occurring in that interval.
But what happens if this is not the case? (After all, this condition is rather difficult to pre-ensure, especially since the function is usually expensive to evaluate at many points.) Best would be to detect violations to this condition on the fly.
If this condition is violated, does boost throw an exception, or does it exhibit undefined behavior?
A workaround I am thinking of is to build the checking into the lambda ("function to minimize"), by capturing and maintaining a std::map<double,double> holding all the points that have been evaluated, and comparing each new evaluation with its nearest neighbor in each direction, to check whether there may be a local maximum. But I don't want to do all that if it isn't necessary.
There is no way for this to be done. If you read Corless's A Graduate Introduction to Numerical Methods, you'll read a very interesting point: All numerically defined functions are discontinuous halfway between representables, and have zero derivatives between representables. Basically they can be thought of as a sum of Heaviside functions.
So none of them are differentiable in the mathematical sense. Ok, maybe you think this is a bit unfair-the scale should be zoomed out. But how much? We know that |x-1| isn't differentiable at x=1, but how could a computer tell that? How does it know that there isn't some locally smooth mollifier that makes it differentiable between x=1-eps and x=1+eps? I don't think there's a good answer to this question.
One of the most difficult problems in this class arises in quadrature. Some of these methods work fast when the complex extension of the function has poles far from the real axis. Try to numerically determine that.
Function spaces are impossible to determine numerically. Users just have to get it right.

Which to use? int32_t vs uint32_t [duplicate]

When is it appropriate to use an unsigned variable over a signed one? What about in a for loop?
I hear a lot of opinions about this and I wanted to see if there was anything resembling a consensus.
for (unsigned int i = 0; i < someThing.length(); i++) {
SomeThing var = someThing.at(i);
// You get the idea.
}
I know Java doesn't have unsigned values, and that must have been a concious decision on Sun Microsystems' part.
I was glad to find a good conversation on this subject, as I hadn't really given it much thought before.
In summary, signed is a good general choice - even when you're dead sure all the numbers are positive - if you're going to do arithmetic on the variable (like in a typical for loop case).
unsigned starts to make more sense when:
You're going to do bitwise things like masks, or
You're desperate to to take advantage of the sign bit for that extra positive range .
Personally, I like signed because I don't trust myself to stay consistent and avoid mixing the two types (like the article warns against).
In your example above, when 'i' will always be positive and a higher range would be beneficial, unsigned would be useful. Like if you're using 'declare' statements, such as:
#declare BIT1 (unsigned int 1)
#declare BIT32 (unsigned int reallybignumber)
Especially when these values will never change.
However, if you're doing an accounting program where the people are irresponsible with their money and are constantly in the red, you will most definitely want to use 'signed'.
I do agree with saint though that a good rule of thumb is to use signed, which C actually defaults to, so you're covered.
I would think that if your business case dictates that a negative number is invalid, you would want to have an error shown or thrown.
With that in mind, I only just recently found out about unsigned integers while working on a project processing data in a binary file and storing the data into a database. I was purposely "corrupting" the binary data, and ended up getting negative values instead of an expected error. I found that even though the value converted, the value was not valid for my business case.
My program did not error, and I ended up getting wrong data into the database. It would have been better if I had used uint and had the program fail.
C and C++ compilers will generate a warning when you compare signed and unsigned types; in your example code, you couldn't make your loop variable unsigned and have the compiler generate code without warnings (assuming said warnings were turned on).
Naturally, you're compiling with warnings turned all the way up, right?
And, have you considered compiling with "treat warnings as errors" to take it that one step further?
The downside with using signed numbers is that there's a temptation to overload them so that, for example, the values 0->n are the menu selection, and -1 means nothing's selected - rather than creating a class that has two variables, one to indicate if something is selected and another to store what that selection is. Before you know it, you're testing for negative one all over the place and the compiler is complaining about how you're wanting to compare the menu selection against the number of menu selections you have - but that's dangerous because they're different types. So don't do that.
size_t is often a good choice for this, or size_type if you're using an STL class.

How to store doubles in memory

Recently I changed some code
double d0, d1;
// ... assign things to d0/d1 ...
double result = f(d0, d1)
to
double d[2];
// ... assign things to d[0]/d[1]
double result = f(d[0], d[1]);
I did not change any of the assignments to d, nor the calculations in f, nor anything else apart from the fact that the doubles are now stored in a fixed-length array.
However when compiling in release mode, with optimizations on, result changed.
My question is, why, and what should I know about how I should store doubles? Is one way more efficient, or better, than the other? Are there memory alignment issues? I'm looking for any information that would help me understand what's going on.
EDIT: I will try to get some code demonstrating the problem, however this is quite hard as the process that these numbers go through is huge (a lot of maths, numerical solvers, etc.).
However there is no change when compiled in Debug. I will double check this again to make sure but this is almost certain, i.e. the double values are identical in Debug between version 1 and version 2.
Comparing Debug to Release, results have never ever been the same between the two compilation modes, for various optimization reasons.
You probably have a 'fast math' compiler switch turned on, or are doing something in the "assign things" (which we can't see) which allows the compiler to legally reorder calculations. Even though the sequences are equivalent, it's likely the optimizer is treating them differently, so you end up with slightly different code generation. If it's reordered, you end up with slight differences in the least significant bits. Such is life with floating point.
You can prevent this by not using 'fast math' (if that's turned on), or forcing ordering thru the way you construct the formulas and intermediate values. Even that's hard (impossible?) to guarantee. The question is really "Why is the compiler generating different code for arrays vs numbered variables?", but that's basically an analysis of the code generator.
no these are equivalent - you have something else wrong.
Check the /fp:precise flags (or equivalent) the processor floating point hardware can run in more accuracy or more speed mode - it may have a different default in an optimized build
With regard to floating-point semantics, these are equivalent. However, it is conceivable that the compiler might decide to generate slightly different code sequences for the two, and that could result in differences in the result.
Can you post a complete code example that illustrates the difference? Without that to go on, anything anyone posts as an answer is just speculation.
To your concerns: memory alignment cannot effect the value of a double, and a compiler should be able to generate equivalent code for either example, so you don't need to worry that you're doing something wrong (at least, not in the limited example you posted).
The first way is more efficient, in a very theoretical way. It gives the compiler slightly more leeway in assigning stack slots and registers. In the second example, the compiler has to pick 2 consecutive slots - except of course if the compiler is smart enough to realize that you'd never notice.
It's quite possible that the double[2] causes the array to be allocated as two adjacent stack slots where it wasn't before, and that in turn can cause code reordering to improve memory access efficiency. IEEE754 floating point math doesn't obey the regular math rules, i.e. a+b+c != c+b+a

Magic Numbers In Arrays? - C++

I'm a fairly new programmer, and I apologize if this information is easily available out there, I just haven't been able to find it yet.
Here's my question:
Is is considered magic numbers when you use a literal number to access a specific element of an array?
For example:
arrayOfNumbers[6] // Is six a magic number in this case?
I ask this question because one of my professors is adamant that all literal numbers in a program are magic numbers. It would be nice for me just to access an element of an array using a real number, instead of using a named constant for each element.
Thanks!
That really depends on the context. If you have code like this:
arr[0] = "Long";
arr[1] = "sentence";
arr[2] = "as";
arr[3] = "array.";
...then 0..3 are not considered magic numbers. However, if you have:
int doStuff()
{
return my_global_array[6];
}
...then 6 is definitively a magic number.
It's pretty magic.
I mean, why are you accessing the 6th element? What's are the semantics that should be applied to that number? As it stands all we know is "the 6th (zero-based) number". If we knew the declaration of arrayOfNumbers we would further know its type (e.g. an int or a double).
But if you said:
arrayOfNumbers[kDistanceToSaturn];
...now it has much more meaning to someone reading the code.
In general one iterates over an array, performing some operation on each element, because one doesn't know how long the array is and you can't just access it in a hardcoded manner.
However, sometimes array elements have specific meanings, for example, in graphics programming. Sometimes an array is always the same size because the data demands it (e.g. certain transform matrices). In these cases it may or may not be okay to access the specific element by number: domain experts will know what you're doing, but generalists probably won't. Giving the magic index number a name makes it more obvious to those who have to maintain your code, and helps you to prevent typing the wrong one accidentally.
In my example above I assumed your array holds distances from the sun to a planet. The sun would be the zeroth element, thus arrayOfNumbers[kDistanceToSun] = 0. Then as you increment, each element contains the distance to the next farthest planet: mercury, venus, etc. This is much more readable than just typing the number of the planet you want. In this case the array is of a fixed size because there are a fixed number of planets (well, except the whole Pluto debacle).
The other problem is that "arrayOfNumbers" tells us nothing about the contents of the array. We already know its an array of numbers because we saw the declaration somewhere where you said int arrayOfNumers[12345]; or however you declared it. Instead, something like:
int distanceToPlanetsFromSol[kNumberOfPlanets];
...gives us a much better idea of what the data actually is and what its semantics are. One of your goals as a programmer should be to write code that is self-documenting in this manner.
And then we can argue elsewhere if kNumberOfPlanets should be 8 or 9. :)
You should ask yourself why are you accessing that particular position. In this case, I assume that if you are doing arrayOfNumbers[6] the sixth position has some special meaning. If you think what's that meaning, you probably realize that it's a magic number hiding that.
another way to look at it:
What if after some chance the program needs to access 7th element instead of 6th? HOw would you or a maintainer know that? If for example if the 6th entry is the count of trees in CA it would be a good thing to put
#define CA_STATE_ENTRY 6
Then if now the table is reordered somebody can see that they need to change this to 9 (say). BTW I am not saying this is the best way to maintain an array for tree counts by state - it probably isnt.
Likewise, if later people want to change the program to deal with trees in oregon, then they know to replace
trees[CA_STATE_ENTRY]
with
trees[OR_STATE_ENTRY]
The point is
trees[6]
is not self-documenting
Of course for c++ it should be an enum not a #define
You'd have to provide more context for a meaningful answer. Not all literal numbers are magic, but many are. In a case like that there is no way at all to tell for sure, though most cases I can think of off-hand with an explicit array index >>1 probably qualify as magic.
Not all literals in a program really qualify as "magic numbers" -- but this one certainly seems to. The 6 gives us no clue of why you're accessing that particular element of the array.
To not be a magic number, you need its meaning to be quite clear even on first examination (or at least minimal examination) why that value is being used. Just for example, a lot of code will do things like: &x[0]. In this case, it's typically pretty clear that the '0' really just means "the beginning of the array."
If you need to access a particular element of the array, chances are you're doing it wrong.
You should almost always be iterating over the entire array.
It's only not a magic number if your program is doing something very special involving the number six specifically. Could you provide some context?
That's the problem with professors, they're often too academic. In theory he's right, as usual, but usually magic numbers are used in a stricter context, when the number is embedded in a data stream, allowing you to detect certain properties of the stream (like the signature header of a file type for instance).
See also this Wikipedia entry.
Usually not all constant values in software are called magic numbers.
A java class files always starts with the hex value 0xcafebabe a windows .exe
file with MZ 0x4d, 0x5a , this allows you quickly (but not for sure) to identify
the content of a binary file.
In a MISRA compliant system, all values except 0 and 1 are considered magic numbers. My opinion has always been if the constant value is obvious or likely won't change then leave it as a number. If in doubt create a unique constant since long term maintenance will be easier.

How does this C++ function use memoization?

#include <vector>
std::vector<long int> as;
long int a(size_t n){
if(n==1) return 1;
if(n==2) return -2;
if(as.size()<n+1)
as.resize(n+1);
if(as[n]<=0)
{
as[n]=-4*a(n-1)-4*a(n-2);
}
return mod(as[n], 65535);
}
The above code sample using memoization to calculate a recursive formula based on some input n. I know that this uses memoization, because I have written a purely recursive function that uses the same formula, but this one much, much faster for much larger values of n. I've never used vectors before, but I've done some research and I understand the concept of them. I understand that memoization is supposed to store each calculated value, so that instead of performing the same calculations over again, it can simply retrieve ones that have already been calculated.
My question is: how is this memoization, and how does it work? I can't seem to see in the code at which point it checks to see if a value for n already exists. Also, I don't understand the purpose of the if(as[n]<=0). This formula can yield positive and negative values, so I'm not sure what this check is looking for.
Thank you, I think I'm close to understanding how this works, it's actually a bit more simple than I was thinking it was.
I do not think the values in the sequence can ever be 0, so this should work for me, as I think n has to start at 1.
However, if zero was a viable number in my sequence, what is another way I could solve it? For example, what if five could never appear? Would I just need to fill my vector with fives?
Edit: Wow, I got a lot of other responses while checking code and typing this one. Thanks for the help everyone, I think I understand it now.
if (as[n] <= 0) is the check. If valid values can be negative like you say, then you need a different sentinel to check against. Can valid values ever be zero? If not, then just make the test if (as[n] == 0). This makes your code easier to write, because by default vectors of ints are filled with zeroes.
The code appears to be incorrectly checking is (as[n] <= 0), and recalculates the negative values of the function(which appear to be approximately every other value). This makes the work scale linearly with n instead of 2^n with the recursive solution, so it runs a lot faster.
Still, a better check would be to test if (as[n] == 0), which appears to run 3x faster on my system. Even if the function can return 0, a 0 value just means it will take slightly longer to compute (although if 0 is a frequent return value, you might want to consider a separate vector that flags whether the value has been computed or not instead of using a single vector to store the function's value and whether it has been computed)
If the formula can yield both positive and negative values then this function has a serious bug. The check if(as[n]<=0) is supposed to be checking if it had already cached this value of computation. But if the formula can be negative this function recalculates this cached value alot...
What it really probably wanted was a vector<pair<bool, unsigned> >, where the bool says if the value has been calculated or not.
The code, as posted, only memoizes about 40% of the time (precisely when the remembered value is positive). As Chris Jester-Young pointed out, a correct implementation would instead check if(as[n]==0). Alternatively, one can change the memoization code itself to read as[n]=mod(-4*a(n-1)-4*a(n-2),65535);
(Even the ==0 check would spend effort when the memoized value was 0. Luckily, in your case, this never happens!)
There's a bug in this code. It will continue to recalculate the values of as[n] for as[n] <= 0. It will memoize the values of a that turn out to be positive. It works a lot faster than code without the memoization because there are enough positive values of as[] so that the recursion is terminated quickly. You could improve this by using a value of greater than 65535 as a sentinal. The new values of the vector are initialized to zero when the vector expands.