Why is 3 < -1 in code? - c++

Take a look at the following code:
int start = 3;
vector<int> data;
data.push_back(0);
data.push_back(0);
for (int i=start; i<data.size()-start; i++)
printf("In...\n");
When running the above code, it keeps executing printf("In...\n") over and over, although based on the apparent condition (3 < -1) of the for loop it should never do this. Weird, huh?
To avoid this, you have to compute the condition's right-hand side into a signed variable first, like:
int end = data.size()-start;
for (int i=start; i<end; i++)
printf("In...\n");
Why does this happen?

size() returns an unsigned value (of type size_t), so in data.size() - start the signed start is converted to unsigned; the right-hand side of the comparison is therefore unsigned, which in turn makes the comparison itself unsigned.
So there are no negative numbers where you think there are, just very large positive ones.
As other people have said, most compilers will warn you about this if you turn up the warning level, and C++ is not a language that can safely be used at a low warning level.
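Here is a minimal sketch of that conversion in isolation (the first printed value assumes a 64-bit size_t; on a 32-bit platform it would be 4294967295):
#include <cstdio>
#include <vector>

int main()
{
    std::vector<int> data(2);                  // data.size() == 2, like the two push_backs above
    int start = 3;
    // The subtraction happens in unsigned arithmetic, so 2 - 3 wraps around.
    std::printf("%zu\n", data.size() - start); // 18446744073709551615, not -1
    // Doing the subtraction in a signed type first gives the expected -1.
    long long end = static_cast<long long>(data.size()) - start;
    std::printf("%lld\n", end);                // -1
    return 0;
}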

Related

Using `size_t` for lengths impacts on compiler optimizations?

While reading this question, I've seen the first comment saying that:
size_t for length is not a great idea, the proper types are signed ones for optimization/UB reasons.
followed by another comment supporting the reasoning. Is it true?
The question is important, because if I were to write e.g. a matrix library, the image dimensions could be size_t, just to avoid checking if they are negative. But then all loops would naturally use size_t. Could this impact optimization?
size_t being unsigned is mostly a historical accident - if your world is 16 bit, going from a 32767 to a 65535 maximum object size is a big win; in current-day mainstream computing (where 64 and 32 bit are the norm) the fact that size_t is unsigned is mostly a nuisance.
Although unsigned types have less undefined behavior (as wraparound is guaranteed), the fact that they have mostly "bitfield" semantics is often a cause of bugs and other bad surprises; in particular:
the difference between unsigned values is unsigned as well, with the usual wraparound semantics, so if you may expect a negative result you have to cast beforehand;
unsigned a = 10, b = 20;
// prints 4294967286 (i.e. UINT_MAX - 9) if unsigned is 32 bit
std::cout << a-b << "\n";
more generally, in mixed signed/unsigned comparisons and arithmetic operations unsigned wins (the signed value is implicitly converted to unsigned), which, again, leads to surprises;
unsigned a = 10;
int b = -2;
if(a < b) std::cout<<"a < b\n"; // prints "a < b"
in common situations (e.g. iterating backwards) the unsigned semantics are often problematic, as you'd like the index to go negative for the boundary condition
// This works fine if T is signed, loops forever if T is unsigned
for(T idx = c.size() - 1; idx >= 0; idx--) {
// ...
}
Also, the fact that an unsigned value cannot assume a negative value is mostly a strawman; you may avoid checking for negative values, but due to implicit signed-unsigned conversions it won't stop any error - you are just shifting the blame. If the user passes a negative value to your library function taking a size_t, it will just become a very big number, which will be just as wrong if not worse.
int sum_arr(int *arr, unsigned len) {
    int ret = 0;
    for(unsigned i = 0; i < len; ++i) {
        ret += arr[i];
    }
    return ret;
}
// compiles successfully and overflows the array; if len was signed,
// it would just return 0
sum_arr(some_array, -10);
For the optimization part: the advantages of signed types in this regard are overrated; yes, the compiler can assume that overflow will never happen, so it can be extra smart in some situations, but generally this won't be game-changing (as wraparound semantics generally come "for free" on current-day architectures); most importantly, as usual, if your profiler finds that a particular zone is a bottleneck you can modify just that zone to make it go faster (including switching types locally to make the compiler generate better code, if you find it advantageous).
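As a tiny illustration (not from the original answer) of the extra assumption the optimizer gets from signed types:
// Signed overflow is undefined, so the compiler may fold this to "return true".
bool always_true(int x)      { return x + 1 > x; }
// Unsigned wraparound is defined, so this must remain a real comparison:
// it is false when x == UINT_MAX.
bool not_always(unsigned x)  { return x + 1 > x; }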
Long story short: I'd go for signed, not for performance reasons, but because the semantics is generally way less surprising/hostile in most common scenarios.
That comment is simply wrong. When working with native pointer-sized operands on any reasonable architecture, there is no difference at the machine level between signed and unsigned offsets, and thus no room for them to have different performance properties.
As you've noted, use of size_t has some nice properties like not having to account for the possibility that a value might be negative (although accounting for it might be as simple as forbidding that in your interface contract). It also ensures that you can handle any size that a caller is requesting using the standard type for sizes/counts, without truncation or bounds checks. On the other hand, it precludes using the same type for index-offsets when the offset might need to be negative, and in some ways makes it difficult to perform certain types of comparisons (you have to write them arranged algebraically so that neither side is negative), but the same issue comes up when using signed types, in that you have to do algebraic rearrangements to ensure that no subexpression can overflow.
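For example, here is the sort of algebraic rearrangement meant above (a minimal sketch; the function name is illustrative, not from the original post):
#include <cstddef>
#include <vector>

bool has_next(const std::vector<int>& v, std::size_t i)
{
    // return i < v.size() - 1;   // wrong: when v is empty, v.size() - 1 wraps to SIZE_MAX
    return i + 1 < v.size();      // rearranged so that no subtraction can wrap
}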
Ultimately you should initially always use the type that makes sense semantically to you, rather than trying to choose a type for performance properties. Only if there's a serious measured performance problem that looks like it might be improved by tradeoffs involving choice of types should you consider changing them.
I stand by my comment.
There is a simple way to check this: checking what the compiler generates.
void test1(double* data, size_t size)
{
    for(size_t i = 0; i < size; i += 4)
    {
        data[i] = 0;
        data[i+1] = 1;
        data[i+2] = 2;
        data[i+3] = 3;
    }
}
void test2(double* data, int size)
{
    for(int i = 0; i < size; i += 4)
    {
        data[i] = 0;
        data[i+1] = 1;
        data[i+2] = 2;
        data[i+3] = 3;
    }
}
So what does the compiler generate? I would expect loop unrolling, SIMD... for something that simple:
Let's check godbolt.
Well, the signed version gets unrolling and SIMD; the unsigned one does not. The difference is that with int the compiler may assume i += 4 never overflows (signed overflow is undefined behaviour), so it can compute the trip count; with size_t the increment may legitimately wrap around, and the compiler has to allow for that.
I'm not going to show any benchmark, because in this example, the bottleneck is going to be on memory access, not on CPU computation. But you get the idea.
Second example, just keep the first assignment:
void test1(double* data, size_t size)
{
    for(size_t i = 0; i < size; i += 4)
    {
        data[i] = 0;
    }
}
void test2(double* data, int size)
{
    for(int i = 0; i < size; i += 4)
    {
        data[i] = 0;
    }
}
And with gcc:
OK, not as impressive as for clang, but it still generates different code.

Solving Equality Equation in c++

I want to calculate the maximum value (int) of i for which (i*(i+1)*(2*i+1))/3 < 4,294,967,295 (the unsigned int limit).
#include <cstdio>
#include <iostream>
#include <limits>
using std::cout;

int main()
{
    unsigned int i = 1;
    unsigned int l = std::numeric_limits<unsigned int>::max();
    while(l > ((i*(i+1)*(2*i+1))/3))
    {
        i++;
    }
    cout << (i-1); getchar(); return 0;
}
Your problem is caused by comparing the unsigned int l to an expression cast to int, which gives unexpected results. In the second case the inner expression is evaluated entirely as unsigned int and cast to int after the evaluation (with a possible loss of the positive value when it no longer fits). In your first case the numerator of the division is cast to int before the division is applied.
You would do better to write your condition like this, or even better omit the cast entirely (there isn't a single float or double operation in your expression; you're dealing solely with unsigned int):
while(l>(unsigned int)(i*(i+1)*(2*i+1))/3) { // ...
// ^^^^^^^^
If you do so, you'll still see your loop running endlessly or for a very long time. IMHO it makes no sense to check whether the result of the condition expression might be bigger than std::numeric_limits<unsigned int>::max(); it cannot be bigger.
This code will not give you the correct answer. The calculation can be rewritten as i*(i+1)*(2*i+1) < 3 * 4,294,967,295; now consider what that means for the left-hand side: it overflows unsigned int (wrapping around) well before the inequality becomes false.
The inequality appearing in the while loop is of order 3. A cubic has a very steep slope, meaning a small change in i produces a huge change in the value. The while loop soon runs into overflow of the unsigned int product, and thus never ends (yes, never ends; I tried).
The solution is simple: take the logarithm of both sides of the inequality. The third-order polynomial then becomes linear in the logarithm, and nothing overflows. Eventually it worked.
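A minimal sketch of that idea (the bound and the expression come from the question; the printed 1860 assumes ordinary IEEE doubles, which have plenty of precision for this range):
#include <cmath>
#include <cstdio>

int main()
{
    const double limit = 4294967295.0;   // std::numeric_limits<unsigned int>::max()
    unsigned int i = 1;
    // Compare log(i) + log(i+1) + log(2i+1) - log(3) with log(limit)
    // instead of the raw product, so nothing can overflow.
    while (std::log(1.0 * i) + std::log(i + 1.0) + std::log(2.0 * i + 1.0)
               - std::log(3.0) < std::log(limit))
    {
        i++;
    }
    std::printf("%u\n", i - 1);   // should print 1860
    return 0;
}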

C/C++ use of int or unsigned int

In a lot of code examples, source code, libraries etc. I see the use of int when as far as I can see, an unsigned int would make much more sense.
One place I see this a lot is in for loops. See below example:
for(int i = 0; i < length; i++)
{
// Do Stuff
}
Why on earth would you use an int rather than an unsigned int? Is it just laziness - people can't be bothered with typing unsigned?
Using unsigned can introduce programming errors that are hard to spot, and it's usually better to use signed int just to avoid them. One example would be when you decide to iterate backwards rather than forwards and write this:
for (unsigned i = 5; i >= 0; i--) {   // i >= 0 is always true for an unsigned value, so this never terminates
    printf("%u\n", i);
}
Another would be if you do some math inside the loop:
for (unsigned i = 0; i < 10; i++) {
    for (unsigned j = 0; j < 10; j++) {
        // when j > i, i - j wraps around to a huge positive value, so this test is unexpectedly true
        if (i - j >= 4) printf("%u %u\n", i, j);
    }
}
Using unsigned introduces the potential for these sorts of bugs, and there's not really any upside.
It's generally laziness or lack of understanding.
I always use unsigned int when the value should not be negative. That also serves the documentation purpose of specifying what the correct values should be.
IMHO, the assertion that it is safer to use "int" than "unsigned int" is simply wrong and a bad programming practice.
If you have used Ada or Pascal you'd be accustomed to using the even safer practice of specifying specific ranges for values (e.g., an integer that can only be 1, 2, 3, 4, 5).
If length is also int, then you should use the same integer type, otherwise weird things happen when you mix signed and unsigned types in a comparison statement. Most compilers will give you a warning.
You could go on to ask, why should length be signed? Well, that's probably historical.
Also, if you decide to reverse the loop, ie
for(int i=length-1;i>=0 ;i--)
{
// do stuff
}
the logic breaks if you use unsigned ints.
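(For completeness, a minimal sketch of the usual workaround if you do want an unsigned index: test and decrement in one step, so the index never has to go below zero.)
#include <cstddef>
#include <cstdio>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3};
    for (std::size_t i = v.size(); i-- > 0; )
        std::printf("%zu\n", i);   // prints 2, 1, 0
    return 0;
}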
I choose to be as explicit as possible while programming. That is, if I intend a variable's value to always be positive, then unsigned is used. Many here mention "hard to spot bugs" but few give examples. Consider the following example advocating unsigned, unlike most posts here:
enum num_things {
    THINGA = 0,
    THINGB,
    THINGC,
    NUM_THINGS
};

int unsafe_function(int thing_ID){
    if(thing_ID >= NUM_THINGS)
        return -1;
    ...
}

int safe_function(unsigned int thing_ID){
    if(thing_ID >= NUM_THINGS)
        return -1;
    ...
}

int other_safe_function(int thing_ID){
    if((thing_ID < 0) || (thing_ID >= NUM_THINGS))
        return -1;
    ...
}

/* Error not caught */
unsafe_function(-1);

/* Error is caught */
safe_function((unsigned int)-1);
In the above example, what happens if a negative value is passed in as thing_ID? In the first case, you'll find that the negative value is not greater than or equal to NUM_THINGS, and so the function will continue executing.
In the second case, you'll actually catch this at run time, because thing_ID is unsigned, which forces the conditional to perform an unsigned comparison.
Of course, you could do something like other_safe_function, but this seems more of a kludge to use signed integers rather than being more explicit and using unsigned to begin with.
I think the most important reason is that if you choose unsigned int, you can get some logic errors. In fact, you often do not need the extra range of unsigned int; using int is safer.
This tiny question is use-case related. If you access vector elements by index, there are more modern ways to do it in C++, e.g. for(const auto &v : vec) {} or iterators. In some calculations, if there is no subtraction and no way to reach a negative number, you can and should use unsigned (it describes the expected range of values better). Sometimes, as many of the examples posted here show, you actually need int. The truth is it's all about the use case and situation; no single strict rule applies to all use cases, and it would be rather silly to force one over the other...

C++ for loop structure

Hope it's not a lame question, but I have to ask this :)
When I program in C++ and use for loops, I write them like this:
for(int i = 0; i< something; i++)
Which is the correct way forward, but this gives me compile warnings such as this:
1>c:\main.cpp(185): warning C4018: '<' : signed/unsigned mismatch
Now, going through books and reading online, most for-loop examples are of this structure.
I always ignored the warnings, as my programs worked and did what they were supposed to do, until I got interested in these warnings and did a little research by copying the warning and googling it, to find that it is better to use this structure to avoid the warning:
for(vector<int>::size_type i= 0; i < something; i++ )
Now my question here is: why, if the initial structure works and is described and documented in many books and online resources?
Also, what is the benefit, and is there any significant difference between the techniques?
Why would I use this
for(vector<int>::size_type i= 0; i < something; i++ )
apart from getting rid of the warnings?
Don't ignore the warnings. They're trying to tell you something.
I suspect something is unsigned.
If you have
unsigned int something = 0;
something--; // Now something is a really large positive integer, not -1
If you ignore the warnings, and you don't have your compiler set to treat warnings as errors, then this will compile fine, but you won't get what you expect.
You're probably seeing the warning go away because vector<int>::size_type is an unsigned type.
You simply have a signed / unsigned mismatch between the type of i and the type of something in your statement:
for(int i = 0; i < something; i++)
So this has nothing to do with the for structure, but rather with the comparison.
bool b = i < something;
would give you the same warnings.
This can be the case if you use int i and compare it to a size_t variable somehow (which is what std::vector::size() gives you).
So, to fix it, simply change your for loop to using the same type for i and for something, such as:
for(size_t i = 0; i < something; i++)
if something is of type size_t.
Why would I use this
Because signed int and unsigned values like size_t have differing ranges, and you may not get your expected result if one contains a value that can not be represented by the other.
That said, if you think that code is too verbose, you don't have to use it.
Code like this:
for(vector<int>::size_type i= 0; i < myvector.size(); i++ )
{
int val = myvector[i];
Can also be written like this.
for ( int val : myvector )
Broadly, there are two kinds of integral types in C++: signed and unsigned. For each size of integer, there is a signed and an unsigned version. The difference is in their range: signed integers of n bits have a range from −2^(n−1) to +2^(n−1) − 1; unsigned integers, from 0 to 2^n − 1.
When comparing signed integer types to unsigned, the signed value is converted to unsigned; negative values will wrap and be treated as large positive values. The upshot of this is that comparisons with < might not do what you expect, so many compilers will warn about such comparisons.
For example, 1u < -1 is true. u is a suffix that tells the compiler to treat the 1 as an unsigned int value.
With that, the meaning becomes clear: int is a signed type and vector<T>::size_type is an unsigned type. Since the result of vector<T>::size() is vector<T>::size_type, you want to use that or another unsigned type such as size_t to ensure that your comparisons have the behaviour you want.
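A minimal demonstration of that conversion (assuming a 32-bit unsigned int):
#include <iostream>

int main()
{
    std::cout << std::boolalpha
              << (1u < -1) << '\n'                // true: -1 is converted to unsigned first
              << static_cast<unsigned int>(-1)    // 4294967295
              << '\n';
}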
Instead of using indices, you can also use iterators, which don’t have such conversion problems:
for (vector<int>::iterator i = v.begin(); i != v.end(); ++i)
cout << *i << '\n';
Which can be made more succinct with auto in C++11:
for (auto i = v.begin(); i != v.end(); ++i)
cout << *i << '\n';
If you’re just iterating over the whole container, use C++11 range-based for:
for (int i : v)
cout << i << '\n';
And if you want to modify the values, use a reference:
for (int& i : v)
++i;
something must be int, otherwise you get the warning. Or i must be unsigned int, depending on your needs.
Assuming 32-bit integers: if the type is signed, any value above 0x7FFFFFFF (2,147,483,647 decimal) will be interpreted as negative, whereas it will be positive for an unsigned int.
So the compiler is issuing a warning telling you that the comparison may result in an unexpected outcome.
32-bit signed integers range from −2,147,483,648 to 2,147,483,647.
32-bit unsigned integers range from 0 to 4,294,967,295.

Finding the Sum of 2D vector

Having some trouble finding the sum of a 2D vector. Does this look ok?
int sumOfElements(vector<iniMatrix> &theBlocks)
{
    int theSum = 0;
    for(unsigned i = 0; (i < theBlocks.size()); i++)
    {
        for(unsigned j = 0; (j < theBlocks[i].size()); j++)
        {
            theSum += theBlocks[i][j];
        }
    }
    return theSum;
}
It returns a negative number; however, it should return a positive number.
Hope someone can help :)
The code looks proper in an abstract sense, but you may be overflowing theSum. You can try making theSum type double to see what value you get to help sort out the proper integral type to use for it.
double sumOfElements(vector<iniMatrix> &theBlocks)
{
    double theSum = 0;
    /* ... */
    return theSum;
}
When you observe the returned value, you can see if it would fit in an int or if you need to use a wider long or long long type.
If all the values in the matrix are positive, you should consider using one of the unsigned integral types, which would double your range of allowed values.
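A minimal sketch of that suggestion, assuming iniMatrix is a row type such as vector<int> (the question doesn't show its definition), with a 64-bit accumulator so the sum itself doesn't overflow:
#include <vector>

using iniMatrix = std::vector<int>;   // assumption: the question's row type

long long sumOfElements(const std::vector<iniMatrix>& theBlocks)
{
    long long theSum = 0;             // 64-bit accumulator instead of int
    for (const iniMatrix& row : theBlocks)
        for (int value : row)
            theSum += value;
    return theSum;
}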
The problem is obviously that the int exceeds its bounds (as others have said).
For signed data types the value becomes negative on overflow, and for unsigned data types it wraps around and starts from zero again.
If you want to detect overflow programmatically, you can use these lines instead of the addition line.
if( theSum > int(theSum + theBlocks[i][j]) )
    //print error message, throw exception, break, ...
    break;
else
    theSum += theBlocks[i][j];
For a more generic solution that works with more data types and more operations than addition, check this: How to detect integer overflow?
A solution would be to use unsigned long long, and if that exceeds its bounds too, you need a third-party big-integer library.
As Mokhtar Ashour says, it may be that the variable theSum overflows. Try making it either unsigned (if no numbers are negative) or change the type from int (typically 32 bits) to long long (at least 64 bits).
I think it may be an int overflow problem. To make sure, you may insert a condition after the inner loop finishes to see whether your result exceeds the int range.
if(result > std::numeric_limits<int>::max())   // compare against the int range, not sizeof(int); result must be a wider type
    cout << "hitting boundaries";
A better way to test whether you exceed the int boundaries is to print the result after the inner loop ends and inspect it. If it does exceed them, just use a bigger data type.