C++ for-loop condition - c++

I want to know why this loop runs even when result.badmatches.size() = 0
for (int i = 1; i <= result.badmatches.size() - 1; i++)
{
...
}
Also, is there any other way I could stop it from running when badmatches size is 0 without using an if condition?

This depends on the type size() returns. It is probably a standard container, so the type is unsigned, and unsigned types wrap around instead of going negative. That means the result of subtracting one from zero is the maximum value of that type.
Either use a comparison that doesn't require you to subtract from the size (<, !=) or just use iterators or a for-auto loop. Under any circumstance you should at least use the same type for iterating as the nested size_type of the container and not int.
for(auto& x : result.badmatches) {
// ...
}
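The wrap-around described above can be checked directly. This is only a minimal sketch: it assumes a std::vector<int> stands in for result.badmatches (the question does not show the real element type), and the names Matches, original_bound and visited_by_range_for are made up for illustration.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for result.badmatches; the real element type is unknown.
using Matches = std::vector<int>;

// Returns the value the original loop bound evaluates to.
std::size_t original_bound(const Matches& m) {
    return m.size() - 1;  // unsigned arithmetic: 0 - 1 wraps to the maximum value
}

// Counts the elements visited by a range-based for loop.
std::size_t visited_by_range_for(const Matches& m) {
    std::size_t count = 0;
    for (const auto& x : m) {
        (void)x;  // the element itself is not needed here
        ++count;
    }
    return count;
}
```

For an empty container, original_bound returns std::numeric_limits<std::size_t>::max(), while the range-based loop simply visits nothing.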

Use while (result.badmatches.size()) so that the body does NOT execute when the container is empty (the body must then shrink the container or break, otherwise the loop never ends).
With an empty container, result.badmatches.size() - 1 would be -1 mathematically. Because size() returns an unsigned integer, that -1 is interpreted as 0xFFFFFFFF (for a 32-bit type), so the loop would run for about 2^32 or 2^64 iterations. To avoid this, use while() as above if result.badmatches.size() can return 0.

size() must be returning an unsigned type, so 0 - 1 is evaluated as unsigned, and the left operand i is converted to unsigned for the comparison as well.
So for a 4-byte int, -1 is represented as 2^32 - 1 when converted to unsigned int.
If you don't want this behavior, cast the size to a signed type before subtracting: static_cast<int>(result.badmatches.size())
PS: I've not touched C++ for past 4 years pl. excuse little mistakes.
The right way is:
for (int i=0;i< result.badmatches.size() ;++i)
{
}

If you specifically don't want this loop to be entered when the size of the collection is zero, then you could check for !badmatches.empty(), assuming that badmatches is an STL container. However, if you structure your code slightly differently, you'll probably overcome this issue without having to do that:
for (size_t i=0; i < result.badmatches.size(); i++)
{
}
I've changed the int to size_t, which is the same type that size() returns (an unsigned integer), and changed the initial value to 0 and the comparison so that the loop exits if i >= result.badmatches.size(). Generally, I'd say that this is the clearest way of presenting an indexed approach, as it matches the natural indexing of collections, and if you need 1, 2, 3 ... rather than 0, 1, 2 in your loop, then you can address that within it.
If you're still having problems, two questions:
Is there anything in your loop that might alter the value of result.badmatches.size()?
Is your code multithreaded with a possibility that result.badmatches.size() could change by actions on another thread?

After understanding the problem explained by @Prototype Stark and @Aga, I came to a simpler solution that lets me keep my initial index at 1.
for(int i=1;i+1<=result.badmatches.size();i++)
Thanks for all the help , it's much clearer now .
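To see that the rearranged condition behaves for every size, here is a minimal sketch. It assumes a std::vector<int> in place of result.badmatches and uses std::size_t for the index instead of int; the function name iterations_from_one is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Counts iterations of a loop that starts at index 1, as in the question.
// The bound is written as i + 1 <= size, so nothing is subtracted from size.
std::size_t iterations_from_one(const std::vector<int>& badmatches) {
    std::size_t count = 0;
    for (std::size_t i = 1; i + 1 <= badmatches.size(); ++i) {
        ++count;
    }
    return count;
}
```

With an empty or single-element container the body never runs; with three elements it runs for i = 1 and i = 2, which is what the original condition intended.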

Related

For Loop Exit Condition (size_t vs. int) [duplicate]

When I put the following in my program:
for (size_t i = VectorOfStructs.size()-1; i > 0; i--)
It works, but "i" will never equal 0.
So, I cannot access the first element (VectorOfStructs[0]).
If I change it to:
for (size_t i = VectorOfStructs.size()-1; i > -1; i--)
The program doesn't even enter the for loop! But, if I change it to the following:
for (int i = VectorOfStructs.size()-1; i > -1; i--)
It works exactly as I want it to (Iterates through all the elements).
So, my questions are:
(A) Why does the 2nd code snippet fail to execute?
(B) Why does the 3rd code snippet execute accordingly while the 2nd doesn't?
Any insight would be greatly appreciated!
All loops go forward, even the ones that go backwards.
What you want is either this:
for (std::size_t i = 0, e = VectorOfStructs.size(); i != e; ++i)
{
std::size_t const ri = e - i - 1;
// use "VectorOfStructs[ri]"
}
Or better:
for (auto rit = VectorOfStructs.rbegin(); rit != VectorOfStructs.rend(); ++rit)
{
// use "*rit"
}
(Your second snippet fails because i is unsigned, so -1 is converted to the same type as i and becomes the maximal representable value, so the comparison is always false. By contrast, i is signed in the third snippet.)
The second example uses size_t as type for i, which is an unsigned type, thus it can never have negative values; this also means that it cannot be properly compared with -1
But (int)-1 is bit-represented as 0xFFFFFFFF, which represents a rather large number (2^32-1) for a 32-bit size_t. i > 0xFFFFFFFF can never be true, since 0xFFFFFFFF is the largest value such a size_t can ever hold.
The 3rd example uses signed int (which allows for negative numbers and therefore the test succeeds).
This one should work:
for (size_t i = VectorOfStructs.size(); i-- > 0;) {
use(VectorOfStructs[i]);
}
In the second one you are comparing the variable 'i' with -1. Here 'i' is of type size_t, and a size cannot be negative, so the comparison fails.
In the third one, 'i' is an int, and int has a range from -32768 to +32767 (on a system where int is 2 bytes).
Overall, a size_t variable cannot hold negative values, because it describes the size of something that physically exists in memory.
Why does the 2nd code snippet fail to execute?
size_t is unsigned, so it is by definition never negative. In i > -1, the -1 "wraps around" to the maximum value of size_t, so your loop condition is always false.
size_t is an unsigned type so -1 is the maximum value size_t can take. In the second snippet size_t can't be greater than this maximum value so the loop isn't entered.
On the other hand, int is a signed type so the comparison to -1 is as you expect.
Int and size_t are both integer types but int can hold negatives as well as positives.
For a 32-bit int and a 32-bit size_t, int ranges from -2^31 to 2^31 - 1, while size_t ranges from 0 to 2^32 - 1.
Now, when you write something like int a = -1 it is indeed -1, but when you do so with size_t you get the maximum value, 2^32 - 1.
So in the 2nd snippet no size_t value will ever exceed -1, as it is really 2^32 - 1.
In the 3rd snippet the type compared is int and when int is compared to -1 it sees it as -1 so it executes the way you planned
When the compiler sees i > -1 and notices that the subexpressions i and -1 have different types, it converts them both to a common type. If the two types (std::size_t and int) have the same number of bits, which appears to be the case for your compiler, the common type is the unsigned one (std::size_t). So the expression turns out to be equivalent to i > (std::size_t)-1. But of course (std::size_t)-1 is the maximum possible value of a size_t, so the comparison is always false.
Most compilers have a warning about a comparison that is always true or always false for reasons like this.
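The conversion can be made explicit to see what the comparison really tests. This is a small sketch with hypothetical function names; the explicit static_cast does by hand what the compiler does implicitly for i > -1 when i is a std::size_t.

```cpp
#include <cstddef>
#include <limits>

// What is effectively evaluated for "i > -1" when i is a std::size_t:
// the -1 is converted to std::size_t before the comparison.
bool unsigned_greater_than_minus_one(std::size_t i) {
    const std::size_t converted = static_cast<std::size_t>(-1);  // maximum value
    return i > converted;  // no value can exceed the maximum
}

// The same test with a signed index behaves as intended.
bool signed_greater_than_minus_one(int i) {
    return i > -1;
}
```

The unsigned version returns false for every argument, which is why the second snippet never enters its loop.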
Whenever you compare 'signed' and 'unsigned', the 'signed' values are converted to 'unsigned' first. That covers (#1) and (#2), which have problems with 'unsigned(0-1)' and 'some unsigned' > 'unsigned max'.
However, by making it work through forcing a 'signed'/'signed' compare (#3), you lose 1/2 of the 'unsigned' range.
You may do:
for(size_t n = vector.size(); n; /* no -- here */ ) {
--n;
// vector[n];
}
Note: unsigned(-1) is the biggest unsigned integer value, because conversion to an unsigned type is defined modulo 2^N.
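The reverse loops suggested in these answers can be tried on a small vector. This is a minimal sketch using the i-- > 0 form; the function reversed_copy and the use of std::vector<int> are assumptions made for illustration, not part of the original question.

```cpp
#include <cstddef>
#include <vector>

// Walks the vector from the last element down to index 0 with an unsigned index.
// The test "i-- > 0" checks the old value of i and then decrements it,
// so the body sees size-1, ..., 1, 0 and the loop is skipped for an empty vector.
std::vector<int> reversed_copy(const std::vector<int>& v) {
    std::vector<int> out;
    out.reserve(v.size());
    for (std::size_t i = v.size(); i-- > 0;) {
        out.push_back(v[i]);
    }
    return out;
}
```

Unlike the int version from the question, this keeps the full range of std::size_t available for very large containers.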

C++ string.length() Strange Behavior

I just came across an extremely strange problem. The function I have is simply:
int strStr(string haystack, string needle) {
for(int i=0; i<=(haystack.length()-needle.length()); i++){
cout<<"i "<<i<<endl;
}
return 0;
}
Then if I call strStr("", "a"), although haystack.length()-needle.length()=-1, this will not return 0, you can try it yourself...
This is because .length() (and .size()) return size_t, which is an unsigned integer type. You think you get a negative number, when in fact the subtraction wraps around to the maximum value for size_t (on my machine, this is 18446744073709551615). This means your for loop will loop through all the possible values of size_t, instead of just exiting immediately like you expect.
To get the result you want, you can explicitly convert the sizes to ints, rather than unsigned ints (See aslgs answer), although this may fail for strings with sufficient length (Enough to over/under flow a standard int)
Edit:
Two solutions from the comments below:
(Nir Friedman) Instead of using int as in aslg's answer, include the <cstdint> header and use an int64_t, which will avoid the problem mentioned above.
(rici) Turn your for loop into for(int i = 0;needle.length() + i <= haystack.length();i ++){, which avoids the problem altogether by rearranging the inequality so that no subtraction is needed.
(haystack.length()-needle.length())
length() returns a size_t, in other words an unsigned integer type. Given the sizes of your strings, 0 and 1 respectively, the difference wraps around and becomes the maximum possible value of that type (approximately 4.29 billion for 4 bytes of storage, but it could be a different value).
i<=(haystack.length()-needle.length())
The indexer i is converted by the compiler into an unsigned int to match the type. So you're gonna have to wait until i is greater than the max possible value for an unsigned int. It's not going to stop.
Solution:
You have to convert the result of each method to int, like so,
i <= ( (int)haystack.length() - (int)needle.length() )
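As an alternative to casting, the rearranged comparison mentioned in the edit above can be sketched as follows. The function candidate_positions is a hypothetical helper that only counts how often the loop body would run; it uses std::size_t for the index so no signed/unsigned mixing occurs.

```cpp
#include <cstddef>
#include <string>

// Number of start positions at which needle could fit inside haystack.
// Adding on the left instead of subtracting on the right avoids wrap-around.
std::size_t candidate_positions(const std::string& haystack,
                                const std::string& needle) {
    std::size_t count = 0;
    for (std::size_t i = 0; i + needle.length() <= haystack.length(); ++i) {
        ++count;
    }
    return count;
}
```

For the call from the question, candidate_positions("", "a") yields 0, so the loop body is never entered.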

What's the fastest way to extract non-zero indices from a byte array in C++

I have a byte array
unsigned char* array=new unsigned char[4000000];
...
And I would like to get indices of all non-zero elements of the array.
Of course, I can do following
for(int i=0;i<size;i++)
{
if(array[i]!=0) somevector.push_back(i);
}
Is there any faster algorithm than this?
Update 1 I can see majority answer is no. I hoped that there is some magical bit operations I am not aware of. Some guys suggested sorting but no it's not feasible in this case. But thanks a lot for all your answers.
Update 2 After 4 years and 4 months since this question was posted, @wim suggested this answer that looks promising.
Unless your vector is ordered, this is the most efficient algorithm for what you want to do in a single-threaded program. You can try to optimize the data structure where you store your result, but in terms of running time this is the best you can do.
With a byte array that is mostly zero, being a sparse array, you can take advantage of a 32 bit CPU by doing comparisons 4 bytes at a time. The actual comparisons are done 4 bytes at a time however if any of the bytes are non-zero then you have to determine which of the bytes in the unsigned long are non-zero so that will take more effort. If the array is really sparse then the time saved with the comparisons may compensate for the additional work determining which of the bytes are non-zero.
The easiest would be to make the unsigned char array sized to some multiple of 4 bytes so that you do not need to worry about doing the last few bytes after the loop completes.
I would suggest doing a timing study on this as it is purely conjectural and there would be a point where an array becomes un-sparse enough that this would take more time than a simple loop.
One question that I would have is what are you doing with the vector of offsets of non-zero elements of the array and whether you can do away with the vector. Another question is if you need the vector whether you can build the vector as you place elements into the array.
unsigned char* array=new unsigned char[4000000];
......
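// uint32_t (from <cstdint>) is exactly 4 bytes; unsigned long can be 8 bytes on 64-bit systems
uint32_t *pUlaw = (uint32_t *)array;
uint32_t *pEnd = (uint32_t *)(array + 4000000);
for ( ; pUlaw < pEnd; pUlaw++) {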
if (*pUlaw) {
// at least one byte is non-zero
unsigned char *pUlawByte = (unsigned char *)pUlaw;
if (*pUlawByte)
somevector.push_back(pUlawByte - array);
if (*(pUlawByte+1))
somevector.push_back(pUlawByte - array + 1);
if (*(pUlawByte+2))
somevector.push_back(pUlawByte - array + 2);
if (*(pUlawByte+3))
somevector.push_back(pUlawByte - array + 3);
}
}
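A variant of the 4-bytes-at-a-time idea that avoids pointer casts is sketched below. It is only an illustration under stated assumptions: the function name nonzero_indices is hypothetical, std::memcpy is used to read each 4-byte group so alignment and aliasing are not a concern, and a tail loop handles sizes that are not a multiple of 4. Whether it is actually faster than the simple loop would need to be measured.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Collects the indices of all non-zero bytes in data[0, size).
std::vector<std::size_t> nonzero_indices(const unsigned char* data,
                                         std::size_t size) {
    std::vector<std::size_t> out;
    std::size_t i = 0;
    // Test 4 bytes at a time; only look at single bytes when the group is non-zero.
    for (; i + 4 <= size; i += 4) {
        std::uint32_t word;
        std::memcpy(&word, data + i, sizeof word);
        if (word != 0) {
            for (std::size_t k = 0; k < 4; ++k) {
                if (data[i + k] != 0) out.push_back(i + k);
            }
        }
    }
    // Handle the last few bytes when size is not a multiple of 4.
    for (; i < size; ++i) {
        if (data[i] != 0) out.push_back(i);
    }
    return out;
}
```

Returning indices as std::size_t rather than int keeps the result consistent with the container's own size type.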
If the non-zero values are relatively rare, one trick you can use is a sentinel value:
unsigned char old_value = array[size-1];
array[size-1] = 1; // make sure we find a non-zero eventually
int i=0;
for (;;) {
while (array[i]==0) ++i; // tighter loop
if (i==size-1) break;
somevector.push_back(i);
++i;
}
array[size-1] = old_value;
if (old_value!=0) {
somevector.push_back(size-1);
}
This avoids having to check both the index and the value on each iteration.
The only thing you can do to improve the speed is to use concurrency.
This is not really an answer to your question, but I was trying to imagine what problem you are trying to solve.
Sometimes when performing operations on matrices (in mathematical sense), the operations can be improved when you know that the great majority of matrix elements will be zeros (a sparse matrix). You do such an optimization by not using a big array at all, but simply storing pairs {index, value} that indicate a non-zero element.

Why is the parameter of discard of type unsigned long long?

I'm implementing a random number engine myself (no, I'm not inventing one) and want to know what should be done if the parameter is negative. So I checked the code of mersenne_twister_engine and found this:
void discard(unsigned long long _Nskip)
{ // discard _Nskip elements
for (; 0 < _Nskip; --_Nskip)
(*this)();
}
Isn't an unsigned type dangerous in a place like this?
It's only dangerous if the condition is x >= 0. x > 0 (or 0 < x) is safe.
Basically, what you must avoid is subtracting from 0 (more specifically, you need x - y >= 0 to hold mathematically). If the condition also admitted 0, the body would run one more time at x == 0 and the decrement would then wrap around (0 - 1 = max for an unsigned type). As long as the last value that passes the test is 1, subtracting one from it is fine (1 - 1 >= 0).
Edit: Upon reading your question again, I'm not sure if I addressed the actual question (I think you may have edited within the 5 minute window? Or maybe I just failed at reading it.)
Anyway, the reason it's unsigned was alluded to by David Rodriguez: discarding a negative number of elements doesn't make sense. (Also, if you did actually manage to pass a negative value to that [in the form it's in], it would at that point be the bit pattern for a huge positive number, and bad, bad things would happen.)
Some people like to use unsigned for variables that only store non-negative quantities. Other people don't want to use unsigned to convey that meaning.
It's an often and much discussed topic. I'm in the latter camp: I won't use unsigned in such parameters. When I write a size() function for a list-like class for instance, I use int, even though a size will never become smaller than 0.
Putting an assert or test-and-throw to reject negative int values seems appropriate if you want. People from the unsigned camp will say that the compiler should warn on the call-side when you pass a negative value. You can go on with arguments and I'm sure you will find lots of them on the interwebs.
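The test-and-throw approach mentioned above could look roughly like this. It is a sketch under assumptions: discard_checked is a hypothetical free function, not part of the standard library, and it wraps std::mt19937 only as an example engine.

```cpp
#include <random>
#include <stdexcept>

// Hypothetical wrapper that takes a signed count and rejects negative values
// before forwarding to the engine's own discard(), which takes an unsigned count.
void discard_checked(std::mt19937& engine, long long count) {
    if (count < 0) {
        throw std::invalid_argument("discard_checked: count must not be negative");
    }
    engine.discard(static_cast<unsigned long long>(count));
}
```

With this signature a negative argument is reported at the call instead of being silently converted to a huge unsigned value.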

Cast from size_t to int, or iterate with size_t?

Is it better to cast the iterator condition right operand from size_t to int, or iterate potentially past the maximum value of int? Is the answer implementation specific?
int a;
for (size_t i = 0; i < vect.size(); i++)
{
if (some_func((int)i))
{
a = (int)i;
}
}
int a;
for (int i = 0; i < (int)vect.size(); i++)
{
if (some_func(i))
{
a = i;
}
}
I almost always use the first variation, because I find that about 80% of the time, I discover that some_func should probably also take a size_t.
If in fact some_func takes a signed int, you need to be aware of what happens when vect gets bigger than INT_MAX. If the solution isn't obvious in your situation (it usually isn't), you can at least replace some_func((int)i) with some_func(numeric_cast<int>(i)) (see Boost.org for one implementation of numeric_cast). This has the virtue of throwing an exception when vect grows bigger than you've planned on, rather than silently wrapping around to negative values.
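If Boost is not available, a checked conversion in the same spirit can be sketched by hand. The helper checked_to_int below is hypothetical and is not Boost's implementation; it only illustrates throwing instead of silently wrapping.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>

// Converts a std::size_t to int, throwing if the value does not fit.
int checked_to_int(std::size_t value) {
    if (value > static_cast<std::size_t>(INT_MAX)) {
        throw std::overflow_error("checked_to_int: value does not fit in int");
    }
    return static_cast<int>(value);
}
```

In the first loop variant, some_func((int)i) would then become some_func(checked_to_int(i)).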
I'd just leave it as a size_t, since there's not a good reason not to do so. What do you mean by "or iterate potentially up to the maximum value of type_t"? You're only iterating up to the value of vect.size().
For most compilers, it won't make any difference. On 32 bit systems, it's obvious, but even on 64 bit systems, both variables will probably be stored in a 64-bit register and pushed on the stack as a 64-bit value.
If the compiler stores int values as 32 bit values on the stack, the first function should be more efficient in terms of CPU-cycles.
But the difference is negligible (although the second function "looks" cleaner)