Idiom for handling size_t underflow in loop condition - c++

In C and C++, size_t is an unsigned type that is used for expressing size. It expresses intent and somewhat simplifies range assertions (len < upper_bound vs len >= 0 && len < upper_bound for signed integers).
(In all the examples below len means the length of the array a).
The idiom for a for loop is: for (i = 0; i < len; i++). The idiom for a backward for loop is for (i = len-1; i >= 0; i--). But having unsigned loop indices introduces subtle bugs, and every so often I mess up the edge cases.
First, the backwards for loop. This code underflows for len=0.
for (size_t i = len-1; i >= 0; i--) { // Bad: Underflows for len=0
use(a[i]);
}
There's the --> "operator" trick, which looks strange if you're not used to it.
for (size_t i = len; i--> 0;) {
use(a[i]);
}
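To make the behaviour of this loop easier to see: the condition is parsed as (i--) > 0, so the old value of i is compared against 0 and only then is i decremented. The sketch below records the indices the loop visits; countdown_indices is a hypothetical helper name used only for illustration.

```cpp
#include <cassert>   // only needed for checks like the ones in the note below
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the indices visited by the "-->" loop.
std::vector<std::size_t> countdown_indices(std::size_t len) {
    std::vector<std::size_t> visited;
    // Parsed as (i--) > 0: test the old value, then decrement.
    for (std::size_t i = len; i-- > 0;) {
        visited.push_back(i);
    }
    return visited;
}
```

For len = 3 this visits 2, 1, 0, and for len = 0 the body never runs. After the final test i wraps around, which is well-defined for unsigned types, but i is not used afterwards.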
You can use a signed type for the loop index variable, but that overflows if len > INT_MAX. Many people and organizations consider that risk so minimal that they just stick to int.
for (int i = len-1; i >= 0; i--) { // BAD: overflows for len > INT_MAX
use(a[i]);
}
So I've settled for this construct, since it's closest to the canonical for-loop form and has the simplest expressions.
for (size_t i = len; i > 0; i--) {
size_t pos = i-1;
use(a[pos]);
}
My problem: iterating from 0 to len-1 (exclusive)
That is, looping over the range [0, len-1). This loop underflows when len=0.
for (size_t i = 0; i < len-1; i++) { // BAD: Underflows for len=0.
use(a[i]);
}
As for the iterating backwards case, you can use signed integers but that may cause overflows.
for (int i = 0; i < len-1; i++) { // BAD: Will overflow if len > INT_MAX
use(a[i]);
}
I tend to add another expression to the loop condition, checking for len > 0, but that feels clumsy.
for (size_t i = 0; len > 0 && i < len-1; i++) {
use(a[i]);
}
I can add an if statement before the loop, but that also feels clumsy.
Is there a less bulky way of writing a for loop with unsigned index variables looping from 0 to len-1?

There are two cases here.
Iterating forward from 0 to len - 2 inclusive
for (size_t i = 0; i + 1 < len; ++i) {
size_t index = i;
// use index here
}
Iterating backward from len - 2 to 0 inclusive
for (size_t i = len; i > 1; --i) {
size_t index = i - 2;
// use index here
}
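One way to check the edge cases of both loops is to record the indices they visit. This is only a sketch; forward_indices and backward_indices are hypothetical helper names, not part of the answer above.

```cpp
#include <cassert>   // only needed for checks like the ones in the note below
#include <cstddef>
#include <vector>

// Hypothetical helper: indices visited by the forward loop, 0 .. len-2.
std::vector<std::size_t> forward_indices(std::size_t len) {
    std::vector<std::size_t> visited;
    for (std::size_t i = 0; i + 1 < len; ++i) {  // i + 1 cannot wrap for realistic lengths
        visited.push_back(i);
    }
    return visited;
}

// Hypothetical helper: indices visited by the backward loop, len-2 .. 0.
std::vector<std::size_t> backward_indices(std::size_t len) {
    std::vector<std::size_t> visited;
    for (std::size_t i = len; i > 1; --i) {
        visited.push_back(i - 2);  // i >= 2 here, so no wrap-around
    }
    return visited;
}
```

With len = 4 the forward version yields 0, 1, 2 and the backward version yields 2, 1, 0. With len = 0 or len = 1 both return an empty list, because neither condition ever subtracts from len.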

How about
for (size_t i = 0; i+1 < len; i++) {
use(a[i]);
}

In all these cases, you have one common theme: You have an index that has a start value. It gets incremented or decremented until it reaches the end value, which terminates the loop.
One thing that helps here is to make these two values explicit. In the simplest case, forward-iterating the whole range, the end value is simply len:
for (size_t i=0, end=len; i!=end; ++i) {
///...
}
This follows the general advice given to people using iterators. In particular, pay attention to the != comparison, which works here and which is actually required for some iterators.
Now, backward iterating:
for (size_t i=len-1, end=-1; i!=end; --i) {
///...
}
Lastly, iterating a subset excluding the last n elements of the range backwards:
if (len > n) {
for (size_t i=len-n-1, end=-1; i!=end; --i) {
///...
}
}
Actually, what you were fighting with was your attempt to put too much stuff into the loop logic. Just be explicit that this requires more than n elements in order to do anything at all. Yes, you could put len > n into the loop condition, but that wouldn't give you clear and simple code whose intention anyone understands.
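As a rough sketch of the last variant, the function below collects the visited indices so the behaviour can be checked. The name backward_skipping_last is made up for this illustration. It relies on the fact that unsigned arithmetic wraps, so decrementing 0 gives the same value as converting -1 to size_t.

```cpp
#include <cassert>   // only needed for checks like the ones in the note below
#include <cstddef>
#include <vector>

// Hypothetical helper: visits len-n-1 down to 0, skipping the last n elements.
std::vector<std::size_t> backward_skipping_last(std::size_t len, std::size_t n) {
    std::vector<std::size_t> visited;
    if (len > n) {  // explicit precondition, as in the text above
        // The largest size_t value; decrementing 0 wraps around to it.
        const std::size_t end = static_cast<std::size_t>(-1);
        for (std::size_t i = len - n - 1; i != end; --i) {
            visited.push_back(i);
        }
    }
    return visited;
}
```

For len = 5 and n = 2 this visits 2, 1, 0. For len = 2 and n = 2, or for an empty range, the guard skips the loop entirely.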

I tend to add another expression to the loop condition, checking for
len > 0, but that feels clumsy.
Your code should follow the logic. It is not clumsy at all. And it is much more readable for humans.
The loop makes no sense if len == 0, so I usually use an if statement. It makes the code easy to understand and maintain.
if(len)
{
for (size_t i = 0; i < len-1; i++) { /*...*/ }
}
Or you can add the check to the loop condition. BTW, you only need to check that len is not zero.
for (size_t i = 0; len && i < len-1; i++) { /*...*/ }

There are many possible answers for this question. Which is best is opinion based. I'll offer some options.
For iterating over all elements of an array in reverse order, using an unsigned index, one option is
for (size_t index = 0; index < len; ++index)
{
size_t i = len - 1 - index;
use(a[i]);
}
or (more simply)
for (size_t i = 0; i < len; ++i)
{
use(a[len - 1 - i]);
}
In both cases, if len is zero, the loop body is not executed. Although the loops increment rather than decrement, both access elements in reverse order. If you are frequently writing such loops, it is also not difficult to write a little inline function of the form
size_t index_in_reverse(size_t index, size_t len)
{
return len - 1 - index;
}
and do
for (size_t i = 0; i < len; ++i)
{
use(a[index_in_reverse(i, len)]);
}
To iterate over all elements of the array in forward order, except the last, I'd make that explicit rather than trying to do it in the loop condition.
if (len > 0)
{
size_t shortened_len = len - 1;
for (size_t i = 0; i < shortened_len; ++i)
use(a[i]);
}
The reason I introduce the variable shortened_len is to make the code self-documenting about the fact it is not iterating over the entire array. I've seen too many cases where a condition of the form i < len - 1 is "corrected" by a subsequent developer to remove the - 1 because they believe it is a typo.
That may feel "clumsy" to the OP, but I suggest that
for (size_t i = 0; len > 0 && i < len-1; i++) {
use(a[i]);
}
is harder for a human to understand (putting multiple tests in a loop condition forces a person maintaining the code to work out what both conditions do and how they interact). Given a choice between code that is "concise" and code that is "less concise but consumes less brainpower of an unacquainted human to understand", I will ALWAYS choose the latter. Bear in mind that the person maintaining the code six months later may be yourself, and there are few thought processes more humbling than "What idiot wrote this?? Oh, it was me!".

Related

How can I iterate through the last element of the vector without going out of bounds?

The expected output is 1a1b1c, but I only get 1a1b. If I put '-1' next to input.size() in the for loop, that just hides the bug. What I want is to be able to iterate through the last character of the string without going out of bounds.
std::string input = "abc";
for (unsigned int i = 0; i < input.size(); i++){
int counter = 1;
while(input.at(i) == input.at(i+1) && i < input.size()-1){
counter++;
i++;
}
number.push_back(counter);
character.push_back(input.at(i));
}
A few points for you to consider:
1: for (unsigned int i = 0; i < input.size(); i++), specifically i++. This is a postfix operation, meaning it returns the old value of i and then increments i. That's not a big deal here with integers, but with iterators it can get expensive, as you create a copy of the iterator each time. Prefer to say what you mean and what you actually want, which is to increment i, not to get a copy of i and increment i afterwards. So prefer ++i, which only increments i and does not make a copy.
2: unsigned int i = 0. Firstly, it's better than using an int, which incurs a signed -> unsigned conversion on every comparison with input.size(), which returns a size_t. Secondly, unsigned int is not guaranteed to be big enough to hold the size of the string, and it requires a promotion from (probably) 32-bit to 64-bit unsigned to compare with size_t.
3: Cognitive complexity. Nested loops which both mutate the same loop variable (in this case i) make the code more difficult to reason about and will ultimately lead to more bugs as the code evolves over time. Where possible, have only one place where a loop variable is mutated.
4: As pointed out by others, the while loop while(input.at(i) == input.at(i+1) && i < input.size()-1) can exceed the size of the string, and the .at member function of string will throw on an out-of-bounds access. This can be resolved via point 3 by refactoring the nested loop into a single loop.
5: Avoid so many calls to .at. We are in complete control of the index we use to index the string, so you can use operator[] safely as long as we can guarantee i will always be a valid index, which in this case I think you can.
6: i < input.size() uses < when that's not the check you want; the check you actually want is i != input.size(), which can also be cheaper. Check out this trivial comparison in Compiler Explorer.
Thankfully, the fix from shadowranger fixes your problem completely, i.e. while(i < s.size()-1 && s.at(i) == s.at(i+1)). However, I would like to offer an alternative with no nested loops to show you how to avoid my points 3, 4, 5 and 6:
void do_the_thing(std::string const& s) {
std::cout << "Considering: \"" + s + "\"\n";
if(s.empty()) {
return;
}
size_t const length = s.length(); // avoiding repeated calls to length which never changes in this case
if(length == 1) {
std::cout << "1" << s[0] << "\n";
return;
}
std::vector<unsigned> number;
std::vector<char> character;
// do the stuff your example did
char last = s[0];
unsigned same_count = 1;
for(size_t ii = 1; ii != length; ++ii) {
char const cur = s[ii];
if(cur == last) {
++same_count;
} else {
number.push_back(same_count);
character.push_back(last);
last = cur;
same_count = 1;
}
}
if(*s.rbegin() == last) {
number.push_back(same_count);
character.push_back(last);
}
// print the things or use them in some way
assert(number.size() == character.size());
size_t const out_len = character.size();
for(size_t ii = 0; ii != out_len; ++ii) {
std::cout << number[ii] << character[ii];
}
std::cout << "\n";
}
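For comparison, here is a more compact single-loop sketch that returns the encoded string instead of printing it. The function name run_length_encode and the decision to return a std::string are assumptions made for this illustration, not part of the answer above. The point it shows is the ordering of the checks: the bounds test comes before the element access, so s[i + 1] is only read when it exists.

```cpp
#include <cassert>   // only needed for checks like the ones in the note below
#include <cstddef>
#include <string>

// Hypothetical helper: "aabccc" -> "2a1b3c".
std::string run_length_encode(const std::string& s) {
    std::string out;
    std::size_t count = 0;
    for (std::size_t i = 0; i != s.size(); ++i) {
        ++count;
        // A run ends at the last character or when the next character differs.
        // The bounds check is evaluated first, so s[i + 1] is never out of range.
        const bool run_ends = (i + 1 == s.size()) || (s[i + 1] != s[i]);
        if (run_ends) {
            out += std::to_string(count);
            out += s[i];
            count = 0;
        }
    }
    return out;
}
```

Called with "abc" this produces "1a1b1c", and an empty string produces an empty result without any special-casing.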

Why does a program throw a runtime error while iterating over an empty vector in C++?

vector <int> o; //Empty vector
for(int i=0;i<=o.size()-1;i++) cout<<o[i];
I got a runtime error with the code above.
vector <int> o;
for(auto j : o){
cout<<j<<" ";
}
However, this second version, which uses a range-based for loop instead, runs fine.
o.size() is required by the C++ standard to return an unsigned type. When that's zero, subtracting 1 yields std::numeric_limits<decltype(o.size())>::max() which means your loop runs past the bounds of the empty vector.
for(std::size_t i = 0; i < o.size(); ++i) is the obvious fix. The use of <= and -1 seems almost disingenuously contrived to me.
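The wrap-around can be observed directly. In the sketch below, broken_upper_bound and sum_all are hypothetical helper names; the first computes the bound the original loop compares against, the second uses the conventional i < size() form.

```cpp
#include <cassert>   // only needed for checks like the ones in the note below
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: the bound used by "i <= o.size() - 1".
std::size_t broken_upper_bound(const std::vector<int>& v) {
    return v.size() - 1;  // unsigned subtraction: wraps around when v is empty
}

// Hypothetical helper: the conventional loop, which is safe for an empty vector.
int sum_all(const std::vector<int>& v) {
    int total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}
```

For an empty vector, broken_upper_bound returns std::numeric_limits<std::size_t>::max(), while sum_all simply returns 0 because its loop body never runs.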
o.size() will return an unsigned value of 0. Subtracting one from it returns a very large positive number, essentially making an infinite loop. Eventually your out-of-bounds array accesses to o[i] will result in a crash.
You could use
for(int i = 0; i <= int(o.size() - 1); i++)
Or just use the more typical
for(int i = 0;i < o.size(); i++)
where you check for "less than", not "less or equal" to a number one less.
Since sizeof(size_t) is greater than or equal to sizeof(int) (although this might be implementation dependent) and size_t is unsigned, the int (1) is converted to size_t.
Therefore, in the expression o.size() - 1, the 1 is implicitly converted to size_t, and o.size() - 1 (which is equivalent to size_t(0 - 1)) becomes equal to std::numeric_limits<size_t>::max(). Therefore, the for loop is entered and accessing your empty o at index 0 results in undefined behavior.
You should write:
for (size_t idx = 0; idx < o.size(); ++idx) { /* ... */ }
If for some reason you need the index to be of type int, you can:
for (int idx = 0; idx < static_cast<int>(o.size()); ++idx) { /* ... */ }
or in your example (which is less common):
for (int idx = 0; idx <= static_cast<int>(o.size()) - 1; ++idx) { /* ... */ }

Almost same code running much slower

I am trying to solve this problem:
Given a string array words, find the maximum value of length(word[i]) * length(word[j]) where the two words do not share common letters. You may assume that each word will contain only lower case letters. If no such two words exist, return 0.
https://leetcode.com/problems/maximum-product-of-word-lengths/
You can create a bitmap of char for each word to check if they share chars in common and then calc the max product.
I have two almost identical methods, but the first passes the checks while the second is too slow. Can you see why?
class Solution {
public:
int maxProduct2(vector<string>& words) {
int len = words.size();
int *num = new int[len];
// compute the bit O(n)
for (int i = 0; i < len; i ++) {
int k = 0;
for (int j = 0; j < words[i].length(); j ++) {
k = k | (1 <<(char)(words[i].at(j)));
}
num[i] = k;
}
int c = 0;
// O(n^2)
for (int i = 0; i < len - 1; i ++) {
for (int j = i + 1; j < len; j ++) {
if ((num[i] & num[j]) == 0) { // if no common letters
int x = words[i].length() * words[j].length();
if (x > c) {
c = x;
}
}
}
}
delete []num;
return c;
}
int maxProduct(vector<string>& words) {
vector<int> bitmap(words.size());
for(int i=0;i<words.size();++i) {
int k = 0;
for(int j=0;j<words[i].length();++j) {
k |= 1 << (char)(words[i][j]);
}
bitmap[i] = k;
}
int maxProd = 0;
for(int i=0;i<words.size()-1;++i) {
for(int j=i+1;j<words.size();++j) {
if ( !(bitmap[i] & bitmap[j])) {
int x = words[i].length() * words[j].length();
if ( x > maxProd )
maxProd = x;
}
}
}
return maxProd;
}
};
Why is the second function (maxProduct) too slow for LeetCode?
Solution
The second method makes repeated calls to words.size(). If you save that in a variable, then it works fine.
Since my comment turned out to be correct I'll turn my comment into an answer and try to explain what I think is happening.
I wrote some simple code to benchmark on my own machine with two solutions of two loops each. The only difference is the call to words.size() is inside the loop versus outside the loop. The first solution is approximately 13.87 seconds versus 16.65 seconds for the second solution. This isn't huge, but it's about 20% slower.
Even though vector.size() is a constant time operation that doesn't mean it's as fast as just checking against a variable that's already in a register. Constant time can still have large variances. When inside nested loops that adds up.
The other thing that could be happening (someone much smarter than me will probably chime in and let us know) is that you're hurting CPU optimizations like branch prediction and pipelining. Every time it gets to the end of the loop it has to stop, wait for the call to size() to return, and then check the loop variable against that return value. If the CPU can look ahead and guess that j is still going to be less than len because it hasn't seen len change (len isn't even inside the loop!), it can make a good branch prediction each time and not have to wait.
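There is one more difference between the two functions that may matter: in maxProduct the bound words.size()-1 is computed in unsigned arithmetic, so for an empty words vector it wraps to a huge value, whereas maxProduct2 subtracts from the int len and gets -1. Whether the judge's tests include an empty input is not known here, so treat this as a possibility rather than a diagnosis. The sketch below hoists the size and writes the outer bound as i + 1 < n, which avoids the subtraction entirely. It also shifts by c - 'a' rather than by the raw character value, which is a deliberate change from the code in the question and assumes lowercase letters only, as the problem statement promises. The name max_product is illustrative.

```cpp
#include <algorithm>
#include <cassert>   // only needed for checks like the ones in the note below
#include <cstddef>
#include <string>
#include <vector>

// Sketch: maximum length product of two words sharing no letters.
// Assumes every word contains only 'a'..'z'.
int max_product(const std::vector<std::string>& words) {
    const std::size_t n = words.size();  // hoisted once
    std::vector<unsigned> mask(n, 0u);
    for (std::size_t i = 0; i < n; ++i) {
        for (char c : words[i]) {
            mask[i] |= 1u << (c - 'a');  // one bit per letter, shift is 0..25
        }
    }
    int best = 0;
    for (std::size_t i = 0; i + 1 < n; ++i) {  // no n - 1, so no wrap for n == 0
        for (std::size_t j = i + 1; j < n; ++j) {
            if ((mask[i] & mask[j]) == 0) {
                const int product =
                    static_cast<int>(words[i].length() * words[j].length());
                best = std::max(best, product);
            }
        }
    }
    return best;
}
```

For the words abcw, baz, foo, bar, xtfn, abcdef this returns 16 (abcw and xtfn), and for an empty list it returns 0 immediately.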

Enumerating array in reverse order using size_t index

Let's say we need to print int array with size N in reverse order:
// Wrong, i is unsigned and always >= 0:
for(size_t i = N-1; i >= 0; --i){cout << data[i];}
// Correct, but uses int instead of size_t:
for(int i = N-1; i >= 0; --i){cout << data[i];}
// Correct, but requires additional operation in the loop:
for(size_t i = N; i > 0; --i){cout << data[i-1];}
// Probably the best version, but less readable.
// Is this well-defined behavior?
for(size_t i = N-1; i != (size_t)(-1); --i){cout << data[i];}
Is there better way to do such enumeration using size_t index and without additional operations in the loop?
Is it valid to assume that (size_t)0 - 1 gives (size_t)(-1) or this is undefined?
You can move the decrement to "after" the condition.
for(size_t i = N; i > 0;) {
--i;
cout << data[i];
}
It's not as elegant as a forwards loop but it works. We break at 0 so i never wraps.
Since C++14:
for (auto it = std::rbegin(data); it != std::rend(data); ++it) { std::cout << *it; }
Or, if you can use Boost, you may use boost::adaptors::reversed with a range-based for loop.
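A small sketch of the reverse-iterator version for a built-in array, assuming a C++14 or later compiler. The function reversed_copy is a hypothetical wrapper that collects the elements so the order can be checked; no index variable is involved, so nothing can wrap.

```cpp
#include <cassert>   // only needed for checks like the ones in the note below
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: copies a built-in array in reverse order.
template <std::size_t N>
std::vector<int> reversed_copy(const int (&data)[N]) {
    std::vector<int> out;
    for (auto it = std::rbegin(data); it != std::rend(data); ++it) {
        out.push_back(*it);
    }
    return out;
}
```

Given int data[3] = {1, 2, 3}, reversed_copy(data) yields 3, 2, 1.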
You can simply test for i < N.
size_t is defined as an unsigned integer, which in turn is defined to have modulo semantics. So your index will go from N-1 down to 0 and then wrap around to numeric_limits<size_t>::max() for which i<N doesn't hold true any longer.
To give a full example:
for(size_t i = N-1; i < N; --i){cout << data[i];}
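The same loop can be wrapped in a function to check the N = 0 case as well. This is a sketch using std::vector for convenience; reverse_with_wraparound is a hypothetical name.

```cpp
#include <cassert>   // only needed for checks like the ones in the note below
#include <cstddef>
#include <vector>

// Hypothetical helper: copies data in reverse using the "i < N" termination test.
std::vector<int> reverse_with_wraparound(const std::vector<int>& data) {
    const std::size_t N = data.size();
    std::vector<int> out;
    // When i is decremented past 0 it wraps to the largest size_t value,
    // which is not less than N, so the loop stops. For N == 0 the start
    // value N - 1 has already wrapped, so the body never runs.
    for (std::size_t i = N - 1; i < N; --i) {
        out.push_back(data[i]);
    }
    return out;
}
```

For {1, 2, 3} this produces 3, 2, 1, and for an empty vector it produces an empty result.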

c++ counting sort

I tried to write a counting sort, but there's some problem with it.
Here's the code:
int *countSort(int* start, int* end, int maxvalue)
{
int *B = new int[(int)(end-start)];
int *C = new int[maxvalue];
for (int i = 0; i < maxvalue; i++)
{
*(C+i) = 0;
}
for (int *i = start; i < end; i++)
{
*(C+*i) += 1;
}
for (int i = 1; i < maxvalue-1 ; i++)
{
*(C+i) += *(C+i-1);
}
for (int *i = end-1; i > start-1; i--)
{
*(B+*(C+(*i))) = *i;
*(C+(*i)) -= 1;
}
return B;
}
In the last loop it throws an exception "Access violation writing at location: -some ram address-"
Where did I go wrong?
for (int i = 1; i < maxvalue-1 ; i++)
That's the incorrect upper bound. You want to go from 1 to maxvalue.
for (int *i = end-1; i > start-1; i--)
{
*(B+*(C+(*i))) = *i;
*(C+(*i)) -= 1;
}
This loop is also completely incorrect. I don't know what it does, but a brief mental test shows that the first iteration sets the element of B at the index of the value of the last element in the array to the number of times it shows. I guarantee that that is not correct. The last loop should be something like:
int* out = B;
for (int i = 0; i < maxvalue; i++) { //for each value
for (int j = 0; j < C[i]; j++) { //for the number of times its in the source
*out = i; //add it to the output
++out; //in the next open slot
}
}
As a final note, why are you playing with pointers like that?
*(B + i) //is the same as
B[i] //and people will hate you less
*(B+*(C+(*i))) //is the same as
B[C[*i]]
Since you're using C++ anyway, why not simplify the code (dramatically) by using std::vector instead of dynamically allocated arrays (and leaking one in the process)?
std::vector<int> countSort(int* start, int* end, int maxvalue)
{
std::vector<int> B(end-start);
std::vector<int> C(maxvalue);
for (int *i = start; i < end; i++)
++C[*i];
// etc.
Other than that, the logic you're using doesn't make sense to me. I think to get a working result, you're probably best off sitting down with a sheet of paper and working out the steps you need to use. I've left the counting part in place above, because I believe that much is correct. I don't think the rest really is. I'll even give a rather simple hint: once you've done the counting, you can generate B (your result) based only on what you have in C -- you do not need to refer back to the original array at all. The easiest way to do it will normally use a nested loop. Also note that it's probably easier to reserve the space in B and use push_back to put the data in it, rather than setting its initial size.
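Following that hint, one possible complete version might look like the sketch below. It is not the original poster's algorithm: it assumes every input value lies in the range 0 to maxvalue inclusive (so the count table has maxvalue + 1 slots), and the name count_sort and the const-qualified pointers are choices made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: counting sort for values assumed to lie in [0, maxvalue].
std::vector<int> count_sort(const int* start, const int* end, int maxvalue) {
    // One counter per possible value, all initialised to zero.
    std::vector<std::size_t> counts(static_cast<std::size_t>(maxvalue) + 1, 0);
    for (const int* p = start; p < end; ++p) {
        assert(*p >= 0 && *p <= maxvalue);  // the assumption stated above
        ++counts[static_cast<std::size_t>(*p)];
    }
    // Build the result from the counts alone, as the hint suggests.
    std::vector<int> sorted;
    sorted.reserve(static_cast<std::size_t>(end - start));
    for (int value = 0; value <= maxvalue; ++value) {
        for (std::size_t k = 0; k < counts[static_cast<std::size_t>(value)]; ++k) {
            sorted.push_back(value);
        }
    }
    return sorted;
}
```

Sorting the values 3, 1, 4, 1, 5, 0 with maxvalue = 5 gives 0, 1, 1, 3, 4, 5. Because both buffers are vectors, nothing has to be deleted by hand.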