Related
I was asked to make a copy of std::string in an assignment and I am having problem implementing the substr function. In a set of tests the teacher gave us there was one with a length equal to -1. Mine declaration of substr is:
Cadena substr(size_t start, size_t length) const;
Which I (thought) that the size_t would prevent negative values to be passed. The problem is that in the definition I check for size() < start + length (assume tam_ is the same as size()):
if (tam_ < start + length)
throw std::out_of_range("Error");
In my system -1 in unsigned is 18446744073709551615, so that, for example assume start is 9 and tam_ is 10.
I expect:
10 < 9 + 18446744073709551615
So that the exception is thrown, but in reality I get
10 < 9 + (-1)
Which is false and exception isn't thrown. As the function continues it allocates a char array for size length + 1, which the system refuses due to that new[] treats the size_t as it should be, 18446744073709551615, which is so large it makes the program crash.
I want to know why my expected result isn't the correct.
Overflow and underflow problems are hard to check for. In general, you need to check before you do the arithmetic, not afterward.
First, let's be clear what's happening.
// The call
foo.substr(9, -1);
The -1 is an int, but the function takes a std::size_t so -1 is converted, and you get the Really Big Number (RBN).
Then, you test:
size() < start + length
These are unsigned types, so if the arithmetic (i.e., the addition) exceeds the upper or lower bounds of the range that can be represented, the value will wrap around. (With signed types, the behavior would be undefined.)
Here, length is the RBN. When you add start and the RBN, the addition wraps around to a small number. You missed detection of the overflow.
To fix it, you need to check two things: (1) that start is within bounds and (2) that the remainder of the string after start is longer than length.
if (start > size() || size() - start < length)
throw std::::out_of_range("Error");
Step (1) is key because it guarantees that the subtraction in Step (2) won't go below 0.
This question already has answers here:
What's the best way to do a reverse 'for' loop with an unsigned index?
(20 answers)
Closed 7 years ago.
When I put the following in my program:
for (size_t i = VectorOfStructs.size()-1; i > 0; i--)
It works correctly but does "i" will never equal 0.
So, I cannot access the first element (VectorOfStructs[0]).
If I change it to:
for (size_t i = VectorOfStructs.size()-1; i > -1; i--)
The program doesn't even enter the for loop! But, if I change it to the following:
for (int i = VectorOfStructs.size()-1; i > -1; i--)
It works exactly as I want it to (Iterates through all the elements).
So, my questions are:
(A) Why does the 2nd code snippet fail to execute?
(B) Why does the 3rd code snippet execute accordingly while the 2nd doesn't?
Any insight would be greatly appreciated!
All loops go forward, even the ones that go backwards.
What you want is either this:
for (std::size_t i = 0, e = VectorOfStructs.size(); i != e; ++i)
{
std::size_t const ri = e - i - 1;
// use "VectorOfStructs[ri]"
}
Or better:
for (auto rit = VectorOfStructs.rbegin(); rit != VectorOfStructs.rend(); ++rit)
{
// use "*rit"
}
(Your second snippet fails because i is unsigned, so -1 is converted to the same type as i and becomes the maximal representable value, so the comparison is always true. By contrast, i is signed in the third snippet.)
The second example uses size_t as type for i, which is an unsigned type, thus it can never have negative values; this also means that it cannot be properly compared with -1
But (int)-1 is bit-represented as 0xFFFFFFFF, which represents a rather large number (2^32-1) for size_t. i>0xFFFFFFFF can never be true, since 0xFFFFFFF is the largest value a size_t can ever hold.
The 3rd example uses signed int (which allows for negative numbers and therefore the test succeeds).
This one should work:
for (size_t i = VectorOfStructs.size(); i-- > 0;) {
use(VectorOfStructs[i]);
}
In second one you comparing variable 'i' with -1 , and here it is of type size_t and size can not be in negative so it fails.
In third one , 'i' is integer type and integer has range from -32568 to +32567 (for int=2 byte in a system)
Overall size_t variable can not have negative values because a physical memory will have its existence in the system
Why does the 2nd code snippet fail to execute?
size_t is unsigned, so it is by definition never negative. So your loop condition is always true. The variable "wraps around" to the maximum value.
size_t is an unsigned type so -1 is the maximum value size_t can take. In the second snippet size_t can't be greater than this maximum value so the loop isn't entered.
On the other hand, int is a signed type so the comparison to -1 is as you expect.
Int and size_t are both integer types but int can hold negatives as well as positives.
int ranges from -2^31 -1 to 2^31 -1 while size_t ranges from 0 to 2^32 -1
Now, when you write something like int a = -1 it is indeed -1 but when you do so with size_t you get the max int 2^32 -1
So in the 2nd snippet no size_t value will ever exceed -1 as it really 2^32 -1
In the 3rd snippet the type compared is int and when int is compared to -1 it sees it as -1 so it executes the way you planned
When the compiler sees i > -1 and notices that the subexpressions i and -1 have different types, it converts them both to a common type. If the two types (std::size_t and int) have the same number of bits, which appears to be the case for your compiler, the common type is the unsigned one (std::size_t). So the expression turns out to be equivalent to i > (std::size_t)-1. But of course (std::size_t)-1 is the maximum possible value of a size_t, so the comparison is always false.
Most compilers have a warning about a comparison that is always true or always false for reasons like this.
Whenever you compare 'signed' and 'unsigned' the 'signed' values are converted to 'unsigned', first. That covers (#1) and (#2), having a problems with 'unsigned(0-1)' and 'some unsigned' > 'unsigned max'.
However, making it work by forcing a 'signed'/'signed' compare (#3), you loose 1/2 of the 'unsigned' range.
You may do:
for(size_t n = vector.size(); n; /* no -- here */ ) {
--n;
// vector[n];
}
Note: unsigned(-1) is on many systems the biggest unsigned integer value.
I just came across an extremely strange problem. The function I have is simply:
int strStr(string haystack, string needle) {
for(int i=0; i<=(haystack.length()-needle.length()); i++){
cout<<"i "<<i<<endl;
}
return 0;
}
Then if I call strStr("", "a"), although haystack.length()-needle.length()=-1, this will not return 0, you can try it yourself...
This is because .length() (and .size()) return size_t, which is an unsigned int. You think you get a negative number, when in fact it underflows back to the maximum value for size_t (On my machine, this is 18446744073709551615). This means your for loop will loop through all the possible values of size_t, instead of just exiting immediately like you expect.
To get the result you want, you can explicitly convert the sizes to ints, rather than unsigned ints (See aslgs answer), although this may fail for strings with sufficient length (Enough to over/under flow a standard int)
Edit:
Two solutions from the comments below:
(Nir Friedman) Instead of using int as in aslg's answer, include the header and use an int64_t, which will avoid the problem mentioned above.
(rici) Turn your for loop into for(int i = 0;needle.length() + i <= haystack.length();i ++){, which avoid the problem all together by rearranging the equation to avoid the subtraction all together.
(haystack.length()-needle.length())
length returns a size_t, in other words an unsigned int. Given the size of your strings, 0 and 1 respectively, when you calculate the difference it underflows and becomes the maximum possible value for an unsigned int. (Which is approximately 4.2 billions for a storage of 4 bytes, but could be a different value)
i<=(haystack.length()-needle.length())
The indexer i is converted by the compiler into an unsigned int to match the type. So you're gonna have to wait until i is greater than the max possible value for an unsigned int. It's not going to stop.
Solution:
You have to convert the result of each method to int, like so,
i <= ( (int)haystack.length() - (int)needle.length() )
The following code crashes C++ with a runtime error:
#include <string>
using namespace std;
int main() {
string s = "aa";
for (int i = 0; i < s.length() - 3; i++) {
}
}
While this code does not crash:
#include <string>
using namespace std;
int main() {
string s = "aa";
int len = s.length() - 3;
for (int i = 0; i < len; i++) {
}
}
I just don't have any idea how to explain it. What could be the reason for this behavior?
s.length() is unsigned integer type. When you subtract 3, you make it negative. For an unsigned, it means very big.
A workaround (valid as long the string is long up to INT_MAX) would be to do like this:
#include <string>
using namespace std;
int main() {
string s = "aa";
for (int i = 0; i < static_cast<int> (s.length() ) - 3; i++) {
}
}
Which would never enter the loop.
A very important detail is that you have probably received a warning "comparing signed and unsigned value". The problem is that if you ignore those warnings, you enter the very dangerous field of implicit "integer conversion"(*), which has a defined behaviour, but it is difficult to follow: the best is to never ignore those compiler warnings.
(*) You might also be interested to know about "integer promotion".
First of all: why does it crash? Let's step through your program like a debugger would.
Note: I'll assume that your loop body isn't empty, but accesses the string. If this isn't the case, the cause of the crash is undefined behaviour through integer overflow. See Richard Hansens answer for that.
std::string s = "aa";//assign the two-character string "aa" to variable s of type std::string
for ( int i = 0; // create a variable i of type int with initial value 0
i < s.length() - 3 // call s.length(), subtract 3, compare the result with i. OK!
{...} // execute loop body
i++ // do the incrementing part of the loop, i now holds value 1!
i < s.length() - 3 // call s.length(), subtract 3, compare the result with i. OK!
{...} // execute loop body
i++ // do the incrementing part of the loop, i now holds value 2!
i < s.length() - 3 // call s.length(), subtract 3, compare the result with i. OK!
{...} // execute loop body
i++ // do the incrementing part of the loop, i now holds value 3!
.
.
We would expect the check i < s.length() - 3 to fail right away, since the length of s is two (we only every given it a length at the beginning and never changed it) and 2 - 3 is -1, 0 < -1 is false. However we do get an "OK" here.
This is because s.length() isn't 2. It's 2u. std::string::length() has return type size_t which is an unsigned integer. So going back to the loop condition, we first get the value of s.length(), so 2u, now subtract 3. 3 is an integer literal and interpreted by the compiler as type int. So the compiler has to calculate 2u - 3, two values of different types. Operations on primitive types only work for same types, so one has to be converted into the other. There are some strict rules, in this case, unsigned "wins", so 3 get's converted to 3u. In unsigned integers, 2u - 3u can't be -1u as such a number does not exists (well, because it has a sign of course!). Instead it calculates every operation modulo 2^(n_bits), where n_bits is the number of bits in this type (usually 8, 16, 32 or 64). So instead of -1 we get 4294967295u (assuming 32bit).
So now the compiler is done with s.length() - 3 (of course it's much much faster than me ;-) ), now let's go for the comparison: i < s.length() - 3. Putting in the values: 0 < 4294967295u. Again, different types, 0 becomes 0u, the comparison 0u < 4294967295u is obviously true, the loop condition is positively checked, we can now execute the loop body.
After incrementing, the only thing that changes in the above is the value of i. The value of i will again be converted into an unsigned int, as the comparison needs it.
So we have
(0u < 4294967295u) == true, let's do the loop body!
(1u < 4294967295u) == true, let's do the loop body!
(2u < 4294967295u) == true, let's do the loop body!
Here's the problem: What do you do in the loop body? Presumably you access the i^th character of your string, don't you? Even though it wasn't your intention, you didn't only accessed the zeroth and first, but also the second! The second doesn't exists (as your string only has two characters, the zeroth and first), you access memory you shouldn't, the program does whatever it wants (undefined behaviour). Note that the program isn't required to crash immediately. It can seem to work fine for another half an hour, so these mistakes are hard to catch. But it's always dangerous to access memory beyond the bounds, this is where most crashes come from.
So in summary, you get a different value from s.length() - 3 from that what you'd expect, this results in a positive loop condition check, that leads to repetitive execution of the loop body, which in itself accesses memory it shouldn't.
Now let's see how to avoid that, i.e. how to tell the compiler what you actually meant in your loop condition.
Lengths of strings and sizes of containers are inherently unsigned so you should use an unsigned integer in for loops.
Since unsigned int is fairly long and therefore undesirable to write over and over again in loops, just use size_t. This is the type every container in the STL uses for storing length or size. You may need to include cstddef to assert platform independence.
#include <cstddef>
#include <string>
using namespace std;
int main() {
string s = "aa";
for ( size_t i = 0; i + 3 < s.length(); i++) {
// ^^^^^^ ^^^^
}
}
Since a < b - 3 is mathematically equivalent to a + 3 < b, we can interchange them. However, a + 3 < b prevents b - 3 to be a huge value. Recall that s.length() returns an unsigned integer and unsigned integers perform operations module 2^(bits) where bits is the number of bits in the type (usually 8, 16, 32 or 64). Therefore with s.length() == 2, s.length() - 3 == -1 == 2^(bits) - 1.
Alternatively, if you want to use i < s.length() - 3 for personal preference, you have to add a condition:
for ( size_t i = 0; (s.length() > 3) && (i < s.length() - 3); ++i )
// ^ ^ ^- your actual condition
// ^ ^- check if the string is long enough
// ^- still prefer unsigned types!
Actually, in the first version you loop for a very long time, as you compare i to an unsigned integer containing a very large number. The size of a string is (in effect) the same as size_t which is an unsigned integer. When you subtract the 3 from that value it underflows and goes on to be a big value.
In the second version of the code, you assign this unsigned value to a signed variable, and so you get the correct value.
And it's not actually the condition or the value that causes the crash, it's most likely that you index the string out of bounds, a case of undefined behavior.
Assuming you left out important code in the for loop
Most people here seem unable to reproduce the crash—myself included—and it looks like the other answers here are based on the assumption that you left out some important code in the body of the for loop, and that the missing code is what is causing your crash.
If you are using i to access memory (presumably characters in the string) in the body of the for loop, and you left that code out of your question in an attempt to provide a minimal example, then the crash is easily explained by the fact that s.length() - 3 has the value SIZE_MAX due to modular arithmetic on unsigned integer types. SIZE_MAX is a very big number, so i will keep getting bigger until it is used to access an address that triggers a segfault.
However, your code could theoretically crash as-is, even if the body of the for loop is empty. I am unaware of any implementations that would crash, but maybe your compiler and CPU are exotic.
The following explanation does not assume that you left out code in your question. It takes on faith that the code you posted in your question crashes as-is; that it isn't an abbreviated stand-in for some other code that crashes.
Why your first program crashes
Your first program crashes because that is its reaction to undefined behavior in your code. (When I try running your code, it terminates without crashing because that is my implementation's reaction to the undefined behavior.)
The undefined behavior comes from overflowing an int. The C++11 standard says (in [expr] clause 5 paragraph 4):
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.
In your example program, s.length() returns a size_t with value 2. Subtracting 3 from that would yield negative 1, except size_t is an unsigned integer type. The C++11 standard says (in [basic.fundamental] clause 3.9.1 paragraph 4):
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2n where n is the number of bits in the value representation of that particular size of integer.46
46) This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.
This means that the result of s.length() - 3 is a size_t with value SIZE_MAX. This is a very big number, bigger than INT_MAX (the largest value representable by int).
Because s.length() - 3 is so big, execution spins in the loop until i gets to INT_MAX. On the very next iteration, when it tries to increment i, the result would be INT_MAX + 1 but that is not in the range of representable values for int. Thus, the behavior is undefined. In your case, the behavior is to crash.
On my system, my implementation's behavior when i is incremented past INT_MAX is to wrap (set i to INT_MIN) and keep going. Once i reaches -1, the usual arithmetic conversions (C++ [expr] clause 5 paragraph 9) cause i to equal SIZE_MAX so the loop terminates.
Either reaction is appropriate. That is the problem with undefined behavior—it might work as you intend, it might crash, it might format your hard drive, or it might cancel Firefly. You never know.
How your second program avoids the crash
As with the first program, s.length() - 3 is a size_t type with value SIZE_MAX. However, this time the value is being assigned to an int. The C++11 standard says (in [conv.integral] clause 4.7 paragraph 3):
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
The value SIZE_MAX is too big to be representable by an int, so len gets an implementation-defined value (probably -1, but maybe not). The condition i < len will eventually be true regardless of the value assigned to len, so your program will terminate without encountering any undefined behavior.
The type of s.length() is size_t with a value of 2, therefore s.length() - 3 is also an unsigned type size_t and it has a value of SIZE_MAX which is implementation defined (which is 18446744073709551615 if its size is 64 bit). It is at least 32 bit type (can be 64 bit in 64 bit platforms) and this high number means an indefinite loop. In order to prevent this problem you can simply cast s.length() to int:
for (int i = 0; i < (int)s.length() - 3; i++)
{
//..some code causing crash
}
In the second case len is -1 because it is a signed integer and it does not enter the loop.
When it comes to crashing, this "infinite" loop is not the direct cause of the crash. If you share the code within the loop you can get further explanation.
Since s.length() is unsigned type quantity, when you do s.length()-3, it becomes negative and negative values are stored as large positive values (due to unsigned conversion specifications) and the loop goes infinite and hence it crashes.
To make it work, you must typecast the s.length() as :
static_cast < int > (s.length())
The problem you are having arises from the following statement:
i < s.length() - 3
The result of s.length() is of the unsigned size_t type.
If you imagine the binary representation of two:
0...010
And you then substitute three from this, you are effectively taking off 1 three times, that is:
0...001
0...000
But then you have a problem, removing the third digit it underflows, as it attempts to get another digit from the left:
1...111
This is what happens no matter if you have an unsigned or signed type, however the difference is the signed type uses the Most Significant Bit (or MSB) to represent if the number is negative or not. When the undeflow occurs it simply represents a negative for the signed type.
On the other hand, size_t is unsigned. When it underflows it will now represent the highest number size_t can possibly represent. Thus the loop is practically infinite (Depending on your computer, as this effects the maximum of size_t).
In order to fix this problem, you can manipulate the code you have in a few different ways:
int main() {
string s = "aa";
for (size_t i = 3; i < s.length(); i++) {
}
}
or
int main() {
string s = "aa";
for (size_t i = 0; i + 3 < s.length(); i++) {
}
}
or even:
int main() {
string s = "aa";
for(size_t i = s.length(); i > 3; --i) {
}
}
The important things to note is that the substitution has been omitted and instead addition has been used elsewhere with the same logical evaluations.
Both the first and last ones change the value of i that is available inside the for loop whereas the second will keep it the same.
I was tempted to provide this as an example of code:
int main() {
string s = "aa";
for(size_t i = s.length(); --i > 2;) {
}
}
After some thought I realised this was a bad idea. Readers' exercise is to work out why!
The reason is the same as
int a = 1000000000;
long long b = a * 100000000; would give error. When compilers multiplies these numbers it evaluates it as ints, since a and literal 1000000000 are ints, and since 10^18 is much more large than the upper bound of int, it will give error.
In your case we have s.length() - 3, as s.length() is unsigned int, it cant be negative, and since s.length() - 3 is evaluated as unsigned int, and its value is -1, it gives error here too.
In a system I am maintaining, users request elements from a collection from a 1-based indexing scheme. Values are stored in 0-based arrays in C++ / C.
Is the following hypothetical code portable if 0 is erroneously entered as the input to this function? Is there a better way to validate the user's input when converting 1-based numbering schemes to 0-based?
const unsigned int arraySize;
SomeType array[arraySize];
SomeType GetFromArray( unsigned int oneBasedIndex )
{
unsigned int zeroBasedIndex = oneBasedIndex - 1;
//Intent is to check for a valid index.
if( zeroBasedIndex < arraySize )
{
return array[zeroBasedIndex];
}
//else... handle the error
}
My assumption is that (unsigned int)( 0 - 1 ) is always greater than arraySize; is this true?
The alternative, as some have suggested in their answers below, is to check oneBasedIndex and ensure that it is greater than 0:
const unsigned int arraySize;
SomeType array[arraySize];
SomeType GetFromArray( unsigned int oneBasedIndex )
{
if( oneBasedIndex > 0 && oneBasedIndex <= arraySize )
{
return array[oneBasedIndex - 1];
}
//else... handle the error
}
For unsigned types, 0-1 is the maximum value for that type, so it's always >= arraySize. In other words, yes that is absolutely safe.
Unsigned integers never overflow in C++ and in C.
For C++ language:
(C++11, 3.9.1p4) "Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2n where n is the number of bits in the value representation of that particular size of integer. 46)"
and footnote 46):
"46) This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type."
and for C language:
(C11, 6.2.5p9) "A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned integer type is
reduced modulo the number that is one greater than the largest value that can be
represented by the resulting type."
Chances are good that it's safe, but for many purposes, there's a much simpler way: allocate one extra spot in your array, and just ignore element 0.
No, for example 0-1 is 0xffffffff in 4-byte unsigned int, what if your array is really that big?
32 bit is ok because 0xffffffff exceeds limit, the code breaks when compile in 64 bit if the array is that big.
Just check for oneBasedIndex > 0
Make sure oneBasedIndex is greater than zero...