How do usual arithmetic conversions work? - c++

I was running this code in VS2019:
#include <iostream>
#include <string>
#include <typeinfo>
using namespace std;

int main() {
    string mystring = "hello world";
    for (int j = 0; j < 10; j++) {
        if (mystring[j + 1] == 'w') {
            cout << "string contains w letter\n";
        }
        else {
            continue;
        }
        return 0;
    }
}
And I realized that when I run it in Debug mode on the x86 platform, everything is OK, but if I change the platform to x64, the following warning appears:
C26451 Arithmetic overflow: Using operator '+' on a 4-byte value and then casting the result to an 8-byte value. Cast the value to the wider type before calling operator '+' to avoid overflow (io.2).
It seems to be related to the usual arithmetic conversions: if the operands are of different types, a conversion is applied to one of them before the calculation. But if they are the same type, does that still happen?
If I print typeid(j).name() and typeid(1).name(), it prints int for both, so what is the reason for this warning? The warning goes away if I change the if condition to (mystring[j + static_cast<__int64>(1)] == 'w'). The explanation, I think, should be that the literal '1' is not considered to be of type int on x64, or that it is, but occupies a different number of bits than int does on x64.
I would really like to clarify the issue, thanks.

The "C26451" warning is not a standard compiler warning. It's part of the C++ Code Guidelines Checker which is giving you 'recommendations'. For more on this feature, see Microsoft Docs.
In C++ Core Guidelines the specific recommendation the checker is using here is: ES.103: Don't overflow.
The reason this only happens on x64 is that size_t is 64 bits while int is 32 bits. On x86, both int and size_t are 32 bits.
The std::string operator[] takes a size_t. The cleanest, simplest fix here is:
for (size_t j = 0; j < 10; j++)
You could also address this by explicitly promoting the int to size_t before the addition takes place:
if (mystring[size_t(j) + 1] == 'w') {
You could also ignore the warning by adding:
#pragma warning(disable : 26451)
Or you could disable the C++ Core Guidelines Checker.
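For reference, here is the original loop with that size_t fix applied (just a sketch of the answer's suggestion, using the same string as the question):

#include <iostream>
#include <string>
using namespace std;

int main() {
    string mystring = "hello world";
    // With a size_t index, no int -> size_t conversion happens inside the
    // subscript, so code analysis has nothing to warn about.
    for (size_t j = 0; j < 10; j++) {
        if (mystring[j + 1] == 'w') {
            cout << "string contains w letter\n";
        }
        else {
            continue;
        }
        return 0;
    }
}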

This is how the std::string operator[] is defined; it takes std::size_t.
char& operator[] (size_t pos);
const char& operator[] (size_t pos) const;
That's where the casting is taking place: int -> size_t.
https://www.learncpp.com/cpp-tutorial/fixed-width-integers-and-size-t/
Edit: see Chuck Walbourn's answer.

It seems to be related to Usual arithmetic conversions, such that, if the operands are of different types, a conversion is applied to one of them before calculation. But if they are equal, that still happens?
Your conclusion is incorrect.
VS thinks that j + 1 has the potential to overflow. It recommends that you perform the arithmetic operation, +, on a wider integral type to reduce the chances of overflow. Note that static_cast<std::size_t>(j) + 1 could, in theory, still overflow but VS does not care about that.
You don't get the warning in x86 mode since std::size_t and int are the same size on that platform.

Related

Using `size_t` for lengths impacts on compiler optimizations?

While reading this question, I've seen the first comment saying that:
size_t for length is not a great idea, the proper types are signed ones for optimization/UB reasons.
followed by another comment supporting the reasoning. Is it true?
The question is important, because if I were to write e.g. a matrix library, the image dimensions could be size_t, just to avoid checking whether they are negative. But then all loops would naturally use size_t. Could this impact optimization?
size_t being unsigned is mostly an historical accident - if your world is 16 bit, going from 32767 to 65535 maximum object size is a big win; in current-day mainstream computing (where 64 and 32 bit are the norm) the fact that size_t is unsigned is mostly a nuisance.
Although unsigned types have less undefined behavior (as wraparound is guaranteed), the fact that they have mostly "bitfield" semantics is often a cause of bugs and other bad surprises; in particular:
difference between unsigned values is unsigned as well, with the usual wraparound semantics, so if you may expect a negative value you have to cast beforehand;
unsigned a = 10, b = 20;
// prints 4294967286 (i.e. -10 wrapped modulo 2^32) if unsigned is 32 bit
std::cout << a-b << "\n";
more generally, in signed/unsigned comparisons and mathematical operations, unsigned wins (so the signed value is implicitly cast to unsigned), which, again, leads to surprises;
unsigned a = 10;
int b = -2;
if (a < b) std::cout << "a < b\n"; // prints "a < b"
in common situations (e.g. iterating backwards) the unsigned semantics are often problematic, as you'd like the index to go negative for the boundary condition
// This works fine if T is signed, loops forever if T is unsigned
for (T idx = c.size() - 1; idx >= 0; idx--) {
    // ...
}
Also, the fact that an unsigned value cannot assume a negative value is mostly a strawman; you may avoid checking for negative values, but due to implicit signed-unsigned conversions it won't stop any error - you are just shifting the blame. If the user passes a negative value to your library function taking a size_t, it will just become a very big number, which will be just as wrong if not worse.
int sum_arr(int *arr, unsigned len) {
    int ret = 0;
    for (unsigned i = 0; i < len; ++i) {
        ret += arr[i];
    }
    return ret;
}
// compiles successfully and overflows the array; if len were signed,
// it would just return 0
sum_arr(some_array, -10);
For the optimization part: the advantages of signed types in this regard are overrated; yes, the compiler can assume that overflow will never happen, so it can be extra smart in some situations, but generally this won't be game-changing (as in general wraparound semantics comes "for free" on current day architectures); most importantly, as usual if your profiler finds that a particular zone is a bottleneck you can modify just it to make it go faster (including switching types locally to make the compiler generate better code, if you find it advantageous).
Long story short: I'd go for signed, not for performance reasons, but because the semantics is generally way less surprising/hostile in most common scenarios.
That comment is simply wrong. When working with native pointer-sized operands on any reasonable architecture, there is no difference at the machine level between signed and unsigned offsets, and thus no room for them to have different performance properties.
As you've noted, use of size_t has some nice properties, like not having to account for the possibility that a value might be negative (although accounting for it might be as simple as forbidding that in your interface contract). It also ensures that you can handle any size a caller requests, using the standard type for sizes/counts, without truncation or bounds checks. On the other hand, it precludes using the same type for index offsets when the offset might need to be negative, and in some ways makes it difficult to perform certain types of comparisons (you have to arrange them algebraically so that neither side is negative). The same issue comes up with signed types, though: there you have to do algebraic rearrangements to ensure that no subexpression can overflow.
Ultimately you should initially always use the type that makes sense semantically to you, rather than trying to choose a type for performance properties. Only if there's a serious measured performance problem that looks like it might be improved by tradeoffs involving choice of types should you consider changing them.
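To illustrate the comparison point above, here is a small sketch (not from the original answer) of how an unsigned comparison has to be arranged so that no subexpression can wrap:

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;          // empty on purpose, so v.size() == 0
    std::size_t i = 0;

    // Wrong with unsigned sizes: v.size() - 1 wraps to SIZE_MAX when the
    // vector is empty, so "i < v.size() - 1" would be true and we would
    // index out of bounds.

    // Rearranged so that neither side can go "negative":
    if (i + 1 < v.size()) {
        std::cout << v[i] << " has a successor\n";
    }
    return 0;
}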
I stand by my comment.
There is a simple way to check this: checking what the compiler generates.
#include <cstddef>  // for size_t

void test1(double* data, size_t size)
{
    for (size_t i = 0; i < size; i += 4)
    {
        data[i] = 0;
        data[i+1] = 1;
        data[i+2] = 2;
        data[i+3] = 3;
    }
}

void test2(double* data, int size)
{
    for (int i = 0; i < size; i += 4)
    {
        data[i] = 0;
        data[i+1] = 1;
        data[i+2] = 2;
        data[i+3] = 3;
    }
}
So what does the compiler generate? I would expect loop unrolling, SIMD... for something that simple:
Let's check godbolt.
Well, the signed version gets unrolling and SIMD; the unsigned one does not.
I'm not going to show any benchmark, because in this example, the bottleneck is going to be on memory access, not on CPU computation. But you get the idea.
Second example, just keep the first assignment:
#include <cstddef>  // for size_t

void test1(double* data, size_t size)
{
    for (size_t i = 0; i < size; i += 4)
    {
        data[i] = 0;
    }
}

void test2(double* data, int size)
{
    for (int i = 0; i < size; i += 4)
    {
        data[i] = 0;
    }
}
And in case you want gcc:
OK, not as impressive as for clang, but it still generates different code.

Following code crashes in x64 build but not x86 build

I've always been using an XOR encryption class for my 32-bit applications, but recently I have started working on a 64-bit one and encountered the following crash: https://i.stack.imgur.com/jCBlJ.png
Here's the xor class I'm using:
// xor.h
#pragma once

template <int XORSTART, int BUFLEN, int XREFKILLER>
class XorStr
{
private:
    XorStr();
public:
    char s[BUFLEN];
    XorStr(const char* xs);
    ~XorStr()
    {
        for (int i = 0; i < BUFLEN; i++) s[i] = 0;
    }
};

template <int XORSTART, int BUFLEN, int XREFKILLER>
XorStr<XORSTART, BUFLEN, XREFKILLER>::XorStr(const char* xs)
{
    int xvalue = XORSTART;
    int i = 0;
    for (; i < (BUFLEN - 1); i++)
    {
        s[i] = xs[i - XREFKILLER] ^ xvalue;
        xvalue += 1;
        xvalue %= 256;
    }
    s[BUFLEN - 1] = (2 * 2 - 3) - 1;
}
The crash occurs when I try to use the obfuscated string, but it doesn't necessarily happen 100% of the time (it never happens on 32-bit, however). Here's a small example of a 64-bit app that will crash on the second obfuscated string:
#include <iostream>
#include "xor.h"

int main()
{
    // no crash
    printf(/*123456789*/XorStr<0xDE, 10, 0x017A5298>("\xEF\xED\xD3\xD5\xD7\xD5\xD3\xDD\xDF" + 0x017A5298).s);
    // crash
    printf(/*123456*/XorStr<0xE3, 7, 0x87E64A05>("\xD2\xD6\xD6\xD2\xD2\xDE" + 0x87E64A05).s);
    return 0;
}
The same app will run perfectly fine if built in 32 bit.
Here's the HTML script to generate the obfuscated strings: https://pastebin.com/QsZxRYSH
I need to tweak this class to work on 64 bit because I have a lot of strings that I already have encrypted that I need to import from a 32 bit project into the one I'm working on at the moment, which is 64 bit. Any help is appreciated!
The access violation is because 0x87E64A05 is larger than the largest value a signed 32bit integer can hold (which is 0x7FFFFFFF).
Because int is likely 32-bit, XREFKILLER cannot hold 0x87E64A05, so its value will be implementation-defined.
This value is then used later to subtract from xs again, after the passed pointer was artificially advanced by the literal 0x87E64A05. In that addition, the literal is interpreted as long or long long (whichever makes the value fit, depending on whether long is wider than 32 bits), so it is not narrowed to the implementation-defined value that XREFKILLER holds.
Therefore you are effectively left with some random pointer in xs[i - XREFKILLER] and this is likely to give undefined behavior, e.g. an access violation.
If compiled for 32-bit x86, it probably just so happens that int and pointers have the same size, and that the implementation-defined over-/underflow and narrowing behaviors are such that the addition and subtraction cancel out as expected. If, however, the pointer type is larger than 32 bits, this cannot work.
There is no point to XREFKILLER at all. It just does one calculation that is immediately reverted (if there is no over-/underflow).
Note that the fact that the compiler accepts the narrowing in the template argument at all is a bug. Your program is ill-formed and the compiler should give you an error message.
In GCC for example this bug persists until version 8.2, but has been fixed on current trunk (i.e. version 9).
You will have similar problems with XORSTART if char happens to be signed on your platform, because then your provided values won't fit into it. But in that case you will have to enable warnings, because that won't be a conversion that makes the program ill-formed. Also, the behavior of ^ may not be what you expect if char is signed on your system.
It is not clear what the point of
s[BUFLEN - 1] = (2 * 2 - 3) - 1;
is. It should be:
s[BUFLEN - 1] = '\0';
Passing the resulting string to printf as the first argument will lead to undefined behavior if the resulting string happens to contain a '%', which would be interpreted as the start of a format specifier. Use std::cout.
If you want to use printf you need to write std::printf and #include<cstdio> to guarantee that it will be available. However, since this is C++, you should be using std::cout instead anyway.
More fundamentally your output string may happen to contain a 0 other than the terminating one after your transformation. This would be interpreted as end of the C-style string. This seems like a major design flaw and you probably want to use std::string instead for that reason (and because it is better style).
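Putting those suggestions together, a simplified version of the class without the XREFKILLER offset and with a proper terminator could look roughly like this (a sketch only; the key and the encoded bytes below are made-up example values, and existing strings would have to be re-generated for this scheme):

#include <iostream>

template <int XORSTART, int BUFLEN>
class XorStr
{
public:
    char s[BUFLEN];

    explicit XorStr(const char* xs)
    {
        int xvalue = XORSTART;
        for (int i = 0; i < BUFLEN - 1; i++)
        {
            // Decode with the same rolling key the encoder used.
            s[i] = static_cast<char>(xs[i] ^ xvalue);
            xvalue = (xvalue + 1) % 256;
        }
        s[BUFLEN - 1] = '\0';
    }

    ~XorStr()
    {
        for (int i = 0; i < BUFLEN; i++) s[i] = 0;
    }
};

int main()
{
    // "hello" XOR-ed byte by byte with the rolling key 0x10, 0x11, 0x12, ...
    // (keeping the key below 0x80 also sidesteps the char signedness issue).
    std::cout << XorStr<0x10, 6>("\x78\x74\x7e\x7f\x7b").s << "\n";
    return 0;
}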

Visual C++ 2010: Why "signed/unsigned mismatch" disappears if i add "const" to one comparand?

I have the following simple C++ code:
#include "stdafx.h"
int main()
{
int a = -10;
unsigned int b = 10;
// Trivial error is placed here on purpose to trigger a warning.
if( a < b ) {
printf( "Error in line above: this will not be printed\n" );
}
return 0;
}
Compiled using Visual Studio 2010 (default C++ console application) it gives warning C4018: '<' : signed/unsigned mismatch on line 7, as expected (the code has a logical error).
But if I change unsigned int b = 10; into const unsigned int b = 10; the warning disappears! Are there any known reasons for such behavior? gcc shows the warning regardless of const.
Update
I can see from the comments that a lot of people suggest "it just got optimized somehow, so no warning is needed". Unfortunately, the warning is needed, since my code sample has an actual logical error carefully placed to trigger it: the print statement will not be called even though -10 is actually less than 10. This error is well known, and the "signed/unsigned" warning is raised exactly to find such errors.
Update
I can also see from the comments that a lot of people have "found" the signed/unsigned logical error in my code and are explaining it. There is no need to do so - the error is placed purely to trigger a warning, it is trivial (-10 is converted to (unsigned int)-10, which is 0xFFFFFFF6), and the question is not about it :).
It is a Visual Studio bug, but let's start by the aspects that are not bugs.
Section 5, Note 9 of the then applicable C++ standard first discusses what to do if the operands are of different bit widths, before proceeding what to do if they are the same but differ in the sign:
...
Otherwise, if the operand that has unsigned integer type has rank
greater than or equal to the rank of the type of the other operand,
the operand with signed integer type shall be converted to the type of
the operand with unsigned integer type.
This is where we learn that the comparison has to operate in unsigned arithmetic. We now need to learn what this means for the value -10.
Section 4.7 tells us:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). - end note]
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
As you can see, a specific pretty high value (4294967286, or 0xFFFFFFF6, assuming unsigned int is a 32-bit number) is being compared with 10, and so the standard guarantees that printf is really never called.
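A quick way to see the converted value and the resulting comparison (a small sketch; compiling it will of course raise the very warning under discussion):

#include <iostream>

int main()
{
    int a = -10;
    unsigned int b = 10;

    // The usual arithmetic conversions make the comparison unsigned, so it is
    // effectively 4294967286u < 10u with a 32-bit unsigned int.
    std::cout << static_cast<unsigned int>(a) << "\n"; // 4294967286
    std::cout << (a < b) << "\n";                      // 0
    return 0;
}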
Now you can believe me that there is no rule in the standard requiring a diagnostic in this case, so the compiler is free not to issue any. (Indeed, some people write -1 with the intention of producing an all-ones bit pattern. Others use int for iterating arrays, which results in signed/unsigned comparisons between size_t and int. Ugly, but guaranteed to compile.)
Now Visual Studio issues some warnings "voluntarily".
This results in a warning already under default settings (level 3):
int a = -10;
unsigned int b = 10;
if( a < b ) // C4018
{
    printf( "Error in line above: this will not be printed\n" );
}
The following requires /W4 to get a warning. Notice that the warning was reclassified: it changed from C4018 to C4245. This is apparently by design; a logic error that breaks a comparison nearly always is less dangerous than one that appears to work for positive-positive comparisons but breaks down for positive-negative ones.
const int a = -10;
unsigned int b = 10;
if( a < b ) // C4245
{
    printf( "Error in line above: this will not be printed\n" );
}
But your case was yet different:
int a = -10;
const unsigned int b = 10;
if( a < b ) // no warning
{
    printf( "Error in line above: this will not be printed\n" );
}
And there is no warning whatsoever. (Well, you should retry with -Wall if you want to be sure.) This is a bug. Microsoft says about it:
Thank you for submitting this feedback. This is a scenario where we
should emit a C4018 warning. Unfortunately, this particular issue is
not a high enough priority to fix in the next release given the
resources that we have available.
Out of curiosity, I checked using Visual Studio 2012 SP1 and the defect is still there - no warning with -Wall.

size_t vs int warning

I am getting following warning always for following type of code.
std::vector<int> v;
for (int i = 0; i < v.size(); i++) {
}
warning C4267: 'initializing' : conversion from 'size_t' to 'int', possible loss of data
I understand that size() returns size_t; I just wanted to know whether it is safe to ignore this warning, or whether I should make all my loop variables of type size_t.
If you might need to hold more than INT_MAX items in your vector, use size_t. In most cases, it doesn't really matter, but I use size_t just to make the warning go away.
Better yet, use iterators:
for( auto it = v.begin(); it != v.end(); ++it )
(If your compiler doesn't support C++11, use std::vector<whatever>::iterator in place of auto)
C++11 also makes choosing the best index type easier (in case you use the index in some computation, not just for subscripting v):
for( decltype(v.size()) i = 0; i < v.size(); ++i )
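And if you do not actually need the index at all, a range-based for (C++11) sidesteps the signed/unsigned question entirely (a sketch, not part of the original answer):

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3};

    // No index variable, so there is no index type to get wrong.
    for (const auto& x : v) {
        std::cout << x << "\n";
    }
    return 0;
}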
What is size_t?
size_t corresponds to the integral data type returned by the language operator sizeof and is defined in the <cstddef> header (among others) as an unsigned integral type.
Is it okay to cast size_t to int?
You could use a cast if you are sure the size is never going to be greater than INT_MAX.
If you are trying to write portable code, it is not safe, because:
size_t on 64-bit Unix is 64 bits
size_t on 64-bit Windows is also 64 bits, while int stays 32 bits on both
So on any 64-bit platform a size_t value can exceed what an int can hold, and the cast can lose data.
Suggested Answer
Given the caveat, the suggestion is to make i an unsigned integral type, or better yet, declare it as size_t.
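If you do need the size as an int somewhere (say, for an API that takes int), a checked narrowing helper along these lines keeps the cast honest (a sketch; checked_int is a made-up name, not a standard function):

#include <cstddef>
#include <limits>
#include <stdexcept>

// Throws instead of silently truncating when the value does not fit in int.
// Usage: int n = checked_int(v.size());
int checked_int(std::size_t n)
{
    if (n > static_cast<std::size_t>(std::numeric_limits<int>::max()))
        throw std::overflow_error("size does not fit in int");
    return static_cast<int>(n);
}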
is this safe to ignore this warning or should I make all my loop variable of type size_t
No. You are opening yourself up to a class of integer overflow attacks. If the vector size is greater than INT_MAX (and an attacker has a way of making that happen), your loop will run forever, creating a denial-of-service possibility.
Technically, std::vector::size returns std::vector::size_type, though.
You should use the right signedness for your loop counter variables. (Really, for most uses, you want unsigned integers rather than signed integers for loops anyway)
The problem is that you're mixing two different data types. On some architectures, size_t is a 32-bit integer, on others it's 64-bit. Your code should properly handle both.
since size() returns a size_t (not int), then that should be the datatype you compare it against.
std::vector<int> v;
for (size_t i = 0; i < v.size(); i++) {
}
Here's an alternate view from Bjarne Stroustrup:
http://www.stroustrup.com/bs_faq2.html#simple-program
for (int i = 0; i<v.size(); ++i) cout << v[i] << '\n';
Yes, I know that I could declare i to be a vector::size_type
rather than plain int to quiet warnings from some hyper-suspicious
compilers, but in this case,I consider that too pedantic and
distracting.
It's a trade-off. If you're worried that v.size() could go above 2,147,483,647, use size_t. If you're using i inside your loop for more than just looking inside the vector, and you're concerned about subtle signed/unsigned related bugs, use int. In my experience, the latter issue is more prevalent than the former. Your experience may differ.
Also see Why is size_t unsigned?.

MSVC++: Strangeness with unsigned ints and overflow

I've got the following code:
#include <iostream>
#include <string>

using namespace std;

int main(int argc, char *argv[])
{
    string a = "a";
    for (unsigned int i = a.length() - 1; i + 1 >= 1; --i)
    {
        if (i >= a.length())
        {
            cerr << (signed int)i << "?" << endl;
            return 0;
        }
    }
}
If I compile in MSVC with full optimizations, the output I get is "-1?". If I compile in Debug mode (no optimizations), I get no output (expected.)
I thought the standard guaranteed that unsigned integers overflowed in a predictable way, so that when i = (unsigned int)(-1), i+1 = 0, and the loop condition i + 1 >= 1 fails. Instead, the test is somehow passing. Is this a compiler bug, or am I doing something undefined somewhere?
I remember having this problem in 2001. I'm amazed it's still there. Yes, this is a compiler bug.
The optimiser is seeing
i + 1 >= 1;
Theoretically, we can optimise this by putting all of the constants on the same side:
i >= (1-1);
Because i is unsigned, it will always be greater than or equal to zero.
See this newsgroup discussion here.
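A tiny illustration of why those two conditions are not equivalent for unsigned arithmetic (a sketch, not from the linked discussion):

#include <climits>
#include <iostream>

int main()
{
    unsigned int i = UINT_MAX;         // the value i reaches after --i from 0

    std::cout << (i + 1 >= 1) << "\n"; // 0: i + 1 wraps around to 0
    std::cout << (i >= 0) << "\n";     // 1: trivially true for any unsigned value
    return 0;
}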
ISO14882:2003, section 5, paragraph 5:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined, unless such an expression is a constant expression (5.19), in which case the program is ill-formed.
(Emphasis mine.) So, yes, the behavior is undefined. The standard makes no guarantees of behavior in the case of integer over/underflow.
Edit: The standard seems slightly conflicted on the matter elsewhere.
Section 3.9.1.4 says:
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
But section 4.7.2 and .3 says:
2) If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). ]
3) If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
(Emphasis mine.)
I'm not certain, but I think you are probably running foul of a bug.
I suspect the trouble is in how the compiler is treating the for control. I could imagine the optimizer doing:
for(unsigned int i=a.length()-1; i+1 >= 1; --i) // As written
for (unsigned int i = a.length()-1; i >= 0; --i) // Noting 1 appears twice
for (unsigned int i = a.length()-1; ; --i) // Because i >= 0 at all times
Whether that is what is happening is another matter, but it might be enough to confuse the optimizer.
You would probably be better off using a more standard loop formulation:
for (unsigned i = a.length(); i-- > 0; )
Yup, I just tested this on Visual Studio 2005; it definitely behaves differently in Debug and Release. I wonder if 2008 fixes it.
Interestingly, it complained about the implicit conversion from size_t (.length()'s result) to unsigned int, but had no problem generating bad code.