C++ long long issues

I've been using signed long longs and have had weird issues with them - inconsistent behavior. For example:
long long i;
printf("%d", i);
This tends to print values that bear no relation to the actual value of i (this also occurred with cout).
It also behaves erratically with %, for example:
if (i % x == 0)
    // some code
This would sometimes fail: with i = 15 and x = 5, the condition just wouldn't evaluate to true, so the if statement would not run the code. It would tend to evaluate to true with x = 7 for some reason.
I suspect it may be a fault in the compiler, which I believe was just g++ (it was at a competition). Any way to mitigate this, or an explanation of why it was happening, would be greatly appreciated.

Printing a long long with %d is undefined behavior - %d is for plain int only, and the correct conversion specifier for long long is %lld (%llu for unsigned long long), which likely explains the garbage values you saw. Getting these specifiers right for the various integer types requires horrible syntax, so I suggest using the C++ type-safe iostreams instead.
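For reference, a minimal sketch of both options (the value here is made up for illustration):

#include <cstdio>
#include <iostream>

int main()
{
    long long i = 1234567890123LL;
    std::printf("%lld\n", i); // %lld is the correct conversion specifier for long long
    std::cout << i << '\n';   // iostreams select the right overload automatically
    return 0;
}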


Can (a==1 && a==2 && a==3) ever evaluate to true in C or C++?

We know it can in Java and JavaScript.
But the question is, can the condition below ever evaluate to true in C or C++?
if (a==1 && a==2 && a==3)
    printf("SUCCESS");
EDIT
Assume a is an integer.
Depends on your definition of "a is an integer":
int a__() { static int r; return ++r; }
#define a a__() // a is now an expression of type `int`

int main()
{
    return a==1 && a==2 && a==3; // returns 1
}
Of course:
int f(int b) { return b==1&&b==2&&b==3; }
will always return 0; and optimizers will generally replace the check with exactly that.
If we put macro magic aside, I can see one way that could positively answer this question, but it will require a bit more than just standard C. Let's assume we have an extension allowing us to use the __attribute__((at(ADDRESS))) attribute, which places a variable at some specific memory location (available in some ARM compilers, for example ARM GCC). Let's also assume we have a hardware counter register at the address ADDRESS which increments on each read. Then we could do something like this:
volatile int a __attribute__((at(ADDRESS)));
The volatile is forcing the compiler to generate the register read each time the comparison is performed, so the counter will increment 3 times. If the initial value of the counter is 1, the statement will return true.
P.S. If you don't like the at attribute, same effect can be achieved using linker script by placing a into specific memory section.
If a is of a primitive type (i.e. all the == and && operators are built in), you are in defined behavior, there is no way for another thread to modify a in the middle of execution (this is technically a case of undefined behavior - see comments - but I left it here anyway because it's the example given in the Java question), and there is no preprocessor magic involved (see the chosen answer), then I don't believe there is any way for this to evaluate to true. However, as you can see from that list of conditions, there are many scenarios in which the expression could evaluate to true, depending on the types used and the context of the code.
In C, yes it can. If a is uninitialised then (even if there is no UB, as discussed here), its value is indeterminate, reading it gives indeterminate results, and comparing it to other numbers therefore also gives indeterminate results.
As a direct consequence, a could compare true with 1 in one moment, then compare true with 2 instead the next moment. It can't hold both those values simultaneously, but it doesn't matter, because its value is indeterminate.
In practice I'd be surprised to see the behaviour you describe, though, because there's no real reason for the actual storage to change in memory in the time between the two comparisons.
In C++, sort of. The above is still true there, but reading an indeterminate value is always an undefined operation in C++ so really all bets are off.
Optimisations are allowed to aggressively bastardise your code, and when you do undefined things this can quite easily result in all manner of chaos.
So I'd be less surprised to see this result in C++ than in C but, if I did, it would be an observation without purpose or meaning because a program with undefined behaviour should be entirely ignored anyway.
And, naturally, in both languages there are "tricks" you can do, like #define a (x++), though these do not seem to be in the spirit of your question.
The following program randomly prints seen: yes or seen: no, depending on whether at some point in the execution of the main thread (a == 0 && a == 1 && a == 2) evaluated to true.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

_Atomic int a = 0;
_Atomic int relse = 0;

void *writer(void *arg)
{
    ++relse;
    while (relse != 2); // spin until both threads are ready
    for (int i = 100; i > 0; --i)
    {
        a = 0;
        a = 1;
        a = 2;
    }
    return NULL;
}

int main(void)
{
    int seen = 0;
    pthread_t pt;
    if (pthread_create(&pt, NULL, writer, NULL)) exit(EXIT_FAILURE);
    ++relse;
    while (relse != 2); // spin until both threads are ready
    for (int i = 100; i > 0; --i)
        seen |= (a == 0 && a == 1 && a == 2);
    printf("seen: %s\n", seen ? "yes" : "no");
    pthread_join(pt, NULL);
    return 0;
}
As far as I am aware, this does not contain undefined behavior at any point, and a is of an integer type, as required by the question.
Obviously this is a race condition, and so whether seen: yes or seen: no is printed depends on the platform the program is run on. On Linux, x86_64, gcc 8.2.1 both answers appear regularly. If it doesn't work, try increasing the loop counters.
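(For what it's worth, a program like this needs to be built with pthread support, e.g. gcc -pthread race.c - the file name here is a placeholder.)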

Following code crashes in x64 but not x86 build

I've always been using an XOR encryption class for my 32-bit applications, but recently I started working on a 64-bit one and encountered the following crash: https://i.stack.imgur.com/jCBlJ.png
Here's the xor class I'm using:
// xor.h
#pragma once

template <int XORSTART, int BUFLEN, int XREFKILLER>
class XorStr
{
private:
    XorStr();
public:
    char s[BUFLEN];
    XorStr(const char* xs);
    ~XorStr()
    {
        for (int i = 0; i < BUFLEN; i++) s[i] = 0;
    }
};

template <int XORSTART, int BUFLEN, int XREFKILLER>
XorStr<XORSTART, BUFLEN, XREFKILLER>::XorStr(const char* xs)
{
    int xvalue = XORSTART;
    int i = 0;
    for (; i < (BUFLEN - 1); i++)
    {
        s[i] = xs[i - XREFKILLER] ^ xvalue;
        xvalue += 1;
        xvalue %= 256;
    }
    s[BUFLEN - 1] = (2 * 2 - 3) - 1;
}
The crash occurs when I try to use the obfuscated string, but it doesn't necessarily happen 100% of the time (it never happens on 32-bit, however). Here's a small example of a 64-bit app that will crash on the second obfuscated string:
#include <iostream>
#include "xor.h"

int main()
{
    // no crash
    printf(/*123456789*/XorStr<0xDE, 10, 0x017A5298>("\xEF\xED\xD3\xD5\xD7\xD5\xD3\xDD\xDF" + 0x017A5298).s);
    // crash
    printf(/*123456*/XorStr<0xE3, 7, 0x87E64A05>("\xD2\xD6\xD6\xD2\xD2\xDE" + 0x87E64A05).s);
    return 0;
}
The same app will run perfectly fine if built in 32 bit.
Here's the HTML script to generate the obfuscated strings: https://pastebin.com/QsZxRYSH
I need to tweak this class to work on 64 bit because I have a lot of strings that I already have encrypted that I need to import from a 32 bit project into the one I'm working on at the moment, which is 64 bit. Any help is appreciated!
The access violation is because 0x87E64A05 is larger than the largest value a signed 32-bit integer can hold (which is 0x7FFFFFFF).
Because int is likely 32-bit, XREFKILLER cannot hold 0x87E64A05, so its value is implementation-defined.
That value is then used to subtract from xs again, after the passed pointer was artificially advanced by the literal 0x87E64A05. In that addition the literal is interpreted as long or long long (whichever is the first type it fits in), so it is not narrowed to the implementation-defined value, and the two offsets no longer cancel.
Therefore you are effectively left with some random pointer in xs[i - XREFKILLER], and this is likely to give undefined behavior, e.g. an access violation.
If compiled for 32-bit x86, it probably just so happens that int and pointers have the same size and that the implementation-defined overflow/underflow and narrowing behaviors are such that the addition and subtraction cancel correctly as expected. If the pointer type is larger than 32 bits, this cannot work.
There is no point to XREFKILLER at all. It just does one calculation that is immediately reverted (if there is no over-/underflow).
Note that the fact that the compiler accepts the narrowing in the template argument at all is a bug. Your program is ill-formed and the compiler should give you an error message.
In GCC for example this bug persists until version 8.2, but has been fixed on current trunk (i.e. version 9).
You will have similar problems with XORSTART if char happens to be signed on your platform, because then your provided values won't fit into it. In that case you will have to enable warnings to see the problem, because that conversion does not make the program ill-formed. Also, the behavior of ^ may not be what you expect if char is signed on your system.
It is not clear what the point of
s[BUFLEN - 1] = (2 * 2 - 3) - 1;
is. The expression does evaluate to 0, but it should simply be written as:
s[BUFLEN - 1] = '\0';
Passing the resulting string to printf as the first argument will lead to spurious undefined behavior if the result string happens to contain a %, which would be interpreted as the start of a format specifier. Use std::cout (or printf("%s", ...)).
If you want to use printf you need to write std::printf and #include<cstdio> to guarantee that it will be available. However, since this is C++, you should be using std::cout instead anyway.
More fundamentally your output string may happen to contain a 0 other than the terminating one after your transformation. This would be interpreted as end of the C-style string. This seems like a major design flaw and you probably want to use std::string instead for that reason (and because it is better style).
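A minimal sketch of how the class could be reworked with those issues addressed - this is my own guess at a fix, not the original author's code: the pointless XREFKILLER round-trip is dropped, the XOR is done on unsigned char, and the terminator is written plainly:

// xor.h - 64-bit-safe sketch (XREFKILLER removed entirely)
#pragma once

template <int XORSTART, int BUFLEN>
class XorStr
{
public:
    char s[BUFLEN];
    explicit XorStr(const char* xs)
    {
        int xvalue = XORSTART;
        for (int i = 0; i < BUFLEN - 1; i++)
        {
            // XOR on unsigned char avoids signed-char surprises
            s[i] = static_cast<char>(static_cast<unsigned char>(xs[i]) ^ xvalue);
            xvalue = (xvalue + 1) % 256;
        }
        s[BUFLEN - 1] = '\0';
    }
    ~XorStr()
    {
        for (int i = 0; i < BUFLEN; i++) s[i] = 0;
    }
};

Call sites would then pass the encrypted bytes directly, without the + 0x87E64A05 pointer arithmetic, and print via printf("%s", str.s) or std::cout.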

comparison between signed and unsigned integer expressions and 0x80000000

I have the following code:
#include <iostream>
using namespace std;
int main()
{
    int a = 0x80000000;
    if (a == 0x80000000)
        a = 42;
    cout << "Hello World! :: " << a << endl;
    return 0;
}
The output is
Hello World! :: 42
so the comparison works. But the compiler tells me
g++ -c -pipe -g -Wall -W -fPIE -I../untitled -I. -I../bin/Qt/5.4/gcc_64/mkspecs/linux-g++ -o main.o ../untitled/main.cpp
../untitled/main.cpp: In function 'int main()':
../untitled/main.cpp:8:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if(a == 0x80000000)
^
So the question is: Why is 0x80000000 an unsigned int? Can I make it signed somehow to get rid of the warning?
As far as I understand, 0x80000000 would be INT_MIN, since it's out of range for a positive int. But why is the compiler assuming that I want a positive number?
I'm compiling with gcc version 4.8.1 20130909 on linux.
0x80000000 is an unsigned int because the value is too big to fit in an int and you did not add any L to specify it was a long.
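A quick way to confirm the literal's type for yourself (a C++17 sketch, assuming a platform with 32-bit int):

#include <type_traits>

// 0x80000000 does not fit in a 32-bit int, so the hex literal
// takes the next type in the ladder: unsigned int.
static_assert(std::is_same_v<decltype(0x80000000), unsigned int>,
              "literal is unsigned int here");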
The warning is issued because unsigned in C/C++ has quite weird semantics, and it is therefore very easy to make mistakes in code by mixing up signed and unsigned integers. This mixing is often a source of bugs, especially because the standard library, by historical accident, chose an unsigned type for the size of containers (size_t).
An example I often use to show how subtle the problem is:
// Draw connecting lines between the dots
for (int i = 0; i < pts.size() - 1; i++) {
    draw_line(pts[i], pts[i+1]);
}
This code seems fine but has a bug. If the pts vector is empty, pts.size() is 0 but - and here comes the surprising part - pts.size() - 1 is a huge nonsense number (today often 4294967295, but it depends on the platform) and the loop will use invalid indexes (with undefined behavior).
Here, changing the loop variable to size_t i would remove the warning but is not going to help, as the very same bug remains...
The core of the problem is that with unsigned values a < b-1 and a+1 < b are not the same thing even for very commonly used values like zero; this is why using unsigned types for non-negative values like container size is a bad idea and a source of bugs.
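One way to write the loop that avoids both the warning and the bug is to keep the comparison in the a+1 < b form (a sketch):

// Safe even when pts is empty: i + 1 never wraps around
for (size_t i = 0; i + 1 < pts.size(); i++) {
    draw_line(pts[i], pts[i+1]);
}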
Also note that your code is not correct portable C++ on platforms where that value doesn't fit in an integer as the behavior around overflow is defined for unsigned types but not for regular integers. C++ code that relies on what happens when an integer gets past the limits has undefined behavior.
Even if you know what happens on a specific hardware platform note that the compiler/optimizer is allowed to assume that signed integer overflow never happens: for example a test like a < a+1 where a is a regular int can be considered always true by a C++ compiler.
It seems you are confusing 2 different issues: The encoding of something and the meaning of something. Here is an example: You see a number 97. This is a decimal encoding. But the meaning of this number is something completely different. It can denote the ASCII 'a' character, a very hot temperature, a geometrical angle in a triangle, etc. You cannot deduce meaning from encoding. Someone must supply a context to you (like the ASCII map, temperature etc).
Back to your question: 0x80000000 is encoding, while INT_MIN is meaning. They are not interchangeable and not comparable. On specific hardware, in some contexts, they might be equal, just like 97 and 'a' are equal in the ASCII context.
The compiler warns you about ambiguity in the meaning, not in the encoding. One way to give meaning to a specific encoding is the cast operator, like (unsigned short)-17 or (student*)ptr.
On a 32-bit system, or a 64-bit one with backward compatibility, int and unsigned int are 32-bit encodings like 0x80000000, but on a platform with a 64-bit int, INT_MIN would not be equal to this number.
Anyway - the answer to your question: in order to remove the warning you must give identical context to both left and right expressions of the comparison.
You can do it in many ways. For example:
(unsigned int)a == (unsigned int)0x80000000 or (__int64)a == (__int64)0x80000000 or even a crazy (char *)a == (char *)0x80000000 or any other way as long as you maintain the following rules:
You don't demote the encoding (do not reduce the number of bits it requires). For example, (char)a == (char)0x80000000 is incorrect because it demotes 32 bits into 8 bits.
You must give both the left side and the right side of the == operator the same context. For example, (char *)a == (unsigned short)0x80000000 is incorrect and will yield an error/warning.
I want to give you another example of how crucial is the difference between encoding and meaning. Look at the code
char a = -7;
bool b = (a==-7) ? true : false;
What is the result in b? The answer may shock you: it is implementation-defined.
Some compilers (typically Microsoft Visual Studio) will compile the program so that b gets true, while with the Android NDK compilers b will get false.
The reason is that the Android NDK treats the char type as unsigned char, while Visual Studio treats char as signed char. So on Android phones the encoding -7 actually has the meaning 249 and is not equal to the meaning of (int)-7.
The correct way to fix this problem is to specifically define 'a' as signed char:
signed char a = -7;
bool b = (a==-7) ? true : false;
0x80000000 is considered unsigned by default.
You can avoid the warning like this:
if (a == (int)0x80000000)
    a = 42;
Edit after a comment:
Another (perhaps better) way would be
if ((unsigned)a == 0x80000000)
    a = 42;

How to catch undefined behaviour without executing it?

In my software I am using input values from the user at run time and performing some mathematical operations on them. Consider for simplicity the example below:
int multiply(const int a, const int b)
{
    if (a >= INT_MAX || b >= INT_MAX)
        return 0;
    else
        return a * b;
}
I can check whether the input values exceed the limits, but how do I check whether the result will be out of limits? It is quite possible that a = INT_MAX - 1 and b = 2. Since the inputs are perfectly valid, the multiplication will execute the undefined operation, which makes my program meaningless: any code executed after that point may behave arbitrarily and eventually crash. So how do I protect my program in such cases?
This really comes down to what you actually want to do in this case.
For a machine where long or long long (or int64_t) is a 64-bit value, and int is a 32-bit value, you could do (I'm assuming long is 64 bit here):
long x = static_cast<long>(a) * b;
if (x > INT_MAX || x < INT_MIN)
    return 0;
else
    return static_cast<int>(x);
By casting one value to long, the other will be converted as well. You can cast both if that makes you happier. The overhead here, above a normal 32-bit multiply, is a couple of clock cycles on modern CPUs, and it's unlikely that you can find a safer solution that is also faster. [In some compilers you can add attributes to the if saying that it's unlikely, to encourage branch prediction to "get it right" for the common case of returning x.]
Obviously, this won't work for values where the type is as big as the biggest integer you can deal with (although you could possibly use floating point, but it may still be a bit dodgy, since the precision of float is not sufficient - could be done using some "safety margin" tho' [e.g. compare to less than LONG_INT_MAX / 2], if you don't need the entire range of integers.). Penalty here is a bit worse tho', especially transitions between float and integer isn't "pleasant".
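Spelled out as a complete function, the widening approach above might look like this (a sketch under the assumption that long long is 64 bits and int is 32; safe_multiply is a made-up name):

#include <climits>

// Returns 0 on overflow, following the question's convention.
int safe_multiply(int a, int b)
{
    long long x = static_cast<long long>(a) * b; // a 64-bit product cannot overflow for 32-bit inputs
    if (x > INT_MAX || x < INT_MIN)
        return 0;
    return static_cast<int>(x);
}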
Another alternative is to actually test the relevant code with "known invalid values", as long as the rest of the code is "OK" with that. Make sure you test with the relevant compiler settings, as changing the compiler options will change the behaviour. Note that your code then has to deal with questions like "what do we do when 65536 * 100000 is a negative number?", which it may not expect. Perhaps add something like:
int x = a * b;
if (x < 0) return 0;
[But this only works if you don't expect negative results, of course]
You could also inspect the assembly code generated and understand the architecture of the actual processor [the key here is to understand if "overflow will trap" - which it won't by default in x86, ARM, 68K, 29K. I think MIPS has an option of "trap on overflow"], and determine whether it's likely to cause a problem [1], and add something like
#if (defined(__X86__) || defined(__ARM__))
#error This code needs inspecting for correct behaviour
#endif
return a * b;
One problem with this approach, however, is that even the slightest changes in code, or compiler version may alter the outcome, so it's important to couple this with the testing approach above (and make sure you test the ACTUAL production code, not some hacked up mini-example).
[1] The "undefined behaviour" is undefined to allow C to "work" on processors that have trapping overflows of integer math, as well as the fact that that a * b when it overflows in a signed value is of course hard to determine unless you have a defined math system (two's complement, one's complement, distinct sign bit) - so to avoid "defining" the exact behaviour in these cases, the C standard says "It's undefined". It doesn't mean that it will definitely go bad.
Specifically for the multiplication of a by b the mathematically correct way to detect if it will overflow is to calculate log₂ of both values. If their sum is higher than the log₂ of the highest representable value of the result, then there is overflow.
log₂(a) + log₂(b) < log₂(UINT_MAX)
The difficulty is calculating the log₂ of an integer quickly. For that, there are several bit-twiddling hacks that can be used, like counting bits or counting leading zeros (some processors even have instructions for that). This site has several implementations:
https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious
The simplest implementation could be:
unsigned int log2(unsigned int v)
{
    unsigned int r = 0;
    while (v >>= 1)
        r++;
    return r;
}
In your program you then only need to check
if (log2(a) + log2(b) < MYLOG2UINTMAX)
    return a * b;
else
    printf("Overflow");
The signed case is similar but has to take care of the negative case specifically.
EDIT: My solution is not complete and has an error which makes the test more severe than necessary. The equation would work if the log₂ function returned a floating point value, but in the implementation I limited the value to unsigned integers. This means that completely valid multiplications get refused. Why? Because log₂(UINT_MAX) is truncated:
log₂(UINT_MAX) = log₂(4294967295) ≈ 31.9999999997, truncated to 31.
We therefore have to change the implementation and replace the constant to compare against with
#define MYLOG2UINTMAX (CHAR_BIT*sizeof (unsigned int))
You may try this (for the signed int case from the question; it assumes a and b are positive):
if (a != 0 && b > INT_MAX / a) // check a != 0 before the division
    return 0; // a*b would overflow and invoke UB
else
    return a * b;
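On GCC and Clang there is also the __builtin_mul_overflow builtin, which performs the multiplication and reports overflow without ever invoking UB - a sketch, assuming one of those compilers:

int multiply_checked(int a, int b)
{
    int result;
    if (__builtin_mul_overflow(a, b, &result))
        return 0; // the mathematically correct product would not fit in an int
    return result;
}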

Is dividing by zero accompanied with a runtime error ever useful in C++?

According to C++ Standard (5/5) dividing by zero is undefined behavior. Now consider this code (lots of useless statements are there to prevent the compiler from optimizing code out):
#include <cstring>
#include <cstdlib>

int main()
{
    char buffer[1] = {};
    int len = strlen( buffer );
    if( len / 0 ) {
        rand();
    }
}
Visual C++ compiles the if-statement like this:
sub eax,edx
cdq
xor ecx,ecx
idiv eax,ecx
test eax,eax
je wmain+2Ah (40102Ah)
call rand
Clearly the compiler sees that the code divides by zero - it uses the xor x,x pattern to zero out ecx, which then serves as the second operand in the integer division. This code will definitely trigger an "integer division by zero" error at runtime.
IMO such cases (when the compiler knows that the code will divide by zero at all times) are worth a compile-time error - the Standard doesn't prohibit that. That would help diagnose such cases at compile time instead of at runtime.
However I talked to several other developers and they seem to disagree - their objection is "what if the author wanted to divide by zero to... emm... test error handling?"
Intentionally dividing by zero without the compiler noticing is not that hard - for example, using the Visual C++ specific __declspec(noinline) function decorator:
__declspec(noinline)
void divide( int what, int byWhat )
{
    if( what/byWhat ) {
        rand();
    }
}

void divideByZero()
{
    divide( 0, 0 );
}
which is much more readable and maintainable. One can use that function when one "needs to test error handling" and get a nice compile-time error in all other cases.
Am I missing something? Is it necessary to allow emission of code that the compiler knows divides by zero?
There is probably code out there which has accidental division by zero in functions which are never called (e.g. because of some platform-specific macro expansion), and these would no longer compile with your compiler, making your compiler less useful.
Also, most division by zero errors that I've seen in real code are input-dependent, or at least are not really amenable to static analysis. Maybe it's not worth the effort of performing the check.
Dividing by 0 is undefined behavior because it might trigger a hardware exception on certain platforms. We could all wish for better-behaved hardware, but since nobody ever saw fit to give integers -INF/+INF and NaN values, it's quite pointless.
Now, because it's undefined behavior, interesting things may happen. I encourage you to read Chris Lattner's articles on undefined behavior and optimizations, I'll just give a quick example here:
int foo(char* buf, int i) {
    if (5 / i == 3) {
        return 1;
    }
    if (buf != buf + i) {
        return 2;
    }
    return 0;
}
Because i is used as a divisor, the compiler may assume it is never 0. Therefore the second condition is trivially true and the check can be optimized away.
In the face of such transformations, anyone hoping for a sane behavior of a division by 0... will be harshly disappointed.
In the case of integral types (int, short, long, etc.) I can't think of any uses for intentional divide by zero offhand.
However, for floating point types on IEEE-compliant hardware, explicit division by zero is tremendously useful. You can use it to produce positive and negative infinity (+/- 1/0) and not-a-number (NaN, 0/0) values, which can be quite helpful.
In the case of sorting algorithms, you can use the infinities as initial values representing greater or less than all possible values.
For data analysis purposes, you can use NaNs to indicate missing or invalid data, which can then be handled gracefully. Matlab, for example, uses explicit NaN values to suppress missing data in plots, etc.
Although you can access these values through macros and std::numeric_limits (in C++), it is useful to be able to create them on your own (and allows you to avoid lots of "special case" code). It also allows implementors of the standard library to avoid resorting to hackery (such as manual assembly of the correct FP bit sequence) to provide these values.
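A small sketch of what that looks like in practice (this relies on the platform following IEEE 754 / IEC 60559; the C++ standard itself does not guarantee it):

#include <cstdio>

int main()
{
    volatile double zero = 0.0;   // volatile keeps the compiler from folding the divisions away
    double pos_inf = 1.0 / zero;  // +infinity
    double neg_inf = -1.0 / zero; // -infinity
    double nan = zero / zero;     // NaN
    std::printf("%f %f %f\n", pos_inf, neg_inf, nan);
    std::printf("NaN != NaN: %d\n", nan != nan); // prints 1: NaN compares unequal to itself
    return 0;
}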
If the compiler detects a division-by-0, there is absolutely nothing wrong with a compiler error. The developers you talked to are wrong - you could apply that logic to every single compile error. There is no point in ever dividing by 0.
Detecting divisions by zero at compile-time is the sort of thing that you'd want to have be a compiler warning. That's definitely a nice idea.
I don't keep no company with Microsoft Visual C++, but G++ 4.2.1 does do such checking. Try compiling:
#include <iostream>

int main() {
    int x = 1;
    int y = x / 0;
    std::cout << y;
    return 0;
}
And it will tell you:
test.cpp: In function ‘int main()’:
test.cpp:5: warning: division by zero in ‘x / 0’
But considering it an error is a slippery slope that the savvy know not to spend too much of their spare time climbing. Consider why G++ doesn't have anything to say when I write:
int main() {
    while (true) {
    }
    return 0;
}
Do you think it should compile that, or give an error? Should it always give a warning? If you think it must intervene on all such cases, I eagerly await your copy of the compiler you've written that only compiles programs that guarantee successful termination! :-)