#include <cstdio>
__int128 idx;
int main() {
int a[2] = {1, 2};
idx++;
a[idx] = 0;
printf("%d %d", a[0], a[1]);
}
After turning on -O2, a[idx] = 0 is not executed.
As far as I can tell, the code has no undefined behavior.
Is this a bug in the compiler?
https://godbolt.org/z/qqccd9oEj
Looking at the compiler output for gcc-12.1 -std=c++20 -O2 -W -Wall
.LC0:
.string "%d %d"
main:
sub rsp, 8
mov edx, 2
add QWORD PTR idx[rip], 1
mov esi, 1
adc QWORD PTR idx[rip+8], 0
mov edi, OFFSET FLAT:.LC0
xor eax, eax
call printf
xor eax, eax
add rsp, 8
ret
idx:
.zero 16
The problem is mov edx, 2. That is just wrong: the second argument passed to printf should come from a[1], which the store has set to 0, not 2.
clang gets it right, but still generates horrible code; idx should get optimized out entirely.
You should file that as a compiler bug.
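In the meantime, a possible workaround, a sketch only (I am assuming, without having bisected it, that the miscompilation is tied to the __int128 subscript itself), is to narrow the index to a 64-bit type before using it:
#include <cstdio>
__int128 idx;
int main() {
    int a[2] = {1, 2};
    idx++;
    // Assumption: casting the index to a native 64-bit integer avoids
    // whatever code path mishandles the __int128 subscript.
    a[static_cast<long long>(idx)] = 0;
    printf("%d %d", a[0], a[1]); // expected output: 1 0
}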
Related
I have found that this code causes a startling error in the GNU C++ compiler when it is optimizing.
#include <stdio.h>
int main()
{
int a = 333666999, b = 0;
for (short i = 0; i<7; ++i)
{
b += a;
printf("%d ", b);
}
return 9;
}
When compiled with g++ -Os fail.cpp, the executable does not print seven numbers; it goes on forever, printing and printing. I am using:
-rwxr-xr-x 4 root root 700388 Jun 3 2013 /usr/bin/g++
Is there a later corrected version?
The compiler is very, very rarely wrong. In this case, b is overflowing: 7 × 333666999 = 2335668993, which exceeds INT_MAX = 2147483647, so the seventh addition (iteration i == 6) overflows, and overflow is undefined behaviour for signed integers:
$ g++ --version
g++ (GCC) 10.2.0
...
$ g++ -Os -otest test.cpp
test.cpp: In function ‘int main()’:
test.cpp:8:11: warning: iteration 6 invokes undefined behavior [-Waggressive-loop-optimizations]
8 | b += a;
| ~~^~~~
test.cpp:6:24: note: within this loop
6 | for (short i = 0; i<7; ++i)
| ~^~
And if you invoke undefined behaviour, the compiler is free to do whatever it likes, including making your program never terminate.
Edit: Some people seem to think that the UB should only affect the value of b, but not the loop iteration. This is not according to the Standard (UB can cause literally anything to happen) but it's a reasonable thought, so let's look at the generated assembly to see why the loop doesn't terminate.
First without -Os:
.LC0:
.string "%d "
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-12], 333666999
mov DWORD PTR [rbp-4], 0
mov WORD PTR [rbp-6], 0
.L3:
cmp WORD PTR [rbp-6], 6 # Compare i to 6
jg .L2 # If greater, jump to end
mov eax, DWORD PTR [rbp-12]
add DWORD PTR [rbp-4], eax
mov eax, DWORD PTR [rbp-4]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
movzx eax, WORD PTR [rbp-6]
add eax, 1
mov WORD PTR [rbp-6], ax
jmp .L3
.L2:
mov eax, 9
leave
ret
Then with -Os:
.LC0:
.string "%d "
main:
push rbx
xor ebx, ebx
.L2:
add ebx, 333666999
mov edi, OFFSET FLAT:.LC0
xor eax, eax
mov esi, ebx
call printf
jmp .L2
The comparison and jump instructions are completely gone. Ironically, the compiler did exactly what you asked it to do: optimize for size, removing as many instructions as it can while still obeying the C++ standard. -O3 and -O2 generate exactly the same code as -Os here.
-O1 generates a very interesting output:
.LC0:
.string "%d "
main:
push rbx
mov ebx, 0
.L2:
add ebx, 333666999
mov esi, ebx
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
cmp ebx, -1959298303
jne .L2
mov eax, 9
pop rbx
ret
Here, the compiler optimized away the loop counter i and just compares the value of b to its final value after 7 iterations, using the fact that signed overflow happens according to two's complement on this platform! Cheeky, isn't it? :)
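The real fix is simply to not overflow. A minimal sketch (my variant, not from the question) with the accumulator widened to long long, so even the final sum 7 × 333666999 = 2335668993 fits and the loop terminates after seven iterations:
#include <stdio.h>
int main()
{
    int a = 333666999;
    long long b = 0; // INT_MAX is 2147483647; long long holds 2335668993 easily
    for (short i = 0; i < 7; ++i)
    {
        b += a; // no signed overflow, hence no undefined behaviour
        printf("%lld ", b);
    }
    return 9;
}
Alternatively, compiling with -fwrapv makes GCC/Clang define signed overflow as two's-complement wrapping, and -fsanitize=signed-integer-overflow (UBSan) reports the overflow at run time.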
I am using g++ version 4.8.1. Thomas has version 10.2.0, which evidently puts out a warning about "undefined behavior" when adding two signed integers. However, since it is only a warning, it still goes ahead and compiles the program. In all circumstances, though, the "undefined behavior" should only concern the integers being added. In practice those integers do in fact abide by the expected two's-complement result. The "undefined behavior" should not overwrite other variables in the program; otherwise the executable cannot be trusted at all, and if it cannot be trusted it shouldn't be compiled. Perhaps there is an even later version of the GNU compiler that works correctly when optimizing?
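For what it's worth, if two's-complement wrapping is what you actually want, the well-defined way to get it is unsigned arithmetic, which the Standard requires to wrap modulo 2^N. A sketch (again my variant): the loop now terminates after exactly seven iterations, and the seventh value, 2335668993, has exactly the bit pattern of the -1959298303 that the -O1 code above compared against:
#include <stdio.h>
int main()
{
    int a = 333666999;
    unsigned b = 0; // unsigned arithmetic wraps modulo 2^32: well defined
    for (short i = 0; i < 7; ++i)
    {
        b += (unsigned)a;
        printf("%u ", b); // 7th value: 2335668993 == (unsigned)-1959298303
    }
    return 9;
}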
I wrote this program as a test case for the behavior of bit field member comparisons in C++ (I suppose the same behavior would be exhibited in C as well):
#include <cstdint>
#include <cstdio>
union Foo
{
int8_t bar;
struct
{
#if __BYTE_ORDER == __LITTLE_ENDIAN
int8_t baz : 1;
int8_t quux : 7;
#elif __BYTE_ORDER == __BIG_ENDIAN
int8_t quux : 7;
int8_t baz : 1;
#endif
};
};
int main()
{
Foo foo;
scanf("%d", &foo.bar);
if (foo.baz == 1)
printf("foo.baz == 1\n");
else
printf("foo.baz != 1\n");
}
After I compile and run it with 1 as its input, I get the following output:
foo.baz != 1
*** stack smashing detected ***: terminated
fish: “./a.out” terminated by signal SIGABRT (Abort)
One would expect that the foo.baz == 1 check would be evaluated as true since baz is always the least significant bit in the anonymous bit field. However, the opposite seems to happen, as can be seen from the program output (which is, somewhat comfortingly, consistently the same across each program invocation).
Even weirder to me is the fact that the generated AMD64 assembly code for the program (using the GCC 10.2 compiler) does not contain even a single comparison or jump instruction!
.LC0:
.string "%d"
.LC1:
.string "foo.baz != 1"
main:
push rbp
mov rbp, rsp
sub rsp, 16
lea rax, [rbp-1]
mov rsi, rax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call scanf
mov edi, OFFSET FLAT:.LC1
call puts
mov eax, 0
leave
ret
It seems that the C++ code for the if statement somehow gets optimized out (or something like that), even though I compiled the program with the default settings (i.e. I did not turn on any level of optimization or anything like that).
Interestingly enough, Clang 10.0.1 (when run without optimizations) seems to generate code with a cmp instruction (as well as a jne and a jmp one):
main: # #main
push rbp
mov rbp, rsp
sub rsp, 16
mov dword ptr [rbp - 4], 0
lea rax, [rbp - 8]
movabs rdi, offset .L.str
mov rsi, rax
mov al, 0
call scanf
mov cl, byte ptr [rbp - 8]
shl cl, 7
sar cl, 7
movsx edx, cl
cmp edx, 1
jne .LBB0_2
movabs rdi, offset .L.str.1
mov al, 0
call printf
jmp .LBB0_3
.LBB0_2:
movabs rdi, offset .L.str.2
mov al, 0
call printf
.LBB0_3:
mov eax, dword ptr [rbp - 4]
add rsp, 16
pop rbp
ret
.L.str:
.asciz "%d"
.L.str.1:
.asciz "foo.baz == 1\n"
.L.str.2:
.asciz "foo.baz != 1\n"
Both of the printf strings also seem to be present in the data segment (unlike in the GCC case when only the second one is present). I cannot tell for sure (because I'm not very proficient in assembly) but this seems to be properly generated code (unlike the one which GCC generates).
However, as soon as I try to compile with any kind of optimization (even -O1) using Clang, the comparisons/jumps are gone (as well as the foo.baz == 1 string), and the generated code seems to be very similar to what GCC generates:
(with -O1)
main: # #main
push rax
mov rsi, rsp
mov edi, offset .L.str
xor eax, eax
call scanf
mov edi, offset .Lstr
call puts
xor eax, eax
pop rcx
ret
.L.str:
.asciz "%d"
.Lstr:
.asciz "foo.baz != 1"
(You may want to check the assembly code generated by different compiler versions yourself using Compiler Explorer.)
I'm totally perplexed by this kind of unintuitive behavior. The only thing which comes to mind as an explanation is the interaction of some weird undefined behavior of bitfields containing signed integral types and unions. What makes me think so is that after I replace the signed integer types with their unsigned counterparts, the output of the program becomes exactly as one would expect (with 1 as input):
foo.baz == 1
*** stack smashing detected ***: terminated
fish: “./a.out” terminated by signal SIGABRT (Abort)
Naturally, the program crashing because of stack smashing (just like before) is not supposed to happen, which leads to my second question: why does this occur?
Here's the modified program:
#include <cstdint>
#include <cstdio>
union Foo
{
uint8_t bar;
struct
{
#if __BYTE_ORDER == __LITTLE_ENDIAN
uint8_t baz : 1;
uint8_t quux : 7;
#elif __BYTE_ORDER == __BIG_ENDIAN
uint8_t quux : 7;
uint8_t baz : 1;
#endif
};
};
int main()
{
Foo foo;
scanf("%d", &foo.bar);
if (foo.baz == 1)
printf("foo.baz == 1\n");
else
printf("foo.baz != 1\n");
}
... and the generated assembly code by GCC:
.LC0:
.string "%d"
.LC1:
.string "foo.baz == 1"
.LC2:
.string "foo.baz != 1"
main:
push rbp
mov rbp, rsp
sub rsp, 16
lea rax, [rbp-1]
mov rsi, rax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call scanf
movzx eax, BYTE PTR [rbp-1]
and eax, 1
test al, al
je .L2
mov edi, OFFSET FLAT:.LC1
call puts
jmp .L3
.L2:
mov edi, OFFSET FLAT:.LC2
call puts
.L3:
mov eax, 0
leave
ret
The stack smashing has nothing to do with member access.
scanf("%d", &foo.bar);
The %d format conversion specifier is for an int, which is typically 4 bytes. But your bar is:
int8_t bar;
just one byte.
So, scanf ends up writing 4 bytes' worth of an int value into the one-byte bar, clobbering three additional bytes in the immediate vicinity.
There's your stack smash.
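The minimal fix for the smash itself is to use a conversion specifier that matches int8_t. A sketch (one of several options; SCNd8 comes from <cinttypes> and expands to the matching conversion, typically "hhd"):
#include <cinttypes>
#include <cstdio>
int main()
{
    int8_t bar;
    // scanf now writes exactly one byte instead of four.
    if (scanf("%" SCNd8, &bar) == 1)
        printf("%d\n", bar);
}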
The answer is trivial.
Your baz struct member is 1 bit long and it is signed, so it will never be 1. The only possible values are 0 and -1.
The compiler knows that, so the condition foo.baz == 1 can never be true. No conditional code has to be generated.
So I'm afraid it is not a compiler bug, only a programmer bug :)
So if we change the code to:
int main()
{
union Foo foo;
int x;
scanf("%d", &x);
foo.bar = x;
if (foo.baz == -1)
printf("foo.baz == -1\n");
else
printf("foo.baz != -1\n");
}
The compiler then starts to generate the conditional instructions.
https://godbolt.org/z/fzKMo5
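A quick way to see the 0/-1 value set for yourself (a sketch; strictly speaking, storing an out-of-range value into a signed bit-field is implementation-defined before C++20, but GCC and Clang both wrap):
#include <cstdint>
#include <cstdio>
struct S { int8_t baz : 1; };
int main()
{
    S s;
    s.baz = 1; // 1 does not fit in a 1-bit signed field; it wraps to -1
    printf("%d\n", s.baz); // prints -1 with GCC and Clang
}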
BTW, your endianness check does not make any sense here, as endianness defines the byte order, not the bit order.
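And if you want to see where the compiler actually placed baz, you can probe it through bar (a sketch; reading a union member other than the one last written is type punning, which the C++ standard does not sanction but GCC and Clang document as working):
#include <cstdint>
#include <cstdio>
union Probe
{
    uint8_t bar;
    struct
    {
        uint8_t baz : 1;
        uint8_t quux : 7;
    };
};
int main()
{
    Probe p{};  // zero everything
    p.baz = 1;  // set only the 1-bit field
    printf("0x%02x\n", p.bar); // 0x01 if baz landed in the LSB, 0x80 if in the MSB
}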
Unrelated to the code-generation problem is the use of the wrong scanf conversion specifier.
I have the following code. I expected the stack allocation in the main function to be 8 bytes on a 64-bit system, but when disassembling I see a strange thing: it is 16. I am using https://godbolt.org/ with x86-64 GCC 9.3. So my question is: why?
#include <cstdio>
#include <cstdlib>
struct my_struct {
char a[10];
int b;
char c;
short d;
};
int main() {
struct my_struct* s = (struct my_struct*)malloc(sizeof(struct my_struct));
printf("%lu\n", sizeof(s));
return 0;
}
.LC0:
.string "%lu\n"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov edi, 20
call malloc
mov QWORD PTR [rbp-8], rax
mov esi, 8
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret
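Two separate things are going on here. sizeof(s) is the size of the pointer (8 bytes), not of the struct (20 bytes, which is what the mov edi, 20 passes to malloc), and sub rsp, 16 is that single 8-byte pointer slot rounded up to the 16-byte stack alignment the x86-64 System V ABI requires at call sites. A sketch to print both sizes (assuming the usual x86-64 padding):
#include <cstdio>
struct my_struct {
    char a[10]; // offsets 0-9
    int b;      // 4-byte aligned, so 2 padding bytes; offsets 12-15
    char c;     // offset 16
    short d;    // 2-byte aligned, so 1 padding byte; offsets 18-19
};
int main() {
    struct my_struct* s = nullptr;   // unevaluated below, never dereferenced
    printf("%zu\n", sizeof(s));      // 8: the pointer
    printf("%zu\n", sizeof(*s));     // 20: the struct itself
}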
static constexpr int
count_x(const char * str)
{
int count{};
for (;*str != 0; ++str) {
count += *str == 'x';
}
return count;
}
#define STRx1 "123456789x"
#define STRx4 STRx1 STRx1 STRx1 STRx1
#define STRx8 STRx4 STRx4
#define STRx16 STRx8 STRx8
int test1() { return count_x(STRx4); }
int test2() { return count_x(STRx8); }
int test3() { return count_x(STRx16); }
int test4() { constexpr auto k = count_x(STRx16); return k; }
Given the code above, clang produces a constant value for test1, test2 and test4. Why doesn't it for test3?
test1(): # #test1()
mov eax, 4
ret
test2(): # #test2()
mov eax, 8
ret
test3(): # #test3()
xor eax, eax
mov dl, 49
mov ecx, offset .L.str.2+1
.LBB2_1: # =>This Inner Loop Header: Depth=1
xor esi, esi
cmp dl, 120
sete sil
add eax, esi
movzx edx, byte ptr [rcx]
add rcx, 1
test dl, dl
jne .LBB2_1
ret
test4(): # #test4()
mov eax, 16
ret
.L.str.2:
.asciz "123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x"
gcc does:
test1():
mov eax, 4
ret
test2():
mov eax, 8
ret
test3():
mov eax, 16
ret
test4():
mov eax, 16
ret
Compilation command lines used:
clang++ -Ofast -std=c++2a -S -o - -c src/test.cpp | grep -Ev $'^\t+\\.'
gcc9 -Ofast -std=c++2a -S -o - -c src/test.cpp | grep -Ev $'^\t+\\.'
Compiler Explorer:
https://godbolt.org/z/V-3MEp
As mentioned in Tharwen's comment, there is by default a limit of 100 iterations (the loop in test3 runs once per character of its 160-character string, so it exceeds the limit; test2's 80-character string does not).
This limit can be changed with a really hidden option:
--scalar-evolution-max-iterations
Maximum number of iterations SCEV will symbolically execute a constant derived loop (MaxBruteForceIterations in the LLVM source)
Changing the command to:
clang++ -Ofast -std=c++2a -S -o - -c src/test.cpp -mllvm --scalar-evolution-max-iterations=1000
produces:
test1(): # #test1()
mov eax, 4
ret
test2(): # #test2()
mov eax, 8
ret
test3(): # #test3()
mov eax, 16
ret
test4(): # #test4()
mov eax, 16
ret
View on Compiler Explorer
This limit, and the fact that the option is marked "really hidden", may be there for a good reason, so I wouldn't go changing it and using it in production without suitable knowledge of its effects (if I find some, I'll update this answer).
For reference, here is where the limit is used for test3 in the LLVM source code: ScalarEvolution::computeLoadConstantCompareExitLimit.
Constexpr functions are guaranteed to be evaluated at compile time only in expressions that require a compile-time value.
You have to use:
int test4() {
constexpr auto res = count_x(STRx16);
return res;
}
to force evaluation at compile time.
Otherwise you mostly rely on the compiler's regular optimizations, which give the expected result for test1 and test2, but not for test3.
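Under C++20 (which -std=c++2a already enables) you could instead mark the function consteval; every call then has to be evaluated at compile time, so the optimizer's iteration limit never enters the picture. A sketch, reusing the STRx16 macro from the question:
// consteval: every call is a constant expression, or the program is ill-formed.
static consteval int count_x_ct(const char* str)
{
    int count{};
    for (; *str != 0; ++str)
        count += *str == 'x';
    return count;
}
int test3_ct() { return count_x_ct(STRx16); } // must fold to the constant 16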
I am wondering whether there is any optimization option in Clang or GCC for escape analysis on std::vector in C++.
Since nothing in the example below requires the actual data of v to live on the heap, the compiler could allocate v.data() on the stack for better performance.
Assuming Clang/GCC does not do escape analysis, is there any particular motivation not to do it?
Assuming Clang/GCC does do escape analysis, why are the values of v.data() and &x so different?
#include<cstdio>
#include<vector>
int main() {
int x = 0;
std::vector<int> v(3, 0);
std::printf("&x: %p\n", &x);
//std::printf("&v: %p\n", &v); // we intentionally don't print the pointer to v here.
std::printf("v.data(): %p\n", v.data());
return x + v[0]; // we want compiler not to optimize everything out
}
Expected result
&x: <some address>
v.data(): <some address> + 4
Actual result from Clang and GCC
[*****#localhost test]$ g++ test.cc -O3
[*****#localhost test]$ ./a.out
&x: 0x7ffe2af5a59c
v.data(): 0xadde70
[*****#localhost test]$ clang++ test.cc -O3
[*****#localhost test]$ ./a.out
&x: 0x7fff66ce1ab4
v.data(): 0xfeee70
Thanks!
Escape analysis does exist in the Clang compiler.
Sample code, from @geza: https://godbolt.org/z/N1GLUI
int fn(int a, int b, int c) {
int *t = new int[3];
t[0] = a;
t[1] = b;
t[2] = c;
int r = t[0]+t[1]+t[2];
delete[] t;
return r;
}
GCC
fn(int, int, int):
push r12
mov r12d, edx
push rbp
mov ebp, esi
push rbx
mov ebx, edi
mov edi, 12
call operator new[](unsigned long)
mov DWORD PTR [rax], ebx
add ebx, ebp
mov rdi, rax
mov DWORD PTR [rax+4], ebp
mov DWORD PTR [rax+8], r12d
add r12d, ebx
call operator delete[](void*)
mov eax, r12d
pop rbx
pop rbp
pop r12
ret
Clang
fn(int, int, int): # #fn(int, int, int)
lea eax, [rdi + rsi]
add eax, edx
ret
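This folding is permitted by the C++14 rule that lets an implementation omit a call to a replaceable allocation function whose effects it can prove unobservable. Conceptually, Clang reduces fn to the following (my paraphrase of the assembly above, not actual compiler output):
// Equivalent of Clang's optimized fn: the non-escaping new[]/delete[]
// pair is elided, and t[0]+t[1]+t[2] folds to a+b+c.
int fn_equivalent(int a, int b, int c) {
    return a + b + c;
}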