Subtracting pointers: Where does this missing level of indirection come from? - c++

I'm having trouble understanding the behavior of the MS VC compiler on this one. This line compiles fine, but the result I get is not what I'd expect at all:
this->Test((char *)&CS2 - (char *)&CS1 == sizeof(void *));
The CS1 and CS2 arguments are declared as follows:
myFunction(tCS1* CS1, tCS2* CS2) {...
tCS1 and tCS2 are structures containing one int and one __int64, resp.
This is meant to check the distance on the stack between my arguments CS1 and CS2, which are both pointers. When I break execution on this line and use the debugger to get the addresses of my two variables, I find that they indeed are 8 bytes away from each other (x64 platform).
However, the result of the comparison is false.
Here is the assembly code generated by the compiler:
mov rax,qword ptr [CS1]
mov rdi,qword ptr [CS2]
sub rdi,rax
(then it does the comparison using the result stored in rdi, and makes the call)
Yes, the compiler is comparing the values of my pointer arguments, rather than their addresses. I'm missing a level of indirection here, where did it go?
Of course I can't reproduce this in a test environment, and I have no clue where to look anymore.
I'm cross-compiling this bit of code on a 32-bit machine to an x64 platform (I have to); that's the only 'odd' thing about it. Any idea, any hint?

The assembly
mov rax,qword ptr [CS1]
mov rdi,qword ptr [CS2]
sub rdi,rax
indicates CS1 and CS2 are not really stack arguments, but rather some global symbols - if I wanted to produce similar results, I'd do something like this:
int* CS1 = NULL, *CS2 = NULL; /* or any other value...*/
#define CS1 *CS1
#define CS2 *CS2
Of course this is ugly code, but have you checked that you don't have something like this in your code? Also, the dynamic linker might play a role in it.
And last but not least: If you attempt to write code like:
void foo()
{
int a;
int b;
printf("%d", &a-&b);
}
You should be aware that this is actually undefined behaviour, as C (and C++) only permit subtracting pointers that point into the same object (e.g. the same array).
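To make the distinction concrete, here is a minimal sketch (the function names are hypothetical, not from the question). Subtracting pointers into the same array is well defined. For two unrelated objects, one option is to convert both addresses to std::uintptr_t and subtract the integers, which avoids the undefined pointer arithmetic; the resulting number is still implementation-defined and says nothing portable about stack layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Well defined: both pointers must point into the same array
// (or one past its end).
std::ptrdiff_t elements_between(const int* first, const int* last) {
    assert(first != nullptr && last != nullptr);
    return last - first;
}

// For unrelated objects, compare the addresses as integers instead.
// The pointer-to-integer mapping is implementation-defined, so the result
// is only meaningful for a specific compiler and platform.
std::intptr_t address_distance(const void* a, const void* b) {
    return static_cast<std::intptr_t>(reinterpret_cast<std::uintptr_t>(b) -
                                      reinterpret_cast<std::uintptr_t>(a));
}
```

Even with the integer version, the compiler is free to keep arguments in registers or reorder locals, so a check like the one in the question can only ever describe one particular build.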

As @jpalacek and commenters observed, this is undefined and the compiler may be taking advantage of it to do whatever it likes. It is pretty strange.
This code "works" on gcc:
#include <stdio.h>
int func(int *a, int *b)
{
return (char *)&a - (char *) &b;
}
int main(void)
{
int a, b;
printf("%d", func(&a, &b));
return 0;
}
(gdb) disassemble func
Dump of assembler code for function func:
0x080483e4 : push %ebp
0x080483e5 : mov %esp,%ebp
=> 0x080483e7 : lea 0x8(%ebp),%edx
0x080483ea : lea 0xc(%ebp),%eax
0x080483ed : mov %edx,%ecx
0x080483ef : sub %eax,%ecx
0x080483f1 : mov %ecx,%eax
0x080483f3 : pop %ebp
0x080483f4 : ret
End of assembler dump.
and with optimization it just knows their relative addresses:
(edit: answer was truncated here for some reason)
(gdb) disassemble func
Dump of assembler code for function func:
0x08048410 : push %ebp
0x08048411 : mov $0xfffffffc,%eax
0x08048416 : mov %esp,%ebp
0x08048418 : pop %ebp
0x08048419 : ret
End of assembler dump.
The interesting thing is that with -O4 optimization it returns +4 and without it, it returns -4.
Why are you trying to do this anyhow? There's no guarantee in general that the arguments have any memory address: they may be passed in registers.

Related

Where is the const& args stored?

Here is the function definition
const int& test_const_ref(const int& a) {
return a;
}
and calling it from main
int main() {
auto& x = test_const_ref(1);
printf("%d, %p\n", x, &x);
}
output as following
./debug/main
>>> 1, 0x7ffee237285c
and here is the disassembly code of test_const_ref
test_const_ref(int const&):
pushq %rbp
movq %rsp, %rbp
movq %rdi, -0x8(%rbp)
movq -0x8(%rbp), %rax
popq %rbp
retq
The question is: what does the variable x alias, i.e. where is the number 1 that I passed to test_const_ref stored?
The code exhibits undefined behavior - the function test_const_ref returns a reference to a temporary, which lives until the end of the full-expression (the ;), and any dereference of it afterwards accesses a dangling reference.
Appearing to work is a common manifestation of UB. The program is still wrong. With optimization on, for example, Clang 12 -O2 prints: 0.
Note - there's no error in the function test_const_ref itself (apart from a design error). The UB is in main, where the dereference of the dangling int& happens during a call to printf.
Where the temporary int is stored exactly is implementation detail - but in many cases (in a Debug build, when a function isn't inlined), it would be stored on the stack:
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov dword ptr [rbp - 12], 1 # Here the 1 is stored in the stack frame
lea rdi, [rbp - 12]
call test_const_ref(int const&)
mov qword ptr [rbp - 8], rax
mov rax, qword ptr [rbp - 8]
mov esi, dword ptr [rax]
mov rdx, qword ptr [rbp - 8]
movabs rdi, offset .L.str
mov al, 0
call printf
So any subsequent use of the returned reference will access memory at [rbp - 12], that may already have been re-used for other purposes.
Note also that the compiler doesn't actually generate assembly from C++ code; it merely uses the C++ code to understand the intent, and generates another program that produces the intended output. This is known as the as-if rule. In the presence of undefined behavior, the compiler becomes free from this restriction, and may generate any output, rendering the program meaningless.
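As a minimal sketch of how the dangling reference described above can be avoided (the helper names test_by_value, read_through_named_variable and read_copy_of_temporary are hypothetical, not from the question): either bind the reference to a named object that outlives it, or return by value so the caller gets its own copy.

```cpp
#include <cassert>

// Same signature as in the question: returns whatever reference it was given.
const int& test_const_ref(const int& a) { return a; }

// Alternative: return by value, so the caller owns a copy.
int test_by_value(const int& a) { return a; }

int read_through_named_variable() {
    int one = 1;                         // named object outlives the reference
    const int& x = test_const_ref(one);  // x refers to `one`, not a temporary
    return x;
}

int read_copy_of_temporary() {
    int x = test_by_value(1);            // the temporary dies, but x is a copy
    return x;
}
```

Both functions read a live object, so their result does not depend on the optimization level.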
Good answers have already been given, but wrapperm explained this topic very well in here. It's going to be stored on the stack in most implementations I'm aware of.
1. The function
The language doesn't define where arguments to functions are stored. Different ABIs, for different platforms, define this.
Typically, a function argument, before any optimization, is stored on the stack. A reference is no different in this respect. What's actually stored would be a pointer to the referred-to object. Think of it this way:
const int* test_const_ref(const int* a) {
return a;
}
2. The temporary
If you were to declare a variable int foo; and call test_const_ref(foo), you know that the result would refer to foo. Since you're calling it with a temporary, all bets are off: As @fabian notes in a comment, the language only guarantees the value exists until the end of the assignment statement. Afterwards, the reference is dangling.
In practice, and in your case: A compiler which allocates stack space for the temporary integer 1 will have x refer to that place, and will not use it for something else before x is defined. But if your compiler optimizes that stack allocation away - e.g. passes 1 via a register - then x has nothing to refer to and may hold junk. It might even be undefined behavior (not quite sure about that). If you're lucky, you'll get a compiler warning about it (GodBolt.org).

Coroutine frame seems to be prematurely marked as destroyed when using address sanitizer

I am trying to write a small and simple coroutine library just to get a more solid understanding of C++20 coroutines. It seems to work fine, but when I compile with clang's address sanitizer, it throws up on me.
I have narrowed down the issue to the following code example (available with compiler and sanitizer output at https://godbolt.org/z/WqY6Gd), but I still can't make any sense of it.
// namespace coro = std::/std::experimental;
// inlining this suppresses the error
__attribute__((noinline)) void foo(int& i) { i = 0; }
struct task {
struct promise_type {
promise_type() = default;
coro::suspend_always initial_suspend() noexcept { return {}; }
coro::suspend_always final_suspend() noexcept { return {}; }
void unhandled_exception() noexcept { std::terminate(); }
void return_value(int v) noexcept { value = v; }
task get_return_object() {
return task{coro::coroutine_handle<promise_type>::from_promise(*this)};
}
int value{};
};
void Start() { return handle_.resume(); }
int Get() {
auto& promise = handle_.promise();
return promise.value;
}
coro::coroutine_handle<promise_type> handle_;
};
task func() { co_return 3; }
int main() {
auto t = func();
t.Start();
const auto result = t.Get();
foo(t.handle_.promise().value);
// moving this one line down or separating this into a noinline
// function suppresses the error
// removing this removes the stack-use-after-scope, but (rightfully) reports a leak
t.handle_.destroy();
if (result != 3) return 1;
}
Address sanitizer reports use-after-scope (full output available at godbolt, link above).
With some help from lldb, I found out that the error is thrown in main, more precisely: the jump at line 112 in the assembly listing, jne .LBB2_15, jumps to asan's report and never returns. It seems to be inside main's prologue.
As the comments indicate, moving destroy() a line down or calling it in a separate noinline function [1] changes the behavior of address sanitizer. The only two explanations for this seem to be undefined behavior and asan throwing a false positive (or -fsanitize=address itself is creating lifetime issues, which is sort of the same in a sense).
At this point I'm fairly certain that there's no UB in the code above: both task and result live on main's stack frame, the promise object lives in the coroutine frame. The frame itself is allocated (on main's stack because no suspend-points) at line 1 of main, and destroyed right before returning, past the last access to it in foo(). The coroutine frame is not destroyed automatically because control never flows off co_await final_suspend(), as per the standard. I've been staring at this code for a while, though, so please forgive me if I missed something obvious.
The assembly generated without sanitization seems to make sense to me, and all the memory accesses happen within [rsp, rsp+24], as allocated. Furthermore, compiling with -fsanitize=address,undefined, or just -fsanitize=undefined, or simply compiling with gcc with -fsanitize=address reports no errors, which leads me to believe the issue is hidden somewhere in the code generated by asan.
Unfortunately, I can't quite make sense of what exactly happens in the code instrumented by asan, and that's why I'm posting this. I have a general understanding of Address sanitizer's algorithm, but I can't map the assembly memory access/allocations to what's happening in the C++ code.
I'm hoping that an answer will help me
Understand where the lifetime issues are hidden, if there are any
Understand what exactly happens in main when compiled with asan, so that a person reading this can have a more clear way of finding what memory access in the C++ code triggered the error, and where (if anywhere) was that memory allocated and freed.
Consistently suppress this particular false positive, and elaborate a bit on what causes it, if the issue really is in asan.
Thanks in advance.
[1] This initially led me to believe that clang's optimizer is reading result from the (destroyed) coroutine's frame directly, but moving destroy() into task's destructor brings the issue back and proves that theory wrong, as far as I can tell. destroy() is not in the destructor in the listing above because it requires implementing move construction/assignment in order to avoid double free, and I wanted to keep the example as small and clear as possible.
I think I figured it out - but mostly because it's already fixed in clang12.0.
Running the smaller/cleaner example with clang-12 shows no error from asan. The difference is in the following lines:
movabs rcx, -866669180174077455
mov qword ptr [r13 + 2147450880], rcx
mov dword ptr [r13 + 2147450888], -202116109
lea rdi, [r12 + 40]
mov rcx, rdi
shr rcx, 3
cmp byte ptr [rcx + 2147450880], 0
jne .LBB2_14
lea r14, [r12 + 32]
mov qword ptr [r14 + 8], offset f() [clone .cleanup]
lea rdi, [r14 + 16]
mov byte ptr [r13 + 2147450884], 0
mov rcx, rdi
shr rcx, 3
mov dl, byte ptr [rcx + 2147450880]
test dl, dl
jne .LBB2_7
.LBB2_8:
mov qword ptr [rbx + 16], rax # 8-byte Spill
mov dword ptr [r14 + 16], 0
Which clang-11 has, and clang-12 doesn't. From the looks of it, the address sanitizer tries to check that r12+40 (which should be the promise's cleanup method) is initialized before initializing it.
Clang-12 just performs no checks for the promise, leaving the entirety of the code above out.
TL;DR: (probably) a bug in clang-11 coroutine sanitation, fixed in 12.0, perhaps in later versions of clang-11 as well.
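For readers trying to map the listing back to what the sanitizer does: the recurring shr rcx, 3 followed by an access at [rcx + 2147450880] is ASan's shadow-memory lookup. With the default x86-64 Linux mapping, the shadow byte for an address lives at (addr >> 3) + 0x7fff8000, and 2147450880 is 0x7fff8000 in decimal. A minimal sketch of that arithmetic follows; the helper name is hypothetical, and other targets use a different offset.

```cpp
#include <cassert>
#include <cstdint>

// Default AddressSanitizer mapping on x86-64 Linux (assumption: the build in
// the question uses this default). One shadow byte describes 8 app bytes.
constexpr std::uint64_t kShadowScale  = 3;
constexpr std::uint64_t kShadowOffset = 0x7fff8000;  // 2147450880 in decimal

constexpr std::uint64_t shadow_address(std::uint64_t app_address) {
    return (app_address >> kShadowScale) + kShadowOffset;
}

static_assert(kShadowOffset == 2147450880ULL,
              "same constant as in the assembly listing");
```

Read this way, cmp byte ptr [rcx + 2147450880], 0 followed by jne tests whether the 8-byte granule at the address in rdi is marked as poisoned, and the stores to [r13 + 2147450880] are the prologue writing poison values into the shadow of main's frame.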

c++ gcc inline assembly does not seem to work

I am trying to figure out gcc inline assembly in C++. The following code works in Visual C++ without % and other operands, but I could not make it work with gcc
void function(const char* text) {
DWORD addr = (DWORD)text;
DWORD fncAddr = 0x004169E0;
asm(
"push %0" "\n"
"call %1" "\n"
"add esp, 04" "\n"
: "=r" (addr) : "d" (fncAddr)
);
}
I am injecting a dll to a process on runtime and fncAddr is an address of a function. It never changes. As I said it works with Visual C++
VC++ equivalent of that function:
void function(const char* text) {
DWORD addr = (DWORD)text;
DWORD fncAddr = 0x004169E0;
__asm {
push addr
call fncAddr
add esp, 04
}
}
Edit:
I changed my function to this: now it crashes
void sendPacket(const char* msg) {
DWORD addr = (DWORD)msg;
DWORD fncAddr = 0x004169E0;
asm(
".intel_syntax noprefix" "\n"
"pusha" "\n"
"push %0" "\n"
"call %1" "\n"
"add esp, 04" "\n"
"popa" "\n"
:
: "r" (addr) , "d"(fncAddr) : "memory"
);
}
Edit:
004169E0 /$ 8B0D B4D38100 MOV ECX,DWORD PTR DS:[81D3B4]
004169E6 |. 85C9 TEST ECX,ECX
004169E8 |. 74 0A JE SHORT client_6.004169F4
004169EA |. 8B4424 04 MOV EAX,DWORD PTR SS:[ESP+4]
004169EE |. 50 PUSH EAX
004169EF |. E8 7C3F0000 CALL client_6.0041A970
004169F4 \> C3 RETN
The function I'm calling is above. I changed it to a function pointer cast
char_func_t func = (char_func_t)0x004169E0;
func(text);
like this and it crashed too, but surprisingly sometimes it works. I attached a debugger and it gave an access violation at some address that does not exist
on callstack the last call is this:
004169EF |. E8 7C3F0000 CALL client_6.0041A970
LAST EDIT:
I gave up on inline assembly; instead I wrote the instructions I wanted byte by byte and it works like a charm
void function(const char* text) {
DWORD fncAddr = 0x004169E0;
char *buff = new char[50]; //extra bytes for no reason
memset((void*)buff, 0x90, 50);
*((BYTE*)buff) = 0x68; // push
*((DWORD*)(buff + 1)) = ((DWORD)text);
*((BYTE*)buff+5) = 0xE8; //call
*((DWORD*)(buff + 6)) = ((DWORD)fncAddr) - ((DWORD)&(buff[5]) + 5);
*((BYTE*)(buff + 10)) = 0x83; // add esp, 04
*((BYTE*)(buff + 11)) = 0xC4;
*((BYTE*)(buff + 12)) = 0x04;
*((BYTE*)(buff + 13)) = 0xC3; // ret
typedef void(*char_func_t)(void);
char_func_t func = (char_func_t)buff;
func();
delete[] buff;
}
Thank you all
Your current version with pusha / popa looks correct (slow but safe), unless your calling convention depends on maintaining 16-byte stack alignment.
If it's crashing, your real problem is somewhere else, so you should use a debugger and find out where it crashes.
Declaring clobbers on eax / ecx / edx, or asking for the pointers in two of those registers and clobbering the third, would let you avoid pusha / popa. (Or whatever the call-clobbered regs are for the calling convention you're using.)
You should remove the .intel_syntax noprefix. You already depend on compiling with -masm=intel, because you don't restore the previous mode in case it was AT&T. (I don't think there is a way to save/restore the old mode, unfortunately, but there is a dialect-alternatives mechanism for using different templates for different syntax modes.)
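As a hedged illustration of the dialect-alternatives mechanism mentioned above (add_with_asm is a hypothetical helper, unrelated to the code in the question): inside {att|intel} braces the assembler template carries both spellings, and the compiler picks one depending on whether -masm=intel is in effect. The preprocessor guard is my own addition so the sketch still builds on non-x86 targets.

```cpp
#include <cassert>

// Adds y to x. On x86 with GCC or Clang the addition is done in inline asm
// using {att|intel} dialect alternatives, so the same source assembles with
// or without -masm=intel. Elsewhere it falls back to plain C++.
int add_with_asm(int x, int y) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    asm("add{l %1, %0| %0, %1}"
        : "+r"(x)   // %0: read-write register operand
        : "r"(y)    // %1: read-only register operand
        : "cc");    // add modifies the flags
#else
    x += y;
#endif
    return x;
}
```

In AT&T mode the template expands to addl %1, %0 and in Intel mode to add %0, %1, so no .intel_syntax directive is needed and the assembler mode is never left in a different state than the compiler expects.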
You don't need and shouldn't use inline asm for this
compilers know how to make function calls already, when you're using a standard calling convention (in this case: stack args in 32-bit mode which is normally the default).
It's valid C++ to cast an integer to a function pointer, and it's not even undefined behaviour if there really is a function there at that address.
void function(const char* text) {
typedef void (*char_func_t)(const char *);
char_func_t func = (char_func_t)0x004169E0;
func(text);
}
As a bonus, this compiles more efficiently with MSVC than your asm version, too.
You can use GCC function attributes on function pointers to specify the calling convention explicitly, in case you compile with a different default. For example __attribute__((cdecl)) to explicitly specify stack args and caller-pops for calls using that function pointer. The MSVC equivalent is just __cdecl.
#ifdef __GNUC__
#define CDECL __attribute__((cdecl))
#define STDCALL __attribute__((stdcall))
#elif defined(_MSC_VER)
#define CDECL __cdecl
#define STDCALL __stdcall
#else
#define CDECL /*empty*/
#define STDCALL /*empty*/
#endif
// With STDCALL instead of CDECL, this function has to translate from one calling convention to another
// so it can't compile to just a jmp tailcall
void function(const char* text) {
typedef void (CDECL *char_func_t)(const char *);
char_func_t func = (char_func_t)0x004169E0;
func(text);
}
To see the compiler's asm output, I put this on the Godbolt compiler explorer. I used the "intel-syntax" option, so gcc output comes from gcc -S -masm=intel
# gcc8.1 -O3 -m32 (the 32-bit Linux calling convention is close enough to Windows)
# except it requires maintaining 16-byte stack alignment.
function(char const*):
mov eax, 4286944
jmp eax # tail-call with the args still where we got them
This test caller makes the compiler set up args and not just a tail-call, but function can inline into it.
int caller() {
function("hello world");
return 0;
}
.LC0:
.string "hello world"
caller():
sub esp, 24 # reserve way more stack than it needs to reach 16-byte alignment, IDK why.
mov eax, 4286944 # your function pointer
push OFFSET FLAT:.LC0 # addr becomes an immediate
call eax
xor eax, eax # return 0
add esp, 28 # add esp, 4 folded into this
ret
MSVC's -Ox output for caller is essentially the same:
caller PROC
push OFFSET $SG2661
mov eax, 4286944 ; 004169e0H
call eax
add esp, 4
xor eax, eax
ret 0
But a version using your inline asm is much worse:
;; MSVC -Ox on a caller() that uses your asm implementation of function()
caller_asm PROC
push ebp
mov ebp, esp
sub esp, 8
; store inline asm inputs to the stack
mov DWORD PTR _addr$2[ebp], OFFSET $SG2671
mov DWORD PTR _fncAddr$1[ebp], 4286944 ; 004169e0H
push DWORD PTR _addr$2[ebp] ; then reload as memory operands
call DWORD PTR _fncAddr$1[ebp]
add esp, 4
xor eax, eax
mov esp, ebp ; makes the add esp,4 redundant in this case
pop ebp
ret 0
MSVC inline asm syntax basically sucks, because unlike GNU C asm syntax the inputs always have to be in memory, not registers or immediates. So you could do better with GNU C, but not as good as you can do by avoiding inline asm altogether. https://gcc.gnu.org/wiki/DontUseInlineAsm.
Making function calls from inline asm is generally to be avoided; it's much safer and more efficient when the compiler knows what's happening.
Here's an example of inline assembly with gcc.
Routine "vazio" hosts assembly code for routine "rotina" (vazio and rotina are simply labels). Note the use of Intel syntax by means of a directive; gcc defaults to AT&T.
I recovered this code from an old sub-directory; variables in assembly code were prefixed with "_", as in "_str" (that's the standard C convention). I confess that, here and now, I have no idea as to why the compiler is accepting "str" instead... Anyway:
compiled correctly with gcc/g++ versions 5 and 7! Hope this helps. Simply call "gcc main.c", or "gcc -S main.c" if you want to see the asm result, and "gcc -S -masm=intel main.c" for Intel output.
#include <stdio.h>
char str[] = "abcdefg";
// C routine, acts as a container for "rotina"
void vazio (void) {
asm(".intel_syntax noprefix");
asm("rotina:");
asm("inc eax");
// EBX = address of str
asm("lea ebx, str");
// ++str[0]
asm("inc byte ptr [ebx]");
asm("ret");
asm(".att_syntax noprefix");
}
// global variables make things simpler
int a;
int main(void) {
a = -7;
puts ("antes");
puts (str);
printf("a = %d\n\n", a);
asm(".intel_syntax noprefix");
asm("mov eax, 0");
asm("call rotina");
// modify variable a
asm("mov a, eax");
asm(".att_syntax noprefix");
printf("depois: \n a = %d\n", a);
puts (str);
return 0;
}

Accessing Assembly language from C++

This is my programming assignment. I need to find out the largest among the array of integers using a method written in 8086 programming language. This is my attempt :
#include <iostream.h>
#include <conio.h>
int returnLargest(int a[])
{
int max;
asm mov si,offset a
for(int i=0;i<6;i++) //Assuming six numbers in the array...Can be set to a variable 'n' later
{
asm mov ax,[si]
asm mov max,ax
asm inc si
cout<<max<<"\n"; //Just to see what is there in the memory location
}
asm mov si,offset a
asm mov cx,0000h
asm mov dx, [si]
asm mov cx,06h
skip: asm mov si,offset a
asm mov bx,[si]
asm mov max,bx
asm inc si
abc: asm mov bx,max
asm cmp [si],bx
asm jl ok
asm mov bx,[si]
asm mov max,bx
ok: asm loop abc
asm mov ax,max
return max;
}
void main()
{
clrscr();
int n;
int a[]={1,2,3,4,5,6};
n=returnLargest(a);
cout<<n; //Prints the largest
getch();
}
The expected answer is
1
2
3
4
5
6
6. But what I get is this :
Here I sit down and think... Isn't the value at index i of the array actually stored in memory? Because at least we were taught that if a[i] is 12 (say), then the ith memory location has the number 12 written inside it.
Or if the value isn't stored at the memory location, how do I write into the memory location so as to accomplish the desired task?
Also I request you all to link some material on net/paperback so as to brush-up on these concepts.
EDIT :
The same code in assembly works just fine...
data segment
a db 01h,02h,03h,04h,05h,06h,'$'
max db ?
data ends
code segment
start:
assume cs:code,ds:data
mov ax,data
mov ds,ax
mov si,offset a
mov cx,0000h
back: mov dl,byte ptr [si]
cmp dl,'$'
je skip
inc cx
inc si
jmp back
skip: mov si,offset a
mov bl,byte ptr[si]
mov max,bl
inc si
abc: mov bl,max
cmp [si],bl
jl ok
mov bl,[si]
mov max,bl
ok: loop abc
mov al,max
int 03h
code ends
end start
mov si,offset a is incorrect. When you have a function parameter declared as int a[], the function actually receives a pointer. Since you want the pointer value (a) rather than its address (&a in C, offset a in assembly), use mov si, a.
Additionally, inc si doesn't seem right - you need to increase si by sizeof(int) for each element.
Edit:
You are mixing C++ code (for loop, cout) with your assembly. The C++ code is likely to use the same registers, which would cause conflicts. You should avoid doing this.
You also need to find out which registers your function is allowed to change according to the calling convention used. If you use any registers which aren't allowed to change, you need to push them at the beginning and pop them at the end.
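The two points about the pointer parameter and the step size can be sketched in portable C++ (this is an illustration with hypothetical function names, not a replacement for the 8086 assignment code): a parameter written as int a[] is really a pointer, and advancing that pointer by one element moves it by sizeof(int) bytes, not by one byte.

```cpp
#include <cassert>
#include <cstddef>

// `const int a[]` in a parameter list means `const int* a`: the function
// receives the address of the first element, not a copy of the array.
int largest(const int a[], std::size_t n) {
    assert(n > 0);
    const int* p = a;   // use the pointer's value (the `mov si, a` case)
    int max = *p;
    for (std::size_t i = 1; i < n; ++i) {
        ++p;            // advances by sizeof(int) bytes, not by 1
        if (*p > max) max = *p;
    }
    return max;
}

// Byte distance between consecutive elements, to show the step size.
std::size_t element_stride(const int a[]) {
    return static_cast<std::size_t>(reinterpret_cast<const char*>(a + 1) -
                                    reinterpret_cast<const char*>(a));
}
```

In the 16-bit assembly version this corresponds to adding 2 to si per element (a Turbo C++ int is two bytes), whereas the pure-assembly program in the question steps by 1 only because its array is declared with db.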
You will have to make sure your compiler doesn't use your registers. The best way would be to write the entire function in assembly and implement a desired calling convention (cdecl or stdcall, whatever). Then call that function from C/C++.
However, if you know you will use only one compiler and how it works, you shouldn't have any problems with inline assembly, but it's really a pitfall.

Register keyword in C++

What is difference between
int x=7;
and
register int x=7;
?
I am using C++.
register is a hint to the compiler, advising it to store that variable in a processor register instead of memory (for example, instead of the stack).
The compiler may or may not follow that hint.
According to Herb Sutter in "Keywords That Aren't (or, Comments by Another Name)":
A register specifier has the same
semantics as an auto specifier...
According to Herb Sutter, register is "exactly as meaningful as whitespace" and has no effect on the semantics of a C++ program.
In C++ as it existed in 2010, any program which is valid that uses the keywords "auto" or "register" will be semantically identical to one with those keywords removed (unless they appear in stringized macros or other similar contexts). In that sense the keywords are useless for properly-compiling programs. On the other hand, the keywords might be useful in certain macro contexts to ensure that improper usage of a macro will cause a compile-time error rather than producing bogus code.
In C++11 and later versions of the language, the auto keyword was re-purposed to act as a pseudo-type for objects which are initialized, which a compiler will automatically replace with the type of the initializing expression. Thus, in C++03, the declaration: auto int i=(unsigned char)5; was equivalent to int i=5; when used within a block context, and auto i=(unsigned char)5; was a constraint violation. In C++11, auto int i=(unsigned char)5; became a constraint violation while auto i=(unsigned char)5; became equivalent to unsigned char i=5;.
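The C++11 meaning of auto described above can be checked directly. This is a small sketch (the function name is hypothetical) that assumes a C++11 or later compiler:

```cpp
#include <cassert>
#include <type_traits>

// C++11 and later: `auto` deduces the variable's type from its initializer.
inline bool auto_deduces_unsigned_char() {
    auto i = (unsigned char)5;
    static_assert(std::is_same<decltype(i), unsigned char>::value,
                  "i is an unsigned char, not an int");
    return i == 5;
}
```

The static_assert fires at compile time if the deduced type were anything other than unsigned char, which is what distinguishes this from the C++03 reading of auto as a storage-class specifier.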
With today's compilers, probably nothing. It was originally a hint to place a variable in a register for faster access, but most compilers today ignore that hint and decide for themselves.
register is deprecated in C++11. It is unused and reserved in C++17.
Source: http://en.cppreference.com/w/cpp/keyword/register
Almost certainly nothing.
register is a hint to the compiler that you plan on using x a lot, and that you think it should be placed in a register.
However, compilers are now far better at determining what values should be placed in registers than the average (or even expert) programmer is, so compilers just ignore the keyword, and do what they want.
The register keyword was useful for:
Inline assembly.
Expert C/C++ programming.
Cacheable variables declaration.
An example of a productive system, where the register keyword was required:
typedef unsigned long long Out;
volatile Out out,tmp;
Out register rax asm("rax");
asm volatile("rdtsc":"=A"(rax));
out=out*tmp+rax;
It has been deprecated since C++11 and is unused and reserved in C++17.
As of gcc 9.3, compiling using -std=c++2a, register produces a compiler warning, but it still has the desired effect and behaves identically to C's register when compiling without -O1–-Ofast optimisation flags in the respect of this answer. Using clang++-7 causes a compiler error however. So yes, register optimisations only make a difference on standard compilation with no optimisation -O flags, but they're basic optimisations that the compiler would figure out even with -O1.
The only difference is that in C++, you are allowed to take the address of the register variable which means that the optimisation only occurs if you don't take the address of the variable or its aliases (to create a pointer) or take a reference of it in the code (only on - O0, because a reference also has an address, because it's a const pointer on the stack, which, like a pointer can be optimised off the stack if compiling using -Ofast, except they will never appear on the stack using -Ofast, because unlike a pointer, they cannot be made volatile and their addresses cannot be taken), otherwise it will behave like you hadn't used register, and the value will be stored on the stack.
On -O0, another difference is that const register on gcc C and gcc C++ do not behave the same. On gcc C, const register behaves like register, because block-scope consts are not optimised on gcc. On clang C, register does nothing and only const block-scope optimisations apply. On gcc C, register optimisations apply but const at block-scope has no optimisation. On gcc C++, both register and const block-scope optimisations combine.
#include <stdio.h> //yes it's C code on C++
int main(void) {
const register int i = 3;
printf("%d", i);
return 0;
}
int i = 3;:
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 3
mov eax, DWORD PTR [rbp-4]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret
register int i = 3;:
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
push rbx
sub rsp, 8
mov ebx, 3
mov esi, ebx
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
mov rbx, QWORD PTR [rbp-8] //callee restoration
leave
ret
const int i = 3;
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 3 //still saves to stack
mov esi, 3 //immediate substitution
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret
const register int i = 3;
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
mov esi, 3 //loads straight into esi saving rbx push/pop and extra indirection (because C++ block-scope const is always substituted immediately into the instruction)
mov edi, OFFSET FLAT:.LC0 // can't optimise away because printf only takes const char*
mov eax, 0 //zeroed: https://stackoverflow.com/a/6212755/7194773
call printf
mov eax, 0 //default return value of main is 0
pop rbp //nothing else pushed to stack -- more efficient than leave (rsp == rbp already)
ret
register tells the compiler to 1)store a local variable in a callee saved register, in this case rbx, and 2)optimise out stack writes if address of variable is never taken. const tells the compiler to substitute the value immediately (instead of assigning it a register or loading it from memory) and write the local variable to the stack as default behaviour. const register is the combination of these emboldened optimisations. This is as slimline as it gets.
Also, on gcc C and C++, register on its own seems to create a random 16 byte gap on the stack for the first local on the stack, which doesn't happen with const register.
Compiling using -Ofast however; register has 0 optimisation effect because if it can be put in a register or made immediate, it always will be and if it can't it won't be; const still optimises out the load on C and C++ but at file scope only; volatile still forces the values to be stored and loaded from the stack.
.LC0:
.string "%d"
main:
//optimises out push and change of rbp
sub rsp, 8 //https://stackoverflow.com/a/40344912/7194773
mov esi, 3
mov edi, OFFSET FLAT:.LC0
xor eax, eax //xor 2 bytes vs 5 for mov eax, 0
call printf
xor eax, eax
add rsp, 8
ret
Consider a case when compiler's optimizer has two variables and is forced to spill one onto stack. It so happened that both variables have the same weight to the compiler. Given there is no difference, the compiler will arbitrarily spill one of the variables. On the other hand, the register keyword gives compiler a hint which variable will be accessed more frequently. It is similar to x86 prefetch instruction, but for compiler optimizer.
Obviously register hints are similar to user-provided branch probability hints, and can be inferred from these probability hints. If compiler knows that some branch is taken often, it will keep branch related variables in registers. So I suggest caring more about branch hints, and forgetting about register. Ideally your profiler should communicate somehow with the compiler and spare you from even thinking about such nuances.
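For reference, a user-provided branch hint of the kind mentioned above might look like the following sketch. It assumes GCC or Clang, where __builtin_expect is available as an extension; the LIKELY/UNLIKELY macro names and the function are hypothetical, and on other compilers the macros degrade to plain conditions.

```cpp
#include <cassert>

// Hypothetical helper macros around the GCC/Clang __builtin_expect extension.
#if defined(__GNUC__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

// Sums the non-negative entries; negative values are assumed to be rare.
long sum_non_negative(const int* data, int n) {
    long total = 0;
    for (int i = 0; i < n; ++i) {
        if (UNLIKELY(data[i] < 0)) continue;  // hint: this branch is rarely taken
        total += data[i];
    }
    return total;
}
```

The hint only influences code layout and, possibly, register allocation decisions; the result of the function is the same with or without it.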