How does a C++ compiler compile variable names? [closed] - c++

Closed 11 years ago.
I understand I did not make myself clear. I think my doubt can be summed up as follows:
In an executable file (machine code), how are "variables" represented? Are they static memory addresses? Does the compiler give each one a specific "name" (or just keep the one you gave them)?
Expressed in code:
int x = 5;
int y;
// Bunch of code
cin >> y;
cout << x + 1;
How does the program, on each and every machine, know which address is going to hold the value 5, which will hold the input value, how to add 1 to the value it now holds, and finally how to print that same value?
--João

It's implementation-specific.
Typically, the location of variables will be based on all sorts of factors and optimizations. They may not live in RAM at all, as they may be optimised to live entirely within registers, or optimised away entirely.
Variable names don't exist at run-time; they're discarded during compilation. However, the compiler may emit debug information that's stored in the application binary, to allow the developers to debug the application. This is usually removed in release versions, though.
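For instance, on a typical Linux toolchain producing DWARF debug information, you can watch the names appear and disappear with something like this (main.cpp is a placeholder name):
g++ -g main.cpp -o main
readelf --debug-dump=info main | grep DW_AT_name
strip main
The readelf output lists the variable names from the debug sections; after strip, they are gone.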
I have no idea about the specifics of Gameshark. But in many cases, the location of a particular variable can be figured out by taking a look at the machine code for the application.

Here is a simple program in C:
int main() {
int a = 5;
int b = 7;
int c = a + b;
return 0;
}
If you compile it with gcc -m32 -S -O0 -o main.s main.c under Linux, you'll get something like this:
.file "main.c"
.text
.globl main
.type main, #function
main:
.LFB0:
/* %ebp is a Base Pointer Register */
pushl %ebp
movl %esp, %ebp
/* Here we reserve space for our variables */
subl $16, %esp
/* a's address is %ebp - 4 */
movl $5, -4(%ebp)
/* b's address is %ebp - 8 */
movl $7, -8(%ebp)
/* a + b */
movl -8(%ebp), %eax
movl -4(%ebp), %edx
addl %edx, %eax
/* c's address is %ebp - 12 */
movl %eax, -12(%ebp)
/* return 0 */
movl $0, %eax
leave
ret
As you can see, in this case the variables' addresses are computed as offsets from the function's base pointer. If you enable optimisations, the variables' values may be kept in registers instead.
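For comparison, and as a hedged illustration rather than verbatim compiler output: recompiling with optimisations enabled (gcc -m32 -S -O2 -o main.s main.c) typically collapses main to little more than
main:
    xorl %eax, %eax /* return 0; a, b and c never exist at run time */
    ret
since the stores to a, b and c are dead and are removed entirely.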

So there are two parts to this, and I'll do my best.
When compiling, a compiler converts the C++ code into an internal representation. It then allocates the CPU's registers as efficiently as possible and places the remaining data in RAM. As the program executes, data is copied back and forth between RAM and registers.
On your other question, one method I've seen people use is for something like the gold a player has. A program takes the entire memory space of the game and copies it. Then the user performs some minimal action to gain or lose gold. The external application then searches the entire memory space for values that changed from the original amount of gold to the current amount. Once it finds this location, it can edit that memory location and update it with whatever value it wants.
Generally, the more complicated the game is, the harder that method is.
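To make the idea concrete, here is a minimal sketch of the snapshot-diffing step. It works on two in-process byte buffers instead of a real game's address space (reading another process's memory is OS-specific, e.g. ReadProcessMemory on Windows or /proc/<pid>/mem on Linux), and every name in it is invented for illustration:
#include <cstdint>
#include <cstring>
#include <iostream>
#include <vector>

// Find every offset where a 32-bit value changed from `before` to `after`.
// A real tool would run this scan over snapshots of the game's memory.
// Assumes both snapshots are the same size.
std::vector<size_t> find_changed(const std::vector<uint8_t>& snap1,
                                 const std::vector<uint8_t>& snap2,
                                 uint32_t before, uint32_t after) {
    std::vector<size_t> hits;
    for (size_t off = 0; off + sizeof(uint32_t) <= snap1.size(); ++off) {
        uint32_t v1, v2;
        std::memcpy(&v1, &snap1[off], sizeof v1);
        std::memcpy(&v2, &snap2[off], sizeof v2);
        if (v1 == before && v2 == after)
            hits.push_back(off);  // candidate location of the "gold" value
    }
    return hits;
}

int main() {
    std::vector<uint8_t> snap1(64, 0), snap2(64, 0);
    uint32_t gold_before = 100, gold_after = 150;
    std::memcpy(&snap1[20], &gold_before, sizeof gold_before);  // before the action
    std::memcpy(&snap2[20], &gold_after, sizeof gold_after);    // after the action
    for (size_t off : find_changed(snap1, snap2, 100, 150))
        std::cout << "candidate offset: " << off << '\n';       // prints 20
}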

Related

Changing a number defined in a C++(C) program without compiling the source again

Suppose I have this simple program which prints a number:
#include <iostream>
int unique_id = 112233;
int main()
{
std::cout << unique_id;
return 0;
}
Then I compile it to something like a.exe. Now I want to create another application that opens a.exe and changes unique_id to something else. Is it possible?
I'm not going to pass a parameter to the program because of some restrictions.
I want to use the unique_id, as its name implies, to uniquely identify where my program is running. But I don't want to compile my program 1000 times for 1000 customers. I know I can use the hard disk serial number, but in virtual machines that serial number may be omitted. I know I can use the CPU serial number, but I read in SO posts that this serial number is deprecated. I know I can use the MAC address too :), but that address can be changed easily. So I decided to put the unique ID in the exe file itself.
Considering the motivation you added to the question, you could simply make the exe read the id from a .txt file, and ship a different .txt file with the exe for every customer.
Or, equivalently, you could make a DLL (or the equivalent for your platform) that has a function returning the id, and only recompile the DLL for every customer.
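A minimal sketch of the .txt approach (id.txt is just an example file name):
#include <fstream>
#include <iostream>

int main() {
    int unique_id = 0;
    std::ifstream in("id.txt");  // shipped next to the exe, one per customer
    if (!(in >> unique_id)) {
        std::cerr << "missing or invalid id.txt\n";
        return 1;
    }
    std::cout << unique_id;
    return 0;
}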
In general, you cannot change anything without re-compiling.
In practice, and in very limited cases, you might patch your binary. This is mostly processor-specific (and executable-format- and ABI-specific) and depends less on your particular operating system version (e.g. something that works on one Windows version is likely to work on the next).
(However, I don't know and have never used Windows; I'm only using Linux; you should adapt my answer to your operating system.)
So in some cases you might reverse-engineer your binary executable. If you do have the C source code, you could ask your compiler to emit the assembler code (e.g. by compiling with gcc -O -fverbose-asm -S with GCC). Then you might disassemble your executable, and change, with a binary or hexadecimal editor, the machine code containing that constant.
This won't always work, because the machine instruction (and its size) could depend on the magnitude (bit size) of your constant.
To take a simple example, in C, for GCC 7, on Linux/x86-64, consider the following C file:
/// A, B, C are preprocessor symbols defined as integers
int f(int x) {
if (x > 0)
return A*x + B;
return C;
}
If I compile that with gcc -fverbose-asm -S -O -DA=12751 -DB=32 -DC=11 e.c I'm getting:
.type f, #function
f:
.LFB0:
.cfi_startproc
# e.c:3: if (x > 0)
testl %edi, %edi # x
jle .L3 #,
# e.c:4: return A * x + B;
imull $12751, %edi, %edi #, x, tmp90
leal 32(%rdi), %eax #, <retval>
ret
.L3:
# e.c:5: return C;
movl $11, %eax #, <retval>
# e.c:6: }
ret
.cfi_endproc
.LFE0:
.size f, .-f
But if I do gcc -S -O -fverbose-asm -DA=12753 -DB=32 -DC=10 e.c I'm getting
.type f, #function
f:
.LFB0:
.cfi_startproc
# e.c:3: if (x > 0)
testl %edi, %edi # x
jle .L3 #,
# e.c:4: return A * x + B;
imull $12753, %edi, %edi #, x, tmp90
leal 32(%rdi), %eax #, <retval>
ret
.L3:
# e.c:5: return C;
movl $10, %eax #, <retval>
# e.c:6: }
ret
So indeed, in the above case I could patch the binary (I would need to find the 12751 and 11 constants in machine code; it is doable but tedious in that case).
Now, let's try with A being a small power of two, like 16, and C being 0, so
gcc -S -O -fverbose-asm -DA=16 -DB=32 -DC=0 e.c:
f:
.LFB0:
.cfi_startproc
# e.c:4: return A * x + B;
leal 2(%rdi), %eax #, tmp90
sall $4, %eax #, tmp93
testl %edi, %edi # x
movl $0, %edx #, tmp92
cmovle %edx, %eax # tmp93,, tmp92, <retval>
# e.c:6: }
ret
Because of compiler optimizations, the code changed significantly. It is not easy to patch.
Important notice
With enough effort, money and time (think of NSA-like abilities) a lot of things are possible.
If your goal is to obfuscate some data in your binary (e.g. some password), you might encrypt it to make hackers' lives harder (but don't be naive: the NSA will still be able to get it). Remember the motto: there is No Silver Bullet. That looks like your goal, but don't be too naive (BTW, the legal protections around your software, e.g. the license, matter even more, so you need a lawyer to write a good EULA).
If your goal is, on the contrary, to adapt some performance-critical code, you could use metaprogramming and partial-evaluation techniques. A practice I like is to generate at runtime some temporary C (or C++) code (better suited to your particular situation and data), compile that temporary code as a plugin, then dynamically load that temporary plugin (using dlopen and dlsym on Linux; on Windows you'll need LoadLibrary, but I leave you to work out the details and consequences). Instead of generating C or C++ code at runtime, you could use a JIT-compiling library like libgccjit. If you are fond of such techniques, consider instead using better programming languages (like Common Lisp with SBCL) if your management allows them.
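Here is a minimal, Linux-only sketch of that generate/compile/load cycle. The file names gen.c and gen.so and the symbol answer are invented for the example; link the host program with -ldl on older glibc:
#include <cstdlib>
#include <dlfcn.h>
#include <fstream>
#include <iostream>

int main() {
    // 1. Generate specialized C code at runtime.
    std::ofstream("gen.c") << "int answer(void) { return 42; }\n";

    // 2. Compile it into a shared object (the "temporary plugin").
    if (std::system("gcc -O2 -fPIC -shared gen.c -o gen.so") != 0)
        return 1;

    // 3. Dynamically load the plugin and look up the generated function.
    void* handle = dlopen("./gen.so", RTLD_NOW);
    if (!handle) { std::cerr << dlerror() << '\n'; return 1; }
    auto answer = reinterpret_cast<int (*)()>(dlsym(handle, "answer"));
    std::cout << answer() << '\n';  // prints 42

    dlclose(handle);
    return 0;
}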
But I don't want to compile my program 1000 times for 1000 customers
That surprises me a lot. Compiling a simple (short) C file containing just constants is quick, and linking time is also quick. I would instead consider recompilation for each customer.
BTW, I feel you are incredibly naive. The most important protection is not technical, in your binary; it is legal (and you need a good contract, so find and pay a good lawyer).
Did you consider, on the contrary, making your product free software? Many companies are doing that (and making money on something other than licenses, e.g. support).
NB: there are lots of existing license managers. Did you consider buying and using one? Notice also that corporations have large incentives to avoid cheating, and those determined to steal your software will be able to do so anyway. You'll sell more products by working on software quality, not by spending effort on vain "protection" measures which annoy your customers, increase your logistics, distribution and maintenance costs, and make customer-found bugs harder to debug.
No, the behaviour of changing a variable that is const is undefined. So you can't do this with standard C or C++.
Your best bet is to resort to an inline assembly solution; but note that UNIQUE_ID might be compiled out altogether (neither C nor C++ are reflective languages). In order to increase the probability of UNIQUE_ID being retained, remove the const qualifier and possibly introduce volatile.
Personally I'd pass UNIQUE_ID on the command line to your program.
Starting point: https://msdn.microsoft.com/en-us/library/fabdxz08.aspx
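A minimal sketch of the volatile suggestion above (dropping const, and adding volatile so the constant is more likely to survive into the binary where a patching tool can find it):
#include <iostream>

// volatile forces the compiler to keep a real, addressable object, so the
// 112233 bit pattern is likely to appear in the binary's data section.
volatile int unique_id = 112233;

int main() {
    std::cout << unique_id;
    return 0;
}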

Endless loop in C/C++ [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
There are several ways to write an endless loop; here are a few I would choose from:
for(;;) {}
while(1) {} / while(true) {}
do {} while(1) / do {} while(true)
Is there a certain form which one should choose? And do modern compilers make a difference between the middle and the last statement or does it realize that it is an endless loop and skips the checking part entirely?
Edit: As has been mentioned, I forgot goto, but I left it out because I don't like it as a command at all.
Edit 2: I ran some greps over the latest sources from kernel.org. It seems as if nothing much has changed over time (within the kernel, at least).
The problem with asking this question is that you'll get so many subjective answers that simply state "I prefer this...". Instead of making such pointless statements, I'll try to answer this question with facts and references, rather than personal opinions.
Through experience, we can probably start by excluding the do-while alternatives (and the goto), as they are not commonly used. I can't recall ever seeing them in live production code, written by professionals.
The while(1), while(true) and for(;;) are the 3 different versions commonly existing in real code. They are of course completely equivalent and result in the same machine code.
for(;;)
This is the original, canonical example of an eternal loop. In the ancient C bible The C Programming Language by Kernighan and Ritchie, we can read that:
K&R 2nd ed 3.5:
for (;;) {
...
}
is an "infinite" loop, presumably to be broken by other means, such
as a break or return. Whether to use while or for is largely a matter
of personal preference.
For a long while (but not forever), this book was regarded as canon and the very definition of the C language. Since K&R decided to show an example of for(;;), this would have been regarded as the most correct form at least up until the C standardization in 1990.
However, K&R themselves already stated that it was a matter of preference.
And today, K&R is a very questionable source to use as a canonical C reference. Not only is it outdated several times over (not addressing C99 nor C11), it also preaches programming practices that are often regarded as bad or blatantly dangerous in modern C programming.
But despite K&R being a questionable source, this historical aspect seems to be the strongest argument in favour of the for(;;).
The argument against the for(;;) loop is that it is somewhat obscure and unreadable. To understand what the code does, you must know the following rule from the standard:
ISO 9899:2011 6.8.5.3:
for ( clause-1 ; expression-2 ; expression-3 ) statement
/--/
Both clause-1 and expression-3 can be omitted. An omitted expression-2
is replaced by a nonzero constant.
Based on this text from the standard, I think most will agree that it is not only obscure, it is subtle as well, since the 1st and 3rd part of the for loop are treated differently than the 2nd, when omitted.
while(1)
This is supposedly a more readable form than for(;;). However, it relies on another obscure, although well-known rule, namely that C treats all non-zero expressions as boolean logical true. Every C programmer is aware of that, so it is not likely a big issue.
There is one big, practical problem with this form, namely that compilers tend to give a warning for it: "condition is always true" or similar. That is a good warning, of a kind which you really don't want to disable, because it is useful for finding various bugs. For example a bug such as while(i = 1), when the programmer intended to write while(i == 1).
Also, external static code analysers are likely to whine about "condition is always true".
while(true)
To make while(1) even more readable, some use while(true) instead. The consensus among programmers seems to be that this is the most readable form.
However, this form has the same problem as while(1), as described above: "condition is always true" warnings.
When it comes to C, this form has another disadvantage, namely that it uses the macro true from stdbool.h. So in order to make this compile, we need to include a header file, which may or may not be inconvenient. In C++ this isn't an issue, since bool exists as a primitive data type and true is a language keyword.
Yet another disadvantage of this form is that it uses the C99 bool type, which is only available on modern compilers and not backwards compatible. Again, this is only an issue in C and not in C++.
So which form to use? Neither seems perfect. It is, as K&R already said back in the dark ages, a matter of personal preference.
Personally, I always use for(;;) just to avoid the compiler/analyser warnings frequently generated by the other forms. But perhaps more importantly because of this:
If even a C beginner knows that for(;;) means an eternal loop, then who are you trying to make the code more readable for?
I guess that's what it all really boils down to. If you find yourself trying to make your source code readable for non-programmers, who don't even know the fundamental parts of the programming language, then you are only wasting time. They should not be reading your code.
And since everyone who should be reading your code already knows what for(;;) means, there is no point in making it further readable - it is already as readable as it gets.
It is very subjective. I write this:
while(true) {} //in C++
Because its intent is very clear and it is also readable: you look at it and you know an infinite loop is intended.
One might say for(;;) is also clear. But I would argue that because of its convoluted syntax, this option requires extra knowledge to conclude that it is an infinite loop, and hence it is relatively less clear. I would even say there are more programmers who don't know what for(;;) does (even if they know the usual for loop), while almost all programmers who know the while loop will immediately figure out what while(true) does.
To me, writing for(;;) to mean an infinite loop is like writing while() to mean an infinite loop; the former works, but the latter does NOT. In the former case the empty condition turns out to be implicitly true, but in the latter case it is an error! I personally don't like it.
Now while(1) is also in the competition. I would ask: why while(1)? Why not while(2), while(3) or while(0.1)? Well, whatever you write, you actually mean while(true); if so, then why not write that instead?
In C (if I ever write), I would probably write this:
while(1) {} //in C
While while(2), while(3) and while(0.1) would make equal sense, just to be conformant with other C programmers I would write while(1), because lots of C programmers write this and I find no reason to deviate from the norm.
In an ultimate act of boredom, I actually wrote a few versions of these loops and compiled them with GCC on my Mac mini.
The while(1){} and for(;;) {} loops produced the same assembly, while the do {} while(1); produced similar but different assembly.
Here's the one for the while/for loop:
.section __TEXT,__text,regular,pure_instructions
.globl _main
.align 4, 0x90
_main: ## #main
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
movl $0, -4(%rbp)
LBB0_1: ## =>This Inner Loop Header: Depth=1
jmp LBB0_1
.cfi_endproc
And here's the one for the do-while loop:
.section __TEXT,__text,regular,pure_instructions
.globl _main
.align 4, 0x90
_main: ## #main
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
movl $0, -4(%rbp)
LBB0_1: ## =>This Inner Loop Header: Depth=1
jmp LBB0_2
LBB0_2: ## in Loop: Header=BB0_1 Depth=1
movb $1, %al
testb $1, %al
jne LBB0_1
jmp LBB0_3
LBB0_3:
movl $0, %eax
popq %rbp
ret
.cfi_endproc
Everyone seems to like while (true):
https://stackoverflow.com/a/224142/1508519
https://stackoverflow.com/a/1401169/1508519
https://stackoverflow.com/a/1401165/1508519
https://stackoverflow.com/a/1401164/1508519
https://stackoverflow.com/a/1401176/1508519
According to SLaks, they compile identically.
Ben Zotto also says it doesn't matter:
It's not faster.
If you really care, compile with assembler output for your platform and look to see.
It doesn't matter. This never matters. Write your infinite loops however you like.
In response to user1216838, here's my attempt to reproduce his results.
Here's my machine:
cat /etc/*-release
CentOS release 6.4 (Final)
gcc version:
Target: x86_64-unknown-linux-gnu
Thread model: posix
gcc version 4.8.2 (GCC)
And test files:
// testing.cpp
#include <iostream>
int main() {
do { break; } while(1);
}
// testing2.cpp
#include <iostream>
int main() {
while(1) { break; }
}
// testing3.cpp
#include <iostream>
int main() {
while(true) { break; }
}
The commands:
gcc -S -o test1.asm testing.cpp
gcc -S -o test2.asm testing2.cpp
gcc -S -o test3.asm testing3.cpp
cmp test1.asm test2.asm
The only difference is the first line, aka the filename.
test1.asm test2.asm differ: byte 16, line 1
Output:
.file "testing2.cpp"
.local _ZStL8__ioinit
.comm _ZStL8__ioinit,1,1
.text
.globl main
.type main, #function
main:
.LFB969:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
nop
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE969:
.size main, .-main
.type _Z41__static_initialization_and_destruction_0ii, #function
_Z41__static_initialization_and_destruction_0ii:
.LFB970:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl %edi, -4(%rbp)
movl %esi, -8(%rbp)
cmpl $1, -4(%rbp)
jne .L3
cmpl $65535, -8(%rbp)
jne .L3
movl $_ZStL8__ioinit, %edi
call _ZNSt8ios_base4InitC1Ev
movl $__dso_handle, %edx
movl $_ZStL8__ioinit, %esi
movl $_ZNSt8ios_base4InitD1Ev, %edi
call __cxa_atexit
.L3:
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE970:
.size _Z41__static_initialization_and_destruction_0ii, .-_Z41__static_initialization_and_destruction_0ii
.type _GLOBAL__sub_I_main, #function
_GLOBAL__sub_I_main:
.LFB971:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $65535, %esi
movl $1, %edi
call _Z41__static_initialization_and_destruction_0ii
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE971:
.size _GLOBAL__sub_I_main, .-_GLOBAL__sub_I_main
.section .ctors,"aw",#progbits
.align 8
.quad _GLOBAL__sub_I_main
.hidden __dso_handle
.ident "GCC: (GNU) 4.8.2"
.section .note.GNU-stack,"",#progbits
With -O3, the output is considerably smaller of course, but still no difference.
The idiom designed into the C language (and inherited into C++) for infinite looping is for(;;): the form in which the test is omitted. The do/while and while loops do not have this special feature; their test expressions are mandatory.
for(;;) does not express "loop while some condition is true that happens to always be true". It expresses "loop endlessly". No superfluous condition is present.
Therefore, the for(;;) construct is the canonical endless loop. This is a fact.
All that is left to opinion is whether or not to write the canonical endless loop, or to choose something baroque which involves extra identifiers and constants, to build a superfluous expression.
Even if the test expression of while were optional, which it isn't, while(); would be strange. while what? By contrast, the answer to the question for what? is: why, ever---for ever! As a joke some programmers of days past have defined blank macros, so they could write for(ev;e;r);.
while(true) is superior to while(1) because at least it doesn't involve the kludge that 1 represents truth. However, while(true) didn't enter C until C99. for(;;) exists in every version of C going back to the language described in the 1978 book K&R1, in every dialect of C++, and even in related languages. If you're coding in a code base written in C90, you have to define true yourself to write while (true).
while(true) reads badly. While what is true? We don't really want to see the identifier true in code, except when we are initializing boolean variables or assigning to them. true need not ever appear in conditional tests. Good coding style avoids cruft like this:
if (condition == true) ...
in favor of:
if (condition) ...
For this reason while (0 == 0) is superior to while (true): it uses an actual condition that tests something, which turns into a sentence: "loop while zero is equal to zero." We need a predicate to go nicely with "while"; the word "true" isn't a predicate, but the relational operator == is.
I use for(;/*ever*/;).
It is easy to read and it takes a bit longer to type (due to the shifts for the asterisks), indicating I should be really careful when using this type of loop. The green text that shows up in the conditional is also a pretty odd sight—another indication this construct is frowned upon unless absolutely necessary.
They probably compile down to nearly the same machine code, so it is a matter of taste.
Personally, I would choose the one that is the clearest (i.e. makes it very clear that it is supposed to be an infinite loop).
I would lean towards while(true){}.
I would recommend while (1) { } or while (true) { }. It's what most programmers would write, and for readability reasons you should follow the common idioms.
(Ok, so there is an obvious "citation needed" for the claim about most programmers. But from the code I've seen, in C since 1984, I believe it is true.)
Any reasonable compiler would compile all of them to the same code, with an unconditional jump, but I wouldn't be surprised if there are some unreasonable compilers out there, for embedded or other specialized systems.
Is there a certain form which one should choose?
You can choose either; it's a matter of choice. All are equivalent. while(1) {} and while(true) {} are frequently used by programmers for infinite loops.
Well, there is a lot of taste in this one.
I think people from a C background are more likely to prefer for(;;), which reads as "forever". If it's for work, do what the locals do; if it's for yourself, do the one you can most easily read.
But in my experience, do { } while (1); is almost never used.
All of them perform the same function, and it is fair to choose whichever you prefer.
I might say while(1) or while(true) is good practice to use.
They are the same, but I suggest while(true), which reads best.

Where is the one-to-one correlation between the assembly and the cpp code?

I tried to examine what this code will look like in assembly:
int main(){
if (0){
int x = 2;
x++;
}
return 0;
}
I was wondering what if (0) means.
I used the shell command g++ -S helloWorld.cpp on Linux
and got this code:
.file "helloWorld.cpp"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1"
.section .note.GNU-stack,"",#progbits
I expected the assembly to contain some JZ, but where is it?
How can I compile the code without optimization?
There is no direct, guaranteed relationship between C++ source code and the generated assembler. The C++ source code defines a certain semantics, and the compiler outputs machine code which will implement the observable behavior of those semantics. How the compiler does this, and the actual code it outputs, can vary enormously, even over the same underlying hardware; I would be very disappointed in a compiler which generated code which compared 0 with 0, and then did a conditional jump if the results were equal, regardless of what the C++ source code was.
In your example, the only observable behavior in your code is to return 0 to the OS. Anything the compiler generates must do this (and have no other observable behavior). The code you show isn't optimal for this:
xorl %eax, %eax
ret
is really all that is needed. But of course, the compiler is free to generate a lot more if it wants. (Your code, for example, sets up a frame to support local variables, even though there aren't any. Many compilers do this systematically, because most debuggers expect it, and get confused if there is no frame.)
With regards to optimization, this depends on the compiler. With g++, -O0 (that's the letter O followed by the number zero) turns off all optimization. This is the default, however, so it is effectively what you are seeing. In addition to having several different levels of optimization, g++ supports turning individual optimizations off or on. You might want to look at the complete list: http://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc/Optimize-Options.html#Optimize-Options.
The compiler eliminates that code as dead code, i.e. code that will never run. What you're left with is establishing the stack frame and setting the return value of the function. if(0) is never true, after all. If you want to get a JZ, then you should probably do something like if(variable == 0). Keep in mind that the compiler is in no way required to actually emit the JZ instruction; it may use any other means to achieve the same thing. Compiling a high-level language to assembly is very rarely a clear, one-to-one correlation.
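For example, a test the compiler cannot fold away at compile time will usually produce a comparison followed by a conditional jump. A sketch (the exact instruction chosen, e.g. je, jne or jz, depends on the compiler and flags):
// Compile with g++ -S -O0 and look for a cmpl/testl followed by a
// conditional jump in the output.
int f(int variable) {
    if (variable == 0)
        return 1;
    return 2;
}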
The code has probably been optimized.
if (0){
int x = 2;
x++;
}
has been eliminated.
movl $0, %eax is where the return value is set. The other instructions are just program init and exit.
There is a possibility that the compiler optimized it away, since it's never true.
The optimizer removed the if conditional and all of the code inside, so it doesn't show up at all.
The if (0) {} block has been optimized out by the compiler, as it will never be executed, so your function only returns 0 (movl $0, %eax).

Does a variable name take memory space in any programming language?

e.g.
int a=3;//-----------------------(1)
and
int a_long_variable_name_used_instead_of_small_one=3;//-------------(2)
Out of (1) and (2), which will occupy more memory space, or will they occupy equal space?
In C++ and most statically compiled languages, variable names may take up more space during the compilation process but by run time the names will have been discarded and thus take up no space at all.
In interpreted languages and compiled languages which provide run time introspection/reflection the name may take up more space.
Also, language implementation will affect how much space variable names take up. The implementer may have decided to use a fixed-length buffer for each variable name, in which case each name takes up the same space regardless of length. Or they may have dynamically allocated space based on the length.
Both occupy the same amount of memory. Variable names are just to help you, the programmer, remember what the variable is for, and to help the compiler associate different uses of the same variable. With the exception of debugging symbols, they make no appearance in the compiled code.
The name you give to a variable in C/C++ will not affect the size of the resulting executable code. When you declare a variable like that, the compiler reserves memory space (in the case of an int on x86/x64, four bytes) to store the value. To access or alter the value it will then use the address rather than the variable name (which is lost in the compilation process).
In most interpreted languages, the name would be stored in a table somewhere in memory, thus taking up different amounts of space.
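As a toy illustration of why names can cost memory in an interpreted language, here is roughly how an interpreter might keep its variables, with the name stored as a real string at run time (purely illustrative, not any particular interpreter's implementation):
#include <iostream>
#include <map>
#include <string>

int main() {
    // The environment maps the *name* to the value, so a longer name
    // really does occupy more memory for as long as the variable lives.
    std::map<std::string, int> environment;
    environment["a"] = 3;
    environment["a_long_variable_name_used_instead_of_small_one"] = 3;
    std::cout << environment.size() << " variables stored\n";
    return 0;
}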
If my understanding is correct, they'll take up the same amount of memory.
I believe (and am ready to get shot down in flames) that in C++ the names are symbolic, to help the programmer; the compiler just creates a block of memory sufficient to hold the type you're declaring, in this case an int.
So they should both occupy the same memory size, i.e. the memory required to hold an int.
For C++,
$ cat name.cpp
int main() {
int a = 74678;
int bcdefghijklmnopqrstuvwxyz = 5664;
}
$ g++ -S name.cpp
$ cat name.s
.file "name.cpp"
.text
.align 2
.globl main
.type main, #function
main:
.LFB2:
pushl %ebp
.LCFI0:
movl %esp, %ebp
.LCFI1:
subl $8, %esp
.LCFI2:
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
subl %eax, %esp
movl $74678, -4(%ebp)
movl $5664, -8(%ebp)
movl $0, %eax
leave
ret
.LFE2:
.size main, .-main
.section .note.GNU-stack,"",#progbits
.ident "GCC: (GNU) 3.4.6 20060404 (Red Hat 3.4.6-11.0.1)"
$
As you can see, neither a nor bcdefghijklmnopqrstuvwxyz appears in the assembler output. So the length of the variable name does not matter at runtime in terms of memory.
But variable names are huge contributors to program design. Some programmers even rely on good naming conventions, instead of comments, to explain the design of their program.
A relevant quote from Hacker News,
Code should be written so as to completely describe the program's functionality to human readers, and only incidentally to be interpreted by computers. We have a hard time remembering short names for a long time, and we have a hard time looking at long names over and over again in a row. Additionally, short names carry a higher likelihood of collisions (since the search space is smaller), but are easier to "hold onto" for short periods of reading.
Thus, our conventions for naming things should take into consideration the limitations of the human brain. The length of a variable's name should be proportional to the distance between its definition and its use, and inversely proportional to its frequency of use.
In modern compilers the name of a variable does not impact the amount of space that is required to hold it in C++.
Field names (instance variable names) in Java use memory, but only once per field. This is for reflection to work. The same goes for other languages that are based on the JVM, and I guess for DotNet.
Compilers are there for a reason. They optimize code to use as little space as possible and run as fast as possible, especially modern ones.
No. Both will occupy equal space.

About the cost of virtual functions

If I call a virtual function 1000 times in a loop, will I suffer from the vtable lookup overhead 1000 times or only once?
The compiler may be able to optimise it - for example, the following is (at least conceptually) easily optimised:
Foo * f = new Foo;
for ( int i = 0; i < 1000; i++ ) {
f->func();
}
However, other cases are more difficult:
vector <Foo *> v;
// populate v with 1000 Foo (not derived) objects
for ( size_t i = 0; i < v.size(); i++ ) {
v[i]->func();
}
the same conceptual optimisation is applicable, but much harder for the compiler to see.
Bottom line - if you really care about it, compile your code with all optimisations enabled and examine the compiler's assembler output.
The Visual C++ compiler (at least through VS 2008) does not cache vtable lookups. Even more interestingly, it doesn't direct-dispatch calls to virtual methods where the static type of the object is sealed. However, the actual overhead of the virtual dispatch lookup is almost always negligible. The place where you sometimes do see a hit is in the fact that virtual calls in C++ cannot be replaced by direct calls like they can in a managed VM. This also means no inlining for virtual calls.
The only true way to establish the impact for your application is to use a profiler.
Regarding the specifics of your original question: if the virtual method you are calling is trivial enough that the virtual dispatch itself is incurring a measurable performance impact, then that method is sufficiently small that the vtable will remain in the processor's cache throughout the loop. Even though the assembly instructions to pull the function pointer from the vtable are executed 1000 times, the performance impact will be much less than (1000 * time to load vtable from system memory).
If the compiler can deduce that the object on which you're calling the virtual function doesn't change, then, in theory, it should be able to hoist the vtable lookup out of the loop.
Whether your particular compiler actually does this is something you can only find out by looking at the assembly code it produces.
I think the problem is not the vtable lookup, since that's a very fast operation, especially in a loop where you have all the required values in cache (if the loop is not too complex; but if it is complex, then the virtual function wouldn't impact performance much anyway). The problem is that the compiler cannot inline the function at compile time.
This is especially a problem when the virtual function is very small (e.g. one that only returns a value). The relative performance impact in this case can be huge, because you need a whole function call just to return a value. If such a function can be inlined, that improves performance very much.
If the virtual function itself is expensive, then I wouldn't really worry about the vtable.
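One way to help the compiler inline small virtual functions, assuming you control the class, is to mark the most-derived type (or the function) final, so the compiler can devirtualize calls made through that type. This is a sketch; whether a given compiler actually inlines the call still has to be verified in its assembler output:
struct Base {
    virtual int value() const { return 1; }
    virtual ~Base() = default;
};

// `final` tells the compiler no further override can exist, so a call
// through a Derived reference can be dispatched directly and inlined.
struct Derived final : Base {
    int value() const override { return 2; }
};

int sum(const Derived& d) {
    int total = 0;
    for (int i = 0; i < 1000; ++i)
        total += d.value();  // devirtualizable: no vtable lookup needed
    return total;
}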
For a study of the overhead of virtual function calls, I recommend the paper
"The Direct Cost of Virtual Function Calls in C++"
Let's give it a try with g++ targeting x86:
$ cat y.cpp
struct A
{
virtual void not_used(int);
virtual void f(int);
};
void foo(A &a)
{
for (unsigned i = 0; i < 1000; ++i)
a.f(13);
}
$
$ gcc -S -O3 y.cpp # assembler output, max optimization
$
$ cat y.s
.file "y.cpp"
.section .text.unlikely,"ax",#progbits
.LCOLDB0:
.text
.LHOTB0:
.p2align 4,,15
.globl _Z3fooR1A
.type _Z3fooR1A, #function
_Z3fooR1A:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
pushq %rbx
.cfi_def_cfa_offset 24
.cfi_offset 3, -24
movq %rdi, %rbp
movl $1000, %ebx
subq $8, %rsp
.cfi_def_cfa_offset 32
.p2align 4,,10
.p2align 3
.L2:
movq 0(%rbp), %rax
movl $13, %esi
movq %rbp, %rdi
call *8(%rax)
subl $1, %ebx
jne .L2
addq $8, %rsp
.cfi_def_cfa_offset 24
popq %rbx
.cfi_def_cfa_offset 16
popq %rbp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE0:
.size _Z3fooR1A, .-_Z3fooR1A
.section .text.unlikely
.LCOLDE0:
.text
.LHOTE0:
.ident "GCC: (GNU) 5.3.1 20160406 (Red Hat 5.3.1-6)"
.section .note.GNU-stack,"",#progbits
$
The L2 label is the top of the loop. The line right after L2 seems to be loading the vpointer into rax. The call 4 lines after L2 seems to be indirect, fetching the pointer to the f() override from the vtable.
I'm surprised by this. I would have expected the compiler to treat the address of the f() override function as a loop invariant. It seems like gcc is making two "paranoid" assumptions:
The f() override function may somehow change the hidden vpointer in the object, or
the f() override function may somehow change the contents of the vtable.
Edit: In a separate compilation unit, I implemented A::f() and a main function with a call to foo(). I then built an executable with gcc using link-time optimization, and ran objdump on it. The virtual function call was inlined. So, perhaps this is why gcc optimization without LTO is not as ideal as one might expect.
I would say this depends on your compiler as well as on the shape of the loop.
Optimizing compilers can do a lot for you, and if the virtual-function call is predictable, the compiler can help you.
Maybe you can find something about the optimizations your compiler performs in its documentation.