Does code alignment make a difference in compiler output? - c++

Let's say I got some code written in C++ and I compile it with gcc.
Now let's say I push everything in one line and compile it again.
Do the output bytes of the compiler change?
If so, what changes and why?

Do the output bytes of the compiler change?
Compilers may produce information that maps produced assembly instructions to lines of the original source code. This may be called "debug information". In case the compiler produces debug information, that information will be different if you change the line numbers or file names etc.
Then there are the macro __LINE__ and the newer (C++20) std::source_location::line, which make the line number part of the program's meaning. Code that uses them can therefore compile to different output when the layout changes, even without debug information.
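A minimal sketch of the __LINE__ effect described above. The function names are hypothetical and chosen only for illustration; the point is that the value each function returns depends purely on where its code sits in the file.

```cpp
#include <cassert>

// Each function returns the number of the source line it is written on.
inline int line_of_first()  { return __LINE__; }
inline int line_of_second() { return __LINE__; }

// Both uses of __LINE__ sit on one physical line, so they expand to the same value.
inline bool same_line_pair() { int a = __LINE__; int b = __LINE__; return a == b; }

inline void check_line_numbers() {
    assert(line_of_first() != line_of_second()); // different lines, different constants
    assert(same_line_pair());                    // same line, same constant
}
```

If the two one-line functions were joined onto a single line, both would return the same constant, so the generated code would differ from the two-line layout.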

Related

Can I tell my compiler to "inline" a function also w.r.t. debug/source-line info?

In my code (either C or C++; let's say it's C++) I have a one-liner inline function foo() which gets called from many places in the code. I'm using a profiling tool which gathers statistics per instruction in the object code and translates them into per-source-line statistics using the source-code-line information (which we get with -g in clang or GCC). Thus the profiler can't distinguish between calls to foo() from different places.
I would like the stats to be counted separately for the different places foo() gets called. For this to happen, I need the compiler to "fully" inline foo(), including forgetting about it when it comes to the source location information.
Now, I know I can achieve this by using a macro: that way, there is no function, and the code is just pasted where I use it. But that won't work for operators, for example, and it may be a problem with templates. So, can I tell the compiler to do what I described?
Notes:
Compiler-specific answers are relevant; I'm mainly interested in GCC and clang.
I'm not compiling a debug build, i.e. optimizations are turned on.
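One option that may be worth trying, sketched below: GCC documents an artificial function attribute for small inline wrappers which, depending on the debug info format, can attribute the inlined instructions to the caller's location. Clang accepts the same spelling. Whether a particular profiler then separates the call sites is an assumption to verify; the body of foo() and the two callers are placeholders.

```cpp
#include <cassert>

// GCC/Clang-specific attributes (not standard C++). foo's body is a placeholder.
__attribute__((always_inline, artificial))
inline int foo(int x) { return 2 * x + 1; }

// Two hypothetical call sites that a line-based profiler should ideally tell apart.
inline int caller_a(int v) { return foo(v); }
inline int caller_b(int v) { return foo(v) + 10; }

inline void check_callers() {
    assert(caller_a(3) == 7);
    assert(caller_b(3) == 17);
}
```

The attributes do not change what the program computes; they only influence inlining and the line information emitted with -g.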

__builtin_sinf: What is it? Where is it? How do I obtain the disassembly of it?

I was curious this morning to see if I could obtain and understand the machine instructions generated for calculating a mathematical function such as sin, for which, as far as I am aware, there is no machine instruction in x86-64.
I first went to Godbolt and tried to obtain the disassembly for this:
// Type your code here, or load an example.
#include <cmath>
int mysin(float num) {
    float result = sin(num);
    return result;
}
I found that the output sets up some registers for a function call and then does call sin.
I then went and searched my local machine for the cmath headers. I found one in /usr/include/c++/10/cmath.h
This header calls another function: __builtin_sin.
My guess would be that the gcc compiler sees this identifier and substitutes it for some set of machine instructions which is somehow "baked into" the gcc compiler itself. Not sure if I am correct about that however.
I did a search of my system for the string __builtin_sinf and it doesn't look like there is any text file on the system which contains source code (C, asm or otherwise) which might then be used by a compiler.
Can anyone offer any explanation as to what __builtin_sinf is, where it is located if it exists in a file on my system, and finally how to obtain the disassembly of this function?
If you do not have the source code for the compiler on your system, then you probably do not have the source code for __builtin_sin because what to do for __builtin_sin is built into the compiler.
__builtin_sin is not an instruction to the compiler to replace it with a call to a sin routine. It tells the compiler that the operation of the standard C library sin function is desired, with all the semantics of the sin function defined by the C standard. This means the compiler does not have to replace a use of __builtin_sin with a call to sin or a call to any other function. It may replace it with whatever code is appropriate. For example, given some __builtin_sin(x), if the compiler can determine the value of x during compilation, it can calculate sin(x) itself and replace __builtin_sin(x) with that value. Alternatively, the compiler can compile __builtin_sin(x) into assembly language that computes sin. Or it can compile __builtin_sin(x) into a call to the sin routine. It could also compile __builtin_sin(x) to a single machine instruction, such as fsin. (However, fsin is a primitive instruction with some limitations, particularly on argument domain, and is generally not suitable to serve as a sin function by itself.)
So there is no source file that contains the implementation of __builtin_sin other than the compiler’s source code. And, while that source code might contain information about an inline implementation of sin that the compiler uses, that information might not be in the form of assembly language for your target processor; it might be in the form of an internal compiler language specifying what operations to perform to calculate sin. (This is called an intermediate representation.)
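The freedom described above can be observed with a small experiment. This is a sketch assuming GCC or Clang, since __builtin_sin is a compiler extension; the function names are hypothetical. Compiling it with optimizations and -S and reading the assembly shows what the compiler chose for each case on a given target.

```cpp
#include <cassert>
#include <cmath>

// Literal argument: the compiler is free to compute the value at compile time.
inline double sine_of_constant() { return __builtin_sin(0.5); }

// Runtime argument: the compiler will typically emit a call to the library's sin.
inline double sine_of_runtime(double x) { return __builtin_sin(x); }

inline void check_builtin_sin() {
    // sin(0.5) is approximately 0.4794
    assert(sine_of_constant() > 0.479 && sine_of_constant() < 0.480);
    // Whatever code was generated, the result should agree with std::sin.
    assert(std::fabs(sine_of_runtime(0.5) - std::sin(0.5)) < 1e-12);
}
```

Either way the observable result is that of the standard sin function; only the generated instructions differ.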

How to disable inline assembly in GCC?

I'm developing an online judge system for programming contests. Since C/C++ inline assembly is not allowed in certain programming contests, I would like to add the same restriction to my system.
I would like to let GCC produce an error when compiling a C/C++ program containing inline assembly, so that any program containing inline assembly will be rejected. Is there a way to achieve that?
Note: disabling inline assembly is just for obeying the rules, not for security concerns.
Is there a way to disable inline assembler in GCC?
Yes there are a couple of methods; none useful for security, only guard-rails that could be worked around intentionally, but will stop people from accidentally using asm in places they didn't realize they shouldn't.
Turn off the asm keyword in the compiler (C only)
To do it in the compilation phase, use the parameter -fno-asm. However, keep in mind that this only affects asm in C, not in C++. It also does not affect __asm__ or __asm in either language.
Documentation:
-fno-asm
Do not recognize "asm", "inline" or "typeof" as a keyword, so that code can use these words as identifiers. You can use the keywords "__asm__", "__inline__" and "__typeof__" instead. -ansi implies -fno-asm.
In C++ , this switch only affects the "typeof" keyword, since "asm" and "inline" are standard keywords. You may want to use the -fno-gnu-keywords flag instead, which has the same effect. In C99 mode (-std=c99 or -std=gnu99), this switch only affects the "asm" and "typeof" keywords, since "inline" is a standard keyword in ISO C99.
Define the keyword as a macro
You can use the parameters -Dasm=error -D__asm__=error -D__asm=error
Note that this construction is generic. What it does is to create macros. It works pretty much like a #define. The documentation says:
-D name=definition
The contents of definition are tokenized and processed as if they appeared during translation phase three in a #define directive. In particular, the definition will be truncated by embedded newline characters.
...
So what it does is simply to change occurrences of asm, __asm, or __asm__ to error. This is done in the preprocessor phase. You don't have to use error. Just pick anything that will not compile.
Use a macro that fires during compilation
To make the failure happen in the compilation phase while still using a macro, as suggested in the comments by zwol, you can use -D'asm(...)=_Static_assert(0,"inline assembly not allowed")'. This also solves the problem if there exists an identifier called error.
Note: This method requires -std=c11 or higher.
Using grep before using gcc
Yet another way that may be the solution to your problem is to just do a grep in the root of the source tree before compiling:
grep -nr "asm"
This will also catch __asm__, but it may give false positives, for instance if you have a string literal, identifier or comment containing the substring "asm". But in your case you could solve this problem by also forbidding any occurrence of that string anywhere in the source code. Just change the rules.
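If identifiers that merely contain "asm" should not be rejected, the check can be tightened to whole tokens. The following is a rough sketch of that idea; mentions_asm_keyword is a hypothetical helper, not part of GCC or grep, and it still flags the word inside comments and string literals.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns true if the line contains asm, __asm or __asm__ as a whole
// identifier-like token (hypothetical helper for a pre-compile check).
inline bool mentions_asm_keyword(const std::string& line) {
    auto is_ident = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
    std::size_t i = 0;
    while (i < line.size()) {
        if (!is_ident(line[i])) { ++i; continue; }
        std::size_t start = i;
        while (i < line.size() && is_ident(line[i])) ++i;  // consume one token
        const std::string token = line.substr(start, i - start);
        if (token == "asm" || token == "__asm" || token == "__asm__") return true;
    }
    return false;
}

inline void check_scanner() {
    assert(mentions_asm_keyword("asm(\"nop\");"));
    assert(mentions_asm_keyword("__asm__ volatile(\"\");"));
    assert(!mentions_asm_keyword("int plasma = 1;"));  // substring only, not a token
}
```

Like the grep approach, this is a guard-rail only: it does not see through macros or token pasting.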
Possible unexpected problems
Note that disabling assembly can cause other problems. For instance, I could not use stdio.h with this option. It is common for system headers to contain inline assembly code.
A way to cheat the above methods
Aside from the trivial #undef __asm__, it is possible to execute strings as machine code. See this answer for an example: https://stackoverflow.com/a/18477070/6699433
A piece of the code from the link above:
/* our machine code */
char code[] = {0x55,0x48,0x89,0xe5,0x89,0x7d,0xfc,0x48,
               0x89,0x75,0xf0,0xb8,0x2a,0x00,0x00,0x00,0xc9,0xc3,0x00};
/* copy code to executable buffer */
void *buf = mmap (0,sizeof(code),PROT_READ|PROT_WRITE|PROT_EXEC,
                  MAP_PRIVATE|MAP_ANON,-1,0);
memcpy (buf, code, sizeof(code));
/* run code */
int i = ((int (*) (void))buf)();
The code above is only intended to give a quick idea of how to get around the rules OP has stated. It is not intended to be a good example of how to actually do it in practice. Furthermore, the code is not mine. It is just a short code quote from the link I supplied. If you have ideas about how to improve it, then please comment on 4pie0's original post instead.

Does VS2010 pre-calculate expressions defined by #define?

For Visual Studio 2010, if I define
#define PI 4.0f*atan(1.0f)
when PI is used somewhere later in the code, does the value need to be calculated again, or is 3.1415926... simply plugged in? Thanks.
EDIT:
Because I heard someone say the compiler might optimize this and replace it with 3.1415926..., depending on the compiler.
The #define will do a direct text replacement. Because of that, everywhere you have PI it will get replaced with 4.0f*atan(1.0f). I would suspect the compiler would optimize this away during code generation, but the only real way to know is to compile it and check the assembly.
I found this little online tool that will take C++ code and generate the assembly output. If you turn on optimizations you will see that the code generated to compute PI is gone and it is now just a constant that gets referenced.
#define is a "copy-paste" type of thing. If your code says std::cout << PI; then the compiler pretends you typed std::cout << 4.0f*atan(1.0f);.
The values of defines are not calculated until they're used, and they're theoretically recalculated every time they're used. However, most modern compilers will see std::cout << 4.0f*atan(1.0f); and do that calculation at compile time and will emit assembly for std::cout << 3.14159265f;, so the code is just as fast as if it were precalculated.
Unrelated, #include is also a copy-paste kind of thing, which is why we need include guards.
When the preprocessor runs, it replaces every instance of PI with 4.0f*atan(1.0f).
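Because the replacement is purely textual, the surrounding expression matters. The sketch below uses two hypothetical macro names to show the effect: without parentheses, 1.0f / PI expands to 1.0f / 4.0f*atan(1.0f), which is evaluated as (1.0f / 4.0f) * atan(1.0f).

```cpp
#include <cassert>
#include <cmath>

// Same text as in the question, and a parenthesized variant (both names hypothetical).
#define PI_UNPARENTHESIZED 4.0f*std::atan(1.0f)
#define PI_PARENTHESIZED   (4.0f*std::atan(1.0f))

// Expands to 1.0f / 4.0f*std::atan(1.0f), i.e. 0.25f * atan(1), roughly 0.19635.
inline float inverse_pi_wrong() { return 1.0f / PI_UNPARENTHESIZED; }

// Expands to 1.0f / (4.0f*std::atan(1.0f)), i.e. 1/pi, roughly 0.31831.
inline float inverse_pi_right() { return 1.0f / PI_PARENTHESIZED; }

inline void check_pi_macros() {
    assert(std::fabs(inverse_pi_wrong() - 0.196350f) < 1e-5f);
    assert(std::fabs(inverse_pi_right() - 0.318310f) < 1e-5f);
}
```

Whether the compiler folds either expression into a constant is a separate question from what the expression means after substitution.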

Strange compiler error due to use of xml in c++ comments

I'm working on a proprietary Unix-like OS (I don't know if that's relevant though) and compiling with g++.
I noticed recently that if I put XML-like tags in my C++ comments I get compiler errors. I don't particularly need to do this, but I thought it was strange and I'd like to know why it's an issue for the compiler. For example:
// <debugoutput>
std::cerr << "I'm debugging!" << std::endl;
// </debugoutput>
would cause massive compiler errors if it were in the middle of my code somewhere. Changing the last comment line </debugoutput> to <debugoutput> makes it compile fine though.
Does anyone know why the compiler would be confused by that line being in a comment? The compiler errors generated when this happens don't seem related at all: they're more like what you'd see if you missed the semicolon at the end of a class, undefined references to well-defined classes, etc. I can't paste the output from my dev system, but trust me that it doesn't look related to the issue; it's more like the compiler got confused.
This sounds suspiciously like a digraph-related issue, but without the actual error message or a small code sample that exhibits the problem it's hard to tell for sure.
Try changing the whitespace between the <, / and the actual text, and also try it within a C-style comment to see if that provides additional insight.
For information on C/C++ digraphs and trigraphs see http://en.wikipedia.org/wiki/C_trigraph#C and also Purpose of Trigraph sequences in C++? and Why are there digraphs in C and C++? from SO.
It seems possible that some sequence is being picked up (for example </ being treated as a digraph) and is throwing off the compiler.
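For reference, the digraphs defined by standard C++ are <: :> <% %> %: and %:%: (alternative spellings of [ ] { } # and ##). The sequence </ is not among them, so the guess above remains unconfirmed without the actual error output. A small sketch of what digraphs look like in ordinary code, with a hypothetical function name:

```cpp
#include <cassert>

// Written with digraphs: <% %> stand for { }, and <: :> stand for [ ].
inline int digraph_sum() <%
    int values<:3:> = <% 1, 2, 3 %>;
    return values<:0:> + values<:1:> + values<:2:>;
%>

inline void check_digraphs() {
    assert(digraph_sum() == 6);
}
```

Digraphs are recognized as tokens, so they have no effect inside a comment. Trigraphs, where enabled, are replaced before comments are recognized, which is why a trailing ??/ can splice a // comment onto the next line.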