How to disable inline assembly in GCC? - c++

I'm developing an online judge system for programming contests. Since C/C++ inline assembly is not allowed in certain programming contests, I would like to add the same restriction to my system.
I would like to let GCC produce an error when compiling a C/C++ program containing inline assembly, so that any program containing inline assembly will be rejected. Is there a way to achieve that?
Note: disabling inline assembly is just for obeying the rules, not for security concerns.

Is there a way to disable inline assembler in GCC?
Yes, there are a couple of methods. None is useful for security; they are only guard rails that can be worked around intentionally, but they will stop people from accidentally using asm in places where they didn't realize they shouldn't.
Turn off the asm keyword in the compiler (C only)
To do it in the compilation phase, use the option -fno-asm. However, keep in mind that this only affects the asm keyword in C, not in C++, and that it does not affect __asm__ or __asm in either language.
Documentation:
-fno-asm
Do not recognize "asm", "inline" or "typeof" as a keyword, so that code can use these words as identifiers. You can use the keywords "__asm__", "__inline__" and "__typeof__" instead. -ansi implies -fno-asm.
In C++ , this switch only affects the "typeof" keyword, since "asm" and "inline" are standard keywords. You may want to use the -fno-gnu-keywords flag instead, which has the same effect. In C99 mode (-std=c99 or -std=gnu99), this switch only affects the "asm" and "typeof" keywords, since "inline" is a standard keyword in ISO C99.
Define the keyword as a macro
You can use the parameters -Dasm=error -D__asm__=error -D__asm=error
Note that this construction is generic. What it does is to create macros. It works pretty much like a #define. The documentation says:
-D name=definition
The contents of definition are tokenized and processed as if they appeared during translation phase three in a #define directive. In particular, the definition will be truncated by embedded newline characters.
...
So what it does is simply to change occurrences of asm, __asm, or __asm__ to error. This is done in the preprocessor phase. You don't have to use error. Just pick anything that will not compile.
Use a macro that fires during compilation
To reject inline assembly in the compilation phase with a macro, as suggested in the comments by zwol, you can use -D'asm(...)=_Static_assert(0,"inline assembly not allowed")'. This also avoids the problem that arises if an identifier called error exists.
Note: This method requires -std=c11 or higher.
Using grep before using gcc
Yet another way that may be the solution to your problem is to just do a grep in the root of the source tree before compiling:
grep -nr "asm"
This will also catch __asm__, but it may give false positives, for instance if you have a string literal, identifier or comment containing the substring "asm". In your case you could solve this problem by also forbidding any occurrence of that string anywhere in the source code. Just change the rules.
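If the false positives from a plain substring search are a concern, a small pre-compilation scanner can report only whole asm, __asm or __asm__ tokens outside comments and literals. The following is a minimal sketch, not a conforming lexer: the function name find_asm_lines is hypothetical, and raw string literals, digit separators, trigraphs and backslash-newline splices are deliberately not handled.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: returns the 1-based line numbers on which asm,
// __asm or __asm__ appears as a whole token, ignoring comments and
// string/character literals.
std::vector<int> find_asm_lines(const std::string& src) {
    std::vector<int> hits;
    int line = 1;
    std::size_t i = 0;
    const std::size_t n = src.size();
    while (i < n) {
        const char c = src[i];
        if (c == '\n') {
            ++line;
            ++i;
        } else if (c == '/' && i + 1 < n && src[i + 1] == '/') {
            while (i < n && src[i] != '\n') ++i;       // skip line comment
        } else if (c == '/' && i + 1 < n && src[i + 1] == '*') {
            i += 2;                                    // skip block comment
            while (i < n && !(src[i] == '*' && i + 1 < n && src[i + 1] == '/')) {
                if (src[i] == '\n') ++line;
                ++i;
            }
            i = (i < n) ? i + 2 : n;
        } else if (c == '"' || c == '\'') {
            const char quote = c;                      // skip literal
            ++i;
            while (i < n && src[i] != quote) {
                if (src[i] == '\\' && i + 1 < n) ++i;  // step over escape
                if (src[i] == '\n') ++line;
                ++i;
            }
            ++i;                                       // closing quote
        } else if (std::isalnum(static_cast<unsigned char>(c)) || c == '_') {
            const std::size_t start = i;               // identifier or number
            while (i < n && (std::isalnum(static_cast<unsigned char>(src[i])) ||
                             src[i] == '_')) {
                ++i;
            }
            const std::string word = src.substr(start, i - start);
            if (word == "asm" || word == "__asm" || word == "__asm__") {
                hits.push_back(line);
            }
        } else {
            ++i;
        }
    }
    return hits;
}
```

Like grep, this is only a guard rail: a macro that pastes tokens together, or machine code stored in a byte array, passes straight through it.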
Possible unexpected problems
Note that disabling assembly can cause other problems. For instance, I could not use stdio.h with this option. It is common for system headers to contain inline assembly code.
A way to cheat above methods
Aside from the trivial #undef __asm__, it is possible to execute strings as machine code. See this answer for an example: https://stackoverflow.com/a/18477070/6699433
A piece of the code from the link above:
/* our machine code */
char code[] = {0x55,0x48,0x89,0xe5,0x89,0x7d,0xfc,0x48,
               0x89,0x75,0xf0,0xb8,0x2a,0x00,0x00,0x00,0xc9,0xc3,0x00};
/* copy code to executable buffer */
void *buf = mmap (0,sizeof(code),PROT_READ|PROT_WRITE|PROT_EXEC,
                  MAP_PRIVATE|MAP_ANON,-1,0);
memcpy (buf, code, sizeof(code));
/* run code */
int i = ((int (*) (void))buf)();
The code above is only intended to give a quick idea of how to circumvent the rules OP has stated. It is not intended to be a good example of how to actually do it in practice. Furthermore, the code is not mine. It is just a short quote from the link I supplied. If you have ideas about how to improve it, please comment on 4pie0's original post instead.


(v) is actually (*&v) since when?

Could C++ standards gurus please enlighten me:
Since which C++ standard version has this statement failed because (v) seems to be equivalent to (*&v)?
I.e. for example the code:
#define DEC(V) ( ((V)>0)? ((V)-=1) : 0 )
...{...
register int v=1;
int r = DEC(v) ;
...}...
This now produces warnings under -std=c++17 like:
cannot take address of register variable
left hand side of operand must be lvalue
Many C macros enclose ALL macro parameters in parentheses, of which the above is meant only to be a representative example.
The actual macros that produce warnings are for instance
the RTA_* macros in /usr/include/linux/rtnetlink.h.
Short of not using/redefining these macros in C++, is there any workaround?
If you look at the revision summary of the latest C++1z draft, you'd see this in [diff.cpp14.dcl.dcl]
[dcl.stc]
Change: Removal of register storage-class-specifier.
Rationale: Enable repurposing of deprecated keyword in future
revisions of this International Standard.
Effect on original feature: A valid C++ 2014 declaration utilizing the register
storage-class-specifier is ill-formed in this International Standard.
The specifier can simply be removed to retain the original meaning.
The warning may be due to that.
register is no longer a storage class specifier, so you should remove it. Compilers may not be issuing the right errors or warnings, but your code should not use register to begin with.
The following quote from the standard tells people what to do about register in their code. You probably have an old version of that file.
C.1.6 Clause 10: declarations [diff.dcl]
Change: In C++, register is not a storage class specifier.
Rationale: The storage class specifier had no effect in C++.
Effect on original feature: Deletion of semantically well-defined feature.
Difficulty of converting: Syntactic transformation.
How widely used: Common.
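A minimal sketch of the fix described above, assuming the only problem is the register specifier: the DEC macro from the question is kept unchanged and only the declaration loses the keyword. The function name dec_once is hypothetical and exists only to make the snippet self-contained.

```cpp
#include <cassert>

// Macro from the question, unchanged: every parameter use is parenthesized.
#define DEC(V) ( ((V) > 0) ? ((V) -= 1) : 0 )

// Hypothetical helper wrapping the question's snippet.
int dec_once(int start) {
    int v = start;  // was: register int v = start;  (ill-formed in C++17)
    return DEC(v);  // (v) is an ordinary lvalue, so (v) -= 1 is fine
}
```

With the specifier gone, the parenthesized (v) inside the macro is an ordinary lvalue and the compound assignment compiles under -std=c++17 as it did before.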
Your worry is unwarranted since the file in question does not actually contain the register keyword:
grep "register" /usr/include/linux/rtnetlink.h
outputs nothing. Either way, you shouldn't be receiving the warning since:
System headers don't emit warnings by default, at least in GCC
It isn't wise to compile a file that belongs to a systems project like the Linux kernel in C++ mode, as there may be subtle and nasty breaking changes
Just include the file normally or link the C code to your C++ binary. If you really are getting a warning that should normally be suppressed, report a bug to your compiler vendor.

C++ macro inside macro [duplicate]

I know that I am trying to shoot myself in the foot ;) However, it would allow me to make the rest (a large amount) of the code smaller and more readable.
Is there any tricky way to create a preprocessor macro inside another preprocessor macro?
Here is an example of what I am looking for. My real scenario is more complex.
// That's what I want to do and surely C++ doesn't like it.
#define MACROCREATOR(B) #define MACRO##B B+B
void foo()
{
    MACROCREATOR(5) // This should create new macro (#define MACRO5 5+5)
    int a = MACRO5; // this will use new macro
}
The C++ Standard says (16.3.4.3):
The resulting completely macro-replaced preprocessing token sequence [... of the macro expansion...] is not processed as a preprocessing directive even if it resembles one...
So no, there is no 'official' way of achieving what you want with macros.
No. Even if a macro expands into something that looks like a preprocessing directive, the expansion is not evaluated as a preprocessing directive.
As a supplement to the answers above, if you really wanted to pre-process a source file twice—which is almost definitely not what you actually want to do—you could always invoke your compiler like this:
g++ -E input.cpp | g++ -c -x c++ - -o output.o
That is, run the file through the preprocessor, then run the preprocessed output via pipe through a full compilation routine, including a second preprocessing step. In order for this to have a reasonably good chance of working, I'd imagine you'd have to be rather careful in how you defined and used your macros, and all in all it would most likely not be worth the trouble and increased build time.
If you really want macros, use standard macro-based solutions. If you really want compile-time metaprogramming, use templates.
On a slightly related note, this reminds me of the fact that raytracing language POV-Ray made heavy use of a fairly complex preprocessing language, with flow-control directives such as #while that allowed conditional repetition, compile-time calculations, and other such goodies. Would that it were so in C++, but it simply isn't, so we just do it another way.
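As a rough illustration of the two alternatives named above, the sketch below replaces the wished-for generated MACRO5 with (1) a single function-like macro that takes the value as an argument and (2) a class template evaluated at compile time. The names MACRO, Doubled and foo_sum are hypothetical, and the real scenario in the question is more complex than this.

```cpp
#include <cassert>

// Option 1: one function-like macro instead of generated MACRO5, MACRO7, ...
#define MACRO(B) ((B) + (B))

// Option 2: compile-time computation with a template (C++11).
template <int B>
struct Doubled {
    static constexpr int value = B + B;
};

// The template result is a constant expression, checked at compile time.
static_assert(Doubled<5>::value == 10, "Doubled<5> is computed by the compiler");

int foo_sum() {
    int a = MACRO(5);           // expands to ((5) + (5))
    int b = Doubled<5>::value;  // no preprocessor involved
    return a + b;
}
```

Neither form needs the preprocessor to re-scan its own output, which is the step the standard rules out.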
No. The pre-processor is single-pass. It doesn't re-evaluate the macro expansions.
As noted, one can #include a particular file more than once with different macro definitions active. This can make it practical to achieve some effects that could not be practically achieved via any other means.
As a simple example, on many embedded systems pointer indirection is very expensive compared to direct variable access. Code which uses a lot of pointer indirection may very well be twice as large and slow as code which simply uses variables. Consequently, if a particular routine is used with two sets of variables, in a scenario where one would usually pass in a pointer to a structure and then use the arrow operator, it may be far more efficient to simply put the routine in its own file (I normally use the extension .i), which is #included once without the macro _PASS2 defined and a second time with it defined. That file can then use #ifdef _PASS2/#else to define macros for all the variables that should differ between the two passes. Even though the code gets generated twice, on some micros that will take less space than using the arrow operator with passed-in pointers.
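The repeated-#include technique needs a second file, so it cannot be shown in one self-contained snippet. The sketch below swaps in a different mechanism with the same effect: a function-like macro stamps out the routine once per variable set, so each copy accesses its variables directly rather than through a pointer. All names (DEFINE_ACCUMULATE, accumulate_left, left_total and so on) are hypothetical.

```cpp
#include <cassert>

// Stamps out one copy of the routine bound to its own pair of variables.
// This stands in for including the same .i file twice with different
// macro definitions active.
#define DEFINE_ACCUMULATE(FUNC, TOTAL, COUNT) \
    static int TOTAL = 0;                     \
    static int COUNT = 0;                     \
    void FUNC(int sample) {                   \
        TOTAL += sample;                      \
        COUNT += 1;                           \
    }

// Two passes: the code is generated twice, once per variable set.
DEFINE_ACCUMULATE(accumulate_left, left_total, left_count)
DEFINE_ACCUMULATE(accumulate_right, right_total, right_count)
```

Whether the duplicated code is actually smaller than a pointer-based version depends on the target, as the answer notes.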
Take a look at m4. It is similar to cpp, but recursive and much more powerful. I've used m4 to create a structured language for assemblers, e.g.
cmp r0, #0
if(eq)
mov r1, #0
else
add r1, #1
end
The "if", "else", and "end" are calls to m4 macros I wrote that generate jumps and labels, the rest is native assembly. In order to nest these if/else/end constructs, you need to do defines within a macro.

Extracting preprocessor symbols from source

I'm looking for a way to extract all preprocessor symbols used in my code.
As an example, if my code looks like this:
#ifdef FOO
#endif
#if ( BAR == 1 && \
defined (Z) )
#endif
I'd like to get the list [FOO,BAR,Z] as the output.
I found some posts suggesting gcc -E -dM, but this displays all symbols that the preprocessor would apply to the code.
What I want, in contrast, is a list of all symbols actually used in the code.
Any suggestions?
That's quite simple. You just have to parse the source code exactly the way a conforming pre-processor would, with support for the correct C or C++ version. OK, I'm joking: if you support only the latest version, your code is likely to produce correct results for older versions too, but even that should be checked thoroughly.
More seriously now. Since you can ask the pre-processor for the list of all defined symbols, you can simply tokenize the source and identify all tokens from that list that do not immediately follow an initial #define or #undef. This part should be reasonably feasible with lex+yacc.
The only alternative I can imagine would be to use the code of a real compiler (Clang should be easier than gcc, but I'm unsure), discard all code generation, and consistently record every macro usage.
TL;DR: however you approach it, it will be hard work. If you can do without it, stay away from it...
You can get half way there by using a preprocessor library such as Boost.Wave. It can act as a lexer so you wouldn't have to write that part yourself. You would have to supply a grammar for the bit you cared about (define, ifdef, ifndef, if, elif) though.
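For the narrow case shown in the question (symbols tested in #if, #ifdef, #ifndef and #elif lines), a rough heuristic can get surprisingly far without a full preprocessor. The sketch below is an assumption-laden illustration, not a conforming implementation: the function name condition_symbols is hypothetical, and it ignores macro uses in ordinary code, comments on directive lines, and conditionals hidden inside #included files.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

static bool is_ident_start(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) || c == '_';
}

static bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Hypothetical helper: collects the identifiers that appear in conditional
// directives, in order of first appearance.
std::vector<std::string> condition_symbols(const std::string& src) {
    std::vector<std::string> result;
    std::set<std::string> seen;
    std::istringstream in(src);
    std::string line, logical;
    while (std::getline(in, line)) {
        // Join lines ending in a backslash into one logical line.
        if (!line.empty() && line.back() == '\\') {
            logical += line.substr(0, line.size() - 1);
            logical += ' ';
            continue;
        }
        logical += line;
        std::size_t i = logical.find_first_not_of(" \t");
        if (i != std::string::npos && logical[i] == '#') {
            i = logical.find_first_not_of(" \t", i + 1);
            if (i == std::string::npos) i = logical.size();
            std::size_t end = i;
            while (end < logical.size() && is_ident_char(logical[end])) ++end;
            const std::string directive = logical.substr(i, end - i);
            if (directive == "if" || directive == "ifdef" ||
                directive == "ifndef" || directive == "elif") {
                i = end;
                while (i < logical.size()) {
                    if (!is_ident_char(logical[i])) { ++i; continue; }
                    const std::size_t start = i;
                    while (i < logical.size() && is_ident_char(logical[i])) ++i;
                    const std::string word = logical.substr(start, i - start);
                    // Skip numbers and the defined operator itself.
                    if (is_ident_start(word[0]) && word != "defined" &&
                        seen.insert(word).second) {
                        result.push_back(word);
                    }
                }
            }
        }
        logical.clear();
    }
    return result;
}
```

Applied to the snippet from the question, this yields FOO, BAR and Z in that order. Anything beyond such simple input needs the heavier approaches discussed in the answers.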

How are macros handled by preprocessor?

I am reading Effective C++ (older version) and have some doubts.
Here, for example, it says:
When you do something like this
#define ASPECT_RATIO 1.653
the symbolic name ASPECT_RATIO may never be seen by the compilers; it may be removed by the preprocessor before the source code ever gets compiled. As a result, ASPECT_RATIO may never get entered into the symbol table. It can be confusing if you get an error during compilation involving the constant, because the error message may refer to 1.653 and not ASPECT_RATIO
I don't understand this paragraph. How can anything be removed by the preprocessor, just like that? What could be the reasons, and how likely is this in the real world?
Thanks
I don't understand this paragraph under inverted quotes. How can anything be removed by the preprocessor, just like that? What could be the reasons, and how likely is this in the real world?
Basically, what it describes is exactly how the C and C++ pre-processor works. The purpose is to replace macros/constants (created with the #define directive) with their actual values, instead of repeating the same values over and over again. In C++ it is considered bad style to use C-style macros, but they're supported for C compatibility.
The preprocessor, as the name suggests, runs prior to the actual compilation and basically changes the source code as directed by the pre-processor directives (those starting with #). This includes replacing macros with their values, including header files as directed by the #include directive, and so on.
This is used in order to avoid code repetitions, magic numbers, to share interfaces (header files) and many other useful things.
It's simply a global search and replace of "ASPECT_RATIO" with "1.653" in the file before passing it to the compiler.
That's why macros are so dangerous. If you have #define max 123 and a variable int max = 100, the compiler will get int 123 = 100 and you will get a confusing error message.
The pre-processor will replace all instances of the token ASPECT_RATIO that appear in the code with the actual token 1.653 ... thus the compiler will never see the token ASPECT_RATIO. By the time it compiles the code, it only sees the literal token 1.653 that was substituted in by the pre-processor.
Basically, the "problem" you will encounter with this approach is that ASPECT_RATIO will not be seen as a symbol by the compiler, so in a debugger, for example, you can't query the value of ASPECT_RATIO as if it were a variable. It's not a value that will have a memory address like a static const int may have (I say "may" because an optimizing compiler may decide to act like the pre-processor and optimize out the need for an explicit memory address to store the constant value, instead simply substituting the literal value wherever it appears in the code). A larger function-like macro also won't have an instruction address like an actual C/C++ function has, so you can't set breakpoints inside it. In a more general sense, though, I'm not sure I would call this a "problem" unless you intended to use the macro as a debug symbol or to set debugging breakpoints inside it. Otherwise the macro is doing its job.
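The substitution can be made visible with the standard two-level stringification idiom, which turns the already-expanded macro into a string literal. This is a small sketch for illustration; the names STRINGIFY, EXPAND_THEN_STRINGIFY, kWhatCompilerSees, kAspectRatio and width_for are all hypothetical.

```cpp
#include <cassert>
#include <string>

#define ASPECT_RATIO 1.653

// Two levels are needed so the argument is macro-expanded before '#' applies.
#define STRINGIFY(x) #x
#define EXPAND_THEN_STRINGIFY(x) STRINGIFY(x)

// The text the compiler proper receives wherever ASPECT_RATIO was written.
const char* const kWhatCompilerSees = EXPAND_THEN_STRINGIFY(ASPECT_RATIO);

// A named constant, by contrast, is a real symbol that the compiler
// (and possibly the debugger) knows about.
const double kAspectRatio = 1.653;

double width_for(double height) {
    return height * kAspectRatio;
}
```

Here kWhatCompilerSees holds the text 1.653, not the name ASPECT_RATIO, which is why diagnostics mention the number rather than the macro.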

Universally compiler independent way of implementing an UNUSED macro in C/C++

When implementing stubs etc. you want to avoid "unused variable" warnings. I've come across a few alternatives of UNUSED() macros over the years, but never one which either is proven to work for "all" compilers, or one which by standard is air tight.
Or are we stuck with #ifdef blocks for each build platform?
EDIT: Due to a number of answers with non c-compliant alternatives, I'd like to clarify that I'm looking for a definition which is valid for both C and C++, all flavours etc.
According to this answer by user GMan the typical way is to cast to void:
#define UNUSED(x) (void)(x)
but if x is marked as volatile, that would force a read from the variable and thus have a side effect. The way to almost guarantee a no-op while still suppressing the compiler warning is the following:
// use expression as sub-expression,
// then make type of full expression int, discard result
#define UNUSED(x) (void)(sizeof((x), 0))
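The difference between the two definitions can be observed by passing an expression with a visible side effect. The sketch below uses a counting function as a stand-in for the volatile read mentioned above, since a volatile access cannot be observed portably. The names UNUSED_CAST, UNUSED_SIZEOF, noisy and the two counting helpers are hypothetical.

```cpp
#include <cassert>

#define UNUSED_CAST(x)   (void)(x)                 // evaluates x
#define UNUSED_SIZEOF(x) (void)(sizeof((x), 0))    // x is an unevaluated operand

static int g_calls = 0;

// Stand-in for an expression whose evaluation has a side effect.
int noisy() {
    return ++g_calls;
}

// Returns how many times noisy() ran when passed to the cast variant.
int evaluations_with_cast() {
    g_calls = 0;
    UNUSED_CAST(noisy());
    return g_calls;
}

// Returns how many times noisy() ran when passed to the sizeof variant.
int evaluations_with_sizeof() {
    g_calls = 0;
    UNUSED_SIZEOF(noisy());
    return g_calls;
}
```

The cast variant runs the expression once, while the sizeof variant never runs it, because the operand of sizeof is not evaluated. Some compilers may still emit a different warning about the comma expression inside sizeof, so "universal" remains a matter of degree.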
In C++, just comment out the names.
void MyFunction(int /* name_of_arg1 */, float /* name_of_arg2*/)
{
...
}
The universal way is not to turn on warning options that spam warnings for clearly correct code. Any "unused variable" warning option that includes function arguments in its analysis is simply wrong and should be left off. Don't litter your code with ugliness to quiet broken compilers.
You might also try sending a bug report to the compiler maintainer/vendor.