I expected __attribute__((noinline)), when added to a function, to make sure that that function gets emitted. This works with gcc, but clang still seems to inline it.
Here is an example, which you can also open on Godbolt:
namespace {
__attribute__((noinline))
int inner_noinline() {
return 3;
}
int inner_inline() {
return 4;
}
int outer() {
return inner_noinline() + inner_inline();
}
}
int main() {
return outer();
}
When build with -O3, gcc emits inner_noinline, but not inner_inline:
(anonymous namespace)::inner_noinline():
mov eax, 3
ret
main:
call (anonymous namespace)::inner_noinline()
add eax, 4
ret
Clang insists on inlining it:
main: # #main
mov eax, 7
ret
If adding a parameter to the functions and letting them perform some trivial work, clang respects the noinline attribute: https://godbolt.org/z/NNSVab
Shouldn't noinline be independent of how complex the function is? What am I missing?
__attribute__((noinline)) prevents the compiler from inlining the function. It doesn't prevent it from doing constant folding. In this case, the compiler was able to recognize that there was no need to call inner_noinline, either as an inline insertion or an out-of-line call. It could just replace the function call with the constant 3.
It sounds like you want to use the optnone attribute instead, to prevent the compiler from applying even the most obvious of optimizations (as this one is).
Related
GCC doesn't output a warning when it's optimizer detects std::string being compiled with a nullptr. I found a workaround, just wondering if anything better. It takes advantage of the std::string implementation having an assert that fires when constructed with nullptr. So by compiling with optimisation and looking at the assembly, I can see that line.
Note, below is just an example, this isn't application code.
However, I wondered if there is a better way than what I do. I search for __throw_logic_error with that particular string.
I've pasted this from godbolt.org.
Can be compiled with GCC 12 as follows:
g++ -O1 -S -fverbose-asm -std=c++23 -Wall -o string.s string.cpp
#include <string>
void f()
{
const char * a = nullptr;
std::string s(a);
}
int main()
{
f();
}
.LC0:
.string "basic_string: construction from null is not valid"
f():
sub rsp, 8
mov edi, OFFSET FLAT:.LC0
call std::__throw_logic_error(char const*)
main:
sub rsp, 8
call f()
Throwing an exception could be in principle a legal use.
(This is the problem with logic_error's, they shouldn't be exceptions.)
For this reason, I doubt that a compiler will ever complain about this, even if it can evaluate all at compile time.
Your best bet is a static analyzer.
The version of clang-tidy in godbolt, can detect this,
warning: The parameter must not be null [clang-analyzer-cplusplus.StringChecker]
https://godbolt.org/z/h9GnT9ojx
This was incorporated in clang-tidy 17, https://clang.llvm.org/docs/analyzer/checkers.html#cplusplus-stringchecker-c
In the lines of compiler warnings you could be lucky (eventually) with something like this, although I wasn't lucky.
void f() {
const char * a = nullptr;
try {
std::string s(a);
}catch(std::logic_error&) {
std::unreacheable(); // or equivalent for current GCC
}
}
I was checking generated asm for some of my code and my eye caught some interesting stuff:
#include <mutex>
std::mutex m;
void foo()
{
m.lock();
}
generated asm code (x86-64 gcc 9.2, -std=c++11 -O2):
foo():
mov eax, OFFSET FLAT:_ZL28__gthrw___pthread_key_createPjPFvPvE
test rax, rax
je .L10 // (1) we can simply bypass lock() call?
sub rsp, 8
mov edi, OFFSET FLAT:m
call __gthrw_pthread_mutex_lock(pthread_mutex_t*)
test eax, eax
jne .L14 // (2) waste of space that will never be executed
add rsp, 8
ret
.L10:
ret
.L14:
mov edi, eax
call std::__throw_system_error(int)
m:
.zero 40
Questions:
part (1) -- gcc specific:
what it is doing? (allocating TLS entry?)
how failing that operation allows us to silently bypass lock() call?
part (2) -- looks like each compiler is affected:
std::mutex::lock() can throw according to standard
... but it never does in correct code (as discussed in related SO posts), for all intents and purposes std::mutex::lock() is always noexcept in correct code
is it possible to let compiler know so that it stops emitting unnecessary tests and instruction blocks (like .L14 above)?
Note: I can't see how throwing from std::mutex::lock() is better than simply abort()ing. In both cases your program is screwed (no one expects it to fail), but at least in latter case you end up with considerably smaller asm code ("pay only for something you use", remember?).
It seems that you are misinterpreting the asm output. What you see is not the code of foo but the inlined code of mutex::lock.
From https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/mutex:
void lock() // in class mutex
{
int __e = __gthread_recursive_mutex_lock(&_M_mutex);
// EINVAL, EAGAIN, EBUSY, EINVAL, EDEADLK(may)
if (__e)
__throw_system_error(__e);
}
From https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.0/gthr-default_8h-source.html:
static inline int __gthread_recursive_mutex_lock (__gthread_recursive_mutex_t *mutex)
{
return __gthread_mutex_lock (mutex);
}
static inline int __gthread_mutex_lock (__gthread_mutex_t *mutex)
{
if (__gthread_active_p ())
return pthread_mutex_lock (mutex);
else
return 0;
}
The names do not exactly match your asm code, so I probably looked at a different libstdc++ source, but to me it looks like the compiler inlined mutex::lock into your function foo and it also inlined the functions that mutex::lock is calling.
I am using explicit register variables to pass parameters to a raw Linux syscall using registers that don't have machine-specific constraints (such as r8, r9, r10 on x86_64) as suggested here.
#include <asm/unistd.h>
#ifdef __i386__
#define _syscallOper "int $0x80"
#define _syscallNumReg "eax"
#define _syscallRetReg "eax"
#define _syscallReg1 "ebx"
#define _syscallReg2 "ecx"
#define _syscallReg3 "edx"
#define _syscallReg4 "esi"
#define _syscallReg5 "edi"
#define _syscallReg6 "ebp"
#define _syscallClob
#else
#define _syscallOper "syscall"
#define _syscallNumReg "rax"
#define _syscallRetReg "rax"
#define _syscallReg1 "rdi"
#define _syscallReg2 "rsi"
#define _syscallReg3 "rdx"
#define _syscallReg4 "r10"
#define _syscallReg5 "r8"
#define _syscallReg6 "r9"
#define _syscallClob "rcx", "r11"
#endif
template <typename Ret = long, typename T1>
Ret syscall(long num, T1 arg1)
{
register long _num __asm__(_syscallNumReg) = num;
register T1 _arg1 __asm__(_syscallReg1) = arg1;
register Ret _ret __asm__(_syscallRetReg);
__asm__ __volatile__(_syscallOper
: "=r"(_ret)
: "r"(_num), "r"(_arg1)
: _syscallClob);
return _ret;
}
extern "C" void _start()
{
syscall(__NR_exit, 0);
}
However this feature requires the use of register keyword which was deprecated in C++11 and removed in C++17. So when I compile this code with GCC 7 (-std=c++17 -nostdlib) it gives me a warning:
ISO C++1z does not allow ‘register’ storage class specifier [-Wregister]
and it seems to ignore the register allocation and the program segfaults because syscall wasn't called properly. This code however compiles and works fine in Clang 6. Note: I actually have 6 syscall functions (up to 6 arguments) but only 1-argument version is shown here for the sake of minimal example.
I realize that register keyword by itself wasn't really useful that's why it was removed, but this specific use case seems like an exception to me so it seems unreasonable to remove compiler support for it as well.
I also realize that this use case is compiler-specific (i.e. non-standard), so my question is about compiler support rather then removal from the standard.
It appears you've found a GCC bug: GNU register-asm local variables don't work inside template functions. (clang compiles your example correctly). Apparently this was already a known bug, thanks for #Florian for finding it.
-Wregister triggering is just a symptom of this first bug: GNU register-asm local variables don't trigger the warning. But in a template, gcc compiles them as they were plain register int foo = bar; without the asm part of the declaration. So GCC thought you were just using plain register variables, not register-asm.
In a regular function, your code compiles fine with no warnings, even with -std=c++17.
#define T1 unsigned long
#define Ret T1
// template <typename Ret = long, typename T1>
... your code unchanged ...
__asm__ __volatile__(_syscallOper " #operands in %0, %1, %2"
...
On Godbolt with gcc7.3 -O3:
_start:
movl $60, %eax
xorl %edx, %edx
syscall #operands in %rax, %rax, %edx
ret
But clang6.0 doesn't have this bug, and we get:
_start: # #_start
movl $60, %eax
xorl %edi, %edi
syscall #operands in %rax, %rax, %edi
retq
Notice the asm comment I appended to your template (with C++ string-literal concatenation). We can just get the compiler to tell us what it thinks it's doing, instead of having to puzzle things out.
(Mostly posting this answer to discuss that debugging technique; Florian's answer already covers the specifics of this actual case.)
Instead of templates, you can use MUSL's existing portable headers:
It's a C library, so it might need a bit of extra casting to keep a C++ compiler happy. Or avoiding use of temporary expressions as lvalues, in the ARM headers.
But it should take care of most of the issues Florian pointed out. It has a permissive license, so you can just copy its syscall wrapper headers into your project. They work without linking against the rest of MUSL, and are truly inline.
http://git.musl-libc.org/cgit/musl/tree/arch/x86_64/syscall_arch.h is the x86-64 version.
This looks like a GCC bug to me. The C++17 warning is a red herring. The code works fine with optimization for me (when compiled with GCC 7), but it breaks at -O0.
According to the documentation for local register variables, this is not expected, so this is likely a GCC bug. According to this bug report, it is not even related to optimization, but ultimately caused by the use of a template.
I suggest to overload only on the number of system call arguments in the ultimate system call wrapper, and use long types for all arguments and the result:
inline long syscall_base(long num, long arg1)
{
register long _num __asm__(_syscallNumReg) = num;
register long _arg1 __asm__(_syscallReg1) = arg1;
register long _ret __asm__(_syscallRetReg);
__asm__ __volatile__(_syscallOper
: "=r"(_ret)
: "r"(_num), "r"(_arg1)
: _syscallClob);
return _ret;
}
template <typename Ret = long, typename T1>
Ret syscall(long num, T1 arg1)
{
return (Ret) (syscall_base(num, (long) arg1));
}
You'll have to use something nicer for the casts (probably type-indexed conversion functions), and of course you still have to deal with the syscall ABI variance in some other way (x32 has long long instead of long, and POWER has two return registers instead of one, etc.), but that's also a problem with your original approach.
Strangest error output:
#include <iostream>
int main(int arg, char **LOC[])
{
asm
(
"mov eax, 0CF;"
"pusha;"
);
return 0;
}
It complains, and here is the error from GCC:
t.s: Assembler messages:
t.s:31: Error: too many memory references for `mov'
You get this error because your assembly is malformatted. Register accesses are done like %eax, $ is used for immediate operands. Furthermore, GCC, by default (see DanielKO's comment), uses the AT&T syntax, which has the destination on the right and the source on the left. Is this what you are looking for?
mov $0xcf, %eax
Also, your pusha is unbalanced, ie you don't clean up the stack correctly before you return from your function. It would be nice to know what your overall goal is, because right now it seems like you copied and pasted only an incomplete fraction of the source.
I tried to compile with GCC inline assembly code which compiled fine with MSVC, but got the following errors for basic operations:
// var is a template variable in a C++ function
__asm__
{
mov edx, var //error: Register name not specified for %edx
push ebx //error: Register name not specified for %ebx
sub esp, 8 //error: Register name not specified for %esp
}
After looking through documentation covering the topic, I found out that I should probably convert (even if I am only interested in x86) Intel style assembly code to AT&T style. However, after trying to use AT&T style I got even more weird errors:
mov var, %edx //error: Expected primary-expression before % token
mov $var, edx //error: label 'LASM$$s' used but not defined
I should also note that I tried to use LLVM-GCC, but it failed miserably with internal errors after encountering inline assembly.
What should I do?
For Apple's gcc you want -fasm-blocks which allows you to omit gcc's quoting requirement for inline asm and also lets you use Intel syntax.
// test_asm.c
int main(void)
{
int var;
__asm__
{
mov edx,var
push ebx
sub esp,8
}
return 0;
}
Compile this with:
$ gcc -Wall -m32 -fasm-blocks test_asm.c -o test_asm
Tested with gcc 4.2.1 on OS X 10.6.
g++ inline assembler is much more flexible than MSVC, and much more complicated. It treats an asm directive as a pseudo-instruction, which has to be described in the language of the code generator. Here is a working sample from my own code (for MinGW, not Mac):
// int BNASM_Add (DWORD* result, DWORD* a, int len)
//
// result += a
int BNASM_Add (DWORD* result, DWORD* a, int len)
{
int carry ;
asm volatile (
".intel_syntax\n"
" clc\n"
" cld\n"
"loop03:\n"
" lodsd\n"
" adc [edx],eax\n"
" lea edx,[edx+4]\n" // add edx,4 without disturbing the carry flag
" loop loop03\n"
" adc ecx,0\n" // Return the carry flag (ecx known to be zero)
".att_syntax\n"
: "=c"(carry) // Output: carry in ecx
: "d"(result), "S"(a), "c"(len) // Input: result in edx, a in esi, len in ecx
) ;
return carry ;
}
You can find documentation at http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Extended-Asm.