std::mutex::lock() produces weird (and unnecessary) asm code - c++

I was checking generated asm for some of my code and my eye caught some interesting stuff:
#include <mutex>
std::mutex m;
void foo()
{
m.lock();
}
Generated asm code (x86-64 gcc 9.2, -std=c++11 -O2):
foo():
mov eax, OFFSET FLAT:_ZL28__gthrw___pthread_key_createPjPFvPvE
test rax, rax
je .L10 // (1) we can simply bypass lock() call?
sub rsp, 8
mov edi, OFFSET FLAT:m
call __gthrw_pthread_mutex_lock(pthread_mutex_t*)
test eax, eax
jne .L14 // (2) waste of space that will never be executed
add rsp, 8
ret
.L10:
ret
.L14:
mov edi, eax
call std::__throw_system_error(int)
m:
.zero 40
Questions:
Part (1), gcc-specific:
What is it doing? (Allocating a TLS entry?)
How does failing that operation allow us to silently bypass the lock() call?
Part (2), which looks like it affects every compiler:
std::mutex::lock() can throw according to the standard
... but it never does in correct code (as discussed in related SO posts); for all intents and purposes std::mutex::lock() is noexcept in correct code
Is it possible to let the compiler know, so that it stops emitting unnecessary tests and instruction blocks (like .L14 above)?
Note: I can't see how throwing from std::mutex::lock() is better than simply calling abort(). In both cases your program is screwed (no one expects it to fail), but at least in the latter case you end up with considerably smaller asm code ("pay only for what you use", remember?).

It seems that you are misinterpreting the asm output. What you see is not the code of foo but the inlined code of mutex::lock.
From https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/mutex:
void lock() // in class mutex
{
int __e = __gthread_recursive_mutex_lock(&_M_mutex);
// EINVAL, EAGAIN, EBUSY, EINVAL, EDEADLK(may)
if (__e)
__throw_system_error(__e);
}
From https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.0/gthr-default_8h-source.html:
static inline int __gthread_recursive_mutex_lock (__gthread_recursive_mutex_t *mutex)
{
return __gthread_mutex_lock (mutex);
}
static inline int __gthread_mutex_lock (__gthread_mutex_t *mutex)
{
if (__gthread_active_p ())
return pthread_mutex_lock (mutex);
else
return 0;
}
The names do not exactly match your asm code, so I probably looked at a different libstdc++ source, but to me it looks like the compiler inlined mutex::lock into your function foo and it also inlined the functions that mutex::lock is calling.
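On part (1): the symbol tested at the top of foo() looks like the weak reference behind __gthread_active_p() in the source quoted above. Nothing is called or allocated. The code only asks whether the weak pthread symbol resolved to a non-null address, and if it did not (no threading library linked in), locking is skipped and success is reported. Here is a minimal sketch of that pattern, assuming GCC or clang on an ELF target. hypothetical_thread_lock, threads_active and lock_if_threaded are made-up names for illustration, not libstdc++ identifiers.

```cpp
#include <cassert>

// Hypothetical optional entry point standing in for the weak pthread
// reference. Nothing in this program defines it, so its address is null.
extern "C" int hypothetical_thread_lock(void* mutex) __attribute__((weak));

// Same shape as __gthread_active_p(): "did the weak symbol resolve?"
inline bool threads_active() {
    return hypothetical_thread_lock != nullptr;
}

// Same shape as __gthread_mutex_lock(): without a threading library there
// is nobody to race with, so report success without locking anything.
inline int lock_if_threaded(void* mutex) {
    if (threads_active())
        return hypothetical_thread_lock(mutex);
    return 0;
}
```

In this sketch the early-return path corresponds to the je .L10 branch in the question's asm: it is a run-time check for a single-threaded program, not a failed operation.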

Related

namespace in debug flags of in-class defined friend functions

I'm dealing with a class that defines a friend function in the class without outside declaration
namespace our_namespace {
template <typename T>
struct our_container {
friend our_container set_union(our_container const &, our_container const &) {
// meaningless for the example here, just a valid definition
// no valid semantics
return our_container{};
}
};
} // namespace our_namespace
As discussed (e.g. here or here) the function set_union is not in the our_namespace namespace but will be found by argument dependent lookup:
auto foo(std::vector<our_namespace::our_container<float>> in) {
// works:
return set_union(in[0], in[1]);
}
I noticed however that in the debug flags set_union appears to be in the our_namespace namespace
mov rdi, qword ptr [rbp - 40] # 8-byte Reload
mov rsi, rax
call our_namespace::set_union(our_namespace::our_container<float> const&, our_namespace::our_container<float> const&)
add rsp, 48
pop rbp
ret
our_namespace::set_union(our_namespace::our_container<float> const&, our_namespace::our_container<float> const&): # #our_namespace::set_union(our_namespace::our_container<float> const&, our_namespace::our_container<float> const&)
push rbp
mov rbp, rsp
mov qword ptr [rbp - 16], rdi
mov qword ptr [rbp - 24], rsi
pop rbp
ret
although I can't call it as our_namespace::set_union
auto foo(std::vector<our_namespace::our_container<float>> in) {
// fails:
return our_namespace::set_union(in[0], in[1]);
}
Any hints about how the debug information is to be understood?
EDIT: The set_union function body is only a strawdog example here to have a valid definition.
The C++ standard only defines compiler behavior in regards to the code compilation and behavior of the resulting program. It doesn't define all the aspects of code generation, and in particular, it doesn't define debug symbols.
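So your compiler correctly (as per the Standard) rejects the qualified call: a friend function defined only inside a class is a member of the enclosing namespace, but its name is not visible to qualified lookup until it is also declared at namespace scope, so only argument-dependent lookup can find it. But since the function does exist and you should be able to debug it, its symbol has to be put somewhere, and the enclosing namespace is a reasonable choice.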
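To make the lookup rules concrete, here is a small sketch using a non-template class to keep things short. demo_ns, box, merge and widen are made-up names, not taken from the question. merge is declared only as an in-class friend, so it is reachable only through argument-dependent lookup. widen is additionally declared at namespace scope, which also makes the qualified call valid.

```cpp
#include <cassert>

namespace demo_ns {
struct box {
    int value;
    // Declared only here: found by ADL, but not by demo_ns::merge(...).
    friend box merge(box const& a, box const& b) {
        return box{a.value + b.value};
    }
    // Also redeclared below, so qualified lookup can find it too.
    friend box widen(box const& a) {
        return box{a.value * 2};
    }
};
// Namespace-scope redeclaration makes the name visible to qualified lookup.
box widen(box const& a);
} // namespace demo_ns

int via_adl() {
    demo_ns::box a{1}, b{2};
    return merge(a, b).value;         // ADL finds the in-class friend
}

int via_qualified() {
    demo_ns::box a{5};
    return demo_ns::widen(a).value;   // OK only because of the redeclaration
}
```

Writing demo_ns::merge(a, b) in via_adl() would be rejected in the same way as our_namespace::set_union in the question.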

clang ignoring attribute noinline

I expected __attribute__((noinline)), when added to a function, to make sure that that function gets emitted. This works with gcc, but clang still seems to inline it.
Here is an example, which you can also open on Godbolt:
namespace {
__attribute__((noinline))
int inner_noinline() {
return 3;
}
int inner_inline() {
return 4;
}
int outer() {
return inner_noinline() + inner_inline();
}
}
int main() {
return outer();
}
When built with -O3, gcc emits inner_noinline, but not inner_inline:
(anonymous namespace)::inner_noinline():
mov eax, 3
ret
main:
call (anonymous namespace)::inner_noinline()
add eax, 4
ret
Clang insists on inlining it:
main: # #main
mov eax, 7
ret
If I add a parameter to the functions and let them perform some trivial work, clang respects the noinline attribute: https://godbolt.org/z/NNSVab
Shouldn't noinline be independent of how complex the function is? What am I missing?
__attribute__((noinline)) prevents the compiler from inlining the function. It doesn't prevent it from doing constant folding. In this case, the compiler was able to recognize that there was no need to call inner_noinline, either as an inline insertion or an out-of-line call. It could just replace the function call with the constant 3.
It sounds like you want to use the optnone attribute instead, to prevent the compiler from applying even the most obvious of optimizations (as this one is).
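Another option, the one the GCC documentation for noinline suggests, is to put an empty asm statement in the callee so that the call has a side effect the optimizer cannot see through. This is a sketch of that idea applied to the example from the question, not something I have verified on every clang version, so check the generated asm for your compiler. outer_sum is a made-up name.

```cpp
#include <cassert>

namespace {

__attribute__((noinline))
int inner_noinline() {
    // Empty asm statement: emits no instructions, but counts as an opaque
    // side effect, so the call itself should not be optimized away even
    // though the returned value is a known constant.
    __asm__ __volatile__("");
    return 3;
}

int inner_inline() {
    return 4;
}

int outer_sum() {
    return inner_noinline() + inner_inline();
}

} // namespace
```

The result is still 7 either way; the empty asm is only meant to keep the out-of-line call from being dropped.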

Referencing memory operands in .intel_syntax GNU C inline assembly

I'm catching a link error when compiling and linking a source file with inline assembly.
Here are the test files:
via:$ cat test.cxx
extern int libtest();
int main(int argc, char* argv[])
{
return libtest();
}
$ cat lib.cxx
#include <stdint.h>
int libtest()
{
uint32_t rnds_00_15;
__asm__ __volatile__
(
".intel_syntax noprefix ;\n\t"
"mov DWORD PTR [rnds_00_15], 1 ;\n\t"
"cmp DWORD PTR [rnds_00_15], 1 ;\n\t"
"je done ;\n\t"
"done: ;\n\t"
".att_syntax noprefix ;\n\t"
:
: [rnds_00_15] "m" (rnds_00_15)
: "memory", "cc"
);
return 0;
}
Compiling and linking the program results in:
via:$ g++ -fPIC test.cxx lib.cxx -c
via:$ g++ -fPIC lib.o test.o -o test.exe
lib.o: In function `libtest()':
lib.cxx:(.text+0x1d): undefined reference to `rnds_00_15'
lib.cxx:(.text+0x27): undefined reference to `rnds_00_15'
collect2: error: ld returned 1 exit status
The real program is more complex. The routine is out of registers so the flag rnds_00_15 must be a memory operand. Use of rnds_00_15 is local to the asm block. It is declared in the C code to ensure the memory is allocated on the stack and nothing more. We don't read from it or write to it as far as the C code is concerned. We list it as a memory input so GCC knows we use it and wire up the "C variable name" in the extended ASM.
Why am I receiving a link error, and how do I fix it?
Compile with gcc -masm=intel and don't try to switch modes inside the asm template string. AFAIK there's no equivalent before clang14. (Note: macOS installs clang as gcc / g++ by default.)
Also, of course you need to use valid GNU C inline asm, using operands to tell the compiler which C objects you want to read and write.
See "Can I use Intel syntax of x86 assembly with GCC?" (clang14 supports -masm=intel like GCC) and "How to set gcc to use intel syntax permanently?" (clang13 and earlier didn't).
I don't believe Intel syntax uses the percent sign. Perhaps I am missing something?
You're getting mixed up between %operand substitutions into the Extended-Asm template (which use a single %), vs. the final asm that the assembler sees.
You need %% to use a literal % in the final asm. You wouldn't use "mov %%eax, 1" in Intel-syntax inline asm, but you do still use "mov %0, 1" or %[named_operand].
See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html. In Basic asm (no operands), there is no substitution and % isn't special in the template, so you'd write mov $1, %eax in Basic asm vs. mov $1, %%eax in Extended, if for some reason you weren't using an operand like mov $1, %[tmp] or mov $1, %0.
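A small sketch of the % vs. %% distinction in Extended asm, assuming the default AT&T dialect (no -masm=intel) and an x86 target. The function names are made up, and the #else branch exists only so the snippet still compiles elsewhere.

```cpp
#include <cassert>

// %0 is an operand placeholder: the compiler substitutes a register it picks.
int value_via_operand() {
#if defined(__x86_64__) || defined(__i386__)
    int out;
    __asm__("movl $42, %0" : "=r"(out));
    return out;
#else
    return 42; // non-x86 fallback, no asm involved
#endif
}

// %%eax becomes a literal %eax in the asm the assembler sees. The "=a"
// constraint tells the compiler the result is left in eax.
int value_via_named_register() {
#if defined(__x86_64__) || defined(__i386__)
    int out;
    __asm__("movl $42, %%eax" : "=a"(out));
    return out;
#else
    return 42; // non-x86 fallback, no asm involved
#endif
}
```

The first form is usually preferable, since it leaves register allocation to the compiler.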
uint32_t rnds_00_15; is a local with automatic storage. Of course there's no asm symbol with that name.
Use %[rnds_00_15] and compile with -masm=intel. (And remove the .att_syntax at the end; that would break the compiler-generated asm that comes after.)
You also need to remove the DWORD PTR, because the operand expansion already includes that, e.g. DWORD PTR [rsp - 4], and clang errors on DWORD PTR DWORD PTR [rsp - 4]. (GAS accepts it just fine, but the 2nd one takes precedence so it's pointless and potentially misleading.)
And you'll want a "=m" output operand if you want the compiler to reserve you some scratch space on the stack. You must not modify input-only operands, even if it's unused in the C. Maybe the compiler decides it can overlap something else because it's not written and not initialized (i.e. UB). (I'm not sure if your "memory" clobber makes it safe, but there's no reason not to use an early-clobber output operand here.)
And you'll want to avoid label name conflicts by using %= to get a unique number.
Working example (GCC and ICC, but not clang unfortunately), on the Godbolt compiler explorer (which uses -masm=intel depending on options in the dropdown). You can use "binary mode" (the 11010 button) to prove that it actually assembles after compiling to asm without warnings.
int libtest_intel()
{
uint32_t rnds_00_15;
// Intel syntax operand-size can only be overridden with operand modifiers
// because the expansion includes an explicit DWORD PTR
__asm__ __volatile__
( // ".intel_syntax noprefix \n\t"
"mov %[rnds_00_15], 1 \n\t"
"cmp %[rnds_00_15], 1 \n\t"
"je .Ldone%= \n\t"
".Ldone%=: \n\t"
: [rnds_00_15] "=&m" (rnds_00_15)
:
: // no clobbers
);
return 0;
}
Compiles (with gcc -O3 -masm=intel) to this asm. Also works with gcc -m32 -masm=intel of course:
libtest_intel:
mov DWORD PTR [rsp-4], 1
cmp DWORD PTR [rsp-4], 1
je .Ldone8
.Ldone8:
xor eax, eax
ret
I couldn't get this to work with clang: It choked on .intel_syntax noprefix when I left that in explicitly.
Operand-size overrides:
You have to use %b[tmp] to get the compiler to substitute in BYTE PTR [rsp-4] to only access the low byte of a dword input operand. I'd recommend AT&T syntax if you want to do much of this.
Using %[rnds_00_15] results in Error: junk '(%ebp)' after expression.
That's because you switched to Intel syntax without telling the compiler. If you want it to use Intel addressing modes, compile with -masm=intel so the compiler can substitute into the template with the correct syntax.
This is why I avoid that crappy GCC inline assembly at nearly all costs. Man I despise this crappy tool.
You're just using it wrong. It's a bit cumbersome, but makes sense and mostly works well if you understand how it's designed.
Repeat after me: The compiler doesn't parse the asm string at all, except to do text substitutions of %operand. This is why it doesn't notice your .intel_syntax noprefix and keeps substituting AT&T syntax.
It does work better and more easily with AT&T syntax though, e.g. for overriding the operand-size of a memory operand, or adding an offset. (e.g. 4 + %[mem] works in AT&T syntax).
Dialect alternatives:
If you want to write inline asm that doesn't depend on -masm=intel or not, use Dialect alternatives (which makes your code super-ugly; not recommended for anything other than wrapping one or two instructions):
Also demonstrates operand-size overrides
#include <stdint.h>
int libtest_override_operand_size()
{
uint32_t rnds_00_15;
// Intel syntax operand-size can only be overridden with operand modifiers
// because the expansion includes an explicit DWORD PTR
__asm__ __volatile__
(
"{movl $1, %[rnds_00_15] | mov %[rnds_00_15], 1} \n\t"
"{cmpl $1, %[rnds_00_15] | cmp %k[rnds_00_15], 1} \n\t"
"{cmpw $1, %[rnds_00_15] | cmp %w[rnds_00_15], 1} \n\t"
"{cmpb $1, %[rnds_00_15] | cmp %b[rnds_00_15], 1} \n\t"
"je .Ldone%= \n\t"
".Ldone%=: \n\t"
: [rnds_00_15] "=&m" (rnds_00_15)
);
return 0;
}
With Intel syntax, gcc compiles it to:
mov DWORD PTR [rsp-4], 1
cmp DWORD PTR [rsp-4], 1
cmp WORD PTR [rsp-4], 1
cmp BYTE PTR [rsp-4], 1
je .Ldone38
.Ldone38:
xor eax, eax
ret
With AT&T syntax, compiles to:
movl $1, -4(%rsp)
cmpl $1, -4(%rsp)
cmpw $1, -4(%rsp)
cmpb $1, -4(%rsp)
je .Ldone38
.Ldone38:
xorl %eax, %eax
ret

Selectively omit frame pointer in MSVC

In GCC I can selectively set optimization flags for a specific function, so this:
void func() {}
generates:
func():
push rbp
mov rbp, rsp
nop
pop rbp
ret
And this:
__attribute__((optimize("-fomit-frame-pointer")))
void func() {}
generates:
func():
nop
ret
How can I do the same in Visual Studio?
There's a command-line option for the compiler, /Oy, that makes it omit frame pointers. You can achieve the same with #pragma optimize:
#pragma optimize("y", on)
int foo(int a) { // foo will be compiled with omitted frame pointers
return a;
}
#pragma optimize("y", off)
Here, foo() will be compiled with omitted frame pointers.
Note: As far as I can tell, this option only has an effect in an optimized build. So either pass some optimization flag to the compiler (like "/Og"), or include "g" in the pragma: #pragma optimize("gy", ...)
(I've checked this with Visual Studio 2015)

Inline assembly troubles

I tried to compile, with GCC, inline assembly code that compiles fine with MSVC, but got the following errors for basic operations:
// var is a template variable in a C++ function
__asm__
{
mov edx, var //error: Register name not specified for %edx
push ebx //error: Register name not specified for %ebx
sub esp, 8 //error: Register name not specified for %esp
}
After looking through documentation covering the topic, I found out that I should probably convert the Intel-style assembly code to AT&T style (even though I am only interested in x86). However, after trying AT&T style I got even weirder errors:
mov var, %edx //error: Expected primary-expression before % token
mov $var, edx //error: label 'LASM$$s' used but not defined
I should also note that I tried to use LLVM-GCC, but it failed miserably with internal errors after encountering inline assembly.
What should I do?
For Apple's gcc you want -fasm-blocks which allows you to omit gcc's quoting requirement for inline asm and also lets you use Intel syntax.
// test_asm.c
int main(void)
{
int var;
__asm__
{
mov edx,var
push ebx
sub esp,8
}
return 0;
}
Compile this with:
$ gcc -Wall -m32 -fasm-blocks test_asm.c -o test_asm
Tested with gcc 4.2.1 on OS X 10.6.
g++ inline assembler is much more flexible than MSVC, and much more complicated. It treats an asm directive as a pseudo-instruction, which has to be described in the language of the code generator. Here is a working sample from my own code (for MinGW, not Mac):
// int BNASM_Add (DWORD* result, DWORD* a, int len)
//
// result += a
int BNASM_Add (DWORD* result, DWORD* a, int len)
{
int carry ;
asm volatile (
".intel_syntax\n"
" clc\n"
" cld\n"
"loop03:\n"
" lodsd\n"
" adc [edx],eax\n"
" lea edx,[edx+4]\n" // add edx,4 without disturbing the carry flag
" loop loop03\n"
" adc ecx,0\n" // Return the carry flag (ecx known to be zero)
".att_syntax\n"
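: "=c"(carry), "+d"(result), "+S"(a) // Outputs: carry in ecx; edx and esi are advanced by the loop
: "c"(len) // Input: len in ecx
: "eax", "memory", "cc" // Clobbers: lodsd overwrites eax; adc writes memory and flags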
) ;
return carry ;
}
You can find documentation at http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Extended-Asm.
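For readers not fluent in lodsd / adc / loop, here is a portable sketch of what the BNASM_Add loop above appears to compute: a multi-word addition of a into result that returns the final carry. It assumes DWORD is a 32-bit unsigned integer and len > 0. add_with_carry is a made-up name, and this is a plain C++ stand-in, not the author's code.

```cpp
#include <cassert>
#include <cstdint>

// result += a over len 32-bit words, least significant word first.
// Returns the carry out of the most significant word (0 or 1).
int add_with_carry(std::uint32_t* result, const std::uint32_t* a, int len) {
    std::uint64_t carry = 0;
    for (int i = 0; i < len; ++i) {
        // Widen to 64 bits so the carry out of bit 31 is not lost.
        std::uint64_t sum = static_cast<std::uint64_t>(result[i]) + a[i] + carry;
        result[i] = static_cast<std::uint32_t>(sum); // low 32 bits
        carry = sum >> 32;                           // carry into the next word
    }
    return static_cast<int>(carry);
}
```

A modern optimizing compiler will often turn a loop like this into an adc chain on its own, so the inline asm version is mainly useful as an illustration of the constraint syntax.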