Using inline ARM asm in Android NDK project - c++

I'm doing a little project using the Android NDK, and I must insert some asm code for ARM architecture.
Everything apart from the asm works just fine, but the asm code tells me that
Operand 1 should be an integer register
when compiling simple code like
asm("mov r0, r0");
So, what is the problem? Is my computer trying to compile for x86_64 instead of ARM? If so, how should I change that?
Also, I've tried the x86_64 equivalent asm("mov rax, rax"); but the error is the same.

All your C sources are compiled for each architecture listed in your APP_ABI, so there is no point in wondering why ARM assembly is not understood by the x86_64 compiler, or vice versa.
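To illustrate: the same translation unit is handed to the compiler for every ABI, so an unguarded asm("mov r0, r0") reaches the x86_64 compiler as-is. A minimal sketch of per-architecture preprocessor guards (illustrative only; the function name is made up):
void nop_probe() {                     // hypothetical example function
#if defined(__arm__)
    asm("mov r0, r0");                 // only reaches the armeabi/armeabi-v7a compilers
#elif defined(__aarch64__)
    asm("mov x0, x0");                 // only reaches the arm64-v8a compiler
#elif defined(__i386__) || defined(__x86_64__)
    asm("nop");                        // x86: note that GCC/Clang expect AT&T syntax by default
#endif
}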
Do not use inline assembly. It is much better to put all the assembly code into dedicated *.S sources that are processed by as (the NDK toolchains include it). These assembly sources should be placed into architecture-specific folders such as arch-arm/ and arch-x86/. Then add them to Android.mk properly:
LOCAL_SRC_FILES := arch-$(TARGET_ARCH)/my_awesome_code.S
$(TARGET_ARCH) resolves the path to the appropriate source in a correct and painless way.
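On the C++ side you then only declare the routine and call it; my_awesome_function below is a made-up name matching the made-up file above (a sketch, not NDK boilerplate):
// main.cpp -- sketch of calling a routine defined in the per-architecture .S files.
// Each arch-*/my_awesome_code.S is assumed to export the same symbol.
extern "C" int my_awesome_function(int x);

int call_it() {
    return my_awesome_function(42);    // resolved against whichever .S was built for this ABI
}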
P.S. Standalone assembly also gives you more capabilities than inline assembly, which is one more reason to avoid it. Moreover, inline assembly syntax differs from compiler to compiler, since it is not part of the standard.

Related

Step into standard library call with godbolt

I want to know how various compilers implement std::random_device, so I popped it into godbolt.
Unfortunately, the only thing it says is
std::random_device::operator()():
push rbp
mov rbp, rsp
sub rsp, 16
mov QWORD PTR [rbp-8], rdi
mov rax, QWORD PTR [rbp-8]
mov rdi, rax
call std::random_device::_M_getval()
leave
ret
which is not very helpful. How can I step into the _M_getval() call and examine the assembly there?
You can't "step into" functions; Godbolt isn't a debugger, it's a disassembler (in "binary" mode, otherwise a compiler asm-text output filter / viewer). Your program doesn't run, it just gets compiled. (And unless you choose the "binary" output option, it only compiles to asm, not to machine code, and doesn't actually link.)
But regardless of terminology, no, you can't get Godbolt to show you disassembly for whatever version of a library it happens to have installed.
Single-step the program on your desktop. (Compile with gcc -O3 -fno-plt to avoid having to step through PLT lazy dynamic linking.)
(I did, and libstdc++ 6.2.1 on Arch Linux runs cpuid in the constructor for std::random_device. If rdrand is available, it uses it on calls to _M_getval(). Figuring this out from just disassembly would have been tricky; there are several levels of function calls and branching, and without symbols it would have been hard to figure out what's what. My Skylake has rdseed available, but it didn't use it. Yes, as you commented, that would be a better choice.)
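For reference, a minimal program to build and single-step (an assumption of what the OP is looking at, not their actual code):
// rd_probe.cpp -- build with: g++ -O3 -g -fno-plt rd_probe.cpp
// then run gdb ./a.out, break on main, and stepi into rd() to reach _M_getval().
#include <random>
#include <cstdio>

int main() {
    std::random_device rd;             // libstdc++'s constructor runs its CPU detection here
    unsigned int v = rd();             // operator() calls _M_getval(); step into this
    std::printf("%u\n", v);
}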
Different compilers can generate different versions of library functions from the same source, that's the main point of the compiler explorer's existence. And no, it doesn't have a separate version of libstdc++ compiled by every compiler in the dropdown.
There'd be no guarantee that the library code you saw would match what's on your desktop, or anything.
It does actually have x86-64 Linux libraries installed, though, so in theory it would be possible for Godbolt to give you an option to find and disassemble certain library functions, but that functionality does not exist currently. And would only work for targets where the "binary" option is available; I think for most of the cross-compile targets it only has headers not libraries. Or maybe there's some other reason it won't link and disassemble for non-x86 ISAs.
Using -static and binary mode shows stuff, but not what we want.
I tried compiling with -static -fno-plt -fno-exceptions -fno-rtti -nostartfiles -O3 -march=skylake (so rdrand and rdseed would be available in case they got inlined; they don't). -fno-plt is redundant with -static, but it's useful without -static to remove that clutter.
-static causes the library code to actually end up in the linked binary that Godbolt disassembles. But the output is limited to 500 lines, and the definition of std::random_device::_M_getval() happens not to be near the start of the file.
-nostartfiles avoids cluttering the binary with _start and so on from CRT startup files. I think Godbolt already filters these out of the disassembly, though, because you don't see them in the normal binary output (without -static). You're not going to run the program, so it doesn't matter that the linker couldn't find a _start symbol and just defaulted to putting the ELF entry point at the start of the .text section.
Despite compiling with -fno-exceptions -fno-rtti (so no unwind handler for your function is included), libstdc++ functions were compiled with exception handling enabled. So linking them pulls in boatloads of exception code. The static executable starts out with definitions for functions like std::__throw_bad_exception(): and std::__throw_bad_alloc():
BTW, without -fno-exceptions, there's also a get_random_seed() [clone .cold]: definition, which I think is an unwind handler. It's not a definition of your actual function. Near the start of the static binary is operator new(unsigned long) [clone .cold]: which again I think is libstdc++'s exception-handler code.
I think the .text.cold or .init sections got linked first, unfortunately, so none of the interesting functions are going to be visible in the first 500 lines.
Even if this had worked, it's only binary-mode disassembly, not compiler asm
Even with debug symbols, we wouldn't know which struct member was being accessed, just numeric offsets from registers, because objdump doesn't fill those in.
And with lots of branching, it's hard to follow complicated logic possibilities. Single-stepping at run-time automatically follows the actual path of execution.
Related:
How to remove "noise" from GCC/clang assembly output? about using Matt Godbolt's Compiler Explorer for things it is good for.
Matt Godbolt's CppCon2017 talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” is an excellent guide, and points out that you can clone the compiler-explorer repo and set it up locally with your own choices of compiler. You could even hack it to allow larger output, but that's still obviously a bad approach for this problem.

Assembler Messages: no such instruction when Compiling C++

I am attempting to compile some C++ code using gcc/5.3 on Scientific Linux release 6.7, but I keep getting the following errors whenever I run my Makefile:
/tmp/ccjZqIED.s: Assembler messages:
/tmp/ccjZqIED.s:768: Error: no such instruction: `shlx %rax,%rdx,%rdx'
/tmp/ccjZqIED.s:1067: Error: no such instruction: `shlx %rax,%rdx,%rdx'
/tmp/ccjZqIED.s: Assembler messages:
/tmp/ccjZqIED.s:6229: Error: no such instruction: `mulx %r10,%rcx,%rbx'
/tmp/ccjZqIED.s:6248: Error: no such instruction: `mulx %r13,%rcx,%rbx'
/tmp/ccjZqIED.s:7109: Error: no such instruction: `mulx %r10,%rcx,%rbx'
/tmp/ccjZqIED.s:7128: Error: no such instruction: `mulx %r13,%rcx,%rbx'
I've attempted to follow the advice from this question, with no change to my output:
Compile errors with Assembler messages
My compiler options are currently:
CXXFLAGS = -g -Wall -O0 -pg -std=c++11
Does anyone have any idea what could be causing this?
This means that GCC is outputting an instruction that your assembler doesn't support. Either that instruction comes from inline asm in the source code, or this shouldn't happen at all and suggests that GCC was compiled on a different machine with a newer assembler and then copied to another machine where it doesn't work properly.
Assuming those instructions aren't used explicitly in an asm statement you should be able to tell GCC not to emit those instructions with a suitable flag such as -mno-avx (or whatever flag is appropriate to disable use of those particular instructions).
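For context, shlx and mulx are BMI2 instructions (so -mno-bmi2 would be the matching disable flag). A hedged sketch of source that can make a BMI2-enabled GCC emit them; the function names are made up:
// bmi2_probe.cpp -- compile with: g++ -O2 -mbmi2 -S bmi2_probe.cpp
// With BMI2 enabled, recent GCC typically uses shlx for the variable shift and mulx
// for the widening multiply; an old binutils as then rejects the generated .s file.
#include <cstdint>

uint64_t var_shift(uint64_t x, uint64_t n) {
    return x << (n & 63);                                    // candidate for shlx
}

uint64_t mul_hi(uint64_t a, uint64_t b) {
    return (uint64_t)(((unsigned __int128)a * b) >> 64);     // candidate for mulx
}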
Jonathan Wakely's answer is correct in that the assembler, which your compiler invokes, does not understand the assembly code which your compiler generates.
As to why that happens, there are multiple possibilities:
1. You installed the newer compiler by hand without also updating your assembler.
2. Your compiler generates 64-bit instructions, but your assembler is limited to 32-bit ones for some reason.
Disabling AVX (-mno-avx) is unlikely to help, because AVX is not explicitly requested either -- there is no -march in the quoted CXXFLAGS. If it did help, then you did not show us all of the compiler flags; it would have been best if you simply included the entire compiler command line.
If suspicion 1 above is correct, then you should build and/or install the latest binutils package, which will provide an as that is aware of these newer instructions, among other things. You would then need to rebuild the compiler with the --with-as=/path/to/the/updated/as flag passed to configure.
If your Linux installation is 32-bit only (suspicion 2.), then you should not be generating 64-bit binaries at all. It is possible, but not trivial...
Do post the output of uname -a and your entire compiler command-line leading to the above error-messages.

-mimplicit-it compiler flag not recognized

I am attempting to compile a C++ library for a Tegra TK1. The library links to TBB, which I pulled using the package manager. During compilation I got the following error
/tmp/cc4iLbKz.s: Assembler messages:
/tmp/cc4iLbKz.s:9541: Error: thumb conditional instruction should be in IT block -- `strexeq r2,r3,[r4]'
A bit of googling and this question led me to try adding -mimplicit-it=thumb to CMAKE_CXX_FLAGS, but the compiler doesn't recognize it.
I am compiling on the Tegra with kernel 3.10.40-grinch-21.3.4, using the gcc 4.8.4 compiler (that's what comes back when I type c++ -v).
I'm not sure what the initial error message means, though I think it has something to do with the TBB linked library rather than the source I'm compiling. The problem with the fix is also mysterious. Can anyone shed some light on this?
-mimplicit-it is an option to the assembler, not to the compiler. Thus, in the absence of specific assembler flags in your makefile (which you probably don't have, given that you don't appear to be using a separate assembler step), you'll need to use the -Wa option to the compiler to pass it through, i.e. -Wa,-mimplicit-it=thumb.
The source of the issue is almost certainly some inline assembly - possibly from a static inline in a header file if you're really only linking pre-built libraries - which contains conditionally-executed instructions (I'm going to guess it's something like a cmpxchg implementation). Since your toolchain could well be configured to compile to the Thumb instruction set by default - which requires a preceding it (If-Then) instruction to set up conditional instructions - another alternative might be to just compile with -marm (and/or remove -mthumb if appropriate) and sidestep the issue by not using Thumb at all.
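For illustration, a hedged sketch (not TBB's actual code) of the kind of inline asm that triggers the message, with the explicit it that keeps a Thumb-2 assembler happy:
// cas_probe.cpp -- illustrative ARMv7 compare-and-swap, NOT TBB's implementation.
// Without the "it eq", assembling strexeq as Thumb-2 gives "thumb conditional
// instruction should be in IT block" unless -Wa,-mimplicit-it=thumb is passed.
static inline int compare_and_swap(volatile int* ptr, int expected, int desired) {
    int prev, status;
    __asm__ __volatile__(
        "1: ldrex   %0, [%2]        \n"
        "   cmp     %0, %3          \n"
        "   it      eq              \n"   // explicit IT block for the conditional store
        "   strexeq %1, %4, [%2]    \n"
        "   bne     2f              \n"   // value differed: give up
        "   cmp     %1, #0          \n"
        "   bne     1b              \n"   // exclusive store lost the race: retry
        "2:                         \n"
        : "=&r"(prev), "=&r"(status)
        : "r"(ptr), "r"(expected), "r"(desired)
        : "cc", "memory");
    return prev;                           // caller compares against expected to see if it succeeded
}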
Adding the compiler option:
-Wa,-mimplicit-it=thumb
should solve the problem.

How can I stop gcc from emitting swap{b} on newer ARM cpus?

I'm compiling DCP-O-Matic on a Raspberry Pi 2 and am getting the following warning:
/tmp/ccu6rDcg.s: Assembler messages:
/tmp/ccu6rDcg.s:4208: Warning: swp{b} use is deprecated for ARMv6 and ARMv7
I've passed "-mcpu=cortex-a8 -mfpu=neon" to the compiler, but I still get the warnings. I'm pretty sure there is something in the Linux kernel that makes this warning irrelevant, but I would really like to solve this.
This post has a lot of good information, but I can't seem to find the right switches to prevent the warnings. I've verified that there is no explicit assembler code using swp{b}.
Can anyone recommend the best way to clear these warnings? I really am kind of anal about compilation warnings. ;) I figure if there's a warning, there's a fix.
To clarify, I'm interested in how to get the gcc toolchain to emit the correct LDREX/STREX instructions, rather than swap{b}.
You can disable the warning with the assembler option -mno-warn-deprecated (passed through gcc as -Wa,-mno-warn-deprecated). A quick grep of the source code doesn't seem to show a use of inline asm, so perhaps it is in a header file of some library.
Incidentally, the Raspberry Pi 2 uses a Cortex-A7 processor, and you should get better performance if you build with -mcpu=cortex-a7 instead of -mcpu=cortex-a8.
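If the swp{b} does turn out to come from inline asm you can edit, a hedged sketch of the usual replacement, which lets the compiler emit ldrex/strex itself on ARMv7 targets (the function name is made up):
// swap_probe.cpp -- illustrative only; replaces a swp-based exchange with a GCC builtin.
// Built with -mcpu=cortex-a7, GCC expands this to an ldrex/strex loop, so the
// deprecated swp{b} instruction never reaches the assembler.
static inline int atomic_swap(volatile int* addr, int newval) {
    return __atomic_exchange_n(addr, newval, __ATOMIC_SEQ_CST);
}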

Find out compilation optimization flag from executable

I have an executable without knowing its build environment, under the assumption that gcc/g++ was used.
Is there a way to find out the optimization flag used during compilation (like -O0, -O2, ...)?
All means are welcome, whether by analyzing the binary or by some debug test via gdb (if we assume that the -g flag was used during compilation).
If you are lucky, the command line is present in the executable file itself, depending on the operating system and file format used. If it is an ELF file, try to dump the contents using objdump from GNU binutils.
I really don't know if this can help, but checking for -O0 is quite easy in the disassembly (objdump -d), since the generated code has no optimization at all and adds a few extra instructions to simplify debugging.
Typically on x86, the prologue of a function includes setting up the frame pointer (for the backtrace, I presume). So if you locate, for example, the main function, you should see something like:
... main:
... push %rbp
... mov %rsp,%rbp
And you should see this "pattern" at the beginning of almost every function.
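For instance, a small probe like this (hypothetical, not taken from the question) makes the difference easy to see in objdump -d:
// opt_probe.cpp -- compile twice (g++ -O0 / g++ -O2) and compare the disassembly.
// At -O0, square() keeps the push %rbp / mov %rsp,%rbp prologue and spills x to the stack;
// at -O2 it usually shrinks to a single imul and may even be inlined into main.
int square(int x) { return x * x; }

int main() { return square(21); }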
For other targets (I don't know what your target platform is), you should see more or less similar assembly sequences in the prologues or before the function calls.
For other optimization levels, things are way way trickier.
Sorry for remaining vague and not answering the entire question... Just hoping it will help.