asm function with C++

I would like to add a method to my class using assembly language. How can I do it?
example:
main.cpp
Struct ex {
int field1;
asm_method(char*);
}
add.asm
asm_method:
//some asm code

Get the asm output that the compiler generates for a non-inline definition of the C++ member function, and use it as a starting point for an asm source file. This works for any ISA with any compiler that can emit valid asm (which is most of them, although MSVC apparently emits a bunch of extra junk that you have to remove).
Example with GCC (for x86-64 GNU/Linux, but the method works anywhere); it also works with clang.
e.g. g++ -O3 -fverbose-asm -masm=intel -S -o foo_func.S foo.cpp (How to remove "noise" from GCC/clang assembly output?)
That .S file is now your asm source file. Remove the compiler-generated instruction lines and insert your own.
Obviously you need to know the calling convention and other stuff like that (e.g. for x86 see https://www.agner.org/optimize/#manuals for a calling convention guide), but this will get the compiler to do the name mangling for you, for that specific target platform's ABI.
struct ex { // lower case struct not Struct
int field1;
void *asm_method(char*); // methods need a return type
}; // struct declarations end with a ;
void *ex::asm_method(char*) {
return this; // easy way to find out what register `this` is passed in.
}
This compiles as follows for x86-64 System V with g++ -O3 (Godbolt with Linux gcc and Windows MSVC):
# x86-64 System V: GNU/Linux g++ -O3
# This is GAS syntax
.intel_syntax noprefix
.text # .text section is already the default at top of file
.align 2
.p2align 4 # aligning functions by 16 bytes is typical
.globl _ZN2ex10asm_methodEPc # the symbol is global, not private to this file
.type _ZN2ex10asm_methodEPc, @function # (optional) and it's a function.
_ZN2ex10asm_methodEPc: # a label defines the symbol
.cfi_startproc
## YOUR CODE GOES HERE ##
## RSP-8 is aligned by 16 in x86-64 SysV and Windows ##
mov rax, rdi # copy first arg (this) to return-value register.
ret # pop into program counter
.cfi_endproc
.size _ZN2ex10asm_methodEPc, .-_ZN2ex10asm_methodEPc # maybe non-optional for dynamic linking
It's probably fine to omit the .cfi stack-unwind directives from hand-written asm for leaf functions, since you're not going to be throwing C++ exceptions from hand-written asm (I hope).
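The symbol `_ZN2ex10asm_methodEPc` in the listing is the Itanium C++ ABI mangling of `ex::asm_method(char*)`. As a sketch, assuming GCC or Clang (whose `<cxxabi.h>` provides `abi::__cxa_demangle`; MSVC uses a different mangling scheme), you can double-check a name you copied into hand-written asm by demangling it. The `demangle` helper is hypothetical, written just for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>  // GCC/Clang only (Itanium C++ ABI); MSVC has no equivalent header
#include <memory>
#include <string>

// __cxa_demangle returns a malloc'ed buffer, so release it with free().
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Hypothetical helper: turn an Itanium-ABI mangled symbol back into a
// readable C++ signature, or return the input unchanged if that fails.
std::string demangle(const char* mangled) {
    int status = 0;
    std::unique_ptr<char, FreeDeleter> readable(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status));
    if (status != 0 || !readable) return mangled;
    return readable.get();
}
```

From a shell, `c++filt _ZN2ex10asm_methodEPc` prints the same readable signature.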

This depends on your target platform and compiler/toolchain and is generally too broad a question for StackOverflow.
For example, the C++ compiler in the GCC toolchain actually generates assembly from C++, and then produces object files from that assembly. Then the linker links together multiple object files to produce an ELF module.
You can bypass the C++ compilation step for a single object file and directly write .asm files.
You can assemble it the same way you compile a .c file: gcc -c myfile.S -o myfile.o (without -c, gcc would also try to link it into an executable).
You do need to take the platform ABI into account, so that your function accepts arguments and returns values in the correct registers. The platform ABI also specifies the calling convention and which registers must be preserved across function calls. Finally, you need to produce correct function names according to the C++ name-mangling rules, or use C naming rules (which are simpler) and declare your function extern "C".
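A minimal sketch of the `extern "C"` route, assuming x86-64 System V (the `ex_asm_method` name is hypothetical): keep the member function as a tiny inline C++ wrapper that forwards `this` to a C-linkage function, so the asm only has to define an unmangled symbol.

```cpp
#include <cassert>

struct ex;  // forward declaration so the C-linkage function can name it

// Hypothetical C-linkage entry point. Its symbol is plain "ex_asm_method"
// (Mach-O adds a leading underscore), so hand-written asm needs no mangled name.
// In a real build it would be defined in the .asm/.S file; the C++ definition
// at the bottom is only a stand-in so this sketch builds and runs by itself.
extern "C" void* ex_asm_method(ex* self, char* arg);

struct ex {
    int field1;
    // Inline wrapper that passes `this` explicitly. On x86-64 System V, `this`
    // for a member function and the first pointer arg of a C function both
    // arrive in RDI, so the asm body would be the same either way.
    void* asm_method(char* arg) { return ex_asm_method(this, arg); }
};

// Stand-in body; mirrors the `mov rax, rdi / ret` example (return `this`).
extern "C" void* ex_asm_method(ex* self, char* /*arg*/) { return self; }
```

The trade-off is a thin wrapper in the class definition (usually inlined, since it's defined in the class) in exchange for a simple symbol name that you don't have to look up.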
For more details see C++ to ASM linkage and for Linux ABI refer to System V ABI.
For Windows start here: calling conventions and compiling assembly in Visual Studio.

Related

What does DT_TEXTREL mean and how to solve? [duplicate]

64 bit Linux uses the small memory model by default, which puts all code and static data below the 2GB address limit. This makes sure that you can use 32-bit absolute addresses. Older versions of gcc use 32-bit absolute addresses for static arrays in order to save an extra instruction for relative address calculation. However, this no longer works. If I try to make a 32-bit absolute address in assembly, I get the linker error:
"relocation R_X86_64_32S against `.data' can not be used when making a shared object; recompile with -fPIC".
This error message is misleading, of course, because I am not making a shared object and -fPIC doesn't help.
What I have found out so far is this: gcc version 4.8.5 uses 32-bit absolute addresses for static arrays; gcc version 6.3.0 doesn't, and version 5 probably doesn't either. The linker in binutils 2.24 allows 32-bit absolute addresses; version 2.28 does not.
The consequence of this change is that old libraries have to be recompiled and legacy assembly code is broken.
Now I want to ask: When was this change made? Is it documented somewhere? And is there a linker option that makes it accept 32-bit absolute addresses?
Your distro configured gcc with --enable-default-pie, so it's making position-independent executables by default (allowing for ASLR of the executable as well as libraries). Most distros do that these days.
You actually are making a shared object: PIE executables are sort of a hack using a shared object with an entry-point. The dynamic linker already supported this, and ASLR is nice for security, so this was the easiest way to implement ASLR for executables.
32-bit absolute relocations aren't allowed in an ELF shared object; they would stop it from being loaded outside the low 2GiB (for sign-extended 32-bit addresses). 64-bit absolute addresses are allowed, but generally you only want that for jump tables or other static data, not as part of instructions (footnote 1).
The recompile with -fPIC part of the error message is bogus for hand-written asm; it's written for the case of people compiling with gcc -c and then trying to link with gcc -shared -o foo.so *.o, with a gcc where -fPIE is not the default. The error message should probably change because many people are running into this error when linking hand-written asm.
How to use RIP-relative addressing: basics
Always use RIP-relative addressing for simple cases where there's no downside; in NASM, put default rel at the top of your file. See also footnote 1 below and this answer for syntax. Only consider absolute addressing when it actually helps code size instead of hurting it.
In AT&T syntax that's foo(%rip); in GAS .intel_syntax noprefix use [rip + foo].
Disable PIE mode to make 32-bit absolute addressing work
Use gcc -fno-pie -no-pie to override this back to the old behaviour. -no-pie is the linker option, -fno-pie is the code-gen option. With only -fno-pie, gcc will make code like mov eax, offset .LC0 that doesn't link with the still-enabled -pie.
(clang can have PIE enabled by default, too: use clang -fno-pie -nopie. A July 2017 patch made -no-pie an alias for -nopie, for compatibility with gcc, but clang 4.0.1 doesn't have it.)
Performance cost of PIE for 64-bit (minor) or 32-bit code (major)
With only -no-pie (but still -fpie), compiler-generated code (from C or C++ sources) will be slightly slower and larger than necessary, yet it still gets linked into a position-dependent executable that won't benefit from ASLR, so you pay the cost without the benefit. "Too much PIE is bad for performance" reports an average slowdown of 3% for x86-64 on SPEC CPU2006 (I don't have a copy of the paper, so IDK what hardware that was on :/). But in 32-bit code, the average slowdown is 10%, worst-case 25% (on SPEC CPU2006).
The penalty for PIE executables is mostly for stuff like indexing static arrays, as Agner describes in the question, where using a static address as a 32-bit immediate or as part of a [disp32 + index*4] addressing mode saves instructions and registers vs. a RIP-relative LEA to get an address into a register. Also 5-byte mov r32, imm32 instead of 7-byte lea r64, [rel symbol] for getting a static address into a register is nice for passing the address of a string literal or other static data to a function.
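To make the static-array case concrete, here is a small C++ function of that shape. The asm in the comments is a rough, hand-written illustration of the difference described above, not captured compiler output; check your own compiler's output (e.g. on Godbolt) for the exact instructions.

```cpp
#include <cassert>

// Indexing a static array with a runtime index: the case where PIE costs an
// extra instruction. Roughly what x86-64 GCC emits (exact output varies):
//   -fno-pie -no-pie:  and edi, 7
//                      mov eax, [table + rdi*4]     ; table address as disp32
//   -fpie (default):   and edi, 7
//                      lea rax, [rip + table]       ; extra LEA and register
//                      mov eax, [rax + rdi*4]
static const int table[] = {2, 3, 5, 7, 11, 13, 17, 19};

int lookup(unsigned i) {
    return table[i & 7];  // the mask keeps the index in bounds
}
```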
-fPIE still assumes no symbol-interposition for global variables / functions, unlike -fPIC for shared libraries which have to go through the GOT to access globals (which is yet another reason to use static for any variables that can be limited to file scope instead of global). See The sorry state of dynamic libraries on Linux.
Thus -fPIE is much less bad than -fPIC for 64-bit code, but still bad for 32-bit because RIP-relative addressing isn't available. See some examples on the Godbolt compiler explorer. On average, -fPIE has a very small performance / code-size downside in 64-bit code. The worst case for a specific loop might only be a few %. But 32-bit PIE can be much worse.
None of these -f code-gen options make any difference when just linking, or when assembling hand-written .S asm. gcc -fno-pie -no-pie -O3 main.c nasm_output.o is a case where you want both options.
Checking your GCC config
If your GCC was configured this way, gcc -v |& grep -o -e '[^ ]*pie' prints --enable-default-pie. Support for this config option was added to gcc in early 2015. Ubuntu enabled it in 16.10, and Debian around the same time in gcc 6.2.0-7 (leading to kernel build errors: https://lkml.org/lkml/2016/10/21/904).
Related: Build compressed x86 kernels as PIE was also affected by the changed default.
Why doesn't Linux randomize the address of the executable code segment? is an older question about why it wasn't the default earlier, or was only enabled for a few packages on older Ubuntu before it was enabled across the board.
Note that ld itself didn't change its default. It still works normally (at least on Arch Linux with binutils 2.28). The change is that gcc defaults to passing -pie as a linker option, unless you explicitly use -static or -no-pie.
In a NASM source file, I used a32 mov eax, [abs buf] to get an absolute address. (I was testing if the 6-byte way to encode small absolute addresses (address-size + mov eax,moffs: 67 a1 40 f1 60 00) has an LCP stall on Intel CPUs. It does.)
nasm -felf64 -Worphan-labels -g -Fdwarf testloop.asm &&
ld -o testloop testloop.o # works: static executable
gcc -v -nostdlib testloop.o # doesn't work
...
..../collect2 ... -pie ...
/usr/bin/ld: testloop.o: relocation R_X86_64_32 against `.bss' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
gcc -v -no-pie -nostdlib testloop.o # works
gcc -v -static -nostdlib testloop.o # also works: -static implies -no-pie
GCC can also make a "static PIE" with -static-pie: ASLRed, but with no dynamic libraries or ELF interpreter. This is not the same thing as -static -pie; those conflict with each other (you get a static non-PIE), although that might get changed.
related: building static / dynamic executables with/without libc, defining _start or main.
Checking if an existing executable is PIE or not
This has also been asked at: How to test whether a Linux binary was compiled as position independent code?
file and readelf say that PIEs are "shared objects", not ELF executables. ELF-type EXEC can't be PIE.
$ gcc -fno-pie -no-pie -O3 hello.c
$ file a.out
a.out: ELF 64-bit LSB executable, ...
$ gcc -O3 hello.c
$ file a.out
a.out: ELF 64-bit LSB shared object, ...
## Or with a more recent version of file:
a.out: ELF 64-bit LSB pie executable, ...
gcc -static-pie is a special thing that GCC doesn't do by default, even with -nostdlib. It shows up as LSB pie executable, dynamically linked with current versions of file. (See What's the difference between "statically linked" and "not a dynamic executable" from Linux ldd?) It has ELF-type DYN, but readelf shows no .interp, and ldd will tell you it's statically linked. GDB's starti and /proc/PID/maps confirm that execution starts at the top of its _start, not in an ELF interpreter.
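Both file and readelf -h are reading the e_type field of the ELF header. A minimal sketch of that check, assuming a little-endian ELF file such as x86-64 Linux output; the helper names `elf_type` and `elf_type_of_file` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: classify an ELF file by the e_type field of its header,
// the same field `readelf -h` prints as "Type:". Assumes a little-endian ELF
// (EI_DATA == ELFDATA2LSB, as on x86-64); e_type sits at byte offset 16 in
// both 32-bit and 64-bit ELF headers.
std::string elf_type(const unsigned char* hdr, std::size_t n) {
    if (n < 18 || hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return "not ELF";
    if (hdr[5] != 1) return "not little-endian";  // this sketch skips big-endian
    const std::uint16_t type = static_cast<std::uint16_t>(hdr[16] | (hdr[17] << 8));
    switch (type) {
        case 1: return "REL";   // relocatable object, e.g. output of gcc -c
        case 2: return "EXEC";  // position-dependent executable: never a PIE
        case 3: return "DYN";   // shared library, PIE, or static-pie
        default: return "other";
    }
}

// Read just the start of a file, e.g. elf_type_of_file("a.out").
std::string elf_type_of_file(const char* path) {
    unsigned char hdr[18] = {};
    std::ifstream in(path, std::ios::binary);
    in.read(reinterpret_cast<char*>(hdr), sizeof hdr);
    return elf_type(hdr, static_cast<std::size_t>(in.gcount()));
}
```

On Linux, elf_type_of_file("/proc/self/exe") reports how the running program itself was linked. DYN alone doesn't distinguish a PIE from a shared library or a static-pie; readelf -l (look for an INTERP program header) and ldd tell those apart.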
Semi-related (but not really): another recent gcc feature is gcc -fno-plt. Finally, calls into shared libraries can be just call [rip + symbol@GOTPCREL] (AT&T call *puts@GOTPCREL(%rip)), with no PLT trampoline.
The NASM version of this is call [rel puts wrt ..got], as an alternative to call puts wrt ..plt. See Can't call C standard library function on 64-bit Linux from assembly (yasm) code. This works in a PIE or non-PIE, and avoids having the linker build a PLT stub for you.
Some distros have started enabling it. It also avoids needing writable + executable memory pages, so it's good for security against code injection. (I think modern PLT implementations don't need that either, just updating a GOT pointer rather than rewriting a jmp rel32 instruction, so there might not be a security difference.)
It's a significant speedup for programs that make a lot of shared-library calls, e.g. x86-64 clang -O2 -g compiling tramp3d goes from 41.6s to 36.8s on whatever hardware the patch author tested on. (clang is maybe a worst-case scenario for shared library calls, making lots of calls to small LLVM library functions.)
It does require early binding instead of lazy dynamic linking, so it's slower for big programs that exit right away (e.g. clang --version or compiling hello.c). This slowdown could apparently be reduced with prelink.
This doesn't remove the GOT overhead for external variables in shared library PIC code, though. (See the godbolt link above).
Footnote 1
64-bit absolute addresses actually are allowed in Linux ELF shared objects, with text relocations to allow loading at different addresses (ASLR and shared libraries). This allows you to have jump tables in section .rodata, or static const int *foo = &bar; without a runtime initializer.
So mov rdi, qword msg works (NASM/YASM syntax for 10-byte mov r64, imm64, aka AT&T syntax movabs, the only instruction which can use a 64-bit immediate). But that's larger and usually slower than lea rdi, [rel msg], which is what you should use if you decide not to disable -pie. A 64-bit immediate is slower to fetch from the uop cache on Sandybridge-family CPUs, according to Agner Fog's microarch pdf. (Yes, the same person who asked this question. :)
You can use NASM's default rel instead of specifying it in every [rel symbol] addressing mode. See also Mach-O 64-bit format does not support 32-bit absolute addresses. NASM Accessing Array for some more description of avoiding 32-bit absolute addressing. OS X can't use 32-bit addresses at all, so RIP-relative addressing is the best way there, too.
In position-dependent code (-no-pie), you should use mov edi, msg when you want an address in a register; 5-byte mov r32, imm32 is even smaller than RIP-relative LEA, and more execution ports can run it.

Step into standard library call with godbolt

I want to know how various compilers implement std::random_device, so I popped it into godbolt.
Unfortunately, the only thing it says is
std::random_device::operator()():
push rbp
mov rbp, rsp
sub rsp, 16
mov QWORD PTR [rbp-8], rdi
mov rax, QWORD PTR [rbp-8]
mov rdi, rax
call std::random_device::_M_getval()
leave
ret
which is not very helpful. How can I step into the _M_getval() call and examine the assembly there?
You can't "step into" functions; Godbolt isn't a debugger, it's a disassembler (in "binary" mode, otherwise a compiler asm-text output filter / viewer). Your program doesn't run, it just gets compiled. (And unless you choose the "binary" output option, it only compiles to asm, not to machine code, and doesn't actually link.)
But regardless of terminology, no, you can't get Godbolt to show you disassembly for whatever version of a library it happens to have installed.
Single-step the program on your desktop. (Compile with gcc -O3 -fno-plt to avoid having to step through PLT lazy dynamic linking.)
(I did, and libstdc++ 6.2.1 on Arch Linux runs cpuid in the constructor for std::random_device. If rdrand is available, it uses it on calls to _M_getval(). Figuring this out from just disassembly would have been tricky; there are several levels of function calls and branching, and without symbols it would have been hard to figure out what's what. My Skylake has rdseed available, but it didn't use it. Yes, as you commented, that would be a better choice.)
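A sketch of a test program for that desktop single-stepping, assuming GCC or Clang with GDB; `draw` is a hypothetical function name, and the build/GDB commands in the comments are one possible workflow, not the only one.

```cpp
#include <cassert>
#include <limits>
#include <random>

// Something small to single-step on your own machine (Godbolt can't run it).
// Assumed workflow, e.g.:  g++ -O3 -g -fno-plt rd.cpp -o rd && gdb ./rd
// then in GDB:  break draw, run, and repeat stepi (x/i $pc shows the current
// instruction) to follow the calls into the library's random_device code.
[[gnu::noinline]] unsigned draw() {  // noinline keeps `draw` a breakpoint target at -O3
    std::random_device rd;  // per the answer, libstdc++ 6.2.1 ran cpuid in the constructor
    return rd();            // what executes here depends on your installed library version
}
```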
Different compilers can generate different versions of library functions from the same source; that's the main point of the compiler explorer's existence. And no, it doesn't have a separate version of libstdc++ compiled by every compiler in the dropdown.
There'd be no guarantee that the library code you saw would match what's on your desktop, or anything.
It does actually have x86-64 Linux libraries installed, though, so in theory it would be possible for Godbolt to give you an option to find and disassemble certain library functions, but that functionality does not exist currently. It would also only work for targets where the "binary" option is available; I think for most of the cross-compile targets it only has headers, not libraries. Or maybe there's some other reason it won't link and disassemble for non-x86 ISAs.
Using -static and binary mode shows stuff, but not what we want.
I tried compiling with -static -fno-plt -fno-exceptions -fno-rtti -nostartfiles -O3 -march=skylake (so rdrand and rdseed would be available in case they got inlined; they don't). -fno-plt is redundant with -static, but without -static it's useful to remove that clutter.
-static causes the library code to actually end up in the linked binary that Godbolt disassembles. But the output is limited to 500 lines, and the definition of std::random_device::_M_getval() happens not to be near the start of the file.
-nostartfiles avoids cluttering the binary with _start and so on from CRT startup files. I think Godbolt already filters these out of the disassembly, though, because you don't see them in the normal binary output (without -static). You're not going to run the program, so it doesn't matter that the linker couldn't find a _start symbol and just defaulted to putting the ELF entry point at the start of the .text section.
Despite compiling with -fno-exceptions -fno-rtti (so no unwind handler for your function is included), libstdc++ functions were compiled with exception handling enabled. So linking them pulls in boatloads of exception code. The static executable starts out with definitions for functions like std::__throw_bad_exception(): and std::__throw_bad_alloc():
BTW, without -fno-exceptions, there's also a get_random_seed() [clone .cold]: definition, which I think is an unwind handler. It's not a definition of your actual function. Near the start of the static binary is operator new(unsigned long) [clone .cold]: which again I think is libstdc++'s exception-handler code.
I think the .text.cold or .init sections got linked first, unfortunately, so none of the interesting functions are going to be visible in the first 500 lines.
Even if this had worked, it's only binary-mode disassembly, not compiler asm
Even with debug symbols, we wouldn't know which struct member was being accessed, just numeric offsets from registers, because objdump doesn't fill those in.
And with lots of branching, it's hard to follow complicated logic possibilities. Single-stepping at run-time automatically follows the actual path of execution.
Related:
How to remove "noise" from GCC/clang assembly output? about using Matt Godbolt's Compiler Explorer for things it is good for.
Matt Godbolt's CppCon2017 talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” is an excellent guide, and points out that you can clone the compiler-explorer repo and set it up locally with your own choices of compiler. You could even hack it to allow larger output, but that's still obviously a bad approach for this problem.

When we compile a source code that contains a 'main' without linking, why can't we run it?

I am learning about the compiling process, and I know that linking is mainly used to link a binary file that contains a 'main' function with other binary files that contain the helper functions used by main.
However when I try to run an object file with the code:
int main() {
return 0;
}
I compiled it with the -c option of gcc on Ubuntu; when I try to run it, I get the error:
"bash: ./source.o: cannot execute binary file: Exec format error"
Read Levine's Linkers & Loaders.
Read about ELF.
Try compiling with gcc -v: you'll see the actual programs used (cc1 to compile C code into assembler, as to assemble that into an object file, ld & collect2 to link). Look also at the generated assembler file with gcc -S -fverbose-asm -O. Notice that gcc knows about (and compiles specially) the main function. And the starting point of your executable is provided by some crt0 etc.: it is not main but some _start routine coded in assembler, which calls your main.
Object files are not the same as executables. The executable contains stuff like crt0 and the C standard library, or some way to dynamically link the library as a shared object. That is why you need to link your source.o (compiled from the empty main in source.c) into an executable.
On Linux, play with objdump(1) & readelf(1) (on some existing binaries, and also on your source.o object file)
See also elf(5), execve(2), ld-linux(8), Linux assembly howto, syscalls(2), Advanced Linux Programming, Operating Systems: Three Easy Pieces, and (to understand about libc.so) Drepper's How To Write Shared Libraries, the Dragon Book ...
(you need to read entire books to understand the details; I gave some references)
Look also into Common Lisp & SBCL. Its compiler has a very different model (really different from C).
You don't have a bootstrap; you are in a chicken-and-egg problem.
The code (for that function) is there, but there are assumptions; first and foremost, you need a stack. Depending on the architecture, your return address may be on that stack, for example, and the return value may be on that stack too. The C language itself doesn't provide for that directly; there is always at least a little bit of assembly or some other language required in order to "bootstrap" your function. For example, on ARM with the GNU tools:
bs.s
.globl _start
_start:
mov sp,#0x8000
bl main
b .
so.c
int main ( void )
{
return(0);
}
For ARM the function is complete; the instructions don't need to be modified by the linker. But no address space has been defined, so the disassembler assumes zero as the address for this object: it is an object, not a loadable binary.
00000000 <main>:
0: e3a00000 mov r0, #0
4: e12fff1e bx lr
Now if we add the bootstrap and link to some address, we get a real, executable program:
00008000 <_start>:
8000: e3a0d902 mov sp, #32768 ; 0x8000
8004: eb000000 bl 800c <main>
8008: eafffffe b 8008 <_start+0x8>
0000800c <main>:
800c: e3a00000 mov r0, #0
8010: e12fff1e bx lr
That doesn't mean one couldn't craft an operating system or an environment where you could load functions this way, using the compiler's object output. But that is the reason for the word chain in tool chain. The compiler makes assembly language, and the assembler assembles it; then the linker combines it with the other necessary objects (bootstrap plus compiler libraries plus C libraries, etc.), defines the address spaces for everything, and modifies the code/data as needed to resolve externals. A sequence, or chain, of events produces the final result.
Even the most basic commands like exit aren't directly in the language and need to be linked.
http://en.cppreference.com/w/c/program/exit

Building C++ and Assembly source in Xcode

I'm trying to build a command-line application in Xcode for OS X 10.9 which contains a .cpp source file, which uses a function externally defined in a .asm assembly file. The C++ code:
#include <iostream>
using namespace std;
extern "C" void NOTHING();
int main(){
NOTHING();
return 0;
}
The following is the assembly function:
global NOTHING
section .text
NOTHING:
mov eax, 0
ret
It's a program that does nothing but temporarily move the value zero into the EAX register. I made sure to choose NASM assembly when creating the .asm source file. When I hit the 'play' button to build the executable, Xcode simply states build failed, without specifying a reason.
I could revert back to doing it all in the command line, as I would on Linux. However, if possible, I'd prefer to start using Xcode, as it combines many tools, e.g. Git, into a single application for development.
EDIT: After the answer, I have decided to abandon Xcode; the command line is just much simpler. Based on the answer, I have written the following 'makefile' for future users visiting the question:
test: main.cpp asm.o
	g++ -stdlib=libstdc++ main.cpp asm.o -o test
asm.o: asm.asm
	nasm -f macho64 asm.asm -o asm.o
This assumes the assembly file is 'asm.asm', the C++ file is 'main.cpp', and the executable created is named 'test'. As in the answer, make sure functions in the .asm file begin with an underscore.
You need to specify -f macho64 for 64-bit x86-64 Mach-O object files. As you've already seen, Mach-O (function) symbols are prefixed with an underscore, so if the C++ code calls the function NOTHING, the assembly must declare global _NOTHING and define the label _NOTHING.
Also, a function with "C" linkage should be declared as: void NOTHING (void); (in C++ an empty parameter list already means "no parameters", so the explicit void matters mainly if the declaration is shared with C code).

Can we inspect an object file for presence of temporaries introduced by C++ compiler?

Is there a way to inspect the object file generated from the code below (file1.o) for the presence of a compiler-introduced temporary? What tools can we use to obtain such info from object files?
//file1.cpp
void func(const int& num){}
int main(){ func(2); }
The easiest way I can think of to do this is to load up a program that uses the object file and disassemble the function in the debugger. The program code you posted would work fine for this. Just break on the call to func and then display the assembler when you single-step into the function.
In a more complex program you can usually display the assembler code for a given function by name. Check your debugger documentation for how to do this. On Windows (Visual Studio) you can open the Disassembly window and enter the name of the function to display the assembler code.
If you have the source, most compilers allow you to output assembler, sometimes mixed with the source code. For Visual C++ this is /Fa.
If you're on an ELF system and have GNU binutils you can call readelf, probably with the -s switch.
If you have the source available, it is probably easier to look at the assembler file generated by the compiler (-save-temps for gcc). Otherwise, objdump is your friend.
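You can use clang -cc1 --ast-print-xml to get an XML representation of a translation unit (that option is gone from current clang releases; today clang -Xclang -ast-dump, or -Xclang -ast-dump=json, serves the same purpose). The presence of temporaries can easily be detected from the AST, where they appear as MaterializeTemporaryExpr nodes.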
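If you can change the source, a different technique from the object-file inspection above (swapped in here, not taken from the answers) is to make the temporary observable at run time: give the parameter a class type whose converting constructor counts calls. `Tracked` and `call_with_literal` are hypothetical names for this sketch.

```cpp
#include <cassert>

// Hypothetical type whose constructor counts how many objects get built,
// so creation of a temporary becomes visible at run time.
struct Tracked {
    static int constructed;
    int value;
    Tracked(int v) : value(v) { ++constructed; }  // deliberately implicit
};
int Tracked::constructed = 0;

int func(const Tracked& t) { return t.value; }

// Like func(2) in the question: the const reference can't bind to the literal
// directly, so the compiler materializes a temporary and binds to that.
int call_with_literal() { return func(2); }
```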