How would I properly include nasm functions in cpp project? - c++

I am trying to include a function I have coded in nasm into a project that is in c++.
When I declare the function in the cpp file with the extern keyword and build the project, the linker (ld) fails with "undefined reference to `rnd_by_seed(unsigned long long)'".
It looks as if some reference is missing from the CMakeLists.txt file:
cmake_minimum_required(VERSION 3.23)
project(App)
set(CMAKE_CXX_STANDARD 23)
#enable_language(ASM_NASM)
set(NASM_COMPILER nasm)
set(CMAKE_ASM_NASM_OBJECT_FORMAT elf64)
set(CMAKE_ASM_NASM_COMPILE_OBJECT "<CMAKE_ASM_NASM_COMPILER> <INCLUDES> <FLAGS>
-f ${CMAKE_ASM_NASM_OBJECT_FORMAT} -o <OBJECT> <SOURCE>")
set(SOURCE_FILES main.cpp Tools/rnd_byseed.asm)
add_executable(App ${SOURCE_FILES})
here's the nasm function I am trying to execute:
global rnd_by_seed
;------------------------------------------
; return random number by seed value
; @param rdi seed value (unsigned 64bit integer)
; @return rax random number (64bit unsigned)
rnd_by_seed:
push rbp
mov rbp, rsp
mov eax, dword [rsp+8]
mov ebx, 16807
mul ebx
mov esi, edx
mov edi, eax
mov eax, dword [rsp+4]
mul ebx
add eax, esi
adc edx, 0
shl eax, 1
rcl edx, 1
shr eax, 1
add edx, edi
adc eax, 0
xchg eax, edx
test edx, 80000000h
jz Store
and edx, 7fffffffh
add eax, 1
adc edx, 0
Store:
mov dword [rsp+4], edx
mov dword [rsp+8], eax
mov rax, qword [rsp+8]
add rsp, 16
leave
ret
and main.cpp:
#include <cstdio> // for std::printf
#include <iostream>
extern "C" unsigned long long rnd_by_seed(unsigned long long seed);
int main()
{
std::cout << "Hello, World!" << std::endl;
// call the rnd_by_seed function
auto val = rnd_by_seed(12);
std::printf("val = %llu", val);
return 0;
}
I tried to look at different methods for including nasm files into cmakelists file, got worse results. This is the only cmakelists file that does not fail when reloading the project.

You're almost there, just need to uncomment the enable_language(ASM_NASM) line.
enable_language(ASM_NASM)
set(CMAKE_ASM_NASM_OBJECT_FORMAT elf64)
set(CMAKE_ASM_NASM_COMPILE_OBJECT "<CMAKE_ASM_NASM_COMPILER> <INCLUDES> <FLAGS> -f ${CMAKE_ASM_NASM_OBJECT_FORMAT} -o <OBJECT> <SOURCE>")
Or enable ASM_NASM at project level.
project(App C CXX ASM_NASM)
However, your assembly looks wrong - assuming x64 SysV ABI, I see at least the following issues:
the first 64-bit argument is passed in via RDI, not stack
the value of register RBX must be preserved
writing to [rsp+4], [rsp+8] will damage the stack
add rsp, 16 is bogus

Related

Found error in GNU Compiler. Later version?

I have found that this code causes a startling error in the gnu C++ compiler when it is optimizing.
#include <stdio.h>
int main()
{
int a = 333666999, b = 0;
for (short i = 0; i<7; ++i)
{
b += a;
printf("%d ", b);
}
return 9;
}
When compiled with g++ -Os fail.cpp, the executable does not print seven numbers; it goes on forever, printing and printing. I am using -
-rwxr-xr-x 4 root root 700388 Jun 3 2013 /usr/bin/g++
Is there a later corrected version?
The compiler is very, very rarely wrong. In this case, b is overflowing, which is undefined behaviour for signed integers:
$ g++ --version
g++ (GCC) 10.2.0
...
$ g++ -Os -otest test.cpp
test.cpp: In function ‘int main()’:
test.cpp:8:11: warning: iteration 6 invokes undefined behavior [-Waggressive-loop-optimizations]
8 | b += a;
| ~~^~~~
test.cpp:6:24: note: within this loop
6 | for (short i = 0; i<7; ++i)
| ~^~
And if you invoke undefined behaviour, the compiler is free to do whatever it likes, including making your program never terminate.
Edit: Some people seem to think that the UB should only affect the value of b, but not the loop iteration. This is not according to the Standard (UB can cause literally anything to happen) but it's a reasonable thought, so let's look at the generated assembly to see why the loop doesn't terminate.
First without -Os:
.LC0:
.string "%d "
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-12], 333666999
mov DWORD PTR [rbp-4], 0
mov WORD PTR [rbp-6], 0
.L3:
cmp WORD PTR [rbp-6], 6 # Compare i to 6
jg .L2 # If greater, jump to end
mov eax, DWORD PTR [rbp-12]
add DWORD PTR [rbp-4], eax
mov eax, DWORD PTR [rbp-4]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
movzx eax, WORD PTR [rbp-6]
add eax, 1
mov WORD PTR [rbp-6], ax
jmp .L3
.L2:
mov eax, 9
leave
ret
Then with -Os:
.LC0:
.string "%d "
main:
push rbx
xor ebx, ebx
.L2:
add ebx, 333666999
mov edi, OFFSET FLAT:.LC0
xor eax, eax
mov esi, ebx
call printf
jmp .L2
The comparison and jump instructions are completely gone. Ironically, the compiler did exactly what you asked it to do: optimize for size, so remove as many instructions as it can while obeying the C++ standard. -O3 and -O2 generate the exact same code as -Os here.
-O1 generates a very interesting output:
.LC0:
.string "%d "
main:
push rbx
mov ebx, 0
.L2:
add ebx, 333666999
mov esi, ebx
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
cmp ebx, -1959298303
jne .L2
mov eax, 9
pop rbx
ret
Here, the compiler optimized away the loop counter i and just compares the value of b to its final value after 7 iterations, using the fact that signed overflow happens according to two's complement on this platform! Cheeky, isn't it? :)
I am using g++ version 4.8.1. Thomas has version 10.2.0 which evidently puts out a warning about "undefined behavior" when adding two signed integers. However, being only a warning still goes ahead and compiles the program. In all circumstances though, the "undefined behavior" should only be concerning the integers being added. In practice those integers in fact do abide by the 2's complement expected result. The "undefined behavior" should not overwrite other variables in the program. Otherwise the executable cannot be trusted at all. And if it cannot be trusted it shouldn't be compiled. Perhaps there is an even later version of the gnu compiler that works correctly when optimizing?

C++ variable reset to 0 after calling x64 assembly function

I'm trying to call an x64 assembly function from C++ code with four parameters, and the assembly function resets the first parameter to zero every time. Please find the code snippet below.
C++ code: test.cpp
#include <iostream>
extern "C" int IntegerShift_(unsigned int a, unsigned int* a_shl, unsigned int* a_shr, unsigned int count);
int main(int argc, char const *argv[])
{
unsigned int a = 3119, count = 6, a_shl, a_shr;
std::cout << "a value before calling " << a << std::endl;
IntegerShift_(a, &a_shl, &a_shr, count);
std::cout << "a value after calling " << a << std::endl;
return 0;
}
x64 assembly code: test.asm
section .data
section .bss
section .text
global IntegerShift_
IntegerShift_:
;prologue
push rbp
mov rbp, rsp
mov rax, rdi
shl rax, cl
mov [rsi], rax
mov rax, rdi
shr rax, cl
mov [rdx], rax
xor rax,rax
;epilogue
mov rbp, rsp
pop rbp
ret
I'm working on the below environment.
OS - Ubuntu 18.04 64-bit
Assembler - nasm (2.13.02)
C++ compiler - g++ (7.4.0)
processor - Intel® Pentium(R) CPU G3240 @ 3.10GHz × 2
and I'm compiling my code as below
$ nasm -f elf64 -g -F dwarf test.asm
$ g++ -g -o test test.cpp test.o
$ ./test
$ a value before calling 3119
$ a value after calling 0
But if I comment out the line mov [rdx], rax in the assembly function, the value of variable a is no longer reset. I'm new to x64 assembly programming and I couldn't find the relation between the rdx register and variable a.
unsigned int* a_shl, unsigned int* a_shr are pointers to unsigned int, a 32-bit (dword) type.
You do two qword stores, mov [rsi], rax and mov [rdx], rax which store outside of the pointed-to objects.
The C equivalent would be a function that takes unsigned int* args and does
*(unsigned long*)a_shr = a>>count;. This is of course UB, and behaviour like this (overwriting other variables) is pretty much what you'd expect.
Presumably you compiled with optimization disabled so the caller actually reloaded a from the stack. And it put a_shr or a_shl next to a in its stack frame, and one of your stores zeroed your caller's copy of a.
(As usual, gcc happened to zero the upper 32 bits of RDI while it put a into EDI as the first arg. Writing a 32-bit register zero-extends to the full register. So your other bug; right shifting high garbage into the low 32 bits for a_shr, didn't bite you with this caller.)
Simpler implementation:
global IntegerShift ; why the trailing underscore? That's weird for no reason.
IntegerShift:
;prologue not needed, we don't even use the stack
; so don't waste instructions making a frame pointer.
mov eax, edi
shl eax, cl ; a<<count
mov [rsi], eax ; 32-bit store
;mov rax, rdi ; we can just destroy our local a, we're done with it
shr edi, cl ; a>>count
mov [rdx], edi ; 32-bit store
xor eax, eax ; return 0
ret
xor eax, eax is the most efficient way to zero a 64-bit register (no wasted REX prefix). And your return value is only 32-bit anyway because you declared it int, so it makes no sense to be using 64-bit registers.
BTW, if you had BMI2 available (which you don't on your budget Pentium CPU, unfortunately), you could avoid all the register copying, and be more efficient on Intel CPUs (SHLX/SHRX is only 1 uop instead of 3 for shl/shr reg, cl because of legacy x86 FLAGS-unmodified semantics for the cl=0 case)
shlx eax, edi, ecx
shrx edi, edi, ecx
mov [rsi], eax
mov [rdx], edi
xor eax, eax
ret

Mystery: casting a GNU C label pointer to a function pointer, with inline asm to put a ret in that block. Block being optimized away?

Firstly: This code is meant purely for fun; please do not do anything like this in production. We will not be responsible for any harm caused to you, your company or your reindeer after compiling and executing this piece of code in any environment. The code below is not safe, not portable and is plainly dangerous. Be warned. Long post below. You were warned.
Now, after the disclaimer: Let's consider the following piece of code:
#include <stdio.h>
int fun()
{
return 5;
}
typedef int(*F)(void) ;
int main(int argc, char const *argv[])
{
void *ptr = &&hi;
F f = (F)ptr;
int c = f();
printf("TT: %d\n", c);
if(c == 5) goto bye;
//else goto bye; /* <---- This is the most important line. Pay attention to it */
hi:
c = 5;
asm volatile ("movl $5, %eax");
asm volatile ("retq");
bye:
return 66;
}
To begin with, we have the function fun, which I created purely as a reference to get the generated assembly code.
Then we declare a function pointer type F for functions taking no parameters and returning an int.
Then we use the not so well known GCC extension https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html to get the address of a label hi, and this works in clang too. Then we do something evil: we create a function pointer of type F called f and initialize it with the address of the label above.
Then, worst of all, we actually call this function and assign its return value to a local variable called c, and then we print it out.
The following is an if to check whether the value assigned to c is actually the one we need, and if yes, go to bye so that the application exits normally, with exit code 66. If that can be considered a normal exit code.
The next line is commented out, but I can say this is the most important line in the entire application.
The piece of code after the label hi is to assign 5 to the value of c, then two lines of assembly to initialize the value of eax to 5 and to actually return from the "function" call. As mentioned, there is a reference function, fun which generates the same code.
And now we compile this application, and run it on our online platform: https://gcc.godbolt.org/z/K6z5Yc
It generates the following assembly (with -O1 turned on; -O0 gives a similar result, albeit a bit longer):
# else goto bye is COMMENTED OUT
fun:
mov eax, 5
ret
.LC0:
.string "TT: %d\n"
main:
push rbx
mov eax, OFFSET FLAT:.L3
call rax
mov ebx, eax
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
cmp ebx, 5
je .L4
.L3:
movl $5, %eax
retq
.L4:
mov eax, 66
pop rbx
ret
The important lines are mov eax, OFFSET FLAT:.L3 where the L3 corresponds to our hi label, and the line after that: call rax which actually calls it.
And runs like:
ASM generation compiler returned: 0
Execution build compiler returned: 0
Program returned: 66
TT: 5
Now, let's revisit the most important line in the application and uncomment it.
With -O0 we get the following assembly, generated by gcc:
# else goto bye is UNCOMMENTED
# even gcc -O0 "knows" hi: is unreachable.
fun:
push rbp
mov rbp, rsp
mov eax, 5
pop rbp
ret
.LC0:
.string "TT: %d\n"
main:
push rbp
mov rbp, rsp
sub rsp, 48
mov DWORD PTR [rbp-36], edi
mov QWORD PTR [rbp-48], rsi
mov QWORD PTR [rbp-8], OFFSET FLAT:.L4
mov rax, QWORD PTR [rbp-8]
mov QWORD PTR [rbp-16], rax
mov rax, QWORD PTR [rbp-16]
call rax
mov DWORD PTR [rbp-20], eax
mov eax, DWORD PTR [rbp-20]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
cmp DWORD PTR [rbp-20], 5
nop
.L4:
mov eax, 66
leave
ret
and the following output:
ASM generation compiler returned: 0
Execution build compiler returned: 0
Program returned: 66
so, as you can see our printf was never called, the culprit is the line mov QWORD PTR [rbp-8], OFFSET FLAT:.L4 where L4 actually corresponds to our bye label.
And from what I can see from the generated assembly, not a piece of code from the part after hi was added into the generated code.
But at least the application runs and at least has some code for comparing c to 5.
On the other end, clang, with O0 generates the following nightmare, which by the way crashes:
# else goto bye is UNCOMMENTED
# clang -O0 also doesn't emit any instructions for the hi: block
fun: # @fun
push rbp
mov rbp, rsp
mov eax, 5
pop rbp
ret
main: # @main
push rbp
mov rbp, rsp
sub rsp, 48
mov dword ptr [rbp - 4], 0
mov dword ptr [rbp - 8], edi
mov qword ptr [rbp - 16], rsi
mov qword ptr [rbp - 24], 1
mov rax, qword ptr [rbp - 24]
mov qword ptr [rbp - 32], rax
call qword ptr [rbp - 32]
mov dword ptr [rbp - 36], eax
mov esi, dword ptr [rbp - 36]
movabs rdi, offset .L.str
mov al, 0
call printf
cmp dword ptr [rbp - 36], 5
jne .LBB1_2
jmp .LBB1_3
.LBB1_2:
jmp .LBB1_3
.LBB1_3:
mov eax, 66
add rsp, 48
pop rbp
ret
.L.str:
.asciz "TT: %d\n"
If we turn on some optimization, for example O1, we get from gcc:
# else goto bye is UNCOMMENTED
# gcc -O1
fun:
mov eax, 5
ret
.LC0:
.string "TT: %d\n"
main:
sub rsp, 8
mov eax, OFFSET FLAT:.L3
call rax
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
.L3:
mov eax, 66
add rsp, 8
ret
and the application crashes, which is sort of understandable. Again, the compiler has entirely removed our hi section (mov eax, OFFSET FLAT:.L3 tiptoes to L3, which corresponds to our bye section) and unfortunately decided that it's a good idea to increase rsp before a ret, so we are sure to end up somewhere totally different from where we need to be.
And clang delivers something even more dubious:
# else goto bye is UNCOMMENTED
# clang -O1
fun: # @fun
mov eax, 5
ret
main: # @main
push rax
mov eax, 1
call rax
mov edi, offset .L.str
mov esi, eax
xor eax, eax
call printf
mov eax, 66
pop rcx
ret
.L.str:
.asciz "TT: %d\n"
1 ? How on earth did clang end up with this?
To some level I understand that the compiler decided that dead code after an if where both if and else go to the same location is not needed, but here my knowledge and insight stops.
So now, dear C and C++ gurus, assembly aficionados and compiler crushers, here comes the question:
Why?
Why do you think the compiler decided that the two labels should be considered equivalent once we added the else branch, or why did clang put 1 there, and last but not least: someone with a deep understanding of the C standard could maybe point out where this piece of code deviated so badly from normality that we ended up in this really really weird situation.
"someone with a deep understanding of the C standard could maybe point out where this piece of code deviated so badly from normality that we ended up in this really really weird situation."
You think the ISO C standard has anything to say about this code? It's chock full of UB and GNU extensions, notably pointers to local labels.
Casting a label pointer to a function pointer and calling through it is obviously UB. The GCC manual doesn't say you can do that. It's also UB to goto a label in another function.
You were only able to make that work by tricking the compiler into thinking that block might be reached so it's not removed, then using GNU C Basic asm statements to emit a ret instruction there.
GCC and clang remove dead code even with optimization disabled; e.g. if(0) { ... } doesn't emit any instructions to implement the ...
Also note that the c=5 in hi: compiles with optimization fully disabled (and else goto bye commented) to asm like movl $5, -20(%rbp). i.e. using the caller's RBP to modify local variables in the stack frame of the caller. So you have a nested function.
GNU C allows you to define nested functions that can access the local vars of their parent scope. (If you liked the asm you got from your experiment, you'll love the executable trampoline of machine-code that GCC stores to the stack with mov-immediate if you take a pointer to a nested function!)
asm volatile ("movl $5, %eax"); is missing a clobber on EAX. You step on the compiler's toes which would be UB if this statement was ever reached normally, rather than as if it were a separate function.
The use-case for GNU C Basic asm (no constraints / clobbers) is instructions like cli (disable interrupts), not anything involving integer registers, and definitely not ret.
If you want to define a callable function using inline asm, you can use asm("") at global scope, or as the body of an __attribute__((naked)) function.

std::ifstream crashes in release build on Windows with exit code 0xC0000409: Unknown software exception

I'm reading a file using std::ifstream:
printf("Before stream initialization\n");
ifstream stream(file_path, ios::binary);
printf("Stream initialized\n");
ifstream::pos_type position = stream.tellg();
auto file_size = position;
printf("Position acquired\n");
However, the program crashes in the release mode of the binary. Here is the compiled assembly code snippet:
.text:0000000000413411 lea rcx, aBeforeStreamIn ; "Before stream initialization\n"
.text:0000000000413418 mov rbx, rax
.text:000000000041341B call _ZL6printfPKcz ; printf(char const*,...)
.text:000000000041341B ; } // starts at 41340C
.text:0000000000413420 lea rdi, [rsp+878h+var_248]
.text:0000000000413428 lea rcx, [rdi+0D8h] ; this
.text:000000000041342F mov [rsp+878h+var_820], rdi
.text:0000000000413434 call _ZNSt8ios_baseC1Ev ; std::ios_base::ios_base(void)
.text:0000000000413439 xor r8d, r8d
.text:000000000041343C mov rax, cs:_refptr__ZTVSt9basic_iosIcSt11char_traitsIcEE
.text:0000000000413443 xor edx, edx
.text:0000000000413445 mov [rsp+878h+var_90], r8w
.text:000000000041344E pxor xmm0, xmm0
.text:0000000000413452 movaps [rsp+878h+var_88], xmm0
.text:000000000041345A movaps [rsp+878h+var_78], xmm0
.text:0000000000413462 mov [rsp+878h+var_98], 0
.text:000000000041346E add rax, 10h
.text:0000000000413472 mov [rsp+878h+var_170], rax
.text:000000000041347A mov rax, cs:_refptr__ZTTSt14basic_ifstreamIcSt11char_traitsIcEE
.text:0000000000413481 mov rsi, [rax+8]
.text:0000000000413485 mov rcx, [rax+10h]
.text:0000000000413489 mov rax, [rsi-18h]
.text:000000000041348D mov [rsp+878h+var_248], rsi
.text:0000000000413495 mov [rsp+878h+var_7E8], rcx
.text:000000000041349D mov [rsp+878h+var_7F0], rsi
.text:00000000004134A5 mov [rsp+rax+878h+var_248], rcx
.text:00000000004134AD mov [rsp+878h+var_240], 0
.text:00000000004134B9 mov rcx, [rsi-18h]
.text:00000000004134BD add rcx, rdi
.text:00000000004134C0 ; try {
.text:00000000004134C0 call _ZNSt9basic_iosIcSt11char_traitsIcEE4initEPSt15basic_streambufIcS1_E ; std::basic_ios<char,std::char_traits<char>>::init(std::basic_streambuf<char,std::char_traits<char>> *)
.text:00000000004134C0 ; } // starts at 4134C0
.text:00000000004134C5 mov rax, cs:_refptr__ZTVSt14basic_ifstreamIcSt11char_traitsIcEE
.text:00000000004134CC lea rcx, [rdi+10h]
.text:00000000004134D0 add rax, 18h
.text:00000000004134D4 mov [rsp+878h+var_248], rax
.text:00000000004134DC mov rax, cs:_refptr__ZTVSt14basic_ifstreamIcSt11char_traitsIcEE
.text:00000000004134E3 add rax, 40h
.text:00000000004134E7 mov [rsp+878h+var_170], rax
.text:00000000004134EF ; try {
.text:00000000004134EF call _ZNSt13basic_filebufIcSt11char_traitsIcEEC1Ev ; std::basic_filebuf<char,std::char_traits<char>>::basic_filebuf(void)
.text:00000000004134EF ; } // starts at 4134EF
.text:00000000004134F4 lea rdx, [rdi+10h]
.text:00000000004134F8 lea rcx, [rdi+0D8h]
.text:00000000004134FF ; try {
.text:00000000004134FF call _ZNSt9basic_iosIcSt11char_traitsIcEE4initEPSt15basic_streambufIcS1_E ; std::basic_ios<char,std::char_traits<char>>::init(std::basic_streambuf<char,std::char_traits<char>> *)
.text:0000000000413504 lea rcx, [rdi+10h]
.text:0000000000413508 mov r8d, 0Eh
.text:000000000041350E mov rdx, rbx
.text:0000000000413511 call _ZNSt13basic_filebufIcSt11char_traitsIcEE4openEPKcSt13_Ios_Openmode ; std::basic_filebuf<char,std::char_traits<char>>::open(char const*,std::_Ios_Openmode)
.text:0000000000413516 mov rdx, [rsp+878h+var_248]
.text:000000000041351E add rdi, [rdx-18h]
.text:0000000000413522 test rax, rax
.text:0000000000413525 mov rcx, rdi
.text:0000000000413528 jz loc_414688
.text:000000000041352E xor edx, edx
.text:0000000000413530 call _ZNSt9basic_iosIcSt11char_traitsIcEE5clearESt12_Ios_Iostate ; std::basic_ios<char,std::char_traits<char>>::clear(std::_Ios_Iostate)
.text:0000000000413530 ; } // starts at 4134FF
.text:0000000000413535
.text:0000000000413535 loc_413535: ; CODE XREF: PointerSearcher::parse_pointer_map(void)+1363↓j
.text:0000000000413535 lea rcx, aStreamInitiali ; "Stream initialized\n"
.text:000000000041353C ; try {
.text:000000000041353C call _ZL6printfPKcz ; printf(char const*,...)
In my function it crashes at this line:
.text:0000000000413504 lea rcx, [rdi+10h]
The output is:
Before stream initialization
Process finished with exit code -1073741819 (0xC0000409)
The stacktrace is:
std::locale::operator=(std::locale const&)
std::ios_base::_M_init()
std::basic_ios<char, std::char_traits<char> >::init(std::basic_streambuf<char, std::char_traits<char> >*)
MyExecutable::myFunction()
The crash only happens in the Windows binary. The binary works in release mode for Linux. I'm using the MinGW compiler to compile the Windows binary and the compilation flags are:
-fopenmp -O3 -DNDEBUG
They're the default CMake release build flags. I also made sure the passed file_path is correct.
gdb says:
Thread 1 received signal SIGSEGV, Segmentation fault.
0x00000000004a2521 in std::locale::operator=(std::locale const&) ()
Thread 1 received signal SIGSEGV, Segmentation fault.
0x00000000004a2521 in std::locale::operator=(std::locale const&) ()
[Thread 48616.0xc508 exited with code 3221225477]
[Thread 48616.0xc510 exited with code 3221225477]
[Thread 48616.0xc638 exited with code 3221225477]
[Inferior 1 (process 48616) exited with code 030000000005]
The compiler version:
"C:\Program Files\mingw-w64\x86_64-8.1.0-win32-seh-rt_v6-rev0\mingw64\bin\x86_64-w64-mingw32-gcc.exe" --version
x86_64-w64-mingw32-gcc.exe (x86_64-win32-seh-rev0, Built by MinGW-W64 project) 8.1.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Does anyone have an idea what went wrong and how to fix it?
This seems to be a MinGW compiler bug, since the exception does not occur when the same code is compiled with MSVC in Visual Studio.

What style assembly is this (intel, att...etc?) and how can I produce it?

I'm trying to produce assembly code like this (so that it works with nasm)
;hello.asm
[SECTION .text]
global _start
_start:
jmp short ender
starter:
xor eax, eax ;clean up the registers
xor ebx, ebx
xor edx, edx
xor ecx, ecx
mov al, 4 ;syscall write
mov bl, 1 ;stdout is 1
pop ecx ;get the address of the string from the stack
mov dl, 5 ;length of the string
int 0x80
xor eax, eax
mov al, 1 ;exit the shellcode
xor ebx,ebx
int 0x80
ender:
call starter ;put the address of the string on the stack
db 'hello'
First off, what assembly style is this, and second, how can I produce it from a C file using a command similar to gcc -S code.c -o code.S -masm=intel?
This is Intel style.
What's wrong with the commandline you wrote in the question?