I am trying to use assembly in C++ (Dev-C++) and it won't output the string as it should. After some research I discovered that it uses AT&T syntax. My code will not output the string; it just comes up with assembler messages.
This is my code:
#include <iostream>
using namespace std;
int main()
{
asm(".section .data");
asm("hello: .string\"Hello, World!$\"\n");
asm(".section .text");
asm("movl $0x09, %ah \n");
asm("mov hello, %dx\n");
asm("int 0x21");
system("PAUSE");
return 0;
}
Could I get some help, please?
In theory, only programs compiled with DJGPP (a gcc port for DOS) can legally use DOS service functions via a DOS extender, and only if you run them in DOS or Windows (XP and below; generally not Vista/7/8). Also, gcc does not generate 16-bit x86 code, which is what you seem to be expecting.
Further, you should really, really learn some inline assembly (Google it).
A compilable version of your code would look like:
#include <iostream>
#include <cstdlib>
using namespace std;
int main()
{
asm(".section .data");
asm("hello: .string\"Hello, World!$\"\n");
asm(".section .text");
asm("movb $0x09, %ah\n"); // movl->movb
asm("movl $hello, %edx\n"); // mov->movl,hello->$hello,dx->edx
asm("int $0x21"); // 0x21->$0x21
system("PAUSE");
return 0;
}
But it's still unlikely to be good inline assembly because:
Your code trashes the registers and doesn't tell the compiler which are trashed, and so it likely corrupts the state of the program, which can lead to a crash or hang.
You write your instructions in individual asm statements, between which the compiler can insert any kind of code and disrupt your inline assembly. You really want to put your related instructions into a single block to prevent that from happening.
Something like this would be better:
asm volatile (
    ".section .data\n"
    "hello: .string \"Hello, World!$\"\n"
    ".section .text\n"
    "movb $0x09, %ah\n"
    "movl $hello, %edx\n"
    "int $0x21\n"
);
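For illustration, here is a sketch of the same block with the trashed registers declared in a clobber list (once any colon section is present, extended asm rules apply and register names need a double %%); the DJGPP caveat below still applies:

asm volatile (
    ".section .data\n"
    "hello: .string \"Hello, World!$\"\n"
    ".section .text\n"
    "movb $0x09, %%ah\n"
    "movl $hello, %%edx\n"
    "int $0x21\n"
    : /* no outputs */
    : /* no inputs */
    : "eax", "edx", "memory", "cc"); /* registers and state this block trashes */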
Unfortunately, this still won't work even with DJGPP. The problem has something to do with the memory segmentation setup done by DJGPP and the DPMI host (CWSDPMI), probably virtual memory. I can't tell what exactly is wrong there, but the above code doesn't work as-is.
So, please figure out what OS you're compiling your program for and write inline assembly code appropriately for that OS, that is, using correct registers and system call mechanisms.
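For example, on 32-bit Linux the equivalent "print a string" goes through the write system call instead; here is a minimal sketch, assuming an i386 build (e.g. gcc -m32):

// 32-bit Linux write(1, msg, len) via int $0x80 (sys_write = 4, stdout = 1).
// The kernel's return value comes back in eax, so eax is declared as an output.
static const char msg[] = "Hello, World!\n";
int ret;
asm volatile ("int $0x80"
              : "=a"(ret)
              : "0"(4), "b"(1), "c"(msg), "d"(sizeof msg - 1)
              : "memory");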
DOS int 21h functions won't work in native Windows and Linux apps. Period. You've got the wrong tutorial.
To extend Alexey's answer (on how to overcome the segmentation issues), this would compile (and possibly run on DOS):
asm volatile(
    "call 0f\n"
    ".byte 'H','e','l','l','o','W','o','r','l','d','!','$'\n" /* '$'-terminated, as DOS function 09h requires */
    "0: pop %%edx\n"     /* the call's return address = start of the string */
    "push %%ds\n"        /* save DS */
    "push %%cs\n"
    "pop %%ds\n"         /* DS = CS, so DS:DX now points at the string */
    "movb $0x09, %%ah\n" /* DOS "print string" function */
    "int $0x21\n"
    "pop %%ds\n"         /* restore DS */
    : : : "eax", "edx", "memory", "cc");
The idea is to inline the string within the code but jump over it; the return address of that call is the start address of the string. Then temporarily make the data segment identical to the code segment, call the DOS INT and restore the proper data segment after that.
Related
I want to test the performance of a userspace program on Linux running on x86. To measure the performance, I need to flush specific cache lines to memory (making sure those lines are invalidated so that the next access results in a cache miss).
I've already seen suggestions to use cacheflush(2), which is supposed to be a system call, yet g++ complains that it is not declared. Also, I cannot use clflush_cache_range, which apparently can only be invoked from kernel code.
Right now I am trying to use the following code:
static inline void clflush(volatile void *__p)
{
    asm volatile("clflush %0" : "+m" (*(volatile char __force *)__p));
}
But this gives the following error upon compilation:
error: expected primary-expression before ‘volatile’
Then I changed it as follows:
static inline void clflush(volatile void *__p)
{
    asm volatile("clflush %0" :: "m" (__p));
}
It compiled successfully, but the timing results did not change. I suspect the compiler removed it as part of optimization.
Does anyone have any idea how I can solve this problem?
The second one flushes the memory that holds the pointer __p itself, which is on the stack; that is why it doesn't have the effect you want.
The problem with the first one is that it uses the macro __force, which is defined in the Linux kernel and is unneeded here. (What does the __attribute__((force)) do?)
If you remove __force, it will do what you want.
(You should also change it to avoid using the variable name __p, which is a reserved identifier.)
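Putting both suggestions together, a minimal sketch of the corrected helper (the name is mine) would be:

// Flush the cache line containing the byte that p points to. The "+m"
// constraint makes the pointed-to memory an input and an output, so the
// compiler cannot optimize the asm away or reorder it past accesses to *p.
static inline void flush_cache_line(volatile void *p)
{
    asm volatile("clflush %0" : "+m" (*(volatile char *)p));
}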
I've compiled the following using Visual Studio C++ 2008 SP1, x64 C++ compiler:
I'm curious: why did the compiler add those nop instructions after those calls?
PS1. I would understand if the 2nd and 3rd nops were there to align the code on a 4-byte boundary, but the 1st nop breaks that assumption.
PS2. The C++ code that was compiled had no loops or special optimization stuff in it:
CTestDlg::CTestDlg(CWnd* pParent /*=NULL*/)
    : CDialog(CTestDlg::IDD, pParent)
{
    m_hIcon = AfxGetApp()->LoadIcon(IDR_MAINFRAME);
    // This makes no sense. I used it to set a debugger breakpoint
    ::GdiFlush();
    srand(::GetTickCount());
}
PS3. Additional Info: First off, thank you everyone for your input.
Here's additional observations:
My first guess was that incremental linking could have had something to do with it. But the Release build settings in Visual Studio for this project have incremental linking turned off.
This seems to affect x64 builds only. The same code built as x86 (or Win32) does not have those nops, even though the instructions used are very similar:
I tried to build it with a newer linker, and even though the x64 code produced by VS 2013 looks somewhat different, it still adds those nops after some calls:
Also, dynamic vs. static linking to MFC made no difference to the presence of those nops. This one is built with dynamic linking to the MFC DLLs with VS 2013:
Also note that those nops can appear after near and far calls as well, and they have nothing to do with alignment. Here's a part of the code that I got from IDA if I step a little bit further on:
As you see, the nop is inserted after a far call that happens to "align" the next lea instruction on the B address! That makes no sense if those were added for alignment only.
I was originally inclined to believe that since near relative calls (i.e. those that start with E8) are somewhat faster than far calls (the ones that start with FF 15 in this case), the linker may try to go with near calls first; since those are one byte shorter than far calls, if it succeeds, it may pad the remaining space with nops at the end. But then example (5) above kind of defeats this hypothesis.
So I still don't have a clear answer to this.
This is purely a guess, but it might be some kind of a SEH optimization. I say optimization because SEH seems to work fine without the NOPs too. NOP might help speed up unwinding.
In the following example (live demo with VC2017), there is a NOP inserted after a call to basic_string::assign in test1 but not in test2 (identical but declared as non-throwing1).
#include <stdio.h>
#include <string>

int test1() {
    std::string s = "a"; // NOP inserted here
    s += getchar();
    return (int)s.length();
}

int test2() throw() {
    std::string s = "a";
    s += getchar();
    return (int)s.length();
}

int main()
{
    return test1() + test2();
}
Assembly:
test1:
. . .
call std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign
npad 1 ; nop
call getchar
. . .
test2:
. . .
call std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign
call getchar
Note that MSVS compiles by default with the /EHsc flag (synchronous exception handling). Without that flag the NOPs disappear, and with /EHa (synchronous and asynchronous exception handling), throw() no longer makes a difference because SEH is always on.
1 For some reason only throw() seems to reduce the code size; using noexcept makes the generated code even bigger and summons even more NOPs. MSVC...
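If you want to see this yourself, one way (my own assumption of a reasonable setup, not part of the original observation) is to generate assembly listings of the snippet above with and without the flags discussed:

cl /O2 /EHsc /FAs test.cpp
cl /O2 /FAs test.cpp
cl /O2 /EHa /FAs test.cpp

Comparing the resulting test.asm listings should show the npad after the assign call in test1 under /EHsc, no npad without /EHsc, and under /EHa the throw() declaration no longer making a difference.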
This is special filler that lets the exception handler/unwinding function correctly detect whether it is in the prologue, epilogue, or body of the function.
This is due to the calling convention on x64, which requires the stack to be 16-byte aligned before any call instruction. This is not (to my knowledge) a hardware requirement but a software one. It provides a way to be sure that when entering a function (that is, after a call instruction), the value of the stack pointer is always 8 modulo 16, which permits simple data alignment and stores/loads from aligned locations on the stack.
I'm trying to learn to execute shellcode from within a program, but I can't get even the most basic code to run. The following code is just supposed to exit the terminal when it's run:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <sys/mman.h>
char exitcode[] = "\xb0\x01\x31\xdb\xcd\x80";
int main() {
    int (*func)();
    func = (int (*)())exitcode;
    (int)(*func)();
    return 0;
}
But all I get is a segfault. GDB says it's happening when the program accesses the memory location of exitcode [at (int)(*func)(); ], but I'm not sure why this is causing a problem. I'm running a 64-bit Linux Mint OS. Any help is greatly appreciated.
Modern operating systems use memory protection. Pages of memory have access rights just like files: readable, writable, executable. The data segment of your program typically lives in non-executable pages; trying to execute it results in a segfault.
If you want to execute dynamically written binary code from your program on Linux, you first have to map a page using mmap() that you can write to, then place your code there, and then change it to read-only and executable using mprotect(). THEN you can jump there.
You could for example read this article for details.
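A minimal sketch of that sequence (map writable, copy the code in, flip the page to read + execute, then jump; error handling omitted and the names are mine):

#include <string.h>
#include <sys/mman.h>

/* Write the code into a private anonymous mapping, then make it
   read+execute so the jump doesn't trip the NX protection described above. */
static void run_code(const unsigned char *code, size_t len)
{
    void *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    memcpy(page, code, len);
    mprotect(page, len, PROT_READ | PROT_EXEC);
    ((void (*)(void))page)();
}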
EDIT: If this is about security breaches, note that the stack typically is non-executable nowadays, too ... so all these old "hacking tutorials" won't work any more. If you're interested in newer techniques, read about return oriented programming.
The code must be marked as executable. One way to do that is to copy the binary machine code into an executable buffer.
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>
#include <stdio.h>

char exitcode[] = "\xb0\x01\x31\xdb\xcd\x80";

int main(int argc, char **argv)
{
    void *buf;

    /* copy code to executable buffer */
    buf = mmap (0, sizeof(exitcode), PROT_READ|PROT_WRITE|PROT_EXEC,
                MAP_PRIVATE|MAP_ANON, -1, 0);
    memcpy (buf, exitcode, sizeof(exitcode));

    /* run code */
    int i = ((int (*) (void))buf)();
    printf("OK. returned: %d\n", i);
    return 0;
}
Your shellcode is:
mov $0x1,%al
xor %ebx,%ebx
int $0x80
There are two problems:
Syscall 0x1 is a sys_write on 64-bit (but on 32-bit it's sys_exit)
You should assign to %rax, not %al; otherwise %rax will contain leftovers in its high bits.
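For comparison, a sketch of an exit(0) written against the x86-64 syscall convention (syscall number 60 in %rax, the status in %rdi, issued with the syscall instruction) would be the following byte string; it still has to live in executable memory, as the other answers explain:

/* xor %edi,%edi ; mov $60,%eax ; syscall  -- exit(0) via the x86-64 ABI */
char exitcode64[] = "\x31\xff\xb8\x3c\x00\x00\x00\x0f\x05";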
I had this problem and searched a lot to solve it.
You must use this command to compile your C code (to disable stack protection and make the stack executable):
gcc -fno-stack-protector -z execstack -o hello hello.c
Tested in Kali 32/64 bit. No segfault anymore.
Good luck
Based on this code that I found on Google (which works fine with the GCC compiler):
#include <stdio.h>

#define ASM __asm__ __volatile__

void MyFunc()
{
    ASM("enc_start:\n\t");
    printf("Hello, world!");
    ASM("enc_end:\n\t");
}

int main()
{
    unsigned int addr, len;
    ASM("movl $enc_start, %0\n\t"
        "movl $enc_end, %1\n\t"
        : "=r"(addr), "=r"(len));
    printf("Address of begin of function: %X", addr);
    printf("Len is of: %d bytes", len);
    return 0;
}
I would like to convert it to Visual Studio. I tried but failed, and I don't even know if that's possible. Can you please convert it for me, or at least give me a similar way to get the same result?
Thanks a lot.
Visual Studio's inline assembly doesn't support labels IIRC, and without them, the whole snippet is useless. What are you trying to accomplish in the first place?
You can capture the current value of EIP in VS's inline assembly as the function is executing, but not externally.
By the way, even in GCC it won't capture the address and length of the entire function's body. It captures the address and length of the function minus the prologue and epilogue code. To get the function's real range, generate the MAP file and parse it.
I vaguely recall Visual Studio has prologue/epilogue size as predefined macros. I might be mistaken on this one.
You could try using a pointer to function to get the starting address of a function, but in the case of Visual Studio, in debug mode, it's a pointer to a branch instruction (part of a branch table) that branches to the function. This would require that you check to see if the pointer points to a branch instruction and if so, decode the branch instruction to determine the start of the actual function.
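A sketch of that idea (x86-specific; it assumes the debug-mode thunk is a plain jmp rel32, opcode 0xE9, and MyFunc is the function from the question):

// Follow an incremental-linking/debug thunk (jmp rel32) to the real function start.
unsigned char *p = (unsigned char *)&MyFunc;
if (*p == 0xE9) {               // is this a jmp rel32 thunk?
    int rel = *(int *)(p + 1);  // signed 32-bit displacement
    p += 5 + rel;               // target = address after the 5-byte jmp + displacement
}
// p now points at the actual first byte of MyFunc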
I read the JIT Interface chapter and ran into a problem: how do I write the simplest possible example for the simplest possible code (preferably in C++, and at least for the x86-64 platform)? Say I want to debug the following code (namely, the code in code_.data()):
#include "eallocator.hpp"
#include <iostream>
#include <vector>
#include <cstdlib>
int main()
{
std::vector< std::uint8_t, eallocator< std::uint8_t > > code_;
code_.push_back(0b11011001u); code_.push_back(0b11101011u); // fldpi
code_.push_back(0b11000011u); // ret
double result_;
__asm("call *%1"
: "=&t"(result_)
: "r"(code_.data())
:
);
std::cout << result_ << std::endl;
return EXIT_SUCCESS;
}
What (minimally) should I do to use the interface? In particular, I want to be able to provide some pseudocode (arbitrary text in memory) as the "source" (with corresponding line info), if that is possible.
How do I instrument the above code (or something similar) while remaining terse?
#include "eallocator.hpp" should use the approaches from this for Windows or from this for Linux.
If I understand you correctly, what you're trying to do is to dynamically emit some executable code into memory and set up GDB to be able to debug it, is that correct?
What makes this task quite difficult to express in a "minimal" example is that GDB actually expects to find a whole ELF object in memory, not just a lump of code. GDB's registration interfaces need ELF symbol tables to examine in order to figure out which symbols exist in the emitted code and where they reside.
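For reference, the registration interface described in the JIT Interface chapter boils down to a pair of well-known symbols and a descriptor; a sketch of those declarations looks like this, and each symfile_addr is expected to point at a complete in-memory ELF object:

#include <stdint.h>

typedef enum {
    JIT_NOACTION = 0,
    JIT_REGISTER_FN,
    JIT_UNREGISTER_FN
} jit_actions_t;

struct jit_code_entry {
    struct jit_code_entry *next_entry;
    struct jit_code_entry *prev_entry;
    const char *symfile_addr;   /* an in-memory ELF object with symbol/debug info */
    uint64_t symfile_size;
};

struct jit_descriptor {
    uint32_t version;           /* must be 1 */
    uint32_t action_flag;       /* a jit_actions_t value */
    struct jit_code_entry *relevant_entry;
    struct jit_code_entry *first_entry;
};

/* GDB sets a breakpoint in this function; call it after updating the descriptor. */
void __attribute__((noinline)) __jit_debug_register_code(void) { asm(""); }

struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };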
Your best bet to do this without unreasonable effort is to look at LLVM. The Debugging JIT-ed Code with GDB section in the documentation describes how to do this with MCJIT, with a full example at the bottom - going from some simple C code, JITing it to memory with LLVM MCJIT and attaching GDB to it. Moreover, since the LLVM MCJIT framework is involved you get full debug info there, up to the C level!
To be honest, that documentation section has not been updated in a while, but it should work. Let me know if it doesn't - I'll look into fixing things and updating it.
I hope this helps.
There is an example in their test suite which should help:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/testsuite/gdb.base/jit-main.c