I'm trying to call native machine-language code. Here's what I have so far (it gets a bus error):
char prog[] = {'\xc3'}; // x86 ret instruction
int main()
{
typedef double (*dfunc)();
dfunc d = (dfunc)(&prog[0]);
(*d)();
return 0;
}
It does correctly call the function and it gets to the ret instruction. But when it tries to execute the ret instruction, it has a SIGBUS error. Is it because I'm executing code on a page that is not cleared for execution or something like that?
So what am I doing wrong here?
One first problem might be that the location where the prog data is stored is not executable.
On Linux at least, the resulting binary will place the contents of global variables in the "data" segment or here, which is not executable in most normal cases.
The second problem might be that the code you are invoking is invalid in some way. There's a certain procedure to calling a method in C, called the calling convention (you might be using the "cdecl" one, for example). It might not be enough for the called function to just "ret". It might also need to do some stack cleanup etc. otherwise the program will behave unexpectedly. This might prove an issue once you get past the first problem.
You need to call memprotect in order to make the page where prog lives executable. The following code does make this call, and can execute the text in prog.
#include <unistd.h>
#include <stdio.h>
#include <malloc.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/mman.h>
char prog[] = {
0x55, // push %rbp
0x48, 0x89, 0xe5, // mov %rsp,%rbp
0xf2, 0x0f, 0x10, 0x05, 0x00, 0x00, 0x00,
//movsd 0x0(%rip),%xmm0 # c <x+0xc>
0x00,
0x5d, // pop %rbp
0xc3, // retq
};
int main()
{
long pagesize = sysconf(_SC_PAGE_SIZE);
long page_no = (long)prog/pagesize;
int res = mprotect((void*)(page_no*pagesize), (long)page_no+sizeof(prog), PROT_EXEC|PROT_READ|PROT_WRITE);
if(res)
{
fprintf(stderr, "mprotect error:%d\n", res);
return 1;
}
typedef double (*dfunc)(void);
dfunc d = (dfunc)(&prog[0]);
double x = (*d)();
printf("x=%f\n", x);
fflush(stdout);
return 0;
}
As everyone already said, you must ensure prog[] is executable, however the proper way to do it, unless you're writing a JIT compiler, is to put the symbol in an executable area, either by using a linker script or by specifying the section in the C code if the compiler allows , e.g.:
const char prog[] __attribute__((section(".text"))) = {...}
Virtually all C compilers will let you do this by embedding regular assembly language in your code. Of course it's a non-standard extension to C, but compiler writers recognise that it's often necessary. As a non-standard extension, you'll have to read your compiler manual and check how to do it, but the GCC "asm" extension is a fairly standard approach.
void DoCheck(uint32_t dwSomeValue)
{
uint32_t dwRes;
// Assumes dwSomeValue is not zero.
asm ("bsfl %1,%0"
: "=r" (dwRes)
: "r" (dwSomeValue)
: "cc");
assert(dwRes > 3);
}
Since it's easy to trash the stack in assembler, compilers often also allow you to identify registers you'll use as part of your assembler. The compiler can then ensure the rest of that function steers clear of those registers.
If you're writing the assembler code yourself, there is no good reason to set up that assembler as an array of bytes. It's not just a code smell - I'd say it is a genuine error which could only happen by being unaware of the "asm" extension which is the right way to embed assembler in your C.
Essentially this has been clamped down on because it was an open invitation to virus writers. But you can allocate and buffer and set it up with native machinecode in straight C - that's no problem. The issue is calling it. Whilst you can try setting up a function pointer with the address of the buffer and calling it, that's highly unlikely to work, and highly likely to break on the next version of the compiler if somehow you do manage to coax it into doing what you want. So the best bet is to simply resort to a bit of inline assembly, to set up the return and jump to the automatically generated code. But if the system protects against this, you'll have to find methods of circumventing the protection, as Rudi described in his answer (but very specific to one particular system).
One obvious error is that \xc3 is not returning the double that you claim it's returning.
Related
I was just wondering how random number is generated in assembler, I found question from russian stack overflow where a person asks rather not how to generate a random number in assembler, but how to implement that in c code using _asm{}.
The answer posted to his question surprised me (translated to eng):
char r[]="!!!!!!!!!!!№№№№№№№№№№№;;;;;;;;;;;;;;;;;;;;;;;;;55555555555555666666666666666666666666666777777777777777777777777777777777777777777777777777777777777"; // String, which length should be calculated
main()
{
static unsigned long (__cdecl *lenstr)(char*); // Pointer to function declaration. The method for passing parameters must be defined explicitly - it is different in different compilers
static int i=0;
if(!i)
{
static char s[]={
0x5a,
//pop %%edx
0x5f,
//pop %%edi
0xfc,
//cld
0x31,0xc9,
//xor %%ecx,%%ecx
0x31,0xc0,
//xor %%eax,%%eax
0x49,
//dec %%ecx
0xf2,0xae,
//repne scasв
0xf7,0xd1,
//not %%ecx
0x49,
//dec %%ecx
0x91,
//xchg %%eax,%%ecx
0x52,
//push %%edx
0xc3
//ret
}; // Array with assembler code
lenstr=(unsigned long ( __cdecl *)(char*))&s; // Linking function pointer to to that array
i=1;
}
printf("%s%c%d%c%s\n","String length",' ',lenstr(r),' ',"symbols");
}
Two questions:
How long does the opportunity to put assembler code as a casted char array to function-pointer is existing and why it was developed?
I didn’t understand: calculating string length is kinda smart method of random number generation or it was just an example of machine code to pointer casting?
About the code example
Pasting the second answer's text from your link to a translator gave me:
And you can make it so that the machine code will be located in an array. Here's how you can write a program to count the number of characters in a string.
So it's only an example about how to use assembly code inside a C program. One could use __asm, but many don't like the syntax there. Therefore the assembly source code is first assembled externally (using NASM or FASM for example) and the resulting machine code is then embedded as a char array in the C program.
Make the code executable
As Peter Cordes already mentioned, it's mostly not possible to execute code within data sections (where this char array is stored in the program). There are two ways to execute the code anyway: Either the appropriate compiler settings have to be set (to make the data section executable) or additional memory has to be allocated that is executable.
Under Linux, for example, you can use mmap to request such storage and then copy the code over:
void* executableStorage = mmap(NULL, sizeof(executableCode),
PROT_EXEC | PROT_READ | PROT_WRITE,
MAP_PRIVATE, 0, 0);
memcpy(executableStorage, executableCode, sizeof(executableCode));
Under Windows, something similar can be done with GlobalAlloc, for example, which always returns an executable memory area.
Random assembler
The first answer from the linked question is about the random numbers:
The simplest option is to implement a linear congruent generator:
R1 = (a * R0 + b) mod M
Here a and b are constant coefficients (are selected), M is the modulus, the maximum value for a pseudo-random number (the minimum will be 0), R0 is the result the previous call to the generator (for the first call, you can substitute any number).
Linear-feedback shift registers are another way to easily generate pseudo-random numbers.
Why does this opportunity exists?
It is not really a functionality of C. Since C is very close to hardware, everything can be interpreted as data, pointer or even as program code. By casting with a certain data type, you can switch between them. Therefore this possibility will probably also be described somewhere. Conversely, C-code could also be interpreted as data:
#include <stdio.h>
int main() {
// Interpret the main function (program code) as data
unsigned char* data = (unsigned char*) main;
// Print out some machine code of the main-function
for (int i=0; i<64; i++) {
printf("%02X ", data[i]);
if ((i & 15) == 15)
printf("\n");
}
}
So C offers these possibilities. Whether or not they are also allowed is not a matter for the language C. Security mechanisms, which are primarily provided by the operating system, can make memory areas and thus this data write-protected or non-executable.
Because of these security mechanisms, the way with the char array is no longer really practical. It was more of a quick-and-dirty solution, it's a bad programming style and impractical: every time the assembler code was changed, it would have to be manually transferred to the C program. Normally you would write the assembler code in a separate file and then link the assembled object file with the C object files:
assembly object executable
source code files program
assembler linker
ASSEMBLY.asm ────────────> ASSEMBLY.o ───┬───> ./PROGRAM
│
c-compiler │
PROGRAM.c ────────────> PROGRAM.o ───┘
c source code
I would like to alloc a buffer that I can execute on Win32 but I have an exception in visual studio cuz the malloc function returns a non executable memory zone. I read that there a NX flag to disable... My goal is convert a bytecode to asm x86 on fly with keep in mind performance.
Does somemone can help me?
You don't use malloc for that. Why would you anyway, in a C++ program? You also don't use new for executable memory, however. There's the Windows-specific VirtualAlloc function to reserve memory which you then mark as executable with the VirtualProtect function applying, for instance, the PAGE_EXECUTE_READ flag.
When you have done that, you can cast the pointer to the allocated memory to an appropriate function pointer type and just call the function. Don't forget to call VirtualFree when you are done.
Here is some very basic example code with no error handling or other sanity checks, just to show you how this can be accomplished in modern C++ (the program prints 5):
#include <windows.h>
#include <vector>
#include <iostream>
#include <cstring>
int main()
{
std::vector<unsigned char> const code =
{
0xb8, // move the following value to EAX:
0x05, 0x00, 0x00, 0x00, // 5
0xc3 // return what's currently in EAX
};
SYSTEM_INFO system_info;
GetSystemInfo(&system_info);
auto const page_size = system_info.dwPageSize;
// prepare the memory in which the machine code will be put (it's not executable yet):
auto const buffer = VirtualAlloc(nullptr, page_size, MEM_COMMIT, PAGE_READWRITE);
// copy the machine code into that memory:
std::memcpy(buffer, code.data(), code.size());
// mark the memory as executable:
DWORD dummy;
VirtualProtect(buffer, code.size(), PAGE_EXECUTE_READ, &dummy);
// interpret the beginning of the (now) executable memory as the entry
// point of a function taking no arguments and returning a 4-byte int:
auto const function_ptr = reinterpret_cast<std::int32_t(*)()>(buffer);
// call the function and store the result in a local std::int32_t object:
auto const result = function_ptr();
// free the executable memory:
VirtualFree(buffer, 0, MEM_RELEASE);
// use your std::int32_t:
std::cout << result << "\n";
}
It's very unusual compared to normal C++ memory management, but not really rocket science. The hard part is to get the actual machine code right. Note that my example here is just very basic x64 code.
Extending the above answer, a good practice is:
Allocate memory with VirtualAlloc and read-write-access.
Fill that region with your code
Change that region's protection with VirtualProtectto execute-read-access
jump to/call the entry point in this region
So it could look like this:
adr = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
// write code to the region
ok = VirtualProtect(adr, size, PAGE_EXECUTE_READ, &oldProtection);
// execute the code in the region
As stated in documentation for VirtualAlloc
flProtect [in]
The memory protection for the region of pages to be allocated. If the pages are being committed, you can specify any one of the memory protection constants.
one of them is:
PAGE_EXECUTE
0x10
Enables execute access to the committed region of pages. An attempt to write to the committed region results in an access violation.
This flag is not supported by the CreateFileMapping function.
PAGE_EXECUTE_READ
0x20
Enables execute or read-only access to the committed region of pages. An attempt to write to the committed region results in an access violation.
Windows Server 2003 and Windows XP: This attribute is not supported by the CreateFileMapping function until Windows XP with SP2 and Windows Server 2003 with SP1.
PAGE_EXECUTE_READWRITE
0x40
Enables execute, read-only, or read/write access to the committed region of pages.
Windows Server 2003 and Windows XP: This attribute is not supported by the CreateFileMapping function until Windows XP with SP2 and Windows Server 2003 with SP1.
and so on from here
C version based off of Christian Hackl's answer
I think SIZE_T dwSize of VirtualAlloc should be the size of the code in bytes, not system_info.dwPageSize (what if sizeof code is bigger than system_info.dwPageSize?).
I don't know C enough to know if sizeof(code) is the "correct" way of getting the size of the machine code
this compiles under c++ so I guess it's not off topic lol
#include <Windows.h>
#include <stdio.h>
int main()
{
// double add(double a, double b) {
// return a + b;
// }
unsigned char code[] = { //Antonio Cuni - How to write a JIT compiler in 30 minutes: https://www.youtube.com/watch?v=DKns_rH8rrg&t=118s
0xf2,0x0f,0x58,0xc1, //addsd %xmm1,%xmm0
0xc3, //ret
};
LPVOID buffer = VirtualAlloc(NULL, sizeof(code), MEM_COMMIT, PAGE_READWRITE);
memcpy(buffer, code, sizeof(code));
//protect after write, because protect will prevent writing.
DWORD oldProtection;
VirtualProtect(buffer, sizeof(code), PAGE_EXECUTE_READ, &oldProtection);
double (*function_ptr)(double, double) = (double (*)(double, double))buffer; //is there a cleaner way to write this ?
// double result = (*function_ptr)(2, 234); //NOT SURE WHY THIS ALSO WORKS
double result = function_ptr(2, 234);
VirtualFree(buffer, 0, MEM_RELEASE);
printf("%f\n", result);
}
At compile time, the linker will organize your program's memory footprint by allocating memory into data sections and code sections. The CPU will make sure that the program counter (the hard CPU register) value remains within a code section or the CPU will throw a hardware exception for violating the memory bounds. This provides some security by making sure your program only executes valid code. Malloc is intended for allocating data memory. Your application has a heap and the heap's size is established by the linker and is marked as data memory. So at runtime malloc is just grabbing some of the virtual memory from your heap which will always be data.
I hope this helps you have a better understanding what's going on, though it might not be enough to get you where you need to be. Perhaps you can pre-allocate a "code heap" or memory pool for your runtime-generated code. You will probably need to fuss with the linker to accomplish this but I don't know any of the details.
Given this code:
#include <stdio.h>
int main(int argc, char **argv)
{
int x = 1;
printf("Hello x = %d\n", x);
}
I'd like to access and manipulate the variable x in inline assembly. Ideally, I want to change its value using inline assembly. GNU assembler, and using the AT&T syntax.
In GNU C inline asm, with x86 AT&T syntax:
(But https://gcc.gnu.org/wiki/DontUseInlineAsm if you can avoid it).
// this example doesn't really need volatile: the result is the same every time
asm volatile("movl $0, %[some]"
: [some] "=r" (x)
);
after this, x contains 0.
Note that you should generally avoid mov as the first or last instruction of an asm statement. Don't copy from %[some] to a hard-coded register like %%eax, just use %[some] as a register, letting the compiler do register allocation.
See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html and https://stackoverflow.com/tags/inline-assembly/info for more docs and guides.
Not all compilers support GNU syntax.
For example, for MSVC you do this:
__asm mov x, 0 and x will have the value of 0 after this statement.
Please specify the compiler you would want to use.
Also note, doing this will restrict your program to compile with only a specific compiler-assembler combination, and will be targeted only towards a particular architecture.
In most cases, you'll get as good or better results from using pure C and intrinsics, not inline asm.
asm("mov $0, %1":"=r" (x):"r" (x):"cc"); -- this may get you on the right track. Specify register use as much as possible for performance and efficiency. However, as Aniket points out, highly architecture dependent and requires gcc.
This is my first time around, and I really hope you guys can help me, as I have ran out of ideas by now.
I have searched for an answer for a couple of hours now, and could not find an answer that would actually work.
I would like to directly inject code into a running process. Yes, you have read it right. I am trying to inject code into another application, and - believe it or not - this is only to extend the functionality of it.
I am using Visual Studio 2012 Express Edition on Windows.
I have the following code:
__declspec(naked) void Foo()
{
__asm
{
// Inline assembly code here
}
}
__declspec(naked) void FooEnd() {}
int main()
{
cout << HEX(Foo) << endl;
cout << HEX(FooEnd) << endl;
cout << (int)FooEnd - (int)Foo << endl;
// Inject code here using WriteProcessMemory
return 0;
}
Most of the code has been removed in order to maintain readability, though I can post other portions of it on request.
Output is the following:
0x010B1000
0x010B1010
16
The resulting size is actually incorrect. The functions are compiled in the right order (made sure using /ORDER), but the compiler adds a bunch of 0xCC (int 3) bytes after each method which extends it's size, and so I can't get the real (useful) number of bytes that contains actual executable code.
In another stackoverflow question, it has been said that disabling "Edit and Continue" would make these extra bytes go away, but no matter what, that didn't work for me.
I also tried using Release setup instead of Debug, changed a bunch of optimization settings, but none of these had any effect. What do you think could be the solution? I may be missing something obvious.
Anyway, is this (in your opinion) the best way to acquire a function's length (readability, reliability, ease of use)?
I hope I explained everything I had to in order for you to be able to help. If you have further questions, please feel free to leave a comment.
Thanks for your time and efforts.
As Devolus points out, the compiler is inserting these extra bytes after your code in order to align the next function on a reasonable (usually divisible by 16) starting address.
The compiler is actually trying to help you since 0xCC is the breakpoint instruction, the code will break into the debugger (if attached) should the instruction pointer accidentally point outside a function at any point during execution.
None of this should worry you for your purposes. You can consider the 0xCC padding as part of the function.
You don't need the extra padding when you're injecting the code, so it's fine to discard them. It should also be fine to copy them over, it will just result in a few extra bytes of copying. Chances are the memory you're injecting to will by a page-aligned block anyway, so you're not really gaining anything by stripping it out.
But if you really want to strip it out, a simple solution to your problem would be to just iterate backwards from the last byte before the next function, until there are no more 0xcc bytes.
i.e.:
__declspec(naked) void Foo()
{
__asm
{
_emit 0x4A
_emit 0x4B
}
}
__declspec(naked) void FooEnd() {}
int main(int argc, char** argv)
{
//start at the last byte of the memory-aligned code instead of the first byte of FooEnd
unsigned char* fooLast = (unsigned char*)FooEnd-1;
//keep going backwards until we don't have a 0xcc
while(*fooLast == 0xCC)
fooLast--;
//fooLast will now point at the last byte of your function, so you need to add 1
int length = ((int)fooLast - (int)Foo) + 1;
//should output 2 for the length of Foo
std::cout << length;
}
The extra bytes are inserted by the compiler to create a memory alignment, so you can't discard it, since you are using the next function as a marker.
On the other hand, since you are writing the injected code in assembly anyway, you can just as well write the code, compile it, and then put the binary form in a byte array. That's how I would do this, because then you have the exact length.
I'm working with a proprietary MCU that has a built-in library in metal (mask ROM). The compiler I'm using is clang, which uses GCC-like inline ASM. The issue I'm running into, is calling the library since the library does not have a consistent calling convention. While I found a solution, I've found that in some cases the compiler will make optimizations that clobber registers immediately before the call, I think there is just something wrong with how I'm doing things. Here is the code I'm using:
int EchoByte()
{
register int asmHex __asm__ ("R1") = Hex;
asm volatile("//Assert Input to R1 for MASKROM_EchoByte"
:
:"r"(asmHex)
:"%R1");
((volatile void (*)(void))(MASKROM_EchoByte))(); //MASKROM_EchoByte is a 16-bit integer with the memory location of the function
}
Now this has the obvious problem that while the variable "asmHex" is asserted to register R1, the actual call does not use it and therefore the compiler "doesn't know" that R1 is reserved at the time of the call. I used the following code to eliminate this case:
int EchoByte()
{
register int asmHex __asm__ ("R1") = Hex;
asm volatile("//Assert Input to R1 for MASKROM_EchoByte"
:
:"r"(asmHex)
:"%R1");
((volatile void (*)(void))(MASKROM_EchoByte))();
asm volatile("//Assert Input to R1 for MASKROM_EchoByte"
:
:"r"(asmHex)
:"%R1");
}
This seems really ugly to me, and like there should be a better way. Also I'm worried that the compiler may do some nonsense in between, since the call itself has no indication that it needs the asmHex variable. Unfortunately, ((volatile void (*)(int))(MASKROM_EchoByte))(asmHex) does not work as it will follow the C-convention, which puts arguments into R2+ (R1 is reserved for scratching)
Note that changing the Mask ROM library is unfortunately impossible, and there are too many frequently used routines to recreate them all in C/C++.
Cheers, and thanks.
EDIT: I should note that while I could call the function in the ASM block, the compiler has an optimization for functions that are call-less, and by calling in assembly it looks like there's no call. I could go this route if there is some way of indicating that the inline ASM contains a function call, but otherwise the return address will likely get clobbered. I haven't been able to find a way to do this in any case.
Per the comments above:
The most conventional answer is that you should implement a stub function in assembly (in a .s file) that simply performs the wacky call for you. In ARM, this would look something like
// void EchoByte(int hex);
_EchoByte:
push {lr}
mov r1, r0 // move our first parameter into r1
bl _MASKROM_EchoByte
pop pc
Implement one of these stubs per mask-ROM routine, and you're done.
What's that? You have 500 mask-ROM routines and don't want to cut-and-paste so much code? Then add a level of indirection:
// typedef void MASKROM_Routine(int r1, ...);
// void GeneralPurposeStub(MASKROM_Routine *f, int arg, ...);
_GeneralPurposeStub:
bx r0
Call this stub by using the syntax GeneralPurposeStub(&MASKROM_EchoByte, hex). It'll work for any mask-ROM entry point that expects a parameter in r1. Any really wacky entry points will still need their own hand-coded assembly stubs.
But if you really, really, really must do this via inline assembly in a C function, then (as #JasonD pointed out) all you need to do is add the link register lr to the clobber list.
void EchoByte(int hex)
{
register int r1 asm("r1") = hex;
asm volatile(
"bl _MASKROM_EchoByte"
:
: "r"(r1)
: "r1", "lr" // Compare the codegen with and without this "lr"!
);
}