I was wondering how random numbers are generated in assembly. I found a question on the Russian Stack Overflow where the asker wants to know not so much how to generate a random number in assembly as how to implement that in C code using _asm{}.
The answer posted to that question surprised me (translated to English):
char r[]="!!!!!!!!!!!№№№№№№№№№№№;;;;;;;;;;;;;;;;;;;;;;;;;55555555555555666666666666666666666666666777777777777777777777777777777777777777777777777777777777777"; // String, which length should be calculated
main()
{
static unsigned long (__cdecl *lenstr)(char*); // Pointer to function declaration. The method for passing parameters must be defined explicitly - it is different in different compilers
static int i=0;
if(!i)
{
static char s[]={
0x5a,
//pop %%edx
0x5f,
//pop %%edi
0xfc,
//cld
0x31,0xc9,
//xor %%ecx,%%ecx
0x31,0xc0,
//xor %%eax,%%eax
0x49,
//dec %%ecx
0xf2,0xae,
//repne scasb
0xf7,0xd1,
//not %%ecx
0x49,
//dec %%ecx
0x91,
//xchg %%eax,%%ecx
0x52,
//push %%edx
0xc3
//ret
}; // Array with assembler code
lenstr=(unsigned long ( __cdecl *)(char*))&s; // Point the function pointer at that array
i=1;
}
printf("%s%c%d%c%s\n","String length",' ',lenstr(r),' ',"symbols");
}
Two questions:
How long has it been possible to run machine code stored in a char array by casting the array to a function pointer, and why was this made possible?
I don't understand: is calculating a string's length some clever method of random number generation, or was it just an example of casting machine code to a function pointer?
About the code example
Pasting the second answer's text from your link to a translator gave me:
And you can make it so that the machine code will be located in an array. Here's how you can write a program to count the number of characters in a string.
So it's only an example about how to use assembly code inside a C program. One could use __asm, but many don't like the syntax there. Therefore the assembly source code is first assembled externally (using NASM or FASM for example) and the resulting machine code is then embedded as a char array in the C program.
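To make clear what the embedded bytes compute, here is a rough C rendering of the scan they perform. It is only a sketch: the function name scas_strlen and the variable names are mine, chosen to mirror the registers in the listing, and the code models the instruction sequence rather than replacing it.

```c
#include <stdint.h>

/* Mirrors: xor ecx,ecx / dec ecx / repne scasb / not ecx / dec ecx.
   The name scas_strlen is hypothetical; it is not part of the original answer. */
static uint32_t scas_strlen(const char *s)
{
    uint32_t ecx = 0;            /* xor ecx,ecx */
    const char *edi = s;         /* edi holds the string pointer */

    ecx--;                       /* dec ecx: counter starts at 0xFFFFFFFF */
    for (;;) {                   /* repne scasb with al = 0 */
        char c = *edi++;         /* compare the byte at edi, then advance edi */
        ecx--;                   /* every step decrements the counter */
        if (c == 0 || ecx == 0)  /* stop on a match (NUL) or an exhausted counter */
            break;
    }
    ecx = ~ecx;                  /* not ecx: bytes scanned, NUL included */
    ecx--;                       /* dec ecx: drop the NUL from the count */
    return ecx;
}
```

Calling scas_strlen("hello") yields 5, the same value strlen would return.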
Make the code executable
As Peter Cordes already mentioned, it's mostly not possible to execute code within data sections (where this char array is stored in the program). There are two ways to execute the code anyway: Either the appropriate compiler settings have to be set (to make the data section executable) or additional memory has to be allocated that is executable.
Under Linux, for example, you can use mmap to request such storage and then copy the code over:
void* executableStorage = mmap(NULL, sizeof(executableCode),
PROT_EXEC | PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); // anonymous mapping: no backing file, so fd is -1
memcpy(executableStorage, executableCode, sizeof(executableCode));
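Under Windows, something similar can be done with VirtualAlloc, for example, by requesting the PAGE_EXECUTE_READWRITE protection. Memory from GlobalAlloc is ordinary heap memory and is not executable when Data Execution Prevention (DEP) is active.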
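For the Linux route, a fuller sketch might look like the following. Everything named here (code42, load_code, run_code42) is hypothetical, the six bytes are x86 machine code for "mov eax, 42; ret", and the call is compiled only on x86 targets. Unlike the fragment above, this version maps the memory writable first and switches it to read+execute afterwards, which is my own choice to avoid a page that is writable and executable at the same time.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* x86 / x86-64 machine code for: mov eax, 42 ; ret (illustrative only) */
static const unsigned char code42[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

/* Copies len bytes of machine code into a fresh anonymous mapping and then
   makes that mapping read+execute. Returns NULL if any step fails. */
static void *load_code(const unsigned char *code, size_t len)
{
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    memcpy(mem, code, len);
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }
    return mem;
}

/* Runs code42 and returns its result, or -1 if the code could not be loaded
   or the target is not x86. */
static int run_code42(void)
{
#if defined(__x86_64__) || defined(__i386__)
    void *mem = load_code(code42, sizeof code42);
    if (mem == NULL)
        return -1;
    /* Object-to-function pointer cast: accepted on POSIX systems, not ISO C. */
    int (*fn)(void) = (int (*)(void))mem;
    int result = fn();
    munmap(mem, sizeof code42);
    return result;
#else
    return -1;
#endif
}
```

On an x86 Linux system that permits executable anonymous mappings, run_code42() should return 42; hardened configurations may refuse the mprotect call, in which case the sketch reports -1.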
Random assembler
The first answer from the linked question is about the random numbers:
The simplest option is to implement a linear congruential generator:
R1 = (a * R0 + b) mod M
Here a and b are constant coefficients (chosen by you), M is the modulus (results range from 0 to M - 1), and R0 is the result of the previous call to the generator (for the first call, you can substitute any number).
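As a minimal sketch of that formula in C: the constants below are one widely published choice (attributed to Numerical Recipes) and are an assumption on my part, not something the quoted answer specifies. M is 2^32 here, which falls out of unsigned 32-bit wraparound, and the name lcg_next is hypothetical.

```c
#include <stdint.h>

/* Illustrative coefficients; any well-tested pair could be substituted. */
#define LCG_A 1664525u
#define LCG_B 1013904223u

/* Advances the generator: R1 = (a * R0 + b) mod 2^32.
   The modulus is implicit in uint32_t arithmetic. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = LCG_A * (*state) + LCG_B;
    return *state;
}
```

Seeding the state with 0 makes the first call return b itself (1013904223); the same seed always reproduces the same sequence, which is what makes the generator pseudo-random rather than random.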
Linear-feedback shift registers are another way to easily generate pseudo-random numbers.
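A small illustration of the shift-register idea, assuming a 16-bit Fibonacci LFSR with taps at bits 16, 14, 13 and 11 (a commonly cited maximal-length configuration). The function name lfsr16_next and the choice of taps are mine, not from the linked answer.

```c
#include <stdint.h>

/* One step of a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11).
   The state must be non-zero, otherwise it stays zero forever. */
static uint16_t lfsr16_next(uint16_t state)
{
    /* XOR the tapped bits to form the new top bit. */
    uint16_t bit = (uint16_t)(((state >> 0) ^ (state >> 2) ^
                               (state >> 3) ^ (state >> 5)) & 1u);
    /* Shift right and feed the new bit in at the top. */
    return (uint16_t)((state >> 1) | (bit << 15));
}
```

Starting from any non-zero seed such as 0xACE1, repeated calls cycle through every non-zero 16-bit value before repeating.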
Why does this possibility exist?
It is not really a feature of C. Since C is very close to the hardware, anything in memory can be interpreted as data, as a pointer, or even as program code, and a cast to the appropriate type switches between these views, so this technique is probably documented somewhere as well. Conversely, C code can also be interpreted as data:
#include <stdio.h>
int main() {
// Interpret the main function (program code) as data
unsigned char* data = (unsigned char*) main;
// Print out some machine code of the main-function
for (int i=0; i<64; i++) {
printf("%02X ", data[i]);
if ((i & 15) == 15)
printf("\n");
}
}
So C offers these possibilities. Whether or not they are also allowed is not a matter for the language C. Security mechanisms, which are primarily provided by the operating system, can make memory areas and thus this data write-protected or non-executable.
Because of these security mechanisms, the char array approach is no longer really practical. It was always more of a quick-and-dirty solution: it is bad programming style, and every time the assembler code changes it has to be transferred into the C program by hand. Normally you would write the assembler code in a separate file and then link the assembled object file with the C object files:
  assembly                     object             executable
 source code                   files               program
               assembler                  linker
ASSEMBLY.asm ────────────> ASSEMBLY.o ───┬───> ./PROGRAM
                                         │
               c-compiler                │
PROGRAM.c    ────────────> PROGRAM.o  ───┘
c source code
Related
I'm trying to call native machine-language code. Here's what I have so far (it gets a bus error):
char prog[] = {'\xc3'}; // x86 ret instruction
int main()
{
typedef double (*dfunc)();
dfunc d = (dfunc)(&prog[0]);
(*d)();
return 0;
}
It does correctly call the function and it gets to the ret instruction. But when it tries to execute the ret instruction, it has a SIGBUS error. Is it because I'm executing code on a page that is not cleared for execution or something like that?
So what am I doing wrong here?
The first problem might be that the location where the prog data is stored is not executable.
On Linux at least, the resulting binary will place the contents of global variables in the "data" segment, which is not executable in most normal cases.
The second problem might be that the code you are invoking is invalid in some way. There's a certain procedure to calling a method in C, called the calling convention (you might be using the "cdecl" one, for example). It might not be enough for the called function to just "ret". It might also need to do some stack cleanup etc. otherwise the program will behave unexpectedly. This might prove an issue once you get past the first problem.
You need to call mprotect in order to make the page where prog lives executable. The following code does make this call, and can execute the text in prog.
#include <unistd.h>
#include <stdio.h>
#include <malloc.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/mman.h>
char prog[] = {
0x55, // push %rbp
0x48, 0x89, 0xe5, // mov %rsp,%rbp
0xf2, 0x0f, 0x10, 0x05, 0x00, 0x00, 0x00,
//movsd 0x0(%rip),%xmm0 # c <x+0xc>
0x00,
0x5d, // pop %rbp
0xc3, // retq
};
int main()
{
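long pagesize = sysconf(_SC_PAGE_SIZE);
// mprotect needs a page-aligned start address, so round prog down to its page
unsigned long start = (unsigned long)prog / pagesize * pagesize;
// the length must reach from that page start to the end of prog
unsigned long len = ((unsigned long)prog - start) + sizeof(prog);
int res = mprotect((void*)start, len, PROT_EXEC|PROT_READ|PROT_WRITE);
if(res)
{
perror("mprotect"); // mprotect returns -1 and sets errno on failure
return 1;
}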
typedef double (*dfunc)(void);
dfunc d = (dfunc)(&prog[0]);
double x = (*d)();
printf("x=%f\n", x);
fflush(stdout);
return 0;
}
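Since mprotect works on whole pages, the address and length handed to it have to be derived from the array's address. The arithmetic can be sketched as a small helper; the struct page_span and the function page_span_for are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Page-aligned range that fully covers an object. */
struct page_span {
    uintptr_t start;   /* address rounded down to a page boundary */
    size_t length;     /* bytes from start to the rounded-up end */
};

/* Computes the smallest page-aligned range covering [addr, addr + size).
   pagesize is assumed to be non-zero, e.g. the value of sysconf(_SC_PAGE_SIZE). */
static struct page_span page_span_for(uintptr_t addr, size_t size, size_t pagesize)
{
    struct page_span span;
    uintptr_t end = addr + size;                       /* one past the last byte */
    uintptr_t end_up = end + (pagesize - end % pagesize) % pagesize;

    span.start = addr - (addr % pagesize);
    span.length = (size_t)(end_up - span.start);
    return span;
}
```

For an object of 16 bytes at address 0x1ff8 with 4096-byte pages, the helper reports a start of 0x1000 and a length of 0x2000, because the object straddles two pages.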
As everyone already said, you must ensure prog[] is executable. However, the proper way to do it, unless you're writing a JIT compiler, is to put the symbol in an executable area, either by using a linker script or by specifying the section in the C code if the compiler allows it, e.g.:
const char prog[] __attribute__((section(".text"))) = {...}
Virtually all C compilers will let you do this by embedding regular assembly language in your code. Of course it's a non-standard extension to C, but compiler writers recognise that it's often necessary. As a non-standard extension, you'll have to read your compiler manual and check how to do it, but the GCC "asm" extension is a fairly standard approach.
void DoCheck(uint32_t dwSomeValue)
{
uint32_t dwRes;
// Assumes dwSomeValue is not zero.
asm ("bsfl %1,%0"
: "=r" (dwRes)
: "r" (dwSomeValue)
: "cc");
assert(dwRes > 3);
}
Since it's easy to trash the stack in assembler, compilers often also allow you to identify registers you'll use as part of your assembler. The compiler can then ensure the rest of that function steers clear of those registers.
If you're writing the assembler code yourself, there is no good reason to set up that assembler as an array of bytes. It's not just a code smell - I'd say it is a genuine error which could only happen by being unaware of the "asm" extension which is the right way to embed assembler in your C.
Essentially this has been clamped down on because it was an open invitation to virus writers. But you can allocate a buffer and set it up with native machine code in straight C - that's no problem. The issue is calling it. Whilst you can try setting up a function pointer with the address of the buffer and calling it, that's highly unlikely to work, and highly likely to break on the next version of the compiler if somehow you do manage to coax it into doing what you want. So the best bet is to simply resort to a bit of inline assembly, to set up the return and jump to the automatically generated code. But if the system protects against this, you'll have to find methods of circumventing the protection, as Rudi described in his answer (but very specific to one particular system).
One obvious error is that \xc3 is not returning the double that you claim it's returning.
For example:
In the file demo.c,
#include <stdio.h>
int a = 5;
int main(){
int b=5;
int c=a;
printf("%d", b+c);
return 0;
}
For int a = 5, does the compiler translate this into something like "store 0x5 at a virtual memory address, for example 0x0000000f, in the const area", so that int c = a is translated to something like movl 0x0000000f %eax?
Then for int b = 5, the number 5 is not put into the const area, but translated directly to an immediate in the assembly instruction, like mov $0x5 %ebx.
It depends. Your program has several constants:
int a = 5;
This is a "static" initialization (which occurs when the program text and data is loaded before running). The value is stored in the memory reserved by a which is in a read-write data "program section". If something changes a, the value 5 is lost.
int b=5;
This is a local variable with limited scope (visible only within main()). The storage could well be a CPU register or a location on the stack. The instructions generated for most architectures will place the value 5 in an instruction as "immediate data", for an x86 example:
mov eax, 5
The ability for instructions to hold arbitrary constants is limited. Small constants are supported by most CPU instructions. "Large" constants are not usually directly supported. In that case the compiler would store the constant in memory and load it instead. For example,
.psect rodata
k1 dd 3141592653
.psect code
mov eax, k1
The ARM family has a powerful design for loading most constants directly: any 8-bit constant value can be rotated right by any even number of bit positions. See this page 2-25.
One not-as-obvious but totally different item is in the statement:
printf("%d", b+c);
The string %d is, by modern C semantics, a constant array of three char. Most modern implementations will store it in read-only memory so that attempts to change it will cause a SEGFAULT, which is a low level CPU error which usually causes the program to instantly abort.
.psect rodata
s1 db '%', 'd', 0
.psect code
mov eax, s1
push eax
In OP's program, a is an "initialized" "global". I expect that it is placed in the initialized part of the data segment. See https://en.wikipedia.org/wiki/File:Program_memory_layout.pdf and http://www.cs.uleth.ca/~holzmann/C/system/memorylayout.gif (for more information, see Memory layout of an executable program (process)). The location of a is decided by the compiler-linker duo.
On the other hand, being automatic (stack) variables, b and c are expected in the stack segment.
That said, the compiler/linker is free to perform any optimization as long as the observable behavior does not change (see What exactly is the "as-if" rule?). For example, if a is never referenced, it may be optimized out completely.
Is it possible to store data as integers from 0 to 255 rather than as 8-bit characters? Although both are the same thing, how can we do it, for example, with the write() function?
Is it OK to directly cast any integer to char and vice versa? Does something like
{
int a[1]={213};
write((char*)a,1);
}
and
{
int a[1];
read((char*)a,1);
cout<<a[0];
}
work to get 213 from the same location in the file? It may work on that computer but is it portable, in other words, is it suitable for cross-platform projects in that way? If I create a file format for each game level(which will store objects' coordinates in the current level's file) using this principle, will it work on other computers/systems/platforms in order to have loaded same level?
The code you show would write the first (lowest-address) byte of a[0]'s object representation - which may or may not be the byte with the value 213. The particular object representation of an int is implementation-defined.
The portable way of writing one byte with the value of 213 would be
unsigned char c = a[0];
write(&c, 1);
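The difference between the two approaches can be sketched in plain C. Both helper names below are hypothetical; the first reads the lowest-address byte of the int's storage, the second converts the value itself.

```c
#include <string.h>

/* Value of the lowest-address byte of an int's object representation.
   The result depends on the platform's byte order. */
static unsigned char first_byte_in_memory(int value)
{
    unsigned char first;
    memcpy(&first, &value, 1);
    return first;
}

/* Conversion to unsigned char is defined on the value (reduced modulo 256
   for 8-bit bytes), so the result does not depend on byte order. */
static unsigned char low_byte_by_value(int value)
{
    return (unsigned char)value;
}
```

For 213, low_byte_by_value always gives 213, whereas first_byte_in_memory gives 213 on a little-endian machine and typically 0 on a big-endian one.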
You have the right idea, but it could use a bit of refinement.
{
int intToWrite = 213;
unsigned char byteToWrite = 0;
if ( intToWrite > 255 || intToWrite < 0 )
{
doError();
return;
}
// since your range is 0-255, you really want the low order byte of the int.
// Just reading the 1st byte may or may not work for your architecture. I
// prefer to let the compiler handle the conversion via casting.
byteToWrite = (unsigned char) intToWrite;
write( &byteToWrite, sizeof(byteToWrite) );
// you can hard code the size, but I try to be in the habit of using sizeof
// since it is better when dealing with multibyte types
}
{
int a = 0;
unsigned char toRead = 0;
// just like the write, the byte ordering of the int will depend on your
// architecture. You could write code to explicitly handle this, but it's
// easier to let the compiler figure it out via implicit conversions
read( &toRead, sizeof(toRead) );
a = toRead;
cout<<a;
}
If you need to minimize space or otherwise can't afford the extra char sitting around, then it's definitely possible to read/write a particular location in your integer. However, doing so may require pulling in extra headers (e.g. to use htons/ntohs) or can get annoying (platform-specific #defines).
It will work, with some caveats:
Use reinterpret_cast<char*>(x) instead of (char*)x to be explicit that you’re performing a cast that’s ordinarily unsafe.
sizeof(int) varies between platforms, so you may wish to use a fixed-size integer type from <cstdint> such as int32_t.
Endianness can also differ between platforms, so you should be aware of the platform byte order and swap byte orders to a consistent format when writing the file. You can detect endianness at runtime and swap bytes manually, or use htonl and ntohl to convert between host and network (big-endian) byte order.
Also, as a practical matter, I recommend you prefer text-based formats—they’re less compact, but far easier to debug when things go wrong, since you can examine them in any text editor. If you determine that loading and parsing these files is too slow, then consider moving to a binary format.
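If a binary format is chosen after all, one way to sidestep both the sizeof(int) and the endianness caveats is to write each value byte by byte in a fixed order. The sketch below uses shifts instead of htonl, so it needs no networking header; the names put_u32_be and get_u32_be are hypothetical, and big-endian is simply the order picked for this example.

```c
#include <stdint.h>

/* Stores v into out[0..3], most significant byte first, on any host. */
static void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Rebuilds the value from four bytes written by put_u32_be. */
static uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

The four bytes can then be passed to the stream's write and read calls as an unsigned char buffer, and a level file produced on one platform decodes to the same coordinates on another.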
This is my first time around, and I really hope you guys can help me, as I have run out of ideas by now.
I have searched for an answer for a couple of hours now, and could not find an answer that would actually work.
I would like to directly inject code into a running process. Yes, you have read it right. I am trying to inject code into another application, and - believe it or not - this is only to extend the functionality of it.
I am using Visual Studio 2012 Express Edition on Windows.
I have the following code:
__declspec(naked) void Foo()
{
__asm
{
// Inline assembly code here
}
}
__declspec(naked) void FooEnd() {}
int main()
{
cout << HEX(Foo) << endl;
cout << HEX(FooEnd) << endl;
cout << (int)FooEnd - (int)Foo << endl;
// Inject code here using WriteProcessMemory
return 0;
}
Most of the code has been removed in order to maintain readability, though I can post other portions of it on request.
Output is the following:
0x010B1000
0x010B1010
16
The resulting size is actually incorrect. The functions are compiled in the right order (made sure using /ORDER), but the compiler adds a bunch of 0xCC (int 3) bytes after each function, which extends its size, so I can't get the real (useful) number of bytes that contain actual executable code.
In another stackoverflow question, it has been said that disabling "Edit and Continue" would make these extra bytes go away, but no matter what, that didn't work for me.
I also tried using Release setup instead of Debug, changed a bunch of optimization settings, but none of these had any effect. What do you think could be the solution? I may be missing something obvious.
Anyway, is this (in your opinion) the best way to acquire a function's length (readability, reliability, ease of use)?
I hope I explained everything I had to in order for you to be able to help. If you have further questions, please feel free to leave a comment.
Thanks for your time and efforts.
As Devolus points out, the compiler is inserting these extra bytes after your code in order to align the next function on a reasonable (usually divisible by 16) starting address.
The compiler is actually trying to help you since 0xCC is the breakpoint instruction, the code will break into the debugger (if attached) should the instruction pointer accidentally point outside a function at any point during execution.
None of this should worry you for your purposes. You can consider the 0xCC padding as part of the function.
You don't need the extra padding when you're injecting the code, so it's fine to discard it. It should also be fine to copy it over; it will just result in a few extra bytes of copying. Chances are the memory you're injecting into will be a page-aligned block anyway, so you're not really gaining anything by stripping it out.
But if you really want to strip it out, a simple solution to your problem would be to just iterate backwards from the last byte before the next function, until there are no more 0xcc bytes.
i.e.:
__declspec(naked) void Foo()
{
__asm
{
_emit 0x4A
_emit 0x4B
}
}
__declspec(naked) void FooEnd() {}
int main(int argc, char** argv)
{
//start at the last byte of the memory-aligned code instead of the first byte of FooEnd
unsigned char* fooLast = (unsigned char*)FooEnd-1;
//keep going backwards until we don't have a 0xcc
while(*fooLast == 0xCC)
fooLast--;
//fooLast will now point at the last byte of your function, so you need to add 1
int length = ((int)fooLast - (int)Foo) + 1;
//should output 2 for the length of Foo
std::cout << length;
}
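The backwards scan can also be written against a plain byte range, which makes it easy to try out on a test buffer before pointing it at real function addresses. This is a sketch with a hypothetical name, length_without_int3_padding; note that it cannot tell padding from a function whose genuine last byte happens to be 0xCC, so such a function would be reported one or more bytes too short.

```c
#include <stddef.h>

/* Returns the number of bytes in [begin, end) once trailing 0xCC bytes
   have been dropped. An all-0xCC range yields 0. */
static size_t length_without_int3_padding(const unsigned char *begin,
                                          const unsigned char *end)
{
    while (end > begin && end[-1] == 0xCC)
        end--;
    return (size_t)(end - begin);
}
```

For a 16-byte range holding 0x4A, 0x4B and fourteen 0xCC bytes, the function returns 2, matching the length expected for Foo in the example above.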
The extra bytes are inserted by the compiler to create a memory alignment, so you can't discard it, since you are using the next function as a marker.
On the other hand, since you are writing the injected code in assembly anyway, you can just as well write the code, compile it, and then put the binary form in a byte array. That's how I would do this, because then you have the exact length.
I've got this DLL I made. It's injected into another process. Inside the other process,
I search its memory space with the following function:
void MyDump(const void *m, unsigned int n)
{
const char *p = reinterpret_cast<const char *>(m);
for (unsigned int i = 0; i < n; ++i) {
// Do something with p[i]...
}
}
Now my question. If the target process uses a data structure, let's say
struct S
{
unsigned char a;
unsigned char b;
unsigned char c;
};
Is it always presented the same way in the process' memory? I mean, if S.a = 2 (which always follows b = 3, c = 4), is the structure presented in a continuous row in the process' memory space, like
Offset
---------------------
0x0000 | 0x02 0x03 0x04
Or can those variables be in a different places there, like
Offset
---------------------
0x0000 | 0x00 0x02 0x00
0x03fc | 0x00 0x03 0x04
If the latter one, how to reconstruct the data-structure from various points from the memory?
Many thanks in advance,
nhaa123
If your victim is written in C or C++, and the datatypes used are truly that simple, then you'll always find them as a single block of bytes in memory.
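Because such a simple struct sits in memory as one block, looking for it amounts to a byte-pattern search over the dumped region. A naive sketch, with the hypothetical name find_bytes:

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the first occurrence of needle in haystack,
   or NULL if it does not occur (or if needle is empty). */
static const unsigned char *find_bytes(const unsigned char *haystack, size_t hay_len,
                                       const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0 || needle_len > hay_len)
        return NULL;
    for (size_t i = 0; i + needle_len <= hay_len; ++i) {
        if (memcmp(haystack + i, needle, needle_len) == 0)
            return haystack + i;
    }
    return NULL;
}
```

Searching a dump for the bytes 0x02 0x03 0x04 would locate candidate instances of S; short patterns like this one can of course also match unrelated data.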
But as soon as you have C++ types like std::string that observation no longer holds. For starters, the exact layout will differ between C++ compilers, and even different versions of the same compiler. The bytes of a std::string will likely not be in a contiguous array, but sometimes they are. If they're split in two, finding the second half probably will not help you in finding the first half.
Now throw in more complicated environments like a JIT'ting JVM running a Java app. The types you encounter in memory are very, very complex; one could write a book about decoding them.
The order of member will always be the same and the structure will occupy a contiguous memory block.
Depending on the compiler, padding might be added between members, but the layout will still be the same if the program is recompiled with the same compiler and the same settings. If padding is added and you are unaware of it, you can't detect it reliably at runtime: all the information the compiler had is lost by that point, and you are left to analyze the patterns and guess.
It depends on the alignment of the structure.
If you have something like this:
struct A
{
int16_t a;
char b;
int32_t c;
char d;
};
then by default on a 32-bit platform (I don't know whether that is true for 64-bit), the offset of c is 4 because one byte of padding is added after b, and three more bytes of padding are added after d at the end (if I remember correctly).
It will be different if the structure has a specified alignment.
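When you can compile code with the same compiler and settings as the target, the layout does not have to be guessed: offsetof and sizeof report it directly. The sketch below repeats struct A from above; struct layout_A and describe_A are hypothetical helpers added only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct A {
    int16_t a;
    char b;
    int32_t c;
    char d;
};

/* Member offsets and total size as chosen by the compiler in use. */
struct layout_A {
    size_t off_a, off_b, off_c, off_d, size;
};

static struct layout_A describe_A(void)
{
    struct layout_A l;
    l.off_a = offsetof(struct A, a);
    l.off_b = offsetof(struct A, b);
    l.off_c = offsetof(struct A, c);   /* includes any padding after b */
    l.off_d = offsetof(struct A, d);
    l.size  = sizeof(struct A);        /* includes any trailing padding */
    return l;
}
```

On common ABIs this is expected to report offsets 0, 2, 4 and 8 with a total size of 12, but the exact numbers are up to the compiler and its packing settings, which is why they are queried rather than hard-coded.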
Now my question. If the target process uses a data structure [...] is it always presented the same way in the process' memory? I mean, if S.a = 2 (which always follows b = 3, c = 4), is the structure presented in a continuous row in the process' memory space?
Yes, however it will often be padded to align members in ways you may not expect. Thus, to interface with it via code injection, you need to recreate the data structure in your own code, padding included.
I would highly recommend using ReClassEx or ReClass.NET, two open-source programs created specifically for reconstructing data structures from memory and generating usable C++ code!