I was trying to develop a better understanding of linkers and how they work, so I tried to call a simple function (printf) from the C library (MSVCRTD.lib), but from assembly code written in MASM.
I dissected the external symbols from the "MSVCRTD.lib" library, which contains many printf-related symbols, such as:
__imp__printf
_printf
___imp___printf_l
;and more ...
I had two challenges: linking/building, and running.
As for the first challenge, linking my assembly code to the library was not a problem at all. I could link my assembly code with a call to any external function of the library; all I needed was to mimic the decorated (mangled) name of the function so the linker could recognize it. I first tried the second one, "_printf", which looked shorter and nicer. After disassembling its code I learned that it takes 2 parameters on the stack and uses the cdecl calling convention, so I wrote the code it needs:
.386
.model flat,stdcall
.stack 4096
option casemap :none
Extern printf :PROC ; MASM will decorate it to be "_printf"
.data
message byte "Hello C library, this is MASM calling", 0 ; null terminator so printf knows where the string ends
.code
main proc
push 0
push offset message
call printf
add esp,8 ; clean the stack
retn
main endp
end
And everything went smoothly.
But when I tried the same thing with "_imp__printf", the problems started.
BTW: this is the function that the C compiler calls when you write the famous hello world! C application.
The linker successfully built the program, but when I ran it, it crashed!
I read the linker output messages and everything looked normal except for the line that says: " Discarded _printf from MSVCRTD.lib(MSVCR100D.dll)".
I debugged the program with OllyDBG and found that the call instruction that should land on the function actually lands on an area that is recognized as DATA, in the .rdata section!
Why did the "_printf" function succeed while "__imp__printf" didn't? Any ideas?
Thanks to Mr. Jester and Mr. Raymond Chen, who provided the solution to the problem in the comments.
The problem was the declaration of __imp__printf: I had declared it as a PROC, like the working _printf example, but the symbol is DATA (a pointer to the function), not code. Declaring it as
Extern _imp__printf :DWORD
makes it work like printf.
Thank you so much, both of you.
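The difference between the two symbols can be pictured in C++ terms. This is only an analogy under stated assumptions, not what the linker literally emits: `real_print`, `print_fn`, `imp_real_print` and `call_through_slot` are hypothetical names, with `imp_real_print` standing in for the pointer-sized slot `__imp__printf` that holds the address of the real function. Calling it means loading the pointer first (an indirect call); a direct call to the slot's own address would try to execute the pointer bytes as code, which is consistent with the crash in .rdata described above.

```cpp
#include <cstdio>

// Hypothetical stand-in for the real function living in the DLL.
int real_print(const char* text) {
    return std::printf("%s\n", text);
}

// Hypothetical alias for the function's type.
using print_fn = int (*)(const char*);

// Stand-in for __imp__printf: a data slot that holds the function's address.
print_fn imp_real_print = &real_print;

// Indirect call: read the pointer from the slot, then call through it.
// This mirrors "call dword ptr [slot]" rather than "call slot".
int call_through_slot(const char* text) {
    return imp_real_print(text);
}
```

Calling `call_through_slot("hi")` prints the text and returns the character count reported by `printf`.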
EDIT: this is not the solution, but I will leave it for later reference. The solution is in the next answer.
I think I'm close to figuring out why calling the _imp__printf external function failed: it looks like a problem in the jump instruction. Here is what I did.
I built the hello world! C program to see how the _imp__printf symbol looks in the symbol table when the file is compiled as C instead of assembly. I then dumped the OBJ files produced from both calling programs (the MASM one and the C one), and the results were very interesting. Here is _imp__printf in the OBJ compiled from the C file
and here is _imp__printf in the OBJ compiled from the MASM file
Interesting! After consulting the COFF documentation, it seems that the relocation type REL32 prevents the linker from processing the import address table correctly, so the jump instruction fails as it did before.
Now my question is: how can I tell MASM to assemble the file with the _imp__printf symbol using relocation type "DIR32"?
I recently came across this code, which compiles but crashes with a segmentation fault when run (g++).
Here's the original link from topcoder
topcoder profile
#include <iostream>
int main = ( std::cout << "Hello world!\n", 42 );
This also compiles
int main=0;
Can someone explain what's happening in this program? I'm using g++.
This is all silly games. Both programs violate the requirement "a program shall contain a global function called main" (3.6.1p1). Those programs may fool some compilers because they define a symbol main, but that symbol is not a function at all! No wonder at least one of them crashes when the runtime tries to use that main symbol as a function.
The shortest valid C++03 program in a hosted implementation:
int main(){}
Sorry, the code you posted is not a valid C++ program. A valid C++ program must have an entry point that is a function of name main in the global scope with one of the signatures dictated by the standard. The shortest valid program in C++ is:
int main(){}
The following code will work (on a 16-bit DOS target, since it relies on the BIOS and DOS interrupts shown below):
char main[]="\xb4\x00\xcd\x16\xcd\x20";
This assigns the machine level code of the following to the symbol main, which is a char array.
mov ah,0
int 16h ; Wait for a keyboard input i.e getch();
int 20h ; Exit to DOS
The compiler sees a symbol main and compiles the code without complaint. The startup code passes control to the symbol main (the default behavior of a C/C++ toolchain), where it finds the machine code. Hence, it executes properly.
If you're really interested in the size of the executable, the number of lines of code really isn't important, at least not to me. What matters is machine instructions and the size of the file. Here are two really great links:
A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux.
Tiny PE (Portable Executable, the win32 and x64 executable file format).
In short, the smallest possible executable does not necessarily depend on the number of lines of code but many other things besides. This is some seriously interesting engineering, in my opinion.
The smallest C++ program is:
int main(){}
Output: https://i.stack.imgur.com/X6xiK.png
I am writing code for embedded programming on an ARM 32-bit based SAM D51 microprocessor using the IDE SEGGER Studio. I'm new to embedded programming and am writing my first interrupt.
The vector table with a dummy handler is written in ARM assembly. Here's one example; the following code is auto-generated on project creation, but marked as weak so it can be overridden in C/C++:
ldr r0, =__stack_end__
mov sp, r0
bl SystemInit
b _start
.thumb_func
.weak SystemInit
SystemInit:
bx lr
Everything I read online says to simply add a C/C++ function with an identical name, and it's magically used by the linker because it's not marked as weak. Below is where I'm overriding it.
void SystemInit()
{
printf("Here");
}
However, the debugger states that it can't place a breakpoint there because there's no code, and the disassembler reveals that the entire function has been turned into a comment with no code.
I've tried other functions including many of the handler functions but they all do the exact same thing and I have no idea why.
I've even tried forward declaring a weak function and then overriding it or marking the function as volatile. Here is my latest attempt but with the same result:
extern void __attribute__((weak)) SystemInit();
void SystemInit()
{
printf("Here");
}
and another attempt
volatile void SystemInit()
{
printf("Here");
}
They all end in no code being generated for the function and it appearing as a comment in disassembly.
Identifiers in C++ source files have their names mangled. The resulting linker-level function name is different from SystemInit. To stop the C++ compiler from mangling the function name, declare the function with extern "C". That way the generated function name will match the name expected by the linker.
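A minimal sketch of that fix, assuming the file is compiled as C++ with a GCC- or Clang-based toolchain. The `system_init_ran` flag and the `system_init_was_called` helper are hypothetical and exist only so the effect of the call can be observed; the original `printf("Here")` body would work the same way.

```cpp
// Hypothetical flag, used only to make the call observable.
bool system_init_ran = false;

// extern "C" gives the function C linkage, so the emitted symbol is the
// plain name SystemInit instead of a mangled C++ name such as
// _Z10SystemInitv. The strong definition can then replace the weak
// assembly stub of the same name at link time.
extern "C" void SystemInit() {
    system_init_ran = true;
}

// Hypothetical helper: calls the function and reports whether it ran.
bool system_init_was_called() {
    SystemInit();
    return system_init_ran;
}
```

Listing the object file's symbols (for example with nm) is one way to check which name was actually emitted.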
I would like to call ARM/ARM64 ASM code from C++. The ASM code contains a syscall and a relocation to an external function.
The ARM architecture is not so important here; I just want to understand how to solve my problem conceptually.
I have the following ASM syscall stub (output from objdump -d), which is called inside a shared library:
198: d28009e8 mov x8, #0x4f // #79
19c: d4000001 svc #0x0
1a0: b140041f cmn x0, #0x1, lsl #12
1a4: da809400 cneg x0, x0, hi
1a8: 54000008 b.hi 0 <__set_errno_internal>
1ac: d65f03c0 ret
This piece of code calls the fstatat64 syscall and sets errno through the external __set_errno_internal function.
readelf -r shows the following relocation for the __set_errno_internal function:
00000000000001a8 R_AARCH64_CONDBR19 __set_errno_internal
I want to call this piece of code from C++, so I converted it to a buffer:
unsigned char machine_code[] __attribute__((section(".text"))) =
"\xe8\x09\x80\xd2"
"\x01\x00\x00\xd4"
"\x1f\x04\x40\xb1"
"\x00\x94\x80\xda"
"\x08\x00\x00\x54" // Here we have mentioned relocation
"\xc0\x03\x5f\xd6";
EDIT: Important detail - I chose to use a buffer (not inline assembly etc.) because I want to run extra processing on this buffer (for example, a decryption function on the string literal as a software protection mechanism, but that's not important here) before it gets evaluated as machine code.
Afterwards, the buffer can be cast to a function pointer and called directly to execute the machine code. Obviously there is a problem with the relocation: it's not fixed up automatically and I have to fix it manually. But I can't do that at run time because the .text section is read-only and executable.
Although I have almost full control over the source code, I must not turn off stack protection and other features to make that section writable (don't ask why). So it seems that the relocation fix should somehow be performed during the link stage. As far as I know, a shared library contains relative offsets (for similar external function calls) after relocations are resolved by the linker, and the binary *.so file should contain correct offsets (without the need for run-time relocation work), so fixing up the machine_code buffer during linking should be possible.
I'm using a manually built Clang 7 compiler and I have full control over LLVM passes, so I thought maybe it's possible to write some kind of LLVM pass that executes at link time. Though it looks like ld is called in the end, so maybe LLVM passes will not help here (I'm not an expert).
Different ideas would also be appreciated.
As you can see, the problem is pretty complicated. Maybe you have some directions/ideas on how to solve it? Thanks!
There's already a working, packaged mechanism to handle relocations. It's called dlsym(). While it doesn't directly give you a function pointer, all major C++ compilers support reinterpret_casting the result of dlsym to any ordinary function pointer. (Member functions are another issue altogether, but that's not relevant here)
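A minimal sketch of the dlsym() approach on a POSIX system. It looks up `strlen` purely as a stand-in for whichever external function is actually needed; `strlen_fn`, `lookup_strlen` and `length_via_dlsym` are hypothetical names, and on older glibc versions the program may need to be linked with -ldl.

```cpp
#include <dlfcn.h>
#include <cstddef>

// Hypothetical alias for the signature of the symbol being looked up.
using strlen_fn = std::size_t (*)(const char*);

// Ask the dynamic loader for the address of a symbol by name.
// Returns nullptr if the handle or the symbol cannot be obtained.
strlen_fn lookup_strlen() {
    // A null filename yields a handle for the main program, through which
    // the libraries loaded along with it (such as libc) are searched too.
    void* handle = dlopen(nullptr, RTLD_NOW);
    if (handle == nullptr) {
        return nullptr;
    }
    void* address = dlsym(handle, "strlen");
    // Converting the void* result to a function pointer is the cast
    // mentioned in the answer above.
    return reinterpret_cast<strlen_fn>(address);
}

// Call the function through the pointer obtained at run time.
std::size_t length_via_dlsym(const char* text) {
    strlen_fn fn = lookup_strlen();
    return fn != nullptr ? fn(text) : 0;
}
```

The loader resolves the address at run time, so no bytes in a read-only section have to be patched by hand.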
Intel CPU
Windows 10 64bit
C++
x86 assembly
I have two programs, both written by me in C++. For the sake of simplicity I will refer to them as program A and program B. They do not do anything special really, I am just using them to test things out and have some fun in the process.
The idea is that program A injects code into program B and that injected code will set the parameters of a function in program B and will call a function in program B.
I must say I have learned a lot from this experiment, as I needed to open a handle to a process with proper permissions, construct assembly code to inject, call it with CreateRemoteThread, and clean up afterwards.
I've managed to do this and call a function from program B that takes one parameter of type UINT64.
I do this by injecting the following assembly code:
b9 paramAddr
e8 funcAddr
c3
By calling this code snippet from program A with CreateRemoteThread in program B I manage to call a function at an address and with a parameter passed. And this works fine. Nothing too complex just call a function that takes one param. One thing to note here is that I have injected the parameter prior to this code and just provided a parameter address to b9.
Now what I am failing to do is call a function in program B from program A that takes two parameters.
Function Example:
myFunction(uint num1, int num2)
The procedure for injection is the same, and all of that works just fine; the Windows API provides plenty of well-documented functionality.
What I do not seem to be able to do is pass the two parameters to the function. This is where my troubles begin. I have been looking at x86 assembly function calling conventions. What they do is either just
push param2
push param1
call functAddr
retn
or
perform a mov to esi
Could anyone please clarify, explain, and provide a clear example of how to call a function in x86 assembly that takes two parameters of type uint and int?
Thank you all for your time and effort.
Since you are looking for a way to understand and clarify what is happening internally, I recommend starting by generating an assembler file for the specific machine you are working with. If you are using gcc or g++, you can use the -S flag to generate the associated assembler files. To begin with, you can implement a function with two arguments and call that function inside your main function. Using the assembler files, you should get a really good picture of how the stack is filled before your function is called and where your return value is put. In the next step, you should compare what you see in the assembler file with the x86 calling convention.
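A small test case along those lines might look like the following sketch. Only the parameter types come from the question; the function body, the `long` return type and the `call_my_function` helper are made up for illustration, and `__attribute__((noinline))` is a GCC/Clang extension used here so that the call stays visible in the output. Compiling it with g++ -S (adding -m32 for 32-bit x86 output, if the toolchain supports it) should show how the two arguments are set up before the call instruction.

```cpp
// Hypothetical body; only the parameter types are taken from the question.
// noinline keeps the compiler from folding the call away.
__attribute__((noinline)) long myFunction(unsigned int num1, int num2) {
    return static_cast<long>(num1) + num2;
}

// The call site to look for in the generated assembler file:
// the argument setup appears immediately before the call.
long call_my_function() {
    return myFunction(7u, -2);
}
```

Where the arguments end up (pushed on the stack or passed in registers) depends on the target and calling convention, which is exactly what the generated file makes visible.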
I have disassembled ARM code. I want to know the line numbers in the original source file that correspond to these instructions.
Also, I would like to understand a few things.
A function, for example android::CameraHardware::createInstance, is shown in the assembly as _ZN7android18CameraHardware14createInstanceEib. I am not even completely sure whether this is the right function to compare it with.
Why are the names so strange, with things added at the front and back? I generally do the same for C code, and there the function names look straightforward in the disassembled code.
So to summarize, I have two questions.
1. Inside GDB, is there a way I could get the source line number of a particular assembly instruction? Say, for example, at 0x40d9078c, I want to know which line it corresponds to in its source file. I tried info line. No use. Any other suggestions?
2. When reading the disassembly of C++ code, how should we understand the naming conventions? Also, what other things do we need to understand as prerequisites?
Thanks.
The translation from android::CameraHardware::createInstance to _ZN7android18CameraHardware14createInstanceEib is called "name mangling", and is normal for C++. It is how you can have multiple functions with the same name, taking different parameters, and get the linker to tell you that "I couldn't find a foo(int x, double y)" when you only declared it, but didn't define it.
On Linux, you can use c++filt to translate a mangled name to its unmangled form (assuming it was compiled with the Linux-style mangling convention, which Android uses; on code produced by a Microsoft compiler it clearly wouldn't work).
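The same translation can be done from code. The sketch below assumes a GCC or Clang toolchain that provides <cxxabi.h>; `demangle` is a hypothetical helper name. Note that the example symbol uses the length prefix 14 for CameraHardware, because that identifier has 14 characters; the symbol as quoted in the question carries an 18 there, so it may have been altered when it was posted.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Demangle an Itanium-ABI symbol name.
// Returns the input unchanged if it is not a valid mangled name.
std::string demangle(const char* mangled) {
    int status = 0;
    char* result = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || result == nullptr) {
        return mangled;
    }
    std::string readable(result);
    std::free(result);  // the returned buffer is allocated with malloc
    return readable;
}
```

For example, `demangle("_ZN7android14CameraHardware14createInstanceEib")` yields `android::CameraHardware::createInstance(int, bool)`, and `demangle("_Z3fooid")` yields `foo(int, double)`: the trailing letters encode the parameter types, which is how overloads of the same name get distinct symbols.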
If you compile with debug symbols, gdb should be able to show you the source for a given piece of code. Add -g to the g++ command line when compiling.