Overhead of DLL function call - c++

How big is the performance penalty when calling functions from a DLL? Loading the DLL is not an issue for us, and the number of calls to our high-performance library will not be large.
Approximately how many extra instructions or clock cycles does one call take compared with a call into a static library?

My answer is based on how the Linux/glibc/ELF dynamic linker works, but I would assume the overall answer is the same for other platforms:
There is a difference between the first call to a dynamically loaded symbol and subsequent calls. The first call is expensive and can take many cycles. Every later call costs roughly one or two extra instructions.
The way it works is that the linker sets up an entry in the Procedure Linkage Table (PLT) that fetches the address of the external function from the Global Offset Table (GOT). On the first call, the GOT entry points to a stub that runs the dynamic linker to resolve the real address of the function in the DLL. This can take a lot of cycles, but once it has been done, the dynamic linker patches the GOT entry to point directly to the function, so the next time the PLT code is called it jumps straight to the function.
Here is a link to a fairly good walk through of this process: http://www.technovelty.org/linux/pltgot.html
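The lazy-binding mechanism described above can be imitated in ordinary C++ with a function pointer. This is only a simulation under stated assumptions, not what the dynamic linker actually executes: got_slot, resolver_stub, call_via_plt, real_function and resolve_count are hypothetical names invented for illustration.

```cpp
// Stand-in for the real function that lives in the shared library.
int real_function(int x) { return x * 2; }

using fn_t = int (*)(int);

int resolver_stub(int x);        // forward declaration

// Simulated GOT slot: it initially points at the resolver stub.
fn_t got_slot = resolver_stub;

// Counts how many times the slow path ran (illustration only).
int resolve_count = 0;

// Slow path: stands in for the dynamic linker's symbol lookup.
int resolver_stub(int x) {
    ++resolve_count;
    got_slot = real_function;    // patch the slot so later calls skip this stub
    return got_slot(x);          // complete the call that triggered resolution
}

// Simulated PLT entry: one indirect call through the GOT slot.
int call_via_plt(int x) { return got_slot(x); }
```

The first call_via_plt() goes through resolver_stub and patches got_slot; every later call is a single indirect call to real_function, which is the small steady-state overhead the answer refers to.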

Related

How to print stack backtrace of functions defined in shared library?

I'm optimizing the performance of some code, and I find that a call to func() in a third-party shared library is very slow once every few minutes. For example, if I call it 10 times in a loop, the first call is very slow compared with the other calls. A few minutes later I repeat the same operation and get the same result. I suspect that the root cause is cache misses (data and instructions). So if I can load the instructions into the cache before calling this function, maybe the first call will be faster than before.
I plan to use memcpy() to read the memory at the address of func() before calling it. But the problem is that func() calls other functions which are also defined in the shared library, and I can't get the addresses of those functions.
According to this question: How to make backtrace()/backtrace_symbols() print the function names?, I know backtrace()/backtrace_symbols() can be used to print a stack backtrace, but is it possible to print the stack backtrace of a function call inside a third-party shared library?
BTW: I would appreciate any advice and suggestions about preloading instructions to cache.

Call function from executable

I want to call a function from an executable. The only way to reach that process is to inject a dll in the parent process. I can inject a dll in the parent process but how do I call a function from the child process?
Something like
_asm
{
call/jmp address
}
doesn't work. I hope you understand what I mean.
If you are running inside the process, you need to know the offset of the function you want to call from the base of the module (the exe) which contains the function. Then, you just need to make a function pointer and call it.
// assuming the function you're calling returns void and takes 0 params
typedef void(__stdcall * voidf_t)();
// make sure func_offset is the offset of the function when the module is loaded
voidf_t func = (voidf_t) (((uint8_t *)GetModuleHandle(TEXT("module_name"))) + func_offset);
func(); // the function you located is called here
The solution you have will work on 32-bit systems if you know the address of the function (MSVC does not support inline assembly for 64-bit targets), but you'll need to make sure you implement the calling convention properly. The code above uses GetModuleHandle to resolve the currently loaded base of the module whose function you want to call.
Once you've injected your module into the running process, ASLR isn't really an issue, since you can just ask Windows for the base of the module containing the code you wish to call. If you want to find the base of the exe running the current process, you can call GetModuleHandle with a parameter of NULL. If you are confident that the function offset is not going to change, you can hard-code the offset of the function you wish to call, after you've found the offset in a disassembler or other tool. Assuming the exe containing the function isn't altered, that offset will be constant.
As mentioned in the comments, the calling convention is important in the function typedef; make sure it matches the calling convention of the function you're calling.
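The base-plus-offset arithmetic can be tried without injecting anything. The sketch below is a Linux/glibc analogue, not the Windows code from this answer: it assumes dladdr() as a stand-in for GetModuleHandle to find where the module containing a function is loaded, and then calls the function through base + offset. target_function, module_base_of and call_by_offset are hypothetical names chosen for illustration.

```cpp
#include <dlfcn.h>    // dladdr, Dl_info (glibc extension; g++ enables it by default)
#include <cstdint>

// Stand-in for the function you would normally locate with a disassembler.
int target_function() { return 42; }

// Returns the load base of the module that contains `addr`, or 0 on failure.
std::uintptr_t module_base_of(std::uintptr_t addr) {
    Dl_info info{};
    if (dladdr(reinterpret_cast<void*>(addr), &info) == 0) {
        return 0;     // address does not belong to any loaded module
    }
    return reinterpret_cast<std::uintptr_t>(info.dli_fbase);
}

using int_fn_t = int (*)();

// Rebuilds the function pointer from the current base plus a fixed offset.
// The pointer type must match the real signature and calling convention.
int call_by_offset(std::uintptr_t base, std::uintptr_t offset) {
    auto fn = reinterpret_cast<int_fn_t>(base + offset);
    return fn();
}
```

The offset (function address minus module base) stays the same across runs even when the base moves, which is why a hard-coded offset survives ASLR as long as the binary itself is unchanged.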
Execution Fundamentals
To call a function you need an address or an interrupt number. The address is loaded into the Program Counter register and execution is transferred. Some processors allow for "Software Interrupts", in which the program executes a special instruction that invokes the software interrupt. This is the foundation for executing functions.
More Background -- Relative Addresses
There are two common forms of executables: Absolute Addressing and Relative (or Position-Independent Code, PIC). In absolute addressing, the functions are at hard-coded addresses. The functions won't move. This form is usually used in embedded systems.
In the relative addressing model, the addresses are relative to the value in the Program Counter register. For example, your function may be 1024 bytes away, so the compiler would emit a relative branch instruction for 1024 bytes (away).
Operating Systems and Moving Targets
Many operating systems load programs in different places for each invocation. This means your executable may start at address 1000, and the next time at address 127654. In these operating systems, there is no guarantee that an executable will be launched at the same location each time.
Executing within your program
Executing functions within your program is easy. The linker decides where all the functions will be located and determines how to execute them; whether to use absolute addressing, PIC or a mixture.
Executing Functions in another Executable
With the above knowledge, there are issues with executing functions in another program:
Location of the Function in the external executable
Determining if the executable is active
Calling protocol for the executable
Most executables do not contain any information about where their functions are, so you will need to know where it is. You will also need to know if the function is absolute addressing or PIC. You will also need to know if the function is in memory when you need it or if the OS has paged the function to the hard drive.
Knowing the function location is necessary. However, the location is of no use if the OS has not loaded the executable. Before you call a function in another executable, you will need to know if it is present in memory when the call is executed.
Lastly, you will need to know the protocol used for the external function. For example, are the values passed by register? Are they on the stack? Are they passed by pointer (address)?
A Solution: Shared Libraries
Operating systems (OS) have evolved to allow dynamic sharing of functions. These functions exist in Dynamic-Link Libraries (DLL) or shared objects (.so). Your program tells the OS to load the library into memory, then asks for the function by name and calls it.
The caveat is that the function you desire must be in a library. If the executable doesn't use a shared library or the function you need is not in a library, then your mission is more difficult.
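As a rough sketch of the load-then-look-up-by-name pattern on Linux, the helper below uses dlopen() and dlsym(). The helper name call_unary_from_library is hypothetical, and the code assumes the symbol has the signature double(double); nothing checks that assumption at run time. On Windows the corresponding calls would be LoadLibrary and GetProcAddress.

```cpp
#include <dlfcn.h>   // dlopen, dlsym, dlclose

// Loads `lib`, looks up `symbol`, and calls it with `arg`.
// Assumes the symbol really has the signature double(double).
// Returns false if the library or the symbol cannot be found.
bool call_unary_from_library(const char* lib, const char* symbol,
                             double arg, double& result) {
    void* handle = dlopen(lib, RTLD_NOW);   // resolve all symbols at load time
    if (handle == nullptr) {
        return false;
    }
    using unary_fn = double (*)(double);
    auto fn = reinterpret_cast<unary_fn>(dlsym(handle, symbol));
    const bool found = (fn != nullptr);
    if (found) {
        result = fn(arg);                   // call through the resolved address
    }
    dlclose(handle);                        // drop our reference to the library
    return found;
}
```

For instance, on a glibc system one could call call_unary_from_library("libm.so.6", "cos", 0.0, r) to invoke cos from the math library; the library file name is an assumption about the platform. Older glibc versions need the program linked with -ldl.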

Call external function(from one exe to another)

Let's say process 1 is the main process and process 2 is the target process (I can't edit it, by the way). I want to be able to call a function in process 2 from process 1. Does anyone have a nice way to do that? I was thinking of injecting a DLL with exports that call that function and using GetProcAddress externally. Is that possible? Is that the best way to do it?
Thanks for the time.
The title and body of your question ask two subtly different questions.
Having one executable call a function that's contained in another executable is quite easy, at least if the name of the function in question has been exported. You can use LoadLibrary to load an executable just like you would a DLL, then use GetProcAddress to get the address of the function you want to call, and call it normally. Keep in mind, however, that the function may not work correctly without other initialization that happens before it's called inside its own executable.
Calling a function in the context of another process (not just in another executable) is considerably more work. The basic idea is to have a function that makes the call and (for example) writes a result to some memory shared with the process making the call. You then use CreateRemoteThread to have that function execute in the context of the process containing the function you need to call.
If the target process has been written to support it there are other methods such as COM that are intended to support this type of capability much more cleanly. They're generally preferable if available.

Load a DLL More Than Once?

I'm using the LoadLibrary function to load a DLL in Windows. My question is this: If I call this method more than once for the same DLL, do I get handles to different instances of the DLL, or will they all refer to the same instance?
Additionally, how does this behaviour correlate to Linux SO files, is it the same or completely different, and what assumptions can I make in this regard? Thanks.
The MSDN documentation states:
The system maintains a per-process reference count on all loaded
modules. Calling LoadLibrary increments the reference count. Calling
the FreeLibrary or FreeLibraryAndExitThread function decrements the
reference count. The system unloads a module when its reference count
reaches zero or when the process terminates (regardless of the
reference count).
So it would appear that loading the module more than once (without matching calls to FreeLibrary) will return the same handle.
If the DLL is already loaded, LoadLibrary will simply return the address of the library in memory. However, DllMain is not called again with DLL_PROCESS_ATTACH when the second load is attempted. Handles in the sense of libraries are just memory locations, so the value you get the second time around should be the same as the first.
As far as Linux SO files go, I don't see why they would load twice either. However, someone else will have to weigh in on this to give you a proper answer.
For Linux shared objects, from the dlopen(3) manpage:
If the same library is loaded again with dlopen(), the same file
handle is returned. The dl library maintains reference counts for
library handles, so a dynamic library is not deallocated until
dlclose() has been called on it as many times as dlopen() has
succeeded on it. The _init() routine, if present, is only called once.
But a subsequent call with RTLD_NOW may force symbol resolution for a
library earlier loaded with RTLD_LAZY.
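The manpage behaviour quoted above can be checked with a few lines. This is a minimal sketch with a hypothetical helper name, same_handle_on_reload, which simply opens the same library twice and compares the handles.

```cpp
#include <dlfcn.h>   // dlopen, dlclose

// Returns true if two successive dlopen() calls on `lib` yield the same handle.
// Returns false if the library cannot be loaded at all.
bool same_handle_on_reload(const char* lib) {
    void* first = dlopen(lib, RTLD_LAZY);
    if (first == nullptr) {
        return false;
    }
    // The second call does not load a new copy; it bumps the reference count.
    void* second = dlopen(lib, RTLD_LAZY);
    const bool same = (second != nullptr) && (first == second);
    // One dlclose() per successful dlopen() keeps the reference count balanced.
    if (second != nullptr) {
        dlclose(second);
    }
    dlclose(first);
    return same;
}
```

On a glibc system, same_handle_on_reload("libm.so.6") is expected to return true; the library name is an assumption about the platform.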

How to link non thread-safe library so each thread will have its own global variables from it?

I have a program that I link with many libraries. I ran my application under a profiler and found that most of the time is spent in a "waiting" state after some network requests.
Those requests are the result of my code calling sleeping_function() from an external library.
I call this function in a loop which executes many, many times so all waiting times sum up to huge amounts.
As I cannot modify the sleeping_function() I want to start a few threads to run a few iterations of my loop in parallel. The problem is that this function internally uses some global variables.
Is there a way to tell linker on SunOS that I want to link specific libraries in a way that will place all variables from them in Thread Local Storage?
I don’t think you’ll be able to achieve this with just the linker, but you might be able to get something working with some code in C.
The problem is that a call to load a library that is already loaded will return a reference to the already loaded instance instead of loading a new copy. A quick look at the documentation for dlopen and LoadLibrary seems to confirm that there’s no way to load the same library more than once, at least not if you want the image to be prepared for execution. One way to circumvent this would be to prevent the OS from knowing that it is the same library. To do this you could make a copy of the file.
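Some pseudocode follows; just replace calls to sleeping_function with calls to call_sleeping_function_thread_safe. make_copy_of_file and delete_file are placeholder helpers that are not shown.
#include <dlfcn.h>
#include <pthread.h>

char *shared_lib_name;

/* Placeholder helpers: copy the library to a unique path, remove a file. */
char *make_copy_of_file(const char *path);
void delete_file(const char *path);

/* pthread start routines must take and return void *. */
void *sleeping_function_thread_init(void *arg);

void call_sleeping_function_thread_safe(void)
{
    pthread_t pthread;
    char *new_file_name = make_copy_of_file(shared_lib_name);
    pthread_create(&pthread, NULL, sleeping_function_thread_init, new_file_name);
}

void *sleeping_function_thread_init(void *arg)
{
    char *lib_name = arg;
    void *lib_handle;
    void (*sleeping_function)(void);

    /* dlopen requires RTLD_LAZY or RTLD_NOW; RTLD_LOCAL alone is not a valid mode. */
    lib_handle = dlopen(lib_name, RTLD_NOW | RTLD_LOCAL);
    sleeping_function = (void (*)(void))dlsym(lib_handle, "sleeping_function");
    while (...)   /* loop condition left open, as in the original pseudocode */
        sleeping_function();
    dlclose(lib_handle);
    delete_file(lib_name);
    return NULL;
}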
On Windows, dlopen becomes LoadLibrary, dlsym becomes GetProcAddress, and so on, but the basic idea would still work.
In general, this is a bad idea. Global data isn't the only issue that may prevent a non thread-safe library from running in a multithreaded environment.
As one example, what if the library had a global variable that points to a memory-mapped file that it always maps at a single, hardcoded address? In this case, with your technique, you would have one global variable per thread, but they would all point to the same memory location, which would be trashed by multi-threaded access.