How to print stack backtrace of functions defined in shared library? - c++

I'm optimizing the performance of some code, and I've found that a call to func() in a third-party shared library is very slow once every few minutes. For example, if I call it 10 times in a loop, the first call is very slow compared to the others. A few minutes later I repeat the same operation and get the same result. I suspect that the root cause is cache misses (data and instructions). So if I can load the instructions into the cache before calling this function, maybe the first call will be faster.
I plan to use memcpy() to touch the memory at the address of func() before calling it. But the problem is that func() calls other functions that are also defined in the shared library, and I can't get the addresses of those functions.
According to this question: How to make backtrace()/backtrace_symbols() print the function names?, I know backtrace()/backtrace_symbols() can be used to print a stack backtrace, but is it possible to print the stack backtrace of a function call inside a third-party shared library?
BTW: I would appreciate any advice or suggestions about preloading instructions into the cache.
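For reference, here is a minimal sketch of capturing a backtrace with backtrace()/backtrace_symbols(), assuming Linux with glibc (<execinfo.h>). The helper name print_backtrace is hypothetical. It only reports the stack of the thread that calls it, so to see frames inside the third-party library it would have to run from code that the library calls into (for example a callback or an interposed function). Function names appear only for symbols present in the dynamic symbol table; other frames show up as a module path plus an address.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: print the calling thread's stack to stderr.
 * Returns the number of frames captured, or -1 if symbolization failed. */
int print_backtrace(void)
{
    void *frames[64];
    int depth = backtrace(frames, 64);

    /* backtrace_symbols() allocates one block holding all the strings. */
    char **symbols = backtrace_symbols(frames, depth);
    if (symbols == NULL)
        return -1;

    for (int i = 0; i < depth; i++)
        fprintf(stderr, "%s\n", symbols[i]);

    free(symbols); /* free only the array, not the individual strings */
    return depth;
}
```

Linking the executable with -rdynamic makes its own function names visible to backtrace_symbols(); static functions are never exposed this way.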

Related

Memory leak check in C/C++ under heavy load

I am curious to know how memory leaks are detected in a C/C++ based product under heavy load (Linux platform).
I am aware that Valgrind does a great job of finding memory leaks, invalid accesses, etc.
But with Valgrind, the product has to operate at low load; you cannot expect to run the product at high load.
Under high load, the product's code execution path may be different. In that case, if there is a memory leak, how can it be caught?
Is there any such tool available?
I followed these steps to override system functions:
1. Made my own .so file that implements the system function
2. Preloaded the .so file using LD_PRELOAD
3. That's all. After that, when I execute any program, it calls my custom function instead of the system function.
But I had one issue: recursive calls.
When my custom function is called, it internally calls the system function, which again resolves to my function, and so on.
To stop this, I did not call the system function directly. Instead, I looked up the next definition of the symbol as below:
static void *(*func)(size_t);
if (!func)
    func = (void *(*)(size_t)) dlsym(RTLD_NEXT, "malloc"); /* next "malloc" after this library */
return func(size);
Thanks all for helping to resolve this.
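An interposed malloc like the one above is also a natural place for lightweight leak accounting that stays cheap under load. Below is a minimal sketch of that idea. It uses explicit wrapper functions rather than an LD_PRELOAD interposer so that it stays self-contained; the names counted_malloc, counted_free, and live_allocation_count are hypothetical and not part of the original answer.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Running totals; atomics keep the counts consistent across threads. */
static atomic_long live_allocations;
static atomic_long total_allocations;

/* Hypothetical wrapper: allocate and record the allocation. */
void *counted_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL) {
        atomic_fetch_add(&live_allocations, 1);
        atomic_fetch_add(&total_allocations, 1);
    }
    return p;
}

/* Hypothetical wrapper: release a block and record the release. */
void counted_free(void *p)
{
    if (p != NULL) {
        atomic_fetch_sub(&live_allocations, 1);
        free(p);
    }
}

/* Number of blocks allocated through the wrappers and not yet freed. */
long live_allocation_count(void)
{
    return atomic_load(&live_allocations);
}
```

A live count that keeps growing while the load is steady hints at a leak; recording the call site per allocation would be the next step, at a higher cost.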

Call function from executable

I want to call a function from an executable. The only way to reach that process is to inject a DLL into the parent process. I can inject a DLL into the parent process, but how do I call a function from the child process?
Something like
_asm
{
call/jmp address
}
doesn't work. I hope you understand what I mean.
If you are running inside the process, you need to know the offset of the function you want to call from the base of the module (the exe) which contains the function. Then, you just need to make a function pointer and call it.
// assuming the function you're calling returns void and takes 0 params
typedef void(__stdcall * voidf_t)();
// make sure func_offset is the offset of the function when the module is loaded
voidf_t func = (voidf_t) (((uint8_t *)GetModuleHandle("module_name")) + func_offset);
func(); // the function you located is called here
The approach you have will work on 32-bit systems if you know the address of the function (MSVC does not support inline assembly when targeting x64), but you'll need to make sure you implement the calling convention properly. The code above uses GetModuleHandle to resolve the currently loaded base of the module whose function you want to call.
Once you've injected your module into the running process, ASLR isn't really an issue, since you can just ask Windows for the base of the module containing the code you wish to call. If you want to find the base of the exe running the current process, you can call GetModuleHandle with a parameter of NULL. If you are confident that the function offset is not going to change, you can hard-code the offset of the function you wish to call, after you've found the offset in a disassembler or other tool. Assuming the exe containing the function isn't altered, that offset will be constant.
As mentioned in the comments, the calling convention is important in the function typedef; make sure it matches the calling convention of the function you're calling.
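The base-plus-offset arithmetic can be tried out without Windows. The sketch below is a portable simulation under stated assumptions: module_anchor stands in for the module base that GetModuleHandle would return, and the offset is computed at run time instead of being read from a disassembler. All names here (add, module_anchor, fake_module_base, add_offset, resolve) are hypothetical.

```c
#include <stdint.h>

typedef int (*add_fn)(int, int);

/* Target function; stands in for the function inside the other module. */
static int add(int a, int b) { return a + b; }

/* Stand-in for the module base address. */
static int module_anchor(void) { return 0; }

uintptr_t fake_module_base(void)
{
    return (uintptr_t)&module_anchor;
}

/* Offset of the target from the base; in practice this would be a
 * constant found with a disassembler. Unsigned wraparound keeps the
 * arithmetic valid even if the target lies below the base. */
uintptr_t add_offset(void)
{
    return (uintptr_t)&add - fake_module_base();
}

/* Rebuild a callable pointer from base + offset, as in the snippet above. */
add_fn resolve(uintptr_t base, uintptr_t offset)
{
    return (add_fn)(base + offset);
}
```

Calling resolve(fake_module_base(), add_offset())(2, 3) goes through the reconstructed pointer and reaches add. Converting between function pointers and integers is implementation-defined in C, but it behaves as expected on mainstream desktop compilers.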
Execution Fundamentals
To call a function you need an address or an interrupt number. The address is loaded into the Program Counter register and execution is transferred. Some processors allow for "Software Interrupts", in which the program executes a special instruction that invokes the software interrupt. This is the foundation for executing functions.
More Background -- Relative Addresses
There are two common forms of executables: Absolute Addressing and Relative (or Position Independent Code, PIC). In absolute addressing, the functions are at hard-coded addresses. The functions won't move. This is usually used in embedded systems.
In the relative addressing model, the addresses are relative to the value in the Program Counter register. For example, your function may be 1024 bytes away, so the compiler would emit a relative branch instruction for 1024 bytes (away).
Operating Systems and Moving Targets
Many operating systems load programs in different places for each invocation. This means your executable may start at address 1000, and the next time at address 127654. In these operating systems, there is no guarantee that an executable will be launched at the same location each time.
Executing within your program
Executing functions within your program is easy. The linker decides where all the functions will be located and determines how to execute them; whether to use absolute addressing, PIC or a mixture.
Executing Functions in another Executable
With the above knowledge, there are issues with executing functions in another program:
Location of the Function in the external executable
Determining if the executable is active
Calling protocol for the executable
Most executables do not contain any information about where their functions are, so you will need to know where it is. You will also need to know if the function is absolute addressing or PIC. You will also need to know if the function is in memory when you need it or if the OS has paged the function to the hard drive.
Knowing the function location is necessary. However, the location is of no use if the OS has not loaded the executable. Before you call a function in another executable, you will need to know if it is present in memory when the call is executed.
Lastly, you will need to know the protocol used for the external function. For example, are the values passed by register? Are they on the stack? Are they passed by pointer (address)?
A Solution: Shared Libraries
Operating systems (OS) have evolved to allow dynamic sharing of functions. These functions exist in Dynamic-Link Libraries (DLLs) or shared libraries (.so). Your program tells the OS to load the library into memory, then asks the OS for the function by name and calls it.
The caveat is that the function you desire must be in a library. If the executable doesn't use a shared library or the function you need is not in a library, then your mission is more difficult.

Delay on call to MATLAB function

I call a MATLAB function (DLL) from my C++ code. This function takes an array as a parameter.
The function does some calculation on each element of the array.
I did two tests.
In the first test, I called the function once with an array of 24 elements.
In the second test, I called the function three times with 8 elements each.
The second test took twice as long.
Why?
Does entering and exiting a MATLAB function take a lot of time?
If yes, why?
What you've noticed is that it costs a fair amount of time to call into a MEX function. Consider the minimum that Matlab has to do:
Scan the Matlab path to make sure that the function maps to the MEX file (and that the MEX file hasn't changed)
Load the MEX function from its DLL or shared library, and then resolve its mexFunction symbol.
Allocate arrays of input and output parameters, and initialize them
Call your function
Look for and free any temporary variables that your MEX function loaded
Free the arrays of input and output parameters
In theory, Matlab can use caching to avoid the first two steps. I'm not sure if it does, though. None of the subsequent steps can be skipped, or even really optimized by the Matlab interpreter (or its JIT compiler). Basically, if your calculation is fast, then you'll spend a lot more time calling the MEX function than actually running it.
You've already hit on the way to maximize MEX performance, which is to have the MEX function do as much as possible with each call.
In addition to having it work on as much data as you can on each call, you should also push any simple outer loops into the MEX function. Simple loops are easy to implement in MEX functions. They're also faster than loops in Matlab (even JIT-compiled Matlab), and avoid the cost of repeatedly calling the MEX function.
You can also see if judicious use of the mexLock function will help. You should provide some way to unlock the MEX function with mexUnlock, or you may start leaking memory, and will also have to restart your Matlab session every time you change the MEX function.
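To see how a fixed per-call cost produces the timing in the question, here is a small sketch of a linear cost model. The model and the numbers are assumptions for illustration, not measurements of Matlab: each call pays a fixed overhead c, and each element costs w. One call with 24 elements then costs c + 24w, and three calls with 8 elements cost 3c + 24w. The second is twice the first exactly when c = 24w, that is, when one call costs as much as processing 24 elements.

```c
/* Assumed linear cost model: every call pays call_overhead once,
 * and every element processed costs per_element.
 * Units are arbitrary (e.g. microseconds). */
double total_cost(int calls, int elements_per_call,
                  double call_overhead, double per_element)
{
    return calls * (call_overhead + elements_per_call * per_element);
}
```

With call_overhead = 24 and per_element = 1, total_cost(1, 24, 24.0, 1.0) is 48 and total_cost(3, 8, 24.0, 1.0) is 96, which reproduces the factor of two from the question.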

Overhead of DLL function call

How big is the performance penalty when calling functions from a DLL? Loading the DLL is not an issue for us, and the number of calls to our high-performance library will not be large.
Approximately how many instructions/clock cycles does one call take compared with a static library call?
My answer is based on how the Linux/glibc/ELF dynamic linker works, but I would assume the overall answer is the same for other platforms:
There is a difference between the first call to a dynamically loaded symbol and the subsequent calls. The first call is expensive and can take many cycles. All other calls cost only one or two extra instructions.
The way it works is that the linker sets up an entry in the Procedure Linkage Table (PLT) that fetches the address of the external function from the Global Offset Table (GOT). On the first call, the GOT entry points to a stub that runs the dynamic linker to resolve the real address of the function in the DLL. This can take a lot of cycles, but once it is done, the dynamic linker patches the GOT entry to point directly to the function, so the next time the PLT code is called it will jump directly to the function.
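The patch-on-first-call mechanism can be imitated in plain C with a function pointer. This is only a simulation of the idea, not how the dynamic linker is implemented; got_slot, resolver_stub, square_via_plt, and the other names are hypothetical.

```c
typedef int (*square_fn)(int);

/* The "real" function that would live in the shared library. */
static int real_square(int x) { return x * x; }

static int resolver_stub(int x);

/* Simulated GOT slot: starts out pointing at the resolver stub. */
static square_fn got_slot = resolver_stub;

/* Counts how often the slow resolution path was taken. */
static int resolver_runs;

/* Simulated first-call path: "resolve" the symbol, patch the slot,
 * then forward the original call to the real function. */
static int resolver_stub(int x)
{
    resolver_runs++;
    got_slot = real_square;
    return got_slot(x);
}

/* Simulated PLT entry: always an indirect call through the slot. */
int square_via_plt(int x)
{
    return got_slot(x);
}

int resolver_run_count(void)
{
    return resolver_runs;
}
```

After any number of calls to square_via_plt, resolver_run_count() stays at 1: only the first call pays for resolution, and later calls cost one indirect jump.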
Here is a link to a fairly good walk through of this process: http://www.technovelty.org/linux/pltgot.html

How to link non thread-safe library so each thread will have its own global variables from it?

I have a program that I link with many libraries. I ran my application under a profiler and found that most of the time is spent in a "waiting" state after some network requests.
Those requests are the result of my code calling sleeping_function() from an external library.
I call this function in a loop that executes many, many times, so the waiting times add up to a huge amount.
As I cannot modify sleeping_function(), I want to start a few threads to run a few iterations of my loop in parallel. The problem is that this function internally uses some global variables.
Is there a way to tell the linker on SunOS that I want to link specific libraries in a way that will place all variables from them in Thread Local Storage?
I don’t think you’ll be able to achieve this with just the linker, but you might be able to get something working with some code in C.
The problem is that a call to load a library that is already loaded will return a reference to the already loaded instance instead of loading a new copy. A quick look at the documentation for dlopen and LoadLibrary seems to confirm that there’s no way to load the same library more than once, at least not if you want the image to be prepared for execution. One way to circumvent this would be to prevent the OS from knowing that it is the same library. To do this you could make a copy of the file.
Some pseudocode; just replace calls to sleeping_function with calls to call_sleeping_function_thread_safe:
/* make_copy_of_file() and delete_file() are placeholders for your own helpers */
char *shared_lib_name;
void *sleeping_function_thread_init(void *arg);
void call_sleeping_function_thread_safe()
{
    pthread_t pthread;
    /* each thread gets its own copy of the library file */
    char *new_file_name = make_copy_of_file(shared_lib_name);
    pthread_create(&pthread, NULL, sleeping_function_thread_init, new_file_name);
}
void *sleeping_function_thread_init(void *arg)
{
    char *lib_name = arg;
    void *lib_handle;
    void (*sleeping_function)();
    /* loading the copy gives this thread a private set of the library's globals */
    lib_handle = dlopen(lib_name, RTLD_NOW | RTLD_LOCAL);
    sleeping_function = (void (*)()) dlsym(lib_handle, "sleeping_function");
    while (...)
        sleeping_function();
    dlclose(lib_handle);
    delete_file(lib_name);
    return NULL;
}
On Windows, dlopen becomes LoadLibrary and dlsym becomes GetProcAddress, etc., but the basic idea would still work.
In general, this is a bad idea. Global data isn't the only issue that may prevent a non thread-safe library from running in a multithreaded environment.
As one example, what if the library had a global variable that points to a memory-mapped file that it always maps at a single, hardcoded address? In this case, with your technique, you would have one global variable per thread, but they would all point to the same memory location, which would be trashed by multi-threaded access.