Delay on call to MATLAB function - c++

I call a MATLAB function (DLL) from my C++ code. The function takes an array as a parameter.
It performs some calculation on each element of the array.
I ran two tests.
In the first test I called the function once with an array of 24 elements.
In the second test I called the function three times, each time with 8 elements.
The second test took twice as long.
Why?
Does entering and exiting a MATLAB function take a lot of time?
If so, why?

What you've noticed is that it costs a fair amount of time to call into a MEX function. Consider the minimum that Matlab has to do:
Scan the Matlab path to make sure that the function maps to the MEX file (and that the MEX file hasn't changed)
Load the MEX function from its DLL or shared library, and then resolve its mexFunction symbol.
Allocate arrays of input and output parameters, and initialize them
Call your function
Look for and free any temporary variables that your MEX function loaded
Free the arrays of input and output parameters
In theory, Matlab can use caching to avoid the first two steps. I'm not sure if it does, though. None of the subsequent steps can be skipped, or even really optimized by the Matlab interpreter (or its JIT compiler). Basically, if your calculation is fast, then you'll spend a lot more time calling the MEX function than actually running it.
You've already hit on the way to maximize MEX performance, which is to have the MEX function do as much as possible with each call.
In addition to having it work on as much data as you can on each call, you should also push any simple outer loops into the MEX function. Simple loops are easy to implement in MEX functions. They're also faster than loops in Matlab (even JIT-compiled Matlab), and avoid the cost of repeatedly calling the MEX function.
You can also see if judicious use of the mexLock function will help. You should provide some way to unlock the MEX function with mexUnlock, or you may start leaking memory, and will also have to restart your Matlab session every time you change the MEX function.
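A minimal sketch of the batching advice, seen from the C++ caller's side. processBatch is a hypothetical stand-in for the exported MATLAB function, and the counter models the fixed cost of each call; neither comes from the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the exported library function. Every call pays
// the same fixed entry/exit cost, modelled here by a simple counter.
static int g_boundaryCrossings = 0;

void processBatch(const double* in, double* out, std::size_t n) {
    ++g_boundaryCrossings;              // fixed cost, paid once per call
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * in[i];         // placeholder per-element work
    }
}

// One call over the whole array: the fixed cost is paid once.
std::vector<double> runBatched(const std::vector<double>& data) {
    std::vector<double> out(data.size());
    processBatch(data.data(), out.data(), data.size());
    return out;
}

// Several calls over slices of the array: same result, but the fixed cost
// is paid once per slice. `chunk` must be greater than zero.
std::vector<double> runChunked(const std::vector<double>& data, std::size_t chunk) {
    assert(chunk > 0);
    std::vector<double> out(data.size());
    for (std::size_t start = 0; start < data.size(); start += chunk) {
        const std::size_t remaining = data.size() - start;
        const std::size_t n = remaining < chunk ? remaining : chunk;
        processBatch(data.data() + start, out.data() + start, n);
    }
    return out;
}
```

With 24 elements, runBatched crosses the boundary once while runChunked with a chunk of 8 crosses it three times, which mirrors the two tests in the question.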

Related

How to print stack backtrace of functions defined in shared library?

I'm optimizing the performance of some code, and I find that a call to func() in a third-party shared library is very slow every few minutes. For example, if I call it 10 times in a loop, the first call is very slow compared to the other calls. If I repeat the same operation a few minutes later, I get the same result. I suspect that the root cause is cache misses (data and instructions). So if I can load the instructions into the cache before calling this function, maybe the first call will be faster than before.
I plan to use memcpy() to read the memory at the address of func() before calling it. The problem is that func() calls other functions that are also defined in the shared library, and I can't get the addresses of those functions.
According to this question: How to make backtrace()/backtrace_symbols() print the function names?, I know backtrace()/backtrace_symbols() can be used to print a stack backtrace, but is it possible to print the stack backtrace of a function call in a third-party shared library?
BTW: I would appreciate any advice and suggestions about preloading instructions into the cache.

MEX code in a MATLAB Wrapper

I have the following code:
for i=1:N,
some_mex_file();
end
My MEX file does the following:
Declares an object, of a class I defined, that has 2 large memory blocks, i.e., 32x2048x2 of type double.
Processes the data in this object.
Destroys the object.
I am wondering if it takes more time when I call a MEX file in a loop and the MEX file allocates large memory blocks for its object on every call. I was thinking of migrating to C++ so that I can declare the object only once and just reset its memory so that it can be reused without a new declaration. Is this going to make a difference, or is it going to be a wasted effort? In other words, does it take more time to allocate memory in the MEX file on each call than to declare it once and reuse it?
So, the usual advice here applies: Profile your code (both in Matlab and using a C/C++ profiler), or at least stop it in a debugger several times to see where it's spending its time. Stop "wondering" about where it's spending its time, and actually measure where it's spending its time.
However, I have run into problems like this, where allocating/deallocating memory in the MEX function is the major performance sink. You should verify this, however, by profiling (or stopping the code in a debugger).
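If a full profiler is not at hand, a rough first measurement can be taken with a timer around the call. This is only a sketch: meanMicrosPerCall is a hypothetical helper, `work` stands in for whatever call is under test, and wall-clock averages are no substitute for a real profile.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Runs `work` `iterations` times and returns the mean wall-clock time per
// call in microseconds. `work` is any callable taking no arguments.
template <typename F>
double meanMicrosPerCall(F work, std::size_t iterations) {
    assert(iterations > 0);
    typedef std::chrono::steady_clock clock;
    const clock::time_point start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        work();
    }
    const std::chrono::duration<double, std::micro> elapsed = clock::now() - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Timing the allocation alone and the processing alone with the same helper gives a first indication of which of the two dominates.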
The easiest solution to this kind of performance problem is twofold:
Move the loop into the MEX function. Call the MEX function with an iteration count, and let your fast C/C++ code actually perform the loop. This eliminates the cost of calling from Matlab into your MEX function (which can be substantial for large N), and facilitates the second optimization:
Have your MEX function cache its allocation/deallocation, which is much, much easier (and safer) to do if you move the loop into the MEX function. This can be done several ways, but the easiest is to just allocate the space once (outside the loop), and deallocate it once the loop is done.
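The two steps above can be sketched in plain C++ as follows. Workspace, runAll and the placeholder processing are hypothetical names for illustration; only the block size (32x2048x2 doubles) comes from the question. Whether resetting is actually cheaper than reallocating should still be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Size of each large block from the question: 32 x 2048 x 2 doubles.
const std::size_t kBlockSize = 32u * 2048u * 2u;

// Hypothetical workspace holding the two large blocks. Allocation happens
// once, in the constructor.
class Workspace {
public:
    Workspace() : a_(kBlockSize), b_(kBlockSize) { ++allocations; }

    // Clear the contents without releasing the memory.
    void reset() {
        std::fill(a_.begin(), a_.end(), 0.0);
        std::fill(b_.begin(), b_.end(), 0.0);
    }

    // Placeholder for the real per-iteration processing.
    double process(int iteration) {
        a_[0] = iteration;
        b_[0] = 2.0 * iteration;
        return a_[0] + b_[0];
    }

    static int allocations;  // counts constructions, for illustration only

private:
    std::vector<double> a_;
    std::vector<double> b_;
};

int Workspace::allocations = 0;

// The loop lives next to the workspace, so the caller crosses the
// MATLAB/C++ boundary once and the buffers are allocated once.
double runAll(int iterations) {
    Workspace ws;                    // single allocation, outside the loop
    double total = 0.0;
    for (int i = 0; i < iterations; ++i) {
        ws.reset();                  // reuse the same memory
        total += ws.process(i);
    }
    return total;
}
```

In a real MEX file, runAll would be called from mexFunction with the iteration count passed in from MATLAB.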

Converting a string into a function in c++

I have been looking for a way to dynamically load functions into C++ for some time now, and I think I have finally figured it out. Here is the plan:
Pass the function as a string into C++ (via a socket connection, a file, or something).
Write the string into file.
Have the C++ program compile the file and execute it. If there are any errors, catch them and return them.
Have the newly executed program with the new function pass the memory location of the function to the currently running program.
Save the location of the function to a function pointer variable (the function will always have the same return type and arguments, so this simplifies the declaration of the pointer).
Run the new function with the function pointer.
The issue is that after step 4, I do not want to keep the new program running since if I do this very often, many running programs will suck up threads. Is there some way to close the new program, but preserve the memory location where the new function is stored? I do not want it being overwritten or made available to other programs while it is still in use.
If you guys have any suggestions for the other steps as well, that would be appreciated as well. There might be other libraries that do things similar to this, and it is fine to recommend them, but this is the approach I want to look into — if not for the accomplishment of it, then for the knowledge of knowing how to do so.
Edit: I am aware of dynamically linked libraries. This is something I am largely looking into to gain a better understanding of how things work in C++.
I can't see how this can work. When you run the new program it'll be a separate process and so any addresses in its process space have no meaning in the original process.
And not just that, but the code you want to call doesn't even exist in the original process, so there's no way to call it in the original process.
As Nick says in his answer, you need either a DLL/shared library or you have to set up some form of interprocess communication so the original process can send data to the new process to be operated on by the function in question and then sent back to the original process.
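A small sketch of both points, assuming a POSIX system (fork and mmap) rather than the Windows APIs discussed elsewhere in this thread. It shows that a write to ordinary memory in the new process is invisible to the original process, while data placed in an explicitly shared mapping can be sent back. The names square, Result and runInChild are hypothetical.

```cpp
#include <cassert>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical function the helper process runs on the caller's behalf.
static int square(int x) { return x * x; }

struct Result {
    int privateAfter;  // parent's private variable after the child ran
    int sharedAfter;   // value the child wrote into the shared mapping
};

Result runInChild(int input) {
    int privateValue = 0;   // ordinary memory: each process has its own copy

    // Memory explicitly shared between parent and child.
    void* mem = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    assert(mem != MAP_FAILED);
    int* shared = static_cast<int*>(mem);
    *shared = 0;

    const pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        // Child: same virtual addresses, but a different address space.
        privateValue = square(input);  // invisible to the parent
        *shared = square(input);       // visible to the parent
        _exit(0);
    }

    int status = 0;
    waitpid(pid, &status, 0);          // wait until the child has finished

    Result r = { privateValue, *shared };
    munmap(mem, sizeof(int));
    return r;
}
```

Note that only data crosses the process boundary here. The function itself still runs in the child, which is why a shared library is the simpler route when the code has to run in the original process.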
How about a Dynamic Link Library?
These can be linked/unlinked/replaced at runtime.
Or, if you really want to communicate between processes, you could use a named pipe.
Edit: you can also create named shared memory.
For step 4, we can't directly pass a memory location (address) from one process to another, because the two processes use different virtual address spaces. One process can't use memory in another process.
So you need to create shared memory between the two processes and copy your function into it; then you can close the new process.
For shared memory on Windows, see Creating Named Shared Memory:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366551(v=vs.85).aspx
After that, you still need to allocate another block of memory and copy the function into it again.
The reason is that normally allocated memory only has read/write permissions; if you try to execute code in it, the CPU raises an exception.
So, on Windows, you need to use VirtualAlloc to allocate the memory with the PAGE_EXECUTE_READWRITE flag (http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887(v=vs.85).aspx)
void* address = VirtualAlloc(NULL,
                             sizeof(emitcode),
                             MEM_COMMIT | MEM_RESERVE,
                             PAGE_EXECUTE_READWRITE);
After copying the function to address, you can call it through that address, but you need to be very careful to keep the stack balanced.
Dynamic libraries are best suited to your problem. Also, forget about launching a different process; that is another problem in itself. In addition to the post above: provided that you did the VirtualAlloc correctly, just call your function from within the same "loader". Then you shouldn't have to worry, since you will be running on the same RAM-size-bound stack.
The real problems are:
1 - Compiling the function you want to load, offline from the main program.
2 - Extract the relevant code from the binary produced by the compiler.
3 - Load the string.
1 and 2 require a deep understanding of the entire compiler suite, including compiler flags, the linker, etc., not just the IDE's push buttons.
If you are OK with 1 and 2, you should know why using a std::string, or anything but a plain char *, is harmful.
I could continue the entire story, but it definitely deserves its own book. Since this is the hacker/cracker way of doing things, I strongly recommend that normal users use dynamic libraries; this is why they exist.
Usually we call this code injection ...
Basically, for the sake of security, any modern operating system forbids marking something for execution after the initial loading has been done, so we must fall back on OS-wide validated dynamic libraries.
That said, once you have valid compiled code, if you really want to achieve that effect, you must load your function into memory and then mark it as executable (clear the NX bit) in a system-specific way.
But let's be clear: your function must be position-independent code, and you get no help from the dynamic linker to resolve symbols. That's the hard part of the job.

How do you call a function in another address-space in C++

I'm aware of the threading issues this could cause and of its dangers, but I need to know how to do this for a security project I am doing at school. I need to know how to call a function with a given calling convention in a remote address space, with arguments, preferably recovering the data the remote function returns, though that's not strictly required.
If I can get specifics from the remote function's function prototype at compile time, I will be able to make this method work. I need to know how big the arguments are and if the arguments are explicitly declared as pointers or not (void*, char*, int*, etc...)
I.e if I define a function prototype like:
typedef void (__cdecl *testFunc_t)(int* pData);
I would need to get, at compile time, at least the size of the arguments and, if I could, which ones are pointers. Here we are assuming the remote function uses either the __stdcall or the __cdecl calling convention.
The IDE I am using is Microsoft Visual Studio 2007 in case the solution is specific to a particular product.
Here is my plan:
Create a thread in the remote process using CreateRemoteThread at the start of the function I want to call, though I would create it in a suspended state.
I would set up the stack such that the return address was that of a stub of code allocated inside the process that would call ExitThread(eax), as this would exit the thread with the function's return value. I would then recover this by using GetExitCodeThread.
I would also copy the arguments for the function call from my local stack to that of the newly created thread. This is where I need to know whether function arguments are pointers, and the size of the arguments.
Resume the thread and wait for it to exit, at which point I will return to the caller with the thread's exit code.
I know that this should be doable at compile time but whether the compiler has some method I can use to do it, I'm not sure. I'm also aware all this data can be easily recovered from a PDB file created after compiling the code and that the size of arguments might change if the compiler performs optimizations. I don't need to be told how dangerous this is, as I am fully aware of it, but this is not a commercial product but a small project I must do for school.
The question:
If I have a function prototype such as
typedef void (__cdecl *testFunc_t)(int* pData);
Is there any way I can get the size of this prototype's arguments at compile time (i.e., in the above example, the arguments would sum to a total size of sizeof(int*))? For example, if I have a function like:
template<typename T> unsigned long getPrototypeArgLength()
{
    // would return the size of the arguments described in the prototype T
}
// when called as
getPrototypeArgLength<testFunc_t>()
This seems like quite a school project...
For step 3 you can use ReadProcessMemory / WriteProcessMemory (one of them). For example, when it is created, the new thread could receive the addresses (in the calling process) of the start and end of the parameter block. It could then read that region of the caller's memory and copy it onto its own stack.
Did you consider using COM for this whole thing? You could probably get things done much more easily if you use a mechanism that was designed especially for that.
Alright, I figured out that I can use the Boost library to get a lot of type information at compile time. Specifically, I am using boost::function_traits; however, if you look around the Boost library, you will find that you can recover quite a bit of information. Here's a bit of code I wrote to demonstrate how to get the number of arguments of a function prototype.
(Actually, I haven't tested the code below; it's just something I'm throwing together from another function I've made and tested.)
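#include <boost/type_traits.hpp>
#include <boost/typeof/typeof.hpp>

template<typename T>
unsigned long getArgCount()
{
    // T is a function-pointer type; strip the pointer to get the function type.
    return boost::function_traits<typename boost::remove_pointer<T>::type>::arity;
}

void (*pFunc)(int, int);
unsigned long n = getArgCount<BOOST_TYPEOF(pFunc)>(); // n == 2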
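For the size question itself, here is a sketch that uses standard C++11 variadic templates instead of Boost. ArgInfo and SizeSum are hypothetical names, and the calling convention is left out so that the code is portable. On MSVC, function pointers declared __cdecl or __stdcall may need their own specializations. Also note that sizeof gives the size of each type, which is not necessarily the size of its stack slot once the calling convention has padded or promoted it.

```cpp
#include <cassert>
#include <cstddef>

// Sum of sizeof over a parameter pack (C++11-compatible recursion).
template <typename... Ts>
struct SizeSum;

template <>
struct SizeSum<> {
    static const std::size_t value = 0;
};

template <typename T, typename... Rest>
struct SizeSum<T, Rest...> {
    static const std::size_t value = sizeof(T) + SizeSum<Rest...>::value;
};

// Primary template left undefined: only function-pointer types are handled.
template <typename T>
struct ArgInfo;

template <typename R, typename... Args>
struct ArgInfo<R (*)(Args...)> {
    static const std::size_t count = sizeof...(Args);          // number of arguments
    static const std::size_t bytes = SizeSum<Args...>::value;  // sum of sizeof(arg)
};

typedef void (*testFunc_t)(int* pData);   // calling convention omitted here
typedef void (*twoInts_t)(int, int);

// Everything is evaluated at compile time.
static_assert(ArgInfo<testFunc_t>::count == 1, "one argument");
static_assert(ArgInfo<testFunc_t>::bytes == sizeof(int*), "one pointer");
static_assert(ArgInfo<twoInts_t>::count == 2, "two arguments");
```

Detecting which arguments are pointers could be done in the same style with std::is_pointer from <type_traits>, applied to each type in the pack.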

Overhead of DLL function call

How big is the performance penalty when calling functions from a DLL? Loading the DLL is not an issue for us, and the number of calls to our high-performance library will not be large.
Approximately, how many instructions/clock-cycles does one call take over a static library call?
My answer is based on how the Linux/glibc/ELF dynamic linker works, but I would assume the overall answer is the same for other platforms:
There is a difference between the first call to a dynamically loaded symbol and the subsequent calls. The first call is expensive and can take many cycles. All other calls are more or less one or two instructions away.
The way it works is that the linker sets up an entry in the Procedure Linkage Table (PLT) that fetches the address of the external function from the Global Offset Table (GOT). On the first call, the GOT entry points to a stub that runs the dynamic linker to resolve the real address of the function in the DLL. This can take a lot of cycles, but once it is done, the dynamic linker patches the GOT entry to point directly to the function, so the next time the PLT code is called it jumps directly to the function.
Here is a link to a fairly good walk through of this process: http://www.technovelty.org/linux/pltgot.html
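The lazy-binding idea can be modelled in a few lines of ordinary C++. This is a toy model, not what the dynamic linker actually executes: gotEntry stands in for the GOT slot, resolverStub for the dynamic linker's resolver, callThroughPlt for the PLT entry, and realFunction for the function in the shared library. All of these names are hypothetical.

```cpp
#include <cassert>

// Counts how often the (expensive) resolution step runs.
static int g_resolveCount = 0;

// Stand-in for the function that lives in the shared library.
static int realFunction(int x) { return x + 1; }

static int resolverStub(int x);

// The slot initially points at the resolver, not at the real function.
static int (*gotEntry)(int) = resolverStub;

static int resolverStub(int x) {
    ++g_resolveCount;          // the costly symbol lookup would happen here
    gotEntry = realFunction;   // patch the slot so later calls skip the stub
    return realFunction(x);    // finish the call that triggered resolution
}

// Plays the role of the PLT entry: an indirect call through the slot.
int callThroughPlt(int x) { return gotEntry(x); }
```

The first call to callThroughPlt goes through the resolver and pays the lookup cost; every later call costs only the extra indirection through gotEntry.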