Calling a function in C file from multiple threads - c++

I am working with a code base that has some .C files and .CPP files.
Multiple threads in the system call functions in the C files, such as the one below.
void CalculateCrc(PWORD pwCRC, PBYTE pbyBuffer, DWORD dwBufferLen)
{
    WORD wFCS = 0xffff;
    ASSERT(pbyBuffer);
    ASSERT(pwCRC);
    while (dwBufferLen--)
    {
        wFCS = (WORD)(wFCS >> 8) ^ tbl_FCS[(wFCS ^ *pbyBuffer++) & 0xff];
    }
    wFCS ^= 0xffff; // complement
    *pwCRC = wFCS;
}
For each calling thread, will there be separate copies of the arguments [pwCRC, pbyBuffer, dwBufferLen] and of the non-static data members of the function [WORD wFCS], or will there be only a single set of data shared by all threads, which would result in data corruption and make calls from multiple threads unsafe?
I am not a native English speaker. Forgive me if the question is not asked in a clear manner.
Thanks for your time.

Each thread has its own stack. A new thread does not get a copy of the creating thread's stack; it starts with a fresh stack of its own. Threads do share the address space, the heap and global/static data, though.
So global, static and heap data can be shared, while local (automatic) variables live in the stack frame of the call that created them and are private to that call. Since all the arguments are passed by value, and wFCS is a non-static local variable, every call gets its own copies.
Your function per se is safe. However, since you work with pointers, you need to take care that two threads do not work on the same memory area. The variables are safe; the memory they point to is not.
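To make this concrete, here is a minimal C++11 sketch that calls the same CRC loop from several threads at once. It assumes tbl_FCS is the PPP FCS-16 table from RFC 1662 (the question does not show the table), swaps PWORD/PBYTE/DWORD for standard fixed-width types, and uses a hypothetical helper, crc_from_threads. Each call gets its own wFCS and its own copies of the arguments, and the only memory the threads share (the table and the input) is read-only, so every thread should produce the same result as a serial call.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// Assumption: tbl_FCS is the PPP FCS-16 table (RFC 1662, reflected polynomial 0x8408).
// The question does not show the table, so an equivalent one is generated here.
static std::array<std::uint16_t, 256> make_fcs_table() {
    std::array<std::uint16_t, 256> table{};
    for (unsigned i = 0; i < 256; ++i) {
        unsigned v = i;
        for (int bit = 0; bit < 8; ++bit)
            v = (v & 1u) ? (v >> 1) ^ 0x8408u : v >> 1;
        table[i] = static_cast<std::uint16_t>(v);
    }
    return table;
}

// Built once at start-up and only read afterwards, so sharing it between threads is safe.
static const std::array<std::uint16_t, 256> tbl_FCS = make_fcs_table();

// Same logic as CalculateCrc, with standard types instead of PWORD/PBYTE/DWORD.
void CalculateCrc(std::uint16_t* pwCRC, const std::uint8_t* pbyBuffer, std::size_t dwBufferLen) {
    assert(pbyBuffer);
    assert(pwCRC);
    std::uint16_t wFCS = 0xffff;   // local: every call gets its own copy on its own thread's stack
    while (dwBufferLen--)          // dwBufferLen and pbyBuffer are this call's own copies too
        wFCS = static_cast<std::uint16_t>((wFCS >> 8) ^ tbl_FCS[(wFCS ^ *pbyBuffer++) & 0xff]);
    *pwCRC = static_cast<std::uint16_t>(wFCS ^ 0xffff);   // complement
}

// Hypothetical helper: runs CalculateCrc over `data` from `n` threads at once.
// Each thread writes only its own slot in `results`; the table and the input are read-only.
std::vector<std::uint16_t> crc_from_threads(const std::string& data, int n) {
    std::vector<std::uint16_t> results(static_cast<std::size_t>(n));
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&results, &data, i] {
            CalculateCrc(&results[static_cast<std::size_t>(i)],
                         reinterpret_cast<const std::uint8_t*>(data.data()), data.size());
        });
    }
    for (std::thread& t : threads) t.join();
    return results;
}
```

With these assumptions, crc_from_threads("123456789", 8) should return eight copies of 0x906E, the standard check value for this CRC variant (CRC-16/X-25).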

The function gets its own copies of pwCRC, pbyBuffer and dwBufferLen, but NOT of the data that pbyBuffer (or pwCRC) points to, because that data is passed by pointer.
I see two solutions:
A. Ensure that all other threads have only read access (or no access) to the buffer while this function is running. You could do this by making a copy.
B. If the data is rather small, pass the buffer by value. You can do this by wrapping it in a structure:
struct buffer
{
    char data[LEN];   // LEN must be a compile-time constant
};
This only works if the buffer is small. Neither the C nor the C++ standard specifies a stack size; the limit comes from the platform and from linker or thread-creation settings (Windows, for example, reserves 1 MB per thread by default). Even so, it is not a good idea to exhaust the stack with large arguments.
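A small sketch of option B, assuming C++11 and a hypothetical LEN of 64 (the answer leaves it unspecified): because the array is a member of the struct, passing the struct by value copies the whole array. std::thread also copies its arguments before its constructor returns, so the worker below never sees the caller overwriting its own buffer afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

constexpr std::size_t LEN = 64;   // hypothetical size; the answer leaves LEN unspecified

struct Buffer {
    char data[LEN];
};

// Takes the struct by value, so the function works on its own copy of the array.
static int checksum(Buffer b) {
    int sum = 0;
    for (char c : b.data) sum += static_cast<unsigned char>(c);
    return sum;
}

// Starts a worker with a copy of `buf`, then overwrites the caller's buffer.
// The worker still sums the original contents (10 + 20 = 30).
int checksum_on_thread_after_clobber() {
    Buffer buf{};                     // zero-initialized
    buf.data[0] = 10;
    buf.data[1] = 20;
    int result = 0;
    std::thread worker([&result](Buffer b) { result = checksum(b); }, buf);
    for (char& c : buf.data) c = 1;   // touches only the caller's copy
    worker.join();
    return result;
}
```

Usage: assert(checksum_on_thread_after_clobber() == 30);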

will there be copies of arguments[pwCRC, pbyBuffer, dwBufferLen]
In C, arguments are passed by value, so each call, from whichever thread, gets its own copies. However, if the variables being passed are global or otherwise shared by the threads, all those threads will pass the same values.
In your case, pwCRC and pbyBuffer are pointers. If the memory they point to is shared between threads, your function is not thread-safe, because multiple threads may try to change the values these pointers point to.
non-static data members of function [WORD wFCS]
Yes, there will be a copy for each function call.

Related

How to make SAFEARRAYs threadsafe if locking is unsafe?

I'm writing a C++ DLL that takes SAFEARRAYs passed in from Excel VBA. The DLL has multiple threads sharing the SAFEARRAYs (both reading from and writing to them). While trying to figure out how to do the sharing safely, I came across this bit of MSDN documentation:
For example, consider an application that uses the SafeArrayLock and SafeArrayUnlock functions. If these functions are called concurrently from different threads on the same SAFEARRAY data type instance, an inconsistent lock count may be created. This will eventually cause the SafeArrayUnlock function to return E_UNEXPECTED. You can prevent this by providing your own synchronization code.
It's confusing me because I thought the whole point of locks was to ensure thread safety and clearly this locking functionality is not intended for that. But why would you need locking in a single-threaded application?
The documentation for SafeArrayLock also says that the function "places a pointer to the array data in pvData of the array descriptor" but by my tests, the pvData pointer is valid even when SafeArrayLock has never been called (and the lock count is 0). For example, this function:
void __declspec(dllexport) __stdcall testfun(VARIANT& vararr) {
if (vararr.parray->cLocks != 0) throw -1;
else {
double* data = (double*) vararr.parray->pvData;
data[5] = 4.1;
}
}
effectively writes to the array stored in vararr and the change is visible in the VBA that calls it. What's up with that?
Given the seeming persistence of pvData and the unsafe locking mechanism, my instinct is to just scrap all the array manipulation functions and let my threads reach into pvData as they please (the writes never collide, so what could go wrong?) but others on here caution against manual array manipulation for unclear reasons. What's the right approach? Thanks in advance.

Do cpp object methods have their own stack frame?

I have a hypothesis here, but it's a little tough to verify.
Is there a unique stack frame for each calling thread when two threads invoke the same method of the same object instance? In a compiled binary, I understand a class to be a static code section filled with function definitions in memory and the only difference between different objects is the this pointer which is passed beneath the hood.
But then the calling thread must have its own stack frame; otherwise two threads calling the same member function of the same object instance would corrupt each other's local variables.
Just to reiterate: I'm not asking whether two threads can corrupt the object's data by both modifying it at the same time; I'm well aware of that. I'm asking whether, when two threads enter the same method of the same instance at the same time, the local variables of that method occupy the same places in memory. Again, my assumption is that they do not.
You are correct. Each thread makes use of its own stack and each stack makes local variables distinct between threads.
This is not specific to C++, though. It is simply how threads work on modern processors and operating systems: each thread has its own stack pointer, which the OS saves and restores on every context switch. (Some older processors had only one stack, like the 6502, which had only 256 bytes of stack and no real capability to run threads.)
Objects on one thread's stack can still be shared with other threads, so you can end up modifying an object that lives on another thread's stack. But that only happens if you share a pointer or reference to that specific object.
You are right that different threads have separate stacks. That is not a feature of C++, but something provided by the OS. Class objects won't necessarily be different; this depends on how they are allocated. Different threads could share heap objects, which might lead to concurrency problems.
Local variables of any function or class method are stored in the stack frame of that call, which lives on the calling thread's stack. So it doesn't matter from which thread you call a method: each call uses its own stack frame during execution.
A slightly different way to put it: each method call creates its own stack frame.
NOTE: static variables, however, are shared by all calls on all threads.
Of course, there are techniques to access another call's stack memory during execution, but they are essentially hacks.
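A small C++11 sketch of the point made in these answers, using a hypothetical Worker class: two threads call the same member function on the same object, and each call records the address of its local variable. The calls wait for each other so that both stack frames are alive at the same time, which means the two addresses have to differ. The static counter, by contrast, exists once and is shared by both calls.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical class, used only to illustrate per-call stack frames.
class Worker {
public:
    // Called concurrently on the SAME object from two threads.
    void run(std::atomic<int>& arrived, std::uintptr_t& local_addr) {
        int local = 0;                                          // lives in the calling thread's stack frame
        local_addr = reinterpret_cast<std::uintptr_t>(&local);  // record where this call's local lives
        ++calls;                                                // static: one counter for every call
        arrived.fetch_add(1);
        while (arrived.load() < 2)      // keep this frame alive until the other
            std::this_thread::yield();  // thread's call is also in progress
    }
    static std::atomic<int> calls;
};
std::atomic<int> Worker::calls{0};

// Returns true if two overlapping calls on one object saw different addresses for `local`.
bool locals_are_distinct() {
    Worker w;                  // one shared instance
    std::atomic<int> arrived{0};
    std::uintptr_t a = 0, b = 0;
    std::thread t1([&] { w.run(arrived, a); });
    std::thread t2([&] { w.run(arrived, b); });
    t1.join();
    t2.join();
    return a != 0 && b != 0 && a != b;
}
```

After one call, assert(locals_are_distinct()) holds and Worker::calls is 2.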

Thread-safety with C++ and passing by reference

I wanted to confirm my understanding of threads and passing by reference in C++. Is the following function thread safe?
QString sA = "hello";
QString sB = "world";
bool someFlag = AreStringsEqual(sA,sB);
...
bool AreStringsEqual(QString const &stringA, QString const &stringB)
{
if(stringA == stringB)
{ return true; }
return false;
}
I think it is thread safe. I'd like it if someone could confirm my thought process, or tell me I have no idea what I'm talking about :)
There are two copies of sA and sB in the process's memory. One set is created on Thread1's stack and the second set is created on Thread2's stack. Because we passed by reference, each thread only needs one set of sA and sB in memory to execute the function call.
If we had passed by value instead, there could be up to four copies of sA and sB in the process's memory (each thread having two sets) at some time point where both threads were trading processor control within the function call.
In no case is memory shared here, therefore the function is thread safe.
Sorry if this question is super simple, threads have fried my brain :)
There's no reason why two threads wouldn't hold references to the same strings.
This function is not thread-safe, because the statement if(stringA == stringB) is not atomic.
First you fetch stringA from memory, and only then stringB.
Let's say stringA and stringB both hold the value 2.
You fetch stringA, then there's a context switch and both stringA and stringB change to 3. Then you fetch stringB. Your function would return false (because 2 != 3), although stringA was equal to stringB all along.
Your question is a little vague on where sA and sB are declared. It sounds like they are declared inside a function, in which case you're correct that each thread would have its own version of sA and sB. But, in the odd chance that they are declared at global scope, this is not the case. If I understand your question correctly, you meant that the two were declared at local scope, so your first point is correct. By the same token, your second point is correct as well.
Your third point is tricky, though. In your particular case, no memory is shared, so your program is a "thread-safe" program (not sure if that's a good way to word it). However, the function AreStringsEqual is not thread-safe. At some point in the future, you (or someone else) could use the function with data that is shared, and the function itself does not guard itself against this usage.
Unless QString has specified that operator== is thread safe, the function is not thread safe. The implementation of AreStringsEqual does nothing itself to protect the data.
You are putting the responsibility of thread safety on the client with this implementation. The client must ensure the parameters and the parameters' internal data does not mutate (e.g. by another thread) while in AreStringsEqual. Consequently, they may find themselves making unnecessary copies. How exactly this must happen is dictated by the implementation of QString. Even std::string implementations vary dramatically =)
For strings in concurrent contexts, one would generally take a copy before moving the string into a concurrent context. If it really needs to be shared, you'll need something to protect it (such as a lock). For primitive collections (e.g. std::string and std::vector), you'll want to avoid locking at every access because it would kill performance and could fail rather easily. So, you'd generally copy or lock if you must share objects which are not explicitly thread safe.
Therefore, the implementation of AreStringsEqual is not thread safe (again, unless bool QString::operator==(const QString&) const is guaranteed to be thread safe).
However, your usage of AreStringsEqual:
QString sA = "hello";
QString sB = "world";
bool someFlag = AreStringsEqual(sA,sB);
would be fine for the majority of string implementations, because the parameters and their data would be local to the thread.
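A sketch of the "copy or lock" advice, with std::string standing in for QString (an assumption; QString's own guarantees are not covered here) and hypothetical names SharedPair, set_both and shared_strings_equal. Both strings are guarded by one mutex, the reader copies them under that lock, and the comparison then runs on thread-local copies with no lock held. Because the writer always updates both strings together under the same lock, the reader should never observe the mismatch described in the non-atomic-comparison answer above.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>

// Same logic as AreStringsEqual, with std::string standing in for QString.
bool AreStringsEqual(const std::string& stringA, const std::string& stringB) {
    return stringA == stringB;
}

// Hypothetical shared state: one mutex guards BOTH strings.
struct SharedPair {
    std::mutex m;
    std::string a, b;
};

void set_both(SharedPair& p, const std::string& value) {
    std::lock_guard<std::mutex> lock(p.m);
    p.a = value;
    p.b = value;
}

// Copy both strings under the lock, then compare the thread-local copies unlocked.
bool shared_strings_equal(SharedPair& p) {
    std::string a, b;
    {
        std::lock_guard<std::mutex> lock(p.m);
        a = p.a;
        b = p.b;
    }
    return AreStringsEqual(a, b);
}

// A writer keeps changing both strings together; returns true if the reader never saw them differ.
bool never_sees_mismatch(int iterations) {
    SharedPair p;
    set_both(p, "hello");
    std::thread writer([&p, iterations] {
        for (int i = 0; i < iterations; ++i) set_both(p, (i % 2) ? "hello" : "world");
    });
    bool ok = true;
    for (int i = 0; i < iterations; ++i) ok = shared_strings_equal(p) && ok;
    writer.join();
    return ok;
}
```

The copies cost some time inside the critical section; the trade-off is that the comparison itself needs no lock.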
The function is not thread safe if sA and sB are shared between threads.
It is quite possible that, during the execution of AreStringsEqual in one thread, another thread modifies sA, sB or both, which would be a race condition.
While your function does not modify the values, code outside your function can.
Passing by value helps only if the copies themselves are made safely: copying a string reads it, so the copy must be made while no other thread is writing to it (for example, under a lock). Once the function holds its own copies on its stack, no other thread can change them.
First of all, it's not clear as to why you would need two copies of the same string if they are always to have equal value.
Perhaps it's thread safe in the context you described, but looking at the function itself, it's not thread safe, since by the time the if condition is executed, the values of the strings may have changed.

Why do thread creation methods take an argument?

All thread-creation functions, such as pthread_create() or CreateThread() on Windows, expect the caller to provide a pointer to the argument for the thread. Isn't this inherently unsafe?
This can work 'safely' only if the argument is on the heap, and then again, creating a heap variable adds the overhead of cleaning up the allocated memory. If a stack variable is provided as the argument, then the result is at best unpredictable.
This looks like a half-cooked solution to me, or am I missing some subtle aspect of the APIs?
Context.
Many C APIs provide an extra void * argument so that you can pass context through third-party APIs. Typically you might pack some information into a struct and point this variable at the struct, so that when the thread initializes and begins executing, it has more information than just the function it was started with. There's no necessity to keep this information at the location given. For instance, you might have several fields that tell the newly created thread what it will be working on and where it can find the data it will need. Furthermore, there's no requirement that the void * actually be used as a pointer: it's a typeless argument of the most appropriate width on a given architecture (pointer width), so that anything can be made available to the new thread. For instance, you might pass an int directly if sizeof(int) <= sizeof(void *): (void *)3.
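A minimal sketch of the context-struct idiom with pthread_create, assuming a POSIX system; WorkContext, sum_worker and sum_on_thread are hypothetical names. The struct tells the thread where its input is and where to write its answer, and the worker casts the void * back to the struct type. The struct lives on the caller's stack, which is safe here only because the caller joins the thread before returning.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical context: everything the new thread needs, packed into one struct.
struct WorkContext {
    const int* data;   // where to find the input
    int count;         // how much input there is
    long sum;          // where to put the answer
};

// C linkage to match the function type pthread_create expects.
extern "C" void* sum_worker(void* arg) {
    auto* ctx = static_cast<WorkContext*>(arg);   // turn the untyped argument back into the struct
    long total = 0;
    for (int i = 0; i < ctx->count; ++i) total += ctx->data[i];
    ctx->sum = total;
    return nullptr;
}

// `ctx` lives on this function's stack; that is fine because the thread is joined
// before `ctx` goes out of scope. Returns -1 if the thread cannot be created.
long sum_on_thread(const int* data, int count) {
    WorkContext ctx{data, count, 0};
    pthread_t thread;
    if (pthread_create(&thread, nullptr, sum_worker, &ctx) != 0) return -1;
    pthread_join(thread, nullptr);
    return ctx.sum;
}
```

Usage: with int values[] = {1, 2, 3, 4}, sum_on_thread(values, 4) returns 10.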
As a related example of this style: A FUSE filesystem I'm currently working on starts by opening a filesystem instance, say struct MyFS. When running FUSE in multithreaded mode, threads arrive onto a series of FUSE-defined calls for handling open, read, stat, etc. Naturally these can have no advance knowledge of the actual specifics of my filesystem, so this is passed in the fuse_main function void * argument intended for this purpose. struct MyFS *blah = myfs_init(); fuse_main(..., blah);. Now when the threads arrive at the FUSE calls mentioned above, the void * received is converted back into struct MyFS * so that the call can be handled within the context of the intended MyFS instance.
Isn't this inherently unsafe?
No. It is a pointer. Since you (as the developer) have created both the function that will be executed by the thread and the argument that will be passed to the thread you are in full control. Remember this is a C API (not a C++ one) so it is as safe as you can get.
This can work 'safely' only if the arg is in the heap,
No. It is safe as long as its lifespan in the parent thread is as long as the lifetime that it can be used in the child thread. There are many ways to make sure that it lives long enough.
and then again creating a heap variable adds to the overhead of cleaning the allocated memory up.
Seriously, is that an argument? This is basically how it is done for all threads, unless you are passing something much simpler, like an integer (see below).
If a stack variable is provided as the arg then the result is at best unpredictable.
It's as predictable as you (the developer) make it. You created both the thread and the argument. It is your responsibility to make sure that the lifetime of the argument is appropriate. Nobody said it would be easy.
This looks like a half-cooked solution to me, or am I missing some subtle aspects of the APIs?
You are missing that this is the most basic threading API. It is designed to be as flexible as possible, with as few restrictions as possible, so that safer systems can be built on top of it. So we now have boost::thread, which, I guess, is built on top of these basic threading facilities but provides a much safer and easier-to-use infrastructure (at some extra cost).
If you want raw, unfettered speed and flexibility, use the C API (with some danger).
If you want something safer, use a higher-level API like boost::thread (but slightly more costly).
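For comparison, C++11 added std::thread, which follows the same typed-argument model as boost::thread. In the sketch below (repeat_into and start_with_typed_args are hypothetical names), the arguments are copied into the new thread, so the caller's local string can be destroyed before the thread runs; std::ref is needed to pass something by reference on purpose.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <thread>

// Typed parameters: no casts to or from void* are needed.
static void repeat_into(std::string text, int times, std::string& out) {
    for (int i = 0; i < times; ++i) out += text;
}

std::string start_with_typed_args() {
    std::string result;
    std::thread t;
    {
        std::string text = "ab";   // destroyed before the thread is joined
        // `text` and 3 are copied into the new thread's storage before this statement
        // completes; std::ref opts in to passing `result` by reference.
        t = std::thread(repeat_into, text, 3, std::ref(result));
    }
    t.join();
    return result;   // "ababab"
}
```

The caller still has to keep result alive until the join, which is exactly the lifetime rule discussed above, now limited to the one argument passed by reference.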
Thread specific storage with no dynamic allocation (Example)
#include <pthread.h>
#include <cstdint>   // std::intptr_t

struct ThreadData
{
    // Stuff for my thread.
};

ThreadData threadData[5];   // Global, so it outlives every thread; no heap allocation needed.

extern "C" void* threadStart(void* data);

void* threadStart(void* data)
{
    // The argument is not really a pointer: it carries this thread's index.
    std::intptr_t id = reinterpret_cast<std::intptr_t>(data);
    ThreadData& tData = threadData[id];
    (void)tData;   // Do stuff with tData.
    return NULL;
}

int main()
{
    pthread_t threadInfo[5];   // Keep every handle so the threads can be joined.
    for (std::intptr_t loop = 0; loop < 5; ++loop)
    {
        pthread_create(&threadInfo[loop], NULL, threadStart, reinterpret_cast<void*>(loop));
    }
    // Wait for the threads to finish before exiting.
    for (int loop = 0; loop < 5; ++loop)
    {
        pthread_join(threadInfo[loop], NULL);
    }
}
Allocation on the heap does not add a lot of overhead.
Besides the heap and the stack, global variable space is another option. Also, it's possible to use a stack frame that will last as long as the child thread. Consider, for example, local variables of main.
I favor putting the arguments to the thread in the same structure as the pthread_t object itself. So wherever you put the pthread record, put its arguments as well. Problem solved :v) .
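A sketch of that suggestion, assuming POSIX threads; WorkerRecord, double_it and run_records are hypothetical names, and error handling for pthread_create is omitted for brevity. Each record holds the thread handle next to the thread's input and output, so the arguments live exactly as long as the handle. The vector is sized before any thread starts, because a reallocation would move the records out from under the running threads.

```cpp
#include <cassert>
#include <pthread.h>
#include <vector>

// Hypothetical record: the thread handle and that thread's arguments kept together.
struct WorkerRecord {
    pthread_t handle;
    int input;    // argument for the thread
    int output;   // result written by the thread
};

extern "C" void* double_it(void* arg) {
    auto* rec = static_cast<WorkerRecord*>(arg);
    rec->output = rec->input * 2;   // touches only its own record
    return nullptr;
}

std::vector<int> run_records(int n) {
    std::vector<WorkerRecord> records(static_cast<std::size_t>(n));   // fixed size: no reallocation later
    for (WorkerRecord& rec : records) {
        rec.input = static_cast<int>(&rec - records.data());
        pthread_create(&rec.handle, nullptr, double_it, &rec);
    }
    std::vector<int> out;
    for (WorkerRecord& rec : records) {
        pthread_join(rec.handle, nullptr);
        out.push_back(rec.output);
    }
    return out;
}
```

Usage: run_records(4) returns {0, 2, 4, 6}.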
This is a common idiom in all C programs that use function pointers, not just for creating threads.
Think about it. Suppose your function void f(void (*fn)()) simply calls into another function. There's very little you can actually do with that. Typically a function pointer has to operate on some data. Passing in that data as a parameter is a clean way to accomplish this, without, say, the use of global variables. Since the function f() doesn't know what the purpose of that data might be, it uses the ever-generic void * parameter, and relies on you the programmer to make sense of it.
If you're more comfortable with thinking in terms of object-oriented programming, you can also think of it like calling a method on a class. In this analogy, the function pointer is the method and the extra void * parameter is the equivalent of what C++ would call the this pointer: it provides you some instance variables to operate on.
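A minimal sketch of that idiom outside of threading: for_each_int is a hypothetical C-style function that takes a callback plus an untyped context pointer and passes the pointer through untouched. The caller's Accumulator plays the role of the instance data that C++ would reach through this.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C-style API: calls fn on every item and passes ctx through untouched.
void for_each_int(const int* items, std::size_t n, void (*fn)(int item, void* ctx), void* ctx) {
    for (std::size_t i = 0; i < n; ++i) fn(items[i], ctx);
}

// The caller's "instance data", playing the role of C++'s `this`.
struct Accumulator {
    long total = 0;
};

static void add_item(int item, void* ctx) {
    auto* acc = static_cast<Accumulator*>(ctx);   // cast back to the type that was passed in
    acc->total += item;
}

long sum_via_callback(const int* items, std::size_t n) {
    Accumulator acc;
    for_each_int(items, n, add_item, &acc);
    return acc.total;
}
```

Usage: with int items[] = {3, 4, 5}, sum_via_callback(items, 3) returns 12. for_each_int never needs to know what Accumulator is.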
The pointer points to the data that you intend to use in the function. Windows-style APIs require that you give them a static or global function.
Often this is a pointer to the object you intend to use (this, or pThis if you will), and the intention is that you delete pThis after the thread ends.
It's a very procedural approach, but it has a big advantage that is often overlooked: the C-style CreateThread API is binary compatible, so you can wrap it in a C++ class (or in almost any other language). If the parameter were typed, you wouldn't be able to access the API from another language as easily.
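A sketch of wrapping a C-style thread API in a C++ class by passing this through the void * argument, using pthread_create (POSIX) in place of CreateThread; Job and job_trampoline are hypothetical names. The trampoline has C linkage to match the function type pthread_create expects, and it turns the untyped pointer back into the object. Here the object outlives the thread because the caller joins it, rather than deleting pThis when the thread ends.

```cpp
#include <cassert>
#include <pthread.h>

extern "C" void* job_trampoline(void* pThis);

// Hypothetical wrapper class around the untyped C thread API.
class Job {
public:
    explicit Job(int n) : n_(n) {}
    bool start() { return pthread_create(&handle_, nullptr, job_trampoline, this) == 0; }
    void join() { pthread_join(handle_, nullptr); }
    long result() const { return result_; }
    void run() { for (int i = 1; i <= n_; ++i) result_ += i; }   // sums 1..n on the new thread
private:
    pthread_t handle_{};
    int n_;
    long result_ = 0;
};

// C-linkage entry point: recover the object from the untyped argument and call the member.
extern "C" void* job_trampoline(void* pThis) {
    static_cast<Job*>(pThis)->run();
    return nullptr;
}
```

Usage: Job j(10); j.start(); j.join(); after which j.result() is 55.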
So yes, this is unsafe but there's a good reason for it.

Is it possible to use function pointers across processes?

I'm aware that each process has its own address space; however, I was wondering:
If Process A had a function like:
int DoStuff() { return 1; }
and a function-pointer typedef like:
typedef int (*DoStuff_f)();
and a getter function like:
DoStuff_f getDoStuff() { return DoStuff; }
and a magical way to communicate with Process B via... say boost::interprocess, would it be possible to pass the function pointer to process B and call Process A's DoStuff from Process B directly?
No. A function pointer is just an address in your process's address space. It has no intrinsic marker that ties it to a particular process. So, even if your function pointer just happened to still be valid once you've moved it over to B, it would call that function on behalf of process B.
For example, if you had
////PROCESS A////
int processA_myfun() { return 3; }
// get a pointer to processA_myfun and pass it to process B
////PROCESS B////
int processB_myfun() { return 4; } // This happens to be at the same virtual address as processA_myfun
// get address from process A
int x = call_myfun(); // call via the pointer
x == 4; // x is 4, because we called process B's version!
If process A and B are running the same code, you might end up with identical functions at identical addresses - but you'll still be working with B's data structures and global memory! So the short answer is, no, this is not how you want to do this!
Also, security measures such as address space layout randomization could prevent this sort of "trick" from ever working.
You're confusing IPC and RPC. IPC is for communicating data, such as your objects or a blob of text. RPC is for causing code to be executed in a remote process.
In short, you cannot use a function pointer that has been passed to another process.
Function code is located in protected memory pages that you cannot write to, and each process has an isolated virtual address space, so the address of a function is not valid in another process. On Windows you could use the technique described in this article to inject your code into another process, but the latest versions of Windows reject it.
Instead of passing a function pointer, you should consider creating a library that is used by both processes. You could then send a message to the other process whenever you need it to call that function.
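A minimal sketch of that message-passing alternative on a POSIX system, using fork() and two pipes; the Command IDs, dispatch and call_in_child are hypothetical names. Only a small integer ID crosses the process boundary, and the receiving process looks up the function in its own address space, so no pointer ever has to be valid in both processes. The function still runs in the receiving process, with that process's data.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical protocol: both sides agree on these IDs; only the ID crosses processes.
enum class Command : std::int32_t { DoStuff = 1, DoOtherStuff = 2 };

static int DoStuff() { return 1; }
static int DoOtherStuff() { return 2; }

// Maps an ID to a function in THIS process's address space.
static int dispatch(Command c) {
    switch (c) {
        case Command::DoStuff:      return DoStuff();
        case Command::DoOtherStuff: return DoOtherStuff();
    }
    return -1;   // unknown command
}

// Sends `c` to a forked child, which runs the matching function and replies. -1 on error.
int call_in_child(Command c) {
    int request[2], reply[2];
    if (pipe(request) != 0) return -1;
    if (pipe(reply) != 0) { close(request[0]); close(request[1]); return -1; }
    const pid_t pid = fork();
    if (pid < 0) {
        close(request[0]); close(request[1]); close(reply[0]); close(reply[1]);
        return -1;
    }
    if (pid == 0) {                        // child: plays the "server"
        close(request[1]);
        close(reply[0]);
        Command received{};
        int result = -1;
        if (read(request[0], &received, sizeof received) == static_cast<ssize_t>(sizeof received))
            result = dispatch(received);   // runs the child's own function
        _exit(write(reply[1], &result, sizeof result) == static_cast<ssize_t>(sizeof result) ? 0 : 1);
    }
    close(request[0]);                     // parent: plays the "client"
    close(reply[1]);
    int result = -1;
    const bool sent = write(request[1], &c, sizeof c) == static_cast<ssize_t>(sizeof c);
    close(request[1]);                     // child sees EOF if the write failed
    if (sent && read(reply[0], &result, sizeof result) != static_cast<ssize_t>(sizeof result))
        result = -1;
    close(reply[0]);
    waitpid(pid, nullptr, 0);
    return result;
}
```

Usage: call_in_child(Command::DoStuff) returns 1.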
If you tried to use process A's function pointer from process B, you wouldn't be calling process A - you'd call whatever is at the same address in process B. If they are the same program you might get lucky and it will be the same code, but it won't have access to any of the data contained in process A.
A function pointer won't work for this, because it only contains the starting address for the code; if the code in question doesn't exist in the other process, or (due to something like address space randomization) is at a different location, the function pointer will be useless; in the second process, it will point to something, or nothing, but almost certainly not where you want it to.
You could, if you were insane^Wdaring, copy the actual instruction sequence onto the shared memory and then have the second process jump directly to it - but even if you could get this to work, the function would still run in Process B, not Process A.
It sounds like what you want is actually some sort of message-passing or RPC system.
This is why people have invented things like COM, RPC and CORBA. Each of them provides this general kind of capability. As you'd guess, each does the job a bit differently from the others.
Boost.Interprocess doesn't really support remote procedure calls. It lets you put a variable in shared memory so it's accessible to two processes, but if you want to use a getter/setter to access that variable, you'll have to do that yourself.
Those are all basically wrappers to produce a "palatable" version of something you can do without them though. In Windows, for example, you can put a variable in shared memory on your own. You can do the same in Linux. The Boost library is a fairly "thin" library around those, that lets you write the same code for Windows or Linux, but doesn't try to build a lot on top of that. CORBA (for one example) is a much thicker layer, providing a relatively complete distributed environment.
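As an illustration of doing it yourself on Linux (plain POSIX calls, not Boost), the sketch below maps one anonymous shared region with mmap, forks, and lets the child write through a plain setter while the parent reads through a getter. The names SharedCounter, set_value, get_value and value_written_by_child are hypothetical. Only the data is shared; each process runs its own copy of the getter and setter.

```cpp
#include <cassert>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Data lives in shared memory; code does not: each process calls its own copy of these.
struct SharedCounter { int value; };

static void set_value(SharedCounter* c, int v) { c->value = v; }
static int get_value(const SharedCounter* c) { return c->value; }

// The child stores `v` in the shared region; the parent reads it back. -1 on error.
int value_written_by_child(int v) {
    void* mem = mmap(nullptr, sizeof(SharedCounter), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    auto* counter = static_cast<SharedCounter*>(mem);
    counter->value = 0;
    const pid_t pid = fork();
    if (pid < 0) { munmap(mem, sizeof(SharedCounter)); return -1; }
    if (pid == 0) {                 // child writes through its mapping of the same pages
        set_value(counter, v);
        _exit(0);
    }
    waitpid(pid, nullptr, 0);       // the child has exited, so its write is done
    const int result = get_value(counter);
    munmap(mem, sizeof(SharedCounter));
    return result;
}
```

Usage: value_written_by_child(42) returns 42. A real design would also need synchronization if both processes run at the same time.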
If both processes are in the same application, then this should work. If you are trying to send function pointers between applications then you are out of luck.
My original answer was correct if you assume a process and a thread are the same thing, which they're not. The other answers are correct - different processes cannot share function pointers (or any other kind of pointers, for that matter).