I'm writing a C++ DLL that takes SAFEARRAYs passed in from Excel VBA. The DLL has multiple threads sharing the SAFEARRAYs (both reading from and writing to them). While trying to figure out how to do the sharing safely, I came across this bit of MSDN documentation:
For example, consider an application that uses the SafeArrayLock and SafeArrayUnlock functions. If these functions are called concurrently from different threads on the same SAFEARRAY data type instance, an inconsistent lock count may be created. This will eventually cause the SafeArrayUnlock function to return E_UNEXPECTED. You can prevent this by providing your own synchronization code.
It's confusing me because I thought the whole point of locks was to ensure thread safety and clearly this locking functionality is not intended for that. But why would you need locking in a single-threaded application?
The documentation for SafeArrayLock also says that the function "places a pointer to the array data in pvData of the array descriptor", but in my tests the pvData pointer is valid even when SafeArrayLock has never been called (and the lock count is 0). For example, this function:
void __declspec(dllexport) __stdcall testfun(VARIANT& vararr) {
    if (vararr.parray->cLocks != 0) throw -1;
    else {
        double* data = (double*) vararr.parray->pvData;
        data[5] = 4.1;
    }
}
effectively writes to the array stored in vararr and the change is visible in the VBA that calls it. What's up with that?
Given the seeming persistence of pvData and the unsafe locking mechanism, my instinct is to just scrap all the array manipulation functions and let my threads reach into pvData as they please (the writes never collide, so what could go wrong?) but others on here caution against manual array manipulation for unclear reasons. What's the right approach? Thanks in advance.
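For concreteness, this is roughly what I mean by "providing my own synchronization" around direct pvData access - just a sketch, with a made-up global CRITICAL_SECTION (g_arrayLock) that would be initialized once, e.g. in DllMain:
#include <windows.h>
#include <oleauto.h>

CRITICAL_SECTION g_arrayLock;   // my own lock, initialized elsewhere with InitializeCriticalSection

void WriteCell(SAFEARRAY* psa, long index, double value)
{
    EnterCriticalSection(&g_arrayLock);                 // my own synchronization, as MSDN suggests
    double* data = static_cast<double*>(psa->pvData);   // direct access, no SafeArrayLock at all
    data[index] = value;
    LeaveCriticalSection(&g_arrayLock);
}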
I am working with a code base that contains some .C files and some .CPP files.
Multiple threads in the system call functions in the C files, such as the one given below.
void CalculateCrc(PWORD pwCRC, PBYTE pbyBuffer, DWORD dwBufferLen)
{
    WORD wFCS = 0xffff;
    ASSERT(pbyBuffer);
    ASSERT(pwCRC);
    while(dwBufferLen--)
    {
        wFCS = (WORD)(wFCS >> 8) ^ tbl_FCS[(wFCS ^ *pbyBuffer++) & 0xff];
    }
    wFCS ^= 0xffff; // complement
    *pwCRC = wFCS;
}
For each calling thread, will there be separate copies of the arguments (pwCRC, pbyBuffer, dwBufferLen) and of the function's local variables (WORD wFCS), or will there be only a single set of data shared by all threads, which would result in data corruption and make calls from multiple threads unsafe?
I am not a native English speaker. Forgive me if the question is not asked in a clear manner.
Thanks for your time.
Each thread has its own stack (it is not a copy of the spawning thread's stack, just a separate one). Threads do share the address space and the heap, though.
So globals and anything on the heap are shared between threads, while anything on a thread's own stack is local to that thread. Since the arguments are passed by value and wFCS is a local variable, each call gets its own copies on the calling thread's stack.
Your function per se is safe. However, since you work with pointers, you need to take care that two threads do not operate on the same memory area. The local variables are safe; the memory the pointers refer to is not.
The function will have its own copies of pwCRC, pbyBuffer and dwBufferLen, but NOT of the buffer pbyBuffer points to, because only the pointer itself is copied.
I can offer two solutions:
A. Ensure that all threads have only read (or no) access to the buffer behind pbyBuffer while this function is called; if the data is rather small, you could also do this by giving each thread its own copy to work on.
B. Pass the buffer by value. You can do this by wrapping it in a structure:
struct buffer
{
    char buffer[LEN];
};
This only works if the buffer is small. The standard does not guarantee any particular call-stack size, and real implementations impose fairly tight limits that a large buffer could easily exceed. Even where it fits, it is not a good idea to kill the stack with large arguments.
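For option A with a copy, a rough sketch (CalculateCrcOnCopy is a made-up wrapper; it assumes CalculateCrc is declared extern "C" in a header, and that dwBufferLen > 0):
#include <vector>
#include <windows.h>   // BYTE/WORD/DWORD typedefs

// Each thread copies the shared buffer into its own local storage first, so
// CalculateCrc only ever touches thread-local memory. The copy itself still
// requires that no other thread writes the buffer while it is being copied.
void CalculateCrcOnCopy(WORD* pwCRC, const BYTE* pbySharedBuffer, DWORD dwBufferLen)
{
    std::vector<BYTE> localCopy(pbySharedBuffer, pbySharedBuffer + dwBufferLen);
    CalculateCrc(pwCRC, &localCopy[0], dwBufferLen);
}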
will there be copies of arguments[pwCRC, pbyBuffer, dwBufferLen]
In C, arguments are passed by value, so each call from a different thread gets its own copies of them. However, if the variables passed in are global/shared between the threads, then all those threads will be passing the same variables.
In your case, PWORD pwCRC and PBYTE pbyBuffer are pointers. If the memory they point to is shared between the threads, then your function is not thread-safe either, because multiple threads may try to change the values those pointers refer to.
non-static data members of function [WORD wFCS]
Yes, there will be a copy for each function call.
I have a multithreaded program in which the main thread is third-party (I can't change it) and pure C. My task is to build new modules (in C++) around it; these live partly in other threads and need to use the C program's interface. Basically I just need to read some variables (ints, floats, nothing complicated) that are stored and updated in the C thread.
Now to my question: how can I make sure that I don't get rubbish out of the C interface when accessing those variables, given that I can't use a mutex to lock them while reading? Is this even possible? Or is writing a float/int an atomic operation anyway?
Statements like "writing a float/int [is] an atomic operation anyway" are, unfortunately, not well defined in C or C++ (although with the use of std::atomic in C++11 and the stdatomic.h methods from C11 can help here - but that's not going to help you with C interop for a library you can't modify, so you can probably ignore it here).
You can find guidance about these issues for specific compilers and platforms - for example, you can probably establish that on most platforms, aligned 32-bit or 64-bit reads and writes are atomic, and that most compilers will align them appropriately.
However, down this road lies madness. If you have multiple threads involved, just use POSIX/pthreads functionality, such as pthread mutexes - which are easily accessible from both C and C++ - to guard any access to state shared across threads.
Since you can't modify the C code, you may have to do all the locking in the C++ code: lock before any call into the C library, unlock after. If you can read, but not modify, the C code, or its documentation is very clear about the threading/sharing model, you may be able to use a finer-grained locking strategy, but in the absence of any profiling indicating a bottleneck, I'd start with one global lock that you use to guard every access to the C API.
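A minimal sketch of that coarse-grained approach, guarding every access from the C++ side with one global lock (the mutex and wrapper names are made up, and some_c_library_value stands in for whatever the real library exposes):
#include <pthread.h>

extern "C" int some_c_library_value;   // hypothetical variable exposed by the C library

static pthread_mutex_t g_cApiLock = PTHREAD_MUTEX_INITIALIZER;

// Every read (and every call into the C library that might touch shared state)
// goes through the same global lock.
int ReadLibraryValue()
{
    pthread_mutex_lock(&g_cApiLock);
    int value = some_c_library_value;
    pthread_mutex_unlock(&g_cApiLock);
    return value;
}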
You can't. The only right way to work in this scenario is to work only with arguments that are provided to your functions by the calling C thread - and not to store any references to them afterwards. There's no way to guarantee that any variables will not be modified - in the general case.
You need to rethink your architecture so that such need does not arise.
If you are unable to make sure that the code which sets the variables' values is synchronized, putting a lock around the reads is pointless and won't work. It's not only the atomicity of the operations, it's also data visibility - updates to those variables may not be visible to other threads.
If you control the main thread, you have to create a new variable for each one you need to access, read the original from the main thread, and, under a lock, copy its value into the newly created variable. Then, from the other threads, access only those synchronized variables. For example:
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int myVal = 0;

int main() {
    while (!shouldQuit()) {
        doSomeIndependentStuff();
        pthread_mutex_lock(&mutex);
        myVal = independentGlobalVal;   // mirror the unsynchronized variable
        pthread_mutex_unlock(&mutex);
    }
}

// Called from the other threads
int getMyVal() {
    pthread_mutex_lock(&mutex);
    int retVal = myVal;
    pthread_mutex_unlock(&mutex);
    return retVal;
}
You cannot. Reading and writing are not guaranteed to be atomic operations, and if you cannot change the C code, you are out of luck. Synchronization always needs both sides to take part.
Your best bet is to ask the third party to make their part thread safe and/or share a locking mechanism with you.
In Microsoft Visual C++ I can call CreateThread() to create a thread by starting a function with one void * parameter. I pass a pointer to a struct as that parameter, and I see a lot of other people do that as well.
My question is if I am passing a pointer to my struct how do I know if the structure members have been actually written to memory before CreateThread() was called? Is there any guarantee they won't be just cached? For example:
struct bigapple { string color; int count; } apple;
apple.count = 1;
apple.color = "red";
hThread = CreateThread( NULL, 0, myfunction, &apple, 0, NULL );
DWORD WINAPI myfunction( void *param )
{
    struct bigapple *myapple = (struct bigapple *)param;
    // how do I know that apple's struct was actually written to memory before CreateThread?
    cout << "Apple count: " << myapple->count << endl;
    return 0;
}
This afternoon while I was reading I saw a lot of Windows code on this website and others that passes in data that is not volatile to a thread, and there doesn't seem to be any memory barrier or anything else. I know C++ or at least older revisions are not "thread aware" so I'm wondering if maybe there's some other reason. My guess would be the compiler sees that I've passed a pointer &apple in a call to CreateThread() so it knows to write out members of apple before the call.
Thanks
No, you don't need volatile or explicit barriers here. The relevant Win32 thread functions all take care of the necessary memory barriers. All writes prior to CreateThread are visible to the new thread, and obviously the reads in that newly created thread cannot be reordered before the call to CreateThread.
volatile would not add any extra useful constraints on the compiler, and would merely slow down the code. In practice this wouldn't be noticeable compared to the cost of creating a new thread, though.
No, it should not be volatile. At the same time you are pointing at a valid issue. The detailed operation of the caches is described in the Intel/ARM/etc. manuals.
Nevertheless, you can safely assume that the data WILL BE WRITTEN. Otherwise far too many things would be broken. Several decades of experience tell us that this is so.
If the scheduler starts the new thread on the same core, the state of the cache will be fine; if it starts it on a different core, the kernel and the cache-coherency hardware take care of it. Otherwise, nothing would work.
Never use volatile for interaction between threads. It is only an instruction to the compiler about how to handle data within a single thread (whether it may keep a register copy or must always re-read it, etc.).
First, I think the optimizer cannot change the order at the expense of correctness. CreateThread() is a function, and parameter binding for function calls happens before the call is made.
Secondly, volatile is not very helpful for the purpose you intend. Check out this article.
You're struggling with a non-problem, and creating at least two other ones...
Don't worry about the parameters given to CreateThread: if they exist at the time the thread is created, they exist at least until CreateThread returns. And as long as the thread that created them does not destroy them, they remain available to the other thread.
The problem now becomes who destroys them, and when: if you create them with new, they will exist until delete is called (or until the process terminates - a nice memory leak!).
The process terminates when its main thread terminates (and all other threads are then terminated by the OS as well!), and there is nothing in your main that makes it wait for the other thread to complete.
Beware when using a low-level API like CreateThread from languages that have their own runtime library with its own thread support. The C runtime has _beginthreadex: it calls CreateThread but also performs other initialization tasks for the C/C++ runtime that you would otherwise miss. Some C (and C++) library functions may not work properly without those initializations, which are also required to properly free the runtime's resources at termination. Using CreateThread directly is like using malloc for something that is later cleaned up with delete.
The proper main-thread behavior should be:
// create the data
// create the other thread
// perform other tasks
// wait for the other thread to terminate
// destroy the data
What the Win32 API documentation doesn't say very clearly is that a thread HANDLE is waitable, and becomes signaled when the associated thread terminates.
To wait for the other thread's termination, your main thread just has to call
WaitForSingleObject(hthread,INFINITE);
So the main thread will be more properly:
{
    data* pdata = new data;
    HANDLE hthread = (HANDLE)_beginthreadex(0, 0, yourprocedure, pdata, 0, 0);
    WaitForSingleObject(hthread, INFINITE);
    CloseHandle(hthread);
    delete pdata;
}
or even
{
    data d;
    HANDLE hthread = (HANDLE)_beginthreadex(0, 0, yourprocedure, &d, 0, 0);
    WaitForSingleObject(hthread, INFINITE);
    CloseHandle(hthread);
}
I think the question is valid in another context.
As others have pointed out, passing a struct and its contents is safe (although access to the data should be synchronized).
However, I think the question is valid if you have an atomic variable (or a pointer to one) that can be changed outside the thread. My opinion is that volatile should be used in that case.
Edit:
I think the examples on the wiki page are a good explanation http://en.wikipedia.org/wiki/Volatile_variable
All thread create methods like pthread_create() or CreateThread() in Windows expect the caller to provide a pointer to the arg for the thread. Isn't this inherently unsafe?
This can work 'safely' only if the arg is on the heap, and then again a heap variable adds the overhead of cleaning the allocated memory up. If a stack variable is provided as the arg, the result is at best unpredictable.
This looks like a half-cooked solution to me, or am I missing some subtle aspect of the APIs?
Context.
Many C APIs provide an extra void * argument so that you can pass context through third-party APIs. Typically you might pack some information into a struct and point this argument at the struct, so that when the thread initializes and begins executing it has more information than just the particular function it started with. There's no necessity to keep this information at the location given. For instance, you might have several fields that tell the newly created thread what it will be working on and where it can find the data it will need. Furthermore, there's no requirement that the void * actually be used as a pointer: it's a typeless argument of pointer width, through which anything can be made available to the new thread. For instance, you might pass an int directly if sizeof(int) <= sizeof(void *): (void *)3.
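A minimal sketch of both styles - a context struct, and a small integer smuggled in the pointer itself (all names here are made up for illustration):
#include <pthread.h>
#include <stdint.h>
#include <cstdio>

struct WorkerContext { const char* inputPath; int workerId; };   // made-up context struct

void* worker(void* arg)
{
    WorkerContext* ctx = static_cast<WorkerContext*>(arg);       // unpack the context
    std::printf("worker %d reads %s\n", ctx->workerId, ctx->inputPath);
    return NULL;
}

void* smallWorker(void* arg)
{
    int id = (int)(intptr_t)arg;                                  // the "int in a void *" trick
    std::printf("small worker %d\n", id);
    return NULL;
}

int main()
{
    WorkerContext ctx = { "data.txt", 1 };                        // must outlive the thread's use of it
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, &ctx);
    pthread_create(&t2, NULL, smallWorker, (void*)(intptr_t)3);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}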
As a related example of this style: a FUSE filesystem I'm currently working on starts by opening a filesystem instance, say struct MyFS. When running FUSE in multithreaded mode, threads arrive at a series of FUSE-defined calls for handling open, read, stat, etc. Naturally these can have no advance knowledge of the actual specifics of my filesystem, so this is passed in via the void * argument of fuse_main intended for this purpose: struct MyFS *blah = myfs_init(); fuse_main(..., blah);. Now when the threads arrive at the FUSE calls mentioned above, the void * received is converted back into a struct MyFS * so that the call can be handled within the context of the intended MyFS instance.
Isn't this inherently unsafe?
No. It is a pointer. Since you (as the developer) have created both the function that will be executed by the thread and the argument that will be passed to the thread you are in full control. Remember this is a C API (not a C++ one) so it is as safe as you can get.
This can work 'safely' only if the arg is in the heap,
No. It is safe as long as the argument's lifetime in the parent thread covers the whole period during which the child thread may use it. There are many ways to make sure that it lives long enough.
and then again creating a heap variable adds to the overhead of cleaning the allocated memory up.
Seriously, is that an argument? This is basically how it is done for all threads, unless you are passing something much simpler, such as an integer (see below).
If a stack variable is provided as the arg then the result is at best unpredictable.
It's as predictable as you (the developer) make it. You created both the thread and the argument. It is your responsibility to make sure that the lifetime of the argument is appropriate. Nobody said it would be easy.
This looks like a half-cooked solution to me, or am i missing some subtle aspects of the APIs?
You are missing that this is the most basic threading API. It is designed to be as flexible as possible, with as few strings attached as possible, so that safer systems can be built on top of it. So we now have boost::thread, which I would guess is built on top of these basic threading facilities but provides a much safer and easier-to-use infrastructure (at some extra cost).
If you want raw, unfettered speed and flexibility, use the C API (with some danger).
If you want something slightly safer, use a higher-level API like boost::thread (at a slightly higher cost).
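For comparison, a rough sketch of the boost::thread style - no void * involved, the arguments are bound in a type-safe way and copied for the new thread:
#include <boost/thread.hpp>
#include <string>
#include <iostream>

void worker(const std::string& name, int count)   // ordinary typed parameters
{
    std::cout << name << ": " << count << std::endl;
}

int main()
{
    boost::thread t(worker, std::string("apples"), 5);   // arguments are copied for the new thread
    t.join();                                            // wait for it to finish
    return 0;
}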
Thread specific storage with no dynamic allocation (Example)
#include <pthread.h>
#include <stdint.h>    // intptr_t
#include <iostream>
struct ThreadData
{
// Stuff for my thread.
};
ThreadData threadData[5];
extern "C" void* threadStart(void* data);
void* threadStart(void* data)
{
intptr_t id = reinterpret_cast<intptr_t>(data);
ThreadData& tData = threadData[id];
// Do Stuff
return NULL;
}
int main()
{
    pthread_t threadInfo[5];
    for(intptr_t loop = 0; loop < 5; ++loop)
    {
        pthread_create(&threadInfo[loop], NULL, threadStart, reinterpret_cast<void*>(loop));
    }
    // Wait for the threads to finish before exiting.
    for(int loop = 0; loop < 5; ++loop)
    {
        pthread_join(threadInfo[loop], NULL);
    }
}
Allocation on the heap does not add a lot of overhead.
Besides the heap and the stack, global variable space is another option. Also, it's possible to use a stack frame that will last as long as the child thread. Consider, for example, local variables of main.
I favor putting the arguments to the thread in the same structure as the pthread_t object itself. So wherever you put the pthread record, put its arguments as well. Problem solved :v) .
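A minimal sketch of that idea (all names here are made up): the pthread_t and its arguments live in one record, so they share an owner and a lifetime.
#include <pthread.h>
#include <cstdio>

struct WorkerRecord
{
    pthread_t thread;      // the thread handle...
    int       firstArg;    // ...and its arguments, side by side
    double    secondArg;
};

static void* workerMain(void* p)
{
    WorkerRecord* rec = static_cast<WorkerRecord*>(p);
    std::printf("args: %d %f\n", rec->firstArg, rec->secondArg);
    return NULL;
}

int main()
{
    WorkerRecord rec = {};              // could equally live in an array or on the heap
    rec.firstArg  = 42;
    rec.secondArg = 3.14;
    pthread_create(&rec.thread, NULL, workerMain, &rec);
    pthread_join(rec.thread, NULL);     // the record must outlive the thread's use of it
    return 0;
}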
This is a common idiom in all C programs that use function pointers, not just for creating threads.
Think about it. Suppose your function void f(void (*fn)()) simply calls into another function. There's very little you can actually do with that. Typically a function pointer has to operate on some data. Passing in that data as a parameter is a clean way to accomplish this, without, say, the use of global variables. Since the function f() doesn't know what the purpose of that data might be, it uses the ever-generic void * parameter, and relies on you the programmer to make sense of it.
If you're more comfortable with thinking in terms of object-oriented programming, you can also think of it like calling a method on a class. In this analogy, the function pointer is the method and the extra void * parameter is the equivalent of what C++ would call the this pointer: it provides you some instance variables to operate on.
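A minimal sketch of that analogy (names made up): a C-style callback taking a void * used as a trampoline into a C++ member function. (Strictly, POSIX wants an extern "C" function here; a static member function is the common shortcut.)
#include <pthread.h>
#include <cstdio>

class Counter
{
public:
    explicit Counter(int start) : value_(start) {}

    void run() { std::printf("counting from %d\n", value_); }   // the "method"

    // C-compatible trampoline: the void * plays the role of `this`.
    static void* threadEntry(void* self)
    {
        static_cast<Counter*>(self)->run();
        return NULL;
    }

private:
    int value_;
};

int main()
{
    Counter counter(10);
    pthread_t t;
    pthread_create(&t, NULL, &Counter::threadEntry, &counter);   // the instance is the context
    pthread_join(t, NULL);
    return 0;
}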
The pointer is a pointer to the data that you intend to use in the function. Windows style APIs require that you give them a static or global function.
Often this is a pointer to the object you intend to use - a pointer to this, or pThis if you will - and the intention is that you delete pThis after the thread ends.
It's a very procedural approach; however, it has a big advantage that is often overlooked: the CreateThread C-style API is binary compatible, so you can wrap it from a C++ class (or from almost any other language). If the parameter were strongly typed, you wouldn't be able to use it from another language as easily.
So yes, this is unsafe but there's a good reason for it.
I have some data that is both read and updated by multiple threads. Both reads and writes must be atomic. I was thinking of doing it like this:
// Values must be read and updated atomically
struct SValues
{
    double a;
    double b;
    double c;
    double d;
};
class Test
{
public:
    Test()
    {
        m_pValues = &m_values;
    }

    SValues* LockAndGet()
    {
        // Spin forever until we get ownership of the pointer
        while (true)
        {
            SValues* pValues = (SValues*)::InterlockedExchange((long*)&m_pValues, 0xffffffff);
            if (pValues != (SValues*)0xffffffff)
            {
                return pValues;
            }
        }
    }

    void Unlock(SValues* pValues)
    {
        // Return the pointer so other threads can lock it
        ::InterlockedExchange((long*)&m_pValues, (long)pValues);
    }

private:
    SValues* m_pValues;
    SValues m_values;
};
void TestFunc()
{
    Test test;
    SValues* pValues = test.LockAndGet();
    // Update or read values
    test.Unlock(pValues);
}
The data is protected by stealing the pointer to it for every read and write, which should make it threadsafe, but it requires two interlocked instructions for every access. There will be plenty of both reads and writes and I cannot tell in advance if there will be more reads or more writes.
Can it be done more efficiently than this? This approach also locks when reading, but since it's quite possible to have more writes than reads, there is no point in optimizing for reading unless it inflicts no penalty on writing.
I was thinking of reads acquiring the pointer without an interlocked instruction (along with a sequence number), copying the data, and then having a way of telling if the sequence number had changed, in which case it should retry. This would require some memory barriers, though, and I don't know whether or not it could improve the speed.
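To make that idea concrete, here is a rough sketch of the sequence-number scheme (a "seqlock"), written with C++11 atomics rather than the interlocked functions; strictly speaking the non-atomic copy of the data is a race under the C++ memory model, so treat this purely as an illustration of the idea:
#include <atomic>

std::atomic<unsigned> g_seq(0);   // even: stable, odd: a write is in progress
SValues g_values;                 // the shared data, as defined above

void WriteValues(const SValues& v)   // assumes one writer at a time (writers would otherwise need their own lock)
{
    g_seq.fetch_add(1, std::memory_order_acquire);    // make the count odd
    g_values = v;
    g_seq.fetch_add(1, std::memory_order_release);    // make it even again
}

SValues ReadValues()
{
    SValues copy;
    unsigned before, after;
    do
    {
        before = g_seq.load(std::memory_order_acquire);
        copy = g_values;                               // may observe a torn write...
        std::atomic_thread_fence(std::memory_order_acquire);
        after = g_seq.load(std::memory_order_relaxed);
    } while (before != after || (before & 1));         // ...in which case retry
    return copy;
}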
----- EDIT -----
Thanks all, great comments! I haven't actually run this code, but I will try to compare the current method with a critical section later today (if I get the time). I'm still looking for an optimal solution, so I will get back to the more advanced comments later. Thanks again!
What you have written is essentially a spinlock. If you're going to do that, then you might as well just use a mutex, such as boost::mutex. If you really want a spinlock, use a system-provided one, or one from a library rather than writing your own.
Other possibilities include doing some form of copy-on-write. Store the data structure by pointer, and just read the pointer (atomically) on the read side. On the write side then create a new instance (copying the old data as necessary) and atomically swap the pointer. If the write does need the old value and there is more than one writer then you will either need to do a compare-exchange loop to ensure that the value hasn't changed since you read it (beware ABA issues), or a mutex for the writers. If you do this then you need to be careful how you manage memory --- you need some way to reclaim instances of the data when no threads are referencing it (but not before).
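A rough sketch of that copy-on-write variant with a single writer, using the SValues struct from the question and InterlockedExchangePointer for the swap; reclamation of the old instances is deliberately left out because, as noted above, that is the hard part:
#include <windows.h>

SValues* volatile g_current = new SValues();     // readers only ever follow this pointer

SValues ReadValues()
{
    SValues* p = g_current;    // an aligned pointer read; the instance behind it is never modified
    return *p;
}

void WriteValues(const SValues& v)               // single writer assumed
{
    SValues* fresh = new SValues(v);             // build the new instance off to the side
    SValues* old = (SValues*)InterlockedExchangePointer((PVOID volatile*)&g_current, fresh);
    // 'old' cannot safely be deleted here while readers may still hold it;
    // some reclamation scheme (reference counts, hazard pointers, epochs) is required.
    (void)old;
}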
There are several ways to resolve this, specifically without mutexes or locking mechanisms. The problem is that I'm not sure what the constraints on your system are.
Remember that in C++, operations you think of as atomic can still be reordered by the compiler unless you use the proper fences or intrinsics.
Generally I would solve the issue like this:
Build a multiple-producer-single-consumer arrangement by having one single-producer-single-consumer queue per writing thread. Each thread writes into its own queue, and a single consumer thread gathers the produced data and stores it in a single-writer-multiple-reader data store. The implementation is a lot of work and is only recommended if you are building a time-critical application and have the time to put into this solution.
There are more things to read up about this, since the implementation is platform specific:
Atomic etc operations on windows/xbox360:
http://msdn.microsoft.com/en-us/library/ee418650(VS.85).aspx
The multithreaded single-producer-single-consumer without locks:
http://www.codeproject.com/KB/threads/LockFree.aspx#heading0005
What "volatile" really is and can be used for:
http://www.drdobbs.com/cpp/212701484
Herb Sutter has written a good article that reminds you of the dangers of writing this kind of code:
http://www.drdobbs.com/cpp/210600279;jsessionid=ZSUN3G3VXJM0BQE1GHRSKHWATMY32JVN?pgno=2