Do I have to lock std::queue when getting size? - c++

I'm using std::queue in a multithreaded environment. Other threads may modify the queue as they wish. At some point I would like to call std::queue::size(). Do I have to lock the queue for that call? Will something bad happen if I don't?

This is undefined behavior and anything can happen. Behavior is undefined when one thread accesses an object while another thread is, or might be, modifying it.
I hesitate to add this because whether or not you can think of a way it can fail is not relevant. It's not defined, period. But just in case someone argues there's no imaginable way it could fail: Consider a queue that dynamically allocates a control structure that contains the size of the queue and information about each object in the queue. When the queue is enlarged, a new control structure might be allocated, the old structure freed, and a pointer updated. A concurrent call to size might grab the old pointer and then access it after it's freed and possibly contains completely different information or even has been removed from the memory map.
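A minimal sketch of the locked alternative, assuming every access to the queue (including size()) goes through one mutex. LockedQueue and try_pop are hypothetical names used only for illustration; they are not part of the standard library.

```cpp
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical wrapper: every access to the underlying std::queue,
// including size(), takes the same mutex.
template <typename T>
class LockedQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(value));
    }

    // Returns false if the queue was empty; otherwise moves the front
    // element into 'out' and removes it from the queue.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }

    // Safe to call concurrently with push/try_pop. The result can be
    // out of date as soon as the lock is released.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }

private:
    mutable std::mutex mutex_;  // mutable so that size() can stay const
    std::queue<T> queue_;
};
```

Note that a size read this way is only a snapshot: code such as "if size() > 0 then pop" still needs to happen under a single lock, which is why the sketch offers try_pop instead of separate front() and pop() calls.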

Just reading the size isn't going to be an issue in itself. You're going to read a single memory location. You might read the wrong size, because a change in the size made by another thread might not yet be visible in this thread, but you're not going to read a corrupted value, such as half the bits from one value and half from another.

Related

Are mutex locks necessary when modifying values?

I have an unordered_map and I'm using mutex locks for the emplace, delete, and find operations, but I don't use a mutex when modifying the map's elements, because I don't see any point. I'm curious whether I'm wrong in this case.
Should I use one when modifying element value?
std::unordered_map<std::string, Connection> connections;
// Lock at Try_Emplace
connectionsMapMutex.lock();
auto [element, inserted] = connections.try_emplace(peer);
connectionsMapMutex.unlock();
// No locks here from now
auto& connection = element->second;
// Modifying Element
connection.foo = "bar";
Consider what can happen when you have one thread reading from the map and the other one writing to it:
1. Thread A starts executing the statement string myLocalStr = element->second.foo;
2. As part of the above, the std::string copy-constructor starts executing: it stores foo's character-buffer-pointer into a register, and starts dereferencing it to copy out characters from the original string's buffer to myLocalStr's buffer.
3. Just then, thread A's quantum expires, and thread B gains control of the CPU and executes the statement connection.foo = "some other string"
4. Thread B's assignment-operator causes the std::string to deallocate its character-buffer and allocate a new one to hold the new string.
5. Thread A then starts running again, and continues executing the std::string copy-constructor from step 2, but now the pointer it is dereferencing to read in characters is no longer pointing at valid data, because Thread B deleted the buffer! Poof, Undefined Behavior is invoked, resulting in a crash (if you're lucky) or insidious data corruption (if you're unlucky, in which case you'll be spending several weeks trying to figure out why your program's data gets randomly corrupted only about once a month).
And note that the above scenario is just on a single-core CPU; on a multicore system there are even more ways for unsynchronized accesses to go wrong, since the CPUs have to co-ordinate their local and shared memory-caches correctly, which they won't know to do if there is no synchronization code included.
To sum up: Neither std::unordered_map nor std::string is designed for unsynchronized multithreaded access, and if you try to get away with it you're likely to regret it later on.
Here's what I would do, if and only if I'm threading and there's a chance other threads are manipulating the list and its contents.
I would create a mutex lock when manipulating the list (which you've done) or when traversing the list.
And if I felt it was necessary to protect an individual item in the list (you're calling methods on it), I'd give each one a distinct mutex. You could change element A and element B simultaneously and it's fine, but by using the local locks for each item, each is safe.
However, it's very rare I've had to be that careful.
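A minimal sketch of the two-level locking described above: one mutex for the map structure and one mutex per element. The Connection layout, the ConnectionTable class and its set_foo/get_foo methods are assumptions made for illustration. The sketch also assumes elements are never erased while another thread may still be using them, and it needs C++17 for try_emplace.

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical element type: each connection carries its own mutex.
struct Connection {
    std::mutex mutex;  // guards foo
    std::string foo;
};

class ConnectionTable {
public:
    void set_foo(const std::string& peer, const std::string& value) {
        Connection* connection = nullptr;
        {
            // The map mutex guards lookup and insertion only.
            std::lock_guard<std::mutex> map_lock(map_mutex_);
            connection = &connections_.try_emplace(peer).first->second;
        }
        // References to unordered_map elements stay valid across
        // rehashing, so the pointer is usable after the map lock is
        // released (as long as the element is not erased).
        std::lock_guard<std::mutex> element_lock(connection->mutex);
        connection->foo = value;
    }

    // Returns an empty string if the peer is unknown.
    std::string get_foo(const std::string& peer) {
        Connection* connection = nullptr;
        {
            std::lock_guard<std::mutex> map_lock(map_mutex_);
            auto it = connections_.find(peer);
            if (it == connections_.end()) return {};
            connection = &it->second;
        }
        std::lock_guard<std::mutex> element_lock(connection->mutex);
        return connection->foo;  // copied while the element is locked
    }

private:
    std::mutex map_mutex_;
    std::unordered_map<std::string, Connection> connections_;
};
```

With this layout two threads can modify different connections at the same time, while reads and writes of the same connection's foo are serialized.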

c++ what happens when in one thread write and in second read the same object? (is it safe?) [duplicate]

What happens when one thread writes to an object and a second thread reads the same object? Would it lead to an application crash?
My idea is to save data into the object, or change the object's data, on the main thread, and only read this data on a second thread.
If I understand correctly, the only problem is that when a new value is written to the object while it is being read, the value read will be the old one. But that is not a problem for me.
I searched for my question and found the topic "What happens if two threads read & write the same piece of memory", but I am not sure whether it applies to my question.
Unless the object is atomic, the behaviour of one thread writing and another thread reading the same object is undefined.
Your current perception that the only issue is that stale data could be read is not correct. You cannot assume that will be the only manifestation of the undefined behaviour. In particular, you may well find that the value you read is neither the old nor the new value.
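For a value small enough to fit in a std::atomic, a concurrent read and write is well defined. The following is a minimal sketch under that assumption; shared_value and read_while_writing are hypothetical names chosen for the example.

```cpp
#include <atomic>
#include <thread>

// Shared between the writer thread and the reading thread.
std::atomic<int> shared_value{0};

// Starts a writer thread, reads concurrently with it, and returns
// the value the reader observed.
int read_while_writing() {
    std::thread writer([] { shared_value.store(42); });
    // Well defined: the reader sees either the old value (0) or the
    // new value (42), never a mixture of the two.
    const int seen = shared_value.load();
    writer.join();
    return seen;
}
```

Which of the two values the reader gets is still not determined; the atomic only rules out the undefined behaviour, not the timing question.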
It really depends on the size of the memory block you are trying to read from and write to. If you are reading a single atomic data type, then the memory block can only be read or written as a whole (an int, for example). You'll nondeterministically get either the new or the old value from the data type, without any other issues.
If you are reading and writing to a block of memory that isn't atomic, then during the reading cycle, some of the blocks can be overwritten and as such, the information can be corrupted. This will lead to some wild behavior and may cause breaks.
https://en.wikipedia.org/wiki/Linearizability
It is not safe. Consider using a mutex to avoid memory corruption:
http://en.cppreference.com/w/cpp/thread/mutex
Generally the result is simply undefined. You have to do things to make it defined and you have a lot of docs to read.
The compiler has to know that values might be changed from under it. Otherwise the optimizer will do the wrong thing and writes can be completely ignored. The keyword here is volatile. This is required when variables are changed from signal handlers or by the hardware itself. Not sufficient for multithreading.
You must ensure that reads and writes are not interrupted. One way to do this is to use atomic types. The other is to protect access to a variable using locks or mutexes. Atomic types are limited in size by what the hardware supports so anything beyond simple integers or single pointers will require locks. The std::atomic type abstracts this away. It knows which types are atomic on your hardware and when it needs to use locking.
Even that isn't enough since the timing of reads and writes is still random. If one thread writes and the other reads, what will it read? Will the second thread read just before the first writes or just after? Even with all the right use of atomics or locks you don't know whether you will get the old or new value. If that matters you have to synchronize between the threads so the order of reads and writes becomes defined. Look into condition variables.
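A minimal sketch of ordering a read after a write with std::condition_variable, so that the reader is guaranteed to see the new value. Mailbox and read_after_write are hypothetical names used for illustration.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical shared state: 'ready' records that 'value' was written.
struct Mailbox {
    std::mutex mutex;
    std::condition_variable ready_cv;
    bool ready = false;
    int value = 0;
};

// The reader blocks until the writer has published the value, so the
// order "write, then read" is defined rather than left to timing.
int read_after_write() {
    Mailbox box;
    std::thread writer([&box] {
        {
            std::lock_guard<std::mutex> lock(box.mutex);
            box.value = 7;
            box.ready = true;
        }
        box.ready_cv.notify_one();
    });

    int seen = 0;
    {
        std::unique_lock<std::mutex> lock(box.mutex);
        // The predicate protects against spurious wakeups and against
        // the notification arriving before the wait starts.
        box.ready_cv.wait(lock, [&box] { return box.ready; });
        seen = box.value;
    }
    writer.join();
    return seen;
}
```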

double checked locking pattern in c++ concurrent programming

I am reading about concurrent programming in C++ and came across this piece of code. The book mentioned the potential for nasty race conditions.
void undefined_behaviour_with_double_checked_locking(){
    if(!resource_ptr){ //<1>
        std::lock_guard<std::mutex> lk(resource_mutex);
        if(!resource_ptr){ //<2>
            resource_ptr.reset(new some_resource); //<3>
        }
    }
    resource_ptr->do_something(); //<4>
}
Here is the explanation quoted from the book. However, I just can't come up with a real example. I wonder if anyone here could help me out.
Unfortunately, this pattern is infamous for a reason: it has the
potential for nasty race conditions, because the read outside the lock
<1> isn’t synchronized with the write done by another thread inside
the lock <3>. This therefore creates a race condition that covers not
just the pointer itself but also the object pointed to; even if a
thread sees the pointer written by another thread, it might not see
the newly created instance of some_resource, resulting in the call to
do_something() <4> operating on incorrect values.
You don't show what resource_ptr is but from the explanation the reasoning seems to be that "!resource_ptr" (outside the lock) and "resource_ptr.reset" (inside the lock) are not atomic and are not synchronized with each other.
The use case would be:
thread1 comes into the method, sees that resource_ptr is not populated, enters the lock and is in the middle of the resource_ptr.reset call.
thread2 comes into the method and, when checking !resource_ptr, may see it as set even though the resource may not be fully configured for use.
thread2 falls through to execute "resource_ptr->do_something()" and may see resource_ptr in an inconsistent state and bad things may happen.
I recommend you read this: http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf.
Anyway, the gist of it is: the compiler is free to reorder operations as long as they appear to be executed in the program's order in a single threaded situation. On top of that, some CPU architectures take the same liberties with their instruction execution order. So, technically resource_ptr could be modified to point to newly allocated memory before some_resource's constructor has finished. Another thread could at that time see that resource_ptr is not null and attempt to use the not-yet-fully-constructed instance.
The use of a smart pointer instead of a raw pointer might make this less likely, but it doesn't rule it out afaik.
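One way to sidestep the hand-written pattern altogether is std::call_once, which runs the initialization exactly once and synchronizes it with every caller. A minimal sketch follows; the body of some_resource, the resource_flag variable and the use_resource function are assumptions made for illustration, since the question does not show them.

```cpp
#include <memory>
#include <mutex>

// Hypothetical stand-in for the book's some_resource.
struct some_resource {
    int value = 42;
    int do_something() const { return value; }
};

std::unique_ptr<some_resource> resource_ptr;
std::once_flag resource_flag;

int use_resource() {
    // The lambda runs exactly once, even if several threads arrive
    // here together. Every caller returns from call_once only after
    // the initialization has completed and is visible to it.
    std::call_once(resource_flag, [] {
        resource_ptr.reset(new some_resource);
    });
    return resource_ptr->do_something();
}
```

Because there is no unsynchronized read of resource_ptr before the call, the race between the check at <1> and the write at <3> does not arise in this version.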
The potential problem is that the write to resource_ptr (inside the reset call) isn't atomic. Assuming that resource_ptr is a global or static variable, or otherwise starts out initialized to NULL before we get here, it will never cause a thread to fall through unless the some_resource object is already fully allocated and constructed. However, say that the pointer to this new object is 0x123456789; then it is theoretically possible that resource_ptr has, for example, the value 0x12340000 when another thread does the if (!resource_ptr) test, falls through and uses that value (especially more likely when using aliasing). If resource_ptr is an atomic variable then this code would be fine.
If a program can guarantee that the first time this code is called there is only one thread running (ie, the first call will be from main() before any other thread is created) then this will work fine too, because once initialized, the if test will just always fall through, resulting in only read accesses to resource_ptr while more than one thread is running. In that case you don't need the lock inside the if block though, and you are not allowed to ever write to resource_ptr anywhere else.
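For the atomic variant mentioned above, here is a minimal sketch of double-checked locking with std::atomic and acquire/release ordering. It swaps the smart pointer from the question for a raw atomic pointer; some_resource's body and the get_resource function are assumptions for illustration, and the object is intentionally never deleted to keep the example short.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for the book's some_resource.
struct some_resource {
    int do_something() const { return 42; }
};

std::atomic<some_resource*> resource_ptr{nullptr};
std::mutex resource_mutex;

some_resource* get_resource() {
    // Acquire load: if a non-null pointer is seen, everything written
    // before the matching release store (including the constructor's
    // writes) is visible to this thread.
    some_resource* p = resource_ptr.load(std::memory_order_acquire);
    if (!p) {
        std::lock_guard<std::mutex> lk(resource_mutex);
        // Second check under the lock; the mutex already orders this
        // load against a store made by a previous lock holder.
        p = resource_ptr.load(std::memory_order_relaxed);
        if (!p) {
            p = new some_resource;
            // Release store: publishes the fully constructed object.
            resource_ptr.store(p, std::memory_order_release);
        }
    }
    return p;
}
```

Calling get_resource()->do_something() from any number of threads then always operates on a fully constructed object.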

Pointer to STL Container Thread Safety (Queue/Deque)

I currently have a bit of a multi-threading conundrum. I have two threads, one that reads serial data, and another that attempts to extract packets from the data. The two threads share a queue. The thread that attempts to create packets has a function called Parse with the following declaration:
Parse(std::queue<uint8_t>* data, pthread_mutex_t* lock);
Essentially it takes a pointer to the STL queue and uses pop() as it goes through the queue looking for a packet. The lock is used since any pop() is locked and this lock is shared between the Parse function and the thread that is pushing data onto the queue. This way, the queue can be parsed while data is being actively added to it.
The code seems to work for the most part, but I'm seeing invalid packets at a somewhat higher rate than I'd expect. My main question is I'm wondering if the pointer is changing while I'm reading data out of the queue. For example, if the first thread pushes a bunch of data, is there a chance that where the queue is found in memory can change? Or am I guaranteed that the pointer to the queue will remain constant, even as data is added? My concern is that the memory for the queue can be reallocated during my Parse() function, and therefore in the middle of my function, the pointer is invalidated.
For example, I understand that certain STL iterators are invalidated for certain operations. However, I am passing a pointer to the container itself. That is, something like this:
// somewhere in my code I create a queue
std::queue<uint8_t> queue;
// elsewhere...
Parse(&queue, &lock_shared_between_the_two_threads);
Does the pointer to the container itself ever get invalidated? And what does it point to? The first element, or ...?
Note that I'm not pointing to any given element, but to the container itself. Also, I never specified which underlying container should be used to implement the queue, so underneath it all, it's just a deque.
Any help will be greatly appreciated.
EDIT 8/1:
I was able to run a few tests on my code. A couple of points:
The pointer for the container itself does not change over the lifecycle of my program. This makes sense since the queue itself is a member variable of a class. That is, while the queue's elements are dynamically allocated, it does not appear to be the case for the queue itself.
The bad packets I was experiencing appear to be a function of the serial data I'm receiving. I dumped all the data to a hex file and was able to find packets that were invalid, and my algorithm was correctly marking them as such.
As a result, I'm thinking that passing a reference or pointer to an STL container into a function is thread safe, but I'd like to hear some more commentary ensuring that this is the case, or if this is implementation specific (as a lot of STL is...).
You are worried that modifying a container (adding/deleting nodes) in one thread will somehow invalidate the pointer to the container in another thread. The container is an abstraction and will remain valid unless you delete the container object itself. The memory for the data maintained by the container is typically allocated on the heap by its allocator (std::allocator by default).
This is quite different from the memory allocated for the container object itself which can be on the stack, heap etc., based on how the container object itself was created. This separation of the container from the allocator is what's preventing some modification to the data from modifying the container object itself.
To make debugging your problem simpler, like Jonathan Reinhart suggests, make it a single threaded system, that reads the stream AND parses it.
On a side note, have you considered using Boost.Lockfree queues or something similar? They are designed for exactly this type of scenario. If you are receiving and reading packets frequently, locking the queue for reading/writing for each packet can become a significant performance overhead.
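The pointer to the queue stays valid while elements are pushed and popped; what matters is that every access to the queue happens under the shared lock. A minimal sketch follows, using std::mutex in place of the question's pthread_mutex_t; drain is a hypothetical helper, not the asker's Parse function.

```cpp
#include <cstdint>
#include <mutex>
#include <queue>
#include <vector>

// Hypothetical helper: moves everything currently queued into a local
// buffer while holding the lock, so that the actual parsing can run
// afterwards without blocking the thread that pushes data.
std::vector<std::uint8_t> drain(std::queue<std::uint8_t>* data,
                                std::mutex* lock) {
    std::vector<std::uint8_t> bytes;
    std::lock_guard<std::mutex> guard(*lock);
    // empty(), front() and pop() are all called under the same lock
    // that the pushing thread is assumed to take around push().
    while (!data->empty()) {
        bytes.push_back(data->front());
        data->pop();
    }
    return bytes;
}
```

Draining in one locked pass also reduces how often the lock is taken compared with locking around each individual pop().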

Queue in shared memory acting up

Shared memory is giving me a hard time and GDB isn't being much help. I've got 32KB of shared memory allocated, and I used shmat to cast it to a pointer to a struct containing A) a bool and B) a queue of objects containing one std::string, three ints, and one bool, plus assorted methods. (I don't know if this matryoshka structure is how you're supposed to do it, but it's the only way I know. Using a message queue isn't an option, and I need to use multiple processes.)
Pushing one object onto the queue works, but when I try to push a second, the program freezes. No error message, no nothing. What's causing this? I doubt it's a lack of memory, but if it is, how much do I need?
EDIT: In case I was unclear -- the objects in the queue are of a class with the five data members described.
EDIT 2: I changed the class of the queue's entries so that it doesn't use std::string. (Embarrassingly enough, I was able to represent the data with a primitive.) The program still freezes on the second push().
EDIT 3: I tried calling front() from the same queue immediately after the first push(), and it froze the program too. Checking the value of the bool outside the queue, however, worked fine, so it's gotta be something wrong with the queue itself.
EDIT 4: As an experiment, I added an std::queue<int> to the struct I was using for the shared memory. It showed the same behavior -- push() worked once, then front() made it freeze. So it's not a problem with the class I'm using for the queue items, either.
This question suggests I'm not likely to solve this with std::queue. Is that so? Should I use boost like it says? (In my case, I'm executing shmget() and shmat() in the parent process and trying to let two child processes communicate, so it's slightly different.)
EDIT 5: The other child process also freezes when it calls front(). A semaphore ensures this happens after the first push() call.
Putting std::string objects into a shared memory segment can't possibly work.
It should work fine for a single process, but as soon as you try to access it from a second process, you'll get garbage: the string will contain a pointer to heap-allocated data, and that pointer is only valid in the process that allocated it.
I don't know why your program freezes, but it is completely pointless to even think about.
As I said in my comment, your problem stems from attempting to use objects that internally require heap allocation in a structure, which should be self contained (i.e. requires no further dynamically allocated memory).
I would tweak your setup, and change the std::string to some fixed size character array, something like
// this structure fits nicely into a typical cache line
struct Message
{
    boost::array<char, 48> some_string;
    int a, b, c;
    bool d;
};
Now, when you need to post something on the queue, copy the string content into some_string. Of course you should size your strings appropriately (and boost::array probably isn't the best - ideally you want some length information too) but you get the idea...
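A minimal sketch of such a self-contained message with explicit length information, using std::array instead of boost::array. The member names and the set_text/get_text helpers are assumptions made for illustration; the static_assert checks that the struct can be copied into shared memory byte for byte.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical fixed-size message: no pointers, no heap allocation.
struct Message {
    std::array<char, 48> text;  // not null-terminated
    std::uint8_t length;        // number of valid bytes in text
    int a, b, c;
    bool flag;
};

static_assert(std::is_trivially_copyable<Message>::value,
              "Message must be safe to copy as raw bytes");

// Copies at most text.size() bytes; longer strings are truncated.
void set_text(Message& m, const std::string& s) {
    const std::size_t n = std::min(s.size(), m.text.size());
    std::memcpy(m.text.data(), s.data(), n);
    m.length = static_cast<std::uint8_t>(n);
}

// Rebuilds a std::string in the reading process's own heap.
std::string get_text(const Message& m) {
    return std::string(m.text.data(), m.length);
}
```

A container placed in the shared segment has the same constraint as the string: it must not allocate from the process-local heap, so a std::queue of these messages would still need a different queue implementation or allocator.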