Move semantics vs pimpl - C++

I'm writing a generic pure C++14 implementation of a signal class, with lifetime tracking of the connected objects, and an additional plus that all this works with copy and move operations.
There are three types of implementations I have now discovered are possible:
1. Use a global connection_manager that, well, manages all connections. This would be terrible in a highly concurrent scenario where many signals and slots are connected and disconnected, and would need fine-tuned locking to alleviate some of these issues.
2. Each signal stores its connections, and proper move and copy semantics are implemented. This has two disadvantages: the signal object is large, and move operations are expensive (all the connections need to be updated when the signal or the connectee is moved). This burdens both sides with having proper move constructors.
3. Each signal stores a pointer to the real signal. This extra level of indirection makes moves a lot cheaper, as the real connected object stays put in memory behind the extra level of indirection. The same goes for the connectee, although each signal emission will need to dereference one extra pointer to get at the real object. No move semantics need to be implemented explicitly.
Currently, I almost have version 2 ready. I don't know exactly what Qt does, but I know it covers almost everything I want to be able to do with signals/slots, albeit no copy/move semantics and a pre-compilation step.
Am I missing something, or is number three the best way to go in the end if I don't want to have a huge impact on the users of signals (i.e. I don't want to increase the size and complexity of their classes just because my signal/connectee classes do a lot of work each time they're moved/copied)?
Note I am looking for real-world experience and suggestions, not opinions.

Related

Strandify inter coorporating objects for multithread support

My current application owns multiple «activatable» objects*. My intent is to "run" all those objects in the same io_context and to add the necessary protection in order to toggle from single to multiple threads (to make it scalable).
If these objects were completely independent of each other, the number of threads running the associated io_context could grow smoothly. But since those objects need to cooperate, the application crashes in multithreaded mode despite the strand in each object.
Let's say we have objects of type A and type B, all of them served by the same io_context. Each of those types runs asynchronous operations (timers and sockets - their handlers are surrounded with bind_executor(strand, handler)), and can build a cache based on information received via sockets and posted operations to them. Objects of type A need to get information cached by multiple instances of B in order to perform their own work.
Would it be possible to access this information by using strands (without adding explicit mutex protection) and if yes how ?
If not, what strategy could be adopted to achieve the scalability?
I already tried playing with futures but that strategy leads unsurprisingly to deadlocks.
Thanx
(*) Maybe I'm wrong in the terminology: the objects get a reference to an io_context and own their own strand, so I think they are "activatable", because they don't really own a running thread.
You're mixing vague words a bit. "Activatable", "Strandify", "inter coorporating". They're all close to meaningful concepts, yet, narrowly avoid binding to any precise meaning.
Deconstructing
Let's simplify using more precise concepts.
Let's say we have objects of type A and type B, all of them served by the same io_context
I think it's more fruitful to say "types A and B have associated executors". When you make sure all operations on A and B operate from that executor and you make sure that executor serializes access, then you basically get the Active Object pattern.
[can build a cache based on information received via sockets] and posted operations to them
That's interesting. I take that to mean you don't directly call members of the class, unless they defer the actual execution to the strand. This, again, would be the Active Object.
However, your symptoms suggest that not all operations are "posted to them". Which implies they run on arbitrary threads, leading to your problem.
Would it be possible to access this information by using strands (without adding explicit mutex protection) and if yes how ?
The key to your problems is here: data dependencies. It's also likely going to limit the usefulness of scaling, unless of course the generation of the information to retrieve from other threads is a computationally expensive operation.
However, the phrase "to get information cached from multiple instances of B" suggests that, in fact, the data is instantaneous, and you'll just be paying synchronization costs for accessing it across threads.
Questions
Q. Would it be possible to access this information by using strands (without adding explicit mutex protection) and if yes how ?
Technically, yes. By making sure all operations go on the strand, and the objects become true active objects.
However, there's an important caveat: strands aren't zero-cost. Only in certain contexts can they be optimized away (e.g. in immediate continuations, or when the execution context has no concurrency).
But in all other contexts, they end up synchronizing at a similar cost to mutexes. The purpose of a strand is not to remove lock contention. Rather, it allows one to declaratively specify the synchronization requirements for tasks, so that the same code can be correctly synchronized regardless of the method of async completion (callbacks, futures, coroutines, awaitables, etc.) or the chosen execution context(s).
Example: I recently uncovered a vivid illustration of the cost of strand synchronization even in a simple context (where serial execution was already implicitly guaranteed) here:
sehe mar 15, 23:08 Oh cool. The strands were unnecessary. I add them for safety until I know it's safe to go without. In this case the async call chains form logical strands (there are no timers or full duplex sockets going on, so it's all linear). That... improves the situation :)
Now it's 3.5gbps even with the 1024 byte server buffer
The throughput increased ~7x from just removing the strand.
Q. If not, what strategy could be adopted to achieve the scalability?
I suspect you really want caches that contain shared_futures, so that the first retrieval puts the future for the result in the cache, and subsequent retrievals get the already existing shared future immediately.
If you make sure your cache lookup data structure is thread-safe, likely with a reader/writer lock (shared_mutex), you will be free to access it with minimal overhead from any actor, instead of having to go through the individual strands of each producer.
Keep in mind that awaiting futures is a blocking operation. So, if you do that from tasks posted on the execution context, you may easily run out of threads. In such cases it may be better to provide an async_get in terms of boost::asio::async_result or boost::asio::async_completion, so you can await it in non-blocking fashion.

Many instances vs. many std::shared_ptr

I just want to know if it makes a performance difference while copying objects in C++ if I use many instances of a class or use std::shared_ptr.
Background: I have some structures which are delivered through a signals & slots mechanism (Qt). (I know that instances are copied while sending a signal.) These deliveries can occur many times, so they have to be fast, with low memory usage.
edit (add some details):
I'm writing an embedded application (yeah, Qt is not the fastest for an embedded backend, I know) which can have a dynamic number of "modules". Each module has its own functionality. Every module has a signal and a slot. Which module receives emitted signals is freely configurable. So it could be that many signals are emitted in a very short time. In this case the signals have to be delivered as fast as possible. The delivered structure has some module-specific data plus the data which has to be delivered to the other modules. I cannot say how large the delivered data will be, because in the future there will be many more modules which may deliver much more data.
BTW: I abuse std::shared_ptr in this case. I do not use it for really sharing the ownership. Qt just treats references and instances the same way in signals & slots; it copies the object. So to have the benefits of both, the easy memory management of an instance and the lower memory usage of a reference, I thought of using a std::shared_ptr.
Qt just treats references and instances the same way in signals & slots; it copies the object.
No, it only copies in specific circumstances. See this answer for details. TL;DR: It only copies if it needs to deliver the call over a queued connection, and then only if there is a receiver attached to a given signal. With direct connections (or default automatic connections within the same thread), if both the signal and the slot pass the argument by reference, then no copies will be made.
I abuse std::shared_ptr in this case.
It's not abuse. You're passing what amounts to a shared data structure, held by a shared_ptr. It makes perfect sense.
The real question is: if your structures are expensive to copy, why don't you use explicit sharing via QSharedData and QExplicitlySharedDataPointer? And why doesn't your question include measurement results to substantiate your concern? Come on, such things are trivial to measure. You've got Qt to help you out - use it.

Multithreading: when to call mutex.lock?

So, I have a ton of objects, each having several fields, including a C array, which are modified within their Update() method. Now I create several threads, each updating a section of these objects. As far as I understand, calling lock() before calling the update function would be useless, since this would essentially cause the updates to be called in sequential order, just like they would be without multithreading. Now, these objects have pointers cross-referencing each other. Do I need to call lock every time ANY field is modified, or just before specific operations (like delete, re-initializing arrays, etc.)?
Do I need to call lock every time ANY field is modified, or just before specific operations (like delete, re-initializing arrays, etc?)
Neither. You need to take a lock even to read, to make sure another thread isn't part way through modifying the data you're reading. You might want to use a many-reader/one-writer lock. I suggest you start by having a single lock (whether a simple mutex or the more elaborate multi-reader/writer lock) and get the code working, so you can profile it and see whether you actually need more fine-grained locking; by then you'll have a bit more experience and understanding to weigh the options for managing that.
If you do need fine-grained locking, then the trick is to think about where the locks logically belong - for example - there could be one per object. You'll then need to learn about techniques for avoiding deadlocks. You should do some background reading too.
It depends on the consequences of the data changes you want to make. If each thread is, for example, changing well defined sub-blocks of data and each sub-block is entirely independent of all other sub-blocks then it might make sense to have a mutex per sub-block.
That would allow one thread to deal with one set of sub-blocks whilst another gets a different subset to process.
Having threads make changes without gaining a mutex lock first is going to lead to inconsistencies at best...
If the data and processing aren't subdivisible that way, then you would probably have to start thinking about how you might handle whole objects in parallel, i.e. adopt a coarser granularity and one mutex per object. This is perhaps more likely to be possible - different objects are supposed to be independent of each other, so it should in theory be possible to process their data in parallel.
However the unavoidable truth is that some computer jobs require fast single thread performance. For that one starts seriously needing the right sort of supercomputer and perhaps some jolly long pipelines.

Rvalue refs in concurrency

I've been getting to grips with the new Visual Studio native Concurrency Runtime (ConcRT). Is it just an oversight, or is there a valid reason that no cross-thread movement of data has move semantics? It's all copy semantics. You can't move into a concurrent queue, you can't move with asend, etc. You can't even move-construct concurrent queues.
I don't know this specific framework, but generally for inter-thread queues you must have copy semantics.
Imagine I create an object, take a reference/pointer to it, then move it into the queue. Then the other thread moves it out of the queue. Now both threads can access it at the same time.
I think that in the general case it is only necessary to have a copy on either add or remove, not both (i.e. only one copy needed), e.g. copy-in move-out, but this would be semantically the same as copy-in copy-out.
There are a number of areas where rvalue support could enhance ConcRT, agents, and the PPL. Like any big software project, when you are building features that rely on other new features, there is always some risk in being able to deliver everything at once.
PPL was a major step forward but we never said it was "done". :-)
If you have particular suggestions where ConcRT, PPL, or the Agents library should support move semantics, please open up a suggestion in connect.microsoft.com.

Is checking current thread inside a function ok?

Is it ok to check the current thread inside a function?
For example if some non-thread safe data structure is only altered by one thread, and there is a function which is called by multiple threads, it would be useful to have separate code paths depending on the current thread. If the current thread is the one that alters the data structure, it is ok to alter the data structure directly in the function. However, if the current thread is some other thread, the actual altering would have to be delayed, so that it is performed when it is safe to perform the operation.
Or, would it be better to use some boolean which is given as a parameter to the function to separate the different code paths?
Or do something totally different?
What do you think?
You are not making all too much sense. You said the non-thread-safe data structure is only ever altered by one thread, but in the next sentence you talk about delaying changes made to that data structure by other threads. Make up your mind.
In general, I'd suggest wrapping the access to the data structure up with a critical section, or mutex.
It's possible to use such animals as reader/writer locks to differentiate between readers and writers of data structures, but the performance advantage in typical cases usually won't merit the additional complexity associated with their use.
From the way your question is stated, I'm guessing you're fairly new to multithreaded development. I highly suggest sticking with the simplest and most commonly used approaches for ensuring data integrity (most books/articles you read on the issue will mention the same uses for mutexes/critical sections). Multithreaded development is extremely easy to get wrong and can be difficult to debug. Also, what seems like the "optimal" solution very often doesn't buy you the huge performance benefit you might think. It's usually best to implement the simplest approach that will work, then worry about optimizing it after the fact.
There is a trick that could work in case, as you said, the other threads only make changes once in a while, although it is still rather hackish:
make sure your "master" thread can't be interrupted by the other ones (higher priority, non-fair scheduling)
check your thread
if it is the "master", just make the change
if it is another thread, hold off scheduling (if needed by disabling interrupts), make the change, then re-enable scheduling
really test to see whether there are no issues in your setup.
As you can see, if requirements change a little bit, this could turn out worse than using normal locks.
As mentioned, the simplest solution when two threads need access to the same data is to use some synchronization mechanism (i.e. critical section or mutex).
If you already have synchronization in your design try to reuse it (if possible) instead of adding more. For example, if the main thread receives its work from a synchronized queue you might be able to have thread 2 queue the data structure update. The main thread will pick up the request and can update it without additional synchronization.
The queuing concept can be hidden from the rest of the design through the Active Object pattern. The active object may also be able to publish the data structure changes through the Observer pattern to other interested threads.