Adding a progress bar to a calculation - C++

I have a class that does a complex calculation which may take a long time. I'd like to add a progress bar with an Abort button. However, I would like to avoid mixing calculation code with GUI code if possible.
Are there any good design patterns for this?

The relevant design pattern is the Observer pattern: your class doing the calculation is the "subject"; it maintains a list of observers implementing a common Observer interface, and when its state changes (i.e. the amount of progress has changed), it notifies each observer via an update() call.
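As a minimal sketch of that idea (illustrative names only, not tied to any particular GUI toolkit), the calculation could look like this; a progress dialog with an Abort button would then implement the observer interface and register itself, and the calculation never needs to include a GUI header:

#include <vector>

// Illustrative observer interface; keeps the calculation free of GUI code.
class ProgressObserver {
public:
    virtual ~ProgressObserver() = default;
    // Called with progress in [0, 100]; returning false requests an abort.
    virtual bool update(int percent) = 0;
};

class Calculation {
public:
    void addObserver(ProgressObserver* o) { observers.push_back(o); }

    void run() {
        for (int step = 0; step < totalSteps; ++step) {
            // ... one unit of the heavy work ...
            if (!notify(step * 100 / totalSteps))
                return;              // an observer (e.g. the Abort button) asked us to stop
        }
        notify(100);
    }

private:
    bool notify(int percent) {
        bool keepGoing = true;
        for (ProgressObserver* o : observers)
            keepGoing = o->update(percent) && keepGoing;
        return keepGoing;
    }

    std::vector<ProgressObserver*> observers;
    int totalSteps = 1000;
};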

Related

Is it a good practice to use signals and slots in Qt also when no input from GUI occurs?

I have experience in C++ but I'm new to Qt. I was given a real project to work on, developed by someone who no longer works for this company. I don't know whether it is good practice, and I apologize in advance if my terminology is not adequate: I noticed that this project is literally full of signal/slot pairs that I deem unnecessary. More precisely: the classes that dictate the logic of the application can see each other, so it would be sufficient to expose some public methods to trigger the desired procedures, but nevertheless this is almost always achieved using signals and slots (and I say it again here: even when no input from the GUI occurs). Given that I'm a newbie in Qt, is it good practice to do so? Thanks.
Edit: the cases that I reported don't include signals coming from timers, threads or the like. This developer used signal/slot pairs as if they were a substitute for a direct method call from, say, class A to class B.
Overuse of signals and slots is a very bad and unfortunately very common practice. It hides dependencies and makes code hard to debug and basically unmaintainable in the long term. Unfortunately, many programmers think it is good practice because they achieve "decoupling", which seems like a holy grail to them. This is nonsense.
I am not saying you should never use signals and slots. I am only saying you should not overuse them. Signals and slots are the perfect tool for implementing the Observer design pattern, i.e. for building a "reactive" system in which objects react to other objects having changed their state. That is the correct use of signals and slots; almost every other use is wrong. The most extreme case I have seen was a getter function implemented with a signal-slot connection: the signal sent out a reference to a variable, the slot filled it with a value, and control then returned to the emitter. This is just mad!
How do you know whether your signals and slots implement the Observer pattern correctly? These are rules of thumb that follow from my quite long experience with Qt:
The nature of a signal is that the emitter announces publicly (signals are always public, except when you use a private-class dummy parameter) that its state has somehow changed.
The emitter does not care who the observers are or whether there are any observers at all, i.e. the emitter must not depend on observers in any way.
It is never the emitter's responsibility to establish or manage the connection - do not ever do that! Connecting and disconnecting is the responsibility of the observer (which then often connects to a private slot) or of some parent object which knows about both the emitter and the observer (in that case the common parent connects the emitter's signal to the observer's public slot).
It is normal to see lots of signal-slot connections in the GUI layer, and this is perfectly OK (note: the GUI layer includes view models!). This is because a GUI is typically a reactive system in which objects react to other objects or to changes in the underlying layers. But you will probably see far fewer signal-slot connections in the business logic layer (by the way, in many projects the business logic is written without using Qt at all).
Regarding naming, I have encountered an interesting code smell: the observer's public (!) slot is named something like onSomethingHappened(), with emphasis on the prefix on. This is almost always a sign of bad design and abuse of signals and slots. Usually this slot should a) be made private, with the connection established by the observer, or b) be renamed to doSomething(), or c) be renamed and called as a normal method instead of being connected to a signal.
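As a hedged illustration of the healthy pattern (class names are mine, and the snippet assumes the usual moc build step): the emitter only announces its state change, and the observer wires the signal to its own private slot:

#include <QDebug>
#include <QObject>

class Counter : public QObject {
    Q_OBJECT
public:
    void increment() { ++m_value; emit valueChanged(m_value); }
signals:
    void valueChanged(int value);   // the emitter only announces that its state changed
private:
    int m_value = 0;
};

class Display : public QObject {
    Q_OBJECT
public:
    explicit Display(Counter* counter, QObject* parent = nullptr)
        : QObject(parent)
    {
        // The observer establishes the connection to its own private slot.
        connect(counter, &Counter::valueChanged, this, &Display::showValue);
    }
private slots:
    void showValue(int value) { qDebug() << "value is now" << value; }
};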
And a note about why overuse of signals and slots is hard to maintain. There are many potential problems in the long term which can break your code:
Dependencies made with signals and slots are often hidden in a distant, seemingly unrelated part of the code. This relates to the signal-slot abuse where the emitter actually depends on the observer, but this is not clear when looking at the emitter's code. If your class depends on some other class or module, this dependency should be explicit and clearly visible.
When signals and slots are connected and then disconnected programmatically by your code, you often end up in a state where you forgot to disconnect and now have multiple connections. Multiple connections are easily overlooked because they often do no visible harm; they only make the code somewhat slower, e.g. a changed text is updated several times instead of once, and nobody will catch the issue unless the connection is duplicated a thousand-fold. These multiplying connections are somewhat like memory leaks: small leaks also often remain unnoticed.
It often happens that you depend on the order in which the connections are established. And when these order-dependent connections are established in distant parts of the code, you are in bad trouble: this code will fall apart sooner or later.
To check that I do not have multiple connections, and that a connection or disconnection succeeded, I use these helper utilities of mine: https://github.com/vladimir-kraus/qtutils/blob/main/qtutils/safeconnect.h
PS: In the text above I use the terms "emitter" (emits the signal) and "observer" (observes the emitter and receives the signal). Sometimes people use "sender" and "receiver" instead. My intention was to emphasize that the emitter emits a signal without actually knowing whether anyone receives it. The word "sender" gives the impression that you send the signal to someone, which is exactly the cause of signal-slot overuse and bad design, so using "sender" only leads to confusion, IMO. And by using "observer" I wanted to emphasize that signals and slots are the tool for implementing the Observer design pattern.
PPS: Signals and slots are also the perfect tool for asynchronous communication between threads in Qt. This use case may be one of the very few exceptions to the principles described above.
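For example (a hedged sketch: Worker, its progressChanged signal and process slot are assumed to exist, and progressBar is a QProgressBar living in the GUI thread):

QThread* thread = new QThread;
Worker* worker = new Worker;              // emits progressChanged(int) while working
worker->moveToThread(thread);

// A cross-thread connection defaults to a queued connection, so setValue()
// runs safely in the GUI thread even though the signal is emitted elsewhere.
QObject::connect(worker, &Worker::progressChanged,
                 progressBar, &QProgressBar::setValue);
QObject::connect(thread, &QThread::started, worker, &Worker::process);
thread->start();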
It depends of course, but mostly yes, it's a correct practice because it keeps objects decoupled. Two classes being able to see each other does not mean they should use each other directly if they are not in a master-slave relationship or do not follow a logical hierarchy. Otherwise you will end up coupling everything in a non-reversible way as a result of a back-and-forth of calls. The proof is that you want to fix this by "making methods public", which may break the encapsulation and the contract of a class, and that would be a bad design choice independent of using Qt.
Since we're not seeing the actual code, it could be that he is misusing signals too, but from your explanation I'd go with the first option.
The signals and slots mechanism is a central feature of Qt.
In general, signals and slots are preferred/used because:
They allow asynchronous execution via queued connections.
They are loosely coupled.
They allow connecting n signals to one slot, one signal to n slots, and a signal to another signal.
In your project, if signal-slot mechanism has been used to achieve the above, then it is likely the right usage.
GUI input handling isn't the only place where signal-slot mechanism is used.
Unless we know your project's use cases, it is difficult to say whether the signal-slot mechanism has been misused or overused.

Efficiently pass notifications between decoupled design layers

I am upgrading a design where data was lightly coupled with the UI:
class UI; // forward declaration so the two classes can refer to each other

class Object {
    UI * ui;
};

class UI {
    Object * object;
};
It was fairly straightforward to push update notifications to the UI through the UI pointer, but there are new requirements for the data to be entirely separated from the UI, and also for different objects to have multiple different UI representations, so a single UI pointer no longer does it, nor is it allowed to be part of the data layer at all.
It is not possible to use something like QObject and signals because of the overhead: the object count is very high (in the range of hundreds of millions) and QObject is several times larger than the biggest object in the hierarchy. For the UI part it doesn't matter that much, because only a portion of the objects are visible at a time.
I implemented a UI registry, which uses a multi-hash to store all UIs with the Object * as the key, in order to get the UI(s) for a given object and send notifications. But the lookups and the registration and deregistration of UIs present a significant overhead given the high object count.
So I was wondering: is there some design pattern for sending notifications between decoupled layers with less overhead?
A clarification: most changes are made on the UI side, and the UI elements keep a pointer to the related object, so that direction is not an issue. But some changes made to an object from the UI side result in changes to related objects in the data layer, which cannot be predicted in order to request an update of the affected objects' UIs. In fact, a single change made to one object in the UI can result in a cascade of changes to other objects, so I need to be able to notify their eventual UI representations to update and reflect those changes.
One generic mechanism for decoupled communication is the publish-subscribe pattern. In this situation, the updated objects post a notification to a message queue, and the message queue is then responsible for informing the UI components that have registered an interest in that particular class of notification.
This is similar, in principle, to the UI registry that you have already tried. The main difference is that the UI components to update are identified not purely by the Objects they reference, but rather by the notification type.
This allows a trade-off between specificity and state-keeping: if the model is set up such that every UI component associated with an Object obj is notified on every update of obj, then it's equivalent to the UI registry. On the other hand, the model could be arranged such that some UI components are notified whenever a certain sub-category of Object posts an update, and each component then checks for itself whether it needs to modify its state based on the content of the notification. Carried to the extreme, every UI object could be notified of any message posted by any Object, which would be equivalent to a global 'update-UI-state' approach.
The publish-subscribe model encompasses both these extremes, but also the range in between, where you may be able to find a suitable compromise.
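A minimal broker sketch along those lines (type names are illustrative, not from the question): subscribers register a callback for a topic, and publishers post notifications without knowing who, if anyone, is listening.

#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

struct Notification {
    std::string topic;
    const void* payload;    // e.g. a pointer to the updated Object
};

class MessageQueue {
public:
    using Handler = std::function<void(const Notification&)>;

    void subscribe(const std::string& topic, Handler handler) {
        subscribers[topic].push_back(std::move(handler));
    }

    void publish(const Notification& note) {
        auto it = subscribers.find(note.topic);
        if (it == subscribers.end())
            return;                      // nobody cares about this topic
        for (const Handler& h : it->second)
            h(note);
    }

private:
    std::unordered_map<std::string, std::vector<Handler>> subscribers;
};

A UI component that subscribes to a coarse topic would then inspect the payload itself to decide whether it actually needs to refresh.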
I managed to come up with an immensely more efficient solution.
Instead of tracking all UIs in a "UI registry", I created a Proxy object and replaced the UI registry with a Proxy registry.
A Proxy object is created for each object that has any visual representation. It extends QObject and implements an interface to access the properties of the underlying Object, wrapping them in Qt-style properties.
The Proxy object is then used as a property by each UI to read and write the underlying Object's properties, so it works "automatically" for every UI that happens to reference that particular proxy.
This means there is no need to track every particular UI for every Object; instead, the lifetime of the Proxy is managed simply by counting the number of UIs which reference it.
I also managed to eliminate all the lookups that would not yield a result by adding a single-bit hasProxy flag (I had a few free bits left over from the other flags), which is toggled for an object whenever its proxy is created or destroyed. This way, in the Object's members I can quickly check whether the object has a proxy without a lookup in the registry; if not, I use the "blind" data routines, and if so, I look up the proxy and manipulate the object through it. This limits registry lookups to the few that will actually yield a result and eliminates the tremendous number that would be made in vain, only to find that the object has no visual representation at all.
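A hedged sketch of what such a proxy and registry could look like (this is my reconstruction from the description above, with an assumed minimal Object carrying a spare hasProxy bit and a single value field):

#include <QObject>
#include <unordered_map>

// Assumed minimal data-layer object; the real Object is of course much richer.
struct Object {
    double value;
    bool hasProxy : 1;   // spare bit, toggled when a proxy is created or destroyed
    Object() : value(0.0), hasProxy(false) {}
};

// QObject-based proxy, created only for objects that are currently visualized.
class ObjectProxy : public QObject {
    Q_OBJECT
    Q_PROPERTY(double value READ value WRITE setValue NOTIFY valueChanged)
public:
    explicit ObjectProxy(Object* object) : m_object(object) {}

    double value() const { return m_object->value; }
    void setValue(double v) { m_object->value = v; emit valueChanged(); }
    void notifyChanged() { emit valueChanged(); }   // called from the data layer

    int uiRefCount = 0;    // proxy is destroyed when this drops back to zero

signals:
    void valueChanged();

private:
    Object* m_object;
};

// Registry maps Object* to its proxy, but only for the (few) visualized objects.
std::unordered_map<Object*, ObjectProxy*> proxyRegistry;

void objectChanged(Object* obj)
{
    if (!obj->hasProxy)
        return;                           // no UI: skip the registry lookup entirely
    proxyRegistry[obj]->notifyChanged();  // every UI bound to the proxy updates itself
}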
In short, to summarize the improvements over the previous design:
the registry is now much smaller: instead of storing, for each object, a pointer to the object itself plus a vector of all associated UIs, I am now down to 8 bytes per Proxy entry - the pointer to the object and a counter for any number of associated UIs
notifications are automated: only the proxy needs to be notified, and it automatically notifies all UIs which reference it
the functionality previously pushed into the UIs has moved into the proxy and is shared between all UIs, so the UIs themselves are lighter and easier to implement; in fact I've gone from having to specialize a unique QQuickItem for each object type to being able to use a generic QML Item, without having to implement and compile any native classes for the UI
things I previously had to manage manually, both the actual notifications and the objects responsible for them, are now managed automatically
the overhead in both memory usage and CPU cycles has been reduced tremendously. The previous solution traded CPU time for lower memory usage relative to the original design, but the new design eliminates most of the CPU overhead and decreases memory usage further, and on top of that it makes the implementation much easier and faster.
It's like having a cake and eating it too :)

Refactoring single-threaded GUI code for multithreaded scenarios

The usual scenario: there is an MFC/Win32/WTL/wxWidgets/Qt application that does something useful. It was designed to be single-threaded, and there is some logic that handles errors/questions within the processing blocks.
So, somewhere deep inside some class, a dialog can be fired that asks the user something like "Are you sure you want to complete the action?" or "Error with document layout".
The problem is that the dialog is fired from computationally heavy, straightforward code - an FFT, image sharpening, or file-system defragmentation function, or something along those lines - which could easily be launched in a worker thread, if not for the GUI. And it would be better off there, as that would avoid the GUI stalls that are so annoying for the user.
However, the GUI cannot run in a worker thread, and dependency injection is pretty much impossible to do, because it would have to be threaded down through several layers of computational code, in a way that is very unclean from a class-interface standpoint, like someclass instance(data_in, data_out, param1, param2, GUI_class_ref) : m_GUI(GUI_class_ref), ... three or more levels deep.
Is there a pattern or checklist for such scenarios that can be used to marshal GUI prompts back to the main thread and return the result into the core of the computational code, once the code is split across multiple threads?
You can create a synchronization context: a queue of commands to be executed by the main thread. A worker thread adds a command to this queue (which must be locked for single-threaded access) and waits. The main thread processes this queue periodically, executes the commands (for example, showing a "Cancel operation?" dialog) and notifies the worker threads about the results.
In C#, this was done with delegates plus the arguments to call them with. In C++, you can go with enum-coded messages processed in a switch (like messages in Windows programs), or build something from pointers to member functions, an object pointer to call them on, and the arguments to call them with.
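A hedged C++ sketch of such a synchronization context (names are mine; a real implementation would also need shutdown handling): the worker thread enqueues a prompt and blocks until the main thread has shown the dialog and stored the answer.

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

class GuiSyncContext {
    struct Request {
        std::function<bool()> show;   // runs the dialog, returns the user's answer
        std::mutex m;
        std::condition_variable cv;
        bool done = false;
        bool answer = false;
    };

public:
    // Called from a worker thread.
    bool askOnMainThread(std::function<bool()> showDialog) {
        Request req;
        req.show = std::move(showDialog);
        {
            std::lock_guard<std::mutex> lock(queueMutex);
            pending.push(&req);
        }
        std::unique_lock<std::mutex> lock(req.m);
        req.cv.wait(lock, [&] { return req.done; });
        return req.answer;
    }

    // Called periodically from the main thread (timer or idle handler).
    void processPending() {
        std::queue<Request*> batch;
        {
            std::lock_guard<std::mutex> lock(queueMutex);
            std::swap(batch, pending);
        }
        while (!batch.empty()) {
            Request* req = batch.front();
            batch.pop();
            const bool answer = req->show();   // dialog runs on the GUI thread
            {
                std::lock_guard<std::mutex> lock(req->m);
                req->answer = answer;
                req->done = true;
            }
            req->cv.notify_one();
        }
    }

private:
    std::mutex queueMutex;
    std::queue<Request*> pending;
};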
You are at a classic old-code refactoring crossroads. Proper isolation and dependency injection are infeasible, so you are left with making the GUI context globally accessible - that is, creating a singleton. It doesn't necessarily need to be the GUI context directly, so at least some isolation is achieved: it can be some kind of manager which holds the GUI context and accepts only specific, single-purpose calls from the computation code. You could make the GUI thread class a friend of this manager and make the GUI callbacks (invoked upon closing the dialog) private.
I could give more specific ideas about what to write, as I went through exactly the same challenge (adding threads to an existing heavy application). But I am confused about whether you want only the GUI thread to run freely, or the background computation as well. The example dialog prompt you gave is confusing, as it suggests a decision whose answer is needed to know whether to continue at all (which would mean the computation is on hold).

How to keep asynchronous parallel program code manageable (for example in C++)

I am currently working on a server application that needs to control a collection of devices over a network. Because of this, we need to do a lot of parallel programming. Over time, I have learned that there are three approaches to communication between processing entities (threads/processes/applications). Regrettably, all three approaches have their disadvantages.
A) You can make a synchronous request (a synchronous function call). In this case, the caller waits until the function is processed and the response has been received. For example:
const bool convertedSuccessfully = Sync_ConvertMovie(params);
The problem is that the caller is idle while waiting. Sometimes this is just not an option: for example, if the call was made by the user-interface thread, it will seem as if the application has blocked until the response arrives, which can take a long time.
B) You can make an asynchronous request and wait for a callback to be made. The client code can continue with whatever needs to be done.
Async_ConvertMovie(params, TheFunctionToCallWhenTheResponseArrives);
This solution has the big disadvantage that the callback function necessarily runs in a separate thread. The problem now is that it is hard to get the response back to the caller. For example, you have clicked a button in a dialog, which called a service asynchronously, but the dialog has long been closed by the time the callback arrives.
void TheFunctionToCallWhenTheResponseArrives()
{
//Difficulty 1: how to get to the dialog instance?
//Difficulty 2: how to guarantee in a thread-safe manner that
// the dialog instance is still valid?
}
This in itself is not that big a problem. However, when you want to make more than one such call, and each depends on the response of the previous one, this becomes, in my experience, unmanageably complex.
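One hedged way to deal with the dialog-lifetime difficulties from example B (my own illustration, assuming Async_ConvertMovie can accept a std::function callback and the dialog is owned by a shared_ptr) is to capture only a weak reference in the callback:

#include <functional>
#include <memory>

struct MovieParams;                                         // placeholder for the call's parameters
void Async_ConvertMovie(const MovieParams& params,
                        std::function<void(bool)> onDone);  // assumed callback-taking overload

class ConvertDialog : public std::enable_shared_from_this<ConvertDialog> {
public:
    void startConversion(const MovieParams& params) {
        std::weak_ptr<ConvertDialog> self = weak_from_this();  // requires shared_ptr ownership
        Async_ConvertMovie(params, [self](bool succeeded) {
            if (auto dialog = self.lock())        // dialog still alive?
                dialog->conversionFinished(succeeded);
            // else: the dialog was closed; silently drop the result
        });
    }

private:
    void conversionFinished(bool succeeded) {
        (void)succeeded;
        // Still runs on the callback's worker thread; touching widgets here would
        // require marshalling back to the GUI thread first.
    }
};

This solves the validity check, but the chaining problem the question describes next remains.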
C) The last option I see is to make an asynchronous request and keep polling until the response has arrived. In between the has-the-response-arrived-yet checks, you can do something useful. This is the best solution I know of for the case in which there is a sequence of asynchronous function calls to make, because it has the big advantage that the whole caller context is still around when the response arrives. Also, the logical sequence of the calls remains reasonably clear. For example:
const CallHandle c1 = Async_ConvertMovie(sourceFile, destFile);
while (!c1.ResponseHasArrived())
{
    //... do something in the meanwhile
}
if (!c1.IsSuccessful())
    return;

const CallHandle c2 = Async_CopyFile(destFile, otherLocation);
while (!c2.ResponseHasArrived())
{
    //... do something in the meanwhile
}
if (c2.IsSuccessful())
    //show a success dialog
The problem with this third solution is that you cannot return from the caller's function. This makes it unsuitable if the work you want to do in the meantime has nothing at all to do with the work you are getting done asynchronously. For a long time I have wondered whether there is some other way to call functions asynchronously, one that doesn't have the downsides of the options listed above. Does anyone have an idea, some clever trick perhaps?
Note: the example given is C++-like pseudocode. However, I think this question equally applies to C# and Java, and probably a lot of other languages.
You could consider an explicit "event loop" or "message loop", not too different from classic approaches such as a select loop for asynchronous network tasks or a message loop for a windowing system. Events that arrive may be dispatched to a callback when appropriate, as in your example B, but in some cases they may also be tracked differently, for example to cause transitions in a finite state machine. An FSM is a fine way to manage the complexity of an interaction over a protocol that requires many steps, after all!
One approach to systematizing these considerations starts with the Reactor design pattern.
Schmidt's ACE body of work is a good starting point for these issues if you come from a C++ background; Twisted is also quite worthwhile, from a Python background; and I'm sure that similar frameworks and sets of whitepapers exist for, as you say, "a lot of other languages" (the Wikipedia page I linked does point to Reactor implementations for other languages besides ACE and Twisted).
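A bare-bones sketch of such a loop (names are mine, far simpler than ACE or Twisted): completed asynchronous operations post a completion here, and the loop dispatches each one to whatever handler was registered for that request.

#include <deque>
#include <functional>
#include <unordered_map>

struct Completion { int requestId; bool success; };

class EventLoop {
public:
    void post(const Completion& c) { completions.push_back(c); }

    void registerHandler(int requestId, std::function<void(const Completion&)> handler) {
        handlers[requestId] = std::move(handler);
    }

    void run() {
        while (!completions.empty()) {
            Completion c = completions.front();
            completions.pop_front();
            auto it = handlers.find(c.requestId);
            if (it != handlers.end()) {
                auto handler = std::move(it->second);
                handlers.erase(it);
                handler(c);   // may issue new requests and register new handlers
            }
        }
    }

private:
    std::deque<Completion> completions;
    std::unordered_map<int, std::function<void(const Completion&)>> handlers;
};

The "convert, then copy" sequence then becomes a small state machine: the handler registered for the convert completion issues the copy request and registers the next handler, so each step keeps its context without blocking a thread.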
I tend to go with B, but instead of calling back and forth, I'd do the entire processing, including the follow-ups, on a separate thread. The main thread can meanwhile update the GUI and either actively wait for the thread to complete (i.e. show a dialog with a progress bar), or just let it do its thing in the background and pick up the notification when it's done. There are no complexity problems so far, since the entire processing is actually synchronous from the processing thread's point of view; only from the GUI's point of view is it asynchronous.
Adding to that: in .NET it's no problem to switch back to the GUI thread; the BackgroundWorker class and the ThreadPool make this easy as well (I used the ThreadPool, if I remember correctly). In Qt, to stay with C++, it's quite easy as well.
I used this approach on our last major application and am very pleased with it.
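A hedged plain-C++ sketch of that approach, reusing the question's hypothetical blocking Sync_* calls and file names on a worker thread via std::async: the worker runs the whole multi-step job synchronously, while the GUI thread polls the future from its normal idle or timer handler and stays responsive.

#include <chrono>
#include <future>

std::future<bool> startJob()
{
    return std::async(std::launch::async, [] {
        if (!Sync_ConvertMovie(sourceFile, destFile))   // blocking calls, worker thread
            return false;
        return Sync_CopyFile(destFile, otherLocation);
    });
}

// Called from the GUI's idle or timer handler.
void pollJob(std::future<bool>& job)
{
    if (job.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
        if (job.get())             // get() may be called only once; stop polling afterwards
            showSuccessDialog();   // hypothetical GUI-thread call
    }
}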
As Alex said, look at Proactor and Reactor, as documented by Doug Schmidt in Pattern-Oriented Software Architecture.
There are concrete implementations of these for different platforms in ACE.

Event-driven simulation class

I am working through some of the exercises in The C++ Programming Language by Bjarne Stroustrup. I am confused by problem 11 at the end of Chapter 12:
(*5) Design and implement a library for writing event-driven simulations. Hint: <task.h>. ... An object of class task should be able to save its state and to have that state restored so that it can operate as a coroutine. Specific tasks can be defined as objects of classes derived from task. The program to be executed by a task might be defined as a virtual function. ... There should be a scheduler implementing a concept of virtual time. ... The tasks will need to communicate. Design a class queue for that. ...
I am not sure exactly what this is asking for. Is a task a separate thread? (As far as I know, it is not possible to create a new thread without system calls, and since this is a book about C++, I do not believe that is the intent.) Without interrupts, how is it possible to start and stop a running function? I assume this would involve busy waiting (that is, continually looping and checking a condition), although I cannot see how that could be applied to a function that might not terminate for some time (if it contains an infinite loop, for example).
EDIT: Please see my post below with more information.
Here's my understanding of an "event-driven simulation":
A controller handles an event queue, scheduling events to occur at certain times, then executing the top event on the queue.
Events occur instantaneously at the scheduled time. For example, a "move" event would update the position and state of an entity in the simulation so that its state vector is valid at the current simulation time. A "sense" event would first have to make sure all entities' states are at the current time, then use some mathematical model to evaluate how well the current entity can sense the other entities. (Think of robots moving around on a board.)
Thus time progresses discontinuously, jumping from event to event. Contrast this with a time-driven simulation, where time moves in discrete steps and all entities' states are updated at every time step (a la most Simulink models).
Events can then occur at their natural rate; it usually doesn't make sense to recompute all data at the finest rate in the simulation.
Most production event-driven simulations run in a single thread. They can be complex by their very nature, so trying to synchronize a multi-threaded simulation tends to add exponential layers of complexity. That said, there is a standard for multi-process military simulations called Distributed Interactive Simulation (DIS) that uses predefined network messages to transmit data between processes.
EDIT: It's important to note the difference between modeling and simulation. A model is a mathematical representation of a system or process. A simulation is built from one or more models that are executed over a period of time. Again, an event-driven simulation hops from event to event, while a time-driven simulation proceeds at a constant time step.
Hint: <task.h>.
is a reference to an old cooperative multitasking library that shipped with early versions of CFront (you can also download it from that page).
If you read the paper "A Set of C++ Classes for Co-routine Style Programming" things will make a lot more sense.
Adding a bit:
I'm not an old enough programmer to have used the task library. However, I know that C++ was designed after Stroustrup wrote a simulation in Simula that had many of the same properties as the task library, so I've always been curious about it.
If I were to implement the exercise from the book, I would probably do it like this (please note, I haven't tested this code or even tried to compile it):
#include <list>

struct ITask {
    virtual ~ITask() { }
    virtual void run() = 0;
};

class Scheduler {
    std::list<ITask*> tasks;
public:
    void run()
    {
        while (1) // or at least until some message is sent to stop running
            for (std::list<ITask*>::iterator itor = tasks.begin(), end = tasks.end()
                ; itor != end
                ; ++itor)
                (*itor)->run(); // yes, two dereferences
    }
    void add_task(ITask* task)
    {
        tasks.push_back(task);
    }
};
I know people will disagree with some of my choices, for instance using a struct for the interface; but structs have the behavior that inheriting from them is public by default (whereas inheriting from classes is private by default), and I don't see any value in inheriting privately from an interface, so why not make public inheritance the default?
The idea is that a call to ITask::run() will block the scheduler until the task arrives at a point where it can be interrupted, at which point the task returns from the run method and waits until the scheduler calls run again to continue. The "cooperative" in "cooperative multitasking" means that tasks say when they can be interrupted ("coroutine" usually implies cooperative multitasking). A simple task may do only one thing in its run() method; a more complex task may implement a state machine and use its run() method to figure out which state the object is currently in and call other methods based on that state. The tasks must relinquish control once in a while for this to work, because that is the definition of cooperative multitasking. It is also the reason why modern operating systems no longer use cooperative multitasking.
This implementation does not (1) do fair scheduling (perhaps by keeping a running total of clock ticks spent in each task's run() method and skipping tasks that have used too much time relative to the others, until the others "catch up"), (2) allow tasks to be removed, or even (3) allow the scheduler to be stopped.
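To make the cooperative part concrete, here is a hedged sketch (my own illustration, not from the original answer) of a derived task that does one small slice of work per run() call and keeps its state between calls:

#include <iostream>

// Illustrative task: one slice of work per run() call, state kept across calls.
class PrintNumbersTask : public ITask {
public:
    explicit PrintNumbersTask(int count) : remaining(count) {}

    void run() override
    {
        if (remaining <= 0)
            return;                 // nothing left; a fuller scheduler would remove us
        std::cout << "tick " << remaining << '\n';
        --remaining;                // returning = yielding back to the scheduler
    }

private:
    int remaining;
};

// Usage (illustrative only; Scheduler::run() as sketched above never returns):
//     Scheduler s;
//     PrintNumbersTask a(3), b(3);
//     s.add_task(&a);
//     s.add_task(&b);
//     s.run();   // interleaves a and b because each run() returns quickly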
As for communicating between tasks, you may consider looking at Plan 9's libtask or Rob Pike's newsqueak for inspiration (the "UNIX implementation of Newsqueak" download includes a paper, "The Implementation of Newsqueak" that discusses message passing in an interesting virtual machine).
But I believe this is the basic skeleton Stroustrup had in mind.
It sounds to me like the exercise is asking you to implement a cooperative multitasking scheduler. The scheduler operates in virtual time (time ticks you define/implement at whatever level you want), chooses a task to run based on the queue (note that the description mentions you'll need to implement one), and when the current task is done, the scheduler selects the next one and starts it running.
The generalised structure of a discrete event simulation is based on a priority queue keyed on a time value. At a broad level it goes like:
While (not end condition):
    Pop next event (one with the lowest time) from the priority queue
    Process that event, which may generate more events
    If a new event is generated:
        Place this on the priority queue keyed at its generated time
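A hedged C++ sketch of that loop (my own illustration, with illustrative names):

#include <functional>
#include <queue>
#include <vector>

// Minimal discrete-event loop: events are ordered by time, and handling an
// event may schedule further events.
struct Event {
    double time;
    std::function<void()> action;
    bool operator>(const Event& other) const { return time > other.time; }
};

class Simulation {
public:
    void schedule(double time, std::function<void()> action) {
        events.push(Event{time, std::move(action)});
    }

    void run(double endTime) {
        while (!events.empty() && events.top().time <= endTime) {
            Event e = events.top();
            events.pop();
            now = e.time;        // virtual time jumps straight to the event
            e.action();          // may call schedule() and enqueue new events
        }
    }

    double now = 0.0;

private:
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>> events;
};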
Co-routines change the view of the model from being event-centric to being entity-centric. Entities can go through some life cycle (e.g. accept job, grab resource X, process job, release resource X, place job in queue for next step). This is somewhat easier to program, because grabbing resources is handled with semaphore-like synchronisation primitives; the jobs and synchronisation primitives generate the events and queue them behind the scenes.
This gives a model conceptually similar to processes in an operating system, with a scheduler waking a process up when its input, or a shared resource it has requested, becomes available. The co-routine model makes the simulation quite a lot easier to understand, which is useful when simulating complex systems.
(I'm not a C++ dev)
Probably what it means is that you need to create a class Task (as in Event) that consists mostly of a callback function pointer and a scheduled time, and that can be stored in a list in the Scheduler class, which in turn basically keeps track of a time counter and calls each Task's function when its time arrives. These tasks should be created by the objects of the simulation.
If you need help on the discrete simulation side, go ahead and edit the question.
This is in response to titaniumdecoy's comment on SottieT812's answer. It's much too large for a comment, so I decided to make it another answer.
It is event-driven in the sense that simulation state only changes in response to an event. For example, assume that you have two events, missile launch and missile impact. When the launch event is executed, it figures out when and where the missile will impact, and schedules an impact event for the appropriate time. The position of the missile is not calculated between the launch and the impact, although the missile will probably have a method that other objects can call to get its position at a particular time.
This is in contrast to a time-driven simulation, where the exact position of the missile (and of every other object in the simulation) is calculated after every time step, say every 1 second.
Depending on the characteristics of the model, the fidelity of the answer required, and many other factors, either event driven or time driven simulation may perform better.
Edit: If anyone is interested in learning more, check out the papers from the Winter Simulation Conference
There is a book and framework called DEMOS (Discrete Event Modelling on Simula) that describes a co-routine-based framework (the eponymous DEMOS). Despite being 30 or so years old, DEMOS is actually quite a nice system, and Graham Birtwistle is a really nice guy.
If you implement co-routines in C++ (think setjmp/longjmp), you should take a look at this book for a description of a really, really elegant discrete event modelling framework. Although it's 30 years old, it's a bit of a timeless classic and still has a fan base.
In the paper linked to by "me.yahoo.com/..." which describes the task.h class:
Tasks execute in parallel
A task may be suspended and resumed later
The library is described as a method of multiprogramming.
Is it possible to do this without using threads or separate processes?