Use Clojure Cells or add-watcher for reactive programming? - clojure

I want to use a lot of reactive (dataflow) programming techniques in my Clojure program. Is using "add-watcher" on Clojure refs going to be good enough to do this? A simple case would be updating the GUI when the underlying data changes.

Yes, that is indeed a good idea. I have used it in my own code to update UI elements when the streaming data changes. The only thing you need to be careful of is that the watchers are called synchronously: in the agent's thread for agents, or in the calling thread for atoms, refs and vars. So to avoid blocking that thread, don't do too much processing in the watchers. If you need to, create a future.

Related

How are golang select statements implemented?

In particular, I have some blocking queues in C++, and I want to wait until any one of them has some item I can pop.
The only mechanism I can think of is to spawn a separate thread for each queue that pops from its input queue and feeds into a master queue that the original thread can wait on.
It seems kind of resource heavy to spawn N new threads and then kill them all every time I want to pop from a group of queues.
Does Golang implement some more elegant mechanism that I might be able to implement in my own C++ code?
I wouldn't necessarily say that Go's select implementation is elegant, but I think it's beautiful in its own way and it's fairly optimized.
it special-handles selects with a single non-default case
it permutes the order in which cases are evaluated in order to avoid deterministic starvation
it does an optimistic first pass over the cases looking for one that's already satisfied
it enqueues on the internal sender/receiver queues of each channel using internal mechanisms known only to the runtime
it uses sudogs which are like lightweight goroutine references (there can be many sudogs for the same goroutine) that allow quick jumping into the goroutine stack
it uses the scheduler's gopark mechanism to block itself which allows efficient unparking on signal
when signalled and unparked, it immediately goes into the triggered case handler function by manipulating the select goroutine's program counter
There's no single overarching groundbreaking idea in the implementation, but you would really appreciate how each step was carefully tinkered with so that it's fast, efficient and well integrated with the concept of channels. Because of that, it's not very easy to reimplement Go's select statement in another language, unless you at least have the chan construct first.
You can take a look at the reimplementations available in other languages, where the idea was redone with various degrees of similarity and effectiveness. If I had to reimplement select from scratch in another language, I would probably first try a single shared semaphore and, in case that didn't work, switch to a cruder, sleep-a-little-then-check-in-random-order strategy.
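For the original C++ problem of waiting on any of several queues, a common workaround, closer to the shared-semaphore idea than to a full select, is to have all the queues share one mutex and one condition variable. A minimal sketch, with a made-up QueueGroup type and int payloads:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical sketch: N producer queues share one mutex and one condition
// variable, so a single consumer can wait for "any queue has an item"
// without spawning a thread per queue.
struct QueueGroup {
    std::mutex m;
    std::condition_variable cv;
    std::vector<std::queue<int>> queues;

    explicit QueueGroup(std::size_t n) : queues(n) {}

    void push(std::size_t i, int value) {
        { std::lock_guard<std::mutex> lock(m); queues[i].push(value); }
        cv.notify_one();                           // wake a waiting consumer
    }

    // Blocks until some queue is non-empty, then pops from the first one found.
    std::pair<std::size_t, int> pop_any() {
        std::unique_lock<std::mutex> lock(m);
        std::size_t idx = 0;
        cv.wait(lock, [&] {
            for (std::size_t i = 0; i < queues.size(); ++i)
                if (!queues[i].empty()) { idx = i; return true; }
            return false;
        });
        int value = queues[idx].front();
        queues[idx].pop();
        return {idx, value};
    }
};
```

Unlike Go's select this only handles receives (no send or default cases), but it avoids the thread-per-queue design from the question.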
Golang's select statement is inspired by the C select function (see the GNU libc documentation), which is used for waiting on I/O on a set of file descriptors. If your queues communicate using a socket or a pipe, you may be able to use it.

Should GTK+ interface run in a separate thread?

I'm taking my first steps in GTK+ (C++ and gtkmm, more specifically) and I have some rather conceptual doubts about how best to structure my program. Right now I just want my GUI to show what is happening in my C++ program by printing several values, and since my main thread is halted while the GUI window is running, I've come across solutions that separate the processing/computing operations and the graphical interface into different threads. Is this commonly accepted as the best way to do it, not at all, or not even relevant?
Unless you have a good reason, you are generally better off not creating new threads. Synchronization is hard to get right.
GUI programming is event driven (click on a button and something happens). So you will probably need to tie your background processing into the GUI event system.
In the event that your background processing takes a long time, you will need to break it into a number of fast chunks. At the end of each chunk, you can update a progress bar and schedule the next chunk.
This probably means you will need to use some state-machine patterns.
Also make sure that any IO is non-blocking.
Here's an example of a lengthy operation split into smaller chunks using the main loop, without additional threads: Lazy Loading using the main loop.
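As an illustration of that chunking approach, here is a minimal gtkmm sketch (the class and widget names are made up): each idle callback does one short chunk of work, updates a progress bar, and returns false when finished so the handler is removed.

```cpp
#include <gtkmm.h>

// Hypothetical sketch: a long computation split into small chunks driven by
// the GTK main loop. Each idle callback does one short chunk, updates the
// progress bar, and returns false when done so the handler is removed.
class ChunkedWindow : public Gtk::Window {
public:
    ChunkedWindow() {
        add(bar_);
        bar_.show();
        Glib::signal_idle().connect(sigc::mem_fun(*this, &ChunkedWindow::on_idle));
    }
private:
    bool on_idle() {
        ++done_;                                   // one short chunk of real work goes here
        bar_.set_fraction(double(done_) / total_);
        return done_ < total_;                     // true = call me again on the next idle
    }
    Gtk::ProgressBar bar_;
    int done_ = 0;
    const int total_ = 1000;
};

int main(int argc, char* argv[]) {
    auto app = Gtk::Application::create(argc, argv, "org.example.chunked");
    ChunkedWindow win;
    return app->run(win);
}
```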
Yes, absolutely! (in response to your title)
The GUI must be run in a separate thread. If you have ever come across those extremely annoying interfaces that lock up while an operation is in progress1, you'll know why it's very important to keep the GUI running regardless of whatever operation is happening.
It's a user experience thing.
1 I don't mean the ones that disable some buttons during an operation (that's normal), but the ones where everything seems frozen.
This is the reverse: the main thread should be the Gtk one, and the long processing/computing tasks should be done in threads.
The documentation gives a clear example:
https://pygobject.readthedocs.io/en/latest/guide/threading.html
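The linked example is Python/PyGObject; a rough gtkmm 3 equivalent of the same pattern (class names here are hypothetical) keeps the GTK main loop in the main thread, runs the long task in a std::thread, and uses a Glib::Dispatcher to notify the GUI thread when it finishes.

```cpp
#include <gtkmm.h>
#include <atomic>
#include <string>
#include <thread>

// Hypothetical sketch: the GTK main loop stays in the main thread, a
// std::thread does the long computation, and a Glib::Dispatcher (safe to
// emit from another thread) tells the main thread to update the label.
class WorkerWindow : public Gtk::Window {
public:
    WorkerWindow() {
        add(label_);
        label_.show();
        dispatcher_.connect(sigc::mem_fun(*this, &WorkerWindow::on_work_done));
        worker_ = std::thread([this] {
            // ... long computation, running off the GUI thread ...
            result_ = 42;
            dispatcher_.emit();                    // wakes the GTK main loop
        });
    }
    ~WorkerWindow() override { if (worker_.joinable()) worker_.join(); }
private:
    void on_work_done() {                          // runs in the GTK main thread
        label_.set_text("result: " + std::to_string(result_.load()));
    }
    Gtk::Label label_;
    Glib::Dispatcher dispatcher_;
    std::thread worker_;
    std::atomic<int> result_{0};
};

int main(int argc, char* argv[]) {
    auto app = Gtk::Application::create(argc, argv, "org.example.worker");
    WorkerWindow win;
    return app->run(win);
}
```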

Periodically call a C function without manually creating a thread

I have implemented a WebSocket handler in C++ and I need to send ping messages once in a while. However, I don't want to start one thread per socket, or one global poll thread that only calls the ping function, but instead use some OS functionality to call my timer function. On Windows there is SetTimer, but that requires a working message loop (which I don't have). On Linux there is timer_create, which looks better.
Is there some portable, low-overhead method to get a function called periodically, ideally with some custom context? I.e. something like settimer (const int millisecond, const void* context, void (*callback)(const void*))?
[Edit] Just to make this a bit clearer: I don't want to have to manage additional threads. On Windows, I guess using CreateThreadpoolTimer on the system thread pool will do the trick, but I'm curious to hear if there is a simpler solution and how to port this over to Linux.
If you are intending to go cross-platform, I would suggest you use a cross-platform event library like libevent.
libev is newer; however, it currently has weak Win32 support.
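A minimal sketch of a periodic timer with libevent 2 (the callback and context here are placeholders): passing fd = -1 together with EV_PERSIST gives a pure, repeating timer driven by the event loop.

```cpp
#include <event2/event.h>
#include <sys/time.h>

// Hypothetical sketch: a persistent libevent timer that fires every 30 seconds
// and calls ping_cb with a user-supplied context pointer.
static void ping_cb(evutil_socket_t, short, void* ctx) {
    // send_ping(static_cast<Connection*>(ctx));  // your ping logic would go here
    (void)ctx;
}

int main() {
    event_base* base = event_base_new();
    void* my_context = nullptr;                    // whatever context you need

    // fd = -1 plus EV_PERSIST makes this a pure, repeating timer event.
    event* timer = event_new(base, -1, EV_PERSIST, ping_cb, my_context);
    timeval every_30s{30, 0};
    event_add(timer, &every_30s);

    event_base_dispatch(base);                     // runs the event loop
    event_free(timer);
    event_base_free(base);
}
```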
If you use sockets, you can use select to wait for socket events with a timeout, and in this loop keep track of the time and call your callback at the appropriate moment.
If you are looking for a timer that does not require an additional thread, lets you do your work transparently and then calls the timer function at the appropriate time in the same thread by pre-emptively interrupting your application, then there is no such portable thing.
The first reason is that it's downright dangerous. That's like writing a multi-threaded application with absolutely no synchronization. The second reason is that it is extremely difficult to have good semantics in multi-threaded applications. Which thread should execute the timer callback?
If you're writing a web-socket handler, you are probably already writing a select()-based loop. If so, then you can just use select() with a short timeout and check the different connections for which you need to ping each peer.
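A rough sketch of that loop, with placeholder Connection, handle_readable and send_ping definitions and a hard-coded 30-second ping interval:

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <ctime>
#include <vector>

// Hypothetical sketch: Connection, handle_readable and send_ping are
// placeholders for the real WebSocket handling code.
struct Connection { int fd; };
std::vector<Connection> connections;               // filled in by accept() elsewhere

void handle_readable(Connection&) { /* read and process a frame */ }
void send_ping(Connection&)       { /* write a ping frame */ }

void run_loop() {
    std::time_t last_ping = std::time(nullptr);
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        int max_fd = 0;
        for (const Connection& c : connections) {
            FD_SET(c.fd, &readfds);
            if (c.fd > max_fd) max_fd = c.fd;
        }
        timeval timeout{1, 0};                     // wake up at least once per second
        int n = select(max_fd + 1, &readfds, nullptr, nullptr, &timeout);
        if (n > 0)
            for (Connection& c : connections)
                if (FD_ISSET(c.fd, &readfds)) handle_readable(c);
        if (std::time(nullptr) - last_ping >= 30) {    // ping every 30 seconds
            for (Connection& c : connections) send_ping(c);
            last_ping = std::time(nullptr);
        }
    }
}

int main() { run_loop(); }
```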
Whenever you have asynchronous events, you should have an event loop. This doesn't need to be some system default one, like Windows' message loop. You can create your own. But you should be using it.
The whole point of event-based programming is that you are decoupling your code into well-defined functional fragments driven by these asynchronous events. Without an event loop, you are condemning yourself to interleaving code that gets input and produces output based on poorly defined "states" that are just fragments of procedural code.
Without a well-defined separation of states using an event-based design, code quickly becomes unmanageable. Because code pauses inside procedures to do input tasks, you have lifetimes of objects that will not span entire procedure scopes, and you will begin to write if (nullptr == xx) in various places that access objects created or destroyed based on events. Dispatch becomes combinatorially complex because you have different events expected at each input point and no abstraction.
However, by simply using an event loop and dispatching to state machines, you decrease the handling complexity to basic management of handlers (O(n) handlers versus O(mn) branch statements with n types of events and m states). You decouple handling but still allow functionality to change depending on state. But now these states are well defined using state classes, and new states can be added if the requirements of the product change.
I'm just saying, stop trying to avoid an event loop. It's a software pattern for very important reasons, all of which have to do with producing professional, reusable, scalable code. Use Boost.ASIO or some other framework for cross platform capabilities. Don't get in the habit of doing it wrong just because you think it will be less of an effort. In the end, even if it's not a professional project that needs maintenance long term, you want to practice making your code professional so you can do something with your skills down the line.
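For example, with a reasonably recent Boost.ASIO (io_context and steady_timer), a repeating ping timer that shares the event loop with your socket handlers might look roughly like this sketch:

```cpp
#include <boost/asio.hpp>
#include <chrono>
#include <iostream>

// Hypothetical sketch: a repeating ping timer living on the same io_context
// (event loop) as the socket handlers, so no extra threads are needed.
void schedule_ping(boost::asio::steady_timer& timer) {
    timer.expires_after(std::chrono::seconds(30));
    timer.async_wait([&timer](const boost::system::error_code& ec) {
        if (ec) return;                            // timer was cancelled
        std::cout << "send ping frames here\n";
        schedule_ping(timer);                      // re-arm for the next period
    });
}

int main() {
    boost::asio::io_context io;
    boost::asio::steady_timer timer(io);
    schedule_ping(timer);
    io.run();                                      // single event loop, no extra threads
}
```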

Save data periodically during execution

I have a program which executes constantly and I need to save data every minute.
The program processes data, and every minute I want to save the value of a variable and do some statistical operations to track how this variable varies.
I thought I could do it with a signal, SIGALRM and alarm(60). My sub-question is: can I use a class method as the handler for SIGALRM?
Any other ideas for executing a method that saves data and does some operations every minute?
The program is written in C++ and runs on Linux on a single-core processor.
Your solution using alarm will work, both open and write being asynchronous-signal-safe. Though you have to be aware that interactions between alarm and sleep are undefined, so don't use them in the same program.
A different solution, especially in case you already use an epoll, would be to have a timerfd trigger the epoll. That will avoid possible undefined interactions.
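A minimal sketch of the timerfd/epoll variant (the save itself is just a placeholder here):

```cpp
#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>

// Hypothetical sketch: a timerfd fires every 60 seconds and is waited on
// through epoll, so the save runs in normal program flow instead of a
// signal handler.
int main() {
    int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
    itimerspec spec{};
    spec.it_value.tv_sec = 60;                     // first expiry after 60 s
    spec.it_interval.tv_sec = 60;                  // then every 60 s
    timerfd_settime(tfd, 0, &spec, nullptr);

    int epfd = epoll_create1(0);
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = tfd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev);

    for (;;) {
        epoll_event events[8];
        int n = epoll_wait(epfd, events, 8, -1);
        for (int i = 0; i < n; ++i) {
            if (events[i].data.fd == tfd) {
                std::uint64_t expirations;
                read(tfd, &expirations, sizeof expirations);   // drain the timer
                std::puts("save the variable and update statistics here");
            }
            // ... handle the other fds already in your epoll loop ...
        }
    }
}
```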
As for the actual saving, consider forking. This is a technique that I learned from redis (maybe someone else invented it, but that's where I learned it from), and which I consider totally cool. The point being that the forked process can take all time in the universe to finish writing as much data as you want to disk. It can access the snapshot at the time of forking while the other process keeps running and modifying data. And thanks to page magic done in the kernel, it still all works seamlessly without any risk of corruption, without ever stalling, and without ever needing to look at something like asynchronous IO, which is great.
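A stripped-down sketch of that forking technique, with a made-up data vector and file name; the child writes out the copy-on-write snapshot it sees and exits, while the parent keeps running:

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

// Hypothetical sketch of the redis-style snapshot: fork(), let the child write
// out the snapshot it sees at the moment of the fork, and let the parent keep
// running and modifying the data in the meantime.
std::vector<int> data;                             // the state you want to persist

void save_snapshot() {
    pid_t pid = fork();
    if (pid == 0) {                                // child: sees a frozen snapshot
        if (std::FILE* f = std::fopen("snapshot.tmp", "w")) {
            for (int v : data) std::fprintf(f, "%d\n", v);
            std::fclose(f);
            std::rename("snapshot.tmp", "snapshot.dat");   // atomic replace
        }
        _exit(0);                                  // skip the parent's cleanup code
    }
    // parent: returns immediately and keeps processing; reap the child later,
    // e.g. with waitpid(pid, nullptr, WNOHANG) from the main loop.
}

int main() {
    data = {1, 2, 3};
    save_snapshot();
    wait(nullptr);                                 // only so this demo doesn't exit early
}
```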
You can call a class method using something like boost::bind.
Apart from that, I wouldn't recommend using signals for this; they are not that reliable and could, for example, make one of your syscalls return prematurely.
I would spawn a thread (assuming your single core doesn't mean no threads) that waits 60 seconds, takes the locks, does the calculations, writes the output and releases the locks.
As others have already suggested, if you have an async-compatible (event-driven) system, you could use timerfd to generate events.
Saving data from a signal handler is a very bad idea. Even if open and write are async-signal-safe, your data could very well be in an inconsistent state due to a signal interrupting a function that was modifying it.
A much better approach would be to add to all functions which modify the data:
if (current_time > last_save_time + 60) save();
This will avoid useless saves when the data has not been modified, too. If you don't want the overhead of making a system call to determine the current time on every operation, you could instead install a timer/signal handler that updates current_time, as long as you declare it volatile.
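A small sketch of a variant of that scheme, where the SIGALRM handler only sets a volatile flag and the save happens from normal (non-signal) context inside the functions that modify the data (all names here are placeholders):

```cpp
#include <csignal>
#include <unistd.h>

// Hypothetical sketch: the SIGALRM handler only sets a volatile flag; the
// actual save happens from normal (non-signal) context in the functions that
// modify the data, so it never sees the data in an inconsistent state.
volatile sig_atomic_t save_due = 0;

void on_alarm(int) {
    save_due = 1;
    alarm(60);                                     // re-arm for the next minute
}

void save() { /* write the variable and the statistics here */ }

void modify_data() {
    // ... change the data structure ...
    if (save_due) {                                // data is consistent at this point
        save();
        save_due = 0;
    }
}

int main() {
    std::signal(SIGALRM, on_alarm);
    alarm(60);
    for (;;) modify_data();                        // the program's normal processing loop
}
```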
Another good approach would be to use threads instead of signals. Then you should use a mutex (or better, rwlock) to synchronize access to the data.
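A minimal C++17 sketch of that thread-based variant, using a std::shared_mutex so the saver thread takes a read lock while the processing code takes the write lock (the data itself is a placeholder):

```cpp
#include <chrono>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical sketch: the processing code takes the write lock while
// modifying the data; a saver thread takes the read lock once a minute to
// compute statistics over a consistent view.
std::shared_mutex data_lock;
std::vector<double> data;                          // the shared state

void modify_data(double v) {
    std::unique_lock<std::shared_mutex> w(data_lock);
    data.push_back(v);
}

void saver_thread() {
    for (;;) {
        std::this_thread::sleep_for(std::chrono::minutes(1));
        std::shared_lock<std::shared_mutex> r(data_lock);
        // compute the statistics and write the snapshot to disk here
    }
}

int main() {
    std::thread saver(saver_thread);
    saver.detach();                                // sketch only; real code would shut down cleanly
    for (int i = 0; i < 1000000; ++i) modify_data(i * 0.5);   // main processing loop
}
```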

Is checking current thread inside a function ok?

Is it ok to check the current thread inside a function?
For example if some non-thread safe data structure is only altered by one thread, and there is a function which is called by multiple threads, it would be useful to have separate code paths depending on the current thread. If the current thread is the one that alters the data structure, it is ok to alter the data structure directly in the function. However, if the current thread is some other thread, the actual altering would have to be delayed, so that it is performed when it is safe to perform the operation.
Or, would it be better to use some boolean which is given as a parameter to the function to separate the different code paths?
Or do something totally different?
What do you think?
You are not making all that much sense. You said the non-thread-safe data structure is only ever altered by one thread, but in the next sentence you talk about delaying changes made to that data structure by other threads. Make up your mind.
In general, I'd suggest wrapping the access to the data structure up with a critical section, or mutex.
It's possible to use such animals as reader/writer locks to differentiate between readers and writers of data structures, but the performance advantage in typical cases usually won't merit the additional complexity associated with their use.
From the way your question is stated, I'm guessing you're fairly new to multithreaded development. I highly suggest sticking with the simplest and most commonly used approaches for ensuring data integrity (most books/articles you read on the issue will mention the same uses for mutexes/critical sections). Multithreaded development is extremely easy to get wrong and can be difficult to debug. Also, what seems like the "optimal" solution very often doesn't buy you the huge performance benefit you might think. It's usually best to implement the simplest approach that works and then worry about optimizing it after the fact.
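A minimal sketch of that mutex-wrapping approach (the data structure and function names are placeholders): every thread, including the "owning" one, goes through the same lock, so there is no need to check which thread is calling.

```cpp
#include <mutex>
#include <vector>

// Hypothetical sketch: every caller goes through the same mutex, so it does
// not matter which thread is altering the structure.
std::mutex structure_mutex;
std::vector<int> structure;                        // the non-thread-safe data

void alter_structure(int value) {                  // safe to call from any thread
    std::lock_guard<std::mutex> lock(structure_mutex);
    structure.push_back(value);
}

int main() {
    alter_structure(42);                           // same call works from any thread
}
```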
There is a trick that could work if, as you said, the other threads only make changes once in a while, although it is still rather hackish:
make sure your "master" thread can't be interrupted by the other ones (higher priority, non-fair scheduling)
check which thread you are on
if it's the "master" thread, just make the change
if it's another thread, suspend scheduling (if needed, by disabling interrupts), make the change, then restore scheduling
really test to make sure there are no issues in your setup.
As you can see, if requirements change a little bit, this could turn out worse than using normal locks.
As mentioned, the simplest solution when two threads need access to the same data is to use some synchronization mechanism (i.e. critical section or mutex).
If you already have synchronization in your design try to reuse it (if possible) instead of adding more. For example, if the main thread receives its work from a synchronized queue you might be able to have thread 2 queue the data structure update. The main thread will pick up the request and can update it without additional synchronization.
The queuing concept can be hidden from the rest of the design through the Active Object pattern. The active object may also be able to publish the data structure changes to other interested threads through the Observer pattern.
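A small sketch of that queuing idea (all names are placeholders): other threads enqueue update tasks, and only the main thread drains the queue and applies them, so the structure itself never needs its own lock.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical sketch: other threads enqueue "update the structure" tasks;
// only the main thread drains the queue and applies them, so the structure
// itself needs no lock of its own.
std::mutex queue_mutex;
std::queue<std::function<void()>> pending_updates;
std::vector<int> structure;                        // touched only by the main thread

void request_update(int value) {                   // callable from any thread
    std::lock_guard<std::mutex> lock(queue_mutex);
    pending_updates.push([value] { structure.push_back(value); });
}

void main_thread_drain() {                         // called from the main thread's loop
    std::queue<std::function<void()>> batch;
    {
        std::lock_guard<std::mutex> lock(queue_mutex);
        std::swap(batch, pending_updates);
    }
    while (!batch.empty()) {                       // run the updates on the owning thread
        batch.front()();
        batch.pop();
    }
}

int main() {
    request_update(1);                             // e.g. called from thread 2
    main_thread_drain();                           // done by the main thread
}
```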