detecting dropped items with core.async pub - clojure

I'm looking at using a shared core.async pub channel as the primary communication layer for my app as a way to introduce some indirection between components. I'm concerned about the behavior of pub, though, specifically the way that it silently drops items if there is no matching sub for a given topic. In a large system, this seems like it could be a real headache to debug. Is there any way to detect that an item got dropped, or at least to throw an exception in this case?

No. As far as I can tell, an item is silently dropped when there is no matching subscriber for its topic: https://github.com/clojure/core.async/blob/2afc2dc5102f60713135ffca6fab993fb35809f0/src/main/clojure/clojure/core/async.clj#L879
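If pub really has no hook for this, one workaround is to put your own routing layer in front of it: a layer that knows which topics have subscribers and reports anything unmatched. The sketch below shows that idea in C++ rather than Clojure, so it is only an analogue; `TopicBus`, `publish`, and the dead-letter callback are hypothetical names, not part of core.async.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A message tagged with its topic, as a pub would see it.
struct Message {
    std::string topic;
    std::string payload;
};

// Routes each message to the subscribers of its topic. Unlike core.async's
// pub, a message with no matching subscriber is handed to a dead-letter
// handler instead of being discarded silently.
class TopicBus {
public:
    using Handler = std::function<void(const Message&)>;

    explicit TopicBus(Handler on_dead_letter)
        : on_dead_letter_(std::move(on_dead_letter)) {}

    void subscribe(const std::string& topic, Handler h) {
        subs_[topic].push_back(std::move(h));
    }

    // Returns how many subscribers received the message (0 means dead letter).
    std::size_t publish(const Message& m) {
        auto it = subs_.find(m.topic);
        if (it == subs_.end() || it->second.empty()) {
            on_dead_letter_(m);  // log, count, or throw here
            return 0;
        }
        for (auto& h : it->second) h(m);
        return it->second.size();
    }

private:
    std::map<std::string, std::vector<Handler>> subs_;
    Handler on_dead_letter_;
};
```

In a Clojure version of the same idea, the wrapper would track the set of subscribed topics next to the pub and check it before putting a message on the pub's source channel.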

Related

Concurrency, general guidance on application design

I asked this question over at the rakulang subreddit, and it was suggested that I post here:
I keep falling back to Perl 5 for a lot of my work so I can "get it done" simply because I am so much more familiar with Perl 5.
However, I have the need to build something that will subscribe to multiple MQTT topics (similar conceptually to a websocket subscription) and process data, keeping a lot of this data as internal state. A concurrency project. So I see this as a great opportunity to immerse myself in some Raku :)
So far I understand I need to create a supply / given / when setup, but I'm not totally sure how I will deal with the state of each data stream received over each topic. A reply to my Reddit post suggested Cro, which I think will fit the bill very nicely, but there are still some implementation details I am unclear on.
For example, when a message payload arrives on topic foo, I want to add data from that payload to an existing array (my internal state).
But these subscriptions will be for an undetermined number of topics, and the set will change at runtime. So it will not be possible to store and manage this data in a hard-coded array like @foo.
In a non concurrent world, I could create a hash (associative array) with a key that matches my topic name, %data<foo> for example, and store the array there.
However, in the world of concurrency I would need an answer to the mutex problem. If each member of the hash is having its data modified concurrently by different threads, then I would think the entire hash would require a lock.
This has the potential to result in deadlock, or poor performance at the very least (I am expecting several hundred messages per second across multiple topic subscriptions).
Perhaps I can create a variable (or better yet, an object) dynamically based on the topic name, so there is a separate memory address for each array of data. However, I'm not sure how to do that, or indeed whether that is the best approach in this scenario.
In summary, Question 1: Is creating an object or variable dynamically for this purpose a sound pattern?
Question 2: Is there a design approach I am simply not aware of that would be more suitable?
So, any specific advice would be greatly appreciated. I feel like this is a case of "I don't know what I don't know" type of problem!
Thanks!
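The per-topic idea in the question (separate storage per topic, created at runtime) can be sketched in C++ rather than Raku: each topic gets its own bucket with its own lock, and the map of buckets is write-locked only when a new topic first appears. `TopicStore`, `append`, and `snapshot` are hypothetical names, and Raku's own building blocks would look different; another common design is to give each topic a single owner that alone touches its state, which removes the locks entirely.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>
#include <vector>

// Per-topic state: writers on different topics never contend, because each
// topic's data sits behind its own mutex. The outer shared_mutex is taken
// exclusively only when a brand-new topic is added at runtime.
class TopicStore {
    struct Bucket {
        std::mutex m;
        std::vector<std::string> items;
    };
    std::shared_mutex map_mutex_;
    // unique_ptr keeps each Bucket at a stable address; buckets are never erased.
    std::map<std::string, std::unique_ptr<Bucket>> buckets_;

    Bucket& bucket_for(const std::string& topic) {
        {
            std::shared_lock read(map_mutex_);  // common case: topic exists
            auto it = buckets_.find(topic);
            if (it != buckets_.end()) return *it->second;
        }
        std::unique_lock write(map_mutex_);  // rare case: first message on topic
        auto& slot = buckets_[topic];
        if (!slot) slot = std::make_unique<Bucket>();  // another writer may have won
        return *slot;
    }

public:
    void append(const std::string& topic, std::string payload) {
        Bucket& b = bucket_for(topic);
        std::lock_guard lock(b.m);
        b.items.push_back(std::move(payload));
    }

    // Copy of one topic's data, taken under that topic's lock only.
    std::vector<std::string> snapshot(const std::string& topic) {
        Bucket& b = bucket_for(topic);
        std::lock_guard lock(b.m);
        return b.items;
    }
};
```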

The actor model: Why is Erlang/OTP special? Could you use another language?

I've been looking into learning Erlang/OTP, and as a result, have been reading (okay, skimming) about the actor model.
From what I understand, the actor model is simply a set of functions (run within lightweight threads called "processes" in Erlang/OTP), which communicate with each other only via message passing.
This seems fairly trivial to implement in C++, or any other language:
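#include <mutex>
#include <queue>

struct BaseMessage;

// Stand-in for the Windows-style lock type the snippet assumes.
class CriticalSection {
    std::mutex m;
public:
    std::unique_lock<std::mutex> AquireScopedLock() { return std::unique_lock<std::mutex>(m); }
};

class BaseActor {
    std::queue<BaseMessage*> messages;  // pending messages, oldest first
    CriticalSection messagecs;          // guards `messages`
    BaseMessage* Pop();
public:
    void Push(BaseMessage* message)
    {
        auto scopedlock = messagecs.AquireScopedLock();
        messages.push(message);         // push onto the queue, not the lock
    }
    virtual void ActorFn() = 0;
    virtual ~BaseActor() = default;
};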
Each of your processes would be an instance of a class derived from BaseActor. Actors communicate with each other only via message passing (namely, pushing). Actors register themselves with a central map on initialization, which allows other actors to find them and allows a central function to run through them.
Now, I understand I'm missing, or rather glossing over, one important issue here, namely:
lack of yielding means a single actor can unfairly consume excessive time. But are cross-platform coroutines the primary thing that makes this hard in C++? (Windows, for instance, has fibers.)
Is there anything else I'm missing, though, or is the model really this obvious?
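The fairness problem can be made concrete with the kind of central loop the question describes, which runs through the registered actors in turn. The sketch below is single-threaded, uses `int` messages for brevity, and its `Scheduler`/`Actor` names are hypothetical. It caps how many messages each actor handles per turn, which also shows why a budget is not preemption: a handler that never returns still stalls every other actor.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

struct Actor {
    std::deque<int> mailbox;
    std::function<void(int)> handle;
};

// Round-robin "central function": each actor may handle at most `budget`
// messages per turn, so a busy actor cannot monopolise the loop between
// messages. It cannot interrupt a handler that is already running.
class Scheduler {
public:
    explicit Scheduler(std::size_t budget) : budget_(budget) {}

    Actor& spawn(std::function<void(int)> handler) {
        actors_.push_back(std::make_unique<Actor>());
        actors_.back()->handle = std::move(handler);
        return *actors_.back();  // heap-allocated, so the reference stays valid
    }

    // Runs turns until every mailbox is empty; returns the number of turns.
    std::size_t run() {
        std::size_t turns = 0;
        while (any_pending()) {
            ++turns;
            for (auto& a : actors_) {
                for (std::size_t i = 0; i < budget_ && !a->mailbox.empty(); ++i) {
                    int msg = a->mailbox.front();
                    a->mailbox.pop_front();
                    a->handle(msg);  // handlers may push to any mailbox
                }
            }
        }
        return turns;
    }

private:
    bool any_pending() const {
        for (const auto& a : actors_)
            if (!a->mailbox.empty()) return true;
        return false;
    }

    std::size_t budget_;
    std::vector<std::unique_ptr<Actor>> actors_;
};
```

With a budget of 1, an actor holding three messages and another holding one get interleaved, so the second actor's message is handled in the first turn instead of waiting behind the whole backlog.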
The C++ code does not deal with fairness, isolation, fault detection or distribution which are all things which Erlang brings as part of its actor model.
No actor is allowed to starve any other actor (fairness)
If one actor crashes, it should only affect that actor (isolation)
If one actor crashes, other actors should be able to detect and react to that crash (fault detection)
Actors should be able to communicate over a network as if they were on the same machine (distribution)
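To see how far plain C++ gets on fault detection, here is a rough sketch of a monitor: the actor's handler runs inside try/catch, and when it throws, the actor is marked dead and its monitors are told why. `MonitoredActor` is a hypothetical name. Note the limit: only C++ exceptions are caught, so a segfault or a stray write into another actor's memory is not contained, which is exactly the isolation point in the list.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Rough analogue of Erlang monitors, limited to exceptions.
class MonitoredActor {
public:
    using Monitor = std::function<void(const std::string& reason)>;

    explicit MonitoredActor(std::function<void(int)> handler)
        : handler_(std::move(handler)) {}

    void monitor(Monitor m) { monitors_.push_back(std::move(m)); }

    bool alive() const { return alive_; }

    // Delivers one message; returns false if the actor is (or just became) dead.
    bool deliver(int msg) {
        if (!alive_) return false;
        try {
            handler_(msg);
            return true;
        } catch (const std::exception& e) {
            die(e.what());
        } catch (...) {
            die("unknown exception");
        }
        return false;
    }

private:
    void die(const std::string& reason) {
        alive_ = false;
        for (auto& m : monitors_) m(reason);  // notify every watcher once
    }

    std::function<void(int)> handler_;
    std::vector<Monitor> monitors_;
    bool alive_ = true;
};
```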
Also, the BEAM SMP emulator brings just-in-time scheduling of the actors: it moves them to whichever core currently has the least utilization, and it hibernates the scheduler threads on cores that are no longer needed.
In addition all the libraries and tools written in Erlang can assume that this is the way the world works and be designed accordingly.
These things are not impossible to do in C++, but they get increasingly hard when you add the fact that Erlang works on almost all of the major hardware and OS configurations.
edit: Just found a description by Ulf Wiger of what he sees as Erlang-style concurrency.
I don't like to quote myself, but from Virding's First Rule of Programming
Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang.
With respect to Greenspun. Joe (Armstrong) has a similar rule.
The problem is not to implement actors, that's not that difficult. The problem is to get everything working together: processes, communication, garbage collection, language primitives, error handling, etc ... For example using OS threads scales badly so you need to do it yourself. It would be like trying to "sell" an OO language where you can only have 1k objects and they are heavy to create and use. From our point of view concurrency is the basic abstraction for structuring applications.
Getting carried away so I will stop here.
This is actually an excellent question, and has received excellent answers that perhaps are yet unconvincing.
To add shade and emphasis to the other great answers already here, consider what Erlang takes away (compared to traditional general purpose languages such as C/C++) in order to achieve fault-tolerance and uptime.
First, it takes away locks. Joe Armstrong's book lays out this thought experiment: suppose your process acquires a lock and then immediately crashes (a memory glitch causes the process to crash, or the power fails to part of the system). The next time a process waits for that same lock, the system has just deadlocked. This could be an obvious lock, as in the AquireScopedLock() call in the sample code; or it could be an implicit lock acquired on your behalf by a memory manager, say when calling malloc() or free().
In any case, your process crash has now stopped the entire system from making progress. Fini. End of story. Your system is dead. Unless you can guarantee that every library you use in C/C++ never calls malloc and never acquires a lock, your system is not fault tolerant. Erlang systems can and do kill processes at will when under heavy load in order to make progress, so at scale your Erlang processes must be killable (at any single point of execution) in order to maintain throughput.
There is a partial workaround: using leases everywhere instead of locks, but you have no guarantee that all the libraries you use also do this. And the logic and reasoning about correctness get really hairy quickly. Moreover, leases recover slowly (only after the timeout expires), so your entire system just got really slow in the face of failure.
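The difference between a lock and a lease after the holder crashes can be simulated deterministically, using a tick counter in place of a real clock. Both types below are hypothetical, single-threaded illustrations; a real lease would need atomic updates, clock handling, and renewal by a live holder.

```cpp
#include <cstdint>
#include <optional>

using Tick = std::uint64_t;  // simulated time, for determinism

// Plain lock: only the holder can release it. If the holder dies without
// releasing, every later try_acquire fails forever (the deadlock above).
struct PlainLock {
    std::optional<int> owner;
    bool try_acquire(int who, Tick) {
        if (owner) return false;
        owner = who;
        return true;
    }
    void release(int who) { if (owner == who) owner.reset(); }
};

// Lease: the grant expires `ttl` ticks after it was taken, so a crashed
// holder blocks others only until the lease runs out. Renewal is omitted.
struct Lease {
    Tick ttl;
    std::optional<int> owner;
    Tick expires_at = 0;
    bool try_acquire(int who, Tick now) {
        if (owner && now < expires_at) return false;  // still validly held
        owner = who;                                  // free or expired: take it
        expires_at = now + ttl;
        return true;
    }
    void release(int who) { if (owner == who) owner.reset(); }
};
```

If process 1 takes either one at tick 0 and then dies, the plain lock stays taken at any later tick, while the lease becomes available to process 2 at tick `ttl`; until then everyone waits, which is the slow recovery the answer describes.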
Second, Erlang takes away static typing, which in turn enables hot code swapping and running two versions of the same code simultaneously. This means you can upgrade your code at runtime without stopping the system. This is how systems stay up for nine 9s (99.9999999% availability, or about 32 msec of downtime per year): they are simply upgraded in place. Your C++ functions will have to be manually re-linked in order to be upgraded, and running two versions at the same time is not supported. Code upgrades require system downtime, and if you have a large cluster that cannot run more than one version of code at once, you'll need to take the entire cluster down at once. Ouch. And in the telecom world, not tolerable.
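In C++, the closest everyday approximation is to call code through an indirection that can be swapped at runtime, so callers already in flight finish on the old version while new calls get the new one. The sketch below (hypothetical `SwappableHandler`) shows only that version-switch idea; it does not load new compiled code or migrate state, which is the hard part Erlang's runtime handles for you.

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// Callers grab a snapshot of the current version and keep using it even if
// an upgrade lands mid-call, so old and new code briefly run side by side.
class SwappableHandler {
public:
    using Fn = std::function<int(int)>;

    explicit SwappableHandler(Fn initial)
        : current_(std::make_shared<Fn>(std::move(initial))) {}

    std::shared_ptr<const Fn> snapshot() const {
        std::lock_guard lock(m_);
        return current_;  // shared ownership keeps the old version alive
    }

    void upgrade(Fn next) {
        std::shared_ptr<const Fn> fresh = std::make_shared<Fn>(std::move(next));
        std::lock_guard lock(m_);
        current_ = std::move(fresh);
    }

    int call(int x) const { return (*snapshot())(x); }

private:
    mutable std::mutex m_;
    std::shared_ptr<const Fn> current_;
};
```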
In addition, Erlang takes away shared memory and shared garbage collection; each lightweight process is garbage collected independently. This is a simple extension of the first point, but it emphasizes that for true fault tolerance you need processes that are not interlocked in terms of dependencies. It also means your GC pauses are tolerable for big systems (small, compared with Java pausing for half an hour while an 8GB GC completes).
There are actual actor libraries for C++:
http://actor-framework.org/
http://www.theron-library.com/
And a list of some libraries for other languages.
It is a lot less about the actor model and a lot more about how hard it is to properly write something analogous to OTP in C++. Also, different operating systems provide radically different debugging and system tooling, and Erlang's VM and several language constructs support a uniform way of figuring out just what all those processes are up to which would be very hard to do in a uniform way (or maybe do at all) across several platforms. (It is important to remember that Erlang/OTP predates the current buzz over the term "actor model", so in some cases these sort of discussions are comparing apples and pterodactyls; great ideas are prone to independent invention.)
All this means that while you certainly can write an "actor model" suite of programs in another language (I know, I have done this for a long time in Python, C and Guile without realizing it before I encountered Erlang, including a form of monitors and links, and before I'd ever heard the term "actor model"), understanding what the processes your code spawns are actually doing, and what is happening among them, is extremely difficult. Erlang enforces rules that an OS simply can't without major kernel overhauls, and those overhauls would probably not be beneficial overall. These rules manifest themselves as both general restrictions on the programmer (which can always be gotten around if you really need to) and basic promises the system guarantees for the programmer (which can be deliberately broken if you really need to also).
For example, it enforces that two processes cannot share state to protect you from side effects. This does not mean that every function must be "pure" in the sense that everything is referentially transparent (obviously not, though making as much of your program referentially transparent as practical is a clear design goal of most Erlang projects), but rather that two processes aren't constantly creating race conditions related to shared state or contention. (This is more what "side effects" means in the context of Erlang, by the way; knowing that may help you decipher some of the discussion questioning whether Erlang is "really functional or not" when compared with Haskell or toy "pure" languages.)
On the other hand, the Erlang runtime guarantees delivery of messages. This is something sorely missed in an environment where you must communicate purely over unmanaged ports, pipes, shared memory and common files which only the OS kernel manages (and OS kernel management of these resources is necessarily extremely minimal compared to what the Erlang runtime provides). This doesn't mean that Erlang guarantees RPC (anyway, message passing is not RPC, nor is it method invocation!), it doesn't promise that your message is addressed correctly, and it doesn't promise that a process you're trying to send a message to exists or is alive, either. It just guarantees delivery if the thing you're sending to happens to be valid at that moment.
Built on this promise is the promise that monitors and links are accurate. And based on that the Erlang runtime makes the entire concept of "network cluster" sort of melt away once you grasp what is going on with the system (and how to use erl_connect...). This permits you to hop over a set of tricky concurrency cases already, which gives one a big head start on coding for the successful case instead of getting mired in the swamp of defensive techniques required for naked concurrent programming.
So it's not really about needing Erlang, the language; it's about the runtime and OTP already existing, being expressed in a rather clean way, and implementing anything close to it in another language being extremely hard. OTP is just a hard act to follow. In the same vein, we don't really need C++ either; we could just stick to raw binary input and Brainfuck and consider assembler our high-level language. We also don't need trains or ships, as we all know how to walk and swim.
All that said, the VM's bytecode is well documented, and a number of alternative languages have emerged that compile to it or work with the Erlang runtime. If we break the question into a language/syntax part ("Do I have to understand Moon Runes to do concurrency?") and a platform part ("Is OTP the most mature way to do concurrency, and will it guide me around the trickiest, most common pitfalls to be found in a concurrent, distributed environment?") then the answer is ("no", "yes").
Casablanca is another new kid on the actor model block. A typical asynchronous accept looks like this:
PID replyTo;
NameQuery request;
accept_request().then([=](std::tuple<NameQuery,PID> request)
{
    // request holds (query, sender): reply to the sender with the requested name
    if (std::get<0>(request) == FirstName)
        std::get<1>(request).send("Niklas");
    else
        std::get<1>(request).send("Gustafsson");
});
(Personally, I find that CAF does a better job at hiding the pattern matching behind a nice interface.)
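For comparison, standard C++17 can express a closed set of message types with `std::variant` and dispatch on them with `std::visit`, which is roughly what a receive with pattern matching does. This is neither Casablanca's nor CAF's API; the message structs and `handle_request` are made up for illustration, reusing the names from the snippet.

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Each message kind is its own type; the variant is the "mailbox" type.
struct FirstNameQuery {};
struct LastNameQuery {};
struct Ping { int seq; };
using Request = std::variant<FirstNameQuery, LastNameQuery, Ping>;

// std::visit picks the branch matching the message actually held.
std::string handle_request(const Request& req) {
    return std::visit([](const auto& msg) -> std::string {
        using T = std::decay_t<decltype(msg)>;
        if constexpr (std::is_same_v<T, FirstNameQuery>) {
            return "Niklas";
        } else if constexpr (std::is_same_v<T, LastNameQuery>) {
            return "Gustafsson";
        } else {
            return "pong " + std::to_string(msg.seq);
        }
    }, req);
}
```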

Error handling design problem on collection of items

I have a collection of items and an operation on them. This operation is part of a remote call between client and server, and it should run on all items at once. On the server side it runs on each item in turn and may fail or succeed for each one. I need to know which items succeeded and which failed. I guess this is a rather common case and there are good solutions to it. How should I design it?
it should run on all items at once
You will hate your life if you don't read into this as a design requirement. All or nothing is the right way to handle it. It will simplify everything you do.
If that isn't an option, just do the dumbest thing possible. Wrap each call in a try/catch and give some report. Chances are no one will be able to consume the report, which is another reason all or nothing is the right thing to do.
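Both options can be sketched briefly. `run_each` is the try/catch-per-item approach and returns a report of which indices succeeded and why the others failed; `run_all_or_nothing` applies every item to a copy of the state and commits only if none of them fail. The names are hypothetical, and copying the whole state is just the simplest way to get atomicity in a sketch; a real server would more likely lean on a database transaction.

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>  // ops are expected to throw std::exception subclasses
#include <string>
#include <utility>
#include <vector>

struct BatchReport {
    std::vector<std::size_t> succeeded;                       // item indices
    std::vector<std::pair<std::size_t, std::string>> failed;  // index, reason
};

// Best effort: try each item independently and record the outcome.
template <typename Item, typename Op>
BatchReport run_each(const std::vector<Item>& items, Op op) {
    BatchReport report;
    for (std::size_t i = 0; i < items.size(); ++i) {
        try {
            op(items[i]);
            report.succeeded.push_back(i);
        } catch (const std::exception& e) {
            report.failed.emplace_back(i, e.what());
        }
    }
    return report;
}

// All or nothing: work on a scratch copy, commit only if every item succeeds.
template <typename State, typename Item, typename Op>
bool run_all_or_nothing(State& state, const std::vector<Item>& items, Op op) {
    State scratch = state;
    try {
        for (const auto& item : items) op(scratch, item);
    } catch (const std::exception&) {
        return false;  // `state` was never touched
    }
    state = std::move(scratch);
    return true;
}
```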
edit:
To elaborate: when batching, writing simple logic to report errors is fine, but writing logic to recover from errors is very complicated. I've never seen a system handle recovery well when batching. I'm sure there are some corner cases where each item is completely independent, at which point it doesn't matter that one or another failed, but that is usually not the case.
Generally, I expect any errors that happen during a batching operation to not be critical. By that I mean the system should be able to ignore errors and continue operating as if the message that caused the error never existed.
If it's really vital that these messages get processed, then I would definitely try for all or nothing.

How to get a debug flow of execution in C++

I work on a global trading system which supports many users. Each user can book, amend, edit and delete trades. The system is regulated by a central deal capture service, which informs all the users of any updates that occur.
The problem comes when we have crashes: because the production environment is impossible to re-create on a test system, I have to rely on crash dumps and log files.
However this doesn't tell me what the user has been doing.
I'd like a system that would (at the time of crashing) dump out a history of what the user has been doing. Anything that I add has to go into the live environment so it can't impact performance too much.
Idea-wise, I was thinking of a MACRO at the top of each function which acted like a stack trace (only I could supply additional user information, like trade IDs, user dialog choices, etc.). The system would record stack traces (on a per-thread basis) and keep a history in a cyclic buffer (varying in size, depending on how much history you wanted to capture). Then on a crash, I could dump this history.
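A minimal version of the per-thread cyclic buffer described here might look like the following. `TraceRing`, `this_thread_trace`, and `TRACE` are hypothetical names, the capacity is tiny for the demo, and each record builds a `std::string`, so a production version would probably store fixed-size records to keep the cost down.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Per-thread breadcrumb trail in a fixed-size ring buffer. No locking is
// needed because each thread only ever writes to its own buffer.
class TraceRing {
public:
    static constexpr std::size_t kCapacity = 4;  // tiny for the demo

    void record(const char* file, int line, const std::string& detail) {
        slots_[next_ % kCapacity] =
            std::string(file) + ":" + std::to_string(line) + " " + detail;
        ++next_;  // oldest entry is overwritten once the ring is full
    }

    // Oldest-to-newest copy of the entries currently retained.
    std::vector<std::string> dump() const {
        std::vector<std::string> out;
        std::size_t count = next_ < kCapacity ? next_ : kCapacity;
        for (std::size_t i = next_ - count; i < next_; ++i)
            out.push_back(slots_[i % kCapacity]);
        return out;
    }

private:
    std::array<std::string, kCapacity> slots_;
    std::size_t next_ = 0;  // total records ever written
};

inline TraceRing& this_thread_trace() {
    thread_local TraceRing ring;
    return ring;
}

// Put at the top of a function with whatever context matters (trade id, ...).
#define TRACE(detail) this_thread_trace().record(__FILE__, __LINE__, (detail))
```

Dumping is the tricky part after a real crash: code running inside a signal handler should stick to async-signal-safe calls, so building strings or vectors there is risky, and writing out pre-formatted buffers is safer.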
I'd really like to hear if anyone has a better solution, or if anyone knows of an existing framework?
Thanks
Rich
Your solution sounds pretty reasonable, though perhaps rather than relying on viewing your audit trail in the debugger, you can trigger it being printed with atexit() handlers (note that these run only on normal exit, not when the process is killed by a signal such as SIGSEGV). Something as simple as a stack of strings containing __FILE__, __LINE__ and pthread_self() might be good enough.
You could possibly use an existing undo framework, as it's similar to an audit trail, but it's going to be more heavyweight than you want. It will likely be based on the command pattern and expect you to implement execute() methods, though I suppose you could just leave them blank.
Trading systems usually don't suffer the performance hit of instrumentation of that level. C++ based systems, in particular, tend to sacrifice the ease of debugging for performance. Otherwise, more companies would be developing such systems in Java/C#.
I would avoid an attempt to introduce stack traces into C++. I am also not confident that you could introduce such a system in a way that would not affect the behavior of the program in some way (e.g., affect threading behavior).
It might, IMHO, be preferable to log the external inputs (e.g., user GUI actions and message traffic) rather than attempt to capture things internally in the program. In that case, you might have a better chance of replicating the failure and debugging it.
Are you currently logging all network traffic to/from the client? Many FIX based systems record this for regulatory purposes. Can you easily log your I/O?
I suggest creating another (circular) log file that contains your detailed information. Beware that this file will fill up, and wrap around, much faster than your other log files.
Another method is to save the last N transactions. Write a program that reads the transaction log and feeds the data into your virtual application. This may help reproduce the cause. I've used this technique with embedded systems before.
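The record-and-replay idea could start as simply as keeping the last N external inputs and feeding them, oldest first, into whatever entry point the application uses. `InputEvent`, `InputRecorder`, and the field names are hypothetical, and persisting the events to disk (which a real system would need to survive the crash) is left out.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// One external input: a user action or an inbound message.
struct InputEvent {
    std::string user;
    std::string action;  // e.g. "book", "amend"
    std::string trade_id;
};

class InputRecorder {
public:
    explicit InputRecorder(std::size_t keep_last) : keep_last_(keep_last) {}

    void record(InputEvent e) {
        events_.push_back(std::move(e));
        if (events_.size() > keep_last_) events_.pop_front();  // drop oldest
    }

    // Feeds the retained events, oldest first, into `apply` on a test box.
    void replay(const std::function<void(const InputEvent&)>& apply) const {
        for (const auto& e : events_) apply(e);
    }

private:
    std::size_t keep_last_;
    std::deque<InputEvent> events_;
};
```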

Options for a message passing system for a game

I'm working on an RTS game in C++ targeted at handheld hardware (Pandora). For reference, the Pandora has a single ARM processor at ~600 MHz and runs Linux. We're trying to settle on a good message passing system (both internal and external), and this is new territory for me.
It may help to give an example of a message we'd like to pass. A unit may make this call to load its models into memory:
sendMessage("model-loader", "load-model", my_model.path, model_id );
In return, the unit could expect some kind of message containing a model object for the particular model_id, which can then be passed to the graphics system. Please note that this sendMessage function is in no way final. It just reflects my current understanding of message passing systems, which is probably not correct :)
From what I can tell there are two pretty distinct choices. One is to pass messages in memory, and only pass through the network when you need to talk to an external machine. I like this idea because the overhead seems low, but the big problem here is it seems like you need to make extensive use of mutex locking on your message queues. I'd really like to avoid excess locking if possible. I've read a few ways to implement simple queues without locking (by relying on atomic int operations) but these assume there is only one reader and one writer for a queue. This doesn't seem useful to our particular case, as an object's queue will have many writers and one reader.
The other choice is to go completely over the network layer. This has some fun advantages like getting asynchronous message passing pretty much for free. Also, we gain the ability to pass messages to other machines using the exact same calls as passing locally. However, this solution rubs me the wrong way, probably because I don't fully understand it :) Would we need a socket for every object that is going to be sending/receiving messages? If so, this seems excessive. A given game will have thousands of objects. For a somewhat underpowered device like the Pandora, I fear that abusing the network like that may end up being our bottleneck. But, I haven't run any tests yet, so this is just speculation.
MPI seems to be popular for message passing but it sure feels like overkill for what we want. This code is never going to touch a cluster or need to do heavy calculation.
Any insight into what options we have for accomplishing this is much appreciated.
The network will be using locking as well. It will just be where you cannot see it, in the OS kernel.
What I would do is create your own message queue object that you can rewrite as you need to. Start simple and make it better as needed. That way you can make it use any implementation you like behind the scenes without changing the rest of your code.
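A minimal sketch of that advice: the game code depends only on an abstract `MessageQueue`, and the first implementation is the simplest correct one, a mutex around a `std::deque`, which already handles many writers and one reader. All names here, including the `Msg` fields modelled loosely on the `sendMessage` call in the question, are hypothetical.

```cpp
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

struct Msg {
    std::string to;    // e.g. "model-loader"
    std::string verb;  // e.g. "load-model"
    std::string arg;
};

// The rest of the game talks only to this interface, so the implementation
// behind it (mutex, lock-free, sockets) can be swapped later.
class MessageQueue {
public:
    virtual ~MessageQueue() = default;
    virtual void push(Msg m) = 0;              // any number of writers
    virtual std::optional<Msg> try_pop() = 0;  // single reader
};

// Simplest correct starting point: one mutex around a deque.
class LockedQueue : public MessageQueue {
public:
    void push(Msg m) override {
        std::lock_guard lock(m_);
        q_.push_back(std::move(m));
    }
    std::optional<Msg> try_pop() override {
        std::lock_guard lock(m_);
        if (q_.empty()) return std::nullopt;
        Msg m = std::move(q_.front());
        q_.pop_front();
        return m;
    }

private:
    std::mutex m_;
    std::deque<Msg> q_;
};
```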
Look at several possible implementations that you might like to do in the future and design your API so that you can handle them all efficiently if you decide to implement in those terms.
If you want really efficient message passing look at some of the open source L4 microkernels. Those guys put a lot of time into fast message passing.
Since this is a small platform, it might be worth timing both approaches.
However, barring some kind of big speed issue, I'd always go for the approach that is simpler to code. That is probably going to be using the network stack, as it will be the same code no matter where the recipient is, and you won't have to manually code and debug your mutual exclusions, message buffering, allocations, etc.
If you find out it is too slow, you can always recode the local stuff using memory later. But why waste the time doing that up front if you might not have to?
I agree with Zan's recommendation to pass messages in memory whenever possible.
One reason is that you can pass complex C++ objects without needing to marshal and unmarshal (serialize and deserialize) them.
The cost of protecting your message queue with a semaphore is most likely going to be less than the cost of making networking code calls.
If you protect your message queue with a lock-free algorithm (using atomic operations, as you alluded to yourself), you can avoid a lot of context switches into and out of the kernel.
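For the many-writers, one-reader case in the question, one lock-free option that avoids the single-producer restriction is an atomic linked list: producers push with compare-and-swap, and the single consumer detaches the whole list with one `exchange` and reverses it into FIFO order. This is a sketch (hypothetical `MpscQueue`, one heap allocation per message, unbounded), not a tuned implementation; on a device like the Pandora the allocation cost may matter more than the atomics.

```cpp
#include <algorithm>
#include <atomic>
#include <utility>
#include <vector>

template <typename T>
class MpscQueue {
    struct Node {
        T value;
        Node* next;
    };
    std::atomic<Node*> head_{nullptr};  // newest node first

public:
    ~MpscQueue() { drain(); }  // frees any nodes still queued

    // Safe to call from many producer threads at once.
    void push(T value) {
        Node* n = new Node{std::move(value), head_.load(std::memory_order_relaxed)};
        // On failure, compare_exchange_weak stores the current head in n->next.
        while (!head_.compare_exchange_weak(n->next, n,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    // Single consumer only. Takes everything queued so far, oldest first.
    // Only the consumer frees nodes, and only after detaching them, so
    // producers never touch freed memory.
    std::vector<T> drain() {
        Node* n = head_.exchange(nullptr, std::memory_order_acquire);
        std::vector<T> out;
        while (n) {
            out.push_back(std::move(n->value));
            Node* next = n->next;
            delete n;
            n = next;
        }
        std::reverse(out.begin(), out.end());  // newest-first -> FIFO
        return out;
    }
};
```

Because the consumer takes the whole list in one `exchange`, it never competes with producers over individual nodes, which is what keeps this much simpler than a general multi-consumer queue.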