Concurrency, general guidance on application design - concurrency

I asked this question over at the rakulang sub reddit and was suggested to post here:
I keep falling back to Perl 5 for a lot of my work so I can "get it done" simply because I am so much more familiar with Perl 5.
However, I have the need to build something that will subscribe to multiple MQTT topics (similar conceptually to a websocket subscription) and process data, keeping a lot of this data as internal state. A concurrency project. So I see this as a great opportunity to immerse myself in some Raku :)
So far I understand I need to create a supply / given / when setup, but I'm not totally sure how I will deal with each stream of data's state received over each topic. And a reply from my reddit post suggested Cro, which I think will fit the bill very nicely. But there are still some implementation details I am unclear on.
For example, a message payload arrives on topic foo, I want to add data from that payload to an existing array (my internal state).
But this subscription to topics will be happening for an 'undetermined' number topics, and will adjust at runtime. So it will not be possible to have a hard coded array to store and manage this data in like #foo
In a non concurrent world, I could create a hash (associative array) with a key that matches my topic name, %data<foo> for example, and store the array there.
However in the world of concurrency, I would need an answer to the mutex problem. If each member of the hash is having it's data modified concurrently by different threads, then I would think the entire hash would require a lock.
This has potential to result in a deadlock or poor performance at the very least (I am expecting some several hundred messages per second, across multiple topic subscriptions).
Perhaps I can create a variable 'dynamically' (or better yet, object) based the topic name, so there is a separate memory address for each array of data. However, I'm not sure how to do that, or indeed if that is the 'best' approach in this scenario.
In summary, Question 1: Is creating an object or variable dynamically for this purpose a sound pattern?
Question 2: Is there a design approach I am simply not aware of that would be more suitable?
So, any specific advice would be greatly appreciated. I feel like this is a case of "I don't know what I don't know" type of problem!
Thanks!

Related

Basics to how a distributed, consistent key-value storage system return the latest key when dealing with concurrent requests?

I am getting up to speed on distributed systems (studying for an upcoming interview), and specifically on the basics for how a distributed system works for a distributed, consistent key-value storage system managed in memory.
My specific questions I am stuck on that I would love just a high level answer on if it's no trouble:
#1
Let's say we have 5 servers that are responsible to act as readers, and I have one writer. When I write the value 'foo' to the key 'k1', I understand it has to propagate to all of those servers so they all store the value 'foo' for the key k1. Is this correct, or does the writer only write to the majority (quorum) for this to work?
#2
After #1 above takes place, let's say concurrently a read comes in for k1, and a write comes in to replace 'foo' with 'bar', however not all of the servers are updated with 'bar. This means some are 'foo' and some are 'bar'. If I had lots of concurrent reads, it's conceivable some would return 'foo' and some 'bar' since it's not updated everywhere yet.
When we're talking about eventual consistency, this is expected, but if we're talking about strong consistency, how do you avoid #2 above? I keep seeing content about quorum and timestamps but on a high level, is there some sort of intermediary that sorts out what the correct value is? Just wanted to get a basic idea first before I dive in more.
Thank you so much for any help!
In doing more research, I found that "consensus algorithms" such as Paxos or Raft is the correct solution here. The idea is that your nodes need to arrive at a consensus of what the value is. If you read up on Paxos or Raft you'll learn everything you need to - it's quite complex to explain here, but there are videos/resources out there that cover this well.
Another thing I found helpful was learning more about Dynamo and DynamoDB. They handle the subject as well, although not strongly consistent/distributed.
Hope this helps someone, and message me if you'd like more details!
Read the CAP theorem will help you solve your problem. You are looking for consistence and network partition in this question, so you have to sacrifice the availability. The system needs to block and wait until all nodes finish writing. In other word, the change can not be read before all nodes have updated it.
In theoretical computer science, the CAP theorem, also named Brewer's
theorem after computer scientist Eric Brewer, states that any
distributed data store can only provide two of the following three
guarantees:
Consistency Every read receives the most recent write or an error.
Availability Every request receives a (non-error) response, without
the guarantee that it contains the most recent write.
Partition tolerance The system continues to operate despite an arbitrary number
of messages being dropped (or delayed) by the network between nodes.

(preferably boost) lock-free array/vector/map/etc?

Considering my lack of c++ knowledge, please try to read my intent and not my poor technical question.
This is the backbone of my program https://github.com/zaphoyd/websocketpp/blob/experimental/examples/broadcast_server/broadcast_server.cpp
I'm building a websocket server with websocket++ (and oh is websocket++ sweet. I highly recommend), and I can easily manipulate per user data thread-safely because it really doesn't need to be manipulated by different threads; however, I do want to be able to write to an array (I'm going to use the catch-all term "array" from weaker languages like vb, php, js) in one function thread (with multiple iterations that could be running simultanously) and also read in 1 or more threads.
Take stack as an example: if I wanted to have all of the ids (PRIMARY column of all articles) sorted in a particular way, in this case by net votes, and held in memory, I'm thinking I would have a function that's called in its' own boost::thread, fired whenever a vote on the site comes in to reorder the array.
How can I do this without locking & blocking? I'm 100% fine with users reading from an old array while another is being built, but I absolutely do not want their reads or the thread writes to ever fail/be blocked.
Does a lock-free array exist? If not, is there some way to build the new array in a temporary array and then write it to the actual array when the building is finished without locking & blocking?
Have you looked at Boost.Lockfree?
Uh, uh, uh. Complicated.
Look here (for an example): RCU -- and this is only about multiple reads along with ONE write.
My guess is that multiple writers at once are not going to work. You should rather look for a more efficient representation than an array, one that allows for faster updates. How about a balanced tree? log(n) should never block anything in a noticeable fashion.
Regarding boost -- I'm happy that it finally has proper support for thread synchronization.
Of course, you could also keep a copy and batch the updates. Then a background process merges the updates and copies the result for the readers.

Designing pipeline for processing objects

Recently I have been asked a simple design question in an interview:
Suppose there is some data that needs to be processed in a pipeline fashion for efficiency. Suppose there are five tasks to complete and each task's output acts as input to the next one, and once a task completes for an object, it can process next one.
How to design that system? How the next task will get triggered? How the data from one task can be given to the next task?
Any ideas? This was asked in a C++ interview. So, a C++ oriented design will be good.
This is an interview question, so they want you to think aloud and demonstrate the depth of your experience. There is no one "design" or even "answer", and as such you should give as much thought on different cases as you can.
You can spend an entire book on pipeline designs so I won't (and can't) list all aspects you want to watch out for, but here are a few common ones:
Watch out for bottlenecks
Having a common protocol between the pipeline tasks
Can the pipeline reject inputs, or even pass them backwards, how do you handle this
Does it require lots of type conversions
Can you parallelise it, and feed more data into the pipeline even before the first items have made it out

What is the defacto standard for sharing variables between programs in different languages?

I've never had formal training in this area so I'm wondering what do they teach in school (if they do).
Say you have two programs in written in two different languages: C++ and Python or some other combination and you want to share a constantly updated variable on the same machine, what would you use and why? The information need not be secured but must be isochronous should be reliable.
Eg. Program A will get a value from a hardware device and update variable X every 0.1ms, I'd like to be able to access this X from Program B as often as possible and obtain the latest values. Program A and B are written and compiled in two different (robust) languages. How do I access X from program B? Assume I have the source code from A and B and I do not want to completely rewrite or port either of them.
The method's I've seen used thus far include:
File Buffer - Read and write to a
single file (eg C:\temp.txt).
Create a wrapper - From A to B or B
to A.
Memory Buffer - Designate a specific
memory address (mutex?).
UDP packets via sockets - Haven't
tried it yet but looks good.
Firewall?
Sorry for just throwing this out there, I don't know what the name of this technique is so I have trouble searching.
Well you can write XML and use some basic message queuing (like rabbitMQ) to pass messages around
Don't know if this will be helpful, but I'm also a student, and this is what I think you mean.
I've used marshalling to get a java class and import it into a C# program.
With marshalling you use xml to transfer code in a way so that it can be read by other coding environments.
When asking particular questions, you should aim at providing as much information as possible. You have added a use case, but the use case is incomplete.
Your particular use case seems like a very small amount of data that has to be available at a high frequency 10kHz. I would first try to determine whether I can actually make both pieces of code part of a single process, rather than two different processes. Depending on the languages (missing from the question) it might even be simple, or turn the impossible into possible --depending on the OS (missing from the question), the scheduler might not be fast enough switching from one process to another, and it might impact the availability of the latest read. Switching between threads is usually much faster.
If you cannot turn them into a single process, then you will have to use some short of IPC (Inter Process Communication). Due to the frequency I would rule out most heavy weight protocols (avoid XML, CORBA) as the overhead will probably be too high. If the receiving end needs only access to the latest value, and that access may be less frequent than 0.1 ms, then you don't want to use any protocol that includes queueing as you do not want to read the next element in the queue, you only care about the last, if you did not read the element when it was good, avoid the cost of processing it when it is already stale --i.e. it does not make sense to loop extracting from the queue and discarding.
I would be inclined to use shared memory, or a memory mapped shared file (they are probably quite similar, depends on the platform missing from the question). Depending on the size of the element and the exact hardware architecture (missing from the question) you may be able to avoid locking with a mutex. As an example in current intel processors, read/write access to 32 bit integers from memory is guaranteed to be atomic if the variable is correctly aligned, so in that case you would not be locking.
At my school they teach CORBA. They shouldn't, it's an ancient hideous language from the eon of mainframes, it's a classic case of design-by-committee, every feature possible that you don't want is included, and some that you probably do (asynchronous calls?) aren't. If you think the c++ specification is big, think again.
Don't use it.
That said though, it does have a nice, easy-to-use interface for doing simple things.
But don't use it.
It almost always pass through C binding.

Options for a message passing system for a game

I'm working on an RTS game in C++ targeted at handheld hardware (Pandora). For reference, the Pandora has a single ARM processor at ~600Mhz and runs Linux. We're trying to settle on a good message passing system (both internal and external), and this is new territory for me.
It may help to give an example of a message we'd like to pass. A unit may make this call to load its models into memory:
sendMessage("model-loader", "load-model", my_model.path, model_id );
In return, the unit could expect some kind of message containing a model object for the particular model_id, which can then be passed to the graphics system. Please note that this sendMessage function is in no way final. It just reflects my current understanding of message passing systems, which is probably not correct :)
From what I can tell there are two pretty distinct choices. One is to pass messages in memory, and only pass through the network when you need to talk to an external machine. I like this idea because the overhead seems low, but the big problem here is it seems like you need to make extensive use of mutex locking on your message queues. I'd really like to avoid excess locking if possible. I've read a few ways to implement simple queues without locking (by relying on atomic int operations) but these assume there is only one reader and one writer for a queue. This doesn't seem useful to our particular case, as an object's queue will have many writers and one reader.
The other choice is to go completely over the network layer. This has some fun advantages like getting asynchronous message passing pretty much for free. Also, we gain the ability to pass messages to other machines using the exact same calls as passing locally. However, this solution rubs me the wrong way, probably because I don't fully understand it :) Would we need a socket for every object that is going to be sending/receiving messages? If so, this seems excessive. A given game will have thousands of objects. For a somewhat underpowered device like the Pandora, I fear that abusing the network like that may end up being our bottleneck. But, I haven't run any tests yet, so this is just speculation.
MPI seems to be popular for message passing but it sure feels like overkill for what we want. This code is never going to touch a cluster or need to do heavy calculation.
Any insight into what options we have for accomplishing this is much appreciated.
The network will be using locking as well. It will just be where you cannot see it, in the OS kernel.
What I would do is create your own message queue object that you can rewrite as you need to. Start simple and make it better as needed. That way you can make it use any implementation you like behind the scenes without changing the rest of your code.
Look at several possible implementations that you might like to do in the future and design your API so that you can handle them all efficiently if you decide to implement in those terms.
If you want really efficient message passing look at some of the open source L4 microkernels. Those guys put a lot of time into fast message passing.
Since this is a small platform, it might be worth timing both approaches.
However, barring some kind of big speed issue, I'd always go for the approach that is simpler to code. That is probably going to be using the network stack, as it will be the same code no matter where the recipient is, and you won't have to manually code and degug your mutual exclusions, message buffering, allocations, etc.
If you find out it is too slow, you can always recode the local stuff using memory later. But why waste the time doing that up front if you might not have to?
I agree with Zan's recommendation to pass messages in memory whenever possible.
One reason is that you can pass complex objects C++ without needing to marshal and unmarshal (serialize and de-serialize) them.
The cost of protecting your message queue with a semaphore is most likely going to be less than the cost of making networking code calls.
If you protect your message queue with some lock-free algorithm (using atomic operations as you alluded to yourself) you can avoid a lot a context switches into and out of the kernel.