How do you process messages in parallel while ensuring FIFO per entity?

Let's say you have an entity, say, "Person" in your system and you want to process events that modify various Person entities. It is important that:
Events for the same Person are processed in FIFO order
Multiple Person event streams are processed in parallel by different threads/processes
We have an implementation that solves this using a shared database and locks. Threads compete to acquire the lock for a Person and then process events in order after acquiring the lock. We'd like to move to a message queue to avoid polling and locking, which we feel would reduce load on the DB and simplify the implementation of the consumer code.
I've done some research into ActiveMQ, RabbitMQ, and HornetQ but I don't see an obvious way to implement this.
ActiveMQ supports consumer subscription wildcards, but I don't see a way to limit the concurrency on each queue to 1. If I could do that, then the solution would be straightforward:
Somehow tell the broker to allow a concurrency of 1 for all queues starting with: /queue/person.
Publisher writes event to queue using Person ID in the queue name. e.g.: /queue/person.20
Consumers subscribe to the queue using wildcards: /queue/person.>
Each consumer would receive messages for different person queues. If all person queues were in use, some consumers may sit idle, which is ok
After processing a message, the consumer sends an ACK, which tells the broker it's done with the message, and allows another message for that Person queue to be sent to another consumer (possibly the same one)
ActiveMQ came close: You can do wildcard subscriptions and enable "exclusive consumer", but that combination results in a single consumer receiving all messages sent to all matching queues, reducing your concurrency to 1 across all Persons. I feel like I'm missing something obvious.
Questions:
Is there a way to implement the above approach with any major message queue implementation? We are fairly open to options; the only requirement is that it run on Linux.
Is there a different way to solve the general problem that I'm not considering?
Thanks!

It looks like JMSXGroupID is what I'm looking for. From the ActiveMQ docs:
http://activemq.apache.org/message-groups.html
Their example use case with stock prices is exactly what I'm after. My only concern is what happens if the single consumer dies. Hopefully the broker will detect that and pick another consumer to associate with that group id.
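For reference, here is a minimal sketch of what publishing with JMSXGroupID looks like; the broker URL, queue name, and class name are placeholders, not anything from the original post:

import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class PersonEventPublisher {
    public static void main(String[] args) throws JMSException {
        // Broker URL and queue name are assumptions for this sketch.
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(session.createQueue("person.events"));

        TextMessage message = session.createTextMessage("update for person 20");
        // All messages sharing a group ID are delivered to one consumer, in FIFO order.
        // Per the ActiveMQ docs, if that consumer closes, the broker reassigns the
        // group to another consumer, which addresses the dead-consumer concern.
        message.setStringProperty("JMSXGroupID", "person-20");
        producer.send(message);

        connection.close();
    }
}

Nothing special is needed on the consumer side; the broker tracks the group-to-consumer assignment.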

One general way to solve this problem (if I understood your problem correctly) is to introduce some unique property of a Person (say, the database-level id of the Person) and use a hash of that property as the index of the FIFO queue to put events for that Person in.
Since the hash of that property can be unwieldy large (you can't afford 2^32 queues/threads), use only the N least significant bits of that hash.
Each FIFO queue should have a dedicated worker that operates on it -- voila, your requirements are satisfied!
This approach has one drawback -- your Persons must have well-distributed ids so that all queues see a more-or-less equal load. If you can't guarantee that, consider using a round-robin set of queues and tracking which Persons are currently being processed, to ensure sequential processing for the same Person.
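As a sketch of this approach (in Java, with invented names), the dispatcher below keeps 2^N in-memory queues, each drained by its own worker thread, and routes events by the low bits of the hashed Person id:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PartitionedDispatcher {
    private static final int N_BITS = 4;               // 2^4 = 16 queues/workers
    private static final int PARTITIONS = 1 << N_BITS;
    private final List<BlockingQueue<Runnable>> queues = new ArrayList<>();

    public PartitionedDispatcher() {
        for (int i = 0; i < PARTITIONS; i++) {
            BlockingQueue<Runnable> q = new LinkedBlockingQueue<>();
            queues.add(q);
            Thread worker = new Thread(() -> {         // dedicated worker per queue
                try {
                    while (true) q.take().run();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
    }

    public void submit(long personId, Runnable event) {
        // Keep only the N least significant bits of the hash; events for the
        // same Person therefore always land in the same FIFO queue.
        int partition = Long.hashCode(personId) & (PARTITIONS - 1);
        queues.get(partition).add(event);
    }
}

The drawback mentioned above shows up here directly: if the ids are skewed, some queues (and their workers) stay busy while others sit idle.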

If you already have a system that allows shared locks, why not have a lock for every queue, which consumers must acquire before they read from the queue?

Related

How Erlang processes access the mailbox concurrently

There is a lot of information about how to use the Erlang mailbox, but it is hard to find a paper or document describing how Erlang actually accesses the mailbox concurrently, internally, within the VM.
To my understanding, the Erlang VM must do locking or CAS operations to ensure message integrity. Is there any sophisticated method behind Erlang's curtain?
By mailbox I'm assuming you mean the process mailbox, the one messages are inserted into. Fun question!
There's some conversation here about the locking characteristics of the Erlang process message queue:
Just a curiosity: currently there is some kind of lock used when sending a message. Has anybody tried to implement a lock-free linked list:
http://www.amd64.org/fileadmin/user_upload/pub/epham08-asf-eval.pdf
Or am I just looking in the wrong place, and erts_smp_proc_lock is already using something like this?
The message queue already has this, sort of. The process that owns the message box has an "inner box" that it holds a lock on and an "outer box" that all senders compete for. So when lots of processes send to that process, the lock contention is on the tail of the queue on the "outer box". The mailbox owner is not concerned with it, though.
You might find reading the implementation of the BEAM process illustrative.
Short answer: yes, locking is done on the message queue, but it's complicated and optimized to reduce contention between scheduler threads.
There are several locks that protect the process structure. The most important ones for sending messages are the MSGQ lock and the MAIN lock. The MAIN lock protects the structure's fields while the process is operational; one of those fields is the outgoing queue. The MSGQ lock covers the linked list of incoming messages.
So, to send a message, we need to acquire the recipient's MSGQ lock and copy the message from our queue (guarded by MAIN) to the queue of incoming messages of the other process.
Note how asynchronous this send operation is: processes do not block each other! (Most of the time. ;)
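For illustration only (and in Java rather than the VM's C), the inner box / outer box idea from the quote above can be sketched like this: senders contend on a single lock for the outer list, while the owner drains it into a private inner list that it reads without locking:

import java.util.ArrayDeque;
import java.util.Deque;

public class TwoListMailbox<M> {
    private final Object outerLock = new Object();
    private Deque<M> outer = new ArrayDeque<>();   // senders append here
    private Deque<M> inner = new ArrayDeque<>();   // owner reads here, lock-free

    public void send(M message) {
        synchronized (outerLock) {                 // contention only among senders
            outer.addLast(message);
        }
    }

    /** Called only by the owning thread. */
    public M receive() {
        if (inner.isEmpty()) {
            synchronized (outerLock) {             // swap the two lists in O(1)
                Deque<M> tmp = outer;
                outer = inner;
                inner = tmp;
            }
        }
        return inner.pollFirst();                  // null if still empty
    }
}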

Alternative to JMS messaging for concurrent processing in TorqueBox

I have an application that periodically calls some service for data (using TorqueBox schedulers), and when a set of data is available it should process each "data record" separately.
I'd like to process those records concurrently for better performance, and my first thought was to set up a JMS queue (available in TorqueBox out of the box) so that the scheduled job would put all the received data in the queue, and each record would be picked up by one of multiple connected receivers for processing.
But isn't it overengineering to put a JMS queue between elements of the same application? Any other approaches you could suggest here?
A JMS queue may not be a bad solution at all; try it out and see how it works for you. When they are as easy to use as in TorqueBox, it doesn't have to be overengineering.
If you want something less involved I recommend using Java's own BlockingQueues, either LinkedBlockingQueue or ArrayBlockingQueue depending on the exact use case.
These are just regular collections like arrays or hashes, so you'll need to create them somewhere and pass them into the components that you want to be able to publish and consume from them. They also do not have any concept of acknowledgement like JMS queues have.
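Since TorqueBox runs on JRuby, you can use these classes straight from Ruby; a plain-Java sketch of the pattern (class and record names invented) looks like this:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class RecordPipeline {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // A small pool of consumers; each take() blocks until a record arrives.
        for (int i = 0; i < 4; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        String record = queue.take();
                        System.out.println(Thread.currentThread().getName()
                                + " processed " + record);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.setDaemon(true);
            worker.start();
        }

        // In the real app, the scheduled job would publish like this:
        for (int i = 0; i < 10; i++) {
            queue.add("record-" + i);
        }
        Thread.sleep(1000);   // demo only: give the daemon workers time to drain
    }
}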
How about using a Java message queue (as you mentioned), since HornetQ is part of JBoss/TorqueBox, and then using a message processor to handle the messages? You can also specify the level of concurrency in torquebox.rb (or .yml).
Your_Scheduled_Job -> /queues/my_queue -> TorqueBox::Messaging::MessageProcessor
In your config/torquebox.rb file you can specify the concurrency and the message processor:
queue '/queues/my_queue' do
  processor MyMessageProcessor do
    concurrency 5
  end
end
The message processor will process the messages on the queue concurrently without any further steps.
I'm also still experimenting with TorqueBox and Ruby concurrency, and this is something I'm trying to implement these days...

Pull all items from a message queue

There is an application which connects to multiple sockets. It has two threads, a receiving thread and a processing thread, so I created a message queue between them. Since the messages do not have to be processed one by one, all of them can be pulled from the queue at once to update the internal data structure, and only then does processing start. Currently I have written my own message queue, and I am wondering if there is a better option. PS: performance is critical.
EDIT: "Better" means good performance, easy to use, and guaranteed delivery. Optional: use ZeroMQ to do so.
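If the application can use Java (or call into it), one ready-made fit for this "pull everything, then process" pattern is java.util.concurrent.BlockingQueue with drainTo; a rough sketch, with the class and method names invented:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DrainingConsumer {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    public void receive(String msg) {            // called by the receiving thread
        queue.add(msg);
    }

    public void processAll() throws InterruptedException {
        List<String> batch = new ArrayList<>();
        batch.add(queue.take());                 // block until at least one message
        queue.drainTo(batch);                    // then grab everything else queued
        // update the internal data structure from the whole batch, then process
        for (String msg : batch) {
            System.out.println("processing " + msg);
        }
    }
}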

C++: synchronize 5 consumers to 1 producer (multithreaded)

I have five consumers and one producer. The five consumers each output different data from the one producer for ~10 ms. During those 10 ms the producer prepares the parameters for the next output. When the output parameters are set, I want to set a flag instructing the consumers to begin the next output. I only want the producer to produce while the consumers are outputting the data.
I am unsure how to synchronize the five consumers and the single producer. I currently have two flags, runFlag and doneFlag. When a consumer reads the new data I want to set runFlag to true so the calculations begin, and I want to set doneFlag to false, as the calculations have not completed. However, if I set doneFlag to false in one consumer it may be false in another consumer before that consumer can check the flag.
I hope my question is specific enough. Please let me know if there's anything else I can provide. Also, I'm just looking for a general idea of how to proceed. I know there are multiple ways to do this, but I'm unsure which method will work the best.
Thanks!
You will need 2 events and an integer reference count.
When the producer has produced something, it:
initializes read_count = 0;
sets the readme event;
starts to wait on the completed event.
Consumers wait on the readme event. After doing their work, they ATOMICALLY increment read_count. When read_count reaches the number of consumers (5 in your case), the last consumer sets the completed event. The producer can then continue, and the cycle repeats.
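The same scheme can be expressed compactly with a reusable barrier; here is a sketch in Java (the question is C++, but the structure carries over), where CyclicBarrier stands in for the event-plus-atomic-counter pair:

import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class LockstepPipeline {
    static final int CONSUMERS = 5;
    // Producer plus consumers meet at each barrier, so parties = CONSUMERS + 1.
    static final CyclicBarrier start = new CyclicBarrier(CONSUMERS + 1);
    static final CyclicBarrier done  = new CyclicBarrier(CONSUMERS + 1);

    public static void main(String[] args) {
        for (int i = 0; i < CONSUMERS; i++) {
            final int id = i;
            new Thread(() -> {
                try {
                    while (true) {
                        start.await();        // plays the role of "readme"
                        System.out.println("consumer " + id + " outputs data (~10 ms)");
                        done.await();         // 5th arrival releases the producer
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    return;
                }
            }).start();
        }

        try {
            while (true) {
                // ... prepare the parameters for the next output here ...
                start.await();                // release all five consumers
                done.await();                 // wait until all five have finished
            }
        } catch (InterruptedException | BrokenBarrierException e) {
            Thread.currentThread().interrupt();
        }
    }
}

The barrier resets itself after every cycle, which removes the flag-reset race described in the question; in C++20, std::barrier offers the same reusable rendezvous.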
A few years back, I had to create a generic work dispatcher that does post-processing. It's not exactly producer-consumer and may be overkill for your app, but it may give you some ideas.
I particularly like using a pair of in-memory shared queues, an outbound queue and an inbound queue, arranged like a two-way channel. If you create a queue class that has the proper synchronization for reading and writing, the producer and consumers can become independent. They don't need to know how to synchronize with each other.
Your data, known to both the producer and the consumers, is referenced by a work item class. The work item class contains all of the status flags. The data should also be thread safe.
The producer enqueues work items onto the outbound queue and each consumer dequeues a single work item. When the work is completed, the status flags are updated and the work item is posted back to inbound queue for post-processing by the producer.
IIRC, the architecture contained only about three classes.
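A rough sketch of that two-way channel, with the names (WorkItem, outbound, inbound) invented to match the description:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class WorkItem {
    final long id;
    volatile boolean completed;                    // status flag, set by a consumer
    WorkItem(long id) { this.id = id; }
}

public class TwoWayChannel {
    final BlockingQueue<WorkItem> outbound = new LinkedBlockingQueue<>();
    final BlockingQueue<WorkItem> inbound  = new LinkedBlockingQueue<>();

    // Each consumer runs this loop; consumers never talk to each other.
    void consumerLoop() throws InterruptedException {
        while (true) {
            WorkItem item = outbound.take();       // dequeue a single work item
            // ... do the actual work on item ...
            item.completed = true;
            inbound.put(item);                     // post back for post-processing
        }
    }

    // The producer enqueues work and post-processes whatever has come back.
    void producerLoop() throws InterruptedException {
        long nextId = 0;
        while (true) {
            outbound.put(new WorkItem(nextId++));
            WorkItem finished;
            while ((finished = inbound.poll()) != null) {
                // ... post-process the finished item ...
            }
        }
    }
}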

Network Multithreading

I'm programming an online game for two reasons: to familiarize myself with server/client requests in a realtime environment (as opposed to something like a typical web browser, which is not realtime), and to actually get my hands wet in that area, so I can proceed to properly design one.
Anywho, I'm doing this in C++, and I've been using Winsock to handle my basic network tests. I obviously want to use a frame limiter and have 3D going and all of that at some point, and my main issue is that when I do a send() or recv(), the program kindly idles there and waits for a response. That would lead to maybe 8 fps on even the best internet connection.
So the obvious solution to me is to take the networking code out of the main process and start it up in its own thread. Ideally, I would call a "send" in my main process which would pass the networking thread a pointer to the message, and then periodically (every frame) check to see if the networking thread had received the reply, or timed out, or what have you. In a perfect world, I would actually have 2 or more networking threads running simultaneously, so that I could say run a chat window and do a background download of a piece of armor and still allow the player to run around all at once.
The bulk of my problem is that this is new to me. I understand the concept of threading, but I can see some serious issues, like what happens if two threads try to read/write the same memory address at the same time. I know there are already methods in place to handle this sort of thing, so I'm looking for suggestions on the best way to implement it. Basically, I need thread A to be able to start a process in thread B by sending a chunk of data, poll thread B's status, and then receive the reply, also as a chunk of data, ideally without any major crashing going on. ^_^ I'll worry about what that data actually contains and how to handle dropped packets, etc. later; I just need to get that happening first.
Thanks for any help/advice.
PS: Just thought about this; it may make the question simpler. Is there a way to use the Windows event handling system to my advantage? Like, would it be possible to have thread A initialize data somewhere, then trigger an event in thread B to have it pick up the data, and vice versa for thread B to tell thread A it was done? That would probably solve a lot of my problems, since I don't really need both threads to be able to work on the data at the same time; it's more of a baton pass, really. I just don't know if this is possible between two different threads. (I know one thread can create its own messages for the event handler.)
The easiest thing
The simplest approach would be to invoke the Windows API QueueUserWorkItem. All you have to specify is the function the thread will execute and the input passed to it. A thread pool will be created for you automatically and the jobs executed on it. New threads will be created as and when required.
http://msdn.microsoft.com/en-us/library/ms684957(VS.85).aspx
More Control
You could have more detailed control using another set of APIs, which can again manage the thread pool for you:
http://msdn.microsoft.com/en-us/library/ms686980(VS.85).aspx
Do it yourself
If you want to control all aspects of thread creation and pool management, you would have to create the threads yourself and decide how they should end, how many to create, etc. (_beginthreadex is the API you should be using to create threads; if you use MFC, you should use the AfxBeginThread function).
Send jobs to worker threads - I/O Completion Ports
In this case, you would also have to worry about how to communicate your jobs; I would recommend I/O completion ports for that. It is the most scalable notification mechanism that I currently know of for this purpose. It has the additional advantage of being implemented in the kernel, so you avoid all kinds of deadlock situations you would encounter if you decided to hand-roll something yourself.
This article will show you how with code samples -
http://blogs.msdn.com/larryosterman/archive/2004/03/29/101329.aspx
Communicate Back - Windows Messages
You could use Windows messages to communicate the status back to your parent thread, since it is doing the message wait anyway. Use the PostMessage function to do this (and check for errors).
PS: You could also allocate the data that needs to be sent out behind a dedicated pointer, and have the worker thread take care of deleting it after sending it out. That way you avoid the return-pointer traffic too.
BlodBath's suggestion of non-blocking sockets is potentially the right approach.
If you're trying to avoid a multithreaded approach, you could investigate setting up overlapped I/O on your sockets. The sockets will not block when you do a transmit or receive, and you get the added bonus of being able to wait for multiple events within your single event loop. When your transmit has finished, you will receive an event (see this for some details).
This is not incompatible with a multithreaded approach, so there's the option of changing your mind later. ;-)
On the design of your multithreaded app: the best thing to do is work out all of the external activities you want to be alerted to. For example, so far in your question you've listed network transmits, network receives, and user activity.
Depending on the number of concurrent connections you're going to be dealing with you'll probably find it conceptually simpler to have a thread per socket (assuming small numbers of sockets), where each thread is responsible for all of the processing for that socket.
Then you can implement some form of messaging system between your threads as RC suggested.
Arrange your system so that when a message is sent to a particular thread, an event is also raised. Your threads can then sleep waiting for one of those events (as well as any other stimulus, like socket events, user events, etc.).
You're quite right that you need to be careful of situations where more than one thread is trying to access the same piece of memory. Mutexes and semaphores are the things to use there.
Also be aware of the limitations that your GUI has when it comes to multithreading.
Some discussion on the subject can be found in this question.
But the abbreviated version is that most GUIs (and Windows is one of them) don't allow multiple threads to perform GUI operations simultaneously. To get around this problem, you can make use of the message pump in your application by sending custom messages to your GUI thread to get it to perform GUI operations.
I suggest looking into non-blocking sockets for the quick fix. With non-blocking sockets, send() and recv() do not block, and using the select() function you can get any waiting data every frame.
See it as a producer-consumer problem: when receiving, your network communication thread is the producer whereas the UI thread is the consumer. When sending, it's just the opposite. Implement a simple buffer class which gives you methods like push and pop (pop should be blocking for the network thread and non-blocking for the UI thread).
Rather than using the Windows event system, I would prefer something that is more portable, for example Boost condition variables.
I don't code games, but I've used a system similar to what pukku suggested. It lends nicely to doing things like having the buffer prioritize your messages to be processed if you have such a need.
I think of them as mailboxes, one per thread. You want to send a packet? Have the ProcessThread create a "thread message" with the payload to go on the wire and "send" it to the NetworkThread (i.e., push it onto the NetworkThread's queue/mailbox and signal the NetworkThread's condition variable so it will wake up and pull it off). When the NetworkThread receives the response, it packages it up in a thread message and sends it back to the ProcessThread in the same manner. The difference is that the ProcessThread won't be blocked on a condition variable, just polling on mailbox.empty() when you want to check for the response.
You may want to push and pop directly, but a more convenient way for larger projects is to implement a toThreadName/fromThreadName scheme in a ThreadMsg base class, plus a PostOffice that threads register their Mailbox with. The PostOffice then has a send(ThreadMsg*) function that pushes each message onto the appropriate Mailbox based on the to and from fields. Mailbox (the buffer/queue class) provides ThreadMsg* receiveMessage(), which basically pops a message off the underlying queue.
Depending on your needs, ThreadMsg could contain a virtual process(...) function that is overridden in derived classes, or you could just have an ordinary ThreadMessage class with to and from members and a getPayload() function to get back the raw data and deal with it directly in the ProcessThread.
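A condensed sketch of that PostOffice/Mailbox arrangement, using Java's BlockingQueue in place of a hand-rolled queue plus condition variable (the names come from the description above; everything else is an assumption):

import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class ThreadMsg {
    final String to, from;
    final Object payload;
    ThreadMsg(String to, String from, Object payload) {
        this.to = to; this.from = from; this.payload = payload;
    }
}

public class PostOffice {
    private final Map<String, BlockingQueue<ThreadMsg>> mailboxes = new ConcurrentHashMap<>();

    public void register(String threadName) {
        mailboxes.put(threadName, new LinkedBlockingQueue<>());
    }

    public void send(ThreadMsg msg) {
        mailboxes.get(msg.to).add(msg);            // routes on the "to" field
    }

    /** Blocking receive, as the NetworkThread would use. */
    public ThreadMsg receiveMessage(String threadName) throws InterruptedException {
        return mailboxes.get(threadName).take();
    }

    /** Non-blocking poll, as the ProcessThread would use each frame. */
    public ThreadMsg pollMessage(String threadName) {
        return mailboxes.get(threadName).poll();   // null if the mailbox is empty
    }
}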
Hope this helps.
Some topics you might be interested in:
mutex: A mutex locks access to a specific resource so that only one thread can use it at a time
semaphore: A way to track how many users a certain resource currently has (= how many threads are accessing it) and to control how threads gain access to it. A mutex is a special case of a semaphore (a binary semaphore).
critical section: a mutex-protected piece of code (street with only one lane) that can only be travelled by one thread at a time.
message queue: a way of distributing messages in a centralized queue
inter-process communication (IPC) - a way of threads and processes to communicate with each other through named pipes, shared memory and many other ways (it's more of a concept than a special technique)
All of the topics above can easily be looked up on a search engine.