I have an application that periodically calls a service for data (using TorqueBox schedulers), and when a set of data is available it should process each "data record" separately.
I'd like to process those records concurrently for better performance. My first thought was to set up a JMS queue (available in TorqueBox out of the box) so that the scheduled job would put all the received data on the queue and each record would be picked up by one of multiple connected receivers for processing.
But isn't it overengineering to put a JMS queue between elements of the same application? Are there any other approaches you could suggest here?
A JMS queue may not be a bad solution at all; try it out and see how it works for you. When queues are as easy to use as they are in TorqueBox, using one doesn't have to be overengineering.
If you want something less involved, I recommend Java's own BlockingQueue implementations, either LinkedBlockingQueue or ArrayBlockingQueue depending on the exact use case.
These are just regular collections like arrays or hashes, so you'll need to create them somewhere and pass them into the components that need to publish to and consume from them. They also have no concept of acknowledgement the way JMS queues do.
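The pattern is plain producer/consumer, and it looks the same in any language. Here is a minimal sketch using Python's queue.Queue, a blocking queue analogous to the Java classes above; the record list and process function are hypothetical stand-ins:

    import queue
    import threading

    records = queue.Queue()            # blocking queue shared by the scheduled job and workers

    def process(record):               # stand-in for the real per-record work
        print("processed", record)

    def worker():
        while True:
            record = records.get()     # blocks until a record is available
            process(record)
            records.task_done()

    for _ in range(4):                 # small pool of consumers
        threading.Thread(target=worker, daemon=True).start()

    # the scheduled job pushes each fetched record onto the queue
    for record in ["r1", "r2", "r3"]:  # stand-in for the fetched data set
        records.put(record)

    records.join()                     # wait until every record is processed

In the Java version, LinkedBlockingQueue's take() blocks the same way get() does here.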
How about using a Java message queue (as you mentioned), since HornetQ is part of JBoss/TorqueBox, and then using a message processor to handle the messages? You can also specify the level of concurrency in torquebox.rb (or .yml).
Your_Scheduled_Job -> /queues/my_queue -> TorqueBox::Messaging::MessageProcessor
In your config/torquebox.rb file you can name the message processor and specify its concurrency:
    queue '/queues/my_queue' do
      processor MyMessageProcessor do
        concurrency 5   # handle up to five messages at once
      end
    end
The messaging processor will process the messages on the queue concurrently without needing any other steps.
I'm also still experimenting with TorqueBox and Ruby concurrency, and this is something I'm trying to implement myself these days...
I am looking to use ZeroMQ to facilitate IPC in my embedded systems application, however, I'm not able to find many examples on using multiple 0MQ socket types in the same process.
For example, say I have a process called "antenna_mon" that monitors an antenna. I want to be able to send messages to this process and get responses back - a classic REQ-REP pattern. However, I also have a "cm" process, that publishes configuration changes to subscribers. I want antenna_mon to also subscribe to antenna configuration changes - PUB-SUB.
I found this example of reading from multiple sockets in the same process, but it seems suboptimal, because you no longer block waiting for messages; instead you inefficiently check for messages constantly and go back to sleep.
Has anyone encountered this problem before? Am I just thinking about it wrong? Maybe I should have two threads - one for CM changes, one for REQ-REP servicing?
I would love any insights or examples of solving this type of problem.
Welcome to the very nature of distributed computing!
Yes, there are new perspectives one has to solve once assembling a project for a multi-agent domain, where more than one process works and communicates with its respective peers ad hoc.
A knowledge base acquired from soft real-time or embedded-systems design experience will help a lot here. If none is available, some parallels can be drawn from GUI design, where the centerpiece is something like a lightweight .mainloop() scheduler, most of the hard work is embedded into round-robin-polled GUI devices, and internal state changes or external MMI events are marshalled into event-triggered handlers.
The ZeroMQ infrastructure gives one all the tools needed for such a non-blocking, controllably pollable (scalable, with variable or adaptively ad-hoc adjustable poll timeouts, so as not to overrun the design-defined round-trip duration of the controller's .mainloop()), transport-agnostic, asynchronously operated message dispatcher (with thread-mapped performance scaling and priority tuning).
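To make that concrete, here is a minimal pyzmq sketch, assuming hypothetical endpoints and topic, that services REQ/REP commands and PUB/SUB configuration updates from one loop; poll() with no timeout blocks until a socket is readable, so nothing spins:

    import zmq

    ctx = zmq.Context.instance()

    rep = ctx.socket(zmq.REP)              # command channel (REQ/REP)
    rep.bind("tcp://*:5555")

    sub = ctx.socket(zmq.SUB)              # configuration updates from "cm" (PUB/SUB)
    sub.connect("tcp://localhost:5556")
    sub.setsockopt_string(zmq.SUBSCRIBE, "antenna")

    poller = zmq.Poller()
    poller.register(rep, zmq.POLLIN)
    poller.register(sub, zmq.POLLIN)

    while True:
        events = dict(poller.poll())       # blocks; no busy-wait loop
        if rep in events:
            request = rep.recv()           # handle the command...
            rep.send(b"ack")               # ...and reply to the REQ peer
        if sub in events:
            update = sub.recv()            # apply the configuration change

A finite poll timeout can be passed instead, if the .mainloop() also has housekeeping to do between messages.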
What else one may need?
Well, just imagination and a lot of self-discipline to adhere to the Zero-Copy, Zero-Sharing and Zero-Blocking design maxims.
The rest is in your hands.
Many "academic" examples may seem trivial and simplified, so as to illustrate just the currently discussed, or a feature demonstrated in some narrow perspective.
Not so in the real-life situations.
As an example, my distributed ML engine uses a tandem of several PUSH/PULL pipelines for moving state-data updates and prediction forecasts, another PUSH/PULL for a remote keyboard, a reversed .bind()/.connect() on PUB/SUB for easy broadcasting of distributed agents' telemetry to a remote, centrally operated syslog, and some additional PAIR/PAIR pipes, as the processing requires.
( nota bene: one shall always bear in mind that robust, error-resilient systems ought to avoid using the default REQ/REP Scalable Formal Communication Pattern, as there is a non-zero probability of the pairwise-stepped REQ/REP dual-FSA falling into an unsalvageable deadlock. Do not hesitate to read more about this smart tool. )
I am creating a chatbot and need a solution to send messages to the user in the future after a specific delay. I have my system set up with Nginx, Gunicorn and Django. The idea is that if the bot needs to send the user several messages, it can delay each subsequent message by a certain amount of time before it sends it to seem more 'human'.
However, a simple threading.Timer approach won't work, because the user might interrupt this process at any moment, prompting future messages to change, and the timer threads might not be reachable to stop since they are on a different worker. So far I have come across two solutions:
Use threading.Timer blindly to check a to-send list in the database; this can create problems with lots of unneeded threads, and it also makes the database less clean/organized.
Use Celery or some other task system to execute these future tasks. This seems like overkill, over-engineering a simple problem: the tasks will always just be delayed function calls. It's also a hassle dealing with which messages belong to which conversation.
What would be the best solution for this problem?
Also, a more generic question:
Ideally the best solution would be a framework where I can 'simulate' a new bot for each conversation, so it acts as its own entity and holds all the state/message-queue information in memory for itself. This framework would need to allocate resources to a bot only when it needs to do something, based on a preset delay or an incoming message. Does anything like this exist?
Personally I would use Celery for this; executing delayed function calls is its job. And I don't know why knowing what messages belong where would be more of a problem there than doing it in a thread.
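A minimal sketch of that, assuming a Redis broker and hypothetical task/conversation names; apply_async(countdown=...) does the delay, and revoke() cancels a message that is still pending when the user interrupts:

    from celery import Celery

    app = Celery("chatbot", broker="redis://localhost:6379/0")

    @app.task
    def send_message(conversation_id, text):
        ...  # hand the text to whatever actually delivers it to the user

    # delay the next message by five seconds to seem more "human"
    result = send_message.apply_async(args=(42, "Hello!"), countdown=5)

    # if the user interrupts, cancel the still-pending task by its id
    app.control.revoke(result.id)

Storing the pending task ids per conversation is enough to know which messages belong where.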
But you might also want to investigate the new Django-Channels work that Andrew Godwin is doing, since that is intended to support async background tasks.
Looking for the best approach to sending the same message to multiple destinations using TCP/IP sockets. I'm working with an existing VS 2010 C++ application on Windows. Hoping to use a standard library/design pattern approach that has many of the complexities already worked out if possible.
Here's one approach I'm thinking about: one main thread retrieves messages from a database and adds them to some sort of thread-safe queue. The application also has one thread for each client socket connection to some destination server. Each of these threads reads from the thread-safe queue and sends the message over a TCP/IP socket.
There may be better/simpler/more robust approaches than this one, though.
The main issue I have to be concerned about is latency. The destinations could be anywhere, and there may be significant latency between one socket connection and another.
The messages must go in an exact FIFO order to all the destinations.
Also, one destination will be considered the primary destination: all messages must get to this destination, no exceptions. For the other, non-primary destinations, the messages are just copies, and it's not absolutely critical if they miss a few messages. At any point, one of the non-primary destinations could become the primary destination. If one of the destinations falls too far behind, that thread would need to catch up to the primary destination by skipping some messages.
Looking for any suggestions. Preliminary research so far, my situation appears to be something akin to a single producer and multiple consumers pattern, or possibly master-worker pattern in Java.
I need to implement this in C++ on Windows, and the application must use tcp/ip sockets using an existing defined protocol.
Any help at all would be greatly appreciated.
You need exactly two threads, one that saturates the IO channel to the database and another that saturates the IO channel to the network leading to the 12 servers. Unless you have multiple network interfaces (which you should think about!) you don't send things faster by using multiple threads. Also, since you don't have multiple threads taking care of the network, you don't have to sync them.
What you definitely need to know about is select(). In the case of WinSock, also take a look at WSAEventSelect/WaitForMultipleObjects. Basically, you take a message from the queue and then send it to all clients when they're ready. select() tells you when one of a set of sockets is ready to accept data, so you don't waste time waiting or block trying to send data. What you need to come up with is a scheme to reconnect after broken connections, when to drop messages to lagging clients, etc. Also, in case the throughput to the different targets varies a lot, you need to think about handling multiple messages in parallel. If they are small (less than a network packet's payload) it makes sense to combine them anyway to avoid overhead.
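The question is about C++/WinSock, but the loop described above has the same shape everywhere; here is a rough Python sketch with hypothetical destination addresses, and with reconnect and partial-send handling elided, as noted above:

    import select
    import socket

    # one connected socket per destination; the first is the primary
    destinations = [("primary.example.com", 9000), ("mirror.example.com", 9000)]
    socks = [socket.create_connection(addr) for addr in destinations]

    def send_to_all(message):
        pending = set(socks)
        while pending:
            # block until at least one destination can accept data,
            # instead of stalling on a single slow peer
            _, writable, _ = select.select([], list(pending), [])
            for s in writable:
                s.sendall(message)   # real code must handle errors/partial sends
                pending.discard(s)

    for msg in (b"msg1\n", b"msg2\n"):   # stand-in for messages from the database
        send_to_all(msg)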
I hope this short overview helps getting you started, otherwise I can elaborate on the details.
Let's say you have an entity, say, "Person" in your system and you want to process events that modify various Person entities. It is important that:
Events for the same Person are processed in FIFO order
Multiple Person event streams be processed in parallel by different threads/processes
We have an implementation that solves this using a shared database and locks. Threads compete to acquire the lock for a Person and then process events in order after acquiring the lock. We'd like to move to a message queue to avoid polling and locking, which we feel would reduce load on the DB and simplify the implementation of the consumer code.
I've done some research into ActiveMQ, RabbitMQ, and HornetQ but I don't see an obvious way to implement this.
ActiveMQ supports consumer subscription wildcards, but I don't see a way to limit the concurrency on each queue to 1. If I could do that, then the solution would be straightforward:
Somehow tell the broker to allow a concurrency of 1 for all queues starting with: /queue/person.
Publisher writes event to queue using Person ID in the queue name. e.g.: /queue/person.20
Consumers subscribe to the queue using wildcards: /queue/person.>
Each consumer would receive messages for different person queues. If all person queues were in use, some consumers may sit idle, which is ok
After processing a message, the consumer sends an ACK, which tells the broker it's done with the message, and allows another message for that Person queue to be sent to another consumer (possibly the same one)
ActiveMQ came close: You can do wildcard subscriptions and enable "exclusive consumer", but that combination results in a single consumer receiving all messages sent to all matching queues, reducing your concurrency to 1 across all Persons. I feel like I'm missing something obvious.
Questions:
Is there way to implement the above approach with any major message queue implementation? We are fairly open to options. The only requirement is that it run on Linux.
Is there a different way to solve the general problem that I'm not considering?
Thanks!
It looks like JMSXGroupID is what I'm looking for. From the ActiveMQ docs:
http://activemq.apache.org/message-groups.html
Their example use case with stock prices is exactly what I'm after. My only concern is what happens if the single consumer dies. Hopefully the broker will detect that and pick another consumer to associate with that group id.
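For what it's worth, the group is just a message header, so any client can set it. A hedged sketch using Python's stomp.py against ActiveMQ's STOMP connector (the queue name, port, and ids are assumptions):

    import stomp

    conn = stomp.Connection([("localhost", 61613)])   # ActiveMQ's STOMP port
    conn.connect(wait=True)

    # every event for person 20 carries the same JMSXGroupID, so the broker
    # pins the group to a single consumer, preserving per-person FIFO order
    conn.send(
        destination="/queue/person.events",
        body='{"person_id": 20, "change": "address"}',
        headers={"JMSXGroupID": "person-20"},
    )
    conn.disconnect()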
One general way to solve this problem (if I understood it correctly) is to introduce some unique property of a Person (say, the database-level id of the Person) and use a hash of that property as the index of the FIFO queue to put that Person's events in.
Since the hash of that property can be unwieldily big (you can't afford 2^32 queues/threads), use only the N least significant bits of the hash.
Each FIFO queue should have a dedicated worker that works on it -- voila, your requirements are satisfied! (See the sketch below.)
This approach has one drawback -- your Persons must have well-distributed ids so that all queues get a more-or-less equal load. If you can't guarantee that, consider using a round-robin set of queues and tracking which Persons are currently being processed, to ensure sequential processing for the same Person.
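Here is a minimal in-process sketch of the scheme in Python; the event handler is a stand-in and N is kept small:

    import queue
    import threading

    N = 3                                     # use N low bits -> 2**N queues/workers
    queues = [queue.Queue() for _ in range(2 ** N)]

    def process(event):                       # stand-in for the real event handler
        print("processed", event)

    def worker(q):
        while True:
            process(q.get())                  # one worker per queue => per-Person FIFO
            q.task_done()

    for q in queues:
        threading.Thread(target=worker, args=(q,), daemon=True).start()

    def enqueue(event):
        # the same Person id always lands on the same queue
        queues[hash(event["person_id"]) & (2 ** N - 1)].put(event)

    for pid in (20, 20, 31):                  # events for Person 20 stay ordered
        enqueue({"person_id": pid})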
If you already have a system that allows shared locks, why not have a lock for every queue, which consumers must acquire before they read from the queue?
I have a remote server which handles various different commands, one of which is an event fetching method.
The event fetch returns right away if there is 1 or more events listed in the queue ready for processing. If the event queue is empty, this method does not return until a timeout of a few seconds. This way I don't run into any HTTP/socket timeouts. The moment an event becomes available, the method returns right away. This way the client only ever makes connections to the server, and the server does not have to make any connections to the client.
This event mechanism works nicely. I'm using the boost library to handle queues, event notifications, etc.
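My server is C++ with Boost, but the mechanism itself fits in a few lines; here is a Python-flavoured sketch of it, with an arbitrary timeout value: the fetch blocks on a condition variable until an event arrives or the timeout expires:

    import threading

    events = []
    cond = threading.Condition()
    FETCH_TIMEOUT = 5.0                 # give up and return empty after a few seconds

    def fetch_events():                 # what the server does inside the fetch method
        with cond:
            if not events:
                cond.wait(timeout=FETCH_TIMEOUT)   # sleeps; wakes early on notify
            batch = list(events)
            events.clear()
            return batch

    def push_event(evt):                # called whenever a new event is produced
        with cond:
            events.append(evt)
            cond.notify_all()           # unblocks any waiting fetch immediately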
Here's the problem. While the server is holding back on returning from the event fetch method, during that time, I can't issue any other commands.
In the source code, XmlRpcDispatch.cpp, I see in the "work" method a simple loop that uses a blocking call to "select".
It seems that while one method call is being handled, no other requests are processed.
Question: am I not seeing something and can XmlRpcpp (xmlrpc++) handle multiple requests asynchronously? Does anyone know of a better xmlrpc library for C++? I don't suppose the Boost library has a component that lets me issue remote commands?
I actually don't care about the XML or over-HTTP features. I simply need to issue (asynchronous) commands over TCP in some shape or form.
I look forward to any input anyone might offer.
I had some problems with XMLRPC too, and investigated many solutions like gSOAP and XMLRPC++, but in the end I gave up and wrote the whole HTTP+XMLRPC stack from scratch using Boost.ASIO and TinyXML++ (later I swapped TinyXML for Expat). It wasn't really that much work; I did it myself in about a week, starting from scratch and ending up with many RPC calls fully implemented.
Boost.ASIO gave great results. It is, as its name says, totally async, with excellent performance and little overhead, which was very important to me because it was running in an embedded environment (MIPS).
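The code in question is C++, but to show the shape of a fully asynchronous request loop, where each connection is served independently so a slow or long-polling request never stalls the others, here is a tiny sketch in Python's asyncio; Boost.ASIO's asynchronous handlers give you the same structure:

    import asyncio

    async def handle_client(reader, writer):
        # each connection runs in its own coroutine
        while True:
            line = await reader.readline()
            if not line:
                break
            writer.write(line.upper())   # stand-in for real command dispatch
            await writer.drain()
        writer.close()
        await writer.wait_closed()

    async def main():
        server = await asyncio.start_server(handle_client, "127.0.0.1", 8888)
        async with server:
            await server.serve_forever()

    asyncio.run(main())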
Later, and this might be your case, I changed the XML to Google's Protocol Buffers, and was even happier. Its API, as well as its message containers, are all type-safe (i.e. you send an int and a float, and it never gets converted to a string and back, as is the case with XML), and once you get the hang of it, which doesn't take very long, it's a very productive solution.
My recommendation: if you can ditch XML, go with Boost.ASIO + Protobuf. If you need XML: Boost.ASIO + Expat.
Doing this stuff from scratch is really worth it.