What is a distributed messaging system? Specifically what is 'distributed' in it? - django

It is often stated that 'Celery is an asynchronous task queue/job queue based on distributed message passing'. I know how to use Celery workers, but I don't really understand what distributed message passing means, why it matters, or what role the task queue plays in it.
I have searched extensively, but nowhere have I found a clear, word-by-word explanation of the definition. It is always just stated as a fact.
Can someone please take the time to explain the meaning of these terms with some relevant examples?
Sorry if this question looks trivial to most people, but for me an answer will go a long way towards understanding how things work.
Thanks.

To put it very simply, "distributed" means that the work is distributed among many workers.
distribute /dɪˈstrɪbjuːt, ˈdɪstrɪbjuːt/ (verb; past tense and past participle: distributed)
1. give a share or a unit of (something) to each of a number of recipients.
When you run a task, Celery puts it on a queue, messages are passed to the workers, and one of them runs the task.
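
As a rough illustration of that flow, here is a minimal sketch in plain Python. It does not use Celery at all: a `queue.Queue` stands in for the message broker and threads stand in for worker processes, and the names `run_workers` and `worker-N` are made up for the example. The point is only that the caller puts messages on a queue and whichever worker is free picks each one up.

```python
import queue
import threading


def run_workers(tasks, num_workers=3):
    """Put task messages on a shared queue and let several workers drain it."""
    broker = queue.Queue()   # stand-in for the message broker
    results = {}             # task id -> (worker name, result)
    lock = threading.Lock()

    def worker(name):
        while True:
            message = broker.get()
            if message is None:      # sentinel: no more work for this worker
                break
            task_id, x, y = message
            with lock:
                results[task_id] = (name, x + y)

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for message in tasks:            # the "caller" only enqueues messages
        broker.put(message)
    for _ in threads:                # one sentinel per worker
        broker.put(None)
    for t in threads:
        t.join()
    return results


results = run_workers([(i, i, i) for i in range(6)])
```

Which worker handles which task is not predictable, and that is the idea: the caller does not care. In a real deployment the workers would be separate processes, possibly on different machines, all talking to the same broker.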

Related

How to handle out of order microservice messages?

We have adopted an AWS powered microservice architecture where different sorts of payloads enter the system with a UUID and type via mysql.lambda_async from our database.
The problem is that we've noticed that messages can arrive out of order. Imagine the following sequence of messages:
DEASSIGN_ROLE
ASSIGN_ROLE
When the actual intention was a quick toggle:
ASSIGN_ROLE
DEASSIGN_ROLE
Now we have a user with the wrong (elevated) permissions.
I've done some cursory research, and answers such as "Handling out of order events in CQRS read side" suggest using sequence numbers.
Introducing a sequence number would be quite hard, as we have many different types of messages. A sequence number would require a synchronous counter, whereas we have gone to great pains to be fully asynchronous. Bear in mind that the system that generates the messages is ultimately an SQL trigger.
Are there simpler solutions I am missing?
I would say this is an unsolvable problem as stated:
you want to be fully asynchronous
you need sequentiality in your results
We had the same problem as you, and we ended up assigning sequence numbers per message type.
That way we stay asynchronous and parallel wherever possible (according to message types/topics).
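
A minimal sketch of what per-type sequencing could look like on the consuming side, assuming the producer can attach a sequence number that is scoped to one stream (for example one message type, or one type plus user). The class name `OrderedConsumer` and the stream key `"role:user-42"` are hypothetical. Messages in different streams are processed independently, and within a stream early arrivals are held back until the gap is filled.

```python
from collections import defaultdict


class OrderedConsumer:
    """Apply messages in sequence order within each stream; streams are independent."""

    def __init__(self, handler):
        self.handler = handler
        self.next_seq = defaultdict(lambda: 1)   # stream -> next expected sequence number
        self.pending = defaultdict(dict)         # stream -> {seq: payload} held back

    def receive(self, stream, seq, payload):
        if seq < self.next_seq[stream]:
            return                               # duplicate or already applied
        self.pending[stream][seq] = payload
        # Drain every message that is now in order.
        while self.next_seq[stream] in self.pending[stream]:
            self.handler(stream, self.pending[stream].pop(self.next_seq[stream]))
            self.next_seq[stream] += 1


applied = []
consumer = OrderedConsumer(lambda stream, payload: applied.append((stream, payload)))
consumer.receive("role:user-42", 2, "DEASSIGN_ROLE")   # arrives early, held back
consumer.receive("role:user-42", 1, "ASSIGN_ROLE")     # fills the gap, both are applied
```

The counter only has to be consistent within one stream, so the synchronous part is limited to messages that actually need ordering relative to each other.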

Background jobs occur very frequently and eat memory

I'd like to optimize my notification system, so here is how it works now:
Every time a change occurs in the application, we run a background job (Sidekiq) to compute some values and then notify users via email.
This approach worked very well for a while, but then we ran into a memory leak: actions were happening very frequently, at about 30-50 jobs per second, so I need to refactor this.
What I would like to do is, instead of running the worker immediately, store the job in an array and perform it a bit later.
But I'm afraid that this will cause the same problem, just delayed.
I'm looking forward to hearing about other approaches and solutions as well.
Thanks in advance
So I found one very interesting solution:
I'm storing values in Redis directly as key-value pairs, where the value is the dataset I'll need later for the computation. Then I'm using a simple cron job that invokes a service responsible for reading the data from Redis and processing it. I changed the Sidekiq workers to run only when the cron job fires. Everything works perfectly fine, and even much faster than before.
I'm still eager to hear if there is any other approach/solution.
Thanks
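
To illustrate the buffering idea in isolation, here is a small Python sketch (not Ruby/Sidekiq). A plain dict stands in for Redis, and `ChangeBuffer`, `record` and `flush` are hypothetical names. It also assumes that a newer value for the same key may overwrite the older one, which may or may not match what the setup above does.

```python
class ChangeBuffer:
    """Collect changes cheaply, then process them in one batch."""

    def __init__(self):
        self._pending = {}   # key -> latest data; stand-in for Redis keys

    def record(self, key, data):
        # Called on every change: only a cheap write, no job is started.
        # Assumption: repeated changes to the same key can be coalesced.
        self._pending[key] = data

    def flush(self, compute):
        # Called periodically (for example from cron): process everything at once.
        batch, self._pending = self._pending, {}
        return [compute(key, data) for key, data in batch.items()]


buffer = ChangeBuffer()
for i in range(50):
    buffer.record("user:1", {"unread": i + 1})   # 50 rapid changes, one key
buffer.record("user:2", {"unread": 3})

sent = buffer.flush(lambda key, data: f"{key} has {data['unread']} unread")
```

Fifty changes for the same user result in a single computation per flush, which is where the saving over one job per change comes from.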

High level PHP library for Amazon SWF deciders to check state of activity tasks

I'm writing PHP for a fairly simple workflow for Amazon SWF. I've found myself starting to write a library to check whether certain actions have been started or completed: essentially looping over the event list to check how things have progressed, and then starting an appropriate activity if it's needed. This can be a bit faffy at times, as the activity type and input information isn't in every event; it seems to be only in the ActivityTaskScheduled event. I've discovered this sort of thing along the way, and I'm concerned that I could be missing subtle things about event lists.
It makes me suspect that someone must have already written some sort of generic library for finding the current state of various activities. Maybe even some sort of more declarative way of coding up the flowcharts that are associated with SWF. Does anything like this exist for PHP?
(Googling hasn't come up with anything)
I'm not aware of anything out there that does what you want, but you are doing it right. What you're talking about is coding up the decider, which necessarily has to look at the entire execution state (basically loop through the event list) and decide what to do next.
Here's an example written in Python (Using Amazon SWF To communicate between servers) that looks for events of type 'ActivityTaskCompleted' to decide what to do next and then, yes, looks at the previous 'ActivityTaskScheduled' entry to figure out what the attributes of the previous task were.
If you write a PHP framework that specifies the workflow in a declarative way, along with a generic decider that implements it, please consider sharing it :)
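
The lookup described above can be sketched as follows. This is not the linked example, just a minimal illustration on plain dicts, with no AWS calls. The field names follow the SWF history event shape as I understand it (`eventId`, `eventType`, `scheduledEventId`, and so on) and should be checked against the API reference; the function name `completed_activities` and the sample history are made up.

```python
def completed_activities(events):
    """For each completed activity, recover its type and input from the scheduled event."""
    by_id = {event["eventId"]: event for event in events}
    done = []
    for event in events:
        if event["eventType"] != "ActivityTaskCompleted":
            continue
        attrs = event["activityTaskCompletedEventAttributes"]
        # The completed event only points back at the event that scheduled it.
        scheduled = by_id[attrs["scheduledEventId"]]
        sched_attrs = scheduled["activityTaskScheduledEventAttributes"]
        done.append({
            "name": sched_attrs["activityType"]["name"],
            "input": sched_attrs.get("input"),
            "result": attrs.get("result"),
        })
    return done


# Hypothetical, heavily trimmed event history.
history = [
    {"eventId": 5, "eventType": "ActivityTaskScheduled",
     "activityTaskScheduledEventAttributes": {
         "activityType": {"name": "resize", "version": "1"}, "input": "img-1"}},
    {"eventId": 6, "eventType": "ActivityTaskStarted",
     "activityTaskStartedEventAttributes": {"scheduledEventId": 5}},
    {"eventId": 7, "eventType": "ActivityTaskCompleted",
     "activityTaskCompletedEventAttributes": {
         "scheduledEventId": 5, "startedEventId": 6, "result": "ok"}},
]

done = completed_activities(history)
```

A decider would then compare this list with the steps the workflow requires and schedule whatever is still missing.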
I've since found https://github.com/cbalan/aws-swf-fluent-php which looks promising, but I haven't really used it, so I can't speak to whether it works or not.
I've forked it and started a bit of very light refactoring to allow some testing, available at https://github.com/michalc/aws-swf-fluent-php

Camel + ActiveMQ: Handling Two Distinct Concurrency Constraints With Competing Consumers

Problem:
Process a backlog of messages where each message has three headers "service", "client", and "stream". I want to process the backlog of messages with maximum concurrency, but I have some requirements:
Only 10 messages with the same service can be processing at once.
Only 4 messages with the same service AND client can be processing at once.
All messages with the same service AND client AND stream must be kept in order.
Additional Information:
I've been playing around with "maxConcurrentConsumers" along with the "JMSXGroupID" in a ServiceMix (Camel + ActiveMQ) context, and I seem to be able to get 2 out of 3 of my requirements satisfied.
For example, if I do some content-based routing to split the backlog up into separate "service" queues (one queue for each service), then I can set the JMSXGroupID to (service + client + stream), and set maxConcurrentConsumers=10 on routes consuming from each queue. This solves the first and last requirements, but I may have too many messages for the same client processing at the same time.
Please note that if a solution requires a separate queue and route for every single combination of service+client, that would become unmanageable because there could be 10s of thousands of combinations.
Any feedback is greatly appreciated! If my question is unclear, please feel free to suggest how I can improve it.
To my knowledge, this would be very hard to achieve if you have 10k+ combos.
You can get around one queue per service/client combo by using consumers and selectors. That would, however, be almost equally hard to deal with (you simply don't create 10k+ selector consumers unharmed and without significant performance considerations), if you cannot predict in some way a limited set of service/client active at once.
Can you elaborate on the second requirement? Do you need it to ensure some sense of fairness among your clients? Please elaborate and I'll update if I can think of anything else.
Update:
Instead of consuming by just listening for messages, you could possibly browse the queue, looping through the messages and picking one that "has free slots". You can probably figure out whether a limit has been reached using a shared variable that keeps track, provided you run in a single instance.
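
A minimal sketch of that browse-and-pick idea, written in Python on an in-memory list rather than against Camel or ActiveMQ. `SlotTracker`, `pick` and `release` are hypothetical names, and the whole thing assumes a single instance doing the selection, as noted above.

```python
from collections import Counter

MAX_PER_SERVICE = 10          # limits taken from the requirements in the question
MAX_PER_SERVICE_CLIENT = 4


class SlotTracker:
    """Shared bookkeeping of what is currently being processed."""

    def __init__(self):
        self.by_service = Counter()
        self.by_service_client = Counter()
        self.busy_streams = set()     # (service, client, stream) currently in flight

    def has_free_slot(self, msg):
        s, c, st = msg["service"], msg["client"], msg["stream"]
        return (self.by_service[s] < MAX_PER_SERVICE
                and self.by_service_client[(s, c)] < MAX_PER_SERVICE_CLIENT
                and (s, c, st) not in self.busy_streams)

    def pick(self, backlog):
        """Browse the backlog in order; remove and return the first eligible message."""
        for i, msg in enumerate(backlog):
            if self.has_free_slot(msg):
                s, c, st = msg["service"], msg["client"], msg["stream"]
                self.by_service[s] += 1
                self.by_service_client[(s, c)] += 1
                self.busy_streams.add((s, c, st))
                return backlog.pop(i)
        return None

    def release(self, msg):
        """Call when processing of msg has finished."""
        s, c, st = msg["service"], msg["client"], msg["stream"]
        self.by_service[s] -= 1
        self.by_service_client[(s, c)] -= 1
        self.busy_streams.discard((s, c, st))


tracker = SlotTracker()
backlog = [
    {"service": "a", "client": "x", "stream": "s1", "body": 1},
    {"service": "a", "client": "x", "stream": "s1", "body": 2},
    {"service": "a", "client": "x", "stream": "s2", "body": 3},
]
first = tracker.pick(backlog)     # body 1
second = tracker.pick(backlog)    # body 3, because stream s1 is still busy
tracker.release(first)
third = tracker.pick(backlog)     # body 2, now that s1 is free again
```

Allowing only one in-flight message per stream is what keeps each stream in order here; it is stricter than a message group, but it avoids creating a queue or consumer per combination.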

How do you model a business workflow in ColdFusion?

Since there's no complete BPM framework/solution in ColdFusion as of yet, how would you model a workflow into a ColdFusion app that can be easily extensible and maintainable?
A business workflow is more than a flowchart that maps nicely into a programming language. For example:
How do you model a task X that is followed by multiple tasks Y0, Y1, Y2 that happen in parallel, where Y0 is a human process (needs to wait for input), Y1 is a web service that might go wrong and might need automatic retries, and Y2 is an automated process, followed by a task Z that should only be carried out when all the Y's are completed?
My thoughts...
Seems like I need to do a whole lot of storing / managing / keeping track of states, and frequent checking with cfschedule.
cfthread ain't going to help much since some tasks can take days (e.g. waiting for a user's confirmation).
I can already imagine the flow is going to be spread around in multiple UDFs, the DB, and CFCs.
Is there any open-source workflow engine in another language that we could maybe port over to CF?
Thank you for your brain power. :)
Study the Java Process Definition Language specification, for which JBoss has an execution engine. Using this Java-based engine may be your easiest solution, and it solves many of the problems you've outlined.
If you intend to write your own, you will probably end up modelling states and transitions as vertices and edges in a directed graph. These, as Ciaran Archer wrote, are the components of a state machine. The best persistence approach, IMO, is to capture versions of whatever data is being sent through the workflow via serialization, together with the current state and a history of transitions between states and changes to that data. The mechanism probably also needs a way to keep track of who or what is responsible for taking the next action on that workflow.
Based on your question, one thing to consider is whether you really need to represent parallel tasks in your solution. It might instead be possible to enqueue a set of messages and then specify a wait state until all of them complete. Representing actual parallelism implies that you are moving data simultaneously through several different processes, in which case, when they join again, you need an algorithm to resolve deltas, which is very much a non-trivial task.
In the context of ColdFusion and what you're trying to accomplish, a scheduled task may be necessary if the system you're writing needs to poll other systems. Consider WDDX as a serialization format. JSON, while seductively simple, I recall has some edge cases around numbers and dates that can cause you grief.
Finally see my answer to this question for some additional thoughts.
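
As a rough sketch of the "states, transitions, and a wait state for the join" idea, here is a small Python model of the X, Y0/Y1/Y2, Z example from the question. It is written in Python rather than CFML, keeps everything in memory where a real system would persist it to the database, and all names (`WorkflowInstance`, `DEPENDS_ON`, the state labels) are made up for illustration.

```python
# Each task lists the tasks that must be done before it may start.
DEPENDS_ON = {
    "X": [],
    "Y0": ["X"],                 # human step
    "Y1": ["X"],                 # web service call, may need retries
    "Y2": ["X"],                 # automated step
    "Z": ["Y0", "Y1", "Y2"],     # join: waits for all Y's
}


class WorkflowInstance:
    def __init__(self, depends_on):
        self.depends_on = depends_on
        self.state = {task: "waiting" for task in depends_on}
        self.history = []        # (task, old state, new state); would be persisted rows

    def _transition(self, task, new_state):
        self.history.append((task, self.state[task], new_state))
        self.state[task] = new_state

    def ready_tasks(self):
        """Tasks that have not run yet and whose prerequisites are all done."""
        return [
            task for task, deps in self.depends_on.items()
            if self.state[task] == "waiting"
            and all(self.state[dep] == "done" for dep in deps)
        ]

    def complete(self, task):
        self._transition(task, "done")


wf = WorkflowInstance(DEPENDS_ON)
start = wf.ready_tasks()         # only X can start
wf.complete("X")
after_x = wf.ready_tasks()       # all three Y's may now run, in any order
wf.complete("Y0")
wf.complete("Y2")
before_join = wf.ready_tasks()   # Z still waits for Y1
wf.complete("Y1")
after_join = wf.ready_tasks()    # now Z is ready
```

A scheduled task could call `ready_tasks()` periodically and dispatch whatever it returns; a failed Y1 simply stays in "waiting" and is picked up again on the next poll.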
Off the top of my head, I'm thinking about the State design pattern with the state persisted to a database. Check out the Gumball Machine example in Head First Design Patterns.
Generally this will work if you have something (like a client / order / etc.) going through a number of changes of state.
Different things will happen to your object depending on what state you are in, and that might mean sitting in a database table waiting for a flag to be updated by a user manually.
In terms of other languages I know Grails has a workflow module available. I don't know if you would be better off porting to CF or jumping ship to Grails (right tool for the job and all that).
It's just a thought, hope it helps.