How to set up concurrency at the workflow level?

I have a workflow that is triggered on the following events:
on:
  pull_request:
    types:
      - opened
      - synchronize
  pull_request_review:
    types:
      - submitted
If there are two jobs queued for the synchronize event, then only the latest job should be considered and the other queued jobs should be cancelled.
If there are two jobs waiting in the queue, one for a pull request synchronize event and another for a pull request review event, how do I set up concurrency so that the pull request review job runs after the pull request synchronize job?
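For reference, a minimal workflow-level concurrency sketch for these triggers might look like the following (the group expression is an assumption, not from the question). Note that a plain concurrency group only keeps one pending run, so it can guarantee "latest synchronize wins" but cannot enforce a longer ordering between synchronize and review runs.

on:
  pull_request:
    types: [opened, synchronize]
  pull_request_review:
    types: [submitted]

concurrency:
  # One group per workflow and pull request; the expression is illustrative.
  group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
  # Cancel the older run so only the latest triggered run survives.
  cancel-in-progress: true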

Related

When running GitHub actions with a concurrency restriction, can I get workflow runs enqueued rather than cancelled?

The GitHub Actions documentation says:
You can use jobs.<job_id>.concurrency to ensure that only a single job or workflow using the same concurrency group will run at a time.
...
When a concurrent job or workflow is queued, if another job or workflow using the same concurrency group in the repository is in progress, the queued job or workflow will be pending. Any previously pending job or workflow in the concurrency group will be canceled.
It is annoying that previously pending jobs get cancelled. Evidently the orchestration logic can only maintain a tiny "queue" of one (1) pending job.
I would like to be able to have multiple jobs enqueued. I.e., if I trigger 5 jobs in rapid succession, and they all belong to the same concurrency group, then the first one starts to run immediately (when a runner is available) and the next 4 get enqueued and wait for their turn to run, one at a time.
Is there any way to achieve this? Or will I need to request this as a feature from GitHub?
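For reference, a job-level concurrency block of the kind the quoted documentation describes might look roughly like this (the job and group name are illustrative). Without cancel-in-progress the newest run waits for the running one, but, as noted above, only a single run can occupy the pending slot.

jobs:
  deploy:
    runs-on: ubuntu-latest
    concurrency:
      # Runs in this group execute one at a time; a newer queued run
      # replaces any previously pending run rather than joining a queue.
      group: deploy-${{ github.ref }}
    steps:
      - run: ./deploy.sh   # placeholder step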

I want to know when a batch of messages has completed in an AWS SQS queue

I think this is more of an 'architecture design' question.
I have a lambda producer that will put ~600 messages on an SQS queue (there are multiple producers) as a batch (so not 1 message with a body of ~600 messages). A consumer lambda will take individual messages and deal with them (at scale). What I want to do is run another lambda when each batch is complete.
My initial idea was to create a 'unique batch number', a 'total batch number', and a 'batch position number' and add them to the message attributes of every message, then check these in the consumer lambda to decide whether the batch is complete.
But does that mean I would need to use a FIFO queue, partition on the batch number, and only have one lambda consumer per batch? Or do I run some sort of state management in DynamoDB (is there a pattern out there for this? Please guide me on this).
Regards, J
It seems like the goal is to achieve Fork-Join capabilities in a distributed system. One way to handle this in AWS is using Step Functions. Assuming a queue service needs to be used, state of the overall operation will need to be tracked. Some ways to do this are:
Store the state of the overall operation in a DB (a sketch of this option follows the reference below).
Put a 'termination' message in the queue after all the others and process FIFO.
Create a metadata service which receives 'start' and 'stop' messages for each service and handles them accordingly.
Reference: Fork and Join with Amazon Lambda
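A minimal sketch of the "track state in a DB" option, assuming the producer writes an item with the batch id and a 'remaining' counter (set to the batch total) to a hypothetical DynamoDB table named "batches" before sending the messages. Table, attribute, and function names are illustrative, and at-least-once delivery means a duplicate message could decrement the counter twice, so treat this as a starting point only.

import json
import boto3

dynamodb = boto3.client("dynamodb")
lambda_client = boto3.client("lambda")

def handle_message(record):
    # 'record' is one entry from the SQS-triggered Lambda event.
    batch_id = record["messageAttributes"]["batchId"]["stringValue"]

    # ... process the individual message here ...

    # Atomically decrement the remaining counter for this batch.
    resp = dynamodb.update_item(
        TableName="batches",
        Key={"batch_id": {"S": batch_id}},
        UpdateExpression="ADD remaining :dec",
        ExpressionAttributeValues={":dec": {"N": "-1"}},
        ReturnValues="UPDATED_NEW",
    )
    remaining = int(resp["Attributes"]["remaining"]["N"])

    # Whichever consumer sees the counter hit zero triggers the follow-up lambda.
    if remaining == 0:
        lambda_client.invoke(
            FunctionName="batch-complete-handler",  # hypothetical function name
            InvocationType="Event",
            Payload=json.dumps({"batch_id": batch_id}),
        )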

Using Amazon SQS with multiple consumers

I have a service-based application that uses Amazon SQS with multiple queues and multiple consumers. I am doing this so that I can implement an event-based architecture and decouple all the services, where the different services react to changes in state of other systems. For example:
Registration Service:
Emits event 'registration-new' when a new user registers.
User Service:
Emits event 'user-updated' when user is updated.
Search Service:
Reads from queue 'registration-new' and indexes user in search.
Reads from queue 'user-updated' and updates user in search.
Metrics Service:
Reads from 'registration-new' queue and sends to Mixpanel.
Reads from queue 'user-updated' and sends to Mixpanel.
I'm having a number of issues:
A message can be received multiple times when doing polling. I can design a lot of the systems to be idempotent, but for some services (such as the metrics service) that would be much more difficult.
A message needs to be manually deleted from the queue in SQS. I have thought of implementing a "message-handling-service" that handles the deletion of messages when all the services have received them (each service would emit a 'message-acknowledged' event after handling a message).
I guess my question is this: what patterns should I use to ensure that I can have multiple consumers for a single queue in SQS, while ensuring that the messages also get delivered and deleted reliably? Thank you for your help.
I think you are doing it wrong.
It looks to me like you are using the same queue to do multiple different things. You are better off using a single queue for a single purpose.
Instead of putting an event into the 'registration-new' queue, having two different services poll that queue (both needing to read that message and each doing something different with it), and then needing a third process to delete that message after the other two have processed it, use one queue for one purpose.
Create an 'index-user-search' queue and a 'send-to-mixpanel' queue. The search service reads from the search queue, indexes the user, and immediately deletes the message. The mixpanel service reads from the mixpanel queue, processes the message, and deletes it.
The registration service, instead of emitting a 'registration-new' event to a single queue, now emits it to two queues.
To take it one step further, add SNS into the mix here: have the registration service emit an SNS message to the 'registration-new' topic (not queue), and then subscribe both of the queues I mentioned above to that topic in a 'fan-out' pattern.
https://aws.amazon.com/blogs/aws/queues-and-notifications-now-best-friends/
Both queues will receive the message, but you only load it into SNS once - if down the road a 3rd unrelated service needs to also process 'registration-new' events, you create another queue and subscribe it to the topic as well - it can run with no dependencies or knowledge of what the other services are doing - that is the goal.
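A rough boto3 sketch of that fan-out wiring (topic and queue names are placeholders, and the SQS access policy that lets SNS deliver to the queues is omitted for brevity):

import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

# One topic for the event, one queue per downstream purpose.
topic_arn = sns.create_topic(Name="registration-new")["TopicArn"]

for queue_name in ("index-user-search", "send-to-mixpanel"):
    queue_url = sqs.create_queue(QueueName=queue_name)["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]
    # Each subscribed queue gets its own copy of every message.
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

# The registration service publishes once; SNS fans it out to both queues.
sns.publish(TopicArn=topic_arn, Message='{"event": "registration-new", "user_id": 123}')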
The primary use-case for multiple consumers of a queue is scaling-out.
The mechanism that allows for multiple consumers is the Visibility Timeout, which gives a consumer time to process and delete a message without it being consumed concurrently by another consumer.
To address the "At-Least-Once Delivery" property of Standard Queues, the consuming service should be idempotent.
If that isn't possible, one possible solution is to use FIFO queues, but this mode has a limited message delivery rate and is not compatible with SNS subscriptions.
They even have a tutorial on how to create a fanout scenario using the combo SNS+SQS.
https://aws.amazon.com/getting-started/tutorials/send-fanout-event-notifications/
Too bad it does not support FIFO queues, so you have to be careful to handle out-of-order messages.
It would be nice if they had a consistent hashing solution to have multiple competing consumers while respecting the message order.
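To illustrate the visibility timeout mechanism mentioned above, here is a small consumer sketch (queue URL and processing function are placeholders): while one consumer holds a message, other consumers polling the same queue will not receive it until the timeout expires or the message is deleted.

import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/registration-new"  # placeholder

def process(body):
    print("handling", body)  # stand-in for idempotent processing

resp = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=1,
    VisibilityTimeout=60,  # hidden from other consumers for 60 seconds
    WaitTimeSeconds=20,    # long polling
)
for msg in resp.get("Messages", []):
    process(msg["Body"])
    # Delete before the visibility timeout expires, or the message reappears.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])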

Using Timers/Signals to allow human intervention in AWS SWF Workflow

Here's the scenario. A user uploads an Excel file and this kicks off a workflow which validates the file, transforms it into a few different files, then performs an update to a database based on the transforms. After the uploads, the results need to be reviewed by a team member before the flow can continue.
I'm using Ruby and have discovered that Signals and Timers are the way to achieve this in SWF. However, the Ruby examples are lacking or non-existent and I need a little help understanding how this would work using Ruby.
My understanding so far is that a Timer activity is scheduled which basically pauses the flow until either the timer expires (at which point I could cancel the workflow or email the staff and set another timer) or a signal is sent to the workflow to start the next step. The Decider would handle the signal and then kick off the appropriate activity.
Any thoughts or direction to other sources would be much appreciated.
Thanks,
Thomas
It's somewhat difficult to provide an "answer", given you didn't really ask a specific question. I'm in agreement with you that using a Timer and Signals is what you want.
You don't specify how the team gets notified about the review. I'll assume that you notify them by email and direct them to some website where they can review the changes, and then click on a link to either Approve or Don't Approve. Clicking the link to Approve will send a request to a web server that will "signal" SWF that the review has been approved. Clicking the link to Don't Approve will "signal" SWF that the review has not been approved. You mention that you want to renotify the team (or perhaps escalate to the manager) if no one has taken action on the review. Let's say this renotification happens after 48 hours. After the renotification, you grant them another 72 hours before assuming Don't Approve.
Here's how your workflow looks to me:
User uploads file and kicks off a workflow
Decider Task schedules "TransformActivity"
TransformActivity runs, transforms the data into different files, and completes successfully
Decider Task schedules "UpdateDatabaseActivity"
UpdateDatabaseActivity runs, updates the database, and completes successfully
Decider Task schedules "EmailTeamActivity"
EmailTeamActivity runs, emails the team, and completes successfully
Decider Task schedules a Timer for 48 hours.
If a signal indicating Approve or Don't Approve is received within 48 hours:
  Decider Task schedules the "RecordFinalDecisionActivity"
  RecordFinalDecisionActivity will run, record the Approve (or Don't Approve) into the database, and complete successfully.
  Decider Task will then close the workflow because it's done.
If no signal is received and the timer fires (after 48 hours):
  Decider Task schedules the "EmailTeamAndManagerActivity"
  EmailTeamAndManagerActivity runs, emails the team and manager, and completes successfully.
  Decider Task schedules another timer for 72 hours.
  If a signal indicating Approve or Don't Approve is received within the additional 72 hours given:
    Repeat the same logic as the section "If a signal indicating Approve or Don't Approve is received within 48 hours".
  If no signal is received and the timer fires (after the additional 72 hours):
    At this point, the workflow can assume it was a Don't Approve, schedule the "RecordFinalDecisionActivity", and close the workflow once that activity completes.
The reason why you don't want to have a "review" activity is because that task gets scheduled and then some activity worker needs to reply success. How would that work? When someone clicks the Approve or Don't Approve link, the request to the webserver would have to pull down the activity from the task list. However, if the task list has multiple activities, SWF just gives out any one of them. It might not get the right one. Now, you could argue that you could schedule the different reviews across different task lists, but that's just cumbersome and tedious.
Signals are meant to indicate an "external" event, which this very much is. The SWF documentation on Signals does a great job of explaining them. Here's the SWF documentation on how to use Timers and Signals. As for the particulars of how to use SWF with Ruby, I can't really help you there. I've only used SWF with Java via the AWS Flow Framework.
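To make the timer-or-signal branch above concrete, here is a condensed decider sketch in Python/boto3 (rather than Ruby, purely for illustration); the domain, task list, activity names, and the simplified event handling are all assumptions, not anything from the question.

import boto3

swf = boto3.client("swf")
DOMAIN = "review-domain"         # placeholder
TASK_LIST = {"name": "decider"}  # placeholder

def poll_and_decide():
    task = swf.poll_for_decision_task(domain=DOMAIN, taskList=TASK_LIST)
    if not task.get("taskToken"):
        return  # long poll timed out with no decision task

    decisions = []
    # Only look at events added since the last decision this worker made.
    new_events = [e for e in task["events"]
                  if e["eventId"] > task.get("previousStartedEventId", 0)]
    for event in new_events:
        if event["eventType"] == "ActivityTaskCompleted":
            # In a real decider you would check which activity completed; here
            # we assume EmailTeamActivity just finished and start the 48-hour timer.
            decisions.append({
                "decisionType": "StartTimer",
                "startTimerDecisionAttributes": {
                    "timerId": "review-timeout",
                    "startToFireTimeout": str(48 * 3600),
                },
            })
        elif event["eventType"] == "WorkflowExecutionSignaled":
            # Approve / Don't Approve arrived from the web server.
            decisions.append({
                "decisionType": "ScheduleActivityTask",
                "scheduleActivityTaskDecisionAttributes": {
                    "activityId": "record-final-decision",
                    "activityType": {"name": "RecordFinalDecisionActivity", "version": "1"},
                    "input": event["workflowExecutionSignaledEventAttributes"].get("input", ""),
                },
            })
        elif event["eventType"] == "TimerFired":
            # No signal before the timer fired: escalate to team and manager.
            decisions.append({
                "decisionType": "ScheduleActivityTask",
                "scheduleActivityTaskDecisionAttributes": {
                    "activityId": "email-team-and-manager",
                    "activityType": {"name": "EmailTeamAndManagerActivity", "version": "1"},
                },
            })

    swf.respond_decision_task_completed(taskToken=task["taskToken"], decisions=decisions)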
the user uploads the Excel file and calls "StartWorkflowExecution"; that queues a decision task
the decision worker notices the flow is new / "stage one" and schedules a "transform file" activity task
an activity worker picks up the task and does the "transform file" activity; when done it calls "RespondActivityTaskCompleted" with a result of "transformations done", which queues a decision task
the decision worker picks up the decision task, notices the transformations are done, and schedules a new activity task
an activity worker picks up the activity task and notices it's for a team member (according to the instructions given by the decision worker when scheduling the activity task); the team member gets notified, somehow performs his action, then somehow notifies the activity worker, which replies "RespondActivityTaskCompleted" (sketched below)
I don't see the need for a Timer or a Signal, it's just plain flow. Those two concepts are useful if you want recurring events, timeouts, and/or interrupting the flow.
Please note that you can differentiate activity workers by using task lists (for example activity workers for automated work vs activity workers for human participants, whatever).
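A rough sketch of that "human activity task" alternative: the activity worker polls a dedicated task list, notifies the reviewer, and parks the task token; a web handler later completes the task with the decision. The domain, task list, and in-memory token store are placeholders, not from the answer.

import boto3

swf = boto3.client("swf")
DOMAIN = "review-domain"                    # placeholder
HUMAN_TASK_LIST = {"name": "human-review"}  # placeholder
pending_tokens = {}  # workflowId -> taskToken (use a durable store in practice)

def human_activity_worker():
    task = swf.poll_for_activity_task(domain=DOMAIN, taskList=HUMAN_TASK_LIST)
    if task.get("taskToken"):
        workflow_id = task["workflowExecution"]["workflowId"]
        pending_tokens[workflow_id] = task["taskToken"]
        print("notify reviewer about", task.get("input", ""))  # stand-in for the email step

def review_submitted(workflow_id, approved):
    # Called from the web request handler when the reviewer clicks a link.
    swf.respond_activity_task_completed(
        taskToken=pending_tokens.pop(workflow_id),
        result="approved" if approved else "rejected",
    )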

What's common practice for enabling a locking mechanism for multiple SQS consumers in Django so I can be idempotent

SQS expects your application to be idempotent, and I've got multiple consumers/producers where (even if SQS had a deliver-once mechanism) I will have race conditions creating duplicates and race conditions when consuming, because my consumers run via cron jobs.
My current plan is to use the Django 1.4 select_for_update which should block other consumers on the same row, doing something like:
from datetime import datetime
from django.db import transaction

# select_for_update() only locks inside a transaction (commit_on_success on Django 1.4).
with transaction.commit_on_success():
    reminder = EmailReminder.objects.select_for_update().get(id=some_id)
    if not reminder.finished:
        reminder.send()
        EmailReminder.objects.filter(id=some_id).update(finished=datetime.now())
        # Delete job.
Are there better ways of dealing with this?
Hook up django-celery to SQS and have it designate a periodic job using celerybeat. Then have celeryd worker(s) running on the same queue anywhere you want. Only one will pick up a job at a time and execute it. No need to introduce DB locking on any level.
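A sketch of that wiring, using the current Celery API with an SQS broker URL and a beat schedule (the answer mentions django-celery, so treat the settings below as illustrative rather than a drop-in):

from celery import Celery
from celery.schedules import crontab

app = Celery("reminders", broker="sqs://")  # AWS credentials come from the environment

app.conf.beat_schedule = {
    "send-reminders-every-5-minutes": {
        "task": "reminders.tasks.send_unsent_reminders",  # hypothetical task path
        "schedule": crontab(minute="*/5"),
    },
}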
As long as your worker is guaranteed to finish its current task before celerybeat fires a new one you will never have a need for a lock. Now if you think there is a chance they may overlap you can introduce states for your notifications where:
Any reminder starts in "unsent" state.
Your celerybeat sends a request to process unsent emails to the queue.
Some worker picks it up and grabs all of them.
Immediately the worker transitions all of them to "sending" state.
Proceeds to send them one at a time (or in bulk).
If sending fails for any, revert their state back to unsent.
For all that succeeded transition to sent.
This way if celerybeat fires another job while your original job is not done with the initial batch, you won't have duplicate emails sent. As an added bonus you can scale the solution and distribute the load.
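A rough sketch of that state-based approach, assuming a hypothetical state field on EmailReminder with values "unsent" / "sending" / "sent"; the field, task name, and import path are illustrative, not from the answer.

from celery import shared_task

from myapp.models import EmailReminder  # app path is a placeholder; the model comes from the question

@shared_task
def send_unsent_reminders():
    # Grab everything still unsent and immediately mark it as sending,
    # so an overlapping run of this task will skip these rows.
    pending_ids = list(
        EmailReminder.objects.filter(state="unsent").values_list("id", flat=True)
    )
    EmailReminder.objects.filter(id__in=pending_ids).update(state="sending")

    for reminder in EmailReminder.objects.filter(id__in=pending_ids):
        try:
            reminder.send()
        except Exception:
            # Sending failed: put it back so a later run retries it.
            reminder.state = "unsent"
        else:
            reminder.state = "sent"
        reminder.save(update_fields=["state"])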