Is there an AWS CloudWatch Log Export Task finish event? - amazon-web-services

I have a flow that once a day:
(cron at hour H): Creates an AWS CloudWatch Logs export task
(cron at hour H+2): Consumes the logs exported in step 1
Things were kept simple by design:
The two steps are separate scripts that don't relate to each other.
Step 2 doesn't have the task ID created in step 1.
Step 2 is not aware of step 1 and doesn't know if the logs export task is finished.
I could add a mechanism by which the first script publishes the task ID somewhere, and the second script consumes that task ID, queries CloudWatch to check whether the task is finished, and only proceeds when it is.
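For illustration, a minimal sketch of that handoff, assuming Python/boto3; the log group, bucket, and the place the task ID would be published are placeholders:

import time
from datetime import datetime, timedelta, timezone

import boto3

logs = boto3.client("logs")

# Window to export: the previous 24 hours, in epoch milliseconds.
end = datetime.now(timezone.utc)
start = end - timedelta(days=1)

# Step 1: create the export task and publish its ID somewhere (SSM, S3, a queue, ...).
task_id = logs.create_export_task(
    logGroupName="/my/log/group",               # placeholder log group
    fromTime=int(start.timestamp() * 1000),
    to=int(end.timestamp() * 1000),
    destination="my-export-bucket",             # placeholder S3 bucket
)["taskId"]

# Step 2: read the task ID back and poll until the export has finished.
while True:
    task = logs.describe_export_tasks(taskId=task_id)["exportTasks"][0]
    if task["status"]["code"] == "COMPLETED":
        break
    time.sleep(30)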
However, I'd prefer to keep it where there's no such handoff from step 1 to step 2.
What I'd like to do is when the log export is done, step 2 automatically starts.
👉 Is there an event "CloudWatch Export Task finished" that can be used to trigger the start of step 2?

Related

Create dependencies between different AWS Glue workflows

I have two AWS Glue workflows, WorkflowA and WorkflowB. I want to run WorkflowA on a schedule, but WorkflowB should run only after WorkflowA completes successfully.
I attempted to create a trigger called startWorkflowB that is event-based on the successful completion of WorkflowA's last task.
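For context, a minimal sketch (boto3) of what such a cross-workflow trigger attempt might look like; all workflow and job names here are placeholders. Note that a trigger belonging to a workflow can generally only reference jobs within that same workflow, which is likely what the error below is complaining about.

import boto3

glue = boto3.client("glue")

glue.create_trigger(
    Name="startWorkflowB",
    WorkflowName="WorkflowB",          # the trigger belongs to WorkflowB
    Type="CONDITIONAL",
    StartOnCreation=True,
    Predicate={
        "Logical": "AND",
        "Conditions": [
            {
                "LogicalOperator": "EQUALS",
                "JobName": "workflowA-last-job",   # placeholder: job that ends WorkflowA
                "State": "SUCCEEDED",
            }
        ],
    },
    Actions=[{"JobName": "workflowB-first-job"}],  # placeholder: first job of WorkflowB
)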
Now, when I try to use the startWorkflowB trigger as the first task in WorkflowB, I get the following error -
My end result should be that WorkflowB runs only after successful completion of WorkflowA.
What am I doing wrong here? Is it feasible to have a linear dependency between two workflows among multiple Glue workflows?

Is there a way to check if AWS lambda is running from java code?

There is a Lambda, say its name is XYZ, which has an S3 file-upload trigger. Now let's say I upload multiple files, around 100, and the Lambda executions start.
Is there any way I can track whether the Lambda is still running or has processed all the files?
The reason is that once the Lambda has finished processing all the files, I want to trigger a Step Function, and I want to do that only after all the files have been processed by my Lambda (XYZ).
FYI: there is currently no way to track how many files have been uploaded.
I think it's not a good design to run a Step Functions state machine after the Lambda completes the job without a definite logical event to key off.
For example, you might receive a batch of files (say 100) that the Lambda processes, and fire the Step Function once a custom CloudWatch metric reaches 100 (via an alarm), or once a value in DynamoDB or the number of objects in a folder reaches 100. At the same time, another file could arrive 5 seconds later as number 101, and in that case you may miss it.
If you don't have a specific event or condition that marks the completion of the files, then you can work with time-based scheduling: trigger your Step Function with a CloudWatch scheduled event, say every 15 minutes, check whether there is work, and exit if there isn't.
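One way to sketch that scheduled check (Python/boto3); the bucket, prefix, and state machine ARN are placeholders, and "work remaining" is approximated here as unprocessed objects under a prefix:

import boto3

s3 = boto3.client("s3")
sfn = boto3.client("stepfunctions")

STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:process-files"

def handler(event, context):
    # Invoked every 15 minutes by a CloudWatch scheduled event.
    pending = s3.list_objects_v2(Bucket="my-upload-bucket", Prefix="incoming/")
    if pending.get("KeyCount", 0) == 0:
        return "no work"                 # nothing to do, exit
    # There is work waiting: kick off the state machine.
    sfn.start_execution(stateMachineArn=STATE_MACHINE_ARN)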
Otherwise, either include the Lambda (the file processing) in your Step Function as a step, or change your design.

Trigger event when specific 3 AWS jobs are completed

I have submitted 3 jobs in parallel in AWS Batch and I want to create a trigger for when all 3 of these jobs are completed.
Something like: I should be able to specify the 3 job IDs and update the DB once all 3 jobs are done.
I could do this easily with long polling, but I wanted to do something event-based.
I need your help with this.
The easiest option would be to create a fourth Batch job that specifies the other three jobs as dependencies. This job will sit in the PENDING state until the other three jobs have succeeded, and then it will run. Inside that job, you could update the DB or do whatever other actions you wanted.
One downside to this approach is that if one of the jobs fails, the pending job will automatically go into a FAILED state without running.
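A minimal sketch of that dependent fourth job with boto3; the job names, queue, definition, and the three job IDs are placeholders:

import boto3

batch = boto3.client("batch")

job_ids = ["job-id-1", "job-id-2", "job-id-3"]   # IDs returned by the first three submissions

batch.submit_job(
    jobName="update-db-when-done",
    jobQueue="my-job-queue",
    jobDefinition="update-db-job:1",
    # The job stays PENDING until every dependency has SUCCEEDED.
    dependsOn=[{"jobId": jid} for jid in job_ids],
)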

Scheduling one time tasks with AWS

I have to implement functionality that requires the delayed sending of a message to a user, once, on a specific date, which can be anytime from tomorrow to a few months from now.
All our code is so far implemented as lambda functions.
I'm considering three options on how to implement this:
Create an entry in DynamoDB with the hash key being the date and the range key being a unique ID. Schedule a Lambda to run once a day, pick up all entries/tasks scheduled for that day, and send a message for each of them.
Using the SDK, create a CloudWatch Events rule with a cron expression indicating a single execution and make it invoke a Lambda function (the target) with the ID of the user/message. The Lambda would be invoked on that specific schedule with the specific user/message to be delivered.
Create a Step Functions execution and configure it to sleep and then invoke a step with the logic to send the message when the right moment comes.
Do you have perhaps any recommendation on what would be best practice to implement this kind of business requirement? Perhaps an entirely different approach?
It largely depends on scale. If you'll only have a few scheduled at any point in time, then I'd use the CloudWatch Events approach. It's very low overhead and doesn't involve running code that just sits and waits.
If you expect a LOT of schedules then the DynamoDB approach is very possibly the best approach. Run the lambda on a fixed schedule, see what records have not yet been run, and are past/equal to current time. In this model you'll want to delete the records that you've already processed (or mark them in some way) so that you don't process them again. Don't rely on the schedule running at certain intervals and checking for records between the last time and the current time unless you are recording when the last time was (i.e. don't assume you ran a minute ago because you scheduled it to run every minute).
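A minimal sketch of that DynamoDB sweep, assuming Python/boto3; the table name, the "date"/"id" key names, and the send_message helper are placeholders:

from datetime import datetime, timezone

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("scheduled-messages")   # placeholder table

def send_message(item):
    # Placeholder for the actual delivery logic.
    print("sending message", item["id"])

def handler(event, context):
    # Runs on a fixed schedule; picks up everything due today.
    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    due = table.query(KeyConditionExpression=Key("date").eq(today))["Items"]
    for item in due:
        send_message(item)
        # Delete (or mark) processed records so they are never sent twice.
        table.delete_item(Key={"date": item["date"], "id": item["id"]})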
Step functions could work if the time isn't too far out. You can include a delay in the step that causes it to just sit and wait. The delays in step functions are just that, delays, not scheduled times, so you'd have to figure out that delay yourself, and hope it fires close enough to the time you expect it. This one isn't a bad option for mid to low volume.
Edit:
Step Functions Wait states now support waiting until an absolute timestamp (the Timestamp/TimestampPath fields). This is a really good option for what you are describing.
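A minimal sketch of such a state machine, created with boto3; the role ARN and the downstream Lambda ARN are placeholders, and the send time is taken from the execution input:

import json
import boto3

definition = {
    "StartAt": "WaitUntilSendTime",
    "States": {
        "WaitUntilSendTime": {
            "Type": "Wait",
            "TimestampPath": "$.sendAt",        # e.g. "2024-06-01T09:00:00Z" from the input
            "Next": "SendMessage",
        },
        "SendMessage": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:send-message",
            "End": True,
        },
    },
}

boto3.client("stepfunctions").create_state_machine(
    name="one-time-message",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/step-functions-role",
)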
As of November 2022, the cleanest approach would be to use EventBridge Scheduler's one-time schedule.
A one-time schedule will invoke a target only once at the date and time that you specify using a valid date, and a timestamp. EventBridge Scheduler supports scheduling in Universal Coordinated Time (UTC), or in the time zone that you specify when you create your schedule. You configure a one-time schedule using an at expression.
Here is an example using the AWS CLI:
aws scheduler create-schedule --schedule-expression "at(2022-11-30T13:00:00)" --name schedule-name \
--target '{"RoleArn": "role-arn", "Arn": "QUEUE_ARN", "Input": "TEST_PAYLOAD" }' \
--schedule-expression-timezone "America/Los_Angeles" \
--flexible-time-window '{ "Mode": "OFF"}'
Reference: Schedule types on EventBridge Scheduler - EventBridge Scheduler User Guide
Instead of using DynamoDB, I would suggest using S3: store the message and the time to trigger it as key-value pairs, with S3 acting as the date/time key-value store.
Use an S3 Lambda trigger to create the CloudWatch rules that would target the specific Lambdas, etc.
You can even schedule a cron-triggered Lambda that reads the files from S3 and updates the required cron rules for the messages to be sent.
Hope this is in line with your requirements.
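A minimal sketch of that S3-triggered Lambda, assuming Python/boto3; the object key layout, ARNs, and rule naming are placeholders. On each new object it creates a one-off CloudWatch Events cron rule that targets the delivery Lambda:

import json
import boto3

events = boto3.client("events")

DELIVERY_LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:send-message"

def handler(event, context):
    for record in event["Records"]:
        key = record["s3"]["object"]["key"]          # e.g. "2024-06-01T09:00/user-42"
        date_part, user_id = key.split("/", 1)

        yyyy, mm, dd = date_part[:10].split("-")
        hh, minute = date_part[11:16].split(":")

        rule_name = f"send-{user_id}"
        # cron(minutes hours day-of-month month ? year) for a single execution.
        events.put_rule(
            Name=rule_name,
            ScheduleExpression=f"cron({int(minute)} {int(hh)} {int(dd)} {int(mm)} ? {yyyy})",
        )
        # The delivery Lambda also needs a resource policy allowing
        # events.amazonaws.com to invoke it.
        events.put_targets(
            Rule=rule_name,
            Targets=[{
                "Id": "delivery",
                "Arn": DELIVERY_LAMBDA_ARN,
                "Input": json.dumps({"userId": user_id}),
            }],
        )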

Dataprep: No scheduled destinations set. Create an output to set a destination

What does this error mean?
No scheduled destinations set. Create an output to set a destination.
I am getting this error in Dataprep when I attempt to create a run schedule for my jobs. They work perfectly when I simply hit run, but this error appears when I want to have them scheduled.
As per the Dataprep documentation (emphasis mine):
To add a scheduled execution of the recipes in your flow:
Define the scheduled time and interval of execution at the flow level. See Add Schedule Dialog. After the schedule has been created, you can review, edit, or delete the schedule through the Clock icon.
Define the scheduled destinations for each recipe through its output object. These destinations are targets for the scheduled job. See View for Outputs below.
You'll find detailed instructions on how to set these up here.