Our project runs a total of 10 Glue jobs daily. I would like to build a dashboard showing each job's status (succeeded or failed) over the last 7 days. I tried to achieve this in CloudWatch with metrics, but was not able to. Please give me an idea of how to build this dashboard.
Probably a little late for the original questioner, but maybe helpful for others.
We had a similar task in our project. We have many jobs and need to monitor success and failure. In our experience, the built-in metrics aren't really reliable, nor do they really answer the question of whether a job was successful or not.
But we found an approach that works well for us: generating custom metrics in a generic way for all jobs. This also works retroactively for existing jobs, without having to change their code.
I wrote an article about it: https://medium.com/@ettefette/metrics-for-aws-glue-jobs-as-you-know-them-from-lambda-functions-e5e1873c615c
We have set up CloudWatch alarms based on these metrics, and we use the metrics in our Grafana dashboard to monitor the Glue jobs.
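Without claiming this is exactly the method from the article, a minimal sketch of the general idea: let an EventBridge rule forward Glue Job State Change events to a small Lambda function that publishes a custom metric per job. The namespace and metric names below are assumptions.

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    def handler(event, context):
        # EventBridge "Glue Job State Change" events carry the job name and final state
        detail = event["detail"]
        job_name = detail["jobName"]
        state = detail["state"]  # e.g. SUCCEEDED, FAILED, TIMEOUT

        # One count per run, dimensioned by job name, so every job gets its own
        # success/failure series without changing the job code itself
        cloudwatch.put_metric_data(
            Namespace="GlueJobs",  # assumption: any custom namespace works here
            MetricData=[{
                "MetricName": "Succeeded" if state == "SUCCEEDED" else "Failed",
                "Dimensions": [{"Name": "JobName", "Value": job_name}],
                "Value": 1.0,
                "Unit": "Count",
            }],
        )

A dashboard widget that sums these two metrics per job over the last 7 days then answers the original question directly.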
I want to know whether there is any QLDB metric I could use to monitor the active, ongoing sessions/transactions.
QLDB doesn't have an Active Query List view like some databases do, due to its 30-second transaction limit and PartiQL limits. However, you can use CloudWatch or a similar monitoring tool to track Read and Write IOs, processing time, OccConflictExceptions, and SessionRateExceededExceptions, and use those to work out how to tune your connections.
https://docs.aws.amazon.com/qldb/latest/developerguide/monitoring-cloudwatch.html
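For example, a quick sketch (the ledger name is hypothetical) that pulls the OccConflictExceptions metric from the AWS/QLDB namespace with boto3:

    import boto3
    from datetime import datetime, timedelta

    cloudwatch = boto3.client("cloudwatch")

    # Sum of OCC conflicts over the last hour, in 5-minute buckets
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/QLDB",
        MetricName="OccConflictExceptions",
        Dimensions=[{"Name": "LedgerName", "Value": "my-ledger"}],  # hypothetical ledger
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=["Sum"],
    )
    for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Sum"])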
The AWS documentation is not descriptive enough to figure out the significance of the PendingTasks metric.
Refer to: https://docs.aws.amazon.com/amazonswf/latest/developerguide/cw-metrics.html
I want to know whether this metric is worth monitoring or alarming on.
When you schedule an SWF workflow, it automatically creates a task list for you, or you can select an already existing task list to place the workflow in.
You can see the task lists on your SWF dashboard.
PendingTasks is reported per task list in each workflow domain and shows how many tasks are pending after each minute.
Whether this metric is worth alarming on is up to you, depending on your use case. If the number of pending tasks keeps growing, it probably means something got stuck or is taking longer than expected. It might be worth alarming on in that case.
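If you decide it is worth it, a sketch of such an alarm with boto3 (the domain, task list, threshold, and SNS topic are all hypothetical; double-check the dimension names against the SWF metrics docs linked above):

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # Alarm when a task list has a large backlog for 15 minutes straight
    cloudwatch.put_metric_alarm(
        AlarmName="swf-pending-tasks-high",
        Namespace="AWS/SWF",
        MetricName="PendingTasks",
        Dimensions=[
            {"Name": "Domain", "Value": "my-domain"},
            {"Name": "TaskListName", "Value": "my-task-list"},
        ],
        Statistic="Maximum",
        Period=300,
        EvaluationPeriods=3,
        Threshold=100,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:alerts"],  # hypothetical topic
    )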
I have some AWS Lambda functions that run about twenty thousand times a day, so I would like to enable logging/alerting to monitor all the errors and exceptions.
The CloudWatch logs are too noisy, and it is difficult to spot the errors.
I'm now planning to write the logs to an AWS S3 bucket, but this will have an impact on performance.
What's the best way you suggest to log and alert the errors?
An alternative would be to leave everything as it is (from the application's perspective) and use Amazon CloudWatch Logs metric filters:
You use metric filters to search for and match terms, phrases, or values in your log events. When a metric filter finds one of the terms, phrases, or values in your log events, you can increment the value of a CloudWatch metric.
Once you have defined your filter, you can create a CloudWatch alarm on the metric and get notified as soon as your defined threshold is reached :-)
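For illustration, creating such a metric filter with boto3 could look like this (the log group, namespace, and pattern are assumptions for your setup):

    import boto3

    logs = boto3.client("logs")

    # Count every log event that contains the word ERROR
    logs.put_metric_filter(
        logGroupName="/aws/lambda/my-function",  # hypothetical log group
        filterName="errors",
        filterPattern="ERROR",
        metricTransformations=[{
            "metricName": "ErrorCount",
            "metricNamespace": "MyApp",  # hypothetical custom namespace
            "metricValue": "1",
        }],
    )

An alarm on MyApp/ErrorCount then notifies you without touching the function code at all.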
Edit
I didn't check the link from @Renato Gama, sorry. Just follow the instructions behind the link and your problem should be solved easily...
If you haven't tried this already, I suggest creating CloudWatch alerts based on custom metric filters. Have a look here: https://www.opsgenie.com/blog/2014/08/how-to-use-cloudwatch-to-generate-alerts-from-logs
(Of course you don't have to use the OpsGenie service suggested in the post I linked; you can implement anything that helps you debug the problems.)
I have a task that gathers some information from several websites and saves it to disk. I want this task to run automatically on a daily basis.
I took a little tour of Google Cloud Platform, but couldn't understand how to fit the service to my needs.
I would really appreciate it if someone could suggest some key points/main guidelines on how it should be done.
Thanks!
The easiest way to run time-based or scheduled jobs is via a Linux cron job (https://help.ubuntu.com/community/CronHowto).
You can set up your scripts to run at a specific time or interval, and it should work. A checklist for you:
Bash scripts of tasks you want to perform
CronJobs that are schedules to run these scripts at specified time intervals
That should do it.
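For example, a crontab entry (paths are hypothetical) that runs a scraping script every day at 06:00, added via crontab -e:

    # m h dom mon dow  command
    0 6 * * * /home/me/scripts/scrape.sh >> /home/me/logs/scrape.log 2>&1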
We're trying to move to AWS and to use DynamoDB. It'd be nice to keep everything under DynamoDB so there aren't extraneous types of databases, but aside from half-complete research projects I'm not really finding anything to use as a scheduler. There are going to be dynamically set schedules, in the range of thousands or more, possibly with many running at the same time. As for languages, Java or at least something on the JVM would be awesome.
Does anyone know a good Scheduler for DynamoDB or other AWS technology?
---Addendum
When I say scheduler, I'm thinking of something all-purpose like Quartz. I want to set a cron schedule and have it run the code I give it at that time. This isn't for some AWS task; this is a task internal to our product. SWF's cron runs inside the VM, so I'm worried about what happens when the VM is down. Data Pipeline seems like a bit too much. I've been looking into making a DynamoDB job store for Quartz; consistent reads might get around the transaction and consistency issues, but I'm hesitant; I might be biting off a lot, with many hard-to-notice problems.
Have you looked at AWS Simple Workflow? You would use the AWS Flow Framework to program against the service, and they have a well documented Java API with lots of samples. They support continuous workflows with timers which you can use to run periodic code (see code example here). I'm using SWF and the Flow Framework for Ruby to run async code that gets kicked off from my main app, and it's been working great.
Another new option for you is to look at AWS Lambda. You can attach your Lambda function code directly to a DynamoDB table update event, and Lambda will spin up and shut down the compute resources for you, without you having to manage a server to run your code. Also, recently, AWS launched the ability to call the Lambda function directly -- e.g. you could have an external timer or other code that triggers the function on a specific schedule.
Lastly, this SO thread may have other options for you to consider.
Another option is to use AWS Lambda Scheduled Functions (announced on October 8th, 2015 at AWS re:Invent).
Here is a relevant snippet from the blog (source):
Scheduled Functions (Cron)
You can now invoke a Lambda function on a regular, scheduled basis. You can specify a fixed rate (number of minutes, hours, or days between invocations) or you can specify a Cron-like expression:
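For example, a rate expression looks like rate(1 day) and a cron expression like cron(0 17 ? * MON-FRI *). A sketch of wiring this up with boto3 (the rule name, function name, and ARNs are hypothetical):

    import boto3

    events = boto3.client("events")
    lambda_client = boto3.client("lambda")

    # A rule that fires every day at 06:00 UTC
    events.put_rule(
        Name="daily-job",
        ScheduleExpression="cron(0 6 * * ? *)",
    )
    events.put_targets(
        Rule="daily-job",
        Targets=[{"Id": "1", "Arn": "arn:aws:lambda:us-east-1:123456789012:function:my-func"}],
    )
    # The function must also allow CloudWatch Events to invoke it
    lambda_client.add_permission(
        FunctionName="my-func",
        StatementId="allow-events",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn="arn:aws:events:us-east-1:123456789012:rule/daily-job",
    )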