What does the PendingTasks metric in SWF signify? - amazon-web-services

The AWS documentation is not descriptive enough to work out what the PendingTasks metric signifies.
See: https://docs.aws.amazon.com/amazonswf/latest/developerguide/cw-metrics.html
I want to know whether this metric is worth monitoring or alarming on.

When you schedule an SWF workflow, it automatically creates a task list for you. Alternatively, you can select an existing task list to place the workflow in.
You can see the task lists on your SWF dashboard.
PendingTasks is reported for each task list in each workflow domain and shows how many tasks are pending in each one-minute interval.
Whether this metric is worth alarming on depends on your use case. If the number of pending tasks keeps growing, it probably means something got stuck or is taking longer than expected. In that case an alarm may be worthwhile.
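As a rough illustration of alarming on a growing backlog, the sketch below builds the arguments for a CloudWatch alarm on PendingTasks. The namespace `AWS/SWF`, the dimension names `Domain` and `TaskListName`, and the `Sum` statistic are assumptions taken from memory of the metrics page linked in the question, so check them there. The domain, task list, threshold and alarm name are hypothetical.

```python
def pending_tasks_alarm(domain, task_list, threshold=100, minutes=5):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm fires when PendingTasks stays above `threshold`
    for `minutes` consecutive one-minute periods.
    """
    return {
        "AlarmName": f"swf-pending-tasks-{domain}-{task_list}",  # hypothetical naming scheme
        "Namespace": "AWS/SWF",              # assumption: verify against the SWF metrics page
        "MetricName": "PendingTasks",
        "Dimensions": [                      # assumption: dimension names for this metric
            {"Name": "Domain", "Value": domain},
            {"Name": "TaskListName", "Value": task_list},
        ],
        "Statistic": "Sum",
        "Period": 60,                        # the metric is reported per minute
        "EvaluationPeriods": minutes,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data is treated as "no backlog"
    }


def create_alarm(params):
    """Send the alarm definition to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)


params = pending_tasks_alarm("orders", "default", threshold=50, minutes=3)
```

Requiring several consecutive periods rather than one avoids alarming on a short burst that the workers clear on their own.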

Related

AWS Glue jobs status Dashboard

In our project, a total of 10 Glue jobs run daily. I would like to build a dashboard that shows the status of the jobs over the last 7 days, that is, whether each one succeeded or failed. I tried to achieve this in CloudWatch with metrics but was not able to. Please suggest a way to build this dashboard.
Probably a little late for the original questioner, but maybe helpful for others.
We had a similar task in our project. We have many jobs and need to monitor success and failure. In our experience, the built-in metrics aren't really reliable, nor do they really answer the question of whether a job was successful or not.
But we found a good way for us by generating custom metrics in a generic way for all jobs. This also works for existing jobs afterwards without having to change the code.
I wrote an article about it: https://medium.com/#ettefette/metrics-for-aws-glue-jobs-as-you-know-them-from-lambda-functions-e5e1873c615c
We have set up CloudWatch alarms based on these metrics, and we use the metrics in our Grafana dashboard to monitor the Glue jobs.

Cron Jobs vs Task Scheduler table for scheduled emails

Preamble: I have a web app whose backend is based on a serverless architecture. It's basically an Amplify app hosted on AWS with a DynamoDB database. I've learnt that it is possible to create a task scheduling system of sorts (more here). A quick summary of the article: "It's possible to create a task scheduling table that takes advantage of TTL and DynamoDB streams to execute a Lambda function at specific times. The TTL specifies a set time for a record to be deleted; we can capture this delete event in a DynamoDB stream and run some tasks based on information from the stream."
Problem:
The goal is to send a series of emails to users who sign up for our service. Each user that signs up gets a series of "Getting Started" emails. The first of the emails is sent 24 hours after a user signs up, the second 3 days later and the third exactly 7 days after sign up.
I see how a cron job would be suitable here, but it seems a bit inefficient to me. I would basically have to search the users table for users whose sign-up time falls within a specific 24-hour period and send the email to those users. With a task scheduler table, I could instead add a task to the table (something like "send first email to user300", with a TTL of when I want it to be sent) and listen for delete events to run the task. There is no need to run a cron job daily, just a function that handles each task as it comes.
I think this is more of a performance vs. storage problem. A task scheduler table would take up space: if we add every email to be sent to a user as a task on the table each time a user signs up (each email to a specific user is its own task), then the table grows by 3n records for every n users signed up. But this may not really be a problem, as tasks are deleted after they are run. I do not know the performance cost of using a cron job for this particular task, hence I'm here. I may also be wrong, and the cost of running and updating this task scheduler table may be more than that of the cron job.
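To make the "3n records" point concrete, here is a minimal sketch of the three task items that would be written for one sign-up. The attribute names (`task_id`, `user_id`, `email`, `ttl`) and the email names are hypothetical, and the offsets assume the three emails go out 1, 3 and 7 days after sign-up, which is one reading of the schedule described above.

```python
from datetime import datetime, timedelta, timezone

# Assumption: all offsets are measured from the sign-up time.
EMAIL_OFFSETS = {
    "getting-started-1": timedelta(days=1),
    "getting-started-2": timedelta(days=3),
    "getting-started-3": timedelta(days=7),
}


def tasks_for_signup(user_id, signed_up_at):
    """Return one scheduler-table item per email for a newly signed-up user.

    DynamoDB TTL attributes hold an epoch timestamp in seconds, so the
    send time is stored as an integer in the (hypothetical) `ttl` attribute.
    """
    return [
        {
            "task_id": f"{user_id}#{name}",
            "user_id": user_id,
            "email": name,
            "ttl": int((signed_up_at + offset).timestamp()),
        }
        for name, offset in EMAIL_OFFSETS.items()
    ]


signup = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
items = tasks_for_signup("user300", signup)  # three items for one user
```

Each item is small, so the storage side of the trade-off is mostly the write and stream-processing cost of three extra records per user rather than the space they occupy.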
I initially thought of setting up a dummy user table and running both the cron and the task scheduler and documenting cost of running both, but you can imagine how much time and effort that would take.
So I guess my question is which is a more efficient solution in terms of performance and cost?
There is no perfect solution here. Keep in mind that DynamoDB TTL can take up to 48 hours to delete an expired item, so it's probably unacceptable for this. Cron jobs with Lambda are cheap and easy to set up. You could also use SQS and populate it with a daily cron job. Yan Cui wrote a great article about this problem: https://theburningmonk.com/2019/03/dynamodb-ttl-as-an-ad-hoc-scheduling-mechanism/
This may not exactly be an answer. Based on the medium article you linked the guy had a plausible reason why the TTL and dynamoDB streams would be better than a cron job which you reiterated. Setting up a cron job is easier and cheaper (free) and I doubt the performance will be that much worse unless the database is huge. I don't have any experience doing something like this so I wouldn't know how large the database would have to be for it to make sense to switch over. Alternatively, you can have as many cron jobs as you want so I don't see how you couldn't just set up a user specific cron job whenever someone signs up.
You can set up a CloudWatch Event to fire a Lambda function on a regular schedule. The Lambda function can search a database for an applicable result set and perform other actions: send an email, a text message, etc.
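The selection step of such a scheduled function can be sketched as below. It assumes the user records are already loaded as dictionaries with hypothetical `user_id` and `signed_up_at` fields, and that the emails are due 1, 3 and 7 days after sign-up. The database query and the email sending are left out.

```python
from datetime import datetime, timedelta, timezone

# Assumption: all offsets are measured from the sign-up time.
EMAIL_OFFSETS = {
    "getting-started-1": timedelta(days=1),
    "getting-started-2": timedelta(days=3),
    "getting-started-3": timedelta(days=7),
}


def due_emails(users, window_start, window_end):
    """Return (user_id, email_name) pairs whose send time falls in the window.

    The window is half-open, [window_start, window_end), so consecutive
    daily runs never pick up the same email twice.
    """
    due = []
    for user in users:
        for name, offset in EMAIL_OFFSETS.items():
            send_at = user["signed_up_at"] + offset
            if window_start <= send_at < window_end:
                due.append((user["user_id"], name))
    return due


utc = timezone.utc
users = [
    {"user_id": "u1", "signed_up_at": datetime(2024, 1, 1, 12, 0, tzinfo=utc)},
    {"user_id": "u2", "signed_up_at": datetime(2023, 12, 30, 6, 0, tzinfo=utc)},
]
todays = due_emails(
    users,
    datetime(2024, 1, 2, tzinfo=utc),
    datetime(2024, 1, 3, tzinfo=utc),
)
```

With a daily window an email goes out up to a day after its nominal time. Running the schedule hourly with an hourly window tightens that at the cost of more invocations.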
Here is an AWS tutorial that covers a very similar use case with step by step instructions. This tutorial is implemented by using the AWS Java API (but you can implement it using other supported programming languages).
https://github.com/awsdocs/aws-doc-sdk-examples/tree/master/javav2/usecases/creating_scheduled_events
From a Cost perspective - Lambda allows 1M free requests per month. Details are here - https://aws.amazon.com/lambda/pricing/

AWS CloudWatchLog limit

I am trying to find a centralized solution so that I can move my application logging out of the database (RDS).
I was thinking of using CloudWatch Logs but noticed that there is a limit on PutLogEvents requests:
The maximum rate of a PutLogEvents request is 5 requests per second per log stream.
Even if I break my logs into many streams (by EC2 instance and by log type: error, info, warning, debug), the limit of 5 requests per second is still very restrictive for an active application.
The other solution is to accumulate logs somehow and send PutLogEvents with a batch of log records, but that means I am forced to use a database to accumulate those records.
So the questions are:
Maybe I'm wrong and the limit of 5 requests per second is not so restrictive?
Is there any other solution that I should consider, for example DynamoDB?
PutLogEvents is designed to put several events per request by definition (as its name says: PutLogEvent"S") :) The CloudWatch Logs agent does this batching on its own, so you don't have to worry about it.
However, please note: I don't recommend generating too many logs (e.g. don't run debug mode in production), as CloudWatch Logs can become pretty expensive as your log volume grows.
My advice would be to use a Logstash solution on an AWS instance.
Alternatively, you can run Logstash on another existing instance or container.
https://www.elastic.co/products/logstash
It is designed for this scope and it does it wonderfully.
CloudWatch is not primarily designed for your needs.
I hope this helps somehow.
If you are calling this API directly from your application: the short answer is that you need to batch your log events (the limit is 5 PutLogEvents requests per second per log stream).
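For the direct-API case, batching can be done in memory without a database. The sketch below groups events into lists that each fit in one PutLogEvents request, using the documented per-request limits of 10,000 events and 1,048,576 bytes, where each event counts as its UTF-8 message length plus 26 bytes. It does not handle a single oversized message or the rule that one batch may not span more than 24 hours. The function name is hypothetical.

```python
MAX_EVENTS = 10_000          # maximum events per PutLogEvents request
MAX_BYTES = 1_048_576        # maximum batch size in bytes
PER_EVENT_OVERHEAD = 26      # bytes added per event when sizing a batch


def batch_log_events(events):
    """Yield lists of events that each fit in one PutLogEvents request.

    Each event is a dict with "timestamp" (milliseconds since the epoch)
    and "message" (str), the shape PutLogEvents expects. Events are
    sorted by timestamp because a batch must be in chronological order.
    """
    batch, size = [], 0
    for event in sorted(events, key=lambda e: e["timestamp"]):
        event_size = len(event["message"].encode("utf-8")) + PER_EVENT_OVERHEAD
        # Start a new batch when adding this event would break a limit.
        if batch and (len(batch) >= MAX_EVENTS or size + event_size > MAX_BYTES):
            yield batch
            batch, size = [], 0
        batch.append(event)
        size += event_size
    if batch:
        yield batch


events = [{"timestamp": 1_700_000_000_000 + i, "message": f"line {i}"} for i in range(25_000)]
batches = list(batch_log_events(events))  # each batch is one request
```

At 5 requests per second per stream, full batches allow up to 50,000 small events per second on a single stream, which puts the limit in a different light.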
If you are writing the logs to disk and after that you are pushing them there is already an agent that knows how to push the logs (http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/QuickStartEC2Instance.html)
Meta: I would suggest that you prototype this and make sure it works for the log volume that you have. Also, keep in mind that, because of how the CloudWatch API works, only one application/user can push to a log stream at a time (see the token you have to pass in), so you probably need to use multiple streams, one per user or maybe per log type, to ensure that your applications are not competing for the same stream.
Meta Meta: think about how your application behaves if the logging subsystem fails, and whether you can live with the possibility of losing logs (i.e. is it critical for you to always have the guarantee that you will get the logs?). This will probably drive what you do and which solution you ultimately pick.

What good alternatives are there to Copperegg for monitoring EC2 instances?

I've been using Copperegg for a while now and have generally been happy with it until lately, where I have had a few issues. It's being used to monitor a number of EC2 instances that must be up 24/7.
Last week I was getting phantom alerts that servers had gone down when they hadn't, which I can cope with, but I also didn't get an alert when I should have. One server had high CPU for over 5 minutes when the alert should have been triggered after 1 minute. Copperegg support weren't all that helpful, merely agreeing that an alert should have been triggered.
The latter of those problems is unacceptable, and if it were to happen again outside of working hours then serious problems would follow.
So, I'm looking for alternative services that will do that same job. I've looked at Datadog and New Relic, but both have a significant problem in that they will only alert me of a problem 5 minutes after it's occurred, rather than the 1 minute I can get with Copperegg.
What else is out there that can do the same job and will also integrate with Pager Duty?
tl;dr : Amazon CloudWatch will do what you want and probably much much more.
I believe that Amazon actually offers a service that would accomplish your goal - CloudWatch (pricing). I'm going to take your points one by one. Note that I haven't actually used it before, but the documentation is fairly clear.
One server had high CPU for over 5 mins when the alert should be triggered after 1 minute
It looks like CloudWatch can be configured to send an alert (which I'll get to) after one minute of a condition being met:
One can actually set conditions for many other metrics as well - this is what I see on one of my instances, and I think that detailed monitoring (I use free), might have even more:
What else is out there that can do the same job and will also integrate with Pager Duty?
I'm assuming you're talking about this. It turns out that Pager Duty has a helpful guide just for integrating CloudWatch. How nice!
Pricing
Here's the pricing page, as you would probably like to parse it instead of me telling you. I'll give a brief overview, though:
You don't want basic monitoring, as it only gives you metrics once per five minutes (which you've indicated is unacceptable.) Instead, you want detailed monitoring (once every minute).
For an EC2 instance, the price for detailed monitoring is $3.50 per instance per month. Additionally, every alarm you make is $0.10 per month. This is actually very cheap if compared to CopperEgg's pricing - $70/mo versus maybe $30 per month for 9 instances and copious amounts of alarms. In reality, you'll probably be paying more like $10/mo.
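As a quick check of that estimate, the arithmetic can be written out using the prices quoted above ($3.50 per instance and $0.10 per alarm, per month). The alarm count of 20 is a hypothetical stand-in for "copious amounts of alarms", and current AWS pricing may differ.

```python
DETAILED_MONITORING_CENTS = 350  # $3.50 per instance per month, as quoted above
ALARM_CENTS = 10                 # $0.10 per alarm per month, as quoted above


def monthly_cost_cents(instances, alarms):
    """Monthly CloudWatch cost in cents (integers avoid float rounding)."""
    return instances * DETAILED_MONITORING_CENTS + alarms * ALARM_CENTS


cost = monthly_cost_cents(instances=9, alarms=20)  # 20 alarms is a hypothetical count
print(f"${cost / 100:.2f} per month")
```

At these prices, detailed monitoring for the 9 instances accounts for $31.50 and the alarms add only a couple of dollars.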
Pager Duty's tutorial suggests you use SNS, which is another cost. The good thing: it's dirt cheap. $0.60 per million notifications. If you ever get above a dollar in a year for SNS, you need to perform some serious reliability improvements on your servers.
Other shiny things!
You're not just limited to Amazon's pre-packaged metrics! You can actually send custom metrics (time it took to complete a cronjob, whatever) to Cloudwatch via a PUT request. Quite handy.
Submit Custom Metrics generated by your own applications (or by AWS resources not mentioned above) and have them monitored by Amazon CloudWatch. You can submit these metrics to Amazon CloudWatch via a simple Put API request.
(from here)
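A minimal sketch of such a custom metric, assuming the boto3 SDK is used for the Put request: the builder below prepares the arguments for CloudWatch `put_metric_data`. The namespace, metric name and dimension are hypothetical examples for timing a cron job.

```python
def cron_duration_metric(job_name, seconds):
    """Build keyword arguments for CloudWatch put_metric_data."""
    return {
        "Namespace": "Custom/CronJobs",  # hypothetical custom namespace
        "MetricData": [
            {
                "MetricName": "Duration",  # hypothetical metric name
                "Dimensions": [{"Name": "JobName", "Value": job_name}],
                "Value": float(seconds),
                "Unit": "Seconds",
            }
        ],
    }


def publish(params):
    """Send the data point to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("cloudwatch").put_metric_data(**params)


params = cron_duration_metric("nightly-backup", 42.5)
```

Once the metric exists, it can be alarmed on in the same way as the built-in EC2 metrics.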
Conclusion
So all in all: CloudWatch is quite cheap, can do 1-minute frequency stats, and will integrate with Pager Duty.
tl;dr: Server Density will do what you want, on top of that it has web checks and custom metrics too.
In short Server Density is a monitoring tool that will monitor all the relevant server metrics. You can take a look at this page where it’s all described.
One server had high CPU for over 5 mins when the alert should be triggered after 1 minute
Server Density’s open source agent collects and posts the data to their server every minute and you can decide yourself when that alert should be triggered. In the alert below you can see that the alert will alert 1 person after 1 minute and then repeatedly alert every 5 minutes.
There are a lot of other metrics that you can alert on too.
What else is out there that can do the same job and will also integrate with Pager Duty?
Server Density also integrates with PagerDuty. The only thing you need to do is generate an API key at PagerDuty.
Provide that API key in the settings, and you can then select PagerDuty as one of the alert recipients.
Pricing
You can find the pricing page here. I'll give you a brief overview of it. The pricing starts at $10 for one server plus one web check and then gets cheaper per server the more servers you add.
Everything is monitored once every minute, and there are no fees for the number of alerts added or triggered, even if the alert is an SMS to your phone number. The cost is slightly higher than in the CloudWatch example, but the support is good. If you used Copperegg before, they have a migration tool too.
Other shiny things!
Server Density allows you to monitor all the things! The only thing you need to do is send us custom metrics, which you can do with a plugin written by yourself or by someone else.
I have to say that the graphs Server Density provides are somewhat akin to eye candy too. Most other monitoring solutions I've seen out there have quite dull dashboards.
Conclusion
It will do the job for you. It's not as cheap as CloudWatch, but it doesn't lock you into AWS. It'll give you 1-minute frequency metrics and integrate with PagerDuty, plus a lot more.

Using any of the Amazon Web Services, how could I schedule something to happen 1 year from now?

I'd like to be able to create a "job" that will execute at an arbitrary time from now... Let's say 1 year from now. I'm trying to come up with a stable, distributed system that doesn't rely on me maintaining a server and scheduling code. (Obviously, I'll have to maintain the servers that execute the job.)
I realize I can poll simpleDB every few seconds and check to see if there's anything that needs to be executed, but this seems very inefficient. Ideally I could create an Amazon SNS topic that would fire off at the appropriate time, but I don't think it's possible.
Alternatively, I could create a message in the Amazon SQS that would not be visible for 1 year. After 1 year, it becomes visible and my polling code picks up on it and executes it.
It would seem this is a topic, like Singletons or Inversion of Control, that PhDs have discussed and come up with best practices for. I can't find the articles, if there are any.
Any ideas?
Cheers!
The easiest way for most people to do this would be to run at least an EC2 server with a cron job on the EC2 server to trigger an action. However, the cost of running an EC2 server 24 hours a day for a year just to trigger an action would be around $170 at the cheapest (8G t1.micro with Heavy Utilization Reserved Instance). Plus, you have to monitor that server and recover from failures.
I have sketched out a different approach to running jobs on a schedule that uses AWS resources completely. It's a bit more work, but does not have the expense or maintenance issues with running an EC2 instance.
You can set up an Auto Scaling schedule (cron format) to start an instance at some point in the future, or on a recurring schedule (e.g., nightly). When you set this up, you specify the job to be run in a user-data script for the launch configuration.
I've written out sample commands in the following article, along with special settings you need to take care of for this to work with Auto Scaling:
Running EC2 Instances on a Recurring Schedule with Auto Scaling
http://alestic.com/2011/11/ec2-schedule-instance
With this approach, you only pay for the EC2 instance hours when the job is actually running and the server can shut itself down afterwards.
This wouldn't be a reasonable way to schedule tens of thousands of emails with an individual timer for each, but it can make a lot of sense for large, infrequent jobs (a few times a day to once per year).
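As an illustration of the one-off case, the sketch below builds two scheduled actions with boto3's `put_scheduled_update_group_action` parameters: one that raises the group to a single instance at the chosen time, and one that drops it back to zero after a maximum runtime as a safety net. This is one plausible arrangement rather than the article's own commands. The group name, action names and times are hypothetical, and whether the service accepts a start time a full year out is something to verify.

```python
from datetime import datetime, timedelta, timezone


def scheduled_actions(group_name, run_at, max_runtime):
    """Build start and stop actions for a one-off job in an Auto Scaling group."""

    def action(name, start_time, size):
        return {
            "AutoScalingGroupName": group_name,
            "ScheduledActionName": name,  # hypothetical action names
            "StartTime": start_time,      # a one-time action, so no Recurrence
            "MinSize": size,
            "MaxSize": size,
            "DesiredCapacity": size,
        }

    return [
        action("job-start", run_at, 1),               # launch one instance
        action("job-stop", run_at + max_runtime, 0),  # scale back to zero
    ]


def register(actions):
    """Create the scheduled actions (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("autoscaling")
    for params in actions:
        client.put_scheduled_update_group_action(**params)


run_at = datetime(2030, 1, 1, 3, 0, tzinfo=timezone.utc)  # hypothetical job time
actions = scheduled_actions("yearly-job-asg", run_at, timedelta(hours=2))
```

The job itself would still be defined in the launch configuration's user-data script, as described above.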
I think it really depends on what kind of job you want to execute in 1 year and whether that value (1 year) is actually hypothetical. There are many ways to schedule a task; Windows and Linux both offer a service for scheduling tasks: Task Scheduler on Windows and crontab on Linux. In addition to those operating-system-specific solutions, you can use maintenance tasks on MSSQL Server, and I'm sure many of the larger databases have similar features.
Without knowing more about what you plan on doing, it's kind of hard to suggest any more alternatives, since I think many of the other solutions would be specific to the technologies and platforms you plan on using. If you want to provide some more insight into what you're going to be doing with these tasks, then I'd be more than happy to expand my answer to be more helpful.