I have 70 EBS volumes that I need to schedule daily snapshots of. I found this tutorial in the AWS documentation, which is helpful, and I have already toyed with the AWS CLI to fetch a list of the 70 volume IDs; however, it's not clear to me how I can then feed that many volume IDs back into the Event Rule.
Through the Console, I can only add one Target (Create Snapshot API, Volume ID, and Role) at a time. Looking at the AWS CLI documentation for put-targets, I'm not seeing how to form the command to do this, even if I used some creative find-and-replace work in Notepad to just make a ton of individual commands. Namely, I'm not seeing how to select the Create Snapshot API as the Target, and since each Target has slightly different requirements, I'm not sure how to supply the volume ID or IAM Role.
What is the most expedient way to get 70 EBS volume IDs added as Create Snapshot API Targets for an EventBridge Rule, or do I just gotta bear down and do them all by hand?
Instead of building a custom solution like that, AWS Backup is nowadays a much more effective option for this type of task. It also allows you to set a retention period more easily to lifecycle your snapshots, and to create backup policies based on tags.
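As a rough sketch with the AWS CLI (the plan name, vault, schedule, role ARN, and tag key are all placeholders; check the AWS Backup docs for the exact JSON shapes):

# Create a daily backup plan with a 7-day retention (values are examples)
aws backup create-backup-plan --backup-plan '{
  "BackupPlanName": "daily-ebs-snapshots",
  "Rules": [{
    "RuleName": "daily",
    "TargetBackupVaultName": "Default",
    "ScheduleExpression": "cron(0 5 * * ? *)",
    "Lifecycle": {"DeleteAfterDays": 7}
  }]
}'

# Assign volumes to the plan by tag instead of listing 70 volume IDs
aws backup create-backup-selection --backup-plan-id <plan-id-from-previous-call> --backup-selection '{
  "SelectionName": "tagged-volumes",
  "IamRoleArn": "arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole",
  "ListOfTags": [{"ConditionType": "STRINGEQUALS", "ConditionKey": "Backup", "ConditionValue": "daily"}]
}'

Tag the 70 volumes once, and anything tagged the same way in future gets picked up automatically.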
If you really want to do it with CloudWatch Events, you need at least as many event rules as you have volumes, since the snapshot API is only called once per scheduled rule and the API does not take a list of volumes, just a single volume. So you'll need 70 scheduled rules, which doesn't scale very well :). The second option is to use a Lambda as the event rule target that processes everything, but again, it's more work than AWS Backup.
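If you do go the per-volume route, a hedged sketch of scripting it (the rule name, Lambda ARN, and input shape are assumptions, and the Lambda that actually calls the CreateSnapshot API is left to you):

# One scheduled rule + Lambda target per volume; the Lambda reads volume_id
# from the event input and calls CreateSnapshot itself.
# (Each Lambda target also needs an events.amazonaws.com invoke permission
# added via aws lambda add-permission.)
for vol in $(aws ec2 describe-volumes --query 'Volumes[].VolumeId' --output text); do
  aws events put-rule --name "snapshot-$vol" --schedule-expression "cron(0 5 * * ? *)"
  aws events put-targets --rule "snapshot-$vol" --targets "[{
    \"Id\": \"1\",
    \"Arn\": \"arn:aws:lambda:us-east-1:123456789012:function:create-snapshot\",
    \"Input\": \"{\\\"volume_id\\\": \\\"$vol\\\"}\"
  }]"
done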
Is there a way to allow creation of a resource like a DynamoDB table only if the table being created is PAY_PER_REQUEST or is provisioned with capacity below a certain amount?
I initially looked at IAM condition keys, but they appear to be available only for data operations on the table (scan, update, put, etc.), not for table creation operations.
Alternatively, are there ways to reduce service quotas for an account?
Ideally, I'd like to scope down the ability to create DynamoDB tables beyond a certain capacity, but I'm not sure how to do it proactively rather than retroactively processing CloudTrail logs or listing existing table properties.
AWS Config
You can use AWS Config to retrospectively query AWS resources and their properties, and then determine whether they are compliant or not. There are rules already available out of the box, but I can't see one which matches your use case, so you will need to write a Lambda function to implement this yourself. Here is an example.
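A minimal sketch of wiring a custom rule to your own evaluation Lambda via the CLI (the function name is hypothetical, and the Lambda itself has to report its verdict back to Config with PutEvaluations):

# Register a custom Config rule that fires on DynamoDB table changes and
# delegates the compliance decision to your Lambda function.
# (Config also needs an aws lambda add-permission call so the
# config.amazonaws.com principal can invoke the function.)
aws configservice put-config-rule --config-rule '{
  "ConfigRuleName": "dynamodb-capacity-limit",
  "Scope": {"ComplianceResourceTypes": ["AWS::DynamoDB::Table"]},
  "Source": {
    "Owner": "CUSTOM_LAMBDA",
    "SourceIdentifier": "arn:aws:lambda:us-east-1:123456789012:function:check-dynamodb-capacity",
    "SourceDetails": [{"EventSource": "aws.config", "MessageType": "ConfigurationItemChangeNotification"}]
  }
}'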
After your rule is working you can either create a remediation action to
Delete the Table
Scale the Table Down
Send a Notification
Adjust Autoscaling (e.g. reduce the maximum)
AWS Budgets
(My Preference)
For determining whether an account is using too much DynamoDB, probably the easiest option is to set up a budget for the DynamoDB service. That would have a couple of benefits:
Auto-Scaling: Developers would be free to use high amounts of capacity (such as for load tests) for short periods of time.
Potentially Cheaper: what I have found is that if you put restrictions on projects, developers will often allocate 100% of the maximum, as opposed to using only what they need, for fear of another developer coming along and taking all the capacity.
Just as before with AWS Config, you can set up billing alarms to take action and notify developers that they are using too much DynamoDB, for example when the budget is at 50%, 80%, and so on.
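A hedged sketch of a monthly DynamoDB-only cost budget with an email notification (the amount, account ID, and address are placeholders; double-check the CostFilters key against the Budgets docs):

aws budgets create-budget --account-id 123456789012 \
  --budget '{
    "BudgetName": "dynamodb-monthly",
    "BudgetType": "COST",
    "TimeUnit": "MONTHLY",
    "BudgetLimit": {"Amount": "200", "Unit": "USD"},
    "CostFilters": {"Service": ["Amazon DynamoDB"]}
  }' \
  --notifications-with-subscribers '[{
    "Notification": {"NotificationType": "ACTUAL", "ComparisonOperator": "GREATER_THAN", "Threshold": 80, "ThresholdType": "PERCENTAGE"},
    "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "team@example.com"}]
  }]'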
CloudWatch
You could also create CloudWatch Alarms for certain DynamoDB metrics, looking at the capacity which has been used and, again, responding to excessive use.
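For example, a sketch of an alarm on consumed write capacity for one table (the table name, threshold, and SNS topic are placeholders):

aws cloudwatch put-metric-alarm --alarm-name orders-table-wcu-high \
  --namespace AWS/DynamoDB --metric-name ConsumedWriteCapacityUnits \
  --dimensions Name=TableName,Value=orders \
  --statistic Sum --period 300 --evaluation-periods 3 \
  --threshold 150000 --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:dynamodb-usage-alerts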
Conclusion
You have a lot of flexibility in how to approach this, so make sure you have gathered your requirements; then the appropriate response will be easier to see. AWS Config requires a bit more work than Budgets, so if you can get what you want out of Budgets, I would do that.
Our CIO had a heart attack upon seeing our AWS bill.
I need to aggregate Apache and Tomcat logs from multiple EC2 instances (in an auto-scaling group) -- what would be the best way to approach this without breaking the bank? The goal is to view events by IP address and account name and to follow transaction flows (diagnostic/audit logging -- not so much performance metrics).
ELK is out of the equation (political). CloudWatch is allowed, plus anything else.
Depends on volume and access patterns, but pushing the logs to S3 and using Athena to query them is a good shout.
It's cheap because S3 is a really cheap datastore, and Athena is serverless, meaning you only pay for the queries you run.
Make sure you convert the logs to a compressed data format (like Apache Parquet) to save even more dosh.
https://aws.amazon.com/athena
https://docs.aws.amazon.com/athena/latest/ug/querying-apache-logs.html
https://aws.amazon.com/blogs/big-data/analyzing-data-in-s3-using-amazon-athena/
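Once the table is defined (the Apache-logs guide above walks through the SerDe), querying from the CLI looks roughly like this; the database, table, and result bucket names here are made up:

# Run a query against an already-defined table of access logs and write the
# results to a scratch bucket.
aws athena start-query-execution \
  --query-string "SELECT client_ip, count(*) AS hits FROM logs.apache_access GROUP BY client_ip ORDER BY hits DESC LIMIT 20" \
  --query-execution-context Database=logs \
  --result-configuration OutputLocation=s3://my-athena-results/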
My argument against S3/Athena would be that S3 may be the cheapest storage mechanism, but how will you get the logs off your box and into S3? I'm not aware of any AWS agents that do this, though there may be some commercial or open-source projects for it. Also, there is some setup required to get Athena to work for searching, such as defining schemas and/or setting up AWS Glue Crawlers to discover the data. You'll often find that Glue Crawlers aren't great at identifying log data if it isn't in something like JSON format.
I would highly recommend CloudWatch. AWS has created a CloudWatch agent that is available for multiple OSs that will pull and forward your logs from your EC2 instances. CloudWatch also has some free searching tools and now the more powerful CloudWatch Insights tool to help you search your data in a way similar to what other first-class log aggregators allow.
CloudWatch pricing is also pretty cheap. It's only $0.50/GB ingested and $0.02/GB for long-term storage (in us-east-1 at least). And there is no charge to use the CloudWatch agent, which is the biggest advantage, as you don't have to invent and test a new way to pull logs off of your boxes.
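As an illustration, the JSON below is a minimal log-collection config for the agent (file paths and log group names are assumptions; Tomcat's other logs are added the same way), saved to the agent's config file, e.g. /opt/aws/amazon-cloudwatch-agent/etc/config.json:

{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {"file_path": "/var/log/httpd/access_log", "log_group_name": "apache-access", "log_stream_name": "{instance_id}"},
          {"file_path": "/opt/tomcat/logs/catalina.out", "log_group_name": "tomcat-catalina", "log_stream_name": "{instance_id}"}
        ]
      }
    }
  }
}

# Load the config and (re)start the agent on the instance
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/config.json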
I'm hunting down a misbehaving EC2 instance, whose ID I found in my billing logs. I can't do describe-instances on it anymore since it died a few days ago. Is there a way to get its equivalent, i.e. does AWS log this kind of information anywhere? In this particular case, I needed to find out which SSH key it was tied to, but the more details, the merrier.
You can get this information from CloudTrail, as long as it happened in the current region and within the last 90 days.
While you can do it with the Console, you will probably find the CloudTrail CLI easier. Start with this:
aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=RunInstances --no-paginate > /tmp/$$
This dumps all (?) of the RunInstances events to a file, which you can then open in your editor (check the docs; I think that --no-paginate will dump everything, but if you have a lot of events you might have to manually request additional pages).
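If you already know the instance ID, you can also narrow the lookup to that resource and pretty-print the raw events; the key pair shows up in the RunInstances event detail (the instance ID below is made up):

aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=ResourceName,AttributeValue=i-0abc1234def567890 \
  --query 'Events[].CloudTrailEvent' --output text | jq .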
A better long-term solution is to enable CloudTrail logging to an S3 bucket. This gathers events across all regions, and for a multi-account organization, all accounts. You can then use a bucket life-cycle policy to hold onto those events as long as you think you'll need them. It is, however, somewhat more challenging to query against events stored in S3.
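Setting that up is a couple of CLI calls once a bucket with the CloudTrail bucket policy exists (names here are placeholders):

aws cloudtrail create-trail --name account-audit-trail \
  --s3-bucket-name my-cloudtrail-archive --is-multi-region-trail
aws cloudtrail start-logging --name account-audit-trail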
It is not possible to retrieve metadata for terminated instances.
For the future, you can try an alternate approach. For example, use AWS Config with a custom rule, or write a little Lambda function which saves the metadata of your instances and trigger it periodically.
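Even without Config, a crude version of that is just a scheduled dump of describe-instances to S3 (the bucket name is a placeholder), which you can grep later for dead instances:

# Run from cron or a scheduled job; keeps a dated snapshot of instance metadata
aws ec2 describe-instances \
  --query 'Reservations[].Instances[].{Id:InstanceId,Key:KeyName,Type:InstanceType,Launched:LaunchTime}' \
  --output json | aws s3 cp - "s3://my-instance-inventory/$(date +%F).json"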
I want to build an end to end automated system which consists of the following steps:
Getting data from source to landing bucket AWS S3 using AWS Lambda
Running some transformation job using AWS Lambda and storing in processed bucket of AWS S3
Running Redshift copy command using AWS Lambda to push the transformed/processed data from AWS S3 to AWS Redshift
From the above points, I've completed pulling the data, transforming it, and running a manual COPY command against Redshift using a SQL query tool.
Doubts:
I've heard AWS CloudWatch can be used to schedule/automate things, but I have never worked with it. So, if I want to achieve the steps above in a streamlined fashion, how do I go about it?
Should I use Lambda to trigger copy and insert statements? Or are there better AWS services to do the same?
Any other suggestion on other AWS Services and of the likes are most welcome.
Constraint: I want as many tasks as possible to be serverless (except for the semantic layer, Redshift).
CloudWatch:
Your options here are either to use CloudWatch Alarms or Events.
With alarms, you can respond to any metric of your system (eg CPU utilization, Disk IOPS, count of Lambda invocations etc) when it crosses some threshold, and when this alarm is triggered, invoke a lambda function (or send SNS notification etc) to perform a task.
With events you can use either a cron expression or some AWS service event (eg EC2 instance state change, SNS notification etc) to then trigger another service (eg Lambda), so you could for example run some kind of clean-up operation via lambda on a regular schedule, or create a snapshot of an EBS volume when its instance is shut down.
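For the scheduled-event flavour, wiring a cron rule to a Lambda looks roughly like this (the rule name, schedule, and function ARN are made up; the add-permission call is needed so events can invoke the function):

aws events put-rule --name nightly-redshift-load --schedule-expression "cron(0 2 * * ? *)"
aws events put-targets --rule nightly-redshift-load \
  --targets '[{"Id": "1", "Arn": "arn:aws:lambda:us-east-1:123456789012:function:run-redshift-copy"}]'
aws lambda add-permission --function-name run-redshift-copy \
  --statement-id nightly-redshift-load --action lambda:InvokeFunction \
  --principal events.amazonaws.com \
  --source-arn arn:aws:events:us-east-1:123456789012:rule/nightly-redshift-load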
Lambda itself is a very powerful tool and should allow you to program a decent copy/insert function in a language you are familiar with. AWS has several GitHub repos with lots of examples too; see, for example, the serverless examples and the many samples they publish. There may be other services which could work for your specific case, but part of Lambda's power is its flexibility.
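As one concrete (and still serverless) option for the COPY step itself, the Redshift Data API lets you submit the statement without managing a database connection; every identifier below is a placeholder:

aws redshift-data execute-statement \
  --cluster-identifier my-redshift-cluster --database analytics --db-user loader \
  --sql "COPY analytics.events FROM 's3://processed-bucket/events/' IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-s3-read' FORMAT AS PARQUET;"

The same call can be made from inside your Lambda, which keeps the whole pipeline event-driven.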
I'm trying to write a tool which manages Amazon AWS snapshots automatically according to some very simple rules. These snapshots are created on a schedule set up in Amazon Storage Gateway, and show up as you'd expect in the web interface for that tool.
The Storage Gateway API only has operations for snapshots as far as the snapshot schedule goes; EC2 is the API which deals with snapshots. The problem is that if I call DescribeSnapshots through that API, I see many hundreds of snapshots, but none of them have volume IDs which match the volume IDs of the snapshots created from Storage Gateway. They're just random public snapshots which I'm not interested in.
So I guess Storage Gateway snapshots are different somehow, but is there a way to use any of Amazon's APIs to list and manipulate them?
EDIT: Interestingly, they do show up in the EC2 web control panel.
Here's a top tip: the snapshots are there, just make sure you're looking for them using the right function. In this case, my novitiate in Clojure is still in effect and I tried to use contains? to search for an item in a sequence. Again. But it doesn't work like that: it looks for keys in collections, which means that over sequences it wants an index and will tell you whether there's an item at that index or not. Even more fun, pass it a sequence and a string and it won't bat an eyelid, it just says false.
Oh and Amazon's not always consistent with capitalisation of volume IDs either, so make sure you lowercase everything before you compare it. That bit's actually relevant to AWS rather than me stubbornly misinterpreting the documentation of a core function.
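For what it's worth, the CLI equivalent of that check, restricted to your own snapshots and with everything lower-cased before comparing (the volume ID is an example):

# Only your own snapshots, filtered to one volume, IDs normalised to lower case
aws ec2 describe-snapshots --owner-ids self \
  --query 'Snapshots[].[SnapshotId,VolumeId,StartTime]' --output text \
  | tr '[:upper:]' '[:lower:]' | grep vol-0abc1234def567890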