How to make sense out of AWS CloudTrail costs

I'm responsible for two AWS accounts in which a web service runs in two different environments. I'm now looking at cutting costs, and I'm a bit confused about how to make sense of the CloudTrail costs, i.e., how to break them down into different categories, and whether there are common pitfalls that lead to high CloudTrail costs.
For example, my company's security department is running some monitoring software in my accounts - both AWS-native tools like GuardDuty and external ones - and I think these tools are responsible for a lot of this cost. I'd like to be able to pin down exactly which of these costs are attributable to tools I have no control over and which are due to infrastructure I am responsible for (and consequently which I may be able to reduce). Right now, CloudTrail is the single highest cost item, and it seems odd that we're paying more for it than for EC2, Lambda, DynamoDB, and S3 combined.
When looking at the CloudTrail event stream itself, most events don't carry much information that helps me understand why they were sent, or that would let me group them in a bar chart by "monitoring tool" vs. "normal operation"; there's basically an event source and a user name. At a glance, a large portion of the events seem to be CreateLogStream and UpdateInstanceInformation, which are likely due to the normal operation of some of the services?
If it turns out that simple events such as CreateLogStream are costing thousands of dollars, how should I attack this problem? Is it possible to manually disable certain CloudTrail events? Is there a best practice as to which events are important and which are not? I never use CloudTrail for anything during normal operation, but it feels nice to have the trail in case something nasty happens.

It is unclear to me whether you're asking about the costs to operate the CloudTrail service, or about using CloudTrail to track your operational costs.
If the former, I'd be very surprised if you are running up significant costs for CloudTrail. You get one trail per account for free, and additional trails cost $2.00 per 100,000 events (pricing page). The only way that I can see for that to become a significant cost factor is to have an extremely large number -- as in dozens or hundreds -- of trails per account, which is unlikely to be the case.
In the latter case, CloudTrail is the wrong tool for the job. Instead, you should use Cost Explorer, which will let you group your costs by multiple factors, including service and usage type.
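For example, here is a minimal boto3 sketch that breaks one month of CloudTrail spend down by usage type. It assumes the Cost Explorer API is enabled on the account you query and that the service appears in your billing data under the name "AWS CloudTrail"; the dates are placeholders for the month you care about.

    import boto3

    # Cost Explorer is a global API; it must be enabled on the (payer) account first.
    ce = boto3.client("ce", region_name="us-east-1")

    response = ce.get_cost_and_usage(
        TimePeriod={"Start": "2024-05-01", "End": "2024-06-01"},  # month of interest
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        # Assumes the service shows up as "AWS CloudTrail" in your billing data.
        Filter={"Dimensions": {"Key": "SERVICE", "Values": ["AWS CloudTrail"]}},
        GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    )

    for group in response["ResultsByTime"][0]["Groups"]:
        usage_type = group["Keys"][0]
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        print(f"{usage_type}: ${amount:.2f}")

Grouping by usage type should show whether the spend is driven by additional copies of management events or by data events.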
If you are part of an organization (which it seems you are), and use consolidated billing (in which the organization owner pays all bills), then you'll need to be granted permission to see your cost data.

Related

How to Prevent DynamoDB Table Creation Unless Under Certain Capacity

Is there a way to allow creation of a resource like a DynamoDB table only if the table being created is PAY_PER_REQUEST or is provisioned with capacity below a certain amount?
I initially looked at IAM condition keys, but they appear to be available only for data operations on the table (scan, update, put, etc.), not for table creation.
Alternatively, are there ways to reduce service quotas for an account?
Ideally, I'd like to scope down the ability to create DynamoDB tables above a certain capacity, and I'm not sure how to do this proactively rather than retroactively by processing CloudTrail logs or listing the properties of existing tables.
AWS Config
You can use AWS Config to retrospectively query AWS resources and their properties, and then determine whether they are compliant. There are rules available out of the box, but I can't see one that matches your use case, so you will then need to write a Lambda function to implement this yourself. Here is an example.
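As a rough sketch (independent of the linked example), such an evaluation Lambda could look like the following, assuming a configuration-change-triggered custom rule scoped to AWS::DynamoDB::Table. The field names inside the configuration item and the 100-unit threshold are assumptions to verify against a real recorded item.

    import json
    import boto3

    config = boto3.client("config")

    MAX_CAPACITY = 100  # placeholder threshold for "too much" provisioned capacity

    def lambda_handler(event, context):
        invoking_event = json.loads(event["invokingEvent"])
        item = invoking_event["configurationItem"]

        compliance = "COMPLIANT"
        if item["resourceType"] == "AWS::DynamoDB::Table":
            # Field names assume the shape of the recorded configuration item;
            # on-demand (PAY_PER_REQUEST) tables have no provisionedThroughput
            # and therefore pass by default.
            throughput = item["configuration"].get("provisionedThroughput") or {}
            if (throughput.get("readCapacityUnits", 0) > MAX_CAPACITY
                    or throughput.get("writeCapacityUnits", 0) > MAX_CAPACITY):
                compliance = "NON_COMPLIANT"

        config.put_evaluations(
            Evaluations=[{
                "ComplianceResourceType": item["resourceType"],
                "ComplianceResourceId": item["resourceId"],
                "ComplianceType": compliance,
                "OrderingTimestamp": item["configurationItemCaptureTime"],
            }],
            ResultToken=event["resultToken"],
        )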
After your rule is working, you can create a remediation action to:
Delete the Table
Scale the Table Down
Send a Notification
Adjust Autoscaling (i.e. reduce max)
AWS Budgets
(My Preference)
For determining whether an account is using too much DynamoDB, probably the easiest approach is to set up a budget for the DynamoDB service. That has a couple of benefits:
Auto-Scaling: Developers would be free to use high amounts of capacity (such as for load tests) for short periods of time.
Potentially Cheaper: what I have found is that if you put hard restrictions on projects, developers will often allocate 100% of the maximum, as opposed to using only what they need, for fear of another developer coming along and taking all the capacity.
Just as with AWS Config, you can set up budget notifications to take action and tell developers that they are using too much DynamoDB, for example when the budget is at 50%, 80%, and so on.
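As a sketch, the whole thing can be set up with a single create_budget call. The budget name, the $200 limit, the notification address, and the assumption that the cost filter value is "Amazon DynamoDB" are all placeholders to adjust.

    import boto3

    budgets = boto3.client("budgets")
    account_id = boto3.client("sts").get_caller_identity()["Account"]

    budgets.create_budget(
        AccountId=account_id,
        Budget={
            "BudgetName": "dynamodb-monthly",                 # placeholder name
            "BudgetType": "COST",
            "TimeUnit": "MONTHLY",
            "BudgetLimit": {"Amount": "200", "Unit": "USD"},  # pick your own limit
            # Assumes DynamoDB appears in billing data as "Amazon DynamoDB".
            "CostFilters": {"Service": ["Amazon DynamoDB"]},
        },
        NotificationsWithSubscribers=[
            {
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": threshold,                   # percent of the budget
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "EMAIL", "Address": "team@example.com"},
                ],
            }
            for threshold in (50, 80, 100)
        ],
    )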
CloudWatch
You could also create CloudWatch Alarms for certain DynamoDB metrics, looking at the capacity that has been consumed and again responding to excessive use.
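For example, a sketch of an alarm on consumed read capacity for a single table; the table name, threshold, and SNS topic are placeholders.

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    cloudwatch.put_metric_alarm(
        AlarmName="orders-table-high-read-capacity",           # placeholder names
        Namespace="AWS/DynamoDB",
        MetricName="ConsumedReadCapacityUnits",
        Dimensions=[{"Name": "TableName", "Value": "orders"}],
        Statistic="Sum",
        Period=300,                                            # 5-minute windows
        EvaluationPeriods=3,
        Threshold=50000,                                       # tune to what "excessive" means for you
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:capacity-alerts"],
    )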
Conclusion
You have a lot of flexibility in how to approach this, so make sure you have gathered your requirements first; the appropriate response will then be easier to see. AWS Config requires a bit more work than Budgets, so if you can get what you want out of Budgets, I would do that.

Best Way to Monitor Customer Usage of AWS Lambda

I have recently created an API service that is going to be deployed as a pilot to a customer. It has been built with AWS API Gateway, AWS Lambda, and AWS S3. With a SaaS pricing model, what's the best way for me to monitor this customer's usage and cost? At the moment, I have made a unique API Gateway, Lambda function, and S3 bucket specific to this customer. Is there a good way to create a dashboard that allows me (and perhaps the customer) to see this monitoring in detail?
As an additional question, what's the best way to streamline this process when expanding to multiple customers? Each customer would have a unique API token; is there a better approach than the naive one of creating unique AWS resources per customer?
I am new to this (a college student), but any insights/resources would go a long way. Thanks.
Full disclosure: I work for Lumigo, a company that does exactly that.
Regarding your question,
As #gusto2 said, there are many tools that you can use, and the best tool depends on your specific requirements.
The main difference between the tools is the level of configuration that you need to apply.
CloudWatch default metrics - the first tool you should use. This is an out-of-the-box solution that provides many metrics for your services, such as duration, number of invocations, errors, and memory. You can view the metrics over different time windows and aggregations (P99, average, max, etc.).
This tool is great for basic monitoring.
Its greatest strength is also its limitation - it provides monitoring that is common to all services, so nothing is tailor-made for serverless applications. https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
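As a small illustration, this is roughly how you would pull one of those default Lambda metrics with boto3; the function name is a placeholder.

    import boto3
    from datetime import datetime, timedelta, timezone

    cloudwatch = boto3.client("cloudwatch")
    now = datetime.now(timezone.utc)

    # Hourly invocation counts for one function over the last day.
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Invocations",
        Dimensions=[{"Name": "FunctionName", "Value": "customer-api"}],  # placeholder
        StartTime=now - timedelta(days=1),
        EndTime=now,
        Period=3600,
        Statistics=["Sum"],
    )

    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], int(point["Sum"]))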
CloudWatch custom metrics - the other end of the scale: much more precise metrics, which let you publish any metric data and monitor it: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html
This is a great tool if you know exactly what you want to monitor and you are already familiar with your architecture's limitations and pain points.
And, of course, you can configure alarms over this data.
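A minimal sketch of publishing such a custom metric from a Lambda handler, with a per-customer dimension; the namespace, metric name, and customer id are placeholders.

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # One data point per handled request, tagged with the customer it belongs to.
    cloudwatch.put_metric_data(
        Namespace="MyApi/Usage",                              # placeholder namespace
        MetricData=[{
            "MetricName": "DocumentsProcessed",
            "Dimensions": [{"Name": "CustomerId", "Value": "cust-1234"}],
            "Value": 1,
            "Unit": "Count",
        }],
    )

You can then graph or alarm on each CustomerId dimension value separately.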
Lumigo - a third-party tool (again, for disclosure, this is my workplace). It provides out-of-the-box monitoring created specifically for serverless applications, such as detecting an abnormal number of invocations, costs, etc. It also provides troubleshooting capabilities for deeper observability.
Of course, there are more third-party tools that you can find online. All are great; just find the one that suits your requirements best.
Is there a good way to create a dashboard
There are multiple ways and options depending on your scale, amount of data, and requirements, so you could start small and simple, but check whether each option remains feasible.
You can start with CloudWatch. You can monitor basic metrics, create dashboards, and even share them with other accounts.
naive way of making unique AWS resources per customer
To start, I would consider creating custom CloudWatch metrics with the customer id as a dimension and publishing the metrics from the Lambda functions.
It looks simple, but you should do the math and a PoC on the number of requested data points and dashboards, to prevent a nasty surprise on the bill.
Another option is sending metrics/events to DynamoDB; using atomic counters you could build some basic aggregations directly (a kind of naïve stream processing).
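A sketch of that DynamoDB approach, using an atomic ADD to keep a per-customer, per-hour request counter; the table and attribute names are made up.

    import boto3

    dynamodb = boto3.client("dynamodb")

    def record_request(customer_id: str, hour: str) -> None:
        # Atomically increment the counter item for this customer and hour.
        dynamodb.update_item(
            TableName="usage",                     # placeholder table
            Key={
                "customer_id": {"S": customer_id},
                "hour": {"S": hour},               # e.g. "2024-05-01T13"
            },
            UpdateExpression="ADD request_count :one",
            ExpressionAttributeValues={":one": {"N": "1"}},
        )

    record_request("cust-1234", "2024-05-01T13")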
When you scale to a lot of events and clients, you may need some serious API analytics, but that may be a different topic.

How to limit number of reads from Amazon S3 bucket

I'm hosting a static website in Amazon S3 with CloudFront. Is there a way to set a limit on how many reads (for example, per month) are allowed for my Amazon S3 bucket, in order to make sure I don't go above my allocated budget?
If you are concerned about going over a budget, I would recommend Creating a Billing Alarm to Monitor Your Estimated AWS Charges.
AWS is designed for large-scale organizations that care more about providing a reliable service to customers than about staying within a particular budget. For example, if their allocated budget were fully consumed, they would not want to stop providing services to their customers. They might, however, want to tweak their infrastructure to reduce costs in the future, such as changing the Price Class for a CloudFront distribution or using AWS WAF to prevent bots from consuming too much traffic.
Your static website will be rather low-cost. The biggest factor will likely be Data Transfer rather than charges for Requests. Changing the Price Class should assist with this. However, the only true way to stop accumulating Data Transfer charges is to stop serving content.
You could activate CloudTrail data events (object-level read events) for the bucket, create a CloudWatch Events rule that triggers an AWS Lambda function which increments a per-object read count in an Amazon DynamoDB table, and restrict access to the objects once a certain number of reads has been reached.
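A rough sketch of that Lambda function, assuming it is triggered by an EventBridge (CloudWatch Events) rule matching S3 GetObject data events from CloudTrail; the table name, read limit, and exact event field names are assumptions to verify.

    import boto3

    dynamodb = boto3.client("dynamodb")
    s3 = boto3.client("s3")

    READ_LIMIT = 10000            # placeholder read budget per object
    TABLE = "object-read-counts"  # placeholder DynamoDB table

    def lambda_handler(event, context):
        # Assumes the CloudTrail data event shape: requestParameters holds the
        # bucket name and object key of the GetObject call.
        params = event["detail"]["requestParameters"]
        bucket, key = params["bucketName"], params["key"]

        response = dynamodb.update_item(
            TableName=TABLE,
            Key={"object_key": {"S": f"{bucket}/{key}"}},
            UpdateExpression="ADD reads :one",
            ExpressionAttributeValues={":one": {"N": "1"}},
            ReturnValues="UPDATED_NEW",
        )
        reads = int(response["Attributes"]["reads"]["N"])

        if reads > READ_LIMIT:
            # One crude way to cut off further reads: make the object private.
            # This only works if access is granted via object ACLs.
            s3.put_object_acl(Bucket=bucket, Key=key, ACL="private")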
What you're asking for is a very typical question in AWS. Unfortunately, with near-infinite scale comes near-infinite spend.
While you can put a WAF in front, that is really meant for security rather than scale restriction. From a cost perspective, I'd be more worried about the bandwidth charges than about the S3 request costs.
Plus, once you add things like CloudFront or Lambda, it gets hard to limit all of this.
The best way to limit spend is to put billing alerts on your account -- and you can tier them, so you get $10, $20, and $100 alerts, up to the point you're uncomfortable with. Then either manually disable the website, or set up a Lambda function to disable it for you.
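A sketch of those tiered alerts as CloudWatch billing alarms. The tiers and SNS topic are placeholders; note that the EstimatedCharges metric only exists in us-east-1 and requires "Receive Billing Alerts" to be enabled in the billing preferences.

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    for threshold in (10, 20, 100):  # the tiers mentioned above, in USD
        cloudwatch.put_metric_alarm(
            AlarmName=f"estimated-charges-over-{threshold}-usd",
            Namespace="AWS/Billing",
            MetricName="EstimatedCharges",
            Dimensions=[{"Name": "Currency", "Value": "USD"}],
            Statistic="Maximum",
            Period=21600,            # billing data only updates a few times a day
            EvaluationPeriods=1,
            Threshold=threshold,
            ComparisonOperator="GreaterThanThreshold",
            AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],
        )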

How to programmatically detect individual app users' usage of AWS?

I have an app, built upon multiple AWS services, that provides a document storage service for my users.
Rather than tracking my users' usage based on the way they use my app and then multiplying that usage by the cost of each service they consume (doing the calculation myself), I was wondering if there is a way to do this automatically (having AWS track my users at a granular, per-user level and compute per-user costs itself)?
For example, when a user consumes some AWS service, is there an option to provide an identifier to AWS, so that AWS tracks usage and computes costs for individual IDs itself? That way it would be much simpler to just ask AWS how much each of my users is consuming and charge them appropriately.
It appears that you want to determine the costs of your service so that you can pass these costs on to your customers.
I would recommend that you re-think your pricing strategy. Rather than charging as "cost plus some profit", focus your pricing with these rules in mind:
Charge for things that are of value to your customers (that they want to do)
Charge for things that you want them to do less (what you don't want them to do)
Make everything else free
Think about how this applies to other services that charge money:
Water utilities: Charge for water, but charge extra for consuming too much water
Netflix: Charge for providing shows, but charge extra for consuming more bandwidth (4K, multiple accounts)
Cell phone: Charge for service, but charge extra for consuming more data
Each of these organizations incurs extra costs for providing more water, bandwidth, and data, so they would prefer that people either don't consume too much, or pay extra when they do.
If your application provides benefits to customers for storing documents, then they will pay for your service. However, you will incur extra costs for storing more documents, so you should charge extra for consuming more storage. Everything else, if possible, should be free.
Therefore, don't calculate how much each user costs you for running the Amazon EC2 instances, Data Transfer bandwidth, domain names, database storage, support staff, programmers, management and your time. Instead, concentrate on the value you are giving to your customers and they should be willing to pay if the benefit is greater than the cost. Find the element that really costs you more if they over-consume and charge extra for that. For example, if somebody stores more documents, which consumes more space in Amazon S3 and costs more Data Transfer, then charge them for having more documents. Don't charge them based on some technical aspect like the size of EC2 instance you are using.
To answer your specific question, you could use tagging to identify resources that relate to a specific user, such as objects stored in Amazon S3. However, you'll probably find that most of your costs are shared costs (e.g. you can't split up an EC2 instance hosting a web app, or a database used by all customers). Your application database is probably already keeping track of their usage, so tagging wouldn't necessarily provide much additional insight unless specific resources are being consumed by specific users.
See: AWS Tagging Strategies
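As a small sketch, each stored document can be tagged with the customer it belongs to at upload time; the bucket, key, and tag names here are made up. Note that cost allocation tags are applied to taggable resources such as buckets, so object tags like this mainly help you attribute and inventory storage per customer rather than split the bill directly.

    import boto3

    s3 = boto3.client("s3")

    s3.put_object(
        Bucket="my-document-store",          # placeholder bucket
        Key="cust-1234/contract.pdf",
        Body=b"...document bytes...",
        Tagging="customer-id=cust-1234",     # placeholder tag key/value
    )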

Are there any notifications in AWS CloudFront Access Logs?

I'd like to use AWS access logs to process website impressions using an existing batch-oriented ETL pipeline that grabs the last finished hour of impressions and does a lot of further transformations on them.
The problem with the access logs, though, is that:
Note, however, that some or all log file entries for a time period can sometimes be delayed by up to 24 hours
So I would never know when all the logs for a particular hour are complete.
Unfortunately, I cannot use any streaming solution; I need to use the existing pipeline that grabs hourly batches of data.
So my question is: is there any way to be notified that all logs have been delivered to S3 for a particular hour?
You have asked about S3, but your pull-quote is from the documentation for CloudFront.
Either way, though, it doesn't matter. This is just a caveat, saying that log delivery might sometimes be delayed, and that if it's delayed, this is not a bug -- it's a side effect of a massive, distributed system.
Both services operate at an incomprehensibly large scale, so periodically things go wrong with small parts of the system, and eventually some stranded or backlogged logs may be found and delivered. Rarely, they can even arrive days or weeks later.
There is no event that signifies that all of the logs are finished, because there's no single point within such a system that is aware of this.
But here is the takeaway concept: the majority of logs will arrive within minutes, but this isn't guaranteed. Once you start running traffic and observing how the logging works, you'll see what I am referring to. Delayed logs are the exception, and you should be able to develop a sense, fairly rapidly, of how long you need to wait before processing the logs for a given wall clock hour. As long as you track what you processed, you can audit this against the bucket later, to ensure that your process is capturing a sufficient proportion of the logs.
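For the auditing part, here is a small sketch that lists everything delivered for a given wall clock hour, assuming the standard CloudFront log naming of <prefix><distribution-id>.YYYY-MM-DD-HH.<unique-id>.gz; the bucket, prefix, and distribution id are placeholders, and you should verify the layout against your own bucket.

    import boto3

    s3 = boto3.client("s3")

    BUCKET = "my-cloudfront-logs"        # placeholder names
    PREFIX = "cdn/"
    DISTRIBUTION_ID = "EMLARXS9EXAMPLE"

    def log_keys_for_hour(hour: str) -> set:
        """Return all delivered log object keys for an hour like '2024-05-01-13'."""
        keys = set()
        paginator = s3.get_paginator("list_objects_v2")
        prefix = f"{PREFIX}{DISTRIBUTION_ID}.{hour}"
        for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
            keys.update(obj["Key"] for obj in page.get("Contents", []))
        return keys

    # Compare against the keys your batch job already processed for that hour.
    delivered = log_keys_for_hour("2024-05-01-13")
    processed = set()  # filled in from whatever your pipeline records
    missed = delivered - processed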
Since the days before CloudFront had SNI support, I have been routing traffic to some of my S3 buckets using HAProxy on EC2 in the same region as the bucket. This gave me the ability to use custom hostnames and SNI, but also gave me real-time logging of all the bucket traffic, since HAProxy can stream copies of its logs to a log collector for real-time analysis over UDP as well as writing them to syslog. There is no measurable difference in performance with this solution, and HAProxy runs extremely well on t2-class servers, so it is cost-effective. You do, of course, introduce more cost and more to maintain, but you can even deploy HAProxy between CloudFront and S3, as long as you are not using an origin access identity. One of my larger services does exactly this, a holdover from the days before Lambda@Edge.