AWS Bill Generation Time in GMT

I want to know at what time Amazon updates the billing reports that are created in S3 buckets. Are they updated at midnight? I want to know the exact time in GMT.

Looking back at 1+ year of billing reports, it doesn't look like you should expect that the billing reports will be generated at a specific time.
This makes perfect sense. Even if some background job was always triggered at some specific time (and this is probably an oversimplification for such a complex billing system), I can't assume even Amazon would be able to guarantee that ALL these background jobs (i.e. for ALL customers) would finish at the same time every time they run.
There is always a different data set + other ongoing workload to consider, which would certainly affect the completion time.
FWIW, the delivery timestamps on the report objects in my own S3 bucket bear this out.
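If you want to check the pattern in your own account, here is a minimal boto3 sketch (the bucket name and prefix are placeholders for wherever your reports are delivered) that prints each report object's LastModified timestamp:

import boto3

s3 = boto3.client("s3")
bucket = "my-billing-reports"   # placeholder: your billing/report bucket
prefix = ""                     # placeholder: the report prefix you configured

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        # LastModified is a timezone-aware UTC datetime, so you can see
        # directly how much the delivery time drifts from day to day.
        print(obj["LastModified"].isoformat(), obj["Key"])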

Spark History Server ListBucket costs

We are using the Spark History Server 3.2.1 to monitor our Spark applications.
We have thousands of daily jobs (running on Kubernetes) that write event logs to an S3 bucket (in a dedicated folder).
We are using the history-server to analyze and compare completed jobs (incomplete running jobs never appear in the UI, but that's not a requirement right now).
Recently I've noticed an increase in the ListBucket API operation in AWS Cost Explorer. This cost is higher than the cost of StandardStorage (the price we pay for storing the data itself). It adds up to a few hundred dollars per month!
Running the history-server with DEBUG log level exposed the "problem": every 10s the history-server lists the bucket to get all logs and then iterates over each folder to get its content. So if I want to keep the last 10,000 jobs, I'll have to pay for 10,101 ListBucket requests every 10s!
Here is one example (out of the 10k) reproduced locally with MinIO as S3:
22/02/20 06:44:31 DEBUG wire: http-outgoing-57 << "<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>local-audience</Name><Prefix>history-logs/eventlog_v2_spark-ffffdf5903c841259f28b53981746b76/</Prefix><KeyCount>2</KeyCount><MaxKeys>5000</MaxKeys><Delimiter>/</Delimiter><IsTruncated>false</IsTruncated><Contents><Key>history-logs/eventlog_v2_spark-ffffdf5903c841259f28b53981746b76/appstatus_spark-ffffdf5903c841259f28b53981746b76</Key><LastModified>2022-02-12T17:00:15.304Z</LastModified><ETag>"d41d8cd98f00b204e9800998ecf8427e"</ETag><Size>0</Size><Owner><ID></ID><DisplayName></DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>history-logs/eventlog_v2_spark-ffffdf5903c841259f28b53981746b76/events_1_spark-ffffdf5903c841259f28b53981746b76</Key><LastModified>2022-02-12T17:00:15.136Z</LastModified><ETag>"f91cc774d92c6f6c2ca4d0e1a1e76e13"</ETag><Size>868837</Size><Owner><ID></ID><DisplayName></DisplayName></Owner><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>"
To confirm that the cost comes from the history-server, I turned it off for a day, and there were no ListBucket charges during that time.
To mitigate the problem (because we still need the history-server), I can set spark.history.fs.update.interval to a higher value (such as 3600s or so). Since we only check the history-server about once a day, the default refresh rate is overkill and not worth the cost.
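For reference, that is a single history-server setting, e.g. in spark-defaults.conf (the 3600s value is just an example):

spark.history.fs.update.interval   3600s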
Why does it scan the completed jobs every time (over and over again) and not only the new ones? Is there a way to configure this behavior to avoid those ListBucket operations?
If I care only about completed jobs, and assuming I can wait a few minutes to see the list, is there a mode that loads the list only when I log in to the UI (rather than polling periodically for nothing)?
P.S. - I'm using AWS lifecycle rules to clean this folder (rather than the server's cleaning feature), by expiring objects after a few days.
Treewalking in S3 is (a) expensive and (b) horribly slow, especially given that a deep listing alternative exists. If you want to fix this and can write Scala code, see if you can change the server to do a deep listing by moving to FileSystem.listFiles(path, true). Yes, that involves coding, but the OSS community depends on everyone fixing their own issues and sharing the outcome.
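To see why that matters, here is a rough boto3 sketch (not the S3A or history-server code; the bucket and prefix names are made up): a deep listing pages through everything under the shared prefix in one sweep, while the tree walk issues an extra list call per job directory.

import boto3

s3 = boto3.client("s3")
bucket = "my-spark-logs"     # hypothetical bucket
prefix = "history-logs/"     # shared event-log prefix

# Deep (recursive) listing: one paginated sweep, roughly one ListObjectsV2
# request per 1,000 keys, regardless of how many "folders" there are.
paginator = s3.get_paginator("list_objects_v2")
deep_keys = [obj["Key"]
             for page in paginator.paginate(Bucket=bucket, Prefix=prefix)
             for obj in page.get("Contents", [])]

# Shallow tree walk: list the top level with a delimiter, then list each
# job directory separately -- 10,000 jobs means 10,000+ list requests
# per refresh cycle.
top = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
for cp in top.get("CommonPrefixes", []):
    s3.list_objects_v2(Bucket=bucket, Prefix=cp["Prefix"], Delimiter="/")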
After digging into this issue, I decided to stop using the "rolling" feature for now - as my application jobs are relatively small.
I removed the:
spark.eventLog.rolling.enabled: true
spark.eventLog.rolling.maxFileSize: 16m
from the spark-submit command and the cost is now back to normal...
I also wrote about it here.
#stevel thanks for your answer - I will try to contribute and fix that! :)

bigstore increasing almost linearly Google Cloud

I use many APIs from Google Cloud. Recently I noticed that bigstore usage is gradually increasing on a daily basis. I am worried that if this continues I won't be able to pay the bill.
However, I do not know how to check where this increase is coming from. Is there a way to see which Cloud Functions are causing this increased traffic?
The reason I am surprised about the increase in bigstore traffic is that I have cron jobs running multiple times per day to store the data in BigQuery. I have not changed these settings, so I would assume that this traffic should not increase as shown on the chart.
One other explanation I can think of is that the amount of data that I am storing has increased, which is indeed true on a daily basis. But why does this increase the traffic?
What is the way to check this?
There are two main data sources you should use:
GCP-wide billing export. This will tell you an exact breakdown of your costs. This is important to make sure you target your effort where the cost is largest to you. It also provides some level of detail about what the usage is.
Enable access & storage logging. The access log will give you an exact accounting of incoming requests down to the number of bytes transferred. The storage logs give you similar granularity into the cost of storage itself.
In addition, if you have a snapshot of your bigstore, your storage charges will grow over time as you replace or even rename files: where you once had two views of the same storage, each changed file now forks into two copies (one in the current view of your storage, one in the snapshot).
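For the logging part, a minimal sketch with the google-cloud-storage Python client (the bucket names are placeholders, and I'm assuming the client's enable_logging helper; the same can be done from the console):

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-data-bucket")   # placeholder data bucket

# Route this bucket's access and storage usage logs into a separate log bucket.
bucket.enable_logging("my-log-bucket", object_prefix="usage-log")
bucket.patch()                                 # persist the change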

What good alternatives are there to Copperegg for monitoring EC2 instances?

I've been using Copperegg for a while now and have generally been happy with it until lately, where I have had a few issues. It's being used to monitor a number of EC2 instances that must be up 24/7.
Last week I was getting phantom alerts that servers had gone down when they hadn't, which I can cope with, but I also didn't get an alert when I should have. One server had high CPU for over 5 minutes when the alert should have been triggered after 1 minute. Copperegg support weren't all that helpful, merely agreeing that an alert should have been triggered.
The latter problem is unacceptable, and if it were to happen again outside of working hours, serious problems would follow.
So, I'm looking for alternative services that will do that same job. I've looked at Datadog and New Relic, but both have a significant problem in that they will only alert me of a problem 5 minutes after it's occurred, rather than the 1 minute I can get with Copperegg.
What else is out there that can do the same job and will also integrate with Pager Duty?
tl;dr: Amazon CloudWatch will do what you want and probably much, much more.
I believe that Amazon actually offers a service that would accomplish your goal - CloudWatch (pricing). I'm going to take your points one by one. Note that I haven't actually used it before, but the documentation is fairly clear.
One server had high CPU for over 5 mins when the alert should be triggered after 1 minute
It looks like CloudWatch can be configured to send an alert (which I'll get to) after one minute of a condition being met.
You can actually set conditions for many other metrics as well - that's with the basic (free) monitoring on one of my instances, and I think detailed monitoring might offer even more.
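Based on the documentation, creating such a one-minute alarm might look roughly like this with boto3 (the alarm name, instance ID, threshold, and SNS topic ARN are placeholders):

import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical alarm: fire after a single 60-second period of high CPU on
# one instance and notify an SNS topic (which PagerDuty can subscribe to).
cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-web-1",                                      # placeholder
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=60,                        # requires detailed (1-minute) monitoring
    EvaluationPeriods=1,
    Threshold=90.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # placeholder
)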
What else is out there that can do the same job and will also integrate with Pager Duty?
I'm assuming you're talking about this. It turns out that Pager Duty has a helpful guide just for integrating CloudWatch. How nice!
Pricing
Here's the pricing page, as you would probably like to parse it instead of me telling you. I'll give a brief overview, though:
You don't want basic monitoring, as it only gives you metrics once per five minutes (which you've indicated is unacceptable). Instead, you want detailed monitoring (once every minute).
For an EC2 instance, the price for detailed monitoring is $3.50 per instance per month. Additionally, every alarm you create is $0.10 per month. This is actually very cheap compared to CopperEgg's pricing - $70/mo versus maybe $30 per month for 9 instances and copious amounts of alarms. In reality, you'll probably be paying more like $10/mo.
Pager Duty's tutorial suggests you use SNS, which is another cost. The good thing: it's dirt cheap. $0.60 per million notifications. If you ever get above a dollar in a year for SNS, you need to perform some serious reliability improvements on your servers.
Other shiny things!
You're not just limited to Amazon's pre-packaged metrics! You can actually send custom metrics (the time it took to complete a cron job, whatever) to CloudWatch via a PUT request. Quite handy.
Submit Custom Metrics generated by your own applications (or by AWS resources not mentioned above) and have them monitored by Amazon CloudWatch. You can submit these metrics to Amazon CloudWatch via a simple Put API request.
(from here)
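A hedged sketch of what that might look like with boto3 (the namespace, metric name, and value are whatever you choose):

import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical custom metric: how long tonight's cron job took, in seconds.
cloudwatch.put_metric_data(
    Namespace="MyApp",                     # placeholder namespace
    MetricData=[{
        "MetricName": "NightlyCronDuration",
        "Value": 342.0,
        "Unit": "Seconds",
    }],
)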
Conclusion
So all in all: CloudWatch is quite cheap, can do 1-minute frequency stats, and will integrate with Pager Duty.
tl;dr: Server Density will do what you want; on top of that, it has web checks and custom metrics too.
In short, Server Density is a monitoring tool that will monitor all the relevant server metrics. You can take a look at this page where it's all described.
One server had high CPU for over 5 mins when the alert should be triggered after 1 minute
Server Density's open source agent collects and posts the data to their server every minute, and you can decide yourself when an alert should be triggered. For example, you can configure an alert to notify 1 person after 1 minute and then repeat every 5 minutes.
There are a lot of other metrics that you can alert on too.
What else is out there that can do the same job and will also integrate with Pager Duty?
Server Density also integrates with PagerDuty. The only thing you need to do is generate an API key at PagerDuty and provide it in the settings; you can then select PagerDuty as one of the alert recipients.
Pricing
You can find the pricing page here. I'll give you a brief overview of it. The pricing starts at $10 for one server plus one web check and then gets cheaper per server the more servers you add.
Everything is monitored once every minute, and there are no extra fees for the number of alerts configured or triggered, even if that is an SMS to your phone number. The cost is slightly higher than the CloudWatch example, but the support is good. If you used Copperegg before, they have a migration tool too.
Other shiny things!
Server Density allows you to monitor all the things! The only thing you need to do is send us custom metrics, which you can do with a plugin written by yourself or by someone else.
I have to say that the graphs Server Density provides are somewhat akin to eye candy too. Most other monitoring solutions I've seen out there have quite dull dashboards.
Conclusion
It will do the job for you. It's not as cheap as CloudWatch, but it doesn't lock you into AWS. It'll give you 1-minute frequency metrics and integrate with PagerDuty, plus a lot more stuff.

How long should I wait after applying an AWS IAM policy before it is valid?

I'm adding and removing AWS IAM user policies programmatically, and I'm getting inconsistent results from the application of those policies.
For example, this may or may not succeed (I'm using version 1.6.6 of the AWS Java SDK):
1. Start with a user that can read from a particular bucket
2. Clear user policies (list policies then call "deleteUserPolicy" for each one)
3. Wait until the user has no user policies (call "listUserPolicies" until it returns an empty set)
4. Attempt to read from the bucket (this should fail)
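In boto3 terms (I'm using the Java SDK, but the equivalent Python sketch is shorter to show here; the user name is a placeholder), steps 2-3 look roughly like this:

import time
import boto3

iam = boto3.client("iam")
user = "test-user"                                   # placeholder user name

# Step 2: delete every inline policy attached to the user.
for name in iam.list_user_policies(UserName=user)["PolicyNames"]:
    iam.delete_user_policy(UserName=user, PolicyName=name)

# Step 3: poll until IAM itself reports no remaining inline policies.
while iam.list_user_policies(UserName=user)["PolicyNames"]:
    time.sleep(1)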
If I put in a breakpoint between #3 and #4 and wait a few seconds, the user cannot read from the bucket, which is what I expect. If I remove breakpoints, the user can read from the bucket, which is wrong.
(This is also inconsistent when I add a policy then access a resource)
I'd like to know when a policy change has had an effect on the component (S3, SQS, etc), not just on the IAM system. Is there any way to get a receipt or acknowledgement from this? Or maybe there is a certain amount of time to wait?
Is there any documentation on the internals of policy application?
(FYI I've copied my question from https://forums.aws.amazon.com/thread.jspa?threadID=140383&tstart=0)
The phrase "almost immediately" is used 5 times in the IAM FAQ, and is, of course, somewhat subjective.
Since AWS is a globally-distributed system, your changes have to propagate, and the system as a whole seems to be designed to favor availability and partition tolerance as opposed to immediate consistency.
I don't know whether you've considered it, but it's entirely possible that at step 4 in your flow you might actually see a sequence of pass, fail, pass, pass, fail, fail, fail, fail... because neither a bucket nor an object in a bucket is actually a single thing in a single place, as evidenced by the mixed consistency model of different actions in S3, where new objects are immediately consistent while overwrites and deletes are eventually consistent. So the concept of a policy having "had an effect" (or not) on the bucket or an object isn't an entirely meaningful one, since the application of the policy is itself almost certainly a distributed event.
To confirm such an application of policies would require AWS to expose the capability of (at least indirectly) interrogating every entity that has a replicated copy of that policy to see whether it had the current version or not... which would be potentially impractical or unwieldy to say the least in a system as massive as S3, which has grown beyond a staggering 2 trillion objects, and serves peak loads in excess of 1.1 million requests per second.
Official AWS answers to this forum post provide more information:
While changes you make to IAM entities are reflected in the IAM APIs immediately, it can take noticeable time for the information to be reflected globally. In most cases, changes you make are reflected in less than a minute. Network conditions may sometimes increase the delay, and some services may cache certain non-credential information, which takes time to expire and be replaced.
The accompanying answer to what to do in the mean time was "try again."
We recommend a retry loop after a slight initial delay, since in most circumstances you'll see your changes reflected quite quickly. If you sleep, your code will be waiting far too long in most cases, and possibly not long enough for the rare exceptions.
We actively monitor the performance of the replication system. But like S3, we guarantee only eventual consistency, not any particular upper bound.
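A minimal sketch of such a retry loop (in Python/boto3 rather than the Java SDK from the question; the names and timings are illustrative), waiting for a policy removal to actually take effect on S3:

import time
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def wait_until_denied(bucket, key, timeout=120, initial_delay=2):
    # Poll until the policy removal is enforced (AccessDenied) or give up
    # after `timeout` seconds. Start with a slight delay, as suggested above.
    time.sleep(initial_delay)
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            s3.get_object(Bucket=bucket, Key=key)
        except ClientError as err:
            if err.response["Error"]["Code"] == "AccessDenied":
                return True          # the change has propagated to S3
            raise
        time.sleep(5)                # still readable; wait and retry
    return False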
I have a far less scientific answer here... but I think it will help some other people feel less insane :). I kept thinking things were not working while they were just taking more time than I expected.
Last night I was adding an inline policy to allow a host to get parameters from Systems Manager. I thought it wasn't working because many minutes after the change (maybe 5 or so), my CLI commands were still failing. Then they started working. So that was a fairly large delay.
Just now, I removed that policy and it took 2-3 minutes (enough to google this and read a couple other pages) before my host lost access.
Generally things are quite snappy for me as well, but if you're pretty sure something should work and it's not, just do yourself a favor and wait 10 minutes. Unfortunately, this makes automation after IAM changes sound harder than I thought!

Log delay in Amazon S3

I have recently started hosting in Amazon S3, and I need the log files to calculate statistics for the "get", "put", and "list" operations on the objects.
I've observed that the log files are organized weirdly. I don't know when a log will appear (not immediately; at least 20 minutes after the operation) or how many lines of logs will be contained in one log file.
After that, I need to download these log files and analyse them. But I can't figure out how often I should do this.
Can somebody help? Thanks.
What you describe (log files being made available with delays and in unpredictable order) is exactly what AWS declares as the behaviour to expect. This is due to the distributed nature of the system AWS uses to provide S3: the same request may be served each time by a different server - I have seen 5 different IP addresses being used for publishing.
So the only solution is: accept the delay, measure the delay you actually experience, add some extra margin, and learn to live with this total delay (I would expect something like 30 to 60 minutes, but statistics could tell you more).
If you need the log records ordered, you have to either sort them yourself or look for a log-processing solution - I have seen applications being offered exactly for this purpose.
If you really need to get your logs with a very short delay, you have to produce them yourself, and this means writing and running a frontend which mediates access to your files on S3 and at the same time logs each request as needed.
I run such a solution: users get a username, a password, and the URL of my frontend. When they send a request, I check whether they provided proper credentials and whether they are allowed to see the given resource; if so, I create a temporary URL for that resource, valid for a few minutes, and redirect the request to it.
But such a frontend costs money (you have to run it somewhere) and is less robust than accessing AWS S3 directly.
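The temporary URL part can be done with a presigned URL; here is a rough boto3 sketch (the bucket, key, and lifetime are just examples):

import boto3

s3 = boto3.client("s3")

# After checking the user's credentials yourself, hand back a short-lived
# URL and write your own access-log entry at the same time.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-content-bucket", "Key": "docs/report.pdf"},  # placeholders
    ExpiresIn=300,          # valid for a few minutes, as described above
)
# redirect the client to `url` and append the request to your own log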
Good luck, Lulu.
A lot has changed since the question was originally posted. The delay is still there, but one of the OP's concerns was when to download the logs to analyze them.
One option right now would be to leverage Event Notifications: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/setup-event-notification-destination.html
This way, whenever an object is created in the access-logs bucket, you can trigger a notification to SNS, SQS, or Lambda, and based on that download and analyze the log files.
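For example (a hedged sketch; the actual parsing is left as a stub), a small Lambda function subscribed to s3:ObjectCreated events on the log bucket could fetch and process each new log file as it arrives:

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Hypothetical Lambda entry point for s3:ObjectCreated notifications
    # from the access-logs bucket.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        for line in body.splitlines():
            # parse / aggregate the server-access-log line here
            print(line)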