We've built dashboards for service monitoring using AWS CloudWatch and Logs Insights. Everything looks great from a reporting perspective. However, something very annoying is happening on the screen where we want to constantly display service performance. Our setup is:
We use AWS STS/Assume Role from an Identities account to log in to our Development and Production accounts
CloudWatch Dashboards are on Production accounts
We have the below problems, which we are looking to solve immediately:
The STS token expires every 12 hours (max). Is there any way we can keep the sessions running for more than 12 hours? We don't want to be logging on to every service monitoring machine every morning.
Every few minutes, CloudWatch exits the displayed dashboard and lands on the CloudWatch home page on the monitoring screen
How do we get rid of the 'Alarms by Service' and 'Recent Alarms' widgets on the CloudWatch home page?
I referred to this thread on the AWS Forums, but it has had no posts or resolutions for many months :-(
Thanks in advance!!
There isn't really much you can do here until AWS decides to change how that works. I'm with you, it's really annoying, and it makes the feature far less valuable. I did learn one trick, however: when you do end up back on the home page, you can use your browser's back button and you'll most often be back to where you were. That at least saves you from having to re-enter everything.
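For the 12-hour session problem from the question (not the dashboard redirect), one workaround people use is the documented "custom identity broker" federation pattern: a script on the monitoring machine re-assumes the role on a schedule and mints a fresh console sign-in URL, so nobody has to log in manually each morning. Below is a rough, hedged sketch of that idea; the role ARN, session name, and destination URL are placeholders, and it assumes the role's maximum session duration has been raised to 12 hours.

```python
# Hedged sketch of the AWS console federation pattern (custom identity broker).
# ROLE_ARN and DESTINATION are placeholders, not values from the question.
import json
import urllib.parse

import boto3
import requests

ROLE_ARN = "arn:aws:iam::111111111111:role/MonitoringDashboardRole"  # placeholder
DESTINATION = "https://console.aws.amazon.com/cloudwatch/home"        # dashboard URL

sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn=ROLE_ARN,
    RoleSessionName="dashboard-kiosk",
    DurationSeconds=43200,  # 12 hours; assumes the role's max session duration allows it
)["Credentials"]

# Exchange the temporary credentials for a console sign-in token
session_json = json.dumps(
    {
        "sessionId": creds["AccessKeyId"],
        "sessionKey": creds["SecretAccessKey"],
        "sessionToken": creds["SessionToken"],
    }
)
resp = requests.get(
    "https://signin.aws.amazon.com/federation",
    params={"Action": "getSigninToken", "Session": session_json},
)
signin_token = resp.json()["SigninToken"]

# Build a sign-in URL that drops the browser straight onto the dashboard page
login_url = (
    "https://signin.aws.amazon.com/federation?Action=login"
    + "&Issuer=monitoring-kiosk"
    + "&Destination=" + urllib.parse.quote_plus(DESTINATION)
    + "&SigninToken=" + signin_token
)
print(login_url)  # open this in the kiosk browser; rerun on a schedule to refresh
```

Rerunning this on a cron schedule keeps credentials fresh without extending any single session past the 12-hour STS limit.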
Related
Currently I'm getting charged $0.50 per day per environment. We have 4 environments, so in total it costs $2 per day.
We don't have any traffic at all, as we are still in the development phase.
When I try to disable the Stackdriver Monitoring API, it says it will disable a few more APIs, which isn't expected.
I saw that "google.monitoring.v3.MetricService.ListTimeSeries" has a request count of 1M per month in the metrics, and I can't tell from where this is being triggered.
I see that Stackdriver Monitoring API costs are per 1,000 calls, and that can easily push my budget to the edge.
Is it possible to find out from where this is getting triggered?
I finally found out that the issue is with NewRelic monitoring.
I gave NewRelic access to all my projects some time back, and once I revoked the access, the charges stopped!
It would be great if GCP had an option to show which resource is consuming an API and what is triggering it.
The source can be tracked this way:
'APIs & Services' -> Library
Search for 'Stackdriver Monitoring API' and click it
Click 'Manage' on the next screen
Click 'Metrics' from the left-hand side menu
In the 'Select Graphs' dropdown, select "Traffic by Credential" and click 'OK'
You will see either newrelic, a service account email, or a unique ID. That unique ID is the ID of the service account used.
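If you prefer to check this outside the console, you could query the same per-API traffic with the Cloud Monitoring client library. This is only a sketch of my own: the project ID is a placeholder, and the metric/label names are assumptions based on the consumed_api metrics, so verify them against your own metrics explorer.

```python
# Hedged sketch: list per-credential request counts for the Monitoring API itself.
# Project ID is a placeholder; the filter is an assumption (consumed_api metrics).
import time

from google.cloud import monitoring_v3

project = "projects/my-project-id"  # hypothetical project ID
client = monitoring_v3.MetricServiceClient()

now = time.time()
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": int(now)},
        "start_time": {"seconds": int(now - 24 * 3600)},  # last 24 hours
    }
)

results = client.list_time_series(
    request={
        "name": project,
        # Assumed filter: API request counts scoped to the Monitoring API only
        "filter": (
            'metric.type = "serviceruntime.googleapis.com/api/request_count" '
            'AND resource.labels.service = "monitoring.googleapis.com"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    # The credential_id resource label (where present) identifies the caller,
    # e.g. the unique ID of the service account NewRelic was using.
    print(series.metric.labels, series.resource.labels)
```

Note the mild irony that this itself calls ListTimeSeries, so run it sparingly if you are watching that quota.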
I am new to AWS. Can anyone please tell me how to track user activities like login, logout, and other actions in AWS CloudTrail? I should also mention that I want to track the activities of all the users in my group. Please help.
Also, what kind of user activities can we track using CloudTrail?
With CloudTrail you can monitor everything that happens in your AWS account. The CloudTrail logs are well detailed and contain the full information for an event, such as a login or a user creation, for example.
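Before building anything, you can already query the CloudTrail event history directly. A minimal sketch of my own (not part of any linked solution) that pulls recent console sign-in events with boto3:

```python
# Hedged sketch: list ConsoleLogin events from CloudTrail's 90-day event history.
from datetime import datetime, timedelta, timezone

import boto3

cloudtrail = boto3.client("cloudtrail")

end = datetime.now(timezone.utc)
start = end - timedelta(days=7)  # look back over the last 7 days

paginator = cloudtrail.get_paginator("lookup_events")
pages = paginator.paginate(
    LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "ConsoleLogin"}],
    StartTime=start,
    EndTime=end,
)

for page in pages:
    for event in page["Events"]:
        # Username and timestamp for each sign-in event
        print(event.get("Username"), event["EventTime"], event["EventName"])
```

Swapping the EventName value (for example to CreateUser) lets you inspect other kinds of activity the same way.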
In order to visualize and manage alerts, you have to develop and deploy a solution. There are a lot of solutions out there.
Check these:
https://aws.amazon.com/es/blogs/mt/monitor-changes-and-auto-enable-logging-in-aws-cloudtrail/
https://aws.amazon.com/es/blogs/big-data/streamline-aws-cloudtrail-log-visualization-using-aws-glue-and-amazon-quicksight/
In my personal experience, I deployed an ELK solution in order to analyze and visualize the logs.
The solution you deploy depends a lot on your use case. For example, if you need a complex system that manages multiple alerts, notifications, and complex alert conditions, I strongly recommend that you use an ELK stack. But if you just want to alert when a critical event is triggered, you could use one of the AWS blog solutions.
I found another solution that doesn't require developing code and uses only AWS services:
https://github.com/awsdocs/aws-cloudtrail-user-guide/blob/master/doc_source/monitor-cloudtrail-log-files-with-cloudwatch-logs.md
I've stored analytics in a BigQuery dataset, which I've been doing for over 1.5 years now, and have hooked up Data Studio and other tools to analyse the data. However, I very rarely look at this data. Now I logged in to check it, and it's just completely gone. There is no trace of the dataset, and no audit log anywhere showing what happened. I've tracked down when it disappeared via the billing history, and it seems it was mysteriously deleted in November last year.
My question to the community is: Is there any hope that I can find out what happened? I'm thinking audit logs etc. Does BigQuery have any table-level logging? For how long does GCP store these things? I understand the data is probably deleted since it was last seen so long ago, I'm just trying to understand if we were hacked in some way.
I mean, ~1 TB of data can't just disappear without leaving any traces?
Usually, Cloud Audit Logging is used for this:
Cloud Audit Logging maintains two audit logs for each project and organization: Admin Activity and Data Access. Google Cloud Platform services write audit log entries to these logs to help you answer the questions of "who did what, where, and when?" within your Google Cloud Platform projects.
Admin Activity logs contain log entries for API calls or other administrative actions that modify the configuration or metadata of resources. They are always enabled. There is no charge for your Admin Activity audit logs
Data Access audit logs record API calls that create, modify, or read user-provided data. To view the logs, you must have the IAM roles Logging/Private Logs Viewer or Project/Owner. ... BigQuery Data Access logs are enabled by default and cannot be disabled. They do not count against your logs allotment and cannot result in extra logs charges.
The problem for you is the retention period for Data Access logs: 30 days (Premium Tier) or 7 days (Basic Tier). Of course, for longer retention, you can export audit log entries and keep them for as long as you wish. So if you did not do this, you have lost these entries, and your only option is to contact Support, I think.
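If any relevant entries are still within the retention window (or were exported somewhere), you could search for the deletion with the Cloud Logging client library. This is only a sketch of my own; the project ID is a placeholder and the filter values are assumptions based on the legacy BigQuery audit log format, so adjust them to what you actually see in the Logs Viewer.

```python
# Hedged sketch: search audit logs for a BigQuery dataset deletion.
# Project ID is a placeholder; filter values are assumptions (legacy audit format).
import google.cloud.logging

client = google.cloud.logging.Client(project="my-project-id")  # hypothetical project

log_filter = (
    'resource.type="bigquery_resource" '
    'AND protoPayload.methodName="datasetservice.delete"'
    # Optionally add a timestamp>= clause around when billing shows it vanished
)

for entry in client.list_entries(filter_=log_filter):
    # Each matching entry should show who deleted the dataset and when
    print(entry.timestamp, entry.payload)
```

If nothing comes back, the entries have most likely already aged out, which is exactly the retention problem described above.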
I am currently signed up for the free tier of AWS. I am enjoying experimenting with various services, including those not afforded by said free tier. Can AWS's enhanced budgets be used to stop services like EC2 instances if I accidentally spend too much? Or do they merely act as alerts?
This is available for EC2; I don't think it is available for all AWS resources.
http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/UsingAlarmActions.html
Hope it helps.
There are several posts which look at it from different perspectives, such as this and this.
Having a cost cap might be a crucial requirement depending on usage, especially considering how complex it is for an average user to set things up properly and keep everything secure on the cloud. At the very least, we could expect a feature to switch a cost-cap service on or off, so a user can decide on their own scenario easily.
The closest solution that I found is here:
Serverless Automated Cost Controls
https://aws.amazon.com/blogs/compute/serverless-automated-cost-controls-part1
It explains how to trigger an AWS Lambda function that changes the IAM permission from EC2FullAccess to EC2ReadOnly when the budget exceeds the limit.
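To make the mechanism concrete, here is a loose sketch (my own assumptions, not the blog's actual code) of that permission-swap idea: a Lambda, wired to the budget notification, detaches the full-access managed policy from an IAM group and attaches the read-only one instead. The group name is hypothetical.

```python
# Hedged sketch: swap AmazonEC2FullAccess for AmazonEC2ReadOnlyAccess on a group
# when a budget notification invokes this Lambda. Group name is a placeholder.
import boto3

iam = boto3.client("iam")

GROUP = "developers"  # hypothetical IAM group
FULL = "arn:aws:iam::aws:policy/AmazonEC2FullAccess"
READ_ONLY = "arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess"

def lambda_handler(event, context):
    # Swap the managed policies so no new EC2 resources can be created
    iam.detach_group_policy(GroupName=GROUP, PolicyArn=FULL)
    iam.attach_group_policy(GroupName=GROUP, PolicyArn=READ_ONLY)
    return {"group": GROUP, "now_attached": READ_ONLY}
```

Note this only prevents new spend from EC2 actions; already-running resources keep billing until they are stopped.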
There is no built-in way to terminate services based on budgets or billing alarms.
You can get notified automatically, but it is then up to you to determine how to handle it.
Would you really want AWS automatically terminating your production infrastructure because you went $1 over your estimated monthly spending?
Edit: There is now a way to monitor and alert on free tier usage, and on when your predicted usage will exceed the free tier. See here for details. You could probably come up with a way to terminate infrastructure based on an alert using SNS & Lambda.
Edit 2: In Oct. 2020, AWS released Budget Actions, the ability to trigger an action when budget thresholds are reached. This should give you the ability to automate a response: you can shut down servers, change IAM permissions to prevent additional infrastructure from being created, etc.
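For the SNS & Lambda route mentioned above, a minimal sketch (my own, hedged) of the Lambda side, assuming you have already wired the budget or billing alert to an SNS topic and subscribed this function to it:

```python
# Hedged sketch: Lambda that stops all running EC2 instances in its region
# when invoked (e.g. by an SNS notification from a budget/billing alert).
import boto3

ec2 = boto3.client("ec2")

def lambda_handler(event, context):
    # Find running instances in the region the function runs in
    running = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    instance_ids = [
        i["InstanceId"]
        for r in running["Reservations"]
        for i in r["Instances"]
    ]
    if instance_ids:
        # Stop (not terminate) so the instances can be started again later
        ec2.stop_instances(InstanceIds=instance_ids)
    return {"stopped": instance_ids}
```

Stopping rather than terminating keeps the answer's warning in mind: an automated response should be reversible, not destructive.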
Recently, Amazon has released "budget actions" to carry out actions like stopping services automatically if the budget has been exceeded.
https://aws.amazon.com/about-aws/whats-new/2020/10/announcing-aws-budgets-actions/
https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/budgets-controls.html#:~:text=select%20Configure%20thresholds.-,To%20configure%20a%20budget%20action,-Under%20Configure%20thresholds
I'd like to use AWS access logs for processing website impressions using an existing batch-oriented ETL pipeline that grabs the last finished hour of impressions and does a lot of further transformations with them.
The problem with the access logs, though, is that:
Note, however, that some or all log file entries for a time period can
sometimes be delayed by up to 24 hours
So I would never know when all the logs for a particular hour are complete.
Unfortunately, I cannot use any streaming solution; I need to use the existing pipeline that grabs hourly batches of data.
So my question is: is there any way to be notified that all logs have been delivered to S3 for a particular hour?
You have asked about S3, but your pull-quote is from the documentation for CloudFront.
Either way, though, it doesn't matter. This is just a caveat, saying that log delivery might sometimes be delayed, and that if it's delayed, this is not a bug -- it's a side effect of a massive, distributed system.
Both services operate at an incomprehensibly large scale, so periodically things go wrong with small parts of the system, and eventually some stranded or backlogged logs may be found and delivered. Rarely, they can even arrive days or weeks later.
There is no event that signifies that all of the logs are finished, because there's no single point within such a system that is aware of this.
But here is the takeaway concept: the majority of logs will arrive within minutes, but this isn't guaranteed. Once you start running traffic and observing how the logging works, you'll see what I am referring to. Delayed logs are the exception, and you should fairly rapidly develop a sense of how long you need to wait before processing the logs for a given wall-clock hour. As long as you track what you processed, you can audit this against the bucket later to ensure that your process is capturing a sufficient proportion of the logs.
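To illustrate that "wait, then audit" approach, here is a small sketch of my own (not the answerer's code): after a chosen grace period, list the S3 log objects whose key prefix matches a given wall-clock hour and compare them to what the ETL job already processed. The bucket name, key layout, and grace period are all assumptions you would replace with your own.

```python
# Hedged sketch: audit a processed hour against the log bucket after a grace period.
# Bucket name, key prefix layout, and grace period are placeholders/assumptions.
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "my-access-logs-bucket"   # hypothetical bucket
GRACE = timedelta(hours=2)         # how long to wait before trusting an hour

def keys_for_hour(hour: datetime) -> list[str]:
    # Assumes keys are laid out like "logs/YYYY/MM/DD/HH/..." (adjust to your format)
    prefix = hour.strftime("logs/%Y/%m/%d/%H/")
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys

def audit(hour: datetime, already_processed: set[str]) -> set[str]:
    # Returns log objects that arrived late, i.e. present now but missed earlier
    if datetime.now(timezone.utc) < hour + timedelta(hours=1) + GRACE:
        raise RuntimeError("Grace period not over yet; logs may still be arriving")
    return set(keys_for_hour(hour)) - already_processed
```

Anything the audit returns can be fed back through the pipeline as a late-arrival top-up batch.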
Since the days before CloudFront had SNI support, I have been routing traffic to some of my S3 buckets using HAProxy in EC2 in the same region as the bucket. This gave me the ability to use custom hostnames with SNI, but it also gave me real-time logging of all the bucket traffic, since HAProxy can stream copies of its logs to a log collector over UDP for real-time analysis, as well as writing them to syslog. There is no measurable difference in performance with this solution, and HAProxy runs extremely well on t2-class servers, so it is cost-effective. You do, of course, introduce more costs and more to maintain, but you can even deploy HAProxy between CloudFront and S3 as long as you are not using an origin access identity. One of my larger services does exactly this, a holdover from the days before Lambda@Edge.