Access AWS CloudWatch Logs and Metrics Off-Prem - amazon-web-services

AWS allows us to capture all kinds of metrics and logs through CloudWatch. Are these data accessible outside the AWS cloud environment (assuming proper permissions and policies are set to allow it to be so)?
For example, could these data be backed up and stored on-prem?
I imagine a Lambda function could be created to access, say, S3 data and expose it through API Gateway, but are CloudWatch data stored in S3?

Log data in CloudWatch is stored in S3 buckets that we cannot access directly. However, you can export logs to an S3 bucket of your own.
The documentation says:
You can export log data from your log groups to an Amazon S3 bucket
and use this data in custom processing and analysis, or to load onto
other systems.
...
To begin the export process, you must create an S3 bucket to store the
exported log data. You can store the exported files in your Amazon S3
bucket and define Amazon S3 lifecycle rules to archive or delete
exported files automatically.
Then you can simply download the exported files from S3, or process them with other services as you like.
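The export described above can also be started programmatically. Below is a minimal sketch using boto3's CloudWatch Logs client and its create_export_task call; the log group, bucket name and prefix are hypothetical placeholders, and the bucket is assumed to already carry a policy that lets CloudWatch Logs write to it.

```python
from datetime import datetime, timezone


def to_epoch_ms(moment: datetime) -> int:
    """CloudWatch Logs expects timestamps as milliseconds since the epoch."""
    return int(moment.timestamp() * 1000)


def build_export_request(log_group: str, bucket: str, prefix: str,
                         start: datetime, end: datetime) -> dict:
    """Assemble the keyword arguments for the CreateExportTask API call."""
    return {
        "taskName": f"export-{start:%Y%m%d}",  # hypothetical naming scheme
        "logGroupName": log_group,
        "fromTime": to_epoch_ms(start),
        "to": to_epoch_ms(end),
        "destination": bucket,            # name of the target S3 bucket
        "destinationPrefix": prefix,
    }


def start_export(request: dict) -> str:
    """Submit the export task; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it
    response = boto3.client("logs").create_export_task(**request)
    return response["taskId"]
```

For example, build_export_request("/hypothetical/app", "my-log-archive", "cloudwatch/", start, end) followed by start_export(...) would queue one day's export. Once the task completes, the objects can be pulled on-prem with any S3 client.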

The raw metrics stored in CloudWatch Metrics are not accessible, for example the individual CPUUtilization values that each Amazon EC2 instance sends to CloudWatch.
Instead, aggregated metrics can be queried, such as "Average CPU Utilization over a 5-minute period".
This is different from CloudWatch Logs, which can be exported to Amazon S3.
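As an illustration of querying an aggregated metric, here is a small sketch that asks CloudWatch for 5-minute CPU averages through boto3's get_metric_statistics. The instance ID passed in is a placeholder, and the one-hour default window is an arbitrary choice for the example.

```python
from datetime import datetime, timedelta, timezone


def build_cpu_query(instance_id: str, end: datetime, hours: int = 1) -> dict:
    """Keyword arguments for GetMetricStatistics: 5-minute CPU averages."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,                # aggregation window in seconds
        "Statistics": ["Average"],
    }


def fetch_datapoints(query: dict) -> list:
    """Run the query; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so build_cpu_query works without it
    response = boto3.client("cloudwatch").get_metric_statistics(**query)
    # Datapoints are not guaranteed to arrive in time order.
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

Each returned datapoint is an aggregate for one period, which matches the point above: the individual samples behind it are not retrievable.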

Related

Where to find Automatic and Manual DocumentDB snapshots in S3?

I see that AWS DocumentDB is creating automatic snapshots daily and I myself can create manual snapshots from AWS Console. The documentation says that the snapshot is saved in S3 but it is not visible on S3 to me.
I basically want to move the DocumentDB data to S3 in order to propagate it further to other AWS services for monitoring purposes. I was wondering whether I could trigger a manual snapshot daily and have a Lambda function triggered when DocumentDB uploads the file to S3.
How can I see the automatic and manual snapshot created by DocumentDB on S3?
Backups in Amazon DocumentDB are stored in service-managed S3 buckets and thus there is no way to access the backups directly.
Two options here are:
1. Use mongodump/mongoexport on a schedule: https://docs.aws.amazon.com/documentdb/latest/developerguide/backup_restore-dump_restore_import_export_data.html
2. Use change streams to incrementally write to S3: https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html
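A rough sketch of the first option, assuming mongodump is installed on the machine that runs the schedule: it writes a gzipped archive and uploads it to a bucket you own. The connection URI, local path and key layout are hypothetical, and the TLS options a DocumentDB cluster usually requires are deliberately left out because they depend on your deployment.

```python
import subprocess
from datetime import date


def dump_command(uri: str, archive_path: str) -> list:
    """Argument list for mongodump writing one gzipped archive file."""
    return ["mongodump", f"--uri={uri}", f"--archive={archive_path}", "--gzip"]


def archive_key(prefix: str, day: date) -> str:
    """Hypothetical S3 key layout: one archive per calendar day."""
    return f"{prefix}/{day:%Y/%m/%d}/dump.archive.gz"


def dump_and_upload(uri: str, bucket: str, prefix: str, day: date,
                    local_path: str = "/tmp/docdb-dump.archive.gz") -> str:
    """Run the dump and upload it; needs mongodump, boto3 and credentials."""
    import boto3  # imported lazily so the helpers above work without it
    subprocess.run(dump_command(uri, local_path), check=True)
    key = archive_key(prefix, day)
    boto3.client("s3").upload_file(local_path, bucket, key)
    return key
```

Because the upload lands in your own bucket, an S3 event notification on that bucket can then trigger a Lambda function, which is the flow the question was after.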

AWS CloudTrail: Can I disable S3 storage for trails, since I'm using CloudWatch

When setting up CloudTrail, you must specify an S3 bucket to store the data in.
Since I'm using CloudWatch (and CloudWatch metrics/alarms) for storage, I do not believe that I also need to store the data redundantly in S3.
Is there a reason even after configuring CloudWatch for CloudTrail, that I must also keep using S3 storage? Is there a way to turn off S3 storage for CloudTrail?
You can turn off logging for any trail:
When you create a trail, logging is turned on automatically. You can turn off logging for a trail. Previous logs will still be accessible.
See https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-turning-off-logging.html

Identifying and deleting S3 Objects that are not being accessed?

I have recently joined a company that uses S3 Buckets for various different projects within AWS. I want to identify and potentially delete S3 Objects that are not being accessed (read and write), in an effort to reduce the cost of S3 in my AWS account.
I read this, which helped me to some extent.
Is there a way to find out which objects are being accessed and which are not?
There is no native way of doing this at the moment, so all the options are workarounds depending on your use case.
You have a few options:
Tag each S3 object with its last-access date (e.g. 2018-10-24). First turn on Object Level Logging for your S3 bucket. Set up CloudWatch Events for CloudTrail. The tag could then be updated by a Lambda function which runs on a CloudWatch Event, which is fired on a Get event. Then create a function that runs on a scheduled CloudWatch Event to delete all objects with a date tag prior to today.
Query CloudTrail logs: write a custom function to query the last access times from Object Level CloudTrail logs. This could be done with Athena, or with a direct query to S3.
Create a Separate Index, in something like DynamoDB, which you update in your application on read activities.
Use a Lifecycle Policy on the S3 Bucket / key prefix to archive or delete the objects after x days. This is based on upload time rather than last access time, so you could copy the object to itself to reset the timestamp and start the clock again.
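The last option above can be sketched with boto3 as follows. The rule ID, prefix and day count are placeholders. Two caveats that are easy to miss: put_bucket_lifecycle_configuration replaces the bucket's whole lifecycle configuration, and S3 only accepts a copy of an object onto itself if something changes, which is why the sketch sets MetadataDirective to REPLACE.

```python
def expiration_rule(prefix: str, days: int) -> dict:
    """Lifecycle configuration that expires objects under `prefix` after `days`."""
    return {
        "Rules": [{
            "ID": f"expire-after-{days}-days",  # hypothetical rule name
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Expiration": {"Days": days},
        }]
    }


def apply_rule(bucket: str, prefix: str, days: int) -> None:
    """Install the rule; note this overwrites any existing lifecycle rules."""
    import boto3  # imported lazily so expiration_rule works without it
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=expiration_rule(prefix, days))


def touch_object(bucket: str, key: str) -> None:
    """Copy an object onto itself to reset its timestamp and restart the clock."""
    import boto3
    boto3.client("s3").copy_object(
        Bucket=bucket, Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        # REPLACE makes the self-copy legal, but drops existing user
        # metadata unless it is passed again via the Metadata argument.
        MetadataDirective="REPLACE",
    )
```

Your application would call touch_object on each read it wants to count as an access, so that only untouched objects reach the expiration age.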
No objects in Amazon S3 are required by other AWS services, but you might have configured services to use the files.
For example, you might be serving content through Amazon CloudFront, providing templates for AWS CloudFormation or transcoding videos that are stored in Amazon S3.
If you didn't create the files and you aren't knowingly using the files, you can probably delete them. But you would be the only person who would know whether they are necessary.
There is a recent AWS blog post that describes a very interesting and cost-optimized approach to this problem.
Here is the description from the AWS blog:
The S3 server access logs capture S3 object requests. These are generated and stored in the target S3 bucket.
An S3 inventory report is generated for the source bucket daily. It is written to the S3 inventory target bucket.
An Amazon EventBridge rule is configured that will initiate an AWS Lambda function once a day, or as desired.
The Lambda function initiates an S3 Batch Operations job to tag the objects in the source bucket that must be expired, using the following logic:
Capture the number of days (x) configuration from the S3 Lifecycle configuration.
Run an Amazon Athena query that will get the list of objects from the S3 inventory report and server access logs. Create a delta list with objects that were created earlier than 'x' days, but not accessed during that time.
Write a manifest file with the list of these objects to an S3 bucket.
Create an S3 Batch operation job that will tag all objects in the manifest file with a tag of "delete=True".
The Lifecycle rule on the source S3 bucket will expire all objects that were created more than 'x' days ago and that carry the "delete=True" tag applied by the S3 Batch Operations job.
Expiring Amazon S3 Objects Based on Last Accessed Date to Decrease Costs
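To make the "delta list" step concrete, here is an in-memory stand-in for what the Athena query computes. It is not the blog's query: the inventory is assumed to be already loaded as a mapping from key to creation time, and the access log as a set of keys seen in the window. The manifest writer follows the S3 Batch Operations CSV layout of one bucket,key row per object with URL-encoded keys.

```python
import csv
import io
from datetime import datetime, timedelta, timezone
from urllib.parse import quote


def stale_keys(inventory: dict, accessed: set, now: datetime, days: int) -> list:
    """Keys created more than `days` ago that never appear in the access log."""
    cutoff = now - timedelta(days=days)
    return sorted(key for key, created in inventory.items()
                  if created < cutoff and key not in accessed)


def manifest_csv(bucket: str, keys: list) -> str:
    """Build a Batch Operations CSV manifest: one "bucket,key" row per object."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for key in keys:
        # Batch Operations expects object keys to be URL-encoded.
        writer.writerow([bucket, quote(key, safe="/")])
    return buffer.getvalue()
```

The resulting text would be uploaded to the manifest bucket and referenced when the tagging job is created. Doing the join in Athena instead, as the blog does, avoids loading a large inventory into memory.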

AWS CloudWatch Metrics directly to S3

Is there a way to get CloudWatch Metrics directly into S3? I don't need logs but ELB Metrics. I would like them logged to S3 on a regular basis (ideally as CSV).
Right now, I'm thinking of writing my own script to do it, but maybe there's an automatic way to put it in S3 (or Redshift)?
CloudWatch itself does not have a native export feature that will send data periodically to S3.
As you suggest, you would need to develop a script that pulls the CloudWatch metrics that you wish to store (in this case ELB metrics) using the AWS CLI and copies those metrics to your S3 bucket on a regular basis.
Using the get-metric-statistics command, the script would get the statistics for the specified metric and store the data in your S3 bucket.
See also Elastic Load Balancing Dimensions and Metrics
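Such a script could look roughly like this if written with boto3 instead of the AWS CLI. It is a sketch under assumptions: a Classic Load Balancer (the AWS/ELB namespace with the LoadBalancerName dimension), the RequestCount metric summed over 5-minute periods, and a made-up three-column CSV layout. Bucket, key and load balancer names are placeholders.

```python
import csv
import io
from datetime import datetime, timezone


def datapoints_to_csv(datapoints: list, statistic: str = "Sum") -> str:
    """Render CloudWatch datapoints as CSV, oldest first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", statistic.lower(), "unit"])
    for point in sorted(datapoints, key=lambda p: p["Timestamp"]):
        writer.writerow([point["Timestamp"].isoformat(),
                         point[statistic], point.get("Unit", "")])
    return buffer.getvalue()


def export_elb_requests(load_balancer: str, bucket: str, key: str,
                        start: datetime, end: datetime) -> None:
    """Fetch RequestCount sums and store them in S3; needs boto3 and credentials."""
    import boto3  # imported lazily so datapoints_to_csv works without it
    response = boto3.client("cloudwatch").get_metric_statistics(
        Namespace="AWS/ELB", MetricName="RequestCount",
        Dimensions=[{"Name": "LoadBalancerName", "Value": load_balancer}],
        StartTime=start, EndTime=end, Period=300, Statistics=["Sum"])
    body = datapoints_to_csv(response["Datapoints"], "Sum")
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
```

Run from cron or a scheduled Lambda function, this would drop one CSV object per run, which Redshift could then load with its usual S3 import.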

AWS Cloudwatch monitoring for S3

Amazon CloudWatch provides some very useful metrics for monitoring my EC2 instances, load balancers, ElastiCache and RDS databases, etc. and allows me to set alarms for a whole range of criteria; but is there any way to configure it to monitor my S3 buckets as well? Or are there any other monitoring tools (besides simply enabling logging) that will help me monitor the numbers of POST/GET requests and data volumes for my S3 resources, and provide alarms for thresholds of activity or increased data storage?
AWS S3 is a managed storage service. The only metrics available in AWS CloudWatch for S3 are NumberOfObjects and BucketSizeBytes. In order to understand your S3 usage better you need to do some extra work.
I have recently written an AWS Lambda function to do exactly what you ask for and it's available here:
https://github.com/maginetv/s3logs-cloudwatch
It works by parsing S3 server access log files, aggregating the results, and exporting them as metrics to AWS CloudWatch (CloudWatch allows you to publish custom metrics).
Example graphs that you will get in AWS CloudWatch after deploying this function on your AWS account are:
RestGetObject_RequestCount
RestPutObject_RequestCount
RestHeadObject_RequestCount
BatchDeleteObject_RequestCount
RestPostMultiObjectDelete_RequestCount
RestGetObject_HTTP_2XX_RequestCount
RestGetObject_HTTP_4XX_RequestCount
RestGetObject_HTTP_5XX_RequestCount
+ many others
Since metrics are exported to CloudWatch, you can easily set up alarms for them as well.
A CloudFormation template is included in the GitHub repo, so you can deploy this function very quickly to gain visibility into your S3 bucket usage.
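The general idea of turning access logs into custom metrics can be sketched as below. This is not the code from the repository above: the regular expression covers only the leading fields of the S3 server access log format, the rule that maps an operation such as REST.GET.OBJECT to a name like RestGetObject_RequestCount is a guess based on the metric names listed, and the namespace is a placeholder.

```python
import re
from collections import Counter

# Leading fields: owner, bucket, [time], ip, requester, request id,
# operation, key, "request-URI", HTTP status.
LOG_LINE = re.compile(
    r'^\S+ \S+ \[[^\]]+\] \S+ \S+ \S+ (?P<operation>\S+) \S+ '
    r'(?:"[^"]*"|-) (?P<status>\d{3}|-)'
)


def metric_name(operation: str, status: str = "") -> str:
    """REST.GET.OBJECT -> RestGetObject_RequestCount (assumed naming rule)."""
    base = "".join(part.capitalize() for part in re.split(r"[._]", operation))
    if status:
        return f"{base}_HTTP_{status[0]}XX_RequestCount"
    return f"{base}_RequestCount"


def count_metrics(lines) -> Counter:
    """Count requests per operation and per operation/status class."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not look like access log records
        counts[metric_name(match["operation"])] += 1
        if match["status"] != "-":
            counts[metric_name(match["operation"], match["status"])] += 1
    return counts


def publish(counts: Counter, namespace: str = "Custom/S3AccessLogs") -> None:
    """Send the counts as custom metrics; needs boto3 and credentials."""
    import boto3  # imported lazily so the parsing works without it
    cloudwatch = boto3.client("cloudwatch")
    data = [{"MetricName": name, "Value": float(value), "Unit": "Count"}
            for name, value in sorted(counts.items())]
    for start in range(0, len(data), 20):  # conservative batch size
        cloudwatch.put_metric_data(Namespace=namespace,
                                   MetricData=data[start:start + 20])
```

In a Lambda function triggered by new log objects, count_metrics would run over the lines of each delivered log file before publish is called once per file.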
EDIT 2016-12-10:
In November 2016 AWS added extra S3 request metrics to CloudWatch that can be enabled when needed. These include metrics like AllRequests, GetRequests, PutRequests, DeleteRequests, HeadRequests etc. See the "Monitoring Metrics with Amazon CloudWatch" documentation for more details about this feature.
I was also unable to find any way to do this with CloudWatch. This question from April 2012 was answered by Derek@AWS, who said that CloudWatch does not support S3. https://forums.aws.amazon.com/message.jspa?messageID=338089
The only thing I could think of would be to import the S3 access logs to a log service (like Splunk). Then create a custom CloudWatch metric where you post the data that you parse from the logs. But then you have to filter out the polling of the access logs and…
And while you were at it, you could just create the alarms in Splunk instead of in S3.
If your use case is to simply alert when you are using it too much, you could set up an account billing alert for your S3 usage.
I think this might depend on where you are looking to track the access from. I.e. if you are trying to measure/watch usage of S3 objects from outside http/https requests, then Anthony's suggestion of enabling S3 logging and then importing into Splunk (or Redshift) for analysis might work. You can also watch billing status on requests every day.
If you are trying to gauge usage from within your own applications, there are some AWS SDK CloudWatch metrics:
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/metrics/package-summary.html
and
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/metrics/S3ServiceMetric.html
S3 is a managed service, meaning that you don't need to take action based on system events in order to keep it up and running (as long as you can afford to pay for the service's usage). The spirit of CloudWatch is to help with monitoring services that require you to take action in order to keep them running.
For example, EC2 instances (which you manage yourself) typically need monitoring to alert when they're overloaded or when they're underused or else when they crash; at some point action needs to be taken in order to spin up new instances to scale out, spin down unused instances to scale back in, or reboot instances that have crashed. CloudWatch is meant to help you do the job of managing these resources more effectively.
To enable request and data transfer metrics for your bucket, you can run the command below. Be aware that these are paid metrics.
aws s3api put-bucket-metrics-configuration \
--bucket YOUR-BUCKET-NAME \
--metrics-configuration Id=EntireBucket \
--id EntireBucket
This tutorial describes how to do it in the AWS Console with a point-and-click interface.
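Once request metrics are enabled, an alarm on request volume, as asked in the question, can be created along these lines. This is a sketch with boto3; the alarm name and threshold are placeholders, and the filter ID is assumed to match the EntireBucket configuration used in the command above.

```python
def request_alarm(bucket: str, filter_id: str, threshold: float) -> dict:
    """Keyword arguments for PutMetricAlarm on the bucket's GetRequests metric."""
    return {
        "AlarmName": f"s3-{bucket}-get-requests-high",  # hypothetical name
        "Namespace": "AWS/S3",
        "MetricName": "GetRequests",
        "Dimensions": [
            {"Name": "BucketName", "Value": bucket},
            {"Name": "FilterId", "Value": filter_id},
        ],
        "Statistic": "Sum",
        "Period": 300,                 # evaluate 5-minute totals
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def enable_metrics_and_alarm(bucket: str, threshold: float,
                             filter_id: str = "EntireBucket") -> None:
    """Enable request metrics and add the alarm; needs boto3 and credentials."""
    import boto3  # imported lazily so request_alarm works without it
    boto3.client("s3").put_bucket_metrics_configuration(
        Bucket=bucket, Id=filter_id, MetricsConfiguration={"Id": filter_id})
    boto3.client("cloudwatch").put_metric_alarm(
        **request_alarm(bucket, filter_id, threshold))
```

No alarm action is attached here; in practice you would add an AlarmActions entry pointing at an SNS topic so that someone is notified.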