How to disable AWS Lambda@Edge logs

I'm using CloudFront + Lambda@Edge. Each Lambda invocation creates a CloudWatch log entry in the AWS region closest to the viewer. This results in lots of CloudWatch log streams scattered around the globe, in all possible regions.
As the default retention of CloudWatch logs is to never expire, both the data and the number of streams build up quickly.
Locating these logs and setting a reasonable retention policy is a chore.
Is there a way to disable these logs completely in Lambda@Edge?

If you remove the CloudWatch Logs permissions (logs:CreateLogGroup, logs:CreateLogStream, logs:PutLogEvents) from your Lambda execution role, the function will stop putting logs there. By default, every Lambda function gets these permissions through its execution role.
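As a rough sketch (the role name is hypothetical, and this assumes the logging permissions come from the managed AWSLambdaBasicExecutionRole policy or from an inline policy on the role), the permissions could be removed with boto3:
import boto3

iam = boto3.client("iam")
ROLE_NAME = "my-edge-function-role"  # hypothetical execution role name

# If the role uses the AWS managed basic execution policy, detaching it removes
# the logs:CreateLogGroup / CreateLogStream / PutLogEvents permissions.
# (This call raises NoSuchEntity if the policy is not attached.)
iam.detach_role_policy(
    RoleName=ROLE_NAME,
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
)

# If the logging permissions live in an inline policy instead, find and delete it.
for policy_name in iam.list_role_policies(RoleName=ROLE_NAME)["PolicyNames"]:
    policy = iam.get_role_policy(RoleName=ROLE_NAME, PolicyName=policy_name)
    if "logs:PutLogEvents" in str(policy["PolicyDocument"]):
        iam.delete_role_policy(RoleName=ROLE_NAME, PolicyName=policy_name)
Keep in mind that any other function sharing the same execution role also stops logging once these permissions are gone.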

Related

Advice on filtering GCP Cloud Storage event notifications, based on an object's prefix, when triggering a GCP Cloud Function?

Currently, I am moving services from AWS to GCP. Previously, I relied on an AWS S3 bucket and the service's built-in logic to configure event notifications that were triggered when an object with a particular prefix was inserted into my bucket. This specific event notification, which contained the prefix, would then be fed forward to trigger a Lambda function.
However, now I have to leverage GCP Cloud Storage buckets and trigger a Cloud Function. My observations so far have been that I can't specify a prefix/suffix directly on my Cloud Storage bucket. Instead, I have to specify a Cloud Storage bucket to monitor during the creation of my Cloud Function. My concern with this approach is that I can't limit the bucket's object events to the three of interest to me ('_MANIFEST', '_PROCESSING' and '_PROCESSED'), but rather have to pick a global event notification type such as 'OBJECT_FINALIZE'.
There are two viable approaches I can see to this problem:
Have all the 'OBJECT_FINALIZE' event notifications trigger the Cloud Function and filter out any additional objects (those which don't contain the prefix). The issue with this approach is the unnecessary activation of the Cloud Function and the additional log files getting generated - which are of no inherent value.
Use the audit logs generated by the Cloud Storage bucket and create rules to generate events based on the watched trigger files, i.e. '_MANIFEST', '_PROCESSING' and '_PROCESSED'. My concern with this approach is that I don't know how easy it will be to forward all the information about the bucket I'm interested in if I'm generating the event based on a logging rule - I am primarily interested in the information which gets forwarded by an event notification. Also, I have verified that the object being added to my Cloud Storage bucket is not public and I have enabled the following:
However, I tried to filter the audit logs in the GCP 'Monitoring' service (after adding a _MANIFEST object to the bucket, of course) but the logs are not appearing within the 'Logs Explorer'.
Any advice on how I should approach filtering the event notification of interest in GCP, when triggering my Cloud Function, would be greatly appreciated.
To achieve this, you can sink the Cloud Storage notifications into Pub/Sub.
Then, you can create a Pub/Sub push subscription that targets your Cloud Function (it's no longer a background function triggered by a Cloud Storage event, but an HTTP function triggered by an HTTP request).
The main advantage of doing that is that you can specify a filter on the Pub/Sub push subscription, so that your Cloud Function (or any other HTTP endpoint) is invoked only when the message matches the pattern.
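As a hedged sketch with the google-cloud-pubsub client (project, topic, subscription and endpoint names are placeholders, and it assumes the bucket already publishes OBJECT_FINALIZE notifications to the topic, e.g. via gsutil notification create):
from google.cloud import pubsub_v1

project_id = "my-project"                 # hypothetical
topic_id = "gcs-object-events"            # hypothetical
subscription_id = "manifest-objects-only" # hypothetical
endpoint = "https://REGION-my-project.cloudfunctions.net/process-manifest"  # hypothetical

subscriber = pubsub_v1.SubscriberClient()
with subscriber:
    subscriber.create_subscription(
        request={
            "name": subscriber.subscription_path(project_id, subscription_id),
            "topic": f"projects/{project_id}/topics/{topic_id}",
            "push_config": pubsub_v1.types.PushConfig(push_endpoint=endpoint),
            # Only deliver finalize events whose object name starts with the
            # watched prefix; adjust the filter to your own naming convention.
            "filter": (
                'attributes.eventType = "OBJECT_FINALIZE" '
                'AND hasPrefix(attributes.objectId, "_MANIFEST")'
            ),
        }
    )
Note that a subscription's filter cannot be changed after the subscription is created.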

Idea and guidelines on an end-to-end AWS solution

I want to build an end-to-end automated system which consists of the following steps:
Getting data from source to landing bucket AWS S3 using AWS Lambda
Running some transformation job using AWS Lambda and storing in processed bucket of AWS S3
Running Redshift copy command using AWS Lambda to push the transformed/processed data from AWS S3 to AWS Redshift
From the above points, I've completed pulling data, transforming data and running a manual COPY command against Redshift using a SQL query tool.
Doubts:
I've heard AWS CloudWatch can be used to schedule/automate things but have never worked with it. So, if I want to achieve the steps above in a streamlined fashion, how should I go about it?
Should I use Lambda to trigger copy and insert statements? Or are there better AWS services to do the same?
Any other suggestion on other AWS Services and of the likes are most welcome.
Constraint: Want as many tasks as possible to be serverless (except for semantic layer, Redshift).
CloudWatch:
Your options here are either to use CloudWatch Alarms or Events.
With alarms, you can respond to any metric of your system (e.g. CPU utilization, disk IOPS, count of Lambda invocations, etc.) when it crosses some threshold, and when the alarm is triggered, invoke a Lambda function (or send an SNS notification, etc.) to perform a task.
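A minimal boto3 sketch of the alarm route (metric choice, threshold, function name and SNS topic ARN are all placeholders) might look like:
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="lambda-errors-high",                               # hypothetical
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "run-copy"}],   # hypothetical function
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    # Notify a topic; the action could equally be anything subscribed to it.
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:alerts"],   # hypothetical topic
)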
With events you can use either a cron expression or some AWS service event (e.g. an EC2 instance state change, an SNS notification, etc.) to trigger another service (e.g. Lambda), so you could for example run some kind of clean-up operation via Lambda on a regular schedule, or create a snapshot of an EBS volume when its instance is shut down.
Lambda itself is a very powerful tool and should allow you to program a decent copy/insert function in a language you are familiar with. AWS has several GitHub repos with lots of examples too; see for example the serverless examples and many samples. There may be other services which could work for you in your specific case, but part of Lambda's power is its flexibility.
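A minimal boto3 sketch of the scheduled route (rule name, cron expression and function ARN are placeholders) could look like:
import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

RULE_NAME = "nightly-redshift-copy"                                         # hypothetical
FUNCTION_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:run-copy"    # hypothetical

# Create (or update) a rule that fires every day at 02:00 UTC.
rule_arn = events.put_rule(
    Name=RULE_NAME,
    ScheduleExpression="cron(0 2 * * ? *)",
    State="ENABLED",
)["RuleArn"]

# Allow CloudWatch Events to invoke the function, then register it as a target.
lambda_client.add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId="allow-events-invoke",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule_arn,
)
events.put_targets(Rule=RULE_NAME, Targets=[{"Id": "copy-lambda", "Arn": FUNCTION_ARN}])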

CloudWatch not receiving CloudTrail logs from outside the region

I am struggling with detecting activities performed outside of a given region in CloudWatch. For example, if an InternetGateway is created in the same region as the CloudWatch Event rule (let's say eu-central-1), it is detected by CloudWatch; however, if it's created somewhere else (let's say eu-west-1), the event won't be caught.
However, CloudTrail does capture the event in the given region (it is activated across regions), as I can see it in the event history of that particular region (let's say eu-west-1 again).
How can I get CloudWatch to act upon what is happening regardless of the region of creation?
Should I create the CloudWatch Event in each region, as well as the lambda function associated with the remediation?
Or is there a way to capture the logs of all regions and deal with them in a singular space?
You should be able to get cross-region CloudTrail logs into a single bucket:
Receiving CloudTrail Log Files from Multiple Regions
You can configure CloudTrail to deliver log files from multiple regions to a single S3 bucket for a single account. For example, you have a trail in the US West (Oregon) Region that is configured to deliver log files to a S3 bucket, and a CloudWatch Logs log group. When you apply the trail to all regions, CloudTrail creates a new trail in all other regions. This trail has the original trail configuration. CloudTrail delivers log files to the same S3 bucket and CloudWatch Logs log group.
from: https://docs.aws.amazon.com/awscloudtrail/latest/userguide/receive-cloudtrail-log-files-from-multiple-regions.html
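A hedged boto3 sketch of creating such a multi-region trail (bucket name, log group ARN and role ARN are placeholders, and the S3 bucket policy and IAM role must already grant CloudTrail access):
import boto3

cloudtrail = boto3.client("cloudtrail")

cloudtrail.create_trail(
    Name="org-wide-trail",                  # hypothetical
    S3BucketName="my-cloudtrail-bucket",    # hypothetical, needs a CloudTrail bucket policy
    IsMultiRegionTrail=True,
    IncludeGlobalServiceEvents=True,
    # Delivery into CloudWatch Logs also needs a log group and a role CloudTrail can assume.
    CloudWatchLogsLogGroupArn=(
        "arn:aws:logs:eu-west-1:123456789012:log-group:CloudTrail/DefaultLogGroup:*"
    ),
    CloudWatchLogsRoleArn="arn:aws:iam::123456789012:role/CloudTrail_CloudWatchLogs_Role",
)
cloudtrail.start_logging(Name="org-wide-trail")
If the trail already exists, update_trail accepts the same arguments.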
I had a similar problem with CloudTrail going to CloudWatch Logs. I wanted to receive CloudTrail events for both eu-west-1 and global events for Route 53 (which seem to come from us-east-1) into a CloudWatch Logs stream, so I could add some further monitoring and alerting for our AWS account.
The documentation for this at https://docs.aws.amazon.com/awscloudtrail/latest/userguide/send-cloudtrail-events-to-cloudwatch-logs.html is quite good and easy to follow, and even mentions:
Note
A trail that applies to all regions sends log files from all regions to the CloudWatch Logs log group that you specify.
However, I could not get this to work. I also tried making the log delivery IAM policy more permissive - the default policy includes the region name in the stream name and I thought this might change for logs from other regions - but this didn't help. Ultimately I could not get anything from outside eu-west-1 to be delivered to CloudWatch Logs, even though events were correctly appearing in the S3 bucket.
I ended up working around this by creating a second, duplicate trail in us-east-1 and delivering logs for that region to CloudWatch Logs in that region as well.

How does Amazon CloudWatch batch logs when streaming to AWS Lambda?

The AWS documentation indicates that multiple log event records are provided to Lambda when streaming logs from CloudWatch.
logEvents
The actual log data, represented as an array of log event records. The "id" property is a unique identifier for every log event.
How does CloudWatch group these logs?
Time? Count? Randomly, from my perspective?
Currently you get one Lambda invocation for every PutLogEvents batch that CloudWatch Logs has received against that log group. However, you should probably not rely on that, because AWS could always change it (for example, by batching more events together).
You can observe this behavior by running the CWL -> Lambda example in the AWS docs.
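For reference, a minimal Python handler sketch showing how one such batch arrives (the subscription payload is base64-encoded and gzip-compressed, and logEvents holds however many records CloudWatch Logs chose to batch together):
import base64
import gzip
import json

def lambda_handler(event, context):
    # event["awslogs"]["data"] is the compressed batch from CloudWatch Logs.
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))

    print("log group:", payload["logGroup"], "stream:", payload["logStream"])
    print("events in this batch:", len(payload["logEvents"]))
    for record in payload["logEvents"]:
        # Each record has an id, a timestamp (ms since epoch) and the raw message.
        print(record["id"], record["timestamp"], record["message"])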
Some AWS services allow you to configure the log interval, such as Elastic Load Balancing, where there's a choice between five-minute and sixty-minute log intervals. You may not see a specific increment or parameter in the docs because it is configurable per service.
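As a hedged illustration of the Elastic Load Balancing case (Classic Load Balancer; the load balancer and bucket names are placeholders), the emit interval is set through the load balancer attributes:
import boto3

elb = boto3.client("elb")

elb.modify_load_balancer_attributes(
    LoadBalancerName="my-classic-elb",             # hypothetical
    LoadBalancerAttributes={
        "AccessLog": {
            "Enabled": True,
            "S3BucketName": "my-elb-access-logs",  # hypothetical bucket
            "EmitInterval": 60,                    # 5 or 60 minutes
        }
    },
)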

AWS CloudWatch monitoring for S3

Amazon CloudWatch provides some very useful metrics for monitoring my EC2s, load balancers, ElastiCache and RDS databases, etc., and allows me to set alarms for a whole range of criteria; but is there any way to configure it to monitor my S3 buckets as well? Or are there any other monitoring tools (besides simply enabling logging) that will help me monitor the number of POST/GET requests and data volumes for my S3 resources? And provide alarms for thresholds of activity or increased data storage?
AWS S3 is a managed storage service. The only metrics available in AWS CloudWatch for S3 are NumberOfObjects and BucketSizeBytes. In order to understand your S3 usage better you need to do some extra work.
I have recently written an AWS Lambda function to do exactly what you ask for and it's available here:
https://github.com/maginetv/s3logs-cloudwatch
It works by parsing S3 server access log files and aggregating/exporting metrics to AWS CloudWatch (CloudWatch allows you to publish custom metrics).
Example graphs that you will get in AWS CloudWatch after deploying this function on your AWS account are:
RestGetObject_RequestCount
RestPutObject_RequestCount
RestHeadObject_RequestCount
BatchDeleteObject_RequestCount
RestPostMultiObjectDelete_RequestCount
RestGetObject_HTTP_2XX_RequestCount
RestGetObject_HTTP_4XX_RequestCount
RestGetObject_HTTP_5XX_RequestCount
+ many others
Since metrics are exported to CloudWatch, you can easily set up alarms for them as well.
CloudFormation template is included in GitHub repo and you can deploy this function very quickly to gain visibility into your S3 bucket usage.
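For reference, publishing a custom metric of this kind boils down to a put_metric_data call; the namespace, metric name and dimension values below are placeholders rather than the exact ones the linked function uses:
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_data(
    Namespace="Custom/S3AccessLogs",   # hypothetical namespace
    MetricData=[
        {
            "MetricName": "RestGetObject_RequestCount",
            "Dimensions": [{"Name": "BucketName", "Value": "my-bucket"}],  # hypothetical
            "Value": 42,               # count aggregated from the parsed log files
            "Unit": "Count",
        }
    ],
)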
EDIT 2016-12-10:
In November 2016, AWS added extra S3 request metrics in CloudWatch that can be enabled when needed. These include metrics like AllRequests, GetRequests, PutRequests, DeleteRequests, HeadRequests, etc. See the Monitoring Metrics with Amazon CloudWatch documentation for more details about this feature.
I was also unable to find any way to do this with CloudWatch. This question from April 2012 was answered by Derek@AWS, saying that CloudWatch did not support S3 at the time. https://forums.aws.amazon.com/message.jspa?messageID=338089
The only thing I could think of would be to import the S3 access logs into a log service (like Splunk), then create a custom CloudWatch metric where you post the data that you parse from the logs. But then you have to filter out the polling of the access logs and…
And while you were at it, you could just create the alarms in Splunk instead.
If your use case is to simply alert when you are using it too much, you could set up an account billing alert for your S3 usage.
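A rough sketch of such a billing alarm (it assumes billing alerts are enabled on the account, uses us-east-1 where billing metrics live, and the threshold and SNS topic ARN are placeholders):
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="s3-spend-over-50-usd",              # hypothetical
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[
        {"Name": "Currency", "Value": "USD"},
        {"Name": "ServiceName", "Value": "AmazonS3"},
    ],
    Statistic="Maximum",
    Period=21600,                                  # evaluate over 6 hours
    EvaluationPeriods=1,
    Threshold=50.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],  # hypothetical
)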
I think this might depend on where you are looking to track the access from. That is, if you are trying to measure/watch usage of S3 objects from outside HTTP/HTTPS requests, then Anthony's suggestion of enabling S3 logging and then importing the logs into Splunk (or Redshift) for analysis might work. You can also watch the billing status of requests every day.
If you are trying to gauge usage from within your own applications, there are some AWS SDK CloudWatch metrics:
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/metrics/package-summary.html
and
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/metrics/S3ServiceMetric.html
S3 is a managed service, meaning that you don't need to take action based on system events in order to keep it up and running (as long as you can afford to pay for the service's usage). The spirit of CloudWatch is to help with monitoring services that require you to take action in order to keep them running.
For example, EC2 instances (which you manage yourself) typically need monitoring to alert when they're overloaded or when they're underused or else when they crash; at some point action needs to be taken in order to spin up new instances to scale out, spin down unused instances to scale back in, or reboot instances that have crashed. CloudWatch is meant to help you do the job of managing these resources more effectively.
To enable request and data-transfer metrics on your bucket, you can run the command below. Be aware that these are paid metrics.
aws s3api put-bucket-metrics-configuration \
--bucket YOUR-BUCKET-NAME \
--id EntireBucket \
--metrics-configuration Id=EntireBucket
This tutorial describes how to do it in the AWS Console with a point-and-click interface.