At my company we store log files in CloudWatch, and after 7 days they are sent to S3. However, I have trouble finding exactly where in S3 the log files are stored.
Since the process of moving logs from CloudWatch to S3 is automated, I followed https://medium.com/tensult/exporting-of-aws-cloudwatch-logs-to-s3-using-automation-2627b1d2ee37 hoping to find the path.
We are not using Step Functions, so I checked the Lambda service; however, there was no function that moves log files from CloudWatch to S3.
I also looked at the CloudWatch rules, hoping to find something like:
{
"region":"REGION",
"logGroupFilter":"prod",
"s3BucketName":"BUCKET_NAME",
"logFolderName":"backend"
}
so I could find which bucket, and which folder within it, the log files are going to.
How can I find where my logs are stored? And if moving the data is automated, why are no functions visible?
Additional note: I am new to AWS, so if there is a good resource on AWS architecture, please recommend it.
Thanks in advance!
If the rule and the S3 bucket exist and were created properly, you should see them in the AWS console.
One common reason an asset doesn't show up in the AWS console is that the wrong region is selected. Verify which region the rule and the S3 bucket were created in (if they were created at all); selecting that region in the top-right corner should then show the assets in that region.
Hope it helps!
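If the automation follows a pattern like the Medium article, a config such as the JSON in the question would likely be the constant input passed to a rule's target, so scanning every region's rules for it can reveal the bucket. Below is a minimal sketch, assuming boto3-style EventBridge (CloudWatch Events) clients created per region; it looks for targets whose input contains the `s3BucketName` key from the question's config. The function name is hypothetical, and pagination of targets is skipped for brevity.

```python
import json
from typing import Callable, Iterable, List, Tuple


def find_s3_export_targets(
    make_events_client: Callable[[str], object],
    regions: Iterable[str],
    bucket_key: str = "s3BucketName",
) -> List[Tuple[str, str, str, dict]]:
    """Return (region, rule name, target ARN, parsed input) for every rule
    target whose constant JSON input contains `bucket_key`."""
    found = []
    for region in regions:
        events = make_events_client(region)  # e.g. boto3.client("events", region_name=region)
        kwargs = {}
        while True:
            page = events.list_rules(**kwargs)
            for rule in page.get("Rules", []):
                targets = events.list_targets_by_rule(Rule=rule["Name"])
                for target in targets.get("Targets", []):
                    raw = target.get("Input")
                    if not raw:
                        continue  # no constant JSON input on this target
                    try:
                        payload = json.loads(raw)
                    except ValueError:
                        continue  # input is not valid JSON
                    if isinstance(payload, dict) and bucket_key in payload:
                        found.append((region, rule["Name"], target["Arn"], payload))
            token = page.get("NextToken")
            if not token:
                break
            kwargs = {"NextToken": token}
    return found
```

With boto3 you could call it as `find_s3_export_targets(lambda r: boto3.client("events", region_name=r), boto3.session.Session().get_available_regions("events"))`.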
Have you tried using View all exports to Amazon S3 in the CloudWatch -> Logs console? It is one of the items in the Actions menu.
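The same list is available through the API. A minimal sketch, assuming a boto3 CloudWatch Logs client (`boto3.client("logs", region_name=...)`) in the region where the log groups live: `describe_export_tasks` returns each task's destination bucket and prefix, paging with `nextToken`. The helper name is hypothetical.

```python
from typing import List, Tuple


def list_export_destinations(logs_client) -> List[Tuple[str, str, str, str]]:
    """Return (log group, bucket, prefix, status code) for every CloudWatch Logs
    export task visible to `logs_client`, following nextToken pagination."""
    results = []
    kwargs = {}
    while True:
        page = logs_client.describe_export_tasks(**kwargs)
        for task in page.get("exportTasks", []):
            results.append((
                task.get("logGroupName", ""),
                task.get("destination", ""),        # the S3 bucket name
                task.get("destinationPrefix", ""),  # the "folder" inside the bucket
                task.get("status", {}).get("code", ""),
            ))
        token = page.get("nextToken")
        if not token:
            break
        kwargs = {"nextToken": token}
    return results
```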
I am looking for a way to trigger a Jenkins job whenever the S3 bucket is updated with a file of a particular format.
I tried the Lambda function method, with "Add trigger -> S3 bucket PUT method", following this article, but it's not working. While exploring, I found that AWS SNS and AWS SQS can also be used for this, but some people say that approach is outdated. So what is the simplest way to trigger the Jenkins job when the S3 bucket is updated?
I just want a trigger whenever job A in jenkins1 uploads the zip file to the S3 bucket called 'testbucket' in AWS environment2. The two Jenkins instances are in different AWS accounts, each in a separate private VPC. I have attached my Jenkins workflow as a picture; please refer to it below.
The approach you are using seems solid and a good way to go. I'm not sure what specific issue you are having, so I'll list a couple of things that could explain why it might not be working for you:
Permissions issue - Check that the Lambda can be invoked by the S3 service. If you set this up in the console (manually), you probably don't need to worry about it, since the permissions should be set up automatically. If you're using infrastructure as code, it's something you need to add yourself.
Lambda VPC config - Your Lambda will need to run in your VPC, in a subnet that can reach the Jenkins instance (for example, the same subnet the Jenkins instance runs in). By default, a Lambda is not associated with a VPC and will not have access to the Jenkins instance (unless it's publicly available over the internet).
I found another Stack Overflow post that describes the SNS/SQS setup, if you want to continue down that path: Trigger Jenkins job when a S3 file is updated
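For the Lambda route, the handler itself can stay small. This is a sketch under several assumptions: the Lambda is attached to the VPC so it can reach Jenkins, the target job has "Trigger builds remotely" enabled with an authentication token, and `JENKINS_URL`, `JENKINS_JOB` and `JENKINS_TOKEN` are hypothetical environment variables you would set yourself. The bucket name `testbucket` and the `.zip` filter come from the question.

```python
import os
import urllib.parse
import urllib.request
from typing import List, Tuple


def zip_uploads(event: dict, bucket: str = "testbucket", suffix: str = ".zip") -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification that match
    the wanted bucket and file suffix."""
    matches = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        name = s3.get("bucket", {}).get("name")
        # Object keys arrive URL-encoded (spaces as '+') in S3 event notifications.
        key = urllib.parse.unquote_plus(s3.get("object", {}).get("key", ""))
        if name == bucket and key.endswith(suffix):
            matches.append((name, key))
    return matches


def jenkins_build_url(base_url: str, job: str, token: str) -> str:
    """Remote trigger URL for a job that has 'Trigger builds remotely' enabled."""
    return "{}/job/{}/build?token={}".format(
        base_url.rstrip("/"), urllib.parse.quote(job), urllib.parse.quote(token, safe="")
    )


def lambda_handler(event, context):
    uploads = zip_uploads(event)
    if not uploads:
        return {"triggered": 0}
    # Hypothetical environment variables configured on the Lambda function.
    url = jenkins_build_url(
        os.environ["JENKINS_URL"], os.environ["JENKINS_JOB"], os.environ["JENKINS_TOKEN"]
    )
    for _bucket, _key in uploads:
        # Depending on your Jenkins security settings, you may also need
        # user credentials (an API token) on this request.
        request = urllib.request.Request(url, method="POST")
        with urllib.request.urlopen(request, timeout=10) as response:
            response.read()
    return {"triggered": len(uploads)}
```

You could also let S3 do the suffix filtering by adding a `.zip` suffix filter to the event notification; the check in code then just acts as a second line of defence.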
I created an S3 bucket and populated it by uploading a few files. But I am not able to validate my use case of checking the bucket size, because the default total bucket size metric always returns no data.
Note: I created the bucket in the AWS Console UI with default settings.
I waited for more than a week and still see no data under the Metrics tab, so the S3 bucket is also not listed in CloudWatch for configuring alarms.
Has anyone faced a similar issue, and can you help with how to resolve it?
Attaching a screenshot of the Metrics tab of my S3 bucket for reference:
The S3 storage metrics become visible under the bucket's Metrics tab and in CloudWatch only several hours after the bucket is created, and the bucket itself shows up there at an unspecified time. Link for reference: https://docs.aws.amazon.com/AmazonS3/latest/userguide/cloudwatch-monitoring.html
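To check from code whether any datapoint has been published yet, you can query the metric directly. A minimal sketch, assuming a boto3 CloudWatch client created in the bucket's own region (S3 storage metrics are published in the bucket's region) and objects in the STANDARD storage class; `BucketSizeBytes` is a daily metric, so a one-day period is used. Helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def bucket_size_query(bucket: str, days: int = 3, now: Optional[datetime] = None) -> dict:
    """Arguments for get_metric_statistics on the daily BucketSizeBytes metric."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/S3",
        "MetricName": "BucketSizeBytes",
        "Dimensions": [
            {"Name": "BucketName", "Value": bucket},
            # Objects in the STANDARD storage class; other classes use other values.
            {"Name": "StorageType", "Value": "StandardStorage"},
        ],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": 86400,  # one datapoint per day
        "Statistics": ["Average"],
    }


def latest_bucket_size(cloudwatch_client, bucket: str, days: int = 3) -> Optional[float]:
    """Most recent BucketSizeBytes average, or None if nothing is published yet."""
    response = cloudwatch_client.get_metric_statistics(**bucket_size_query(bucket, days))
    points = sorted(response.get("Datapoints", []), key=lambda p: p["Timestamp"])
    return points[-1]["Average"] if points else None
```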
I'm trying to run a simple Ground Truth labeling job with a public workforce. I upload my images to S3, start creating the labeling job, generate the manifest automatically using their tool, and explicitly specify a role that most certainly has permissions on both S3 buckets (input and output) as well as full access to SageMaker. Then I create the job (with the standard rest of the settings; I just wanted to be clear that I'm doing all of that).
At first, everything looks fine: all green lights, the status says it's in progress, and the images show up properly at the bottom where the dataset is. However, after a few minutes, the status changes to Failure and the reason for failure says: ClientError: Access Denied. Cannot access manifest file: arn:aws:sagemaker:us-east-1:<account number>:labeling-job/<job name> using roleArn: null
I also get this error underneath (where the images used to be, but now there are none):
The specified key <job name>/manifests/output/output.manifest isn't present in the S3 bucket <output bucket>.
I'm very confused, for a couple of reasons. First, this is a super simple job: I'm just trying to do the most basic bounding box example I can think of, so this should be a very well-tested path. Second, I'm explicitly specifying a role ARN, so I have no idea why the error message says it's null. Is this an Amazon glitch, or could I be doing something wrong?
The role must include SageMakerFullAccess and access to the S3 bucket, so it looks like you've got that covered :)
Please check that:
the user creating the labeling job has Cognito permissions: https://docs.aws.amazon.com/sagemaker/latest/dg/sms-getting-started-step1.html
the manifest exists and is at the right S3 location.
the bucket is in the same region as SageMaker.
the bucket doesn't have any bucket policy restricting access.
If that still doesn't fix it, I'd recommend opening a support ticket with the labeling job ID, etc.
Julien (AWS)
There's a bug whereby the console sometimes shows something like 401 ValidationException: The specified key s3prefix/smgt-out/yourjobname/manifests/output/output.manifest isn't present in the S3 bucket yourbucket. Request ID: a08f656a-ee9a-4c9b-b412-eb609d8ce194 but that's not the actual problem; for some reason the console displays the wrong error message. If you use the API (or the AWS CLI) to call DescribeLabelingJob, like
aws sagemaker describe-labeling-job --labeling-job-name yourjobname
you will see the actual problem. In my case, one of the S3 files that defines the UI instructions was missing.
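The same check from Python, as a minimal sketch assuming a boto3 SageMaker client: it pulls the fields from `describe_labeling_job` that are most relevant to the errors above (status, failure reason, role, manifest location, output path, and the UI template location mentioned in this answer). The helper name is hypothetical.

```python
def labeling_job_diagnosis(sagemaker_client, job_name: str) -> dict:
    """Summarize the DescribeLabelingJob fields that usually explain a failure."""
    resp = sagemaker_client.describe_labeling_job(LabelingJobName=job_name)
    data_source = resp.get("InputConfig", {}).get("DataSource", {})
    ui_config = resp.get("HumanTaskConfig", {}).get("UiConfig", {})
    return {
        "status": resp.get("LabelingJobStatus"),
        "failure_reason": resp.get("FailureReason"),  # the real error text
        "role_arn": resp.get("RoleArn"),
        "manifest": data_source.get("S3DataSource", {}).get("ManifestS3Uri"),
        "output_path": resp.get("OutputConfig", {}).get("S3OutputPath"),
        "ui_template": ui_config.get("UiTemplateS3Uri"),  # must exist in S3
    }
```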
I had the same issue when I tried to write to a different bucket from the one that had been used successfully before.
Apparently, the permissions of an IAM role can be scoped to a particular bucket only, so make sure the role also grants access to the new bucket.
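If the role is scoped per bucket, every bucket the job reads from or writes to has to be listed. A minimal sketch that builds such a policy document in Python, with hypothetical bucket names; it only illustrates covering both the input and output buckets, and the full set of actions Ground Truth needs should be taken from the SageMaker documentation.

```python
import json
from typing import Iterable


def bucket_access_policy(buckets: Iterable[str]) -> dict:
    """IAM policy document that lists each bucket and reads/writes its objects."""
    names = list(buckets)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket applies to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{name}" for name in names],
            },
            {
                # Object-level actions apply to the objects inside each bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{name}/*" for name in names],
            },
        ],
    }


def policy_json(buckets: Iterable[str]) -> str:
    """The same policy serialized for pasting into the IAM console."""
    return json.dumps(bucket_access_policy(buckets), indent=2)


# Hypothetical bucket names:
# print(policy_json(["my-input-bucket", "my-output-bucket"]))
```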
I would suggest checking the CloudWatch logs: look under CloudWatch >> CloudWatch Logs >> Log groups >> the /aws/sagemaker/LabelingJobs group. I had already ticked all the points from another post, but my pre-processing Lambda function had the wrong ID for my region, and the error was obvious in the logs.
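A minimal sketch of pulling error lines from that log group with a boto3 CloudWatch Logs client, assuming the errors contain the word ERROR, Error or error (the `?term` filter pattern means "any of these terms"). The optional stream prefix is an assumption, since I don't know how the streams in that group are named for your job. The helper name is hypothetical.

```python
from typing import List, Optional


def labeling_job_log_errors(
    logs_client,
    stream_prefix: Optional[str] = None,
    pattern: str = "?ERROR ?Error ?error",
) -> List[str]:
    """Return messages from /aws/sagemaker/LabelingJobs that match `pattern`."""
    kwargs = {"logGroupName": "/aws/sagemaker/LabelingJobs", "filterPattern": pattern}
    if stream_prefix:
        kwargs["logStreamNamePrefix"] = stream_prefix
    messages = []
    while True:
        page = logs_client.filter_log_events(**kwargs)
        messages.extend(event["message"] for event in page.get("events", []))
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token  # fetch the next page
    return messages
```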
I have recently joined a company that uses S3 buckets for various projects within AWS. I want to identify, and potentially delete, S3 objects that are not being accessed (read or write), in order to reduce the S3 cost of my AWS account.
I read this, which helped me to some extent.
Is there a way to find out which objects are being accessed and which are not?
There is no native way of doing this at the moment, so all the options are workarounds depending on your use case.
You have a few options:
Tag each S3 object with its last access date (e.g. 2018-10-24). First, turn on object-level logging for your S3 bucket and set up CloudWatch Events for CloudTrail. The tag can then be updated by a Lambda function that runs on a CloudWatch Event fired on each Get event. Finally, create a function that runs on a scheduled CloudWatch Event and deletes all objects whose date tag is earlier than your cutoff date.
Query the CloudTrail logs: write a custom function to find the last access times in the object-level CloudTrail logs. This could be done with Athena, or with a direct query to S3.
Create a separate index, in something like DynamoDB, which your application updates on read activity.
Use a lifecycle policy on the S3 bucket / key prefix to archive or delete objects after x days. This is based on upload time rather than last access time, so you could copy an object onto itself to reset the timestamp and restart the clock.
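A minimal sketch of the Lambda in the first option, assuming it is invoked by an EventBridge (CloudWatch Events) rule matching "AWS API Call via CloudTrail" events for `GetObject`, with object-level (data event) logging enabled for the bucket. The tag key `last-accessed` is hypothetical. Note that `put_object_tagging` replaces the object's entire tag set, so merge in existing tags first if the objects already carry tags you need.

```python
TAG_KEY = "last-accessed"  # hypothetical tag name


def access_tag_request(event: dict):
    """Turn a CloudTrail GetObject event (delivered via EventBridge) into
    put_object_tagging arguments, or None for any other event."""
    detail = event.get("detail", {})
    if detail.get("eventName") != "GetObject":
        return None
    params = detail.get("requestParameters", {})
    day = detail.get("eventTime", "")[:10]  # "2018-10-24T08:15:00Z" -> "2018-10-24"
    return {
        "Bucket": params["bucketName"],
        "Key": params["key"],
        "Tagging": {"TagSet": [{"Key": TAG_KEY, "Value": day}]},
    }


def lambda_handler(event, context, s3_client=None):
    request = access_tag_request(event)
    if request is None:
        return {"tagged": False}
    if s3_client is None:
        import boto3  # available in the AWS Lambda Python runtime
        s3_client = boto3.client("s3")
    # Overwrites the object's whole tag set with the access-date tag.
    s3_client.put_object_tagging(**request)
    return {"tagged": True}


def is_stale(tag_value: str, cutoff: str) -> bool:
    """True if the tagged access date is before the cutoff; YYYY-MM-DD strings sort by date."""
    return tag_value < cutoff
```

The scheduled cleanup function would then compare each object's tag against a cutoff date with something like `is_stale` before deleting it.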
No objects in Amazon S3 are required by other AWS services, but you might have configured services to use the files.
For example, you might be serving content through Amazon CloudFront, providing templates for AWS CloudFormation or transcoding videos that are stored in Amazon S3.
If you didn't create the files and you aren't knowingly using them, you can probably delete them. But you are the only person who can know whether they are necessary.
There is a recent AWS blog post that I found very interesting, describing a cost-optimized approach to this problem.
Here is the description from AWS blog:
The S3 server access logs capture S3 object requests. These are generated and stored in the target S3 bucket.
An S3 inventory report is generated for the source bucket daily. It is written to the S3 inventory target bucket.
An Amazon EventBridge rule is configured that will initiate an AWS Lambda function once a day, or as desired.
The Lambda function initiates an S3 Batch Operations job to tag the objects in the source bucket that must be expired, using the following logic:
Capture the number of days (x) configuration from the S3 Lifecycle configuration.
Run an Amazon Athena query that will get the list of objects from the S3 inventory report and server access logs. Create a delta list with objects that were created earlier than 'x' days, but not accessed during that time.
Write a manifest file with the list of these objects to an S3 bucket.
Create an S3 Batch Operations job that tags all objects in the manifest file with the tag "delete=True".
The lifecycle rule on the source S3 bucket expires all objects that were created more than 'x' days ago and carry the "delete=True" tag applied by the S3 Batch Operations job.
Expiring Amazon S3 Objects Based on Last Accessed Date to Decrease Costs
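The delta-list step of that design can be sketched in plain Python, assuming you have already extracted (key, last-modified) pairs from the inventory report and (key, request time) pairs from the server access logs, for example with the Athena query the post describes. The CSV manifest uses `bucket,key` rows with URL-encoded keys, which is my understanding of the S3 Batch Operations CSV manifest format; verify against the Batch Operations docs. Function names are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple
from urllib.parse import quote


def stale_keys(
    inventory: Iterable[Tuple[str, datetime]],
    accesses: Iterable[Tuple[str, datetime]],
    days: int,
    now: datetime,
) -> List[str]:
    """Keys created more than `days` ago and not accessed within those days."""
    cutoff = now - timedelta(days=days)
    recently_accessed = {key for key, when in accesses if when >= cutoff}
    return sorted(
        key for key, created in inventory
        if created < cutoff and key not in recently_accessed
    )


def batch_manifest_csv(bucket: str, keys: Iterable[str]) -> str:
    """One 'bucket,url-encoded-key' row per object, no header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for key in keys:
        writer.writerow([bucket, quote(key)])  # '/' is left unencoded by default
    return out.getvalue()
```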
I'm trying to set up an Eventarc trigger on a Google Cloud Run project, to run whenever a new file is uploaded to a particular bucket.
The problem is that I can only get it to work if I choose 'any resource', i.e. files uploaded to any of my buckets would run the trigger. However, I only want it to run for files uploaded to a specific bucket.
If I choose 'a specific resource', it asks me to enter the 'full resource name', but there seems to be no documentation on how to format the bucket name so that it works. I've tried the bucket name and projects/_/buckets/my_bucket_name, but the trigger never runs unless I choose 'any resource'.
Any ideas how to give it the bucket name so that this works?
I think the answer may be buried in here .... cloud.google.com/blog/topics/developers-practitioners/… If we read it closely, we see that event origination is based on audit records being created, and that a record is created when a new object is created in your bucket. We then read that we can filter on the resource name (the name of the object). However, it says that wildcards are not yet supported, so you could trigger on a specific object name, but not on a name prefixed by your bucket name.
Eventarc now supports wildcards (path patterns) for Cloud Audit Log-based events (e.g. storage.objects.create).
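If the path-pattern filter still doesn't behave, a fallback is to accept events from any bucket and filter inside the Cloud Run service. A minimal sketch, assuming the request body is the Cloud Audit Log entry as JSON and that `protoPayload.resourceName` has the form `projects/_/buckets/BUCKET/objects/OBJECT` for `storage.objects.create`. That form has `/objects/...` after the bucket, which may be why a filter of just `projects/_/buckets/my_bucket_name` never matched. Helper names are hypothetical.

```python
from typing import Optional


def bucket_from_resource_name(resource_name: str) -> Optional[str]:
    """Extract BUCKET from 'projects/_/buckets/BUCKET/objects/OBJECT'."""
    parts = resource_name.strip("/").split("/")
    if len(parts) >= 4 and parts[0] == "projects" and parts[2] == "buckets":
        return parts[3]
    return None


def should_handle(audit_log_entry: dict, wanted_bucket: str) -> bool:
    """True if the audit log entry refers to an object in `wanted_bucket`."""
    resource = audit_log_entry.get("protoPayload", {}).get("resourceName", "")
    return bucket_from_resource_name(resource) == wanted_bucket
```

In the Cloud Run request handler you would parse the JSON body and return early (still with a 2xx status, so the event is not retried) when `should_handle` is false.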