Receiving an EventArc trigger for a specific GCS bucket only - google-cloud-platform

I'm trying to set up an EventArc trigger on a Google Cloud Run project, to be run whenever a new file is uploaded to a particular bucket.
The problem is, I can only get it to work if I choose 'any resource', i.e. files uploaded to any of my buckets will run the trigger. However, I only want it to run for files uploaded to a specific bucket.
It asks me to enter the 'full resource name' if I choose 'a specific resource'. But there seems to be no documentation on how to format the bucket name so that it will work. I've tried the bucket name, projects/_/buckets/my_bucket_name, but the trigger never runs unless I choose 'any resource'.
Any ideas how to give it the bucket name so this will work?

I think the answer may be buried in here: cloud.google.com/blog/topics/developers-practitioners/… If we read it closely, we see that event origination is based on audit records being created, and that a record is created when a new object is created in your bucket. We then read that we can filter on resource name (the name of the object). However, it says that wildcards are not yet supported, so you could trigger on a specific object name, but not on a name which is merely prefixed by your bucket name.
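For reference, the resourceName recorded in the audit log entry for an uploaded object looks roughly like the line below (bucket and object names here are made up), which is why an exact-match filter only ever fires for that single object:
projects/_/buckets/my_bucket_name/objects/some-file.txt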

Eventarc now supports wildcards (path patterns) for Cloud Audit Log-based events (e.g., storage.objects.create).
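As a minimal sketch of what that looks like with the gcloud CLI (the service, region, bucket, and service account names are placeholders, and the flag/pattern syntax is as I understand it from the Eventarc docs):
gcloud eventarc triggers create new-object-trigger \
    --location=us-central1 \
    --destination-run-service=my-run-service \
    --destination-run-region=us-central1 \
    --event-filters="type=google.cloud.audit.log.v1.written" \
    --event-filters="serviceName=storage.googleapis.com" \
    --event-filters="methodName=storage.objects.create" \
    --event-filters-path-pattern="resourceName=/projects/_/buckets/my_bucket_name/objects/*" \
    --service-account=my-sa@my-project.iam.gserviceaccount.com
The path pattern is what scopes the trigger to objects created in my_bucket_name only.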

Related

Finding where my log files are being sent to in AWS

In my company we store log files in CloudWatch, and after 7 days they get sent to S3. However, I'm having trouble finding exactly where the log files are being stored in S3.
Since the process of moving from CloudWatch to S3 is automated, I followed https://medium.com/tensult/exporting-of-aws-cloudwatch-logs-to-s3-using-automation-2627b1d2ee37 in the hope of finding the path.
We are not using Step Functions, so I checked the Lambda service; however, there was no function that moves log files from CloudWatch to S3.
I've tried looking at CloudWatch rules in the hope of finding something like:
{
  "region": "REGION",
  "logGroupFilter": "prod",
  "s3BucketName": "BUCKET_NAME",
  "logFolderName": "backend"
}
so that I can find which bucket the log files are going to, and into which folder.
How can I find where my logs are stored? If moving the data is automated, why are there no functions visible?
Additional note: I am new to AWS; if there is a good resource on AWS architecture, please recommend it.
Thanks in advance!
If the rule exists or was created properly, then you should see it in the AWS console, and the same is true for the S3 bucket.
One common problem when it comes to the visibility of an asset in the AWS console is selecting the wrong region. Verify in which region the rule and the S3 bucket were created (if they were ever created); selecting the right region in the top right corner should show the assets in that region.
Hope it helps!
Have you tried using 'View all exports to Amazon S3' in the CloudWatch -> Logs console? It is one of the items in the Actions menu.
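If you prefer the CLI, a couple of read-only commands can also help narrow this down (the region and rule name below are placeholders; this assumes the export is driven by CloudWatch Logs export tasks and/or a CloudWatch Events/EventBridge rule):
aws logs describe-export-tasks --region us-east-1
aws events list-rules --region us-east-1
aws events list-targets-by-rule --rule NAME_OF_SUSPECTED_RULE --region us-east-1
The export tasks and rule targets include the destination S3 bucket (and prefix), which should tell you where the logs end up.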

AWS SageMaker GroundTruth permissions issue (can't read manifest)

I'm trying to run a simple GroundTruth labeling job with a public workforce. I upload my images to S3, start creating the labeling job, generate the manifest automatically using their tool, and explicitly specify a role that most certainly has permissions on both S3 buckets (input and output) as well as full access to SageMaker. Then I create the job (standard rest of stuff -- I just wanted to be clear that I'm doing all of that).
At first, everything looks fine. All green lights, it says it's in progress, and the images are properly showing up in the bottom where the dataset is. However, after a few minutes, the status changes to Failure and I get this: ClientError: Access Denied. Cannot access manifest file: arn:aws:sagemaker:us-east-1:<account number>:labeling-job/<job name> using roleArn: null in the reason for failure.
I also get the error underneath (where there used to be images but now there are none):
The specified key <job name>/manifests/output/output.manifest isn't present in the S3 bucket <output bucket>.
I'm very confused for a couple of reasons. First of all, this is a super simple job. I'm just trying to do the most basic bounding box example I can think of, so this should be a very well-tested path. Second, I'm explicitly specifying a role ARN, so I have no idea why it's saying it's null in the error message. Is this an Amazon glitch, or could I be doing something wrong?
The role must include SageMakerFullAccess and access to the S3 bucket, so it looks like you've got that covered :)
Please check that:
the user creating the labeling job has Cognito permissions: https://docs.aws.amazon.com/sagemaker/latest/dg/sms-getting-started-step1.html
the manifest exists and is at the right S3 location.
the bucket is in the same region as SageMaker.
the bucket doesn't have any bucket policy restricting access.
If that still doesn't fix it, I'd recommend opening a support ticket with the labeling job id, etc.
Julien (AWS)
There's a bug whereby sometimes the console will say something like 401 ValidationException: The specified key s3prefix/smgt-out/yourjobname/manifests/output/output.manifest isn't present in the S3 bucket yourbucket. Request ID: a08f656a-ee9a-4c9b-b412-eb609d8ce194 but that's not the actual problem. For some reason the console is displaying the wrong error message. If you use the API (or AWS CLI) to DescribeLabelingJob like
aws sagemaker describe-labeling-job --labeling-job-name yourjobname
you will see the actual problem. In my case, one of the S3 files that define the UI instructions was missing.
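For example, to pull out just the failure reason (assuming the standard AWS CLI --query/--output options; the job name is a placeholder):
aws sagemaker describe-labeling-job --labeling-job-name yourjobname --query FailureReason --output text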
I had the same issue when I tried to write to a different bucket from the one that had been used successfully before.
Apparently the IAM role can be granted permissions for a particular bucket only.
I would suggest referring to CloudWatch logs and looking at the CloudWatch >> CloudWatch Logs >> Log groups >> /aws/sagemaker/LabelingJobs group. I had ticked all the points from the other answer, but my pre-processing Lambda function had the wrong ID for my region, and the error was obvious in the logs.
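The same logs can also be pulled from the CLI if that's easier (the log group name is taken from the console path above; --limit is optional):
aws logs filter-log-events --log-group-name /aws/sagemaker/LabelingJobs --limit 50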

Identifying and deleting S3 Objects that are not being accessed?

I have recently joined a company that uses S3 Buckets for various different projects within AWS. I want to identify and potentially delete S3 Objects that are not being accessed (read and write), in an effort to reduce the cost of S3 in my AWS account.
I read this, which helped me to some extent.
Is there a way to find out which objects are being accessed and which are not?
There is no native way of doing this at the moment, so all the options are workarounds, depending on your use case.
You have a few options:
Tag each S3 object with a date tag (e.g. 2018-10-24). First, turn on object-level logging for your S3 bucket and set up CloudWatch Events for CloudTrail. The tag could then be updated by a Lambda function which runs on a CloudWatch Event that is fired on a Get event. Then create a function that runs on a scheduled CloudWatch Event to delete all objects with a date tag prior to today.
Query CloudTrail logs: write a custom function to query the last access times from object-level CloudTrail logs. This could be done with Athena, or with a direct query against S3.
Create a Separate Index, in something like DynamoDB, which you update in your application on read activities.
Use a Lifecycle Policy on the S3 Bucket / key prefix to archive or delete the objects after x days. This is based on upload time rather than last access time, so you could copy the object to itself to reset the timestamp and start the clock again.
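As a rough sketch of that last option (the bucket name, prefix, and day count are placeholders, not a recommendation):
aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration '{
    "Rules": [
        {
            "ID": "expire-old-objects",
            "Filter": { "Prefix": "logs/" },
            "Status": "Enabled",
            "Expiration": { "Days": 90 }
        }
    ]
}'
Again, this expires objects 90 days after they were uploaded, not 90 days after they were last read.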
No objects in Amazon S3 are required by other AWS services, but you might have configured services to use the files.
For example, you might be serving content through Amazon CloudFront, providing templates for AWS CloudFormation or transcoding videos that are stored in Amazon S3.
If you didn't create the files and you aren't knowingly using the files, you can probably delete them. But you would be the only person who would know whether they are necessary.
There is a recent AWS blog post which I found very interesting; it describes a cost-optimized approach to solving this problem.
Here is the description from the AWS blog:
The S3 server access logs capture S3 object requests. These are generated and stored in the target S3 bucket.
An S3 inventory report is generated for the source bucket daily. It is written to the S3 inventory target bucket.
An Amazon EventBridge rule is configured that will initiate an AWS Lambda function once a day, or as desired.
The Lambda function initiates an S3 Batch Operations job to tag objects in the source bucket that must be expired, using the following logic:
Capture the number of days (x) from the S3 Lifecycle configuration.
Run an Amazon Athena query that will get the list of objects from the S3 inventory report and server access logs. Create a delta list with objects that were created earlier than 'x' days, but not accessed during that time.
Write a manifest file with the list of these objects to an S3 bucket.
Create an S3 Batch operation job that will tag all objects in the manifest file with a tag of "delete=True".
The Lifecycle rule on the source S3 bucket will expire all objects that were created prior to 'x' days. They will have the tag given via the S3 batch operation of "delete=True".
Expiring Amazon S3 Objects Based on Last Accessed Date to Decrease Costs
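A rough sketch of the tag-based expiration rule described above might look like this (the bucket name, tag key/value, and 'x' = 30 days are placeholders; the exact tag is whatever your batch job applies):
aws s3api put-bucket-lifecycle-configuration --bucket source-bucket --lifecycle-configuration '{
    "Rules": [
        {
            "ID": "expire-tagged-delete",
            "Filter": { "Tag": { "Key": "delete", "Value": "True" } },
            "Status": "Enabled",
            "Expiration": { "Days": 30 }
        }
    ]
}'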

Domain bucket name taken Google Cloud Platform

I have a project on Google Cloud and I am trying to create a bucket to store my web files for my website. The only problem is I have a CNAME going from my website to 'c.storage.googleapis.com' so my bucket name has to be the same as my website name which is 'plains.cc'. When I try to create the bucket however, it says the name is already in use. I used this bucket name on a previous account but deleted it so I don't understand why I can't reuse it.
Are you still unable to create it? As per the docs, if you have deleted the bucket from your previous project, then I guess this is a timing issue. But if you deleted the previous project directly, without deleting the bucket contained within it, it could take a month or more for the associated data to eventually be deleted. Read the documentation on this here.
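If you want to check whether the name is still held somewhere, a quick read-only probe with gsutil can help; the exact error text may differ, so treat the comments as approximate:
gsutil ls -b gs://plains.cc
# "AccessDeniedException: 403" suggests the bucket still exists under a project you can't see;
# "BucketNotFoundException: 404" means the name has been released and can be claimed again.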

Setting Google Cloud Platform Log sink to a specific folder within a bucket

I have created five separate log export sinks within Google Cloud Stackdriver. Currently they are all set to the same bucket (my-bucket) with the destination:
storage.googleapis.com/my-bucket
The bucket (my-bucket) has the following 5 folders:
iam, compute, firewall, project and storage
I would like to associate each log sink with one of those folders within my bucket. Is this possible? The answer to a related question (Pointing multiple projects' log sinks to one bucket) seems to indicate that it is; however, I do not understand the "FOLDER_ID", whether it is what I need, and if so, where to get it.
I have tried to update the destination manually in google cloud shell with the command
gcloud logging sinks update my-compute-log-sink storage.googleapis.com/my-bucket/compute
and I get confirmation that the sink has been updated; however, running gcloud logging sinks list shows no change, and now I am stuck.
This isn't currently possible.
The destination you used when attempting to update the sink (including the directory) still only points to the bucket as a whole, despite the additional path after it, so when you see the 'updated' message, you are probably just overwriting the current sink with the same information as before.
I've opened a feature request to see if it will be possible to implement the usage of specified bucket directory locations in Google Cloud Platform logging sinks. You can follow it here:
https://issuetracker.google.com/72036905
There is also a separate feature request, which relates to the task you are trying to achieve, that you can follow here:
https://issuetracker.google.com/69371200
In terms of the FOLDER_ID value, that's not related to the folder in the bucket, but relates to the Cloud Platform Resource Hierarchy as explained here. It's therefore not something related to your main issue.
The logging sink destination (for cloud storage) must be a bucket.
The folder referred to in the answer to Pointing multiple projects' log sinks to one bucket is for grouping projects. That answer links to the documentation for folders, which describes them as:
Folders are nodes in the Cloud Platform Resource Hierarchy. A folder can contain projects, other folders, or a combination of both.
Cloud Storage does not support something like 'folders' within buckets, only buckets and objects. See https://cloud.google.com/storage/docs/key-terms
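In other words, the most specific destination the sink will accept is the bucket itself; reverting the update from the question to something like the following (sink and bucket names taken from the question) is as far as it goes:
gcloud logging sinks update my-compute-log-sink storage.googleapis.com/my-bucket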