AWS SageMaker GroundTruth permissions issue (can't read manifest) - amazon-web-services

I'm trying to run a simple GroundTruth labeling job with a public workforce. I upload my images to S3, start creating the labeling job, have their tool generate the manifest automatically, and explicitly specify a role that most certainly has permissions on both S3 buckets (input and output) as well as full access to SageMaker. Then I create the job (the rest is standard -- I just wanted to be clear that I'm doing all of that).
At first, everything looks fine: all green lights, the status says it's in progress, and the images show up properly at the bottom where the dataset is displayed. However, after a few minutes, the status changes to Failure and the reason for failure reads: ClientError: Access Denied. Cannot access manifest file: arn:aws:sagemaker:us-east-1:<account number>:labeling-job/<job name> using roleArn: null.
I also get this error underneath (where there used to be images, there are now none):
The specified key <job name>/manifests/output/output.manifest isn't present in the S3 bucket <output bucket>.
I'm very confused for a couple of reasons. First, this is a super simple job: I'm just trying to do the most basic bounding box example I can think of, so this should be a very well-tested path. Second, I'm explicitly specifying a role ARN, so I have no idea why the error message says it's null. Is this an Amazon glitch, or could I be doing something wrong?

The role must include SageMakerFullAccess and access to the S3 bucket, so it looks like you've got that covered :)
Please check that:
the user creating the labeling job has Cognito permissions: https://docs.aws.amazon.com/sagemaker/latest/dg/sms-getting-started-step1.html
the manifest exists and is at the right S3 location.
the bucket is in the same region as SageMaker.
the bucket doesn't have any bucket policy restricting access.
If that still doesn't fix it, I'd recommend opening a support ticket with the labeling job id, etc.
Julien (AWS)
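For reference, a couple of those checks can be scripted with the AWS CLI. A minimal sketch, with placeholder bucket names and manifest path:
# Check that the manifest object exists at the S3 URI the job points to
# (replace the bucket and key with the values from your input data configuration).
aws s3 ls s3://my-input-bucket/path/to/dataset.manifest

# Check that the bucket is in the same region as the labeling job
# (an empty/None LocationConstraint means us-east-1).
aws s3api get-bucket-location --bucket my-input-bucket

# Check whether a bucket policy might be restricting access for the role.
aws s3api get-bucket-policy --bucket my-input-bucket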

There's a bug whereby the console will sometimes report something like 401 ValidationException: The specified key s3prefix/smgt-out/yourjobname/manifests/output/output.manifest isn't present in the S3 bucket yourbucket. Request ID: a08f656a-ee9a-4c9b-b412-eb609d8ce194, but that's not the actual problem -- for some reason the console is displaying the wrong error message. If you use the API (or the AWS CLI) to call DescribeLabelingJob, like
aws sagemaker describe-labeling-job --labeling-job-name yourjobname
you will see the actual problem. In my case, one of the S3 files that define the UI instructions was missing.
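For example, pulling just the failure reason via the CLI (same placeholder job name as above) is usually more telling than the console message:
# Print only the FailureReason field of the job description.
aws sagemaker describe-labeling-job \
  --labeling-job-name yourjobname \
  --query 'FailureReason' \
  --output text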

I had the same issue when I tried to write to a different bucket from the one that had been used successfully before.
Apparently the IAM role can be granted permissions for a particular bucket only.
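If that's the situation, one fix is to widen the execution role with an inline policy covering both buckets. A rough sketch, with placeholder role, policy, and bucket names:
# Allow the execution role to read the input bucket and write the output bucket.
aws iam put-role-policy \
  --role-name my-groundtruth-execution-role \
  --policy-name groundtruth-s3-access \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
        "Resource": ["arn:aws:s3:::my-input-bucket", "arn:aws:s3:::my-output-bucket"]
      },
      {
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": ["arn:aws:s3:::my-input-bucket/*", "arn:aws:s3:::my-output-bucket/*"]
      }
    ]
  }'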

I would suggest referring to the CloudWatch logs: look under CloudWatch >> CloudWatch Logs >> Log groups for the /aws/sagemaker/LabelingJobs group. I had ticked all the points from the other answer, but my pre-processing Lambda function had the wrong ID for my region, and the error was obvious in the logs.
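A quick way to pull errors out of that log group from the CLI (the region and filter pattern are just examples):
# Search the labeling job log group for entries containing ERROR.
aws logs filter-log-events \
  --log-group-name /aws/sagemaker/LabelingJobs \
  --filter-pattern "ERROR" \
  --region us-east-1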

Related

AWS Redshift UNLOAD completes successfully but no data in S3

I am trying to make use of RedShift's UNLOAD to copy some data into S3. My query looks like this:
unload ('select ________________')
to 's3://my-bucket-name'
iam_role 'arn #';
If I run the select statement on its own, I get 3 records returned. If I run the query as shown above, it completes, but I never see anything show up in my S3 bucket. Is there somewhere I can find more detailed logs?
Also, the IAM role I'm giving it is an admin role that has full access to everything; I was planning on getting it to work for a proof of concept and then making/using a role that is better suited. Can someone also explain the IAM role? Why does it not use the permissions of me, the user running the query?
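One quick sanity check, since UNLOAD writes its results as one or more part files under the prefix you give it, is to list the target recursively from the CLI (bucket name as in the question):
# List everything in the bucket to see whether UNLOAD wrote any part files
# (they typically look like <prefix>0000_part_00).
aws s3 ls s3://my-bucket-name/ --recursive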

Finding where my log files are being sent in AWS

In my company we store log files in CloudWatch, and after 7 days they get sent to S3; however, I have trouble finding exactly where the log files are being stored in S3.
Since the process of moving logs from CloudWatch to S3 is automated, I've followed https://medium.com/tensult/exporting-of-aws-cloudwatch-logs-to-s3-using-automation-2627b1d2ee37 in the hope of finding the path.
We are not using Step Functions, so I've checked the Lambda service; however, there was no function that moves log files from CloudWatch to S3.
I've tried looking at the CloudWatch rules in the hope of finding something like:
{
  "region": "REGION",
  "logGroupFilter": "prod",
  "s3BucketName": "BUCKET_NAME",
  "logFolderName": "backend"
}
so I can find which bucket the log files are going to and into which folder.
How can I find where my logs are stored? If moving the data is automated, why are there no functions visible?
Additional note: I am new to AWS; if there is a good resource on AWS architecture, please recommend it.
Thanks in advance!
If the rule exists or was created properly, then you should see it in the AWS console, and the same is true for the S3 bucket.
One common problem when it comes to visibility of an asset in the AWS console is wrong region selection. So verify in which region the rule and the S3 bucket were created (if they were ever created); selecting the right region in the top right corner should show the assets in that region.
Hope it helps!
Have you tried using "View all exports to Amazon S3" in the CloudWatch -> Logs console? It is one of the items in the Actions menu.
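If the export is driven by a scheduled CloudWatch Events/EventBridge rule calling the CreateExportTask API (as in the Medium post above), these CLI calls can help locate the destination bucket; the rule name and region are placeholders:
# List recent CloudWatch Logs export tasks; each entry shows the destination
# bucket and prefix.
aws logs describe-export-tasks --region us-east-1

# List the EventBridge rules and see what each one targets.
aws events list-rules --region us-east-1
aws events list-targets-by-rule --rule my-export-rule --region us-east-1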

Connecting DMS to S3

We are trying to get DMS set up with an S3 source; however, we are unable to connect the replication instance to the source S3 endpoint.
When we run a connection test on the source endpoint, the error we receive is:
Error Details: [errType=ERROR_RESPONSE, status=1020414, errMessage= Failed to connect to database., errDetails=]
We have followed the documentation; however, we are still unable to get the connection to work. The bucket is within the VPC that the replication instance has access to, and the IAM role has the GetObject, ListBucket and dms* permissions. I'm 95% sure that the JSON mapping file is set up correctly, with schema and table names pointing to the right place.
Due to the lack of error messages or detailed reasons why we can't connect to the source database (the S3 bucket/CSV file), debugging this feels a tad hit and miss. We are using the AWS console and not the CLI, if that makes much of a difference.
I had this same error.
Check this troubleshooting guide. It covers the basic configuration problems you might run into.
My answer wasn't there, though, and I couldn't find it anywhere, not even by asking in the official forums.
In my case, for some reason I thought I should use the full bucket ARN in the "Bucket Name" field, like "arn:aws:s3:::my-bucket" -- probably because I had to use the ARN for the role in the previous field.
And the error message when you try to connect will not be clear; it only says it couldn't connect to the bucket. Anyway, you don't need to provide an ARN, just the bucket's name, as in "my-bucket".
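To double-check what the source endpoint is actually configured with, and to re-run the connection test from the CLI, something like this can help (the ARNs are placeholders):
# Show the S3 settings (bucket name, folder, service access role ARN) on the
# source endpoints.
aws dms describe-endpoints --filters Name=endpoint-type,Values=source

# Re-test the connection between the replication instance and the endpoint,
# then check the result.
aws dms test-connection \
  --replication-instance-arn arn:aws:dms:us-east-1:123456789012:rep:EXAMPLE \
  --endpoint-arn arn:aws:dms:us-east-1:123456789012:endpoint:EXAMPLE
aws dms describe-connections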

BitMovin - Unable to connect Amazon S3 Output

I am setting up an Amazon S3 output on BitMovin and it is telling me my values are incorrect. I don't know which ones, because they have all been copied and pasted over. It may be another issue with my bucket.
I have set up a bucket in Oregon, so us-west-2, and copied and pasted the name, access key, and access secret in. My policies match what they have in this document too:
Tutorial: Policies for BitMovin
Your copy & paste went wrong, but just a bit :)
In your second statement, you have to remove the "/*" part from the string "arn:aws:s3:::test-bitmovin/*" within the "Resource" array.
The allowed actions of the second statement apply to the bucket but not to the objects within it. Therefore the stated resource should refer to the bucket.
Then it should work as expected!
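To make the bucket-level vs. object-level split concrete, here is a rough sketch of the corrected policy attached to the IAM user whose keys were pasted into BitMovin. The user and policy names are placeholders, and the action lists are illustrative -- follow the linked tutorial for the exact set:
# Object-level actions go on the /* resource; bucket-level actions go on the
# bare bucket ARN.
aws iam put-user-policy \
  --user-name bitmovin-output-user \
  --policy-name bitmovin-s3-output \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:PutObjectAcl"],
        "Resource": "arn:aws:s3:::test-bitmovin/*"
      },
      {
        "Effect": "Allow",
        "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
        "Resource": "arn:aws:s3:::test-bitmovin"
      }
    ]
  }'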

Receiving an EventArc trigger for a specific GCS bucket only

I'm trying to set up an EventArc trigger on a Google Cloud Run project, to be run whenever a new file is uploaded to a particular bucket.
The problem is, I can only get it to work if I choose 'any resource', i.e. files uploaded to any of my buckets would run the trigger. However, I only want it to run for files uploaded to a specific bucket.
It asks me to enter the 'full resource name' if I choose 'a specific resource', but there seems to be no documentation on how to format the bucket name so that it will work. I've tried the bucket name and projects/_/buckets/my_bucket_name, but the trigger never runs unless I choose 'any resource'.
Any ideas how to give it the bucket name so this will work?
I think the answer may be buried in here .... cloud.google.com/blog/topics/developers-practitioners/… If we read it closely, we see that event origination is based on audit records being created: a record is created when a new object is created in your bucket. We then read that we can filter on resource name (the name of the object). However, it says that wildcards are not yet supported ... so you could trigger on a specific object name, but not on a name that is merely prefixed by your bucket name.
Eventarc now supports wildcards (path patterns) for Cloud Audit Logs based events (e.g., storage.objects.create).
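With path patterns available, a trigger scoped to one bucket can be created roughly like this with gcloud (the service, region, bucket, and service account are placeholders, and the exact resourceName pattern is worth checking against your audit log entries):
# Create an Eventarc trigger that fires only for objects created in one
# specific bucket, using a Cloud Audit Logs event with a path pattern.
gcloud eventarc triggers create my-bucket-trigger \
  --location=us-central1 \
  --destination-run-service=my-cloud-run-service \
  --destination-run-region=us-central1 \
  --event-filters="type=google.cloud.audit.log.v1.written" \
  --event-filters="serviceName=storage.googleapis.com" \
  --event-filters="methodName=storage.objects.create" \
  --event-filters-path-pattern="resourceName=/projects/_/buckets/my_bucket_name/objects/*" \
  --service-account=my-trigger-sa@my-project.iam.gserviceaccount.com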