Amazon Transcribe and Golang SDK BadRequestException - amazon-web-services

I uploaded a .flac file to an Amazon S3 bucket, but when I try to transcribe the audio using the Amazon Transcribe Golang SDK I get the error below. I tried making the .flac file in the S3 bucket public but still get the same error, so I don't think it's a permission issue. Is there anything I'm missing that prevents the Transcribe service from accessing the file in the S3 bucket? The API user that is uploading and transcribing has full access to the S3 and Transcribe services.
example Go code:
jobInput := transcribe.StartTranscriptionJobInput{
    JobExecutionSettings: &transcribe.JobExecutionSettings{
        AllowDeferredExecution: aws.Bool(true),
        DataAccessRoleArn:      aws.String("my-arn"),
    },
    LanguageCode: aws.String("en-US"),
    Media: &transcribe.Media{
        MediaFileUri: aws.String("https://s3.us-east-1.amazonaws.com/{MyBucket}/{MyObjectKey}"),
    },
    Settings: &transcribe.Settings{
        MaxAlternatives:   aws.Int64(2),
        MaxSpeakerLabels:  aws.Int64(2),
        ShowAlternatives:  aws.Bool(true),
        ShowSpeakerLabels: aws.Bool(true),
    },
    TranscriptionJobName: aws.String("jobName"),
}
Amazon Transcribe response:
BadRequestException: The S3 URI that you provided can't be accessed. Make sure that you have read permission and try your request again.

My issue was that the S3 upload code was specifying an ACL on the audio file. After I removed the ACL from the upload code, the error went away. Also, per the docs, if your S3 bucket name contains "transcribe", the Transcribe service has permission to access it. I made that change as well, but you still need to make sure you aren't setting an ACL on the object.

Related

Not able to retrieve processed file from S3 Bucket

I'm an AWS newbie trying to use the Textract API, their OCR service.
As far as I understand, I need to upload files to an S3 bucket and then run Textract on them.
I have the bucket set up with the file inside it, and I have the permissions in place.
But when I run my code, it fails.
import boto3
import trp

# Document
s3BucketName = "textract-console-us-east-1-057eddde-3f44-45c5-9208-fec27f9f6420"
documentName = "ok0001_prioridade01_x45f3.pdf"

# Amazon Textract client
textract = boto3.client('textract',
                        region_name="us-east-1",
                        aws_access_key_id="xxxxxx",
                        aws_secret_access_key="xxxxxxxxx")
# Call Amazon Textract
response = textract.analyze_document(
    Document={
        'S3Object': {
            'Bucket': s3BucketName,
            'Name': documentName
        }
    },
    FeatureTypes=["TABLES"])
Here is the error I get:
botocore.errorfactory.InvalidS3ObjectException: An error occurred (InvalidS3ObjectException) when calling the AnalyzeDocument operation: Unable to get object metadata from S3. Check object key, region and/or access permissions.
What am I missing? How could I solve that?
You are missing an S3 access policy. You should add the AmazonS3ReadOnlyAccess policy if you want a quick solution.
A good practice is to apply the least-privilege principle and grant access only as needed. So I'd advise you to create a specific policy that grants access only to your S3 bucket textract-console-us-east-1-057eddde-3f44-45c5-9208-fec27f9f6420, and only in the us-east-1 region.
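A least-privilege policy along those lines might look like the sketch below, built as a Python dict so it can be serialized and attached to the user or role (the exact action list is an assumption; adjust it to what your code actually calls):

```python
import json

bucket = "textract-console-us-east-1-057eddde-3f44-45c5-9208-fec27f9f6420"

# Read-only policy scoped to the single bucket. Note that S3 bucket ARNs do
# not carry a region, so the region restriction lives in where you use the
# credentials, not in the Resource ARN itself.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            f"arn:aws:s3:::{bucket}",
            f"arn:aws:s3:::{bucket}/*",
        ],
    }],
}
policy_json = json.dumps(policy, indent=2)
```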
Amazon Textract currently supports PNG, JPEG, and PDF formats, and it looks like you are using PDF.
Once you have a valid format, you can use the Python S3 API to read the object's data from S3 and pass the byte array to the analyze_document method. For a full example of how to use the AWS SDK for Python (Boto3) with Amazon Textract to detect text, form, and table elements in document images, see:
https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/python/example_code/textract/textract_wrapper.py
Try following that code example to see if your issue is resolved.
"Could you provide some clearance on the params to use"
I just ran the Java V2 example and it works perfecly. In this example, i am using a PNG file located in a specific Amazon S3 bucket.
Here are the parameters that you need:
Make sure when implementing this in Python, you set the same parameters.

How can I secure my AWS S3 bucket to be write only

I have an S3 bucket where I will upload PDF files from my front end. The front end will have a 'careers' page where anybody can apply for a position and upload their CV. I am using the AWS SDK for that in my Node.js API. But the CVs in the S3 bucket need to be private, of course. The problem is that the SDK upload code only works if the S3 bucket is made public.
For file upload I am using multer.
const upload = multer({
  fileFilter: fileFilter,
  storage: multerS3({
    acl: 'public-read',
    s3,
    bucket: 'bucket_name',
    key: function(req, file, cb) {
      req.file = file.originalname;
      cb(null, file.originalname);
    }
  })
});
How can I make the uploaded CVs secure in my S3 bucket, so everybody can upload files to it, but they stay private and no one gets access to read them?
Use the SDK to generate a pre-signed URL. Here is the JS example for this task.
https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/javascriptv3/example_code/s3/src/s3_put_presignedURL.ts

how to download file from public s3 bucket using console

I want to download a file from a public S3 bucket using the AWS console. When I put the URI below into the browser, I get an error. I also want to visually see what else is in that folder and explore it.
Public S3 bucket :
s3://us-east-1.elasticmapreduce.samples/flightdata/input
It appears that you are wanting to access an Amazon S3 bucket that belongs to a different AWS account. This cannot be done via the Amazon S3 management console.
Instead, I recommend using the AWS Command-Line Interface (CLI). You can use:
aws s3 ls s3://us-east-1.elasticmapreduce.samples/flightdata/input/
That will show you the objects stored in that bucket/path.
You could then download the objects with:
aws s3 sync s3://us-east-1.elasticmapreduce.samples/flightdata/input/ input

AWS SageMaker ClientError: Data download failed: PermanentRedirect (301)

ClientError: Data download failed:PermanentRedirect (301): The bucket is in this region: us-west-1. Please use this region to retry the request
Found it on my own.
The S3 bucket I was using is in a different region. I used a different S3 bucket, and everything works well now.

Amazon S3 upload works with one credentials but not other

I have two S3 buckets under two different accounts. Permissions and CORS setting of both buckets are same.
Regions of two buckets are as following (First one working)
Region: Asia Pacific (Singapore) (ap-southeast-1) works
Region: US East (Ohio) (us-east-2) does not work
I created an upload script with Node.js and supplied the region plus the following
Key : __XXXX__
secret: __XXXXX____,
bucket: _____XXXX__
'x-amz-acl': 'public-read',
ACL: 'public-read'
The code works fine with the first one, and uploaded files are also publicly accessible. But with the second account (region us-east-2), the script runs successfully and returns a URL too, but when I look in the bucket there is no upload, and the URL gives permission denied, which means the resource is not available. The strange things are:
Why is a URL returned if the file is not uploaded to the bucket?
Why does the same code not work for the other account?
I tried the AWS documentation too, but it seems like it's not written for a human like me. Help will be highly appreciated.
script runs successfully and return URL also, but when I look in bucket there is no upload
If you really see no objects in the bucket, then either the upload really failed (I've seen too many scripts that just ignore any error response) or the upload went to a different place than you expect. Care to share the script?
and url is saying permission denied which means resource is not available
Unfortunately, that's something you have to find out yourself. If the object is in the bucket, has public access, and the CORS settings are correct, then maybe the URL is not correct.