AWS Textract InvalidParameterException - amazon-web-services

I have a .NET Core client application that uses Amazon Textract with S3, SNS, and SQS, following the AWS guide Detecting and Analyzing Text in Multipage Documents (https://docs.aws.amazon.com/textract/latest/dg/async.html).
I created an AWS role with the AmazonTextractServiceRole policy and added the following trust relationship, as per the documentation (https://docs.aws.amazon.com/textract/latest/dg/api-async-roles.html):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "textract.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
I subscribed the SQS queue to the SNS topic and gave the Amazon SNS topic permission to send messages to the Amazon SQS queue, as per the AWS documentation.
All resources, including the S3 bucket, SNS topic, and SQS queue, are in the same region (us-west-2).
The following method fails with a generic InvalidParameterException:
Request has invalid parameters
However, if the NotificationChannel section is commented out, the code works fine and returns the correct job ID.
The error message gives no clear indication of which parameter is invalid. Any help is highly appreciated.
public async Task<string> ScanDocument()
{
    string roleArn = "aws:iam::xxxxxxxxxxxx:instance-profile/MyTextractRole";
    string topicArn = "aws:sns:us-west-2:xxxxxxxxxxxx:AmazonTextract-My-Topic";
    string bucketName = "mybucket";
    string filename = "mytestdoc.pdf";

    var request = new StartDocumentAnalysisRequest();

    var notificationChannel = new NotificationChannel();
    notificationChannel.RoleArn = roleArn;
    notificationChannel.SNSTopicArn = topicArn;

    var s3Object = new S3Object
    {
        Bucket = bucketName,
        Name = filename
    };
    request.DocumentLocation = new DocumentLocation
    {
        S3Object = s3Object
    };
    request.FeatureTypes = new List<string>() { "TABLES", "FORMS" };
    request.NotificationChannel = notificationChannel; /* Commenting out this line makes the code work */

    var response = await this._textractService.StartDocumentAnalysisAsync(request);
    return response.JobId;
}

Debugging Invalid AWS Requests
The AWS SDK validates your request object locally, before dispatching it to the AWS servers. This validation fails with unhelpfully opaque errors, like the one the OP received.
Since the SDK is open source, you can inspect the source to help narrow down the invalid parameter.
Before we look at the code: the SDK (and its documentation) are actually generated from special JSON files that describe the API, its requirements, and how to validate them.
I'm going to use the Node.js SDK as an example, but similar approaches should work for the other SDKs, including .NET.
In our case (AWS Textract), the latest API version is 2018-06-27, and sure enough, the corresponding JSON source file is in the SDK's GitHub repository.
In my case, experimentation narrowed the issue down to the ClientRequestToken. The error was an opaque InvalidParameterException. I searched for it in the SDK's source JSON file and, sure enough, found it on line 392:
"ClientRequestToken": {
"type": "string",
"max": 64,
"min": 1,
"pattern": "^[a-zA-Z0-9-_]+$"
},
A whole bunch of undocumented requirements!
In my case the token I was using violated the regex (pattern in the above source code). Changing my token code to satisfy the regex solved the problem.
I recommend this approach for these sorts of opaque type errors.
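If you are working in Python, you can run the same kind of check without leaving your environment: botocore ships the service-model JSON that the SDK is generated from. As a rough sketch (assuming the standard botocore loader API and the Textract service-2 model layout), you can print a parameter's constraints directly:

import botocore.loaders

# Load the bundled service definition that the SDK is generated from.
loader = botocore.loaders.create_loader()
model = loader.load_service_model("textract", "service-2")

# Look up the shape behind the ClientRequestToken parameter; it should show
# the min/max length and the allowed character pattern.
print(model["shapes"]["ClientRequestToken"])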

After a long day analyzing the issue, I was able to resolve it. As per the documentation, the topic only requires the SendMessage action on the SQS queue, but after changing it to allow all SQS actions it started working. Still, the AWS error message is really misleading and confusing.
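For reference, here is a rough boto3 sketch of setting the queue policy (the queue URL and ARNs are placeholders). The documented minimum is sqs:SendMessage restricted to the topic; "sqs:*" is the broader permission that ended up working in the answer above:

import json
import boto3

sqs = boto3.client("sqs", region_name="us-west-2")

queue_url = "https://sqs.us-west-2.amazonaws.com/123456789012/MyTextractQueue"  # placeholder
queue_arn = "arn:aws:sqs:us-west-2:123456789012:MyTextractQueue"                # placeholder
topic_arn = "arn:aws:sns:us-west-2:123456789012:AmazonTextract-My-Topic"        # placeholder

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sns.amazonaws.com"},
        "Action": "sqs:SendMessage",  # or "sqs:*", per the workaround above
        "Resource": queue_arn,
        "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
    }],
}

sqs.set_queue_attributes(QueueUrl=queue_url, Attributes={"Policy": json.dumps(policy)})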

You would need to change the permissions to allow all SQS actions and then use code like the following:
import time
import boto3

textract = boto3.client('textract')

def startJob(s3BucketName, objectName):
    response = textract.start_document_text_detection(
        DocumentLocation={
            'S3Object': {
                'Bucket': s3BucketName,
                'Name': objectName
            }
        })
    return response["JobId"]

def isJobComplete(jobId):
    # For production use cases, use SNS based notification
    # Details at: https://docs.aws.amazon.com/textract/latest/dg/api-async.html
    time.sleep(5)
    response = textract.get_document_text_detection(JobId=jobId)
    status = response["JobStatus"]
    print("Job status: {}".format(status))
    while status == "IN_PROGRESS":
        time.sleep(5)
        response = textract.get_document_text_detection(JobId=jobId)
        status = response["JobStatus"]
        print("Job status: {}".format(status))
    return status

def getJobResults(jobId):
    pages = []
    response = textract.get_document_text_detection(JobId=jobId)
    pages.append(response)
    print("Resultset page received: {}".format(len(pages)))
    nextToken = None
    if 'NextToken' in response:
        nextToken = response['NextToken']
    while nextToken:
        response = textract.get_document_text_detection(JobId=jobId, NextToken=nextToken)
        pages.append(response)
        print("Resultset page received: {}".format(len(pages)))
        nextToken = None
        if 'NextToken' in response:
            nextToken = response['NextToken']
    return pages
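A minimal way to exercise these helpers (the bucket and document names are placeholders):

jobId = startJob("mybucket", "mytestdoc.pdf")
print("Started job with id: {}".format(jobId))
if isJobComplete(jobId) == "SUCCEEDED":
    pages = getJobResults(jobId)
    for page in pages:
        for block in page["Blocks"]:
            if block["BlockType"] == "LINE":
                print(block["Text"])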

Invoking Textract with Python, I received the same error until I truncated the ClientRequestToken down to 64 characters:
response = client.start_document_text_detection(
    DocumentLocation={
        'S3Object': {
            'Bucket': bucket,
            'Name': fileName
        }
    },
    ClientRequestToken=fileName[:64],
    NotificationChannel={
        "SNSTopicArn": "arn:aws:sns:us-east-1:AccountID:AmazonTextractXYZ",
        "RoleArn": "arn:aws:iam::AccountId:role/TextractRole"
    }
)
print('Processing started : %s' % json.dumps(response))
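Note that truncation only addresses the length limit; the pattern ^[a-zA-Z0-9-_]+$ quoted earlier also forbids characters such as dots, which are common in file names. A small helper along these lines (a sketch, not part of the original answer) covers both constraints:

import re

def make_client_request_token(name):
    # Replace anything outside the allowed character set, then enforce the 64-character limit.
    token = re.sub(r"[^a-zA-Z0-9_-]", "-", name)
    return token[:64]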

Related

How do you set up AWS CloudFront to provide custom access to an S3 bucket with signed cookies using wildcards

AWS CloudFront with Custom Cookies using Wildcards in a Lambda Function:
The problem:
On AWS S3 storage, the preferred method for granular access control is to use AWS CloudFront with signed URLs.
Here is a good example of how to set up CloudFront (it is a bit old, so use the recommended settings rather than the legacy ones, and copy the generated policy down to S3): https://medium.com/@himanshuarora/protect-private-content-using-cloudfront-signed-cookies-fd9674faec3
I have provided an example below of how to create one of these signed URLs using Python and the newest libraries.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-creating-signed-url-canned-policy.html
However, this requires the creation of a signed URL for each item in the S3 bucket. To give wildcard access to a directory of items in the S3 bucket you need to use what is called a custom policy. I could not find any working examples of this in Python; many of the online examples use libraries that are deprecated. Attached is a working example.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-creating-signed-url-custom-policy.html
I had trouble getting the Python cryptography package to work when building the Lambda function on an Amazon Linux 2 instance on AWS EC2; it always came up with an error about a missing library. So I used Klayers for AWS and that worked: https://github.com/keithrozario/Klayers/tree/master/deployments
A working example of cookies for a canned policy (meaning a signed URL specific to each S3 file): https://www.velotio.com/engineering-blog/s3-cloudfront-to-deliver-static-asset
Below is my code for cookies for a custom policy (meaning a single policy statement with URL wildcards, etc.). You must use the cryptography-package style of example, but note that the private_key.signer function was deprecated in favor of a new private_key.sign function with an extra argument: https://cryptography.io/en/latest/hazmat/primitives/asymmetric/rsa/#signing
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes
import base64
import datetime


class CFSigner:
    def sign_rsa(self, message):
        private_key = serialization.load_pem_private_key(
            self.keyfile, password=None, backend=default_backend()
        )
        signature = private_key.sign(
            message.encode("utf-8"), padding.PKCS1v15(), hashes.SHA1()
        )
        return signature

    def _sign_string(self, message, private_key_file=None, private_key_string=None):
        if private_key_file:
            self.keyfile = open(private_key_file, "rb").read()
        elif private_key_string:
            self.keyfile = private_key_string.encode("utf-8")
        return self.sign_rsa(message)

    def _url_base64_encode(self, msg):
        msg_base64 = base64.b64encode(msg).decode("utf-8")
        msg_base64 = msg_base64.replace("+", "-")
        msg_base64 = msg_base64.replace("=", "_")
        msg_base64 = msg_base64.replace("/", "~")
        return msg_base64

    def generate_signature(self, policy, private_key_file=None):
        signature = self._sign_string(policy, private_key_file)
        encoded_signature = self._url_base64_encode(signature)
        return encoded_signature

    def create_signed_cookies2(self, url, private_key_file, keypair_id, expires_at):
        policy = self.create_custom_policy(url, expires_at)
        encoded_policy = self._url_base64_encode(policy.encode("utf-8"))
        signature = self.generate_signature(policy, private_key_file=private_key_file)
        cookies = {
            "CloudFront-Policy": encoded_policy,
            "CloudFront-Signature": signature,
            "CloudFront-Key-Pair-Id": keypair_id,
        }
        return cookies

    def create_signed_cookies(self, object_url, expires_at):
        cookies = self.create_signed_cookies2(
            url=object_url,
            private_key_file="xxx.pem",
            keypair_id="xxxxxxxxxx",
            expires_at=expires_at,
        )
        return cookies

    def create_custom_policy(self, url, expires_at):
        return (
            '{"Statement":[{"Resource":"'
            + url
            + '","Condition":{"DateLessThan":{"AWS:EpochTime":'
            + str(round(expires_at.timestamp()))
            + "}}}]}"
        )


def sign_to_cloudfront(object_url, expires_at):
    # Note: create_signed_url is not defined in this snippet.
    cf = CFSigner()
    url = cf.create_signed_url(
        url=object_url,
        keypair_id="xxxxxxxxxx",
        expire_time=expires_at,
        private_key_file="xxx.pem",
    )
    return url


def lambda_handler(event, context):
    response = event["Records"][0]["cf"]["response"]
    headers = response.get("headers", None)
    cf = CFSigner()
    path = "https://www.example.com/*"
    expire = datetime.datetime.now() + datetime.timedelta(days=3)
    signed_cookies = cf.create_signed_cookies(path, expire)
    # Three differently-capitalized dict keys are used so that three separate
    # Set-Cookie headers can coexist in the headers dict.
    headers["set-cookie"] = [{
        "key": "set-cookie",
        "value": f"CloudFront-Policy={signed_cookies.get('CloudFront-Policy')}",
    }]
    headers["Set-cookie"] = [{
        "key": "Set-cookie",
        "value": f"CloudFront-Signature={signed_cookies.get('CloudFront-Signature')}",
    }]
    headers["Set-Cookie"] = [{
        "key": "Set-Cookie",
        "value": f"CloudFront-Key-Pair-Id={signed_cookies.get('CloudFront-Key-Pair-Id')}",
    }]
    print(response)
    return response
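As a quick way to exercise the signer outside Lambda (a sketch with a placeholder key file, key pair ID, and distribution path), the cookies can be generated and inspected directly:

import datetime

cf = CFSigner()
expires = datetime.datetime.now() + datetime.timedelta(hours=1)
cookies = cf.create_signed_cookies2(
    url="https://dxxxxxxxxxxxx.cloudfront.net/private/*",  # placeholder distribution/path
    private_key_file="xxx.pem",                            # placeholder key file
    keypair_id="xxxxxxxxxx",                               # placeholder key pair id
    expires_at=expires,
)
for name, value in cookies.items():
    print(name, value)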

Uploading to Amazon S3 using a signed URL

I'm implementing file upload functionality that will be used by an Angular application, but I am having numerous issues getting it to work and need help figuring out what I am missing. Here is an overview of the resources in place, and of the testing and results I'm getting.
Infrastructure
I have an Amazon S3 bucket with versioning enabled, encryption enabled, and all public access blocked.
An API Gateway endpoint with a Lambda function that generates a pre-signed URL. The code is shown below.
def generate_upload_url(self):
    try:
        conditions = [
            {"acl": "private"},
            ["starts-with", "$Content-Type", ""]
        ]
        fields = {"acl": "private"}
        response = self.s3.generate_presigned_post(self.bucket_name,
                                                   self.file_path,
                                                   Fields=fields,
                                                   Conditions=conditions,
                                                   ExpiresIn=3600)
    except ClientError as e:
        logging.error(e)
        return None
    return response
The bucket name and file path are set in the class constructor. In this example they are constructed as follows:
def construct_file_names(self):
    self.file_path = self.account_number + '-' + self.user_id + '-' + self.experiment_id + '-experiment-data.json'
    self.bucket_name = self.account_number + '-' + self.user_id + '-resources'
Testing via Postman
Before implementing it within my Angular application, I am testing the upload functionality via Postman.
The response from my API endpoint for the pre-signed URL is shown below.
Using these values, I make another API call from Postman and receive the response below.
If anybody can see what I might be doing wrong here, please let me know. I have played around with different fields in the boto3 method, but ultimately I am getting 403 errors with different messages related to policy conditions. Any help would be appreciated.
Update 1
I tried to adjust the order of "file" and "acl" but received another error, shown below.
Update 2 - Using Signature Version 4
I updated the pre-signed URL code, shown below.
def upload_data(x):
    try:
        config = Config(
            signature_version='s3v4',
        )
        s3 = boto3.client('s3', "eu-west-1", config=config)
        sts = boto3.client('sts', "eu-west-1")
        data_upload = UploadData(x["userId"], x["experimentId"], s3, sts)
        return data_upload.generate_upload_url()
    except Exception as e:
        logging.error(e)
When the Lambda function is triggered by the API call, the following is received by Postman.
Using the new key values returned from the API, I proceeded to try another test upload. The results are shown below.
Once again an error, but I think I'm going in the right direction.
Try moving acl above the file row. Make sure file is at the end.
I finally got this working, so I will post an answer here summarising the steps taken.
Python code for generating the pre-signed URL via boto3 in eu-west-1
Use Signature Version 4 signing - https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html
import logging
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

def upload_data(x):
    try:
        config = Config(
            signature_version='s3v4',
        )
        s3 = boto3.client('s3', "eu-west-1", config=config)
        sts = boto3.client('sts', "eu-west-1")
        data_upload = UploadData(x["userId"], x["experimentId"], s3, sts)
        return data_upload.generate_upload_url()
    except Exception as e:
        logging.error(e)

def generate_upload_url(self):
    try:
        conditions = [
            {"acl": "private"},
            ["starts-with", "$Content-Type", ""]
        ]
        fields = {"acl": "private"}
        response = self.s3.generate_presigned_post(self.bucket_name,
                                                   self.file_path,
                                                   Fields=fields,
                                                   Conditions=conditions,
                                                   ExpiresIn=3600)
    except ClientError as e:
        logging.error(e)
        return None
    return response
Uploading via Postman
Ensure the field order is correct, with "file" being last.
Ensure "Content-Type" matches what you have in the code that generates the URL. In my case it was "". Once added, the conditions error went away.
S3 Bucket
Enable a CORS policy if required. I needed one and it is shown below, but this link can help - https://docs.aws.amazon.com/AmazonS3/latest/userguide/ManageCorsUsing.html
[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "PUT",
            "POST",
            "DELETE"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": [],
        "MaxAgeSeconds": 3000
    }
]
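If you prefer to apply the CORS configuration programmatically rather than through the console, a rough boto3 sketch (the bucket name is a placeholder) looks like this:

import boto3

s3 = boto3.client("s3", region_name="eu-west-1")

cors_configuration = {
    "CORSRules": [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["PUT", "POST", "DELETE"],
            "AllowedOrigins": ["*"],
            "ExposeHeaders": [],
            "MaxAgeSeconds": 3000,
        }
    ]
}

s3.put_bucket_cors(
    Bucket="my-upload-bucket",  # placeholder
    CORSConfiguration=cors_configuration,
)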
Upload via Angular
My issue arose during testing with Postman, but I was implementing the functionality for use by an Angular application, where the upload is done by first calling the API for the pre-signed URL and then uploading directly.
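As a sketch of the same flow from any HTTP client (assuming the response shape returned by generate_presigned_post above, and a hypothetical API endpoint URL), a Python upload that respects the field ordering looks like this:

import requests

# Hypothetical endpoint that fronts the Lambda shown earlier.
presign_api = "https://example.execute-api.eu-west-1.amazonaws.com/dev/upload-url"

presigned = requests.get(presign_api).json()  # expected: {"url": ..., "fields": {...}}

with open("experiment-data.json", "rb") as f:
    # The policy fields go first; the file part must come last.
    response = requests.post(
        presigned["url"],
        data={**presigned["fields"], "Content-Type": ""},
        files={"file": ("experiment-data.json", f)},
    )

print(response.status_code, response.text)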
Summary
Check your S3 infrastructure
Enable CORS if need be
Use sig 4 explicitly in the SDK of choice, if working in an old region
Ensure the form data order is correct
Hope all these pieces help others who are trying to achieve the same. Thanks for all the hints from SO members.

How to use NextToken in Boto3

The code below was created to export all the findings from Security Hub to an S3 bucket using a Lambda function. The filters are set to export only CIS AWS Foundations Benchmark findings. There are more than 20 accounts added as members in Security Hub. The issue I'm facing is that even though I'm using the NextToken configuration, the output doesn't have information about all the accounts; instead, it just displays one account's data at random.
Can somebody look into the code and let me know what the issue could be, please?
import boto3
import json
from botocore.exceptions import ClientError
import time
import glob

client = boto3.client('securityhub')
s3 = boto3.resource('s3')
storedata = {}
_filter = Filters = {
    'GeneratorId': [
        {
            'Value': 'arn:aws:securityhub:::ruleset/cis-aws-foundations-benchmark',
            'Comparison': 'PREFIX'
        }
    ],
}

def lambda_handler(event, context):
    response = client.get_findings(
        Filters={
            'GeneratorId': [
                {
                    'Value': 'arn:aws:securityhub:::ruleset/cis-aws-foundations-benchmark',
                    'Comparison': 'PREFIX'
                },
            ],
        },
    )
    results = response["Findings"]
    while "NextToken" in response:
        response = client.get_findings(Filters=_filter, NextToken=response["NextToken"])
        results.extend(response["Findings"])
        storedata = json.dumps(response)
        print(storedata)
    save_file = open("/tmp/SecurityHub-Findings.json", "w")
    save_file.write(storedata)
    save_file.close()
    for name in glob.glob("/tmp/*"):
        s3.meta.client.upload_file(name, "xxxxx-security-hubfindings", name)
I am now also getting a TooManyRequestsException error.
The problem is in this code that paginates the security findings results:
while "NextToken" in response:
response = client.get_findings(Filters=_filter,NextToken=response["NextToken"])
results.extend(response["Findings"])
storedata = json.dumps(response)
print(storedata)
The value of storedata after the while loop has completed is the last page of security findings, rather than the aggregate of the security findings.
However, you're already aggregating the security findings in results, so you can use that:
save_file = open("/tmp/SecurityHub-Findings.json", "w")
save_file.write(json.dumps(results))
save_file.close()
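As a further refinement (a sketch, not part of the original answer), boto3's built-in paginator handles the NextToken loop for you, and spacing out the page requests is one simple way to ease the TooManyRequestsException mentioned above:

import json
import time
import boto3

client = boto3.client("securityhub")

_filter = {
    "GeneratorId": [
        {
            "Value": "arn:aws:securityhub:::ruleset/cis-aws-foundations-benchmark",
            "Comparison": "PREFIX",
        }
    ],
}

results = []
paginator = client.get_paginator("get_findings")
for page in paginator.paginate(Filters=_filter):
    results.extend(page["Findings"])
    time.sleep(1)  # crude throttle to stay under the API rate limit

with open("/tmp/SecurityHub-Findings.json", "w") as f:
    f.write(json.dumps(results))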

Can't Send S3 Bucket Notification to SQS

I'm trying to publish a bucket notification to an SQS queue when an object is created in S3. I have read that I might be mixing boto3's client and resource interfaces, but neither approach works for me.
s3 = boto3.client('s3')
s3.create_bucket(Bucket='my-bucket',
                 CreateBucketConfiguration={
                     'LocationConstraint': 'eu-west-2',
                 },
                 )
bucket_notification = s3.BucketNotification('my-bucket')
s3_notification_config = {
    'QueueConfigurations': [
        {
            'QueueArn': 'arn:aws:sqs:location:number:number',
            'Events': [
                's3:ObjectCreated:*',
            ],
        },
    ],
}
response = bucket_notification.put(NotificationConfiguration=s3_notification_config)
This gives me the following error:
AttributeError: 'S3' object has no attribute 'BucketNotification'
When I change the first line of code to be:
s3 = boto3.resource('s3')
I get the following error:
An error occurred (InvalidArgument) when calling the PutBucketNotificationConfiguration operation: Unable to validate the following destination configurations
I understand the first error, but I'm not sure how to work around it by using resource instead of client when presented with the second error.
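For what it's worth, both interfaces expose the same operation in slightly different shapes. The sketch below (with a placeholder queue ARN) shows the two call styles; note that in either case the queue's access policy must allow s3.amazonaws.com to send messages, which is typically what the "Unable to validate the following destination configurations" error is pointing at:

import boto3

notification_config = {
    "QueueConfigurations": [
        {
            "QueueArn": "arn:aws:sqs:eu-west-2:123456789012:my-queue",  # placeholder
            "Events": ["s3:ObjectCreated:*"],
        },
    ],
}

# Client style: the operation is a flat method call.
s3_client = boto3.client("s3")
s3_client.put_bucket_notification_configuration(
    Bucket="my-bucket",
    NotificationConfiguration=notification_config,
)

# Resource style: BucketNotification is a sub-resource, not a client attribute.
s3_resource = boto3.resource("s3")
bucket_notification = s3_resource.BucketNotification("my-bucket")
bucket_notification.put(NotificationConfiguration=notification_config)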

How to enable s3 server access logging using the boto3 sdk?

I am trying to use the boto3 SDK to enable server access logging through Python. However, I keep getting the error:
You must give the log-delivery group WRITE and READ_ACP permissions to the target bucket
I know I need to add permissions to that group, but I don't know how to do that through the Python SDK.
I've tried following Enabling Logging Programmatically - Amazon Simple Storage Service, but I was unable to convert it to Python.
I've additionally tried putting the Grantee and Permissions inside the put_bucket_logging call, but to no avail.
Listed below is my function attempting to do this, which results in the aforementioned error:
def enableAccessLogging(clientS3, bucketName, storageBucket,
                        targetPrefix):
    # Give the log-delivery group WRITE and READ_ACP permissions to the
    # target bucket
    acl = get_bucket_acl(clientS3, storageBucket)
    new_grant = {
        'Grantee': {
            'ID': 'LogDelivery',
            'Type': 'Group'
        },
        'Permission': 'FULL_CONTROL',
    }
    modified_acl = copy.deepcopy(acl)
    modified_acl['Grants'].append(new_grant)
    setBucketAcl(clientS3, bucketName, modified_acl)
    response = clientS3.put_bucket_logging(
        Bucket=bucketName,
        BucketLoggingStatus={
            'LoggingEnabled': {
                'TargetBucket': storageBucket,
                'TargetPrefix': targetPrefix
            }
        }
    )
I figured it out. I had built the new ACL correctly, but I applied it to the source bucket instead of the target bucket. For anyone else doing this, the correct code is below:
def enableAccessLogging(clientS3, bucketName, storageBucket,
                        targetPrefix):
    # Give the log-delivery group WRITE and READ_ACP permissions to the
    # target bucket
    acl = get_bucket_acl(clientS3, storageBucket)
    new_grant = {
        'Grantee': {
            'URI': "http://acs.amazonaws.com/groups/s3/LogDelivery",
            'Type': 'Group'
        },
        'Permission': 'FULL_CONTROL',
    }
    modified_acl = copy.deepcopy(acl)
    modified_acl['Grants'].append(new_grant)
    setBucketAcl(clientS3, storageBucket, modified_acl)
    response = clientS3.put_bucket_logging(
        Bucket=bucketName,
        BucketLoggingStatus={
            'LoggingEnabled': {
                'TargetBucket': storageBucket,
                'TargetPrefix': targetPrefix
            }
        }
    )
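The get_bucket_acl and setBucketAcl helpers called above are not defined in the answer; a minimal sketch of what they might look like (names kept to match the calls above) is:

import copy  # also needed for the copy.deepcopy call in enableAccessLogging

def get_bucket_acl(clientS3, bucketName):
    # Fetch the current ACL and keep only the keys that put_bucket_acl accepts back.
    acl = clientS3.get_bucket_acl(Bucket=bucketName)
    return {"Owner": acl["Owner"], "Grants": acl["Grants"]}

def setBucketAcl(clientS3, bucketName, acl):
    clientS3.put_bucket_acl(Bucket=bucketName, AccessControlPolicy=acl)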