Uploading to Amazon S3 using a signed URL - amazon-web-services

I'm implementing a file uploading functionality that would be used by an angular application. But I am having numerous issues getting it to work. And need help figuring out what I am missing. Here is an overview of the resources in place, and the testing and results I'm getting.
Infrastructure
I have an Amazon S3 bucket created with versioning enabled, encryption enabled and all public access is blocked.
An API gateway with a Lambda function that generates a pre-signed URL. The code is shown below.
def generate_upload_url(self):
try:
conditions = [
{"acl": "private"},
["starts-with", "$Content-Type", ""]
]
fields = {"acl": "private"}
response = self.s3.generate_presigned_post(self.bucket_name,
self.file_path,
Fields=fields,
Conditions=conditions,
ExpiresIn=3600)
except ClientError as e:
logging.error(e)
return None
return response
The bucket name and file path are set as part of the class constructor. In this example the bucket and file path are
def construct_file_names(self):
self.file_path = self.account_number + '-' + self.user_id + '-' + self.experiment_id + '-experiment-data.json'
self.bucket_name = self.account_number + '-' + self.user_id + '-resources'
Testing via Postman
Before implementing it within my angular application. I am testing the upload functionality via Postman.
The response from my API endpoint for the pre-signed URL is shown below
Using these values, I make another API call from Postman and receive the response below
If anybody can see what I might be doing wrong here. I have played around with different fields in the boto3 method, but ultimately, I am getting 403 errors with different messages related to Policy conditions. Any help would be appreciated.
Update 1
I tried to adjust the order of "file" and "acl" but received another error shown below
Update Two - Using signature v4
I updated the pre-signed URL code, shown below
def upload_data(x):
try:
config = Config(
signature_version='s3v4',
)
s3 = boto3.client('s3', "eu-west-1", config=config)
sts = boto3.client('sts', "eu-west-1")
data_upload = UploadData(x["userId"], x["experimentId"], s3, sts)
return data_upload.generate_upload_url()
except Exception as e:
logging.error(e)
When the Lambda function is triggered by the API call, the following is received by Postman
Using the new key values returned from the API, I proceeded to try another test upload. The results are shown below
Once again an error but I think I'm going in the correct direction.

Try moving acl above the file row. Make sure file is at the end.

I finally got this working, so I will post an answer here summarising the steps taken.
Python Code for generating pre-signed URL via boto in eu-west-1
Use signature v4 signing - https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html
def upload_data(x):
try:
config = Config(
signature_version='s3v4',
)
s3 = boto3.client('s3', "eu-west-1", config=config)
sts = boto3.client('sts', "eu-west-1")
data_upload = UploadData(x["userId"], x["experimentId"], s3, sts)
return data_upload.generate_upload_url()
except Exception as e:
logging.error(e)
def generate_upload_url(self):
try:
conditions = [
{"acl": "private"},
["starts-with", "$Content-Type", ""]
]
fields = {"acl": "private"}
response = self.s3.generate_presigned_post(self.bucket_name,
self.file_path,
Fields=fields,
Conditions=conditions,
ExpiresIn=3600)
except ClientError as e:
logging.error(e)
return None
return response
Uploading via Postman
Ensure the order is correct with "file" being last
Ensure "Content-Type" matches what you have in the code to generate the URL. In my case it was "". Once added, the conditions error received went away.
S3 Bucket
Enable a CORS policy if required. I needed one and it is shown below, but this link can help - https://docs.aws.amazon.com/AmazonS3/latest/userguide/ManageCorsUsing.html
[
{
"AllowedHeaders": [
"*"
],
"AllowedMethods": [
"PUT",
"POST",
"DELETE"
],
"AllowedOrigins": [
"*"
],
"ExposeHeaders": [],
"MaxAgeSeconds": 3000
}
]
Upload via Angular
My issue arose during testing with Postman, but I was implementing the functionality to be used by an Angular application. Here is a code snippet of how the upload is done by first calling the API for the pre-signed URL, and then uploading directly.
Summary
Check your S3 infrastructure
Enable CORS if need be
Use sig 4 explicitly in the SDK of choice, if working in an old region
Ensure the form data order is correct
Hope all these pieces help others who are trying to achieve the same. Thanks for all the hints from SO members.

Related

How do you setup AWS Cloudfront to provide custom access to S3 bucket with signed cookies using wildcards

AWS Cloudfront with Custom Cookies using Wildcards in Lambda Function:
The problem:
On AWS s3 Storage to provide granular access control the preferred method is to use AWS Cloudfront with signed URL's.
Here is a good example how to setup cloudfront a bit old though, so you need to use the recommended settings not
the legacy and copy the generated policy down to S3.
https://medium.com/#himanshuarora/protect-private-content-using-cloudfront-signed-cookies-fd9674faec3
I have provided an example below on how to create one of these signed URL's using Python and the newest libraries.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-creating-signed-url-canned-policy.html
However this requires the creation of a signed URL for each item in the S3 bucket. To give wildcard access to a
directory of items in the S3 bucket you need use what is called a custom Policy. I could not find any working examples
of this code using Python, many of the online expamples have librarys that are depreciated. But attached is a working example.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-creating-signed-url-custom-policy.html
I had trouble getting the python cryptography package to work by building the lambda function on an Amazon Linux 2
instance on AWS EC2. Always came up with an error of a missing library. So I use Klayers for AWS and worked
https://github.com/keithrozario/Klayers/tree/master/deployments.
A working example for cookies for a canned policy (Means only a signed URL specific for each S3 file)
https://www.velotio.com/engineering-blog/s3-cloudfront-to-deliver-static-asset
My code for cookies for a custom policy (Means a single policy statement with URL wildcards etc). You must use the Cryptology
package type examples but the private_key.signer function was depreciated for a new private_key.sign function with an extra
argument. https://cryptography.io/en/latest/hazmat/primitives/asymmetric/rsa/#signing
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes
import base64
import datetime
class CFSigner:
def sign_rsa(self, message):
private_key = serialization.load_pem_private_key(
self.keyfile, password=None, backend=default_backend()
)
signature = private_key.sign(message.encode(
"utf-8"), padding.PKCS1v15(), hashes.SHA1())
return signature
def _sign_string(self, message, private_key_file=None, private_key_string=None):
if private_key_file:
self.keyfile = open(private_key_file, "rb").read()
elif private_key_string:
self.keyfile = private_key_string.encode("utf-8")
return self.sign_rsa(message)
def _url_base64_encode(self, msg):
msg_base64 = base64.b64encode(msg).decode("utf-8")
msg_base64 = msg_base64.replace("+", "-")
msg_base64 = msg_base64.replace("=", "_")
msg_base64 = msg_base64.replace("/", "~")
return msg_base64
def generate_signature(self, policy, private_key_file=None):
signature = self._sign_string(policy, private_key_file)
encoded_signature = self._url_base64_encode(signature)
return encoded_signature
def create_signed_cookies2(self, url, private_key_file, keypair_id, expires_at):
policy = self.create_custom_policy(url, expires_at)
encoded_policy = self._url_base64_encode(
policy.encode("utf-8"))
signature = self.generate_signature(
policy, private_key_file=private_key_file)
cookies = {
"CloudFront-Policy": encoded_policy,
"CloudFront-Signature": signature,
"CloudFront-Key-Pair-Id": keypair_id,
}
return cookies
def sign_to_cloudfront(object_url, expires_at):
cf = CFSigner()
url = cf.create_signed_url(
url=object_url,
keypair_id="xxxxxxxxxx",
expire_time=expires_at,
private_key_file="xxx.pem",
)
return url
def create_signed_cookies(self, object_url, expires_at):
cookies = self.create_signed_cookies2(
url=object_url,
private_key_file="xxx.pem",
keypair_id="xxxxxxxxxx",
expires_at=expires_at,
)
return cookies
def create_custom_policy(self, url, expires_at):
return (
'{"Statement":[{"Resource":"'
+ url
+ '","Condition":{"DateLessThan":{"AWS:EpochTime":'
+ str(round(expires_at.timestamp()))
+ "}}}]}"
)
def lambda_handler(event, context):
response = event["Records"][0]["cf"]["response"]
headers = response.get("headers", None)
cf = CFSigner()
path = "https://www.example.com/*"
expire = datetime.datetime.now() + datetime.timedelta(days=3)
signed_cookies = cf.create_signed_cookies(path, expire)
headers["set-cookie"] = [{
"key": "set-cookie",
"value": "CloudFront-Policy={signed_cookies.get('CloudFront-Policy')}"
}]
headers["Set-cookie"] = [{
"key": "Set-cookie",
"value": "CloudFront-Signature={signed_cookies.get('CloudFront-Signature')}",
}]
headers["Set-Cookie"] = [{
"key": "Set-Cookie",
"value": "CloudFront-Key-Pair-Id={signed_cookies.get('CloudFront-Key-Pair-Id')}",
}]
print(response)
return response ```

Why is AWS Lambda returning a Key Error when trying to upload an image to S3 and updating a DynamoDB Table with API Gateway?

I am trying to upload a binary Image to S3 and update a DynamoDB table in the same AWS Lambda Function. The problem is, whenever I try to make an API call, I get the following error in postman:
{
"errorMessage": "'itemId'",
"errorType": "KeyError",
"requestId": "bccaead6-cb60-4a5e-9fc7-14ff25380451",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 14, in lambda_handler\n s3_upload = s3.put_object(Bucket=bucket, Key=event[\"itemId\"] + \".png\", Body=decode_content)\n"
]
}
My events section takes in 3 Strings and whenever I try and access those strings, I get this error. However, if I try and access them without trying to upload to an S3 Bucket, everything works fine. My Lambda Function looks like this:
import json
import boto3
import base64
dynamoclient = boto3.resource("dynamodb")
s3 = boto3.client("s3")
table = dynamoclient.Table("Items")
bucket = "images"
def lambda_handler(event, context):
get_file_content = event["content"]
decode_content = base64.b64decode(get_file_content)
s3_upload = s3.put_object(Bucket=bucket, Key=event["itemId"] + ".png", Body=decode_content)
table.put_item(
Item={
'itemID': event["itemId"],
'itemName': event['itemName'],
'itemDescription': event['itemDescription']
}
)
return {
"code":200,
"message": "Item was added successfully"
}
Again, if I remove everything about the S3 file upload, everything works fine and I am able to update the DynamoDB table successfully. As for the API Gateway side, I have added the image/png to the Binary Media Types section. Additionally, for the Mapping Templates section for AWS API Gateway, I have added the content type image/png. In the template for the content type, I have the following lines:
{
"content": "$input.body"
}
For my Postman POST request, in the headers section, I have put this:
Finally, for the body section, I have added the raw event data with this:
{
"itemId": "0fx170",
"itemName": "Mouse",
"itemDescription": "Smooth"
}
Lastly, for the binary section, I have uploaded my PNG file.
What could be going wrong?

What's the most secure way of accessing s3 resource?

The question heading is broad but my question is not. I just want clarification on my approach. I have an s3 bucket with blocked public access. The bucket policy is set to http-referer. This is how it looks.
{
"Version": "2008-10-17",
"Id": "http referer policy example",
"Statement": [
{
"Sid": "Allow get requests referred by www.mysite.com and mysite.com",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::storage/*",
"Condition": {
"StringLike": {
"aws:Referer": [
"https://www.example.com/*",
"https://example.com/*",
]
}
}
}
]
}
I still get an error if my frontend (on my tld) tries to access the s3 resource through the URL. (URL example - https://storage.s3.amazonaws.com/path/to/my/file.png).
I dropped the approach of hitting the s3 URL directly and decided to build a backend utility on my TLD that'd fetch the s3 resource in question and send it back to the frontend. So the URL would look something like this, https://<tld>/fetch-s3-resource/path/to/file.png.
I wanna know if this approach is correct or if there's a better one out there. In my mind even setting a http-referer policy doesn't make sense cause anyone can make a call to my bucket with http-referer manually set to my TLD.
UPDATE - I found out about signed URL's which should supposedly allow user's to publically access a resourec through the URL. This should solve my problem but I still have set public access to "off" and I don't really know which switch to toggle in order to allow for the user's with signed urls to be able to access the resource.
Here's the sample of s3 signed URL. Here's the link to the doc
import logging
import boto3
from botocore.exceptions import ClientError
def create_presigned_url(bucket_name, object_name, expiration=3600):
"""Generate a presigned URL to share an S3 object
:param bucket_name: string
:param object_name: string
:param expiration: Time in seconds for the presigned URL to remain valid
:return: Presigned URL as string. If error, returns None.
"""
# Generate a presigned URL for the S3 object
s3_client = boto3.client('s3')
try:
response = s3_client.generate_presigned_url('get_object',
Params={'Bucket': bucket_name,
'Key': object_name},
ExpiresIn=expiration)
except ClientError as e:
logging.error(e)
return None
# The response contains the presigned URL
return response
UPDATE UPDATE - Since the question isn't already clear enough, and it seems like I took some liberties with my "loose" language, let me clarify some things.
1 - What am I actually trying to do?
I want to keep my s3 bucket secure in such a way that only user's with presigned URL's generated by "me" can access whatever resource is there.
2 - When I ask if "my approach" is better or if there's any other approach what do I mean by that?
I wanna know if there's a "native" / aws provided way of accessing the bucket without having to write a backend endpoint that'd fetch the resource and throw it back on the frontend.
3 - How do I measure one approach against another ?
I think this one is quite obvious, you don't try to write an authentication flow from scratch if there's one provided by your framework*. This logic applies here too, if there's a way to access the objects that's listed by AWS then I probably shouldn't go about writing my own "hack"
Presigned URLs came through. You can have all public access blocked and still be able to generate signed URL's and serve it the frontend. I've already linked the official documentation in my question, here's the final piece of code I ended up with.
def create_presigned_url(bucket_name, bucket_key, expiration=3600, signature_version='s3v4'):
"""Generate a presigned URL for the S3 object
:param bucket_name: string
:param bucket_key: string
:param expiration: Time in seconds for the presigned URL to remain valid
:param signature_version: string
:return: Presigned URL as string. If error, returns None.
"""
s3_client = boto3.client('s3',
aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
config=Config(signature_version=signature_version),
region_name='us-east-1'
)
try:
response = s3_client.generate_presigned_url('get_object',
Params={'Bucket': bucket_name,
'Key': bucket_key},
ExpiresIn=expiration)
except ClientError as e:
logging.error(e)
return None
# The response contains the pre-signed URL
return response

AWS Textract InvalidParameterException

I have a .Net core client application using amazon Textract with S3,SNS and SQS as per the AWS Document , Detecting and Analyzing Text in Multipage Documents(https://docs.aws.amazon.com/textract/latest/dg/async.html)
Created an AWS Role with AmazonTextractServiceRole Policy and added the Following Trust relation ship as per the documentation (https://docs.aws.amazon.com/textract/latest/dg/api-async-roles.html)
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "textract.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
Subscribed SQS to the topic and Given Permission to the Amazon SNS Topic to Send Messages to the Amazon SQS Queue as per the aws documentation .
All Resources including S3 Bucket, SNS ,SQS are in the same us-west2 region
The following method shows a generic error "InvalidParameterException"
Request has invalid parameters
But If the NotificationChannel section is commented the code is working fine and returning the correct job id.
Error message is not giving a clear picture about the parameter. Highly appreciated any help .
public async Task<string> ScanDocument()
{
string roleArn = "aws:iam::xxxxxxxxxxxx:instance-profile/MyTextractRole";
string topicArn = "aws:sns:us-west-2:xxxxxxxxxxxx:AmazonTextract-My-Topic";
string bucketName = "mybucket";
string filename = "mytestdoc.pdf";
var request = new StartDocumentAnalysisRequest();
var notificationChannel = new NotificationChannel();
notificationChannel.RoleArn = roleArn;
notificationChannel.SNSTopicArn = topicArn;
var s3Object = new S3Object
{
Bucket = bucketName,
Name = filename
};
request.DocumentLocation = new DocumentLocation
{
S3Object = s3Object
};
request.FeatureTypes = new List<string>() { "TABLES", "FORMS" };
request.NotificationChannel = channel; /* Commenting this line work the code*/
var response = await this._textractService.StartDocumentAnalysisAsync(request);
return response.JobId;
}
Debugging Invalid AWS Requests
The AWS SDK validates your request object locally, before dispatching it to the AWS servers. This validation will fail with unhelpfully opaque errors, like the OP.
As the SDK is open source, you can inspect the source to help narrow down the invalid parameter.
Before we look at the code: The SDK (and documentation) are actually generated from special JSON files that describe the API, its requirements and how to validate them. The actual code is generated based on these JSON files.
I'm going to use the Node.js SDK as an example, but I'm sure similar approaches may work for the other SDKs, including .NET
In our case (AWS Textract), the latest Api version is 2018-06-27. Sure enough, the JSON source file is on GitHub, here.
In my case, experimentation narrowed the issue down to the ClientRequestToken. The error was an opaque InvalidParameterException. I searched for it in the SDK source JSON file, and sure enough, on line 392:
"ClientRequestToken": {
"type": "string",
"max": 64,
"min": 1,
"pattern": "^[a-zA-Z0-9-_]+$"
},
A whole bunch of undocumented requirements!
In my case the token I was using violated the regex (pattern in the above source code). Changing my token code to satisfy the regex solved the problem.
I recommend this approach for these sorts of opaque type errors.
After a long days analyzing the issue. I was able to resolve it .. as per the documentation topic only required SendMessage Action to the SQS . But after changing it to All SQS Action its Started Working . But Still AWS Error message is really misleading and confusing
you would need to change the permissions to All SQS Action and then use the code as below
def startJob(s3BucketName, objectName):
response = None
response = textract.start_document_text_detection(
DocumentLocation={
'S3Object': {
'Bucket': s3BucketName,
'Name': objectName
}
})
return response["JobId"]
def isJobComplete(jobId):
# For production use cases, use SNS based notification
# Details at: https://docs.aws.amazon.com/textract/latest/dg/api-async.html
time.sleep(5)
response = textract.get_document_text_detection(JobId=jobId)
status = response["JobStatus"]
print("Job status: {}".format(status))
while(status == "IN_PROGRESS"):
time.sleep(5)
response = textract.get_document_text_detection(JobId=jobId)
status = response["JobStatus"]
print("Job status: {}".format(status))
return status
def getJobResults(jobId):
pages = []
response = textract.get_document_text_detection(JobId=jobId)
pages.append(response)
print("Resultset page recieved: {}".format(len(pages)))
nextToken = None
if('NextToken' in response):
nextToken = response['NextToken']
while(nextToken):
response = textract.get_document_text_detection(JobId=jobId, NextToken=nextToken)
pages.append(response)
print("Resultset page recieved: {}".format(len(pages)))
nextToken = None
if('NextToken' in response):
nextToken = response['NextToken']
return pages
Invoking textract with Python, I received the same error until I truncated the ClientRequestToken down to 64 characters
response = client.start_document_text_detection(
DocumentLocation={
'S3Object':{
'Bucket': bucket,
'Name' : fileName
}
},
ClientRequestToken= fileName[:64],
NotificationChannel= {
"SNSTopicArn": "arn:aws:sns:us-east-1:AccountID:AmazonTextractXYZ",
"RoleArn": "arn:aws:iam::AccountId:role/TextractRole"
}
)
print('Processing started : %s' % json.dumps(response))

RDS generate_presigned_url does not support the DestinationRegion parameter

I was trying to set up encrypted RDS replica in another region, but I got stuck on generating pre-signed URL.
It seems that boto3/botocore does not allow DestinationRegion parameter, which is defined as a requirement on AWS API (link) in case we want to generate PreSignedUrl.
Versions used:
boto3 (1.4.7)
botocore (1.7.10)
Output:
botocore.exceptions.ParamValidationError: Parameter validation failed:
Unknown parameter in input: "DestinationRegion", must be one of: DBInstanceIdentifier, SourceDBInstanceIdentifier, DBInstanceClass, AvailabilityZone, Port, AutoMinorVersionUpgrade, Iops, OptionGroupName, PubliclyAccessible, Tags, DBSubnetGroupName, StorageType, CopyTagsToSnapshot, MonitoringInterval, MonitoringRoleArn, KmsKeyId, PreSignedUrl, EnableIAMDatabaseAuthentication, SourceRegion
Example code:
import boto3
url = boto3.client('rds', 'eu-east-1').generate_presigned_url(
ClientMethod='create_db_instance_read_replica',
Params={
'DestinationRegion': 'eu-east-1',
'SourceDBInstanceIdentifier': 'abc',
'KmsKeyId': '1234',
'DBInstanceIdentifier': 'someidentifier'
},
ExpiresIn=3600,
HttpMethod=None
)
Same issue was already reported but got closed.
Thanks for help,
Petar
Generate Pre signed URL from the source region, then populate the create_db_instance_read_replica with that url.
The presigned URL must be a valid request for the CreateDBInstanceReadReplica API action that can be executed in the source AWS Region that contains the encrypted source DB instance
PreSignedUrl (string) --
The URL that contains a Signature Version 4 signed request for the CreateDBInstanceReadReplica API action in the source AWS Region that contains the source DB instance.
import boto3
session = boto3.Session(profile_name='profile_name')
url = session.client('rds', 'SOURCE_REGION').generate_presigned_url(
ClientMethod='create_db_instance_read_replica',
Params={
'DBInstanceIdentifier': 'db-1-read-replica',
'SourceDBInstanceIdentifier': 'database-source',
'SourceRegion': 'SOURCE_REGION'
},
ExpiresIn=3600,
HttpMethod=None
)
print(url)
source_db = session.client('rds', 'SOURCE_REGION').describe_db_instances(
DBInstanceIdentifier='database-SOURCE'
)
print(source_db)
response = session.client('rds', 'DESTINATION_REGION').create_db_instance_read_replica(
SourceDBInstanceIdentifier="arn:aws:rds:SOURCE_REGION:account_number:db:database-SOURCE",
DBInstanceIdentifier="db-1-read-replica",
KmsKeyId='DESTINATION_REGION_KMS_ID',
PreSignedUrl=url,
SourceRegion='SOURCE'
)
print(response)