How does the Boto3 S3 put_object function work in Python? - python-2.7

With Boto3:
I am using the put_object() function to upload an object to S3, with a customer-provided encryption key (SSE-C) for server-side encryption.
With Boto:
I am using the upload_chunk function to upload the object to S3. Here I am using AWS-managed keys for server-side encryption, not a customer-provided key, because SSE-C is not supported by that API.
With the Boto3 approach my program uses more memory than with the Boto approach.
Please tell me how the put_object function works in Boto3 for server-side encryption.
Does it use memory on the machine from which it is called to perform the encryption?
Should I explicitly clean up the data buffer that is passed as the Body parameter to put_object?
Code:
def put_s3_object(self, target_key_name, data, sse_cust_key, sse_cust_key_md5):
    '''description: Upload a file as an S3 object using SSE with a customer key (SSE-C).
       The S3 object will be stored in encrypted form.
    input:
        target_key_name  (#string)
        data             (in-memory string/bytes)
        sse_cust_key     (#string)
        sse_cust_key_md5 (#string)
    output: response
    '''
    if not target_key_name:
        raise ValueError('target_key_name must not be empty')
    try:
        response = self.s3_client.put_object(
            Bucket=self.source_bucket,
            Body=data,
            Key=target_key_name,
            SSECustomerAlgorithm=awsParams.CLOUD_DR_AWS_SSE_ALGO,
            SSECustomerKey=sse_cust_key,
            SSECustomerKeyMD5=sse_cust_key_md5)
        # Only drops the local reference; the caller's copy of the buffer stays alive.
        del data
        return response
    except botocore.exceptions.ClientError:
        raise
    except Exception:
        raise

Instead of Boto3's put_object(), we can go with the set_contents_from_string() and get_contents_as_string() functions from boto.
These calls also support server-side encryption with customer keys (SSE-C); we just need to pass all the key information in the request headers.
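As a rough sketch of that approach (assuming boto 2.x; the bucket name and object key are placeholders, data and the base64-encoded key values are taken to match the question's snippet, and the exact header pass-through should be verified against the SSE-C documentation linked below):
import boto
from boto.s3.key import Key

conn = boto.connect_s3()
bucket = conn.get_bucket('my-bucket')    # placeholder bucket name
key = Key(bucket, 'my-object-key')       # placeholder object key

# SSE-C expects the algorithm, the base64-encoded key, and the key's base64-encoded MD5.
sse_c_headers = {
    'x-amz-server-side-encryption-customer-algorithm': 'AES256',
    'x-amz-server-side-encryption-customer-key': sse_cust_key,
    'x-amz-server-side-encryption-customer-key-MD5': sse_cust_key_md5,
}

key.set_contents_from_string(data, headers=sse_c_headers)
roundtrip = key.get_contents_as_string(headers=sse_c_headers)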
For more details
http://docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys.html

Related

When calling an operation from the AWS PHP SDK, is there a way to secure its payload contents in a non-readable format?

I've recently used the PHP SDK to test some operations of the SecretsManager service, and everything works fine. However, I needed to ensure the information sent in with the createSecret operation was safe from any third-party threats.
So I did a small investigation to view the request's body contents, which I was able to see in StreamRequestPayloadMiddleware.php.
After modifying it to run json_decode on the request's contents, I came across this:
array(4) {
["Name"]=> string(9) "demo/Test"
["SecretString"]=> string(39) "{"username":"Tom","password":"Test123"}"
["KmsKeyId"]=> string(xx) "arn:aws:kms:xx-xxxx-x:xxxxxxxxxx:key/xxx-xxx-xxx-xxx-xxxxxxxxxx"
["ClientRequestToken"]=> string(xx) "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
I then realized the plaintext contents of the SecretString were visible in the request's body.
I'm aware that SecretsManager uses a KMS key to encrypt the secret values; however, this only happens once the operation has reached the server side (AWS Console).
Therefore, I need to know if there is any way to protect the payload contents in an encrypted format, such that the SecretsManager service or AWS can still unpack the content to its original value and the new secret is not saved in that encrypted format.
Traffic between the user and the service endpoint is encrypted by default over a secure HTTPS/TLS connection.
https://docs.aws.amazon.com/cli/latest/userguide/data-protection.html
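The same applies to the other SDKs. For example, a quick check with the Python SDK (a minimal sketch, assuming boto3; the region is only an example) shows that the client resolves to an HTTPS endpoint unless you explicitly override it:
import boto3

# The default endpoint is HTTPS, so the createSecret payload is protected by TLS in transit.
client = boto3.client('secretsmanager', region_name='us-east-1')
print(client.meta.endpoint_url)   # e.g. https://secretsmanager.us-east-1.amazonaws.com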

Kinesis put records not returned in response from get records request

I have a Scala app that uses aws-java-sdk-kinesis to issue a series of putRecord requests to a local Kinesis stream.
The response returned after each putRecord request indicates that the record was successfully put into the stream.
The Scala code making the PutRecordRequest:
def putRecord(kinesisClient: AmazonKinesis, value: Array[Byte], streamName: String): Try[PutRecordResult] = Try {
  val putRecordRequest = new PutRecordRequest()
  putRecordRequest.setStreamName(streamName)
  putRecordRequest.setData(ByteBuffer.wrap(value))
  putRecordRequest.setPartitionKey("integrationKey")
  kinesisClient.putRecord(putRecordRequest)
}
To confirm this, I have a small Python app that consumes from the stream (initialStreamPosition: LATEST) and prints the records it finds by iterating through the shard iterators (roughly the loop sketched below). Unexpectedly, however, it returns an empty set of records for each shard iterator it obtains.
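The consumer loop is roughly shaped like this sketch (using boto3; the stream name and local endpoint are placeholders rather than the actual app):
import boto3

# Simplified sketch of the consumer; 'my-stream' and the endpoint URL are placeholders.
kinesis = boto3.client('kinesis', endpoint_url='http://localhost:4567', region_name='us-east-1')

shards = kinesis.list_shards(StreamName='my-stream')['Shards']
for shard in shards:
    iterator = kinesis.get_shard_iterator(
        StreamName='my-stream',
        ShardId=shard['ShardId'],
        ShardIteratorType='LATEST',   # start at the tip of the stream
    )['ShardIterator']
    result = kinesis.get_records(ShardIterator=iterator, Limit=100)
    print('Shard-iterator:', iterator)
    print('Records:', result['Records'])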
Trying this with the AWS CLI, however, I do get records returned for the same shard iterator. I am confused: how can that be?
Running the python consumer (with LATEST), returns:
Shard-iterators: ['AAAAAAAAAAH9AUYVAkOcqkYNhtibrC9l68FcAQKbWfBMyNGko1ypHvXlPEuQe97Ixb67xu4CKzTFFGoLVoo8KMy+Zpd+gpr9Mn4wS+PoX0VxTItLZXxalmEfufOqnFbz2PV5h+Wg5V41tST0c4X0LYRpoPmEnnKwwtqwnD0/VW3h0/zxs7Jq+YJmDvh7XYLf91H/FscDzFGiFk6aNAVjyp+FNB3WHY0d']
Records: []
Doing the "same" with the AWS CLI, however, I get:
> aws kinesis get-records --shard-iterator AAAAAAAAAAH9AUYVAkOcqkYNhtibrC9l68FcAQKbWfBMyNGko1ypHvXlPEuQe97Ixb67xu4CKzTFFGoLVoo8KMy+Zpd+gpr9Mn4wS+PoX0VxTItLZXxalmEfufOqnFbz2PV5h+Wg5V41tST0c4X0LYRpoPmEnnKwwtqwnD0/VW3h0/zxs7Jq+YJmDvh7XYLf91H/FscDzFGiFk6aNAVjyp+FNB3WHY0d --endpoint-url http://localhost:4567
Returns:
{"Records":[{"SequenceNumber":"49625122979782922897342908653629584879579547704307482626","ApproximateArrivalTimestamp":1640263797.328,"Data":{"type":"Buffer","data":[123,34,116,105,109,101,115,116,97,109,112,34,58,49,54,52,48,50,54,51,55,57,55,44,34,100,116,109,34,58,49,54,52,48,50,54,51,55,57,55,44,34,101,34,58,34,101,34,44,34,116,114,97,99,107,101,114,95,118,101,114,115,105,111,110,34,58,34,118,101,114,115,105,111,110,34,44,34,117,114,108,34,58,34,104,116,116,112,115,58,47,47,116,101,115,116,46,99,111,109,34,44,34,104,99,99,34,58,102,97,108,115,101,44,34,115,99,34,58,49,44,34,99,111,110,116,101,120,116,34,58,123,34,101,116,34,58,34,101,116,34,44,34,100,101,118,34,58,34,100,101,118,34,44,34,100,119,101,108,108,34,58,49,44,34,111,105,100,34,58,49,44,34,119,105,100,34,58,49,44,34,115,116,97,116,101,34,58,123,34,108,99,34,58,123,34,99,111,100,101,34,58,34,115,111,109,101,45,99,111,100,101,34,44,34,105,100,34,58,34,115,111,109,101,45,105,100,34,125,125,125,44,34,121,117,105,100,34,58,34,102,53,101,52,57,53,98,102,45,100,98,102,100,45,52,102,53,102,45,56,99,56,98,45,53,97,56,98,50,56,57,98,52,48,49,97,34,125]},"PartitionKey":"integrationKey"},{"SequenceNumber":"49625122979782922897342908653630793805399163707871723522","ApproximateArrivalTimestamp":1640263817.338,"Data":{"type":"Buffer","data":[123,34,116,105,109,101,115,116,97,109,112,34,58,49,54,52,48,50,54,51,56,49,55,44,34,100,116,109,34,58,49,54,52,48,50,54,51,56,49,55,44,34,101,34,58,34,101,34,44,34,116,114,97,99,107,101,114,95,118,101,114,115,105,111,110,34,58,34,118,101,114,115,105,111,110,34,44,34,117,114,108,34,58,34,104,116,116,112,115,58,47,47,116,101,115,116,46,99,111,109,34,44,34,104,99,99,34,58,102,97,108,115,101,44,34,115,99,34,58,49,44,34,99,111,110,116,101,120,116,34,58,123,34,101,116,34,58,34,101,116,34,44,34,100,101,118,34,58,34,100,101,118,34,44,34,100,119,101,108,108,34,58,49,44,34,111,105,100,34,58,49,44,34,119,105,100,34,58,49,44,34,115,116,97,116,101,34,58,123,34,108,99,34,58,123,34,99,111,100,101,34,58,34,115,111,109,101,45,99,111,100,101,34,44,34,105,100,34,58,34,115,111,109,101,45,105,100,34,125,125,125,44,34,121,117,105,100,34,58,34,102,53,101,52,57,53,98,102,45,100,98,102,100,45,52,102,53,102,45,56,99,56,98,45,53,97,56,98,50,56,57,98,52,48,49,97,34,125]},"PartitionKey":"integrationKey"},{"SequenceNumber":"49625122979782922897342908653632002731218779711435964418","ApproximateArrivalTimestamp":1640263837.347,"Data":{"type":"Buffer","data":[123,34,116,105,109,101,115,116,97,109,112,34,58,49,54,52,48,50,54,51,56,51,55,44,34,100,116,109,34,58,49,54,52,48,50,54,51,56,51,55,44,34,101,34,58,34,101,34,44,34,116,114,97,99,107,101,114,95,118,101,114,115,105,111,110,34,58,34,118,101,114,115,105,111,110,34,44,34,117,114,108,34,58,34,104,116,116,112,115,58,47,47,116,101,115,116,46,99,111,109,34,44,34,104,99,99,34,58,102,97,108,115,101,44,34,115,99,34,58,49,44,34,99,111,110,116,101,120,116,34,58,123,34,101,116,34,58,34,101,116,34,44,34,100,101,118,34,58,34,100,101,118,34,44,34,100,119,101,108,108,34,58,49,44,34,111,105,100,34,58,49,44,34,119,105,100,34,58,49,44,34,115,116,97,116,101,34,58,123,34,108,99,34,58,123,34,99,111,100,101,34,58,34,115,111,109,101,45,99,111,100,101,34,44,34,105,100,34,58,34,115,111,109,101,45,1pre05,100,34,125,125,125,44,34,121,117,105,100,34,58,34,102,53,101,52,57,53,98,102,45,100,98,102,100,45,52,102,53,102,45,56,99,56,98,45,53,97,56,98,50,56,57,98,52,48,49,97,34,125]},"PartitionKey":"integrationKey"}],"NextShardIterator":"AAAAAAAAAAE+9W/bI4CsDfzvJGN3elplafFFBw81/cVB0RjojS39hpSglW0ptfsxrO6dCWKEJWu1f9BxY7O
ZJS9uUYyLn+dvozRNzKGofpHxmGD+/1WT0MVYMv8tkp8sdLdDNuVaq9iF6aBKma+e+iD079WfXzW92j9OF4DqIOCWFIBWG2sl8wn98figG4x74p4JuZ6Q5AgkE41GT2Ii2J6SkqBI1wzM","MillisBehindLatest":0}
I have used this Python consumer in many other settings to introspect other Kinesis streams we have, and it works as expected. But for some reason it is not working here.
Does anyone have a clue what might be going on?
So I was finally able to identify the issue; perhaps it will be useful for someone else with a similar problem.
In my setup I am using a local Kinesis stream (kinesalite), which doesn't support CBOR. You have to disable CBOR explicitly, otherwise you will see the following error when the SDK tries to deserialize the received records:
Unable to unmarshall response (We expected a VALUE token but got: START_OBJECT). Response Code: 200, Response Text: OK
In my case, setting the environment variable AWS_CBOR_DISABLE=1 did the trick.

What is the difference between the Amazon S3 API calls GetObject and GetObjectRequest?

I am new to the Amazon S3 API and I am attempting to build a client using Go. I was confused about how to write a Get function to fetch an object from an S3 bucket. The documentation for the API calls is a little confusing to me: what is the difference between using the GetObject call and the GetObjectRequest call, and when is it appropriate to use one over the other?
Per the documentation:
Calling the request form of a service operation, which follows the naming pattern OperationNameRequest, provides a simple way to control when a request is built, signed, and sent. Calling the request form immediately returns a request object. The request object output is a struct pointer that is not valid until the request is sent and returned successfully.
So, use GetObject if you want to immediately send the request and wait for the response. Use GetObjectRequest if you prefer to construct the request but not send it till later.
For most scenarios, you'd probably just use GetObject.

What is the convention when using Boto3 clients vs resources?

So I have an API that makes calls to AWS services, and I am using Boto3 to do this within my Python application. My question deals with Boto3's client vs. resource access levels. I think I understand the difference between them (one is low-level access, the other is higher-level, object-oriented service access), but is it okay to instantiate both a client and a resource? For example, some functionality is easier to access using a resource rather than a client, but some functionality only the client has. Is it bad to instantiate both and use whichever access level is easiest when needed, or will there be some sort of disconnect when using two separate access levels to connect to the same service?
I am not running into any errors with my code to connect to SQS (shown below), but I want to make sure that down the line I am not shooting myself in the foot by arbitrarily choosing between the client and the resource for the same AWS connection.
import boto3

REGION = 'us-east-1'

sqs_r = boto3.resource('sqs', REGION)
sqs_c = boto3.client('sqs', REGION)

def create_queue(queue_name):
    queue_attributes = {
        'FifoQueue': 'true',
        'DelaySeconds': '0',
        'MessageRetentionPeriod': '900',  # 15 minutes to complete a command, else deleted.
        'ContentBasedDeduplication': 'true'
    }
    try:
        queue = sqs_r.get_queue_by_name(QueueName=queue_name)
    except:
        queue = sqs_r.create_queue(QueueName=queue_name, Attributes=queue_attributes)

def list_all_queues(queue_name_prefix=''):
    all_queues = sqs_c.list_queues(QueueNamePrefix=queue_name_prefix)
    print(all_queues['QueueUrls'])
    print(type(all_queues))
Both of the above functions work properly: one creates a queue and the other lists all of the queues in SQS. However, one function uses a resource and the other uses a client. Is this okay?
You can certainly use both.
The resource method actually uses the client method behind-the-scenes, so AWS only sees client-like calls.
In fact, the resource even contains a client. You can access it like this:
import boto3
s3 = boto3.resource('s3')
copy_source = {
    'Bucket': 'mybucket',
    'Key': 'mykey'
}
s3.meta.client.copy(copy_source, 'otherbucket', 'otherkey')
This example is from the boto3 documentation. It shows a client being extracted from a resource and used to make a client call, effectively identical to s3_client.copy().
Both client and resource just create a local object. There is no back-end activity involved.
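Applied to the SQS snippet above (purely as an illustrative sketch, reusing the region from the question), you could even drop the separate client and reuse the one embedded in the resource:
import boto3

REGION = 'us-east-1'

sqs_r = boto3.resource('sqs', region_name=REGION)
sqs_c = sqs_r.meta.client   # the same low-level client boto3.client('sqs', ...) would give you

# Client-only operations keep working through the embedded client.
all_queues = sqs_c.list_queues(QueueNamePrefix='')
print(all_queues.get('QueueUrls', []))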

How to constrain the client to send the correct sha256 as the file key on S3 upload? (presigned URL)

I need to create a signed URL to upload a file to an S3 bucket.
The S3 file key should be the file's sha256 hash.
The question then is: how can I make sure the client sends a valid hash? I'm creating the signed URL in my Lambda function and avoid passing the file through it, so the Lambda of course cannot calculate the hash.
I'm thinking I can achieve this using 2 steps:
Force the client to send its calculated sha256 with the upload. Based on the spec, I am assuming this will be auto-checked when it is provided in an x-amz-content-sha256 header.
Force the client to send the same hash to the Lambda so I can force it to be the key.
First, I tried this:
s3.getSignedUrl('putObject', { Key: userProvidedSha256 }, callback)
I tried adding a condition like { header: { 'X-Amz-Content-Sha256': userProvidedSha256 } }.
But I found no way of adding such a definition so that it actually forces the client to send an X-Amz-Content-Sha256 header.
Also, I would have taken the same approach to enforce a fixed, required Content-Length header (the client sends the desired length to the back-end, where we sign it), but I'm not sure that would work because of this issue.
Because I found out that s3.createPresignedPost also lets me limit max attachment size and appears more flexible, I went down that route:
const signPostFile = () => {
  const params = {
    Fields: {
      key: userProvidedSha256
    },
    Expires: 86400,
    Conditions: [
      ['content-length-range', 0, 10000000],
      { 'X-Amz-Content-Sha256': userProvidedSha256 }
    ]
  }
  s3.createPresignedPost(params, callback)
}
But while that works (it forces the client to send the enforced sha256 header, and the header gets passed; see the request log below), it looks like the client now has to add the x-amz-content-sha256 into the form fields rather than the headers. This seems to be as intended, but it clearly appears that S3 won't check the submitted file against the provided sha256: any file I append to the form is uploaded successfully even if its sha256 is a mismatch.
Any suggestion as to what's wrong, or how else I can enforce the sha256 condition while also limiting the content length?
Update: I'm using signature v4, and I've tried an S3 policy Deny for this condition:
Condition:
  StringEquals:
    s3:x-amz-content-sha256: UNSIGNED-PAYLOAD
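Expanded into a full bucket policy statement, that Deny would look roughly like this (a sketch only; the bucket name and the choice to apply it via boto3's put_bucket_policy are my own illustration, not part of the update above):
import json
import boto3

s3 = boto3.client('s3')

# Hypothetical policy denying uploads whose payload hash is not signed; 'my-upload-bucket' is a placeholder.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyUnsignedPayloads",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::my-upload-bucket/*",
        "Condition": {
            "StringEquals": {"s3:x-amz-content-sha256": "UNSIGNED-PAYLOAD"}
        }
    }]
}

s3.put_bucket_policy(Bucket='my-upload-bucket', Policy=json.dumps(policy))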
Relevant request log for submitting a file containing the string "hello world":
----------------------------986452911605138616518063
Content-Disposition: form-data; name="X-Amz-Content-Sha256"
b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
----------------------------986452911605138616518063
Content-Disposition: form-data; name="key"
b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
To my knowledge, S3 does not provide sha256 verification by default. However, by listening to S3 events you can implement a Lambda function that does this automatically for you. Here is a suggestion that comes to mind (a sketch of such a function follows after the steps):
The client requests an S3 signed URL based on the user-provided sha256
The client uploads the file using the signed URL
A Lambda function is configured to listen to s3:ObjectCreated:* events from the upload bucket
When the upload is completed, the Lambda function is triggered by an S3 message event; part of the event is the S3 object key
The Lambda function downloads the uploaded file and recalculates the sha256
The Lambda function deletes the file if the calculated sha256 value differs from the sha256 value provided by the client (either as the object key or available from the object's metadata)
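A minimal sketch of such a verification Lambda (assuming boto3, and assuming the claimed sha256 is stored as the object key as in the question; the handler wiring and names are illustrative, not part of the original answer):
import hashlib
import urllib.parse

import boto3

s3 = boto3.client('s3')

def handler(event, context):
    # Triggered by s3:ObjectCreated:* notifications on the upload bucket.
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # In this scheme the object key is the sha256 the client claimed.
        body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
        actual_sha256 = hashlib.sha256(body).hexdigest()

        # Remove the upload if the recalculated hash does not match the claimed one.
        if actual_sha256 != key:
            s3.delete_object(Bucket=bucket, Key=key)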
Alternatively, if the main objective is to verify the integrity of uploaded files, S3 provides another option that uses sha256 when calculating checksums:
Configure a bucket policy to only accept requests that have been signed
Configure the client AWS S3 sdk to use AWS signature version 4, e.g.
const s3 = new AWS.S3({apiVersion: '2006-03-01', signatureVersion: 'v4'});
The S3.putObject() function will sign the request before uploading the file
S3 will not store an object if the signature is wrong, as described in the AWS CLI S3 FAQ.
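For comparison, the equivalent client configuration with the Python SDK would look roughly like this (a sketch; the bucket and key names are placeholders):
import boto3
from botocore.config import Config

# Force Signature Version 4 so the payload hash is included in the signed request.
s3 = boto3.client('s3', config=Config(signature_version='s3v4'))

# put_object signs the request before uploading the file.
s3.put_object(Bucket='my-bucket', Key='my-key', Body=b'hello world')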