I'm trying to use the put_object_lock_configuration() API call to disable object locking on an Amazon S3 bucket using Python and boto3.
This is how I use it:
response = s3.put_object_lock_configuration(
    Bucket=bucket_name,
    ObjectLockConfiguration={'ObjectLockEnabled': 'Disabled'}
)
I always get an exception with the following error:
botocore.exceptions.ClientError: An error occurred (MalformedXML) when calling the PutObjectLockConfiguration operation: The XML you provided was not well-formed or did not validate against our published schema
I suspect I am missing the two parameters 'Token' and 'ContentMD5'. Does anyone know how I can get these values?
The only allowed value of 'ObjectLockEnabled' is 'Enabled'. My intention was to disable object lock, but that is not possible, because object lock is defined at bucket creation time and cannot be changed afterwards. However, I can provide an empty retention rule, and the retention mode then becomes 'None', which is essentially no object lock.
Here is the boto3 code for a blank retention rule; the precondition is that the object was locked with mode=GOVERNANCE in the first place.
# assumes client = boto3.client('s3')
client.put_object_retention(
    Bucket=bucket_name,
    Key=object_key,
    Retention={},                    # empty rule clears the retention settings
    BypassGovernanceRetention=True   # required to override GOVERNANCE-mode retention
)
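For completeness, and to explain the MalformedXML error in the question: the only well-formed shape of put_object_lock_configuration enables the configuration (there is no 'Disabled' value) and may set a default retention rule. A minimal sketch, with a placeholder bucket name that is assumed to have been created with object lock enabled:
import boto3

s3 = boto3.client('s3')

s3.put_object_lock_configuration(
    Bucket='my-locked-bucket',  # placeholder; the bucket must already have object lock enabled
    ObjectLockConfiguration={
        'ObjectLockEnabled': 'Enabled',  # 'Enabled' is the only accepted value
        'Rule': {'DefaultRetention': {'Mode': 'GOVERNANCE', 'Days': 1}}
    }
)
The 'Token' and 'ContentMD5' parameters from the question are not what is missing; the error comes from 'Disabled' not being a valid value, as noted above.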
I am trying to retrieve a deleted file from s3, without restoring it. I can do it through the console (clicking Download), but I need to do it programmatically.
The following call lists the version I need:
s3.list_object_versions(
    Bucket="...",
    KeyMarker="/.../part-00000-5ceb032b-c918-47df-a2ad-f02f3790077a-c000.csv",
    VersionIdMarker="A1GxocexjsirkzKfo47lvQ0r7ythwCWM",
    MaxKeys=1
)
However, s3.get_object() with the same parameters returns "ClientError: An error occurred (NoSuchVersion) when calling the GetObject operation: The specified version does not exist."
What is the proper way of retrieving a specific version of a deleted file?
Based on the comments, the issue was caused by a leading / in the key prefix. Prefixes do not start with a /, so it should be:
KeyMarker=".../part-00000-5ceb032b-c918-47df-a2ad-f02f3790077a-c000.csv"
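With the corrected key, the specific version can then be downloaded directly via get_object with a VersionId. A minimal sketch using the identifiers from the question (the bucket name and the elided part of the key are placeholders):
import boto3

s3 = boto3.client('s3')

# Fetching an explicit version works even when the current "version" is a delete marker.
response = s3.get_object(
    Bucket="my-bucket",  # placeholder
    Key=".../part-00000-5ceb032b-c918-47df-a2ad-f02f3790077a-c000.csv",
    VersionId="A1GxocexjsirkzKfo47lvQ0r7ythwCWM"
)
data = response["Body"].read()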
I have two API calls I want to make to AWS:
put item into s3
write a row to DynamoDB
I'd like either both to happen, or if there's an error, neither to happen.
Is it possible to achieve that using boto3?
This isn't possible to do automatically; there is no facility in Boto3 to mark multiple actions as atomic. You will need to write code that checks the response codes and catches exceptions from both of those actions, and then skips or rolls back the other action.
For example, if the PUT of the object to S3 has already succeeded but the DynamoDB insert fails, you would have to capture that failure and then run an S3 delete operation to remove the object.
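A rough sketch of that compensation pattern, with placeholder bucket, table, key, and item values:
import boto3

s3 = boto3.client('s3')
dynamodb = boto3.client('dynamodb')

BUCKET = 'my-bucket'    # placeholder
TABLE = 'my-table'      # placeholder
KEY = 'data/item.json'  # placeholder

# Step 1: put the object into S3 first.
s3.put_object(Bucket=BUCKET, Key=KEY, Body=b'{"hello": "world"}')

try:
    # Step 2: write the row to DynamoDB.
    dynamodb.put_item(TableName=TABLE, Item={'pk': {'S': KEY}})
except Exception:
    # Step 3: roll back the S3 put so that neither action takes effect.
    s3.delete_object(Bucket=BUCKET, Key=KEY)
    raise
Note that the rollback itself can fail, so this only approximates atomicity; true transactional behavior across S3 and DynamoDB isn't available.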
I'm trying to restore files that a lifecycle rule transitioned to Glacier Deep Archive. When I try to restore them to a different directory with the AWS CLI command below, it throws an error after downloading a few files.
Command used to restore the directory:
aws s3 cp s3://xxxxxxx/cf-ant-prod/year=2020/ s3://xxxxxxxx/atest/ --force-glacier-transfer --storage-class STANDARD --recursive --profile mfa
Error: An error occurred (InvalidObjectState) when calling the CopyObject operation: Operation is not valid for the source object's storage class
As mentioned on your other question, the --force-glacier-transfer parameter does not restore objects stored in Glacier. It is simply a way to avoid warning notices.
To retrieve from Glacier Deep Archive you will need to:
Use restore-object to change the Storage Class to Standard or Standard-IA -- this will take some time to restore
Copy the file to your desired location
It is not possible to do an instant restore or a Restore+Copy.
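A rough boto3 sketch of that two-step flow, with placeholder names; restore_object starts the retrieval, head_object's Restore header shows when the temporary copy is ready, and only then will the copy succeed:
import time
import boto3

s3 = boto3.client('s3')

SRC_BUCKET, SRC_KEY = 'xxxxxxx', 'cf-ant-prod/year=2020/example-object'  # placeholders
DST_BUCKET, DST_KEY = 'xxxxxxxx', 'atest/example-object'                 # placeholders

# Step 1: request a temporary restored copy from Deep Archive.
s3.restore_object(
    Bucket=SRC_BUCKET,
    Key=SRC_KEY,
    RestoreRequest={'Days': 7, 'GlacierJobParameters': {'Tier': 'Standard'}}
)

# Step 2: wait for the restore to finish (this takes hours for Deep Archive).
while True:
    head = s3.head_object(Bucket=SRC_BUCKET, Key=SRC_KEY)
    if 'ongoing-request="false"' in head.get('Restore', ''):
        break
    time.sleep(300)

# Step 3: now the copy (with the desired storage class) will succeed.
s3.copy_object(
    Bucket=DST_BUCKET,
    Key=DST_KEY,
    CopySource={'Bucket': SRC_BUCKET, 'Key': SRC_KEY},
    StorageClass='STANDARD'
)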
As mentioned by John Rotenstein, it appears a simple restore of an object from Glacier must be done "in place"; once restored, it can be manipulated (copied) as needed.
I was attempting to do something similar to the question topic via Lambda and I struggled for a while because I found the documentation to be murky regarding the fact that restoreObject() requests are either an SQL Select object restoration OR a simple single object restore... and most significantly which parameters apply to which operational mode.
My goal was to restore an object out of Glacier to a new location/file name in the same bucket. The documentation strongly suggests that this is possible because there are parameters within OutputLocation that allow a BucketName and Prefix to be specified, but as it turns out those parameters only apply to SQL Select object restoration.
The confusing part for me was the parameters of the restoreObject() method: there isn't sufficient differentiation to know that you can't, for example, provide the Description parameter when making a simple restore request using the GlacierJobParameters parameter. What was frustrating was that I would get errors such as:
MalformedXML: The XML you provided was not well-formed or did not validate against our published schema
There was no indication as to where the published schema is located and Googling for the published schemas yielded no results that seemed to apply to the S3 API... my hope was that I could get out of the API documentation and directly refer to the "published schema"... (published where/how?)
My suggestion would be that the documentation for the restoreObject() method be improved, and/or that the method be split into a simpleRestoreObject() and an sqlRestoreObject() call so that the parameter schemas are cleanly distinct.
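A minimal sketch of the simple (non-SELECT) restore request shape, with placeholder names; per the point above, fields such as Description and OutputLocation belong to the SELECT-type restore and are omitted here:
import boto3

s3 = boto3.client('s3')

# Simple in-place restore: only the retrieval parameters are supplied.
s3.restore_object(
    Bucket='my-bucket',         # placeholder
    Key='archived/object.dat',  # placeholder
    RestoreRequest={
        'Days': 7,
        'GlacierJobParameters': {'Tier': 'Bulk'}
    }
)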
Restoring objects from S3 Glacier Deep Archive (or Glacier, for that matter) must be done individually, and before copying those objects to some other location.
One way to accomplish this is by first retrieving the list of objects in the desired folder using s3 ls, for example
aws s3 ls s3://xxxxxxx/cf-ant-prod/year=2020/ --recursive
and, using each of those object names, running a restore command individually:
aws s3api restore-object --bucket xxxxxxx --key <keyName> --restore-request Days=7
This will initiate a standard restore request for each object, so expect this to take 12-24 hours. Then, once the restores are complete, you are free to copy those objects using your above syntax.
Another option would be to use a tool such as s3cmd, which supports recursive restores given a bucket and folder. However, you'll still have to wait for the restore requests to complete before running a cp command.
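A boto3 sketch of the same recursive approach, assuming placeholder names; it pages through the prefix and files a restore request for each key:
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')
BUCKET = 'xxxxxxx'                 # placeholder
PREFIX = 'cf-ant-prod/year=2020/'  # prefix from the question

paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get('Contents', []):
        try:
            s3.restore_object(
                Bucket=BUCKET,
                Key=obj['Key'],
                RestoreRequest={'Days': 7}
            )
        except ClientError as err:
            # e.g. RestoreAlreadyInProgress, or objects not in an archived storage class
            print(f"Skipping {obj['Key']}: {err.response['Error']['Code']}")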
I have written a Lambda function which gets invoked automatically when a file comes into my S3 bucket.
I perform certain validations on this file, modify its contents, and put the file back at the same location.
Due to this "put", my lambda is called again and the process goes on till my lambda execution times out.
Is there any way to trigger this lambda only once?
I found an approach where I can store the file name in DynamoDB and check it in the Lambda function, but is there another approach that avoids using DynamoDB?
You have a couple options:
You can put the file to a different location in s3 and delete the original
You can add a metadata field to the S3 object when you update it, then check for the presence of that field so you know whether you have already processed it (see the sketch below). This might not work perfectly, since S3 does not always provide the most recent data on reads after updates.
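A rough sketch of that metadata approach, assuming a hypothetical flag named processed and a placeholder validate_and_modify() for your existing logic; the handler exits early when the flag is already present:
import boto3
from urllib.parse import unquote_plus

s3 = boto3.client('s3')

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])

        # Skip objects this function has already written back.
        head = s3.head_object(Bucket=bucket, Key=key)
        if head['Metadata'].get('processed') == 'true':  # hypothetical flag name
            continue

        obj = s3.get_object(Bucket=bucket, Key=key)
        new_body = validate_and_modify(obj['Body'].read())  # placeholder for your validation logic

        # Re-upload with the flag so the next invocation is a no-op.
        s3.put_object(Bucket=bucket, Key=key, Body=new_body,
                      Metadata={'processed': 'true'})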
AWS allows different types of S3 event triggers. You can try playing with s3:ObjectCreated:Put vs s3:ObjectCreated:Post.
You can upload your files in a folder, say
s3://bucket-name/notvalidated
and store the validated files in another folder, say
s3://bucket-name/validated.
Update your S3 event notification to invoke your Lambda function whenever there is an ObjectCreated (All) event under the notvalidated/ prefix, as sketched below.
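A sketch of that event-notification filter in boto3, using the folder name above and a placeholder Lambda ARN, so only keys under notvalidated/ invoke the function:
import boto3

s3 = boto3.client('s3')

s3.put_bucket_notification_configuration(
    Bucket='bucket-name',  # bucket from the example above
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:validate',  # placeholder ARN
            'Events': ['s3:ObjectCreated:*'],
            'Filter': {
                'Key': {'FilterRules': [{'Name': 'prefix', 'Value': 'notvalidated/'}]}
            }
        }]
    }
)
The Lambda function also needs a resource policy that allows S3 to invoke it; the console adds this automatically when you create the trigger there.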
The second answer does not seem to be correct (Put vs Post): there is no real concept of an "update" in S3 in terms of POST or PUT; the request that overwrites an object looks the same as the one that initially created it. See the S3 notification documentation for details on the available S3 events.
I had this exact problem last year: I was doing an image resize on PUT, and every time a file was overwritten, the function would be triggered again. My recommended solution would be to have two folders in your S3 bucket, one for the original file and one for the finalized file. You could then create the Lambda trigger with a prefix filter so it only fires for files in the original folder.
Events are triggered in S3 based on whether the object is Put, Post, Copy, or CompleteMultipartUpload; all of these operations correspond to ObjectCreated, as per the AWS documentation:
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
The best solution is to restrict your S3 object-created event to a particular bucket location (prefix), so that only changes in that location trigger the Lambda function.
You can then write the modified object to another location that is not configured to trigger the Lambda function.
Hope it helps!
If I update an object in an S3 Bucket, and trigger on that S3 PUT event as my Lambda trigger, is there a chance that the Lambda could operate on the older version of that object given S3’s eventual consistency model?
I’m having a devil of a time parsing out an authoritative answer either way...
Yes, there is a possibility that a blind GET of an object could fetch a former version.
There are at least two solutions that come to mind.
Weak: the notification event data contains the etag of the newly-uploaded object. If the object you fetch doesn't have this same etag in its response headers, then you know it isn't the intended object.
Strong: enable versioning on the bucket. The event data then contains the object versionId. When you download the object from S3, specify this exact version in the request. The consistency model is not as well documented when you overwrite an object and then download it with a specific version-id, so it is possible that this might result in an occasional 404 -- in which case, you almost certainly just spared yourself from fetching the old object -- but you can at least be confident that S3 will never give you a version other than the one explicitly specified.
If you weren't already using versioning on the bucket, you'll want to consider whether to keep old versions around, or whether to create a lifecycle policy to purge them... but one brilliantly-engineered feature about versioning is that the parts of your code that were written without awareness of versioning should still function correctly with versioning enabled -- if you send non-versioning-aware requests to S3, it still does exactly the right thing... for example, if you delete an object without specifying a version-id and later try to GET the object without specifying a version-id, S3 will correctly respond with a 404, even though the "deleted" version is actually still in the bucket.
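A sketch of the "strong" option inside the Lambda handler; the version id comes straight from the event record, so the GET can only ever return the version that fired the trigger (process() is a placeholder for your own logic):
import boto3
from urllib.parse import unquote_plus

s3 = boto3.client('s3')

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])
        version_id = record['s3']['object']['versionId']  # present once versioning is enabled

        # Fetch exactly the version that triggered this invocation.
        obj = s3.get_object(Bucket=bucket, Key=key, VersionId=version_id)
        process(obj['Body'].read())  # placeholder for your processing logic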
How does the file get there in the first place? I'm asking because, if you could reverse the order, it would solve your issue: put the file into S3 via a Lambda that, before overwriting the file, first gets the existing version from the bucket and does whatever you need.