Restoring files on a version-enabled Amazon S3 bucket - amazon-web-services

I am trying to enable versioning and lifecycle policies on my Amazon S3 buckets. I understand that it is possible to enable Versioning first and then apply a Lifecycle policy to that bucket. The image below confirms this idea.
I then uploaded a file several times, which created several versions of the same file. I then deleted the file and can still see the several versions. However, when I try to restore the file, I see that the Initiate Restore option is greyed out.
Has anyone had a similar issue, or can you let me know what I am doing wrong?
Thanks,

Bucket Versioning on Amazon S3 keeps all versions of objects, even when they are deleted or when a new object is uploaded under the same key (filename).
As per your screenshot, all previous versions of the object are still available. They can be downloaded/opened in the S3 Management Console by selecting the desired version and choosing Open from the Actions menu.
If Versions: Hide is selected, then each object only appears once. Its contents are equal to the latest uploaded version of the object.
Deleting an object in a versioned bucket merely creates a Delete Marker as the most recent version. This makes the object appear as though it has been deleted, but the prior versions are still visible if you click the Versions: Show button at the top of the console. Deleting the Delete Marker will make the object reappear and the contents will be the latest version uploaded (before the deletion).
If you want a specific version of the object to be the "current" version, either:
Delete all versions since that version (making the desired version the latest version), or
Copy the desired version back to the same object (using the same key, which is the filename). This will add a new version, but the contents will be equal to the version you copied. The copy can be performed in the S3 Management Console -- just choose Copy and then Paste from the Actions Menu.
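If you prefer to do that copy programmatically, here is a minimal boto3 sketch of the same idea (the bucket name, key and version ID are placeholders); it copies a chosen older version back over the same key, which makes it the new current version:
import boto3

s3 = boto3.client('s3')

# Hypothetical values -- replace with your own bucket, key and version ID.
bucket = 'examplebucket'
key = 'path/to/file.txt'
old_version_id = 'VERSION_ID_TO_RESTORE'

# Copying the object onto itself with a VersionId in the CopySource makes
# that older content the newest (current) version of the key.
s3.copy_object(
    Bucket=bucket,
    Key=key,
    CopySource={'Bucket': bucket, 'Key': key, 'VersionId': old_version_id},
)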
Initiate Restore is used with Amazon Glacier, which is an archival storage system. This option is not relevant unless you have created a Lifecycle Policy to move objects to Glacier.

With the new console, you can do it as follows.
Click on the Deleted Objects button
You will see your deleted object below; select it
Click on More -> Undo delete

If you have a lot of deleted files to restore, you might want to use a script to do the job for you.
The script should:
Get the versions of the objects in your bucket using the Get object versions API
Inspect the version data to find the delete markers (i.e. deleted objects) and their key names and version IDs
Delete the markers found, using those key names and version IDs, with the Delete object API
Python example with boto:
This example script finds the delete markers and deletes them one at a time.
#!/usr/bin/env python
import boto

BUCKET_NAME = "examplebucket"
DELETE_DATE = "2015-06-08"  # only remove delete markers created on this date

bucket = boto.connect_s3().get_bucket(BUCKET_NAME)
for v in bucket.list_versions():
    # A delete marker that is the latest version means the object
    # currently appears as deleted.
    if (isinstance(v, boto.s3.deletemarker.DeleteMarker) and
            v.is_latest and
            DELETE_DATE in v.last_modified):
        # Deleting the marker itself "undeletes" the object.
        bucket.delete_key(v.name, version_id=v.version_id)
Python example with boto3:
However, if you have thousands of objects, this could be a slow process. AWS does provide a way to batch delete objects with a maximum batch size of 1000.
The following example script searches your objects under a prefix, tests whether they are deleted (i.e. the current version is a delete marker), and then batch deletes the markers. It is set to list 500 object versions per request and to delete the collected markers in batches of no more than 1,000 objects.
import boto3

client = boto3.client('s3')


def get_object_versions(bucket, prefix, max_key, key_marker):
    kwargs = dict(
        Bucket=bucket,
        EncodingType='url',
        MaxKeys=max_key,
        Prefix=prefix
    )
    if key_marker:
        kwargs['KeyMarker'] = key_marker
    response = client.list_object_versions(**kwargs)
    return response


def get_delete_markers_info(bucket, prefix, key_marker):
    markers = []
    max_markers = 500
    version_batch_size = 500
    while True:
        response = get_object_versions(bucket, prefix, version_batch_size, key_marker)
        key_marker = response.get('NextKeyMarker')
        delete_markers = response.get('DeleteMarkers', [])
        # Only keep markers that are the current (latest) version of their object
        markers = markers + [dict(Key=x.get('Key'), VersionId=x.get('VersionId'))
                             for x in delete_markers if x.get('IsLatest')]
        print('{0} -- {1} delete markers ...'.format(key_marker, len(markers)))
        if len(markers) >= max_markers or key_marker is None:
            break
    return {"delete_markers": markers, "key_marker": key_marker}


def delete_delete_markers(bucket, prefix):
    key_marker = None
    while True:
        info = get_delete_markers_info(bucket, prefix, key_marker)
        key_marker = info.get('key_marker')
        delete_markers = info.get('delete_markers', [])
        if len(delete_markers) > 0:
            # Batch delete removes up to 1000 keys/versions per request
            response = client.delete_objects(
                Bucket=bucket,
                Delete={
                    'Objects': delete_markers,
                    'Quiet': True
                }
            )
            print('Deleting {0} delete markers ... '.format(len(delete_markers)))
            print('Done with status {0}'.format(response.get('ResponseMetadata', {}).get('HTTPStatusCode')))
        else:
            print('No more delete markers found\n')
            break


delete_delete_markers(bucket='data-global', prefix='2017/02/18')

I have realised that I can perform an Initiate Restore operation once the object is stored on Glacier, as shown by the Storage Class of the object. To restore a previous copy on S3, the delete marker on the current object has to be removed.

Related

How to move a File from One folder to Another Folder in the same AWS S3 bucket using Lambda?

I am trying to automate file movement from one folder to another folder within the same S3 bucket, on the file-creation event in the S3 bucket.
I was hoping to use a Lambda function's triggers to do this, but I feel that Lambda triggers at the root directory level and cannot be used at the folder level.
Example:
Bucket Name: my-only-s3-bucket
Source Folder: s3://my-only-s3-bucket/Landing
Target Folder: s3://my-only-s3-bucket/Staging
Requirement:
When a file gets created or uploaded into the Source Folder, s3://my-only-s3-bucket/Landing, it should get moved to s3://my-only-s3-bucket/Staging automatically, without any manual intervention
How to achieve this?
I was hoping to use Lambda function's triggers to do this but I feel, Lambda triggers at the Root directory level and can not use it at the Folder Level.
This is not true. S3 has no concept of folders. You can trigger at any "level" using a filter prefix (e.g. prefix -> "Landing/") and/or a suffix (e.g. ".jpg").
The S3 trigger will call the Lambda and pass it the event with the new object as input. Then just use any language you are familiar with and call the built-in S3 copy function from any of the available AWS SDKs (.NET, Java, Python, etc.) to copy the object to the destination.
Example (Ruby SDK):
def object_copied?(
  s3_client,
  source_bucket_name,
  source_key,
  target_bucket_name,
  target_key)
  return true if s3_client.copy_object(
    bucket: target_bucket_name,
    copy_source: source_bucket_name + '/' + source_key,
    key: target_key
  )
rescue StandardError => e
  puts "Error while copying object: #{e.message}"
end
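For completeness, here is a minimal Python (boto3) sketch of the same idea, assuming the bucket's event notification is configured with a prefix filter of "Landing/" and using the Landing/Staging names from the question; the handler and prefix constants are just illustrative:
import urllib.parse

import boto3

s3 = boto3.client('s3')

SOURCE_PREFIX = 'Landing/'   # prefix the S3 trigger is filtered on
TARGET_PREFIX = 'Staging/'   # destination "folder"


def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys in S3 event notifications are URL-encoded
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # Build the destination key by swapping the prefix
        new_key = TARGET_PREFIX + key[len(SOURCE_PREFIX):]

        # S3 has no native "move": copy, then delete the original
        s3.copy_object(
            Bucket=bucket,
            Key=new_key,
            CopySource={'Bucket': bucket, 'Key': key},
        )
        s3.delete_object(Bucket=bucket, Key=key)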
I think the concept of a relative path can solve your problem. Here's a code snippet that solves your problem using a library called s3pathlib, an object-oriented S3 file system interface.
# import the library
from s3pathlib import S3Path

# define source and target folder
source_dir = S3Path("my-only-s3-bucket/Landing/")
target_dir = S3Path("my-only-s3-bucket/Staging/")

# let's say you have a new file in the Landing folder; the s3 uri is
s3_uri = "s3://my-only-s3-bucket/Landing/my-subfolder/data.csv"

# I guess you want to cut the file to the new location and delete the original one
def move_file(p_file, p_source_dir, p_target_dir):
    # validate that p_file is inside of p_source_dir
    if not p_file.uri.startswith(p_source_dir.uri):
        raise ValueError
    # find the new s3 path based on the relative path
    p_file_new = S3Path(
        p_target_dir, p_file.relative_to(p_source_dir)
    )
    # move
    p_file.move_to(p_file_new)
    # if you want a copy you can do p_file.copy_to(p_file_new)

# then let's do your work
if __name__ == "__main__":
    move_file(
        p_file=S3Path.from_s3_uri(s3_uri),
        p_source_dir=source_dir,
        p_target_dir=target_dir,
    )
If you want more advanced path manipulation, you can reference this document. The S3Path.change(new_abspath, new_dirpath, new_dirname, new_basename, new_fname, new_ext) method would be the most important one you need to know.

aws c++ sdk s3 ListObjects in oldest to newest order

With the AWS SDK for C++ ListObjectsRequest, is there a way to have the SDK return the newest objects in the bucket based on the value passed to SetMarker? In the code below, the problem is that the objects in the object_list vector are ordered newest to oldest, so setting "last_key" to the key of the "back()" element results in 0 results the next time through the loop (because there is nothing older than "last_key").
If I set "last_key" to the key value of "front()", I end up looping over the same bucket objects again.
Is there a way to use "last_key" that tells the SDK to get objects NEWER than "last_key"?
Aws::String last_key;
while (true) {
    Aws::S3::Model::ListObjectsRequest objects_request;
    objects_request.WithBucket(bucket);
    if (last_key.length() > 0) {
        objects_request.SetMarker(last_key);
    }
    Aws::S3::Model::ListObjectsOutcome outcome;
    outcome = s_client.ListObjects(objects_request);
    Aws::Vector<Aws::S3::Model::Object> object_list =
        outcome.GetResult().GetContents();
    // loop through object vector.
    last_key.assign(object_list.back().GetKey());
}
Typically, the marker you supply to a ListObjects call is the 'next marker' that was returned to you in the result of your previous call to ListObjects. It's a pagination marker. No, it doesn't support the notion of "later than", which is a timestamp-based concept, whereas the marker relates to S3 object keys, not timestamps.
Also note: you should use ListObjectsV2 rather than ListObjects.
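If you do need the listing in time order, one workable approach is to page through the full listing and sort by LastModified on the client side, since the service itself only orders by key. A minimal sketch of that idea in Python/boto3 rather than C++ (the bucket name is a placeholder):
import boto3

s3 = boto3.client('s3')

objects = []
# ListObjectsV2 pages are linked by a continuation token, the V2
# counterpart of the marker discussed above.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='examplebucket'):
    objects.extend(page.get('Contents', []))

# The service returns keys in lexicographic order; sort by timestamp ourselves.
objects.sort(key=lambda obj: obj['LastModified'])
for obj in objects:
    print(obj['LastModified'], obj['Key'])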

Amazon S3 copying multiple files from one bucket to another

I have a large list of objects in the source S3 bucket and I want to selectively copy a subset of those objects to a destination bucket.
As per the doc here, it seems it is possible with TransferManager.copy(from_bucket, from_key, to_bucket, to_key); however, I need to do it one object at a time.
Is anyone aware of other ways, preferably copying in a batched fashion instead of calling copy() for each object?
If you wish to copy a whole directory, you could use the AWS Command-Line Interface (CLI):
aws s3 cp --recursive s3://source-bucket/folder/ s3://destination-bucket/folder/
However, since you wish to selectively copy files, there's no easy way to indicate which files to copy (unless they all have the same prefix).
Frankly, when I need to copy selective files, I actually create an Excel file with a list of filenames. Then, I create a formula like this:
="aws s3 cp s3://source-bucket/"&A1&" s3://destination-bucket/"
Then just use Fill Down to replicate the formula. Finally, copy the commands and paste them into a Terminal window.
If you are asking whether there is a way to programmatically copy multiple objects between buckets using one API call, then the answer is no, this is not possible. Each API call will only copy one object. You can, however, issue multiple copy commands in parallel to make things go faster.
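As a rough sketch of that parallel approach in Python/boto3 (the bucket names and key list are placeholders), copy_object can simply be fanned out over a thread pool:
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client('s3')

SOURCE_BUCKET = 'source-bucket'       # placeholder
DEST_BUCKET = 'destination-bucket'    # placeholder
keys_to_copy = ['folder/file1.svg', 'folder/file2.png']  # your selected subset


def copy_one(key):
    # Each call still copies exactly one object; the parallelism is what
    # speeds things up.
    s3.copy_object(
        Bucket=DEST_BUCKET,
        Key=key,
        CopySource={'Bucket': SOURCE_BUCKET, 'Key': key},
    )
    return key


with ThreadPoolExecutor(max_workers=10) as pool:
    for done_key in pool.map(copy_one, keys_to_copy):
        print('Copied', done_key)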
I think it's possible via the S3 console, but using the SDK there's no such option. Although this isn't the solution to your problem, the following script selectively copies objects one at a time; if you're reading the list from an external file, it's just a matter of entering your file names there.
ArrayList<String> filesToBeCopied = new ArrayList<String>();
filesToBeCopied.add("sample.svg");
filesToBeCopied.add("sample.png");

String from_bucket_name = "bucket1";
String to_bucket = "bucket2";

BasicAWSCredentials creds = new BasicAWSCredentials("<key>", "<secret>");
final AmazonS3 s3 = AmazonS3ClientBuilder.standard().withRegion(Regions.AP_SOUTH_1)
        .withCredentials(new AWSStaticCredentialsProvider(creds)).build();

ListObjectsV2Result result = s3.listObjectsV2(from_bucket_name);
List<S3ObjectSummary> objects = result.getObjectSummaries();

try {
    for (S3ObjectSummary os : objects) {
        String bucketKey = os.getKey();
        if (filesToBeCopied.contains(bucketKey)) {
            s3.copyObject(from_bucket_name, bucketKey, to_bucket, bucketKey);
        }
    }
} catch (AmazonServiceException e) {
    System.err.println(e.getErrorMessage());
    System.exit(1);
}

Is it possible to rename an AWS Lambda function?

I have created some AWS Lambda functions for testing purposes (named test_function something); after testing, I found that those functions could be used in the prod environment.
Is it possible to rename an AWS Lambda function, and how?
Or should I create a new one and copy-paste the source code?
The closest you can get to renaming the AWS Lambda function is using an alias, which is a way to name a specific version of an AWS Lambda function. The actual name of the function though, is set once you create it. If you want to rename it, just create a new function and copy the exact same code into it. It won't cost you any extra to do this (since you are only charged for execution time) so you lose nothing.
For a reference on how to name versions of the AWS Lambda function, check out the documentation here: Lambda function versions
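A minimal sketch of creating such an alias with boto3 (the function and alias names are placeholders):
import boto3

lambda_client = boto3.client('lambda')

# Publish the current code/configuration as an immutable numbered version.
version = lambda_client.publish_version(FunctionName='test_function')['Version']

# Point a human-friendly alias at that version; callers can then invoke
# 'test_function:prod' without the function itself being renamed.
lambda_client.create_alias(
    FunctionName='test_function',
    Name='prod',
    FunctionVersion=version,
)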
You cannot rename the function; your only option is to follow the suggestions already provided here, or create a new one and copy-paste the code.
It's actually a good thing that you cannot rename it: if you could, the function would cease to work, because the policies attached to it would still point to the old name, unless you edited every single one of them manually or made them generic (which is ill-advised).
However, as a best practice in terms of software development, I suggest you to always keep production and testing (staging) separate, effectively duplicating your environment.
This allows you to test stuff on a safe environment, where if you make a mistake you don't lose anything important, and when you confirm that your new features work, replicate them in production.
So in your case, you would have two lambdas, one called 'my-lambda-staging' and the other 'my-lambda-prod'. Use the ENV variables of lambdas to adapt to the current environment, so you don't need to refactor!
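A minimal sketch of that environment-variable pattern (the STAGE variable name and the config values here are just examples):
import os

# Set STAGE=staging on my-lambda-staging and STAGE=prod on my-lambda-prod
# in each function's environment variables.
STAGE = os.environ.get('STAGE', 'staging')

CONFIG = {
    'staging': {'table': 'orders-staging'},
    'prod': {'table': 'orders-prod'},
}


def lambda_handler(event, context):
    # The same code runs in both environments; only the configuration differs.
    table_name = CONFIG[STAGE]['table']
    return {'stage': STAGE, 'table': table_name}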
My solution is to export the function, create a new Lambda, then upload the .zip file to the new Lambda.
My solution for the Lambda rename is basically to use boto3 to describe the previous Lambda's configuration and download the previous Lambda's function code, then create a new Lambda from them. The triggers won't be carried over, though, so you need to add them back manually.
from boto3.session import Session
import pprint
import urllib3

pp = pprint.PrettyPrinter(indent=4)

session = Session(aws_access_key_id={YOUR_ACCESS_KEY},
                  aws_secret_access_key={YOUR_SECRET_KEY},
                  region_name='your_region')

PREV_FUNC_NAME = 'your_prev_function_name'
NEW_FUNC_NAME = 'your_new_function_name'


def prev_lambda_code(code_temp_path):
    '''
    Download the previous function's code from the pre-signed URL
    returned by get_function().
    '''
    code_url = code_temp_path
    http = urllib3.PoolManager()
    response = http.request("GET", code_url)
    if not 200 <= response.status < 300:
        raise Exception(f'Failed to download function code: {response}')
    return response.data


def rename_lambda_function(PREV_FUNC_NAME, NEW_FUNC_NAME):
    '''
    Copy the previous lambda function's configuration and code
    into a new function with the new name.
    '''
    lambda_client = session.client('lambda')
    prev_func_info = lambda_client.get_function(FunctionName=PREV_FUNC_NAME)

    if 'VpcConfig' in prev_func_info['Configuration']:
        VpcConfig = {
            'SubnetIds': prev_func_info['Configuration']['VpcConfig']['SubnetIds'],
            'SecurityGroupIds': prev_func_info['Configuration']['VpcConfig']['SecurityGroupIds']
        }
    else:
        VpcConfig = {}

    if 'Environment' in prev_func_info['Configuration']:
        Environment = prev_func_info['Configuration']['Environment']
    else:
        Environment = {}

    response = lambda_client.create_function(
        FunctionName=NEW_FUNC_NAME,
        Runtime=prev_func_info['Configuration']['Runtime'],
        Role=prev_func_info['Configuration']['Role'],
        Handler=prev_func_info['Configuration']['Handler'],
        Code={
            'ZipFile': prev_lambda_code(prev_func_info['Code']['Location'])
        },
        Description=prev_func_info['Configuration']['Description'],
        Timeout=prev_func_info['Configuration']['Timeout'],
        MemorySize=prev_func_info['Configuration']['MemorySize'],
        VpcConfig=VpcConfig,
        Environment=Environment,
        PackageType=prev_func_info['Configuration']['PackageType'],
        TracingConfig=prev_func_info['Configuration']['TracingConfig'],
        Layers=[Layer['Arn'] for Layer in prev_func_info['Configuration'].get('Layers', [])],
    )
    pp.pprint(response)


rename_lambda_function(PREV_FUNC_NAME, NEW_FUNC_NAME)

Adding a File object programmatically in Plone using PloneFormGen

I'm writing a PloneFormGen custom action adapter in order to add a File object to a folder from the File Field in the form. Here is the script:
target = context.filefolder
form = request.form
uid = str(DateTime().millis())
target.invokeFactory("File", id=uid, file=form['arquivo-do-cv_file'])
obj = target[uid]
"filefolder" is the name of a folder inside the parent folder for the PFG FormFolder. This script is configured to run with a Manager proxy role.
Problem is that the File objects created this way won't show the "Click here to download the file" link when I view them. The files can be downloaded though, if I suppress the "/view" part from the end of the URL. What am I missing when calling invokeFactory to create the File object?
UPDATE: What I meant is that I don't get the "filename - filetype, size in KBs (size in bytes)" link for the document, below the byline. When I create a File object using the normal Plone UI, it does show up.
I suspect nothing; I think that is the default behavior in Plone 4.
I just added a File and I don't see any "Click here to download the file".
And a quick search does not reveal the string "click here to download":
aclark@Alex-Clarks-MacBook-Pro:~/Developer/test-4.1/ > grep -ir "Click here to download" parts/omelette
parts/omelette/plone/app/jquerytools/browser/jquery.tools.plugins.js: (root.tagName == 'A' ? "<p>Click here to download latest version</p>" :
parts/omelette/plone/app/jquerytools/browser/jquery.tools.plugins.min.js:" or greater is required</h2><h3>"+(g[0]>0?"Your version is "+g:"You have no flash plugin installed")+"</h3>"+(a.tagName=="A"?"<p>Click here to download latest version</p>":"<p>Download latest version from <a href='"+k+"'>here</a></p>");if(a.tagName=="A")a.onclick=function(){location.href=k}}if(b.onFail){var d=b.onFail.call(this);if(typeof d=="string")a.innerHTML=d}}if(i)window[b.id]=document.getElementById(b.id);f(this,{getRoot:function(){return a},getOptions:function(){return b},getConf:function(){return c},
I don't have a Plone instance to test it, but try to call processForm() after invokeFactory. It will:
unmark creation flag;
rename object according to title;
reindex the object;
invoke the after_creation script and fire the ObjectInitialized event.
These actions are detailed on Object Construction Lifecycle. Maybe some of these actions are needed to create the KB information you're after (I'm hoping it's the index).
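Applied to the script from the question, that would look roughly like this (same field name and folder id as above; whether processForm() alone restores the download link is the part to verify):
target = context.filefolder
form = request.form

uid = str(DateTime().millis())
target.invokeFactory("File", id=uid, file=form['arquivo-do-cv_file'])
obj = target[uid]

# Finish the construction lifecycle: clear the creation flag, rename the
# object from its title, reindex it and fire the ObjectInitialized event.
obj.processForm()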