S3 Lifecycle Policy Delete All Objects WITHOUT A Certain Tag Value

While reading over the S3 Lifecycle Policy documentation I see that it's possible to expire (delete) S3 objects carrying a particular key=value tag pair, e.g.:
<LifecycleConfiguration>
<Rule>
<Filter>
<Tag>
<Key>key</Key>
<Value>value</Value>
</Tag>
</Filter>
transition/expiration actions.
...
</Rule>
</LifecycleConfiguration>
But is it possible to create a similar rule that deletes any object NOT matching the key=value pair? For example, any time my object is accessed I could update its tag with the current date, e.g., object-last-accessed=07-26-2019. Then I could create a Lambda function that each day deletes the current S3 Lifecycle policy and creates a new one containing a tag for each of the last 30 days. That lifecycle policy would automatically delete any object that has not been accessed in the last 30 days: anything last accessed more than 30 days ago would have a date value older than every value inside the lifecycle policy, and hence it would get deleted.
Here's an example of what I desire (note I added the made-up field <exclude>):
<LifecycleConfiguration>
<Rule>
<Filter>
<exclude>
<Tag>
<Key>last-accessed</Key>
<Value>07-30-2019</Value>
</Tag>
...
<Tag>
<Key>last-accessed</Key>
<Value>07-01-2019</Value>
</Tag>
</exclude>
</Filter>
transition/expiration actions.
...
</Rule>
</LifecycleConfiguration>
Is something like my made-up <exclude> element possible? I want to delete any S3 object that has not been accessed in the last 30 days (which is different from an object that is simply older than 30 days).

From what I understand, this is possible but via a different mechanism.
My solution is to take a slightly different approach: set a tag on every object, then alter that tag as you need.
So in your case, when the object is created, set object-last-accessed to "default". Do that through an S3 event trigger to a piece of Lambda, or at the point where the object is written to S3.
When the object is accessed, update the tag value to the current date.
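A minimal sketch of that tagging flow in Python with boto3 (the handler wiring, tag key, and date format are assumptions to adapt to your setup):
import datetime
import urllib.parse

import boto3

s3 = boto3.client("s3")
TAG_KEY = "object-last-accessed"  # tag key from the scheme above

def on_created(event, context):
    # Lambda handler for s3:ObjectCreated:* events: tag new objects "default".
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 event notifications arrive URL-encoded.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        _set_tag(bucket, key, "default")

def on_accessed(bucket, key):
    # Call this from whatever code path reads the object.
    _set_tag(bucket, key, datetime.date.today().isoformat())

def _set_tag(bucket, key, value):
    # Note: put_object_tagging replaces the object's entire tag set.
    s3.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": [{"Key": TAG_KEY, "Value": value}]},
    )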
If you already have a bucket full of objects, you can use S3 Batch Operations to set the tag to the current date, and use that as a reference point from which to assume the files were last accessed:
https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObjectTagging.html
Now set the lifecycle rule to remove objects with a tag of "default" after 10 days (or whatever you want).
Add additional rules to remove files tagged with a given date 10 days after that date. You will need to update the lifecycle rules periodically, but a single lifecycle configuration can contain up to 1,000 rules.
This doc gives details of the format for a rule:
https://docs.aws.amazon.com/AmazonS3/latest/API/API_LifecycleRule.html
I'd suggest something like this:
<LifecycleConfiguration>
<Rule>
<ID>LastAccessed Default Rule</ID>
<Filter>
<Tag>
<Key>object-last-accessed</Key>
<Value>default</Value>
</Tag>
</Filter>
<Status>Enabled</Status>
<Expiration>
<Days>10</Days>
</Expiration>
</Rule>
<Rule>
<ID>Last Accessed 2020-05-19 Rule</ID>
<Filter>
<Tag>
<Key>object-last-accessed</Key>
<Value>2020-05-19</Value>
</Tag>
</Filter>
<Status>Enabled</Status>
<Expiration>
<Date>2020-05-29</Date>
</Expiration>
</Rule>
</LifecycleConfiguration>
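To avoid hand-editing that XML every day, the daily job the question describes could regenerate the whole rule set; a rough sketch with boto3 (the bucket name, window size, and rule IDs are assumptions):
import datetime

import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"  # assumption: your bucket name

def rebuild_lifecycle(event=None, context=None):
    today = datetime.datetime.now(datetime.timezone.utc).date()
    rules = [{
        "ID": "LastAccessed Default Rule",
        "Filter": {"Tag": {"Key": "object-last-accessed", "Value": "default"}},
        "Status": "Enabled",
        "Expiration": {"Days": 10},
    }]
    # One rule per day in the trailing window, each expiring 10 days
    # after the tagged access date (mirroring the XML above).
    for back in range(30):
        day = today - datetime.timedelta(days=back)
        expire_at = datetime.datetime.combine(
            day + datetime.timedelta(days=10),
            datetime.time.min,
            tzinfo=datetime.timezone.utc,  # S3 expects midnight UTC
        )
        rules.append({
            "ID": f"Last Accessed {day.isoformat()} Rule",
            "Filter": {"Tag": {"Key": "object-last-accessed", "Value": day.isoformat()}},
            "Status": "Enabled",
            "Expiration": {"Date": expire_at},
        })
    # This call replaces the bucket's entire lifecycle configuration.
    s3.put_bucket_lifecycle_configuration(
        Bucket=BUCKET, LifecycleConfiguration={"Rules": rules}
    )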

Reading further on this, as I'm faced with this problem myself: an alternative is to use Object Lock retention, which allows you to set a default retention period on a bucket and then extend that retention period as the file is accessed. This works at a version level, i.e. each version is retained for a period, not the whole file, so it may not be suitable for everyone. More details in https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lock-overview.html#object-lock-retention-modes
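If you go that route, a sketch of the two calls involved with boto3 (the bucket name, mode, and day counts are assumptions, and note Object Lock can only be enabled when the bucket is created):
import datetime

import boto3

s3 = boto3.client("s3")

# One-time: default retention for every new object version in the bucket.
s3.put_object_lock_configuration(
    Bucket="my-locked-bucket",  # assumption: bucket created with Object Lock enabled
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "GOVERNANCE", "Days": 30}},
    },
)

# On access: push the version's retain-until date another 30 days out.
s3.put_object_retention(
    Bucket="my-locked-bucket",
    Key="some/object",
    Retention={
        "Mode": "GOVERNANCE",
        "RetainUntilDate": datetime.datetime.now(datetime.timezone.utc)
        + datetime.timedelta(days=30),
    },
)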

Related

AWS S3: Lifecycle rules for permanently deleting delete markers

I currently have versioning set up, and thus a delete results in a delete marker rather than a permanent delete, which is what I like.
Is there a lifecycle rule that I can add so that after 30 days since the delete markers were created, they are permanently deleted?
I only want to do this for delete markers, not for expiring anything else.
Straight from the documentation:
<LifecycleConfiguration>
<Rule>
...
<NoncurrentVersionExpiration>
<NoncurrentDays>30</NoncurrentDays>
</NoncurrentVersionExpiration>
</Rule>
</LifecycleConfiguration>
An important thing to note:
A delete marker with zero noncurrent versions is referred to as an expired object delete marker.
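If you'd rather have S3 clean up the markers themselves too, the documented companion setting is ExpiredObjectDeleteMarker; here is the same rule sketched with boto3 (the bucket name is a placeholder):
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-versioned-bucket",  # placeholder
    LifecycleConfiguration={"Rules": [{
        "ID": "expire-noncurrent-and-markers",
        "Filter": {},  # empty filter: applies to the whole bucket
        "Status": "Enabled",
        # Permanently delete versions 30 days after they become noncurrent...
        "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
        # ...and remove delete markers once no noncurrent versions remain.
        "Expiration": {"ExpiredObjectDeleteMarker": True},
    }]},
)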

AWS S3 Transfer file with expire days with s3cmd by perl

Can you help me with this script? I would like to transfer files to my bucket on AWS S3.
My code:
$cmd = "s3cmd -v -c /root/.s3cfg put /var/project_db_" . $date . ".sql.xz s3://bucket600";
My second code, the one I'm actually using, doesn't work:
$cmd = "s3cmd expire -v -c /root/.s3cfg put /var/project_db_" . $date . ".sql.xz --expiry-day=90 s3://bucket600";
Thank you for your help
I searched Google for s3cmd expire, and the first result took me to this page, which says this:
Advanced features
Object Expiration with s3cmd
You can set an object
expiration policy on a bucket, so that objects older than a particular
age will be deleted automatically. The expiration policy can have a
prefix, an effective date, and number of days to expire after.
s3cmd v2.0.0 can be used to set or review the policy:
[lxplus-cloud]$ s3cmd expire s3://dvanders-test --expiry-days 2
Bucket 's3://dvanders-test/': expiration configuration is set.
[lxplus-cloud]$ s3cmd getlifecycle s3://dvanders-test
<?xml version="1.0" ?>
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Rule>
<ID>ir0smpb610i0lthrl31jpxzegwximbcz3rrgb1he2yfxgudm</ID>
<Prefix/>
<Status>Enabled</Status>
<Expiration>
<Days>2</Days>
</Expiration>
</Rule>
</LifecycleConfiguration>
Additional s3cmd expire options include:
--expiry-date=EXPIRY_DATE
Indicates when the expiration rule takes effect. (only
for [expire] command)
--expiry-days=EXPIRY_DAYS
Indicates the number of days after object creation the
expiration rule takes effect. (only for [expire] command)
--expiry-prefix=EXPIRY_PREFIX
Identifying one or more objects with the prefix to
which the expiration rule applies. (only for [expire]
command)
So you use the --expiry-date or --expiry-days command-line option to do what you want. Note that s3cmd expire is its own command that sets a policy on the bucket; it is not combined with put. Run s3cmd expire s3://bucket600 --expiry-days=90 once, then upload with your ordinary s3cmd put command.
(This question has nothing at all to do with Perl.)
When you place a file into an S3 bucket, you need to provide the location (key) inside the bucket where you want the file to be stored. If the specified folder (which is only an abstraction over key prefixes) does not exist, it will be created.
In case you want to put the file at the root (/) of the bucket, the destination would be set as "s3://bucket600/" (notice the slash at the end).

change ACL policy to XACML

I'm trying to test a security method in MapReduce and I'm wondering if my approach makes sense.
I would like to transform the access control list (ACL) policy that exists in MapReduce into an XACML policy. To do that, I take the file where the ACL is defined, copy the name and value of each property, and put them in a policy following the XACML format.
This is the ACL definition:
<property>
<name>mapreduce.job.acl-modify-job</name>
<value>user </value>
</property>
<property>
<name>mapreduce.job.acl-view-job</name>
<value>user </value>
</property>
This is the policy in XACML:
<Policy PolicyId="GeneratedPolicy" RuleCombiningAlgId="urn:oasis:names:tc:xacml:1.0:rule-combining-algorithm:ordered-permit-overrides">
<Target>
<Subjects>
<Subject>
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">
<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">user </AttributeValue>
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:1.0:subject:subject-id" DataType="http://www.w3.org/2001/XMLSchema#string"/>
</SubjectMatch>
</Subject>
</Subjects>
<Resources>
<AnyResource/>
</Resources>
</Target>
<Rule RuleId="rule1" Effect="Permit">
<Target>
<Actions>
<Action>
<ActionMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">
<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">mapreduce.job.acl-view-job</AttributeValue>
<ActionAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:1.0:action:action-id" DataType="http://www.w3.org/2001/XMLSchema#string"/>
</ActionMatch>
</Action>
</Actions>
</Target>
</Rule>
<Rule RuleId="rule2" Effect="Deny"/>
</Policy>
Is this considered correct?
A couple of comments on your policy:
It uses XACML 2.0. That's old! Switch to XACML 3.0.
You have a whitespace in the value <AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">user </AttributeValue>. Get rid of it (unless you really mean to test on 'user ').
Your policy contains two rules:
the first one grants access if urn:oasis:names:tc:xacml:1.0:action:action-id == mapreduce.job.acl-view-job
the second one always denies access. I assume the intent is to deny access if no action matched. That's fine; I often call that a "catch-all" or safety harness. There is another way of achieving this, by using a combining algorithm on the policy called deny-unless-permit: if none of the rules apply, the policy will yield Deny. This only exists in XACML 3.0.
Your policy uses a combining algorithm called permit-overrides (urn:oasis:names:tc:xacml:1.0:rule-combining-algorithm:ordered-permit-overrides). Generally I avoid using it, because it means that in the case of both a Deny and a Permit, the Permit wins. That's too permissive for my liking. Use first-applicable (urn:oasis:names:tc:xacml:1.0:rule-combining-algorithm:first-applicable) instead. You can read up on combining algorithms in the XACML specification.
Ultimately, to make your policy scale, you may want to externalize the list of users rather than have a value for each user inside the policy. So rather than comparing your username to Alice or Bob or Carol, you would compare to an attribute called allowedUsers which you'd maintain inside a database for instance.
Another tip: you could make your policy easier to understand and more scalable by splitting the value mapreduce.job.acl-view-job into its different parts (appName="mapreduce"; objectType="job"; action="view"). That would let you write policies about viewing, editing, and deleting jobs more easily.

Multiple lifecycles s3cmd

I want to have multiple lifecycles for many folders in my bucket.
This seems easy if I use the web interface, but this needs to be an automated process, so, at least in my case, it must use s3cmd.
It works fine when I use:
s3cmd expire ...
But, somehow, every time I run this my last lifecycle gets overwritten.
There's an issue on github:
https://github.com/s3tools/s3cmd/issues/863
My question is: is there another way?
You made me notice I had the exact same problem as you. Another way to access the expire rules with s3cmd is to show the lifecycle configuration of the bucket:
s3cmd getlifecycle s3://bucketname
This way you get some XML-formatted text:
<?xml version="1.0" ?>
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Rule>
<ID>RULEIDENTIFIER</ID>
<Prefix>PREFIX</Prefix>
<Status>Enabled</Status>
<Expiration>
<Days>NUMBEROFDAYS</Days>
</Expiration>
</Rule>
<Rule>
<ID>RULEIDENTIFIER2</ID>
<Prefix>PREFIX2</Prefix>
<Status>Enabled</Status>
<Expiration>
<Days>NUMBEROFDAYS2</Days>
</Expiration>
</Rule>
</LifecycleConfiguration>
If you put that text in a file, changing the appropriate fields (put identifiers of your choice, set the prefixes you want, and set the number of days until expiration), you can now use the following command (changing FILE for the path where you put the rules):
s3cmd setlifecycle FILE s3://bucketname
That should work (in my case, now I see several rules when I execute the getlifecycle command, although I do not know yet if the objects actually expire or not).
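If hand-editing the XML gets tedious, the same read-modify-write cycle can be scripted; a sketch with boto3 (the bucket, rule ID, prefix, and day count are placeholders):
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def add_expiration_rule(bucket, rule_id, prefix, days):
    # Read the existing configuration (an empty one if none is set yet)...
    try:
        rules = s3.get_bucket_lifecycle_configuration(Bucket=bucket)["Rules"]
    except ClientError as e:
        if e.response["Error"]["Code"] != "NoSuchLifecycleConfiguration":
            raise
        rules = []
    # ...append the new rule, then write the whole set back, since the
    # put call (like s3cmd expire) replaces the entire configuration.
    rules.append({
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    })
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration={"Rules": rules}
    )

add_expiration_rule("bucketname", "logs-90d", "logs/", 90)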

Amazon S3 Object Lifecycle Management via header

I've been searching for an answer to this question for quite some time but apparently I'm missing something.
I use s3cmd heavily to automate document uploads to AWS S3, via script. One of the parameters that can be used in s3cmd is --add-header, which I assume allows for lifecycle rules to be added.
My objective is to add this parameter and specify a +X (where X is days) to the upload. In the event of ... --add-header=...1 ... the lifecycle rule would delete this file after 24h.
I know this can be easily done via the console, but I would like to have a more detailed control over individual files/scripts.
I've read the parameters that can be passed to S3 via s3cmd, but I somehow can't understand how to put all of those together to get the intended result.
Thank you very much for any help or assistance!
The S3 API itself does not implement support for any request header that triggers lifecycle management at the object level.
The --add-header option for s3cmd can add headers that S3 understands, such as Content-Type, but there is no lifecycle header you can send using any tool.
You might be thinking of this:
If you make a GET or a HEAD request on an object that has been scheduled for expiration, the response will include an x-amz-expiration header that includes this expiration date and the corresponding rule Id
https://aws.amazon.com/blogs/aws/amazon-s3-object-expiration/
This is a response header, and it is read-only.
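There is no per-object lifecycle header, but if the goal is per-upload control, one workaround is to tag each object at upload time and filter a lifecycle rule on that tag; a sketch with boto3 (the bucket, key, and tag names are made up, and note lifecycle expiration runs on a daily cycle, so "after 24h" is approximate):
import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"  # made-up name

# One-time setup: anything tagged delete-after=1day expires after one day.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={"Rules": [{
        "ID": "delete-after-1day",
        "Filter": {"Tag": {"Key": "delete-after", "Value": "1day"}},
        "Status": "Enabled",
        "Expiration": {"Days": 1},
    }]},
)

# Per upload: opt a file into that rule by tagging it.
with open("report.csv", "rb") as body:
    s3.put_object(
        Bucket=BUCKET,
        Key="reports/report.csv",
        Body=body,
        Tagging="delete-after=1day",  # URL-encoded key=value string
    )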