Combining AWS ECR lifecycle rules using "and"

I would like to set up an AWS ECR lifecycle policy such that an image is expired iff it is older than 90 days and there are at least 10 more recent images. In other words, I want to keep all images that are newer than 90 days, and I want to keep at least the newest 10 images regardless of how old they are.
If I am reading the documentation correctly, this is not possible: an image is expired by exactly one or zero rules, and a single rule cannot specify both sinceImagePushed and imageCountMoreThan.
Is this correct? Is there any workaround?
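For reference, a minimal sketch of the 90-day rule as it can be written today, with a hypothetical repository name; the selection block accepts only a single countType, so there is no field through which an imageCountMoreThan condition could be added to the same rule to get the desired "and" semantics:

```python
import json
import boto3

ecr = boto3.client("ecr")

# Hypothetical single-rule policy: 'selection' takes exactly one countType,
# so the 90-day condition below cannot be combined with imageCountMoreThan
# in the same rule.
policy = {
    "rules": [
        {
            "rulePriority": 1,
            "description": "Expire images older than 90 days",
            "selection": {
                "tagStatus": "any",
                "countType": "sinceImagePushed",
                "countUnit": "days",
                "countNumber": 90,
            },
            "action": {"type": "expire"},
        }
    ]
}

ecr.put_lifecycle_policy(
    repositoryName="my-repo",  # hypothetical repository name
    lifecyclePolicyText=json.dumps(policy),
)
```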

Amazon AWS S3 Lifecycle rule exception?

I've got a few S3 buckets that I'm using as a storage backend for Duplicacy, which stores its metadata in chunks right alongside the backup data.
I currently have a lifecycle rule to move all objects with the prefix "chunks/" to Glacier Deep Archive. The problem is that I then can't list the contents of a backup revision, because some of those chunks contain backup metadata that's needed to list, initiate a restore, etc.
The question is: is there a way to apply some tag to certain objects so that, even though they are in the "chunks/" folder, they are exempt from the lifecycle rule?
I'm looking for a solution to basically the same problem.
I've seen this, which seems consistent with what I'm finding: it can't be done in a straightforward fashion. That post is a few years old, though; I'll be disappointed if this is still the answer.
I expected to see the exclude use case covered in these examples, but no luck.
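For what it's worth, as far as I know S3 lifecycle filters only match positively on prefix and/or tags; there is no way to express "exclude objects with this tag". A possible workaround, sketched below with a hypothetical bucket name and tag, is to invert the logic: tag the chunks you do want archived and filter the rule on prefix plus that tag, leaving untagged metadata chunks alone.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical inverted rule: instead of excluding metadata chunks, only
# objects under "chunks/" that also carry archive=true are transitioned.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-duplicacy-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-tagged-chunks",
                "Status": "Enabled",
                # Filter.And combines a prefix with tags; both must match.
                "Filter": {
                    "And": {
                        "Prefix": "chunks/",
                        "Tags": [{"Key": "archive", "Value": "true"}],
                    }
                },
                "Transitions": [
                    {"Days": 0, "StorageClass": "DEEP_ARCHIVE"}
                ],
            }
        ]
    },
)
```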

AWS Lambda vs AWS step function

I am designing an application whose input is a large text file (1-30 GB) uploaded to an S3 bucket every 15 minutes. The application splits the file into n smaller files and copies them to 3 different S3 buckets in 3 different AWS regions. Then 3 loader applications read these n files from their respective S3 buckets and load the data into their respective Aerospike clusters.
I am thinking of using AWS Lambda functions to split the file as well as to load the data. I recently came across AWS Step Functions, which, based on what I read, could also serve the purpose. I am not sure which one to go with and which will be cheaper. Any help is appreciated.
Thanks in advance!
Lambda and Step Functions are like floors and the steps between floors; you cannot replace one with the other.
Lambda does the computing, while Step Functions moves the work to the desired step.
This YouTube video explains it very well: https://www.youtube.com/watch?v=Dh7h3lkpeP4
To extend the analogy, you can have multiple compute units (Lambdas) on a single floor before you pass the work on to the next floor.
One example use case is shown below.
Use case: https://john.soban.ski/transcribe-customer-service-voicemails-and-alert-on-keywords.html
Hope it helps.
Step Functions is excellent at coordinating workflows that involve multiple predefined steps. It handles parallel tasks and error handling well, and it typically uses Lambda functions to perform each task.
Based on your use case, Step Functions sounds like a good fit. As far as pricing, it adds a very small additional charge on top of Lambda; based on your description, I doubt you'd even notice it. You'd need to evaluate that based on the number of "state transitions" you would be using. Of course, you'll also have to pay for your Lambda invocations.
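As a rough sketch of how the split-then-load pipeline described above could be coordinated (all ARNs, state names, and loader functions here are hypothetical), an Amazon States Language definition might use one Task state for the split Lambda followed by a Parallel state with one branch per destination:

```python
import json
import boto3

# Hypothetical Lambda ARNs for the split step and the three loaders
# (one loader per destination bucket/region).
SPLIT_ARN = "arn:aws:lambda:us-east-1:123456789012:function:split-file"
LOAD_ARNS = [
    "arn:aws:lambda:us-east-1:123456789012:function:load-us-east-1",
    "arn:aws:lambda:us-east-1:123456789012:function:load-eu-west-1",
    "arn:aws:lambda:us-east-1:123456789012:function:load-ap-south-1",
]

# Amazon States Language definition: split the file, then run the three
# loaders in parallel, retrying the split step on failure.
definition = {
    "StartAt": "SplitFile",
    "States": {
        "SplitFile": {
            "Type": "Task",
            "Resource": SPLIT_ARN,
            "Retry": [{"ErrorEquals": ["States.ALL"], "MaxAttempts": 2}],
            "Next": "LoadAllRegions",
        },
        "LoadAllRegions": {
            "Type": "Parallel",
            "Branches": [
                {
                    "StartAt": f"Load{i}",
                    "States": {
                        f"Load{i}": {"Type": "Task", "Resource": arn, "End": True}
                    },
                }
                for i, arn in enumerate(LOAD_ARNS)
            ],
            "End": True,
        },
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="split-and-load",  # hypothetical state machine name
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/sfn-role",  # hypothetical role
)
```

Each state transition in an execution of a machine like this is what Step Functions bills for, on top of the Lambda invocations themselves.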

Is there an alternative way to know when a DynamoDB table was decreased?

I have a weird problem with one of my tables in DynamoDB. When I make a request to describe it, I find that it was decreased three times today, while in the AWS console I can see only one scale-down, which coincides with the one returned by LastDecreaseDateTime when calling describe_table(TableName="tableName") with the boto3 library.
Is there any other way to check when the other decrease actions were executed?
Also, is it possible that DynamoDB is fooling me somehow? I am a little bit lost here, because all I can see from the metrics tab in the console is that it was decreased just once. I have other tables configured exactly the same way and they work like a charm.
CloudTrail will record all UpdateTable API calls. Enable CloudTrail, and the next time this happens you will be able to see every call.
If you scaled down multiple times within 5 minutes, you will not see that reflected in the Provisioned Capacity metrics, since they have a 5-minute resolution.
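Once CloudTrail is recording, a quick way to pull the UpdateTable history is the lookup_events API; a minimal sketch with boto3 (the client-side filter on the table name below is a hypothetical convenience):

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Look up recent UpdateTable calls recorded by CloudTrail.
events = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "EventName", "AttributeValue": "UpdateTable"}
    ],
    MaxResults=50,
)

for event in events["Events"]:
    # Filter client-side for the table in question (hypothetical name).
    if "tableName" in event.get("CloudTrailEvent", ""):
        print(event["EventTime"], event["EventName"])
```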

S3 life-cycle is not working

My S3 lifecycle policy is not working. I configured this lifecycle policy 2 days ago, but my objects are still showing as S3 RRS.
Any help would be appreciated.
The process that archives files to Glacier can take up to 48 hours to take effect. However, you will receive the billing benefit immediately.
So, check again after 48 hours to confirm whether it is working as expected.
By the way, RRS (Reduced Redundancy Storage) stores data in only two data centers instead of the normal three. Historically it was lower cost, but these days it is not cheaper, and is possibly more expensive, so there is no benefit to using it and it should be avoided.
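If you want to verify the transition programmatically rather than in the console, a minimal sketch with boto3 (bucket name hypothetical) that prints each object's current storage class:

```python
import boto3

s3 = boto3.client("s3")

# List objects and print their current storage class; objects that have
# transitioned will report GLACIER (or DEEP_ARCHIVE) instead of
# REDUCED_REDUNDANCY / STANDARD.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-bucket"):  # hypothetical bucket
    for obj in page.get("Contents", []):
        print(obj["Key"], obj.get("StorageClass", "STANDARD"))
```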

Saving ALS latent factors in Spark ML to S3 taking too long

I am using a Python script to compute user and item latent factors using Spark ML's ALS routine, as described here.
After computing the latent factors, I am trying to save them to S3 using the following:
# als is a pyspark.ml.recommendation.ALS estimator; ratings is the input DataFrame
model = als.fit(ratings)
# save item latent factors
model.itemFactors.rdd.saveAsTextFile(s3path_items)
# save user latent factors
model.userFactors.rdd.saveAsTextFile(s3path_users)
There are around 150 million users. LFA is computed quickly (~15 min) but writing out the latent factors to S3 takes almost 5 hours. So clearly, something is not right. Could you please help identify the problem?
I am using 100 user blocks and 100 item blocks when computing the LFA with ALS, in case this info is relevant.
I am using 100 r3.8xlarge machines for the job.
Is this EMR, the official ASF Spark version, or something else?
One issue here is that the S3 clients have tended to buffer everything locally onto disk, then only start the upload afterwards.
If it's ASF code, make sure you are using Hadoop 2.7.x, use s3a:// as the output scheme, and play with the fast output stream options, which can do incremental writes as data is generated. It's a bit brittle in 2.7 and will be much better in 2.8.
If you are on EMR, you are on your own there.
Another possible cause is that S3 throttles clients generating lots of HTTPS requests against a particular shard of S3, i.e. specific parts of an S3 bucket, with the first 5-8 characters of the key apparently determining the shard. If you can use highly varied names there, you may get throttled less.
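For the ASF/Hadoop 2.7.x route above, a minimal sketch of what enabling the fast output stream and writing via s3a:// might look like, assuming Spark 2.x (the bucket, prefix, and connection setting below are assumptions, and model is the ALSModel fitted in the question's code):

```python
from pyspark.sql import SparkSession

# Build a session with the S3A fast output stream enabled so partitions are
# uploaded incrementally instead of being fully buffered on local disk first.
spark = (
    SparkSession.builder
    .appName("als-factors-to-s3")
    .config("spark.hadoop.fs.s3a.fast.upload", "true")
    .config("spark.hadoop.fs.s3a.connection.maximum", "200")
    .getOrCreate()
)

# model = als.fit(ratings) as in the question; write via the s3a:// scheme
# (bucket and prefix here are hypothetical).
model.itemFactors.rdd.saveAsTextFile("s3a://my-bucket/latent/items")
model.userFactors.rdd.saveAsTextFile("s3a://my-bucket/latent/users")
```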