I've got an Amazon S3 account that is connected to SimpleDB.
In this account I've got different buckets, and since I'm limited in space I need to delete some content from these buckets from time to time.
This deletion needs to be done for specific buckets and based on date (nothing older than a week), and of course it needs to happen in both S3 and the DB.
It also needs to run as a scheduled task on the server (the server is Windows Server 2008 + SQL 2008 R2).
Can anyone suggest a script (any language is fine) for doing this task?
For S3 objects, you can use the S3 lifecycle feature to "expire" (delete) objects once they cross a specific age: Object Expiration - Amazon Simple Storage Service
I am not aware of such a convenient way to do this on SimpleDB. You might have to write a periodic script that uses the SimpleDB API to delete items on an interval.
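A rough sketch of such a script in Python, using the legacy boto library (SimpleDB is not covered by boto3); the bucket name, the SimpleDB domain name, and the `created_at` attribute are assumptions you would replace with your own:

```python
# Sketch only: assumes a bucket "my-bucket", a SimpleDB domain "my-domain",
# and that each SimpleDB item stores an ISO 8601 "created_at" attribute.
from datetime import datetime, timedelta

import boto  # legacy boto 2.x; SimpleDB is not available in boto3

CUTOFF = datetime.utcnow() - timedelta(days=7)

# 1) Delete S3 objects older than a week (or let a lifecycle rule do this part).
s3 = boto.connect_s3()
bucket = s3.get_bucket('my-bucket')
for key in bucket.list():
    modified = datetime.strptime(key.last_modified, '%Y-%m-%dT%H:%M:%S.%fZ')
    if modified < CUTOFF:
        bucket.delete_key(key.name)

# 2) Delete the matching SimpleDB items.
sdb = boto.connect_sdb()
domain = sdb.get_domain('my-domain')
query = ("select * from `my-domain` "
         "where created_at < '%s'" % CUTOFF.strftime('%Y-%m-%dT%H:%M:%SZ'))
for item in domain.select(query):
    domain.delete_item(item)
```

You could then schedule the script with the Windows Task Scheduler so it runs, say, nightly.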
According to this: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/User.SQLServer.Options.S3-integration.html we should be able to write a file from RDS to S3. But when we try, it fails with:
blocked because RDS is a managed service with SLA and guard rails to help deliver it.
Anyone know a way around this?
Seems like running your task will interfere with the SLAs, so it would be directly related to the note mentioned in the link you shared:
Note: S3 integration tasks share the same queue as native backup and restore tasks. At maximum, you can have only two tasks in progress at any time in this queue. Therefore, two running native backup and restore tasks will block any S3 integration tasks.
There must be some automated backup activity going on when you run your task.
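One way to confirm that is to inspect the task queue yourself; a small sketch, assuming the SQL Server RDS instance is reachable with pyodbc (endpoint, credentials, and database name below are placeholders):

```python
# Sketch only: server, credentials, and database name are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=mydb.xxxxxxxx.us-east-1.rds.amazonaws.com,1433;"
    "DATABASE=msdb;UID=admin;PWD=secret"
)
cursor = conn.cursor()
# rds_task_status lists native backup/restore and S3 integration tasks,
# which shows what is occupying the two available slots in the queue.
cursor.execute("exec msdb.dbo.rds_task_status @database_name = ?", "mydatabase")
for row in cursor.fetchall():
    print(row)
```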
I'm working on a project to manage documents (e.g. create, read, maintain different versions, etc.) and my plan is to use the following AWS architecture.
When a document is created or updated, it will be saved to a version-enabled S3 bucket via an API Gateway S3 proxy. The S3 put event will trigger a Lambda that fetches the latest version and all version IDs and saves them to DynamoDB. Once the record is saved in the DynamoDB table, it will be indexed in Elasticsearch via a DynamoDB stream.
My plan is to use Elasticsearch for all search queries, and I will load the latest documents from DynamoDB. Since each record has the S3 version IDs, I can query old versions from S3 as well.
Since my architecture relies heavily on eventual consistency (S3 to DynamoDB and DynamoDB to Elasticsearch), I'm worried that I would not get the latest document data when I query Elasticsearch or DynamoDB right after I create a document.
Any suggestions for improvements will be much appreciated.
Thanks!
As you said, your application architecture has multiple points where eventual consistency comes into play.
If your application's business case absolutely requires that when you query data you get the absolute latest version, then your architecture choices are bad and you should, for example, consider using RDS for persistence instead.
If not, then you just design the rest of your system keeping in mind that a completed PUT does not guarantee that queries immediately return the data. How to do this depends heavily on your application and cannot feasibly be generalized.
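For example, a common client-side workaround is to poll the read side until the version you just wrote becomes visible; a minimal sketch, where `fetch` and the `latest_version_id` field are placeholders for however you read a record back:

```python
import time

def wait_for_version(fetch, expected_version, attempts=5, base_delay=0.2):
    """Poll an eventually consistent read until the expected version appears.

    fetch            -- zero-argument callable returning the record (or None)
    expected_version -- the S3 version id the client just wrote
    """
    for attempt in range(attempts):
        record = fetch()
        if record and record.get("latest_version_id") == expected_version:
            return record
        time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    return None  # never converged within the retry budget; surface to the caller
```

Note that DynamoDB itself can serve strongly consistent reads (GetItem with ConsistentRead=True), but that only helps once the record is actually in the table; the S3-event and stream hops remain asynchronous.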
Since you use a DynamoDB stream, your DynamoDB insert will reach your Elasticsearch cluster, but with a delay. In case of a write failure, it's up to the client to issue a retry.
You also have to keep in mind the time it takes to trigger the DynamoDB stream and the time the Elasticsearch indexing takes (plus the S3 event).
So your problem has more to do with the time it takes for the data to reach Elasticsearch.
If you want something more consistent that reflects the current state (since that is the problem you will end up with) without any delays, you need to change tools.
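For reference, the DynamoDB-stream-to-Elasticsearch hop usually boils down to a small Lambda like the sketch below; the index name, the document key, and the assumption that the ES domain accepts unsigned requests from the Lambda are all placeholders (otherwise sign the requests with SigV4):

```python
# Sketch only: assumes a stream view type of NEW_IMAGE, a table key "doc_id",
# and an ES_ENDPOINT environment variable such as
# https://search-docs-xxxx.us-east-1.es.amazonaws.com
import json
import os
import urllib.request

ES_ENDPOINT = os.environ["ES_ENDPOINT"]
ES_INDEX = "documents"

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] == "REMOVE":
            continue  # deletions could be mapped to ES deletes instead
        image = record["dynamodb"]["NewImage"]
        doc_id = record["dynamodb"]["Keys"]["doc_id"]["S"]
        # Crude flattening of the DynamoDB attribute-value map ({"S": "x"} -> "x").
        doc = {k: list(v.values())[0] for k, v in image.items()}
        req = urllib.request.Request(
            url=f"{ES_ENDPOINT}/{ES_INDEX}/_doc/{doc_id}",
            data=json.dumps(doc).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        # If this raises, the exception propagates and the stream batch is retried.
        urllib.request.urlopen(req)
```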
Simple problem: I have a Google Cloud Storage bucket that receives content three times a day from an external provider. I want to fetch this content as soon as it arrives and push it to an S3 bucket. I have been able to achieve this by running my Python scripts as a cron job, but I would have to provide high availability and so on if I followed this route.
My idea was to set this up in AWS Lambda, so I don't have to sweat the infrastructure limitations. Any pointers on this marriage between GCS and Lambda? I am not a native Node speaker, so any pointers will be really helpful.
GCS can send object notifications when an object is created or updated. You can catch the notifications (which are HTTP POST requests) with a simple web app hosted on GAE and then handle the file transfer to S3. A highly available, event-driven solution.
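A rough sketch of what that GAE app could look like in Python, assuming Flask plus the google-cloud-storage and boto3 client libraries, and assuming the notification body carries the object's `bucket` and `name` fields (check the exact payload of the notification type you enable):

```python
# Sketch only: bucket name and route are placeholders; credentials for both
# clouds must be available to the app (GCP via the GAE service account,
# AWS via configured access keys).
import json

import boto3
from flask import Flask, request
from google.cloud import storage

app = Flask(__name__)
gcs = storage.Client()
s3 = boto3.client("s3")

DEST_BUCKET = "my-s3-bucket"  # hypothetical destination bucket

@app.route("/gcs-notification", methods=["POST"])
def gcs_notification():
    payload = json.loads(request.data)
    blob = gcs.bucket(payload["bucket"]).blob(payload["name"])
    # Pull the new object out of GCS and push it into S3.
    s3.put_object(Bucket=DEST_BUCKET, Key=payload["name"],
                  Body=blob.download_as_bytes())
    return "", 204
```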
I'm new to AWS and have a feasibility question for a file management system I'm trying to build. I would like to set up a system where people use an Amazon S3 browser client and drop either a CSV or Excel file into their specific bucket. Then I would like to automate the process of taking that CSV/Excel file and inserting it into a table within RDS. This is assuming that the table has already been built and that the Excel/CSV files will always be formatted the same and will be in the exact same place every single time. Is it possible to automate this process, or at least get it to a point where very minimal human interference is needed? I'm new to AWS, so I'm not exactly sure of the limits of S3 to RDS. Thank you in advance.
It's definitely possible. AWS supports notifications from S3 to SNS, which can be forwarded automatically to SQS: http://aws.amazon.com/blogs/aws/s3-event-notification/
S3 can also send notifications to AWS Lambda to run your own code directly.
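To make that concrete, the Lambda end of the pipeline can be quite small; a sketch, assuming a MySQL-engine RDS instance, the pymysql driver packaged with the function, and a hypothetical `uploads(col_a, col_b, col_c)` table matching the fixed CSV format:

```python
# Sketch only: table name, column names, and DB_* environment variables
# are placeholders for your own setup.
import csv
import io
import os
import urllib.parse

import boto3
import pymysql

s3 = boto3.client("s3")

def handler(event, context):
    # The S3 ObjectCreated notification tells us which file just arrived.
    rec = event["Records"][0]["s3"]
    bucket = rec["bucket"]["name"]
    key = urllib.parse.unquote_plus(rec["object"]["key"])

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    rows = list(csv.reader(io.StringIO(body)))[1:]  # skip the header row

    conn = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
    )
    try:
        with conn.cursor() as cur:
            cur.executemany(
                "INSERT INTO uploads (col_a, col_b, col_c) VALUES (%s, %s, %s)",
                rows,
            )
        conn.commit()
    finally:
        conn.close()
```

Excel files would need an extra parsing step (e.g. openpyxl); sticking to plain CSV keeps the function simple.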
Trying to sync a large (millions of files) S3 bucket from the cloud to local storage seems to be a troublesome process for most S3 tools, as virtually everything I've seen so far uses the GET Bucket operation, patiently fetching the whole list of files in the bucket, diffing it against the list of local files, and then performing the actual transfer.
This looks extremely suboptimal. For example, if one could list only the files in a bucket that were created or changed since a given date, this could be done quickly, as the list of files to be transferred would include just a handful, not millions.
However, given that the answer to this question is still accurate, it's not possible to do so with the S3 API.
Are there any other approaches to do periodic incremental backups of a given large S3 bucket?
On AWS S3 you can configure event notifications (e.g. s3:ObjectCreated:*) to be notified when an object is created. They can target SNS, SQS, and Lambda, so you can have an application that listens for the events and updates the statistics. You may also want to add a timestamp as part of the statistics. Then just "query" the results for a certain period of time and you will get your delta.
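A minimal sketch of that idea, assuming a Lambda subscribed to the bucket's s3:ObjectCreated:* events and a hypothetical DynamoDB table `s3-changelog` with partition key `day` and sort key `ts_key`:

```python
# Sketch only: the table name and key schema are assumptions.
import boto3

table = boto3.resource("dynamodb").Table("s3-changelog")

def handler(event, context):
    for rec in event["Records"]:
        ts = rec["eventTime"]                # e.g. "2019-01-01T12:00:00.000Z"
        key = rec["s3"]["object"]["key"]
        table.put_item(Item={
            "day": ts[:10],                  # partition by calendar day
            "ts_key": f"{ts}#{key}",         # keeps entries time-ordered
            "key": key,
            "size": rec["s3"]["object"].get("size", 0),
        })
```

The backup job then only queries the `day` partitions written since its last run and transfers those keys, instead of paging through a multi-million-object GET Bucket listing.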