I have a Question Bank application with +10k questions.
The questions are a combination of text (stored within a database) and images (hosted on Amazon S3). The images are embedded as links through a admin panel.
For security purposes, I want to figure out if I can back-up the entire S3 bucket and later restore it with the same identical links.
Any experience with this?
I presume that your users are accessing the images via an Amazon S3 URL.
The URL consists of the bucket name and the Key (filename) of the object. To provide a file with the "same identical link" simply means putting it in the same bucket and giving it the same Key.
Please note that data stored in Amazon S3 is replicated between multiple Availability Zones. Therefore, it is typically not necessary to backup data for resilience purposes. However, it can be wise to backup in case somebody accidentally or maliciously deletes the objects. (Turning on Versioning could also be a way to safeguard against such cases.)
If you do wish to "backup an entire bucket", one option is to use Same-Region replication, which will automatically replicate objects between buckets. It requires that Versioning is activated.
Related
My Gcp project name is Mobisium. I found out that there are 2 bucket auto created in the storage browser named mobisum-bucket and mobisium-daisy-bkt-asia.I have never used bucket in the project. mobisium-bucket bucket is empty and the mobisium-daisy-bkt-asia contains one file called daisy.log. Both buckets are Location Type: Multi-region. I read in a stack overflow question's comments that If bucket are created automatically multi-region, you will be charged.
My questions is:
Am I being charged for this buckets.
Are these buckets required, If not should I delete them.
According to documentation you are charged for:
data storage
network
operations
So you will be charged for them if they contain data. You can also view all charges assosciated with your billing account
This buckets names suggests that some services created them - the buckets name is hard to figure out which services. Sometimes when you turn on the services, they create buckets for themselves.
Creating new project, there shouldn't be any buckets, so if this is really new project (created from scratch) you could try to delete them.
If this will be repeated for another project (nor only for this one) it will be good idea to contact support, because this is not normal action.
I want to set up a bucket for testing, where I pull in a previous month of data from an S3 bucket in a different AWS account, and then continue to consume data from that S3 bucket as it lands.
All the documentation I am seeing talks about full copies or syncs between buckets, and there is far too much data in the initial bucket for that. I need to be able to just pull fresh data in as it lands. I'm not sure what the best method for this would be, as it would be likely a several thousand files a day.
You should look at S3 replication for your use case to copy data from primary bucket to secondary bucket. S3 replication will copy only the new data from the primary bucket to secondary bucket.
However the secondary bucket will have to be in a different AWS region. (Hopefully thats not the limiting case for you)
check this link for setup (https://medium.com/#chrisjerry9618/s3-cross-region-replication-2e20f2dc86e0)
Upvote this or mark as answer for others to get help
Is it better to have multiple s3 buckets per category of uploads or one bucket with sub folders OR a linked s3 bucket? I know for sure there will be more user-images than there will be profille-pics and that there is a 5TB limit per bucket and 100 buckets per account. I'm doing this using aws boto library and https://github.com/amol-/depot
Which is the structure my folders in which of the following manner?
/app_bucket
/profile-pic-folder
/user-images-folder
OR
profile-pic-bucket
user-images-bucket
OR
/app_bucket_1
/app_bucket_2
The last one implies that its really a 10TB bucket where a new bucket is created when the files within bucket_1 exceeds 5TB. But all uploads will be read as if in one bucket. Or is there a better way of doing what I'm trying to do? Many thanks!
I'm not sure if this is correct... 100 buckets per account?
https://www.reddit.com/r/aws/comments/28vbjs/requesting_increase_in_number_of_s3_buckets/
Yes, there is actually a 100 bucket limit per account. I asked the reason for that to an architect in an AWS event. He said this is to avoid people hosting unlimited static websites on S3 as they think this may be abused. But you can apply for an increase.
By default, you can create up to 100 buckets in each of your AWS
accounts. If you need additional buckets, you can increase your bucket
limit by submitting a service limit increase.
Source: http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html
Also, please note that there are actually no folders in S3, just a flat file structure:
Amazon S3 has a flat structure with no hierarchy like you would see in
a typical file system. However, for the sake of organizational
simplicity, the Amazon S3 console supports the folder concept as a
means of grouping objects. Amazon S3 does this by using key name
prefixes for objects.
Source: http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
Finally, the 5TB limit only applies to a single object. There is no limit on the number of objects or total size of the bucket.
Q: How much data can I store?
The total volume of data and number of objects you can store are
unlimited.
Source: https://aws.amazon.com/s3/faqs/
Also the documentation states there is no performance difference between using a single bucket or multiple buckets so I guess both option 1 and 2 would be suitable for you.
Hope this helps.
Simpler Permission with Multiple Buckets
If the images are used in different use cases, using multiple buckets will simplify the permissions model, since you can give clients/users bucket level permissions instead of directory level permissions.
2-way doors and migrations
On a similar note, using 2 buckets is more flexible down the road.
1 to 2:
If you switch from 1 bucket to 2, you now have to move all clients to the new set-up. You will need to update permissions for all clients, which can require IAM policy changes for both you and the client. Then you can move your clients over by releasing a new client library during the transition period.
2 to 1:
If you switch from 2 buckets to 1 bucket, your clients will already have access to the 1 bucket. All you need to do is update the client library and move your clients onto it during the transition period.
*If you don't have a client library than code changes are required in both cases for the clients.
Using S3 cross-region replication, if a user downloads http://mybucket.s3.amazonaws.com/myobject , will it automatically download from the closest region like cloudfront? So no need to specify the region in the URL like http://mybucket.s3-[region].amazonaws.com/myobject ?
http://aws.amazon.com/about-aws/whats-new/2015/03/amazon-s3-introduces-cross-region-replication/
Bucket names are global, and cross-region replication involves copying it to a different bucket.
In other words, example.us-west-1 and example.us-east-1 is not valid, as there can only be one bucket named 'example'.
That's implied in the announcement post- Mr. Barr is using buckets named jbarr and jbarr-replication.
Using S3 cross-Region replication will put your object into two (or more) buckets in two different Regions.
If you want a single access point that will choose the closest available bucket then you want to use Multi-Region Access Points (MRAP)
MRAP makes use of Global Accelerator and puts bucket requests onto the AWS backbone at the closest edge location, which provides faster, more reliable connection to the actual bucket. Global Accelerator also chooses the closest available bucket. If a bucket is not available, it will serve the request from the other bucket providing automatic failover
You can also configure it in an active/passive configuration, always serving from one bucket until you initiate a failover
From the MRAP page on AWS console it even shows you a graphical representation of your replication rules
s3 is global service, no need specify the region. The S3 name has to be unique globally.
when you create s3, you need specify the region, however it doesn't mean you need put the region name when you access it. To speed up the access speed from other region, there are several options like
-- Amazon S3 Transfer Acceleration with same bucket name.
-- Or use set up another bucket with different name in different region and enable cross region replication. Create an origin group with two origins for cloudfront.
I'd like to set up a separate s3 bucket folder for each of my mobile app users for them to store their files. However, I also want to set up size limits so that they don't use up too much storage. Additionally, if they do go over the limit I'd like to offer them increased space if they sign up for a premium service.
Is there a way I can set folder file size limits through s3 configuration or api? If not would I have to use the apis somehow to calculate folder size on every upload? I know that there is the devpay feature in Amazon but it might be a hassle for users to sign up with Amazon if they want to just use small amount of free space.
There does not appear to be a way to do this, probably at least in part because there is actually no such thing as "folders" in S3. There is only the appearance of folders.
Amazon S3 does not have concept of a folder, there are only buckets and objects. The Amazon S3 console supports the folder concept using the object key name prefixes.
— http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
All of the keys in an S3 bucket are actually in a flat namespace, with the / delimiter used as desired to conceptually divide objects into logical groupings that look like folders, but it's only a convenient illusion. It seems impossible that S3 would have a concept of the size of a folder, when it has no actual concept of "folders" at all.
If you don't maintain an authoritative database of what's been stored by clients (which suggests that all uploads should pass through an app server rather than going directly to S3, which is the the only approach that makes sense to me at all) then your only alternative is to poll S3 to discover what's there. An imperfect shortcut would be for your application to read the S3 bucket logs to discover what had been uploaded, but that is only provided on a best-effort basis. It should be reliable but is not guaranteed to be perfect.
This service provides a best effort attempt to log all access of objects within a bucket. Please note that it is possible that the actual usage report at the end of a month will slightly vary.
Your other option is to develop your own service that sits between users and Amazon S3, that monitors all requests to your buckets/objects.
— http://aws.amazon.com/articles/1109#13
Again, having your app server mediate all requests seems to be the logical approach, and would also allow you to detect immediately (as opposed to "discover later") that a user had exceeded a threshold.
I would maintain a seperate database in the cloud to hold each users total hdd usage count. Its easy to manage the count via S3 Object Lifecycle Events which could easily trigger a Lambda which in turn writes to a DB.