I have a website which gets backup from different social media services and then stores the data on server and then that is displayed on my website. content includes, videos, images, and text data.
Currently i am using an EC2 instance with RDS and EBS. Data is stored in EBS Volumes, But as the amount of the data is big enough more than 1 TB and that is increasing. Every time my EBS volume gets filled i attach another volume.
Then i added S3 to my Setup. Cron jobs runs and stores data on S3 and the EC2 instance displays data from the S3. I am using PHP SDK for this purpose.
The problem which i am facing is that the S3 is very slow in my current setup.
Please suggest whether my setup is good or i need some change in my setup and the other way how can i speedup S3. or i should opt some other way to my setup.
EC2 instance is large reserved instance running CentOS.
I have listened some about the S3fs that mount S3 bucket to Ec2 as a volume. Is this a good choice, as when i mounted S3 Bucket to Ec2 instance the transfer rate was very slow.
I am new to the AWS. My users does not access files directly from S3, but they access through my website which is running on EC2 Instance.
RDS is a good choice for storing metadata such as tags, comments and other relevant information about your multimedia files. S3 is good for storing static content such as Video, Audio and Pictures. I think your approach with RDS and S3 is good enough.
EBS backed instances are good for persistence. If you store your metadata on RDS and static content on S3, the only reason why you should use EBS backed EC2 instances is that you have some configuration files which are unversioned right now. If that's not the case, assuming that your configuration is checked into version control and can be pulled on-demand for a fresh instance every time, then you might want to ditch EBS volumes in favor of ephemeral storage. That may give you some performance boost, nothing significant though.
Regarding your concern with S3's latency, yes, S3 is slow. While all your writes may happen directly to S3, I would highly recommend that you set up Amazon CloudFront for your S3 buckets and let your website consume multimedia content from the CloudFront. CloudFront is a Content Delivery Network (CDN) which works with disk volumes (EBS backed or ephemeral) as well as with S3. Setting it up would take not more than a few minutes. CloudFront also supports streaming media files over RTMP. You may need a library like GPAC for hinting multimedia files to make them streamable if not being done already. You might then want to consider creating one distribution for Video/Audio files for streaming and another distribution for Images, Javascript, Stylesheets and other text files.
Hope this helps.
For faster getting and uploading files from Amazon S3 I use batch() found here.
Also you can use cloudfront for faster getting files. I think 9gag uses cloudfront also..
Related
Is it more efficient to store icons and small decorative images (eg arrow.jpg or Submit.jpg etc) on S3 vs on the hosting server's local disk ?
images/arrow.jpg (relative url to the hosting server)
vs
https://aws.com/images/arrow.jpg
I am not concerned about the storage/space but the number of hits to the server because every page may contain 50 to 100 such icons and small images. I am not using CDN
When you say hosting server, you are ultimately going to store your volume on EBS, EFS ,( not instance store because it is not persistent).
There are many factors to distinguish based on efficiency but ultimately it will boil down to COST.
S3 is way cheaper, highly available, and more durable than EBS.
EBS is AZ bound but s3 is not. if you want to access to ebs in another az you will need to copy and snapshot and restore to another AZ.
with s3 you can utilize intelligent teiring with lifecycle policies to move your files automatically between different storage tiers for cost optimization.
however, if you want to fast access and low latency then EBS and EFS is a better option without a doubt.
s3 gives you unlimited storage but individual element size is limited to 5TB
For performance go for ebs , for cost go for s3.
NOTE : if you want to access your data on ebs or efs you would need to spin up an ec2 instance which also add cost
FAq for comparison
s3 v/s ebs v/s efs
This post on stack is also a good read AWS EFS vs EBS vs S3 (differences & when to use?)
Basically, I'm trying to figure out what design to use. I'm collecting 1TB of data per month using an EC2 instance mounted to EBS. I created another Elastic Beanstalk instance serving as the website, and I wanted to figure out if it's better to access this EC2 instance's data through EFS or S3. Also, the amount of data that the elastic beanstalk webpage would access maybe be 10 - 50GB occasionally from a web application.
Basically, it depends upon the type of data you want to store.
EFS - Amazon EFS is automatically scalable - that means that your running applications won't have any problems if the workload suddenly becomes higher - the storage will scale itself automatically. If the workload decreases - the storage will scale down, so you won't pay anything for the storage you don't use. Good for shareable applications and workloads , Faster than S3
S3 - Amazon S3 also allows hosting static website content. provides simple object storage, useful for hosting website images and videos, data analytics, and both mobile and web applications. Object storage manages data as objects, meaning all data types are stored in their native formats.
So I would suggest, as you are collecting 1TB of data and webpage would access 10 - 50GB occasionally, so S3 will make your process (API's) slow and its good the amount of disk space you use, have to pay for that only.
And as you are talking about 1Tb, if data goes beyond that, the disk will be scalable and the application will be highly available.
Which one is better for storing pictures and videos uploaded by user ?
Amazon s3 or Filesystem EC2 ?
While opinion-based questions are discouraged on StackOverflow, and answers always depend upon the particular situation, it is highly likely that Amazon S3 is your better choice.
You didn't say whether only wish to store the data, or whether you also wish to serve the data out to users. I'll assume both.
Benefits of using Amazon S3 to store static assets such as pictures and videos:
S3 is pay-as-you-go (only pay for the storage consumed, with different options depending upon how often/fast you wish to retrieve the objects)
S3 is highly available: You don't need to run any servers
S3 is highly durable: Your data is duplicated across three data centres, so it is more resilient to failure
S3 is highly scalable: It can handle massive volumes of requests. If you served content from Amazon EC2, you'd have to scale-out to meet requests
S3 has in-built security at the object, bucket and user level.
Basically, Amazon S3 is a fully-managed storage service that can serve static assets out to the Internet.
If you were to store data on an Amazon EC2 instance, and serve the content from the EC2 instance:
You would need to pre-provision storage using Amazon EBS volumes (and you pay for the entire volume even if it isn't all used)
You would need to Snapshot the EBS volumes to improve durability (EBS Snapshots are stored in Amazon S3, replicated between data centres)
You would need to scale your EC2 instances (make them bigger, or add more) to handle the workload
You would need to replicate data between instances if you are running multiple EC2 instances to meet request volumes
You would need to install and configure the software on the EC2 instance(s) to manage security, content serving, monitoring, etc.
The only benefit of storing this static data directly on an Amazon EC2 instance rather than Amazon S3 is that it is immediately accessible to software running on the instance. This makes the code simpler and access faster.
There is also the option of using Amazon Elastic File System (EFS), which is NAS-like storage. You can mount an EFS volume simultaneously on multiple EC2 instances. Data is replicated between multiple Availability Zones. It is charged on a pay-as-you-go basis. However, it is only the storage layer - you'd still need to use Amazon EC2 instance(s) to serve the content to the Internet.
I am new to cloud environment. I do understand the definition and storage types EBS and S3. I wanted to understand the application of EBS as compared to S3.
I do understand EBS looks like a device for heavy though put operations. I cannot find any application where this can be used in comparison to S3. I could think of putting server logs on EBS on magnetic storage, as one EBS can be attached to one instance.
S3 you can use the scaling property to add some heavy data and expand in real time. We can deploy our slef managed dbs on this service.
Please correct me if I am wrong. Please help me understand what is best suited for what and application of them in comparison with one another.
As you stated, they are primarily different types of storage:
Amazon Elastic Block Store (EBS) is a persistent disk-storage service, which provides storage volumes to a virtual machine (similar to VMDK files in VMWare)
Amazon Simple Storage Service (S3) is an object store system that stores files as objects and optionally makes them available across the Internet.
So, how do people choose which to use? It's quite simple... If they need a volume mounted on an Amazon EC2 instance, they need to use Amazon EBS. It gives them a C:, D: drive, etc in Windows and a mountable volume in Linux. Computers traditionally expect to have locally-attached disk storage. Put simply: If the operating system or an application running on an Amazon EC2 instance wants to store data locally, it will use EBS.
EBS Volumes are actually stored on two physical devices in case of failure, but an EBS volume appears as a single volume. The volume size must be selected when the volume is created. The volume exists in a single Availability Zone and can only be attached to EC2 instances in the same Availability Zone. EBS Volumes persist even when the attached EC2 instance is Stopped; when the instance is Started again, the disk remains attached and all data has been presrved.
Amazon S3, however, is something quite different. It is a storage service that allows files to be uploaded/downloaded (PutObject, GetObject) and files are replicated across at least three data centers. Files can optionally be accessed via the Internet via HTTP/HTTPS without requiring a web server. There are no limits on the amount of data that can be stored. Access can be granted per-object, per-bucket via a Bucket Policy, or via IAM Users and Groups.
Amazon S3 is a good option when data needs to be shared (both locally and across the Internet), retained for long periods, backed-up (or even for storing backups) and made accessible to other systems. However, applications need to specifically coded to use Amazon S3 and many traditional application expect to store data on a local drive rather than on a separate storage service.
While Amazon S3 has many benefits, there are still situations where Amazon EBS is a better storage choice:
When using applications that expect to store data locally
For storing temporary files
When applications want to partially update files, because the smallest storage unit in S3 is a file and updating a part of a file requires re-uploading the whole file
For very High-IO situations, such as databases (EBS Provisioned IOPS can provide volumes up to 20,000 IOPS)
For creating volume snapshots as backups
For creating Amazon Machine Images (AMIs) that can be used to boot EC2 instances
Bottom line: They are primarily different types of storage and each have their own usage sweet-spot, just like a Database is a good form of storage depending upon the situation.
I am using solr on amazon ec2, and I am hoping to configure the solr instance so that it automatically stores data in amazon s3 instead of anywhere on the server. However I couldn't find any useful information on how to implement this. Does anyone know how? If this can't be achieved using amazon s3, what cloud storage do you recommend?
Thanks in advance.
You will want to store the Solr indexes on an EBS volume, which you can attach to the server. S3 is meant for serving files directly out to the internet (such as images and css files), or for general file storage (such as backups.) It is not meant to be used as a mounted disk for a database.
Solr likes very high IO, so the SSD backed EBS volumes are great for this. You can also make snapshots of an EBS volume to backup its data.
If you setup Solr slaves, you can also get away with using the server's ephemeral storage. This is a large partition that comes with most instance types. It is volatile storage, meaning all of the data is lost if the server is shutdown. However, it is free and quite fast. It is perfect for a slave which replicates its data from a master Solr instance backed by EBS.