Reduce data transfer cost in AWS

I have an AWS setup for my website. When a user uploads an image, we save it to a folder on EC2 and then transfer it to S3, after which we fetch the images from S3.
I have also stored all the JS and CSS on EC2 and serve them from EC2 itself.
My data transfer cost is now very high. Is storing images on EC2 costing me more? Should I store them directly on S3?

Always consider using a CDN or a dedicated web hosting service if your web traffic is high. EC2 is better suited to back-office processing than to serving web pages. There is no free lunch in AWS if you are not careful: always check AWS bandwidth pricing before hosting anything there. In some cases, the data transfer costs can be many times more expensive than the EC2 server and the (S3, EBS) storage.
AWS only gives EC2 1 GB of free data transfer to the Internet per month. After that, it is $0.09/GB. If you open your web server to everyone and 20 bots download 100 GB of data daily from your EC2 web server, you will get a hefty bill, i.e. (100 GB x $0.09 x 30 days = $270) - $0.09 (free 1 GB) = $269.91.
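As a rough illustration of how that adds up, here is a minimal sketch of the arithmetic. The $0.09/GB rate and the 1 GB/month free allowance are the figures quoted above; always check the current AWS pricing page for your region.

```python
# Rough estimate of EC2 "data transfer out to the Internet" cost.
# Rate and free allowance are assumptions taken from the figures above.
PRICE_PER_GB = 0.09     # USD per GB out to the Internet
FREE_GB_PER_MONTH = 1   # free allowance assumed above

def monthly_transfer_cost(gb_per_day: float, days: int = 30) -> float:
    total_gb = gb_per_day * days
    billable_gb = max(total_gb - FREE_GB_PER_MONTH, 0)
    return billable_gb * PRICE_PER_GB

# 100 GB/day for 30 days -> (3000 - 1) GB * $0.09 = $269.91
print(f"${monthly_transfer_cost(100):.2f}")
```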
Also remember, S3 data transfer out to the Internet is NOT free. You only get free, unlimited data transfer from S3 to your EC2/Lambda within the same region. If you sign an S3 object as a URL to let people download the file, you are still billed for the "Internet out" bandwidth.
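For reference, "signing the file as a URL" typically looks like the following; a minimal sketch using boto3, where the bucket and key names are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Generate a time-limited download link for one object.
# Bucket and key names here are placeholders.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-example-bucket", "Key": "uploads/photo.jpg"},
    ExpiresIn=3600,  # link valid for 1 hour
)
print(url)
# Anyone using this URL downloads over the Internet, so the normal
# S3 "data transfer out" charge still applies to every download.
```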

Data Transfer charges only apply to data going from an AWS Region to the Internet. There is no charge for uploading to AWS, nor for moving data between S3 and EC2 in the same region.
If your data transfer costs are high, it suggests that you are serving a lot of traffic to the Internet, either from EC2 or S3.
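If the goal is also to avoid the extra hop through EC2 for uploads, one common pattern is to let the browser upload straight to S3 with a presigned POST. A minimal sketch with boto3 follows; the bucket, key, and local file names are placeholders, and the requests call only simulates what the browser form would do. Uploads into AWS are free; you pay for the storage and for later downloads out to the Internet.

```python
import boto3
import requests  # used here only to simulate the client-side upload

s3 = boto3.client("s3")

# Server side: create a short-lived presigned POST so the browser can
# upload directly to S3 instead of going through the EC2 instance first.
post = s3.generate_presigned_post(
    Bucket="my-example-bucket",   # placeholder bucket name
    Key="uploads/photo.jpg",      # placeholder object key
    ExpiresIn=300,                # valid for 5 minutes
)

# Client side (normally the browser form or JavaScript) POSTs the
# returned fields plus the file to the returned URL.
with open("photo.jpg", "rb") as f:
    response = requests.post(post["url"], data=post["fields"], files={"file": f})
print(response.status_code)  # 204 on success
```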

Related

What is the cheapest way to allow others to download a dataset I have?

I have some datasets (possibly up to 10 GB zipped altogether) for my machine learning applications.
To expose these datasets to others, I believe I have to host a server and let others download them over the network.
What is the cheapest server I can use for this? (I checked the AWS free tiers; can these be used?)
Do I need to write a web server myself, or is there a premade tool I can use for my use case?
You haven't indicated how much data will be downloaded (GB/month), and that's important because you pay for data transfer out to the Internet (about $0.09 per GB) beyond an initial free amount (1 GB/month, I believe, but check whether the free tier offers more). That applies to both S3 and EC2.
That said, I'd consider a few options.
1. Store the files in S3 and serve them from S3 via CloudFront. This may be cheaper than running a server 24x7 to host and serve the files.
2. Run a small EC2 server that fits into the free tier usage plan, with a web or FTP server serving up your files.
3. Similar to #1, but additionally configure Requester Pays for the S3 downloads (see the sketch after this list). This option requires your downloaders to have AWS credentials and for you to manage their access, so it may not be feasible in your case.
4. Create an EBS volume containing your data, take a snapshot of that volume, share the snapshot with other AWS accounts, then shut down your EC2 instance. This option requires your users to be AWS account holders and to share their AWS account numbers with you, so it may not be feasible in your case.
5. Use AWS Transfer for SFTP to serve up data stored in S3.
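For option 3, enabling Requester Pays is a single bucket setting. A minimal sketch with boto3, where the bucket and key names are placeholders; note that downloaders must then authenticate with their own AWS credentials and explicitly opt in to paying.

```python
import boto3

s3 = boto3.client("s3")

# Bucket owner: make downloaders pay the request and transfer costs.
# "my-dataset-bucket" is a placeholder name.
s3.put_bucket_request_payment(
    Bucket="my-dataset-bucket",
    RequestPaymentConfiguration={"Payer": "Requester"},
)

# Downloader side: requests against a Requester Pays bucket must be
# signed and must flag that the requester accepts the charges.
obj = s3.get_object(
    Bucket="my-dataset-bucket",
    Key="datasets/images-v1.zip",
    RequestPayer="requester",
)
```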

S3 AWS download Speed

I'm having trouble downloading files from S3. If I download a file of around 200 MB and then start downloading another file, the second download is really slow, around 40 KB/s.
And even when the first download finishes, the second one continues at 40 KB/s...
Any ideas about that?
Amazon S3 has huge bandwidth.
If you are downloading from Amazon S3 to your own computer (outside of AWS), then the only limitations that would impact you are your own Internet bandwidth, and any speed limitations imposed within your own network.
I will presume that you are downloading an object from Amazon S3 to an Amazon EC2 instance in the same region as the S3 bucket.
In this scenario, the only bandwidth limitation is the Network Performance on the Amazon EC2 instance. Basically, the bigger the instance, the more bandwidth is available.
In the Launch screen, a t2.large is listed as Low to Moderate Network Performance. This is reasonably good, but not as good as larger instance types.
See:
Amazon EC2 Instance Configuration - Amazon Elastic Compute Cloud
EC2 Network Performance Cheat Sheet | cloudonaut
It might also be a result of the software you are using to download the files and how it multi-tasks and shares network bandwidth between the downloads.
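If you suspect the client software, it is easy to test whether parallel, multipart downloads help; a minimal sketch using boto3's managed transfer, where the bucket, key, and tuning values are illustrative assumptions rather than recommended numbers:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Download large objects as parallel parts instead of one stream.
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # split objects larger than 8 MB
    multipart_chunksize=8 * 1024 * 1024,  # 8 MB parts
    max_concurrency=10,                   # 10 parallel part downloads
)

s3.download_file(
    "my-example-bucket", "videos/big-file.mp4", "/tmp/big-file.mp4",
    Config=config,
)
```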
If your laptop is connected to the Internet via a Wi-Fi router, try a wired connection instead.

Elastic Beanstalk with EFS or S3

Basically, I'm trying to figure out what design to use. I'm collecting 1 TB of data per month using an EC2 instance with an attached EBS volume. I created another Elastic Beanstalk instance serving as the website, and I want to figure out whether it's better to access this EC2 instance's data through EFS or S3. Also, the amount of data the Elastic Beanstalk webpage would access may be 10-50 GB, occasionally, from a web application.
Basically, it depends upon the type of data you want to store.
EFS - Amazon EFS is automatically scalable: your running applications won't have any problems if the workload suddenly becomes higher, because the storage scales itself automatically. If the workload decreases, the storage scales down, so you won't pay for storage you don't use. It is good for shared applications and workloads, and faster than S3.
S3 - Amazon S3 provides simple object storage and can also host static website content. It is useful for hosting website images and videos, data analytics, and both mobile and web applications. Object storage manages data as objects, meaning all data types are stored in their native formats.
So, since you are collecting 1 TB of data and the webpage only accesses 10-50 GB occasionally, I would suggest EFS: going through the S3 APIs would slow your process down, and with EFS you only pay for the amount of disk space you actually use.
And since you are already talking about 1 TB, if the data grows beyond that, the storage scales automatically and the application remains highly available.
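To make the access-pattern difference concrete, here is a minimal sketch contrasting the two from the web instance; the mount point, bucket, and key names are placeholders:

```python
import boto3

# With EFS, the file system is mounted on the instance, so the
# application reads files like any local path (mount point is a placeholder).
with open("/mnt/efs/data/report-2023-01.csv", "rb") as f:
    efs_bytes = f.read()

# With S3, every read is an API call over the network.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-data-bucket", Key="data/report-2023-01.csv")
s3_bytes = obj["Body"].read()
```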

Amazon S3 vs EC2 for storing files

Which one is better for storing pictures and videos uploaded by users?
Amazon S3, or the filesystem of an EC2 instance?
While opinion-based questions are discouraged on StackOverflow, and answers always depend upon the particular situation, it is highly likely that Amazon S3 is your better choice.
You didn't say whether you only wish to store the data, or whether you also wish to serve the data out to users. I'll assume both.
Benefits of using Amazon S3 to store static assets such as pictures and videos:
S3 is pay-as-you-go (only pay for the storage consumed, with different options depending upon how often/fast you wish to retrieve the objects)
S3 is highly available: You don't need to run any servers
S3 is highly durable: Your data is duplicated across three data centres, so it is more resilient to failure
S3 is highly scalable: It can handle massive volumes of requests. If you served content from Amazon EC2, you'd have to scale-out to meet requests
S3 has in-built security at the object, bucket and user level.
Basically, Amazon S3 is a fully-managed storage service that can serve static assets out to the Internet.
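As an illustration of that workflow, here is a minimal sketch of pushing an uploaded picture to S3 so it can be served as a static asset; the file path, bucket, key, and cache settings are assumptions:

```python
import boto3

s3 = boto3.client("s3")

# Store the user's upload in S3 with a proper Content-Type so browsers
# render it directly when it is served out (names below are placeholders).
s3.upload_file(
    "/tmp/upload-1234.jpg",               # temporary local file from the upload
    "my-media-bucket",                    # destination bucket
    "images/user-42/photo.jpg",           # destination key
    ExtraArgs={
        "ContentType": "image/jpeg",
        "CacheControl": "max-age=86400",  # let browsers/CDNs cache for a day
    },
)
```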
If you were to store data on an Amazon EC2 instance, and serve the content from the EC2 instance:
You would need to pre-provision storage using Amazon EBS volumes (and you pay for the entire volume even if it isn't all used)
You would need to Snapshot the EBS volumes to improve durability (EBS Snapshots are stored in Amazon S3, replicated between data centres)
You would need to scale your EC2 instances (make them bigger, or add more) to handle the workload
You would need to replicate data between instances if you are running multiple EC2 instances to meet request volumes
You would need to install and configure the software on the EC2 instance(s) to manage security, content serving, monitoring, etc.
The only benefit of storing this static data directly on an Amazon EC2 instance rather than Amazon S3 is that it is immediately accessible to software running on the instance. This makes the code simpler and access faster.
There is also the option of using Amazon Elastic File System (EFS), which is NAS-like storage. You can mount an EFS volume simultaneously on multiple EC2 instances. Data is replicated between multiple Availability Zones. It is charged on a pay-as-you-go basis. However, it is only the storage layer - you'd still need to use Amazon EC2 instance(s) to serve the content to the Internet.

Best setup to work with Amazon AWS

I have a website that backs up content from different social media services, stores the data on the server, and then displays it on my website. The content includes videos, images, and text.
Currently I am using an EC2 instance with RDS and EBS. Data is stored in EBS volumes, but the amount of data is large (more than 1 TB) and growing. Every time my EBS volume fills up, I attach another volume.
Then I added S3 to my setup. Cron jobs run and store data on S3, and the EC2 instance displays data from S3. I am using the PHP SDK for this purpose.
The problem I am facing is that S3 is very slow in my current setup.
Please suggest whether my setup is good or needs some change, and how I can speed up S3, or whether I should take a different approach altogether.
The EC2 instance is a large reserved instance running CentOS.
I have heard a bit about s3fs, which mounts an S3 bucket on EC2 as a volume. Is this a good choice? When I mounted an S3 bucket on the EC2 instance, the transfer rate was very slow.
I am new to AWS. My users do not access files directly from S3; they access them through my website, which runs on the EC2 instance.
RDS is a good choice for storing metadata such as tags, comments and other relevant information about your multimedia files. S3 is good for storing static content such as Video, Audio and Pictures. I think your approach with RDS and S3 is good enough.
EBS backed instances are good for persistence. If you store your metadata on RDS and static content on S3, the only reason why you should use EBS backed EC2 instances is that you have some configuration files which are unversioned right now. If that's not the case, assuming that your configuration is checked into version control and can be pulled on-demand for a fresh instance every time, then you might want to ditch EBS volumes in favor of ephemeral storage. That may give you some performance boost, nothing significant though.
Regarding your concern with S3's latency: yes, S3 is slow. While all your writes may happen directly to S3, I would highly recommend that you set up Amazon CloudFront in front of your S3 buckets and let your website consume multimedia content from CloudFront. CloudFront is a Content Delivery Network (CDN) which works with disk volumes (EBS backed or ephemeral) as well as with S3. Setting it up takes no more than a few minutes. CloudFront also supports streaming media files over RTMP. You may need a library like GPAC for hinting multimedia files to make them streamable, if that is not being done already. You might then want to consider creating one distribution for video/audio files for streaming and another distribution for images, JavaScript, stylesheets and other text files.
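In practice, the change on the website side is mostly swapping the direct S3 URLs for the distribution's domain when rendering pages. A minimal sketch of the idea (the question mentions the PHP SDK, but the same pattern applies in any language; the CloudFront domain and object key below are placeholders):

```python
# Serve media through the CloudFront distribution instead of hitting S3
# (or the EC2 instance) directly. The domain below is a placeholder for
# the domain CloudFront assigns to your distribution.
CLOUDFRONT_DOMAIN = "d1234example.cloudfront.net"

def media_url(s3_key: str) -> str:
    """Build the public URL the website should embed for an S3 object."""
    return f"https://{CLOUDFRONT_DOMAIN}/{s3_key.lstrip('/')}"

# e.g. the src attribute for an <img> tag in a page template
print(media_url("images/user-42/photo.jpg"))
```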
Hope this helps.
For faster downloading and uploading of files with Amazon S3, I use batch().
You can also use CloudFront to serve files faster. I think 9gag uses CloudFront as well.