I created a new (not even a week old) public S3-bucket to access some files remotely (it has to be public - that is the crux). Things like logging and versioning are deactivated, and pretty much all the standard options were kept.
However, I see a huge number of requests (over 3,000 per day, and possibly increasing) that have nothing to do with my own access to the files. Where does this traffic come from? Does Amazon access the files itself by default, for example for some kind of tracking? Can this be deactivated?
You can activate Amazon S3 server access logging to obtain this information. The logs will show the requests and their origin.
FYI, requests are charged at $0.0004 per 1,000, so your 3,000 requests per day are costing about 0.12c per day.
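To make the arithmetic explicit, here is a minimal sketch that uses the GET price quoted above. The function name is just for illustration, and actual prices vary by region and over time.

```python
GET_PRICE_PER_1000 = 0.0004  # USD per 1,000 GET requests, as quoted above


def daily_request_cost(requests_per_day, price_per_1000=GET_PRICE_PER_1000):
    """Return the daily request charge in USD."""
    return requests_per_day / 1000 * price_per_1000


cost = daily_request_cost(3000)
# 3 blocks of 1,000 requests at $0.0004 each
print(f"${cost:.4f} per day, about ${cost * 30:.3f} per month")
```

So at this volume the request charge itself is negligible; it is data transfer that usually matters.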
I have an EC2 instance in AWS whose only job is to host a .txt file that I upload 4 times a day. All my clients read the latest uploaded file through my software, and they can refresh it as many times a day as they want.
Lately I have been surprised by the EC2 data transfer cost: $0.090 per GB for the first 10 TB / month.
I would like to know whether there is another AWS service where I can host these .txt files so that my clients can consume them without my paying as much as I do now (more than 200 dollars per month).
DISCLAIMER I AM FROM ARGENTINA
OK, the first thing you have to know is that inbound data transfer (everything you upload) is free, BUT if you expose your instance through an AWS load balancer you will also be charged for connections and data processing. Data transfer fees in AWS are basically a headache, IMO.
My suggestion -> AWS S3
If your .txt files can be publicly accessible, or you can modify your app to create S3 pre-signed URLs that keep the files private but accessible to your customers, put those files in AWS S3. You will pay exactly the same data transfer fee, but you will save on EC2 instance capacity (and EBS is a little more expensive than S3). Additionally, you don't need to care about HA or backups.
I don't think you need CloudFront at the very beginning.
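For reference, this is roughly what a pre-signed GET URL looks like under the hood. It is a hand-rolled sketch of SigV4 query-string signing using only the standard library, assuming long-term credentials (no session token) and a virtual-hosted regional endpoint. The bucket, key and credentials below are placeholders. In a real app you would normally let an AWS SDK do this (for example boto3's generate_presigned_url) rather than sign by hand.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get_url(bucket, key, region, access_key, secret_key,
                    expires=3600, now=None):
    """Build a SigV4 pre-signed GET URL for a private S3 object."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    scope = f"{date}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Canonical query string: sorted by name, strictly URI-encoded.
    query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(params.items())
    )
    path = "/" + quote(key, safe="/-_.~")
    canonical_request = "\n".join(
        ["GET", path, query, f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"]
    )
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    signing_key = ("AWS4" + secret_key).encode("utf-8")
    for part in (date, region, "s3", "aws4_request"):
        signing_key = _hmac(signing_key, part)
    signature = hmac.new(
        signing_key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return f"https://{host}{path}?{query}&X-Amz-Signature={signature}"


# Placeholder credentials (the example keys used in AWS documentation).
print(presign_get_url(
    "examplebucket", "data/latest.txt", "us-east-1",
    "AKIAIOSFODNN7EXAMPLE", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    expires=900,
))
```

Your software would request such a URL from your backend and then download the file directly from S3 until the URL expires.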
We have a standard public S3 bucket and folder (Asia Pacific region) that hosts ~30 GB of images/media. The website and the app access these images using direct S3 object URLs. Without realizing it, we ran into a high data transfer cost that is significantly disproportionate to the storage cost:
Amazon Simple Storage Service: USD 30
AWS Data Transfer: USD 110
I have also read that the cost is significantly lower if EC2 and S3 are in the same region, but the problem is that the S3 objects are fetched directly by client machines anywhere in the world, with no EC2 instance in between.
Can someone please suggest how the data transfer costs can be reduced?
The Data Transfer charge is directly related to the amount of information that goes from AWS to the Internet. Depending on your region, it is typically charged at 9c/GB.
If you are concerned about the Data Transfer charge, there are a few things you could do:
Activate Amazon S3 Server Access Logging, which will create a log file for each web request. You can then see how many requests are coming and possibly detect strange access behaviour (eg bots, search engines, abuse).
You could try reducing the size of files that are typically accessed, such as making images smaller. Take a look at the Access Logs and determine which objects are being accessed the most, and are therefore causing the most costs.
Use fewer large files (e.g. videos) on your website. Again, look at the Access Logs to determine where the data is being used.
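Once logging is on, a small script can tell you which objects and which clients account for the traffic. The sketch below assumes the space-delimited S3 server access log layout (bucket owner, bucket, time, remote IP, requester, request ID, operation, key, request URI, status, error code, bytes sent, ...) and only parses those leading fields. The sample lines are made up for illustration.

```python
import re
from collections import Counter

# Leading fields of an S3 server access log record; later fields are ignored.
LOG_RE = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'(?P<request_uri>"[^"]*"|-) (?P<status>\S+) (?P<error>\S+) '
    r'(?P<bytes_sent>\S+)'
)


def summarize(lines):
    """Return (bytes sent per object key, request count per remote IP)."""
    bytes_by_key = Counter()
    requests_by_ip = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip records that do not fit the expected layout
        requests_by_ip[m["ip"]] += 1
        if m["bytes_sent"] != "-":
            bytes_by_key[m["key"]] += int(m["bytes_sent"])
    return bytes_by_key, requests_by_ip


# Made-up sample records in the expected layout.
SAMPLE = [
    'ownerid my-bucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 - 3E57427F3EXAMPLE '
    'REST.GET.OBJECT images/hero.jpg "GET /images/hero.jpg HTTP/1.1" 200 - 2048 2048 7 5 "-" "Mozilla/5.0" -',
    'ownerid my-bucket [06/Feb/2019:00:00:41 +0000] 192.0.2.3 - 4F68538G4EXAMPLE '
    'REST.GET.OBJECT images/hero.jpg "GET /images/hero.jpg HTTP/1.1" 200 - 2048 2048 6 4 "-" "Mozilla/5.0" -',
    'ownerid my-bucket [06/Feb/2019:00:01:12 +0000] 198.51.100.7 - 7B4A0FABBEXAMPLE '
    'REST.GET.OBJECT videos/intro.mp4 "GET /videos/intro.mp4 HTTP/1.1" 403 AccessDenied 243 - 4 - "-" "curl/7.58.0" -',
]

bytes_by_key, requests_by_ip = summarize(SAMPLE)
print(bytes_by_key.most_common(5))
print(requests_by_ip.most_common(5))
```

Sorting by bytes sent rather than by request count is deliberate here, since the bill is driven by gigabytes transferred.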
A cost of $110 suggests about 1.2TB of data being transferred.
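That figure comes from dividing the charge by the per-GB rate. A quick sketch, assuming the flat 9c/GB mentioned above (the real rate depends on the region and on volume tiers):

```python
PRICE_PER_GB = 0.09  # USD per GB; typical rate quoted above, varies by region


def transferred_gb(charge_usd, price_per_gb=PRICE_PER_GB):
    """Estimate the outbound volume in GB behind a Data Transfer charge."""
    return charge_usd / price_per_gb


gb = transferred_gb(110)
print(f"{gb:.0f} GB, about {gb / 1000:.1f} TB")
```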
I am currently implementing image storing architecture for my service.
As I read in one article, it is a good idea to move the whole
image upload and download traffic to external cloud object storage.
https://medium.com/#jgefroh/software-architecture-image-uploading-67997101a034
As I noticed there are many cloud object storage providers:
- Amazon S3
- Google Cloud Storage
- Microsoft Azure Blob Storage
- Alibaba Object Storage
- Oracle Object Storage
- IBM Object Storage
- Backblaze B2 Object
- Exoscale Object Storage
- Aruba Object Storage
- OVH Object Storage
- DreamHost DreamObjects
- Rackspace Cloud Files
- Digital Ocean Spaces
- Wasabi Hot Object Storage
My first choice was Amazon S3 because
almost all of my system infrastructure is located on AWS.
However I see a lot of problems with this object storage.
(Please correct me if I am wrong in any point below)
1) Expensive log delivery
AWS charges for all operational requests. If I have to pay for every request, I would like to see logs for every request, and I would like to get them as fast as possible. AWS S3 provides log delivery, but with a big delay, and each log is delivered as a separate file in another S3 bucket, so each log is a separate S3 write request. Write requests are more expensive: they cost approximately $5 per 1M requests. Another option is to trigger AWS Lambda whenever a request is made, but that is an additional cost of $0.20 per 1M Lambda invocations. In summary, in my opinion log delivery of S3 requests is way too expensive.
2) Cannot configure maximum object content-length globally for a whole bucket.
I have not found a way to configure a maximum object size (content-length) restriction for a whole bucket. In short, I want to be able to block uploads of files larger than a specified limit for a chosen bucket. I know that it is possible to specify the content-length of the uploaded file in presigned PUT URLs, but I think this should be configurable globally for a whole bucket.
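For per-upload limits, one existing mechanism is a browser POST upload with a policy document, which supports a content-length-range condition. The sketch below only builds and base64-encodes such a policy. Signing it and adding the required x-amz-* form fields is left out, and the bucket name, key prefix and size limit are made-up values. SDKs can produce the full signed form (for example boto3's generate_presigned_post with a Conditions list).

```python
import base64
import datetime
import json


def build_upload_policy(bucket, key_prefix, max_bytes,
                        valid_for_seconds=300, now=None):
    """Return an S3 POST policy that rejects uploads larger than max_bytes."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    expiration = now + datetime.timedelta(seconds=valid_for_seconds)
    return {
        "expiration": expiration.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],
            # Uploads outside this size range are refused by S3.
            ["content-length-range", 1, max_bytes],
        ],
    }


# Hypothetical bucket and a 5 MiB limit.
policy = build_upload_policy("my-upload-bucket", "uploads/", 5 * 1024 * 1024)
encoded = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")
print(json.dumps(policy, indent=2))
```

This still has to be issued per upload by your backend, so it does not give the bucket-wide setting asked for here.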
3) Cannot configure a request rate limit per IP number per minute directly on a bucket.
Because all S3 requests are chargeable, I would like to be able
to limit the number of requests made on my bucket from one IP number.
I want to prevent massive uploads and downloads from one IP number
and I want it to be configurable for a whole bucket.
I know that this functionality can be provided by AWS WAF attached to CloudFront,
however such WAF-inspected requests are way too expensive!
You have to pay $0.60 per 1M inspected requests.
Direct Amazon S3 requests cost $0.40 per 1M requests,
so there is no point in it, and it is not profitable,
to use AWS WAF as a rate limit option for S3 requests as "wallet protection" against DoS attacks.
4) Cannot create "one time - upload" presigned URL.
Generated presigned URLs can be used multiple times as long as they have not expired.
It means that you can upload a file many times using the same presigned URL.
It would be great if the AWS S3 API provided a way to create "one time upload" presigned URLs. I know that I can implement such "one time - upload" functionality myself.
For example see this link https://serverless.com/blog/s3-one-time-signed-url/
However, in my opinion such functionality should be provided directly by the S3 API.
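To illustrate what "implementing it myself" could mean, here is a minimal sketch of single-use upload tokens checked by a component sitting in front of S3. This is not the method from the linked article and not an S3 feature. The class and method names are hypothetical, and the in-memory dict would need to be replaced by a shared store if more than one process handles requests.

```python
import secrets
import time


class OneTimeUploadTokens:
    """Issue upload tokens that can be redeemed once, before they expire."""

    def __init__(self, ttl_seconds=300):
        self._ttl = ttl_seconds
        self._pending = {}  # token -> expiry timestamp

    def issue(self, now=None):
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._pending[token] = now + self._ttl
        return token

    def redeem(self, token, now=None):
        """Return True only for the first use of an unexpired token."""
        now = time.time() if now is None else now
        expiry = self._pending.pop(token, None)  # removed on first use
        return expiry is not None and now <= expiry


tokens = OneTimeUploadTokens(ttl_seconds=300)
t = tokens.issue()
print(tokens.redeem(t))  # first use
print(tokens.redeem(t))  # second use of the same token
```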
5) Every request to S3 is chargeable!
Let's say you created a private bucket.
No one can access the data in it, however...
Anybody on the internet can run bulk requests against your bucket...
and Amazon will charge you for all those forbidden 403 requests!!!
It is not very comfortable that someone can "drain my wallet"
at any time knowing only the name of my bucket!
It is far from secure, especially if you give someone
a direct S3 presigned URL containing the bucket address.
Everyone who knows the name of a bucket can run bulk 403 requests and drain my wallet!!!
Someone already asked that question here and I guess it is still a problem
https://forums.aws.amazon.com/message.jspa?messageID=58518
In my opinion forbidden 403 requests should not be chargeable at all!
6) Cannot block network traffic to S3 via NACL rules
Because every request to S3 is chargeable,
I would like to be able to completely block
network traffic to my S3 bucket at a lower network layer.
Because S3 buckets cannot be placed in a private VPC,
I cannot block traffic from a particular IP number via NACL rules.
In my opinion AWS should provide such NACL rules for S3 buckets
(and I mean NACL rules, not ACL rules that block only at the application layer).
Because of all these problems I am considering using nginx
as a proxy for all requests made to my private S3 buckets
Advantages of this solution:
I can rate limit requests to S3 for free however I want
I can cache images on my nginx for free - fewer requests to S3
I can add extra layer of security with Lua Resty WAF (https://github.com/p0pr0ck5/lua-resty-waf)
I can quickly cut off requests with request body greater than specified
I can provide additional request authentication with the use of openresty
(custom lua code can be executed on each request)
I can easily and quickly obtain all access logs from my EC2 nginx machine and forward them to CloudWatch using the CloudWatch agent.
Disadvantages of this solution:
I have to transfer all the traffic to S3 through my EC2 machines and scale my EC2 nginx machines with the use of autoscaling group.
Direct traffic to S3 bucket is still possible from the internet for everyone who knows my bucket name!
(No possibility to hide S3 bucket in private network)
MY QUESTIONS
Do you think that such an approach, with an nginx reverse proxy in front of object storage, is a good one?
Or is it better to just find an alternative cloud object storage provider and not proxy object storage requests at all?
I would be very thankful for recommendations of alternative storage providers.
The following information about each recommendation would be preferred:
Object storage provider name
A. What is the price for INGRESS traffic?
B. What is the price for EGRESS traffic?
C. What is the price for REQUESTS?
D. What payment options are available?
E. Is there any long-term agreement?
F. Where are the data centers located?
G. Does it provide S3 compatible API?
H. Does it provide full access for all request logs?
I. Does it provide configurable rate limit per IP number per min for a bucket?
J. Does it allow hiding object storage in a private network, or allowing network traffic only from a particular IP number?
In my opinion a PERFECT cloud object storage provider should:
1) Provide access logs of all requests made on bucket (IP number, response code, content-length, etc.)
2) Provide possibility to rate limit buckets requests per IP number per min
3) Provide possibility to cut off traffic from malicious IP numbers in network layer
4) Provide possibility to hide object storage buckets in private network or give access only for specified IP numbers
5) Do not charge for forbidden 403 requests
I would be very thankful for all the answers, comments and recommendations.
Best regards
Using nginx as a reverse proxy for cloud object storage is a good idea for many use cases, and you can find some guides online on how to do so (at least with S3).
I am not familiar with all the features offered by all cloud storage providers, but I doubt that any of them will give you all the features and flexibility you have with nginx.
Regarding your disadvantages:
1) Scaling is always an issue, but benchmark tests show
that nginx can handle a lot of throughput even on small machines.
2) There are solutions for that in AWS. First make your S3 bucket private, and then you can:
- Allow access to your bucket only from the EC2 instance(s) running your nginx servers
- Generate pre-signed URLs to your S3 bucket and serve them to your clients using nginx.
Note that both solutions for your second problem require some development.
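As a rough sketch of the first option, a bucket policy can restrict reads to the public IP addresses of the nginx hosts. The bucket name and IP below are placeholders. Note the assumption that the proxies reach S3 from fixed public addresses (e.g. Elastic IPs); if they go through a VPC endpoint instead, the condition would have to use aws:SourceVpce rather than aws:SourceIp.

```python
import json


def proxy_only_bucket_policy(bucket, proxy_ips):
    """Bucket policy allowing object reads only from the given source IPs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadFromProxyOnly",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Only requests arriving from these addresses match.
                "Condition": {"IpAddress": {"aws:SourceIp": proxy_ips}},
            }
        ],
    }


# Hypothetical bucket name and proxy address.
policy = proxy_only_bucket_policy("my-private-bucket", ["203.0.113.10/32"])
print(json.dumps(policy, indent=2))
```

Requests from any other address simply get no Allow and are answered with 403.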
If you have AWS infrastructure and want to run an on-prem, S3-compatible API, you can look into MinIO.
It is a performant object store that protects data through erasure coding.
I have been using the Amazon S3 service to store some files.
I have uploaded 4 videos and they are public. I'm using a third-party video player (JW Player) for those videos. As a new user on the AWS Free Tier, I have almost used up my 2,000 free PUT, POST and LIST requests, which seems ridiculous for four videos.
Am I missing something? Shouldn't one upload be one PUT request? I don't understand how I've hit that limit already.
The AWS Free Tier for Amazon S3 includes:
5GB of standard storage (normally $0.023 per GB)
20,000 GET requests (normally $0.0004 per 1,000 requests)
2,000 PUT requests (normally $0.005 per 1,000 requests)
In total, it is worth up to 13.3 cents every month!
So, don't be too worried about your current level of usage, but do keep an eye on charges so you don't get too many surprises. You can always Create a Billing Alarm to Monitor Your Estimated AWS Charges.
The AWS Free Tier is provided to explore AWS services. It is not intended for production usage.
It would be very hard to find the reason for this without debugging a bit, so I would suggest you try the following:
See if you have CloudTrail enabled. If yes, you can track the API calls to S3 to see if anything is wrong there.
If you have CloudTrail enabled, it writes its own data into an S3 bucket, which might also use up some of the requests.
See if you have logging enabled at the bucket level; that might give you more insight into which requests are reaching your bucket.
Your videos are public, and that is the biggest concern here, as you don't know who is accessing them.
Set up CloudWatch alarms to avoid any surprises, and look at the logs to find the issue.
I am learning AWS, and came across hosting static websites using Amazon S3 and distributing them to edge locations using CloudFront and Route 53.
I know that with CloudFront we pay for what we use, so my monthly bill will reflect the number of requests I get once the free tier is over.
My question is: what if a hacker or someone else sends lots of requests, like spamming? Will I be charged more?
How can I prevent this, and does AWS have any security measures for this, like limiting the number of requests served per minute?
Pardon me if my question is very basic. I am just learning. Thanks
My question is: what if a hacker or someone else sends lots of requests, like spamming? Will I be charged more?
Yes. You are charged a per-request price, as well as data transfer charges. The per-request charges are relatively low, but if they find a large file to download they can quickly run up the bandwidth charge.
does AWS have any security measures for this, like limiting the number of requests served per minute?
Yes, you want WAF, the Web Application Firewall. With it you can configure a rate-based rule that will block an IP address after N requests within a five-minute period.
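To show the idea behind such a rule (this is not how WAF is implemented, only a conceptual sketch), a per-IP sliding-window limiter could look like the following. The class name and the in-memory storage are assumptions made for the example.

```python
import time
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Allow at most max_requests per IP within a sliding time window."""

    def __init__(self, max_requests, window_seconds=300):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = PerIpRateLimiter(max_requests=2, window_seconds=300)
print([limiter.allow("198.51.100.7", now=t) for t in (0, 1, 2, 301)])
# prints [True, True, False, True]
```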
As far as I know, you pay for cache invalidations on AWS, but not for the number of requests made to your CloudFront distributions.