Cheapest way to use AWS for simple response

What I want to achieve is pretty simple: if you send a request to some address, the response you get is a single integer number, like 13 for example. I think it is equivalent to hosting an .html page with a single number on it, which I can then parse as a string in my application. (It is a Unity game, using the WWW class to send the request.)
(This is actually a version number. If it is greater than what I have stored in my app, I update it and then send another request somewhere else to retrieve something bigger.)
I am looking for the cheapest way to handle this. I planned to use AWS, but I am confused about which component I should use: S3? EC2? Lambda? CloudFront?
If you think doing this on a web host or Heroku or something else is better, I would also like to hear about it.

To serve up a simple value, S3 should do the trick.
Create a bucket in the console, using only lowercase letters, digits, and dashes in the name. The name has to be globally unique across all of S3, so make up something unique. We'll call the bucket name example-bucket.
Create your file on your computer with the desired contents. If plain text, call it version.txt.
In the AWS console, select the bucket, and upload the file. While clicking through the "next" screens, put a check next to "make everything public" and accept the defaults. Upload the file.
Now, go to https://example-bucket.s3.amazonaws.com/version.txt in your browser (using your actual bucket name) and verify that it works. That's your download link.
Done. As long as you don't expect to handle over about 800 requests per second, this will do exactly what you want.
Review the S3 pricing, of course.
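If you would rather script those console steps, here is a minimal sketch using boto3; it assumes your credentials are already configured and that the bucket still permits public object ACLs, and it uses the placeholder bucket name from above.

import boto3  # assumes AWS credentials are configured locally

s3 = boto3.client('s3')

# Upload the version file and make it publicly readable,
# mirroring the "make everything public" checkbox in the console
s3.put_object(Bucket='example-bucket',
              Key='version.txt',
              Body=b'13',                # the single integer your app will parse
              ContentType='text/plain',
              ACL='public-read')

print('https://example-bucket.s3.amazonaws.com/version.txt')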

Although this question is more suitable for Server Fault: an EC2 instance running an nginx or Apache web server would be sufficient.
Put a load balancer in front of the EC2 instances.

Related

AWS S3 filename

I'm trying to build an application with a backend in Java that allows users to create a text with images in it (something like a personal blog). I'm planning to store these images in an S3 bucket. When uploading image files to the bucket, I hash the original name and store the hashed one in the bucket. Images are for display purposes only; no user will be able to download them. The frontend displays these images by getting a path to them from the server. So the question is: is there any need to store the original name of the image file in the database? And what are the reasons, if any, for doing so?
I guess in general it is not needed because what is more important is how these resources are used or managed in the system.
Assuming your service is something like data access (similar to Google Drive), I don't think it's necessary to store it in the DB, unless you want to make search queries faster.
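For illustration only, here is a minimal sketch of the upload path described in the question, written in Python to match the other snippets on this page (the backend in the question is Java); the bucket name and helper function are hypothetical, not actual code from the project.

import hashlib
import boto3

s3 = boto3.client('s3')
BUCKET = 'my-blog-images'  # hypothetical bucket name

def upload_image(original_name, image_bytes, content_type):
    # Store the object under a hash of the original filename
    hashed_key = hashlib.sha256(original_name.encode('utf-8')).hexdigest()
    s3.put_object(Bucket=BUCKET, Key=hashed_key,
                  Body=image_bytes, ContentType=content_type)
    # The frontend only needs this key to build the display path;
    # the original name is not required unless you want to search by it later.
    return hashed_key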

What is the correct way to set up S3 for loading content in the browser?

I want to do the following: a user in a browser types some text, and after he presses a 'Save' button, the text should be saved in a file (for example: content.txt) in a folder (for example: /username_text) at the root of an S3 bucket.
Also, I want the user to be able, when he visits the same page, to load the content from S3 and continue working on the file. Then, when he/she is done, save the file to S3 again.
Probably important to mention, but I plan on using NodeJS for my back-end...
My question now is: What is the best way to set this storing-and-retrieving thing up? Do I create an API gateway + Lambda function to GET and POST files through that? Or do I for example use the aws-sdk in Node to directly push and pull files from S3? Or is there a better way to do this?
I looked at the following two guides:
Using AWS S3 Buckets in a NodeJS App – Codebase – Medium
Image Upload and Retrieval from S3 Using AWS API Gateway and Lambda
Welcome to StackOverflow!
I think you are worrying too much about the not-so-important stuff. S3 is nothing but a storage system. You could have decided to store the content of these files in DynamoDB, RDS, etc. What would you do if you stored their contents in those real databases? You'd fetch the data and display it to the user, wouldn't you?
This is what you need to do with S3! S3 is a smart choice in your scenario because your "file" can grow very big and S3 is a great place for storing files. However, apparently, you're not actually storing files (think of .pdf, .mp4, .mov, etc.); you're essentially only storing human-readable text.
So here's one approach on how to solve your problem:
FETCHING FILE CONTENT
User logs in
You fetch the user's personal information based on some token. You can store all the metadata in DynamoDB, where given a user_id, fetch all the "files" from this user. These "files" (metadata only) would be the bucket and key for the actual file on S3.
You use the getObject API from S3 to fetch the file based on your query and display the body of your file to your user in a RESTful way (a minimal Lambda sketch follows the example response). Your response should look something like this:
{
  "content": "some content"
}
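Here is what that fetch path could look like in a Python Lambda function, as a minimal sketch; the DynamoDB table name, item attributes, and event shape are assumptions made up for illustration.

import json
import boto3

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
files_table = dynamodb.Table('files')  # hypothetical metadata table

def get_content(event, context):
    # Look up the bucket/key metadata for the requested file
    file_id = event['pathParameters']['file_id']
    item = files_table.get_item(Key={'file_id': file_id})['Item']

    # Fetch the text from S3 and return it as JSON
    obj = s3.get_object(Bucket=item['bucket'], Key=item['key'])
    return {
        'statusCode': 200,
        'body': json.dumps({'content': obj['Body'].read().decode('utf-8')})
    }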
SAVING FILE CONTENT
User logs in
The user writes anything in a form and submits it. In your Lambda function, you grab the content of this form and process it. This request should look something like this:
{
  "file_id": "some-id",
  "user_id": "some-id",
  "content": "some-content"
}
If the file_id exists, update the content in S3. Otherwise, upload a new file to S3 and then create a new entry in DynamoDB. You would, of course, have to check that the user submitting the changes actually owns the file; if you're using UUIDs this shouldn't be much of a problem, but it is still worth checking in case an ID is leaked somehow. (A minimal sketch of this save path is shown below.)
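A minimal sketch of that save path, under the same assumptions as above (table name, bucket name, attribute names, and event shape are made up for illustration; the ownership check is omitted):

import json
import uuid
import boto3

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
files_table = dynamodb.Table('files')  # hypothetical metadata table
BUCKET = 'my-content-bucket'           # hypothetical bucket name

def save_content(event, context):
    body = json.loads(event['body'])
    file_id = body.get('file_id') or str(uuid.uuid4())
    key = '{}/content.txt'.format(body['user_id'])

    # Write (or overwrite) the text itself in S3
    s3.put_object(Bucket=BUCKET, Key=key,
                  Body=body['content'].encode('utf-8'),
                  ContentType='text/plain')

    # Create or refresh the metadata entry in DynamoDB
    files_table.put_item(Item={'file_id': file_id,
                               'user_id': body['user_id'],
                               'bucket': BUCKET,
                               'key': key})

    return {'statusCode': 200, 'body': json.dumps({'file_id': file_id})}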
This way, you don't need to worry about uploading/downloading files, which are CPU-intensive tasks, so you can keep your costs low and use very little RAM in your functions (128 MB should be more than enough); after all, you're now only serving text. Not only will this simplify your design, it will also make things simpler both in API Gateway and in your code, as you won't have to deal with binary types. The most you'll do is convert the buffer from S3 to a String when serving some content, but this should be completely fine.
EDIT
On your question regarding whether you should upload it from the browser or not, I suggest you take a look into this answer where I cover the pros/cons of doing it via API Gateway vs from the Browser.

CloudFront Top Referrers Report - ALL referrer URLs

In AWS I can find under:
Cloudfront >> Reports & Analytics >> Top Referrers (CloudFront Top Referrers Report)
There I get the top 25 items. How can I get ALL of them?
I have turned on logging for my bucket, but it seems that the referrer is not part of the log file. Any idea how Amazon collects its top 25 and how I can accordingly get the whole list?
Thanks in advance for your help.
Amazon's built-in analytics are, as you've noticed, rather basic. The data you're looking for all lives in the logfiles that you can set CloudFront up to export (in the cs(Referer) field). If you know what you're looking for, you can set up a little pipeline to download logs, pull out the numbers you care about, and generate reports.
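As a sketch of that kind of pipeline, this is roughly how you could tally every referrer from the gzipped CloudFront log files once you have downloaded them locally (the logs/ directory is a placeholder):

import collections
import glob
import gzip

referrers = collections.Counter()

for path in glob.glob('logs/*.gz'):          # wherever you downloaded the CloudFront logs
    with gzip.open(path, mode='rt') as f:
        fields = []
        for line in f:
            if line.startswith('#Fields:'):
                fields = line.split()[1:]    # column names, e.g. cs(Referer)
            elif not line.startswith('#'):
                row = line.rstrip('\n').split('\t')
                referrers[row[fields.index('cs(Referer)')]] += 1

# The full referrer list, not just the top 25
for referrer, count in referrers.most_common():
    print(count, referrer)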
Amazon also makes it easy[1] to set up Athena or Redshift to look directly at CloudFront or S3 logfiles in their target bucket. After a one-time setup, you could query them directly for the numbers you need.
There are also paid services built to fill in the holes in Amazon's default reports. S3stat (https://www.s3stat.com/), for example, will give you a Top 200 Referrer list in its reports, with the ability to export complete lists.
[1] "easy", using Amazon's definition of the word, meaning really really hard.

Resize images on the fly in CloudFront and get them in the same URL instantly: AWS CloudFront -> S3 -> Lambda -> CloudFront

TLDR: We have to trick CloudFront's 307 redirect caching by creating a new cache behavior for responses coming from our Lambda function.
You will not believe how close we are to achieving this. We are stuck so badly on the last step.
Business case:
Our application stores images in S3 and serves them with CloudFront in order to avoid any geographic slow downs around the globe.
Now, we want to be really flexible with the design and be able to request new image dimensions directly in the CloudFront URL!
Each new image size will be created on demand and then stored in S3, so the second time it is requested it will be served really quickly, as it will exist in S3 and will also be cached in CloudFront.
Let's say the user has uploaded the image chucknorris.jpg.
Only the original image will be stored in S3 and will be served on our page like this:
//xxxxx.cloudfront.net/chucknorris.jpg
We have calculated that we now need to display a thumbnail of 200x200 pixels.
Therefore we put the image src to be in our template:
//xxxxx.cloudfront.net/chucknorris-200x200.jpg
When this new size is requested, AWS has to provide it on the fly in the same bucket and with the requested key.
This way the image will be directly loaded in the same URL of CloudFront.
I made an ugly drawing with the architecture overview and the workflow on how we are doing this in AWS:
Here is how the Python Lambda function ends:
return {
    'statusCode': '301',
    'headers': {'location': redirect_url},
    'body': ''
}
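(For context only, since the question shows just the ending: a resize handler of this kind typically looks roughly like the sketch below. It is not the actual function from the project; the bucket name, event shape, and the assumption that Pillow is bundled in the deployment package are all made up for illustration.)

import io
import re
import boto3
from PIL import Image  # assumes Pillow is bundled in the deployment package

s3 = boto3.client('s3')
BUCKET = 'our-images-bucket'              # hypothetical bucket name
CDN_URL = 'https://xxxxx.cloudfront.net'  # the CloudFront distribution

def handler(event, context):
    # e.g. key = 'chucknorris-200x200.jpg', passed through API Gateway
    key = event['queryStringParameters']['key']
    name, width, height, ext = re.match(r'(.+)-(\d+)x(\d+)(\.\w+)$', key).groups()

    # Fetch the original, resize it, and store the result under the requested key
    original = s3.get_object(Bucket=BUCKET, Key=name + ext)
    image = Image.open(io.BytesIO(original['Body'].read()))
    fmt = image.format or 'JPEG'
    resized = image.resize((int(width), int(height)))
    buffer = io.BytesIO()
    resized.save(buffer, format=fmt)
    buffer.seek(0)
    s3.put_object(Bucket=BUCKET, Key=key, Body=buffer,
                  ContentType=original['ContentType'])

    # Redirect the client to the freshly created object
    redirect_url = '{}/{}'.format(CDN_URL, key)
    return {
        'statusCode': '301',
        'headers': {'location': redirect_url},
        'body': ''
    }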
The problem:
If we make the Lambda function redirect to S3, it works like a charm.
If we redirect to CloudFront, it goes into a redirect loop because CloudFront caches 307 (as well as 301, 302 and 303).
As soon as our Lambda function redirects to CloudFront, CloudFront calls the API Gateway URL instead of fetching the image from S3:
I would like to create new cache behavior in CloudFront's Behaviors settings tab.
This behavior should not cache responses from Lambda or S3 (I don't know what exactly is happening internally there), but should still cache any subsequent requests to this very same resized image.
I am trying to set path pattern -\d+x\d+\..+$, add the ARN of the Lambda function under "Lambda Function Association",
and set the Event Type to Origin Response.
Next to that, I am setting the "Default TTL" to 0.
But I cannot save the behavior due to some error:
Are we on the right way, or is the idea of this "Lambda Function Association" totally different?
Finally I was able to solve it. Although this is not really a structural solution, it does what we need.
First, thanks to Michael's answer, I used path patterns to match all media types. Second, the Cache Behavior page was a bit misleading to me: the Lambda association is indeed for Lambda@Edge, although I did not see this anywhere in the tooltips of the cache behavior; all you see is just "Lambda". This feature cannot help us, as we do not want to extend our AWS service scope with Lambda@Edge just because of this particular problem.
Here is the solution approach:
I have defined multiple cache behaviors, one per media type that we support:
For each cache behavior I set the Default TTL to be 0.
And the most important part: In the Lambda function, I have added a Cache-Control header to the resized images when putting them in S3:
s3_resource.Bucket(BUCKET).put_object(Key=new_key,
                                      Body=edited_image_obj,
                                      CacheControl='max-age=12312312',
                                      ContentType=content_type)
To validate that everything works, I can now see that the new image dimension is served with the cache header in CloudFront:
You're on the right track... maybe... but there are at least two problems.
The "Lambda Function Association" that you're configuring here is called Lambda#Edge, and it's not yet available. The only users who can access it is users who have applied to be included in the limited preview. The "maximum allowed is 0" error means you are not a preview participant. I have not seen any announcements related to when this will be live for all accounts.
But even once it is available, it's not going to help you, here, in the way you seem to expect, because I don't believe an Origin Response trigger allows you to do anything to trigger CloudFront to try a different destination and follow the redirect. If you see documentation that contradicts this assertion, please bring it to my attention.
However... Lambda#Edge will be useful for setting Cache-Control: no-cache on the 307 so CloudFront won't cache it, but the redirect itself will still need to go all the way back to the browser.
Note also, Lambda#Edge only supports Node, not Python... so maybe this isn't even part of your plan, yet. I can't really tell, from the question.
Read about the Lambda#Edge limited preview.
The second problem:
I am trying to set path pattern -\d+x\d+\..+$
You can't do that. Path patterns are string matches supporting * wildcards. They are not regular expressions. You might get away with /*-*x*.jpg, though, since multiple wildcards appear to be supported.

AWS S3 - Privacy error when accessing file from link

I am working with a team that is using S3 to host content. They moved from a single bucket for all brands to one bucket for each brand, and now we are having trouble when linking to the content from within a Salesforce Site.com page. When I copy the link from S3 as HTTPS, I get: "Your connection is not private. Attackers might be trying to steal your information from spiritxpress.s3.varsity.s3.amazonaws.com (for example, passwords, messages, or credit cards)."
I have asked them to compare the settings with the bucket that is working, but I don't have access to dig into it myself, and we are pretty new to this as well, so I thought I would see if there were any known paths to walk down. The ID and Key have not changed, and I can access the content via CyberDuck; it just is not loading when reached via a link.
Let me know if additional information is needed and I will provide as quickly as I can.
[EDIT] The bucket naming convention they are using is all lowercase and meets the naming guidelines as well, but the way it is structured seems strange to me: they have named the bucket "brandname.s3.companyname", and when copying the link it comes across as "https://brandname.s3.company.s3.amazonaws.com/directory/filename", whereas the other bucket was being rendered as "https://s3.amazonaws.com/bucketname/..."
Whoever made this change has failed to account for the way wildcard certificates work in HTTPS.
Requests to S3 using HTTPS are greeted with a certificate identifying itself as "*.s3[-region].amazonaws.com" and in order for the browser to consider this to be valid when compared to the link you're hitting, there cannot be any dots in the part of the hostname that matches the * offered by the cert. Bucket names with dots are valid, but they cannot be used on the left side of "s3[-region].amazonaws.com" in the hostname unless you are willing and able to accept a certificate that is deemed invalid... they can only be used as the first element of the path.
The only way to make dotted bucket names and S3 native wildcard SSL to work together is the other format: https://s3[-region].amazonaws.com/example.dotted.bucket.name/....
If your bucket isn't in us-standard, you likely need to use the region in the hostname, so that the request goes to the correct endpoint, e.g. https://s3-us-west-2.amazonaws.com/example.dotted.bucket.name/path... for a bucket in us-west-2 (Oregon). Otherwise S3 may return an error telling you that you need to use a different endpoint (and the endpoint they provide in the error message will be valid, but probably not the one you're wanting for SSL).
This is a limitation on how SSL certificates work, not a limitation in S3.
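If you generate these links from code, note that the AWS SDKs can be told to use path-style addressing, which keeps a dotted bucket name out of the hostname. Here is a minimal boto3 sketch, where the bucket, key, and region are placeholders and presigning is just one convenient way to produce a full URL:

import boto3
from botocore.client import Config

# Force path-style URLs so the dotted bucket name goes in the path,
# not the hostname, and the wildcard certificate stays valid
s3 = boto3.client('s3',
                  region_name='us-west-2',
                  config=Config(s3={'addressing_style': 'path'}))

url = s3.generate_presigned_url('get_object',
                                Params={'Bucket': 'example.dotted.bucket.name',
                                        'Key': 'directory/filename'})
print(url)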
Okay, it appears it did boil down to some permissions that were missed, and we were able to get the file to display as expected. Other issues are present, but this one is resolved, so I am marking the question as answered.