Amazon S3 + Lambda + DynamoDB Website Hosting - amazon-web-services

I'm interested in hosting a website for a small business (< 100 users / month) and I wanted to try going 'serverless'. I've read that using Amazon S3, Lambda and DynamoDB is a way to set this up, by hosting the front-end on S3, using Lambda functions to access the back-end, and storing data in DynamoDB. I'll need to run a script on page load to get data to display, save user profiles/allow logins, and accept payments using Stripe or Braintree.
Is this a good situation to use this setup, or am I better off just using EC2 with a LAMP stack? Which is better in terms of cost?

It is a perfectly good solution, and will probably cost you almost nothing to host on AWS - literally pennies a month. I host several low-traffic sites this way and it works well.
The only caveat: since your traffic is so low, almost every time someone hits a page that needs to make back-end calls, those Lambda functions will likely need a 'cold start', which introduces a delay and makes the page load a bit slower than it would with enough traffic to keep the Lambdas 'warm'.
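If the cold-start delay matters, one common workaround is a scheduled EventBridge (CloudWatch Events) rule that pings the function every few minutes. A minimal sketch of the handler side, assuming such a rule exists (scheduled events arrive with `"source": "aws.events"`):

```python
# Sketch: short-circuit scheduled keep-warm pings so they cost almost nothing.
# Assumes an EventBridge rule invokes this function every few minutes.

def handler(event, context=None):
    if event.get("source") == "aws.events":
        # Keep-warm ping: return immediately without touching the back end.
        return {"warm": True}
    # Normal request handling would go here.
    return {"statusCode": 200, "body": "hello"}
```

At this traffic level the extra invocations stay comfortably inside the Lambda free tier.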

Related

Using Lambda@Edge for dynamically changing origins

I have an application running on EC2 instances behind an ALB.
The application basically serves dynamic HTML pages. To reduce the load on the application instances, I was planning to save the rendered HTML to S3 and serve it from there instead of from the application instances.
Each incoming request should be routed to either the ALB or S3, depending on whether the page is stored in S3.
For that we are planning to use CloudFront and Lambda@Edge to dynamically route traffic to different origins, depending on a value set in DynamoDB for each route.
So far in testing it seems to work fine; the only issue is increased latency from the DynamoDB lookup, and for pages not stored in S3 the Lambda plus application hop adds considerable latency.
I would like to know if there are any better approaches than this, and whether there is a better storage mechanism than DynamoDB that we can use with Lambda@Edge.
Can we apply this logic behind a feature flag to only certain routes? (I thought of CloudFront behaviors, but with more than 500 routes, separate behaviors won't be feasible.)
How good is caching things in Lambda memory for a certain amount of time, and is it possible?
We have tried using DynamoDB global tables to add a replica in the region closer to the users, but it still adds latency.
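On the in-memory caching question: a Lambda@Edge execution environment keeps module-level variables alive between warm invocations, so a small TTL cache over the route-to-origin lookups can skip most DynamoDB round trips. A sketch under that assumption (`ROUTE_TTL` and `fetch_origin` are illustrative names, not from your setup):

```python
import time

# Module-level state survives across warm invocations of the same
# execution environment, so cached routes skip the DynamoDB lookup.
ROUTE_TTL = 60          # seconds to trust a cached route (illustrative)
_route_cache = {}       # route -> (origin, fetched_at)

def origin_for(route, fetch_origin, now=None):
    """Return the origin for a route, consulting the backing store
    (e.g. DynamoDB) at most once per TTL per execution environment."""
    now = time.time() if now is None else now
    hit = _route_cache.get(route)
    if hit is not None and now - hit[1] < ROUTE_TTL:
        return hit[0]                  # warm hit: no network round trip
    origin = fetch_origin(route)       # real code: a DynamoDB GetItem
    _route_cache[route] = (origin, now)
    return origin
```

The trade-off is staleness: a route moved from ALB to S3 keeps hitting the old origin for up to the TTL in each environment, so pick a TTL your update latency can tolerate.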

How to limit number of reads from Amazon S3 bucket

I'm hosting a static website in Amazon S3 with CloudFront. Is there a way to set a limit for how many reads (for example per month) will be allowed for my Amazon S3 bucket in order to make sure I don't go above my allocated budget?
If you are concerned about going over a budget, I would recommend Creating a Billing Alarm to Monitor Your Estimated AWS Charges.
AWS is designed for large-scale organizations that care more about providing a reliable service to customers than staying within a particular budget. For example, if their allocated budget was fully consumed, they would not want to stop providing services to their customers. They might, however, want to tweak their infrastructure to reduce costs in future, such as changing the Price Class for a CloudFront Distribution or using AWS WAF to prevent bots from consuming too much traffic.
Your static website will be rather low-cost. The biggest factor will likely be Data Transfer rather than charges for Requests. Changing the Price Class should assist with this. However, the only true way to stop accumulating Data Transfer charges is to stop serving content.
You could activate CloudTrail data read events for the bucket, create a CloudWatch Events rule to trigger an AWS Lambda function that increments a per-object read count in an Amazon DynamoDB table, and restrict access to an object once a certain number of reads has been reached.
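A sketch of the counting step, assuming a table keyed on the object key (in a real Lambda, `table` would be `boto3.resource("dynamodb").Table("ObjectReads")`; the table name and limit are assumptions):

```python
# Sketch: atomically increment a per-object read counter in DynamoDB and
# report whether the configured limit has been reached.

READ_LIMIT = 1000   # assumed budget per object

def record_read(table, object_key, limit=READ_LIMIT):
    resp = table.update_item(
        Key={"object_key": object_key},
        UpdateExpression="ADD read_count :one",
        ExpressionAttributeValues={":one": 1},
        ReturnValues="UPDATED_NEW",
    )
    count = int(resp["Attributes"]["read_count"])
    return count >= limit   # True -> time to restrict access
```

`ADD` makes the increment atomic even with concurrent reads, and `ReturnValues="UPDATED_NEW"` hands back the fresh count in the same call.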
What you're asking for is a very typical question in AWS. Unfortunately with near infinite scale, comes near infinite spend.
While you can put a WAF in front, that is actually meant for security rather than scale restrictions. From a cost perspective, I'd be more worried about the bandwidth charges than about S3 request costs.
Plus, once you put things like CloudFront or Lambda in the mix, it gets hard to limit all this down.
The best way to limit is to put Billing Alerts on your account -- and you can tier them, so you get $10, $20, $100 alerts, up to the point you're uncomfortable with. And then either manually disable the website -- or set up a Lambda function to disable it for you.

AWS Lambda with Elasticache Redis without NAT

I am going to mention my needs and what I have currently in place so bear with me. Firstly, a lambda function say F1 which when invoked will get 100 links from a site. Most of these links say about 95 are the same as when F1 was invoked the previous time, so further processing must be done with only those 5 "new" links. One solution was to write to a Dynamodb database the links that are processed already and each time the F1 is invoked, query the database and skip those links. But I found that the "database read" although in milliseconds is doubling up lambda runtime and this can add up especially if F1 is called frequently and if there are say a million processed links. So I decided to use Elasticache with Redis.
I quickly found that Redis can be accessed only when F1 runs in the same VPC, and because F1 needs access to the internet you need a NAT. (I don't know much about networking.) So I followed the guidelines, set up the VPC and NAT, and got everything to work. I was delighted with the performance improvements, which almost cut the expected Lambda cost in half, to about $30 per month. But then I found that NAT is not included in the free tier and I have to pay almost $30 per month just for the NAT. This is not ideal for me, as this project could be in development for months, and I feel like I am paying as much for internet access as for compute.
I would like to know if I am making any fundamental mistakes. Am I using the Elasticache in the right way? Is there a better way to access both Redis and the internet? Is there any way to structure my stack differently so that I retain the performance without essentially paying twice the amount after free tier ends. Maybe add another lambda function? I don't have any ideas. Any minute improvements are much appreciated. Thank you.
There are many ways to accomplish this, and all of them have some trade-offs. A few other ideas for you to consider:
Run F1 without a VPC. It will have connectivity directly to DynamoDB without need for a NAT, saving you the cost of the NAT gateway.
Run your function on a micro EC2 instance rather than in Lambda, and persist your link lookups to some file on local disk, or even a local Redis. With all the Serverless hype, I think people sometimes overestimate the difficulty (and stability) of simply running an OS. It's not that hard to manage, it's easy to set up backups, and may be an option depending upon your availability requirements and other needs.
Save your link data to S3 and set up an S3 gateway VPC endpoint, which has no hourly charge. Not sure if it will be fast enough for your needs.
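The S3 idea can work with a single object holding the seen-link set: one GET and one PUT per invocation instead of a read per link. A sketch with the S3 calls abstracted as `get_blob`/`put_blob` (in real code these would be boto3 `get_object`/`put_object`; all names here are illustrative):

```python
# Sketch: dedupe this run's links against a newline-delimited "seen" file
# stored as one S3 object, touching S3 only twice per invocation.

def process_new(links, get_blob, put_blob):
    seen = set(get_blob().splitlines())
    fresh = [link for link in links if link not in seen]
    if fresh:
        put_blob("\n".join(sorted(seen | set(fresh))))
    return fresh
```

At ~100 links per run this keeps the whole set in Lambda memory cheaply; at millions of links the download itself starts to dominate, which is where Redis or DynamoDB pulls ahead again.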

AWS EC2 vs Serverless Cost Comparison

I am currently using AWS EC2 for my workloads.
Now I want to convert the EC2 server to a serverless platform, i.e. API Gateway and Lambda.
I have also followed different blogs and I am ready to go with the serverless. But, my one concern is on pricing.
How can I predict per month cost for the serverless according to my use of EC2? Will the EC2 Cloudwatch metrics help me to calculate the costing?
How can I make cost comparison?
Firstly, there is no simple answer to your question as a simple lift and shift from a VM to Lambda is not ideal. To make the most of lambda, you need to architect your solution to be serverless. This means making use of the event-driven nature of Lambda.
Now to answer the question simply, you are charged only for the time it takes to serve a request (to the next 100ms). So if your lambda responds to the request in 70ms you pay for 100ms of execution time. If you serve the request in 210ms then you pay for 300ms.
You also pay for the memory allocated to the function, billed in GB-seconds (allocated memory × billed execution time).
If you have a good logging or monitoring strategy you could check how long it takes to serve each type of request and how often they occur. If your application is fairly low-scale and is not accessed often (say all requests come within an 8 hour window) then you may end up saving money with lambda as you are only paying AWS for the time spent serving the request.
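A rough estimate in that spirit, using the 100ms rounding described above (the per-request and per-GB-second prices below are assumptions for illustration; check the current AWS pricing page):

```python
import math

# Illustrative prices; they vary by region and change over time.
PRICE_PER_GB_SECOND = 0.00001667
PRICE_PER_REQUEST = 0.0000002

def monthly_lambda_cost(requests, avg_duration_ms, memory_mb):
    billed_seconds = math.ceil(avg_duration_ms / 100) * 0.1  # round up to 100ms
    gb_seconds = requests * billed_seconds * (memory_mb / 1024)
    return requests * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND
```

For example, a million 210ms requests at 512MB bill as 300ms each and come to roughly $2.70/month under these assumed prices; plug in the durations and request counts from your own logs to compare against your EC2 bill.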
Also, it may help to read the following article on common pitfalls:
https://medium.com/@emaildelivery/serverless-pitfalls-issues-you-may-encounter-running-a-start-up-on-aws-lambda-f242b404f41c

How to cache the images stored in Amazon S3?

I have a RESTful webservice running on Amazon EC2. Since my application needs to deal with large number of photos, I plan to put them on Amazon S3. So the URL for retrieving a photo from S3 could look like this:
http://johnsmith.s3.amazonaws.com/photos/puppy.jpg
Is there any way or necessity to cache the images on EC2? The pros and cons I can think of is:
1) Reduced S3 usage and cost with improved image-fetching performance. On the other hand, EC2 cost can rise, and EC2 may not be able to handle the image cache due to bandwidth restrictions.
2) Increased development complexity, because you need to check the cache first, ask S3 to transfer the image to EC2, and then transfer it to the client.
I'm using the EC2 micro instance and feel it might be better not to do the image cache on EC2. But the scale might grow fast and eventually will need a image cache.(Am I right?) If cache is needed, is it better to do it on EC2, or on S3? (Is there a way for caching for S3?)
By the way, when the client uploads an image, should it be uploaded to EC2 or S3 directly?
Why bring EC2 into the equation? I strongly recommend using CloudFront for the scenario.
When you use CloudFront with S3 as the origin, the content gets distributed to edge locations worldwide (49 of them as of today), effectively acting as a global cache, with content fetched from the location nearest to your end users based on latency.
This way you don't need to worry about the scale and performance of the cache or of EC2; you can straightforwardly offload this to CloudFront and S3.
Static vs dynamic
Generally speaking, here are the tiers:
best: CDN (cloudfront)
good: static hosting (S3)
okay: dynamic (EC2)
Why? There are a few reasons.
maintainability and scalability: cloudfront and S3 scale "for free". You don't need to worry about capacity or bandwidth or request rate.
price: approximately speaking, it's cheaper to use S3 than EC2.
latency: CDNs are located around the world, leading to shorter load times.
Caching
No matter where you are serving your static content from, proper use of the Cache-Control header will make life better. With that header you can tell a browser how long the content is good for. If it is something that never changes, you can instruct a browser to keep it for a year. If it frequently changes, you can instruct a browser to keep it for an hour, or a minute, or revalidate every time. You can give similar instructions to a CDN.
Here's a good guide, and here are some examples:
# keep for one year
Cache-Control: max-age=31536000
# keep for a day on a CDN, but a minute on client browsers
Cache-Control: s-maxage=86400, max-age=60
You can add this to pages served from your EC2 instance (no matter if it's nginx, Tornado, Tomcat, IIS), you can add it to the headers on S3 files, and CloudFront will use these values.
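When uploading to S3 you can set the header per object. A sketch of choosing values by content type (the split between long-lived assets and everything else is an assumption about your site; in real code you'd pass the returned dict as `ExtraArgs` to boto3's `upload_file`):

```python
# Sketch: pick Cache-Control metadata before uploading an object to S3.

LONG_LIVED_PREFIXES = ("image/", "font/", "video/")

def s3_upload_args(content_type):
    if content_type.startswith(LONG_LIVED_PREFIXES):
        # Immutable assets: let browsers and CDNs keep them for a year.
        return {"ContentType": content_type,
                "CacheControl": "public, max-age=31536000, immutable"}
    # HTML and the rest: revalidate on every request.
    return {"ContentType": content_type, "CacheControl": "no-cache"}
```

Set this at upload time; changing the metadata on an existing S3 object requires copying the object over itself.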
I would not pull the images from S3 to EC2 and then serve them. It's wasted effort. There are only a small number of use cases where that makes sense.
A few scenarios where an EC2 caching instance makes sense:
your upload/download ratio is far from 50/50
you hit the S3 limit of 100 requests/sec
you need URL masking
you want to optimise kernel and TCP/IP settings, or cache SSL sessions for clients
you want a proper cache invalidation mechanism for all geo locations
you need 100% control over where data is stored
you need to count the number of requests
you have a custom authentication mechanism
For a number of reasons, I recommend taking a look at Nginx S3 proxy.