I have an application that uses DynamoDB as the persistence layer and API Gateway as the interface to the internet. To make it globally accessible with the least amount of latency for the consumers of the API, I thought about enabling DynamoDB global tables in various regions, deploying my API to the same regions, and having Route 53 route traffic with a geolocation routing policy to the nearest API endpoint.
My questions are:
Is that the right way to do it? Am I missing something? Are there better ways?
What are the cost implications? As far as I understand, all services (Route 53, DynamoDB, API Gateway) are billed based on consumption, so deploying to all regions should not add costs.
Thank you
You are perhaps missing a Lambda to interact with DynamoDB. Not sure about your use case -- and it is not unheard of to expose DynamoDB directly -- but the most obvious pattern would be API Gateway -> Lambda -> DynamoDB. But, as I say, your particular use case will drive that -- would be keen to learn more, if you want to share.
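For illustration, here is a minimal sketch of that API Gateway -> Lambda -> DynamoDB pattern in Python; the table name, key schema and request shape are placeholders, not taken from your setup:

```python
# Minimal sketch of API Gateway -> Lambda -> DynamoDB (proxy integration assumed).
# TABLE_NAME, the "pk" key and the request body shape are placeholders.
import json
import os

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ.get("TABLE_NAME", "my-table"))


def handler(event, context):
    # With proxy integration, API Gateway passes the request body as a string.
    body = json.loads(event.get("body") or "{}")
    table.put_item(Item={"pk": body["id"], "payload": body})
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```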
There are no particular pricing call-outs at this level of detail, as long as you are sure you want to run DynamoDB Global Tables. You may consider provisioned capacity for DynamoDB if you have stable consumption, but note that provisioned Global Tables are charged by the hour.
There are probably like a hundred more questions I would ask about your solution architecture, but this is perhaps not the right forum. Hope this much helps.
I am trying to set up an event-triggered Lambda@Edge function from CloudFront.
This function needs to access the database and replace the URL's metadata before the content is distributed to users.
Issues I am facing:
My DocumentDB cluster is in a private subnet of a VPC and can't be accessed from outside the VPC.
My Lambda@Edge function can't connect to my VPC since they are in different regions.
The method I had in mind is to create an API on my web server (public subnet) for my Lambda function to call, but this doesn't seem like a very efficient method.
I'd appreciate any advice or an alternative way to implement this.
Thanks in advance.
Lambda@Edge has a few limitations you can read about here.
Among them is:
You can’t configure your Lambda function to access resources inside your VPC.
That means the VPC being in another region is not your problem; you simply can't place a Lambda@Edge function in any VPC.
The only solution I can think of is making your DocumentDB available publicly on the internet, which doesn't seem like a great idea. You might be able to create a security group that only allows access from the CloudFront IP ranges, although I couldn't find out whether Lambda@Edge actually uses the same ranges :/
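If you want to experiment with that idea, AWS publishes its IP ranges as a JSON file; here is a quick sketch to pull out the CloudFront ranges (whether Lambda@Edge calls to your origin really come from these ranges is exactly the open question above):

```python
# Sketch: fetch the published AWS IP ranges and keep the CloudFront ones,
# e.g. to feed into security group rules. Whether Lambda@Edge traffic to
# your origin actually comes from these ranges is not something I could confirm.
import json
import urllib.request

IP_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

with urllib.request.urlopen(IP_RANGES_URL) as resp:
    data = json.load(resp)

cloudfront_cidrs = sorted(
    p["ip_prefix"] for p in data["prefixes"] if p["service"] == "CLOUDFRONT"
)
print(f"{len(cloudfront_cidrs)} CloudFront CIDR ranges")
```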
Generally I'd avoid putting too much business logic in Lambda@Edge functions. Keep in mind they run on every request (or at the very least on every request to the origin) and add latency to those requests. Network calls in particular are expensive in terms of time, even more so if they cross continents to reach the primary region that hosts your database.
If the information you need to update the URL metadata is fairly static, I'd try to serialize it and ship it inside the Lambda deployment package; reading from local storage is considerably cheaper and faster.
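A rough sketch of what that could look like for an origin-request handler (metadata.json is a hypothetical file you would bundle with the function code, and the rewrite logic is only an example):

```python
# Sketch of an origin-request Lambda@Edge handler that rewrites URIs from a
# lookup table shipped inside the deployment package. "metadata.json" is a
# hypothetical bundled file; the rewrite logic is illustrative only.
import json

# Loaded once per execution environment, not on every request.
with open("metadata.json") as f:
    URL_METADATA = json.load(f)


def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    rewrite = URL_METADATA.get(request["uri"])
    if rewrite:
        request["uri"] = rewrite
    return request
```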
I have newly created an API service that is going to be deployed as a pilot to a customer. It has been built with AWS API Gateway, AWS Lambda, and AWS S3. With a SaaS pricing model, what's the best way for me to monitor this customer's usage and cost? At the moment, I have made a unique API Gateway, Lambda function, and S3 bucket specific to this customer. Is there a good way to create a dashboard that allows me (and perhaps the customer) to see this monitoring in detail?
An additional question: what's the best way to streamline this process when expanding to multiple customers? Each customer would have a unique API token. Is there a better approach than the naive way of creating unique AWS resources per customer?
I am new to this (a college student), but any insights/resources would go a long way. Thanks.
Full disclosure: I work for Lumigo, a company that does exactly that.
Regarding your question,
As @gusto2 said, there are many tools that you can use, and the best tool depends on your specific requirements.
The main difference between the tools is the level of configuration that you need to apply.
CloudWatch default metrics - The first tool you should use. This is an out-of-the-box solution that gives you many metrics for the services, such as duration, number of invocations, errors, and memory. You can view the metrics over different time windows and with different aggregations (p99, average, max, etc.).
This tool is great for basic monitoring.
Its greatest strength is also its limitation: it provides monitoring that is common to all services, so nothing is tailored specifically to serverless applications. https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html
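For example, pulling a default Lambda metric with boto3 might look roughly like this (the function name and the 24-hour window are placeholders):

```python
# Sketch: read a default Lambda metric from CloudWatch.
# "my-function" and the 24-hour window are placeholders.
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Lambda",
    MetricName="Duration",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
    StartTime=datetime.utcnow() - timedelta(hours=24),
    EndTime=datetime.utcnow(),
    Period=3600,  # one datapoint per hour
    Statistics=["Average", "Maximum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], point["Maximum"])
```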
CloudWatch custom metrics - The other end of the scale: much more precise metrics, which allow you to publish any metric data you want and monitor it: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html
This is a great tool if you know exactly what you want to monitor and you are already familiar with your architecture's limitations and pain points.
And, of course, you can configure alarms over this data: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
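Publishing a custom metric is a single API call; a minimal sketch (namespace, metric name and dimension values are placeholders):

```python
# Sketch: publish one custom metric datapoint from application code.
# Namespace, metric name and the CustomerId dimension are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[
        {
            "MetricName": "ProcessedItems",
            "Dimensions": [{"Name": "CustomerId", "Value": "customer-123"}],
            "Value": 42,
            "Unit": "Count",
        }
    ],
)
```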
Lumigo - a 3rd-party company (again, as a disclosure, this is my workplace). It provides out-of-the-box monitoring built specifically for serverless applications, covering things such as an abnormal number of invocations, costs, etc. This tool also provides troubleshooting capabilities for deeper observability.
Of course, there are more 3rd-party tools that you can find online. All are great; just find the one that suits your requirements best.
Is there a good way to create a dashboard
There are multiple ways and options depending on your scale, amount of data, and requirements, so you could start small and simple, but check whether each option is actually feasible.
You can start with CloudWatch. You can monitor basic metrics, create dashboards, and even share them with other accounts.
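As a rough sketch, a dashboard can also be created from code (the dashboard name, function name and region below are placeholders):

```python
# Sketch: create a simple CloudWatch dashboard programmatically.
# Dashboard name, function name and region are placeholders.
import json

import boto3

cloudwatch = boto3.client("cloudwatch")

dashboard_body = {
    "widgets": [
        {
            "type": "metric",
            "x": 0, "y": 0, "width": 12, "height": 6,
            "properties": {
                "metrics": [["AWS/Lambda", "Invocations", "FunctionName", "my-function"]],
                "stat": "Sum",
                "period": 300,
                "region": "us-east-1",
                "title": "API invocations",
            },
        }
    ]
}

cloudwatch.put_dashboard(
    DashboardName="customer-usage",
    DashboardBody=json.dumps(dashboard_body),
)
```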
naive way of making unique AWS resources per customer
To start, I would consider creating custom CloudWatch metrics with the customer ID as a dimension and publishing the metrics from the Lambda functions.
It looks simple, but you should do the math and a PoC on the number of requested datapoints and dashboards to avoid a nasty surprise on the bill.
Another option is sending metrics/events to DynamoDB; using atomic counters you could build some basic aggregations directly (a kind of naïve stream processing).
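A minimal sketch of that atomic-counter idea (the table name, key schema and per-day granularity are assumptions on my side):

```python
# Sketch: bump a per-customer, per-day counter in DynamoDB for each event.
# Table name, key schema and the daily granularity are assumptions.
from datetime import datetime, timezone

import boto3

table = boto3.resource("dynamodb").Table("api-usage")


def record_request(customer_id: str) -> None:
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    table.update_item(
        Key={"customer_id": customer_id, "day": day},
        UpdateExpression="ADD request_count :one",
        ExpressionAttributeValues={":one": 1},
    )
```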
When scaling to many events and clients, you may eventually need some serious API analytics, but that is a different topic.
I am going to describe my needs and what I currently have in place, so bear with me. Firstly, I have a Lambda function, say F1, which when invoked gets 100 links from a site. Most of these links, say about 95, are the same as when F1 was invoked the previous time, so further processing must be done only with those 5 "new" links. One solution was to write the already-processed links to a DynamoDB table and, each time F1 is invoked, query the database and skip those links. But I found that the database read, although only milliseconds, doubles the Lambda runtime, and this can add up, especially if F1 is called frequently and there are, say, a million processed links. So I decided to use ElastiCache with Redis.
I quickly found that Redis can only be accessed when F1 runs in the same VPC, and because F1 also needs access to the internet you need a NAT (I don't know much about networking). So I followed the guidelines, set up the VPC and NAT, and got everything working. I was delighted with the performance improvements, which almost halved the expected Lambda cost to $30 per month. But then I found that NAT is not included in the free tier and I have to pay almost $30 per month just for the NAT. This is not ideal for me, as this project could be in development for months and I feel like I am paying as much for internet access as for compute.
I would like to know if I am making any fundamental mistakes. Am I using ElastiCache the right way? Is there a better way to access both Redis and the internet? Is there any way to structure my stack differently so that I retain the performance without essentially paying twice the amount after the free tier ends? Maybe add another Lambda function? I don't have any ideas. Even minor improvements are much appreciated. Thank you.
There are many ways to accomplish this, and all of them have some trade-offs. A few other ideas for you to consider:
Run F1 without a VPC. It will have direct connectivity to DynamoDB without the need for a NAT, saving you the cost of the NAT gateway.
Run your function on a micro EC2 instance rather than in Lambda, and persist your link lookups to a file on local disk, or even a local Redis. With all the serverless hype, I think people sometimes overestimate the difficulty (and underestimate the stability) of simply running an OS. It's not that hard to manage, it's easy to set up backups, and it may be an option depending on your availability requirements and other needs.
Save your link data to S3 and set up an S3 gateway VPC endpoint. Not sure if it will be fast enough for your needs.
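For that S3 option, a rough sketch of keeping the processed-link set as a single JSON object (bucket and key names are placeholders; via the gateway endpoint this traffic never needs the NAT):

```python
# Sketch: store the set of already-processed links as one JSON object in S3.
# Bucket and key are placeholders; via an S3 gateway VPC endpoint this does
# not require a NAT gateway.
import json

import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-links-bucket", "processed-links.json"


def load_processed() -> set:
    try:
        obj = s3.get_object(Bucket=BUCKET, Key=KEY)
        return set(json.loads(obj["Body"].read()))
    except s3.exceptions.NoSuchKey:
        return set()


def save_processed(links: set) -> None:
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=json.dumps(sorted(links)))
```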
I have an API Gateway endpoint that I would like to limit access to. For anonymous users, I would like to set both daily and monthly limits (based on IP address).
AWS WAF has the ability to set rate limits, but the interval for them is a fixed 5 minutes, which is not useful in this situation.
API Gateway has the ability to add usage plans with longer term rate quotas that would suit my needs, but unfortunately they seem to be based on API keys, and I don't see a way to do it by IP.
Is there a way to accomplish what I'm trying to do using AWS Services?
Is it maybe possible to use a usage plan and automatically generate an API key for each user who wants to access the API? Or is there some other solution?
Without more context on your specific use-case, or the architecture of your system, it is difficult to give a “best practice” answer.
Like most things tech, there are a few ways you could accomplish this. One way would be to use a combination of CloudWatch API logging, Lambda, DynamoDB (with Streams) and WAF.
At a high level (and regardless of this specific need) I'd protect my API using WAF and the AWS security automations quickstart, found here, and associate it with my API Gateway as guided in the docs here. Once my WAF is set up and associated with my API Gateway, I'd enable CloudWatch API logging for API Gateway, as discussed here. Now that I have things set up, I'd create two Lambdas.
The first will parse the CloudWatch API logs and write the data I'm interested in (IP address and request time) to a DynamoDB table. To avoid unnecessary storage costs, I'd set the TTL on the records I write to the DynamoDB table to twice whatever my analysis's time window is, i.e. if I'm looking to limit it to 1,000 requests per month, I'd set the TTL on my DynamoDB records to 2 months. From there, my CloudWatch API log group will have a subscription filter that sends log data to this Lambda, as described here.
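A rough sketch of that first Lambda, assuming API Gateway access logging is configured with a JSON format that includes $context.identity.sourceIp and $context.requestTimeEpoch (table and attribute names are placeholders):

```python
# Sketch of the first Lambda: receive CloudWatch Logs subscription events,
# extract the caller IP and request time, and write them to DynamoDB with a
# TTL of roughly two months. Assumes a JSON access-log format containing
# $context.identity.sourceIp ("ip") and $context.requestTimeEpoch.
import base64
import gzip
import json

import boto3

table = boto3.resource("dynamodb").Table("api-requests")
TTL_SECONDS = 60 * 60 * 24 * 60  # ~2 months


def handler(event, context):
    payload = gzip.decompress(base64.b64decode(event["awslogs"]["data"]))
    for log_event in json.loads(payload)["logEvents"]:
        record = json.loads(log_event["message"])
        epoch = int(record["requestTimeEpoch"]) // 1000  # ms -> s
        table.put_item(
            Item={
                "ip": record["ip"],
                "request_time": epoch,
                "ttl": epoch + TTL_SECONDS,
            }
        )
```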
My second Lambda is going to do the actual analysis and handle what happens when my limit is exceeded. This Lambda is going to be triggered by the write event to my DynamoDB table, as described here. I can have this Lambda run whatever analysis I want, but I'm going to assume I want to limit access to 1,000 requests per month for a given IP. When the new DynamoDB item triggers my Lambda, the Lambda is going to query the DynamoDB table for all records created in the preceding month that contain the IP address. If the number of records returned is less than or equal to 1,000, it does nothing. If it exceeds 1,000, the Lambda is going to update the WAF WebACL, specifically calling UpdateIPSet to reject traffic from that IP, and that's it. Pretty simple.
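And a sketch of that second Lambda, triggered by the DynamoDB stream. I've used the WAFv2 API here; the table name, IP set name/ID and the 1,000-request limit are placeholders, and you'd use the classic WAF UpdateIPSet call instead if that's what you run:

```python
# Sketch of the second Lambda: count this IP's requests over the last month
# and, past the limit, add it to a WAF IP set (WAFv2 shown; adjust for
# classic WAF if needed). Table name, IP set name/ID and LIMIT are placeholders.
import time

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("api-requests")
wafv2 = boto3.client("wafv2")

LIMIT = 1000
MONTH_SECONDS = 60 * 60 * 24 * 30


def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        ip = record["dynamodb"]["NewImage"]["ip"]["S"]
        since = int(time.time()) - MONTH_SECONDS
        resp = table.query(
            KeyConditionExpression=Key("ip").eq(ip) & Key("request_time").gte(since),
            Select="COUNT",
        )
        if resp["Count"] > LIMIT:
            current = wafv2.get_ip_set(
                Name="blocked-ips", Scope="REGIONAL", Id="<ip-set-id>"
            )
            addresses = set(current["IPSet"]["Addresses"]) | {f"{ip}/32"}
            wafv2.update_ip_set(
                Name="blocked-ips",
                Scope="REGIONAL",
                Id="<ip-set-id>",
                Addresses=sorted(addresses),
                LockToken=current["LockToken"],
            )
```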
With the above process I have near real-time monitoring of requests to my API Gateway in a very efficient, cost-effective, scalable manner that can be deployed entirely serverless.
This is just one way to handle this; there are definitely others. You could accomplish it with, say, Kinesis and Elasticsearch, or analyze CloudTrail events instead of logs, or use a third-party solution that integrates with AWS, or something else.
I haven't been able to find a clear answer on this from the documentation.
Is it discouraged to access DynamoDB from outside the region it is hosted in? For example, I want to do a lot of writes to a DynamoDB table in us-west-2 from a cluster in us-east-1 (or even ap-southeast-1). My writes are batched and non-real-time, so I don't care so much about a small increase in latency.
Note that I am not asking about cross-region replication.
DynamoDB is a hosted solution but that doesn't mean you need to be inside AWS to use it.
There are valid cases, especially storing user information for clients that make queries against DynamoDB from outside the table's AWS region.
So to answer your question: you'll get the best performance when you minimize the geographic distance, but you can work with any regional endpoint you'd like from anywhere in the world.
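For what it's worth, nothing special is needed in the client; you just point it at the table's home region (the table name and regions below are placeholders):

```python
# Sketch: batched writes to a DynamoDB table in us-west-2 from anywhere
# (e.g. a cluster in us-east-1 or outside AWS). Table name is a placeholder.
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-west-2")
table = dynamodb.Table("my-table")

with table.batch_writer() as batch:  # handles batching and retries
    for i in range(100):
        batch.put_item(Item={"pk": f"item-{i}", "value": i})
```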