I have been looking into AWS AppSync to create a managed GraphQL API with DynamoDB as the datastore. I know AppSync can use Apache Velocity Template Language (VTL) as a resolver to fetch data from DynamoDB. However, that means I have to introduce an extra language to the programming stack, so I would prefer to write the resolvers in JavaScript/Node.js.
Is there any downside of using a lambda function to fetch data from DynamoDB? What reasons are there to use VTL instead of a lambda for resolvers?
There are pros and cons to using lambda functions as your AppSync resolvers (although note you'll still need to invoke your lambdas from VTLs):
Pros
Easier to write and maintain
More powerful for marshalling and validating requests and responses
Common functionality can be more DRY than possible with VTLs (macros are not supported)
More flexible debugging and logging
Easier to test
Better tooling and linting available
Support for long integers in your DynamoDB table: DynamoDB number types do support long, but AppSync resolvers only support 32-bit integers. You can get around this if you use a lambda, for example by serializing longs to a string before transport through the AppSync resolver layer. See the (currently) open feature request: https://github.com/aws/aws-appsync-community/issues/21
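A minimal sketch of the long-integer workaround from the last point, written in Python for brevity (the same idea carries over to Node.js). The helper name `stringify_longs` and the item shapes are hypothetical; the function walks a result and turns any integer outside the signed 32-bit range into a string before it is handed back to AppSync.

```python
from decimal import Decimal

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def stringify_longs(value):
    """Recursively replace integers that do not fit in 32 bits with strings."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; leave it alone.
        return value
    if isinstance(value, Decimal) and value == value.to_integral_value():
        # Some DynamoDB clients return numbers as Decimal; treat whole ones as ints.
        value = int(value)
    if isinstance(value, int):
        return value if INT32_MIN <= value <= INT32_MAX else str(value)
    if isinstance(value, dict):
        return {k: stringify_longs(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [stringify_longs(v) for v in value]
    # Strings, floats, non-integral Decimals, None, etc. pass through unchanged.
    return value
```

The client then has to parse those string fields back into a 64-bit type, so the GraphQL schema would declare them as `String` rather than `Int`.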
Cons
Extra latency for every invocation
Cold starts = even more latency (although this can usually be minimised by keeping your lambdas warm if this is a problem for your use case)
Extra cost
Extra resources for each lambda, eating up the fixed 200 limit
If you're doing a simple vanilla DynamoDB operation it's worth giving VTLs a go. The docs from AWS are pretty good for this: https://docs.aws.amazon.com/appsync/latest/devguide/resolver-mapping-template-reference-dynamodb.html
If you're doing anything mildly complex, such as marshalling fields, looping, or generally hacky non-DRY code, then lambdas are definitely worth considering for the speed of writing and maintaining your code provided you're comfortable with the extra latency and cost.
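To illustrate the kind of marshalling that tends to be easier in a lambda, here is a rough sketch in Python (a Node.js version would look much the same). It assumes the VTL request template forwards a payload of the form `{"field": ..., "arguments": ...}`; that shape, the `FAKE_TABLE` stand-in for DynamoDB, and the `User` fields are all made up for the example.

```python
# Hypothetical in-memory stand-in for a DynamoDB table keyed by "pk".
FAKE_TABLE = {
    "USER#1": {"pk": "USER#1", "first_name": "Ada", "last_name": "Lovelace"},
}


def marshal_user(item):
    """Reshape a stored item into the (assumed) GraphQL User type."""
    return {
        "id": item["pk"].split("#", 1)[1],
        "fullName": f'{item["first_name"]} {item["last_name"]}',
    }


def get_user(arguments):
    item = FAKE_TABLE.get(f'USER#{arguments["id"]}')
    return marshal_user(item) if item else None


# One resolver function per GraphQL field; shared helpers stay DRY.
RESOLVERS = {"getUser": get_user}


def handler(event, context=None):
    """Entry point; assumes the VTL template forwards {"field", "arguments"}."""
    resolver = RESOLVERS.get(event["field"])
    if resolver is None:
        raise ValueError(f'No resolver for field {event["field"]!r}')
    return resolver(event.get("arguments", {}))
```

Because `marshal_user` is an ordinary function, it can be unit tested and reused across fields, which is the DRY and testability advantage listed under the pros.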
Related
We have a "shared" layer that has a few resources accessed by different services in the project. There is a table storing shared information (user permission on each of the resources in the project, since it can get big so not being stored in JWT token)
Should we have a Lambda read the DynamoDB table and give other microservices access to the shared Lambda only, or should we give the microservices access to the table directly so that they can just use a lib method to read the permissions from the table? I am leaning towards direct DynamoDB table access since that avoids the extra hop through a Lambda.
Both approaches have advantages & disadvantages:
Direct Access to DynamoDB - Good Sides
The authors of the other Lambda functions can build at their own pace. Faster teams can sprint and not wait for the slower team
If one lambda function is misbehaving / failing, the other lambdas are still decoupled from it and the blast radius gets limited
Direct Access to DynamoDB - Bad sides
The effort for writing similar stuff is duplicated in different lambda instances.
Each lambda can write their own logic and introduce differences in implementations. This could be intentionally designed to work that way but it could also be that one developer misunderstood the requirements
If this DynamoDB gets poisoned by wrong coding by one of the consuming lambdas, the other lambdas can also go down.
It becomes hard to measure the required reserved capacity. Some of the lambdas can easily become greedy when it comes to read units.
Mediating Lambda - Good Sides
Reduces the effort required to implement similar logic for different consumers
If the shared lambda that manages the DynamoDB is performing actions like audit trail storing, you will be able to easily measure the required read & write capacity units.
If it is decoupled from the consumers, then the failure can be reduced and contained within it.
Mediating Lambda - Bad Sides
This shared lambda can easily become a single point of failure if the consuming lambdas are expecting return values from it.
More communication is required between the team managing this lambda and the consuming teams. Politics can easily be introduced by this Lambda :D
If the consuming teams are developing in a much faster rate than the owner of this shared lambda, it could easily be a blocker to other teams if integration is done poorly.
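Whichever side you pick, one way to soften the trade-off is to have consumers depend on a small library function rather than on the table or the Lambda directly. The sketch below is only an illustration in Python: the `PermissionFetcher` abstraction, the function names, and the data layout are all hypothetical, and `fetch_from_table` is an in-memory stand-in for a real DynamoDB read.

```python
from typing import Callable, Dict, Set

# A "fetcher" hides where permissions come from (hypothetical abstraction):
# it takes a user id and returns {resource_id: {"read", "write", ...}}.
PermissionFetcher = Callable[[str], Dict[str, Set[str]]]


def make_permission_checker(fetch: PermissionFetcher):
    """Build an is_allowed() function on top of any permission source."""
    def is_allowed(user_id: str, resource_id: str, action: str) -> bool:
        return action in fetch(user_id).get(resource_id, set())
    return is_allowed


def fetch_from_table(user_id: str) -> Dict[str, Set[str]]:
    """Stand-in for a direct DynamoDB read; a real one would query the table."""
    table = {"alice": {"report-1": {"read", "write"}}}
    return table.get(user_id, {})


is_allowed = make_permission_checker(fetch_from_table)
```

If the team later moves to a mediating Lambda, only the fetcher passed to `make_permission_checker` would need to change; the consuming services keep calling `is_allowed`.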
I have an application with 3 modules and 25 endpoints (between modules). Modules: Users, CRM, PQR.
I want to optimize AWS costs and generally respect the architecture best practices.
Should I build a lambda function for each endpoint?
Does using many functions cost more than using only one?
The link in Gustavo's answer provides a decent starting point. I'll elaborate on that based on the criteria you mentioned in the comments.
You mentioned that you want to optimize for cost and architecture best practices, let's start with the cost component.
Lambda pricing is fairly straightforward and you can check it out on the pricing page. Basically you pay per request and for how long your code runs, in 1 ms increments. How much each millisecond costs depends on how much memory you provision for your Lambda function. Lambda is typically not the most expensive item on your bill, so I'd only start optimizing it once it becomes a problem.
From a pricing perspective it doesn't really matter if you have fewer or more Lambda functions.
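A quick back-of-the-envelope check of that claim, as a Python sketch. The two prices are assumed example values (roughly what the pricing page listed for x86 in one region at one point in time) and the traffic numbers are invented; the free tier and any per-function extras such as provisioned concurrency are ignored.

```python
import math

# Assumed example prices; always check the current pricing page.
PRICE_PER_REQUEST = 0.20 / 1_000_000   # USD per invocation
PRICE_PER_GB_SECOND = 0.0000166667     # USD per GB-second of compute


def monthly_cost(requests, avg_duration_ms, memory_mb):
    """Cost of one function: request charge plus duration charge."""
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return requests * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND


# Same total traffic, handled by 1 function or split evenly across 25.
one_function = monthly_cost(2_500_000, 100, 512)
many_functions = sum(monthly_cost(100_000, 100, 512) for _ in range(25))

print(one_function, many_functions)
print(math.isclose(one_function, many_functions))
```

Since both charges scale linearly with invocations and duration, splitting the same workload across more functions leaves the total unchanged, apart from floating-point rounding in the sketch.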
In terms of architecture best practices, there is no single one-size-fits-all reference architecture, but the post Gustavo mentioned is a good starting point: Best practices for organizing larger serverless applications. How you structure your application can depend on many factors:
Development team size
Development team maturity/experience (in terms of AWS technologies)
Load patterns in the application
Development process
[...]
You mention three main components/modules with 25 endpoints in total:
Users
CRM
PQR
Since you didn't tell us much about the technology stack, I'm going to assume you're trying to build a REST API that serves as the backend for some frontend application.
In that case you could think of the three modules as three microservices, which implement specific functionality for the application. Each of them implements a few endpoints (combination of HTTP-Method and path). If you start with an API Gateway as the entry point for your architecture, you can use that as an abstraction of the internal architecture for your clients.
The API Gateway can route requests to different Lambda functions based on the HTTP method and path. You can now choose how to implement the backend. I'd probably start off with a common codebase from which multiple Lambdas are built and use the API gateway to map each endpoint to a Lambda function. You can also start with larger multi-purpose Lambdas and refactor them in time to extract specific endpoints and then use the API Gateway to route to the more specialized Lambdas.
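The "larger multi-purpose Lambda" starting point could look roughly like the Python sketch below. It assumes an API Gateway REST API with Lambda proxy integration, where the event carries `httpMethod`, `resource` and `pathParameters` (HTTP APIs with the newer payload format use a different event shape). The routes and handler functions are made up for illustration.

```python
import json


def list_users(event):
    return {"users": []}


def get_user(event):
    return {"id": event["pathParameters"]["id"]}


# (HTTP method, API Gateway resource path) -> handler function.
ROUTES = {
    ("GET", "/users"): list_users,
    ("GET", "/users/{id}"): get_user,
}


def handler(event, context=None):
    """Single Lambda entry point that dispatches on method and resource."""
    route = ROUTES.get((event["httpMethod"], event["resource"]))
    if route is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    return {"statusCode": 200, "body": json.dumps(route(event))}
```

Extracting an endpoint later means moving one function into its own Lambda and pointing that API Gateway route at it; clients don't see the difference.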
You might have noticed that this is a bit vague, and that's on purpose. I think you're going to end up with roughly as many Lambdas as you'll have endpoints, but it doesn't mean you have to start that way. If you're just getting started with AWS, managing a bunch of Lambdas and their interaction can seem daunting. Start with more familiar architectures and then refactor them to be more cloud native over time.
It depends on your architecture and how decoupled you want it to be. Here is a good starting point for you to take a look into best practices:
https://aws.amazon.com/blogs/compute/best-practices-for-organizing-larger-serverless-applications/
I am building a service using AWS. My use case is simple CRUD operations on a product configuration stored in DynamoDB, exposed through APIs.
Approach 1: I was initially thinking to design it using API Gateway, Lambda and DynamoDB.
Approach 2: One of my peers asked me to directly integrate API Gateway with DynamoDB.
In my understanding, as of now, using Lambda as a middle layer will help me better to deliver customized responses and also would do some extra error validation (like bad keys supplied by user) in addition to API Gateway. But I am still not very much convinced to go by approach 1 or 2.
I was wondering if anyone could help me elaborate some pros and cons of approach 2 in relation to approach 1. Any help would be much appreciated.
My product configuration is a bunch of 15 key value pairs.
I was wondering if anyone could help me elaborate some pros and cons of approach 2 in relation to approach 1.
There could be several pros and cons. An example of a positive is that you pay only for API Gateway and DynamoDB, not for Lambda invocations. An example of a negative is that the initial setup and maintenance of API->DynamoDB can be more complex and tiresome than when using API->Lambda->DynamoDB.
However, one drawback of the first approach that can be important in many use cases is latency. Lambda functions are known to suffer from so-called cold start latency (more on this and how to deal with it: Provisioned Concurrency). Consequently, in Approach 1 with Lambda, you may find that cold starts are troublesome, especially for infrequently used APIs. In contrast, in Approach 2 you have a direct connection between API Gateway and DynamoDB and don't have to worry about any delays caused by intermediaries between them.
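For comparison, here is a rough Python sketch of the extra validation and customized responses that Approach 1 makes easy. The allowed key names are invented (the question only says there are about 15 key-value pairs), the event is assumed to come from a Lambda proxy integration with the JSON in `body`, and the DynamoDB write is left out.

```python
import json

# Hypothetical subset of the ~15 allowed configuration keys.
ALLOWED_KEYS = {"color", "size", "currency"}


def validate_config(config):
    """Return a list of human-readable problems; empty means valid."""
    if not isinstance(config, dict):
        return ["body must be a JSON object"]
    errors = [f"unknown key: {k}" for k in sorted(config) if k not in ALLOWED_KEYS]
    errors += [f"value for {k} must be a string" for k in sorted(config)
               if k in ALLOWED_KEYS and not isinstance(config[k], str)]
    return errors


def handler(event, context=None):
    try:
        config = json.loads(event.get("body") or "")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"errors": ["invalid JSON"]})}
    errors = validate_config(config)
    if errors:
        return {"statusCode": 400, "body": json.dumps({"errors": errors})}
    # A real handler would write `config` to DynamoDB here.
    return {"statusCode": 200, "body": json.dumps(config)}
```

With a direct API Gateway to DynamoDB integration, checks like these would have to be expressed through request validators and mapping templates instead.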
Customers (around 1000) sign up to my service and receive a unique API key per customer. They then use the key when calling an AWS Lambda function through AWS API Gateway to access data in DynamoDB.
Requirement 1: The customers get billed by the number of api calls, so I have to be able to count those. AWS only provides metrics for total number of api calls per lambda so I have a few options:
1) At every api hit increment a counter in DynamoDB.
2) At every api hit enqueue a message in SQS, receive it in "hit counter" lambda and increment a counter in DynamoDB.
3) Deploy a separate lambda for each customer. Use AWS built-in call counter.
Requirement 2: The data that the lambda can access is unique for each customer and thus dependent on the api key provided.
To enable this I also have a number of options:
1) Store the required api key together with the data that the customer has the right to access.
2) Deploy a separate lambda for each customer. Use api gateway to protect it with a key.
3) Create a separate endpoint in api gateway for each customer, protect it with the api key.
None of the options above seem like a good way to design the solution. Is there a canonical way of doing this? If not, which of the options above is the best? Have I missed an obvious solution due to my unfamiliarity with AWS?
I will try to break your problems down with my experience, but maybe Michael - Sqlbot or John Rotenstein may be able to give more appropriate answers.
Requirement 1
1) This sounds like a good approach. I don't see anything critical here.
2) This, IMHO, is the best out of the 3. It will decouple data access from the billing service, which is a great thing in a Microservices world.
3) This is not scalable. Imagine your system grows and you end up with 10K Lambda functions. Not only will you have to build a very reliable mechanism to automate this process, but you'll also need to monitor 10K different things (imagine CloudWatch logs, API Gateway, etc.), not to mention you'll have 10 thousand functions with exactly the same code (client-specific parameters apart). I wouldn't even think about this one.
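A rough Python sketch of option 2, the "hit counter" Lambda fed by SQS. It assumes each message body is JSON of the form `{"apiKey": "..."}` and that usage lives in a table with partition key `ApiKey` and a numeric `hits` attribute; those names are invented. The DynamoDB call is injected so the sketch runs without AWS; in a real function it would be the `update_item` method of a boto3 `Table` resource.

```python
import json
from collections import Counter


def count_hits(sqs_event):
    """Aggregate a batch of SQS messages into {api_key: number_of_calls}."""
    return Counter(json.loads(r["body"])["apiKey"] for r in sqs_event["Records"])


def build_update(api_key, n):
    """Keyword arguments for an update_item call that atomically adds n."""
    return {
        "Key": {"ApiKey": api_key},
        "UpdateExpression": "ADD #h :n",
        "ExpressionAttributeNames": {"#h": "hits"},
        "ExpressionAttributeValues": {":n": n},
    }


def handler(event, context=None, update_item=None):
    # One update per customer in the batch, not one per message.
    updates = [build_update(k, n) for k, n in sorted(count_hits(event).items())]
    for kwargs in updates:
        if update_item is not None:
            update_item(**kwargs)
    return updates
```

Batching per customer keeps the number of writes down. Note that standard SQS queues deliver at least once, so a billing-grade counter would probably also need some form of de-duplication.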
Requirement 2
1) It could work and it fits nicely in the DynamoDB model of doing things: store as much data as you can in a single table, so you can fetch everything in one go. From what I see, you could even use this ApiKey as your partition key and, for the sake of simplicity for this answer, store the client's data as JSON in an attribute named data. Since your query only needs to look up by the ApiKey, storing JSON in DynamoDB won't hurt (do keep in mind, however, that if you need to query by any of its JSON attributes then you're in a bad spot, since DynamoDB's query capabilities are very limited)
2) No, because of Requirement 1.3
3) No, because of the above.
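Option 1 under Requirement 2 can be sketched as follows, in Python. The dictionary is an in-memory stand-in for a table whose partition key is `ApiKey`, and the item contents are invented; a real implementation would do a get-item call against DynamoDB instead.

```python
import json

# In-memory stand-in for a table whose partition key is ApiKey (assumption).
CLIENT_DATA = {
    "key-123": {"ApiKey": "key-123", "data": json.dumps({"plan": "basic"})},
}


def data_for_key(api_key):
    """Return the caller's own data, or None if the key is unknown."""
    item = CLIENT_DATA.get(api_key)
    return json.loads(item["data"]) if item else None
```

Because the lookup is keyed by the caller's own API key, a customer can only ever reach the item stored under that key.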
If you still need to store the ApiKey in a different table so you can run different analysis and keep a finer grained control over the client's calls, access, billing and etc., that's not a problem either, just make sure you duplicate your ApiKey on your ClientData table instead of creating a FK (DynamoDB doesn't support FKs, so you'd need to manage these constraints yourself). Duplication is just fine in a NoSQL world.
Your use case is clearly a Multi-Tenancy one, so I'd also recommend reading Multi-Tenant Storage with Amazon DynamoDB, which will give you some more insights and broaden your options a little bit. Multi-Tenancy is not an easy task and can give you lots of headaches if not implemented correctly. I think this is why AWS has also prepared this nice read for us :)
Happy to continue this in the comments section in case you have more info to share
Hope this helps!
I'm trying to deploy an API suite by using API Gateway and implementing the code in Java using Lambda. Is it OK to have many (related, of course) lambdas in a single jar (what I'm planning to do), or is it better to create a single jar for each lambda I want to deploy? (This will become a mess very easily.)
This is really a matter of taste but there are a few things you have to consider.
First of all there are limitations to how big a single Lambda upload can be (50MB at time of writing).
Second, there is also a limit to the total size of all code that you upload (currently 1.5GB).
These limitations may not be a problem for your use case but are good to be aware of.
The next thing you have to consider is where you want your overhead.
Let's say you deploy a CRUD interface to a single Lambda and you pass an "action" parameter from API Gateway so that you know which operation you want to perform when you execute the Lambda function.
This adds a slight overhead to your execution as you have to route the action to the appropriate operation. This is likely a very fast routing but nevertheless, it adds CPU cycles to your function execution.
On the other hand, deploying the same jar over several Lambda functions will quickly get you closer to the limits I mentioned earlier, and it also adds administrative overhead in managing your Lambda functions as that number grows. They can of course be managed via CloudFormation or CLI scripts, but it will still add administrative overhead.
I wouldn't say there is a right and a wrong way to do this. Look at what you are trying to do, think about what you would need to manage the deployment and take it from there. If you get it wrong you can always start over with another approach.
Personally I like very small service Lambdas that do internal routing and handle more than just a single operation, but are still small and focused on a specific type of task, be it CRUD for a database table or managing a select few very closely related operations.
There's some nice advice on serverless.com
As polythene says, the answer is "it depends". But they've listed the pros and cons for 4 ways of going about it:
Microservices Pattern
Services Pattern
Monolithic Pattern
Graph Pattern
https://serverless.com/blog/serverless-architecture-code-patterns/