Customers (around 1000) sign up to my service and receive a customer unique api key. They then use the key when calling a AWS lambda function through AWS api gateway in to access data in DynamoDb.
Requirement 1: The customers get billed by the number of api calls, so I have to be able to count those. AWS only provides metrics for total number of api calls per lambda so I have a few options:
At every api hit increment a counter in DynamoDB.
At every api hit enqueue a message in SQS, receive it in "hit
counter" lambda and increment a counter in DynamoDB.
Deploy a separate lambda for each customer. Use AWS built-in call
counter.
Requirement 2: The data that the lambda can access is unique for each customer and thus dependent on the api key provided.
To enable this I also have a number of options:
Store the required api key together with the data that the customer
has the right to access.
Deploy a separate lambda for each customer. Use api gateway to
protect it with a key.
Create a separate endpoint in api gateway for each customer, protect
it with the api key.
None of the options above seem like a good way to design the solution. Is there a canonical way of doing this? If not, which of the options above is the best? Have I missed an obvious solution due to my unfamiliarity with AWS?
I will try to break your problems down with my experience, but maybe Michael - Sqlbot or John Rotenstein may be able to give more appropriate answers.
Requirement 1
1) This sounds like a good approach. I don't see anything critical here.
2) This, IMHO, is the best out of the 3. It will decouple data access from the billing service, which is a great thing in a Microservices world.
3) This is not scalable. Imagine your system grows and you end up with 10K Lambda functions. Not only you'll have to build a very reliable mechanism to automate this process, but also you'll need to monitor 10K different things (imagine CloudWatch logs, API Gateway, etc), not to mention you'll have 10 thousand functions with exactly the same code (client specific parameters apart). I wouldn't even think about this one.
Requirement 2
1) It could work and it fits nicely in the DynamoDB model of doing things: store as much data as you can in a unique table, so you can fetch everything in one go. From what I see, you could even use this ApiKey as your partition key and, for the sake of simplicity for this answer, store the client's data as JSON in a column named data. Since your query only needs to query by the ApiKey, storing a JSON in DynamoDB won't hurt (do keep in mind, however, that if you need to query by any of its JSON attributes than you're in bad shoes, since DynamoDB's query capabilities are very limited)
2) No, because of Requirement 1.3
3) No, because of the above.
If you still need to store the ApiKey in a different table so you can run different analysis and keep a finer grained control over the client's calls, access, billing and etc., that's not a problem either, just make sure you duplicate your ApiKey on your ClientData table instead of creating a FK (DynamoDB doesn't support FKs, so you'd need to manage these constraints yourself). Duplication is just fine in a NoSQL world.
Your use case is clearly a Multi-Tenancy one, so I'd also recommend you to read Multi-Tenant Storage with Amazon DynamoDB which will give you some more insights and broaden your options a little bit. Multi-Tenancy is not an easy task and can give you lots of headaches if not implemented correctly. I think this is why AWS has also prepared this nice read for us :)
Happy to continue this on the comments section in case you have more info to share
Hope this helps!
Related
I currently am retrieving a JWKS keys using the Auth0 JWKS library for my Lambda custom authoriser function.
As explained in this issue on the JWKS library, apparently the caching built into JWKS for the public key ID does not work on lambda functions and as such they recommend writing the key to the tmp file.
What reasons could there be as to why cache=true would not work?
As far as I was aware, there should be no difference that would prevent in-memory caching working with lambda functions but allow file-based caching on the tmp folder to be the appropriate solution.
As far as I can tell, the only issues that would occur would be from the spawning of containers rate-limiting JWKS API and not the act of caching using the memory of the created containers.
In which case, what would be the optimal pattern of storing this token externally in Lambda?
There are a lot of option how to solve this. All have different advantages and disadvantages.
First of, storing the keys in memory or on the disk (/tmp) has the same result in terms of persistence. Both are available across calls to the same Lambda instance.
I would recommend storing the keys in memory, because memory access is a lot faster than reading from a file (on every request).
Here are other options to solve this:
Store the keys in S3 and download during init.
Store the keys on an EFS volume, mount that volume in your Lambda instance, load the keys from the volume during init.
Download the keys from the API during init.
Package the keys with the Lambdas deployment package and load them from disk during init.
Store the keys in AWS SSM parameter store and load them during init.
As you might have noticed, the "during init" phase is the most important part for all of those solutions. You don't want to do that for every request.
Option 1 and 2 would require some other "application" that you build do regularly download the keys and store them on S3 or a EFS volume. That is extra effort, but might in certain circumstances be a good idea for more complex setups.
Option 3 is basically what you are already doing at the moment and is probably the best tradeoff between simplicity and sound engineering for simple use cases. As stated before, you should store the key in memory.
Option 4 is a working "hack" that is the easiest way to get your key to your Lambda. I'd never recommend doing this, because sudden changes to the key would require a re-deployment of the Lambda, while in the meantime requests can't be authenticated, resulting in a down time.
Option 5 can be a valid alternative to option 3, but requires the same key management by another application like option 1 and 2. So it is not necessarily a good fit for a simple authorizer.
Some functions I am writing would need to store and share a set of cryptographic keys (<1kb) somewhere so that:
it is shared across functions and within instances of the same function
it is maintained after function deploys
The keys are modified (and written) every 4 hours or so, based on whether a key has expired or a new key needs to be created.
Right now, I am storing the keys as encrypted binary in a cloud bucket with access limited to that function. It works, except that it is fairly slow (~500ms for the read / write that is required when updating the keys).
I have considered some other solutions:
Redis: fast, but overkill given the price ($40/month) it would cost to store a single value
Cloud SQL: the functions are already connected to a cloud instance so it would not incur more costs
Dropping everything and using a KMS. Unfortunately it would not meet the requirements I have.
The library I use in my functions is available here.
Is there a better way to store a single small blob of data for cloud functions (and possibly other tools like GKE) ?
Edit
The solution I ended up using was using a single table in a database that the app was already connected to. It is also about 5 times faster than using a bucket (<100ms).
The moral of the story is to use whatever is already provisioned to store the keys. If storing a key is a problem, then using the combo KMS + cloud functions for rotations described below seems like a good option.
All the code + more details are available here.
A better approach would be to manage your keys with Cloud KMS. However, as you mentioned before Cloud KMS does not automatically delete the old key version material and you will need to manually delete old versions which I suspect is a thing that you don’t want to do.
Another possibility is to just keep the keys in Firestore. Since for this you don’t have to provision any specific infrastructure such as with Redis Memorystore and Postgres Cloud SQL it will be easier to manage and to scale in the long run.
The general idea would be to have a Cloud Function triggered by Cloud Scheduler every 4 hours, and this function will rotate the keys on your Cloud Firestore.
How does this sound to you?
I am building a service using AWS. My use case is a simple CRUD operation, of a product configuration, on Dynamo DB using API's.
Approach 1: I was initially thinking to design it using API Gateway, Lambda and DynamoDB.
Approach 2: One of my peers asked me to directly integrate API Gateway with DynamoDB.
In my understanding, as of now, using Lambda as a middle layer will help me better to deliver customized responses and also would do some extra error validation (like bad keys supplied by user) in addition to API Gateway. But I am still not very much convinced to go by approach 1 or 2.
I was wondering if anyone could help me elaborate some pros and cons of approach 2 in relation to approach 1. Any help would be much appreciated.
My product configuration is a bunch of 15 key value pairs.
I was wondering if anyone could help me elaborate some pros and cons of approach 2 in relation to approach 1.
There could be several pros and cons. Example of a positive is that you pay only for API Gateway and DynamoDb - not for lambda invocations. Example of a negative is is that an initial setup and maintainability of API->DynamoDB can be more complex and tiresome then when using API->Lambda->DynamoDb.
However, one drawback of the first approach that can be important in may use cases is time efficiency. Lambda function is known to suffer from so called, cold start latency (more and how to deal with this is here - Provisioned Concurrency). Subsequently, in Approach 1 with lambda, you may find that the cold start is a troublesome, especially for in-frequently used APIs. In contrast, in the Approach 2, you have direct connection between API and DynamoDB and don't have to worry about any delays caused by intermediates between API gateway and DynamoDB.
I'm new to AWS, so apologies in advance if this question is missing some important considerations, or has incorrect assumptions.
But basically I want to implement a service on AWS to store and retrieve data from multiple clients, which may be Android apps, Windows applications, websites etc. The way I've considered doing this is via a RESTful service using API Gateway front end, with a Lambda back end and maybe an S3 bucket to hold the data.
The basic requirements are:
(1) Clients can publish data to the server, where it is stored, perhaps with some kind of key/value structure.
(2) Clients can retrieve said data by key.
(3) If it is possible, clients to be able to subscribe to events from the service, so that they are notified if the value of a piece of data changes. This would avoid the need to poll the service, which would presumably start racking up unnecessary charges if the data doesn't change often.
Any pointers on how to get started with this welcome!
Creating a RESTful API on top of Lambda and API Gateway is one of the main use cases for this architecture. You can think of Lambda functions as controllers with methods and API Gateway as a router that forwards requests to functions based on the URL pattern. There are many frameworks and approaches that can help out here if you don't want to write from scratch:
Lambdasync
https://medium.com/#fredrikanderzon/create-a-rest-api-on-aws-lambda-using-lambdasync-e46c68f8043f
Serverless
https://serverless.com/framework/docs/providers/aws/events/apigateway/
Swagger
https://cloudonaut.io/create-a-serverless-restful-api-with-api-gateway-swagger-lambda-and-dynamodb/
As far as event subscriptions go (requirement #3) you can model this in many datastores, certainly in a relational/SQL database, with a table like this:
Subscription (key_of_interest, user_id, events_of_interest)
I'm leaving out data types for you to figure out, but you get the idea hopefully. After each data modification on a particular key, see if that key is of interest in the subscription table, then wire up a response to the user's who indicated interest. The details of this of course depend on your particular requirements. A caution though: this approach will increase the cost of data modifications because of the additional overhead needed to process subscriptions.
EDIT: One other thing I forgot. S3 is better suited for non-structured data (think 'files'). For relational databases, checkout RDS. For a simple NoSQL database you might use DynamoDB, or host your own NoSQL database of choice on an EC2 instance.
I've just started messing around with AWS DynamoDB in my iOS app and I have a few questions.
Currently, I have my app communicating directly to my DynamoDB database. I've been reading around lately and people are saying this isn't the proper way to go about getting data from my database.
By this I mean is I just have a function in my code querying my Dynamo database and returning the result.
How I do it works but is there a better way I should be going about this?
Amazon DynamoDB itself is a highly-scalable service and standing up another server in front of it requires scaling the service also in line with the RCU/WCU configured for your tables, which we can and should avoid.
If your mobile application doesn't need a backend server and you can perform all the business functions from the mobile device, then you should probably think about
Using the AWS DynamoDB SDK for iOS devices to write your client application that runs on the mobile device
Use AWS Token Vending Machine to authenticate your mobile users to grant them credentials to be used to run operations on DynamoDB tables.
Control access (i.e what operations should be allowed on tables etc.,) using IAM policies.
HTH.
From what you say, I can guess that you are talking about a way you can distribute data to many clients (ios apps).
There are few integration patterns (a very good book on this: Enterprise Integration Patterns), one of which is called shared database. It is essentially about using a common database for multiple clients to share the data. Main drawback for that pattern (in your case) is that you are doing assumption about how the database schema looks like. It can potentially bring you some headache supporting the schema in the future, if your business logic changes.
The more advanced approach would be sending events on every change in your data instead of directly writing changes to the database from client apps. This way you can add additional processing to the events before the data they carry is written to the database. For example, you may want to change the event format in the new version of your app, but still want to support legacy users, so you add translation procedure which transforms both types of events to the format which fits the database schema. It's basically a question of whether to work with diffs vs snapshots.
You should be aware of added complexity of working with events, and it can be an overkill if your app is simple and changes in schema are unlikely.
Also consider that you can do data preprocessing using DynamoDB Streams, which gives you some advantages of using events still keeping it simple to implement.