I am currently retrieving JWKS keys using the Auth0 JWKS library for my Lambda custom authoriser function.
As explained in this issue on the JWKS library, the caching built into JWKS for the public key ID apparently does not work in Lambda functions, and they therefore recommend writing the key to a file in /tmp.
What reasons could there be as to why cache=true would not work?
As far as I was aware, there is no difference that would prevent in-memory caching from working in Lambda functions yet make file-based caching in the /tmp folder the appropriate solution.
As far as I can tell, the only issue would be newly spawned containers hitting the JWKS API and getting rate limited, not the act of caching in the memory of the created containers.
In that case, what would be the optimal pattern for storing this key externally in Lambda?
There are a lot of options for solving this. All have different advantages and disadvantages.
First off, storing the keys in memory or on disk (/tmp) has the same result in terms of persistence: both are available across calls to the same Lambda instance.
I would recommend storing the keys in memory, because memory access is a lot faster than reading from a file on every request.
Here are other options to solve this:
1. Store the keys in S3 and download them during init.
2. Store the keys on an EFS volume, mount that volume in your Lambda instance, and load the keys from the volume during init.
3. Download the keys from the API during init.
4. Package the keys with the Lambda's deployment package and load them from disk during init.
5. Store the keys in AWS SSM Parameter Store and load them during init.
As you might have noticed, the "during init" phase is the most important part for all of those solutions. You don't want to do that for every request.
Options 1 and 2 would require some other "application" that you build to regularly download the keys and store them in S3 or on an EFS volume. That is extra effort, but might in certain circumstances be a good idea for more complex setups.
Option 3 is basically what you are already doing at the moment and is probably the best tradeoff between simplicity and sound engineering for simple use cases. As stated before, you should store the keys in memory (see the sketch at the end of this answer).
Option 4 is a working "hack" that is the easiest way to get your key into your Lambda. I'd never recommend doing this, because a sudden change to the key would require a re-deployment of the Lambda, while in the meantime requests can't be authenticated, resulting in downtime.
Option 5 can be a valid alternative to option 3, but it requires the same key management by another application as options 1 and 2. So it is not necessarily a good fit for a simple authorizer.
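Here is a minimal sketch of option 3 for a Node.js custom authorizer, assuming the jwks-rsa (v2+) and jsonwebtoken packages; the jwksUri and the 10-minute cache age are placeholders you would adjust:

const jwksRsa = require("jwks-rsa");
const jwt = require("jsonwebtoken");

// Created once per container (cold start) and reused for every invocation
// handled by the same Lambda instance, so the in-memory cache persists.
const client = jwksRsa({
  jwksUri: "https://YOUR_TENANT.auth0.com/.well-known/jwks.json", // placeholder
  cache: true,                  // keep signing keys in memory
  cacheMaxAge: 10 * 60 * 1000,  // refetch after 10 minutes
});

exports.handler = async (event) => {
  const token = (event.authorizationToken || "").replace(/^Bearer\s+/i, "");
  const decoded = jwt.decode(token, { complete: true });
  if (!decoded) throw new Error("Unauthorized");

  // Only hits the JWKS endpoint on a cache miss.
  const signingKey = await client.getSigningKey(decoded.header.kid);
  const claims = jwt.verify(token, signingKey.getPublicKey(), { algorithms: ["RS256"] });

  return {
    principalId: claims.sub,
    policyDocument: {
      Version: "2012-10-17",
      Statement: [{ Action: "execute-api:Invoke", Effect: "Allow", Resource: event.methodArn }],
    },
  };
};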
Related
Some functions I am writing would need to store and share a set of cryptographic keys (<1kb) somewhere so that:
it is shared across functions and within instances of the same function
it is maintained after function deploys
The keys are modified (and written) every 4 hours or so, based on whether a key has expired or a new key needs to be created.
Right now, I am storing the keys as encrypted binary in a cloud bucket with access limited to that function. It works, except that it is fairly slow (~500ms for the read / write that is required when updating the keys).
I have considered some other solutions:
Redis: fast, but overkill given the price ($40/month) it would cost to store a single value
Cloud SQL: the functions are already connected to a cloud instance so it would not incur more costs
Dropping everything and using a KMS. Unfortunately it would not meet the requirements I have.
The library I use in my functions is available here.
Is there a better way to store a single small blob of data for cloud functions (and possibly other tools like GKE)?
Edit
The solution I ended up using was a single table in a database that the app was already connected to. It is also about 5 times faster than using a bucket (<100ms).
The moral of the story is to use whatever is already provisioned to store the keys. If storing a key is a problem, then using the combo KMS + cloud functions for rotations described below seems like a good option.
All the code + more details are available here.
A better approach would be to manage your keys with Cloud KMS. However, as you mentioned, Cloud KMS does not automatically delete old key version material, so you would need to delete old versions manually, which I suspect is something you don't want to do.
Another possibility is to just keep the keys in Firestore. Since you don't have to provision any specific infrastructure for this, such as with Redis Memorystore or Postgres on Cloud SQL, it will be easier to manage and to scale in the long run.
The general idea would be to have a Cloud Function triggered by Cloud Scheduler every 4 hours, and this function would rotate the keys in your Cloud Firestore.
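A rough sketch of that idea, assuming a Node.js background function wired to a Pub/Sub topic that Cloud Scheduler publishes to every 4 hours; the collection/document path, the 4-hour expiry and the key material below are placeholders:

const { Firestore } = require("@google-cloud/firestore");
const crypto = require("crypto");

const db = new Firestore();
const keyDoc = db.collection("signing-keys").doc("current"); // placeholder path

// Background function; the Pub/Sub payload itself is not used.
exports.rotateKeys = async (message, context) => {
  const snapshot = await keyDoc.get();
  const existing = snapshot.exists ? snapshot.data() : null;

  // Only rotate when there is no key yet or the current one has expired.
  if (existing && existing.expiresAt > Date.now()) return;

  await keyDoc.set({
    key: crypto.randomBytes(32).toString("base64"), // stand-in for real key material
    createdAt: Date.now(),
    expiresAt: Date.now() + 4 * 60 * 60 * 1000,     // 4 hours
  });
};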
How does this sound to you?
As I understand it, Netlify Environment Variables have some restrictions on size. Looking into it, they use AWS under the hood and are subject to the same restrictions as this service. Most notably:
Keys can contain up to 128 characters. Values can contain up to 256 characters.
The combined size of all environment properties cannot exceed 4,096 bytes when stored as strings with the format key=value.
I'm passing JWT keys to my serverless functions via environment variables in Netlify. The keys in question (particularly the private key) are long enough to exceed these restrictions: my private key is at least 3K characters, well over the 256 outlined above.
How have others managed to get round this issue? Is there another way to add lengthy keys without having to include them in your codebase?
You should not store those keys in environment variables. Even though environment variables can be encrypted, I would not recommend using them for sensitive information like this. There are two possible solutions I can think of.
Solution 1
Use AWS Systems Manager (SSM). Specifically, you should use the parameter store. There you can create key-value pairs like your environment variables and they can be marked as "SecureString", so they are encrypted.
Then you use the AWS SDK to read the value from SSM in your application.
One benefit of this approach is that you can use IAM to restrict access to those SSM parameters and make sure that only trusted people/applications have access. If you use environment variables, you cannot manage access to those values separately.
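For illustration, a small sketch of reading such a SecureString with the AWS SDK for JavaScript v3 and caching it at module scope so it is only fetched once per function instance; the parameter name is a placeholder:

const { SSMClient, GetParameterCommand } = require("@aws-sdk/client-ssm");

const ssm = new SSMClient({});
let cachedKey; // module scope: survives across invocations of the same instance

async function getPrivateKey() {
  if (cachedKey) return cachedKey;
  const { Parameter } = await ssm.send(
    new GetParameterCommand({
      Name: "/my-app/jwt/private-key", // placeholder parameter name
      WithDecryption: true,            // required to decrypt SecureString values
    })
  );
  cachedKey = Parameter.Value;
  return cachedKey;
}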
Solution 2
As far as I am aware, those keys must be coming from somewhere. Typically, there are "key endpoints" from whatever authentication provider you use (e.g. Auth0, Okta). In your application you could get the keys from that endpoint with an HTTP call and then cache them in your application for a while, to avoid unnecessary HTTP requests.
The benefit of this approach is that you don't have to manage those keys yourself. When they change for whatever reason, you won't need to change or deploy anything to make your application work with the new keys. That said, this should not happen too often, so it is still reasonable from my point of view to "hardcode" the keys in SSM.
A little late to the party here, but there's now a build plugin that allows inlining of extra-long environment variables for use in Netlify functions.
There's a tutorial here: https://ntl.fyi/3Ie1MXH
And you can find the build plugin and docs here: https://github.com/bencao/netlify-plugin-inline-functions-env
If your environment variable needs to preserve new lines (such as in the case of a private key), you'll need to do something like this with the environment variable:
process.env.PRIVATE_KEY.replace(/\\n/g, "\n")
Other than that, it worked a treat for me!
I have a web app to be hosted on the AWS cloud. We are reading all application configuration from the AWS Parameter Store. However, I am not sure if I should have all the variables as a single parameter in JSON format or one parameter per variable in the Parameter Store.
The problem with having a single parameter as a JSON string is that AWS Parameter Store does not return a JSON object, but a string. So we have to bind the string to a model, which involves reflection (a very heavy operation). Having a separate parameter for each variable means having additional lines of code in the program (which is not expensive).
Also, my app is a multi-tenant app, which has a tenant resolver in the middleware. So configuration variables will be present for every tenant.
There is no right answer here - it depends. What I can share is my team's logic.
1) Applications are consistently built to read env variables to override defaults
All configuration/secrets are designed this way in our applications. The primary reason is that we don't like secrets stored unencrypted on disk. Yes, env variables can still be read, but that is less risky than disk, which might get backed up.
2) SSM Parameter Store can feed values into environment variables
This includes Lambda, ECS Containers, etc.
This allows us to store encrypted (SSM SecureString), transmit encrypted, and inject into applications. It handles KMS decryption for you (assuming you set up permissions).
3) Jenkins (our CI) can also inject env variables from Jenkins Credentials
4) There is nothing stopping you from building a library that supports both techniques
Our code reads an env variable called secrets_json and, if it exists and passes validation, sets the key/value pairs in it as env variables.
Note: This also handles the aspect you mentioned about JSON being a string.
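A minimal sketch of that idea (illustrative, not our actual library); the secrets_json name is from above, while the flat-strings-only validation is an assumption:

// Promote the entries of a JSON-valued env variable to individual env variables.
function loadSecretsJson() {
  const raw = process.env.secrets_json;
  if (!raw) return; // nothing injected, fall back to plain env variables

  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    throw new Error("secrets_json is set but is not valid JSON");
  }

  for (const [key, value] of Object.entries(parsed)) {
    if (typeof value !== "string") continue; // simple validation: flat strings only
    if (process.env[key] === undefined) {
      process.env[key] = value;              // don't override explicitly set values
    }
  }
}

loadSecretsJson();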
Conclusion
The key here is that I believe you want code that is flexible and handles several different situations. Use it as a default in all your application designs. We have historically used 1:1 mapping because initially SSM length was limited. We may still use it because it is flexible and supports some of our special rotation policies that Secrets Manager doesn't yet support.
Hope our experience lets you choose the best way for you and your team.
Customers (around 1000) sign up to my service and receive a customer-unique API key. They then use the key when calling an AWS Lambda function through AWS API Gateway to access data in DynamoDB.
Requirement 1: The customers get billed by the number of API calls, so I have to be able to count those. AWS only provides metrics for the total number of API calls per Lambda, so I have a few options:
1. At every API hit, increment a counter in DynamoDB.
2. At every API hit, enqueue a message in SQS, receive it in a "hit counter" Lambda and increment a counter in DynamoDB.
3. Deploy a separate Lambda for each customer. Use the AWS built-in call counter.
Requirement 2: The data that the Lambda can access is unique for each customer and thus dependent on the API key provided.
To enable this I also have a number of options:
1. Store the required API key together with the data that the customer has the right to access.
2. Deploy a separate Lambda for each customer. Use API Gateway to protect it with a key.
3. Create a separate endpoint in API Gateway for each customer and protect it with the API key.
None of the options above seem like a good way to design the solution. Is there a canonical way of doing this? If not, which of the options above is the best? Have I missed an obvious solution due to my unfamiliarity with AWS?
I will try to break your problems down based on my experience, but Michael - Sqlbot or John Rotenstein may be able to give more appropriate answers.
Requirement 1
1) This sounds like a good approach. I don't see anything critical here.
2) This, IMHO, is the best out of the 3. It will decouple data access from the billing service, which is a great thing in a Microservices world.
3) This is not scalable. Imagine your system grows and you end up with 10K Lambda functions. Not only will you have to build a very reliable mechanism to automate this process, but you'll also need to monitor 10K different things (CloudWatch logs, API Gateway, etc.), not to mention you'll have 10 thousand functions with exactly the same code (client-specific parameters apart). I wouldn't even think about this one.
Requirement 2
1) It could work and it fits nicely into the DynamoDB model of doing things: store as much data as you can in a single table, so you can fetch everything in one go. From what I see, you could even use this ApiKey as your partition key and, for the sake of simplicity in this answer, store the client's data as JSON in an attribute named data (see the sketch below). Since your query only needs to look items up by the ApiKey, storing JSON in DynamoDB won't hurt (do keep in mind, however, that if you need to query by any of its JSON attributes then you're in bad shape, since DynamoDB's query capabilities are very limited).
2) No, because of Requirement 1.3
3) No, because of the above.
If you still need to store the ApiKey in a different table so you can run different analyses and keep finer-grained control over the client's calls, access, billing, etc., that's not a problem either; just make sure you duplicate your ApiKey in your ClientData table instead of creating an FK (DynamoDB doesn't support FKs, so you'd need to manage these constraints yourself). Duplication is just fine in a NoSQL world.
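Here is a small sketch of both ideas (the requirement 1 counter and the requirement 2 single-table lookup) with the AWS SDK for JavaScript v3 document client; the table and attribute names are placeholders:

const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
const { DynamoDBDocumentClient, GetCommand, UpdateCommand } = require("@aws-sdk/lib-dynamodb");

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Requirement 2, option 1: fetch the tenant's data in one call, keyed by the ApiKey.
async function getClientData(apiKey) {
  const { Item } = await ddb.send(
    new GetCommand({ TableName: "ClientData", Key: { ApiKey: apiKey } })
  );
  return Item ? Item.data : null;
}

// Requirement 1, option 1: atomically increment the per-customer call counter.
async function countApiHit(apiKey) {
  await ddb.send(
    new UpdateCommand({
      TableName: "ApiUsage",
      Key: { ApiKey: apiKey },
      UpdateExpression: "ADD calls :one",
      ExpressionAttributeValues: { ":one": 1 },
    })
  );
}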
Your use case is clearly a Multi-Tenancy one, so I'd also recommend you read Multi-Tenant Storage with Amazon DynamoDB, which will give you some more insights and broaden your options a little bit. Multi-Tenancy is not an easy task and can give you lots of headaches if not implemented correctly. I think this is why AWS has also prepared this nice read for us :)
Happy to continue this in the comments section in case you have more info to share.
Hope this helps!
I have a Lambda that is generating and returning a value. This value can expire, so I need to check the value's validity before returning it.
As generating it is quite expensive (it is fetched from another service), I'd like to store the value somehow.
What is the best practice for storing those 2 values (a timestamp and the corresponding value)?
DynamoDB, but using a database service for 2 values seems like a lot of overhead. There will never be more items; the same entry will only get updated.
I thought about S3, but this would also imply creating an S3 bucket and storing one object containing the information, only for these 2 values (but probably the most "lean" way?).
I would love to update the Lambda's configuration in order to update the environment variables (but even if this is possible, it's probably not best practice?! I'm also not sure about inconsistencies across Lambda runtimes...).
What's best practice here? What's the way to go in terms of performance?
Use DynamoDB. There is no overhead to "running a database" -- it is a fully-managed service. You pay only for storage and provisioned capacity. It sounds like your use case would fit within the Free Usage Tier.
Alternatively, you could use API Gateway with a cache setting so that it doesn't even call the Lambda function unless a timeout has passed.
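For the DynamoDB route, a minimal sketch of a single item holding the value and its expiry; the table/key names and the 1-hour validity are placeholders, and regenerateValue stands in for the expensive call to the other service:

const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
const { DynamoDBDocumentClient, GetCommand, PutCommand } = require("@aws-sdk/lib-dynamodb");

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const KEY = { pk: "cached-value" }; // the one and only item

async function getValue(regenerateValue) {
  const { Item } = await ddb.send(new GetCommand({ TableName: "CacheTable", Key: KEY }));
  if (Item && Item.expiresAt > Date.now()) {
    return Item.value; // still valid, skip the expensive regeneration
  }

  const value = await regenerateValue(); // your expensive call to the other service
  await ddb.send(
    new PutCommand({
      TableName: "CacheTable",
      Item: { ...KEY, value, expiresAt: Date.now() + 60 * 60 * 1000 }, // valid for 1 hour
    })
  );
  return value;
}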
You could consider AWS Parameter Store:
AWS Systems Manager Parameter Store provides secure, hierarchical storage for configuration data management and secrets management. You can store data such as passwords, database strings, and license codes as parameter values. You can store values as plain text or encrypted data. You can then reference values by using the unique name that you specified when you created the parameter. Highly scalable, available, and durable, Parameter Store is backed by the AWS Cloud. Parameter Store is offered at no additional charge.
For cases like this I would use some fast in-memory data store, like Redis or Memcached:
https://redis.io/
https://memcached.org/
And luckily there is Amazon ElastiCache:
https://aws.amazon.com/elasticache/
which is managed Redis and Memcached, but you don't need to use it for your use case - you could easily use Redis on your own EC2, or you could use an external service like Compose that also supports instances inside of Amazon data centers:
https://www.compose.com/
Lots of ways to use it but I would certainly use Redis, especially for simple cases like this.
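A quick sketch of that with ioredis (any Redis client works the same way); the key name, TTL and REDIS_URL are placeholders, and regenerate stands in for the expensive work:

const Redis = require("ioredis");
const redis = new Redis(process.env.REDIS_URL); // e.g. your ElastiCache or Compose endpoint

async function getValue(regenerate) {
  const cached = await redis.get("expensive-value");
  if (cached !== null) return cached; // still present, i.e. not yet expired

  const fresh = await regenerate();   // your expensive call to the other service
  // EX sets a TTL in seconds, so Redis drops the value automatically when it expires.
  await redis.set("expensive-value", fresh, "EX", 60 * 60);
  return fresh;
}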
Just set an environment variable with the value and a date.
Then check the date every time the Lambda is executed.
https://docs.aws.amazon.com/lambda/latest/dg/API_UpdateFunctionConfiguration.html
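A rough sketch of this approach, assuming the function's role is allowed to call lambda:GetFunctionConfiguration and lambda:UpdateFunctionConfiguration on itself; the CACHED_* variable names and the 1-hour validity are placeholders:

const { LambdaClient, GetFunctionConfigurationCommand, UpdateFunctionConfigurationCommand } = require("@aws-sdk/client-lambda");

const lambda = new LambdaClient({});
const FN = process.env.AWS_LAMBDA_FUNCTION_NAME; // set automatically by the Lambda runtime

// Read the cached value from the environment, treating it as expired after 1 hour.
function readCachedValue() {
  const at = Date.parse(process.env.CACHED_AT || "");
  if (!at || Date.now() - at > 60 * 60 * 1000) return null;
  return process.env.CACHED_VALUE;
}

// Persist a new value by updating the function's own configuration.
async function storeValueInEnv(value) {
  // UpdateFunctionConfiguration replaces the whole variable map, so merge the current one.
  const current = await lambda.send(new GetFunctionConfigurationCommand({ FunctionName: FN }));
  const variables = { ...(current.Environment && current.Environment.Variables) };
  variables.CACHED_VALUE = value;
  variables.CACHED_AT = new Date().toISOString();

  await lambda.send(
    new UpdateFunctionConfigurationCommand({
      FunctionName: FN,
      Environment: { Variables: variables },
    })
  );
}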