As I understand it, Netlify Environment Variables have some restrictions on size. Looking into it, Netlify uses AWS under the hood and is subject to the same restrictions as that service. Most notably:
Keys can contain up to 128 characters. Values can contain up to 256 characters.
The combined size of all environment properties cannot exceed 4,096 bytes when stored as strings with the format key=value.
I'm passing JWT keys to my serverless functions via environment variables in Netlify. The keys in question (particularly the private key) are long enough to exceed these restrictions: my private key is at least 3,000 characters, well over the 256-character limit outlined above.
How have others managed to get round this issue? Is there another way to add lengthy keys without having to include them in your codebase?
You should not store those keys in environment variables. Even though environment variables can be encrypted, I would not recommend using them for sensitive information like this. There are two possible solutions I can think of.
Solution 1
Use AWS Systems Manager (SSM). Specifically, you should use the Parameter Store. There you can create key-value pairs, just like your environment variables, and they can be marked as "SecureString" so they are stored encrypted.
Then you use the AWS SDK to read the value from SSM in your application.
One benefit of this approach is that you can use IAM to restrict access to those SSM parameters and make sure that only trusted people/applications have access. If you use environment variables, you cannot manage access to those values separately.
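As a rough sketch of what reading such a parameter could look like in a Node.js function (using the AWS SDK v3 for JavaScript; the parameter name is a made-up example):

// Sketch: read an encrypted SecureString parameter from SSM at startup.
const { SSMClient, GetParameterCommand } = require("@aws-sdk/client-ssm");

const ssm = new SSMClient({});

async function loadPrivateKey() {
  const result = await ssm.send(
    new GetParameterCommand({
      Name: "/my-app/jwt/private-key", // hypothetical parameter name
      WithDecryption: true,            // decrypt the SecureString via KMS
    })
  );
  return result.Parameter.Value;
}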
Solution 2
As far as I am aware, those keys must be coming from somewhere. Typically, there are "key endpoints" exposed by whatever authentication provider you use (e.g. Auth0, Okta). In your application you could get the keys from that endpoint using an HTTP call and then cache them for a while to avoid unnecessary HTTP requests.
The benefit of this approach is that you don't have to manage those keys yourself. When they change for whatever reason, you will not need to change or deploy anything to make your application work with the new keys. That said, this should not happen too often, so from my point of view it is still reasonable to "hardcode" the keys in SSM.
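A minimal sketch of Solution 2 (the JWKS URL is a placeholder, the cache duration is arbitrary, and Node 18+ is assumed for the global fetch):

// Sketch: fetch signing keys from the provider's JWKS endpoint and cache them in memory.
let cachedJwks = null;
let cachedAt = 0;
const CACHE_TTL_MS = 10 * 60 * 1000; // re-fetch at most every 10 minutes

async function getJwks() {
  if (cachedJwks && Date.now() - cachedAt < CACHE_TTL_MS) {
    return cachedJwks;
  }
  const response = await fetch("https://YOUR_TENANT.auth0.com/.well-known/jwks.json"); // placeholder URL
  cachedJwks = await response.json();
  cachedAt = Date.now();
  return cachedJwks;
}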
A little late to the party here, but there's now a build plugin that allows inlining of extra-long environment variables for use in Netlify functions.
There's a tutorial here: https://ntl.fyi/3Ie1MXH
And you can find the build plugin and docs here: https://github.com/bencao/netlify-plugin-inline-functions-env
If your environment variable needs to preserve new lines (such as in the case of a private key), you’ll need to do something like this with the environment variable:
process.env.PRIVATE_KEY.replace(/\\n/g, "\n")
Other than that, it worked a treat for me!
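For context, here is roughly how the restored key might then be used (the PRIVATE_KEY variable name and the jsonwebtoken call are just illustrative):

// Restore the real newlines that were escaped when the key was stored as an env variable.
const jwt = require("jsonwebtoken");

const privateKey = process.env.PRIVATE_KEY.replace(/\\n/g, "\n");
const token = jwt.sign({ sub: "user-123" }, privateKey, { algorithm: "RS256" });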
Related
From a development perspective, defining variables and connections inside the UI is effective but not robust, as it is impossible to keep track of what has been added and removed.
Airflow came up with a way to store variables as environment variables. But a few natural questions arise from this:
Does this need to be defined before every DAG? What if I have multiple DAGs sharing the same env values? It seems a bit redundant to be defining them every time.
If defined this way, do they still display in the UI? The UI is still great for taking a quick look at some of the key-value pairs.
I guess, in a perfect world, the solution I am looking for would be to just define the values of the variables and connections in the airflow.cfg file and have them automatically populate the variables and connections in the UI.
Any kind of help is appreciated. Thank you in advance!
There is one more way of storing and managing variables and connections, one that is the most versatile and secure and gives you full versioning and auditing support - namely Secrets Backends.
https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/secrets-backend/index.html
It has built-in integrations with HashiCorp Vault, GCP Secret Manager, and AWS Secrets Manager, you can use the Local Filesystem Secrets Backend, and you can also roll your own backend.
When you use one of those, you get all the versioning, management, security, and access control provided by the secrets backend you choose (most of the secrets backends have all of those built in).
This also means that you CANNOT see/edit the values via the Airflow UI, as it's all delegated to those backends. But the backends usually come with their own UIs for that.
Answering your questions:
If you define connections/variables via env vars, you should define them in the environment of your workers and scheduler, not in the DAGs (Airflow picks up variables from environment variables prefixed with AIRFLOW_VAR_ and connections from those prefixed with AIRFLOW_CONN_). That means that, if your system is distributed, you need a mechanism to update those variables and restart all Airflow processes when they change (for example by deploying new images with those variables, upgrading the Helm chart, or similar).
No. The UI only displays variables/connections defined in the DB.
I am currently retrieving JWKS keys using the Auth0 JWKS library for my Lambda custom authoriser function.
As explained in this issue on the JWKS library, the caching built into JWKS for the public key ID apparently does not work in Lambda functions, and as such they recommend writing the key to a file in /tmp.
What reasons could there be as to why cache=true would not work?
As far as I was aware, there is nothing about Lambda functions that would prevent in-memory caching from working while making file-based caching in the /tmp folder the appropriate solution.
As far as I can tell, the only issue would come from newly spawned containers rate-limiting the JWKS API, not from the act of caching in the memory of the created containers.
In that case, what would be the optimal pattern for storing this token externally in Lambda?
There are a lot of options for solving this, all with different advantages and disadvantages.
First off, storing the keys in memory or on disk (/tmp) has the same result in terms of persistence: both are available across calls to the same Lambda instance.
I would recommend storing the keys in memory, because memory access is a lot faster than reading from a file (on every request).
Here are other options to solve this:
Store the keys in S3 and download during init.
Store the keys on an EFS volume, mount that volume in your Lambda instance, load the keys from the volume during init.
Download the keys from the API during init.
Package the keys with the Lambda's deployment package and load them from disk during init.
Store the keys in AWS SSM parameter store and load them during init.
As you might have noticed, the "during init" phase is the most important part for all of those solutions. You don't want to do that for every request.
Options 1 and 2 would require some other "application" that you build to regularly download the keys and store them in S3 or on an EFS volume. That is extra effort, but in certain circumstances it might be a good idea for more complex setups.
Option 3 is basically what you are already doing at the moment and is probably the best tradeoff between simplicity and sound engineering for simple use cases. As stated before, you should store the keys in memory.
Option 4 is a working "hack" and the easiest way to get your keys to your Lambda. I'd never recommend it, because a sudden change to the keys would require a re-deployment of the Lambda, and in the meantime requests can't be authenticated, resulting in downtime.
Option 5 can be a valid alternative to option 3, but it requires the same key management by another application as options 1 and 2. So it is not necessarily a good fit for a simple authorizer.
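To illustrate option 3, here is a sketch of loading the keys once during init and keeping them in module scope (the JWKS URL is a placeholder; Node 18+ assumed for the global fetch):

// Sketch: module scope runs once per Lambda instance (cold start), so the keys
// are fetched during init and reused for every invocation handled by that instance.
const keysPromise = fetch("https://example.com/.well-known/jwks.json") // placeholder URL
  .then((res) => res.json());

exports.handler = async (event) => {
  const keys = await keysPromise; // resolved once, then kept in memory
  // ... verify the incoming token against `keys` here ...
  return { isAuthorized: true }; // placeholder authorizer response
};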
Some functions I am writing would need to store and share a set of cryptographic keys (<1kb) somewhere so that:
it is shared across functions and within instances of the same function
it is maintained after function deploys
The keys are modified (and written) every 4 hours or so, based on whether a key has expired or a new key needs to be created.
Right now, I am storing the keys as encrypted binary in a cloud bucket with access limited to that function. It works, except that it is fairly slow (~500ms for the read / write that is required when updating the keys).
I have considered some other solutions:
Redis: fast, but overkill given the price ($40/month) it would cost to store a single value
Cloud SQL: the functions are already connected to a cloud instance so it would not incur more costs
Dropping everything and using a KMS. Unfortunately it would not meet the requirements I have.
The library I use in my functions is available here.
Is there a better way to store a single small blob of data for cloud functions (and possibly other tools like GKE) ?
Edit
The solution I ended up with was a single table in a database that the app was already connected to. It is also about 5 times faster than using a bucket (<100ms).
The moral of the story is to use whatever is already provisioned to store the keys. If storing a key yourself is a problem, then the KMS + Cloud Functions combination for rotation described below seems like a good option.
All the code + more details are available here.
A better approach would be to manage your keys with Cloud KMS. However, as you mentioned before, Cloud KMS does not automatically delete old key version material, so you would need to delete old versions manually, which I suspect is something you don't want to do.
Another possibility is to just keep the keys in Firestore. Since you don't have to provision any specific infrastructure for this, as you would with Memorystore (Redis) or Cloud SQL (Postgres), it will be easier to manage and scale in the long run.
The general idea would be to have a Cloud Function triggered by Cloud Scheduler every 4 hours, and this function would rotate the keys in your Cloud Firestore.
How does this sound to you?
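As a rough sketch of that setup (the document path and the key-generation helper are hypothetical; uses the @google-cloud/firestore client):

// Sketch: a scheduled Cloud Function that rotates the key material stored in a Firestore document.
const { Firestore } = require("@google-cloud/firestore");

const db = new Firestore();
const keyDoc = db.collection("config").doc("signing-keys"); // hypothetical document path

function generateNewKeys() {
  // placeholder: replace with your actual key-generation logic or library call
  return { createdAt: Date.now() };
}

exports.rotateKeys = async () => {
  const snapshot = await keyDoc.get();
  const current = snapshot.exists ? snapshot.data() : {};
  if (!current.expiresAt || current.expiresAt < Date.now()) {
    await keyDoc.set({
      keys: generateNewKeys(),
      expiresAt: Date.now() + 4 * 60 * 60 * 1000, // roughly every 4 hours
    });
  }
};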
I have a web app to be hosted on the AWS cloud. We are reading all application configuration from AWS Parameter Store. However, I am not sure if I should have all the variables as a single parameter in JSON format or have one parameter for each variable in Parameter Store.
The problem with having a single parameter as a JSON string is that AWS Parameter Store returns a string, not a JSON object. So we have to bind the string to a model, which involves reflection (a very heavy operation). Having a separate parameter for each variable means additional lines of code in the program (which is not expensive).
Also, my app is a multi-tenant app, which has a tenant resolver in the middleware. So configuration variables will be present for every tenant.
There is no right answer here - it depends. What I can share is my team's logic.
1) Applications are consistently built to read env variables to override defaults
All configuration/secrets are designed this way in our applications. The primary reason is that we don't like secrets stored unencrypted on disk. Yes, env variables can still be read, but that is less risky than disk, which might get backed up.
2) SSM Parameter Store can feed values into environment variables
This includes Lambda, ECS Containers, etc.
This allows us to store values encrypted (SSM SecureString), transmit them encrypted, and inject them into applications. It handles KMS decryption for you (assuming you set up the permissions).
3) Jenkins (our CI) can also inject env variables from Jenkins Credentials
4) There is nothing stopping you from building a library that supports both techniques
Our code reads an env variable called secrets_json and, if it exists and passes validation, sets the key/value pairs in it as env variables.
Note: This also handles the aspect you mentioned about JSON being a string.
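A minimal sketch of that pattern (the secrets_json name comes from the description above; validation is simplified here):

// Sketch: if a secrets_json env variable exists, parse it and promote its keys to env variables.
function loadSecretsJson() {
  const raw = process.env.secrets_json;
  if (!raw) return;
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    throw new Error("secrets_json is not valid JSON");
  }
  for (const [key, value] of Object.entries(parsed)) {
    if (process.env[key] === undefined) {
      process.env[key] = String(value); // existing env variables win
    }
  }
}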
Conclusion
The key here is that I believe you want code that is flexible and handles several different situations. Use it as a default in all your application designs. We have historically used 1:1 mapping because SSM value length was initially limited. We may still use it because it is flexible and supports some of our special rotation policies that Secrets Manager doesn't yet support.
Hope our experience lets you choose the best way for you and your team.
I have a Lambda that is generating and returning a value. This value can expire. Therefore I need to check the values validity before returning.
As generating it is quite expensive (the value is taken from another service), I'd like to store it somehow.
What is the best practice for storing those 2 values (timestamp and a corresponding value)?
DynamoDB, but using a database service for 2 values seems like a lot of overhead. There will never be more items; the same entry will only get updated.
I thought about S3, but this would also imply creating an S3 bucket and storing one object containing the information, only for these 2 values (though probably the most "lean" way?).
I would love to update the Lambda's configuration in order to update the environment variables (but even if this is possible, it's probably not best practice?! I'm also not sure about inconsistencies across Lambda runtimes...).
What's best practice here? What's the way to go in terms of performance?
Use DynamoDB. There is no overhead for "running a database" -- it is a fully-managed service. You pay only for storage and provisioned capacity. It sounds like your use-case would fit within the Free Usage Tier.
Alternatively, you could use API Gateway with a cache setting so that it doesn't even call the Lambda function unless a timeout has passed.
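As a sketch of the DynamoDB approach (the table and key names are made up; AWS SDK v3 for JavaScript assumed):

// Sketch: keep a single item holding the value and its expiry timestamp.
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
const { DynamoDBDocumentClient, GetCommand, PutCommand } = require("@aws-sdk/lib-dynamodb");

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE = "value-cache"; // hypothetical table with partition key "id"

async function getValue(regenerate) {
  const { Item } = await ddb.send(new GetCommand({ TableName: TABLE, Key: { id: "singleton" } }));
  if (Item && Item.expiresAt > Date.now()) {
    return Item.value; // still valid, reuse it
  }
  const value = await regenerate(); // the expensive call to the other service
  await ddb.send(new PutCommand({
    TableName: TABLE,
    Item: { id: "singleton", value, expiresAt: Date.now() + 60 * 60 * 1000 },
  }));
  return value;
}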
You could consider AWS Parameter Store
AWS Systems Manager Parameter Store provides secure, hierarchical storage for configuration data management and secrets management. You can store data such as passwords, database strings, and license codes as parameter values. You can store values as plain text or encrypted data. You can then reference values by using the unique name that you specified when you created the parameter. Highly scalable, available, and durable, Parameter Store is backed by the AWS Cloud. Parameter Store is offered at no additional charge.
For cases like this I would use some fast in-memory data store, like Redis or Memcached:
https://redis.io/
https://memcached.org/
And luckily there is Amazon ElastiCache:
https://aws.amazon.com/elasticache/
which is managed Redis and Memcached, but you don't need to use it for your use case - you could easily use Redis on your own EC2, or you could use an external service like Compose that also supports instances inside of Amazon data centers:
https://www.compose.com/
Lots of ways to use it but I would certainly use Redis, especially for simple cases like this.
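For example, with a Redis client such as ioredis (connection details and the key name are placeholders):

// Sketch: store the value with a TTL so Redis expires it automatically.
const Redis = require("ioredis");
const redis = new Redis(process.env.REDIS_URL); // e.g. an ElastiCache endpoint

async function cacheValue(value, ttlSeconds) {
  await redis.set("cached-value", JSON.stringify(value), "EX", ttlSeconds);
}

async function readValue() {
  const raw = await redis.get("cached-value");
  return raw ? JSON.parse(raw) : null; // null means expired or never set
}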
Just set the environment variable with a value and a date.
Then check the date every time the Lambda is executed.
https://docs.aws.amazon.com/lambda/latest/dg/API_UpdateFunctionConfiguration.html
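A sketch of that idea (variable names and the regeneration helper are made up; the function needs permission to call lambda:UpdateFunctionConfiguration on itself):

// Sketch: read the value and its expiry date from env variables on every invocation,
// and refresh them via UpdateFunctionConfiguration once the date has passed.
const { LambdaClient, UpdateFunctionConfigurationCommand } = require("@aws-sdk/client-lambda");

const lambda = new LambdaClient({});

async function regenerateValue() {
  // placeholder: replace with the expensive call to the other service
  return "new-value";
}

exports.handler = async () => {
  const expiresAt = Number(process.env.VALUE_EXPIRES_AT || 0);
  if (Date.now() < expiresAt) {
    return process.env.CACHED_VALUE;
  }
  const value = await regenerateValue();
  await lambda.send(new UpdateFunctionConfigurationCommand({
    FunctionName: process.env.AWS_LAMBDA_FUNCTION_NAME,
    Environment: {
      // Note: this replaces the whole Variables map, so include anything else you need to keep.
      Variables: {
        CACHED_VALUE: value,
        VALUE_EXPIRES_AT: String(Date.now() + 60 * 60 * 1000),
      },
    },
  }));
  // Note: already-warm instances keep their old env vars until they are recycled.
  return value;
};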