What is the Azure Key Vault reference equivalent in AWS Secrets Manager?

There is a simple integration between Azure Key Vault and Azure Functions that automatically grabs the latest version of a secret and loads it as an environment variable:
@Microsoft.KeyVault(VaultName=myvault;SecretName=mysecret)
This entry needs to be added to the application settings. Azure will also refresh the cached secret within 24 hours of rotation.
Does AWS have similar integration and caching functionality for the Secrets Manager client in .NET?

No, unfortunately AWS Secrets Manager does not have an equivalent of Key Vault references, i.e. a built-in way to load secrets from Secrets Manager so that they are available as Lambda environment variables. This applies to all Lambda function runtimes, not just .NET.
You will have to use the Secrets Manager SDK, preferably reading secret names from your application settings and then loading the secret values on startup.
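As a rough illustration of that startup pattern, here is a minimal Python (boto3) sketch; the equivalent calls exist in the .NET client, and the environment variable and secret names below are just placeholders:
import os
import boto3

# Placeholder names: the secret *name* comes from application settings
# (here an environment variable); the *value* is loaded once at startup.
secret_name = os.environ["DB_SECRET_NAME"]

secrets_client = boto3.client("secretsmanager")
response = secrets_client.get_secret_value(SecretId=secret_name)
db_connection_string = response["SecretString"]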
Or, if you need a like-for-like replacement, you can make calls to obtain the secret value(s) during your build pipeline and modify your application settings to contain the secret values.
This compromises on the caching aspect unless you also create a dedicated, scheduled pipeline that runs every 24 hours to obtain the latest secret values and update the application settings for your environment(s).
However, for the former (preferred) option, you can take advantage of the official AWSSDK.SecretsManager.Caching NuGet package for secret caching.
The (configurable) cache item refresh time/TTL is set to 1 hour by default.
For your use case, create an instance of the SecretCacheConfiguration class and set the CacheItemTTL property to 86400000 (24 hours in milliseconds). Then create your SecretsManagerCache, passing in your Secrets Manager client and your cache configuration.
This will configure the cache with an auto-refresh interval of 24 hours, resulting in similar behaviour.
For Lambda functions, keep in mind that the cache is cleared on cold starts, so it is best to create a singleton instance of SecretsManagerCache (outside the handler) that is kept alive for the lifetime of the Lambda execution environment.
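As a rough sketch of that caching setup (shown here with the equivalent Python package, aws_secretsmanager_caching, since the pattern is the same across SDKs; the secret name is a placeholder and the refresh interval is expressed in seconds rather than milliseconds):
import botocore.session
from aws_secretsmanager_caching import SecretCache, SecretCacheConfig

# Created at module level (outside the handler) so the cache survives
# warm invocations and is only rebuilt on a cold start.
client = botocore.session.get_session().create_client("secretsmanager")
cache_config = SecretCacheConfig(secret_refresh_interval=86400)  # 24 hours, in seconds
cache = SecretCache(config=cache_config, client=client)

def lambda_handler(event, context):
    # Served from the in-memory cache; refreshed at most once every 24 hours.
    secret = cache.get_secret_string("mysecret")  # placeholder secret name
    ...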
If you are loading more than 3-5 secrets, I would recommend looking at the layer code referenced by this AWS Prescriptive Guidance pattern, or at the GitHub repository for Square's Lambda Secrets Prefetch layer.
Both are Lambda layers that cache secret values, which could potentially reduce your Lambda duration overall. Square details around a 20-25% duration decrease in their blog post, which contains more detailed information. As always, YMMV.
The main difference between the two is that the AWS layer stores the secrets in memory, as opposed to locally in the /tmp directory; functionally, both are pretty much the same.
Take a look at the below official links for more in-depth information:
AWS Secrets Manager User Guide: Retrieve AWS Secrets Manager secrets in .NET applications
AWS Security Blog: How to use AWS Secrets Manager client-side caching in .NET

Related

Best practices to Initialize and populate a serverless PostgreSQL RDS instance by a CloudFormation stack deployment

We are successfully spinning up an AWS CloudFormation stack that includes a serverless RDS PostgreSQL instance. Once the PostgreSQL instance is in place, we're automatically restoring a PostgreSQL database dump (in binary format) that was created using pg_dump on a local development machine onto the PostgreSQL instance just created by CloudFormation.
We're using a Lambda function (instantiated by the CloudFormation process) that includes a build of the pg_restore executable within a Lambda layer and we've also packaged our database dump file within the Lambda.
The above approach seems complicated for something that presumably has been solved many times... but Google searches have revealed almost nothing that corresponds to our scenario. We may be thinking about our situation in the wrong way, so please feel free to offer a different approach (e.g., is there a CodePipeline/CodeBuild approach that would automate everything?). Our preference would be to stick with the AWS toolset as much as possible.
This process will be run every time we deploy to a new environment (e.g., development, test, stage, pre-production, demonstration, alpha, beta, production, troubleshooting) potentially per release as part of our CI/CD practices.
Does anyone have advice or a blog post that illustrates another way to achieve our goal?
If you have provisioned everything via IaC (Infrastructure as Code) anyway, most of the time-saving is already done: you should be able to replicate your infrastructure to other accounts/stacks via different roles in your AWS credentials and config files by passing the --profile some-profile flag. I would recommend AWS SAM (Serverless Application Model) over CloudFormation, though, as I find I only need to write roughly half the code (roles and policies are mostly created for you) and the feedback via the console is much better and faster. I would also recommend sam sync (currently in beta, but worth using) so you don't need to create a change set on code updates; only the code is updated, so deploys take 3-4 seconds. Some AWS SAM examples are here, check both videos and patterns: https://serverlessland.com (official AWS site)
In terms of restoring RDS, I'd probably create a base RDS instance, take a snapshot of it, and restore all other RDS instances from snapshots rather than manually recreating everything each time. You are able to copy snapshots, and in fact automate backups to snapshots cross-account (and cross-region if required), which should be a little cleaner.
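As a rough sketch of that snapshot approach (the identifiers below are placeholders), restoring a new environment's Aurora PostgreSQL cluster from a pre-built "base" snapshot with boto3 might look like this; in a CloudFormation/SAM template you would typically achieve the same thing by setting a snapshot identifier property on the DB resource:
import boto3

rds = boto3.client("rds")

# Placeholder identifiers: restore a new environment's cluster from a
# pre-built "base" snapshot instead of re-running pg_restore each time.
rds.restore_db_cluster_from_snapshot(
    DBClusterIdentifier="myapp-test-cluster",
    SnapshotIdentifier="myapp-base-snapshot",
    Engine="aurora-postgresql",
)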
There is also a clean solution for replicating data instantaneously across AWS accounts using AWS DMS (Database Migration Service) and CDC (Change Data Capture). The process is: you have a source DB, a target DB, and a 'replication instance' (e.g. an EC2 micro) that monitors your source DB and replicates changes across to different instances, so you are always in sync on data. For example, if you have several devs working on separate 'stacks' so as not to pollute each other's logs, you can replicate data and changes out seamlessly from one DB. This works with the source DB and destination DB both already in AWS, but you can of course also use AWS DMS for migrating a local DB over to AWS. Some more info here: https://docs.aws.amazon.com/dms/latest/sbs/chap-manageddatabases.html

Backing up each and every resource in an AWS account

I am exploring backing up our AWS services configuration to a backup disk or source control.
Only configs, e.g. IAM policies, users, roles, Lambdas, Route 53 configs, Cognito configs, VPN configs, route tables, security groups, etc.
We have a tactical account where we have created some resources on an ad-hoc basis, and we now have a new official account set up via CloudFormation.
Also, in the near future we are planning to migrate the tactical account's resources to the new account, either manually or using the backed-up configs.
We looked at the AWS CLI, but it is time-consuming. Is there any script which crawls through AWS and backs up the resources?
Thank You.
The "correct" way is not to 'backup' resources. Rather, it is to initially create those resources in a reproducible manner.
For example, creating resources via an AWS CloudFormation template allows the same resources to be deployed in a different region or account. Only the data itself, such as the information stored in a database, would need a 'backup'. Everything else could simply be redeployed.
There is a poorly-maintained service called CloudFormer that attempts to create CloudFormation templates from existing resources, but it only supports limited services and still requires careful editing of the resulting templates before they can be deployed in other locations (due to cross-references to existing resources).
There is also the relatively recent ability to Import Existing Resources into a CloudFormation Stack | AWS News Blog, but it requires that the template already includes the resource definition. The existing resources are simply matched to that definition rather than being recreated.
So, you now have the choice to 'correctly' deploy resources in your new account (involves work), or just manually recreate the ad-hoc resources that already exist (pushes the real work to the future). Hence the term Technical debt - Wikipedia.

Where to store credentials for AWS EMR Apache Spark application submitted from Airflow task

I'm working on an Apache Spark application which I submit to an AWS EMR cluster from an Airflow task.
In Spark application logic I need to read files from AWS S3 and information from AWS RDS. For example, in order to connect to AWS RDS on PostgreSQL from Spark application, I need to provide the username/password for the database.
Right now I'm looking for the best and most secure way to keep these credentials in a safe place and provide them as parameters to my Spark application. Please suggest where to store these credentials in order to keep the system secure: as env vars, somewhere in Airflow, or where?
In Airflow you can create Variables to store this information. Variables can be listed, created, updated and deleted from the UI (Admin -> Variables). You can then access them from your code as follows:
from airflow.models import Variable
foo = Variable.get("foo")
Airflow has got us covered beautifully on the credentials-management front by offering the Connection SQLAlchemy model, which can be accessed from the web UI (where passwords remain hidden).
You can control the key (fernet_key) that Airflow uses to encrypt passwords when storing Connection details in its backend metadata DB.
It also provides an extra param for storing unstructured / client-specific stuff, such as a {"use_beeline": true} config for HiveServer2.
In addition to the web UI, you can also edit Connections via the CLI (which is true for pretty much every feature of Airflow).
Finally, if your use case involves dynamically creating / deleting a Connection, that is also possible by using the underlying SQLAlchemy Session; you can see implementation details in cli.py.
Note that Airflow treats all Connections equally irrespective of their type (the type is just a hint for the end user); Airflow distinguishes them on the basis of conn_id only.
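As a small sketch of how a task might then pull those stored credentials at runtime (the conn_id below is a placeholder for a Connection created beforehand via the UI or CLI; the import path shown is the older airflow.hooks.base_hook one, newer versions move it):
from airflow.hooks.base_hook import BaseHook

# "my_rds_postgres" is a placeholder conn_id created via the UI / CLI.
conn = BaseHook.get_connection("my_rds_postgres")

# These values can then be passed to the Spark job as parameters
# instead of being hard-coded in the DAG or in environment variables.
jdbc_url = "jdbc:postgresql://{}:{}/{}".format(conn.host, conn.port, conn.schema)
username = conn.login
password = conn.password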

AWS SSM parameter store reliability

I am looking at using AWS SSM Parameter Store to store secrets such as database connection strings for applications deployed on EC2, Elastic Beanstalk, Fargate Docker containers, etc.
The linked document states that the service is "highly scalable, available, and durable", but I can't find more details on what exactly that means. For example, is it replicated across all regions?
Is it best to:
a) read secrets from the parameter store at application startup (i.e. rely on it being highly available and scalable, even if, say, another region has gone down)?
or
b) read and store secrets locally when the application is deployed? Arguably less secure, but it means that any unavailability of the Parameter Store service would only impact deployment of new versions.
If you want to go with Parameter Store, go with your option (a), and fail the app if the get-parameter call fails. (This does happen; I have seen rate limiting on Parameter Store API requests.) See here.
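A minimal boto3 sketch of option (a), where the parameter name is a placeholder: decrypt the SecureString at startup and let any failure, including throttling, surface immediately rather than be swallowed.
import boto3

ssm = boto3.client("ssm")

# "/myapp/prod/db-connection-string" is a placeholder parameter name.
# If this call fails (e.g. due to throttling), let the exception propagate
# and fail the app at startup rather than running without configuration.
response = ssm.get_parameter(
    Name="/myapp/prod/db-connection-string",
    WithDecryption=True,  # needed for SecureString parameters
)
db_connection_string = response["Parameter"]["Value"]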
Or
The best option is AWS Secrets Manager. Secrets Manager is a superset of Parameter Store: it supports RDS password rotation and much more. It is also a paid service.
Just checked the unthrottled throughput of SSM: it is not documented in the spec, but it is ca. 50 req/s.

Terraform Workflow At Scale

I have a unique opportunity to suggest an IaC workflow for part of a big company which has a number of technical agencies working for it.
I am trying to work out a solution that would be enterprise-level safe but have as much self-service as possible.
In scope:
Code management [repository per project/environment/agency/company]
Environment handling [build promotion/statefile per env, one statefile, terraform envs etc]
Governance model [Terraform Enterprise/PR system/custom model]
Testing and acceptance [manual acceptance/automated tests(how to test tf files?)/infra test environment]
I have read many articles, but most of them describe a situation of a development team in-house, which is much easier in terms of security and governance.
I would love to learn what the optimal solution is for IaC management and governance in an enterprise. Is Terraform Enterprise a valid option?
I recommend using Terraform modules as Enterprise "libraries" for (infrastructure) code.
Then you can:
version, test, and accept your libraries at the Enterprise level
control what variables developers or clients can set (e.g. provide a module for AWS S3 buckets with configurable bucket name, but restricted ACL options)
provide abstractions over complex, repeated configurations to save time, prevent errors and encourage self-service (e.g. linking AWS API Gateway with AWS Lambda and Dynamodb)
For governance, it helps to have controlled cloud provider accounts or environments where every resource is deployed from scratch via Terraform (in addition to sandboxes where users can experiment manually).
For example, you could:
deploy account-level settings from Terraform (e.g. AWS password policy)
tag all Enterprise module resources automatically with
the person who last deployed changes (e.g. AWS caller ID)
the environment they used (with Terraform interpolation: "${terraform.workspace}")
So, there are lots of ways to use Terraform modules to empower your clients / developers without giving up Enterprise controls.