Terraform Workflow At Scale - amazon-web-services

I have a unique opportunity to suggest an IaC workflow for part of a large company that has a number of technical agencies working for it.
I am trying to work out a solution that is safe at the enterprise level but allows as much self-service as possible.
In scope:
Code management [repository per project/environment/agency/company]
Environment handling [build promotion/statefile per env, one statefile, terraform envs etc]
Governance model [Terraform Enterprise/PR system/custom model]
Testing and acceptance [manual acceptance/automated tests(how to test tf files?)/infra test environment]
I have read many articles, but most of them describe the situation of an in-house development team, which is much easier in terms of security and governance.
I would love to learn what the optimal solution for IaC management and governance in an enterprise is. Is Terraform Enterprise a valid option?

I recommend using Terraform modules as Enterprise "libraries" for (infrastructure) code.
Then you can:
version, test, and accept your libraries at the Enterprise level
control what variables developers or clients can set (e.g. provide a module for AWS S3 buckets with a configurable bucket name, but restricted ACL options; see the sketch after this list)
provide abstractions over complex, repeated configurations to save time, prevent errors and encourage self-service (e.g. linking AWS API Gateway with AWS Lambda and Dynamodb)
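As an illustration of the "library" idea, here is a minimal sketch of such a restricted module; the module path and variable name are hypothetical, and the ACL is fixed by the platform team rather than exposed to callers:

variable "bucket_name" {
  type        = string
  description = "The only knob callers may turn"
}

resource "aws_s3_bucket" "this" {
  bucket = var.bucket_name
  acl    = "private" # fixed; callers cannot override it
}

Consumers then use the module through a version-pinned source, so only reviewed releases get deployed:

module "reports_bucket" {
  source      = "git::https://example.com/modules/s3-bucket.git?ref=v1.2.0" # hypothetical repo and tag
  bucket_name = "acme-reports"
}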
For governance, it helps to have controlled cloud provider accounts or environments where every resource is deployed from scratch via Terraform (in addition to sandboxes where users can experiment manually).
For example, you could:
deploy account-level settings from Terraform (e.g. AWS password policy)
tag all Enterprise module resources automatically with
the person who last deployed changes (e.g. AWS caller ID)
the environment they used (with Terraform interpolation: "${terraform.workspace}"); a tagging sketch follows below
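For example, a minimal sketch of such automatic tagging inside an Enterprise module (the resource and naming scheme are illustrative):

data "aws_caller_identity" "current" {}

resource "aws_s3_bucket" "example" {
  bucket = "acme-data-${terraform.workspace}"
  tags = {
    DeployedBy  = data.aws_caller_identity.current.arn # who last applied this configuration
    Environment = terraform.workspace                  # which workspace/environment was used
  }
}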
So, there are lots of ways to use Terraform modules to empower your clients / developers without giving up Enterprise controls.

Related

Separating Dev Prod Environments In AWS

In my scenario I want to separate out the production environment from our development environments.
We'd like to only have our production systems on one AWS account and all other systems and services on another.
I'd like to split/separate for billing purposes. If I add more monitoring services, many charge by the number of running instances, and I have considerably more running instances than I need to monitor, so I'd like the separation. This would also make managing permissions easier in the future, I believe (e.g. Security Hub scores wouldn't be affected by LMS instances).
I'd like to split out all public-facing assets to a separate AWS account: RDS, all EC2 instances relating to prod-webserver (instances, target group, AMI, scaling, VPC, etc.), the S3 cloudfront.abc.com bucket, Jenkins, OpenVPN, and all Seoul assets.
Perhaps I could achieve the goal with Organizations or Control Tower as well. Could anyone please advise what would be best in my scenario? Is there a better alternative?
The fact that you want to split for billing purposes means you should use separate AWS accounts. While you could split some billing by tags within a single account, it's much easier to use multiple accounts to split the billing.
The typical split is Production / Testing / Development.
You can join the accounts together by using AWS Organizations, which gives some overall security controls.
Separating workloads and environments is considered a best practice in AWS according to the AWS Well-Architected Framework. Nowadays Control Tower (which builds upon AWS Organizations) is the standard for building multi-account setups in AWS.
Regarding multi-account setups, I recommend reading the Organizing Your AWS Environment Using Multiple Accounts whitepaper.
Also have a look at the open-source AWS Quickstart superwerker which sets up a well-architected AWS landing zone using AWS Control Tower, Security Hub, GuardDuty, and more.
AWS provides a lot of information about this topic, e.g. a very detailed whitepaper about Organizing Your AWS Environment, in which they say:
Using multiple AWS accounts to help isolate and manage your business applications and data can help you optimize across most of the AWS Well-Architected Framework pillars, including operational excellence, security, reliability, and cost optimization.
With accounts, you logically separate all resources (unless you explicitly allow cross-account access) and therefore ensure independence between, for example, the development environment and the production environment.
You should also take a look at Organizational Units (OUs).
The following benefits of using OUs helped shape the Recommended OUs and accounts and Patterns for organizing your AWS accounts.
Group similar accounts based on function
Apply common policies
Share common resources
Provision and manage common resources
Control Tower is a tool that lets you manage all your AWS accounts in one place. You can apply policies to every account or OU, or restrict which regions may be used; a Terraform sketch of such a region restriction follows below. You can use the Account Factory to create new accounts based on blueprints.
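As a hedged sketch of what such a region restriction could look like when the organization itself is managed with Terraform (the OU ID and allowed regions are hypothetical, and real-world SCPs usually exempt global services):

resource "aws_organizations_policy" "deny_other_regions" {
  name = "deny-non-approved-regions"
  type = "SERVICE_CONTROL_POLICY"
  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Deny"
      Action   = "*"
      Resource = "*"
      Condition = {
        StringNotEquals = { "aws:RequestedRegion" = ["eu-west-1", "eu-central-1"] }
      }
    }]
  })
}

resource "aws_organizations_policy_attachment" "workloads" {
  policy_id = aws_organizations_policy.deny_other_regions.id
  target_id = "ou-abcd-12345678" # hypothetical OU ID
}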
But you still need to build up a lot of knowledge about these tools and best practices, because they are just that: best practices and recommendations you can use to get started and build a good foundation. They are nothing you can rely on blindly, because you may have individual factors.
So understanding these factors and their consequences is very important.

Native GCP solution for automatically deploying IAM resources to projects?

I'm trying to come up with a way in GCP to automatically deploy defined IAM roles, policies and policy bindings to selected GCP projects or all GCP projects.
I am aware that GCP organizations exist and that they can be used to define IAM resources in one place to have them inherited to child projects. However, organizations are not mandatory in GCP and some customers will be using the old structure where projects exist side by side without inheritance and not wanting to migrate to an organization.
One solution would be to create scripts which iterate over projects and create everything. However, a GCP-native solution would be preferable. Is there a GCP-native way of deploying defined IAM resources like this - and possibly other project-level configurations - to specific GCP projects or all projects, which works regardless of whether the customer uses organizations or not, and without iterating over projects?
I'm trying to come up with a way in GCP to automatically deploy defined IAM roles, policies and policy bindings to selected GCP projects or all GCP projects.
Deployment tools use concise descriptions of resources called configuration files. These tools manage resource state: you declare what you want, and they make it so. They are not dynamic; you do not say "sometimes do X and sometimes do Y". You declare Y, and if the actual state differs, the tool changes it to Y.
Deployment tools are IaC - Infrastructure as Code. The configuration files are the blueprint for your goal of "desired state". You write the configuration files and the tools know how to build the resources that match the desired state.
If your goal is dynamic configuration based upon inputs, conditionals, and/or external factors, IaC-based tools will fail to meet your goal.
For IaC-based tools, you have two well-supported options.
Google Deployment Manager. This is an official Google product. This product is vendor-specific.
Terraform Google Provider. Terraform is a HashiCorp product. The Google Provider is developed by Google.
I recommend choosing Terraform and the Google Provider. Terraform is cross-platform with most of the world supporting Terraform. Terraform is very easy to use, there are numerous training resources, example configurations, Internet guides, getting-started articles, and YouTube videos. I have written a few articles on Terraform with Google Cloud.
In your question, you mention writing scripts. That is possible, but I do not recommend that. For one-off configurations, using the Google Cloud CLI in a script is workable and sometimes necessary. The benefits of a deployment language, once mastered, are tremendous.
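To make "declare the desired state" concrete, here is a minimal Terraform sketch with the Google provider (the project and bucket names are hypothetical); you describe the bucket once, and the tool creates or corrects it on every run:

provider "google" {
  project = "my-project-id" # hypothetical project
}

resource "google_storage_bucket" "example" {
  name     = "my-example-state-bucket" # bucket names must be globally unique
  location = "US"
}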
without iterating over projects?
Unless you implement organizations, Google Cloud Projects are separate independent resources. Deployment tools are project-specific, meaning if you want to manage resources in more than one project, you must declare that in the deployment configuration. They do not iterate projects, you declare them.
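For example, applying the same IAM binding to several projects is done by declaring the project list in the configuration; a hedged sketch using for_each (the project IDs, role, and group are hypothetical):

variable "project_ids" {
  type    = set(string)
  default = ["customer-proj-a", "customer-proj-b"] # hypothetical project IDs
}

resource "google_project_iam_member" "viewers" {
  for_each = var.project_ids
  project  = each.value
  role     = "roles/viewer"
  member   = "group:devs@example.com" # hypothetical group
}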

Execute managed AWS Config rule on demand via SDK

Is it possible to run a managed AWS Config rule (for example https://docs.aws.amazon.com/config/latest/developerguide/root-account-mfa-enabled.html) on demand via the SDK?
Scenario:
As a consultant I want to easily assess a customer's environment without spending time applying all the AWS Config rules to my customer's environment. Instead I want to use the SDK to quickly execute many rules and get the results back.
Is this possible?
Cloud Custodian
For ad hoc execution you might be best served by evaluating Cloud Custodian instead. When I tried it out previously, I was pretty impressed with the immediate value I could get with minimal deployment.
The ad hoc nature of your execution can benefit, as you can run a report-only action, or actually have it create Lambda functions to remediate certain cases if you need that.
The tool is cross platform, dockerized as well, and most of the configuration for rules is yaml based, supporting AWS Config, Security Hub, AWS SSM, and more.
If you look at the Run Your First Policy section for AWS, you'll see it can be as simple as:
AWS_ACCESS_KEY_ID="foo" AWS_SECRET_ACCESS_KEY="bar" custodian run --output-dir=. custodian.yml
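For context, custodian.yml is the YAML policy file passed to that command; a minimal, hypothetical policy that simply reports running EC2 instances might look like:

policies:
  - name: ec2-running-report
    resource: aws.ec2
    filters:
      - "State.Name": running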
There is a pretty extensive list of example-policies which include items like
AWS Config Integration
Can be deployed as config-rule for any resource type supported by config.
Can use config as resource database instead of querying service describe apis. Custodian supports server side querying resources with Config’s SQL expression language.
Can filter resources based on their compliance with one or more config rules.
Can be deployed as a config-poll-rule against any resource type supported by cloudformation.
source: AWS Config Integration
It supports custom config rules as well.
Note: I'm not involved in the project, just found it useful and promising for similar situations as you describe. Seems to reduce a lot of "DevOps plumbing" required to get value out of several AWS services with far less service specific knowledge and setup required.

Cloudformation/Serverless vs Terraform in AWS

I would like to understand the need for tools like Terraform. When we have CloudFormation templates available, and one can create/update all AWS services with them, what is the point of using a tool like Terraform?
Please suggest.
CloudFormation (CFN) and Terraform (TF) are both Infrastructure as Code (IaC) development tools.
However, CFN is only for AWS. You can't use it with Azure, GCP or anything else outside of the AWS ecosystem. In contrast, TF is cloud-agnostic. You can use it not only across multiple cloud providers, but also to work with non-cloud products, such as Docker, various databases, and even Domino's Pizza if you want.
So the main advantage of TF is that you learn it once and can apply it to a number of cloud providers. CFN is only useful in AWS; once you stop using CFN, you have to learn something new to work with another cloud.
There are also differences in how TF and CFN work. Both have their strengths and weaknesses. For example:
when you deploy using CFN, all resources are available to view in one central location in AWS, along with the template's source code. With TF there is no such place; if you log in to the AWS console, you have no idea what was created by TF, what source code was used, etc.
TF has loops, complex data structures, and conditionals, while CFN does not (see the sketch after this list).
CFN has creation policies and update policies; TF does not.
You can control access to CFN using CFN policies and IAM policies. You can't do same with TF as it "lives" outside of AWS.
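To illustrate the loops-and-conditionals point from the list above, a minimal, hypothetical HCL sketch that creates a set of buckets from a list, gated by a boolean flag:

variable "create_buckets" {
  type    = bool
  default = true
}

variable "bucket_names" {
  type    = list(string)
  default = ["alpha", "beta"] # hypothetical names
}

resource "aws_s3_bucket" "this" {
  # loop over the list, but only when the flag is set
  for_each = var.create_buckets ? toset(var.bucket_names) : toset([])
  bucket   = "example-${each.key}"
}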
There are a couple of reasons why you might choose Terraform over CloudFormation:
Vendor Agnostic: There might be a point in the future where you need to migrate your cloud infrastructure. This could be due to several reasons (e.g. costs, regulatory compliance, etc.). With Terraform you are still able to use the same tool to deploy the new infrastructure. With smart use of Terraform modules you can even leave large parts of your infrastructure-as-code repository intact.
Support for other tools: This also builds a bit on the previous point, but Terraform can deploy a lot more than just AWS resources. For example, you can use Terraform to orchestrate the deployment of an EC2 machine that is then configured with Ansible. Or you could use Terraform to deploy applications on top of your Kubernetes cluster. While CloudFormation supports custom resources via the creation of custom Lambdas, that is quite a lot of work to maintain.
Wider ecosystem: Due to the Open Source nature of Terraform, there is a huge ecosystem of tools that help you solve all kinds of issues, such as testing the infrastructure as code or building in compliance in a continuous fashion.
Arguably a better language: Personally I think Terraform is far better suited to Infrastructure as Code than CloudFormation. Terraform has a lot more flexibility built into the language (HCL), and its module system allows for much more composability than what can be achieved in CloudFormation; see the sketch below.
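As a small illustration of that composability (the module paths, inputs, and outputs are hypothetical):

module "network" {
  source = "./modules/network" # hypothetical local module
  cidr   = "10.0.0.0/16"
}

module "cluster" {
  source    = "./modules/cluster"       # hypothetical local module
  subnet_id = module.network.subnet_id  # compose modules by wiring outputs to inputs
}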

How to set up a remote backend bucket with Terraform before creating the rest of my infrastructure? (GCP)

How would I go about initializing Terraform's backend state bucket on GCP first with Gitlab's pipeline, and then the rest of my infrastructure? I found this, but I'm not sure what it implies for Gitlab's pipeline.
This is always a difficult question. My post won't answer your question directly but will give my view about the subject. (too long to be a comment)
It's a bit like asking to manage the server where you have your CI tools with the same CI tools (for example: the gitlab server managing itself).
If you use gitlab CI to create your state bucket, you won't be able to keep the state, as you would have no remote backend to store it in for this specific task. This would mean you would have an inconsistent resource, with a tf file but no state.
If you want to integrate it with your CI, I would recommend using the gcloud CLI inside your CI, checking whether the GCS bucket exists and creating it if not; a sketch follows below.
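A minimal sketch of that check-then-create step, assuming a hypothetical bucket name and location:

# create the state bucket only if it does not already exist
gsutil ls -b gs://my-tf-state-bucket || gsutil mb -l europe-west1 gs://my-tf-state-bucket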
If you really want to use terraform, maybe use the free tier of Terraform Cloud with a remote backend only for this specific resource. That way you have all resources managed by tf, and all with a tfstate.
You now have another option, which does not involve GCP, with GitLab 13.0 (May 2020)
GitLab HTTP Terraform state backend
Users of Terraform know the pain of setting up their state file (a map of your configuration to real-world resources that also keeps track of additional metadata).
The process includes starting a new Terraform project and setting up a third party backend to store the state file that is reliable, secure, and outside of your git repo.
Many users wanted a simpler way to set up their state file storage without involving additional services or setups.
Starting with GitLab 13.0, GitLab can be used as an HTTP backend for Terraform, eliminating the need to set up state storage separately for every new project.
The GitLab HTTP Terraform state backend allows for a seamless experience with minimal configuration, and the ability to store your state files in a location controlled by the GitLab instance.
They can be accessed using Terraform’s HTTP backend, leveraging GitLab for authentication.
Users can migrate to the GitLab HTTP Terraform backend easily, while also accessing it from their local terminals.
The GitLab HTTP Terraform state backend supports:
Multiple named state files per project
Locking
Object storage
Encryption at rest
It is available both for GitLab Self-Managed installations and on GitLab.com.
See documentation and issue.
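For reference, wiring Terraform to this backend looks roughly like the following; the project ID, state name, and credentials are placeholders to fill in:

terraform {
  backend "http" {}
}

terraform init \
  -backend-config="address=https://gitlab.com/api/v4/projects/<project-id>/terraform/state/<state-name>" \
  -backend-config="lock_address=https://gitlab.com/api/v4/projects/<project-id>/terraform/state/<state-name>/lock" \
  -backend-config="unlock_address=https://gitlab.com/api/v4/projects/<project-id>/terraform/state/<state-name>/lock" \
  -backend-config="lock_method=POST" \
  -backend-config="unlock_method=DELETE" \
  -backend-config="username=<gitlab-username>" \
  -backend-config="password=<gitlab-access-token>"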
Furthermore, this provider will be supported for the foreseeable future, with GitLab 13.4 (September 2020):
Taking ownership of the GitLab Terraform provider
We’ve recently received maintainer rights to the GitLab Terraform provider and plan to enhance it in upcoming releases.
In the past month we’ve merged 21 pull requests and closed 31 issues, including some long outstanding bugs and missing features, like supporting instance clusters.
You can read more about the GitLab Terraform provider in the Terraform documentation.
See Documentation and Issue.