What's the point of full configurations in .tfstate? - amazon-web-services

So I've gone through some basic reading:
https://blog.gruntwork.io/an-introduction-to-terraform-f17df9c6d180
and
https://blog.gruntwork.io/how-to-manage-terraform-state-28f5697e68fa
So I understand that .tfstate tells the terraform CLI which resources it's actually responsible for managing. But couldn't that be done more minimally with a list of ID's?
Why does .tfstate need to contain full configurations of all resources, if terraform refetch is run implicitly before terraform apply?
Would the fetch not get complete information from the infrastructure? And then that could be used to do the diff etc...
I suppose if you get complete information every time, you might as well record it. But I'm wondering if it's a necessary step. Thanks!

Related

Create Infrastructure Documentation from terraform + gitlab-ci system

Our infra pipeline is setup using terraform + gitlab-ci. I am given with task to provide documentation on setup with what's implemented and what's left. I am new to infra world and finding it hard to come up template to start documentation.
So far I thought of having a table with resources needed with details on dependencies, source of the module, additional notes, etc
If you have a template, can you share OR any other suggestions?
For starters, you could try one or both of the below approaches:
a) create a graph of the Terraform resources using its graph command
b) group and then list all of your resources for a specific tag using AWS Resource Groups, specifically its Create Resource Group functionality
The way I do documentation is to keep it as simple as possible, explain how it works, how to use it and also provide instructions on how it was setup from scratch for reference and as an insurance policy. So that if it's destroyed, someone other than the person that set it all up could recreate it.
Since this is just a pipeline there is probably not much to diagram. The structure of documentation I would provide would be something like this and I would add this either as part of the README.md, in Confluence or however your team does documentation.
Summary
1-2 Sentences about the work and why it was created.
How the Repo is Structured
An explanation on how the repo is structured and decisions behind why it was structured the way it was.
How To Use
Provide steps on how a user can use the pipeline
How It Was Created
Provide steps on how it was setup so anybody can manage it and work on it going forward.

Importing existing resources with multiple accounts

We have four AWS accounts used to define different environments: dev, sqe, stg, prd. We're only now using CF and I'd like to import an existing resource into a stack. As we roll this out each environment will get the new stack and I'm wondering if there's an easier way to import the resource in each env. than to initially go through the console to import the reasource while add the stack (would be nice if we could just deploy via our deployment system.)
What I was hoping for was something I could specify in the stack definition itself (e.g., "here's a bucket that already exists, take ownership"), but I'm not finding anything. Currently it seems like the easiest route would be to create an empty stack in each environment which imports the resource and then just deploy as normal.
Also, what happens when/if an update fails and a stack gets stuck in ROLLBACK_COMPLETE? Do I have to go through this again after deleting the stack?
What you have described sounds exactly like your after a Continuous Integration / Continuous Deployment (CICD) pipeline. Instead of trying to import existing resources into your accounts, your better off designing the cloudformation templates then deploying them to each environment through Code Pipeline. This will also provide a clean separation between the accounts instead of importing stg resources to prd.
A fantastic example and quickstart is the serverless-cicd-for-enterprise which should serve as a good starting point for you.
You can't get stuck on 'rollback complete', as that is the last action a failed change set executes. What it means is that it tried to update, couldn't and has reverted to the last successful deployment. If this is the first deployment (no successful deployments) you will need to delete the stack and try again. However, if you have had a successful deployment you can run an update stack.

Terraform destroy add the same resources

I'm facing a difficult issue to resolve.
I'm terraforming the deployement for multiple ressources on GCP's plateform.
Those ressources are all included in the terraform GCP's network module. (https://github.com/terraform-google-modules/terraform-google-network).
I'm building 2 projects with VPC (shared) and some subnetworks. Easy at first glance.
First terraform init/plan & apply was Ok, the tfstate file is on gcs backend with versionning set on true.
Today, I launch an terraform plan on it to check if everything was ok before doing some modifications.
The output of the plan is telling me that terraform wants to destroy some resources ... and recreate (adding) ... strictly the same resources ...
The code is on our bitbucket repo, no changes on it till the last apply who was ok.
I tried to retreive an old version of the tfstate files, disable the gcs backend to debug and correct it localy, but I can't find a way to refresh the current state.
I tried those tricks :
terraform refresh
terraform import (40 ressources with my little hands ... and even if import commands are working, the plan command still want to destroy my existing resources to recreate strictly the same ...)
So I'm wondering if you already encountered the same problem.
If yes, how you managed it ?
I can share my source on demand.
Terraform v0.12.9
provider.google v2.19.0
provider.google-beta v3.3.0
provider.null v2.1.2
provider.random v2.2.1
Ok, big rookie mistake, terraform providers save my day. No versions was set on the source's module version ... I just define it, replan, everything was fine again.

Import current state of my cloud AWS account with terraform

I would like to version control my cloud resources initially before using it to apply through Terraform. Is there anyway I can run a single command and store the current state of my cloud?
I have tried to use Terraform's import command:
terraform import ADR ID
But this takes a long time to identify all the resources and import them.
I have tried terraforming but this also needs a resource type to import:
terraforming s3
Is there any tool that can help in importing all existing resources?
While this doesn't technically answer your question I would strongly advise not to try and import an entire existing AWS account into Terraform in a single way even if it was possible.
If you look at any Terraform best practices an awful lot of it comes down to minimising blast radius of things so only things that make sense to be changed at the same time as each other are ever applied at the same time. Charity Majors wrote up a good blog post about this and the impact it had when that wasn't the case.
Any tool that would mass import things (eg terraforming) is just going to dump everything in a single state file. Which, as mentioned before, is a bad idea.
While it sounds laborious I'd recommend that you being your migration to Terraform more carefully and methodically. In general I'd probably say that only new infrastructure should use Terraform, utilising Terraform's data sources to look up existing things such as VPC IDs that already exist.
Once you feel comfortable with using Terraform and structuring your infrastructure code and state files in a particular way you can then begin to think about how you would map your existing infrastructure code into Terraform state files etc and begin manually importing specific resources as necessary.
Doing things this way also allows you to find your feet with Terraform a bit better and understand its limitations and strengths while also working out how your team and/or CI will work together (eg remote state and state file locking and orchestration) without tripping over each other or causing potentially crippling state issues.
I'm using terraformer to import my existing AWS infrastructure. It's much more flexible than terraforming and has no issues mentioned in answers.

How to automate the Updating/Editing of Amazon Data Pipeline

I want to use AWS Data Pipeline service and have created some using the manual JSON based mechanism which uses the AWS CLI to create, put and activate the pipeline.
My question is that how can I automate the editing or updating of the pipeline if something changes in the pipeline definition? Things that I can imagine changing could be schedule time, addition or removal of Activities or Preconditions, references to DataNodes, resources definition etc.
Once the pipeline is created, we cannot edit quite a few things as mentioned here in the official doc: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-manage-pipeline-modify-console.html#dp-edit-pipeline-limits
This makes me believe that if I want to automate the updating of pipeline then I would have to delete and re-create/activate a new pipeline? If yes, then the next question is that how can I create a automated process which identifies the previous version's ID, deletes it and creates a new one? Essentially trying to build a release management flow for this where the configuration JSON file is released and deployed automatically.
Most commands like activate, delete, list-runs, put-pipeline-definition etc. take the pipeline-id which is not known until a new pipeline created. I am unable to find anything which remains constant across updates or recreation (the unique-id and name parameters of the createpipeline command are consistent but then I can't use them for the above mentioned tasks (I need pipeline-id for that.
Of course I can try writing shell scripts which grep and search the output and try to create a script but is there any other better way? Some other info that I am missing?
Thanks a lot.
You cannot edit schedules completely or change references so creating/deleting pipelines seems to be the best way for your scenario.
You'll need the pipeline-id to delete a pipeline. Is it not possible to keep a record of that somewhere? You can have a file with the last used id stored locally or in S3 for instance.
Some other ways I can think of are:
If you have only 1 pipeline in the account you can list-pipelines and
use the only result
If you have the pipeline name you can list-pipelines and find the id