Create infrastructure documentation for a Terraform + GitLab CI setup (AWS)

Our infra pipeline is set up using Terraform + GitLab CI. I have been given the task of documenting the setup: what's implemented and what's left. I am new to the infra world and finding it hard to come up with a template to start the documentation.
So far I have thought of having a table of the required resources, with details on dependencies, the source of each module, additional notes, etc.
If you have a template, can you share it? Any other suggestions are also welcome.

For starters, you could try one or both of the approaches below:
a) create a graph of the Terraform resources using its graph command (see the sketch below)
b) group and then list all of your resources for a specific tag using AWS Resource Groups, specifically its Create Resource Group functionality
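A minimal sketch of option (a), assuming Graphviz's dot tool is installed locally:

```bash
# Render the dependency graph of the current Terraform configuration.
# Requires Graphviz (the `dot` binary) to be installed.
terraform graph | dot -Tsvg > graph.svg
```

The resulting image can then be embedded in the README or Confluence page as an architecture overview.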

The way I do documentation is to keep it as simple as possible: explain how it works, how to use it, and also provide instructions on how it was set up from scratch, both for reference and as an insurance policy, so that if it's destroyed, someone other than the person who set it all up could recreate it.
Since this is just a pipeline, there is probably not much to diagram. The documentation structure I would use is something like the following, added either to the README.md, in Confluence, or however your team does documentation.
Summary
1-2 sentences about the work and why it was created.
How the Repo is Structured
An explanation of how the repo is structured and the decisions behind that structure.
How To Use
Provide steps on how a user can use the pipeline.
How It Was Created
Provide steps on how it was set up, so anybody can manage it and work on it going forward.
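A minimal README skeleton following that structure might look like the sketch below; the section contents are placeholders to be filled in for your pipeline:

```bash
# Scaffold a README.md with the headings described above (contents are placeholders).
cat > README.md <<'EOF'
# Infra Pipeline (Terraform + GitLab CI)

## Summary
One or two sentences about the work and why it was created.

## How the Repo is Structured
Layout of the Terraform modules, environments, and the .gitlab-ci.yml stages.

## How To Use
Steps for running a plan/apply through the pipeline (branches, approvals, variables).

## How It Was Created
Steps to recreate the pipeline from scratch (state backend, CI variables, runners).
EOF
```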


How to use AWS CLI to create a stack from scratch?

The problem
I'm approaching AWS, and my first test project will be a website, but I'm struggling with how to approach the resources and the tools to accomplish this.
The AWS documentation is not really beginner-friendly, so to me it feels like being punched in the face at my first boxing session.
First attempt
I've installed both the AWS and SAM CLI tools. What I would expect is to be able to create an empty stack at first and add resources one by one as the specifications are given/outlined. Instead, I see that I need to give the tool a template to create the new stack, which means I need to know how to write it beforehand, and therefore the template specification for each resource type.
Second attempt
This led me to create the stack and the related resources from the online console to get the final stack template. But then I need to test every new or updated resource locally, so I have to copy the template from the online console to my machine and run the CLI tools against it, which is obviously not the desired development flow.
What I expected
Coming from standard/classical web development, I would expect to be able to create the project locally, test the related resources locally, version it, and delegate the deployment to the pipeline.
So what?
All this made me realize that I'm "probably" missing something about how to use the AWS CLI tools and how development for an AWS-hosted application is meant to be done.
I'm not looking for a guide on specific resource types, like every single tutorial I've found online, but something at a higher level on how to handle project development on AWS, best practices, and things like that. I can then dig deeper into any resource later when needed.
AWS's Cloud Development Kit ticks the boxes on your specific criteria.
Caveat: the CDK has a learning curve in line with its power and flexibility. There are much easier ways to deploy a web app on AWS, like the higher-level AWS Amplify framework, with abstractions tailored to front-end devs who want to minimise the mental energy spent on the underlying infrastructure.
Each of the squillion AWS and 3rd Party deploy tools is great for somebody. Nevertheless, looking at your explicit requirements in "What I expected", we can get close to the CDK as an objective answer:
Coming from a standard/classical web development
So you know JS/Python. With the CDK, you code infrastructure as functions and classes, rather than 500 lines of YAML as with SAM. The CDK's reference implementation is in TypeScript; JS and Python are also supported. There are step-by-step AWS online workshops for these and the other supported languages.
create the project locally
Most of your work will be done locally in your language of choice, with a cdk deploy CLI command to bundle the deployment artefacts and send them up to the cloud.
test the related resources locally
The CDK has built-in testing and assertion support.
version it
"Deterministic deploy" is a CDK design goal. Commit your code and the generated deployment artefacts so you have change control over your infrastructure.
delegate the deployment to the pipeline
The CDK has good pipeline support: a push to the remote main branch can kick off a deploy, for example.
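As a rough sketch of that workflow (the project name and language choice are arbitrary, and a one-time cdk bootstrap may be needed per account/region):

```bash
# Install the CDK CLI and scaffold a TypeScript app (assumes Node.js is installed).
npm install -g aws-cdk
mkdir my-app && cd my-app
cdk init app --language typescript

# Everyday loop: synthesise, review the diff against what is deployed, then deploy.
cdk synth     # generate the CloudFormation template locally
cdk diff      # show what would change in the target account
cdk deploy    # bundle the artefacts and push them to the cloud
```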
AWS SAM is actually a good option if you are just trying to get your feet wet with AWS. SAM is an open-source wrapper around the aws-cli which lets you create AWS resources like Lambda in, say, ~10 lines of code versus ~100 lines if you were to use the aws-cli directly. Yes, you'll need to learn SAM-specific things like the SAM template format and the SAM CLI, but it is pretty straightforward with the documentation.
Once you get the hang of it, it becomes easier to look under the hood at what SAM is doing and how, and to get into the weeds with the aws-cli if you want, which will then let you build custom solutions (using the aws-cli) for complex use cases that SAM may not support. Caveat: SAM is still pretty new and has open issues that could be blockers for advanced features or complex use cases.
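If you go the SAM route, a typical local-first loop looks roughly like this (assumes Docker is available for local invocation; the project directory is whatever you choose at init time):

```bash
sam init                 # scaffold a project interactively from a starter template
cd <your-project-dir>    # placeholder: the directory sam init created
sam build                # build the function artefacts locally
sam local invoke         # run a single function locally in a Docker container
sam local start-api      # serve the API locally for iterative testing
sam deploy --guided      # package and deploy through CloudFormation
```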

Import the current state of my AWS account with Terraform

I would like to version control my existing cloud resources before applying changes through Terraform. Is there any way I can run a single command and capture the current state of my cloud?
I have tried Terraform's import command:
terraform import ADDRESS ID
But this takes a long time to identify all the resources and import them.
I have tried terraforming but this also needs a resource type to import:
terraforming s3
Is there any tool that can help in importing all existing resources?
While this doesn't technically answer your question, I would strongly advise against trying to import an entire existing AWS account into Terraform in one go, even if it were possible.
If you look at any Terraform best practices, an awful lot of it comes down to minimising the blast radius, so that only things that make sense to change together are ever applied together. Charity Majors wrote a good blog post about this and the impact it had when that wasn't the case.
Any tool that mass-imports things (e.g. terraforming) is just going to dump everything into a single state file, which, as mentioned before, is a bad idea.
While it sounds laborious, I'd recommend that you begin your migration to Terraform more carefully and methodically. In general, I'd probably say that only new infrastructure should use Terraform, utilising Terraform's data sources to look up existing things such as VPC IDs.
Once you feel comfortable with using Terraform and structuring your infrastructure code and state files in a particular way, you can then begin to think about how to map your existing infrastructure into Terraform state files and begin manually importing specific resources as necessary (see the sketch below).
Doing things this way also allows you to find your feet with Terraform and understand its limitations and strengths, while also working out how your team and/or CI will work together (e.g. remote state, state file locking, and orchestration) without tripping over each other or causing potentially crippling state issues.
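When you do get to importing individual resources, the flow looks roughly like this (the resource name and bucket are hypothetical); each import needs a matching resource block already present in your configuration:

```bash
# A resource block such as the following must already exist in your .tf files:
#   resource "aws_s3_bucket" "logs" {}
terraform import aws_s3_bucket.logs my-existing-logs-bucket

# Then review what Terraform still thinks differs from the real resource.
terraform plan
```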
I'm using terraformer to import my existing AWS infrastructure. It's much more flexible than terraforming and does not have the issues mentioned in the other answers.

How do I know what key-value pairs are available for Deployment Manager?

For example, when I tried to figure out what properties I can set in Deployment Manager for creating a BigQuery table, the best place I could find the parameters and required fields was the REST API docs.
Is there a good reference, either from within the gcloud command or in the online docs, that is specific to Deployment Manager YAMLs? I would like to be able to look up the required and optional fields for creating GCP resources. Currently it's very difficult to figure out.
From the documentation at: https://cloud.google.com/deployment-manager/docs/configuration/supported-resource-types
You can get a list of the supported resource types by running:
gcloud deployment-manager types list
That said, the YAML reference in the documentation on that page looks pretty complete.
Edit: refer to this GitHub link for a list of Deployment Manager examples.
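For example, to find the resource type to use for a BigQuery table, you can filter the supported-types listing (the exact type names shown in the comment come from the supported-types page and may change over time):

```bash
# List Deployment Manager resource types and filter for BigQuery.
# Expect entries along the lines of bigquery.v2.dataset and bigquery.v2.table.
gcloud deployment-manager types list | grep -i bigquery
```

The properties for each type then map to the corresponding REST API resource documentation, as noted in the question.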
If anything you need is not described in the documentation or the example schemas, there is a brute-force workaround.
You can make the API call with the browser developer console open (F12) and look at the network activity, where your call will be shown with all the properties it used and accepts.
It will not provide any additional information about the implementation beyond the parameter names themselves, so you will have to follow the rules that apply to similar parameters.

Understand how OpsWorks and custom cookbooks work together

I have my stack on OpsWorks and the app (CakePHP) is deploying fine.
Now I have to configure some things like chmod, PHP versions, etc. I'm reading about this but don't know exactly what the best way to do it is.
Question 1 - Should I do this with custom deploy JSON or via custom cookbooks?
Question 2 - What's the correct way to work with custom cookbooks? Fork the original AWS repositories, update the recipes and then use them in my stack?
It depends on what you would like to achieve; you may implement many things, such as:
a recipe, which is invoked only once during a chef-client run.
a lightweight resource provider, which supports notifies and can be invoked zero or more times.
a definition, which is available before resource collection and can be invoked zero or more times.
For your second question, first check out Berkshelf, a cookbook manager.
I would suggest forking a project only if that project is dead; otherwise, consider contributing to the existing project so everybody benefits from it. You can always write your own wrapper cookbook; see also Chef wrapper cookbook best practices.
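A rough wrapper-cookbook workflow with Berkshelf might look like this (the cookbook name and upstream dependency are placeholders; chef generate comes with ChefDK / Chef Workstation):

```bash
# Generate a wrapper cookbook and vendor its dependencies locally.
chef generate cookbook my_app_wrapper
cd my_app_wrapper

# In the Berksfile (create one if the generator did not), declare the upstream
# cookbook you are wrapping, e.g.:  cookbook 'php'   # placeholder name

berks install                   # resolve and fetch dependencies
berks vendor berks-cookbooks    # vendor everything into a local directory
```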

How to automate the Updating/Editing of Amazon Data Pipeline

I want to use the AWS Data Pipeline service and have created some pipelines using the manual JSON-based mechanism, which uses the AWS CLI to create, put, and activate the pipeline.
My question is: how can I automate editing or updating the pipeline if something changes in the pipeline definition? Things I can imagine changing include the schedule time, the addition or removal of Activities or Preconditions, references to DataNodes, resource definitions, etc.
Once the pipeline is created, we cannot edit quite a few things as mentioned here in the official doc: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-manage-pipeline-modify-console.html#dp-edit-pipeline-limits
This makes me believe that if I want to automate updating the pipeline, I would have to delete it and then re-create/activate a new pipeline. If so, the next question is how I can create an automated process that identifies the previous version's ID, deletes it, and creates a new one. Essentially, I am trying to build a release management flow where the configuration JSON file is released and deployed automatically.
Most commands, like activate, delete, list-runs, put-pipeline-definition, etc., take the pipeline-id, which is not known until a new pipeline is created. I am unable to find anything that remains constant across updates or recreation. The unique-id and name parameters of the create-pipeline command are consistent, but I can't use them for the tasks mentioned above (I need the pipeline-id for those).
Of course, I can try writing shell scripts that grep and parse the output, but is there a better way? Is there some other information I am missing?
Thanks a lot.
You cannot edit schedules completely or change references, so creating/deleting pipelines seems to be the best approach for your scenario.
You'll need the pipeline-id to delete a pipeline. Is it not possible to keep a record of that somewhere? You could have a file with the last-used id stored locally or in S3, for instance.
Some other ways I can think of are:
If you have only one pipeline in the account, you can call list-pipelines and use the only result.
If you have the pipeline name, you can call list-pipelines and find the id (see the sketch below).
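A sketch of a delete-and-recreate release step keyed off the pipeline name (the name, definition file, and overall flow are illustrative, not an official pattern):

```bash
#!/usr/bin/env bash
set -euo pipefail

NAME="my-release-pipeline"          # placeholder pipeline name

# Look up the id of the previous version by name, if one exists.
OLD_ID=$(aws datapipeline list-pipelines \
  --query "pipelineIdList[?name=='${NAME}'].id" --output text)

# Delete the previous version so it can be recreated from the released JSON.
if [ -n "$OLD_ID" ]; then
  aws datapipeline delete-pipeline --pipeline-id "$OLD_ID"
fi

# Create a new pipeline and capture its id.
NEW_ID=$(aws datapipeline create-pipeline \
  --name "$NAME" --unique-id "${NAME}-$(date +%s)" \
  --query pipelineId --output text)

# Push the released definition and activate it.
aws datapipeline put-pipeline-definition \
  --pipeline-id "$NEW_ID" --pipeline-definition file://pipeline.json
aws datapipeline activate-pipeline --pipeline-id "$NEW_ID"
```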