Terraform: Custom user authentication on digitalocean - digital-ocean

With vagrant I was able to set a droplet up with a custom user:
config.ssh.username = 'living'
config.ssh.private_key_path = "./keys/qa.openssh"
Currently, I'm using terraform. Nevertheless I don't quite figure out how to set a custom user instead of root.
Any ideas?

You have a couple of ways in which you could approach this:
1: Use a Terraform provisioner to create your user:
Terraform ships with a number of provisioners that will let you perform provisioning actions on the resources it creates for you. Probably the best one for you in this situation is the remote-exec provisioner. Essentially this just allows you to run one or more shell commands on the remote resource. Using this you could then simply apply the typical Unix commands to create your living user and add in the correct SSH key for them (or even delegate the call out to a true configuration tool like Ansible or Chef). It might look something like this:
provisioner "remote-exec" {
inline = [
"adduser living",
"..."
]
}
You may also be able to use the file provisioner to copy over configuration files, or even the chef provisioner.
The thing to remember with provisioners is that they only run once when the resource is created.
2: Create a machine image with the correct SSH configuration in place
An alternative is to create an image for your Droplets that has the basic SSH configuration and users already set. This way you simply ask Terraform to create resources that you know to have the correct configuration already. This is what we'd call immutable infrastructure.
You could for example use Packer (which is also from Hashicorp) to create your Droplet with the living user already created. Packer is basically a tool for creating machine images with numerous providers such as AWS and also DigitalOcean.
Once you had created your new image for your Droplet, you would simply update your Terraform resource definitions to ensure that you launch the correct image.
Given the choice, I personally would take option 2 as I much prefer the immutable infrastructure route. However if you're just playing around with Terraform, then option 1 will work just fine.

Related

Rebuild/recreate terraform from remote state

A contractor built an application's AWS infrastructure on his local laptop, never commited his code then left (wiping the notebook's HD). But he did create the infrastructure with Terraform and stored the remote state in an s3 bucket, s3://analytics-nonprod/analytics-dev.tfstate.
This state file includes all of the VPC, subnets, igw, nacl, ec2, ecs, sqs, sns, lambda, firehose, kinesis, redshift, neptune, glue connections, glue jobs, alb, route53, s3, etc. for the application.
I am able to run Cloudformer to generate cloudformation for the entire infrastructure, and also tried to import the infrastructure using terraformer but terraformer does not include neptune and lambda components.
What is the best way/process to recreate a somewhat usable terraform just from the remote state?
Should I generate some generic :
resource "aws_glue_connection" "dev" { }
and run "terraform import aws_glue_connection.dev"
then run "terraform show"
for each resource?
Terraform doesn't have a mechanism specifically for turning existing state into configuration, and indeed doing so would be lossy in the general case because the Terraform configuration likely contained expressions connecting resources to one another that are not captured in the state snapshots.
However, you might be able to get a starting point -- possibly not 100% valid but hopefully a better starting point than nothing at all -- by configuring Terraform just enough to find the remote state you have access to, running terraform init to make Terraform read it, and then run terraform show to see the information from the state in a human-oriented way that is designed to resemble (but not necessarily exactly match) the configuration language.
For example, you could write a backend configuration like this:
terraform {
backend "s3" {
bucket = "analytics-nonprod"
key = "analytics-dev.tfstate"
}
}
If you run terraform init with appropriate AWS credentials available then Terraform should read that state snapshot, install the providers that the resource instances within it belong to, and then leave you in a situation where you can run Terraform commands against that existing state. As long as you don't take any actions that modify the state, you should be able to inspect it with commands like terraform show.
You could then copy the terraform show output into another file in your new Terraform codebase as a starting point. The output is aimed at human consumption and is not necessarily all parsable by Terraform itself, but the output style is similar enough to the configuration language that hopefully it won't take too much effort to massage it into a usable shape.
One important detail to watch out for is the handling of Terraform modules. If the configuration that produced this state contained any module "foo" blocks then in your terraform show output you will see some things like this:
# module.foo.aws_instance.bar
resource "aws_instance" "bar" {
# ...
}
In order to replicate the configuration for that, it is not sufficient to paste the entire output into one file. Instead, any resource block that has a comment above it indicating that it belongs to a module will need to be placed in a configuration file belonging to that module, or else Terraform will not understand that block as relating to the object it can see in the state.
I'd strongly suggest taking a backup copy of the state object you have before you begin, and you should be very careful not to apply any plans while you're in this odd state of having only a backend configuration, because Terraform might (if it's able to pick up enough provider configuration from the execution environment) plan to destroy all of the objects in the state in order to match the configuration.

What's the best practice to use created resources in Terraform?

I will start a new Terraform project on AWS. The VPC is already created and i want to know what's the best way to integrate it in my code. Do i have to create it again and Terraform will detect it and will not override it ? Or do i have to use Data source for that ? Or is there other best way like Terraform Import ?
I want also to be able in the future to deploy the entire infrastructure in other Region or other Account.
Thanks.
When it comes to integrating with existing objects, you first have to decide between two options: you can either import these objects into Terraform and use Terraform to manage them moving forward, or you can leave them managed by whatever existing system and use them in Terraform by reference.
If you wish to use Terraform to manage these existing objects, you must first write a configuration for the object as if Terraform were going to create it itself:
resource "aws_vpc" "example" {
# fill in here all the same settings that the existing object already has
cidr_block = "10.0.0.0/16"
}
# Can then use that vpc's id in other resources using:
# aws_vpc.example.id
But then rather than running terraform apply immediately, you can first run terraform import to instruct Terraform to associate this resource block with the existing VPC using its id assigned by AWS:
terraform import aws_vpc.example vpc-abcd1234
If you then run terraform plan you should see that no changes are required, because Terraform detected that the configuration matches the existing object. If Terraform does propose some changes, you can either accept them by running terraform apply or continue to update the configuration until it matches the existing object.
Once you have done this, Terraform will consider itself the owner of the VPC and will thus plan to update it or destroy it on future runs if the configuration suggests it should do so. If any other system was previously managing this VPC, it's important to stop it doing so or else this other system is likely to conflict with Terraform.
If you'd prefer to keep whatever existing system is managing the VPC, you can also use the Data Sources feature to look up the existing VPC without putting Terraform in charge of it.
In this case, you might use the aws_vpc data source, which can look up VPCs by various attributes. A common choice is to look up a VPC by its tags, assuming your environment has a predictable tagging scheme that allows you to describe the single VPC you are looking for:
data "aws_vpc" "example" {
tags = {
Name = "example-VPC-name"
}
}
# Can then use that vpc's id in other resources using:
# data.aws_vpc.example.id
In some cases users will introduce additional indirection to find the VPC some other way than by querying the AWS VPC APIs directly. That is a more advanced configuration and the options here are quite broad, but for example if you are using SSM Parameter Store you could place the VPC into a parameter store parameter and retrieve it using the aws_ssm_parameter data source.
If the existing system managing the VPC is CloudFormation, you could also use aws_cloudformation_export or aws_cloudformation_stack to retrieve the information from the CloudFormation API.
If you are happy to manage it via terraform moving forward then you can import existing resources into your terraform state. Here is the usage page for it https://www.terraform.io/docs/import/usage.html
You will have to define a resource block inside of your configuration for the vpc first. You could do something like:
resource "aws_vpc" "existing" {
cidr_block = "172.16.0.0/16"
tags = {
Name = "prod"
}
}
and then on the cli run the command
terraform import aws_vpc.existing <vpc-id>
Make sure you run a terraform plan afterwards, because terraform may try to make changes to it. You kind of have to reverse engineer it a bit, by adding all the necessary configuration to the aws_vpc resource. Once it is aligned, terraform will not attempt to change it. You can then re-use this to deploy to other accounts and regions.
As you suggested, you could use a data source for the vpc. This can be useful if you want to manage it outside of terraform, instead of having the potential to destroy the vpc if it is run by an inexperienced user.
Some customers I've worked with prefer to manage resources like vpcs/subnets (and other core infrastructure) in separate terraform scripts that only senior engineers have access to. This can avoid the disaster scenarios where people destroy the underlying infrastructure by accident.
I personally prefer managing all my terraform code in a git repository that is then deployed using a CI/CD tool, even if it's just myself working on it. Some people may not see the value in spending the time creating the pipeline though and may stick with running it locally.
This post has some great recommendations on running terraform in an an automated environment https://learn.hashicorp.com/terraform/development/running-terraform-in-automation

On AWS, why should Packer be required to generate AMIs, since ami_from_instance exist in terraform

Terraform allows provisionning aws infrastructures with custom ansible scripts.
Since the function ami_from_instance from terraform,
allow convert an Instance into an AMI, and aws_instance the opposit.
I am quite new to that tools and I might not understand their subtilities but why should the common pattern of using Packer to generate the ami instanciated by Terraform be used ?
I am not going to repeat what "ydaetskcoR" had already mentioned. Grt points. Another usecase that Packer really does is sharing the AMI With multiple accounts. In our setup, we create AMI in one account and shared it other accounts to be used. Packer is specifically build to create AMI's and so has many features than the simple Terraform's ami_from_instance. My 2 cents
Because Packer creates an AMI with Configuration as Code, you will have a reproducible recipe for how you AMI's are created.
If you would use Terraforms ami_from_instance you instead creates clones of an non-reproducible source, thus creating snowflake servers (all are slightly different).
Also on important feature of a public cloud is autoscaling and for that you want to start off with AMI's that includes as much as possible so the startup time is small. This makes a pre-baked AMI better than a generic with a initialisation script that installs and configure all adaptations to you production environment.

When to provision in Packer vs Terraform?

I am sitting with a situation where I need to provision EC2 instances with some packages on startup. There are a couple of (enterprise/corporate) constraints that exist:
I need to provision on top of a specific AMI, which adds enterprisey stuff such as LDAP/AD access and so on
These changes are intended to be used for all internal development machines
Because of mainly the second constraint, I was wondering where is the best place to place the provisioning. This is what I've come up with
Provision in Terraform
As it states, I simply provision in terraform for the necessary instances. If I package these resources into modules, then provisioning won't "leak out". The disadvantages
I won't be able to add a different set of provisioning steps on top of the module?
A change in the provisioning will probably result in instances being destroyed on apply?
Provisioning takes a long time because of the packages it tries to install
Provisioning in Packer
This is based on the assumption that Packer allows you to provision on top of AMIs so that AMIs can be "extended". Also, this will only be used in AWS so it won't use other builders necessarily. Provisioning in Packer makes the Terraform Code much simpler and terraform applies will become faster because it's just an AMI that you fire up.
For me both of these methods have their place. But what I really want to know is when do you choose Packer Provisioning over Terraform Provisioning?
Using Packer to create finished (or very nearly finished) images drastically shortens the time it takes to deploy new instances and also allows you to use autoscaling groups.
If you have Terraform run a provisioner such as Chef or Ansible on every EC2 instance creation you add a chunk of time for the provisioner to run at the time you need to deploy new instances. In my opinion it's much better to do the configuration up front and ahead of time using Packer to bake as much as possible into the AMI and then use user data scripts/tools like Consul-Template to provide environment specific differences.
Packer certainly can build on top of images and in fact requires a source_ami to be specified. I'd strongly recommend tagging your AMIs in a way that allows you to use source_ami_filter in Packer and Terraform's aws_ami data source so when you make changes to your AMIs Packer and Terraform will automatically pull those in to be built on top of or deployed at the next opportunity.
I personally bake a reasonably lightweight "Base" AMI that does some basic hardening and sets up monitoring and logging that I want for all instances that are deployed and also makes sure that Packer encrypts the root volume of the AMI. All other images are then built off the latest "Base" AMI and don't have to worry about making sure those things are installed/configured or worry about encrypting the root volume.
By baking your configuration into the AMI you are also able to move towards the immutable infrastructure model which has some major benefits in that you know that you can always throw away an instance that is having issues and very quickly replace it with a new one. Depending on your maturity level you could even remove access to the instances so that it's no longer possible to change anything on the instance once it has been deployed which, in my experience, is a major factor in operational issues.
Very occasionally you might come across something that makes it very difficult to bake an AMI for and in those cases you might choose to run your provisioning scripts in a Terraform provisioner when it is being created. Sometimes it's simply easier to move an existing process over to using provisioners with Terraform than baking the AMIs but I would push to move things over to Packer where possible.
I have come across the same situation. As per my understanding
If you bring up your EC2 instances very frequently say 2 to 3 times a
day then go with creating an customized AMI with packer and then call
the ami through terraform.
If your base image ( AMI created by packer) change frequently based
on your requirements then its good to go with packer. But for me
running a packer scripts are very time consuming.
You can do the same with packer as well. You can write your
requirements in a script and call it in terraform. By having everything
incorporated in a terraform script would reduce some time
Finally its your decision and your frequency of bringing up EC2 instances.

How to use Chef Opscode with AWS autoscaling

I have been facing issues with integrating chef with AWS autoscale.
In most of my searches it tells about bootstrapping an instance and then using it's AMI to launch other instance in just the same way.
Basic issue is, Chef recognises each host with it's hostname, which in the above case is all going to be same. However, I was hoping for something like a Role which integrates in to AWS and does the thing better for me. Any help/Ideas will be appreciated. I just hope someone has done it already.
Regards,
There are a lot of options but the general flow looks like this:
Create an AMI with Chef pre-installed and with your org validator key and a client.rb with the server URL set. Packer is great for this. Technically optional, you could do this from the user-data script, but it saves a few seconds on each server launch.
Configure the UserData field on the ASG to be a script (or cloud-init config if you want to get fancy but we'll ignore that option for now) that launches chef-client -r 'role[myrole] where myrole is usually based on the type of ASG you are building. This will use the validator key to register with the Chef Server automatically and set the run list based on the command line you give. You can use similar arguments to set the environment or policy name if you are using those features.
Include the chef-client cookbook/recipe in that role to install Chef as a daemon on the machine and to remove the validator key.