How are data sources used in Terraform? - amazon-web-services

The Terraform Data Sources documentation tells me what a data source is, but I do not quite understand it. Can somebody give me a use case of data source? What is the difference between it and configuring something using variables?

Data sources serve several purposes, but the goal is always the same: look something up and hand the resulting data back to you.
Let's take the example from their documentation:
# Find the latest available AMI that is tagged with Component = web
data "aws_ami" "web" {
  filter {
    name   = "state"
    values = ["available"]
  }
  filter {
    name   = "tag:Component"
    values = ["web"]
  }
  most_recent = true
}
This uses the aws_ami data source - this is different than a resource! It will instead just give you information, and not create anything. This example in particular will call out to the describe-images AWS API call, pass in a few --filter options as specified, and return an object that you can get information from - take a look at these attributes!
name
owner_id
description
image_id
... The list goes on. This is really useful if, say, I always want to pull the latest AMI matching some tags and keep a launch configuration up to date with it. I could use this data source rather than having to update a variable or hard-code the ID every time.
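As a rough sketch of that launch configuration idea, the AMI lookup can feed the image ID directly. The resource name, name prefix and instance type below are placeholders of mine, not part of the documentation example, and the syntax assumes Terraform 0.12 or later:

```hcl
# Sketch only: "web", the name prefix and the instance type are assumed values.
resource "aws_launch_configuration" "web" {
  name_prefix   = "web-"
  image_id      = data.aws_ami.web.id # resolved from the data source on every plan
  instance_type = "t2.micro"

  # Launch configurations cannot be changed in place, so build the
  # replacement before the old one is destroyed.
  lifecycle {
    create_before_destroy = true
  }
}
```

When a newer AMI with the Component = web tag appears, the next plan should show the launch configuration being replaced with the new image ID.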
Data sources can be used for other reasons as well; one of my favorites is the template provider.
Good luck!

Data sources provide information about entities that are not managed by the current Terraform configuration.
This may include:
Configuration data from Consul
Information about the state of manually-configured infrastructure components
In other words, data sources are read-only views into the state of pre-existing components external to our configuration.
Once you have defined a data source, you can use the data elsewhere in your Terraform configuration.
For example, let's suppose we want to create a Terraform configuration for a new AWS EC2 instance. We want to use an AMI which was created and uploaded by a Jenkins job using the AWS CLI, and which is not managed by Terraform. As part of the configuration for our Jenkins job, this AMI will always have a name with the prefix app-.
In this case, we can use the aws_ami data source to obtain information about the most recent AMI image that has the name prefix app-.
data "aws_ami" "app_ami" {
  most_recent = true
  filter {
    name   = "name"
    values = ["app-*"]
  }
}
Data sources export attributes, just like resources do. We can interpolate these attributes using the syntax data.TYPE.NAME.ATTR. In our example, we can interpolate the value of the AMI ID as data.aws_ami.app_ami.id, and pass it as the ami argument for our aws_instance resource.
resource "aws_instance" "app" {
  ami           = "${data.aws_ami.app_ami.id}"
  instance_type = "t2.micro"
}
Data sources are most powerful when retrieving information about dynamic entities - those whose properties change value often. For example, the next time Terraform fetches data for our aws_ami data source, the value of the exported attributes may be different (we might have built and pushed a new AMI).
Variables are used for static values, those that rarely change, such as your access and secret keys, or a standard list of sudoers for your servers.
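To make the contrast concrete, here is a minimal sketch that uses both side by side. The variable name and the owners value are assumptions for illustration, and the syntax is Terraform 0.12 or later:

```hcl
# Static input: chosen by whoever runs Terraform, changes rarely.
variable "instance_type" {
  type    = string
  default = "t2.micro"
}

# Dynamic lookup: re-read on every plan, so a newly pushed AMI is picked up.
data "aws_ami" "app_ami" {
  most_recent = true
  owners      = ["self"] # assumption: the Jenkins job pushes to this account

  filter {
    name   = "name"
    values = ["app-*"]
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.app_ami.id
  instance_type = var.instance_type
}
```

The variable only changes when someone edits it; the data source result can change between two runs without any edit to the configuration.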

Good examples up there!
The main difference between a Terraform data source, resource and variable is:
Resource: Provisions infrastructure on our platform. Create, update and delete!
Variable: Provides predefined values to our IaC. Used by resources for provisioning.
Data source: Fetches values from our infra/provider and provides that data to our resources for provisioning.
Examples are well explained above :)

Data sources are used to fetch data from the provider so that it can be used as configuration in .tf files instead of being hard-coded. Example: the code below fetches an AWS AMI ID and uses it to launch an AWS instance.
data "aws_ami" "std_ami" {
  most_recent = true
  owners      = ["amazon"]
  filter {
    name   = "root-device-type"
    values = ["ebs"]
  }
  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}
resource "aws_instance" "myec2" {
  ami           = data.aws_ami.std_ami.id
  instance_type = "t2.micro"
}

Related

aws_shield_protection Terraform

I am struggling to find a way to include all load balancers with a certain tag value (e.g. Shield protection = ON) in an AWS account.
Currently I have a map of ARNs in a variable and loop over it with for_each. This method works, but not efficiently, since every time I have to add the ARN of a new load balancer manually.
resource "aws_shield_protection" "this" {
  for_each     = var.listofarn
  name         = "shield protection ${each.key}"
  # The map is name => ARN, so the ARN is the value, not the key
  resource_arn = each.value
}
variable "listofarn" {
  type = map(string)
  default = {
    appx_alb = "arn::xxxxx"
    appy_alb = "arn:yyyyy"
  }
}
Is there a way to use the "aws_lb" data source for this?
thanks.
Using a data source wouldn't help much. The aws_lb data source can only return one load balancer, so you can't use it to get information about all your ALBs at once. You would have to run the aws_lb data source with for_each over a set of tags or some other ALB identifier.
But you could overcome your issue by developing an external data source. Since it's a fully custom data source, it can return information about all your ALBs in the form you want.
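The first option, running aws_lb once per load balancer, might look roughly like the sketch below. The variable lb_names and its values are hypothetical, and the syntax assumes Terraform 0.12.6 or later for for_each:

```hcl
# Hypothetical input: names of the load balancers to protect.
variable "lb_names" {
  type    = set(string)
  default = ["appx-alb", "appy-alb"]
}

# One lookup per name; each instance returns exactly one load balancer.
data "aws_lb" "protected" {
  for_each = var.lb_names
  name     = each.value
}

# One Shield protection per load balancer that was looked up.
resource "aws_shield_protection" "this" {
  for_each     = data.aws_lb.protected
  name         = "shield-protection-${each.key}"
  resource_arn = each.value.arn
}
```

This still needs a list of names to be maintained, but the ARNs themselves no longer have to be copied by hand.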

What does keyword `count` mean in data source?

While reading the open-source Terraform module chgangaraju/terraform-aws-cloudfront-s3-website, I found that it uses count in a data source, but I couldn't find any documentation about count there.
What does it mean in this place?
data "aws_acm_certificate" "acm_cert" {
  count    = var.use_default_domain ? 0 : 1
  domain   = coalesce(var.acm_certificate_domain, "*.${var.hosted_zone}")
  provider = aws.aws_cloudfront
  //CloudFront uses certificates from US-EAST-1 region only
  statuses = [
    "ISSUED",
  ]
}
It means the same as it does for a resource. The pattern you have uses a conditional expression:
count = var.use_default_domain ? 0 : 1
Expressions with count are often used for optional resources or data sources. Specifically, in your case, if you set use_default_domain to false, the CloudFront distribution created by this Terraform configuration will use your own custom domain and SSL certificate. For this to happen, the data source acm_cert fetches information about the SSL certificate from ACM for your acm_certificate_domain.
In contrast, when use_default_domain is true, you are going to use default domain and SSL certificate from CloudFront. For that you don't need to have any SSL certificate in ACM. Subsequently, TF will not fetch it.
Technically, if use_default_domain is true, then count is 0 and the data "aws_acm_certificate" "acm_cert" is not executed. But, if count is 1 (when use_default_domain is false), the data source will run and try to fetch information about your custom SSL certificate.
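Because count turns the data source into a list of instances, anything that reads it has to index into that list. A minimal sketch, assuming Terraform 0.12 or later and a local name of my own choosing:

```hcl
locals {
  # With count = 0 the list is empty, so guard the [0] index with the
  # same condition that drives count.
  acm_certificate_arn = var.use_default_domain ? null : data.aws_acm_certificate.acm_cert[0].arn
}
```

The local can then be passed wherever the certificate ARN is needed, and it is simply null when the default CloudFront certificate is used.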
A data source is a query used for getting data from the outside world and making it available to your Terraform configuration.
terraform example code
provider "aws" {
  region = "us-west-2"
}
data "aws_ami" "example" {
  count       = 2 // look at this line!
  most_recent = true
  owners      = ["self"]
}
output "amis" {
  value = "${data.aws_ami.example.*.id}"
}
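When you specify count in a data source block, Terraform creates that many instances of the data source, and each instance runs the same query independently. It is not a SQL limit: count = 2 does not return the two most recent AMIs. It is closer to running the following query twice:
select * from data_source
where owners in ('self')
order by creation_date desc
limit 1; /** <= run once per instance, so twice here **/
Response
Both instances return the same record, the most recent AMI, and both are sent to the output.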
You will see the following results in your terminal.
amis = [
"ami-0345c0186ced78ce6",
"ami-0345c0186ced78ce6",
]
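If the goal really is "the N most recent AMIs", a different data source is a better fit than count. The sketch below assumes the AWS provider's aws_ami_ids data source, which, as far as I know, returns IDs sorted by creation time with the newest first unless sort_ascending is set; the block and output names are mine:

```hcl
# Lists every AMI owned by this account; assumed to be newest first by default.
data "aws_ami_ids" "mine" {
  owners = ["self"]
}

output "two_newest_amis" {
  # min() guards against accounts that hold fewer than two AMIs.
  value = slice(data.aws_ami_ids.mine.ids, 0, min(2, length(data.aws_ami_ids.mine.ids)))
}
```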
Terraform Documentation: Multiple Resource Instances
Data resources support count and for_each meta-arguments as defined for managed resources, with the same syntax and behavior.
As with managed resources, when count or for_each is present it is important to distinguish the resource itself from the multiple resource instances it creates. Each instance will separately read from its data source with its own variant of the constraint arguments, producing an indexed result.
Related Question
How can I output a data source that uses count?

How to convert the aws secret manager string to map in terraform (0.11.13)

I have a secret stored in AWS Secrets Manager and am trying to integrate it with Terraform at runtime. We are using Terraform 0.11.13, and updating to the latest Terraform is on the roadmap.
We would all like to use the jsondecode() function available in the latest Terraform, but we need to get a few things integrated before we upgrade.
We tried to use the below helper external data program suggested as part of https://github.com/terraform-providers/terraform-provider-aws/issues/4789.
data "external" "helper" {
  program = ["echo", "${replace(data.aws_secretsmanager_secret_version.map_example.secret_string, "\\\"", "\"")}"]
}
But we ended up getting this error now.
data.external.helper: can't find external program "echo"
Google search didn't help much.
Any help will be much appreciated.
OS: Windows 10
It sounds like you want to use a data source for the aws_secretsmanager_secret.
Resources in Terraform create and manage infrastructure. Data sources in Terraform read the values of objects that already exist.
data "aws_secretsmanager_secret" "example" {
  arn = "arn:aws:secretsmanager:us-east-1:123456789012:secret:example-123456"
}
data "aws_secretsmanager_secret_version" "example" {
  secret_id     = data.aws_secretsmanager_secret.example.id
  version_stage = "example"
}
Note: you can also use the secret name
Docs: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/secretsmanager_secret
Then you can use the value from this like so:
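output "MySecretJsonAsString" {
  value     = data.aws_secretsmanager_secret_version.example.secret_string
  # Recent Terraform versions refuse to output a sensitive value without this flag
  sensitive = true
}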
Per the docs, the secret_string attribute of this data source is:
The decrypted part of the protected secret information that was originally provided as a string.
You should also be able to pass that value into jsondecode and then access the properties of the json body individually.
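A minimal sketch of that jsondecode step, assuming Terraform 0.12 or later and a secret whose body is a flat JSON object. The key "username", the local and the output name are hypothetical:

```hcl
locals {
  # Assumes the secret is a JSON object such as {"username": "...", "password": "..."}.
  secret = jsondecode(data.aws_secretsmanager_secret_version.example.secret_string)
}

output "db_username" {
  value     = local.secret["username"] # "username" is a hypothetical key
  sensitive = true
}
```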
But you asked for a Terraform 0.11.13 solution. If the secret value is defined by Terraform, you can use the terraform_remote_state data source to get the value. This relies on nothing other than Terraform updating the secret. The best answer is still to upgrade Terraform, but this could be a useful stopgap until then.
As a recommendation, you can make the version of terraform specific to a module and not your whole organization. I do this through the use of docker containers that run specific versions of the terraform bin. There is a script in the root of every module that will wrap the terraform commands to come up in the version of terraform meant for that project. Just a tip.

Terragrunt v0.14.9, Terraform v0.11.7 reading AWS VPC ID from second environment

I have used Terragrunt to orchestrate the creation of a non-default AWS VPC.
I've got S3/DynamoDB state mgmt, and the VPC code is a module. I have the 'VPC environment' terraform.tfvars code checked into a second repo as per the terragrunt README.md.
I created a second module which will eventually create hosts in this VPC but for now just aims to output its ID. I have created a separate 'hosts environment' / terraform.tfvars for the instantiation of this module.
1. I run terragrunt apply in the VPC environment directory - VPC created
2. I run terragrunt apply a second time in the hosts environment directory - output directive doesn't work (no error, but incorrect, see below).
This is a precursor to one day running a terragrunt apply-all in the parent directory of the VPC/hosts environment directories; my reading of the docs suggest using a terraform_remote_state data source to expose the VPC ID, so I specified access like this in the data.tf file of the hosts module:
data "terraform_remote_state" "vpc" {
  backend = "s3"
  config {
    bucket = "myBucket"
    key    = "keyToMy/vpcEnvironment.tfstate"
    region = "stateRegion"
  }
}
Then, in the hosts module outputs.tf, I specified an output to check assignment:
output "mon_vpc" {
  value = "${data.terraform_remote_state.vpc.id}"
}
When I run (2) above, it exits with:
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
Outputs:
mon_vpc = 2018-06-02 23:14:42.958848954 +0000 UTC
Questions:
I'm going wrong setting up the code so that the hosts environment is configured to correctly acquire the VPC ID from the already-existing VPC (terraform state file) - any advice on what to change here would be appreciated.
It does look like I've managed to acquire the date of when the VPC was created rather than its ID, which given the code is perplexing - anyone know why?
I'm not using community modules - all hand rolled.
EDIT: In response to Brandon Miller, here is a bit more. In my VPC module, I have an outputs.tf containing among other outputs:
output "aws_vpc.mv.id-op" {
  value = "${aws_vpc.mv.id}"
}
and the vpc.tf contains
resource "aws_vpc" "mv" {
  cidr_block           = "${var.vpcCidr}"
  enable_dns_support   = true
  enable_dns_hostnames = true
  tags = {
    Name = "mv-vpc-${var.aws_region}"
  }
}
As this cfg results in a vpc being created, and as most of the parameters are <computed>, I assumed state would contain sufficient data for other modules to refer to by consulting state (I assumed at first that terraform used the AWS API for this under the bonnet, rather than consulting a different state key).
EDIT 2: Read all of #brendan-miller's answer and following comments first.
Use of periods causes a problem as it confuses terraform (see Brendan's answer for the specification format below):
Error: output 'mon_vpc': unknown resource 'data.aws_vpc.mv-ds' referenced in variable data.aws_vpc.mv-ds.vpc.id
You named your output aws_vpc.mv.id-op but when you retrieve it you are retrieving just id. You could try
data.terraform_remote_state.vpc.aws_vpc.mv.id
but I'm not sure whether Terraform will complain about the additional dots. However, the format should always be
data.terraform_remote_state.<name of the remote state module>.<name of the output>
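A sketch of how that could look with an output name that has no periods in it, written in the 0.11 syntax used in the question. The output name vpc_id is my choice, not something from the original modules:

```hcl
# In the VPC module's outputs.tf: an output name without periods.
output "vpc_id" {
  value = "${aws_vpc.mv.id}"
}

# In the hosts module: read that output through the remote state data source.
output "mon_vpc" {
  value = "${data.terraform_remote_state.vpc.vpc_id}"
  # On Terraform 0.12 and later this becomes:
  # value = data.terraform_remote_state.vpc.outputs.vpc_id
}
```

As for the date in the question: it most likely comes from the id attribute of the terraform_remote_state data source itself, which in 0.11 appears to be a timestamp of when the state was read rather than anything about the VPC.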
You mentioned wanting to be able to get this info with the AWS API. That is also possible by using the aws_vpc data source. Their example uses id, but you can also use any tag you used on your vpc.
Like this:
data "aws_vpc" "default" {
  filter {
    name   = "tag:Name"
    values = ["example-vpc-name"]
  }
}
Then you can use this for the id
${data.aws_vpc.default.id}
In addition this retrieves all tags set, for example:
${data.aws_vpc.default.tags.Name}
And the cidr block
${data.aws_vpc.default.cidr_block}
As well as some other info. This can be very useful for storing and retrieving things about your VPC.

Elastic Beanstalk Application Version in Terraform

I attempted to manage my application versions in my Terraform template by parameterising the name. This was an attempt to have a new application version created by our CI process whenever the contents of the application changed. This way, in Elastic Beanstalk I could keep a list of historic application versions so that I could roll back, etc. This didn't work: the same application version was constantly updated, and in effect I lost the history of all application versions.
resource "aws_elastic_beanstalk_application_version" "default" {
  name        = "${var.eb-app-name}-${var.build-number}"
  application = "${var.eb-app-name}"
  description = "application version created by terraform"
  bucket      = "${aws_s3_bucket.default.id}"
  key         = "${aws_s3_bucket_object.default.id}"
}
I then tried to parameterise the logical resource name, but this isn't supported by Terraform.
resource "aws_elastic_beanstalk_application_version" "${var.build-number}" {
  name        = "${var.eb-app-name}-${var.build-number}"
  application = "${var.eb-app-name}"
  description = "application version created by terraform"
  bucket      = "${aws_s3_bucket.default.id}"
  key         = "${aws_s3_bucket_object.default.id}"
}
Currently my solution is to manage my application versions outside of Terraform, which is disappointing as there are other associated resources, such as the S3 bucket and permissions, to worry about.
Am I missing something?
As far as Terraform is concerned you are just updating a single EB application version resource there. If you wanted to keep the previous versions around then you might need to try and increment the count of resources that Terraform is managing.
Off the top of my head you could try something like this:
variable "builds" {
  type = "list"
}
resource "aws_elastic_beanstalk_application_version" "default" {
  count       = "${length(var.builds)}"
  name        = "${var.eb-app-name}-${element(var.builds, count.index)}"
  application = "${var.eb-app-name}"
  description = "application version created by terraform"
  bucket      = "${aws_s3_bucket.default.id}"
  key         = "${aws_s3_bucket_object.default.id}"
}
Then if you have a list of builds it should create a new application version for each build.
Of course that could be dynamic in that the variable could instead be a data source that returns a list of all your builds. If a data source doesn't exist for it already you could write a small script that is used as an external data source.
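A rough sketch of the external data source idea, assuming Terraform 0.12 or later. The script list_builds.py is hypothetical and would have to be written separately; the only firm requirement is that it prints a JSON object whose values are all strings:

```hcl
# Hypothetical script: must print a JSON object of string values to stdout,
# for example {"builds": "101,102,103"}.
data "external" "builds" {
  program = ["python3", "${path.module}/list_builds.py"]
}

locals {
  # The external data source only returns strings, so the list is
  # passed as a comma-separated string and split here.
  builds = split(",", data.external.builds.result["builds"])
}
```

local.builds could then take the place of var.builds in the count-based resource above.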