I'm trying to build a Docker Swarm cluster in AWS using Terraform. I've successfully got a Swarm manager started, but I'm trying to work out how best to pass the join token to the workers (which will be created after the manager).
I'd like some way of running the docker swarm join-token worker -q command so that its output can be captured as a Terraform variable. That way, the workers can run a remote-exec command something like docker swarm join ${var.swarm_token} ${aws_instance.swarm-manager.private_ip}
How can I do this?
My config is below:
resource "aws_instance" "swarm-manager" {
  ami           = "${var.manager_ami}"
  instance_type = "${var.manager_instance}"

  tags = {
    Name = "swarm-manager${count.index + 1}"
  }

  provisioner "remote-exec" {
    inline = [
      "sleep 30",
      "docker swarm init --advertise-addr ${aws_instance.swarm-manager.private_ip}",
      "docker swarm join-token worker -q" // This is the value I want to store as a variable/output/etc
    ]
  }
}
Thanks
You can use an external data source to supplement your remote provisioning script.
This can shell into your swarm managers and fetch the token after they are provisioned.
If you have N swarm managers, you'll probably have to do it all at once after the managers are created. External data sources return a flat map of strings, so either use keys that let you select the right result for each node, or return the whole set as a delimited string and use element() and split() to pick the right item, as in the sketch below.
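As a rough illustration, get_swarm_tokens.sh (referenced further down) could look something like this. It is a hedged sketch, assuming jq is installed, the machine running Terraform can SSH to the managers, and the managers' SSH user is ubuntu; none of that is prescribed by the external data source protocol, which only requires JSON in on stdin and a JSON object of string values out:
#!/usr/bin/env bash
# Sketch: read the query JSON Terraform passes on stdin, ask each manager for
# a worker join token, and return them as one "|"-delimited string.
set -euo pipefail

# "swarms" is expected to be a comma-separated list of manager private IPs.
eval "$(jq -r '@sh "IPS=\(.swarms)"')"

tokens=""
for ip in ${IPS//,/ }; do
  # "ubuntu" is an assumed SSH user; use whatever your manager AMI provides.
  token=$(ssh -o StrictHostKeyChecking=no "ubuntu@${ip}" 'docker swarm join-token worker -q')
  tokens="${tokens:+${tokens}|}${token}"
done

# External data sources must print a JSON object whose values are strings.
jq -n --arg tokens "$tokens" '{"tokens": $tokens}'
The Terraform side then looks like this: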
resource "aws_instance" "swarm_manager" {
  ami           = "${var.manager_ami}"
  instance_type = "${var.manager_instance}"

  tags = {
    Name = "swarm-manager${count.index + 1}"
  }

  provisioner "remote-exec" {
    inline = [
      "sleep 30",
      "docker swarm init --advertise-addr ${self.private_ip}" # self avoids a cycle back to this resource
    ]
  }
}
data "external" "swarm_token" {
  program = ["bash", "${path.module}/get_swarm_tokens.sh"]

  query = {
    # query values must be plain strings, so join the manager IPs
    swarms = "${join(",", aws_instance.swarm_manager.*.private_ip)}"
  }
}
resource "aws_instance" "swarm_node" {
  count         = "${var.swarm_size}"
  ami           = "${var.node_ami}"
  instance_type = "${var.node_instance}" # assumed variable, analogous to the managers

  tags = {
    Name = "swarm-node-${count.index}"
  }

  provisioner "remote-exec" {
    inline = [
      "# Enrol me in the right swarm, distributed over swarms available",
      "./enrol.sh ${element(split("|", data.external.swarm_token.result.tokens), count.index)}"
    ]
  }
}
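The enrol.sh script is left to the worker image here; a hypothetical version might look like the following. Note that docker swarm join also needs the manager's address, which the inline command above doesn't pass, so it is taken as an optional second argument with an illustrative default:
#!/usr/bin/env bash
# Hypothetical enrol.sh: join this node to the swarm with the token in $1.
set -euo pipefail

TOKEN="$1"
MANAGER_ADDR="${2:-10.0.0.10}"   # illustrative default; replace with the manager's private IP

docker swarm join --token "$TOKEN" "${MANAGER_ADDR}:2377"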
Related
I have a SaaS-based Django app. When a customer asks to use my software, I want to auto-provision a new droplet and auto-deploy the app there, and the info should be saved in my database, like IP, customer name, database info, etc.
This is my Terraform script, and it is working well: the database is now running on the droplet.
terraform {
  required_providers {
    digitalocean = {
      source  = "digitalocean/digitalocean"
      version = "~> 2.0"
    }
  }
}

provider "digitalocean" {
  token = "dop_v1_60f33a1<MyToken>a363d033"
}
resource "digitalocean_droplet" "web" {
  image    = "ubuntu-18-04-x64"
  name     = "web-1"
  region   = "nyc3"
  size     = "s-1vcpu-1gb"
  ssh_keys = ["93:<The SSH finger print>::01"]

  connection {
    host        = self.ipv4_address
    user        = "root"
    type        = "ssh"
    private_key = file("/home/py/.ssh/id_rsa") # it works
    timeout     = "2m"
  }

  provisioner "remote-exec" {
    inline = [
      "export PATH=$PATH:/usr/bin",
      # install docker-compose
      # install docker
      # clone my github repo
      "docker-compose up --build -d"
    ]
  }
}
What I want is that when I run the commands, a new droplet and a new database instance are created, and the database is connected to my Django .env file.
Everything should be created automatically. Can anyone please help me figure out how to do this?
Or is my approach wrong? In this situation, what would be the best solution?
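One possible direction, sketched under assumptions: a managed database via the digitalocean_database_cluster resource and a hypothetical env.tpl template for the Django .env. The names, sizes and paths below are illustrative, not taken from the question:
# Hypothetical per-customer managed database.
resource "digitalocean_database_cluster" "customer_db" {
  name       = "customer-db-1"
  engine     = "pg"
  version    = "15"
  size       = "db-s-1vcpu-1gb"
  region     = "nyc3"
  node_count = 1
}

# Render the Django .env from an assumed template (env.tpl would contain lines
# such as DATABASE_URL=${database_url}) and copy it onto the droplet.
resource "null_resource" "configure_app" {
  provisioner "file" {
    content     = templatefile("${path.module}/env.tpl", {
      database_url = digitalocean_database_cluster.customer_db.uri
    })
    destination = "/opt/app/.env"

    connection {
      host        = digitalocean_droplet.web.ipv4_address
      user        = "root"
      type        = "ssh"
      private_key = file("/home/py/.ssh/id_rsa")
    }
  }
}
The droplet IP and the customer metadata you want to record could then be exposed as Terraform outputs and written to your own database by whatever orchestrates terraform apply.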
I am creating an instance from a source machine image, using this Terraform template:
resource "tls_private_key" "sandbox_ssh" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

output "tls_private_key_sandbox" {
  value = "${tls_private_key.sandbox_ssh.private_key_pem}"
}

locals {
  custom_data1 = <<CUSTOM_DATA
#!/bin/bash
CUSTOM_DATA
}
resource "google_compute_instance_from_machine_image" "sandboxvm_test_fromimg" {
  project  = "<proj>"
  provider = google-beta
  name     = "sandboxvm-test-fromimg"
  zone     = "us-central1-a"
  tags     = ["test"]

  source_machine_image = "projects/<proj>/global/machineImages/sandboxvm-test-img-1"
  can_ip_forward       = false

  labels = {
    owner   = "test"
    purpose = "test"
    ami     = "sandboxvm-test-img-1"
  }

  metadata = {
    ssh-keys = "${var.sshuser}:${tls_private_key.sandbox_ssh.public_key_openssh}"
  }

  network_interface {
    network = "default"
    access_config {
      // Include this section to give the VM an external ip address
    }
  }

  metadata_startup_script = local.custom_data1
}
output "instance_ip_sandbox" {
  value = google_compute_instance_from_machine_image.sandboxvm_test_fromimg.network_interface.0.access_config.0.nat_ip
}

output "user_name" {
  value = var.sshuser
}
I can't even ping or netcat either the private or the public IP of the VM that gets created. Even the serial-port SSH access set up in the custom startup script doesn't help.
I suspect that, since this is a google-beta capability, it may not yet be fully working or reliable.
Maybe we just can't create VMs (i.e. GCEs) from machine images in GCP yet, unless proven otherwise by finding some simple goof-up that isn't very evident in my TF.
I actually managed to solve it, though all of this still feels like rather unhealthy behaviour on GCE's part.
The problem was that, while creating the base image, the instance I had chosen had the following:
#sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.6 2
#sudo update-alternatives --install /usr/bin/python3 python /usr/bin/python3.7 1
Maybe I should have tried with "python3" instead of "python",
but when instantiating GCEs from this MachineImage, the boot process looks for a rather deprecated "python2.7" rather than "python3" and complains about missing / unreadable packages like netplan etc.
Commenting out the "update-alternatives" lines and installing python3.6 and python3.7 explicitly did the trick!
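In other words, when baking the base image, something along these lines (a hedged sketch for an Ubuntu-based image) avoids the problem:
# Do NOT override the system "python" alternatives in the base image;
# just install the interpreters explicitly instead.
sudo apt-get update
sudo apt-get install -y python3.6 python3.7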
I am adding a startup script to my GKE nodes using Terraform:
provider "google" {
  project     = var.project
  region      = var.region
  zone        = var.zone
  credentials = "google-key.json"
}

terraform {
  backend "gcs" {
    bucket      = "tf-state-bucket-devenv"
    prefix      = "terraform"
    credentials = "google-key.json"
  }
}
resource "google_container_cluster" "primary" {
  name                     = var.kube-clustername
  location                 = var.zone
  remove_default_node_pool = true
  initial_node_count       = 1

  master_auth {
    username = ""
    password = ""

    client_certificate_config {
      issue_client_certificate = false
    }
  }
}

resource "google_container_node_pool" "primary_preemptible_nodes" {
  name       = var.kube-poolname
  location   = var.zone
  cluster    = google_container_cluster.primary.name
  node_count = var.kube-nodecount

  node_config {
    preemptible  = var.kube-preemptible
    machine_type = "n1-standard-1"
    disk_size_gb = 10
    disk_type    = "pd-standard"

    metadata = {
      disable-legacy-endpoints = "true",
      startup_script           = "cd /mnt/stateful_partition/home && echo hi > test.txt"
    }

    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
}
When I go into the GCP console, select the node, and view the metadata, I can see the key/value is there:
metadata_startup_script
#!/bin/bash sudo su && cd /mnt/stateful_partition/home && echo hi > test.txt
However, when running the below command on the node:
sudo google_metadata_script_runner --script-type startup --debug
I got the below:
startup-script: INFO Starting startup scripts.
startup-script: INFO No startup scripts found in metadata.
startup-script: INFO Finished running startup scripts.
Does anyone know why this script is not working / showing up? Is it because it's a GKE node and Google doesn't let you edit these? I can't actually find anything in their documentation where they specifically say that.
You cannot specify startup scripts to run on GKE nodes. The node has a built-in startup sequence to initialize the node and join it to the cluster and to ensure that this works properly (e.g. to ensure that when you ask for 100 nodes you get 100 functional nodes) you cannot add additional logic to the startup sequence.
As an alternative, you can create a DaemonSet that runs on all of your nodes to perform node-level initialization. One advantage of this is that you can tweak your DaemonSet and re-apply it to existing nodes (without having to recreate them) if you want to change how they are configured.
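For illustration, a minimal sketch of such a DaemonSet expressed with the Terraform kubernetes provider is below. It assumes that provider is configured against the cluster; the image, labels and the hostPath it writes to are illustrative, echoing the startup script from the question. The same manifest could just as well be applied with kubectl:
resource "kubernetes_daemonset" "node_init" {
  metadata {
    name      = "node-init"
    namespace = "kube-system"
  }

  spec {
    selector {
      match_labels = {
        app = "node-init"
      }
    }

    template {
      metadata {
        labels = {
          app = "node-init"
        }
      }

      spec {
        container {
          name  = "node-init"
          image = "busybox:1.36"
          # Write the marker file on the node, then sleep so the pod stays Running.
          command = ["sh", "-c", "echo hi > /host-home/test.txt && sleep 2147483647"]

          volume_mount {
            name       = "host-home"
            mount_path = "/host-home"
          }
        }

        volume {
          name = "host-home"
          host_path {
            path = "/mnt/stateful_partition/home"
          }
        }
      }
    }
  }
}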
Replace the metadata key name metadata_startup_script with startup_script.
In addition, your startup script already runs as the root user; you don't need to perform a sudo su.
I know that Container-Optimized OS is mostly "noexec". I have a use case where I need to execute some simple scripts in my home directory, copy files from my Docker image to the host, etc. There are no problems with this when I log into the instance over SSH, but with Terraform it doesn't seem to work:
resource "null_resource" "test_upload2" {
  count = length(var.nodes)

  provisioner "remote-exec" {
    connection {
      type        = "ssh"
      host        = google_compute_address.static[count.index].address
      private_key = file("keys/private_key")
      user        = var.admin_username
      script_path = "/home/hyperledger/provision.sh"
    }

    inline = [
      "ls"
    ]
  }

  depends_on = [google_compute_instance.peer-blockchain-vm, null_resource.test_upload]
}
I get the following error message:
null_resource.test_upload[0] (remote-exec): bash: /home/hyperledger/provision.sh: Permission denied
Error: error executing "/home/hyperledger/provision.sh": Process exited with status 126
Is there a way to do this purely with Terraform?
It doesn't seem nice to outsource this logic to some local shell script and achieve the goal with "local-exec".
For now I've found a solution.
When I create the instance I set the following startup-script:
resource "google_compute_instance" "my_vm" {
  ...
  metadata_startup_script = "mkdir -p /home/hyperledger/tmp/;sudo mount -t tmpfs -o size=100M tmpfs /home/hyperledger/tmp/"
}
It creates an in-memory disk. The script_path in the resource should then be redefined as follows:
script_path = "/home/hyperledger/tmp/provision.sh"
All scripts in this temporary directory can be executed.
The first provisioner to run should change the owner of the home directory, since it was created with root ownership above:
provisioner "remote-exec" {
  connection {
    type        = "ssh"
    private_key = file(var.private_key)
    user        = var.admin_username
    script_path = "/home/hyperledger/tmp/provision.sh"
  }

  inline = [
    "sudo chown -R hyperledger:hyperledger /home/hyperledger"
  ]
}
I am trying to create a Windows EC2 instance from an AMI and execute a PowerShell command on it, as follows:
data "aws_ami" "ec2-worker-initial-encrypted-ami" {
  filter {
    name   = "tag:Name"
    values = ["ec2-worker-initial-encrypted-ami"]
  }
}

resource "aws_instance" "my-test-instance" {
  ami           = "${data.aws_ami.ec2-worker-initial-encrypted-ami.id}"
  instance_type = "t2.micro"

  tags {
    Name = "my-test-instance"
  }

  provisioner "local-exec" {
    command     = "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\InitializeInstance.ps1 -Schedule",
    interpreter = ["PowerShell"]
  }
}
and I am facing the following error:
aws_instance.my-test-instance: Error running command 'C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1
-Schedule': exit status 1. Output: The term 'C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1'
is not recognized as the name of a cmdlet, function, script file, or
operable program. Check the spelling of the name, or if a path was
included, verify that the path is correct and try again. At line:1
char:72
C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1
<<<< -Schedule
CategoryInfo : ObjectNotFound: (C:\ProgramData...izeInstance.ps1:String) [],
CommandNotFoundException
FullyQualifiedErrorId : CommandNotFoundException
You are using a local-exec provisioner, which runs the requested PowerShell code on the workstation running Terraform:
The local-exec provisioner invokes a local executable after a resource
is created. This invokes a process on the machine running Terraform,
not on the resource.
It sounds like you want to execute the PowerShell script on the resulting instance, in which case you'll need to use a remote-exec provisioner, which will run your PowerShell on the target resource:
The remote-exec provisioner invokes a script on a remote resource
after it is created. This can be used to run a configuration
management tool, bootstrap into a cluster, etc.
You will also need to include connection details, for example:
provisioner "remote-exec" {
  command     = "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\InitializeInstance.ps1 -Schedule",
  interpreter = ["PowerShell"]

  connection {
    type     = "winrm"
    user     = "Administrator"
    password = "${var.admin_password}"
  }
}
Which means this instance must also be ready to accept WinRM connections.
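One hedged way to prepare a fresh Windows instance for that is a small user_data snippet (see the userdata discussion below) that enables basic WinRM authentication before the provisioner connects. These are standard winrm/netsh commands, but loosening authentication like this has security implications, so treat it as a sketch:
<powershell>
# Allow the Terraform "winrm" connection shown above (basic auth over HTTP, port 5985).
winrm quickconfig -q
winrm set winrm/config/service/auth '@{Basic="true"}'
winrm set winrm/config/service '@{AllowUnencrypted="true"}'
netsh advfirewall firewall add rule name=WinRM-HTTP-In protocol=TCP dir=in localport=5985 action=allow
</powershell>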
There are other options for completing this task though. Such as using userdata, which Terraform also supports. This might look like the following example:
Example of using a userdata file in Terraform
File named userdata.txt:
<powershell>
C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\InitializeInstance.ps1 -Schedule
</powershell>
Launch instance using the userdata file:
resource "aws_instance" "my-test-instance" {
  ami           = "${data.aws_ami.ec2-worker-initial-encrypted-ami.id}"
  instance_type = "t2.micro"

  tags {
    Name = "my-test-instance"
  }

  user_data = "${file("userdata.txt")}"
}
The file interpolation will read the contents of the userdata file as a string and pass it to user_data for the instance launch. Once the instance launches it should run the script as you expect.
What Brian is claiming is correct: you will get an "invalid or unknown key: interpreter" error.
To correctly run PowerShell you will need to run it as follows, based on Brandon's answer:
provisioner "remote-exec" {
  connection {
    type     = "winrm"
    user     = "Administrator"
    password = "${var.admin_password}"
  }

  inline = [
    "powershell -ExecutionPolicy Unrestricted -File C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\InitializeInstance.ps1 -Schedule"
  ]
}
Edit
To copy the files over to the machine, use the below:
provisioner "file" {
  source      = "${path.module}/some_path"
  destination = "C:/some_path"

  connection {
    host     = "${azurerm_network_interface.vm_nic.private_ip_address}"
    timeout  = "3m"
    type     = "winrm"
    https    = true
    port     = 5986
    use_ntlm = true
    insecure = true
    #cacert  = "${azurerm_key_vault_certificate.vm_cert.certificate_data}"
    user     = var.admin_username
    password = var.admin_password
  }
}
Update:
Provisioners are currently not recommended by HashiCorp; full instructions and the explanation (it is long) can be found at https://www.terraform.io/docs/provisioners/index.html
FTR: Brandon's answer is correct, except that the example code provided for remote-exec includes keys that are unsupported by the provisioner.
Neither command nor interpreter is a supported key:
https://www.terraform.io/docs/provisioners/remote-exec.html