Can't start a self-managed node group through Terraform

I have been trying to deploy a self-managed node group through Terraform for days now. Deploying a non-self-managed one works out of the box; however, I have the following issue with the self-managed one. This is what my code looks like:
self_managed_node_groups = {
  self_mg_4 = {
    node_group_name        = "self-managed-ondemand"
    subnet_ids             = module.aws_vpc.private_subnets
    create_launch_template = true
    launch_template_os     = "amazonlinux2eks"
    custom_ami_id          = "xxx"
    public_ip              = false
    pre_userdata           = <<-EOT
      yum install -y amazon-ssm-agent \
      systemctl enable amazon-ssm-agent && systemctl start amazon-ssm-agent \
    EOT
    disk_size     = 5
    instance_type = "t2.small"
    desired_size  = 1
    max_size      = 5
    min_size      = 1
    capacity_type = ""
    k8s_labels = {
      Environment = "dev-test"
      Zone        = ""
      WorkerType  = "SELF_MANAGED_ON_DEMAND"
    }
    additional_tags = {
      ExtraTag    = "t2x-on-demand"
      Name        = "t2x-on-demand"
      subnet_type = "private"
    }
    create_worker_security_group = false
  }
}
This is the module I use: github.com/aws-samples/aws-eks-accelerator-for-terraform
And this is what Terraform throws after 10 mins:
Error: "Cluster": Waiting up to 10m0s: Need at least 1 healthy instances in ASG, have 0.
Cause: "At 2022-02-10T16:46:14Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 0 to 1.", Description: "Launching a new EC2 instance. Status Reason: The requested configuration is currently not supported. Please check the documentation for supported configurations. Launching EC2 instance failed.", StatusCode: "Failed"
Full code:
https://pastebin.com/mtVGC8PP

The solution was actually changing my t2.small to t3.small. It turns out my AZs didn't support t2 instances.
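A minimal sketch of the fix against the snippet above, with only the instance type changed (everything else stays as in the original definition):

self_managed_node_groups = {
  self_mg_4 = {
    node_group_name = "self-managed-ondemand"
    # t2 instance types are not offered in every Availability Zone;
    # switching to t3.small lets the ASG launch instances in the chosen subnets.
    instance_type   = "t3.small"
    # ... all other attributes unchanged from the original block ...
  }
}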

Related

Packer Error while creating ami using hcl2: "Error querying AMI: InvalidAMIID.Malformed: Invalid id:"

I am working on building a Packer pipeline which would use a Marketplace AMI to install certain software and create a new AMI. I had created the JSON templates, which are working fine, but as per the Packer recommendation I am working on upgrading them to HCL2 templates.
When I run the hcl2_upgrade command, I see the JSON template converted to a .pkr.hcl template, but it fails when I run it. I have done some customization to the template as recommended in the Packer documentation. It gives me the error below.
data "amazon-ami" "autogenerated_1"{
access_key = "${var.aws_access_key}"
filters = {
root-device-type = "ebs"
virtualization-type = "hvm"
name = "**** Linux *"
}
most_recent = true
region = "${var.aws_region}"
owners = ["${var.owner_id}"]
secret_key = "${var.aws_secret_key}"
}
When I try to consume this AMI ID in the source block, it gives me an error:
source "amazon-ebs" "pqr_ami" {
  ami_name                    = "${var.ami_name}"
  associate_public_ip_address = false
  force_deregister            = true
  iam_instance_profile        = "abc"
  instance_type               = "****"
  region                      = "${var.aws_region}"
  source_ami                  = "{data.amazon-ami.autogenerated_1.id}"
  ssh_interface               = "private_ip"
  ssh_username                = "user"
  subnet_id                   = "subnet-********"
  vpc_id                      = "vpc-***********"
}
Error details are below:
amazon-ebs.pqr_ami: status code: 400, request id: *********
Build 'amazon-ebs.pqr_ami' errored after 1 second 49 milliseconds: Error querying AMI: InvalidAMIID.Malformed: Invalid id: "{data.amazon-ami.autogenerated_1.id}" (expecting "ami-...")
status code: 400, request id: ************
Your source_ami is literally the string "{data.amazon-ami.autogenerated_1.id}" because the leading $ of the interpolation is missing. It should be:
source_ami = "${data.amazon-ami.autogenerated_1.id}"
or for HCL2:
source_ami = data.amazon-ami.autogenerated_1.id
Also, ensure that the AMI is in the same region as the one specified in the template.
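As a hedged sketch, the corrected reference in HCL2 style would look roughly like this (other arguments kept as in the question; the build block is the standard Packer HCL2 wiring):

source "amazon-ebs" "pqr_ami" {
  ami_name      = var.ami_name
  instance_type = "****"
  region        = var.aws_region
  ssh_username  = "user"
  # Reference the data source directly, without quotes, so Packer resolves
  # the actual ami-... ID instead of passing the literal string through.
  source_ami    = data.amazon-ami.autogenerated_1.id
}

build {
  sources = ["source.amazon-ebs.pqr_ami"]
}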

Error launching source instance: InvalidBlockDeviceMapping: Volume of size 8GB is smaller than snapshot

I came across an issue while launching a single EC2 instance (from an existing AMI whose volume size is 32GB) using the Gruntwork ec2-instance module in terraform-aws-service-catalog. It does not allow the creation of the root volume from the given snapshot ID (size = 32GB). The error popped up while launching the EC2 instance:
module.demo_instance.module.ec2_instance.aws_instance.instance: Creating...
Error: Error launching source instance: InvalidBlockDeviceMapping: Volume of size 8GB is smaller than snapshot 'snap-0850762dcfacpb2957', expect size >= 32GBB
on .terraform/modules/demo_instance.ec2_instance/modules/single-server/main.tf line 23, in resource "aws_instance" "instance":
23: resource "aws_instance" "instance" {
I see the ec2-instance module uses the single-server module from terraform-aws-server, which has a default root_volume_size of 8GB that can't be modified from the ec2-instance module. It also seems that the ebs_volumes block I am trying to attach to the EC2 instance doesn't help:
module "demo_instance" {
source = "git::git#github.com:gruntwork-io/terraform-aws-service-catalog.git//modules/services/ec2-instance?ref=v0.44.5"
name = "${var.name}-${var.account_name}"
instance_type = var.instance_type
ami = "ami-03a0a2de6ce3aq7ff7"
ami_filters = null
enable_ssh_grunt = false
keypair_name = local.key_pair_name
vpc_id = var.vpc_id
subnet_id = var.subnet_ids[0]
ebs_volumes = {
"demo-volume" = {
type = "gp2"
size = 32
snapshot_id = "snap-0850762dcfacpb2957"
},
}
allow_ssh_from_cidr_blocks = var.allow_ssh_from_cidr_list
allow_ssh_from_security_group_ids = []
allow_port_from_cidr_blocks = {}
allow_port_from_security_group_ids = {}
route53_zone_id = ""
dns_zone_is_private = true
route53_lookup_domain_name = ""
}
Is there any way to modify the default value (8GB) of root_volume_size using the Gruntwork ec2-instance module? Any help would be appreciated.
Gruntwork raised an issue regarding the problem: the root volume size is not exposed in the ec2-instance module. In the meantime, the single-server module can be used directly.
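A rough sketch of that workaround, calling single-server directly; apart from root_volume_size (which the question confirms the module exposes), the argument names here are assumptions carried over from the ec2-instance call and should be checked against the module's variables.tf:

module "demo_instance" {
  # The lower-level single-server module exposes the root volume size directly.
  source = "git::git@github.com:gruntwork-io/terraform-aws-server.git//modules/single-server?ref=<version>"

  # Assumed to match the ec2-instance arguments above; verify against variables.tf.
  name          = "${var.name}-${var.account_name}"
  instance_type = var.instance_type
  ami           = "ami-03a0a2de6ce3aq7ff7"
  keypair_name  = local.key_pair_name
  vpc_id        = var.vpc_id
  subnet_id     = var.subnet_ids[0]

  # Must be at least as large as the 32GB snapshot baked into the AMI.
  root_volume_size = 32
}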

Cannot SSH into instance created from sourceImage with "google_compute_instance_from_machine_image"

I am creating an instance from a source machine image, using this Terraform template:
resource "tls_private_key" "sandbox_ssh" {
algorithm = "RSA"
rsa_bits = 4096
}
output "tls_private_key_sandbox" { value = "${tls_private_key.sandbox_ssh.private_key_pem}" }
locals {
custom_data1 = <<CUSTOM_DATA
#!/bin/bash
CUSTOM_DATA
}
resource "google_compute_instance_from_machine_image" "sandboxvm_test_fromimg" {
project = "<proj>"
provider = google-beta
name = "sandboxvm-test-fromimg"
zone = "us-central1-a"
tags = ["test"]
source_machine_image = "projects/<proj>/global/machineImages/sandboxvm-test-img-1"
can_ip_forward = false
labels = {
owner = "test"
purpose = "test"
ami = "sandboxvm-test-img-1"
}
metadata = {
ssh-keys = "${var.sshuser}:${tls_private_key.sandbox_ssh.public_key_openssh}"
}
network_interface {
network = "default"
access_config {
// Include this section to give the VM an external ip address
}
}
metadata_startup_script = local.custom_data1
}
output "instance_ip_sandbox" {
value = google_compute_instance_from_machine_image.sandboxvm_test_fromimg.network_interface.0.access_config.0.nat_ip
}
output "user_name" {
value = var.sshuser
}
I can't even ping or netcat either the private or the public IP of the VM that was created, and not even the "serial port" SSH passed inside the custom script helps.
I suspect that, since this is a google-beta capability, it may not be fully working or reliable yet.
Maybe we just can't create VMs (i.e. GCEs) from machine images in GCP yet, unless proven otherwise and this turns out to be a simple goof-up that isn't very evident in my TF.
I could actually solve it, and the whole thing sounds like a rather sick quirk of GCE.
The problem was that while creating the base image, the instance I had chosen had the following:
#sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.6 2
#sudo update-alternatives --install /usr/bin/python3 python /usr/bin/python3.7 1
Maybe I should have tried with "python3" instead of "python", but when instantiating GCEs based on this machine image, it looks for the rather deprecated "python2.7" and not "python3", and complains about missing/unreadable packages like netplan etc.
Commenting out the "update-alternatives" lines and installing python3.6 and python3.7 explicitly did the trick!
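For illustration only, this is roughly what the base-image preparation could look like if driven through a startup script in the same heredoc style as custom_data1 above; the package manager invocation and package names are assumptions about the base OS, not something stated in the answer:

locals {
  base_image_prep = <<CUSTOM_DATA
#!/bin/bash
# Assumption: adjust the package manager and package names to the base OS.
# Install the interpreters explicitly instead of re-pointing /usr/bin/python
# with update-alternatives, which broke python2-based tooling on first boot.
yum install -y python3.6 python3.7 || apt-get install -y python3.6 python3.7
CUSTOM_DATA
}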

Google GKE startup script not working for GKE node

I am adding a startup script to my GKE nodes using Terraform:
provider "google" {
project = var.project
region = var.region
zone = var.zone
credentials = "google-key.json"
}
terraform {
backend "gcs" {
bucket = "tf-state-bucket-devenv"
prefix = "terraform"
credentials = "google-key.json"
}
}
resource "google_container_cluster" "primary" {
name = var.kube-clustername
location = var.zone
remove_default_node_pool = true
initial_node_count = 1
master_auth {
username = ""
password = ""
client_certificate_config {
issue_client_certificate = false
}
}
}
resource "google_container_node_pool" "primary_preemptible_nodes" {
name = var.kube-poolname
location = var.zone
cluster = google_container_cluster.primary.name
node_count = var.kube-nodecount
node_config {
preemptible = var.kube-preemptible
machine_type = "n1-standard-1"
disk_size_gb = 10
disk_type = "pd-standard"
metadata = {
disable-legacy-endpoints = "true",
startup_script = "cd /mnt/stateful_partition/home && echo hi > test.txt"
}
oauth_scopes = [
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring",
]
}
}
When I go into the GCP interface, select the node, and view the metadata, I can see the key/value pair is there:
metadata_startup_script
#!/bin/bash sudo su && cd /mnt/stateful_partition/home && echo hi > test.txt
However, when running the below command on the node:
sudo google_metadata_script_runner --script-type startup --debug
I got the below output:
startup-script: INFO Starting startup scripts.
startup-script: INFO No startup scripts found in metadata.
startup-script: INFO Finished running startup scripts.
Does anyone know why this script is not working/showing up? Is it because it's a GKE node and Google doesn't let you edit these? I can't actually find anything in their documentation where they specifically say that.
You cannot specify startup scripts to run on GKE nodes. The node has a built-in startup sequence to initialize the node and join it to the cluster and to ensure that this works properly (e.g. to ensure that when you ask for 100 nodes you get 100 functional nodes) you cannot add additional logic to the startup sequence.
As an alternative, you can create a DaemonSet that runs on all of your nodes to perform node-level initialization. One advantage of this is that you can tweak your DaemonSet and re-apply it to existing nodes (without having to recreate them) if you want to change how they are configured.
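A minimal sketch of that DaemonSet approach expressed with the Terraform kubernetes provider (the kubernetes_daemon_set_v1 resource, the busybox image, and the exact paths are illustrative assumptions, not part of the original answer):

resource "kubernetes_daemon_set_v1" "node_init" {
  metadata {
    name      = "node-init"
    namespace = "kube-system"
  }

  spec {
    selector {
      match_labels = {
        app = "node-init"
      }
    }

    template {
      metadata {
        labels = {
          app = "node-init"
        }
      }

      spec {
        container {
          name  = "init"
          image = "busybox:1.36"
          # Write the marker file on the node's stateful partition, then idle
          # so the pod stays Running and the DaemonSet reports healthy.
          command = ["/bin/sh", "-c", "echo hi > /host-home/test.txt && while true; do sleep 3600; done"]

          volume_mount {
            name       = "host-home"
            mount_path = "/host-home"
          }
        }

        volume {
          name = "host-home"

          host_path {
            path = "/mnt/stateful_partition/home"
          }
        }
      }
    }
  }
}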
Replace the metadata key name metadata_startup_script with startup_script.
In addition, your startup script runs as the root user, so you don't need to perform a sudo su.

Get endpoint for Terraform with aws_elasticache_replication_group

I have what I think is a simple Terraform config for AWS ElastiCache with Redis:
resource "aws_elasticache_replication_group" "my_replication_group" {
replication_group_id = "my-rep-group",
replication_group_description = "eln00b"
node_type = "cache.m4.large"
port = 6379
parameter_group_name = "default.redis5.0.cluster.on"
snapshot_retention_limit = 1
snapshot_window = "00:00-05:00"
subnet_group_name = "${aws_elasticache_subnet_group.my_subnet_group.name}"
automatic_failover_enabled = true
cluster_mode {
num_node_groups = 1
replicas_per_node_group = 1
}
}
I tried to define the endpoint output using:
output "my_cache" {
value = "${aws_elasticache_replication_group.my_replication_group.primary_endpoint_address}"
}
When I run an apply through terragrunt I get:
Error: Error running plan: 1 error(s) occurred:
module.mod.output.my_cache: Resource 'aws_elasticache_replication_group.my_replication_group' does not have attribute 'primary_endpoint_address' for variable 'aws_elasticache_replication_group.my_replication_group.primary_endpoint_address'
What am I doing wrong here?
The primary_endpoint_address attribute is only available for non-cluster-mode Redis replication groups, as mentioned in the docs:
primary_endpoint_address - (Redis only) The address of the endpoint for the primary node in the replication group, if the cluster mode is disabled.
When using cluster mode you should use configuration_endpoint_address instead to connect to the Redis cluster.
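A minimal sketch of the cluster-mode output, keeping the resource name from the question:

output "my_cache" {
  # With cluster_mode enabled, clients connect through the configuration
  # endpoint and discover the individual shards from there.
  value = aws_elasticache_replication_group.my_replication_group.configuration_endpoint_address
}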