Can't SSH into EC2 instance launched from an autoscale group - amazon-web-services

TL;DR: I am spawning an EC2 instance using an autoscale group, and I can reach it over SSH, but I cannot authenticate with the SSH key pair I specified in the autoscale group.
I have used Terraform to create an autoscale group to launch an EC2 instance. Here is the autoscale group:
module "ssh_key_pair" {
source = "cloudposse/key-pair/aws"
version = "0.18.3"
name = "myproj-ec2"
ssh_public_key_path = "."
generate_ssh_key = true
}
module "autoscale_group" {
source = "cloudposse/ec2-autoscale-group/aws"
version = "0.30.0"
name = "myproj"
image_id = data.aws_ami.amazon_linux_2.id
instance_type = "t2.small"
security_group_ids = [module.sg.id]
subnet_ids = module.subnets.public_subnet_ids
health_check_type = "EC2"
min_size = 1
desired_capacity = 1
max_size = 1
wait_for_capacity_timeout = "5m"
associate_public_ip_address = true
user_data_base64 = base64encode(templatefile("${path.module}/user_data.tpl", { cluster_name = aws_ecs_cluster.default.name }))
key_name = module.ssh_key_pair.key_name
# Auto-scaling policies and CloudWatch metric alarms
autoscaling_policies_enabled = true
cpu_utilization_high_threshold_percent = "70"
cpu_utilization_low_threshold_percent = "20"
}
And the user_data.tpl file looks like this:
#!/bin/bash
echo ECS_CLUSTER=${cluster_name} >> /etc/ecs/ecs.config
# Set up crontab file
echo "MAILTO=webmaster#myproj.com" >> /var/spool/cron/ec2-user
echo " " >> /var/spool/cron/ec2-user
echo "# Clean docker files once a week" >> /var/spool/cron/ec2-user
echo "0 0 * * 0 /usr/bin/docker system prune -f" >> /var/spool/cron/ec2-user
echo " " >> /var/spool/cron/ec2-user
start ecs
The instance is spawned, and when I SSH to it by DNS name for the first time, the connection itself succeeds. (The SSH server presents a host key on first connect, and it matches the one listed in the instance's console output; after approving it, the host key is added to ~/.ssh/known_hosts.)
However, despite having created an ssh_key_pair and passing its key_name when creating the autoscale group, I am not able to log in to the spawned instance. (I've checked, and the key pair exists in the AWS console under the expected name.) When I run ssh on the command line with the private half of that key pair, the handshake above succeeds, but authentication ultimately fails with:
debug1: No more authentication methods to try.
ec2-user@myhost.us-east-2.compute.amazonaws.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
When I use the Connect button in the AWS Console and click the "SSH client" tab, it says:
No associated key pair
This instance is not associated with a key pair. Without a key pair, you can't connect to the instance through SSH.
You can connect using EC2 Instance Connect with just a valid username. You can connect using Session Manager if you have been granted the necessary permissions.
I also can't use EC2 Instance Connect, which fails with:
There was a problem connecting to your instance
Log in failed. If this instance has just started up, wait a few minutes and try again. Otherwise, ensure the instance is running on an AMI that supports EC2 Instance Connect.
I'm using the most_recent AMI matching the pattern amzn2-ami-ecs-hvm.*x86_64-ebs, which as I understand it comes with EC2 Instance Connect pre-installed.
Am I missing a step in the user_data template? I also read something about the instance's IAM role possibly affecting this, but I can't figure out how to configure that for an automatically launched instance like this.

What you've posted here, and in your previous questions, is correct. There is no reason why you shouldn't be able to SSH into the instance.
You must make sure that you are using the myproj-ec2 private SSH key in your ssh command, for example:
ssh -i ./myproj-ec2 ec2-user@<instance-public-ip-address>
Also, ec2-instance-connect is not pre-installed on ECS-optimized AMIs. You would have to install it manually if you want to use it.
P.S. I'm not checking your user_data or any IAM roles, as they are not related to your SSH issue. If you have problems with those, a new question should be asked.
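If you do want EC2 Instance Connect on the ECS-optimized AMI, one option is to append the install command to the rendered user data. This is only a minimal sketch, not part of the original answer: the join() wrapper is illustrative, and it assumes the Amazon Linux 2 ECS AMI, where the documented package name is ec2-instance-connect.
user_data_base64 = base64encode(join("\n", [
  # Render the existing template first, then install EC2 Instance Connect.
  templatefile("${path.module}/user_data.tpl", { cluster_name = aws_ecs_cluster.default.name }),
  "yum install -y ec2-instance-connect",
]))
This only addresses the EC2 Instance Connect error; it does not change the key pair behaviour.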

Related

How to configure Packer ssh into GCP VM for building image?

I am building a GCP image with Packer. I created a service account with the "Compute Instance Admin v1" and "Service Account User" roles. It can successfully create the VM, but Packer cannot SSH into the instance to proceed further with the custom image.
Error message
Build 'googlecompute.custom-image' errored after 2 minutes 20 seconds: Packer experienced an authentication error when trying to connect via SSH. This can happen if your username/password are wrong. You may want to double-check your credentials as part of your debugging process. original error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
build file source code (packer.pkr.hcl)
locals {
  project_id              = "project-id"
  source_image_family     = "rocky-linux-8"
  source_image_project_id = ["rocky-linux-cloud"]
  ssh_username            = "packer"
  machine_type            = "e2-medium"
  zone                    = "us-central1-a"
}

source "googlecompute" "custom-image" {
  image_name              = "custom-image"   # Name of image to be created
  image_description       = "Custom Image 1" # Description for image to be created
  project_id              = "${local.project_id}"
  source_image_family     = "${local.source_image_family}"
  source_image_project_id = "${local.source_image_project_id}"
  ssh_username            = "${local.ssh_username}"
  machine_type            = "${local.machine_type}"
  zone                    = "${local.zone}"
}

build {
  sources = ["source.googlecompute.custom-image"]

  #
  # Run arbitrary shell script file
  #
  provisioner "shell" {
    execute_command = "sudo su - root -c \"sh {{ .Path }} \""
    script          = "foo.sh"
  }
}
It appears that you are having trouble connecting via SSH to the instance Packer creates for your GCP image. This error message indicates that authentication failed, which can happen if the credentials are wrong or the necessary permissions are not granted. Check that the Compute Instance Admin v1 and Service Account User roles grant the required access. In addition, the project's firewall rules may need to allow incoming SSH connections on the port you're using; you can refer to the official GCP documentation for more information on configuring firewall rules. You can also connect to the instance with the gcloud compute ssh command to continue troubleshooting.
See the GCP documentation on troubleshooting SSH for reference.
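For the firewall point, a rule along these lines is the kind of thing the answer refers to. This is a minimal sketch only: the rule name, the default network, and the packer-build target tag are placeholders (Packer only tags its temporary VM if you set tags in the googlecompute source), and 0.0.0.0/0 should be narrowed in practice.
resource "google_compute_firewall" "allow_ssh" {
  name    = "allow-packer-ssh"
  network = "default"

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  # Restrict the source range and tags to the Packer build VM where possible.
  source_ranges = ["0.0.0.0/0"]
  target_tags   = ["packer-build"]
}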
The problem turned out to be associated with Qwiklab: I was using the lab environment provided by Qwiklab for testing Packer and GCP.
Once I deployed the same thing in a regular GCP project, Packer ran successfully, which suggests there may be some constraints in the Qwiklab lab environment.

Terraform: What are the alternatives to connect to existing ec2 instance avoiding remote-exec provisioner

I have some questions about the usage of the "remote-exec" provisioner on an existing EC2 instance.
I am connecting to an Ubuntu EC2 instance over SSH to execute a script. Below is the sample code to do so:
resource "null_resource" "cwa_setup_on_existing_ec2" {
for_each = toset(local.cw_existing_instance_watchlist)
connection {
type = "ssh"
user = "ubuntu"
private_key = "${file("priv_key")}"
host = each.value.instance_info.public_ip
timeout = "4m"
}
provisioner "file"{
source = "user_data.sh"
destination = "/tmp/user_data.sh"
}
provisioner "remote-exec" {
inline = [
"echo '----- Executing cwa-setup script on already initialized ec2 ------- '",
"chmod +x /tmp/user_data.sh",
"cd /tmp",
"./user_data.sh >> cwa-setup.log" ]
}
}
I am copying the script to the EC2 instance with the file provisioner first and then executing it. The execution succeeds, and it feels safe enough to connect with my own private key.
But I need to be able to connect (via Terraform) to a set of existing instances that were created with multiple users' private keys in the root account. It is insecure and infeasible to load other users' private keys from disk.
So, summing up the concerns:
What is a better way to connect to existing EC2 instances via Terraform?
The Terraform documentation suggests using provisioners only as a last resort and recommends better alternatives. What are the alternative ways to run a script on a remote instance without compromising secure access?
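One commonly cited alternative to remote-exec (not from this thread) is to run the script through AWS Systems Manager, so Terraform never handles private keys at all. The sketch below is illustrative only: the document name and the tag used for targeting are hypothetical, and it assumes the existing instances run the SSM agent with an instance profile that allows Systems Manager.
resource "aws_ssm_document" "cwa_setup" {
  name          = "cwa-setup"
  document_type = "Command"

  content = jsonencode({
    schemaVersion = "2.2"
    description   = "Run the CloudWatch agent setup script"
    mainSteps = [{
      action = "aws:runShellScript"
      name   = "runCwaSetup"
      # Reuses the same user_data.sh that the question copies with the file provisioner.
      inputs = { runCommand = [file("${path.module}/user_data.sh")] }
    }]
  })
}

resource "aws_ssm_association" "cwa_setup" {
  name = aws_ssm_document.cwa_setup.name

  # Target the watchlist instances by tag instead of by public IP and key.
  targets {
    key    = "tag:Watchlist"
    values = ["cwa"]
  }
}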

Error invalid instance URLs: resource "google_compute_instance_group" "t-compute-instance-group"

Goal: Create a compute instance and add it to an unmanaged instance group in GCP using terraform.
Issue: A compute instance and an unmanaged instance group are being created successfully, but the instance is not being added to the group, and Terraform reports:
Error invalid instance URLs:
resource "google_compute_instance_group" "t-compute-instance-group"
I am able to add the instance to the group manually after running the Terraform configuration, though.
The service account key has the Project Editor role assigned.
Code: https://github.com/sagar-aj7/terraform_unmanaged_inst_group
I've hit the same problem; what worked for me was to use the self_link instead of the id:
resource "google_compute_instance_group" "backend-instances" {
name = "..."
zone = "${var.availability_zone}"
instances = ["${google_compute_instance.node.*.self_link}"]
named_port {
name = "http"
port = "8080"
}
named_port {
name = "https"
port = "8443"
}
..
}
I'm on the google provider version 2.8.0. I guess it's time to upgrade :)
I had the same problem today. The solution was to update the google terraform provider to a newer version (3.52.0). This fixed the issue and created the instance group with the assigned instance.
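To pick up that fix, the provider version can be pinned explicitly. A minimal sketch, assuming the Terraform 0.13+ required_providers syntax:
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 3.52.0"
    }
  }
}
After raising the constraint, run terraform init -upgrade so the newer provider is actually downloaded.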

Cannot remote (rdp) into EC2 started from aws lambda using boto3::run_instances

When I launch an EC2 instance from a particular AMI via the web console, it works just fine and I can RDP into it with no problems.
But when I launch another (identical) instance via an AWS Lambda, I cannot RDP into the instance.
Details
Here is the lambda used to launch the instance
import boto3

REGION = 'ap-southeast-2'
AMI = 'ami-08e9ad7d527e4e95c'
INSTANCE_TYPE = 't2.small'

def lambda_handler(event, context):
    EC2 = boto3.client('ec2', region_name=REGION)
    init_script = """<powershell>
powershell "C:\\Users\\Administrator\\Desktop\\ScriptToRunDaily.ps1"
aws ec2 terminate-instances --instance-ids 'curl http://169.254.169.254/latest/meta-data/instance-id'
</powershell>"""
    instance = EC2.run_instances(
        ImageId=AMI,
        InstanceType=INSTANCE_TYPE,
        MinCount=1,
        MaxCount=1,
        InstanceInitiatedShutdownBehavior='terminate',
        UserData=init_script
    )
I can see the instance start up in the AWS console. Everything looks normal until I try to remote in: a prompt saying 'Initiating remote session' appears for about 15 seconds and then returns
We couldn't connect to the remote PC. Make sure the PC is turned on and connected to the network, and that remote access is enabled.
Error code: 0x204
Note
When I try to connect to the instance through the AWS console, it lets me download an RDP file; however, it doesn't display the 'Get Password' option as it does when I start the exact same AMI through the console (as opposed to via a Lambda).
I suspect I may need to associate the instance with a keypair at launch?
Also note
Before creating this particular AMI, I logged in and changed the password, so I really have no need to generate one using the .pem file.
It turns out I needed to add SecurityGroupIds
Note that it's an array of up to 5 values, rather than a single value, so it's specified like ['first', 'second', 'etc'] rather than just 'first'. Hence the square brackets around ['launch-wizard-29'] below
I also specified a key.
The following is what worked for me
import boto3

REGION = 'ap-southeast-2'
AMI = 'ami-08e9ad7d527e4e95c'
INSTANCE_TYPE = 't2.small'

def lambda_handler(event, context):
    EC2 = boto3.client('ec2', region_name=REGION)
    init_script = """<powershell>
powershell "C:\\Users\\Administrator\\Desktop\\ScriptToRunDaily.ps1"
aws ec2 terminate-instances --instance-ids 'curl http://169.254.169.254/latest/meta-data/instance-id'
</powershell>"""
    instance = EC2.run_instances(
        ImageId=AMI,
        InstanceType=INSTANCE_TYPE,
        MinCount=1,
        MaxCount=1,
        InstanceInitiatedShutdownBehavior='terminate',
        UserData=init_script,
        KeyName='aws',  # Name of a key (i.e. pem file) that I used for other instances
        SecurityGroupIds=['launch-wizard-29']  # I copied this from another (running) instance
    )

Why can't terraform SSH in to EC2 Instance using supplied example?

I'm using the AWS two-tier example and have copied and pasted the whole thing directly. terraform apply works right up to where it tries to SSH into the created EC2 instance. It loops several times, giving this output, before finally failing:
aws_instance.web (remote-exec): Connecting to remote host via SSH...
aws_instance.web (remote-exec): Host: 54.174.8.144
aws_instance.web (remote-exec): User: ubuntu
aws_instance.web (remote-exec): Password: false
aws_instance.web (remote-exec): Private key: false
aws_instance.web (remote-exec): SSH Agent: true
Ultimately, it fails with:
Error applying plan:
1 error(s) occurred:
* ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
I've searched around and seen some older posts/issues that say to flip agent = false, and I've tried that too with no change or success. I'm skeptical that this example is broken out of the box, yet I've done no tailoring or modifications that could have broken it. I'm using Terraform 0.6.11 installed via Homebrew on OS X 10.10.5.
Additional detail:
resource "aws_instance" "web" {
# The connection block tells our provisioner how to
# communicate with the resource (instance)
connection {
# The default username for our AMI
user = "ubuntu"
# The connection will use the local SSH agent for authentication.
agent = false
}
instance_type = "t1.micro"
# Lookup the correct AMI based on the region
# we specified
ami = "${lookup(var.aws_amis, var.aws_region)}"
# The name of our SSH keypair we created above.
key_name = "${aws_key_pair.auth.id}"
# Our Security group to allow HTTP and SSH access
vpc_security_group_ids = ["${aws_security_group.default.id}"]
# We're going to launch into the same subnet as our ELB. In a production
# environment it's more common to have a separate private subnet for
# backend instances.
subnet_id = "${aws_subnet.default.id}"
# We run a remote provisioner on the instance after creating it.
# In this case, we just install nginx and start it. By default,
# this should be on port 80
provisioner "remote-exec" {
inline = [
"sudo apt-get -y update",
"sudo apt-get -y install nginx",
"sudo service nginx start"
]
}
}
And from the variables tf file:
variable "key_name" {
description = "Desired name of AWS key pair"
default = "test-keypair"
}
variable "key_path" {
description = "key location"
default = "/Users/n8/dev/play/.ssh/terraform.pub"
}
But I can SSH in with this command:
ssh -i ../.ssh/terraform ubuntu@w.x.y.z
You have two possibilities:
Add your key to your ssh-agent:
ssh-add ../.ssh/terraform
and use agent = true in your configuration. That should work for you.
Alternatively, modify your configuration to use the key directly, for example with
private_key = "${file("../.ssh/terraform")}"
in the connection block. Please consult the documentation for the exact syntax.
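For the first option, the connection block from the question only needs the agent flag flipped back on once the key is loaded. A minimal sketch of that variant, mirroring the block above:
connection {
  # The default username for our AMI
  user = "ubuntu"

  # Authenticate with the key previously loaded via: ssh-add ../.ssh/terraform
  agent = true
}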
I had the same issue, and the following configuration worked for me:
connection {
  type        = "ssh"
  user        = "ec2-user"
  private_key = "${file("*.pem")}"
  timeout     = "2m"
  agent       = false
}
Below is a complete, stand-alone resource "null_resource" with a remote-exec provisioner and an SSH connection, including the relevant arguments supported by the ssh connection type:
private_key - The contents of an SSH key to use for the connection. These can be loaded from a file on disk using the file function. This takes preference over the password if provided.
type - The connection type that should be used. Valid types are ssh and winrm. Defaults to ssh.
user - The user that we should use for the connection. Defaults to root when using type ssh and defaults to Administrator when using type winrm.
host - The address of the resource to connect to. This is usually specified by the provider.
port - The port to connect to. Defaults to 22 when using type ssh and defaults to 5985 when using type winrm.
timeout - The timeout to wait for the connection to become available. This defaults to 5 minutes. Should be provided as a string like 30s or 5m.
agent - Set to false to disable using ssh-agent to authenticate. On Windows the only supported SSH authentication agent is Pageant.
An example resource "null_resource" with remote-exec is below:
resource "null_resource" "ec2-ssh-connection" {
provisioner "remote-exec" {
inline = [
"sudo apt-get update",
"sudo apt-get install -y python2.7 python-dev python-pip python-setuptools python-virtualenv libssl-dev vim zip"
]
connection {
host = "100.20.30.5"
type = "ssh"
port = 22
user = "ubuntu"
private_key = "${file(/path/to/your/id_rsa_private_key)}"
timeout = "1m"
agent = false
}
}
}
The solution provided in https://stackoverflow.com/a/35382911/12880305 was not working for me, so I tried other approaches and found that the issue was with the key pair type I was using.
I was using an RSA key pair, and because of that I was getting the error:
ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
I created a new key pair with the ED25519 type and it works perfectly fine for me.
https://aws.amazon.com/about-aws/whats-new/2021/08/amazon-ec2-customers-ed25519-keys-authentication/
The connection block which I used:
connection {
  // Use the public IP of the instance to connect to it.
  host        = self.public_ip
  type        = "ssh"
  user        = "ubuntu"
  private_key = file("pemfile-location.pem")
  timeout     = "1m"
  agent       = true
}
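The ED25519 key pair itself can also be generated from Terraform rather than in the console. This is only a sketch, not part of the answer above: it assumes versions of the hashicorp/tls and hashicorp/local providers that support ED25519, private_key_openssh, and local_sensitive_file, and the resource and file names are placeholders.
resource "tls_private_key" "ssh" {
  algorithm = "ED25519"
}

resource "aws_key_pair" "ssh" {
  key_name   = "terraform-ed25519"
  public_key = tls_private_key.ssh.public_key_openssh
}

resource "local_sensitive_file" "ssh_private_key" {
  # This file is what private_key = file(...) in the connection block would point at.
  filename        = "${path.module}/terraform-ed25519.key"
  content         = tls_private_key.ssh.private_key_openssh
  file_permission = "0600"
}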
Check the user name that exists in the base image. For example, it can be ubuntu for Ubuntu, or ec2-user for Amazon Linux images.
Alternatively, most cloud providers allow Terraform to create a new user on first instance start with the help of a cloud-init config (check your provider's documentation):
metadata = {
  user-data = "${file("./user-meta-data.txt")}"
}
user-meta-data.txt:
#cloud-config
users:
  - name: <NEW-USER-NAME>
    groups: sudo
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa <SSH-PUBLIC-KEY>
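On AWS the same cloud-config file can be passed through user_data instead of metadata. A minimal sketch, reusing the AMI lookup from the example earlier in this thread:
resource "aws_instance" "web" {
  ami           = "${lookup(var.aws_amis, var.aws_region)}"
  instance_type = "t1.micro"

  # cloud-init reads this on first boot and creates the user defined above
  user_data = "${file("./user-meta-data.txt")}"
}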
Increase the connection timeout settings; sometimes it takes 1-2 minutes after the instance starts for its network and SSH to become available.
connection {
  type        = "ssh"
  user        = "<USER_NAME>"
  private_key = "${file("pathto/id_rsa")}"
  timeout     = "3m"
}
If it does not work, try to connect manually via ssh with -v for verbose output:
ssh -v -i <path_to_private_key/id_rsa> <USER_NAME>@<INSTANCE_IP>
The top answer did not work for me.
The answer that uses ED25519 did work for me, but it is not necessary to use the PEM format.
Here is a working sample for me:
connection {
  host        = "${aws_instance.example.public_ip}"
  type        = "ssh"
  port        = "22"
  user        = "ubuntu"
  timeout     = "120s"
  private_key = "${file("${var.key_location}")}"
  agent       = false
}

key_location = "~/.ssh/id_ed25519"