How to configure Packer to SSH into a GCP VM for building an image? - google-cloud-platform

I am building a GCP image with Packer. I created a service account with the "Compute Instance Admin (v1)" and "Service Account User" roles. Packer can successfully create the VM but cannot SSH into the instance to proceed further with building the custom image.
Error message
Build 'googlecompute.custom-image' errored after 2 minutes 20 seconds: Packer experienced an authentication error when trying to connect via SSH. This can happen if your username/password are wrong. You may want to double-check your credentials as part of your debugging process. original error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
build file source code (packer.pkr.hcl)
locals {
  project_id              = "project-id"
  source_image_family     = "rocky-linux-8"
  source_image_project_id = ["rocky-linux-cloud"]
  ssh_username            = "packer"
  machine_type            = "e2-medium"
  zone                    = "us-central1-a"
}

source "googlecompute" "custom-image" {
  image_name              = "custom-image"    # Name of image to be created
  image_description       = "Custom Image 1"  # Description for image to be created
  project_id              = "${local.project_id}"
  source_image_family     = "${local.source_image_family}"
  source_image_project_id = "${local.source_image_project_id}"
  ssh_username            = "${local.ssh_username}"
  machine_type            = "${local.machine_type}"
  zone                    = "${local.zone}"
}

build {
  sources = ["source.googlecompute.custom-image"]

  #
  # Run arbitrary shell script file
  #
  provisioner "shell" {
    execute_command = "sudo su - root -c \"sh {{ .Path }} \""
    script          = "foo.sh"
  }
}
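An editorial sketch, not part of the original post: one way to see exactly where the SSH step fails is to run the build with verbose logging and Packer's step-through debug mode (both flags are standard Packer options; the template filename is the one shown above):

PACKER_LOG=1 packer build -debug packer.pkr.hcl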

This error indicates that SSH authentication to the Packer-created instance failed. It can happen if the SSH username or credentials are wrong, or if the necessary permissions are not granted. Check that the Compute Instance Admin (v1) and Service Account User roles grant the required access. In addition, the project's firewall rules may need to allow incoming SSH connections on the port Packer uses (22 by default); the official GCP documentation covers firewall-rule configuration. You can also connect to the instance with the gcloud compute ssh command to continue troubleshooting.
See GCP's Troubleshooting SSH documentation for reference.
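As a sketch of those two suggestions (the rule name allow-packer-ssh and the default network are assumptions; replace <temporary-instance-name> with the name of the build VM Packer creates):

# Allow inbound SSH to the build VM:
gcloud compute firewall-rules create allow-packer-ssh \
    --network=default \
    --allow=tcp:22 \
    --source-ranges=0.0.0.0/0

# Manually test SSH to the temporary build VM while it is running:
gcloud compute ssh packer@<temporary-instance-name> --zone=us-central1-a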

The problem was associated with Qwiklab. I was using the lab environment provided by Qwiklab to test Packer with GCP.
Once I deployed the same configuration in a regular GCP project, the Packer build ran successfully. This suggests there are some constraints in the Qwiklab lab environment.

Related

GCP Terraform, can I add a root ssh key?

resource "google_compute_instance" "my_vm" {
boot_disk {
initialize_params {
image = "ubuntu-1804-lts"
size = 100
}
}
...
metadata = {
ssh-keys = "root:${var.publickey}"
}
}
I am attempting to create a Google Compute instance via Terraform. However, when attempting to add a root SSH key, it doesn't seem to work. The instance gets created fine, but I can't SSH in. (I have enabled root SSH and restarted sshd.) I think the public key does not exist on the instance. If I open
nano /root/.ssh/authorized_keys
It just has
# Added by Google
---- BEGIN SSH2 PUBLIC KEY ----
with no keys in it. Where can I validate that the key has/hasn't been added to the machine?
This is the error I get when attempting to SSH as root -
Server refused our key
No supported authentication methods available (server sent: publickey)
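As an editorial aside (not from the original question), two ways to check whether the key actually made it into the instance metadata that the Google guest agent reads; <instance-name> and <zone> are placeholders:

# From your workstation: dump the metadata Terraform set on the instance.
gcloud compute instances describe <instance-name> --zone=<zone> --format="yaml(metadata)"

# From inside the VM: query the metadata server directly.
curl -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/attributes/ssh-keys"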

Can't SSH into EC2 instance launched from an autoscale group

TL;DR: I am spawning an EC2 instance using an autoscale group, and I can connect to it. But I cannot successfully log in to that instance using the SSH key pair I specified in the autoscale group.
I have used Terraform to create an autoscale group to launch an EC2 instance. Here is the autoscale group:
module "ssh_key_pair" {
source = "cloudposse/key-pair/aws"
version = "0.18.3"
name = "myproj-ec2"
ssh_public_key_path = "."
generate_ssh_key = true
}
module "autoscale_group" {
source = "cloudposse/ec2-autoscale-group/aws"
version = "0.30.0"
name = "myproj"
image_id = data.aws_ami.amazon_linux_2.id
instance_type = "t2.small"
security_group_ids = [module.sg.id]
subnet_ids = module.subnets.public_subnet_ids
health_check_type = "EC2"
min_size = 1
desired_capacity = 1
max_size = 1
wait_for_capacity_timeout = "5m"
associate_public_ip_address = true
user_data_base64 = base64encode(templatefile("${path.module}/user_data.tpl", { cluster_name = aws_ecs_cluster.default.name }))
key_name = module.ssh_key_pair.key_name
# Auto-scaling policies and CloudWatch metric alarms
autoscaling_policies_enabled = true
cpu_utilization_high_threshold_percent = "70"
cpu_utilization_low_threshold_percent = "20"
}
And the user_data.tpl file looks like this:
#!/bin/bash
echo ECS_CLUSTER=${cluster_name} >> /etc/ecs/ecs.config
# Set up crontab file
echo "MAILTO=webmaster#myproj.com" >> /var/spool/cron/ec2-user
echo " " >> /var/spool/cron/ec2-user
echo "# Clean docker files once a week" >> /var/spool/cron/ec2-user
echo "0 0 * * 0 /usr/bin/docker system prune -f" >> /var/spool/cron/ec2-user
echo " " >> /var/spool/cron/ec2-user
start ecs
The instance is spawned, and when I SSH into the spawned instance using the DNS name for the first time, I can successfully connect. (The SSH server returns a host key on first connect, the same one listed in the instance's console output. After approving it, the host key is added to ~/.ssh/known_hosts.)
However, despite having created an ssh_key_pair and specifying the key pair's key_name when creating the autoscale group, I am not able to successfully log in to the spawned instance. (I've checked, and the key pair exists in the AWS console using the expected name.) When I use SSH on the command line, specifying the private key half of the key pair created, the handshake above succeeds, but then the connection ultimately fails with:
debug1: No more authentication methods to try.
ec2-user@myhost.us-east-2.compute.amazonaws.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
When I use the Connect button in the AWS Console and click the "SSH client" tab, it says:
No associated key pair
This instance is not associated with a key pair. Without a key pair, you can't connect to the instance through SSH.
You can connect using EC2 Instance Connect with just a valid username. You can connect using Session Manager if you have been granted the necessary permissions.
I also can't use EC2 Instance Connect, which fails with:
There was a problem connecting to your instance
Log in failed. If this instance has just started up, wait a few minutes and try again. Otherwise, ensure the instance is running on an AMI that supports EC2 Instance Connect.
I'm using the most_recent AMI with regex amzn2-ami-ecs-hvm.*x86_64-ebs, which as I understand it comes pre-installed with EC2 Instance Connect.
Am I missing a step in the user_data template? I also read something somewhere about the instance's roles possibly affecting this, but I can't figure out how to configure that with an automatically generated instance like this.
What you've posted now, and in your previous questions, is correct. There is no reason why you shouldn't be able to SSH into the instance.
You must make sure that you are using the myproj-ec2 private SSH key in your ssh command, for example:
ssh -i ./myproj-ec2 ec2-user@<instance-public-ip-address>
Also, ec2-instance-connect is not installed on ECS-optimized AMIs. You would have to install it manually if you want to use it.
P.S. I'm not checking your user_data or any IAM roles, as they are not related to your SSH issue. If you have problems with those, a new question should be asked.
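If you do want EC2 Instance Connect on that ECS-optimized Amazon Linux 2 AMI, a minimal sketch of installing it manually (package name as documented by AWS for Amazon Linux 2):

# Run on the instance itself:
sudo yum install -y ec2-instance-connect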

Can't start google compute instance with terraform

TLDR
I'm just trying to start up a simple VM, and Terraform tells me I don't have sufficient permissions. Keep in mind I have a trial account (with about $290 left of sweet, sweet free money).
Details
provider.tf
provider "google" {
project = "My First Project"
region = "us-east1"
}
resource.tf
resource "google_compute_instance" "vm_instance" {
name = "terraform-instance"
machine_type = "f1-micro"
zone = "us-east1-c"
boot_disk {
initialize_params {
image = "debian-cloud/debian-9"
}
}
network_interface {
# A default network is created for all GCP projects
network = "default"
access_config {
}
}
}
Error
Error: Error loading zone 'us-east1-c': googleapi: Error 403: Permission denied on resource project My First Project., forbidden
My Troubleshooting
I tried switching us-east1-c for other zones and tried us-central1 with different zones. I always got the same error.
I'm passing the credentials with the GOOGLE_APPLICATION_CREDENTIALS environment variable, and I know I'm passing it in correctly because when I change the filename it breaks and says something like "that filename doesn't exist".
I've tried different machine types (n1-standard-1, n1-highcpu-16).
I've tried many different IAM permissions; of particular note, I tried Compute Admin, Compute Admin with Service Account User, and Compute Instance Admin with Service Account Admin.
Concerning the last point, I used:
gcloud projects get-iam-policy <PROJECT NAME> \
--flatten="bindings[].members" \
--format='table(bindings.role)' \
--filter="bindings.members:<KEY NAME>"
And got this output
ROLE
roles/compute.admin
roles/compute.instanceAdmin
roles/compute.instanceAdmin.v1
roles/compute.instanceAdmin.v1
roles/iam.serviceAccountUser
But wait, there's more
I went through this link and added all the permissions it suggested using the aforementioned key (except for the ones that have to do with billing, because the organization is my school). When I checked the roles again I saw that roles/storage.admin had been added. It produced the same error, though.
Update
A billing account is now linked to my account, and the roles are as follows. AND IT STILL doesn't work:
roles/billing.projectManager
roles/compute.admin
roles/compute.instanceAdmin
roles/compute.instanceAdmin.v1
roles/compute.instanceAdmin.v1
roles/iam.serviceAccountUser
roles/storage.admin
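An editorial note, not from the original question: the google provider's project argument expects the project ID (something like my-first-project-123456, a made-up example), not the display name "My First Project" that appears in the error. A quick way to cross-check which project ID the service account key actually sees, using the GOOGLE_APPLICATION_CREDENTIALS variable already mentioned above:

gcloud auth activate-service-account --key-file="$GOOGLE_APPLICATION_CREDENTIALS"
gcloud projects list   # compare the PROJECT_ID column with the value in provider.tf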

How to make Ansible Dynamic Inventory work with Google Cloud Platform (Google Compute Engine), GCP

I used Ansible to create a GCE cluster following the guide at: https://docs.ansible.com/ansible/latest/scenario_guides/guide_gce.html
At the end of the GCE creation, I used the add_host Ansible module to register all instances in their corresponding groups, e.g. gce_master_ip.
But when I try to run the following tasks after the creation task, they do not work:
- name: Create redis on the master
  hosts: gce_master_ip
  connection: ssh
  become: True
  gather_facts: True
  vars_files:
    - gcp_vars/secrets/auth.yml
    - gcp_vars/machines.yml
  roles:
    - { role: redis, tags: ["redis"] }
Within the auth.yml file I already provided the service account email, the path to the JSON credential file, and the project ID. But apparently that's not enough. I got errors like the one below:
UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Permission denied (publickey).\r\n", "unreachable": true}
This is a typical "SSH username and credentials not permitted or not provided" error. In this case I would say I did not set up the username and private key that Ansible should use for the SSH connection.
Is there anything I should do to make sure the corresponding credentials are provided to establish the connection?
During my search I saw one question briefly mention that you could use the gcloud compute ssh ... command. But is there a way to tell Ansible not to use classic ssh and to use the gcloud one instead?
To have Ansible SSH into a GCE instance, you'll have to supply an SSH username and private key which correspond to the SSH configuration available on the instance.
So the question is: If you've just used the gcp_compute_instance Ansible module to create a fresh GCE instance, is there a convenient way to configure SSH on the instance without having to manually connect to the instance and do it yourself?
For this purpose, GCP provides a couple of ways to automate and manage key distribution for GCE instances.
For example, you could use the OS Login feature. To use OS Login with Ansible:
When creating the instance with Ansible, enable OS Login on the target instance by setting the enable-oslogin metadata field to TRUE via the metadata parameter.
Make sure the service account attached to the instance that runs Ansible has both the roles/iam.serviceAccountUser and roles/compute.osAdminLogin roles.
Either generate a new SSH key pair or choose an existing one to be deployed to the target instance.
Upload the public key for use with OS Login: This can be done via gcloud compute os-login ssh-keys add --key-file [KEY_FILE_PATH] --ttl [EXPIRE_TIME] (where --ttl specifies how long you want this public key to be usable - for example, --ttl 1d will make it expire after 1 day)
Configure Ansible to use the Service Account's user name and the private key which corresponds to the public key uploaded via the gcloud command. For example by overriding the ansible_user and ansible_ssh_private_key_file inventory parameters, or by passing --private-key and --user parameters to ansible-playbook.
The service account username is the username value returned by the gcloud command above.
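A sketch of those last steps as shell commands; the key path ~/.ssh/ansible_key, the playbook name site.yml, and the --format expression are assumptions, not part of the original answer:

# Upload the public key for OS Login (valid for one day here):
gcloud compute os-login ssh-keys add --key-file ~/.ssh/ansible_key.pub --ttl 1d

# Look up the POSIX username OS Login assigned to the account:
gcloud compute os-login describe-profile --format="value(posixAccounts.username)"

# Run the playbook with that username and the matching private key:
ansible-playbook site.yml --user <oslogin-username> --private-key ~/.ssh/ansible_key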
Also, if you want to automatically set the enable-oslogin metadata field to "TRUE" across all instances in your GCP project, you can simply add a project-wide metadata entry. This can be done in the Cloud Console under "Compute Engine > Metadata".
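The same project-wide setting can also be applied from the command line with standard gcloud:

gcloud compute project-info add-metadata --metadata enable-oslogin=TRUE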

Why can't terraform SSH in to EC2 Instance using supplied example?

I'm using the AWS Two-tier example and I've copy-and-pasted the whole thing directly. terraform apply works right up to where it tries to SSH into the created EC2 instance. It loops several times, giving this output, before finally failing:
aws_instance.web (remote-exec): Connecting to remote host via SSH...
aws_instance.web (remote-exec): Host: 54.174.8.144
aws_instance.web (remote-exec): User: ubuntu
aws_instance.web (remote-exec): Password: false
aws_instance.web (remote-exec): Private key: false
aws_instance.web (remote-exec): SSH Agent: true
Ultimately, it fails w/:
Error applying plan:
1 error(s) occurred:
* ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
I've searched around and seen some older posts/issues saying to flip agent=false, and I've tried that as well with no change or success. I'm skeptical that this example is broken out of the box, yet I've done no tailoring or modifications that could have broken it. I'm using Terraform 0.6.11 installed via Homebrew on OS X 10.10.5.
Additional detail:
resource "aws_instance" "web" {
# The connection block tells our provisioner how to
# communicate with the resource (instance)
connection {
# The default username for our AMI
user = "ubuntu"
# The connection will use the local SSH agent for authentication.
agent = false
}
instance_type = "t1.micro"
# Lookup the correct AMI based on the region
# we specified
ami = "${lookup(var.aws_amis, var.aws_region)}"
# The name of our SSH keypair we created above.
key_name = "${aws_key_pair.auth.id}"
# Our Security group to allow HTTP and SSH access
vpc_security_group_ids = ["${aws_security_group.default.id}"]
# We're going to launch into the same subnet as our ELB. In a production
# environment it's more common to have a separate private subnet for
# backend instances.
subnet_id = "${aws_subnet.default.id}"
# We run a remote provisioner on the instance after creating it.
# In this case, we just install nginx and start it. By default,
# this should be on port 80
provisioner "remote-exec" {
inline = [
"sudo apt-get -y update",
"sudo apt-get -y install nginx",
"sudo service nginx start"
]
}
}
And from the variables tf file:
variable "key_name" {
description = "Desired name of AWS key pair"
default = "test-keypair"
}
variable "key_path" {
description = "key location"
default = "/Users/n8/dev/play/.ssh/terraform.pub"
}
but I can SSH in with this command:
ssh -i ../.ssh/terraform ubuntu@w.x.y.z
You have two possibilities:
Add your key to your ssh-agent:
ssh-add ../.ssh/terraform
and use agent = true in your configuration. This should then work for you.
Or modify your configuration to use the key directly, for example with
private_key = "${file("../.ssh/terraform")}"
in the connection block. Please consult the documentation for the exact syntax in your Terraform version.
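For the first option, a quick sketch of confirming the agent is running and actually holds the key before re-running terraform apply (the key path is the one from the question):

eval "$(ssh-agent -s)"       # start an agent if one is not already running
ssh-add ../.ssh/terraform    # load the private key
ssh-add -l                   # list loaded keys to confirm it is there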
I had the same issue, and the following configuration worked for me:
connection {
  type        = "ssh"
  user        = "ec2-user"
  private_key = "${file("path/to/key.pem")}"
  timeout     = "2m"
  agent       = false
}
Below is a complete and stand-alone resource "null_resource" with remote-exec provisioner w/ SSH connection including the necessary arguments supported by the ssh connection type:
private_key - The contents of an SSH key to use for the connection. These can be loaded from a file on disk using the file function. This takes preference over the password if provided.
type - The connection type that should be used. Valid types are ssh and winrm. Defaults to ssh.
user - The user that we should use for the connection. Defaults to root when using type ssh and defaults to Administrator when using type winrm.
host - The address of the resource to connect to. This is usually specified by the provider.
port - The port to connect to. Defaults to 22 when using type ssh and defaults to 5985 when using type winrm.
timeout - The timeout to wait for the connection to become available. This defaults to 5 minutes. Should be provided as a string like 30s or 5m.
agent - Set to false to disable using ssh-agent to authenticate. On Windows the only supported SSH authentication agent is Pageant.
resource null_resource w/ remote-exec example code below:
resource "null_resource" "ec2-ssh-connection" {
provisioner "remote-exec" {
inline = [
"sudo apt-get update",
"sudo apt-get install -y python2.7 python-dev python-pip python-setuptools python-virtualenv libssl-dev vim zip"
]
connection {
host = "100.20.30.5"
type = "ssh"
port = 22
user = "ubuntu"
private_key = "${file(/path/to/your/id_rsa_private_key)}"
timeout = "1m"
agent = false
}
}
}
The solution provided in https://stackoverflow.com/a/35382911/12880305 was not working for me, so I tried other approaches and found that the issue was the key pair type I was using.
I was using an RSA key pair, and because of that I was getting this error:
ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
I created a new key pair of the ED25519 type and it works perfectly fine for me.
https://aws.amazon.com/about-aws/whats-new/2021/08/amazon-ec2-customers-ed25519-keys-authentication/
Connection block which I used
connection {
  // Use the public IP of the instance to connect to it.
  host        = self.public_ip
  type        = "ssh"
  user        = "ubuntu"
  private_key = file("pemfile-location.pem")
  timeout     = "1m"
  agent       = true
}
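If you need to create such an ED25519 key pair, a minimal sketch (the file name my-ec2-key and the key pair name are examples, not from the original answer):

# Generate an ED25519 key pair locally:
ssh-keygen -t ed25519 -f my-ec2-key -N ""

# Import the public half into EC2 as a named key pair:
aws ec2 import-key-pair --key-name my-ec2-key \
    --public-key-material fileb://my-ec2-key.pub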
Check the user name that exists in the base image. For example, it can be ubuntu for Ubuntu images, or ec2-user for Amazon Linux AMIs.
Alternatively, most cloud providers allow Terraform to create a new user on first instance start with the help of a cloud-init config (check your provider's documentation):
metadata = {
  user-data = "${file("./user-meta-data.txt")}"
}
user-meta-data.txt:
#cloud-config
users:
  - name: <NEW-USER-NAME>
    groups: sudo
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa <SSH-PUBLIC-KEY>
Increase the connection timeout settings; sometimes it takes 1-2 minutes after instance start for networking and SSH to come up.
connection {
  type        = "ssh"
  user        = "<USER_NAME>"
  private_key = "${file("pathto/id_rsa")}"
  timeout     = "3m"
}
If it does not work, try to connect manually via ssh with -v for verbose output:
ssh -v -i <path_to_private_key/id_rsa> <USER_NAME>@<INSTANCE_IP>
The top answer does not work for me.
The answer that uses ED25519 does work for me, but it is not necessary to use the PEM format.
Here is a working sample for me:
connection {
  host        = "${aws_instance.example.public_ip}"
  type        = "ssh"
  port        = "22"
  user        = "ubuntu"
  timeout     = "120s"
  private_key = "${file("${var.key_location}")}"
  agent       = false
}
with the variable set to:
key_location = "~/.ssh/id_ed25519"