Reason for sudden inability to SSH into GCP VM instance - google-cloud-platform

I was no longer able to SSH into a Google Cloud Compute Engine VM instance that previously showed no problems.
The error logs show the following
#type: "type.googleapis.com/google.protobuf.Struct" value: {
conditionNotMet: { userVisibleMessage: "Supplied fingerprint does not
match current metadata fingerprint."
Trying SSH through the console showed
Code: 4003 Reason: failed to connect to backend Please ensure that:
your user account has iap.tunnelInstances.accessViaIAP permission
VM has a firewall rule that allows TCP ingress traffic from the IP range XXX.0/20, port: 22
you can make a proper https connection to the IAP for TCP hostname: https://tunnel.cloudproxy.app You may be able to connect without using
the Cloud Identity-Aware Proxy.
The VM instance logs showed the following
Error watching metadata: Get
http://metadata.google.internal/computeMetadata/v1//?recursive=true&alt=json&wait_for_change=true&timeout_sec=60&last_etag=XXX:
net/http: request canceled (Client.Timeout exceeded while awaiting
headers)
After stopping and restarting the instance I was able to ssh again but I would like to understand the reason for the problem in the first place.

The error message you received indicates that the metadata server's response caused the connection to the Google Compute Engine VM instance to time out. This could be because the server was taking too long to respond or there was a problem with the network. You can try to resolve this issue by either increasing the timeout value by using this doc or waiting for the instance to become healthy using the gcloud compute wait command.
The instance was unable to reach the metadata server, as suggested by the timeout error message. This could be a problem with the instance itself or with the network connection. A firewall or network configuration issue could have prevented the instance from connecting to the metadata server, or an issue with the underlying infrastructure could have rendered the instance temporarily unavailable.
To prevent this issue from happening again, you can increase the timeout value or use the gcloud compute wait command to wait for the instance to become healthy.it is recommended that you regularly update the SSH key used to connect to the instance, and check that the instance can reach the metadata server by making an HTTPS request to the IAP for TCP hostname. Additionally, it is important to ensure that your user account has the "iap.tunnelInstances.accessViaIAP" permission, and that the VM has a firewall rule that allows TCP ingress traffic from the IP range XXX.0/20, port: 22.
If you are using windows vm try troubleshooting steps mentioned in this doc.

Related

Google Compute Engine Unable to Access VM

I am unable to ssh into a VM on GCP Compute Engine
However, when I run the command with the --troubleshoot flag, it seems like everything is okay.
When I connect through the console, I get an error message saying "You cannot connect to the VM instance because of an unexpected error".
Also, other people from my organization are able to connect.
I am unable to figure out what the error is. Any help would be appreciated.
From the error message that you got "Permission denied (publickey)" you can check this documentation for further troubleshooting.
Further more you can investigate also the Identity-Aware Proxy (IAP).
If you use Identity-Aware Proxy (IAP) for TCP forwarding, update your custom firewall rule to accept traffic from IAP, then check your IAM permissions.
Update your custom firewall rule to allow traffic from 35.235.240.0/20, the IP address range that IAP uses for TCP forwarding. For more information, see Create a firewall rule.
Grant permissions to use IAP TCP forwarding, if you haven't already done so.
For the error message "You cannot connect to the VM instance because of an unexpected error".
The VM is booting up and sshd is not running yet. You can't connect to a VM before it is running.
To resolve this issue, wait until the VM has finished booting and try to connect again.
The firewall rule allowing SSH is missing or misconfigured. By default, Compute Engine VMs allow SSH access on port 22. If the default-allow-ssh rule is missing or misconfigured, you won't be able to connect to VMs.
To resolve this issue, Check your firewall rules and re-add or reconfigure default-allow-ssh.
sshd is running on a custom port. If you configured sshd to run on a port other than port 22, you won't be able to connect to your VM.
To resolve this issue, create a custom firewall rule allowing tcp traffic on the port that your sshd is running on using the following command:
gcloud compute firewall-rules create FIREWALL_NAME \
--allow tcp:PORT_NUMBER
For further troubleshooting on SSH you see this documentation on Common SSH errors.

ALB results in 504 gateway time out error with ECS

I have an httpd container with ECS service along with ALB.
Container with ALB are using a dynamic port feature which means host port is set to 0.
if i try to ssh in an instance container and try to curl localhost:port number it works.
But when i try to use ALB DNS name it turns out to 504.
ALb security group allows HTTP 80 connections from anywhere and instance sg allows any connection on any port from alb sg.
Interestingly
when I try to check the target group associated with alb all the instances are unhealthy.
Update:- i tried to open a security group of ecs container to public and yet the instance were not healthy
you need to check the events of the ECS service and see what is the exact error message. If it states something like port 45675 is unhealthy then you need to check your security group configuration, it should get rid of 504 error message. If it states health check failed (this should give 502) then you should ssh into the container and check on which port the application is running and create a new service with the modification.
Assuming, you have configured the health check for traffic port and haven't modified it.
httpd service generally works on port 80. So I'll suggest use the container port as 80.
504 is Gateway Timeout error, if the above information doesn't help you can provide look at the AWS link here - https://aws.amazon.com/premiumsupport/knowledge-center/troubleshoot-http-5xx/
If you can share the error message from the ecs events that will help in narrowing down the issue.
Adding the screenshots of the changes I made to fix the issue, I hope it helps. I am assuming you are using the default httpd image -

Code: 4010 - Connection via Cloud Identity-Aware Proxy Failed

I have seen a similar error on stackoverflow, but with a different code (so maybe it is not the same?). Any how I have been thrown this error spontaneously. Sometimes after 20 seconds of starting my instance (and launching SSH in browser), sometimes after 30 minutes, but it completely shuts down my instance.
Connection via Cloud Identity-Aware Proxy Failed
Code: 4010
Reason: destination read failed
You may be able to connect without using the Cloud Identity-Aware Proxy.
If I click the "Cloud Identity-Aware Proxy" button I am getting:
Connection Failed
We are unable to connect to the VM on port 22. Learn more about possible causes of this issue.
Any idea what is happening? I havn't done any changes in my instance settings for a long time.
The issue regarding the Identity-Aware Proxy(IAP) connection to the instance is due to the lack of a firewall rule allow-ingress-from-iap with this IP ranges 35.235.240.0/20 that needs to be configured when using IAP.
To Allow SSH access to all VM instances in your network, do the following:
1- Open the Firewall Rules page (Navigation menu > VPC network > Firewall) and click Create firewall rule
2- Configure the following settings:
Name: allow-ingress-from-iap
Direction of traffic: Ingress
Target: All instances in the network
Source filter: IP ranges
Source IP ranges: 35.235.240.0/20
Protocols and ports: Select TCP and enter 22 to allow SSH
3- Click Create

AWS Connection timeout + EC2 Instance Connect not working

I tried to connect to a running ec2 instance with my usual settings, it returns
ssh: connect to host ec2 port 22: Connection timed out
I tried to connect with the built-in "EC2 Instance Connect", to connect directly from the browser with the AWS account, it returns
There was a problem setting up the instance connection An error
occurred and we were unable to connect or stay connected to your
instance. If this instance has just started up, try again in a minute
or two.
The instance was running for weeks, I am the only user with access to the AWS account and the SSH Keys and I didn t change any setting in the last ~3 weeks or restarted it
1st the timeout started ~1 week ago, nand then without any other change, my website (wordpress) suddenly started to show a database connection error (the database in inside the EC2 instance as well)
What I used to connect :
Either
ssh -i "Keys.pem" ec2-user#ec2-[public ip].eu-west-3.compute.amazonaws.com
Or
ssh ec2-user#[public ip] -i "Keys.pem"
Both show the same error. I used the first one several weeks ago and it used to work well
This timeout will be caused by invalid security group rules.
Ensure that the security group rules attached to your instance allow inbound access from the source IP address you're trying to SSH from, the database connection may also be related to this.
If you're connecting using a dynamic public IP address to SSH to your host, you will need to adjust this every time your IP address changes. It might be more appropriate to setup a VPN so that you can connect privately to your host.

AWS EC2 Instance: SSH instance Operation timed out when connectiong through Mac

I launched Amazon Linux instance and I am using a default security group with following settings:
Type:All traffic
Protocol:All
Port Range:All
But when connecting through ssh from my Mac I get Operation Timed Out message:
ssh -i "<key in double quotes>" ec2-user#<>.amazonaws.com
result in
ssh: connect to host <>.amazonaws.com port 22: Operation timed out
I am not sure what could be the reason. Can someone please help?
A time-out normally indicates there is no network connectivity to the remote computer. A simple rule-of-thumb is:
If the error comes back immediately, then the SSH request has been rejected by the remote computer
If the error takes some time to come back (eg 5+ seconds), then it never reached the remote computer
Some potential causes:
Something else is blocking the access, such as a corporate firewall. Try from a different network (eg home, work, tether via your phone) to try and diagnose this situation.
The instance might be in a private subnet
The instance might be in a subnet that is incorrectly configured (eg not routing to an Internet Gateway to make a 'Public' subnet)