Error: Facter: GCE metadata request failed: Timeout was reached - google-cloud-platform

This concerns GCP Compute Engine instances. My instances cannot fetch GCE metadata from the metadata server, so when I install OpenStack on Google Cloud (GCP) via packstack on a CentOS image I get the error below.
The VM instances are part of the default network with no firewall rules. I can also log in to the nodes externally, which suggests the network is OK.
ERROR : Error appeared during Puppet run: 10.142.0.16_compute.pp
Error: Facter: GCE metadata request failed: Timeout was reached
In /var/log/messages I get this message repeatedly:
Oct 25 20:16:31 controller-8 google_guest_agent[146448]: ERROR main.go:190 Network error when requesting metadata, make sure your instance has an active network and can reach the metadata server.
Oct 25 20:16:31 controller-8 google_guest_agent[146448]: ERROR main.go:193 Error watching metadata: Get http://169.254.169.254/computeMetadata/v1//?recursive=true&alt=json&wait_for_change=true&timeout_sec=60&last_etag=6f06fe6d055dd9f5: dial tcp 169.254.169.254:80: connect: no route to host
Oct 25 20:19:07 controller-8 OSConfigAgent[146888]: 2021-10-25T20:19:07.5468Z OSConfigAgent Error main.go:218: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=6f06fe6d055dd9f5&timeout_sec=60: dial tcp 169.254.169.254:80: connect: no route to host
Oct 25 20:20:08 controller-8 OSConfigAgent[146888]: 2021-10-25T20:20:08.9868Z OSConfigAgent Error main.go:218: network error when requesting metadata, make sure your instance has an active network and can reach the metadata server: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=6f06fe6d055dd9f5&timeout_sec=60: dial tcp 169.254.169.254:80: connect: no route to host
Oct 25 20:21:10 controller-8 OSConfigAgent[146888]: 2021-10-25T20:21:10.4268Z OSConfigAgent Error main.go:218: network error when requesting metadata, make sure your instance has an active network and can reach the metadata server: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=6f06fe6d055dd9f5&timeout_sec=60: dial tcp 169.254.169.254:80: connect: no route to host
Oct 25 20:22:10 controller-8 OSConfigAgent[146888]: 2021-10-25T20:22:10.7148Z OSConfigAgent Error main.go:218: network error when requesting metadata, make sure your instance has an active network and can reach the metadata server: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=6f06fe6d055dd9f5&timeout_sec=60: dial tcp 169.254.169.254:80: connect: no route to host

IIUC, the error says connect: no route to host, so you could try using the metadata.google.internal domain in place of the IP 169.254.169.254.
See the documentation section "Parts of a metadata request", and first make sure /etc/resolv.conf lists the correct nameserver. You should also check the service account setting, as John Hanley suggested.
Thanks, @John Hanley. This draws on your tips and on the earliest answer to the question "Why can't I access Metadata Server of GCP Instance?".
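To compare the two addresses from inside the instance, a minimal Python sketch such as the following may help. It assumes the standard metadata URL layout seen in the logs and uses the instance/hostname path as an example. The function names are hypothetical, and the Metadata-Flavor: Google header is the one the metadata server expects on every request.

```python
import urllib.request

# Both ways of addressing the metadata server: the DNS name suggested in the
# answer and the link-local IP that appears in the logs.
METADATA_HOSTS = ("metadata.google.internal", "169.254.169.254")


def build_metadata_request(host, path="instance/hostname"):
    """Build (but do not send) a GCE metadata request for the given host."""
    url = f"http://{host}/computeMetadata/v1/{path}"
    # Requests without this header are rejected by the metadata server.
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def probe_metadata(host, timeout=5.0):
    """Return the response body as text, or a short description of the failure."""
    request = build_metadata_request(host)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return f"FAILED: {exc}"
```

Calling probe_metadata(host) for each entry in METADATA_HOSTS on the affected VM shows whether the failure is specific to one form of the address. If both fail with "no route to host", the problem is more likely routing on the instance than name resolution.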

Related

Reason for sudden inability to SSH into GCP VM instance

I was no longer able to SSH into a Google Cloud Compute Engine VM instance that previously showed no problems.
The error logs show the following
#type: "type.googleapis.com/google.protobuf.Struct" value: {
conditionNotMet: { userVisibleMessage: "Supplied fingerprint does not
match current metadata fingerprint."
Trying SSH through the console showed
Code: 4003 Reason: failed to connect to backend Please ensure that:
your user account has iap.tunnelInstances.accessViaIAP permission
VM has a firewall rule that allows TCP ingress traffic from the IP range XXX.0/20, port: 22
you can make a proper https connection to the IAP for TCP hostname: https://tunnel.cloudproxy.app You may be able to connect without using
the Cloud Identity-Aware Proxy.
The VM instance logs showed the following
Error watching metadata: Get
http://metadata.google.internal/computeMetadata/v1//?recursive=true&alt=json&wait_for_change=true&timeout_sec=60&last_etag=XXX:
net/http: request canceled (Client.Timeout exceeded while awaiting
headers)
After stopping and restarting the instance I was able to ssh again but I would like to understand the reason for the problem in the first place.
The error message you received indicates that the metadata server's response caused the connection to the Google Compute Engine VM instance to time out. This could be because the server was taking too long to respond or there was a problem with the network. You can try to resolve this issue by either increasing the timeout value by using this doc or waiting for the instance to become healthy using the gcloud compute wait command.
The instance was unable to reach the metadata server, as suggested by the timeout error message. This could be a problem with the instance itself or with the network connection. A firewall or network configuration issue could have prevented the instance from connecting to the metadata server, or an issue with the underlying infrastructure could have rendered the instance temporarily unavailable.
To prevent this issue from happening again, you can increase the timeout value or use the gcloud compute wait command to wait for the instance to become healthy. It is also recommended that you regularly update the SSH key used to connect to the instance, and check that the instance can reach the metadata server by making an HTTPS request to the IAP for TCP hostname. Additionally, ensure that your user account has the "iap.tunnelInstances.accessViaIAP" permission and that the VM has a firewall rule that allows TCP ingress traffic from the IP range XXX.0/20 on port 22.
If you are using a Windows VM, try the troubleshooting steps mentioned in this doc.

Pushing Cloudwatch Logs from Linux Instance - RequestError: Server Misbehaving

I am trying to push logs to Cloudwatch from a RHEL Instance. Originally I was getting the error:
[outputs.cloudwatchlogs] Aws error received when sending logs to LogGroup/LogStream: RequestError: send request failed caused by: Post "https://logs.<region>.amazonaws.com/": dial tcp xx.xx.xx.xx:443: i/o timeout
I tried everything I could think of. I read online that it could be proxy related, and I have a proxy server instance on the AWS account.
I added the following into the common-config.toml for Cloudwatch:
[proxy]
http_proxy = "htttp://${PROXY_SERVER}:$PORT"
https_proxy = "http://${PROXY_SERVER}:$PORT"
no_proxy = "XX.XX.XX.XX"
The error I am getting now is:
[outputs.cloudwatchlogs] Aws error received when sending logs to LogGroup/LogStream: RequestError: send request failed caused by: Post "https://logs.<region>.amazonaws.com/": proxyconnect tcp: dial tcp: lookup http on XX.XX.XX.XX:53: server misbehaving
I am in a private VPC and I can't ping public sites as I get 100% packet loss. I can manually run the AWS Cli command to push an entry into the log stream. For now I am just trying to push /var/log/messages from my instance. Can anyone help with why the CloudWatch logs aren't pushing?
Some things I have tried that didn't work for the original error:
exporting no_proxy
adding AWS_STS_REGIONAL_ENDPOINTS as an env variable
Ensuring port 443 is open on SG's
Ensuring IAM profile has correct permissions for CW and EC2

Can not access Google Cloud Instance

I get the following error when connecting to a Google Cloud instance through the serial port. It starts as soon as I run this command:
gcloud compute connect-to-serial-port instance-1
This is the error:
Sep 20 14:28:35 instance-1 OSConfigAgent[670]: 2022-09-20T14:28:35.5396Z OSConfigAgent Error main.go:196: network error when requesting metadata, make sure your instance has an active network and can reach the metadata server: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=b6d33d232458e45a&timeout_sec=60: dial tcp 169.254.169.254:80: connect: network is unreachable
Sep 20 14:29:33 instance-1 OSConfigAgent[670]: 2022-09-20T14:29:33.5432Z OSConfigAgent Warning: Error waiting for task (attempt 10 of 10): error fetching Instance IDToken: error getting token from metadata: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/identity?audience=osconfig.googleapis.com&format=full: dial tcp 169.254.169.254:80: connect: network is unreachable
I am also unable to access the instance using external IP, and the SSH does not work either. SSH throws the following error:
These are my network rules.
I don't want to restart my instance because I have a job running in Jenkins, and a restart would throw away the whole day's runtime.
Your error message, "Connection via Cloud Identity-Aware Proxy Failed", occurs when you try to SSH to a VM that has no public IP address and for which you haven't configured Identity-Aware Proxy on port 22.
You can create a firewall rule on port 22 that allows ingress traffic from Identity-Aware Proxy.
Also, as @John Hanley suggested, check whether your VM has a service account.

IIS FTP behind AWS VPC Endpoint

Giving context: I have an FTP server running in IIS on a WS2019 EC2 instance in VPC A that needs to be accessed from VPCs B and C. A and B are in AWS, while C is in GCP.
VPCs A and B have a peering connection.
VPCs B and C are connected through a VPN.
VPC C doesn't exchange data with A, except for this FTP server; therefore, sustaining a VPN is expensive for what I need.
I followed this guideline to build the NLB in VPC A, then attached it to a VPC endpoint in VPC B.
How to run an FTPS server behind the AWS Network Load Balancer | by Michael Kirk | Medium
When I test the TCP connection, it works just fine
PS C:\Users\johndoe> Test-NetConnection -ComputerName vpce-0948b61f1f991b98b-1w539hu9.vpce-svc-0ed1458eb15584b09.us-east-1.vpce.amazonaws.com -Port 21
ComputerName : vpce-0948b61f1f991b98b-1w539hu9.vpce-svc-0ed1458eb15584b09.us-east-1.vpce.amazonaws.com
RemoteAddress : 10.70.255.253
RemotePort : 21
InterfaceAlias : WAN
SourceAddress : 10.58.32.20
TcpTestSucceeded : True
But, when I try to connect through the FTP client, I receive the following error message:
Status: Resolving address of vpce-0948b61f1f991b98b-1w539hu9.vpce-svc-0ed1458eb15584b09.us-east-1.vpce.amazonaws.com
Status: Connecting to 10.70.255.253:21...
Status: Connection established, waiting for welcome message...
Status: Initializing TLS...
Status: TLS connection established.
Status: Logged in
Status: Retrieving directory listing...
Command: PWD
Response: 257 "/" is current directory.
Command: TYPE I
Response: 200 Type set to I.
Command: PASV
Response: 227 Entering Passive Mode (10,74,163,58,43,209).
Command: LIST
Response: 150 Opening BINARY mode data connection.
Error: Connection timed out after 20 seconds of inactivity
Error: Failed to retrieve directory listing
Does it make a difference if my passive mode answers with the public or the private IP address? I've checked all the security groups and route tables...
Can someone help me figure out what's going on, please?
You need to understand FTP passive mode. The FTP client connects to the FTP server at 10.70.255.253:21. For the LIST command, the FTP server sets up a data address and port, 10.74.163.58:11217 (the last two numbers in the 227 reply encode the port as 43 × 256 + 209), and waits.
The FTP client is supposed to connect to that IP:PORT. It is not establishing that connection, and the data connection times out after 20 seconds.
Notice that the address the FTP client uses for the command connection (10.70.255.253) is not the same as the address (10.74.163.58) the server advertises for data transfer. You have a configuration problem with the FTP server.
Note: Since the FTP Client is connecting to the FTP Server for data transfer commands on a different port than 21, you must also allow those ingress ports thru the firewall.
Note: Most FTP clients will not connect to a different IP address for security reasons.
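To see exactly which address and port the server is advertising, the 227 reply can be decoded by hand or with a short script. The sketch below follows the PASV format from RFC 959: four address bytes, then the high and low bytes of the port. The function name parse_pasv_reply is made up for illustration.

```python
import re


def parse_pasv_reply(reply):
    """Extract (ip, port) from an FTP 227 reply such as the one in the log above."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    ip = f"{h1}.{h2}.{h3}.{h4}"
    # The port is sent as two bytes: high byte first, then low byte.
    port = p1 * 256 + p2
    return ip, port


if __name__ == "__main__":
    reply = "227 Entering Passive Mode (10,74,163,58,43,209)."
    ip, port = parse_pasv_reply(reply)
    print(f"data connection target: {ip}:{port}")
    # Compare with the address used for the control connection in the question.
    print("same host as control connection:", ip == "10.70.255.253")
```

For the reply in the question this gives 10.74.163.58 and port 11217, so the data connection is directed at the instance's own address rather than the VPC endpoint address, and at a port that the endpoint and firewall would also have to pass.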

Cf Logs connections failed because connected host has failed to respond

I have a problem with cf logs. When I run cf logs, I get the following error:
C:\Users\Z003PCEU> cf logs hello-spring-cloud
FAILED
Error dialing traffic controller server: dial tcp 139.25.25.200:4443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.. Please ask your Cloud Foundry Operator to check the platform configuration (traffic controller is wss://doppler.sys.de.cloudfoundry.it-platforms.net:4443).
Using curl to get access provides the following info:
Proxy error
503
The proxy is only needed for communication outside the company. cf should not use it.
Removing the proxy from the console results in:
Failed to connect to 10.0.0.17 port 4443: Connection refused
10.X.X.X is the cloud internal network.
Does anyone have an idea?
It was a firewall problem: port 4443 was not open. After changing the firewall configuration, it works.
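A blocked port like this can be confirmed from the client before involving the platform operator. The following is a minimal sketch using only the Python standard library. The helper name tcp_port_open is hypothetical, and the host and port to pass in are the ones from the error message (for example 139.25.25.200 and 4443).

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts alike.
        return False
```

A False result only says the TCP handshake did not complete. It does not distinguish a firewall drop from a service that is not listening, so it is best read together with the error text (timeout versus "Connection refused").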