Google Cloud not managing users/SSH in VMs - google-cloud-platform

We upgraded the Debian distribution on a Google Cloud instance, and it seems GCloud can no longer manage the users and their SSH keys on that instance.
I have the following tools installed:
google-cloud-packages-archive-keyring/now 1.2-499050965 all
google-cloud-sdk/cloud-sdk-bullseye,now 412.0.0-0 all
google-compute-engine-oslogin/google-compute-engine-bullseye-stable,now 1:20220714.00-g1+deb11 amd64
google-compute-engine/google-compute-engine-bullseye-stable,now 1:20220211.00-g1 all
google-guest-agent/google-compute-engine-bullseye-stable,now 1:20221109.00-g1 amd64
I cannot connect through the UI. It gets stuck on "Transferring SSH keys to the instance". The "troubleshooting" tool says that everything is fine.
When trying to connect via gcloud compute ssh it dies with
permission denied (publickey)
I still have access to the instance with some other user, but no new users are created and no SSH keys transferred.
What else am I missing?
EDIT:
Have you added the SSH key to Project metadata or Instance metadata? If it's instance metadata, is the project-level SSH key blocked?
I haven't added any metadata.
Does your user account have the necessary permission in the project to SSH to the instance (e.g. the Owner, Editor or Compute Instance Admin IAM role)?
Yes, this worked correctly until the Debian upgrade to bookworm. I could see that all the google-cloud related packages had been removed, and I had to install them again.
Are you able to SSH to the instance using an SSH client, e.g. PuTTY? If yes, you need to make sure the Google account manager daemon is running on the instance.
I can SSH fine with accounts which were active on the machine BEFORE the Debian upgrade. These accounts already have a .ssh directory correctly set up and working. New Google users cannot log in.
Try gcloud beta compute ssh --zone ZONE INSTANCE_NAME --project PROJECT
This works only for users active before the Debian upgrade.
 If yes, you need to make sure Google account manager daemon is running on the instance.
I installed the google-compute-engine-oslogin package which was missing, but it seems it has no effect and new users still cannot login.
EDIT2:
When connecting to serial console, it gets stuck on: csearch-dev google_guest_agent[2839775]: ERROR non_windows_accounts.go:158 Error updating SSH keys for gke-495d6b605cf336a7b160: mkdir /home/gke-495d6b605cf336a7b160/.ssh: no such file or directory. - the same issue, SSH keys are never transferred into the instance.

There are a few things you can do to troubleshoot the Permission denied (publickey) error message:
To start, make sure that you have authenticated with gcloud as an IAM user that has the Compute Instance Admin role. You can do that by running gcloud auth login [USER], then trying gcloud compute ssh again.
You can also verify that the Linux Guest Environment scripts are properly installed and running. Please refer to this page for information about validating, updating, or manually installing the guest environment.
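As a rough sketch of that check on a Debian-based VM, the snippet below reports which of the guest packages named in the question are not installed. The helper name missing_from is made up for illustration, and the systemd unit name google-guest-agent is assumed to match the package name; adjust both to your image.

```shell
#!/bin/sh
# Guest-environment packages listed in the question.
REQUIRED="google-guest-agent google-compute-engine google-compute-engine-oslogin"

# Hypothetical helper: read installed package names on stdin (one per line)
# and print every required package that is not among them.
missing_from() {
    installed=$(cat)
    for pkg in $REQUIRED; do
        printf '%s\n' "$installed" | grep -qxF "$pkg" || printf '%s\n' "$pkg"
    done
}

# Run on the VM itself (not executed here). Only packages in state "ii"
# count as installed, so removed-but-not-purged packages are reported:
#   dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' \
#       | awk '$1 == "ii" {print $2}' | missing_from
#   systemctl is-active google-guest-agent
```

An empty result from missing_from only shows the packages are present; the agent still has to be running for keys to be synced.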
Another possibility is that the private key was lost or that we have a mismatched keypair. To force gcloud to generate a new SSH keypair, you must first move ~/.ssh/google_compute_engine and ~/.ssh/google_compute_engine.pub if present, for example:
mv ~/.ssh/google_compute_engine.pub ~/.ssh/google_compute_engine.pub.old
mv ~/.ssh/google_compute_engine ~/.ssh/google_compute_engine.old
Once that is done, try gcloud compute ssh [INSTANCE-NAME] again. A new keypair should be created and the public key added to the SSH keys metadata.
Refer to Sunny-j's answer for how to review the serial-port logs of the affected instance for possible clues about the issue. Also refer to "Resolving getting locked out of a Compute Engine" for more information.
Edit1:
Refer to this similar SO question and to "Troubleshooting using the serial console", which may help resolve your error.
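For the serial-port route, a small filter like the one below can narrow the output down to guest-agent errors such as the one quoted in the question. This is only a sketch: guest_agent_errors is a hypothetical helper, and INSTANCE_NAME and ZONE are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: keep only guest-agent ERROR lines from serial
# console output supplied on stdin.
guest_agent_errors() {
    grep 'google_guest_agent\[[0-9]*]: ERROR'
}

# Typical use from a workstation (not executed here):
#   gcloud compute instances get-serial-port-output INSTANCE_NAME --zone ZONE \
#       | guest_agent_errors
```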
EDIT2:
Maybe you have git-all installed. If so, the older SysV init system takes the place of systemd, which disrupts cloud-init and virtually every other step of the boot process. As a result, you are unable to SSH into your instance.
Check out these potential solutions to the above problem:
1. Try using git instead of git-all.
2. If git-all is necessary, use apt install --no-install-recommends -y git-all to prevent the installation of recommended packages.
Finally: if you were previously able to SSH into the instance with a particular SSH key for new users, then either the SSH daemon is not running or is otherwise broken, or you somehow removed that SSH key. It would appear that you damaged this machine during the upgrade.
Why is this particular VM instance required? Does it contain significant data? If so, you can turn it off, mount its disk on a new VM instance, and copy that data off. (I'd recommend building another machine running these services, from the latest snapshot or from scratch, and using that instead.)
You should probably move to a new machine if it runs a service: There is no way to tell what still works and what doesn't, even if you are able to access the instance.
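One way to get at the data without touching the original disk is to snapshot it and attach a copy to a healthy VM. The sketch below only prints the commands unless DRY_RUN=0 is set; every name passed to rescue_disk is a placeholder, and the "-rescue" suffixes are invented for illustration.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Arguments (all placeholders): broken VM, its disk, a healthy rescue VM, zone.
rescue_disk() {
    broken=$1
    disk=$2
    rescue=$3
    zone=$4
    run gcloud compute instances stop "$broken" --zone "$zone"
    run gcloud compute disks snapshot "$disk" --snapshot-names "$disk-rescue" --zone "$zone"
    run gcloud compute disks create "$disk-rescue" --source-snapshot "$disk-rescue" --zone "$zone"
    run gcloud compute instances attach-disk "$rescue" --disk "$disk-rescue" --zone "$zone"
}

# Example: rescue_disk old-vm old-disk rescue-vm us-central1-a
# Then, on the rescue VM, find the new device with lsblk and mount it.
```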

Related

SSH into VM in private network not working

I had created several VMs in GCP in a private network ie. With no Public IP Address associated with the instances. The SSH Button from the console was useable and I was able to SSH to any instance at that time.
The next week when I tried to SSH again, I was unable to click on SSH and it shows
I have not changed anything from my end. All infrastructure is managed by terraform only and no one has changed that.
Can it be due to any other API enabling/disabling? Any help would be really appreciated. Thank you
One thing that could be preventing you from establishing an SSH connection to your VM is IAM permissions. If your IAM permissions have been edited recently, that could explain why the SSH button is disabled. Please check whether you have the Compute Instance Admin role on your VMs.
You can also verify that the changes have been applied correctly by running the following command in Cloud Shell:
gcloud policy-troubleshoot iam RESOURCE --principal-email=EMAIL \
--permission=PERMISSION
This should give you further detail on the IAM settings for a given user, and may give you a clue about what is causing the SSH issue.
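For a VM, the resource argument is the instance's full resource name. A minimal sketch of building it follows; instance_resource is a hypothetical helper, all values in the example are placeholders, and compute.instances.setMetadata is just one permission that metadata-based SSH keys commonly depend on, so the permission that matters in your case may differ.

```shell
#!/bin/sh
# Build the full resource name of a Compute Engine instance
# from project, zone and instance name.
instance_resource() {
    printf '//compute.googleapis.com/projects/%s/zones/%s/instances/%s\n' "$1" "$2" "$3"
}

# Example (placeholders throughout; not executed here):
#   gcloud policy-troubleshoot iam \
#       "$(instance_resource my-project us-central1-a my-vm)" \
#       --principal-email=user@example.com \
#       --permission=compute.instances.setMetadata
```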

Cannot connect via SSH to GCP Instance

Friends good night.
I have a server on Google Compute Engine, which I do not have access to via ssh and the old administrator did not leave access to it.
Is there any possibility to access this server either through SDK, GCP Console, etc.?
Thank you very much in advance.
If you or your team have an IAM account on the project with sufficient roles/permissions (Owner, Compute Admin), you can try the following:
Check this troubleshooting documentation to identify and solve your issue.
Try to access the VM through the serial port.
I had mistakenly locked myself out via the files /etc/hosts.allow and /etc/hosts.deny. It took me a day to get access to the server back, and I hope the steps below will help someone else who is locked out of a GCP VM. The idea is to create a script that runs when your VM boots, so that the commands that fix your issue run without direct access to the server. As an example, here is how you can reset the root password.
I am assuming that you have access to the GCP console via a browser. Do the following:
Shutdown the server
Click on edit and scroll down to Custom metadata. Add a new item with key as startup-script and the value as below. Replace yournewpassword with the password you want to set for the root user:
#!/bin/sh
echo "root:yournewpassword" | chpasswd
Start your server again and use the new password set above to SSH to your VM.
Remove the startup-script metadata item and save your VM. You can reboot again.
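The same trick can undo the hosts.allow / hosts.deny lockout itself, and the metadata can be set from the CLI instead of the console. The following is a sketch under assumptions: write_fix_script is a hypothetical helper, the ".locked-out" suffix is arbitrary, and INSTANCE_NAME and ZONE are placeholders.

```shell
#!/bin/sh
# Write a startup script that moves the lockout files aside at next boot.
write_fix_script() {
    cat > "$1" <<'EOF'
#!/bin/sh
# Runs as root at boot; keep a copy of each file rather than deleting it.
for f in /etc/hosts.allow /etc/hosts.deny; do
    if [ -f "$f" ]; then
        mv "$f" "$f.locked-out"
    fi
done
EOF
}

# Attach, boot once, then remove the script (not executed here):
#   write_fix_script fix.sh
#   gcloud compute instances add-metadata INSTANCE_NAME --zone ZONE \
#       --metadata-from-file startup-script=fix.sh
#   gcloud compute instances start INSTANCE_NAME --zone ZONE
#   gcloud compute instances remove-metadata INSTANCE_NAME --zone ZONE \
#       --keys startup-script
```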

How to give permission for an IAM service account to run a docker container within a GCP VM?

I am trying to run a docker image on startup of a Google Cloud VM. I have selected a fresh service account that I created as the Service Account under VM Instance Details through the console. For some reason the docker run command within the startup script is not working. I suspect this is because the service account is not authorized to run the "docker" command within the VM - which was installed via a yum install. Can anyone tell me how this can be done i.e. to give this service account the permission to run docker command?
Edit.
Inside the startup script I am running docker login command to login to Google Container Registry followed by a docker run to run an image.
I have found a solution and want to share it here in case it helps someone else trying to do the same thing. The user running the docker command (without sudo) needs to be in the docker group. So I added the service account as a user, put it in the docker group, and that was it: docker login to GCR worked, and so did docker run. So the problem is solved, but this raises a couple of additional questions.
First, is this the correct way to do it? If it is not, then what is? If this is indeed the correct way, then perhaps a service account selected while creating a VM must be added as a user when it (the VM) is created. I can understand this leads to some complications such as what happens when the service account is changed. Does the old service account user gets deleted or should it be retained? But I think at least an option can be given to add the service account user to the VM - something like a checkbox in the console - so the end user can take a call. Hope someone from GCP reads this.
As stated in this article, the steps you took are the correct way to do it. Adding users to the "docker" group allows them to run docker commands as non-root. If you create a new service account and would like it to run docker commands within a VM instance, then you will have to add that service account to the docker group as well.
If you change the service account on a VM instance, then the old service account should still be able to run docker commands as long as the older service account is not removed from the docker group and has not been deleted from Cloud IAM; however, you will still need to add the new service account to the docker group to allow it to run docker commands as non root.
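A minimal sketch of that step on the VM is shown below. It prints the commands unless DRY_RUN=0 is set; allow_docker is a hypothetical helper and the user name in the example is a placeholder for whatever local user you map the service account to.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Create the local user if it does not exist yet, then add it to the
# docker group so it can run docker without sudo.
allow_docker() {
    user=$1
    id "$user" >/dev/null 2>&1 || run sudo useradd --create-home "$user"
    run sudo usermod -aG docker "$user"
}

# Example: allow_docker svc-runner
```

Group membership is picked up at the next login session for that user, so an already running shell will not see the change.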
Update: automating the creation of a service account at VM instance creation would be tedious to do by hand. Within your startup script, you would first have to create the service account using gcloud commands and then add the appropriate IAM roles. Once that is done, you would still have to add the service account to the docker group.
It would be much easier to create the service account from the Console when the VM instance is being created. Once the VM instance is created, you can add the service account to the docker group.
If you would like to request a new feature in GCE, you can file a request in the Public Issue Tracker by visiting this site.

Unable to SSH into my EC2 instance from a different computer

A little backstory: I have an AWS instance made with Bitnami that I set up on my Windows machine back home. I am currently out of the country and have no way to access that machine at the moment. One month later, I visited the website and got a 500 error (and I only have my MacBook with me). I've tried to SSH into the instance from my MacBook with no luck. I get the error:
Username is not in the sudoers file. This incident will be reported.
I've also tried another way to SSH into my aws but then I just get
Permission denied (publickey).
I do have the public/private keys I made with me so I am not sure if I had to set up some additional permissions to SSH from a different computer. On top of that, I got an email stating that someone attempted to access remote hosts on the internet without authorization. If I visit my Public IP address of my instance, it goes straight to a spam page.
At this point, I am not sure if I am just missing something in my steps or have missed a step. If someone can help me, I would really appreciate it.
Is there some way to get my instance back up and running? If not, is there some way I can back up the wordpress files on that instance that's down and use it to create another one on my Macbook currently? Please let me know.
If you have the private key that your AWS instance was launched with, place the key in ~/.ssh.
Then run the following command so that only your user can read and write the key (this step is mandatory):
chmod 600 ~/.ssh/keyname
Then, run the following command to connect to your instance:
ssh -i ~/.ssh/keyname user@instance_ip
And it should connect successfully.
If you're not sure which user to connect to and you have access to AWS EC2 Console, then look for that server, right-click it and choose "Connect" and it will usually show the correct user to use when connecting to it by SSH.

Google Cloud: prevent users from syncing to other instances

I want an instance having a local user with the purpose of running and owning a service on that instance. I have tried creating it with a simple
adduser <username>
as well as by following
https://cloud.google.com/compute/docs/instances/adding-removing-ssh-keys#instance-only, like this:
instance_a='<instance_a>'
ssh-keygen -t rsa -C "test_ssh_key"
gcloud compute instances add-metadata $instance_a --metadata-from-file ssh-keys=test_ssh_key.pub
gcloud compute ssh $instance_a --ssh-key-file='test_ssh_key'
However, in both cases the created user is automatically synced to all other running instances in the project. Also, in the second case, I'm able to SSH into a second instance even though the documentation says the key is for a single instance, and the SSH key does not show up with
gcloud compute instances describe $instance_a
Note that ssh with the newly created key works using both gcloud compute ssh and regular ssh.
Does anyone know how to properly either create a truly local user on an instance or alternatively turn off the service syncing users having no ssh login?
You can't do this using Google-managed SSH keys (which is what you get when you use gcloud compute ssh).
Using instance-level SSH keys will work (as you are doing); just make sure to remove any project-level metadata for the same user.
Also make sure the new user is fully restricted:
A) Make sure the user does not have IAM permissions to use google managed SSH
B) Limit the scopes of the default service account on the instance to ensure users are not using that to bypass the security measures you have in place.
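As a sketch of the instance-level approach, the ssh-keys metadata value pairs a username with the public key, and the block-project-ssh-keys metadata key tells the instance to ignore project-wide keys. metadata_entry is a hypothetical helper; svc-user, keys.txt, INSTANCE_NAME and ZONE are placeholders. Note that setting ssh-keys this way replaces any existing ssh-keys value on the instance.

```shell
#!/bin/sh
# Prefix a public key with a username, the USERNAME:KEY form used by
# the ssh-keys metadata value.
metadata_entry() {
    printf '%s:%s\n' "$1" "$2"
}

# Example (not executed here):
#   metadata_entry svc-user "$(cat test_ssh_key.pub)" > keys.txt
#   gcloud compute instances add-metadata INSTANCE_NAME --zone ZONE \
#       --metadata-from-file ssh-keys=keys.txt
#   gcloud compute instances add-metadata INSTANCE_NAME --zone ZONE \
#       --metadata block-project-ssh-keys=TRUE
```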