I want an instance to have a local user whose purpose is to run and own a service on that instance. I tried creating it with a simple
adduser <username>
as well as with the following, according to the documentation:
ssh-keygen -t rsa -C "test_ssh_key"
gcloud compute instances add-metadata $instance_a --metadata-from-file
gcloud compute ssh $instance_a --ssh-key-file='test_ssh_key'
However, in both cases, the created user is automatically synced to all other running instances in the project. Also, in the second case, I'm able to ssh into a second instance even though the documentation says the key is for a single instance, and despite the ssh key not showing up with
gcloud compute instances describe $instance_a
Note that ssh with the newly created key works using both gcloud compute ssh and regular ssh.
Does anyone know how to properly either create a truly local user on an instance or alternatively turn off the service syncing users having no ssh login?

You can't do this using Google-managed SSH keys (which is what you get when you use gcloud compute ssh).
Using instance-level SSH keys will work (as you are doing); just make sure to remove any project-level metadata for the same user.
Also make sure the new user is fully restricted:
A) Make sure the user does not have IAM permissions to use Google-managed SSH.
B) Limit the scopes of the default service account on the instance, so that users cannot use it to bypass the security measures you have in place.
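As a rough sketch of the instance-level approach described above: the helpers below format a public key as a USERNAME:KEY metadata entry and print, rather than run, the two gcloud calls involved. The names svcuser, vm-a and ssh-keys.txt are placeholders, and the sketch assumes OS Login is not enabled on the instance.

```shell
# Build one instance-level ssh-keys metadata entry: "USERNAME:PUBLIC_KEY".
ssh_keys_entry() {
    user=$1
    pubkey_file=$2
    printf '%s:%s\n' "$user" "$(cat "$pubkey_file")"
}

# Print (do not run) the gcloud commands that block project-wide keys on one
# instance and set its instance-level keys from a file of entries.
print_instance_key_commands() {
    instance=$1
    entries_file=$2
    echo "gcloud compute instances add-metadata $instance --metadata block-project-ssh-keys=TRUE"
    echo "gcloud compute instances add-metadata $instance --metadata-from-file ssh-keys=$entries_file"
}

# Example (placeholder names):
#   ssh_keys_entry svcuser test_ssh_key.pub > ssh-keys.txt
#   print_instance_key_commands vm-a ssh-keys.txt
```

Review the printed commands before running them. Setting ssh-keys through add-metadata overwrites the instance's existing ssh-keys value, so any entries that should survive need to be in the same file.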


Google Cloud not managing users/SSH in VMs

We have upgraded the Debian distribution on a Google Cloud instance, and it seems gcloud can no longer manage the users and their SSH keys on the instance.
I have installed the following tools:
google-cloud-packages-archive-keyring/now 1.2-499050965 all
google-cloud-sdk/cloud-sdk-bullseye,now 412.0.0-0 all
google-compute-engine-oslogin/google-compute-engine-bullseye-stable,now 1:20220714.00-g1+deb11 amd64
google-compute-engine/google-compute-engine-bullseye-stable,now 1:20220211.00-g1 all
google-guest-agent/google-compute-engine-bullseye-stable,now 1:20221109.00-g1 amd64
I cannot connect through the UI. It gets stuck on "Transferring SSH keys to the instance". The "troubleshooting" tool says that everything is fine.
When trying to connect via gcloud compute ssh it dies with
permission denied (publickey)
I still have access to the instance with some other user, but no new users are created and no SSH keys transferred.
What else am I missing?
Have you added the SSH key to project metadata or instance metadata? If it's instance metadata, are project-level SSH keys blocked?
I haven't added any metadata.
Does your user account have the necessary permissions in the project to SSH to the instance (e.g. the Owner, Editor or Compute Instance Admin IAM role)?
Yes, this worked correctly until the Debian upgrade to bookworm. I could see that all the google-cloud related packages were removed, and I had to install them.
Are you able to SSH to the instance using an SSH client, e.g. PuTTY? If yes, you need to make sure the Google account manager daemon is running on the instance.
I can SSH fine with accounts which were active on the machine BEFORE the Debian upgrade. These accounts already have their .ssh directory correctly set up and working. New Google users cannot log in.
Try gcloud beta compute ssh --zone ZONE INSTANCE_NAME --project PROJECT
This works only for users active before the Debian upgrade.
I installed the google-compute-engine-oslogin package, which was missing, but it seems to have no effect and new users still cannot log in.
When connecting to serial console, it gets stuck on: csearch-dev google_guest_agent[2839775]: ERROR non_windows_accounts.go:158 Error updating SSH keys for gke-495d6b605cf336a7b160: mkdir /home/gke-495d6b605cf336a7b160/.ssh: no such file or directory. - the same issue, SSH keys are never transferred into the instance.
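The mkdir error above suggests that the account's home directory itself is missing, so the .ssh directory cannot be created inside it. That is an inference from the log line, not a confirmed diagnosis. A small sketch for listing accounts in that state, reading passwd-format lines on standard input (the function name is made up for illustration):

```shell
# Print "USER HOME" for every passwd-format line on stdin whose home
# directory does not exist on this machine.
missing_homes() {
    while IFS=: read -r user pw uid gid gecos home shell; do
        if [ -n "$home" ] && [ ! -d "$home" ]; then
            echo "$user $home"
        fi
    done
    return 0
}

# Example, run on the affected instance:
#   getent passwd | missing_homes
```

If the account from the log shows up in the output, creating its home directory with the right ownership is one thing to try before looking further at the guest agent.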
There are a few things you can do to troubleshoot the Permission denied (publickey) error message:
To start, you must ensure that you have properly authenticated yourself with gcloud using an IAM user with the compute instance admin role. You can do that by running gcloud auth login [USER] then try gcloud compute ssh again.
You can also verify that the Linux Guest Environment scripts are properly installed and running. Please refer to this page for information about validating, updating, or manually installing the guest environment.
Another possibility is that the private key was lost or that we have a mismatched keypair. To force gcloud to generate a new SSH keypair, you must first move ~/.ssh/google_compute_engine and ~/.ssh/ if present, for example:
mv ~/.ssh/ ~/.ssh/
mv ~/.ssh/google_compute_engine ~/.ssh/google_compute_engine.old
Once that is done, you can try gcloud compute ssh [INSTANCE-NAME] again; a new keypair should be created and the public key will be added to the SSH keys metadata.
Refer to Sunny-j's answer to review the serial-port logs of the affected instance for possible clues about the issue. Also refer to "Resolving getting locked out of a Compute Engine" for more information.
Refer to this similar SO question and to "Troubleshooting using the serial console", which may help resolve your error.
Maybe you have git-all installed. This disrupts cloud-init and virtually every other step of the boot process, because the older SysV init system takes over. As a result, you are unable to SSH into your instance.
Check out these potential solutions to the above problem:
1. Try using git instead of git-all.
2. If git-all is necessary, use apt install --no-install-recommends -y git-all to prevent the installation of recommended packages.
Finally: if you were previously able to SSH into the instance with a particular SSH key for new users, then either the SSH daemon is not running or is otherwise broken, or that SSH key was somehow removed. It appears that the machine was damaged during the upgrade.
Why is this particular VM instance required? Does it contain significant data? If so, you can turn it off, attach its disk to a new VM instance, and copy that data off. (I'd recommend building another machine running these services from the latest snapshot or from scratch, and using that instead.)
You should probably move to a new machine if it runs a service: There is no way to tell what still works and what doesn't, even if you are able to access the instance.

SSH into VM in private network not working

I created several VMs in GCP in a private network, i.e. with no public IP address associated with the instances. The SSH button in the console was usable, and I was able to SSH to any instance at that time.
The next week, when I tried to SSH again, I was unable to click the SSH button.
I have not changed anything from my end. All infrastructure is managed by terraform only and no one has changed that.
Could it be due to some API being enabled or disabled? Any help would be really appreciated. Thank you.
One thing that could be preventing you from establishing an SSH connection to your VM is IAM permissions. If your IAM permissions have been edited recently, that could explain why SSH is disabled. Please check whether you have the Compute Instance Admin role on your VMs.
You can also verify that the changes have been applied correctly by using the following command on the GCP shell:
gcloud policy-troubleshoot iam resource --principal-email=email \
This should give you further detail on the IAM settings for a given user, and may give you a clue about what is causing the SSH issue.

Compute instances got deleted after hours of 100% CPU usage

We noticed that multiple compute instances got deleted at the same time after hours of 100% CPU usage. Because of this deletion, hours of computation were lost.
Can anyone tell us why they got deleted?
I have created a gist with the only log we could find in Stackdriver logging around the time of deletion.
The log files show the following pieces of information:
The deleter's source IP address. Check whether this matches the public IP address of the instance that was deleted. This IP address is within Google Cloud.
The User-Agent specifies that the Google Cloud SDK CLI gcloud is the program that deleted the instance.
The Compute Engine Default Service Account provided the permissions to delete the instance.
In summary, a person or script ran the CLI and deleted the instance using your project's Compute Engine Default Service Account key from a Google Cloud Compute service.
Future Suggestions:
Remove the permission to delete instances from the Compute Engine Default Service Account or (better) create a new service account that only has the required permissions for this instance.
Do not share service accounts across different Compute Engine instances.
Create separate SSH keys for each user that can SSH into the instance.
Enable Stackdriver logging of the SSH Server auth.log file. You will then know who logged into the instance.
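The first two suggestions could be sketched roughly as follows. The helper prints the gcloud commands instead of running them; the account name batch-runner, project my-proj, instance vm-a and zone us-central1-a are placeholders, not values from the log.

```shell
# Service account emails follow NAME@PROJECT_ID.iam.gserviceaccount.com.
sa_email() {
    printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Print (do not run) the commands that create a dedicated service account
# and attach it to a single instance.
print_dedicated_sa_commands() {
    name=$1
    project=$2
    instance=$3
    zone=$4
    email=$(sa_email "$name" "$project")
    echo "gcloud iam service-accounts create $name --project $project"
    echo "gcloud compute instances set-service-account $instance --zone $zone --project $project --service-account $email"
}

# Example (placeholder names):
#   print_dedicated_sa_commands batch-runner my-proj vm-a us-central1-a
```

As far as I know, the instance has to be stopped before its service account can be changed. The new account also needs to be granted only the roles the workload requires in a separate step, which is not shown here.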

How to give permission for an IAM service account to run a docker container within a GCP VM?

I am trying to run a docker image on startup of a Google Cloud VM. I have selected a fresh service account that I created as the Service Account under VM Instance Details through the console. For some reason the docker run command within the startup script is not working. I suspect this is because the service account is not authorized to run the "docker" command within the VM - which was installed via a yum install. Can anyone tell me how this can be done i.e. to give this service account the permission to run docker command?
Inside the startup script I am running docker login command to login to Google Container Registry followed by a docker run to run an image.
I have found a solution and want to share it here so it helps someone else looking to do the same thing. The user running the docker command (without sudo) needs to belong to the docker group. So I added the service account as a user and put it in the docker group, and that was it. docker login to gcr worked and so did docker run. So the problem is solved, but this raises a couple of additional questions.
First, is this the correct way to do it? If it is not, then what is? If this is indeed the correct way, then perhaps a service account selected while creating a VM should be added as a user when the VM is created. I can understand this leads to some complications, such as what happens when the service account is changed. Does the old service account user get deleted or should it be retained? But I think at least an option could be given to add the service account user to the VM, something like a checkbox in the console, so the end user can decide. Hope someone from GCP reads this.
As stated in this article, the steps you took are the correct way to do it. Adding users to the "docker" group allows them to run docker commands as non-root. If you create a new service account and would like that service account to run docker commands within a VM instance, then you will have to add that service account to the docker group as well.
If you change the service account on a VM instance, then the old service account should still be able to run docker commands as long as the older service account is not removed from the docker group and has not been deleted from Cloud IAM; however, you will still need to add the new service account to the docker group to allow it to run docker commands as non root.
Update: automating the creation of a service account at VM instance creation would be tedious. Within your startup script, you would first have to create the service account using gcloud commands and then add the appropriate IAM roles. Once that is done, you would still have to add the service account to the docker group.
It would be much easier to create the service account from the Console when the VM instance is being created. Once the VM instance is created, you can add the service account to the docker group.
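A minimal sketch of the group step, assuming a local Linux user already exists for the account and that the docker group was created by the Docker install. The function names are made up for illustration.

```shell
# Succeed if USER ($1) is a member of GROUP ($2) according to `id`.
in_group() {
    id -nG "$1" | tr ' ' '\n' | grep -Fqx -- "$2"
}

# Add USER ($1) to the docker group unless it is already a member.
# Needs root privileges; defined here but not invoked.
ensure_docker_member() {
    if in_group "$1" docker; then
        echo "$1 is already in the docker group"
    else
        sudo usermod -aG docker "$1"
    fi
}

# Example (placeholder user name):
#   ensure_docker_member svcuser
```

The new membership only applies to sessions started after the change, so the user has to log in again before docker works without sudo.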
If you would like to request a new feature in GCE, you can file an issue in the Public Issue Tracker by visiting this site.

How to limit access to a shared AMI

I need to share an AMI so that a client can use it to create their own instances through their own account. However, I do not want that client to be able to ssh into the instance. I will need to be able to ssh into the instance to maintain it. They will have ftp and www access only. I've got the ftp and www access part working through ssh configuration. How do I keep them out of ssh when they are starting up the instance with their own keypairs?
Well, you can't really prevent this, if they are determined to get in, since they control the instance.
Stop instance → unmount root EBS volume → mount elsewhere → modify contents → unmount → remount → restart → pwn3d.
However, according to the documentation, if you don't configure the AMI to load their public key, it just sits there and doesn't actually work for them.
Amazon EC2 allows users to specify a public-private key pair name when launching an instance. When a valid key pair name is provided to the RunInstances API call (or through the command line API tools), the public key (the portion of the key pair that Amazon EC2 retains on the server after a call to CreateKeyPair or ImportKeyPair) is made available to the instance through an HTTP query against the instance metadata.
To log in through SSH, your AMI must retrieve the key value at boot and append it to /root/.ssh/authorized_keys (or the equivalent for any other user account on the AMI). Users can launch instances of your AMI with a key pair and log in without requiring a root password.
If you don't fetch the key and append it, as described, it doesn't appear that their key, just by virtue of being "assigned" to the instance, will actually give them any access to the instance.
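For reference, the boot-time step that the quoted documentation describes, and that this AMI would simply leave out, looks roughly like the sketch below. The metadata URL is the commonly used IMDSv1-style path and is an assumption here, as are the function names. An AMI that never runs anything like install_launch_key never installs the launch key.

```shell
# Assumed IMDSv1-style endpoint for the first launch key pair.
KEY_URL="http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key"

# Append KEY ($1) to the authorized_keys FILE ($2) unless that exact line
# is already present.
append_key_once() {
    key=$1
    file=$2
    mkdir -p "$(dirname "$file")"
    touch "$file"
    chmod 600 "$file"
    grep -Fqx -- "$key" "$file" || printf '%s\n' "$key" >> "$file"
}

# Intended to run as root at boot on the instance; defined but not invoked.
install_launch_key() {
    launch_key=$(curl -fsS "$KEY_URL") || return 1
    append_key_once "$launch_key" /root/.ssh/authorized_keys
}
```

Many stock images do this through cloud-init or an equivalent boot script, so it is worth checking that the base image you build from does not already install the key on its own.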
I was able to finally accomplish this crazy task by using this process:
1) Log in as ubuntu
2) Create a new user that belongs to the sudo and admin groups
3) Install all my software under the new user
4) Verify that the new user has all required privileges
5) chroot-jail the ubuntu user to ftp access only
When the AMI is transferred to the new zone/account, the ubuntu user exists as a sudoer, but cannot ssh into the instance. Ftp allows them to connect to the system, but they see a bare directory and cannot cd anywhere else in the system.
It's not a complete denial, but I think it will serve the purpose for this client.