GCP GCE Agent Deletes Users and changes keys - google-cloud-platform

I noticed that when my VM boots, it will randomly choose between two guest environments. In the logs, I see "GCE Agent Started" with either (version 20210908.1) or (version 20200610.00). The host key changes along with the agent version. The issue is that I am only able to access my conda environments and other information on one of these versions. Is there any way I can control or choose which agent version is used on startup? Is there a way to disable or uninstall these agents?

Related

Google Cloud not managing users/SSH in VMs

We have upgraded Debian distribution in Google Cloud instance and it seems GCloud cannot manage the users and their SSH keys in the instance anymore.
I have installed following tools:
google-cloud-packages-archive-keyring/now 1.2-499050965 all
google-cloud-sdk/cloud-sdk-bullseye,now 412.0.0-0 all
google-compute-engine-oslogin/google-compute-engine-bullseye-stable,now 1:20220714.00-g1+deb11 amd64
google-compute-engine/google-compute-engine-bullseye-stable,now 1:20220211.00-g1 all
google-guest-agent/google-compute-engine-bullseye-stable,now 1:20221109.00-g1 amd64
I cannot connect through the UI. It gets stuck on "Transfering SSH keys to the instance". The "troubleshooting" says that everything is fine.
When trying to connect via gcloud compute ssh it dies with
permission denied (publickey)
I still have access to the instance with some other user, but no new users are created and no SSH keys transferred.
What else am I missing?
EDIT:
Have you added the SSH key to Project metadata or Instance metadata? If its instance metadata, is project level ssh key blocked?
I haven't added any metadata.
Does your user account has necessary permission in the project to SSH to the instance (e.g Owner, Editor or Compute Instance Admin IAM role)?
Yes this worked correctly until the debian upgrade to bookworm. I could see all the google-cloud related packages were remove and I had to install them.
Are you able to SSH to the instance using ssh client e.g Putty?If yes, you need to make sure Google account manager daemon is running on the instance.
I can nicely SSH with accounts which were active on the machine BEFORE the Debian upgrade. These account already have .ssh directory correctly set up and working. New google users cannot login.
Try gcloud beta compute ssh --zone ZONE INSTANCE_NAME --project PROJECT
This works only for users active before the Debian upgrade.
 If yes, you need to make sure Google account manager daemon is running on the instance.
I installed the google-compute-engine-oslogin package which was missing, but it seems it has no effect and new users still cannot login.
EDIT2:
When connecting to serial console, it gets stuck on: csearch-dev google_guest_agent[2839775]: ERROR non_windows_accounts.go:158 Error updating SSH keys for gke-495d6b605cf336a7b160: mkdir /home/gke-495d6b605cf336a7b160/.ssh: no such file or directory. - the same issue, SSH keys are never transferred into the instance.
There are a few things you can do troubleshoot the Permission denied (publickey) error message :
To start, you must ensure that you have properly authenticated yourself with gcloud using an IAM user with the compute instance admin role. You can do that by running gcloud auth login [USER] then try gcloud compute ssh again.
You can also verify that the Linux Guest Environment scripts are properly installed and running. Please refer to this page for information about validating, updating, or manually installing the guest environment.
Another possibility is that the private key was lost or that we have a mismatched keypair. To force gcloud to generate a new SSH keypair, you must first move ~/.ssh/google_compute_engine and ~/.ssh/google_compute_engine.pub if present, for example:
mv ~/.ssh/google_compute_engine.pub ~/.ssh/google_compute_engine.pub.old
mv ~/.ssh/google_compute_engine ~/.ssh/google_compute_engine.old
Once that is done, you may then try gcloud compute ssh [INSTANCE-NAME] again, a new keypair should be created and a public key will be added to the SSH keys metadata.
Refer to Sunny-j and Answer to review the serial-port logs of the affected instance for possible clues on the issue. Also refer to Resolving getting locked out of a Compute Engine for more information.
Edit1:
Refer to this similar SO and Troubleshooting using the serial console which helps to resolve your error.
EDIT2:
Maybe you have git-all installed. Cloud-init and virtually every step of the booting process are disrupted as a result of this, as the older SysV init system takes its place. You are unable to SSH into your instance as a result of this.
Check out these potential solutions to the above problem:
1.Try using git instead of git-all.
2.If git-all is necessary, use apt install --no-install-recommends -y git-all to prevent the installation of recommendations.
Finally : If you were previously able to SSH into the instance with a particular SSH key for new users, either the SSH daemon was not running or was otherwise broken, or you somehow removed that SSH key. It would appear that you damaged this machine during the upgrade.
Why is this particular VM instance required? Does it contain significant data? If this is the case, you can turn it off, mount its disk with a new VM instance, and copy that data off.( I'd recommend build another machine running these services from latest snapshot or scratch and start using that instead).
You should probably move to a new machine if it runs a service: There is no way to tell what still works and what doesn't, even if you are able to access the instance.

Is there a way to see in the project level to see any VM is not having ops agent is installed in it

I want to see the list of VMs that doesn't have the ops agent installed in my GCP project.
Any command or API or console view will work for me.

How to update CloudWatch Agent version on Windows Instance?

Let's say that the CloudWatch Agent installed on Windows Server was version X.0. After few months, there was an update, and the latest available version of CloudWatch Agent was X.1. So, how can I proceed with updating the already installed CloudWatch Agent version on Windows Server?
In the user guide, I am able to find ways to 'Download and Configure the CloudWatch Agent' and other related processes but not able to find ways to update the CloudWatch Agent version.
Any prompt support in this regard will be highly appreciated.
You can re-install using AWS Systems Manager, for that Systems Manager has to be installed already and you need to add these I AM roles if not there already
AmazonSSMManagedInstanceCore, CloudWatchAgentServerPolicy.
Download the CloudWatch agent package
Systems Manager Run Command enables you to manage the configuration of your instances. You specify a Systems Manager document, specify parameters, and execute the command on one or more instances. SSM Agent on the instance processes the command and configures the instance as specified.
To download the CloudWatch agent using Run Command:
Open the Systems Manager console at
https://console.aws.amazon.com/systems-manager/.
In the navigation pane, choose Run Command.
-or-
If the AWS Systems Manager home page opens, scroll down and choose
Explore Run Command.
Choose Run command.
In the Command document list, choose AWS-ConfigureAWSPackage.
In the Targets area, choose the instance on which to install the
CloudWatch agent. If you do not see a specific instance, it might not be configured for Run Command. For more information, see Setting Up AWS Systems Manager for Hybrid Environments in the AWS Systems Manager User Guide.
In the Action list, choose Install.
In the Name box, enter AmazonCloudWatchAgent.
Keep Version set to latest to install the latest version of the agent.
Choose Run.
Optionally, in the Targets and outputs areas, select the button next to an instance name and choose View output. Systems Manager should show that the agent was successfully installed.
As below it does uninstall and reinstall.
Reference: AWS Documentation

Install software on multiple ec2 instances along with json file

I need to install Fire Eye in multiple ec2 instances in my AWS account, all running Windows Server 2012. I have the installer msi and could do it using Distributor in SSM. However there is a json file that needs to be in the same folder as the msi file when software is being installed. This doesn't seem to be supported by Distributor.
Can anyone help me out with how this can be done, short of logging in to every server and installing it manually after copy pasting the json and msi file in one folder?
Usually for ad-hoc execution of commands on a fleet of instances you would use AWS Systems Manager Run Command:
Administrators use Run Command to perform the following types of tasks on their managed instances: install or bootstrap applications, build a deployment pipeline, capture log files when an instance is terminated from an Auto Scaling group, and join instances to a Windows domain, to name a few.

No changes to app after redeployment to EC2 instance

I've got development and production instances in EC2. I've been updating my app in Visual Studio 2019 and redeploying it to the dev instance, then creating an AMI of that instance and using that image to update the production instance(s).
Suddenly my app no longer updates when I deploy to the dev instance. The logs all show the update was applied, but when I look at the files on the server they have not changed for days. I suspect I may be using AMIs incorrectly, but I'm not sure what I'm doing wrong.
How do I get my updates to show again?
You are facing the issue because creating an AMI from running environment isn't the right approach since EB runs several scripts under the hood to attach instances to that particular environment.
Note: Custom AMIs are ideal only when you're installing a lot of dependencies or software that you want to be baked into your AMI so subsequent deployments go through quick. Here's the documentation that walks you through the steps, and here's the summary of the steps:
The best approach would be to launch a stand alone EC2 using an EB
AMI as base (ideally an AMI with HVM virtualization).
Connect to the instance with SSH or RDP.
Perform any customizations you want.
(Windows platforms) Run the EC2Config service Sysprep. For
information about EC2Config, see Configuring a Windows Instance Using
the EC2Config Service. Ensure that Sysprep is configured to generate
a random password that can be retrieved from the AWS Management
Console.
In the Amazon EC2 console, stop the EC2 instance. Then on the
Instance Actions menu, choose Create Image (EBS AMI).