how to get uptime of GCP VM instance - google-cloud-platform

I need to know the uptime of my GCP VM instances (both Windows and Linux) so that I can stop a VM based on how long it has been running. I have not found a simple way to get the uptime of all my GCP VMs; there are about 100 of them, and the number will keep growing.
I went through the answer below, but the question is not answered there either. I could not add a comment, so I had to ask a new question.
Get vm uptime data from stackdriver-agent in gcp?
The Python code samples at the link below have no module for instance uptime; all they offer is creating an uptime check for service availability.
https://github.com/GoogleCloudPlatform/python-docs-samples
How can I get the uptime of all GCP VM instances?

Assuming that you can adjust the process of starting VMs, I think the solution below is viable:
When a VM is started, add a custom tag with the current timestamp (API reference)
Use this tag's value to determine the instance's actual uptime
I realize that it sounds overcomplicated, but I don't see any better OS-independent solution.
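A minimal sketch of the second step, assuming the start time was stored as an ISO 8601 string that includes a UTC offset. The function names `uptime_from_start` and `should_stop` are hypothetical, and writing the value to the instance and reading it back are left out:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def uptime_from_start(started_at: str, now: Optional[datetime] = None) -> timedelta:
    """Return how long an instance has been up, given its stored start time.

    `started_at` is assumed to be an ISO 8601 string with a UTC offset,
    for example "2024-01-01T08:00:00+00:00".
    """
    start = datetime.fromisoformat(started_at)
    if start.tzinfo is None:
        # A naive timestamp cannot be compared safely across machines.
        raise ValueError("start timestamp must include a UTC offset")
    if now is None:
        now = datetime.now(timezone.utc)
    return now - start


def should_stop(started_at: str, max_uptime: timedelta,
                now: Optional[datetime] = None) -> bool:
    """True when the instance has been running longer than `max_uptime`."""
    return uptime_from_start(started_at, now) > max_uptime
```

Passing `now` explicitly makes the check easy to test; in a real job you would leave it out and loop over the stored values of all instances.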
Update:
The feature you need has already been requested in Google's issue tracker. You can check the progress and/or "star" it here: https://issuetracker.google.com/issues/136105125
Note: the issue referenced above is marked as blocked by another, non-public issue.

Go to the GCP console.
Select Monitoring.
Click Uptime checks.
Click Create Uptime check.
For more information, see the document below:
https://cloud.google.com/monitoring/uptime-checks

Related

Failed to start instance: A e2-micro VM instance is currently unavailable in the us-central1-a zone

I have been facing this issue since yesterday. This is the exact error: Failed to start feature-config: A e2-micro VM instance is currently unavailable in the us-central1-a zone. Alternatively, you can try your request again with a different VM hardware configuration or at a later time. For more information, see the troubleshooting documentation.
I had scheduled the Compute Engine instance to turn on and off at specific times using an instance schedule, but now I am locked out of it. I cannot even create a machine image to deploy in another zone.
I changed the machine configuration. From your answer I gathered that resources might not be available in the US Central zone, possibly due to traffic. I changed the configuration to n2-highcpu-2 (2 vCPUs, 2 GB memory).
In the end, it seems this was a general issue that multiple users experienced in us-central1, among other regions.
This thread has more discussion, and the problem seems to have gotten worse over the weekend.
As some of the comments suggest, changing the zone, region, or hardware can help, but not always, since this also depends on any constraints you may have.
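If you script your deployments, the "try another zone" advice could be sketched roughly like this. Everything here is hypothetical: `CapacityUnavailable` stands in for whatever error your tooling raises on a stockout, and `try_start` is a callable you would supply to actually create or start the VM in a given zone:

```python
from typing import Callable, Optional, Sequence


class CapacityUnavailable(Exception):
    """Hypothetical error raised when a zone cannot supply the requested VM."""


def first_zone_with_capacity(zones: Sequence[str],
                             try_start: Callable[[str], None]) -> Optional[str]:
    """Try each zone in order and return the first one where the start succeeds.

    Returns None when every zone reports that capacity is unavailable.
    """
    for zone in zones:
        try:
            try_start(zone)
        except CapacityUnavailable:
            # This zone is out of stock; move on to the next candidate.
            continue
        return zone
    return None
```

This only helps when the workload is free to run in any of the listed zones, which is the constraint mentioned above.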
As the error suggests, there aren't any available resources in that region. I contacted GCP support after facing the same issue and got the following response:
Google Cloud Support: Upon further checking, the reason that the e2-medium VM instance is currently unavailable is because there are limited VMs available to a specific zone and regions. Best we can do is to try another time or select a different zone so that the VMs that you desire to use will start. Rest assured that there is nothing wrong with your account and it was on the us-central1 zone who do not have available VMs you selected as of the moment.
If possible, try deploying to a different instance. Those who need an instance in us-central1 (for Qwiklabs?) might have to wait until more instances are available.
Similar issue here, but coming from a terraform apply. I've tried multiple zones and every one says both 'e2-small' and 'e2-micro' instances are unavailable. Seems google completely fumbled the "cloud game" here since AWS doesn't have this problem EVER! (not that I like using AWS, it's just "ick" compared to google).

Google Cloud Monitoring groups are including instances that are already closed for a brief time

I have configured a group on Google Cloud Monitoring to select gce_instances following a naming convention for a predefined instance group. However, I have noticed that it seems to include instances that have already been deleted for a brief time (i.e., right after a replacement of the VMs in the instance group). This is causing additional alerts to be sent for an uptime check that was created for the monitoring group, because the uptime checks are still being performed for VMs that are already deleted. Is there a way to configure criteria for the group to only consider VM instances that are actually running?
I have also set up autohealing for the instance group with the same triggering conditions as the uptime check which is being used in conjunction with the uptime check. Would it be possible to configure alerts on autohealing instead of using both in conjunction because of the aforementioned situation with uptime checks?
It seems you want to configure criteria for the monitoring group so that it only considers VM instances that are actually running; this is not currently available.
I have created a Feature request. Feel free to post there should you have any additional comments or concerns regarding the issue and also track for future updates.

How to move instance zone in google cloud without running the VM

My VM in Google Cloud can't start because of the error shown below.
"Starting VM instance failed. Error: The zone does not have enough
resources available to fulfill the request. Try a different zone, or
try again later."
Then I wanted to start the VM in another zone by changing its zone with this method, but that requires the VM to be running.
The problem is that I can't run the VM. Is there another solution I can use?
It looks like Google's having issues with limited external IPs. Try removing the external IP before starting the instance. Then create an external IP and attach it to your instance.
You’ve just encountered a stockout issue. A Stockout means that the particular GCP datacenter in that zone has reached its resource limit.
The Google Cloud Platform team are there to make sure that there are available resources in all zones. This type of issue is rare. When a situation like this occurs or is about to occur, the team is notified immediately, the issue is investigated and quickly fixed.
I recommend deploying and balancing your workload across multiple zones or regions to reduce the likelihood of a stockout. Please review the documentation which outlines how to build resilient and scalable architectures on the Google Cloud Platform.
You may also try again later, once resources are available again in the region.
That being said, I see that you posted your question on November 9th. That was a long time ago. Can you confirm whether your issue is fixed now? It is very rare for stockouts to last this long.

get alert when instance is not active in GCE

For Google Compute Engine (GCE), I use Stackdriver Monitoring for monitoring and alerting.
Most general metrics such as CPU, disk I/O, and memory are available, and I can set alerts based on those metrics or on whether a process is alive, by process name.
However, I cannot find any metric related to the status of the GCE instance itself.
My use case is simple: I'd like to know whether the instance is down or not.
Any suggestions appreciated.
Thanks.
I think the instance status is not a monitoring metric; there's just instance/uptime available
(and I have no clue what it would return when the instance is terminated; possibly worth a try).
But one can check servers with Uptime Checks and then report the incident,
and one can get the instance status with gcloud compute instances describe instance01.
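A rough sketch of wrapping that command in Python, assuming the gcloud CLI is installed and authenticated. The helper names are made up for illustration, and the zone argument is an assumption (gcloud needs to know the zone one way or another):

```python
import json
import subprocess
from typing import List


def describe_command(instance: str, zone: str) -> List[str]:
    """Build the gcloud command that prints the instance resource as JSON."""
    return ["gcloud", "compute", "instances", "describe", instance,
            "--zone", zone, "--format=json"]


def is_running(describe_output: str) -> bool:
    """True when the JSON from `describe` reports status RUNNING."""
    return json.loads(describe_output).get("status") == "RUNNING"


def instance_is_running(instance: str, zone: str) -> bool:
    """Run gcloud and check the status; raises CalledProcessError on failure."""
    result = subprocess.run(describe_command(instance, zone),
                            capture_output=True, text=True, check=True)
    return is_running(result.stdout)
```

Something like `instance_is_running("instance01", "us-central1-a")` could then be polled from a cron job that sends the alert; the zone here is only an example value.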

How to extract an instance uptime based on incidents?

On stackdriver, creating an Uptime Check gives you access to the Uptime Dashboard that contains the uptime % of your service:
My problem is that uptime checks are restricted to HTTP/TCP checks. I have other services running, and those services report their health in different ways (for example, by a specific process running). I have incident policies already set up for these services, so if a service is not running I get notified.
Now I want to be able to look back and know how long the service was down for the last hour. Is there a way to do that?
There's no way to programmatically retrieve alerts at the moment, unfortunately. Many resource types expose uptime as a metric, though (e.g., instance/uptime on GCE instances) - could you pull those and do the math on them? Without knowing what resource types you're using, it's hard to give specific suggestions.
Aaron Sher, Stackdriver engineer
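To illustrate the "do the math" part, here is a minimal sketch. It assumes you have already pulled the uptime samples for the window, and that each sample is the number of seconds the resource was up during its sampling interval (a delta-style metric); check how your metric is actually reported. The function name `downtime_seconds` is hypothetical:

```python
from typing import Iterable


def downtime_seconds(uptime_deltas: Iterable[float], window_seconds: float) -> float:
    """Estimate downtime in a window from per-interval uptime samples.

    Assumes the samples together cover the whole window, so any seconds
    not accounted for as uptime are counted as downtime.
    """
    up = sum(uptime_deltas)
    # Clamp at zero in case sampling jitter makes uptime exceed the window.
    return max(0.0, window_seconds - up)
```

For example, fifty one-minute samples of 60 seconds each in a one-hour window would leave 600 seconds unaccounted for, which this sketch reports as downtime. Missing samples and real downtime look the same here, so treat the result as an estimate.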