Google Compute Engine autoscale based on 'used' memory - google-cloud-platform

I'm looking to scale my Compute Engine instances based on memory, which is an agent metric in Stackdriver. The caveat is that of the five states the agent can monitor (buffered, cached, free, slab, used; see the link here), I only want to look at 'used' memory, and if that value is above a certain percentage threshold across the group (or per instance, which would also work for me), I want to autoscale.
I've already installed the Stackdriver Monitoring agent on all the nodes in the Managed Instance Group, and I can successfully visualize 'used' memory in my monitoring dashboard.
Unfortunately, I can't do it for autoscaling. This is what I see when I go to configure it in the autoscaling section of MIG.
I believe adding a filter expression should work as expected, since the same expression works correctly in the Stackdriver console's Monitoring dashboard. Also, it's mentioned here that the syntax is compatible with the Cloud Monitoring filter syntax given here.
I've tried different combinations of syntax in the filter expression field, but none of them have worked. Please help.

I was attempting the exact same configuration, trying to scale based on memory usage. After testing various unsuccessful entries, I reached out to Google support. Based on your question, I can't tell what kind of instance group you have; it matters for the following reason.
TLDR
Based on input from Google support, only zonal instance groups allow the filter expression entry.
Zonal Instance Group
Only zonal instance groups will allow the metric setting. The setting you are attempting to enter, metric.state=used, is correct for a zonal instance group. However, that field must be left blank for a regional instance group.
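For a zonal MIG, the same configuration can also be sketched with gcloud; the group, zone, and target values below are placeholders, and this assumes the Monitoring agent is already installed on the instances:

```shell
# Hypothetical group/zone names; autoscale a zonal MIG on agent-reported
# 'used' memory by filtering the multi-stream memory metric by its state label.
gcloud compute instance-groups managed set-autoscaling my-zonal-mig \
    --zone us-central1-a \
    --max-num-replicas 10 \
    --update-stackdriver-metric agent.googleapis.com/memory/percent_used \
    --stackdriver-metric-filter 'metric.state = "used"' \
    --stackdriver-metric-utilization-target 80 \
    --stackdriver-metric-utilization-target-type gauge
```

The filter string uses the same Cloud Monitoring filter syntax that works in the Metrics Explorer, which is why it only applies where per-instance metric filtering is supported (zonal MIGs).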
Regional Instance Group
As noted above, applying the filter for a regional instance group is not supported. The documentation explicitly says to leave that field blank.
In the Additional filter expression section: For a zonal MIG, optionally enter a filter to use individual values from metrics with multiple streams or labels. For more information, see Filtering per-instance metrics. For a regional MIG, leave this section blank.
If you add an entry you'll receive the message "Regional managed instance groups do not support autoscaling using per-group metrics." when attempting to save your changes.
On the other hand, if you leave the field empty, it will save. However, I found that leaving the field empty and setting almost any number in the Target Utilization field always caused my group to scale to the maximum number of instances.
Summary
Google informed me that they have a feature request open for this. I pointed out that it doesn't make sense to even offer the percent_used option if it's not supported. The response was that the documentation should be updated in the future to clarify that point.


Google Compute committed use/reservation doesn't use my existing instance

I have created an instance with the same configuration as mentioned in the Committed use discount/reservation section, yet when I go to Reservations, it shows the reservation is currently used by none.
[Screenshots: reservations list, reservation configuration, instance configuration]
The reservation type on the instance is set to "Automatic", but it doesn't automatically detect and match, even though both are in the same region and zone. Is there something I am missing?
As per the official doc, a VM can only consume a reservation if the properties of both the VM and the reservation match:
"In this consumption model, existing and new instances automatically count against the reservation if those instances' properties match the reserved instance's properties."
A VM instance can consume a reservation only if all of the following properties for both the VM and the reservation match exactly:
Project
Zone
Machine type
Minimum CPU platform
GPU type and count
Local SSD type and count
Refer to this doc, which explains the requirements and restrictions for Compute Engine VM creation.
Follow this official doc for the step-by-step process of how to consume instances from any matching reservation.
Thanks for the post Hemanth. Everything in my instance was matching, but it still wasn't attached to the reservation.
I resolved the issue by creating another instance; this time, while creating it, I expanded "Advanced options" on the create-instance page and manually chose the reservation I wanted it to consume. And it worked!
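The fix above, targeting a specific reservation instead of relying on automatic matching, can also be sketched with gcloud; the instance, zone, machine type, and reservation names below are placeholders, and the zone and machine type must still match the reservation exactly:

```shell
# Hypothetical names; explicitly pin the new VM to a named reservation
# rather than leaving reservation affinity on the default "Automatic".
gcloud compute instances create my-vm \
    --zone us-central1-a \
    --machine-type n1-standard-4 \
    --reservation-affinity specific \
    --reservation my-reservation
```

With `--reservation-affinity specific`, creation fails loudly if the properties don't match, which makes mismatches easier to diagnose than silent non-consumption.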

When creating an AWS Auto-Scaling Launch Configuration & using spot instances - How can I set a maximum price based on unit type?

I regularly see the following throughout various AWS documentation:
If you set TargetCapacityUnitType to vcpu or memory-mib, the price protection threshold is applied based on the per-vCPU or per-memory price instead of the per-instance price.
Most importantly, I see it on the create-launch-template documentation.
I would like to create a launch configuration for an Auto Scaling group that will use a variety of instance types via attribute-based instance type selection.
This will, of course, allow me to use a number of instance types, making my Spot request more likely to be fulfilled and less prone to interruption.
I've found that I'm able to set a maximum price defined as "Per instance/hour", but if I'm using a variety of instances with a slew of different prices, this of course breaks down.
For this reason - The request-spot-fleet API call has a means of setting a TargetCapacityUnitType so that you're able to define a maximum price based on vCPU or memory instead.
It seems like all the pieces are here - and the aforementioned 'Note' is even on the create-launch-template documentation; but I cannot find where to actually define TargetCapacityUnitType in my Launch Configuration.
So, when creating an AWS Auto Scaling launch configuration using Spot Instances, how can I set a maximum price based on unit type? Is this possible?
You can set up a launch template with your AMI and then use the launch template to create the group. You don't use a launch configuration at all.
All properties you specify for attribute-based instance type selection are part of a mixed instances policy, which is part of the create-auto-scaling-group call.
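As a sketch of that mixed instances policy, assuming a launch template named my-template already exists (the group name, subnet, and vCPU/memory ranges below are placeholders):

```shell
# Hypothetical names/ranges; attribute-based instance selection with Spot
# capacity is expressed inside the mixed-instances-policy JSON.
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name my-asg \
    --min-size 1 --max-size 10 --desired-capacity 2 \
    --vpc-zone-identifier "subnet-0123456789abcdef0" \
    --mixed-instances-policy '{
      "LaunchTemplate": {
        "LaunchTemplateSpecification": {
          "LaunchTemplateName": "my-template",
          "Version": "$Latest"
        },
        "Overrides": [{
          "InstanceRequirements": {
            "VCpuCount": {"Min": 2, "Max": 8},
            "MemoryMiB": {"Min": 4096}
          }
        }]
      },
      "InstancesDistribution": {
        "OnDemandPercentageAboveBaseCapacity": 0,
        "SpotAllocationStrategy": "price-capacity-optimized"
      }
    }'
```

If your CLI version supports it, create-auto-scaling-group also accepts a `--desired-capacity-type` of `vcpu` or `memory-mib` alongside attribute-based selection, which appears to be the ASG-side counterpart to TargetCapacityUnitType; verify against the current API reference before relying on it.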

Autoscaling instance groups used for HTCondor batch workloads?

I've set up an HTCondor cluster on Google Cloud, following this tutorial.
I like it, other than the autoscaling feature. I want something simpler than a target CPU utilization average across all instances in the group. I'd like to just delete a machine once HTCondor has no use for it, when there are not enough jobs to occupy all of the available machines.
I could try using instances that delete themselves after a certain amount of time without any use. But then the autoscaler would just spin up another machine; I'd need to both automatically delete the machine and lower the maximum number of replicas.
Any ideas for how to do this?
The tutorial you linked sets the instance group to have 2 instances at all times. I assume you have already adjusted that.
You can edit the autoscaling behavior of your HTCondor instance group under Compute Engine → Instance groups → (HTCondor group name) → Edit group, then pressing the pencil under Autoscaling policy.
[Screenshot: example metric configuration]
More information about autoscaling an instance group can be found here.
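One approach beyond the tutorial, sketched here under assumptions: export a custom metric such as HTCondor's idle-job count (written by a script you run on the submit node) and autoscale on work items per instance instead of CPU. The metric name, group, and zone below are hypothetical:

```shell
# Hypothetical custom metric published by a cron job on the submit node
# (e.g. parsing condor_q output). The single-instance-assignment flag tells
# the autoscaler each instance handles 1 unit of the per-group metric, so
# the group shrinks toward zero as the idle-job count drops.
gcloud compute instance-groups managed set-autoscaling condor-compute-group \
    --zone us-central1-a \
    --min-num-replicas 0 \
    --max-num-replicas 20 \
    --update-stackdriver-metric custom.googleapis.com/condor/idle_jobs \
    --stackdriver-metric-single-instance-assignment 1
```

This avoids the self-deleting-instance race you describe: the autoscaler itself removes machines when the queue drains, rather than fighting a machine that deletes itself.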

Google Cloud Monitoring groups are including instances that are already closed for a brief time

I have configured a group in Google Cloud Monitoring to select gce_instances following a naming convention for a predefined instance group. However, I have noticed that, for a brief time, it seems to include instances that have already been deleted (i.e., right after a replacement of the VMs in the instance group). This causes additional alerts to be sent for an uptime check created for the monitoring group, because the uptime checks are still being performed against VMs that have already been deleted. Is there a way to configure the group's criteria to only consider VM instances that are actually running?
I have also set up autohealing for the instance group with the same triggering conditions as the uptime check, used in conjunction with it. Given the aforementioned situation with uptime checks, would it be possible to configure alerts on autohealing instead of using both in conjunction?
It seems you want to configure criteria for the monitoring group to only consider VM instances that are actually running, which is not currently available.
I have created a feature request. Feel free to post there if you have any additional comments or concerns regarding the issue, and to track future updates.

Google Cloud Instance Group Autoscaler can not see my custom log-based metric

In Google Cloud, I created a custom log-based metric to use for my instance group autoscaler. I can view the metric in the Metrics Explorer, which clearly shows that Google added the required "instance-id" and "zone" labels for me (yay). I can also see that the "Logs Ingestion" status for GCE Autoscaler is "All ingested". Unfortunately, I am still seeing the following error in my instance group for the autoscaler:
"The custom metric that was specified does not exist or does not have
the necessary labels. Check the metric."
This error appears at the top of the instance group summary when the instance group is launched. The result is that the metric in the "Monitoring" section of the instance group is flat (0), unlike in the Metrics Explorer, and the instance group is not autoscaled.
Any help would be appreciated.
Thanks!
I think you might have specified an incorrect metric identifier. You need to specify the name as custom.googleapis.com/appdemo_queue_depth_01.
If you haven't already come across it, please take a look at this page, which includes detailed instructions on how to set up custom metrics.
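For reference, wiring that identifier into the autoscaler can be sketched with gcloud; the group name, zone, and utilization target below are placeholders, and this assumes the metric carries the per-instance labels the autoscaler requires:

```shell
# Hypothetical group/zone/target; point the MIG autoscaler at the
# custom metric using its full identifier, not just the short name.
gcloud compute instance-groups managed set-autoscaling my-mig \
    --zone us-central1-a \
    --max-num-replicas 5 \
    --update-stackdriver-metric custom.googleapis.com/appdemo_queue_depth_01 \
    --stackdriver-metric-utilization-target 10 \
    --stackdriver-metric-utilization-target-type gauge
```

If the metric identifier or labels don't match what the autoscaler expects, this command surfaces the same "does not exist or does not have the necessary labels" class of error, which makes it a quick way to validate the configuration outside the console.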