Global GPU quota needed but can't request increase - google-cloud-platform

I'm trying to use GCloud's deep learning VM image. My request for 8 Tesla K80s was approved. But when I try to create an instance with even a single GPU, I get an error saying the Global GPU limit of 0 is exceeded.
The specific error is:
ERROR: (gcloud.compute.instances.create) Could not fetch resource: - Quota 'GPUS_ALL_REGIONS' exceeded. Limit: 0.0 globally.
The command I'm running to create the VM is this:
export IMAGE_FAMILY="tf-latest-cu92"
export ZONE="us-west1-b"
export INSTANCE_NAME="my-new-instance"
export INSTANCE_TYPE="n1-standard-8"
gcloud compute instances create $INSTANCE_NAME \
--zone=$ZONE \
--image-family=$IMAGE_FAMILY \
--image-project=deeplearning-platform-release \
--maintenance-policy=TERMINATE \
--accelerator="type=nvidia-tesla-k80,count=1" \
--machine-type=$INSTANCE_TYPE \
--boot-disk-size=120GB \
--metadata="install-nvidia-driver=True"
This code snippet is drawn from:
https://cloud.google.com/deep-learning-vm/docs/quickstart-cli
Thank you for your time and effort.

I had the same thing happen a while ago. You have to increase the global quota called GPUS_ALL_REGIONS in addition to the per-region Tesla K80 quota. I'm not sure how to request the increase from the command line, but you can do it through the web console: go to IAM & Admin, select "Quotas" from the sidebar, and in the dropdown labeled "Metric" deselect everything except "GPUs (all regions)". You will need to increase this quota to 8 as well. Once the increase is approved, you will be able to use all of your GPUs.
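If you just want to check the current value of that quota from the CLI, a quick sketch (assuming gcloud is authenticated and PROJECT_ID holds your project ID; the increase itself still has to be requested through the console):
# List project-wide Compute Engine quotas and pull out the global GPU entry
gcloud compute project-info describe \
    --project=$PROJECT_ID \
    --format="yaml(quotas)" | grep -B1 -A1 GPUS_ALL_REGIONS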

UPDATE 2022:
Here is how to do it in the 2022 Google Cloud console UI:
Simply type GPUS_ALL_REGIONS into the filter input and then edit the selected quota.

Although this was already answered by @Alex Krantz, here is a screenshot of the corresponding page in the Google Cloud console UI.
You can navigate to this page through "IAM & Admin", then "Quotas".

Related

GKE cluster creator in GCP

How can we get the cluster owner details in GKE? The logs only contain entries for service account operations, and there is no entry with the principal email of a user ID anywhere.
It seems very difficult to get the name of the user who created the GKE cluster.
We have exported the complete JSON file of logs but did not find an entry for the user who actually clicked the create-cluster button. I think knowing the GKE cluster creator is a very common use case; I'm not sure if we are missing something.
Query:
resource.type="k8s_cluster"
resource.labels.cluster_name="clusterName"
resource.labels.location="us-central1"
-protoPayload.methodName="io.k8s.core.v1.configmaps.update"
-protoPayload.methodName="io.k8s.coordination.v1.leases.update"
-protoPayload.methodName="io.k8s.core.v1.endpoints.update"
severity=DEFAULT
-protoPayload.authenticationInfo.principalEmail="system:addon-manager"
-protoPayload.methodName="io.k8s.apiserver.flowcontrol.v1beta1.flowschemas.status.patch"
-protoPayload.methodName="io.k8s.certificates.v1.certificatesigningrequests.create"
-protoPayload.methodName="io.k8s.core.v1.resourcequotas.delete"
-protoPayload.methodName="io.k8s.core.v1.pods.create"
-protoPayload.methodName="io.k8s.apiregistration.v1.apiservices.create"
I have referred to the link below, but it did not help either.
https://cloud.google.com/blog/products/management-tools/finding-your-gke-logs
Audit Logs and specifically Admin Activity Logs
And, there's a "trick": the activity audit log entries include the API method, so you can find the API method that interests you. This isn't super straightforward but it's relatively easy. You can start by scoping to the service. For GKE, the service is container.googleapis.com.
NOTE In APIs Explorer, see the Kubernetes Engine API (really container.googleapis.com) and projects.locations.clusters.create. The mechanism breaks down a little here, as the protoPayload.methodName is a variant of the underlying REST method name.
And so you can use logs explorer with the following very broad query:
logName="projects/{PROJECT}/logs/cloudaudit.googleapis.com%2Factivity"
container.googleapis.com
NOTE replace {PROJECT} with your project ID.
And then refine this based on what's returned:
logName="projects/{PROJECT}/logs/cloudaudit.googleapis.com%2Factivity"
protoPayload.serviceName="container.googleapis.com"
protoPayload.methodName="google.container.v1beta1.ClusterManager.CreateCluster"
NOTE I mentioned that it isn't super straightforward because, as you can see in the above, I'd used gcloud beta container clusters create and so I need the google.container.v1beta1.ClusterManager.CreateCluster method but, it was easy to determine this from the logs.
And, who dunnit?
protoPayload: {
authenticationInfo: {
principalEmail: "{me}"
}
}
So:
PROJECT="[YOUR-PROJECT]"
FILTER="
logName=\"projects/${PROJECT}/logs/cloudaudit.googleapis.com%2Factivity\"
protoPayload.serviceName=\"container.googleapis.com\"
protoPayload.methodName=\"google.container.v1beta1.ClusterManager.CreateCluster\"
"
gcloud logging read "${FILTER}" \
--project=${PROJECT} \
--format="value(protoPayload.authenticationInfo.principalEmail)"
For those who are looking for a quick answer: use the log filter below in Logs Explorer to check the creator of the cluster.
resource.type="gke_cluster"
protoPayload.authorizationInfo.permission="container.clusters.create"
resource.labels.cluster_name="your-cluster-name"
With the gcloud command below, you can get the creation date of the cluster:
gcloud container clusters describe YOUR_CLUSTER_NAME --zone ZONE
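If you'd rather stay on the command line, the same filter can be passed to gcloud logging read; a sketch (your-cluster-name and YOUR_PROJECT are placeholders):
# Print the principal that created the cluster, using the filter from the quick answer above
gcloud logging read '
  resource.type="gke_cluster"
  protoPayload.authorizationInfo.permission="container.clusters.create"
  resource.labels.cluster_name="your-cluster-name"' \
  --project=YOUR_PROJECT \
  --format="value(protoPayload.authenticationInfo.principalEmail)"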

Why am I getting inconsistent results when attempting to update my instance group using `gcloud`?

I have an instance group in GCP, and I am working on automating the deployment process. The instances in this group are based on a tagged GCR image. When a new image is pushed to the container registry, we have been manually triggering an upgrade by navigating to the instance group from console.cloud.google.com, clicking "restart/replace vms", and setting these options:
Operation: replace
Maximum surge: 3
Maximum unavailable: 0
Here is my gcloud command for doing the same thing (link to Google's documentation about this command):
gcloud beta compute instance-groups managed rolling-action start-update my-instance-group \
--version=template=my-template-with-image \
--replacement-method=substitute \
--max-surge=3 \
--max-unavailable=0 \
--region=us-central1
Manually, the process always works. But the gcloud command is flaky. It always appears to succeed from the command line, but the instance groups are not always restarted. I have even tried adding these two flags, and the restart attempt was still unreliable:
--minimal-action=replace \
--most-disruptive-allowed-action=replace \
There is quite a lot of output from the gcloud command (which I can provide, if necessary), but here are the only parts of the output that differ between a successful and unsuccessful attempt:
Good:
currentActions:
creating: 1
status:
isStable: false
versionTarget:
isReached: false
Bad:
currentActions:
creating: 0
status:
isStable: true
versionTarget:
isReached: true
That is pretty much the extent of my knowledge at this point. I am not sure how to move forward in automating the build process, and I have been unable to find answers from the documentation so far.
I hope I was not too verbose, and thank you in advance to anyone who spends time on this :)
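In case it helps anyone reproduce this: the state the MIG reports after the command runs can be inspected with something like the following (a sketch using the names from above; the format projection just trims the output to the fields shown in the good/bad snippets):
# Show the MIG's stability, version target, and in-flight actions
gcloud compute instance-groups managed describe my-instance-group \
    --region=us-central1 \
    --format="yaml(status, versions, currentActions)"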

Error with gcloud beta command for streaming assets to bigquery

This might be a bit bleeding edge, but hopefully someone can help. The problem is a catch-22.
What we're trying to do is create a continuous stream of inventory changes in each GCP project to BigQuery dataset tables that we can create reports from, to get a better idea of what we're paying for, what's turned on, what's in use, what isn't, etc.
Error: Error running command 'gcloud beta asset feeds create asset_change_feed --project=project_id --pubsub-topic=asset_change_feed': exit status 2. Output: ERROR: (gcloud.beta.asset.feeds.create) argument (--asset-names --asset-types): Must be specified.
Usage: gcloud beta asset feeds create FEED_ID --pubsub-topic=PUBSUB_TOPIC (--asset-names=[ASSET_NAMES,...] --asset-types=[ASSET_TYPES,...]) (--folder=FOLDER_ID | --organization=ORGANIZATION_ID | --project=PROJECT_ID) [optional flags]
optional flags may be --asset-names | --asset-types | --content-type |
--folder | --help | --organization | --project
For detailed information on this command and its flags, run:
gcloud beta asset feeds create --help
Using Terraform we tried creating a Dataflow job and a Pub/Sub topic called asset_change_feed.
We then get an error from the gcloud beta asset feeds create command because it wants a parameter that includes all the asset names to monitor...
Well... this kind of defeats the purpose. The whole point is to monitor all the asset names that change, appear and disappear. It's like creating a feed that monitors all the new baby names that appear over the next year but the feed command requires that we know them in advance somehow. WTF? What's the point then? Are we re-inventing the wheel here?
We were going by this documentation here:
https://cloud.google.com/asset-inventory/docs/monitoring-asset-changes#creating_a_feed
As per the gcloud beta asset feeds create documentation, it is required to specify at least one of --asset-names and --asset-types:
At least one of these must be specified:
--asset-names=[ASSET_NAMES,…] A comma-separated list of the full names of the assets to receive updates. For example:
//compute.googleapis.com/projects/my_project_123/zones/zone1/instances/instance1.
See
https://cloud.google.com/apis/design/resource_names#full_resource_name
for more information.
--asset-types=[ASSET_TYPES,…] A comma-separated list of the types of assets to receive updates. For example:
compute.googleapis.com/Disk,compute.googleapis.com/Network See
https://cloud.google.com/resource-manager/docs/cloud-asset-inventory/overview
for all supported asset types.
Therefore, when we don't know the names a priori, we can monitor all resources of the desired types by passing only --asset-types. You can see the list of supported asset types here, or use the exportAssets API method (gcloud asset export) to retrieve the types used at an organization, folder or project level.
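For example, a sketch of a feed that watches all Compute Engine instances and disks in a project (the topic path and asset types here are illustrative; swap in whatever you want to monitor):
gcloud beta asset feeds create asset_change_feed \
    --project=project_id \
    --pubsub-topic=projects/project_id/topics/asset_change_feed \
    --asset-types="compute.googleapis.com/Instance,compute.googleapis.com/Disk" \
    --content-type=resource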

API [sqladmin.googleapis.com] not enabled on project [1234].

When running: gcloud sql instances create example --tier=db-n1-standard-1 --region=europe-west1
I get the error in the title, though I'm not too sure why as I do have the 'Google Cloud SQL API' enabled.
What is the cause of this error?
It seems it takes a while (a few minutes) for the change to propagate...
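If you want to double-check or (re)enable the API from the CLI while you wait, a sketch (YOUR_PROJECT is a placeholder):
# Confirm whether the Cloud SQL Admin API is enabled, and enable it if it isn't
gcloud services list --enabled --project=YOUR_PROJECT | grep sqladmin
gcloud services enable sqladmin.googleapis.com --project=YOUR_PROJECT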

How do I filter and extract raw log event data from Amazon Cloudwatch

Is there any way to 1) filter and 2) retrieve the raw log data out of Cloudwatch via the API or from the CLI? I need to extract a subset of log events from Cloudwatch for analysis.
I don't need to create a metric or anything like that. This is for historical research of a specific event in time.
I have gone to the log viewer in the console but I am trying to pull out specific lines to tell me a story around a certain time. The log viewer would be nigh-impossible to use for this purpose. If I had the actual log file, I would just grep and be done in about 3 seconds. But I don't.
Clarification
In the description of Cloudwatch Logs, it says, "You can view the original log data (only in the web view?) to see the source of the problem if needed. Log data can be stored and accessed (only in the web view?) for as long as you need using highly durable, low-cost storage so you don’t have to worry about filling up hard drives." --italics are mine
If this console view is the only way to get at the source data, then storing logs via CloudWatch is not an acceptable solution for my purposes. I need to get at the actual data with sufficient flexibility to search for patterns, not click through dozens of pages of lines and copy/paste. It appears a better way to get to the source data may not be available, however.
For using the AWSCLI (the plain one as well as with the cwlogs plugin) see http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/SearchDataFilterPattern.html
For pattern syntax (plain text, [space separated] as well as {JSON syntax}) see: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/FilterAndPatternSyntax.html
For the Python command-line utility awslogs see https://github.com/jorgebastida/awslogs.
AWSCLI: aws logs filter-log-events
AWSCLI is the official CLI for AWS services, and it now supports logs too.
To show help:
$ aws logs filter-log-events help
The filter can be based on:
log group name --log-group-name (only last one is used)
log stream name --log-stream-name (can be specified multiple times)
start time --start-time
end time --end-time (not --stop-time)
filter pattern --filter-pattern
Only --log-group-name is obligatory.
Times are expressed as epoch using milliseconds (not seconds).
The call might look like this:
$ aws logs filter-log-events \
--start-time 1447167000000 \
--end-time 1447167600000 \
--log-group-name /var/log/syslog \
--filter-pattern ERROR \
--output text
It prints 6 columns of tab separated text:
1st: EVENTS (to denote that the line is a log record and not other information)
2nd: eventId
3rd: timestamp (time declared by the record as event time)
4th: logStreamName
5th: message
6th: ingestionTime
So if you have Linux command line utilities at hand and care only about log record messages for interval from 2015-11-10T14:50:00Z to 2015-11-10T15:00:00Z, you may get it as follows:
$ aws logs filter-log-events \
--start-time `date -d 2015-11-10T14:50:00Z +%s`000 \
--end-time `date -d 2015-11-10T15:00:00Z +%s`000 \
--log-group-name /var/log/syslog \
--filter-pattern ERROR \
--output text| grep "^EVENTS"|cut -f 5
AWSCLI with cwlogs plugin
The cwlogs AWSCLI plugin is simpler to use:
$ aws logs filter \
--start-time 2015-11-10T14:50:00Z \
--end-time 2015-11-10T15:00:00Z \
--log-group-name /var/log/syslog \
--filter-pattern ERROR
It expects human readable date-time and always returns text output with (space delimited) columns:
1st: logStreamName
2nd: date
3rd: time
4th till the end: message
On the other hand, it is a bit more difficult to install (a few more steps, plus current pip requires declaring the installation domain as a trusted one).
$ pip install awscli-cwlogs --upgrade \
--extra-index-url=http://aws-cloudwatch.s3-website-us-east-1.amazonaws.com/ \
--trusted-host aws-cloudwatch.s3-website-us-east-1.amazonaws.com
$ aws configure set plugins.cwlogs cwlogs
(if you make a typo in the last command, just correct it in the ~/.aws/config file)
awslogs command from jorgebastida/awslogs
This has become my favourite one: easy to install, powerful, easy to use.
Installation:
$ pip install awslogs
To list available log groups:
$ awslogs groups
To list log streams:
$ awslogs streams /var/log/syslog
To get the records and follow them (see new ones as they come):
$ awslogs get --watch /var/log/syslog
And you may filter the records by time range:
$ awslogs get /var/log/syslog -s 2015-11-10T15:45:00 -e 2015-11-10T15:50:00
Since version 0.2.0 it also has the --filter-pattern option.
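For example (the pattern value is just illustrative):
$ awslogs get /var/log/syslog --filter-pattern="ERROR" -s 2015-11-10T15:45:00 -e 2015-11-10T15:50:00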
The output has columns:
1st: log group name
2nd: log stream name
3rd: message
Using --no-group and --no-stream you may switch the first two columns off.
Using --no-color you may get rid of color control characters in the output.
EDIT: as awslogs version 0.2.0 adds --filter-pattern, text updated.
If you are using the Python Boto3 library to extract AWS CloudWatch Logs: the get_log_events() function accepts start and end times in milliseconds.
For reference: http://boto3.readthedocs.org/en/latest/reference/services/logs.html#CloudWatchLogs.Client.get_log_events
For this you can take a UTC time input and convert it into milliseconds using the datetime and calendar.timegm modules, and you are good to go:
from calendar import timegm
from datetime import datetime, timedelta
import sys

# Optional 'YYYY-MM-DD HH:MM:SS' arguments; if none are given, use the last hour
start_time = datetime.strptime(sys.argv[1], '%Y-%m-%d %H:%M:%S') if len(sys.argv) > 1 else None
end_time = datetime.strptime(sys.argv[2], '%Y-%m-%d %H:%M:%S') if len(sys.argv) > 2 else None
now = datetime.utcnow()
start_time = start_time or now - timedelta(hours=1)
end_time = end_time or now
# get_log_events() expects epoch timestamps in milliseconds
start_ms = timegm(start_time.utctimetuple()) * 1000
end_ms = timegm(end_time.utctimetuple()) * 1000
So you can pass the inputs via sys.argv, for example:
python flowlog_read.py '2015-11-13 00:00:00' '2015-11-14 00:00:00'
While Jan's answer is a great one and probably what the author wanted, please note that there is an additional way to get programmatic access to the logs - via subscriptions.
This is intended for always-on streaming scenarios where data is constantly fetched (usually into a Kinesis stream) and then further processed.
Haven't used it myself, but here is an open-source cloudwatch to Excel exporter I came across on GitHub:
https://github.com/petezybrick/awscwxls
Generic AWS CloudWatch to Spreadsheet Exporter. CloudWatch doesn't provide an Export utility - this does. awscwxls creates spreadsheets based on generic sets of Namespace/Dimension/Metric/Statistic specifications. As long as AWS continues to follow the Namespace/Dimension/Metric/Statistic pattern, awscwxls should work for existing and future Namespaces (Services). Each set of specifications is stored in a properties file, so each properties file can be configured for a specific set of AWS Services and resources. Take a look at run/properties/template.properties for a complete example.
I think the best option is to retrieve the data via the CloudWatch Logs API, as described in its documentation.
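For example, the GetLogEvents operation is also exposed through the CLI; a minimal sketch (the log group and stream names are placeholders):
# Pull raw events straight from a single log stream
$ aws logs get-log-events \
    --log-group-name /var/log/syslog \
    --log-stream-name my-stream \
    --limit 100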