Nomad failing to pull from GCP container registry due to docker auth - google-container-registry

This is my Nomad client config:
client {
  enabled = true
  servers = ["127.0.0.1:4647"]
}
I get this error:
us.gcr.io/PROJECTID/IMAGE/NAME:latest: API error (404): {"message":"pull access denied for us.gcr.io/PROJECTID/IMAGE/NAME, repository does not exist or may require 'docker login'"}
I'm trying to pull an image from Google Container Registry (us.gcr.io).
How do I instruct Nomad to use Docker authentication via gcloud?

gcloud auth configure-docker
And the Nomad client config:
client {
  enabled = true
  servers = ["127.0.0.1:4647"]
  options = {
    "docker.auth.helper" = "gcloud"
  }
}
I have no idea how this will work when I need other Docker auth helpers... but oh well.
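The "gcloud" value is the suffix of the docker-credential-gcloud helper binary that gcloud auth configure-docker installs. If you later need credentials for several registries, one option is the docker driver's docker.auth.config client option instead of a single helper; a sketch (the file path and the second registry/helper names are hypothetical placeholders):
client {
  enabled = true
  servers = ["127.0.0.1:4647"]
  options = {
    # Hypothetical path; the file uses the usual ~/.docker/config.json format
    "docker.auth.config" = "/etc/nomad.d/docker-auth.json"
  }
}
with a docker-auth.json along these lines:
{
  "credHelpers": {
    "us.gcr.io": "gcloud",
    "other.registry.example.com": "other-helper"
  }
}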

Related

Google Cloud VM Instance is not authorized with service account

I have dist-upgraded my Debian VM instance to the current version, after which I re-installed the Google Cloud host environment. This is the state of Google-related services:
$ systemctl list-unit-files | grep google
google-cloud-ops-agent-diagnostics.service enabled enabled
google-cloud-ops-agent-fluent-bit.service static -
google-cloud-ops-agent-opentelemetry-collector.service static -
google-cloud-ops-agent.service enabled enabled
google-guest-agent.service enabled enabled
google-osconfig-agent.service enabled enabled
google-oslogin-cache.service static -
google-shutdown-scripts.service enabled enabled
google-startup-scripts.service enabled enabled
google-oslogin-cache.timer enabled enabled
I cannot see any errors with them in system logs. My gcloud config list looks like:
[core]
account = my-project-id-compute@developer.gserviceaccount.com
disable_usage_reporting = True
project = my-project
Your active configuration is: [default]
Everything seems to be fine, and yet I cannot access any gcloud resources, e.g.:
# gcloud compute instances list
ERROR: (gcloud.compute.instances.list) Some requests did not succeed:
- Request had insufficient authentication scopes.
I have another instance, which was not upgraded. The commands above give the same output there, yet gcloud compute instances list works correctly on it. GOOGLE_APPLICATION_CREDENTIALS is undefined on both instances, and neither instance has a $HOME/.config/gcloud/application_default_credentials.json file.
How can I authorize the service account so that it is usable on the broken instance?
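The "insufficient authentication scopes" error usually points at the access scopes attached to the VM itself rather than at IAM roles. A sketch of checking and widening them (instance name and zone are placeholders; changing scopes requires stopping the instance first):
gcloud compute instances describe broken-instance --zone=europe-west1-b \
  --format="value(serviceAccounts[].scopes)"
gcloud compute instances stop broken-instance --zone=europe-west1-b
gcloud compute instances set-service-account broken-instance --zone=europe-west1-b \
  --scopes=cloud-platform
gcloud compute instances start broken-instance --zone=europe-west1-b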

Batch cannot pull docker image from Artifact Registry

I use a workflow to create a Batch job using a Docker image hosted in an Artifact Registry Docker repository.
All of this happens within the same google cloud project.
My batch job fails with this error:
"docker: Error response from daemon: Head "https://us-west1-docker.pkg.dev/v2/entity/docker-registry/image-name/manifests/latest": denied: Permission "artifactregistry.repositories.downloadArtifacts" denied on resource "projects/project-id/locations/us-west1/repositories/docker-registry" (or it may not exist).
See 'docker run --help'.
From the Google documentation I understand that the Compute Engine default service account doesn't have roles/artifactregistry.admin: jobs default to using the Compute Engine default service account.
I get the same error after granting the role to the service account:
gcloud projects add-iam-policy-binding project-id \
  --member=serviceAccount:compute@developer.gserviceaccount.com \
  --role=roles/artifactregistry.admin
While digging into service accounts I found another service account and also gave it the role: service-xxxx@gcp-sa-cloudbatch.iam.gserviceaccount.com.
It does not solve the problem.
How can I see which service account is used?
Can I see logs about denied permissions?
The error occurs when you try to push or pull an image from a repository whose hostname (which is associated with the repository location) has not yet been authenticated and specified in the credential helper. You may refer to Setting up authentication for Docker. You should also check and confirm the service account to make sure you are still impersonating the correct one; run the command below, as mentioned in the documentation:
gcloud auth list
This command will show the active account, along with the other
accounts that are authorized to access your Google Cloud project. The
active account will be marked with an asterisk (*).
Then try to run the authentication using a command that specifies the location of your repository. You can run the configure-docker command against the auth group and see:
gcloud auth configure-docker <location>-docker.pkg.dev
And then try pulling the Docker image again.
Refer to Authenticating to a repository for more information; you can also find the permission-denied logs in Cloud Logging for more details.
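If the pull still fails after authenticating, a narrower fix is to grant the job's service account the Artifact Registry reader role instead of admin, and to inspect the submitted job to confirm which service account it actually runs as. A sketch (project ID, service account, and job name are placeholders):
# Read-only access is enough for pulling images
gcloud projects add-iam-policy-binding project-id \
  --member=serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com \
  --role=roles/artifactregistry.reader
# The job description includes the service account it runs as
gcloud batch jobs describe my-batch-job --location=us-west1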

How to configure Packer ssh into GCP VM for building image?

I am building a GCP image with Packer. I created a service account with the "Compute Instance Admin (v1)" and "Service Account User" roles. It can successfully create the VM but cannot SSH into the instance to proceed further with the custom image.
Error message
Build 'googlecompute.custom-image' errored after 2 minutes 20 seconds: Packer experienced an authentication error when trying to connect via SSH. This can happen if your username/password are wrong. You may want to double-check your credentials as part of your debugging process. original error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
build file source code (packer.pkr.hcl)
locals {
  project_id              = "project-id"
  source_image_family     = "rocky-linux-8"
  source_image_project_id = ["rocky-linux-cloud"]
  ssh_username            = "packer"
  machine_type            = "e2-medium"
  zone                    = "us-central1-a"
}
source "googlecompute" "custom-image" {
  image_name              = "custom-image"   # Name of image to be created
  image_description       = "Custom Image 1" # Description for image to be created
  project_id              = "${local.project_id}"
  source_image_family     = "${local.source_image_family}"
  source_image_project_id = "${local.source_image_project_id}"
  ssh_username            = "${local.ssh_username}"
  machine_type            = "${local.machine_type}"
  zone                    = "${local.zone}"
}
build {
  sources = ["source.googlecompute.custom-image"]
  #
  # Run arbitrary shell script file
  #
  provisioner "shell" {
    execute_command = "sudo su - root -c \"sh {{ .Path }} \""
    script          = "foo.sh"
  }
}
It appears that you are having trouble connecting via SSH to the Packer-created instance for your GCP image. This error message indicates that SSH authentication failed, which can happen if the username or key is wrong or if the necessary permissions are not granted. To resolve this, check that the Compute Instance Admin (v1) and Service Account User roles grant the required access rights. In addition, the project's firewall rules may need to allow incoming SSH connections on the port you're using; you can refer to the official GCP documentation for more information on configuring firewall rules. You can also connect to the instance with the gcloud compute ssh command to continue troubleshooting.
Attaching the SSH troubleshooting documentation for reference.
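For reference, a minimal firewall rule allowing inbound SSH could look like this (a sketch; the rule name, network, and source range are placeholders and should be tightened for your environment):
gcloud compute firewall-rules create allow-packer-ssh \
  --network=default \
  --direction=INGRESS \
  --allow=tcp:22 \
  --source-ranges=0.0.0.0/0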
The problem was associated with Qwiklabs. I was using the lab environment provided by Qwiklabs for testing Packer and GCP.
Once I deployed the same thing in a regular GCP project, Packer ran successfully. It was suggested that there may be some constraints in the Qwiklabs lab environment.

What are access scopes in gcloud auth, and why do they differ in Cloud Shell vs. my local machine?

I'm seeing a permissions bug when using docker push as described in the Google Artifact Registry Quickstart. As noted in that question, the problem seems to come down to missing scopes on the access token. In my local shell, the scopes are these (as indicated by https://www.googleapis.com/oauth2/v1/tokeninfo?access_token=<token>):
openid https://www.googleapis.com/auth/userinfo.email https://www.googleapis.com/auth/cloud-platform https://www.googleapis.com/auth/appengine.admin https://www.googleapis.com/auth/compute https://www.googleapis.com/auth/accounts.reauth
When I run the same sequence of steps in Cloud Shell, I have many more scopes on the access token:
https://www.googleapis.com/auth/userinfo.email https://www.googleapis.com/auth/appengine.admin https://www.googleapis.com/auth/bigquery https://www.googleapis.com/auth/compute https://www.googleapis.com/auth/devstorage.full_control https://www.googleapis.com/auth/devstorage.read_only https://www.googleapis.com/auth/drive https://www.googleapis.com/auth/ndev.cloudman https://www.googleapis.com/auth/cloud-platform https://www.googleapis.com/auth/sqlservice.admin https://www.googleapis.com/auth/prediction https://www.googleapis.com/auth/projecthosting https://www.googleapis.com/auth/source.full_control https://www.googleapis.com/auth/source.read_only https://www.googleapis.com/auth/source.read_write openid
I'm not able to pinpoint what differences between my Cloud Shell configuration and my local one might cause this difference in scopes. These commands all have the same output on both:
$ gcloud auth list
Credentialed Accounts
ACTIVE: *
ACCOUNT: <my email address>
$ cat ~/.docker/config.json
{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud",
    "us-central1-docker.pkg.dev": "gcloud"
  }
}
gcloud config list shows these differences:
// in Cloud Shell
[accessibility]
screen_reader = True
[component_manager]
disable_update_check = True
[compute]
gce_metadata_read_timeout_sec = 30
[core]
account = <my email address>
disable_usage_reporting = True
project = <my project>
[metrics]
environment = devshell
// on my local machine
[core]
account = <my email address>
disable_usage_reporting = True
pass_credentials_to_gsutil = false
project = <my project>
Questions:
What are scopes here anyway? What is their relationship to the roles assigned to the project principal (example@stackoverflow.com)?
What could be causing my scopes to differ in Cloud Shell vs on my local machine? How do I fix it so I can correctly access the Artifact Registry locally?
EDIT:
To clarify, here are the commands I'm running and the error I'm seeing, which exactly duplicates the SO question referenced above. Commands are taken directly from the Artifact Registry Quickstart (https://cloud.google.com/artifact-registry/docs/docker/quickstart#gcloud). This question was intended to be about scopes, but it seems those may not be my issue.
$ gcloud auth configure-docker us-central1-docker.pkg.dev
WARNING: Your config file at [~/.docker/config.json] contains these credential helper entries:
{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud",
    "us-central1-docker.pkg.dev": "gcloud"
  }
}
Adding credentials for: us-central1-docker.pkg.dev
gcloud credential helpers already registered correctly.
$ sudo docker tag us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0 \
us-central1-docker.pkg.dev/<my project>/quickstart-docker-repo/quickstart-image:tag1
$ sudo docker push us-central1-docker.pkg.dev/<my project>/quickstart-docker-repo/quickstart-image:tag1
The push refers to repository [us-central1-docker.pkg.dev/<my project>/quickstart-docker-repo/quickstart-image]
260c3e3f1e70: Preparing
e2eb06d8af82: Preparing
denied: Permission "artifactregistry.repositories.downloadArtifacts" denied on resource "projects/qwanto/locations/us-central1/repositories/quickstart-docker-repo" (or it may not exist)
What are scopes here anyway? What is their relationship to the roles assigned to the project principal (example@stackoverflow.com)?
In Google Cloud, permissions for an identity are determined by Google Cloud IAM roles. This is an important point to understand.
OAuth scopes are used when requesting authorization. Scopes can limit the permissions granted to a subset of the permissions granted by an IAM role. Scopes cannot grant permissions that exceed or are not included in an IAM role.
Think of the resulting permissions being the intersection of IAM Roles and OAuth Scopes.
Note: You have the scope https://www.googleapis.com/auth/cloud-platform which is sufficient. The other scopes are just extras. Ignore scopes and make sure your IAM roles are correct.
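If you want to double-check the scopes attached to your local token, a quick way (reusing the tokeninfo endpoint already quoted in the question) is:
curl "https://www.googleapis.com/oauth2/v1/tokeninfo?access_token=$(gcloud auth print-access-token)"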
What could be causing my scopes to differ in Cloud Shell vs on my
local machine? How do I fix it so I can correctly access the Artifact
Registry locally?
You are chasing the wrong details (scopes) in solving your problem. Provided that you have the correct IAM roles granted to your identity, you can push to Container Registry and Artifact Registry.
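To confirm the IAM roles granted to your identity at the project level, a sketch (PROJECT_ID and the email address are placeholders):
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:user:you@example.com" \
  --format="table(bindings.role)"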
Presumably running into the same issue, I found the solution somewhat hidden in the docs (at the end of the linked section):
Note: If you normally run Docker commands on Linux with sudo, Docker
looks for Artifact Registry credentials in /root/.docker/config.json
instead of $HOME/.docker/config.json. If you want to use sudo with
docker commands instead of using the Docker security group, configure
credentials with sudo gcloud auth configure-docker instead.
So basically, the quickstart works only if you don't use sudo when running docker.
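In practice that leaves two options (a sketch: the first command comes from the quoted docs, the second from Docker's standard post-install steps rather than the GCP docs):
# Option 1: configure credentials for root, matching `sudo docker push`
sudo gcloud auth configure-docker us-central1-docker.pkg.dev
# Option 2: allow running docker without sudo (log out and back in afterwards)
sudo usermod -aG docker $USER
docker push us-central1-docker.pkg.dev/<my project>/quickstart-docker-repo/quickstart-image:tag1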

Google Stackdriver Logging doesn't work in Google Cloud Shell or GKE

I've built a docker image for GKE, and want to use Google Stackdriver Logging.
For the moment I'm just trying to log Service started when a service starts.
While running the container on my host works well (in Google Cloud Console > Logs Viewer > Global, I can see Service started when expected), running the container the exact same way in Google Cloud Shell doesn't log anything. Deploying to GKE shows the exact same behavior: no errors, but I can't find the supposedly created logs.
Here are the scopes for my cluster:
cloud-platform,compute-rw,datastore,default,storage-full,logging-write,service-control,service-management.
Note that the logging client gets successfully created:
client, err := logging.NewClient(ctx, projectID)
if err != nil {
    log.Fatalf("Failed to create the logging client: %v", err)
} else {
    fmt.Println("Logging client created")
}
app.Logger = client.Logger(logName)
text := "Started service !"
app.Logger.Log(logging.Entry{
    Payload: text,
})
I get "Logging client created" every time in my cluster logs, or when running the container manually inside the Google Cloud Shell.
But I get "Started service !" only when running the container on my own machine.
I ran the gcloud logging command:
gcloud logging read "logName=projects/${PROJECT_ID}/logs/${LOG_NAME}"
and I found out the entries have type: gce_instance instead of the expected type: global.
Thanks to https://stackoverflow.com/a/45085569/7046455 I found my logs under GCE VM Instance.
It is quite surprising NOT to be able to simply get all the logs by log name in the Google Cloud Console, while it is possible with the CLI...
EDIT: these are actually the logs from the Google Cloud Shell, NOT from my containers! I still haven't found out why my logs are not created in my cluster...
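One thing worth ruling out (my assumption, not something confirmed in this thread): with the cloud.google.com/go/logging client, Log buffers entries and sends them asynchronously, so a container that exits quickly can terminate before anything is delivered unless the logger is flushed or the client is closed, roughly like this:
// Flush buffered entries (or call client.Close()) before the process exits
if err := app.Logger.Flush(); err != nil {
    log.Printf("Failed to flush log entries: %v", err)
}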