Permission failure when pulling from gcr.io - google-container-registry

I have 2 VMs running on Google Compute Engine. They are identical except that they run under different service accounts.
Both of those service accounts have (as far as I can tell) identical permissions on the buckets used by gcr.io.
The init script that runs when the VM starts up pulls a Docker image from gcr.io. On the VM running as data-dev-dp@project-id.iam.gserviceaccount.com the pull succeeds:
Unable to find image 'gcr.io/project-id/gdp/jupyterlab-py2-spark-notebook:1.9' locally
1.9: Pulling from project-id/gdp/jupyterlab-py2-spark-notebook
bc51dd8edc1b: Pulling fs layer
b56e3f6802e3: Pulling fs layer
On the VM running as data-dev-cmp@project-id.iam.gserviceaccount.com the pull fails:
Unable to find image 'gcr.io/project-id/gdp/jupyterlab-py2-spark-notebook:1.9' locally
/usr/bin/docker: Error response from daemon: pull access denied for gcr.io/project-id/gdp/jupyterlab-py2-spark-notebook, repository does not exist or may require 'docker login': denied: Permission denied for "1.9" from request "/v2/project-id/gdp/jupyterlab-py2-spark-notebook/manifests/1.9"
I was under the impression that having identical permissions on the bucket should be sufficient, so I'm wondering what other permissions are required to make this work. Could anyone suggest something?
UPDATE: I used toolbox (https://cloud.google.com/container-optimized-os/docs/how-to/toolbox) to verify that the permissions on the bucket are in fact not the same for those two accounts.
Running as data-dev-dp@project-id.iam.gserviceaccount.com:
# gsutil ls gs://artifacts.project-id.appspot.com
gs://artifacts.project-id.appspot.com/containers/
Running as data-dev-cmp@project-id.iam.gserviceaccount.com:
# gsutil ls gs://artifacts.project-id.appspot.com
AccessDeniedException: 403 data-dev-cmp@project-id.iam.gserviceaccount.com does not have storage.objects.list access to artifacts.project-id.appspot.com.
Clearly that's the cause of the issue, though I find it very strange that my screenshots above from the GCP Console suggest otherwise. I am continuing to investigate.
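For reference, a minimal sketch of how that missing bucket-level access could be checked and granted from the CLI, assuming the bucket and service account names above (illustrative only, since in our case the roles are managed by our provisioning tooling):
# inspect the bucket's IAM policy to see which members actually hold roles on it
gsutil iam get gs://artifacts.project-id.appspot.com
# grant read access on the GCR backing bucket to the failing service account
# (objectViewer is the shorthand for roles/storage.objectViewer)
gsutil iam ch serviceAccount:data-dev-cmp@project-id.iam.gserviceaccount.com:objectViewer gs://artifacts.project-id.appspot.com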

This turned out to be a problem that is all too familiar to us because we are constantly creating infrastructure, tearing it down, and standing it up again. When those operations don't complete cleanly (as was the case today), we can end up with roles assigned to an old instance of a service account: the console reports that the account has the roles, but in practice it does not.
The solution on this occasion was to tear down all the infrastructure cleanly and then recreate it, including the service account that was exhibiting the problem.

Related

AWS EMR jupyter error 403 Forbidden (Workspace is not attached to cluster)

I have a simple notebook in EMR. I have no running clusters. From the notebook's open page itself I request a new cluster, so my expectation is that all the parameters necessary to ensure a good notebook-cluster connection are in place. I observe that the release is emr-5.36.0 and that the applications Hadoop, Spark, Livy, Hive, and JupyterEnterpriseGateway are all included. I am using default security groups.
Both the cluster and the notebook hosts start, but upon opening Jupyter (or JupyterLab) the kernel launch fails with the message Error 403: Workspace is not attached to cluster. All attempts at "jiggling" the kernel -- choosing a different one, doing a start/stop, etc. -- yield the same error.
There are a number of docs plus answers here on SO, but these tend to revolve around trying to use EC2 instances instead of EMR, messing with master vs. core nodes, forgetting JupyterEnterpriseGateway, and the like. Again, you'd think that a cluster launched directly from the notebook would work.
Any clues?
I have done this many times before with the "create new cluster" option and it always works; default security groups are not an issue.
Here is an image of one from before:
One thing that could cause this error, and which you have not made clear, is that it will not let you open it as root. So do not use the root AWS account to create the cluster / notebook. Create and use an IAM user that has permissions to launch the cluster.
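A quick way to confirm which identity is actually creating the cluster and notebook is an STS call; the root account shows up with an ARN ending in :root, whereas an IAM user's ARN ends in :user/some-name (the account ID in the comment below is just a placeholder):
# prints the account, user ID and ARN of the current caller, e.g.
# "Arn": "arn:aws:iam::123456789012:user/emr-notebook-user"  (not "...:root")
aws sts get-caller-identity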
I tried with the admin policy attached.

AWS kibana/ES trying to create policy but getting "authorization exception"

I created an AWS ES cluster via terraform, VPC version.
It got me a kibana instance which I can access through a URL.
I access it via a proxy as it is in a VPC and thus not publicly accessible.
All good. But recently I ran out of disk space. The infamous Write Status was in red, and nothing was being written into the cluster anymore.
As this is a dev environment, I googled and found the easiest possible fix:
curl -XDELETE <URL>/*
So far so good, logs are being written again.
But then I thought I should fix this properly. So I did some more reading and wanted to create an Index State Management policy. I just took the default one and changed only the notification destination.
But when hitting "Create Policy" I get:
Sorry, there was an error
Authorization Exception
Which is quite odd, as AWS just created a kibana instance with no user management whatsoever - so I would assume I have all rights.
Any idea?
Indeed we had to ask support, and the reason it was failing was that - as this is a dev environment and not production - we had no master nodes and also no UltraWarm storage. The sample policy I was trying to install moves indices from hot to warm, which apparently actually means UltraWarm, and thus needs UltraWarm storage to be enabled.
A rather unhelpful error message, though.
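For anyone in the same situation who cannot enable UltraWarm, a rough sketch of a policy that skips the warm state entirely and just deletes old indices. This assumes the Open Distro ISM endpoint exposed by the AWS Elasticsearch service; the policy name and retention age are placeholders:
curl -XPUT "<URL>/_opendistro/_ism/policies/hot-delete-policy" \
  -H 'Content-Type: application/json' \
  -d '{
    "policy": {
      "description": "Delete old indices instead of moving them to (Ultra)Warm",
      "default_state": "hot",
      "states": [
        { "name": "hot",
          "actions": [],
          "transitions": [ { "state_name": "delete",
                             "conditions": { "min_index_age": "7d" } } ] },
        { "name": "delete",
          "actions": [ { "delete": {} } ],
          "transitions": [] }
      ]
    }
  }'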

GCP machine images and credentials

I have a question regarding Google Cloud custom images and how/if credentials are stored. Namely, if I customize a VM and save the machine image with public access, am I possibly exposing credentials?
In particular, I'm working on a cloud-based application that relies on a "custom" image which has both gsutil and docker installed. Basic GCE VMs have gsutil pre-installed but do not have docker. On the other hand, Container-Optimized OS has docker but does not have gsutil. Hence, I'm starting from a basic Debian image and installing docker to get what I need.
Ideally, when I distribute my application, I would like to just expose that customized image for public use; this way, users will not have to spend extra effort to make their own images.
My concern, however, is that since I have used gsutil on the customized VM, persisting this disk to an image will inadvertently save some credentials related to my project (if so, where are they?). Hence, anyone using my image would also get those credentials.
I tried to reproduce your situation. I created a custom image from the disk of an instance that could access my project's Storage buckets. Then I shared the image with another user in a different project. The user could create an instance out of that shared image. However, when he tried to access my project's buckets, he encountered an AccessDeniedException error.
Based on this reproduction and my investigations, your credentials are not exposed with the image. IAM permissions are based on roles granted to a user, a group, or a service account; sharing images cannot grant them to others.
Furthermore (as Patrick W mentioned below), anything you run from within a GCE VM instance will use the VM's service account (unless otherwise specified). As long as the service account has access to the bucket, so will your applications (including Docker containers).
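For extra peace of mind, anyone running your shared image can check from inside the VM which service account the instance is actually using; this only queries the GCE metadata server, not anything baked into the image:
# prints the email of the service account attached to this instance
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"
# or, if the Cloud SDK is installed, list the credentials gcloud/gsutil would use
gcloud auth list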

Kubernetes Engine unable to pull image from non-private / GCR repository

I was happily deploying to Kubernetes Engine for a while, but while working on an integrated Cloud Container Builder pipeline, I started getting into trouble.
I don't know what changed. I can no longer deploy to Kubernetes, even in ways I did before without Cloud Builder.
The pod rollout process gives an error indicating that it is unable to pull from the registry. This seems weird because the images exist (I can pull them using the CLI) and I granted all possibly related permissions to my user and the Cloud Builder service account.
I get the error ImagePullBackOff and see this in the pod events:
Failed to pull image
"gcr.io/my-project/backend:f4711979-eaab-4de1-afd8-d2e37eaeb988":
rpc error: code = Unknown desc = unauthorized: authentication required
What's going on? Who needs authorization, and for what?
In my case, my cluster didn't have the Storage read permission, which is necessary for GKE to pull an image from GCR.
My cluster didn't have the proper permissions because I created it through Terraform and didn't include the node_config.oauth_scopes block. When creating a cluster through the console, the Storage read scope is added by default.
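For illustration, if you create the cluster with gcloud rather than Terraform, the equivalent of adding that scope would look roughly like this (the cluster name and zone are placeholders):
# read-only access to Cloud Storage is enough for the nodes to pull from GCR
gcloud container clusters create my-cluster \
  --zone us-central1-a \
  --scopes https://www.googleapis.com/auth/devstorage.read_only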
The credentials in my project somehow got messed up. I solved the problem by re-initializing a few APIs including Kubernetes Engine, Deployment Manager and Container Builder.
The first time I tried this I didn't succeed, because to disable something you first have to disable all the APIs that depend on it. If you do this via the Google Cloud web UI, you'll likely see a list of services that are not all available for disabling in the UI.
I learned that using the gcloud CLI you can list all the APIs of your project and disable everything properly.
Things worked after that.
The reason I knew things were messed up is that I had a copy of the same setup as a production environment, and there these problems did not exist. The development environment had gone through a lot of iterations and messing around with credentials, so somewhere things got corrupted.
These are some examples of useful commands:
gcloud projects get-iam-policy $PROJECT_ID
gcloud services disable container.googleapis.com --verbosity=debug
gcloud services enable container.googleapis.com
More info here, including how to restore service account credentials.

AWS CodePipeline doesn't work anymore - GitHub's token insufficient permissions

I've created an AWS CodePipeline with GitHub as a source. It was working fine and I was able to fetch the repository from GitHub without difficulties. I've deployed my app a million times through this pipeline.
Until last Sunday (15-11-2015) when I tried to release changes to my pipeline.
Since then I'm getting
Either the GitHub repository "epub" does not exist, or the GitHub
access token provided has insufficient permissions to access the
repository. Verify that the repository exists and edit the pipeline to
reconnect the action to GitHub.
error message.
I've deleted the pipeline, revoked access for all AWS services on GitHub, and created the pipeline from scratch, granting access to the AWS CodePipeline app on GitHub.
I'm able to set up the pipeline correctly: when connecting to GitHub while setting up the pipeline, I'm able to fetch all the repos and choose a branch.
But then, after running the pipeline, I'm getting this annoying error.
It seems to me that this is a GitHub - AWS access issue, but I have no control over it, as the AWS CodePipeline app in GitHub's authorised applications controls it.
I've been trying to figure it out for a couple of days now, going through different tutorials and potential solutions, but without success.
Any advice?
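For reference, the pipeline's GitHub source configuration (including the stored OAuthToken) can be inspected and updated from the CLI; a rough sketch, where the pipeline name is a placeholder and the token needs the repo and admin:repo_hook scopes:
# dump the current pipeline definition to a file
aws codepipeline get-pipeline --name my-pipeline > pipeline.json
# edit pipeline.json: set a fresh OAuthToken in the GitHub source action's
# configuration and remove the top-level "metadata" block, then push it back
aws codepipeline update-pipeline --cli-input-json file://pipeline.json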
I've been experiencing this recently as well; however, it looks like a false negative: when I push to a branch being watched, despite the pipeline showing that stage in error, it pulls and delivers the source to the next stage.
Strange behavior, but not actually an issue.
Due to a limitation in CodePipeline, if your GitHub account has access to a large number of repositories you can encounter this error (even though the permissions are set up correctly). This happens at around 2,000 repositories, including repositories where you're a collaborator or an organization member.
Thanks for making us aware of this issue. I will update this post once this limitation is removed, but unfortunately I can't give an estimate for when that will happen.