What are the credentials used by Datalab for accessing data? - google-cloud-platform

I have access to a BigQuery table and I can use it from the BigQuery console or the gcloud command line, but when I try to run basic queries against it in Datalab I get an access denied error.

Datalab is intended for use in a team environment. Notebooks may contain results of code execution (e.g. a BigQuery SQL query) and are accessible to members of the project. Hence, Datalab uses the App Engine service account in your project to access data. This ensures uniform access for viewing and executing notebooks and minimizes the risk of accidental disclosure of data. If you do not control access to the data, you may need to ask that access be granted to the service account. You can find the service account in the Developers Console by clicking Permissions in the left navigation bar and locating the App Engine service account. Currently, Datalab does not use individual users' credentials.

Was it the same project that you worked in from the BigQuery console and in Datalab? If yes, you need owner or editor permission on the project.
Also, note that in Google Datalab the notebook uses a service account to access data rather than your own account, so check whether there are any permission differences between the two accounts. For example, if your queries refer to a dataset in another BigQuery project, you can do the following:
1. Run the following command in your Datalab notebook to check which service account is being used:
%%bash
curl --silent -H "Metadata-Flavor: Google" \
http://metadata/computeMetadata/v1/instance/service-accounts/default/email
2. Add the service account shown in the result of step 1 to the permission list of the other projects being queried, then verify access as sketched below.
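A minimal way to confirm the grant worked is to run a query from the same notebook with the google.cloud.bigquery client; this is only a sketch, and the project, dataset and table names below are placeholders:

from google.cloud import bigquery

client = bigquery.Client()  # runs as the Datalab service account, billed to this project
sql = "SELECT COUNT(*) AS n FROM `other-project.shared_dataset.some_table`"  # placeholder names
for row in client.query(sql).result():
    print(row.n)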

Related

VM Instance Service Account can't be recognized

Although I've given the Owner role to that specific service account, I can't use its permissions from the instances I connect to over SSH from my local machine.
I also can't upload my files to the Storage bucket I created in Cloud Platform.
The problem might be caused by the access token not having the appropriate permission scopes to conduct the required activity. To make sure you're using the auth scope of this service account appropriately, I recommend doing the following:
1. Run the command in the Google documentation inside the VM to create a new key for the service account. This will create a .json file in the current directory containing the private authentication key for the service account.
2. Run the command in the Google documentation to activate the service account.
3. Run gcloud auth list to check that this worked. In the output you should see an asterisk before the service account's name, indicating that this is the service account you are currently using.
4. Now refer to the Google documentation and set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the key path (in PowerShell: $env:GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"), or pass the key to the client library directly as sketched below.
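As an alternative to the environment variable, the key file can be handed to the client library directly; a minimal sketch for the Storage upload mentioned above, where the key path, project and bucket names are placeholders:

from google.cloud import storage
from google.oauth2 import service_account

# Placeholder key path, project and bucket names -- replace with your own.
creds = service_account.Credentials.from_service_account_file("/path/to/key.json")
client = storage.Client(project="my-project", credentials=creds)
bucket = client.bucket("my-bucket")
bucket.blob("hello.txt").upload_from_string("hello from the VM")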
Google Cloud Compute VMs have a setting for Access Scopes. This feature can limit the permissions that a service account has when attached to a virtual machine.
Go to the Google Cloud Console GUI, select your VM, stop the VM and then edit Access Scopes to grant the permissions you require.
Access scopes

Authenticating to pubsub in a co-lab notebook via the gcloud auth command

I would like to authenticate to Pub/Sub from a Colab notebook by using the !gcloud auth command. However, when I run that command I am able to authenticate to Cloud SQL as well as to the GCP buckets, but I get the following error message when I run the publisher = pubsub.PublisherClient() command:
DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
I want to avoid uploading my cloud credentials to a Google Drive folder to authenticate with Pub/Sub, since I want to be able to share this notebook across my organization so that other users with the correct access rights can also run the notebook directly from their end without needing to upload their own service account credentials. Is there a way I can do this? Thanks in advance.
Use:
from google.colab import auth
auth.authenticate_user()
As in this example showing Google Cloud Storage access:
https://colab.research.google.com/notebooks/io.ipynb#scrollTo=NQX0hbfYaEKc
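With that in place, the Pub/Sub client library should pick up the notebook user's credentials from the Colab runtime; a minimal sketch using google-cloud-pubsub, where the project and topic IDs are placeholders:

from google.colab import auth
from google.cloud import pubsub_v1

auth.authenticate_user()  # each user authenticates with their own account when they run the notebook

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project-id", "my-topic")  # placeholder project and topic
future = publisher.publish(topic_path, b"hello from Colab")
print(future.result())  # message ID once the publish succeeds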

Google Cloud Run permissions to query bigquery

I have a small Python app running in Google Cloud Run with Docker. The application is triggered by HTTP requests, executes a query in BigQuery and returns the result. Unfortunately I get the following permission error:
Reason: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/XXXX/jobs: Access Denied: Project XXXX: User does not have bigquery.jobs.create permission in project XXXX.\n\n(job ID: XXXX-XX-XX-XX-XXXX)\n\n
I understand I need to give Cloud Run access to BigQuery. How do I do it? To which user? How can I find out?
You need to add BigQuery permissions via IAM roles to the service account assigned to Cloud Run.
To allow Cloud Run to create BigQuery jobs (bigquery.jobs.create) you need one of the following roles:
roles/bigquery.user
roles/bigquery.jobUser
The service account for Cloud Run is displayed in the Google Cloud Console in the Cloud Run section for your service. Most likely this is the Compute Engine default service account.
To add a BigQuery role, you can use the Google Cloud Console: go to IAM, find the service account, and add the role to it.
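With one of those roles granted, the application code itself does not need a key file: the BigQuery client library picks up the Cloud Run service account through Application Default Credentials. A minimal sketch of the kind of handler described in the question (the Flask route and the query are placeholder assumptions, not the asker's code):

from flask import Flask
from google.cloud import bigquery

app = Flask(__name__)
client = bigquery.Client()  # uses the service account attached to the Cloud Run service

@app.route("/")
def run_query():
    # Placeholder query; the client authenticates with the attached service account.
    rows = client.query("SELECT 1 AS x").result()
    return str([dict(row) for row in rows])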
Documentation:
BigQuery predefined Cloud IAM roles
Service accounts on Cloud Run (fully managed)
Granting roles to service accounts
One of the issues could be that the service account your Cloud Run service is using does not have permissions on BigQuery.
You can update the service account's permissions and add the roles/bigquery.user role so it can create jobs.
Also, add other roles as your application requires. You can see details about the different BigQuery roles here.
A good rule is to grant a service account only the permissions it requires.
I hope this helps.
The application is triggered by http requests, executes a query in big query and return the result.
From a security standpoint, the permissions required are identical to those used by the custom website from this solution (I'm the author). The website is also triggered by HTTP requests, executes a query in BQ and returns the result. And granting the permission to create jobs (via the bigquery.jobUser role) is not enough.
You can grant the required permissions to the service account in different ways (e.g. a more sweeping permission and a more restricted one), the details are here at the Step 6.
Generally speaking, the more restricted and the more granular the permissions are the better for security.
I'm adding extra clarifications and also pasting specific instructions related to Google's tools usage.
To add the permission to create and run jobs (the BQ error message says this permission is lacking) execute the command:
gcloud projects add-iam-policy-binding <project-name> --member=serviceAccount:<sa-name>@<project-name>.iam.gserviceaccount.com --role roles/bigquery.jobUser
The command can be executed in Cloud Shell; open it using the "Activate Cloud Shell" icon in the BigQuery web UI or from another Google Cloud Console page. Replace the placeholders:
<sa-name> - replace with service account name used by Cloud Run,
<project-name> - replace with the project name.
The command adds the role bigquery.jobUser to the service account. Do not add other permissions/roles to solve the inability to create/run jobs because excessive permissions are bad for security.
Another permission is required to read BQ data. There are two options to add it:
Grant the bigquery.dataViewer role to the service account:
gcloud projects add-iam-policy-binding <project-name> --member=serviceAccount:<sa-name>@<project-name>.iam.gserviceaccount.com --role roles/bigquery.dataViewer
Then proceed to the next step. Not recommended unless you are using a throw-away project. The drawback of this approach is granting permissions to view all project datasets.
Take more granular approach (recommended) by allowing the service account to query one dataset only. This is the approach described below.
Execute the commands replacing <ds-name> with the dataset name (used by your query):
bq show --format=prettyjson <ds-name> >/tmp/mydataset.json
vi /tmp/mydataset.json
Using vi, append the following item to the existing access array and replace the placeholders before saving the file:
,
{
  "role": "READER",
  "userByEmail": "<sa-name>@<project-name>.iam.gserviceaccount.com"
}
Execute the command to effect the changes for the dataset:
bq update --source /tmp/mydataset.json <ds-name>
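As an aside, the same READER grant can be made with the Python BigQuery client instead of editing the JSON by hand; a rough equivalent of the bq show / bq update steps above, keeping the same <project-name>, <ds-name> and <sa-name> placeholders:

from google.cloud import bigquery

client = bigquery.Client(project="<project-name>")
dataset = client.get_dataset("<ds-name>")
entries = list(dataset.access_entries)
entries.append(bigquery.AccessEntry(
    role="READER",
    entity_type="userByEmail",
    entity_id="<sa-name>@<project-name>.iam.gserviceaccount.com",
))
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])  # applies the new access list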

Why using service account from VM instance when listing BQ jobs

I am trying to list the jobs running in BigQuery for many projects using a user account that has owner access on the GCP projects. We are using the Python APIs and the whole process runs on a VM instance. But listing the jobs fails because the default service account for the VM instance doesn't have permission on the other projects.
What I am unable to understand is why a service account is used when the user has all the access. We don't want to create a service account with owner access, so is there any way we can list the BQ jobs with only our own account and not a service account?
This is the python code I am using for listing the job:
from google.cloud import bigquery
import pandas as pd
client = bigquery.Client(project=<project_ID>)
job_list = client.list_jobs(project=<project_ID>,max_results=100000, state_filter='running', all_users=True)
I tried providing the credentials from a JSON file using the command below, but that did not help either.
client = bigquery.Client.from_service_account_json("0874ee00257b.json")
Because by default, the GCE instance (VM) is authenticated using the service account it was given when it was created (the default service account). That VM and service account are linked only to the project in which the VM was created, not to a user. This makes sense when you think about it: you wouldn't want VMs (or any services on GCP, in fact) authenticated or tied to an individual user. That would be bad practice, e.g. what if that user leaves the company and their account is deleted?
So is there anyway that we can list the BQ jobs only with our own account and not using service account.
So, back to your actual question. Yes, but I wouldn't recommend doing this if it's something you intend to deploy and productionize/operationalize.
SSH into the VM
Run gcloud auth login
Follow the prompts
Note: when you do this, Google will even tell you it's not recommended and you should use service accounts instead:
Some more info here: https://cloud.google.com/sdk/docs/authorizing
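If you go that route, note that the Python client libraries read Application Default Credentials, which gcloud auth login alone does not set; running gcloud auth application-default login as well makes the user credentials visible to the client library. A sketch of the listing loop under that assumption, with placeholder project IDs:

from google.cloud import bigquery

for project_id in ["project-a", "project-b"]:  # placeholder project IDs
    client = bigquery.Client(project=project_id)
    jobs = client.list_jobs(max_results=100, state_filter="running", all_users=True)
    for job in jobs:
        print(project_id, job.job_id, job.state)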

Cloud ML Service account cannot access Cloud Storage and is not listed in IAM & admin panel

When creating a new version of an ML Engine Model with the command
gcloud ml-engine versions create 'v1' --model=model_name --origin=gs://path_to_model/1/ --runtime-version=1.4
I receive the following error:
ERROR: (gcloud.ml-engine.versions.create) FAILED_PRECONDITION: Field: version.deployment_uri Error: Read permissions are required for Cloud ML service account cloud-ml-service@**********.iam.gserviceaccount.com to the model file gs://path_to_model/1/saved_model.pb.
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: Read permissions are required for Cloud ML service account cloud-ml-service@**********.iam.gserviceaccount.com to the model file gs://path_to_model/1/saved_model.pb.
    field: version.deployment_uri
This service account is not listed in the IAM & admin panel and does not belong to my project, so I don't want to grant permissions for this account manually.
Has anyone else also experienced this? Any suggestions on what I should do?
Additional information:
The google storage bucket has storage class regional and location europe-west1.
I already tried to disable (and re-enable) the ML Engine service with the command
gcloud services disable ml.googleapis.com
but this resulted in the following error:
ERROR: (gcloud.services.disable) The operation with ID tmo-acf.********-****-****-****-************ resulted in a failure.
Updated information:
The storage bucket does not belong to a different project.
The command
gcloud iam service-accounts get-iam-policy cloud-ml-service@**********.iam.gserviceaccount.com
gives the error:
ERROR: (gcloud.iam.service-accounts.get-iam-policy) PERMISSION_DENIED: Permission iam.serviceAccounts.getIamPolicy is required to perform this operation on service account projects/-/serviceAccounts/cloud-ml-service@**********.iam.gserviceaccount.com.
The dash in the path projects/-/serviceAccounts/... in this error message seems very wrong to me.
PROBLEM HAS BEEN SOLVED
I was finally able to disable the ML Engine service after removing all my models. After re-enabling the service I got a new service account which shows up in my IAM & admin panel and is able to access my cloud storage.
If someone finds this issue, @freeCris wrote the solution in the question. I decided to write this down because I read all the documentation in the answers and found nothing useful, and then realized he had written how to solve it in the question itself.
For those wanting to fix this, just run (make sure you don't have resources in ML Engine such as models and versions):
gcloud services disable ml.googleapis.com
And then run:
gcloud services enable ml.googleapis.com
You'll get a new service account that this time is listed in your IAM console. Just grant it access to your GCS bucket and it will work.
I think the problem was that you tried to create the model under a different project, which was not associated with the bucket you tried to reach. So you used the service account of that different project to access the bucket; that's why it did not have any permissions and did not appear in your IAM.
If that happens again or if anybody else has that problem, you can check your projects with gcloud projects list and change it with gcloud config set project <project name>.
Yes, that service account doesn't belong to your project; it is the Cloud ML Engine service account. For deploying on ML Engine, you will need to grant that service account read access to your model files on GCS. Here is the documentation on how you can do that: https://cloud.google.com/ml-engine/docs/access-control#permissions_required_for_storage
This might also be useful: https://cloud.google.com/ml-engine/docs/working-with-data#using_a_cloud_storage_bucket_from_a_different_project
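For anyone who would rather keep the existing service account than re-create it, the read access can also be granted programmatically; a minimal sketch with the Python Storage client, where the project, bucket name and the service account's project number are placeholders:

from google.cloud import storage

client = storage.Client(project="my-project")  # project that owns the bucket (placeholder)
bucket = client.bucket("path_to_model")  # placeholder bucket name
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectViewer",
    "members": {"serviceAccount:cloud-ml-service@PROJECT_NUMBER.iam.gserviceaccount.com"},  # placeholder
})
bucket.set_iam_policy(policy)  # grants the ML service account read access to the model files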