I am trying to access GCS from PySpark running in Docker, and for that I have the JSON key file copied into the Docker container.
spark._jsc.hadoopConfiguration().set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
spark._jsc.hadoopConfiguration().set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
# This is required if you are using a service account; set it to true
spark._jsc.hadoopConfiguration().set("fs.gs.auth.service.account.enable", "true")
Then I set GOOGLE_APPLICATION_CREDENTIALS like below:
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS']=r"abc.json"
Now, when trying to access GCS objects like below, it throws an error:
df= spark.read.csv("gs://integration-o9/california_housing_train.csv")
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "111111-compute@developer.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).",
    "reason" : "forbidden"
  } ],
  "message" : "111111-compute@developer.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist)."
}
This is not the service account mentioned in the JSON file. But if I set it via the shell instead:
export GOOGLE_APPLICATION_CREDENTIALS=abc.json
it works fine.
Any advice on what to look into? I need to make it work via the os.environ property.
Posting the answer here, as the below is working (the credentials need to be set in the SparkSession config, with a .p12 key):
spark = (
    SparkSession.builder
    .appName("test")
    .config("google.cloud.auth.service.account.enable", "true")
    .config("google.cloud.auth.service.account.email", "o9int-759@integration-3-344806.iam.gserviceaccount.com")
    .config("google.cloud.auth.service.account.keyfile", "/path/file.p12")
    .getOrCreate()
)
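If you would rather keep the JSON key instead of converting to .p12, the GCS connector also accepts a JSON keyfile property, which avoids depending on the GOOGLE_APPLICATION_CREDENTIALS environment variable being visible to the JVM. A minimal sketch (property names are from the Hadoop GCS connector; the keyfile path is a placeholder):

```python
# Sketch: configure the GCS connector with a JSON keyfile directly,
# instead of relying on the GOOGLE_APPLICATION_CREDENTIALS env var.
# "/path/abc.json" is a placeholder for your service-account key path.
gcs_conf = {
    "fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    "fs.AbstractFileSystem.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    "google.cloud.auth.service.account.enable": "true",
    "google.cloud.auth.service.account.json.keyfile": "/path/abc.json",
}

def apply_gcs_conf(spark, conf=gcs_conf):
    # Push each property into the live Hadoop configuration of the session.
    hadoop_conf = spark._jsc.hadoopConfiguration()
    for key, value in conf.items():
        hadoop_conf.set(key, value)
```

The same keys can equally be passed as `.config("spark.hadoop." + key, value)` when building the SparkSession.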
Related
I'm trying to use the BigQuery Storage Read API. As far as I can tell, the local script is using an account that has the Owner role, BigQuery User, and BigQuery Read Session User on the entire project. However, running the code from the local machine yields this error:
google.api_core.exceptions.PermissionDenied: 403 request failed: the user does not have 'bigquery.readsessions.create' permission for 'projects/xyz'
According to the GCP documentation the API is enabled by default. So the only reason I can think of is my script is using the wrong account.
How would you go about debugging this issue? Is there a way to know for sure which user/account is running the Python code at run time, something like print(user.user_name)?
There is a gcloud command to list the IAM policy of a project, which shows which members hold which roles:
$ gcloud projects get-iam-policy [PROJECT_ID]
You can also check the user_email field of your job to find out which user it is using to execute your query.
Example:
{
  # ...
  "user_email": "myemail@company.com",
  "configuration": {
    # ...
    "jobType": "QUERY"
  },
  "jobReference": {
    "projectId": "my-project",
    # ...
  }
}
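To answer the "which account is my script using" part directly: when GOOGLE_APPLICATION_CREDENTIALS points at a service-account key file, the identity is the client_email field inside that file. A stdlib-only sketch (the function name is mine, not a Google API):

```python
import json
import os

def adc_identity():
    """Best-effort guess at which identity Application Default Credentials
    will resolve to. Returns the service-account email from the key file
    named by GOOGLE_APPLICATION_CREDENTIALS, or None when the variable is
    unset (in which case gcloud user credentials or the VM metadata-server
    identity are used instead)."""
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path or not os.path.exists(path):
        return None
    with open(path) as f:
        info = json.load(f)
    return info.get("client_email")
```

A None result is the interesting case here: it means the script is falling back to some other credential source, which would explain seeing an unexpected account in the error message.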
I'm trying to find my way with Google Cloud.
I have a Debian VM instance that I am running a server on. It is installed and working when started via an SSH connection in a browser window. The command to start the server is "./ninjamsrv config-file-path.cfg"
I have the config file in my default google firebase storage bucket as I will need to update it regularly.
I want to start the server referencing the cfg file in the bucket, e.g:
"./ninjamsrv gs://my-bucket/ninjam-config.cfg"
But the file is not found:
error opening configfile 'gs://my-bucket/ninjam-config.cfg'
Error loading config file!
However if I run:
"gsutil acl get gs://my-bucket/"
I see:
[
{
"entity": "project-editors-XXXXX",
"projectTeam": {
"projectNumber": "XXXXX",
"team": "editors"
},
"role": "OWNER"
},
{
"entity": "project-owners-XXXXX",
"projectTeam": {
"projectNumber": "XXXXX",
"team": "owners"
},
"role": "OWNER"
},
{
"entity": "project-viewers-XXXXX",
"projectTeam": {
"projectNumber": "XXXXX",
"team": "viewers"
},
"role": "READER"
}
]
Can anyone advise what I am doing wrong here? Thanks
The first thing to verify is if indeed the error thrown is a permission one. Checking the logs related to the VM’s operations will certainly provide more details in that aspect, and a 403 error code would confirm if this is a permission issue. If the VM is a Compute Engine one, you can refer to this documentation about logging.
If the error is indeed a permission one, then you should verify if the permissions for this object are set as “fine-grained” access. This would mean that each object would have its own set of permissions, regardless of the bucket-level access set. You can read more about this here. You could either change the level of access to “uniform” which would grant access to all objects in the relevant bucket, or make the appropriate permissions change for this particular object.
If the issue is not a permission one, then I would recommend trying to start the server from the same .cfg file hosted on the local directory of the VM. This might point the error at the file itself, and not its hosting on Cloud Storage. In case the server starts successfully from there, you may want to re-upload the file to GCS in case the file got corrupted during the initial upload.
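There is also a simpler possible root cause worth ruling out: ninjamsrv is a plain binary and does not understand gs:// URLs at all; only GCS-aware tools (gsutil, the client libraries, Hadoop connectors) do, so "error opening configfile" can occur even with perfect permissions. A hedged sketch of the usual workaround: copy the object to local disk first, then start the server against the local path (assumes gsutil is installed and authenticated on the VM; the helper name is mine):

```python
import subprocess
from urllib.parse import urlparse

def fetch_gcs_file(gs_uri, local_path):
    """Copy a gs:// object to a local file so a GCS-unaware binary
    (such as ninjamsrv) can read it. Raises ValueError for non-gs URIs."""
    parsed = urlparse(gs_uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {gs_uri}")
    # Shell out to gsutil, which handles auth via the VM's service account.
    subprocess.run(["gsutil", "cp", gs_uri, local_path], check=True)
    return local_path

# Usage on the VM (paths are illustrative):
#   fetch_gcs_file("gs://my-bucket/ninjam-config.cfg", "/tmp/ninjam-config.cfg")
#   then start the server with: ./ninjamsrv /tmp/ninjam-config.cfg
```

The same copy step can of course be a one-line `gsutil cp` in a startup script; the point is that the server only ever sees a local path.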
I'm pretty new on Google cloud, and I'm trying to use GCloud command line, and I faced the following problem
Error: Forbidden access to resources.
Raw response:
{
  "error": {
    "errors": [
      {
        "domain": "global",
        "reason": "forbidden",
        "message": "Required 'compute.regions.get' permission for 'projects/$project_id/regions/us-central1'"
      }
    ],
    "code": 403,
    "message": "Required 'compute.regions.get' permission for 'projects/$project_id/regions/us-central1'"
  }
}
Can someone help?
Much appreciated
To troubleshoot your issue, please try the following:
Where are you running the command: Cloud Shell or your local environment?
If it is your local environment, try Cloud Shell instead.
Check that you are using the latest version of the gcloud SDK (262 at the time of writing).
Did you properly initialize gcloud (gcloud init)?
Can you confirm that you have an appropriate role to run the command, such as Editor or Owner?
Check that you are using the same location/region for your resources.
If the above steps don't work, can you share your complete gcloud command for more context?
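As a concrete version of the "did you properly initialize gcloud / which account" checks above, something like this sketch can be dropped into a script; it just shells out to the gcloud CLI, so it returns None wherever gcloud is not installed (the function name and flags shown are standard gcloud options, but the helper itself is mine):

```python
import subprocess

def gcloud_active_account():
    """Return the account gcloud is currently using, or None if the
    gcloud CLI is unavailable or no account is active."""
    try:
        result = subprocess.run(
            ["gcloud", "auth", "list",
             "--filter=status:ACTIVE", "--format=value(account)"],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None
```

Comparing this account against the members listed by `gcloud projects get-iam-policy` usually pinpoints a mismatch quickly.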
Oh, I see where the problem is! When I created the storage bucket, I set the region to "Asia". When I configured gcloud via gcloud init, I set it to "us-central1-a". In this context, "Permission denied" means I have no permission to access a different region's resources, which is misleading when thinking in terms of cloud scope. However, Pawel's answer is more comprehensive and is a very good starting point to lead you in the correct direction.
I'm following the instructions in the Cloud Data Fusion sample tutorial and everything seems to work fine, until I try to run the pipeline right at the end. Cloud Data Fusion Service API permissions are set for the Google managed Service account as per the instructions. The pipeline preview function works without any issues.
However, when I deploy and run the pipeline it fails after a couple of minutes. Shortly after the status changes from provisioning to running the pipeline stops with the following permissions error:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "xxxxxxxxxxx-compute@developer.gserviceaccount.com does not have storage.buckets.create access to project X.",
    "reason" : "forbidden"
  } ],
  "message" : "xxxxxxxxxxx-compute@developer.gserviceaccount.com does not have storage.buckets.create access to project X."
}
xxxxxxxxxxx-compute@developer.gserviceaccount.com is the default Compute Engine service account for my project.
"Project X" is not one of mine though, I've no idea why the pipeline startup code is trying to create a bucket there, it does successfully create temporary buckets ( one called df-xxx and one called dataproc-xxx) in my project before it fails.
I've tried this with two separate accounts and get the same error in both places. I had tried adding Storage Admin roles to the various service accounts to no avail, but that was before I realized it was attempting to access a different project entirely.
I believe I was able to reproduce this. What's happening is that the BigQuery Source plugin first creates a temporary working GCS bucket to export the data to, and I suspect it is attempting to create it in the Dataset Project ID by default, instead of your own project as it should.
As a workaround, create a GCS bucket in your account, and then in the BigQuery Source configuration of your pipeline, set the "Temporary Bucket Name" configuration to "gs://<your-bucket-name>"
You are missing the permission-setup steps that come after creating an instance. The instructions for giving your service account the right permissions are on this page: https://cloud.google.com/data-fusion/docs/how-to/create-instance
I am testing my backup procedure for an API, in my API Gateway.
So, I am exporting my API from the API Console within my AWS account, I then go into API Gateway and create a new API - "import from swagger".
I paste my exported definition in and create, which throws tons of errors.
From my reading - it seems this is a known issue / pain.
I suspect the reason for the errors is that I use a custom authorizer;
"security" : [ {
"TestAuthorizer" : [ ]
}, {
"api_key" : [ ]
} ]
I use this on each method, hence, I get a lot of errors.
The weird thing is that I can clone this API perfectly fine, hence, I assumed that I could take an exported definition and re-import without issues.
Any ideas on how I can correct these errors (preferably within my API gateway, so that I can export / import with no issues).
An example of one of my GET methods using this authorizer is:
"/api/example" : {
  "get" : {
    "produces" : [ "application/json" ],
    "parameters" : [ {
      "name" : "Authorization",
      "in" : "header",
      "required" : true,
      "type" : "string"
    } ],
    "responses" : {
      "200" : {
        "description" : "200 response",
        "schema" : {
          "$ref" : "#/definitions/exampleModel"
        },
        "headers" : {
          "Access-Control-Allow-Origin" : {
            "type" : "string"
          }
        }
      }
    },
    "security" : [ {
      "TestAuthorizer" : [ ]
    }, {
      "api_key" : [ ]
    } ]
  }
}
Thanks in advance
UPDATE
The error(s) that I get when importing a definition I had just exported are:
Your API was not imported due to errors in the Swagger file.
Unable to put method 'GET' on resource at path '/api/v1/MethodName': Invalid authorizer ID specified. Setting the authorization type to CUSTOM or COGNITO_USER_POOLS requires a valid authorizer.
I get the message for each method in my API - so there is a lot.
Additionally, right at the end of the message, I get this:
Additionally, these warnings were found:
Unable to create authorizer from security definition: 'TestAuthorizer'. Extension x-amazon-apigateway-authorizer is required. Any methods with security: 'TestAuthorizer' will not be created. If this security definition is not a configured authorizer, remove the x-amazon-apigateway-authtype extension and it will be ignored.
I have tried importing with the "Ignore errors" option; same result.
Make sure you are exporting your swagger with both integrations and authorizers extensions.
Try exporting your swagger using AWS CLI:
aws apigateway get-export \
--parameters '{"extensions":"integrations,authorizers"}' \
--rest-api-id {api_id} \
--stage-name {stage_name} \
--export-type swagger swagger.json
The output will be sent to swagger.json file.
For more details about swagger custom extensions see this.
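For reference, when the export does include the authorizers extension, the security definition carries the x-amazon-apigateway-authorizer extension that the importer needs. A minimal sketch of what such an entry looks like for a custom TOKEN authorizer, shown as a Python dict for readability (the Lambda ARN and TTL are placeholders, not values from this API):

```python
# Sketch of a securityDefinitions entry for a custom (TOKEN) authorizer.
# The authorizerUri below is a placeholder ARN, not a real function.
security_definitions = {
    "TestAuthorizer": {
        "type": "apiKey",
        "name": "Authorization",
        "in": "header",
        "x-amazon-apigateway-authtype": "custom",
        "x-amazon-apigateway-authorizer": {
            "type": "token",
            "authorizerUri": (
                "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/"
                "arn:aws:lambda:us-east-1:123456789012:function:TestAuthorizer/invocations"
            ),
            "authorizerResultTtlInSeconds": 300,
        },
    }
}
```

Without the x-amazon-apigateway-authorizer block, the importer cannot recreate the authorizer, which is exactly the warning quoted in the question.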
For anyone who may come across this issue:
After LOTS of troubleshooting, and eventually involving the AWS Support Team, this has been resolved and identified as an AWS CLI client bug (confirmed by the AWS Support Team).
Final response.
Thank you for providing the details requested. After going through the AWS CLI version and error details, I can confirm the error is because of known issue with Powershell AWS CLI. I apologize for inconvenience caused due to the error. To get around the error I recommend going through the following steps
Create a file named data.json in the current directory where the powershell command is to be executed
Save the following contents to file {"extensions":"authorizers,integrations"}
In Powershell console, ensure the current working directory is the same as the location where data.json is present
Execute the following command: aws apigateway get-export --parameters file://data.json --rest-api-id APIID --stage-name dev --export-type swagger C:\temp\export.json
Using this finally resolved my issue - I look forward to the fix in one of the upcoming versions.
PS - this is currently on the latest version:
aws --version
aws-cli/1.11.44 Python/2.7.9 Windows/8 botocore/1.5.7