I am accessing data stored in a GCS bucket while running Python in a container on a GKE node in the same project.
I can run gsutil ls without problems, but when I try to access the bucket with Python, I get a permission error:
raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://storage.googleapis.com/storage/v1/b/xxxxxxxx/o?maxResults=1&projection=noAcl&prefix=test%2F&prettyPrint=false: Caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).
I am listing the GCS bucket using the answer from @Robino in this post. For brevity, I have copied it here:
import google.cloud.storage as gcs
client = gcs.Client()
BUCKET_NAME = "abc"
blobs = client.list_blobs(
BUCKET_NAME,
prefix="xyz/", # <- you need the trailing slash
delimiter="/",
max_results=1,
)
next(blobs, ...) # Force blobs to load.
The problem was that the authentication configuration was not being picked up by Python. Running gcloud auth application-default login solved it.
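The mismatch arises because gsutil and the Python client can end up using different credentials: the client library resolves Application Default Credentials (ADC). As a rough diagnostic, the standard-library sketch below lists the first two places ADC looks on Linux or macOS (the GOOGLE_APPLICATION_CREDENTIALS variable, then the file written by gcloud auth application-default login) and whether each file exists. The function name adc_candidates and its parameters are made up for illustration, and the sketch ignores both the metadata-server fallback and any CLOUDSDK_CONFIG override.

```python
import os
from pathlib import Path


def adc_candidates(env=None, home=None):
    """Return (source, path, exists) tuples for file-based ADC locations.

    Hypothetical helper: covers only the env var and the gcloud well-known
    file on Linux/macOS, in the order ADC checks them.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    found = []

    # 1) An explicit key file named by the environment variable.
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path:
        found.append(
            ("env:GOOGLE_APPLICATION_CREDENTIALS", key_path, Path(key_path).is_file())
        )

    # 2) User credentials written by `gcloud auth application-default login`.
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    found.append(
        ("gcloud application-default login", str(well_known), well_known.is_file())
    )
    return found
```

If neither entry exists, the client falls back to the service account attached to the node or pod, which may not hold storage.objects.list on the bucket.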
Related
I have a VM (vm001) on Google Cloud and on that I have added some users. Using a user (user1) I want to copy a directory to a GCP bucket (bucket1). The following is what I do:
user1@vm001: gsutil cp -r dir_name gs://bucket1
but I get the following error:
[Content-Type=application/octet-stream]...ResumableUploadAbortException: 403 Access denied.
I know user1 does not have permission to upload files to bucket1 and that I should use IAM to grant it, but I do not know how to do that for a user that exists only on the VM. This video shows how to grant access using an email address, but I have not been able to find out how to do it for users that already exist on the VM.
Note
I added user1 using adduser on the VM, and I do not know how to find it in the Google Cloud Console to change its access.
I managed to replicate your error. There are two ways to transfer files from your VM to your GCS bucket.
You can either create a new VM or use your existing one. Before finishing your setup, go to API and identity management > Cloud API access scopes. Search for Storage and set it to Read Write.
If you're not sure which access scope to set, you can select Allow full access to all Cloud APIs. In that case, make sure that you restrict access by setting the following roles for your service account on your GCS bucket:
Storage Legacy Bucket Owner (roles/storage.legacyBucketOwner)
Storage Legacy Bucket Writer (roles/storage.legacyBucketWriter)
After that, I started my VM, refreshed my GCS bucket, and ran gsutil cp -r [directory/name] gs://[bucket-name]; the files were transferred to my GCS bucket.
I followed the steps in this link on changing the service account and access scopes for an instance. Both approaches worked for me.
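Linux users created with adduser are not IAM identities; requests made from the VM are normally authorized as the VM's attached service account, limited by its access scopes. To see which account and scopes apply, one option is to query the Compute Engine metadata server from inside the VM. The sketch below uses only the standard library; the helper names metadata_request and read_metadata are hypothetical, and read_metadata only works when run on a Google Cloud VM.

```python
import urllib.request

# Base URL of the Compute Engine metadata server (reachable only from the VM).
METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1"


def metadata_request(path):
    """Build a metadata request; the Metadata-Flavor header is required."""
    return urllib.request.Request(
        f"{METADATA_ROOT}/{path}",
        headers={"Metadata-Flavor": "Google"},
    )


def read_metadata(path, timeout=2):
    """Fetch a metadata value as text. Fails outside Google Cloud."""
    with urllib.request.urlopen(metadata_request(path), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example calls, to be run on the VM itself:
#   read_metadata("instance/service-accounts/default/email")
#   read_metadata("instance/service-accounts/default/scopes")
```

The email returned by the first call is the identity to grant bucket roles to; the second call shows whether the storage scope is read-only or read-write.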
I'm running Spark 2.4 on an EC2 instance. I am assuming an IAM role and setting the key/secret key/token in the sparkSession.sparkContext.hadoopConfiguration, along with the credentials provider as "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider".
When I try to read a dataset from s3 (using s3a, which is also set in the hadoop config), I get an error that says
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: 7376FE009AD36330, AWS Error Code: null, AWS Error Message: Forbidden
read command:
val myData = sparkSession.read.parquet("s3a://myBucket/myKey")
I've repeatedly checked the S3 path and it's correct. My assumed IAM role has the right privileges on the S3 bucket. The only thing I can figure at this point is that spark has some sort of hidden credential chain ordering and even though I have set the credentials in the hadoop config, it is still grabbing credentials from somewhere else (my instance profile???). But I have no way to diagnose that.
Any help is appreciated. Happy to provide any more details.
spark-submit will pick up your AWS environment variables and set them as the fs.s3a access key, secret key, and session token, overwriting any you've already set.
If you only want to use the IAM credentials, just set fs.s3a.aws.credentials.provider to com.amazonaws.auth.InstanceProfileCredentialsProvider; it will then be the only provider used.
Further Reading: Troubleshooting S3A
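A quick way to check whether the environment is overriding the Hadoop settings is to look for the credential variables before launching the job. The sketch below is in Python rather than Scala and uses only the standard library. The variable names are the standard AWS ones, which is an assumption because the answer does not name them; the helper names are hypothetical.

```python
import os

# Standard AWS credential variables (assumed to be the ones spark-submit copies).
AWS_CREDENTIAL_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
)


def aws_env_overrides(env=None):
    """Return the AWS credential variables that are set and non-empty."""
    env = os.environ if env is None else env
    return [name for name in AWS_CREDENTIAL_VARS if env.get(name)]


def instance_profile_conf():
    """spark-submit arguments that pin S3A to the instance profile provider."""
    provider = "com.amazonaws.auth.InstanceProfileCredentialsProvider"
    return ["--conf", f"spark.hadoop.fs.s3a.aws.credentials.provider={provider}"]
```

If aws_env_overrides() returns a non-empty list on the submitting host, those values are candidates for what S3A is actually using instead of the assumed-role credentials.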
I am trying to get a service account to create blobs in Google Cloud Storage
from within a Python script, but I am having issues with the credentials.
1) I create the service account for my project and then download the key file in json:
"home/user/.config/gcloud/service_admin.json"
2) I give the service account the necessary roles (via gcloud in a subprocess)
roles/viewer, roles/storage.admin, roles/resourcemanager.projectCreator, roles/billing.user
Then I would like to access a bucket in GCS
from google.cloud import storage
import google.auth
credentials, project = google.auth.default()
client = storage.Client('myproject', credentials=credentials)
bucket = client.get_bucket('my_bucket')
Unfortunately, this results in:
google.api_core.exceptions.Forbidden: 403 GET
https://www.googleapis.com/storage/v1/b/my_bucket?projection=noAcl:
s_account@myproject.iam.gserviceaccount.com does not have
storage.buckets.get access to my_bucket
I have somewhat better luck if I set the environment variable
export GOOGLE_APPLICATION_CREDENTIALS="home/user/.config/gcloud/service_admin.json"
and rerun the script. However, I want everything to run in a single invocation of the script, which creates the accounts and then goes on to create the necessary files in the buckets. How can I access my_bucket given that I know where my JSON credential file is?
Try this example from the Documentation for Server to Server Authentication:
from google.cloud import storage
# Explicitly use service account credentials by specifying the private key file.
storage_client = storage.Client.from_service_account_json('service_account.json')
# Make an authenticated API request
buckets = list(storage_client.list_buckets())
print(buckets)
This way you point directly to the file containing the service account key in your code.
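If the rest of the script relies on google.auth.default(), another option is to set GOOGLE_APPLICATION_CREDENTIALS from inside the script before any client is created. The sketch below, standard library only, does that and also returns the account the key belongs to, so you can confirm the script is running as the identity you expect. The function name use_key_file is hypothetical. A path without a leading slash is resolved against the current working directory, which is worth checking when a key file seems to be ignored.

```python
import json
import os
from pathlib import Path


def use_key_file(key_path):
    """Point Application Default Credentials at a service account key file.

    Returns the client_email stored in the key. Must be called before
    google.auth.default() or any client constructor runs.
    """
    # Resolve relative paths and "~" so the variable holds an absolute path.
    path = Path(key_path).expanduser().resolve()
    with path.open(encoding="utf-8") as fh:
        info = json.load(fh)

    # Service account key files carry type == "service_account".
    if info.get("type") != "service_account":
        raise ValueError(f"{path} is not a service account key file")

    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return info["client_email"]
```

Newly granted IAM roles can take a short while to propagate, so a 403 immediately after the role binding is created does not necessarily mean the key file is wrong.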
I have tried to access files in a bucket and I keep getting access denied on the files. I can see them in the GCS console but cannot access them through it, and I cannot access them through gsutil either, using the command below.
gsutil cp gs://my-bucket/folder-a/folder-b/mypdf.pdf files/
But all this returns is AccessDeniedException: 403 Forbidden
I can list all the files and such but not actually access them. I've tried adding my user to the acl but that still had no effect. All the files were uploaded from a VM through a fuse mount which worked perfectly and just lost all access.
I've checked these posts, but none seems to have a solution that has helped me:
Can't access resource as OWNER despite the fact I'm the owner
gsutil copy returning "AccessDeniedException: 403 Insufficient Permission" from GCE
gsutil cors set command returns 403 AccessDeniedException
Although this is quite an old question, I had a similar issue recently. After trying many of the options suggested here without success, I carefully re-examined my script and discovered that the error was caused by a mistake in my bucket address, gs://my-bucket. I fixed it and it worked perfectly!
This is quite possible. Owning a bucket grants FULL_CONTROL permission to that bucket, which includes the ability to list objects within that bucket. However, bucket permissions do not automatically imply any sort of object permissions, which means that if some other account is uploading objects and sets ACLs to be something like "private," the owner of the bucket won't have access to it (although the bucket owner can delete the object, even if they can't read it, as deleting objects is a bucket permission).
I'm not familiar with the default FUSE settings, but if I had to guess, you're using your project's system account to upload the objects, and they're set to private. That's fine. The easiest way to test that would be to run gsutil from a GCE host, where the default credentials will be the system account. If that works, you could use gsutil to switch the ACLs to something more permissive, like "project-private."
The command to do that would be:
gsutil acl set -R project-private gs://muBucketName/
tl;dr The Owner (basic) role has only a subset of the GCS permissions present in the Storage Admin (predefined) role—notably, Owners cannot access bucket metadata, list/read objects, etc. You would need to grant the Storage Admin (or another, less privileged) role to provide the needed permissions.
NOTE: This explanation applies to GCS buckets using uniform bucket-level access.
In my case, I had enabled uniform bucket-level access on an existing bucket, and found I could no longer list objects, despite being an Owner of its GCP project.
This seemed to contradict how GCP IAM permissions are inherited— organization → folder → project → resource / GCS bucket—since I expected to have Owner access at the bucket level as well.
But as it turns out, the Owner permissions were being inherited as expected; they were simply insufficient for listing GCS objects.
The Storage Admin role has the following permissions which are not present in the Owner role: [1]
storage.buckets.get
storage.buckets.getIamPolicy
storage.buckets.setIamPolicy
storage.buckets.update
storage.multipartUploads.abort
storage.multipartUploads.create
storage.multipartUploads.list
storage.multipartUploads.listParts
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.getIamPolicy
storage.objects.list
storage.objects.setIamPolicy
storage.objects.update
This explained the seemingly strange behavior. And indeed, after granting the Storage Admin role (whereby my user was both Owner and Storage Admin), I was able to access the GCS bucket.
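To check this kind of gap without reading role definitions by eye, you can compare the permissions an operation needs against the ones a caller actually holds. The sketch below shows only the comparison step; the helper name and the sample granted list are hypothetical. With the google-cloud-storage library, the granted list could come from Bucket.test_iam_permissions, which returns the subset of the requested permissions the caller holds.

```python
def missing_permissions(required, granted):
    """Return the required permissions that are absent from granted, sorted."""
    return sorted(set(required) - set(granted))


# Permissions taken from the list above that listing and reading objects need.
required = ["storage.objects.list", "storage.objects.get"]

# Hypothetical result of a permission check for the current caller.
granted = ["storage.buckets.list", "storage.objects.get"]

gaps = missing_permissions(required, granted)
print(gaps)
```

An empty result means the caller holds everything in required; any names that remain point at the role that still has to be granted.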
Footnotes
[1] Though the documentation page Understanding roles omits the list of permissions for Owner (and other basic roles), it's possible to see this information in the GCP console:
Go to "IAM & Admin"
Go to "Roles"
Filter for "Owner"
Go to "Owner"
(See list of permissions)
I'm getting an Access Denied error with Amazon S3 and can't figure out why.
My settings are as follows:
STATIC_URL = 'http://s3.amazonaws.com/%s/' % AWS_STORAGE_BUCKET_NAME
What would cause an access denied error? I have verified that my keys are correct.
The URL you show above would resolve to a bucket within S3. In order to access that bucket successfully with such a URL, the permissions on the bucket would have to grant 'public-read' access to the bucket. In addition, each object or file within the bucket would have to grant 'public-read' access, as well.
Do you want the bucket and all content within the bucket to be readable by anyone? If so, make sure the permissions are set appropriately. Note, however, that granting 'public-read' to the bucket itself will allow anyone to list the contents of the bucket. That's usually unnecessary and probably should be avoided.
Also note that the keys (I assume you mean your AWS access key and secret key) only apply when you are accessing S3 via the API. If you simply access it with the URL via a browser, the credentials are not used in the request.