Dataproc job reading from another project's storage bucket

I've got project A with Storage buckets A_B1 and A_B2. Now Dataproc jobs running from project B need to have read access to buckets A_B1 and A_B2. Is that possible somehow?
Motivation: project A is the production environment, with production data stored in Storage. Project B is an "experimental" environment running experimental Spark jobs on the production data. The goal is, obviously, to separate billing for the production and experimental environments. Something similar could be done with dev.

Indeed, the Dataproc cluster will be acting on behalf of a service account in project "B"; generally it'll be the default GCE service account, but this is also customizable to use any other service account you create inside of project B.
You can double check the service account name by getting the details of one of the VMs in your Dataproc cluster, for example by running:
gcloud compute instances describe my-dataproc-cluster-m
It might look something like <project-number>-compute@developer.gserviceaccount.com. Now, in your case, if you already have data in A_B1 and A_B2, you would have to recursively edit the permissions on all the contents of those buckets to add access for your service account, using something like gsutil -m acl ch -r -u <project-number>-compute@developer.gserviceaccount.com:R gs://foo-bucket; while you're at it, you might also want to change the bucket's "default ACL" so that new objects also get that permission. This could get tedious to do for lots of projects, so if you're planning ahead you could do either of the following (a rough sketch of both options follows below):
Grant blanket GCS access into project A for project B's service account by adding the service account as a project member with a "Storage Reader" role
Update the buckets that might need to be shared in project A with read access and/or write/owner access for a new Google Group you create to manage groupings of permissions. Then you can atomically add service accounts as members to your Google Group without having to re-run a recursive update of all the objects in the bucket.
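A rough sketch of both options, assuming project B's default Compute Engine service account; the project number, the project ID project-a, and the roles/storage.objectViewer role are placeholders you'd adjust to your setup:
# Project B's default Compute Engine service account (example number)
SA=123456789012-compute@developer.gserviceaccount.com
# Option 1: fix up an existing bucket's object ACLs and its default ACL
gsutil -m acl ch -r -u ${SA}:R gs://A_B1      # grant READ on all existing objects
gsutil defacl ch -u ${SA}:R gs://A_B1         # so newly created objects get READ too
# Option 2: grant a project-wide read role on project A instead
gcloud projects add-iam-policy-binding project-a \
    --member="serviceAccount:${SA}" \
    --role="roles/storage.objectViewer"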

Related

gcloud app deploy behavior differs with/without bucket specified

A colleague and I have a bucket each in the same gcloud project, and are both experiencing this behavior on our respective buckets.
When I log in to gcloud in a local terminal and do gcloud app deploy without specifying anything, my code deploys to my bucket. If instead I do gcloud app deploy --bucket=(my bucket), a large number of files whose names are long alphanumeric strings are deposited in the bucket. The files I want to put there are compiled JS in a build folder, and these weird files seem to be all the individual JS files from the project instead. In both cases it finds the bucket fine, but the first option concerns me because I worry it's only finding my bucket due to my account's permissions or something.
I'd appreciate any details anyone has on how app deploy really works, because we're very confused about this. The first option appears to work, but it won't do for automation, and we don't want to deploy to all the buckets by accident and break everything.
gcloud app deploy uses Google Cloud Storage buckets to stage files and potentially create containers that are used by the App Engine service:
https://cloud.google.com/sdk/gcloud/reference/app/deploy#--bucket
If you don't specify a bucket using the --bucket flag, these defaults are used:
staging.[project-id].appspot.com
[us.]artifacts.[project-id].appspot.com
BLOBs are stored in a GCS bucket named:
[project-id].appspot.com
https://cloud.google.com/appengine/docs/standard/python/googlecloudstorageclient/setting-up-cloud-storage#activating_a_cloud_storage_bucket
NB: If you also use Google Container Registry, you may see additional buckets named *.artifacts.[project-id].appspot.com. As with the buckets used by App Engine, these contain objects representing the container layers.
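If you want to verify which buckets a deploy actually touched, simply listing them is enough; my-project below is just a placeholder project ID:
gsutil ls -p my-project                          # all buckets in the project
gsutil ls gs://staging.my-project.appspot.com    # files staged by gcloud app deploy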

How can I copy an AMI to another account using Packer?

I have two AWS Accounts:
Test Account
Prod Account
I am creating an AMI using Packer in the Test Account and want to copy the AMI to the Prod Account after that.
How can I use Packer to do that and also remove the actual AMI after the job is done?
I already checked the following questions, but they didn't resolve my query:
How do I bulk copy AMI AWS account number permissions from one AMI image to another?
how to copy AMI from one aws account to other aws account?
You can accomplish this behavior by using the ami_users directive in packer. This will allow the specified accounts to access the created AMIs from the source account.
If you are looking to have a deep copy of the AMIs in each account (distinct IDs) then you will have to re-run packer build with credentials into the other account.
As answered above, use ami_users.
The way we use this in production: we have a vars file for each environment in the "vars" folder. One of the values in the vars JSON file is "nonprod_account_id": "1234567890". Then, in packer.json, use ami_users as below.
"ami_users": ["{{user `nonprod_account_id`}}"]
I'm unclear on why you would want to remove the AMI from the account where it was built after copying it to another account, rather than just building it in the "destination" account in the first place. Maybe there are stronger access restrictions in Prod, but in that case I would question copying in an AMI built where things are "loose".
To actually do the copying, you may want this plugin:
https://github.com/martinbaillie/packer-post-processor-ami-copy
The removal from the source account might need to be "manual", or it could be automated by a cleanup process that removes AMIs older than a certain age (a minimal CLI sketch is at the end of this answer). As of May 2019 it is possible to create an AMI in one account and share access to both unencrypted AND encrypted AMIs (the ability to copy/utilize encrypted AMIs is the new bit compared to the other answers).
A couple of Amazon posts on the new capabilities:
https://aws.amazon.com/about-aws/whats-new/2019/05/share-encrypted-amis-across-accounts-to-launch-instances-in-a-single-step/
https://aws.amazon.com/blogs/security/how-to-share-encrypted-amis-across-accounts-to-launch-encrypted-ec2-instances/
This article outlines a process for using Packer to copy an AMI between accounts rather than just referencing a source in another account; you can probably extend it to perform the cleanup.
https://www.helecloud.com/single-post/2019/03/21/How-to-overcome-AWS-Copy-AMI-boundaries-by-using-Hashicorp%E2%80%99s-Packer
This one shows an updated version of the above process that uses cross-account access grants to avoid creating multiple copies of the AMI, one per account/environment where you want to use it.
https://www.helecloud.com/single-post/2019/11/06/Overcome-AWS-Copy-AMI-boundaries-%E2%80%93-share-encrypted-AMIs-with-Packer-%E2%80%93-follow-up
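Coming back to the removal step: if you do clean up the source AMI by hand rather than through the article's pipeline, the AWS CLI side is just a deregister plus deleting the backing snapshot (the IDs below are placeholders):
aws ec2 deregister-image --image-id ami-0123456789abcdef0
aws ec2 delete-snapshot --snapshot-id snap-0123456789abcdef0   # backing snapshot is not removed automatically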

Granting datalab access to another project

I have Datalab running in one Google Cloud project (let's call it A), and I have data sitting in another project (B). I'd like to grant Datalab access to this data.
I note that Datalab uses my project's "Compute Engine default" service account; I assume I can authorize this account in my second project (B) to grant Datalab access to the data within it. Is this considered the best-practice approach, and are there any other considerations I should keep in mind?
The right way to do it is exactly what you think.
Go to 'IAM & Admin' in project B, add the service account as a member, and choose a role for it in that project.
Keep in mind that projects exist to create a security boundary around access. Sometimes having replicated data is not a bad idea; it really depends on the need.
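A minimal sketch of that grant, assuming project A's default Compute Engine service account needs read-only Storage access in project B; the project ID, project number, and role below are placeholders to adjust:
# Run against project B, granting project A's default compute service account read access
gcloud projects add-iam-policy-binding project-b \
    --member="serviceAccount:123456789012-compute@developer.gserviceaccount.com" \
    --role="roles/storage.objectViewer"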

"gcloud container clusters create" command throws "error Required 'compute.networks.get'"

I want to create GKE clusters by gcloud command. But I cannot solve this error:
$ gcloud container clusters create myproject --machine-type=n1-standard-1 --zone=asia-northeast1-a
ERROR: (gcloud.container.clusters.create) ResponseError: code=403, message=Google
Compute Engine: Required 'compute.networks.get' permission for
'projects/myproject/global/networks/default'
The cloud account linked to my Gmail is an owner of the project with the corresponding powers, so I anticipated there would be no problem with permissions.
When you create a cluster through the $ gcloud container clusters create command, you should keep in mind that there are hundreds of hidden operations behind it.
When you have owner rights, you are able to give the initial "kick" to make everything start. At this point service accounts enter the process, and they take care of creating all the resources for you automatically.
These service accounts have different powers and permissions (which can be customised) in order to limit the attack surface in case one of them is compromised, and to keep a sort of order. You will have, for example, ****-compute@developer.gserviceaccount.com, which is the default Compute Engine service account.
When you enable the different APIs, some of these service accounts are created in order to make the components work as expected, but if one of them is deleted or modified you might face errors like the one you are experiencing.
Usually the easiest way to solve the issue is to recreate the service account, for example by deleting it and then disabling and re-enabling the corresponding API.
For example, when you enable Kubernetes Engine, the service-****@container-engine-robot.iam.gserviceaccount.com service account is created.
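A rough sketch of the disable/re-enable route for the Kubernetes Engine API; my-project is a placeholder, and note that disabling the API affects any existing clusters in the project, so be careful:
gcloud services disable container.googleapis.com --project my-project
gcloud services enable container.googleapis.com --project my-project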
In my test project, for example, I removed the "Kubernetes Engine Service Agent" role from that account and also modified the Google APIs service account, setting it to "Project Viewer", and I then faced permission issues both creating and deleting clusters.
You can navigate to IAM & Admin to check the status and see which service accounts are currently authorised in your project.
Here you can find a deeper explanation of some of the default service accounts.
Here you can find a small guide on how to re-enable Kubernetes Engine's default service account:
"If you remove this role binding from the service account, the default service account becomes unbound from the project, which can prevent you from deploying applications and performing other cluster operations."

Is it possible to use S3 buckets to create and grant admin privileges on different directories in my EC2 instance?

I have an ec2 instance that I use as sort of a staging environment for small websites and custom Wordpress websites.
What I'm trying to find out is: can I create a bucket for /var/www/html/site1 and give Developer X FTP access to work on this particular site within this particular bucket?
No. Directories on your EC2 instance have no relationship with S3.*
If you want to set up permissions for files stored on your EC2 instance, you'll have to do it by making software configuration changes on that instance, just as if it were any other Linux-based server.
*: Assuming you haven't set up something weird like s3fs, which I assume isn't the case here.
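For illustration, a minimal sketch of doing that with plain Linux users and groups (the names are made up, and in practice you'd give the developer SFTP over SSH rather than plain FTP):
sudo groupadd site1-devs                      # group for people working on site1
sudo useradd -m -G site1-devs developer-x     # the developer's account
sudo chgrp -R site1-devs /var/www/html/site1  # hand the directory to that group
sudo chmod -R g+rwX /var/www/html/site1       # group read/write, execute on directories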