gcloud app deploy behavior differs with/without bucket specified - google-cloud-platform

A colleague and I have a bucket each in the same gcloud project, and are both experiencing this behavior on our respective buckets.
When I log in to gcloud in a local terminal and run gcloud app deploy without specifying anything, my code deploys to my bucket. If instead I run gcloud app deploy --bucket=(my bucket), a large number of files whose names are long strings of alphanumerics are deposited in the bucket. The files I want to upload are compiled JS in a build folder, but these odd files seem to be all the individual JS files from the project instead. In both cases it finds the bucket fine, but the first option concerns me because I worry it's only finding my bucket due to my account's permissions or something.
I'd appreciate any details anyone has on how app deploy really works, because we're very confused about this. The first option appears to work, but it won't do for automation, and we don't want to accidentally deploy to the wrong buckets and break everything.

gcloud app deploy uses Google Cloud Storage buckets to stage files and potentially create containers that are used by the App Engine service:
https://cloud.google.com/sdk/gcloud/reference/app/deploy#--bucket
If you don't specify a bucket with the --bucket flag, the following defaults are used:
staging.[project-id].appspot.com
[us.]artifacts.[project-id].appspot.com
BLOBs are stored in a GCS bucket named:
[project-id].appspot.com
https://cloud.google.com/appengine/docs/standard/python/googlecloudstorageclient/setting-up-cloud-storage#activating_a_cloud_storage_bucket
NB: If you also use Google Container Registry, you may see additional buckets named *.artifacts.[project-id].appspot.com. As with the bucket used by App Engine, these contain objects representing the container layers.
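To see which of these default buckets exist in your project, and what a deployment actually staged, you can poke around with gsutil; a minimal sketch, assuming [project-id] is replaced with your own project ID (the long alphanumeric object names you noticed are typically your source files stored under their content hashes):
# List the buckets in the current project; the defaults above should appear here
gsutil ls
# Inspect what the last deployment staged (object names are content hashes, not file names)
gsutil ls gs://staging.[project-id].appspot.com/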

Related

Identifying user from AWS Sagemaker Studio generated EFS storage

When a SageMaker Studio domain is created, an EFS volume is associated with the domain. As the assigned users log into SageMaker Studio, a corresponding home directory is created for each of them.
Using a separate EC2 instance, I mounted the EFS volume that was created to see whether it is possible to look at each of the individual home directories. I noticed that each of these home directories is named with a number (e.g. 200000, 200005). Is there a specific rule for how these folders are named? Is it possible to trace a folder back to a particular user, or is this obfuscated by design?
(I'm currently exploring this on my personal AWS account.)
Yes, if you list and describe the domain users, you'll get back the user's HomeEfsFileSystemUid value.
Here's a CLI example:
aws sagemaker describe-user-profile --domain-id d-lcn1vbt47yku --user-profile-name default-1588670743757
{
    ...
    "UserProfileName": "default-1588670743757",
    "HomeEfsFileSystemUid": "200005",
    ...
}
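If you want to map every numeric home directory back to a user in one pass, you can list the profiles in the domain and describe each one; a rough sketch with the AWS CLI, reusing the domain ID from the example above:
# Print "<profile name>  <EFS uid>" for every user profile in the domain
for name in $(aws sagemaker list-user-profiles --domain-id-equals d-lcn1vbt47yku \
    --query 'UserProfiles[].UserProfileName' --output text); do
  aws sagemaker describe-user-profile --domain-id d-lcn1vbt47yku --user-profile-name "$name" \
    --query '[UserProfileName, HomeEfsFileSystemUid]' --output text
done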

Can I change default Cloud Storage bucket region used to store artifacts while deploying Cloud Functions?

What I'm doing
I'm deploying Cloud Functions, using Cloud Source Repositories as the source, with the gcloud command line like this:
gcloud functions deploy foo \
--region=us-east1 \
--source=<repoUrl> \
--runtime=nodejs12 \
--trigger-http
Behind the scenes, this process triggers Cloud Build, which uses Container Registry to store its images and also creates some buckets in Cloud Storage.
Problem
The problem is that one of those buckets, us.artifacts.<projectName>.appspot.com, uses multi-regional storage, which incurs additional charges compared to regional storage and has no free tier.
The other buckets are created in the same region as the function (us-east1 in my case).
What I'd like to know
Whether I can change the default region for this artifacts bucket
Or, if that's not possible, what I can change in my deployment process to avoid these charges.
What I've already tried or read
Some users had similar problems and suggested a lifecycle rule to auto-clean this bucket (sketched below); in the same post, other users recommended against doing that because it may break the build process.
Here we had an answer explaining the behind the scenes of an App Engine application deployment that also creates the same bucket.
This post may solve my problem, but I'd need to set up Cloud Build to trigger a build after a commit to the master branch.
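For reference, the lifecycle rule suggested in that first post looks roughly like the sketch below with gsutil; the 7-day age is an arbitrary example, <projectName> is a placeholder, and, as noted, cleaning this bucket may break build caching:
# lifecycle.json - delete artifact objects older than 7 days
{
  "rule": [
    {"action": {"type": "Delete"}, "condition": {"age": 7}}
  ]
}
# Apply the rule to the artifacts bucket
gsutil lifecycle set lifecycle.json gs://us.artifacts.<projectName>.appspot.com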
Thanks in advance

AWS storage architecture help is needed

I have recently joined a company who have a single working AWS environment, manually created in the console, running a few Java apps that use data in an S3 bucket using a Boto-esque library.
I want to use Terraform to create arbitrary clones of the environment on demand, in different AWS accounts, and I am stuck on copying big S3 buckets (~600 GB of files).
How have other people solved this problem in the past?

Dataproc job reading from another project storage bucket

I've got project A with Storage buckets A_B1 and A_B2. Now Dataproc jobs running from project B need to have read access to buckets A_B1 and A_B2. Is that possible somehow?
Motivation: project A is the production environment, with production data stored in Storage. Project B is an "experimental" environment running experimental Spark jobs on production data. The goal is obviously to separate billing for the production and experimental environments. Something similar could be done with a dev environment.
Indeed, the Dataproc cluster will be acting on behalf of a service account in project "B"; generally it'll be the default GCE service account, but this is also customizable to use any other service account you create inside of project B.
You can double check the service account name by getting the details of one of the VMs in your Dataproc cluster, for example by running:
gcloud compute instances describe my-dataproc-cluster-m
It might look something like <project-number>-compute@developer.gserviceaccount.com. Now, in your case, if you already have data in A_B1 and A_B2, you would have to recursively edit the permissions on all the contents of those buckets to add access for your service account, using something like gsutil -m acl ch -r -u <project-number>-compute@developer.gserviceaccount.com:R gs://foo-bucket; while you're at it, you might also want to change each bucket's "default ACL" so that new objects also get that permission. This could get tedious to do for lots of projects, so if planning ahead, you could either:
Grant blanket GCS access into project A for project B's service account by adding the service account as a project member with a "Storage Reader" role (see the sketch after this list)
Update the buckets that might need to be shared in project A with read and/or write/owner access for a new Google Group you create to manage groupings of permissions. Then you can atomically add service accounts as members of your Google Group without having to re-run a recursive update of all the objects in the bucket.
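A rough sketch of the first option with the gcloud CLI, assuming project-a is a placeholder for project A's ID, <project-number> is project B's project number, and roles/storage.objectViewer stands in for the "Storage Reader" role mentioned above:
# Give project B's default compute service account read access to GCS objects in project A
gcloud projects add-iam-policy-binding project-a \
    --member="serviceAccount:<project-number>-compute@developer.gserviceaccount.com" \
    --role="roles/storage.objectViewer"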

CodeDeploy to S3

I have a site in an S3 bucket, configured for web access, for which I run an aws s3 sync command every time I push to a specific git repository (I'm using Gitlab at the moment).
So if I push to the stable branch, a Gitlab runner runs the npm start build command to build the site, and then aws s3 sync to synchronize it to a specific bucket.
I want to migrate to CodeCommit and use pure AWS tools to do the same.
So far I have been able to successfully set up the repository and create a CodeBuild project for building the artifact, and the artifact is being stored (not deployed) in an S3 bucket. The difference is that I can't get it to deploy to the root folder of the bucket instead of a subfolder; it seems the process is not designed for that. I need it to be in the root folder because of how web access is configured.
For the deployment process, I was taking a look at CodeDeploy, but it doesn't actually let me deploy to an S3 bucket; it only uses the bucket as an intermediary for deployment to an EC2 instance. So far I get the feeling CodeDeploy is useful only for deployments involving EC2.
This tutorial, with a similar requirement to mine, uses CodePipeline and CodeBuild, but the deployment step is actually an aws s3 sync command (same as I was doing on Gitlab), and the actual deployment step in CodePipeline is disabled.
I was looking into a solution which involves using AWS features made for this specific purpose, but I can't find any.
I'm also aware of LambCI, but to me it looks like it does what CodePipeline / CodeBuild is doing: storing artifacts (not deploying to the root folder of the bucket). Plus, I'm looking for an option that doesn't require me to learn or deploy new configuration files (outside AWS config files).
Is this possible with the current state of AWS features?
Today AWS announced a new feature: the ability to target S3 in the deployment stage of CodePipeline. The announcement is here, and the documentation contains a tutorial available here.
Using your CodeBuild/CodePipeline approach, you should now be able to choose S3 as the deployment provider in the deployment stage rather than performing the sync in your build script. To configure the phase, you provide an S3 bucket name, specify whether to extract the contents of the artifact zip, and if so provide an optional path for the extraction. This should allow you to deploy your content directly to the root of a bucket by omitting the path.
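If you define the pipeline in JSON (e.g. for aws codepipeline create-pipeline), the new S3 deploy stage looks roughly like the sketch below; the bucket and artifact names are placeholders, and with Extract set to true and no path given the contents land in the bucket root:
{
  "name": "Deploy",
  "actions": [
    {
      "name": "DeployToS3",
      "actionTypeId": {"category": "Deploy", "owner": "AWS", "provider": "S3", "version": "1"},
      "configuration": {
        "BucketName": "my-website-bucket",
        "Extract": "true"
      },
      "inputArtifacts": [{"name": "BuildOutput"}],
      "runOrder": 1
    }
  ]
}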
I was dealing with a similar issue, and as far as I was able to find out, there is no service specifically suited to deploying an app to S3.
AWS CodeDeploy is indeed for deploying code that runs on servers.
My solution was to use CodePipeline with three stages:
A Source stage which takes source code from AWS CodeCommit
A Build stage with AWS CodeBuild
A custom Lambda function which, after a successful build, takes the artifact from the S3 artifact store, unzips it, and copies the files to my S3 website host.
I used this AWS Lambda function from SeamusJ: https://github.com/SeamusJ/deploy-build-to-s3
Several changes had to be made; I used node-unzip-2 instead of unzip-stream for unzipping the artifact from S3.
I also had to change the ACLs in the website.ts file.
Uploading from CodeBuild is currently the best solution available.
There are some suggestions on how to orchestrate this deployment via CodePipeline in this answer.