I want to know the difference between gcloud and gsutil. Where do we use which? Why do certain commands begin with gsutil while others begin with gcloud?
The gsutil command is used only for Cloud Storage.
With the gcloud command, you can interact with other Google Cloud products like App Engine, Google Kubernetes Engine, etc. You can have a look at the gcloud documentation for more info.
gsutil is a Python application that lets you access Google Cloud Storage from the command line. You can use gsutil to do a wide range of bucket and object management tasks, including the following (example commands appear after this list):
Creating and deleting buckets.
Uploading, downloading, and deleting objects.
Listing buckets and objects.
Moving, copying, and renaming objects.
Editing object and bucket ACLs.
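For example, a few typical gsutil commands (the bucket, object, and user names below are placeholders, not taken from the question):
gsutil mb gs://my-example-bucket
gsutil cp local-file.txt gs://my-example-bucket/
gsutil ls gs://my-example-bucket
gsutil mv gs://my-example-bucket/local-file.txt gs://my-example-bucket/renamed.txt
gsutil acl ch -u someone@example.com:R gs://my-example-bucket/renamed.txt
gsutil rb gs://my-example-bucket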
The gcloud command-line interface is the primary CLI tool to create and manage Google Cloud resources. You can use this tool to perform many common platform tasks either from the command line or in scripts and other automations.
For example, you can use the gcloud CLI to create and manage the following (example commands appear after this list):
Google Compute Engine virtual machine instances and other resources,
Google Cloud SQL instances,
Google Kubernetes Engine clusters,
Google Cloud Dataproc clusters and jobs,
Google Cloud DNS managed zones and record sets,
Google Cloud Deployment Manager deployments.
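A few illustrative commands, with placeholder resource names and locations (these are sketches, not complete production commands):
gcloud compute instances create my-vm --zone=us-central1-a
gcloud container clusters create my-gke-cluster --zone=us-central1-a
gcloud dataproc clusters create my-dataproc-cluster --region=us-central1
gcloud dns managed-zones create my-zone --dns-name=example.com. --description="Example zone"
gcloud deployment-manager deployments create my-deployment --config=config.yaml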
"gcloud" can create and manage Google Cloud resources while "gsutil" cannot do so.
"gsutil" can manipulate buckets, bucket's objects and bucket ACLs on GCS(Google Cloud Storage) while "gcloud" cannot do so.
With gcloud storage you can now do everything that you can do with gsutil. See https://cloud.google.com/blog/products/storage-data-transfer/new-gcloud-storage-cli-for-your-data-transfers and, for ACLs on objects, https://cloud.google.com/sdk/gcloud/reference/storage/objects/update
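For example, the gsutil operations above map to gcloud storage roughly like this (bucket, object, and user names are placeholders; the ACL flag follows the objects update reference linked above):
gcloud storage buckets create gs://my-example-bucket
gcloud storage cp local-file.txt gs://my-example-bucket/
gcloud storage ls gs://my-example-bucket
gcloud storage objects update gs://my-example-bucket/local-file.txt --add-acl-grant=entity=user-someone@example.com,role=READER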
What I'm doing
I'm deploying Cloud Functions using Cloud Source Repositories as source using the gcloud command line like this:
gcloud functions deploy foo \
--region=us-east1 \
--source=<repoUrl> \
--runtime=nodejs12 \
--trigger-http
Behind the scenes, this process triggers Cloud Build, which uses Container Registry to store its images and also creates some buckets in Cloud Storage.
Problem
The problem is that one of those buckets, us.artifacts.<projectName>.appspot.com, is multi-regional storage, which incurs additional charges compared to regional storage and doesn't have a free tier.
The other buckets are created in the same region as the function (us-east1 in my case).
What I'd like to know
If I can change the default region for this artifacts bucket
Or, if it's not possible, what I can change in my deployment process to avoid these charges.
What I've already tried or read
Some users had similar problems and suggested a lifecycle rule to auto-clean this bucket (a sketch of such a rule is included below); in the same post, other users recommended against it because it may break the build process.
There was also an answer explaining what happens behind the scenes of an App Engine application deployment, which creates the same bucket.
Another post may solve my problem, but I'd need to set up Cloud Build to trigger a build after a commit to the master branch.
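For reference, the lifecycle rule mentioned above would look roughly like this (a sketch only; the 7-day age threshold is an arbitrary assumption, and as noted it may break the build process). First save a policy file, e.g. lifecycle.json:
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 7}
    }
  ]
}
Then apply it to the artifacts bucket:
gsutil lifecycle set lifecycle.json gs://us.artifacts.<projectName>.appspot.com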
Thanks in advance
As I'm making changes within the GCP Google Cloud Console, I'd like to capture the equivalent gcloud CLI commands for later use in automation.
Thank you.
As mentioned by Guillaume blaquiere, you have an option at the bottom of the resource creation page for GCE, GKE, VPC Network, etc. that shows the equivalent command line for the configuration you have filled in.
It looks like this:
[screenshot of the equivalent command line option on the creation page]
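For illustration, the command generated for a simple VM typically looks something like this (the instance name, zone, machine type, and image below are just examples, not taken from a real configuration):
gcloud compute instances create instance-1 \
    --zone=us-central1-a \
    --machine-type=e2-medium \
    --image-family=debian-12 \
    --image-project=debian-cloud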
I'm trying to execute jobs in a Dataproc cluster that access several GCP resources, like Google Cloud Storage.
My concern is that whatever file or object is created through my job is owned/created by the Dataproc default service account.
Example: 123456789-compute@developer.gserviceaccount.com.
Is there any way I can configure this user/service account so that the object gets created by a given user/service account instead of the default one?
You can configure the service account to be used by a Dataproc cluster with the --service-account flag at cluster creation time.
The gcloud command would look like:
gcloud dataproc clusters create cluster-name \
    --service-account=your-service-account@project-id.iam.gserviceaccount.com
More details: https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/service-accounts
https://cloud.google.com/dataproc/docs/concepts/iam/iam
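Jobs submitted to that cluster then run as that service account, so objects they create in Cloud Storage are owned by it. For example (the bucket, job file, and region below are placeholders):
gcloud dataproc jobs submit pyspark gs://your-bucket/your-job.py \
    --cluster=cluster-name \
    --region=us-east1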
Note: it is better to have one Dataproc cluster per job so that each job gets an isolated environment, jobs don't affect each other, and you can manage them better (in terms of security as well).
You can also look at GCP Composer, which you can use to schedule and automate jobs.
Hope this helps.
How can we programmatically find details about GCP infrastructure, like the various folders, projects, compute instances, datasets, etc.? This would help in getting a better understanding of the GCP platform.
Regards,
Neeraj
There is a service in GCP called Cloud Asset Inventory. It is an inventory service that keeps a five-week history of Google Cloud Platform (GCP) asset metadata.
It allows you to export all asset metadata at a certain timestamp to Google Cloud Storage or BigQuery.
It also allows you to search resources and IAM policies.
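For example, you can export and search assets with the gcloud CLI (the project ID, bucket, and query below are placeholders):
gcloud asset export --project=my-project \
    --content-type=resource \
    --output-path=gs://my-bucket/asset-export.json
gcloud asset search-all-resources --scope=projects/my-project --query="name:instance"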
It supports a wide range of resource types, including:
Resource Manager
google.cloud.resourcemanager.Organization
google.cloud.resourcemanager.Folder
google.cloud.resourcemanager.Project
Compute Engine
google.compute.Autoscaler
google.compute.BackendBucket
google.compute.BackendService
google.compute.Disk
google.compute.Firewall
google.compute.HealthCheck
google.compute.Image
google.compute.Instance
google.compute.InstanceGroup
...
Cloud Storage
google.cloud.storage.Bucket
BigQuery
google.cloud.bigquery.Dataset
google.cloud.bigquery.Table
Find the full list here.
The equivalent service in AWS is called AWS Config.
I have found an open source tool named "Forseti Security", which is easy to install and use. It has 5 major components:
Inventory: Regularly collects data from GCP and stores the results in Cloud SQL in the table "gcp_inventory". To get the latest inventory information, refer to the max value of the column inventory_index_id.
Scanner: Periodically compares the policies applied to GCP resources with the data collected by Inventory. It stores the scanner information in the table "scanner_index".
Explain: Helps to manage Cloud IAM policies.
Enforcer: Uses the Google Cloud APIs to enforce the policies you have set on the GCP platform.
Notifier: Sends notifications to Slack, Cloud Storage, or SendGrid, as shown in the architecture diagram in the Forseti documentation.
You can find the official documentation here.
I tried using this tool and found it really useful.
I am currently working on Google Cloud Platform to run Spark Jobs in the cloud. To do so, I am planning to use Google Cloud Dataproc.
Here's the workflow I am automating:
Upload a CSV file to Google Cloud Storage, which will be the input of my Spark job
On upload, trigger a Google Cloud Function which should create the cluster, submit a job, and shut down the cluster through the HTTP API available for Dataproc
I am able to create a cluster from my Google Cloud Function using the google-api-nodejs-client (http://google.github.io/google-api-nodejs-client/latest/dataproc.html). But the problem is that I cannot see this cluster in the Dataproc cluster viewer or even by using the gcloud SDK: gcloud dataproc clusters list.
However, I am able to see my newly created cluster on Google Api explorer : https://developers.google.com/apis-explorer/#p/dataproc/v1/dataproc.projects.regions.clusters.list.
Note that I am creating my cluster in the current project.
What could I possibly be doing wrong that prevents me from seeing that cluster when listing with the gcloud SDK?
Thank you in advance for your help.
Regards.
I bet it has to do with the "region" field. Out of the box, the Cloud SDK defaults to the "global" region [1]. Try using Dataproc Cloud SDK commands with the --region flag (e.g., gcloud dataproc clusters list --region); see the example below.
[1] https://cloud.google.com/dataproc/docs/concepts/regional-endpoints
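For example, if the cluster was actually created in us-east1 (the region here is just a guess for illustration):
gcloud dataproc clusters list --region=us-east1
You can also set a default region so you don't have to pass the flag every time:
gcloud config set dataproc/region us-east1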