How to get hardware information from VMs? - google-cloud-platform

We are migrating our production environment from DigitalOcean to GCP.
However, because the platform is different, we don't know where to find some information about our VMs.
Is it possible to get a report that tells me the number of CPUs, the machine type, the amount of RAM, the amount of SSD provisioned, and the amount of SSD used per VM?

Compute Engine lets you export detailed reports of your Compute Engine usage (daily & monthly) to a Cloud Storage bucket using the usage export feature. Usage reports provide information about the lifetime of your resources.
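As a minimal sketch, the usage export feature can be turned on with a single gcloud command (the bucket name and report prefix below are placeholders of my own; the bucket must already exist):

    # Export daily/monthly Compute Engine usage reports to a Cloud Storage bucket
    gcloud compute project-info set-usage-bucket \
        --bucket=gs://my-usage-reports \
        --prefix=usage-report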
VM instance insights help you understand the CPU, memory, and network usage of your Compute Engine VMs.
As @Dharmaraj mentioned in the comments, GCP introduced a new Observability tab designed to give insights into common scenarios and issues associated with CPU, disk, memory, networking, and live processes. With access to all of this data in one location, you can easily correlate between signals over a given time frame.
Finally, the Stackdriver agent can be installed on GCE VMs, which enables additional metrics such as memory monitoring. You can also use Stackdriver's notification and alerting features. Note, however, that agent metrics are only available to premium-tier accounts.
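As a quick sketch of pulling the hardware figures directly with gcloud (the machine type and zone below are placeholders): instances list shows the machine type of each VM, machine-types describe shows the vCPU count and RAM for a given type, and disks list shows the provisioned disk/SSD sizes. How much of a disk is actually used is only visible from inside the guest OS, or via the monitoring agent mentioned above.

    # List all VMs with their zone, machine type, and status
    gcloud compute instances list

    # Show vCPU count (guestCpus) and RAM (memoryMb) for a machine type
    gcloud compute machine-types describe n1-standard-4 --zone=us-central1-a

    # Show provisioned disk/SSD sizes per disk
    gcloud compute disks list

    # From inside a VM: disk space actually used
    df -h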

Related

Can google access data on compute engine virtual machine?

I'm using an always-free VM on Google Cloud (e2-micro). When creating the instance, there's an option to enable the Confidential Computing service, but that requires an N2D machine type, which is not part of the always-free resources.
Does that mean Google can read my VM's data?
In other words, without that option enabled, what can Google read on my VM?
I'm not worried about system health monitoring data. I'm only concerned with files and folders that I put there.
Google has written policies that describe what they can access and when. Google also provides the ability to log their access.
Confidential Computing is a different type of technology that is not related to Google accessing your data.
Start with this page which provides additional links:
Creating trust through transparency
This Whitepaper is a good read. Page 9 answers your question:
Trusting your data with Google Cloud Platform
You may have heard of encryption in transit or encryption at rest. Confidential Computing additionally encrypts data while it's being processed within the VM (encryption during processing, so to speak).
You need to use N2D machine types because Confidential VMs rely on technology and features available on AMD EPYC processors.
A Confidential Virtual Machine (Confidential VM) is a type of N2D Compute Engine VM running on hosts based on the second generation of AMD Epyc processors, code-named "Rome." Using AMD Secure Encrypted Virtualization (SEV), Confidential VM features built-in optimization of both performance and security for enterprise-class high memory workloads, as well as inline memory encryption that doesn't introduce significant performance penalty to those workloads.
You can select the Confidential VM service when creating a new VM using the Google Cloud Console, the Compute Engine API, or the gcloud command-line tool.
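As a minimal sketch, creating a Confidential VM from the gcloud CLI looks roughly like this (the instance name, zone, and image are placeholders I picked; Confidential VMs require an N2D machine type and host maintenance set to TERMINATE):

    gcloud compute instances create my-confidential-vm \
        --zone=us-central1-a \
        --machine-type=n2d-standard-2 \
        --confidential-compute \
        --maintenance-policy=TERMINATE \
        --image-family=ubuntu-2004-lts \
        --image-project=ubuntu-os-cloud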
You can find more details here.
You can check their privacy document here.

How can I see performance indicators on an AWS instance?

How can I find out, for example, the RAM size of an AWS instance?
I select View details on the instance in question, but I can't find the information I need about its capacity (CPU speed, storage size, RAM):
CPU usage for your instance is reported in CloudWatch by default. RAM and disk volume usage are only known by the operating system running on the instance, so you have to either log in to the instance to check them, or install the AWS CloudWatch agent on the instance to have those values reported to CloudWatch.
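As a rough sketch (the instance type below is a placeholder, and the metadata call assumes the instance metadata service is reachable from inside the instance), the hardware specs can be read from instance metadata or the AWS CLI, while actual RAM/disk usage has to come from the OS:

    # From inside the instance: which instance type am I?
    curl http://169.254.169.254/latest/meta-data/instance-type

    # From anywhere with the AWS CLI: vCPUs and memory for an instance type
    aws ec2 describe-instance-types --instance-types t3.micro \
        --query "InstanceTypes[0].{vCPUs:VCpuInfo.DefaultVCpus,MemoryMiB:MemoryInfo.SizeInMiB}"

    # From inside the instance: current RAM and disk usage (what CloudWatch can't see without the agent)
    free -h
    df -h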

Can't create GPU instances on GCE

I am trying to create a GPU instance (n1-standard-2 with 1 NVIDIA T4 GPU) on Compute Engine and I have been getting this error since yesterday:
Operation type [insert] failed with message "The zone 'projects/deep-learning-xxxx/zones/us-central1-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later."
It seems that this region of Google Cloud doesn't have enough GPU resources, but I am getting the same error with other zones too, even after trying multiple times. Regular non-GPU instances are working fine, though. I am trying to figure out whether I'm doing something wrong or whether there is just a huge demand for GPU instances on GCP right now.
The reasons a GPU instance cannot be created in a particular region/zone can be:
1. Resource unavailability. Check GPU availability across regions and zones.
2. Quota limits can restrict the creation of GPUs. Refer to Checking project quota for details.
3. A few GCP restrictions; you can refer to the list of restrictions here.
You can check your GPU quota as described in Create VM with GPUs.
Alternatively, GCP offers a feature called Reserving Compute Engine zonal resources to ensure that your project has resources for future use.
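A quick way to check points 1 and 2 above from the gcloud CLI, assuming the zone and GPU model from the question (adjust as needed):

    # 1. Which accelerator types are offered in a given zone
    gcloud compute accelerator-types list --filter="zone:us-central1-a"

    # 2. Regional GPU quotas (look for metrics such as NVIDIA_T4_GPUS and compare usage vs. limit)
    gcloud compute regions describe us-central1 --format="yaml(quotas)"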
Finally, I was able to launch a preemptible GPU instance without a problem. So it really seems like Google Cloud doesn't have enough GPU resources to reserve an on-demand GPU VM at the moment.

How to monitor Google Cloud Platform (GCP) costs on an hourly basis?

I am running a VM instance on GCP (actually a ready Deep Learning Package: 8 CPUs, 1 Tesla V100 GPU, ..., access via a Jupyter Notebook).
Is there a way to monitor the overall usage and costs in real-time?
I am thinking about a "Live usage" link inside https://console.cloud.google.com/, which shows which products are currently used, and their price per second/hour.
I think it is not possible to monitor service usage per second or per hour. If you want to analyze your project's bills, GCP offers several options for this, such as Billing Cycles, Billing Reports, Export Billing Data to a File or BigQuery, and Visualize your spend with Data Studio; however, it is important to keep in mind that these alternatives may require a certain amount of time to reflect each service's usage.
Additionally, you can use the Cloud Billing Catalog API to get the list of all public services and SKU metadata in a programmatic, real-time way, which can be used as a complement to the cost management tools mentioned above to reconcile list pricing rates.
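A minimal sketch of calling the Cloud Billing Catalog API with an access token from gcloud (SERVICE_ID is a placeholder for one of the service ids returned by the first call):

    # List all public services, each with its serviceId
    curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "https://cloudbilling.googleapis.com/v1/services"

    # List SKUs and list pricing for one service
    curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "https://cloudbilling.googleapis.com/v1/services/SERVICE_ID/skus"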

Monitoring workers or identifying bottlenecks in a data pipeline

I am using Google Cloud Dataflow. Some of my data pipelines need to be optimized. I need to understand how the workers in the Dataflow cluster are performing along these lines:
1. How much memory is being used?
Currently I am logging memory usage from Java code.
2. Is there a bottleneck in disk operations? (To understand whether an SSD is required.)
3. Is there a bottleneck in vCPUs? (So as to increase the vCPUs in the worker nodes.)
I know Stackdriver can be used to monitor CPU and disk usage for the cluster. However, it does not provide information on individual workers, nor on whether we are hitting a bottleneck in any of these.
Within the Dataflow Stackdriver UI, you are correct, you cannot view individual workers' metrics. However, you can certainly set up a Stackdriver dashboard that gives you the individual worker metrics for everything you have mentioned. Below is a sample dashboard which shows metrics for CPU, memory, network, read IOPS, and write IOPS.
Since the Dataflow job name will be part of the GCE instance name, here I filter down the GCE instances being monitored by the job name I'm interested in. In this case, my Dataflow job was named "pubsub-to-bigquery", so I filtered down to instance_name ~= pubsub-to-bigquery.*. I did a regex filter to be sure I captured any job names which may be suffixed with additional data in future runs. Setting up a dashboard such as this can inform you when you'd actually benefit from SSDs, more network bandwidth, etc.
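For reference, the same job-name filtering can be done from the gcloud CLI to confirm which worker VMs back a given job before building the dashboard (the job name pubsub-to-bigquery is just the example from above):

    # Dataflow worker VMs are GCE instances whose names start with the job name
    gcloud compute instances list --filter="name~'^pubsub-to-bigquery.*'"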
Also be sure to check the Dataflow job graph in the Cloud Console when looking to optimize your pipeline. The wall time shown below each step name can give a good indication of which custom transforms or DoFns should be targeted for optimization.