Is there a way to see CPU utilisation per configured instance?
Currently I only see one CPU utilisation figure, no matter whether the Google Cloud Spanner instance is configured with 1 or 3 nodes.
This would be interesting for getting a glimpse of whether our data structure balances load evenly.
Thanks,
Christian
Today you can see the CPU usage of each instance by navigating to that instance's page in the console. Cloud Spanner doesn't break the usage down by node, though, so you only get a single figure for, say, a 3-node instance. Generally, Spanner will take care of balancing data across nodes to achieve an even distribution.
Instance- and database-level utilization is also available through Google Stackdriver: look for the "spanner_instance" resource type in Metrics Explorer.
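If you prefer to pull the same numbers programmatically, here is a minimal sketch (not part of the original answer) that reads the standard Cloud Spanner CPU utilization metric through the Monitoring API. PROJECT_ID is a placeholder and the date arithmetic assumes GNU date:

# Hypothetical project ID; replace with your own.
PROJECT_ID=my-project

# List the last hour of CPU utilization samples for all Spanner instances in the project.
curl -s -G \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://monitoring.googleapis.com/v3/projects/${PROJECT_ID}/timeSeries" \
  --data-urlencode 'filter=metric.type="spanner.googleapis.com/instance/cpu/utilization"' \
  --data-urlencode "interval.startTime=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --data-urlencode "interval.endTime=$(date -u +%Y-%m-%dT%H:%M:%SZ)"

Each returned time series is labelled with the instance it belongs to, so you can chart instances side by side.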
I'm working on a video rendering server in Node.js. It spawns multiple headless Chrome instances, uses the Puppeteer library to capture screenshots, and feeds them to FFmpeg. It then concatenates all the parts with some post-processing.
Now I want to move it to production, but it's not performing efficiently.
I tried a serverless architecture (Cloud Run etc.), but was still unable to achieve it, and those products clearly state that they are not meant for heavy, long-running tasks. The video takes far too long to render, even longer than my laptop takes.
I tried using GCE and the results are satisfactory, but now I'm having a hard time scaling it. The server can only handle one request at a time efficiently. How do I scale horizontally and make sure that each instance gets only one request at a time?
Thanks in advance.
To scale the number of identical instances you can use Managed Instance Groups. Have a look at the autoscaling documentation to get a better understanding of how it works, but basically it says:
You can autoscale based on one or more of the following metrics that reflect the load of the instance group:
Average CPU utilization.
HTTP load balancing serving capacity, which can be based on either utilization or requests per second.
Cloud Monitoring metrics.
If you will be autoscaling based on CPU usage, just enable autoscaling and set it up when creating a new group of instances.
Here's an example gcloud command to do this:
gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
--max-num-replicas 20 \
--target-cpu-utilization 0.60 \
--cool-down-period 90
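For completeness, the managed instance group itself has to exist (created from an instance template) before you configure autoscaling on it. A minimal sketch, with hypothetical template and group names:

gcloud compute instance-groups managed create example-managed-instance-group \
  --zone us-central1-a \
  --template example-instance-template \
  --size 1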
You can also use any other available metric to scale your group up, or even create a new custom metric that will trigger scaling up your group.
You can create custom metrics using Cloud Monitoring and write your own monitoring data to the Monitoring service. This gives you side-by-side access to standard Google Cloud data and your custom monitoring data, with a familiar data structure and consistent query syntax. If you have a custom metric, you can choose to scale based on the data from these metrics.
And lastly, I've found this example use case that scales up a group of VMs based on a Pub/Sub queue, which might be the solution you're looking for.
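If you go the Pub/Sub route, the usual idea is to export the queue depth as a custom metric and tell the autoscaler how much work a single instance can handle. A hedged sketch (the metric name custom.googleapis.com/queue_depth is hypothetical; check the flag names against the current gcloud documentation):

gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
  --zone us-central1-a \
  --max-num-replicas 20 \
  --update-stackdriver-metric custom.googleapis.com/queue_depth \
  --stackdriver-metric-single-instance-assignment 10

With a single-instance assignment of 10, the autoscaler aims to run roughly one VM per 10 outstanding queue items, which also gives you a rough way of keeping each instance focused on a small amount of work at a time.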
I keep finding old posts about an on-demand pricing option for Cloud SQL, but I no longer see any references in Google's documentation on how to enable it. Did this feature go away?
If so, can someone recommend an alternative for hosting a MySQL DB? Cloud SQL is just too much if I always have to keep the instance running.
As per the GCP documentation, Cloud SQL for MySQL is charged per second of usage, rounded to the nearest full second. For example, if you use an instance for 1.5 seconds or 2.49 seconds, in both cases you are billed for 2 seconds.
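One related point, offered as an assumption to verify rather than something from the pricing page quoted above: instance charges only accrue while the instance is activated, and you can stop and restart a Cloud SQL instance from gcloud when you don't need it, for example:

# Stop the instance (hypothetical instance name); storage is still billed while stopped.
gcloud sql instances patch my-mysql-instance --activation-policy=NEVER

# Start it again when needed.
gcloud sql instances patch my-mysql-instance --activation-policy=ALWAYS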
We have implemented kube-state-metrics (by following the steps mentioned in this article, section 4.4.1 "Install monitoring components") on one of our Kubernetes clusters on GCP. It created three new deployments on our cluster: node-exporter, prometheus-k8s, and kube-state-metrics. After that, we were able to see all the metrics in Metrics Explorer with the prefix "external/prometheus/".
To check the pricing for external metrics, we referred to this link. We calculated the price accordingly, but the bill we received was shockingly high. GCP charged a large amount even though we haven't added a single metric to a dashboard or set up monitoring for anything. From the ingested volume (around 1.38 GB/day), it looks like these monitoring tools do some background work (reading metrics at regular intervals) that consumed this volume and produced the bill.
We would like to understand how these kube-state-metrics monitoring components work. Do they automatically collect metric data and increase the ingested volume and the bill in this way, or is there a misconfiguration in the setup?
Any guidance on this would be really appreciated!
Thank you.
By default, when deployed, kube-state-metrics exposes a large number of metrics about the objects and events across your cluster.
If you have a number of frequently updating resources in your cluster, you may find that a lot of data is ingested into these metrics, which incurs high costs.
You need to configure which metrics you'd like to expose, and consult the documentation for your Kubernetes environment, in order to avoid unexpectedly high costs.
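As a concrete illustration of "configure what you expose", kube-state-metrics itself can be restricted through its command-line flags. A minimal sketch (the resource kinds and metric names below are only examples, and the flag names should be checked against the kube-state-metrics version you deployed):

# Only generate metrics for a few resource kinds, and only an allowlist of metric names.
kube-state-metrics \
  --resources=deployments,pods,nodes \
  --metric-allowlist=kube_pod_status_phase,kube_deployment_status_replicas_available

Scraping these targets less frequently, or dropping unneeded series with Prometheus relabelling rules, reduces the ingested volume (and therefore the bill) in the same way.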
I am using Google Cloud Dataflow. Some of my data pipelines need to be optimized. I need to understand how the workers are performing in the Dataflow cluster along these lines:
1. How much memory is being used? Currently I am logging memory usage from Java code.
2. Is there a bottleneck in disk operations, to understand whether an SSD is required?
3. Is there a bottleneck in vCPUs, so as to increase the vCPUs in the worker nodes?
I know Stackdriver can be used to monitor CPU and disk usage for the cluster. However, it does not provide information on individual workers, nor on whether we are hitting a bottleneck in any of these.
Within the Dataflow Stackdriver UI, you are correct: you cannot view individual workers' metrics. However, you can certainly set up a Stackdriver dashboard which gives you the individual worker metrics for all of what you have mentioned. Below is a sample dashboard which shows metrics for CPU, memory, network, read IOPS, and write IOPS.
Since the Dataflow job name will be part of the GCE instance name, here I filter the GCE instances being monitored down to the job name I'm interested in. In this case, my Dataflow job was named "pubsub-to-bigquery", so I filtered down to instance_name ~= pubsub-to-bigquery.*. I used a regex filter to be sure I captured any job names which may be suffixed with additional data in future runs. Setting up a dashboard such as this can tell you when you'd actually benefit from SSDs, more network bandwidth, etc.
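For reference, here is a sketch of roughly what that filter looks like when queried directly against the Monitoring API instead of through the dashboard UI (the job name is the same "pubsub-to-bigquery" example, the project ID is a placeholder, and the filter syntax should be double-checked in Metrics Explorer before relying on it):

PROJECT_ID=my-project   # hypothetical project ID

# Per-worker CPU utilization for GCE instances whose names start with the Dataflow job name.
curl -s -G \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://monitoring.googleapis.com/v3/projects/${PROJECT_ID}/timeSeries" \
  --data-urlencode 'filter=metric.type="compute.googleapis.com/instance/cpu/utilization" AND metric.labels.instance_name=monitoring.regex.full_match("pubsub-to-bigquery.*")' \
  --data-urlencode "interval.startTime=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --data-urlencode "interval.endTime=$(date -u +%Y-%m-%dT%H:%M:%SZ)"

Disk and network charts can be filtered the same way using the corresponding compute.googleapis.com metric types.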
Also be sure to check the Dataflow job graph in the Cloud Console when looking to optimize your pipeline. The wall time shown below each step name gives a good indication of which custom transforms or DoFns should be targeted for optimization.
Single-Region Spanner is advertised with a 99.99% availability SLA. In the US-based configuration, there will be exactly three replicas per node, all in Council Bluffs, Iowa. Can you share information that breaks down why the 99.99% (~one hour of downtime per year) is believable, especially in the case of geographically-local disasters? I assume that Google has done a thorough analysis, or else it would not advertise the SLA, but I cannot find a detailed paper.
In the event of a regional failure, what recovery procedures will Google carry out and with what recovery time / expected data loss?
(I understand that multi-region may be available, and have seen some pricing data, but will not discuss this here).
Spanner automatically replicates data for high availability. As you stated, regional instances have three full copies of data. The key is that they are replicated across three zones within the region, which have independent power, cooling, networking, etc. Zones generally fail independently of each other, so your other replicas can continue serving reads and writes even if one zone goes down. Multi-region instances provide even greater availability by replicating across regions.
Zonal failures are very rare and would be transparent to your application: Cloud Spanner automatically reroutes requests to replicas that are able to serve them. It would be even rarer for a region to go down with data loss; Google takes many measures to protect against disasters.
Further out we will expose managed backups, but these would still be stored within Google data centers. We're also working on a Dataflow connector to help you import/export data should you want to manage your own backups.