How to understand errors(combined) at Google Spanner Monitor? - google-cloud-platform

The Google Spanner monitor provides helpful information about databases and instances. The operations-per-second view contains an errors(combined) measure that is not clear to me.
How to understand errors(combined)?

You can make a dashboard in Stackdriver (https://app.google.stackdriver.com) that will break down the errors slightly. We're working on a resources page for Cloud Spanner right now that will actually break them down by error code, but before that, you can go to Resources > Metrics Explorer and filter by response status.
You'll occasionally get error responses using the Cloud Spanner API; FAILED_PRECONDITION is somewhat common if you have a lot of transactions happening simultaneously that invalidate other transactions.
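To make the "errors (combined)" figure concrete: it can be read as the sum of all requests whose response status is anything other than OK. The sketch below is a minimal illustration with made-up counts. The metric type `spanner.googleapis.com/api/request_count`, its `status` label, and the `instance_id` resource label are what I would expect to filter on in Metrics Explorer, but treat them as assumptions to verify against the Cloud Monitoring metrics list. The helper names are hypothetical.

```python
# Assumed metric type and labels; verify them in the Cloud Monitoring metrics list.
REQUEST_COUNT_METRIC = "spanner.googleapis.com/api/request_count"


def non_ok_filter(instance_id):
    """Build a Monitoring filter string that keeps only non-OK responses."""
    return (
        f'metric.type = "{REQUEST_COUNT_METRIC}" '
        f'AND resource.labels.instance_id = "{instance_id}" '
        'AND metric.labels.status != "OK"'
    )


def break_down_errors(counts_by_status):
    """Return (combined error count, per-status error counts)."""
    errors = {status: n for status, n in counts_by_status.items() if status != "OK"}
    return sum(errors.values()), errors


# Hypothetical sample: request counts per response status over one interval.
sample = {
    "OK": 9800,
    "FAILED_PRECONDITION": 120,
    "ABORTED": 60,
    "DEADLINE_EXCEEDED": 20,
}
combined, per_status = break_down_errors(sample)
print(combined)    # 200
print(per_status)
```

Grouping by status this way shows which error code dominates the combined line, which is usually the first thing worth knowing.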

Related

General guidance around Bigtable performance

I'm using a single-node Bigtable cluster for my sample application running on GKE. Autoscaling has been incorporated into the client code.
Sometimes GET calls are slow (>80 ms). To investigate further, I need some clarity on the following Bigtable behaviour.
I have cached the Bigtable table object to ensure faster GET calls. Is the table object persistent on GKE? I have learned that objects are not persistent on Cloud Functions. Should we expect similar behaviour on GKE?
I'm using service account authentication, but how frequently do auth tokens get refreshed? I have seen frequent refresh logs for the gRPC Java client. I suspect Bigtable won't be able to serve requests during this token refresh period (4-5 seconds).
What if the client machine/instance doesn't scale enough? Will that cause slowness for GET calls?
Bigtable client libraries use connection pooling. How frequently do connections/channels close themselves? I have learned that connections are closed after a period of inactivity (>15 minutes or so).
I'm planning to read only the needed columns instead of the entire row. This can be achieved by specifying the row key as well as a column qualifier filter. Can I expect some performance improvement from not reading the entire row?
The official GCP documentation describes the causes of slower Bigtable performance, and I suggest going through it. See also the section on troubleshooting performance issues.
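Before digging into any single cause, it can help to quantify how often the GET calls are actually slow. The following is a minimal sketch and is not Bigtable-specific: `read_fn` is a hypothetical stand-in for whatever performs the read (for example a closure around the cached table object), `fake_read` only simulates it, and the 80 ms threshold is taken from the question.

```python
import statistics
import time


def measure_latencies(read_fn, calls=100):
    """Call read_fn repeatedly and return each call's latency in milliseconds."""
    latencies_ms = []
    for _ in range(calls):
        start = time.perf_counter()
        read_fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms, slow_threshold_ms=80.0):
    """Summarize latencies: median, approximate p99, and share of slow calls."""
    ordered = sorted(latencies_ms)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    slow = sum(1 for ms in ordered if ms > slow_threshold_ms)
    return {
        "p50_ms": statistics.median(ordered),
        "p99_ms": ordered[p99_index],
        "slow_fraction": slow / len(ordered),
    }


def fake_read():
    """Hypothetical placeholder for a real GET call."""
    time.sleep(0.001)


stats = summarize(measure_latencies(fake_read, calls=20))
print(stats)
```

Running the same measurement with and without a column filter, or right after a long idle period versus under steady load, gives a rough way to check several of the hypotheses in the question one at a time.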

BigQuery API Listed Twice in APIs & Services Dashboard

Does anyone happen to know why the BigQuery API would be listed twice in the APIs & Services Dashboard in Google Cloud Platform?
BigQuery seems to be functioning properly; I just thought it was strange that this is the only API that seems to be listed twice. I don't think it could be enabled twice, as both links lead to the same overview page and all the metrics are the same.
Duplicate Bigquery API listed in dashboard
This behavior is apparently caused by the fact that bigquery-json.googleapis.com is an alias for bigquery.googleapis.com.
The BigQuery engineering team is aware of this issue and is working on resolving it. All further updates should occur on the public report.

Google Cloud Platform - Stack Driver Enabled - 100% Compute Errors

I developed and support a client's mobile app that uses Firebase services.
Google Cloud Platform logged this event yesterday at 4:17 am:
'<my account email> has executed
google.api.serviceusage.v1.ServiceUsage.EnableService
on stackdriver.googleapis.com'
I was sleeping at the time and a review of Google Admin Console Login Audit Log does not show a login event around that same time.
Immediately, 100% errors were being reported for 'compute'.
A look at the Stackdriver API overview page does not give any indication of activity.
My question and concern: how and why did this service get activated, and what activity is driving the compute errors at 100%?
During my efforts to understand, I clicked on Compute Engine API in the API library, which enabled the API (but no VMs, Disk, etc. were created):
A short time later, Google Cloud Platform has several log entries:
google.devtools.cloudbuild.v1.CloudBuild.ListBuilds
was executed on builds
Number of returned items 1000
The 'compute' errors stopped.
When I disabled the Compute Engine API, the ListBuilds logs stopped, but the compute errors returned to 100%.
I have not found a definitive answer to my question.
It's clear that Stackdriver API was enabled, but I don't know why.
When enabled, 100% Compute errors were being reported (orange line on graph) without any details.
While customizing the Google Cloud Platform Dashboard for this account, I toggled/enabled the Compute Engine card/graph, hoping that might reveal some clues regarding the 100% errors. That action initialized the Compute Engine API. Almost immediately the compute errors ended, but there was a surge of activity that has continued. Reviewing many resources, I found information that suggests this is normal behavior.
I would still like to fully understand how Stackdriver was enabled, why it was enabled, what value it provides, and whether I can simply disable it and the Compute Engine API, as this project will never require VM compute services.

Is it possible to hide BigQuery query execution logs in Google Cloud platform?

Based on my understanding, Google Cloud Platform does not provide a BigQuery-specific logging API that we can disable so that BigQuery SQL queries do not get logged.
Any reference or workaround would be highly appreciated.
Use case:
Queries need to be executed in client dataset and data has to stay within client project only.
There is no data in the logs, only the query that was performed. You can exclude the logs if you want, but you then won't be able to track, debug, or understand what happened. If you are comfortable with that, go ahead!

Stackdriver Logging Client Libraries - What happens during Google Downtime?

Suppose you embed the Stackdriver client library in your application and the Google Stackdriver API has downtime (Google documentation indicates 99.95% availability, or 21.92 minutes of downtime per month).
My question is: What will happen to my application during the downtime? Will logging info build up in memory? Will it cause application errors or will it discard the log data and continue on?
Logging API downtime can have different root causes and consequences. Google's system engineers have mechanisms in place to track outages and take mitigating action so that downtime and its consequences are minimal, but Google cannot guarantee that no logging data will be lost in every outage.
Hopefully your application and pipeline can withstand the roughly 22 minutes of downtime a month that a 99.95% SLA allows, as per the internal SLOs and SLAs of GCP.
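The downtime budget implied by an availability target is simple arithmetic: the number of minutes in the month multiplied by the unavailable fraction. As a quick check, the sketch below computes it for a few month lengths (the function name is hypothetical). For a 30-day month it comes to about 21.6 minutes, and for an average month of 365.25 / 12 days it comes to about 21.9 minutes, which matches the figure quoted in the question.

```python
def allowed_downtime_minutes(availability_percent, days_in_month):
    """Minutes of downtime per month permitted by an availability target."""
    minutes_in_month = days_in_month * 24 * 60
    return minutes_in_month * (100.0 - availability_percent) / 100.0


month_lengths = [
    ("30-day month", 30),
    ("average month", 365.25 / 12),
    ("31-day month", 31),
]
for label, days in month_lengths:
    print(f"{label}: {allowed_downtime_minutes(99.95, days):.1f} minutes")
```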
All three scenarios you listed are plausible. During such a period, your application may receive 500 responses when sending logs, so it has to be able to deal with that.
If the logging data manages to reach Google's platform but an outage prevents the data from being accessible, Google's team will try their best to release backlogs, repopulate data, etc. They will post a general notice on https://status.cloud.google.com/
If the issue is caused by the logging agent not sending data to our platform, then the logging data may not be retrievable. This could still be an infrastructure outage in one of the GCP products, or it could be linked to something other than an outage, such as your application or its underlying host running out of resources or the logging agent being corrupted, which is not covered by the GCP Stackdriver SLA [1].
If the pipeline that ingests data from the Logging API is backlogged, it could cause an outage, but the GCP team will try their best to make the data accessible after the outage ends.
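One way an application could tolerate failed log writes is to buffer entries in a bounded queue, retry with backoff, and drop the oldest entries once the buffer is full, so that memory use stays capped and logging failures never surface as application errors. The sketch below is a generic illustration under those assumptions, not a description of how the Stackdriver client library behaves internally. `BufferedLogSender`, `send_fn`, and the default limits are all hypothetical.

```python
import collections
import time


class BufferedLogSender:
    """Buffers log entries in memory and retries failed sends.

    send_fn is a hypothetical callable that delivers one entry and
    raises an exception on failure (for example on an HTTP 500).
    """

    def __init__(self, send_fn, max_buffered=1000, max_attempts=3,
                 base_delay_s=0.5, sleep=time.sleep):
        self._send_fn = send_fn
        # A deque with maxlen silently discards the oldest entry when full.
        self._buffer = collections.deque(maxlen=max_buffered)
        self._max_attempts = max_attempts
        self._base_delay_s = base_delay_s
        self._sleep = sleep

    @property
    def pending(self):
        """Number of entries still waiting to be delivered."""
        return len(self._buffer)

    def log(self, entry):
        """Queue an entry and try to flush; send failures are not raised."""
        self._buffer.append(entry)
        self.flush()

    def flush(self):
        """Send buffered entries in order; stop at the first one that keeps failing."""
        while self._buffer:
            if not self._try_send(self._buffer[0]):
                return False  # keep the entry buffered for a later flush
            self._buffer.popleft()
        return True

    def _try_send(self, entry):
        for attempt in range(self._max_attempts):
            try:
                self._send_fn(entry)
                return True
            except Exception:
                # Exponential backoff between attempts: base, 2x base, ...
                if attempt < self._max_attempts - 1:
                    self._sleep(self._base_delay_s * (2 ** attempt))
        return False
```

In this form `log()` blocks while it retries, so a real application would more likely call `flush()` from a background thread. The trade-off is explicit either way: entries beyond `max_buffered` are lost during a long outage in exchange for bounded memory.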
If you suspect the Logging API is malfunctioning, please contact support, file an issue in the issue tracker [2], or inspect the open issues [3], where Google's product team will provide live updates. Links below:
[1] https://cloud.google.com/stackdriver/sla#sla_exclusions
[2] Create new incident: https://issuetracker.google.com/issues/new?component=187203&template=0
[3] Open issues: https://issuetracker.google.com/savedsearches/559764