Finding untraced time in Google Cloud Tracer Agent for Express.js - google-cloud-platform

I'm using Google Cloud's Stackdriver Trace Agent with the Express.js plugin.
I noticed there are a few routes which have substantial "untraced" time. What strategies can I use to find and begin to measure these untraced paths, and why would it not pick up certain code paths?

If the Trace agent isn't working, there's unfortunately not very much you can do to modify its behavior. I recommend using OpenCensus to instrument your application, which will give you much more control over exactly how traces and spans are created.

Related

Analyze Number value in Different Conditions with google cloud platform logging

I'm struggling to find out how to use GCP logging to log a number value for analysis, I'm looking for a link to a tutorial or something (or a better 3rd party service to do this).
Context: I have a service that I'd like to test different conditions for the function execution time and analyze it with google-cloud-platform logging.
Example Log: { condition: 1, duration: 1000 }
Desire: Create graph using GCP logs to compare condition 1 and 2.
Is there a tutorial somewhere for this? Or maybe there is a better 3rd party service to use?
PS: I'm using the Node google cloud logging client which only talks about text logs.
PSS: I considered doing this in loggly, but ended up getting lost in their documentation and UI.
There are many tools that you could use to solve this problem. However, you suggest a willingness to use Google Cloud Platform services (e.g. Stackdriver monitoring), so I'll provide some guidance using it.
NOTE Please read around the topic and understand the costs involved with using e.g. Cloud Monitoring before you commit to an approach.
Conceptually, the data you're logging (!) more closely matches a metric. However, this approach would require you to add some form of metric library (see Open Telemetry: Node.js) to your code and instrument your code to record the values that interest you.
You could then use e.g. Google Cloud Monitoring to graph your metric.
Since you're already producing a log with the data you wish to analyze, you can use Log-based metrics to create a metric from your logs. You may be interested in reviewing the content for distribution metric.
Once you've a metric (either directly or using logs-based), you can then graph the resulting data in Cloud Monitoring. For logs-based metrics, see the Monitoring documentation.
For completeness and to provide an alternative approach to producing and analyzing metrics, see the open-source tool, Prometheus. Using a 3rd-party Prometheus client library for Node.js, you could instrument you code to produce a metric. You would then configure Prometheus to scrape your app for its metrics and graph the results for you.

Google Cloud Platform - Stack Driver Enabled - 100% Compute Errors

I developed and support a client's mobile app that uses Firebase services.
Google Cloud Platform logged this event yesterday at 4:17 am:
'<my account email> has executed
google.api.serviceusage.v1.ServiceUsage.EnableService
on stackdriver.googleapis.com'
I was sleeping at the time and a review of Google Admin Console Login Audit Log does not show a login event around that same time.
Immediately, 100% errors were reporting for 'compute':
A look at the Stackdriver API overview page does not give any indication of activity:
My question, my concern, how/why did this service get activated and what is the activity driving the compute errors at 100%?
During my efforts to understand, I clicked on Compute Engine API in the API library, which enabled the API (but no VMs, Disk, etc. were created):
A short time later, Google Cloud Platform has several log entries:
google.devtools.cloudbuild.v1.CloudBuild.ListBuilds
was executed on builds
Number of returned items 1000
The 'compute' errors stopped.
When I disabled the Compute Engine API, the ListBuilds logs stopped, but the Computer Errors returned to 100%.
I have not found a definitive answer to my question.
It's clear that Stackdriver API was enabled, but I don't know why.
When enabled, 100% Compute errors were being reported (orange line on graph) without any details.
While customizing the Google Cloud Platform Dashboard for this account, I toggled/enabled the Compute Engine card/graph hoping that might reveal some clues regarding the 100% errors. That action initialized the Compute Engine API. Almost immediately the Compute errors ended but there was a surge of activity that has continued. Reviewing many resources I found information that suggest this is normal behavior.
I would still like to fully understand how Stackdriver was enabled, why it was enabled, what value it provides, if I can simply disable it and Compute API as this project will never require VM compute services.

How to get logs for Compute Engine API errors?

I am a total beginner in cloud service management, so this is a very basic question.
I have inherited a kubernetes based project running in Google Cloud. I have discovered recently that there are millions of errors I am unaware of in APIs & Services > Compute Engine API > Metrics menu:
I have tried searching for these values both on google in the docs to no avail. With no link to the list of logs and hundreds of sub menu items I feel completely lost on where to start.
How can I get more information about these errors?
How can I navigate to the relevant logs?
Your question is rather general so I will make some assumptions and educated guesses about your project and try to explain.
This level of error with API calls is of course unusually high and suggesting that some things don't work (for example someone deleted a backend service but left the load balancer without any health checks and it's accepting requests from the outside but there's nothing in the backend to process them).
That is just an exmaple - without more details I'm not even speculate further.
If you want to read more about the messages take the second one from the top - documentation for compute.v1.BackendServicesService.delete.
You can also explore other Compute Engine API methods to see what they do to give you more insight what is happening with your project.
This should give you a good starting point to explore the API.
Now - regarding logs. Just navigate to Logs Viewer and select as a resource whatever you want to analyse (all or a single VM, Load Balancer, firewall rule, etc). You can also include (or exclude) certain level of logs (warning, error etc). Pissibilities are endless.
Your query may look something like this:
Here's more documentation on GCP Logs Viewer to help you out.

Google API Speeds Slow in Cloud Run / Functions?

Bottom Line: Cloud Run and Cloud Functions seem to have bizarrely limited bandwidth to the Google Drive API endpoints. Looking for advice on how to work around, or, ideally, #Google support to fix the underlying issue(s) as I will not be the only like use case.
Background: I have what I think is a really simple use case. We're trying to automate private domain Google Drive users to take existing audio recordings and send them off to Speech API to generate a transcript on an ad hoc basis, and to dump the transcript back into the same Drive folder with email notification to the submitter. Easy, right? Only hard part is that Speech API will only read from Google Cloud Storage, so the 'hard part' should be moving the file over. 'Hard' doesn't really cover it...
Problem: Writing in nodejs and using the latest version of the official modules for Drive and GCS, the file copying was going extremely slow. When we broke things down, it became apparent that the GCS speed was acceptable (mostly -- honestly it didn't get a robust test, but was fast enough in limited testing); it was the Drive ingress which was causing the real problem. Using even the sample Google Drive Download app from the repo was slow as can be. Thinking the issue might be either my code or the library, though, I ran the same thing from the Cloud Console, and it was fast as lightning. Same with GCE. Same locally. But in Cloud Functions or Cloud Run, it's like molasses.
Request:
Has anyone in the community run into this or a like issue and found a workaround?
#Google -- Any chance that whatever the underlying performance bottleneck is, you can fix it? This is a quintessentially 'serverless' use case, and it's hard to believe that the folks who've been doing this the longest can't crack it.
Thank you all in advance!
Updated 1/4/19 -- GCS is also slow following more robust testing. Image base also makes no difference (tried nodejs10-alpine, nodejs12-slim, nodejs12-alpine without impact), and memory limits equally do not impact results locally or on GCP (256m works fine locally; 2Gi fails in GCP).
Google Issue at: https://issuetracker.google.com/147139116
Self-inflicted wound. Google-provided code seeks to be asynchronous and do work in the background. Cloud Run and Cloud Functions do not support that model (for now at least). Move to promise-chaining and all of a sudden it works like it should -- so long as the CPU keeps the attention it needs. Limits what we can do with CR / CF, but hopefully that too will evolve.

StackDriver Trace across Cloud/Services

What if I have an application that works across cloud services. Eg. AWS Lambda will call Google CloudRun service and I want to have my traces work across these. Isit possible? I guess I will have to somehow pass a trace ID and set it when I need it? But I see no way to set a trace ID?
If we look at the list of supported language/backend combinations, we see that both GCP (Stackdriver) and AWS (X-Ray) are supported. See: Exporters. What this means is that you can instrument either (or both) of your AWS Lambda or GCP CloudRun applications with OpenCensus calls. I suspect you will have to dig deep to determine the specifics but this feels like a good starting point.
If an OpenCensus library is available for your programming language, you can simplify the process of creating and sending trace data by using OpenCensus. In addition to being simpler to use, OpenCensus implements batching which might improve performance click here.
The Stackdriver Trace API allows you to send and retrieve latency data to and from Stackdriver Trace. There are two versions of the API:
Stackdriver Trace API v1 is fully supported.
Stackdriver Trace API v2 is in Beta release.
The client libraries for Trace automatically generate the trace_id and the span_id. You need to generate the values for these fields if you don't use the Trace client libraries or the OpenCensus client libraries. In this case, you should use a pseudo-random or random algorithm. Don't derive these fields from need-to-know data or from personally identifiable information, for details please click here.