Stackdriver Trace across Cloud/Services - google-cloud-platform

What if I have an application that works across cloud services? E.g., AWS Lambda calls a Google Cloud Run service, and I want my traces to work across these. Is it possible? I guess I will have to somehow pass a trace ID and set it when I need it, but I see no way to set a trace ID?

If we look at the list of supported language/backend combinations, we see that both GCP (Stackdriver) and AWS (X-Ray) are supported; see: Exporters. This means you can instrument either (or both) of your AWS Lambda and GCP Cloud Run applications with OpenCensus calls. I suspect you will have to dig deep to determine the specifics, but this feels like a good starting point.

If an OpenCensus library is available for your programming language, you can simplify the process of creating and sending trace data by using OpenCensus. In addition to being simpler to use, OpenCensus implements batching, which might improve performance.
The Stackdriver Trace API allows you to send and retrieve latency data to and from Stackdriver Trace. There are two versions of the API:
Stackdriver Trace API v1 is fully supported.
Stackdriver Trace API v2 is in Beta release.
The client libraries for Trace automatically generate the trace_id and the span_id. If you don't use the Trace client libraries or the OpenCensus client libraries, you need to generate the values for these fields yourself. In that case, use a pseudo-random or random algorithm; don't derive these fields from need-to-know data or from personally identifiable information.

Related

Analyze Number value in Different Conditions with google cloud platform logging

I'm struggling to find out how to use GCP logging to log a number value for analysis; I'm looking for a link to a tutorial or something (or a better third-party service to do this).
Context: I have a service that I'd like to test different conditions for the function execution time and analyze it with google-cloud-platform logging.
Example Log: { condition: 1, duration: 1000 }
Desire: Create graph using GCP logs to compare condition 1 and 2.
Is there a tutorial somewhere for this? Or maybe there is a better 3rd party service to use?
PS: I'm using the Node Google Cloud Logging client, which only talks about text logs.
PPS: I considered doing this in Loggly, but ended up getting lost in their documentation and UI.
There are many tools that you could use to solve this problem. However, you suggest a willingness to use Google Cloud Platform services (e.g. Stackdriver Monitoring), so I'll provide some guidance using them.
NOTE Please read around the topic and understand the costs involved with using e.g. Cloud Monitoring before you commit to an approach.
Conceptually, the data you're logging (!) more closely matches a metric. However, this approach would require you to add some form of metric library (see OpenTelemetry for Node.js) to your code and instrument your code to record the values that interest you.
You could then use e.g. Google Cloud Monitoring to graph your metric.
Since you're already producing a log with the data you wish to analyze, you can use Log-based metrics to create a metric from your logs. You may be interested in reviewing the content for distribution metric.
Once you have a metric (either direct or logs-based), you can graph the resulting data in Cloud Monitoring. For logs-based metrics, see the Monitoring documentation.
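As a sketch of the logging side (assuming your service runs on a GCP runtime such as Cloud Run or Cloud Functions, where JSON lines written to stdout are ingested as structured `jsonPayload` entries; the helper name is illustrative), you could emit your numbers as structured fields rather than free text:

```javascript
// Write one structured log entry per measurement. On GCP serverless runtimes,
// a JSON object printed to stdout becomes the entry's jsonPayload, so the
// fields below are queryable and usable in logs-based metrics.
function logDuration(condition, durationMs) {
  const entry = {
    severity: 'INFO',
    message: 'function execution time',
    condition: condition,   // label to group by (condition 1 vs 2)
    duration: durationMs,   // numeric value for a distribution metric
  };
  console.log(JSON.stringify(entry));
  return entry;
}
```

A logs-based distribution metric extracting `jsonPayload.duration`, with a label on `jsonPayload.condition`, would then let you graph the two conditions side by side in Cloud Monitoring.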
For completeness and to provide an alternative approach to producing and analyzing metrics, see the open-source tool Prometheus. Using a third-party Prometheus client library for Node.js, you could instrument your code to produce a metric. You would then configure Prometheus to scrape your app for its metrics and graph the results for you.

Tracking of Common Services in AWS using X-ray

I have multiple Lambdas, and each of them invokes another Lambda, a REST API, DynamoDB, S3, etc.
Example :
HotelBooking
FlightBooking
These invoke the common services like
BookingService
InvoiceService
I need to track which application (i.e. FlightBooking / HotelBooking) is invoking the BookingService, how many times, how much CPU, etc.
Is this possible through X-Ray in AWS, or are there better ways?
After some research, I believe annotations are the best way to solve this problem. So we need to add an annotation in Node.js:
const AWSXRay = require('aws-xray-sdk-core');

// 'annotations' is the subsegment name; the annotation itself is a key/value
// pair that X-Ray indexes for filtering.
AWSXRay.captureFunc('annotations', (subsegment) => {
  subsegment.addAnnotation('application', 'BookingService');
});
Annotations are indexed and can also be used for filtering, using an expression like: annotation.application = "BookingService"
More info: X-Ray merges segments and subsegments into traces, so an annotation at the subsegment level is enough for filtering the traces.
AWS X-Ray can help you trace downstream calls made by a Lambda application in the form of segments and subsegments, giving you an overall view of your application. I think X-Ray fits your use case: you will be able to trace the DynamoDB, S3, or REST API calls that your Lambda makes and identify which application (in your case FlightBooking / HotelBooking) is invoking the service (BookingService). You won't see performance numbers such as memory and CPU usage, but you will be able to trace exceptions inside your application.

Finding untraced time in Google Cloud Tracer Agent for Express.js

I'm using Google Cloud's Stackdriver Trace Agent with the Express.js plugin.
I noticed there are a few routes which have substantial "untraced" time. What strategies can I use to find and begin to measure these untraced paths, and why would it not pick up certain code paths?
If the Trace agent isn't working, there's unfortunately not very much you can do to modify its behavior. I recommend using OpenCensus to instrument your application, which will give you much more control over exactly how traces and spans are created.
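One low-tech way to begin measuring untraced time yourself is to wall-clock each request and compare the total against the time covered by the spans the agent did record; the gap is your untraced time. A minimal sketch (function names are illustrative, not part of the Trace agent API):

```javascript
// Estimate how much of a request is "untraced" by subtracting the time
// covered by known spans from the total handler duration.
function untracedMs(totalMs, spanDurationsMs) {
  const traced = spanDurationsMs.reduce((sum, d) => sum + d, 0);
  return Math.max(0, totalMs - traced);
}

// Timing helper using the monotonic clock, suitable for wrapping a handler.
async function timed(fn) {
  const start = process.hrtime.bigint();
  const result = await fn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { result, elapsedMs };
}
```

Routes where `untracedMs` stays large are the ones worth instrumenting manually (e.g. with custom spans) to see what the agent's plugins are not picking up.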

Is there a way to report custom DataDog metrics from AWS Lambda?

I'm looking to report custom metrics from Lambda functions to Datadog. I need things like counters, gauges, histograms.
Datadog documentation outlines two options for reporting metrics from AWS Lambda:
print a line into the log
use the API
The fine print in the document above mentions that the printing method only supports counters and gauges, so that's obviously not enough for my use case (I also need histograms).
Now, the second method - the API - only supports reporting time series points, which I'm assuming are just gauges (right?), according to the API documentation.
So, is there a way to report metrics to Datadog from my Lambda functions, short of setting up a statsd server in EC2 and calling out to it using dogstatsd? Anyone have any luck getting around this?
An easier way is to use this library: https://github.com/marceloboeira/aws-lambda-datadog
It has no runtime dependencies, doesn't require authentication, and reports everything to CloudWatch too. You can read more about it here: https://www.datadoghq.com/blog/how-to-monitor-lambda-functions/
Yes, it is possible to emit metrics to Datadog from an AWS Lambda function.
If you are using Node.js, you could use https://www.npmjs.com/package/datadog-metrics to emit metrics to the API. It supports counters, gauges, and histograms. You just need to pass in your app/API keys as environment variables.
Matt
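For completeness, the "print a line into the log" method from the question works by writing a specially formatted line that Datadog's Lambda integration parses out of CloudWatch Logs. A sketch of a helper (format as documented by Datadog for this method at the time; only `count` and `gauge` types are supported this way, which is why it doesn't cover histograms):

```javascript
// Build the Datadog "metrics from logs" line:
//   MONITORING|<unix_epoch>|<value>|<type>|<metric.name>|#tag1:value,tag2
function datadogLogLine(name, value, type, tags = [], nowSecs = Math.floor(Date.now() / 1000)) {
  const tagPart = tags.length ? `|#${tags.join(',')}` : '';
  return `MONITORING|${nowSecs}|${value}|${type}|${name}${tagPart}`;
}

// e.g. console.log(datadogLogLine('bookings.count', 1, 'count', ['app:hotel']));
```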

Google Cloud Spanner: Want Java API for doing my own retries

This is really a question for the Google Cloud Spanner Java API team...
Looking at the new Google Cloud Spanner service, it appears that the only way to perform read/write transactions is by providing a callback, via the TransactionRunner interface.
I understand that the API is trying to hide the details of the need to automatically retry transactions as a convenience to the programmer, but this limitation is a serious problem, at least for me. I need to be able to manage the transaction lifecycle myself, even if that means I have to perform my own retries (e.g., based on catching some sort of "retryable" exception).
To make this problem more concrete, suppose you wanted to implement Spring's PlatformTransactionManager for Google Cloud Spanner, so as to fit in with your existing code, and use your existing retry logic. It appears impossible to do that with the current Java API.
It seems like it would be easy to augment the API in a backward compatible way, to add a method returning a TransactionContext to the user, and let the user handle the retries.
Am I missing something? Can this alternate (more traditional) transaction API style be added to the Java API?
You are right that TransactionRunner is the only way to do read-write transactions in the Java client for Cloud Spanner. We believe most users would prefer using it to hand-rolling their own retry logic, but we realize it might not fit the needs of all users and would love to hear about such use cases. Could you please file a feature request so we can discuss it further there?
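For reference, the hand-rolled retry loop the asker has in mind is roughly the following pattern (sketched in Node.js for consistency with the rest of this page; `isRetryable` and the backoff numbers are illustrative, not part of any Spanner API):

```javascript
// Generic retry-with-backoff around a transactional unit of work. The work
// function may throw; errors the predicate flags as retryable are retried
// with exponential backoff and jitter, all others propagate immediately.
async function runWithRetries(work, isRetryable, maxAttempts = 5) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await work();
    } catch (err) {
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      const delayMs = Math.min(1000, 2 ** attempt * 10) * (0.5 + Math.random() / 2);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

This is the lifecycle a framework integration (such as a Spring PlatformTransactionManager) would want to own itself, which is exactly what a callback-only API makes difficult.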