I'm attempting to implement distributed tracing with OpenCensus across several services, where each service serves via Flask and builds its reply by sending downstream requests using (appropriately enough) the requests library. The tracing backend is GCP Cloud Trace.
I'm using FlaskMiddleware, which traces the initial call correctly, but no span info is propagated from the source service to the target service, even with a propagator defined on the middleware (I've tried a few):
from opencensus.trace.propagation import (b3_format, google_cloud_format, trace_context_http_header_format)

#middleware = FlaskMiddleware(app, exporter=exporter, sampler=sampler, excludelist_paths=['healthz', 'metrics'], propagator=google_cloud_format.GoogleCloudFormatPropagator())
middleware = FlaskMiddleware(app, exporter=exporter, sampler=sampler, excludelist_paths=['healthz', 'metrics'], propagator=b3_format.B3FormatPropagator())
#middleware = FlaskMiddleware(app, exporter=exporter, sampler=sampler, excludelist_paths=['healthz', 'metrics'], propagator=trace_context_http_header_format.TraceContextPropagator())
I guess the question is: does anyone have an example of setting up tracing that traverses several downstream calls, when each service serves via Flask and makes its downstream requests via requests?
Currently I end up with each service having its own trace with a single span.
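For reference, here's a minimal end-to-end sketch of the kind of setup I'm after (the exporter/sampler wiring and names are illustrative, not my exact code; the opencensus-ext-requests integration is what is supposed to stamp the trace context header onto outgoing requests calls):

from flask import Flask
from opencensus.ext.flask.flask_middleware import FlaskMiddleware
from opencensus.ext.stackdriver.trace_exporter import StackdriverExporter
from opencensus.trace import config_integration
from opencensus.trace.propagation import google_cloud_format
from opencensus.trace.samplers import AlwaysOnSampler

# Patch the requests library so every outgoing call becomes a child span
# and (in recent opencensus versions) carries the propagation header.
config_integration.trace_integrations(['requests'])

app = Flask(__name__)
middleware = FlaskMiddleware(
    app,
    exporter=StackdriverExporter(),
    sampler=AlwaysOnSampler(),
    excludelist_paths=['healthz', 'metrics'],
    propagator=google_cloud_format.GoogleCloudFormatPropagator(),
)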
Related
When I make calls from a Cloud Run instance to other cloud APIs, for some reason there are huge delays in the responses.
Everything runs within a single project.
Even from my local machine the calls are much faster (a couple of seconds), but deployed in the cloud some requests take a couple of minutes to complete. As far as I can see this affects all the APIs (not just Firestore; the Translate and TTS APIs as well). It is definitely not related to cold starts.
Code example (Node.js) and logs are below:
console.log('Received the request for stats');
const usersCollection = this.firestore.collection('users');
const snapshot = await usersCollection.get();
console.log('Fetched all users from Firestore');
Well, after some further investigation I figured out what the problem was.
The thing is that all the operations I perform happen not before the response is sent, but after (this is how the chatbot is architected).
So the flow looks like this:
1. a request to do something comes in; a 200 response is returned immediately to say the request was accepted
2. all the business logic and work runs
3. the chatbot sends a message with the results
According to the docs, by default CPU is allocated only while a request is being processed, so the only thing I had to change was to enable CPU allocation for background activities: https://cloud.google.com/run/docs/tips/general#background-activity
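For the record, the same setting can also be flipped from the command line (a sketch; SERVICE_NAME is a placeholder, and the flag is worth double-checking against the linked docs for your gcloud version):

gcloud run services update SERVICE_NAME --no-cpu-throttling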
Cloud Trace and Cloud Logging integrate quite nicely in most cases, as described in https://cloud.google.com/trace/docs/trace-log-integration
Unfortunately, this doesn't seem to include the HTTP request logs generated by a Load Balancer when request logging is enabled.
The LB logs show the trace icon and are correctly associated with an overall trace in Cloud Trace, but the 'show trace details' context-menu item is greyed out for those log entries.
A similar problem arose with my application-level logging/tracing, and was solved by setting the traceSampled attribute on the LogEntry, but this can't work for LB logs, since I'm not in control of their generation.
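For context, the application-side fix amounted to something like this (a sketch using structured logging to stdout; the special logging.googleapis.com/* keys are what Cloud Logging maps onto the LogEntry's trace, spanId and traceSampled fields, and everything else here is illustrative):

import json
import sys

def log_with_trace(message, project_id, trace_id, span_id, sampled):
    # Cloud Logging lifts these special keys out of the JSON payload and
    # into the corresponding LogEntry fields.
    entry = {
        'severity': 'INFO',
        'message': message,
        'logging.googleapis.com/trace': 'projects/%s/traces/%s' % (project_id, trace_id),
        'logging.googleapis.com/spanId': span_id,
        'logging.googleapis.com/trace_sampled': sampled,
    }
    print(json.dumps(entry), file=sys.stdout)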
In this instance I'm tracing 100% of requests since the service is M2M and fairly low volume, but in the general case it makes sense that the LB can't know if something is actually generating traces without being told.
I can't find any good references in the docs, but in theory a response header indicating that the request was sampled could be observed by the LB and cause it to emit the appropriately annotated log entry.
Any ideas if such a feature exists, in this form or any other?
(A last-ditch workaround might be to use the Logs Router to feed LB logs into a Pub/Sub queue (excluding them from the normal logging sinks), and resubmit them to the normal sink(s) with the fields appropriately set by a Cloud Function or other Pub/Sub consumer, but that seems like a lot of work and complexity for this purpose.)
There is currently a Feature Request open for this; you can follow its status at the following link 1.
As a workaround, you could rely on the target proxies that come with your load balancer. According to the documentation for the global external HTTP(S) load balancer:
The proxies set HTTP request/response headers as follows:
Via: 1.1 google (requests and responses)
X-Forwarded-Proto: [http | https] (requests only)
X-Cloud-Trace-Context: <trace-id>/<span-id>;<trace-options> (requests only) Contains parameters for Cloud Trace.
X-Forwarded-For: [<supplied-value>,]<client-ip>,<load-balancer-ip> (see the X-Forwarded-For header) (requests only)
You can find the complete documentation about external HTTP(S) load balancers and target proxies here 2.
And finally, take a look at the documentation on how to use and configure target proxies here 3.
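As an aside, if you need the IDs on the application side, the X-Cloud-Trace-Context header quoted above is straightforward to pick apart; a minimal sketch:

def parse_cloud_trace_context(header):
    # Format (from the quote above): <trace-id>/<span-id>;<trace-options>,
    # where the option 'o=1' marks the request as sampled.
    trace_and_span, _, options = header.partition(';')
    trace_id, _, span_id = trace_and_span.partition('/')
    return trace_id, span_id, options == 'o=1'

# e.g. parse_cloud_trace_context('0123456789abcdef0123456789abcdef/42;o=1')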
From the documentation reported here, I read:
This project checks for various conditions and provides reports when anomalous behavior is detected. The following health checks are bundled with this project: cache, database, storage, disk and memory utilization (via psutil), AWS S3 storage, Celery task queue, Celery ping, RabbitMQ, Migrations
and from the use case section:
The primary intended use case is to monitor conditions via HTTP(S), with responses available in HTML and JSON formats. When you get back a response that includes one or more problems, you can then decide the appropriate course of action, which could include generating notifications and/or automating the replacement of a failing node with a new one
And then:
The /ht/ endpoint will respond with an HTTP 200 if all checks passed and an HTTP 500 if any of the tests failed.
From a security point of view: should this URL (https://example.com/ht) be reachable by everybody? It seems to give away a fair amount of information about the deployment.
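For illustration, if the answer is that it shouldn't be public, I'm considering something like this (a sketch with hypothetical names, restricting the endpoint to internal networks at the application layer; doing it at the proxy or firewall would work too):

import ipaddress
from django.http import HttpResponseForbidden

INTERNAL_NETS = [ipaddress.ip_network('10.0.0.0/8'),
                 ipaddress.ip_network('127.0.0.0/8')]

def internal_only(view):
    # Wrap the health-check view so only clients from the allowlisted
    # networks can read the detailed check results.
    def wrapped(request, *args, **kwargs):
        client_ip = ipaddress.ip_address(request.META['REMOTE_ADDR'])
        if not any(client_ip in net for net in INTERNAL_NETS):
            return HttpResponseForbidden()
        return view(request, *args, **kwargs)
    return wrapped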
Kubernetes and Istio are already installed in the cluster, and three microservices are deployed as pods. The flow is:
Microservice A to microservice B: calls over HTTP
Microservice B to microservice C: calls via Kafka
Microservice A exposes an HTTP API to the outside
I guess that when a client hits the ingress, Istio generates a traceId and spanId in the HTTP headers before the request enters service A.
Are these spanId and traceId propagated to microservices B and C without using a separate API like Spring Cloud Sleuth?
No, Istio does not propagate tracing headers for you. However, propagation can be implemented on the application side without 3rd-party APIs.
According to Istio documentation:
Istio leverages Envoy’s distributed tracing feature to provide tracing integration out of the box. Specifically, Istio provides options to install various tracing backends and configure proxies to send trace spans to them automatically. See the Zipkin, Jaeger and LightStep task docs about how Istio works with those tracing systems.
The Istio documentation also has an example of application-side header propagation for the bookinfo demo application:
Trace context propagation
Although Istio proxies are able to automatically send spans, they need some hints to tie together the entire trace. Applications need to propagate the appropriate HTTP headers so that when the proxies send span information, the spans can be correlated correctly into a single trace.
To do this, an application needs to collect and propagate the following headers from the incoming request to any outgoing requests:
x-request-id
x-b3-traceid
x-b3-spanid
x-b3-parentspanid
x-b3-sampled
x-b3-flags
x-ot-span-context
Additionally, tracing integrations based on OpenCensus (e.g. Stackdriver) propagate the following headers:
x-cloud-trace-context
traceparent
grpc-trace-bin
If you look at the sample Python productpage service, for example, you see that the application extracts the required headers from an HTTP request using OpenTracing libraries:
def getForwardHeaders(request):
    headers = {}

    # x-b3-*** headers can be populated using the opentracing span
    span = get_current_span()
    carrier = {}
    tracer.inject(
        span_context=span.context,
        format=Format.HTTP_HEADERS,
        carrier=carrier)

    headers.update(carrier)

    # ...

    incoming_headers = ['x-request-id']

    # ...

    for ihdr in incoming_headers:
        val = request.headers.get(ihdr)
        if val is not None:
            headers[ihdr] = val

    return headers
The reviews application (Java) does something similar:
@GET
@Path("/reviews/{productId}")
public Response bookReviewsById(@PathParam("productId") int productId,
                                @HeaderParam("end-user") String user,
                                @HeaderParam("x-request-id") String xreq,
                                @HeaderParam("x-b3-traceid") String xtraceid,
                                @HeaderParam("x-b3-spanid") String xspanid,
                                @HeaderParam("x-b3-parentspanid") String xparentspanid,
                                @HeaderParam("x-b3-sampled") String xsampled,
                                @HeaderParam("x-b3-flags") String xflags,
                                @HeaderParam("x-ot-span-context") String xotspan) {
    if (ratings_enabled) {
        JsonObject ratingsResponse = getRatings(Integer.toString(productId), user, xreq, xtraceid, xspanid, xparentspanid, xsampled, xflags, xotspan);
        // ...
When you make downstream calls in your applications, make sure to include these headers.
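For instance, on the Python side the collected headers just get attached to the outgoing call (the reviews:9080 address is the bookinfo example's service, used here illustratively):

import requests

def get_reviews(request, product_id):
    # Forward the tracing headers so Envoy can stitch the downstream
    # span into the same trace.
    headers = getForwardHeaders(request)
    return requests.get(
        'http://reviews:9080/reviews/%d' % product_id,
        headers=headers)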
I am using Istio version 1.3.5. Is there any configuration to be set that allows istio-proxy to log the traceId? I have Jaeger tracing (with the Zipkin protocol) enabled. There is one thing I want to accomplish by having the traceId logged:
- log correlation across multiple upstream services, so that I can filter all logs by a given traceId.
According to the Envoy proxy documentation for Envoy v1.12.0 (used by Istio 1.3):
Trace context propagation
Envoy provides the capability for reporting tracing information regarding communications between services in the mesh. However, to be able to correlate the pieces of tracing information generated by the various proxies within a call flow, the services must propagate certain trace context between the inbound and outbound requests.
Whichever tracing provider is being used, the service should propagate the x-request-id to enable logging across the invoked services to be correlated.
The tracing providers also require additional context, to enable the parent/child relationships between the spans (logical units of work) to be understood. This can be achieved by using the LightStep (via OpenTracing API) or Zipkin tracer directly within the service itself, to extract the trace context from the inbound request and inject it into any subsequent outbound requests. This approach would also enable the service to create additional spans, describing work being done internally within the service, that may be useful when examining the end-to-end trace.
Alternatively the trace context can be manually propagated by the service:
When using the LightStep tracer, Envoy relies on the service to propagate the x-ot-span-context HTTP header while sending HTTP requests to other services.
When using the Zipkin tracer, Envoy relies on the service to propagate the B3 HTTP headers (x-b3-traceid, x-b3-spanid, x-b3-parentspanid, x-b3-sampled, and x-b3-flags). The x-b3-sampled header can also be supplied by an external client to either enable or disable tracing for a particular request. In addition, the single b3 header propagation format is supported, which is a more compressed format.
When using the Datadog tracer, Envoy relies on the service to propagate the Datadog-specific HTTP headers (x-datadog-trace-id, x-datadog-parent-id, x-datadog-sampling-priority).
TL;DR: the B3 HTTP headers (which carry the traceId) need to be propagated manually by the service.
Additional information: https://github.com/openzipkin/b3-propagation
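A minimal sketch of that manual propagation (the header names are from the Envoy docs above; the helper itself is hypothetical):

TRACE_HEADERS = ['x-request-id', 'x-b3-traceid', 'x-b3-spanid',
                 'x-b3-parentspanid', 'x-b3-sampled', 'x-b3-flags', 'b3']

def forward_trace_headers(incoming):
    # Copy whichever tracing headers are present on the inbound request so
    # they can be set on every outbound request this service makes.
    return {h: incoming[h] for h in TRACE_HEADERS if h in incoming}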