Making GCP Load-balancer HTTP logs integrate with Cloud Trace (in GCP Logs Explorer)? - google-cloud-platform

Cloud Trace and Cloud Logging integrate quite nicely in most cases, as described in https://cloud.google.com/trace/docs/trace-log-integration
Unfortunately, this doesn't seem to include the HTTP request logs generated by a Load Balancer when request logging is enabled.
The LB logs show the traces icon, and are correctly associated with an overall trace in the Cloud Trace system, but the context menu 'show trace details' is greyed out for those log items.
A similar problem arose with my application level logging/tracing, and was solved by setting the traceSampled attribute on the LogEntry, but this can't work for LB logs, since I'm not in control of their generation.
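For reference, this is roughly what the application-level fix looked like on my side: a minimal sketch that assumes the service writes structured JSON logs which a GCP logging agent (e.g. on GKE or Cloud Run) maps onto the LogEntry trace/spanId/traceSampled fields; PROJECT_ID and the helper name are placeholders.
    # Sketch: emit structured JSON that the logging agent maps onto
    # LogEntry.trace / spanId / traceSampled. Assumes a GCP logging agent is
    # collecting stdout; PROJECT_ID and log_with_trace are placeholders.
    import json, sys

    PROJECT_ID = "my-project"  # placeholder: your project id

    def log_with_trace(message, trace_id, span_id, sampled=True):
        entry = {
            "message": message,
            "severity": "INFO",
            "logging.googleapis.com/trace": f"projects/{PROJECT_ID}/traces/{trace_id}",
            "logging.googleapis.com/spanId": span_id,
            "logging.googleapis.com/trace_sampled": sampled,
        }
        print(json.dumps(entry), file=sys.stdout)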
In this instance I'm tracing 100% of requests since the service is M2M and fairly low volume, but in the general case it makes sense that the LB can't know if something is actually generating traces without being told.
I can't find any good references in the docs, but in theory a response header indicating it was sampled could be observed by the LB and cause it to issue the appropriate log.
Any ideas if such a feature exists, in this form or any other?
(A last-ditch workaround might be to use the Logs Router to feed LB logs into a Pub/Sub topic (excluding them from the normal logging sinks) and resubmit them to the normal sink(s) with the relevant fields set by a Cloud Function or other Pub/Sub consumer, but that seems like a lot of work and complexity for this purpose.)

There is currently a Feature Request open for this; you can follow its status at the following link 1.
As a workaround, you could implement target proxies along with your Load Balancer, according to the documentation for a Global external HTTP(S) load balancer:
The proxies set HTTP request/response headers as follows:
Via: 1.1 google (requests and responses)
X-Forwarded-Proto: [http | https] (requests only)
X-Cloud-Trace-Context: <trace-id>/<span-id>;<trace-options> (requests only) Contains parameters for Cloud Trace.
X-Forwarded-For: [<supplied-value>,]<client-ip>,<load-balancer-ip> (see the X-Forwarded-For header documentation) (requests only)
You can find the complete documentation about external HTTP(S) load balancers and target proxies here 2.
And finally, take a look at the documentation on how to use and configure target proxies here 3.
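For completeness, a minimal sketch of how a backend behind the load balancer could read the X-Cloud-Trace-Context header listed above and reuse the trace id in its own logs. The documented header format is <trace-id>/<span-id>;o=<trace-options>; the Flask route and helper names here are purely illustrative.
    # Sketch: reading the X-Cloud-Trace-Context header injected by the target
    # proxy so the backend can log with the same trace id.
    import re
    from flask import Flask, request

    app = Flask(__name__)
    TRACE_CONTEXT_RE = re.compile(r"^([0-9a-f]+)/(\d+)(?:;o=(\d))?$")

    @app.route("/")
    def handler():
        header = request.headers.get("X-Cloud-Trace-Context", "")
        match = TRACE_CONTEXT_RE.match(header)
        if match:
            trace_id, span_id, options = match.groups()
            sampled = options == "1"
            # Use trace_id/span_id/sampled when emitting your own log entries.
            return f"trace={trace_id} span={span_id} sampled={sampled}"
        return "no trace context"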

Related

OpenCensus distributed tracing propagation using both Flask & requests

I'm attempting to implement distributed tracing via OpenCensus across different services where each service is using Flask for serving and sends downstream requests using... er.. requests to build its reply. The tracing platform is GCP Cloud Trace.
I'm using FlaskMiddleware, which traces the initial call correctly, but there's no propagation of span info between the source and target service, even when the middleware has a propagation defined (I've tried a few):
#middleware = FlaskMiddleware(app, exporter=exporter, sampler=sampler, excludelist_paths=['healthz', 'metrics'], propagator=google_cloud_format.GoogleCloudFormatPropagator())
middleware = FlaskMiddleware(app, exporter=exporter, sampler=sampler, excludelist_paths=['healthz', 'metrics'], propagator=b3_format.B3FormatPropagator())
#middleware = FlaskMiddleware(app, exporter=exporter, sampler=sampler, excludelist_paths=['healthz', 'metrics'], propagator=trace_context_http_header_format.TraceContextPropagator())
I guess the question is: does anyone have an example of setting up tracing that traverses several downstream calls when each service is serving via Flask and making downstream requests via requests?
Currently I end up with each service having its own trace with a single span.
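One thing I am not sure about (and have not confirmed is the whole answer): FlaskMiddleware only instruments the inbound side, so the outbound requests calls also need to be instrumented for the context to be injected into their headers. A minimal sketch of what I would expect that to look like, assuming the opencensus-ext-requests and opencensus-ext-stackdriver packages are installed:
    # Sketch: patch the `requests` library so outgoing calls create child spans
    # and carry the propagation headers of the currently active span.
    from flask import Flask
    from opencensus.ext.flask.flask_middleware import FlaskMiddleware
    from opencensus.ext.stackdriver import trace_exporter
    from opencensus.trace import config_integration
    from opencensus.trace.propagation import b3_format
    from opencensus.trace.samplers import AlwaysOnSampler

    config_integration.trace_integrations(['requests'])  # instrument outbound calls

    app = Flask(__name__)
    middleware = FlaskMiddleware(
        app,
        exporter=trace_exporter.StackdriverExporter(),
        sampler=AlwaysOnSampler(),
        excludelist_paths=['healthz', 'metrics'],
        propagator=b3_format.B3FormatPropagator(),
    )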

Should the django health-check endpoint /ht/ be accessible to everybody?

From the documentation reported here, I read:
This project checks for various conditions and provides reports when anomalous behavior is detected. The following health checks are bundled with this project: cache, database, storage, disk and memory utilization (via psutil), AWS S3 storage, Celery task queue, Celery ping, RabbitMQ, Migrations.
and from the use case section:
The primary intended use case is to monitor conditions via HTTP(S), with responses available in HTML and JSON formats. When you get back a response that includes one or more problems, you can then decide the appropriate course of action, which could include generating notifications and/or automating the replacement of a failing node with a new one.
And then
The /ht/ endpoint will respond with an HTTP 200 if all checks passed and an HTTP 500 if any of the tests failed.
From a security point of view: should this URL (https://example.com/ht) be reachable by everybody? It seems to give away various pieces of information about the system.
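For what it's worth, if the answer turns out to be "no", one option is to keep the URL but restrict who can reach it at the application level. A minimal sketch of plain Django middleware (the allowlist values and class name are purely illustrative, not part of django-health-check):
    # Sketch: restrict /ht/ to an internal allowlist while leaving the rest of
    # the site public. The allowlist and middleware name are illustrative only.
    from django.http import HttpResponseNotFound

    HEALTH_CHECK_ALLOWED_IPS = {"127.0.0.1", "10.0.0.5"}  # e.g. your monitoring host

    class HealthCheckAccessMiddleware:
        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            if request.path.startswith("/ht/"):
                # Note: behind a proxy, REMOTE_ADDR may be the proxy's address.
                client_ip = request.META.get("REMOTE_ADDR")
                if client_ip not in HEALTH_CHECK_ALLOWED_IPS:
                    # Pretend the endpoint does not exist for outside callers.
                    return HttpResponseNotFound()
            return self.get_response(request)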

How can I filter out errors on sentry to avoid consuming my quota?

I'm using Sentry to log my errors, but there are errors I'm not able to fix (or that can't be fixed by me), like:
OSError (write error)
or errors that come from RQ (each time I deploy my app),
or client errors (which are client.errors).
I can't just ignore them, because they consume all my quota. How can I filter out these errors?
Here are some references for interested people:
uwsgi: OSError: write error during GET request
Fixing broken pipe error in uWSGI with Python
https://github.com/unbit/uwsgi/issues/1623
I created a Gist for rate-limiting the number of events that are sent to Sentry:
https://gist.github.com/jurrian/e22f8e724b8499a29c5537e956f0dc7f
It uses ratelimitingfilter, which can be configured to set a rate per minute and additionally add a burst, so that rate limiting only starts after a number of events.
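The idea behind the gist, as a minimal sketch (the rate/per/burst values are examples, the constructor arguments follow the ratelimitingfilter documentation as far as I recall, so double-check them, and wiring the filter onto whatever handler forwards to Sentry depends on your setup):
    # Sketch: throttle how often a logger can emit, so repeated errors stop
    # flooding the handler that forwards them.
    import logging
    from ratelimitingfilter import RateLimitingFilter

    logger = logging.getLogger("app")
    handler = logging.StreamHandler()  # stand-in for the Sentry-bound handler
    handler.addFilter(RateLimitingFilter(rate=1, per=60, burst=5))
    logger.addHandler(handler)

    for _ in range(100):
        logger.error("OSError (write error)")  # only a handful get through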
I get the same errors, but I never had any problems with my quota. If you really want to filter them, you can just do it in your SDK:
https://docs.sentry.io/error-reporting/configuration/filtering/?platform=python
But beware, this could hide other errors as mentioned here:
https://github.com/pypa/warehouse/issues/679
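For example, a minimal sketch using the before_send hook from the filtering documentation linked above (the DSN is a placeholder, and dropping OSError "write error" events is just an example of the kind of filter you might write):
    # Sketch: SDK-side filtering via sentry_sdk's before_send hook. Returning
    # None from the hook drops the event. Filter carefully; dropping too
    # broadly can hide real problems, as noted above.
    import sentry_sdk

    def before_send(event, hint):
        if "exc_info" in hint:
            exc_type, exc_value, _tb = hint["exc_info"]
            if isinstance(exc_value, OSError) and "write error" in str(exc_value):
                return None  # drop the event
        return event

    sentry_sdk.init(
        dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
        before_send=before_send,
    )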
To save yourself some quota, you have two options:
Avoid forwarding events client-side, thus preventing events from being sent to Sentry at all. Have a look at the docs for the available client-side filters. The drawback with this approach is, of course, that you need a new code deployment for any adjustment of client-side filters, and some clients may not instantly reflect your code changes.
Avoid forwarding events on Sentry's side, via inbound filters ([Project] > Project Settings > Inbound Filters). According to the Sentry documentation on quota usage, events filtered via inbound filters do not affect your quota.
Inbound filters include:
Common browser extension errors
Events coming from localhost
Known legacy browser errors
Known web crawlers
By their error message
From specific release versions of your code
From certain IP addresses
Business plans and above also allow filtering events by error message.

How to configure istio-proxy to log traceId?

I am using Istio version 1.3.5. Is there any configuration that allows istio-proxy to log the traceId? I have Jaeger tracing enabled (with the Zipkin protocol). There is one thing I want to accomplish by having traceId logging:
- log correlation across multiple services upstream; basically, I want to be able to filter all logs by a certain traceId.
According to the Envoy proxy documentation for Envoy v1.12.0 (used by Istio 1.3):
Trace context propagation
Envoy provides the capability for reporting tracing information regarding communications between services in the mesh. However, to be able to correlate the pieces of tracing information generated by the various proxies within a call flow, the services must propagate certain trace context between the inbound and outbound requests.
Whichever tracing provider is being used, the service should propagate the x-request-id to enable logging across the invoked services to be correlated.
The tracing providers also require additional context, to enable the parent/child relationships between the spans (logical units of work) to be understood. This can be achieved by using the LightStep (via OpenTracing API) or Zipkin tracer directly within the service itself, to extract the trace context from the inbound request and inject it into any subsequent outbound requests. This approach would also enable the service to create additional spans, describing work being done internally within the service, that may be useful when examining the end-to-end trace.
Alternatively the trace context can be manually propagated by the service:
When using the LightStep tracer, Envoy relies on the service to propagate the x-ot-span-context HTTP header while sending HTTP requests to other services.
When using the Zipkin tracer, Envoy relies on the service to propagate the B3 HTTP headers (x-b3-traceid, x-b3-spanid, x-b3-parentspanid, x-b3-sampled, and x-b3-flags). The x-b3-sampled header can also be supplied by an external client to either enable or disable tracing for a particular request. In addition, the single b3 header propagation format is supported, which is a more compressed format.
When using the Datadog tracer, Envoy relies on the service to propagate the Datadog-specific HTTP headers (x-datadog-trace-id, x-datadog-parent-id, x-datadog-sampling-priority).
TL;DR: the traceId has to be propagated manually by forwarding the B3 HTTP headers from inbound to outbound requests.
Additional information: https://github.com/openzipkin/b3-propagation
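In a Python service, that manual propagation can be as simple as copying the headers from the inbound request onto any outbound calls; a minimal sketch with Flask and requests (the route, backend URL, and log line are illustrative):
    # Sketch: forward the B3 headers so istio-proxy can stitch spans together,
    # and log the trace id for correlation across services.
    import requests
    from flask import Flask, request

    app = Flask(__name__)
    B3_HEADERS = ["x-request-id", "x-b3-traceid", "x-b3-spanid",
                  "x-b3-parentspanid", "x-b3-sampled", "x-b3-flags", "b3"]

    @app.route("/frontend")
    def frontend():
        # Copy the tracing headers from the inbound request to the outbound one.
        fwd = {h: request.headers[h] for h in B3_HEADERS if h in request.headers}
        trace_id = fwd.get("x-b3-traceid", "-")
        app.logger.info("handling request traceId=%s", trace_id)  # log correlation
        return requests.get("http://backend:8080/work", headers=fwd).text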

WSO2 API Manager 2.1 : Gateway not enforcing Throttling Limits

We have deployed API-M 2.1 in a distributed way (each component, GW, TM, KM, running in its own Docker image) on top of DC/OS 1.9 (Mesos).
We have issues getting the gateway to enforce throttling policies (be it subscription tiers or app-level policies). Here is what we have established so far:
The Traffic Manager itself does its job: it receives the event streams, analyzes them on the fly, and pushes an event onto the JMS topic throttledata.
The Gateway reads the message properly.
So basically we have ruled out a communication issue.
However we found two potential issues:
In the event that is pushed to the TM component, the value of appTenant is null (instead of carbon.super); we have a single tenant defined.
When the gateway receives the throttling message, it decides to let the request through, thinking that "stopOnQuotaReach" is set to false, when it is actually set to true (we checked the value in the database).
Digging into the source code, we traced both issues to a single source: both values above are read from the authContext and are apparently set incorrectly. We are stuck and running out of things to try, and would appreciate some pointers to potential sources of the problem and things to check.
Can somebody help, please?
Thanks, Isabelle.
Are there two TMs with HA enabled available in the system?
If the TM is HA-enabled, how do the gateways publish data to the TMs: is it load-balanced or failover data publishing?
Did you follow the articles below to configure the environment with respect to your deployment?
http://wso2.com/library/articles/2016/10/article-scalable-traffic-manager-deployment-patterns-for-wso2-api-manager-part-1/
http://wso2.com/library/articles/2016/10/article-scalable-traffic-manager-deployment-patterns-for-wso2-api-manager-part-2/
Is throttling not working at all in your environment?
Have you noticed any JMS connection-related logs on the gateway nodes?
In these tests, we have disabled HA to avoid possible complications. Neither subscription nor app-level throttling policies are working, in both cases because parameters that should have values do not hold the expected ones (appTenant, stopOnQuotaReach).
Our scenario is far more basic: with one instance of each component, it fails as Isabelle described. The only thing we know is that both parameters come from the Authentication Context.
Thank you!