How is the application affected by the Sampling rate in AWS XRay? - amazon-web-services

In the AWS Xray documentation it is mentioned that the sdk applies sampling to the requests.
Now I want to implement distributed logging such that any request that comes in to the system can be tracked by using the X-Amzn-Trace-Id or some correlation ID, so that I can later execute a query to fetch all the logs for a given request (across all the microservices).
What is the best possible way to achieve it?
Also, if there is a SNS Topic where I am publishing my events and then a Queue is listening to the Topic, then how can I include that relation in the Xray Map?

This is a common requirement in monitoring system, correlate traces, metrics and logs by keywords. In traces <-> logs case, it is by attaching trace id into logs. Please search topics like OpenTelemetry Logging Instrumentation
Regarding sampling, please check this table. Basically, not sampled still generates trace context for correlating logs.

Related

How do I write a stack driver log query that samples 5% of all logs returned?

I keep reading that I can write a log query to sample a percentage of logs but I have found zero examples.
https://cloud.google.com/blog/products/gcp/preventing-log-waste-with-stackdriver-logging\
You can also choose to sample certain messages so that only a percentage of the messages appear in Stackdriver Logs Viewer
How do I get 10% of all GCE load balancer logs with a log query? I know I can configure this on the backend, but I don't want that. I want to get 100% of logs in stackdriver and create a pub/sub log sink with a log query that only captures 10% of them and sends those sampled logs somewhere else.
I suspect you'll want to create a Pub/Sub sink for Log Router. See Configure and manage sinks
Using Google's Log querying, you can use sample to filter (inclusion) logs.

Went over cloudwatch event size for structured application logs, now what?

We have a service that outputs application logs to cloudwatch. We structure the logs into json format, and output them through stdout, which is forwarded by fluentbit to cloudwatch. We then have a stream set up to forward the logs from cloudwatch to s3, followed by glue crawlers, Athena, and quick sight for dashboards.
We go all this working, and I just saw today that there is a 256kb limit in cloudwatch which we went over for some of our application logs. How else can we get our logs out of our service to s3 (or maybe a different data store?) for analysis? Is cloudwatch not the right approach for this? Other option I thought of us to break up the application logs into multiple events, but then we need to plumb through a joinable ID, as well as write etl logic that does more complex joins. Was hoping to avoid it unless it’s considered a better practice than what we are doing.
Thanks!

How do I send a notification to Slack from AWS CloudWatch on a specific error?

I'm trying to setup notifications to be sent from our AWS Lambda instance to a Slack channel. I'm following along in this guide:
https://medium.com/analytics-vidhya/generate-slack-notifications-for-aws-cloudwatch-alarms-e46b68540133
I get stuck on step 4 however because the type of alarm I want to setup does not involve thresholds or anomalies. It involves a specific error in our code. We want to be notified when users encounter errors when attempting to login in or sign up. We have try/catch blocks in our Node.js backend to log errors to CloudWatch at various points in the login/signup flow where we think the errors are most likely happening. We would like to identify when those SPECIFIC errors are occurring and send a notification to a Slack channel built for this purpose.
So in step 4 of the article, what would I have to do to set this up? Or is the approach in this article simply the wrong one for my purposes?
Thanks.
The step 4 titled "Create a CloudWatch Alarm" uses CPUUtlization metric to trigger an alarm.
In your case, since you want to use CloudWatch Logs, you would create CloudWatch Metric Filters based on the logs entries of interest. This would produce custom metrics based on your error string. Subsequently, you would create CloudWatch Alarm of this metric as shown in the linked tutorial for CPUUtlization.

What is `Active tracing` mean in lambda with Xray?

I deployed a lambda with xray is enabled. And I am able to see all trace in XRay console from my lambda. But I can see a warning message in below screenshot. It shows Active tracing requires permissions that are not configured to lambda. But I don't understand what Active tracing mean. I have read article like this https://docs.aws.amazon.com/xray/latest/devguide/xray-services-lambda.html but it doesn't explain very well.
So what does Active tracing mean and does it cost too much?
I also had this warning under "Active tracing." If you click into Edit it gives a bit more explanation, saying it needs permission to send trace data.
You can find the documentation here, but the short version is that you'll want to add the AWSXRayDaemonWriteAccess policy to your lambda function's execution role.
The different levels of x-ray integration with AWS services is explained here:
Active instrumentation – Samples and instruments incoming requests.
Passive instrumentation – Instruments requests that have been sampled by another service.
Request tracing – Adds a tracing header to all incoming requests and propagates it downstream.
Tooling – Runs the X-Ray daemon to receive segments from the X-Ray SDK.
AWS Lambda supports both active and passive instrumentation. So basically you use passive instrumentation if your function handles requests that have been sampled by some other service (e.g. API gateway). In contrast, if your function gets "raw" un-sampled requests, you should use active instrumentation, so that the sampling takes place.

Monitoring Alert for Cloud Build failure on master

I would like to receive a notification on my Notification Channel every time in Cloud Build a Build on master fails.
Now there were mentions of using Log Viewer but it seems like there is no immediate way of accessing the branch.
Is there another way where I can create a Monitoring Alert/a Metric which is specific to master?
A easy solution might be to define a logging metric and link an alerting trigger to this.
Configure Slack alerting in Notification channels of GCP.
Define your logging metric trigger in Logs-based Metrics. Make a Counter with Units 1 and filter using the logging query language:
resource.type="build"
severity=ERROR
or
resource.type="build"
textPayload=~"^ERROR:"
Create an Alerting Policy with that metric you've just defined and link the trigger to your Slack notification channel you've configured in step 1.
you can create Cloud Build notifications sending you updates to desired channels, such as Slack or your SMTP server HTTP channel. Also create a PubSub topic when your build's state changes, such as when your build is created, when your build transitions to a working state.
I just went through the pain of trying to get the official GCP slack integration via Cloud Run working. It was too cumbersome and didn't let me customize what I wanted.
Best solution I see is to get Cloud Build setup to send Pub/Sub messages to the cloud-builds topic. With that, you can use the below repo I just made public to filter on the specific branch you want but looking at the data_json['substitutions']['BRANCH_NAME'] field.
https://github.com/Ucnt/gcp-cloud-build-slack-notifier