OpenTelemetry Context Tracing for Apache and/or Nginx - amazon-web-services

We are implementing distributed tracing in our environment, starting with simple auto-instrumentation, using OpenTelemetry. Our environment is primarily cloud-based, running on AWS.
We have had success auto-instrumenting most of our cloud services (ECS, EKS, Lambda, etc.), and are seeing trace context being passed from one service to the next. We are also auto-instrumenting Apache and Nginx servers running on EC2, using the OTel standard, and are successfully seeing trace information being collected, but calls from Apache to another front-end or back-end service are not being tied together by trace context. Apache produces its own trace ID, the system it calls produces its own as well, and the linkage is lost.
Has anybody been able to get this to work and are there samples you can share?
Thanks so much!
We have tried using the OTel libraries, as well as the AWS distributed tracing libraries, and have played around with different exporters and collectors. The tracing capabilities work individually, but when it comes time to pass context from Apache and/or Nginx to some other service, the trace link is broken.
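One way to narrow down where the link is lost is to compare the trace ID in the request Apache or Nginx receives with the one in the request it sends upstream. The sketch below assumes W3C Trace Context propagation (the `traceparent` header) is in use, which the question does not confirm; with a different propagator such as AWS X-Ray the header differs and this does not apply. The function names are hypothetical helpers, not part of any OTel library, and the parser only handles the four-field version 00 layout.

```python
import re

# W3C Trace Context "traceparent" layout (version 00):
#   version-traceid-parentid-flags, i.e. 2, 32, 16 and 2 hex characters.
_TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


def parse_traceparent(value):
    """Return the header fields as a dict, or None if the value is malformed."""
    # Lower-casing is a leniency of this sketch; the spec expects lowercase hex.
    match = _TRACEPARENT_RE.match(value.strip().lower())
    if match is None:
        return None
    fields = match.groupdict()
    # All-zero trace or parent ids are invalid.
    if set(fields["trace_id"]) == {"0"} or set(fields["parent_id"]) == {"0"}:
        return None
    return fields


def same_trace(inbound, outbound):
    """True when both headers parse and carry the same trace id."""
    a, b = parse_traceparent(inbound), parse_traceparent(outbound)
    return a is not None and b is not None and a["trace_id"] == b["trace_id"]
```

If the header captured on the upstream service is missing, or `same_trace` returns False for the inbound and outbound values of one request, the context is being dropped or regenerated at the web server rather than in the downstream service.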

Related

ChromeOS errors in GCP Logging

I'm seeing errors in Stackdriver logging for my Compute instance. The logs show repeated issues every hour, creating a lot of noise. I have a Spring Boot API deployed in a container to a VM in Compute Engine using the latest stable version of Container OS.
I'm relatively new to GCP and don't understand what is causing this issue; searches have come up empty so far.
Failed to call method: org.chromium.SessionManagerInterface.RetrieveActiveSessions: object_path= /org/chromium/SessionManager: org.freedesktop.DBus.Error.ServiceUnknown: The name org.chromium.SessionManager was not provided by any .service files
CallMethodAndBlockWithTimeout(...): Domain=dbus, Code=org.freedesktop.DBus.Error.ServiceUnknown, Message=The name org.chromium.SessionManager was not provided by any .service file
Error calling D-Bus proxy call to interface '/org/chromium/SessionManager': The name org.chromium.SessionManager was not provided by any .service files
The same 3 lines repeat every hour. Is anyone aware of what might be causing this, or how to fix or suppress these?
I looked into this error, and these are my findings:
The error message you are receiving is a symptom of Chrome reliably exiting shortly after starting up.
The UI job (which encompasses Chrome, the session_manager and the window manager) gets shut down by upstart because it is thrashing, and when the test tries to restart the session_manager, the session_manager cannot be reached over D-Bus.
The crash collection software in Container OS was originally written for Chromebooks (laptops that run the Chrome browser), so the code typically expects Chrome and some other related software to be on the system.
However, Container OS is a server OS and does not have Chrome. When Chrome is missing, the software reports some errors. They are not real failures, just verbose error messages.
Overall, it is safe to ignore these logs and continue using your VM instances.
Hope this helps.

Kubernetes liveness probes stop responding

I'm using kube-aws to set up a production Kubernetes cluster on AWS. Now I'm running into an issue I am unable to recreate in my dev environment.
When a pod is running heavy computation, which in my case happens in bursts, the liveness and readiness probes stop responding. In my local environment, where I run docker compose, they work as expected.
The probes return a simple HTTP 204 No Content response using native Go functionality.
Has anyone seen this before?
EDIT:
I'd be happy to provide more information, but I am uncertain what I should provide as there is a tremendous amount of configuration and code involved. Primarily, I'm looking for help to troubleshoot and where to look to try to locate the actual issue.

Suddenly scheduled tasks are not running in ColdFusion 8

I am using a ColdFusion MX8 server, and one of the scheduled tasks had been running for 2 years, but since 01/12/2014 scheduled tasks have suddenly stopped running. When I open the file in a browser, it runs successfully without error.
I am not sure whether there is an update or license expiration problem. I am aware that in the middle of this year Adobe ended support for ColdFusion 8.
The most common cause of this problem is external to the server. When you say you browsed to the file and it worked in a browser, it is very important to know whether that test was performed on the server desktop. Knowing that you can browse to the file from your desktop or laptop is of little value.
The most common source of issues like this is a change in the DNS or network stack that interferes with resolution. For example, if the internal DNS serving your DMZ suddenly starts serving the "external" address, your server can no longer browse to your own domain. The same happens if the IP served by the server for the domain in question changes from 127.0.0.1 to some other IP that the server can't access correctly because of a reverse proxy, load balancer, or some other rule. Finally, sometimes Apache or IIS is altered so that an IP that was previously serviced (127.0.0.1 being the most common example) no longer responds.
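To check the resolution point above from the server's own point of view, something like the following could be run on the server itself (not on a desktop or laptop). It is only a minimal sketch using Python's standard library rather than ColdFusion, and `resolve_host` is a hypothetical helper name.

```python
import socket


def resolve_host(hostname):
    """Return the sorted, de-duplicated IPs the local resolver gives for hostname.

    An empty list means the name did not resolve from this machine.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_host` with the domain used in the scheduled task URL shows which address the server sees. If that address is not the one the web server listens on, or not one the server can reach, the scheduled task will fail even though the page loads from other machines.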
If it is something intrinsic to the scheduler service, then Frank's advice is pretty good. In particular, look for "proxy scheduler" entries in the log, as they can give you good clues. I would also log the results of a scheduled task to a file, then check the file. If it exists, your scheduled tasks ARE running; they are just not succeeding. Good luck!
I've seen the cf scheduling service crash in CF8. The rest of CF is unaffected.
Have you tried restarting the server?
Here are your concerns:
Your File (works since you tested it manually).
Your Scheduled Task (failed).
Your ColdFusion Application (Service) (any changes here?).
Your Server (what about here?).
To test your problem create a duplicate task and schedule it. Leave the other one in place (maybe set your new one to run earlier). Use the same file too. See if it completes.
If it doesn't, then you have a larger problem. Since the ColdFusion server sits on top of the JVM, something could be happening there. Things don't just stop working unless something got corrupted or you got compromised. If you hardened your server by rearranging or renaming the file structure to make it more secure, that would break your task.
So, going back: if your test schedule works, then determine what is different between the two tasks. Note that CF8 has logging capabilities you can use here.
If you are not directly in charge of maintaining this server, I would recommend asking around to see whether there was recent maintenance and, if so, what was done to the server.

Issues with ActiveMQ 3.8.3 (CPP) priorityBackup not working

I am a little new to ActiveMQ, so please bear with me.
I am trying to take advantage of the ActiveMQ priority backup feature for some of my Java and CPP applications. I have two brokers on two different servers (local and remote), and I want the following behavior for my apps.
Always connect to local broker on startup
If local broker goes down, connect to remote
While connected to remote, if local comes back up, we then reconnect to local.
I have had success testing it on the Java apps by simply adding priorityBackup to my URI options,
i.e.
failover:(tcp://local:61616,tcp://remote:61616)?randomize=false&priorityBackup=true
However, things aren't going as smoothly on the CPP side.
The following works fine in the CPP apps (with basic working failover, i.e. jumping to remote when local goes down):
failover:(tcp://local:61616,tcp://remote:61616)?randomize=false
But updating the URI options with priorityBackup seems to break failover completely (my apps never fail over to the remote broker; they just stay in some kind of broker-less limbo state when their local broker goes down):
failover:(tcp://local:61616,tcp://remote:61616)?randomize=false&priorityBackup=true
Is there anything I am missing here? Extra URI options that I should have included?
UPDATE: Transport connector info
<transportConnectors>
<transportConnector name="ClientOpenwire" uri="tcp://0.0.0.0:61616?wireFormat.maxInactivityDuration=7000"/>
<transportConnector name="Broker2BrokerOpenwire" uri="tcp://0.0.0.0:62627?wireFormat.maxInactivityDuration=5000"/>
<transportConnector name="stompConnector" uri="stomp://0.0.0.0:62623"/>
</transportConnectors>
The backup and priorityBackup parameters are handled in completely different ways in the Java and C++ implementations of the library.
The Java implementation works well, but unfortunately the C++ implementation is broken. There are no extra options that can fix this issue; serious changes to the library are required to resolve it.
I tested this using activemq-cpp-library-3.8.3 with brokers of various versions (5.10.0, 5.11.1). The issue is not fixed in the 3.8.4 release.

JMX monitoring/statistics in Akka application

Are there any built-in, JMX-exposed monitoring/statistics that can be enabled in Akka (Java), besides the Cluster MBean? I have looked at Typesafe Console, but since it requires a license to collect data from multiple nodes, I was hoping to achieve the same with plain JMX. I have checked the Akka documentation without any luck on this topic.
No - the Cluster JMX support is it.
There are a couple of projects aimed at collecting data from Akka. Both are at very early stages at this point but the code could be a starting point for you.
Eigengo Monitor - http://www.cakesolutions.net/teamblogs/2013/11/01/monitoring-akka/
Kamon - http://kamon.io/
Both are using AspectJ to get the data out of Akka.
Typesafe Console is free to use in non-Production environments, if that works for you.
Try this. I did a pull request with the necessary functionality to Kamon.
Once that version is released (the one after 0.5.1), all you will need to do to make JMX work is add the kamon-jmx module to your project and enable its autostart in the configuration.