I am new to Istio and I am trying to set up tracing, but I can't make it work.
When I logged the trace headers in my service, I found that x-b3-sampled is always zero:
x-request-id:e1a8e4f3-5256-9e43-b571-a9c8a4c93db7, x-b3-traceid:b2328351dbb2f95fd0ec782c5103f1c1, x-b3-spanid:a6ef2135d03fac8a, x-b3-parentspanid:d0ec782c5103f1c1, x-b3-sampled:0, x-b3-flags:null, x-ot-span-context:null
This is what the service log printed.
I've set the sampling to 100
tracing:
  sampling: 100
  zipkin:
    address: zipkin.istio-system:9411
Does anyone know the reason, or how I can work it out?
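Separately from the sampling rate, spans only join up into one trace if the application copies the B3 headers onto its outbound calls; the sidecar proxies do not do this for you. Below is a minimal sketch in Python using the requests library; the downstream URL is only a placeholder.
# Minimal sketch: copy the incoming trace headers onto outbound requests so the
# spans are stitched into one trace. The downstream URL is a placeholder.
import requests

TRACE_HEADERS = (
    "x-request-id", "x-b3-traceid", "x-b3-spanid", "x-b3-parentspanid",
    "x-b3-sampled", "x-b3-flags", "x-ot-span-context",
)

def trace_headers(incoming_headers):
    """Pick out the tracing headers from the request this service is handling."""
    return {h: incoming_headers[h] for h in TRACE_HEADERS if h in incoming_headers}

def call_downstream(incoming_headers, payload):
    # Forward the tracing headers alongside the actual payload.
    return requests.post("http://downstream-service/api",
                         json=payload,
                         headers=trace_headers(incoming_headers))
This does not change the sampling decision itself (that comes from the sampling configuration above); it only ensures that, once a request is sampled, all of its spans end up in the same trace.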
I got the error below:
(service AWS-Service) (instance i-1234567890abcdefg) (port 55551) is
unhealthy in (target-group
arn:aws:elasticloadbalancing:us-east-1:111111111111:targetgroup/aws-targetgroup/123456789)
due to (reason Health checks failed)
My configurations are:
Unhealthy threshold: 2
Timeout: 5 seconds
Interval: 30 seconds
Success codes: 200
Resolution: I increased the unhealthy threshold to 5, after which the health checks passed and the application reached a steady state. I then reduced the unhealthy threshold back to 2 and the application remained in a steady state.
What could be the reason for this weird behavior? How can I diagnose the issue further? It could also be that my configuration is not optimal; if so, how can I find the optimal values?
Note 1: Earlier, the application was working with the same configuration; this time it was a new deployment. The health check responded from inside the container, so there was no application-related issue.
Note 2: An ALB is being used.
Note 3: I have several more applications running with the same target group configuration, the same CPU and memory configuration, and the same kind of application. They did not face this issue and were working fine; only one of my applications was affected.
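One way to make sense of this is to work out roughly how long a target gets before being marked unhealthy with the settings above. A back-of-the-envelope sketch in Python; nothing here is measured data, just the configured values:
# Rough arithmetic for the health check settings above.
interval = 30            # seconds between health checks
unhealthy_threshold = 2  # consecutive failures before the target is marked unhealthy

grace = interval * unhealthy_threshold
print(grace)             # ~60 s of failed checks with threshold 2, ~150 s with threshold 5

# If the new deployment needed longer than ~60 s to start answering the health
# check path, threshold 2 would mark it unhealthy while threshold 5 would ride
# it out. Once the target is healthy, the threshold only matters again when
# checks start failing, which is consistent with lowering it back to 2
# afterwards changing nothing.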
Cloud Tasks is saying:
App Engine is enforcing a processing rate lower than the maximum rate for this queue either because your application is returning HTTP 503 codes or because currently there is no instance available to execute a request.
However, I am forwarding the tasks to a Cloud Function using an HTTP POST request, similar to the one outlined in this tutorial, and I don't see any 503s in the logs of the Cloud Function it forwards to.
My queue.yaml is:
queue:
- name: task-queue-1
  rate: 3/s
  bucket_size: 500
  max_concurrent_requests: 100
  retry_parameters:
    task_retry_limit: 1
    min_backoff_seconds: 120
    task_age_limit: 7d
The problem seems to be triggered by any exception, even though only 503 is mentioned: if the Cloud Function responds with any error, the task queue slows down the rate, and you have no control over that.
My solution was to swallow any errors to prevent them from propagating up to Google's automatic check.
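A minimal sketch of what that looks like for a Python HTTP Cloud Function; the handler name and process_task are placeholders for your real logic:
# Sketch of an HTTP Cloud Function that swallows errors so Cloud Tasks never
# sees a failure response and does not throttle the queue.
import logging

def handle_task(request):
    """Entry point for the task; always returns 200 to the queue."""
    try:
        payload = request.get_json(silent=True) or {}
        process_task(payload)          # hypothetical: your real task logic
    except Exception:                  # swallow everything on purpose
        logging.exception("Task failed; returning 200 so the queue keeps its rate")
    return ("", 200)

def process_task(payload):
    # Placeholder for the actual work the task performs.
    pass
The trade-off is that Cloud Tasks now treats every task as successful, so you lose its retries and have to track failures yourself (here, only via logging).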
I've seen this asked here, here, and here, but without any good answers, and I was hoping to get some closure on the issue.
I have an ELB connected to 6 instances, all running Tomcat7. Up until Friday there were seemingly no issues at all. However, starting about five days ago, we began getting around two 504 GATEWAY_TIMEOUT responses from the ELB per day. That's typically 2/2000 ≈ 0.1%. I turned on logging and see:
2018-06-27T12:56:08.110331Z momt-default-elb-prod 10.196.162.218:60132 - -1 -1 -1 504 0 140 0 "POST https://prod-elb.us-east-1.backend.net:443/mobile/user/v1.0/ HTTP/1.1" "BackendClass" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
But my Tomcat7 logs don't have any 504s present at all, implying that the ELB is rejecting these requests without ever communicating with Tomcat.
I've seen people mention setting Tomcat's timeout to be greater than the ELB's timeout, but if that were what was happening (i.e. Tomcat times out and then the ELB gives up), shouldn't I see a 504 in the Tomcat logs?
Similarly, nothing has changed in the code in a few months, so this all started seemingly out of nowhere and is too infrequent to be a bigger issue. I checked whether there was some pattern in the timeouts (e.g. Tomcat restarting, always the same instance, etc.) but couldn't find anything.
I know other people have run into this issue, but any and all help would be greatly appreciated.
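For anyone digging for a pattern, here is a rough sketch that tallies the 504s in the access logs by backend and by hour. It assumes the classic ELB space-separated log format of the line quoted above; the file name is a placeholder.
# Count 504s per backend instance and per hour from a classic ELB access log.
# Field positions follow the log line quoted above; the file name is made up.
import collections

by_backend = collections.Counter()
by_hour = collections.Counter()

with open("elb-access.log") as log:
    for line in log:
        fields = line.split(" ")
        if len(fields) < 8:
            continue
        timestamp, backend, elb_status = fields[0], fields[3], fields[7]
        if elb_status == "504":
            by_backend[backend] += 1      # "-" means the ELB never got a backend response
            by_hour[timestamp[:13]] += 1  # bucket by YYYY-MM-DDTHH

print(by_backend.most_common())
print(by_hour.most_common())
In the line quoted above, the backend column is "-" and the three processing times are -1, which are the entries worth correlating with deploys or instance restarts.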
Apparently I cannot figure out how to set up a custom HTTP endpoint for health checks. Maybe I missed something, or GCP doesn't offer it yet.
The Elasticsearch health check page describes various ways to check the ES cluster.
I was looking at the GCP health checks interface, and it doesn't let us add a URL endpoint, nor does it let us define a parser for the health check to match against the "green" cluster status.
What I was able to do is wire in port 9200 and use a config like:
port: 9200, timeout: 5s, check interval: 60s, unhealthy threshold: 2 attempts
But this is not the way to go for an ES cluster, as the cluster may respond while being in a yellow/red state.
There would be an easier way, without parsing the output, by just adding a timeout check like:
GET /_cluster/health?wait_for_status=yellow&timeout=50s
Note: this will wait up to 50 seconds for the cluster to reach yellow status (if it reaches green or yellow status before 50 seconds elapse, it returns at that point).
Any suggestions?
GCP health checks are simple and use the HTTP status code to determine if the check passes (200) - https://cloud.google.com/compute/docs/load-balancing/health-checks
What you can do is implement a simple HTTP service that queries ES's health check endpoint, parses the output, and decides whether to return status code 200 or something else.
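A minimal sketch of that idea, assuming Elasticsearch answers on localhost:9200 and that the GCP health check is pointed at port 8080 of the same instance (both are assumptions, adjust to your setup):
# Tiny health proxy: returns 200 only when the cluster reports green or yellow,
# otherwise 503. ES address and listen port are assumptions.
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

ES_HEALTH_URL = "http://localhost:9200/_cluster/health"

class HealthProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            with urllib.request.urlopen(ES_HEALTH_URL, timeout=4) as resp:
                status = json.load(resp).get("status")
            ok = status in ("green", "yellow")
        except Exception:
            ok = False
        self.send_response(200 if ok else 503)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), HealthProxy).serve_forever()
Point the GCP health check at / on port 8080 and it will only pass while the cluster reports green or yellow.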
C:>vmc stats smile
DL is deprecated, please use Fiddle
Getting stats for smile... OK
instance   cpu    memory         disk
0          0.5%   198.2K of 1G   54.6M of 2G
The 404 is coming from Tomcat, which means connections are being forwarded correctly. Looking at the location the 404 is coming from:
/WEB-INF/views/WEB-INF/views/home.jsp.jsp
the view prefix (/WEB-INF/views/) and the .jsp suffix appear to be applied twice, so one can only assume there is something suspect with the way requests are being routed within the application itself.