I am using Istio in an AWS EKS cluster, with the pre-installed Prometheus and Grafana to monitor pods, the Istio mesh, and Istio service workloads.
I have three services running in three different namespaces:
Service 1:- service1.namespace1.svc.cluster.local
Service 2:- service2.namespace2.svc.cluster.local
Service 3:- service3.namespace3.svc.cluster.local
I can find the latency of each service endpoint from the Istio Service Dashboard in Grafana. But it only shows the latency for the service endpoint as a whole, not per path prefix. The overall service endpoint latency is fine, but I want to check which path is taking time within a service endpoint.
Let's say the P50 latency of service1.namespace1.svc.cluster.local is 2.91ms, but I also want to check the latency of each path. It has four paths:
service1.namespace1.svc.cluster.local/login => Login path, P50 latency = ?
service1.namespace1.svc.cluster.local/signup => Signup path, P50 latency = ?
service1.namespace1.svc.cluster.local/auth => Auth path, P50 latency = ?
service1.namespace1.svc.cluster.local/list => List path, P50 latency = ?
I am not sure if this is possible with the Prometheus and Grafana stack. What is the recommended way to achieve it?
istioctl version --remote
client version: 1.5.1
internal-popcraftio-ingressgateway version:
citadel version: 1.4.3
galley version: 1.4.3
ingressgateway version: 1.5.1
pilot version: 1.4.3
policy version: 1.4.3
sidecar-injector version: 1.4.3
telemetry version: 1.4.3
pilot version: 1.5.1
office-popcraftio-ingressgateway version:
data plane version: 1.4.3 (83 proxies), 1.5.1 (4 proxies)
To my knowledge this is not something that the Istio metrics can provide. However, you should take a look at the metrics that your server framework provides, if any; this is application (framework) dependent. See for instance Spring Boot ( https://docs.spring.io/spring-metrics/docs/current/public/prometheus ) or Vert.x ( https://vertx.io/docs/vertx-micrometer-metrics/java/#_http_server ).
One thing to be aware of with HTTP path-based metrics is that they can make the metrics cardinality explode if not used with care. Imagine some of your paths contain unbounded dynamic values (e.g. /object/123456, with 123456 being an ID): if that path is stored as a Prometheus label, then under the hood Prometheus will create one time series per ID, which is likely to cause performance issues on Prometheus and risks out-of-memory on your app.
This is, I think, a good reason to NOT have Istio provide path-based metrics. Frameworks, on the other hand, have enough knowledge to provide metrics based on the path template instead of the actual path (e.g. /object/$ID instead of /object/123456), which solves the cardinality problem.
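For illustration, if service1 were a Spring Boot app exposing Micrometer metrics (with percentile histograms enabled for http.server.requests), a per-route P50 could be queried roughly like this. The metric and label names below (http_server_requests_seconds_bucket, uri) are Micrometer conventions, and the job selector is an assumption about your Prometheus scrape config, so adjust to whatever your framework actually exposes:

histogram_quantile(0.50,
  sum by (le, uri) (
    rate(http_server_requests_seconds_bucket{job="service1"}[5m])
  )
)

Because uri holds the path template (e.g. /login, /object/{id}) rather than the raw path, the cardinality stays bounded.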
PS: Kiali has some documentation about runtime monitoring that may help: https://kiali.io/documentation/runtimes-monitoring/
We have deployed Istio 1.11.0 using the Helm chart in our dev and production environments.
We are using the configuration below in the Istio configmap, which we have updated via the istio-control Helm chart:
meshConfig:
  extensionProviders:
    - name: "ext-authz-grpc"
      envoyExtAuthzGrpc:
        service: "ext-auth-service.default.svc.cluster.local"
        port: "50051"
        includeHeadersInCheck: [ "authorization", "ws-protocol" ]
        headersToUpstreamOnAllow: [ "authorization", "x-role", "x-id" ]
  accessLogFile: /dev/stdout
  enablePrometheusMerge: true
Basically we are using a gRPC service as the external authorization server.
The above configuration is working fine.
One of our clients has deployed Istio 1.9.8 using the operator. (They have their own deployment model for Istio and do not allow us to deploy Istio using the Helm chart.)
When we try to apply the above changes using the operator, it gives us the error below:
2022-04-05T10:23:09.657830Z info installer Loading values from compiled in VFS at path profiles/minimal.yaml
2022-04-05T10:23:09.657837Z info installer Loading values from compiled in VFS at path profiles/default.yaml
2022-04-05T10:23:09.679340Z error installer failed to merge base profile with user IstioOperator CR profile-poc-customized, failed to unmarshall mesh config: unknown field "includeHeadersInCheck" in v1alpha1.MeshConfig_ExtensionProvider_EnvoyExternalAuthorizationGrpcProvider moreInfo=The values in the selected spec.profile could not be merged with the user IstioOperator resource. impact=The operator controller cannot create and act upon the user defined IstioOperator resource. The Istio control plane will not be installed or updated. action=Check that the IstioOperator resource has the correct syntax. If you are sure your configuration is correct, see https://istio.io/latest/about/bugs for possible solutions. likelyCause=The likely cause is an incorrect or badly formatted configuration.Another possible cause could be an issue with the Istio code.
If we directly edit the configmap and make the changes, they are applied.
But it gives the error above when we update it via the operator.
Can anybody help me understand why it is not working with the operator?
includeHeadersInCheck is only available for the HTTP provider, not gRPC:
https://istio.io/v1.10/docs/reference/config/istio.mesh.v1alpha1/#MeshConfig-ExtensionProvider-EnvoyExternalAuthorizationGrpcProvider
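As a rough sketch (not verified against 1.9.8), if you need to restrict which headers are sent to the authorization service, one option is to switch to the HTTP provider, where includeHeadersInCheck and headersToUpstreamOnAllow are defined per the linked reference; this assumes your authorization service also exposes an HTTP check endpoint, and the port below is hypothetical:

meshConfig:
  extensionProviders:
    - name: "ext-authz-http"
      envoyExtAuthzHttp:
        service: "ext-auth-service.default.svc.cluster.local"
        port: "8000"   # hypothetical HTTP port of the authz service
        includeHeadersInCheck: [ "authorization", "ws-protocol" ]
        headersToUpstreamOnAllow: [ "authorization", "x-role", "x-id" ]

With the gRPC provider, the unknown field is rejected by the operator's strict mesh-config validation, which would explain why the IstioOperator apply fails while a directly edited configmap appears to accept it (the configmap path presumably tolerates unknown fields).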
Specs:
The serverless Amazon MSK that's in preview.
t2.xlarge EC2 instance with Amazon Linux 2
Installed Kafka from https://dlcdn.apache.org/kafka/3.0.0/kafka_2.13-3.0.0.tgz
openjdk version "11.0.13" 2021-10-19 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.13+8-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.13+8-LTS, mixed mode, sharing)
Gradle 7.3.3
https://github.com/aws/aws-msk-iam-auth, successfully built.
I also tried adding IAM authentication information, as recommended by the Amazon MSK Library for AWS Identity and Access Management. It says to add the following in config/client.properties:
# Sets up TLS for encryption and SASL for authN.
security.protocol = SASL_SSL
# Identifies the SASL mechanism to use.
sasl.mechanism = AWS_MSK_IAM
# Binds SASL client implementation.
# sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule required;
# Encapsulates constructing a SigV4 signature based on extracted credentials.
# The SASL client bound by "sasl.jaas.config" invokes this class.
sasl.client.callback.handler.class = software.amazon.msk.auth.iam.IAMClientCallbackHandler
# Binds SASL client implementation. Uses the specified profile name to look for credentials.
sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule required awsProfileName="kafka-client";
And kafka-client is the IAM role attached to the EC2 instance as an instance profile.
Networking: I used VPC Reachability Analyzer to confirm that the security groups are configured correctly and the EC2 instance I'm using as a Producer can reach the serverless MSK cluster.
What I'm trying to do: create a topic.
How I'm trying: bin/kafka-topics.sh --create --partitions 1 --replication-factor 1 --topic quickstart-events --bootstrap-server boot-zclcyva3.c2.kafka-serverless.us-east-2.amazonaws.com:9098
Result:
Error while executing topic command : Timed out waiting for a node assignment. Call: createTopics
[2022-01-17 01:46:59,753] ERROR org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: createTopics
(kafka.admin.TopicCommand$)
I also tried the plaintext port, 9092. (9098 is the IAM authentication port in MSK, and serverless MSK uses IAM authentication by default.)
All the other posts I found on SO about this node assignment error didn't include MSK. I tried suggestions like uncommenting the listener setting in server.properties, but that didn't change anything.
Installing kcat for troubleshooting didn't work for me either: there's no out-of-the-box package for the yum package manager (which Amazon Linux 2 uses), and building from source failed for me at the configure step "checking for libcurl (by compile)... failed (fail)".
The Question: Any other tips on solving this "node assignment" error?
The documentation has been updated recently; I was able to follow it end to end without any issue (the IAM policy is now correct):
https://docs.aws.amazon.com/msk/latest/developerguide/serverless-getting-started.html
The created properties file is not used automatically; your command needs to include --command-config client.properties. The contents of this file are documented on the MSK IAM page linked above (see the full command after the extract below).
Extract...
ssl.truststore.location=<PATH_TO_TRUST_STORE_FILE>
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
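Putting it together with the command from the question (the path to the properties file is wherever you saved it, config/client.properties here):

bin/kafka-topics.sh --create \
  --topic quickstart-events \
  --partitions 1 \
  --replication-factor 1 \
  --bootstrap-server boot-zclcyva3.c2.kafka-serverless.us-east-2.amazonaws.com:9098 \
  --command-config config/client.properties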
Alternatively, if the plaintext port didn't work either, then you have other networking issues.
Beyond these steps, I suggest reaching out to MSK support and asking them to update the "Create a Topic" page to no longer use Zookeeper, keeping in mind that Kafka 3.0 is not (yet) supported.
Kubernetes and Istio are already installed in the cluster, and three microservices are deployed as pods. The flow is:
Microservice A to microservice B => HTTP
Microservice B to microservice C => via Kafka
Microservice A exposes an HTTP API to the outside
I guess that when a client hits the ingress, Istio generates a traceId and spanId in the HTTP headers before the request enters service A.
Are this spanId and traceId propagated to microservices B and C without using a separate API like Spring Cloud Sleuth?
No, Istio does not propagate tracing headers for you. However, propagation can be handled on the application side without using third-party APIs.
According to Istio documentation:
Istio leverages Envoy’s distributed tracing feature to provide tracing integration out of the box. Specifically, Istio provides options to install various tracing backend and configure proxies to send trace spans to them automatically. See Zipkin, Jaeger and LightStep task docs about how Istio works with those tracing systems.
Istio documentation also has an example of application side header propagation for the bookinfo demo application:
Trace context propagation
Although Istio proxies are able to automatically send spans, they need some hints to tie together the entire trace. Applications need to propagate the appropriate HTTP headers so that when the proxies send span information, the spans can be correlated correctly into a single trace.
To do this, an application needs to collect and propagate the following headers from the incoming request to any outgoing requests:
x-request-id
x-b3-traceid
x-b3-spanid
x-b3-parentspanid
x-b3-sampled
x-b3-flags
x-ot-span-context
Additionally, tracing integrations based on OpenCensus (e.g. Stackdriver) propagate the following headers:
x-cloud-trace-context
traceparent
grpc-trace-bin
If you look at the sample Python productpage service, for example, you see that the application extracts the required headers from an HTTP request using OpenTracing libraries:
def getForwardHeaders(request):
    headers = {}

    # x-b3-*** headers can be populated using the opentracing span
    span = get_current_span()
    carrier = {}
    tracer.inject(
        span_context=span.context,
        format=Format.HTTP_HEADERS,
        carrier=carrier)

    headers.update(carrier)

    # ...

    incoming_headers = ['x-request-id']

    # ...

    for ihdr in incoming_headers:
        val = request.headers.get(ihdr)
        if val is not None:
            headers[ihdr] = val

    return headers
The reviews application (Java) does something similar:
@GET
@Path("/reviews/{productId}")
public Response bookReviewsById(@PathParam("productId") int productId,
                                @HeaderParam("end-user") String user,
                                @HeaderParam("x-request-id") String xreq,
                                @HeaderParam("x-b3-traceid") String xtraceid,
                                @HeaderParam("x-b3-spanid") String xspanid,
                                @HeaderParam("x-b3-parentspanid") String xparentspanid,
                                @HeaderParam("x-b3-sampled") String xsampled,
                                @HeaderParam("x-b3-flags") String xflags,
                                @HeaderParam("x-ot-span-context") String xotspan) {

    if (ratings_enabled) {
        JsonObject ratingsResponse = getRatings(Integer.toString(productId), user, xreq, xtraceid, xspanid, xparentspanid, xsampled, xflags, xotspan);
When you make downstream calls in your applications, make sure to include these headers.
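As a minimal sketch of what that looks like in Python with the requests library (reviews_url is a placeholder; the real productpage builds its URLs from environment variables):

import requests

def call_reviews(request, reviews_url):
    # re-use the headers collected by getForwardHeaders() above so the
    # outbound call carries the same trace context as the inbound request
    headers = getForwardHeaders(request)
    return requests.get(reviews_url, headers=headers, timeout=3.0)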
I'm wondering how to calculate the storage usage or sizing (database, log files, and anything else I'm not aware of) of a WSO2 EI clustered deployment (load balancer + node 1 + node 2).
Which hardware environment should we set up?
Our traffic is quite high: almost 10,000 requests per day.
1. Should we use the hardware environment that WSO2 recommends?
You can follow the production deployment guidelines [1] for the hardware requirements recommended by WSO2. Basically, follow the installation prerequisites [2] and common guidelines [3] sections in that doc. Best practices for managing logs can be found at [4].
[1] - https://docs.wso2.com/display/CLUSTER44x/Production+Deployment+Guidelines
[2] - https://docs.wso2.com/display/CLUSTER44x/Production+Deployment+Guidelines#ProductionDeploymentGuidelines-Installationprerequisites
[3] - https://docs.wso2.com/display/CLUSTER44x/Production+Deployment+Guidelines#ProductionDeploymentGuidelines-Commonguidelinesandchecklist
[4] - https://docs.wso2.com/display/ADMIN44x/Monitoring+Logs#MonitoringLogs-Managingloggrowth
I have a Django REST Framework-based API hosted on Google App Engine with one instance, using gunicorn.
There is no autoscaling set up yet. Here is the app.yaml:
runtime: custom
api_version: '1.0'
env: flexible
threadsafe: true
env_variables:
  DEPLOYMENT_MODE: prod
manual_scaling:
  instances: 1
The instance has 1 CPU and 1 GB of memory and usually serves 4-6 requests per second. The application is working fine, but sometimes, under light load, I intermittently get 502s for no apparent reason.
In the traffic graph, ignore the spikes on the left and right; they are due to actual heavy load that the instance could not handle. The intermittent 502s are highlighted.
I would like to understand the origin of these 502 errors.