Kubernetes Dashboard by Request ID - Distributed Tracing (for AWS EKS using Istio Service Mesh)

I have several applications deployed on AWS EKS as microservices.
They are also deployed across different AWS accounts and have dependencies on each other.
I would like some kind of dashboard that shows exactly where a request failed in a long flow across, say, 10 different microservices (m1 calls m2 and so on up to m5; say one request fails at m2 and another at m4, and I would like a dashboard that shows where this flow got interrupted for each request).
How can I get such a dashboard?
I found a tool named Zipkin which provides pretty much what I am looking for.
Are any alternatives available? Does ELK provide this dashboard? How about Kiali?
I am using Istio for the service mesh. Is there a dashboard for distributed tracing that works best with Istio?

To cover the scenario you mention, first make sure you have centralized logging. I have used ELK and found it good at covering logs from multiple services, and it comes with a good dashboard view for debugging the logs.
You can use different source types for the logs across the microservices to tell them apart while debugging. Use something like a request ID that flows through all 10 services the request hits along its path. This makes identification easier; there are other ways to handle it, but this way someone new to the flow can debug faster.
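As a rough sketch of that idea, assuming Python/Flask services and a hypothetical downstream host m2.example.internal (with Istio, forwarding the x-request-id header that the sidecars already inject achieves the same correlation):

```python
import logging
import uuid

import requests
from flask import Flask, g, request

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("m1")
app = Flask(__name__)

@app.before_request
def assign_request_id():
    # Reuse the caller's request ID if present, otherwise start a new one.
    g.request_id = request.headers.get("X-Request-ID", str(uuid.uuid4()))

@app.route("/orders")
def orders():
    # Every log line carries the request ID so it can be searched in Kibana.
    log.info("handling /orders request_id=%s", g.request_id)
    # Forward the same ID so the next service (and its logs) share it.
    resp = requests.get(
        "http://m2.example.internal/inventory",  # hypothetical downstream service
        headers={"X-Request-ID": g.request_id},
        timeout=5,
    )
    return resp.text, resp.status_code
```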
You can use Filebeat to push the logs, with their different log levels, to ELK from the log files generated at every microservice.
The Kibana dashboard is good for monitoring and comes with multiple search options, as basic as filtering on HTTP status code 500, which directly surfaces all internal server errors.
To improve monitoring further, use alerts and graphs to get triggers.

Related

Metric Registrar in Cloud Foundry

Does Metric Registrar work in Cloud Foundry without Pivotal?
I have open source Cloud Foundry and I need to get custom metrics from an app. I installed the Metric Registrar community plugin for CF, registered my application with an endpoint, and also defined the log format. Unfortunately, I see no traffic on the registered endpoint.
If open source Cloud Foundry does not support the Metric Registrar, is there any other way to get support for custom app metrics?
Does Metric Registrar work in Cloud Foundry without Pivotal?
The Metric Registrar is part of the VMware Tanzu Application Service product, it's not part of the Open Source Cloud Foundry project. It's a value-add feature for those using the paid product.
If open source Cloud Foundry does not support the Metric Registrar, is there any other way to get support for custom app metrics?
You don't strictly need the Metric Registrar to do this. The Metric Registrar's main purpose is to take metrics from your apps and inject them into the Loggregator log/metric stream. This is convenient if you have other software that is already consuming log & metric streams from Loggregator.
You don't have to do that though, as there are other ways to export metrics from your app.
If you want them to go through Loggregator, you could export structured log messages (perhaps JSON?) via STDOUT that contain your metrics. Those will, like your other log messages, go out through Loggregator. You would then just need to have something ingesting your logs, identifying the structured messages, and parsing out your metrics. This is similar to what the Metric Registrar does; you're just parsing out the structured log entries after they leave the platform.
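A minimal sketch of that approach, assuming a Python app; the JSON field names here are illustrative, not a format the platform expects:

```python
import json
import sys
import time

def emit_metric(name, value, unit="count", **tags):
    # Write a single-line JSON record to STDOUT. Loggregator forwards app
    # STDOUT/STDERR as log messages, and a downstream consumer can parse
    # any line shaped like this back into a metric.
    record = {
        "type": "metric",
        "name": name,
        "value": value,
        "unit": unit,
        "timestamp": time.time(),
        "tags": tags,
    }
    sys.stdout.write(json.dumps(record) + "\n")
    sys.stdout.flush()

emit_metric("orders.processed", 1, endpoint="/orders", status=200)
emit_metric("orders.latency_ms", 42.7, unit="ms", endpoint="/orders")
```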
If you have an ELK stack or similar running, you can probably make this solution work easily enough. ELK can ingest your logs & structured log metrics, then you can search/filter through the metrics and create dashboards.
Another option you could do is to run Prometheus/Grafana. You then just need to make sure your app has a Prometheus Exposition metrics endpoint (this is super easy with Java/Spring Boot & Spring Boot Actuator, but can be done in any language). Point Prometheus at your app and it will then be able to scrape metrics from your apps & you can use Grafana to view them. None of this goes through Loggregator.
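For instance, with the Python prometheus_client library (the metric names and port are placeholders), exposing an endpoint Prometheus can scrape looks roughly like this:

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Placeholder metric names; pick names that match your own conventions.
REQUESTS = Counter("app_requests_total", "Total requests", ["endpoint", "status"])
LATENCY = Histogram("app_request_latency_seconds", "Request latency", ["endpoint"])

def handle_request():
    start = time.time()
    time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work
    LATENCY.labels(endpoint="/orders").observe(time.time() - start)
    REQUESTS.labels(endpoint="/orders", status="200").inc()

if __name__ == "__main__":
    start_http_server(8000)  # serves /metrics for Prometheus to scrape
    while True:
        handle_request()
```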
If you're looking for a solution that's more automatic, you could run an APM agent (NewRelic, DataDog, AppDynamics, Dynatrace, etc..) with your apps. These will capture metrics directly from the process and export them to a SaaS platform where you can monitor/review them.
There are probably other options as well. This is just what comes to mind as I write this up.

NGINX - AWS - LoadBalancer

I have to make a web application with a maximum of 10,000 concurrent users for 1 hour. The web server is NGINX.
The application is a simple landing page with an HTML5 player streaming video from the Wowza CDN.
Can you suggest a correct deployment on AWS?
A load balancer on 2 or more EC2 instances?
If so, which EC2 sizing do you recommend? Is it better to use Auto Scaling?
Thanks.
Thanks for your answer. The application is 2 PHP pages, and the impact is minimal because in the PHP code I only write 2 functions that check the user/password and token.
The video is provided by the Wowza CDN because it is live streaming, not on-demand.
What tool or service do you suggest for stress testing the web server?
I have to make a web application with a maximum of 10,000 concurrent users for 1 hour.
Averaged out, that is roughly 3 requests per second (10,000 over 3,600 seconds), which is not so bad. Sizing is a complex topic, and without more details, constraints, testing, etc., you cannot get a reasonable answer. There are many options, and without more information it is not possible to say which one is best. You stated NGINX, but not what it is doing (static sites, PHP, CGI, proxying to something else, etc.).
The application is a simple landing page with an HTML5 player streaming video from the Wowza CDN.
I will just lay down a few common options:
Let's assume it is a single static (another assumption) web page referencing an external resource (the video). Then the simplest and most scalable solution would be an S3 bucket hosted behind CloudFront (CDN); see the sketch after these options.
If you need some simple, quick logic, maybe a Lambda behind a load balancer could be good enough.
And you can of course host your solution on full compute (EC2, Beanstalk, ECS, Fargate, etc.) with different scaling options. But you will have to test what your feasible scaling parameters and bottlenecks are (I/O, network, CPU, etc.). Please note that different instance types may have different network and storage throughput. AWS gives you the opportunity to test and find out what is good enough.
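As a rough sketch of the first option using boto3 (the bucket name and region are placeholders; the bucket policy and the CloudFront distribution that goes in front are omitted):

```python
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")
bucket = "my-landing-page-example"  # placeholder; bucket names must be globally unique

# Create the bucket and enable static website hosting.
s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)

# Upload the landing page; CloudFront in front absorbs the concurrent traffic.
s3.upload_file(
    "index.html", bucket, "index.html",
    ExtraArgs={"ContentType": "text/html"},
)
```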

AWS - Log aggregation and visualization

We have a couple of applications running on AWS. Currently we are redirecting all our logs to a single bucket. However, for ease of access for users, I am thinking of installing the ELK stack on an EC2 instance.
I would like to check whether there is an alternative where I don't have to maintain this stack.
Scaling won't be an issue, as this is only for logs generated by applications running on AWS, so no ingestion or processing is required; they are mostly log4j logs.
You can go with either the managed Elasticsearch service available in AWS or set up your own on an EC2 instance.
It usually comes down to the price involved and the amount of time you have on hand for setting up and maintaining your own stack.
With your own setup, you can do a lot more configuration than the managed service provides, and it can also help reduce cost.
You can find more info on this blog
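If you go with the managed service, a rough sketch of creating a small domain with boto3 (the domain name, version, and sizing below are placeholders):

```python
import boto3

es = boto3.client("es", region_name="us-east-1")

# Create a small managed Elasticsearch domain for application logs.
es.create_elasticsearch_domain(
    DomainName="app-logs",  # placeholder name
    ElasticsearchVersion="7.10",
    ElasticsearchClusterConfig={
        "InstanceType": "t3.small.elasticsearch",
        "InstanceCount": 1,
    },
    EBSOptions={"EBSEnabled": True, "VolumeType": "gp2", "VolumeSize": 10},
)
```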

Creating Stackdriver monitoring charts to measure application API response latencies and request counts

I created an application which runs on GKE (Kubernetes). Now I want to monitor my application APIs using Stackdriver Monitoring. There are certain built-in/default metrics in GCP which are exposed on the Stackdriver Monitoring console, but they are pretty confusing. I would like to monitor the 99th and 95th percentile API latencies and the request count for each application API received by the system.
Can someone provide help on how to achieve this?
Is it possible using the metrics that are already there in Stackdriver (emitted by GKE, Istio, GCE, etc.), or do I need to write custom metrics into the code?
Any help is deeply appreciated :)
Thanks
Expected result: monitoring dashboards/charts for -
1. The 50th, 90th, 95th, and 99th percentiles of the application API latencies.
2. Percentages/counts of API requests which end with 2xx, 4xx, 5xx status codes.
Stackdriver does not have application-level metrics natively. The built-in metrics in Stackdriver are limited to GCP, AWS, and some established 3rd parties[1].
In order to monitor latency on your APIs, you would need to either create custom metrics[2] and then build your Stackdriver dashboards against those, or use Cloud Endpoints[3]. I believe Cloud Endpoints generates the kind of dashboard you are looking for natively, so it might be a better fit for this scenario.
If Endpoints don't meet your requirements, Stackdriver custom metrics give you more control, but both the metrics and the dashboards would need to be defined by you.
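As a rough sketch of the custom-metrics route [2], using the Python Cloud Monitoring client library (assuming google-cloud-monitoring v2+; the project ID, metric type, labels, and value are placeholders):

```python
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-gcp-project"  # placeholder project ID

series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/api/latency_ms"  # placeholder metric type
series.metric.labels["endpoint"] = "/orders"
series.resource.type = "global"

now = time.time()
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 1e9)}}
)
point = monitoring_v3.Point(
    {"interval": interval, "value": {"double_value": 123.4}}  # placeholder latency
)
series.points = [point]

# Write one data point; dashboards/charts are then built against this metric type.
client.create_time_series(name=project_name, time_series=[series])
```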
[1] https://cloud.google.com/monitoring/api/metrics
[2] https://cloud.google.com/monitoring/custom-metrics/creating-metrics
[3] https://cloud.google.com/endpoints/

Using CloudWatch API to get statistics

I have deployed a LAMP stack application on AWS. I need to monitor that using CloudWatch.
Can someone guide me on how to use the CloudWatch API for GetMetrics for CPU utilization? The AWS documentation is very scarce.
I see that the PutMetricData call will let me create my own metrics.
My requirement is that I need to display those metric results in a mobile app.
My app monitors a project deployed on AWS. The alerts and metrics that come in must stream into the app.
I don't want the metrics data just in the AWS console; I want it viewable in my mobile app. The app is developed on the MEAN stack.
I must also add that the app is deployed on AWS and the application being monitored is also there (it is a LAMP stack). I have managed to set up 2 endpoints (HTTP and DB) and I have written simple scripts in JavaScript to monitor them. But ideally this should happen via CloudWatch.
Providing a piece of code that replicates the issue you are seeing normally allows whoever sees the question to help you better than guessing at what you're doing.
Are you using an SDK to do this? What language/version?
Here are links to the API docs:
http://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_GetMetricStatistics.html
http://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_ListMetrics.html
The pattern is to list the metrics and then feed the results into GetMetricStatistics.
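For example, with boto3 (the region is a placeholder), listing the EC2 CPUUtilization metrics and feeding each one into GetMetricStatistics:

```python
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# 1. List the EC2 CPUUtilization metrics that exist in this account/region.
metrics = cloudwatch.list_metrics(Namespace="AWS/EC2", MetricName="CPUUtilization")

# 2. Feed each metric's dimensions into GetMetricStatistics.
for metric in metrics["Metrics"]:
    stats = cloudwatch.get_metric_statistics(
        Namespace=metric["Namespace"],
        MetricName=metric["MetricName"],
        Dimensions=metric["Dimensions"],
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=["Average", "Maximum"],
    )
    print(metric["Dimensions"], stats["Datapoints"])
```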
In your specific case, googling the issue a bit might answer the question before you ask it on SO. For example:
https://forums.aws.amazon.com/thread.jspa?messageID=295740
This can happen when you are hitting the wrong endpoint. Check if you are hitting endpoint of the right AWS service.
For example, trying to hit DynamoDB's endpoint when you want to access CloudWatch APIs.