Lag spikes on Google Container Engine running a Flask RESTful API

I'm running a Flask-RESTPlus API on Google Container Engine behind a TCP load balancer. The API makes calls to Google Cloud Datastore and Cloud SQL, but those calls don't seem to be the problem.
A few times a day, sometimes more often, there is a burst of latency spikes. Restarting the pod fixes it, or it resolves itself within 5 to 10 minutes. Either way, that's too long and needs to be resolved.
Does anyone know what the problem could be, or have experience with this kind of issue?
Thanks!

One thing you could try is monitoring your instances' CPU load.
Although the latency doesn't correspond with usage spikes, there may be a cumulative effect on CPU load, with the latency appearing once the CPU reaches a given percentage and has to back off temporarily. If this is the case, you could make use of cluster autoscaling, or try running a higher-spec machine to see if that makes any difference. Or, if you have CPU limits set on your pods/containers, try raising those limits.
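To make that correlation easier to see, here is a minimal sketch of per-request logging for a Flask app, assuming the psutil package is available; the 500 ms threshold and logger setup are illustrative, not part of the original setup:

```python
# Minimal sketch: log per-request latency alongside process CPU usage,
# so spikes can be correlated after the fact. Assumes Flask and psutil
# are installed; the threshold below is illustrative.
import logging
import time

import psutil
from flask import Flask, g, request

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)
process = psutil.Process()

@app.before_request
def start_timer():
    g.start = time.monotonic()

@app.after_request
def log_latency(response):
    elapsed_ms = (time.monotonic() - g.start) * 1000
    cpu_percent = process.cpu_percent(interval=None)  # percent since last call
    if elapsed_ms > 500:  # illustrative slow-request threshold
        logging.warning("slow request %s took %.0f ms (cpu=%.0f%%)",
                        request.path, elapsed_ms, cpu_percent)
    return response
```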
If you're confident CPU isn't the cause, you could SSH into the affected instance while the issue is occurring, send a request through the load balancer, and use tcpdump to analyse the traffic coming in and out. You may be able to spot whether the latency stems from the load balancer (by monitoring the latency of HTTP traffic to the instance) or from the instance's calls to Cloud Datastore or Cloud SQL.
Alternatively, try using strace to monitor the relevant processes before and during a latency spike, or dtrace to monitor the system as a whole.
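If you'd rather narrow it down from inside the app first, a small timing helper around each downstream call can separate Datastore/Cloud SQL time from everything else. This is only a sketch; fetch_entity is a hypothetical stand-in for your actual query code:

```python
# Sketch: log how long each downstream call takes, to tell load-balancer
# latency apart from Datastore/Cloud SQL latency.
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)

@contextmanager
def timed(label):
    # Log how long the wrapped block takes, whatever the outcome.
    start = time.monotonic()
    try:
        yield
    finally:
        logging.info("%s took %.1f ms", label, (time.monotonic() - start) * 1000)

def fetch_entity(entity_id):
    # `fetch_entity` is a hypothetical stand-in for your real query code.
    with timed("datastore.get"):
        ...  # your google-cloud-datastore or Cloud SQL call goes here
```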

Related

Why is my website, deployed on a Google Compute Engine VM instance, taking so long to load?

My website is taking too long to load. I have hosted it on a VM instance created in Google Compute Engine.
The site is built on the MERN stack and runs with Docker Compose. I'm using Docker scaling for the backend application, but there is no load balancing for the client side or the VM.
I would really appreciate it if someone could help me out with this; I've been searching for days but still couldn't figure out what the issue is.
This is the site link:
https://www.mindschoolbd.com
VM type: e2-standard-2,
zone: northamerica-northeast2-a
There are a few things you can do to make it load faster:
1) Optimize your application code (including database queries, if you have any). For this part you may need to involve your developer, as it is outside the scope of this answer.
Also check for a few situations that can cause loading issues on the client side:
i) The app server may be down, and that is causing the loading issue.
ii) The user's Wi-Fi or mobile data connection may not be working properly.
iii) Too many users may be using the app at the same time; trying again after a few minutes can confirm this.
Please go through the MERN stack practical issues guide for more info.
2) Upgrade your machine type, which determines the CPU and RAM of your VM instance; server-side caching can also help here. Check your memory and CPU utilization and, if required, change to something larger than e2-standard-2.
3) Deploy your VM in a region close to your users. You can do this by creating a snapshot of the VM and creating a new VM from that snapshot in the new region.
4) Put Cloud CDN in front of the site for caching. Cloud CDN lowers network latency, offloads origins, and reduces serving costs. Also consider optimizing application latency with load balancing, and installing the Ops Agent on your VM instance for better monitoring visibility.
Finally, go through general website speed optimization tips and techniques to improve performance and user experience for more information.
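To verify whether each of these changes actually helps, it's worth measuring the site before and after. A quick sketch using the Python requests package (the URL is the one from the question; response.elapsed approximates time-to-first-byte):

```python
# Quick check: measure time-to-first-byte for the site so each change
# (bigger VM, closer region, CDN) can be compared. Uses the `requests`
# package; the URL is the one given in the question.
import requests

resp = requests.get("https://www.mindschoolbd.com", timeout=30)
# `elapsed` measures from sending the request until the response headers
# finished parsing, which approximates time-to-first-byte.
print(f"status={resp.status_code} "
      f"ttfb~{resp.elapsed.total_seconds():.2f}s "
      f"body={len(resp.content)} bytes")
```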

Latency in Cloud Foundry

Questions:
How do you define latency in Cloud Foundry?
Is Cloud Foundry a distributed cloud?
Will high load (transferring oversized files via REST calls) on one app in Cloud Foundry impact other apps' performance? If yes, how?
How is latency calculated for overall cloud network traffic? Are there any metrics that can be used to determine how current network latency is doing?
Thanks in advance!
How do you define latency in Cloud Foundry?
The same way it's defined elsewhere. With regard to application traffic on CF, the platform adds latency because traffic to your application typically routes through two load-balancing layers (an external load balancer and the Gorouter), and optionally more (additional external load balancers).
Each layer takes some amount of time to process the request, so each layer adds some latency. For example, if the external load balancer adds 2 ms and the Gorouter adds 5 ms, a request sees roughly 7 ms on top of your application's own processing time.
Is Cloud Foundry a distributed cloud?
It's a distributed system. Individual components of CF can be scaled as needed (e.g., the Gorouter, UAA, and the Cloud Controller are all separate). I'm not sure what's meant beyond that.
Will high load (transferring oversized files via REST calls) on one app in Cloud Foundry impact other apps' performance? If yes, how?
High CPU load in one application can impact the performance of other applications in some ways; however, Cloud Foundry has mitigations in place that generally minimize the impact.
Specifically, an application running on CF is given a certain number of CPU shares, and those shares guarantee a minimum amount of CPU time for that app. If there is contention for CPU, the OS (i.e., the Linux kernel) enforces these limits. If there is no contention, applications may burst above their allocations and consume extra time.
Where you typically see a performance impact from another application's load is when you have an app that has come to rely on that extra CPU, or was load tested while consuming it (i.e., it expects to be able to burst above its assigned limit). While you'll often be able to burst above the CPU limit, if some other app suddenly creates contention and demands its fair share of CPU time, the limits are enforced and the original app can no longer burst. This is how high load in one app can impact the performance of another app on the platform, though it is not a fault of the platform: the application owner should size CPU for the worst case, not the best case.
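To make the shares arithmetic concrete, here is an illustrative calculation; the share values are made up and not CF's actual numbers:

```python
# Illustrative arithmetic only (share values are made up): under
# contention the kernel gives each container CPU time proportional to
# its shares; without contention, apps may burst above this.
def cpu_fraction(app_shares, all_active_shares):
    """Fraction of CPU an app is guaranteed when every app is busy."""
    return app_shares / sum(all_active_shares)

# Alone on the host there is no contention, so the app may burst to ~100%.
print(cpu_fraction(512, [512]))        # 1.0
# A second app with equal shares creates contention: each is held to ~50%.
print(cpu_fraction(512, [512, 512]))   # 0.5
```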
You can use the CPU Entitlement cf CLI plugin to get more details on your app's CPU consumption and whether your app is bursting above its entitlement. If it is, you need to increase your app's memory limit, because CPU shares are tied directly to the memory limit of your app in CF (i.e., there is no way to increase just the CPU shares).
How is latency calculated for overall cloud network traffic? Are there any metrics that can be used to determine how current network latency is doing?
Again, the same way it's calculated elsewhere: it's the time delay the system adds while processing the request.

Azure VM Inbound Throttling to VMs?

We have two Elastic VMs (Linux, currently DS2v2) behind an Azure Load Balancer. We are doing HTTP POSTs from our local LAN to the load balancer, but we seem to be getting throttled. We have tried changing the size of the VMs (no difference), adding additional premium SSDs (again, no difference), and running multiple threads on our end (again, no difference).
What we did do, though, was have the Elastic engine ingest all of the log files from the Linux boxes, and the index rate jumped quite high while it was ingesting them. So we are assuming it's not really the Linux Elastic boxes that are throttling us.
We do have Kibana installed on the boxes, and as a baseline we're just using the "Cluster Indexing Rate" for both our posts to the box and the local ingestion of the log files.
We understand there will be some latency and overhead now that the internet is involved, but not at the rates we are currently seeing. (We have a 1 Gb pipe to the internet that is nowhere near capacity, so we can at least rule out the egress from our company.)
The question is, where else can we look to determine where we might be getting throttled?
Performance being much slower is a somewhat subjective issue and hard to pin down, but here is some information on what may affect it.
Azure Compute requests may be throttled at the subscription level and on a per-region basis. If you are seeing an API throttling error, refer to this document to troubleshoot throttling issues and for best practices to avoid being throttled.
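On the client side, the usual mitigation for throttling is to retry throttled responses with exponential backoff, honoring any Retry-After header the service returns. A rough sketch in Python (the URL, payload, and retry count are placeholders):

```python
# Sketch of the usual client-side mitigation for throttling: retry
# 429/503 responses with exponential backoff, honoring any Retry-After
# header. Retry count and timeout values are placeholders.
import time

import requests

def post_with_backoff(url, payload, max_retries=5):
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(url, json=payload, timeout=30)
        if resp.status_code not in (429, 503):  # not throttled
            return resp
        # Prefer the server's hint when present, else back off exponentially.
        retry_after = resp.headers.get("Retry-After")
        time.sleep(float(retry_after) if retry_after else delay)
        delay *= 2
    return resp
```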
Factors such as the CPU and storage limits that differ between Azure VM sizes may affect how quickly the VM can process incoming data. You could change to a size with more CPU and a premium SSD disk. You could also move the Azure resources to a region closer to your location; refer to this article.

AWS CloudWatch - 100% CPU Utilization

I have an AWS M4.Large EC2 instance running a Magento e-commerce site that is experiencing consistent max-CPU spikes at a regular interval: 10 minutes at 100% CPU, followed by 20 minutes at 40-50% CPU. I've included a screenshot below. I am trying to identify the cause of these routine spikes but am not sure how to pinpoint it. Given the regularity of the spikes, I assume an automated task is at play. Any advice and suggestions would be extremely appreciated!
CloudWatch Monitoring Details
I am hoping to keep our instance type as an M4.Large, but if an increase is required then I will bump it up. Unfortunately, I do not think AWS Auto Scaling will be a viable option for this web application.
Thank you! Suggestions are very much appreciated!
EDIT:
While looking at the network monitors, it seems that high traffic correlates exactly with the CPU usage.
Network Activity Details
Have you enabled access logs? If yes, you can easily figure out whether the requests are coming from your automation module or not.
How to differentiate original requests from automation requests:
You can add an extra query parameter to the URL; then you can trace all the requests generated by your automation module during that time.
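Once access logs are enabled, a short script can bucket hits per minute so the request volume lines up against the CloudWatch CPU graph. This is a sketch only; the log path and combined-log timestamp format are assumptions about your setup:

```python
# Sketch: bucket access-log hits per minute so request volume can be
# lined up against the CloudWatch CPU graph. Assumes the common/combined
# log format; the log path is an assumption.
import re
from collections import Counter
from datetime import datetime

# Matches the timestamp in the common/combined log format,
# e.g. [10/Oct/2000:13:55:36 -0700], truncated to the minute.
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2})")

hits_per_minute = Counter()
with open("/var/log/apache2/access.log") as log:  # path is an assumption
    for line in log:
        match = TIMESTAMP.search(line)
        if match:
            minute = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M")
            hits_per_minute[minute] += 1

for minute, hits in sorted(hits_per_minute.items()):
    print(minute, hits)
```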

AWS EC2 Immediate Scaling Up?

I have a web service running on several EC2 boxes. Based on the CloudWatch latency metric, I'd like to scale up additional boxes. But given that it takes several minutes to spin up an EC2 instance from an AMI (with startup code to download the latest application JAR and apply OS patches), is there a way to have a "cold" server that could be turned on/off instantly?
Not by using Auto Scaling, at least not instantly in the way you describe. You could make it much faster, however, by building your own pre-baked AMI that already contains the JAR and the latest OS patches. These AMIs can be generated as part of your build pipeline. In that case, your only real wait time is for the OS and services to start, similar to a "cold" server.
Packer is a tool commonly used for such use cases.
Alternatively, you can manage it yourself by keeping servers switched off and starting them with custom Lambda scripts triggered by CloudWatch alarms, as sketched below. But since stopped servers aren't exactly free either (you still pay for their storage), I would recommend against that for cost reasons.
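For completeness, here is a minimal sketch of that Lambda approach using boto3; the instance IDs are placeholders and the CloudWatch alarm wiring is left out:

```python
# Minimal sketch of the Lambda approach: a CloudWatch alarm triggers
# this handler, which starts pre-provisioned but stopped instances.
# The instance IDs are placeholders; assumes the boto3 SDK.
import boto3

ec2 = boto3.client("ec2")
STANDBY_INSTANCES = ["i-0123456789abcdef0"]  # placeholder instance IDs

def lambda_handler(event, context):
    # Starting an instance that is already running is a harmless no-op,
    # so the alarm can fire repeatedly without side effects.
    ec2.start_instances(InstanceIds=STANDBY_INSTANCES)
    return {"started": STANDBY_INSTANCES}
```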
Before you venture into the journey of auto scaling your infrastructure and spend the time and effort, perhaps you should do a bit of analysis on your traffic pattern day over day, week over week, and month over month, and see if it's even necessary. Try answering some of these questions:
What was the highest traffic your app has ever handled? How did the servers fare given that traffic? How was the user response time?
When does your traffic ramp up or hit its peak? Some apps get traffic during business hours, others in the evening.
What is your current throughput? For example, say you can handle 1k requests/min and two EC2 hosts average 20% CPU. If requests triple to 3k requests/min, do you see around 60-70% average CPU (see the check below)? That is a good indication that your app's usage is fairly predictable and can scale linearly by adding more hosts. But if you've never seen traffic burst like that, there's no point over-provisioning.
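As a back-of-the-envelope check of that linear-scaling assumption (the numbers are the example figures above, not measurements):

```python
# Back-of-the-envelope check of the linear-scaling assumption using the
# example figures from the question above, not real measurements.
baseline_rpm = 1_000   # requests per minute today (example figure)
baseline_cpu = 0.20    # average CPU across the two hosts (example figure)
peak_rpm = 3_000       # hypothetical 3x burst

# Linear scaling predicts CPU grows in proportion to request rate.
expected_cpu = baseline_cpu * (peak_rpm / baseline_rpm)
print(f"expected avg CPU at {peak_rpm} rpm: {expected_cpu:.0%}")  # 60%
```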
Unless you have a Zynga-like application that can see a large amount of traffic at once, better understanding your traffic pattern and throwing in an additional host as insurance could be enough. I'm making these assumptions because I don't know the nature of your business.
If you do want to auto scale anyway, one solution would be to containerize your application with Docker, or create your own AMI as others have suggested; it will still take a few minutes to boot instances up. The next option is to keep hosts on standby and add them to your load balancers using scripts (or Lambda functions) that watch metrics you define, as sketched below (I'm assuming your app runs behind load balancers).
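A rough sketch of that standby-host registration step with boto3, assuming an ALB/NLB target group; the target group ARN and instance ID are placeholders:

```python
# Sketch of the standby-host idea: once a stopped host is started, a
# script (or Lambda) registers it with the load balancer. Assumes an
# ALB/NLB target group and the boto3 SDK; identifiers are placeholders.
import boto3

elbv2 = boto3.client("elbv2")

def add_to_load_balancer(instance_id, target_group_arn):
    # Register the freshly started host so the load balancer begins
    # routing to it once its health checks pass.
    elbv2.register_targets(
        TargetGroupArn=target_group_arn,
        Targets=[{"Id": instance_id}],
    )

add_to_load_balancer(
    "i-0123456789abcdef0",  # placeholder instance ID
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/app/abc123",  # placeholder ARN
)
```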
Good luck.