Latency in Cloud Foundry - cloud-foundry

Questions:
How do you define latency in Cloud Foundry?
Is Cloud Foundry a distributed cloud?
Will high load (transferring oversized files via REST calls) on one app in Cloud Foundry impact other apps' performance? If yes, how?
How is latency calculated for overall cloud network traffic? And are there any metrics that can be used to determine how current network latency is doing?
Thanks in advance!

How do you define latency in Cloud Foundry?
The same way it's defined elsewhere. For application traffic on CF, the platform adds latency because requests to your application route through two or more load-balancing layers: typically an external load balancer and the Gorouter, optionally with additional load balancers as well.
Each layer takes some amount of time to process the request, which means each layer adds some amount of latency to the request.
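As a rough illustration with hypothetical per-hop numbers, the end-to-end latency a client sees is the app's own response time plus the processing time of each layer in the path:

```python
# Hypothetical per-hop processing times in milliseconds; real values
# depend entirely on your deployment and must be measured.
latency_ms = {
    "external load balancer": 1.5,
    "Gorouter": 2.0,
    "application": 40.0,
}

total_ms = sum(latency_ms.values())
platform_overhead_ms = total_ms - latency_ms["application"]

print(f"end-to-end: {total_ms} ms, platform overhead: {platform_overhead_ms} ms")
```

The point of the breakdown is that the platform's contribution is usually a small constant per hop; if you measure a large gap between app response time and client-observed latency, that gap is where to investigate.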
Is Cloud Foundry a distributed cloud?
It's a distributed system. Individual components of CF can be scaled as needed (e.g. Gorouter, UAA, and Cloud Controller are all separate components). Not sure what's meant beyond that.
Will high load (transferring oversized files via REST calls) on one app in Cloud Foundry impact other apps' performance? If yes, how?
High CPU load in one application can impact the performance of other applications in some ways; however, Cloud Foundry has mitigations in place that generally minimize the impact.
Specifically, an application running on CF is given a certain number of CPU shares, and those shares guarantee a minimum amount of CPU time for that app. If there is contention for CPU, the OS (i.e. the Linux kernel) enforces these limits. If there is no contention, applications may burst above their allocations and consume extra time.
Where you typically see performance impact from other applications' load is when an application has become accustomed to bursting above its assigned CPU limit (or was load tested while bursting). While an app can often burst above its limit, if some other app suddenly creates CPU contention and demands its fair share of CPU time, the limits are enforced and the original app can no longer burst. This is an example of how high load in one app can impact the performance of another application on the platform, though it is not a fault of the platform. The application owner should size CPU for the worst case, not the best case.
You can use the CPU Entitlement CF CLI plugin to get more details on your app's CPU consumption and to see whether your app is bursting above its entitlement. If you are consistently above the entitlement, you need to increase the memory limit for your app, because CPU shares are tied directly to the memory limit of your app in CF (i.e. there is no way to increase just the CPU shares).
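A sketch of that proportionality, assuming shares scale linearly with the memory limit (an assumption for illustration; CF's exact mapping is a platform implementation detail):

```python
def cpu_share_fraction(app_mem_mb, neighbour_mem_mb):
    """Fraction of contended CPU an app is guaranteed on a cell,
    assuming CPU shares are proportional to memory limits
    (illustrative assumption, not CF's exact formula)."""
    total = app_mem_mb + sum(neighbour_mem_mb)
    return app_mem_mb / total

# A 1 GB app next to three 1 GB neighbours on the same cell
# is guaranteed a quarter of the CPU under contention:
print(cpu_share_fraction(1024, [1024, 1024, 1024]))
```

Under this model, doubling the app's memory limit is the only lever that increases its guaranteed CPU share, which matches the answer's advice.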
How is latency calculated for overall cloud network traffic? And are there any metrics that can be used to determine how current network latency is doing?
Again, the same way it's calculated elsewhere: it's the time delay the system adds to process the request.

Related

Cloud Memorystore Redis high CPU utilisation

We are using a Cloud Memorystore Redis instance to add a caching layer to our mission-critical Internet-facing application. The total number of calls (including get, set, and key-expiry operations) to the Memorystore instance is around 10-15K per second. CPU utilisation has been consistently around 75-80%, and we expect utilisation to go even higher.
Currently, we are using the M4 capacity tier under the Standard service tier.
https://cloud.google.com/memorystore/docs/redis/pricing
Need some clarity around the following pointers.
How many CPU cores does the M4 capacity tier correspond to?
Is it really alarming to have more than 100% CPU utilisation? Do we expect any noticeable performance issues?
What are the options to tackle the performance issues (if any) caused by higher CPU utilisation (>=100%)? Will switching to the M5 capacity tier address the high CPU consumption and the corresponding issues?
Our application is really CPU intensive and we don't see any way to further optimize our application. Looking forward to some helpful references.
Addressing your questions.
1. How many CPU cores does the M4 capacity tier correspond to?
Cloud Memorystore for Redis is a Google-managed service, which means Google does not expose the inner details (resources) of the virtual machine running the Redis service. Still, it is expected that the higher the capacity tier, the more resources (CPU) the virtual machine will have. In your case in particular, adding CPUs will not solve issues around CPU usage, because the Redis service itself is single-threaded.
As the Redis documentation notes:
To maximize CPU usage you can start multiple instances of Redis.
If you want to use multiple CPUs you can start thinking of some way to shard earlier.
2. Is it really alarming to have more than 100% CPU utilisation?
Yes, it is alarming to have high CPU utilization because it can result in connection errors or high latency.
CPU utilization matters, but so does whether the Redis instance is efficient enough to sustain your throughput at a given latency. You can check the Redis latency with the command redis-cli --latency while CPU % is high.
3. Do we expect any noticeable performance issues?
This is really hard to say or predict, because it depends on several factors (the client services, the commands run within a time frame, the workload). Some of the most common causes of high latency and performance issues are:
Client VMs or services are overloaded and not consuming the messages from Redis: when a client opens a TCP connection to Redis, the Redis server keeps a buffer of messages to send on that connection. If a client service has its CPU maxed out, leaving the kernel no time to receive messages from Redis, the buffers fill up on the Redis server.
The commands executed are consuming a lot of CPU: The following commands are known to be potentially very expensive to process:
EVAL/EVALSHA
KEYS
LRANGE
ZRANGE/ZREVRANGE
4. What are the options to tackle the performance issues (if any) caused by higher CPU utilisation (>=100%)?
This question revolves mainly around the scaling design of your implementation. Since Redis is single-threaded, a better approach to reducing CPU % is to shard your data across multiple Redis instances and put a proxy in front of them to distribute the load. Please take a look at the graph under the Twemproxy section of this link.
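A minimal sketch of the sharding idea (the instance addresses are hypothetical; a proxy such as Twemproxy does the same job transparently): hash each key so it always lands on the same instance, spreading CPU load across several single-threaded Redis processes.

```python
import hashlib

# Hypothetical shard addresses; in practice these would be separate
# Memorystore instances or a proxy's backend pool.
SHARDS = ["10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"]

def shard_for(key: str, num_shards: int) -> int:
    # Stable (non-randomized) hash so every client maps a key identically.
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

print(SHARDS[shard_for("user:42:session", len(SHARDS))])
```

Note that plain modulo sharding remaps most keys when the shard count changes; consistent hashing (which Twemproxy supports) limits that reshuffling.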
5. Will switching to the M5 capacity tier address the high CPU consumption and the corresponding issues?
Switching to a higher capacity tier should help with the latency temporarily, but this is vertical scaling, which is limited to the tiers that Cloud Memorystore offers.
Redis Enterprise solves all the issues you are facing. Redis Enterprise can be configured in a clustered configuration and utilize all the resources of the machine as well as scale out over multiple machines.
The Redis Enterprise Software is responsible for watching over the CPU utilization and other resource management tasks so you do not need to.
It is offered on GCP and GCP marketplace as well.
https://redis.com/redis-enterprise-cloud/pricing/

Azure VM Inbound Throttling to VMs?

We have 2 Elastic VMs (Linux) (currently DS2V2) behind an Azure Load Balancer. We are doing HTTP POSTs from our local LAN into the Load Balancer, but we seem to be getting throttled. We have tried changing the size of the VMs (no difference), adding additional premium SSDs (no difference), and running multiple threads on our end (again, no difference).
What we did do, though, was have the Elastic engine ingest all of the log files on the Linux boxes, and the index rate jumped quite high while it was ingesting them. So we assume it's not the Linux Elastic boxes themselves that are throttling us.
We do have Kibana installed on the boxes, and as a baseline we're just using the "Cluster Indexing Rate" for both our local posts to the box and the local ingestion of the log files.
We do understand that there is going to be some latency and overhead since we are now involving the internet, but not at the rates we are currently getting. (We have a 1G pipe to the internet that is nowhere near capacity, so we can at least rule out the egress from our company.)
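A back-of-envelope check supports ruling out the pipe (the document size and overhead factor below are assumed; substitute your own measurements):

```python
# Is the 1 Gbit/s uplink plausibly the bottleneck for indexing?
link_bits_per_sec = 1_000_000_000   # the 1G pipe
avg_doc_bytes = 2_000               # assumed average indexed document size
overhead_factor = 1.2               # assumed HTTP/TCP framing overhead

max_docs_per_sec = link_bits_per_sec / 8 / (avg_doc_bytes * overhead_factor)
print(f"theoretical link ceiling ~ {max_docs_per_sec:,.0f} docs/s")
```

If the observed indexing rate is orders of magnitude below a ceiling like this, the limit is elsewhere: per-request round-trip latency, connection limits, or throttling at some intermediate layer.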
The question is, where else can we look to determine where we might be getting throttled?
"Much slower" performance is a bit subjective and hard to pin down, but here is some information on what may impact it.
Azure Compute requests may be throttled at the subscription level and on a per-region basis. If you are hitting an API throttling error, you could refer to this document to troubleshoot throttling issues and for best practices to avoid being throttled.
Factors such as CPU and storage limits, which differ across Azure VM sizes, may affect the VM's ability to process incoming data. You could change to a size with more CPU and a premium SSD disk. You could also move the Azure resources to another region that is closer to your location. You could refer to this article.

AWS Network out

Our web application has 5 pages (Signin, Dashboard, Map, Devices, Notification)
We have done the load test for this application, and load test script does the following:
Signin and go to Dashboard page
Click Map
Click Devices
Click Notification
We have a basic free plan in AWS.
While performing the load test, up to about 100 users we didn't get any errors. Please see the below image. We could see that NetworkIn and CPUUtilization seemed normal, but NetworkOut showed 846K.
But when we reached around 114 users, we started getting errors on the map page (highlighted in red). During that time, it seems only NetworkOut was high. Please see the below image.
We want to know what an optimal value for NetworkOut is. If this number is high, is there any way to reduce it?
Please let me know if you need more information. Thanks in advance for your help.
You are using a t2.micro instance.
This instance type has CPU limitations that make it good for bursty workloads, but sustained loads will consume all the available CPU credits. Thus, it may perform poorly under sustained load over long periods.
The instance also has limited network bandwidth that might impact the throughput of the server. While all Amazon EC2 instances have limited allocations of bandwidth, the t2.micro and t2.nano have particularly low bandwidth allocations. You can see this when copying data to/from the instance and it might be impacting your workloads during testing.
The t2 family, especially at the low-end, is not a good choice for production workloads. It is great for workloads that are sometimes high, but not consistently high. It is also particularly low-cost, but please realise that there are trade-offs for such a low cost.
See:
Amazon EC2 T2 Instances – Amazon Web Services (AWS)
CPU Credits and Baseline Performance for Burstable Performance Instances - Amazon Elastic Compute Cloud
Unlimited Mode for Burstable Performance Instances - Amazon Elastic Compute Cloud
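A simplified sketch of why sustained load exhausts a t2.micro, using the documented 6-credits-per-hour earn rate (the other numbers are illustrative; AWS's exact accounting differs):

```python
def credits_after(hours, avg_cpu_pct, start=30.0, earn_per_hour=6.0, cap=144.0):
    """CPU-credit balance after `hours` at a constant utilization.
    One credit = one vCPU-minute at 100%, so a single-vCPU instance
    burns 60 * (cpu% / 100) credits per hour."""
    credits = start
    for _ in range(int(hours)):
        credits += earn_per_hour - 60.0 * (avg_cpu_pct / 100.0)
        credits = min(cap, max(0.0, credits))
    return credits

print(credits_after(5, avg_cpu_pct=10))  # at the ~10% baseline: balance holds
print(credits_after(5, avg_cpu_pct=50))  # sustained load: drained within hours
```

Once the balance hits zero, the instance is pinned to its baseline CPU, which is exactly when a load test starts failing even though CloudWatch's CPUUtilization looks unremarkable.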
That said, the network throughput showing on the graphs is a result of your application. While the t2 might be limiting the throughput, it is not responsible for the spike on the graph. For that, you will need to investigate the resources being used by the application(s) themselves.
NetworkOut simply refers to the volume of outgoing traffic from the instance. To reduce NetworkOut, you reduce the traffic the instance sends out. So you may need to see which of Click Map, Click Devices, and Click Notification is sending traffic out of the instance. It may not necessarily be related only to the number of users, but to a combination of the number of users and the application module.

Start second AWS instance when the first reaches 85% of memory or CPU

I have the following scenario:
I have two Windows servers on AWS that run an application via IIS. Due to particularities of the application, they work with HTTP load balancing on IIS.
To reduce costs, I was asked to start the second instance only when the first one reaches 90% CPU usage or 85% memory usage.
In my zone (sa-east-1), there are still no Auto Scaling Groups.
Initially, I created a CloudWatch event to start the second instance when it detected high CPU usage on the first. The problem is that CloudWatch natively still does not monitor memory, and so far I'm having trouble setting up that kind of custom monitoring.
Is there any other way for me to be able to start the second instance based on the above conditions?
Since the first instance is always running, it might be something Windows-level, some PowerShell that detects the high memory usage and starts the second. I already have the script to start instances via PowerShell; I just need help detecting the high-memory event so I can start the second instance from it.
or some third-party application that does so...
Thanks!
Auto Scaling groups are available in sa-east-1, so use them.
Pick one metric upon which to scale (memory OR CPU); do not pick both, otherwise it would be unclear how to scale when one metric is high and the other is low.
If you wish to monitor Windows memory in CloudWatch, see: Sending Logs, Events, and Performance Counters to Amazon CloudWatch - Amazon Elastic Compute Cloud
Also, be careful using a metric such as "memory usage" to decide when to launch more instances. Some runtimes use garbage collection to free up memory, but only when available memory is low (rather than continuously), so high reported memory usage does not always mean the instance is actually under pressure.
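One way to avoid reacting to a momentary pre-garbage-collection spike is to require the metric to stay above the threshold for several consecutive samples before scaling out. A sketch (the 85% threshold is the question's number; the three-sample window is an assumption):

```python
from collections import deque

class ScaleOutTrigger:
    """Fire only after `window` consecutive samples at/above `threshold_pct`,
    so a brief GC-driven memory spike doesn't launch an instance."""

    def __init__(self, threshold_pct, window=3):
        self.threshold_pct = threshold_pct
        self.samples = deque(maxlen=window)

    def observe(self, value_pct):
        self.samples.append(value_pct)
        return (len(self.samples) == self.samples.maxlen
                and all(v >= self.threshold_pct for v in self.samples))

mem_trigger = ScaleOutTrigger(threshold_pct=85)
for reading in [70, 88, 87, 91]:
    if mem_trigger.observe(reading):
        print("start second instance")  # e.g. call your existing start-instance script
```

The same debouncing idea is what a CloudWatch alarm's "datapoints to alarm" setting gives you when using an Auto Scaling group instead of hand-rolled logic.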
Plus, make sure your application is capable of running across multiple instances, such as putting it behind a load balancer (depending on what the application actually does).

Lag spikes on google container engine running a flask restful api

I'm running a Flask-RESTPlus API on Google Container Engine with a TCP Load Balancer. The API makes calls to Google Cloud Datastore or Cloud SQL, but this does not seem to be the problem.
A few times a day, or even more often, there is a period of latency spikes. Restarting the pod solves this, or it resolves itself within 5 to 10 minutes. Of course this is too much and needs to be resolved.
Anyone knows what could be the problem or has experience with these kind of issues?
Thx
One thing you could try is monitoring your instance CPU load.
Although the latency doesn't correspond with usage spikes, it may be that there is a cumulative effect on CPU load, and the latency you're experiencing occurs when the CPU reaches a given percentage and needs to back off temporarily. If this is the case, you could make use of cluster autoscaling, or try running a higher-spec machine to see if that makes any difference. Or, if you have set CPU limits on your pods/containers, try increasing those limits.
If you're confident CPU isn't the cause of the issue, you could SSH into the affected instance while the issue is occurring, send a request through the load balancer, and use tcpdump to analyse the traffic coming in and out. You may be able to spot whether the latency stems from the load balancer (by monitoring the latency of HTTP traffic to the instance) or from Cloud Datastore or Cloud SQL (from the instance).
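To quantify which hop degrades, you can time the same request along each path: via the load balancer, directly against the pod, and from the pod to Datastore/Cloud SQL. A small helper sketch (the callables and URLs are yours to fill in):

```python
import time

def median_latency_ms(request_fn, n=9):
    """Median wall-clock latency of calling request_fn() n times.
    Pass e.g. lambda: urllib.request.urlopen(LB_URL).read() for the
    load-balancer hop, or a Datastore query for the backend hop."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        request_fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return sorted(samples)[n // 2]
```

Comparing the medians before and during a spike tells you which hop is degrading; the difference between the load-balancer timing and the direct-to-pod timing isolates the balancer's own contribution.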
Alternatively, try using strace to monitor the relevant processes both before and during the latency, or dtrace to monitor the system as a whole.