We observed very strange behavior on a Redis instance that runs as a managed service in AWS. Although the instance appeared to be operational, we saw a lot of delays and timeouts.
Looking at the monitoring dashboard, the CPU utilization graph showed a perfectly straight horizontal line at 20%, whereas this instance usually operated at around 40-50% on average. It looked as if the instance had been capped at a particular CPU limit, which resulted in poor performance.
Any ideas on what might have caused such behavior?
Update
After searching through various resources for a possible solution, we eventually traced the problem to one misused Redis command. Specifically, in one of our services that relied on this Redis cluster for caching, a frequently executed function was using the keys() command to search in Redis. According to the official documentation:
consider KEYS as a command that should only be used in production environments with extreme care. It may ruin performance when it is executed against large databases. This command is intended for debugging and special operations, such as changing your keyspace layout. Don't use KEYS in your regular application code.
Once we removed the keys() call, CPU utilization instantly dropped below 5% and never surpassed this threshold again.
A bit embarrassed, I am adding my answer in case someone faces a similar problem.
Tip: If you are experiencing slow performance in Redis, use the SLOWLOG command to identify slow-executing commands.
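As a minimal sketch of the tip above: assuming the slow log entries have the shape that redis-py's `slowlog_get()` returns (dicts with `id`, `start_time`, a `duration` in microseconds, and a `command`), the hypothetical helper `summarize_slowlog` below groups them by command name so that an offender such as KEYS stands out. The sample entries are made up for illustration.

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Group slow log entries by command name.

    `entries` is assumed to look like the output of redis-py's
    `slowlog_get()`: dicts with a `duration` in microseconds and a
    `command` given as bytes or str. Returns (name, count, total_us)
    tuples sorted by total time spent, slowest first.
    """
    totals = defaultdict(lambda: [0, 0])  # name -> [count, total_us]
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        parts = command.split()
        name = parts[0].upper() if parts else "(EMPTY)"
        totals[name][0] += 1
        totals[name][1] += entry["duration"]
    return sorted(
        ((name, count, total) for name, (count, total) in totals.items()),
        key=lambda row: row[2],
        reverse=True,
    )


# Made-up sample data, for illustration only.
sample = [
    {"id": 1, "start_time": 0, "duration": 120000, "command": b"KEYS user:*"},
    {"id": 2, "start_time": 1, "duration": 15000, "command": b"GET user:42"},
    {"id": 3, "start_time": 2, "duration": 130000, "command": b"KEYS session:*"},
]
for name, count, total_us in summarize_slowlog(sample):
    print(f"{name}: {count} call(s), {total_us / 1000:.1f} ms total")
```

With a real connection this would presumably be fed from something like `client.slowlog_get(128)` on a redis-py client; the helper itself only depends on the entry shape described above.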
As far as I can tell, by default on Google Cloud (and presumably elsewhere) each vCPU = 1 hyperthread (3rd paragraph in the intro). From my perspective, this suggests that unless one changes this setting to 2 or 4 vCPUs, concurrency in the code running in the Docker image achieves nothing. Is there some multi-threading knowledge I'm missing that means concurrency on a single hyperthread accomplishes something? Scaling up the vCPU count isn't very attractive, as the minimum memory setting is already forced to 2GB for 4 vCPUs.
This question is framed in terms of the Google Cloud tech stack, but is meant to cover all providers.
Do Serverless solutions ever really benefit from concurrency?
EDIT:
The accepted answer is a great first look, but I realized my assumptions above ignored the idle time that context switching can fill. For example:
If we write a backend that talks to a database, a lot of our compute time might be spent idle, waiting for database results. Context switching to the next request in that case lets us use the CPU more efficiently.
Therefore, depending on the use case, our serverless app can benefit from concurrency even on a single-threaded vCPU.
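To make the point above concrete, here is a minimal sketch using Python's asyncio, which runs on a single thread. The `handle_request` function and its 50 ms "database" delay are assumptions made for illustration; the delay stands in for time spent waiting on a real database.

```python
import asyncio
import time


async def handle_request(request_id, db_latency=0.05):
    """Simulate a handler that spends most of its time waiting on a database."""
    await asyncio.sleep(db_latency)  # stand-in for the database round trip
    return request_id


async def run_sequential(n):
    """Handle n requests one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for i in range(n):
        await handle_request(i)
    return time.perf_counter() - start


async def run_concurrent(n):
    """Handle n requests concurrently on the same single thread."""
    start = time.perf_counter()
    await asyncio.gather(*(handle_request(i) for i in range(n)))
    return time.perf_counter() - start


sequential_s = asyncio.run(run_sequential(10))
concurrent_s = asyncio.run(run_concurrent(10))
print(f"sequential: {sequential_s:.2f}s, concurrent: {concurrent_s:.2f}s")
```

Both runs use one thread; the concurrent run finishes sooner only because the waits overlap. A CPU-bound handler would see no such gain on a single hyperthread.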
I wrote this. From my experience, YES: you can handle several threads in parallel, and your performance increases with the number of CPUs. However, you need a process that supports multithreading.
In the case of Cloud Run, each request can be processed in its own thread, so parallelization is easy.
Lately, I have been struggling to understand what my network speed (downlink) is between nodes on AWS (in a multi-homed cluster, with computers in different regions).
I see a lot of fluctuation when I measure it with a script I wrote (based on this link and SCP) or with iPerf.
I believe it depends on network usage, which changes rapidly (mostly between regions), but I still don't understand from the AWS documentation what performance I am paying for, for example a minimum and a maximum downlink rate (aws instances).
At first I tried the T2 type, and since it has burstable CPU performance, I thought the NIC performance might also be bursty, so I moved to the M4 type, but I got the same problems with M4.
Is there any way to know my NIC downlink rate based on the type and flavor?
*I have asked a similar question on the AWS forum, but I haven't got a response (https://forums.aws.amazon.com/thread.jspa?threadID=296389).
There is no way to get a better indication than by measuring. AWS does not publish anything indicating this performance, unless we are talking about the larger instances, where network performance is actually given specifically, e.g. m5.12xlarge having 10 Gbps. Most likely, network performance does have a burst component for smaller instance types.
There are pages with other people's benchmarks, but you won't find any official answer for any of this.
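Since measuring is the only option, it helps to take repeated samples and look at the spread rather than a single number. The sketch below assumes you already have (bytes transferred, seconds) pairs from repeated iPerf or SCP runs; the helper names and the sample values are made up for illustration.

```python
import statistics


def to_mbps(bytes_transferred, seconds):
    """Convert a transfer of `bytes_transferred` in `seconds` to Mbit/s."""
    return bytes_transferred * 8 / seconds / 1_000_000


def summarize(samples_mbps):
    """Describe the spread of repeated throughput measurements."""
    ordered = sorted(samples_mbps)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
    }


# Made-up (bytes, seconds) pairs standing in for repeated measurement runs.
runs = [(125_000_000, 2.0), (125_000_000, 4.0), (125_000_000, 2.5), (125_000_000, 10.0)]
samples = [to_mbps(size, elapsed) for size, elapsed in runs]
print(summarize(samples))
```

A wide gap between the minimum and the maximum across runs at different times of day would be consistent with the burst behavior suspected above, though it cannot prove it.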
We are trying to disable swapping RAM to disk for a Redis instance managed by AWS ElastiCache, but couldn't find the right property to do so.
We also cannot find a way to SSH into it and turn off swapping at the kernel level. Can you please help?
While not a direct answer to your question about disabling swapping, we've been struggling with Redis swapping on ElastiCache as well. What we ended up doing to address swapping is the following:
1) Followed Leo's suggestion of setting reserved memory.
2) Run a nightly batch job to SCAN all keys in batches of 10,000. The SCAN command will evict any expired keys. This helps by proactively cleaning up the cache before swapping kicks in.
3) Run another custom batch job which processes entities we know can be evicted. These are entities which aren't as important as others in the cache. We've set up the keys so they contain enough information to easily identify those associated with an entity. Use SCAN with a match to find the keys. Once you find them, call DEL on each. This batch job alone is saving lots of space in our Redis instance. A word of caution: avoid using the KEYS command, as it is slow and blocks all other commands while it runs.
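The SCAN-with-match-then-DEL job described above could look roughly like the sketch below. It assumes a client whose `scan(cursor, match, count)` returns `(next_cursor, keys)` and whose `delete(*keys)` returns the number of keys removed, which is the call shape redis-py uses. `InMemoryClient` is a hypothetical stand-in so the example runs without a server, and the `entity:*` key layout is made up.

```python
import fnmatch


class InMemoryClient:
    """Tiny stand-in exposing `scan` and `delete` in the shape assumed above."""

    def __init__(self, keys):
        self._order = list(keys)  # fixed iteration order for the cursor
        self._live = set(keys)    # keys that have not been deleted yet

    def scan(self, cursor=0, match=None, count=10):
        page = self._order[cursor:cursor + count]
        next_cursor = cursor + count
        if next_cursor >= len(self._order):
            next_cursor = 0  # as in Redis, a cursor of 0 means the scan is complete
        keys = [
            key for key in page
            if key in self._live
            and (match is None or fnmatch.fnmatchcase(key, match))
        ]
        return next_cursor, keys

    def delete(self, *keys):
        removed = [key for key in keys if key in self._live]
        self._live.difference_update(removed)
        return len(removed)


def delete_matching(client, pattern, batch_size=1000):
    """Delete every key matching `pattern`, one SCAN page at a time."""
    deleted = 0
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=batch_size)
        if keys:
            deleted += client.delete(*keys)
        if cursor == 0:
            break
    return deleted


client = InMemoryClient([f"entity:{i}:profile" for i in range(25)] + ["session:abc"])
removed = delete_matching(client, "entity:*", batch_size=10)
print(f"deleted {removed} keys")
```

Working page by page keeps each call short, which is the reason to prefer this over a single KEYS call. The stand-in matches patterns with `fnmatch`, which is only an approximation of Redis glob patterns.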
We've been using the above for a few weeks now and so far it has been working well. In a few more weeks we'll know how well it works since we have a default TTL of 30 days and the number of cached items is still increasing.
Good luck!
Update
We turned off the job which uses SCAN on all keys. We discovered it was causing swap to slowly creep up (roughly 500k every other day). Once we turned that off, swap started shrinking. The combination of setting reserved memory and flushing objects we know can be expired is working well. When Redis starts running out of room, it evicts any expired cached objects to make room for new entries. The only impact we've noticed is a very small increase in CPU usage, which isn't causing any trouble.
I had a similar problem, where ElastiCache (Redis) in AWS suddenly started using swap space even though we use the allkeys-lru eviction policy. For the previous few weeks the machine had been consuming all of its memory without using swap, until that changed early one morning.
I used the command
redis-cli -h elasticache.service-name memory DOCTOR
The output was:
High allocator fragmentation: This instance has an allocator external fragmentation greater than 1.1. This problem is usually due
either to a large peak memory (check if there is a peak memory entry
above in the report) or may result from a workload that causes the
allocator to fragment memory a lot. You can try enabling
'activedefrag' config option.
Checking with the command
redis-cli -h elasticache.service-name memory STATS
I saw that the fragmentation ratio was high (1.4).
I looked at the ElastiCache Redis parameters in the AWS console and set the defragmentation setting (activedefrag) to true, as it had been set to false.
It is not possible to connect to Elasticache via SSH.
Are you sure that the issue is Redis swapping to disk, rather than the host running out of memory and crashing (I've seen this happen with the default configuration)? If it is the latter, the guidance is to leave about 25% of the system memory available for host processes - http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/redis-memory-management.html
I am using Django for my project and I'll be hosting it on Linode or another hosting service. If I want to use Memcached, will I need a new Linode for it? That is, will one server be enough, or will I have to host my site on 2 servers, one for Memcached and one for Django? And is it the same for Redis? Also, will I need a separate server for MySQL?
I don't think you understand that nobody is a fortune-telling wizard. Nobody knows how many requests you will receive per second, nor how CPU/memory intensive each request will be. Nobody knows how optimized your code is. Nobody knows if your application is read heavy or write heavy. Your use case is your own, and you're probably the only one who can estimate it.
My only actual advice to you is to try to estimate your server data and server load and benchmark your setup on one machine. If you are unsatisfied with the performance, then scale up. You can either scale vertically, by increasing the size of your Linode, or scale horizontally, by adding more Linode instances. In the latter case, you will most likely put your DB on a machine of its own and have multiple Django instances fed by a load balancer. These Django instances could all share the same Memcached on one machine, or they could each have their own Memcached on their own machine. Which one is better? I can't tell you. It again depends on your use case.
If I were you, I would set it all up on one linode instance. I would create test data that I assume would be close to real world. Then I would try to test my response times with an estimated number of requests per second. I would measure response times, cache hits, and memory usage. I would then decide based on that if my use case is satisfied with this level of performance or not because I'm really the only one who would know what is satisfactory performance. Additionally, adding more linode resources is not necessarily where I would first try and improve performance.
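A rough sketch of the kind of measurement described above, using only the standard library. The helper names, the request counts, and the `fake_request` placeholder (which just sleeps for 10 ms) are assumptions for illustration; in a real test the placeholder would be replaced by an HTTP call to the site under test.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def measure_latencies(request_fn, total_requests=100, concurrency=10):
    """Call `request_fn` `total_requests` times from `concurrency` threads
    and return the per-call latencies in seconds."""

    def timed_call(_):
        start = time.perf_counter()
        request_fn()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_call, range(total_requests)))


def report(latencies):
    """Summarize latencies as request count, median, ~95th percentile and max."""
    ordered = sorted(latencies)
    p95_index = max(0, int(len(ordered) * 0.95) - 1)
    return {
        "requests": len(ordered),
        "median_ms": statistics.median(ordered) * 1000,
        "p95_ms": ordered[p95_index] * 1000,
        "max_ms": ordered[-1] * 1000,
    }


def fake_request():
    """Placeholder for a real HTTP call to the site under test."""
    time.sleep(0.01)


stats = report(measure_latencies(fake_request, total_requests=50, concurrency=5))
print(stats)
```

Running this at a few different `concurrency` values against realistic test data gives the response-time side of the picture; cache hit rates and memory usage still have to be read from the cache and the host.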
Some great tips on optimizing and benchmarking can be found here:
https://docs.djangoproject.com/en/1.8/topics/performance/
http://blog.disqus.com/post/62187806135/scaling-django-to-8-billion-page-views
http://scottbarnham.com/blog/2008/04/28/django-performance-testing-a-real-world-example/
Late-night reading about scaling up Django can be found in many books; I like this one:
https://highperformancedjango.com/
Sorry if I sound a bit blunt, I just want you to understand that nobody can walk in here and give you an answer with a large degree of confidence. This question doesn't have a straight-forward answer.
TL;DR Start with one instance and scale up only if you've convinced yourself you need to.
You say Memcached or Redis, so I assume Redis would be deployed without persistence, with a purely in-memory configuration.
In that case both Memcached and Redis are unlikely to get saturated even if you run them on the same server, since the limiting factor is more likely to be a single Django instance if your requests/second go high.
However, you should make sure to have enough memory and to configure an appropriate maximum memory usage for Memcached / Redis (the two services have different ways to accomplish this). Note that under memory pressure the Linux OOM killer may otherwise kill your cache, so if you go for a single instance, which seems to me a sensible first step, make sure your Django memory usage plus the memory you allocate for caching stay well below the instance's available memory.
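As a back-of-the-envelope sketch of that memory budget: every number below (instance size, worker count, memory per worker, OS reserve, headroom) is an assumption to be replaced with your own measurements, and `cache_memory_budget_mb` is a hypothetical helper.

```python
def cache_memory_budget_mb(instance_mb, django_workers, mb_per_worker,
                           os_reserve_mb=256, headroom_fraction=0.15):
    """Return how many MB can be given to the cache on a shared instance.

    Subtracts the Django workers, a reserve for the OS, and a safety
    headroom (a fraction of total RAM) from the instance memory.
    """
    headroom_mb = instance_mb * headroom_fraction
    used_mb = django_workers * mb_per_worker + os_reserve_mb + headroom_mb
    budget = instance_mb - used_mb
    if budget <= 0:
        raise ValueError("no memory left for a cache on this instance")
    return int(budget)


# Assumed figures: a 2 GB instance running 4 Django workers of ~150 MB each.
budget = cache_memory_budget_mb(instance_mb=2048, django_workers=4, mb_per_worker=150)
print(f"memcached -m {budget}")   # Memcached takes its limit in MB via -m
print(f"maxmemory {budget}mb")    # Redis takes it via the maxmemory directive
```

The point of the headroom term is to keep the total comfortably under the instance's memory so the OOM killer never has a reason to pick the cache.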
CPU is hardly going to be an issue as I said since Memcached / Redis are pretty good at using little CPU, so I can't foresee a setup where Django is ok serving pages but the instance is in trouble since the CPU is burned by the cache.
I'm using cloud VPS instances to host very small private game servers. On Amazon EC2, I get good performance on their micro instance (1 vCPU [single hyperthread on a 2.5GHz Intel Xeon], 1GB memory).
I want to use Google Compute Engine though, because I'm more comfortable with their UX and billing. I'm testing out their small instance (1 vCPU [single hyperthread on a 2.6GHz Intel Xeon], 1.7GB memory).
The issue is that even when I configure near-identical instances with the same game using the same settings, the AWS EC2 instances perform much better than the GCE ones. To give you an idea, while the game isn't Minecraft I'll use that as an example. On the AWS EC2 instances, succeeding world chunks would load perfectly fine as players approach the edge of a chunk. On the GCE instances, even on more powerful machine types, chunks fail to load after players travel a certain distance; and they must disconnect from and re-login to the server to continue playing.
I can provide more information if necessary, but I'm not sure what is relevant. Any advice would be appreciated.
Diagnostic protocols to evaluate this scenario may be more complex than you want to deal with. My first thought is that this shared core machine type might have some limitations in consistency. Here are a couple of strategies:
1) Try backing into the smaller instance. Since you only pay for a minimum of 10 minutes, you could cheaply check whether the performance is better on higher-level machines and then work your way back down. If you have consistent performance problems no matter what the size of the box, then I'm guessing it's something to do with the nature of your application and the nature of their virtualization technology.
2) Try measuring the consistency of the performance. I get that it is unacceptable, but is it unacceptable based on how long it's been running? The nature of the workload? Time of day? If the performance is sometimes good, but sometimes bad, then it's probably once again related to the type of your work load and their virtualization strategy.
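One simple way to measure consistency as suggested above is to time the same fixed workload repeatedly on each provider and compare the spread. This is a minimal sketch; the workload, the sample count, and the helper names are arbitrary choices for illustration, and a CPU loop says nothing about disk or network consistency.

```python
import statistics
import time


def fixed_workload(iterations=200_000):
    """A small, deterministic CPU-bound task."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total


def sample_durations(samples=20, pause_s=0.0):
    """Time the same workload repeatedly and return the durations in seconds."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        fixed_workload()
        durations.append(time.perf_counter() - start)
        time.sleep(pause_s)
    return durations


durations = sample_durations(samples=10)
mean = statistics.mean(durations)
spread = statistics.stdev(durations) / mean  # coefficient of variation
print(f"mean {mean * 1000:.1f} ms, coefficient of variation {spread:.2%}")
```

Run over hours with a non-zero `pause_s`, a markedly higher coefficient of variation on one provider would point at inconsistent CPU allocation rather than at the application.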
Something Amazon is famous for is consistency. They work very hard to manage the consistency of the performance. It shouldn't spike up or down.
My best guess here without all the details is you are using a very small disk. GCE throttles disk performance based on the size. You have two options ... attach a larger disk or use PD-SSD.
See here for details on GCE Disk Performance - https://cloud.google.com/compute/docs/disks
Please post back if this helps.
Anthony F. Voellm (aka Tony the #p3rfguy)
Google Cloud Performance Team