Redis is taking too long to respond - concurrency

We are experiencing very high response latency with Redis, to the point that the INFO command run through redis-cli produces no output.
This server handles requests from around 200 concurrent processes, but it does not store much information (at least to our knowledge). When the server is responsive, the INFO command reports used memory of around 20-30 MB.
When running top on the server during periods of high response latency, CPU usage hovers around 95-100%.
What are some possible causes for this kind of behavior?

It is difficult to propose an explanation based only on the provided data, but here is my guess. I assume that you have already checked the obvious latency sources (the ones linked to persistence), that the slow log shows no Redis command hogging the CPU, and that the job data pickled by Python-rq is not huge.
According to the documentation, Python-rq inserts jobs into Redis as hash objects and lets Redis expire the related keys (500 seconds seems to be the default value) to get rid of the jobs. If you have serious throughput, at some point you will have many items in Redis waiting to be expired, and their number will be high compared to the number of pending jobs.
You can check this by looking at the number of keys with an expiration set (the expires field of the Keyspace section) in the output of the INFO command.
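To make that check concrete, here is a minimal Python sketch that parses the Keyspace lines of INFO output (for example, the text printed by redis-cli INFO keyspace) and computes the share of keys that carry a TTL. The function names parse_keyspace and expiring_share are hypothetical, and the fields are parsed generically because not every Redis version reports the same ones (avg_ttl, for instance).

```python
def parse_keyspace(info_text: str) -> dict:
    """Parse the Keyspace lines of Redis INFO output into per-db counters.

    Lines look like 'db0:keys=1200,expires=1150,avg_ttl=240000'. Fields are
    parsed generically because not every Redis version reports avg_ttl.
    """
    result = {}
    for raw_line in info_text.splitlines():
        line = raw_line.strip()
        db_name, sep, fields = line.partition(":")
        # Keyspace entries are named db0, db1, ...; skip every other INFO line.
        if not sep or not db_name.startswith("db") or not db_name[2:].isdigit():
            continue
        counters = {}
        for pair in fields.split(","):
            key, eq, value = pair.partition("=")
            if eq and value.lstrip("-").isdigit():
                counters[key] = int(value)
        result[db_name] = counters
    return result


def expiring_share(counters: dict) -> float:
    """Fraction of the keys in one database that carry a TTL."""
    keys = counters.get("keys", 0)
    return counters.get("expires", 0) / keys if keys else 0.0
```

If expires is close to keys and the total is large, most of the dataset is waiting to be cleaned up by the expiration mechanism described below.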
Redis expiration is based on a lazy mechanism (applied when a key is accessed) and an active mechanism based on key sampling, which runs in the event loop (in pseudo-background mode, every 100 ms). The key point is that while the active expiration mechanism is running, no Redis command can be processed.
To avoid impacting the performance of client applications too much, only a limited number of keys are processed each time the active mechanism is triggered (by default, 10 keys). However, if more than 25% of the sampled keys are found to be expired, it tries to expire more keys and loops. This is how this probabilistic algorithm automatically adapts its activity to the number of keys Redis has to expire.
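The adaptive loop can be illustrated with a small simulation. This is a simplified Python model, not Redis's actual C implementation: it omits details such as the time budget Redis puts on each cycle, and the sample size default of 10 simply mirrors the figure quoted above (the real value depends on the Redis version). The function name active_expire_cycle and its parameters are hypothetical.

```python
import random
from typing import Dict, Optional, Tuple


def active_expire_cycle(expires: Dict[str, float], now: float,
                        sample_size: int = 10, threshold: float = 0.25,
                        rng: Optional[random.Random] = None) -> Tuple[int, int]:
    """Simulate one run of an adaptive, sampling-based active expiration.

    `expires` maps each key that has a TTL to its absolute expiry time and is
    modified in place. Returns (keys_deleted, sampling_rounds): the number of
    rounds is a proxy for how long the server is busy and not serving clients.
    """
    if rng is None:
        rng = random.Random()
    deleted = 0
    rounds = 0
    while expires:
        rounds += 1
        # Pick a few random keys among those that carry a TTL.
        sample = rng.sample(list(expires), min(sample_size, len(expires)))
        expired = [key for key in sample if expires[key] <= now]
        for key in expired:
            del expires[key]
        deleted += len(expired)
        # Loop again only if more than `threshold` of the sample had expired.
        if len(expired) <= threshold * len(sample):
            break
    return deleted, rounds
```

With 1,000 keys that are all already expired, a single run keeps looping until it has deleted all of them (100 rounds of 10 samples), whereas with 1,000 keys that are not yet expired it stops after one round. The more keys are pending expiration, the longer each run blocks command processing.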
When many keys need to be expired, though, this adaptive algorithm can significantly impact Redis performance. You can find more information here.
My suggestion would be to try to prevent Python-rq from delegating item cleanup to Redis through key expiration. This is a poor design for a queuing system anyway.

I don't think reducing the TTL is the right way to avoid high CPU usage when Redis expires keys.
Didier makes a good point: the current architecture of Python-rq delegates the cleanup of jobs to Redis by using the key-expire feature, and, as Didier said, this is surely not the best way (it is only used when result_ttl is greater than 0).
The problem would then arise when you have a set of keys/jobs with expiration times close to one another, which could happen during bursts of job creation.
But Python-rq sets the key expiration when a worker has finished the job, so this explanation doesn't make much sense: the expirations should be spread over time, with enough of a gap between them to avoid this situation.
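The timing argument can be checked with a few lines of Python. The sketch assumes, as described above, that a key's TTL is set when its job finishes, so it expires at finish time plus TTL; it then counts expirations per 100 ms slot (the active-cycle interval mentioned earlier). The 500-second TTL reuses the default quoted in the previous answer, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def expirations_per_window(finish_times_ms: Iterable[int], ttl_ms: int = 500_000,
                           window_ms: int = 100) -> Counter:
    """Count how many result keys expire in each `window_ms` slot.

    Assumes each key's TTL is set when its job finishes, so a key expires
    at finish_time + ttl. The result maps slot index -> number of keys.
    """
    return Counter((t + ttl_ms) // window_ms for t in finish_times_ms)
```

If one job finishes per second, no slot ever holds more than one expiring key; only if many jobs finish at practically the same moment do expirations pile up in the same slot.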

Related

Using concurrent threads in JMeter and AWS DLT with a .jmx file - how do I provide inputs to achieve 5000 RPS for a 5-minute duration?

We have configured AWS for distributed load testing using - https://aws.amazon.com/solutions/implementations/distributed-load-testing-on-aws/
Our requirement includes achieving 5k RPS.
Please help me understand the inputs that need to be provided here.
Assuming the system supports 5k RPS, what should the Task Count, Concurrency, Ramp Up and Hold For values be in order to achieve 5k RPS using AWS DLT?
We are also trying to achieve it using JMeter concurrent threads. We hope someone can help with the values and explain how to use them.
We don't know.
Have you tried reading the documentation you linked to yourself? For example, for Concurrency there is a chapter called Determine the number of users, which suggests starting with 200 and increasing or decreasing depending on resource consumption.
The same applies to the Task Count: you can either go with a single container with default resources, increase the container resources, or increase the number of containers.
The number of hits per second will mostly depend on your application's response time. For example, given the recommended 200 users, if the response time is 1 second you will get 200 RPS, if it is 2 seconds you will get 100 RPS, if it is 0.5 seconds you will get 400 RPS, and so on. See the What is the Relationship Between Users and Hits Per Second? article for a more comprehensive explanation if needed. The throughput can also be controlled on the JMeter side using the Concurrency Thread Group and the Throughput Shaping Timer, but again, the container(s) must have sufficient resources to produce the desired load.
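The arithmetic above can be written down as a small Python sketch: with each virtual user sending one request at a time, RPS equals users divided by the time per request (response time plus any think time). The 200-users-per-task figure is only the documentation's starting point, and the helper names are hypothetical.

```python
import math


def expected_rps(users: int, response_time_s: float, think_time_s: float = 0.0) -> float:
    """Each virtual user sends a request, waits for the response (plus any
    think time), then sends the next one, so RPS = users / cycle time."""
    return users / (response_time_s + think_time_s)


def users_needed(target_rps: float, response_time_s: float, think_time_s: float = 0.0) -> int:
    """Invert the relation above to size the thread count for a target RPS."""
    return math.ceil(target_rps * (response_time_s + think_time_s))


def tasks_needed(users: int, users_per_task: int = 200) -> int:
    """Split the users across containers; 200 per task is only the starting
    point suggested by the DLT docs, to be tuned by watching resource usage."""
    return math.ceil(users / users_per_task)
```

For example, if the application answered in 0.5 seconds, 5,000 RPS would need about 2,500 users, which at 200 users per task means 13 tasks. The real numbers depend on the measured response time under load.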
With regard to ramp-up: again, we don't know. Personally, I tend to increase the load gradually so that I can correlate the increasing load with other metrics. The JMeter documentation recommends starting with a ramp-up period in seconds equal to the number of users.
The same goes for the time to hold the load: after ramping up to the number of users required to produce 5K RPS, I would recommend holding the load for the duration of the ramp-up period to see how the system behaves: whether it stabilizes when the load stops increasing, whether response times stay flat or go up, etc.
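As a rough Python sketch of those rules of thumb: JMeter starts the threads evenly across the ramp-up period, so thread i starts at i × ramp-up / users, and the plan below defaults the ramp-up to one second per user and the hold time to the ramp-up duration. The helper names are hypothetical, and the defaults are only the starting points described above.

```python
from typing import List, Optional


def thread_start_offsets(users: int, ramp_up_s: float) -> List[float]:
    """Start time of each thread when `users` threads are launched evenly
    over the ramp-up period (thread i starts at i * ramp_up / users)."""
    step = ramp_up_s / users
    return [i * step for i in range(users)]


def load_plan(users: int, ramp_up_s: Optional[float] = None,
              hold_s: Optional[float] = None) -> dict:
    """Rule-of-thumb plan: ramp up over `users` seconds, then hold the full
    load for as long as the ramp-up took, unless explicit values are given."""
    ramp = float(users) if ramp_up_s is None else float(ramp_up_s)
    hold = ramp if hold_s is None else float(hold_s)
    return {"ramp_up_s": ramp, "hold_s": hold, "total_s": ramp + hold}
```

For the 5-minute requirement in the question, the hold time can be fixed explicitly, e.g. load_plan(2500, ramp_up_s=600, hold_s=300).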

How can I cancel a download that's too big or slow in a lambda?

I'm using axios in a Lambda function to download a file from a user-provided URL. Obviously that file could be any size and might be served at any speed. I am concerned that this might create Denial of Service and Denial of Wallet risks.
I don't know if AWS has any charges for Lambda ingress; I haven't been able to find a definitive answer yet. Even if it doesn't, large files could still force my Lambdas to run for longer (costing me money) and potentially push me up against the rate limits I have set, in part, to mitigate the risk of flooding attacks (denying people service).
Likewise, very slow downloads might cause my Lambdas to run until they time out. My timeouts are set fairly high because there is processing to do once the file is downloaded. I'd rather bail after a small handful of seconds, as the input data should always be small and fast.
So what I want is for downloads to abort if they hit a preset maximum size in bytes OR a maximum download time.
If adding these limits isn't possible with Axios then I'm open to using different libraries like node-fetch.
On the axios side, you can set timeout and maxContentLength to limit the request time and the download size, respectively. The maximum Lambda timeout is 15 minutes.
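The underlying technique, independent of axios, is to read the response body in chunks and abort as soon as either a byte budget or a wall-clock deadline is exceeded. Below is a minimal Python sketch of that idea (not axios code); read_limited and DownloadLimitExceeded are hypothetical names, and the deadline is only checked between chunks, so the connection itself still needs a read timeout.

```python
import time
from typing import BinaryIO, Callable, List


class DownloadLimitExceeded(Exception):
    """Raised when a download goes over its byte or time budget."""


def read_limited(stream: BinaryIO, max_bytes: int, max_seconds: float,
                 chunk_size: int = 64 * 1024,
                 clock: Callable[[], float] = time.monotonic) -> bytes:
    """Read `stream` in chunks, aborting once it exceeds `max_bytes` or
    `max_seconds` of wall-clock time. The deadline is only checked between
    chunks, so the underlying socket still needs its own read timeout."""
    deadline = clock() + max_seconds
    chunks: List[bytes] = []
    total = 0
    while True:
        if clock() > deadline:
            raise DownloadLimitExceeded(f"download took longer than {max_seconds} s")
        chunk = stream.read(chunk_size)
        if not chunk:
            return b"".join(chunks)
        total += len(chunk)
        if total > max_bytes:
            raise DownloadLimitExceeded(f"download exceeded {max_bytes} bytes")
        chunks.append(chunk)
```

With the standard library, the stream could be the response returned by urllib.request.urlopen(url, timeout=3), whose timeout argument covers a connection that stalls completely.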
If you may have many lengthy requests, it is better to use EC2. Large numbers of Lambda requests with high memory and long durations end up costing more than EC2. Serverless is indeed cost-effective and operationally easy, especially for spiky workloads. For steady 24/7 workloads with long processing times, a VM is the better choice.

How to handle long requests in Google Cloud Run?

I have hosted my Node app on Cloud Run, and all of my requests are served within 300-600 ms. But one endpoint gets data from a 3rd-party service, so that request takes 1.2-2.5 s to complete.
My questions are:
Are 1.2-2.5 s requests suitable for Cloud Run? Or is there a rule that requests should be completed within xx ms?
Also, see the screenshot: I got this message along with the request in the logs: "The request caused a new container instance to be started and may thus take longer and use more CPU than a typical request"
What caused a new container instance to be started?
Is there any alternative or work around to handle long requests?
Any advice / suggestions would be greatly appreciated.
Thanks in advance.
I don't think that will be an issue unless you're worried about the cost of the CPU/memory time, which honestly should only matter if you're getting 10k+ requests/day. So it probably doesn't matter, and Cloud Run can handle that just fine (my own app serves requests longer than that with no problem).
It's possible that your service was "scaled to zero", meaning that there were no containers left running to serve requests. In that case, a new instance has to be started, and the request waits for whatever initialization/startup costs are associated with that process. It's also possible that the service was auto-scaled because all other instances were at their concurrent-request limits. Make sure that your max concurrent requests per instance setting is greater than one, since Node/Express can handle multiple requests at once. Plus, you'll only get charged for the total time spent, not per request:
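A back-of-the-envelope way to see why the concurrency setting matters is Little's law: requests in flight equal arrival rate times latency, and each instance can hold at most max_concurrency of them. The Python sketch below is only that estimate, not Cloud Run's actual autoscaling algorithm (which also takes other signals such as CPU utilization into account); estimated_instances is a hypothetical name.

```python
import math


def estimated_instances(rps: float, latency_s: float, max_concurrency: int) -> int:
    """Back-of-the-envelope instance count.

    Requests in flight = arrival rate x latency (Little's law); each instance
    can hold at most `max_concurrency` of them. Zero traffic gives zero
    instances, i.e. the service has scaled to zero and the next request
    triggers a cold start.
    """
    in_flight = rps * latency_s
    return math.ceil(in_flight / max_concurrency)
```

At 10 requests/second and 2.5 s per request, a concurrency of 1 implies about 25 instances (and frequent new-instance messages), while a concurrency of 80 fits the same traffic in a single instance.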
For very long operations (30 seconds, minutes or more), it may be a good idea to switch to a different data transfer method. You could use polling, where the client makes a request every 5 seconds and checks whether the response is ready. You could also switch to a push-based system like WebSockets, but Cloud Run didn't support that at the time of writing (WebSocket support has since been added).
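The polling approach can be sketched as a client-side loop in Python. Here check stands in for a hypothetical status request (for example, fetching the state of a job id returned by the long-running endpoint) that returns None until the result is ready; sleep and clock are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until_ready(check: Callable[[], Optional[T]], interval_s: float = 5.0,
                     timeout_s: float = 300.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> T:
    """Call `check` every `interval_s` seconds until it returns a result.

    Raises TimeoutError if the next attempt would start after `timeout_s`.
    """
    deadline = clock() + timeout_s
    while True:
        result = check()
        if result is not None:
            return result
        if clock() + interval_s > deadline:
            raise TimeoutError("result not ready before timeout")
        sleep(interval_s)
```

Each poll is a short request, so no single request runs long; the cost is some extra latency (up to one interval) and a few additional requests.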
TL;DR: longer requests (~10-30 seconds) should be fine unless you're worried about the cost of the increased compute time they may incur at scale.

Throttled Queue Service

I have a function doWork(id) that I'm offloading to some worker servers using AWS SQS. This function can get called very frequently, but I'd like to throttle it so that, for a given id, the work is done no more than once per second.
Is this possible with AWS, or are there any services that offer this functionality?
EDIT: Some clarification.
doWork(id) does some expensive work on a record in a database. This work needs to be updated continuously whenever the user interacts with the record. Thus, I call doWork(id) whenever the user calls a method that edits the record. However, the user may edit the record many times very quickly (I'm building a text editor, so every character is an edit). Rather than running doWork(id) an unnecessary number of times, I'd like to throttle that work so it happens at most once per second.
Because this work is expensive, I enqueue a message in SQS and have a set of "worker" servers that dequeue tasks and run them.
My goal here is to somehow maintain the stateless horizontal scalability of my servers while throttling doWork(id). To make matters a little more complicated, I don't want to throttle the doWork function itself -- I want to throttle the work for each individual record identified by the id passed to doWork.
You could use a Redis instance on ElastiCache and configure your workers to use a distributed rate limiter for keys based on id. There are also many packages for different languages based on this kind of idea that might be ready to run on your workers.
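The core of such a per-id rate limiter is "allow the work only if no marker for this id exists yet, then create a marker that expires after one second". With a shared Redis this is typically a single SET throttle:&lt;id&gt; 1 NX PX 1000, which succeeds only when the key is absent. The Python class below is an in-memory stand-in for that behavior so the logic can be run locally; PerKeyThrottle and the throttle: key prefix are hypothetical.

```python
import time
from typing import Callable, Dict


class PerKeyThrottle:
    """Allow at most one run per key per `interval_s`.

    In-memory stand-in for a shared Redis doing SET throttle:<id> 1 NX PX 1000:
    acquiring succeeds only if the key is absent (or has expired).
    """

    def __init__(self, interval_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.interval_s = interval_s
        self.clock = clock
        self._locked_until: Dict[str, float] = {}

    def try_acquire(self, key: str) -> bool:
        now = self.clock()
        if self._locked_until.get(key, float("-inf")) > now:
            return False  # marker still "present": work ran within the interval
        self._locked_until[key] = now + self.interval_s
        return True
```

A worker would call try_acquire(id) before doWork(id) and skip the message when it returns False. Note that a plain throttle drops the calls that arrive inside the window, so the last edit of a burst may never be processed, which is the gap the next answer's timestamp approach addresses.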
That's interesting. You want to delay the work in case they hit another key within a given time period. If they don't hit another key in that time period, you then want to do the work. You might also want to do it after x seconds even if they continue typing (Auto Save).
The problem is that each keypress sends a message to the queue. When a worker receives the message, they have no idea whether another key has been pressed since the message was sent, and there's no way to look in the queue for other matching messages.
Amazon SQS does have the ability to delay a message, which means it will not be available for receiving for a given period, but this alone can't solve the problem because the worker doesn't know what else has happened.
Bottom line: A traditional queue is not a suitable mechanism for this use-case. You need something akin to a database/cache that can update a "last modified" timestamp each time that a key is pressed. Once that timestamp is more than x seconds old, you should queue the worker.
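The "last modified" approach can be sketched in Python as a small debouncer. In practice the two timestamp maps would live in the shared database/cache mentioned above rather than in process memory, and some periodic job would call pop_due and send one SQS message per returned id; EditDebouncer, record_edit and pop_due are hypothetical names. The max_wait_s parameter covers the auto-save case where the user never stops typing.

```python
from typing import Dict, List


class EditDebouncer:
    """Track edit times per record. A record becomes due once it has had no
    edits for `quiet_s` seconds, or once `max_wait_s` seconds have passed
    since its first pending edit (the auto-save case)."""

    def __init__(self, quiet_s: float = 1.0, max_wait_s: float = 10.0):
        self.quiet_s = quiet_s
        self.max_wait_s = max_wait_s
        self._first_edit: Dict[str, float] = {}
        self._last_edit: Dict[str, float] = {}

    def record_edit(self, record_id: str, now: float) -> None:
        # Called on every keypress: only updates timestamps, enqueues nothing.
        self._first_edit.setdefault(record_id, now)
        self._last_edit[record_id] = now

    def pop_due(self, now: float) -> List[str]:
        # Called periodically; the returned ids are the ones for which a
        # single worker message should now be queued.
        due = [rid for rid, last in self._last_edit.items()
               if now - last >= self.quiet_s
               or now - self._first_edit[rid] >= self.max_wait_s]
        for rid in due:
            del self._first_edit[rid], self._last_edit[rid]
        return sorted(due)
```

However many keypresses arrive, each quiet period (or each max_wait_s interval of continuous typing) produces exactly one queued job per record.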

High CPU in Redis 2.8 (ElastiCache) cache.r3.large

Looking for some help with ElastiCache.
We're using ElastiCache Redis to run a Resque-based queuing system.
This means it's a mix of sorted sets and lists.
In normal operation, everything is OK and we're seeing good response times and throughput.
The CPU level is around 7-10%, and Get+Set commands are around 120-140K operations (all metrics are CloudWatch-based).
But when the system experiences a (mild) burst of data, enqueuing several thousand messages, the server becomes nearly unresponsive:
the CPU is steady at 100% utilization (the metric says 50%, but Redis is using a single core)
the number of operations drops to ~10K
response times slow to a matter of SECONDS per request
We would expect that, even IF the CPU got loaded to such an extent, the throughput would stay the same; this is what we experience when running Redis locally. Redis can saturate the CPU, but throughput stays high: since it is natively single-threaded, no context switching occurs.
AFAWK, we do NOT impose any limits, and we use no persistence and no replication; we are using the basic config.
The node size is cache.r3.large.
We are not using periodic snapshotting.
This looks like the signature of a rogue Lua script.
A defect in such a script could cause a big CPU load while degrading overall throughput.
Are you using one? Try looking for it in the Redis slow log.
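Once slow-log entries have been fetched (for example with SLOWLOG GET 128 in redis-cli, or through a client library), script calls can be singled out by command name. The Python sketch below assumes each entry has already been turned into a dict with a duration field in microseconds (the unit SLOWLOG reports) and a command field holding the command line as a string or bytes; that shape, the 10 ms threshold and the name slow_scripts are assumptions for illustration.

```python
from typing import Iterable, List, Mapping

# Commands that execute Lua scripts.
SCRIPT_COMMANDS = ("EVAL", "EVALSHA")


def slow_scripts(entries: Iterable[Mapping], min_duration_us: int = 10_000) -> List[Mapping]:
    """Return slow-log entries that ran Lua scripts for at least
    `min_duration_us` microseconds, slowest first.

    Each entry is assumed to be a mapping with 'duration' (microseconds)
    and 'command' (the command line as str or bytes).
    """
    hits = []
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", "replace")
        name = command.split(" ", 1)[0].upper()
        if name in SCRIPT_COMMANDS and entry["duration"] >= min_duration_us:
            hits.append(entry)
    return sorted(hits, key=lambda e: e["duration"], reverse=True)
```

If this returns nothing while the CPU is pegged, the slow log should be checked for other expensive commands (for instance on large sorted sets) instead.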