Google Cloud task queues not running in parallel - google-cloud-platform

I have a project in Google Cloud with two task queues: process-request, which receives requests and processes them, and send-result, which sends the result of a processed request to another server. Both run on a service called remote-processing.
My problem is that I can see tasks being enqueued in send-result, but they are only executed after the process-request queue is empty and has processed all requests.
This is the instance config:
instance_class: B4
basic_scaling:
  max_instances: 8
Here is the queue config:
- name: send-result
  max_concurrent_requests: 20
  rate: 1/s
  retry_parameters:
    task_retry_limit: 10
    min_backoff_seconds: 5
    max_backoff_seconds: 20
  target: remote-processing
- name: process-request
  bucket_size: 50
  max_concurrent_requests: 10
  rate: 10/s
  target: remote-processing
Clarification: I don't need the queues to run in a specific order, but I find it very strange that the instance appears to run only one queue at a time, so it only runs the tasks in another queue after it's done with the current queue.

Over what period of time is all of this happening?
How long does a process-request task take to run vs. a send-result task?
One thing that sticks out is that your rate for process-request is much higher than your rate for send-result. So maybe a couple of send-result tasks ARE getting through, but the queue then hits its rate cap and has to run process-request tasks instead.
Same note for bucket_size. The bucket_size for process-request (50) is huge compared to its rate (by the guidance below, rate/5 would suggest a bucket size of about 2 for a 10/s rate):
The bucket size limits how fast the queue is processed when many tasks are in the queue and the rate is high. The maximum value for bucket size is 500. This allows you to have a high rate so processing starts shortly after a task is enqueued, but still limit resource usage when many tasks are enqueued in a short period of time.
If you don't specify bucket_size for a queue, the default value is 5. We recommend that you set this to a larger value because the default size might be too small for many use cases: the recommended size is the processing rate divided by 5 (rate/5).
https://cloud.google.com/appengine/docs/standard/python/config/queueref
Also, with max_instances: 8, does a big backlog of work build up in these queues?
Let's try two things:
Set bucket_size and rate to be the same for both process-request and send-result (a sketch follows below). If that fixes it, then start fiddling with the values to get the desired balance.
Bump up max_instances (currently 8) to see if removing that bottleneck fixes it.
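For the first experiment, the queue config might look something like this; it simply copies process-request's rate and bucket_size onto send-result, so the numbers are only an assumed starting point, not a recommendation:
- name: send-result
  max_concurrent_requests: 20
  rate: 10/s
  bucket_size: 50
  retry_parameters:
    task_retry_limit: 10
    min_backoff_seconds: 5
    max_backoff_seconds: 20
  target: remote-processing
- name: process-request
  bucket_size: 50
  max_concurrent_requests: 10
  rate: 10/s
  target: remote-processing
If both queues then dispatch at the same time, nudge the values back apart until you get the balance you want.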

Related

Failed cloud tasks are not being retried with task queue retry config

I'm using Google Cloud Tasks with HTTP triggers to invoke Cloud Functions. I've set up the Cloud Tasks queue retry parameters as follows:
Max attempts: 2
Max retry duration: 16s
Min backoff: 1s
Max backoff: 16s
Max doublings: 4
I will often have bursts of tasks that create around 600 tasks within a second or two. At times about 15% of these will fail (this is expected and intentional). I expect these failed tasks to retry according to the queue configuration, so I would not expect any task's retry to be scheduled more than 16 seconds beyond its initially scheduled time. However, I'm seeing some failed tasks scheduled several minutes out. Typically, the first few failed tasks will be scheduled for retry only a few seconds out, but some of the last failed tasks in the burst will have retries scheduled many minutes away.
Why are these retry schedules not honoring my retry config?
If it helps, I also have these settings on the queue:
Max dispatches: 40
Max concurrent dispatches: 40
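For reference, these console settings map roughly onto the standard gcloud tasks flags, so the queue's effective configuration can be inspected and re-applied like this (a sketch; my-queue is a placeholder name):
# Show the queue's current retry and rate-limit configuration
gcloud tasks queues describe my-queue
# Re-apply the retry parameters described above
gcloud tasks queues update my-queue \
    --max-attempts=2 \
    --max-retry-duration=16s \
    --min-backoff=1s \
    --max-backoff=16s \
    --max-doublings=4 \
    --max-dispatches-per-second=40 \
    --max-concurrent-dispatches=40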

Distributing tasks over HTTP using SpringBoot non-blocking WebClient (performance question?)

I have a large number of tasks, N, that need to be distributed to multiple HTTP worker nodes via a load balancer. Although there are multiple nodes, n, combined across all nodes we have a single max-concurrency setting, x.
Always
N > x > n
One node can run these tasks in multiple threads. The mean processing time per task is about 50 seconds to 1 minute. I'm using WebClient to distribute the tasks and a Mono response from the workers.
There is a distributor, and I designed the process as follows:
1. Remove a task from the queue.
2. Send the task via a POST request using WebClient and subscribe immediately with a subscriber instance.
3. Halt new subscriptions when the max concurrency x is reached.
4. When any one of the distributed tasks completes, it calls the on-accept(T) method of the subscriber.
5. If the task queue is not empty, remove and send the next, (x+1)th, task.
6. Keep track of the total number of completed tasks.
7. If all tasks are completed and the queue is empty, set the CompletableFuture object as complete.
8. Exit.
The above process works fine. I tested it with N=5000, n=10 and x=25.
Now the confusion: in this design we always have x concurrent subscriptions. As soon as one ends we create another, until all tasks are completed. What is the impact of this in a large-scale production environment? If the number of concurrent subscriptions increases (the value of x > 10,000) through the HTTP(S) load balancer, is that going to have a serious impact on performance and network latency? Our expected production volume will be something like this:
N=200,000,000
n=100
x=10,000
I would be grateful if someone with Reactor and WebClient expertise could comment on this approach. Our main concern is having too many concurrent subscriptions.
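For illustration, here is a minimal Reactor sketch of the pattern described above, using flatMap's concurrency argument as x; the load-balancer URL, the /process endpoint and the String task payloads are made-up placeholders:

import java.util.List;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class TaskDistributor {

    // Sends every task to the worker pool, keeping at most maxConcurrency requests in flight.
    static Mono<Long> distribute(List<String> tasks, int maxConcurrency) {
        WebClient client = WebClient.create("http://load-balancer.example.com"); // placeholder URL
        return Flux.fromIterable(tasks)                      // the task queue
                .flatMap(task -> client.post()
                                .uri("/process")             // placeholder endpoint
                                .bodyValue(task)
                                .retrieve()
                                .bodyToMono(String.class)
                                .onErrorResume(e -> Mono.empty()), // ignore failures in this sketch
                        maxConcurrency)                      // x: bounded number of in-flight requests
                .count();                                    // completes once every task has finished
    }

    public static void main(String[] args) {
        distribute(List.of("task-1", "task-2", "task-3"), 25)
                .block(); // block only in a demo; subscribe / complete a CompletableFuture in production
    }
}

flatMap with a concurrency limit gives the behaviour of steps 3-5 above (a finished request frees a slot for the next task) without hand-rolling the subscriber bookkeeping.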

Priority weight is not being honoured when using the Celery executor

I was using Airflow 1.10.10 with the Celery executor. I defined two DAGs, each with three tasks. The same pool id was used in both DAGs/tasks, and the pool was configured with 3 slots. The first DAG (say High_priority) had priority_weight 10 for each task. The second DAG (say Low_priority) had the default priority_weight (that is, 1). I first submitted 5 Low_priority DAG runs and waited until 3 low-priority tasks had moved into the running state. Then I submitted 4 high-priority DAG runs. I was expecting that when a pool slot became available in the next scheduling round, a high-priority task would be moved into the QUEUED state. But the high-priority tasks remained in the SCHEDULED state. I repeated this 10-15 times and observed the same thing every time.
However, this works fine when I move to the LocalExecutor.
Please suggest a fix or workaround for this priority_weight issue in the CeleryExecutor.

AWS Lambda depleting the SQS queue very slowly

I have an SQS queue and a Lambda function which consumes the queue with a batch size of 10.
Lambda
Reserve concurrency = 600
Timeout = 15 minutes
Memory = 640 MB (but using 150-200 MB per execution)
Processing one item from the queue takes about 10 seconds.
SQS
Messages Available (Visible): 5,310
Messages in Flight (Not Visible): 3,355
Default Visibility Timeout: 20 minutes
With these settings, I'm expecting my Lambda function to run at 600 concurrent invocations, because as you can see the queue is full and there are items waiting to be received. So the function shouldn't be idle and should use all of the available concurrency.
I'm aware of the burst during the first minute and that my concurrency then increases every minute until it hits the limit. But my invocation count is always between 40 and 80. It never hits 600, and the queue is depleted very slowly. And (according to the logs) almost none of the queue items are failing, so they are not going back to the queue.
What is wrong with my settings?
EDIT:
Also, another chart: it increased for a moment and then decreased again.

Concurrency Thread Group showing more samples than defined

I am using a Concurrency Thread Group with the following values:
Target Concurrency: 200
Ramp-Up Time: 5 min
Ramp-Up Step Count: 10
Hold Target Rate Time: 0 min
Thread Iteration Limit: 1
I am using a Throughput Controller as a child of the Concurrency Thread Group, with Total Executions, Throughput = 1, and "Per User" selected.
I have 5 HTTP Requests. What I expected is that each HTTP request would get 200 users, but it shows more than 300 users.
Can anyone tell me whether my expectation is wrong or my setup is wrong?
What is the best way to do this?
Your expectation is wrong. With regard to your setup, we don't know what you're trying to achieve.
The Concurrency Thread Group maintains the defined concurrency, so with your settings (200 target concurrency / 10 steps = 20 users per step, and 5 min / 10 steps = 30 seconds per step):
JMeter will start with 20 users
In 30 seconds another 20 users will be kicked off, so you will have 40 users
In 60 seconds another 20 users will arrive, so you will have 60 users
etc.
Once started, the threads will begin executing the Sampler(s) from top to bottom (or according to the Logic Controllers), and the actual number of requests will depend on your application's response time.
Your "Thread Iteration Limit" setting allows a thread to loop only once, so a thread is stopped once it has executed all the samplers; however, the Concurrency Thread Group will kick off another thread to replace the ended one in order to maintain the defined concurrency.
If you want to limit the total number of executions to 200 you can go for the Throughput Controller, and this way you will have only 200 executions of its children.
Be aware that with the above setup your test will still run for 5 minutes; however, the threads will not be executing samplers after 200 total executions.