When is it necessary to queue the requests from the client - concurrency

I have heard that there is a limit for a server for the requests number it can process.
So if the requests from the client are large than the number people will queue the requests.
So I have two problems:
1 When
How to decide if it is necessary to queue the requests? How to measure the largest number?
2 How
If the queue is unavoidable, so where should be the queue done?
For a J2EE application using spring web mvc as the framework, I want to know if the queue should be put in the Controller or the Model or the DAO?
3 Is there a idea which can avoid the queue but keeping providing the service?

First you have to establish your limit at the server actually is. Its likely that its a limit on the frequency of messages, ie. maybe your limited to sending 10 requests a second. If thats the case then your would need to keep a count of how many messages you've sent out in a second, then before you send out a request check to see if you will breach this limit, if this is true then you must make the thread wait until the second is up. If not your free to send the request. This thread would be reading from a queue of outbound messages.
If the server limit is determined in an other way, i.e. dynamically based on its current load, which sounds like it might be in your case, there must be a continuous feed of request limits which you must process to determine the current limit. Once you have this limit you can process the requests in the same way as mentioned in the first paragraph.
As for where to put the queue and the associated logic, i'd put it in the controller.
I don't think there is a way to avoid the queue, you are forced to throttle your requests an therefore you must queue your outbound requests internally so that they are not lost, and will be processed at some point in the future.

Related

Throttled Queue Service

I have a function doWork(id) that I'm offloading to some worker servers using AWS SQS. This function can get called very frequently but I'd like to throttle the function so that for a given id, the work is don't no more than once per second.
Is it possible with AWS / are there any services that feature this functionality?
EDIT: Some clarification.
doWork(id) does some expensive work on a record in a database. This work needs to continuously update whenever the user interacts with the record. Thus, I call doWork(id) whenever the user called a method that edits the record. However, the user may edit the record many times very quickly (I'm building a text editor so every character is an edit). Rather than doWork(id) a unnecessary amount of times, I'd like to throttle that work so it happens at most once per second.
Because this work is expensive, I enqueue a message in SQS and have a set of "worker" servers that dequeue tasks and run them.
My goal here is to somehow maintain the stateless horizontal scalability of my servers while throttling doWork(id). To make matters a little more complicated, I don't want to throttle the doWork function itself -- I want to throttle the work for each individual record identified by the id passed to doWork.
You could use a Redis instance on ElastiCache and configure your workers to use a distributed rate limiter for keys based on id. There are also many packages for different languages based on this kind of idea that might be ready to run on your workers.
That's interesting. You want to delay the work in case they hit another key within a given time period. If they don't hit another key in that time period, you then want to do the work. You might also want to do it after x seconds even if they continue typing (Auto Save).
The problem is that each keypress sends a message to the queue. When a worker receives the message, they have no idea whether another key has been pressed since the message was sent, and there's no way to look in the queue for other matching messages.
Amazon SQS does have the ability to delay a message, which means it will not be available for receiving for a given period, but this alone can't solve the problem because the worker doesn't know what else has happened.
Bottom line: A traditional queue is not a suitable mechanism for this use-case. You need something akin to a database/cache that can update a "last modified" timestamp each time that a key is pressed. Once that timestamp is more than x seconds old, you should queue the worker.

Create workers dynamically (ActiveMQ)

I want to create a web application were a client calls a REST Webservice. This returns OK-Status for the client (with a link to the result) and creates a new message on an activeMQ Queue. On the listeners side of the activeMQ there should be worker who process the messages.
Iam stucking here with my concept, because i dont really know how to determine the number of workers i need. The workers only have to call web service interfaces, so no high computation power is needed for the worker itself. The most time the worker has to wait for returning results from the called webservice. But one worker can not handle all requests, so if a limit of requests in the queue is exceeded (i dont know the limit yet), another worker should treat the queue.
What is the best practise for doing this job? Should i create one worker per Request and destroying them if the work is done? How to dynamically create workers based on the queue size? Is it better to run these workers all the time or creating them when the queue requiere that?
I think a Topic/Suscriber architecture is not reasonable, because only one worker should care about one request. Lets imagine of 100 Requests per Minute average and 500 requests on high workload.
My intention is to get results fast, so no client have to wait for it answer just because not properly used ressources ...
Thank you
Why don't you figure out the max number of workers you'd realistically be able to support, and then make that number and leave them running forever? I'd use a prefetch of either 0 or 1, to avoid piling up a bunch of messages in one worker's prefetch buffer while the others sit idle. (Prefetch=0 will pull the next message when the current one is finished, whereas prefetch=1 will have a single message sitting "on deck" available to be processed without needing to get it from the network but it means that a consumer might be available to consume a message but can't because it's sitting in another consumer's prefetch buffer waiting for that consumer to be read for it). I'd use prefetch=0 as long as the time to download your messages from the broker isn't unreasonable, since it will spread the workload as evenly as possible.
Then whenever there are messages to be processed, either a worker available to process the next message (so no delay) or all the workers are processing messages (so of course you're going to have to wait because you're at capacity, but as soon as there's a worker available it will take the next message from the queue).
Also, you're right that you want queues (where a message will be consumed by only a single worker) not topics (where a message will be consumed by each worker).

sampled machines when using queues

I am new to Amazon Web Services and am currently trying to get my head around how Simple Queue Service (SQS) works.
In the link ReceiveMessage the following is mentioned:
Short poll is the default behavior where a weighted random set of
machines is sampled on a ReceiveMessage call. This means only the
messages on the sampled machines are returned. If the number of
messages in the queue is small (less than 1000), it is likely you will
get fewer messages than you requested per ReceiveMessage call. If the
number of messages in the queue is extremely small, you might not
receive any messages in a particular ReceiveMessage response; in which
case you should repeat the request.
What I understand there is one queue and many machines/instances can read the messages. What is not clear to me is what does "weighted random set of machines" means? Is there more than one queue on a number of machines? Clearly I am lacking some knowledge on on SQS works.
I believe what this means is that because SQS is geographically distributed, not all of the machines (amazon's servers that have your queue) will have the exact same queue content at all times because they won't always be in sync with each other at every instant.
You don't know or control from which of amazons servers it will serve messages from, it uses an algorithm to figure out which messages are sent to you when you request some. That is why you don't always get messages when you ask for them, and occasionally the same message will get served up more than once; you need to make sure whatever your processing entails it can deal with the possibility that it is processing something that has already been processed by another of your worker machines.

Maximum time between an asynchronous call and response (web-services)

Are there any best practices that dictate the maximum time between an asynchronous call and its corresponding response.
Basically I have a process that takes a long time to run (eg: 5 minutes). Option 1: I could expose the process as an asynchronous call. In which case the user calls my service and then at some later time, I respond with a process status.
Option 2
The other way I could implement it is to setup the system such that there is a one-way operation on my web-service that begins the process and immediately returns an id for the process. I could then mandate that the consumer provide a one-way operation, that I can call and report back when the process is done.
The first option is easier as I dont have to mandate anything from the caller. The second seems better as I can report back at anytime (5 minutes to years later).
As I have complete control over the caller and its an internally available service, I am leaning towards option 2.
So I am wondering if there are any time limits imposed on async calls (can they span days? if not what is the best practice). Is option 2 a standard pattern employed?
References would be extremely useful.
Option #2 is better as it's more event driven.
However, there exists an Option #3. Client issues request to server. Server queues request and responds with the id. Client checks back every so often, passing the request id, to see if it's completed.
This way you don't have to depend on the client being available when the request is completed.
I'd probably mix options #2 and #3 and let the client choose if they want an event fired on their side or if they just want to check back later.
UPDATE
Rajah has asked about the maximum time between async request and response. For a WEB application, this is typically measured in seconds. Most servers have timeout values that are typically defaulted in the 30 second range. Personally, I think this is too long.
Consider that an Async call requires the communications channel between the client and server to be open for the duration. How many of those channels can a single server handle? More to the point, how many channels will you have to maintain as requests are made? This can become quite outrageous even if you do control both ends.
Whatever is hosting your services is going to determine the maximum amount of time to keep a request open. Again, every server I've seen measures this in seconds.

Web Services design

Company A has async pooling based webservice for notifications. Company B checks for notifications. Every time when it reads new notifications A deletes them from the system. Thus subsequent read requests return only new notifications. There is also requirement for the client B to interrupt the connection if there is no response within 30 sec.
This causes one potential problem: Due to unexpected slowness it is possible for A get the request deleted a notification and send the response back while B is already interrupted the connection. Under this scenario notification gets lost. Now one can argue that the core problem lies within operation realm (the HTTP response must be delivered withing 20 sec ) still on practice it is not always feasible.
How to design B (the client) to avoid this problem?
One way I can see is to do not delete the notifications by A and make B be aware of its state, so that it knows starting from what ID it needs to process notifications, but that presumes that ID will be sequential. Which is controlled by A. Even if B defines its own sequence A still has to be altered to return it back.
Are there any other approaches?
Thanks!
Web services in general are unreliable enough that it's rarely a good idea to make a "read" request serve double-duty as a "delete" request, especially without the client's knowledge. There is just too much risk of a connection dropping or timing out. There is no way to get around this only by modifying the client, because it's the server that is at fault here - the way it's designed is fundamentally unsuited for a web service.
I think you're on the right track with the incrementing IDs idea. The client knows (or can be modified to know) which notifications it's received, so if it can supply the ID of the last message it's received when it polls for notifications, the server should be able to respond based on that ID.
It really seems like Company A's webservice should be synchronous instead of asynchronous. If that is not possible, it may be a good idea to send a "ACK"-like response to a new Company A webservice that indicates a specific notification was received (by Company B) and can be deleted.