I am debugging an ASMX web service that receives "bursts" of requests, i.e. it is likely that the web service will receive 100 asynchronous requests within about 1 or 2 seconds. Each request seems to take about a second to process (this is expected and I'm OK with this performance). What is important, however, is that each request is dealt with sequentially and no parallel processing takes place. I do not want any concurrent request processing because of the external components called by the web service. Is there any way I can force the web service to handle each request sequentially?
I have seen the maxconnection attribute in the machine.config, but this seems to apply only to outbound connections, whereas I wish to throttle the incoming connections.
Please note that refactoring into WCF is not an option at this point in time.
We are using IIS 6 on Windows Server 2003.
What I've done in the past is to simply put a lock statement around any access to the external resource I was using. In my case, it was a piece of unmanaged code that claimed to be thread-safe, but which in fact would trash the C runtime library heap if accessed from more than one thread at a time.
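For illustration, a minimal sketch of that idea, transposed to Java (the original was the C# lock statement; ExternalComponent is just a stand-in for the real non-thread-safe component):

// Serialize all access to a non-thread-safe external component by funnelling
// every call through a single lock (the Java equivalent of a C# lock statement).
public class ExternalComponentGateway {

    // Stand-in for the real (non-thread-safe) external component.
    static class ExternalComponent {
        static String process(String input) {
            return "processed: " + input;
        }
    }

    private static final Object GATE = new Object();

    public String callExternal(String input) {
        // Only one thread at a time gets past this point, so the external
        // component never sees concurrent calls.
        synchronized (GATE) {
            return ExternalComponent.process(input);
        }
    }
}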
Perhaps you should be queuing the requests up internally and processing them one by one?
It may cause the clients to poll for results (if they even need them), but you'd get the sequential pipeline you wanted...
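A rough sketch of that internal queue with a single dedicated worker thread (the names and the CompletableFuture plumbing are illustrative, not ASMX-specific):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Requests are enqueued as work items; one worker thread drains the queue,
// so processing stays strictly sequential even if requests arrive in bursts.
public class SequentialProcessor {

    private static class WorkItem {
        final String payload;
        final CompletableFuture<String> result = new CompletableFuture<>();
        WorkItem(String payload) { this.payload = payload; }
    }

    private final BlockingQueue<WorkItem> queue = new LinkedBlockingQueue<>();

    public SequentialProcessor() {
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    WorkItem item = queue.take();                  // blocks until work arrives
                    item.result.complete(handle(item.payload));    // one at a time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "sequential-worker");
        worker.setDaemon(true);
        worker.start();
    }

    // Called by each incoming request; returns a future the caller can wait on or poll.
    public CompletableFuture<String> submit(String payload) {
        WorkItem item = new WorkItem(payload);
        queue.add(item);
        return item.result;
    }

    // Placeholder for the real per-request work (roughly one second in the question).
    private String handle(String payload) {
        return "done: " + payload;
    }
}

Each incoming request calls submit(...) and either blocks on the returned future (keeping the current synchronous behaviour) or hands the client a ticket to poll with later.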
In IIS 7 you can set a limit on the number of connections allowed to a web site. Can you use IIS 7?
Background:
I have a local application that processes the user's input for approximately 3 seconds and then returns an answer (output) to the user.
(I don't want to go into details about my application, so as not to complicate the question and to keep it purely architectural.)
My Goal:
I want to make my application a service in the cloud and expose an API
(for the upcoming website and for clients that will connect to the service without installing the software locally).
Possible Solutions:
Deploy WCF in the cloud and host my application there, so clients can invoke the service and use my application in the cloud (RPC style).
Use a Web API that inserts the request into a queue; a worker role then dequeues requests and posts the results to a DB. The client sends one request to create the queue entry and another request to get the result (which the Web API reads from the DB).
The Problems:
If I go with the WCF solution (#1), I can't handle heavy loads of requests, maybe 10-20 simultaneously.
If I go with the WebAPI-Queue-WorkerRole solution (#2), the client will sometimes need to request the results multiple times, which can be a problem.
If I go with the WebAPI-Queue-WorkerRole solution (#2), the process isn't synchronous: the client will not get the result as soon as his request has been processed; he has to ask for it.
Questions:
In the WebAPI-Queue-WorkerRole solution (#2), can I somehow alert the client once his request has been processed, so that I can save the client multiple requests (for the result)?
Isn't asking multiple times for the result an outdated approach? I remember that 10-15 years ago it was accepted, but now? I know that the VirusTotal API uses this kind of design.
Is there a better solution? One that will handle heavy loads and will be sync or async (returning the result to the client once it is done)?
Thank you.
If you're using Azure, why not simply fire up more servers and use load balancing to handle more load? In that way, as your load increases, you have more servers to handle the requests.
Microsoft recently made available the Azure Service Fabric, which gives you a lot of control over spinning up and shutting down these services.
I'm trying to implement some throttles on our REST API. A typical approach is to block requests (with a 403 or 429 response) after a certain threshold. However, I've seen one API that adds a delay to the response instead.
As you make calls to the API, we will be looking at your average calls per second (c/s) over the previous five-minute period. Here's what will happen:
over 3c/s and we add a 2 second delay
over 5c/s and we add a 4 second delay
over 7c/s and we add a 5 second delay
From the client's perspective, I see this being better than getting back an error. The worst that can happen is that you'll slow down.
I am wondering how this can be achieved without negatively impacting the app server, i.e. to add those delays, the server needs to keep the request open, keeping more and more request processors busy and leaving less capacity for new incoming requests.
What's the best way to accomplish this? (i.e. is this something that can be done on the web server / load balancer so that the application server is not negatively affected? Is there some kind of a throttling layer that can be added for this purpose?)
We're using Django/Tastypie, but the question is more on the architecture/conceptual level.
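For reference, the tiers quoted above boil down to a lookup like this (sketched in Java purely for illustration; how the per-client rate is measured, and how the delay is applied without blocking workers, are separate concerns):

// Illustrative only: map a client's average calls-per-second over the last
// five minutes to the extra delay described in the quoted policy.
public final class DelayPolicy {

    public static long delayMillisFor(double avgCallsPerSecond) {
        if (avgCallsPerSecond > 7) return 5_000;
        if (avgCallsPerSecond > 5) return 4_000;
        if (avgCallsPerSecond > 3) return 2_000;
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(delayMillisFor(2.0)); // 0
        System.out.println(delayMillisFor(4.2)); // 2000
        System.out.println(delayMillisFor(8.0)); // 5000
    }
}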
If you are using a synchronous application server, which is the most common setup for Django applications (for example gunicorn with the default --worker-class sync), then adding such a delay in the application would indeed have a very bad impact on performance: a worker handling a delayed request is blocked for the whole delay period.
But you can use an asynchronous application server (for example gunicorn with --worker-class gevent), and then the overhead should be negligible: a worker that handles a delayed request can serve other requests while the delay is in progress.
Doing this in the reverse proxy server may be a better option, because it lets you adjust the policy easily and flexibly. There is an external nginx module for exactly this kind of thing.
I need to invoke a long-running task via a SOAP web service, using JAX-WS, specifically Apache CXF 2.6, on both ends.
I see that I can enable async methods in the CXF code generator, which creates two async methods per operation. Because of NAT issues, I cannot use WS-Addressing and callbacks. So I may want to use the other polling method.
I need to be sure that there will be no socket read timeouts using this mechanism, so I want to understand how it works.
Is it the case that a SOAP request is made to the server in a background thread which keeps the same, single, HTTP connection open, and the Future#isDone() checks to see if that thread has received a response?
If so, is there not a risk that a proxy server in between may define its own timeout and cause an error if the server takes too long to respond?
What do other people do for invoking long running tasks via SOAP?
Yes, it would just keep checking the connection until a response is received. If something occurs between the client and server and the connection is lost, the response would not be retrievable.
For really long-running things, the better approach would be to split the long-running operation into two methods: one that takes the input, launches the work on a background thread, and just returns some sort of unique identifier; and a second method that takes that identifier and returns the result. The client could call that second method to poll the server. That call could be long-running and block, or use the async methods, or similar. If THAT request times out, the client can just call it again.
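A hedged sketch of that two-method split (the names, the thread pool, and the in-memory result map are all illustrative, not CXF-specific API; a real service would also need expiry and error handling):

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import javax.jws.WebMethod;
import javax.jws.WebService;

// submit() starts the work in the background and returns a ticket;
// getResult() is polled with that ticket until the work is done.
@WebService
public class LongTaskService {

    private final ExecutorService executor = Executors.newFixedThreadPool(4);
    private final Map<String, String> results = new ConcurrentHashMap<>();

    @WebMethod
    public String submit(final String input) {
        final String ticket = UUID.randomUUID().toString();
        executor.submit(() -> results.put(ticket, doExpensiveWork(input)));
        return ticket; // returns immediately, no long-lived HTTP connection
    }

    @WebMethod
    public String getResult(String ticket) {
        // Null (or a dedicated status value) means "not finished yet";
        // the client simply polls again later.
        return results.get(ticket);
    }

    private String doExpensiveWork(String input) {
        // Placeholder for the actual long-running task.
        return "result for " + input;
    }
}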
Under what conditions would one favor apps talking via a message queue instead of via web services (I just mean XML or JSON or YAML or whatever over HTTP here, not any particular type)?
I have to talk between two apps on a local network. One will be a web app and will have to send commands to another app (running on different hardware). The requests are things like creating users, moving files around, and creating directories. Under what conditions would I prefer XML Web Services (or straight TCP or something) to using a message queue?
The web app is Ruby on Rails, but I think the question is broader than that.
When you use a web service you have a client and a server:
If the server fails, the client must take responsibility for handling the error.
When the server is working again, the client is responsible for resending the request.
If the server gives a response to the call and the client fails, the operation is lost.
You don't have contention control; that is, if a million clients call a web service on one server within a second, your server will most probably go down.
You can expect an immediate response from the server, but you can handle asynchronous calls too.
When you use a message queue like RabbitMQ, Beanstalkd, ActiveMQ, IBM MQ Series, or Tuxedo, you expect different and more fault-tolerant results:
If the server fails, the queue persists the message (optionally, even if the machine shuts down).
When the server is working again, it receives the pending message.
If the server gives a response to the call and the client fails, the message is persisted as long as the client hasn't acknowledged the response.
You have contention control: you can decide how many requests are handled by the server (call it a worker instead).
You don't expect an immediate synchronous response, but you can implement/simulate synchronous calls.
Message queues have a lot more features, but these are some rules of thumb to decide whether you want to handle error conditions yourself or leave them to the message queue.
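As one concrete illustration of those guarantees, here is a sketch with the RabbitMQ Java client (host and queue name are placeholders): a durable queue plus persistent messages covers "the queue persists the message even if the machine shuts down", and manual acknowledgements cover "if the client didn't acknowledge, the message is redelivered".

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;
import com.rabbitmq.client.MessageProperties;

import java.nio.charset.StandardCharsets;

public class QueueSketch {

    private static final String QUEUE = "commands";

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {

            // durable = true: the queue survives a broker restart
            channel.queueDeclare(QUEUE, true, false, false, null);

            // Producer side: persistent message, fire-and-forget from the caller's view
            channel.basicPublish("", QUEUE,
                    MessageProperties.PERSISTENT_TEXT_PLAIN,
                    "create-user alice".getBytes(StandardCharsets.UTF_8));

            // Consumer (worker) side: one unacknowledged message at a time, explicit ack
            channel.basicQos(1);
            DeliverCallback onMessage = (consumerTag, delivery) -> {
                String command = new String(delivery.getBody(), StandardCharsets.UTF_8);
                System.out.println("handling: " + command);
                // Ack only after the work succeeded; otherwise the broker redelivers.
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
            };
            channel.basicConsume(QUEUE, false, onMessage, consumerTag -> { });

            Thread.sleep(2000); // give the consumer a moment before the connection closes
        }
    }
}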
There's been a fair amount of recent research into how REST HTTP calls could replace the message queue concept.
If you introduce the concept of a process and a task as resources, the need for a middle messaging layer starts to evaporate.
Ex:
POST /task/name
- Returns a 202 accepted status immediately
- Returns a resource url for the created task: /task/name/X
- Returns a resource url for the started process: /process/Y
GET /process/Y
- Returns status of ongoing process
A task can have multiple steps for initialization, and a process can return status when polled or POST to a callback URL when complete.
This is dead simple, and becomes quite powerful when you realize that you can now subscribe to an RSS/Atom feed of all running processes and tasks without any middle layer. Any queuing system is going to require some sort of web front end anyway, and this concept has it built in without another layer of custom code.
Your resources exist until you delete them, which means you can view historical information long after the process and task complete.
You have built in service discovery, even for a task that has multiple steps, without any extra complicated protocols.
GET /task/name
- returns form with required fields
POST (to the URL provided in the form's "action" attribute)
Your service discovery is an HTML form - a universal and human readable format.
The entire flow can be used programmatically or by a human, using universally accepted tools. It's client driven, and therefore RESTful. Every tool created for the web can drive your business processes. You still have alternate message channels by POSTing asynchronously to a separate array of log servers.
After you consider it for a while, you sit back and start to realize that REST may just eliminate the need for a messaging queue and an ESB altogether.
http://www.infoq.com/presentations/BPM-with-REST
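A rough sketch of the pattern described above, using only the JDK's built-in HTTP server (paths, ids, and payloads are illustrative): POST /task starts the work and answers 202 with a /process/{id} URL, and GET /process/{id} reports the status.

import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class TaskProcessServer {

    private static final Map<Long, String> STATUS = new ConcurrentHashMap<>();
    private static final AtomicLong IDS = new AtomicLong();
    private static final ExecutorService WORKERS = Executors.newFixedThreadPool(4);

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);

        // POST /task: accept the work, start it in the background, return 202
        server.createContext("/task", exchange -> {
            if (!"POST".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1);
                exchange.close();
                return;
            }
            long id = IDS.incrementAndGet();
            STATUS.put(id, "running");
            WORKERS.submit(() -> {
                doWork();                       // placeholder for the actual task
                STATUS.put(id, "done");
            });
            String processUrl = "/process/" + id;
            exchange.getResponseHeaders().add("Location", processUrl);
            respond(exchange, 202, "accepted, poll " + processUrl);
        });

        // GET /process/{id}: report the status of an ongoing or finished process
        server.createContext("/process", exchange -> {
            String path = exchange.getRequestURI().getPath();   // e.g. /process/7
            long id = Long.parseLong(path.substring(path.lastIndexOf('/') + 1));
            String status = STATUS.getOrDefault(id, "unknown");
            respond(exchange, "unknown".equals(status) ? 404 : 200, status);
        });

        server.start();
    }

    private static void doWork() {
        try { Thread.sleep(3000); } catch (InterruptedException ignored) { }
    }

    private static void respond(HttpExchange exchange, int code, String body) throws IOException {
        byte[] bytes = (body + "\n").getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(code, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }
}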
Message queues are ideal for requests which may take a long time to process. Requests are queued and can be processed offline without blocking the client. If the client needs to be notified of completion, you can provide a way for the client to periodically check the status of the request.
Message queues also allow you to scale better across time. It improves your ability to handle bursts of heavy activity, because the actual processing can be distributed across time.
Note that message queues and web services are orthogonal concepts, i.e. they are not mutually exclusive. For example, you can have an XML-based web service which acts as an interface to a message queue. I think the distinction you're looking for is message queues versus request/response; the latter is when the request is processed synchronously.
Message queues are asynchronous and can retry a number of times if delivery fails. Use a message queue if the requester doesn't need to wait for a response.
The phrase "web services" make me think of synchronous calls to a distributed component over HTTP. Use web services if the requester needs a response back.
I think in general, you'd want a web service for a blocking task (this tasks needs to be completed before we execute more code), and a message queue for a non-blocking task (could take quite a while, but we don't need to wait for it).
I want to know the technical reasons why the Lift web framework has high performance and scalability. I know it uses Scala, which has an actor library, but according to the install instructions its default configuration is with Jetty. So does it use the actor library to scale?
Also, is the scalability built in right out of the box? Just add additional servers and nodes and it will automatically scale; is that how it works? Can it handle 500,000+ concurrent connections with supporting servers?
I am trying to create a web services framework for the enterprise level that can beat what is out there and is easy to scale, configurable, and maintainable. My definition of scaling is that you just add more servers and you are able to accommodate the extra load.
Thanks
Lift's approach to scalability is within a single machine. Scaling across machines is a larger, tougher topic. The short answer there is: Scala and Lift don't do anything to either help or hinder horizontal scaling.
As far as actors within a single machine, Lift achieves better scalability because a single instance can handle more concurrent requests than most other servers. To explain, I first have to point out the flaws in the classic thread-per-request handling model. Bear with me, this is going to require some explanation.
A typical framework uses a thread to service a page request. When the client connects, the framework assigns a thread out of a pool. That thread then does three things: it reads the request from a socket; it does some computation (potentially involving I/O to the database); and it sends a response out on the socket. At pretty much every step, the thread will end up blocking for some time. When reading the request, it can block while waiting for the network. When doing the computation, it can block on disk or network I/O. It can also block while waiting for the database. Finally, while sending the response, it can block if the client receives data slowly and TCP windows get filled up. Overall, the thread might spend 30-90% of its time blocked. It spends 100% of its time, however, on that one request.
A JVM can only support so many threads before it really slows down. Thread scheduling, contention for shared-memory entities (like connection pools and monitors), and native OS limits all impose restrictions on how many threads a JVM can create.
Well, if the JVM is limited in its maximum number of threads, and the number of threads determines how many concurrent requests a server can handle, then the number of concurrent requests the server can handle will be capped by the number of threads the JVM can support.
(There are other issues that can impose lower limits---GC thrashing, for example. Threads are a fundamental limiting factor, but not the only one!)
Lift decouples thread from requests. In Lift, a request does not tie up a thread. Rather, a thread does an action (like reading the request), then sends a message to an actor. Actors are an important part of the story, because they are scheduled via "lightweight" threads. A pool of threads gets used to process messages within actors. It's important to avoid blocking operations inside of actors, so these threads get returned to the pool rapidly. (Note that this pool isn't visible to the application, it's part of Scala's support for actors.) A request that's currently blocked on database or disk I/O, for example, doesn't keep a request-handling thread occupied. The request handling thread is available, almost immediately, to receive more connections.
This method for decoupling requests from threads allows a Lift server to have many more concurrent requests than a thread-per-request server. (I'd also like to point out that the Grizzly library supports a similar approach without actors.) More concurrent requests means that a single Lift server can support more users than a regular Java EE server.
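This is not Lift's actual implementation, but the general "don't pin a thread per in-flight request" shape can be sketched with the standard Servlet 3.0 async API: the container thread is released immediately, and a small worker pool completes the response later.

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Generic illustration only: slow work no longer occupies a request-handling
// thread, so the container can keep accepting new connections.
@WebServlet(urlPatterns = "/slow", asyncSupported = true)
public class DecoupledServlet extends HttpServlet {

    private final ExecutorService workers = Executors.newFixedThreadPool(8);

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        AsyncContext ctx = req.startAsync();      // request stays open, thread is released
        workers.submit(() -> {
            try {
                String result = doSlowWork();     // e.g. database or remote I/O
                ctx.getResponse().getWriter().println(result);
            } catch (IOException e) {
                // In a real app, log and produce an error response here.
            } finally {
                ctx.complete();                   // hand the response back to the container
            }
        });
    }

    private String doSlowWork() {
        try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
        return "done";
    }
}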
@mtnyguard
"Scala and Lift don't do anything to either help or hinder horizontal scaling"
That isn't quite right. Lift is a highly stateful framework. For example, if a user requests a form, he can only post it back to the same machine the form came from, because the form-processing action is saved in the server's state.
And this actually hinders scalability in a way, because this behaviour is inconsistent with a shared-nothing architecture.
No doubt Lift is highly performant, but performance and scalability are two different things. So if you want to scale horizontally with Lift, you have to define sticky sessions on the load balancer, which will redirect a user to the same machine for the duration of a session.
Jetty may be the point of entry, but the actor ends up servicing the request. I suggest having a look at the Twitter-esque example, 'skitter', to see how you would be able to create a very scalable service. IIRC, this is one of the things that made the Twitter people take notice.
I really like #dre's reply, as he correctly points out that Lift's statefulness is a potential problem for horizontal scalability.
The problem -
Instead of me describing the whole thing again, check out the discussion (not the post content itself) on this post: http://javasmith.blogspot.com/2010/02/automagically-cluster-web-sessions-in.html
The solution would be, as #dre said, sticky-session configuration on the load balancer in front and adding more instances. But since request handling in Lift is done with a thread + actor combination, you can expect one instance to handle more requests than normal frameworks. This gives it an edge over having sticky sessions in other frameworks, i.e. an individual instance's capacity to process more requests may help you to scale.
You also have the Akka-Lift integration, which would be another advantage here.