In the synchronous model, when a client connects to the server, the client and server have to stay in sync with each other to finish an operation.
Meanwhile, the asynchronous model allows the client and server to work separately and independently. The client sends a request to establish a connection and do something. While the server is processing the request, the client can do something else. Upon completion of an operation, the completion event is placed onto a queue in an Event Demultiplexer, waiting for a Proactor (such as an HTTP handler) to send the result back and invoke a Completion Handler (on the client). The terms are used as in the boost::asio document The Proactor Design Pattern: Concurrency Without Threads.
By working this way, the asynchronous model can accept simultaneous connections without having to create a thread per connection, thus improving overall performance. To achieve the same effect, the first (synchronous) model must be multi-threaded. For more detail, refer to: Proactor Pattern (I actually learned the proactor pattern, which is used for the asynchronous model, from that document. It also has a description of a typical synchronous I/O web server).
Is my understanding of the subject correct? If so, does that mean the asynchronous server can accept requests and return results asynchronously (the first connection to request the service on the web server does not need to be the first to get a reply)? In essence, the asynchronous model does not use threading (or threading is used in individual components, such as the Proactor or Asynchronous Event Demultiplexer (boost::asio document) component, rather than by dedicating a thread to an entire client-server application stack, as described for the multi-threaded model in the Proactor Pattern document, section 2.2 - Common Traps and Pitfalls of Conventional Concurrency Models).
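
To make the vocabulary in the question concrete, here is a minimal single-threaded sketch in Python (not boost::asio). The names ToyProactor, async_op and run are invented for illustration, and the "asynchronous operation" is simulated by work that finishes immediately; a real operation processor would be the OS or some background mechanism. It only shows the shape of the idea: completions are queued, then handlers are dispatched.

```python
from collections import deque


class ToyProactor:
    """Toy model: finished operations are queued, then their handlers run."""

    def __init__(self):
        # Completion event queue: (handler, result) pairs for finished operations.
        self.completions = deque()

    def async_op(self, operation, handler):
        # Stand-in for an asynchronous operation processor. A real one would do
        # the work in the background; here the work is done right away and only
        # the completion is deferred.
        result = operation()
        self.completions.append((handler, result))

    def run(self):
        # Dequeue completion events and dispatch each completion handler.
        while self.completions:
            handler, result = self.completions.popleft()
            handler(result)


log = []
proactor = ToyProactor()


def on_read(data):
    log.append(("read done", data))
    # A handler may start the next operation of the same session.
    proactor.async_op(lambda: len(data), on_write)


def on_write(count):
    log.append(("write done", count))


proactor.async_op(lambda: "GET /", on_read)
proactor.async_op(lambda: "GET /about", on_read)
proactor.run()
```

Both sessions are served by one thread, and the handlers of the two sessions interleave: both reads complete before either write does.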

The Proactor model assumes splitting the network session process into subtasks such as resolving a hostname, accepting or connecting, reading or writing some part of the information, and closing the connection, and it allows you to switch between subtasks from different sessions. The Reactor model, by contrast, sees the network session process as an (almost) single task.
The absolute Proactor advantages:
Performance is boosted because of task "outsourcing". For example, you can send a resolution request to the DNS and wait 5 minutes for an answer while doing nothing (Reactor), or you can do other stuff while waiting (Proactor).
The absolute Proactor disadvantages:
Performance is decreased because of task switching, which means that for a single session you execute more code (Proactor) than necessary (Reactor).
But overall performance is usually measured as the number of "satisfied" clients per time period. So the advantages of Proactor vs. Reactor depend on the situation. Here are some examples.
HTTP server. The client wants to see something in his browser window. He doesn't need to wait until the whole page is loaded to see the first pieces of text. The Proactor is effective, since partial page loading is faster than whole page loading. Still, the whole page loads in about the same time as in the Reactor model.
Low-latency game server. The client wants to get the complete result of his command as quickly as possible. The Reactor is effective, since there are no subtasks like partial reading or writing: the client won't see anything until he reads the full response. So the Reactor won't do additional switches between subtasks, and at each moment it's guaranteed that some client makes progress on his command, while the Proactor will force all of the clients to wait for each other for an unpredictable time.
Multi-threading can give you a linear speedup in both cases.
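
The "outsourcing" point can be illustrated with a small Python asyncio sketch (my own illustration, not part of the original answer). The slow DNS lookup is simulated with asyncio.sleep, and the names slow_resolve and session are hypothetical. While session A waits for its lookup, session B gets on with its own subtasks.

```python
import asyncio


async def slow_resolve(host, delay):
    # Stand-in for a slow DNS lookup; the sleep yields control to other sessions.
    await asyncio.sleep(delay)
    return "192.0.2.1"  # documentation-only address, not a real lookup result


async def session(name, resolve_delay, events):
    events.append(f"{name}: resolve started")
    await slow_resolve("example.com", resolve_delay)
    events.append(f"{name}: resolved, sending request")


async def main():
    events = []
    # Session A waits longer for its lookup; session B makes progress meanwhile.
    await asyncio.gather(
        session("A", 0.2, events),
        session("B", 0.05, events),
    )
    return events


events = asyncio.run(main())
```

Session B finishes its lookup and moves on before session A does, even though A started first.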


Network Server - proactive vs reactive pattern

Context (C++): I need to develop a network server, which can handle more than 1000 clients per second, with more than 100 requests per second.
Each request starts a state machine between the client and server, wherein the client and server exchange further data, before the server sends a final response.
Problem: Some of the processing is done by a third-party library that requests callbacks from us and calls these callbacks when it requires some data from the client. So we don't control this thread and must wait for the data from the client before we can process further.
Question: With such a high volume of messages, we decided we would use libevent or one of its derivatives, e.g. https://github.com/facebook/wangle or https://github.com/Qihoo360/evpp.
The problem is that libevent is based on the reactor pattern, and we do not have a way to leave processing in a thread as soon as it enters the state machine.
So my question is whether the proactor pattern would be better here, and whether there is any library that can give us this behavior.
OK, so after much deliberation, we decided that we should go ahead and put a "proxy" in front of our application. This proxy can then distribute the load to multiple running instances of our application that use this third-party library. Then we can use the reactor pattern.
Any other suggestions are welcome.
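
For illustration only, the situation described above (a library that calls back and blocks until client data arrives) can be sketched in Python with asyncio. This is not libevent, wangle or evpp, and third_party_process, need_data and handle_request are hypothetical stand-ins. The blocking library call runs on a worker thread, and its callback waits for data that the event loop delivers.

```python
import asyncio


def third_party_process(need_data):
    # Stand-in for the third-party library: it runs on its own thread and
    # calls back into our code whenever it needs more data from the client.
    first = need_data("header")
    second = need_data("body")
    return first + second


async def handle_request(client_messages):
    loop = asyncio.get_running_loop()
    inbox = asyncio.Queue()

    def need_data(what):
        # Called on the worker thread. Blocks only that thread until the
        # event loop has put the next client message into the inbox.
        future = asyncio.run_coroutine_threadsafe(inbox.get(), loop)
        return future.result()

    # Run the blocking library call off the event-loop thread.
    worker = loop.run_in_executor(None, third_party_process, need_data)

    # Meanwhile the event loop keeps receiving client data (simulated here).
    for message in client_messages:
        await asyncio.sleep(0.01)
        await inbox.put(message)

    return await worker


result = asyncio.run(handle_request([b"he", b"llo"]))
```

Note that this still ties up one worker thread per in-flight state machine, so it does not remove the scaling concern on its own. It only keeps the event-loop thread free.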

I don't understand how async operations can make an HTTP server concurrent

I am developing an HTTP server using boost asio. So far, I have been using async operations (async_read, async_write, etc.), but I want to make my server concurrent, that is, the same as a server that creates a new thread for each new client that connects.
I have read some forums etc. and, apparently, a concurrent server can be made just by using the mentioned async operations. I do not understand how this is possible.
I mean, taking into account that the async operations' handlers are executed in the thread that called io_service.run(), let's say that a client is being served at this moment. How can another client make a request and be answered while the main thread is busy with the first client?
The meaning of the word "concurrent" is ambiguous.
You are right, an asynchronous server is not concurrent at all. It can process only one request at a time. But the key insight is that what most servers actually do is take a request, do some light processing (parsing, serialization, validation, some light business logic, etc.) and then call external resources (e.g. some database). The server can then process other requests while waiting for the external resource. So it's only an illusion of being concurrent (processing happens one after another, but really fast). And it works as long as the processing is relatively fast compared to I/O.
If your server is supposed to do some heavy CPU computations, then obviously there will be no concurrency at all. In that case the only way to make it concurrent is to add threads or processes (possibly on multiple machines).
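
A rough way to see both halves of this is the following Python asyncio sketch (not boost asio; handle and serve are hypothetical names). Waiting on an external resource is simulated with asyncio.sleep, and heavy CPU work is simulated with a blocking time.sleep that holds the only thread.

```python
import asyncio
import time


async def handle(request_id, log, blocking):
    log.append((request_id, "start"))
    if blocking:
        time.sleep(0.05)           # stand-in for heavy CPU work: holds the thread
    else:
        await asyncio.sleep(0.05)  # stand-in for waiting on a database: yields
    log.append((request_id, "reply"))


async def serve(blocking):
    log = []
    await asyncio.gather(handle(1, log, blocking), handle(2, log, blocking))
    return log


io_bound = asyncio.run(serve(blocking=False))
cpu_bound = asyncio.run(serve(blocking=True))
```

In the I/O-bound run, request 2 starts before request 1 has been answered. In the blocking run, request 2 cannot even start until request 1 is finished.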
Asynchronous IO does not make the server concurrent.
In fact, asynchronous IO does not mean "multi-threaded" or "multi-processed" at all. Node.js servers are single-threaded and use asynchronous IO.
Asynchronous IO just means your thread does not wait for the IO to finish, but does other stuff meanwhile (like accepting and processing new incoming requests).
So no, the premise that asynchronous IO makes the server concurrent is wrong. It does not make it concurrent, it makes it scalable, as thread-per-request is not so scalable, but a proper thread pool plus an event queue or coroutines is. The threads only deal with CPU-bound tasks, and the event queue or coroutines manage enqueuing and dequeuing started and finished IO operations.
Not sure if you're only looking for a theoretical answer or a design example, but have you seen the HTTP Server 3 example for boost.asio?
Concurrency is achieved by having a small thread pool to execute the work. When callbacks need to be handled, all threads calling io_service.run() can be chosen to execute the task.
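
The idea of several threads all calling io_service.run() can be modelled roughly in Python. ToyIoService below is a hypothetical stand-in, not the boost.asio API: it is just a shared queue of ready handlers, and each thread that calls run() takes whichever handler comes next.

```python
import queue
import threading


class ToyIoService:
    """Hypothetical stand-in for io_service: a shared queue of ready handlers."""

    _STOP = object()

    def __init__(self):
        self._handlers = queue.Queue()

    def post(self, handler):
        self._handlers.put(handler)

    def run(self):
        # Each thread that calls run() takes whichever handler is ready next.
        while True:
            handler = self._handlers.get()
            if handler is self._STOP:
                break
            handler()

    def stop(self, thread_count):
        # One stop marker per thread, queued behind all pending handlers.
        for _ in range(thread_count):
            self._handlers.put(self._STOP)


io = ToyIoService()
results = []
lock = threading.Lock()


def make_handler(n):
    def handler():
        with lock:  # handlers may run on different threads, so guard shared state
            results.append(n * n)
    return handler


pool = [threading.Thread(target=io.run) for _ in range(3)]
for t in pool:
    t.start()
for n in range(10):
    io.post(make_handler(n))
io.stop(len(pool))
for t in pool:
    t.join()
```

Which thread runs which handler is not fixed, which is why shared state touched by handlers needs protection once more than one thread calls run().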

Which method does JAX-RS use for asynchronous communication?

I need to work with a REST service (built with a JAX-RS implementation) in a heterogeneous environment, so I wondered how the abstractions of programming languages are converted to the real RESTful endpoints. I think most aspects are clear, but when it comes to asynchronous communication in REST I know of several possibilities: keeping the connection open, returning a resource that can be queried repeatedly, chunked messages, or the client transmitting a callback resource.
My approach was to read the JAX-RS 2.0 Specification, but I think there is actually little stated about the REST implementation of asynchronous requests. Then I read the Jersey documentation and came to the conclusion that the JAX-RS implementations just keep the connection open for as long as the processing needs. So with "asynchronous" JAX-RS just refers to the blocking of methods on the server/client side and does not use any special behavior of REST. My first question: Is my analysis correct?
If this is the case, I have two new questions:
Is this really compliant with the REST paradigm with respect to the stateless constraint?
Considering the long-running processes that maybe work for days, is an open connection eventually automatically closed (e.g. by the OS or by a TCP timer)?
Thanks in advance!
REST architecture has got nothing to do with asynchronous programming paradigms, IMO. The asynchronous implementation using @Suspended and the AsyncResponse interface in JAX-RS involves suspending the thread which initiated the request.
To answer your questions
'So with "asynchronous" JAX-RS just refers to the blocking of methods on the server/client side and does not use any special behavior of REST'
-> REST has got nothing to do with async design in JAX-RS, but the way you design that Resource class and set up the async method should involve RESTful principles.
Also, there is no 'blocking' as such. In fact it's exactly the opposite. The I/O thread on the server end is immediately suspended and returned to the container. The actual processing might still take a long time, but the real goal was to 'not block' and occupy threads. A web container has a limited number of threads dedicated to serving incoming requests. Prospective clients will get blocked if ALL the container threads are busy processing other clients. This is avoided by JAX-RS because it suspends the thread, returns it to the web container and responds on a different thread (an internal server thread). All this increases the overall responsiveness of the application.
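
The suspend-and-resume idea can be sketched outside of Java as follows. This is plain Python, not the JAX-RS API: a concurrent.futures.Future stands in for the suspended response, and resource_method and slow_work are hypothetical names. The point is only that the method returns at once and a different thread supplies the result later.

```python
import threading
from concurrent.futures import Future


def resource_method(async_response, worker_log):
    # Runs on the "container" thread: start the slow work elsewhere and return.
    def slow_work():
        worker_log.append(threading.current_thread().name)
        async_response.set_result("report ready")  # analogous to resuming

    threading.Thread(target=slow_work, name="worker-1").start()
    # Returning here frees the container thread for other requests.


async_response = Future()  # stand-in for a suspended response
worker_log = []
resource_method(async_response, worker_log)

# The connection stays open and waits for the result, with an upper bound.
body = async_response.result(timeout=5)
```

The timeout argument plays the role that a timeout on the suspended response would play: the client is not left waiting forever.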
'Considering the long-running processes that maybe work for days, is an open connection eventually automatically closed (e.g. by the OS or by a TCP timer)?'
--> Not sure what would happen in that case. But it's not necessary to have your clients waiting 'forever'. You can specify timeouts using the TimeoutHandler (I guess you might have already read this).
Just my two cents!

HTTP stream server: threads?

I already wrote here about the http chat server I want to create: Alternative http port?
This http server should stream text to every user in the same chat room on the website. The browser will stay connected and wait for further html code. (yes that works, the browser won't reject the connection).
I have a new question: Because this chat server doesn't need to receive information from the client, it's not necessary to listen to the client after the server has sent its first response. New chat messages will be sent to the server on a new connection.
So I can open 2 threads, one waiting for new clients (or new messages) and one for the html streaming.
Is this a good idea or should I use one thread per client? I don't think it's good to have one thread/client when there are many chat users online, since the server should handle multiple different chats with their own rooms.
3 possibilities:
1. One thread for all clients, send text to each client successively - there shouldn't be much lag since it's only text
this will be like: user1.send("text");user2.send("text"),...
2. One thread per chat or chatroom
3. One thread per chat user - ... many...
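
Option 1 can be sketched like this in Python, with in-memory streams standing in for the client sockets. ChatRoom, join and broadcast are hypothetical names, and real sockets would add one concern this sketch ignores: a slow client could stall the loop for everybody else.

```python
import io


class ChatRoom:
    """One room: a list of connected client streams (hypothetical names)."""

    def __init__(self):
        self.clients = []

    def join(self, stream):
        self.clients.append(stream)

    def broadcast(self, text):
        # Option 1 from the list: one thread writes to each client in turn.
        for stream in self.clients:
            stream.write(text)


room = ChatRoom()
user1, user2 = io.StringIO(), io.StringIO()  # stand-ins for open connections
room.join(user1)
room.join(user2)
room.broadcast("<p>hello</p>\n")
```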
Thank you, I haven't done much with sockets yet ;).
Right now, you seem to be thinking in terms of a given thread always carrying out a given (type of) task. While that basic design can make sense, to produce a scalable server like this, it generally doesn't work very well.
Often a slightly more abstract viewpoint works out better: you have tasks that need to get done, and threads that do those tasks -- but a thread doesn't really "care" about what task it executes.
With this viewpoint, you simply need to create some sort of data structure that describes each task that needs to be done. When you have a task you want done, you fill in a data structure to describe the task, and hand it off to get done. Somewhere, there are some threads that do the tasks.
In this case, the exact number of threads becomes mostly irrelevant -- it's something you can (and do) adjust to fit the number of CPU cores available, the type of tasks, and so on, not something that affects the basic design of the program.
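
As a minimal sketch of that viewpoint (in Python, with a hypothetical Task record and made-up task kinds), each unit of work is described by a small data structure and handed to a pool. The number of workers is a setting, not something the rest of the code depends on.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Task:
    kind: str     # what to do, e.g. "broadcast" or "accept"
    payload: str  # data the task needs


def execute(task):
    # Any worker can run any task; a thread has no fixed role.
    if task.kind == "broadcast":
        return f"sent {task.payload!r} to room"
    if task.kind == "accept":
        return f"registered client {task.payload}"
    raise ValueError(f"unknown task kind: {task.kind}")


tasks = [Task("accept", "alice"), Task("broadcast", "hi"), Task("accept", "bob")]

# Pool size is a tuning knob, not part of the design.
workers = os.cpu_count() or 2
with ThreadPoolExecutor(max_workers=workers) as pool:
    results = list(pool.map(execute, tasks))
```

Changing workers to 1 or to 64 changes throughput, but not the structure of the program.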
I think the easiest pattern for this simple app is to have a pool of threads and then, for each client, pick an available thread or make the client wait until one becomes available.
If you want a serious understanding of HTTP server architecture concepts, google the following:
apache architecture
nginx architecture

Why is the Lift web framework scalable?

I want to know the technical reasons why the Lift web framework has high performance and scalability. I know it uses Scala, which has an actor library, but according to the install instructions its default configuration is with Jetty. So does it use the actor library to scale?
Also, is the scalability built in right out of the box? Just add additional servers and nodes and it will automatically scale, is that how it works? Can it handle 500000+ concurrent connections with supporting servers?
I am trying to create a web services framework for the enterprise level, that can beat what is out there and is easy to scale, configurable, and maintainable. My definition of scaling is just adding more servers and you should be able to accommodate the extra load.
Lift's approach to scalability is within a single machine. Scaling across machines is a larger, tougher topic. The short answer there is: Scala and Lift don't do anything to either help or hinder horizontal scaling.
As far as actors within a single machine, Lift achieves better scalability because a single instance can handle more concurrent requests than most other servers. To explain, I first have to point out the flaws in the classic thread-per-request handling model. Bear with me, this is going to require some explanation.
A typical framework uses a thread to service a page request. When the client connects, the framework assigns a thread out of a pool. That thread then does three things: it reads the request from a socket; it does some computation (potentially involving I/O to the database); and it sends a response out on the socket. At pretty much every step, the thread will end up blocking for some time. When reading the request, it can block while waiting for the network. When doing the computation, it can block on disk or network I/O. It can also block while waiting for the database. Finally, while sending the response, it can block if the client receives data slowly and TCP windows get filled up. Overall, the thread might spend 30 - 90% of its time blocked. It spends 100% of its time, however, on that one request.
A JVM can only support so many threads before it really slows down. Thread scheduling, contention for shared-memory entities (like connection pools and monitors), and native OS limits all impose restrictions on how many threads a JVM can create.
Well, if the JVM is limited in its maximum number of threads, and the number of threads determines how many concurrent requests a server can handle, then the number of concurrent requests will be determined by the number of threads.
(There are other issues that can impose lower limits---GC thrashing, for example. Threads are a fundamental limiting factor, but not the only one!)
Lift decouples threads from requests. In Lift, a request does not tie up a thread. Rather, a thread does an action (like reading the request), then sends a message to an actor. Actors are an important part of the story, because they are scheduled via "lightweight" threads. A pool of threads gets used to process messages within actors. It's important to avoid blocking operations inside of actors, so these threads get returned to the pool rapidly. (Note that this pool isn't visible to the application, it's part of Scala's support for actors.) A request that's currently blocked on database or disk I/O, for example, doesn't keep a request-handling thread occupied. The request-handling thread is available, almost immediately, to receive more connections.
This method for decoupling requests from threads allows a Lift server to have many more concurrent requests than a thread-per-request server. (I'd also like to point out that the Grizzly library supports a similar approach without actors.) More concurrent requests means that a single Lift server can support more users than a regular Java EE server.
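
To give a feel for the decoupling, here is a toy actor in Python. It is not Scala's actor library and not how Lift is implemented; Actor, send and _drain are hypothetical names. Each actor has a mailbox, and a small shared pool of threads processes whatever messages are pending, so many in-flight requests share a few threads.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class Actor:
    """Toy actor: a mailbox drained by whichever pool thread is free."""

    def __init__(self, pool, behavior):
        self._pool = pool
        self._behavior = behavior
        self._mailbox = deque()
        self._lock = threading.Lock()
        self._scheduled = False

    def send(self, message):
        with self._lock:
            self._mailbox.append(message)
            if self._scheduled:
                return  # a pool thread is already draining this mailbox
            self._scheduled = True
        self._pool.submit(self._drain)

    def _drain(self):
        # Process messages one at a time, then give the thread back to the pool.
        while True:
            with self._lock:
                if not self._mailbox:
                    self._scheduled = False
                    return
                message = self._mailbox.popleft()
            self._behavior(message)


handled = []
pool = ThreadPoolExecutor(max_workers=2)
actors = [Actor(pool, handled.append) for _ in range(100)]
for i, actor in enumerate(actors):
    actor.send(f"request {i}")
pool.shutdown(wait=True)
```

Here 100 actors are served by 2 threads. That only works well if the behavior does not block, which matches the warning above about avoiding blocking operations inside actors.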
@mtnyguard
"Scala and Lift don't do anything to either help or hinder horizontal scaling"
That isn't quite right. Lift is a highly stateful framework. For example, if a user requests a form, then he can only post the request to the same machine the form came from, because the form-processing action is saved in the server state.
And this is actually something that hinders scalability in a way, because this behaviour is inconsistent with a shared-nothing architecture.
No doubt Lift is highly performant, but performance and scalability are two different things. So if you want to scale horizontally with Lift, you have to define sticky sessions on the load balancer, which will direct a user to the same machine for the duration of a session.
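
What "sticky" means can be shown with a small Python sketch. This is a hypothetical illustration, not how any particular load balancer is configured (real ones often use a cookie instead): the session ID is hashed so that the same session always maps to the same backend. The function name pick_backend and the host names are made up.

```python
import hashlib


def pick_backend(session_id, backends):
    # Hash the session ID so the same session always maps to the same machine.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["lift-1:8080", "lift-2:8080", "lift-3:8080"]  # hypothetical hosts
first = pick_backend("session-abc", backends)
again = pick_backend("session-abc", backends)
```

With this simple modulo scheme, adding or removing a backend remaps many sessions, which matters precisely because the form state lives on one machine.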
Jetty may be the point of entry, but the actor ends up servicing the request. I suggest having a look at the twitter-esque example, 'skitter', to see how you would be able to create a very scalable service. IIRC, this is one of the things that made the Twitter people take notice.
I really like @dre's reply, as he correctly points out that the statefulness of Lift is a potential problem for horizontal scalability.
The problem -
Instead of me describing the whole thing again, check out the discussion (not the content) on this post. http://javasmith.blogspot.com/2010/02/automagically-cluster-web-sessions-in.html
The solution would be, as @dre said, a sticky session configuration on the load balancer in front, plus adding more instances. But since request handling in Lift is done by a thread + actor combination, you can expect one instance to handle more requests than with normal frameworks. This would give an edge over sticky sessions in other frameworks, i.e. each individual instance's capacity to process more may help you to scale.
You also have Akka Lift integration, which would be another advantage here.