How to design a client server architect - c++

I like to know the server (TCP based) architecture to support large scale of clients(at least10K) to implement Fix server. My points are
How we design it.
How to listen on the open port? Use select or poll or any other function.
How to process the response of the client? On large scale we cannot create the one thread for each client.
Should the processing of response is in the different executable and share the request and response to the server executable through IPC.
There is much more on it. I would appreciate if anyone explains it or provide any link.
Thanks

An excellent resource for information on this topic is The C10K problem. Although the dimensions there seem a little old, the techniques are still applicable today.

The architecture depends on what you want to do with the clients incoming data. My guess is that for every incoming message you would perform some computations and probably also return a response.
In that case I would create 1 main listener thread that receives all the incoming messages (Actually, if your hardware has more than 1 physical network device, I would use a listener thread per device and make sure each one is listening to a specific device).
Get the number of CPUs that you have on your machine and create worker threads for each CPU and bind them each thread to one cpu (Maybe number of working thread should be num_of_cpu-1, to leave an availalbe cpu for the listener and dispatcher).
Each thread has a queue and semaphore, the main listener thread just push the incoming data into those queues. There are many way to perform load balancing (Will talk about it later).
Each working thread just works on the requests given to it, and put the response on another queue that is read by the dispatcher.
The dispatcher - there are 2 options here, use a thread for dispatcher (or thread per network device as for listeners), or have the dispatcher actually be the same thread as the listener.
There is some advantage to put them both on the same thread, since it makes it easier to detect lost socket connection and use the same fds for both reading and writing without thread synchronization. However, it could be that using 2 different threads would give better performance, it need to be tested.
Note about load balancing:
This is a topic of its own.
The simplest thing is to use 1 queue for all working threads, but the problem is that they have to lock in order to pop items and the locking can damage performance. (But you get the most balanced load).
Another quite simple approach would be to have a private queue for every worker and perform round-robin when inserting. After every X cycles check the size of all the queues. If some queues are much larger than others then leave them out for the next X cycles and then recheck them again. This is not the best approach, but a simple one to implement and gives some load balancing while no locking is needed.
By the way - There is a way to implement queue between 2 threads without blocking - but this is also another topic.
I hope it helps,
Guy

If the client and server are on a secure network then the security aspect is to be minimal - to the extent that the transfers are encrypted. If the clients and the server are not on a secure network - you first want the server and client to authenticate each other and then initiate encrypted data transfer. For data transfer, server-side authentication should suffice. At the end of this authentication use the session key to generate encrypted data stream (symmetric). consider using TFTP it is simple to implement and scales reasonably well.

Related

Does the grpc async server has a better performance compared to the sync server with a restricted number of threads?

I need to implement a server to which multiple clients can send requests simultaneously. The code processing an individual request might block (with the thread going to sleep) in the middle.
At the moment I am using the C++ GRPC synchronised server. Each time a client sends a request, a new thread is spawned on the server's side. This is a problem since a server can create too many threads simultaneously.
I am considering two solutions to avoid the problem:
1) Use the sync server with the ResourceQuota (e.g. restrict the max number of threads to 10).
2) Use the async server.
Implementing the second solution is considerably more difficult than implementing the first solution. What advantage (if any) the second solution would give compared to the first one? Which solution would give better results in terms of:
The amount of time an individual client needs to wait to get a response to RPC
The resources (memory, threads) used on the server.

Connecting to remote services from multiple threaded requests

I have a boost asio application with many threads, similar to a web server, handling hundreds of concurrent requests. Every request will need to make calls to both memcached and redis (via libmemcached and redispp respectively). Is the best practice in this situation to make a separate connection to both redis and memcached from each thread (effectively tripling the open sockets on the server, three per request)? Or is there a way for me to build a static object, with a single memcached/redis connection, and allow all threads to share that single connection? I'm a bit confused when it comes to the thread safety of something like this, and everything needs to be asynchronous between the threads, but blocking for each thread's individual request (so each thread has a linear progression, but many threads can be in different places in their own progression at any given time). Does that make sense?
Thanks so much!
Since memcached have syncronous protocol you should not write next request before you got answer to prevous. So, no other thread can chat in same memcached connection. I'd prefer to make thread-local connection if you work with it in "blocking" mode.
Or you can make it work in "async" manner: make pool of connections, pick a connection from it (and lock it). After request is done, return it to pool.
Also, you can make a request queue and process it in special thread(s) (using multigets and callbacks).

TCP/IP and designing networking application

i'm reading about way to implemnt client-server in the most efficient manner, and i bumped into that link :
http://msdn.microsoft.com/en-us/library/ms740550(VS.85).aspx
saying :
"Concurrent connections should not exceed two, except in special purpose applications. Exceeding two concurrent connections results in wasted resources. A good rule is to have up to four short lived connections, or two persistent connections per destination "
i can't quite get what they mean by 2... and what do they mean by persistent?
let's say i have a server who listens to many clients , whom suppose to do some work with the server, how can i keep just 2 connections open ?
what's the best way to implement it anyway ? i read a little about completion port , but couldn't find a good examples of code, or at least a decent explanation.
thanks
Did you read the last sentence:
A good rule is to have up to four
short lived connections, or two
persistent connections per
destination.
Hard to say from the article, but by destination I think they mean client. This isn't a very good article.
A persistent connection is where a client connects to the server and then performs all its actions without ever dropping the connection. Even if the client has periods of time when it does not need the server, it maintains its connection to the server ready for when it might need it again.
A short lived connection would be one where the client connects, performs its action and then disconnects. If it needs more help from the server it would re-connect to the server and perform another single action.
As the server implementing the listening end of the connection, you can set options in the listening TCP/IP socket to limit the number of connections that will be held at the socket level and decide how many of those connections you wish to accept - this would allow you to accept 2 persistent connections or 4 short lived connections as required.
What they mean by, "persistent," is a connection that is opened, and then held open. It's pretty common problem to determine whether it's more expensive to tie up resources with an "always on" connection, or suffer the overhead of opening and closing a connection every time you need it.
It may be worth taking a step back, though.
If you have a server that has to listen for requests from a bunch of clients, you may have a perfect use case for a message-based architecture. If you use tightly-coupled connections like those made with TCP/IP, your clients and servers are going to have to know a lot about each other, and you're going to have to write a lot of low-level connection code.
Under a message-based architecture, your clients could place messages on a queue. The server could then monitor that queue. It could take messages off the queue, perform work, and place the responses back on the queue, where the clients could pick them up.
With such a design, the clients and servers wouldn't have to know anything about each other. As long as they could place properly-formed messages on the queue, and connect to the queue, they could be implemented in totally different languages, and run on different OS's.
Messaging-oriented-middleware like Apache ActiveMQ and Weblogic offer API's you could use from C++ to manage and use queues, and other messaging objects. ActiveMQ is open source, and Weblogic is sold by Oracle (who bought BEA). There are many other great messaging servers out there, so use these as examples, to get you started, if messaging sounds like it's worth exploring.
I think key words are "per destination". Single tcp connection tries to accelerate up to available bandwidth. So if you allow more connections to same destination, they have to share same bandwidth.
This means that each transfer will be slower than it could be and server has to allocate more resources for longer time - data structures for each connection.
Because establishing tcp connection is "time consuming", it makes sense to allow establish second connection in time when you are serving first one, so they are overlapping each other. for short connections setup time could be same as for serving the connection itself (see poor performance example), so more connections are needed for filling all bandwidth effectively.
(sorry I cannot post hyperlinks yet)
here msdn.microsoft.com/en-us/library/ms738559%28VS.85%29.aspx you can see, what is poor performance.
here msdn.microsoft.com/en-us/magazine/cc300760.aspx is some example of threaded server what performs reasonably well.
you can limit number of open connections by limiting number of accept() calls. you can limit number of connections from same source just by canceling connection when you find out, that you allready have more then two connections from this location (just count them).
For example SMTP works in similar way. When there are too many connections, it returns 4xx code and closes your connection.
Also see this question:
What is the best epoll/kqueue/select equvalient on Windows?

HTTP stream server: threads?

I already wrote here about the http chat server I want to create: Alternative http port?
This http server should stream text to every user in the same chat room on the website. The browser will stay connected and wait for further html code. (yes that works, the browser won't reject the connection).
I got a new question: Because this chat server doesn't need to receive information from the client, it's not necessary to listen to the client after the server sent its first response. New chat messages will be send to the server on a new connection.
So I can open 2 threads, one waiting for new clients (or new messages) and one for the html streaming.
Is this a good idea or should I use one thread per client? I don't think it's good to have one thread/client when there are many chat users online, since the server should handle multiple different chats with their own rooms.
3 posibilities:
1. One thread for all clients, send text to each client successive - there shouldn't be much lag since it's only text
this will be like: user1.send("text");user2.send("text"),...
2. One thread per chat or chatroom
3. One thread per chat user - ... many...
Thank you, I haven't done much with sockets yet ;).
Right now, you seem to be thinking in terms of a given thread always carrying out a given (type of) task. While that basic design can make sense, to produce a scalable server like this, it generally doesn't work very well.
Often a slightly more abstract viewpoint works out better: you have tasks that need to get done, and threads that do those tasks -- but a thread doesn't really "care" about what task it executes.
With this viewpoint, you simply need to create some sort of data structure that describes each task that needs to be done. When you have a task you want done, you fill in a data structure to describe the task, and hand it off to get done. Somewhere, there are some threads that do the tasks.
In this case, the exact number of threads becomes mostly irrelevant -- it's something you can (and do) adjust to fit the number of CPU cores available, the type of tasks, and so on, not something that affects the basic design of the program.
I think easiest pattern for this simple app is to have pool of threads and then for each client pick available thread or make it wait until one becomes available.
If you want serious understanding of http server architecture concepts google following:
apache architecture
nginx architecture

why is the lift web framework scalable?

I want to know the technical reasons why the lift webframework has high performance and scalability? I know it uses scala, which has an actor library, but according to the install instructions it default configuration is with jetty. So does it use the actor library to scale?
Now is the scalability built right out of the box. Just add additional servers and nodes and it will automatically scale, is that how it works? Can it handle 500000+ concurrent connections with supporting servers.
I am trying to create a web services framework for the enterprise level, that can beat what is out there and is easy to scale, configurable, and maintainable. My definition of scaling is just adding more servers and you should be able to accommodate the extra load.
Thanks
Lift's approach to scalability is within a single machine. Scaling across machines is a larger, tougher topic. The short answer there is: Scala and Lift don't do anything to either help or hinder horizontal scaling.
As far as actors within a single machine, Lift achieves better scalability because a single instance can handle more concurrent requests than most other servers. To explain, I first have to point out the flaws in the classic thread-per-request handling model. Bear with me, this is going to require some explanation.
A typical framework uses a thread to service a page request. When the client connects, the framework assigns a thread out of a pool. That thread then does three things: it reads the request from a socket; it does some computation (potentially involving I/O to the database); and it sends a response out on the socket. At pretty much every step, the thread will end up blocking for some time. When reading the request, it can block while waiting for the network. When doing the computation, it can block on disk or network I/O. It can also block while waiting for the database. Finally, while sending the response, it can block if the client receives data slowly and TCP windows get filled up. Overall, the thread might spend 30 - 90% of it's time blocked. It spends 100% of its time, however, on that one request.
A JVM can only support so many threads before it really slows down. Thread scheduling, contention for shared-memory entities (like connection pools and monitors), and native OS limits all impose restrictions on how many threads a JVM can create.
Well, if the JVM is limited in its maximum number of threads, and the number of threads determines how many concurrent requests a server can handle, then the number of concurrent requests will be determined by the number of threads.
(There are other issues that can impose lower limits---GC thrashing, for example. Threads are a fundamental limiting factor, but not the only one!)
Lift decouples thread from requests. In Lift, a request does not tie up a thread. Rather, a thread does an action (like reading the request), then sends a message to an actor. Actors are an important part of the story, because they are scheduled via "lightweight" threads. A pool of threads gets used to process messages within actors. It's important to avoid blocking operations inside of actors, so these threads get returned to the pool rapidly. (Note that this pool isn't visible to the application, it's part of Scala's support for actors.) A request that's currently blocked on database or disk I/O, for example, doesn't keep a request-handling thread occupied. The request handling thread is available, almost immediately, to receive more connections.
This method for decoupling requests from threads allows a Lift server to have many more concurrent requests than a thread-per-request server. (I'd also like to point out that the Grizzly library supports a similar approach without actors.) More concurrent requests means that a single Lift server can support more users than a regular Java EE server.
at mtnyguard
"Scala and Lift don't do anything to either help or hinder horizontal scaling"
Ain't quite right. Lift is highly statefull framework. For example if a user requests a form, then he can only post the request to the same machine where the form came from, because the form processeing action is saved in the server state.
And this is actualy a thing which hinders scalability in a way, because this behaviour is inconistent to the shared nothing architecture.
No doubt that lift is highly performant but perfomance and scalability are two different things. So if you want to scale horizontaly with lift you have to define sticky sessions on the loadbalancer which will redirect a user during a session to the same machine.
Jetty maybe the point of entry, but the actor ends up servicing the request, I suggest having a look at the twitter-esque example, 'skitter' to see how you would be able to create a very scalable service. IIRC, this is one of the things that made the twitter people take notice.
I really like #dre's reply as he correctly states the statefulness of lift being a potential problem for horizontal scalability.
The problem -
Instead of me describing the whole thing again, check out the discussion (Not the content) on this post. http://javasmith.blogspot.com/2010/02/automagically-cluster-web-sessions-in.html
Solution would be as #dre said sticky session configuration on load balancer on the front and adding more instances. But since request handling in lift is done in thread + actor combination you can expect one instance handle more requests than normal frameworks. This would give an edge over having sticky sessions in other frameworks. i.e. Individual instance's capacity to process more may help you to scale
you have Akka lift integration which would be another advantage in this.