Connecting to remote services from multithreaded requests - c++

I have a boost asio application with many threads, similar to a web server, handling hundreds of concurrent requests. Every request needs to make calls to both memcached and redis (via libmemcached and redispp respectively). Is the best practice in this situation to open a separate connection to both redis and memcached from each thread (effectively tripling the open sockets on the server, three per request)? Or is there a way to build a static object with a single memcached/redis connection and let all threads share that single connection? I'm a bit confused about the thread safety of something like this. Everything needs to be asynchronous between the threads, but blocking within each thread's individual request (so each thread has a linear progression, but many threads can be at different places in their own progression at any given time). Does that make sense?
Thanks so much!

Since memcached has a synchronous protocol, you should not write the next request before you have received the answer to the previous one. So no other thread can talk on the same memcached connection while a request is in flight. I'd prefer a thread-local connection if you work with it in "blocking" mode.
Or you can make it work in an "async" manner: make a pool of connections, pick a connection from the pool (and lock it), and once the request is done, return it to the pool.
Also, you can make a request queue and process it in dedicated thread(s) (using multigets and callbacks).
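A minimal sketch of that pool idea in C++, assuming a placeholder Connection type that stands in for the real libmemcached/redispp handle (the names here are illustrative, not a real API):

    #include <condition_variable>
    #include <memory>
    #include <mutex>
    #include <queue>

    // Stand-in for a real client handle (e.g. a memcached_st* or a redispp connection).
    struct Connection { /* ... */ };

    class ConnectionPool {
    public:
        explicit ConnectionPool(std::size_t n) {
            for (std::size_t i = 0; i < n; ++i)
                pool_.push(std::make_unique<Connection>());
        }

        // Blocks until a connection is free, so no two threads ever
        // interleave requests on the same connection.
        std::unique_ptr<Connection> acquire() {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return !pool_.empty(); });
            auto conn = std::move(pool_.front());
            pool_.pop();
            return conn;
        }

        // Return the connection once the request/response exchange is done.
        void release(std::unique_ptr<Connection> conn) {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                pool_.push(std::move(conn));
            }
            cv_.notify_one();
        }

    private:
        std::mutex mutex_;
        std::condition_variable cv_;
        std::queue<std::unique_ptr<Connection>> pool_;
    };

Each request thread calls acquire(), does its blocking exchange, then release(); sizing the pool close to the thread count keeps waiting rare.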

Related

Does the gRPC async server have better performance compared to the sync server with a restricted number of threads?

I need to implement a server to which multiple clients can send requests simultaneously. The code processing an individual request might block (with the thread going to sleep) in the middle.
At the moment I am using the C++ gRPC synchronous server. Each time a client sends a request, a new thread is spawned on the server's side. This is a problem since the server can create too many threads simultaneously.
I am considering two solutions to avoid the problem:
1) Use the sync server with the ResourceQuota (e.g. restrict the max number of threads to 10).
2) Use the async server.
Implementing the second solution is considerably more difficult than implementing the first. What advantage (if any) would the second solution give compared to the first one? Which solution would give better results in terms of:
The amount of time an individual client needs to wait to get a response to an RPC
The resources (memory, threads) used on the server.

C++ gRPC thread number configuration

Does the gRPC server/client have any concept of a thread pool for connections? That is, is it possible to reuse threads, pre-allocate threads, queue requests when a thread limit is reached, etc.?
If not, how does it work? Is it just allocating/destroying a thread when needed, without any limitation and/or reuse? If so, is it possible to configure it?
It depends on whether you are using the sync or async API.
For a sync client, your RPC call blocks the calling thread, so a thread pool is not really relevant. For a sync server, there is an internal threadpool handling all the incoming requests; you can use a grpc::ResourceQuota on a ServerBuilder to limit the max number of threads used by the threadpool.
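For example, capping the sync server's threadpool might look roughly like this (a sketch: the port, quota name, and thread cap are placeholders, and service registration is omitted):

    #include <grpcpp/grpcpp.h>
    #include <grpcpp/resource_quota.h>

    int main() {
        grpc::ResourceQuota quota("server_quota");  // the name is arbitrary
        quota.SetMaxThreads(10);                    // cap the sync threadpool

        grpc::ServerBuilder builder;
        builder.SetResourceQuota(quota);
        builder.AddListeningPort("0.0.0.0:50051",
                                 grpc::InsecureServerCredentials());
        // builder.RegisterService(&my_service);    // register your service

        auto server = builder.BuildAndStart();
        server->Wait();
    }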
For the async client and server, gRPC uses a CompletionQueue as a way for users to define their own threading model. A common way of building clients and servers is to use a user-provided threadpool that runs CompletionQueue::Next in each thread. Once Next returns a tag, you can cast it to a user-defined type and run some methods to advance the call's state. In this case, the user has full control over the threads being used.
Note that gRPC does create some internal threads, but they should not be used for the majority of the RPC work.
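A common shape for that Next loop, sketched under the assumption of a user-defined Handler base class (gRPC defines no such type; only CompletionQueue and Next come from the library):

    #include <grpcpp/grpcpp.h>
    #include <thread>
    #include <vector>

    // User-defined: every tag pushed into the queue points at one of these.
    struct Handler {
        virtual void Proceed(bool ok) = 0;  // advance this call's state machine
        virtual ~Handler() = default;
    };

    void PollQueue(grpc::CompletionQueue* cq) {
        void* tag;
        bool ok;
        while (cq->Next(&tag, &ok)) {       // blocks until an event is ready
            static_cast<Handler*>(tag)->Proceed(ok);
        }                                   // Next returns false after Shutdown()
    }

    // One polling thread per core: the user, not gRPC, owns the threading model.
    std::vector<std::thread> StartPollers(grpc::CompletionQueue* cq, unsigned n) {
        std::vector<std::thread> threads;
        for (unsigned i = 0; i < n; ++i)
            threads.emplace_back(PollQueue, cq);
        return threads;
    }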

I don't understand how async operations can make an HTTP server concurrent

I am developing an HTTP server using boost asio. So far I have been using async operations (async_read, async_write, etc.), but I want to make my server concurrent, that is, behave the same as a server that creates a new thread for each newly connected client.
I have read some forums and, apparently, a concurrent server can be built using only the mentioned async operations. I do not understand how this is possible.
I mean, taking into account that the async operations' handlers are executed in the thread that called io_service.run(), suppose a client is being responded to at this moment. How can another client make a request and be answered while the main thread is busy with the first client?
The meaning of the word "concurrent" is ambiguous.
You are right, an asynchronous server is not concurrent at all. It can process only one request at a time. But the key insight is that most servers actually take a request, do some light processing (parsing, serialization, validation, some light business logic, etc.) and then call external resources (e.g. some database). The server can then process other requests while waiting for the external resource. So it's only an illusion of being concurrent (processing happens one after another, but really fast). And it works as long as the processing is fast relative to the I/O.
If your server is supposed to do heavy CPU computations then obviously there will be no concurrency at all. In that case the only way to make it concurrent is to add threads or processes (possibly on multiple machines).
Asynchronous IO does not make the server concurrent.
In fact, asynchronous IO does not mean "multi-threaded" or "multi-processed" at all. Node.js servers are single-threaded and use asynchronous IO.
Asynchronous IO just means your thread does not wait for the IO to finish, but does other stuff meanwhile (like accepting and processing new incoming requests).
So no, the premise that asynchronous IO makes the server concurrent is wrong. It does not make it concurrent, it makes it scalable: thread-per-request is not very scalable, but a proper thread pool plus an event queue/coroutines is. The threads only deal with CPU-bound tasks, and the event queue/coroutines manage enqueuing and dequeuing started/finished IO operations.
Not sure if you're only looking for a theoretical answer or a design example, but have you seen the HTTP Server 3 example for boost.asio?
Concurrency is achieved by having a small thread pool execute the work. When callbacks need to be handled, any of the threads calling io_service.run() can be chosen to execute the handler.
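That pool is only a few lines with asio. A sketch, assuming a reasonably recent Boost (io_context is the newer name for io_service, and make_work_guard keeps run() from returning before any async work has been queued):

    #include <boost/asio.hpp>
    #include <algorithm>
    #include <thread>
    #include <vector>

    int main() {
        boost::asio::io_context io;  // io_service before Boost 1.66

        // ... create an acceptor and start the first async_accept here ...

        // Keeps run() alive while no handlers are pending yet; call
        // guard.reset() at shutdown to let the threads drain and exit.
        auto guard = boost::asio::make_work_guard(io);

        // Small thread pool: any thread calling run() can be picked to
        // execute a ready handler, so different clients' callbacks
        // proceed in parallel.
        std::vector<std::thread> pool;
        unsigned n = std::max(1u, std::thread::hardware_concurrency());
        for (unsigned i = 0; i < n; ++i)
            pool.emplace_back([&io] { io.run(); });

        for (auto& t : pool) t.join();
    }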

best practice of django + PyMongo pooling?

I have a db = pymongo.Connection() call in Django's views.py for a simple MongoDB connection to store some simple statistics.
What's the best practice to make it automatically support MongoDB connection pooling?
Where do I need to put the end_request() code?
How do I choose the max_pool_size parameter during connection?
How does connection pooling work in PyMongo?
Every Connection instance has built-in connection pooling. By default, each thread gets its own socket reserved on its first operation. Those sockets are held until end_request() is called by that thread. Calling end_request() allows the socket to be returned to the pool, and to be used by other threads instead of creating a new socket. Judicious use of this method is important for applications with many threads or with long running threads that make few calls to PyMongo operations.
Alternatively, a Connection created with auto_start_request=False will share sockets (safely) among all threads.
I think it comes down to the type of application you have and how long the requests will hold onto a connection. The idea of calling end_request helps with long running requests holding on to a socket for a long time and causing many sockets to get created. If a single request can release the connection when it no longer needs it, then the socket can be repurposed for other requests.
If the requests are fast, then I believe auto_start_request=False works by reusing sockets across them.
Ensuring a connection keeps using the same socket means it will have consistent reads. Imagine you made a query but it got delayed, then immediately made another query that used a different socket, and the second socket managed to respond before the first. You would see inconsistent data, since it does not reflect the previous write.

How to design a client server architect

I would like to know the server (TCP-based) architecture to support a large number of clients (at least 10K) to implement a FIX server. My points are:
How do we design it?
How to listen on the open port? Use select or poll or another function?
How to process client responses? At large scale we cannot create one thread per client.
Should the processing of responses happen in a different executable, sharing requests and responses with the server executable through IPC?
There is much more to it. I would appreciate it if anyone could explain or provide a link.
Thanks
An excellent resource for information on this topic is The C10K problem. Although the numbers there seem a little old, the techniques are still applicable today.
The architecture depends on what you want to do with the clients' incoming data. My guess is that for every incoming message you would perform some computations and probably also return a response.
In that case I would create one main listener thread that receives all the incoming messages. (Actually, if your hardware has more than one physical network device, I would use a listener thread per device and make sure each one is listening to a specific device.)
Get the number of CPUs on your machine, create a worker thread for each CPU, and bind each thread to one CPU. (Maybe the number of worker threads should be num_of_cpu - 1, to leave an available CPU for the listener and dispatcher.)
Each worker thread has a queue and a semaphore; the main listener thread just pushes the incoming data into those queues. There are many ways to perform load balancing (more on that later).
Each worker thread just works on the requests given to it, and puts the response on another queue that is read by the dispatcher.
The dispatcher: there are two options here. Use a dedicated thread for the dispatcher (or a thread per network device, as for the listeners), or have the dispatcher be the same thread as the listener.
There is some advantage to putting both on the same thread, since it makes it easier to detect lost socket connections and to use the same fds for both reading and writing without thread synchronization. However, it could be that using two different threads gives better performance; it needs to be tested.
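A sketch of one such per-worker queue in C++, with a condition variable standing in for the semaphore and a plain string standing in for the request type:

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <string>

    // One of these per worker thread; the listener pushes, the worker pops.
    struct WorkQueue {
        std::mutex mutex;
        std::condition_variable cv;     // plays the role of the semaphore
        std::queue<std::string> items;  // raw request bytes, for illustration

        void push(std::string req) {    // called by the listener thread
            {
                std::lock_guard<std::mutex> lock(mutex);
                items.push(std::move(req));
            }
            cv.notify_one();
        }

        std::string pop() {             // called by the owning worker thread
            std::unique_lock<std::mutex> lock(mutex);
            cv.wait(lock, [this] { return !items.empty(); });
            std::string req = std::move(items.front());
            items.pop();
            return req;
        }
    };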
Note about load balancing:
This is a topic of its own.
The simplest thing is to use one queue for all worker threads, but the problem is that they have to lock in order to pop items, and the locking can hurt performance. (But you get the most balanced load.)
Another quite simple approach is to give every worker a private queue and perform round-robin when inserting. After every X cycles, check the size of all the queues. If some queues are much larger than others, leave them out for the next X cycles and then recheck. This is not the best approach, but it is simple to implement and gives some load balancing while requiring no locking.
By the way, there is a way to implement a queue between two threads without blocking, but that is also another topic.
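For reference, one classic non-blocking design is a single-producer/single-consumer ring buffer. A sketch (safe only with exactly one pushing thread and one popping thread):

    #include <array>
    #include <atomic>
    #include <cstddef>

    // Lock-free only for one producer thread and one consumer thread.
    template <typename T, std::size_t N>
    class SpscQueue {
    public:
        bool push(const T& item) {  // producer only
            auto head = head_.load(std::memory_order_relaxed);
            auto next = (head + 1) % N;
            if (next == tail_.load(std::memory_order_acquire))
                return false;       // full
            buffer_[head] = item;
            head_.store(next, std::memory_order_release);
            return true;
        }

        bool pop(T& item) {         // consumer only
            auto tail = tail_.load(std::memory_order_relaxed);
            if (tail == head_.load(std::memory_order_acquire))
                return false;       // empty
            item = buffer_[tail];
            tail_.store((tail + 1) % N, std::memory_order_release);
            return true;
        }

    private:
        std::array<T, N> buffer_{};
        std::atomic<std::size_t> head_{0}, tail_{0};
    };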
I hope it helps,
Guy
If the client and server are on a secure network then the security requirements are minimal: encrypt the transfers. If the clients and the server are not on a secure network, you first want the server and client to authenticate each other and then initiate encrypted data transfer. For data transfer, server-side authentication should suffice. At the end of this authentication, use the session key to generate an encrypted (symmetric) data stream. Consider using TFTP; it is simple to implement and scales reasonably well.