Best practice for Django + PyMongo pooling? - django

I have a db = pymongo.Connection() call in Django's views.py for a simple MongoDB connection to store some simple statistics.
What's the best practice to make it automatically support MongoDB connection pooling?
Where do I need to put the end_request() code?
How do I choose the max_pool_size parameter during connection?

How does connection pooling work in PyMongo?
Every Connection instance has built-in connection pooling. By default,
each thread gets its own socket reserved on its first operation. Those
sockets are held until end_request() is called by that thread.
Calling end_request() allows the socket to be returned to the pool,
and to be used by other threads instead of creating a new socket.
Judicious use of this method is important for applications with many
threads or with long running threads that make few calls to PyMongo
operations.
Alternatively, a Connection created with auto_start_request=False will
share sockets (safely) among all threads.
I think it comes down to the type of application you have and how long its requests hold on to a connection. Calling end_request() helps when long-running requests would otherwise hold a socket for a long time and force many sockets to be created. If a request releases the connection as soon as it no longer needs it, the socket can be reused by other requests.
If the requests are fast, then I believe auto_start_request=False works by reusing sockets across them.
Ensuring that a request keeps using the same socket means that it will have consistent reads. Suppose you made a write that got delayed, then immediately made a query that used a different socket, and that second socket managed to respond before the first. You would see inconsistent data, since the query result does not reflect the previous write.
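To make the reservation behaviour described above concrete, here is a toy model in plain Python. It is not PyMongo's implementation and does not call PyMongo at all: ToyPool, get_socket and run_workers are hypothetical names, and the "sockets" are just integers. It only shows why calling end_request() lets later threads reuse a socket instead of opening a new one.

```python
import itertools
import threading


class ToyPool:
    """Toy model of per-thread socket reservation (not PyMongo's code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._idle = []                    # sockets that were given back
        self._ids = itertools.count(1)     # fake socket ids instead of real sockets
        self._local = threading.local()    # holds each thread's reserved socket
        self.created = 0                   # how many sockets were ever opened

    def get_socket(self):
        """Return this thread's reserved socket, reserving one on first use."""
        sock = getattr(self._local, "sock", None)
        if sock is None:
            with self._lock:
                if self._idle:
                    sock = self._idle.pop()      # reuse a returned socket
                else:
                    sock = next(self._ids)       # otherwise open a new one
                    self.created += 1
            self._local.sock = sock
        return sock

    def end_request(self):
        """Give the calling thread's socket back so other threads can reuse it."""
        sock = getattr(self._local, "sock", None)
        if sock is not None:
            self._local.sock = None
            with self._lock:
                self._idle.append(sock)


def run_workers(pool, n_threads, release):
    """Run n_threads one after another; each performs a single operation."""
    def work():
        pool.get_socket()
        if release:
            pool.end_request()

    for _ in range(n_threads):
        t = threading.Thread(target=work)
        t.start()
        t.join()    # sequential on purpose, so the outcome is deterministic
    return pool.created
```

With release=True the five threads share one socket; with release=False each thread keeps its own, so five are opened. In a Django view the equivalent of end_request() would go wherever the request is done talking to the database, though that placement is my assumption rather than something the documentation quoted above states.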

Related

Multiple writes on same socket c++

I'm currently trying to develop a server and some clients which communicate with each other using something like a proxy in the middle. The "proxy" will have sockets open to every client and server on the system. This means that I'm currently using threads to keep all the connections open. Every time a client decides to send a message, it uses its socket with the proxy and sends the message. Then the proxy will propagate the message to every other node using the respective socket.
As you can see, a node can be receiving messages by having the proxy write on the socket, or a node may want to send messages by writing on the socket.
How do I guarantee that the content in the socket does not get overwritten? Do I have to use mutexes to lock access to the socket? What is a good practice to solve this problem?
Connections are bi-directional. Content going one way does not overwrite content going the other way. No mutex is needed for this.
Besides, you couldn't use a mutex anyway, as both sides of the connection are separate.
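A small way to see this for yourself, sketched in Python rather than C++ (full_duplex_demo and talk are made-up names; socket.socketpair() stands in for a real client/proxy connection): both ends write at the same moment and each still receives exactly what the other sent.

```python
import socket
import threading


def full_duplex_demo():
    """Both ends send at once; each direction is an independent byte stream."""
    a, b = socket.socketpair()
    received = {}

    def talk(name, sock, outgoing):
        sock.sendall(outgoing)             # write on our end...
        received[name] = sock.recv(1024)   # ...and read what the peer wrote

    t1 = threading.Thread(target=talk, args=("a", a, b"from-a"))
    t2 = threading.Thread(target=talk, args=("b", b, b"from-b"))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    a.close()
    b.close()
    return received
```

No lock is involved and nothing is overwritten. Note that this only covers the two directions of one connection; several threads writing to the same end of the same socket is a different situation that this sketch does not address.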

Handing over an established TCP connection from one process to another

I am writing a simple web server in C++ that handles long-lived connections. However, I need to reload my web server from time to time. I wonder if there is a way to hand over the established connections from one process to another, so that I can retain them after a reload.
Would it be enough to pass only the file descriptors? What would happen to the connection states?
Any similar open source project that does the same thing?
Any thoughts or ideas?
Thanks,
I really have no idea whether this is possible, but I think not. If you fork(), then the child will "inherit" the descriptors, but I don't know whether they behave like they should (though I suspect that they do). And with forking, you can't run new code (can you?). Plain descriptor numbers are process-specific, so just passing them to a new, unrelated process won't work either, and they will be closed when your process terminates anyway.
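On the fork() point specifically, here is a minimal Unix-only sketch in Python (echo_via_forked_child is a made-up name, and a socketpair stands in for an accepted client connection). It only demonstrates that a forked child can keep using a socket that was opened before the fork; it says nothing about handing a connection to an unrelated process.

```python
import os
import socket


def echo_via_forked_child(message: bytes) -> bytes:
    """Open a socket pair, fork, and let the child answer on the inherited end."""
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child process: the descriptor opened before fork() is still usable.
        try:
            parent_sock.close()
            data = child_sock.recv(1024)
            child_sock.sendall(data.upper())
            child_sock.close()
        finally:
            os._exit(0)    # never fall through into the parent's code path
    # Parent process: talk to the child over the other end.
    child_sock.close()
    parent_sock.sendall(message)
    reply = parent_sock.recv(1024)
    os.waitpid(pid, 0)
    parent_sock.close()
    return reply
```

So the inherited descriptors do behave as expected after a fork, at least in this simple case.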
One solution (in the absence of a simpler one,) is to break your server into two processes:
Front-end: A very simple process that just accepts the connections, keeps them open, and forwards any data it receives to the second process, and vice versa.
Server: The real web server, which does all the logic and processing but does not communicate with the clients directly.
The first and second processes communicate via a simple protocol. One feature this protocol must have is support for the second process being terminated and relaunched.
Now, you can reload the actual server process without losing the client connections (since they are handled by the front-end process.) And since this front-end is extremely simple and probably has very few configurations and bugs, you rarely need to reload it at all. (I'm assuming that you need to reload your server process because it runs into bugs that need to be fixed or you need to change configurations and stuff.)
Another important and helpful feature that this system can have is the ability to transition between server processes "gradually". That is, you already have a front-end and a server running, but you decide to reload the server. You launch another server process that connects to the front-end (while the old server is still running and connected), and the front-end process forwards all new client connections to the new server process (or even all new requests coming from the existing client connections). When the old server finishes the requests it is still processing, it exits gracefully and cleanly.
As I said, this is a solution you might want to try only if nothing easier and simpler is found.

Connecting to remote services from multiple threaded requests

I have a boost asio application with many threads, similar to a web server, handling hundreds of concurrent requests. Every request will need to make calls to both memcached and redis (via libmemcached and redispp respectively). Is the best practice in this situation to make a separate connection to both redis and memcached from each thread (effectively tripling the open sockets on the server, three per request)? Or is there a way for me to build a static object, with a single memcached/redis connection, and allow all threads to share that single connection? I'm a bit confused when it comes to the thread safety of something like this, and everything needs to be asynchronous between the threads, but blocking for each thread's individual request (so each thread has a linear progression, but many threads can be in different places in their own progression at any given time). Does that make sense?
Thanks so much!
Since memcached has a synchronous protocol, you should not write the next request before you have received the answer to the previous one. So no other thread can talk on the same memcached connection in the meantime. I'd prefer a thread-local connection if you work with it in "blocking" mode.
Or you can make it work in an "async" manner: make a pool of connections, pick a connection from it (and lock it), and return it to the pool after the request is done.
Also, you can make a request queue and process it in dedicated thread(s) (using multigets and callbacks).
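The request-queue variant could look roughly like this, sketched in Python rather than with libmemcached. FakeCache is a hypothetical in-memory stand-in for the single memcached connection and CacheWorker is a made-up name; the sketch handles one key per request and does not attempt multigets.

```python
import queue
import threading
from concurrent.futures import Future


class FakeCache:
    """Hypothetical in-memory stand-in for one memcached connection."""

    def __init__(self, data):
        self._data = dict(data)

    def get(self, key):
        return self._data.get(key)


class CacheWorker:
    """One thread owns the connection; other threads submit requests via a queue."""

    def __init__(self, conn):
        self._conn = conn
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:               # shutdown sentinel
                break
            key, future = item
            try:
                # Only this thread ever touches the connection.
                future.set_result(self._conn.get(key))
            except Exception as exc:
                future.set_exception(exc)

    def get(self, key):
        """Queue a lookup and return a Future the caller can block on."""
        future = Future()
        self._requests.put((key, future))
        return future

    def close(self):
        self._requests.put(None)
        self._thread.join()
```

Each request thread calls worker.get(key).result() and blocks only on its own answer, so requests on the connection never overlap. If callbacks are preferred over blocking, Future.add_done_callback can be used instead of result().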

How to design a client-server architecture

I would like to know what server (TCP-based) architecture can support a large number of clients (at least 10K) in order to implement a FIX server. My points are:
How do we design it?
How do we listen on the open port? Use select, poll, or some other function?
How do we process the responses of the clients? At large scale we cannot create one thread per client.
Should the processing of responses live in a different executable that shares the requests and responses with the server executable through IPC?
There is much more to it. I would appreciate it if anyone could explain it or provide a link.
Thanks
An excellent resource for information on this topic is The C10K problem. Although the dimensions there seem a little old, the techniques are still applicable today.
The architecture depends on what you want to do with the clients incoming data. My guess is that for every incoming message you would perform some computations and probably also return a response.
In that case I would create 1 main listener thread that receives all the incoming messages (actually, if your hardware has more than 1 physical network device, I would use a listener thread per device and make sure each one listens to a specific device).
Get the number of CPUs on your machine, create a worker thread for each CPU, and bind each thread to one CPU (maybe the number of worker threads should be num_of_cpu-1, to leave a CPU available for the listener and dispatcher).
Each thread has a queue and a semaphore; the main listener thread just pushes the incoming data into those queues. There are many ways to perform load balancing (I will talk about that later).
Each worker thread just works on the requests given to it and puts the responses on another queue that is read by the dispatcher.
The dispatcher: there are 2 options here. Use a separate thread for the dispatcher (or a thread per network device, as for the listeners), or have the dispatcher actually be the same thread as the listener.
There is some advantage to putting them both on the same thread, since it makes it easier to detect lost socket connections and to use the same fds for both reading and writing without thread synchronization. However, using 2 different threads might give better performance; it needs to be tested.
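The listener / worker / dispatcher layout above can be sketched as follows. This is a Python illustration with assumptions of my own: run_pipeline and STOP are made-up names, the "computation" is just upper-casing a string, messages come from a list instead of sockets, and there is no CPU pinning. The listener and dispatcher roles share the calling thread, which is the second option described above.

```python
import queue
import threading

STOP = object()    # sentinel telling a worker to shut down


def run_pipeline(messages, n_workers=2):
    """Push (client_id, payload) messages through per-worker queues."""
    work_queues = [queue.Queue() for _ in range(n_workers)]
    responses = queue.Queue()

    def worker(q):
        while True:
            item = q.get()                 # blocks until the listener pushes work
            if item is STOP:
                responses.put(STOP)        # tell the dispatcher this worker is done
                break
            client_id, payload = item
            responses.put((client_id, payload.upper()))   # placeholder computation

    threads = [threading.Thread(target=worker, args=(q,)) for q in work_queues]
    for t in threads:
        t.start()

    # Listener role: hand each incoming message to a worker queue.
    for i, message in enumerate(messages):
        work_queues[i % n_workers].put(message)
    for q in work_queues:
        q.put(STOP)

    # Dispatcher role: collect the responses that would be written back to clients.
    sent = {}
    finished = 0
    while finished < n_workers:
        item = responses.get()
        if item is STOP:
            finished += 1
        else:
            client_id, reply = item
            sent[client_id] = reply

    for t in threads:
        t.join()
    return sent
```

queue.Queue does its own locking and blocking, so it plays the part of the "queue and semaphore" pair here.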
Note about load balancing:
This is a topic of its own.
The simplest thing is to use 1 queue for all working threads, but the problem is that they have to lock in order to pop items and the locking can damage performance. (But you get the most balanced load).
Another quite simple approach would be to have a private queue for every worker and perform round-robin when inserting. After every X cycles check the size of all the queues. If some queues are much larger than others then leave them out for the next X cycles and then recheck them again. This is not the best approach, but a simple one to implement and gives some load balancing while no locking is needed.
By the way, there is a way to implement a queue between 2 threads without blocking, but that is also another topic.
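The round-robin scheme with periodic size checks could be sketched like this in Python. RoundRobinBalancer, check_every (the "X cycles") and max_ratio (what counts as "much larger") are names and thresholds I made up for illustration. The sketch is single-threaded and only models the listener's choice of queue, not the workers draining them.

```python
from collections import deque


class RoundRobinBalancer:
    """Round-robin over private worker queues, skipping overloaded ones."""

    def __init__(self, n_workers, check_every=4, max_ratio=2.0):
        self.queues = [deque() for _ in range(n_workers)]
        self.check_every = check_every     # re-examine queue sizes every X inserts
        self.max_ratio = max_ratio         # "much larger" = more than ratio * average
        self.active = list(range(n_workers))
        self._next = 0
        self._since_check = 0

    def _recheck(self):
        sizes = [len(q) for q in self.queues]
        average = sum(sizes) / len(sizes)
        active = [i for i, s in enumerate(sizes) if s <= self.max_ratio * average]
        # Fall back to every queue if the filter would leave none.
        self.active = active or list(range(len(self.queues)))
        self._next = 0

    def push(self, item):
        """Insert item into the next active queue and return that queue's index."""
        if self._since_check >= self.check_every:
            self._recheck()
            self._since_check = 0
        idx = self.active[self._next % len(self.active)]
        self._next += 1
        self._since_check += 1
        self.queues[idx].append(item)
        return idx
```

For example, if worker 0 is stuck with a long backlog, the first round still visits it, but after the next size check it is left out until a later check finds it has caught up.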
I hope it helps,
Guy
If the client and server are on a secure network, then the security requirements are minimal: it is enough that the transfers are encrypted. If the clients and the server are not on a secure network, you first want the server and client to authenticate each other and then initiate encrypted data transfer. For data transfer, server-side authentication should suffice. At the end of this authentication, use the session key to generate a symmetrically encrypted data stream. Consider using TFTP; it is simple to implement and scales reasonably well.

C++ MySQL and multithreading - 1 DB connection per user?

In a multithreaded application, is it suitable to have 1 connection per connected client? To me it seems inefficient, but if there is no connection pooling, how could it be done when one wants to let each client communicate with the DB?
Thanks
If you decide to share a connection amongst threads, you need to be sure that one thread completely finishes with the connection before another uses it (use a mutex, semaphore, or critical section to protect the connections). Alternately, you could write your own connection pool. This is not as hard as it sounds ... make 10 connections (or however big your pool needs to be) on startup and allocate/deallocate them on demand. Again protecting with mutex/cs/sema.
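A home-made pool along those lines is short. Here is a sketch in Python rather than C++: ConnectionPool and the connect callable are hypothetical, and no real MySQL client is involved. queue.Queue supplies the mutex/semaphore protection mentioned above.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: open all connections at startup, lend them out on demand."""

    def __init__(self, connect, size=10):
        # 'connect' is any zero-argument callable that returns a new connection.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while every connection is in use; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)    # always hand the connection back
```

A client thread then writes "with pool.connection() as conn:" around its DB work, and only one thread at a time ever holds a given connection.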
That depends on your architecture.
It sounds like you're using a server->distributed client model? In that case I would implement some sort of a layer for DB access, and hide connection pooling, etc. behind a data access facade.