How does cherrypy handle user threads? - django

I'm working on a django app right and I'm using cherrypy as the server. Cherrypy creates a new thread for every page view. I'd like to be able to access all of these threads (threads responsible for talking to django) from within any of them. More specifically I'd like to be able to access the thread_data for each of these threads from within any of them. Is this possible? If so, how do I do it?

CherryPy's wsgiserver doesn't create a new thread for every request--it uses a pool. Each of those worker threads is a subclass of threading.Thread, so all of them should be accessible via threading.enumerate().
However, if you're talking specifically about cherrypy.thread_data, that's something else: a threading.local. If you're using a recent version of Python, then all that's coded in C and you (probably rightfully) don't have cross-thread access to it from Python. If you really need it and really know what you're doing, the best technique is usually to stick an additional reference to such things in a global container at the same time that they are inserted into the thread_data structure. I recommend dicts with weakrefs as keys for those global containers--there are enough Python ORM's that use them for connection pools (see my own Geniusql, for example) that you should be able to learn how to implement them fairly easily.

My first response to a question like this isn't to tell you how to do it but to stress that you really should reconsider before moving forward with this. I normally shy away from threaded web-servers, in favor of multi-process or asynchronous solutions. Adding explicit inter-thread communication to the mix only increases those fears.
When a question like this is asked, there is a deeper goal. I suspect that what you think inter-thread communication would solve can actually be solved in some other, safer way.

Related

Thread per connection vs Reactor pattern (with a thread pool)?

I want to write a simple multiplayer game as part of my C++ learning project.
So I thought, since I am at it, I would like to do it properly, as opposed to just getting-it-done.
If I understood correctly: Apache uses a Thread-per-connection architecture, while nginx uses an event-loop and then dedicates a worker [x] for the incoming connection. I guess nginx is wiser, since it supports a higher concurrency level. Right?
I have also come across this clever analogy, but I am not sure if it could be applied to my situation. The analogy also seems to be very idealist. I have rarely seen my computer run at 100% CPU (even with a umptillion Chrome tabs open, Photoshop and what-not running simultaneously)
Also, I have come across a SO post (somehow it vanished from my history) where a user asked how many threads they should use, and one of the answers was that it's perfectly acceptable to have around 700, even up to 10,000 threads. This question was related to JVM, though.
So, let's estimate a fictional user-base of around 5,000 users. Which approach should would be the "most concurrent" one?
A reactor pattern running everything in a single thread.
A reactor pattern with a thread-pool (approximately, how big do you suggest the thread pool should be?
Creating a thread per connection and then destroying the thread the connection closes.
I admit option 2 sounds like the best solution to me, but I am very green in all of this, so I might be a bit naive and missing some obvious flaw. Also, it sounds like it could be fairly difficult to implement.
PS: I am considering using POCO C++ Libraries. Suggesting any alternative libraries (like boost) is fine with me. However, many say POCO's library is very clean and easy to understand. So, I would preferably use that one, so I can learn about the hows of what I'm using.
Reactive Applications certainly scale better, when they are written correctly. This means
Never blocking in a reactive thread:
Any blocking will seriously degrade the performance of you server, you typically use a small number of reactive threads, so blocking can also quickly cause deadlock.
No mutexs since these can block, so no shared mutable state. If you require shared state you will have to wrap it with an actor or similar so only one thread has access to the state.
All work in the reactive threads should be cpu bound
All IO has to be asynchronous or be performed in a different thread pool and the results feed back into the reactor.
This means using either futures or callbacks to process replies, this style of code can quickly become unmaintainable if you are not used to it and disciplined.
All work in the reactive threads should be small
To maintain responsiveness of the server all tasks in the reactor must be small (bounded by time)
On an 8 core machine you cannot cannot allow 8 long tasks arrive at the same time because no other work will start until they are complete
If a tasks could take a long time it must be broken up (cooperative multitasking)
Tasks in reactive applications are scheduled by the application not the operating system, that is why they can be faster and use less memory. When you write a Reactive application you are saying that you know the problem domain so well that you can organise and schedule this type of work better than the operating system can schedule threads doing the same work in a blocking fashion.
I am a big fan of reactive architectures but they come with costs. I am not sure I would write my first c++ application as reactive, I normally try to learn one thing at a time.
If you decide to use a reactive architecture use a good framework that will help you design and structure your code or you will end up with spaghetti. Things to look for are:
What is the unit of work?
How easy is it to add new work? can it only come in from an external event (eg network request)
How easy is it to break work up into smaller chunks?
How easy is it to process the results of this work?
How easy is it to move blocking code to another thread pool and still process the results?
I cannot recommend a C++ library for this, I now do my server development in Scala and Akka which provide all of this with an excellent composable futures library to keep the code clean.
Best of luck learning C++ and with which ever choice you make.
Option 2 will most efficiently occupy your hardware. Here is the classic article, ten years old but still good.
http://www.kegel.com/c10k.html
The best library combination these days for structuring an application with concurrency and asynchronous waiting is Boost Thread plus Boost ASIO. You could also try a C++11 std thread library, and std mutex (but Boost ASIO is better than mutexes in a lot of cases, just always callback to the same thread and you don't need protected regions). Stay away from std future, cause it's broken:
http://bartoszmilewski.com/2009/03/03/broken-promises-c0x-futures/
The optimal number of threads in the thread pool is one thread per CPU core. 8 cores -> 8 threads. Plus maybe a few extra, if you think it's possible that your threadpool threads might call blocking operations sometimes.
FWIW, Poco supports option 2 (ParallelReactor) since version 1.5.1
I think that option 2 is the best one. As for tuning of the pool size, I think the pool should be adaptive. It should be able to spawn more threads (with some high hard limit) and remove excessive threads in times of low activity.
as the analogy you linked to (and it's comments) suggest. this is somewhat application dependent. now what you are building here is a game server. let's analyze that.
game servers (generally) do a lot of I/O and relatively few calculations, so they are far from 100% CPU applications.
on the other hand they also usually change values in some database (a "game world" model). all players create reads and writes to this database. which is exactly the intersection problem in the analogy.
so while you may gain some from handling the I/O in separate threads, you will also lose from having separate threads accessing the same database and waiting for its locks.
so either option 1 or 2 are acceptable in your situation. for scalability reasons I would not recommend option 3.

How to convert my project to become a multi threaded application

I have a project and I want to convert it to multi-threaded application. What are the things that can be done to make it a multi threaded application
List out things to be done to convert into multithreaded application
e.g mutex lock on shared variables.
I was not able to find a question which list all those under single hood.
project is in C
Single threaded application need not be concerned about being thread safe.
This issue arises when you have multiple threads which are trying to access a commonly shared resource. At that time, you must be concerned.
So, no need to worry.
EDIT (after question been edited ) :
You need to go through the following links.
Single threaded to multithreaded application
Single threaded to multithreaded application - What we need to consider ?
Advice - Single threaded to multithreaded application
Also a good advice for converting single to multithreaded application.Check out.
Single threaded -> Multithreaded application :: Good advice.
The big issue is that, in general, when designing your application it is very difficult to choose single thread and then later on add multi-threading. The choice is fundamental to the design idioms you are going to strive towards. Here's a brief but poor guide of some of the things you should be paying attention towards and how to modify your code (note, none of these are set in stone, there's always a way around):
Remove all mutable global variables. I'd say this goes for single threaded applications too but that's just me.
Add "const" to as many variables as you can as a first pass to decide where there are state changes and take notes from the compilation errors. This is not to say "turn all your variables to const." It is just s simple hack to figure out where your problem areas are going to be.
For those items which are mutable and which will be shared (that is, you can't leave them as const without compilation warnings) put locks around them. Each lock should be logged.
Next, introduce your threads. You're probably about to suffer a lot of deadlocks, livelocks, race conditions, and what not as your single threaded application made assumptions about the way and order your application would run.
Start by paring away unneeded locks. That is, look to the mutable state which isn't shared amongst your threads. Those locks are superfluous and need to go.
Next, study your code. At this point, determining where your threaded issues are is more art than science. Although, there are decent principals about how to go about this, that's about all I can say.
If that sounds like too much effort, it's time to look towards the Actor model for concurrency. This would be akin to creating several different applications which call one another through a message passing scheme. I find that Actors are not only intuitive but also massively friendly to determining where and how you might encounter threading issues. When setting up Actors, it's almost impossible not to think about all the "what ifs."
Personally, when dealing with a single threaded to multi threaded conversion, I do as little as possible to meet project goals. It's just safer.
This depends very heavily on exactly how you intend to use threads. What does your program do? Where do you want to use threads? What will those threads be doing?
You will need to figure out what resources these threads will be sharing, and apply appropriate locking. Since you're starting with a single-threaded application, it's a good idea to minimize the shared resources to make porting easier. For example, if you have a single GUI thread right now, and need to do some complex computations in multiple threads, spawn those threads, but don't have them directly touch any data for the GUI - instead, send a asynchronous message to the GUI thread (how you do this depends on the OS and GUI library) and have it handle any changes to GUI-thread data in a serialized fashion on the GUI thread itself.
As general advice, don't simply add threads willy-nilly. You should know exactly which variables and data structures are shared between threads, where they are accessed, and why. And you should be keeping said sharing to the minimum.
Without a much more detailed description of your application, it's nearly impossible to give you a complete answer.
It will be a good idea to give some insight in your understanding of threading aswell.
However, the most important is that each time a global variable is accessed or a pointer is used, there's a good chance you'll need to do that inside of a mutex.
This wikipedia page should be a good start : http://en.wikipedia.org/wiki/Thread_safety

Is Perforce's C++ P4API thread-safe?

Simple question - is the C++ API provided by Perforce thread-safe? There is no mention of it in the documentation.
By "thread-safe" I mean for server requests from the client. Obviously there will be issues if I have multiple threads trying to set client names and such on the same connection.
But given a single connection object, can I have multiple threads fetching changelists, getting status, translating files through a p4 map, etc.?
Late answer, but... From the release notes themselves:
Known Limitations
The Perforce client-server protocol is not designed to support
multiple concurrent queries over the same connection. For this
reason, multi-threaded applications using the C++ API or the
derived APIs (P4API.NET, P4Perl, etc.) should ensure that a
separate connection is used for each thread or that only one
thread may use a shared connection at a time.
It does not look like the client object has thread affinity, so in order to share a connection between threads, one just has to use a mutex to serialize the calls.
If the documentation doesn't mention it, then it is not safe.
Making something thread-safe in any sense is often difficult and may result in a performance penalty because of the addition of locks. It wouldn't make sense to go through the trouble and then not mention it in the documentation.

Django with fastcgi and threads

I have a Django app, which spawns a thread to communicate with another server, using Pyro.
Unfortunately, it seems like under fastcgi, multiple versions of this thread are fired off, and a dictionary that should be globally constant within my program, isn't. (Sometimes it has the values I expect, sometimes not)
What's the best way to ensure that there's one and only one copy of a dictionary in a django / fastcgi app?
I strongly recommend against relying on global anything in django. The problem is that, just as you seem to be encountering, the type of deployment will determine how (or whether or not) this global state is shared. To be a style nazi, that's a completely different level of abstraction from the code, which is relying on some guarantee of consistent global state.
I'm not experienced with fastcgi, but my understanding is that it, like many other frameworks, has a pre-forked and a threaded mode. In pre-forked mode, you have separate processes, not threads, running your python code. This spells nightmare for shared global state.
Barring some fragile workaround, which ought to be possible and which someone may or may not suggest, the only persistence you can really rely on is in the database, and, to a lesser extent, whatever caching mechanism you choose. You could use the low-level api to cache and retrieve keys and values.

How to access MySQL from multiple threads concurrently

We're doing a small benchmark of MySQL where we want to see how it performs for our data.
Part of that test is to see how it works when multiple concurrent threads hammers the server with various queries.
The MySQL documentation (5.0) isn't really clear about multi threaded clients. I should point out that I do link against the thread safe library (libmysqlclient_r.so)
I'm using prepared statements and do both read (SELECT) and write (UPDATE, INSERT, DELETE).
Should I open one connection per thread? And if so: how do I even do this.. it seems mysql_real_connect() returns the original DB handle which I got when I called mysql_init())
If not: how do I make sure results and methods such as mysql_affected_rows returns the correct value instead of colliding with other thread's calls (mutex/locks could work, but it feels wrong)
As maintainer of a fairly large C application that makes MySQL calls from multiple threads, I can say I've had no problems with simply making a new connection in each thread. Some caveats that I've come across:
Edit: it seems this bullet only applies to versions < 5.5; see this page for your appropriate version: Like you say you're already doing, link against libmysqlclient_r.
Call mysql_library_init() (once, from main()). Read the docs about use in multithreaded environments to see why it's necessary.
Make a new MYSQL structure using mysql_init() in each thread. This has the side effect of calling mysql_thread_init() for you. mysql_real_connect() as usual inside each thread, with its thread-specific MYSQL struct.
If you're creating/destroying lots of threads, you'll want to use mysql_thread_end() at the end of each thread (and mysql_library_end() at the end of main()). It's good practice anyway.
Basically, don't share MYSQL structs or anything created specific to that struct (i.e. MYSQL_STMTs) and it'll work as you expect.
This seems like less work than making a connection pool to me.
You could create a connection pool. Each thread that needs a connection could request a free one from the pool. If there's no connection available then you either block, or grow the pool by adding a new connection to it.
There's an article here describing the pro's and cons of a connection pool (though it is java based)
Edit: Here's a SO question / answer about connection pools in C
Edit2: Here's a link to a sample Connection Pool for MySQL written in C++. (you should probably ignore the goto statements when you implement your own.)
Seems clear to me from the mySQL Docs that any specific MYSQL structure can be used in a thread without difficulty - using the same MYSQL structure in different threads simultaneously is clearly going to give you extremely unpredictable results as state is stored within the MYSQL connection.
Thus either create a connection per thread or used a pool of connections as suggested above and protect access to that pool (i.e. reserving or releasing a connection) using some kind of Mutex.
MySQL Threaded Clients in C
It states that mysql_real_connect() is not thread safe by default. The client library needs to be compiled for threaded access.