Boost::Thread or fork() : Multithreaded HTTP Proxy - c++

I'm testing boost::thread on a system. I find that I need it to behave like fork(), because one thread modifies another thread's variables, even member variables of a class.
Should I do the project with fork(), or is there an alternative that still uses boost::thread?
The program will run on Linux and maybe FreeBSD.
It is an HTTP proxy: accept() runs in the main thread, and a secondary thread runs a function that takes a class instance (holding the socket file descriptor) and serves the connection.
Is there a better way to implement a proxy?

fork() spawns a process that has its own independent memory regions. Changes must be communicated through IPC.
boost::thread creates a thread, which shares memory with the other threads in the process.
They are not comparable.
To create thread-local storage, use boost::thread_specific_ptr.
See http://www.boost.org/doc/libs/1_42_0/doc/html/thread/thread_local_storage.html.
(You may also decorate a global variable as __thread int xyz; to make it thread-local, if the compiler and architecture can support it.)
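As a rough illustration of thread-local storage, here is a minimal sketch that uses the C++11 `thread_local` keyword and `std::thread` instead of `boost::thread_specific_ptr` (a deliberate swap so the example needs only the standard library). The names `per_thread_counter`, `bump_counter` and `bump_in_worker` are hypothetical.

```cpp
#include <cassert>
#include <thread>

// Hypothetical per-thread counter: every thread gets its own copy,
// initialized to 0 the first time that thread touches it.
thread_local int per_thread_counter = 0;

// Increments the calling thread's copy n times and returns its value.
int bump_counter(int n) {
    for (int i = 0; i < n; ++i) {
        ++per_thread_counter;
    }
    return per_thread_counter;
}

// Runs bump_counter in a fresh worker thread and reports what the
// worker saw. The caller's own copy of the counter is left untouched.
int bump_in_worker(int n) {
    int seen = 0;
    std::thread worker([&seen, n] { seen = bump_counter(n); });
    worker.join();  // join() makes the write to `seen` visible here
    return seen;
}
```

Calling `bump_in_worker(5)` from `main` should leave `main`'s own `per_thread_counter` unchanged, which is the fork-like isolation the question asks for, limited to the variables declared this way.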

It sounds like you are trying to allow multiple threads to alter global variables without each others' changes affecting any of the other threads. By forking, the entire memory space of your application is basically copied and each branch of the fork has its own variables and the two branches cannot communicate except through IPC.
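The copy-on-fork behaviour can be seen in a small POSIX sketch. It assumes Linux or FreeBSD; `global_value`, `ForkResult` and `fork_and_modify` are hypothetical names, and the child's exit status is used as a very crude form of IPC.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical global that both branches of the fork appear to share.
int global_value = 1;

struct ForkResult {
    int child_value;   // value the child reported via its exit status
    int parent_value;  // value the parent still sees afterwards
};

// Forks, lets the child modify its private copy of global_value, and
// returns what each side ended up with. Returns child_value == -1 if
// fork() or waitpid() fails.
ForkResult fork_and_modify() {
    pid_t pid = fork();
    if (pid < 0) {
        return ForkResult{-1, global_value};
    }
    if (pid == 0) {
        global_value = 42;    // changes only the child's copy
        _exit(global_value);  // report it through the exit status
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return ForkResult{-1, global_value};
    }
    return ForkResult{WEXITSTATUS(status), global_value};
}
```

The parent still sees 1 after the child has written 42, because the write happened in the child's own copy of the address space.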
If you want to use boost::thread, you'll have to do this copying yourself if you don't want threads to affect each other since the same memory space is common among all threads. You could just create the variables local to each thread function.
Using threads instead of forking will be much more flexible especially when you want to start letting the threads share data. If you want to have variables that all of the threads can change, they should be protected by mutex locks when being changed so that only one thread can change a variable at one time.
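For the shared case, a minimal sketch of mutex-protected updates might look like this. It uses `std::mutex` and `std::thread`, the standard-library counterparts of `boost::mutex` and `boost::thread`; the function name `parallel_count` is hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Starts `threads` workers that each increment one shared total
// `per_thread` times, and returns the final total.
long parallel_count(int threads, int per_thread) {
    long total = 0;      // shared by every worker
    std::mutex guard;    // protects `total`
    std::vector<std::thread> workers;

    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&total, &guard, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // Only one thread at a time can hold the lock, so the
                // read-modify-write on `total` cannot interleave.
                std::lock_guard<std::mutex> lock(guard);
                ++total;
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return total;
}
```

Without the `lock_guard`, the increments would be a data race and the result could come out lower than `threads * per_thread`.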

Related

main() having read-only access to edited values inside continuously running threads - C++

I create several pthreads inside the main() function of my C++ program. Once the threads are created, each of them is responsible for creating a UDP socket and listening on a specific port. Based on the data arriving on each port, the threads modify global variables, which are read at the same time by the main function inside a loop. The question is whether I can use a mutex for this inside the main function, or whether the main function itself consumes so many resources that it would be better to make another thread responsible for reading the modified data (and of course in this new thread I would have the option of using a mutex).
Thanks in advance.
There's no significant difference between the "main" thread and other threads it creates. Once the additional threads are created, they all have the same access rights and memory mappings.

Interprocess communication: Shared memory vs thread object access

I always learnt that shared memory is the fastest way to share data between two threads (see e.g. http://www.boost.org/doc/libs/1_55_0/doc/html/interprocess.html). However, today I discovered that with boost::ref(X) it is possible to give boost a reference to X, enabling access to X from outside the thread. Therefore the following pseudocode should work:
MyObject X(para1,para2); // MyObject has a () operator
boost::thread thr(boost::ref(X));
X.setSomeMember(1);
This got me thinking: Assuming setSomeMember is thread safe, then - for most applications - this approach seems much easier, since most applications spawn their threads as they need and thus can always save and access the object X. So, why would I use shared memory or message queues anyway, if I have access to the thread object directly? Is it maybe faster? Or am I missing something here?
They're just different features - you happen to highlight the similarities.
Yes, threads are more lightweight than processes.
What you lose is isolation (processes can only share what's explicitly exposed, and only given the right permissions). There is no such control for inter-thread sharing.
If one thread messes up the shared state, all threads die, the same goes for shared memory. However, if one thread dies, the whole process dies, which doesn't happen for separate processes.
All in all, it's different. Inter-process synchronization/sharing is more heavyweight but has more features (how will you run a separate thread on a different host? :)).
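The pseudocode from the question can be sketched with the standard library, where `std::ref` and `std::thread` play the roles of `boost::ref` and `boost::thread`. The class `Worker`, its `observed()` accessor and the helper `run_with_ref` are hypothetical, and "setSomeMember is thread safe" is modelled here with a `std::atomic<int>`, which is only one of several ways to get that guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical callable object shared by reference with its thread.
class Worker {
public:
    // Thread body: waits until another thread sets a non-zero value,
    // then records what it saw.
    void operator()() {
        while (some_member_.load() == 0) {
            std::this_thread::yield();
        }
        observed_ = some_member_.load();
    }

    // Thread safe because the member is atomic.
    void setSomeMember(int value) { some_member_.store(value); }

    // Only call after the thread has been joined.
    int observed() const { return observed_; }

private:
    std::atomic<int> some_member_{0};
    int observed_ = 0;
};

// Starts a thread on a reference to x (no copy is made), updates x
// from the calling thread, and returns what the thread observed.
// `value` must be non-zero, otherwise the worker never stops waiting.
int run_with_ref(int value) {
    Worker x;
    std::thread thr(std::ref(x));
    x.setSomeMember(value);
    thr.join();
    return x.observed();
}
```

Passing `x` without `std::ref` would hand the thread its own copy, so the update made through `setSomeMember` would never reach it.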

V8 order of instantiating the variables (multi-thread)

I'm new to Google's V8 and I'm not sure how to fully use the variable types it provides. I'll start by explaining what I want the flow to be:
In the main thread I want to compile the JS scripts.
In several threads I want to run the scripts, adding different information to the context in each one using instance->SetAccessor(...) or prototype->Set(...) (or any other option, if there is one).
I am not sure when I need to do the following:
Where and when should I create the v8::HandleScope? Is creating one in the main thread enough, or do I need one for each thread?
Where and when should I create the v8::Isolate and v8::Locker? Should they be per thread or not? Should they come before or after the v8::HandleScope?
Any info will help (:
If you want to run the scripts in parallel from each thread with no cross-thread sharing, then each thread needs its own isolate. You may or may not actually need one for the main thread, or you could maybe use the default isolate. I'd recommend making sure that the default isolate has been initialized before running any threads though, just in case one of your other threads ends up initializing it. You should be ok if you are using isolates but it won't do any harm to be sure.
If you need cross-thread sharing of objects etc then you'll need to research this and it is likely to be difficult. Not even sure if v8 can really support it yet or not. Having separate isolates and avoiding sharing of objects is much easier.
You should be able to compile your scripts in the context of an isolate intended for the thread that is going to execute it in the main thread and then pass the script and the isolate to the thread and not touch either again in the main thread until the worker thread is done with it. This ought to work, but I've not checked if v8 checks the thread-id that the isolate was created in and the one it executes in. It's worth writing a little test app to check that this will work.
The other option is to check the compilation in the main thread and compile it again in the worker thread and encapsulate the isolate in the thread. This is the way I have done it in the past. It's easier but less efficient.
The handle scope should be allocated on the stack only in the functions where it is needed. Don't use a global variable for the handle scope or allocate it on the heap.
Your compiled script should use a persistent handle.
Enter the handle scope after you have entered the isolate scope.

STA (Single Threaded Apartment) COM Object - Spawn worker threads?

Is it a bad thing to spawn worker threads in your STA COM object (ie. COM object creates a thread to perform a task)? I think, the answer is - that depends!
For example in my case:
The worker threads that I am using will not interfere/access COM or COM Services.
The reason I am asking is that, by definition, an STA can only house one thread. Spawning multiple threads seems to go against this principle, unless the worker threads and the work they do don't interfere or deal with COM or COM services.
In this case I am thinking this is perfectly fine and in my opinion the worker threads should not be considered by COM as part of the logical STA.
What are your thoughts on this?
No, that's not a bad thing. Apartments explicitly exist to help you get multi-threaded code working. An STA thread is a safe home for a COM server that's not thread-safe; COM's apartment threading model ensures that it is always used in a thread-safe way. All you have to do is marshal the interface pointer you want to use in the worker thread (with IGlobalInterfaceTable, for example) and you can call the methods without doing anything special.
This doesn't come for free of course, there's overhead involved in marshaling the call. How much depends on how responsive the STA thread is when it pumps its message loop. If you intended to create the worker thread explicitly to use that COM server in a multi-threaded way then of course you'll not be ahead, you made it slower.
Don't let the worker threads use COM in any way, and you should be fine. This means you can't call COM objects in the worker and you can't call COM runtime APIs from the worker... either directly or indirectly.
The important thing to realize is that any new threads you create are new threads in their own right; it actually doesn't matter at all which thread created them. The two things that matter are: (1) that those new threads themselves call CoInitializeEx and either get their own STA each, or share an MTA together, and (2) any COM object pointers you transfer between threads get marshaled appropriately. Do not ever just pass a COM object pointer from one thread to another in a global variable; instead use the GIT or CoMarshalInterThreadInterfaceInStream as appropriate.
(One exception to this: you can pass COM pointers freely between MTA threads; but only once that pointer has been appropriately marshaled into the MTA in the first place.)
Also, you need to be very aware of where objects live and what their affinities are. If you create an object on an STA thread and marshal a pointer to another thread, then the typical case is that the object will still live on that original STA thread, with calls returning to that thread, unless you take specific steps to specify otherwise. (Things to watch for here: what the object's threading model is, and whether it 'aggregates the free-threaded marshaller'.)
So it's not a bad thing; but be sure that you do it appropriately. For example, you might think that using two threads might be more efficient; but then later on realize that a lot of time is being spent by that worker thread calling back to the object on the original thread, giving you worse performance than a single-threaded case. So you need to think out your threads and object strategy carefully first.
(Having said all of that, you can of course spin up as many threads as you want that don't call CoInitialize, so long as they don't use COM or COM objects in any way; if those threads do need to somehow communicate with the threads that do use COM, it's up to you to manage that communication using any 'classic' IPC mechanism of your choice, e.g. messages, globals, etc.)

Why do locks work?

If the locks make sure only one thread accesses the locked data at a time, then what controls access to the locking functions?
I thought that boost::mutex::scoped_lock should be at the beginning of each of my functions so the local variables don't get modified unexpectedly by another thread, is that correct? What if two threads are trying to acquire the lock at very close times? Won't the lock's local variables used internally be corrupted by the other thread?
My question is not boost-specific but I'll probably be using that unless you recommend another.
You're right, when implementing locks you need some way of guaranteeing that two processes don't get the lock at the same time. To do this, you need to use an atomic instruction - one that's guaranteed to complete without interruption. One such instruction is test-and-set, an operation that will get the state of a boolean variable, set it to true, and return the previously retrieved state.
This allows you to write code that continually tests to see if it can get the lock. Assume x is a variable shared between threads:
while(testandset(x));
// ...
// critical section
// this code can only be executed by one thread at a time
// ...
x = 0; // set x to 0, allow another thread into the critical section
Since the other threads continually test the lock until they're let into the critical section, this is a very inefficient way of guaranteeing mutual exclusion. However, using this simple concept, you can build more complicated control structures like semaphores that are much more efficient (because the waiting threads aren't looping, they're sleeping).
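The test-and-set loop above maps fairly directly onto `std::atomic_flag` in C++11, whose `test_and_set()` atomically sets the flag and returns its previous value. The following is a minimal sketch of such a spinlock, not a recommendation to use one in production code; `SpinLock` and `count_with_spinlock` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Busy-waiting lock built on an atomic test-and-set.
class SpinLock {
public:
    void lock() {
        // test_and_set returns the previous state: true means another
        // thread already holds the lock, so keep trying.
        while (flag_.test_and_set(std::memory_order_acquire)) {
        }
    }
    void unlock() {
        // Equivalent of "x = 0": lets another thread in.
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Increments a plain (non-atomic) counter from several threads, with
// the spinlock guarding the critical section.
long count_with_spinlock(int threads, int per_thread) {
    long total = 0;
    SpinLock lock;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&total, &lock, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                lock.lock();
                ++total;  // critical section
                lock.unlock();
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return total;
}
```

The lock's own state cannot be corrupted by two threads arriving at the same moment, because the hardware guarantees that only one of the competing `test_and_set` calls sees the flag as previously clear.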
You only need to have exclusive access to shared data. Unless they're static or on the heap, local variables inside functions will have different instances for different threads and there is no need to worry. But shared data (stuff accessed via pointers, for example) should be locked first.
As for how locks work, they're carefully designed to prevent race conditions and often have hardware-level support to guarantee atomicity. That is, there are some machine language constructs guaranteed to be atomic. Semaphores (and mutexes) may be implemented via these.
The simplest explanation is that the locks, way down underneath, are based on a hardware instruction that is guaranteed to be atomic and can't clash between threads.
Ordinary local variables in a function are already specific to an individual thread. It's only statics, globals, or other data that can be simultaneously accessed by multiple threads that needs to have locks protecting it.
The mechanism that operates the lock controls access to it.
Any locking primitive needs to be able to communicate changes between processors, so it's usually implemented on top of bus operations, i.e., reading and writing to memory. It also needs to be structured such that two threads attempting to claim it won't corrupt its state. It's not easy, but you can usually trust that any OS implemented lock will not get corrupted by multiple threads.