I have a C++ program that performs some lengthy computation in parallel using OpenMP. Now that program also has to respond to user input and update some graphics. So far I've been starting my computations from the main / GUI thread, carefully balancing the workload so that it is neither too short to mask the OpenMP threading overhead nor so long that the GUI becomes unresponsive.
Clearly I'd like to fix that by running everything concurrently. As far as I can tell, OpenMP 2.5 doesn't provide a good mechanism for doing this. I assume it wasn't intended for this type of problem. I also wouldn't want to dedicate an entire core to the GUI thread; it just needs <10% of one for its work.
I thought maybe moving the computation into a separate pthread which launches the parallel constructs would be a good way of solving this. I coded this up but had OpenMP crash when invoked from the pthread, similar to this bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36242 . Note that I was not trying to launch parallel constructs from more than one thread at a time; OpenMP was only used in one pthread throughout the program.
It seems I can neither use OpenMP to schedule my GUI work concurrently nor use pthreads to have the parallel constructs run concurrently. I was thinking of just handling my GUI work in a separate thread, but that happens to be rather ugly in my case and might actually not work due to various libraries I use.
What's the textbook solution here? I'm sure others have used OpenMP in a program that needs to concurrently deal with a GUI / networking etc., but I haven't been able to find any information using Google or the OpenMP forum.
Thanks!
There is no textbook solution. The textbook application for OpenMP is non-interactive programs that read input files, do heavy computation, and write output files, all using the same thread pool of size ~ #CPUs in your supercomputer. It was not designed for concurrent execution of interactive and computation code and I don't think interop with any threads library is guaranteed by the spec.
Leaving theory aside, you seem to have encountered a bug in the GCC implementation of OpenMP. Please file a bug report with the GCC maintainers and for the time being, either look for a different compiler or run your GUI code in a separate process, communicating with the OpenMP program over some IPC mechanism. (E.g., async I/O over sockets.)
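For reference, here is a minimal sketch of the arrangement the question describes, assuming a compiler and OpenMP runtime on which starting a parallel region from a non-main thread works (which the question reports was not the case for its GCC build). The names ComputeJob, square_all and start_job are hypothetical. The GUI thread only polls an atomic flag, so it never blocks on the computation.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical hand-off state between the GUI thread and the compute thread.
struct ComputeJob {
    std::atomic<bool> done{false};
};

// The heavy step. With OpenMP enabled the loop is split across the OpenMP
// thread pool; without it the pragma is ignored and the loop runs serially.
inline void square_all(std::vector<double>& v) {
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(v.size()); ++i)
        v[i] *= v[i];
}

// Starts the computation on its own thread and returns immediately.
// The caller must keep `job` and `v` alive until the thread is joined.
inline std::thread start_job(ComputeJob& job, std::vector<double>& v) {
    return std::thread([&job, &v] {
        square_all(v);
        job.done.store(true, std::memory_order_release);
    });
}
```

The GUI loop would call job.done.load(std::memory_order_acquire) once per iteration and only touch the vector after it returns true, then join the thread.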
I work in a lab and wrote a multithreaded computational program in C++11 using std::thread. Now I have an opportunity to run my program on a multi-CPU server.
Server:
Runs Ubuntu server
Has 40 Intel CPUs
I know nothing about multi-CPU programming. The first idea that comes to mind is to run 40 applications and then glue their results together. That is possible, but I want to know more about my options.
If I compile my code on the server with its gcc compiler, will the resulting application take advantage of multiple CPUs?
If the answer to #1 is "it depends", how can I check it?
Thank you!
If your program runs multithreaded your OS should take care automatically that it uses the CPUs available.
Make sure to distribute the work across roughly as many threads as there are CPUs you can use. Make sure it is not just one thread that does the work while the other threads merely wait for that thread to terminate.
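A minimal sketch of that advice, assuming the work can be split into independent contiguous chunks. The function name parallel_sum is hypothetical, and summing integers stands in for the real computation. The thread count defaults to std::thread::hardware_concurrency(), which may return 0 when the value is unknown, hence the fallback to 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` with `num_threads` workers, one contiguous chunk per worker.
// Pass 0 to use as many threads as the hardware reports.
inline long long parallel_sum(const std::vector<int>& data, unsigned num_threads = 0) {
    if (num_threads == 0)
        num_threads = std::max(1u, std::thread::hardware_concurrency());

    std::vector<long long> partial(num_threads, 0);  // one slot per worker, no sharing
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Every worker gets about the same amount of data, so no single thread ends up doing all the work while the others wait.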
Your question is not only about multi-threading, but about multi-CPU.
Basically the operating system will automatically spread out the threads over the cores. You don't need to do anything.
Since you are using C++11, you have std::thread::get_id(), which you can call to tell the different threads apart, but the standard library gives you no way to identify the core a thread is running on. Use pthreads directly plus "CPU affinity" for this.
If you want that kind of precision, you can google for "CPU affinity" for more details on how to get control over it: you can identify the core as well as choose the core. You can start with this: http://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html
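A minimal sketch of the affinity approach, assuming Linux with glibc (pthread_setaffinity_np, sched_getaffinity and sched_getcpu are non-portable). The names PinResult and run_pinned are hypothetical. It pins a worker thread to the first CPU the process is allowed to use and then asks the kernel which CPU the thread is on.

```cpp
#include <pthread.h>
#include <sched.h>
#include <thread>

struct PinResult {
    int target = -1;    // CPU we asked for, -1 if none could be determined
    int observed = -1;  // CPU reported by sched_getcpu() inside the thread
};

inline PinResult run_pinned() {
    PinResult r;
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // Ask which CPUs this process may run on at all.
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return r;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &allowed)) { r.target = cpu; break; }
    if (r.target < 0) return r;

    std::thread worker([&r] {
        cpu_set_t one;
        CPU_ZERO(&one);
        CPU_SET(r.target, &one);
        // Restrict the calling thread to a single CPU, then identify it.
        if (pthread_setaffinity_np(pthread_self(), sizeof(one), &one) == 0)
            r.observed = sched_getcpu();
    });
    worker.join();
    return r;
}
```

In most programs this is unnecessary, since the scheduler already spreads threads over the cores; pinning mainly matters when cache or NUMA placement is being tuned.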
[Update:] I'm spawning multiple processes now and it works fairly well, though the basic threading problem still exists. [/]
I'm trying to thread a c++ (g++ 4.6.1) program that compiles a bunch of opencl kernels. Most of the time taken is spent inside clBuildProgram. (It's genetic programming and actually running the code and evaluating fitness is much much faster.) I'm trying to thread the compilation of these kernels and not having any luck so far. At this point, there's no shared data between threads (aside from having the same platform and device reference), but it will only run one thread at a time. I can run this code as several processes (just launching them in different terminal windows in linux) and it will then use up multiple cores but not within one process. I can use multiple cores with the same basic threading code (std::thread) with just basic math, so I think it's something to do with either the opencl compile or some static data I forgot about. :) Any ideas? I've done my best to make this thread-safe, so I'm stumped.
I'm using AMD's SDK (opencl 1.1, circa 6/13/2010) and a 5830 or 5850 to run it. The SDK and g++ are not as up to date as they could be. The last time I installed a newer linux distro in order to get the newer g++, my code was running at half speed (at least the opencl compiles were), so I went back. (Just checked the code on that install and it runs at half speed still with no threading differences.) Also, when I said it only runs one thread at a time, it will launch all of them and then alternate between two until they finish, then do the next two, etc. And it does look like all of the threads are running until the code gets to building the program. I'm not using a callback function in clBuildProgram. I realize there's a lot that could be going wrong here and it's hard to say without the code. :)
I am pretty sure this problem occurs inside of or in the call of clBuildProgram. I'm printing the time taken inside of here and the threads that get postponed will come back with a long compile time for their first compile. The only shared data between these clBuildProgram calls is the device id, in that each thread's cl_device_id has the same value.
This is how I'm launching the threads:
for (a = 0; a < num_threads; a++) {
    threads[a] = std::thread(std::ref(programs[a]));
    threads[a].detach();
    sleep(1); // giving the opencl init f()s time to complete
}
This is where it's bogging down (and these are all local variables being passed, though the device id will be the same):
clBuildProgram(program, 1, & device, options, NULL, NULL);
It doesn't seem to make a difference whether each thread has a unique context or command_queue. I really suspected this was the problem which is why I mention it. :)
Update: Spawning child processes with fork() will work for this.
You might want to post something on AMD's support forum about that. Considering how many OpenGL implementations have failed to provide the thread consistency that the spec requires, it would not surprise me if OpenCL drivers were still suboptimal in that sense. They could be using the process ID internally to separate data instead, who knows.
If you have a working multi-process generation, then I suggest you keep that, and communicate results using IPC. You can use Boost.Interprocess, which has interesting ways of using serialization (e.g. with boost::spirit to reflect the data structures). Or you could use POSIX pipes, or shared memory, or just dump compilation results to files and poll the directory from your parent process, using boost::filesystem and directory iterators...
Forked processes may inherit some handles, so I believe there are ways to use unnamed pipes as well. That could help you avoid the need to create a pipe server that instantiates client pipes, which can lead to extensive protocol coding.
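A minimal sketch of the unnamed-pipe idea, assuming a POSIX system. The name run_in_child is hypothetical, and the `work` callable is a stand-in for the expensive step (for example one kernel build) that returns its result as text. The child inherits the write end of the pipe across fork(), so no pipe server or naming scheme is needed.

```cpp
#include <cstddef>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `work` in a forked child and returns the text it produced.
template <typename Work>
std::string run_in_child(Work work) {
    int fds[2];
    if (pipe(fds) != 0) return "";

    const pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return ""; }

    if (pid == 0) {                        // child: compute, write, exit
        close(fds[0]);
        const std::string out = work();
        std::size_t off = 0;
        while (off < out.size()) {
            const ssize_t n = write(fds[1], out.data() + off, out.size() - off);
            if (n <= 0) break;
            off += static_cast<std::size_t>(n);
        }
        close(fds[1]);
        _exit(0);                          // do not run the parent's atexit handlers
    }

    close(fds[1]);                         // parent: read until the child closes its end
    std::string result;
    char buf[4096];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof(buf))) > 0)
        result.append(buf, static_cast<std::size_t>(n));
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);              // reap the child
    return result;
}
```

To build several programs at once, the parent would fork all children first and read from their pipes afterwards; this sketch handles one child at a time to stay short.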
I am using a simple Concurrency Runtime task_group in Visual Studio 2010 to run a single working thread to separate the work from the GUI thread.
However one of my colleagues told me that I'm using CR wrong: it was designed for parallelizing lightweight tasks with small context and not for separating bulky and I/O-dependent threads from the GUI. He said that he'd taken this from the documentation, but failed to provide any specific links.
So, what are the limitations of the Microsoft Concurrency Runtime, and for which problems should I NOT use it?
Of course CR is not portable, but let's leave that out: I'm talking about situations where your code compiles, but you get problems nevertheless.
The concurrency runtime is a cooperative scheduling infrastructure. If you're not going to take advantage of cooperative scheduling, then you're better off creating threads when you need to, and letting the OS take care of scheduling.
If you are into cooperative scheduling, then there's really no point to wait for an IO operation to complete, because you're blocking a thread which could have otherwise been used for running other tasks, which do not depend on this IO operation to complete. If other tasks depend on the IO task to complete, you can simply make them continuations, and the ConcRT scheduler will make sure to run them when their time comes.
So it's really not about limitations here. It's simply about knowing what you're trying to achieve, and picking the right tool for the job.
As Yam mentioned, the concurrency runtime does not guarantee parallel execution; it only makes it possible, and that is the difference between the notions of tasks and threads. If you get your tasks right (not so fine-grained that much time is spent switching between tasks, and not so coarse that some cores are left without work; in your case there is just one task), then the overhead will not be significant, and your program will be ready for running on a multi-core or a multi-processor platform, "future proof" as MSFT people like to say.
Preemptive multitasking in C/C++: can a running thread be interrupted by some timer and switch between tasks?
Many VMs and other language runtimes using green-threading and such are implemented in these terms; can C/C++ apps do the same?
If so, how?
This is going to be platform dependent, so please discuss this in terms of the support particular platforms have for this; e.g. if there's some magic you can do in a SIGALRM handler on Linux to swap some kind of internal stack (perhaps using longjmp?), that'd be great!
I ask because I am curious.
I have been working for several years making async IO loops. When writing async IO loops I have to be very careful not to put expensive to compute computation into the loop as it will DOS the loop.
I therefore have an interest in the various ways an async IO loop can be made to recover or even fully support some kind of green threading or such approach. For example, sampling the active task and the number of loop iterations in a SIGALRM, and then if a task is detected to be blocking, move the whole of everything else to a new thread, or some cunning variation on this with the desired result.
There were some complaints about node.js in this regard recently, and elsewhere I've seen tantalizing comments about other runtimes such as Go and Haskell. But let's not go too far away from the basic question of whether you can do preemptive multitasking in a single thread in C/C++.
Windows has fibers that are user-scheduled units of execution sharing the same thread.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms682661%28v=vs.85%29.aspx
UPD: More information about user-scheduled context switching can be found in the LuaJIT sources. It supports coroutines on different platforms, so looking at the sources can be useful even if you are not using Lua at all. Here is the summary: http://coco.luajit.org/portability.html
As far as I understand, you are mixing things that are usually not mixed:
Asynchronous Signals
A signal is usually delivered to the program (thus in your description one thread) on the same stack that is currently running and runs the registered signal handler... in BSD unix there is an option to let the handler run on a separate so-called "signal stack".
Threads and Stacks
The ability to run a thread on its own stack requires the ability to allocate stack space and save and restore state information (that includes all registers...) - otherwise clean "context switch" between threads/processes etc. is impossible. Usually this is implemented in the kernel and very often using some form of assembler since that is a very low-level and very time-sensitive operation.
Scheduler
AFAIK every system capable of running threads has some sort of scheduler... which is basically a piece of code running with the highest privileges. Often it has subscribed to some HW signal (clock or whatever) and makes sure that no other code ever registers directly (only indirectly) to that same signal. The scheduler thus has the ability to preempt anything on that system. The main concern is usually to give the threads enough CPU cycles on the available cores to do their job. The implementation usually includes some sort of queues (often more than one), priority handling and several other things. Kernel-side threads usually have a higher priority than anything else.
Modern CPUs
On modern CPUs the implementation is rather complicated since it involves dealing with several cores and even some "special threads" (i.e. hyperthreads)... since modern CPUs usually have several levels of cache etc., it is very important to deal with these appropriately to achieve high performance.
All the above means that your thread can and most probably will be preempted by OS on a regular basis.
In C you can register signal handlers which in turn preempt your thread on the same stack... BEWARE that signal handlers are problematic if reentered... you can either put the processing into the signal handler or fill some structure (for example a queue) and have that queue content consumed by your thread...
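A minimal sketch of the second option, assuming a POSIX system. The handler does nothing except set a flag, and the running task checks that flag between units of work. The names g_alarm_fired, on_alarm and work_until_alarm are hypothetical. Note that the task is not switched out by this: the signal interrupts it, and the task itself decides what to do once it sees the flag.

```cpp
#include <csignal>
#include <sys/time.h>

// Set by the SIGALRM handler; setting it is the only thing the handler does.
static volatile std::sig_atomic_t g_alarm_fired = 0;

extern "C" void on_alarm(int) { g_alarm_fired = 1; }

// Runs a busy "task" until a one-shot timer fires after `millis` ms.
// Returns the number of loop iterations completed before the flag was seen.
inline unsigned long work_until_alarm(long millis) {
    if (millis <= 0) millis = 1;           // a zero interval would disarm the timer
    g_alarm_fired = 0;
    std::signal(SIGALRM, on_alarm);

    itimerval timer{};
    timer.it_value.tv_sec = millis / 1000;
    timer.it_value.tv_usec = (millis % 1000) * 1000;
    setitimer(ITIMER_REAL, &timer, nullptr);

    unsigned long iterations = 0;
    while (!g_alarm_fired)
        ++iterations;                      // stand-in for one small unit of real work
    return iterations;
}
```

An async IO loop could use the same flag to notice that the current callback has overrun its time slice and, for example, hand the remaining events to another thread.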
Regarding setjmp/longjmp you need to be aware that they are prone to several problems when used with C++.
For Linux there is/was a "full preemption patch" available which allows you to tell the scheduler to run your thread(s) with even higher priority than kernel threads (disk I/O...) get!
For some references see
http://www.kernel.org/doc/Documentation/scheduler/sched-rt-group.txt
http://www.kernel.org/doc/Documentation/scheduler/sched-design-CFS.txt
http://www.kernel.org/doc/Documentation/scheduler/
http://www.kernel.org/doc/Documentation/rt-mutex.txt
http://www.kernel.org/doc/Documentation/rt-mutex-design.txt
http://www.kernel.org/doc/Documentation/IRQ.txt
http://www.kernel.org/doc/Documentation/IRQ-affinity.txt
http://www.kernel.org/doc/Documentation/preempt-locking.txt
http://tldp.org/LDP/LG/issue89/vinayak2.html
http://lxr.linux.no/#linux+v3.1/kernel/sched.c#L3566
Can my thread help the OS decide when to context switch it out?
https://www.osadl.org/Realtime-Preempt-Kernel.kernel-rt.0.html
http://www.rtlinuxfree.com/
C++: Safe to use longjmp and setjmp?
Use of setjmp and longjmp in C when linking to C++ libraries
http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_21.html
To see an actual implementation of a scheduler etc., check out the Linux source code at https://kernel.org .
Since your question isn't very specific I am not sure whether this is a real answer but I suspect it has enough information to get you started.
REMARK:
I am not sure why you might want to implement something already present in the OS... if it is for higher performance on some async I/O, then there are several options, with maximum performance usually available at the kernel level (i.e. write kernel-mode code)... perhaps you can clarify so that a more specific answer is possible.
Userspace threading libraries are usually cooperative (e.g: GNU pth, SGI's statethreads, ...). If you want preemptiveness, you'd go to kernel-level threading.
You could probably use getcontext()/setcontext()... from a SIGALRM signal handler, but if it works, it would be messy at best. I don't see what advantage this approach has over kernel threading or event-based I/O: you get all the non-determinism of preemptiveness, and you don't have your program separated into sequential control flows.
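For orientation, here is a minimal sketch of the context-switching primitives themselves, assuming glibc's <ucontext.h> (the interface is obsolescent in POSIX but still available on Linux). It is deliberately cooperative: the task yields by calling swapcontext() itself, and nothing is done from a signal handler. The names task_body, run_task_interleaved and g_trace are hypothetical.

```cpp
#include <ucontext.h>
#include <vector>

// Two contexts: the scheduler (the caller) and one task with its own stack.
static ucontext_t g_main_ctx, g_task_ctx;
static std::vector<int> g_trace;   // records the interleaving for inspection

// Task body: records a step, then hands control back to the scheduler.
static void task_body() {
    for (int step = 1; step <= 3; ++step) {
        g_trace.push_back(step);
        swapcontext(&g_task_ctx, &g_main_ctx);   // cooperative yield
    }
    // Returning from here resumes uc_link, i.e. the scheduler.
}

// Resumes the task until it has finished, recording 0 after each resumption.
inline std::vector<int> run_task_interleaved() {
    static char stack[64 * 1024];
    g_trace.clear();
    getcontext(&g_task_ctx);
    g_task_ctx.uc_stack.ss_sp = stack;
    g_task_ctx.uc_stack.ss_size = sizeof(stack);
    g_task_ctx.uc_link = &g_main_ctx;
    makecontext(&g_task_ctx, task_body, 0);

    for (int i = 0; i < 4; ++i) {          // three yields plus the final return
        swapcontext(&g_main_ctx, &g_task_ctx);
        g_trace.push_back(0);
    }
    return g_trace;
}
```

The trace alternates between task steps and scheduler steps, which shows that the task's stack and loop counter survive each switch.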
As others have outlined, preemptive is likely not very easy to do.
The usual pattern for this is using co-procedures (coroutines).
Coprocedures are a very nice way to express finite state machines (e.g. text parsers, communication handlers).
You can 'emulate' the syntax of co-procedures with a modicum of preprocessor macro magic.
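A minimal sketch of such macro emulation, using the well-known switch/__LINE__ trick. The macro names CORO_BEGIN, CORO_YIELD and CORO_END and the Counter type are hypothetical and are not taken from Asio or any other library. Because the function really returns at each yield, anything that must survive a yield lives in the object rather than in local variables.

```cpp
// Each yield stores the current line number as the resume point and returns;
// the next call jumps straight to the matching case label.
#define CORO_BEGIN(state) switch (state) { case 0:
#define CORO_YIELD(state, value) \
    do { (state) = __LINE__; return (value); case __LINE__:; } while (0)
#define CORO_END(state) } (state) = -1

struct Counter {
    int state = 0;   // resume point: 0 = not started, -1 = finished
    int i = 0;       // loop variable, kept in the object so it survives yields
    int limit;

    explicit Counter(int n) : limit(n) {}

    // Returns 0, 1, ..., limit-1 on successive calls, then -1 forever.
    int next() {
        CORO_BEGIN(state);
        for (i = 0; i < limit; ++i) {
            CORO_YIELD(state, i);
        }
        CORO_END(state);
        return -1;
    }
};
```

The body of next() reads like an ordinary loop even though control leaves and re-enters it on every call, which is what makes this style convenient for parsers and protocol handlers.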
Regarding optimal input/output scheduling
You could have a look at Boost Asio: The Proactor Design Pattern: Concurrency Without Threads
Asio also has a co-procedure 'emulation' model based on a single (IIRC) simple preprocessor macro, combined with some cunningly designed template facilities, that comes eerily close to compiler support for stackless co-procedures.
The sample HTTP Server 4 is an example of the technique.
The author of Boost Asio (Kohlhoff) explains the mechanism and the sample on his Blog here: A potted guide to stackless coroutines
Be sure to look for the other posts in that series!
What you're asking makes no sense. What would your one thread be interrupted by? Any executing code has to be in a thread. And each thread is basically a sequential execution of code. For a thread to be interrupted, it has to be interrupted by something. You can't just jump around randomly inside your existing thread as a response to an interrupt. Then it's no longer a thread in the usual sense.
What you normally do is this:
either you have multiple threads, and one of your threads is suspended until the alarm is triggered,
alternatively, you have one thread, which runs in some kind of event loop, where it receives events from (among other sources) the OS. When the alarm is triggered, it sends a message to your thread's event loop. If your thread is busy doing something else, it won't immediately see this message, but once it gets back into the event loop and processing events, it'll get it, and react.
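The second arrangement can be sketched as follows, assuming standard C++11 threads. The names EventQueue and run_loop_with_alarm are hypothetical, and a sleeping helper thread stands in for the OS timer. The loop thread is never interrupted; it sees the alarm only when it returns to the queue.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal thread-safe message queue feeding a single-threaded event loop.
class EventQueue {
public:
    void post(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(msg));
        }
        ready_.notify_one();
    }

    // Blocks until a message is available, then returns it.
    std::string wait_and_pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        std::string msg = std::move(queue_.front());
        queue_.pop_front();
        return msg;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::string> queue_;
};

// Runs the loop on the calling thread while a timer thread posts "alarm"
// after `delay_ms`, followed by "quit". Returns messages in handling order.
inline std::vector<std::string> run_loop_with_alarm(int delay_ms) {
    EventQueue events;
    std::thread timer([&events, delay_ms] {
        std::this_thread::sleep_for(std::chrono::milliseconds(delay_ms));
        events.post("alarm");
        events.post("quit");
    });

    std::vector<std::string> handled;
    for (;;) {
        std::string msg = events.wait_and_pop();
        handled.push_back(msg);            // a real loop would dispatch on msg here
        if (msg == "quit") break;
    }
    timer.join();
    return handled;
}
```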
The title is an oxymoron: a thread is an independent execution path, and if you have two such paths, you have more than one thread.
You can do a kind of "poor-man's" multitasking using setjmp/longjmp, but I would not recommend it and it is cooperative rather than pre-emptive.
Before C11 and C++11, neither C nor C++ intrinsically supported multi-threading, but there are numerous libraries for it, such as native Win32 threads, pthreads (POSIX threads) and Boost threads, and frameworks such as Qt and wxWidgets have support for threads as well.
I have a GUI application, which listens to a network port from a second thread. I was looking at OpenMP and I was wondering if there are easy ways to create threads like this. I was searching for documentation, but the OpenMP site is not very convenient to navigate. Could someone help?
As far as I understand OpenMP is a compiler-assisted parallelizing framework/library targeted to heavy computations. You hint the compiler which parts of your code (usually loops) can run in parallel. The compiler does its magic (inserting library calls, sharing/unsharing variables, etc.) and, poof, the program can now run faster (sometimes) on several cores. It might be possible to do what you want with OpenMP, I don't know, but I think you are looking at the wrong tool. Doing things directly with pthreads is one alternative.
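To make the "hinting" concrete, here is a minimal sketch of the kind of loop OpenMP is aimed at. The function name sum_of_squares is hypothetical. It assumes the compiler is invoked with OpenMP enabled (e.g. -fopenmp on GCC); without that the pragma is ignored and the loop simply runs serially with the same result.

```cpp
// Sum of the squares of 0..n-1. The pragma tells the compiler that the
// iterations are independent and that `sum` is combined across threads.
inline long long sum_of_squares(int n) {
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += static_cast<long long>(i) * i;
    return sum;
}
```

Nothing here creates a long-lived thread that sits listening on a port; the threads exist to share one loop and then rejoin, which is why a plain thread library fits the listener case better.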