Some code must not run genuinely parallel: it must be either outside any OpenMP parallel region or within an OpenMP single block. How can I assert that the code is not running genuinely parallel (ignoring, for the time being, the complications arising from nested parallelism)?
I don't think there's a way to do exactly what you want.
However, you could use the "critical" construct to ensure that a section of code is executed by only one thread at a time, and then set a flag to prevent it from being run by another thread if it should run only once.
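For illustration, here is a minimal sketch of that run-once pattern, using a critical section and a shared flag (the flag name and the work inside it are made up):

    #include <omp.h>
    #include <cstdio>

    int main() {
        int done = 0;  // has the run-once section executed yet?

        #pragma omp parallel
        {
            #pragma omp critical
            {
                // The critical construct serializes access to the flag, so
                // only the first thread to get here runs the section.
                if (!done) {
                    std::printf("run-once section executed by thread %d\n",
                                omp_get_thread_num());
                    done = 1;
                }
            }
        }
        return 0;
    }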
Related
OpenMP's tasks and parallel for loops seem like a good choice for concurrent processing within a specific process, but I'm not sure they cover all use cases.
What if I just want to spawn off an asynchronous task that doesn't need to return anything, and that I don't want to wait for? I just want it to run in the background and end when it's done. Does OpenMP have this ability? And if not, can I safely use std::thread while also using OpenMP pragmas for other things?
And what if the spawned thread itself uses an OpenMP task group, while the parent thread is also using another OpenMP task group for something else? Is this going to cause issues?
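For clarity, this is the shape of the fire-and-forget pattern the question describes: a detached std::thread running alongside OpenMP work. Whether the two interact safely is exactly what is being asked; the sketch only illustrates the structure, and the function names are made up.

    #include <omp.h>
    #include <chrono>
    #include <cstdio>
    #include <thread>

    // Hypothetical background job that nothing waits for.
    void backgroundJob() {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        std::puts("background job finished");
    }

    int main() {
        // Fire-and-forget: the thread runs until done and is never joined.
        std::thread(backgroundJob).detach();

        // Meanwhile the main thread keeps using OpenMP for other work.
        long sum = 0;
        #pragma omp parallel for reduction(+: sum)
        for (int i = 0; i < 1000000; ++i)
            sum += i;
        std::printf("sum = %ld\n", sum);

        // Give the detached thread a chance to finish before main exits
        // (a real program would use a proper completion signal).
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
        return 0;
    }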
It is a short question: I believe there is no way to cancel a job submitted to libclang through the Python bindings (for example, a code completion task).
Can anybody prove me wrong? I am interested in using libclang in a multithreaded environment, but it seems it is intended to be accessed from a single thread only. If there is also no mechanism to cancel tasks, then one has to wait until the task finishes even if the results are not needed anymore. Does anybody have any ideas on how to overcome this?
[..] it seems it is intended to be accessed from a single thread only.
I don't have anything that clearly backs this, but since the documentation doesn't even mention thread safety, I think all of libclang should be considered not thread-safe.
But: seeing that basically everything libclang does is (indirectly) bound to a CXIndex, I would guess that you could have one CXIndex per thread and then use those (or anything created from them) in parallel, without "sharing" anything between threads.
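A minimal sketch of that per-thread-index idea, assuming the guess above holds (file names are made up, error handling omitted):

    #include <clang-c/Index.h>
    #include <string>
    #include <thread>
    #include <vector>

    // Each worker owns its own CXIndex and translation unit and never shares
    // them with any other thread.
    void parseFile(const std::string& path) {
        CXIndex index = clang_createIndex(/*excludeDeclarationsFromPCH=*/0,
                                          /*displayDiagnostics=*/0);
        CXTranslationUnit tu = clang_parseTranslationUnit(
            index, path.c_str(), nullptr, 0, nullptr, 0,
            CXTranslationUnit_None);
        if (tu) {
            // ... walk the AST, run code completion, etc. ...
            clang_disposeTranslationUnit(tu);
        }
        clang_disposeIndex(index);
    }

    int main() {
        const std::vector<std::string> files = {"a.cpp", "b.cpp"};  // hypothetical
        std::vector<std::thread> workers;
        for (const auto& f : files)
            workers.emplace_back(parseFile, f);
        for (auto& t : workers)
            t.join();
        return 0;
    }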
If there is also no mechanism to cancel tasks, then one has to wait until the task finishes even if the results are not needed anymore. Does anybody have any ideas on how to overcome this?
The "safe" solution is to move all libclang related code into a dedicated process. From your main application you then start (or kill) these processes (using OS dependent mechanisms) as you like. This is, of course, "heavy" in terms of both performance (starting processes) and development effort (serializing communication between processes).
The alternative is to hope (or verify in the source code) that the libclang devs keep all data associated with a CXIndex, and thus don't introduce possible data races in their code. Then you can give every thread its own index, its own translation units, etc. When you have a "job", you launch a thread (or reuse one) to work on it. If, in the meantime, the results are no longer needed, you simply discard them when (if) they ever become ready.
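And a sketch of that last approach, discarding stale results instead of cancelling (the job function is a hypothetical stand-in for the actual libclang work):

    #include <future>
    #include <iostream>
    #include <string>

    // Hypothetical stand-in for a long-running libclang job (e.g. parsing a
    // file with its own, thread-private CXIndex, as sketched above).
    std::string runLibclangJob(const std::string& file) {
        return "results for " + file;
    }

    int main() {
        // Launch the job on its own thread.  It cannot be cancelled mid-flight,
        // so the best we can do is ignore its output once it finishes.
        auto fut = std::async(std::launch::async, runLibclangJob,
                              std::string("a.cpp"));

        // ... meanwhile the user edits the buffer and the results go stale ...
        bool stillWanted = false;

        std::string result = fut.get();   // we still have to wait for completion
        if (stillWanted)
            std::cout << result << '\n';  // use the results
        // otherwise simply discard them
        return 0;
    }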
I know you cannot kill a boost thread, but can you change its task?
Currently I have an array of 8 threads. When a button is pressed, these threads are assigned a task. The task they are assigned is completely independent of the main thread and of the other threads. None of the threads has to wait or anything like that, so an interruption point is never reached.
What I need is to be able to change, at any time, the task that each thread is doing. Is this possible? I have tried looping through the array of threads and pointing each thread object at a new one, but of course that does nothing to the old threads.
I know you can interrupt pThreads, but I cannot find a working link to download the library to check it out.
A thread is not some sort of magical object that can be made to do things. It is a separate path of execution through your code. Your code cannot be made to jump arbitrarily around its codebase unless you specifically program it to do so. And even then, it can only be done within the rules of C++ (ie: calling functions).
You cannot kill a boost::thread because killing a thread would utterly wreck some of the most fundamental assumptions a programmer makes. You now have to take into account the possibility that the next line doesn't execute for reasons that you can neither predict nor prevent.
This isn't like exception handling, where C++ specifically requires destructors to be called, and you have the ability to catch exceptions and do special cleanup. You're talking about executing one piece of code, then suddenly inserting a call to some random function in the middle of already compiled code. That's not going to work.
If you want to be able to change the "task" of a thread, then you need to build that thread with "tasks" in mind. It needs to check every so often that it hasn't been given a new task, and if it has, then it switches to doing that. You will have to define when this switching is done, and what state the world is in when switching happens.
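A sketch of such a task-aware worker, using std::thread and an atomic flag that the workers poll; the same polling pattern works with boost::thread. The task names and timings are made up.

    #include <atomic>
    #include <chrono>
    #include <thread>
    #include <vector>

    // Hypothetical task identifiers; the workers poll currentTask and switch
    // whenever the main thread changes it.
    enum class Task { Idle, TaskA, TaskB, Quit };

    std::atomic<Task> currentTask{Task::Idle};

    void worker() {
        while (true) {
            switch (currentTask.load()) {
                case Task::TaskA:
                    // one small unit of task A (placeholder)
                    std::this_thread::sleep_for(std::chrono::milliseconds(50));
                    break;
                case Task::TaskB:
                    // one small unit of task B (placeholder)
                    std::this_thread::sleep_for(std::chrono::milliseconds(50));
                    break;
                case Task::Quit:
                    return;
                default:  // Idle: nothing assigned yet
                    std::this_thread::sleep_for(std::chrono::milliseconds(10));
            }
        }
    }

    int main() {
        std::vector<std::thread> pool;        // the question's array of 8 threads
        for (int i = 0; i < 8; ++i)
            pool.emplace_back(worker);

        currentTask = Task::TaskA;            // button press: assign a task
        std::this_thread::sleep_for(std::chrono::seconds(1));
        currentTask = Task::TaskB;            // switch the task at any time
        std::this_thread::sleep_for(std::chrono::seconds(1));
        currentTask = Task::Quit;             // ask all workers to exit
        for (auto& t : pool)
            t.join();
        return 0;
    }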
I have three nested loops but only the innermost is parallelizable. The outer and middle loop stop conditions depend on the calculations done by the innermost loop and therefore I cannot change the order.
I have used an OpenMP pragma directive just before the innermost loop, but the performance with two threads is worse than with one. I guess it is because the threads are being created on every iteration of the outer loops.
Is there any way to create the threads outside the outer loops but use them only in the innermost loop?
Thanks in advance
OpenMP should be using a thread pool, so you won't be recreating threads every time you execute your loop. Strictly speaking, however, that might depend on the OpenMP implementation you are using (I know the GNU compiler uses a pool). I suggest you look for other common problems, such as false sharing.
Unfortunately, current multicore computer systems are not well suited to such fine-grained inner-loop parallelism. It's not because of a thread creation/forking issue: as Itjax pointed out, virtually all OpenMP implementations use thread pools, i.e., they pre-create a number of threads and keep them parked. So there is effectively no overhead for creating threads.
However, parallelizing inner loops like this incurs the following two overheads:
Dispatching jobs/tasks to threads: even if we don't need to physically create threads, we must still assign jobs (i.e., create logical tasks) to threads, which usually requires synchronization.
Joining threads: after all the threads in a team finish their share of the work, they must be joined (unless the nowait clause is used). This is typically implemented as a barrier operation, which is also a very expensive synchronization.
Hence, one should minimize the actual number of dispatch/join operations. You can decrease this overhead by increasing the amount of work the inner loop does per invocation, for example through code changes such as loop unrolling.
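To connect this to the original question about creating the threads outside the outer loops: OpenMP does allow the parallel region to be entered once, with only the innermost loop shared out via omp for. The implicit barrier at the end of the work-sharing loop remains, so the join overhead described above does not go away, but the region itself is set up only once. A hedged sketch with made-up data and a made-up convergence test:

    #include <cstdio>

    int main() {
        enum { N = 100000 };
        static double x[N];
        for (int i = 0; i < N; ++i)
            x[i] = 1.0;

        double sum   = 0.0;   // shared accumulator for the inner loop
        double error = 1.0;   // shared stop condition for the outer loop

        #pragma omp parallel
        {
            while (error > 1e-6) {            // outer loop: run by every thread
                #pragma omp single
                sum = 0.0;                    // one thread resets the accumulator
                // (implicit barrier at the end of `single`)

                #pragma omp for reduction(+: sum)
                for (int i = 0; i < N; ++i) { // inner loop: shared among threads
                    x[i] *= 0.5;
                    sum  += x[i];
                }
                // (implicit barrier at the end of `for`: sum is now complete)

                #pragma omp single
                error = sum / N;              // one thread updates the stop condition
                // (implicit barrier: all threads see the new error value)
            }
        }
        std::printf("error = %g\n", error);
        return 0;
    }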
I have a C++ program that allows a user to single-step through processor instructions as a processor simulator emulates a MIPS processor. The problem is that, at least in my testing stages, I need to initialize ~2^32 integers to 0xDEADBEEF. I do this at start-up. It isn't extremely important that this happens completely before I start "single stepping". Is it possible for the initialization function to run in parallel with the rest of the program, so that it eventually finishes but I can still single-step while it progresses? How would one do this?
Instead of initializing a huge amount of memory up front, could you initialize it in smaller chunks when the emulator brings them into existence for the program being run?
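A sketch of that on-demand approach: memory is grouped into fixed-size pages that are created, pre-filled with 0xDEADBEEF, only on first access. The page size and class name are made up.

    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    class LazyMemory {
    public:
        uint32_t read(uint32_t word)              { return page(word)[offset(word)]; }
        void     write(uint32_t word, uint32_t v) { page(word)[offset(word)] = v; }

    private:
        static constexpr uint32_t kPageWords = 1u << 16;   // 64 Ki words per page

        std::vector<uint32_t>& page(uint32_t word) {
            auto& p = pages_[word / kPageWords];
            if (p.empty())                              // first touch of this page:
                p.assign(kPageWords, 0xDEADBEEF);       // fill it on demand
            return p;
        }
        static uint32_t offset(uint32_t word) { return word % kPageWords; }

        std::unordered_map<uint32_t, std::vector<uint32_t>> pages_;
    };

    int main() {
        LazyMemory mem;
        uint32_t before = mem.read(12345);   // untouched words read back 0xDEADBEEF
        mem.write(12345, 7);
        return (before == 0xDEADBEEF && mem.read(12345) == 7) ? 0 : 1;
    }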
It depends. A separate process means that what happens to its runtime environment has no effect on other processes' runtime environments. If your initialization needs to affect the runtime environment of the main process, then a separate process would be tricky. A separate thread would be better, as it can affect the runtime environment of the process that started it. You can just have the thread run at initialization and do its thing while the rest of the program goes on as normal. This may or may not require semaphores (or other synchronization mechanisms); it depends on whether or not mutual exclusion is needed.
In short, yes, it's possible. How you accomplish it, though, depends on how you do this "single stepping."
You could run your initialization function in a separate thread; however, you will need a locking mechanism on memory to ensure that memory in use by the simulated application isn't overwritten by the init function.
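A hedged sketch of that thread-plus-guard idea: a background filler writes the 0xDEADBEEF pattern and publishes its progress through an atomic high-water mark, so the simulator can tell whether a given word has been initialized yet. Sizes and names are made up; 2^32 words would need about 16 GiB, so the demo uses less.

    #include <algorithm>
    #include <atomic>
    #include <cstddef>
    #include <cstdint>
    #include <thread>
    #include <vector>

    class SimMemory {
    public:
        explicit SimMemory(std::size_t words)
            : mem_(words), initialized_(0),
              filler_([this] { fillInBackground(); }) {}

        ~SimMemory() { filler_.join(); }

        uint32_t read(std::size_t index) {
            // Words the filler hasn't reached yet would become 0xDEADBEEF anyway.
            if (index >= initialized_.load(std::memory_order_acquire))
                return 0xDEADBEEF;
            return mem_[index];
        }

        void write(std::size_t index, uint32_t value) {
            // Wait until the filler has passed this word so it cannot clobber
            // the value afterwards (crude, but race-free for a sketch).
            while (index >= initialized_.load(std::memory_order_acquire))
                std::this_thread::yield();
            mem_[index] = value;
        }

    private:
        void fillInBackground() {
            const std::size_t chunk = 1 << 16;           // words per chunk
            for (std::size_t i = 0; i < mem_.size(); i += chunk) {
                std::size_t end = std::min(i + chunk, mem_.size());
                for (std::size_t j = i; j < end; ++j)
                    mem_[j] = 0xDEADBEEF;
                initialized_.store(end, std::memory_order_release);
            }
        }

        std::vector<uint32_t> mem_;
        std::atomic<std::size_t> initialized_;
        std::thread filler_;
    };

    int main() {
        SimMemory mem(std::size_t(1) << 24);   // 16 Mi words (~64 MiB) for the demo
        mem.write(42, 7);                      // simulated store
        return mem.read(42) == 7 ? 0 : 1;      // simulated load
    }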