I am building an application using RMA in MPI. I am stuck at how to achieve the following, suppose I have 2 windows win1 and win2, what I want to do is write data into both the windows but I want this to be atomic so that untill both the elements are written into their respective windows no other process accesses the window at same target process.
Is this possible to achieve in MPI? making write on single window atomic is possible with Exclusive Lock but is there any way I can lock multiple windows together?
Related
Hi I am a newbie learning Direct 3D 12.
So far, I understood that Direct 3D 12 is designed for multithreading and I'm trying to make my own simple multithread demo by following the tutorial by braynzarsoft.
https://www.braynzarsoft.net/viewtutorial/q16390-03-initializing-directx-12
Environment is windows, using C++, Visual Studio.
As far as I understand, multithreading in Direct 3D 12 seems, in a nutshell, populating command lists in multiple threads.
If it is right, it seems
1 Swap Chain
1 Command Queue
N Command Lists (N corresponds to number of threads)
N Command Allocators (N corresponds to number of threads)
1 Fence
is enough for a single window program.
I wonder
Q1. When do we need multiple command queues?
Q2. Why do we need multiple fences?
Q3. When do we submit commands multiple times?
Q4. Does GetCPUDescriptorHandleForHeapStart() return value changes?
Q3 comes from here.
https://developer.nvidia.com/sites/default/files/akamai/gameworks/blog/GDC16/GDC16_gthomas_adunn_Practical_DX12.pdf
Purpose of Q4 is I thought of calling the function once and store the value for reuse, it didn't change when I debugged.
Rendering loop in my mind is (based on Game Loop pattern), for example,
Thread waits for fence value (eg. Main thread).
Begin multiple threads to populate command lists.
Wait all threads done with population.
ExecuteCommandLists.
Swap chain present.
Return to 1 in the next loop.
If I am totally misunderstanding, please help.
Q1. When do we need multiple command queues?
Read this https://learn.microsoft.com/en-us/windows/win32/direct3d12/user-mode-heap-synchronization:
Asynchronous and low priority GPU work. This enables concurrent execution of low priority GPU work and atomic operations that enable one GPU thread to consume the results of another unsynchronized thread without blocking.
High priority compute work. With background compute it is possible to interrupt 3D rendering to do a small amount of high priority compute work. The results of this work can be obtained early for additional processing on the CPU.
Background compute work. A separate low priority queue for compute workloads allows an application to utilize spare GPU cycles to perform background computation without negative impact on the primary rendering (or other) tasks.
Streaming and uploading data. A separate copy queue replaces the D3D11 concepts of initial data and updating resources. Although the application is responsible for more details in the Direct3D 12 model, this responsibility comes with power. The application can control how much system memory is devoted to buffering upload data. The app can choose when and how (CPU vs GPU, blocking vs non-blocking) to synchronize, and can track progress and control the amount of queued work.
Increased parallelism. Applications can use deeper queues for background workloads (e.g. video decode) when they have separate queues for foreground work.
Q2. Why do we need multiple fences?
All gpu work is asynchronous. So you can think of fences as low level tools to achieve the same result as futures/coroutines. You can check if the work has been completed, wait for work to complete or set an event on completion. You need a fence whenever you need to guarantee a resource holds the output of work (when resource barriers are insufficient).
Q4. Does GetCPUDescriptorHandleForHeapStart() return value changes?
No it doesn't.
store the value for reuse, it didn't change when I debugged.
The direct3d12 samples do this, you should know them intimately if you want to become proficient.
Rendering loop in my mind is (based on Game Loop pattern), for example,
That sounds okay, but I urge you to look at the direct3d12 samples and steal the patterns (and the code) they use there.
So, the profiler is written in c++ and is launched by the CLR automatically when the process to be profiled is launched. The process then launches another application (the main target of profiling). Profiler is launched for this process also. All this is taken care of, but the problem is:
Only one of these two profilers can communicate with the front end application via NamedPipe. I need both the profilers to write on the same pipe so that the front end application remains straight-forward and simple. Is this possible using some kind of semaphore to ensure that one of the processes write to the pipe at one time? I use the CreateFile() function to open the pipe in the profiler.
I have used Qt quite a lot, but recently needed to debug the threads I have been creating and found many more threads then I was expecting.
So my program is a simple console only (no GUI) Qt application (linux).
Threads that I have created:
It has a main() (which executes the QtCoreApplication) - so that is the main thread.
A thread to process received data from the com port (using FTDI D2XX thirdparty code drivers)
And that is all. When I do ps -T... and find my application there are 7 threads. I have two classes that are QObjects using signals and slots, so maybe they need a thread each for message handling, that takes me to 4 threads... so I am at a loss as to why I might have 7 threads for my application.
Can anyone explain more about what is going on? can post code if needed. Note I only use new QThread once in my code (for the moment).
Qt doesn't create any per-QObject threads. It creates helper threads for some plaform-specific reasons, e.g. QProcess sometimes needs helper threads.
The FTDI D2XX unix driver uses libusb and that implementation is completely backwards and uses additional threads on top of the thread you've provided for it. Frankly said, you shouldn't be using the D2XX driver on Linux or OS X. Just use the kernel driver.
You should simply run the D2XX driver in a trivial non-Qt test application that opens the device and reads from it continuously and see how many threads it spawns. You'll be dismayed...
I have an application that I run with MPI for distributed computing. Let's say there are two MPI ranks started on a single machine, I start my target application on rank-0 which then spawns few threads. I want each of these threads to access a simple array[block] of data that was created by rank-1.
How can I do this? Shared memory?(Is it the only way). Can I use something in MPI(I'm a beginner)?
Thanks!
What's the best/proper method to collect log messages from multiple threads and have them all be displayed using a window? (while the threads are running).
I am currently trying to redirect stdout (cout) in to a wxTextCtrl but failing miserably when attempting to do so over multiple threads. Any help would be appreciated.
Logging has had a few major updates recently in the wxWidgets trunk, you can read about them here. One of them is to add support for logging from threads other than the main thread.
In what way is it failing? I'm not familiar with the wxTextCtrl, but unless it has built in synchronization (ie. its thread safe) that could be a big issue. The simplest way to protect a single resource like this is via a named 'mutex'. The following example is what you can do in each thread to make sure that only one accesses this resource (the output window) at a time.
// In each thread's initialization:
HANDLE mutexHandle = CreateMutex(0,FALSE,"__my_protecting_mutex__");
// Whenever you use the debug output:
WaitForSingleObject(mutexHandle, /* Timeout if you like. */ 0xFFFFFFFF );
// Do our printing here.
ReleaseMutex(mutexHandle);
// In each thread's cleanup:
CloseHandle(mutexHandle);
So this basically guarantees that only one thread can be in between the wait and the release. Now if your issue is actually routing to the wxTextCtrl, I would need some more details.
Edit: I just realized that what I posted is Windows specific, and maybe you aren't on windows! If you aren't I don't have experience with other platform's synchronization methods, but boost has some generic libraries which are not platform specific.