What is the best way to update a GPU resource with heap type upload? - directx-12

I'm using the ID3D12Resource::Map method to update a GPU resource. Is that the most efficient way? What alternatives exist?

Upload heap resources have a higher GPU read cost than default heap resources.
For constant buffers this is generally fine (you are in a write-once/read-once scenario), but in other cases (large vertex/index buffers that are read every frame) it is generally not desirable.
A common approach is to create two resources (one in an upload heap, one in a default heap), copy your data into the upload resource (using Map, as you mentioned), then perform a GPU copy into the default resource using either CopyResource or CopyBufferRegion.
Please make sure that the correct resource states are set before/after the copy, using ResourceBarrier with a transition barrier.
The default resource should be in the D3D12_RESOURCE_STATE_COPY_DEST state before the copy; after the copy, transition it to whichever read state your usage of the resource requires.
Please also note that you can use a copy command queue for the GPU copy (so you can avoid performing it on the direct command queue), but you will need to ensure that the copy has completed before using the resource (by using fences).
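A minimal sketch of this upload-then-copy pattern might look like the following (assuming the d3dx12.h helper structures and Microsoft::WRL::ComPtr; device, cmdList, data, and size are presumed to already exist, and HRESULT checks are omitted):

```cpp
// 1. Upload resource: CPU-writable, slow for repeated GPU reads.
CD3DX12_HEAP_PROPERTIES uploadHeap(D3D12_HEAP_TYPE_UPLOAD);
CD3DX12_RESOURCE_DESC desc = CD3DX12_RESOURCE_DESC::Buffer(size);
Microsoft::WRL::ComPtr<ID3D12Resource> uploadBuffer;
device->CreateCommittedResource(&uploadHeap, D3D12_HEAP_FLAG_NONE, &desc,
    D3D12_RESOURCE_STATE_GENERIC_READ, nullptr, IID_PPV_ARGS(&uploadBuffer));

// 2. Default resource: fast GPU reads, created directly in COPY_DEST.
CD3DX12_HEAP_PROPERTIES defaultHeap(D3D12_HEAP_TYPE_DEFAULT);
Microsoft::WRL::ComPtr<ID3D12Resource> defaultBuffer;
device->CreateCommittedResource(&defaultHeap, D3D12_HEAP_FLAG_NONE, &desc,
    D3D12_RESOURCE_STATE_COPY_DEST, nullptr, IID_PPV_ARGS(&defaultBuffer));

// 3. Map and fill the upload resource.
void* mapped = nullptr;
CD3DX12_RANGE readRange(0, 0);               // we won't read this on the CPU
uploadBuffer->Map(0, &readRange, &mapped);
memcpy(mapped, data, size);
uploadBuffer->Unmap(0, nullptr);

// 4. GPU copy, then transition the default resource to its read state.
cmdList->CopyBufferRegion(defaultBuffer.Get(), 0, uploadBuffer.Get(), 0, size);
auto barrier = CD3DX12_RESOURCE_BARRIER::Transition(defaultBuffer.Get(),
    D3D12_RESOURCE_STATE_COPY_DEST,
    D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER); // or whichever read state you need
cmdList->ResourceBarrier(1, &barrier);
```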
Multi-engine usage is described in the MSDN documentation.

AWS s3 simultaneous move of the same object

Lately I've been considering something I haven't fully figured out. Basically, it is summarized in the title. More broadly, my question is as follows:
(Assume that these things are done by scripts on different machines, or by different tasks within the same machine, concurrently.)
Assume we have a bucket called "bucket-one" and an object in it whose key is "foo/bar.ext".
One task tries to move "foo/bar.ext" to "foo2/bar.ext" and the other tries to move the object to "foo3/bar.ext". Say we use the boto3 S3 client/resource, for example (that probably does not affect the outcome, though).
What happens when you try to move an object concurrently, at the exact same time, from one folder to another within the same bucket?
Outputs I have in mind are:
Both requests succeed, each moving the same file into a different folder, so that we end up with both "foo2/bar.ext" and "foo3/bar.ext".
Only one of the moves succeeds, so the object ends up at either "foo2/bar.ext" or "foo3/bar.ext".
Both requests fail; the object is not moved and remains at "foo/bar.ext".
Any of the above may happen, without it being possible to know the outcome beforehand.
The second question is the same scenario, except for a change in timing: not at the exact same time, but very close to it (nearly the same time).
I know the odds are not likely at all, but I am curious what the result would be.
Thanks
The only possible outcome is that you get both destination objects.
S3 doesn't support moving an object to a new key; it only supports making a copy of the object at a new key (whether in the same bucket or a different bucket) and then deleting the original object with a second request.
Deleting an object that is already in the process of being copied or downloaded has no impact on operations that are already in progress on that object.
Additionally, authorized delete operations on recently-deleted objects never fail (this may in fact always be true of delete requests, but the detail isn't important here), so neither process will be aware that the other process has also just deleted the object when it tries, because that operation will succeed.
You don't even need things to occur at the exact same time, in order to end up with two objects.
If the events occur in the order Copy 1, Copy 2, Delete 1, Delete 2, this would still be the outcome, no matter how close in time Copy 1 and Copy 2 occur, as long as Delete 1 hasn't prevented Copy 2 from starting... but in fact, delete operations on objects are not themselves instantaneous, so Copy 2 could potentially still work even if it starts a brief time after Delete 1 has already finished. This is caused by the eventual-consistency behavior that S3 provides for delete and overwrite operations: an optimization that trades this consistency for higher-performance PUT and GET (including copy). The amount of time until full consistency is reached is not a fixed value and is often close to zero, and there is no exposed interface for determining whether a bucket's index replicas are fully consistent.
See Amazon S3 Data Consistency Model in the Amazon Simple Storage Service Developer Guide.

Is rebinding a graphics pipeline in Vulkan a guaranteed no-op?

In the simplified scenario where each object to be rendered is translated into a secondary command buffer, and each of those command buffers initially binds a graphics pipeline: is it a guaranteed no-op to bind the pipeline that was bound immediately before? Or is the order of execution of the secondary command buffers not guaranteed at all?
is it a guaranteed no-op to bind the pipeline that was immediately bound before?
No. In fact, in the case you're outlining, you should assume precisely the opposite. Why?
Since each of your CBs is isolated from the others, the vkCmdBindPipeline function has no way to know what was bound beforehand. Remember: the state of a command buffer that has just started recording is undefined, which means that the command-buffer-building code cannot make any assumptions about any state that you did not set within this CB.
In order for the driver to implement the optimization you're talking about, it would have to, at vkCmdExecuteCommands time, introspect into each secondary command buffer and start ripping out anything that is duplicated across CB boundaries.
That might be viable if vkCmdExecuteCommands has to copy all of the commands out of secondary CBs into primary ones. But that would only be reasonable for systems where secondary CBs don't exist at a hardware level and thus have to be implemented by copying their commands into the primary CB. But even in this case, implementing such culling would make the command take longer to execute compared to simply copying some tokens into the primary CB's storage.
When dealing with a low-level API, do not assume that the driver is going to use information outside of its immediate purview to optimize your code. Especially when you have the tools for doing that optimization yourself.
This is (yet another) reason why you should not give each individual object its own CB.
Or is the order of execution of the secondary command buffers not guaranteed at all?
The order of execution of commands is unchanged by their presence in CBs. However, the well-defined nature of the state these commands use is affected.
Outside of state inherited by secondary CBs, every secondary CB's state begins undefined. That's why you have to bind a pipeline for each one. Commands that rely on previously issued state only have well-defined behavior if that previously issued state is within the CB containing that command (or is inherited state).
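In practice, this means every secondary command buffer records its own pipeline bind. A sketch (pipeline, inheritance, and secondaryCBs are assumed to already exist and be valid):

```cpp
// Sketch only: pipeline (VkPipeline), inheritance
// (VkCommandBufferInheritanceInfo), and secondaryCBs are assumed valid.
VkCommandBufferBeginInfo begin{};
begin.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
begin.flags = VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT;
begin.pInheritanceInfo = &inheritance;  // render pass / framebuffer inheritance

for (VkCommandBuffer cb : secondaryCBs) {
    vkBeginCommandBuffer(cb, &begin);
    // Required in every secondary CB: its state starts undefined, and the
    // driver will not elide "redundant" binds across CB boundaries.
    vkCmdBindPipeline(cb, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
    // ... per-object vkCmdBindDescriptorSets / vkCmdDraw* calls ...
    vkEndCommandBuffer(cb);
}
```

Batching many objects into one secondary CB, by contrast, lets you bind the pipeline once and do the redundancy elimination yourself.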

How to wipe some contents of boost managed_shared_memory?

The boost::interprocess::managed_shared_memory documentation, and most other resources I checked, always show examples where there is a parent process and a bunch of children spawned by it.
In my case, I have several processes spawned by a third-party application, and I can only control the "children". That means I cannot have a central brain that allocates and deallocates the shared memory segment; all my processes must be able to do so (therefore, I can't erase data on exit).
My idea was to open_or_create a segment and, using a lock stored (find_or_construct'ed) in this area, I check a certain hash to see if the memory area was created by this same software version.
If this is not true, the memory segment must be wiped to avoid breaking code.
Ideally, I would want to keep the lock object because there could be other processes already waiting on it.
Things I have thought of:
List all object names and delete all but the lock.
This cannot be done, because the objects might be using different implementations.
Also, I couldn't find where to list the names.
Use shared_memory_object::truncate
I could not find much documentation about it.
When using a managed_shared_memory, I don't know how reliable it would be, because I'm not sure the lock was the first data allocated.
Refcount the processes and wipe the data when the last one exits.
Prone to problems if a process terminates fatally.
Use a separate shared memory area just for this bookkeeping.
Sounds reasonable, but maybe overkill?
Any suggestions or insights?
This sounds like a "shared ownership" scenario.
What you'd usually think of in such a scenario, would be shared pointers:
http://www.boost.org/doc/libs/1_58_0/doc/html/interprocess/interprocess_smart_ptr.html#interprocess.interprocess_smart_ptr.shared_ptr
Interprocess has specialized shared pointers (and ditto make_shared) for exactly this purpose.
Creating the shared memory realm can be done "optimistically" from each participating process (open_or_create). Note that creation needs to be synchronized. Further segment manager operations are usually already implicitly synchronized:
Whenever the same managed shared memory is accessed from different processes, operations such as creating, finding, and destroying objects are automatically synchronized. If two programs try to create objects with different names in the managed shared memory, the access is serialized accordingly. To execute multiple operations at one time without being interrupted by operations from a different process, use the member function atomic_func() (see Example 33.11).
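A sketch of what this could look like (the names "MySegment", "lock", and "data" are made up for illustration; cleanup via shared_memory_object::remove is omitted):

```cpp
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/smart_ptr/shared_ptr.hpp>
#include <boost/interprocess/sync/interprocess_mutex.hpp>

namespace bip = boost::interprocess;

int main() {
    // Any participating process can run this; whichever gets there first
    // creates the segment, the rest just open it.
    bip::managed_shared_memory segment(bip::open_or_create, "MySegment", 65536);

    // find_or_construct is serialized by the segment manager, so calling
    // it concurrently from several processes is safe.
    bip::interprocess_mutex* lock =
        segment.find_or_construct<bip::interprocess_mutex>("lock")();

    // Shared ownership of the payload: it stays alive as long as any
    // process still holds an interprocess shared_ptr to it.
    using Payload = int;  // stand-in for the real data
    auto data = bip::make_managed_shared_ptr(
        segment.find_or_construct<Payload>("data")(0), segment);

    (void)lock;
    (void)data;
}
```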

Where is the buffer allocated in OpenCL?

I was trying to create a memory buffer in OpenCL with the C++ bindings. The statement looks like
cl::Buffer buffer(context, CL_MEM_READ_ONLY, sizeof(float) * 100);
This statement confuses me because it doesn't specify which device the memory is allocated on. In principle the context contains all devices, including the CPU and GPUs, on the chosen platform. Is it true that the buffer is put in a common region shared by all the devices?
The spec does not define where the memory is. For the API user, it is "in the context".
If you have only one device, it is probably (99.99%) going to be in that device's memory. (In rare cases it may be in the host, if the device does not have enough memory at the time.)
In the case of many different devices, it will be in one of them at creation, but it may move transparently to another device depending on the kernel launches.
This is the reason the call clEnqueueMigrateMemObjects (OpenCL 1.2 only) exists.
It allows the user to give the API hints about where the memory will be needed, so it can prepare the copy in advance.
Here is the definition of what it does:
clEnqueueMigrateMemObjects provides a mechanism for assigning which device an OpenCL memory object resides on. A user may wish to have more explicit control over the location of their memory objects on creation. This could be used to:
Ensure that an object is allocated on a specific device prior to usage.
Preemptively migrate an object from one device to another.
Typically, memory objects are implicitly migrated to a device for which enqueued commands, using the memory object, are targeted.
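As a sketch with the C++ bindings, assuming a cl::CommandQueue named queueOnDeviceA created for the device where the buffer should reside:

```cpp
// Sketch only: context, and a cl::CommandQueue queueOnDeviceA created for
// the device where we want the buffer to live, are assumed to exist.
cl::Buffer buffer(context, CL_MEM_READ_ONLY, sizeof(float) * 100);

// Hint the runtime to place the buffer on that device before the kernels
// that use it are enqueued (OpenCL 1.2+).
std::vector<cl::Memory> objs{buffer};
queueOnDeviceA.enqueueMigrateMemObjects(objs, 0 /* default: migrate with contents */);
```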

How to modify a data structure while a process is already accessing it?

I have written a program (call it X) in C++ which creates a data structure and then uses it continuously.
Now I would like to modify that data structure without aborting the program.
I tried 2 ways to accomplish this task:
In the same program X, I first created the data structure and then created a child process which starts accessing and using that data structure. The parent process continues with its execution, asks the user for modifications (insertion, deletion, etc.), takes the input from the console, and performs the modification. The problem is that the modifications are not reflected in the copy of the data structure the child process is using. Later on, I figured out why this can't work: the child process has its own copy of the data structure, so modifications done via the parent process won't be reflected in it. Since that was not what I wanted, I went for multithreading.
Instead of creating a child process, I created another thread which accesses and uses the data structure, and tried to take user input from the console in a different thread. Even this didn't work, because of very fast switching between the threads.
So, please help me solve this issue. I want modifications to be reflected in the original data structure. Also, I don't want the thread that is accessing and using it continuously to have to wait, since it is time-critical.
First point: this is not a trivial problem. To handle it at all well, you need to design a system, not just a quick hack or two.
First of all, to support the dynamic changing, you'll almost certainly want to define the data structure in code in something like a DLL or .so, so you can load it dynamically.
Part of how to proceed will depend on whether you're talking about data that's stored strictly in memory, or whether it's more file oriented. In the latter case, some of the decisions will depend a bit on whether the new form of a data structure is larger than the old one (i.e., whether you can upgrade in place or not).
Let's start out simple, and assume you're only dealing with structures in memory. Each data item will be represented as an object. In addition to whatever's needed to access the data, each object will provide locking, and a way to build itself from an object of the previous version of the object (lazily -- i.e., on demand, not just in the ctor).
When you load the DLL/.so defining a new object type, you'll create a collection of those the same size as your current collection of existing objects. Each new object will be in the "lazy" state, where it's initialized, but hasn't really been created from the old object yet.
You'll then kick off a thread that makes the new collection known to the rest of the program, then walks through the collection of new objects: locking an old object, using it to create the new object, then destroying the old object and removing it from the old collection. It will use a fairly short timeout when it tries to lock an old object (i.e., if an object is in use, it won't wait for it very long, it will just go on to the next one). It iterates repeatedly until all the old objects have been updated and the collection of old objects is empty.
For data on disk, things can be just about the same, except your collections of objects provide access to the data on disk. You create two separate files, and copy data from one to the other, converting as needed.
Another possibility (especially if the data can be upgraded in place) is to use a single file, but embed a version number into each record. Read some raw data, check the version number, and use appropriate code to read/write it. If you're reading an old version number, read with the old code, convert to the new format, and write in the new format. If you don't have space to update in place, write the new record to the end of the file, and update the index to indicate the new position.
Your approach to concurrent access is similar to sharing a cake between a classroom full of blindfolded toddlers. It's no surprise that you end up with a sticky mess. Each toddler will either have to wait their turn to dig in or know exactly which part of the cake she alone can touch.
Translating to code, the former means having a lock or mutex that controls access to a data structure so that only one thread can modify it at any time.
The latter can be done by having a data structure that is modified in place by threads that each know exactly which parts of the data structure they can update, e.g. by passing a struct with details on which range to update, effectively splitting up the data beforehand. These should not overlap and iterators should not be invalidated (e.g. by resizing), which may not be possible for a given problem.
There are many, many algorithms for handling resource contention, so this is grossly simplified. Distributed computing is a significant field of computer science dedicated to these kinds of problems; study the problem (you didn't give details) and don't expect magic.