In the samples I have looked at so far some of the commands are something like:
1. Fill in a D3D12_DESCRIPTOR_HEAP_DESC with D3D12_DESCRIPTOR_HEAP_TYPE_RTV
2. ID3D12Device::CreateDescriptorHeap
3. Get a D3D12_CPU_DESCRIPTOR_HANDLE with ID3D12DescriptorHeap::GetCPUDescriptorHandleForHeapStart (and maybe ID3D12Device::GetDescriptorHandleIncrementSize)
4. (Optional?) ID3D12Device::CreateRenderTargetView
   4.1. (Optional?) IDXGISwapChain3::GetBuffer
5. Schedule rendering work on a command list (notably OMSetRenderTargets and DrawInstanced), then close the command list
6. ID3D12CommandQueue::ExecuteCommandLists
   6.1. (Optional?) IDXGISwapChain3::Present
7. Schedule a signal with ID3D12CommandQueue::Signal on a fence
8. Wait for the GPU to finish with ID3D12Fence::SetEventOnCompletion and WaitForSingleObjectEx
If possible, how can step 4.1 be replaced with a buffer of choice? I.e., how can one create an ID3D12Resource*, render to it, and then read from it into, say, a std::vector? (I assume that if this is possible, step 6.1 can be skipped, since there is no render target view for the swap chain to present. Perhaps step 4 is unnecessary as well in this case? Maybe only OMSetRenderTargets matters?)
It depends on the video memory architecture as to where exactly the render target can be located. On some systems, it's in dedicated video memory that only the video card can access. In some systems it's in video memory shared across the bus that both CPU and GPU can access. In unified memory architectures, everything is in system memory.
Therefore, you have restrictions on where exactly the render target can be located. This is why you have to use D3D12_HEAP_TYPE_DEFAULT and specify D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET when creating the ID3D12Resource you plan to bind as a render target (this is implicitly done by DXGI when you create the swap chain render target as well).
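For illustration, here is a minimal sketch (not from the original answer) of creating such an offscreen render target; `device` and `rtvHeap` are assumed to already exist, and the size, format and clear value are placeholders:

```cpp
// Sketch: a committed texture in the DEFAULT heap that can be bound as a
// render target instead of a swap-chain buffer. Assumes <d3d12.h> and
// <wrl/client.h>; error handling trimmed.
D3D12_HEAP_PROPERTIES heapProps = {};
heapProps.Type = D3D12_HEAP_TYPE_DEFAULT;

D3D12_RESOURCE_DESC desc = {};
desc.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
desc.Width = 1280;
desc.Height = 720;
desc.DepthOrArraySize = 1;
desc.MipLevels = 1;
desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
desc.SampleDesc.Count = 1;
desc.Flags = D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET;

D3D12_CLEAR_VALUE clearValue = {};
clearValue.Format = desc.Format;

Microsoft::WRL::ComPtr<ID3D12Resource> offscreenTarget;
HRESULT hr = device->CreateCommittedResource(
    &heapProps, D3D12_HEAP_FLAG_NONE, &desc,
    D3D12_RESOURCE_STATE_RENDER_TARGET, &clearValue,
    IID_PPV_ARGS(&offscreenTarget));

// You still need an RTV heap and CreateRenderTargetView so that
// OMSetRenderTargets has a descriptor handle to bind.
device->CreateRenderTargetView(offscreenTarget.Get(), nullptr,
    rtvHeap->GetCPUDescriptorHandleForHeapStart());
```

With that RTV in hand, OMSetRenderTargets can bind the offscreen resource exactly as it would a swap-chain buffer.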
Generally speaking you can't and shouldn't use the low-level DXGI surface creation APIs to create Direct3D resources. They mostly exist for system use rather than by applications.
Unless you happen to be on a UMA system, you should minimize CPU access to the render target, as it will otherwise require expensive copies. Even on UMA systems, de-tiling is also required to get the results into a linear form.
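To actually get the results into a std::vector, the usual pattern is a copy into a READBACK buffer followed by a Map. A hedged sketch, reusing the hypothetical names from the previous snippet (the ExecuteCommandLists call and fence wait between recording and mapping are omitted):

```cpp
// Sketch: transition the target to COPY_SOURCE, copy it into a READBACK
// buffer, then Map that buffer on the CPU after the GPU has finished.
D3D12_RESOURCE_DESC targetDesc = offscreenTarget->GetDesc();

D3D12_PLACED_SUBRESOURCE_FOOTPRINT footprint = {};
UINT64 totalBytes = 0;
device->GetCopyableFootprints(&targetDesc, 0, 1, 0,
    &footprint, nullptr, nullptr, &totalBytes);

D3D12_HEAP_PROPERTIES readbackHeap = {};
readbackHeap.Type = D3D12_HEAP_TYPE_READBACK;

D3D12_RESOURCE_DESC bufferDesc = {};
bufferDesc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
bufferDesc.Width = totalBytes;
bufferDesc.Height = 1;
bufferDesc.DepthOrArraySize = 1;
bufferDesc.MipLevels = 1;
bufferDesc.SampleDesc.Count = 1;
bufferDesc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

Microsoft::WRL::ComPtr<ID3D12Resource> readbackBuffer;
device->CreateCommittedResource(&readbackHeap, D3D12_HEAP_FLAG_NONE,
    &bufferDesc, D3D12_RESOURCE_STATE_COPY_DEST, nullptr,
    IID_PPV_ARGS(&readbackBuffer));

// After drawing: transition the render target so it can be a copy source.
D3D12_RESOURCE_BARRIER barrier = {};
barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
barrier.Transition.pResource = offscreenTarget.Get();
barrier.Transition.Subresource = 0;
barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_COPY_SOURCE;
commandList->ResourceBarrier(1, &barrier);

D3D12_TEXTURE_COPY_LOCATION src = {};
src.pResource = offscreenTarget.Get();
src.Type = D3D12_TEXTURE_COPY_TYPE_SUBRESOURCE_INDEX;
src.SubresourceIndex = 0;

D3D12_TEXTURE_COPY_LOCATION dst = {};
dst.pResource = readbackBuffer.Get();
dst.Type = D3D12_TEXTURE_COPY_TYPE_PLACED_FOOTPRINT;
dst.PlacedFootprint = footprint;

commandList->CopyTextureRegion(&dst, 0, 0, 0, &src, nullptr);

// ... close the list, ExecuteCommandLists, Signal and wait on the fence, then:
std::vector<uint8_t> pixels(static_cast<size_t>(totalBytes));
void* mapped = nullptr;
D3D12_RANGE readRange = { 0, static_cast<SIZE_T>(totalBytes) };
readbackBuffer->Map(0, &readRange, &mapped);
memcpy(pixels.data(), mapped, static_cast<size_t>(totalBytes)); // rows are padded to footprint RowPitch
readbackBuffer->Unmap(0, nullptr);
```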
Direct3D 12 also offers the "Placed Resource" methods, which can provide more control over where exactly the memory is allocated (or more specifically, where the virtual memory addresses are allocated), but you still have to abide by the underlying architecture's limitations. Depending on the memory architecture, you can "alias" multiple different ID3D12Resource instances all using the same memory (such as a render target being aliased as an unordered access resource), but you are responsible for inserting the required resource barriers into the command list (and testing it) to make sure it works reliably on all DX12 hardware. See MSDN.
You do not have to Present your render target if you don't need the user to see the result.
Memory Management Strategies
UMA Optimizations: CPU Accessible Textures and Standard Swizzle
Getting Started with Direct3D 12
If you are new to DirectX 12, you should see the DirectX Tool Kit for DirectX 12 tutorials. If you aren't already familiar with DirectX 11, you should wait on DirectX 12 and start with DirectX Tool Kit for DirectX 11.
Related
Would using multi-GPUs in Vulkan be something like making many command queues then dividing command buffers between them?
There are 2 problems:
In OpenGL, we use GLEW to get functions. With more than one GPU, each GPU has its own driver. How would we use Vulkan?
Would part of the frame be generated with one GPU and the rest with other GPUs, for example using the Intel GPU to render the UI and the AMD or Nvidia GPU to render the game screen on laptops? Or would one frame be generated on one GPU and the next frame on another?
Updated with more recent information, now that Vulkan exists.
There are two kinds of multi-GPU setups: where multiple GPUs are part of some SLI-style setup, and the kind where they are not. Vulkan supports both, and supports them both in the same computer. That is, you can have two NVIDIA GPUs that are SLI-ed together, and the Intel embedded GPU, and Vulkan can interact with them all.
Non-SLI setups
In Vulkan, there is something called the Vulkan instance. This represents the base Vulkan system itself; individual devices register themselves with the instance. The Vulkan instance system is, essentially, implemented by the Vulkan runtime/loader.
Physical devices represent a specific piece of hardware that implements the interface to a GPU. Each piece of hardware that exposes a Vulkan implementation does so by registering its physical device with the instance system. You can query which physical devices are available, as well as some basic properties about them (their names, how much memory they offer, etc).
You then create logical devices for the physical devices you use. Logical devices are how you actually do stuff in Vulkan. They have queues, command buffers, etc. And each logical device is separate... mostly.
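As a rough sketch of that flow (assuming <vulkan/vulkan.h> and <vector>, and skipping error checks, layers/extensions and proper queue-family selection):

```cpp
// Sketch: one logical device per physical device you want to use.
VkInstanceCreateInfo instInfo = { VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
VkInstance instance;
vkCreateInstance(&instInfo, nullptr, &instance);

uint32_t count = 0;
vkEnumeratePhysicalDevices(instance, &count, nullptr);
std::vector<VkPhysicalDevice> gpus(count);
vkEnumeratePhysicalDevices(instance, &count, gpus.data());

std::vector<VkDevice> devices;
for (VkPhysicalDevice gpu : gpus) {
    float priority = 1.0f;
    VkDeviceQueueCreateInfo queueInfo = { VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO };
    queueInfo.queueFamilyIndex = 0;   // pick a real family via vkGetPhysicalDeviceQueueFamilyProperties
    queueInfo.queueCount = 1;
    queueInfo.pQueuePriorities = &priority;

    VkDeviceCreateInfo devInfo = { VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO };
    devInfo.queueCreateInfoCount = 1;
    devInfo.pQueueCreateInfos = &queueInfo;

    VkDevice device;
    vkCreateDevice(gpu, &devInfo, nullptr, &device);
    devices.push_back(device);        // each logical device has its own queues, command buffers, etc.
}
```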
Now, you can bypass the whole "instance" thing and load devices manually. But you really shouldn't. At least, not unless you're at the end of development. Vulkan layers are far too critical for day-to-day debugging to just opt out of that.
There are mechanisms, core in Vulkan 1.1, that allow individual devices to be able to communicate some information to other devices. In 1.1, only certain kinds of information can be shared across physical devices (namely, fences and semaphores, and even then, only on Linux through sync files). While these APIs could provide a mechanism for sharing data between two physical devices, at present, the restriction on most forms of data sharing is that both physical devices must have matching UUIDs (and therefore are the same physical device).
SLI setups
Dealing with SLI is covered by two Vulkan 1.0 extensions: KHR_device_group and KHR_device_group_creation. The former is for dealing with "device groups" in Vulkan, while the latter is an instance extension for creating device-grouped devices. Both of these are core in Vulkan 1.1.
The idea with this is that the SLI aggregation is exposed as a single VkDevice, which is created from a number of VkPhysicalDevices. Each internal physical device is a "sub-device". You can query sub-devices and some properties about them. Memory allocations are specific to a particular sub-device. Resource objects (buffers and images) are not specific to a sub-device, but they can be associated with different memory allocations on the different sub-devices.
Command buffers and queues are not specific to sub-devices; when you execute a CB on a queue, the driver figures out which sub-device(s) it will run on, and fills in the descriptors that use the images/buffers with the proper GPU pointers for the memory that those images/buffers have been bound to on those particular sub-devices.
Alternate-frame rendering is simply presenting images generated from one sub-device on one frame, then presenting images from a different sub-device on another frame. Split-frame rendering is handled by a more complex mechanism, where you define the memory for the destination image of a rendering command to be split among devices. You can even do this with presentable images.
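A hedged sketch of creating such a device-group VkDevice (Vulkan 1.1 names; queue-family selection and error handling omitted, and the first enumerated group is used for illustration):

```cpp
// Sketch: enumerate device groups and create one VkDevice spanning a group.
uint32_t groupCount = 0;
vkEnumeratePhysicalDeviceGroups(instance, &groupCount, nullptr);
std::vector<VkPhysicalDeviceGroupProperties> groups(groupCount);
for (auto& g : groups) g.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_GROUP_PROPERTIES;
vkEnumeratePhysicalDeviceGroups(instance, &groupCount, groups.data());

VkDeviceGroupDeviceCreateInfo groupInfo = { VK_STRUCTURE_TYPE_DEVICE_GROUP_DEVICE_CREATE_INFO };
groupInfo.physicalDeviceCount = groups[0].physicalDeviceCount;
groupInfo.pPhysicalDevices = groups[0].physicalDevices;

float priority = 1.0f;
VkDeviceQueueCreateInfo queueInfo = { VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO };
queueInfo.queueFamilyIndex = 0;       // pick a real family in real code
queueInfo.queueCount = 1;
queueInfo.pQueuePriorities = &priority;

VkDeviceCreateInfo devInfo = { VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO };
devInfo.pNext = &groupInfo;           // this chain entry is what makes it a device group
devInfo.queueCreateInfoCount = 1;
devInfo.pQueueCreateInfos = &queueInfo;

VkDevice groupedDevice;
vkCreateDevice(groups[0].physicalDevices[0], &devInfo, nullptr, &groupedDevice);
```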
In Vulkan you need to enumerate the devices and select the one you want to work with. Nothing stops you from working with two different ones separately. Every Vulkan call takes at least one parameter as context, and the loader layer forwards the call to the correct driver. Alternatively, you can load the functions for each device separately to avoid the loader's trampoline.
A generated frame will need to be forwarded to the card that is connected to the screen for display, so it's more likely that one GPU is responsible for graphics and the others are used for things like physics.
Only a single device can be connected to a given surface at a time, so that device needs to receive the rendered frame and copy it into the presentable image that gets pushed to the screen.
Device groups are the way to go; see the Vulkan specification for documentation. Vulkan handles the dispatch to the other GPUs (when they are connected by SLI/Crossfire). All you need to do is tell Vulkan how the work is dispatched (for example, render one frame on one GPU and the next on another). If you need to do compute work, you will need to address each GPU individually. For reference: https://www.ea.com/seed/news/khronos-munich-2018-halcyon-vulkan
Is it possible to display a black dot by changing values in the screen (i.e. monitor) memory map in RAM using a C program?
I don't want to use any library functions as my primary aim is to learn how to develop a simple OS.
I tried accessing the start of the screen memory map, i.e. 0xA0000 (in C).
I tried to run the program but got a segmentation fault, since no direct access is provided. When run as superuser, the program executes but nothing changes on screen.
Currently I am testing in VirtualBox.
A "real" operating system will not use the framebuffer at address 0xA0000, so you can't draw on the screen by writing to it directly. Instead your OS probably has proper video drivers that will talk to the hardware in various very involved ways.
In short there's no easy way to do what you want to do on a modern OS.
On the other hand, if you want to learn how to write your own OS, then it would be very good practice to try to write a minimal kernel that can output to the VGA text framebuffer at 0xB8000 and maybe then the VGA graphic framebuffer at 0xA0000.
You can start using those framebuffers and drawing on the screen almost immediately after the BIOS jumps to your kernel, with a minimal amount of setting up. You could do that directly from real mode in maybe a hundred lines of assembler tops, or perhaps in C with a couple lines of assembler glue first.
Even simpler would be to have GRUB set up the hardware, boot your minimal kernel, and you can directly write to it in a couple lines.
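For example, a minimal freestanding sketch of writing to the text framebuffer (this assumes your own kernel has 0xB8000 accessible, i.e. real mode or an identity mapping; it will not work under a hosted OS):

```cpp
// Sketch: each VGA text cell is two bytes, the character then an attribute
// byte (color). 0xB8000 is the color text framebuffer; 80x25 is the standard mode.
static void put_char(int row, int col, char c, unsigned char color) {
    volatile unsigned short* vga =
        reinterpret_cast<volatile unsigned short*>(0xB8000);
    vga[row * 80 + col] =
        static_cast<unsigned short>((color << 8) | static_cast<unsigned char>(c));
}

// Example: print a string in white-on-black (attribute 0x0F) on the top row.
static void print_top(const char* msg) {
    for (int i = 0; msg[i] != '\0'; ++i)
        put_char(0, i, msg[i], 0x0F);
}
```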
The short answer is no, because the frame buffer on modern operating systems is set up as determined by the VBIOS and kernel driver(s). It depends on the amount of VRAM present on the board, the size of the GART, the physical RAM present and a whole bunch of other things (VRAM reservation, whether it should be CPU-visible or not, etc.). On top of this, modern OSes use multiple back buffers and flip the hardware display between these buffers, so even if you could poke the frame buffer directly, the address would change from frame to frame.
If you are interested in doing this for learning purposes, I would recommend creating a simple OGL or D3D (for example) 'function' which takes a 'fake' system-allocated frame buffer and presents it to the screen using regular hardware operations.
You could even run the refresh on a timer to fake a display update rate.
Then your fake OS would just write pixels to the fake system memory buffer and this rendering function would take care of displaying it as if it were real.
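A rough sketch of that idea in Direct3D 11 (names like `device`, `context` and `swapChain`, and the width/height, are placeholders; the texture must match the back buffer's size and format for CopyResource to be legal):

```cpp
// Sketch: the "fake OS" fills cpuPixels with BGRA values; each tick we push
// them into a DEFAULT texture and copy that onto the back buffer.
const UINT width = 1280, height = 720;
std::vector<uint32_t> cpuPixels(width * height);   // written by the fake OS

D3D11_TEXTURE2D_DESC desc = {};
desc.Width = width;
desc.Height = height;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;          // must match the swap chain
desc.SampleDesc.Count = 1;
desc.Usage = D3D11_USAGE_DEFAULT;
Microsoft::WRL::ComPtr<ID3D11Texture2D> fakeFrontBuffer;
device->CreateTexture2D(&desc, nullptr, &fakeFrontBuffer);

// Per "refresh":
context->UpdateSubresource(fakeFrontBuffer.Get(), 0, nullptr,
                           cpuPixels.data(), width * 4, 0);
Microsoft::WRL::ComPtr<ID3D11Texture2D> backBuffer;
swapChain->GetBuffer(0, IID_PPV_ARGS(&backBuffer));
context->CopyResource(backBuffer.Get(), fakeFrontBuffer.Get());
swapChain->Present(1, 0);
```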
I was searching for the difference between these two in terms of GPU read speed and occasional CPU writes (less than once per frame, or even only once). I don't want to use D3D11_USAGE_DYNAMIC because the data will not be updated at least once per frame.
Is there a significant performance increase with Default + Staging combo over the Default buffer?
The best performance advice for Direct3D 11 here is actually the same as Direct3D 10.0. Start by reviewing this talk Windows to Reality: Getting the Most out of Direct3D 10 Graphics in Your Games from Gamefest 2007.
To your question: any update to a resource (texture or buffer) has some potential performance impact, but for 'occasional' updates the best option is to use a STAGING resource and then CopyResource to a DEFAULT resource for actual rendering.
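For example, a hedged sketch of that pattern (assuming `stagingTex` was created with D3D11_USAGE_STAGING and D3D11_CPU_ACCESS_WRITE, and `defaultTex` has an identical desc apart from Usage/CPUAccessFlags; `srcPixels`, `width` and `height` are placeholders):

```cpp
// Sketch: occasional CPU update of a DEFAULT texture through a STAGING copy.
D3D11_MAPPED_SUBRESOURCE mapped = {};
if (SUCCEEDED(context->Map(stagingTex.Get(), 0, D3D11_MAP_WRITE, 0, &mapped)))
{
    // Write new texel data row by row, respecting mapped.RowPitch.
    for (UINT y = 0; y < height; ++y)
        memcpy(static_cast<uint8_t*>(mapped.pData) + y * mapped.RowPitch,
               srcPixels + y * width * 4, width * 4);
    context->Unmap(stagingTex.Get(), 0);
}
// GPU-side copy into the resource actually used for rendering.
context->CopyResource(defaultTex.Get(), stagingTex.Get());
```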
DYNAMIC for textures should be reserved for very frequent updates (like say video texture playback), and of course for vertex buffers when doing dynamic draw submission. Constant buffers are really intended to be DYNAMIC or use UpdateSubresource which really depends on your update pattern (a topic covered in the talk above).
Whenever possible, creating resources as IMMUTABLE with pInitialData is the best option in DirectX 11, as it potentially gives the driver an opportunity for multi-threaded resource creation, which is more efficient.
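For instance (a sketch; the vertex type and data are placeholders):

```cpp
// Sketch: an IMMUTABLE vertex buffer created with its initial contents.
struct Vertex { float x, y, z; };
const Vertex vertices[] = {
    {  0.0f,  0.5f, 0.0f }, { 0.5f, -0.5f, 0.0f }, { -0.5f, -0.5f, 0.0f }
};

D3D11_BUFFER_DESC vbDesc = {};
vbDesc.ByteWidth = sizeof(vertices);
vbDesc.Usage = D3D11_USAGE_IMMUTABLE;        // contents fixed for the resource's lifetime
vbDesc.BindFlags = D3D11_BIND_VERTEX_BUFFER;

D3D11_SUBRESOURCE_DATA initData = {};
initData.pSysMem = vertices;                 // initial data supplied at creation

Microsoft::WRL::ComPtr<ID3D11Buffer> vertexBuffer;
HRESULT hr = device->CreateBuffer(&vbDesc, &initData, &vertexBuffer);
```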
The main thing to be aware of with this pattern is that STAGING resources can result in virtual-memory fragmentation, which can be a problem for 32-bit (x86) apps, so you should try to reuse them rather than creating a lot of them or destroying and recreating them. See the talk "Why Your Windows Game Won't Run In 2,147,352,576 Bytes", which is attached to this blog post.
I'd suggest your initial version use DEFAULT + UpdateSubresource, and then compare it with a DEFAULT + STAGING + CopyResource solution, because the result depends heavily on your content and code.
Motivation - Write a program in C (and Assembly, if required) to color a rectangular area in the screen red.
STRICT requirements - GNU/Linux running with the bare minimum utilities and interfaces in text/console mode. So no X (or equivalent like Wayland/Mir), no non-default (outside POSIX, LSB, etc. provided by the kernel) library or interface and no extra assumptions except the presence of the device driver for the monitor.
Effectively, what I am looking for is information on how to write a program which will eventually send a signal through the VGA port and cable to the monitor to color a particular portion of the screen red.
Apologies if this sounds rude, but no "Why do you want to do this?" or "Why don't you use ABC library?" answer. I am trying to understand how to write an implementation of the X server or a kernel framebuffer (/dev/fb0) library for example. It is ok to provide a link to the source of a C library.
no extra assumptions except the presence of the device driver for the monitor.
That means you can use X or Wayland, because those are the graphics driver infrastructure on Linux.
Linux (the kernel) by itself doesn't contain any graphics primitives. It provides some interfaces to talk to the GPU, allocate memory on it and configure the on-screen framebuffer. But apart from raw framebuffer memory access, the Linux kernel has no way to perform drawing operations. For that you need some infrastructure in userspace.
Wayland builds on top of DRI2, which in turn talks to the DRM kernel API. You then require a GPU-dependent state tracker; Mesa has state trackers for a number of GPUs and provides OpenGL and OpenVG frontends.
The NVidia and ATI proprietary, closed-source graphics drivers are designed to work with X only. So to make use of the GPU with those, you must use X. That's the way it is.
Outside of that you can manipulate the on-screen framebuffer memory through the fbdev interface (/dev/fb0), but that's mere pixel pushing, without any GPU acceleration.
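For completeness, a hedged sketch of that kind of pixel pushing through fbdev (assumes a 32-bpp XRGB mode and sufficient permissions; run from a text console, not under X/Wayland):

```cpp
// Sketch: fill a rectangle red by mmap-ing /dev/fb0.
// Compile with e.g.: g++ fbrect.cpp -o fbrect
#include <fcntl.h>
#include <linux/fb.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>

int main() {
    int fb = open("/dev/fb0", O_RDWR);
    if (fb < 0) return 1;

    fb_var_screeninfo vinfo;
    fb_fix_screeninfo finfo;
    ioctl(fb, FBIOGET_VSCREENINFO, &vinfo);
    ioctl(fb, FBIOGET_FSCREENINFO, &finfo);

    size_t size = finfo.line_length * vinfo.yres;
    uint8_t* fbmem = static_cast<uint8_t*>(
        mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fb, 0));
    if (fbmem == MAP_FAILED) return 1;

    // Fill a 200x100 rectangle at (100,100) with red (assumes XRGB8888).
    for (uint32_t y = 100; y < 200; ++y)
        for (uint32_t x = 100; x < 300; ++x)
            *reinterpret_cast<uint32_t*>(
                fbmem + y * finfo.line_length + x * 4) = 0x00FF0000;

    munmap(fbmem, size);
    close(fb);
    return 0;
}
```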
Once upon a time we had svgalib (or was it called vgalib?), which did exactly what you are trying to do. I would recommend that you look at its source code. I don't know if it is still possible to find it anywhere, or if it would actually work with a modern kernel. Whatever you do, be prepared to reboot often.
For whatever it's worth, for any future viewers, I have found a decent tutorial at http://betteros.org/tut/graphics1.php. It goes through Framebuffer, DRI/DRM, and X Windows at "the lowest level possible" (basically ioctls and file read/write).
I have gotten both the Framebuffer and DRI/DRM examples to work on both QEMU Debian (on a MacOS Host) and a Raspberry Pi, with a bit of modification for the latter.
I've been looking for an answer to this question for some time. Does anyone know how to do it?
I've got some ideas; can you tell me whether they are valid, and which is the best one to use (if any of them are actually suitable solutions)?
1. Create a single DirectX 9 device. Make a copy of it for the different threads. Render the loading screen (with already-loaded buffers) while loading the new level assets and creating their vertex and index buffers.
2. Create two different DirectX 9 devices, one for each thread. One device is responsible for rendering only (and is attached to the window), and the other has no rendering surface and takes care of creating and filling the buffers.
3. Create a device with a thread-safety flag (I think there is such a thing, but it may not be called that) and do the same as in option 1.
Thanks!
If you simply want to load a level, then you don't really need a separate thread for that. You could repaint the scene while loading resources, for example. I'd advise avoiding multithreading unless you can't live without it.
If you still want multithreading, pass D3DCREATE_MULTITHREADED to IDirect3D9::CreateDevice. Note that the DirectX SDK explicitly warns that using this flag may degrade performance.
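For reference, the flag is simply ORed into the behavior flags at device creation (a sketch; `d3d9`, `hWnd` and the filled-in D3DPRESENT_PARAMETERS `pp` are assumed to already exist):

```cpp
// Sketch: creating a D3D9 device that serializes its own internal access
// so it can be called from multiple threads.
IDirect3DDevice9* device = nullptr;
HRESULT hr = d3d9->CreateDevice(
    D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hWnd,
    D3DCREATE_HARDWARE_VERTEXPROCESSING | D3DCREATE_MULTITHREADED,
    &pp, &device);
```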
Creating a single device is the preferred solution, i.e. I'd advise using option 1.
It is possible to share resources between several devices, but this functionality is only available on Windows Vista and later. Because people still use Windows XP today, if you use something like that, your users will hate you.