How do GPUs support both OpenGL and DirectX API execution?

I'm trying to understand how OpenGL and DirectX work with the graphics card.
If I write a program in OpenGL that draws a triangle, and another one in DirectX that does the same thing, what exactly happens on the GPU side?
When we run the programs, does every call to the OpenGL library and every call to the DirectX library produce code for the GPU, and is the GPU machine code produced by the two programs the same? (As if DirectX and OpenGL were like precompiled Java bytecode that produces the same thing when it actually runs.)
Or does the GPU have two different instruction sets, one for each? I mean, what exactly are OpenGL and DirectX to the GPU, and how can it tell the two APIs apart?
Is the difference only visible from the programmer's perspective?

I already answered this in "On Windows, how does OpenGL differ from DirectX?".
A full quote of one of the answers follows.
This question is almost impossible to answer because OpenGL by itself is just a front-end API, and as long as an implementation adheres to the specification and the outcome conforms to it, it can be done any way you like.
The question may have been: "How does an OpenGL driver work on the lowest level?" This is again impossible to answer in general, as a driver is closely tied to some piece of hardware, which may again do things however the developer designed it.
So the question should have been: "How does it look on average behind the scenes of OpenGL and the graphics system?". Let's look at this from the bottom up:
1. At the lowest level there's some graphics device. Nowadays these are GPUs, which provide a set of registers controlling their operation (which registers exactly is device dependent), have some program memory for shaders, bulk memory for input data (vertices, textures, etc.) and an I/O channel to the rest of the system over which they receive and send data and command streams.
2. The graphics driver keeps track of the GPU's state and of all the resources of the application programs that make use of the GPU. It is also responsible for converting or otherwise processing the data sent by applications (converting textures into the pixel format supported by the GPU, compiling shaders into the machine code of the GPU). Furthermore, it provides some abstract, driver-dependent interface to application programs.
3. Then there's the driver-dependent OpenGL client library/driver. On Windows this gets loaded by proxy through opengl32.dll; on Unix systems it resides in two places:
   - the X11 GLX module and the driver-dependent GLX driver
   - /usr/lib/, which may contain some driver-dependent stuff for direct rendering
   On MacOS X this happens to be the "OpenGL Framework".
   It is this part that translates OpenGL calls, as you make them, into calls to the driver-specific functions in the part of the driver described in (2).
4. Finally there's the actual OpenGL API library, opengl32.dll on Windows and, on Unix, /usr/lib/; this mostly just passes the commands down to the OpenGL implementation proper.
How the actual communication happens can not be generalized:
In Unix the 3<->4 connection may happen either over sockets (yes, it may, and does, go over the network if you want it to) or through shared memory. In Windows the interface library and the driver client are both loaded into the process address space, so that's not so much communication as simple function calls and variable/pointer passing. In MacOS X this is similar to Windows, only that there's no separation between OpenGL interface and driver client (that's the reason why MacOS X is so slow to keep up with new OpenGL versions: it always requires a full operating system upgrade to deliver the new framework).
Communication between 3<->2 may go through ioctl, read/write, or through mapping some memory into the process address space and configuring the MMU to trigger some driver code whenever that memory is changed. This is quite similar on any operating system since you always have to cross the kernel/userland boundary: ultimately you go through some syscall.
Communication between system and GPU happens through the peripheral bus and the access methods it defines, so PCI, AGP, PCI-E, etc., which work through port I/O, memory-mapped I/O, DMA and IRQs.
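To make the layering above concrete, here is a minimal sketch, not a real driver interface: two invented front ends, one OpenGL-flavoured and one Direct3D-flavoured, both hand their work to the same stand-in driver object. Every class, method and opcode name here (`FakeDriver`, `submit_triangle`, `HW_DRAW`, and so on) is hypothetical and only illustrates that the GPU sees driver-generated commands, not API calls.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDriver:
    """Stands in for the vendor driver; the opcode names are invented."""
    command_stream: list = field(default_factory=list)

    def submit_triangle(self, vertices):
        # A real driver would emit device-specific words; tuples stand in here.
        self.command_stream.append(("HW_SET_VERTEX_DATA", tuple(vertices)))
        self.command_stream.append(("HW_DRAW", 3))


class GLLikeFrontEnd:
    """Hypothetical OpenGL-flavoured API layer."""
    def __init__(self, driver):
        self.driver = driver

    def draw_arrays_triangle(self, vertices):
        self.driver.submit_triangle(vertices)


class D3DLikeFrontEnd:
    """Hypothetical Direct3D-flavoured API layer."""
    def __init__(self, driver):
        self.driver = driver

    def draw_primitive_triangle(self, vertices):
        self.driver.submit_triangle(vertices)


triangle = [(0.0, 0.5), (-0.5, -0.5), (0.5, -0.5)]

gl_driver = FakeDriver()
GLLikeFrontEnd(gl_driver).draw_arrays_triangle(triangle)

d3d_driver = FakeDriver()
D3DLikeFrontEnd(d3d_driver).draw_primitive_triangle(triangle)

# Both front ends end up producing commands in the same hardware vocabulary.
same_stream = gl_driver.command_stream == d3d_driver.command_stream
print(same_stream)
```

In practice the two command streams need not be byte-identical, but both are expressed in the one instruction set the hardware has; the GPU itself does not distinguish between the APIs.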


kernel vs user-space audio device driver on macOS

I need to develop an audio device driver for system audio capture (based on Soundflower). But a problem soon appeared: the IOAudioFamily stack seems to be deprecated in OS X 10.10 and later. Looking through the IOAudioDevice and IOAudioEngine header files, it seems that Apple now recommends using the <CoreAudio/AudioServerPlugIn.h> API, which runs in user space. But I can't find much information on the topic of user-space device drivers. It seems that the only resource is the Apple provided sample devices from
Looking through the examples, I find that it's a lot harder and more work to develop a user-space driver than an I/O Kit kernel-based one.
So the question arises: what should motivate developing a device driver in user space instead of kernel space?
The "SimpleAudioDriver" example is somewhat misnamed. It demonstrates pretty much every feature of the API. This is handy as a reference if you actually need to use those features. It's also structured in a way that's maybe a little more complicated than necessary.
For a virtual device, the NullAudioDriver is probably a much better base, and much, much easier to understand (single source file, if I remember correctly). SimpleAudioDriver is more useful for dealing with issues such as hotplugging, multiple instances of identical devices, etc.
IOAudioEngine is deprecated as you say, and has been since OS X 10.10. Expect it to go away eventually, so if you build your driver with it, you'll probably need to rewrite it sooner than if you create a Core Audio Server Plugin based one.
Testing and debugging audio drivers is awkward either way (due to being so time sensitive), but I'd say userspace ones are slightly less frustrating to deal with. You'll still want to test on a different machine than your development Mac, because if coreaudiod crashes or hangs, apps usually start locking up too, so being able to just ssh in, delete your plugin and kill coreaudiod is handy. Certainly quicker turnaround than having to reboot.
(FWIW, I've shipped both kernel and userspace OS X audio drivers, and I spend a lot of time working on kexts.)
There is a great book on this subject, available free online here:
See page 37 for a summary of why you might want a user-space driver, copied here for convenience:
The advantages of user-space drivers are:
The full C library can be linked in. The driver can perform many exotic tasks without resorting to external programs (the utility programs implementing usage policies that are usually distributed along with the driver itself).
The programmer can run a conventional debugger on the driver code without having to go through contortions to debug a running kernel.
If a user-space driver hangs, you can simply kill it. Problems with the driver are unlikely to hang the entire system, unless the hardware being controlled is really misbehaving.
User memory is swappable, unlike kernel memory. An infrequently used device with a huge driver won’t occupy RAM that other programs could be using, except when it is actually in use.
A well-designed driver program can still, like kernel-space drivers, allow concurrent access to a device.
If you must write a closed-source driver, the user-space option makes it easier for you to avoid ambiguous licensing situations and problems with changing kernel interfaces.

How does an OpenGL program interface with different graphic cards

From what I understand (correct me if I am wrong), the OpenGL API converts the function calls written by the programmer in the source code into the specific GPU driver calls of our graphics card. Then the GPU driver actually sends instructions and data to the graphics card through some hardware interface like PCIe, AGP or PCI.
My question is: does OpenGL know how to interact with different graphics cards because there are basically only 3 types of physical connections (PCIe, AGP and PCI)?
I think it is not that simple, because I always read that different graphics cards have different drivers. So a driver is not just a way to use the physical interfaces; it also lets graphics cards perform different types of commands (which are vendor specific).
I just do not get the big picture.
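The suspicion above, that the bus is shared but the command set is vendor specific, can be sketched as follows. This is only an illustration under invented assumptions: the two vendor classes, their command names and `api_draw_triangle` are hypothetical and do not correspond to any real driver.

```python
class VendorADriver:
    """Invented vendor: encodes a draw as a single packed command."""
    def draw_triangle(self, vertices):
        return [("A_DRAW_TRI", tuple(vertices))]


class VendorBDriver:
    """Invented vendor: wants vertex upload and draw as separate commands."""
    def draw_triangle(self, vertices):
        return [("B_UPLOAD", tuple(vertices)), ("B_KICK", len(vertices))]


def api_draw_triangle(driver, vertices):
    """Hypothetical API entry point: same call, whichever driver is installed."""
    return driver.draw_triangle(vertices)


tri = [(0, 1), (-1, -1), (1, -1)]
stream_a = api_draw_triangle(VendorADriver(), tri)
stream_b = api_draw_triangle(VendorBDriver(), tri)
print(len(stream_a), len(stream_b))
```

The application code is identical in both cases; only the driver decides what is actually sent across the bus.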

How are Direct3D and OpenGL instructions handled in a graphics card?

I am trying to better understand how GPUs work, and I am confused about how they handle high-level APIs like Direct3D or OpenGL. It is very common to see graphics cards advertised as supporting Direct3D and OpenGL hardware acceleration. Does this mean that they handle Direct3D and OpenGL instructions directly in hardware?
I haven't been able to find clear evidence of this, or of the calls being compiled to an assembly representation that the GPU can handle. If there is such a conversion, who does it? The software library (Direct3D/OpenGL), the driver or the GPU itself?
Along the same lines, where is the graphics pipeline defined? In the GPU hardware, the driver, or the software library? This confuses me especially with the idea of programmable pipelines.
Is there a good resource where I can find information about these details?
You have asked a very broad and complicated question. Actually, you have asked several broad, complicated questions.
The software that has final governance over the operation of any hardware is called the hardware's "driver". Naturally, for graphics hardware, this is called the "graphics driver." Like all drivers, the graphics driver is effectively an installable part of the OS; the OS is what allows the graphics driver to do its job and talk to the hardware. The two work hand in hand.
There are effectively two kinds of D3D or OpenGL (hereafter called "the API") calls: those that talk to the driver and those that do not. Every call that actually draws something needs to (eventually) talk to the driver, but calls that set up later drawing calls may just store data locally.
When you make a drawing call, the API does some checks to make sure that you as the user have made a valid rendering call. If so, the API has some options as to what to do. It turns out that talking directly to the driver takes a long time, regardless of how many commands you give it when you start talking. Therefore, what often happens is that the API stores your rendering call and returns immediately. Then, possibly in another thread, it may look to see how many rendering calls have been stored. If there are "enough", then it will forward them to the driver. This is called "marshalling".
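The batching described above can be sketched roughly like this. It is a simplified, single-threaded illustration, whereas the text notes that the forwarding may happen on another thread. `BatchingApi`, the `threshold` value and the `forward` callback are all hypothetical names.

```python
class BatchingApi:
    """Hypothetical API layer that queues draw calls before forwarding them."""

    def __init__(self, forward, threshold=3):
        self.forward = forward        # callable standing in for the driver entry point
        self.threshold = threshold    # how many calls count as "enough" (assumed)
        self.pending = []

    def draw(self, call):
        self.pending.append(call)     # store the call and return immediately
        if len(self.pending) >= self.threshold:
            self.flush()

    def flush(self):
        if self.pending:
            self.forward(list(self.pending))  # one expensive trip for many calls
            self.pending.clear()


batches = []
api = BatchingApi(batches.append, threshold=3)
for i in range(7):
    api.draw(f"draw_{i}")
api.flush()  # push out whatever is left

batch_sizes = [len(b) for b in batches]
print(batch_sizes)  # [3, 3, 1]
```

Seven draw calls reach the stand-in driver in three trips instead of seven, which is the whole point of queuing them.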
The driver's job is to take these calls that have been forwarded and convert them into stuff that the GPU will do.
On that same line, where is the graphics pipeline defined? in the gpu hardware, the driver, or the software library?
That's actually a pretty tricky question these days, and becoming trickier every hardware generation.
In the old days, the construction of the graphics pipeline was rigidly controlled by the GPU hardware. These days, this is less true, though there is some hardware control. On modern hardware (capable of OpenGL 3.0 or Direct3D10 or better), it would be theoretically possible, if you had direct access to the graphics driver, to design an API that used a somewhat altered version of the graphics pipeline. So the APIs dictate much of what the graphics pipeline looks like.
Each stage in the rendering pipeline takes certain values from the previous stage(s) as input and generates some number of values as output. A stage is "programmable" if the mechanism for generating the outputs from the inputs involves executing a user-supplied program, called a "shader". So there is no such thing as a programmable pipeline (yet); just programmable stages of a fixed pipeline.
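As a toy illustration of "programmable stages of a fixed pipeline", the sketch below hard-codes the order of the stages and lets the caller supply only the two shader functions. The stage in the middle is a deliberately crude stand-in for rasterization, and all names (`run_pipeline`, `scale_by_two`, `always_red`) are made up for this example.

```python
def run_pipeline(vertices, vertex_shader, fragment_shader):
    """Fixed stage order; only the two shader callables are user-supplied."""
    # Programmable stage: transform each vertex.
    transformed = [vertex_shader(v) for v in vertices]
    # Fixed-function stage (toy stand-in for rasterization): snap to integer pixels.
    pixels = [(round(x), round(y)) for x, y in transformed]
    # Programmable stage: compute a colour per pixel.
    return {p: fragment_shader(p) for p in pixels}


def scale_by_two(v):
    x, y = v
    return (2 * x, 2 * y)


def always_red(_pixel):
    return (255, 0, 0)


image = run_pipeline([(1.0, 1.0), (2.0, 3.0)], scale_by_two, always_red)
print(image)
```

The caller can swap in different shaders, but cannot reorder or remove stages, which matches the distinction drawn above.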
There's no such thing as D3D or OGL instructions. Direct3D or OpenGL will call into the graphics driver, and the driver will perform whatever it needs to do to make it happen. This is not completely true of shaders, which do have a uniform bytecode at the API (D3D/OGL) level, and in this case the API provides a compiler, but those are, as far as I know, still transformed in hardware-dependent ways before being executed. Of course, Direct3D and OpenGL also include user-mode components to improve performance or provide a better interface; for example, they will batch calls to the kernel to reduce context switches.
The reality of GPU making is that Microsoft and nVidia/ATi get together, think about what they want and what's feasible to implement, and come up with a joint specification, because none of this would work if the major hardware and software vendors didn't co-operate. Nobody will buy a GPU that doesn't support DirectX, and nobody will buy Windows if no GPU implements DirectX. Of course, "nobody" is relative, but it would be a huge loss for all concerned. And if you have a game that is built only against the D3D10 API, then a driver supporting D3D10 is a must to run the game, which effectively increases the value of the product by increasing the range of software it can run, and that is a selling point. This means that the semantic difference between being defined by the hardware vendor or the software vendor is minimal, realistically, especially as the only two real 3D rendering APIs on the PC, OpenGL and Direct3D, follow very similar models for the graphical pipeline, as far as I know.
However, with the new programmable GPUs, you could argue that the graphical pipeline doesn't really exist- a DX11 device can be used for any graphics pipeline you can conceive of, if you have the patience to program it.
Ultimately, the GPU is protected by a strong driver-level abstraction. It implements a C-style interface, and whatever's permitted or necessary in that implementation goes. Everything after that is completely implementation-defined.
You could check out the MSDN documentation for writing a graphics driver. I've seen it, but don't have a link handy, and it describes the interfaces that you must adhere to and other things.
You already got two very good answers. But maybe the best thing is to read the actual programming documentation for AMD/ATI's GPUs:
Unfortunately NVidia won't publish theirs.

How do I add a virtual GPU into Qemu?

I was wondering how to go about adding a virtual GPU into Qemu?
I have been told it involves adding a new graphics output module that uses OpenGL?
You are probably referring to "Create virtual hardware, kernel, qemu for Android Emulator in order to produce OpenGL graphics".
The very first thing I suggest you do is read the source code to see how commands to the virtual graphics adaptors that are already implemented get turned into graphical output. Then you should rewrite this to use OpenGL commands instead. Once you've got this, you must literally invent a new, virtual GPU to offer the guest system. I'd not even attempt to emulate a GeForce or Radeon. GeForces are not publicly documented anyway.
qemu doesn't provide a real kind of API for implementing a GPU. Of course there's some internal API for that, used to implement that VESA and S3 emulation, but a new GPU will require you to redo a lot of that again.
The virtual hardware should offer some I/O to pass drawing commands and data. In theory you could pass the full set of OpenGL commands there. However, OpenGL is hardware agnostic, whereas you are actually implementing "hardware", so you must find some balance there. Then in qemu you must implement that virtual hardware so that it executes the rendering commands appropriately.
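The idea of a virtual device that accepts drawing commands through an I/O channel can be sketched without any emulator at all. The model below is entirely invented: `ToyVirtualGpu`, its two opcodes and `write_port` are hypothetical, and none of this uses or resembles qemu's internal device API.

```python
class ToyVirtualGpu:
    """Invented device model: a command FIFO plus a small framebuffer."""

    CMD_CLEAR = 0x01   # operand: colour
    CMD_PLOT = 0x02    # operands: x, y, colour

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.framebuffer = [0] * (width * height)
        self.fifo = []

    def write_port(self, word):
        """Guest side: what a write to the device's I/O port would do."""
        self.fifo.append(word)

    def execute(self):
        """Emulator side: drain the FIFO and carry out each command."""
        i = 0
        while i < len(self.fifo):
            opcode = self.fifo[i]
            if opcode == self.CMD_CLEAR:
                colour = self.fifo[i + 1]
                self.framebuffer = [colour] * (self.width * self.height)
                i += 2
            elif opcode == self.CMD_PLOT:
                x, y, colour = self.fifo[i + 1:i + 4]
                self.framebuffer[y * self.width + x] = colour
                i += 4
            else:
                raise ValueError(f"unknown opcode {opcode:#x}")
        self.fifo.clear()


gpu = ToyVirtualGpu(4, 2)
for word in (ToyVirtualGpu.CMD_CLEAR, 7, ToyVirtualGpu.CMD_PLOT, 1, 1, 9):
    gpu.write_port(word)
gpu.execute()
print(gpu.framebuffer)
```

A guest driver for such a device would translate API calls into these command words, which is the balance between "full OpenGL" and "raw hardware" that the answer alludes to.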
Last but not least you must implement drivers for that virtual hardware, which will involve adding a new driver to Mesa and creating a driver for Xorg.

Creating a native application for X86?

Is there a way I could make a C or C++ program that would run without an operating system and that would draw something like a red pixel to the top left corner? I have always wondered how these types of applications are made. Since Windows is written in C I imagine there is a way to do this.
If you're writing for a bare processor, with no library support at all, you'll have to get all the hardware manuals, figure out how to access your video memory, and perform whatever operations that hardware requires to get a pixel drawn onto the display (or a sound on the beeper, or a block of memory read from the disk, or whatever).
When you're using an operating system, you'll rely on device drivers to know all this for you. Programs are still written, every day, for platforms without operating systems, but rarely for a bare processor. Many small MPUs come with a support library, usually a set of routines that lets you manipulate whatever peripheral devices they support.
It can certainly be done. You typically write the code in C, and you pretty much have to do everything on your own, with no standard library. To set your pixel, you'd usually load a pointer to the physical address of the screen, and write the correct value to that pointer. Alternatively, on a PC you could consider using the VESA BIOS. In all honesty, it's fairly similar to the way most code for MS-DOS was written (most used MS-DOS to read and write data on disk, but little else).
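The "write the correct value to that pointer" step boils down to an offset calculation into a linear framebuffer. The sketch below simulates it with a `bytearray` in place of real video memory; the resolution, the 3-bytes-per-pixel RGB layout and the absence of scanline padding are assumptions chosen for illustration, and real hardware may differ on all three.

```python
WIDTH, HEIGHT = 320, 200
BYTES_PER_PIXEL = 3                 # assumed layout: one byte each for R, G, B
PITCH = WIDTH * BYTES_PER_PIXEL     # bytes per scanline, assuming no padding

# Stand-in for video memory; on real hardware this would be a fixed physical address.
video_memory = bytearray(PITCH * HEIGHT)


def put_pixel(x, y, rgb):
    # Byte offset of pixel (x, y) within the linear framebuffer.
    offset = y * PITCH + x * BYTES_PER_PIXEL
    video_memory[offset:offset + BYTES_PER_PIXEL] = bytes(rgb)


put_pixel(0, 0, (255, 0, 0))        # red pixel in the top-left corner
first_pixel = tuple(video_memory[:3])
print(first_pixel)
```

In bare-metal C the same arithmetic would be applied to a pointer to the framebuffer's physical address instead of a Python buffer.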
The core bootloader and the part of the Kernel that bootstraps the OS are written in assembly. See for a brief writeup of how an operating system boots. There's no way I'm aware of to write a bootloader or Kernel purely in a higher level language such as C or C++ without using assembly.
You need to write a bootstrapper and a loader combination followed by a payload which involves setting the VGA mode manually by interrupt, grabbing a handle to the basic video buffer and then writing a value to the 0th byte.
Start here:
Without an OS it's difficult to have a loader, which means no dynamic libc. You'd have to link statically, as well as have a decent amount of bootstrap code written in assembly (although it could be provided as object files which you could then link with). Also, since you'd be at the mercy of whatever the system has, you'd be stuck with the VESA video modes (unless you want to write your own graphics driver and subsystem, which you don't).
There is, but not generally from within the OS. Initially, they are an asm stub that's executed from the MBR on the drive. See MBR. For x86 processors, this is generally 16-bit code, which then jumps into the operating system code and switches to 32-bit/64-bit mode depending on the operating system and chipset.
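The answers above describe a small boot stub that sets a VGA mode by interrupt and writes to the start of the video buffer. As a rough sketch only, the script below assembles such a stub by hand into a 512-byte boot sector image. The machine code is hand-encoded 16-bit x86 and has not been verified on real hardware; it assumes a legacy BIOS, VGA mode 13h (320x200, 256 colours, buffer at segment 0xA000) and the default palette, in which index 4 is red.

```python
SECTOR_SIZE = 512

code = bytes([
    0xB8, 0x13, 0x00,                    # mov ax, 0x0013  (AH=0 set mode, AL=13h)
    0xCD, 0x10,                          # int 0x10        (video BIOS)
    0xB8, 0x00, 0xA0,                    # mov ax, 0xA000  (mode 13h buffer segment)
    0x8E, 0xC0,                          # mov es, ax
    0x26, 0xC6, 0x06, 0x00, 0x00, 0x04,  # mov byte [es:0], 4  (4 = red in default palette)
    0xEB, 0xFE,                          # jmp $           (spin forever)
])

boot_sector = bytearray(SECTOR_SIZE)
boot_sector[:len(code)] = code
boot_sector[510] = 0x55   # boot signature, checked by the BIOS
boot_sector[511] = 0xAA

print(len(boot_sector), boot_sector[510:].hex())
```

The resulting bytes could be written to a file and booted in an emulator to try it out; anything beyond a single pixel would normally be written in assembly or C and built with a proper toolchain rather than encoded by hand.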