I'm developing an OpenGL application using OpenGL 2.1 and want to upload textures from a background thread.
What I have done so far:
Created a second context that shares objects with the main one
Uploaded the texture data from a thread
Everything works fine, except that I notice a small "lag" when the texture upload happens. I know this is because the driver has to synchronize the two contexts. The problem is that I want to stream the texture; I don't want to update the texture later. I just want to load textures in the background while displaying an "almost smooth" loading animation, without stalling the whole application.
At that point I searched and found that PBOs can be used for DMA transfer of pixel data. Is it possible to use a PBO for texture upload? If so, how?
With a PBO you don't need a second context to upload the texture data asynchronously. Just make sure you don't use the buffer right after triggering the upload, or it will stall waiting for the copy to finish.
Here's an example of this process: http://www.songho.ca/opengl/gl_pbo.html#unpack
And here's a bit more info about what PBOs are and how they should be used: http://www.opengl.org/wiki/Pixel_Buffer_Object
Related
I have an OpenGL application that renders data into an RGBA texture. I want to encode and stream it using the GStreamer framework (using the nvenc plugin for H.264 encoding).
I was looking through the documentation to solve these problems:
How to export the existing OpenGL context of the app to the nvenc element.
How to pass the texture ID to the source.
How synchronization will work, i.e. nvenc has to wait for rendering to finish and, similarly, the app has to wait for nvenc to finish reading from the texture. I am assuming it would involve either sync fences or glMemoryBarrier.
Any sample code would be really helpful.
I do want to avoid any texture copies to CPU memory. Nvidia's NVENC SDK mentions that it uses a CUDA context to make the calls, and an OpenGL texture can be imported into a CUDA context using the cudaGraphicsGLRegisterImage call. So my expectation is that going from the app's texture to an encoded video frame can be done without any copies.
I know this is an old question, but just in case someone else hits this problem...
If your NVENC calls and your OpenGL app are on the same thread, you don't need to do anything with the context.
If not, you should probably create two OpenGL contexts, one for rendering, one for encoding. The two contexts should share objects as explained in https://www.khronos.org/opengl/wiki/OpenGL_Context.
You can also create only one context and transfer the context between threads by making it "current" to the thread that's accessing the OpenGL objects, but I found the two contexts way easier.
The texture ID is an integer; just pass it.
NvEncMapInputResource "provides synchronization guarantee that any graphics or compute work submitted on the input buffer is completed before the buffer is used for encoding". NvEncEncodePicture has "synchronous mode of encoding".
As of today, NVENC supports an OpenGL encode device on Linux, so you don't have to register the OpenGL texture in CUDA. NVENC can access the OpenGL texture directly, so there's no memory copy on the client side.
If you're working on Windows, I believe you can create a CUDA encode device, then get a CUarray from an OpenGL texture, and NVENC can access the CUarray.
Sample code for the OpenGL and CUDA encode devices can be found in the samples of the NVENC SDK.
EDIT:
The synchronization guarantee of NvEncMapInputResource seems to hold only in the single-threaded case (or within the same GL context?). Adding a sync object before mapping is mandatory if rendering and encoding happen in different threads and contexts.
I'm trying to use the MediaFoundation API to encode a video but I'm having problems pushing the samples to the SinkWriter.
I'm getting the frames to encode through the Desktop Duplication API. What I end up with is an ID3D11Texture2D with the desktop image in it.
I'm trying to create an IMFVideoSample containing this surface and then push that video sample to a SinkWriter.
I've tried going about this in different ways:
I called MFCreateVideoSampleFromSurface(texture, &pSample) where texture is the ID3D11Texture2D, filled in the SampleTime and SampleDuration and then passed the created sample to the SinkWriter.
SinkWriter returned E_INVALIDARG.
I tried creating the sample by passing nullptr as the first argument and creating the buffer myself using MFCreateDXGISurfaceBuffer, and then passing the resulting buffer into the Sample.
That didn't work either.
I read through the MediaFoundation documentation and couldn't find detailed information on how to create the sample out of a DirectX texture.
I ran out of things to try.
Has anyone out there used this API before and can think of things I should check, or of how I can go about debugging this?
First of all, you should learn to use the mftrace tool.
Very likely, it will tell you the problem right away.
My guess is that one of the following problems is likely.
Probably, some other attributes are required besides SampleTime / SampleDuration.
Probably, SinkWriter needs a texture it can read on CPU. To fix that, when a frame is available, create a staging texture of the same format + size, call CopyResource to copy desktop to staging texture, then pass that staging texture to MF.
Even if you use a hardware encoder so the CPU never tries to read the texture data, I don’t think it’s a good idea to directly pass your desktop texture to MF.
When you set a D3D texture on a sample, no data is copied anywhere; the sample merely retains the texture.
MF works asynchronously, and its topology nodes may buffer several samples if they want to.
Desktop Duplication gives you data synchronously: you may only access the texture between the AcquireNextFrame and ReleaseFrame calls.
I understand that you usually create complex 3D models in Blender or some other 3D modelling software and export them afterwards as .obj. This .obj file gets parsed by your program and OpenGL renders it. This, as far as I understand, is real-time rendering.
Now I was wondering if there is something like pre-rendered objects. I'm a little bit confused because there are so many articles/videos about real-time rendering but I haven't found any information about non-real-time rendering. Does something like this exist or not? The only thing that comes to my mind as non-real-time rendering would be a video.
I guess this is pretty much a yes or no question :) but if it exists maybe someone could point me to some websites with explanations.
"Real-time rendering" means that the frames are being generated as fast as they can be displayed. "Non-real-time rendering", or "offline rendering" means generating frames one at a time, taking as much time as necessary to achieve the desired image quality, and then later assembling them into a movie. Video at the quality of video games can be rendered in real time; something as elaborate as a Pixar movie, though, has to be done in offline mode. Individual frames can still take hours of rendering time!
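To put rough numbers on that distinction, here is a small arithmetic sketch. The display rates, the 90-minute running time and the 2 hours per frame are made-up illustration values, not measurements of any real production, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Time available to render one frame, in milliseconds, if frames must be
// produced as fast as they are displayed (the real-time case).
double frameBudgetMs(double framesPerSecond) {
    assert(framesPerSecond > 0.0);
    return 1000.0 / framesPerSecond;
}

// Number of frames in a film of the given length at the given frame rate.
long long totalFrames(double minutes, double framesPerSecond) {
    assert(minutes >= 0.0 && framesPerSecond > 0.0);
    return std::llround(minutes * 60.0 * framesPerSecond);
}

// Total machine time for offline rendering, assuming (for illustration only)
// that every frame takes the same number of hours.
double offlineRenderHours(long long frames, double hoursPerFrame) {
    assert(frames >= 0 && hoursPerFrame >= 0.0);
    return static_cast<double>(frames) * hoursPerFrame;
}
```

With these assumptions, a real-time renderer at 30 fps has about 33 ms per frame, while a 90-minute film at 24 fps has 129,600 frames; at an assumed 2 hours per frame that is 259,200 machine-hours, which is why offline rendering is spread over many machines.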
It's not entirely clear what you mean by "prerendered objects". However, there are things called VBOs and vertex arrays that let you hand an object's geometry to OpenGL in bulk (a VBO keeps it in VRAM), so you don't have to feed it into the rendering pipeline with glVertex3f() or similar every frame. Submitting vertices one call at a time like that is called immediate mode.
VBOs and vertex arrays are used instead of immediate mode because they're far faster than calling into the graphics driver for every vertex: the data in a VBO already sits in VRAM, which the GPU can read faster than normal RAM, ready to be fed into the render pipeline.
The page here may help, too.
There's nothing stopping you from rendering to an off-screen frame-buffer (i.e., an FBO) and then saving that to disk rather than displaying it to the screen. For instance, that's how GPGPU techniques used to work before the advent of CUDA, OpenCL, etc. ... You would load your data as an unfiltered floating point texture, perform your calculation using pixel-shaders on the FBO, and then save the results back to disk.
In the link I posted above, it states in the overview:
"This extension defines a simple interface for drawing to rendering destinations other than the buffers provided to the GL by the window-system."
It then goes on to state,
"By allowing the use of a framebuffer-attachable image as a rendering destination, this extension enables a form of "offscreen" rendering."
So, you would get your "non-real-time" rendering by rendering off-screen some scene that renders slower than 30fps, and then saving those results to some movie file or file-sequence format that can be played back at a later date.
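As a rough sketch of the "file-sequence" option: once a frame has been read back from the off-screen buffer as tightly packed, top-down 8-bit RGB, it can be written out as a numbered binary PPM file. PPM is just my choice here because the format is trivial; the helper names frameFileName and writePpm are hypothetical, and the read-back itself is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Builds a numbered file name such as "frame_00007.ppm".
std::string frameFileName(const std::string& prefix, int index) {
    char suffix[32];
    std::snprintf(suffix, sizeof suffix, "_%05d.ppm", index);
    return prefix + suffix;
}

// Writes one frame as a binary PPM (P6) image.
// `rgb` must hold width * height * 3 bytes, top row first, no row padding.
// Returns false if the file could not be opened or written.
bool writePpm(const std::string& path, int width, int height,
              const std::vector<std::uint8_t>& rgb) {
    assert(width > 0 && height > 0);
    assert(rgb.size() == static_cast<std::size_t>(width) *
                             static_cast<std::size_t>(height) * 3);
    std::ofstream out(path.c_str(), std::ios::binary);
    if (!out) return false;
    // PPM header: magic number, dimensions, maximum channel value.
    out << "P6\n" << width << ' ' << height << "\n255\n";
    out.write(reinterpret_cast<const char*>(rgb.data()),
              static_cast<std::streamsize>(rgb.size()));
    return static_cast<bool>(out);
}
```

Calling writePpm(frameFileName("frame", i), w, h, pixels) once per rendered frame yields a sequence that a video tool can later assemble into a movie, however long each individual frame took to render.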
So what I need is simple. Imagine we have no GUI at all, just SSH access to some Linux machine where we are going to build and host our app. That app would generate a video stream. We have an SDL app with an OpenGL shader in it. All we want is to get the rendering (what we would normally see in the SDL window) as a char* (of size W*H*3). How can this be done? How do we make SDL render not into its GUI window but into some swappable pointer?
To be of any use, OpenGL should be hardware accelerated, so first check if your server does have a GPU that meets your requirements. If you're on a rented virtual server or some standard root server, then you very likely don't have a GPU.
If you have a GPU, then there are two possible methods:
Method 1 -- the easy one
You'll (unfortunately) have to configure and start the X server for it and this X server must also be the current virtual terminal (i.e. it must be the active thing on the graphics card). Then you give the user who'll be running that video generator access to that X display (read man xauth and what it references)
The next step is independent of SDL, it's an OpenGL thing: Create a Framebuffer Object onto which the desired graphics is rendered; a PBuffer would work as well, and actually I'd prefer it in this situation, however I found Framebuffer Objects to be more reliable than PBuffers on current Linux and its drivers.
Then render to this Framebuffer Object or PBuffer as usual and retrieve the content using glReadPixels.
Method 2 -- the flexible one
On the low level this is quite similar to Method 1, but things get abstracted for you: get VirtualGL (http://www.virtualgl.org/) to perform the actual OpenGL rendering on the GPU. Instead of starting your application on a secondary X server, you make direct use of the VirtualGL server, sending it the GLX stream and getting a JPEG image stream back. You could also use a secondary X server running a virtual framebuffer and take a continuous screen capture of that. Or, probably most elegant: write your own X.Org video driver that passes the video to the video streamer directly.
You cannot directly render to a byte array in OpenGL.
There are two ways to work with this. The first way is the simplest and doesn't require context gimmickry, and the second way does.
So first, the simple way.
In order for OpenGL to work, you need to have a window. That doesn't mean the window needs to be visible, but you need to create one to get a valid OpenGL context. Therefore Step 1: Create a window and minimize it.
Now, in order to get valid rendering, the pixels in the framebuffer must pass the "pixel ownership test." When rendering to the framebuffer that holds the screen itself, pixels of the window that are not actually visible on screen fail the pixel ownership test. So the values of those pixels are undefined if you use glReadPixels.
However, this only pertains to the default framebuffer that is associated with the window. Framebuffer objects always pass the pixel ownership test. Therefore, Step 2: Create a framebuffer object and the associated renderbuffers for your needs.
From there, it's pretty simple. Just render as normal and do a glReadPixels when you want to get the data. Pixel buffer objects can be used to transfer the pixel data asynchronously, if performance is a concern. Step 3: Render and use glReadPixels to get the data.
The second way is more widely available (FBOs require extension support or OpenGL 3.0), but more platform-specific.
Instead of creating an FBO in step 2, you instead have Step 2: use glXCreatePbuffer to create a pbuffer. A pbuffer is an off-screen render target that acts like the default framebuffer. You call glXMakeContextCurrent to tell OpenGL to render to the pbuffer instead of the default framebuffer.
Steps 1 and 3 are the same as above.
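One detail about Step 3 if you want exactly W*H*3 bytes: with the default GL_PACK_ALIGNMENT of 4, glReadPixels pads each row to a multiple of 4 bytes, and it returns the rows bottom-up. Below is a minimal sketch of the bookkeeping on the CPU side, assuming GL_RGB / GL_UNSIGNED_BYTE data; the helper names packedRowStride and toTopDownRgb are hypothetical and no GL calls are made here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes per row as written by glReadPixels for byte-sized components,
// given the GL_PACK_ALIGNMENT in effect (1, 2, 4 or 8; the default is 4).
std::size_t packedRowStride(std::size_t width, std::size_t bytesPerPixel,
                            std::size_t alignment) {
    assert(alignment == 1 || alignment == 2 || alignment == 4 || alignment == 8);
    const std::size_t raw = width * bytesPerPixel;
    return (raw + alignment - 1) / alignment * alignment;  // round up
}

// Converts a bottom-up, row-padded RGB read-back into a top-down,
// tightly packed buffer of exactly width * height * 3 bytes.
std::vector<std::uint8_t> toTopDownRgb(const std::vector<std::uint8_t>& src,
                                       std::size_t width, std::size_t height,
                                       std::size_t alignment = 4) {
    const std::size_t rowBytes = width * 3;
    const std::size_t stride = packedRowStride(width, 3, alignment);
    // The last row does not need its trailing padding to be present.
    assert(height == 0 || src.size() >= stride * (height - 1) + rowBytes);
    std::vector<std::uint8_t> dst(rowBytes * height);
    for (std::size_t y = 0; y < height; ++y) {
        // Destination row y (counted from the top) is source row
        // height - 1 - y (counted from the bottom).
        const std::uint8_t* from = src.data() + (height - 1 - y) * stride;
        std::copy(from, from + rowBytes, dst.data() + y * rowBytes);
    }
    return dst;
}
```

Alternatively, calling glPixelStorei(GL_PACK_ALIGNMENT, 1) before glReadPixels removes the padding, which leaves only the vertical flip to do.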
Currently I am loading an image into memory on a second thread, and then, during the display loop (if a texture load is required), I load the texture.
I discovered that I could not load the texture on the 2nd thread because OpenGL didn't like that; perhaps this is possible but I did something wrong - so please correct me if this is actually possible.
On the other hand, if my failure was valid - how do I load a texture without disrupting the rendering loop? Currently the textures take around 1 second to load from memory, and although this isn't a major issue, it can be slightly irritating for the user.
You can load a texture from disk to memory on any thread you like, using any tool you wish for reading the files.
However, when you bind it to OpenGL, it's going to need to be handled on the same thread as the rendering for that OpenGL context. That being said, this discussion suggests that using a PBO in a second thread is an option, and can speed up the process.
You can certainly load the texture from disk into RAM in any number of threads you like, but OpenGL won't upload to VRAM in multiple threads for the reason mentioned in Reed's answer.
Given that loading from disk is the slowest part, that's the bit you'll probably want to thread. The loading thread(s) build up a queue of textures to be uploaded, and this queue is then consumed by the thread that owns the GL context (mind how the various threads access that queue, however). You could also consider a non-threaded approach of uploading N textures per frame, where N is a number that doesn't slow the rendering down too much.
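A minimal sketch of such a queue, combining both ideas: loader threads push decoded images, and the GL thread drains at most N of them per frame. The names PendingTexture and UploadQueue are hypothetical, and the actual GL upload (e.g. glTexImage2D) is left to a callback so the sketch stays independent of any GL setup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// A decoded image waiting in RAM to be uploaded (hypothetical type).
struct PendingTexture {
    std::string name;
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // tightly packed RGBA
};

// Shared between loader threads (producers) and the GL thread (consumer).
class UploadQueue {
public:
    // Called from any loader thread once an image has been decoded.
    void push(PendingTexture tex) {
        assert(tex.width > 0 && tex.height > 0);
        assert(tex.pixels.size() == static_cast<std::size_t>(tex.width) *
                                        static_cast<std::size_t>(tex.height) * 4);
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(tex));
    }

    // Called once per frame on the thread that owns the GL context.
    // Passes at most `budget` textures to `upload`, oldest first, and
    // returns how many were passed.
    template <typename UploadFn>
    std::size_t drain(std::size_t budget, UploadFn upload) {
        std::vector<PendingTexture> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            while (batch.size() < budget && !pending_.empty()) {
                batch.push_back(std::move(pending_.front()));
                pending_.pop_front();
            }
        }
        // Upload outside the lock so loaders are never blocked by GL calls.
        for (std::size_t i = 0; i < batch.size(); ++i) upload(batch[i]);
        return batch.size();
    }

private:
    std::mutex mutex_;
    std::deque<PendingTexture> pending_;
};
```

In the render loop this would look like queue.drain(2, uploadToGl) once per frame, where uploadToGl is whatever function creates the GL texture from the pixels; the budget of 2 is an arbitrary example value to tune against your frame time.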