How to get data from USB to an OpenGL texture bypassing the CPU? - opengl

What's the best data path from a USB camera to an OpenGL texture?
The only way I know is usb-camera -> (cv.capture()) cv_image -> glGenTexture(image.bytes).
Since the CPU has to parse the image for every frame, the frame rate drops.
Is there a better way?
I'm using an NVIDIA Jetson TX2; is there an approach specific to that platform?

Since USB frames must be reassembled by the USB driver and the UVC protocol handler, the data passes through the CPU anyway. The biggest worry is redundant copy operations.
If the frames are transmitted in M-JPEG format (which almost all UVC compliant cameras do support), then you must decode it on the CPU anyway, since GPU video decoding acceleration HW usually doesn't cover JPEG (also JPEG is super easy to decode).
For YUV color formats it is advisable to create two textures, one for the Y channel, one for the UV channels. Usually YUV formats are planar (i.e. images of a single component per pixel each), so you'd make the UV texture a 2D array with two layers. Since chroma components may be subsampled you need the separate textures to support the different resolutions.
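For example, for a planar 4:2:0 frame this setup could look roughly like the following sketch (width/height are the luma resolution; glTexStorage needs GL 4.2+ or ES 3.0, otherwise use glTexImage2D/3D with a null data pointer instead):

GLuint texY, texUV;

// full-resolution, single-channel texture for the Y (luma) plane
glGenTextures(1, &texY);
glBindTexture(GL_TEXTURE_2D, texY);
glTexStorage2D(GL_TEXTURE_2D, 1, GL_R8, width, height);

// half-resolution 2D array with two layers, one for U and one for V
glGenTextures(1, &texUV);
glBindTexture(GL_TEXTURE_2D_ARRAY, texUV);
glTexStorage3D(GL_TEXTURE_2D_ARRAY, 1, GL_R8, width / 2, height / 2, 2);

The chroma planes then go in with glTexSubImage3D, using the layer index as the z offset.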
RGB data goes into a regular 2D texture.
Use a pixel buffer object (PBO) for transfer. By mapping the PBO into host memory (glMapBuffer) you can decode the images coming from the camera directly into that staging PBO. After unmapping a call to glTexSubImage2D will then transfer the image to the GPU memory – in the case of a unified memory architecture this "transfer" might be as simple as shuffling around a few internal buffer references.
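A minimal sketch of that PBO path for the luma plane from above (decode_y_plane_into() is a hypothetical stand-in for whatever your capture/decoding code does):

GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
// allocate staging storage; GL_STREAM_DRAW hints at once-per-frame updates
glBufferData(GL_PIXEL_UNPACK_BUFFER, width * height, nullptr, GL_STREAM_DRAW);

// map the PBO and let the CPU-side decoder write straight into it
void* ptr = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
decode_y_plane_into(ptr);   // hypothetical decode step
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);

// with a PBO bound, the data argument is a byte offset into the buffer
glBindTexture(GL_TEXTURE_2D, texY);
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_RED, GL_UNSIGNED_BYTE, (const void*)0);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);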
Since you didn't mention the exact API used to access the video device, it's difficult to give more detailed information.

Related

Is there a direct way to render/encode Vulkan output as an ffmpeg video file?

I'm about to generate 2D and 3D music animations and render them to video using C++. I was thinking about using OpenGL, but I've read that, unfortunately, it is being discontinued in favour of Vulkan, which seems to offer higher performance using a GPU, but is also a lower-level API, making it more difficult to learn. I still have almost no knowledge in both OpenGL and Vulkan, beginning to learn now.
My question is:
is there a way to encode the Vulkan render output (showing a window or not) into a video file, preferably through FFMPEG? If so, how could I do that?
Requisites:
Speed: the decrease in performance should be nearly that of encoding the video only, not much more than that (e.g. by having to save lossless frames as images first and then encoding a video from them).
Controllable FPS and resolution: the video fps and frame resolution can be freely chosen.
Reliability, reproducibility: running code that gives the same Vulkan output twice should result in two identical videos regardless of the system, i.e. no dropped frames, no async problems (I want to sync with audio), and so on. The chosen video fps should stay fixed (e.g. 60 fps), no matter whether the computer can render 300 fps or 3 fps.
What I found out so far:
An example of taking "screenshots" from Vulkan output: it writes to a ppm image at the end, which is a binary uncompressed image file.
An encoder for rendering videos from OpenGL output, which is what I want, but using OpenGL in that case.
That Khronos includes a video subset in the Vulkan API.
A video tool to decode, demux, process videos using FFMPEG and Vulkan.
That it is possible to render the output into a buffer without the need of a screen to display it.
First of all, ffmpeg is a framework used for video encoding and decoding. Second, if you have no experience with any of the GPU rendering APIs, you should start with OpenGL. Vulkan is very low-level and complicated. OpenGL will be here for a very long time and will not be immediately replaced with Vulkan.
The off-screen rendering option you mentioned is probably the best one. It doesn't really matter though; you can also use the image from the framebuffer. The image is just a matrix of RGBA pixels. You need these data as the input for the video encoding. Please take a look at how ffmpeg works. You need to send the rendered frame data into the encoder, which produces video packets that are stored in a video file. You need to choose a container (mp4, mkv, avi, ...) and a video format (h265, av1, vp9, ...). You can of course implement a frame limiter and render the scene with a constant framerate, or just pick the frames that have a constant timestep.
The performance problem happens when you transfer the data from RAM to GPU memory and vice versa, for example when downloading the rendered image from the buffer and passing it to a CPU encoder. Therefore, the most optimal approach would be with Vulkan, using the new video extension and sending the rendered frames directly into the HW-accelerated encoder without any transfers from GPU memory. You can also run the encoder in a different thread to make it work asynchronously.
But honestly, it's not trivial. The simplest (non-realtime) way to create a video from a 3D render would be to (see the sketch after this list):
Create a fixed FPS game loop
Make screenshots of the scene by downloading the framebuffer data in OGL or Vulkan
Process the frames by ffmpeg binary to create a video file
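A minimal sketch of steps 2 and 3, reading frames back with glReadPixels and piping them into the ffmpeg binary (assumes a current OpenGL context, a POSIX popen, and a hypothetical render_scene(t) that draws the frame for time t; resolution, framerate and output name are arbitrary):

#include <cstdio>
#include <vector>

void record_video()
{
    const int W = 1280, H = 720, FPS = 60;
    // raw RGBA frames go to ffmpeg's stdin; vflip because glReadPixels returns rows bottom-up
    FILE* ff = popen(
        "ffmpeg -y -f rawvideo -pixel_format rgba -video_size 1280x720 "
        "-framerate 60 -i - -vf vflip -c:v libx264 -pix_fmt yuv420p out.mp4", "w");

    std::vector<unsigned char> frame(W * H * 4);
    for (int i = 0; i < FPS * 10; ++i) {            // e.g. 10 seconds of video
        render_scene(i / double(FPS));              // hypothetical fixed-timestep render
        glReadPixels(0, 0, W, H, GL_RGBA, GL_UNSIGNED_BYTE, frame.data());
        fwrite(frame.data(), 1, frame.size(), ff);  // one raw frame per iteration
    }
    pclose(ff);
}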
Another hack would be to use screen recording software (OBS, Fraps, etc.) to create the video from your 3D app.

How to upload DXT5 compressed pixel data generated from a GPU operation without a CPU copy?

So what I want to do is:
Load a file encrypted with any algorithm (in my case AES-256) into GPU memory (with CUDA).
Decrypt the file with all the GPU parallel awesomeness we have right now and let it stay in GPU memory.
Now tell OpenGL (4.3) that there is a texture in memory that needs to be read and decompressed from DDS DXT5.
Point 3 is where I have my doubts, since to load a compressed DDS DXT5 texture in OpenGL one has to call glCompressedTexImage2D (or its 3D/ARB variants) with the compression type (GL_COMPRESSED_RGBA_S3TC_DXT5_EXT) and a pointer to the image data buffer.
So, to make it short -> is there a way to pass a texture buffer address already in GPU memory to OpenGL (in DDS format)? Without this option, I would need to transfer the AES decrypted file back to the CPU and tell OpenGL to load it again into the GPU....
Many thanks for any help or short examples ;)
You need to do two things.
First, you must ensure synchronization and visibility for the operation that generates this data. If you're using a compute shader to generate the data into an SSBO, buffer texture, or whatever, then you'll need to use glMemoryBarrier, with the GL_PIXEL_BUFFER_BARRIER_BIT set. If you're generating this data via a rendering operation to a buffer texture, then you won't need an explicit barrier. But if the FS is writing to an SSBO or via image load/store, you'll still need the explicit barrier as described above.
If you're using OpenCL, then you'll have to employ OpenCL's OpenGL interop functionality to make the result of the CL operation visible to GL.
Once that's done, you just use the buffer as a pixel unpack buffer, just as you would for any asynchronous pixel transfer operation. Compressed textures work with GL_PIXEL_UNPACK_BUFFER just like uncompressed ones.
Remember: in OpenGL, all buffer objects are the same. OpenGL doesn't care if you use a buffer as an SSBO one minute, then use it for pixel transfers the next. As long as you synchronize it, everything is fine.
Bind your buffer to GL_PIXEL_UNPACK_BUFFER and call glCompressedTexSubImage2D with data being an offset into the buffer.
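Put together, the GL side could look roughly like this sketch, assuming bufferWithDxt5Data already contains the decrypted DXT5 payload (written e.g. by a compute shader or via CUDA/CL interop) and tex already has GL_COMPRESSED_RGBA_S3TC_DXT5_EXT storage of the right size:

// if a GL compute shader wrote the data, make those writes visible to the pixel transfer
glMemoryBarrier(GL_PIXEL_BUFFER_BARRIER_BIT);

// reuse the very same buffer object as the pixel unpack (source) buffer
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, bufferWithDxt5Data);
glBindTexture(GL_TEXTURE_2D, tex);

// DXT5 stores 16 bytes per 4x4 block
GLsizei imageSize = ((width + 3) / 4) * ((height + 3) / 4) * 16;

// with a PBO bound, the last argument is a byte offset into the buffer, not a pointer
glCompressedTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                          GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,
                          imageSize, (const void*)0);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);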
Read more about PBO here.

Is it possible to render a scene on a graphics card and then transfer the image back to system memory

I'd like to know if it is possible to:
Take an image stored in system memory, for example cv::Mat.
Transfer it to the graphics card.
Do some processing on the graphics card using openGL / directX / 3D Engine
(I'm aware OpenCV has functionality for some of its algorithms, but this is not what I'm looking for.) For example, rendering a mesh.
Then transfer the data back to a cv::Mat.
I'd like to know a good platform independent way of doing this
With OpenGL there are 2 essential functions:
Take an image stored in system memory, for example cv::Mat.
Transfer it to the graphics card.
glTexImage…
Then transfer the data back to a cv::Mat.
glReadPixels
All the rest, like FBOs, PBOs and such, is just there to make things more efficient or to allow reading back from certain resources that don't offer a direct way to access them.
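As a rough, platform-independent sketch of that round trip with OpenCV types (FBO setup, error handling and the actual rendering are omitted; fbWidth/fbHeight stand for the size of the framebuffer you rendered to):

#include <opencv2/opencv.hpp>
// assumes a current GL context and a continuous CV_8UC3 (BGR) image 'input'

GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
// upload: system memory -> GPU
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, input.cols, input.rows, 0,
             GL_BGR, GL_UNSIGNED_BYTE, input.data);

// ... render something with this texture (e.g. a mesh) into the framebuffer ...

// download: GPU -> system memory
cv::Mat result(fbHeight, fbWidth, CV_8UC3);
glPixelStorei(GL_PACK_ALIGNMENT, 1);
glReadPixels(0, 0, fbWidth, fbHeight, GL_BGR, GL_UNSIGNED_BYTE, result.data);
cv::flip(result, result, 0);   // GL rows are bottom-up, cv::Mat is top-down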

Does a conversion take place when uploading a texture of GL_ALPHA format (glTexImage2D)?

The documentation for glTexImage2D says
GL_RED (for GL) / GL_ALPHA (for GL ES). "The GL converts it to floating point and assembles it into an RGBA element by attaching 0 for green and blue, and 1 for alpha. Each component is clamped to the range [0,1]."
I've read through the GL ES specs to see if it specifies whether the GPU memory is actually 32bit vs 8bit, but it seems rather vague. Can anyone confirm whether uploading a texture as GL_RED / GL_ALPHA gets converted from 8bit to 32bit on the GPU?
I'm interested in answers for GL and GL ES.
I've read through the GL ES specs to see if it specifies whether the GPU memory is actually 32bit vs 8bit, but it seems rather vague.
Well, that's what it is. The actual details are left for the implementation to decide. Giving such liberties in the specification allows implementations to contain optimizations tightly tailored to the target system. For example, a certain GPU may cope better with a 10 bits per channel format, so it is then at liberty to convert to such a format.
So it's impossible to say in general, but for a specific implementation (i.e. GPU + driver) a certain format will likely be chosen. Which one depends on the GPU and driver.
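On desktop GL you can at least ask afterwards what the implementation actually allocated; a small sketch (these queries are not available in GL ES 2.0):

#include <cstdio>

GLint internalFmt = 0, redBits = 0;
glBindTexture(GL_TEXTURE_2D, tex);
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_INTERNAL_FORMAT, &internalFmt);
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_RED_SIZE, &redBits);
// internalFmt is whatever the driver picked for the unsized GL_RED request,
// redBits is the number of bits it actually stores per red component
// (for GL_ALPHA textures query GL_TEXTURE_ALPHA_SIZE instead)
printf("internal format 0x%X, %d red bits\n", internalFmt, redBits);

Requesting a sized internal format such as GL_R8 (GL 3.x+ / ES 3.0+) instead of an unsized one removes most of this ambiguity in the first place.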
Following on from what datenwolf has said, I found the following in the "POWERVR SGX OpenGL ES 2.0 Application Development Recommendations" document:
6.3. Texture Upload
When you upload textures to the OpenGL ES driver via glTexImage2D, the input data is usually in linear scanline format. Internally, though, POWERVR SGX uses a twiddled layout (i.e. following a plane-filling curve) to greatly improve memory access locality when texturing. Because of this different layout uploading textures will always require a somewhat expensive reformatting operation, regardless of whether the input pixel format exactly matches the internal pixel format or not.
For this reason we recommend that you upload all required textures at application or level start-up time in order to not cause any framerate dips when additional textures are uploaded later on.
You should especially avoid uploading texture data mid-frame to a texture object that has already been used in the frame.

What is the most efficient process to push YUV texture data onto a GPU in OpenGL?

Does anyone know of an efficient way to push 2vuy non-planar data onto a GPU in a way that doesn't require swizzling?
I am grabbing the raw 2vuy data from an h264 video file and successfully loading it into a texture that I map to an OpenGL object. I notice that my code spends a fair amount of time in glgProcessPixelsWithProcessor. My glTexImage2D call looks like the following:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_YCBCR_422_APPLE,
GL_UNSIGNED_SHORT_8_8_APPLE, data);
Apple says in its OpenGL guide that GL_YCBCR_422_APPLE provides "acceptable" performance (p103), but that
Note: If your data needs only to be swizzled, glgProcessPixels performs the swizzling reasonably fast although not as fast as if the data didn't need swizzling. But non-native data formats are converted one byte at a time and incurs a performance cost that is best to avoid.
I assume that there is some kind of internal format conversion going on the CPU. I noticed in another thread that glgProcessPixels is running a block method as well.
Is my path the most efficient? If not, what is?
Your code, as it stands right now, depends on Apple extensions, so I can't tell what's happening inside.
However, what I suggest is that you create three 2D textures, each with exactly one channel, where each texture receives one of the color planes; using independent textures makes supporting chroma subsampling (that 4:2:2) simpler.
In a shader you'd then perform the colorspace conversion. When writing down the math, I suggest you do this via a connection color space, like XYZ, as this allows you to take the color profile of the output device into account; ICC profiles provide the conversion data from XYZ color space coordinates to device color space (RGB) coordinates.
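If the ICC/XYZ step is more than you need, a simplified version of that shader idea, sampling three single-channel planes and applying a plain full-range BT.601 YCbCr-to-RGB conversion (texture names and the chosen coefficients are assumptions, not taken from the answer above), could be embedded as a C++ string like this:

// Fragment shader (GLSL 3.30) doing the colorspace conversion on the GPU
const char* kYuvToRgbFs = R"GLSL(
#version 330 core
uniform sampler2D texY;
uniform sampler2D texCb;
uniform sampler2D texCr;
in vec2 uv;
out vec4 color;
void main() {
    float y  = texture(texY,  uv).r;
    float cb = texture(texCb, uv).r - 0.5;
    float cr = texture(texCr, uv).r - 0.5;
    color = vec4(y + 1.402 * cr,
                 y - 0.344136 * cb - 0.714136 * cr,
                 y + 1.772 * cb,
                 1.0);
}
)GLSL";

Video-range (16-235) material would additionally need the usual range expansion before this conversion.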