glTexSubImage2D is slow uploading YUV data - opengl

I am writing a video player. I decode the video frames and use three sampler2Ds to receive the YUV frame data, and it renders perfectly fine.
However, there is a serious performance issue with glTexSubImage2D. I observed some strange behaviors:
When uploading the planes in the order Y, U, V, the U upload is randomly slow (1 ms to 100 ms).
When uploading in the order Y, V, U, the V upload is randomly slow (1 ms to 100 ms).
When uploading in the order U, V, Y, the Y upload is consistently slow (10 ms to 50 ms).
The components that are not slow take less than 1 ms to upload. The textures use internalFormat=GL_RED, and I call glTexSubImage2D with format=GL_RED, dataType=GL_UNSIGNED_BYTE.
What could be causing this strange behavior?
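For reference, here is a minimal sketch of the upload path described above; the texture names, plane pointers, and the 4:2:0 plane sizes are my assumptions, not taken from the question:
#include <GL/gl.h>
#include <cstdint>

// Hypothetical per-frame upload of three single-channel planes (Y, U, V).
// The textures are assumed to have been allocated once with glTexImage2D
// using a GL_RED internal format; sizes assume 4:2:0 chroma subsampling.
void uploadYUV(GLuint texY, GLuint texU, GLuint texV,
               const uint8_t* y, const uint8_t* u, const uint8_t* v,
               int lumaW, int lumaH)
{
    const GLuint   tex[3]   = { texY, texU, texV };
    const uint8_t* plane[3] = { y, u, v };
    const int      w[3]     = { lumaW, lumaW / 2, lumaW / 2 };
    const int      h[3]     = { lumaH, lumaH / 2, lumaH / 2 };

    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);  // plane rows are tightly packed
    for (int i = 0; i < 3; ++i) {
        glBindTexture(GL_TEXTURE_2D, tex[i]);
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w[i], h[i],
                        GL_RED, GL_UNSIGNED_BYTE, plane[i]);
    }
}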

Related

Modifying CISCO openh264 to take image frames and output compressed frames

Has anyone tried to modify the CISCO openh264 library to take JPEG images as input and compress them into P and I frames (output as frames, NOT video), and similarly to modify the decoder to take compressed P and I frames and generate uncompressed frames?
I have a camera looking at a static scene, taking pictures (1280x720p) every 30 seconds. The scene is almost static. Currently I am using JPEG compression to compress each frame individually, which results in an image size of ~270KB. Each compressed frame is transferred over the internet to a storage server. Since there is very little motion in the scene, the 'I' frame size will be very small (I think it should be ~20-50KB). So it would be very cost effective to transmit I frames over the internet instead of JPEG images.
Can anyone point me to a project, or advise me on how to proceed with this task?
You are describing exactly what a codec does. It takes images and compresses them. Their relationship in time is irrelevant to the compression step. The decoder then decides how to display them, or just writes them to disk. You don't need to modify openh264; what you want to do is exactly what it is designed to do.
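For illustration, a rough, untested sketch of feeding one frame (already converted to I420/YUV) straight into the stock openh264 encoder; the resolution, strides, and variable names are placeholders, not taken from the question:
#include <wels/codec_api.h>
#include <cstring>

// Encode one 1280x720 I420 frame with the unmodified openh264 encoder.
// y/u/v point to the planes of the frame; the bitstream is returned in bsInfo.
void encodeOneFrame(ISVCEncoder* encoder,
                    unsigned char* y, unsigned char* u, unsigned char* v,
                    SFrameBSInfo& bsInfo)
{
    SSourcePicture pic;
    std::memset(&pic, 0, sizeof(pic));
    pic.iColorFormat = videoFormatI420;
    pic.iPicWidth    = 1280;
    pic.iPicHeight   = 720;
    pic.iStride[0]   = 1280;                 // luma stride
    pic.iStride[1]   = pic.iStride[2] = 640; // chroma strides
    pic.pData[0] = y;
    pic.pData[1] = u;
    pic.pData[2] = v;

    std::memset(&bsInfo, 0, sizeof(bsInfo));
    encoder->EncodeFrame(&pic, &bsInfo);     // bsInfo.sLayerInfo[] holds the NALs
}
The encoder itself would be created with WelsCreateSVCEncoder and initialized with an SEncParamBase (width, height, frame rate, target bitrate) before the loop; whether a given output frame came out as an I or a P frame can be read from bsInfo.eFrameType.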

Encode image in JPG with OpenCV avoiding the ghost effect

I have an application (OpenCV, C++) that grabs an image from a webcam, encodes it as JPG, and transmits it from a server to a client. The webcam is stereo, so I actually have two images, LEFT and RIGHT. On the client, when I receive the image I decode it and generate an anaglyph 3D effect.
To do this I use OpenCV...
I encode the image this way:
std::vector<int> params;
params.push_back(CV_IMWRITE_JPEG_QUALITY);
params.push_back(60); // image quality
std::vector<uchar> buffer;
cv::imshow("anaglyph", image); // here the anaglyph image is good!
cv::imencode(".jpg", image, buffer, params);
and decode it this way:
cv::Mat imageReceived = cv::imdecode(cv::Mat(v), CV_LOAD_IMAGE_COLOR); // v is the received byte buffer
What I see is that this kind of encoding produces a "ghost effect" (artifact?) in the anaglyph image, i.e. a bad effect along the edges of objects. If I look at a door, for example, there is a ghost effect along the edge of the door. I'm sure this depends on the encoding, because if I show the anaglyph image before the encode instruction it looks fine. I cannot use PNG because it generates images that are too large, and that is a problem for the connection between the server and the client.
I looked at GIF but, if I understood correctly, it is not supported by cv::imencode.
So is there another way to encode a cv::Mat object as JPG without this bad effect and without increasing the size of the image too much?
If your server is only used as image storage, you can send the two original stereo images (compressed) to the server and just generate the anaglyph when you need it. I figure that if you fetch the image pair (JPEG) from the server and then generate the anaglyph client-side, it will have no ghosting. It may even be that the compressed pair of images combined is smaller than the anaglyph .png.
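A minimal OpenCV sketch of that client-side approach, assuming the usual red-cyan anaglyph and two received JPEG byte buffers (leftBuf and rightBuf are placeholder names):
#include <opencv2/opencv.hpp>
#include <vector>

// Decode the two JPEG buffers received from the server and build the
// red-cyan anaglyph locally, after decompression, so JPEG never sees
// a line-interlaced image.
cv::Mat makeAnaglyph(const std::vector<uchar>& leftBuf,
                     const std::vector<uchar>& rightBuf)
{
    cv::Mat left  = cv::imdecode(leftBuf,  CV_LOAD_IMAGE_COLOR);
    cv::Mat right = cv::imdecode(rightBuf, CV_LOAD_IMAGE_COLOR);

    std::vector<cv::Mat> l, r, a(3);
    cv::split(left,  l);   // l[0] = B, l[1] = G, l[2] = R
    cv::split(right, r);

    a[2] = l[2];           // red channel from the left eye
    a[1] = r[1];           // green channel from the right eye
    a[0] = r[0];           // blue channel from the right eye

    cv::Mat anaglyph;
    cv::merge(a, anaglyph);
    return anaglyph;
}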
I assume the anaglyph encoding uses line interlacing to combine both sides into one image.
You are using JPEG to compress the image.
This algorithm is optimized to compress "photo-like" real-world images from cameras, and it works very well on these.
The difference between "photo-like" and other images, as far as image compression is concerned, lies in the frequencies occurring in the image.
Roughly speaking, in "photo-like" images, the high frequency part is relatively small, and mostly not important for the image content.
So the high frequencies can be safely compressed.
If two frames are interlaced line by line, this creates an image with very strong high frequency part.
The JPEG algorithm discards much of that information as unimportant, but because it is actually important, that causes relatively strong artefacts.
JPEG basically just "does not work" on this kind of images.
If you can change the encoding of the anaglyph images to side-by-side, or to alternating full images from left and right, JPEG compression should work just fine.
Is this an option for you?
If not, it will get much more complicated. One problem, if you need good compression, is that algorithms which are great at compressing images with very high frequencies are really bad at compressing "photo-like" data, which is still the larger part of your image.
Therefore, please try really hard to change the encoding so that it is not line-interlaced; that should be about an order of magnitude easier than the other options.
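If the side-by-side layout is an option, a minimal sketch of the change on the server side (left and right stand for the two camera frames; quality 60 is just carried over from the question):
#include <opencv2/opencv.hpp>
#include <vector>

// Pack the two views side by side so JPEG compresses two ordinary photos
// instead of a line-interlaced image full of artificial high frequencies.
std::vector<uchar> encodeSideBySide(const cv::Mat& left, const cv::Mat& right)
{
    cv::Mat sideBySide;
    cv::hconcat(left, right, sideBySide);

    std::vector<int> params;
    params.push_back(CV_IMWRITE_JPEG_QUALITY);
    params.push_back(60);

    std::vector<uchar> buffer;
    cv::imencode(".jpg", sideBySide, buffer, params);
    return buffer;   // the client splits the decoded image down the middle
}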

Keep alpha-transparency of a video through HDMI

The scenario I'm dealing with is as follows: I need to take the screen generated by OpenGL and send it through HDMI to an FPGA component while keeping the alpha channel. Right now the data being sent through HDMI is only RGB (24 bits, without the alpha channel), so I need a way to force the alpha bits through this port somehow.
See image: http://i.imgur.com/hhlcbb9.jpg
One solution I can think of is to convert the screen buffer from RGBA to RGB while mixing the alpha channel into the RGB buffer.
For example:
The original buffer: [R G B A][R G B A][R G B A]
The output i want: [R G B][A R G][B A R][G B A]
The point is not having to go through every single pixel.
But I'm not sure whether that's possible at all using OpenGL or any other technology (VideoCore kernel?).
Do you actually mean a framebuffer, or some kind of texture? Framebuffers cannot be resized, and the resulting image will have one third more pixels, so you can't actually do that with a framebuffer.
You could do it with a texture, but only by resizing it. You would have to read the texel data with glGetTexImage into some buffer, then upload it to another texture with glTexImage2D, simply changing the pixel transfer format and the texture width appropriately. The read would use GL_RGBA, and the write would use GL_RGB with an internal format of GL_RGB8.
The performance of this will almost certainly not be very good, but it should work.
But no, there is no simple call you can make to cause this to happen.
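A rough sketch of that read-back-and-repack idea (untested; the function, texture ids, and the assumption that width * 4 is divisible by 3 are mine):
#include <GL/gl.h>
#include <vector>
#include <cstdint>

// Read back an RGBA texture and re-upload the same bytes as an RGB texture
// that is 4/3 as wide, so every byte, alpha included, survives as an "RGB"
// component: [R G B A][R G B A][R G B A] becomes [R G B][A R G][B A R][G B A].
void repackRgbaAsRgb(GLuint srcTex, GLuint dstTex, int width, int height)
{
    std::vector<uint8_t> bytes(static_cast<size_t>(width) * height * 4);

    glBindTexture(GL_TEXTURE_2D, srcTex);
    glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, bytes.data());

    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
    glBindTexture(GL_TEXTURE_2D, dstTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, width * 4 / 3, height, 0,
                 GL_RGB, GL_UNSIGNED_BYTE, bytes.data());
}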
You may be able to send the alpha channel separately in OpenGL via a different RGB or HDMI video output on your video card.
Your PC would then output RGB down one cable and the alpha down the other. The alpha probably needs to be converted to greyscale so that it is still 24 bits.
You then select which signal is the key in your FPGA.
I'm presuming your FPGA supports chroma keying.
I have done something similar before using a program called Brainstorm, which uses a specific video card that supports SDI out and splits the RGB and the alpha into separate video channels (cables); the vision mixer then does the keying.
Today, however, I have created a new solution which mixes the multiple video channels first on the PC and then outputs the final mixed video directly either to an RTMP streaming server or to a DVI-to-SDI scan converter.

Efficient transfer of planar images for rendering in OpenGL?

What is the most efficient way to transfer planar YUVA images for rendering in OpenGL?
Currently I'm using 4 separate textures (Y, U, V, A), which I upload to from 4 separate PBOs each frame. However, it seems to be much more efficient to transfer a lot of data in few textures; e.g. transferring YUV422 to a single packed texture is ~50% faster than transferring the same data to 3 separate (Y, U, V) textures.
One thought I've had is whether I could use 2 array textures, one for (Y, A) and one for (U, V); would that be faster? (A sketch of that layout follows the question below.)
Another alternative I've considered is to convert from planar to packed while copying the data to the PBO for transfer, though this has some CPU overhead.
Any suggestions?
NOTE: dim(Y) == dim(A) && dim(U) == dim(V) && dim(Y) != dim(U).
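A minimal sketch of the two-array-texture idea, which only works because dim(Y) == dim(A) and dim(U) == dim(V); the texture and buffer names, and the assumption that each PBO holds its two planes back to back, are mine:
#include <GL/gl.h>

// texYA and texUV are GL_TEXTURE_2D_ARRAY textures with 2 layers each,
// allocated once (e.g. with glTexStorage3D and a GL_R8 format).
// pboYA holds the Y plane followed by the A plane; pboUV holds U then V.
void uploadYUVA(GLuint texYA, GLuint texUV, GLuint pboYA, GLuint pboUV,
                int lumaW, int lumaH, int chromaW, int chromaH)
{
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pboYA);
    glBindTexture(GL_TEXTURE_2D_ARRAY, texYA);
    glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 0,
                    0, 0, 0,                 // xoffset, yoffset, first layer
                    lumaW, lumaH, 2,         // both layers in one call
                    GL_RED, GL_UNSIGNED_BYTE, nullptr);  // data comes from the PBO

    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pboUV);
    glBindTexture(GL_TEXTURE_2D_ARRAY, texUV);
    glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 0,
                    0, 0, 0,
                    chromaW, chromaH, 2,
                    GL_RED, GL_UNSIGNED_BYTE, nullptr);
}
This reduces the per-frame work to two upload calls instead of four.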
I was wondering how you are generating the textures, i.e. are they generated dynamically or loaded from a file? If they are loaded from a file, I would recommend loading the data as a single RGBA texture and using a fragment shader to process it as YUVA. The data can then be loaded in one go, which should yield substantially better performance.
If you give some more information on how the texture is being used, I should be able to give you a more detailed answer.
EDIT: The way I normally handle YUVA is to render to a texture: use the GPU to convert RGBA to YUVA, then send the result back to the CPU (via glGetTexImage, for example) and handle the resulting data as YUVA (or drop the alpha and use it as YUV).
Regarding the differently sized planes, I wouldn't worry: pack the data as you see fit and read it out as you see fit. You can, for example, fill each channel with zeros in the areas where you have no valid data, or use an odd but memorable value (like a date, e.g. 0.17122012) so you can easily ignore it programmatically, or make the channel-handling code only read particular dimensions depending on the channel it is operating on. The extraneous data is minimal (even less if it is zeros), and the speed gained by letting the GPU handle the data offsets it and still leaves a fast system in place.
Hope that helps a bit more.
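To illustrate the single-texture approach, a hypothetical fragment shader (kept here as a C++ string literal) that samples one RGBA texture whose channels actually hold Y, U, V, A and converts to RGB; the full-range BT.601 coefficients are an assumption about the source material:
// Fragment shader source: treat the sampled RGBA texel as (Y, U, V, A)
// and convert to RGB in the shader.
static const char* kYuvaFragmentShader = R"glsl(
#version 330 core
uniform sampler2D yuvaTex;
in vec2 uv;
out vec4 fragColor;

void main()
{
    vec4 yuva = texture(yuvaTex, uv);
    float y = yuva.r;
    float u = yuva.g - 0.5;
    float v = yuva.b - 0.5;

    fragColor = vec4(y + 1.402 * v,
                     y - 0.344 * u - 0.714 * v,
                     y + 1.772 * u,
                     yuva.a);
}
)glsl";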

Large JPEG/PNG Image Sequence Looping

I have been working on a project involving remotely sensed image processing and image sequence looping. Each resulting image (in JPEG or PNG format) has approximately 8000 * 4000 pixels. Our users usually want to loop an image sequence (more than 50 images) over a region of interest at a time. Thus, I have to extract the required viewing area from each image according to the user's visualization client size. For example, if the user's current client view is 640 * 480, I have to find a 640 * 480 block of data in each original image based on the current x (column) and y (row) coordinates, and remap it to the client view. When the user pans to another viewing area by dragging the mouse, our program must re-load the regional data from each original image as quickly as possible.
I know that neither the JPEG library nor the PNG library has built-in routines for reading a rectangular block of data, such as
long ReadRectangle (long x0, long y0, long x1, long y1, char* RectData);
long ReadInaRectangle (long x0, long y0, short width, short height, char* RectData);
The built-in JPEG decompressor lacks this kind of functionality. I know that the JPEG2000 format has provisions for decompressing a specific area of the image; I'm not entirely sure about JPEG.
Someone suggested that I use CreateFileMapping, MapViewOfFile, and CreateDIBSection to map only the needed bytes of the file into a view. Unlike simple flat binary image formats such as *.raw, *.img, and *.bmp, a JPEG blob contains not only the image data but also a complicated JPEG header, so it is not easy to map a block of pixel data out of a JPEG file.
Someone recommended that I use image tiling or image pyramid techniques to generate sub-images, just like many popular image visualization applications (Google Earth, etc.) and GIS applications (WebGIS, etc.) do.
How can I solve this problem?
Thanks for your help.
Golden Lee
If you're OK with region co-ordinates being multiples of 8, the JPEG library from ijg may be able to help you load partial JPEG images.
You'd want to:
Get all DCT coefficients for the entire image. Here's an example of how to do this. Yes, this will involve entropy decoding of the entire image, but this is the less expensive step of JPEG decoding (IDCT is the most expensive one, and we're avoiding it).
Throw away the blocks that you don't need (each block consists of 8x8 coefficients). You'll have to do this by hand, but since the layout is quite simple (the blocks are in scanline order) it shouldn't be that hard.
Apply the block inverse DCT to each of the remaining blocks. You can probably get IJG to do that for you. If you can't, then you'll have to do your own IDCT and color transform back to [0, 255], because intensities are in [-128, 127] in the world of JPEG.
If all goes well, you'll get your decoded JPEG image. Because of chroma subsampling, the luma and chroma channels may be of different dimensions, and you will have to compensate for this yourself by scaling.
The first two steps are pretty much covered by the links. The fourth one is quite trivial (you can get the type of chroma subsampling using the IJG interface, and scaling -- essentially upsampling -- is easily achieved by using something like OpenCV or rolling your own code). The third one is something I haven't tried yet, but it sounds like it would be possible.
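A rough, untested sketch of the first step with the IJG library (libjpeg); the file name is a placeholder, and only the luma component's first row of blocks is touched, to show how the virtual coefficient arrays are accessed:
#include <cstdio>
#include <jpeglib.h>

// Entropy-decode a JPEG into its 8x8 DCT coefficient blocks without doing
// any IDCT, so unwanted blocks can be discarded before decoding.
void readCoefficients(const char* path)
{
    jpeg_decompress_struct cinfo;
    jpeg_error_mgr jerr;
    cinfo.err = jpeg_std_error(&jerr);
    jpeg_create_decompress(&cinfo);

    FILE* f = std::fopen(path, "rb");
    jpeg_stdio_src(&cinfo, f);
    jpeg_read_header(&cinfo, TRUE);

    // One virtual array of coefficient blocks per component (Y, Cb, Cr).
    jvirt_barray_ptr* coeffs = jpeg_read_coefficients(&cinfo);

    // Blocks are stored in scanline order; fetch block row 0 of the luma plane.
    JBLOCKARRAY rows = (*cinfo.mem->access_virt_barray)(
        (j_common_ptr)&cinfo, coeffs[0], 0, 1, FALSE);
    // rows[0][x] is the 8x8 coefficient block at block column x;
    // keep or drop blocks here before any inverse DCT.
    (void)rows;

    jpeg_finish_decompress(&cinfo);
    jpeg_destroy_decompress(&cinfo);
    std::fclose(f);
}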
It's easy with the gd library. LibGD is an open source code library for the dynamic creation of images on the fly by programmers.