AVFrame buf size calculation - c++

I am having trouble encoding a video with the ffmpeg libraries: I am getting segfaults and/or out-of-bounds memory writes when I write raw video data to an AVFrame. I therefore wanted to ask whether one of my assumptions is right.
Am I right to assume that the size of AVFrame.data[i] is always equal to AVFrame.linesize[i]*AVFrame.Height? Or are there scenarios where that is not the case, and if so, how can I reliably calculate the size of AVFrame.data[i]?

It works in most cases, but I wouldn't rely on it for every format:
AVFrame.linesize[i] = AVFrame.Width * PixelSize (where PixelSize is e.g. 4 bytes for RGBA)
BufferSize = AVFrame.linesize[i] * AVFrame.Height
The most reliable way is to use FFmpeg's own av_image_get_buffer_size:
int buffer_size = av_image_get_buffer_size(AV_PIX_FMT_RGBA, codecCtx->width, codecCtx->height, 1);

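It depends on the pixel format. For planar YUV 4:4:4, yes, every plane is linesize*height (using that plane's own linesize). The same holds for 4:2:2, where the chroma planes are subsampled only horizontally and keep the full height. But for 4:2:0 it is linesize*height for the Y plane and linesize*height/2 for the U and V planes.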
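The per-plane sizes can be sketched as below. This is only an illustration: plane_rows and plane_bytes are hypothetical helpers, not FFmpeg functions, and they cover just the three planes of a planar YUV frame with a positive linesize. The vertical shift passed in is assumed to be the log2_chroma_h value that FFmpeg exposes in AVPixFmtDescriptor (0 for 4:4:4 and 4:2:2, 1 for 4:2:0).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of FFmpeg): number of rows stored in plane
// 0 (Y), 1 (U) or 2 (V) of a planar YUV frame. log2_chroma_h is the vertical
// chroma shift: 0 for 4:4:4 and 4:2:2, 1 for 4:2:0.
int plane_rows(int height, int log2_chroma_h, int plane) {
    assert(height > 0 && log2_chroma_h >= 0);
    assert(plane >= 0 && plane <= 2);
    if (plane == 0) {
        return height;  // the luma plane is never subsampled
    }
    // Round up so that an odd height still gets its last chroma row.
    return (height + (1 << log2_chroma_h) - 1) >> log2_chroma_h;
}

// Smallest number of bytes that plane must hold, given its own linesize.
std::size_t plane_bytes(int linesize, int height, int log2_chroma_h, int plane) {
    assert(linesize > 0);
    return static_cast<std::size_t>(linesize) *
           static_cast<std::size_t>(plane_rows(height, log2_chroma_h, plane));
}
```

For a 640x480 4:2:0 frame with linesizes 640/320/320 this gives 640*480 bytes for Y and 320*240 bytes for each of U and V. A buffer allocated by FFmpeg may be larger because of padding, so treat these numbers as the amount that is safe to write, not as the allocation size.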

Related

SDL putting lots of pixel data onto the screen

I am creating a program that allows you to view fractals like the Mandelbrot or Julia set. I would like to render them as quickly as possible. I would love a way to put an array of uint8_t pixel values onto the screen. The array is formatted like this...
{r0,g0,b0,r1,g1,b1,...}
(A one-dimensional array of RGB color values)
I know I have the proper data because before I just set individual points and it worked...
for(int i = 0;i < height * width;++i) {
//setStroke and point are functions that I made that together just draw a colored point
r.setStroke(data[i*3],data[i*3+1],data[i*3+2]);
r.point(i % r.window.w,i / r.window.w);
}
This is a pretty slow operation, especially if the screen is big (which I would like it to be).
Is there any faster way to just put all the data onto the screen?
I tried doing something like this:
void* pixels;
int pitch;
SDL_Texture* img = SDL_CreateTexture(ren,
SDL_GetWindowPixelFormat(win),SDL_TEXTUREACCESS_STREAMING,window.w,window.h);
SDL_LockTexture(img, NULL, &pixels, &pitch);
memcpy(pixels, data, window.w * 3 * window.h);
SDL_UnlockTexture(img);
SDL_RenderCopy(ren,img,NULL,NULL);
SDL_DestroyTexture(img);
I have no idea what I'm doing so please have mercy
Edit (thank you for comments :))
So here is what I do now
SDL_Texture* img = SDL_CreateTexture(ren, SDL_PIXELFORMAT_RGB888,SDL_TEXTUREACCESS_STREAMING,window.w,window.h);
SDL_UpdateTexture(img,NULL,&data[0],window.w * 3);
SDL_RenderCopy(ren,img,NULL,NULL);
SDL_DestroyTexture(img);
But I get this image, which is not what it should look like.
I am thinking that my data is just formatted wrong; right now it is an array of uint8_t in RGB order. Is there another way I should be formatting it? (Note that I do not need an alpha channel.)

Unable to create image from compressed texture data (S3TC)

I've been trying to load compressed images with S3TC (BC/DXT) compression in Vulkan, but so far I haven't had much luck.
Here is what the Vulkan specification says about compressed images:
https://www.khronos.org/registry/dataformat/specs/1.1/dataformat.1.1.html#S3TC:
Compressed texture images stored using the S3TC compressed image formats are represented as a collection of 4×4 texel blocks, where each block contains 64 or 128 bits of texel data. The image is encoded as a normal 2D raster image in which each 4×4 block is treated as a single pixel.
https://www.khronos.org/registry/vulkan/specs/1.0/xhtml/vkspec.html#resources-images:
For images created with linear tiling, rowPitch, arrayPitch and depthPitch describe the layout of the subresource in linear memory. For uncompressed formats, rowPitch is the number of bytes between texels with the same x coordinate in adjacent rows (y coordinates differ by one). arrayPitch is the number of bytes between texels with the same x and y coordinate in adjacent array layers of the image (array layer values differ by one). depthPitch is the number of bytes between texels with the same x and y coordinate in adjacent slices of a 3D image (z coordinates differ by one). Expressed as an addressing formula, the starting byte of a texel in the subresource has address:
// (x,y,z,layer) are in texel coordinates
address(x,y,z,layer) = layer*arrayPitch + z*depthPitch + y*rowPitch + x*texelSize + offset
For compressed formats, the rowPitch is the number of bytes between compressed blocks in adjacent rows. arrayPitch is the number of bytes between blocks in adjacent array layers. depthPitch is the number of bytes between blocks in adjacent slices of a 3D image.
// (x,y,z,layer) are in block coordinates
address(x,y,z,layer) = layer*arrayPitch + z*depthPitch + y*rowPitch + x*blockSize + offset;
arrayPitch is undefined for images that were not created as arrays. depthPitch is defined only for 3D images.
For color formats, the aspectMask member of VkImageSubresource must be VK_IMAGE_ASPECT_COLOR_BIT. For depth/stencil formats, aspect must be either VK_IMAGE_ASPECT_DEPTH_BIT or VK_IMAGE_ASPECT_STENCIL_BIT. On implementations that store depth and stencil aspects separately, querying each of these subresource layouts will return a different offset and size representing the region of memory used for that aspect. On implementations that store depth and stencil aspects interleaved, the same offset and size are returned and represent the interleaved memory allocation.
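The block-coordinate addressing rule quoted above can be written out as a small function. This is a sketch only: Pitches is a hypothetical stand-in for the pitch and offset fields reported by vkGetImageSubresourceLayout, and the block size check assumes an S3TC format (64 or 128 bits per 4x4 block, as the quoted text states).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the fields of VkSubresourceLayout used here.
struct Pitches {
    std::uint64_t offset;
    std::uint64_t rowPitch;
    std::uint64_t arrayPitch;
    std::uint64_t depthPitch;
};

// Byte address of the compressed block at block coordinates (x, y, z, layer).
std::uint64_t block_address(const Pitches& p, std::uint64_t x, std::uint64_t y,
                            std::uint64_t z, std::uint64_t layer,
                            std::uint64_t blockSize) {
    assert(blockSize == 8 || blockSize == 16);  // S3TC: 64 or 128 bits per block
    return layer * p.arrayPitch + z * p.depthPitch + y * p.rowPitch +
           x * blockSize + p.offset;
}

// Convert a texel coordinate to a block coordinate for 4x4 block formats.
std::uint64_t texel_to_block(std::uint64_t texel) {
    return texel / 4;
}
```

For a plain 2D image, z and layer are 0, so only rowPitch, blockSize and offset contribute.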
My image is a normal 2D image (0 layers, 1 mipmap), so there's no arrayPitch or depthPitch. Since S3TC compression is directly supported by the hardware, it should be possible to use the image data without decompressing it first. In OpenGL this can be done using glCompressedTexImage2D, and this has worked for me in the past.
In OpenGL I've used GL_COMPRESSED_RGBA_S3TC_DXT1_EXT as image format, for Vulkan I'm using VK_FORMAT_BC1_RGBA_UNORM_BLOCK, which should be equivalent.
Here's my code for mapping the image data:
auto dds = load_dds("img.dds");
auto *srcData = static_cast<uint8_t*>(dds.data());
auto *destData = static_cast<uint8_t*>(vkImageMapPtr); // Pointer to mapped memory of VkImage
destData += layout.offset(); // layout = vk::SubresourceLayout of the image
assert((w %4) == 0);
assert((h %4) == 0);
assert(blockSize == 8); // S3TC BC1
auto wBlocks = w /4;
auto hBlocks = h /4;
for(auto y=decltype(hBlocks){0};y<hBlocks;++y)
{
auto *rowDest = destData +y *layout.rowPitch(); // rowPitch is 0
auto *rowSrc = srcData +y *(wBlocks *blockSize);
for(auto x=decltype(wBlocks){0};x<wBlocks;++x)
{
auto *pxDest = rowDest +x *blockSize;
auto *pxSrc = rowSrc +x *blockSize; // 4x4 image block
memcpy(pxDest,pxSrc,blockSize); // 64Bit per block
}
}
And here's the code for initializing the image:
vk::Device device = ...; // Initialization
vk::AllocationCallbacks allocatorCallbacks = ...; // Initialization
[...] // Load the dds data
uint32_t width = dds.width();
uint32_t height = dds.height();
auto format = dds.format(); // = vk::Format::eBc1RgbaUnormBlock;
vk::Extent3D extent(width,height,1);
vk::ImageCreateInfo imageInfo(
vk::ImageCreateFlagBits(0),
vk::ImageType::e2D,format,
extent,1,1,
vk::SampleCountFlagBits::e1,
vk::ImageTiling::eLinear,
vk::ImageUsageFlagBits::eSampled | vk::ImageUsageFlagBits::eColorAttachment,
vk::SharingMode::eExclusive,
0,nullptr,
vk::ImageLayout::eUndefined
);
vk::Image img = nullptr;
device.createImage(&imageInfo,&allocatorCallbacks,&img);
vk::MemoryRequirements memRequirements;
device.getImageMemoryRequirements(img,&memRequirements);
uint32_t typeIndex = 0;
get_memory_type(memRequirements.memoryTypeBits(),vk::MemoryPropertyFlagBits::eHostVisible,typeIndex); // -> typeIndex is set to 1
auto szMem = memRequirements.size();
vk::MemoryAllocateInfo memAlloc(szMem,typeIndex);
vk::DeviceMemory mem;
device.allocateMemory(&memAlloc,&allocatorCallbacks,&mem); // Note: Using the default allocation (nullptr) doesn't change anything
device.bindImageMemory(img,mem,0);
uint32_t mipLevel = 0;
vk::ImageSubresource resource(
vk::ImageAspectFlagBits::eColor,
mipLevel,
0
);
vk::SubresourceLayout layout;
device.getImageSubresourceLayout(img,&resource,&layout);
auto *srcData = device.mapMemory(mem,0,szMem,vk::MemoryMapFlagBits(0));
[...] // Map the dds-data (See code from first post)
device.unmapMemory(mem);
The code runs without issues, however the resulting image isn't correct. This is the source image:
And this is the result:
I'm certain that the problem lies in the first code snippet I've posted; however, in case it doesn't, I've written a small adaptation of the triangle demo from the Vulkan SDK which produces the same result. It can be downloaded here. The source code is included; all I've changed from the triangle demo are the "demo_prepare_texture_image" function in tri.c (lines 803 to 903) and the "dds.cpp" and "dds.h" files. "dds.cpp" contains the code for loading the dds and mapping the image memory.
I'm using gli to load the dds-data (Which is supposed to "work perfectly with Vulkan"), which is also included in the download above. To build the project, the Vulkan SDK include directory has to be added to the "tri" project, and the path to the dds has to be changed (tri.c, Line 809).
The source image ("x64/Debug/test.dds" in the project) uses DXT1 compression. I've tested it on different hardware as well, with the same result.
Any example code for initializing/mapping compressed images would also help a lot.
Your problem is actually quite simple - in the demo_prepare_textures function, the first line, there is a variable tex_format, which is set to VK_FORMAT_B8G8R8A8_UNORM (which is what it is in the original sample). This eventually gets used to create the VkImageView. If you just change this to VK_FORMAT_BC1_RGBA_UNORM_BLOCK, it displays the texture correctly on the triangle.
As an aside, you can verify that your texture loaded correctly with RenderDoc, which comes with the Vulkan SDK installation. If you take a capture and look in the Texture Viewer, the Inputs tab shows that your texture looks identical to the one on disk, even with the incorrect format.

How to use a .raw file in opengl

I'm trying to read a .raw image format and do some modifications on it in OpenGL. I can read the image like this:
int width, height;
BYTE * data;
FILE * file;
file = fopen( filename, "rb" );
if ( file == NULL ) return 0;
width = 256;
height = 256;
data = malloc( width * height * 3 );
fread( data, width * height * 3, 1, file );
fclose( file );
But I don't know how to use glDrawPixels to draw the picture.
My second problem is that I don't know how to access each pixel. In a .raw image, each pixel should have 3 integers storing its RGB values (am I right?). How can I access these RGB values directly?
There's no such thing as a .raw in the hard and fast sense. The name implies image data with no header but doesn't specify the format of the data. RGB is likely but so is RGBA and it's trivial to think of almost endless other possibilities.
Assuming RGB ordering, one byte per channel, then: each pixel is three bytes wide. So the nth pixel is:
r = data[n*3 + 0]
g = data[n*3 + 1]
b = data[n*3 + 2]
Assuming the data is set out so that the pixels are stored in left-to-right order, line by line, then on the first line the pixel at x=3 is at n=3, on the second it's at n=(width of first line)+3, on the third it's at n=(combined width of first two lines)+3, etc.
So:
r = data[(x + y*width)*3 + 0]
g = data[(x + y*width)*3 + 1]
b = data[(x + y*width)*3 + 2]
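The indexing above can be wrapped in a small accessor. This is a minimal sketch in C++ (the question's code is C), assuming tightly packed 8-bit RGB with no row padding; Rgb and pixel_at are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Rgb {
    std::uint8_t r, g, b;
};

// Hypothetical helper: read pixel (x, y) from tightly packed 8-bit RGB data
// stored left to right, line by line, with no header and no row padding.
Rgb pixel_at(const std::uint8_t* data, int width, int height, int x, int y) {
    assert(x >= 0 && x < width && y >= 0 && y < height);
    const std::size_t n = static_cast<std::size_t>(y) *
                          static_cast<std::size_t>(width) +
                          static_cast<std::size_t>(x);
    return Rgb{data[n * 3 + 0], data[n * 3 + 1], data[n * 3 + 2]};
}
```

With the 256x256 buffer from the question, pixel_at(data, 256, 256, 3, 1) would read the fourth pixel of the second line.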
To use glDrawPixels just follow what the manual tells you to specify as the parameters. It says:
void glDrawPixels( GLsizei width,
GLsizei height,
GLenum format,
GLenum type,
const GLvoid * data);
You say that width and height are 256. You've said that the format is RGB. Scan down the documentation and you'll see that the corresponding GLenum is GL_RGB. You're saying each channel is a single byte in size. So that's GL_UNSIGNED_BYTE. You've loaded the data to data. So:
glDrawPixels(256, 256, GL_RGB, GL_UNSIGNED_BYTE, data);
Further comments: obviously get this working first so you've something to build on but glDrawPixels is almost unused in practice. As a result it isn't even part of OpenGL ES or, correspondingly, WebGL. Look at the semantics of the thing. You supply your buffer every time you call. OpenGL can't know whether it has been modified since the last call. So every call transfers your data from CPU to GPU. Look into submitting your data once as a texture and drawing using geometry. That'll save the per-call transfer cost and therefore be a lot more efficient.

Setting individual pixels of an RGB frame for ffmpeg encoding

I'm trying to change the test pattern of an ffmpeg streamer (from the question "Trouble syncing libavformat/ffmpeg with x264 and RTP") into the familiar RGB format. My broader goal is to compute frames of a streamed video on the fly.
So I replaced its AV_PIX_FMT_MONOWHITE with AV_PIX_FMT_RGB24, which is "packed RGB 8:8:8, 24bpp, RGBRGB..." according to http://libav.org/doxygen/master/pixfmt_8h.html.
To stuff its pixel array called data, I've tried many variations on
for (int y=0; y<HEIGHT; ++y) {
for (int x=0; x<WIDTH; ++x) {
uint8_t* rgb = data + ((y*WIDTH + x) *3);
const double i = x/double(WIDTH);
// const double j = y/double(HEIGHT);
rgb[0] = 255*i;
rgb[1] = 0;
rgb[2] = 255*(1-i);
}
}
At HEIGHTxWIDTH = 80x60, this version yields a 4-column pattern with horizontal stripes (image not shown), when I expect a single blue-to-red horizontal gradient.
640x480 yields the same 4-column pattern, but with far more horizontal stripes.
640x640, 160x160, etc, yield three columns, cyan-ish / magenta-ish / yellow-ish, with the same kind of horizontal stripiness.
Vertical gradients behave even more weirdly.
Appearance was unaffected by an AV_PIX_FMT_RGBA attempt (4 not 3 bytes per pixel, alpha=255). Also unaffected by a port from C to C++.
The argument srcStrides passed to sws_scale() is a length-1 array, containing the single int HEIGHT.
Access each Pixel of AVFrame asks the same question in less detail, so far unanswered.
The streamer emits one warning, which I doubt affects appearance:
[rtp # 0x269c0a0] Encoder did not produce proper pts, making some up.
So. How do you set the RGB value of a pixel in a frame to be sent to sws_scale() (and then to x264_encoder_encode() and av_interleaved_write_frame())?
Use avpicture_fill() as described in "Encoding a screenshot into a video using FFMPEG".
Instead of passing data directly to sws_scale(), do this:
AVFrame* pic = avcodec_alloc_frame();
avpicture_fill((AVPicture *)pic, data, AV_PIX_FMT_RGB24, WIDTH, HEIGHT);
and then replace the 2nd and 3rd args of sws_scale() with
pic->data, pic->linesize,
Then the gradients above work properly, at many resolutions.
"The argument srcStrides passed to sws_scale() is a length-1 array, containing the single int HEIGHT."
Stride (AKA linesize) is the distance in bytes between two lines. For various reasons having mostly to do with optimization it is often larger than simply width in bytes, so there is padding on the end of each line.
In your case, without any padding, stride should be width * 3.
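A stride-aware version of the gradient loop might look like the sketch below. fill_gradient is a hypothetical helper, not an FFmpeg function, and it uses integer arithmetic as an approximation of the question's floating-point gradient. The point is that each row starts at y * stride bytes, where stride is at least width * 3 for RGB24.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: write a blue-to-red horizontal gradient into a packed
// RGB24 buffer. stride is the distance in bytes between the starts of two
// consecutive rows; it equals width * 3 when there is no padding.
void fill_gradient(std::uint8_t* data, int width, int height, int stride) {
    assert(width > 0 && height > 0 && stride >= width * 3);
    for (int y = 0; y < height; ++y) {
        std::uint8_t* row = data + static_cast<std::size_t>(y) *
                                   static_cast<std::size_t>(stride);
        for (int x = 0; x < width; ++x) {
            std::uint8_t* rgb = row + x * 3;
            const int red = 255 * x / width;  // 0 at the left edge
            rgb[0] = static_cast<std::uint8_t>(red);
            rgb[1] = 0;
            rgb[2] = static_cast<std::uint8_t>(255 - red);
        }
    }
}
```

The one-element stride array handed to sws_scale() would then hold this same stride value (WIDTH * 3 for an unpadded buffer), not HEIGHT.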

Trying to read raw image data into Java through JNI

I'm using JNI to obtain raw image data in the following format:
The image data is returned in the format of a DATA32 (32 bits) per pixel in a linear array ordered from the top left of the image to the bottom right going from left to right each line. Each pixel has the upper 8 bits as the alpha channel and the lower 8 bits are the blue channel - so a pixel's bits are ARGB (from most to least significant, 8 bits per channel). You must put the data back at some point.
The DATA32 format is essentially an unsigned int in C.
So I obtain an int[] array and then try to create a Buffered Image out of it by
int w = 1920;
int h = 1200;
BufferedImage b = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
int[] f = (new Capture()).capture();
for(int i = 0; i < f.length; i++){;
b.setRGB(x, y, f[i]);
}
f is the array with the pixel data.
According to the Java documentation this should work since BufferedImage.TYPE_INT_ARGB is:
Represents an image with 8-bit RGBA color components packed into integer pixels. The image has a DirectColorModel with alpha. The color data in this image is considered not to be premultiplied with alpha. When this type is used as the imageType argument to a BufferedImage constructor, the created image is consistent with images created in the JDK1.1 and earlier releases.
Unless by 8-bit RGBA they mean that all components added together are encoded in 8 bits? But that is impossible.
This code does work, but the image that is produced is not at all like the image that it should produce. There are tonnes of artifacts. Can anyone see something obviously wrong in here?
Note I obtain my pixel data with
imlib_context_set_image(im);
data = imlib_image_get_data();
in my C code, using the library imlib2 with api http://docs.enlightenment.org/api/imlib2/html/imlib2_8c.html#17817446139a645cc017e9f79124e5a2
I'm an idiot.
This is merely a bug.
I forgot to include how I calculate x,y above.
Basically I was using
int x = i%w;
int y = i/h;
in the for loop, which is wrong. It should be
int x = i%w;
int y = i/w;
Can't believe I made this stupid mistake.