Related
I'm following ffmpeg tutorial in http://dranger.com/ffmpeg/tutorial01.html.
I have just found that avpicture_get_size function is deprecated.
So I have checked ffmpeg's document(https://www.ffmpeg.org/doxygen/3.0/group__lavu__picture.html#ga24a67963c3ae0054a2a4bab35930e694) and found substitute av_image_get_buffer_size.
But I can't understand align parameter meaning 'linesize alignment'......
What is it meaning?
Some parts of FFmpeg, notably libavcodec, require aligned linesizes[], which means that it requires:
assert(linesize[0] % 32 == 0);
assert(linesize[1] % 32 == 0);
assert(linesize[2] % 32 == 0);
This allows it to use fast/aligned SIMD routines (for example SSE2/AVX2 movdqa or vmovdqa instructions) for data access instead of their slower unaligned counterparts.
The align parameter to this av_image_get_buffer_size function is this line alignment, and you need it because the size of the buffer is affected by it. E.g., the size of a Y plane in a YUV buffer isn't actually width * height, it's linesize[0] * height. You'll see that (especially for image sizes that are not a multiple of 16 or 32), as you increase align to higher powers of 2, the return value slowly increases.
Practically speaking, if you're going to use this picture as output buffer for calls to e.g. avcodec_decode_video2, this should be 32. For swscale/avfilter, I believe there is no absolute requirement, but you're recommended to still make it 32.
My practice:
1.avpicture deprecated problem, I replace avpicture functions with AVFrame & imgutils functions. code sample:
//AVPicture _picture;
AVFrame *_pictureFrame;
uint8_t *_pictureFrameData;
...
//_pictureValid = avpicture_alloc(&_picture,
// AV_PIX_FMT_RGB24,
// _videoCodecCtx->width,
// _videoCodecCtx->height) == 0;
_pictureFrame = av_frame_alloc();
_pictureFrame->width = _videoCodecCtx->width;
_pictureFrame->height = _videoCodecCtx->height;
_pictureFrame->format = AV_PIX_FMT_RGB24;
int size = av_image_get_buffer_size(_pictureFrame->format,
_pictureFrame->width,
_pictureFrame->height,
1);
//dont forget to free _pictureFrameData at last
_pictureFrameData = (uint8_t*)av_malloc(size);
av_image_fill_arrays(_pictureFrame->data,
_pictureFrame->linesize,
_pictureFrameData,
_pictureFrame->format,
_pictureFrame->width,
_pictureFrame->height,
1);
...
if (_pictureFrame) {
av_free(_pictureFrame);
if (_pictureFrameData) {
free(_pictureFrameData);
}
}
2.align parameter
first I set align to 32, but for some video streams it did not work, cause distorted images. Then I set it to 16(my environment : mac, Xcode, iPhone6), the some streams works well. But at last i set align to 1, for I have found this
Fill in the AVPicture fields, always assume a linesize alignment of 1.
If you look at the definition of avpicture_get_size in version 3.2 you see the following code:
int avpicture_get_size(enum AVPixelFormat pix_fmt, int width, int height)
{
return av_image_get_buffer_size(pix_fmt, width, height, 1);
}
It simply calls the suggested function: av_image_get_buffer_size with the align parameter set to 1. I did not go further to find out the full significance of why 1 is used for the depreciated function. As usual with ffmpeg, one can probably figure it out by reading the right code and enough code (with some code experiments).
I am writing a C++ code where a sequence of N different frames is generated after performing some operations implemented therein. After each frame is completed, I write it on the disk as IMG_%d.png, and finally I encode them to a video through ffmpeg using the x264 codec.
The summarized pseudocode of the main part of the program is the following one:
std::vector<int> B(width*height*3);
for (i=0; i<N; i++)
{
// void generateframe(std::vector<int> &, int)
generateframe(B, i); // Returns different images for different i values.
sprintf(s, "IMG_%d.png", i+1);
WriteToDisk(B, s); // void WriteToDisk(std::vector<int>, char[])
}
The problem of this implementation is that the number of desired frames, N, is usually high (N~100000) as well as the resolution of the pictures (1920x1080), resulting into an overload of the disk, producing write cycles of dozens of GB after each execution.
In order to avoid this, I have been trying to find documentation about parsing directly each image stored in the vector B to an encoder such as x264 (without having to write the intermediate image files to the disk). Albeit some interesting topics were found, none of them solved specifically what I exactly want to, as many of them concern the execution of the encoder with existing images files on the disk, whilst others provide solutions for other programming languages such as Python (here you can find a fully satisfactory solution for that platform).
The pseudocode of what I would like to obtain is something similar to this:
std::vector<int> B(width*height*3);
video_file=open_video("Generated_Video.mp4", ...[encoder options]...);
for (i=0; i<N; i++)
{
generateframe(B, i+1);
add_frame(video_file, B);
}
video_file.close();
According to what I have read on related topics, the x264 C++ API might be able to do this, but, as stated above, I did not find a satisfactory answer for my specific question. I tried learning and using directly the ffmpeg source code, but both its low ease of use and compilation issues forced me to discard this possibility as a mere non-professional programmer I am (I take it as just as a hobby and unluckily I cannot waste that many time learning something so demanding).
Another possible solution that came to my mind is to find a way to call the ffmpeg binary file in the C++ code, and somehow manage to transfer the image data of each iteration (stored in B) to the encoder, letting the addition of each frame (that is, not "closing" the video file to write) until the last frame, so that more frames can be added until reaching the N-th one, where the video file will be "closed". In other words, call ffmpeg.exe through the C++ program to write the first frame to a video, but make the encoder "wait" for more frames. Then call again ffmpeg to add the second frame and make the encoder "wait" again for more frames, and so on until reaching the last frame, where the video will be finished. However, I do not know how to proceed or if it is actually possible.
Edit 1:
As suggested in the replies, I have been documenting about named pipes and tried to use them in my code. First of all, it should be remarked that I am working with Cygwin, so my named pipes are created as they would be created under Linux. The modified pseudocode I used (including the corresponding system libraries) is the following one:
FILE *fd;
mkfifo("myfifo", 0666);
for (i=0; i<N; i++)
{
fd=fopen("myfifo", "wb");
generateframe(B, i+1);
WriteToPipe(B, fd); // void WriteToPipe(std::vector<int>, FILE *&fd)
fflush(fd);
fd=fclose("myfifo");
}
unlink("myfifo");
WriteToPipe is a slight modification of the previous WriteToFile function, where I made sure that the write buffer to send the image data is small enough to fit the pipe buffering limitations.
Then I compile and write the following command in the Cygwin terminal:
./myprogram | ffmpeg -i pipe:myfifo -c:v libx264 -preset slow -crf 20 Video.mp4
However, it remains stuck at the loop when i=0 at the "fopen" line (that is, the first fopen call). If I had not called ffmpeg it would be natural as the server (my program) would be waiting for a client program to connect to the "other side" of the pipe, but it is not the case. It looks like they cannot be connected through the pipe somehow, but I have not been able to find further documentation in order to overcome this issue. Any suggestion?
After some intense struggle, I finally managed to make it work after learning a bit how to use the FFmpeg and libx264 C APIs for my specific purpose, thanks to the useful information that some users provided in this site and some others, as well as some FFmpeg's documentation examples. For the sake of illustration, the details will be presented next.
First of all, the libx264 C library was compiled and, after that, the FFmpeg one with the configure options --enable-gpl --enable-libx264. Now let us go to the coding. The relevant part of the code that achieved the requested purpose is the following one:
Includes:
#include <stdint.h>
extern "C"{
#include <x264.h>
#include <libswscale/swscale.h>
#include <libavcodec/avcodec.h>
#include <libavutil/mathematics.h>
#include <libavformat/avformat.h>
#include <libavutil/opt.h>
}
LDFLAGS on Makefile:
-lx264 -lswscale -lavutil -lavformat -lavcodec
Inner code (for the sake of simplicity, the error checkings will be omitted and the variable declarations will be done when needed instead of the beginning for better understanding):
av_register_all(); // Loads the whole database of available codecs and formats.
struct SwsContext* convertCtx = sws_getContext(width, height, AV_PIX_FMT_RGB24, width, height, AV_PIX_FMT_YUV420P, SWS_FAST_BILINEAR, NULL, NULL, NULL); // Preparing to convert my generated RGB images to YUV frames.
// Preparing the data concerning the format and codec in order to write properly the header, frame data and end of file.
char *fmtext="mp4";
char *filename;
sprintf(filename, "GeneratedVideo.%s", fmtext);
AVOutputFormat * fmt = av_guess_format(fmtext, NULL, NULL);
AVFormatContext *oc = NULL;
avformat_alloc_output_context2(&oc, NULL, NULL, filename);
AVStream * stream = avformat_new_stream(oc, 0);
AVCodec *codec=NULL;
AVCodecContext *c= NULL;
int ret;
codec = avcodec_find_encoder_by_name("libx264");
// Setting up the codec:
av_dict_set( &opt, "preset", "slow", 0 );
av_dict_set( &opt, "crf", "20", 0 );
avcodec_get_context_defaults3(stream->codec, codec);
c=avcodec_alloc_context3(codec);
c->width = width;
c->height = height;
c->pix_fmt = AV_PIX_FMT_YUV420P;
// Setting up the format, its stream(s), linking with the codec(s) and write the header:
if (oc->oformat->flags & AVFMT_GLOBALHEADER) // Some formats require a global header.
c->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
avcodec_open2( c, codec, &opt );
av_dict_free(&opt);
stream->time_base=(AVRational){1, 25};
stream->codec=c; // Once the codec is set up, we need to let the container know which codec are the streams using, in this case the only (video) stream.
av_dump_format(oc, 0, filename, 1);
avio_open(&oc->pb, filename, AVIO_FLAG_WRITE);
ret=avformat_write_header(oc, &opt);
av_dict_free(&opt);
// Preparing the containers of the frame data:
AVFrame *rgbpic, *yuvpic;
// Allocating memory for each RGB frame, which will be lately converted to YUV:
rgbpic=av_frame_alloc();
rgbpic->format=AV_PIX_FMT_RGB24;
rgbpic->width=width;
rgbpic->height=height;
ret=av_frame_get_buffer(rgbpic, 1);
// Allocating memory for each conversion output YUV frame:
yuvpic=av_frame_alloc();
yuvpic->format=AV_PIX_FMT_YUV420P;
yuvpic->width=width;
yuvpic->height=height;
ret=av_frame_get_buffer(yuvpic, 1);
// After the format, code and general frame data is set, we write the video in the frame generation loop:
// std::vector<uint8_t> B(width*height*3);
The above commented vector has the same structure than the one I exposed in my question; however, the RGB data is stored on the AVFrames in a specific way. Therefore, for the sake of exposition, let us assume we have instead a pointer to a structure of the form uint8_t[3] Matrix(int, int), whose way to access the color values of the pixels for a given coordinate (x, y) is Matrix(x, y)->Red, Matrix(x, y)->Green and Matrix(x, y)->Blue, in order to get, respectively, to the red, green and blue values of the coordinate (x, y). The first argument stands for the horizontal position, from left to right as x increases and the second one for the vertical position, from top to bottom as y increases.
Being that said, the for loop to transfer the data, encode and write each frame would be the following one:
Matrix B(width, height);
int got_output;
AVPacket pkt;
for (i=0; i<N; i++)
{
generateframe(B, i); // This one is the function that generates a different frame for each i.
// The AVFrame data will be stored as RGBRGBRGB... row-wise, from left to right and from top to bottom, hence we have to proceed as follows:
for (y=0; y<height; y++)
{
for (x=0; x<width; x++)
{
// rgbpic->linesize[0] is equal to width.
rgbpic->data[0][y*rgbpic->linesize[0]+3*x]=B(x, y)->Red;
rgbpic->data[0][y*rgbpic->linesize[0]+3*x+1]=B(x, y)->Green;
rgbpic->data[0][y*rgbpic->linesize[0]+3*x+2]=B(x, y)->Blue;
}
}
sws_scale(convertCtx, rgbpic->data, rgbpic->linesize, 0, height, yuvpic->data, yuvpic->linesize); // Not actually scaling anything, but just converting the RGB data to YUV and store it in yuvpic.
av_init_packet(&pkt);
pkt.data = NULL;
pkt.size = 0;
yuvpic->pts = i; // The PTS of the frame are just in a reference unit, unrelated to the format we are using. We set them, for instance, as the corresponding frame number.
ret=avcodec_encode_video2(c, &pkt, yuvpic, &got_output);
if (got_output)
{
fflush(stdout);
av_packet_rescale_ts(&pkt, (AVRational){1, 25}, stream->time_base); // We set the packet PTS and DTS taking in the account our FPS (second argument) and the time base that our selected format uses (third argument).
pkt.stream_index = stream->index;
printf("Write frame %6d (size=%6d)\n", i, pkt.size);
av_interleaved_write_frame(oc, &pkt); // Write the encoded frame to the mp4 file.
av_packet_unref(&pkt);
}
}
// Writing the delayed frames:
for (got_output = 1; got_output; i++) {
ret = avcodec_encode_video2(c, &pkt, NULL, &got_output);
if (got_output) {
fflush(stdout);
av_packet_rescale_ts(&pkt, (AVRational){1, 25}, stream->time_base);
pkt.stream_index = stream->index;
printf("Write frame %6d (size=%6d)\n", i, pkt.size);
av_interleaved_write_frame(oc, &pkt);
av_packet_unref(&pkt);
}
}
av_write_trailer(oc); // Writing the end of the file.
if (!(fmt->flags & AVFMT_NOFILE))
avio_closep(oc->pb); // Closing the file.
avcodec_close(stream->codec);
// Freeing all the allocated memory:
sws_freeContext(convertCtx);
av_frame_free(&rgbpic);
av_frame_free(&yuvpic);
avformat_free_context(oc);
Side notes:
For future reference, as the available information on the net concerning the time stamps (PTS/DTS) looks so confusing, I will next explain as well how I did manage to solve the issues by setting the proper values. Setting these values incorrectly caused that the output size was being much bigger than the one obtained through the ffmpeg built binary commandline tool, because the frame data was being redundantly written through smaller time intervals than the actually set by the FPS.
First of all, it should be remarked that when encoding there are two kinds of time stamps: one associated to the frame (PTS) (pre-encoding stage) and two associated to the packet (PTS and DTS) (post-encoding stage). In the first case, it looks like the frame PTS values can be assigned using a custom unit of reference (with the only restriction that they must be equally spaced if one wants constant FPS), so one can take for instance the frame number as we did in the above code. In the second one, we have to take into account the following parameters:
The time base of the output format container, in our case mp4 (=12800 Hz), whose information is held in stream->time_base.
The desired FPS of the video.
If the encoder generates B-frames or not (in the second case the PTS and DTS values for the frame must be set the same, but it is more complicated if we are in the first case, like in this example). See this answer to another related question for more references.
The key here is that luckily it is not necessary to struggle with the computation of these quantities, as libav provides a function to compute the correct time stamps associated to the packet by knowing the aforementioned data:
av_packet_rescale_ts(AVPacket *pkt, AVRational FPS, AVRational time_base)
Thanks to these considerations, I was finally able to generate a sane output container and essentially the same compression rate than the one obtained using the commandline tool, which were the two remaining issues before investigating more deeply how the format header and trailer and how the time stamps are properly set.
Thanks for your excellent work, #ksb496 !
One minor improvement:
c=avcodec_alloc_context3(codec);
should be better written as:
c = stream->codec;
to avoid a memory leak.
If you don't mind, I've uploaded the complete ready-to-deploy library onto GitHub: https://github.com/apc-llc/moviemaker-cpp.git
Thanks to ksb496 I managed to do this task, but in my case I need to change some codes to work as expected. I thought maybe it could help others so I decided to share (with two years delay :D).
I had an RGB buffer filled by directshow sample grabber that I needed to take a video from. RGB to YUV conversion from given answer didn't do the job for me. I did it like this :
int stride = m_width * 3;
int index = 0;
for (int y = 0; y < m_height; y++) {
for (int x = 0; x < stride; x++) {
int j = (size - ((y + 1)*stride)) + x;
m_rgbpic->data[0][j] = data[index];
++index;
}
}
data variable here is my RGB buffer (simple BYTE*) and size is data buffer size in bytes. It's start filling RGB AVFrame from bottom left to top right.
The other thing is that my version of FFMPEG didn't have av_packet_rescale_ts function. It's latest version but FFMPEG docs didn't say this function is deprecated anywhere, I guess this might be the case for windows only. Anyway I used av_rescale_q instead that does the same job. like this :
AVPacket pkt;
pkt.pts = av_rescale_q(pkt.pts, { 1, 25 }, m_stream->time_base);
And the last thing, using this format conversion I needed to change my swsContext to BGR24 instead of RGB24 like this :
m_convert_ctx = sws_getContext(width, height, AV_PIX_FMT_BGR24, width, height,
AV_PIX_FMT_YUV420P, SWS_FAST_BILINEAR, nullptr, nullptr, nullptr);
avcodec_encode_video2 & avcodec_encode_audio2 seems to be deprecated. FFmpeg of Current Version (4.2) has new API: avcodec_send_frame & avcodec_receive_packet.
I am currently developing a little screenshot application which records both of my screen's desktop in a file.
I am using the GetFrontBufferData() function and it is working great.
Unfortunately when changing the screen color depth from 32 to 16 bits (to perform some tests) I have a bad image (purple image with changed resolution) and the recorded screenshot has a very poor quality:
Does someone know if there is a way to use GetFrontBufferData() with a 16 bits screen ?
edit:
My init direct3D:
ZeroMemory(&d3dPresentationParameters,sizeof(D3DPRESENT_PARAMETERS));//Fills a block of memory with zeros.
d3dPresentationParameters.Windowed = TRUE;
d3dPresentationParameters.Flags = D3DPRESENTFLAG_LOCKABLE_BACKBUFFER;
d3dPresentationParameters.BackBufferFormat = d3dFormat;//d3dDisplayMode.Format;//D3DFMT_A8R8G8B8;
d3dPresentationParameters.BackBufferCount = 1;
d3dPresentationParameters.BackBufferHeight = gScreenRect.bottom = uiHeight;
d3dPresentationParameters.BackBufferWidth = gScreenRect.right = uiWidth;
d3dPresentationParameters.MultiSampleType = D3DMULTISAMPLE_NONE;
d3dPresentationParameters.MultiSampleQuality = 0;
d3dPresentationParameters.SwapEffect = D3DSWAPEFFECT_DISCARD;
d3dPresentationParameters.hDeviceWindow = hWnd;
d3dPresentationParameters.PresentationInterval = D3DPRESENT_INTERVAL_DEFAULT;
d3dPresentationParameters.FullScreen_RefreshRateInHz = D3DPRESENT_RATE_DEFAULT;
The thread I use to capture screenshots:
CreateOffscreenPlainSurface(uiWidth, uiHeight, D3DFMT_A8R8G8B8, D3DPOOL_SYSTEMMEM, pBackBuffer, NULL)) != D3D_OK )
{
DBG("Error: CreateOffscreenPlainSurface failed = 0x%x", iRes);
break;
}
GetFrontBufferData(0, pCaptureSurface)) != D3D_OK)
{
DBG("Error: GetFrontBufferData failed = 0x%x", iRes);
break;
}
//D3DXSaveSurfaceToFile("Desktop.bmp", D3DXIFF_BMP, pBackBuffer,NULL, NULL); //Test purposes
ZeroMemory(lockedRect, sizeof(D3DLOCKED_RECT));
LockRect(lockedRect, NULL, D3DLOCK_READONLY)) != D3D_OK )
{
DBG("Error: LockRect failed = 0x%x", iRes);
break;
}
if( (iRes = UnlockRect()) != D3D_OK )
{
DBG("Error: UnlockRect failed = 0x%x", iRes);
break;
}
/**/
This code is perfectly working with 32 bits color depth but not with 16bits.
When creating the device I create 2 devices for both screens (iScreenNber). This is also working in 32bits (not in 16).
When saving the captured screenshot into 2 bmp files for testing (in 16 bits), I have one screen which represents the main display perfectly and the other screen is black.
When using memcpy to use pData, I have the above screenshot with purple color and bad resolution
edit2:
I noticed the following:
When saving Offscreen surface to a BMP file, I get the main display (on 1.bmp) which is refreshed each frame (so it is working just fine). For the second display, I just get the first frame then nothing more.
Quoting MSDN for GetFrontBufferData "The buffer pointed to by pDestSurface will be filled with a representation of the front buffer, converted to the standard 32 bits per pixel format D3DFMT_A8R8G8B8." I guess this is a problem for 16 bits color depth.
The first problem comes from the memcpy which does not handle properly the 16 bits color depth and I still don't know why ----> Help needed for this !!
Second problem is the second display which is not working and I don't why either
What am I doing wrong here ? I just get a black image on my Desktop N°xx.bmp file
Thank you very much for your help.
This is how I create a surface to capture screenshots:
IDirect3DSurface9* pCaptureSurface = NULL;
HRESULT hr = pD3DDevice->CreateOffscreenPlainSurface(
D3DPresentParams.BackBufferWidth,
D3DPresentParams.BackBufferHeight,
D3DPresentParams.BackBufferFormat,
D3DPOOL_SYSTEMMEM,
&pCaptureSurface,
NULL);
pD3DDevice->GetFrontBufferData(0, pCaptureSurface);
If you didn't store D3DPresentParams anywhere, you can use IDirect3DDevice9::GetDisplayMode to obtain width, height and format of your swap chain. All operations of resizing and format conversion you can perform after capturing a front buffer. Also, as I know, display format doesn't support alpha channel, so it typically is D3DFMT_X8R8G8B8, not D3DFMT_A8R8G8B8.
Update:
Actually, you try to capture a whole screen by using d3d device, without rendering anything. A purpose of d3d/opengl is to create or process images and do it GPU-accelerated. Taking a screenshot is just copying some video memory, it doesn't use all GPU power. So, using any GPU API brings no significant gain. Moreover, when you capture front buffer rendered not by yourself, strange things occur, you see. To extend your app you may capture image by GDI and then load it into texture and do any GPU postprocessing.
So i found some answers to my problem.
1) Second monitor wasn't working and I was unable to capture screenshot from it in 16 bits
This comes from the memcpy(..) line in the code. Because I am working with a 16 bits monitor, when executing the memcpy, the surface memory is corrupt and this leads to a black screen.
I still didn't find the solution for this but I'm working on.
2) The colors of the screenshot are wrong
This is, without any surprise, due to the 16 bits color depth. Because I am using GetFrontBufferData, and I am quoting Microsoft: The buffer pointed to by pDestSurface will be filled with a representation of the front buffer, converted to the standard 32 bits per pixel format D3DFMT_A8R8G8B8. This means, if I want to use the pixel data from LockRect(...), I have to "re-convert" my data into 16 bits mode. Therefore, I need to convert my pData data from D3DFMT_A8R8G8B8 to D3DFMT_R5G6B5 which is pretty simple.
3) How to debug the application ?
Thanks to your comments, I've been told that I should analyze pScreeInfo->pData content when I was in 16bits (thanks to Niello). Therefore, I've created a simple method using raw data from pScreeInfo->pData and copying in a .bmp:
HRESULT hr;
DWORD dwBytesRead;
UINT uiSize = 1920 * 1080 * 4;
HANDLE hFile;
hFile = CreateFile(TEXT("data.raw"), GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
BOOL bOk = ReadFile(hFile, pData, uiSize, &dwBytesRead, NULL);
if(!bOk)
exit(0);
pTexture = NULL;
hr = pScreenInfo->g_pD3DDevice->CreateTexture(width, height, 1, 0, D3DFMT_A8R8G8B8, D3DPOOL_MANAGED, &pTexture, NULL);
D3DLOCKED_RECT lockedRect;
hr = pTexture->LockRect(0, &lockedRect, NULL, D3DLOCK_READONLY);
memcpy(lockedRect.pBits, pData, lockedRect.Pitch * height);
hr = pTexture->UnlockRect(0);
hr = D3DXSaveTextureToFile(test, D3DXIFF_BMP, pTexture,NULL);
bOk = CloseHandle(hFile);
SAFE_RELEASE(pTexture);
This piece of code allowed me to notice that pData data was correct and I could get a good .bmp file at the end which means that GetFrontBufferData(...) was correctly working and the problem comes from the memcpy(...)
4) Remaining problems
I am still trying to know how I can solve the memcpy issue to see where the problem comes from. This is the last problem since the colors are good now (thanks to the 32bits to 16 bits conversion)
Thank everybody for your helpful comments !
I'm trying to change the test pattern of an ffmpeg streamer, Trouble syncing libavformat/ffmpeg with x264 and RTP , into familiar RGB format. My broader goal is to compute frames of a streamed video on the fly.
So I replaced its AV_PIX_FMT_MONOWHITE with AV_PIX_FMT_RGB24, which is "packed RGB 8:8:8, 24bpp, RGBRGB..." according to http://libav.org/doxygen/master/pixfmt_8h.html .
To stuff its pixel array called data, I've tried many variations on
for (int y=0; y<HEIGHT; ++y) {
for (int x=0; x<WIDTH; ++x) {
uint8_t* rgb = data + ((y*WIDTH + x) *3);
const double i = x/double(WIDTH);
// const double j = y/double(HEIGHT);
rgb[0] = 255*i;
rgb[1] = 0;
rgb[2] = 255*(1-i);
}
}
At HEIGHTxWIDTH= 80x60, this version yields
, when I expect a single blue-to-red horizontal gradient.
640x480 yields the same 4-column pattern, but with far more horizontal stripes.
640x640, 160x160, etc, yield three columns, cyan-ish / magenta-ish / yellow-ish, with the same kind of horizontal stripiness.
Vertical gradients behave even more weirdly.
Appearance was unaffected by an AV_PIX_FMT_RGBA attempt (4 not 3 bytes per pixel, alpha=255). Also unaffected by a port from C to C++.
The argument srcStrides passed to sws_scale() is a length-1 array, containing the single int HEIGHT.
Access each Pixel of AVFrame asks the same question in less detail, so far unanswered.
The streamer emits one warning, which I doubt affects appearance:
[rtp # 0x269c0a0] Encoder did not produce proper pts, making some up.
So. How do you set the RGB value of a pixel in a frame to be sent to sws_scale() (and then to x264_encoder_encode() and av_interleaved_write_frame())?
Use avpicture_fill() as described in Encoding a screenshot into a video using FFMPEG .
Instead of passing data directly to sws_scale(), do this:
AVFrame* pic = avcodec_alloc_frame();
avpicture_fill((AVPicture *)pic, data, AV_PIX_FMT_RGB24, WIDTH, HEIGHT);
and then replace the 2nd and 3rd args of sws_scale() with
pic->data, pic->linesize,
Then the gradients above work properly, at many resolutions.
The argument srcStrides passed to sws_scale() is a length-1 array, containing the single int HEIGHT.
Stride (AKA linesize) is the distance in bytes between two lines. For various reasons having mostly to do with optimization it is often larger than simply width in bytes, so there is padding on the end of each line.
In your case, without any padding, stride should be width * 3.
I've been working for a while on image processing and I've noticed weird things.
I'm reading a BMP file, using simple methods like ReadFile and stuff, and using Microsoft's BMP structures.
Here is the code:
ReadFile(_bmpFile,&bmpfh,sizeof(bfh),&data,NULL);
ReadFile(_bmpFile, &bmpih, sizeof(bih), &data, NULL);
imagesize = bih.biWidth*bih.biHeight;
image = new RGBQUAD[imagesize];
ReadFile(_bmpFile,image, imagesize*sizeof(RGBQUAD),&written,NULL);
That is how I read the file and then I'm turning it into gray scale using a simple for-loop.
for (int i = 0; i < imagesize; i++)
{
RED = image[i].rgbRed;
GREEN = image[i].rgbGreen;
BLUE = image[i].rgbBlue;
avg = (RED + GREEN + BLUE ) / 3;
image[i].rgbRed = avg;
image[i].rgbGreen = avg;
image[i].rgbBlue = avg;
}
Now when I write the file using this code:
#pragma pack(push, 1)
WriteFile(_bmpFile, &bmpfh, sizeof(bfh), &data, NULL);
WriteFile(_bmpFile, &bmpih, sizeof(bih), &data, NULL);
WriteFile(_bmpFile, image, imagesize*sizeof(RGBQUAD), &written, NULL);
#pragma pack(pop)
The file is getting much bigger(30MB -> 40MB).
The reason it happens is because I'm using RGBQUAD instead RGBTRIPLE, but if i'm using RGBTRIPLE I have a problem converting small pictures into
gray scale - can't open the picture after creating it(says it's not in the right structure).
Also the file size is missing one byte, (1174kb and after 1173kb)
Has anybody seen this before (it only occurs with small pictures)?
In a BMP file, every scan line has to be padded out so the next scan line starts on a 32-bit boundary. If you do 32 bits per pixel, that happens automatically, but if you use 24 bits per pixel, you'll need to add code to do it explicitly.
You are ignoring stride (Jerry's comment) and the pixel format of the bitmap. Which is 24bpp judging by the file size increase, you are writing it as though it is 32bpp. Your grayscale conversion is wrong, the human eye isn't equally sensitive to red, green and blue.
Consider using GDI+, you #include <gdiplus.h> in your code to use the Bitmap class. Its LockBits() method gives you access to the bitmap bits. The ColorMatrixEffect class lets you apply a color transformation in a single operation. Check this answer for the color matrix you need to get a grayscale image. The MSDN docs start here.
Each horizontal row in a BMP must be a multiple of 4 bytes long.
If the pixel data does not take up a multiple of 4 bytes, then 0x00 bytes are added at the end of the row. For a 24-bpp image, the number of bytes per row is (imageWidth*3 + 3) & ~3. The number of padding bytes is ((imageWidth*3 + 3) & ~3) - (imageWidth*3).
This was answered by immibis.
I would like to add that the size of array is ((imageWidth*3 + 3) & ~3)*imageHeight.
I hope this helps