I have a program which runs in a window using OpenGL (VS2012 with freeglut 2.8.1). Basically at every time step (run via a call to glutPostRedisplay from my glutIdleFunc hook) I call my own draw function followed by a call to glFlush to display the result. Then I call my own screenShot function which uses the glReadPixels function to dump the pixels to a tga file.
The problem with this setup is that the files are empty whenever the window is minimised. That is to say, the output from glReadPixels is empty. How can I avoid this?
Here is a copy of the screenShot function I am using (I am not the copyright holder):
//////////////////////////////////////////////////
// Grab the OpenGL screen and save it as a .tga //
// Copyright (C) Marius Andra 2001 //
// http://cone3d.gz.ee EMAIL: cone3d@hot.ee //
//////////////////////////////////////////////////
// (modified by me a little)
int screenShot(int const num)
{
typedef unsigned char uchar;
// we will store the image data here
uchar *pixels;
// the thingy we use to write files
FILE * shot;
// we get the width/height of the screen into this array
int screenStats[4];
// get the width/height of the window
glGetIntegerv(GL_VIEWPORT, screenStats);
// generate an array large enough to hold the pixel data
// (width*height*bytesPerPixel)
pixels = new unsigned char[screenStats[2]*screenStats[3]*3];
// read in the pixel data, TGA's pixels are BGR aligned
// (0x80E0 is the value of GL_BGR, spelled out here for old gl.h headers)
glReadPixels(0, 0, screenStats[2], screenStats[3], 0x80E0,
GL_UNSIGNED_BYTE, pixels);
// open the file for writing. If unsuccessful, return 1
std::string filename = kScreenShotFileNamePrefix + Function::Num2Str(num) + ".tga";
shot=fopen(filename.c_str(), "wb");
if (shot == NULL)
return 1;
// this is the TGA header; it must be at the beginning of
// every (uncompressed) .tga
uchar TGAheader[12]={0,0,2,0,0,0,0,0,0,0,0,0};
// the header that stores the dimensions of the .tga
// header[1]*256+header[0] - width
// header[3]*256+header[2] - height
// header[4] - bits per pixel
// header[5] - image descriptor byte
uchar header[6]={((int)(screenStats[2]%256)),
((int)(screenStats[2]/256)),
((int)(screenStats[3]%256)),
((int)(screenStats[3]/256)),24,0};
// write out the TGA header
fwrite(TGAheader, sizeof(uchar), 12, shot);
// write out the header
fwrite(header, sizeof(uchar), 6, shot);
// write the pixels
fwrite(pixels, sizeof(uchar),
screenStats[2]*screenStats[3]*3, shot);
// close the file
fclose(shot);
// free the memory
delete [] pixels;
// return success
return 0;
}
So how can I print the screenshot to a TGA file regardless of whether Windows decides to actually display the content on the monitor?
Note: Because I am trying to keep a visual record of the progress of a simulation, I need to print every frame, regardless of whether it is being displayed. I realise that last statement is a bit of a contradiction, since I need to render the frame in order to produce the screen grab. To rephrase: I need glReadPixels (or some alternative function) to capture the updated state of my program at every step so that I can write it to a file, regardless of whether Windows chooses to display it.
Sounds like you're running afoul of the pixel ownership problem.
Render to a FBO and use glReadPixels() to slurp images out of that instead of the front buffer.
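For illustration, here is a minimal sketch of the FBO route, assuming an OpenGL version (or an extension loader such as GLEW) that exposes framebuffer objects; fboWidth, fboHeight, draw() and pixels are placeholders for your own names:
GLuint fbo = 0, colorTex = 0;
// colour texture that will receive the rendering
glGenTextures(1, &colorTex);
glBindTexture(GL_TEXTURE_2D, colorTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, fboWidth, fboHeight, 0, GL_RGB, GL_UNSIGNED_BYTE, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
// framebuffer object with that texture as its colour attachment
// (add a depth renderbuffer here if your scene needs depth testing)
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, colorTex, 0);
// every time step: render into the FBO, read back, then unbind
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
draw();                                   // your existing draw function
glReadBuffer(GL_COLOR_ATTACHMENT0);
glReadPixels(0, 0, fboWidth, fboHeight, GL_BGR, GL_UNSIGNED_BYTE, pixels);
glBindFramebuffer(GL_FRAMEBUFFER, 0);     // back to the default framebuffer
Since the FBO is an off-screen buffer, its contents are not subject to pixel ownership, so the readback stays valid even while the window is minimised.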
I would suggest keeping the last rendered frame stored in memory and updating that copy whenever an update is called and the new render actually contains pixel data. Alternatively you could perhaps use the accumulation buffer, though I can't quite recall how it stores older frames (it may simply get overwritten so fast that it ends up holding no render data either).
Another solution might be to use a shader to render each frame manually and write the result to a file.
I've been looking around the internet for a while now, trying to find out whether I can DIRECTLY manipulate the pixels that make up a triangle in a mesh, not the vertex coordinates.
When a mesh is formed in OpenGL, the vertex coordinates that form independent triangles are each filled with pixels that give it colour.
Those pixels are what I'm trying to manipulate. So far in every tutorial all I'm seeing is how to alter vertex coords; even in the fragment-shader parts of GLSL tutorials I'm not finding anything on the pixels directly. I'm being shown texture and vertex coordinates, no direct pixel manipulation.
So far what I know happens is that each vertex is assigned some colour value, all the pixel processing gets done during execution, and you see the results.
So can pixels be directly altered in OpenGL for each triangle, or what would you recommend? I've heard it might be possible in OpenCV, but that stuff is about textures.
If I get it right, you have a high-poly mesh and want to simplify it by creating a normal map for faces with a smaller poly count ...
Never done this but I would attack this problem like this:
create UV mapping of high poly mesh
create low poly mesh
so you need to merge smaller adjacent faces into bigger ones, merging only faces that are not too angled relative to the starting face (the angle between their normals is below some threshold)... You also need to remember the original mesh of each merged face.
for each merged face render normal
so render the merged face's original polygons to a texture, but use the UVs as 2D vertex coordinates and output the actual triangle normal as the colour
This will copy the normals into the normal map at the correct positions. Do not use any depth buffering, blending, lighting or the like. Also, the 2D view must be scaled and translated so the UV mapping covers your texture (no perspective). Do not forget that the normal map (if RGB float is used) is clamped, so you should first normalize the normal and then convert it to the range <0,1>, for example:
n = 0.5 * (vec3(1.0,1.0,1.0) + normalize(n));
read back the rendered texture
now it should hold the whole normal map. In case you do not have Render to texture available (older Intel HD) you can render to screen instead and then just use glReadPixels.
As you want to save this to an image, here is a small VCL example of saving to a 24-bit BMP:
//---------------------------------------------------------------------------
void screenshot(int xs,int ys) // xs,ys is GL screen resolution
{
// just in case your environment does not know basic programming datatypes
typedef unsigned __int8 BYTE;
typedef unsigned __int16 WORD;
typedef unsigned __int32 DWORD;
xs&=0xFFFFFFFC; // crop the resolution down to a multiple of 4
ys&=0xFFFFFFFC; // to keep glReadPixels from crashing on some implementations
BYTE *dat,zero[4]={0,0,0,0};
int hnd,x,y,a,align,xs3=3*xs;
// allocate memory for pixel data
dat=new BYTE[xs3*ys];
if (dat==NULL) return;
// copy GL screen to dat
glReadPixels(0,0,xs,ys,GL_BGR,GL_UNSIGNED_BYTE,dat);
glFinish();
// BMP header structure
#pragma pack(push,1)
struct _hdr
{
char ID[2];
DWORD size;
WORD reserved1[2];
DWORD offset;
DWORD reserved2;
DWORD width,height;
WORD planes;
WORD bits;
DWORD compression;
DWORD imagesize;
DWORD xresolution,yresolution;
DWORD ncolors;
DWORD importantcolors;
} hdr;
#pragma pack(pop)
// BMP header extracted from uncompressed 24 bit BMP
const BYTE bmp24[sizeof(hdr)]={0x42,0x4D,0xE6,0x71,0xB,0x0,0x0,0x0,0x0,0x0,0x36,0x0,0x0,0x0,0x28,0x0,0x0,0x0,0xF4,0x1,0x0,0x0,0xF4,0x1,0x0,0x0,0x1,0x0,0x18,0x0,0x0,0x0,0x0,0x0,0xB0,0x71,0xB,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0};
// init hdr with 24 bit BMP header
for (x=0;x<sizeof(hdr);x++) ((BYTE*)(&hdr))[x]=bmp24[x];
// update hdr stuf with our image properties
align=0; // (4-(xs3&3))&3;
hdr.size=sizeof(hdr)+(ys*(xs3+align));
hdr.width=xs;
hdr.height=ys;
hdr.imagesize=ys*xs3;
// save BMP file (using VCL file functions exchange them with whatever you got)
hnd=FileCreate("screenshot.bmp"); // create screenshot image file (binary)
if (hnd!=-1) // if file created
{
FileWrite(hnd,&hdr,sizeof(hdr));// write bmp header
for (a=0,y=0;y<ys;y++,a+=xs3) // loop through all scan lines
{
FileWrite(hnd,&dat[a],xs3); // write scan line pixel data
if (align) // write scan line align zeropad if needed
FileWrite(hnd,zero,align);
}
FileClose(hnd); // close file
}
// cleanup before exit
delete[] dat; // release dat
}
//---------------------------------------------------------------------------
The only things used from the VCL are the binary file-access routines, so just swap them for whatever you have at your disposal. Now you can open this BMP in whatever image software you like and convert it to whatever format you want, like PNG... without the need to encode it yourself.
The bmp header structure was taken from this QA:
Importing BMP file in Turboc++ issue:BMP file is not being displayed properly in the output screen
Also beware of using char/int instead of BYTE/WORD/DWORD; it usually leads to data corruption for tasks like this if you do not know what you are doing...
You can do the same with colour if the mesh is textured ... That way the normal map and colour map would share the same UV mapping, even if the original mesh uses more than a single texture ...
In a camera application bitmap pixel arrays are retrieved from a streaming camera.
The pixel arrays are captured by writing them to a named pipe, where on the other end of the pipe, ffmpeg retrieves them and creates an AVI file.
I will need to create one custom frame (with custom text on), and pipe its pixels as the first frame in the resulting movie.
The question is how I can use a TBitmap (for convenience) to:
1. Create an X by Y monochrome (8-bit) bitmap from scratch, with custom text on it. I want the background to be white and the text to be black. (Mostly figured this step out, see below.)
2. Retrieve the pixel array that I can send/write to the pipe.
Step 1: The following code creates a TBitmap and writes text on it:
int w = 658;
int h = 492;
TBitmap* bm = new TBitmap();
bm->Width = w;
bm->Height = h;
bm->HandleType = bmDIB;
bm->PixelFormat = pf8bit;
bm->Canvas->Font->Name = "Tahoma";
bm->Canvas->Font->Size = 8;
int textY = 10;
string info("some Text");
bm->Canvas->TextOut(10, textY, info.c_str());
The above basically concludes step 1.
The writing/piping code expects a byte array with the bitmap's pixels; e.g.
unsigned long numWritten;
WriteFile(mPipeHandle, pImage, size, &numWritten, NULL);
where pImage is a pointer to an unsigned char buffer (the bitmap's pixels), and size is the length of this buffer.
Update:
Using the generated TBitmap and a TMemoryStream for transferring data to the ffmpeg pipeline does not generate the proper result. I get a distorted image with 3 diagonal lines on it.
The buffer size of the camera frames that I receive is exactly 323736 bytes, which is equal to the number of pixels in the image, i.e. 658x492.
NOTE: I have concluded that this 'bitmap' is not padded, since the buffer size is exactly 658x492 even though 658 is not divisible by four.
The buffer I get after dumping my generated bitmap to a memory stream, however, has the size 325798, which is 2062 bytes larger than it is supposed to be. As @Spektre pointed out below, this discrepancy may be padding?
I am using the following code to get the pixel array:
ByteBuffer CustomBitmap::getPixArray()
{
// --- Local variables --- //
unsigned int iInfoHeaderSize=0;
unsigned int iImageSize=0;
BITMAPINFO *pBitmapInfoHeader;
unsigned char *pBitmapImageBits;
// First we call GetDIBSizes() to determine the amount of
// memory that must be allocated before calling GetDIB()
// NB: GetDIBSizes() is a part of the VCL.
GetDIBSizes(mTheBitmap->Handle,
iInfoHeaderSize,
iImageSize);
// Next we allocate memory according to the information
// returned by GetDIBSizes()
pBitmapInfoHeader = new BITMAPINFO[iInfoHeaderSize];
pBitmapImageBits = new unsigned char[iImageSize];
// Call GetDIB() to convert a device dependent bitmap into a
// Device Independent Bitmap (a DIB).
// NB: GetDIB() is a part of the VCL.
GetDIB(mTheBitmap->Handle,
mTheBitmap->Palette,
pBitmapInfoHeader,
pBitmapImageBits);
delete []pBitmapInfoHeader;
ByteBuffer buf;
buf.buffer = pBitmapImageBits;
buf.size = iImageSize;
return buf;
}
So the final challenge seems to be getting a byte array that has the same size as the ones coming from the camera. How do I find and remove the padding bytes from the TBitmap data?
TBitmap has a PixelFormat property to set the bit depth.
TBitmap has a HandleType property to control whether a DDB or a DIB is created. DIB is the default.
Since you are passing BMPs around between different systems, you really should be using DIBs instead of DDBs, to avoid any corruption/misinterpretation of the pixel data.
Also, this line of code:
Image1->Picture->Bitmap->Handle = bm->Handle;
Should be changed to this instead:
Image1->Picture->Bitmap->Assign(bm);
// or:
// Image1->Picture->Bitmap = bm;
Or this:
Image1->Picture->Assign(bm);
Either way, don't forget to delete bm; afterwards, since the TPicture makes a copy of the input TBitmap, it does not take ownership.
To get the BMP data as a buffer of bytes, you can use the TBitmap::SaveToStream() method, saving to a TMemoryStream. Or, if you just want the pixel data, not the complete BMP data (i.e., without BMP headers - see Bitmap Storage), you can use the Win32 GetDIBits() function, which outputs the pixels in DIB format. You can't obtain a byte buffer of the pixels for a DDB, since they depend on the device they are rendered to. DDBs are only usable in-memory in conjunction with HDCs; you can't pass them around. But you can convert a DIB to a DDB once you have a final device to render it to.
In other words, get the pixels from the camera, save them to a DIB, pass that around as needed (i.e., over the pipe), and then do whatever you need with it - save to a file, convert to a DDB to render on-screen, etc.
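For the padding question specifically, here is a rough sketch of the GetDIBits() route that returns a tightly packed width*height buffer like the camera frames; the helper name, buffer handling and top-down row order are my own choices, not part of the original code:
// Rough sketch: copy an 8-bit TBitmap into a tightly packed buffer via GetDIBits().
// Needs <windows.h>, <vector>, <cstring> and the usual VCL includes; bm is the
// pf8bit/bmDIB TBitmap created earlier.
std::vector<unsigned char> GetPackedPixels(Graphics::TBitmap *bm)
{
    const int w = bm->Width, h = bm->Height;
    const int stride = (w + 3) & ~3;              // DIB rows are padded to 4 bytes

    // BITMAPINFO for an 8-bit DIB needs room for a 256-entry colour table.
    std::vector<unsigned char> infoBuf(sizeof(BITMAPINFOHEADER) + 256 * sizeof(RGBQUAD), 0);
    BITMAPINFO *bmi = reinterpret_cast<BITMAPINFO*>(&infoBuf[0]);
    bmi->bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
    bmi->bmiHeader.biWidth       = w;
    bmi->bmiHeader.biHeight      = -h;            // negative height = top-down rows
    bmi->bmiHeader.biPlanes      = 1;
    bmi->bmiHeader.biBitCount    = 8;
    bmi->bmiHeader.biCompression = BI_RGB;

    std::vector<unsigned char> padded(stride * h);
    HDC dc = GetDC(0);
    GetDIBits(dc, (HBITMAP)bm->Handle, 0, h, &padded[0], bmi, DIB_RGB_COLORS);
    ReleaseDC(0, dc);

    // Strip the per-row padding so the result is exactly w*h bytes, like the camera frames.
    std::vector<unsigned char> packed(w * h);
    for (int y = 0; y < h; ++y)
        memcpy(&packed[y * w], &padded[y * stride], w);
    return packed;
}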
This is just an addon to the existing answer (with additional info after the OP's edit).
The bitmap file format has align bytes on each row (so there are usually some bytes at the end of each scan line that are not pixels), padding every row up to a multiple of 4 bytes, plus a file header. Those extra bytes create the skew and the diagonal-looking lines. In your case the discrepancy works out as:
(xs + align)*ys + header = size
(658 + 2)*492 + 1078 = 325798
Here 2 align bytes pad each 658-byte row of the pf8bit image up to 660, and the 1078-byte header is the 54 bytes of BMP headers plus the 1024-byte palette that an 8-bit BMP carries. But beware: the align size depends on the image width and pixel format, and the header size on the BMP variant used ...
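If you want to compute the per-row padding yourself, a small illustrative snippet (variable names are mine, not from the code above):
// DIB/BMP scan lines are padded up to a multiple of 4 bytes.
int width = 658, bytesPerPixel = 1;             // e.g. a pf8bit image
int bytesPerRow = width * bytesPerPixel;
int align       = (4 - (bytesPerRow & 3)) & 3;  // 2 padding bytes for width 658
int rowStride   = bytesPerRow + align;          // bytes actually stored per scan line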
Try this instead:
// create bmp
Graphics::TBitmap *bmp=new Graphics::TBitmap;
// bmp->Assign(???); // a) copy image from ???
bmp->SetSize(658,492); // b) in case you use Assign do not change resolution
bmp->HandleType=bmDIB;
bmp->PixelFormat=pf8bit;
// bmp->Canvas->Draw(0,0,???); // b) copy image from ???
// here render your text using
bmp->Canvas->Brush->Style=bsSolid;
bmp->Canvas->Brush->Color=clWhite;
bmp->Canvas->Font->Color=clBlack;
bmp->Canvas->Font->Name = "Tahoma";
bmp->Canvas->Font->Size = 8;
bmp->Canvas->TextOutA(5,5,"Text");
// Byte data
for (int y=0;y<bmp->Height;y++)
{
BYTE *p=(BYTE*)bmp->ScanLine[y]; // pf8bit -> BYTE*
// here send/write/store ... bmp->Width bytes from p[]
}
// Canvas->Draw(0,0,bmp); // just render it on the Form
delete bmp; bmp=NULL;
Mixing GDI WinAPI calls for pixel-array access (BitBlt etc.) with a VCL bmDIB bitmap might cause problems and resource leaks (hence the error on exit), and it's also slower than using ScanLine[] (if coded right), so I strongly advise using native VCL functions (as I did in the example above) instead of the GDI/WinAPI calls where you can.
for more info see:
#4. GDI Bitmap
Delphi / C++ builder Windows 10 1709 bitmap operations extremely slow
Draw tbitmap with scale and alpha channel faster
Also, you mention your image source is a camera. If you use pf8bit it means palette-indexed colour, which is relatively slow and ugly if the native GDI algorithm is used to convert from a true/hi-colour camera image; for a better transform see:
Effective gif/image color quantization?
simple dithering
I am writing a C++ code where a sequence of N different frames is generated after performing some operations implemented therein. After each frame is completed, I write it on the disk as IMG_%d.png, and finally I encode them to a video through ffmpeg using the x264 codec.
The summarized pseudocode of the main part of the program is the following one:
std::vector<int> B(width*height*3);
for (i=0; i<N; i++)
{
// void generateframe(std::vector<int> &, int)
generateframe(B, i); // Returns different images for different i values.
sprintf(s, "IMG_%d.png", i+1);
WriteToDisk(B, s); // void WriteToDisk(std::vector<int>, char[])
}
The problem with this implementation is that the number of desired frames, N, is usually high (N~100000), as is the resolution of the pictures (1920x1080), resulting in an overload of the disk and write cycles of dozens of GB after each execution.
In order to avoid this, I have been trying to find documentation about passing each image stored in the vector B directly to an encoder such as x264 (without having to write the intermediate image files to disk). Although some interesting topics were found, none of them solved exactly what I want, as many of them concern running the encoder on existing image files on disk, whilst others provide solutions for other programming languages such as Python (here you can find a fully satisfactory solution for that platform).
The pseudocode of what I would like to obtain is something similar to this:
std::vector<int> B(width*height*3);
video_file=open_video("Generated_Video.mp4", ...[encoder options]...);
for (i=0; i<N; i++)
{
generateframe(B, i+1);
add_frame(video_file, B);
}
video_file.close();
According to what I have read on related topics, the x264 C++ API might be able to do this, but, as stated above, I did not find a satisfactory answer for my specific question. I tried learning and using the ffmpeg source code directly, but both its low ease of use and compilation issues forced me to discard that possibility, being the non-professional programmer I am (I take it just as a hobby and unluckily I cannot spend that much time learning something so demanding).
Another possible solution that came to my mind is to find a way to call the ffmpeg binary from the C++ code, and somehow manage to transfer the image data of each iteration (stored in B) to the encoder, letting it accumulate frames (that is, not "closing" the video file) until the last frame has been added, at which point the file is "closed". In other words, call ffmpeg.exe from the C++ program to write the first frame to a video, but make the encoder "wait" for more frames, then call ffmpeg again to add the second frame, and so on until the last frame, where the video is finished. However, I do not know how to proceed or whether it is actually possible.
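As an aside, this "call the ffmpeg binary" idea can be done without named pipes by launching ffmpeg once and streaming raw frames to its stdin. A minimal sketch, assuming packed 8-bit RGB frames; the resolution, frame rate and rawvideo options are illustrative and would need adapting:
// Sketch: stream raw RGB24 frames to a single long-running ffmpeg process via its stdin.
#include <cstdio>
#include <cstdint>
#include <vector>

int main()
{
    const int width = 1920, height = 1080, N = 100000;
    std::vector<uint8_t> frame(width * height * 3);   // packed RGBRGB... rows

    // popen is POSIX (fine under Cygwin/Linux); on plain Windows you may also
    // need to make sure the pipe is in binary mode.
    FILE *ff = popen(
        "ffmpeg -y -f rawvideo -pix_fmt rgb24 -s 1920x1080 -r 25 -i - "
        "-c:v libx264 -preset slow -crf 20 Generated_Video.mp4", "w");
    if (!ff) return 1;

    for (int i = 0; i < N; ++i)
    {
        // generateframe(frame, i);                    // fill 'frame' for step i
        fwrite(frame.data(), 1, frame.size(), ff);     // one full frame per write
    }
    pclose(ff);                                        // ffmpeg finalizes the file
    return 0;
}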
Edit 1:
As suggested in the replies, I have been documenting about named pipes and tried to use them in my code. First of all, it should be remarked that I am working with Cygwin, so my named pipes are created as they would be created under Linux. The modified pseudocode I used (including the corresponding system libraries) is the following one:
FILE *fd;
mkfifo("myfifo", 0666);
for (i=0; i<N; i++)
{
fd=fopen("myfifo", "wb");
generateframe(B, i+1);
WriteToPipe(B, fd); // void WriteToPipe(std::vector<int>, FILE *&fd)
fflush(fd);
fclose(fd);
}
unlink("myfifo");
WriteToPipe is a slight modification of the previous WriteToFile function, where I made sure that the write buffer to send the image data is small enough to fit the pipe buffering limitations.
Then I compile and write the following command in the Cygwin terminal:
./myprogram | ffmpeg -i pipe:myfifo -c:v libx264 -preset slow -crf 20 Video.mp4
However, it remains stuck at the loop when i=0 at the "fopen" line (that is, the first fopen call). If I had not called ffmpeg it would be natural as the server (my program) would be waiting for a client program to connect to the "other side" of the pipe, but it is not the case. It looks like they cannot be connected through the pipe somehow, but I have not been able to find further documentation in order to overcome this issue. Any suggestion?
After some intense struggle, I finally managed to make it work after learning a bit how to use the FFmpeg and libx264 C APIs for my specific purpose, thanks to the useful information that some users provided in this site and some others, as well as some FFmpeg's documentation examples. For the sake of illustration, the details will be presented next.
First of all, the libx264 C library was compiled and, after that, the FFmpeg one with the configure options --enable-gpl --enable-libx264. Now let us go to the coding. The relevant part of the code that achieved the requested purpose is the following one:
Includes:
#include <stdint.h>
extern "C"{
#include <x264.h>
#include <libswscale/swscale.h>
#include <libavcodec/avcodec.h>
#include <libavutil/mathematics.h>
#include <libavformat/avformat.h>
#include <libavutil/opt.h>
}
LDFLAGS on Makefile:
-lx264 -lswscale -lavutil -lavformat -lavcodec
Inner code (for the sake of simplicity, error checks are omitted and variables are declared where needed rather than at the beginning, for better understanding):
av_register_all(); // Loads the whole database of available codecs and formats.
struct SwsContext* convertCtx = sws_getContext(width, height, AV_PIX_FMT_RGB24, width, height, AV_PIX_FMT_YUV420P, SWS_FAST_BILINEAR, NULL, NULL, NULL); // Preparing to convert my generated RGB images to YUV frames.
// Preparing the data concerning the format and codec in order to write properly the header, frame data and end of file.
const char *fmtext="mp4";
char filename[64]; // must be writable storage, since sprintf writes into it
sprintf(filename, "GeneratedVideo.%s", fmtext);
AVOutputFormat * fmt = av_guess_format(fmtext, NULL, NULL);
AVFormatContext *oc = NULL;
avformat_alloc_output_context2(&oc, NULL, NULL, filename);
AVStream * stream = avformat_new_stream(oc, 0);
AVCodec *codec=NULL;
AVCodecContext *c= NULL;
int ret;
codec = avcodec_find_encoder_by_name("libx264");
// Setting up the codec:
AVDictionary *opt = NULL;
av_dict_set( &opt, "preset", "slow", 0 );
av_dict_set( &opt, "crf", "20", 0 );
avcodec_get_context_defaults3(stream->codec, codec);
c=avcodec_alloc_context3(codec);
c->width = width;
c->height = height;
c->pix_fmt = AV_PIX_FMT_YUV420P;
// Setting up the format, its stream(s), linking with the codec(s) and write the header:
if (oc->oformat->flags & AVFMT_GLOBALHEADER) // Some formats require a global header.
c->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
avcodec_open2( c, codec, &opt );
av_dict_free(&opt);
stream->time_base=(AVRational){1, 25};
stream->codec=c; // Once the codec is set up, we need to let the container know which codec are the streams using, in this case the only (video) stream.
av_dump_format(oc, 0, filename, 1);
avio_open(&oc->pb, filename, AVIO_FLAG_WRITE);
ret=avformat_write_header(oc, &opt);
av_dict_free(&opt);
// Preparing the containers of the frame data:
AVFrame *rgbpic, *yuvpic;
// Allocating memory for each RGB frame, which will be lately converted to YUV:
rgbpic=av_frame_alloc();
rgbpic->format=AV_PIX_FMT_RGB24;
rgbpic->width=width;
rgbpic->height=height;
ret=av_frame_get_buffer(rgbpic, 1);
// Allocating memory for each conversion output YUV frame:
yuvpic=av_frame_alloc();
yuvpic->format=AV_PIX_FMT_YUV420P;
yuvpic->width=width;
yuvpic->height=height;
ret=av_frame_get_buffer(yuvpic, 1);
// After the format, code and general frame data is set, we write the video in the frame generation loop:
// std::vector<uint8_t> B(width*height*3);
The vector commented out above has the same structure as the one I exposed in my question; however, the RGB data is stored in the AVFrames in a specific way. Therefore, for the sake of exposition, let us assume we instead have a pointer to a structure of the form uint8_t[3] Matrix(int, int), whose way of accessing the colour values of the pixel at a given coordinate (x, y) is Matrix(x, y)->Red, Matrix(x, y)->Green and Matrix(x, y)->Blue, giving, respectively, the red, green and blue values of the coordinate (x, y). The first argument stands for the horizontal position, from left to right as x increases, and the second one for the vertical position, from top to bottom as y increases.
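Such a Matrix type is not part of the code above; purely as an illustration, it could be sketched like this:
#include <cstdint>
#include <vector>

// Purely illustrative stand-in for the Matrix abstraction described above.
struct Pixel { uint8_t Red, Green, Blue; };
class Matrix
{
    int w, h;
    std::vector<Pixel> px;
public:
    Matrix(int width, int height) : w(width), h(height), px(width*height) {}
    // B(x, y)->Red etc.; x grows left to right, y grows top to bottom
    Pixel* operator()(int x, int y) { return &px[y*w + x]; }
};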
That being said, the for loop to transfer the data, encode and write each frame would be the following:
Matrix B(width, height);
int got_output;
AVPacket pkt;
for (i=0; i<N; i++)
{
generateframe(B, i); // This one is the function that generates a different frame for each i.
// The AVFrame data will be stored as RGBRGBRGB... row-wise, from left to right and from top to bottom, hence we have to proceed as follows:
for (y=0; y<height; y++)
{
for (x=0; x<width; x++)
{
// rgbpic->linesize[0] is equal to 3*width here (3 bytes per pixel, buffer allocated with align=1).
rgbpic->data[0][y*rgbpic->linesize[0]+3*x]=B(x, y)->Red;
rgbpic->data[0][y*rgbpic->linesize[0]+3*x+1]=B(x, y)->Green;
rgbpic->data[0][y*rgbpic->linesize[0]+3*x+2]=B(x, y)->Blue;
}
}
sws_scale(convertCtx, rgbpic->data, rgbpic->linesize, 0, height, yuvpic->data, yuvpic->linesize); // Not actually scaling anything, but just converting the RGB data to YUV and store it in yuvpic.
av_init_packet(&pkt);
pkt.data = NULL;
pkt.size = 0;
yuvpic->pts = i; // The PTS of the frame are just in a reference unit, unrelated to the format we are using. We set them, for instance, as the corresponding frame number.
ret=avcodec_encode_video2(c, &pkt, yuvpic, &got_output);
if (got_output)
{
fflush(stdout);
av_packet_rescale_ts(&pkt, (AVRational){1, 25}, stream->time_base); // We set the packet PTS and DTS taking in the account our FPS (second argument) and the time base that our selected format uses (third argument).
pkt.stream_index = stream->index;
printf("Write frame %6d (size=%6d)\n", i, pkt.size);
av_interleaved_write_frame(oc, &pkt); // Write the encoded frame to the mp4 file.
av_packet_unref(&pkt);
}
}
// Writing the delayed frames:
for (got_output = 1; got_output; i++) {
ret = avcodec_encode_video2(c, &pkt, NULL, &got_output);
if (got_output) {
fflush(stdout);
av_packet_rescale_ts(&pkt, (AVRational){1, 25}, stream->time_base);
pkt.stream_index = stream->index;
printf("Write frame %6d (size=%6d)\n", i, pkt.size);
av_interleaved_write_frame(oc, &pkt);
av_packet_unref(&pkt);
}
}
av_write_trailer(oc); // Writing the end of the file.
if (!(fmt->flags & AVFMT_NOFILE))
avio_closep(&oc->pb); // Closing the file.
avcodec_close(stream->codec);
// Freeing all the allocated memory:
sws_freeContext(convertCtx);
av_frame_free(&rgbpic);
av_frame_free(&yuvpic);
avformat_free_context(oc);
Side notes:
For future reference, as the available information on the net concerning the time stamps (PTS/DTS) looks rather confusing, I will also explain how I managed to solve the issues by setting the proper values. Setting these values incorrectly caused the output size to be much bigger than the one obtained through the ffmpeg command-line tool, because the frame data was being redundantly written at smaller time intervals than the ones actually set by the FPS.
First of all, it should be remarked that when encoding there are two kinds of time stamps: one associated to the frame (PTS) (pre-encoding stage) and two associated to the packet (PTS and DTS) (post-encoding stage). In the first case, it looks like the frame PTS values can be assigned using a custom unit of reference (with the only restriction that they must be equally spaced if one wants constant FPS), so one can take for instance the frame number as we did in the above code. In the second one, we have to take into account the following parameters:
The time base of the output format container, in our case mp4 (=12800 Hz), whose information is held in stream->time_base.
The desired FPS of the video.
If the encoder generates B-frames or not (in the second case the PTS and DTS values for the frame must be set the same, but it is more complicated if we are in the first case, like in this example). See this answer to another related question for more references.
The key here is that luckily it is not necessary to struggle with the computation of these quantities, as libav provides a function to compute the correct time stamps associated to the packet by knowing the aforementioned data:
av_packet_rescale_ts(AVPacket *pkt, AVRational FPS, AVRational time_base)
Thanks to these considerations, I was finally able to generate a sane output container and essentially the same compression rate as the one obtained with the command-line tool, which were the two remaining issues before investigating more deeply how the format header and trailer, and the time stamps, are properly set.
Thanks for your excellent work, @ksb496!
One minor improvement:
c=avcodec_alloc_context3(codec);
should be better written as:
c = stream->codec;
to avoid a memory leak.
If you don't mind, I've uploaded the complete ready-to-deploy library onto GitHub: https://github.com/apc-llc/moviemaker-cpp.git
Thanks to ksb496 I managed to do this task, but in my case I needed to change some code to make it work as expected. I thought maybe it could help others, so I decided to share (with a two-year delay :D).
I had an RGB buffer filled by a DirectShow sample grabber that I needed to make a video from. The RGB to YUV conversion from the given answer didn't do the job for me. I did it like this:
int stride = m_width * 3;
int index = 0;
for (int y = 0; y < m_height; y++) {
for (int x = 0; x < stride; x++) {
int j = (size - ((y + 1)*stride)) + x;
m_rgbpic->data[0][j] = data[index];
++index;
}
}
The data variable here is my RGB buffer (a simple BYTE*) and size is the data buffer size in bytes. It starts filling the RGB AVFrame from the bottom-left to the top-right.
The other thing is that my version of FFmpeg didn't have the av_packet_rescale_ts function. It's the latest version, but the FFmpeg docs don't mark this function as deprecated anywhere; I guess this might be the case for Windows only. Anyway, I used av_rescale_q instead, which does the same job, like this:
AVPacket pkt;
pkt.pts = av_rescale_q(pkt.pts, { 1, 25 }, m_stream->time_base);
And the last thing: for this format conversion I needed to change my swsContext to BGR24 instead of RGB24, like this:
m_convert_ctx = sws_getContext(width, height, AV_PIX_FMT_BGR24, width, height,
AV_PIX_FMT_YUV420P, SWS_FAST_BILINEAR, nullptr, nullptr, nullptr);
avcodec_encode_video2 and avcodec_encode_audio2 seem to be deprecated. The current version of FFmpeg (4.2) has a new API: avcodec_send_frame and avcodec_receive_packet.
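For reference, a minimal sketch of the encode loop with the newer API, reusing the names (c, yuvpic, stream, oc) from the answer above; this is an outline, not a drop-in patch:
// Outline of the send/receive encoding loop that replaces avcodec_encode_video2.
AVPacket *pkt = av_packet_alloc();

int ret = avcodec_send_frame(c, yuvpic);        // pass NULL here to flush at the end
while (ret >= 0)
{
    ret = avcodec_receive_packet(c, pkt);
    if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
        break;                                   // encoder needs more input / is done
    av_packet_rescale_ts(pkt, (AVRational){1, 25}, stream->time_base);
    pkt->stream_index = stream->index;
    av_interleaved_write_frame(oc, pkt);
    av_packet_unref(pkt);
}
av_packet_free(&pkt);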
I've written a bunch of functions to load a COLLADA (.dae) document, but the problem is that the OpenGL GLUT (console) window responds slowly to keyboard input. I've used only string.h, stdlib.h and fstream.h, and of course gl/glut.h. My program's main function is:
void LoadModel()
{
COLLADA ca;
double digits[3];
ca.OpenFile(char fname);
ca.EnterLibGeo();// get the position of <library_geometries>
ca.GetFloats();// search for the <float_array> from start to end, and save their positions in the file
ca.GetAtrributes("count", char Attrib); //same as collada dom's function but it's mine
int run=atoi(Attrib); // convert the count attribute, which is a string in the file, to an integer
glBegin(GL_TRIANGLES);
for (int i=0;i<=run;i++)
{
MakeFloats(digits); // will convert string digits to floating point values, this function uses the starting position and ending position which GetFloats() stored in variables
glVertex3f(digits[0], digits[1], digits[2]);
}
glEnd();
glFlush();
}
This application searches for tags without loading the whole file contents into memory. The LoadModel() function is called by void display(), so whenever I use GLUT's keyboard function it reloads the vertex data from the file. This is OK for small .dae files, but large .dae files make my program respond slowly, because it redraws the vertices by re-reading the file every frame. Is this the right way of loading models?
You are reading in the file each and every time you render the mesh; don't do that.
Instead read the file once and keep the model in memory (perhaps preprocessed a bit to ease rendering).
The VBO method of loading the mesh based on your example is:
COLLADA ca;
double digits[3];
ca.OpenFile(char fname);
ca.EnterLibGeo();// get the position of <library_geometries>
ca.GetFloats();// search for the <float_array> from start to end, and save their positions in the file
ca.GetAtrributes("count", char Attrib); //same as collada dom's function but it's mine
int run=atoi(Attrib); // convert the count attribute, which is a string in the file, to an integer
GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, (run+1)*3*sizeof(float), 0, GL_STATIC_DRAW); // the loop below writes run+1 vertices
do{
float* ptr = (float*)glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
for (int i=0;i<=run;i++)
{
MakeFloats(digits); // converts string digits to floating-point values, using the start/end positions GetFloats() stored
ptr[i*3+0]=(float)digits[0]; // digits holds doubles, the VBO stores floats
ptr[i*3+1]=(float)digits[1];
ptr[i*3+2]=(float)digits[2];
}
}while(!glUnmapBuffer(GL_ARRAY_BUFFER));//if the buffer got corrupted then remap and fill it again
Then you can bind the relevant buffer and draw with glDrawArrays.
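For example, the per-frame draw could then look roughly like this (fixed-function client arrays, matching the style of the code above):
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, (void*)0);  // 3 floats per vertex, tightly packed, offset 0
glDrawArrays(GL_TRIANGLES, 0, run+1);       // run+1 vertices were written into the VBO
glDisableClientState(GL_VERTEX_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, 0);
No file parsing happens here, so the keyboard handling stays responsive.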
Disk I/O is relatively slow, and is more than likely the slowness you are seeing. You should try to remove any unnecessary work from your draw function. Only load your file once at startup, then keep the data in memory. If you load different files based on key presses, either load all of them up front or load each one once, on demand.
I'm reading the mouse cursor pixmap data from the StdFBShmem_t structure, as defined in the IOFramebufferShared API.
Everything works fine, 90% of the time. However, I have noticed that some applications on the Mac set a cursor in a different format. According to the documentation for the data structures, the cursor pixmap format should always be in the same format as the frame buffer. My frame buffer is 32 bpp. I expect the pixmap data to be in the format 0xAARRGGBB, which it is (most of the time). However, in some cases, I'm reading data that looks like a mask. Specifically, the pixels in this data will either be 0x00FFFFFF or 0x00000000. This looks to me to be a mask for separate pixel data stored somewhere else.
As far as I can tell, the only application that uses this cursor pixel format is Qt Creator, but I need to work with all applications, so I'd like to sort this out.
The code I'm using to read the cursor pixmap data is:
NSAutoreleasePool *autoReleasePool = [[NSAutoreleasePool alloc] init];
NSPoint mouseLocation = [NSEvent mouseLocation];
NSArray *allScreens = [NSScreen screens];
NSEnumerator *screensEnum = [allScreens objectEnumerator];
NSScreen *screen;
NSDictionary *screenDesc = nil;
while ((screen = [screensEnum nextObject]))
{
NSRect screenFrame = [screen frame];
screenDesc = [screen deviceDescription];
if (NSMouseInRect(mouseLocation, screenFrame, NO))
break;
}
if (screen)
{
kern_return_t err;
CGDirectDisplayID displayID = (CGDirectDisplayID) [[screenDesc objectForKey:@"NSScreenNumber"] pointerValue];
task_port_t taskPort = mach_task_self();
io_service_t displayServicePort = CGDisplayIOServicePort(displayID);
io_connect_t displayConnection =0;
err = IOFramebufferOpen(displayServicePort,
taskPort,
kIOFBSharedConnectType,
&displayConnection);
if (KERN_SUCCESS == err)
{
union
{
vm_address_t vm_ptr;
StdFBShmem_t *fbshmem;
} cursorInfo;
vm_size_t size;
err = IOConnectMapMemory(displayConnection,
kIOFBCursorMemory,
taskPort,
&cursorInfo.vm_ptr,
&size,
kIOMapAnywhere | kIOMapDefaultCache | kIOMapReadOnly);
if (KERN_SUCCESS == err)
{
// For some reason, cursor data is not always in the same format as
// the frame buffer. For this reason, we need some way to detect
// which structure we should be reading.
QByteArray pixData(
(const char*)cursorInfo.fbshmem->cursor.rgb24.image[currentFrame],
m_mouseInfo.currentSize.width() * m_mouseInfo.currentSize.height() * 4);
IOConnectUnmapMemory(displayConnection,
kIOFBCursorMemory,
taskPort,
cursorInfo.vm_ptr);
} // IOConnectMapMemory
else
qDebug() << "IOConnectMapMemory Failed:" << err;
IOServiceClose(displayConnection);
} // IOServiceOpen
else
qDebug() << "IOFramebufferOpen Failed:" << err;
}// if screen
[autoReleasePool release];
My questions are:
1. How can I detect if the cursor is a different format from the framebuffer?
2. Where can I read the actual pixel data? The bm18Cursor structure contains a mask section, but it's not in the right place for me to be reading it using the code above.
How can I detect if the cursor is a different format from the framebuffer?
The cursor is in the framebuffer. It can't be in a different format than itself.
There is no way to tell what format it's in (x-radar://problem/7751503). There would be a way to divine at least the number of bytes per pixel if you could tell how many frames the cursor has, but since you can't (that information isn't set as of 10.6.1 — x-radar://problem/7751530), you are left trying to figure out two factors of a four-factor product (bytes per pixel × width × height × number of frames, where you only have the width, the height, and the product). And even if you can figure out those missing two factors, you still don't know what order the bytes are in or whether the color components are premultiplied by the alpha component.
Where can I read the actual pixel data?
In the cursor member of the shared-cursor-memory structure.
You should define IOFB_ARBITRARY_SIZE_CURSOR before including the I/O Kit headers. Cursors can be any size now, not just 16×16, which is the size you expect when you don't define that constant. As an example, the usual Mac arrow cursor is 24×24, the “Windows” arrow cursor in CrossOver is 32×32, and the arrow cursor in X11 is 10×16.
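That is, something along these lines before the I/O Kit graphics include (the header path shown is the usual one; treat it as an assumption if your SDK differs):
// Define this before the I/O Kit framebuffer headers so StdFBShmem_t
// uses the arbitrary-size cursor layout instead of the legacy 16x16 one.
#define IOFB_ARBITRARY_SIZE_CURSOR
#include <IOKit/graphics/IOFramebufferShared.h>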
However, in some cases, I'm reading data that looks like a mask. Specifically, the pixels in this data will either be 0x00FFFFFF or 0x00000000. This looks to me to be a mask for separate pixel data stored somewhere else.
That sounds to me more like 16-bit pixels with an 8-bit alpha channel. At least it's more probably 5-6-5 than 5-5-5.
As far as I can tell, the only application that uses this cursor pixel format is Qt Creator, but I need to work with all applications, so I'd like to sort this out.
I'm able to capture the current cursor in that app just fine with my new cursor-capturing app. Is there a specific part of the app I should hit to make it show me a specific cursor?
You might try the CGSCreateRegisteredCursorImage function, as demonstrated by Karsten in a comment on my weblog.
It is a private function, so it may change or go away at any time, so you should check whether it exists and hold IOFramebuffer in reserve, but as long as it does exist, you may find it more reliable than the complex and thinly-documented IOFramebuffer.