I am trying to encode frame data captured from a monitor into an MP4 file using MFVideoFormat_H264 and a Media Foundation sink writer created with MFCreateSinkWriterFromURL. I configured my input IMFMediaType to contain
inputMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
inputMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
inputMediaType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
MFSetAttributeRatio(inputMediaType, MF_MT_FRAME_RATE, 60, 1);
MFSetAttributeRatio(inputMediaType, MF_MT_FRAME_RATE_RANGE_MAX, 120, 1);
MFSetAttributeRatio(inputMediaType, MF_MT_FRAME_RATE_RANGE_MIN, 1, 1);
MFSetAttributeRatio(inputMediaType, MF_MT_PIXEL_ASPECT_RATIO, 1, 1);
MFSetAttributeSize(inputMediaType, MF_MT_FRAME_SIZE, 1920, 1080);
all on a 1080p monitor with a 60 Hz refresh rate. My outputMediaType is configured the same way except that
outputMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
outputMediaType->SetUINT32(MF_MT_AVG_BITRATE, 10000000);
The sink writer itself is also configured with MF_SINK_WRITER_DISABLE_THROTTLING=TRUE and
MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS=TRUE to get the best possible performance out of it, using hardware acceleration when available. Everything works and videos get created successfully. However, each video seems to have stutter across its entire duration. I've attempted lowering the bitrate and raising the average FPS to compensate, but that's more of a band-aid than a fix. My assumption is that frames are being dropped, causing the stutter, as a result of some buffer (bucket) overflowing?
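For reference, the sink writer creation looks roughly like this (a simplified sketch; variable names and the output path are illustrative, not my exact code):
IMFAttributes* writerAttributes = nullptr;
MFCreateAttributes(&writerAttributes, 2);
writerAttributes->SetUINT32(MF_SINK_WRITER_DISABLE_THROTTLING, TRUE);       // don't let the writer throttle the capture loop
writerAttributes->SetUINT32(MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS, TRUE); // allow hardware H.264 encoders
IMFSinkWriter* sinkWriter = nullptr;
MFCreateSinkWriterFromURL(L"capture.mp4", nullptr, writerAttributes, &sinkWriter);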
Is anyone aware of a fix for this issue of frame drops/stuttering in the final video file while retaining the H.264 encoding format?
EDIT: I've also tinkered with the various attributes on the input and output types, for example setting
hr = MFSetAttributeRatio(pMediaTypeIN/OUT, MF_MT_FRAME_RATE_RANGE_MIN, 60, 1); but to no avail.
Related
I am trying to write frames (cv::Mat) into a video file using cv::VideoWriter. I need maximum speed (latency should be minimal), so I was trying to use GStreamer with x264 encoding. I wrote the following code:
std::string command = "appsrc ! videoconvert ! x264enc ! filesink location=output.mp4";
auto writer_ =
cv::VideoWriter(command, cv::CAP_GSTREAMER, 0, frameRate_, cv::Size(frameWidth_, frameHeight_), true);
// frameRate_ = 25
// frameWidth_ = 1920
// frameHeight_ = 1080
//...
// for example
// auto frame = cv::Mat(cv::Size(1920, 1080), CV_8UC3, cv::Scalar(0, 0, 0));
writer_.write(frame);
Everything works fine, but the output video has very low quality (not in terms of resolution, the resolution is still the same). The frames are pixelated. I have searched the Internet but could not find the reason why this is happening.
What can be the reason for this?
Suggestions on a faster video writing method (in OpenCV) are also appreciated!
Edit: I tried @Micka's suggestion and changed the bitrate (to >10000 to achieve the required quality), but the latency increased significantly. Is there a faster way to save videos without losing much quality?
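For illustration, this kind of change can be expressed directly in the pipeline string. The sketch below uses x264enc's standard bitrate (in kbit/s), speed-preset and tune properties and adds an MP4 muxer; the exact values and element choices are guesses, not a recommendation:
std::string command =
    "appsrc ! videoconvert ! x264enc bitrate=8000 speed-preset=ultrafast tune=zerolatency "
    "! h264parse ! mp4mux ! filesink location=output.mp4";
auto writer_ = cv::VideoWriter(command, cv::CAP_GSTREAMER, 0, frameRate_, cv::Size(frameWidth_, frameHeight_), true);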
I am trying to record an animation that I created using DirectX 11 as a video, which I can present whenever needed (without re-rendering). I am still learning about DirectX and the Windows API.
This is what I've done so far: I was able to capture animation frames using DirectXTK, following this post. After that I'm using OpenCV to collect the frames from disk and create a video. Is there a way to merge this process, so that I could append frames to a video file right after each capture?
Code for animation capture:
static int Frame_Number;
void D3D::screenCapture() {
//For each Call to Present() do the following:
//Get Device
//ID3D11Device* baseDevice;
HRESULT gd = m_swapChain->GetDevice(__uuidof(ID3D11Device), (void**)&m_device);
assert(gd == S_OK);
//Get context
//ID3D11DeviceContext* context;
m_device->GetImmediateContext(&m_deviceContext);
//get pointer to back buffer
ID3D11Texture2D* backbufferTex;
HRESULT gb = m_swapChain->GetBuffer(0, __uuidof(ID3D11Texture2D), (LPVOID*)&backbufferTex);
assert(gb == S_OK);
//Set-up Directory
std::wstringstream Image_Directory;
Image_Directory << L"path to directory/screenShots" << Frame_Number << L".JPG";
//Capture Frame
// GUID_ContainerFormatJpeg is normally provided by <wincodec.h>; declared here explicitly
static const GUID GUID_ContainerFormatJpeg = { 0x19e4a5aa, 0x5662, 0x4fc5, { 0xa0, 0xc0, 0x17, 0x58, 0x2, 0x8e, 0x10, 0x57 } };
HRESULT hr = DirectX::SaveWICTextureToFile(m_deviceContext, backbufferTex, GUID_ContainerFormatJpeg, Image_Directory.str().c_str());
assert(hr == S_OK);
backbufferTex->Release(); // GetBuffer AddRef'd the back buffer texture
Frame_Number = Frame_Number + 1;
}
I call this function after I present the rendered scene to the screen. After that I use a Python script to create a video from the captured frames.
This is not optimal, especially when rendering many animations; it would take forever. I would like to eliminate the reading from and writing to disk. Is there a way to get the frames that currently go through SaveWICTextureToFile so that I can push them into a video sequentially?
How could one accomplish this?
I would really appreciate any help or pointers.
Possible, but relatively hard; it takes many pages of code.
Here's a tutorial written by Microsoft. You are going to need to change the following there.
Integrate with Direct3D. To do that, call MFCreateDXGIDeviceManager, then IMFDXGIDeviceManager::ResetDevice, then pass that IMFDXGIDeviceManager interface in the MF_SINK_WRITER_D3D_MANAGER attribute when creating the sink writer. Also, don't forget to set MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS to TRUE; you don't want software encoders, they are way too slow.
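A rough sketch of that wiring (assuming device is an ID3D11Device* created with D3D11_CREATE_DEVICE_VIDEO_SUPPORT; names and the output path are illustrative):
UINT resetToken = 0;
IMFDXGIDeviceManager* dxgiManager = nullptr;
MFCreateDXGIDeviceManager(&resetToken, &dxgiManager);
dxgiManager->ResetDevice(device, resetToken);                 // hand your ID3D11Device to Media Foundation

IMFAttributes* attrs = nullptr;
MFCreateAttributes(&attrs, 2);
attrs->SetUnknown(MF_SINK_WRITER_D3D_MANAGER, dxgiManager);   // let the sink writer's MFTs use the GPU
attrs->SetUINT32(MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS, TRUE);

IMFSinkWriter* writer = nullptr;
MFCreateSinkWriterFromURL(L"output.mp4", nullptr, attrs, &writer);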
Replace the video codec: use MFVideoFormat_H264 instead of MFVideoFormat_WMV3, and a *.mp4 extension for the output file.
The sample code encodes video frames provided in system memory. Instead, you should supply your video frames in VRAM. Every time your D3D app renders a frame, create a new D3D texture, copy your render target into that new texture with CopyResource, then call MFCreateDXGISurfaceBuffer. This will create an IMFMediaBuffer object referencing a video frame in VRAM. You can then submit that video frame to the sink writer, and it should do the rest of the things automagically.
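A sketch of that per-frame path, assuming renderTarget is the ID3D11Texture2D you just rendered into and frameDuration is in 100-ns units (error handling omitted; illustrative only):
// Copy the render target so the encoder reads a stable texture
D3D11_TEXTURE2D_DESC desc = {};
renderTarget->GetDesc(&desc);
ID3D11Texture2D* frameTex = nullptr;
device->CreateTexture2D(&desc, nullptr, &frameTex);
context->CopyResource(frameTex, renderTarget);

// Wrap the VRAM texture in a media buffer and a sample
IMFMediaBuffer* buffer = nullptr;
MFCreateDXGISurfaceBuffer(__uuidof(ID3D11Texture2D), frameTex, 0, FALSE, &buffer);
IMF2DBuffer* buffer2d = nullptr;
buffer->QueryInterface(IID_PPV_ARGS(&buffer2d));
DWORD length = 0;
buffer2d->GetContiguousLength(&length);   // report the frame size to the pipeline
buffer->SetCurrentLength(length);
buffer2d->Release();

IMFSample* sample = nullptr;
MFCreateSample(&sample);
sample->AddBuffer(buffer);
sample->SetSampleTime(frameIndex * frameDuration);  // 100-ns units
sample->SetSampleDuration(frameDuration);
writer->WriteSample(videoStreamIndex, sample);

sample->Release();
buffer->Release();
frameTex->Release();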
If you manage to implement that correctly, the Media Foundation framework is going to use a proprietary hardware transform to convert RGBA textures into NV12 textures on the GPU, then use another proprietary hardware transform to encode NV12 into H.264, download the encoded video samples from VRAM to system RAM as soon as they are ready, and append these encoded samples to the MPEG-4 container file on disk.
Both of the above transforms are implemented by the GPU, in hardware. All three GPU vendors have hardware for that, and they ship Media Foundation transform DLLs that use their hardware as part of their GPU drivers.
I use FFmpeg to record videos from an RTSP stream (the codec is H.264). It works, but I face a problem with the bitrate value. First, I set the bitrate like below, but it doesn't work:
AVCodecContext *m_c; // allocated and opened elsewhere
m_c->bit_rate = bitrate_value;
Following this question I can set bitrate manually with this command:
av_opt_set(m_c->priv_data, "crf", "39", AV_OPT_SEARCH_CHILDREN);
But I have to test several times to choose the value '39', which creates acceptable video quality. It's hard to do that again if I use another camera setting (image width, height, etc.). Is there a way to set the bitrate more easily, and adaptively?
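For reference, a minimal sketch of the conventional bitrate route with libx264, assuming m_c comes from avcodec_alloc_context3 and everything is set before avcodec_open2 (numbers are illustrative, not recommendations):
AVCodecContext *m_c = avcodec_alloc_context3(codec); // codec = avcodec_find_encoder(AV_CODEC_ID_H264)
m_c->width = width;
m_c->height = height;
m_c->time_base.num = 1;
m_c->time_base.den = 25;
m_c->pix_fmt = AV_PIX_FMT_YUV420P;
m_c->bit_rate = 4000000;       // target average bitrate in bits/s; must be set before avcodec_open2
m_c->rc_max_rate = 6000000;    // optional peak cap
m_c->rc_buffer_size = 8000000; // VBV buffer the cap is enforced against
av_opt_set(m_c->priv_data, "preset", "veryfast", 0); // x264 speed/quality trade-off
// Note: if "crf" is also set, libx264 switches to CRF mode and ignores bit_rate.
avcodec_open2(m_c, codec, NULL);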
I'm working on a project with OpenCV and C++. The OpenCV version is 3.1. The HW setup is an Nvidia GT460 and an Intel i7 3820 with 64 GB RAM. I'm trying to achieve a multiple-camera setup where all camera feeds are merged into one big mosaic, in the early stages maybe 4x4, later even bigger. After that I will be analyzing this mosaic and tracking multiple objects.
The problem is that when I create a camera feed with the capture command in OpenCV, store it in a matrix, analyze it, and show it, there is already a big FPS issue with just two camera feeds. I have tested three USB feeds as well as multiple UDP or RTSP streams. When using USB, delay is not the biggest problem, but the FPS seems to be split between the feeds. The streaming method gives me low FPS and high delay (around 15 seconds). I also noticed there are different delays between the camera feeds even when the cameras are pointed at the same thing.
Is there anybody who could help me or who has solved a similar problem?
Is it a problem with OpenCV that it cannot analyze multiple live feeds simultaneously?
Here's my merging code:
merged_frame = Mat(Size(1280, 960), CV_8UC3);
roi = Mat(merged_frame, Rect(0, 0, 640, 480));
cameraFeed.copyTo(roi);
roi = Mat(merged_frame, Rect(640, 0, 640, 480));
cameraFeed2.copyTo(roi);
roi = Mat(merged_frame, Rect(0, 480, 640, 480));
cameraFeed3.copyTo(roi);
roi = Mat(merged_frame, Rect(640, 480, 640, 480));
cameraFeed4.copyTo(roi);
There exist two functions, hconcat and vconcat, that are not in the documentation.
You can see an example of their use (which is quite easy if all your camera feeds provide frames with the same resolution) here.
This will probably require you to create temporary Mat objects to store intermediate results, but I think it's a more intuitive way to create a mosaic of frames.
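A minimal sketch of that approach, assuming all four feeds are 640x480 CV_8UC3 frames (the temporary Mats hold the intermediate rows):
cv::Mat topRow, bottomRow, mergedFrame;
cv::hconcat(cameraFeed, cameraFeed2, topRow);     // left | right
cv::hconcat(cameraFeed3, cameraFeed4, bottomRow);
cv::vconcat(topRow, bottomRow, mergedFrame);      // stack the two rows into the 1280x960 mosaic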
I have a workflow as follows:
Get raw YUV frame.
Pass it in to Windows Media Foundation to encode in to an H.264 frame.
Convert the output to an FFmpeg AVPacket.
Inject the packet with av_interleaved_write_frame to an output file mixed with other things.
On Windows 7, this worked great. On Windows 8, av_interleaved_write_frame broke. The reason for this is that Windows 8 introduced B-Frames to the output, which av_interleaved_write_frame just didn't like, no matter how I set the pts/dts.
I modified the encoder to use 0 B-Frames, which then gave me the output I wanted. But...
After about 10-15 seconds of encoded frames, the video degrades from nearly perfect to extremely blocky and a very low frame rate. I've tried changing most of the settings available to Windows 8 to modify the encoder, but nothing seems to help.
The only thing that did make a difference was changing the bitrate of the encoder. The more I increase the encoder bitrate, the longer the video goes before it starts to degrade.
Any ideas on what changed between Windows 7 and Windows 8 that may have caused this to happen?
Encoder Setup Minus Success Checks (They All Succeed)
IMFMediaType * mOutputType = NULL;
MFCreateMediaType(&mOutputType);
mOutputType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
mOutputType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
mOutputType->SetUINT32(MF_MT_AVG_BITRATE, 10000000);
MFSetAttributeSize(mOutputType, MF_MT_FRAME_SIZE, frameWidth, frameHeight);
MFSetAttributeRatio(mOutputType, MF_MT_FRAME_RATE, 30, 1);
MFSetAttributeRatio(mOutputType, MF_MT_FRAME_RATE_RANGE_MAX, 30, 1);
MFSetAttributeRatio(mOutputType, MF_MT_FRAME_RATE_RANGE_MIN, 15, 1);
mOutputType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
mOutputType->SetUINT32(MF_MT_ALL_SAMPLES_INDEPENDENT, 1);
mOutputType->SetUINT32(MF_MT_FIXED_SIZE_SAMPLES, 1);
mOutputType->SetUINT32(MF_MT_SAMPLE_SIZE, frameWidth * frameHeight * 2);
mOutputType->SetUINT32(MF_MT_MPEG2_PROFILE, eAVEncH264VProfile_Main);
mOutputType->SetUINT32(CODECAPI_AVEncCommonRateControlMode, eAVEncCommonRateControlMode_Quality);
mOutputType->SetUINT32(CODECAPI_AVEncCommonQuality, 80);
Encoding of the frame is basically:
MFCreateMemoryBuffer to store the incoming YUV data.
MFCreateSample.
Attach the buffer to the sample.
Set the sample time and duration.
ProcessInput.
ProcessOutput with the proper output size.
On success, build an AVPacket from the sample's info (sketched below).
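A rough sketch of that per-frame loop, assuming encoder is the H.264 IMFTransform, pYuv/frameSize describe the raw frame, and times are in 100-ns units (stream IDs and error handling simplified; this is illustrative, not the original code):
// Wrap the raw YUV frame in a media buffer and sample
IMFMediaBuffer* inBuffer = nullptr;
MFCreateMemoryBuffer(frameSize, &inBuffer);
BYTE* dst = nullptr;
inBuffer->Lock(&dst, nullptr, nullptr);
memcpy(dst, pYuv, frameSize);
inBuffer->Unlock();
inBuffer->SetCurrentLength(frameSize);

IMFSample* inSample = nullptr;
MFCreateSample(&inSample);
inSample->AddBuffer(inBuffer);
inSample->SetSampleTime(frameIndex * frameDuration);   // 100-ns units
inSample->SetSampleDuration(frameDuration);

encoder->ProcessInput(0, inSample, 0);

// Pull the encoded H.264 sample back out
MFT_OUTPUT_DATA_BUFFER outData = {};
outData.pSample = outSample;   // pre-allocated IMFSample whose buffer matches the negotiated output size
DWORD status = 0;
HRESULT hr = encoder->ProcessOutput(0, 1, &outData, &status);
if (SUCCEEDED(hr)) {
    // Copy the encoded bytes into an AVPacket (av_new_packet + memcpy) and set pts/dts before muxing
}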