Using libVLC as a video decoder - c++

I'm attempting to use libVLC as a video decoder for a motion detection project. Previously I was using ffmpeg libraries, but some issues with Matroska files brought me here. Along with playing video back at the correct rate, I also want to be able to get one frame after another at the fastest rate my system can handle, as once the user sets up some parameters, I want the motion detection algorithm to run through the video as quickly as it can. My libVLC setup code looks like this (error handling and minor details omitted for brevity):
const char* vlc_argv[] =
{
"--no-audio", /* skip any audio track */
};
libvlc_instance_t* inst = libvlc_new(sizeof(vlc_argv) / sizeof(*vlc_argv), vlc_argv);
auto media = libvlc_media_new_path (inst, filename.c_str());
player = libvlc_media_player_new_from_media(media);
libvlc_media_release(media);
// Needed to initialize the player ?
libvlc_media_player_play(player);
libvlc_media_player_pause(player);
fps = libvlc_media_player_get_fps(player);
length = libvlc_media_player_get_length(player);
width = libvlc_video_get_width(player);
height = libvlc_video_get_height(player);
// TODO: Add libvlc_video_set_callbacks to set up callbacks to render to memory buffer
However, I am left with the following questions:
Is there a more straightforward way to initialize the media player without starting playback besides calling libvlc_media_player_play then libvlc_media_player_pause?
All of the get functions (fps, length, width, height) return zero. Do I need to do something like reading the first frame to get these values, and if so, how am I supposed to know how large my decoded frame buffer must be?

From a VLC developer:
The regular playback system is really not meant for unpaced decoding. You'd need to use stream output, for which there is no programmable API as yet.
The get calls return zero because you need to wait until the tracks are created.
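One hedged workaround for the second point, based on the developer's note above: start playback, then poll until the video track actually exists before reading the properties. The 10 ms poll interval and the use of libvlc_media_player_set_pause are my choices, not something the answer prescribes.
// Sketch only: wait for the video track to be created before reading properties.
// Needs <thread> and <chrono> for the sleep.
libvlc_media_player_play(player);
unsigned w = 0, h = 0;
while (libvlc_video_get_size(player, 0, &w, &h) != 0 || w == 0)
    std::this_thread::sleep_for(std::chrono::milliseconds(10)); // track not ready yet
libvlc_media_player_set_pause(player, 1); // stop advancing once the track exists
fps = libvlc_media_player_get_fps(player);
length = libvlc_media_player_get_length(player);
width = w;
height = h;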

Related

Capture default audio stream with ALSA in C++

I am doing a fun project to change the color of Philips Hue bulbs based on the sound coming from the default ALSA device.
I want to write a small C++ program that captures and analyzes the default audio stream, splits it into three bands (low, mid, and high), and then assigns those bands to red, green, and blue.
I have been reading about how to open ALSA devices, but I am struggling to figure out how to capture streams with ALSA. This is the first time I have worked with audio and ALSA. I am trying to avoid Python for now, as I want to learn a bit more.
If you believe it is not worth writing this in C++, I will do it in Python.
This answer is broken into two parts. The first part discusses how to take the audio data and map it onto "bits" used to set the LED brightness. The second part discusses how to use C++ to read audio data from the ALSA sound card.
Part 1
One idea for splitting into RGB: work out how to convert the audio samples into a 24-bit representation in a "perceptual" manner. As we hear nonlinearly, you probably want to take the logarithm of the audio data. Because the audio data is both positive and negative, you probably want to do this on its absolute value. Finally, for each buffer read from the ADC audio input, you probably want to take the RMS first (which handles the absolute value for you).
So the steps in processing would be:
Capture the audio buffer
Take the RMS for each column of the audio buffer (each column is an audio channel).
Take the logarithm of the RMS value for each column.
Work out how to map each channel's log(RMS) value onto the LEDs. One idea is to use log base 2 (log2) of the RMS of the audio data: with 32-bit samples that gives you 32 bits worth of dynamic range, which you can divide down (shift right by 8) to get a 24-bit representation. Then work out how to map these 24 bits onto the LEDs to achieve your aim.
For example, in pseudocode:
float loudness = log2(RMS(buffer));
if (loudness > pow(2., 16.))
    setTheRedLED(loudness / pow(2., 16.));
else if (loudness > pow(2., 8.))
    setTheBlueLED(loudness / pow(2., 8.));
else
    setTheGreenLED(loudness);
Part 2
You can use gtkiostream to implement C++ classes for handling audio with ALSA.
For example this ALSA::Capture class allows you to capture audio for processing.
To use it, you include it in your code:
#include "ALSA/ALSA.H"
using namespace ALSA;
Then you can stream audio into a matrix (the matrix columns are audio channels). First, however, you instantiate the class in your C++ code:
Capture capture("hw:0"); // to open the device hw:0 you could use "default" or another device
// you can now reset params if you don't want to use the default, see here : https://github.com/flatmax/gtkiostream/blob/master/applications/ALSACapture.C#L82
capture.setParams(); // set the parameters
if (!capture.prepared()){
    cout<<"should be prepared, but isn't"<<endl;
    return -1;
}
// now define the audio buffer you want to use for signal processing
// latency (frames per read) and chCnt (channel count) are values you choose to match your device
int latency = 2048, chCnt = 2;
Eigen::Array<int, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor> buffer(latency, chCnt);
// start capturing
int res;
if ((res = capture.start()) < 0) // start the device capturing
    ALSADebug().evaluateError(res);
// format and pSize here come from the parameter setup step (see the linked ALSACapture example)
cout<<"format "<<capture.getFormatName(format)<<endl;
cout<<"channels "<<capture.getChannels()<<endl;
cout<<"period size "<<pSize<<endl;
// now run an infinite loop that captures audio and then processes it to do what you want
while (true){
    capture>>buffer; // capture the audio into the buffer
    // process the audio in the buffer to separate it out into red, green and blue
}
A more complete capture example is available here.
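Tying this back to Part 1, here is a hedged sketch of the per-column RMS and log2 steps applied to the Eigen buffer above; the floor of 1.0 (to avoid log2 of zero) and the use of <cmath> functions are my choices, not part of the library.
// Sketch only: per-channel RMS and log2 loudness from the captured buffer.
// buffer is the frames-by-channels Eigen array filled by capture>>buffer.
Eigen::ArrayXXf samples = buffer.cast<float>();                             // frames x channels
Eigen::ArrayXf meanSquare = samples.square().colwise().mean().transpose();  // one value per channel
for (int ch = 0; ch < meanSquare.size(); ++ch) {
    float rms = std::sqrt(meanSquare[ch]);
    float loudness = std::log2(std::max(rms, 1.0f)); // avoid log2(0)
    // map loudness onto the red/green/blue ranges as in Part 1
}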

Synchronizing FFMPEG video frames using PTS

I'm attempting to synchronize the frames decoded from an MP4 video. I'm using the FFMPEG libraries. I've decoded and stored each frame and successfully displayed the video on an OpenGL plane.
I've started a timer just before cycling through the frames; the aim being to synchronize the Video correctly. I then compare the PTS of each frame against this timer. I stored the PTS received from the packet during decoding.
What is displayed within my application does not seem to play at the rate I expect. It plays faster than the original video file would within a media player.
I am inexperienced with FFMPEG and programming video in general. Am I tackling this the wrong way?
Here is an example of what I'm attempting to do:
FrameObject frameObject = frameQueue.front();
AVFrame frame = *frameObject.pFrame;
videoClock += dt;
if(videoClock >= globalPTS)
{
    //Draw the Frame to a texture
    DrawFrame(&frame, frameObject.m_pts);
    frameQueue.pop_front();
    globalPTS = frameObject.m_pts;
}
Please note I'm using C++, Windows, OpenGL, FFMPEG and the VS2010 IDE.
First off, use int64_t pts = av_frame_get_best_effort_timestamp(pFrame); to get the pts. Second, you must make sure both streams you are syncing use the same time base. The easiest way to do this is to convert everything to AV_TIME_BASE_Q:
pts = av_rescale_q(pts, formatCtx->streams[videoStream]->time_base, AV_TIME_BASE_Q);
In this format, pts is in microseconds.
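A minimal sketch of that approach, pacing presentation against a wall clock started when playback began; the variable names (startPts, startTimeUs) are mine, DrawFrame is the asker's function, and av_gettime()/av_usleep() come from libavutil/time.h.
// Assumed state, set once before the decode loop:
//   int64_t startPts = -1;                // pts of the first displayed frame
//   int64_t startTimeUs = av_gettime();   // wall-clock time when playback began
int64_t pts = av_frame_get_best_effort_timestamp(pFrame);                            // stream time_base units
pts = av_rescale_q(pts, formatCtx->streams[videoStream]->time_base, AV_TIME_BASE_Q); // now in microseconds
if (startPts < 0)
    startPts = pts;
int64_t dueUs = pts - startPts;                  // when this frame should be shown
int64_t elapsedUs = av_gettime() - startTimeUs;  // how long playback has actually run
if (elapsedUs < dueUs)
    av_usleep(dueUs - elapsedUs);                // wait until the frame is due
DrawFrame(pFrame, pts);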

Writing variable framerate videos in openCV

The steps I follow for writing a video file in openCV are as follows:
CvVideoWriter *writer = cvCreateVideoWriter(fileName, Codec ID, frameRate, frameSize); // Create Video Writer
cvWriteFrame(writer, frame); // Write frame
cvReleaseVideoWriter(&writer); // Release video writer
The above code snippet writes at a fixed frame rate. I need to write out variable frame rate videos. The approach I had used earlier with libx264 involved writing individual timestamps to each frame.
So, the question is: how do I write timestamps to a frame in OpenCV, and what is the specific API? More generally, how do I create variable frame rate videos?
I don't think it is possible to do this with OpenCV directly without modifying the code to give access under the hood. You would need to use a different library such as libvlc, using its imem module to get your raw RGB frames from OpenCV into a file. This link provides an example using imem with raw images loaded from OpenCV. You would just need to change the :sout options to save to the file you want using your preferred codec.
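For reference, the :sout chain is passed as a media option; the codec, bitrate, muxer, and output name below are placeholders of my own, not values taken from the linked example.
// Hypothetical example of the kind of :sout chain you would set on the media;
// adjust vcodec/vb/mux/dst to your preferred codec and output file.
libvlc_media_add_option(media,
    ":sout=#transcode{vcodec=h264,vb=2000}:standard{access=file,mux=mp4,dst=output.mp4}");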

Whole screen capture and render in DirectX [PERFORMANCE]

I need some way to get the screen data and pass it to a DX9 surface/texture in my application, and render it at at least 25 fps at 1600*900 resolution; 30 would be better.
I tried BitBlt, but even that leaves me at 20 fps, and after loading the data into a texture and rendering it I am at 11 fps, which is far behind what I need.
GetFrontBufferData is out of the question.
Here is something about using the Windows Media API, but I am not familiar with it. The sample saves the data straight into a file; maybe it can be set up to give you individual frames, but I haven't found good enough documentation to try it on my own.
My code:
m_memDC.BitBlt(0, 0, m_Rect.Width(),m_Rect.Height(), //m_Rect is area to be captured
&m_dc, m_Rect.left, m_Rect.top, SRCCOPY);
//at 20-25fps after this if I comment out the rest
//DC,HBITMAP setup and memory alloc is done once at the begining
GetDIBits( m_hDc, (HBITMAP)m_hBmp.GetSafeHandle(),
0L, // Start scan line
(DWORD)m_Rect.Height(), // # of scan lines
m_lpData, // LPBYTE
(LPBITMAPINFO)m_bi, // address of bitmapinfo
(DWORD)DIB_RGB_COLORS); // Use RGB for color table
//at 17-20fps
IDirect3DSurface9 *tmp;
m_pImageBuffer[0]->GetSurfaceLevel(0,&tmp); //m_pImageBuffer is Texture of same
//size as bitmap to prevent stretching
hr= D3DXLoadSurfaceFromMemory(tmp,NULL,NULL,
(LPVOID)m_lpData,
D3DFMT_X8R8G8B8,
m_Rect.Width()*4,
NULL,
&r, //SetRect(&r,0,0,m_Rect.Width(),m_Rect.Height();
D3DX_DEFAULT,0);
//12-14fps
IDirect3DSurface9 *frameS;
hr=m_pFrameTexture->GetSurfaceLevel(0,&frameS); // Texture of that is rendered
pd3dDevice->StretchRect(tmp,NULL,frameS,NULL,D3DTEXF_NONE);
//11fps
I found out that for a 512*512 square it runs at 30 fps (e.g. 490*450 at 20-25), so I tried dividing the screen, but it didn't seem to work well.
If there is something missing in the code, please say so rather than voting down. Thanks.
Starting with Windows 8, there is a new desktop duplication API that can be used to capture the screen in video memory, including mouse cursor changes and which parts of the screen actually changed or moved. This is far more performant than any of the GDI or D3D9 approaches out there and is really well-suited to doing things like encoding the desktop to a video stream, since you never have to pull the texture out of GPU memory. The new API is available by enumerating DXGI outputs and calling DuplicateOutput on the screen you want to capture. Then you can enter a loop that waits for the screen to update and acquires each frame in turn.
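As a hedged sketch of that loop (not code from the answer): the snippet below assumes Windows 8+, a freshly created D3D11 device, the primary output, and a 500 ms wait timeout, and it omits all error handling and cleanup.
// Sketch only: duplicate the primary output and acquire desktop frames in a loop.
#include <d3d11.h>
#include <dxgi1_2.h>

ID3D11Device* device = nullptr;
ID3D11DeviceContext* context = nullptr;
D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                  nullptr, 0, D3D11_SDK_VERSION, &device, nullptr, &context);

IDXGIDevice* dxgiDevice = nullptr;
device->QueryInterface(__uuidof(IDXGIDevice), (void**)&dxgiDevice);
IDXGIAdapter* adapter = nullptr;
dxgiDevice->GetAdapter(&adapter);
IDXGIOutput* output = nullptr;
adapter->EnumOutputs(0, &output); // first monitor
IDXGIOutput1* output1 = nullptr;
output->QueryInterface(__uuidof(IDXGIOutput1), (void**)&output1);

IDXGIOutputDuplication* duplication = nullptr;
output1->DuplicateOutput(device, &duplication);

for (;;) {
    DXGI_OUTDUPL_FRAME_INFO info;
    IDXGIResource* resource = nullptr;
    HRESULT hr = duplication->AcquireNextFrame(500, &info, &resource);
    if (hr == DXGI_ERROR_WAIT_TIMEOUT)
        continue; // nothing on screen changed within the timeout
    if (FAILED(hr))
        break;
    ID3D11Texture2D* frame = nullptr;
    resource->QueryInterface(__uuidof(ID3D11Texture2D), (void**)&frame);
    // frame now holds the desktop image in GPU memory; hand it to your encoder here
    frame->Release();
    resource->Release();
    duplication->ReleaseFrame();
}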
To encode the frames to a video, I'd recommend taking a look at Media Foundation. Take a look specifically at the Sink Writer for the simplest method of encoding the video frames. Basically, you just have to wrap the D3D textures you get for each video frame into IMFSample objects. These can be passed directly into the sink writer. See the MFCreateDXGISurfaceBuffer and MFCreateVideoSampleFromSurface functions for more information. For the best performance, typically you'll want to use a codec like H.264 that has good hardware encoding support (on most machines).
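A minimal sketch of the IMFSample wrapping step described above; sinkWriter and videoStreamIndex stand for a sink writer you have already configured (e.g. for H.264), and the 60 fps sample duration is an assumption of mine.
// Sketch only: wrap a desktop duplication texture in an IMFSample and hand it to the sink writer.
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>

void WriteDesktopFrame(IMFSinkWriter* sinkWriter, DWORD videoStreamIndex,
                       ID3D11Texture2D* desktopTexture, LONGLONG frameTime100ns)
{
    IMFMediaBuffer* buffer = nullptr;
    // Wrap the GPU texture in a media buffer without copying it back to system memory.
    MFCreateDXGISurfaceBuffer(__uuidof(ID3D11Texture2D), desktopTexture, 0, FALSE, &buffer);

    IMFSample* sample = nullptr;
    MFCreateSample(&sample);
    sample->AddBuffer(buffer);
    sample->SetSampleTime(frameTime100ns);    // timestamps are in 100 ns units
    sample->SetSampleDuration(10000000 / 60); // assume ~60 fps capture

    sinkWriter->WriteSample(videoStreamIndex, sample);

    sample->Release();
    buffer->Release();
}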
For full disclosure, I work on the team that owns the desktop duplication API at Microsoft, and I've personally written apps that capture the desktop (and video, games, etc.) to a video file at 60fps using this technique, as well as a lot of other scenarios. This is also used to do screen streaming, remote assistance, and lots more within Microsoft.
If you don't like the FrontBuffer, try the BackBuffer:
LPDIRECT3DSURFACE9 surface;
surface = GetBackBufferImageSurface(&fmt);
to save it to a file use
D3DXSaveSurfaceToFile(filename, D3DXIFF_JPG, surface, NULL, NULL);

An efficient way to buffer HD video real-time without maxing out memory

I am writing a program that involves real-time processing of video from a network camera using OpenCV. I want to be able to capture (at any time during processing) previous images (e.g. say ten seconds worth) and save to a video file.
I am currently doing this using a queue as a buffer (pushing cv::Mat data), but this is obviously not efficient, as a few seconds' worth of images soon uses up all the PC's memory. I tried compressing images using cv::imencode, but that doesn't make much difference with PNG; I need a solution that uses hard-drive space and is efficient for real-time operation.
Can anyone suggest a very simple and efficient solution?
EDIT:
Just so that everyone understands what I'm doing at the moment, here's the code for a 10-second buffer:
void run()
{
    cv::VideoCapture cap(0);
    double fps = cap.get(CV_CAP_PROP_FPS);
    int buffer_length = 10; // in seconds
    int wait = 1000.0/fps;
    QTime time;
    forever{
        time.restart();
        cv::Mat image;
        bool read = cap.read(image);
        if(!read)
            break;
        bool locked = _mutex.tryLock(10);
        if(locked){
            if(image.data){
                _buffer.push(image);
                if((int)_buffer.size() > (fps*buffer_length))
                    _buffer.pop();
            }
            _mutex.unlock();
        }
        int time_taken = time.elapsed();
        if(time_taken < wait)
            msleep(wait - time_taken);
    }
    cap.release();
}
queue<cv::Mat> _buffer and QMutex _mutex are global variables. If you're familiar with Qt, signals and slots, etc.: I've got a slot that grabs the buffer and saves it as a video using cv::VideoWriter.
EDIT:
I think the ideal solution would be for my queue<cv::Mat> _buffer to use hard-drive space rather than PC memory. Not sure on which planet this is possible? :/
I suggest looking into real-time compression with x264 or similar. x264 is regularly used for real-time encoding of video streams and, with the right settings, can encode multiple streams or a 1080p video stream on a moderately powered processor.
I suggest asking in doom9's forum or similar forums.
x264 is a free H.264 encoder which can achieve 100:1 or better (vs raw) compression. The output of x264 can be stored in your memory queue with much greater efficiency than uncompressed (or losslessly compressed) video.
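As an illustration of what that encoding path might look like, here is a hedged sketch only: the preset, tuning, resolution, frame rate, and profile below are placeholders of my choosing, and you would still need to convert each cv::Mat to I420 (e.g. with cv::cvtColor) before filling the picture planes.
// Sketch only: encode raw I420 frames with x264 and keep the compressed bytes in a queue.
#include <cstdint>
extern "C" {
#include <x264.h>
}
#include <queue>
#include <vector>

std::queue<std::vector<uint8_t>> compressedBuffer;
x264_t* encoder = nullptr;
x264_picture_t picIn, picOut;

void openEncoder(int width, int height, int fps)
{
    x264_param_t param;
    x264_param_default_preset(&param, "veryfast", "zerolatency");
    param.i_width = width;
    param.i_height = height;
    param.i_fps_num = fps;
    param.i_fps_den = 1;
    param.i_csp = X264_CSP_I420;
    x264_param_apply_profile(&param, "baseline");
    encoder = x264_encoder_open(&param);
    x264_picture_alloc(&picIn, X264_CSP_I420, width, height);
}

// Call once per captured frame after copying its I420 planes into picIn.img.plane[0..2].
void encodeFrame(int64_t frameNumber)
{
    picIn.i_pts = frameNumber;
    x264_nal_t* nals = nullptr;
    int nalCount = 0;
    int bytes = x264_encoder_encode(encoder, &nals, &nalCount, &picIn, &picOut);
    if (bytes > 0) // the NAL payloads are laid out contiguously, so one copy grabs them all
        compressedBuffer.push(std::vector<uint8_t>(nals[0].p_payload, nals[0].p_payload + bytes));
}
// When finished: x264_picture_clean(&picIn); x264_encoder_close(encoder);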
UPDATED
One thing you can do is store the images to the hard disk using imwrite and push their filenames onto the queue. When the queue is full, delete the image files as you pop their filenames.
In your video-writing slot, load the images as they are popped from the queue and write them to your VideoWriter instance.
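A rough sketch of that idea; the JPEG format, file naming scheme, and fixed capacity below are my own choices, not something prescribed by OpenCV.
// Sketch only: a disk-backed ring buffer of filenames instead of cv::Mat objects.
#include <opencv2/opencv.hpp>
#include <queue>
#include <string>
#include <cstdio>

std::queue<std::string> fileBuffer; // filenames stand in for the in-memory frames
const int maxFrames = 10 * 30;      // ~10 seconds at 30 fps; match fps*buffer_length from run()
long frameId = 0;

void pushFrame(const cv::Mat& image)
{
    std::string name = "buffer_" + std::to_string(frameId++) + ".jpg";
    cv::imwrite(name, image);                    // the frame now lives on disk, not in RAM
    fileBuffer.push(name);
    if ((int)fileBuffer.size() > maxFrames) {
        std::remove(fileBuffer.front().c_str()); // drop the oldest frame from disk
        fileBuffer.pop();
    }
}
// In the video-writing slot: pop each filename, cv::imread it, and record.write() it.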
You mentioned you needed to use hard drive memory.
In that case, consider using the OpenCV HighGUI VideoWriter. You can create an instance of VideoWriter as below:
VideoWriter record("RobotVideo.avi", CV_FOURCC('D','I','V','X'),
30, frame.size(), true);
And write captured images to it as below:
record.write(image);
Find the documentation and the sample program on the website.