Can a high-performance libjpeg-turbo implementation decompress/compress in <100ms? - libjpeg

I'm currently implementing a JPEG resizer in C++ using the libjpeg-turbo library.
I've been given a target of 100 milliseconds for JPEG decompression and recompression using the library. The best I can achieve with the recommended optimisation settings (documented in the libjpeg-turbo usage.txt) is around 320ms, so I'm wondering whether 100ms is even possible or realistic. This would be to decompress and recompress a 3000x4000 px image from around 6 MB down to 130 KB.
The code that I'm using for fast decompression is:
dinfo.dct_method = JDCT_IFAST;
dinfo.do_fancy_upsampling = FALSE;
dinfo.two_pass_quantize = FALSE;
dinfo.dither_mode = JDITHER_ORDERED;
dinfo.scale_num = 1/8;

Thanks for the answers.
It is actually possible to decompress and re-compress in around 100ms. After I contacted the author of libjpeg-turbo, he told me that I was using the dinfo.scale_num property incorrectly. This property is only the scale numerator, and written as 1/8 the expression is integer division, which evaluates to 0. I also needed to set the scale_denom (denominator) property.
So the correct code would be:
dinfo.dct_method = JDCT_IFAST;
dinfo.do_fancy_upsampling = FALSE;
dinfo.two_pass_quantize = FALSE;
dinfo.dither_mode = JDITHER_ORDERED;
dinfo.scale_num = 1;
dinfo.scale_denom = 8;
I want the code to be fast enough that the image scaling is imperceptible to the user, since it runs in a client application where speed and user experience matter most.
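To make the scale fix concrete, here is a minimal sketch. `scaledDimension` is a hypothetical helper, not a libjpeg-turbo function, and the round-up rule it uses is my assumption about how the library derives the output size from `scale_num` and `scale_denom`.

```cpp
#include <cassert>

// Hypothetical helper (not part of libjpeg-turbo): size of one output
// dimension for a scale fraction num/denom, rounded up.
long scaledDimension(long dimension, long num, long denom) {
    assert(dimension > 0 && num > 0 && denom > 0);
    return (dimension * num + denom - 1) / denom;
}

// Both operands of 1/8 are ints, so C++ performs integer division.
const int wrongNumerator = 1 / 8;  // evaluates to 0, not 0.125
```

Under that assumption, the 3000x4000 image decodes to 375x500 at 1/8 scale, so the decoder and the subsequent compressor handle 64 times fewer pixels.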

ffmpeg H264 Encode Frame at a time for network streaming

I'm working on a remote desktop application and would like to send encoded H264 packets over TCP, using ffmpeg for the encoding. However, I couldn't find useful information for the particular case of encoding just one frame (already in YUV444) and getting the packet.
I have several issues. The first was that
avcodec_encode_video2
was not blocking. I found that most of the time you get the "delayed" frames at the end. However, since this is real-time streaming, the solution was:
av_opt_set(mCodecContext->priv_data, "tune", "zerolatency", 0);
Now I get the frame, but there are still several issues: encoding takes a while and, even worse, the resulting video is gray with garbage pixels. My configuration for the codec context:
m_pCodecCtx->bit_rate=8000000;
m_pCodecCtx->codec_id=AV_CODEC_ID_H264;
m_pCodecCtx->codec_type = AVMEDIA_TYPE_VIDEO;
m_pCodecCtx->width=1920;
m_pCodecCtx->height=1080;
m_pCodecCtx->pix_fmt=AV_PIX_FMT_YUV444P;
m_pCodecCtx->time_base.num = 1;
m_pCodecCtx->time_base.den = 25;
m_pCodecCtx->gop_size = 1;
m_pCodecCtx->keyint_min = 1;
m_pCodecCtx->i_quant_factor = float(0.71);
m_pCodecCtx->b_frame_strategy = 20;
m_pCodecCtx->qcompress = (float)0.6;
m_pCodecCtx->qmax = 51;
m_pCodecCtx->qmin = 20;
m_pCodecCtx->max_qdiff = 4;
m_pCodecCtx->refs = 4;
m_pCodecCtx->max_b_frames = 1;
m_pCodecCtx->thread_count = 1;
I would like to know how this could be done. How do I set the "I frames"? And what would be the optimal configuration for "one at a time" encoding? I'm not concerned with quality right now; it just needs to be fast enough (under 16 ms).
For the encoding part:
nres = avcodec_encode_video2(m_pCodecCtx,&packet,m_pFrame,&framefinished);
if(nres<0){
qDebug() << "error encoding: " << nres << endl;
}
if(framefinished){
m_pFrame->pts++;
ofstream vidout("video.h264",ios::app);
if(vidout.good()){
vidout.write((const char*)&packet.data[0],packet.size);
}
vidout.close();
av_packet_unref(&packet);
}
I'm not using a container, just a raw file. ffplay can play raw files if the packets are right, and that's my main issue. I'm planning to send the packets over TCP and decode them on the client. Any help would be greatly appreciated.
You could take a look at the source code of WebRTC.
It uses openh264 and ffmpeg to accomplish what you are trying to do.
I studied it for a while, but I can't get the latest source code at the moment.
I found this:
source code.
Hope it helps.
It turns out I had it working from the beginning. I made a very simple but important mistake: I was writing a binary file as text, so...
Thanks for the feedback and your help
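For reference, a minimal sketch of writing packet bytes in binary mode. Presumably the fix amounts to adding `std::ios::binary` when opening the output stream. `appendPacket` and `readFile` are hypothetical helper names, not ffmpeg functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Appends raw bytes to a file. std::ios::binary disables any newline
// translation the platform would apply in text mode.
bool appendPacket(const std::string& path, const std::uint8_t* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::ofstream out(path, std::ios::binary | std::ios::app);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(data), static_cast<std::streamsize>(size));
    return out.good();
}

// Reads a whole file back as bytes (empty vector if it cannot be opened).
std::vector<std::uint8_t> readFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in),
                                     std::istreambuf_iterator<char>());
}
```

With the code from the question, the call would look like `appendPacket("video.h264", packet.data, packet.size)`. In text mode on some platforms a byte such as 0x0A inside the bitstream can be rewritten, which is enough to corrupt an H264 stream.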

opencv videocapture default setting

I am using a MacBook and have a program written in C++ that extracts successive frames from the webcam. The extracted frames are then converted to grayscale and smoothed using OpenCV functions. After that I use CVNorm to find the relative difference between 2 frames. I am using the videoCapture class.
I found that the frame rate is 30fps and that, using CVNorm, the relative difference between successive frames is less than 200 most of the time.
I am trying to do the same thing in Xcode so as to implement the program on an iPad. This time I am using AVCaptureSession. The same steps are performed, but the relative difference between 2 frames is much higher (>600).
I would therefore like to know the default settings of the videoCapture class. I know that I can edit the settings using cvSetCaptureProperty, but I cannot find their default values. I would then compare them with the settings of the AVCaptureSession, hoping to find out why there is such a huge difference in CVNorm between these 2 approaches to extracting frames.
Thanks in advance.
OpenCV's VideoCapture class is just a simple wrapper for capturing video from cameras or reading video files. It is built upon several multimedia frameworks (avfoundation, dshow, ffmpeg, v4l, gstreamer, etc.) and hides them completely from the outside. This is the source of the problem: it is really hard to achieve the same capture behaviour across different platforms and multimedia frameworks. The common set of capture properties is small, and setting their values is a suggestion rather than a requirement.
In summary, the default properties can differ between platforms, but in the case of the AV Foundation framework:
The cvCreateCameraCapture_AVFoundation(int index) function creates a CvCapture instance under iOS, which is defined in cap_qtkit.mm. It seems you cannot change the sampling rate; only CV_CAP_PROP_FRAME_WIDTH, CV_CAP_PROP_FRAME_HEIGHT and DISABLE_AUTO_RESTART are supported.
The grabFrame() implementation is below. I'm absolutely not an Objective-C expert, but it seems to wait until the capture updates the image or a timeout occurs.
bool CvCaptureCAM::grabFrame() {
return grabFrame(5);
}
bool CvCaptureCAM::grabFrame(double timeOut) {
NSAutoreleasePool* localpool = [[NSAutoreleasePool alloc] init];
double sleepTime = 0.005;
double total = 0;
[NSTimer scheduledTimerWithTimeInterval:100 target:nil selector:@selector(doFireTimer:) userInfo:nil repeats:YES];
while (![capture updateImage] && (total += sleepTime)<=timeOut) {
[[NSRunLoop currentRunLoop] runUntilDate:[NSDate dateWithTimeIntervalSinceNow:sleepTime]];
}
[localpool drain];
return total <= timeOut;
}
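The same wait-or-time-out pattern can be sketched in portable C++. This is only an illustration of the loop's logic, not OpenCV code; `pollUntil` and its `ready` callback are hypothetical names standing in for `[capture updateImage]`.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Polls ready() every sleepTime seconds until it returns true or the
// accumulated sleep time exceeds timeOut. Returns false on timeout.
bool pollUntil(const std::function<bool()>& ready, double sleepTime, double timeOut) {
    assert(sleepTime > 0.0);
    double total = 0.0;
    while (!ready() && (total += sleepTime) <= timeOut) {
        std::this_thread::sleep_for(std::chrono::duration<double>(sleepTime));
    }
    return total <= timeOut;
}
```

With the values from the listing (5 ms sleep, 5 s timeout), a frame is picked up at most about 5 ms after it arrives, so the effective frame rate is dictated by the camera and the capture framework rather than by this loop.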

How to reduce latency when streaming x264

I would like to produce a zerolatency live video stream and play it in VLC player with as little latency as possible.
These are the settings I currently use:
x264_param_default_preset( &m_Params, "veryfast", "zerolatency" );
m_Params.i_threads = 2;
m_Params.b_sliced_threads = true;
m_Params.i_width = m_SourceWidth;
m_Params.i_height = m_SourceHeight;
m_Params.b_intra_refresh = 1;
m_Params.b_vfr_input = true;
m_Params.i_timebase_num = 1;
m_Params.i_timebase_den = 1000;
m_Params.i_fps_num = 1;
m_Params.i_fps_den = 60;
m_Params.rc.i_vbv_max_bitrate = 512;
m_Params.rc.i_vbv_buffer_size = 256;
m_Params.rc.f_vbv_buffer_init = 1.1f;
m_Params.rc.i_rc_method = X264_RC_CRF;
m_Params.rc.f_rf_constant = 24;
m_Params.rc.f_rf_constant_max = 35;
m_Params.b_annexb = 0;
m_Params.b_repeat_headers = 0;
m_Params.b_aud = 0;
x264_param_apply_profile( &m_Params, "high" );
Using those settings, I have the following issues:
VLC shows lots of missing frames (see screenshot, "verloren"). I am not sure whether this is an issue.
If I set a value <200ms for the network stream delay in VLC, VLC renders a few frames and then stops decoding/rendering frames.
If I set a value >= 200ms for the network stream delay in VLC, everything looks good so far, but the latency is obviously 200ms, which is too high.
Question:
Which settings (x264lib and VLC) should I use in order to encode and stream with as little latency as possible?
On your x264 settings: many are redundant, i.e. already contained in "zerolatency". However, as best as I can tell, your encoding latency is nevertheless zero frames, i.e. you put one frame in and you immediately (as soon as your CPU has finished encoding it, anyway) get one frame out. It never waits for a newer frame before emitting an encoded older frame (the way it would with lookahead, for example).
On why VLC pauses unless you give it a large network delay: the problem is that your combination of rate control and VBV settings when encoding is not ideal. For low-latency encoding, you want to use CBR and set the VBV buffer to exactly the size of one frame. This enables a special VBV calculation, if you look in the x264 source.
You may also try not setting anything timing related whatsoever (no fps, no vbv) and use CRF with zerolatency. The results would depend on what container the video is packaged in for streaming.
Read this for more info: http://x264dev.multimedia.cx/archives/249
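As a rough sketch of the "CBR with a one-frame VBV buffer" suggestion: the buffer size is simply the bitrate divided by the frame rate. `oneFrameVbvKbit` is a hypothetical helper, and the bitrate of 3000 kbit/s is an example value, not a recommendation. The mapping onto `x264_param_t` in the comments is my reading of the usual way CBR is configured in x264 and should be checked against your x264 version.

```cpp
#include <cassert>

// Hypothetical helper: VBV buffer size (kbit) that holds one frame at a
// constant bitrate, rounded to the nearest kbit and never below 1.
int oneFrameVbvKbit(int bitrateKbps, int fpsNum, int fpsDen) {
    assert(bitrateKbps > 0 && fpsNum > 0 && fpsDen > 0);
    long kbit = (static_cast<long>(bitrateKbps) * fpsDen + fpsNum / 2) / fpsNum;
    return kbit < 1 ? 1 : static_cast<int>(kbit);
}

// Assumed mapping onto x264_param_t (example values):
//   m_Params.rc.i_rc_method       = X264_RC_ABR;
//   m_Params.rc.i_bitrate         = 3000;                          // kbit/s
//   m_Params.rc.i_vbv_max_bitrate = 3000;                          // same as bitrate -> CBR
//   m_Params.rc.i_vbv_buffer_size = oneFrameVbvKbit(3000, 30, 1);  // 100 kbit
```

Note that x264 interprets the frame rate as i_fps_num / i_fps_den, so 60 fps would be i_fps_num = 60 and i_fps_den = 1.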
If you want to have the fastest possible encoding, then delete everything after
x264_param_default_preset( &m_Params, "veryfast", "zerolatency" );
and change veryfast to ultrafast. The rest is because of network delay + decoding.

MSER on Videotracking

I have a timing problem. I have programmed a Qt GUI for image processing. For this application it would be useful to implement blob detectors for video processing and object tracking. In principle it looks good: with the GUI running, processing, grabbing, the MSER operation and displaying take just 0.07 to 0.08 seconds, which would allow a decent frame rate of over 10 fps.
For this I use Qt 4 with C++ on SUSE 12.3, OpenCV 2.4.3 and a laptop webcam. My problem is that after a short while my program hangs.
Looking at my system monitor, I see that CPU usage has reached 100% and that a single run occupies the processor for a long time (even without the GUI). I don't understand what is going wrong. Does anybody have experience with this?
TY in advance!
Some Code snippets:
MSER initialisation from the GUI:
MSER FtMSERVid( MSERDelta, MSERMinArea, MSERMaxArea,MSERMaxVariation ,MSERMinDiversity);
Video processing function:
double startTime = clock();
camDev.read(vidImg);
if(vidImg.empty() == true)
{
newLineInText(tr("No data from device"));
timer->stop();
ui->pbPlay->setText(tr(">"));
return;
}
MSERPointsVid.clear();
if(vidImg.channels() > 1)
cvtColor(vidImg, vidImg,CV_BGR2GRAY);
FtMSERVid(vidImg, MSERPointsVid);
Mat showMat = vidImg.clone();
if(showMat.channels() > 1)
{
cvtColor(showMat,showMat,CV_BGR2RGB);
qImg = QImage((uchar*)showMat.data,showMat.cols,showMat.rows,showMat.step,QImage::Format_RGB888);
}
else if(showMat.channels() == 1)
qImg = QImage((uchar*)showMat.data,showMat.cols,showMat.rows,showMat.step,QImage::Format_Indexed8);
ui->lblOrig->setPixmap(QPixmap::fromImage(qImg));
double endTime = clock();
double timeDuration = (endTime - startTime)/CLOCKS_PER_SEC;
if(numVid%10 == 0)
{
framesPS = int(1/timeDuration) - 1;
if(framesPS > 1)
framesPS = 1;
FPSChanged(framesPS);
numVid = 0;
}
Your hints helped me solve the problem. MSER produces a lot of data, and I had programmed a table, updated once per second and working independently, to display it. So far no problem, but displaying all the points was too much for the table. It was only meant to be filled with the hull points. I changed the corresponding vector, and now it runs smoothly.
I found that out thanks to your hint about valgrind, which I had never needed before. The threading hints also taught me a lot about threading. Thank you for that.
Ingeborg

How do I filter out out-of-hearing-range data from PCM samples using C++?

I have raw 16-bit 48 kHz PCM data. I need to strip all data that is outside the range of human hearing.
For now I'm just summing the absolute values of all samples and dividing by the sample count to calculate the peak sound level, but I need to reduce false positives.
The level is high all the time, and speech and other sounds that I can hear raise it only a little, so I need to implement some filtering. I am not familiar with sound processing at all, so currently I am not using any filtering because I do not understand how to create it. My current code looks like this:
for(size_t i = 0; i < buffer.size(); i++)
level += abs(buffer[i]);
level /= buffer.size();
How can I implement this kind of filtering using C++?
Use a band pass filter.
A band-pass filter is a device that passes frequencies within a
certain range and rejects (attenuates) frequencies outside that range.
This sounds like exactly the sort of filter you are looking for.
I did a quick Google search and found this thread, which discusses an implementation in C++.
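As a starting point, here is a minimal sketch of one common band-pass design, a second-order (biquad) filter with coefficients from the widely circulated "Audio EQ Cookbook" formulas. It is not taken from the linked thread. `BandPass` is a hypothetical name, and the centre frequency and Q are values you would have to choose for your signal.

```cpp
#include <cassert>
#include <cmath>

// Second-order (biquad) band-pass with 0 dB gain at the centre frequency.
struct BandPass {
    double b0, b2, a1, a2;                  // normalised coefficients (b1 is 0)
    double x1 = 0, x2 = 0, y1 = 0, y2 = 0;  // previous inputs and outputs

    BandPass(double sampleRate, double centreHz, double q) {
        assert(sampleRate > 0 && centreHz > 0 && centreHz < sampleRate / 2 && q > 0);
        const double pi = 3.14159265358979323846;
        const double w0 = 2.0 * pi * centreHz / sampleRate;
        const double alpha = std::sin(w0) / (2.0 * q);
        const double a0 = 1.0 + alpha;
        b0 = alpha / a0;
        b2 = -alpha / a0;
        a1 = -2.0 * std::cos(w0) / a0;
        a2 = (1.0 - alpha) / a0;
    }

    // Filters one sample and updates the internal state.
    double process(double x) {
        const double y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = x;
        y2 = y1; y1 = y;
        return y;
    }
};
```

In the level loop from the question, each sample would go through `process()` before `abs` is applied. Keep in mind that 48 kHz data cannot contain anything above 24 kHz, so most of the benefit is likely to come from rejecting DC offset and low-frequency rumble.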
It sounds like you want to do something (maybe start recording) if the sound level goes above a certain threshold. This is sometimes called a "gate". It also sounds like you are having trouble with false positives. This is sometimes handled with a "side-chain" applied to the gate.
The general principle of a gate is to create an envelope of your signal and then monitor the envelope to discover when it goes above a certain threshold. If it is above the threshold, your gate is "on"; if not, your gate is "off". If you treat your signal in some way before creating the envelope, to make it more or less sensitive to various parts of your signal/noise, the treatment is called a "side-chain".
You will have to discover the details on your own because there is too much for a Q&A website, but maybe this is enough of a start:
// needs <cmath> for std::fabs and <vector> for std::vector
std::vector<float> buffer; // defined and filled elsewhere
const float HOLD = .9999f; // there are precise ways to compute this, but experimentation might work fine
const float THRESH = .7f; // or whatever
float env = 0; // we initialize to 0, but in real code be sure to save this between runs
for(size_t i = 0; i < buffer.size(); i++) {
    // side-chain, if used, goes here
    float b = buffer[i];
    // create envelope:
    float tmp = std::fabs(b); // you could also do buffer[i] * buffer[i]
    env = env * HOLD + tmp * (1-HOLD);
    // threshold detection
    if( env > THRESH ) {
        // gate is "on"
    } else {
        // gate is "off"
    }
}
The side-chain might consist of filters like an EQ. Here is a tutorial on designing audio EQs: http://blog.bjornroche.com/2012/08/basic-audio-eqs.html
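Two details from the gate code can be sketched further: a precise way to pick HOLD, and keeping the envelope between buffers. This is only one possible approach. `holdCoefficient` and `Gate` are hypothetical names, and the formula HOLD = exp(-1 / (tau * sampleRate)) is the usual one-pole smoothing relation, with tau as the envelope time constant in seconds.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One common choice: HOLD = exp(-1 / (tau * sampleRate)), where tau is the
// envelope time constant in seconds.
float holdCoefficient(double tauSeconds, double sampleRate) {
    assert(tauSeconds > 0 && sampleRate > 0);
    return static_cast<float>(std::exp(-1.0 / (tauSeconds * sampleRate)));
}

// Hypothetical gate that remembers its envelope between buffers.
struct Gate {
    float hold;
    float thresh;
    float env = 0.0f;

    Gate(float holdValue, float threshold) : hold(holdValue), thresh(threshold) {}

    // Feeds one buffer through the envelope follower; returns true if the
    // gate is "on" after the last sample.
    bool process(const std::vector<float>& buffer) {
        for (std::size_t i = 0; i < buffer.size(); ++i) {
            env = env * hold + std::fabs(buffer[i]) * (1.0f - hold);
        }
        return env > thresh;
    }
};
```

By this relation, HOLD = .9999 at 48 kHz corresponds to a time constant of roughly 0.2 seconds (about 10000 samples). Samples are assumed to be scaled to the range -1 to 1, so 16-bit PCM values would be divided by 32768 first.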