Calculating the median at each pixel for multiple frames OpenCV (C++) - c++

The task is to extract a stable background from a video. The idea is to choose n random frames from the video and take the median of each pixel across them. I was doing this in Python using NumPy:
medianFrame = np.median(frames, axis=0).astype(dtype=np.uint8)
But now I need to perform the same task in C++. I have tried the naive way of splitting channels and going through rows*cols pixels for n frames to calculate the median frame, but it's not efficient at all and takes far more time than np.median did. I also tried to use xtensor for the task but wasn't able to get it to work. Any suggestions or direction on how to approach this would greatly help me, thanks!
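One way that avoids a full sort per pixel is to gather the n samples for each pixel and channel and select the median with std::nth_element, which runs in linear average time; a further speed-up would be per-pixel 256-bin histograms, since the data is 8-bit. Below is a minimal sketch, assuming all frames are CV_8UC3 and the same size (note it takes the upper median for an even n, where np.median would average the two middle values):

    // Minimal sketch: per-pixel median over a stack of 8-bit BGR frames.
    // Assumes all frames share the same size and type (CV_8UC3).
    #include <opencv2/core.hpp>
    #include <algorithm>
    #include <vector>

    cv::Mat medianFrame(const std::vector<cv::Mat>& frames)
    {
        CV_Assert(!frames.empty() && frames[0].type() == CV_8UC3);
        const int rows = frames[0].rows, cols = frames[0].cols;
        const size_t n = frames.size();

        cv::Mat median(rows, cols, CV_8UC3);
        std::vector<uchar> values(n);

        for (int r = 0; r < rows; ++r) {
            for (int c = 0; c < cols; ++c) {
                for (int ch = 0; ch < 3; ++ch) {
                    for (size_t i = 0; i < n; ++i)
                        values[i] = frames[i].at<cv::Vec3b>(r, c)[ch];
                    // nth_element is O(n) on average, cheaper than a full sort
                    std::nth_element(values.begin(), values.begin() + n / 2, values.end());
                    median.at<cv::Vec3b>(r, c)[ch] = values[n / 2];
                }
            }
        }
        return median;
    }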

Related

How to plot a spectrogram from an array (vector, list, etc.) containing raw data?

I have been working to find the temporal displacement between audio signals using a spectrogram. I have a short array containing data of a sound wave (pulses at specific frequencies). Now I want to plot a spectrogram from that array. I have followed these steps (Spectrogram C++ library):
It would be fairly easy to put together your own spectrogram. The steps are:
window function (fairly trivial, e.g. Hanning)
FFT (FFTW would be a good choice, but if licensing is an issue then go for Kiss FFT or similar)
calculate the log magnitude of the frequency-domain components (trivial: log(sqrt(re * re + im * im)))
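A small sketch of those three steps for a single analysis frame; the steps above suggest FFTW or Kiss FFT, but cv::dft is used here only to keep the example self-contained and consistent with the rest of this page:

    #include <opencv2/core.hpp>
    #include <cmath>
    #include <vector>

    // Returns the log-magnitude spectrum of one windowed frame of samples.
    std::vector<float> frameLogMagnitude(const std::vector<float>& samples)
    {
        const int n = static_cast<int>(samples.size());
        cv::Mat frame(n, 1, CV_32F);

        // 1. Hann window to reduce spectral leakage.
        for (int i = 0; i < n; ++i) {
            const float w = static_cast<float>(0.5 * (1.0 - std::cos(2.0 * CV_PI * i / (n - 1))));
            frame.at<float>(i) = samples[i] * w;
        }

        // 2. FFT (complex output).
        cv::Mat spectrum;
        cv::dft(frame, spectrum, cv::DFT_COMPLEX_OUTPUT);

        // 3. Log magnitude of each frequency-domain component
        //    (the small epsilon avoids log(0)).
        std::vector<float> logMag(n / 2);
        for (int k = 0; k < n / 2; ++k) {
            const cv::Vec2f c = spectrum.at<cv::Vec2f>(k, 0);
            logMag[k] = std::log(std::sqrt(c[0] * c[0] + c[1] * c[1]) + 1e-9f);
        }
        return logMag;
    }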
Now, after performing these 3 steps, I am stuck on how to plot the spectrogram from the available data. Being new to this field, I need some clear steps for plotting the spectrogram.
I know that a simple spectrogram has frequency on the Y-axis, time on the X-axis, and magnitude as the color intensity.
But how do I get these three things to plot the spectrogram? (I want to observe and analyze the data behind spectral peaks, i.e. their values on the Y-axis and X-axis, which is the main purpose of plotting the spectrogram.)
Regards,
Khubaib
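For the plotting part specifically, one hedged sketch is to collect the per-frame log magnitudes into a CV_32F matrix (called logMag below, one row per frequency bin, one column per analysis frame), normalize it to 8 bits, apply a colormap and display it with OpenCV:

    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>
    #include <opencv2/highgui.hpp>

    void showSpectrogram(const cv::Mat& logMag)   // CV_32F, rows = freq bins, cols = frames
    {
        // Map the log magnitudes to 0..255 so they can be drawn as pixel intensities.
        cv::Mat norm8u;
        cv::normalize(logMag, norm8u, 0, 255, cv::NORM_MINMAX, CV_8U);

        // Color-code the magnitude; JET gives the familiar blue-to-red spectrogram look.
        cv::Mat colored;
        cv::applyColorMap(norm8u, colored, cv::COLORMAP_JET);

        // Row 0 is the lowest frequency bin; flip so low frequencies sit at the bottom.
        cv::flip(colored, colored, 0);

        cv::imshow("spectrogram", colored);
        cv::waitKey(0);
    }

To read the values behind a spectral peak back off the axes: column index * hop size / sample rate gives the time in seconds, and row index * sample rate / FFT size gives the frequency in Hz.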

How to process data at less than the camera's frames-per-second capability?

I am not sure how to put my question properly, so here it goes.
I am running an object detection algorithm which runs at 40 frames per second (fps) on a camera which acts as an 'eye' on a robot. Then I process the information received from the algorithm and pass the actions to my robot.
The issue is that each time the algorithm runs, it gives me a slightly new reading. I guess that's because, as it processes data 40 times per second, it will keep giving new information. But I don't need new information if my robot doesn't move, as most of the objects are in the same position as in the previous frame.
My question: how can I enhance my algorithm so that it only gives me information when there is a change in object positions, for example by comparing the last frame's reading with the current frame's reading?
I think you should try to find the motion estimation of the image; I think MPEG-4 video uses an algorithm like that.
http://www.img.lx.it.pt/~fp/cav/Additional_material/MPEG4_video.pdf
But if you don't want something so sophisticated and you just want to see whether the second image is the same as the first, just subtract them and look at the difference. You can also use a Gaussian filter to cut the high frequencies before subtracting, and then apply a threshold to decide whether to do the processing or not.
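A minimal sketch of that blur / subtract / threshold idea with OpenCV (the kernel size, the per-pixel threshold of 25 and the 1% changed-pixel fraction are arbitrary placeholders to tune):

    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>

    // Returns true if the new frame differs enough from the previous one to be
    // worth re-running the (expensive) object detector.
    bool sceneChanged(const cv::Mat& prevGray, const cv::Mat& currGray)
    {
        cv::Mat a, b, diff;
        // Gaussian blur suppresses high-frequency sensor noise before comparing.
        cv::GaussianBlur(prevGray, a, cv::Size(5, 5), 0);
        cv::GaussianBlur(currGray, b, cv::Size(5, 5), 0);

        cv::absdiff(a, b, diff);
        cv::threshold(diff, diff, 25, 255, cv::THRESH_BINARY);   // per-pixel change mask

        // Re-process only if more than ~1% of the pixels changed noticeably.
        const double changedFraction =
            static_cast<double>(cv::countNonZero(diff)) / diff.total();
        return changedFraction > 0.01;
    }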

OpenCV: Detecting seizure-inducing lights in a video?

I have been working on an algorithm which can detect seizure-inducing strobe lights in a video.
Currently, my code returns virtually every frame as capable of causing a seizure (3Hz flashes).
My code calculates the relative luminance of each pixel and sees how many times the luminance goes up then down, etc. or down then up, etc. by more than 10% within any given second.
Is there any way to do this without comparing each individual pixel within a second of each other, so that only the correct frames are returned?
An example of what I am trying to emulate: https://trace.umd.edu/peat
The common approach to solving this type of problem is to convert the frames to grayscale and then construct a cube containing frames from a 1 to 3 second time interval. From this cube, you can extract the time-varying characteristics of either individual pixels (noisy) or blocks (recommended). The resulting 1D curves can first be inspected manually to see if they actually show the 3Hz variation that you are looking for (sometimes these variations are lost or distorted because of the camera's auto exposure settings). If you can see it, then you should be able to use an FFT to isolate and detect it automatically.
Convert the image to grayscale. Break the image up into blocks, maybe 16x16 or 64x64 or larger (experiment to see what works). Take the average luminance of each block over a minimum of 2/3 seconds. Create a wave of luminance over time. Do an FFT on this wave and look for a minimum energy threshold around 3Hz.
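A hedged sketch of that per-block check, assuming the mean luminance of one block has already been sampled once per frame into a vector; the 2.5-3.5 Hz band and the energy threshold are assumptions that would need tuning against known strobe footage:

    #include <opencv2/core.hpp>
    #include <cmath>
    #include <vector>

    bool hasStrobeNear3Hz(const std::vector<float>& blockLuma, double fps,
                          double energyThreshold)
    {
        // 1-D forward DFT of the luminance-over-time signal.
        cv::Mat signal(blockLuma, true);           // N x 1, CV_32F
        cv::Mat spectrum;
        cv::dft(signal, spectrum, cv::DFT_COMPLEX_OUTPUT);

        const int n = signal.rows;
        const double binHz = fps / n;              // frequency resolution per DFT bin

        // Sum the magnitude in a small band around 3 Hz (here 2.5 to 3.5 Hz).
        double bandEnergy = 0.0;
        for (int k = 1; k < n / 2; ++k) {
            const double freq = k * binHz;
            if (freq >= 2.5 && freq <= 3.5) {
                const cv::Vec2f c = spectrum.at<cv::Vec2f>(k, 0);
                bandEnergy += std::sqrt(c[0] * c[0] + c[1] * c[1]);
            }
        }
        return bandEnergy > energyThreshold;
    }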

How to estimate exposure time for camera to take a good image from a scene

I am trying to write code to calculate the correct exposure time for a camera to capture an image at the correct brightness.
What I have is a camera that supplies data in RAW (Bayer raw data) and whose exposure time I can control. I want to control its exposure so that when it captures an image, the image has the correct brightness (not too dark (underexposed) or too bright (overexposed)).
I think I need an algorithm similar to this:
1. Capture a sample image.
2. Calculate the image brightness.
3. Calculate the correct exposure.
4. Capture a new image.
5. Check that the image brightness is correct; if not, go to step 3.
6. Capture the final image.
My question is:
How can I calculate image brightness?
If I can calculate the image brightness, how can I calculate the exposure? One way of doing this is to do a search (for example, start from a very fast exposure time and increase it until you get a correct exposure), but this is very time consuming; is there any better way of doing this?
To do this, I may need to calibrate my camera (as the relationship between brightness and exposure time differs between sensors). How can I do this?
I am using OpenCV and I can use the algorithms which are available in OpenCV (C++).
There are multiple ways to measure the "correct" brightness of an image. A common method is to calculate the intensity histogram and make sure that the values cover the entire range without too much clipping at either end. I'm not sure there's a single one-size-fits-all method for every possible scene.
A faster way than linearly increasing the exposure is to do a binary search, by measuring at low and high exposure, then measuring in the middle, and then continuing to split the sub-range in the middle, until you find the optimum.
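A rough sketch of that binary search (captureRaw is a hypothetical stand-in for whatever call grabs a frame at a given exposure, and the target mean of 128 assumes 8-bit data; both it and the tolerance are arbitrary starting points):

    #include <opencv2/core.hpp>
    #include <cmath>

    cv::Mat captureRaw(double exposureMs);   // hypothetical camera interface

    double findExposure(double minMs, double maxMs)
    {
        const double target = 128.0;         // aim for a mid-gray mean intensity
        const double tolerance = 10.0;

        double lo = minMs, hi = maxMs, exposure = (lo + hi) / 2.0;
        for (int i = 0; i < 10; ++i) {       // ~10 halvings is usually plenty
            exposure = (lo + hi) / 2.0;
            cv::Mat frame = captureRaw(exposure);
            const double brightness = cv::mean(frame)[0];   // mean pixel value

            if (std::abs(brightness - target) < tolerance)
                break;                       // close enough
            if (brightness < target)
                lo = exposure;               // too dark: expose longer
            else
                hi = exposure;               // too bright: expose shorter
        }
        return exposure;
    }

Because a raw (linear) sensor's response is roughly proportional to exposure time before saturation, a single measurement can also be used to jump straight to roughly exposure * target / brightness, which reduces the need for a per-sensor calibration curve.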

feature extraction using PCA

My job is to perform gesture recognition. I want to do that by training a support vector machine using the features extracted by performing PCA (Principal Component Analysis).
But I'm getting a little confused about the procedure.
After going through various articles, I've figured out these steps.
Take 'd' images (n*n) of the same gesture.
Convert each n*n image into a single row.
Form a matrix of order d*(n*n).
Compute the eigenvalues & eigenvectors.
Use the top 'k' eigenvectors to form a subspace.
Project each image from the original n*n dimensions down to 'k' dimensions.
Question:
1) I have a set of 100 gestures, and performing the above 6 steps will give me 100 subspaces. My testing should be done on real-time video to find which class a gesture falls in. Onto which subspace do I project each video frame to reduce its dimension before feeding it to the classifier?
Thank you in advance.
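For reference, the usual eigen-image setup fits one PCA on the training images of all gestures together, so there is a single shared subspace rather than 100; both the training images and each incoming video frame are projected into that same subspace before being passed to the SVM. A minimal sketch of the six steps with cv::PCA (the function names here are made up for illustration):

    #include <opencv2/core.hpp>
    #include <vector>

    // images: training images of all gestures, each n x n and single channel.
    // k:      number of principal components to keep.
    cv::PCA buildSubspace(const std::vector<cv::Mat>& images, int k)
    {
        // One row per image: reshape each n*n image into a single row (steps 2-3).
        cv::Mat data(static_cast<int>(images.size()),
                     static_cast<int>(images[0].total()), CV_32F);
        for (size_t i = 0; i < images.size(); ++i) {
            cv::Mat row;
            images[i].reshape(1, 1).convertTo(row, CV_32F);
            row.copyTo(data.row(static_cast<int>(i)));
        }
        // PCA computes the mean, eigenvectors and eigenvalues (steps 4-5),
        // keeping only the top k components.
        return cv::PCA(data, cv::Mat(), cv::PCA::DATA_AS_ROW, k);
    }

    // Project a single n x n frame into the k-dimensional subspace (step 6).
    cv::Mat projectFrame(const cv::PCA& pca, const cv::Mat& frame)
    {
        cv::Mat row;
        frame.reshape(1, 1).convertTo(row, CV_32F);
        return pca.project(row);    // 1 x k feature vector for the classifier
    }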