Monocular SLAM scale consistency - computer-vision

When running monocular SLAM, how is scale consistency achieved between frames? One short tutorial ran the 5-point algorithm repeatedly and used a motion model to keep the scale consistent between frames, but that is clearly not done in the general case.
I think I read something once about how the 5-point algorithm is used initially to estimate the motion between the first two frames, and then features are tracked over time using reprojection error and 3D projection.
How is it done?
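A minimal sketch of the pipeline described above (5-point initialization between the first two frames, then pose estimation for later frames by minimizing reprojection error against the triangulated points, which is what keeps the scale consistent). It assumes OpenCV, known intrinsics K, and already-matched features; the variable names are placeholders, and matching, inlier filtering, keyframing and bundle adjustment are omitted:

```cpp
// Minimal sketch (not a full SLAM system): two-view initialization with the
// 5-point algorithm fixes an arbitrary scale once; later poses are estimated
// against the triangulated 3D points, so they inherit that scale.
// K is assumed to be a CV_64F intrinsics matrix; pts0/pts1/pts2 are placeholder
// names for the same features matched/tracked across frames 0, 1 and 2.
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

void monoInitAndTrack(const cv::Mat& K,
                      const std::vector<cv::Point2f>& pts0,
                      const std::vector<cv::Point2f>& pts1,
                      const std::vector<cv::Point2f>& pts2)
{
    // 1. Relative pose between the first two frames (5-point + RANSAC).
    //    recoverPose returns a unit-length translation: this is where the
    //    global scale gets fixed, arbitrarily.
    cv::Mat mask, R, t;
    cv::Mat E = cv::findEssentialMat(pts0, pts1, K, cv::RANSAC, 0.999, 1.0, mask);
    cv::recoverPose(E, pts0, pts1, K, R, t, mask);

    // 2. Triangulate an initial map in that scale.
    cv::Mat P0 = K * cv::Mat::eye(3, 4, CV_64F);
    cv::Mat Rt, P1;
    cv::hconcat(R, t, Rt);
    P1 = K * Rt;

    cv::Mat pts4D;
    cv::triangulatePoints(P0, P1, pts0, pts1, pts4D);
    pts4D.convertTo(pts4D, CV_32F);

    std::vector<cv::Point3f> mapPoints;
    for (int i = 0; i < pts4D.cols; ++i)
    {
        float w = pts4D.at<float>(3, i);
        mapPoints.emplace_back(pts4D.at<float>(0, i) / w,
                               pts4D.at<float>(1, i) / w,
                               pts4D.at<float>(2, i) / w);
    }

    // 3. For every later frame, estimate the pose against the existing map by
    //    minimizing reprojection error (PnP). Because the 3D points carry the
    //    scale chosen at initialization, the new pose is scale-consistent, and
    //    any new points triangulated from it inherit the same scale.
    cv::Mat rvec, tvec;
    cv::solvePnPRansac(mapPoints, pts2, K, cv::Mat(), rvec, tvec);
}
```

The absolute scale remains arbitrary (and drifts slowly in practice); it stays consistent only because every new pose and every new map point is anchored to the existing map rather than to a fresh 5-point solution, which is why full systems add local bundle adjustment and loop closure on top of this.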

Related

Stereo depth map but with a single moving camera measured with sensors

I've just gotten started learning about calculating depth from stereo images, and before I commit to learning this I wanted to check whether it is a viable choice for a project I'm doing. I have a drone with a single RGB camera, plus sensors that can give the drone's orientation and movement. Would it be possible to sample two frames, together with the distance and orientation difference between the samples, and use this to calculate depth?
I've seen that in most examples the cameras are lined up horizontally. Is this necessary for stereo images, or can I use any reasonable angle and distance between the two sampled images? Would it be feasible to do this in real time?
My overall goal is to do some sort of monocular SLAM to have this drone navigate indoor areas. I know that ORB-SLAM exists, but I am mostly doing this as a learning experience and so would like to do things from scratch where possible.
Thank you.
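If the sensors give you the relative rotation and translation between the two shots, you can treat the two frames as an arbitrary (non-rectified) stereo pair and triangulate matched features directly, so horizontal alignment is not required, only enough parallax. A minimal sketch, assuming calibrated intrinsics K, a sensor-derived relative pose (R, t), and already-matched points (names are placeholders; feature matching is omitted):

```cpp
// Minimal sketch of depth from two frames of one moving camera, assuming
// calibrated intrinsics K (CV_64F) and a sensor-derived relative pose (R, t)
// expressing the transform from the first camera's frame to the second
// (x2 = R * x1 + t). pts1/pts2 are placeholder names for matched pixel
// coordinates of the same features in the two frames.
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

std::vector<cv::Point3f> depthFromTwoViews(const cv::Mat& K,
                                           const cv::Mat& R, const cv::Mat& t,
                                           const std::vector<cv::Point2f>& pts1,
                                           const std::vector<cv::Point2f>& pts2)
{
    // Projection matrices: first view at the origin, second view at [R|t].
    // The views do not need to be horizontally aligned; any pose with enough
    // parallax works (a wider baseline gives better depth resolution).
    cv::Mat P1 = K * cv::Mat::eye(3, 4, CV_64F);
    cv::Mat Rt, P2;
    cv::hconcat(R, t, Rt);
    P2 = K * Rt;

    cv::Mat pts4D;
    cv::triangulatePoints(P1, P2, pts1, pts2, pts4D);
    pts4D.convertTo(pts4D, CV_32F);

    std::vector<cv::Point3f> points;   // 3D points in the first camera's frame
    for (int i = 0; i < pts4D.cols; ++i)
    {
        float w = pts4D.at<float>(3, i);
        points.emplace_back(pts4D.at<float>(0, i) / w,
                            pts4D.at<float>(1, i) / w,
                            pts4D.at<float>(2, i) / w);   // Z is the depth
    }
    return points;
}
```

Because t comes from the sensors in real units, the depths are metric, which a purely monocular pipeline cannot give you; the flip side is that noise in the IMU/odometry pose feeds straight into the depth estimate, and this gives per-feature depth rather than a dense map (you could also rectify the pair and run a block matcher if you want dense disparity).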

Feed GStreamer sink into OpenPose

I have a custom USB camera with a custom driver on a custom board (NVIDIA Jetson TX2) that is not detected by the OpenPose examples. I access the data using a custom GStreamer source. I currently pull frames into a cv::Mat, color-convert them, and feed them into OpenPose on a per-image basis. It works fine, but 30-40% slower than a comparable video stream from a plug-and-play camera. I would like to explore features like the tracking that is available for streams, since I'm trying to maximize the FPS. I believe the stream feed is superior due to better (continuous) use of the GPU.
In particular, the speedup would come at the expense of confidence, which could be addressed later: one frame goes through pose estimation and the 3-4 subsequent frames just track the detection with decreasing confidence levels. I tried that with a plug-and-play camera and the OpenPose example, and the results were somewhat satisfactory.
The point where I stumbled is that I can put the video stream into a cv::VideoCapture, but I do not know how to hand that capture to OpenPose for processing.
If there is a better way to do it, I am happy to try different things, but the bottom line is that the custom camera stays (I know ;/). Solutions to the issue described, or different ideas, are welcome.
Things I already tried:
Lowering the resolution of the camera (the camera crops below a certain resolution instead of binning, so I can't really go below 1920x1080; it's a 40+ megapixel video camera, by the way)
Using CUDA to shrink the image before feeding it to OpenPose (the shrink + pose estimation time was virtually equivalent to pose estimation on the original image)
Since the camera view is static, checking for changes between frames, cropping the image down to the area that changed, and running pose estimation on that section (10% speedup, high risk of missing something)
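For the specific "how do I hand cv::VideoCapture frames to OpenPose" part, something like the following sketch may work. It assumes OpenCV was built with GStreamer support and a recent OpenPose (the OP_CV2OPCONSTMAT macro and the emplaceAndPop(op::Matrix) overload come from the asynchronous C++ Wrapper API used in the 1.6+ tutorial examples, so the exact calls may differ on older versions); the pipeline string is a placeholder for the custom source:

```cpp
// Sketch: read frames from a custom GStreamer source through cv::VideoCapture
// and push them into OpenPose one by one.
#include <openpose/headers.hpp>
#include <opencv2/opencv.hpp>

int main()
{
    // Placeholder pipeline: replace "yoursrc" with the custom source element.
    cv::VideoCapture cap(
        "yoursrc ! videoconvert ! video/x-raw,format=BGR ! appsink drop=1",
        cv::CAP_GSTREAMER);
    if (!cap.isOpened())
        return 1;

    // Asynchronous wrapper: we push images in and pop results out ourselves,
    // using OpenPose's default configuration.
    op::Wrapper opWrapper{op::ThreadManagerMode::Asynchronous};
    opWrapper.start();

    cv::Mat frame;
    while (cap.read(frame))
    {
        // Wrap the cv::Mat in OpenPose's matrix type and run pose estimation.
        const op::Matrix opFrame = OP_CV2OPCONSTMAT(frame);
        auto datums = opWrapper.emplaceAndPop(opFrame);
        if (datums != nullptr && !datums->empty())
        {
            const auto& keypoints = datums->at(0)->poseKeypoints;
            // ... use keypoints; e.g. skip pose estimation on the next few
            // frames and only track the previous detections instead.
            (void)keypoints;
        }
    }
    return 0;
}
```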

OpenCV image-based optical flow field

I am looking for a simple algorithm to detect the optical flow of the entire input.
In OpenCV, the Lucas-Kanade point tracking functionality is really good, but it is very slow for more than a handful of points. I am looking for an image-based result, rather than point-based. The only information I can find is about LK tracking.
I can calculate the magnitude of motion based on simple frame differencing, but I want to know the direction too. I basically want to end up with an optical-flow-field texture that I can feed into a GPU fluid simulation.
There must be some simple algorithm based on elementary motion detectors or something: perhaps a combination of frame differencing, scaling, and blurring over three sequential frames.
Just to be super clear, I DON'T want information on the Lucas-Kanade method.
OpenCV has a BackgroundSubtractor class that does frame differencing; I guess you'll have to do the blurring part yourself. This is, however, not strictly a calculation of optical flow.
Farnebäck's method computes dense optical flow and is implemented in OpenCV as cv::calcOpticalFlowFarneback(..). It generates a two-channel "flow" matrix of per-pixel x/y displacements, from which you can derive magnitude and direction. The Horn-Schunck method is not built into OpenCV.
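A minimal sketch of the Farnebäck route, converting the flow into per-pixel magnitude and direction (the parameter values are the commonly quoted documentation defaults, not tuned ones):

```cpp
// Sketch: dense optical flow with Farnebäck's method, converted to per-pixel
// magnitude and direction (e.g. to upload as a flow-field texture).
#include <opencv2/core.hpp>
#include <opencv2/video.hpp>

void denseFlow(const cv::Mat& prevGray, const cv::Mat& gray,
               cv::Mat& magnitude, cv::Mat& angle)
{
    cv::Mat flow;   // CV_32FC2: per-pixel (dx, dy) displacement
    cv::calcOpticalFlowFarneback(prevGray, gray, flow,
                                 0.5,   // pyramid scale
                                 3,     // pyramid levels
                                 15,    // averaging window size
                                 3,     // iterations per level
                                 5,     // polynomial neighbourhood size
                                 1.2,   // polynomial sigma
                                 0);    // flags

    // Split into x/y components and convert to magnitude + direction (radians).
    cv::Mat parts[2];
    cv::split(flow, parts);
    cv::cartToPolar(parts[0], parts[1], magnitude, angle);
}
```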
PS: Lucas-Kanade is not very slow. It's probably the extraction of feature points that is slow. Try using the cv::FAST detector.

OpenCV C++: record video when motion is detected from a camera

I am attempting to use straightforward motion-detection code to detect movement from a camera. I'm using the OpenCV library, and I have some code that takes the difference between two frames to detect a change.
I have the difference frame working just fine and it's black when no motion is present.
The problem is how I can now detect that blackness in order to stop recording, or its absence in order to begin recording frames.
Thank you all.
A very simple thing to do is to sum the entire diff image into a single number. If that sum is above a threshold, you have movement. Then you can use a second threshold, and when the sum drops below that limit, movement has stopped.
You can also make a threshold crossing change the program state only if enough time has elapsed since the last change, i.e. after movement is detected you don't check for lack of movement for 10 seconds.
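A rough sketch of that idea, with two thresholds and a hold-off timer (all numeric values are placeholders you would tune for your camera and scene):

```cpp
// Sketch of the above: sum the absolute-difference image into one number, use
// two thresholds (start/stop hysteresis), and hold off the "stopped" check for
// a while after motion is seen.
#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap(0);
    if (!cap.isOpened()) return 1;

    cv::Mat prev, frame, gray, diff;
    bool recording = false;
    const double startThreshold = 1.5e6;   // sum above this -> motion started
    const double stopThreshold  = 0.5e6;   // sum below this -> motion stopped
    const double holdOffSeconds = 10.0;
    int64 lastMotionTick = 0;

    while (cap.read(frame))
    {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        if (prev.empty()) { gray.copyTo(prev); continue; }

        cv::absdiff(gray, prev, diff);
        const double activity = cv::sum(diff)[0];   // one number for "how much changed"
        gray.copyTo(prev);

        const double sinceMotion =
            (cv::getTickCount() - lastMotionTick) / cv::getTickFrequency();

        if (!recording && activity > startThreshold)
        {
            recording = true;                        // start writing frames here
            lastMotionTick = cv::getTickCount();
        }
        else if (recording && activity > stopThreshold)
        {
            lastMotionTick = cv::getTickCount();     // still moving: refresh the timer
        }
        else if (recording && sinceMotion > holdOffSeconds)
        {
            recording = false;                       // stop writing frames here
        }
        // if (recording) writer.write(frame);       // cv::VideoWriter setup omitted
    }
    return 0;
}
```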
Take a look at the code of the free software Motion for inspiration.
There are quite a few things to keep in mind for reliable motion detection, for example tolerating the slow lighting changes as the sun moves across the sky, or accepting the momentary image glitches that the cheapest cameras in particular tend to produce.
From the small amount of experience I have had, I think that rather than just adding up all the differences, it works better to count the number of pixels whose variation exceeds a certain threshold.
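That variant is only a couple of lines if you already have the single-channel diff image from the sketch above (both threshold values here are placeholders):

```cpp
// Fragment: count changed pixels instead of summing the raw differences.
cv::Mat changedMask;
cv::threshold(diff, changedMask, 25, 255, cv::THRESH_BINARY);   // per-pixel threshold
int changedPixels = cv::countNonZero(changedMask);
bool motion = changedPixels > 500;                              // pixel-count threshold
```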
Motion also offers masks, which let you, for example, ignore movement on a nearby road.
What about storing a black frame internally and using your same comparison code? If your new frame is different (above a threshold) from the all-black frame, start recording.
This seems the most straightforward since you already have the image-processing algorithms down.

Traffic Motion Recognition

I'm trying to build a simple traffic-motion monitor to estimate the average speed of moving vehicles, and I'm looking for guidance on how to do so using an open-source package like OpenCV or anything else you might recommend for this purpose. Are there any resources that are particularly good for this problem?
The setup I'm hoping for is to install a webcam on a high-rise building next to the road in question and point the camera down onto the moving traffic. The camera altitude would be anywhere between 20 ft and 100 ft, and the building would be anywhere between 20 ft and 500 ft away from the road.
Thanks for your input!
Generally speaking, you need a way to detect cars so you can get their 2D coordinates in the video frame. You might want to use a tracker to speed up the process and take advantage of the predictable motion of the vehicles. You also need a way to calibrate the camera so you can translate 2D image coordinates into real-world distances and approximate speed.
So as a first step, look at detectors such as the deformable parts model (DPM) and tracking-by-detection methods. You'll probably need to port some code from MATLAB (and if you do, please make it available :-) ). If that's too slow, maybe do some segmentation of foreground blobs and track their colour histograms or HOG descriptors, using a particle filter or a Kalman filter to predict motion.
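A rough sketch of the foreground-blob route: background subtraction, blob centroids, a naive nearest-neighbour association between frames, and a speed estimate from pixel displacement. The metres-per-pixel factor stands in for real calibration (e.g. a homography from the image to the road plane), and the file name, thresholds and gating distance are placeholders:

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main()
{
    cv::VideoCapture cap("traffic.mp4");            // placeholder input
    if (!cap.isOpened()) return 1;
    const double fps = cap.get(cv::CAP_PROP_FPS);
    const double metresPerPixel = 0.05;             // assumed calibration factor

    auto bgsub = cv::createBackgroundSubtractorMOG2(500, 16, true);
    std::vector<cv::Point2f> prevCentroids;

    cv::Mat frame, fgMask;
    while (cap.read(frame))
    {
        // Foreground mask, with shadows (value 127) removed and noise opened out.
        bgsub->apply(frame, fgMask);
        cv::threshold(fgMask, fgMask, 200, 255, cv::THRESH_BINARY);
        cv::morphologyEx(fgMask, fgMask, cv::MORPH_OPEN,
                         cv::getStructuringElement(cv::MORPH_RECT, cv::Size(5, 5)));

        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(fgMask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

        std::vector<cv::Point2f> centroids;
        for (const auto& c : contours)
        {
            if (cv::contourArea(c) < 500) continue;    // ignore small blobs
            const cv::Rect box = cv::boundingRect(c);
            centroids.emplace_back(box.x + box.width / 2.0f,
                                   box.y + box.height / 2.0f);
        }

        // Naive nearest-neighbour association with the previous frame; a Kalman
        // (or particle) filter per vehicle would make this far more robust.
        for (const auto& c : centroids)
        {
            double best = 1e9;
            for (const auto& p : prevCentroids)
                best = std::min(best, (double)std::hypot(c.x - p.x, c.y - p.y));
            if (best < 50.0)   // gating distance in pixels
            {
                const double speedMps = best * metresPerPixel * fps;
                std::printf("vehicle speed ~ %.1f m/s\n", speedMps);
            }
        }
        prevCentroids = centroids;
    }
    return 0;
}
```

With the camera looking down at an angle rather than straight down, a single metres-per-pixel factor is only a crude approximation; replacing it with a per-pixel mapping (e.g. a homography estimated from known distances on the road) is what the camera-calibration step in the answer above refers to.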