FCam - low light / HDR photos - c++

I am using FCam to take pictures and right now without modification, the pictures are par for a smartphone camera. FCam advertises HDR and low-light performance, but I don't see any examples of how to use that when taking pictures.
How do I take HDR pictures? From my experience with SLRs, you normally take 3 pictures, 1 under, 1 over, and 1 exposed properly for the scene.
Since I will be taking many pictures, how should I blend those pixels together? An average?
Walter

The FCam project page includes a complete camera app - FCamera, check the FCam Download Page, last item, which for "HDR Viewfinder" simply averages a long/short exposure image together, and for "HDR Capture" automatically records a burst of suitably-exposed shots. See src/CameraThread.cpp in the sources, I'm not sure how appropriate it is to quote from that but you'll find both pieces in CameraThread::run().
It doesn't average the HDR images for you, it records them as sequence. I guess that's by intent - much of the "HDR" appeal you achieve by carefully tuning the tone mapping after the averaging process, i.e. adjust how exactly the dynamic range compression back to 8bit is performed. If you'd do that in a hardcoded way on camera, you'll restrict the photographer's options with respect to achieving the optimal output. The MPI has a research group on HDR imaging techniques that provides sourcecode for this purpose.
In short, a "poor man's HDR" would just be an average. A "proper HDR" will never be 8-bit JPEG because that throws away the "high" bit in "high dynamic range" - for that reason, the conversion from HDR (which will have 16bit/color or even more) to e.g. JPEG is usually done as postprocessing (off-camera) step, from the HDR image sequence.
Note on HDR video
For HDR video, if you're recording with a single sensor on a hand-held you'll normally introduce motion between the images that form the "HDR sequence" (your total exposure time equals the sum of all subexposures, plus latency from sensor data reads and camera controller reprogramming).
That means image registration should be attempted before the actual overlay and final tone mapping operation as well, unless you're ok with the blur. Registration is somewhat compute intensive and another good reason to record the image stream first and perform the HDR video creation later (with some manual adjustment offered). The OpenCV library provides registration / matching functions.
The abovementioned MPI software is PFSTools, particularly the Tone Mapping operators (PFStmo) library. The research papers by one of the authors provide a good starting point; as to your question on how to perform the postprocessing, PFSTools are command-line utilities that interoperate/pass data via UNIX pipes; on Maemo / the N900, their use is straightforward thanks to the full Linux environment; just spawn a shell script via system().

Related

How can I horizontally mirror video in DirectShow?

I need to display the local webcam stream on the screen,
horizontally flipped,
so that the screen appears as a mirror.
I have a DirectShow graph which does all of this,
except for mirroring the image.
I have tried several approaches to mirror the image,
but none have worked.
Approach A: VideoControlFlag_FlipHorizontal
I tried setting the VideoControlFlag_FlipHorizontal flag
on the output pin of the webcam filter,
like so:
IAMVideoControl* pAMVidControl;
IPin* pWebcamOutputPin;
// ...
// Omitting error-handing for brevity
pAMVidControl->SetMode(pWebcamOutputPin, VideoControlFlag_FlipHorizontal);
However, this has no effect.
Indeed, the webcam filter claims to not have this capability,
or any other capabilities:
long supportedModes;
hr = pAMVidControl->GetCaps(pWebcamOutputPin, &supportedModes);
// Prints 0, i.e. no capabilities
printf("Supported modes: %ld\n", supportedModes);
Approach B: SetVideoPosition
I tried flipping the image by flipping the rectangles passed to SetVideoPosition.
(I am using an Enhanced Video Renderer filter, in windowless mode.)
There are two rectangles:
a source rectangle and a destination rectangle.
I tried both.
Here's approach B(i),
flipping the source rectangle:
MFVideoNormalizedRect srcRect;
srcRect.left = 1.0; // note flipped
srcRect.right = 0.0; // note flipped
srcRect.top = 0.0;
srcRect.bottom = 0.5;
return m_pVideoDisplay->SetVideoPosition(&srcRect, &destRect);
This results in nothing being displayed.
It works in other configurations,
but appears to dislike srcRect.left > srcRect.right.
Here's approach B(ii),
flipping the destination rectangle:
RECT destRect;
GetClientRect(hwnd, &destRect);
LONG left = destRect.left;
destRect.left = destRect.right;
destRect.right = left;
return m_pVideoDisplay->SetVideoPosition(NULL, &destRect);
This also results in nothing being displayed.
It works in other configurations,
but appears to dislike destRect.left > destRect.right.
Approach C: IMFVideoProcessorControl::SetMirror
IMFVideoProcessorControl::SetMirror(MF_VIDEO_PROCESSOR_MIRROR)
sounds like what I want.
This IMFVideoProcessorControl interface is implemented by the Video Processor MFT.
Unfortunately, this is a Media Foundation Transform,
and I can't see how I could use it in DirectShow.
Approach D: Video Resizer DSP
The Video Resizer DSP
is "a COM object that can act as a DMO",
so theoretically I could use it in DirectShow.
Unfortunately, I have no experience with DMOs,
and in any case,
the docs for the Video Resizer don't say whether it would support flipping the image.
Approach E: IVMRMixerControl9::SetOutputRect
I found
IVMRMixerControl9::SetOutputRect,
which explicitly says:
Because this rectangle exists in compositional space,
there is no such thing as an "invalid" rectangle.
For example, set left greater than right to mirror the video in the x direction.
However, IVMRMixerControl9 appears to be deprecated,
and I'm using an EVR rather than a VMR,
and there are no docs on how to obtain a IVMRMixerControl9 anyway.
Approach F: Write my own DirectShow filter
I'm reluctant to try this one unless I have to.
It will be a major investment,
and I'm not sure it will be performant enough anyway.
Approach G: start again with Media Foundation
Media Foundation would possibly allow me to solve this problem,
because it provides "Media Foundation Transforms".
But it's not even clear that Media Foundation would fit all my other requirements.
I'm very surprised that I am looking at such radical solutions
for a transform that seems so standard.
What other approaches exist?
Is there anything I've overlooked in the approaches I've tried?
How can I horizontally mirror video in DirectShow?
If Option E does not work (see comment above; neither source nor destination rectangle allows mirroring), and given that it's DirectShow I would offer looking into Option F.
However writing a full filter might be not so trivial if you never did this before. There are a few shortcuts here though. You don't need to develop a full filter: similar functionality can be reached at least using two alternate methods:
Sample Grabber Filter with a ISampleGrabberCB::SampleCB callback. You will find lots of mentions for this technic: when inserted into graph your code can receive a callback for every processed frame. If you rearrange pixels in frame buffer within the callback, the image will be mirrored.
Implement a DMO and insert it into filter graph with the help of DMO Wrapper Filter. You will have a chance to similarly rearrange pixels of frames, with a bit more of flexibility at the expense of more code to write.
Both mentioned will be easier to do because you don't have to use DirectShow BaseClasses, which are notoriously obsolete in 2020.
Both mentioned will not require to understand data flow in DirectShow filter. Both and also developing full DirectShow filter assume that your code supports rearrangement in a limited set of pixel formats. You can go with 24-bit RGB for example, or typical formats of webcams such as NV12 (nowadays).
If your pixel data rearrangement is well done without need to super-optimize the code, you can ignore performance impact - either way it can be neglected in most of the cases.
I expect integration of Media Foundation solution to be more complicated, and much more complicated if Media Foundation solution is to be really well optimized.
The complexity of the problem in first place is the combination of the following factors.
First, you mixed different solutions:
Mirroring right in web camera (driver) where your setup to mirror results that video frames are already mirrored at the very beginning.
Mirroring as data flows through pipeline. Even though this sounds simple, it is not: sometimes the frames are yet compressed (webcams quite so often send JPEGs), sometimes frames can be backed by video memory, there are multiple pixel formats etc
Mirroring as video is presented.
Your approach A is #1 above. However if there is no support for the respected mode, you can't mirror.
Mirroring in EVR renderer #3 is apparently possible in theory. EVR used Direct3D 9 and internally renders a surface (texture) into scene so it's absolutely possible to setup 3D position of the surface in the way that it becomes mirrored. However, the problem here is that API design and coordinate checks are preventing from passing mirroring arguments.
Then Direct3D 9 is pretty much deprecated, and DirectShow itself and even DirectShow/Media Foundation's EVR are in no way compatible to current Direct3D 11. Even though a capability to mirror via hardware might exist, you might have hard time to consume it through the legacy API.
As you want a simple solution you are limited with mirroring as the data is streamed through, #2 that is. Even though this is associated with reasonable performance impact you don't need to rely on specific camera or video hardware support: you just swap the pixels in every frame and that's it.
As I mentioned the easiest way is to setup SampleCB callback using either 24-bit RGB and/or NV12 pixel format. It depends on whatever else your application is doing too, but with no such information I would say that it is sufficient to implement 24-bit RGB and having the video frame data you would just go row by row and swap the three byte pixel data width/2 times. If the application pipeline allows you might want to have additional code path to flip NV12, which is similar but does not have the video to be converted to RGB in first place and so is a bit more efficient. If NV12 can't work, RGB24 would be a backup code path.
See also: Mirror effect with DirectShow.NET - I seem to already explained something similar 8 years ago.

Object Tracking in h.264 compressed video

I am working on a project that requires me to detect and track a human in a live video from a webcam connected to a Beagleboard xm.
I have completed this task using Opencv in pixel domain. The results on the board are very accurate but extremely slow. Many people have suggested me to leave pixel domain and do the same task in an h.264/MPEG-4 compressed video as it would extremely reduce the computational overhead.
I have read many research papers but failed to discover any software platform or a library that I can use to analyze and process h.264 compressed videos.
I will be thankful if someone can suggest me some library for h.264 compressed video analysis and guide me further.
Thanks and Regards.
I'm not sure how practical this really is (I've never tried to do it), but my guess would be that what they're referring to would be looking for a block of macro-blocks that all have (nearly) identical motion vectors.
For example, let's assume you have a camera that's not panning, and the picture shows a car driving across the screen. Looking at the motion vectors, you should have a (roughly) car-shaped bunch of macro-blocks that all have similar motion vectors (denoting the motion of the car). Then, rather than look at the entire picture for your object of interest, you can look at that block in isolation and try to identify it. Likewise, if the camera was panning with the car, you'd have a car-shaped block with small motion vectors, and most of the background would have similar motion vectors in the opposite direction of the car's movement.
Note, however, that this is likely to be imprecise at best. Just for example, let's assume our mythical car as driving in front of a brick building, with its headlights illuminating some of the bricks. In this case, a brick in one picture might (easily) not point back at the same brick in the previous picture, but instead point at the brick in the previous picture that happened to be illuminated about the same. The bricks are enough alike that the closest match will depend more on illumination than the brick itself.
You may be able, eventually, to parse and determine that h.264 has an object, but this will not be "object tracking" like your looking for. openCV is excellent software and what it does best. Have you considered scaling the video down to a smaller resolution for easier analysis by openCV?
I think you are highly over estimating the computing power of this $45 computer. Object recognition and tracking is VERY hard computationally speaking. I would start by seeing how many frames per second your board can track and optimize from there. Start looking at where your bottlenecks are, you may be better off processing raw video instead of having to decode h.264 video first. Again, RAW video takes a LOT of RAM, and processing through that takes a LOT of CPU.
Minimize overhead from decoding video, minimize RAM overhead by scaling down the video before analysis, but in the end, your asking a LOT from a 1ghz, 32bit ARM processor.
FFMPEG is a very old library that is not being supported now a days. It has very limited capabilities in terms of processing and object tracking in h.264 compressed video. Most of the commands usually are outdated.
The best thing would be to study h.264 thoroughly and then try to implement your own API in some language like Java or c#.

Appropriate image file format for losslessly compressing series of screenshots

I am building an application which takes a great many number of screenshots during the process of "recording" operations performed by the user on the windows desktop.
For obvious reasons I'd like to store this data in as efficient a manner as possible.
At first I thought about using the PNG format to get this done. But I stumbled upon this: http://www.olegkikin.com/png_optimizers/
The best algorithms only managed a 3 to 5 percent improvement on an image of GUI icons. This is highly discouraging and reveals that I'm going to need to do better because just using PNG will not allow me to use previous frames to help the compression ratio. The filesize will continue to grow linearly with time.
I thought about solving this with a bit of a hack: Just save the frames in groups of some number, side by side. For example I could just store the content of 10 1280x1024 captures in a single 1280x10240 image, then the compression should be able to take advantage of repetitions across adjacent images.
But the problem with this is that the algorithms used to compress PNG are not designed for this. I am arbitrarily placing images at 1024 pixel intervals from each other, and only 10 of them can be grouped together at a time. From what I have gathered after a few minutes scanning the PNG spec, the compression operates on individual scanlines (which are filtered) and then chunked together, so there is actually no way that info from 1024 pixels above could be referenced from down below.
So I've found the MNG format which extends PNG to allow animations. This is much more appropriate for what I am doing.
One thing that I am worried about is how much support there is for "extending" an image/animation with new frames. The nature of the data generation in my application is that new frames get added to a list periodically. But I do have a simple semi-solution to this problem, which is to cache a chunk of recently generated data and incrementally produce an "animation", say, every 10 frames. This will allow me to tie up only 10 frames' worth of uncompressed image data in RAM, not as good as offloading it to the filesystem immediately, but it's not terrible. After the entire process is complete (or even using free cycles in a free thread, during execution) I can easily go back and stitch the groups of 10 together, if it's even worth the effort to do it.
Here is my actual question that everything has been leading up to. Is MNG the best format for my requirements? Those reqs are: 1. C/C++ implementation available with a permissive license, 2. 24/32 bit color, 4+ megapixel (some folks run 30 inch monitors) resolution, 3. lossless or near-lossless (retains text clarity) compression with provisions to reference previous frames to aid that compression.
For example, here is another option that I have thought about: video codecs. I'd like to have lossless quality, but I have seen examples of h.264/x264 reproducing remarkably sharp stills, and its performance is such that I can capture at a much faster interval. I suspect that I will just need to implement both of these and do my own benchmarking to adequately satisfy my curiosity.
If you have access to a PNG compression implementation, you could easily optimize the compression without having to use the MNG format by just preprocessing the "next" image as a difference with the previous one. This is naive but effective if the screenshots don't change much, and compression of "almost empty" PNGs will decrease a lot the storage space required.

Assessing the quality of an image with respect to compression?

I have images that I am using for a computer vision task. The task is sensitive to image quality. I'd like to remove all images that are below a certain threshold, but I am unsure if there is any method/heuristic to automatically detect images that are heavily compressed via JPEG. Anyone have an idea?
Image Quality Assessment is a rapidly developing research field. As you don't mention being able to access the original (uncompressed) images, you are interested in no reference image quality assessment. This is actually a pretty hard problem, but here are some points to get you started:
Since you mention JPEG, there are two major degradation features that manifest themselves in JPEG-compressed images: blocking and blurring
No-reference image quality assessment metrics typically look for those two features
Blocking is fairly easy to pick up, as it appears only on macroblock boundaries. Macroblocks are a fixed size -- 8x8 or 16x16 depending on what the image was encoded with
Blurring is a bit more difficult. It occurs because higher frequencies in the image have been attenuated (removed). You can break up the image into blocks, DCT (Discrete Cosine Transform) each block and look at the high-frequency components of the DCT result. If the high-frequency components are lacking for a majority of blocks, then you are probably looking at a blurry image
Another approach to blur detection is to measure the average width of edges of the image. Perform Sobel edge detection on the image and then measure the distance between local minima/maxima on each side of the edge. Google for "A no-reference perceptual blur metric" by Marziliano -- it's a famous approach. "No Reference Block Based Blur Detection" by Debing is a more recent paper
Regardless of what metric you use, think about how you will deal with false positives/negatives. As opposed to simple thresholding, I'd use the metric result to sort the images and then snip the end of the list that looks like it contains only blurry images.
Your task will be a lot simpler if your image set contains fairly similar content (e.g. faces only). This is because the image quality assessment metrics
can often be influenced by image content, unfortunately.
Google Scholar is truly your friend here. I wish I could give you a concrete solution, but I don't have one yet -- if I did, I'd be a very successful Masters student.
UPDATE:
Just thought of another idea: for each image, re-compress the image with JPEG and examine the change in file size before and after re-compression. If the file size after re-compression is significantly smaller than before, then it's likely the image is not heavily compressed, because it had some significant detail that was removed by re-compression. Otherwise (very little difference or file size after re-compression is greater) it is likely that the image was heavily compressed.
The use of the quality setting during re-compression will allow you to determine what exactly heavily compressed means.
If you're on Linux, this shouldn't be too hard to implement using bash and imageMagick's convert utility.
You can try other variations of this approach:
Instead of JPEG compression, try another form of degradation, such as Gaussian blurring
Instead of merely comparing file-sizes, try a full reference metric such as SSIM -- there's an OpenCV implementation freely available. Other implementations (e.g. Matlab, C#) also exist, so look around.
Let me know how you go.
I had many photos shot to an ancient book (so similar layout, two pages per image), but some were much blurred, to the point that the text could not be read. I searched for a ready-made batch script to find the most blurred one, but I didn't find any useful, so I used another part of script got on the net (based on ImageMagick, but no longer working; I couldn't retrieve the author for the credits!), useful to assessing the blur level of a single image, tweaked it, and automatised it over a whole folder. I uploaded here:
https://gist.github.com/888239
hoping it will be useful for someone else. It works on a Linux system, and uses ImageMagick (and some usually command line installed tools, as gawk, sort, grep, etc.).
One simple heuristic could be to look at width * height * color depth < sigma * file size. You would have to determine a good value for sigma, of course. sigma would be dependent on the expected entropy of the images you are looking at.

jpeg compression ratio

Is there a table that gives the compression ratio of a jpeg image at a given quality?
Something like the table given on the wiki page, except for more values.
A formula could also do the trick.
Bonus: Are the [compression ratio] values on the wiki page roughly true for all images? Does the ratio depend on what the image is and the size of the image?
Purpose of these questions: I am trying to determine the upper bound of the size of a compressed image for a given quality.
Note: I am not looking to make a table myself(I already have). I am looking for other data to check with my own.
I had exactly the same question and I was disappointed that no one created such table (studies based on a single classic Lena image or JPEG tombstone are looking ridiculous). That's why I made my own study. I cannot say that it is perfect, but it is definitely better than others.
I took 60 real life photos from different devices with different dimensions. I created a script which compress them with different JPEG quality values (it uses our company imaging library, but it is based on libjpeg, so it should be fine for other software as well) and saved results to CSV file. After some Excel magic, I came to the following values (note, I did not calculated anything for JPEG quality lower than 55 as they seem to be useless to me):
Q=55 43.27
Q=60 36.90
Q=65 34.24
Q=70 31.50
Q=75 26.00
Q=80 25.06
Q=85 19.08
Q=90 14.30
Q=95 9.88
Q=100 5.27
To tell the truth, the dispersion of the values is significant (e.g. for Q=55 min compression ratio is 22.91 while max value is 116.55) and the distribution is not normal. So it is not so easy to understand what value should be taken as typical for a specific JPEG quality. But I think these values are good as a rough estimate.
I wrote a blog post which explains how I received these numbers.
http://www.graphicsmill.com/blog/2014/11/06/Compression-ratio-for-different-JPEG-quality-values
Hopefully anyone will find it useful.
Browsing Wikipedia a little more led to http://en.wikipedia.org/wiki/Standard_test_image and Kodak's test suite. Although they're a little outdated and small, you could make your own table.
Alternately, pictures of stars and galaxies from NASA.gov should stress the compressor well, being large, almost exclusively composed of tiny speckled detail, and distributed in uncompressed format. In other words, HUBBLE GOTCHOO!
The compression you get will depend on what the image is of as well as the size. Obviously a larger image will produce a larger file even if it's of the same scene.
As an example, a random set of photos from my digital camera (a Canon EOS 450) range from 1.8MB to 3.6MB. Another set has even more variation - 1.5MB to 4.6MB.
If I understand correctly, one of the key mechanisms for attaining compression in JPEG is using frequency analysis on every 8x8 pixel block of the image and scaling the resulting amplitudes with a "quantization matrix" that varies with the specified compression quality.
The scaling of high frequency components often result in the block containing many zeros, which can be encoded at negligible cost.
From this we can deduce that in principle there is no relation between the quality and the final compression ratio that will be independent of the image. The number of frequency components that can be dropped from a block without perceptually altering its content significantly will necessarily depend on the intensity of those components, i.e. whether the block contains a sharp edge, highly variable content, noise, etc.