Windows Media Foundation: IMFSourceReader::SetCurrentMediaType execution time issue - c++

I'm currently working on retrieving image data from a video capturing device.
It is important for me that I have raw output data in a rather specific format and I need a continuous data stream. Therefore I figured to use the IMFSourceReader. I pretty much understand how it is working. For the whole pipeline to work I checked the output formats of the camera and looked at the available Media Foundation Transforms(MFTs).
The critical function here is IMFSourceReader::SetCurrentMediaType. I'd like to elaborate one critical functionality I discovered. If I just use the function with the parameters of my desired output format, it changes some parameters like fps or resolution, but the call succeeds. When I first call the function with a native media type with my desired parameters and a wrong subtype (like MJPG or sth.) and call it again with my desired parameters and the correct subtype the call succeeds and I end up with my correct parameters. I suspect this is only true, if fitting MFTs (decoders) are available.
So far I've pretty much beaten the WMF to get what I want. The Problem now is, that the second call of IMFSourceReader::SetCurrentMediaType takes a long time. The duration depends heavily on the camera used. Varying from 0.5s to 10s. To be honest I don't really know why its taking so long, but I think the calculation of the correct transformation paths and/or the initialization of the transformations is the problem. I recognized an excessive amount of loading and unloading of the same dlls(ntasn1.dll, ncrypt.dll, igd10iumd32.dll). But loading them once myself didn't change anything.
So does anybody know this issue and has a quick fix for it?
Or does anybody know a work around to:
Get raw image data via media foundation without the use ofIMFSourceReader?
Select and load the transformations myself, to support the source reader call?

You basically described the way Source Reader is supposed to work in first place. The underlying media source has its own media types and the reader can supply a conversion if it ever needs to fit requested media type and closest original.
Video capture devices tend to expose many [native] media types (I have a webcam which enumerates 475 of them!), so if format fitting does not go well, source reader might take some time to try one conversion or another.
Note that you can disable source reader's conversions by applying certain attributes like MF_READWRITE_DISABLE_CONVERTERS, in which case inability to set a video format directly on the source would result in an failure.
You can also read data in original device's format and decode/convert/process yourself by feeding the data into one or a chain of MFTs. Typically, when you set respective format on the source reader, the source reader manages the MFTs for you. If you however prefer, you can do it yourself too. Unfortunately you cannot build a chain of MFTs for the source reader to manage. Either you leave it on source reader completely, or you set native media type, you read the data in original format from the reader, and then you manage the MFTs on your side by doing IMFTransform::ProcessInput, IMFTransform::ProcessOutput and friends. This is not as easy as source reader, but is doable.

Since VuVirt does not want to write any answer, I'd like to add one for him and everybody who has the same issue.
Under some conditions the call IMFSourceReader::SetCurrentMediaType takes a long time, when the target format is RGB of some kind and is not natively available. So to get rid of it, I adjusted my image pipeline to be able to interpret YUV (YUY2). I still have no idea, why this is the case, but this is a working work around for me. I don't know any alternative to speed the call up.
Additional hint: I've found that there are usually several IMFTransforms to decode many natively available formats to YUY2. So, if you are able to use YUY2, you are safe. NV12 is another working alternative. Though there are probably more.
Thanks for your answer anyways

Related

Structured message compression

Are there any libraries for compression of structured messages? (like protobufs)
I am looking for something better than just passing a serialized stream through GZip. For example, if my message stores a triangle mesh, the coordinates of adjacent vertices will be highly correlated, so a smart compressor could store deltas instead of the raw coordinates, which would require less bits to encode.
Whereas a general compressor, that doesn't know anything about the stream structure, would be looking for repeating byte sequences, which in data like that, there won't be many.
Ideally, this should work completely automatically after being provided with a schema, but I wouldn't mind adding annotations to my schema, if it came to that.
The main problem here is that most of the time writing some schema will have a similar effort to programming a preprocessor for the data yourself. E.g. for your triangle mesh example, reordering the data or doing a delta on coordinates can be implemented very easy and will support any subsequent compressor very well.
A compressor going in that direction is ZPAQ. It can use config files tailored to specific data (the sample configuration site includes EXE, JPG, BMP configs as well as a specialized one to compress a file containing the mathematical constant pi). The downside is that the script language used here (ZPAQL) is quite complicated to use and you've got to know much of the ZPAQ internals.
Older versions of WinRAR used a virtual machine named RarVM (though deprecated now) that allowed for assembler-like code for custom data transformations, there's an open source project named rarvmtools on GitHub with some related tools.
For protobuf compression, there's a Google project called riegeli that might be able to further compress them.

Partial decoding h264 stream

I'm trying to get information about frames in h264 bitstream. Especially motion vectors of macroblocks. I think, I have to use ffmpeg code for it, but it's really huge and hard to understand.
So, can someone give me some tips or exapmles of partial decoding from raw data of single frame from h264 stream?
Thank you.
Unfortunately, to get that level of information from the bitstream you have to decode every macroblock, there's no quick option, like there would be for getting information from the slice header.
One option is to use the h.264 reference software and turn on the verbose debug output and/or add your own printf's where needed, but this is also a large code base to navigate:
http://iphome.hhi.de/suehring/tml/
(You can also use ffmpeg and add output where needed too as you said, but it would take some understanding of that code base too)
There are graphical tools for analyzing video bitstreams which will show you this type of information on a per-macroblock basis, many are expensive, but sometimes there are free trial versions available.

Can we load, display and manipulate image's matrix without using any library in c++?

is it possible to do changes to image's matrix without using any library in c++? to load and display image as well?
Sure. Grab a copy of the specification for whatever image format you're interested and write the read/write functions yourself.
Note that to write display functionality without an external library you'll likely need to run your code in kernel mode to get to the frame buffer memory, but that can certainly be done.
Not that you'd necessarily want to do it that way...
Like any typical file, an image file is simply made up of bytes; there is nothing special about an image file.
In my opinion, the most difficult part of reading/writing image files without the use of a library is understanding the file format. Once you understand the format, all you need to do is define appropriate data structures and read the image data into them (for more advanced formats you may have to do some extra work e.g. decompression).
The simplest image format to work with would have to be PPM. It's a pretty bad format but it's nice and easy to read in and write back to a file.
http://netpbm.sourceforge.net/doc/ppm.html
Apart from that, bitmaps are also pretty simple to work with. Like Drew said, just download a copy of the specification and work from there.
As for displaying images, I think you're best off using a library or framework unless you want to see how it's done for the sake of learning.

Extract and analyse sound from mp3 files

I have a set of mp3 files, some of which have extended periods of silence or periodic intervals of silence. How can I programmatically detect this?
I am looking for a library in C++, or preferably C#, that will allow me to examine the sound content of these files for the silences.
EDIT: I should elaborate what I am trying to achieve. I am capturing streaming sports commentary using VLC and saving it to mp3. When a game is delayed, or cancelled, the streaming commentary is replaced by a repetitive message saying commentary is not available. By looking for these periodic silences (or total silence), I can detect if there is no commentary and stop the streaming recording
For this reason I am reluctant to decompress the mp3 because if would mean my test for these silences would be very slow. Unless I can decode the last 5 minutes of the file?
Thanks
Andrew
I'm not aware of a library that will detect silence directly in the MP3 encoded data, since its not a trivial task to detect silence without first decompressing. Luckily, its easy to find libraries that decode MP3 files and access them as PCM data, and its trivial to detect silence in PCM Data. Here is one such Library for C# I found, but I'm sure there are tons: http://www.robburke.net/mle/mp3sharp/
Once you decode the data, you will have a list of PCM samples. In the most basic form, the algorithm you need to detect silence is simply to analyze a small chunks (could be as little as .25s or as much as several seconds), and make sure that the absolute value of each sample in the chunk is below a threshold. The threshold value you use determines how 'quiet' the sound has to be to be considered silence, and the chunk size determines how long the volume needs to be below that threshold to be considered silence (If you go with very short chunks, you will get lots of false positives due to samples near zero-crossings, but .25s or higher should be ok. There are improvements to the basic approach such as using historesis (which is basically using two thresholds, one for the transition to silence, and one for the transition from silence), and filtering.
Unfortunately, I don't know a library for C++ or C# that implements level detection off hand, and nothing immediately springs up on google, but at least for the simple version its pretty easy to code.
Edit: Also, this library seems interesting: http://naudio.codeplex.com/
Also, while not a true duplicate question, the answers here will be useful for you:
Detecting audio silence in WAV files using C#

How do I write a Perl script to filter out digital pictures that have been doctored?

Last night before going to bed, I browsed through the Scalar Data section of Learning Perl again and came across the following sentence:
the ability to have any character in a string means you can create, scan, and manipulate raw binary data as strings.
An idea immediately hit me that I could actually let Perl scan the pictures that I have stored on my hard disk to check if they contain the string Adobe. It seems by doing so, I can tell which of them have been photoshopped. So I tried to implement the idea and came up with the following code:
#!perl
use autodie;
use strict;
use warnings;
{
local $/="\n\n";
my $dir = 'f:/TestPix/';
my #pix = glob "$dir/*";
foreach my $file (#pix) {
open my $pic,'<', "$file";
while(<$pic>) {
if (/Adobe/) {
print "$file\n";
}
}
}
}
Excitingly, the code seems to be really working and it does the job of filtering out the pictures that have been photoshopped. But problem is many pictures are edited by other utilities. I think I'm kind of stuck there. Do we have some simple but universal method to tell if a digital picture has been edited or not, something like
if (!= /the origianl format/) {...}
Or do we simply have to add more conditions? like
if (/Adobe/|/ACDSee/|/some other picture editors/)
Any ideas on this? Or am I oversimplifying due to my miserably limited programming knowledge?
Thanks, as always, for any guidance.
Your best bet in Perl is probably ExifTool. This gives you access to whatever non-image information is embedded into the image. However, as other people said, it's possible to strip this information out, of course.
I'm not going to say there is absolutely no way to detect alterations in an image, but the problem is extremely difficult.
The only person I know of who claims to have an answer is Dr. Neal Krawetz, who claims that digitally altered parts of an image will have different compression error rates from the original portions. He claims that re-saving a JPEG at different quality levels will highlight these differences.
I have not found this to be the case, in my investigations, but perhaps you might have better results.
No. There is no functional distinction between a perfectly edited image, and one which was the way it is from the start - it's all just a bag of pixels in the end, after all, and any other metadata you can remove or forge all you want.
The name of the graphics program used to edit the image is not part of the image data itself but of something called meta data - which may be stored in the image file but, as others have noted, is neither required (so some programs may not store it, some may allow you an option of not storing it) nor reliable - if you forged an image, you might have forged the meta data as well.
So the answer to your question is "no, there's no way to universally tell if the pic was edited or not, although some image editing software may write its signature into the image file and it'll be left there by carelessness of the editing person.
If you're inclined to learn more about image processing in Perl, you could take a look at some of the excellent modules CPAN has to offer:
Image::Magick - read, manipulate and write of a large number of image file formats
GD - create colour drawings using a large number of graphics primitives, and emit the drawings in various formats.
GD::Graph - create charts
GD::Graph3d - create 3D Graphs with GD and GD::Graph
However, there are other utilities available for identifying various image formats. It's more of a question for Super User, but for various unix distros you can use file to identify many different types of files, and for MacOSX, Graphic Converter has never let me down. (It was even able to open the bizarre multi-file X-ray of my cat's shattered pelvis that I got on a disc from the vet.)
How would you know what the original format was? I'm pretty sure there's no guaranteed way to tell if an image has been modified.
I can just open the file (with my favourite programming language and filesystem API) and just write whatever I want into that file willy-nilly. As long as I don't screw something up with the file format, you'd never know it happened.
Heck, I could print the image out and then scan it back in; how would you tell it from an original?
As other's have stated, there is no way to know if the image was doctored. I'm guessing what you basically want to know is the difference between a realistic photograph and one that has been enhanced or modified.
There's always the option of running some extremely complex image recognition algorithm that would analyze every pixel in your image and do some very complicated stuff to determine if the image was doctored or not. This solution would probably involve AI which would examine millions of photos that are both doctored and those that are not and learn from them. However, this is more of a theoretical solution and isn't very practical... you would probably only see it in movies. It would be extremely complex to develop and probably take years. And even if you did get something like this to work, it probably still wouldn't be 100% correct all the time. I'm guessing AI technology still isn't at that level and could take a while until it is.
A not-commonly-known feature of exiftool allows you to recognize the originating software through an analysis of the JPEG quantization tables (not relying on image metadata). It recognizes tables written by many applications. Note that some cameras may use the same quantization tables as some applications, so this isn't a 100% solution, but it is worth looking into. Here is an example of exiftool run on two images, the first was edited by photoshop.
> exiftool -jpegdigest a.jpg b.jpg
======== a.jpg
JPEG Digest : Adobe Photoshop, Quality 10
======== b.jpg
JPEG Digest : Canon EOS 30D/40D/50D/300D, Normal
2 image files read
This will work even if the metadata has been removed.
There is existing software out there which uses various techniques (compression artifacting, comparison to signature profiles in a database of cameras, etc.) to analyze the actual image data for evidence of alteration. If you have access to such software and the software available to you provides an API for external access to these analysis functions, then there's a decent chance that a Perl module exists which will interface with that API and, if no such module exists, it could probably be created rather quickly.
In theory, it would also be possible to implement the image analysis code directly in native Perl, but I'm not aware of anyone having done so and I expect that you'd be better off writing something that low-level and processor-intensive in a fully-compiled language (e.g., C/C++) rather than in Perl.
http://www.impulseadventure.com/photo/jpeg-snoop.html
is a tool that does the job almost good
If there has been any cloning , there is a variation in the pixel density..or concentration which sometimes shows up.. upon manual inspection
a Photoshop cloned area will have even pixel density(my meaning is variation of Pixels wrt a scanned image)