Single-channel capture, or even compression, with getUserMedia

Is there some way to get just the left channel (I'm not even sure whether both channels automatically get written to, even though the microphone is mono) and, ideally, to compress the data from that channel? The only option I've seen written about is navigator.getUserMedia({audio:true}). Is that really it?

Related

Windows Media Foundation: IMFSourceReader::SetCurrentMediaType execution time issue

I'm currently working on retrieving image data from a video capturing device.
It is important for me that I have raw output data in a rather specific format, and I need a continuous data stream; therefore I decided to use the IMFSourceReader. I pretty much understand how it works. To get the whole pipeline working, I checked the output formats of the camera and looked at the available Media Foundation Transforms (MFTs).
The critical function here is IMFSourceReader::SetCurrentMediaType. I'd like to elaborate on one critical behavior I discovered. If I just call the function with the parameters of my desired output format, it changes some parameters like fps or resolution, but the call succeeds. If I first call the function with a native media type carrying my desired parameters but a wrong subtype (like MJPG or something) and then call it again with my desired parameters and the correct subtype, the call succeeds and I end up with my correct parameters. I suspect this only works if fitting MFTs (decoders) are available.
So far I've pretty much beaten WMF into giving me what I want. The problem now is that the second call to IMFSourceReader::SetCurrentMediaType takes a long time. The duration depends heavily on the camera used, varying from 0.5 s to 10 s. To be honest, I don't really know why it's taking so long, but I think the calculation of the correct transformation paths and/or the initialization of the transformations is the problem. I noticed an excessive amount of loading and unloading of the same DLLs (ntasn1.dll, ncrypt.dll, igd10iumd32.dll), but loading them once myself didn't change anything.
So does anybody know this issue and has a quick fix for it?
Or does anybody know a work around to:
Get raw image data via Media Foundation without the use of IMFSourceReader?
Select and load the transformations myself, to support the source reader call?
You basically described the way the Source Reader is supposed to work in the first place. The underlying media source has its own media types, and the reader can supply a conversion when it needs to fit the requested media type to the closest original one.
Video capture devices tend to expose many [native] media types (I have a webcam which enumerates 475 of them!), so if format fitting does not go well, source reader might take some time to try one conversion or another.
Note that you can disable the source reader's conversions by applying certain attributes like MF_READWRITE_DISABLE_CONVERTERS, in which case the inability to set a video format directly on the source would result in a failure.
You can also read data in the device's original format and decode/convert/process it yourself by feeding the data into one MFT or a chain of them. Typically, when you set the respective format on the source reader, the source reader manages the MFTs for you. If you prefer, however, you can do it yourself too. Unfortunately, you cannot build a chain of MFTs for the source reader to manage. Either you leave it to the source reader completely, or you set a native media type, read the data in the original format from the reader, and then manage the MFTs on your side using IMFTransform::ProcessInput, IMFTransform::ProcessOutput and friends. This is not as easy as using the source reader, but it is doable.
Since VuVirt does not want to write any answer, I'd like to add one for him and everybody who has the same issue.
Under some conditions the call to IMFSourceReader::SetCurrentMediaType takes a long time when the target format is some kind of RGB and is not natively available. To get rid of this, I adjusted my image pipeline to be able to interpret YUV (YUY2). I still have no idea why this is the case, but it is a working workaround for me. I don't know of any alternative to speed the call up.
Additional hint: I've found that there are usually several IMFTransforms available to decode many native formats to YUY2. So if you are able to use YUY2, you are safe. NV12 is another working alternative, though there are probably more.
Thanks for your answer anyways

Streaming File Delta Encoding/Decoding

Here's the problem - I want to generate the delta of a binary file (> 1 MB in size) on a server and send the delta to a memory-constrained (low on RAM and no dynamic memory) embedded device over HTTP. Deltas are preferred (as opposed to sending the full binary file from the server) because of the high cost involved in transmitting data over the wire.
Trouble is, the embedded device cannot decode deltas and create the contents of the new file in memory. I have looked into various binary delta encoding/decoding algorithms like bsdiff, VCDiff etc. but was unable to find libraries that supported streaming.
Perhaps, rather than asking if there are suitable libraries out there, are there alternate approaches I can take that will still solve the original problem (send minimal data over the wire)? Although it would certainly help if there are suitable delta libraries out there that support streaming decode (written in C or C++ without using dynamic memory).
Maintain a copy on the server of the current file as held by the embedded device. When you want to send an update, XOR the new version of the file with the old version and compress the resultant stream with any sensible compressor. (Algorithms which accept high-cost encoding in exchange for low-cost decoding would be particularly helpful here.) Send the compressed stream to the embedded device, which reads the stream, decompresses it on the fly and XORs it directly into (a copy of) the target file.
If your updates are such that the file content changes little over time and retains a fixed structure, the XOR stream will be predominantly zeroes, and will compress extremely well: number of bytes transmitted will be small, effort to decompress will be low, memory requirements on the embedded device will be minimal. The further your model is from these assumptions, the less this approach will gain you.
Since you said the delta could be arbitrarily random (from zero delta to a completely different file), compression of the delta may be a lost cause. Lossless compression of random binary data is theoretically impossible. Also, since the embedded device has limited memory anyway, using a sophisticated -and therefore computationally expensive- library for compression/decompression of the occasional "simple" delta will probably be infeasible.
I would recommend simply sending the new file to the device in raw byte format, and overwriting the existing old file.
As Kevin mentioned, compressing random data should not be your goal. A few more comments about the type of data you're working with would be helpful. Context is key in compression.
You used the term image, which makes this sound like the classic video codec challenge. If you've ever seen weird video aliasing effects that impact the portion of the frame that has changed, and then suddenly everything clears up, you've likely witnessed a key frame along with a series of delta frames where the delta frames were not properly applied.
In this model, the server decides what's cheaper:
complete key frame
delta commands
The delta commands are communicated as a series of write instructions that can overlay the client's existing buffer.
Example Format:
[Address][Length][Repeat][Delta Payload]
[Address][Length][Repeat][Delta Payload]
[Address][Length][Repeat][Delta Payload]
There are likely a variety of methods for computing these delta commands. A brute force method would be:
Perform a Smith-Waterman alignment between the two images.
Compress the resulting transform into delta commands.
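As an illustration of the [Address][Length][Repeat][Delta Payload] layout above, here is a hypothetical client-side routine; the struct and field names are made up for this sketch:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical command matching the layout above: write `length`
// payload bytes at `address`, repeated `repeat` times back-to-back.
struct DeltaCommand {
    uint32_t address;       // write offset into the client's buffer
    uint16_t length;        // payload length in bytes
    uint16_t repeat;        // consecutive copies to write
    const uint8_t* payload; // points into the received stream
};

// Overlay one command onto the client's existing buffer.
void apply_delta(uint8_t* buffer, size_t bufferSize,
                 const DeltaCommand& cmd) {
    size_t offset = cmd.address;
    for (uint16_t r = 0; r < cmd.repeat; ++r) {
        if (offset + cmd.length > bufferSize) break; // guard overrun
        std::memcpy(buffer + offset, cmd.payload, cmd.length);
        offset += cmd.length;
    }
}
```

The Repeat field pays off for runs of identical content (e.g. a cleared region), where one payload covers many writes.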

How to hack ffmpeg to consider I-Frames as key frames?

I'm trying to get ffmpeg to seek H.264 interlaced videos, and I found that I can seek to any frame if I just force it.
I already hacked the decoder to consider I-frames as keyframes, and it works nicely with the videos I need it to work with. And there will NEVER be any videos encoded with different encoders.
However, I'd like the seek to find me an I-frame and not just any frame.
What I'd need to do is hack the AVIndexEntry creation so that it marks any frame that is an I-frame as a key frame.
Or alternatively, hack the search code to return I-frames.
The code gets a tad difficult to follow at this point.
Can someone please point me at the correct place in ffmpeg code which handles this?
This isn't possible as far as I can tell...
But if you do know where the I-frames are, either by decoding the entire video or by just knowing, you can insert entries into the AVIndexEntry information stored in the stream.
AVIndexEntries have a flag that tells whether the entry is a keyframe; just set it to true on I-frames.
Luckily, I happen to know where they are in my videos :)
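To illustrate the idea, here is a minimal sketch; `IndexEntry` is a simplified stand-in for ffmpeg's real `AVIndexEntry` (declared in libavformat/avformat.h, and normally populated via `av_add_index_entry`), though `AVINDEX_KEYFRAME` matches the real flag value:

```cpp
#include <cstdint>
#include <vector>

// Real flag value from libavformat/avformat.h.
constexpr int AVINDEX_KEYFRAME = 0x0001;

// Simplified mirror of ffmpeg's AVIndexEntry for this sketch.
struct IndexEntry {
    int64_t pos;        // byte position of the frame in the file
    int64_t timestamp;  // presentation timestamp
    int flags;          // AVINDEX_KEYFRAME marks a seekable point
    bool isIFrame;      // hypothetical: known from decoding, or a priori
};

// Mark every known I-frame as a key frame so seeking lands on it.
void promote_iframes(std::vector<IndexEntry>& index) {
    for (auto& e : index)
        if (e.isIFrame)
            e.flags |= AVINDEX_KEYFRAME;
}
```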
-mika

Can we load, display and manipulate image's matrix without using any library in c++?

Is it possible to make changes to an image's pixel matrix without using any library in C++? And to load and display the image as well?
Sure. Grab a copy of the specification for whatever image format you're interested in and write the read/write functions yourself.
Note that to write display functionality without an external library you'll likely need to run your code in kernel mode to get to the frame buffer memory, but that can certainly be done.
Not that you'd necessarily want to do it that way...
Like any typical file, an image file is simply made up of bytes; there is nothing special about an image file.
In my opinion, the most difficult part of reading/writing image files without the use of a library is understanding the file format. Once you understand the format, all you need to do is define appropriate data structures and read the image data into them (for more advanced formats you may have to do some extra work e.g. decompression).
The simplest image format to work with would have to be PPM. It's a pretty bad format but it's nice and easy to read in and write back to a file.
http://netpbm.sourceforge.net/doc/ppm.html
Apart from that, bitmaps are also pretty simple to work with. Like Drew said, just download a copy of the specification and work from there.
As for displaying images, I think you're best off using a library or framework unless you want to see how it's done for the sake of learning.

Extract and analyse sound from mp3 files

I have a set of mp3 files, some of which have extended periods of silence or periodic intervals of silence. How can I programmatically detect this?
I am looking for a library in C++, or preferably C#, that will allow me to examine the sound content of these files for the silences.
EDIT: I should elaborate on what I am trying to achieve. I am capturing streaming sports commentary using VLC and saving it to mp3. When a game is delayed or cancelled, the streaming commentary is replaced by a repetitive message saying commentary is not available. By looking for these periodic silences (or total silence), I can detect that there is no commentary and stop the streaming recording.
For this reason I am reluctant to decompress the mp3, because it would mean my test for these silences would be very slow. Unless I can decode just the last 5 minutes of the file?
Thanks
Andrew
I'm not aware of a library that will detect silence directly in the MP3-encoded data, since it's not a trivial task to detect silence without first decompressing. Luckily, it's easy to find libraries that decode MP3 files and expose them as PCM data, and it's trivial to detect silence in PCM data. Here is one such library for C# I found, but I'm sure there are tons: http://www.robburke.net/mle/mp3sharp/
Once you decode the data, you will have a list of PCM samples. In its most basic form, the algorithm you need to detect silence is simply to analyze small chunks (could be as little as 0.25 s or as much as several seconds) and make sure that the absolute value of each sample in the chunk is below a threshold. The threshold value you use determines how 'quiet' the sound has to be to be considered silence, and the chunk size determines how long the volume needs to stay below that threshold to be considered silence. (If you go with very short chunks, you will get lots of false positives due to samples near zero crossings, but 0.25 s or longer should be OK.) There are improvements to the basic approach, such as hysteresis (which is basically using two thresholds, one for the transition to silence and one for the transition from silence) and filtering.
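A minimal sketch of the chunk-threshold test described above, working on raw 16-bit PCM samples (the function names and the scan-from-the-end strategy, which suits checking only the last few minutes of a recording, are illustrative):

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// True if every sample in [begin, begin + count) stays below the
// threshold -- the basic "quiet chunk" test.
bool chunk_is_silent(const std::vector<int16_t>& pcm, size_t begin,
                     size_t count, int16_t threshold) {
    for (size_t i = begin; i < begin + count && i < pcm.size(); ++i)
        if (std::abs(pcm[i]) >= threshold) return false;
    return true;
}

// Count consecutive silent chunks at the end of the buffer, with
// chunkSize samples per chunk (e.g. 0.25 s worth at the sample rate).
// A long run of trailing silent chunks suggests the commentary has
// stopped and the recording can be ended.
size_t trailing_silent_chunks(const std::vector<int16_t>& pcm,
                              size_t chunkSize, int16_t threshold) {
    size_t silent = 0;
    for (size_t end = pcm.size(); end >= chunkSize; end -= chunkSize) {
        if (!chunk_is_silent(pcm, end - chunkSize, chunkSize, threshold))
            break;
        ++silent;
    }
    return silent;
}
```

For the repetitive "no commentary" message rather than pure silence, you would instead look for the silence gaps between repetitions occurring at a regular period.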
Unfortunately, I don't know of a library for C++ or C# that implements level detection offhand, and nothing immediately springs up on Google, but at least for the simple version it's pretty easy to code.
Edit: Also, this library seems interesting: http://naudio.codeplex.com/
Also, while not a true duplicate question, the answers here will be useful for you:
Detecting audio silence in WAV files using C#