Video and audio blending/fading with gstreamer

I'm trying to evaluate GStreamer functionality for applicability in a new application.
The application should be able to dynamically play videos and images depending on a few criteria (user input, ...) that are not really relevant to this question. The main thing I was not able to figure out is how to achieve seamless crossfading/blending between successive content.
I was thinking about using the videomixer plugin and programmatically transitioning the sink pads' alpha values. However, I'm not sure whether this would work, nor whether it is a good idea to do so.
A GStreamer solution would be preferred because of its availability on both the development and target platforms. Furthermore, a custom video sink implementation may be used in the end for rendering the content to proprietary displays.
Edit: I was able to code up a prototype using two file sources fed into a videomixer, using GstInterpolationControlSource and GstTimedValueControlSource to bind and interpolate the videomixer's alpha control inputs. The fades look perfect; however, what I did not quite have on the radar is that I cannot dynamically change the file sources' location while the pipeline is running. Furthermore, it feels like misusing functions not intended for the job at hand.
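For anyone trying the same approach, binding a control source to a videomixer sink pad's alpha property looks roughly like the sketch below (a minimal sketch against the GStreamer 1.x controller API, usable from C or C++; the pad is assumed to be an already-requested videomixer sink pad, and the timing values are illustrative):

#include <gst/gst.h>
#include <gst/controller/gstinterpolationcontrolsource.h>
#include <gst/controller/gsttimedvaluecontrolsource.h>
#include <gst/controller/gstdirectcontrolbinding.h>

/* Fade the "alpha" property of a videomixer sink pad from 'from' to 'to'
 * between 'start' and 'start + duration' (pipeline running time). */
static void fade_pad_alpha(GstPad *pad, GstClockTime start, GstClockTime duration,
                           gdouble from, gdouble to)
{
    GstControlSource *cs = gst_interpolation_control_source_new();
    g_object_set(cs, "mode", GST_INTERPOLATION_MODE_LINEAR, NULL);

    /* Attach the control source to the pad's "alpha" property. */
    gst_object_add_control_binding(GST_OBJECT(pad),
        gst_direct_control_binding_new(GST_OBJECT(pad), "alpha", cs));

    /* Keyframes; direct control bindings expect values in the 0.0-1.0 range,
     * which maps 1:1 onto videomixer's alpha. */
    GstTimedValueControlSource *tv = GST_TIMED_VALUE_CONTROL_SOURCE(cs);
    gst_timed_value_control_source_set(tv, start, from);
    gst_timed_value_control_source_set(tv, start + duration, to);

    gst_object_unref(cs);
}

Calling this once per pad (one fading out, one fading in) produces the crossfade.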
Any feedback on how to tackle this use case would still be very much appreciated. Thanks!

Related

What is the path from BITMAP[+WAVE(s)] to RTSP (Twitch) via C/C++ in Windows?

So I'm trying to get a basic tool to output video/audio(s) to Twitch. I'm new to this side (A/V) of programming, so I'm not even sure what to look for. I'm trying to use mainly Windows infrastructure, and third-party libraries where that is not available.
What are the steps for getting raw bitmap and wave data into a codec and then into an RTSP client and finally showing up on Twitch? I'm not looking for code. I'm looking for concepts I can search for, as I'm not absolutely sure what to search for. I'd rather not go through the OBS source code to figure it out, and will use that only as a last resort.
So I capture the monitor via Output Duplication, and also the system sound as a wave and the microphone as another wave. I'm trying to push this to Twitch. I know that there's Media Foundation on Windows, but I don't know how far toward streaming it can get, as I assume there is no network code integrated into it? There is also the libav* collection from FFmpeg.
What are the basic steps of sending bitmap/wave to Twitch via any of the above libraries, or even others as long as they work on Windows? Please don't add code; I just need a not-very-long conceptual explanation and I'll take it from there. Try to also cover how bitrate and framerate get regulated (do I have to do it, or does the codec do it)?
Assume absolute noob level in this area (concept-wise not code-wise).
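For orientation only: the usual conceptual chain is capture, then encode (H.264 for video, AAC for audio), then mux into FLV, then push over RTMP (Twitch ingests RTMP rather than RTSP); you choose the target bitrate and framerate when configuring the encoder, and the codec handles rate control within that budget. Purely as a signpost for what the muxing/network step looks like in the libav route mentioned above, here is a heavily trimmed sketch; the URL, resolution, and stream setup are placeholders, and the capture/encoder loop is only indicated in comments:

extern "C" {
#include <libavformat/avformat.h>
}

/* Hypothetical sketch: open an FLV-over-RTMP output and stream already-encoded
 * H.264 packets to it. The URL is a placeholder carrying your stream key. */
int start_stream(const char *url /* e.g. "rtmp://live.twitch.tv/app/STREAM_KEY" */)
{
    AVFormatContext *oc = NULL;
    if (avformat_alloc_output_context2(&oc, NULL, "flv", url) < 0 || !oc)
        return -1;

    /* One video stream; the parameters must match what the encoder produces,
     * and the encoder's extradata (SPS/PPS) must be copied here as well. */
    AVStream *video = avformat_new_stream(oc, NULL);
    video->codecpar->codec_type = AVMEDIA_TYPE_VIDEO;
    video->codecpar->codec_id   = AV_CODEC_ID_H264;
    video->codecpar->width      = 1280;
    video->codecpar->height     = 720;

    if (avio_open(&oc->pb, url, AVIO_FLAG_WRITE) < 0)
        return -1;
    if (avformat_write_header(oc, NULL) < 0)
        return -1;

    /* Main loop (not shown): grab a desktop frame, feed it to an H.264 encoder
     * (avcodec_send_frame / avcodec_receive_packet), stamp pts/dts, then call
     * av_interleaved_write_frame(oc, &pkt). Audio follows the same pattern
     * with an AAC encoder on a second stream. */

    av_write_trailer(oc);
    avio_closep(&oc->pb);
    avformat_free_context(oc);
    return 0;
}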

adding "read aloud" feature to book app written in Cocos2D

I created a book app using Cocos2D and the Chipmunk physics engine. I would like to add a "read aloud" feature to it.
So far I have found instructions, books, and tutorials on how to add a read-aloud feature when the book is created with iBooks Author (but I couldn't use iBooks Author due to some limitations), using EPUB 3 and SMIL.
I also found a good tutorial from J. Shapiro on how to make a narrated book using AVSpeechSynthesizer. This helps, except that I would like to use a recorded voice rather than synthesized speech. I don't know whether this approach can be modified to do so.
I also know how it can be done in the Sprite Kit framework.
The only info that I couldn't find is how to add a "read aloud" feature to an app written using Cocos2D. Could it be done with SimpleAudioEngine, or can it be combined with some other engine (possibly from the Sprite Kit framework)?
I would very much appreciate it if somebody could give me some references, pointers, or tutorial links on where to look for answers on how to add this feature.
Thank you in advance.
"I would like to use recorded voice, rather than synthesized sound"
Good. Add your voice-recording audio files (CAF, WAV, or MP3 format) to the project. Play them back at the appropriate time using:
[[SimpleAudioEngine sharedEngine] playEffect:@"someVoiceRecordingFile.wav"];
Define what "read aloud" means to you, because I find that a lot of terms, especially semi-vague ones like this, are used differently depending on who is using them.
When you say read-aloud book, do you essentially mean a digital storybook that reads the story to you by simply playing narration audio? I've created dozens of these, and what you are asking has multiple steps depending on what features you are going for in your book. If you mean simply playing audio and that is it, then yes, you could do that in Cocos2D using SimpleAudioEngine (as one option), but I assume you already knew that, which is why this question has a bit of vagueness to it. Either way, you probably wouldn't want to play narration as an effect but rather stream it. To do that along with background music, you'd stream background music via the left channel and narration via the right. You can easily add a method to SimpleAudioEngine to make this nice and neat. To get you started, something similar to this can be used to access the right channel:
// Grab the long-audio source mapped to the right channel and (re)load the narration file.
CDLongAudioSource* sound = [[CDAudioManager sharedManager] audioSourceForChannel:kASC_Right];
if ([sound isPlaying])
{
    [sound stop];
}
[sound load:fileName];
// ...then start playback with [sound play];
Also use the proper settings and recommended formats for streaming audio, such as AIFC (or really for all audio in general). Although I believe you can stream MP3 without it being decompressed first, the problem is with timing. If you are using highlighted text or looping audio, then AIFC is the better option. Personally, I've never had a reason to use MP3. WAV for narration is something I'd avoid, even if just for the file-size increase. If the MP3 is decompressed even for streaming (which I'm not sure about off the top of my head), then you'd have a huge spike in memory that will be both highly unwanted and at times downright bad.
There are many other things that can go into it, but those are the basic first steps. If you want to do things like highlighted text, per-word animations, etc., then that will take more work, of course, and you'd need to be comfortable with Cocos2D, Sprite Kit, or whatever you decide to use. I'll be doing a tutorial series on this one day soon, so I'll cover all of that stuff.
On the other hand, if you are talking about recording someone's voice and having it play back, i.e. a mother recording herself reading the story so her child can hear her voice whenever they are using your app, then you'd simply record the audio like you would any other piece of audio, save it to the device, and play it back when the page is displayed in the proper reading mode (or whatever you personally call it). One place to look is AVAudioRecorder, which is part of the AVFoundation framework. Simply Google "iOS audio recording" for examples if you need them.

DirectShow-IMediaDet only extracts the first frame

I experienced a weird effect concerning DirectShow and splitters. I was not able to track it down, so maybe somebody can help?
In my application I extract a few frames from movies. I do this via DirectShow's IMediaDet interface. (BTW: it's XP SP3, DirectShow 9.0.)
Everything works fine as long as there is no media splitter involved (a splitter is involved for mp4, mkv, flv, ...).
Concerning codecs, I use the K-Lite distribution. For some time now it has come with two splitters, LAV and Haali; the Gabest splitter was removed a while ago. But everything worked fine only with the latter activated!
OK, the effect: it's about IMediaDet::GetBitmapBits. For some (most) media files that use splitters, it always extracts the very first frame. With some other media files that use splitters, this happens only when I have called get_StreamLength beforehand (although GetBitmapBits should switch back to bitmap-grab mode, as the documentation says).
As said, everything works fine as long as no splitter is involved (mpg, wmv, ...).
Has anyone experienced a similar effect? Where might the bug be: in DShow, in the splitters, or in my code?
Any help appreciated ... :-)
Your assumption is not quite correct. IMediaDet::GetBitmapBits builds a filter graph internally and attempts to seek playback to the position of interest, then starts streaming to get a valid image onto its internal Sample Grabber filter ("BitBucket").
It does not matter whether the splitter is a separate filter or combined with the source. The important part is the graph's ability to seek; a faulty filter might be an obstacle there, even though a snapshot is still taken. This is the symptom you are describing.
For instance, the internal graph it builds might contain a dedicated demultiplexer, with the snapshot taken from the correct position.
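To make the call pattern being discussed concrete, typical IMediaDet usage looks roughly like this (a sketch only: COM is assumed to be initialized, most error handling is trimmed, and the stream index and frame size are placeholders):

#include <windows.h>
#include <qedit.h>   // IMediaDet, CLSID_MediaDet (XP-era Platform SDK)

HRESULT GrabFrame(BSTR fileName, double streamTime)
{
    IMediaDet *det = NULL;
    HRESULT hr = CoCreateInstance(CLSID_MediaDet, NULL, CLSCTX_INPROC_SERVER,
                                  IID_IMediaDet, (void **)&det);
    if (FAILED(hr)) return hr;

    hr = det->put_Filename(fileName);
    if (SUCCEEDED(hr))
        hr = det->put_CurrentStream(0);        // real code would search for the video stream

    double length = 0.0;
    if (SUCCEEDED(hr))
        hr = det->get_StreamLength(&length);   // the call the question mentions

    long size = 0;
    if (SUCCEEDED(hr))                         // first call: query the required buffer size
        hr = det->GetBitmapBits(streamTime, &size, NULL, 320, 240);
    if (SUCCEEDED(hr))
    {
        char *buffer = new char[size];         // BITMAPINFOHEADER followed by the DIB bits
        hr = det->GetBitmapBits(streamTime, NULL, buffer, 320, 240);
        // ... use or save the bitmap ...
        delete[] buffer;
    }

    det->Release();
    return hr;
}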

Video Mixing Options

I am working on a bigger video-wall project and want to display multiple video sources on a single display.
What are all my options?
Java with JMF
Python with GStreamer bindings
Before committing to a technology, I want to get a clear picture of the available resources and their limitations.
With GStreamer you can realize this. You would use four uridecodebin instances and feed them into a videomixer. On each videomixer pad you can set the xpos, ypos, zorder, and alpha. Between the uridecodebins and the videomixer, you probably want to plug in scaling and framerate adaptation (videoscale and videorate).
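A rough sketch of that topology built with gst_parse_launch (GStreamer 1.x); the URIs, tile size, and positions are placeholders, only two of the four inputs are shown, and the inputs are assumed to be video-only:

#include <gst/gst.h>

int main(int argc, char *argv[])
{
    gst_init(&argc, &argv);

    GError *err = NULL;
    GstElement *pipeline = gst_parse_launch(
        "videomixer name=mix "
        "  sink_0::xpos=0   sink_0::ypos=0 sink_0::zorder=0 "
        "  sink_1::xpos=320 sink_1::ypos=0 sink_1::zorder=1 "
        "  ! videoconvert ! autovideosink "
        "uridecodebin uri=file:///path/to/a.mp4 ! videoscale ! videorate "
        "  ! video/x-raw,width=320,height=240,framerate=25/1 ! mix.sink_0 "
        "uridecodebin uri=file:///path/to/b.mp4 ! videoscale ! videorate "
        "  ! video/x-raw,width=320,height=240,framerate=25/1 ! mix.sink_1 ",
        &err);
    if (!pipeline) {
        g_printerr("Failed to build pipeline: %s\n", err->message);
        g_clear_error(&err);
        return 1;
    }

    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    /* Run until error or end of stream. */
    GstBus *bus = gst_element_get_bus(pipeline);
    GstMessage *msg = gst_bus_timed_pop_filtered(bus, GST_CLOCK_TIME_NONE,
        (GstMessageType)(GST_MESSAGE_ERROR | GST_MESSAGE_EOS));
    if (msg)
        gst_message_unref(msg);

    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(bus);
    gst_object_unref(pipeline);
    return 0;
}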

Video mixer filter

I need to find a video filter in order to mix multiple video streams (let's say, maximum 4).
I've found a video mixer filter from MediaLooks and it is OK, but the problem is that I'm trying to use it in a school project (for the entire semester), so the 30-day trial is kind of unacceptable.
So my question is: are you aware of a free DirectShow filter that could help? If not, then it means I must write one. The problem there is that I don't know where to start.
If you need output to the display, you can use the VMR. If you need output to a file, then I think you will need to write something. The standard solution to this is to write an allocator/presenter plugin for the VMR that allows you to get back the mixed video and then save it somewhere. This is more efficient than a fully software-only mixer filter.
I finally ended up implementing my own filter.
The Video Mixing Renderer 9 (and 7) will do the trick for you. You can set the opacity and area of each video going into the VMR-9. I suggest playing with it from within GraphEdit.
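A rough sketch of the same idea in code, using the documented VMR-9 interfaces (graph building, source connection, and most error handling are omitted; the stream count and rectangles are placeholders):

#include <dshow.h>
#include <d3d9.h>
#include <vmr9.h>

// Put the VMR-9 into mixing mode (more than one input stream) before its pins connect.
HRESULT ConfigureVmr9(IGraphBuilder *graph, IBaseFilter **vmrOut)
{
    IBaseFilter *vmr = NULL;
    HRESULT hr = CoCreateInstance(CLSID_VideoMixingRenderer9, NULL,
                                  CLSCTX_INPROC_SERVER, IID_IBaseFilter,
                                  (void **)&vmr);
    if (FAILED(hr)) return hr;

    IVMRFilterConfig9 *config = NULL;
    hr = vmr->QueryInterface(IID_IVMRFilterConfig9, (void **)&config);
    if (SUCCEEDED(hr)) {
        hr = config->SetNumberOfStreams(4);
        config->Release();
    }
    if (SUCCEEDED(hr))
        hr = graph->AddFilter(vmr, L"VMR9");

    *vmrOut = vmr;   // the caller connects the source branches to the VMR-9 input pins
    return hr;
}

// After the inputs are connected, each stream can be positioned and faded.
HRESULT PlaceStream(IBaseFilter *vmr, DWORD stream, float alpha,
                    float left, float top, float right, float bottom)
{
    IVMRMixerControl9 *mixer = NULL;
    HRESULT hr = vmr->QueryInterface(IID_IVMRMixerControl9, (void **)&mixer);
    if (FAILED(hr)) return hr;

    VMR9NormalizedRect rect = { left, top, right, bottom };  // coordinates in 0.0 .. 1.0
    hr = mixer->SetOutputRect(stream, &rect);
    if (SUCCEEDED(hr))
        hr = mixer->SetAlpha(stream, alpha);

    mixer->Release();
    return hr;
}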
I would also like to suggest skipping that altogether. If you use WPF, you will get far more media capability, much more easily.
If you want low-level DirectShow support, you can try my project, WPF MediaKit. I have a control called MediaUriElement that is similar to WPF's MediaElement.