Tee/passthrough DirectShow data as video source - c++

I have an application that gets video samples from a frame grabber card via DirectShow. The application then does some processing and sends the video signal over a network. I now want to duplicate this video signal such that another DirectShow-enabled software (like Skype) can use the original input signal, too.
I know that you can create Tee filters in DirectShow, like the one used to split a video signal for recording and preview. However, as I understand it, this filter is only useful within a single graph, i.e. I cannot use it to forward the video from my process to e.g. Skype.
I also know that I could write my own video source, but this would run in the process of the consuming application. The problem is that I cannot put the logic of my original application in such a video source filter.
The only solution I could think of is my application writing the frames to a shared memory block and a video source filter reading them from there. Synchronisation would be done using a shared mutex or similar. Could that work? I specifically do not like the synchronisation part.
And more importantly, is there a better solution to solve this problem?

The APIs work the way you identified: a video capture application such as Skype requests a video stream without interprocess communication in mind; there is no IPC involved for consuming output generated in another process. Your challenge is to provide this IPC yourself, so that one application generates the data while another one extends the existing API (a virtual video source device), picks up that data, and delivers it as if it were generated locally.
With video you have a relatively large stream of data, and you are interested in avoiding excessive copying. File mappings (also known as shared memory) are the right tool: you put bytes into the mapping in one process and they are immediately visible in the other. You can synchronise access to the data using named events and mutexes, which both processes use collaboratively to signal the availability of a new buffer of data, to indicate that a used buffer is no longer needed, and so on.
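For illustration, here is a minimal sketch of the producer side of such a scheme on Windows, assuming a fixed-size RGB24 frame; the object names are hypothetical, and the virtual source filter would open the same mapping, wait on the event, and copy the frame out under the mutex.

```cpp
#include <windows.h>
#include <cstring>
#include <vector>

// Hypothetical object names shared between the two processes.
const wchar_t* kMapName   = L"Local\\MyAppVideoFrame";
const wchar_t* kEventName = L"Local\\MyAppFrameReady";
const wchar_t* kMutexName = L"Local\\MyAppFrameLock";
const DWORD    kFrameSize = 640 * 480 * 3;   // RGB24 frame, for example

int main()
{
    // Create (or open) the shared memory block and the synchronisation objects.
    HANDLE hMap = CreateFileMappingW(INVALID_HANDLE_VALUE, nullptr,
                                     PAGE_READWRITE, 0, kFrameSize, kMapName);
    void*  view = MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, kFrameSize);
    HANDLE hEvt = CreateEventW(nullptr, FALSE, FALSE, kEventName);  // auto-reset
    HANDLE hMtx = CreateMutexW(nullptr, FALSE, kMutexName);

    // For every processed frame: copy it into the mapping under the mutex,
    // then signal the event so the virtual source filter can pick it up.
    std::vector<BYTE> frame(kFrameSize);         // filled by your processing
    WaitForSingleObject(hMtx, INFINITE);
    std::memcpy(view, frame.data(), kFrameSize);
    ReleaseMutex(hMtx);
    SetEvent(hEvt);

    UnmapViewOfFile(view);
    CloseHandle(hMtx);
    CloseHandle(hEvt);
    CloseHandle(hMap);
    return 0;
}
```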

Related

How to read Timing Reference Signal/Ancillary data from a video stream?

I'm searching for a solution that makes it possible to read the Timing Reference Signal (TRS) and Ancillary data (HANC and VANC) of a serial digital video signal. The TRS gives information about the start and end of active video (SAV/EAV); the Ancillary data carries, for example, embedded audio. With this I want to write an application that analyzes the data transported in the non-picture area of serial video.
I have read a lot about GStreamer and found the GstVideo Ancillary collection, which makes it possible to handle the Ancillary data of a video.
Unfortunately, it's not clear to me how this collection works. It looks like it can only construct Ancillary data for a video, and that it is not possible to read Ancillary data from a detected video stream.
Another idea is to read the whole video stream and, as a first step, display the data words of the stream. TRS and ANC packets have to start with a special sequence of identifiers, which makes it possible to locate them. Is GStreamer the right choice for this? Are there better-suited libraries for this task?
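If you do go the route of scanning raw 10-bit words yourself, a minimal sketch could look like the following. It assumes the words have already been unpacked into one uint16_t per 10-bit word; the TRS preamble (0x3FF 0x000 0x000) and the Ancillary Data Flag (0x000 0x3FF 0x3FF) are the identifier sequences mentioned above.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

void scan_words(const std::vector<uint16_t>& w)
{
    for (size_t i = 0; i + 3 < w.size(); ++i) {
        // TRS preamble: 0x3FF 0x000 0x000, followed by the XYZ word
        // whose H bit distinguishes EAV from SAV.
        if (w[i] == 0x3FF && w[i + 1] == 0x000 && w[i + 2] == 0x000) {
            uint16_t xyz = w[i + 3];
            bool eav = (xyz & 0x040) != 0;   // H bit set => EAV, clear => SAV
            std::printf("TRS at word %zu (%s)\n", i, eav ? "EAV" : "SAV");
        }
        // Ancillary Data Flag: 0x000 0x3FF 0x3FF, followed by DID, SDID/DBN, DC.
        if (w[i] == 0x000 && w[i + 1] == 0x3FF && w[i + 2] == 0x3FF) {
            uint16_t did = w[i + 3] & 0xFF;  // low 8 bits carry the identifier
            std::printf("ANC packet at word %zu, DID=0x%02X\n", i, did);
        }
    }
}
```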

Recording both rendering and recording device

I'm writing a program in C++, on Windows. I need to support Windows Vista+.
I want to record both the microphone and speaker simultaneously.
I'm using WASAPI and can record the microphone and speaker separately, but I would like one stream supplying the input from both (for example, to record a client playing the guitar along with the music he hears on his headphones), instead of merging the two buffers together somehow (which I guess will lead to timing issues).
Is there a way to do this?
I'm actually working on a library that can do exactly that, merge streams from multiple devices. You might want to give it a try: see xt-audio.com. If you're implementing this yourself, here are some things to consider:
If you're capturing the speakers through a WASAPI loopback interface you're operating in shared mode, in this case latency might be unacceptable for live performance. If possible stick to exclusive mode and use a loopback cable or hardware loopback device if you have one (e.g. the old fashioned "stereo mix" devices etc).
If you're merging buffers then yes, you're going to have timing issues. This is generally unavoidable when syncing independent devices. Pops/clicks can largely be avoided using a secondary intermediate buffer which introduces additional latency, but eventually you're going to have to pad/drop some samples to keep streams in sync.
Do NOT use separate threads for each independent stream. This will increase context switches and thereby increase the minimum achievable latency. Instead, designate one device as the master device, wait for that device's event to be raised, then read input from all devices whether they are "ready" or not (this is where dropping/padding comes into play).
In general you can get really decent performance from WASAPI exclusive mode, even running multiple streams together. But for something as critical as live performance you might want to consider a pro audio interface with ASIO drivers where everything just ticks off the same clock, or synchronization is at least handled at the driver level.
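For what it's worth, a minimal sketch of the master-device loop described above might look like this; it assumes both IAudioClient instances are already initialized in event-driven mode with the microphone chosen as master, and it omits all error handling.

```cpp
#include <audioclient.h>
#include <windows.h>
#include <initializer_list>

void capture_loop(HANDLE masterEvent,
                  IAudioCaptureClient* mic,
                  IAudioCaptureClient* loopback,
                  bool& running)
{
    while (running) {
        // Block on the master device's event only; never wait on both streams.
        WaitForSingleObject(masterEvent, INFINITE);

        // Drain every device, ready or not.
        for (IAudioCaptureClient* client : { mic, loopback }) {
            UINT32 packet = 0;
            client->GetNextPacketSize(&packet);
            while (packet != 0) {
                BYTE*  data   = nullptr;
                UINT32 frames = 0;
                DWORD  flags  = 0;
                client->GetBuffer(&data, &frames, &flags, nullptr, nullptr);
                // ... interleave into one output stream here, padding/dropping
                //     samples as needed to keep the two devices in sync ...
                client->ReleaseBuffer(frames);
                client->GetNextPacketSize(&packet);
            }
        }
    }
}
```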

x264 NALUs serialization and handling

I have my x264 encoder, producing NALUs from a raw video stream. I need to send those NALUs over the network. What is the best way of doing so?
The encoder is inserted into a DirectShow graph as a transform filter, and downstream I have the filter which handles networking. Can I pass the NALUs created by the transform filter directly to the network "render" filter? Will it create memory issues?
I would like to know how memory allocated for NALUs is handled inside x264 - who is responsible for freeing it? Also I'm wondering if I can just serialize NALU to a bit stream manually and then rebuild it in the same way?
I need to send those NALUs over the network. What is the best way of doing so?
"Best" needs clarification: easiest to do, best in terms of compatibility, compatible to specific counterpart implementation etc.
Can I pass NALUs, created by transform filter directly to network "render" filter? Will it create some memory issues?
There is no stock network renderer; you should read up on how it needs to be done with the specific renderer you are going to use.
I would like to know how memory allocated for NALUs is handled inside x264 - who is responsible for freeing it?
x264 manages the buffers it fills. x264_encoder_encode returns references to those buffers, and you don't need to free the data; just be sure to copy it out promptly, since it will be invalidated by the next call. Don't forget x264_encoder_close afterwards - it releases all internally managed resources.
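As a minimal sketch of that ownership model (encoder setup and picture handling are assumed to happen elsewhere):

```cpp
#include <stdint.h>
#include <vector>
#include <x264.h>

std::vector<uint8_t> encode_frame(x264_t* enc, x264_picture_t* pic_in)
{
    x264_nal_t*    nals   = nullptr;
    int            n_nals = 0;
    x264_picture_t pic_out;

    std::vector<uint8_t> out;
    int bytes = x264_encoder_encode(enc, &nals, &n_nals, pic_in, &pic_out);
    if (bytes > 0) {
        // The NAL payloads live in memory owned by the encoder and are only
        // valid until the next x264_encoder_encode call, so copy them out now.
        for (int i = 0; i < n_nals; ++i)
            out.insert(out.end(), nals[i].p_payload,
                       nals[i].p_payload + nals[i].i_payload);
    }
    return out;
    // When done with the encoder entirely: x264_encoder_close(enc);
}
```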
Also I'm wondering if I can just serialize NALU to a bit stream manually and then rebuild it in the same way?
Yes, you can do that. If your pair of network filters can reproduce the same stream on the far side of their connection, it is going to work out fine. The best network protocol in terms of interoperability with H.264 is RTP. It is, however, pretty complicated compared to simple send/receive/reproduce handling of a bitstream.
RTP: A Transport Protocol for Real-Time Applications
RTP Payload Format for H.264 Video
The best way to send NALUs out onto the network would be through an RTP stream. Look at RFC 6184 for details on RTP packetization for H.264. I think you can safely pass NALUs to your renderer, provided your media buffers are large enough to hold your NALUs.
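If you do take the manual serialization route instead of RTP, a minimal sketch of length-prefixed framing over a reliable byte stream (e.g. TCP) could look like this; the helper names are illustrative, not part of any library:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Sender side: prepend a 4-byte big-endian length, then hand the buffer to
// your socket code.
std::vector<uint8_t> frame_nalu(const uint8_t* nal, uint32_t size)
{
    std::vector<uint8_t> out(4 + size);
    out[0] = uint8_t(size >> 24);
    out[1] = uint8_t(size >> 16);
    out[2] = uint8_t(size >> 8);
    out[3] = uint8_t(size);
    std::memcpy(out.data() + 4, nal, size);
    return out;
}

// Receiver side: once 4 + length bytes have arrived, extract one NAL unit
// and remove it from the accumulated stream.
bool deframe_nalu(std::vector<uint8_t>& stream, std::vector<uint8_t>& nal)
{
    if (stream.size() < 4) return false;
    uint32_t size = (uint32_t(stream[0]) << 24) | (uint32_t(stream[1]) << 16) |
                    (uint32_t(stream[2]) << 8)  |  uint32_t(stream[3]);
    if (stream.size() < 4u + size) return false;
    nal.assign(stream.begin() + 4, stream.begin() + 4 + size);
    stream.erase(stream.begin(), stream.begin() + 4 + size);
    return true;
}
```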

Multiple applications using GStreamer

I want to write (but first I want to understand how to do it) several applications based on the GStreamer framework that would share the same hardware resource at the same time.
For example: there is hardware with HW acceleration for video decoding. I want to start two applications simultaneously that are able to decode different video streams using HW acceleration. Of course I assume that the hardware is able to handle such requests and that there is an appropriate driver (but no GStreamer element) for doing this; but how do I write a GStreamer element that would support such resource sharing between separate processes?
I would appreciate any links, suggestions where to start...
You have hardware that can be accessed concurrently, hence two GStreamer elements accessing it concurrently should work. There is nothing GStreamer-specific here.
Say you want to write a decoding element: it is like any other decoding element, as long as you access your hardware correctly. Your drivers should take care of the concurrent access.
The starting place is the GStreamer Plugin Writer's Guide.
So you need a single process that controls the HW decoder, and decodes streams from multiple sources.
I would recommend building a daemon, possibly itself based on GStreamer. The gdppay and gdpdepay elements provide quite simple ways to pass data through sockets to the daemon and back. The daemon would wait for connections on a specified port (or Unix socket) and open a virtual decoder for each connection. The video decoder elements in the separate applications would internally connect to the daemon and get the decoded video back.
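As a minimal sketch of the daemon side, assuming GStreamer 1.x: "myhwdec" stands in for your real hardware decoder element, and the addresses/ports are arbitrary. Each client application would mirror this, sending with gdppay ! tcpclientsink and receiving with tcpclientsrc ! gdpdepay.

```cpp
#include <gst/gst.h>

int main(int argc, char** argv)
{
    gst_init(&argc, &argv);

    // Daemon: receive a GDP-wrapped encoded stream, decode it on the
    // hardware, and send the decoded frames back the same way.
    GstElement* daemon = gst_parse_launch(
        "tcpserversrc host=127.0.0.1 port=5000 ! gdpdepay ! "
        "myhwdec ! gdppay ! tcpserversink host=127.0.0.1 port=5001",
        nullptr);

    // Client side (inside the application process) would look like:
    //   ... encoded stream ... ! gdppay ! tcpclientsink host=127.0.0.1 port=5000
    //   tcpclientsrc host=127.0.0.1 port=5001 ! gdpdepay ! ... raw video ...

    gst_element_set_state(daemon, GST_STATE_PLAYING);

    // Run until an error or end-of-stream message arrives.
    GstBus* bus = gst_element_get_bus(daemon);
    GstMessage* msg = gst_bus_timed_pop_filtered(
        bus, GST_CLOCK_TIME_NONE,
        (GstMessageType)(GST_MESSAGE_ERROR | GST_MESSAGE_EOS));
    if (msg) gst_message_unref(msg);

    gst_element_set_state(daemon, GST_STATE_NULL);
    gst_object_unref(bus);
    gst_object_unref(daemon);
    return 0;
}
```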

Loading large multi-sample audio files into memory for playback - how to avoid temporary freezing

I am writing an application that needs to use large multi-sample audio files, usually around 50 MB in size. One file contains approximately 80 individual short sound recordings, which can get played back by my application at any time. For this reason all the audio data gets loaded into memory for quick access.
However, loading one of these files can take many seconds, during which my program is temporarily frozen. What is a good way to avoid this? It must be compatible with Windows and OS X. It freezes at this call: myMultiSampleClass->open(), which has to do a lot of dynamic memory allocation and reading from the file using ifstream.
I have thought of two possible options:
Open the file and load it into memory in another thread so my application process does not freeze. I have looked into the Boost library to do this but need to do quite a lot of reading before I am ready to implement. All I would need to do is call the open() function in the thread then destroy the thread afterwards.
Come up with a scheme to make sure I don't load the entire file into memory at any one time - I just load on the fly, so to speak. The problem is that any sample could be triggered at any time. I know some other software has this kind of system in place, but I'm not sure how it works. It also depends a lot on individual computer specifications: it could work great on my computer, but someone with a slow HDD or little memory could get very bad results. One idea I had was to load the first x samples of each audio recording into memory; then, if I need to play, begin playback from the samples already in memory whilst loading the rest of the audio.
Any ideas or criticisms? Thanks in advance :-)
Use a memory mapped file. Loading time is initially "instant", and the overhead of I/O will be spread over time.
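A minimal sketch of that approach, using Boost.Iostreams (which the question already considers pulling in) so the same code works on Windows and OS X; the class and method names are illustrative:

```cpp
#include <boost/iostreams/device/mapped_file.hpp>
#include <cstddef>
#include <string>

struct MultiSample
{
    boost::iostreams::mapped_file_source file;

    void open(const std::string& path)
    {
        file.open(path);             // returns almost immediately: nothing is read yet
    }

    const char* sampleData(std::size_t offset) const
    {
        return file.data() + offset; // touching the memory faults pages in on demand
    }

    std::size_t size() const { return file.size(); }
};
```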
I like solution 1 as a first attempt -- simple & to the point.
If you are under Windows, you can do asynchronous file operations -- what they call OVERLAPPED -- to tell the OS to load a file & let you know when it's ready.
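A minimal, Windows-only sketch of such an OVERLAPPED read; the file name is hypothetical and error handling is trimmed:

```cpp
#include <windows.h>
#include <vector>

int main()
{
    HANDLE file = CreateFileW(L"multisample.dat", GENERIC_READ, FILE_SHARE_READ,
                              nullptr, OPEN_EXISTING,
                              FILE_FLAG_OVERLAPPED, nullptr);

    DWORD size = GetFileSize(file, nullptr);
    std::vector<char> buffer(size);

    OVERLAPPED ov = {};
    ov.hEvent = CreateEventW(nullptr, TRUE, FALSE, nullptr);  // manual-reset

    // Returns immediately (ERROR_IO_PENDING); the OS loads in the background.
    ReadFile(file, buffer.data(), size, nullptr, &ov);

    // ... keep servicing the UI / audio engine here ...

    // When the data is actually needed, wait for (or poll) completion.
    DWORD bytesRead = 0;
    GetOverlappedResult(file, &ov, &bytesRead, TRUE /* wait */);

    CloseHandle(ov.hEvent);
    CloseHandle(file);
    return 0;
}
```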
I think the best solution is to load a small chunk or a single sample of wave data at a time during playback, using asynchronous I/O (as John Dibling mentioned) into a fixed-size playback buffer.
The strategy is to fill the playback buffer first and then start playing (this adds a small amount of delay but guarantees continuous playback). While the buffer is playing, you can refill another playback buffer on a different thread (overlapped). You need at least two playback buffers, one for playing and one for refilling in the background, and you switch them in real time.
Later you can tune the playback buffer size based on the client PC's performance (it is a trade-off between memory size and processing power: a faster CPU allows a smaller buffer and thus lower delay).
You might want to consider a producer-consumer approach. This basically involves reading the sound data into a buffer using one thread, and streaming the data from the buffer to your sound card using another thread.
The data reader is the producer, and streaming the data to the sound card is the consumer. You need high-water and low-water marks so that, if the buffer gets full, the producer stops reading, and if the buffer gets low, the producer starts reading again.
A C++ Producer-Consumer Concurrency Template Library
http://www.bayimage.com/code/pcpaper.html
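A minimal sketch of such a producer-consumer buffer with high- and low-water marks, using standard C++ threads and a plain float FIFO; the class name and thresholds are illustrative:

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

class SampleQueue
{
    std::deque<float>       buf;
    std::mutex              m;
    std::condition_variable cv;
    static const std::size_t kHighWater = 1 << 20;  // producer stops reading above this
    static const std::size_t kLowWater  = 1 << 18;  // producer resumes below this

public:
    // Producer (disk reader thread): blocks while the buffer is "full".
    void push(const std::vector<float>& chunk)
    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return buf.size() < kHighWater; });
        buf.insert(buf.end(), chunk.begin(), chunk.end());
    }

    // Consumer (audio thread): pops what it needs and wakes the producer
    // once the level drops under the low-water mark.
    std::size_t pop(float* out, std::size_t n)
    {
        std::unique_lock<std::mutex> lock(m);
        std::size_t count = std::min(n, buf.size());
        std::copy(buf.begin(), buf.begin() + count, out);
        buf.erase(buf.begin(), buf.begin() + count);
        if (buf.size() < kLowWater)
            cv.notify_all();
        return count;
    }
};
```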
EDIT: I should add that this sort of thing is tricky. If you are building a sample player, the load on the system varies continuously as a function of which keys are being played, how many sounds are playing at once, how long the duration of each sound is, whether the sustain pedal is being pressed, and other factors such as hard disk speed and buffering, and amount of processor horsepower available. Some programming optimizations that you eventually employ will not be obvious at first glance.