I was able to successfully encode an MP4 file containing only H.264-encoded video (using the IMFSinkWriter interface). Now I want to add an audio stream to it.
Whenever I try to create a sink writer for the audio using:
MFCreateSinkWriterFromURL(filePath, nullptr, nullptr, &pSinkWriter)
it deletes the previous file and writes only the audio (well, according to this link that is expected).
So my question is: how do I add an audio stream to an existing file that contains only a video stream?
Or, if I have raw data for both audio and video, how do I encode both of them into a single media file? (I suppose I have to do something called multiplexing; if so, can someone provide helpful references?)
The Sink Writer API creates a media file from scratch, starting when you call IMFSinkWriter::BeginWriting and finishing when you call IMFSinkWriter::Finalize. You don't add new streams to a finalized file (well, you can, but it works differently - see the last paragraph below).
To create a media file with both video and audio you need to add both streams before you begin: two calls to IMFSinkWriter::AddStream, then two calls to IMFSinkWriter::SetInputMediaType, then you start writing with IMFSinkWriter::BeginWriting and feed both video and audio data with IMFSinkWriter::WriteSample, providing the respective stream index.
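A minimal sketch of that sequence, assuming NV12 video frames and 16-bit PCM audio as input, H.264 and AAC as output (the frame size, frame rate, sample rate and bitrate values are only examples; HRESULT checks, COM initialization and MFStartup are omitted):

```cpp
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>

// Sketch only: MFStartup(MF_VERSION) is assumed to have been called,
// and all error handling / interface releases are omitted for brevity.
void WriteVideoAndAudio(IMFSample* pVideoSample, IMFSample* pAudioSample)
{
    IMFSinkWriter* pSinkWriter = nullptr;
    MFCreateSinkWriterFromURL(L"output.mp4", nullptr, nullptr, &pSinkWriter);

    // Output (target) type for the video stream: H.264.
    IMFMediaType* pVideoOut = nullptr;
    MFCreateMediaType(&pVideoOut);
    pVideoOut->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    pVideoOut->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
    pVideoOut->SetUINT32(MF_MT_AVG_BITRATE, 4000000);
    pVideoOut->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
    MFSetAttributeSize(pVideoOut, MF_MT_FRAME_SIZE, 1280, 720);
    MFSetAttributeRatio(pVideoOut, MF_MT_FRAME_RATE, 30, 1);

    // Output (target) type for the audio stream: AAC.
    IMFMediaType* pAudioOut = nullptr;
    MFCreateMediaType(&pAudioOut);
    pAudioOut->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
    pAudioOut->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_AAC);
    pAudioOut->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, 44100);
    pAudioOut->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, 2);
    pAudioOut->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, 16);
    pAudioOut->SetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, 16000);

    // Two AddStream calls -> two stream indexes.
    DWORD videoStream = 0, audioStream = 0;
    pSinkWriter->AddStream(pVideoOut, &videoStream);
    pSinkWriter->AddStream(pAudioOut, &audioStream);

    // Input types describe the uncompressed data you feed in.
    IMFMediaType* pVideoIn = nullptr;
    MFCreateMediaType(&pVideoIn);
    pVideoIn->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    pVideoIn->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_NV12);
    pVideoIn->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
    MFSetAttributeSize(pVideoIn, MF_MT_FRAME_SIZE, 1280, 720);
    MFSetAttributeRatio(pVideoIn, MF_MT_FRAME_RATE, 30, 1);

    IMFMediaType* pAudioIn = nullptr;
    MFCreateMediaType(&pAudioIn);
    pAudioIn->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
    pAudioIn->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM);
    pAudioIn->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, 44100);
    pAudioIn->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, 2);
    pAudioIn->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, 16);
    pAudioIn->SetUINT32(MF_MT_AUDIO_BLOCK_ALIGNMENT, 4);
    pAudioIn->SetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, 176400);

    pSinkWriter->SetInputMediaType(videoStream, pVideoIn, nullptr);
    pSinkWriter->SetInputMediaType(audioStream, pAudioIn, nullptr);

    pSinkWriter->BeginWriting();

    // In a real application you call WriteSample in a loop,
    // passing the correct stream index with each sample.
    pSinkWriter->WriteSample(videoStream, pVideoSample);
    pSinkWriter->WriteSample(audioStream, pAudioSample);

    pSinkWriter->Finalize();
}
```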
To add a new stream to an already existing file, you need to create a completely new file. One of the options you have is to read the already compressed data from your existing file and write it to the new file using the IMFSinkWriter::WriteSample method, without re-compression. At the same time the second stream can be written with compression. This way you can create an MP4 file with video and audio by taking the video from the existing file and adding/encoding an additional audio track.
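A rough sketch of that pass-through idea, assuming the source file's video track is the first video stream (the new audio track is only indicated by comments; file names are illustrative and error handling is omitted):

```cpp
// Copy the compressed video stream of an existing file into a new sink writer
// stream without re-encoding. MFStartup is assumed to have been called.
IMFSourceReader* pReader = nullptr;
MFCreateSourceReaderFromURL(L"video_only.mp4", nullptr, &pReader);

// Use the native (compressed) media type so the reader does not decode.
IMFMediaType* pNativeType = nullptr;
pReader->GetNativeMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, &pNativeType);

IMFSinkWriter* pWriter = nullptr;
MFCreateSinkWriterFromURL(L"video_and_audio.mp4", nullptr, nullptr, &pWriter);

DWORD videoStream = 0;
pWriter->AddStream(pNativeType, &videoStream);
pWriter->SetInputMediaType(videoStream, pNativeType, nullptr);
// ... AddStream / SetInputMediaType for the new audio track goes here ...

pWriter->BeginWriting();

for (;;)
{
    DWORD streamIndex = 0, flags = 0;
    LONGLONG timestamp = 0;
    IMFSample* pSample = nullptr;
    pReader->ReadSample((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                        0, &streamIndex, &flags, &timestamp, &pSample);
    if (flags & MF_SOURCE_READERF_ENDOFSTREAM)
        break;
    if (pSample)
    {
        pWriter->WriteSample(videoStream, pSample);  // compressed sample, no re-encode
        pSample->Release();
    }
    // Interleave WriteSample calls for the (encoded) audio stream here as well.
}

pWriter->Finalize();
```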
Related
I want to delete the audio stream from a video and keep only the video stream. But when I search on Google I cannot find any tutorial except ones about decoding. Is there a way to delete a specific stream from a video?
You cannot directly delete a stream from a file. You can, however, write a new file that contains all but one (or more) streams of the original file, which can be done without decoding and encoding the streams.
For this purpose, you can use libavformat, which is part of ffmpeg. You first have to demux the video file, which gives you packages that contain the encoded data for each stream inside the container. Then, you write (mux) these packages into a new video container. Take a look at the remuxing example for details.
Note, however, that you can get the same result by calling the ffmpeg program and passing it the appropriate parameters, which is probably easier.
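For example, assuming the source file is input.mp4, something along these lines copies the streams without re-encoding (-c copy) and drops the audio (-an):

```
ffmpeg -i input.mp4 -c copy -an output.mp4
```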
I use MFCreateSourceReaderFromByteStream to create an IMFSourceReader with a custom IMFByteStream getting data from a remote HTTP source.
When the source is an m4a file, everything works as expected. However, when the source is an mp3, the function MFCreateSourceReaderFromByteStream does not return until the whole file is downloaded. Any idea how to avoid that behavior and start decoding audio before the end of the download?
Assuming you are using the default Media Foundation sources, this is probably the default behaviour of the MP3 File Source and the MPEG-4 File Source.
To confirm this, you can try using a custom MPEG audio file source, like this one I implemented: MFSrMpeg12Decoder
This Media Foundation source only handles MP1/MP2 audio files and performs the decoding. It is not MP3, but it starts delivering data as soon as there is a valid MPEG audio header, and it does not read the full file (you can trust me...).
This will confirm that the default MP3 File Source needs to read the full file before delivering any data.
One possible explanation is that the MP3 File Source reads the entire file to check whether the bit rate is variable, so that it can report the correct duration of the file (MF_PD_DURATION).
For an m4a audio file, the duration is provided by the moov atom, so there is no need to read the full file.
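If you want to check what duration the source ends up reporting, you can query MF_PD_DURATION through the source reader once it has been created (a small sketch; pSourceReader is assumed to be your IMFSourceReader and error handling is trimmed):

```cpp
// Query the presentation duration (in 100-nanosecond units) reported by the media source.
PROPVARIANT var;
PropVariantInit(&var);
HRESULT hr = pSourceReader->GetPresentationAttribute(
    (DWORD)MF_SOURCE_READER_MEDIASOURCE, MF_PD_DURATION, &var);
if (SUCCEEDED(hr))
{
    ULONGLONG duration100ns = var.uhVal.QuadPart;   // MF_PD_DURATION is VT_UI8
    double seconds = duration100ns / 10000000.0;
    // ... use the duration ...
}
PropVariantClear(&var);
```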
I need to extract audio assets on the fly and load them in to a timeline for playback.
I also need to render varying lengths of the asset files, but I have an idea I'm going to try out tomorrow that I think will sort that out; if anyone has any tips, that would be great though.
I have been playing with the oboe RhythmGame sample code, which is the closest of the oboe samples to what I'm trying to do, but it's not happy when I try to add or change audio sources on the fly.
Is this something oboe can do or will I have to cycle the audio stream on and off for each new set of files?
What you're proposing can definitely be done without needing to restart the audio stream. The audio stream will just request PCM data on each callback. It's your app's job to supply that PCM data.
In the RhythmGame sample, compressed audio files are decoded into memory using the DataSource object. The Player object then wraps this DataSource to control playback through its set methods.
If you need to play audio data from files in a timeline, I would create a new Timeline class which copies the relevant sections of audio data from DataSources and places them sequentially into a buffer. Then your audio stream can read directly from that buffer on each callback.
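A very rough sketch of that idea. The Timeline class and its fields are hypothetical (not part of oboe or the RhythmGame sample); the callback signature is the one oboe's AudioStreamDataCallback provides in recent oboe versions:

```cpp
#include <oboe/Oboe.h>
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical timeline: holds pre-mixed float PCM for the whole timeline
// (filled by copying sections out of the decoded DataSources) and a read head.
class Timeline {
public:
    std::vector<float> mixedFrames;   // interleaved, channelCount floats per frame
    size_t readFrame = 0;
    int32_t channelCount = 2;

    void render(float* out, int32_t numFrames) {
        size_t totalFrames = mixedFrames.size() / channelCount;
        size_t available = (readFrame < totalFrames) ? totalFrames - readFrame : 0;
        size_t toCopy = std::min(static_cast<size_t>(numFrames), available);
        std::memcpy(out, mixedFrames.data() + readFrame * channelCount,
                    toCopy * channelCount * sizeof(float));
        // Zero-fill anything we could not supply so the stream keeps running.
        if (toCopy < static_cast<size_t>(numFrames)) {
            std::memset(out + toCopy * channelCount, 0,
                        (numFrames - toCopy) * channelCount * sizeof(float));
        }
        readFrame += toCopy;
    }
};

// The oboe callback just pulls PCM from the timeline buffer on every call,
// so the stream never needs to be restarted when the timeline content changes.
class TimelineCallback : public oboe::AudioStreamDataCallback {
public:
    Timeline timeline;
    oboe::DataCallbackResult onAudioReady(oboe::AudioStream* /*stream*/,
                                          void* audioData,
                                          int32_t numFrames) override {
        timeline.render(static_cast<float*>(audioData), numFrames);
        return oboe::DataCallbackResult::Continue;
    }
};
```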
I am currently working on a student project; we have to create a live streaming service for videos with these constraints:
We capture the video from the Webcam with OpenCV
We want to stream the video while it's recorded
We have a file capture.avi that is saved to the computer, and while this file is being saved, we want to stream it.
Currently we have no idea how to do it: we don't know whether the file transferred from A to B will be openable in B (via VLC, for example), or whether we will have interruptions.
We plan to use RTSP for the network protocol. We code everything in C++.
Here are the questions:
Does RTSP take care of streaming a file that is still being written?
What source format should we use? Should we stream the frames captured with OpenCV from A to B (so in B we have to use OpenCV to convert the frames back into a video), or should we let OpenCV create a video file in A and stream that video file from A to B?
Thank you!
I don't believe it is safe to do so; what you need is two buffers.
The first would allow whatever library you want to use to write your recorded video to your local file system.
The latter would allow your video to be streamed through your network.
Both should share the same context, and therefore the same data, which would manage the synchronization of the two buffers.
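A sketch of one way to arrange that shared context, assuming the capture thread pushes each encoded chunk into two queues, one drained by the file writer and one by the network streamer (all names here are illustrative, not from any particular library):

```cpp
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// One thread-safe queue per consumer (file writer, network streamer).
struct ChunkQueue {
    std::mutex m;
    std::condition_variable cv;
    std::deque<std::vector<uint8_t>> chunks;

    void push(std::vector<uint8_t> chunk) {
        { std::lock_guard<std::mutex> lock(m); chunks.push_back(std::move(chunk)); }
        cv.notify_one();
    }
    std::vector<uint8_t> pop() {             // blocks until a chunk is available
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !chunks.empty(); });
        auto chunk = std::move(chunks.front());
        chunks.pop_front();
        return chunk;
    }
};

// Shared context: the capture loop writes into both buffers,
// so the file on disk and the network stream see the same data.
struct CaptureContext {
    ChunkQueue fileQueue;     // drained by the thread writing capture.avi
    ChunkQueue streamQueue;   // drained by the thread feeding the RTSP server

    void onEncodedChunk(const std::vector<uint8_t>& data) {
        fileQueue.push(data);     // copy for the file writer
        streamQueue.push(data);   // copy for the streamer
    }
};
```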
I'm looking to write already compressed (H.264) image data into an MPEG-4 video file. Since this code needs to be optimized to run on an embedded platform, the code should be as simple as possible.
Best would be to just give some header information (like height, width, format, FourCC, etc.), a filename and the compressed data, and have that transformed into a data chunk and written to that file.
So what I need is either of these:
MPEG-4 header information (what goes where exactly)
Is there a main header or are there just headers for each data chunk?
What header information is needed for a single video stream (rectangular)
What header information is needed for adding audio
A simple MPEG-4 file writer that does not have to do the compression itself and also allows adding audio frames (C/C++)
The .MP4 file format is described in the MPEG-4 Part 14 specification. It is not just a main header and sub-headers; it has a certain hierarchy of so-called boxes. Some of your choices for writing data into an .MP4 file:
FFmpeg (libavcodec, libavformat) - related Q and related code link
In Windows via DirectShow API - GDCL MP4 Multiplexer or numerous similar commercial offerings
In Windows via Media Foundation API - MPEG-4 File Sink
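As an illustration of the first option above, here is a rough libavformat sketch that muxes already compressed H.264 packets into an .MP4 container without involving an encoder (error handling trimmed; the width/height/time-base values and the origin of the packets are assumptions you would replace):

```cpp
extern "C" {
#include <libavformat/avformat.h>
}

// Mux pre-encoded H.264 data into an MP4 container; no re-compression involved.
void WriteMp4(const char* filename)
{
    AVFormatContext* fmt = nullptr;
    avformat_alloc_output_context2(&fmt, nullptr, nullptr, filename);

    AVStream* video = avformat_new_stream(fmt, nullptr);
    video->codecpar->codec_type = AVMEDIA_TYPE_VIDEO;
    video->codecpar->codec_id   = AV_CODEC_ID_H264;
    video->codecpar->width      = 1280;   // example values
    video->codecpar->height     = 720;
    video->time_base            = AVRational{1, 30};
    // For MP4 you normally also set codecpar->extradata to the SPS/PPS (avcC).

    avio_open(&fmt->pb, filename, AVIO_FLAG_WRITE);
    avformat_write_header(fmt, nullptr);

    AVPacket* pkt = av_packet_alloc();
    // For each compressed frame you already have:
    //   pkt->data / pkt->size  -> the H.264 access unit
    //   pkt->pts / pkt->dts    -> timestamps in video->time_base units
    //   pkt->stream_index      -> video->index
    //   av_interleaved_write_frame(fmt, pkt);
    av_packet_free(&pkt);

    av_write_trailer(fmt);
    avio_closep(&fmt->pb);
    avformat_free_context(fmt);
}
```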