Convert between different wave formats (WAVEFORMATEX) - C++

I'm writing a real-time audio application that runs a stream in exclusive mode. In order to properly present data to the device, it needs to arrive in a format that isn't of my own choosing. All of my audio processing is done with floating point samples before being sent to the device, and the device's wave format might not be (and probably isn't) set to WAVE_FORMAT_IEEE_FLOAT - for example, it might be WAVE_FORMAT_EXTENSIBLE or WAVE_FORMAT_PCM.
Is there an API that makes it easy to convert between one wave format (floating point) and another (the device's format)?

Use an Audio Compression Manager (ACM) conversion stream:
Converting Data from One Format to Another
If you cannot create a single stream from your format to the device's format, you will have to create two streams - one from your format to WAVE_FORMAT_PCM, and the other from WAVE_FORMAT_PCM to the device's format (all streams/devices have to support conversions to/from PCM).
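For illustration, here is a minimal sketch of the single-stream case; the function name, the std::vector usage, and the trimmed error handling are my own choices rather than anything mandated by the ACM API (link against msacm32.lib):

    #include <windows.h>
    #include <mmreg.h>
    #include <msacm.h>
    #include <vector>

    bool ConvertWithAcm(const BYTE* src, DWORD srcBytes,
                        WAVEFORMATEX* srcFmt,      // e.g. WAVE_FORMAT_IEEE_FLOAT
                        WAVEFORMATEX* dstFmt,      // e.g. the device's format
                        std::vector<BYTE>& dst)
    {
        HACMSTREAM hStream = NULL;
        if (acmStreamOpen(&hStream, NULL, srcFmt, dstFmt, NULL, 0, 0,
                          ACM_STREAMOPENF_NONREALTIME) != MMSYSERR_NOERROR)
            return false;  // no direct path: fall back to two streams via PCM

        DWORD dstBytes = 0;  // worst-case output size for this input size
        acmStreamSize(hStream, srcBytes, &dstBytes, ACM_STREAMSIZEF_SOURCE);
        dst.resize(dstBytes);

        ACMSTREAMHEADER hdr = {};
        hdr.cbStruct    = sizeof(hdr);
        hdr.pbSrc       = const_cast<BYTE*>(src);
        hdr.cbSrcLength = srcBytes;
        hdr.pbDst       = dst.data();
        hdr.cbDstLength = dstBytes;

        acmStreamPrepareHeader(hStream, &hdr, 0);
        acmStreamConvert(hStream, &hdr, ACM_STREAMCONVERTF_BLOCKALIGN);
        acmStreamUnprepareHeader(hStream, &hdr, 0);
        acmStreamClose(hStream, 0);

        dst.resize(hdr.cbDstLengthUsed);  // actual converted byte count
        return true;
    }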

Related

Converting PCM-ALAW data to an audio file using ffmpeg

In my project, I processed the received RTP packets with the payload, and extracted all the payload to a separate buffer. This payload is PCM ALAW (type 8). How do I implement a class that takes a file name and a buffer with raw data as arguments and creates an audio file? Exactly what steps do I have to go through in order to encode the raw data into an audio file? As a starting point, I used this example.
That sounds way too complex. "PCM ALAW" is a bit misleading, but it's pretty clear that G.711 aLaw encoding is meant. That's a trivial "compression" which maps each 16-bit PCM sample to an 8-bit value, so a trivial lookup fixes that.
There's even a free implementation of the aLaw encoding available. Just convert each sample to 16-bit PCM, stuff a standard Microsoft WAVE header in front of it, and call the result .WAV.
You'll need to fill in a few WAV header fields based on the RTP type 8. Chiefly, that's "mono, 8000 Hz, 16 bits per sample". One small problem with the header is that you can only write the full header once you know how many samples you have. You could update the header whenever you receive an RTP packet, but that's a bit I/O intensive. It might be nicer to do that once per 10 packets or so.
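Here is a minimal sketch of that pipeline, assuming the whole aLaw payload is buffered in memory so the header can be written once with final sizes (the helper names are mine, and a little-endian host is assumed when writing the header fields):

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Standard G.711 aLaw expansion: one 8-bit code -> one 16-bit PCM sample.
    static int16_t alaw_decode(uint8_t a)
    {
        a ^= 0x55;                                  // undo even-bit inversion
        int t   = (a & 0x0F) << 4;                  // mantissa
        int seg = (a & 0x70) >> 4;                  // segment (exponent)
        if      (seg == 0) t += 8;
        else if (seg == 1) t += 0x108;
        else               t = (t + 0x108) << (seg - 1);
        return (a & 0x80) ? (int16_t)t : (int16_t)-t;
    }

    static void write_u32(FILE* f, uint32_t v) { fwrite(&v, 4, 1, f); }
    static void write_u16(FILE* f, uint16_t v) { fwrite(&v, 2, 1, f); }

    void write_wav(const char* path, const std::vector<uint8_t>& alaw)
    {
        std::vector<int16_t> pcm(alaw.size());
        for (size_t i = 0; i < alaw.size(); ++i) pcm[i] = alaw_decode(alaw[i]);

        uint32_t dataBytes = (uint32_t)(pcm.size() * 2);
        FILE* f = fopen(path, "wb");
        if (!f) return;
        fwrite("RIFF", 1, 4, f); write_u32(f, 36 + dataBytes);
        fwrite("WAVE", 1, 4, f);
        fwrite("fmt ", 1, 4, f); write_u32(f, 16);  // fmt chunk size
        write_u16(f, 1);                            // PCM
        write_u16(f, 1);                            // mono
        write_u32(f, 8000);                         // sample rate
        write_u32(f, 8000 * 2);                     // byte rate
        write_u16(f, 2);                            // block align
        write_u16(f, 16);                           // bits per sample
        fwrite("data", 1, 4, f); write_u32(f, dataBytes);
        fwrite(pcm.data(), 1, dataBytes, f);
        fclose(f);
    }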

Slow motion effect when decoding OPUS audio stream

I'm capturing the audio stream of a voice chat program (it is proprietary, closed-source and I have no control over it) which is encoded with the OPUS Codec, and I want to decode it into raw PCM audio (Opus Decoder doc).
What I'm doing is:
Create an OPUS decoder: opusDecoder = opus_decoder_create(48000, 1, &opusResult);
Decode the stream: opusResult = opus_decode(opusDecoder, voicePacketBuffer, voicePacketLength, pcm, 9600, 0);
Save it to a file: pcmFile.write(pcm, opusResult * sizeof(opus_int16));
Read the file with Audacity (File > Import > Raw Data...)
Here comes the problem: sometimes it works perfectly well (I can hear the decoded PCM audio without glitch and with the original speed) but sometimes, the decoded audio stream is in "slow motion" (sometimes a little slower than normal, sometimes much slower).
I can't figure out why, because I don't change my program: the decoding settings remain the same. Yet sometimes it works, sometimes it doesn't. Also, opus_decode() is always able to decode the data; it doesn't return an error code.
I read that the decoder has a "state" (opus_decoder_ctl() doc). I thought maybe the time between opus_decode() calls is important?
Can you think of any parameter, be it explicit (like the parameters given to the functions) or implicit (time between two function calls), that might cause this effect?
"Slow motion" audio is almost always mismatch of sampling rate (recorded on high rate but played in low rate). For example if you record audio on 48kHz but play it as 8kHz.
Another possible reason of "slow motion" is more than one stream decoded by the same decoder. But in this case you also get distorted slow audio.
As for OPUS:
It always decode in rate that you specified in create parameters.
Inside it has pure math (without any timers or realtime related things) so it is not important when you call decode function.
Therefore some troubleshooting advises:
Make sure that you do not create decoder with different sampling rates
Make sure that when you import raw file in audacity you always import it in 48kHz mono
If any above do not help - check how many bytes you receive from decoder on each packet in normal/slow motion cases. For normal audio streams (with uniform inter-packet time) you always get the same number of raw audio samples.
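To make those points concrete, here is a minimal decode wrapper; the struct, the logging, and the buffer sizing are illustrative choices, not libopus requirements:

    #include <opus/opus.h>
    #include <cstdio>
    #include <vector>

    struct StreamDecoder {
        OpusDecoder* dec = nullptr;

        StreamDecoder() {
            int err = 0;
            dec = opus_decoder_create(48000, 1, &err); // create once, never vary the rate
        }
        ~StreamDecoder() { opus_decoder_destroy(dec); }

        // Returns samples decoded (negative = libopus error code).
        int decode(const unsigned char* pkt, opus_int32 len,
                   std::vector<opus_int16>& pcm) {
            pcm.resize(5760);  // 120 ms at 48 kHz mono: the maximum frame size
            int n = opus_decode(dec, pkt, len, pcm.data(), (int)pcm.size(), 0);
            if (n < 0) return n;
            pcm.resize(n);
            std::fprintf(stderr, "packet -> %d samples\n", n); // should stay constant
            return n;
        }
    };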

Resample PCM network stream to 8000Hz 8-bit mono via libsndfile sf_open_virtual function

My goal is to take a PCM stream in Node.js that is, for example, 44100 Hz 16-bit stereo, and resample it to 8000 Hz 8-bit mono, to then be encoded into Opus and streamed.
My thought was to try making bindings for libsndfile in C++ and using the sf_open_virtual function for resampling on the stream. However:
How can I reply to its callback function requesting a certain amount of data (found here: http://www.mega-nerd.com/libsndfile/api.html#open_virtual) if my program is still receiving data from the network? Do I just let it hang in a loop until the loop detects that the buffer is a certain percent full?
Since the PCM data is going to be headerless, how can I specify the format type for libsndfile to expect?
Or am I over-complicating things totally?

MPEG4 out of Raw RTP Payload

Okay, I have the following problem:
I have an IP Camera which is able to stream MPEG4 data over RTP
I am able to connect to this camera via RTSP
I can receive the raw RTP data.
So what problems do I have now?
1. Extract Data
What is the data I actually want? I know that I have to truncate the RTP header - but is there anything else I need to cut from the RTP packets?
2. Packetization Mode
I read that I should expect a Packetization Mode field in my SDP - well, it's not there. Does that mean I have to assume some kind of standard packetization mode?
3. Depacketization
If I got it right, I need to buffer all incoming frames with the marker bit = false until I get a frame with the marker bit = true, to get a complete MPEG4 frame. What exactly do I have to understand by "MPEG4 frame"? Keyframe + data until the next keyframe?
4. Decode
Do I have to decode the data any further then? In other threads I saw that people used another decoder - but what is there left to decode? I mean, the camera should already send the data MPEG4-coded?
5. Libraries
If I really need to decode the data, are there any open libraries I could use for that? Or maybe there is even a library which has some functions where I can just dump my RTP data and then magic happens and I get my mp4. ( But I assume there will be nothing like that .. )
Note: Everything I want to do should be part of my own application, meaning for example, I can't use an external software to parse the data.
Well, long story short - I'd really need some kind of step-by-step explanation for this. I know this is a broad question, but I don't know where to go from here. I also looked into the RFCs, but I couldn't extract much information out of them.
Also I already looked up these two Questions:
How to process raw UDP packets so that they can be decoded by a decoder filter in a directshow source filter
MPEG4 extract from RTP payload
But also the long answer from the first question could not make everything clear to me.
UPDATE: Well, I looked into this a bit further and now I don't know where to look anymore. It seems that all the packetization stuff etc. is actually not needed for my purpose. I also recorded a stream with openRTSP. When I open those files in a hex editor, I see that there are 16 bytes which I can't identify, followed by the config part of the SDP. Then the frame starts with the usual 00 00 01 B6. openRTSP also adds some kind of tail to the MP4 - well, I actually don't know what I need and what's just some "extra" stuff which isn't mandatory.
I know that I have to truncate the RTP header - but is there anything else I need to cut from the RTP packets?
The RTP packets might carry data stuffed from a file format (such as MP4), or they might carry the payload directly, packetized based on RFC 3640 or something similar. You need to find that out.
What exactly do I have to understand by "MPEG4 frame"? Keyframe + data until the next keyframe? Do I have to decode the data any further then? In other threads I saw that people used another decoder - but what is there left to decode? I mean, the camera should already send the data MPEG4-coded?
You should explore the basics of MPEG compression to appreciate this fully. The depacketization only gives you a string of bits. This is compressed data. You need to uncompress it (decode it) to see it on the screen.
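For the simple case, the depacketization described in point 3 of the question can be sketched like this (the fixed 12-byte header layout is from RFC 3550; the function name is illustrative, and header extensions/padding are ignored):

    #include <cstdint>
    #include <vector>

    // Appends one RTP payload to `frame`; returns true when the marker bit
    // says the MPEG-4 frame is complete and ready to decode.
    bool on_rtp_packet(const uint8_t* pkt, size_t len, std::vector<uint8_t>& frame)
    {
        if (len < 12) return false;                 // smaller than the fixed header
        bool marker = (pkt[1] & 0x80) != 0;         // M bit: last packet of the frame
        size_t header = 12 + (pkt[0] & 0x0F) * 4;   // fixed header + CSRC entries
        if (len > header)
            frame.insert(frame.end(), pkt + header, pkt + len);
        return marker;                              // caller decodes, then clears frame
    }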
are there any open libraries I could use for that?
Try ffmpeg or MPEG4IP.
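As a sketch of the decode step with FFmpeg's libavcodec (using the send/receive API available since FFmpeg 3.1; the function name and the one-frame-at-a-time usage are my assumptions):

    extern "C" {
    #include <libavcodec/avcodec.h>
    }
    #include <cstdint>

    // Decodes one complete, depacketized MPEG-4 Visual frame.
    bool decode_mpeg4(const uint8_t* data, int size)
    {
        const AVCodec* codec = avcodec_find_decoder(AV_CODEC_ID_MPEG4);
        AVCodecContext* ctx = avcodec_alloc_context3(codec);
        if (avcodec_open2(ctx, codec, nullptr) < 0) return false;

        AVPacket* pkt = av_packet_alloc();
        AVFrame*  frm = av_frame_alloc();
        pkt->data = const_cast<uint8_t*>(data);   // one depacketized frame
        pkt->size = size;

        bool ok = avcodec_send_packet(ctx, pkt) >= 0 &&
                  avcodec_receive_frame(ctx, frm) >= 0; // frm now holds YUV pixels

        av_frame_free(&frm);
        av_packet_free(&pkt);
        avcodec_free_context(&ctx);
        return ok;
    }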

transport stream & mpeg file format

I would like to convert a TS file to an mpeg file. Are there any documents describing such a process?
I know the TS architecture, but I don't know mpeg's file architecture. Any info on this subject will be highly appreciated.
Thank you.
What you are probably wanting to do is convert from MPEG-TS (Transport Stream) to MPEG-PS (Program Stream). MPEG-PS is the format of a standard .mpg file as well as the format DVD video uses.
You should probably get hold of the standard, which is ISO/IEC 13818-1. This standard contains all of the MPEG-TS and MPEG-PS container details (it does not cover the coded video, which is covered in ISO/IEC 13818-2).
Luckily, this conversion is rather simple since most of the MPEG-PS structure is contained within the MPEG-TS format. The transport stream consists of a series of 188-byte packets that each have a header. PES (Packetized Elementary Stream) packets are contained within the TS packet payloads. PES packets contain the actual video or audio payload. A PES packet can be any length, and most of the time they span several TS packets. Demuxing the PES packets from the transport stream really just involves removing the TS headers and concatenating the payload data correctly to form the PES packets.
Once you have a stream of PES packets, you will multiplex them into the Program Stream format as laid out in the standard. So basically, you don't need to parse the PES packets or their content, you can just lift them from one format and insert them into the other.
Even though the conversion is fairly simple, it still requires quite a bit of work since you will need to become pretty familiar with the container standard and be meticulous with your parsing of the bitstream to get things right. So even though I say the conversion is simple, that is only in the sense that it is simple compared to other format conversions where you might have to dig down further into the video data.
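Here is a sketch of the TS-side demux just described, assuming the PID of the wanted elementary stream is already known (PAT/PMT parsing, continuity-counter checks, and PES packet boundaries are omitted):

    #include <cstdint>
    #include <cstddef>
    #include <vector>

    // Appends the payload of one 188-byte TS packet on `wantedPid` to `pes`.
    void demux_ts_packet(const uint8_t pkt[188], uint16_t wantedPid,
                         std::vector<uint8_t>& pes)
    {
        if (pkt[0] != 0x47) return;                  // sync byte check
        uint16_t pid = ((pkt[1] & 0x1F) << 8) | pkt[2];
        if (pid != wantedPid) return;

        uint8_t afc = (pkt[3] >> 4) & 0x3;           // adaptation_field_control
        size_t offset = 4;
        if (afc == 2) return;                        // adaptation field only, no payload
        if (afc == 3) offset += 1 + pkt[4];          // skip the adaptation field

        // payload_unit_start_indicator (pkt[1] & 0x40) marks the start of a
        // new PES packet; a real demuxer would flush the previous one here.

        if (offset < 188)
            pes.insert(pes.end(), pkt + offset, pkt + 188);
    }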
I am trying to add some good resources that might help.
Here are some documents that explain the details of transport and program streams and the associated packetization structures.
This explains the differences between a transport stream and a program stream: http://www.vbrick.com/docs/VB_WhitePaper_TransportStreamVSProgramStream_rd2.pdf
This gives an overview of MPEG, including packetization: http://www.img.lx.it.pt/~fp/cav/Additional_material/MPEG2_overview.pdf
This explains the other aspects of transport streams, such as how programs are selected using tables: http://www.bitrouter.com/pdf/tutorial-psip.pdf
Basically, you need to depacketize the transport stream and decompose into PES packets (along with the time stamps) and then apply the program stream packetization process.
The crucial thing is how you maintain the relative gaps and timing of the packets in the PS stream when you mux it back. Hence, you must preserve the PTS/DTS timestamps in the PES packets.
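For instance, reading the PTS out of a PES header (field layout per ISO/IEC 13818-1) might look like the sketch below; it assumes a well-formed header and skips the marker-bit validation a real demuxer would perform:

    #include <cstdint>

    // Returns the 33-bit PTS from a PES packet header, or -1 if none present.
    int64_t pes_pts(const uint8_t* pes)
    {
        if (pes[0] || pes[1] || pes[2] != 0x01) return -1; // packet_start_code_prefix
        if ((pes[7] & 0x80) == 0) return -1;               // PTS_DTS_flags: no PTS
        const uint8_t* p = pes + 9;                        // first optional-field byte
        return  (int64_t)(p[0] & 0x0E) << 29 |             // PTS[32..30]
                (int64_t)(p[1])        << 22 |             // PTS[29..22]
                (int64_t)(p[2] & 0xFE) << 14 |             // PTS[21..15]
                (int64_t)(p[3])        << 7  |             // PTS[14..7]
                (int64_t)(p[4])        >> 1;               // PTS[6..0]
    }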
I am listing some tools here - that are good example for part of your work - and they are better known to be with compliance to MPEG2 systems standard.
tstools (http://tstools.berlios.de/)
mplex (from mjpegtools)
dvb-mplex (part of libdvb, http://www.metzlerbros.org/dvb/)
dvb-replex (also part of libdvb, http://freshmeat.net/projects/dvb-replex/ or http://www.metzlerbros.org/dvb/)
avidemux. http://avidemux.sourceforge.net/
Another good way to begin learning is to use the GStreamer plug-in framework, if you want to understand the broader flow quickly.
FFmpeg can be used to convert from TS to MPEG. More info here.