Full-quality MP3 streaming via WebRTC

I'm interested in WebRTC's ability to livestream MP3 audio peer-to-peer from a user's machine. The only example I've found is this: https://webrtc-mp3-stream.herokuapp.com/ from this article http://servicelab.org/2013/07/24/streaming-audio-between-browsers-with-webrtc-and-webaudio/
But, as you can see, the audio quality on the receiving side is pretty poor (45 kb/s). Is there any way to get full-quality MP3 streaming, plus the ability to manipulate the stream's data (like adjusting frequencies with an equalizer) on each user's side?
If it's impossible through WebRTC, are there any other Flash-plugin or plugin-free options for this?
Edit: I also stumbled upon these Shoutcast-style guys at http://unltd.fm/, who declare that they use WebRTC to deliver top-quality radio broadcasting, including streaming MP3. If they do, then how?

WebRTC supports two audio codecs: OPUS (max bitrate 510 kbit/s) and G.711. Stick with OPUS; it is modern and more promising, introduced in 2012.
The main files in webrtc-mp3-stream are two years out of date (Jul 18, 2013). I couldn't find an OPUS preference in the code, so the demo possibly runs via G.711.
The webrtc-mp3-stream demo does the encoding job (MP3 as a media source), then transmits the data over UDP/TCP via WebRTC. I don't think you need to decode it back to MP3 on the receiving side; that would be overkill. Just try to enable OPUS to bring the webrtc-mp3-stream code up to date.
Please refer to "Is there a way to choose codecs in WebRTC PeerConnection?" for how to enable OPUS and hear the difference.

I'm the founder of unltd.fm.
igorpavlov is right, but I can't comment on answers yet. We also use the OPUS codec (stereo / 48 kHz) over WebRTC.
Decoding the MP3 (or any other audio format) using Web Audio and then encoding it in OPUS is the way to go. You "just" need to force the SDP negotiation to use OPUS.
You should have sent us an email; you would have saved your 50 points ;)

You can increase the quality of a stream by setting the SDP to be stereo and increase the maxaveragebitrate:
let answer = await peer.conn.createAnswer(offerOptions);
answer.sdp = answer.sdp.replace('useinbandfec=1', 'useinbandfec=1; stereo=1; maxaveragebitrate=510000');
await peer.conn.setLocalDescription(answer);
This should output a SDP string which looks like this:
a=fmtp:111 minptime=10;useinbandfec=1; stereo=1; maxaveragebitrate=510000
This gives a potential maximum bitrate of 510 kb/s for the stereo stream, i.e. 255 kb/s per channel. The actual bitrate depends on the speed of your network and the strength of your signal.
You can read more about the other available SDP attributes at: https://www.rfc-editor.org/rfc/rfc7587

Related

Routing Audio from microphone to network using Qt 6.4.x

With Qt 6.4.x (Windows), how can I capture microphone audio, repackage it, and forward the repackaged audio to a QUdpSocket?
The repackaging will involve converting the captured audio from its typical 16-bit little-endian format to a 24-bit big-endian format, where each packet will have a constant-size payload, potentially different in size from what the microphone delivers. I am not sure, but somehow I think I need to replace the QAudioSink with a QAudioDecoder, as the description indicates:
The QAudioDecoder class is a high level class for decoding audio media files. It is similar to the QMediaPlayer class except that audio is provided back through this API rather than routed directly to audio hardware.
I have a partially working example that sends synthesized audio directly to the speaker. This functionality is based on the 'Audio Output Example' that ships with Qt 6 (my modified example sends a generated sine-wave tone to the speakers).
Also, in this RtpWorker thread, using the 'Audio Source Example' for inspiration, I was able to capture and intercept audio packets from the microphone, but I do not know how to send these packets (repackaged per the above) to a UDP socket in fixed-size datagrams; at the moment I just log the captured packets. I think I need an intermediate circular buffer, whose write side is filled with captured microphone audio while the read side is drained in pull mode by a QAudioSink or QAudioDecoder.
Per my comment above, I think I might need to send them to a QAudioDevice so I can handle the packaging and sending over the network myself.
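Something like this minimal Qt 6 sketch is what I have in mind (pull-mode QAudioSource feeding an accumulator buffer that is flushed as fixed-size datagrams; the host, port and 960-byte payload size are placeholders, and the 16-bit-LE to 24-bit-BE conversion is only stubbed out):

// Hedged sketch: pull-mode QAudioSource into an accumulator buffer,
// flushed as fixed-size UDP datagrams. Host, port and kDatagramSize
// are placeholders, not values from the bug report.
#include <QAudioFormat>
#include <QAudioSource>
#include <QCoreApplication>
#include <QHostAddress>
#include <QMediaDevices>
#include <QUdpSocket>

int main(int argc, char *argv[]) {
    QCoreApplication app(argc, argv);

    QAudioFormat fmt;
    fmt.setSampleRate(48000);
    fmt.setChannelCount(1);
    fmt.setSampleFormat(QAudioFormat::Int16); // captured as 16-bit LE

    QAudioSource source(QMediaDevices::defaultAudioInput(), fmt);
    QUdpSocket socket;
    QIODevice *io = source.start(); // pull mode: read from the returned device

    const int kDatagramSize = 960; // assumed constant payload size
    QByteArray buffer;             // stand-in for the circular buffer

    QObject::connect(io, &QIODevice::readyRead, [&]() {
        buffer.append(io->readAll());
        while (buffer.size() >= kDatagramSize) {
            // The 16-bit-LE -> 24-bit-BE repackaging would happen here.
            socket.writeDatagram(buffer.left(kDatagramSize),
                                 QHostAddress("192.168.1.50"), 5004);
            buffer.remove(0, kDatagramSize);
        }
    });

    return app.exec();
}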
My code is contained in two attachments to QTBUG-108383.
It would be great if someone could point me to some useful examples that try to do something similar.
Try running it on macOS or Linux; it seems to be a Windows bug.

Saving video from an RTP stream to a file

I'm trying to get and store the data from an IP camera and I'd appreciate some high level advice as to the best way to do this.
So far I've successfully initiated an RTSP conversation with the camera and have it sending me UDP packets with RTP payloads. But I'm unsure where to go from here.
I'm very happy to do the work, I'd just appreciate some pointers / a high level overview of the steps so I can deconstruct the project!
There is no direct answer to the OP's question here, because the question is a bit broad, and without further information about what the OP intends to do with the data it is difficult to give a precise answer. What I can do here is suggest steps the OP might take and problems to consider.
OP had stated:
So far I've successfully initiated an RTSP conversation with the camera and have it sending me UDP packets with RTP payloads. But I'm unsure where to go from here.
Now that you have established communication with the camera and are able to receive data packets from the video stream, it is a matter of understanding what the RTP payloads are, or how to interpret that data. So at this point you will have to do your research on the RTP protocol (a standard network protocol, RFC 3550). Once you have written your structures and functions to work with this protocol, it is a matter of breaking the UDP packets down into useful bytes of information. In most cases, graphic, video, or audio data, whether read directly from a file or from a stream object, is accompanied by some type of header information. Next, it is a matter of understanding this header information, normally in the form of a structure, which tells you what type of content the file or stream holds and how many bytes of information to extract from it.
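As an illustration, every RTP packet starts with a fixed 12-byte header; a minimal C++ parsing sketch (field layout per RFC 3550, with error handling kept to a bare minimum) might look like this:

// Hedged sketch: parse the fixed 12-byte RTP header (RFC 3550).
#include <cstddef>
#include <cstdint>

struct RtpHeader {
    uint8_t  version;        // always 2
    bool     padding, extension;
    uint8_t  csrcCount;
    bool     marker;
    uint8_t  payloadType;    // maps to the codec negotiated via RTSP/SDP
    uint16_t sequenceNumber; // for detecting loss and reordering
    uint32_t timestamp;      // media clock (90 kHz for video)
    uint32_t ssrc;           // identifies the stream
};

bool parseRtpHeader(const uint8_t *buf, size_t len,
                    RtpHeader &h, size_t &payloadOffset) {
    if (len < 12) return false; // fixed header is 12 bytes
    h.version        = buf[0] >> 6;
    h.padding        = (buf[0] >> 5) & 1;
    h.extension      = (buf[0] >> 4) & 1;
    h.csrcCount      = buf[0] & 0x0F;
    h.marker         = buf[1] >> 7;
    h.payloadType    = buf[1] & 0x7F;
    h.sequenceNumber = uint16_t((buf[2] << 8) | buf[3]);
    h.timestamp      = (uint32_t(buf[4]) << 24) | (uint32_t(buf[5]) << 16)
                     | (uint32_t(buf[6]) << 8)  |  uint32_t(buf[7]);
    h.ssrc           = (uint32_t(buf[8]) << 24) | (uint32_t(buf[9]) << 16)
                     | (uint32_t(buf[10]) << 8) |  uint32_t(buf[11]);
    payloadOffset = 12 + 4 * h.csrcCount; // a header extension, if flagged, adds more
    return h.version == 2 && payloadOffset <= len;
}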
I know it's not going to just be a case of saving the RTP payload directly to a file, but what other steps are involved?
The steps involved may vary depending on your needs and what you intend to do with the information. Are you trying to write the general properties of the video content to a file, such as its compression type, its audio/video codec types, its resolution and frame-rate information, its byte rate, etc.? Or are you trying to write the actual video content itself to a file that your application will use either for playback or for editing? This all depends on your intentions.
Is the data compressed, in which case I have to decompress it?
At this point, once you are able to interpret the RTP protocol and have parsed the data packets by understanding their header information and saved them into a proper structure, it is a matter of using that header information to determine what is actually within the stream. For example, according to the PDF you supplied on the camera's properties, the video compression can be one of two types, H.264 or MJPEG; you determine which from the information provided in the header, and from there you either branch your code to read and parse each type of compression, or accept the one you are willing to work with and disregard the other. Next is the audio compression, if you care about the audio; the types available are AAC (encoding only), G.711 A-Law and G.711 U-Law, and the same mechanism applies. Once you are past the audio and video compression, you then need vital information about the video itself, such as the resolution and frame rate (buffer sizes) stored in the header, so you know how many bytes to read from the stream and how far to advance your pointer through it. Note that different resolutions and frame rates are available depending on which type of compression is being used:
H.264
1920 x 1080 (2.1 MP) @ 30 fps (1080p)
1280 x 720 @ 60 fps (720p)
720 x 480/576 @ 30/25 fps (D1)
704 x 480/576 @ 30/20 fps (4CIF)
352 x 240/288 @ 30/25 fps (CIF)
MJPEG
720 x 480/576 @ 30/25 fps (D1)
704 x 480/576 @ 30/25 fps (4CIF)
352 x 240/288 @ 30/25 fps (CIF)
Now, this covers resolution and frame rate, but the next thing to consider is that you are working with a video stream, so the above may not apply directly in your case. According to the camera's video-streaming capabilities, these are the stream types you will have to take into account:
Single-stream H.264 up to 1080p (1920 x 1080) @ 30 fps
Dual-stream H.264 and MJPEG
H.264: Primary stream programmable up to 1280 x 720 @ 25/20 fps
MJPEG: Secondary stream programmable up to 720 x 576 @ 25/20 fps
With these different types available from your camera, you have to take all of them into consideration. Again, this depends on the intentions of your application and what you plan to do with the information: you can write your program to accept all of these types, or program it to accept only one type with a specific format. That is up to you.
Do I have to do any other modifications?
I don't think you would have any modifications to make, unless your application intends to modify the video or audio information itself. If your application just reads the file back for simple playback, then the answer is no, as long as all the appropriate information was saved properly and your parser for your custom file structure can read the file's contents and parse the data appropriately for playback.
Where can I learn about what I'll need to do specific to this camera?
I don't think you need much more information about the camera itself; the PDF you linked in your question already gives you enough to go on. What you need from here is information and documentation about the specific protocols, packet types, and compression and stream types, and a general search for these should suffice:
UDP
Do a Google search for C++ programming with UDP sockets, for either Linux or Winsock.
RTP
Do a Google search for C++ programming with RTP packets.
Video compression
Do a Google search for both H.264 and MJPEG compression and for structure information on stream objects.
Audio compression
Do a Google search for each of AAC (encoding only), G.711 A-Law and G.711 U-Law, if you are interested in the audio as well.
From there, once you have valid specifications for these data structures as stream objects and have acquired the appropriate header information to determine the type and format the video content is saved in, you should be able to parse the data packets appropriately. How you save or write them to a file depends entirely on your intentions.
I have provided this as a guideline to help lead you in the right direction, in the same manner that a chemist, physicist or engineer would approach any typical problem.
The general steps follow a scientific approach to the problem at hand. Typically these are:
Assessing the Situation
Create either a Hypothesis or a Thesis about the Situation.
Gather the Known Facts
Determine the Unknowns
Draft a Model that Shows a Relationship Between the Known and Unknowns.
Perform both Research and Experimentation
Record or Log Events and Data
Analyze the Data
Draw a Conclusion
Now, in the case of writing a software application the concept is similar, but the approach may vary: not all of the steps above may be needed, and some additional steps may be required. One step in the application-development cycle that is not found in the scientific approach is debugging. But the general guideline still applies, and if you keep to this type of strategy I am sure you will have the confidence to gather what you need and apply it, step by step, to achieve your goals.
I'm trying to get and store the data from a Cisco IPC camera, and I'd appreciate some high level advice as to the best way to do this.
You can probably use openRTSP to do this, since it can also output to file. With this approach you would have to write NO code; implementing RTP, RTSP and RTCP correctly is complex and a lot of work. Should you have requirements that openRTSP doesn't meet, you can use the live555 libraries for RTSP/RTP/RTCP and write some minimal code to do something with the received video. The mailing list is very responsive provided that you ask "good" questions, and make sure you read the FAQ first.
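For example, something like this should dump 30 seconds of the camera's stream to a file (the URL is a placeholder; see the live555 documentation for the full option list):
openRTSP -d 30 -4 rtsp://192.168.1.10/stream1 > camera.mp4
Here -d limits the capture duration in seconds and -4 writes the received streams out as an MP4-format file on stdout.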
I know it's not going to just be a case of saving the RTP payload directly to a file, but what other steps are involved?
You don't need to know this if you use openRTSP. If you use the live555 libraries directly, you'll be passed entire video frames, which you would then have to decode or write to file yourself, depending on what you want to achieve. If you DO need/want to know about RTP and the RTP payload formats, read the corresponding RFCs, e.g. RFC 2326, RFC 3550 and RFC 6184.
Is the data compressed, in which case I have to decompress it?
Generally you want to store compressed media in a file, and use media player software to decode it on playback (Otherwise you end up with huge files).
Where can I learn about what I'll need to do specific to this camera?
If you just want to save the video, you ideally don't need to know anything about the camera other than which standards it implements (and you already know that).

Portable library to play samples on individual 5.1 channels with C/C++?

I'm looking for a free, portable C or C++ library that allows me to play mono sound samples on specific channels in a 5.1 setup. For example, the sound should play through the front-left speaker while all the other speakers remain silent. Is there any library capable of doing this?
I had a look at OpenAL. However, I can only specify the position from which the sound should come, but it seems to me that I cannot say something like "use only the front left channel to play this sound".
Any hints are welcome!
I had a look at OpenAL. However, I can only specify the position from which the sound should come, but it seems to me that I cannot say something like "use only the front left channel to play this sound".
I don't think this is quite true. I think you can do it with OpenAL, although it's not trivial. OpenAL only does the positional processing if you feed it mono-format data; if you give it stereo or higher, it plays the data as provided. However, you're only guaranteed stereo support, so you'll need to check whether the 5.1-channel format extension (AL_FORMAT_51CHN16) is available on your system. If so, then I think you feed your sound to the channel you want and feed zeroes to all the other channels when you buffer the samples, as sketched below. Note that you need hardware support for this on the sound card; a "generic software" device won't cut it.
See this discussion from the OpenAL mailing list.
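As a rough illustration of the zero-filling idea, assuming the AL_EXT_MCFORMATS extension is available (interleaved channel order FL, FR, C, LFE, RL, RR; error checks omitted):

// Hedged sketch: a 5.1 buffer of interleaved 16-bit samples where only
// the front-left slot is non-zero. Requires AL_EXT_MCFORMATS.
#include <AL/al.h>
#include <AL/alc.h>
#include <cmath>
#include <vector>

int main() {
    ALCdevice *dev = alcOpenDevice(nullptr);
    ALCcontext *ctx = alcCreateContext(dev, nullptr);
    alcMakeContextCurrent(ctx);

    if (!alIsExtensionPresent("AL_EXT_MCFORMATS")) return 1; // no 5.1 formats
    ALenum fmt51 = alGetEnumValue("AL_FORMAT_51CHN16");

    const int kRate = 44100, kChannels = 6, kFrames = kRate; // 1 s of audio
    std::vector<ALshort> pcm(size_t(kFrames) * kChannels, 0); // all silent
    for (int i = 0; i < kFrames; ++i) // 440 Hz tone, front-left channel only
        pcm[size_t(i) * kChannels] =
            ALshort(8000 * std::sin(6.283185307 * 440.0 * i / kRate));

    ALuint buf, src;
    alGenBuffers(1, &buf);
    alBufferData(buf, fmt51, pcm.data(),
                 ALsizei(pcm.size() * sizeof(ALshort)), kRate);
    alGenSources(1, &src);
    alSourcei(src, AL_BUFFER, ALint(buf));
    alSourcePlay(src);
    // ... wait for playback to finish, then delete the source/buffer/context
    return 0;
}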
Alternatively, PortAudio is open, cross-platform, and supports multi-channel output. You still have to interleave the data, so if you're sending a sound to a single channel you have to send zeroes to all the others, and you'll also need to check, when opening a stream on a device, that the device supports 6 channels of output.
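The same zero-filling idea with PortAudio's blocking write API might look like the following sketch (the 440 Hz tone and buffer sizes are arbitrary, and the default device must actually support six output channels):

// Hedged sketch: play a mono tone on the front-left channel of a
// 5.1 (6-channel) device via PortAudio's blocking API. Check the
// PaError return codes in real code.
#include <portaudio.h>
#include <cmath>
#include <vector>

int main() {
    const int kChannels = 6, kSampleRate = 44100, kFrames = 256;
    const double kTwoPi = 6.283185307179586;

    Pa_Initialize();
    PaStream *stream = nullptr;
    // A null callback selects the blocking read/write API.
    Pa_OpenDefaultStream(&stream, 0, kChannels, paFloat32,
                         kSampleRate, kFrames, nullptr, nullptr);
    Pa_StartStream(stream);

    std::vector<float> frames(kFrames * kChannels, 0.0f); // all channels silent
    double phase = 0.0, step = kTwoPi * 440.0 / kSampleRate;

    for (int block = 0; block < kSampleRate / kFrames; ++block) { // ~1 second
        for (int i = 0; i < kFrames; ++i) {
            frames[i * kChannels + 0] = 0.2f * float(std::sin(phase)); // front left
            phase += step; // channels 1..5 stay zero-filled
        }
        Pa_WriteStream(stream, frames.data(), kFrames);
    }

    Pa_StopStream(stream);
    Pa_CloseStream(stream);
    Pa_Terminate();
    return 0;
}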
A long time ago I used RtAudio. I cannot say whether that library can do what you want to achieve, but maybe it helps.
FMOD (http://fmod.org) could do the trick too.
I use the BASS Audio Library http://www.un4seen.com for all my audio, sound and music projects. I am very happy with it.
BASS is an audio library to provide developers with powerful and efficient sample, stream (MP3, MP2, MP1, OGG, WAV, AIFF, custom generated, and more via add-ons), MOD music (XM, IT, S3M, MOD, MTM, UMX), MO3 music (MP3/OGG compressed MODs), and recording functions. All in a tiny DLL, under 100KB* in size. C/C++, Delphi, Visual Basic, MASM, .Net and other APIs are available. BASS is available for the Windows, Mac, Win64, WinCE, Linux, and iOS platforms.
I have never used it to play different samples in a 5.1 configuration, but according to their own documentation it should be possible.
Main features
Samples Support for WAV/AIFF/MP3/MP2/MP1/OGG and custom generated samples
Sample streams Stream any sample data in 8/16/32 bit, with both "push" and "pull" systems
File streams MP3/MP2/MP1/OGG/WAV/AIFF file streaming
Internet file streaming Stream data from HTTP and FTP servers (inc. Shoutcast, Icecast & Icecast2), with IDN and proxy server support and adjustable buffering
Custom file streaming Stream data from anywhere using any delivery method, with both "push" and "pull" systems
Multi-channel Support for more than plain stereo, including multi-channel OGG/WAV/AIFF files
...
Multiple outputs Simultaneously use multiple soundcards, and move channels between them
Speaker assignment Assign streams and MOD musics to specific speakers to take advantage of hardware capable of more than plain stereo (up to 4 separate stereo outputs with a 7.1 soundcard); see the sketch after this list
3D sound Play samples/streams/musics in any 3D position
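I have not tried it myself, but if the speaker-assignment feature works as documented, a minimal sketch might look like this (BASS_SPEAKER_FRONTLEFT is the flag defined in bass.h; the file name is a placeholder):

// Hedged sketch: route an entire stream to the front-left speaker
// using BASS's speaker-assignment flags.
#include "bass.h"

int main() {
    if (!BASS_Init(-1, 44100, 0, 0, NULL)) // default output device
        return 1;

    HSTREAM stream = BASS_StreamCreateFile(FALSE, "sound.wav", 0, 0,
                                           BASS_SPEAKER_FRONTLEFT);
    if (stream) {
        BASS_ChannelPlay(stream, FALSE);
        // ... wait for playback to finish before freeing
    }

    BASS_Free();
    return 0;
}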
Licensing
BASS is free for non-commercial use. If you are a non-commercial entity (eg. an individual) and you are not making any money from your product (through sales, advertising, etc), then you can use BASS in it for free. Otherwise, one of the following licences will be required.

Display streaming video in desktop app

I have a Windows native desktop app (C++/Delphi), and I'm successfully using Directshow to display live video in it from a 'local' video capture device.
The next thing I want to do is display video from a 'remote' capture device, streamed over the LAN.
To stream the video, I guess I can use something like Expression Encoder or VLC, but I'm not sure what the easiest way is to receive/decode the streamed video. Inserting an ActiveX VLC or Flash player might be one option (although licensing may then be an issue), but I was wondering if there's any way to achieve this with DirectShow...
The application needs to run on XP, and the video decoding should ideally be royalty-free.
Suggestions, please!
Using DirectShow for receiving and displaying your video can work, but the simplicity, "openness" and performance will depend on the video format and streaming method you'll be using.
A lot of open/free source filters exist for RTSP (e.g. based on live555), but you may also find that creating your own source filter is a better fit.
The best solution won't be the same for H.264 delivered over RTP/RTSP as for MJPEG delivered over plain UDP.

How to convert a FLV file recorded with Red5 / FMS to MP3?

I'm looking for a way to extract the audio part of an FLV file.
I'm recording from the user's microphone, and the audio is encoded using the Nellymoser Asao codec. This is the default codec and there's no way to change it.
FFmpeg is the way to go!
It worked for me with SVN Rev 14277.
The command I used is: ffmpeg -i source.flv -vn -f mp3 destination.mp3
Gotcha:
If you get the error message "Unsupported audio codec (n)",
check the FLV spec in the Audio Tags section.
FFmpeg can decode n=6 (Nellymoser),
but for n=4 (Nellymoser 8-kHz mono) and n=5 (Nellymoser 16-kHz mono) it doesn't work.
To fix this, use the default microphone rate when recording your streams; otherwise FFmpeg is unable to decode them.
Hope this helps!
This isn't an exact answer, but some relevant notes I've made from investigating FLV files for a business requirement.
Most FLV audio is encoded in the MP3 format, meaning you can extract it directly from the FLV container. If the FLV was created from someone recording from their microphone, the audio is encoded with the Nellymoser Asao codec, which is proprietary (IIRC).
I'd check out libavcodec, which handles FLV/MP3/Nellymoser natively, and should let you get to the audio.
I'm currently using FFmpeg version SVN-r12665 for this, with no problems (the console version, without any wrapper library). There are some caveats to using console applications from non-console .NET environments, but it's all fairly straightforward. Using the libavcodec DLL directly is much more cumbersome.
I was going to recommend this: http://code.google.com/hosting/takenDown?project=nelly2pcm&notice=7281.
But it's been taken down. Glad I got a copy first :-)