I am trying to use DirectShow (https://learn.microsoft.com/en-us/windows/win32/directshow/directshow) to get an uncompressed byte array from an .mp3 stream. I have an implementation that can play back an .mp3 byte stream:
bool coAudioPlayer::LoadImp(SoundDataType dataType, std::string const & filename, unsigned char const * pData, int64_t dataLen, bool bOnlyIfFilenameChanged)
{
    ...

    m_pMemReader = new CMemReader(m_pMemStream, m_pMediaType, &hr);
    m_pMemReader->AddRef();

    hr = CoCreateInstance(CLSID_FilterGraph,
                          NULL,
                          CLSCTX_INPROC_SERVER,
                          IID_IGraphBuilder,
                          (void **)&this->m_pigb);

    hr = m_pigb->AddFilter(m_pMemReader, NULL);
    if (FAILED(hr))
    {
        return false;
    }

    m_pigb->QueryInterface(IID_IMediaControl, (void **)&m_pimc);
    m_pigb->QueryInterface(IID_IMediaEventEx, (void **)&m_pimex);
    m_pigb->QueryInterface(IID_IBasicAudio, (void **)&m_piba);
    m_pigb->QueryInterface(IID_IMediaSeeking, (void **)&m_pims);

    /* Render our output pin */
    hr = m_pigb->Render(m_pMemReader->GetPin(0));
    if (FAILED(hr))
    {
        return false;
    }

    hr = m_pimc->Run();

    return m_bReady;
}
But I need to extend this functionality and add a way to get the uncompressed byte array (sound frames). As far as I understand, DirectShow decodes the stream under the hood, but I don't see any way to retrieve the decoded data.
Is there a way to do it?
"Uncompressed byte array" is not quite the right way to describe the desired data; there is no such thing as media data in a generic "byte array" format. MP3 audio would typically be decompressed by the MP3 Decoder media object, wrapped into a DirectShow filter, into audio of MEDIASUBTYPE_PCM format with certain properties (sampling rate, channel count, bits per sample). Specifically, the selected bit depth (and this decoder does support multiple bit depth options!) directly defines the representation of the audio data as a byte array; with 16-bit stereo PCM, for example, every audio frame is four bytes: two little-endian signed 16-bit samples.
You don't need or want to access the data when you build a playback pipeline, such as in the scenario with the Render method you mention, and so in that setup you don't have access to the data.
A typical way to access the content is to build a pipeline around the Sample Grabber filter. There is a huge amount of code and existing questions explaining the approach; the relevant keyword is SampleCB, standing for the ISampleGrabberCB::SampleCB method. For example: ffmpeg audio frame from directshow sampleCB imediasample.
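For orientation only, a minimal sketch of such a callback might look like the following. It assumes the classic qedit.h Sample Grabber interfaces (on recent SDKs that header may need to be taken from an older SDK), and it would be handed to ISampleGrabber::SetCallback(pCallback, 0) on a grabber whose media type has been forced to MEDIATYPE_Audio / MEDIASUBTYPE_PCM:

// Sketch only: receives every decoded PCM buffer on the streaming thread.
class PcmGrabberCB : public ISampleGrabberCB
{
public:
    // The graph holds the callback only for its own lifetime, so a trivial
    // reference count is enough for a sketch like this.
    STDMETHODIMP_(ULONG) AddRef() { return 2; }
    STDMETHODIMP_(ULONG) Release() { return 1; }
    STDMETHODIMP QueryInterface(REFIID riid, void **ppv)
    {
        if (riid == IID_ISampleGrabberCB || riid == IID_IUnknown)
        {
            *ppv = static_cast<ISampleGrabberCB *>(this);
            return S_OK;
        }
        return E_NOINTERFACE;
    }
    STDMETHODIMP SampleCB(double /*sampleTime*/, IMediaSample *pSample)
    {
        BYTE *pData = nullptr;
        if (SUCCEEDED(pSample->GetPointer(&pData)))
        {
            long cb = pSample->GetActualDataLength();
            // pData[0..cb) is decoded PCM; its layout (bits per sample,
            // channels, rate) is the WAVEFORMATEX negotiated on the
            // grabber's input pin.
        }
        return S_OK;
    }
    STDMETHODIMP BufferCB(double, BYTE *, long) { return E_NOTIMPL; }
};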
I have to use the Opus codec to encode and decode audio data in C++, and I have to encapsulate the functions.
So I send an array of floats to the encoder and then decode the result of the Opus encoding function. Unfortunately, the result is not the same: I get back an array that contains none of the values from the initial array.
Here is my code.
Encapsulation:
std::vector<float> codec::OpusPlugin::decode(packet_t &packet) {
    std::vector<float> out(BUFFER_SIZE * NB_CHANNELS);
    int ret = 0;

    if (!this->decoder)
        throw Exception("Can't decode since there is no decoder.");
    ret = opus_decode_float(this->decoder, packet.data.data(), packet.size,
                            out.data(), FRAME_SIZE, 0);
    if (ret < 0)
        throw Exception("Error while decoding compressed data.");
    return out;
}
// ENCODER
packet_t codec::OpusPlugin::encode(std::vector<float> to_encode) {
    std::vector<unsigned char> data(BUFFER_SIZE * NB_CHANNELS * 2);
    packet_t packet;
    int ret = 0;

    if (!this->encoder)
        throw Exception("Can't encode since there is no encoder.");
    ret = opus_encode_float(this->encoder, to_encode.data(), FRAME_SIZE,
                            data.data(), data.size());
    if (ret < 0)
        throw Exception("Error while encoding data.");
    packet.size = ret;
    packet.data = data;
    return packet;
}
And here is the call to the functions:
packet_t packet;
std::vector<float> floats = {0.23, 0, -0.312, 0.401230, 0.1234, -0.1543};

packet = CodecPlugin->encode(floats);
std::cout << "packet size: " << packet.size << std::endl;
std::vector<float> output = CodecPlugin->decode(packet);

for (int i = 0; i < 10; i++) {
    std::cout << output[i] << " ";
}
Here is the packet_t structure, where I store the return value of encode and the unsigned char array (the encoded data):
typedef struct packet_s {
    int size;
    std::vector<unsigned char> data;
} packet_t;
The output of the program is
-1.44487e-15 9.3872e-16 -1.42993e-14 7.31834e-15 -5.09662e-14 1.53629e-14 -8.36825e-14 3.9531e-14 -8.72754e-14 1.0791e-13
which is not the array I initialized at the beginning.
I have read the documentation and the code examples many times, but I don't know where I made a mistake.
I hope you will be able to help me.
Thanks :)
We don't see how you initialize your encoder and decoder, so we don't know what their sample rate, complexity, or number of channels is. No matter how you have initialized them, you are still going to have the following problems:
First, Opus doesn't support arbitrary frame sizes; it only supports 2.5 ms, 5 ms, 10 ms, 20 ms, 40 ms, or 60 ms frames (RFC 6716 - Definition of the Opus Audio Codec, section 2.1.4). Moreover, Opus supports only 8 kHz, 12 kHz, 16 kHz, 24 kHz, or 48 kHz sample rates. No matter which of those you have chosen, your 6-element input array doesn't correspond to any of the supported frame sizes (even the shortest frame, 2.5 ms at 8 kHz, is 20 samples per channel).
Secondly, Opus is a lossy audio codec. This means that after you encode a signal you will never (except perhaps in some edge cases) be able to reconstruct the exact original signal by decoding the encoded Opus frame. The best way to test whether your encoder and decoder work is with a real audio sample. Opus encoding preserves the perceptual quality of the audio; therefore, if you test it with arbitrary data you might not get the expected results back even if you implemented the encoding and decoding functions correctly.
What you can easily do is generate a 2000 Hz sine wave (there are multiple examples on the internet) for 20 ms. That means 160 array elements at a sample rate of 8000 Hz, if you wish to use 8 kHz. A sine wave of 2 kHz is within the human hearing range, so the encoder is going to preserve it. Then decode it back and check whether the elements of the input and output arrays are similar, since we've already established that it is unlikely they will be identical.
I am not good at C++, so take the sketch below only as a rough illustration; the problems above hold true no matter what language is used.
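A minimal sketch of that round-trip test against libopus (assuming 8 kHz, mono, one 20 ms frame of 160 samples; the header path, amplitude, and printed check are illustrative only):

// Sketch only: encode and decode one 20 ms frame of a 2 kHz sine at 8 kHz mono.
// Build with something like: g++ test.cpp -lopus
#include <opus/opus.h>
#include <cmath>
#include <cstdio>
#include <vector>

int main()
{
    const int sampleRate = 8000, channels = 1, frameSize = 160; // 20 ms at 8 kHz
    const double twoPi = 6.283185307179586;
    int err = 0;

    OpusEncoder *enc = opus_encoder_create(sampleRate, channels,
                                           OPUS_APPLICATION_AUDIO, &err);
    OpusDecoder *dec = opus_decoder_create(sampleRate, channels, &err);

    std::vector<float> in(frameSize), out(frameSize);
    for (int i = 0; i < frameSize; ++i)
        in[i] = 0.5f * static_cast<float>(std::sin(twoPi * 2000.0 * i / sampleRate));

    unsigned char packet[4000];
    opus_int32 nBytes = opus_encode_float(enc, in.data(), frameSize,
                                          packet, sizeof(packet));
    int nSamples = opus_decode_float(dec, packet, nBytes,
                                     out.data(), frameSize, 0);

    // Expect similar, not identical, waveforms; the decoder also needs a few
    // frames to converge, so a real test should compare later frames.
    std::printf("encoded %d bytes, decoded %d samples\n", (int)nBytes, nSamples);

    opus_encoder_destroy(enc);
    opus_decoder_destroy(dec);
    return 0;
}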
I've used the OpenH264 tutorial (https://github.com/cisco/openh264/wiki/UsageExampleForDecoder) to successfully decode an H264 frame, but I can't figure out from the tutorial what the output format is.
I'm using "unsigned char *pDataResult[3];" (pData in the tutorial), and this gets populated, but I need to know the length of each buffer in order to convert them to byte buffers to return to Java. I also need to know who owns this data (it seems to be owned by the decoder). This information isn't mentioned in the tutorial or the docs as far as I can find.
unsigned char *pDataResult[3];
int iRet = pSvcDecoder->DecodeFrameNoDelay(pBuf, iSize, pDataResult, &sDstBufInfo);
The tutorial also lists an initializer, but gives "..." as the assignment.
//output: [0~2] for Y,U,V buffer for Decoding only
unsigned char *pData[3] =...;
Is the YUV data null terminated?
There is also the SBufferInfo last parameter, which contains a TagSysMemBuffer:
typedef struct TagSysMemBuffer {
    int iWidth;     ///< width of decoded pic for display
    int iHeight;    ///< height of decoded pic for display
    int iFormat;    ///< type is "EVideoFormatType"
    int iStride[2]; ///< stride of 2 component
} SSysMEMBuffer;
And the length is probably derivable from that, but it's not exactly clear how. Maybe it is iWidth * iHeight for each buffer?
pData is freed in the decoder destructor with WelsFreeDynamicMemory in decoder.cpp, just as you supposed.
The decoder itself assigns nullptrs to the channels, but initializing pData with nullptrs yourself is a good habit.
The YUV data is not null terminated; the lengths follow from SSysMEMBuffer. For the usual I420 output, the Y plane is iStride[0] * iHeight bytes and the U and V planes are iStride[1] * (iHeight / 2) bytes each; the visible picture is iWidth x iHeight, with each row padded out to the stride.
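For illustration, a sketch of copying the planes into tightly packed buffers (for example, to hand back to Java), assuming the usual I420 output and the field names from codec_def.h:

// Sketch only: run after DecodeFrameNoDelay, reusing pDataResult/sDstBufInfo
// from the question. The decoder still owns pDataResult; do not free it.
#include <cstring>
#include <vector>

if (iRet == 0 && sDstBufInfo.iBufferStatus == 1)
{
    const SSysMEMBuffer &buf = sDstBufInfo.UsrData.sSystemBuffer;
    const int w = buf.iWidth, h = buf.iHeight;

    std::vector<unsigned char> y(w * h);
    std::vector<unsigned char> u((w / 2) * (h / 2));
    std::vector<unsigned char> v((w / 2) * (h / 2));

    // Rows are padded out to iStride[...], so copy row by row.
    for (int r = 0; r < h; ++r)
        std::memcpy(&y[r * w], pDataResult[0] + r * buf.iStride[0], w);
    for (int r = 0; r < h / 2; ++r)
    {
        std::memcpy(&u[r * (w / 2)], pDataResult[1] + r * buf.iStride[1], w / 2);
        std::memcpy(&v[r * (w / 2)], pDataResult[2] + r * buf.iStride[1], w / 2);
    }
    // y, u, v now hold exactly w*h + 2*(w/2)*(h/2) bytes of picture data.
}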
Is it possible that the PTS of a particular frame in a file is different from the PTS of the same frame when the same file is being streamed?
When I read a frame using av_read_frame I store the video stream in an AVStream. After I decode the frame with avcodec_decode_video2, I store the timestamp of that frame in an int64_t using av_frame_get_best_effort_timestamp. Now, if the program gets its input from a file, I get a different timestamp than when I stream the same file to the program.
To change the input type I simply change the argv argument from "/path/to/file.mp4" to something like "udp://localhost:1234", then I stream the file with ffmpeg on the command line: "ffmpeg -re -i /path/to/file.mp4 -f mpegts udp://localhost:1234". Can it be because the "-f mpegts" argument changes some characteristics of the media?
Below is my code (simplified). By reading the ffmpeg mailing list archives I realized that the time_base that I'm looking for is in the AVStream and not the AVCodecContext. Instead of using av_frame_get_best_effort_timestamp I have also tried using the packet.pts but the results don't change.
I need the time stamps to have a notion of frame number in a streaming video that is being received.
I would really appreciate any sort of help.
//..
//argv[1] = "/file.mp4";
argv[1] = "udp://localhost:7777";

// define AVFormatContext, AVFrame, etc.
// register av, avcodec, avformat_network_init(), etc.

avformat_open_input(&pFormatCtx, argv[1], NULL, NULL);
avformat_find_stream_info(pFormatCtx, NULL);

// find the video stream...
// pointer to the codec context...
// open codec...

pFrame = av_frame_alloc();

while (av_read_frame(pFormatCtx, &packet) >= 0) {
    AVStream *stream = pFormatCtx->streams[videoStream];
    if (packet.stream_index == videoStream) {
        avcodec_decode_video2(pCodecCtx, pFrame, &frameFinished, &packet);
        if (frameFinished) {
            int64_t perts = av_frame_get_best_effort_timestamp(pFrame);
            if (isMyFrame(pFrame)) {
                cout << perts * av_q2d(stream->time_base) << "\n";
            }
        }
    }
    // free allocated space
}
//..
Timestamps are stored at the container level, so changing the container can change the timestamps. In addition, MPEG-TS stores a timestamp for every frame (based on a 90 kHz clock). MP4 only stores the frame durations with an assumed start time of 0 (this gets more complicated with B-frames, since the first PTS is zero and the first DTS is less than 0), so to get a timestamp all the preceding frame durations are added up. MP4 also allows the clock rate to be set; it is often a 30000 ticks-per-second timescale with a frame duration of 1001 ticks for 29.97 FPS, but it can be set to anything. So av_frame_get_best_effort_timestamp returns ticks in the stream's time_base units. For MPEG-TS the stream time_base is always 1/90000.
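As a sketch (one possible approach, not the only one), you can make the frame index container-independent by subtracting the stream's start_time, which is usually non-zero for MPEG-TS, before converting with the stream's time_base and average frame rate:

// Sketch only: inside the decode loop from the question, after frameFinished.
// Needs <cmath> for llround; avg_frame_rate may be unknown for some streams.
AVStream *st = pFormatCtx->streams[videoStream];
int64_t pts = av_frame_get_best_effort_timestamp(pFrame);
int64_t start = (st->start_time != AV_NOPTS_VALUE) ? st->start_time : 0;
double seconds = (pts - start) * av_q2d(st->time_base);
int64_t frameNumber = llround(seconds * av_q2d(st->avg_frame_rate));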
I am doing my assignment to read a .rgb video file and display it in a window. I only know how to read and display an image in C++. What should I do to read the video and display it frame by frame? I don't want to use third-party libraries, just pure C++ and Windows programming.
My idea is: first load the whole video file into the program using fopen and allocate a buffer for it. Then, just like displaying an image, I want to treat the whole video as an array of frames, so after rendering the first frame I will go to the next frame. In addition, how do I keep the video displaying at a constant fps? If you have any learning resources or code pieces, it would be very helpful!
Thanks
Since you haven't mentioned the exact pixel format of your .rgb file, the snippet below assumes 800x600 frames at 2 bytes per pixel; it shows how to read the file frame by frame.
#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *fp = NULL;
    const int size = 800 * 600 * 2;   /* one frame: width * height * bytes per pixel */
    unsigned char *rawData = NULL;

    fp = fopen("raw.rgb", "rb");
    if (fp == NULL)
        return -1;

    rawData = (unsigned char *)malloc(size);
    if (rawData == NULL)
    {
        fclose(fp);
        return -1;
    }

    /* Read one frame per iteration until the file runs out of full frames. */
    while (fread(rawData, 1, size, fp) == (size_t)size)
    {
        /* GOT FRAME: rawData now holds one complete frame. */
    }

    free(rawData);
    fclose(fp);
    fp = NULL;
    return 0;
}
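To keep playback at a constant rate (the second part of the question), a simple pacing loop with std::chrono could look like the sketch below; readNextFrame() and drawFrame() are hypothetical placeholders for your own reading and blitting code, and 30 fps is just an assumed rate:

#include <chrono>
#include <cstdio>
#include <thread>

// Hypothetical helpers assumed to exist elsewhere in the program:
// readNextFrame() pulls one full frame into dst, drawFrame() blits it.
bool readNextFrame(FILE *fp, unsigned char *dst, int size);
void drawFrame(const unsigned char *data);

// Sketch only: present one frame roughly every 1/30 s, independent of how
// long reading and drawing take (as long as they fit in the frame period).
void playAtConstantFps(FILE *fp, unsigned char *rawData, int size)
{
    using clock = std::chrono::steady_clock;
    const auto framePeriod = std::chrono::microseconds(1000000 / 30);
    auto nextDeadline = clock::now() + framePeriod;

    while (readNextFrame(fp, rawData, size))
    {
        drawFrame(rawData);
        std::this_thread::sleep_until(nextDeadline); // wait out the remainder
        nextDeadline += framePeriod;
    }
}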
Doing this without using any third-party library will be a lot of work!
You may use the OpenCV library to do the work. Check http://opencv.org/
I'm working on a project that involves processing PCM audio data through an FFT as it is being played, preferably in sync. I'm using g++ on Linux and currently read and play audio data using OpenAL.
My question is this: is there a better way to process PCM audio data with an FFT live, as the audio is playing, than using threads? If not, which threading library would be best to use for this purpose?
This is my function that loads the wave data into an array of bytes; these can later be cast to ints for processing, and all I use to play the data is OpenAL.
char* loadWAV(const char* fn, int& chan, int& samplerate, int& bps, int& size)
{
    char buffer[4];
    ifstream in(fn, ios::binary);

    in.read(buffer, 4);                 // ChunkID "RIFF"
    if (strncmp(buffer, "RIFF", 4) != 0)
    {
        cerr << "this is not a valid wave file";
        return NULL;
    }
    in.read(buffer, 4);                 // ChunkSize
    in.read(buffer, 4);                 // Format "WAVE"
    in.read(buffer, 4);                 // "fmt "
    in.read(buffer, 4);                 // Subchunk1Size (16 for PCM)
    in.read(buffer, 2);                 // AudioFormat (1 for PCM)
    in.read(buffer, 2);                 // NUMBER OF CHANNELS
    chan = convertToInt(buffer, 2);
    in.read(buffer, 4);                 // SAMPLE RATE
    samplerate = convertToInt(buffer, 4);
    in.read(buffer, 4);                 // ByteRate
    in.read(buffer, 2);                 // BlockAlign
    in.read(buffer, 2);                 // bits per sample
    bps = convertToInt(buffer, 2);
    in.read(buffer, 4);                 // "data"
    in.read(buffer, 4);                 // data chunk size
    size = convertToInt(buffer, 4);

    char* data = new char[size];
    in.read(data, size);
    return data;
}
thank you for any and all help.
Edit: for anyone who might be interested, I wrote the function using this as a reference for how a WAV file is formatted.
Are you hoping to perform the FFT using OpenAL? I don't know if that's possible. Your code will likely be performing the FFT.
You don't need to explicitly set up any threads. However, your audio output library will probably do so on your behalf. I'm not familiar with OpenAL, but the way that a lot of audio libraries operate is by letting you specify a callback that will feed more audio into the output. Thus, your main program will load audio from the audio file, stuff it into a buffer (likely protected by a mutex) for the audio callback to read, compute an FFT over the audio window, and perhaps visualize the data for the user.
Again, the audio library will probably be managing the threading so you don't need to worry about the exact threading implementation or library. But be sure to manage shared data correctly with a mutex.
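A rough sketch of that shared-buffer arrangement with plain std::thread and std::mutex (computeFFT and onAudioQueued are placeholders for your own FFT and playback code, and the window size and polling interval are arbitrary):

#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Sketch only: the playback loop calls onAudioQueued() with the samples it
// has just handed to OpenAL; the analysis thread copies the latest window
// under the lock and runs the FFT on its private copy.
std::mutex g_bufMutex;
std::vector<short> g_latestWindow;   // most recent PCM window sent to output
std::atomic<bool> g_running{true};

void computeFFT(const std::vector<short> &window)
{
    // Placeholder: run the FFT of your choice here and visualize the result.
    (void)window;
}

void onAudioQueued(const short *samples, size_t count) // call from playback loop
{
    std::lock_guard<std::mutex> lock(g_bufMutex);
    g_latestWindow.assign(samples, samples + count);
}

void analysisThread()
{
    while (g_running)
    {
        std::vector<short> local;
        {
            std::lock_guard<std::mutex> lock(g_bufMutex);
            local = g_latestWindow;          // copy, then release the lock
        }
        if (!local.empty())
            computeFFT(local);               // heavy work outside the lock
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}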