I've used the OpenH264 tutorial (https://github.com/cisco/openh264/wiki/UsageExampleForDecoder) to successfully decode an H264 frame, but I can't figure out from the tutorial what the output format is.
I'm using the "unsigned char *pDataResult[3];" (pData in the tutorial), and this gets populated, but I need to know the length in order to convert it to byte buffers to return to Java. I also need to know who owns this data (it seems to be owned by the decoder). This info isn't mentioned in the tutorial or docs as far as I can find.
unsigned char *pDataResult[3];
int iRet = pSvcDecoder->DecodeFrameNoDelay(pBuf, iSize, pDataResult, &sDstBufInfo);
The tutorial also lists an initializer, but gives "..." as the assignment.
//output: [0~2] for Y,U,V buffer for Decoding only
unsigned char *pData[3] = ...;
Is the YUV data null terminated?
There is the SBufferInfo last parameter with TagSysMemBuffer:
typedef struct TagSysMemBuffer {
    int iWidth;     ///< width of decoded pic for display
    int iHeight;    ///< height of decoded pic for display
    int iFormat;    ///< type is "EVideoFormatType"
    int iStride[2]; ///< stride of 2 component
} SSysMEMBuffer;
The length is probably in there, but it's not clear exactly how. Maybe it is "iWidth*iHeight" for each buffer?
pData is freed in the decoder's destructor by WelsFreeDynamicMemory in decoder.cpp, just as you supposed; the decoder owns the output buffers, so don't free them yourself.
The decoder itself assigns the channel pointers, but it's a good habit to initialize pData with nullptrs anyway.
The iSize parameter is the length of the compressed input you pass in, not the output length; the byte-buffer lengths you want follow from the stride and height reported in SBufferInfo.
I'm using FFmpeg's swr_convert to convert AV_SAMPLE_FMT_FLTP audio. I've successfully converted to a different sample format (e.g. AV_SAMPLE_FMT_FLT and AV_SAMPLE_FMT_S16), but I'm running into trouble when I try to keep the AV_SAMPLE_FMT_FLTP sample format but change the sample rate.
When converting AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_FLTP, swr_convert attempts to write to an empty buffer.
I'm using swr_convert to convert from 22050 Hz AV_SAMPLE_FMT_FLTP to 16000 Hz AV_SAMPLE_FMT_FLTP.
I initialized SwrContext like so:
if (swr_alloc_set_opts2(
&resample_context,
&pAVContext->ch_layout, AV_SAMPLE_FMT_FLTP, 16000,
&pAVContext->ch_layout, AV_SAMPLE_FMT_FLTP, 22050, 0, NULL) < 0)
return ERR_SWR_INIT_FAIL;
if(swr_init(resample_context) < 0)
return ERR_SWR_INIT_FAIL;
and when I call it like this, the program tries to write to a null buffer and crashes.
samples_decoded = swr_convert(ctx->pSwrContext,
&pDecodedAudio, numOutSamples,
(const uint8_t**)&pDecodedFrame->data, pDecodedFrame->nb_samples);
So far I've traced the problem to swr_convert_internal:
if(s->int_sample_fmt == s->out_sample_fmt && s->out.planar
&& !(s->out_sample_fmt==AV_SAMPLE_FMT_S32P && (s->dither.output_sample_bits&31))){
//Sample format is planar and input format is same as output format
if(preout==in){
out_count= FFMIN(out_count, in_count);
av_assert0(s->in.planar);
copy(out, in, out_count);
return out_count;
}
else if(preout==postin) preout= midbuf= postin= out;
else if(preout==midbuf) preout= midbuf= out;
else preout= out;
}
That bit of code assigns out to preout, but out's data is uninitialized. Later on, FFmpeg tries to write to the uninitialized block.
I've tested this in 5.1 and in the snapshot build, and it crashes in both.
So, am I doing something wrong, or is this a bug?
I was doing something wrong. Packed audio is a contiguous block of memory that can be referenced by one pointer, but planar audio has a separate pointer for each channel. To fix this, I set up two pointers into my pDecodedAudio block.
uint8_t* convertedData[2] = {
    pDecodedAudio,
    pDecodedAudio + (numOutSamples * ctx->output_sample_size)
};
samples_decoded = swr_convert(ctx->pSwrContext,
convertedData, numOutSamples,
pDecodedFrame->data, pDecodedFrame->nb_samples);
See the comments in AVFrame.
/*
* For planar audio, each channel has a separate data pointer, and
* linesize[0] contains the size of each channel buffer.
* For packed audio, there is just one data pointer, and linesize[0]
* contains the total size of the buffer for all channels.
*
* Note: Both data and extended_data should always be set in a valid frame,
* but for planar audio with more channels that can fit in data,
* extended_data must be used in order to access all channels.
*/
uint8_t **extended_data;
So I have a problem with converting a BYTE buffer to an image (cv::Mat).
I am trying to read a real-time video from a remote camera, and I have two elements: a pointer to the buffer and the buffer size. I need to convert that to a cv::Mat image so that I can show it with cv::imshow. I tried to use:
cv::imdecode(cv::Mat(bufferSize,CV_8UC3,*buffer),cv::IMREAD_COLOR);
but it isn't working and I get this error:
error: (-215:Assertion failed) buf.checkVector(1, CV_8U) > 0 in function 'imdecode_'
When I try to convert directly, without the imdecode function, like this:
cv::Mat(bufferSize,CV_8UC3,*buffer)
I get an image but I can't show it, so the program just continues running without doing anything.
Can anyone please help me with how to convert from a BYTE buffer pointer to a cv::Mat image?
EDIT:
Buffer is declared like this: BYTE *Buffer
The function I get the buffer from is declared like this:
void CALLBACK RealDataCallBackEx(LLONG lRealHandle, DWORD dwDataType, BYTE *pBuffer,DWORD dwBufSize, LONG param, LDWORD dwUser)
where:
lRealHandle : Real-time monitoring handle
dwDataType :
0 : Original data which is consistent with data saved by savedRealData
1 : Frame data.
2 : Yuv data.
3 : Pcm audio data.
pBuffer : Buffer for callback data. Data of different lengths will be called back according to the data type. For every type but type 0 the data are called back frame by frame, one frame per callback.
dwBufSize : Callback data length. The data buffers are different for different types. The unit is BYTE.
In my case I always get data type 0.
This is how I try to decode then:
cv::Mat img = cv::imdecode(cv::Mat(dwBufSize,CV_8UC3,*pBuffer),cv::IMREAD_COLOR);
cv::imshow("img",img);
My program stops here; it continues running but doesn't do anything afterwards. I put a std::cout after this line to check whether it gets past the imshow call, but nothing happens.
Thank you.
I am trying to do some functionality with espeak, but I'm missing some parameters (I don't know them). I'm working in Code::Blocks on Linux.
The following code runs well and reads Arabic text:
#include <string.h>
#include <malloc.h>
#include </usr/local/include/espeak/speak_lib.h>

int main(int argc, char* argv[])
{
    char text[] = {"الله لطيف "};
    espeak_Initialize(AUDIO_OUTPUT_PLAYBACK, 0, NULL, 0);
    espeak_SetVoiceByName("ar");
    unsigned int size = 0;
    while(text[size]!='\0') size++;
    unsigned int flags = espeakCHARS_AUTO | espeakENDPAUSE;
    espeak_Synth(text, size+1, 0, POS_CHARACTER, 0, flags, NULL, NULL);
    espeak_Synchronize();
    return 0;
}
Now, could you help us find these parameters for espeak:
1. A function which returns the generated wave, to store it in a variable
2. Frequency
3. Number of channels
4. Sample size
5. A buffer in which to store samples
6. Number of samples
If you can't find a suitable example, you will have to read the documentation in the header file. I haven't used it, but it looks pretty comprehensible:
http://espeak.sourceforge.net/speak_lib.h
When you called espeak_Initialize you passed in AUDIO_OUTPUT_PLAYBACK. You will need to pass in AUDIO_OUTPUT_RETRIEVAL instead, and then it looks like you must call espeak_SetSynthCallback with a function of your own creation to accept the samples.
Your adapted code would look something like this (UNTESTED):
#include <string.h>
#include <vector>
#include </usr/local/include/espeak/speak_lib.h>
int samplerate; // determined by espeak, will be in Hertz (Hz)
const int buflength = 200; // passed to espeak, in milliseconds (ms)
std::vector<short> sounddata;
int SynthCallback(short *wav, int numsamples, espeak_EVENT *events) {
    if (wav == NULL)
        return 1; // NULL means done.
    /* process your samples here; let's just gather them */
    sounddata.insert(sounddata.end(), wav, wav + numsamples);
    return 0; // 0 continues synthesis, 1 aborts
}

int main(int argc, char* argv[]) {
    char text[] = {"الله لطيف "};
    samplerate = espeak_Initialize(AUDIO_OUTPUT_RETRIEVAL, buflength, NULL, 0);
    espeak_SetSynthCallback(&SynthCallback);
    espeak_SetVoiceByName("ar");
    unsigned int flags = espeakCHARS_AUTO | espeakENDPAUSE;
    size_t size = strlen(text);
    espeak_Synth(text, size + 1, 0, POS_CHARACTER, 0, flags, NULL, NULL);
    espeak_Synchronize();
    /* in theory sounddata holds your samples now... */
    return 0;
}
So for your questions:
Function which return the generated wave to store it in a variable - You write a callback function, and that function gets little buflength-long bits of the wav to process. If you are going to accumulate the data into a larger buffer, I've shown how you could do that yourself.
Frequency - Through this API it doesn't look like you pick it, espeak does. It's in Hz and returned as samplerate above.
Number of Channels - There's no mention of it, and voice synthesis is generally mono, one would think. (Vocals are mixed center by default in most stereo mixes...so you'd take the mono data you got back and play the same synthesized data on left and right channels.)
Sample Size - You get shorts. Those are signed integers, 2 bytes, range of -32,768 to 32,767. Probably it uses the entire range, doesn't seem to be configurable, but you could test and see what you get out.
A Buffer In Which We Store Samples - The synthesis buffer appears to belong to espeak, which handles the allocation and freeing of it. I've shown an example of using a std::vector to gather chunks from multiple calls.
Number of Samples - Each call to your SynthCallback will get a potentially different number of samples. You might get 0 for that number and it might not mean it's at the end.
I have a problem with reading 8bit grayscale bmp. I am able to get info from header and to read the palette, but I can't refer pixel values to the palette entries. Here I have found how to read the pixel data, but not actually how to use it in case of bmp with a palette. I am a beginner. My goal is to read only one row of pixels at a time.
Code:
#include <iostream>
#include <fstream>
using namespace std;
int main(int arc, char** argv)
{ const char* filename="Row_tst.bmp";
remove("test.txt");
ofstream out("test.txt",ios_base::app);//file for monitoring the results
FILE* f = fopen(filename, "rb");
unsigned char info[54];
fread(info, sizeof(unsigned char), 54, f); // read the header
int width = *(int*)&info[18];
int height = *(int*)&info[22];
unsigned char palette[1024]; //read the palette
fread(palette, sizeof(unsigned char), 1024, f);
for(int i=0;i<1024;i++)
{ out<<"\n";
out<<(int)palette[i];
}
int paletteSmall[256]; //1024-byte palette won't be needed in the future
for(int i=0;i<256;i++)
{ paletteSmall[i]=(int)palette[4*i];
out<<paletteSmall[i]<<"\n";
}
int size = width;
//for(int j=0;j<height;j++)
{ unsigned char* data = new unsigned char[size];
fread(data, sizeof(unsigned char), size, f);
for(int i=0;i<width;i++)
{ cout<<"\n"<<i<<"\t"<<paletteSmall[*(int*)&data[i]];
}
delete [] data;
}
fclose(f);
return 0;
}
What I get in test.txt seems fine: first values from 0 0 0 0 to 255 255 255 0 (the palette), then values from 0 to 255 (paletteSmall).
The problem is that I can't map pixel values to the color table entries. My application crashes, with symptoms indicating that it probably tried to use a nonexistent element of an array. If I understand properly, a pixel in a bmp with a color table should contain the index of a color table entry, so I have no idea why it doesn't work. I ask for your help.
You are forcing your 8-bit values to be read as int:
cout<<"\n"<<i<<"\t"<<paletteSmall[*(int*)&data[i]];
The amount of casting indicates you were having problems here and probably resorted to adding one cast after another until "it compiled". As it turns out, compiling without errors is not the same as working without errors.
What happens here is that you force the data pointer to read 4 bytes (or as much as your local int size is, anyway) and so the value will almost always exceed the size of paletteSmall. (In addition, the last couple of values will be invalid under all circumstances, because you read bytes from beyond the valid range of data.)
Because the image data itself is 8-bit, all you need here is
cout<<"\n"<<i<<"\t"<<paletteSmall[data[i]];
No casts necessary; data is an unsigned char * so its values are limited from 0 to 255, and paletteSmall is exactly the correct size.
On Casting
The issue with casting is that your compiler will complain if you tell it flat out to treat a certain type of value as if it is another type altogether. By using a cast, you are telling it "Trust me. I know what I am doing."
This can lead to several problems if you actually do not know :)
For example: a line such as your own
int width = *(int*)&info[18];
appears to work because it returns the proper information, but that is in fact a happy accident.
The array info contains several disconnected unsigned char values, and you tell your compiler that there is an int stored starting at position #18 – it trusts you and reads an integer. It assumes that (1) the number of bytes that you want to combine into an integer is in fact the number of bytes that itself uses for an int (sizeof(int)), and (2) the individual bytes are in the same order as it uses internally (Endianness).
If either of these assumptions is false, you can get surprising results; and almost certainly not what you wanted.
The proper procedure is to scan the BMP file format for how the value for width is stored, and then using that information to get the data you want. In this case, width is "stored in little-endian format" and at offset 18 as 4 bytes. With that, you can use this instead:
int width = info[18]+(info[19]<<8)+(info[20]<<16)+(info[21]<<24);
No assumptions about how large an int is (except that it needs to be at least 4 bytes), and no assumption about byte order (shifting values 'internally' does not depend on endianness).
So why did it work anyway (at least, on your computer)? The most common size for an int in this decade is 4 bytes. The most popular CPU type happens to store multi-byte values in the same order as they are stored inside a BMP. Add that together, and your code works, on most computers, in this decade. A happy accident.
The above may not be true if you want to compile your code on another type of computer (such as an embedded ARM system that uses a different endianness), when your compiler has a smaller int (which by now would mean a very old compiler) or a larger one (just wait another 10 years or so), or if you want to adapt your code to read other types of files (which will have parameters of their own, endianness among them).
I am trying to get sound from simple tapping keyboard. Looks like a little drum machine.
If DirectSound is not a proper way to do this, please suggest something else.
In my code I don't know what's wrong. Here it is without error checking and with translations:
//Declaring the IDirectSound object
IDirectSound* device;
DirectSoundCreate(NULL, &device, NULL);
device->SetCooperativeLevel(hWnd, DSSCL_NORMAL );
/* Declaring secondary buffers */
IDirectSoundBuffer* kickbuf;
IDirectSoundBuffer* snarebuf;
/* Declaring .wav file pointers
and two structures for reading the information in the beginning of the .wav file */
FILE* fkick;
FILE* fsnare;
sWaveHeader kickHdr;
sWaveHeader snareHdr;
The structure sWaveHeader is declared this way:
typedef struct sWaveHeader
{
char RiffSig[4]; // 'RIFF'
unsigned long WaveformChunkSize; // 8
char WaveSig[4]; // 'WAVE'
char FormatSig[4]; // 'fmt '
unsigned long FormatChunkSize; // 16
unsigned short FormatTag; // WAVE_FORMAT_PCM
unsigned short Channels; // Channels
unsigned long SampleRate;
unsigned long BytesPerSec;
unsigned short BlockAlign;
unsigned short BitsPerSample;
char DataSig[4]; // 'data'
unsigned long DataSize;
} sWaveHeader;
The .wav file opening
#define KICK "D:/muzic/kick.wav"
#define SNARE "D:/muzic/snare.wav"
fkick = fopen(KICK, "rb");
fsnare = fopen(SNARE, "rb");
Here I make a function that does the common work for snarebuf and kickbuf:
int read_wav_to_WaveHeader(sWaveHeader*, FILE*, IDirectSoundBuffer*); // The declaration
But I will not write out this function here; I'll just show how it works with kickbuf, for instance.
fseek(fkick, 0, SEEK_SET); // Zero the position in file
fread(&kickHdr, 1, sizeof(sWaveHeader), fkick); // reading the sWaveHeader structure from file
Here is a check that the sWaveHeader structure is valid:
if(memcmp(pwvHdr.RiffSig, "RIFF", 4) ||
memcmp(pwvHdr.WaveSig, "WAVE", 4) ||
memcmp(pwvHdr.FormatSig, "fmt ", 4) ||
memcmp(pwvHdr.DataSig, "data", 4))
return 1;
Declaring the format and descriptor for a buffer and filling them:
DSBUFFERDESC bufDesc;
WAVEFORMATEX wvFormat;
ZeroMemory(&wvFormat, sizeof(WAVEFORMATEX));
wvFormat.wFormatTag = WAVE_FORMAT_PCM;
wvFormat.nChannels = kickHdr.Channels;
wvFormat.nSamplesPerSec = kickHdr.SampleRate;
wvFormat.wBitsPerSample = kickHdr.BitsPerSample;
wvFormat.nBlockAlign = wvFormat.wBitsPerSample / 8 * wvFormat.nChannels;
ZeroMemory(&bufDesc, sizeof(DSBUFFERDESC));
bufDesc.dwSize = sizeof(DSBUFFERDESC);
bufDesc.dwFlags = DSBCAPS_CTRLVOLUME |
DSBCAPS_CTRLPAN |
DSBCAPS_CTRLFREQUENCY;
bufDesc.dwBufferBytes = kickHdr.DataSize;
bufDesc.lpwfxFormat = &wvFormat;
Well, the creating of a buffer:
device->CreateSoundBuffer(&bufDesc, &kickbuf, NULL); // Any mistakes by this point?
Now locking the buffer and loading some data into it.
This data starts sizeof(sWaveHeader) bytes into the WAVE file, am I wrong?
LPVOID Ptr1; // pointer to the first block of locked data
LPVOID Ptr2; // pointer to the second block of locked data
DWORD Size1, Size2; // their sizes
Now calling the Lock() method:
kickbuf->Lock((DWORD)LockPos, (DWORD)Size,
&Ptr1, &Size1,
&Ptr2, &Size2, 0);
Loading the data (is this OK?):
fseek(fkick, sizeof(sWaveHeader), SEEK_SET);
fread(Ptr1, 1, Size1, fkick);
if(Ptr2 != NULL)
fread(Ptr2, 1, Size2, fkick);
Unlocking the buffer:
kickbuf->Unlock(Ptr1, Size1, Ptr2, Size2);
Setting the volume:
kickbuf->SetVolume(-2500);
Then I run a while(1) loop:
1. ask for a key pressing
2. if it is pressed:
kickbuf->SetCurrentPosition(0)
kickbuf->Play(0,0,0);
But there's no sound playing. Please say what is not proper in my code, or maybe in the whole concept. Thank you.
When you initialize the WAVEFORMATEX, you are forgetting to set the nAvgBytesPerSec member. Add this line after the initialization of wvFormat.nBlockAlign:
wvFormat.nAvgBytesPerSec = wvFormat.nSamplesPerSec * wvFormat.nBlockAlign;
Also, I suspect this could be a problem:
kickbuf->SetVolume(-2500);
I suspect that will just attenuate your sample to absolute silence. Try taking that call out so that it plays at full volume.
But more likely: none of your sample code above shows validation of the return values from any of the DirectSound APIs, nor of any of the file I/O calls. Have you validated that the HRESULTs returned by all the DSound APIs are S_OK? Have you tried printing, or using OutputDebugString to print, the values you computed for the members of WAVEFORMATEX?
Have you debugged the fread calls to validate that you are getting valid data into your buffers?
Hope this helps.