I am using the FFmpeg library to decode and (potentially) modify some audio.
I managed to use the following functions to iterate through all frames of the audio file:
avformat_open_input // Obtains formatContext
avformat_find_stream_info
av_find_best_stream // The argument AVMEDIA_TYPE_AUDIO is fed in to find the audio stream
avcodec_open2 // Obtains codecContext
av_init_packet
// The following is used to loop through the frames
av_read_frame
avcodec_decode_audio4
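Put together, the iteration looks roughly like this (a condensed sketch, not my exact code; audioStreamIndex stands for the index returned by av_find_best_stream, and error handling is omitted):
AVPacket packet;
av_init_packet(&packet);
AVFrame* frame = av_frame_alloc();
while (av_read_frame(formatContext, &packet) >= 0)
{
    if (packet.stream_index == audioStreamIndex)
    {
        int gotFrame = 0;
        // Returns the number of bytes consumed from the packet
        int dataSize = avcodec_decode_audio4(codecContext, frame, &gotFrame, &packet);
        if (gotFrame)
        {
            // frame now holds decoded samples
        }
    }
    av_free_packet(&packet);
}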
In the end, I have these three values available on each iteration:
int dataSize; // return value of avcodec_decode_audio4
AVFrame* frame;
AVCodecContext* codecContext; // Codec context of the best stream
I supposed that a loop like this could be used to iterate over all samples:
for (int i = 0; i < frame->nb_samples; ++i)
{
// Bytes/Sample is known to be 4
// Extracts audio from Channel 1. There are in total 2 channels.
int* sample = (int*)frame->data[0] + dataSize * i;
// Now *sample is accessible
}
However, when I plotted the data using gnuplot, I did not get a waveform as expected, and some of the values reached the limit of 32-bit integers. (The audio stream is not silent in the first few seconds.)
I suppose that some form of quantisation is going on to prevent the data from being interpreted mathematically. What should I do to de-quantise this?
for (int i = 0; i < frame->nb_samples; ++i)
{
// Bytes/Sample is known to be 4
// Extracts audio from Channel 1. There are in total 2 channels.
int* sample = (int*)frame->data[0] + dataSize * i;
// Now *sample is accessible
}
Well... No. So, first of all, we'll need to know the data type. Check frame->format. It's an enum AVSampleFormat, most likely flt, fltp, s16 or s16p.
So, how do you interpret frame->data[] given the format? Well, first, is it planar or not? If it's planar, it means each channel is in frame->data[n], where n is the channel number. frame->channels is the number of channels. If it's not planar, it means all data is interleaved (per sample) in frame->data[0].
Second, what is the storage type? If it's s16/s16p, it's int16_t *. If it's flt/fltp, it's float *. So the correct interpretation for fltp would be:
for (int c = 0; c < frame->channels; c++) {
float *samples = (float *)frame->data[c];
for (int i = 0; i < frame->nb_samples; i++) {
float sample = samples[i];
// now this sample is accessible, it's in the range [-1.0, 1.0]
}
}
Whereas for s16, it would be:
int16_t *samples = (int16_t *)frame->data[0];
for (int c = 0; c < frame->channels; c++) {
for (int i = 0; i < frame->nb_samples; i++) {
int sample = samples[i * frame->channels + c];
// now this sample is accessible, it's in the range [-32768,32767]
}
}
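If you want to handle any of these formats generically, you can branch on planarity and bytes per sample; here is a sketch using the libavutil helpers av_sample_fmt_is_planar() and av_get_bytes_per_sample() (you still have to reinterpret the pointer according to the storage type, as above):
int planar = av_sample_fmt_is_planar((enum AVSampleFormat)frame->format);
int bps = av_get_bytes_per_sample((enum AVSampleFormat)frame->format);
for (int c = 0; c < frame->channels; c++) {
    for (int i = 0; i < frame->nb_samples; i++) {
        // Planar: one channel per data[] plane; packed: interleaved in data[0]
        uint8_t *p = planar
            ? frame->data[c] + i * bps
            : frame->data[0] + (i * frame->channels + c) * bps;
        // Reinterpret p as int16_t*, float*, ... depending on frame->format
    }
}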
I am implementing an audio channel mixer using Viktor T. Toth's algorithm, trying to mix two audio channel streams.
In the code, quantization_ is the bit depth of a channel expressed in bytes. My mix function takes pointers to destination and source uint8_t buffers, mixes the two channels, and writes the result into the destination buffer. Because I receive the data in uint8_t buffers, I reassemble the actual 8-, 16- or 24-bit samples, perform the addition, division, and multiplication, and then convert the result back into bytes.
Generally, it gives the expected output sample values. However, when I look at the output in Audacity, some samples turn out to have near-zero values where they shouldn't. In the screenshot, the bottom two signals are the two mono channels and the top one is the mixed channel; some very low values can be seen, especially in the middle.
Below is my mix function:
void audio_mixer::mix(uint8_t* dest, const uint8_t* source)
{
    uint64_t mixed_sample = 0;
    uint64_t dest_sample = 0;
    uint64_t source_sample = 0;
    uint64_t factor = 0;
    for (int i = 0; i < channel_size_; ++i)
    {
        dest_sample = 0;
        source_sample = 0;
        factor = 1;
        // Reassemble one little-endian sample from quantization_ bytes
        for (int j = 0; j < quantization_; ++j)
        {
            dest_sample += factor * static_cast<uint64_t>(*dest++);
            source_sample += factor * static_cast<uint64_t>(*source++);
            factor = factor * 256;
        }
        // Viktor's formula, Z = A + B - AB/2^n, applied to unsigned values
        mixed_sample = (dest_sample + source_sample) - (dest_sample * source_sample / factor);
        // Step back and overwrite the destination bytes with the mixed sample
        dest -= quantization_;
        for (int k = 0; k < quantization_; ++k)
        {
            *dest++ = static_cast<uint8_t>(mixed_sample % 256);
            mixed_sample = mixed_sample / 256;
        }
    }
}
It seems like you aren't treating the signed audio samples correctly. The horizontal line should be the zero-voltage level of your audio signal.
If you look at the positive voltage audio samples, they obey your equation correctly (except for the peak values in the center). The negative values are being compressed, which makes me feel like they are being treated as small positive voltages instead of negative voltages.
In other words, maybe those unsigned ints should be signed ints so the top bit indicates the voltage polarity and you can have audio samples in the range +127 to -128.
Those peak values in the center seem like they are wrapping around at 255, the peak value for an unsigned byte representation of your audio. I'm not sure how this would happen, but it seems related to the unsigned vs. signed issue.
Maybe you should try the other formula Viktor provided in his document:
Z = 2(A+B) - (AB/128) - 256
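For reference, that is the "loud" branch of the two-case version in his document. A minimal sketch of the full branching formula, assuming 8-bit unsigned samples (mix_u8 is an illustrative helper, not your mix()):
#include <cstdint>

// Viktor's two-case formula for 8-bit unsigned samples a, b in [0,255]
uint8_t mix_u8(uint8_t a, uint8_t b)
{
    int z;
    if (a < 128 && b < 128)
        z = a * b / 128;                     // quiet case
    else
        z = 2 * (a + b) - a * b / 128 - 256; // Z = 2(A+B) - (AB/128) - 256
    if (z > 255) z = 255;                    // clamp (a = b = 255 gives 256)
    return static_cast<uint8_t>(z);
}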
During the encoding process with the x265 encoder (https://x265.readthedocs.org/en/default/api.html), I want to write image pixel values (specifically the values of the Y channel) into a .txt file after each new image is encoded (why is not important). For that, I'm using the 'planes' member of the x265_picture struct:
x265_picture* pic_out; // variable where the encoded image is stored
... // encoding process
uint8_t *plane = (uint8_t*)pic_out->planes[0];
uint32_t pixelCount = x265_picturePlaneSize(pic_out->colorSpace, m_param->sourceWidth, m_param->sourceHeight, 0);
ofstream out_file("out_file.txt");
for (uint32_t j = 0; j < pixelCount; j++) // loop over all pixels
{
    int pix_val = plane[j];
    out_file << pix_val;
}
out_file.close();
But when I reconstruct the output data into an image, I get stripes across the image instead of the expected picture (color is not important, the "stripes" are the concern).
In the output file there seem to be intervals of data in (apparently) correct order (say 89,90,102,98,...), always followed by a long sequence of equal numbers (e.g. 235,235,235,235... or 65,65,65,65...) that "create" the stripes. Could someone please tell me what I'm missing?
Thanks guys, just solved this... the key is using 'src += srcStride'. Each row of the plane is padded out to the stride, so reading pixelCount bytes sequentially also picks up the padding bytes at the end of every row; those constant values are what created the stripes:
ofstream out_file("out_file.txt");
int srcStride = pic_out->stride[0] / sizeof(pixel);
uint8_t* src = (uint8_t*) pic_out->planes[0];
for (int y = 0; y < m_param->sourceHeight; y++, src += srcStride)
{
for (int x = 0; x < m_param->sourceWidth; x++)
out_file << (int)(src[x]) << ",";
}
out_file.close();
I'm trying to re-sample captured 2-channel/48 kHz/32-bit audio to 1-channel/8 kHz/32-bit using libsamplerate in a Windows Phone project using WASAPI.
I need to get 160 frames from 960 original frames by re-sampling. After capturing audio using the GetBuffer method, I send the captured BYTE array of 7680 bytes to the method below:
void BackEndAudio::ChangeSampleRate(BYTE* buf)
{
int er2;
st=src_new(2,1,&er2);
//SRC_DATA sd defined before
sd=new SRC_DATA;
BYTE *onechbuf = new BYTE[3840];
int outputIndex = 0;
//convert Stereo to Mono
for (int n = 0; n < 7680; n+=8)
{
onechbuf[outputIndex++] = buf[n];
onechbuf[outputIndex++] = buf[n+1];
onechbuf[outputIndex++] = buf[n+2];
onechbuf[outputIndex++] = buf[n+3];
}
float *res1=new float[960];
res1=(float *)onechbuf;
float *res2=new float[160];
//change samplerate
sd->data_in=res1;
sd->data_out=res2;
sd->input_frames=960;
sd->output_frames=160;
sd->src_ratio=(double)1/6;
sd->end_of_input=1;
int er=src_process(st,sd);
transportController->WriteAudio((BYTE *)res2,640);
delete[] onechbuf;
src_delete(st);
delete sd;
}
The src_process method returns no error, sd->input_frames_used is set to 960 and sd->output_frames_gen is set to 159, but the rendered output is only noise.
I use the code in a real-time VoIP app.
What could be the source of the problem?
I found the problem. I shouldn't create a new SRC_STATE object and delete it on every call of my function by calling st=src_new(2,1,&er2); and src_delete(st); calling them once for the whole audio re-sampling is enough. Also, there is no need to use a pointer for the SRC_DATA. I modified my code as below and it works fine now.
void BackEndAudio::ChangeSampleRate(BYTE* buf)
{
BYTE *onechbuf = new BYTE[3840];
int outputIndex = 0;
//convert Stereo to Mono
for (int n = 0; n < 7680; n+=8)
{
onechbuf[outputIndex++] = buf[n];
onechbuf[outputIndex++] = buf[n+1];
onechbuf[outputIndex++] = buf[n+2];
onechbuf[outputIndex++] = buf[n+3];
}
float *out=new float[160];
//change samplerate
sd.data_in=(float *)onechbuf;
sd.data_out=out;
sd.input_frames=960;
sd.output_frames=160;
sd.src_ratio=(double)1/6;
sd.end_of_input=0;
    int er=src_process(st,&sd);
    // (BYTE *)out is then handed to transportController->WriteAudio, as before
    delete[] onechbuf; // free the intermediate buffer on every call
}
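In other words, the SRC_STATE should live for the whole session. A minimal sketch of that one-time setup/teardown (the init/shutdown method names are illustrative, not my actual class):
#include <samplerate.h>

class BackEndAudio
{
    SRC_STATE* st = nullptr; // persists across ChangeSampleRate calls
    SRC_DATA sd = {};        // plain member instead of a heap-allocated pointer

public:
    bool initResampler()
    {
        int er2 = 0;
        st = src_new(2, 1, &er2); // same converter type and channel count, created once
        return st != nullptr;
    }

    void shutdownResampler()
    {
        if (st) { src_delete(st); st = nullptr; }
    }
};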
I'm currently trying to take in sound and feed it back to the speakers. I'm using the openframeworks library that makes this fairly simple.
I'm using this class
http://www.openframeworks.cc/documentation?detail=ofSoundStream
The setup function is
ofSoundStreamSetup(int nOutputs, int nInputs, ofSimpleApp * OFSA, int sampleRate, int bufferSize, int nBuffers)
and I am using
ofSoundStreamSetup(1, 1, this, 44100, 512, 4)
My header info is
float buffer1[1000000];
float buffer2[1000000];
float* readPointer;
float* writePointer;
int readp;
int writep;
I've got two functions:
void audioReceived(float * input, int bufferSize, int nChannels)
{
    if (writep < 10)
    {
        for (int i = 0; i < bufferSize; i++)
        {
            writePointer[writep*i] = input[i];
        }
        writep++;
        if (writep >= 10)
        {
            writep = 0;
        }
    }
}
void audioRequested(float * output, int bufferSize, int numChannels)
{
    if (writep > 0)
    {
        for (int i = 0; i < bufferSize; i++)
        {
            output[i] = readPointer[readp * i];
        }
        readp++;
        if (readp >= 10)
        {
            readp = 0;
        }
    }
}
This works, but the quality seems poppy and crackly. I think I may have to implement a proper circular buffer, or double buffering, but I'm not sure.
Can anyone point me in the correct direction for how I can get the audio to sound good, using as simple a method as possible?
I would definitely suggest using double buffering. Otherwise a buffer becomes available at the same moment you want to fill one, which can result in you editing a buffer that is currently in use.
In general when audio is received you add it to buffer 1. When audio is requested you give it buffer 2. Now when audio is received put it in buffer 2 and when the request arrives give it buffer 1. And so on.
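A minimal ping-pong sketch of that scheme (the names are illustrative, and a real implementation must synchronize the two callbacks, which run on the audio thread):
#include <utility> // std::swap

float bufferA[512]; // matches the bufferSize passed to ofSoundStreamSetup
float bufferB[512];
float* captureBuf = bufferA;  // audioReceived fills this one
float* playbackBuf = bufferB; // audioRequested reads this one

void audioReceived(float* input, int bufferSize, int nChannels)
{
    for (int i = 0; i < bufferSize; i++)
        captureBuf[i] = input[i];
    std::swap(captureBuf, playbackBuf); // hand the freshly filled buffer to playback
}

void audioRequested(float* output, int bufferSize, int nChannels)
{
    for (int i = 0; i < bufferSize; i++)
        output[i] = playbackBuf[i];
}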
I am trying to take the imageData of an image in the code below, where w = width of the image and h = height of the image:
for (int i = x; i < x+h; i++) //height of frame pixels
{
for (int j = y; j < y+w; j++)//width of frame pixels
{
int pos = i * w * Channels + j; //channels is 3 as rgb
// if any data exists
if (data->imageData[pos]>0) //Taking data (here is the problem how to take)
{
xPos += j;
yPos += i;
nPix++;
}
}
}
jeff7 gives you a link to a very old version of OpenCV. OpenCV 2.0 has a new C++ wrapper that is much better than the C++ wrapper mentioned in the link. I recommend that you read the C++ reference of OpenCV for information on how to access individual pixels.
Another thing to note: the outer loop should be the loop in the y-direction (vertical) and the inner loop the one in the x-direction, because OpenCV is written in C/C++ and stores the values in row-major order.
See the linked explanation of the multiple methods for accessing pixels in an IplImage in OpenCV.
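For example, with the 2.0 C++ API the same per-pixel access looks like this (a sketch; "frame.png" is a placeholder):
#include <opencv2/opencv.hpp>

cv::Mat img = cv::imread("frame.png"); // loads as 8-bit BGR by default
for (int y = 0; y < img.rows; y++)     // outer loop: rows
{
    for (int x = 0; x < img.cols; x++) // inner loop: columns
    {
        cv::Vec3b px = img.at<cv::Vec3b>(y, x);
        // px[0] = blue, px[1] = green, px[2] = red
    }
}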
From the code you've posted, your problem lies in your position variable; you'd want something like int pos = i*w*Channels + j*Channels. Then you can access the RGB pixels at:
unsigned char r = data->imageData[pos];
unsigned char g = data->imageData[pos+1];
unsigned char b = data->imageData[pos+2];
(assuming RGB, but on some platforms I think it can be stored BGR).
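Applied to the loop from the question, that gives something like this (a sketch; it assumes the rows are not padded, otherwise use widthStep as in the next snippet):
for (int i = 0; i < h; i++)     // rows (y-direction)
{
    for (int j = 0; j < w; j++) // columns (x-direction)
    {
        int pos = i * w * Channels + j * Channels;
        unsigned char r = data->imageData[pos];
        unsigned char g = data->imageData[pos + 1];
        unsigned char b = data->imageData[pos + 2];
        if (r > 0) { xPos += j; yPos += i; nPix++; } // same accumulation as before
    }
}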
uchar* colorImgPtr;
for(int i=0; i<colorImg->width; i++){
    for(int j=0; j<colorImg->height; j++){
        colorImgPtr = (uchar *)(colorImg->imageData) + (j*colorImg->widthStep + i*colorImg->nChannels);
        for(int channel = 0; channel < colorImg->nChannels; channel++){
            //colorImgPtr[channel] holds the value of this pixel for this channel
        }
    }
}
There are quite a few methods to do this (the link provided by jeff7 is very useful).
My preferred method to access image data is the cvPtr2D method. You'll want something like:
for(int x = 0; x < width; ++x)
{
for(int y = 0; y < height; ++y)
{
uchar* ptr = cvPtr2D(img, y, x, NULL);
// blue channel can now be accessed with ptr[0]
// green channel can now be accessed with ptr[1]
// red channel can now be accessed with ptr[2]
}
}
(img is an IplImage* in the above code)
Not sure if this is the most efficient way of doing it, but I find it the easiest and simplest.
You can find documentation for this method in the OpenCV C API reference.