Resample audio using libsamplerate in windows phone - c++

I'm trying to re-sample captured 2-channel/48 kHz/32-bit audio to 1-channel/8 kHz/32-bit using libsamplerate in a Windows Phone project that uses WASAPI.
I need to get 160 frames from the 960 original frames by re-sampling. After capturing audio with the GetBuffer method, I send the captured BYTE array of 7680 bytes to the method below:
void BackEndAudio::ChangeSampleRate(BYTE* buf)
{
int er2;
st=src_new(2,1,&er2);
//SRC_DATA sd defined before
sd=new SRC_DATA;
BYTE *onechbuf = new BYTE[3840];
int outputIndex = 0;
//convert Stereo to Mono
for (int n = 0; n < 7680; n+=8)
{
onechbuf[outputIndex++] = buf[n];
onechbuf[outputIndex++] = buf[n+1];
onechbuf[outputIndex++] = buf[n+2];
onechbuf[outputIndex++] = buf[n+3];
}
float *res1=new float[960];
res1=(float *)onechbuf;
float *res2=new float[160];
//change samplerate
sd->data_in=res1;
sd->data_out=res2;
sd->input_frames=960;
sd->output_frames=160;
sd->src_ratio=(double)1/6;
sd->end_of_input=1;
int er=src_process(st,sd);
transportController->WriteAudio((BYTE *)res2,640);
delete[] onechbuf;
src_delete(st);
delete sd;
}
src_process returns no error, sd->input_frames_used is set to 960, and sd->output_frames_gen is set to 159, but the rendered output is only noise.
I use this code in a real-time VoIP app.
What could be the source of the problem?

I found the problem. I shouldn't create a new SRC_STATE object and delete it on every call of my function via st=src_new(2,1,&er2); and src_delete(st); calling them once is enough for the whole audio re-sampling session. There is also no need to use a pointer for the SRC_DATA. I modified my code as below and it works fine now.
void BackEndAudio::ChangeSampleRate(BYTE* buf)
{
BYTE *onechbuf = new BYTE[3840];
int outputIndex = 0;
//convert Stereo to Mono
for (int n = 0; n < 7680; n+=8)
{
onechbuf[outputIndex++] = buf[n];
onechbuf[outputIndex++] = buf[n+1];
onechbuf[outputIndex++] = buf[n+2];
onechbuf[outputIndex++] = buf[n+3];
}
float *out=new float[160];
//change samplerate
sd.data_in=(float *)onechbuf;
sd.data_out=out;
sd.input_frames=960;
sd.output_frames=160;
sd.src_ratio=(double)1/6;
sd.end_of_input=0;
int er=src_process(st,&sd);
}
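For completeness, here is a minimal sketch of that one-time setup/teardown pattern (the class and member names are illustrative, not the actual project code): the SRC_STATE is created once, reused for every captured buffer, and destroyed only when capturing stops, so the converter keeps its filter history between buffers.
#include <samplerate.h>
class MonoResampler
{
public:
    MonoResampler()
    {
        int error = 0;
        // Create the converter state once; it keeps internal filter history
        // so consecutive buffers join without clicks.
        st = src_new(SRC_SINC_FASTEST, 1 /*channels*/, &error);
    }
    ~MonoResampler()
    {
        if (st)
            src_delete(st); // release the state once, at shutdown
    }
    // Resamples 960 mono float frames at 48 kHz into up to 160 frames at 8 kHz.
    long Process(float *in, float *out)
    {
        SRC_DATA sd = {};
        sd.data_in = in;
        sd.data_out = out;
        sd.input_frames = 960;
        sd.output_frames = 160;
        sd.src_ratio = 8000.0 / 48000.0; // output rate / input rate
        sd.end_of_input = 0;             // more buffers will follow
        src_process(st, &sd);
        return sd.output_frames_gen;
    }
private:
    SRC_STATE *st;
};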

Related

VP8 C/C++ source, how to encode frames in ARGB format to frame instead of from file

I'm trying to get started with the VP8 library. I'm not building it in the standard way they tell you to; I just loaded all of the main files and the "encoder" folder into a new Visual Studio C++ DLL project and included the C files in an extern "C" dll export function, which so far builds fine. I just have no idea where to start with the C++ API to encode, say, 3 frames of ARGB data into a very basic video, just to get started.
The only example I could find is in the examples folder, called simple_encoder.c, but its premise is that it loads another file, parses its frames and then converts them, so it seems a bit complicated. I just want to be able to pass in a byte array of a few ARGB frames and have it output a very simple VP8 video.
I've seen How to encode series of images into VP8 using WebM VP8 Encoder API? (C/C++) but the accepted answer just links to the build instructions and references the general specification of the VP8 format. The closest thing I could find there is the example encoding parameters, but I want to do everything from C++ and I can't seem to find any other examples besides the default simple_encoder.c.
Just to cite some of the relevant parts I think I understand, but still need more help on:
//in int main...
...
vpx_image_t raw;
if (!vpx_img_alloc(&raw, VPX_IMG_FMT_I420, info.frame_width,
info.frame_height, 1)) {
//"Failed to allocate image." error
}
So that part I think I understand for the most part. VPX_IMG_FMT_I420 is the only piece that's not defined in this file itself; it's in vpx_image.h, first as
#define VPX_IMG_FMT_PLANAR
//then after...
typedef enum vpx_img_fmt {
VPX_IMG_FMT_NONE,
VPX_IMG_FMT_RGB24, /**< 24 bit per pixel packed RGB */
///some other formats....
VPX_IMG_FMT_ARGB, /**< 32 bit packed ARGB, alpha=255 */
VPX_IMG_FMT_YV12 = VPX_IMG_FMT_PLANAR | VPX_IMG_FMT_UV_FLIP | 1, /**< planar YVU */
VPX_IMG_FMT_I420 = VPX_IMG_FMT_PLANAR | 2,
} vpx_img_fmt_t; /**< alias for enum vpx_img_fmt */
So I guess part of my question is answered already just from writing this: one of the formats is VPX_IMG_FMT_ARGB, although I don't know where it's defined. I'm guessing that in the above code I would replace it with
const VpxInterface *encoder = get_vpx_encoder_by_name("v8");
vpx_image_t raw;
VpxVideoInfo info = { 0, 0, 0, { 0, 0 } };
info.frame_width = 1920;
info.frame_height = 1080;
info.codec_fourcc = encoder->fourcc;
info.time_base.numerator = 1;
info.time_base.denominator = 24;
bool didIt = vpx_img_alloc(&raw, VPX_IMG_FMT_ARGB,
info.frame_width, info.frame_height/*example width and height*/, 1)
//check didIt..
vpx_codec_enc_cfg_t cfg;
vpx_codec_ctx_t codec;
vpx_codec_err_t res;
res = vpx_codec_enc_config_default(encoder->codec_interface(), &cfg, 0);
//check if !res for error
cfg.g_w = info.frame_width;
cfg.g_h = info.frame_height;
cfg.g_timebase.num = info.time_base.numerator;
cfg.g_timebase.den = info.time_base.denominator;
cfg.rc_target_bitrate = 200;
VpxVideoWriter *writer = NULL;
writer = vpx_video_writer_open(outfile_arg, kContainerIVF, &info);
//check if !writer for error
bool startIt = vpx_codec_enc_init(&codec, encoder->codec_interface(), &cfg, 0);
//not even sure where codec was set actually..
//check !startIt for error starting
//now the next part in the original is where it reads from the input file, but instead
//I need to pass in an array of some ARGB byte arrays..
//thing is, in the next step they use a while loop for
//vpx_img_read(&raw, fopen("path/to/YV12formatVideo", "rb"))
//to set the contents of the raw vpx image allocated earlier, then
//they call another program that writes it to the writer object,
//but I don't know how to read the actual ARGB data directly into the raw image
//without using fopen, so that's one question (review at end)
//so I'll just put a placeholder here for the **question**
//assuming I have an array of byte arrays stored individually
//for simplicity sake
int size = 1920 * 1080 * 4;
uint8_t imgOne[size] = {/*some big byte array*/};
uint8_t imgTwo[size] = {/*some big byte array*/};
uint8_t imgThree[size] = {/*some big byte array*/};
uint8_t *images[] = {imgOne, imgTwo, imgThree};
int framesDone = 0;
int maxFrames = 3;
//so now I can replace the while loop with a filler function
//until I find out how to set the raw image with ARGB data
while(framesDone < maxFrames) {
magicalFunctionToSetARGBOfRawImage(&raw, images[framesDone]);
encode_frame(&codec, &raw, framesDone, 0, writer);
framesDone++;
}
//now apparently it needs to be flushed after
while(encode_frame(&codec, 0, -1, 0, writer)){}
vpx_img_free(&raw);
bool isDestroyed = vpx_codec_destroy(&codec);
//check if !isDestroyed for error
//now we gotta define the encode_Frames function, but simpler
//(and make it above other function for reference purposes
//or in header
static int encode_frame(
    vpx_codec_ctx_t *coydek,
    vpx_image_t *pic,
    int currentFrame,
    int flags,
    VpxVideoWriter *koysayv/*writer*/
) {
    //now to substitute their encodeFrame function for
    //the actual raw calls to simplify things
    const vpx_codec_err_t DidIt = vpx_codec_encode(
        coydek,
        pic,
        currentFrame,
        1,              //duration I think
        flags,          //whatever that is
        VPX_DL_REALTIME //different than simple_encoder
    );
    if (DidIt != VPX_CODEC_OK) return 0; //error here
    vpx_codec_iter_t iter = 0;
    const vpx_codec_cx_pkt_t *pkt = 0;
    int gotThings = 0;
    while ((pkt = vpx_codec_get_cx_data(coydek, &iter)) != 0) {
        gotThings = 1;
        if (pkt->kind == VPX_CODEC_CX_FRAME_PKT) { //don't exactly understand this part
            const int keyframe = (pkt->data.frame.flags & VPX_FRAME_IS_KEY) != 0;
            //don't exactly understand the & operator here or how it gets the keyframe
            bool wroteFrame = vpx_video_writer_write_frame(
                koysayv,
                pkt->data.frame.buf, //I'm guessing this is the encoded frame data
                pkt->data.frame.sz,
                pkt->data.frame.pts
            );
            if (!wroteFrame) return 0; //error
        }
    }
    return gotThings;
}
Thing is though, I don't know how to actually read the ARGB data into the raw image buffer itself. As mentioned above, in the original example they use vpx_img_read(&raw, fopen("path/to/file", "rb")), but if I'm starting off with the byte arrays themselves then what function do I use for that instead of the file?
I have a feeling it can be solved using the source code of the vpx_img_read function found in tools_common.c:
int vpx_img_read(vpx_image_t *img, FILE *file) {
int plane;
for (plane = 0; plane < 3; ++plane) {
unsigned char *buf = img->planes[plane];
const int stride = img->stride[plane];
const int w = vpx_img_plane_width(img, plane) *
((img->fmt & VPX_IMG_FMT_HIGHBITDEPTH) ? 2 : 1);
const int h = vpx_img_plane_height(img, plane);
int y;
for (y = 0; y < h; ++y) {
if (fread(buf, 1, w, file) != (size_t)w) return 0;
buf += stride;
}
}
return 1;
}
Although I personally am not experienced enough to know how to get a single frame's ARGB data in, I think the key part is fread(buf, 1, w, file), which reads part of the file into buf; since buf points at img->planes[plane], reading into buf automatically fills img->planes[plane]. I'm not sure if that is the case, though, and I'm also not sure how to replace the fread from a file so it just takes a byte array that is already loaded into memory...
VPX_IMG_FMT_ARGB is not defined because it is not supported by libvpx (as far as I have seen). To compress an image using this library, you must first convert it to one of the supported formats, like I420 (VPX_IMG_FMT_I420). The code here (not mine): https://gist.github.com/racerxdl/8164330 does it well for the RGB format. If you don't want to use libswscale to do the conversion from RGB to I420, you can do something like this (this code converts an RGBA array of bytes to an I420 vpx_image that can be used by libvpx):
unsigned int tx = <width of your image>
unsigned int ty = <height of your image>
unsigned char *image = <array of bytes : RGBARGBA... of size ty*tx*4>
vpx_image_t *imageVpx = <result that must have been properly initialized by libvpx>
imageVpx->stride[VPX_PLANE_U ] = tx/2;
imageVpx->stride[VPX_PLANE_V ] = tx/2;
imageVpx->stride[VPX_PLANE_Y ] = tx;
imageVpx->stride[VPX_PLANE_ALPHA] = tx;
imageVpx->planes[VPX_PLANE_U ] = new unsigned char[ty*tx/4];
imageVpx->planes[VPX_PLANE_V ] = new unsigned char[ty*tx/4];
imageVpx->planes[VPX_PLANE_Y ] = new unsigned char[ty*tx ];
imageVpx->planes[VPX_PLANE_ALPHA] = new unsigned char[ty*tx ];
unsigned char *planeY = imageVpx->planes[VPX_PLANE_Y ];
unsigned char *planeU = imageVpx->planes[VPX_PLANE_U ];
unsigned char *planeV = imageVpx->planes[VPX_PLANE_V ];
unsigned char *planeA = imageVpx->planes[VPX_PLANE_ALPHA];
for (unsigned int y=0; y<ty; y++)
{
if (!(y % 2))
{
for (unsigned int x=0; x<tx; x+=2)
{
int r = *image++;
int g = *image++;
int b = *image++;
int a = *image++;
*planeY++ = max(0, min(255, (( 66*r + 129*g + 25*b) >> 8) + 16));
*planeU++ = max(0, min(255, ((-38*r + -74*g + 112*b) >> 8) + 128));
*planeV++ = max(0, min(255, ((112*r + -94*g + -18*b) >> 8) + 128));
*planeA++ = a;
r = *image++;
g = *image++;
b = *image++;
a = *image++;
*planeA++ = a;
*planeY++ = max(0, min(255, ((66*r + 129*g + 25*b) >> 8) + 16));
}
}
else
{
for (unsigned int x=0; x<tx; x++)
{
int const r = *image++;
int const g = *image++;
int const b = *image++;
int const a = *image++;
*planeA++ = a;
*planeY++ = max(0, min(255, ((66*r + 129*g + 25*b) >> 8) + 16));
}
}
}
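For completeness, here is a hedged sketch of the same conversion written against an image that libvpx itself allocated, so the planes and strides come from the vpx_image_t rather than from new. The function name fillI420FromRGBA and the R,G,B,A byte order are my own assumptions for illustration, not part of the libvpx API; the Y/U/V math is the same as in the loop above.
#include <algorithm>
#include "vpx/vpx_image.h"
// Sketch: write an RGBA byte buffer into an I420 image previously created with
//   vpx_img_alloc(&img, VPX_IMG_FMT_I420, width, height, 1);
void fillI420FromRGBA(vpx_image_t *img, const unsigned char *rgba,
                      unsigned int width, unsigned int height)
{
    for (unsigned int y = 0; y < height; ++y)
    {
        unsigned char *rowY = img->planes[VPX_PLANE_Y] + y * img->stride[VPX_PLANE_Y];
        unsigned char *rowU = img->planes[VPX_PLANE_U] + (y / 2) * img->stride[VPX_PLANE_U];
        unsigned char *rowV = img->planes[VPX_PLANE_V] + (y / 2) * img->stride[VPX_PLANE_V];
        for (unsigned int x = 0; x < width; ++x)
        {
            const unsigned char *px = rgba + (y * width + x) * 4; // assumed R,G,B,A per pixel
            const int r = px[0], g = px[1], b = px[2];
            rowY[x] = (unsigned char)std::max(0, std::min(255, (( 66 * r + 129 * g +  25 * b) >> 8) +  16));
            if ((y % 2) == 0 && (x % 2) == 0) // one chroma sample per 2x2 block
            {
                rowU[x / 2] = (unsigned char)std::max(0, std::min(255, ((-38 * r -  74 * g + 112 * b) >> 8) + 128));
                rowV[x / 2] = (unsigned char)std::max(0, std::min(255, ((112 * r -  94 * g -  18 * b) >> 8) + 128));
            }
        }
    }
}
The filled image can then be passed to vpx_codec_encode the same way the raw image is used in the simple_encoder example.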

De-quantising audio with ffmpeg

I am using FFmpeg library to decode and (potentially) modify some audio.
I managed to use the following functions to iterate through all frames of the audio file:
avformat_open_input // Obtains formatContext
avformat_find_stream_info
av_find_best_stream // The argument AVMEDIA_TYPE_AUDIO is fed in to find the audio stream
avcodec_open2 // Obtains codecContext
av_init_packet
// The following is used to loop through the frames
av_read_frame
avcodec_decode_audio4
In the end, I have these three values available on each iteration
int dataSize; // return value of avcodec_decode_audio4
AVFrame* frame;
AVCodecContext* codecContext; // Codec context of the best stream
I supposed that a loop like this can be used to iterate over all samples:
for (int i = 0; i < frame->nb_samples; ++i)
{
// Bytes/Sample is known to be 4
// Extracts audio from Channel 1. There are in total 2 channels.
int* sample = (int*)frame->data[0] + dataSize * i;
// Now *sample is accessible
}
However, when I plotted the data using gnuplot, I did not get a waveform as expected, and some of the values reached the limit of 32-bit integers. (The audio stream is not silent in the first few seconds.)
I suppose that some form of quantisation is going on to prevent the data from being interpreted mathematically. What should I do to de-quantise this?
for (int i = 0; i < frame->nb_samples; ++i)
{
// Bytes/Sample is known to be 4
// Extracts audio from Channel 1. There are in total 2 channels.
int* sample = (int*)frame->data[0] + dataSize * i;
// Now *sample is accessible
}
Well... No. So, first of all, we'll need to know the data type. Check frame->format. It's an enum AVSampleFormat, most likely flt, fltp, s16 or s16p.
So, how do you interpret frame->data[] given the format? Well, first, is it planar or not? If it's planar, it means each channel is in frame->data[n], where n is the channel number. frame->channels is the number of channels. If it's not planar, it means all data is interleaved (per sample) in frame->data[0].
Second, what is the storage type? If it's s16/s16p, it's int16_t *. If it's flt/fltp, it's float *. So the correct interpretation for fltp would be:
for (int c = 0; c < frame->channels; c++) {
float *samples = (float *)frame->data[c];
for (int i = 0; i < frame->nb_samples; i++) {
float sample = samples[i];
// now this sample is accessible, it's in the range [-1.0, 1.0]
}
}
Whereas for s16, it would be:
int16_t *samples = (int16_t *)frame->data[0];
for (int c = 0; c < frame->channels; c++) {
for (int i = 0; i < frame->nb_samples; i++) {
int sample = samples[i * frame->channels + c];
// now this sample is accessible, it's in the range [-32768,32767]
}
}
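If the goal is just to plot the waveform, it may help to normalize the integer samples to the same [-1.0, 1.0] range as the float formats. A small sketch for interleaved s16, channel 0 only (variable names are illustrative):
// Normalize interleaved s16 samples of channel 0 to floats for plotting.
int16_t *samples = (int16_t *)frame->data[0];
for (int i = 0; i < frame->nb_samples; i++) {
    // divide by 32768 so the plotted values land in [-1.0, 1.0)
    float value = samples[i * frame->channels] / 32768.0f;
    printf("%d %f\n", i, value); // one "index value" pair per line for gnuplot
}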

Convert cv::Mat to openni::VideoFrameRef

I have a kinect streaming data into a cv::Mat. I am trying to get some example code running that uses OpenNI.
Can I convert my Mat into an OpenNI format image somehow?
I just need the depth image, and after fighting with OpenNI for a long time, have given up on installing it.
I am using OpenCV 3, Visual Studio 2013, Kinect v2 for Windows.
The relevant code is:
void CDifodoCamera::loadFrame()
{
//Read the newest frame
openni::VideoFrameRef framed; //I assume I need to replace this with my Mat...
depth_ch.readFrame(&framed);
const int height = framed.getHeight();
const int width = framed.getWidth();
//Store the depth values
const openni::DepthPixel* pDepthRow = (const openni::DepthPixel*)framed.getData();
int rowSize = framed.getStrideInBytes() / sizeof(openni::DepthPixel);
for (int yc = height-1; yc >= 0; --yc)
{
const openni::DepthPixel* pDepth = pDepthRow;
for (int xc = width-1; xc >= 0; --xc, ++pDepth)
{
if (*pDepth < 4500.f)
depth_wf(yc,xc) = 0.001f*(*pDepth);
else
depth_wf(yc,xc) = 0.f;
}
pDepthRow += rowSize;
}
}
First you need to understand how your data arrives. If it is already in a cv::Mat, you should be receiving two images: one for the RGB information, which is usually a 3-channel uchar cv::Mat, and another for the depth information, which is usually stored as a 16-bit value in millimetres (you cannot save a float Mat as an image, but you can save it as a yml/xml file using OpenCV).
Assuming you want to read and process the image that contains the depth information, you can change your code to:
void CDifodoCamera::loadFrame()
{
//Read the newest frame
//the depth image should be a png, since png supports 16 bits, and it must be loaded with the ANYDEPTH flag
cv::Mat depth_im = cv::imread("img_name.png", cv::IMREAD_ANYDEPTH);
const int height = depth_im.rows;
const int width = depth_im.cols;
for (int y = 0; y < height; y++)
{
for (int x = 0; x < width; x++)
{
if (depth_im.at<unsigned short>(y,x) < 4500)
depth_wf(y,x) = 0.001f * (float)depth_im.at<unsigned short>(y,x);
else
depth_wf(y,x) = 0.f;
}
}
}
I hope this helps you. If you have any question just ask :)

DirectShow ISampleGrabber: samples are upside-down and color channels reverse

I have to use MS DirectShow to capture video frames from a camera (I just want the raw pixel data).
I was able to build the Graph/Filter network (capture device filter and ISampleGrabber) and implement the callback (ISampleGrabberCB). I receive samples of appropriate size.
However, they are always upside down (flipped vertically that is, not rotated) and the color channels are BGR order (not RGB).
I tried setting the biHeight field in the BITMAPINFOHEADER to both positive and negative values, but it doesn't have any effect. According to the MSDN documentation, ISampleGrabber::SetMediaType() ignores the format block for video data anyway.
Here is what I see (recorded with a different camera, not DS), and what DirectShow ISampleGrabber gives me: The "RGB" is actually in red, green and blue respectively:
Sample of the code I'm using, slightly simplified:
// Setting the media type...
AM_MEDIA_TYPE* media_type = 0 ;
this->ds.device_streamconfig->GetFormat(&media_type); // The IAMStreamConfig of the capture device
// Find the BMI header in the media type struct
BITMAPINFOHEADER* bmi_header;
if (media_type->formattype == FORMAT_VideoInfo) {
bmi_header = &((VIDEOINFOHEADER*)media_type->pbFormat)->bmiHeader;
} else if (media_type->formattype == FORMAT_VideoInfo2) {
bmi_header = &((VIDEOINFOHEADER2*)media_type->pbFormat)->bmiHeader;
} else {
return false;
}
// Apply changes
media_type->subtype = MEDIASUBTYPE_RGB24;
bmi_header->biWidth = width;
bmi_header->biHeight = height;
// Set format to video device
this->ds.device_streamconfig->SetFormat(media_type);
// Set format for sample grabber
// bmi_header->biHeight = -(height); // tried this for either and both interfaces, no effect
this->ds.sample_grabber->SetMediaType(media_type);
// Connect filter pins
IPin* out_pin= getFilterPin(this->ds.device_filter, OUT, 0); // IBaseFilter interface for the capture device
IPin* in_pin = getFilterPin(this->ds.sample_grabber_filter, IN, 0); // IBaseFilter interface for the sample grabber filter
out_pin->Connect(in_pin, media_type);
// Start capturing by callback
this->ds.sample_grabber->SetBufferSamples(false);
this->ds.sample_grabber->SetOneShot(false);
this->ds.sample_grabber->SetCallback(this, 1);
// start recording
this->ds.media_control->Run(); // IMediaControl interface
I'm checking return types for every function and don't get any errors.
I'm thankful for any hint or idea.
Things I already tried:
Setting the biHeight field to a negative value for either the capture device filter or the sample grabber or for both or for neither - doesn't have any effect.
Using IGraphBuilder to connect the pins - same problem.
Connecting the pins before changing the media type - same problem.
Checking if the media type was actually applied by the filter by querying it again - but it apparently is applied or at least stored.
Interpreting the image as total byte reversed (last byte first, first byte last) - then it would be flipped horizontally.
Checking if it's a problem with the video camera - when I test it with VLC (DirectShow capture) it looks normal.
My quick hack for this:
void Camera::OutputCallback(unsigned char* data, int len, void *instance_)
{
Camera *instance = reinterpret_cast<Camera*>(instance_);
int j = 0;
for (int i = len-4; i > 0; i-=4)
{
instance->buffer[j] = data[i];
instance->buffer[j + 1] = data[i + 1];
instance->buffer[j + 2] = data[i + 2];
instance->buffer[j + 3] = data[i + 3];
j += 4;
}
Transport::RTPPacket packet;
packet.payload = instance->buffer;
packet.payloadSize = len;
instance->receiver->Send(packet);
}
This is correct for the RGB32 color space; for other color spaces the code needs to be adjusted.
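If only the vertical flip is needed and the left-to-right pixel order should be preserved, a plain row-by-row copy is an alternative. A sketch for RGB32 (the buffer names are illustrative):
#include <cstring>
// Copy rows in reverse order so a bottom-up RGB32 frame becomes top-down
// without mirroring it horizontally.
void FlipVerticallyRGB32(const unsigned char *src, unsigned char *dst,
                         int width, int height)
{
    const int rowBytes = width * 4; // 4 bytes per RGB32 pixel
    for (int y = 0; y < height; ++y)
    {
        // last source row goes to the first destination row, and so on
        memcpy(dst + y * rowBytes, src + (height - 1 - y) * rowBytes, rowBytes);
    }
}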
I noticed that the flipping disappears when using the I420 color space.
In addition, most current codecs (such as VP8) use I420 as their raw I/O color space.
I wrote a simple function that mirrors a frame in the I420 color space.
void Camera::OutputCallback(unsigned char* data, int len, uint32_t timestamp, void *instance_)
{
Camera *instance = reinterpret_cast<Camera*>(instance_);
Transport::RTPPacket packet;
packet.rtpHeader.ts = timestamp;
packet.payload = data;
packet.payloadSize = len;
if (instance->mirror)
{
Video::ResolutionValues rv = Video::GetValues(instance->resolution);
int k = 0;
// Y (luma) values
for (int i = 0; i != rv.height; ++i)
{
for (int j = rv.width; j != 0; --j)
{
int l = ((rv.width * i) + j);
instance->buffer[k++] = data[l];
}
}
// U values
for (int i = 0; i != rv.height/2; ++i)
{
for (int j = (rv.width/2); j != 0; --j)
{
int l = (((rv.width / 2) * i) + j) + rv.height*rv.width;
instance->buffer[k++] = data[l];
}
}
// V values
for (int i = 0; i != rv.height / 2; ++i)
{
for (int j = (rv.width / 2); j != 0; --j)
{
int l = (((rv.width / 2) * i) + j) + rv.height*rv.width + (rv.width/2)*(rv.height/2);
if (l == len)
{
instance->buffer[k++] = 0;
}
else
{
instance->buffer[k++] = data[l];
}
}
}
packet.payload = instance->buffer;
}
instance->receiver->Send(packet);
}

Feeding audio from input directly to output, sounding clean c++

I'm currently trying to take in sound and feed it back to the speakers. I'm using the openframeworks library that makes this fairly simple.
I'm using this class
http://www.openframeworks.cc/documentation?detail=ofSoundStream
The setup function is
ofSoundStreamSetup(int nOutputs, int nInputs, ofSimpleApp * OFSA, int sampleRate, int bufferSize, int nBuffers)
and I am using
ofSoundStreamSetup(1, 1, this, 44100, 512, 4)
My header info is
float buffer1[1000000];
float buffer2[1000000];
float* readPointer;
float* writePointer;
int readp;
int writep;
I've got two functions
void audioReceived(float * input, int bufferSize, int nChannels)
{
    if (writep < 10)
    {
        for (int i = 0; i < bufferSize; i++)
        {
            writePointer[writep*i] = input[i];
        }
        writep++;
        if (writep >= 10)
        {
            writep = 0;
        }
    }
}
void audioRequested(float * output, int bufferSize, int numChannels)
{
    if (writep > 0)
    {
        for (int i = 0; i < bufferSize; i++)
        {
            output[i] = readPointer[readp * i];
        }
        readp++;
        if (readp >= 10)
        {
            readp = 0;
        }
    }
}
This is working, but the quality seems poppy and crackly. I think I may have to implement a proper circular buffer, or double buffering, but I'm not sure.
Can anyone point me in the correct direction for how I can get the audio to sound good, using as simple a method as possible?
I would definitely suggest using double buffering. Otherwise a buffer can become available at the same moment you need to hand one out, which potentially results in you editing a buffer that is currently in use.
In general, when audio is received you write it into buffer 1. When audio is requested you hand out buffer 2. The next time audio is received you put it in buffer 2, and when the next request arrives you give it buffer 1. And so on.
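A minimal sketch of that scheme using two fixed buffers (assuming mono audio and the 512-sample buffer size from the question; the swap is simplified and ignores thread-safety between the two callbacks):
#include <utility>
static const int kBufferSize = 512;
float bufferA[kBufferSize];
float bufferB[kBufferSize];
float *writeBuffer = bufferA; // filled by audioReceived
float *readBuffer  = bufferB; // consumed by audioRequested
void audioReceived(float *input, int bufferSize, int nChannels)
{
    // copy the incoming block into the buffer that is not being played
    for (int i = 0; i < bufferSize; i++)
        writeBuffer[i] = input[i];
    // hand the freshly filled buffer to the output side
    std::swap(writeBuffer, readBuffer);
}
void audioRequested(float *output, int bufferSize, int nChannels)
{
    // play back the block that was completed on the previous input callback
    for (int i = 0; i < bufferSize; i++)
        output[i] = readBuffer[i];
}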