I have two audio files I read in using libsndfile.
SNDFILE* file1 = sf_open("D:\\audio1.wav", SFM_READ, &info);
SNDFILE* file2 = sf_open("D:\\audio2.wav", SFM_READ, &info2);
After I've done that, I read a number of samples from each:
// Buffers that will hold the samples (2 channels * 800000 frames each;
// new[] takes an element count, not a byte count)
short* buffer1 = new short[2 * 800000];
short* buffer2 = new short[2 * 800000];
// Read the samples using libsndfile
sf_readf_short(file1, buffer1, 800000);
sf_readf_short(file2, buffer2, 800000);
Now, I want to mix those two. I read that you need to get the left and right channel separately and then sum them up. I tried doing it like this:
short* mixdown = new short[channels * 800000]; // one short per sample
for (int t = 0; t < 800000; ++t)
{
    mixdown[t] = buffer1[t] + buffer2[t] - ((buffer1[t] * buffer2[t]) / 65535);
    t++;
    mixdown[t] = buffer1[t] + buffer2[t] - ((buffer1[t] * buffer2[t]) / 65535);
}
After that I'm encoding the new audio using ffmpeg:
FILE* process2 = _popen("ffmpeg -y -f s16le -acodec pcm_s16le -ar 44100 -ac 2 -i - -f vob -ac 2 D:\\audioMixdown.wav", "wb");
fwrite(mixdown, 2 * sizeof(short) * 800000, 1, process2);
Now, the problem is that when I encode the mixdown to a file, the audio from buffer1 sounds fine, but the only thing "added" to the new audio is noise (as if it were an old audio recording).
If I encode only one of the two to a file it works perfectly.
I have no idea why it's going wrong. I guess it has something to do with the mixing, obviously, but I don't know what I'm doing wrong. I got the mixing algorithm here but it doesn't give me the expected results.
I've also read other information on SO about people having similar questions but I couldn't figure it out with those.
Your mixing code is very odd: you are adding a non-linear term, which will cause distortion. It looks like a hack specific to 8-bit PCM, where the dynamic range is very limited, but you probably don't need it for 16-bit PCM. For basic mixing you just want this:
for (int t = 0; t < 800000 * 2; ++t)
{
    mixdown[t] = (buffer1[t] + buffer2[t]) / 2;
}
Note that the divide by 2 is necessary to prevent distortion when you have two full-scale signals, and that the loop now runs over 800000 * 2 samples so it covers both interleaved channels. Note also that I've removed the 2x loop unrolling.
Your algorithm is correct, but you missed an important point: the range of your PCM is -32768 to 32767. Thus you must divide by 32768, not 65535.
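To make that concrete, here is a minimal sketch of one common way to adapt the formula to the signed 16-bit range, splitting on sign so the product term always pulls toward zero (this assumes interleaved stereo as in the question; the final clamp guards the corner case where two near-full-scale samples land just past 32767):
for (int t = 0; t < 800000 * 2; ++t)
{
    int a = buffer1[t], b = buffer2[t];
    int m;
    if (a > 0 && b > 0)
        m = a + b - (a * b) / 32768; // both positive: product term pulls down
    else if (a < 0 && b < 0)
        m = a + b + (a * b) / 32768; // both negative: product term pulls up
    else
        m = a + b;                   // mixed signs stay in range
    if (m >  32767) m =  32767;
    if (m < -32768) m = -32768;
    mixdown[t] = static_cast<short>(m);
}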
Related
This is very similar to a question here but I don't seem to be able to apply the solution.
I have code that samples a sine wave and writes it into a PCM file. When I listen to it with ffplay, there is some static noise and I don't know where it comes from. Following the solution in the mentioned post, I write out to a binary file and make sure I play the file with the signed 8-bit format.
This is the code I use:
#include <cmath>
#include <cstdio>
#include <fstream>
#include <iostream>
using namespace std;

int createSineWavePCM(int freq, int sample_rate) {
    char out_name[100];
    sprintf(out_name, "../sine_freq%d_sr%d.pcm", freq, sample_rate);
    ofstream outfile(out_name, ios::binary);
    char data[1000000];
    for (int j = 0; j < 1000000; ++j) {
        double ll = 50.0L * sin((2.0L * M_PIl * j * freq / sample_rate));
        data[j] = ll;
    }
    outfile.write(data, sizeof data);
    outfile.close();
    cout << "Stored sine wave pcm file in " << out_name << endl;
    return 0;
}
I use freq = 440 and sample_rate = 44100, and then I play with:
ffplay {pcm_file} -f s8 -sample_rate 44100
Any ideas on what may cause the static noise?
The expression inside the sin call is worth a close look. 2.0L is a long double literal (on a floating-point literal, the L suffix means long double, not long), and M_PIl is glibc's long double version of pi. Since multiplication is left-associative, 2.0L * M_PIl * j * freq is already long double before the division, so freq / sample_rate is not integer division as written; but if that expression were ever rearranged so freq / sample_rate stood on its own, integer division would silently truncate it to zero. Making the floating-point conversions explicit rules that class of bug out.
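If in doubt, a minimal way to take the type question out of play entirely is to force the arithmetic into plain double (this uses M_PI from <cmath>; on MSVC that additionally needs _USE_MATH_DEFINES defined before the include):
for (int j = 0; j < 1000000; ++j) {
    double phase = 2.0 * M_PI * static_cast<double>(j) * freq / sample_rate;
    data[j] = static_cast<char>(50.0 * sin(phase));
}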
I am implementing an audio channel mixer using Viktor T. Toth's algorithm, trying to mix two audio channel streams.
In the code, quantization_ is the bit depth of a channel expressed in bytes. My mix function takes pointers to destination and source uint8_t buffers, mixes the two channels, and writes the result into the destination buffer. Because the data arrives as uint8_t, I reassemble the actual 8-, 16- or 24-bit samples from their bytes, do the addition, division and multiplication, and then split the result back into bytes.
Generally, it gives the expected output sample values. However, some samples turn out to be near zero when they shouldn't be, as I can see in Audacity. In the screenshot, the bottom two signals are the two mono channels and the top one is the mixed channel; there are some very low values, especially in the middle.
Below is my mix function:
void audio_mixer::mix(uint8_t* dest, const uint8_t* source)
{
    uint64_t mixed_sample = 0;
    uint64_t dest_sample = 0;
    uint64_t source_sample = 0;
    uint64_t factor = 0;
    for (int i = 0; i < channel_size_; ++i)
    {
        dest_sample = 0;
        source_sample = 0;
        factor = 1;
        for (int j = 0; j < quantization_; ++j)
        {
            dest_sample += factor * static_cast<uint64_t>(*dest++);
            source_sample += factor * static_cast<uint64_t>(*source++);
            factor = factor * 256;
        }
        mixed_sample = (dest_sample + source_sample) - (dest_sample * source_sample / factor);
        dest -= quantization_;
        for (int k = 0; k < quantization_; ++k)
        {
            *dest++ = static_cast<uint8_t>(mixed_sample % 256);
            mixed_sample = mixed_sample / 256;
        }
    }
}
It seems like you aren't treating the signed audio samples correctly: the horizontal line in your plot should be the zero-voltage level of the audio signal.
If you look at the positive voltage audio samples they obey your equation correctly (except for the peak values in the center). The negative values are being compressed which makes me feel like they are being treated as small positive voltages instead of negative voltages.
In other words, maybe those unsigned ints should be signed ints so the top bit indicates the voltage polarity and you can have audio samples in the range +127 to -128.
Those peak values in the center look like they are wrapping around modulo 256, 255 being the peak value for an unsigned byte representation of your audio. I'm not sure how this would happen, but it seems related to the unsigned vs signed confusion.
Maybe you should try the other formula Viktor provided in his document:
Z = 2(A+B) - (AB/128) - 256
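For what it's worth, a small sketch of the full rule from that document for unsigned 8-bit samples (128 is silence; the clamp covers the off-by-one overflow at full scale):
uint8_t mix8(uint8_t a, uint8_t b)
{
    int A = a, B = b;
    // product term below the midpoint, the formula quoted above otherwise
    int Z = (A < 128 && B < 128) ? (A * B) / 128
                                 : 2 * (A + B) - (A * B) / 128 - 256;
    if (Z > 255) Z = 255;
    if (Z < 0)   Z = 0;
    return static_cast<uint8_t>(Z);
}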
I'm using GDI+ in C++ to manipulate some Bitmap images, changing the colour and resizing the images. My code is very slow at one particular point and I was looking for some potential ways to speed up the line that's been highlighted in the VS2013 Profiler
for (UINT y = 0; y < 3000; ++y)
{
    // one scanline at a time because bitmaps are stored wrong way up
    byte* oRow = (byte*)bitmapData1.Scan0 + (y * bitmapData1.Stride);
    for (UINT x = 0; x < 4000; ++x)
    {
        // get grey value from 0.114*Blue + 0.587*Green + 0.299*Red
        byte grey = (oRow[x * 3] * .114) + (oRow[x * 3 + 1] * .587) + (oRow[x * 3 + 2] * .299); // THIS LINE IS THE HIGHLIGHTED ONE
        // rest of manipulation code
    }
}
Any handy hints on how to handle this arithmetic better? It's causing massive slowdowns in my code.
Thanks in advance!
Optimization depends heavily on the compiler used and the target system, but there are some hints which may be useful. Avoid the index multiplications:
Instead of:
byte grey = (oRow[x * 3] * .114) + (oRow[x * 3 + 1] * .587) + (oRow[x * 3 + 2] * .299); //THIS LINE IS THE HIGHLIGHTED ONE
use...
// get grey value from 0.114*Blue + 0.587*Green + 0.299*Red
byte grey = (*oRow) * .114;
oRow++;
grey += (*oRow) * .587;
oRow++;
grey += (*oRow) * .299;
oRow++;
You can put the pointer increment on the same line; I put it on a separate line for better readability.
Also, instead of multiplying by a float you can use a lookup table, which can be faster than arithmetic. This depends on the CPU and the table size, but you can give it a shot:
// somewhere global, or as class attributes
byte tred[256];
byte tgreen[256];
byte tblue[256];
...at startup...
// Only init once at startup
// I am ignoring the narrowing warnings, you should not :-)
for (int i = 0; i < 256; i++) // fill all 256 entries, including 255
{
    tblue[i] = i * .114;  // blue weight
    tgreen[i] = i * .587; // green weight
    tred[i] = i * .299;   // red weight
}
...in the loop...
byte grey = tblue[*oRow]; // pixel bytes are stored B, G, R
oRow++;
grey += tgreen[*oRow];
oRow++;
grey += tred[*oRow];
oRow++;
Also: 256*256*256 is not such a great size (about 16 MB), so you could build one big table indexed by the whole pixel value. But as that table would be far larger than the usual CPU cache, I would not expect much more speed from it.
As suggested, you could do the math in integer, but you could also try floats instead of doubles (.114f instead of .114), which are usually quicker, and you don't need the precision.
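As a sketch of the integer-math idea, a drop-in for the highlighted line: scale the weights to 8-bit fixed point (29/256 ≈ .114, 150/256 ≈ .587, 77/256 ≈ .299; the three weights sum to exactly 256), so each pixel costs three integer multiplies and a shift:
byte grey = (byte)((29 * oRow[x * 3]             // blue
                  + 150 * oRow[x * 3 + 1]        // green
                  + 77 * oRow[x * 3 + 2]) >> 8); // red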
Do the loop like this instead, to save on pointer math. Creating a temporary pointer like this won't cost anything, because the compiler will understand what you're up to.
for (UINT x = 0; x < 12000; x += 3)
{
    byte* pVal = &oRow[x];
    ....
}
This code is also easily threadable - it can be parallelized in various ways with little effort; here's one, using parallel_for from Microsoft's Parallel Patterns Library:
https://msdn.microsoft.com/en-us/library/dd728073.aspx
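A minimal sketch of what that could look like with PPL's parallel_for, keeping the loop body from the question (each scanline is independent, so parallelizing over y is safe):
#include <ppl.h>
// ...
concurrency::parallel_for(0u, 3000u, [&](UINT y) {
    byte* oRow = (byte*)bitmapData1.Scan0 + (y * bitmapData1.Stride);
    for (UINT x = 0; x < 4000; ++x) {
        byte grey = (oRow[x * 3] * .114) + (oRow[x * 3 + 1] * .587) + (oRow[x * 3 + 2] * .299);
        // ...rest of manipulation code
    }
});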
If you have 4 cores, that's a 4x speedup, just about.
Also be sure to check release vs debug build - you don't know the perf until you run it in release/optimized mode.
You could premultiply values like oRow[x * 3] * .114 and put them into an array. oRow[x * 3] has only 256 possible values, so you can easily create an array aMul1 of 256 entries for the values 0..255, each multiplied by .114, and then look up aMul1[oRow[x * 3]] to find the premultiplied value. Do the same for the other components.
Actually you could even create such an array for whole RGB values: your pixel is 8-8-8, so you would need an array of size 256*256*256, which is 16777216 entries (~16 MB). Whether this would speed up your process, you would have to check yourself with a profiler.
In general I've found that more direct pointer management, fewer intermediate values, fewer instructions overall (on most CPUs, simple instructions are all roughly equal cost these days), and fewer memory fetches - lookup tables are not the answer more often than they are - give the usual optimum, without going to direct assembly. Vectorization, especially explicit vectorization, is also helpful, as is dumping the assembly of the function and confirming the inner loop conforms to your expectations. Try this:
for (UINT y = 0; y < 3000; ++y)
{
    // one scanline at a time because bitmaps are stored wrong way up
    byte* oRow = (byte*)bitmapData1.Scan0 + (y * bitmapData1.Stride);
    byte* p = oRow;
    byte* pend = p + 4000 * 3;
    for (; p != pend; p += 3)
    {
        const float grey = p[0] * .114f + p[1] * .587f + p[2] * .299f;
        // ...rest of manipulation code
    }

    // Alternatively, with an autovectorizing compiler (note p is reset first).
    // Make sure vectorization and the relevant instruction sets are enabled -
    // this is effectively a dot product, so the following intrinsic fits the bill:
    // https://msdn.microsoft.com/en-us/library/bb514054.aspx
    // Vector types or compiler intrinsics are often more reliable too, but they
    // are compiler-specific or architecture-dependent respectively.
    for (p = oRow; p != pend; p += 3)
    {
        float grey = 0;
        const float w[3] = { .114f, .587f, .299f };
        #pragma unroll // or use a compiler option to unroll loops
        for (int c = 0; c < 3; ++c)
        {
            grey += w[c] * p[c];
        }
    }
}
Consider fooling around with OpenCL, targeting your CPU, to see how fast you could get with CPU-specific optimizations and easy use of multiple cores - OpenCL covers this up for you pretty well and provides built-in vector ops and a dot product.
I have a TV capture card that has a feed coming in as a YUV format. I've seen other posts here similar to this question and attempted to try every possible method stated, but none of them provided a clear image. At the moment the best results were with the OpenCV cvCvtColor(src, dst, CV_YUV2BGR) function call.
I am not familiar with the YUV format and, to be honest, it confuses me a little, as it looks like it stores 4 channels but is really only 3. I have included an image from the capture card in the hope that someone can work out what is going on and help me fill in the blanks.
The feed is coming in through a DeckLink Intensity Pro card and being accessed in a C++ application in using OpenCV in a Windows 7 environment.
Update
I have looked at a Wikipedia article regarding this and attempted to use the formula in my application. Below is the code block, with the output received from it. Any advice is greatly appreciated.
BYTE* pData;
videoFrame->GetBytes((void**)&pData);
m_nFrames++;
printf("Num Frames executed: %d\n", m_nFrames);
for (int i = 0; i < 1280 * 720 * 3; i = i + 3)
{
    m_RGB->imageData[i] = pData[i] + pData[i+2]*((1 - 0.299)/0.615);
    m_RGB->imageData[i+1] = pData[i] - pData[i+1]*((0.114*(1-0.114))/(0.436*0.587)) - pData[i+2]*((0.299*(1 - 0.299))/(0.615*0.587));
    m_RGB->imageData[i+2] = pData[i] + pData[i+1]*((1 - 0.114)/0.436);
}
In newer versions of OpenCV there is a built-in function that can be used to do the YUV-to-RGB conversion:
cvtColor(src, dst, CV_YUV2BGR_YUY2);
Specify the YUV layout after the underscore, like this: CV_YUV2BGR_xxxx
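As a hedged sketch of how that might look for the capture buffer in the question (this assumes the card really delivers a packed 4:2:2 layout; which _xxxx variant is right depends on the byte order the card uses):
cv::Mat yuv(720, 1280, CV_8UC2, pData);  // packed 4:2:2: two bytes per pixel
cv::Mat bgr;
cv::cvtColor(yuv, bgr, CV_YUV2BGR_UYVY); // or CV_YUV2BGR_YUY2 / CV_YUV2BGR_YVYU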
It looks to me like you're decoding a YUV422 stream as YUV444. Try this modification to the code you provided:
for (int i = 0, j = 0; i < 1280 * 720 * 3; i += 6, j += 4)
{
    m_RGB->imageData[i] = pData[j] + pData[j+3]*((1 - 0.299)/0.615);
    m_RGB->imageData[i+1] = pData[j] - pData[j+1]*((0.114*(1-0.114))/(0.436*0.587)) - pData[j+3]*((0.299*(1 - 0.299))/(0.615*0.587));
    m_RGB->imageData[i+2] = pData[j] + pData[j+1]*((1 - 0.114)/0.436);
    m_RGB->imageData[i+3] = pData[j+2] + pData[j+3]*((1 - 0.299)/0.615);
    m_RGB->imageData[i+4] = pData[j+2] - pData[j+1]*((0.114*(1-0.114))/(0.436*0.587)) - pData[j+3]*((0.299*(1 - 0.299))/(0.615*0.587));
    m_RGB->imageData[i+5] = pData[j+2] + pData[j+1]*((1 - 0.114)/0.436);
}
I'm not sure you've got your constants correct, but at worst your colors will be off - the image should be recognizable.
I use the following C++ code with OpenCV to convert YUV data (YUV_NV21) to an RGB image (BGR in OpenCV):
#include <fstream>
#include <opencv2/opencv.hpp>

int main()
{
    const int width = 1280;
    const int height = 800;

    std::ifstream file_in;
    file_in.open("../image_yuv_nv21_1280_800_01.raw", std::ios::binary);
    std::filebuf *p_filebuf = file_in.rdbuf();
    size_t size = p_filebuf->pubseekoff(0, std::ios::end, std::ios::in);
    p_filebuf->pubseekpos(0, std::ios::in);

    char *buf_src = new char[size];
    p_filebuf->sgetn(buf_src, size);

    // NV21 is a full-resolution luma plane followed by a half-height plane
    // of interleaved chroma, i.e. height * 3 / 2 rows of single bytes in total
    cv::Mat mat_src = cv::Mat(height * 3 / 2, width, CV_8UC1, buf_src);
    cv::Mat mat_dst = cv::Mat(height, width, CV_8UC3);

    cv::cvtColor(mat_src, mat_dst, cv::COLOR_YUV2BGR_NV21);
    cv::imwrite("yuv.png", mat_dst);

    file_in.close();
    delete [] buf_src;
    return 0;
}
and the converted result looks like the image yuv.png.
You can find the test raw image here and the whole project in my GitHub project.
It may be the wrong path, but many people (engineers included) do mix up YUV and YCbCr.
Try
cvCvtColor(src, dst, CV_YCrCb2RGB)
or maybe a more exotic type.
The BlackMagic Intensity software returns a packed 4:2:2 YUV format in bmdFormat8BitYUV, so two source pixels are packed into 4 bytes - I don't think OpenCV's cvtColor can handle this.
You can either do the conversion yourself, or just call the Intensity software's ConvertFrame() function.
edit: packed 4:2:2 YUV is normally stored as Y U Y V Y U Y V ...
There is a Y (brightness) sample for each pixel but only one U and V (colour) pair for every two pixels in the row.
So if data is an unsigned char* pointing to the start of the memory as shown above:
pixel 1: Y = data[0], U = data[1], V = data[3]
pixel 2: Y = data[2], U = data[1], V = data[3]
Then use the YUV->RGB coefficients you used in your sample code.
Maybe someone is confused by the color models YCbCr and YUV.
OpenCV does not handle YCbCr. Instead it has YCrCb, implemented the same way as YUV in OpenCV.
From the OpenCV sources https://github.com/Itseez/opencv/blob/2.4/modules/imgproc/src/color.cpp#L3830:
case CV_BGR2YCrCb: case CV_RGB2YCrCb:
case CV_BGR2YUV: case CV_RGB2YUV:
    // ...
    // bidx is 0 if the source is BGR, 2 if it is RGB
    bidx = code == CV_BGR2YCrCb || code == CV_BGR2YUV ? 0 : 2;
    // ... converting to YUV, with the only difference being the
    // order of the Blue and Red channels (variable bidx)
But there is one more thing to say.
There is currently a bug in the conversions CV_BGR2YUV and CV_RGB2YUV in the OpenCV 2.4.* branch.
At present, the implementation uses this formula:
Y = 0.299B + 0.587G + 0.114R
U = 0.492(R-Y)
V = 0.877(B-Y)
What it should be (according to Wikipedia):
Y = 0.299R + 0.587G + 0.114B
U = 0.492(B-Y)
V = 0.877(R-Y)
The Red and Blue channels are swapped in the implemented formula.
A possible workaround to convert BGR -> YUV while the bug is not fixed:
cv::Mat source = cv::imread(filename, CV_LOAD_IMAGE_COLOR);
cv::Mat yuvSource;
cvtColor(source, yuvSource, cv::COLOR_BGR2RGB); // rearranges B and R in the appropriate order
cvtColor(yuvSource, yuvSource, cv::COLOR_BGR2YUV);
// yuvSource will contain here correct image in YUV color space
I haven't been programming in C++ for a while, and now I have to write a simple thing, but it's driving me nuts.
I need to create a bitmap from a table of colors:
char image[200][200][3];
First coordinate is width, second height, third colors: RGB. How to do it?
Thanks for any help.
Adam
I'm sure you've already checked http://en.wikipedia.org/wiki/BMP_file_format.
With that information in hand we can write a quick BMP with:
// setup header structs bmpfile_header and bmp_dib_v3_header before this (see wiki)
// * note for a windows bitmap you want a negative height if you're starting from the top *
// * otherwise the image data is expected to go from bottom to top *
FILE* fp = fopen("file.bmp", "wb");
fwrite(&bmpfile_header, sizeof(bmpfile_header), 1, fp);
fwrite(&bmp_dib_v3_header, sizeof(bmp_dib_v3_header_t), 1, fp);
// each row here is 200 * 3 = 600 bytes, already a multiple of 4,
// so no row padding is needed for this particular size
for (int i = 0; i < 200; i++) {
    for (int j = 0; j < 200; j++) {
        fwrite(&image[j][i][2], 1, 1, fp); // blue
        fwrite(&image[j][i][1], 1, 1, fp); // green
        fwrite(&image[j][i][0], 1, 1, fp); // red
    }
}
fclose(fp);
If setting up the headers is a problem let us know.
Edit: I forgot, BMP files expect BGR instead of RGB, I've updated the code (surprised nobody caught it).
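In case the header setup is the sticking point, here is a minimal sketch of what the two structs might look like for this 200x200 24-bit case (the field names are mine, not a standard API; see the wiki page for the authoritative layout):
#include <cstdint>

#pragma pack(push, 1) // the headers must be written without padding bytes
struct bmpfile_header_t {
    uint16_t magic;        // 'BM' = 0x4D42 (little-endian)
    uint32_t file_size;    // 14 + 40 + 200 * 200 * 3 = 120054 here
    uint16_t reserved1;
    uint16_t reserved2;
    uint32_t pixel_offset; // 14 + 40 = 54
};
struct bmp_dib_v3_header_t {
    uint32_t header_size;  // 40
    int32_t  width;        // 200
    int32_t  height;       // 200 (or -200 to store rows top-down)
    uint16_t planes;       // 1
    uint16_t bpp;          // 24
    uint32_t compression;  // 0 = BI_RGB (uncompressed)
    uint32_t image_size;   // may be 0 for BI_RGB
    int32_t  x_ppm;        // print resolution; 0 is acceptable
    int32_t  y_ppm;
    uint32_t colors_used;      // 0 for 24-bit
    uint32_t colors_important; // 0 for 24-bit
};
#pragma pack(pop)
Declare bmpfile_header and bmp_dib_v3_header as instances of these, fill in the fields, and the two fwrite calls above work as written.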
I'd suggest ImageMagick - a comprehensive library, etc.
I would first try to find out how the BMP file format (that's what you mean by a bitmap, right?) is defined. Then I would convert the array to that format and write it to the file.
If that's an option, I would also consider trying to find an existing library for BMP file creation, and just use it.
Sorry if what I said is already obvious for you, but I don't know on which stage of the process you are stuck.
For simple image operations I highly recommend CImg. This library works like a charm and is extremely easy to use: you just have to include a single header file in your code. It literally took me less than 10 minutes to compile and test.
If you want to do more complicated image operations, however, I would go with Magick++ as suggested by dagoof.
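As a rough sketch of how little code the array from the question needs with CImg (CImg stores colour planes separately, so the interleaved triples are copied channel by channel; image is the char[200][200][3] from the question):
#include "CImg.h"
using namespace cimg_library;

CImg<unsigned char> img(200, 200, 1, 3); // width, height, depth, channels
for (int y = 0; y < 200; ++y)
    for (int x = 0; x < 200; ++x)
        for (int c = 0; c < 3; ++c)
            img(x, y, 0, c) = image[x][y][c];
img.save_bmp("file.bmp");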
It would be advisable to declare the image as a simple one-dimensional array,
i.e. (where bytes is the number of bytes per pixel):
char image[width * height * bytes];
You can then access the relevant position in the array as follows
char byte1 = image[(x * bytes) + (y * width * bytes) + 0];
char byte2 = image[(x * bytes) + (y * width * bytes) + 1];
char byte3 = image[(x * bytes) + (y * width * bytes) + 2];