Data read from wav file are between -1 and 1 (C++, sndfile)

I am trying to read data from a .wav file and feed it to an FFT.
To read the wav file I am using the sndfile library.
SNDFILE* infile;
SF_INFO sfinfo;
memset(&sfinfo, 0, sizeof(sfinfo));
infile = sf_open("sound.wav", SFM_READ, &sfinfo);

double data[BUF_SIZE];
int readcount;
while ((readcount = (int)sf_readf_double(infile, data, BUF_SIZE)) > 0)
{
    for (int i = 0; i < readcount; i++)
    {
        cout << data[i] << " ";
    }
}
But every value in this file (and in other files) is between -1 and 1.
Is this correct? Why are all the values so small? I expected to read the amplitude in the time domain (the volume of the sound).

This is the canonical format for floating-point samples: values normalized to the range [-1, 1]. With 32-bit float values you get the format's full 24 bits of precision plus headroom. Clipping is also easy to represent: if a sample value is greater than 1 or less than -1, it means the sample clipped. With integer values, there's no way to know that.
Floating point is also an easy sample format to apply operations to. Mixing, for example, is trivial: you just add the sample values together.
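As a trivial sketch of my own (not from any particular library), mixing two float buffers is just an element-wise addition:
// Mix two float sample buffers of equal length by simple addition.
void mixFloats(const float* a, const float* b, float* out, int n)
{
    for (int i = 0; i < n; ++i)
        out[i] = a[i] + b[i];   // values outside [-1, 1] indicate clipping
}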
So even if it looks weird at first, it is the best format for audio sample representation. Once you have applied the operations you need to the float values, you convert them to the format you want for output (such as 16-bit integers). This conversion is trivial. Here is a function that converts and clips float samples to any integer sample format in common use today:
#include <limits>

/* Convert and clip a float sample to an integer sample. This works for
 * all usual integer sample types (8-bit, 16-bit, 32-bit, signed or
 * unsigned.)
 */
template <typename T>
T floatSampleToInt(float src) noexcept
{
    if (src >= 1.f)
        return std::numeric_limits<T>::max();
    if (src < -1.f)
        return std::numeric_limits<T>::min();
    return src * (float)(1UL << (sizeof(T) * 8 - 1))
           + ((float)(1UL << (sizeof(T) * 8 - 1))
              + (float)std::numeric_limits<T>::min());
}
If you want to convert a float sample to a signed 16-bit integer sample for example, you do:
int16_t intSample = floatSampleToInt<int16_t>(floatSample);
Note that 24-bit integer samples are covered by 32-bit. A 32-bit sample is also a valid 24-bit sample; its lower 8 bits are just truncated.
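To tie this back to the question above, here is a small sketch of my own (the helper name readWavAsInt16 and the 1024-sample buffer are hypothetical, not part of libsndfile) that reads the normalized doubles with sf_readf_double and converts them with the floatSampleToInt function above:
#include <cstdint>
#include <vector>
#include <sndfile.h>

// Read a wav file as normalized doubles and convert each sample to int16_t
// using the floatSampleToInt function above.
std::vector<int16_t> readWavAsInt16(const char* path)
{
    SF_INFO sfinfo{};
    SNDFILE* infile = sf_open(path, SFM_READ, &sfinfo);
    std::vector<int16_t> out;
    if (!infile)
        return out;

    double buf[1024];   // interleaved samples, 1024 / channels frames per read
    sf_count_t frames;
    while ((frames = sf_readf_double(infile, buf, 1024 / sfinfo.channels)) > 0)
        for (sf_count_t i = 0; i < frames * sfinfo.channels; ++i)
            out.push_back(floatSampleToInt<int16_t>(static_cast<float>(buf[i])));

    sf_close(infile);
    return out;
}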

Related

Static noise in generated sine wave pcm sound

This is very similar to a question here, but I don't seem to be able to apply the solution.
I have code that samples a sine wave and writes it into a PCM file. When I listen to it with ffplay, there is some static noise that I can't trace back to its source. Based on the solution in the mentioned post, I use a binary file for writing out, and I make sure I play the file with the signed 8-bit format.
This is the code I use:
int createSineWavePCM(int freq, int sample_rate) {
    char out_name[100];
    sprintf(out_name, "../sine_freq%d_sr%d.pcm", freq, sample_rate);
    ofstream outfile(out_name, ios::binary);

    char data[1000000];
    for (int j = 0; j < 1000000; ++j) {
        double ll = 50.0L * sin((2.0L * M_PIl * j * freq / sample_rate));
        data[j] = ll;
    }
    outfile.write(data, sizeof data);
    outfile.close();

    cout << "Stored sine wave pcm file in " << out_name << endl;
    return 0;
}
I use freq = 440 and sample_rate = 44100, and then I play with:
ffplay {pcm_file} -f s8 -sample_rate 44100
Any ideas on what may cause the static noise?
The expression inside the sin function looks dubious. Are all of its components of type int or long? You wrote 2.0L, and I'm surprised that parses, but L usually converts a number to a long. It also looks like M_PI has an l appended (M_PIl), which would possibly also make that a long. If this is the case, the division freq / sample_rate could well be performed as integer division.
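If that is the problem, the safest fix is to force the whole phase calculation into floating point. Here is a minimal sketch of my own for the inner loop, keeping the question's amplitude of 50 and assuming M_PI is available from <cmath>:
#include <cmath>

// Fill a buffer with a sine wave; the phase term is computed entirely in
// double so that freq / sample_rate can never be an integer division.
void fillSine(char* data, int n, int freq, int sample_rate)
{
    for (int j = 0; j < n; ++j) {
        double phase = 2.0 * M_PI * static_cast<double>(j) * freq / sample_rate;
        data[j] = static_cast<char>(50.0 * std::sin(phase));   // signed 8-bit samples
    }
}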

Arduino C++ >> operation

I am adapting the example from the Arduino AutoAnalogAudio library entitled SDAudioWavPlayer, which can be found in Examples->AutoAnalogAudio->SDAudio->SDAudioWavPlayer.
This example uses interrupts to repeatedly call the function void loadBuffer(). The code for that is below:
/* Function called from DAC interrupt after dacHandler(). Loads data into the dacBuffer */
void loadBuffer() {
  if (myFile) {
    if (myFile.available()) {
      if (aaAudio.dacBitsPerSample == 8) {
        // Load 32 samples into the 8-bit dacBuffer
        myFile.read((byte*)aaAudio.dacBuffer, MAX_BUFFER_SIZE);
      } else {
        // Load 32 samples (64 bytes) into the 16-bit dacBuffer
        myFile.read((byte*)aaAudio.dacBuffer16, MAX_BUFFER_SIZE * 2);
        // Convert the 16-bit samples to 12-bit
        for (int i = 0; i < MAX_BUFFER_SIZE; i++) {
          aaAudio.dacBuffer16[i] = (aaAudio.dacBuffer16[i] + 0x8000) >> 4;
        }
      }
    } else {
#if defined (AUDIO_DEBUG)
      Serial.println("File close");
#endif
      myFile.close();
      aaAudio.disableDAC();
    }
  }
}
The specific part I am concerned with is the second part of the if statement
{
  // Load 32 samples (64 bytes) into the 16-bit dacBuffer
  myFile.read((byte*)aaAudio.dacBuffer16, MAX_BUFFER_SIZE * 2);
  // Convert the 16-bit samples to 12-bit
  for (int i = 0; i < MAX_BUFFER_SIZE; i++) {
    aaAudio.dacBuffer16[i] = (aaAudio.dacBuffer16[i] + 0x8000) >> 4;
  }
}
Despite the comment, MAX_BUFFER_SIZE is 256, so 512 bytes are read into aaAudio.dacBuffer16. That data was originally 16-bit signed integers (+/- 32K), and dacBuffer16 is an array of 16-bit unsigned integers (0-64K). The sign is removed by going through the array and adding 2^15 (0x8000) to each element. This makes the negative values wrap around, leaving the positive part of the negative number, while positive values are simply increased by 2^15; thus the values are rescaled to lie in 0-64K. The result is then shifted 4 places right so that only the highest 12 bits remain, which is what the Arduino DAC can handle. This all happens in the line
aaAudio.dacBuffer16[i] = (aaAudio.dacBuffer16[i] + 0x8000) >> 4;
So far so good.
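As a side note, here is a tiny standalone sketch of that conversion on a few illustrative values of my own choosing; the intermediate sum is stored in a uint16_t so that the wrap-around described above actually takes place:
#include <cstdint>
#include <cstdio>

int main()
{
    // Signed 16-bit samples as they sit in the unsigned buffer.
    uint16_t samples[] = { 0x8000 /* -32768 */, 0xFFFF /* -1 */, 0x0000 /* 0 */, 0x7FFF /* +32767 */ };
    for (uint16_t s : samples) {
        uint16_t offset = s + 0x8000;   // truncation to 16 bits wraps modulo 65536
        uint16_t dac12  = offset >> 4;  // keep the top 12 bits for the DAC
        std::printf("%04X -> %04X -> %03X\n", (unsigned)s, (unsigned)offset, (unsigned)dac12);
    }
    return 0;
}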
Now I want to be able to reduce the volume programmatically. As far as I can find, the library does not provide a function to do that, so I thought the simplest thing to do was to change the '4' to 'N' and increase the amount of shifting to 5, 6, 7... etc.,
e.g.
aaAudio.dacBuffer16[i] = (aaAudio.dacBuffer16[i] + 0x8000) >> N;
where N is an integer. I tried this, but I got a terribly distorted result which I did not understand.
While fiddling around trying different things, I found that the following works:
uint16_t sample;
int N = 5;
for (int i = 0; i < MAX_BUFFER_SIZE; i++)
{
    sample = (aaAudio.dacBuffer16[i] + 0x8000);
    sample = sample >> N;
    // sample = sample / 40;
    aaAudio.dacBuffer16[i] = sample;
}
You can also see that I have commented out a line that simply divides by a number, which works if I want finer control.
My problem is that I do not see what the difference is between the two bits of code.
Can anybody enlighten me?

Mixing audio channels

I am implementing an audio channel mixer using Viktor T. Toth's algorithm, trying to mix two audio channel streams.
In the code, quantization_ is the bit depth of a channel expressed in bytes. My mix function takes pointers to destination and source uint8_t buffers, mixes the two channels, and writes the result into the destination buffer. Because the data comes in as uint8_t buffers, I reassemble the actual 8-, 16- or 24-bit samples, do the addition, multiplication and division, and then convert them back to bytes.
Generally, it gives the expected output sample values. However, some samples turn out to have values near 0 when they are not supposed to, as I can see when I look at the output in Audacity. In the screenshot, the bottom two signals are the two mono channels and the top one is the mixed channel. It can be seen that there are some very low values, especially in the middle.
Below is my mix function:
void audio_mixer::mix(uint8_t* dest, const uint8_t* source)
{
    uint64_t mixed_sample = 0;
    uint64_t dest_sample = 0;
    uint64_t source_sample = 0;
    uint64_t factor = 0;

    for (int i = 0; i < channel_size_; ++i)
    {
        dest_sample = 0;
        source_sample = 0;
        factor = 1;

        for (int j = 0; j < quantization_; ++j)
        {
            dest_sample += factor * static_cast<uint64_t>(*dest++);
            source_sample += factor * static_cast<uint64_t>(*source++);
            factor = factor * 256;
        }

        mixed_sample = (dest_sample + source_sample) - (dest_sample * source_sample / factor);

        dest -= quantization_;
        for (int k = 0; k < quantization_; ++k)
        {
            *dest++ = static_cast<uint8_t>(mixed_sample % 256);
            mixed_sample = mixed_sample / 256;
        }
    }
}
It seems like you aren't treating the signed audio samples correctly. The horizontal line should be zero voltage in your audio signal.
If you look at the positive-voltage audio samples, they obey your equation correctly (except for the peak values in the center). The negative values are being compressed, which makes me think they are being treated as small positive voltages instead of negative voltages.
In other words, maybe those unsigned ints should be signed ints, so that the top bit indicates the voltage polarity and you can have audio samples in the range +127 to -128.
Those peak values in the center look like they are wrapping around at 255, which would be the peak value for an unsigned byte representation of your audio. I'm not sure how this would happen, but it seems related to the unsigned vs. signed issue.
Maybe you should try the other formula Viktor provided in his document:
Z = 2(A+B) - (AB/128) - 256
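As a rough sketch of my own (not the questioner's mixer), that formula applied to two unsigned 8-bit samples, with the result clamped to the valid byte range, might look like this:
#include <cstdint>

// Viktor T. Toth's formula as quoted above, Z = 2(A+B) - AB/128 - 256,
// applied to unsigned 8-bit samples centered at 128 and clamped to 0..255.
uint8_t mixBytes(uint8_t a, uint8_t b)
{
    int z = 2 * (a + b) - (a * b) / 128 - 256;
    if (z < 0)   z = 0;
    if (z > 255) z = 255;
    return static_cast<uint8_t>(z);
}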

C++ Convert string to 16-bit unsigned/signed int/float

I am in the process of making a simple program which loads vertices and triangles from a file (uints and floats).
They will be used in OpenGL, and I want them to be 16-bit (to conserve memory); however, I only know how to convert to 32-bit. I don't want to use assembly, because I want it to run on ARM as well.
So, is it possible to convert a string to a 16-bit int/float?
One possible answer would be something like this:
#include <cstdint>
#include <iostream>
#include <string>

int main()
{
    std::string str1 = "345";
    std::string str2 = "3.45";

    int myInt(std::stoi(str1));

    uint16_t myInt16(0);
    if (myInt <= static_cast<int>(UINT16_MAX) && myInt >= 0) {
        myInt16 = static_cast<uint16_t>(myInt);
    }
    else {
        std::cout << "Error : Manage your error the way you want to\n";
    }

    float myFloat(std::stof(str2));
}
For the vertex coordinates, you have a floating-point number X and you need to convert it to one of the 16-bit alternatives in OpenGL: GL_SHORT, GL_UNSIGNED_SHORT or GL_HALF_FLOAT. First, you need to decide whether you want to use integers or floating point.
If you're going with integers, I recommend unsigned integers, so that 0 maps to the minimal value and 65535 maps to the maximal value. With integers, you need to decide on the range of valid values for X.
Suppose you know that X is between Xmin and Xmax. Then you can calculate a GL_UNSIGNED_SHORT-compatible representation by:
unsigned short convert_to_GL_UNSIGNED_SHORT(float x, float xmin, float xmax) {
    if (x <= xmin)
        return 0;
    else if (x >= xmax)
        return 65535;
    else
        return (unsigned short)((x - xmin) / (xmax - xmin) * 65535 + 0.5f);
}
If you go with half floats, I suggest you look at 16-bit floats and GL_HALF_FLOAT.
For the face indices, you have unsigned 32-bit integers, right? If they are all below 65536, you can easily convert them to 16-bit unsigned shorts by
unsigned short i16 = (unsigned short)i32;
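A short sketch of that index conversion over a whole buffer, with a range check (the function name is my own, not from the question):
#include <cstdint>
#include <stdexcept>
#include <vector>

// Convert 32-bit face indices to 16-bit, assuming they all fit below 65536.
std::vector<uint16_t> indicesToU16(const std::vector<uint32_t>& indices)
{
    std::vector<uint16_t> out;
    out.reserve(indices.size());
    for (uint32_t i32 : indices) {
        if (i32 >= 65536)
            throw std::out_of_range("index does not fit in 16 bits");
        out.push_back(static_cast<uint16_t>(i32));
    }
    return out;
}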

How can you convert a std::bitset<64> to a double?

Is there a way to convert a std::bitset<64> to a double without using any external library (Boost, etc.)? I am using a bitset to represent a genome in a genetic algorithm and I need a way to convert a set of bits to a double.
The C++11 road:
#include <bitset>
#include <cstdint>

union Converter { uint64_t i; double d; };

double convert(std::bitset<64> const& bs) {
    Converter c;
    c.i = bs.to_ullong();
    return c.d;
}
EDIT: As noted in the comments, we can use char* aliasing as it is unspecified instead of being undefined.
#include <cstring>  // for memcpy

double convert(std::bitset<64> const& bs) {
    static_assert(sizeof(uint64_t) == sizeof(double), "Cannot use this!");

    uint64_t const u = bs.to_ullong();
    double d;

    // Aliases to `char*` are explicitly allowed in the Standard (and only them)
    char const* cu = reinterpret_cast<char const*>(&u);
    char* cd = reinterpret_cast<char*>(&d);

    // Copy the bitwise representation from u to d
    memcpy(cd, cu, sizeof(u));

    return d;
}
C++11 is still required for to_ullong.
Most people are trying to provide answers that let you treat the bit vector as though it directly contained an encoded int or double.
I would advise you to completely avoid that approach. While it does "work" for some definition of working, it introduces Hamming cliffs all over the place. You usually want your encoding to arrange things so that if two decoded values are near one another, then their encoded values are near one another as well. It also forces you to use 64 bits of precision.
I would manage the conversion manually. Say you have three variables to encode: x, y, and z. Your domain expertise can be used to say, for example, that -5 <= x < 5, 0 <= y < 100, and 0 <= z < 1, where you need 8 bits of precision for x, 12 bits for y, and 10 bits for z. This gives you a total search space of only 30 bits. You can have a 30-bit string, treat the first 8 bits as encoding x, the next 12 as y, and the last 10 as z. You are also free to Gray-code each one to remove the Hamming cliffs.
I've personally done the following in the past:
inline void binary_encoding::encode(const vector<double>& params)
{
    unsigned int start = 0;
    for(unsigned int param = 0; param < params.size(); ++param) {
        // m_bpp[i] = number of bits in encoding of parameter i
        unsigned int num_bits = m_bpp[param];

        // map the double onto the appropriate integer range
        // m_range[i] is a pair of (min, max) values for ith parameter
        pair<double,double> prange = m_range[param];
        double range = prange.second - prange.first;
        double max_bit_val = pow(2.0, static_cast<double>(num_bits)) - 1;
        int int_val = static_cast<int>((params[param] - prange.first) * max_bit_val / range + 0.5);

        // convert the integer to binary
        vector<int> result(m_bpp[param]);
        for(unsigned int b = 0; b < num_bits; ++b) {
            result[b] = int_val % 2;
            int_val /= 2;
        }

        if(m_gray) {
            for(unsigned int b = 0; b < num_bits - 1; ++b) {
                result[b] = !(result[b] == result[b+1]);
            }
        }

        // insert the bits into the correct spot in the encoding
        copy(result.begin(), result.end(), m_genotype.begin() + start);
        start += num_bits;
    }
}
inline void binary_encoding::decode()
{
    unsigned int start = 0;

    // for each parameter
    for(unsigned int param = 0; param < m_bpp.size(); param++) {
        unsigned int num_bits = m_bpp[param];
        unsigned int intval = 0;

        if(m_gray) {
            // convert from gray to binary
            vector<int> binary(num_bits);
            binary[num_bits-1] = m_genotype[start+num_bits-1];
            intval = binary[num_bits-1];
            for(int i = num_bits-2; i >= 0; i--) {
                binary[i] = !(binary[i+1] == m_genotype[start+i]);
                intval += intval + binary[i];
            }
        }
        else {
            // convert from binary encoding to integer
            for(int i = num_bits-1; i >= 0; i--) {
                intval += intval + m_genotype[start+i];
            }
        }

        // convert from integer to double in the appropriate range
        pair<double,double> prange = m_range[param];
        double range = prange.second - prange.first;
        double m = range / (pow(2.0, double(num_bits)) - 1.0);

        // m_phenotype is a vector<double> containing all the decoded parameters
        m_phenotype[param] = m * double(intval) + prange.first;
        start += num_bits;
    }
}
Note that for reasons that probably don't matter to you, I wasn't using bit vectors -- just ordinary vector<int> to encode things. And of course, there's a bunch of stuff tied into this code that isn't shown here, but you can probably get the basic idea.
One other note, if you're doing GPU calculations or if you have a particular problem such that 64 bits are the appropriate size anyway, it may be worth the extra overhead to stuff everything into native words. Otherwise, I would guess that the overhead you add to the search process will probably overwhelm whatever benefits you get by faster encoding and decoding.
Edit: I've decided that I was being a bit silly with this. While you do end up with a double, it assumes that the bitset holds an integer... which is a big assumption to make. You will end up with a predictable and repeatable value per bitset, but I still don't think this is what the author intended.
Well, if you iterate over the bit values and do
output_double += pow(2, 64 - (bit_position + 1)) * bit_value;
that would work, as long as it is big-endian.
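A minimal sketch of that loop, assuming bit_position 0 is meant to be the most significant end (std::bitset indexes from the least significant bit, hence the 63 - bit_position below):
#include <bitset>
#include <cmath>

// Accumulate the bitset's value into a double, most significant bit first.
double bitsetToDouble(const std::bitset<64>& bs)
{
    double output_double = 0.0;
    for (int bit_position = 0; bit_position < 64; ++bit_position) {
        int bit_value = bs[63 - bit_position] ? 1 : 0;
        output_double += std::pow(2.0, 64 - (bit_position + 1)) * bit_value;
    }
    return output_double;
}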