I have a 16-bit, 48 kHz, mono (1-channel) PCM audio file (with no header, but it would be the same with a WAV header anyway). I can read that file correctly in software such as Audacity; however, when I try to read it programmatically (in C++), some samples seem to be out of place, while most are correct when compared against the Audacity values.
My process of reading the PCM file is the following:
Convert the byte array of PCM data to a short array, to get readable values, by bit-shifting (the byte order is little-endian here):
for (int i = 0; i < bytesSize - 1; i += 2)
    shortValue[i / 2] = bytes[i] | (bytes[i + 1] << 8);
Note: bytes is a char array holding the binary contents of the PCM file, and shortValue is a short array.
Convert the short values to amplitude levels in a float array by dividing by the maximum value of a short (32767):
for (int i = 0; i < shortsSize; i++)
    amplitude[i] = static_cast<float>(shortValue[i]) / 32767;
This is obviously not optimal code, and I could do it in one loop, but I have separated the two steps for the sole purpose of explaining.
What happens exactly is that when I try to find very big changes of amplitude level in my final array, it shows me samples that are not correct. In Audacity the wave is perfectly smooth: the sample at 276,467 (marked in green) goes just a bit lower to the next sample (marked in red), which should be around -0.17. However, when reading from my code, I get a totally wrong value for the red sample (-0.002), while still getting a good value for the green sample (around -0.17); the sample after the red one is also correct (around -0.17 as well).
I don't really understand what's happening or how Audacity is able to read those bytes correctly. I have tried multiple PCM/WAV files and I get the same results. Any help would really be appreciated!
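For anyone hitting the same symptom, the usual culprit is that plain char is signed on most platforms, so a byte with its high bit set sign-extends to a negative int before the OR and corrupts the high byte of the assembled sample. A minimal sketch of the two steps combined, with the low byte masked (it reuses the variable names from the question):

// Sketch: combine both steps; the & 0xFF masks stop a signed char
// from sign-extending into the high bits of the assembled sample.
for (int i = 0; i + 1 < bytesSize; i += 2)
{
    short s = static_cast<short>((bytes[i] & 0xFF) | ((bytes[i + 1] & 0xFF) << 8));
    amplitude[i / 2] = static_cast<float>(s) / 32767.0f;
}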
I am writing a bit of code in C++ where I want to play a .wav file and perform an FFT on it (with fftw) as it plays (and eventually display that FFT on screen with ncurses). This is mainly just a "for giggles/to see if I can" project, so I have no restrictions on what I can or can't use, aside from wanting to keep the result fairly lightweight and cross-platform (I'm doing this on Linux for the moment). I'm also trying to do this "right" and not just hack it together.
I'm using SDL2_audio to achieve the playback, which is working fine. The callback is called at some interval requesting N bytes (seems to be desiredSamples*nChannels). My idea is that, at the same time as I'm copying the memory from my input buffer to SDL, I might as well also copy it into fftw3's input array to run an FFT on it. Then I can just set ncurses to refresh at whatever rate I'd like, separate from the audio callback frequency, and it'll just pull the most recent data from the output array.
The catch is that the input file is formatted with the channels interleaved, i.e. "(LR) (LR) (LR) ...". So while SDL expects this, I need a way to extract just one channel to send to FFTW.
The audio callback format from SDL looks like so:
void myAudioCallback(void* userdata, Uint8* stream, int len) {
    SDL_memset(stream, 0, len);          // note: sizeof(stream) would only clear pointer-size bytes
    SDL_memcpy(stream, audio_pos, len);  // hand SDL the next len bytes
    audio_pos += len;
}
where userdata is (currently) unused, stream is the array that SDL wants filled, and len is the length of stream (i.e. the number of bytes SDL is looking for).
As far as I know there's no way to get memcpy to copy only every other sample (read: copy N bytes, skip M, copy N, etc.). My current best idea is a brute-force for loop along the lines of...
// pseudocode: take the first (left) sample of each interleaved frame
for (int i = 0; i < len / (2 * sizeof(sample)); i++) {
    fftw_in[i] = ((sample*)audio_pos)[2 * i];
}
or even more brute force by just reading the file a second time and only taking every other byte or something.
Is there another way to go about accomplishing this, or is one of these my best option? It feels kind of kludgey to go from a nice one-line memcpy to send the data to SDL to some sort of weird loop to send it to fftw.
The OP's solution can be simplified (for copying bytes):
// pseudocode
const char* s = audio_pos;
for (int d = 0; s < audio_pos + len; d++, s += 2 * sizeof(sample)) {
    fftw_in[d] = *s;
}
If I knew what fftw_in is, I would memcpy blocks of sizeof(*fftw_in).
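For instance, assuming 16-bit samples and that fftw_in is fftw3's double input array (both assumptions, since the question doesn't say), the per-sample copy could look like this:

// Sketch: copy the left channel of interleaved 16-bit stereo into the
// FFT input buffer; audio_pos and len are as in the callback above.
const Sint16* samples = reinterpret_cast<const Sint16*>(audio_pos);
const int frames = len / (2 * static_cast<int>(sizeof(Sint16)));
for (int i = 0; i < frames; i++) {
    fftw_in[i] = static_cast<double>(samples[2 * i]);  // even indices = left channel
}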
Please check the assembly generated by @S.M.'s solution.
If the code is not vectorized, I would use intrinsics (depending on your hardware support), such as _mm_mask_blend_epi8.
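For example, a hand-vectorized deinterleave could be built on SSSE3's _mm_shuffle_epi8, which is more widely available than the AVX-512 blend above; this is only a sketch, and the function and variable names are made up:

#include <tmmintrin.h>  // SSSE3

// Sketch: gather the left channel of interleaved 16-bit stereo,
// 4 frames (16 bytes) per iteration. Assumes nFrames is a multiple
// of 4; any tail would be handled by a scalar loop.
void takeLeftChannel(const short* interleaved, short* left, int nFrames) {
    // Select bytes 0-1, 4-5, 8-9, 12-13 (the left samples); -1 zeroes a lane.
    const __m128i mask = _mm_setr_epi8(0, 1, 4, 5, 8, 9, 12, 13,
                                       -1, -1, -1, -1, -1, -1, -1, -1);
    for (int i = 0; i < nFrames; i += 4) {
        __m128i v = _mm_loadu_si128(
            reinterpret_cast<const __m128i*>(interleaved + 2 * i));
        __m128i l = _mm_shuffle_epi8(v, mask);  // left samples in the low 64 bits
        _mm_storel_epi64(reinterpret_cast<__m128i*>(left + i), l);
    }
}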
I have tried to make a simple wav writer. I wanted to do this so that I could read in a wav file (using a pre-existing wav reader), resample the audio data then write the resampled data to another wav file. Input files could be 16 bitsPerSample or 32 bitsPerSample and I wanted to save the resampled audio with the same number of bitsPerSample.
The writer is working, but there are a couple of things to do with endianness that I don't understand, and I was hoping someone might be able to help me.
I previously had no experience of reading or writing binary files. I began by looking up the wav file format online and tried to write the data following the correct format. At first the writing wasn't working, but I then found out that wav files are little-endian, and it was trying to make my file writer consistent with this that brought up the majority of my problems.
I have since got the wav writer to work (verified by a test whereby I read in a wav file and checked that I could write the unresampled audio back out and reproduce the exact same file); however, there are still a couple of points on endianness I am unsure about.
Assuming the relevant variables have already been set here is my code for the wav writer:
// Write RIFF header
out_stream.write(chunkID.c_str(),4);
out_stream.write((char*)&chunkSize,4);
out_stream.write(format.c_str(),4);
// Write format chunk
out_stream.write(subchunk1ID.c_str(),4);
out_stream.write((char*)&subchunk1Size,4);
out_stream.write((char*)&audioFormat,2);
out_stream.write((char*)&numOfChannels,2);
out_stream.write((char*)&sampleRate,4);
out_stream.write((char*)&byteRate,4);
out_stream.write((char*)&blockAlign,2);
out_stream.write((char*)&bitsPerSample,2);
// Write data chunk
out_stream.write(subchunk2ID.c_str(),4);
out_stream.write((char*)&subchunk2Size,4);
// Variables for writing 16 bitsPerSample data
std::vector<short> soundDataShort;
soundDataShort.resize(numSamples);
char theSoundDataBytes [2];
// soundData samples are written as shorts if bitsPerSample=16 and floats if bitsPerSample=32
switch( bitsPerSample )
{
case (16):
// cast each of the soundData samples from floats to shorts
// then save the samples in little-endian form (requires reversal of byte-order of the short variable)
for (int sample=0; sample < numSamples; sample++)
{
soundDataShort[sample] = static_cast<short>(soundData[sample]);
theSoundDataBytes[0] = (soundDataShort[sample]) & 0xFF;
theSoundDataBytes[1] = (soundDataShort[sample] >> 8) & 0xFF;
out_stream.write(theSoundDataBytes,2);
}
break;
case (32):
// save the soundData samples in binary form (does not require change to byte order for floats)
out_stream.write((char*)&soundData[0],numSamples*sizeof(float));
break;
}
The questions that I have are:
In the soundData vector why does the endianness of a vector of shorts matter but the vector of floats doesn't? In my code I have reversed the byte order of the shorts but not the floats.
Originally I tried to write the shorts without reversing the byte order. When I wrote the file it ended up being half the size it should have been (i.e. half the audio data was missing, but the half that was there sounded correct), why would this be?
I have not reversed the byte order of the shorts and longs in the other single variables (essentially all the other fields that make up the wav file, e.g. sampleRate, numOfChannels, etc.), but this does not seem to affect the playing of the wav file. Is this just because media players do not use these fields (and hence I can't tell that I have got them wrong), or is it because the byte order of these variables does not matter?
In the soundData vector why does the endianness of a vector of shorts matter but the vector of floats doesn't? In my code I have reversed the byte order of the shorts but not the floats.
Actually, if you take a closer look at your code, you will see that you are not reversing the endianness of your shorts at all. Nor do you need to, on Intel CPUs (or on any other little-endian CPU).
Originally I tried to write the shorts without reversing the byte order. When I wrote the file it ended up being half the size it should have been (i.e. half the audio data was missing, but the half that was there sounded correct), why would this be?
I have no idea without seeing the code but I suspect that some other factor was in play.
I have not reversed the byte order of the shorts and longs in the other single variables (essentially all the other fields that make up the wav file, e.g. sampleRate, numOfChannels, etc.), but this does not seem to affect the playing of the wav file. Is this just because media players do not use these fields (and hence I can't tell that I have got them wrong), or is it because the byte order of these variables does not matter?
These fields are in fact very important and must also be little-endian, but, as we have seen, you don't need to swap those either.
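For completeness, if this code ever did run on a big-endian machine, a 16-bit swap before writing would look something like the sketch below (again, not needed on x86):

// Sketch: byte-swap a 16-bit sample; only needed on a big-endian host
// when writing the little-endian wav format.
short swap16(short v)
{
    unsigned short u = static_cast<unsigned short>(v);
    return static_cast<short>((u >> 8) | (u << 8));
}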
Currently I'm working on an Arduino/Nanode project where we want to play a collection of WAV files stored on an SD card, with PWM on clock OCR0.
- I'm able to play the PWM perfectly, starting from the sketch from Michael Smith on the Arduino website: http://www.arduino.cc/playground/Code/PCMAudio
- I'm able to read the SD card correctly and convert the data to 8-bit integers that look correct when I print them to the serial window.
The problem I have is when I feed these integers in as the PWM compare value.
As I said, when I use the original PWM audio sketch with my own WAV file converted to a .h file (through wav2c), it works and sounds good. When I read the SD card it shows me the correct values, both when I read the WAV files directly and also (what I'm trying in my latest version, posted here) when I convert them to text files and read those. But when I feed the integers from the text file into the PWM, I hear a horn-like sound, as if the PWM were using the wrong values for its output.
I'm guessing the problem is somewhere in the casting of the data into the byte type the ATmega uses, but I don't have any clue where to look or how to solve it. I noticed that the original file uses unsigned chars where I'm using uint8_t. I tried casting them, but it's not working.
Does anyone have any experience with this? Or any clue how I could possibly solve it?
Many thanks for your help and time!
Jeroen
PS: Below is the piece of my code where I read through the text files and convert the values to integers. They always consist of 3 characters (the value 21, for example, is printed as 021 in the file), separated by a comma, which the code skips with the 4th myFile.read().
myFile = SD.open(FileName);

char sampleTMP[4];
sampleTMP[0] = myFile.read();   // three ASCII digits, e.g. "021"
sampleTMP[1] = myFile.read();
sampleTMP[2] = myFile.read();
sampleTMP[3] = 0;               // terminate the string for atoi
myFile.read();                  // skip the comma separator

unsigned char ss;
ss = atoi(sampleTMP);
Serial.println(ss, DEC);
OCR0A = ss;                     // feed the value to both PWM compare registers
OCR0B = ss;
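As a point of reference, the same parsing can be wrapped in a small helper with an end-of-file check; this is a sketch against the standard SD File API, and readSample is a made-up name:

// Sketch: read one "NNN," record and return the value, or -1 at end of file.
// Assumes the fixed three-digits-plus-comma format described above.
int readSample(File &f) {
    char buf[4];
    for (int i = 0; i < 3; i++) {
        int c = f.read();
        if (c < 0) return -1;   // ran out of data
        buf[i] = (char)c;
    }
    buf[3] = '\0';
    f.read();                   // skip the comma
    return atoi(buf);           // 0..255 fits the 8-bit OCR0A/OCR0B registers
}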
I have a question regarding file reading, and I am getting frustrated over it, as I am doing some handwriting-recognition development and the tool I am using doesn't seem to read my training data file.
So I have one file which works perfectly fine. I paste some contents of that file here:
è Aڈ2*A ê“AêA mwA)àXA$NلAئ~A›إA:ozA)"ŒA%IœA&»ّAم3ACA
|®AH÷AD¢A ô-A گ&AJXAsAA mGA قQAٍALs#÷8´A
The file is in a format I know about: the first 12 bytes are 2 longs and 2 shorts, most probably with the values 4, 1000, 1024 and 9, but I cannot read the file to get these values.
Actually, I also want to write my own first 12 bytes in a format similar to the one mentioned above, and I don't seem to get how to do it.
I forgot to mention that the remaining data are floating-point values. When I write data to a file I get human-readable text, not these symbols, and when I read these symbols I do not get the actual values. How do I get the actual floats and integers out of these symbols?
My code is
struct rec
{
    long a;
    long b;
    short c;
    short d;
}; // this is the struct

FILE *pFile;            // opened elsewhere in binary mode ("rb")
struct rec my_record;

// then I read using fread
fread(&my_record, 1, sizeof(my_record), pFile);
and the values I get in a, b, c and d are 85991456, -402448352, 8193 and 2336 instead of the actual values.
First of all, you should open that file in a hex editor to see exactly what bytes it contains. From the text excerpt you have posted, I think it does not contain 4, 1000, 1024 and 9 as you expect; but text form can be very misleading, because different character encodings show different characters for the same sequences of bytes.
If you have confirmed that the file contains the expected data, there may still be other issues. One of these is endianness: some machines and file formats encode a 4-byte long with the least significant byte first, while others read and write the most significant byte first.
Another issue concerns the long data type you use. If your computer has a 64-bit architecture and you are using Linux, long is a 64-bit value, and your structure becomes 24 bytes long (20 bytes of fields plus tail padding) instead of 12.
Edit:
To read big-endian longs on a little-endian machine like yours, you should read the data byte by byte and build the longs from it manually:
// Read 4 bytes
unsigned char buf[4];
fread(buf, 4, 1, pFile);
// Convert to long, most significant byte first
my_record.a = (((long)buf[0]) << 24) | (((long)buf[1]) << 16)
            | (((long)buf[2]) << 8)  |  ((long)buf[3]);
The compiler may add padding between and after your structure members to satisfy their alignment requirements, which changes the structure's size and layout. You should read the predefined data types one at a time with fread, instead of fread-ing your whole structure at once.
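Putting the two answers together, a self-contained sketch of reading the 12-byte header with fixed-width types (so neither padding nor the platform's long size matters), assuming big-endian data as above:

#include <stdio.h>
#include <stdint.h>

struct rec { int32_t a, b; int16_t c, d; };

// Sketch: assemble each field from raw bytes, most significant byte first.
static int read_rec(FILE *f, struct rec *r)
{
    unsigned char buf[12];
    if (fread(buf, 1, sizeof buf, f) != sizeof buf)
        return 0;  // short read
    r->a = (int32_t)(((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
                   | ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3]);
    r->b = (int32_t)(((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16)
                   | ((uint32_t)buf[6] << 8)  |  (uint32_t)buf[7]);
    r->c = (int16_t)((buf[8] << 8) | buf[9]);
    r->d = (int16_t)((buf[10] << 8) | buf[11]);
    return 1;
}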
I've recently been working on ID3v2.4.0.
Reading the 2.4.0 document, I found a particular part that I can't understand: the sync-safe integer.
Why does ID3v2 use this method?
Of course, I know why ID3v2 uses the unsynchronization scheme: it is used to keep an MPEG decoder from considering the ID3 tag as MPEG sync data.
But what I can't understand is why a sync-safe integer is used instead of the unsynchronization scheme (i.e. inserting $00).
Is there any reason why they adopted the sync-safe integer for expressing the tag size instead of inserting $00?
These two methods have exactly the same effect.
The ID3v2 document says that the size of unsynchronized data is not known in advance.
But that statement does not make sense to me.
If the tag data is stored in a buffer, one can know the size of the unsynchronized data after simply replacing each problematic $FF byte with $FF 00.
Is there anyone who can help me?
I would presume it is for simplicity, and because the unsynch/synch scheme only makes sense when used on an MPEG file.
It is trivial to read in the four bytes and convert them to a regular integer:
// pseudo code
unsigned char b[4];
file.read( reinterpret_cast<char*>(b), 4 );
uint32_t size = (uint32_t(b[0]) << 24) | (uint32_t(b[1]) << 16) |
                (uint32_t(b[2]) << 8)  |  uint32_t(b[3]);  // stored MSB first
size = (size & 0x0000007F) |
       ( (size & 0x00007F00) >> 1 ) |
       ( (size & 0x007F0000) >> 2 ) |
       ( (size & 0x7F000000) >> 3 );
If they used the same unsynch scheme as the frame data, you would need to read each byte separately, look for the $FF 00 pattern, and reconstruct the integer byte by byte. Also, if the 'size' field in the header could be a variable number of bytes, due to unsynch bytes being inserted, the entire header would be a variable number of bytes. It is simpler for them to say 'the header is always 10 bytes in size and it looks like this...'.
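The inverse, writing a 28-bit size as four sync-safe bytes, is just as short; a sketch:

// Sketch: encode a 28-bit value as a sync-safe integer, MSB first,
// keeping the top bit of every byte clear.
void writeSyncSafe(uint32_t size, unsigned char out[4])
{
    out[0] = (size >> 21) & 0x7F;
    out[1] = (size >> 14) & 0x7F;
    out[2] = (size >> 7)  & 0x7F;
    out[3] =  size        & 0x7F;
}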
The ID3v2 document says that the size of unsynchronized data is not known in advance. But that statement does not make sense. If the tag data is stored in a buffer, one can know the size of the unsynchronized data after simply replacing each problematic $FF byte with $FF 00.
You are correct, it doesn't make sense. The size written in the id3v2 header and frame headers is the size after unsynchronisation, if any, was applied. However, it is permissible to write frame data without unsynching, as id3v2 may be used for tagging files other than mp3, where the concept of unsynch/synch makes no sense. I think what section 6.2 was trying to say is: 'regardless of whether this is an mp3 file, or whether a frame is written unsynched or synched, the frame size is always written in an mpeg sync-safe manner'.
ID3v2.4 frames can have the ‘Data Length Indicator’ flag set in the frame header, in which case you can find out how big a buffer is after synchronisation. Refer to section 4.1.2 of the spec.
Is there anyone who can help me?
Some helpful advice from someone who has written a conforming id3v2 tag reader: Don't try make sense of the spec. It surely was written by madmen and sadists. Just looking at it again is giving me nightmares.