Endianness in wav files - C++

I have tried to make a simple wav writer. I wanted to do this so that I could read in a wav file (using a pre-existing wav reader), resample the audio data then write the resampled data to another wav file. Input files could be 16 bitsPerSample or 32 bitsPerSample and I wanted to save the resampled audio with the same number of bitsPerSample.
The writer is working, but there are a couple of things to do with endianness that I don't understand.
I previously had no experience of reading or writing binary files. I began by looking up the wav file format online and tried to write the data following the correct format. At first the writing wasn't working, but I then found out that wav files are little-endian, and it was trying to make my file writer consistent with this that caused the majority of my problems.
I have got the wav writer to work now (by way of a test whereby I read in a wav file and checked I could write the un-resampled audio back out and reproduce the exact same file); however, there are a couple of points to do with endianness that I am still unsure about and I was hoping someone may be able to help me.
Assuming the relevant variables have already been set, here is my code for the wav writer:
// Write RIFF header
out_stream.write(chunkID.c_str(),4);
out_stream.write((char*)&chunkSize,4);
out_stream.write(format.c_str(),4);
// Write format chunk
out_stream.write(subchunk1ID.c_str(),4);
out_stream.write((char*)&subchunk1Size,4);
out_stream.write((char*)&audioFormat,2);
out_stream.write((char*)&numOfChannels,2);
out_stream.write((char*)&sampleRate,4);
out_stream.write((char*)&byteRate,4);
out_stream.write((char*)&blockAlign,2);
out_stream.write((char*)&bitsPerSample,2);
// Write data chunk
out_stream.write(subchunk2ID.c_str(),4);
out_stream.write((char*)&subchunk2Size,4);
// Variables for writing 16 bitsPerSample data
std::vector<short> soundDataShort;
soundDataShort.resize(numSamples);
char theSoundDataBytes [2];
// soundData samples are written as shorts if bitsPerSample=16 and floats if bitsPerSample=32
switch( bitsPerSample )
{
    case (16):
        // cast each of the soundData samples from floats to shorts
        // then save the samples in little-endian form (requires reversal of byte-order of the short variable)
        for (int sample=0; sample < numSamples; sample++)
        {
            soundDataShort[sample] = static_cast<short>(soundData[sample]);
            theSoundDataBytes[0] = (soundDataShort[sample]) & 0xFF;
            theSoundDataBytes[1] = (soundDataShort[sample] >> 8) & 0xFF;
            out_stream.write(theSoundDataBytes,2);
        }
        break;
    case (32):
        // save the soundData samples in binary form (does not require change to byte order for floats)
        out_stream.write((char*)&soundData[0],numSamples*sizeof(float));
        break;
}
The questions that I have are:
In the soundData vector why does the endianness of a vector of shorts matter but the vector of floats doesn't? In my code I have reversed the byte order of the shorts but not the floats.
Originally I tried to write the shorts without reversing the byte order. When I wrote the file it ended up being half the size it should have been (i.e. half the audio data was missing, but the half that was there sounded correct), why would this be?
I have not reversed the byte order of the shorts and longs in the other single variables, which are essentially all the other fields that make up the wav file (e.g. sampleRate, numOfChannels, etc.), but this does not seem to affect the playing of the wav file. Is this just because media players do not use these fields (and hence I can't tell that I have got them wrong), or is it because the byte order of these variables does not matter?

In the soundData vector why does the endianness of a vector of shorts matter but the vector of floats doesn't? In my code I have reversed the byte order of the shorts but not the floats.
Actually, if you take a closer look at your code, you will see that you are not reversing the endianness of your shorts at all. Nor do you need to, on Intel CPUs (or on any other little-endian CPU).
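To make that concrete, here is a minimal standalone sketch (not taken from your code) showing that the byte-split in your 16-bit branch produces exactly the bytes a little-endian CPU already stores in memory:
#include <cassert>
#include <cstring>

int main()
{
    short sample = 0x1234;            // example value

    // byte-split as in the 16-bit branch: low byte first, then high byte
    char split[2];
    split[0] = sample & 0xFF;         // 0x34
    split[1] = (sample >> 8) & 0xFF;  // 0x12

    // raw bytes of the short exactly as they sit in memory
    char raw[2];
    std::memcpy(raw, &sample, 2);

    // on a little-endian machine both buffers are {0x34, 0x12}, so the
    // "reversal" is really just the native byte order written out as-is
    assert(std::memcmp(split, raw, 2) == 0);
}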
Originally I tried to write the shorts without reversing the byte order. When I wrote the file it ended up being half the size it should have been (i.e. half the audio data was missing, but the half that was there sounded correct), why would this be?
I have no idea without seeing the code, but I suspect that some other factor was in play; for example, passing the number of samples rather than the number of bytes to write() would drop half of the 16-bit data.
I have not reversed the byte order of the shorts and longs in the other single variables, which are essentially all the other fields that make up the wav file (e.g. sampleRate, numOfChannels, etc.), but this does not seem to affect the playing of the wav file. Is this just because media players do not use these fields (and hence I can't tell that I have got them wrong), or is it because the byte order of these variables does not matter?
These fields are in fact very important and must also be little-endian, but, as we have seen, you don't need to swap those either.
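If you ever need the writer to produce correct files on a big-endian machine as well, a small helper that always emits bytes in little-endian order covers both the header fields and the samples. A minimal sketch (the helper name is made up, not part of your code):
#include <cstdint>
#include <ostream>

// write the low 'nbytes' bytes of 'value' in little-endian order,
// regardless of the host CPU's byte order
void write_le(std::ostream& out, std::uint32_t value, int nbytes)
{
    for (int i = 0; i < nbytes; ++i)
    {
        char byte = static_cast<char>((value >> (8 * i)) & 0xFF);
        out.write(&byte, 1);
    }
}

// usage with the fields from your code, e.g.:
// write_le(out_stream, chunkSize, 4);
// write_le(out_stream, audioFormat, 2);
// write_le(out_stream, sampleRate, 4);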

Related

Reading PCM audio file is giving sometimes wrong samples

I have a 16 bit, 48kHz, 1-channel (mono) PCM audio file (with no header, but it would be the same with a WAV header anyway) and I can read that file correctly using software such as Audacity. However, when I try to read it programmatically (in C++), some samples just seem to be out of place while most are correct when compared to the Audacity values.
My process of reading the PCM file is the following:
Convert the byte array of PCM to a short array to get readable values by bitshifting (the order of bytes is little-endian here).
for(int i = 0; i < bytesSize - 1; i += 2)
    shortValue[i / 2] = bytes[i] | bytes[i + 1] << 8;
Note: bytes is a char array holding the binary contents of the PCM file, and shortValue is a short array.
Convert the short values to Amplitude levels in a float array by dividing by the max value of short (32767)
for(int i = 0; i < shortsSize; i++)
    amplitude[i] = static_cast<float>(shortValue[i]) / 32767;
This is obviously not optimal code and I could do it in one loop but for the sole purpose of explaining I separated the two steps.
So what happens exactly is that when I look for very big changes of amplitude level in my final array, I find samples that are not correct. For example, in Audacity the wave is perfectly smooth: notice how the sample at 276,467 (pointed in green) goes just a bit lower to the next sample (pointed in red), which should be around -0.17.
However, when reading from my code, I get a totally wrong value for the red sample (-0.002), while still getting a good value for the green sample (around -0.17); the sample after the red one is also correct (around -0.17 as well).
I don't really understand what's happening or how Audacity is able to read those bytes correctly. I have tried with multiple PCM/WAV files and I get the same results. Any help would really be appreciated!
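For reference, a variant of the byte-to-short step that forces each byte to unsigned before combining looks like this (just a sketch; whether sign extension of the char values is actually what causes the discrepancy is only a guess):
#include <cstdint>
#include <cstddef>

// assemble little-endian 16-bit samples; casting each byte to unsigned
// keeps a low byte >= 0x80 from sign-extending into the high byte
void bytesToShorts(const char* bytes, std::size_t bytesSize, std::int16_t* shortValue)
{
    for (std::size_t i = 0; i + 1 < bytesSize; i += 2)
    {
        std::uint8_t lo = static_cast<std::uint8_t>(bytes[i]);
        std::uint8_t hi = static_cast<std::uint8_t>(bytes[i + 1]);
        shortValue[i / 2] = static_cast<std::int16_t>(lo | (hi << 8));
    }
}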

Sending Binary File Data via Google Protobuf

I have my protobuf message set up fine, it seems; all the other fields I have transmit correctly across the network and do not truncate. I only have one problem: when I read the binary data of a picture or file and then send it through Google protobuf as a bytes-type field, on the other side it only contains the first 4 elements of the array. If the picture is, say, 200kb, on the other end it comes out as 1kb (basically only containing a header or identifier). This problem is kinda complex so I will try to give a run-down. Sorry if I make this impossible to understand. I may be going about this completely the wrong way.
Example below contains conceptual work, and was written in class. It very well could contain small errors. The code compiles at home, and if it is a typo let me know and I can fix it.
FILE* file;
FILE* ofile;
file = fopen("red.png", "rb");
fseek(file, 0, SEEK_END);
long fSize = ftell(file);
rewind(file);
BYTE* ret = new BYTE[fSize];
fread(ret, 1, fSize, file);
fclose(file);
char dataStream[1024] //yes it is large enough
myPacket.set_file(ret);
//set other fields here
myPacket.SerializeToArray(dataStream,sizeof(dataStream));
//send through sockets below, works for all but file field.
I can include more when I get back home to my main work computer, sorry, was just hoping I could let this stew while at class. If this is not enough info feel free to give me the smack down, it's alright just looking for advice. I also know that certain image formats can be read certain ways, but I was able to copy a png and rewrite it through binary locally, just not over protobuf
Thanks for reading my pseudo book guys, I am finally trying to leap into improving my knowledge.
Edit: fixed a quickly-typed pointer error, (&ret) to (ret). Also, should the size then be sizeof(myPacket) instead?
You have written this:
char dataStream[1024] //yes it is large enough
But how could a 1024-byte buffer be large enough if you want to store 200,000 bytes in it?
Better allocate a bigger buffer on the heap, e.g.:
std::vector<char> dataStream(500000);
myPacket.SerializeToArray(&dataStream[0], dataStream.size());
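A slightly more robust variant is to size the buffer from the message itself, or to serialize straight into a std::string (a sketch; ByteSizeLong() is the proto3 C++ name, older releases call it ByteSize()):
// size the buffer from the message and check the result
std::vector<char> dataStream(myPacket.ByteSizeLong());
if (!myPacket.SerializeToArray(dataStream.data(), static_cast<int>(dataStream.size())))
{
    // handle the error instead of silently sending a truncated buffer
}

// or skip the manual buffer entirely
std::string wire;
myPacket.SerializeToString(&wire);
// send wire.data(), wire.size() through the socket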

Reading subchunk2 data of a wav file in C++

I am trying to read the data part of a .wav file into a buffer. I have already read the header part according to C++ Reading the Data part of a WAV file
Therefore, my file pointer wavFile now points to the beginning of the data section. Then I use the following code to read audio data into a buffer.
long bytes = wavHeader.bitsPerSample/8;
long buffsize = wavHeader.Subchunk2Size/bytes;
int16_T *audiobuf = new int16_T[buffsize];
fread(audiobuf, bytes, buffsize, wavFile);
// do some processing
delete[] audiobuf;
In my test audio file, bitsPerSample is 16 and Subchunk2Size is 79844. Therefore, buffsize is 39922.
After running this code, I noticed that only the first 256 positions of audiobuf get filled. But theoretically there should be 39922 entries of audio data. How can I sort out this issue?
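One way to narrow this down (a small diagnostic sketch, not part of the original code) is to check what fread actually reports, since it returns the number of complete elements read:
size_t numRead = fread(audiobuf, bytes, buffsize, wavFile);
if (numRead != (size_t)buffsize)
{
    // the read stopped early: check feof(wavFile) / ferror(wavFile),
    // and make sure the file was opened in binary mode ("rb")
    printf("read %zu of %ld elements\n", numRead, buffsize);
}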

understanding format of file

I have a question regarding file reading, and I am getting frustrated over it, as I am doing some handwriting recognition development and the tool I am using doesn't seem to read my training data file.
So I have one file which works perfectly fine. I paste some contents of that file here:
è Aڈ2*A ê“AêA mwA)àXA$NلAئ~A›إA:ozA)"ŒA%IœA&»ّAم3ACA
|®AH÷AD¢A ô-A گ&AJXAsAA mGA قQAٍALs#÷8´A
The file is in a format I know about: the first 12 bytes are 2 longs and 2 shorts, with the values most probably being 4, 1000, 1024, 9, but I cannot read the file to get these values.
Actually I want to write my own first 12 bytes in a format similar to the one mentioned above, and I don't seem to get how to do it.
Forgot to mention that the remaining data are floating-point values. When I write data into a file I get human-readable text, not these symbols, and when I read these symbols I do not get the actual values. How do I get the actual floats and integers from these symbols?
My code is
struct rec
{
    long a;
    long b;
    short c;
    short d;
}; // this is the struct

FILE *pFile;
struct rec my_record;

// then I read using fread
fread(&my_record, 1, sizeof(my_record), pFile);
and the values I get in a, b, c and d are 85991456, -402448352, 8193, and 2336 instead of the actual values.
First of all, you should open that file in a hex editor, to see exactly what bytes it contains. From the text excerpt you have posted I think it does not contain 4, 1000, 1024 and 9 as you expect, but text form may be very misleading, because different character encodings show different characters for the same sequences of bytes.
If you have confirmed that the file contains the expected data, there may still be other issues. One of these is endianness: some machines and file formats encode a 4-byte long with the least significant byte first, while others read and write the most significant byte first.
Another issue concerns the long data type you use. If your computer has a 64-bit architecture and you are using Linux, long is a 64-bit value, and your structure grows well beyond 12 bytes (two 8-byte longs, two 2-byte shorts, plus trailing padding, typically 24 bytes in total).
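If you want the structure itself to stay 12 bytes regardless of the platform's long size, fixed-width types from <cstdint> help; a sketch (not the original code):
#include <cstdint>

// 4 + 4 + 2 + 2 = 12 bytes, matching the expected file layout on common platforms
struct rec
{
    std::int32_t a;
    std::int32_t b;
    std::int16_t c;
    std::int16_t d;
};

static_assert(sizeof(rec) == 12, "unexpected padding in rec");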
Edit:
To read big-endian longs on a little-endian machine like yours, you should read the data byte by byte and build the longs from them manually:
// Read 4 bytes
unsigned char buf[4];
fread(buf, 4, 1, pFile);
// Convert to long
my_record.a = (((long)buf[0]) << 24) | (((long)buf[1]) << 16) | (((long)buf[2]) << 8) | ((long)buf[3]);
The compiler adds padding to your structure members to make them (typically) 4-byte aligned. In this case variables c and d are padded.
You should read the individual fields one at a time with fread instead of reading the whole structure in one go.
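A sketch of that idea, assuming the file really starts with two 4-byte integers followed by two 2-byte integers (the function name is just illustrative):
#include <cstdint>
#include <cstdio>

// read the 12-byte header one field at a time, so struct padding and the
// platform's 'long' size can never get in the way; if the file turns out
// to be big-endian, swap each value's bytes afterwards
bool readHeader(std::FILE* pFile, std::int32_t& a, std::int32_t& b,
                std::int16_t& c, std::int16_t& d)
{
    return std::fread(&a, sizeof a, 1, pFile) == 1
        && std::fread(&b, sizeof b, 1, pFile) == 1
        && std::fread(&c, sizeof c, 1, pFile) == 1
        && std::fread(&d, sizeof d, 1, pFile) == 1;
}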

Why sync-safe integer?

I'm currently working on ID3v2.4.0.
Reading the 2.4.0 document, I found a particular part that I can't understand: the sync-safe integer.
Why does ID3v2 use this method?
Of course, I know why ID3v2 uses the unsynchronization scheme, which is used to keep an MPEG decoder from mistaking the ID3 tag for MPEG sync data.
But what I can't understand is why a sync-safe integer is used instead of the unsynchronization scheme (i.e. inserting $00).
Is there any reason why they adopted sync-safe integers for expressing the tag size instead of inserting $00?
These two methods have exactly the same effect.
The ID3v2 document says that the size of unsynchronized data is not known in advance.
But that statement does not make sense.
If the tag data is stored in a buffer, one can know the size of the unsynchronized data after simply replacing each problematic $FF byte with $FF 00.
Is there anyone who can help me?
I would presume it is for simplicity, and because the unsynch/synch scheme only makes sense when used on an MPEG file.
It is trivial to read in the four bytes and convert them to a regular integer:
// pseudo code: assumes the four size bytes end up in 'size' with the first
// file byte as the most significant byte (on a little-endian machine you
// would byte-swap after the read, or assemble the bytes manually)
uint32_t size;
file.read( (char*)&size, sizeof(uint32_t) );
size = (size & 0x0000007F) |
       ( (size & 0x00007F00) >> 1 ) |
       ( (size & 0x007F0000) >> 2 ) |
       ( (size & 0x7F000000) >> 3 );
If they used the same unsynch scheme as frame data you would need to read each byte separately, look for the FF00 pattern, and reconstruct the integer byte by byte. Also, if the ‘size’ field in the header could be a variable number of bytes, due to unsynch bytes being inserted, the entire header would be a variable number of bytes. Simpler for them to say 'the header is always 10 bytes in size and it looks like this...'.
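For completeness, encoding goes the other way: the value (which must be below 2^28) is spread across four 7-bit groups, most significant first, so the top bit of every byte stays clear. A sketch, not taken from the spec text:
#include <cstdint>

// encode a value < 2^28 as four sync-safe bytes, most significant first
void toSyncSafe(std::uint32_t value, unsigned char out[4])
{
    out[0] = (value >> 21) & 0x7F;
    out[1] = (value >> 14) & 0x7F;
    out[2] = (value >> 7)  & 0x7F;
    out[3] =  value        & 0x7F;
}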
The ID3v2 document says that the size of unsynchronized data is not known in advance. But that statement does not make sense. If the tag data is stored in a buffer, one can know the size of the unsynchronized data after simply replacing each problematic $FF byte with $FF 00.
You are correct, it doesn't make sense. The size written in the id3v2 header and frame headers is the size after unsynchronisation, if any, was applied. However, it is permissible to write frame data without unsynching, as id3v2 may be used for tagging files other than mp3, where the concept of unsynch/synch makes no sense. I think what section 6.2 was trying to say is 'regardless of whether this is an mp3 file, or a frame is written unsynched/synched, the frame size is always written in an MPEG synch-safe manner'.
ID3v2.4 frames can have the ‘Data Length Indicator’ flag set in the frame header, in which case you can find out how big a buffer is after synchronisation. Refer to section 4.1.2 of the spec.
Is there anyone who can help me?
Some helpful advice from someone who has written a conforming id3v2 tag reader: Don't try make sense of the spec. It surely was written by madmen and sadists. Just looking at it again is giving me nightmares.