Deinterleaving PCM (*.wav) stereo audio data - c++

I understand that PCM data is stored as [left][right][left][right].... I am trying to convert stereo PCM to mono Vorbis (*.ogg), which I understand is achievable by averaging the left and right channels ((left + right) * 0.5). I have actually achieved this by amending the encoder example in the libvorbis SDK like this,
#define READ 1024
signed char readbuffer[READ*4];
and the PCM data is read thus
fread(readbuffer, 1, READ*4, stdin)
I then halved the two channels,
buffer[0][i] = ((((readbuffer[i*4+1]<<8) | (0x00ff&(int)readbuffer[i*4]))/32768.f) + (((readbuffer[i*4+3]<<8) | (0x00ff&(int)readbuffer[i*4+2]))/32768.f)) * 0.5f;
It worked perfectly, but I don't understand how that line deinterleaves the left and right channels from the PCM data (i.e. all the bit shifting, ANDing and ORing).

A .wav file typically stores its PCM data in little endian format, with 16 bits per sample per channel. For the usual signed 16-bit PCM file, this means that the data is physically stored as
[LEFT LSB] [LEFT MSB] [RIGHT LSB] [RIGHT MSB] ...
so that every group of 4 bytes makes up a single stereo PCM sample. Hence, you can find sample i by looking at bytes 4*i through 4*i+3, inclusive.
To decode a single 16-bit value from two bytes, you do this:
(MSB << 8) | LSB
Because your read buffer values are stored as signed chars, you have to be a bit careful because both MSB and LSB will be sign-extended. This is undesirable for the LSB; therefore, the code uses
0xff & (int)LSB
to obtain the unsigned version of the low byte (technically, this works by upcasting to an int, and selecting the low 8 bits; an alternate formulation would be to just write (uint8_t)LSB).
Note that the MSBs are at indices 1 and 3, and the LSBs are at indices 0 and 2. So,
((readbuffer[i*4+1]<<8) | (0x00ff&(int)readbuffer[i*4]))
and
((readbuffer[i*4+3]<<8) | (0x00ff&(int)readbuffer[i*4+2]))
are just obtaining the values of the left and right channels as 16-bit signed values by using some bit manipulation to assemble the bytes into numbers.
Then, each of these values is divided by 32768.0. Note that a signed 16-bit value has a range of [-32768, 32767]. Thus, dividing by 32768 gives a range of approximately [-1, 1]. The two divided values are added to give a number in the range [-2, 2], and then the whole thing is multiplied by 0.5 to obtain the average (a floating-point value in the range [-1, 1]).
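Putting it all together, here is a standalone sketch of the downmix loop (in the real encoder example buffer comes from vorbis_analysis_buffer(); the local arrays and the check on the number of bytes actually read are simplifications for illustration):
#include <cstdio>

#define READ 1024

int main() {
    signed char readbuffer[READ * 4];   // READ stereo frames, 4 bytes per frame
    float buffer[1][READ];              // mono output, one float per frame

    size_t bytes  = fread(readbuffer, 1, READ * 4, stdin);
    size_t frames = bytes / 4;          // whole stereo frames actually read

    for (size_t i = 0; i < frames; ++i) {
        // Reassemble the little-endian 16-bit samples: the MSB may be negative
        // (sign-extended), the LSB is masked so its sign bit is not duplicated.
        int left  = (readbuffer[i * 4 + 1] << 8) | (0x00ff & (int)readbuffer[i * 4]);
        int right = (readbuffer[i * 4 + 3] << 8) | (0x00ff & (int)readbuffer[i * 4 + 2]);

        // Scale each channel to [-1, 1] and average them into the mono buffer.
        buffer[0][i] = (left / 32768.f + right / 32768.f) * 0.5f;
    }
    return 0;
}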

Related

Microsoft WASAPI - Do different audio formats need different data in the buffer, for the same wave

Let's say I want to play a sine wave using WASAPI.
Will the data I write into the AudioClient buffer always just be samples between -1 and 1, or will it differ between the PCM and IEEE_Float formats (and other formats, for that matter)?
Right now I'm just using -1 to 1, but I want to know whether I need to write my buffer-filling code differently for each format.
Thanks.
MEDIASUBTYPE_IEEE_FLOAT / WAVE_FORMAT_IEEE_FLOAT audio types operate with floating-point values in the [-1, +1] range.
MEDIASUBTYPE_PCM / WAVE_FORMAT_PCM uses integer values:
8-bit samples are stored as unsigned bytes, ranging from 0 to 255. 16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767.
You will also find good references here: How to handle asymmetry of WAV data?.
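So the values you write do depend on the negotiated format. As a rough sketch (the helper names and the clamping are my own, assuming the 16-bit signed and 8-bit unsigned PCM layouts quoted above), converting a float sample in [-1, 1] could look like this:
#include <algorithm>
#include <cmath>
#include <cstdint>

// IEEE float formats (WAVE_FORMAT_IEEE_FLOAT): write the sample as-is.
float to_float(float sample) {
    return std::clamp(sample, -1.0f, 1.0f);
}

// 16-bit PCM (WAVE_FORMAT_PCM, wBitsPerSample == 16): signed, -32768..32767.
int16_t to_pcm16(float sample) {
    float s = std::clamp(sample, -1.0f, 1.0f);
    return (int16_t)std::lrintf(s * 32767.0f);
}

// 8-bit PCM (WAVE_FORMAT_PCM, wBitsPerSample == 8): unsigned, 0..255, silence near 128.
uint8_t to_pcm8(float sample) {
    float s = std::clamp(sample, -1.0f, 1.0f);
    return (uint8_t)std::lrintf((s + 1.0f) * 127.5f);
}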

Use png8 instead of png24 as RGB height map

I have a 1 band DEM geotiff, and a formula to convert altitude -> RGB and RGB -> altitude (something like this: https://docs.mapbox.com/help/troubleshooting/access-elevation-data).
Using the formula (and GDAL/Python), I converted my geotiff to a 3 bands (R, G & B) geotiff, each band having values in the 0-255 range.
Using mapnik / mod_tile, I'm then serving my geotiff as PNG tiles to a web client. Everything is fine if I set up mod_tile to serve the tiles as 24- or 32-bit PNGs. But if I serve them as 8-bit PNGs (to reduce their size), the decoded values are a bit off (I can't see the difference when looking at the image, but the RGB values are not exactly the same, which messes up my decoded altitudes).
Am I right to expect that I can do what I want (retrieve the exact RGB values) with 8-bit PNGs instead of 24/32-bit ones, or is there something I don't understand about 8-bit PNGs? (If so, I'll have to dive into the mod_tile code; I guess that when we ask for 8 bits, it generates 24 or 32 bits and then reduces it.)
No, you are not right in expecting that you can compress any ensemble of 24-bit values losslessly to 8-bit values. An 8-bit PNG is palette-based, with each pixel an index into a palette of at most 256 colors, so if there are more than 256 different 24-bit values in the original, some of those different values will necessarily map to the same 8-bit value.
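If you want to verify whether a particular tile could survive palettization losslessly, one (hypothetical) check is to count its distinct RGB triplets; anything over 256 cannot be represented exactly by an 8-bit palette PNG:
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Returns true if the RGB image (3 bytes per pixel, tightly packed) uses at
// most 256 distinct colors and could therefore fit into an 8-bit palette.
bool fits_palette(const uint8_t* rgb, size_t pixel_count) {
    std::unordered_set<uint32_t> colors;
    for (size_t i = 0; i < pixel_count; ++i) {
        uint32_t packed = (uint32_t)rgb[i * 3] << 16 |
                          (uint32_t)rgb[i * 3 + 1] << 8 |
                          rgb[i * 3 + 2];
        colors.insert(packed);
        if (colors.size() > 256)
            return false;   // more colors than an 8-bit palette can hold
    }
    return true;
}
A typical RGB-encoded DEM tile will fail this test, because it usually contains far more than 256 distinct elevation values.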

DXT3 (BC2) compression format alpha data

I'm trying to read image info from a dds file. I managed to get the DXT1 and DXT5 formats working fine, however I have a question concerning the alpha data of the DXT3 format (also known as BC2).
When looking at the layout of a compressed BC2 block, it shows the alpha data for the 16-pixel block is stored in the first 8 bytes of the data, with each value taking up 4 bits.
Does this mean that, since the stored alpha value can only be 0-15, the actual alpha data is calculated as follows:
unsigned char bitvalue = GetAlphaBitValue(); // assume this works and gets the 4-bit value i am looking for
unsigned char alpha = (bitvalue / 15.0f) * 255;
Is this correct, or am I looking at it wrong?
That's what this specification seems to say:
The alpha component for a texel at location (x,y) in the block is
given by alpha(x,y) / 15.
Because the result there is supposed to be in [0 .. 1], not [0 .. 255].
Since 255 is divisible by 15, it's probably easier to think of the transformation to [0 .. 255] as
uint8_t alpha = bitvalue * 17;
It is now more obvious that what's going on is the usual "replicate" mapping (just like e.g. CSS short color codes) that gives a nice spread of output values (it allows both the minimum and the maximum values to be encoded, and has equal steps between all values).
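A small sketch of how the 4-bit alpha could be pulled out of the 8-byte alpha part of a BC2 block and expanded with that replicate mapping (the row-major, low-nibble-first layout is my assumption here; double-check it against your DDS reader):
#include <cstdint>

// Extract the alpha of texel (x, y) from the 8-byte alpha section of a
// BC2/DXT3 block: 4 bits per texel, rows of 4 texels, with the low nibble of
// each byte holding the earlier texel (little-endian nibble order).
uint8_t bc2_alpha(const uint8_t alpha_block[8], int x, int y) {
    uint8_t b = alpha_block[y * 2 + x / 2];
    uint8_t nibble = (x & 1) ? (b >> 4) : (b & 0x0F);   // value in 0..15
    return nibble * 17;   // replicate mapping: 0 -> 0, 15 -> 255, equal steps of 17
}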

RLE Encoding bit sequence, not bytes

I need to implement a compression algorithm for binary data that needs to work on constrained embedded devices (256 kB ROM, 48 kB RAM).
I'm thinking of the RLE compression algorithm. Rather than implementing it from scratch, I've found a lot of C implementations (for example: http://sourceforge.net/projects/bcl/?source=typ_redirect ), but they apply the RLE algorithm over the byte sequence (the dictionary words are 1 to 255, that is, 8-bit encoding).
I'm looking for an implementation that, starting from a sequence of bytes, applies the RLE encoding over the corresponding bit sequence (0s and 1s). Note that another algorithm could also work (I need a compression ratio < 0.9, so I think any algorithm can achieve it), but the implementation needs to work on a bit basis, not on bytes.
Can anyone help me? Thank you!
This works when the input contains lots of 0x00 and 0xFF bytes, i.e. long runs of identical bits.
Let's encode this bit sequence:
000000011111110
1. Scan the input, incrementing a counter while the current bit equals the previous one.
2. When the run ends (or the counter reaches 7), write control bit 1, then the run length as 3 bits, then the repeated bit; the leading run of seven zeros becomes 1 111 0.
3. If a run is too short to be worth packing, write control bit 0 followed by the literal bit (see the code sketch below).
Output:
111101111100
To decompress, simply read the first control bit:
If 0: copy the next bit to the output buffer.
If 1: read 3 bits (the length field can be made longer) and convert them to a decimal count; the next bit tells you which bit value to repeat, so emit it that many times to rebuild the original sequence.
But this compression method will only work well on files that contain lots of 0 and 255 bytes (00000000 and 11111111 in binary), such as BMP files with a black or white background.
Hope I helped you!
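Here is a minimal sketch of the scheme described above, operating on a vector of 0/1 values for clarity (the 3-bit length field and the threshold of 3, at which packing starts to pay off, are choices made for this sketch):
#include <cstddef>
#include <cstdio>
#include <vector>

// Packed run:   1 | 3-bit run length (3..7) | repeated bit   (5 bits total)
// Literal bit:  0 | bit value                                (2 bits total)
std::vector<int> rle_encode(const std::vector<int>& in) {
    std::vector<int> out;
    size_t i = 0;
    while (i < in.size()) {
        size_t run = 1;                 // length of the current run, capped at 7
        while (i + run < in.size() && in[i + run] == in[i] && run < 7)
            ++run;
        if (run >= 3) {
            out.push_back(1);           // control bit: packed run follows
            out.push_back((run >> 2) & 1);
            out.push_back((run >> 1) & 1);
            out.push_back(run & 1);     // 3-bit run length, MSB first
            out.push_back(in[i]);       // the bit value to repeat
        } else {
            run = 1;
            out.push_back(0);           // control bit: literal bit follows
            out.push_back(in[i]);
        }
        i += run;
    }
    return out;
}

int main() {
    // 7 zeros, 7 ones, 1 zero  ->  1 111 0 | 1 111 1 | 0 0
    std::vector<int> in = {0,0,0,0,0,0,0, 1,1,1,1,1,1,1, 0};
    for (int b : rle_encode(in)) printf("%d", b);
    printf("\n");                       // prints 111101111100
}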

Are bitmaps supposed to be 2-byte or 4-byte aligned?

MSDN documentation seems to contradict itself:
Here it says:
For uncompressed RGB formats, the minimum stride is always the image width in bytes, rounded up to the nearest DWORD.
While here it says:
The number of bytes in each scan line. This value must be divisible by 2, because the system assumes that the bit values of a bitmap form an array that is word aligned.
So sometimes MSDN wants a 4-byte aligned stride and sometimes it wants a 2-byte aligned stride. Which is right?
To be more specific, when saving a bitmap file should I use a 4-byte stride or a 2-byte stride?
The first quote is accurate. The second dates back to the 16-bit version of Windows and did not get edited as it should have. Not entirely unusual; the GDI32 docs have had a fair number of mistakes.
Do note that the upvoted answer is not accurate. Monochrome bitmaps still have a stride that's a multiple of 4; there is no special rule that makes it 2. A bit of .NET code to demonstrate this:
var bmp = new Bitmap(1, 1, System.Drawing.Imaging.PixelFormat.Format1bppIndexed);
var bdata = bmp.LockBits(new Rectangle(0, 0, 1, 1), System.Drawing.Imaging.ImageLockMode.ReadWrite, bmp.PixelFormat);
Console.WriteLine(bdata.Stride);
Output: 4
For uncompressed RGB formats, the minimum stride is always the image width in bytes, rounded up to the nearest DWORD.
Bitmaps are not necessarily always uncompressed RGB; they might be monochrome. In the BITMAP structure, the member bmBitsPixel specifies the number of bits per pixel, so it is valid for it to be 1. So, you should save RGB bitmaps with a byte stride that is a multiple of 4, and save monochrome bitmaps with a stride that is a multiple of 2.
CreateBitmap/CreateBitmapIndirect and the BITMAP struct are all pre-Windows 3.0 APIs that were meant to be used on 16-bit processors. That's why they use a 16-bit aligned stride.
All newer APIs use 32-bit stride alignment (sizeof(DWORD)).
You can use the "newer" APIs (post-Windows 3.0) like CreateDIBitmap or CreateCompatibleBitmap/SetDIBits if your buffer has 32-bit aligned strides.
As for files: they use the BITMAPINFO/BITMAPINFOHEADER structures, which imply 32-bit stride alignment.
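For reference, a small sketch of the DWORD-aligned stride calculation that the file format (and the post-Windows 3.0 APIs) expect:
#include <cstdint>

// Stride in bytes of one DIB scan line: bits per row, rounded up to a whole
// number of DWORDs (4-byte units).
uint32_t dib_stride(uint32_t width_px, uint32_t bits_per_pixel) {
    return ((width_px * bits_per_pixel + 31) / 32) * 4;
}

// Examples: dib_stride(1, 1)   == 4   (matches the Format1bppIndexed output above)
//           dib_stride(10, 24) == 32  (30 bytes of pixel data, padded to 32)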