correct coding of ID3 v2.3 frame size field for GEOB tag - mp3

I have some confusion regarding how the frame size bytes should be coded/decoded for ID3 v2.3.0. According to the (informal) ID3 v2.3.0 specification, the size of each frame should be coded into 4 bytes, where the most significant bit of each byte is unused. To calculate the size, it would take the formula below:
byte MASK = (byte)0x7F;
int size = 0;
for (int i = 0; i < 4; i++) {
    size = size * 128 + (b[i] & MASK);
}
But when I used my parser to parse some MP3 files, quite a few files had GEOB (general encapsulated object tag) frames whose size bytes were coded as if it were a Big Endian 32-bit Integer.
After I fixed these bytes by re-coding them using the proper algorithm, commercial software such as Windows 7 and Winamp was not able to properly display the subsequent tags (in several instances, TIT2 was right after GEOB, so the song's title was not displayed although it was in the file).
I also found similar problems for MCDI (music cd identifier), and TALB ('Album/Movie/Show title') tags.
I read through the v2.3 spec, and also Googled, but wasn't able to find any information regarding the use of a 32-bit integer as size metadata for these frames. Yet the common behavior in different commercial software seems to suggest for such fields, a 32-bit integer should be used as size instead of 4 bytes masked by 0x7F.
So I am just wondering if anyone here has worked on ID3 v2.3 and could clarify this for me.

Yes. However, I consider the docs to be explicit enough, given the conventions of % (binary) and $ (hexadecimal) which are explained right away:
Header size:
4 * %0xxxxxxx as per v2.2.0 (§3.1.) header
4 * %0xxxxxxx as per v2.3.0 header
4 * %0xxxxxxx as per v2.4.0 (§3.1.) header
Frame size:
$xx xx xx as per v2.2.0 (i.e. §4.1.) frame
$xx xx xx xx as per v2.3.0 frame
4 * %0xxxxxxx as per v2.4.0 (§4.) frame
Summary:
For all 3 versions in ID3v2 the header size is stored in the same way: using 4 bytes, but for each only 7 bits are valid.
Only for ID3v2.2 frames the size consists of 3 (full) bytes.
Only for ID3v2.3 frames the size consists of 4 (full) bytes.
Only for ID3v2.4 frames the size finally is stored just like the header's size: 4 bytes, but only 28 bits are valid.
The ID3v2.4.0 changes document (§3) also outlines the frame size change from v2.3.0. The whole issue comes from MPEG Audio (and AAC) streams, which synchronize on a run of 11 (or 12) set bits: any decoder might otherwise misinterpret the ID3 metadata as audio data.
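The two encodings summarized above can be sketched in C as follows. This is a minimal illustration, not taken from either specification's text: the function names are made up for this example, and `id3_frame_size` only covers the 4-byte frame sizes of v2.3 and v2.4 (v2.2 uses 3 bytes and is left out).

```c
#include <stdint.h>

/* 4 * %0xxxxxxx: only the low 7 bits of each byte count (28 bits total).
   Used for the tag header size in all versions and for v2.4 frame sizes. */
static uint32_t id3_synchsafe32(const uint8_t b[4])
{
    return ((uint32_t)(b[0] & 0x7F) << 21) |
           ((uint32_t)(b[1] & 0x7F) << 14) |
           ((uint32_t)(b[2] & 0x7F) << 7)  |
            (uint32_t)(b[3] & 0x7F);
}

/* $xx xx xx xx: a plain big-endian 32-bit integer.
   Used for v2.3 frame sizes. */
static uint32_t id3_be32(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Pick the frame-size decoding from the tag's minor version (3 or 4). */
static uint32_t id3_frame_size(int minor_version, const uint8_t b[4])
{
    return minor_version >= 4 ? id3_synchsafe32(b) : id3_be32(b);
}
```

The two decodings agree as long as every size byte is below 0x80, which is why small frames parse correctly either way and only larger ones expose the difference.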

I believe I have found the answer. ID3 v2.3, despite being more commonly supported than v2.4, has a not too well-written (and informal) spec. Its header size uses four bytes masked with 0x7F, but the frame sizes are in fact 32-bit integers; this is just never clearly spelled out.
The reason I usually encountered the problem when dealing with GEOB is that it won't crop up until the frame size is larger than 0x7F, and GEOB frames usually are.

Related

What is the proper length of an ID3v2 frame?

In describing an ID3v2 header and the frames within, https://id3.org/id3v2.3.0#ID3v2_frame_overview states:
The frame ID is followed by a size descriptor, making a total header size of ten bytes in every frame.
Yet when I use a hex editor to look through the frames of an ID3 tag, the frame seems to be 12 bytes. I have looked at numerous songs and they seem to have the tag, followed by a 4 byte size descriptor and then 4 additional bytes (the description says this should be two flag bytes).
I admit to being a little out of my depth here, but I'm trying to write ID3v2 tags using PHP and I'm a bit stumped.
You haven't read 3.3.1. Frame header flags:
Some flags indicates that the frame header is extended with additional information. This information will be added to the frame header in the same order as the flags indicating the additions. I.e. the four bytes of decompressed size will precede the encryption method byte.
Which means the following in addition to 3.3. ID3v2 frame overview:
The layout of the frame header:
Frame ID $xx xx xx xx (four characters)
Size $xx xx xx xx
Flags $xx xx
Looking bitwise at the 2 "Flags" bytes of the 10-byte frame header, you have to expect additional bytes for each of the following flags that is set (they sit in the second flags byte):
bit 7 ("i") = 4 more bytes "decompressed size"
bit 6 ("j") = 1 more byte "encryption method"
bit 5 ("k") = 1 more byte "group identifier"
So your observation might be correct. If you link to the first 4096 bytes of such a file then I can tell you if they're still correct as per the standard although they "look" like having 12 bytes per header.
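As a rough sketch of the rule above, the number of extra bytes that follow the 10-byte frame header can be derived from the second flags byte. The function name is hypothetical, and the bit positions assume the v2.3.0 layout `%ijk00000` quoted above.

```c
#include <stdint.h>

/* Extra bytes appended to an ID3v2.3 frame header, given the second
   flags byte (%ijk00000). */
static unsigned id3v23_extra_header_bytes(uint8_t format_flags)
{
    unsigned extra = 0;
    if (format_flags & 0x80) extra += 4;  /* i: decompressed size   */
    if (format_flags & 0x40) extra += 1;  /* j: encryption method   */
    if (format_flags & 0x20) extra += 1;  /* k: group identifier    */
    return extra;
}
```

A frame with none of these flags set has exactly the 10 header bytes; only a frame with all three set would have 6 more.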

MPEG 2 and 2.5 - problems calculating frame sizes in bytes

I have a console program which I have used for years, for (among other things) displaying info about certain audio-file formats, including mp3. I used data from the mpeghdr site to calculate the frame sizes, in order to further calculate playing time for the tracks. The equation that I got from mpeghdr was:
// Read the BitRate, SampleRate and Padding of the frame header.
// For Layer I files use this formula:
//
// FrameLengthInBytes = (12 * BitRate / SampleRate + Padding) * 4
//
// For Layer II & III files use this formula:
//
// FrameLengthInBytes = 144 * BitRate / SampleRate + Padding
This works well for most mp3 files, but there have always been a small subset for whom this equation failed. Recently, I've been looking at a set of very small mp3 files, and have found that for these files this formula fails much more often, so I'm trying to finally nail down what is going on. All of these mp3 files were generated using Lame V3.100, with default settings, on Windows 7 64-bit.
In all cases, I can successfully find the first frame header, but when I used the above formula to calculate the offset to the next frame header, it is sometimes not correct.
As an example, I have a file 'wolf howl.mp3'; analysis tools such as MPEGAudioInfo show the frame size as 288 bytes. When I run my program, though, it shows the length of the first frame as 576 bytes (2 * 288). When I look at the mp3 file in a hex editor, with the first frame at 0x154, I can see that the next frame is at 0x154 + 208 bytes, but this calculation does in fact result in 576 bytes...
File info:
mpegV2.5, layer III
frame: bitrate=32, sample_rate=8000, pad=0, bytes=576
mtemp->frame_length_in_bytes =
(144 * (mtemp->bitrate * 1000) / mtemp->sample_rate) + mtemp->padding_bit;
which equals 576
I've looked at numerous other references, and they all show this equation...
At first I thought it was an issue with MPEG 2.5, which is an unofficial standard, but I have seen this with MPEG2 files as well. It only happens with small files, though.
Does anyone have any insights on what I am missing here??
//**************************************
Later notes:
I thought maybe audio format would be relevant to this issue, so I dumped channel_mode and mode_extension for each of my test files (3 calculate properly, 2 don't). Sadly, all of them are cmode=3, mode_ext=0
(i.e., last byte of the header is 0xC4)... so that doesn't help...
Okay, I found the answer to this question... it was in the MPEGAudioInfo program on the CodeProject site. Here is the vital key:
//*************************************************************************************
// This reference data is from MPEGAudioInfo app
// Samples per Frame / 8
static const u32 m_dwCoefficients[2][3] =
{
{ // MPEG 1
12, // Layer1 (must be multiplied with 4, because of slot size)
144, // Layer2
144 // Layer3
},
{ // MPEG 2, 2.5
12, // Layer1 (must be multiplied with 4, because of slot size)
144, // Layer2
72 // Layer3
}
};
It is unfortunate that none of the reference pages mention this detail!
My program now successfully calculates frame sizes for all of my mp3 files, including the small ones.
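Putting the coefficient table together with the original formula gives something like the sketch below. The enum, function name, and parameter units (bitrate in bits per second, sample rate in Hz) are assumptions made for this example; only the coefficients come from the table above.

```c
#include <stdint.h>

enum mpeg_version { MPEG_1 = 0, MPEG_2 = 1, MPEG_2_5 = 2 };

/* Frame length in bytes. layer is 1, 2 or 3; padding is the header's
   padding bit (0 or 1). Returns 0 for invalid input. */
static uint32_t mpeg_frame_length(enum mpeg_version ver, int layer,
                                  uint32_t bitrate_bps,
                                  uint32_t sample_rate_hz,
                                  uint32_t padding)
{
    /* Samples per frame / 8; row 0 = MPEG 1, row 1 = MPEG 2 and 2.5. */
    static const uint32_t coeff[2][3] = {
        { 12, 144, 144 },
        { 12, 144,  72 },
    };
    if (layer < 1 || layer > 3 || sample_rate_hz == 0)
        return 0;

    uint32_t c = coeff[ver == MPEG_1 ? 0 : 1][layer - 1];
    uint32_t slots = c * bitrate_bps / sample_rate_hz + padding;

    /* Layer I slots are 4 bytes wide; Layer II/III slots are 1 byte. */
    return layer == 1 ? slots * 4 : slots;
}
```

For the file in the question (MPEG 2.5, Layer III, 32 kbit/s, 8000 Hz, no padding) this yields 288 bytes, while feeding the same values through the MPEG 1 row reproduces the incorrect 576.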
I had the same problem. Some documents I've read don't mention dividing by 2 in the frame-size formula for MPEG 2.5 Layer III, but some source code I've encountered does.
It's hard to find any proof.
I have nothing better than this link:
https://link.springer.com/chapter/10.1007/978-1-4615-0327-9_12

Parsing Opus TOC Byte

For working with the Opus audio codec, I need to parse a TOC byte from a char buffer and extract the config, s and c segments, each representing a number.
I would need to store these numbers into separate variables for further use and am wondering what the best way to achieve this (in C++) is.
A well-formed Opus packet MUST contain at least one byte [R1].
This byte forms a table-of-contents (TOC) header:
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
| config  |s| c |
+-+-+-+-+-+-+-+-+
The top five bits of the TOC byte, labeled "config", encode one of 32
possible configurations of operating mode, audio bandwidth, and frame
size.
One additional bit, labeled "s", signals mono vs. stereo, with 0
indicating mono and 1 indicating stereo.
The remaining two bits of the TOC byte, labeled "c", code the number
of frames per packet (codes 0 to 3).
This format is according to the RFC found here: https://www.rfc-editor.org/rfc/rfc6716#section-3.1
Thanks!
Just use a right shift (>>) to shift the bits to the low bits if needed, and a bitwise and (&) to mask out the other bits.
int config = (toc >> 3) & 31;
int s = (toc >> 2) & 1;
int c = toc & 3;
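Wrapped into a small helper, the same shifts and masks might look like this. The struct and function names are invented for the example; the cast to `uint8_t` is there because the question reads the byte from a `char` buffer, and `char` may be signed.

```c
#include <stdint.h>

/* The three fields of an Opus TOC byte (RFC 6716, section 3.1). */
struct opus_toc {
    int config;      /* top five bits: 0..31            */
    int stereo;      /* "s" bit: 0 = mono, 1 = stereo   */
    int frame_code;  /* "c" bits: 0..3                  */
};

/* Parse the first byte of a packet held in a char buffer. */
static struct opus_toc opus_parse_toc(const char *packet)
{
    uint8_t toc = (uint8_t)packet[0];
    struct opus_toc t;
    t.config     = (toc >> 3) & 31;
    t.stereo     = (toc >> 2) & 1;
    t.frame_code =  toc       & 3;
    return t;
}
```

Returning a struct keeps the three values together; separate `int` variables, as in the snippet above, work just as well.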

Default WAV description when all specs are "0"

I'm learning how to read WAV files in C++ and extract data according to the header. I have a few WAV files lying around. Looking at their headers, I see that they all follow the rules for wave files. However, recordings produced by TeamSpeak look weird, yet they are still playable in media players.
So looking at the standard format of WAV files, it looks like this:
In all files that look normal, I get legitimate values for all the fields from "AudioFormat" up to "BitsPerSample" (from the picture). However, in TeamSpeak files, ALL these values are exactly zero.
Only the first three fields are not zero: there's "RIFF" and "WAVE" in the first and third fields, and the ChunkSize seems legit.
So my question is: How does the player know anything about such a file and recognize that this file is mono or stereo? The sample rate? Anything about it? Is it like there's something standard to assume when all these values are zero?
Update
I examined the file with MediaInfo and got this:
General
Complete name : ts3_recording_16_10_02_17_53_54.wav
Format : Wave
File size : 2.45 MiB
Duration : 13 s 380 ms
Overall bit rate mode : Constant
Overall bit rate : 1 536 kb/s
Audio
Format : PCM
Format settings, Endianness : Little
Format settings, Sign : Signed
Codec ID : 1
Duration : 13 s 380 ms
Bit rate mode : Constant
Bit rate : 1 536 kb/s
Channel(s) : 2 channels
Sampling rate : 48.0 kHz
Bit depth : 16 bits
Stream size : 2.45 MiB (100%)
I still don't understand, though, how it arrived at these conclusions.
After examining your file with a hex editor with WAV binary templates, it is obvious that there is an additional "JUNK" chunk before the "fmt" one (screenshot attached). The JUNK chunk is possibly there for padding reasons, but all its values are 0s. You need to seek (with fseek, maybe) through the wav file in your code to the first occurrence of the "fmt" bytes and parse the WAVEFORMATEX info from there.
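Rather than scanning for the raw "fmt" bytes, one option is to walk the RIFF chunk list and skip any chunk that is not wanted, such as JUNK. The sketch below works on a file already loaded into memory; the function names are hypothetical, and it assumes the usual RIFF layout (4-byte ID, little-endian 4-byte size, payload padded to an even length).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value. */
static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Find chunk `id` (4 characters, e.g. "fmt ") in a RIFF/WAVE buffer.
   Returns the offset of the chunk payload, or -1 if not found. */
static long find_riff_chunk(const uint8_t *buf, size_t len,
                            const char *id, uint32_t *size_out)
{
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;

    size_t pos = 12;                       /* first chunk after "WAVE" */
    while (pos + 8 <= len) {
        uint32_t size = rd_le32(buf + pos + 4);
        if (size > len - (pos + 8))
            return -1;                     /* truncated chunk */
        if (memcmp(buf + pos, id, 4) == 0) {
            if (size_out)
                *size_out = size;
            return (long)(pos + 8);
        }
        /* Skip the payload plus the pad byte after odd-sized chunks. */
        pos += 8 + (size_t)size + (size & 1u);
    }
    return -1;
}
```

Note that the chunk ID is "fmt " with a trailing space. The payload at the returned offset starts with the AudioFormat field.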

Building a fast PNG encoder issues

I am trying to build a fast 8-bit greyscale PNG encoder. Unfortunately I must be misunderstanding part of the spec. Smaller image sizes seem to work, but the larger ones will only open in some image viewers. This image (with multiple DEFLATE blocks) gives a "Decompression error in IDAT" error in my image viewer but opens fine in my browser:
This image has just one DEFLATE block but also gives an error:
Below I will outline what I put in my IDAT chunk in case you can easily spot any mistakes (note, images and steps have been modified based on answers, but there is still a problem):
1. IDAT length
2. "IDAT" in ascii (literally the bytes 0x49 0x44 0x41 0x54)
3. Zlib header 0x78 0x01
Steps 4-7 are for every deflate block, as the data may need to be broken up:
4. The byte 0x00 or 0x01, depending on if it is a middle or the last block.
5. Number of bytes in block (up to 2^16-1) stored as a little endian 16-bit integer
6. The 1's complement of this integer representation.
7. Image data (each scan-line starts with a zero-byte for the no filter option in PNG, and is followed by width bytes of greyscale pixel data)
8. An adler-32 checksum of all the image data
9. A CRC of all the IDAT data
I've tried pngcheck on linux, and it does not spot any errors. If nobody can see what is wrong, can you point me in the right direction for a debugging tool?
My last resort is to use the libpng library to make my own decoder, and debug from there.
Some people have suggested it may be my adler-32 function calculation:
static uint32_t adler32(uint32_t height, uint32_t width, char** pixel_array)
{
uint32_t a=1,b=0,w,h;
for(h=0;h<height;h++)
{
b+=a;
for(w=0;w<width;w++)
{
a+=pixel_array[h][w];
b+=a;
}
}
return (uint32_t)(((b%65521)*65536)|(a%65521));
}
Note that because the pixel_array passed to the function does not contain the zero-byte at the beginning of each scanline (needed for PNG) there is an extra b+=a (and implicit a+=0) at the beginning of each iteration of the outer loop.
I do get an error with pngcheck: "zlib: inflate error = -3 (data error)". As your PNG scaffolding structure looks okay, it's time to take a low-level look into the IDAT block with a hex viewer. (I'm going to type this up while working through it.)
The header looks alright; IDAT length is okay. Your zlib flags are 78 01 ("No/low compression", see also What does a zlib header look like?), where one of my own tools use 78 9C ("Default compression"), but then again, these flags are only informative.
Next: zlib's internal blocks (per RFC1950).
Directly after the compression flags (CMF in RFC1950) it expects FLATE compressed data, which is the only compression scheme zlib supports. And that is described in another RFC: RFC1951.
Each separate compressed block is preceded by a byte:
3.2.3. Details of block format
Each block of compressed data begins with 3 header bits
containing the following data:
first bit BFINAL
next 2 bits BTYPE
...
BFINAL is set if and only if this is the last block of the data
set.
BTYPE specifies how the data are compressed, as follows:
00 - no compression
01 - compressed with fixed Huffman codes
10 - compressed with dynamic Huffman codes
11 - reserved (error)
So this value can be set to 00 for 'not last block, uncompressed' and to 01 for 'last block, uncompressed', immediately followed by the length (2 bytes) and its bitwise inverse, per 3.2.4. Non-compressed blocks (BTYPE=00):
3.2.4. Non-compressed blocks (BTYPE=00)
Any bits of input up to the next byte boundary are ignored.
The rest of the block consists of the following information:
0 1 2 3 4...
+---+---+---+---+================================+
| LEN | NLEN |... LEN bytes of literal data...|
+---+---+---+---+================================+
LEN is the number of data bytes in the block. NLEN is the
one's complement of LEN.
They are the final 4 bytes in your IDAT segment. Why do small images work, and larger not? Because you only have 2 bytes for the length (see footnote 1). You need to break up your image into blocks no larger than 65,535 bytes (in my own PNG creator I seem to have used 32,768, probably "for safety"). If the last block, write out 01, else 00. Then add the two times two LEN bytes, properly encoded, followed by exactly LEN data bytes. Repeat until done.
The Adler-32 checksum is not part of this Flate-compressed data, and should not be counted in the blocks of LEN data. (It is still part of the IDAT block, though.)
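The block-splitting procedure described here could be sketched as follows. It is an illustration only, with made-up function names: `in` is assumed to be the complete filtered scanline data (filter bytes included), `out` is assumed to be large enough, and the result is the zlib stream that goes inside IDAT, without the PNG chunk length, type, or CRC.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Straightforward Adler-32: reduce after every byte, so no overflow. */
static uint32_t adler32_bytes(const uint8_t *p, size_t n)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < n; i++) {
        a = (a + p[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Wrap `in` in a zlib stream made of stored (uncompressed) blocks.
   `out` must hold at least n + 5 * (n / 65535 + 1) + 6 bytes.
   Returns the number of bytes written. */
static size_t zlib_store(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0, pos = 0;
    out[o++] = 0x78;                        /* zlib CMF */
    out[o++] = 0x01;                        /* zlib FLG */
    do {
        size_t len = n - pos;
        if (len > 65535u)
            len = 65535u;                   /* LEN is only 16 bits */
        int last = (pos + len == n);
        out[o++] = last ? 0x01 : 0x00;      /* BFINAL bit, BTYPE = 00 */
        out[o++] = (uint8_t)(len & 0xFF);   /* LEN, little endian */
        out[o++] = (uint8_t)(len >> 8);
        out[o++] = (uint8_t)(~len & 0xFF);  /* NLEN = ~LEN */
        out[o++] = (uint8_t)((~len >> 8) & 0xFF);
        if (len > 0)
            memcpy(out + o, in + pos, len);
        o += len;
        pos += len;
    } while (pos < n);

    /* Adler-32 of the uncompressed data, big endian, after the blocks. */
    uint32_t ad = adler32_bytes(in, n);
    out[o++] = (uint8_t)(ad >> 24);
    out[o++] = (uint8_t)(ad >> 16);
    out[o++] = (uint8_t)(ad >> 8);
    out[o++] = (uint8_t)ad;
    return o;
}
```

The per-byte modulo in `adler32_bytes` is slower than a batched reduction but keeps the example short.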
After re-reading your question to verify I addressed all of your issues (and confirming I spelled "Adler-32" correctly), I realized you describe all of the steps right -- except that the 'last block' indicator is 01, not 80 (later edit: uh, perhaps you are right about that!) -- but that it does not show in this sample PNG. See if you can get it to work following all of the steps by the letter.
Kudos for doing this 'by hand'. It's a nice exercise in 'following the specs', and if you get this to work, it may be worthwhile to try and add proper compression. I shun pre-made libraries as much as possible; the only allowance I made for my own PNG encoder/decoder was to use Rich Geldreich's miniz.c, because implementing proper Flate encoding/decoding is beyond my ken.
Footnote 1: That's not the whole story. Browsers are particularly forgiving of HTML errors; it seems they are as forgiving of PNG errors as well. Safari displays your image just fine, and so does Preview. But they may just all be sharing OS X's PNG decoder, because Photoshop rejects the file.
The byte 0x00 or 0x80, depending on if it is a middle or the last block.
Change the 0x80 to 0x01 and all will be well.
The 0x80 is appearing as a stored block that is not the last block. All that's being looked at is the low bit, which is zero, indicating a middle block. All of the data is in that "middle" block, so a decoder will recover the full image. Some liberal PNG decoders may then ignore the errors it gets when it tries to decode the next block, which isn't there, and then ignore the missing check values (Adler-32 and CRC-32), etc. That's why it shows up ok in browsers, even though it is an invalid PNG file.
There are two things wrong with your Adler-32 code. First, you are accessing the data from a char array. char is typically signed, so your 0xff bytes are being added not as 255, but rather as -1. You need to make the array unsigned char or cast it to that before extracting byte values from it.
Second, you are doing the modulo operation too late. You must do the % 65521 before the uint32_t overflows. Otherwise you don't get the modulo of the sum as required by the algorithm. A simple fix would be to apply the % 65521 to a and b right after the width loop, inside the height loop. This will work so long as you can guarantee that the width will be no more than 5551 bytes. (Why 5551 is left as an exercise for the reader.) If you cannot guarantee that, then you will need to embed another loop to consume bytes from the line until you get to 5551 of them, do the modulo, and then continue with the line. Or, a smidge slower, just run a counter and do the modulo when it gets to the limit.
Here is an example of a version that works for any width:
static uint32_t adler32(uint32_t height, uint32_t width, unsigned char ** pixel_array)
{
uint32_t a = 1, b = 0, w, h, k;
for (h = 0; h < height; h++)
{
b += a;
w = k = 0;
while (k < width) {
k += 5551;
if (k > width)
k = width;
while (w < k) {
a += pixel_array[h][w++];
b += a;
}
a %= 65521;
b %= 65521;
}
}
return (b << 16) | a;
}
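For readers who want to check the limit numerically, here is one way to do it. This is a sketch under a stated assumption: the worst case is taken to be a and b both starting at 65520 and every byte being 255, so that after n bytes b reaches (n + 1) * 65520 + 255 * n * (n + 1) / 2. The function name is made up for the example.

```c
#include <stdint.h>

/* Largest number of bytes that can be summed between two % 65521
   reductions without b exceeding 32 bits, under the worst case
   described above. */
static uint32_t adler_max_bytes_before_mod(void)
{
    const uint64_t max_residue = 65520;       /* 65521 - 1 */
    const uint64_t limit = 0xFFFFFFFFu;       /* largest uint32_t */
    uint32_t n = 0;
    for (;;) {
        uint64_t m = (uint64_t)n + 1;         /* try one more byte */
        uint64_t worst_b = (m + 1) * max_residue + 255 * m * (m + 1) / 2;
        if (worst_b > limit)
            return n;                         /* m bytes would overflow */
        n++;
    }
}
```

The result is 5552 bytes. In the loop above, each scanline contributes one implicit zero filter byte (the extra `b += a`) before up to 5551 pixel bytes, which would be consistent with the 5551 used there.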