Need help from a zlib expert for a VB.NET function - mp3

Need to know if I'm wasting my time on this. I'm using UltraID3Lib, which does not decompress frames but stores them in an array using an exception-handling mechanism. The flags say the frames are compressed but not encrypted.
If the bytes are indeed zlib-compressed and in the correct format:
How can I decompress them, given the fact that I know absolutely nothing about zlib and I'm just a part-time coder who was dropped on his head as a child? (Please explain slowly.)
The MP3 user-defined frame (TXXX) holds a small XML string.
A quick (bad) example to get the byte array stored by UltraID3Lib:
UltraID3.Read(MP3FileName) ' actual file in folder
Dim byte1 As ID3v23EncryptedCompressedFrame
For Each byte1 In UltraID3.ID3v2Tag.Frames
    Dim str1 = byte1.FrameBytes
    Dim result1 = BytesToString2(str1)
    Stop ' let's see what we got
Next
This site says that if it has 789C near the beginning, it's zlib-compressed:
http://www.xtremevbtalk.com/showthread.php?t=318843
I used these two functions to convert to hex:
https://social.msdn.microsoft.com/Forums/vstudio/en-US/fa53ce74-fd53-4d2a-bc05-619fb9d32481/convert-byte-array-to-hex-string?forum=vbgeneral
Output of example function 1 (from the start of the article):
000B0789C6330377433D63534D575F3F737B570343767B02929CA2C4B2D4BCD2B29B6B31D376367989B9A976C519F9E5ACE1989452536FA6019B924C206968017A10CA461F2C6AA3FD58A61427E5E72AA42228A114666E6F88CD04772110D5923799
Output of example function 2 (from the end of the article):
000000B0789C6330377433D63534D575F3F737B570343767B02929CA2C4B2D4BCD2B29B6B301D376367989B9A976C519F9E50ACE1989452536FA60019B924C20696800017A10CA461F2C6AA30FD58A61427E5E72AA42228A114666E6F88CD047721100D5923799

Your "example function 2" is a hex representation of a valid zlib stream, starting with the 789c, which decompresses to:
71F3-15-FOO58A77<trivevents><event><name>show Chart</name><time>10000000.000000</time></event><event><name>show once a</name><time>26700000.000000</time></event></trivevents>
However "example function 1" is a corrupted version of "example function 2", with, for some reason, several missing zero digits.
You can use the .NET DeflateStream class to decompress. Note that DeflateStream expects raw deflate data, so you will likely need to skip the two-byte zlib header (the 78 9C) before handing the rest of the bytes to it.

Cannot read .jpg binary data, buffer only has 4 bytes of data

My question is almost exactly the same as this one, which is unanswered. I am trying to read the binary data of a .jpg to send as an HTTP response from a simple web server written in C++. The code for reading the data is below.
FILE *f = fopen(file.c_str(), "rb");
if (f) {
    fseek(f, 0, SEEK_END);
    int length = ftell(f);
    fseek(f, 0, SEEK_SET);
    char *buffer = (char*)malloc(length + 1);
    if (buffer) {
        int b = fread(buffer, 1, length, f);
        std::cout << "bytes read: " << b << std::endl;
        buffer[length] = '\0'; // only safe when malloc succeeded
    }
    fclose(f);
    return buffer;
}
return NULL;
When the request for the image is made and this code runs, fread() reports 25253 bytes read, which seems correct. However, when I call strlen(buffer) I get only 4. Of course, this gives an error in the browser when it tries to display the image. I have also tried manually setting the HTTP Content-Length to 25253, but then I receive curl error 18, indicating the transfer ended early (as only 4 bytes exist).
As the other poster mentioned in their question, the 5th byte of the image (and, I assume, of most .jpg images) is 0x00, but I am unsure whether this has an effect on saving to the buffer.
I have verified the .jpg images I am loading are in the directory, valid, and display properly when opened normally. I have also tried 2 different methods of loading the binary data, and both also give only 4 bytes, so I am really at a loss. Any help is much appreciated.
When the request for the image is made and this code runs, fread() returns 25253 bytes being read, which seems correct. However, when I perform strlen(buffer) I get only 4.
Well, there is your problem: you read binary data, not text, so special characters like newline or the null character do not indicate the structure of a text; they are simply numbers.
strlen counts characters up to the first '\0' (i.e. the byte value 0). A binary file like a JPEG usually contains plenty of zero bytes, and because of the binary header structure there is apparently always one at position 5, so strlen stops at the first zero it finds and returns 4.
You also seem confused about sending this "text-interpreted" JPEG over HTTP. Of course it will complain: you cannot simply send binary data as text in HTTP. You either have to encode it (base64 is very popular) or send it as-is with the Content-Length header set to the real byte count. You also have to tell the HTTP client/server the type by setting the proper MIME (Content-Type) header.
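A minimal sketch of that idea in C++ (the readFile helper name is made up here): read the bytes once, keep the count together with the data, and never ask strlen for the length of binary content.
#include <cstdio>
#include <string>
#include <vector>

// Read the whole file as raw bytes; the vector's size() is the real length,
// which is what belongs in the HTTP Content-Length header.
std::vector<char> readFile(const std::string &path) {
    std::vector<char> data;
    if (FILE *f = std::fopen(path.c_str(), "rb")) {
        std::fseek(f, 0, SEEK_END);
        long length = std::ftell(f);
        std::fseek(f, 0, SEEK_SET);
        if (length > 0) {
            data.resize(static_cast<std::size_t>(length));
            std::fread(data.data(), 1, data.size(), f);
        }
        std::fclose(f);
    }
    return data;
}
data.size() is then the value for the Content-Length header, and data.data() is what goes out on the socket; no string functions are involved anywhere.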

ifstream read: where are extra bytes coming from?

I am reading WHOIS record files. The first line of a sample file reads, in the editor: "id:0--0.ga"
In code, I check to verify that the first line starts with "id:" as follows:
// given ifstream *fs,
char id[3];
std::streampos pos = fs->tellg();
fs->read(&id[0], 3);
fs->seekg(pos);
if ( // the three chars in id are "id:" ...
However, when I do this (and I am running a debugger; further it is compiled with clang rather than gcc), I get the following result in id:
The characters it read, in addition to an 'i', 'd', and ':' were:
\xb87#_?
Where the question mark has a stop sign around it. I am not sure how I could have read anything "extra," seeing as I am only reading three bytes into an array of the proper length...
Further, the if statement evaluates to true.
Could this just be a coding mistake, an error in the debugger, or is something else going on?
The debugger is assuming that id contains a string, which it does not. You should probably just ignore the debugger when looking at things that aren't stored in formats you expect the debugger to understand.
The alternative is to mentally convert the debugger's display back into the raw memory contents and then parse those contents in the correct format. There is some area of memory which, if understood to contain a string, would mean "id:\xb87#_? ..."; that same area of memory, if understood to be an array of only three characters, is just "id:".
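For the comparison itself, a minimal sketch (the startsWithId helper name is invented here): compare exactly three bytes instead of treating the array as a C string.
#include <cstring>
#include <fstream>

// Peek at the next three bytes without consuming them and check for "id:".
bool startsWithId(std::ifstream &fs) {
    char id[3];
    std::streampos pos = fs.tellg();
    fs.read(id, 3);
    fs.seekg(pos);
    // id is not null-terminated, so memcmp over exactly 3 bytes is the right
    // comparison; strcmp/strlen would read past the end of the array.
    return fs.good() && std::memcmp(id, "id:", 3) == 0;
}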

Why is a different zlib window bits value required for extraction, compared with compression?

I am trying to debug a problem with some code that uses zlib 1.2.8. The problem is that this larger project can make archives, but runs into Z_DATA_ERROR header problems when trying to extract that archive.
To do this, I wrote a small test program in C++ that compresses ("deflates") a specified regular file, writes the compressed data to a second regular file, and extracts ("inflates") to a third regular file, one line at a time. I then diff the first and third files to make sure I get the same bytes.
For reference, this test project is located at: https://github.com/alexpreynolds/zlib-test and compiles under Clang (and should also compile under GNU GCC).
My larger question is how to deal with header data correctly in my larger project.
In my first test scenario, I can set up compression machinery with the following code:
z_error = deflateInit(this->z_stream_ptr, ZLIB_TEST_COMPRESSION_LEVEL);
Here, ZLIB_TEST_COMPRESSION_LEVEL is 1, to provide best speed. I then run deflate() on the z_stream pointer until there is nothing left that comes out of compression.
To extract these bytes, I can use inflateInit():
int ret = inflateInit(this->z_stream_ptr);
So what is the header format, in this case?
In my second test scenario, I set up the deflate machinery like so:
z_error = deflateInit2(this->z_stream_ptr,
                       ZLIB_TEST_COMPRESSION_LEVEL,
                       ZLIB_TEST_COMPRESSION_METHOD,
                       ZLIB_TEST_COMPRESSION_WINDOW_BITS,
                       ZLIB_TEST_COMPRESSION_MEM_LEVEL,
                       ZLIB_TEST_COMPRESSION_STRATEGY);
These deflate constants are, respectively, 1 for level, Z_DEFLATED for method, 15+16 or 31 for window bits, 8 for memory level, and Z_DEFAULT_STRATEGY for strategy.
The former inflateInit() call does not work; instead, I must use inflateInit2() and specify a modified window bits value:
int ret = inflateInit2(this->z_stream_ptr, ZLIB_TEST_COMPRESSION_WINDOW_BITS + 16);
In this case, the window bits value is not 31 as in the deflateInit2() call, but 15+32 or 47.
If I use 31 (or any other value than 47), then I get a Z_DATA_ERROR on subsequent inflate() calls. That is, if I use the same window bits for the inflateInit2() call:
int ret = inflateInit2(this->z_stream_ptr, ZLIB_TEST_COMPRESSION_WINDOW_BITS);
Then I get the following error on attempting to inflate():
Error: inflate to stream failed [-3]
Here, -3 is the same as Z_DATA_ERROR.
According to the documentation, using 31 with deflateInit2() should write a gzip header and trailer. Thus, 31 on the following inflateInit2() call should be expected to be able to extract the header information.
Why is the modified value 47 working, but not 31?
My test project is mostly similar to the example code on the zlib site, with the exception of the extraction/inflation code, which inflates one z_stream chunk at a time and parses the output for newline characters.
Is there something special about running inflate() only when a new buffer of extracted data is asked for — like header information going missing between inflate() calls — as opposed to running the whole extraction in one pass, as in the zlib example code?
My larger debugging problem is looking for a robust way to extract a chunk of zlib-compressed data only on request, so that I can extract data one line at a time, as opposed to getting the whole extracted file. Something about the way I am handling the zlib format parameter seems to be messing me up, but I can't figure out why or how to fix this.
deflateInit() and inflateInit(), as well as deflateInit2() and inflateInit2() with windowBits in 0..15 all process zlib-wrapped deflate data. (See RFC 1950 and RFC 1951.)
deflateInit2() and inflateInit2() with negative windowBits in -1..-15 process raw deflate data with no header or trailer. deflateInit2() and inflateInit2() with windowBits in 16..31, i.e. 16 added to 0..15, process gzip-wrapped deflate data (RFC 1952). inflateInit2() with windowBits in 32..47 (32 added to 0..15) will automatically detect either a gzip or zlib header (but not raw deflate data), and decompress accordingly.
Why is the modified value 47 working, but not 31?
31 does work. I did not try to look at your code to debug it.
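A minimal self-contained sketch illustrating this (values and buffer sizes are arbitrary, and error checking is trimmed): compress with a gzip wrapper (windowBits 15 + 16 = 31) and decompress with the same 31, or with 15 + 32 = 47 for automatic header detection.
#include <zlib.h>
#include <cstdio>
#include <cstring>

int main() {
    const char *text = "hello, zlib";
    unsigned char comp[256], decomp[256];

    // Compress with a gzip header and trailer (windowBits 15 + 16 = 31).
    z_stream d = {};
    deflateInit2(&d, 1, Z_DEFLATED, 15 + 16, 8, Z_DEFAULT_STRATEGY);
    d.next_in = (Bytef *)text;   d.avail_in = std::strlen(text) + 1;
    d.next_out = comp;           d.avail_out = sizeof(comp);
    deflate(&d, Z_FINISH);
    uLong compLen = d.total_out; // number of compressed bytes produced
    deflateEnd(&d);

    // Decompress: 15 + 16 = 31 expects gzip only; 15 + 32 = 47 would
    // auto-detect either a gzip or a zlib header.
    z_stream i = {};
    inflateInit2(&i, 15 + 16);
    i.next_in = comp;    i.avail_in = (uInt)compLen;
    i.next_out = decomp; i.avail_out = sizeof(decomp);
    int ret = inflate(&i, Z_FINISH);
    inflateEnd(&i);

    std::printf("inflate returned %d (%s), output: %s\n",
                ret, ret == Z_STREAM_END ? "Z_STREAM_END" : "error", decomp);
    return 0;
}
With 31 on both sides this round-trips; the auto-detect value 47 is only needed when the decompressor does not know in advance which wrapper was used.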
Is there something special about running inflate() only when a new buffer of extracted data is asked for — like header information going missing between inflate() calls — as opposed to running the whole extraction in one pass, as in the zlib example code?
I can't figure out what you're asking here. Perhaps a more explicit example would help. The whole point of inflate() is to decompress a chunk at a time.
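For completeness, a minimal sketch of that chunk-at-a-time pattern, modelled on zlib's zpipe.c example (the inflateFile name is made up; error handling is trimmed and buffer sizes are arbitrary):
#include <zlib.h>
#include <cstdio>

// Inflate src into dst one chunk at a time; windowBits 15 + 32 auto-detects
// a zlib or gzip wrapper. Returns Z_OK on success.
int inflateFile(std::FILE *src, std::FILE *dst) {
    unsigned char in[16384], out[16384];
    z_stream strm = {};
    if (inflateInit2(&strm, 15 + 32) != Z_OK) return Z_STREAM_ERROR;
    int ret = Z_OK;
    do {
        strm.avail_in = std::fread(in, 1, sizeof(in), src);
        if (strm.avail_in == 0) break;          // input ended before Z_STREAM_END
        strm.next_in = in;
        do {                                    // drain all output for this input chunk
            strm.avail_out = sizeof(out);
            strm.next_out = out;
            ret = inflate(&strm, Z_NO_FLUSH);
            if (ret == Z_NEED_DICT || ret == Z_DATA_ERROR || ret == Z_MEM_ERROR) {
                inflateEnd(&strm);
                return Z_DATA_ERROR;
            }
            std::fwrite(out, 1, sizeof(out) - strm.avail_out, dst);
        } while (strm.avail_out == 0);
    } while (ret != Z_STREAM_END);
    inflateEnd(&strm);
    return ret == Z_STREAM_END ? Z_OK : Z_DATA_ERROR;
}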

I get an "invalid UTF-8" error when checking the string, but it seems correct when I use std::cout

I am writing some code that must read UTF-8 encoded text files and send them to OpenGL.
I am also using a library which I downloaded from this site: http://utfcpp.sourceforge.net/
When I write the string directly like this, I can show the right images in the OpenGL window:
std::string somestring = "abcçdefgğh";
// Convert string to utf32 encoding..
// I also set the locale on program startup.
But when I read the UTF-8 encoded string from a file:
The library warns me that the string is not valid UTF-8.
I can't send the string read from the file to OpenGL; it crashes.
But I can still use std::cout for the string that I read from the file (it looks right).
I use this code to read from file:
#include <fstream>
#include <string>

void something() {
    std::ifstream ifs("words.xml");
    std::string readd;
    // loop on getline itself; testing !ifs.eof() can run the body
    // one extra time after the last line has been read
    while (std::getline(ifs, readd)) {
        // do something..
    }
}
Now the question is:
If the string read from the file is not correct, why does it look as expected when I check it with std::cout?
How can I get this issue solved?
Thanks in advance :)
The shell to which you write output is probably rather robust against characters it doesn't understand; it seems not all of the software you use is. It should, however, be relatively straightforward to verify whether your byte sequence is valid UTF-8, because the UTF-8 encoding is relatively simple:
Each code point starts with a byte that encodes the number of bytes to be read and the first few bits of the value:
if the high bit is 0, the code point consists of one byte, represented by the 7 lower bits
otherwise the number of leading 1 bits represents the total number of bytes, followed by a zero bit (obviously), and the remaining bits become the high bits of the code point
since one byte is already accounted for, bytes with the high bit set and the next bit clear are continuation bytes: their lower 6 bits are part of the representation of the code point
Based on these rules, there are two things which can go wrong and make the UTF-8 invalid:
a continuation byte is encountered at a point where a start byte is expected
a start byte indicates more continuation bytes than actually follow
I don't have code at hand which could indicate where things are going wrong, but it should be fairly straightforward to write, e.g. along the lines of the sketch below.
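A minimal sketch of such a check (the firstInvalidUtf8 name is invented; over-long encodings and values above U+10FFFF are not rejected here):
#include <cstddef>
#include <string>

// Return the byte offset of the first invalid sequence,
// or std::string::npos if the whole string is valid UTF-8.
std::size_t firstInvalidUtf8(const std::string &s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if      (c < 0x80)           extra = 0;  // 0xxxxxxx: single byte
        else if ((c & 0xE0) == 0xC0) extra = 1;  // 110xxxxx: 2-byte sequence
        else if ((c & 0xF0) == 0xE0) extra = 2;  // 1110xxxx: 3-byte sequence
        else if ((c & 0xF8) == 0xF0) extra = 3;  // 11110xxx: 4-byte sequence
        else return i;                           // stray continuation or invalid start byte
        for (std::size_t k = 1; k <= extra; ++k) {
            if (i + k >= s.size() ||
                (static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                return i;                        // missing or malformed continuation byte
        }
        i += extra + 1;
    }
    return std::string::npos;
}
Running the bytes straight from std::getline through a check like this should show whether the file contents are already invalid or whether something later mangles the string.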

Why sync-safe integer?

I'm currently working with ID3v2.4.0.
Reading the 2.4.0 document, I found a particular part that I can't understand: the sync-safe integer.
Why does ID3v2 use this method?
Of course, I know why ID3v2 uses the unsynchronization scheme, which keeps an MPEG decoder from treating ID3 tag data as MPEG sync data.
But what I couldn't understand is why a sync-safe integer is used instead of the unsynchronization scheme (= inserting $00).
Is there any reason why they adopt the sync-safe integer for expressing the tag size instead of inserting $00?
These two methods have exactly the same effect.
The ID3v2 document says that the size of unsynchronized data is not known in advance.
But that statement does not make sense.
If the tag data is stored in a buffer, one can know the size of the unsynchronized data simply by replacing each problematic byte with $FF 00.
Is there anyone who can help me?
I would presume it is for simplicity, and because the unsynch/synch scheme only makes sense when used on an MPEG file.
It is trivial to read in the four bytes and convert them to a regular integer:
// pseudo code: read the four size bytes (stored most significant byte first)
unsigned char b[4];
file.read( reinterpret_cast<char*>(b), 4 );
uint32_t size = (uint32_t(b[0]) << 24) | (uint32_t(b[1]) << 16) |
                (uint32_t(b[2]) << 8)  |  uint32_t(b[3]);
// drop the unused high bit of each byte and pack the 7-bit groups together
size = (size & 0x0000007F) |
       ( (size & 0x00007F00) >> 1 ) |
       ( (size & 0x007F0000) >> 2 ) |
       ( (size & 0x7F000000) >> 3 );
If they used the same unsynch scheme as the frame data, you would need to read each byte separately, look for the FF 00 pattern, and reconstruct the integer byte by byte. Also, if the 'size' field in the header could be a variable number of bytes, due to unsynch bytes being inserted, the entire header would be a variable number of bytes. It is simpler for them to say 'the header is always 10 bytes in size and it looks like this...'.
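For the opposite direction, a small sketch (the toSyncSafe name is made up): writing a size just spreads a 28-bit value over four bytes, leaving the high bit of each byte clear.
#include <cstdint>

// Pack a 28-bit value into four sync-safe bytes, most significant byte first.
void toSyncSafe(uint32_t value, unsigned char out[4]) {
    out[0] = (value >> 21) & 0x7F;
    out[1] = (value >> 14) & 0x7F;
    out[2] = (value >>  7) & 0x7F;
    out[3] =  value        & 0x7F;
}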
ID3v2 document says that the size of unsynchronized data is not known in advance. But that statement does not make sense. If tag data is stored in buffer, one can know the size of unsynchronized data after simply replacing the problematic character with $FF 00.
You are correct, it doesn't make sense. The size written in the id3v2 header and frame headers is the size after unsynchronisation, if any, was applied. However, it is permissible to write frame data without unsynching, as id3v2 may be used for tagging files other than mp3, where the concept of unsynch/synch makes no sense. I think what section 6.2 was trying to say is 'regardless of whether this is an mp3 file, or whether a frame is written unsynched or synched, the frame size is always written in an MPEG synch-safe manner'.
ID3v2.4 frames can have the ‘Data Length Indicator’ flag set in the frame header, in which case you can find out how big a buffer is after synchronisation. Refer to section 4.1.2 of the spec.
Is there anyone who can help me?
Some helpful advice from someone who has written a conforming id3v2 tag reader: Don't try make sense of the spec. It surely was written by madmen and sadists. Just looking at it again is giving me nightmares.