What is the default size unit of unrar?

I have rar files but want to see the uncompressed size without unraring them.
I am using unrar through the command line.
I am using "unrar v", which lists:
Attributes         Size      Packed  Ratio   Date   Time  Checksum  Name
----------- ----------- ----------- ------ ------ ------ --------- -----
            32837413376  5881611972    17%                          1711
I am wondering what that size is: bytes? MB? GB? I tried conversions. I unpacked this one personally and the size is 92 GB. I checked on the command line using du -ch ./* as well as the Windows File Explorer properties.
I'm hoping the ratio does not play a role in this.
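Treating the Size column as plain bytes (which is what unrar prints), the conversions work out like this; note that the result does not match the observed 92 GB, so the discrepancy is not a units issue:

```python
size = 32_837_413_376    # "Size" column from the unrar v listing
packed = 5_881_611_972   # "Packed" column

assert round(size / 2**30, 1) == 30.6    # ~30.6 GiB uncompressed
assert round(packed / 2**30, 1) == 5.5   # ~5.5 GiB packed

# The packed/size ratio matches the listed 17% (truncated).
assert int(packed / size * 100) == 17
```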

Related

Removing null bytes from a file results in larger output after XZ compression

I am developing a custom file format consisting of a large number of integers. I'm using XZ to compress the data as it has the best compression ratio of all compression algorithms I've tested.
All integers are stored as u32s in RAM, but they are all at most 24 bits large, so I updated the serializer to skip the high byte (as it will always be 0 anyway) to try to compress the data further. However, this had the opposite effect: the compressed file is now larger than the original.
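The serializer change described can be sketched like so (a hypothetical packer, assuming little-endian u32s as the hexdumps below suggest; this is not the actual serializer code):

```python
import struct

def pack24(values):
    """Pack integers below 2**24 as 3 little-endian bytes each,
    dropping the always-zero high byte of their u32 representation."""
    out = bytearray()
    for v in values:
        assert v < (1 << 24)
        out += struct.pack("<I", v)[:3]  # low 3 bytes of the little-endian u32
    return bytes(out)

def pack32(values):
    """Pack integers as full little-endian u32s."""
    return b"".join(struct.pack("<I", v) for v in values)
```

A side effect of dropping the fourth byte is that values no longer start at 4-byte boundaries; LZMA2's position-dependent context (the pb parameter mentioned later) can exploit such alignment, which is one plausible contributor to the worse ratio.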
$ xzcat 24bit.xz | hexdump -e '3/1 "%02x" " "'
020000 030000 552d07 79910c [...] b92c23 c82c23
$ xzcat 32bit.xz | hexdump -e '4/1 "%02x" " "'
02000000 03000000 552d0700 79910c00 [...] b92c2300 c82c2300
$ xz -l 24bit.xz 32bit.xz
Strms  Blocks   Compressed  Uncompressed  Ratio  Check   Filename
    1       1     82.4 MiB     174.7 MiB  0.472  CRC64   24bit.xz
    1       1     77.2 MiB     233.0 MiB  0.331  CRC64   32bit.xz
-------------------------------------------------------------------------------
    2       2    159.5 MiB     407.7 MiB  0.391  CRC64   2 files
Now, I wouldn't have an issue if the size of the file had remained the same, as a perfect compressor would detect that all of those bytes are redundant anyway and compress them all down to practically nothing. However, I do not understand how removing data from the source file can possibly result in a larger file after compression.
I've tried changing the LZMA compressor settings, and xz --lzma2=preset=9e,lc=4,pb=0 yielded a slightly smaller file at 82.2M, but this is still significantly larger than the original file.
For reference, both files can be downloaded from https://files.spaceface.dev/sW1Xif/32bit.xz and https://files.spaceface.dev/wKXEQm/24bit.xz.
The order of the integers is somewhat important, so naively sorting the entire file won't work. The file is made up of different chunks, and the numbers making up each chunk are currently sorted for slightly better compression; however, the order of the numbers within a chunk does not matter, only the order of the chunks themselves.
Chunk 1: 000002 000003 072d55 0c9179 148884 1e414b
Chunk 2: 00489f 0050c5 0080a6 0082f0 0086f6 0086f7 01be81 03bdb1 03be85 03bf4e 04dfe6 04dfea 0583b1 061125 062006 067499 07d7e6 08074d 0858b8 09d35d 09de04 0cfd78 0d06be 0d3869 0d5534 0ec366 0f529c 0f6d0d 0fecce 107a7e 107ab3 13bc0b 13e160 15a4f9 15ab39 1771e3 17fe9c 18137d 197a30 1a087a 1a2007 1ab3b9 1b7d3c 1ba52c 1bc031 1bcb6b 1de7d2 1f0866 1f17b6 1f300e 1f39e1 1ff426 206c51 20abbe 20cbbc 211a58 211a59 215f73 224ea8 227e3f 227eab 22f3b7 231aef 004b15 004c86 0484e7 06216e 08074d 0858b8 0962ed 0eb020 0ec366 1a62c2 1fefae 224ea8 0a2701 1e414b
Chunk 3: 000006 003b17 004b15 004b38 [...]

Concatenating text and binary data into one file

I am developing an application, and I have several pieces of data that I want to be able to save to and open from the same file. The first is several lines that are essentially human readable that store simple data about certain attributes. The data is stored in AttributeList objects that support operator<< and operator>>. The rest are .png images which I have loaded into memory.
How can I save all this data to one file in such a way that I can then easily read it back into memory? Is there a way to store the image data in memory that will make this easier?
How can I save all this data to one file in such a way that I can then
easily read it back into memory? Is there a way to store the image
data in memory that will make this easier?
Yes.
In an embedded system I once worked on, the requirement was to capture system configuration into a RAM file system (1 MB).
We used zlib to compress and 'merge' multiple files into a single storage file.
Perhaps any compression system can work for you. On Linux, I would use popen() to run gzip or gunzip, etc.
Update 2017-08-07
In my popen demo (for this question), I build the command string with standard shell commands:
std::string cmd;
cmd += "tar -cvf dumy514DEMO.tar dumy*V?.cc ; gzip dumy514DEMO.tar ; ls -lsa *.tar.gz";
// tar without compression ; next do compress
Then construct my popen-wrapped-in-a-class instance and invoke the popen read action. There is normally very little feedback to the user (as is the style of UNIX Philosophy, i.e. no success messages), so I included (for this demo) the -v (for verbose option). The resulting feedback lists the 4 files tar'd together, and I list the resulting .gz file.
dumy514V0.cc
dumy514V1.cc
dumy514V2.cc
dumy514V3.cc
8 -rw-rw-r-- 1 dmoen dmoen 7983 Aug 7 17:23 dumy514DEMO.tar.gz
And a snippet from the dir listing shows my executable, my source code, and the newly created tar.gz.
-rwxrwxr-x 1 dmoen dmoen 86416 Aug 7 17:18 dumy514DEMO
-rw-rw-r-- 1 dmoen dmoen 13576 Aug 7 17:18 dumy514DEMO.cc
-rw-rw-r-- 1 dmoen dmoen 7983 Aug 7 17:23 dumy514DEMO.tar.gz
As you can see, the tar.gz is about 8000 bytes. The 4 files add to about 70,000 bytes.

Inconsistent results in compressing with zlib between Win32 and Linux-64 bit

I'm using zlib in a program and noticed a one-byte difference in how "foo" is compressed on Windows (1F8B080000000000000A4BCBCF07002165738C03000000) and on Linux (1F8B08000000000000034BCBCF07002165738C03000000). Both decompress back to "foo".
I decided to check outside our code to see if the implementation was correct and used the test programs in the zlib repository to double check. I got the same results:
Linux: echo -n foo | ./minigzip64 > text.txt
Windows: echo|set /p="foo" | minigzip > text.txt
What would account for this difference? Is it a problem?
1F8B 0800 0000 0000 00xx 4BCB CF07 0021 6573 8C03 0000 00   (xx = 0A on Windows, 03 on Linux)
First off, if it decompresses to what was compressed, then it's not a problem. Different compressors, or the same compressor at different settings, or even the same compressor with the same settings, but different versions, can produce different compressed output from the same input.
Second, the compressed data in this case is identical. Only the last byte of the gzip header that precedes the compressed data is different. That byte identifies the originating operating system, hence it rightly varies between Linux and Windows. Even on the same operating system the header can vary, since it carries a modification date and time. However, in both your cases the modification date and time were left out (set to zeros).
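The single differing byte is easy to verify directly from the hex strings in the question (the stream bytes below are copied verbatim from it):

```python
import gzip

# The two streams from the question, as raw bytes.
win = bytes.fromhex("1F8B080000000000000A4BCBCF07002165738C03000000")
lin = bytes.fromhex("1F8B08000000000000034BCBCF07002165738C03000000")

# They differ only at offset 9, the OS field of the 10-byte gzip header.
assert [i for i in range(len(win)) if win[i] != lin[i]] == [9]
assert (win[9], lin[9]) == (0x0A, 0x03)  # 0x03 is "Unix" in the gzip spec

# Both decompress to the same payload.
assert gzip.decompress(win) == gzip.decompress(lin) == b"foo"
```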
Just to add to the accepted answer here: I got curious, tried it out for myself, and opened the saved raw data with 7-Zip on both systems (screenshots of the Windows and Linux archives not reproduced here). You can immediately notice that the only field that's different is the Host OS.
What the data means
Header Data Footer
1F8B080000000000000A | 4BCBCF0700 | 2165738C03000000
Let's break that down.
Header
First, from this answer I realize it's actually a gzip instead of a zlib header:
Level ZLIB GZIP
1 | 78 01 | 1F 8B
9 | 78 DA | 1F 8B
Further searching led me to an article about Gzip on forensics wiki.
The values in this case are:
Offset Size Value Description
0 | 2 | 1f8b | Signature (or identification bytes 1 and 2)
2 | 1 | 08 | Compression method (deflate)
3 | 1 | 00 | Flags (none set)
4 | 4 | 00000000 | Last modification time (not set)
8 | 1 | 00 | Compression flags (or extra flags)
9 | 1 | 0a | Operating system (TOPS-20)
Footer
Offset Size Value Description
0 | 4 | 2165738C | Checksum (CRC-32, little endian)
4 | 4 | 03000000 | Uncompressed data size in bytes (little endian), here 3
An interesting thing to note here is that even if the Last modification time and Operating system fields in the header differ, the data compresses to the same bytes, with the same checksum in the footer.
IETF RFC 1952 has a more detailed summary of the format.
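The footer fields can be checked directly; the checksum covers the uncompressed data, which is why it is unaffected by the header differences:

```python
import struct
import zlib

# Footer bytes from the breakdown above: CRC-32 then uncompressed size,
# both little endian.
footer = bytes.fromhex("2165738C03000000")
crc, isize = struct.unpack("<II", footer)

assert crc == zlib.crc32(b"foo")  # checksum of the *uncompressed* data
assert isize == len(b"foo")       # uncompressed size: 3 bytes
```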

Default WAV description when all specs are "0"

I'm learning how to read WAV files in C++ and extract data according to the header. I have a few WAV files lying around. By looking at the header of all of them, I see that they all follow the rules of wave files. However, recordings produced by TeamSpeak are weird, yet they're still playable in media players.
So looking at the standard format of WAV files, it looks like this (header-layout picture not reproduced here):
So in all files that look normal, I get legitimate values for all the values from "AudioFormat" up to "BitsPerSample" (from the picture). However, in TeamSpeak files, ALL these values are exactly zero.
The TeamSpeak files look like this too, except that the first 3 values are not zero: there's "RIFF" and "WAVE" in the first and third strings, and the ChunkSize seems legit.
So my question is: How does the player know anything about such a file and recognize that this file is mono or stereo? The sample rate? Anything about it? Is it like there's something standard to assume when all these values are zero?
Update
I examined the file with MediaInfo and got this:
General
Complete name : ts3_recording_16_10_02_17_53_54.wav
Format : Wave
File size : 2.45 MiB
Duration : 13 s 380 ms
Overall bit rate mode : Constant
Overall bit rate : 1 536 kb/s
Audio
Format : PCM
Format settings, Endianness : Little
Format settings, Sign : Signed
Codec ID : 1
Duration : 13 s 380 ms
Bit rate mode : Constant
Bit rate : 1 536 kb/s
Channel(s) : 2 channels
Sampling rate : 48.0 kHz
Bit depth : 16 bits
Stream size : 2.45 MiB (100%)
I still don't understand, though, how it arrived at these conclusions.
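One observation: the values MediaInfo reports are mutually consistent, which suggests it derived them from the data size and duration rather than from the zeroed header fields (a plausibility check only, not a claim about MediaInfo's internals):

```python
# Values reported by MediaInfo above.
sample_rate = 48_000  # Hz
bit_depth = 16        # bits per sample
channels = 2
duration_s = 13.380   # 13 s 380 ms

# 48 kHz * 16 bit * 2 channels = 1 536 000 bit/s, matching "1 536 kb/s".
bit_rate = sample_rate * bit_depth * channels
assert bit_rate == 1_536_000

# That rate over 13.38 s gives ~2.45 MiB, matching the reported stream size.
stream_bytes = duration_s * bit_rate / 8
assert round(stream_bytes / 2**20, 2) == 2.45
```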
After examining your file with a hex editor using WAV binary templates, it is obvious that there is an additional "JUNK" chunk before the "fmt " one (screenshot not reproduced here). The JUNK chunk is possibly there for padding reasons, but all its values are 0s. You need to seek (fseek, maybe) through the wav file in your code to the first occurrence of the "fmt " bytes and parse the WAVEFORMATEX info from there.
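The seek-for-"fmt " approach can be sketched as a generic RIFF chunk walker (a minimal sketch in Python rather than the C++ the question uses; real parsing needs more error handling):

```python
import struct

def find_chunk(data, fourcc):
    """Walk the RIFF chunks of a WAV file held in memory and return the body
    of the first chunk whose ID matches `fourcc`, skipping e.g. JUNK chunks."""
    assert data[:4] == b"RIFF" and data[8:12] == b"WAVE"
    pos = 12
    while pos + 8 <= len(data):
        cid, size = struct.unpack_from("<4sI", data, pos)
        if cid == fourcc:
            return data[pos + 8 : pos + 8 + size]
        pos += 8 + size + (size & 1)  # chunk bodies are word-aligned
    return None
```

Calling find_chunk(data, b"fmt ") on a TeamSpeak recording would skip the zero-filled JUNK chunk and land on the real format block.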

Windows Get list of ALL files on volume with size

Question: how to list all files on volume with size they occupy on disk?
Applicable solutions:
cmd script
free tool with sqlite/txt/xls/xml/json output
C++ / winapi code
The problem:
There are many tools and APIs to list files, but their results don't match chkdsk and the actual free-space info:
                                   Size   Count (x1000)
chkdsk c:                         67 GB   297
dir /S                            42 GB   267
FS Inspect                        47 GB   251
Total Commander (Ctrl+L)          47 GB   251
explorer (selection size)         44 GB   268
explorer (volume info)            67 GB   -
WinDirStat                        45 GB   245
TreeSize                          (couldn't download it - site unavailable)
C++ FindFirstFile/FindNextFile    50 GB   288
C++ GetFileInformationByHandleEx  50 GB   288
Total volume size is 70 GB, about 3 GB is actually free.
I'm aware of:
A file can occupy more space on disk than its actual size; I need the size it occupies (i.e. the greater one)
Symlinks, junctions, etc. - it would be good to see them (though I don't think these alone can really account for a 20 GB difference in my case)
The filesystem uses some space for indexes and system info (chkdsk shows this is negligible; it doesn't explain 20 GB)
I run all tools with admin privileges, and hidden files are shown.
FindFirstFile/FindNextFile C++ solution - this doesn't give correct results; I don't know why, but it gives the same as Total Commander, NOT the same as chkdsk.
Practical problem:
I have a 70 GB SSD disk; all the tools report about 50 GB occupied, but in fact it's almost full.
Formatting everything and reinstalling is not an option, since this will happen again quite soon.
I need a report of file sizes whose total matches the actual used and free space. I'm looking for an existing solution: a tool, a script, a C++ library, or C++ code.
(Actual output below)
chkdsk c:
Windows has scanned the file system and found no problems.
No further action is required.
73715708 KB total disk space.
70274580 KB in 297259 files.
167232 KB in 40207 indexes.
0 KB in bad sectors.
463348 KB in use by the system.
65536 KB occupied by the log file.
2810548 KB available on disk.
4096 bytes in each allocation unit.
18428927 total allocation units on disk.
702637 allocation units available on disk.
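As a sanity check, the chkdsk figures above are internally consistent: allocation units times cluster size reproduces the KB totals exactly.

```python
# Figures from the chkdsk output above.
cluster = 4096            # bytes in each allocation unit
total_units = 18_428_927  # total allocation units on disk
free_units = 702_637      # allocation units available on disk

# Total units * cluster size matches the reported 73715708 KB total space...
assert total_units * cluster // 1024 == 73_715_708

# ...and free units * cluster size matches the 2810548 KB available.
assert free_units * cluster // 1024 == 2_810_548
```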
dir /S
Total Files Listed:
269966 File(s) 45 071 190 706 bytes
143202 Dir(s) 3 202 871 296 bytes free
FS Inspect http://sourceforge.net/projects/fs-inspect/
47.4 GB 250916 Files
Total Commander
49709355k, 48544M 250915 Files
On a POSIX system, the answer would be to use the stat function. Unfortunately, it does not give the number of allocated blocks on Windows, so it does not meet your requirements.
The correct function from Windows API is GetFileInformationByHandleEx. You can use FindFirstFile, FindNextFile to browse the full disk, and ask for FileStandardInfo to get a FILE_STANDARD_INFO that contains for a file (among other fields): LARGE_INTEGER AllocationSize for the allocated size and LARGE_INTEGER EndOfFile for the used size.
Alternatively, you can use GetFileInformationByHandleEx directly on directories, asking for FileIdBothDirectoryInfo to get a FILE_ID_BOTH_DIR_INFO structure. This allows you to get information on many files in a single call. My advice would be to use that one, even though it is less commonly used.
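For contrast, on a POSIX system the allocated size is exposed through stat's st_blocks field, the very thing the answer notes is missing on Windows (a minimal sketch, POSIX-only):

```python
import os
import tempfile

def sizes(path):
    """Return (apparent_size, allocated_size) for a file on a POSIX system.
    st_blocks is always counted in 512-byte units, regardless of the
    filesystem's actual block size."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512

# Demo on a small temporary file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 10_000)
    name = f.name
apparent, allocated = sizes(name)
os.unlink(name)
```

Summing the larger of the two values per file is what would make a report line up with the volume's used-space figure, cluster overhead included.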
To get list of all files (including hidden and system files), sorted within directories with descending size, you can go to your cmd.exe and type:
dir /s/a:-d/o:-s C:\* > "list_of_files.txt"
Where:
/s lists files within the specified directory and all subdirectories,
/a:-d lists only files (no directories),
/o:-s puts the files within each directory in descending size order,
C:\* means all directories on disk C,
> "list_of_files.txt" means save output to list_of_files.txt file
Listing files grouped by directory may be a little inconvenient, but it's the easiest way to list all files. For more information, take a look at technet.microsoft.com
Checked on Win7 Pro.