TAR file format issue - compression

It is unclear to me what the correct .tar file format is, because I am seeing proper functionality with all three scenarios below.
According to the .tar specification I have been working with, the magic field (ustar) is a null-terminated character string and the version field is an octal number with no trailing nulls.
However, I've reviewed several .tar files I found on my server and found different implementations of the magic and version fields, and all three of them seem to work properly, probably because the system ignores those fields.
Note the three differing bytes between the words ustar and root in the following examples:
Scenario 1 (20 20 00):
000000F0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ............
000000FC 00 00 00 00 | 00 75 73 74 | 61 72 20 20 .....ustar
00000108 00 72 6F 6F | 74 00 00 00 | 00 00 00 00 .root.......
00000114 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ............
Scenario 2 (00 20 20):
000000F0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ............
000000FC 00 00 00 00 | 00 75 73 74 | 61 72 00 20 .....ustar.
00000108 20 72 6F 6F | 74 00 00 00 | 00 00 00 00 root.......
00000114 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ............
Scenario 3 (00 00 00):
000000F0 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ............
000000FC 00 00 00 00 | 00 75 73 74 | 61 72 00 00 .....ustar..
00000108 00 72 6F 6F | 74 00 00 00 | 00 00 00 00 .root.......
00000114 00 00 00 00 | 00 00 00 00 | 00 00 00 00 ............
Which one is the correct format?

In my opinion none of your examples is the correct one, at least not for the POSIX format.
As you can read in the POSIX header definition:
/* tar Header Block, from POSIX 1003.1-1990. */

/* POSIX header */
struct posix_header
{                              /* byte offset */
  char name[100];              /*   0 */
  char mode[8];                /* 100 */
  char uid[8];                 /* 108 */
  char gid[8];                 /* 116 */
  char size[12];               /* 124 */
  char mtime[12];              /* 136 */
  char chksum[8];              /* 148 */
  char typeflag;               /* 156 */
  char linkname[100];          /* 157 */
  char magic[6];               /* 257 */
  char version[2];             /* 263 */
  char uname[32];              /* 265 */
  char gname[32];              /* 297 */
  char devmajor[8];            /* 329 */
  char devminor[8];            /* 337 */
  char prefix[155];            /* 345 */
};

#define TMAGIC   "ustar"        /* ustar and a null */
#define TMAGLEN  6
#define TVERSION "00"           /* 00 and no null */
#define TVERSLEN 2
The format of your first example (Scenario 1) seems to match the old GNU header format:
/* OLDGNU_MAGIC uses both magic and version fields, which are contiguous.
Found in an archive, it indicates an old GNU header format, which will be
hopefully become obsolescent. With OLDGNU_MAGIC, uname and gname are
valid, though the header is not truly POSIX conforming */
#define OLDGNU_MAGIC "ustar " /* 7 chars and a null */
In both your second and third examples (Scenario 2 and Scenario 3), the version field is set to an unexpected value (according to the documentation above, the correct value is "00" in ASCII, i.e. 0x30 0x30 in hex), so this field is most likely ignored.
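As a quick illustration (a sketch of my own, not taken from the tar sources), classifying a header by the magic and version bytes at offset 257 could look like this in C++; the archive name is just a placeholder:

#include <cstring>
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream tarFile("archive.tar", std::ios::binary);
    unsigned char magic[8] = {0};                        // magic[6] followed by version[2]
    tarFile.seekg(257, std::ios::beg);
    tarFile.read(reinterpret_cast<char*>(magic), 8);

    const unsigned char posix_magic[8]  = { 'u','s','t','a','r', 0, '0','0' };  // "ustar\0" + "00"
    const unsigned char oldgnu_magic[8] = { 'u','s','t','a','r',' ',' ', 0 };   // "ustar  \0"

    if (std::memcmp(magic, posix_magic, 8) == 0)
        std::cout << "POSIX ustar header\n";
    else if (std::memcmp(magic, oldgnu_magic, 8) == 0)
        std::cout << "old GNU header\n";
    else
        std::cout << "unknown or pre-POSIX header\n";
}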

With Fedora 18, if I execute this command:
tar --format=posix -cvf testPOSIX.tar test.txt
I get a POSIX tar file format with: ustar\0 (0x75 73 74 61 72 00)
whereas if I execute this:
tar --format=gnu -cvf testGNU.tar test.txt
I get a GNU tar file format with: ustar followed by 0x20 0x20 0x00 (0x75 73 74 61 72 20 20 00), the old GNU format.
From /usr/share/magic file:
# POSIX tar archives
257 string ustar\0 POSIX tar archive
!:mime application/x-tar # encoding: posix
257 string ustar\040\040\0 GNU tar archive
!:mime application/x-tar # encoding: gnu
0x20 is 40 in octal.
I've also tried to edit the hex code to:
00 20 20
and the tar still worked correctly; I extracted test.txt without a problem.
But when I tried to edit the hex code to:
00 00 00
the tar was not recognized.
So my conclusion is that the correct format is:
20 20 00
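For reference, a minimal C++ sketch of the hex edit described above (my own illustration, reusing the file name from the earlier tar command); it overwrites the three bytes that follow "ustar", i.e. the sixth byte of magic[] plus the two version bytes at offsets 262-264:

#include <fstream>

int main()
{
    std::fstream tarFile("testGNU.tar",
                         std::ios::in | std::ios::out | std::ios::binary);
    const char oldgnu[3] = { 0x20, 0x20, 0x00 };   // the "ustar  \0" style ending
    tarFile.seekp(262, std::ios::beg);             // byte right after "ustar"
    tarFile.write(oldgnu, 3);
}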

Related

Memory Leak with Openssl when allocating memory for X509_STORE

I am using openssl in my project. When I exit my application I get "Detected memory leaks!" in Visual Studio 2013.
Detected memory leaks!
Dumping objects ->
{70202} normal block at 0x056CB738, 12 bytes long.
Data: <8 j > 38 E8 6A 05 00 00 00 00 04 00 00 00
{70201} normal block at 0x056CB6E8, 16 bytes long.
Data: < > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
{70200} normal block at 0x056CB698, 20 bytes long.
Data: < l > 00 00 00 00 E8 B6 6C 05 00 00 00 00 04 00 00 00
{70199} normal block at 0x056AE838, 12 bytes long.
Data: < l > 04 00 00 00 98 B6 6C 05 00 00 00 00
{70198} normal block at 0x056CB618, 64 bytes long.
Data: < > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
{70197} normal block at 0x056CB578, 96 bytes long.
Data: < l 3 3 > 18 B6 6C 05 00 FE C0 33 C0 FD C0 33 08 00 00 00
Object dump complete.
When I add
_CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF);
_CrtSetBreakAlloc(70202);
to the main function, I always get a breakpoint at the allocation of the X509 store, no matter which of the 6 numbers (70202, ...) I set the breakpoint for.
I initialize and uninitialize the X509 store in a class's constructor and destructor (see below).
Is there anything else I need to look out for when using the x509_STORE?
Foo::CSCACerts::CSCACerts(void)
{
    m_store = X509_STORE_new();
}

Foo::CSCACerts::~CSCACerts(void)
{
    X509_STORE_free( m_store );
}
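One thing worth checking (an assumption on my part, not something confirmed for this project): with OpenSSL 1.0.x the library keeps global tables (error strings, registered ciphers, ex_data) that the CRT leak checker reports as leaks unless they are released before the dump runs. A sketch of the usual cleanup calls:

#include <openssl/crypto.h>
#include <openssl/err.h>
#include <openssl/evp.h>

void shutdown_openssl()
{
    CRYPTO_cleanup_all_ex_data();   // release ex_data kept in library globals
    EVP_cleanup();                  // unregister ciphers and digests
    ERR_free_strings();             // free the error-string tables
}

Whether these account for the six blocks above is not certain; they are simply the shutdown routines the OpenSSL 1.0.x documentation lists.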

(VS15 C++) Got a Visual Leak Detector report, but what now?

Because of some (strange) problems in my C++ project, I used Visual Leak Detector (for the first time) to check the project for memory leaks.
I got, among others, the following reports:
WARNING: Visual Leak Detector detected memory leaks!
---------- Block 4 at 0x004D07B0: 200 bytes ----------
Leak Hash: 0xD2D1B4A0, Count: 1, Total 200 bytes
Call Stack (TID 8796):
ucrtbase.dll!malloc()
f:\dd\vctools\crt\vcstartup\src\heap\new_scalar.cpp (19): LASS.exe!operator new() + 0x8 bytes
clr.dll!0x72D616E5()
Data:
28 75 14 03 00 00 00 00 01 00 00 00 00 00 00 00 (u...... ........
9A 99 99 99 99 99 B9 3F 50 00 00 00 0A 00 00 00 .......? P.......
00 00 00 00 F4 01 00 00 00 00 00 00 01 00 00 00 ........ ........
7B 14 AE 47 E1 7A 74 3F 14 00 00 00 BA FF FF FF {..G.zt? ........
00 00 00 00 F4 01 00 00 00 00 00 00 01 00 00 00 ........ ........
7B 14 AE 47 E1 7A 84 3F 00 00 00 00 64 00 00 00 {..G.z.? ....d...
00 00 00 00 01 00 00 00 14 00 00 00 46 00 00 00 ........ ....F...
00 00 00 00 64 00 00 00 00 00 00 00 F4 01 00 00 ....d... ........
01 00 00 00 B8 E2 13 03 F0 AD 18 03 00 00 00 00 ........ ........
C8 E2 13 03 C8 AB 18 03 00 00 00 00 78 E3 13 03 ........ ....x...
B8 AC 18 03 00 00 00 00 68 E2 13 03 E8 AC 18 03 ........ h.......
00 00 00 00 14 00 00 00 01 00 00 00 64 00 00 00 ........ ....d...
01 00 00 00 00 00 00 00 ........ ........
---------- Block 20 at 0x004D0880: 200 bytes ----------
Leak Hash: 0xD2D1B4A0, Count: 1, Total 200 bytes
Call Stack (TID 8796):
ucrtbase.dll!malloc()
f:\dd\vctools\crt\vcstartup\src\heap\new_scalar.cpp (19): LASS.exe!operator new() + 0x8 bytes
clr.dll!0x72D616E5()
Data:
78 74 14 03 00 00 00 00 01 00 00 00 00 00 00 00 xt...... ........
9A 99 99 99 99 99 B9 3F 50 00 00 00 0A 00 00 00 .......? P.......
00 00 00 00 F4 01 00 00 00 00 00 00 01 00 00 00 ........ ........
7B 14 AE 47 E1 7A 74 3F 14 00 00 00 BA FF FF FF {..G.zt? ........
00 00 00 00 F4 01 00 00 00 00 00 00 01 00 00 00 ........ ........
7B 14 AE 47 E1 7A 84 3F 00 00 00 00 64 00 00 00 {..G.z.? ....d...
00 00 00 00 01 00 00 00 14 00 00 00 46 00 00 00 ........ ....F...
00 00 00 00 64 00 00 00 00 00 00 00 F4 01 00 00 ....d... ........
01 00 00 00 38 E2 13 03 00 F0 15 03 00 00 00 00 ....8... ........
B8 E1 13 03 88 00 7F 05 00 00 00 00 08 E2 13 03 ........ ........
20 FF 7E 05 00 00 00 00 E8 E1 13 03 80 FF 7E 05 ..~..... ......~.
00 00 00 00 14 00 00 00 01 00 00 00 64 00 00 00 ........ ....d...
01 00 00 00 00 00 00 00 ........ ........
---------- Block 31 at 0x0053E1B8: 72 bytes ----------
Leak Hash: 0x3F88029B, Count: 1, Total 72 bytes
Call Stack (TID 8796):
ucrtbase.dll!malloc()
f:\dd\vctools\crt\vcstartup\src\heap\new_scalar.cpp (19): LASS.exe!operator new() + 0x8 bytes
clr.dll!0x72D616E5()
Data:
60 BC 55 00 40 3E 80 05 A0 3F 80 05 A0 3F 80 05 `.U.#>.. .?...?..
60 BB 55 00 20 34 18 03 00 00 00 00 00 00 00 00 `.U..4.. ........
00 00 00 00 20 00 00 00 2F 00 00 00 80 BC 55 00 ........ /.....U.
00 2E 18 03 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
20 00 00 00 2F 00 00 00 ..../... ........
---------- Block 33 at 0x0055BB60: 8 bytes ----------
Leak Hash: 0xA49C5AA6, Count: 1, Total 8 bytes
Call Stack (TID 8796):
ucrtbase.dll!malloc()
f:\dd\vctools\crt\vcstartup\src\heap\new_scalar.cpp (19): LASS.exe!operator new() + 0x8 bytes
clr.dll!0x72D616E5()
Data:
C8 E1 53 00 00 00 00 00                          ..S..... ........
//And many more...
Unfortunately I do not understand what VLD is telling me the problem is.
Double-clicking on the "f:\dd..." lines should set my cursor to the line with the problem, shouldn't it? But it doesn't.
My question is now: how do I get to the area of the problem, or in other words, how do I read these reports?
In addition:
I use Visual Studio 2015
The project is a C++ Windows Forms Project
I included vld.h in the additional include directories and the lib directory in the additional library directories of the project
In main() I use #include <vld.h> and _CrtDumpMemoryLeaks();
EDIT:
My Main (a reduced version, but it gives similar reports):
//some class-includes
#include <vld.h>

using namespace System;
using namespace System::Windows::Forms;
using namespace std;

#define _CRTDBG_MAP_ALLOC
#include <stdlib.h>
#include <crtdbg.h>

[STAThread]
void Main()
{
    Application::EnableVisualStyles();
    Application::SetCompatibleTextRenderingDefault(false);

    Experiment* experiment = new Experiment();
    Experiment_List* running_experiments = new Experiment_List();

    while (!experiment->end) {
        experiment = new Experiment();
        LASS::MainWindow form(experiment, running_experiments);
        form.ShowDialog();
        if (!experiment->end) {
            running_experiments->register_experiment(experiment);
        }
    }
    running_experiments->end_all();
    _CrtDumpMemoryLeaks();
    exit(0);
}
Unfortunately there are about 40 classes that I do not want to post...
I don't know where exactly the problem is.

For me, it helps to run the program in RELEASE mode instead of DEBUG mode.
I suppose my problem is the handling of managed and unmanaged code together:
I have unmanaged code inside managed code.
It seems as if the CLR uses a different new operator in Debug mode, one that does not conform to the C++ standard.
According to: Using push_back() for STL List in C++ causes Access Violation, Crash
If you malloc() a C++ class, no constructors will be called for any of
that class's fields
And Visual Studio will step into a constructor in new_scalar.cpp.
People say this depends on Visual Leak Detector (VLD), which you use in your includes.
In the end, try to separate your code with
#pragma managed
and
#pragma unmanaged
And run in RELEASE mode.
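A minimal sketch of that separation in a single C++/CLI source file (the function names are illustrative, not from the project):

#pragma unmanaged            // everything below compiles to native code
#include <vector>

void native_work()
{
    std::vector<int> samples(100);   // allocated through the native CRT operator new
}

#pragma managed              // back to MSIL for the WinForms side
void managed_entry()
{
    native_work();
}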

Create a 44-byte header with ffmpeg

I made a program using the ffmpeg libraries that converts an audio file to a wav file. The only problem is that it doesn't create a 44-byte header. When I input the file into Kaldi Speech Recognition, it produces the error:
ERROR (online2-wav-nnet2-latgen-faster:Read4ByteTag():wave-reader.cc:74) WaveData: expected 4-byte chunk-name, got read errror
I ran the file through shntool and it reports a 78-byte header. Is there any way I can get the standard 44-byte header using the ffmpeg libraries?
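For reference, a sketch of the plain 44-byte PCM RIFF/WAVE header layout being asked about (this is the standard layout, not code from the post):

#include <cstdint>

#pragma pack(push, 1)
struct WavHeader {               // offset  size
    char     riff[4];            //   0      4   "RIFF"
    uint32_t riff_size;          //   4      4   total file size minus 8
    char     wave[4];            //   8      4   "WAVE"
    char     fmt[4];             //  12      4   "fmt "
    uint32_t fmt_size;           //  16      4   16 for PCM
    uint16_t audio_format;       //  20      2   1 = PCM
    uint16_t num_channels;       //  22      2
    uint32_t sample_rate;        //  24      4
    uint32_t byte_rate;          //  28      4
    uint16_t block_align;        //  32      2
    uint16_t bits_per_sample;    //  34      2
    char     data[4];            //  36      4   "data"
    uint32_t data_size;          //  40      4
};                               // 44 bytes in total
#pragma pack(pop)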
FFmpeg inserts some metadata about the encoder into the file's header. Here is the hexdump of the header before the fix:
00000000 52 49 46 46 06 90 00 00 57 41 56 45 66 6d 74 20 |RIFF....WAVEfmt |
00000010 10 00 00 00 01 00 01 00 40 1f 00 00 80 3e 00 00 |........#....>..|
00000020 02 00 10 00 4c 49 53 54 1a 00 00 00 49 4e 46 4f |....LIST....INFO|
00000030 49 53 46 54 0e 00 00 00 4c 61 76 66 35 36 2e 33 |ISFT....Lavf56.3|
00000040 36 2e 31 30 30 00 64 61 74 61 c0 8f 00 00 00 00 |6.100.data......|
As you can see, Lavf56.36.100 is the encoder string in the header. Here is the portion of code I used to get rid of it:
std::cout << "------------------BEFORE-----------------------" << std::endl;
std::cout << av_dict_count((*ofmt_ctx)->metadata) << std::endl;
std::cout << "-------------------------------------------" << std::endl;

if (av_dict_set(&(*ofmt_ctx)->metadata, "ISFT", NULL, AV_DICT_IGNORE_SUFFIX)) {
    std::cerr << "Nope it, didn't work :(" << std::endl;
}

ret = avformat_write_header(*ofmt_ctx, &(*ofmt_ctx)->metadata);
if (ret < 0) {
    std::cout << "-------------------------------------------" << std::endl;
    av_log(NULL, AV_LOG_ERROR, "Error occurred when writing header to file\n");
    return ret;
}

std::cout << "------------------AFTER-----------------------" << std::endl;
std::cout << av_dict_count((*ofmt_ctx)->metadata) << std::endl;
std::cout << "-------------------------------------------" << std::endl;
Here is the hexdump afterwards:
00000000 52 49 46 46 e4 8f 00 00 57 41 56 45 66 6d 74 20 |RIFF....WAVEfmt |
00000010 10 00 00 00 01 00 01 00 40 1f 00 00 80 3e 00 00 |........#....>..|
00000020 02 00 10 00 64 61 74 61 c0 8f 00 00 00 00 00 00 |....data........|
00000030 00 00 00 00 00 00 00 00 ff ff 00 00 00 00 00 00 |................|
shntool now reports 44 bytes.
(NOTE: ofmt_ctx is a ** in this function, hence referencing the metadata dictionary as &(*ofmt_ctx)->metadata.)
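As a possible alternative (an assumption on my part, not something tested in the original post), newer FFmpeg versions can suppress the Lavf LIST/INFO chunk with the muxer's bitexact flag, which also yields the plain 44-byte header:

(*ofmt_ctx)->flags |= AVFMT_FLAG_BITEXACT;      // ask the muxer to omit encoder metadata
ret = avformat_write_header(*ofmt_ctx, NULL);   // no options dictionary needed here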

Why will cout.imbue(locale("")) cause memory leaks?

My compiler is Visual C++ 2013. The following minimal program causes a few memory leaks.
Why? And how can I fix it?
#define _CRTDBG_MAP_ALLOC
#include <stdlib.h>
#include <crtdbg.h>
#include <cstdlib>
#include <iostream>
#include <locale>
using namespace std;
int main()
{
    _CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF);
    cout.imbue(locale("")); // If this statement is commented, then OK.
}
The debug window outputs as follows:
Detected memory leaks!
Dumping objects ->
{387} normal block at 0x004FF8C8, 12 bytes long.
Data: <z h - C N > 7A 00 68 00 2D 00 43 00 4E 00 00 00
{379} normal block at 0x004FF678, 12 bytes long.
Data: <z h - C N > 7A 00 68 00 2D 00 43 00 4E 00 00 00
{352} normal block at 0x004FE6E8, 12 bytes long.
Data: <z h - C N > 7A 00 68 00 2D 00 43 00 4E 00 00 00
{344} normal block at 0x004FE498, 12 bytes long.
Data: <z h - C N > 7A 00 68 00 2D 00 43 00 4E 00 00 00
{318} normal block at 0x004FD5C8, 12 bytes long.
Data: <z h - C N > 7A 00 68 00 2D 00 43 00 4E 00 00 00
{308} normal block at 0x004F8860, 12 bytes long.
Data: <z h - C N > 7A 00 68 00 2D 00 43 00 4E 00 00 00
Object dump complete.
Detected memory leaks!
Dumping objects ->
{387} normal block at 0x004FF8C8, 12 bytes long.
Data: <z h - C N > 7A 00 68 00 2D 00 43 00 4E 00 00 00
{379} normal block at 0x004FF678, 12 bytes long.
Data: <z h - C N > 7A 00 68 00 2D 00 43 00 4E 00 00 00
{352} normal block at 0x004FE6E8, 12 bytes long.
Data: <z h - C N > 7A 00 68 00 2D 00 43 00 4E 00 00 00
{344} normal block at 0x004FE498, 12 bytes long.
Data: <z h - C N > 7A 00 68 00 2D 00 43 00 4E 00 00 00
{318} normal block at 0x004FD5C8, 12 bytes long.
Data: <z h - C N > 7A 00 68 00 2D 00 43 00 4E 00 00 00
{308} normal block at 0x004F8860, 12 bytes long.
Data: <z h - C N > 7A 00 68 00 2D 00 43 00 4E 00 00 00
Object dump complete.
The program '[0x5B44] cpptest.exe' has exited with code 0 (0x0).
I was using std::codecvt and ran into a similar problem. I am not sure whether it has the same cause; I just want to suggest a possible way to discover the root cause.
You can reference the example at http://www.cplusplus.com/reference/locale/codecvt/in/
It actually "uses" a member of mylocale, and there does not seem to be an r-value reference overload. So directly writing const facet_type& myfacet = std::use_facet<facet_type>(std::locale()); may cause the same problem.
So try:
auto myloc = locale("");
cout.imbue(myloc);
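Applied to the original program, the workaround would look like the following sketch (assuming the issue really is the temporary locale; that part is a guess):

#define _CRTDBG_MAP_ALLOC
#include <stdlib.h>
#include <crtdbg.h>
#include <iostream>
#include <locale>
using namespace std;

int main()
{
    _CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF);
    locale myloc("");        // named locale object, destroyed at the end of main
    cout.imbue(myloc);       // imbue with the named object rather than a temporary
}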

Accessing specific binary information based on binary format documentation

I have a binary file and documentation of the format the information is stored in. I'm trying to write a simple program in C++ that pulls a specific piece of information from the file, but I'm missing something, since the output isn't what I expect.
The documentation is as follows:
Half-word   Field Name      Type    Units     Range         Precision
10          Block Divider   INT*2   N/A       -1            N/A
11-12       Latitude        INT*4   Degrees   -90 to +90    0.001
There are other items in the file obviously but for this case I'm just trying to get the Latitude value.
My code is:
#include <cstdlib>
#include <iostream>
#include <fstream>

using namespace std;

int main(int argc, char* argv[])
{
    const char* dataFileLocation = "testfile.bin";
    ifstream dataFile(dataFileLocation, ios::in | ios::binary);
    if (dataFile.is_open())
    {
        char* buffer = new char[32768];
        dataFile.seekg(10, ios::beg);
        dataFile.read(buffer, 4);
        dataFile.close();
        cout << "value is " << (int)(buffer[0] & 255);
        delete[] buffer;
    }
}
The result is "value is 226", which is not in the allowed range.
I'm quite new to this, and here is what my intention was when writing the above code:
Open file in binary mode
Seek to the 11th byte from the start of the file
Read in 4 bytes from that point
Close the file
Output those 4 bytes as an integer.
If someone could point out where I'm going wrong, I'd sure appreciate it. I don't really understand the (buffer[0] & 255) part (I took that from some example code), so an explanation in layman's terms would be greatly appreciated.
Hex Dump of the first 100 bytes:
testfile.bin 98,402 bytes 11/16/2011 9:01:52
-0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -A -B -C -D -E -F
00000000- 00 5F 3B BF 00 00 C4 17 00 00 00 E2 2E E0 00 00 [._;.............]
00000001- 00 03 FF FF 00 00 94 70 FF FE 81 30 00 00 00 5F [.......p...0..._]
00000002- 00 02 00 00 00 00 00 00 3B BF 00 00 C4 17 3B BF [........;.....;.]
00000003- 00 00 C4 17 00 00 00 00 00 00 00 00 80 02 00 00 [................]
00000004- 00 05 00 0A 00 0F 00 14 00 19 00 1E 00 23 00 28 [.............#.(]
00000005- 00 2D 00 32 00 37 00 3C 00 41 00 46 00 00 00 00 [.-.2.7.<.A.F....]
00000006- 00 00 00 00 [.... ]
Since the documentation lists the field as an integer but shows the precision to be 0.001, I would assume that the actual value is the stored value multiplied by 0.001. The integer range would be -90000 to 90000.
The 4 bytes must be combined into a single integer. There are two ways to do this, big endian and little endian, and which one you need depends on the machine that wrote the file. x86 PCs, for example, are little endian.
int little_endian = buffer[0] | buffer[1]<<8 | buffer[2]<<16 | buffer[3]<<24;
int big_endian = buffer[0]<<24 | buffer[1]<<16 | buffer[2]<<8 | buffer[3];
The &255 is used to remove the sign extension that occurs when you convert a signed char to a signed integer. Use unsigned char instead and you probably won't need it.
Edit: I think "half-word" refers to 2 bytes, so you'll need to skip 20 bytes instead of 10.
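Putting the pieces of this answer together, a sketch of the corrected read might look like this (assuming half-words count from 1, so the latitude sits at byte offset 20, and that the file is big endian, which the FF FF block divider at offset 18 suggests; both points are assumptions based on the documentation excerpt and the hex dump):

#include <cstdint>
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream dataFile("testfile.bin", std::ios::in | std::ios::binary);
    if (!dataFile.is_open())
        return 1;

    unsigned char buffer[4];
    dataFile.seekg(20, std::ios::beg);   // half-words 11-12 => byte offset 20
    dataFile.read(reinterpret_cast<char*>(buffer), 4);

    int32_t raw = static_cast<int32_t>(
        static_cast<uint32_t>(buffer[0]) << 24 |
        static_cast<uint32_t>(buffer[1]) << 16 |
        static_cast<uint32_t>(buffer[2]) << 8  |
        static_cast<uint32_t>(buffer[3]));

    std::cout << "latitude: " << raw * 0.001 << " degrees" << std::endl;
}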