The following code produces a binary file that has an additional byte in front of the representation of the value 10.
#include <fstream>

int main()
{
unsigned int data0 = 8;
unsigned int data1 = 9;
unsigned int data2 = 10;
unsigned int data3 = 11;
std::ofstream file("test.bin", std::ios_base::out, std::ios_base::binary);
file.write(reinterpret_cast<char*>(&data0), sizeof(data0));
file.write(reinterpret_cast<char*>(&data1), sizeof(data1));
file.write(reinterpret_cast<char*>(&data2), sizeof(data2));
file.write(reinterpret_cast<char*>(&data3), sizeof(data3));
file.close();
return 0;
}
Here's what the hexdump of the file looks like:
08 00 00 00 09 00 00 00 0D 0A 00 00 00 0B 00 00 00
| 8 -> OK | 9 -> OK |??| 10 -> OK | 11 -> OK
What's going on here with the byte in front of the 10?
std::ofstream file("test.bin", std::ios_base::out, std::ios_base::binary);
should be
std::ofstream file("test.bin", std::ios_base::out | std::ios_base::binary);
The open mode is a single argument built by OR-ing flags together. With the comma, std::ios_base::binary is passed as a separate third argument (a non-standard protection parameter on some implementations) instead of being part of the mode, so the file is opened in text mode. On Windows, text mode expands every 0x0A byte to the CR LF pair 0D 0A, which is exactly the extra byte in front of the 10.
Credit goes to user4581301. See the comments of the question.
Related
I have a binary file. I am reading it 16 bytes at a time using fstream.
I want to convert those bytes to an integer. I tried atoi, but it didn't work.
In Python we can do that by converting to a byte string using stringobtained.encode('utf-8') and then converting it to an int using int(bytestring.hex(), 16). Do we have to follow such elaborate steps as in Python, or is there a way to convert it directly?
ifstream file(binfile, ios::in | ios::binary | ios::ate);
if (file.is_open())
{
size = file.tellg();
memblock = new char[size];
file.seekg(0, ios::beg);
while (!file.eof())
{
file.read(memblock, 16);
int a = atoi(memblock); // doesnt work 0 always
cout << a << "\n";
memset(memblock, 0, sizeof(memblock));
}
file.close();
Edit:
This is the sample contents of the file.
53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00
04 00 01 01 00 40 20 20 00 00 05 A3 00 00 00 47
00 00 00 2E 00 00 00 3B 00 00 00 04 00 00 00 01
I need to read it 16 bytes (i.e. 32 hex digits) at a time, one row of the sample above, and convert them to an integer.
So when reading 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00, I should get 110748049513798795666017677735771517696.
But I couldn't do it. I always get 0, even after trying strtoull. Am I reading the file wrong, or what am I missing?
You have a number of problems here. First is that C++ doesn't have a standard 128-bit integer type. You may be able to find a compiler extension, see for example Is there a 128 bit integer in gcc? or Is there a 128 bit integer in C++?.
Second is that you're trying to decode raw bytes instead of a character string. atoi will stop at the first non-digit character it runs into, which 246 times out of 256 will be the very first byte, thus it returns zero. If you're very unlucky you will read 16 valid digits and atoi will start reading uninitialized memory, leading to undefined behavior.
You don't need atoi anyway; your problem is much simpler than that. You just need to assemble 16 bytes into an integer, which can be done with shift and OR operations. The only complication is that read wants a char type, which will probably be signed, while you need unsigned bytes.
ifstream file(binfile, ios::in | ios::binary);
char memblock[16];
// Loop on the read itself instead of testing eof().
while (file.read(memblock, 16))
{
    // uint128_t stands in for a 128-bit compiler extension such as GCC's
    // unsigned __int128; there is no standard 128-bit integer type, and
    // the standard streams have no operator<< for the extension type, so
    // printing the value needs extra work.
    uint128_t a = 0;
    for (int i = 0; i < 16; ++i)
    {
        // Mask with 0xff because plain char is typically signed.
        a = (a << 8) | (static_cast<unsigned int>(memblock[i]) & 0xff);
    }
    cout << a << "\n";
}
file.close();
If the number is binary, what you want is:
short value;
file.read(reinterpret_cast<char*>(&value), sizeof value);
Depending upon how the file was written and your processor, you may have to reverse the bytes in value using bit operations.
I have some sample code that reads some binary data from a file and then writes the content into a stringstream.
#include <sstream>
#include <cstdio>
#include <fstream>
#include <cstdlib>
std::stringstream * raw_data_buffer;
int main()
{
std::ifstream is;
is.open ("1.raw", std::ios::binary );
char * buf = (char *)malloc(40);
is.read(buf, 40);
for (int i = 0; i < 40; i++)
printf("%02X ", buf[i]);
printf("\n");
raw_data_buffer = new std::stringstream("", std::ios_base::app | std::ios_base::out | std::ios_base::in | std::ios_base::binary);
raw_data_buffer -> write(buf, 40);
const char * tmp = raw_data_buffer -> str().c_str();
for (int i = 0; i < 40; i++)
printf("%02X ", tmp[i]);
printf("\n");
delete raw_data_buffer;
return 0;
}
With a specific input file I have, the program doesn't function correctly. You could download the test file here.
So the problem is, I write the file content into raw_data_buffer and immediately read it back, and the content differs. The program's output is:
FFFFFFC0 65 59 01 00 00 00 00 00 00 00 00 00 00 00 00 FFFFFFE0 0A 40 00 00 00 00 00 FFFFFF80 08 40 00 00 00 00 00 70 FFFFFFA6 57 6E FFFFFFFF 7F 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FFFFFFE0 0A 40 00 00 00 00 00 FFFFFF80 08 40 00 00 00 00 00 70 FFFFFFA6 57 6E FFFFFFFF 7F 00 00
The content FFFFFFC0 65 59 01 is overwritten with 0. Why so?
I suspect this is a symptom of undefined behavior from using deallocated memory. You're getting a copy of the string from the stringstream, but you're only grabbing a raw pointer to its internals, which are then immediately destroyed. (The documentation actually warns against this exact case.)
const char* tmp = raw_data_buffer->str().c_str();
// ^^^^^ returns a temporary that is destroyed
// at the end of this statement
// ^^^ now a dangling pointer
Any use of tmp would exhibit undefined behavior and could easily cause the problem you're seeing. Keep the result of str() in scope.
I have a vector (which is just a wrapper over a char array) as the input. The pkzip was created using the C# SharpZipLib.
I stored the data into a file, which I ran through a hex editor's zip template, and it checked out. The input is good; it's not malformed. This is everything but the compressed data:
50 4B 03 04 14 00 00 00 08 00 51 B2 8B 4A B3 B6
6C B0 F6 18 00 00 40 07 01 00 07 00 00 00 2D 33
31 2F 31 32 38
<compressed data (6390 bytes)>
50 4B 01 02 14 00 14 00 00 00 08 00 51 B2 8B 4A
B3 B6 6C B0 F6 18 00 00 40 07 01 00 07 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 2D 33
31 2F 31 32 38 50 4B 05 06 00 00 00 00 01 00 01
00 35 00 00 00 1B 19 00 00 00 00
I have another vector which is to be the output. The inflated data will be about 67-68 KB, so I know it fits into the buffer.
For the life of me, I cannot get minizip to inflate the former and store it into the latter.
This is what I have so far:
#define ZLIB_WINAPI
#include "minizip\zlib.h"
std::vector<unsigned char> data;
/*...*/
std::vector<unsigned char> outBuffer(1024 * 1024);
z_stream stream;
stream.zalloc = Z_NULL;
stream.zfree = Z_NULL;
stream.opaque = Z_NULL;
stream.data_type = Z_BINARY;
stream.avail_in = data.size();
stream.avail_out = outBuffer.size();
stream.next_in = &data[0];
stream.next_out = &outBuffer[0];
int ret = inflateInit(&stream);
ret = inflate(&stream, 1);
ret = inflateEnd(&stream);
I used the debugger to step through the method and monitor the ret. inflate returned value -3 with message "incorrect header check".
This is a pkzip stream, and minizip is supposed to be a wrapper library around zlib that supports pkzip, isn't it?
How do I have to modify this to make it work?
Since it starts with 50 4B 03 04 it is a PKZIP file, according to https://users.cs.jmu.edu/buchhofp/forensics/formats/pkzip.html
If these are zip files, then inflate is the wrong function. The zlib, gzip and zip formats are all different. You can read zip with zlib, if you use the right functions to do so. If you don't have the contrib, maybe download and rebuild zlib.
Here's some old code I have which works for zip files, using the zlib library. I might have moved some headers around, because the official zlib has them under zlib/contrib/minizip.
The arguments are filenames, so you'll have to modify it, or write your array to a file.
// #include <zlib/unzip.h>
#include <zlib/contrib/minizip/unzip.h>
/// return list of filenames in zip archive
std::list<std::string> GetZipFilenames(const char *szZipArchive){
std::list<std::string> results;
unzFile zip = unzOpen(szZipArchive);
if (zip){
unz_global_info info;
int rv = unzGetGlobalInfo(zip, &info);
if (UNZ_OK == unzGoToFirstFile(zip)){
do {
char szFilename[BUFSIZ];
if (UNZ_OK == unzGetCurrentFileInfo(zip, NULL, szFilename, sizeof(szFilename), NULL, 0, NULL, 0))
results.push_back(std::string(szFilename));
} while (UNZ_OK == unzGoToNextFile(zip));
}
}
return results;
}
/// extract the contents of szFilename inside szZipArchive
bool ExtractZipFileContents(const char *szZipArchive, const char *szFilename, std::string &contents){
bool result = false;
unzFile zip = unzOpen(szZipArchive);
if (zip){
if (UNZ_OK == unzLocateFile(zip, szFilename, 0)){
if (UNZ_OK == unzOpenCurrentFile(zip)){
char buffer[BUFSIZ];
int bytes; // unzReadCurrentFile returns int: >0 bytes read, 0 end of file, <0 error
while (0 < (bytes = unzReadCurrentFile(zip, buffer, sizeof(buffer)))){
contents += std::string(buffer, bytes);
}
unzCloseCurrentFile(zip);
result = (bytes == 0);
}
}
unzClose(zip);
}
return result;
}
If I understand the question correctly, you are trying to decompress the 6390 bytes of compressed data. That compressed data is a raw deflate stream, which has no zlib header or trailer. For that you would need to use inflateInit2(&stream, -MAX_WBITS) instead of inflateInit(&stream). The -MAX_WBITS requests decompression of raw deflate.
Zip-Utils
std::vector<unsigned char> inputBuffer;
std::vector<unsigned char> outBuffer(1024 * 1024);
HZIP hz = OpenZip(&inputBuffer[0], inputBuffer.size(), 0); // size(), not capacity()
ZIPENTRY ze;
GetZipItem(hz, 0, &ze);
UnzipItem(hz, 0, &outBuffer[0], 1024 * 1024);
outBuffer.resize(ze.unc_size);
If inputBuffer contains a pkzip file, this snippet will unzip it and store contents in outBuffer.
That is all.
I would like to send a struct via a QUdpSocket.
I know I should use QDataStream and QByteArray, but I can't, because the receiver will not use Qt.
I tried many things, but I never found anything that seems to do the job properly.
My struct is:
typedef struct myStruct
{
int nb_trame;
std::vector<bool> vBool;
std::vector<int> vInt;
std::vector<float> vFloat;
} myStruct;
How do I proceed to do that properly?
The solution to this is called serialization
serialization is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and reconstructed later in the same or another computer environment.
The following is a fully working example how to serialize the mentioned struct using QDataStream:
// main.cpp
#include <limits>
#include <QDataStream>
#include <QVector>
#include <vector>
typedef struct myStruct
{
int nb_trame;
std::vector<bool> vBool;
std::vector<int> vInt;
std::vector<float> vFloat;
void serialize(QDataStream &out) {
out << nb_trame;
out << QVector<bool>::fromStdVector(vBool);
out << QVector<qint32>::fromStdVector(vInt);
out << QVector<float>::fromStdVector(vFloat);
}
} myStruct;
void fillData(myStruct &s) {
s.nb_trame = 0x42;
s.vBool.push_back(true);
s.vBool.push_back(false);
s.vBool.push_back(false);
s.vBool.push_back(true);
s.vInt.push_back(0xB0);
s.vInt.push_back(0xB1);
s.vInt.push_back(0xB2);
s.vInt.push_back(0xB3);
s.vFloat.push_back(std::numeric_limits<float>::min());
s.vFloat.push_back(0.0);
s.vFloat.push_back(std::numeric_limits<float>::max());
}
int main()
{
myStruct s;
fillData(s);
QByteArray buf;
QDataStream out(&buf, QIODevice::WriteOnly);
s.serialize(out);
}
Then you can send buf with QUdpSocket::writeDatagram()
How QDataStream serializes
If we replace
QByteArray buf;
QDataStream out(&buf, QIODevice::WriteOnly);
with
QFile file("file.dat");
file.open(QIODevice::WriteOnly);
QDataStream out(&file);
The serialized data gets written to the file "file.dat". This is the data the code above generates:
> hexdump -C file.dat
00000000 00 00 00 42 00 00 00 04 01 00 00 01 00 00 00 04 |...B............|
00000010 00 00 00 b0 00 00 00 b1 00 00 00 b2 00 00 00 b3 |................|
00000020 00 00 00 03 38 10 00 00 00 00 00 00 00 00 00 00 |....8...........|
00000030 00 00 00 00 47 ef ff ff e0 00 00 00 |....G.......|
The data starts with four bytes that represent the member nb_trame (00 00 00 42)
The next eight bytes are the serialized form of the vector vBool (00 00 00 04 01 00 00 01)
00 00 00 04 --> Number of entries in the vector
01 00 00 01 --> True, False, False, True
The next 20 bytes are for vInt (00 00 00 04 00 00 00 b0 00 00 00 b1 00 00 00 b2 00 00 00 b3)
00 00 00 04 --> Number of entries in the vector
00 00 00 b0 00 00 00 b1 00 00 00 b2 00 00 00 b3 --> 0xB0, 0xB1, 0xB2, 0xB3 (4 bytes per entry)
The next 28 bytes are for vFloat (00 00 00 03 38 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 47 ef ff ff e0 00 00 00)
00 00 00 03 --> Number of entries in the vector
38 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 47 ef ff ff e0 00 00 00 --> 1.1754943508222875e-38, 0.0, 3.4028234663852886e+38 (8 bytes per entry)
Additional information (Original posting)
Serialization is not a trivial topic but there are many libraries out there, that can help you with that. In the end you have two options to choose from:
Define your own serialization format
Use an existing serialization format
Binary
Text based (e.g. JSON, XML)
Which one you choose depends highly on your needs and use cases. In general I would prefer established formats over self-brewed. Text-based formats are by nature less compact and need more space and therefore also bandwidth. This is something you should take into account when you decide for a format. On the other hand text-based/human-readable formats have the advantage of being easier to debug, as you can open them in a text editor. And there are many more factors you should consider.
Serialization works because you do not rely on machine dependent things. The only thing you have to take care of is that the serialized data is consistent and follows the defined format. So for the serialized data you know exactly how the byte order is defined, where specific data is stored and so on.
The idea is that the sender serializes the data, sends it over whatever channel is required, and the receiver deserializes the data again. In which format the data is stored on each of the both sides doesn't matter.
+--------------------------------+ +--------------------------------+
| Host A | | Host B |
| | | |
| | | |
| | | |
| +-------------------------+ | | +-------------------------+ |
| | Raw data | | | | Raw data | |
| |(Specific to platform A) | |          | |(Specific to platform B) | |
| +-------------------------+ | | +-------------------------+ |
| | | | ^ |
| | serialize | | | deserialize |
| v | | | |
| +-----------------+ | transmit | +-----------------+ |
| | Serialized Data +----------------------------> Serialized Data | |
| +-----------------+ | | +-----------------+ |
| | | |
+--------------------------------+ +--------------------------------+
I have a file which first 64 bytes are:
0x00: 01 00 00 10 00 00 00 20 00 00 FF 03 00 00 00 10
0x10: 00 00 00 10 00 00 FF 03 00 00 00 10 00 00 FF 03
0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
When I read the file (opened for reading and writing) at position 26 for 4 bytes, I get 0; the next time (at position 30), I correctly get 4096.
The code is:
// read LastDirectoryBlockStartByte...
seekg(26);
char * pCUIBuffer = new char[4];
read(pCUIBuffer, 4);
const unsigned int x1 = gcount ();
const unsigned int LastDirectoryBlockStartByte = *(unsigned int *)pCUIBuffer;
// read LastDirectoryBlockNumberItems...
seekg(30);
read(pCUIBuffer, 4);
const unsigned int x2 = gcount ();
const unsigned int LastDirectoryBlockNumberItems = *(unsigned int *)pCUIBuffer;
With gcount() I checked that the bytes were read, and both times it was correctly 4.
I have no idea how to debug it.
---------- EDIT ----------
When I use the following code (with some dummy before) it reads correctly:
char * pCUIBuffer = new char[4];
seekg(26);
read(pCUIBuffer, 4);
const unsigned int x1 = gcount ();
seekg(26);
read(pCUIBuffer, 4);
const unsigned int x2 = gcount ();
const unsigned int LastDirectoryBlockStartByte = *(unsigned int *)pCUIBuffer;
// read LastDirectoryBlockNumberItems...
seekg(30);
read(pCUIBuffer, 4);
const unsigned int x3 = gcount ();
const unsigned int LastDirectoryBlockNumberItems = *(unsigned int *)pCUIBuffer;
The difficulty is that this code is at the beginning of a method, and the wrongly read value obviously has nothing to do with the listed code. Maybe there's a trick with flush or sync (I tried both...) or something else...
You are saying that pCUIBuffer contains a pointer:
*(unsigned int *)pCUIBuffer;
And then you go get whatever it's pointing at...in RAM. That could be anything.
Now I'm writing an answer myself, because my attempt to contact TonyK failed (I asked him to write an answer).
The perfect answer to my question was to enable stream exceptions by calling exceptions(eofbit | failbit | badbit).
Rumo