I have a binary file. i am reading 16 bytes at a time it using fstream.
I want to convert it to an integer. I tried atoi. but it didnt work.
In python we can do that by converting to byte stream using stringobtained.encode('utf-8') and then converting it to int using int(bytestring.hex(),16). Should we follow such an elloborate steps as done in python or is there a way to convert it directly?
ifstream file(binfile, ios::in | ios::binary | ios::ate);
if (file.is_open())
{
size = file.tellg();
memblock = new char[size];
file.seekg(0, ios::beg);
while (!file.eof())
{
file.read(memblock, 16);
int a = atoi(memblock); // doesnt work 0 always
cout << a << "\n";
memset(memblock, 0, sizeof(memblock));
}
file.close();
Edit:
This is the sample contents of the file.
53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00
04 00 01 01 00 40 20 20 00 00 05 A3 00 00 00 47
00 00 00 2E 00 00 00 3B 00 00 00 04 00 00 00 01
I need to read it as 16 byte i.e. 32 hex digits at a time.(i.e. one row in the sample file content) and convert it to integer.
so when reading 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00, i should get, 110748049513798795666017677735771517696
But i couldnt do it. I always get 0 even after trying strtoull. Am i reading the file wrong, or what am i missing.
You have a number of problems here. First is that C++ doesn't have a standard 128-bit integer type. You may be able to find a compiler extension, see for example Is there a 128 bit integer in gcc? or Is there a 128 bit integer in C++?.
Second is that you're trying to decode raw bytes instead of a character string. atoi will stop at the first non-digit character it runs into, which 246 times out of 256 will be the very first byte, thus it returns zero. If you're very unlucky you will read 16 valid digits and atoi will start reading uninitialized memory, leading to undefined behavior.
You don't need atoi anyway, your problem is much simpler than that. You just need to assemble 16 bytes into an integer, which can be done with shifting and or operators. The only complication is that read wants a char type which will probably be signed, and you need unsigned bytes.
ifstream file(binfile, ios::in | ios::binary);
char memblock[16];
while (file.read(memblock, 16))
{
uint128_t a = 0;
for (int i = 0; i < 16; ++i)
{
a = (a << 8) | (static_cast<unsigned int>(memblock[i]) & 0xff);
}
cout << a << "\n";
}
file.close();
It the number is binary what you want is:
short value ;
file.read(&value, sizeof (value));
Depending upon how the file was written and your processor, you may have to reverse the bytes in value using bit operations.
Related
I'm trying to read the first 6 bytes of a file, but it's giving me weird results, and I can't seem to figure out what I'm doing wrong.
My code:
struct Block {
char fileSize[3];
char initialDataBlockId[3];
};
int main(int c, char **a) {
ifstream file("C\\main_file_cache.idx0", ios::binary);
Block block;
file.get((char*)&block, sizeof(block));
printf("File Size: %i\n", block.fileSize);
printf("Initial Data Block ID: %i\n", block.initialDataBlockId);
file.close();
system("pause");
return 0;
}
Before I ran the code, I opened the file in a binary editor,
and it showed me this hex code:
00 00 00 00-00 00 05 af-4b 00 00 01-26 df cd 00
00 6f 03 3f-ed 00 03 61-05 08 35 00-04 8b 01 61
59 00 08 39-03 23 0a 00-05 6c 00 35-d0 00 06 fe
03 69 d8 00-07 19
There are a total of 54 bytes. The first 6 bytes are just zero.
So, I expected my program to produce the following output:
File Size: 0
Initial Data Block ID: 0
Instead, the outputs is as follows:
File Size: 10419128
Initial Data Block ID: 10419131
This result makes no sense. Maybe there is something wrong with my code?
You should use type unsigned char in your Block structure.
You should use file.read() to read binary data instead of file.get().
You are printing the addresses of the arrays in the Block structure, not their contents, furthermore the specifier %i expects an int, not a char *, so the behavior in undefined and you get some weird integer value but anything culd have happened, including program termination. Increasing the warning level is advisable so the compiler warns about such silly mistakes.
If the file format is little endian, you could convert these 3 byte arrays to numbers this way:
int block_fileSize = (unsigned char)block.fileSize[0] +
((unsigned char)block.fileSize[1] << 8) +
((unsigned char)block.fileSize[2] << 16);
int block_initialDataBlockId = (unsigned char)block.initialDataBlockId[0] +
((unsigned char)block.initialDataBlockId[1] << 8) +
((unsigned char)block.initialDataBlockId[2] << 16);
printf("File Size: %i\n", block_fileSize);
printf("Initial Data Block ID: %i\n", block_initialDataBlockId);
If you want to read a binary data you can use a read method from ifstream and also write method from ofstream.
istream & ifstream::read (char * s, streamsize n);
ostream & ofstream::write (const char * s, streamsize n);
You have to know that binary mode is useless for UNIX systems and text mode is only useful.
I'm generating a big char for future passing to a thread with strcpyand strcat. It was all going ok until I needed to substitute all the occurrences of the space for a comma in one of the strings. I searched for the solution to this here
Problem is, now I have a memory leak and the program exits with this message:
_Dumping objects ->
{473} normal block at 0x0091E0C0, 32 bytes long.
Data: <AMLUH UL619 BKD > 41 4D 4C 55 48 20 55 4C 36 31 39 20 42 4B 44 20
{472} normal block at 0x049CCD20, 8 bytes long.
Data: < > BC ED 18 00 F0 EC 18 00
{416} normal block at 0x082B5158, 1000 bytes long.
Data: <Number of Aircra> 4E 75 6D 62 65 72 20 6F 66 20 41 69 72 63 72 61
{415} normal block at 0x04A0E200, 20 bytes long.
Data: < > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
{185} normal block at 0x049DA998, 64 bytes long.
Data: < O X8 8 > DC 4F BB 58 38 C5 9A 06 38 D3 88 00 00 00 00 00
PythonPlugin.cpp(76) : {172} normal block at 0x0088D338, 72 bytes long.
Data: < a X F <) > DC 61 BB 58 18 BB 46 06 3C 29 8A 06 CD CD CD CD
Object dump complete._
Here's the code so you can tell me what I'm doing wrong:
Code of the problem:
char* loop_planes(ac){
char *char1=new char[1000];
for(...){
strcpy(char1,"Number of Aircrafts\nHour of simulation\n\n");
string tmp2=fp.GetRoute();
tmp2.replace(tmp2.begin(),tmp2.end()," ",","); #PROBLEM IS IN THIS LINE
const char *tmp3=tmp2.c_str();
strcat(char1,tmp3);
}
return char1;
}
The fp.GetRoute()is a string like this: AMLUH UL619 BKD UM748 RUTOL
Also, now that I'm talking about memory allocation, I don't want any future problems with memory leaks, so when should I delete char1, knowing that the thread is going to call this function?
When you call std::string::replace, the best match is a fumction template whose third and fourth parameters are input iterators. So the string literals you are passing are interpreted as the start and end of a range, when they are not. This leads to undefined behaviour.
You can fix this easily by using the algorithm std::replace instead:
std::replace(tmp2.begin(),tmp2.end(),' ',',');
Note that here the third and fourth parameters are single chars.
The answer from #juanchopanza correctly identifies and fixes the original question, but since you've asked about memory leaks in general, I'd like to additionally suggest that you replace your function with something that doesn't use new or delete or strcpy or strcat.
std::string loop_planes() {
std::string res("Number of Aircrafts\nHour of simulation\n\n");
for (...) {
std::string route = fp.GetRoute();
std::replace(route.begin(), route.end(), ' ',',');
res += route;
}
return res;
}
This doesn't require any explicit memory allocation or deletion and does not leak memory. I also took the liberty of changing the return type from char * to std::string to eliminate messy conversions.
I am trying to read the file 'train-images-idx3-ubyte', which can be found here along with the corresponding file format description (at the bottom of the webpage). When I look at the bytes with od -t x1 train-images-idx3-ubyte | less (hexadecimal, bytewise), I get the following output:
adress bytes
0000000 00 00 08 03 00 00 ea 60 00 00 00 1c 00 00 00 1c
0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
This is what I expected according to 1. But when I try to read the data with C++ I've got a problem. What I do is this:
std::fstream trainingData("minst/train-images-idx3-ubyte",
std::ios::in | std::ios::binary);
int8_t zero = 0, encoding = 0, dimension = 0;
int32_t samples = -1;
trainingData >> zero >> zero >> encoding >> dimension;
trainingData >> samples;
debugLogger << "training set image file, encoding = "
<< (int) encoding << ", dimension = "
<< (int) dimension << ", items = " << (int) samples << "\n";
But the output of these few lines of code is:
training set image file, encoding = 8, dimension = 3, items = 0
Everything but the number of instances (items, samples) is correct. I tried reading the next 4 bytes as int8_t and that gave me at least the same result as od. I cannot imagine how samples can be 0. What I actually wanted to read here was 10,000. Maybe you've got a clue?
As mentioned in other answers, you need to use unformatted input, i.e. istream::read(...) instead of operator>>. Translating your code above to use read yields:
trainingData.read(reinterpret_cast<char*>(&zero), sizeof(zero));
trainingData.read(reinterpret_cast<char*>(&zero), sizeof(zero));
trainingData.read(reinterpret_cast<char*>(&encoding), sizeof(encoding));
trainingData.read(reinterpret_cast<char*>(&dimension), sizeof(dimension));
trainingData.read(reinterpret_cast<char*>(&samples), sizeof(samples));
Which gets you most of the way there - but 00 00 ea 60 looks like it's in Big-endian format, so you'll have to pass it through ntohl to make sense of it if you're running on an intel-based machine:
samples = ntohl(samples);
which gives encoding = 8, dimension = 3, items = 60000.
The input is formatted, which will result in you reading wrong results from the file. Reading from an unformatted input will provide the correct results.
I am reading in a binary file (in c++). And the header is something like this (printed in hexadecimal)
43 27 41 1A 00 00 00 00 23 00 00 00 00 00 00 00 04 63 68 72 31 FFFFFFB4 01 00 00 04 63 68 72 32 FFFFFFEE FFFFFFB7
when printed out using:
std::cout << hex << (int)mem[c];
Is there an efficient way to store 23 which is the 9th byte(?) into an integer without using stringstream? Or is stringstream the best way?
Something like
int n= mem[8]
I want to store 23 in n not 35.
You did store 23 in n. You only see 35 because you are outputting it with a routine that converts it to decimal for display. If you could look at the binary data inside the computer, you would see that it is in fact a hex 23.
You will get the same result as if you did:
int n=0x23;
(What you might think you want is impossible. What number should be stored in n for 1E? The only corresponding number is 31, which is what you are getting.)
Do you mean you want to treat the value as binary-coded decimal? In that case, you could convert it using something like:
unsigned char bcd = mem[8];
unsigned char ones = bcd % 16;
unsigned char tens = bcd / 16;
if (ones > 9 || tens > 9) {
// handle error
}
int n = 10*tens + ones;
I'm trying to convert 8 bit char into hex view which looks like this:
00 03 80 45 E5 93 00 18 02 72 3B 90 88 64 11 00
45 FF 00 36 00 FF 45 00 00 34 7B FE 40 00 40 02
But some characters contain negative values which makes a larger hex value of more than 2 digits. how would i get each one as represented above?
I don't know what you are using for formatting, but make sure that you make your byte holding variable an unsigned char (assuming that char is 8-bits on your platform, which it is on all sane platforms), before formatting. If your platform has a sane BYTE typedef, use that. You can also use the boost::uint8_t type to store the byte and avoid these sorts of issues. For example:
char c=-25; // Oh no, this is one of those pesky "negative" characters
unsigned char byteVal=static_cast<unsigned char>(c); // FTFY
// Do the formatting with byteVal
"negative byte values" is an oxymoron, a byte is a number of bits without any sign typically an unsigned char which, when being 8 bits. can contain values 0-255 or in hex 00 to FF.