Writing/Reading strings in binary file - C++

I searched for a similar post but couldn't find anything that helped.
I'm trying to first write an integer containing the length of a string, and then write the string itself, to a binary file.
However, when I read the data back from the binary file, the integers have the value 0 and my strings contain junk.
For example, when I type 'asdfgh' for the username and 'qwerty100' for the password,
I get 0,0 for both string lengths and then read junk from the file.
This is how I write data to the file:
std::fstream file;
file.open("filename",std::ios::out | std::ios::binary | std::ios::trunc );
Account x;
x.createAccount();
int usernameLength= x.getusername().size()+1; //+1 for null terminator
int passwordLength=x.getpassword().size()+1;
file.write(reinterpret_cast<const char *>(&usernameLength),sizeof(int));
file.write(x.getusername().c_str(),usernameLength);
file.write(reinterpret_cast<const char *>(&passwordLength),sizeof(int));
file.write(x.getpassword().c_str(),passwordLength);
file.close();
Right below, in the same function, I read the data:
file.open("filename",std::ios::binary | std::ios::in );
char username[51];
char password[51];
char intBuffer[4];
file.read(intBuffer,sizeof(int));
file.read(username,atoi(intBuffer));
std::cout << atoi(intBuffer) << std::endl;
file.read(intBuffer,sizeof(int));
std::cout << atoi(intBuffer) << std::endl;
file.read(password,atoi(intBuffer));
std::cout << username << std::endl;
std::cout << password << std::endl;
file.close();

When reading the data back in you should do something like the following:
int result;
file.read(reinterpret_cast<char*>(&result), sizeof(int));
This will read the bytes straight into the memory of result with no implicit conversion to int. This will restore the exact binary pattern written to the file in the first place and thus your original int value.
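To make the idea concrete, here is a minimal sketch of a length-prefixed round trip. The helper names (writeString, readString, roundTripString) are made up for illustration, a fixed-width std::uint32_t replaces the int from the question, and an in-memory stringstream stands in for the file. It assumes the reader and the writer run on the same kind of machine.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write a 32-bit length followed by the raw characters (no null terminator).
void writeString(std::ostream& out, const std::string& s) {
    const std::uint32_t length = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&length), sizeof(length));
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
}

// Read the length straight into an integer, then read exactly that many characters.
std::string readString(std::istream& in) {
    std::uint32_t length = 0;
    in.read(reinterpret_cast<char*>(&length), sizeof(length));
    std::string s(length, '\0');
    if (length > 0) {
        in.read(&s[0], static_cast<std::streamsize>(length));
    }
    return s;
}

// In-memory round trip, standing in for the file used in the question.
std::string roundTripString(const std::string& s) {
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    writeString(buffer, s);
    return readString(buffer);
}
```

Reading into a std::string sized from the stored length also removes the need for the fixed char[51] buffers.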

file.write(reinterpret_cast<const char *>(&usernameLength),sizeof(int));
This writes sizeof(int) bytes starting at &usernameLength, i.e. the binary representation of the integer, which depends on the computer architecture (little endian vs big endian).
atoi(intBuffer)
This converts ASCII text to an integer and expects the input to contain a null-terminated character representation, e.g. intBuffer = { '1', '2', '\0' } would return 12.
You can try to read it in the same way you have written it:
*(reinterpret_cast<int *>(&intBuffer))
But this can potentially lead to unaligned memory access issues. It is better to use a serialization format like JSON, which can be read in a cross-platform way.
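If you want to keep the raw binary format, one common way to sidestep the alignment concern is to copy the bytes with std::memcpy instead of dereferencing a cast pointer. A minimal sketch, where decodeInt is a hypothetical helper name and the bytes are assumed to come from the same machine that wrote them:

```cpp
#include <cstring>

// Copy the raw bytes into an int; memcpy has no alignment requirement on the source.
int decodeInt(const char* bytes) {
    int value = 0;
    std::memcpy(&value, bytes, sizeof(value));
    return value;
}
```

With the question's code this would be called as decodeInt(intBuffer) in place of atoi(intBuffer).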

Related

read double type data from binary file

I want to read double values from a binary file and store them in a vector. My values have the following form: 73.6634, 73.3295, 72.6764 and so on. I have this code that reads and stores data in memory. It works perfectly with char types since the read function takes a char type as input (istream& read (char* s, streamsize n)). When I try to convert the char type to double I obviously get integer values such as 74, 73, 73 and so on. Is there a function that lets me read double values directly, or any other way of doing that?
If I change char * memblock to double * memblock and memblock = new char[] to memblock = new double[], I get compile errors because, again, the read function only accepts a char type input variable...
Thanks, I will appreciate your help :)
// reading an entire binary file
#include <iostream>
#include <fstream>
using namespace std;
int main () {
streampos size;
char * memblock;
int i=0;
ifstream file ("example.bin", ios::in|ios::binary|ios::ate);
if (file.is_open())
{
size = file.tellg();
cout << "size=" << size << "\n";
memblock = new char [size];
file.seekg (0, ios::beg);
file.read (memblock, size);
file.close();
cout << "the entire file content is in memory \n";
for(i=0; i<=10; i++)
{
double value = memblock [i];
cout << "value ("<<i<<")=" << value << "\n";
}
delete[] memblock;
}
else cout << "Unable to open file";
return 0;
}
(sorry about the "Like I'm 5" tone, I have no idea how much you know or don't)
Intro Binary Data
As you probably know, your computer doesn't think about numbers the way you do.
To start, the computer thinks about all numbers in a "base 2" system. But it doesn't stop there. Your computer also associates a fixed size with every number, giving it a fixed "width". This size is (almost always) measured in bytes, or groups of 8 binary digits. This is (pretty close to) the equivalent of, when you do math on the numbers [1,15,30002], looking at all the numbers as
[
00000001
00000015
00030002
]
(doubles are a little weirder, but I'll get to that in a second).
Let's pretend, for demonstration purposes, that each 2 characters above represent a single byte of data. This means that the computer thinks about the numbers like this:
[
00,00,00,01
00,00,00,15
00,03,00,02
]
File IO is all done at "byte" (char) granularity: it typically has no idea what it is reading. It is up to YOU to figure that out. When writing binary data to a file (from an array at least) we just dump it all. So in the example above, we would write it all to the file like this:
[00,00,00,01,00,00,00,15,00,03,00,02]
But when reading it back, you'll have to reinterpret it as values of 4 bytes each.
Luckily, this is stupidly easy to do in C++:
size = file.tellg();
cout << "size=" << size << "\n";
memblock = new char [size];
file.seekg (0, ios::beg);
file.read (memblock, size);
file.close();
cout << "the entire file content is in memory \n";
double* double_values = (double*)memblock;//reinterpret as doubles
for(i=0; i<=10; i++)
{
double value = double_values[i];
cout << "value ("<<i<<")=" << value << "\n";
}
What this basically does is say: interpret those bytes (char) as doubles.
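A variation on the same idea avoids the intermediate char buffer by reading straight into a std::vector<double>. This is only a sketch: readDoubles and roundTripDoubles are hypothetical names, the stringstream stands in for example.bin, and it assumes the file was written by a machine with the same double format and byte order.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Read up to `count` doubles that were stored as raw bytes.
std::vector<double> readDoubles(std::istream& in, std::size_t count) {
    std::vector<double> values(count);
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(count * sizeof(double)));
    // Keep only the values that were actually read if the stream ended early.
    values.resize(static_cast<std::size_t>(in.gcount()) / sizeof(double));
    return values;
}

// Write the values to an in-memory stream and read them back, as a stand-in for a file.
std::vector<double> roundTripDoubles(const std::vector<double>& source) {
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    buffer.write(reinterpret_cast<const char*>(source.data()),
                 static_cast<std::streamsize>(source.size() * sizeof(double)));
    return readDoubles(buffer, source.size());
}
```

For a real file, count would be the file size divided by sizeof(double), which also keeps the loop from running past the end of the data.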
edit: Endianness
Endianness is (again, LI5) the order in which the computer writes the bytes of a number. You are used to twenty-five being written left to right (25), but it would be just as valid to write the number from right to left (52, five-twenty). We have big-endian (Most Significant Byte at the lowest address) and little-endian (MSB at the highest address).
This was never standardized between architectures or virtual machines, so if the writer and the reader disagree you can get weird results.
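As an illustration, here is a small sketch of how a program can check its own byte order and reverse the bytes of a 32-bit value read from a file with the opposite order. The function names are made up for this example.

```cpp
#include <cstdint>

// Reverse the byte order of a 32-bit value (big-endian <-> little-endian).
std::uint32_t swapBytes(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Report whether this machine stores the least significant byte first.
bool isLittleEndian() {
    const std::uint16_t probe = 1;
    return *reinterpret_cast<const unsigned char*>(&probe) == 1;
}
```

For example, swapBytes(0x01020304) yields 0x04030201, and applying it twice gives back the original value.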
A special case: doubles
Not really in line with your question, but I have to point out that doubles are a special case: while reading and writing looks the same, the underlying data isn't just a simple number. I like to think of doubles as the "scientific notation" of computers. The double format uses a mantissa and a power of two to get your number: in the same amount of space as a 64-bit integer it stores (sign)(mantissa × 2^exponent). This gives a much larger dynamic range of representable values, BUT you lose a certain sense of "human readability" of the bytes, and you get the SAME number of distinct values, so you can lose precision (though it's relative precision, just like scientific notation, so for a very large number x you may not be able to distinguish x + 1 from x + 2, but that 1 and 2 are TINY compared to the number).
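The two points above (mantissa times a power of two, and precision that is relative to magnitude) can be checked with a short sketch. The names Parts, decompose and addingOneIsLost are hypothetical, and the second function assumes the usual 64-bit IEEE 754 double with 53 bits of precision.

```cpp
#include <cmath>

// A double split into mantissa and power of two: value == mantissa * 2^exponent.
struct Parts {
    double mantissa;
    int exponent;
};

Parts decompose(double value) {
    Parts p{};
    p.mantissa = std::frexp(value, &p.exponent);
    return p;
}

// True when `big` is so large that adding 1 no longer changes the stored value.
bool addingOneIsLost(double big) {
    const double sum = big + 1.0;
    return sum == big;
}
```

With these assumptions, decompose(8.0) gives a mantissa of 0.5 and an exponent of 4, and adding 1 is lost at 2^53 (9007199254740992.0) but not at one billion.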
writing data in C++
We might as well point out one quirk of C++: you have to make sure that when you write the data, the stream doesn't try to translate it as text, so open the file in binary mode (std::ios::binary). http://www.cplusplus.com/forum/general/21018/
The issue is this: there is no guarantee that binary data written by one program (you said MATLAB) can be read back by another program merely by casting, unless you know that the data written by that other program has the same layout as data written by your program.
It may not be sufficient to just cast: you need to know the exact form of the data that is written. You need to know the binary format (for example IEEE), the number of bytes each value occupies, the endianness, etc., so that you can interpret the data correctly.
What you should do is this: write a small program that writes the numbers you claim this file contains to another file. Then look at the file you just wrote in a hex editor. Then take the file you're attempting to read that was created by MATLAB and compare the contents side by side with the one you just wrote. Do you see a pattern? If not, then either you have to find one, or forget about it and get the two files to be the same.
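If no hex editor is at hand, a few lines of code can print the bytes of a value for the same side-by-side comparison. This is a sketch only, and hexBytes is a hypothetical helper name.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Return the bytes of a double, in memory order, as lowercase hex.
std::string hexBytes(double value) {
    unsigned char bytes[sizeof(double)];
    std::memcpy(bytes, &value, sizeof(double));
    std::string out;
    char pair[3];
    for (unsigned char b : bytes) {
        std::snprintf(pair, sizeof(pair), "%02x", static_cast<unsigned>(b));
        out += pair;
    }
    return out;
}
```

On a little-endian machine with IEEE 754 doubles, hexBytes(1.0) is "000000000000f03f"; on a big-endian one the same bytes appear in the opposite order.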

trouble reading binary data

The reader and writer
#include<string>
#include<fstream>
#include<memory>
class BinarySearchFile{
BinarySearchFile::BinarySearchFile(std::string file_name){
// concatenate extension to fileName
file_name += ".dat";
// form complete table data filename
data_file_name = file_name;
// create or reopen table data file for reading and writing
binary_search_file.open(data_file_name, std::ios::binary); // create file
if(!binary_search_file.is_open()){
binary_search_file.clear();
binary_search_file.open(data_file_name, std::ios::out | std::ios::binary);
binary_search_file.close();
binary_search_file.open(data_file_name), std::ios::out | std::ios::in | std::ios::binary | std::ios::ate;
}
std::fstream binary_search_file;
void BinarySearchFile::writeT(std::string attribute){
if(binary_search_file){
binary_search_file.write(reinterpret_cast<char *>(&attribute), attribute.length() * 2);
}
}
std::string BinarySearchFile::readT(long filePointerLocation, long sizeOfData)
{
if(binary_search_file){
std::string data;
data.resize(sizeOfData);
binary_search_file.seekp(filePointerLocation);
binary_search_file.seekg(filePointerLocation);
binary_search_file.read(&data[0], sizeOfData);
return data;
}
};
The reader call
while (true){
std::unique_ptr<BinarySearchFile> data_file(new BinarySearchFile("classroom.dat"));
std::string attribute_value = data_file->read_data(0, 20);
}
The writer call
data_file->write_data("packard ");
The writer writes a total of 50 bytes
"packard 101 500 "
The reader is to read the first 20 bytes and the result is "X packard X" where X represents some malformed bytes of data. Why is the data read back in x-number of bytes corrupt?
You can't simply write data out by casting its address to a char* and hoping to get anything useful. You have to define the binary format you want to use, and implement it. In the case of std::string, this may mean outputting the length in some format, then the actual data. Or, where fixed-length fields are needed, forcing the string (or a copy of the string) to that length using std::string::resize, then outputting that, using std::string::data() to get your char const*.
Reading will, of course, be similar. You'll read the data into a std::vector<char> (or for fixed length fields, a char[]), and parse it.
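A minimal sketch of the fixed-length variant might look like the following. The field width of 20 is only a guess based on the 20-byte read in the question, and writeField, readField and roundTripField are hypothetical names, not part of the asker's class.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Assumed field width; adjust to whatever the real record layout uses.
const std::size_t kFieldWidth = 20;

// Pad (or truncate) a copy of the string to the field width and write its characters.
void writeField(std::ostream& out, std::string value) {
    value.resize(kFieldWidth, ' ');
    out.write(value.data(), static_cast<std::streamsize>(value.size()));
}

// Read one field and strip the trailing padding.
std::string readField(std::istream& in) {
    std::string value(kFieldWidth, '\0');
    in.read(&value[0], static_cast<std::streamsize>(kFieldWidth));
    value.resize(static_cast<std::size_t>(in.gcount()));
    const std::size_t last = value.find_last_not_of(' ');
    value.erase(last == std::string::npos ? 0 : last + 1);
    return value;
}

// In-memory round trip, standing in for the .dat file.
std::string roundTripField(const std::string& s) {
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    writeField(buffer, s);
    return readField(buffer);
}
```

Because every field occupies exactly kFieldWidth bytes, the n-th field starts at offset n * kFieldWidth, which is what makes seeking to a record practical.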
binary_search_file.write(reinterpret_cast<char *>(&attribute), attribute.length() * 2);
It is incorrect to cast the address of a std::string to char*. If you need a char*, you must use attribute.c_str().
Apart from the pointer to the characters, std::string contains other data members (for example, an allocator), and your code will write all of that data to the file. Also, I don't see any reason to multiply the string length by 2. A +1 makes sense if you want to output the terminating zero.

Reading binary text into array?

I have a program that I need to read binary text into. I read the binary text via a redirection:
readData will be an executable made by my Makefile.
Example: readData < binaryText.txt
What I want to do is read the binary text, and store each character in the binary text file as a character inside a char array. The binary text is made up of 32 This is my attempt at doing so...
unsigned char * buffer;
char d;
cin.seekg(0, ios::end);
int length = cin.tellg();
cin.seekg(0, ios::beg);
buffer = new unsigned char [length];
while(cin.get(d))
{
cin.read((char*)&buffer, length);
cout << buffer[(int)d] << endl;
}
However, I keep getting a segmentation fault on this. Might anyone have any ideas on how to read binary text into a char array? Thanks!
I'm more a C programmer rather than a C++, but I think that you should have started your while loop
while(cin.get(&d)){
The easiest would be like this:
std::ostringstream oss;
oss << std::cin.rdbuf();
// now use oss.str()
Or, all in one line:
std::string data(static_cast<std::ostringstream&>(std::ostringstream() << std::cin.rdbuf()).str());
Something like this should do the trick.
You retrieve the filename from the arguments and then read the whole file in one shot.
const char *filename = argv[1]; // argv[0] is the program name
std::vector<char> buffer;
// open the stream in binary mode
std::ifstream is(filename, std::ios::binary);
// determine the file length
is.seekg(0, std::ios_base::end);
std::size_t size = is.tellg();
is.seekg(0, std::ios_base::beg);
// make sure we have enough memory space
buffer.resize(size, 0);
// load the data
is.read(buffer.data(), size);
// close the file
is.close();
You then just need to iterate over the vector to read characters.
The reason you are getting a segmentation fault is that you are indexing the array with a character value. Note also that the original read() call passes &buffer (the address of the pointer itself) rather than buffer.
Problem:
buffer[(int)d] // d is an ASCII character value; if it exceeds the array's range, you get the segfault.
If what you want is a character array, you already have that from cin.read().
Solution:
cin.read(reinterpret_cast<char*>(buffer), length);
If you want to print it out, use printf with an explicit length, because the buffer is not null-terminated:
printf("%.*s", length, reinterpret_cast<char*>(buffer));
I used reinterpret_cast because I thought it was safe to convert to a signed character pointer, since most characters used fall in the range 0 to 127. Be aware that, where char is signed, bytes with values from 128 to 255 come out as negative numbers when viewed through a char pointer.
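Since seeking on redirected standard input is not guaranteed to work, another option is to skip the length calculation and let the container grow as bytes arrive. A minimal sketch, with readAll as a hypothetical name; the second overload exists only so the function can be tried on in-memory data.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Read every remaining byte from a stream into a vector.
std::vector<unsigned char> readAll(std::istream& in) {
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Convenience overload for in-memory data.
std::vector<unsigned char> readAll(const std::string& bytes) {
    std::istringstream in(bytes);
    return readAll(in);
}
```

In the question's program this would be called as readAll(std::cin), and the individual characters are then available as elements of the returned vector.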

Writing chars as a byte in C++

I'm writing a Huffman encoding program in C++, and am using this website as a reference:
http://algs4.cs.princeton.edu/55compression/Huffman.java.html
I'm now at the writeTrie method, and here is my version:
// write bitstring-encoded tree to standard output
void writeTree(struct node *tempnode){
if(isLeaf(*tempnode)){
tempfile << "1";
fprintf(stderr, "writing 1 to file\n");
tempfile << tempnode->ch;
//tempfile.write(&tempnode->ch,1);
return;
}
else{
tempfile << "0";
fprintf(stderr, "writing 0 to file\n");
writeTree(tempnode->left);
writeTree(tempnode->right);
}
}
Look at the line commented - let's say I'm writing to a text file, but I want to write the bytes that make up the char at tempnode->ch (which is an unsigned char, btw). Any suggestions for how to go about doing this? The line commented gives an invalid conversion error from unsigned char* to const char*.
Thanks in advance!
EDIT: To clarify: For instance, I'd like my final text file to be in binary -- 1's and 0's only. If you look at the header of the link I provided, they give an example of "ABRACADABRA!" and the resulting compression. I'd like to take the char (such as 'A' in the example above), use its unsigned int number (A=65), and write 65 in binary, as a byte.
A char is identical to a byte. The preceding line tempfile << tempnode->ch; already does exactly what you seem to want.
There is no overload of write for unsigned char, but if you want, you can do
tempfile.write(reinterpret_cast< char * >( &tempnode->ch ),1);
This is rather ugly, but it does exactly the same thing as tempfile << tempnode->ch.
EDIT: Oh, you want to write a sequence of 1 and 0 characters for the bits in the byte. C++ has an obscure trick for that:
#include <bitset>
tempfile << std::bitset< 8 >( tempnode->ch );
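To see what the bitset produces without touching a file, the same conversion can be wrapped in a small function. toBits is a hypothetical helper name used only for this sketch.

```cpp
#include <bitset>
#include <string>

// Render one byte as eight '0'/'1' characters, most significant bit first.
std::string toBits(unsigned char ch) {
    return std::bitset<8>(ch).to_string();
}
```

For 'A' (65) this gives "01000001", which is the text that tempfile << std::bitset<8>(tempnode->ch) would write.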

C++ Read Binary file containing numbers of type double

I have a binary file that contains numbers of a type double.
The example input file is available here: www.bobdanani.net/download/A.0.0
I would like to read the file and print the numbers in it.
This is what I have done:
char* buffer;
int length;
string filename = "A.0.0";
ifs.open (filename.c_str(), ios::in | ios::binary);
// get length of file:
ifs.seekg (0, ios::end);
length = ifs.tellg();
ifs.seekg (0, ios::beg);
// allocate memory:
buffer = new char [length];
// read data as a block:
ifs.read (buffer,length);
ifs.close();
cout.write (buffer,length);
cout << buffer << endl;
delete[] buffer;
I have also tried to use a type casting to double when printing the number, but I got strange characters. What is the best way to do this? I need the data of this binary file as an input to a function for a parallel program. But this is out of the scope of this question.
While I could be wrong, since you said the numbers are separated by a tab/space, I'm willing to bet this is actually ASCII data, and not raw binary data. Therefore the best way to work with the floating point values would be to use operator>> on the ifstream object and extract into a double. That will automatically convert the input text into a double, whereas what you've done merely copies the character bytes that spell out a floating point value but are not a floating point value themselves. Additionally, if you were trying to output your buffer like a string, you haven't explicitly null-terminated it, so the output will keep reading past the end of the buffer until it encounters a null terminator or you get a segmentation fault from touching memory the OS doesn't allow you to access. Either way, in the end, your buffer won't be a representation of a double data type.
So you would have something like:
double my_double_val;
ifs.open (filename.c_str());
if (ifs)
{
ifs >> my_double_val;
}
else
{
cerr << "Error opening file" << endl;
}
ifs.close();
cout << "Double floating point value: " << my_double_val << endl;
cout.write (buffer,length);
Don't do this! The above is going to dump the binary data to standard output.
cout << buffer << endl;
Don't do this either! The above will dump the binary data up to the first byte that happens to be zero to standard output. If there is no such byte, this will just keep on going past the end of the buffer (so undefined behavior).
If the buffer truly does contain doubles, and only doubles, you can do something nasty like
double * dbuf = reinterpret_cast<double*>(buffer);
int dlength = length / sizeof(double);
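A somewhat less nasty variant copies the bytes into properly typed storage instead of aliasing the char buffer. This is a sketch under the same assumption (the buffer really holds native doubles and nothing else), and toDoubles is a hypothetical name.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Copy a raw byte buffer into doubles; trailing bytes that do not fill a double are ignored.
std::vector<double> toDoubles(const char* buffer, std::size_t length) {
    std::vector<double> values(length / sizeof(double));
    if (!values.empty()) {
        std::memcpy(values.data(), buffer, values.size() * sizeof(double));
    }
    return values;
}
```

The returned vector can then be printed element by element or handed to the function that needs the numbers.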
Use the system function call in C++ (assuming you are using a Unix OS) and pass 'od -e filename' as the argument of the system function call. Then you can pipe the values it prints and read them. This is one approach. Of course there are many other approaches to do this.