Binary input of text file - c++

Programming Principles and Practice says in the Chapter 11:
"In memory, we can represent the number 123 as an integer value (each int on 4 bytes) or as a string value (each character on 1 byte)".
I'm trying to understand what is stored in memory when a text file is read in binary mode.
So I'm writing the content of the vector v.
If the input file contains this text: "test these words"
The output file shows these numbers: 1953719668 1701344288 1998611827 1935962735 168626701 168626701 168626701 168626701 168626701 168626701
I tried to convert each char of "test" to binary
and I have 01110100 01100101 01100101 01110100
and if I consider this as an integer of 4 bytes and convert it to decimal I get 1952802164, which is still different from the output.
How is this done correctly, so I can understand what's going on? Thanks!
#include<iostream>
#include<string>
#include<vector>
#include<algorithm>
#include<cmath>
#include<sstream>
#include <fstream>
#include <iomanip>
using namespace std;
template <class T>
char *as_bytes(T &i) // treat a T as a sequence of bytes
{
void *addr = &i; // get the address of the first byte of memory used to store the object
return static_cast<char *>(addr); // treat that memory as bytes
}
int main()
{
string iname{"11.9_in.txt"};
ifstream ifs {iname,ios_base::binary}; // note: stream mode
string oname{"11.9_out.txt"};
ofstream ofs {oname,ios_base::binary}; // note: stream mode
vector<int> v;
// read from binary file:
for(int x; ifs.read(as_bytes(x),sizeof(int)); ) // note: reading bytes
v.push_back(x);
for(int x : v)
ofs << x << ' ';
}

I'll assume you are using a little-endian machine (for example, x86) and an ASCII-compatible character encoding (such as Shift_JIS or UTF-8).
The text "test" is stored as the bytes 74 65 73 74 (hexadecimal).
On a little-endian machine, the higher-order bytes of a multi-byte integer are placed at higher addresses.
Therefore, when these four bytes are read as a 4-byte integer, they are interpreted as 0x74736574, which is 1953719668 in decimal.

Related

Reading and writing int to a binary file in C++

I am unclear about how reading long integers works. If I say
long int a[1]={666666};
ofstream o("ex",ios::binary);
o.write((char*)a,sizeof(a));
to store values to a file and want to read them back as it is
long int stor[1];
ifstream i("ex",ios::binary);
i.read((char*)stor,sizeof(stor));
how will I be able to display the same number as stored using the information stored in multiple bytes of character array?
o.write does not write formatted characters; it writes raw bytes (ios::binary additionally prevents any newline translation). The char pointer is used because a char is exactly 1 byte long.
o.write((char*)a,sizeof(a));
(char*)a is the address of the data o.write should write. It then writes sizeof(a) bytes to the file. No characters are stored, just bytes.
If you open the file in a hex editor, you would see something like this for an int with the value 10:
0A 00 00 00 (4 bytes, little-endian, as on x86/x64).
Reading works analogously.
Here is a working example:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main (int argc, char* argv[]){
const char* FILENAM = "a.txt";
int toStore = 10;
ofstream o(FILENAM,ios::binary);
o.write((char*)&toStore,sizeof(toStore));
o.close();
int toRestore=0;
ifstream i(FILENAM,ios::binary);
i.read((char*)&toRestore,sizeof(toRestore));
cout << toRestore << endl;
return 0;
}
Sorry I took so long to see your question.
I think the difference is that binary mode reads and writes the file as is, while non-binary (i.e. text) mode may translate the end-of-line character '\n'. Depending on whether the target operating system is Mac, Windows, Unix, etc., '\n' may be converted to and from '\r', "\n\r" or "\r\n", or left as '\n'.
I think if you read and write an integer in text mode, it will usually read back fine and look correct. But if some byte(s) of the integer happen to look like '\r' or '\n', the integer will not be read back correctly from the file.
Binary mode ensures that an int is always read back correctly. Text mode is what you want when formatting a file to be read in a text editor such as Windows Notepad.

reading bytes from a file to short / long integer

Hiho everyone! I'm trying to read first 4 bytes of a file and store them in integer variable.
here's what I'm doing:
#include <iostream>
#include <fstream>
#include <iomanip>
#include <cstring>
using namespace std;
int main(){
ifstream is;
is.open ("binary_file.dat", ios::binary );
char file_version[4];
is.read(file_version, 4);
int fv_int;
memcpy(&fv_int, file_version, sizeof(fv_int));
cout << fv_int;
}
But the result is not what I meant it to be. Program copies first byte of the file in correct position, but considers the rest of bytes to be 0's. Example:
First 4 bytes of my file:
10101010 00101100 00101100 00101100
What is the content of fv_int after program execution:
10101010 00000000 00000000 00000000
Is there any way to access specific bytes of integer? Or maybe better method of reading bytes from a file?
istream::read is not guaranteed to read all 4 bytes. It returns the stream rather than a byte count, so check the stream state and call gcount() to see how many bytes were actually read; your file may be too short.
Additional hint:
You could write is.read(reinterpret_cast<char*>(&fv_int), sizeof(fv_int)); to reduce the amount of code and make the intent explicit.
If I feed your program files that contain at least 4 bytes, it reads and displays them perfectly. For further diagnosis, change the last cout to: cout <<sizeof(int)<<" "<<hex<<fv_int<<endl;

Storing C++ binary output in xml

According to the XSD specification, the supported binary types are hexBinary and base64Binary. http://www.w3schools.com/schema/schema_dtypes_misc.asp
My intention is to read raw bytes from memory and serialize them to the XML file. Which of the data types above describes raw byte contents, or do I have to convert the raw bytes to hexadecimal (or Base64) myself to conform to one of the two types?
You do have to convert the raw binary to a hexadecimal (or base64) representation. E.g., if the value of the byte is 255 (in decimal), its hex representation (as a string) would be "ff".
The (conventional) type for storing the raw input is unsigned char, so you can get the range 0-255 easily byte by byte. For each byte of that array, you need two characters in a char array (or std::string) to store the hex representation, and that is what you put in the XML.
Your framework probably has a method for converting raw bytes to Base64 or hex. If not, here's one method for hex:
#include <iostream>
#include <string>
#include <sstream>
using namespace std;
int main (void) {
ostringstream os;
os.flags(ios::hex);
unsigned char data[] = { 0, 123, 11, 255, 66, 99 };
for (int i = 0; i < 6; i++) {
if (data[i] < 16) os << '0';
os << (int)data[i] << '|';
}
string formatted(os.str());
cout << formatted << endl;
return 0;
}
Outputs: 00|7b|0b|ff|42|63|
You need to encode the raw data to one of the two data types. This is to keep some random data from messing up the XML format, for example if you had a < embedded in the data somewhere.
You can choose whichever of the two is most convenient for you. The hexadecimal type is easier to write code for but produces a larger file - the ratio of bytes out to bytes in is 2:1, where it is 4:3 for the Base64 encoding. You shouldn't need to write your own code though, Base64 conversion functions are readily available. Here's a question that has some code in the answers: How do I base64 encode (decode) in C?
As an example of how the codings differ, here's the phrase "The quick brown fox jumps over the lazy dog." encoded both ways.
Hex:
54686520717569636b2062726f776e20666f78206a756d7073206f76657220746865206c617a7920646f672e
Base64:
VGhlIHF1aWNrIGJyb3duIGZveCBqdW1wcyBvdmVyIHRoZSBsYXp5IGRvZy4=

Need Convert Binary file to Txt file

I have a .dat (binary) file that I want to convert into an ASCII (.txt) file using C++, but I am very new to C++ programming. I have just opened my two files, myBinaryfile and myTxtFile, but I don't know how to read data from the .dat file or how to write that data into the new .txt file. I want to write C++ code that takes a binary .dat file as input and converts it to ASCII text in an output file. If this is possible, please help me write this code. Thanks.
Sorry for asking the same question again, but I still haven't solved my problem, so I will explain it more clearly: I have a text file called "A.txt" and I want to convert it into a binary file (B.dat), and vice versa. Two questions:
1. How do I convert "A.txt" into "B.dat" in C++?
2. How do I convert "B.dat" into "C.txt" in C++? (I need to convert the output of the first step back into a new ASCII file.)
my text file is like (no header):
1st line: 1234.123 543.213 67543.210 1234.67 12.000
2nd line: 4234.423 843.200 60543.232 5634.60 72.012
It has more than 1000 lines in a similar style (5 columns per line).
Since I don't have experience in C++, I am struggling here and need your help. Many thanks.
All files are just a stream of bytes. You can open files in binary mode or text mode. The latter simply means that there may be extra newline handling.
If you want your text file to contain only safe human readable characters you could do something like base64 encode your binary data before saving it in the text file.
Very easy:
1. Create the target or destination file (a.k.a. open it).
2. Open the source file in binary mode, which prevents the OS from translating the content.
3. Read an octet (byte) from the source file; unsigned char is a good variable type for this.
4. Write the octet to the destination using your favorite conversion: hex, decimal, etc.
5. Repeat from step 3 until the read fails.
6. Close all files.
Research these keywords: ifstream, ofstream, hex modifier, dec modifier, istream::read, ostream::write.
There are utilities and applications that already perform this operation. On the *nix and Cygwin side, try od (octal dump) and redirect its output to a file.
On MS-DOS systems there is the debug utility.
A popular format is:
AAAAAA bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb cccccccccccccccc
where:
AAAAAA -- Offset from beginning of file in hexadecimal or decimal.
bb -- Hex value of byte using ASCII text.
c -- Character representation of byte, '.' if the value is not printable.
Please edit your post to provide more details, including an example layout for the target file.
Edit:
A complex example (not tested):
#include <cstdlib>
#include <fstream>
#include <iostream>
using namespace std;
const unsigned int READ_BUFFER_SIZE = 1024 * 1024;
const unsigned int WRITE_BUFFER_SIZE = 2 * READ_BUFFER_SIZE;
unsigned char read_buffer[READ_BUFFER_SIZE];
char write_buffer[WRITE_BUFFER_SIZE];
int main(void)
{
    int program_status = EXIT_FAILURE;
    static const char hex_chars[] = "0123456789ABCDEF";
    do
    {
        ifstream srce_file("binary.dat", ios::binary);
        if (!srce_file)
        {
            cerr << "Error opening input file." << endl;
            break;
        }
        ofstream dest_file("binary.txt");
        if (!dest_file)
        {
            cerr << "Error creating output file." << endl;
            break;
        }
        // Read a block of source data. The final block is usually short:
        // read() then fails, but gcount() still reports the bytes it got,
        // so keep going while either condition holds.
        while (srce_file.read(reinterpret_cast<char*>(read_buffer), READ_BUFFER_SIZE)
               || srce_file.gcount() > 0)
        {
            // Get the number of bytes actually read.
            const unsigned int bytes_read =
                static_cast<unsigned int>(srce_file.gcount());
            // For each byte that was read:
            for (unsigned int i = 0; i < bytes_read; ++i)
            {
                // Get source, binary value.
                const unsigned char byte = read_buffer[i];
                // Convert the Most Significant nibble to an
                // ASCII character using a lookup table.
                // Write the character into the output buffer.
                write_buffer[i * 2 + 0] = hex_chars[byte >> 4];
                // Convert the Least Significant nibble to an
                // ASCII character and put into output buffer.
                write_buffer[i * 2 + 1] = hex_chars[byte & 0x0f];
            }
            // Write the output buffer to the output, text, file.
            dest_file.write(write_buffer, 2 * bytes_read);
            // Flush the contents of the stream buffer as a precaution.
            dest_file.flush();
        }
        dest_file.close();
        srce_file.close();
        program_status = EXIT_SUCCESS;
    } while (false);
    return program_status;
}
The above program reads 1MB chunks from the binary file, converts to ASCII hex into an output buffer, then writes the chunk to the text file.
I think what you are missing is that the only difference between a binary file and a text file is how the contents are interpreted.

C++ Text File Reading

So I need a little help, I've currently got a text file with following data in it:
myfile.txt
-----------
b801000000
What I want to do is read that b801 etc. data as raw byte values, so that I get the values
0xb8 0x01 0x00 0x00 0x00.
Currently I'm reading that line into an unsigned string using the following typedef.
typedef std::basic_string <unsigned char> ustring;
ustring blah = reinterpret_cast<const unsigned char*>(buffer[1].c_str());
Where I keep getting stuck is turning each pair of chars {'b', '8', etc.} into a byte value {0xb8, 0x01, etc.}.
Any help is appreciated.
Thanks.
I see two ways:
1. Open the file as std::ios::binary and use std::ifstream::operator>> with the std::ios_base::hex flag set, extracting into a type that is two bytes large (like uint16_t from stdint.h (C99) or its C++0x equivalent). Note that operator>> consumes the whole run of hex digits, so the line has to be split into two-character pieces first. See @neuro's comment to your question for an example using std::stringstreams. std::ifstream would work nearly identically.
2. Access the stream iterators directly and perform the conversion manually. Harder and more error-prone, not necessarily faster either, but still quite possible.
strtol converts a string (it needs a null-terminated C string) to an integer in a specified base.
Kind of a dirty way to do it:
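#include <stdio.h>
int main ()
{
    unsigned int num;  /* %x expects a pointer to unsigned int */
    const char* value = "b801000000";
    /* Consume two hex digits per iteration; stop at the end of the
       string or as soon as a pair fails to parse. */
    while (value[0] && value[1] && sscanf (value, "%2x", &num) == 1) {
        printf ("New number: %u\n", num);
        value += 2;
    }
    return 0;
}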
Running this, I get:
New number: 184
New number: 1
New number: 0
New number: 0
New number: 0