Reading a binary file and interpreting it as integers - C++

I'm trying to interpret a binary file as a series of integers and read the values into a vector.
However, the line ifs >> n; always sets n to 0, eof is always false, and the file position is not updated.
If I change the type to char it works, but that is not what I want to achieve.
How can I make the code work as I want?
int readAsNumber(const char* fileName, vector <int> &content)
{
ifstream ifs;
int n;
ifs.open(fileName, ifstream::in | ifstream::binary);
while (ifs.eof() == false) // Never terminates
{
ifs >> n; // Always sets n = 0
content.push_back(n); // Saves 0
}
ifs.close();
return 0;
}

The input operator >> reads and interprets the input as text.
If the file contains raw binary data, you need to read it as raw data as well:
int value;
while (ifs.read(reinterpret_cast<char*>(&value), sizeof value))
content.push_back(value);
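Remember that storing raw binary data like this is not portable: the size and byte order of int differ between platforms, so a file written on one machine may not read back correctly on another. That is why it is generally not recommended.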
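A fuller version of the read loop might look like the sketch below. The name readAsNumbers is hypothetical, and the code assumes the file was written on the same platform, with the same int size and byte order.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: appends every complete int found in the file to `content`.
// Returns false if the file cannot be opened.
bool readAsNumbers(const std::string& fileName, std::vector<int>& content) {
    std::ifstream ifs(fileName, std::ios::binary);
    if (!ifs)
        return false;
    int value;
    // read() copies raw bytes; the loop stops at the first incomplete read.
    while (ifs.read(reinterpret_cast<char*>(&value), sizeof value))
        content.push_back(value);
    return true;
}
```

Testing the result of read() in the loop condition means a trailing partial value (fewer than sizeof(int) bytes) is silently dropped rather than pushed as garbage.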

Huffman Decoding Compressed File

I have a program that produces a Huffman tree based on ASCII character frequency read in a text input file. The Huffman codes are stored in a string array of 256 elements, empty string if the character is not read. This program also encodes and compresses an output file.
I am now trying to decompress and decode my current output file which is opened as an input file and a new output file is to have the decoded message identical to the original text input file.
My thought process for this part of my assignment is to work backwards from the encoding function I have made and read 8 bits at a time and somehow decode the message by updating a variable (string n) which is an empty string at first, through recursion of the Huffman tree until I get a code to output to output file.
I have currently started the function but I am stuck and I am looking for some guidance in writing my current decodeOutput function. All help is appreciated.
My completed encodedOutput function and decodeOutput function is down below:
(For encodedOutput function, fileName is the input file parameter, fileName2 is the output file parameter)
(For decodeOutput function, fileName2 is the input file parameter, fileName3 is the output file parameter)
code[256] is a parameter for both of these functions and holds the Huffman code for each unique character read in the original input file, for example, the character 'H' being read in the input file may have a code of "111" stored in the code array for code[72] at the time it is being passed to the functions.
void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {
ifstream ifile;//to read file
ifile.open(fileName, ios::binary);
if (!ifile) //to check if file is open or not
{
die("Can't read again");
}
ofstream ofile;
ofile.open(fileName2, ios::binary);
if (!ofile) {
die("Can't open encoding output file");
}
int read;
read = ifile.get();//read one char from file and store it in int
char buffer = 0, bit_count = 0;
while (read != -1) {
for (unsigned b = 0; b < code[read].size(); b++) { // loop through bits (code[read] outputs huffman code)
buffer <<= 1;
buffer |= code[read][b] != '0';
bit_count++;
if (bit_count == 8) {
ofile << buffer;
buffer = 0;
bit_count = 0;
}
}
read = ifile.get();
}
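if (bit_count != 0)
ofile << static_cast<char>(buffer << (8 - bit_count)); // cast back to char: the shift promotes to int, which << would print as decimal text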
ifile.close();
ofile.close();
}
//Work in progress
void decodeOutput(const string & fileName2, const string & fileName3, string code[256]) {
ifstream ifile;
ifile.open(fileName2, ios::binary);
if (!ifile)
{
die("Can't read again");
}
ofstream ofile;
ofile.open(fileName3, ios::binary);
if (!ofile) {
die("Can't open encoding output file");
}
string n = "";
for (int c; (c = ifile.get()) != EOF;) {
for (unsigned p = 8; p--;) {
if ((c >> p & 1) == '0') { // if bit is a 0
}
else if ((c >> p & 1) == '1') { // if bit is a 1
}
else { // Output string n (decoded character) to output file
ofile << n;
}
}
}
}
The decoding would be easier if you had the original Huffman tree used to construct the codebook. But suppose you only have the codebook (i.e., the string code[256]) and not the original Huffman tree. What you can do is the following:
1. Partition the codebook into groups of codewords with the same length. Say the codebook contains n different lengths: L0 < L1 < ... < Ln-1.
2. Read (but do not consume yet) k bits from the input file, with k increasing from L0 up to Ln-1, until the k input bits match a codeword of length k = Li for some i.
3. Output the 8-bit character corresponding to the matching codeword, and consume the k bits from the input file.
4. Repeat until all bits from the input file are consumed.
If the codebook was constructed correctly, and you always look up the codewords in order of increasing length, you should never encounter a sequence of input bits that has no matching codeword.
In terms of the equivalent Huffman tree, every time you compare k input bits with the group of codewords of length k, you are checking whether a leaf at tree level k holds a matching codeword; every time you increase k to the next longer group, you are walking down the tree to a deeper level (with level 0 being the root).
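The lookup described above can be sketched as follows. This is an illustration under assumptions, not the asker's code: decodeBits is a hypothetical name, and the input bits are passed as a string of '0' and '1' characters for clarity (in the real decoder they would come from (c >> p) & 1). Growing the pending bit string one bit at a time is equivalent to trying the codeword lengths in increasing order, because a Huffman code is prefix-free.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: decodes a sequence of bits using only the codebook.
std::string decodeBits(const std::string& bits, const std::string code[256]) {
    // Reverse lookup: codeword -> character. Unused characters have empty codes.
    std::unordered_map<std::string, unsigned char> lookup;
    for (int c = 0; c < 256; ++c)
        if (!code[c].empty())
            lookup[code[c]] = static_cast<unsigned char>(c);

    std::string out;
    std::string pending;  // bits read but not yet matched to a codeword
    for (char bit : bits) {
        pending += bit;
        auto it = lookup.find(pending);
        if (it != lookup.end()) {  // matched: output the character, consume the bits
            out += static_cast<char>(it->second);
            pending.clear();
        }
    }
    return out;
}
```

Note that the padding bits in the last byte of the compressed file can decode to spurious characters, so the decoder also needs some way to know where the real data ends, for example a stored symbol count.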

ifstream won't read all integers

When I read the TestData.txt file, I get the wrong output. What am I doing wrong? I am using an int array so that I can run MergeSort after saving the data into the array.
TestData.txt
-------------------
31791 564974 477059 269094 972335
739154 206345 634644 227684 398536
910177 507975 589785 67117 395140
598829 372499 364165 450187 996527
700285 263407 918021 661467 457544
656297 846316 221731 240676 68287
913 141702 845802 477617 109824
{
int myArray[1000];
int i;
//reading given data
const char* filename= "TestData.txt";
ifstream file(filename);
if(file.is_open())
{
for(i = 0; i <=999; ++i)
{
file >> myArray[i];//storing data to array
}
}
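You need to check whether the ifstream has reached the end of the file; otherwise you keep reading past the end of the data and the remaining array elements hold garbage values.
With one modification, the code would be OK.
Change:
for(i = 0; i <=999; ++i)
to:
for(i = 0; i <=999 && !file.eof(); ++i)
Note that eof() only becomes true after a read has hit the end of the file, so if the file ends with a newline this loop still makes one failed read. Testing the extraction itself, as in while (file >> v), avoids that.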
You are reading 1000 entries from a file that clearly contains fewer than 1000 integers.
The first values of your array should be correct, but once you reach the end of the file, operator>> will not read anything.
For example here is one way to write it:
const char* filename= "TestData.txt";
std::vector<int> myArray;
std::ifstream file(filename);
if(file.is_open())
{
int v;
while(file >> v) {
myArray.push_back(v);
}
}
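If I'm not wrong, int is only guaranteed to hold values from -32767 to 32767.
On most current platforms int is 32 bits wide and the values in your file fit, but if your compiler uses a 16-bit int, values larger than that (which your source file contains) won't give the results you are expecting.
By the way, it would also be nice to know what output you are getting.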

Writing and reading in and from a binary file in c++

I am a beginner in working with files. What I want to do in my code is get a name from the user and hide it in a .bmp picture, and also be able to get the name back from the file. But I want to change the characters into ASCII codes first (that's what my assignment says).
What I tried to do is change the name's characters to ASCII codes and then append them to the end of the bmp picture, which I open in binary mode. After adding them, I want to read them back from the file and recover the name.
This is what I've done so far, but I am not getting a proper result. All I get is some meaningless characters. Is this code even right?
int main()
{
cout<<"Enter your name"<< endl;
char * Text= new char [20];
cin>> Text; // getting the name
int size=0;
int i=0;
while( Text[i] !='\0')
{
size++;
i++;
}
int * BText= new int [size];
for(int i=0; i<size; i++)
{
BText[i]= (int) Text[i]; // having the ASCII codes of the characters.
}
fstream MyFile;
MyFile.open("Picture.bmp, ios::in | ios::binary |ios::app");
MyFile.seekg (0, ios::end);
ifstream::pos_type End = MyFile.tellg(); //End shows the end of the file before adding anything
// adding each of the ASCII codes to the end of the file.
int j=0;
while(j<size)
{
MyFile.write(reinterpret_cast <const char *>(&BText[j]), sizeof BText[j]);
j++;
}
MyFile.close();
char * Text2= new char[size*8];
MyFile.open("Picture.bmp, ios:: in , ios:: binary");
// putting the pointer to the place where the main file ended and start reading from there.
MyFile.seekg(End);
MyFile.read(Text2,size*8);
cout<<Text2<<endl;
MyFile.close();
system("pause");
return 0;
}
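There are many flaws in your code. One important one is:
MyFile.open("Picture.bmp, ios::in | ios::binary |ios::app");
It must be:
MyFile.open("Picture.bmp", ios::in | ios::binary |ios::app);
The closing quote belongs right after the file name, so that the flags are passed as the second argument instead of being part of the string.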
Second, use std::string instead of C-style strings:
char * Text= new char [20];
should be
std::string Text;
Also, use std::vector to make an array:
int * BText= new int [size];
should be
std::vector<int> BText(size);
And so on...
You write int (which is typically 32 bits) but read char (which is 8 bits).
Why not write the string as-is? There's no need to convert it to an integer array.
Also, you don't null-terminate the array you read into before printing it.
Your write operation can be simplified: pass the complete buffer in a single call
MyFile.write(reinterpret_cast <const char *>(BText), size * sizeof (*BText));
Also, storing each character as an int writes sizeof(int) bytes per character, so there are zero bytes between your characters in the file, which your read operation does not take into account.
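Putting the advice from these answers together, one way to append the name as-is and read it back is sketched below. The function names appendText and readFrom are hypothetical, and the sketch assumes nothing else is written to the file after the name.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: appends `text` to the file and returns the offset
// at which the appended bytes start.
std::streamoff appendText(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::binary | std::ios::app);
    out.seekp(0, std::ios::end);  // make tellp() report the current end of file
    std::streamoff start = out.tellp();
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
    return start;
}

// Hypothetical helper: reads everything from `start` to the end of the file.
std::string readFrom(const std::string& path, std::streamoff start) {
    std::ifstream in(path, std::ios::binary);
    in.seekg(start, std::ios::beg);
    std::string text;
    char ch;
    while (in.get(ch))  // std::string tracks its own length, no '\0' needed
        text += ch;
    return text;
}
```

Each character is already stored as its ASCII code, so writing the chars directly satisfies the "convert to ASCII" requirement without an int array.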

Finding specific primitives within a binary file

Is there any way to read a specific primitive from a binary file (like fread in MATLAB or BinaryReadList in Mathematica)? Specifically, I want to scan my file until it reaches, say, an int8_t number, store it in a variable, and then scan for another primitive (unsigned char, double, etc.).
I am rewriting code from MATLAB that does this, so the format of the file is known.
I want to read n bytes of only the specified type (32-bit int, char, ...) from a file. For example: read only the first 12 bytes of my file if they turn out to be 8-bit integers.
Maybe the solution to your problem is in understanding the difference between these two doc pages:
http://www.mathworks.com/help/matlab/ref/fread.html
http://www.cplusplus.com/reference/cstdio/fread/
Both versions of fread let you pull in an array of items from a binary file. I'm assuming from your question that you know the size and shape of the array you need.
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
int main() {
    const size_t NumElements = 128; // hopefully you know
    int8_t myElements[NumElements];
    FILE *fp = fopen("mydata.bin", "rb");
    assert(fp != NULL);
    size_t countRead = fread(myElements, sizeof(int8_t), NumElements, fp);
    assert(countRead == NumElements); // compare, don't assign
    fclose(fp);
    // do something with myElements
}
Your question makes no sense to me, but here's a bunch of random information on how to read a binary file:
#include <fstream>
#include <iostream>
#include <vector>
struct myobject { //so you have your data
    char weight;
    double value;
};
//for primitives in a binary format you read the raw bytes of each field
//(formatted >> would parse the bytes as text instead)
std::istream& operator>>(std::istream& in, myobject& data) {
    in.read(&data.weight, sizeof data.weight);
    return in.read(reinterpret_cast<char*>(&data.value), sizeof data.value);
    //failures are reported through the stream state
}
//if you don't know the length, that's harder
std::istream& operator>>(std::istream& in, std::vector<myobject>& data) {
    int size = 0;
    in.read(reinterpret_cast<char*>(&size), sizeof size); //read the length
    data.clear();
    for(int i=0; i<size; ++i) { //then read that many myobject instances
        myobject obj;
        if (in >> obj)
            data.push_back(obj);
        else //if the stream fails, stop
            break;
    }
    return in;
}
int main() {
    std::ifstream myfile("input.txt", std::ios_base::binary); //open a file
    std::vector<myobject> array;
    if (myfile >> array) //read the data!
        std::cout << "read " << array.size() << " objects\n"; //well that was easy
    else
        std::cerr << "error reading from file";
    return 0;
}
Also, you can use the .seekg(position) member of ifstream to skip directly to a specific point in the file, if you happen to know where to find the data you're looking for.
Oh, you just want to read the first 12 bytes of the file as 8 bit integers, and then the next 12 bytes as int32_t?
#include <cstdint>
#include <fstream>
#include <vector>
int main() {
    std::ifstream myfile("input.txt", std::ios_base::binary); //open a file
    std::vector<int8_t> data1(12); //array of 12 int8_t
    myfile.read(reinterpret_cast<char*>(data1.data()), data1.size() * sizeof(int8_t)); //read the raw bytes
    if (!myfile) return 1; //make sure the read succeeded
    std::vector<int32_t> data2(3); //array of 3 int32_t
    myfile.read(reinterpret_cast<char*>(data2.data()), data2.size() * sizeof(int32_t)); //read the raw bytes
    if (!myfile) return 1; //make sure the read succeeded
    //processing
}

Finding end of file while reading from it

void graph::fillTable()
{
ifstream fin;
char X;
int slot=0;
fin.open("data.txt");
while(fin.good()){
fin>>Gtable[slot].Name;
fin>>Gtable[slot].Out;
cout<<Gtable[slot].Name<<endl;
for(int i=0; i<=Gtable[slot].Out-1;i++)
{
// can't get here
fin>>X;
cout<<X<<endl;
Gtable[slot].AdjacentOnes.addFront(X);
}
slot++;
}
fin.close();
}
That's my code, basically it does exactly what I want it to but it keeps reading when the file is not good anymore. It'll input and output all the things I'm looking for, and then when the file is at an end, fin.good() apparently isn't returning false. Here is the text file.
A 2 B F
B 2 C G
C 1 H
H 2 G I
I 3 A G E
F 2 I E
and here is the output
A
B
F
B
C
G
C
H
H
G
I
I
A
G
E
F
I
E
Segmentation fault
Here is Gtable's type.
struct Gvertex:public slist
{
char Name;
int VisitNum;
int Out;
slist AdjacentOnes;
//linked list from slist
};
I'm expecting it to stop after outputting 'E' which is the last char in the file. The program never gets into the for loop again after reading the last char. I can't figure out why the while isn't breaking.
Your condition in the while loop is wrong. ios::eof() isn't
predictive; it will only be set once the stream has attempted
(internally) to read beyond end of file. You have to check after each
input.
The classical way of handling your case would be to define a >>
function for Gvertex, along the lines of:
std::istream&
operator>>( std::istream& source, Gvertex& dest )
{
    std::string line;
    while ( std::getline( source, line ) && line.empty() ) {
    }
    if ( source ) {
        std::istringstream tmp( line );
        char name;
        int count;
        if ( !(tmp >> name >> count) ) {
            source.setstate( std::ios::failbit );
        } else {
            std::vector< char > adjacentOnes;
            char ch;
            while ( tmp >> ch ) {
                adjacentOnes.push_back( ch );
            }
            if ( !tmp.eof() || static_cast< int >( adjacentOnes.size() ) != count ) {
                source.setstate( std::ios::failbit );
            } else {
                dest.Name = name;
                dest.Out = count;
                for ( int i = 0; i < count; ++ i ) {
                    dest.AdjacentOnes.addFront( adjacentOnes[ i ] );
                }
            }
        }
    }
    return source;
}
(This was written rather hastily. In real code, I'd almost certainly
factor the inner loop out into a separate function.)
Note that:
- We read line by line, in order to verify the format (and to allow resynchronization in case of error).
- We set failbit in the source stream in case of an input error.
- We skip empty lines (since your input apparently contains them).
- We do not modify the target element until we are sure that the input is correct.
Once we have this, it is easy to loop over all of the elements:
int slot = 0;
while ( slot < Gtable.size() && fin >> Gtable[ slot ] ) {
++ slot;
}
if ( slot != Gtable.size() )
// ... error ...
EDIT:
I'll point this out explicitly, because the other people responding seem
to have missed it: it is absolutely imperative to ensure that you have
the place to read into before attempting the read.
EDIT 2:
Given the number of wrong answers this question is receiving, I would
like to stress:
- Any use of fin.eof() before the input is known to fail is wrong.
- Any use of fin.good(), period, is wrong.
- Any use of one of the values read before having tested that the input has succeeded is wrong. (This doesn't prevent things like fin >> a >> b, as long as neither a nor b is used before the success is tested.)
- Any attempt to read into Gtable[slot] without ensuring that slot is in bounds is wrong.
With regards to eof() and good():
The base class of istream and ostream defines three
“error” bits: failbit, badbit and eofbit. It's
important to understand when these are set: badbit is set in case of a
non-recoverable hardware error (practically never, in fact, since most
implementations can't or don't detect such errors); and failbit is set in
any other case the input fails—either no data available (end of
file), or a format error ("abc" when inputting an int, etc.).
eofbit is set anytime the streambuf returns EOF, whether this
causes the input to fail or not! Thus, if you read an int, and the
stream contains "123", without trailing white space or newline,
eofbit will be set (since the stream must read ahead to know where the
int ends); if the stream contains "123\n", eofbit will not be set.
In both cases, however, the input succeeds, and failbit will not be
set.
To read these bits, there are the following functions (as code, since I
don't know how to get a table otherwise):
eof(): returns eofbit
bad(): returns badbit
fail(): returns failbit || badbit
good(): returns !failbit && !badbit && !eofbit
operator!(): returns fail()
operator void*(): returns fail() ? NULL : this
(typically---all that's guaranteed is that !fail() returns non-null.)
Given this: the first check must always be fail() or one of the
operators (which are based on fail()). Once fail() returns true, we
can use the other functions to determine why:
if ( fin.bad() ) {
// Serious problem, disk read error or such.
} else if ( fin.eof() ) {
// End of file: there was no data there to read.
} else {
// Formatting error: something like "abc" for an int
}
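The difference between eofbit and failbit described above can be observed directly with a string stream. The sketch below uses hypothetical names (ReadState, readOneInt) and simply reports the two bits after a single int extraction.

```cpp
#include <sstream>
#include <string>

// Hypothetical record of the stream state after one extraction.
struct ReadState {
    bool ok;   // true if the extraction succeeded (failbit and badbit clear)
    bool eof;  // true if eofbit was set while reading
};

// Hypothetical helper: tries to read a single int from `text`.
ReadState readOneInt(const std::string& text) {
    std::istringstream in(text);
    int v = 0;
    in >> v;
    return { !in.fail(), in.eof() };
}
```

With this helper, "123" gives ok and eof both true, "123\n" gives ok true and eof false, an empty string gives ok false and eof true, and "abc" gives both false. That is why eof() on its own says nothing about whether the read worked.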
Practically speaking, any other use is an error (and any use of good()
is an error—don't ask me why the function is there).
Slightly slower but cleaner approach:
void graph::fillTable()
{
ifstream fin("data.txt");
char X;
int slot=0;
std::string line;
while(std::getline(fin, line))
{
if (line.empty()) // skip empty lines
continue;
std::istringstream sin(line);
if (sin >> Gtable[slot].Name >> Gtable[slot].Out && Gtable[slot].Out > 0)
{
std::cout << Gtable[slot].Name << std::endl;
for(int i = 0; i < Gtable[slot].Out; ++i)
{
if (sin >> X)
{
std::cout << X << std::endl;
Gtable[slot].AdjacentOnes.addFront(X);
}
}
slot++;
}
}
}
If you still have issues, it's not with file reading...
The file won't fail until you actually read from past the end of file. This won't occur until the fin>>Gtable[slot].Name; line. Since your check is before this, good can still return true.
One solution would be to add additional checks for failure and break out of the loop if so.
fin>>Gtable[slot].Name;
fin>>Gtable[slot].Out;
if(!fin) break;
This still does not handle formatting errors in the input file very nicely; for that you should be reading line by line as mentioned in some of the other answers.
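The line-by-line reading mentioned here can be sketched independently of the asker's classes. Vertex and readGraph are hypothetical stand-ins for Gvertex and fillTable, and a std::vector replaces the fixed-size Gtable so that no slot can go out of bounds.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for Gvertex: a name plus its adjacent vertices.
struct Vertex {
    char name;
    std::vector<char> adjacent;
};

// Hypothetical helper: parses lines of the form "A 2 B F".
// Blank or malformed lines are skipped instead of corrupting the table.
std::vector<Vertex> readGraph(std::istream& in) {
    std::vector<Vertex> table;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Vertex v;
        int out = 0;
        if (!(fields >> v.name >> out))
            continue;  // nothing usable on this line
        char x;
        while (static_cast<int>(v.adjacent.size()) < out && fields >> x)
            v.adjacent.push_back(x);
        // Keep the vertex only if the promised number of neighbours was present.
        if (static_cast<int>(v.adjacent.size()) == out)
            table.push_back(v);
    }
    return table;
}
```

Because every read is tested before its value is used, the loop ends cleanly at end of file whether or not the last line has a trailing newline.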
Try moving first two reads in the while condition:
// assuming Gtable has at least size of 1
while( fin>>Gtable[slot].Name && fin>>Gtable[slot].Out ) {
cout<<Gtable[slot].Name<<endl;
for(int i=0; i<=Gtable[slot].Out-1;i++) {
fin>>X;
cout<<X<<endl;
Gtable[slot].AdjacentOnes.addFront(X);
}
slot++;
//EDIT:
if (slot == table_size) break;
}
Edit: As per James Kanze's comment, you're taking an address past the end of the Gtable array, which is what causes the segfault. You could pass the size of Gtable as an argument to your fillTable() function (e.g. void fillTable(int table_size)) and check that slot is in bounds before each read.
Edited in response to James' comment - the code now uses a good() check instead of a
!eof() check, which will allow it to catch most errors. I also threw in an is_open()
check to ensure the stream is associated with the file.
Generally, you should try to structure your file reading in a loop as follows:
ifstream fin("file.txt");
char a = '\0';
int b = 0;
char c = '\0';
if (!fin.is_open())
return 1; // Failed to open file.
// Do an initial read. You have to attempt at least one read before you can
// reliably check for EOF.
fin >> a;
// Read until EOF
while (fin.good())
{
// Read the integer
fin >> b;
// Read the remaining characters (I'm just storing them in c in this example)
for (int i = 0; i < b; i++)
fin >> c;
// Begin to read the next line. Note that this will be the point at which
// fin will reach EOF. Since it is the last statement in the loop, the
// file stream check is done straight after and the loop is exited.
// Also note that if the file is empty, the loop will never be entered.
fin >> a;
}
fin.close();
This solution is desirable (in my opinion) because it does not rely on adding random
breaks inside the loop, and the loop condition is a simple good() check. This makes the
code easier to understand.