C++ reading a file in binary mode. Problems with END OF FILE - c++

I am learning C++ and I have to read a file in binary mode. Here's how I do it (following the C++ reference):
unsigned values[255];
unsigned total;
ifstream in ("test.txt", ifstream::binary);
while(in.good()){
unsigned val = in.get();
if(in.good()){
values[val]++;
total++;
cout << val <<endl;
}
}
in.close();
So, I am reading the file byte by byte as long as in.good() is true. I put a cout at the end of the while loop in order to understand what's happening, and here is the output:
marco@iceland:~/workspace/huffman$ ./main
97
97
97
97
10
98
98
10
99
99
99
99
10
100
100
10
101
101
10
221497852
marco@iceland:~/workspace/huffman$
Now, the input file "test.txt" is just:
aaaa
bb
cccc
dd
ee
So everything works perfectly till the end, where there's that 221497852. I guess it's something about the end of file, but I can't figure the problem out.
I am using gedit and g++ on a 64-bit Debian machine.
Any help will be appreciated.
Many thanks,
Marco

fstream::get() returns an int value, not an unsigned. This is one of the problems.
Secondly, you are reading in binary, so you can read the whole file as a block with fstream::read:
// read a file into memory
#include <iostream> // std::cout
#include <fstream> // std::ifstream
int main () {
std::ifstream is ("test.txt", std::ifstream::binary);
if (is) {
// get length of file:
is.seekg (0, is.end);
int length = is.tellg();
is.seekg (0, is.beg);
char * buffer = new char [length];
std::cout << "Reading " << length << " characters... ";
// read data as a block:
is.read (buffer,length);
if (is)
std::cout << "all characters read successfully.";
else
std::cout << "error: only " << is.gcount() << " could be read";
is.close();
// ...buffer contains the entire file...
delete[] buffer;
}
return 0;
}

This isn't the way istream::get() was designed to be used.
The classical idiom for using this function would be:
for ( int val = in.get(); val != EOF; val = in.get() ) {
// ...
}
or even more idiomatic:
char ch;
while ( in.get( ch ) ) {
// ...
}
The first loop is really inherited from C, where in.get() is
the equivalent of fgetc().
Still, as far as I can tell, the code you give should work,
even though it's not idiomatic.
The C++ standard is unclear what it should return if the
character value read is negative. fgetc() requires a value in
the range [0...UCHAR_MAX], and I think it safe to assume that
this is the intent here. It is, at least, what every
implementation I've used does. But this doesn't affect your
input. Depending on how the implementation interprets the
standard, the return value of in.get() must be in the range
[0...UCHAR_MAX] or [CHAR_MIN...CHAR_MAX], or it must be EOF
(typically -1). (The reason I'm fairly sure that the intent is
to require [0...UCHAR_MAX] is because otherwise, you may not
be able to distinguish end of file from a valid character.)
And if the return value is EOF (almost always
-1), failbit should be set, so in.good() would return
false. There is no case where in.get() would be allowed
to return 221497852. The only explanation I can possibly think
of for your results is that your file has some character with
bit 7 set at the end of the file, that the implementation is
returning a negative number for this (but not end of file,
because it is a character), which results in an out of bounds
index in values[val], and that this out of bounds index
somehow ends up modifying val. Or that your implementation is
broken, and is not setting failbit when it returns end of
file.
To be certain, I'd be interested in knowing what you get from
the following:
std::ifstream in( "test.txt", std::ios_base::binary );
int ch = in.get();
while ( ch != std::istream::traits_type::eof() ) {
std::cout << ch << std::endl;
ch = in.get();
}
This avoids any issues of a possibly invalid index, and any type
conversions (although the conversion int to unsigned is well
defined). Also, out of curiosity (since I can only access VC++
here), you might try replacing in as follows:
std::istringstream in( "\n\xE5" );
I would expect to get:
10
229
(Assuming 8 bit bytes and an ASCII based code set. Both of
which are almost, but not quite universal today.)

I've eventually figured this out.
Apparently the problem wasn't due to the code at all. The problem was gedit: it always appends a newline character at the end of the file. This also happens with other editors, such as vim. Some editors can be configured not to append anything, but in gedit this is apparently not possible. https://askubuntu.com/questions/13317/how-to-stop-gedit-gvim-vim-nano-from-adding-end-of-file-newline-char
Cheers to everyone who answered,
Marco

Related

cannot read character value '26' from file(substitute character) in c++

hi,
I've just done something like the code below in C++, on Windows 10:
//1. Serialize a number into string then write it to file
std::string filename = "D:\\Hello.txt";
size_t OriNumber = 26;
std::string str;
str.resize(sizeof(size_t));
memcpy(&str[0], reinterpret_cast<const char*>(&OriNumber), str.size());
std::ofstream ofs(filename);
ofs << str << std::endl;
ofs.close();
//2. Now read back the string from file and deserialize it
std::ifstream ifs(filename);
std::string str1{std::istreambuf_iterator<char>(ifs), std::istreambuf_iterator<char>()};
// 3. Expect that the string str1 will not be empty here.
size_t DeserializedNumber = *(reinterpret_cast<const size_t*>(str1.c_str()));
std::cout << DeserializedNumber << std::endl;
At step 3, I could not read the string back from the file, even though the file shows several characters when I open it in Notepad++. The last line still prints a value for DeserializedNumber, but only because str1.c_str() is a valid pointer to some garbage value.
After debugging the program, I found that std::ifstream gets the value -1 (EOF) at the beginning of the file. According to Wikipedia, 26 is the value of the substitute character, which is sometimes treated as EOF.
My questions are:
If I can't read the value 26 from a file as above, how can a serialization library serialize this value to bytes?
And is there a way to read, write, and transfer this value properly if I still serialize the value 26 as above?
thanks,

Using seekg() in text mode

While trying to read a simple ANSI-encoded text file in text mode (Windows), I came across some strange behaviour with seekg() and tellg(): any time I called tellg(), saved its value (as pos_type), and then sought to it later, I would always wind up further ahead in the stream than where I left off.
Eventually I did a sanity check; even if I just do this...
int main()
{
std::ifstream dataFile("myfile.txt",
std::ifstream::in);
if (dataFile.is_open() && !dataFile.fail())
{
while (dataFile.good())
{
std::string line;
dataFile.seekg(dataFile.tellg());
std::getline(dataFile, line);
}
}
}
...then eventually, further into the file, lines are half cut-off. Why exactly is this happening?
This issue is caused by libstdc++ computing the current offset as the difference between the position reported by lseek64 and the amount of data remaining in the buffer.
The buffer is sized using the return value of read, which for a text-mode file on Windows is the number of bytes that have been put into the buffer after end-of-line conversion (i.e. the 2-byte \r\n line ending is converted to \n; Windows also seems to append a spurious newline to the end of the file).
lseek64, however (which with mingw results in a call to _lseeki64), returns the current absolute file position, and once the two values are subtracted you end up with an offset that is off by 1 for each remaining newline in the text file (+1 for the extra newline).
The following code should display the issue; you can even use a file with a single character and no newlines, due to the extra newline inserted by Windows.
#include <iostream>
#include <fstream>
int main()
{
std::ifstream f("myfile.txt");
for (char c; f.get(c);)
std::cout << f.tellg() << ' ';
}
For a file containing the single character a, I get the following output:
2 3
Clearly off by 1 for the first call to tellg. After the second call the file position is correct as the end has been reached after taking the extra newline into account.
Aside from opening the file in binary mode, you can circumvent the issue by disabling buffering:
#include <iostream>
#include <fstream>
int main()
{
std::ifstream f;
f.rdbuf()->pubsetbuf(nullptr, 0);
f.open("myfile.txt");
for (char c; f.get(c);)
std::cout << f.tellg() << ' ';
}
but this is far from ideal.
Hopefully mingw / mingw-w64 or gcc can fix this, but first we'll need to determine who would be responsible for fixing it. I suppose the base issue is with Microsoft's implementation of lseek, which should return appropriate values according to how the file has been opened.
Thanks for this, though it's a very old post. I was stuck on this problem for more than a week. Here are some code examples from my site (the menu versions 1 and 2). Version 1 uses the solution presented here, in case anyone wants to see it.
void customerOrder::deleteOrder(char* argv[]){
std::fstream newinFile,newoutFile;
newinFile.rdbuf()->pubsetbuf(nullptr, 0);
newinFile.open(argv[1],std::ios_base::in);
if(!(newinFile.is_open())){
throw "Could not open file to read customer order. ";
}
newoutFile.open("outfile.txt",std::ios_base::out);
if(!(newoutFile.is_open())){
throw "Could not open file to write customer order. ";
}
newoutFile.seekp(0,std::ios::beg);
std::string line;
int skiplinesCount = 2;
if(beginOffset != 0){
//write file from zero to beginoffset and from endoffset to eof If to delete is non-zero
//or write file from zero to beginoffset if to delete is non-zero and last record
newinFile.seekg (0,std::ios::beg);
// if primarykey < largestkey , it's a middle record
customerOrder order;
long tempOffset(0);
int largestKey = order.largestKey(argv);
if(primaryKey < largestKey) {
//stops right before "current..." next record.
while(tempOffset < beginOffset){
std::getline(newinFile,line);
newoutFile << line << std::endl;
tempOffset = newinFile.tellg();
}
newinFile.seekg(endOffset);
//skip two lines between records.
for(int i=0; i<skiplinesCount;++i) {
std::getline(newinFile,line);
}
while( std::getline(newinFile,line) ) {
newoutFile << line << std::endl;
}
} else if (primaryKey == largestKey){
//its the last record.
//write from zero to beginoffset.
while((tempOffset < beginOffset) && (std::getline(newinFile,line)) ) {
newoutFile << line << std::endl;
tempOffset = newinFile.tellg();
}
} else {
throw "Error in delete key";
}
} else {
//its the first record.
//write file from endoffset to eof
//works with endOffset - 4 (but why??)
newinFile.seekg (endOffset);
//skip two lines between records.
for(int i=0; i<skiplinesCount;++i) {
std::getline(newinFile,line);
}
while(std::getline(newinFile,line)) {
newoutFile << line << std::endl;
}
}
newoutFile.close();
newinFile.close();
}
beginOffset is a specific point in the file (the beginning of each record), and endOffset is the end of the record, calculated in another function with tellg (findFoodOrder). I did not add it here as it would become very lengthy, but you can find it on my site (under the menu version 1 link):
http://www.buildincode.com

How to move through each line in getline()

Whenever I run my code, I get the first line pulled out of the file, but only the first. Is there something I am missing? I ran into the issue when I introduced a stringstream to read in the lines of hex from the file more easily and to convert from a string to a hex value more quickly. It read in each line correctly before, but now it does not. Am I missing something in my understanding of how getline() works?
ss is stringstream, fileIn is the file, hexInput is a string, memory[] is an array of short int, instruction is a short int, opCounter is an int...
string hexInput;
stringstream ss;
short int instruction;
ifstream fileIn ("proj1.txt");
if (fileIn.is_open())
{
while ( getline(fileIn, hexInput) )
{
ss << hex << hexInput;
ss >> instruction;
memory[opCounter] = instruction;
cout << hex << memory[opCounter] << '\t';
cout << opCounter << '\n';
ss.str("");
opCounter++;
}
fileIn.close();
}
else cout << "Unable to open file";
Above is the entire function (which was working before using stringstream) and below are the contents of the file.
4000
0033
0132
2033
4321
2137
D036
A00F
B003
C00C
3217
6217
E044
FFFF
6016
1013
FFFF
0
0
0
0
1
2
3
4
5
There are 26 lines and the last opCounter output says "19" in hex, which makes me assume the file is being read line by line but the stringstream is never updated. This is my first C++ program and I am new to a few of the features I am trying to use.
Thanks for any help...
Your stringstream is filled correctly from the first line. After extracting the number into instruction, however, it will be at eof (you can check this with ss.eof()), because there is no data after the first number inside the stringstream.
Replace ss.str(""); (which you don't need) with ss.clear();, which will reset the eof flag. Inserting the next line and reading from the stream will then work as expected.
Of course, there is absolutely no need for a stringstream in the first place.
while ( fileIn.good() ) {
fileIn >> hex >> instruction;
[...]
}
works fine. It will read short ints in hexadecimal representation until one line cannot be interpreted as such. (Incidentally, that is line 7, because D036 is too large to fit inside a short int. Admittedly that is different from your current behaviour, but did you really want a silent failure? Very useful at this point are, again, fileIn.eof(), to check whether the read failed because the stream reached the end of the file, and fileIn.clear(), to reset other fail bits.)
As requested:
The loop is commonly abbreviated as
while ( fileIn >> hex >> instruction ) {
[...]
}
but note that if you want to check why the read failed, and continue if it was not an eof, the aforementioned loop is better suited to the task.

why does this C++ code print a after every line?

#include <iostream>
using namespace std;
int main() {
for (;;) {
char ch ;
if (cin.fail()) break;
cin.read((char*)&ch, sizeof(unsigned char));
cout<<hex<<(unsigned int)(unsigned char)ch<<endl;
cin.clear();
}
}
Why does this code always print a after every line? I just used any char as standard input. Added: I am trying to read unformatted input with read.
This code is reading a character at a time and writing out the value of the character in hexadecimal.
What you might not be expecting is that pressing Enter also sends a character, which is read by your call to cin.read.
The a is the hexadecimal value of that character. So if you type hello and press Enter, the following will result from the cout statements:
68
65
6c
6c
6f
a
If you stop displaying the value in hexadecimal, you'll notice that it prints 10 after each entry.
The a that you see is simply the hex value of the '\n' character at the end of the line of input.
If you don't want to see that character, simply wrap the output line in an if statement that checks for that character and doesn't bother to do any output when it's seen:
if (ch != '\n') {
cout<<hex<<(unsigned int)(unsigned char)ch<<endl;
}
I don't even know what you're trying to accomplish with this. Don't use cin.read to read a single character. This loop should look more like this:
char ch;
while (std::cin.get(ch)) {
std::cout << std::hex << static_cast<unsigned>(static_cast<unsigned char>(ch)) << std::endl;
}
As to why it prints something, are you sure it's not the character you're actually inputting?
GargantuChet has correctly explained why you get 'a's.
More generally, there are many other issues
1 for (;;)
2 {
3 char ch;
4 if (cin.fail()) break;
5 cin.read((char*)&ch, sizeof(unsigned char));
6 cout << hex << (unsigned int)(unsigned char)ch << endl;
7 cin.clear();
8 }
On line 4, you check whether cin.fail() is set, but a stream never starts in a failed state: you have to attempt to do something for that to fail. In other words, you should do the read() and then have a look at cin.fail(). In general, you should also use gcount() to check how many bytes could actually be read (e.g. despite asking for, say, 4 you might only get 2; read() then sets eofbit and failbit, but the 2 bytes that were read are still in the buffer), but here you're only requesting 1 character so it can be simpler.
Cleaning it up a bit but keeping the same basic approach:
for (char ch; cin.read(&ch, sizeof ch); )
cout << hex << (unsigned)(unsigned char)ch << endl;
This works because read() returns a reference to cin, and evaluating the "truth" of cin is a shorthand for asking if the input it's performed so far has been error-free (more strictly, at least since the last clear() if you're using that).
Still, std::istream - of which std::cin is an instance - also has a function designed for getting characters, allowing the loop to be simplified to:
for (char ch; std::cin.get(ch); )
...
Aside
Remember that a for( ; ; ) control statement has three parts:
the initialisation code on the left which can also create new variables
the test: this happens before the 1st and every subsequent execution of the loop's statement(s)
code to be executed after each execution of the statement(s) and before repeating the test.
Because of this, tests like std::cin.get(ch) are called and evaluated for success as a condition for each iteration. The last solution listed above is equivalent to:
{
char ch;
while (std::cin.get(ch))
...
}

getline seems to not working correctly

Please tell me what I am doing wrong here. What I want to do is this:
I have a txt file with four numbers, each of which has 15 digits:
std::ifstream file("numbers.txt",std::ios::binary);
I'm trying to read those numbers into my array:
char num[4][15];
What I think I'm doing is: for as long as the end of the file is not reached, write every line (max 15 chars, ending at '\n') into num[lines]. But somehow this doesn't work. Firstly, it reads only the first number correctly; the rest are just "" (empty strings). Secondly, file.eof() doesn't seem to work correctly either: with the txt file presented below this code, lines reached 156. What's going on?
for (unsigned lines = 0; !file.eof(); ++lines)
{
file.getline(num[lines],15,'\n');
}
So the whole "routine" looks like this:
int main()
{
std::ifstream file("numbers.txt",std::ios::binary);
char numbers[4][15];
for (unsigned lines = 0; !file.eof(); ++lines)
{
file.getline(numbers[lines],15,'\n');// sizeof(numbers[0])
}
}
This is contents of my txt file:
111111111111111
222222222222222
333333333333333
444444444444444
P.S.
I'm using VS2010 sp1
Do not use the eof() function! The canonical way to read lines is:
while( getline( cin, line ) ) {
// do something with line
}
file.getline() extracts 14 characters, filling in num[0][0] .. num[0][13]. Then it stores a '\0' in num[0][14] and sets the failbit on file because that's what it does when the buffer is full but terminating character not reached.
Further attempts to call file.getline() do nothing because failbit is set.
Tests for !file.eof() return true because the eofbit is not set.
Edit: to give a working example, best is to use strings, of course, but to fill in your char array, you could do this:
#include <iostream>
#include <fstream>
int main()
{
std::ifstream file("numbers.txt"); // not binary!
char numbers[4][16]={}; // 16 to fit 15 chars and the '\0'
for (unsigned lines = 0;
lines < 4 && file.getline(numbers[lines], 16);
++lines)
{
std::cout << "numbers[" << lines << "] = " << numbers[lines] << '\n';
}
}
tested on Visual Studio 2010 SP1
According to the ifstream documentation, reading stops either after n-1 characters have been read or when the delimiter is found, so the first read takes only 14 bytes.
It reads bytes: '1' (the character) is 0x31, so your buffer will be filled with 0x31 rather than the number 1 you seem to expect, and the last character will be 0 (the end of the C string).
Side note: your code doesn't check that lines stays within the bounds of your array.
Using getline suggests you're expecting text, yet you open the file in binary mode; that seems wrong to me.
It looks like the '\n' at the end of the first line is not being consumed and remains in the buffer, so the next getline() reads it.
Try adding a file.get() after each getline().
If one file.get() does not work, try two, because on Windows a line ends with the two characters '\r\n'.
Change it to the following:
#include <fstream>
int main()
{
//no need to use std::ios_base::binary since it's ASCII data
std::ifstream file("numbers.txt");
//allocate one more position in array for the NULL terminator
char numbers[4][16];
//you only have 4 lines, so don't loop on EOF since that will cause an extra read
//which will then cause an extra loop iteration, causing undefined behavior
for (unsigned lines = 0; lines < 4; ++lines)
{
//copy into your buffer that also includes space for a terminating null
//placing in if-statement checks for the failbit of ifstream
if (!file.getline(numbers[lines], 16,'\n'))
{
//make sure to place a terminating NULL in empty string
//since the read failed
numbers[lines][0] = '\0';
}
}
}