Is istream::seekg with ios_base::end reliable?


Suppose I created a file this way:
std::ofstream osf("MyTextFile.txt");
std::string buffer="spam\neggs\n";
osf.write(buffer.c_str(),buffer.length());
osf.close();
When I tried to read that file in the following way, I found that more characters were read than the file contains.
std::ifstream is("MyTextFile.txt");
is.seekg (0, is.end);
int length = is.tellg();
is.seekg (0, is.beg);
char * buffer = new char [length];
is.read (buffer,length);
//work with buffer
delete[] buffer;
For example, if the file contains spam\neggs\n, then this procedure reads 12 characters instead of 10. The first 10 chars are spam\neggs\n as expected, but there are 2 more, which have the integer value 65533.
Moreover, this problem happens only when \n is present in the file. For example, there is no problem if the file contains spam\teggs\t instead.
The questions are:
Am I doing anything wrong? Or does this procedure not work as it should?
Bonus Q: Can you suggest an alternative for reading the whole file at once?
Note: I found this approach here.

The problem is that you wrote the string
"spam\neggs\n"
initially to an ofstream without setting the std::ios::binary flag when opening it (in open() or in the constructor). This causes the runtime to translate to the "native text format", i.e., to convert each \n to \r\n on output (as you are on Windows). So, after being written, the contents of your file were actually:
"spam\r\neggs\r\n"
(i.e., 12 chars). That is the length returned by
int length = is.tellg();
But, when you tried to read 12 chars you got
"spam\neggs\n"
back, because the runtime converted each \r\n back to \n.
As a final piece of advice: please, please don't use new char[length]. Use a std::string sized with resize (or a std::vector<char>) so you won't leak memory. And if your file can be very big, it may not be a good idea to slurp the whole file into memory at once either.

Just an idea, since the number 2 corresponds to the count of \ns: Are you doing this on Windows? It might have something to do with the file actually containing \r\n. What happens if you open the file in binary mode (std::ios::binary)?

Can you suggest an alternative for reading the whole file at once?
Yes:
std::ifstream is("MyTextFile.txt");
std::string str( std::istreambuf_iterator<char>{is}, {} ); // requires <iterator>
str now contains the file. Does this solve your problem?

Related

c++: Istream counts every newline in a .txt file as two

I've got a slight problem. It appears that for some reason my function, when counting the size of a .txt file, counts each newline as if it were two chars instead of one. Here's the function:
#define IN_FILE "in_mat.txt"
#define IN_BUF
#ifdef IN_BUF
void inBuf(char *(&b)){
    streampos size;
    ifstream f(IN_FILE, ios::in);
    f.seekg(0,ios::end);
    size=f.tellg();
    b=new char[size];
    f.seekg(0, ios::beg);
    f.read(b, size);
    f.close();
}
#endif
And here's the read file:
2 2
1 0
0 1
2 2
i 0
0 -i
2 2
0 1
-1 0
2 2
0 i
i 0
Earlier, I added some couts, and it appears that size=60, while the actual size is 49 (I checked it). The number of newlines in the file is 11, which is exactly 60-49. Could somebody help me with that?

To add to the other answers, if you want to read special characters such as newline characters, you should open your file in binary mode, not text mode.
ifstream f(IN_FILE, ios::in | ios::binary);
If you don't open the file in binary mode, the actual characters that make up the '\n' are translated by the runtime to a single character (namely '\n'). So in text mode, you don't get the "real" version of the file in terms of all of the actual characters that the file consists of.
In addition, functions such as seekg() and tellg() will not work as expected with a file opened in text mode, or at the very least, will give you "wrong results" (actually not wrong to the functions themselves, but wrong if you're writing a program that tries to "home in" on a position within the file). Again, the newline (and EOF) translation that is done under the hood by the runtime gets in the way of these functions working as you would expect them to.
On the other hand, a file opened in binary mode allows these functions to work as expected, with no translation of newline or EOF: whatever the individual bytes that make up the file contents are, that is what you get.
The next thing you need to determine is whether it is a Unix text file or a Windows text file. Depending on which one it is, the line endings will be different.

Windows uses "\r\n" to return to the beginning of the line ('\r') and begin a new one ('\n').
To remove them from your count you have to read the whole file and count the number of '\r's.

Windows stores newlines as two characters: '\r\n', known as carriage return and line feed. That's why it's counted twice: there are actually two characters to be counted.

I am assuming that you are running on Windows. If not, disregard my answer below.
Windows stores new line characters in text files as two characters (CR LF or '\r' '\n'). So, seeking to the end of the file and calling tellg() will return the binary size of the file (60), not the text size (49).
In order to get the correct text size (49), one solution would be to count each new line character (11) and subtract that number from the total byte size.

Ifstream read strange behavior

I'm using C++ to read some chars from a file and store them in a buffer, however, I'm witnessing strange behavior with ifstream's read function.
To start with, I'm using this code snippet to get the file's length:
input.seekg (0, input.end);
int length = input.tellg();
input.seekg (0, input.beg);
After that, I call read() to get length bytes from the file.
It works fine, except for one thing:
If I use input.gcount() to see how many bytes were read, this number is much less than the length of the file we got above (but it shows the actual number of bytes in the file).
Do you guys know anything about the difference between the file's length, found by using tellg(), and the number of bytes read afterwards, as reported by gcount()?
Sorry for any formatting issues (I'm using my phone).
Thanks a lot.
Edit :
That's the code (more or less) I'm using:
ifstream input("test.txt");
input.seekg (0, input.end);
int length = input.tellg();
input.seekg (0, input.beg);
input.read(buffer,length);
int extracted = input.gcount();

Fstream's tellg / seekg returning higher value than expected
Just found this link... It explains it nicely!
Turns out I need to search a little bit more before posting...
Thank you all for your answers

Output data not the same as input data

I'm doing some file io and created the test below, but I thought testoutput2.txt would be the same as testinputdata.txt after running it?
testinputdata.txt:
some plain
text
data with
a number
42.0
testoutput2.txt (in some editors it's on separate lines, but in others it's all on one line):
some plain
਍ऀ琀攀砀琀ഀഀ
data with
਍ 愀  渀甀洀戀攀爀ഀഀ
42.0
int main()
{
    //Read plain text data
    std::ifstream filein("testinputdata.txt");
    filein.seekg(0,std::ios::end);
    std::streampos length = filein.tellg();
    filein.seekg(0,std::ios::beg);
    std::vector<char> datain(length);
    filein.read(&datain[0], length);
    filein.close();
    //Write data
    std::ofstream fileoutBinary("testoutput.dat");
    fileoutBinary.write(&datain[0], datain.size());
    fileoutBinary.close();
    //Read file
    std::ifstream filein2("testoutput.dat");
    std::vector<char> datain2;
    filein2.seekg(0,std::ios::end);
    length = filein2.tellg();
    filein2.seekg(0,std::ios::beg);
    datain2.resize(length);
    filein2.read(&datain2[0], datain2.size());
    filein2.close();
    //Write data
    std::ofstream fileout("testoutput2.txt");
    fileout.write(&datain2[0], datain2.size());
    fileout.close();
}

It's working fine on my side. I ran your program on VC++ 6.0 and checked the output in Notepad and MS Word. Can you specify the name of the editor where you are seeing the problem?

You can't read Unicode text into a std::vector<char>. The char data type only works with narrow strings, and my guess is that the text file you're reading in (testinputdata.txt) is saved with either UTF-8 or UTF-16 encoding.
Try using the wchar_t type for your characters, instead. It is specifically designed to work with "wide" (or Unicode) characters.

Thou shalt verify thy input was successful! Although this would sort you out, you should also note that the number of bytes in the file has no direct relationship to the number of characters being read: there can be fewer characters than bytes (think of a Unicode character that needs multiple bytes when encoded as UTF-8) or vice versa (although the latter doesn't happen with any of the Unicode encodings). All you are experiencing is that read() couldn't read as many characters as you asked for, but write() happily wrote the junk you gave it.

Accessing information in a ".txt" file and going to a specific row

When accessing a text file, I want to read from a specific line. Let's suppose that my file has 1000 rows and I want to read row 330. Each row has a different number of characters and could possibly be quite long (let's say around 100,000,000 characters per row). I'm thinking fseek() can't be used effectively here.
I was thinking about a loop to track line breaks, but I don't know exactly how to implement it, and I don't know if that would be the best solution.
Can you offer any help?

Unless you have some kind of index saying "line M begins at position N" in the file, you have to read characters from the file and count newlines until you find the desired line.
You can easily read lines using std::getline if you want to save the contents of each line, or std::istream::ignore if you want to discard the contents of the lines you read until you find the desired line.

There is no way to know where row 330 starts in an arbitrary text file without scanning the whole file, finding the line breaks, and then counting.
If you only need to do this once, then scan. If you need to do it many times, then you can scan once, and build up a data structure listing where all of the lines start. Now you can figure out where to seek to to read just that line. If you're still just thinking about how to organize data, I would suggest using some other type of data structure for random access. I can't recommend which one without knowing the actual problem that you are trying to solve.

Create an index on the file. You can do this "lazily" but as you read a buffer full you may as well scan it for each character.
If it is a text file on Windows that uses a 2-byte '\n' then the number of characters you read to the point where the newline occurs will not be the offset. So what you should do is a "seek" after each call to getline().
something like:
std::vector< std::streampos > lineNumbers;
std::string line;
lineNumbers.push_back(0); // first line begins at 0
while( std::getline( ifs, line ) )
{
    lineNumbers.push_back(ifs.tellg());
}
The last value will tell you where EOF is, provided the file ends with a newline.

I think you need to scan the file and count the \n occurrences until you find the desired line. If this is a frequent operation, and you are the only one who writes the file, you could maintain an index file containing this information side by side with the one containing the data. It is a sort of "poor man's index", but it can save a lot of time.

Try running fgets in a loop
/* fgets example */
#include <stdio.h>
int main()
{
    FILE * pFile;
    char mystring [100];
    pFile = fopen ("myfile.txt" , "r");
    if (pFile == NULL) perror ("Error opening file");
    else {
        fgets (mystring , 100 , pFile);
        puts (mystring);
        fclose (pFile);
    }
    return 0;
}

how to create files named with current time?

I want to create a series of files under the "log" directory, with each file named after the time of execution. In each of these files, I want to store some log info for my program, such as the prototype of the function being run, etc.
Usually I just hard-code the name, as in fopen("log/***","a"), which doesn't work for this purpose. So I wrote a timestamp function:
char* timeStamp(char* txt){
    char* rc = txt; // returned unchanged if the time is unavailable
    char timestamp[16];
    time_t rawtime = time(0);
    if(rawtime != -1) {
        tm *now = localtime(&rawtime);
        strftime(timestamp,16,"%y%m%d_%H%M%S",now);
        rc = strcat(txt,timestamp);
    }
    return(rc);
}
But I don't know what to do next. Please help me with this!

Declare a char array big enough to hold 16 + "log/" (so 20 characters total) and initialize it to "log/", then use strcat() or something related to add the time string returned by your function to the end of your array. And there you go!
Note how the string addition works: Your char array is 16 characters, which means you can put in 15 characters plus a nul byte. It's important not to forget that. If you need a 16 character string, you need to declare it as char timestamp[17] instead. Note that "log/" is a 4 character string, so it takes up 5 characters (one for the nul byte at the end), but strcat() will overwrite starting at the nul byte at the end, so you'll end up with the right number. Don't count the nul terminator twice, but more importantly, don't forget about it. Debugging that is a much bigger problem.
EDIT: While we're at it, I misread your code. I thought it just returned a string with the time, but it appears that it adds the time to a string passed in. This is probably better than what I thought you were doing. However, if you wanted, you could just make the function do all the work - it puts "log/" in the string before it puts the timestamp. It's not that hard.

What about this:
#include <stdio.h>
#include <time.h>
#define LOGNAME_FORMAT "log/%Y%m%d_%H%M%S"
#define LOGNAME_SIZE 20
FILE *logfile(void)
{
    static char name[LOGNAME_SIZE];
    time_t now = time(0);
    strftime(name, sizeof(name), LOGNAME_FORMAT, localtime(&now));
    return fopen(name, "ab");
}
You'd use it like this:
FILE *file = logfile();
// do logging
fclose(file);
Keep in mind that localtime() is not thread-safe!

Steps to create (or write to) a sequential access file in C++:
1.Declare a stream variable name:
ofstream fout; //each file has its own stream buffer
ofstream is short for output file stream
fout is the stream variable name
(and may be any legal C++ variable name.)
Naming the stream variable "fout" is helpful in remembering
that the information is going "out" to the file.
2.Open the file:
fout.open("scores.dat", ios::out);
fout is the stream variable name previously declared
"scores.dat" is the name of the file
ios::out is the stream operation mode
(your compiler may not require that you specify
the stream operation mode.)
3.Write data to the file:
fout<<grade<<endl;
fout<<"Mr";
The data must be separated with space characters or end-of-line characters (carriage return), or the data will run together in the file and be unreadable. Try to save the data to the file in the same manner that you would display it on the screen.
If the <iomanip> header is included, you will be able to use familiar formatting manipulators with file output.
fout<<setprecision(2);
fout<<setw(10)<<3.14159;
4.Close the file:
fout.close( );
Closing the file writes any data remaining in the buffer to the file, releases the file from the program, and updates the file directory to reflect the file's new size. As soon as your program is finished accessing the file, the file should be closed. Most systems close any data files when a program terminates. Should data remain in the buffer when the program terminates, you may lose that data. Don't take the chance: close the file!

Sounds like you have mostly solved it already - to create a file like you describe:
char filename[256] = "log/";
timeStamp( filename );
FILE *f = fopen( filename, "a" );
Or do you wish to do something more?
