Reading a fixed number of chars with << on an istream - c++

I was trying out a few file reading strategies in C++ and I came across this.
ifstream ifsw1("c:\\trys\\str3.txt");
char ifsw1w[3];
do {
ifsw1 >> ifsw1w;
if (ifsw1.eof())
break;
cout << ifsw1w << flush << endl;
} while (1);
ifsw1.close();
The content of the file were
firstfirst firstsecond
secondfirst secondsecond
When I see the output it is printed as
firstfirst
firstsecond
secondfirst
I expected the output to be something like:
fir
stf
irs
tfi
.....
Moreover I see that "secondsecond" has not been printed. I guess that the last read has met the eof and the cout might not have been executed. But the first behavior is not understandable.

The extraction operator has no concept of the size of the ifsw1w variable, and (by default) is going to extract characters until it hits whitespace, null, or eof. These are likely being stored in the memory locations after your ifsw1w variable, which would cause bad bugs if you had additional variables defined.
To get the desired behavior, you should be able to use
ifsw1.width(3);
to limit the number of characters to extract.

It's virtually impossible to use std::istream& operator>>(std::istream&, char *) safely -- it's like gets in this regard -- there's no way for you to specify the buffer size. The stream just writes to your buffer, going off the end. (Your example above invokes undefined behavior). Either use the overloads accepting a std::string, or use std::getline(std::istream&, std::string).
Checking eof() is incorrect. You want fail() instead. You really don't care if the stream is at the end of the file, you care only if you have failed to extract information.
For something like this you're probably better off just reading the whole file into a string and using string operations from that point. You can do that using a stringstream:
#include <string> //For string
#include <sstream> //For stringstream
#include <iostream> //As before
std::ifstream myFile(...);
std::stringstream ss;
ss << myFile.rdbuf(); //Read the file into the stringstream.
std::string fileContents = ss.str(); //Now you have a string, no loops!

You're trashing the memory... its reading past the 3 chars you defined (its reading until a space or a new line is met...).
Read char by char to achieve the output you had mentioned.
Edit : Irritate is right, this works too (with some fixes and not getting the exact result, but that's the spirit):
char ifsw1w[4];
do{
ifsw1.width(4);
ifsw1 >> ifsw1w;
if(ifsw1.eof()) break;
cout << ifsw1w << flush << endl;
}while(1);
ifsw1.close();

The code has undefined behavior. When you do something like this:
char ifsw1w[3];
ifsw1 >> ifsw1w;
The operator>> receives a pointer to the buffer, but has no idea of the buffer's actual size. As such, it has no way to know that it should stop reading after two characters (and note that it should be 2, not 3 -- it needs space for a '\0' to terminate the string).
Bottom line: in your exploration of ways to read data, this code is probably best ignored. About all you can learn from code like this is a few things you should avoid. It's generally easier, however, to just follow a few rules of thumb than try to study all the problems that can arise.
Use std::string to read strings.
Only use fixed-size buffers for fixed-size data.
When you do use fixed buffers, pass their size to limit how much is read.
When you want to read all the data in a file, std::copy can avoid a lot of errors:
std::vector<std::string> strings;
std::copy(std::istream_iterator<std::string>(myFile),
std::istream_iterator<std::string>(),
std::back_inserter(strings));

To read the whitespace, you could used "noskipws", it will not skip whitespace.
ifsw1 >> noskipws >> ifsw1w;
But if you want to get only 3 characters, I suggest you to use the get method:
ifsw1.get(ifsw1w,3);

Related

Why is the last character of character array is getting excluded?

#include<iostream>
using namespace std;
int main()
{
int n;
cin>>n;
cin.ignore();
char arr[n+1];
cin.getline(arr,n);
cin.ignore();
cout<<arr;
return 0;
}
Input:
11
of the year
Output:
of the yea
I'm already providing n+1 for the null character. Then why is the last character getting excluded?
You allocated n+1 characters for your array, but then you told getline that there were only n characters available. It should be like this:
int n;
cin>>n;
cin.ignore();
char arr[n+1];
cin.getline(arr,n+1); // change here
cin.ignore();
cout<<arr;
Per cppreference.com:
https://en.cppreference.com/w/cpp/io/basic_istream/getline
Behaves as UnformattedInputFunction. After constructing and checking the sentry object, extracts characters from *this and stores them in successive locations of the array whose first element is pointed to by s, until any of the following occurs (tested in the order shown):
end of file condition occurs in the input sequence (in which case setstate(eofbit) is executed)
the next available character c is the delimiter, as determined by Traits::eq(c, delim). The delimiter is extracted (unlike basic_istream::get()) and counted towards gcount(), but is not stored.
count-1 characters have been extracted (in which case setstate(failbit) is executed).
If the function extracts no characters (e.g. if count < 1), setstate(failbit)is executed.
In any case, if count > 0, it then stores a null character CharT() into the next successive location of the array and updates gcount().
In your case, n=11. You are allocating n+1 (12) chars, but telling getline() that only n (11) chars are available, so it reads only n-1 (10) chars into the array and then terminates the array with '\0' in the 11th char. That is why you are missing the last character.
of the year
^
10th char, stops here
You need to +1 when calling getline(), to match your actual array size:
cin.getline(arr,n+1);
john's answer should fix your issue. Variable-length arrays (your char arr[n+1]) are not part of the C++ standard, for justified reasons. Yet I've taken a few hours of my time to go way out of question's scope and create the...
Student's guide to C++ I/O
...and I/O in general, with an emphasis on the I part. Fear not, do it the C++ way! The following snippets should be compiled with a standard-conforming C++ compiler.
C++ I/O & standard library
Textual input
This is the recommended way of reading UTF-8 encoded strings in C++, the most widespread text encoding. We will use std::string for storage, which is the de-facto way for holding UTF-8 encoded strings, and std::getline for the reading itself.
#include <iostream> // std::cin, std::cout, std::ws
#include <string> // std::string, std::getline
int main() {
int size;
// std::ws ignores all whitespace in the stream,
// until the first non-whitespace character.
// it's prettier and handles cases a simple .ignore() does not.
std::cin >> size >> std::ws;
std::string input;
std::getline(std::cin, input);
// This condition will most certainly be true (output will be 1).
std::cout << (size == input.size()) << '\n';
}
std::string is dynamically allocated, or, as you may hear, on the heap. This is a broad subject, so feel free to venture on your own, from this given starting point! How does this help us? We can store strings of sizes unknown ahead of time on the heap, because we can always reallocate a bigger buffer! std::getline allocates and reallocates as it reads the input until newline is reached, so you can read without knowing a size beforehand. Your size variable will most probably be equal with the size of the string under the assumption that this is a school exercise where the input length is provided as you're probably not taught of dynamic memory. For good reason, though - it's complex and would needlessly distract from the actual subject (algorithms, data structures etc.). Good to keep in mind: std::strings, unlike C-style strings, are not null terminated, but you can get a null-terminated C-style string from an std::string by calling the .c_str() method.
Binary data
What's binary data? Everything that's not text: images, videos, music, 2003 MS Word documents (the .doc ones, wait 'til you see what .docx is) and many others. It's customary to store binary as raw bytes, which is a fancy way to say numbers. unsigned char is the C/C++ type used to represent these raw bytes (C++17 introduces std::byte for this purpose. To work with data from binary input we need to store it somewhere in memory - either on the stack, or on the heap. We could store the whole input at once, but binary files are considered too large for this (and, really, are - think about the size of a movie!), so we usually read it in chunks - that means, we read only a finite part of it at a time (say 256 characters, that's our buffer), and we keep reading until we reach the end of the input (usually called end-of-file or, short, EOF). As a rule of thumb, when a buffer is small and static (doesn't need to be resized, as our string above), we can store it on the stack. If any of those conditions is not met, it goes on the heap. We should note that the notions of small and large are quite context dependent - compiler, OS, hardware, runtime environment (see this thread on stack size limits and embedded systems). The buffer size you'll choose is also task-specific, so there's no rule here, too. Let's see some code now!
#include <array> // std::array
#include <fstream> // std::ifstream, std::ofstream
int main() {
// We open this file in binary mode.
// The default mode may modify the input.
std::ifstream input{"some_image.jpg", std::ios::binary};
// 256 is our buffer size, unsigned char is the array type.
// This is the C++ way of `unsigned char buffer[256]`.
std::array<unsigned char, 256> buffer;
while (input.read(buffer.data(), buffer.size())) {
// Buffer is filled, do something with it
}
// At this point, either EOF is reached or an error occurred.
if (input.eof()) {
// Less characters than the buffer's size have been read.
// .gcount() returns the number of characters read by
// the last operation.
const std::streamsize chunk_size = input.gcount();
// Do something with these characters, as in the loop.
// Valid range to access in the buffer is [0, chunk_size).
// chunk_size can be 0, too. In that case, there is no more data
// to handle.
} else {
// Some other failure, handle error.
}
}
This snippet is reading through a file using a small, stack-allocated buffer of 256 bytes. std::array makes usage convenient and safe with its methods - read the linked docs! If we want to use a large buffer (say, 16MB), we replace the std::array with an std::vector:
std::vector buffer(1 << 24); // 1 << 24 gives 16MB in bytes
Rest is the same. You could also use std::string here, too, as std::string does not imply/force UTF-8 encoding of input. It's useful to have a convention that easily differentiates between binary and text data, in code.
Something to note is that reading in smaller chunks uses less space, but takes more time - taking bytes from a file involves making OS system calls and moving disks or electrons, when reading from a hard drive or an SSD, respectively. C++'s fstream objects already do buffering for you to speed up reads, which is usually a much-needed optimization. You'll know if this affects you.
Another thing to note is the EOF and error handling, using the .eof() method. We have omitted error handling in the textual input retrieval, but here we are forced into doing it, if we don't want to lose data. When EOF is reached, usually less bytes than the buffer size have been read, so we need a way to know how much of the buffer was filled with data. This is what .gcount() tells us. Depending on the program you're making, you may deem the EOF error as "unexpected" if the buffer is partially filled (.gcount() returns a non-0 value) - for example, the data read is incomplete, according to the rules it was assumed to be created after, or in other words, the end of file was reached before the data was supposed to end. Other than that, EOF is a condition that all files are in after being fully read.
C-style I/O
This may look closer to what's taught in school. As we've explained the general concepts above, this section will be richer in coding and explanations of code. We still use C++ as a language, so the C++ version of the C headers and the std namespace will be used - to have the code that follows work in a C compiler, replace the <csomething> headers with <something.h> and remove the std:: namespace prefix from types and functions. Let's dive into it!
Textual input
The equivalent of a C++ stream (std::cin, std::fstream etc.) in C is the std::FILE. FILEs are buffered by default, as are C++ streams. We'll use std::fscanf for reading the size of the input, which is just scanf but it takes as parameter the stream you read from, and std::fgets for reading the text line.
#include <cstdio> // std::FILE, std::fscanf, std::fgets, stdin
#include <cstring> // std::strcspn
// discard_whitespace does what std::ws did above.
// It consumes all whitespace before a non-whitespace
// character from stream f.
void discard_whitespace(std::FILE* f) {
char discard;
// The leading space in the format string
// tells fscanf to consume all whitespace.
std::fscanf(f, " %c", &discard);
}
int main() {
int size;
// stdin is a macro, doesn't have a namespace,
// hence no std:: prefix.
std::fscanf(stdin, "%d", &size);
// fscanf, like std::cin, doesn't consume whitespace
discard_whitespace(stdin);
// Your school exercise will probably have a size limit for the input.
// We consider it to be 256.
const int SIZE_UPPER_BOUND = 256;
// We add some extra bytes so the maximum length input can be accomodated.
// 1 is added for the null terminator of C-style strings.
// The other 2 is because `fgets` will also read the newline,
// which can be \n or \r\n, depending on OS. See explanation after code.
char input[SIZE_UPPER_BOUND + 3];
// The actual read - sizeof gets the size of our input buffer,
// we don't have to write it twice.
std::fgets(input, sizeof input, stdin);
// fgets also reads the newline, unlike `std::getline` or
// `std::cin.getline` - we have to remove it ourselves.
input[std::strcspn(input, "\r\n")] = '\0';
// This condition will be true, as in the C++ example.
std::fprintf(stdin, "%d\n", std::strlen(input) == size);
}
Let's unpack that newline removal. std::strcspn finds the first position of any of the given characters in the input. We provide both \r and \n, to support UNIX (\n) and Windows (\r\n) newline terminators - yeah, they're different, see Wikipedia, on "Newline". By adding the null terminator, '\0', we move the ending of the string where the newline was, basically "removing" the newline. If this is a school assignment, we can assume input is correct, so we could have used size + 1 instead of std::strcspn to remove the newline:
input[size + 1] = '\0';
This doesn't work when we don't know the input size or the input may be invalid.
As an optimization trick, observe that std::strcspn returns the line length, in this case. When you don't know the size, but you need it for later, you can save the result of std::strcspn in a variable before, and then use it instead of std::strlen:
// std::size_t is an unsigned integral type, used to represent
// array sizes and indexes in C/C++
const std::size_t input_size = std::strcspn(input, "\r\n");
input[input_size] = '\0';
You'll see some people use 0 or NULL for the terminator. I recommend against it - unlike the \0 literal, that is of char type, the other two variants are implicitly casted to char. If you read the linked documentation, you'll realize NULL is even incorrect, according to the spec, as it's meant to be used in contexts that require pointers only.
An alternative method to fgets is fscanf, again. Thread carefully, though - while a simple %s may do it, it makes your code vulnerable to buffer overflow exploits. See this StackOverflow thread on disadvantages of scanf, too. Let's see the (safe) code:
std::fscanf(stdin, "%256[^\r\n]s", input);
That number limits the input size to our SIZE_UPPER_BOUND, and the [^\r\n] tells fscanf to read all characters up to \r or \n. With this method you can remove the discard_whitespace call, as fscanf with the %s verb consumes leading whitespace. A downside to fscanf is that you have to keep the size limit in the input string and the buffer size in sync - you have no way to specify the input size dynamically other than building the format string dynamically, which is overkill for a school assignment). This is a problem in more sizable codebases, but for a one-file, one-time school assignment it's not a big deal, so you may prefer fscanf over fgets, as it's less work. fscanf doesn't read the newline in the buffer, too.
Binary data
The equivalent of C++'s std::cin.read in C world is std::fread. Code will resemble its C++ counterpart:
#include <cstdio>
int main() {
// The second parameter is the file access mode.
// In this case, it is read (r) binary (b).
std::FILE* f = std::fopen("some_image.jpg", "rb");
unsigned char buffer[256];
std::size_t chunk_size;
while (chunk_size = std::fread(buffer, sizeof buffer[0], sizeof buffer, f)) {
// chunk_size == sizeof buffer, do something with the buffer
}
if (std::feof(f)) {
// chunk_size != sizeof buffer, do something with buffer
// or handle as error
} else {
// an error occurred, handle it
}
// We need to close the file, unlike in C++, where it is closed automatically.
std::fclose(f);
}
The arguments to std::fread are hairy: read the documentation. Everything else looks very similar to the C++ way, from the loop to the error handling. Why? Because it's literally the same thing - we're just using different (standard) libraries. Another similarity is that C I/O is also buffered by default, just like C++'s. What's different is the line at the end - the call to std::fclose. We're not doing anything similar in the C++ code, right? No. Remember that C++ classes have constructors and destructors, functions that are automatically called at the beginning, respectively at the end of a variable's lifetime. These two allow us to implement the RAII technique, which will do the resource management automatically (opening the file in the constructor, closing it in the destructor). RAII is used inside std::string and std::vector (and other containers, smart pointers & others). In other words, the destructor of std::ifstream closes the file at the end of main(), just as we are doing here, manually.
Hybrid approach (??)
Would you ever want to combine the two? So it seems. Let's talk drawbacks:
The C++ I/O library, due to the way it's built, takes more care to use in a performant manner compared to C's (virtual function calls and extra function calls in general, especially when using << and >> operators & stream manipulators, as each of these is a function call, compared to a single plain function call/operation with the C library). See this StackOverflow thread on i/ostream speed, too. The C++ library is also more verbose, especially in the case of outputting (ever heard of the "chevrone hell"?)
The C I/O library is easy to use improperly/unsafely, the terse, shorthand namings make code difficult to follow, and output cannot be extended to support custom types (this is a problem when using C-style I/O in C++). It also takes great care to handle dynamic buffers correctly, given that the only way of managing heap memory in C is malloc and free.
Some schools may crucify you if any trace of std::string is left in sight (or so I've heard)
Using C-style types (char[N] instead of std::array<char, N>, for example), is easier - no headers to include, as the types are builtin primitives and less to type. May be preferred in short, throwaway programs like algorithmic exercises at school.
With these in mind, we can take a look at how to conveniently combine the two when reading text and binary!
Textual input
We will take advantage of the terseness of C-style types and the ease of use of C++'s I/O library:
#include <iostream>
int main() {
int size;
std::cin >> size >> std::ws;
const int SIZE_UPPER_BOUND = 256;
char input[SIZE_UPPER_BOUND + 1];
std::cin.getline(input, sizeof input);
// Input done, solve the problem.
}
Teachers don't have to scratch their head at the presence of std::string and std::getline and all the standard library shenanigans you start using after diving in this rabbit hole. You, the programmer, don't have to clean up newline endings and memorize arcane format specifiers just to read a string and an int. Focus on code and solve problems without having to debug the input reading logic, ever - it just works!
Binary data
The convoluted hierarchical tree of C++'s I/O library types scares you, the clean assembly output enjoyer, just like Linus Torvalds. You're still somehow afraid to manually manage memory, so you choose this solution:
#include <cstdio>
#include <vector>
int main() {
// The second parameter is the file access mode.
// In this case, it is read (r) binary (b).
std::FILE* f = std::fopen("some_image.jpg", "rb");
std::vector<unsigned char> buffer(1 << 24);
std::size_t chunk_size;
while (chunk_size = std::fread(buffer.data(), sizeof buffer[0], buffer.size(), f)) {
// use the buffer
}
if (std::feof(f)) {
// handle EOF
} else {
// handle error
}
std::fclose(f);
}
Weird choice, given that you still manage the file's lifetime manually. While this may not be the best example, using C++ RAII containers together with C libraries is not uncommon - memory safety is crucial.
Trivia
as usual, weigh your decision of using namespace std;
Cool things you won't need:
speed up C++ I/O using a single line at the beginning of the program (but be careful)
disable C I/O buffering
disable C++ I/O buffering
Conclusion
I/O is the crowded junction of fundamental CS concepts, hardware & software inner workings and C++'s features and quirks. Take in what you can at a time & focus on what matters, and make sure you're building on sturdy fundamentals.

Puzzling behavior of istream::getline()

I tested following codes to clarify my understanding to istream::getline():
#include <iostream>
#include <sstream>
using namespace std;
int main()
{
string s("abcd efgh\nijklmnopqrst");
string s1;
stringstream ss(s);
ss >> s1;
cout << s1 << endl;
ss.getline(&s1[0], 250, '\n');
cout << s1 << endl;
ss >> s1;
cout << s1 << endl;
getchar();
return 1;
}
then the console printed:
abcd
efg
ijklmnopqrst
but in my opinon it should be
abcd
efgh
ijklmnopqrst
Besides, I found the size of s1 after calling ss.getline() was the same as that after calling ss>>, but the size will be changed after calling ss>> once more. Can anyone help me parsing?
ss.getline(&s1[0], 250, '\n');
The first parameter to this getline() call is a char *. ss knows absolutely nothing about the fact that this char buffer actually comes from a std::string, and its actually its internal buffer.
Complicating this entire affair is the fact that this std::string is under the impression that it contains four characters. Because that's all it has, at this point.
And there is absolutely nothing, whatsoever, that could possibly lead this std::string to change its mind. Just because a pointer to its internal character buffer was passed to getline(), which proceeded to rather rudely scribble all over it (resulting in undefined behavior, as I'll extrapolate in a moment), the std::string still believes that it contains four characters only.
Meanwhile, the initial formatted input operator, >> extracted the initial character, but did not extract the following space, so when this stream, subsequently, had this getline() call, it started off its job of extracting characters starting with this space character, and up until the next newline character -- five characters (if I count on my fingers), but dumping it into a buffer that's guaranteed, by the std::string, to only be long enough to hold four characters (because, keep in mind, the initial formatted extract operator, >>, only dumped four characters inside it).
I'm ignoring some details, such as the fact that std::string takes care of automatically tacking on a trailing '\0', but the bottom line is that this is undefined behavior. The getline call extracts more characters that the buffer it's given is guaranteed to hold. Undefined behavior. A whole big heap of undefined behavior. It's not just the four characters in your second line of output is not the four characters you were expected to see, it's just that the getline() actually ended up extracting more characters, but the std::string that's being printed here has every right under the constitution to believe that it still has only four characters, and it's just that it's internal buffer got stomped all over.
Two things.
First, >> does not consume whitespace, so getline will retrieve it.
Secondly, this line is not correct:
ss.getline(&s1[0], 250, '\n');
Since getline expects a std::basic_string, just pass in the string:
ss.getline(s1, 250, '\n');
In your code, &s1[0] gets access to the underlying buffer, which is written to, but the string's length is stored separately, and is still what is was from the previous read (which is why the h gets dropped). Though, at this point you've already invoked undefined behaviour due to a buffer overflow.

Is calling `stringstream::str()` for getting what's printed actually legal?

Pre-history: I'm trying to ensure that some function foo(std::stringstream&) consumes all data from the stream.
Answers to a previous question suggest that using stringstream::str() is the right way of getting content of a stringstream. I've also seen it being used to convert arbitrary type to string like this:
std::stringstream sstr;
sstr << 10;
assert(sstr.str() == std::string("10")); // Conversion to std::string for clarity.
However, the notion of "content" is somewhat vague. For example, consider the following snippet:
#include <assert.h>
#include <sstream>
#include <iostream>
int main() {
std::stringstream s;
s << "10 20";
int x;
s >> x;
std::cout << s.str() << "\n";
return 0;
}
On Ideone (as well as on my system) this snippet prints 10 20, meaning that reading from stringstream does not modify what str() returns. So, my assumption is that that str() returns some internal buffer and it's up to stringstream (or, probably, its internal rdbuf, which is stringbuf by default) to handle "current position in that buffer". It's a known thing.
Looking at stringbuf::overflow() function (which re-allocates the buffer if there is not enough space), I can see that:
this may modify the pointers to both the input and output controlled sequences (up to all six of eback, gptr, egptr, pbase, pptr, epptr).
So, basically, there is no theoretical guarantee that writing to stringstream won't allocate a bigger buffer. Therefore, even using stringstream::str() for converting int to string is flawed: assert(sstr.str() == std::string("10")) from my first snippet can fail, because internal buffer is not guaranteed to be precisely of the necessary size.
Question is: what is the correct way of getting the "content" of stringstream, where "content" is defined as "all characters which could be consumed from the steream"?
Of course, one can read char-by-char, but I hope for a less verbose solution. I'm interested in the case where nothing is read from stringstream (my first snippet) as I never saw it fail.
You can use the tellg() function (inherited from std::basic_istream) to find the current input position. If it returns -1, there are no further characters to be consumed. Otherwise you can use s.str().substr(s.tellg()) to return the unconsumed characters in stringstream s.

How do the different ways to read strings from console actually differ? Operator <<, getline or cin.getline?

Let's suppose I'd like to read an integer from the console, and I would not like the program to break if it is fed non-integer characters. This is how I would do this:
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main() {
string input; int n;
cin >> input;
if(!(stringstream(input)>>n)) cout << "Bad input!\n";
else cout << n;
return 0;
}
However, I see that http://www.cplusplus.com/doc/tutorial/basic_io/ uses getline(cin,input) rather than cin >> input. Are there any relevant differences between the two methods?
Also I wonder, since string is supposed not to have any length limits... What would happen if someone passed a 10GB long string to this program? Wouldn't it be safer to store the input in a limited-length char table and use, for example, cin.getline(input,256)?
std::getline gets a line (including spaces) and also reads (but discards) the ending newline. The input operator >> reads a whitespace-delimited "word".
For example, if your input is
123 456 789
Using std::getline will give you the string "123 456 789", but using the input operator >> you will get only "123".
And theoretically there's no limit to std::string, but in reality it's of course limited to the amount of memory it can allocate.
the first gets a line,
the second gets a world.if your input "hello world"
getline(cin,str) : str=="hello world"
cin>>str: str="hello"
and dont worry about out of range, like vector ,string can grow
operator>> reads a word (i.e. up to next whitespace-character)
std::getline() reads a "line" (by default up to next newline, but you can configure the delimiter) and stores the result in a std::string
istream::getline() reads a "line" in a fashion similar to std::getline(), but takes a char-array as its target. (This is the one you'll find as cin.getline())
If you get a 10 GB line passed to your program, then you'll either run out of memory on a 32-bit system, or take a while to parse it, potentially swapping a fair bit of memory to disk on a 64-bit system.
Whether arbitrary line-length size limitations make sense, really boils down to what kind of data your program expects to handle, as well as what kind of error you want to produce when you get too much data. (Presumably it is not acceptable behaviour to simply ignore the rest of the data, so you'll want to either read and perform work on it in parts, or error out)

How to read/write from a data file in C++

I'm having a problem reading from a binary file (*.dat) using the .read(reinterpret_cast (&x),sizeof(x)) command but there is always an error about the existence of the file even when the file exist or has been created successfully. Here is the code:
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
struct x{
char name[10],pass[10];
};
int main()
{
x x1,x2;
fstream inout;
inout.open("test.dat" ,ios::binary);
if(!inout)
{
cout<<"Error";
exit(1);
}
cout<<"Enter your name:";
cin>>x1.name;
inout.write(reinterpret_cast <const char*> (&x1.name), sizeof(x1));
cout<<"Enter your name:";
cin>>x1.pass;
inout.write(reinterpret_cast <const char*> (&x1.pass), sizeof(x1));
while(inout.read(reinterpret_cast <char*> (&x2.name), sizeof(x1)))
{
cout<<x2.name;//here is my problem cannot read!!
}
inout.close();
}
Use std:flush after your write operations.
// ... Write x1.name and x1.pass
inout << std::flush;
// ... Read x2.name in while loop.
inout.close();
There is a problem with your output to the file.
First you are writing the struct x1 to the file where only the name field is filled
inout.write(reinterpret_cast <const char*> (&x1.name), sizeof(x1));
and afterwards:
inout.write(reinterpret_cast <const char*> (&x1.pass), sizeof(x1));
You start writing from the address of x1.pass but you are writing sizeof(x1) bytes.
sizeof(x1) is 20 here but its only 10 bytes from the start of x1.pass to the end of the struct, so you are writing 10 bytes of unknown data from the stack into your file.
So this is the first thing that your file may not contain what you expect it to contain.
The next thing is that after writing your data the stream is sitting at the end of the file and you try to read from there. You have to move the position back to the beginning of the stream to read the stuff you just wrote. For example use:
inout.seekg(std::ios::beg);
If you mess with read and write to the same stream, you'd rather use flush or file positioning functions.
MSDN says:
When a basic_fstream object is used to perform file I/O, although the underlying buffer contains separately designated positions for reading and writing, the current input and current output positions are tied together, and therefore, reading some data moves the output position.
GNU Stdlib:
As you can see, ‘+’ requests a stream that can do both input and output. When using such a stream, you must call fflush (see Stream Buffering) or a file positioning function such as fseek (see File Positioning) when switching from reading to writing or vice versa. Otherwise, internal buffers might not be emptied properly.
Reading into raw C-style arrays from an input stream is not as idiomatic as a simple call to operator>>(). You also have to prevent buffer overruns by keeping track of the both the bytes allocated for the buffer, and the bytes being read into the buffer.
Reading into the buffer can be done by using the input stream method getline(). The following example shows the extraction into x1.name; the same would be done for x1.path:
if (std::cin.getline(x1.name, sizeof(x1.name))) {
}
The second argument is the maximum number of bytes to be read. It is useful in that the stream won't write pass the allocated bounds of the array. The next thing to do is just write it to the file as you have done:
if (std::cin.getline(x1.name, sizeof(x1.name))) {
inout.write(reinterpret_cast<char*>(&x1.name), std::cin.gcount());
}
std::cin.gcount() is the number of characters that were read from the input stream. It is a much more reliable alternative to sizeof(x1.name) in that it returns the number of characters written, not the characters allotted.
Now, bidirectional file streams are a bit tricky. They have be coordinated in the right way. As explained in the other answers, bidirectional file streams (or std::fstreams) share a joint buffer for both input and output. The position indicators that mark positions in the input and output sequence are both affected by any input and output operations that may occur. As such, the file stream position has to be "moved" back before performing input. This can be done by either a call to seekg() or seekp(). Either will suffice since, as I said, the position indicators are bound to each other:
if (std::cin.getline(x1.pass, sizeof(x1.pass))) {
inout.write(reinterpret_cast<char*>(&x1.pass), std::cin.gcount());
inout.seekg(0, std::ios_base::beg);
}
Notice how this was done after the extraction into x1.pass. We can't do it after x1.name because we would be overwriting the stream on the second call to write().
As you can see, extracting into raw C-style arrays isn't pretty, you have to manage more things than you should. Fortunately, C++ comes to the rescue with their standard string class std::string. Use this for more efficient I/O:
Make both name and pass standard C++ strings (std::string) instead of raw C-arrays. This allows you pass in the size as the second argument to your read() and write() calls:
#include <string>
struct x {
std::string name;
std::string pass;
};
// ...
if (std::cin >> x1.name) {
inout.write(x1.name.data(), x1.name.size());
}
if (std::cin >> x1.pass) {
inout.write(x1.name.data(), x1.name.size());
inout.seekg(0, std::ios_base::beg);
}
std::string allows us to leverage its dynamic nature and its capacity for maintaining the size of the buffer. We no longer have to use getline() but now a simple call to operator>>() and an if() check.
This was not possible before, but now that we're using std::string we can also combine both extractions to achieve the following:
if (std::cout << "Enter your name: " && std::cin >> x1.name &&
std::cout << "Enter your pass: " && std::cin >> x1.pass) {
inout.write(x1.name.data(), x1.name.size());
inout.write(x1.pass.data(), x1.pass.size());
inout.seekg(0, std::ios_base::beg);
}
And finally, the last extraction would simply be this:
while (inout >> x2.name)
{
std::cout << x2.name;
}