The Rogue Wave Standard C++ Library Iostreams and Locale User’s Guide says the following:
As with file streams, there are three class templates that implement
string streams: basic_istringstream <charT,traits,Allocator>,
basic_ostringstream <charT,traits,Allocator>, and basic_stringstream
<charT,traits,Allocator> … For convenience, there are the regular
typedefs istringstream, ostringstream, and stringstream…
and
Output string streams are dynamic. The internal buffer is allocated
once an output string stream is constructed. The buffer is
automatically extended during insertion each time the internal buffer
is full.
Input string streams are always static. You can extract as many items
as are available in the string you provided the string stream.
Now, if output string streams are dynamic and input string streams are always static, how are the two reconciled in the (input/output) stringstream?
A comment by James Kanze on "How to copy a .txt file to a char array in c++" makes it sound as if, to be sure that a std::string gets the exact binary contents of a file when the file is iterated through by the std::string constructor, one would have to both:
open the file in binary mode,
ensure that the file is imbued with the "C" locale.
In code, I'm guessing that means:
std::ifstream in(filename, std::ios_base::binary);
in.imbue(std::locale("C"));
Is that really necessary? More specifically, why would the locale have any impact when the file is opened in binary mode?
Note that what I am trying to do is more or less what the above-mentioned question was about:
std::string contents((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
Based on the reference description of binary and text modes:
A binary stream is an ordered sequence of characters that can transparently record internal data. Data read in from a binary stream always equals to the data that were earlier written out to that stream. Implementations are only allowed to append a number of null characters to the end of the stream.
I think
std::ifstream in(filename, std::ios_base::binary);
together with:
in.imbue(std::locale("C"));
does not make sense.
Either the stream is in binary mode, and the locale does not apply, or the programmer chooses to set the locale, but then he/she implicitly means that the stream is opened in text mode (std::ios_base::binary should not be passed to the stream constructor). In that case, the data read may or may not equal the data in the file, depending on the OS and the contents of the file.
I'd like to have some explanation regarding this paragraph on page 518 of Nicolai Josuttis's book "The C++ Standard Library" (first edition):
These flags are maintained by the class basic_ios and are thus present in all objects of type basic_istream or basic_ostream. However, the stream buffers don't have state flags. One stream buffer can be shared by multiple stream objects, so the flags only represent the state of the stream as found in the last operation. Even this is only the case if goodbit was set prior to this operation. Otherwise the flags may have been set by some earlier operation.
I don't understand what he means by "the stream buffers don't have state flags", especially since right below this paragraph there's a table titled "Member functions for stream states".
Streams consist of two objects:
The actual stream object (std::istream or std::ostream, derived from std::ios).
The stream buffer, i.e., a class derived from std::streambuf.
The state flags are present in std::ios but not in std::streambuf.
There are "stream buffer objects" and "stream objects". One stream buffer can be shared between multiple stream objects. Each stream object has its own set of flags, so one stream may have "reached end of file" while another has not, and the flags for output in decimal or hex may be completely different for two output streams using the same buffer.
[Of course, if you are using the same buffer for multiple streams, you have to take care not to mess things up; sharing a buffer between multiple streams is not a common thing, but it can be done!]
The stream object holds both the iostate flags (goodbit, eofbit, failbit, badbit) and the format flags, which store things about output formatting: whether you want numbers printed in decimal or hex, capitals or lowercase, etc. Stream objects control formatting, so these flags are inside the stream object.
In iostreams, buffering is separate from formatting. Linked to the iostream object is a stream buffer object, which controls sending and/or receiving characters from an underlying source. The buffer object has no such flags; its only state variables deal with preparing (encoding) the characters and optionally storing (buffering) them to reduce the number of times the operating system is asked to perform I/O. (Or in the case of stringstream, the buffer provides the ultimate storage behind the stream.)
So a stream has state flags, but the stream buffer it uses doesn't.
A stream buffer goes inside a stream.
The buffer holds some number of bytes that the stream is reading or writing, before they are sent to, or after they are received from, whatever the stream is talking to (file/stdin/tcpsocket/etc.).
Stream reference: http://www.cplusplus.com/reference/istream/iostream/
Stream Buffer Reference: http://www.cplusplus.com/reference/streambuf/streambuf/
By default, a stream will usually create its own stream buffer, but you can tell it to use one of your choosing in the constructor: http://www.cplusplus.com/reference/istream/iostream/iostream/
Or you can get/set the buffer with the rdbuf method.
The "stream buffer" is an object of class basic_streambuf. That class doesn't have state flags. Every stream (basic_istream or basic_ostream) has a pointer to a basic_streambuf, but the flags are a property of the stream, not of the stream buffer.
I have been hearing about streams, more specifically file streams.
So what are they?
Is it something that has a location in memory?
Is it something that contains data?
Is it just a connection between a file and an object?
The term stream is an abstraction of a construct that allows you to send or receive an unknown number of bytes. The metaphor is a stream of water. You take the data as it comes, or send it as needed. Contrast this to an array, for example, which has a fixed, known length.
Examples where streams are used include reading from and writing to files, and receiving or sending data across an external connection. However, the term stream is generic and says nothing about the specific implementation.
IOStreams are a front-end interface (std::istream, std::ostream) used to define input and output functions. The streams also store formatting options, e.g., the base to use for integer output, and hold a std::locale object for all kinds of customization. Their most important component is a pointer to a std::streambuf, which defines how to access a sequence of characters, e.g., a file, a string, an area on the screen, etc. For files and strings specifically, special stream buffers are provided (std::filebuf, std::stringbuf), along with classes derived from the stream base classes (std::ifstream, std::istringstream, etc.) for easier creation. Describing the entire facilities of the IOStreams library can pretty much fill an entire book: in C++ 2003, about half of the library section was devoted to stream-related functionality.
A stream is a linear queue that connects a file to the program and maintains the flow of data in both directions. Here the source can be any file or I/O device (hard disk, CD/DVD, etc.).
Basically, there are two types of stream:
1. Text streams
2. Binary streams
Text stream: a sequence of characters arranged in lines, with each line terminated by a newline (on Unix).
Binary stream: the data exactly as it is encoded internally in the computer's main memory, without any modification.
The file system is designed to work with a wide variety of devices, including terminals, disk drives, tape drives, etc. Even though each device is different, the file system transforms each into a logical device called a stream. Streams are device independent, so the same function can be used to write to a disk file and to a tape file. In more technical terms, a stream provides an abstraction between the programmer and the actual device being used.
I have been reading a few articles on various sites about formatted and unformatted I/O; however, I'm more confused now.
I know this is a very basic question, but could anyone give a link (to some site or a previously answered question on Stack Overflow) that explains the idea of streams in C and C++?
I would also like to know about formatted and unformatted I/O.
The standard doesn't define what these terms mean, it just says which of the functions defined in the standard are formatted IO and which are not. It places some requirements on the implementation of these functions.
Formatted IO is simply the IO done using the << and >> operators. They are meant to be used with text representation of the data, they involve some parsing, analyzing and conversion of the data being read or written. Formatted input skips whitespace:
Each formatted input function begins execution by constructing an object of class sentry with the noskipws (second) argument false.
Unformatted IO reads and writes the data just as a sequence of 'characters' (possibly applying the codecvt facet of the imbued locale). It's meant for reading and writing binary data, or to serve as a lower-level layer used by the formatted IO implementation. Unformatted input doesn't skip whitespace:
Each unformatted input function begins execution by constructing an object of class sentry with the default argument noskipws (second) argument true.
Unformatted input also lets you retrieve the number of characters read by the last unformatted input operation using gcount():
Returns: The number of characters extracted by the last unformatted input member function called for the object.
Formatted IO means that your output is determined by a "format string": you provide a string with certain placeholders, and you additionally give arguments that should be used to fill these placeholders:
const char *daughter_name = "Lisa";
int daughter_age = 5;
printf("My daughter %s is %d years old\n", daughter_name, daughter_age);
The placeholders in the example are %s, indicating that it shall be substituted with a string, and %d, indicating that it is to be replaced by a signed integer. There are many more options that give you control over how the final string will look. It's a convenience for you as the programmer, because it relieves you of the burden of converting the different data types into a string, and it also relieves you of string-appending operations via strcat or anything similar.
Unformatted IO, on the other hand, means you simply write character or byte sequences to a stream, without using any format string.
Which brings us to your question about streams. The general concept behind "streaming" is that you don't have to load a file (or whatever the input is) as a whole. For small data, loading everything at once does work, but imagine you need to process terabytes of data: there is no way that will fit into a single byte array without your machine running out of memory. That's why streaming lets you process data in smaller chunks, one after the other, so that at any given time you only have to deal with a fixed amount of data. You read the data into a helper variable over and over again and process it, until the underlying stream tells you that you are done and there is no more data left.
The same works on the output side: you write your output step by step, chunk by chunk, rather than writing the whole thing at once.
This concept brings other nice features, too. Because you can nest streams within streams within streams, you can build a whole chain of transformations. Each stream may modify the data until you finally receive the end result, without knowing about the individual transformations, because you treat your stream as if there were just one.
This can be very useful. For example, C and C++ streams buffer the data they read from, e.g., a file, to avoid unnecessary system calls and to read the data in optimized chunks, so the overall performance is much better than if you read directly from the file system.
Unformatted input/output is the most basic form of input/output. Unformatted input/output transfers the internal binary representation of the data directly between memory and the file. Formatted output converts the internal binary representation of the data to ASCII characters, which are written to the output file. Formatted input reads characters from the input file and converts them to internal form.
I have a strange problem,
I use
wifstream a("a.txt");
wstring line;
while (a.good()) //!a.eof() not helping
{
getline (a,line);
//...
wcout<<line<<endl;
}
and it works nicely for a .txt file like this one:
http://www.speedyshare.com/files/29833132/a.txt
(sorry for the link, but it is just 80 bytes, so it shouldn't be a problem to get it; if I copy and paste it on SO, the newlines get lost)
But when I add, for example, 水 (from http://en.wikipedia.org/wiki/UTF-16/UCS-2#Examples) to any line, that is the line where loading stops. I was under the wrong impression that getline, taking a wstring as one argument and a wifstream as the other, could chew through any text input...
Is there any way to read every single line in the file even if it contains funky characters?
The not-very-satisfying answer is that you need to imbue the input stream with a locale that understands the particular character encoding in question. If you don't know which locale to choose, you can use the empty-named locale, std::locale(""), which selects the user's preferred locale from the environment.
For example (untested):
std::wifstream a("a.txt");
std::locale loc("");
a.imbue(loc);
Unfortunately, there is no standard way to determine what locales are available for a given platform, let alone select one based on the character encoding.
The above code puts the locale selection in the hands of the user, and if they set it to something plausible (e.g. en_AU.UTF-8) it might all Just Work.
Failing this, you probably need to resort to third-party libraries such as iconv or ICU.
Also relevant: this blog entry (apologies for the self-promotion).
The problem is with your call to the global function getline (a,line). This takes a std::string. Use the std::wistream::getline method instead of the getline function.
C++ fstreams delegate I/O to their filebufs. filebufs always read "raw bytes" from disk and then use the stream locale's codecvt facet to convert these raw bytes into their "internal encoding".
A wfstream is a basic_fstream<wchar_t> and thus has a basic_filebuf<wchar_t>, which uses the locale's codecvt<wchar_t, char, mbstate_t> facet to convert the bytes read from disk into wchar_ts. If you read a UCS-2 encoded file, the conversion must therefore be performed by a codecvt that "knows" that the external encoding is UCS-2. You thus need a locale with such a codecvt (see, for example, this SO question).
By default, the stream's locale is the global locale at the time the stream is constructed. To use a specific locale, it should be imbue()-d on the stream.