C++ count the number of words from standard input

I saw a piece of C++ code to count the number of words inputted from standard input:
#include <iostream>
#include <iterator>
#include <string>
using namespace std;
int main() {
auto ss {
istream_iterator<string>{cin}
};
cout << distance(ss, {}) << endl;
return 0;
}
I have several questions:
What's the type of auto ss?
What does distance(ss, {}) do? Why does it calculate the number of words?
My guess is:
istream_iterator<string>{cin} converts the standard input into the istream_iterator type, automatically separated by space (why?). Thus ss looks like a container with all words as its elements;
distance(ss, {}) calculates the distance between the 1st element and the empty (thus outside of the last, but why?) element.
Can someone help me to go through my guess on this fantastic short piece of code?

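auto ss deduces ss to be std::istream_iterator<std::string>, because that is the type of the expression used to initialize ss. (This holds in C++17 and in compilers that apply the revised rule for auto with a single braced initializer; under the original C++11 rule, auto ss{...} would have deduced a std::initializer_list instead.)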
istream_iterator uses the specified istream's operator>> to read formatted input of the specified type, where operator>> reads input delimited by whitespace, including space characters, line breaks, etc. So, in this case, istream_iterator<string> is reading std::string values from std::cin one whitespace-delimited word at a time.
istream_iterator is an input iterator, and a default-constructed istream_iterator denotes an end-of-stream iterator. When istream_iterator stops reading (EOF is reached, bad input is entered, etc), its value becomes equal to the end-of-stream iterator.
std::distance() returns the number of elements in a range denoted by a pair of iterators. For a non-random-access iterator, like istream_iterator, std::distance() iterates the range one element at a time via the iterator's operator++, counting the number of times it incremented the iterator, until the target iterator is reached. So, in this case, istream_iterator::operator++ internally reads a value from its istream, thus std::distance() effectively returns how many words are successfully read from std::cin until end-of-stream is reached.
So, the code you have shown is roughly just an algorithmic way of writing the following equivalent loop:
int main() {
string s;
size_t count = 0;
while (cin >> s) {
++count;
}
cout << count << endl;
return 0;
}
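If you want to try the iterator version without typing into a terminal, here is a minimal sketch that wraps it in a function taking any input stream. The names count_words and count_words_in are my own, not part of the original code.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Count whitespace-delimited words available from any input stream.
std::ptrdiff_t count_words(std::istream& in) {
    std::istream_iterator<std::string> first{in};  // reads via operator>>
    std::istream_iterator<std::string> last{};     // end-of-stream iterator
    return std::distance(first, last);             // increments first until it equals last
}

// Hypothetical convenience wrapper: count the words in a string.
std::ptrdiff_t count_words_in(const std::string& text) {
    std::istringstream in{text};
    return count_words(in);
}
```

Calling count_words(std::cin) should behave like the original program; the string wrapper only makes it easier to experiment.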

ss has type std::istream_iterator<std::string>.
std::distance(ss, {}) computes the number of items from the first whitespace-delimited token in std::cin to the end of std::cin, effectively returning the number of whitespace-delimited tokens in std::cin. This is due to the way operator>>(std::istream&, std::string&) works (the actual signature is templated, but I'm trying to keep this short). The default constructor for a std::istream_iterator<std::string> returns the end iterator for any std::istream_iterator<std::string>.
The cutting of the contents of std::cin is actually done lazily when computing the distance.
That is indeed an interesting piece of code.

Related

Extracting certain columns from a CSV file in C++

I would like to know how I can extract / skip certain columns such as age and weight from a CSV file in C++.
Does it make more sense to extract the desired information after I loaded the entire csv file (if memory is not a problem)?
EDIT: If possible, I would like to have a reading, printing and modification part.
If possible, I want to use only the STL. The content of my test csv file looks as follows:
*test.csv*
name;age;weight;height;test
Bla;32;1.2;4.3;True
Foo;43;2.2;5.3;False
Bar;None;3.8;2.4;True
Ufo;32;1.5;5.4;True
I load the test.csv file with the following C++ program that prints the file's content on the screen:
#include <iostream>
#include <vector>
#include <string>
#include <iomanip>
#include <fstream>
#include <sstream>
void readCSV(std::vector<std::vector<std::string> > &data, std::string filename);
void printCSV(const std::vector<std::vector<std::string>> &data);
int main(int argc, char** argv) {
std::string file_path = "./test.csv";
std::vector<std::vector<std::string> > data;
readCSV(data, file_path);
printCSV(data);
return 0;
}
void readCSV(std::vector<std::vector<std::string> > &data, std::string filename) {
char delimiter = ';';
std::string line;
std::string item;
std::ifstream file(filename);
while (std::getline(file, line)) {
std::vector<std::string> row;
std::stringstream string_stream(line);
while (std::getline(string_stream, item, delimiter)) {
row.push_back(item);
}
data.push_back(row);
}
file.close();
}
void printCSV(const std::vector<std::vector<std::string> > &data) {
for (std::vector<std::string> row: data) {
for (std::string item: row) {
std::cout << item << ' ';
}
std::cout << std::endl;
}
}
Any assistance you can provide would be greatly appreciated.
Basically I answered this question already in a similar thread. But anyway, I will show a ready to use solution with a different approach and some explanation here.
One hint: You should make yourself more familiar with object-oriented programming and think over your design. In your read and write functions you create an unnecessary dependency on a file or on std::cout. So you should not hand over a file name and then open the file in the function, but use streams instead. In the functions that I created, using the C++ IO facilities, it doesn't matter whether we read from a file or a std::istringstream, or write to std::cout or a file stream.
All will be handled via the (overloaded) extractor and inserter operators.
So, and because I wanted the code a little bit more flexible, I made my struct a template, to be able to put in the selected Columns and reuse the same struct for other column combinations.
If you want to have fixed selected columns then you can delete the line with template and can replace std::vector<size_t> selectedFields{ {Colums...} }; with std::vector<size_t> selectedFields{ {1,2} };
Later we use a using for the template to allow easier handling and understanding:
// Define datatype for selected columns age and weight
using AgeAndWeight = SelectedColumns<1, 2>;
OK, let's first see the source code and then try to understand.
#include <iostream>
#include <string>
#include <vector>
#include <regex>
#include <fstream>
#include <initializer_list>
#include <iterator>
#include <algorithm>
std::regex re{ ";" };
// Proxy for reading and splitting a line, extracting certain fields, and some simple output
template<size_t ... Colums>
struct SelectedColumns {
std::vector<std::string> data{};
std::vector<size_t> selectedFields{ {Colums...} };
// Overload extractor operator
friend std::istream& operator >> (std::istream& is, SelectedColumns& sl) {
// Read a complete line and check, if it could be read
if (std::string line{}; std::getline(is, line)) {
// Now split the line into tokens
std::vector tokens(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {});
// Clear old data
sl.data.clear();
// So, and now copy the selected columns into our data vector
for (const size_t& column : sl.selectedFields)
if (column < tokens.size()) sl.data.push_back(tokens[column]);
}
return is;
}
// Simple inserter
friend std::ostream& operator << (std::ostream & os, const SelectedColumns & sl) {
std::copy(sl.data.begin(), sl.data.end(), std::ostream_iterator<std::string>(os, "\t"));
return os;
}
};
// Define datatype for selected columns age and weight
using AgeAndWeight = SelectedColumns<1U, 2U>;
const std::string fileName{ "./test.csv" };
int main() {
// Open the csv file and check, if it is open
if (std::ifstream csvFileStream{ fileName }; csvFileStream) {
// Read complete csv file and extract age and weight columns
std::vector sc(std::istream_iterator<AgeAndWeight>(csvFileStream), {});
// Now all data is available in this vector sc. Do something with it
sc[3].data[0] = "77";
// Show some debug output
std::copy(sc.begin(), sc.end(), std::ostream_iterator<AgeAndWeight>(std::cout, "\n"));
// By the way, you could also write the 2 lines above in one line.
//std::copy(std::istream_iterator<AgeAndWeight>(csvFileStream), {}, std::ostream_iterator<AgeAndWeight>(std::cout, "\n"));
}
else std::cerr << "\n*** Error: Could not open source file\n\n";
return 0;
}
One major task here is to split a line with CSV Data into its tokens. Let us have a look at this.
Splitting a string into tokens:
What do people expect from the function when they read getline?
Most people would say, Hm, I guess it will read a complete line from somewhere. And guess what, that was the basic intention for this function. Read a line from a stream and put it into a string.
But, as you can see here std::getline has some additional functionality.
And this led to a major misuse of this function for splitting up std::strings into tokens.
Splitting strings into tokens is a very old task. Very early C already had the function strtok, which still exists, even in C++, as std::strtok. Please see this std::strtok example:
std::vector<std::string> data{};
for (char* token = std::strtok(const_cast<char *>(line.data()), ","); token != nullptr; token = std::strtok(nullptr, ","))
data.push_back(token);
Simple, right?
But because of the additional functionality of std::getline, it has been heavily misused for tokenizing strings. If you look at the top question/answer regarding how to parse a CSV file (please see here), then you will see what I mean.
People are using std::getline to read a text line, a string, from the original stream, then stuffing it into an std::istringstream and use std::getline with delimiter again to parse the string into tokens. Weird.
But for many years now we have had a dedicated facility for tokenizing strings, explicitly designed for that purpose. It is the
std::sregex_token_iterator
And since we have such a dedicated function, we should simply use it.
This thing is an iterator for iterating over a std::string, hence the name starts with an s. The begin part defines the range of input we operate on, the end part is default constructed, and then there is a std::regex for what should be matched (or what should not be matched) in the input string. The matching strategy is given by the last parameter:
0 --> give me the stuff that I defined in the regex and (optional)
-1 --> give me that what is NOT matched based on the regex.
We can use this iterator for storing the tokens in a std::vector. The std::vector has a range constructor, which takes 2 iterators as parameter, and copies the data between the first iterator and 2nd iterator to the std::vector. The statement
std::vector tokens(std::sregex_token_iterator(s.begin(), s.end(), re, -1), {});
defines a variable “tokens” as a std::vector and uses the so called range-constructor of the std::vector. Please note: I am using C++17 and can define the std::vector without template argument. The compiler can deduce the argument from the given function parameters. This feature is called CTAD ("class template argument deduction").
Additionally, you can see that I do not use the "end()"-iterator explicitly.
This iterator will be constructed from the empty brace-enclosed default initializer with the correct type, because it will be deduced to be the same as the type of the first argument due to the std::vector constructor requiring that.
You can read any number of tokens in a line and put it into the std::vector
But you can do even more. You can validate your input. If you use 0 as last parameter, you define a std::regex that even validates your input. And you get only valid tokens.
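To make the two modes concrete, here is a small sketch. The helper names split and matches are mine; the only library pieces used are std::regex and std::sregex_token_iterator. Note that the regex must be a named object, because the iterator refuses temporaries.

```cpp
#include <regex>
#include <string>
#include <vector>

// Mode -1: keep the text BETWEEN matches of `delimiter` (classic splitting).
std::vector<std::string> split(const std::string& line, const std::regex& delimiter) {
    return std::vector<std::string>(
        std::sregex_token_iterator(line.begin(), line.end(), delimiter, -1),
        std::sregex_token_iterator());
}

// Mode 0: keep only the substrings that MATCH `pattern` (validating extraction).
std::vector<std::string> matches(const std::string& line, const std::regex& pattern) {
    return std::vector<std::string>(
        std::sregex_token_iterator(line.begin(), line.end(), pattern, 0),
        std::sregex_token_iterator());
}
```

With a pattern that describes numbers, matches would return only the numeric fields of a line and silently skip everything else, which is the validation idea described above.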
Overall, the usage of a dedicated functionality is superior to the misused std::getline, and people should simply use it.
Some people complain about the function overhead, and they are right, but how many of them are using big data? And even then, the approach would probably be to use std::string::find and std::string::substr, or std::string_view, or whatever.
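For completeness, the lower-overhead alternative could look roughly like this. This is only a sketch under my own assumptions: the function name split_find is made up, it needs C++17 for std::string_view, and it keeps empty fields.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Split `line` at every `delimiter` using find/substr only (no regex, no streams).
std::vector<std::string> split_find(std::string_view line, char delimiter) {
    std::vector<std::string> tokens;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = line.find(delimiter, start);
        if (pos == std::string_view::npos) {
            tokens.emplace_back(line.substr(start));           // last field
            break;
        }
        tokens.emplace_back(line.substr(start, pos - start));  // field before the delimiter
        start = pos + 1;                                        // continue after the delimiter
    }
    return tokens;
}
```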
So, now to further topics.
In the extractor, we first read a complete line from the source stream and check whether that worked, or whether we hit the end of file or any other error.
Then we tokenize that just read string as described above.
And then, we will copy only selected columns from the tokens into our resulting data. This is done in a simple for loop. Here we also check the boundaries, because somebody could specify invalid selected columns, or, a line could have less tokens than expected.
So the body of the extractor is very simple. Just 5 lines of code . . .
Then, again,
You should start using object-oriented features in C++. In C++ you can put data and methods that operate on these data into one object. The reason is that the outside world should not care about objects internals. For example, your readCSV and printCSV function should be part of a struct (or class).
And as the next step, we will not use your “read” and “print” functions. We will use the dedicated operators for stream IO, the extractor operator >> and the inserter operator <<. And we will overload these standard IO operators for our struct.
In function main we will open the source file and check whether the open was successful. By the way: all input/output operations should be checked for success.
Then, we use the next iterator, the std::istream_iterator. And this together with our “AgeAndWeight”-type and the input file stream. Also here we use CTAD and the default constructed end-iterator. The std::istream_iterator will repeatedly call the AgeAndWeight extractor operator, until all lines of the source file are read.
For output, we will use the std::ostream_iterator. This will call the inserter operator for "AgeAndWeight" until all data are written.
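If the interplay of the stream iterators with the overloaded operators is still unclear, a stripped-down sketch may help. Line and bracket_lines are hypothetical names of mine and have nothing to do with the CSV struct above; the point is only that the same function works with any kind of stream.

```cpp
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical record type: one line of text.
struct Line {
    std::string text;
};

// Extractor: std::istream_iterator<Line> calls this once per element.
std::istream& operator>>(std::istream& is, Line& line) {
    return std::getline(is, line.text);
}

// Inserter: std::ostream_iterator<Line> calls this once per element.
std::ostream& operator<<(std::ostream& os, const Line& line) {
    return os << '[' << line.text << ']';
}

// Copy every line from `in` to `out`. Works for file streams, std::cin/std::cout
// and string streams alike, because only the stream interfaces are used.
void bracket_lines(std::istream& in, std::ostream& out) {
    std::copy(std::istream_iterator<Line>(in), std::istream_iterator<Line>(),
              std::ostream_iterator<Line>(out, "\n"));
}
```

Calling bracket_lines(std::cin, std::cout) or passing a std::ifstream needs no change to the function, which is the design argument made at the start of this answer.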

Reading in from a .tsv file

I'm trying to read in information from a tab separated value file with the format:
<string> <int> <string>
Example:
Seaking 119 Azumao
Mr. Mime 122 Barrierd
Weedle 13 Beedle
This is currently how I'm doing it:
string americanName;
int pokedexNumber;
string japaneseName;
inFile >> americanName;
inFile >> pokedexNumber;
inFile >> japaneseName;
My issue stems from the space in the "Mr. Mime" as the strings can contain spaces.
I would like to know how to read the file in properly.
Standard library uses such things as locales to determine the categories of different symbols and other locale-dependent things depending on your system locale. Standard streams use that to determine what is a space because of various unicode issues.
You can use this fact to control the meaning of ' ' in your case:
#include <iostream>
#include <locale>
#include <algorithm>
struct tsv_ws : std::ctype<char>
{
mask t[table_size]; // classification table, stores category for each character
tsv_ws() : ctype(t) // ctype will use our table to check character type
{
// copy all default values to our table;
std::copy_n(classic_table(), table_size, t);
// here we tell, that ' ' is a punctuation, but not a space :)
t[' '] = punct;
}
};
int main() {
std::string s;
std::cin.imbue(std::locale(std::cin.getloc(), new tsv_ws)); // using our locale, will work for any stream
while (std::cin >> s) {
std::cout << "read: '" << s << "'\n";
}
}
Here we make ' ' a punctuation symbol, but not a space symbol, so streams don't consider it a separator anymore. The exact category isn't important, but it mustn't be space.
That's quite a powerful technique. For example, you could redefine ',' to be a space to read in CSV format.
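A sketch of that CSV variant could look like this. It builds the classification table in a static function instead of a member array, which is just another common way to write such a facet; the names csv_ws and read_fields are my own.

```cpp
#include <cstddef>
#include <istream>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// ctype facet in which ',' counts as whitespace and ' ' does not.
struct csv_ws : std::ctype<char> {
    static const mask* make_table() {
        // start from a copy of the classic "C" locale table
        static std::vector<mask> table(classic_table(), classic_table() + table_size);
        table[','] |= space;    // comma now separates fields
        table[' '] &= ~space;   // blank no longer separates fields
        return table.data();
    }
    explicit csv_ws(std::size_t refs = 0) : std::ctype<char>(make_table(), false, refs) {}
};

// Read all comma- or newline-separated fields from `in`.
// Note: this permanently imbues `in` with the modified locale.
std::vector<std::string> read_fields(std::istream& in) {
    in.imbue(std::locale(in.getloc(), new csv_ws));  // the locale takes ownership of the facet
    std::vector<std::string> fields;
    for (std::string field; in >> field;) {
        fields.push_back(field);
    }
    return fields;
}
```

Newlines are still whitespace here, so the row structure is lost; this only shows how the separator can be redefined.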
You can use std::getline to extract strings with non-tab whitespace.
std::getline(inFile, americanName, '\t'); // read up to first tab
inFile >> pokedexNumber >> std::ws; // read number then second tab
std::getline(inFile, japaneseName); // read up to first newline
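Put into a loop, the three statements above could be used like this. This is a sketch assuming every record has exactly three tab-separated fields; PokemonRecord and read_tsv are names I made up, the member names come from the question.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type using the variable names from the question.
struct PokemonRecord {
    std::string americanName;
    int pokedexNumber = 0;
    std::string japaneseName;
};

// Read records until one of the three extractions fails (normally at end of input).
std::vector<PokemonRecord> read_tsv(std::istream& in) {
    std::vector<PokemonRecord> records;
    PokemonRecord rec;
    while (std::getline(in, rec.americanName, '\t')         // up to the first tab
           && (in >> rec.pokedexNumber >> std::ws)          // number, then skip the second tab
           && std::getline(in, rec.japaneseName)) {         // rest of the line
        records.push_back(rec);
    }
    return records;
}
```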
Seems like you want to read csv data or in your case tsv data. But let's stick to the common term "csv". This is a standard task and I will give you detailed explanations. In the end all the reading will be done in a one-liner.
I would recommend using a "modern" C++ approach.
After searching for "reading csv data", people are still linking to How can I read and parse CSV files in C++?. That question is from 2009 and now over 10 years old. Most answers are also old and very complicated. So maybe it's time for a change.
In modern C++ you have algorithms that iterate over ranges. You will often see something like "someAlgorithm(container.begin(), container.end(), someLambda)". The idea is that we iterate over some similar elements.
In your case we iterate over tokens in your input string, and create substrings. This is called tokenizing.
And for exactly that purpose, we have the std::sregex_token_iterator. And because we have something that has been defined for such purpose, we should use it.
This thing is an iterator. For iterating over a string, hence sregex. The begin part defines, on what range of input we shall operate, then there is a std::regex for what should be matched / or what should not be matched in the input string. The type of matching strategy is given with last parameter.
0 --> give me the stuff that I defined in the regex and
-1 --> give me that what is NOT matched based on the regex.
So, now that we understand the iterator, we can std::copy the tokens from the iterator to our target, a std::vector of std::string. And since we do not know how many columns we have, we will use the std::back_inserter as a target. This will take all tokens that we get from the std::sregex_token_iterator and append them to our std::vector<std::string>. It doesn't matter how many columns we have.
Good. Such a statement could look like
std::copy( // We want to copy something
std::sregex_token_iterator // The iterator begin, the sregex_token_iterator. Give back first token
(
line.begin(), // Evaluate the input string from the beginning
line.end(), // to the end
re, // And match the delimiter
-1 // But give me back not the comma but everything else
),
std::sregex_token_iterator(), // iterator end for sregex_token_iterator, last token + 1
std::back_inserter(cp.columns) // Append everything to the target container
);
Now we can understand, how this copy operation works.
Next step. We want to read from a file. The file also contains a sequence of similar data: the rows.
And as above, we can iterate over similar data, whether it comes from file input or anywhere else. For this purpose C++ has the std::istream_iterator. This is a template: as a template parameter it gets the type of data that it should read, and as a constructor parameter it gets a reference to an input stream. It doesn't matter if the input stream is std::cin, a std::ifstream or a std::istringstream. The behaviour is identical for all kinds of streams.
And since we do not have files on SO, I use (in the example below) a std::istringstream to store the input csv file. But of course you can open a file by defining a std::ifstream testCsv(filename). No problem.
And with std::istream_iterator, we iterate over the input and read similar data. In our case one problem is that we want to iterate over special data and not over some built-in data type.
To solve this, we define a Proxy class, which does the internal work for us (we do not want to know how, that should be encapsulated in the proxy). In the proxy we define a type conversion operator, so that what the std::istream_iterator reads can be converted to the type we expect.
And the last important step. A std::vector has a range constructor. It has also a lot of other constructors that we can use in the definition of a variable of type std::vector. But for our purposes this constructor fits best.
So we define a variable csv and use its range constructor and give it a begin of a range and an end of a range. And, in our specific example, we use the begin and end iterator of std::istream_iterator.
If we combine all the above, reading the complete CSV file is a one-liner, it is the definition of a variable with calling its constructor.
Please see the resulting code:
#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <regex>
#include <algorithm>
// Tabs are written as \t escapes so that they survive copy and paste
std::istringstream testCsv{ "Seaking\t119\tAzumao\n"
"Mr. Mime\t122\tBarrierd\n"
"Weedle\t13\tBeedle" };
// Define Alias for easier Reading
using Columns = std::vector<std::string>;
using CSV = std::vector<Columns>;
// Proxy for the input Iterator
struct ColumnProxy {
// Overload extractor. Read a complete line
friend std::istream& operator>>(std::istream& is, ColumnProxy& cp) {
// Read a line
std::string line; cp.columns.clear();
if (std::getline(is, line)) {
// The delimiter
const std::regex re("\t");
// Split values and copy into resulting vector
std::copy(std::sregex_token_iterator(line.begin(), line.end(), re, -1),
std::sregex_token_iterator(),
std::back_inserter(cp.columns));
}
return is;
}
// Conversion operator. Converts a ColumnProxy to std::vector<std::string>
operator std::vector<std::string>() const { return columns; }
protected:
// Temporary to hold the read vector
Columns columns{};
};
int main()
{
// Define variable CSV with its range constructor. Read complete CSV in this statement, So, one liner
CSV csv{ std::istream_iterator<ColumnProxy>(testCsv), std::istream_iterator<ColumnProxy>() };
// Print result. Go through all lines and then copy line elements to std::cout
std::for_each(csv.begin(), csv.end(), [](Columns & c) {
std::copy(c.begin(), c.end(), std::ostream_iterator<std::string>(std::cout, " ")); std::cout << "\n"; });
}
I hope the explanation was detailed enough to give you an idea, what you can do with modern C++.
This example does basically not care how many rows and columns are in the source text file. It will eat everything.

Parse key, value pairs when key is not unique

My input are multiple key, value pairs e.g.:
A=1, B=2, C=3, ..., A=4
I want to parse the input into the following type:
std::map< char, std::vector< int > > m
Values for equal keys shall be appended to the vector. So the parsed output should be equal to:
m['A']={1,4};
m['B']={2};
m['C']={3};
What is the simplest solution using 'boost::spirit::qi' ?
Here is one way to do it:
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/vector.hpp>
#include <boost/fusion/include/at_c.hpp>
#include <iostream>
#include <utility>
#include <string>
#include <vector>
#include <map>
namespace qi = boost::spirit::qi;
namespace fusion = boost::fusion;
int main()
{
std::string str = "A=1, B=2, C=3, A=4";
std::map< char, std::vector< int > > m;
auto inserter = [&m](fusion::vector< char, int > const& parsed,
qi::unused_type, qi::unused_type)
{
m[fusion::at_c< 0 >(parsed)].push_back(fusion::at_c< 1 >(parsed));
};
auto it = str.begin(), end = str.end();
bool res = qi::phrase_parse(it, end,
((qi::char_ >> '=' >> qi::int_)[inserter]) % ',',
qi::space);
if (res && it == end)
std::cout << "Parsing complete" << std::endl;
else
std::cout << "Parsing incomplete" << std::endl;
for (auto const& elem : m)
{
std::cout << "m['" << elem.first << "'] = {";
for (auto value : elem.second)
std::cout << " " << value;
std::cout << " }" << std::endl;
}
return 0;
}
A few comments about the implementation:
1. qi::phrase_parse is a Boost.Spirit algorithm that takes a pair of iterators, a parser, and a skip parser, and runs the parsers on the input denoted by the iterators. In the process, it updates the beginning iterator (it in this example) so that it points to the end of the consumed input upon return. The returned res value indicates whether the parsers have succeeded (i.e. the consumed input could be successfully parsed). There are other forms of qi::phrase_parse that allow extracting attributes (which is the parsed data, in terms of Boost.Spirit) but we're not using attributes here because you have a peculiar requirement of the resulting container structure.
2. The skip parser is used to skip portions of the input between the elements of the main parser. In this case, qi::space means that any whitespace characters will be ignored in the input, so that e.g. "A = 1" and "A=1" can both be parsed similarly. There is qi::parse family of algorithms which do not have a skip parser and therefore require the main parser to handle all input without skips.
3. The (qi::char_ >> '=' >> qi::int_) part of the main parser matches a single character, followed by the equals sign character, followed by a signed integer. The equals sign is expressed as a literal (i.e. it is equivalent to the qi::lit('=') parser), which means it only matches the input but does not result in a parsed data. Therefore the result of this parser is an attribute that is a sequence of two elements - a character and an integer.
4. The % ',' part of the parser is a list parser, which parses any number of pieces of input described by the parser on the left (which is the parser described above), separated by the pieces described by the parser on the right (i.e. with comma characters in our case). As before, the comma character is a literal parser, so it doesn't produce output.
5. The [inserter] part is a semantic action, which is a function that is called by the parser every time it matches a portion of input string. The parser passes all its parsed output as the first argument to this function. In our case the semantic action is attached to the parser described in bullet #3, which means a sequence of a character and an integer is passed. Boost.Spirit uses a fusion::vector to pass these data. The other two arguments of the semantic action are not used in this example and can be ignored.
6. The inserter function in this example is a lambda function, but it could be any other kind of function object, including a regular function, a function generated by std::bind, etc. The important part is that it has the specified signature and that the type of its first argument is compatible with the attribute of the parser, to which it is attached as a semantic action. So, if we had a different parser in bullet #3, this argument would have to be changed accordingly.
7. fusion::at_c< N >() in the inserter obtains the element of the vector at index N. It is very similar to std::get< N >() when applied to std::tuple.
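To illustrate that last analogy without Boost, here is what the body of the inserter would look like if the parsed pair arrived as a std::tuple instead of a fusion::vector. This is only an illustration with names of my own (MultiMap, insert_pair); it is not something Boost.Spirit calls.

```cpp
#include <map>
#include <tuple>
#include <vector>

using MultiMap = std::map<char, std::vector<int>>;

// Same logic as the lambda above, with std::get<N> in place of fusion::at_c<N>.
void insert_pair(MultiMap& m, const std::tuple<char, int>& parsed) {
    // operator[] creates an empty vector the first time a key is seen,
    // so values for repeated keys are simply appended.
    m[std::get<0>(parsed)].push_back(std::get<1>(parsed));
}
```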

C++ copy cin into cout directly but in reverse order

Is there is any similar solution to this command:
using namespace std;
copy(istream_iterator<string>(cin), istream_iterator<string>(),ostream_iterator<string>(cout, "\n"));
-- this command copies everything into cout but I would like to change it to copy the string in reverse order so I have used this:
using namespace std;
reverse_copy(istream_iterator<string>(cin), istream_iterator<string>(),ostream_iterator<string>(cout, "\n"));
-- but this did not even compile. Are there any solutions to this? Thank you
The first two arguments to std::reverse_copy must be Bidirectional Iterator whereas std::istream_iterator is Input Iterator which cannot behave as Bidirectional Iterator. That explains why it doesn't work — it wouldn't even compile.
You have to write your own iterator, or do it manually in a loop, to solve this problem (and it is not clear what you mean by reverse: given foo bar as input, do you want bar foo, oof rab, or rab oof? Many of the commenters ask the same).
You can write a recursive function. For example
#include <iostream>
#include <string>
#include <sstream>
std::ostream & reverse_output( std::istream &is = std::cin,
std::ostream &os = std::cout )
{
std::string s;
if ( is >> s ) reverse_output( is, os ) << s << '\n';
return os;
}
int main()
{
std::istringstream is( "Hello Bobul Mentol" );
reverse_output( is );
}
The program output is
Mentol
Bobul
Hello
Of course instead of the string stream I used for the demonstrative purpose you can use std::cin. In this case the call of the function will look just like
reverse_output();
Otherwise you need to store the input in some container and use it to reverse the input data for output.
For example
std::vector<std::string> v( std::istream_iterator<std::string>( std::cin ),
std::istream_iterator<std::string>() );
std::reverse_copy( v.begin(), v.end(),
std::ostream_iterator<std::string>( std::cout, "\n" ) );
I have never heard of some standard algorithm which can copy a reversed collection by an input_iterator or even forward_iterator - probably if this exist, it requires at least bidirectional_iterator.
So, you can use the temporary collection to store the values read, like this:
vector<string> tmp;
copy(istream_iterator<string>(cin), istream_iterator<string>(), back_inserter(tmp));
copy(tmp.rbegin(), tmp.rend(), ostream_iterator<string>(cout, "\n"));
There is a general problem here: std::cin is a stream. When it is attached to a file, you could imagine a way to initially find the size and so know where the reverse iterator should start. But when it is attached to a terminal, with an unpredictable human being able to type input at will, at what position should the reverse iterator start? I have no idea, and it looks like the designers of cin had none either. More seriously, cin does not offer a reverse iterator, and that is by design.
If you want to present what has been inputted in cin but in reverse order, you must first specify:
what is the piece of input to reverse: anything until stream is closed, or anything until first end of line, or [put here your own definition]. Once it's done you have the start place for your reverse iterator
what is the unit to be reversed: one character or one word at a time. Once this is done, you know what your reverse iterator should return.
The implementation could use a vector of strings. You consistently accumulate words or single characters in it until what you have defined as the end of the stream. Once you hit the end, you hold a container with bidirectional iterators so copying it in reverse order should be easy.
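As a sketch of how those choices play out, here are three possible readings of "reverse" for input that is read until the stream ends. All function names are my own, and the output joins words with single spaces, which is an assumption on my part.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Accumulate all whitespace-delimited words of `text` in a container.
std::vector<std::string> words_of(const std::string& text) {
    std::istringstream in{text};
    return std::vector<std::string>(std::istream_iterator<std::string>(in),
                                    std::istream_iterator<std::string>());
}

// Join words with single spaces.
std::string join(const std::vector<std::string>& words) {
    std::string out;
    for (const std::string& w : words) {
        if (!out.empty()) out += ' ';
        out += w;
    }
    return out;
}

// Unit = word, reversed as a sequence: "foo bar" becomes "bar foo".
std::string reverse_word_order(const std::string& text) {
    std::vector<std::string> words = words_of(text);
    std::reverse(words.begin(), words.end());
    return join(words);
}

// Unit = character, reversed within each word: "foo bar" becomes "oof rab".
std::string reverse_each_word(const std::string& text) {
    std::vector<std::string> words = words_of(text);
    for (std::string& w : words) std::reverse(w.begin(), w.end());
    return join(words);
}

// Unit = character, reversed over the whole input: "foo bar" becomes "rab oof".
std::string reverse_everything(const std::string& text) {
    std::string out = join(words_of(text));
    std::reverse(out.begin(), out.end());
    return out;
}
```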

How does std::copy work with stream iterators

A usual STL construct is:
vector<string> col;
copy(istream_iterator<string>(cin), istream_iterator<string>(),
back_inserter(col));
where we use an istream_iterator to copy from std input (cin) to a vector.
Can anyone explain how this code works?
my problem is that I don't really understand this part:
istream_iterator<string>(cin), istream_iterator<string>()
First, note that in this case, there's no real need to use std::copy at all. You can just initialize the vector directly from the iterators:
vector<string> col((istream_iterator<string>(cin)),
istream_iterator<string>());
This probably doesn't make the code a whole lot easier to understand though.
As far as how the code works, it's probably a little more straightforward than you think. An istream_iterator looks vaguely like this:
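template <class T>
class istream_iterator {
std::istream *is;
T data;
public:
// reading the first item happens right at construction
istream_iterator(std::istream &is) : is(&is) { ++(*this); }
istream_iterator() : is(nullptr) {}
istream_iterator &operator++() { (*is) >> data; return *this; }
istream_iterator operator++(int) { istream_iterator prev = *this; (*is) >> data; return prev; }
T const &operator*() const { return data; }
// simplified: ignores `end` and only checks whether the last read succeeded
bool operator !=(istream_iterator const &end) const { return !(*is).fail(); }
bool operator ==(istream_iterator const &end) const { return (*is).fail(); }
};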
Obviously there's more that I'm skipping over, but that's most of what we care about here. So, what happens is that when you create the iterator, it reads (or attempts to read) an item from the stream into the variable I've called data. When you dereference the iterator, it returns data. When you increment the iterator, it reads (or attempts to read) the next item from the stream. Despite being written as if they compare one iterator to another, operator== and operator!= really just check for the end of the file [1].
That's then used by std::copy, which (again simplified) looks vaguely like this:
template <class InIt, class OutIt>
OutIt copy(InIt b, InIt e, OutIt d) {
while (b != e) {
*d = *b;
++b;
++d;
}
return d;
}
So, this reads an item from the input iterator, writes that item to the output iterator, and repeats until the iterator for the current position compares equal to the iterator for the end of the input (which will happen when you reach the end of the file). Note that unlike other iterators, the only "end" position you're allowed to use with an istream iterator is the end of the file.
[1] Note that technically, this isn't conforming behavior. I've simplified comparison to keep things simple. Two default-constructed iterators should compare equal, and if you construct two iterators from the same stream, they should compare equal at least before you've read anything from the stream. This makes little practical difference though -- the only comparison you see in real use is to determine whether you've reached the end of file yet.
Part of the answer below is quoted from C++ standard library: A tutorial and reference by Nicolai M. Josuttis with a few tweaks.
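If you want to see the same steps with the real std::istream_iterator, you can write out by hand the loop that std::copy performs. This is just a sketch; read_words is a name I made up.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hand-written equivalent of copy(istream_iterator(in), istream_iterator(), back_inserter(words)).
std::vector<std::string> read_words(std::istream& in) {
    std::vector<std::string> words;
    std::istream_iterator<std::string> it{in};        // attached to the stream
    const std::istream_iterator<std::string> end{};   // default-constructed: end of stream
    while (it != end) {            // becomes false once a read has failed
        words.push_back(*it);      // the item most recently read
        ++it;                      // attempt to read the next item
    }
    return words;
}
```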
The expression
istream_iterator<string>(cin)
creates a string iterator that reads from the standard input stream cin. The template argument string specifies that the stream iterator reads elements of this type. These elements are read with the usual input operator >>. Thus, each time the algorithm wants to process the next element, the istream iterator transforms that desire into a call of
cin >> string
The input operator for strings usually reads one word delimited by whitespace.
The expression
istream_iterator<string>()
calls the default constructor of the istream iterators that creates a so-called end-of-stream iterator. It represents a stream from which you can no longer read. The end-of-stream iterator is used as the end of the range, so the algorithm copy reads all strings from cin until it can no longer read any more.
The last one:
back_inserter(col)
according to back_inserter documentation:
A std::back_insert_iterator which can be used to add elements to the end of the container c
It will add all read in strings into col.
See the documentation of std::istream_iterator and std::back_inserter for details.
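To see what the back inserter does on its own, here is a small sketch without any stream involved. The function names append_words and append_word are mine.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Append all elements of `extra` to `col` through a back inserter.
void append_words(std::vector<std::string>& col, const std::vector<std::string>& extra) {
    std::copy(extra.begin(), extra.end(), std::back_inserter(col));
}

// What std::copy does per element: assigning through the iterator calls push_back.
void append_word(std::vector<std::string>& col, const std::string& word) {
    auto out = std::back_inserter(col);
    *out = word;  // same effect as col.push_back(word)
    ++out;        // does nothing for a back_insert_iterator; kept for iterator syntax
}
```

This is why col does not need to be pre-sized before the copy: the vector grows with every assignment.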