C++ Tokenizing using iterators in an eof() cycle - c++

I'm trying to adapt this answer
How do I tokenize a string in C++?
to my current string problem which involves reading from a file till eof.
from this source file:
Fix grammatical or spelling errors
Clarify meaning without changing it
Correct minor mistakes
I want to create a vector with all the tokenized words. Example: vector<string> allTheText[0] should be "Fix"
I don't understand the purpose of istream_iterator<std::string> end; but I included it because it was in the original poster's answer.
So far, I've got this non-working code:
vector<string> allTheText;
stringstream strstr;
istream_iterator<std::string> end;
istream_iterator<std::string> it(strstr);
while (!streamOfText.eof()){
getline (streamOfText, readTextLine);
cout<<readTextLine<<endl;
stringstream strstr(readTextLine);
// how should I initialize the iterators it and end here?
}
Edit:
I changed the code to
vector<string> allTheText;
stringstream strstr;
istream_iterator<std::string> end;
istream_iterator<std::string> it(strstr);
while (getline(streamOfText, readTextLine)) {
cout << readTextLine << endl;
vector<string> vec((istream_iterator<string>(streamOfText)), istream_iterator<string>()); // generates RuntimeError
}
And got a RuntimeError, why?

Using a while (!….eof()) loop in C++ is broken: eof() only becomes true after a read has already hit the end of the input, and the loop will never be exited when the stream goes into an error state!
Rather, you should test the stream's state directly. Adapted to your code, this could look like this:
while (getline(streamOfText, readTextLine)) {
cout << readTextLine << endl;
}
However, you already have a stream. Why put it into a string stream as well? Or do you need to do this line by line for any reason?
You can directly initialize your vector with the input iterators. No need to build a string stream, and no need to use the copy algorithm either because there's an appropriate constructor overload.
vector<string> vec((istream_iterator<string>(cin)), istream_iterator<string>());
Notice the extra parentheses around the first argument which are necessary to disambiguate this from a function declaration.
EDIT: A small explanation of what this code does:
C++ offers a unified way of specifying ranges. A range is just a collection of typed values, without going into details about how these values are stored. In C++, these ranges are denoted as half-open intervals [a, b[. That means that a range is delimited by two iterators (which are kind of like pointers but more general; pointers are a special kind of iterator). The first iterator, a, points to the first element of the range. The second, b, points behind the last element. Why behind? Because this makes it very easy to iterate over the elements:
for (Iterator i = a; i != b; ++i)
cout << *i;
Like pointers, iterators are dereferenced by applying * to them. This returns their value.
Container classes in C++ (e.g. vector, list) have a special constructor which allows easy copying of values from another range into the new container. Consequently, this constructor expects two iterators. For example, the following copies the C-style array into the vector:
int values[3] = { 1, 2, 3 };
vector<int> v(values, values + 3);
Here, values is synonymous with &values[0] which means that it points to the array's first element. values + 3, thanks to pointer arithmetic, is nearly equivalent to &values[3] (but this is invalid C++!) and points to the virtual element behind the array.
Now, my code above does the exact same as in this last example. The only difference is the type of iterator I use. Instead of using a plain pointer, I use a special iterator class that C++ provides. This iterator class wraps an input stream in such a way that ++ advances the input stream and * reads the next element from the stream. The kind of element is specified by the type argument (hence string in this case).
To make this work as a range, we need to specify a beginning and an end. Alas, we don't know the end of the input (this is logical, since the end of the stream may actually move over time as the user enters more input into a console!). Therefore, to create a virtual end iterator, we pass no argument to the constructor of istream_iterator. Conversely, to create a begin iterator, we pass an input stream. This then creates an iterator that points to the current position in the stream (here, cin).
My above code is functionally equivalent to the following:
istream_iterator<string> front(cin);
istream_iterator<string> back;
vector<string> vec;
for (istream_iterator<string> i = front; i != back; ++i)
vec.push_back(*i);
and this, in turn, is equivalent to using the following loop:
string word;
while (cin >> word)
vec.push_back(word);
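If the input really does have to be processed line by line, the same iterator pair can be built from a per-line string stream. The following is only a sketch: tokenizeLines is a made-up helper name, and it assumes that words are separated by whitespace.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads `in` line by line and collects every
// whitespace-separated word into one vector.
std::vector<std::string> tokenizeLines(std::istream& in) {
    std::vector<std::string> words;
    std::string line;
    while (std::getline(in, line)) {        // stops at end of input or on error
        std::istringstream lineStream(line); // a fresh stream for each line
        // The begin iterator wraps the line's stream; the default-constructed
        // iterator is the end-of-stream marker.
        words.insert(words.end(),
                     std::istream_iterator<std::string>(lineStream),
                     std::istream_iterator<std::string>());
    }
    return words;
}
```

Called with an open std::ifstream (or any other std::istream), this returns all words in file order, so element 0 would be "Fix" for the sample file in the question.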

Related

Converting from vector<string> to vector<double> without std::stod

It started with converting a vector<string> to vector<double>. I use gcc without C++11 so I could not use this approach. After having a look at the algorithm template of transpose I tried to use the template instead, but there is a bug for std::stod used inside the template (for gcc 4.x which results in ‘stod’ is not a member of ‘std’) So why not write it yourself and I use while-loop with stringstream (expensive?):
int i = 0;
while (svec.begin() != svec.end()) {
stringstream(*svec.begin())>> dvec[i] ;//svec: strings, dvec: doubles
//dvec[i] = std::stod(*svec.begin());
++svec.begin(); i++;
}
Unfortunately I get: double free or corruption (out): 0x00007f4788000ba0 ***
I think the issue here is that
++svec.begin();
doesn't work the way you think it does. This doesn't advance forward the beginning of the vector. Instead, it gets a (temporary) iterator to the beginning of the vector, then increments that. As a result, you'll be stuck in an infinite loop. To fix this, use a more traditional "loop over a container" loop either by using a range-based for loop or just counting up the indices.
I also noticed that you're writing into dvec by index. Without seeing the code to initialize dvec, I can't be sure of whether this is safe, since if you didn't already resize the vector this will write off the end and lead to undefined behavior. Even if you did set it up properly, because of the broken loop, my guess is that this eventually writes off the end of the vector and is what directly triggers the issue.
First of all, let's clear up some confusion:
std::stod is a C++11 function, so if you do not use C++11, then it is not a "bug" if it cannot be found.
You mean std::transform and not transpose, don't you?
You may very well use std::transform without C++11 and without std::stod. Unfortunately you don't show the code with which you tried.
So why not write it yourself and I use while-loop with stringstream
Yes, why not?
(expensive?):
Measure it :)
But it's unlikely you'll notice a difference.
int i = 0;
while (svec.begin() != svec.end()) {
The loop condition does not make sense. begin() and end() do not change. This line effectively reads as "do this stuff as long as the vector does not become empty".
stringstream(*svec.begin())>> dvec[i] ;//svec: strings, dvec: doubles
//dvec[i] = std::stod(*svec.begin());
++svec.begin(); i++;
}
I'd say you are overthinking this. The crash you get may come from the fact that you access dvec[i] while dvec is still empty, and you never actually add elements to it. That's undefined behaviour.
It's really as simple as "loop through string vector, use a stringstream on each element to get a double value, and add that value to the result vector", expressed in C++03 as:
// loop through string vector
for (std::vector<std::string>::const_iterator iter = svec.begin(); iter != svec.end(); ++iter)
{
std::string const& element = *iter;
// use a stringstream to get a double value:
std::istringstream is(element);
double result;
is >> result;
// add the double value to the result vector:
dvec.push_back(result);
}
If you don't have stod (which was added in C++11) you can use strtod and apply it to a C-string that corresponds to the C++ string that you have. Copying from the approach that you linked to and making that change (and cleaning up the formatting):
std::vector<double> convertStringVectortoDoubleVector(
const std::vector<std::string>& stringVector){
std::vector<double> doubleVector(stringVector.size());
std::transform(stringVector.begin(), stringVector.end(),
doubleVector.begin(), [](const std::string& val)
{
return strtod(val.c_str(), 0);
});
return doubleVector;
}
Note that if the code hasn't already ensured that each string holds a valid representation of a floating-point value you'll have to add code to check whether strtod succeeded.
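Note that the lambda passed to std::transform above is itself a C++11 feature, so it will not compile in the asker's setup either. A C++03-compatible sketch could pass a named function instead. The helper name toDouble and the choice to throw std::invalid_argument on bad input are assumptions made for illustration, not part of the original code.

```cpp
#include <algorithm>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts one string, throwing if the string is empty
// or contains anything after the number (including trailing spaces).
double toDouble(const std::string& text) {
    char* end = 0;
    const double value = std::strtod(text.c_str(), &end);
    // strtod leaves `end` at the first character it could not parse.
    if (end == text.c_str() || *end != '\0')
        throw std::invalid_argument("not a number: " + text);
    return value;
}

std::vector<double> convertStringVectorToDoubleVector(
        const std::vector<std::string>& strings) {
    std::vector<double> doubles(strings.size()); // sized up front for transform
    std::transform(strings.begin(), strings.end(), doubles.begin(), toDouble);
    return doubles;
}
```

Sizing the result vector before calling std::transform matters here: writing through doubles.begin() into an empty vector would be the same out-of-bounds write discussed in the other answers.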

How do I use an iterator on an ifstream twice in C++?

I'm new to C++ and I'm confused about using iterators with ifstream. In this following code I have an ifstream variable called dataFile.
In the code I first iterate through the file once to count how many characters it has (is there a more efficient way to do this?). Then I create a matrix of that size, and iterate through again to fill the matrix.
The problem is that the iterator refuses to iterate the second time around, and will not do anything. I tried resetting the ifstream from the beginning by using dataFile.clear(), but this didn't work, probably because I have some deep misunderstanding about iterators. Could someone help me please?
typedef istreambuf_iterator<char> dataIterator;
for (dataIterator counter(dataFile), end; counter != end; ++counter, ++numCodons) {} // Finds file size in characters.
MatrixXd YMatrix = MatrixXd::Constant(3, numCodons, 0);
dataFile.clear(); // Resets the ifstream to be used again.
for (dataIterator counter(dataFile), end; counter != end; ++counter) {...}
istreambuf_iterator is an input iterator: once it has been incremented, all copies of its previous value may be invalidated. It is not a forward iterator, which would guarantee validity when used in multipass algorithms. For more about iterator categories, see here.

no suitable conversion from "std::string" to "char" exists

I'm working on a project for school and I am running into a bit of a problem (error is in the title).
Here is the line of code that runs into the error:
kruskalS[n].nodeList[m].push_back(tempFirstCity);
kruskalS is a struct and nodeList is a vector of type string within the struct and I'm trying to insert tempFirstCity (also a string) into that array.
I could easily be making a basic mistake since I haven't done any programming since April. Any kind of help would be appreciated and I'm willing to post a bit more information from the program if needed.
A std::string is (sort of) a container of chars. A push_back function is used to add one element to the end of a container. So when you call kruskalS[n].nodeList[m].push_back(tempFirstCity);, you say you are trying to add one element to the end of the string called kruskalS[n].nodeList[m]. So the compiler expects that one element to be a char.
If you know that tempFirstCity is not empty and you want to add the first char from tempFirstCity to the end of kruskalS[n].nodeList[m] (including the case where you know tempFirstCity.size() == 1), you can do
kruskalS[n].nodeList[m].push_back(tempFirstCity[0]);
If you want to add the entire string after any current contents, you can do
kruskalS[n].nodeList[m] += tempFirstCity;
If you expect there are no current contents and/or you want to just replace anything already there with the tempFirstCity string, you can do
kruskalS[n].nodeList[m] = tempFirstCity;
You can use:
std::string::c_str()
It returns a const char *.
You say nodeList is an array of type string. i.e. std::string nodeList[x] where x is a constant.
Then assigning a new element to that array where m < x is as follows:
kruskalS[n].nodeList[m] = tempFirstCity;
Based on comments:
For appending to end of vector you don't need the index m:
kruskalS[n].nodeList.push_back(tempFirstCity);
For inserting at index m:
vector<string>::iterator itr = nodeList.begin();
for (int i = 0; i < m; i++)
itr++;
nodeList.insert(itr, tempFirstCity);
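Since vector iterators are random access, the counting loop is not strictly needed: begin() + m reaches the same position directly. A small sketch, where insertAt is a hypothetical helper name and the index is assumed to be at most the vector's size:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: inserts `city` before position `index`.
// Assumes index <= cities.size(); anything larger is undefined behaviour.
void insertAt(std::vector<std::string>& cities, std::size_t index,
              const std::string& city) {
    typedef std::vector<std::string>::difference_type Offset;
    cities.insert(cities.begin() + static_cast<Offset>(index), city);
}
```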
In C++, you can use string::c_str() to get a C-style char array from a string.

C++ map<K,T> Initialization

I'm reading "Ivor Horton's Beginning Programming Visual C++ 2010", and I'm at Chapter 10-The Standard Template Library. My problem is with the map container map<Person, string> mapname. The book showed me a lot of ways of adding elements to it, such as with the pair<K, T> and using the make_pair() function later, and mapname.insert(pair). But suddenly he introduced an element adding technique used in the following code:
int main()
{
std::map<string, int> words;
cout << "Enter some text and press Enter followed by Ctrl+Z then Enter to end:"
<< endl << endl;
std::istream_iterator<string> begin(cin);
std::istream_iterator<string> end;
while(begin != end) // iterate over words in the stream
//PROBLEM WITH THIS LINE:
words[*begin++]++; // Increment and store a word count
// (there is more code, but it is irrelevant to this question)
}
The indicated line is my problem. I understand that words is the map, but I've never seen such an initialization. And what's going on with that increment? I believe Ivor Horton failed to elaborate on this, or at least he should've given an introduction thorough enough not to surprise noobs like me.
You have such a map:
std::map<std::string, int> m;
The access operator [key] gives you a reference to the element stored with that key, or inserts one if it doesn't exist. So for an empty map, this
m["hello"];
inserts an entry in the map, with key "hello" and value 0. It also returns a reference to the value. So you can increment it directly:
m["Bye"]++;
would insert a value 0 under key "Bye" and increment it by one, or increment an existing value by 1.
As for the stuff happening inside the [] operator,
*begin++
is a way of incrementing the istream_iterator and dereferencing the value it had before the increment:
begin++;
increments begin and returns the value before the increment
*someIterator
dereferences the iterator.
He is doing two things at once, and generally being more clever than he needs to be.
He is getting the value the iterator points to, and then incrementing the iterator. So, interpret *begin++ as *(begin++). Note that it's a postincrement, though, so the dereference applies to the iterator's value from BEFORE the increment.
He is incrementing the value for the given key in your map. When you dereference the iterator, you get a string. This string is used as the key for the words map, whose value is incremented.
Spread over more lines, it looks like this:
std::string x = *begin;
begin++;
words[x] += 1;

String initialization with pair of iterators

I'm trying to initialize string with iterators and something like this works:
ifstream fin("tmp.txt");
istream_iterator<char> in_i(fin), eos;
//here eos is 1 over the end
string s(in_i, eos);
but this doesn't:
ifstream fin("tmp.txt");
istream_iterator<char> in_i(fin), eos(fin);
/* here eos is at this same position as in_i*/
//moving eos forward
for (int i = 0; i < 20; ++i)
{
++eos;
}
// trying to initialize string with
// pair of iterators gives me ""
// result
string s(in_i, eos);
Thank you.
I don't think you can advance the end iterator to a suitable position: advancing the iterator means reading input, and both iterators reference the same stream, so advancing one iterator advances the other as well. They both end up referencing the same position in the stream.
Unless you are willing to write or find an iterator adaptor (boost?) that does an operation on n items referenced by some iterator, it might not be possible to initialize the string like that. Or you read the value with other methods and set the value of the string later.
istream_iterator is an input iterator, so your second code fragment isn't correct: it needs a second pass over the stream, which you can't do. Please look here and pay attention to the support for "single pass" algorithms (second paragraph under the "Description" title). The first fragment doesn't try to perform 2 passes.
Is this explanation OK? BTW, the SGI STL reference (which the link points to) is somewhat outdated but very usable for quick reference (in my opinion). I'd recommend bookmarking it.
The istream_iterator is a very limited iterator known as an input iterator.
See: http://www.sgi.com/tech/stl/InputIterator.html
But basically, very few guarantees are made for input iterators.
Specifically to your case:
i == j does not imply ++i == ++j.
So your first example is an iterator that is past the end of the stream. It is valid (as long as it is not incremented) and comparable to other iterators, so it works for reading the whole stream.
From the standard [24.5.1]/1:
[...] The constructor with no arguments istream_iterator() always constructs an end of stream input iterator object, which is the only legitimate iterator to be used for the end condition. [...] The main peculiarity of the istream iterators is the fact that ++ operators are not equality preserving, that is, i == j does not guarantee at all that ++i == ++j. Every time ++ is used a new value is read.
[24.5.1]/3
Two end-of-stream iterators are always equal. An end-of-stream iterator is not equal to a non-end-of-stream iterator. Two non-end-of-stream iterators are equal when they are constructed from the same stream.
The first paragraph states that you cannot use any but end-of-stream iterators as end conditions, so your first usage is correct and expected. The third paragraph in the chapter states that any two non-end-of-stream iterators into the same stream are guaranteed to be equal at all times. That is, the second usage is correct from the language standpoint and will provide the results you are getting.
The last part of paragraph 1, where it states that i == j does not imply ++i == ++j deals with the particular case of one last element present in the stream. After incrementing the first of the iterators (either i or j) the datum is consumed by that iterator. Advancing the other iterator will hit the end of stream and thus the two iterators will differ. In all other cases (there is more than one datum left in the stream), ++i == ++j.
Also note that the expression ++i == ++j executes two mutating operations on the same element (the stream), and thus it is not specified which of the two iterators gets the first/second datum in the stream.