String initialization with pair of iterators - c++

I'm trying to initialize string with iterators and something like this works:
ifstream fin("tmp.txt");
istream_iterator<char> in_i(fin), eos;
//here eos is 1 over the end
string s(in_i, eos);
but this doesn't:
ifstream fin("tmp.txt");
istream_iterator<char> in_i(fin), eos(fin);
/* here eos is at the same position as in_i */
//moving eos forward
for (int i = 0; i < 20; ++i)
{
    ++eos;
}
// trying to initialize string with
// pair of iterators gives me ""
// result
string s(in_i, eos);
Thank you.

I don't think you can advance the end iterator to a suitable position: advancing the iterator means reading input, and both iterators reference the same stream, so advancing one iterator also advances the other. They both end up referencing the same position in the stream.
Unless you are willing to write or find an iterator adaptor (boost?) that performs an operation on n items referenced by some iterator, it might not be possible to initialize the string like that. Alternatively, read the value by other means and assign it to the string afterwards.
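A minimal sketch of the "read the value by other means" option, assuming C++11 or later. The helper name read_first_chars is made up for illustration; it uses operator>> so that, like istream_iterator<char>, it skips whitespace.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // only needed if you try this with std::istringstream
#include <string>

// Hypothetical helper: reads at most n non-whitespace characters from `in`
// and returns them as a string. Stops early if the stream runs out.
std::string read_first_chars(std::istream& in, std::size_t n) {
    std::string s;
    char c;
    while (s.size() < n && in >> c) {
        s.push_back(c);
    }
    return s;
}
```

With the question's file this would be called as read_first_chars(fin, 20); any std::istream works, including a std::istringstream for testing.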

istream_iterator is an input iterator, so your second code fragment isn't correct: you can't make a second pass over the stream. Please see the SGI STL reference on input iterators and pay attention to the "single pass" algorithm support (second paragraph under the "Description" heading). The first fragment doesn't try to perform two passes.
Is this explanation OK? BTW, the SGI STL reference (which I linked to) is somewhat outdated but still very usable for quick reference, in my opinion. I'd recommend bookmarking it.

The istream_iterator is a very limited iterator known as an input iterator.
See: http://www.sgi.com/tech/stl/InputIterator.html
But basically, very few guarantees are made for input iterators.
Specifically, in your case:
i == j does not imply ++i == ++j.
So your first example uses an iterator that is past the end of the stream. It is valid (as long as it is not incremented) and comparable to other iterators, so it works for reading the whole stream.

From the standard [24.5.1]/1:
[...] The constructor with no arguments istream_iterator() always constructs an end of stream input iterator object, which is the only legitimate iterator to be used for the end condition. [...] The main peculiarity of the istream iterators is the fact that ++ operators are not equality preserving, that is, i == j does not guarantee at all that ++i == ++j. Every time ++ is used a new value is read.
[24.5.1]/3
Two end-of-stream iterators are always equal. An end-of-stream iterator is not equal to a non-end-of-stream iterator. Two non-end-of-stream iterators are equal when they are constructed from the same stream.
The first paragraph states that you cannot use any but end-of-stream iterators as end conditions, so your first usage is correct and expected. The third paragraph in the chapter states that any two non-end-of-stream iterators into the same stream are guaranteed to be equal at all times. That is, the second usage is correct from the language standpoint and will provide the results you are getting.
The last part of paragraph 1, where it states that i == j does not imply ++i == ++j deals with the particular case of one last element present in the stream. After incrementing the first of the iterators (either i or j) the datum is consumed by that iterator. Advancing the other iterator will hit the end of stream and thus the two iterators will differ. In all other cases (there is more than one datum left in the stream), ++i == ++j.
Also note that the expression ++i == ++j executes two mutating operations on the same object (the stream), and thus it is not specified which of the two iterators gets the first and which gets the second datum in the stream.
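The equality rules quoted above can be checked with a small sketch. The function names are made up for illustration, and std::istringstream stands in for the question's ifstream; the first function assumes the text holds more characters than the iterators consume, so neither iterator reaches end of stream.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Builds a string from two non-end-of-stream iterators on the same stream.
// `steps` is how many times the would-be end iterator is incremented.
// Assumption: `text` is long enough that neither iterator hits end of stream.
std::string between_same_stream_iterators(const std::string& text, int steps) {
    std::istringstream in(text);
    std::istream_iterator<char> first(in), last(in);
    for (int i = 0; i < steps; ++i) {
        ++last;  // consumes input, but last still compares equal to first
    }
    return std::string(first, last);  // first == last, so the range is empty
}

// Builds a string from a stream iterator and a default-constructed
// end-of-stream iterator, as in the question's first (working) fragment.
std::string until_end_of_stream(const std::string& text) {
    std::istringstream in(text);
    std::istream_iterator<char> first(in), eos;
    return std::string(first, eos);
}
```

The first function returns an empty string no matter how far the second iterator is advanced, which matches the "" the question reports; the second reads everything up to end of stream, skipping whitespace.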

Related

Input iterator can be read repeatedly while Output Iterator can only be written once?

I was reading The C++ Programming Language 4th edition by Bjarne Stroustrup. In the iterator chapter (Chapter 31.1.2), it says:
Input iterator: We can iterate forward using ++ and read each element (repeatedly) using *.
Output iterator: We can iterate forward using ++ and write an element once only using *.
I have done many searches on whether input iterator can be read only once or repeatedly, for example:
http://www.cplusplus.com/reference/iterator/InputIterator/
https://www.geeksforgeeks.org/input-iterators-in-cpp/
and most suggest that an input iterator can be read only once. But why does the author say "repeatedly" for the input iterator? Is this correct? And if so, why can an input iterator be read repeatedly while an output iterator can be written only once? I always thought input and output iterator are completely opposite of each other.
Thanks everyone!
The book is correct; and the contradicting sources are not. There appears to be no rule that disallows reading an object more than once by indirecting through an input iterator.
The other sources may be confused by a similar limitation: once an input iterator has been incremented, all copies of the previous iterator are invalidated and may no longer be indirected through. This limitation is shared by output iterators. For example:
value = *inputIt;
value = *inputIt; // OK
copyIt = inputIt;
++inputIt;
value = *copyIt; // Not OK
The book is also correct that an output iterator does have this limitation:
*outputIt = value;
++outputIt;
*outputIt = value; // OK
*outputIt = value; // Not OK
I always thought input and output iterator are completely opposite of each other.
Many output iterators are also input iterators, so "opposite" isn't exactly very descriptive. They are partially overlapping sets of requirements. An iterator can meet both sets of requirements.
If we have *outputIt = 1; then *outputIt = 2; aren't we just assigning to the same *outputit twice?
Yes; And that's something that output iterators are not required to support.
Consider for example an output iterator that sends packets over the internet. You've written a packet, and it has been sent to the internet and received by some other machine. You can't travel back in time and decide that the packet that was sent is something different. You must move on to the next packet and send that instead.
Bjarne's quote is correct. If you have an input iterator you can do *iterator as many times as you want. If you have an output iterator you can only do *iterator once.
What they both have in common, though, is that they can only be used in single-pass algorithms. Once you increment either an input or an output iterator, an iterator to a previous position is no longer required to be dereferenceable.
That means in
while (iterator != end)
{
    if (*iterator == some_value)
        something = *iterator;
    ++iterator;
}
iterator has to be an input iterator since we dereference it twice each iteration. On the other hand
while (iterator != end)
{
    something = *iterator;
    ++iterator;
}
works for both input and output iterators, since we do only a single dereference per iteration.
You can read through an input iterator as many times as you want to. That comes from the requirement that "(void)*a, *a is equivalent to *a" [input.iterators], see table.
You can only write through an output iterator once. For *r = o, "After this operation r is not required to be dereferenceable". [output.iterators], see table. After you increment r you have a new iterator, and you can again assign through it once.
Once you combine these two into a forward iterator, the restriction on multiple assignments through the same iterator goes away.
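To make the two rules concrete, here is a small sketch (not taken from the book or the standard) of a single-pass algorithm that reads each input position twice but writes each output position exactly once. The names copy_equal_to and matches_in are hypothetical.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Copies every element equal to `value` from [first, last) to `out`.
template <class InputIt, class OutputIt, class T>
OutputIt copy_equal_to(InputIt first, InputIt last, OutputIt out, const T& value) {
    for (; first != last; ++first) {
        if (*first == value) {  // first read through the input iterator
            *out = *first;      // second read of the same position; one write
            ++out;              // increment before writing again
        }
    }
    return out;
}

// Usage with genuinely single-pass iterators on both sides:
// istream_iterator for input, back_insert_iterator for output.
std::vector<int> matches_in(const std::string& text, int value) {
    std::istringstream in(text);
    std::vector<int> result;
    copy_equal_to(std::istream_iterator<int>(in), std::istream_iterator<int>(),
                  std::back_inserter(result), value);
    return result;
}
```

Writing *out = x; twice without ++out in between is exactly what the output iterator requirements do not promise to support, so the sketch avoids it.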

Variables inside and outside a loop with/without asterisk

I am not proficient in C++ but I am converting a short script to PHP
for (auto it = First; it != Last; ++it)
{
    Result += *it;
}
From this snippet, I can speculate this simply means
Result = Result + it
where * is a reference to the pointer of the loop.
That said I see this symbol used outside of loops and in some cases I see variables without this symbol both in and outside of loops which puts holes in my theory.
Again I am trying to RTFM but I am unsure what I am searching for.
Both First and Last are iterator objects, representing a generalization of pointers in C++ Standard Library. Additionally, the two iterators reference the same collection, and Last can be reached from First by incrementing the iterator*.
Result is some sort of accumulator. If it is of numeric type, += means Result = Result + *it, where *it is whatever the iterator is pointing to. In other words, Result accumulates the total of elements of the collection between First, inclusive, and Last, exclusive. If First points to the beginning of an array and Last points to one-past-the-end of an array of numeric type, your code would be equivalent to calling PHP array_sum() on the array.
However, Result is not required to be numeric. For example, it could be a std::string, in which case += represents appending the value to the string.
* In terms of pointers and arrays this would be "pointing to the same array," and "Last points to a higher index of the array than First."
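A self-contained sketch of the loop from the question, wrapped in a function so it can be tried with both a numeric and a string accumulator. The name accumulate_range is made up; First, Last and Result are kept from the question.

```cpp
#include <string>
#include <vector>

// Adds (or appends) every element in [First, Last) to Result and returns it.
// *it yields the element the iterator currently refers to, not the iterator.
template <class Iterator, class Accumulator>
Accumulator accumulate_range(Iterator First, Iterator Last, Accumulator Result) {
    for (auto it = First; it != Last; ++it) {
        Result += *it;
    }
    return Result;
}

// Example calls:
//   std::vector<int> nums{1, 2, 3};
//   accumulate_range(nums.begin(), nums.end(), 0);              // numeric sum
//   std::vector<std::string> parts{"ab", "cd"};
//   accumulate_range(parts.begin(), parts.end(), std::string()); // concatenation
```

For the numeric case, the PHP counterpart would be array_sum(); for strings it would be closer to implode('', $parts).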
I believe your speculation is incorrect.
it, First and Last are either iterators or pointers. Iterators are C++ objects that can be used to iterate over containers. For basic usage, they behave much like pointers, and can be dereferenced the same way.
For example:
std::vector<int> myList;
...
// Search for the number 10 in the list.
std::vector<int>::iterator it = std::find(myList.begin(), myList.end(), 10);
// If the number 10 was found in the list, change the value to 11.
if (it != myList.end())
    *it = 11; //< Similar to pointer syntax.
In your specific example, the Result variable has a value added to it. To get that value, your code uses the * operator to get the value from the iterator.
The same concept applies to pointers. Although iterators and pointers are different concepts, accessing their values is very similar.

Keeping std::list iterators valid through insertion

Note: This is not a question whether I should "use list or deque". It's a question about the validity of iterators in the face of insert().
This may be a simple question and I'm just too dense to see the right way to do this. I'm implementing (for better or worse) a network traffic buffer as a std::list<char> buf, and I'm maintaining my current read position as an iterator readpos.
When I add data, I do something like
buf.insert(buf.end(), newdata.begin(), newdata.end());
My question is now, how do I keep the readpos iterator valid? If it points to the middle of the old buf, then it should be fine (by the iterator guarantees for std::list), but typically I may have read and processed all data and I have readpos == buf.end(). After the insertion, I want readpos always to point to the next unread character, which in case of the insertion should be the first inserted one.
Any suggestions? (Short of changing the buffer to a std::deque<char>, which appears to be much better suited to the task, as suggested below.)
Update: From a quick test with GCC4.4 I observe that deque and list behave differently with respect to readpos = buf.end(): After inserting at the end, readpos is broken in a list, but points to the next element in a deque. Is this a standard guarantee?
(According to cplusplus, any deque::insert() invalidates all iterators. That's no good. Maybe using a counter is better than an iterator to track a position in a deque?)
if (readpos == buf.begin())
{
    buf.insert(buf.end(), newdata.begin(), newdata.end());
    readpos = buf.begin();
}
else
{
    --readpos;
    buf.insert(buf.end(), newdata.begin(), newdata.end());
    ++readpos;
}
Not elegant, but it should work.
From http://www.sgi.com/tech/stl/List.html
"Lists have the important property that insertion and splicing do not invalidate iterators to list elements, and that even removal invalidates only the iterators that point to the elements that are removed."
Therefore, readpos should still be valid after the insert.
However...
std::list< char > is a very inefficient way to solve this problem. Each byte you store in a std::list requires a pointer to keep track of the byte, plus the size of the list node structure, two more pointers usually. That is at least 12 or 24 bytes (32 or 64-bit) of memory used to keep track of a single byte of data.
std::deque<char> is probably a better container for this. Like std::vector, it provides constant-time insertion at the back; however, it also provides constant-time removal at the front. Finally, like std::vector, std::deque is a random-access container, so you can use offsets/indexes instead of iterators. These three features make it an efficient choice.
I was indeed being dense. The standard gives us all the tools we need. Specifically, the sequence container requirements 23.2.3/9 say:
The iterator returned from a.insert(p, i, j) points to the copy of the first element inserted into a, or p if i == j.
Next, the description of list::insert says (23.3.5.4/1):
Does not affect the validity of iterators and references.
So in fact if pos is my current iterator inside the list which is being consumed, I can say:
auto it = buf.insert(buf.end(), newdata.begin(), newdata.end());
if (pos == buf.end()) { pos = it; }
The range of new elements in my list is [it, buf.end()), and the range of yet unprocessed elements is [pos, buf.end()). This works because if pos was equal to buf.end() before the insertion, then it still is after the insertion, since insertion does not invalidate any iterators, not even the end.
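Put together as a small function, this could look as follows. It is a sketch assuming C++11 or later (where the range overload of list::insert returns an iterator); the name append_and_track is made up, and newdata is taken to be a std::string purely for illustration.

```cpp
#include <list>
#include <string>

// Appends newdata to buf and keeps pos pointing at the next unread character.
// If pos was at end() (everything consumed), it is moved to the first
// newly inserted element; otherwise it is left untouched.
void append_and_track(std::list<char>& buf,
                      std::list<char>::iterator& pos,
                      const std::string& newdata) {
    auto it = buf.insert(buf.end(), newdata.begin(), newdata.end());
    if (pos == buf.end()) {
        pos = it;  // it == buf.end() when newdata is empty, so pos stays at end
    }
}
```

The same call covers an empty buffer, a fully consumed buffer, and a read position in the middle of the old data.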
list<char> is a very inefficient way to store a string. It is probably 10-20 times larger than the string itself, plus you are chasing a pointer for every character...
Have you considered using std::deque<char> instead?
[edit]
To answer your actual question, adding and removing elements does not invalidate iterators in a list... But end() is still going to be end(). So you would need to check for that as a special case at the point where you insert the new element in order to update your readpos iterator.

How can I stop iterating "n" before the end of a map when the iterators aren't random-access?

I would like to traverse a map in C++ with iterators but not all the way to the end.
The problem is that even if we can do basic operations with iterators, we cannot add or compare iterators with integers.
How can I write the following instructions? (final is a map; window, an integer)
for (it=final.begin(); it!=final.end()-window; it++)
You cannot subtract from a map iterator directly, because it is an expensive operation (in practice doing --iter the required number of times). If you really want to do it anyway, you can use the standard library function 'advance'.
map<...>::iterator end = final.end();
std::advance(end, -window);
That will give you the end of your window.
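As a fuller sketch of the same idea, assuming C++11 or later: std::prev does the backwards stepping in one call, and a size check guards against a window larger than the map, where stepping back from end() would be undefined. The function name for_each_except_last is hypothetical.

```cpp
#include <cstddef>
#include <iterator>
#include <map>

// Calls f on every element of m except the last `window` ones.
// Does nothing if the map has `window` elements or fewer.
template <class Map, class F>
void for_each_except_last(Map& m, std::size_t window, F f) {
    if (window >= m.size()) {
        return;  // stepping back further than begin() would be undefined
    }
    auto stop = std::prev(
        m.end(), static_cast<typename Map::difference_type>(window));
    for (auto it = m.begin(); it != stop; ++it) {
        f(*it);  // *it is a std::pair<const Key, Value>&
    }
}
```

Stepping back still costs one decrement per element of the window, since map iterators are only bidirectional.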
std::map<T1, T2>::iterator it = final.begin();
for (int i = 0; i < final.size()-window; ++i, ++it)
{
    // TODO: add your normal loop body
}
Replace T1 and T2 with the actual types of the keys and values of the map.
Why don't you make 'it' an iterator as well ?
See the example here : http://www.cplusplus.com/reference/stl/map/begin/
Another solution:
size_t count=final.size();
size_t processCount=(window<count?count-window:0);
for (it=final.begin(); processCount && it!=final.end(); ++it, --processCount)
{
    // loop body
}
This one is a bit safer:
It takes care of the case when your map is actually smaller than the value of window.
It will process at most processCount elements, even if you change the size of your map inside your loop (e.g. add new elements)
Older versions of the standard allowed size() to take O(n) time, although usual implementations compute it in O(1); since C++11 it is required to be constant time. To be on the safe side, it is better not to call size() many times if it is not necessary.
'end()' on the other hand has amortized constant time, so it should be OK to have it in the for-loop condition
++it may be faster than it++. The post-increment operator creates a temporary object, while pre-increment does not. When the variable is a simple integral type, the compiler can optimise the temporary away, but with iterators that is not always the case.

C++ Tokenizing using iterators in an eof() cycle

I'm trying to adapt this answer
How do I tokenize a string in C++?
to my current string problem which involves reading from a file till eof.
from this source file:
Fix grammatical or spelling errors
Clarify meaning without changing it
Correct minor mistakes
I want to create a vector with all the tokenized words. Example: vector<string> allTheText[0] should be "Fix"
I don't understand the purpose of istream_iterator<std::string> end; but I included it because it was in the original poster's answer.
So far, I've got this non-working code:
vector<string> allTheText;
stringstream strstr;
istream_iterator<std::string> end;
istream_iterator<std::string> it(strstr);
while (!streamOfText.eof()){
    getline (streamOfText, readTextLine);
    cout<<readTextLine<<endl;
    stringstream strstr(readTextLine);
    // how should I initialize the iterators it and end here?
}
Edit:
I changed the code to
vector<string> allTheText;
stringstream strstr;
istream_iterator<std::string> end;
istream_iterator<std::string> it(strstr);
while (getline(streamOfText, readTextLine)) {
    cout << readTextLine << endl;
    vector<string> vec((istream_iterator<string>(streamOfText)), istream_iterator<string>()); // generates RuntimeError
}
And got a RuntimeError, why?
Using a while (!….eof()) loop in C++ is broken because the loop will never be exited when the stream goes into an error state!
Rather, you should test the stream's state directly. Adapted to your code, this could look like this:
while (getline(streamOfText, readTextLine)) {
    cout << readTextLine << endl;
}
However, you already have a stream. Why put it into a string stream as well? Or do you need to do this line by line for any reason?
You can directly initialize your vector with the input iterators. No need to build a string stream, and no need to use the copy algorithm either because there's an appropriate constructor overload.
vector<string> vec((istream_iterator<string>(cin)), istream_iterator<string>());
Notice the extra parentheses around the first argument which are necessary to disambiguate this from a function declaration.
EDIT: A small explanation of what this code does:
C++ offers a unified way of specifying ranges. A range is just a collection of typed values, without going into details about how these values are stored. In C++, these ranges are denoted as half-open intervals [a, b[. That means that a range is delimited by two iterators (which are kind of like pointers but more general; pointers are a special kind of iterator). The first iterator, a, points to the first element of the range. The second, b, points behind the last element. Why behind? Because this makes it very easy to iterate over the elements:
for (Iterator i = a; i != b; ++i)
    cout << *i;
Like pointers, iterators are dereferenced by applying * to them. This returns their value.
Container classes in C++ (e.g. vector, list) have a special constructor which allows easy copying of values from another range into the new container. Consequently, this constructor expects two iterators. For example, the following copies the C-style array into the vector:
int values[3] = { 1, 2, 3 };
vector<int> v(values, values + 3);
Here, values is synonymous with &values[0] which means that it points to the array's first element. values + 3, thanks to pointer arithmetic, is nearly equivalent to &values[3] (but this is invalid C++!) and points to the virtual element behind the array.
Now, my code above does the exact same as in this last example. The only difference is the type of iterator I use. Instead of using a plain pointer, I use a special iterator class that C++ provides. This iterator class wraps an input stream in such a way that ++ advances the input stream and * reads the next element from the stream. The kind of element is specified by the type argument (hence string in this case).
To make this work as a range, we need to specify a beginning and an end. Alas, we don't know the end of the input (this is logical, since the end of the stream may actually move over time as the user enters more input into a console!). Therefore, to create a virtual end iterator, we pass no argument to the constructor of istream_iterator. Conversely, to create a begin iterator, we pass an input stream. This then creates an iterator that points to the current position in the stream (here, cin).
My above code is functionally equivalent to the following:
istream_iterator<string> front(cin);
istream_iterator<string> back;
vector<string> vec;
for (istream_iterator<string> i = front; i != back; ++i)
    vec.push_back(*i);
and this, in turn, is equivalent to using the following loop:
string word;
while (cin >> word)
    vec.push_back(word);
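If the line-by-line structure from the question really is needed, one possible shape (a sketch, not the only way) is to build a fresh string stream per line and append its tokens to the vector. The function name tokenize_lines is made up; any std::istream, such as the question's streamOfText, can be passed in.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Reads `in` line by line and returns all whitespace-separated words.
std::vector<std::string> tokenize_lines(std::istream& in) {
    std::vector<std::string> allTheText;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream lineStream(line);  // one fresh stream per line
        // The begin iterator wraps the line's stream; the default-constructed
        // iterator is the end-of-stream marker.
        allTheText.insert(allTheText.end(),
                          std::istream_iterator<std::string>(lineStream),
                          std::istream_iterator<std::string>());
    }
    return allTheText;
}
```

For the three-line source file in the question, allTheText[0] would be "Fix". Note that the iterators are created inside the loop, on the per-line stream, rather than once up front on an empty stringstream.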