Converting from vector<string> to vector<double> without std::stod - c++

It started with converting a vector<string> to vector<double>. I use gcc without C++11, so I could not use this approach. After having a look at the algorithm template of transpose, I tried to use the template instead, but there is a bug for std::stod used inside the template (for gcc 4.x, which results in ‘stod’ is not a member of ‘std’). So why not write it yourself and I use while-loop with stringstream (expensive?):
int i = 0;
while (svec.begin() != svec.end()) {
stringstream(*svec.begin())>> dvec[i] ;//svec: strings, dvec: doubles
//dvec[i] = std::stod(*svec.begin());
++svec.begin(); i++;
}
Unfortunately I get: double free or corruption (out): 0x00007f4788000ba0 ***

I think the issue here is that
++svec.begin();
doesn't work the way you think it does. This doesn't advance forward the beginning of the vector. Instead, it gets a (temporary) iterator to the beginning of the vector, then increments that. As a result, you'll be stuck in an infinite loop. To fix this, use a more traditional "loop over a container" loop either by using a range-based for loop or just counting up the indices.
I also noticed that you're writing into dvec by index. Without seeing the code to initialize dvec, I can't be sure of whether this is safe, since if you didn't already resize the vector this will write off the end and lead to undefined behavior. Even if you did set it up properly, because of the broken loop, my guess is that this eventually writes off the end of the vector and is what directly triggers the issue.

First of all, let's clear up some confusion:
std::stod is a C++11 function, so if you do not use C++11, then it is not a "bug" if it cannot be found.
You mean std::transform and not transpose, don't you?
You may very well use std::transform without C++11 and without std::stod. Unfortunately you don't show the code with which you tried.
So why not write it yourself and I use while-loop with stringstream
Yes, why not?
(expensive?):
Measure it :)
But it's unlikely you'll notice a difference.
int i = 0;
while (svec.begin() != svec.end()) {
The loop condition does not make sense. begin() and end() do not change. This line effectively reads as "do this stuff as long as the vector does not become empty".
stringstream(*svec.begin())>> dvec[i] ;//svec: strings, dvec: doubles
//dvec[i] = std::stod(*svec.begin());
++svec.begin(); i++;
}
I'd say you are overthinking this. The crash you get may come from the fact that you access dvec[i] while dvec is still empty, and you never actually add elements to it. That's undefined behaviour.
It's really as simple as "loop through string vector, use a stringstream on each element to get a double value, and add that value to the result vector", expressed in C++03 as:
// loop through string vector
for (std::vector<std::string>::const_iterator iter = svec.begin(); iter != svec.end(); ++iter)
{
std::string const& element = *iter;
// use a stringstream to get a double value:
std::istringstream is(element);
double result;
is >> result;
// add the double value to the result vector:
dvec.push_back(result);
}
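As a sketch of the std::transform route mentioned above, the same stringstream conversion can be wrapped in a plain function and used without any C++11 features. The names toDouble and toDoubles are made up for this illustration, and mapping an unparsable string to 0.0 is an assumption, not something the question specifies:

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse one string with a stringstream.
// Returns 0.0 if the string does not start with a number (an assumption).
double toDouble(const std::string& text)
{
    std::istringstream is(text);
    double value = 0.0;
    if (!(is >> value))
        return 0.0;
    return value;
}

// Hypothetical helper: convert every element, C++03 style.
std::vector<double> toDoubles(const std::vector<std::string>& svec)
{
    std::vector<double> dvec;
    dvec.reserve(svec.size());
    // back_inserter appends, so dvec does not need to be pre-sized.
    std::transform(svec.begin(), svec.end(), std::back_inserter(dvec), toDouble);
    return dvec;
}
```

Because the output goes through std::back_inserter, there is no indexing into an empty vector, which is the likely source of the original crash.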

If you don't have stod (which was added in C++11) you can use strtod and apply it to a C-string that corresponds to the C++ string that you have. Copying from the approach that you linked to and making that change (and cleaning up the formatting):
std::vector<double> convertStringVectortoDoubleVector(
const std::vector<std::string>& stringVector){
std::vector<double> doubleVector(stringVector.size());
std::transform(stringVector.begin(), stringVector.end(),
doubleVector.begin(), [](const std::string& val)
{
return strtod(val.c_str(), 0);
});
return doubleVector;
}
Note that if the code hasn't already ensured that each string holds a valid representation of a floating-point value you'll have to add code to check whether strtod succeeded.
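One possible shape for that check, as a minimal sketch: strtod reports where parsing stopped through its second argument and signals range errors through errno. The helper name parseDouble and the choice to reject trailing characters are assumptions made for this example:

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns true and stores the value in `out` only if
// the whole string is a valid, in-range floating-point number.
bool parseDouble(const std::string& text, double& out)
{
    const char* begin = text.c_str();
    char* end = 0;
    errno = 0;
    const double value = std::strtod(begin, &end);
    if (end == begin)      // no characters were converted
        return false;
    if (*end != '\0')      // trailing characters after the number
        return false;
    if (errno == ERANGE)   // overflow or underflow
        return false;
    out = value;
    return true;
}
```

This compiles as C++03, so it can be called from a plain loop or from a function passed to std::transform.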

Related

Insert to beginning of copied vector

I have a std::vector<double> that's the output from a simulation code. The size can be anywhere from O(10^1) to O(10^4). I need to create a new vector that's a copy of this vector with an additional element at the beginning, so I can either write:
// old_vec is some std::vector<double> from a simulation code
auto new_vec = old_vec;
double val = 1.0;
new_vec.insert(new_vec.begin(), val);
or
std::vector<double> new_vec{val};
new_vec.insert(new_vec.end(), old_vec.begin(), old_vec.end());
I believe the first approach will cause a reallocation due to the insertion at the beginning of a vector, whereas the second one will just append everything to the end, so the latter seems better? Is there any guarantee that the compiler may optimize the first code into the second code?
I wouldn't trust directly using the "=" operator to copy the vector, but more of a combination between your two methods. List-initialization may be safer first, then use insert() to add the first element:
vector <double> new_vec = {old_vec.begin(), old_vec.end()};
new_vec.insert(new_vec.begin(), val);
Your suspicions of problems may vary across different compilers, so you may or may not get an error. However, if you would like a foolproof way, that would be outright inserting and copying:
vector <double> new_vec; new_vec.push_back(val);
for (double i : old_vec) { new_vec.push_back(i); }
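For comparison, here is a sketch of the second approach from the question with an explicit reserve, so the final size is requested before any element is written. The name prependCopy is invented here, and whether this is measurably faster than copy-then-insert for vectors of this size would need to be measured:

```cpp
#include <vector>

// Hypothetical helper: returns a copy of old_vec with val placed in front.
std::vector<double> prependCopy(double val, const std::vector<double>& old_vec)
{
    std::vector<double> new_vec;
    new_vec.reserve(old_vec.size() + 1);   // request the full size up front
    new_vec.push_back(val);
    // Append the old elements after the new first element.
    new_vec.insert(new_vec.end(), old_vec.begin(), old_vec.end());
    return new_vec;
}
```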

Prefer Iterators Over Pointers?

This question is a bump of a question that had a comment here but was deleted as part of the bump.
For those of you who can't see deleted posts, the comment was on my use of const char*s instead of string::const_iterators in this answer: "Iterators may have been a better path from the get go, since it appears that is exactly how your pointers seems be treated."
So my question is this: do string::const_iterators hold any intrinsic value over const char*s, such that switching my answer over to string::const_iterators makes sense?
Introduction
There are many perks of using iterators instead of pointers, among them are:
different code-path in release vs debug, and;
better type-safety, and;
making it possible to write generic code (iterators can be made to work with any data-structure, such as a linked-list, whereas intrinsic pointers are very limited in this regard).
Debugging
Since, among other things, dereferencing an iterator that is past the end of a range is undefined behavior, an implementation is free to do whatever it feels necessary in such a case, including raising diagnostics saying that you are doing something wrong.
The standard library implementation provided by gcc, libstdc++, will issue diagnostics when it detects something faulty (if Debug Mode is enabled).
Example
#define _GLIBCXX_DEBUG 1 /* enable debug mode */
#include <vector>
#include <iostream>
int
main (int argc, char *argv[])
{
std::vector<int> v1 {1,2,3};
for (auto it = v1.begin (); ; ++it)
std::cout << *it;
}
/usr/include/c++/4.9.2/debug/safe_iterator.h:261:error: attempt to
dereference a past-the-end iterator.
Objects involved in the operation:
iterator "this" @ 0x0x7fff828696e0 {
type = N11__gnu_debug14_Safe_iteratorIN9__gnu_cxx17__normal_iteratorIPiNSt9__cxx19986vectorIiSaIiEEEEENSt7__debug6vectorIiS6_EEEE (mutable iterator);
state = past-the-end;
references sequence with type `NSt7__debug6vectorIiSaIiEEE' @ 0x0x7fff82869710
}
123
The above would not happen if we were working with pointers, no matter if we are in debug-mode or not.
If we don't enable debug mode for libstdc++, a more performance-friendly implementation (without the added bookkeeping) will be used, and no diagnostics will be issued.
(Potentially) better Type Safety
Since the actual type of an iterator is implementation-defined, this could be used to increase type safety, but you will have to check the documentation of your implementation to see whether this is the case.
Consider the below example:
#include <vector>
struct A { };
struct B : A { };
// .-- oops
// v
void it_func (std::vector<B>::iterator beg, std::vector<A>::iterator end);
void ptr_func (B * beg, A * end);
// ^-- oops
int
main (int argc, char *argv[])
{
std::vector<B> v1;
it_func (v1.begin (), v1.end ()); // (A)
ptr_func (v1.data (), v1.data () + v1.size ()); // (B)
}
Elaboration
(A) could, depending on the implementation, be a compile-time error since std::vector<A>::iterator and std::vector<B>::iterator potentially aren't the same type.
(B) would, however, always compile since there's an implicit conversion from B* to A*.
Iterators are intended to provide an abstraction over pointers.
For example, incrementing an iterator always manipulates the iterator so that if there's a next item in the collection, it refers to that next item. If it already referred to the last item in the collection, after the increment it'll be a unique value that can't be dereferenced, but will compare equal to another iterator pointing one past the end of the same collection (usually obtained with collection.end()).
In the specific case of an iterator into a string (or a vector), a pointer provides all the capabilities required of an iterator, so a pointer can be used as an iterator with no loss of required functionality.
For example, you could use std::sort to sort the items in a string or a vector. Since pointers provide the required capabilities, you can also use it to sort items in a native (C-style) array.
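A small sketch of that point: the same std::sort call shape works with raw pointers into a native array and with the iterators of a vector or string. The three wrapper names are made up for the illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Sorts a native array through raw pointers used as iterators.
void sortArray(int* values, std::size_t count)
{
    std::sort(values, values + count);
}

// Sorts a vector through its own iterators; same algorithm, same call shape.
void sortVector(std::vector<int>& values)
{
    std::sort(values.begin(), values.end());
}

// Sorts the characters of a string in place.
void sortString(std::string& text)
{
    std::sort(text.begin(), text.end());
}
```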
At the same time, yes, defining (or using) an iterator that's separate from a pointer can provide extra capabilities that aren't strictly required. Just for example, some iterators provide at least some degree of checking, to assure that (for example) when you compare two iterators, they're both iterators into the same collection, and that you aren't attempting an out of bounds access. A raw pointer can't (or at least normally won't) provide this kind of capability.
Much of this comes back to the "don't pay for what you don't use" mentality. If you really only need and want the capabilities of native pointers, they can be used as iterators, and you'll normally get code that's essentially identical to what you'd get by directly manipulating pointers. At the same time, for cases where you do want extra capabilities, such as traversing a threaded RB-tree or a B+ tree instead of a simple array, iterators allow you to do that while maintaining a single, simple interface. Likewise, for cases where you don't mind paying extra (in terms of storage and/or run-time) for extra safety, you can get that too (and it's decoupled from things like the individual algorithm, so you can get it where you want it without being forced to use it in other places that may, for example, have too critical of timing requirements to support it).
In my opinion, many people kind of miss the point when it comes to iterators. Many people happily rewrite something like:
for (size_t i=0; i<s.size(); i++)
...into something like:
for (std::string::iterator i = s.begin(); i != s.end(); i++)
...and act as if it's a major accomplishment. I don't think it is. For a case like this, there's probably little (if any) gain from replacing an integer type with an iterator. Likewise, taking the code you posted and changing char const * to std::string::iterator seems unlikely to accomplish much (if anything). In fact, such conversions often make the code more verbose and less understandable, while gaining nothing in return.
If you were going to change the code, you should (in my opinion) do so in an attempt at making it more versatile by making it truly generic (which std::string::iterator really isn't going to do).
For example, consider your split (copied from the post you linked):
vector<string> split(const char* start, const char* finish){
const char delimiters[] = ",(";
const char* it;
vector<string> result;
do{
for (it = find_first_of(start, finish, begin(delimiters), end(delimiters));
it != finish && *it == '(';
it = find_first_of(extractParenthesis(it, finish) + 1, finish, begin(delimiters), end(delimiters)));
auto&& temp = interpolate(start, it);
result.insert(result.end(), temp.begin(), temp.end());
start = ++it;
} while (it <= finish);
return result;
}
As it stands, this is restricted to being used on narrow strings. If somebody wants to work with wide strings, UTF-32 strings, etc., it's relatively difficult to get it to do that. Likewise, if somebody wanted to match '[' or '{' instead of '(', the code would need to be rewritten for that as well.
If there were a chance of wanting to support various string types, we might want to make the code more generic, something like this:
template <class InIt, class OutIt, class charT>
void split(InIt start, InIt finish, charT paren, charT comma, OutIt result) {
typedef typename std::iterator_traits<OutIt>::value_type o_t;
charT delimiters[] = { comma, paren };
InIt it;
do{
for (it = find_first_of(start, finish, begin(delimiters), end(delimiters));
it != finish && *it == paren;
it = find_first_of(extractParenthesis(it, finish) + 1, finish, begin(delimiters), end(delimiters)));
auto&& temp = interpolate(start, it);
*result++ = o_t{temp.begin(), temp.end()};
start = ++it;
} while (it != finish);
}
This hasn't been tested (or even compiled) so it's really just a sketch of a general direction you could take the code, not actual, finished code. Nonetheless, I think the general idea should at least be apparent--we don't just change it to "use iterators". We change it to be generic, and iterators (passed as template parameters, with types not directly specified here) are only a part of that. To get very far, we also eliminated hard-coding the paren and comma characters. Although not strictly necessary, I also changed the parameters to fit more closely with the convention used by standard algorithms, so (for example) output is also written via an iterator rather than being returned as a collection.
Although it may not be immediately apparent, the latter does add quite a bit of flexibility. Just for example, if somebody just wanted to print out the strings after splitting them, he could pass an std::ostream_iterator, to have each result written directly to std::cout as it's produced, rather than getting a vector of strings, and then having to separately print them out.

How do I use an iterator on an ifstream twice in C++?

I'm new to C++ and I'm confused about using iterators with ifstream. In this following code I have an ifstream variable called dataFile.
In the code I first iterate through the file once to count how many characters it has (is there a more efficient way to do this?). Then I create a matrix of that size, and iterate through again to fill the matrix.
The problem is that the iterator refuses to iterate the second time around, and will not do anything. I tried resetting the ifstream from the beginning by using dataFile.clear(), but this didn't work, probably because I have some deep misunderstanding about iterators. Could someone help me please?
typedef istreambuf_iterator<char> dataIterator;
for (dataIterator counter(dataFile), end; counter != end; ++counter, ++numCodons) {} // Finds file size in characters.
MatrixXd YMatrix = MatrixXd::Constant(3, numCodons, 0);
dataFile.clear(); // Resets the ifstream to be used again.
for (dataIterator counter(dataFile), end; counter != end; ++counter) {...}
istreambuf_iterator is an input iterator: once it has been incremented, all copies of its previous value may be invalidated. It is not a forward iterator, which would guarantee validity when used in multipass algorithms. For more, see the documentation on iterator categories.

std::map::erase infinite loop

I have a map of a vector of char's and a vector of strings. Every so often, if I've seen the vector of characters before, I'd like to add a string to my vector of strings. Below is my code to do that.
map<vector<char>, vector<string>>::iterator myIter = mMyMap.find(vChars);
if(myIter != mMyMap.end()) {
vector<string> vStrings = myIter->second;
mMyMap.erase(myIter);
vStrings.push_back(some_other_string);
mMyMap.insert(pair<vector<char>, vector<string>>(vChars, vStrings));
return TRUE;
}
The call to mMyMap.erase() seems to get stuck in an infinite loop though. I'm guessing it's because vStrings isn't getting a deep-copy of myIter->second.
Do I need to initialize vStrings like:
vector<string> vStrings(myIter->second);
Or what's the proper fix?
I don't see an error in the posted code fragment (other than a missing )). But may I suggest simplifying lines 2-8 to:
if(myIter != mMyMap.end()) {
myIter->second.push_back(some_other_string);
}
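Spelled out as a self-contained sketch, the in-place version might look like this. The typedef CharsToStrings and the function name appendIfSeen are made up for the example; the map type follows the question:

```cpp
#include <map>
#include <string>
#include <vector>

typedef std::map<std::vector<char>, std::vector<std::string> > CharsToStrings;

// Hypothetical helper: appends `text` to the entry for `key` if it exists.
// Returns true if the key was found.
bool appendIfSeen(CharsToStrings& table,
                  const std::vector<char>& key,
                  const std::string& text)
{
    CharsToStrings::iterator it = table.find(key);
    if (it == table.end())
        return false;
    it->second.push_back(text);   // modifies the stored vector in place
    return true;
}
```

No element is erased or re-inserted, so the mapped vector is never copied and the iterator stays valid throughout.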
vector<string> vStrings = myIter->second;
and
vector<string> vStrings(myIter->second);
are the same thing. They both call the copy constructor, and the copy is always a deep copy. My guess is that the vector being copied is very large: each element is copied one by one, and that takes time.

C++ Tokenizing using iterators in an eof() cycle

I'm trying to adapt this answer
How do I tokenize a string in C++?
to my current string problem which involves reading from a file till eof.
from this source file:
Fix grammatical or spelling errors
Clarify meaning without changing it
Correct minor mistakes
I want to create a vector with all the tokenized words. Example: vector<string> allTheText[0] should be "Fix"
I don't understand the purpose of istream_iterator<std::string> end; but I included it because it was in the original poster's answer.
So far, I've got this non-working code:
vector<string> allTheText;
stringstream strstr;
istream_iterator<std::string> end;
istream_iterator<std::string> it(strstr);
while (!streamOfText.eof()){
getline (streamOfText, readTextLine);
cout<<readTextLine<<endl;
stringstream strstr(readTextLine);
// how should I initialize the iterators it and end here?
}
Edit:
I changed the code to
vector<string> allTheText;
stringstream strstr;
istream_iterator<std::string> end;
istream_iterator<std::string> it(strstr);
while (getline(streamOfText, readTextLine)) {
cout << readTextLine << endl;
vector<string> vec((istream_iterator<string>(streamOfText)), istream_iterator<string>()); // generates RuntimeError
}
And got a RuntimeError, why?
Using a while (!….eof()) loop in C++ is broken because the loop will never be exited when the stream goes into an error state!
Rather, you should test the stream's state directly. Adapted to your code, this could look like this:
while (getline(streamOfText, readTextLine)) {
cout << readTextLine << endl;
}
However, you already have a stream. Why put it into a string stream as well? Or do you need to do this line by line for any reason?
You can directly initialize your vector with the input iterators. No need to build a string stream, and no need to use the copy algorithm either because there's an appropriate constructor overload.
vector<string> vec((istream_iterator<string>(cin)), istream_iterator<string>());
Notice the extra parentheses around the first argument which are necessary to disambiguate this from a function declaration.
EDIT A small explanation what this code does:
C++ offers a unified way of specifying ranges. A range is just a collection of typed values, without going into details about how these values are stored. In C++, these ranges are denoted as half-open intervals [a, b[. That means that a range is delimited by two iterators (which are kind of like pointers but more general; pointers are a special kind of iterator). The first iterator, a, points to the first element of the range. The second, b, points behind the last element. Why behind? Because this makes it very easy to iterate over the elements:
for (Iterator i = a; i != b; ++i)
cout << *i;
Like pointers, iterators are dereferenced by applying * to them. This returns their value.
Container classes in C++ (e.g. vector, list) have a special constructor which allows easy copying of values from another range into the new container. Consequently, this constructor expects two iterators. For example, the following copies the C-style array into the vector:
int values[3] = { 1, 2, 3 };
vector<int> v(values, values + 3);
Here, values is synonymous with &values[0] which means that it points to the array's first element. values + 3, thanks to pointer arithmetic, is nearly equivalent to &values[3] (but this is invalid C++!) and points to the virtual element behind the array.
Now, my code above does the exact same as in this last example. The only difference is the type of iterator I use. Instead of using a plain pointer, I use a special iterator class that C++ provides. This iterator class wraps an input stream in such a way that ++ advances the input stream and * reads the next element from the stream. The kind of element is specified by the type argument (hence string in this case).
To make this work as a range, we need to specify a beginning and an end. Alas, we don't know the end of the input (this is logical, since the end of the stream may actually move over time as the user enters more input into a console!). Therefore, to create a virtual end iterator, we pass no argument to the constructor of istream_iterator. Conversely, to create a begin iterator, we pass an input stream. This then creates an iterator that points to the current position in the stream (here, cin).
My above code is functionally equivalent to the following:
istream_iterator<string> front(cin);
istream_iterator<string> back;
vector<string> vec;
for (istream_iterator<string> i = front; i != back; ++i)
vec.push_back(*i);
and this, in turn, is equivalent to using the following loop:
string word;
while (cin >> word)
vec.push_back(word);