A common piece of code I use for simple string splitting looks like this:
inline std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while(std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}
Someone mentioned that this will silently "swallow" errors occurring in std::getline, and of course I agree that's the case. But it occurred to me: what could possibly go wrong here in practice that I would need to worry about? Basically, it all boils down to this:
inline std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while(std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    if(/* what error can I catch here? */) {
        // *** How did we get here!? ***
    }
    return elems;
}
A stringstream is backed by a string, so we don't have to worry about any of the issues associated with reading from a file. There is no type conversion going on here, since getline simply reads until it sees the delimiter or EOF. So we can't get any of the errors that something like boost::lexical_cast has to worry about.
I simply can't think of something besides failing to allocate enough memory that could go wrong, but that'll just throw a std::bad_alloc well before the std::getline even takes place. What am I missing?
I can't imagine what errors this person thinks might happen, and you should ask them to explain. Nothing can go wrong except allocation errors, as you mentioned, which are thrown and not swallowed.
The only thing I see that you're directly missing is that ss.fail() is guaranteed to be true after the while loop, because that's the condition being tested. (bool(stream) is equivalent to !stream.fail(), not stream.good().) As expected, ss.eof() will also be true, indicating the failure was due to EOF.
However, there might be some confusion over what is actually happening. Because getline uses delim-terminated fields rather than delim-separated fields, input data such as "a\nb\n" has two instead of three fields, and this might be surprising. For lines this makes complete sense (and is POSIX standard), but how many fields, with a delim of '-', would you expect to find in "a-b-" after splitting?
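To make the two points above concrete, here is a minimal sketch that runs the same getline loop and then records the stream state. The SplitReport struct and the split_and_report name are made up for this illustration; they are not part of the original code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type: the fields found plus the stream flags after the loop.
struct SplitReport {
    std::vector<std::string> fields;
    bool fail;
    bool eof;
    bool bad;
};

// Same loop as the question's split, but the final stream state is kept for inspection.
inline SplitReport split_and_report(const std::string& s, char delim) {
    SplitReport r;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        r.fields.push_back(item);
    }
    r.fail = ss.fail();  // true whenever the loop has ended: it is the loop condition
    r.eof = ss.eof();    // true here because the failure was caused by running out of input
    r.bad = ss.bad();    // would indicate a real I/O problem; not expected for a string buffer
    return r;
}
```

With the input "a-b-" and a delim of '-', this should report two fields ("a" and "b"), with fail and eof set and bad clear, which is the delim-terminated behavior described above.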
Incidentally, here's how I'd write split:
template<class OutIter>
OutIter split(std::string const& s, char delim, OutIter dest) {
    std::string::size_type begin = 0, end;
    while ((end = s.find(delim, begin)) != s.npos) {
        *dest++ = s.substr(begin, end - begin);
        begin = end + 1;
    }
    *dest++ = s.substr(begin);
    return dest;
}
This avoids all of the problems with iostreams in the first place, avoids extra copies (the stringstream's backing string; plus the temp returned by substr can even use a C++0x rvalue reference for move semantics if supported, as written), has the behavior I expect from split (different from yours), and works with any container.
deque<string> c;
split("a-b-", '-', back_inserter(c));
// c == {"a", "b", ""}
Related
I'd like to have a std::getline function which is able to stop if it encounters any of the characters listed in a string, so I came up with the following:
std::istream& read_until(std::istream& is, std::string& s, const std::string& list) {
    s.clear();
    while (is.peek() && is && list.find(is.peek()) == list.npos) {
        s += is.get();
    }
    return is;
}
The fact that it leaves the terminating character on the stream is the desired behavior. This works, but it's ugly and doesn't feel the right way to go. I'd like to ask if you see any clear mistake or if you have a better way of handling this.
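One possible way to tidy this up, offered as a sketch rather than a definitive answer: peek once per iteration and compare the result against traits_type::eof() explicitly, so that hitting the end of the input never appends a stray character. The parameter name stops is my own; the behavior of leaving the terminating character on the stream is kept.

```cpp
#include <istream>
#include <sstream>  // only needed for the usage shown in the comment below
#include <string>

// Reads characters into s until one of the characters in stops is next on the
// stream, or until the input is exhausted. The stop character is not consumed.
std::istream& read_until(std::istream& is, std::string& s, const std::string& stops) {
    typedef std::istream::traits_type traits;
    s.clear();
    for (;;) {
        const traits::int_type c = is.peek();
        if (traits::eq_int_type(c, traits::eof())) {
            break;  // end of input (or a stream error): nothing more to read
        }
        if (stops.find(traits::to_char_type(c)) != std::string::npos) {
            break;  // next character is a stop character: leave it on the stream
        }
        s += traits::to_char_type(is.get());
    }
    return is;
}

// Usage, assuming an in-memory stream:
//   std::istringstream in("key=value;rest");
//   std::string token;
//   read_until(in, token, "=;");   // token holds "key", '=' is still on the stream
```

Note that a peek at the end of the input sets eofbit, so callers that test the returned stream should decide whether an empty final token counts as success.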
Is there a convenient way to parse an integer from a string::iterator in c++? For this specific question I only care about nonnegative base 10 integers, but all of these solutions can be pretty easily extended to arbitrary integers. Note, unlike similar questions I don't have a reference to the original string, only an iterator, e.g.
int parse_next_int(std::string::iterator begin, std::string::iterator end) {
    // ...
}
I can think of a number of ways, but none are great. Another note, I'm not declaring stl headers, and I'm assuming everything is done in the std namespace. Hopefully this won't make the examples too difficult to parse.
Allocate a new string, and then call stoi:
int parse_next_int(string::iterator begin, string::iterator end) {
    string::iterator num_end = find_if(
        begin, end, [](char c)->bool{return !isdigit(c);});
    string to_parse(begin, num_end);
    return stoi(to_parse);
}
The downside of this is that I end up allocating a new buffer for something that could presumably be parsed on the fly.
Treat unsafely as a c string.
int parse_next_int(std::string::iterator begin, std::string::iterator end) {
    return atoi(&(*begin));
}
This will somewhat work, but if it hits the end of the string and it's not null-terminated (which isn't guaranteed with C++ strings), it will segfault, so while nice and concise, this is probably the worst.
Write it myself:
int parse_next_int(std::string::iterator begin, std::string::iterator end) {
    int result = 0;
    while (begin != end && isdigit(*begin)) {
        result = result * 10 + (*begin++ - '0');
    }
    return result;
}
This works and is simple, but it's also heavily problem dependent and not very error tolerant.
Is there some significantly different method that mostly relies on more tolerant stl calls, while still being simple and avoids copying unnecessary buffers?
If you have access to boost you could use:
int parse_next_int(std::string::iterator begin, std::string::iterator end) {
    return boost::lexical_cast<int>(&(*begin), std::distance(begin, end));
}
Create a std::string from the iterators.
Create a std::istringstream from the string.
Extract the integer from the istringstream.
int parse_next_int(std::string::iterator begin, std::string::iterator end) {
    std::string s(begin, end);
    std::istringstream str(s);
    int i;
    str >> i;
    return i;
}
PS Add error handling code to make it production worthy.
Don't use atoi, it causes undefined behaviour if the number would exceed INT_MAX. Your option 3 has the same problem.
My suggestion is:
Find the end of the number, using find_if or strchr or whatever other method; allow for leading - or + if you want.
Null-terminate the substring
Use strtol to convert, with code to handle all the overflow cases.
Regarding the null termination, you could choose one of the following:
Copy to an automatic array (easiest option).
If end is not actually the end of the string, then write a temporary null terminator there, and restore the old character afterwards.
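The steps above could be sketched roughly as follows, using the "copy to an automatic array" option. This is an illustration under assumptions of my own: only nonnegative base 10 digits are accepted, the buffer size of 32 is arbitrary, and errors are reported by throwing standard exceptions, which the original suggestion does not specify.

```cpp
#include <algorithm>
#include <cerrno>
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

int parse_next_int(std::string::iterator begin, std::string::iterator end) {
    // Step 1: find the end of the leading run of digits.
    std::string::iterator num_end =
        std::find_if(begin, end, [](char c) { return c < '0' || c > '9'; });

    // Step 2: copy the digits into an automatic array and null-terminate it.
    char buf[32];  // assumed bound; longer digit runs are rejected below
    const std::size_t len = static_cast<std::size_t>(num_end - begin);
    if (len == 0) {
        throw std::invalid_argument("parse_next_int: no digits");
    }
    if (len >= sizeof buf) {
        throw std::out_of_range("parse_next_int: too many digits");
    }
    std::copy(begin, num_end, buf);
    buf[len] = '\0';

    // Step 3: convert with strtol and handle the overflow cases explicitly.
    errno = 0;
    const long value = std::strtol(buf, nullptr, 10);
    if (errno == ERANGE || value > INT_MAX) {
        throw std::out_of_range("parse_next_int: value does not fit in int");
    }
    return static_cast<int>(value);
}
```

One consequence of the fixed buffer is that a very long run of leading zeros is rejected even though its value would fit; whether that matters depends on the input you expect.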
Note that since C++11, std::strings are guaranteed to be null-terminated, so your dereference-and-treat-as-a-C-string solution is not unsafe at all; and, with a comment explaining what's going on, it would have my vote for the best solution to this problem.
I recently saw the following code-block as a response to this question: Split a string in C++?
std::vector<std::string> &split(const std::string &s, char delim,
                                std::vector<std::string> &elems) {
    std::stringstream ss(s);
    std::string item;
    while(std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}
Why is returning the passed-by-reference array "elems" so important here? Couldn't we make this a void function, or return an integer to indicate success/failure? We are editing the actual array anyway, right?
Thank you!
By returning a reference to the object you passed in, you can do some chaining or cascading in one expression and be working with the same vector the whole time. Some people find this convenient, e.g.:
std::vector<std::string> elems;
std::cout << "Number of items:" << split("foo.cat.dog", '.', elems).size();
// get just foo
std::cout << "First item is:" << split("foo.cat.dog", '.', elems)[0];
// change first item to bar
split("foo.cat.dog", '.', elems)[0] = "bar";
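One detail worth keeping in mind with calls like these: because the function only ever calls push_back on the vector it is given, each call appends to whatever is already there. The sketch below repeats the function and adds a small helper, sizes_after_two_calls, which is a name invented for this illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Same shape as the split under discussion: fills elems and returns it by reference.
std::vector<std::string>& split(const std::string& s, char delim,
                                std::vector<std::string>& elems) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}

// Hypothetical helper: records the vector's size after each of two chained calls.
std::vector<std::size_t> sizes_after_two_calls() {
    std::vector<std::string> elems;
    std::vector<std::size_t> sizes;
    sizes.push_back(split("foo.cat.dog", '.', elems).size());  // first call fills elems
    sizes.push_back(split("foo.cat.dog", '.', elems).size());  // second call appends to it
    return sizes;
}
```

If a fresh result is wanted each time, calling elems.clear() before each split is one simple option.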
It's not returning the memory address; it's actually returning the object by non-const reference, the same way it was passed in. This might seem a bit of overkill, because the calling code can rely either on the third parameter passed, which will be populated on return from the function, or on the return value.
The reason for doing it this way is to allow chaining. So you can do:
split(myString, ',', asAVector).size().
which will perform the function and allow you to chain the results by calling a function on the vector (in this case size)
Despite the neatness, there are some potential drawbacks to this approach. For example, no error code is present in the return value, so you are reliant on the function either working properly or throwing an exception; therefore you'd usually expect to wrap the above with try / catch semantics. Of course, the more chaining you do, the more kinds of exception can be thrown, so you may have to write more catch blocks.
Mind you, passing back by reference is a whole lot better than passing back by pointer. Chaining with pointers is notorious for crashing when one of the functions in the chain decides to fail and return 0.
I'm trying to create a simple application in C++. This application has to read from a file and display data. I've written this function:
std::vector <AndroidApplication> AndroidApplication::getAllApp(){
    std::vector<AndroidApplication> allApp;
    std::fstream f;
    f.open("freeApps.txt");
    std::string line;
    if(f.is_open()){
        while(getline(f, line)) {
            std::string myLine = "";
            char * line2 = line.c_str();
            myLine = strtok(line2,"\t");
            AndroidApplication * tmpApp = new AndroidApplication(myLine[1], myLine[2], myLine[4]);
            tmpApp->Developer = myLine[0];
            tmpApp->Pop = myLine[3];
            tmpApp->Type = myLine[5];
            allApp->pushBack(tmpApp);
        }
    }
    return allApp;
}
It gives me an error on this line:
myLine = strtok(line2,"\t");
An error:
cannot convert from 'const char *' to 'char *'
Could you tell me how can I deal with it?
Don't use strtok. std::string has its own functions for string-scanning, e.g., find.
To use strtok, you'll need a writeable copy of the string. c_str() returns a read-only pointer.
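As a rough sketch of the find-based approach, assuming the goal is to break one line into tab-separated fields (split_fields is a name invented for this example):

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits one line into fields using std::string::find,
// without modifying the input and without needing a writable C-style copy.
std::vector<std::string> split_fields(const std::string& line, char delim = '\t') {
    std::vector<std::string> fields;
    std::string::size_type begin = 0;
    std::string::size_type end;
    while ((end = line.find(delim, begin)) != std::string::npos) {
        fields.push_back(line.substr(begin, end - begin));
        begin = end + 1;  // skip past the delimiter
    }
    fields.push_back(line.substr(begin));  // last field, possibly empty
    return fields;
}
```

Unlike strtok, this keeps empty fields, so two consecutive tabs yield an empty string between them; the caller still has to check that enough fields came back before indexing into the result.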
You can't just "convert it" and forget about it. The pointer you get from .c_str() is to a read-only buffer. You need to copy it into a new buffer to work with: ideally, by avoiding using antiquated functions like strtok in the first place.
(I'm not quite sure what you're doing with that tokenisation, actually; you're just indexing into characters in the once-tokenised string, not indexing tokens.)
You're also confusing dynamic and automatic storage.
std::vector<AndroidApplication> AndroidApplication::getAllApp()
{
    std::vector<AndroidApplication> allApp;
    // Your use of fstreams can be simplified
    std::fstream f("freeApps.txt");
    if (!f.is_open())
        return allApp;
    std::string line;
    while (getline(f, line)) {
        // This is how you tokenise a string in C++
        std::istringstream split(line);
        std::vector<std::string> tokens;
        for (std::string each;
             std::getline(split, each, '\t');
             tokens.push_back(each));
        // No need for dynamic allocation here,
        // and I'm assuming you wanted tokens ("words"), not characters.
        AndroidApplication tmpApp(tokens[1], tokens[2], tokens[4]);
        tmpApp.Developer = tokens[0];
        tmpApp.Pop = tokens[3];
        tmpApp.Type = tokens[5];
        // The vector contains objects, not pointers
        allApp.push_back(tmpApp);
    }
    return allApp;
}
I suspect the error is actually on the previous line,
char * line2 = line.c_str();
This is because c_str() gives a read-only pointer to the string contents. Before C++17, the standard offered no direct way to get a modifiable C-style string from a C++ string (C++17 added a non-const data() member).
The easiest option to read space-separated words from a string (assuming that's what you're trying to do) is to use a string stream:
std::vector<std::string> words;
std::istringstream stream(line);
std::copy(std::istream_iterator<std::string>(stream),
std::istream_iterator<std::string>(),
back_inserter(words));
If you really want to use strtok, then you'll need a writable copy of the string, with a C-style terminator; one way to do this is to copy it into a vector:
std::vector<char> writable(line.c_str(), line.c_str() + line.length() + 1);
std::vector<char *> words;
while (char * word = strtok(words.empty() ? &writable[0] : NULL, " ")) {
    words.push_back(word);
}
Bear in mind that strtok is quite difficult to use correctly; you need to call it once for each token, not once to create an array of tokens, and make sure nothing else (such as another thread) calls it until you've finished with the string. I'm not sure that my code is entirely correct; I haven't tried to use this particular form of evil in a long time.
Since you asked for it:
Theoretically you could use const_cast<char*>(line.c_str()) to get a char*. However giving the result of this to strtok (which modifies its parameter) is IIRC not valid c++ (you may cast away constness, but you may not modify a const object). So it might work for your specific platform/compiler or not (and even if it works it might break anytime).
The other way is to create a copy, which is filled with the contents of the string (and modifiable):
std::vector<char> tmp_str(line.begin(), line.end());
tmp_str.push_back('\0'); // strtok needs a null-terminated buffer
myLine = strtok(&tmp_str[0],"\t");
Of course as the other answers tell you in great detail, you really should avoid using functions like strtok in c++ in favour of functionality working directly on std::string (at least unless you have a firm grasp on c++, high performance requirements and know that using the c-api function is faster in your specific case (through profiling)).
I have the following code in C++:
string str="a b c";
stringstream sstr(str);
vector<string> my_vec((istream_iterator<string>(sstr)),
istream_iterator<string>());
Is there any way to save the use of sstr, something like the following?
vector<string> my_vec((istream_iterator<string>(str)),
istream_iterator<string>());
istream_iterator's argument needs to be able to bind to a non-const reference, and a temporary cannot. However, (as Alf points out), ostream happens to have a function, flush(), that returns a non-const reference to itself. So a possibility is:
string str="a b c";
vector<string> my_vec(istream_iterator<string>(
        static_cast<stringstream&>(stringstream(str).flush())
    ), istream_iterator<string>());
Though that's an eye-sore. If you're concerned about having too many lines, then use a function:
vector<string> string_to_vector(const string& str)
{
    stringstream sstr(str);
    return vector<string>(istream_iterator<string>(sstr),
                          istream_iterator<string>());
}
Giving:
string str="a b c";
vector<string> my_vec = string_to_vector(str);
This is even cleaner than what you'd get even if you could shorten your code, because now what is being done is not expressed in code but rather the name of a function; the latter is much easier to grasp.
Of course, we can add boiler-plate code to do silly things:
class temporary_stringstream
{
public:
    temporary_stringstream(const string& str) :
    mStream(str)
    {}
    operator stringstream&()
    {
        // only persists as long as temporary_stringstream!
        return mStream;
    }
private:
    stringstream mStream;
};
Giving:
string str="a b c";
vector<string> my_vec((istream_iterator<string>(temporary_stringstream(str))),
istream_iterator<string>());
But this is just as ugly as the first solution.
You're using the two-iterator constructor for vector with istream_iterator to split the string by whitespace into a sequence of strings to be stored.
istream_iterator needs an istream, for which there is no direct cast from string. The compiler is not going to infer a stringstream because the constructor for istream_iterator takes a templated type and not explicitly a stringstream. It's just too much of a leap for a compiler to assume that much.
Besides, even if the compiler made such a leap of faith, it would generate the same code as what you already have, so you're no better off in the end.
A better approach might be:
std::vector<std::string> split_words(const std::string& str)
{
    size_t offset = str.find_first_not_of(" \t\r\n");
    std::vector<std::string> result;
    while(offset != std::string::npos)
    {
        size_t end = str.find_first_of(" \t\r\n", offset);
        if(end != offset)
            // The third argument is a length, not an end position
            result.push_back(std::string(str, offset, end - offset));
        offset = str.find_first_not_of(" \t\r\n", end);
    }
    return result;
}
which takes less code and objects to get the same job done. On my Mac, this is 3203 bytes code and 273 data, while the original three lines of code is 5136 bytes code and 353 data. (I added return my_vec.size(); at the end of main().)
Boost has a library dedicated to string algorithms: check out the Split section :)
std::vector<std::string> vec;
boost::split(vec, "a b c", boost::is_any_of(" "));
// vec == { "a", "b", "c" }
Probably the clearest way to do it :)