I have the following code in C++:
string str="a b c";
stringstream sstr(str);
vector<string> my_vec((istream_iterator<string>(sstr)),
istream_iterator<string>());
Is there any way to save the use of sstr, something like the following?
vector<string> my_vec((istream_iterator<string>(str)),
istream_iterator<string>());
istream_iterator's argument needs to be able to bind to a non-const reference, and a temporary cannot. However, (as Alf points out), ostream happens to have a function, flush(), that returns a non-const reference to itself. So a possibility is:
string str="a b c";
vector<string> my_vec(istream_iterator<string>(
static_cast<stringstream&>(stringstream(str).flush())
), istream_iterator<string>());
Though that's an eyesore. If you're concerned about having too many lines, then use a function:
vector<string> string_to_vector(const string& str)
{
stringstream sstr(str);
return vector<string>(istream_iterator<string>(sstr),
istream_iterator<string>());
}
Giving:
string str="a b c";
vector<string> my_vec = string_to_vector(str);
This is cleaner than anything you could get by shortening the code inline, because the intent is now expressed by the name of a function rather than by the code itself, and a name is much easier to grasp.
Of course, we can add boilerplate code to do silly things:
class temporary_stringstream
{
public:
temporary_stringstream(const string& str) :
mStream(str)
{}
operator stringstream&()
{
// only persists as long as temporary_stringstream!
return mStream;
}
private:
stringstream mStream;
};
Giving:
string str="a b c";
vector<string> my_vec((istream_iterator<string>(temporary_stringstream(str))),
istream_iterator<string>());
But this is just as ugly as the first solution.
You're using the two-iterator constructor for vector with istream_iterator to split the string by whitespace into a sequence of strings to be stored.
istream_iterator needs an istream, for which there is no direct cast from string. The compiler is not going to infer a stringstream because the constructor for istream_iterator takes a templated type and not explicitly a stringstream. It's just too much of a leap for a compiler to assume that much.
Besides, even if the compiler made such a leap of faith, it would generate the same code as what you already have, so you're no better off in the end.
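For reference, here is the stream-based technique discussed so far as a self-contained sketch. The helper name words_of is hypothetical, chosen only for illustration; the point is that the stream is a named object, so istream_iterator has an lvalue to bind to.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper name. Splits on runs of whitespace, exactly as
// operator>> into a std::string does.
std::vector<std::string> words_of(const std::string& text)
{
    std::istringstream in(text);                    // named stream, not a temporary
    std::istream_iterator<std::string> first(in);   // reads one word per increment
    std::istream_iterator<std::string> last;        // default-constructed: end of stream
    return std::vector<std::string>(first, last);   // two-iterator constructor
}
```

Leading, trailing and repeated whitespace produce no empty entries, because operator>> skips whitespace before each word.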
A better approach might be:
std::vector<std::string> split_words(const std::string& str)
{ size_t offset = str.find_first_not_of(" \t\r\n");
std::vector<std::string> result;
while(offset != std::string::npos)
{ size_t end = str.find_first_of(" \t\r\n", offset);
if(end != offset)
result.push_back(std::string(str, offset, end - offset)); // third argument is a length, not an end position
offset = str.find_first_not_of(" \t\r\n", end);
}
return result;
}
which takes less code and objects to get the same job done. On my Mac, this is 3203 bytes code and 273 data, while the original three lines of code is 5136 bytes code and 353 data. (I added return my_vec.size(); at the end of main().)
Boost has a library dedicated to string algorithms: check out the Split section :)
std::vector<std::string> vec;
boost::split(vec, "a b c", boost::is_any_of(" "));
// vec == { "a", "b", "c" }
Probably the clearest way to do it :)
If I want to construct a std::string with a line like:
std::string my_string("a\0b");
Where i want to have three characters in the resulting string (a, null, b), I only get one. What is the proper syntax?
Since C++14
we have been able to create literal std::string
#include <iostream>
#include <string>
int main()
{
using namespace std::string_literals;
std::string s = "pl-\0-op"s; // <- Notice the "s" at the end
// This is a std::string literal not
// a C-String literal.
std::cout << s << "\n";
}
Before C++14
The problem is the std::string constructor that takes a const char* assumes the input is a C-string. C-strings are \0 terminated and thus parsing stops when it reaches the \0 character.
To compensate for this, you need to use the constructor that builds the string from a char array (not a C-String). This takes two parameters - a pointer to the array and a length:
std::string x("pq\0rs"); // Two characters, because the input is assumed to be a C-String
std::string y("pq\0rs",5); // Five characters, because the input is now a char array with an explicit length
Note: a C++ std::string is NOT delimited by \0 (as suggested in other posts); it stores its length separately, so \0 is just another character. However, you can extract a pointer to an internal buffer that contains a C-String with the method c_str().
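A minimal sketch of that difference, assuming nothing beyond the standard library (the helper names are made up for illustration): the string's own size counts the embedded \0, while a C function reading through c_str() stops at it.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: builds the three-character string a, NUL, b
// using the pointer-plus-length constructor.
std::string make_a_nul_b()
{
    return std::string("a\0b", 3);
}

// Length as std::string itself reports it.
std::size_t stored_length(const std::string& s)
{
    return s.size();
}

// Length as a C function sees it through c_str(): it stops at the first NUL.
std::size_t c_length(const std::string& s)
{
    return std::strlen(s.c_str());
}
```

For the string built above, stored_length gives 3 and c_length gives 1.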
Also check out Doug T's answer below about using a vector<char>.
Also check out RiaD for a C++14 solution.
If you are doing manipulation like you would with a c-style string (array of chars) consider using
std::vector<char>
You have more freedom to treat it like an array in the same manner you would treat a c-string. You can use the iterator-range constructor to copy it into a string:
std::vector<char> vec(100);
strncpy(&vec[0], "blah blah blah", 100);
std::string vecAsStr( vec.begin(), vec.end()); // copies all 100 chars, including the trailing NULs
and you can use it in many of the same places you can use c-strings
printf("%s", &vec[0]);
vec[10] = '\0';
vec[11] = 'b';
Naturally, however, you suffer from the same problems as c-strings. You may forget your null terminator or write past the allocated space.
I have no idea why you'd want to do such a thing, but try this:
std::string my_string("a\0b", 3);
What new capabilities do user-defined literals add to C++? presents an elegant answer: Define
std::string operator "" _s(const char* str, size_t n)
{
return std::string(str, n);
}
then you can create your string this way:
std::string my_string("a\0b"_s);
or even so:
auto my_string = "a\0b"_s;
There's an "old style" way:
#define S(s) s, sizeof s - 1 // trailing NUL does not belong to the string
then you can define
std::string my_string(S("a\0b"));
The following will work...
std::string s;
s.push_back('a');
s.push_back('\0');
s.push_back('b');
You'll have to be careful with this. If you replace 'b' with any numeric character, you will silently create the wrong string using most methods. See: Rules for C++ string literals escape character.
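The reason is that an octal escape consumes up to three octal digits, so a digit written right after \0 becomes part of the escape. A small sketch using sizeof on string literals (the constant names are only for illustration) shows the effect and two ways around it:

```cpp
#include <cstddef>

// sizeof on a string literal counts every char plus the implicit terminator.

// "\01" is ONE character (octal 01), so this holds: 'a', '\1', terminator.
constexpr std::size_t merged_size = sizeof("a\01");

// Ending the literal ends the escape: 'a', '\0', '1', terminator.
constexpr std::size_t split_size = sizeof("a\0" "1");

// Spelling the escape with all three octal digits also works: 'a', '\0', '1', terminator.
constexpr std::size_t padded_size = sizeof("a\0001");

static_assert(merged_size == 3, "the digit was absorbed into the escape");
static_assert(split_size == 4, "adjacent literals keep the digit separate");
static_assert(padded_size == 4, "a three-digit escape cannot absorb more");
```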
For example, I dropped this innocent looking snippet in the middle of a program
// Create '\0' followed by '0' 40 times ;)
std::string str("\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00", 80);
std::cerr << "Entering loop.\n";
for (char & c : str) {
std::cerr << c;
// 'Q' is way cooler than '\0' or '0'
c = 'Q';
}
std::cerr << "\n";
for (char & c : str) {
std::cerr << c;
}
std::cerr << "\n";
Here is what this program output for me:
Entering loop.
Entering loop.
vector::_M_emplace_ba
QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ
That was my first print statement twice, several non-printing characters, followed by a newline, followed by something in internal memory, which I just overwrote (and then printed, showing that it has been overwritten). Worst of all, even compiling this with thorough and verbose gcc warnings gave me no indication of something being wrong, and running the program through valgrind didn't complain about any improper memory access patterns. In other words, it's completely undetectable by modern tools.
You can get this same problem with the much simpler std::string("0", 100);, but the example above is a little trickier, and thus harder to see what's wrong.
Fortunately, C++11 gives us a good solution to the problem using initializer list syntax. This saves you from having to specify the number of characters (which, as I showed above, you can do incorrectly), and avoids combining escaped numbers. std::string str({'a', '\0', 'b'}) is safe for any string content, unlike versions that take an array of char and a size.
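As a small sketch of that approach (the helper name is hypothetical), here is the awkward case where the character after the \0 is a digit. With character literals there is no escape sequence for the digit to merge into, and no length to get wrong:

```cpp
#include <string>

// Hypothetical helper: the characters a, NUL, and the digit '0'.
std::string make_a_nul_zero()
{
    // Braces select the std::initializer_list<char> constructor.
    return std::string{'a', '\0', '0'};
}
```

The result has size 3, and its last character is the digit '0'.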
In C++14 you now may use literals
using namespace std::literals::string_literals;
std::string s = "a\0b"s;
std::cout << s.size(); // 3
Better to use std::vector<char> if this question isn't just for educational purposes.
anonym's answer is excellent, but there's a non-macro solution in C++98 as well:
template <size_t N>
std::string RawString(const char (&ch)[N])
{
return std::string(ch, N-1); // Again, exclude trailing `null`
}
With this function, RawString(/* literal */) will produce the same string as S(/* literal */):
std::string my_string_t(RawString("a\0b"));
std::string my_string_m(S("a\0b"));
std::cout << "Using template: " << my_string_t << std::endl;
std::cout << "Using macro: " << my_string_m << std::endl;
Additionally, there's an issue with the macro: the expression is not actually a std::string as written, and therefore can't be used e.g. for simple assignment-initialization:
std::string s = S("a\0b"); // ERROR!
...so it might be preferable to use:
#define S(s) std::string(s, sizeof s - 1)
Obviously you should only use one or the other solution in your project and call it whatever you think is appropriate.
I know this question was asked a long time ago, but anyone who has a similar problem might be interested in the following code.
CComBSTR(20,"mystring1\0mystring2\0")
Almost all implementations of std::strings are null-terminated, so you probably shouldn't do this. Note that "a\0b" is actually four characters long because of the automatic null terminator (a, null, b, null). If you really want to do this and break std::string's contract, you can do:
std::string s("aab");
s.at(1) = '\0';
but if you do, all your friends will laugh at you, you will never find true happiness.
I have created my custom function to turn a wstring into lower case. However, it is pretty slow in DebugMode. Yes, I know ReleaseMode is what counts, but anyway it is pretty unnerving.
wstring wstringToLower(wstring u)
{
wstring s;
for (int i=0;i<u.size();i++)
{
wstring sChar;
sChar=u.substr(i,1);
int iChar=static_cast<int>(sChar[0]);
int iNewChar=charCodeToLower(iChar);
wstring sNewChar=wstring(1,iNewChar);
s.append(sNewChar);
}
return s;
}
Does anybody see anything obvious that I could improve to speed up the code, even in DebugMode?
Thank you!
There's no need to make temporary strings.
So, for start, instead of:
wstring sNewChar=wstring(1,iNewChar);
s.append(sNewChar);
This should do the trick:
s.push_back(iNewChar);
Then, instead of:
wstring sChar;
sChar=u.substr(i,1);
int iChar=static_cast<int>(sChar[0]);
This should work:
int iChar=static_cast<int>(u[i]);
And, of course, as noted by Marcel, you can do everything on the passed copy, avoiding the extra string allocation.
Also, as noted in the comments, see "How to convert std::string to lower case?". Also read all the answers (and comments) under "How to make lower case letters for Unicode characters":
#include <algorithm>
#include <clocale>  // setlocale
#include <cwctype>  // towlower
#include <string>
#include <iostream>
int main()
{
::setlocale(LC_ALL,"");
std::wstring data = L"НЕМАЊА БОРИЋ"; // Wide chars
std::transform(data.begin(), data.end(), data.begin(), ::towlower);
// prints немања борић
std::wcout << data << std::endl;
return 0;
}
http://en.cppreference.com/w/cpp/string/wide/towlower
First of all, I would avoid allocating memory for variables on every iteration, since allocation is a heavy operation.
Then do not call u.size() in the for-loop condition; otherwise it is called on every iteration. Every function call you save inside a loop is a win for performance.
Next, apply everything Nemanja Boric said in the other answer.
And since the variable u is passed as a copy, you can operate directly on it and use it as the return value.
wstring wstringToLower(wstring u)
{
int size = u.size();
for (int i = 0; i < size; ++i)
{
u[i] = charCodeToLower(static_cast<int>(u[i]));
}
return u;
}
Conclusion: Basically, avoid allocating memory or calling functions in loops. Do only as much as you really have to.
There is actually no need for the wstringToLower function at all. You can use <algorithm> to do most of the work:
std::wstring str = L"Some String";
std::transform(str.begin(), str.end(), str.begin(), ::towlower);
If you are trying to localize it, you may want to modify it slightly:
std::wstring str = L"Some String";
std::locale loc; // set your locale
std::transform(str.begin(), str.end(), str.begin(), [&loc](wchar_t c)
{
return std::use_facet<std::ctype<wchar_t>>(loc).tolower(c);
});
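Putting the pieces together, a self-contained sketch might look like the following. The function name to_lower_copy and the default of std::locale::classic() are assumptions made for the example; pass whatever locale your application actually uses.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical helper. Takes the string by value and lowercases that copy
// in place, so no per-character temporaries are created.
std::wstring to_lower_copy(std::wstring text,
                           const std::locale& loc = std::locale::classic())
{
    // Look the facet up once, outside the per-character work.
    const std::ctype<wchar_t>& facet = std::use_facet<std::ctype<wchar_t>>(loc);
    std::transform(text.begin(), text.end(), text.begin(),
                   [&facet](wchar_t c) { return facet.tolower(c); });
    return text;
}
```

With the classic locale only the ASCII letters are mapped, so to_lower_copy(L"Some String") yields L"some string"; non-ASCII text needs a suitable locale.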
I'm trying to create a simple application in C++. This application has to read from a file and display data. I've written this function:
std::vector <AndroidApplication> AndroidApplication::getAllApp(){
std::vector<AndroidApplication> allApp;
std::fstream f;
f.open("freeApps.txt");
std::string line;
if(f.is_open()){
while(getline(f, line)) {
std::string myLine = "";
char * line2 = line.c_str();
myLine = strtok(line2,"\t");
AndroidApplication * tmpApp = new AndroidApplication(myLine[1], myLine[2], myLine[4]);
tmpApp->Developer = myLine[0];
tmpApp->Pop = myLine[3];
tmpApp->Type = myLine[5];
allApp->pushBack(tmpApp);
}
}
return allApp;
}
It throws me an error in line:
myLine = strtok(line2,"\t");
An error:
cannot convert from 'const char *' to 'char *'
Could you tell me how can I deal with it?
Don't use strtok. std::string has its own functions for string-scanning, e.g., find.
To use strtok, you'll need a writeable copy of the string. c_str() returns a read-only pointer.
You can't just "convert it" and forget about it. The pointer you get from .c_str() is to a read-only buffer. You need to copy it into a new buffer to work with: ideally, by avoiding using antiquated functions like strtok in the first place.
(I'm not quite sure what you're doing with that tokenisation, actually; you're just indexing into characters in the once-tokenised string, not indexing tokens.)
You're also confusing dynamic and automatic storage.
std::vector<AndroidApplication> AndroidApplication::getAllApp()
{
std::vector<AndroidApplication> allApp;
// Your use of fstreams can be simplified
std::fstream f("freeApps.txt");
if (!f.is_open())
return allApp;
std::string line;
while (getline(f, line)) {
// This is how you tokenise a string in C++
std::istringstream split(line);
std::vector<std::string> tokens;
for (std::string each;
std::getline(split, each, '\t');
tokens.push_back(each));
// No need for dynamic allocation here,
// and I'm assuming you wanted tokens ("words"), not characters.
AndroidApplication tmpApp(tokens[1], tokens[2], tokens[4]);
tmpApp.Developer = tokens[0];
tmpApp.Pop = tokens[3];
tmpApp.Type = tokens[5];
// The vector contains objects, not pointers
allApp.push_back(tmpApp);
}
return allApp;
}
I suspect the error is actually on the previous line,
char * line2 = line.c_str();
This is because c_str() gives a read-only pointer to the string contents. Historically there was no standard way to get a modifiable C-style string from a C++ string; C++17 added a non-const data() overload.
The easiest option to read space-separated words from a string (assuming that's what you're trying to do) is to use a string stream:
std::vector<std::string> words;
std::istringstream stream(line);
std::copy(std::istream_iterator<std::string>(stream),
std::istream_iterator<std::string>(),
back_inserter(words));
If you really want to use strtok, then you'll need a writable copy of the string, with a C-style terminator; one way to do this is to copy it into a vector:
std::vector<char> writable(line.c_str(), line.c_str() + line.length() + 1);
std::vector<char *> words;
while (char * word = strtok(words.empty() ? &writable[0] : NULL, " ")) {
words.push_back(word);
}
Bear in mind that strtok is quite difficult to use correctly; you need to call it once for each token, not once to create an array of tokens, and make sure nothing else (such as another thread) calls it until you've finished with the string. I'm not sure that my code is entirely correct; I haven't tried to use this particular form of evil in a long time.
Since you asked for it:
Theoretically you could use const_cast<char*>(line.c_str()) to get a char*. However giving the result of this to strtok (which modifies its parameter) is IIRC not valid c++ (you may cast away constness, but you may not modify a const object). So it might work for your specific platform/compiler or not (and even if it works it might break anytime).
The other way is to create a modifiable copy filled with the contents of the string:
std::vector<char> tmp_str(line.begin(), line.end());
tmp_str.push_back('\0'); // strtok needs a NUL-terminated buffer
myLine = strtok(&tmp_str[0],"\t");
Of course as the other answers tell you in great detail, you really should avoid using functions like strtok in c++ in favour of functionality working directly on std::string (at least unless you have a firm grasp on c++, high performance requirements and know that using the c-api function is faster in your specific case (through profiling)).
A common piece of code I use for simple string splitting looks like this:
inline std::vector<std::string> split(const std::string &s, char delim) {
std::vector<std::string> elems;
std::stringstream ss(s);
std::string item;
while(std::getline(ss, item, delim)) {
elems.push_back(item);
}
return elems;
}
Someone mentioned that this will silently "swallow" errors occurring in std::getline, and of course I agree that's the case. But it occurred to me to ask: what could possibly go wrong here in practice that I would need to worry about? Basically it all boils down to this:
inline std::vector<std::string> split(const std::string &s, char delim) {
std::vector<std::string> elems;
std::stringstream ss(s);
std::string item;
while(std::getline(ss, item, delim)) {
elems.push_back(item);
}
if(/* what error can I catch here? */) {
// *** How did we get here!? ***
}
return elems;
}
A stringstream is backed by a string, so we don't have to worry about any of the issues associated with reading from a file. There is no type conversion going on here since getline simply reads until it sees the line delimiter or EOF. So we can't get any of the errors that something like boost::lexical_cast has to worry about.
I simply can't think of something besides failing to allocate enough memory that could go wrong, but that'll just throw a std::bad_alloc well before the std::getline even takes place. What am I missing?
I can't imagine what errors this person thinks might happen, and you should ask them to explain. Nothing can go wrong except allocation errors, as you mentioned, which are thrown and not swallowed.
The only thing I see that you're directly missing is that ss.fail() is guaranteed to be true after the while loop, because that's the condition being tested. (bool(stream) is equivalent to !stream.fail(), not stream.good().) As expected, ss.eof() will also be true, indicating the failure was due to EOF.
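To make the stream state concrete, here is a small sketch that runs the same getline loop and reports the flags afterwards. The struct and function names are made up for the example. If you wanted a check after the loop at all, ss.bad() is the flag that would indicate something other than running out of input.

```cpp
#include <sstream>
#include <string>

// Hypothetical record of the stream flags after the loop has finished.
struct StreamState {
    bool fail;
    bool eof;
    bool bad;
};

// Runs the same loop as split() and returns the resulting stream state.
StreamState state_after_split(const std::string& s, char delim)
{
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        // items are discarded; only the final stream state matters here
    }
    return StreamState{ss.fail(), ss.eof(), ss.bad()};
}
```

For ordinary inputs such as "a,b", "a,b," or the empty string, fail and eof are both true and bad is false.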
However, there might be some confusion over what is actually happening. Because getline uses delim-terminated fields rather than delim-separated fields, input data such as "a\nb\n" has two instead of three fields, and this might be surprising. For lines this makes complete sense (and is POSIX standard), but how many fields, with a delim of '-', would you expect to find in "a-b-" after splitting?
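The terminated-versus-separated behaviour can be checked with a short sketch. The function below is the getline-based split from the question under a different, hypothetical name so that it can be tried in isolation:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Same loop as the getline-based split in the question (name is hypothetical).
std::vector<std::string> getline_fields(const std::string& s, char delim)
{
    std::vector<std::string> fields;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        fields.push_back(item);
    }
    return fields;
}
```

With '-' as the delimiter, "a-b" and "a-b-" both give two fields, "a--b" gives three (the middle one empty), and "-a" gives an empty field followed by "a". A trailing delimiter therefore never produces a final empty field, while a leading or doubled one does produce an empty field.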
Incidentally, here's how I'd write split:
template<class OutIter>
OutIter split(std::string const& s, char delim, OutIter dest) {
std::string::size_type begin = 0, end;
while ((end = s.find(delim, begin)) != s.npos) {
*dest++ = s.substr(begin, end - begin);
begin = end + 1;
}
*dest++ = s.substr(begin);
return dest;
}
This avoids all of the problems with iostreams in the first place, avoids extra copies (the stringstream's backing string; plus the temp returned by substr can even use a C++0x rvalue reference for move semantics if supported, as written), has the behavior I expect from split (different from yours), and works with any container.
deque<string> c;
split("a-b-", '-', back_inserter(c));
// c == {"a", "b", ""}