Why am I getting seg faults from using the istream iterator? - c++

void parse_and_run_command(const std::string &command) {
std::istringstream iss(command);
std::istream_iterator<char*> begin(iss), end;
std::vector<char*> tokens(begin, end); //place the arguments in a vector
tokens.push_back(NULL);
According to GDB, the segfault occurs after executing the second line with the istream_iterator. It did not segfault earlier when I was using string vectors.

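std::istream_iterator<char*> reads each token through a char* that points to no allocated buffer, so the extraction writes through an invalid pointer. You first need to create a std::vector of std::string, which owns the string data. You can then transform that vector into a std::vector of pointers. Note that the pointers are only valid for the lifetime of the std::vector of std::string, and only as long as its strings are not modified: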
#include <string>
#include <iostream>
#include <sstream>
#include <iterator>
#include <vector>
#include <algorithm>
void parse_and_run_command(const std::string &command) {
std::istringstream iss(command);
std::istream_iterator<std::string> begin(iss), end;
std::vector<std::string> tokens(begin, end);
std::vector<char*> ctokens;
std::transform(tokens.begin(), tokens.end(), std::back_inserter(ctokens), [](std::string& s) { return s.data(); });
ctokens.push_back(nullptr);
for (char* s : ctokens) {
if (s) {
std::cout << s << "\n";
}
else {
std::cout << "nullptr\n";
}
}
}
int main() {
parse_and_run_command("test test2 test3");
}
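Because the pointers are only valid while the string vector is alive, one option is to keep both vectors in a single object. The following is a minimal sketch under that assumption; ArgvHolder and its members are hypothetical names, not something from the answer above.

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: owns the token strings and the pointer array
// that refers to them, so both always share the same lifetime.
class ArgvHolder {
public:
    explicit ArgvHolder(const std::string& command) {
        std::istringstream iss(command);
        tokens_.assign(std::istream_iterator<std::string>(iss),
                       std::istream_iterator<std::string>());
        for (std::string& s : tokens_) {
            // &s[0] also works before C++17, where data() is const-only
            pointers_.push_back(&s[0]);
        }
        pointers_.push_back(nullptr); // terminator expected by execv-style APIs
    }

    // A copy would hold pointers into the other object's strings.
    ArgvHolder(const ArgvHolder&) = delete;
    ArgvHolder& operator=(const ArgvHolder&) = delete;

    // Null-terminated array, valid while this object is alive.
    char* const* argv() { return pointers_.data(); }
    std::size_t argc() const { return tokens_.size(); }

private:
    std::vector<std::string> tokens_;
    std::vector<char*> pointers_;
};
```

On a POSIX system, an ArgvHolder named holder could then be passed along as execv(holder.argv()[0], holder.argv()), provided holder outlives the call.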

First, you need to split the std::string command into a list of tokens of type std::vector<std::string>. Then, you may want to use std::transform in order to fill a new list of tokens of type std::vector<char const*>.
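Here is some sample code:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
#include <unistd.h> // execv (POSIX)
void parse_and_run_command(std::string const& command) {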
std::istringstream iss(command);
std::vector<std::string> results(std::istream_iterator<std::string>{iss},
std::istream_iterator<std::string>());
// debugging
for (auto const& token : results) {
std::cout << token << " ";
}
std::cout << std::endl;
std::vector<const char*> pointer_results;
pointer_results.resize(results.size(), nullptr);
std::transform(
std::begin(results), std::end(results),
std::begin(pointer_results),
[](std::string const& str) { // no capture needed
return str.c_str();
}
);
// debugging
for (auto const& token : pointer_results) {
std::cout << token << " ";
}
std::cout << std::endl;
// execv expects NULL as last element
pointer_results.push_back(nullptr);
char **cmd = const_cast<char**>(pointer_results.data());
execv(cmd[0], &cmd[0]);
}
Note the last part of the function: execv expects last element to be nullptr.

Hm, very interesting. Sounds like an easy task, but there are several caveats.
First of all, we need to consider that there are at least 2 different implementations of execv: one under POSIX / Linux and a Windows version (_execv).
Please note the different function signatures:
Linux / POSIX: int execv(const char *path, char *const argv[]);
Windows: intptr_t _execv(const char *cmdname, const char *const *argv);
In this case I find the Windows version a little bit cleaner, because the argv parameter is of type const char *const *. Anyway, the major problem is that we have to call legacy code.
Ok, let's see.
The execv function requires a NULL-terminated array of char pointers with the arguments for the function call. We need to create this array.
We start with a std::string containing the command. This needs to be split up into parts. There are several ways to do this, and I added different examples.
The simplest way is probably to put the std::string into a std::istringstream and then use a std::istream_iterator to split it into parts. This is the typical short sequence:
// Put this into istringstream
std::istringstream iss(command);
// Split
std::vector parts(std::istream_iterator<std::string>(iss), {});
We use the range constructor of std::vector, and we can define the std::vector without a template argument: the compiler deduces it from the constructor arguments. This feature is called CTAD ("class template argument deduction") and requires C++17.
Additionally, you can see that I do not use the "end()" iterator explicitly.
It is constructed from the empty brace-enclosed initializer. Its type is deduced to be the same as the type of the first argument, because the std::vector range constructor requires both iterators to have the same type.
We can avoid std::istringstream and convert the string into tokens directly using std::sregex_token_iterator. It is very simple to use, and the result is a one-liner for splitting the original command string:
// Split
std::vector<std::string> parts(std::sregex_token_iterator(command.begin(), command.end(), re, -1), {});
All this then boils down to 6 lines of code, including the definition of the variable and the invocation of the execv function:
Please see:
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <iterator>
#include <memory>
#include <algorithm>
#include <regex>
const std::regex re{ " " };
// Define Dummy function for _execv (Windows style, everything const)
// Note: Type of argv decays to " const char* const* "
int _execv(const char* path, const char* const argv[]) {
std::cout << "\n\nPath: " << path << "\n\nArguments:\n\n";
while (*argv != 0) std::cout << *argv++ << "\n";
return 0;
}
// Define Dummy function for _execv (Posix style)
// Note: Type of argv decays to " char* const* "
int execv(const char* path, char* const argv[]) {
std::cout << "\n\nPath: " << path << "\n\nArguments:\n\n";
while (*argv != 0) std::cout << *argv++ << "\n";
return 0;
}
int main() {
{
// ----------------------------------------------------------------------
// Solution 1
// Initial example
char path[] = "path";
const char* const argv[] = { "arg1", "arg2", "arg3", 0 };
_execv(path, argv);
}
{
// ----------------------------------------------------------------------
// Solution 2
// Now, string, with command convert to a handmade argv array
std::string command{ "path arg1 arg2 arg3" };
// Put this into istringstream
std::istringstream iss(command);
// Split into substrings
std::vector parts(std::istream_iterator<std::string>(iss), {});
// create "argv" List. argv is of type " const char* "
std::unique_ptr<const char*[]> argv = std::make_unique<const char*[]>(parts.size());
// Fill argv array
size_t i = 1U;
for (; i < parts.size(); ++i) {
argv[i - 1] = parts[i].c_str();
}
argv[i - 1] = static_cast<char*>(0);
// Call execv
// Windows
_execv(parts[0].c_str(), argv.get());
// Linux / Posix
execv(parts[0].c_str(), const_cast<char* const*>(argv.get()));
}
{
// ----------------------------------------------------------------------
// Solution 3
// Transform string vector to vector of char*
std::string command{ "path arg1 arg2 arg3" };
// Put this into istringstream
std::istringstream iss(command);
// Split
std::vector parts(std::istream_iterator<std::string>(iss), {});
// Fill argv
std::vector<const char*> argv{};
std::transform(parts.begin(), parts.end(), std::back_inserter(argv), [](const std::string& s) { return s.c_str(); });
argv.push_back(static_cast<const char*>(0));
// Call execv
// Windows
_execv(argv[0], &argv[1]);
// Linux / Posix
execv(argv[0], const_cast<char* const*>(&argv[1]));
}
{
// ----------------------------------------------------------------------
// Solution 4
// Transform string vector to vector of char*. Get rid of istringstream
std::string command{ "path arg1 arg2 arg3" };
// Split
std::vector<std::string> parts(std::sregex_token_iterator(command.begin(), command.end(), re, -1), {});
// Fill argv
std::vector<const char*> argv{};
std::transform(parts.begin(), parts.end(), std::back_inserter(argv), [](const std::string& s) { return s.c_str(); });
argv.push_back(static_cast<const char*>(0));
// Call execv
// Windows
_execv(argv[0], &argv[1]);
// Linux / Posix
execv(argv[0], const_cast<char* const*>(&argv[1]));
}
return 0;
}

Related

How to find the index of element (and a few other things)

I was writing a code that would substitute some random 17 character strings into a single alphabet, and I can't find a way. Basically, what I'm trying to do is this:
char strings[] = {
"L-nIbhm5<z:92~+,x",
"9bC5f0q#qA(RKZ>|r",
"9bC5f0q#qA(RKZ>|r",
"k=5,ln(08IAl(gGAK",
"|N,8]dGu)'^MaYpu[",
"!&,Y*nz8C*,J}{+d]",
"Us9%^%?n5!~e##*+#",
"zF8,1KV#¥]$k?|9R#",
"0B4>=nioEjp>4rhgi",
}
char alphabet[]{
"a","b","c","d","e","f","g","h","i",
}
replace(std::string str){
/**get str and then see the index of the corresponding string in strings[], and replace the string with alphabet[index number], while deleting the original string part that was replaced**/
int main(){
cin >> std::string replace;
replace(replace);
example input: L-nIbhm5<z:92~+,x9bC5f0q#qA(RKZ>|r9bC5f0q#qA(RKZ>|r
expected output: abc
EDIT:
New Code
Changes from the original code
It also has a bigger array than the simplified version (previous code). It displays the structure of the full program (where the strings are routed to and why).
Basically what it's doing:
Getting input from the user and putting it in the input variable. The input goes through the algorithm() function untouched, then goes to the replace function and is replaced. The replaced string is then returned back through the original route to the main function, where it is displayed.
I've kept the arrays a string type because the const char* gave me a segmentation error.
std::string Subs[53]=
{
"LQlMv]G5^^1kcm?fk",
"7W^S;/vB(6%I|w[fl",
"<w7>4f//Z55ZxK'z.",
"_W5g(lu<pTu3^_A7n",
"OfLm%8:EF}0V1?BSS",
"|+E6t,AZ~XewXP17T",
"L-nIbhm5<z:92~+,x",
"L-nIbhm5<z:92~+,x",
"9bC5f0q#qA(RKZ>|r",
"9bC5f0q#qA(RKZ>|r",
"k=5,ln(08IAl(gGAK",
"|N,8]dGu)'^MaYpu[",
"!&,Y*nz8C*,J}{+d]",
"Us9%^%?n5!~e##*+#",
"zF8,1KV#¥]$k?|9R#",
"0B4>=nioEjp>4rhgi",
"EG#0[W9.N4i~E<f3x",
"(0Pwkk&IPchJHs.7A",
"7XgmQ6fW<|J+NY[m0",
".g4CwX/DU!!~!zbtZ",
"+_U'qn_/9Fo|gT/!n",
"=0s(mYh&F%y=MBS5(",
"cg71(}bo+Q5P8F[T6",
"lc|a\%5.9pOpooU+QR",
"E_(3A:o+.]qL3MYA6",
"H#O'X_RiVS#8l0bKD",
"Y1gbGD`~8d>HSWN35",
"LQlMv]G5^^1kcm?fk",
"T4}gI;`BFVfhw=-sf",
"6BHMA0IRix]/=(jht",
"yS$=#Jdpp?P2k6SMQ",
"t1~|kkh+>4d>}OQ`a",
"2Y-\\CU\"944yBluWD5",
"'M\\ZbIX5{`Xd;qi!o",
"?N+RtVqj_r(C5##0\"",
"2;*Livh?V$X/8z#Md",
")IN|7FOs2l-mAM[d#",
"(~f268J},xXrK'Rp'",
"&r/qf9fFHnzV!RzH/",
"}naDRH4p$NI2a).t,",
"{8DM+7!.Mge|~fnO|",
")r[#nI0YDH>6cE38p",
"(0Pwkk&IPchJHs.7A",
")r[#nI0YDH>6cE38p",
"8M-=cQFQ,pPo7eu=p",
"0PHw=/|(tZ1}FHm/'",
"[su`'0Oybc.\"-/W5)",
"1uHl[IC7Sr#NUJV;I",
"8z8%,jK0CDOkJz8I?",
"3Ao2yXDN%YzpE&Suy",
"zNs`7E'e/$i8VqaUL",
"bzHmA^K2>7`UZ?!AO",
};
std::string Alphabet[53] =
{
" ","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","r","w","x","y","z",
"A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z",
};
std::string replace(std::string rep) {
int len = sizeof(Subs)/sizeof(Subs[0]);
std::stringstream ss1;
for(int i = 0; i < len; i++) {
if (rep.find(Subs[i]) != std::string::npos) {
ss1 << Subs[i];
}
}
std::string input = ss1.str();
return input;
}
std::string algorithm(std::string input)
{
//some other algorithms come here(not relative to this question)
input = replace(input);
return input;
}
int main(void){
int ed;
std::cin >> ed;
if(ed == 1){
//different function(not relative to the question)
}
else if(ed == 0){
std::string input;
std::cin >> input;
input = algorithm(input);
std::cout << input << std::endl;
}
else{
std::cout << "1 or 0" << std::endl;
main();
}
return 0;
}
example input: L-nIbhm5<z:92~+,x9bC5f0q#qA(RKZ>|r9bC5f0q#qA(RKZ>|r
expected output: abc
actual output: L-nIbhm5<z:92~+,xL-nIbhm5<z:92~+,x9bC5f0q#qA(RKZ>|r9bC5f0q#qA(RKZ>|r
Sorry it's become long.
There are a few mistakes in the above code:
The char array initialization is not correct.
The bodies of main and replace are not closed.
The replace method has no declared return type (C++ has no implicit int).
There is a std::string::find method which can be helpful here.
I have tried to make those fixes, and here is the updated code in C++17:
#include <iostream>
#include <sstream>
using namespace std;
const char *strings[9] = {
"L-nIbhm5<z:92~+,x",
"9bC5f0q#qA(RKZ>|r",
"9bC5f0q#qA(RKZ>|r",
"k=5,ln(08IAl(gGAK",
"|N,8]dGu)'^MaYpu[",
"!&,Y*nz8C*,J}{+d]",
"Us9%^%?n5!~e##*+#",
"zF8,1KV#¥]$k?|9R#",
"0B4>=nioEjp>4rhgi"
};
const char *alphabet[9] = {
"a","b","c","d","e","f","g","h","i"
};
void replace(std::string rep) {
int len = sizeof(strings)/sizeof(strings[0]);
std::stringstream ss1;
for(int i = 0; i < len; i++) {
if (rep.find(strings[i]) != std::string::npos) {
ss1 << alphabet[i];
}
}
std::cout << ss1.str();
}
int main(){
std::string rep;
cin >> rep;
replace(rep);
}
For reference : https://onlinegdb.com/Bd9DXSPAa
Note - Above code is just for reference, please make sure to add all test cases handling.
I made a C++17 version of your code,
replacing C-style arrays and pointers with C++-style containers and iterators,
and using the std::string::replace function. Use the standard library if you can;
it's tested and well documented.
#include <algorithm>
#include <iostream>
#include <regex>
#include <string>
#include <vector>
// std::vector/std::array instead of C-style arrays.
// allows us to use range based for loops later.
std::vector<std::string> strings =
{
"L-nIbhm5<z:92~+,x",
"9bC5f0q#qA(RKZ>|r",
"k=5,ln(08IAl(gGAK",
"|N,8]dGu)'^MaYpu[",
"!&,Y*nz8C*,J}{+d]",
"Us9%^%?n5!~e##*+#",
//"zF8,1KV#¥]$k?|9R#", // <<== I commented out this line, ¥ is not a valid character in my environment
"0B4>=nioEjp>4rhgi"
};
// a string is already an array of characters.
std::string alphabet{ "abcdefghijkl" };
std::string replace_with_alphabet(const std::string& input)
{
std::string retval{ input };
std::size_t index{ 0 };
// range based for, it will keep the order of the vector.
for (const auto& str : strings)
{
// look if you can find any of the predefined strings
// in the input strings.
const size_t pos = retval.find(str, 0);
// if found
if (pos != std::string::npos)
{
// take the letter at the same index as the matched string
std::string replacement{ alphabet[index] };
// use std::string::replace for replacing the substring
const size_t len = str.length();
retval.replace(pos, len, replacement);
}
// advance once per predefined string so that index always matches str
++index;
}
return retval;
};
/**get str and then see the index of the corresponding string in strings[], and replace the string with alphabet[index number], while deleting the original string part that was replaced**/
int main()
{
auto output = replace_with_alphabet("L-nIbhm5<z:92~+,x9bC5f0q#qA(RKZ>|rk=5,ln(08IAl(gGAK");
std::cout << output << std::endl;
}
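As an alternative, the input can be walked from left to right, so that repeated codes and the order of codes in the input are both handled. The following is only a sketch: decode and kCodes are hypothetical names, and the table is a deduplicated three-entry excerpt in which each code maps to exactly one letter, which is an assumption about the intended mapping.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, deduplicated table: each code maps to exactly one letter.
const std::vector<std::pair<std::string, char>> kCodes = {
    {"L-nIbhm5<z:92~+,x", 'a'},
    {"9bC5f0q#qA(RKZ>|r", 'b'},
    {"k=5,ln(08IAl(gGAK", 'c'},
};

std::string decode(const std::string& input) {
    std::string out;
    std::size_t pos = 0;
    while (pos < input.size()) {
        bool matched = false;
        for (const auto& entry : kCodes) {
            const std::string& code = entry.first;
            // compare() tells us whether `code` starts exactly at `pos`
            if (input.compare(pos, code.size(), code) == 0) {
                out.push_back(entry.second);
                pos += code.size();
                matched = true;
                break;
            }
        }
        if (!matched) {
            out.push_back(input[pos]); // keep unknown characters unchanged
            ++pos;
        }
    }
    return out;
}
```

Because the scan does not assume a fixed code length, it also tolerates entries whose byte length differs from 17, such as a code containing a multi-byte character.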

How can you split string in C++ and store them in variables? [duplicate]

Java has a convenient split method:
String str = "The quick brown fox";
String[] results = str.split(" ");
Is there an easy way to do this in C++?
The Boost tokenizer class can make this sort of thing quite simple:
#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/tokenizer.hpp>
using namespace std;
using namespace boost;
int main(int, char**)
{
string text = "token, test string";
char_separator<char> sep(", ");
tokenizer< char_separator<char> > tokens(text, sep);
BOOST_FOREACH (const string& t, tokens) {
cout << t << "." << endl;
}
}
Updated for C++11:
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
using namespace std;
using namespace boost;
int main(int, char**)
{
string text = "token, test string";
char_separator<char> sep(", ");
tokenizer<char_separator<char>> tokens(text, sep);
for (const auto& t : tokens) {
cout << t << "." << endl;
}
}
Here's a real simple one:
#include <vector>
#include <string>
using namespace std;
vector<string> split(const char *str, char c = ' ')
{
vector<string> result;
do
{
const char *begin = str;
while(*str != c && *str)
str++;
result.push_back(string(begin, str));
} while (0 != *str++);
return result;
}
C++ standard library algorithms are pretty universally based around iterators rather than concrete containers. Unfortunately this makes it hard to provide a Java-like split function in the C++ standard library, even though nobody disputes that this would be convenient. But what would its return type be? std::vector<std::basic_string<…>>? Maybe, but then we’re forced to perform (potentially redundant and costly) allocations.
Instead, C++ offers a plethora of ways to split strings based on arbitrarily complex delimiters, but none of them is encapsulated as nicely as in other languages. The numerous ways fill whole blog posts.
At its simplest, you could iterate using std::string::find until you hit std::string::npos, and extract the contents using std::string::substr.
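The find/substr loop described above could look roughly like this. It is a minimal sketch for a single-character delimiter; split_find is a hypothetical name, and keeping empty tokens between adjacent delimiters is a design choice of this sketch rather than something the answer prescribes.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits on one character and keeps empty tokens.
std::vector<std::string> split_find(const std::string& text, char delim) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    std::string::size_type end = text.find(delim);
    while (end != std::string::npos) {
        parts.push_back(text.substr(start, end - start));
        start = end + 1;               // continue after the delimiter
        end = text.find(delim, start);
    }
    parts.push_back(text.substr(start)); // remainder after the last delimiter
    return parts;
}
```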
A more fluid (and idiomatic, but basic) version for splitting on whitespace would use a std::istringstream:
auto iss = std::istringstream{"The quick brown fox"};
auto str = std::string{};
while (iss >> str) {
process(str);
}
Using std::istream_iterators, the contents of the string stream could also be copied into a vector using its iterator range constructor.
Multiple libraries (such as Boost.Tokenizer) offer specific tokenisers.
More advanced splitting requires regular expressions. C++ provides the std::regex_token_iterator for this purpose in particular:
auto const str = "The quick brown fox"s;
auto const re = std::regex{R"(\s+)"};
auto const vec = std::vector<std::string>(
std::sregex_token_iterator{begin(str), end(str), re, -1},
std::sregex_token_iterator{}
);
Another quick way is to use getline. Something like:
stringstream ss("bla bla");
string s;
while (getline(ss, s, ' ')) {
cout << s << endl;
}
If you want, you can make a simple split() method returning a vector<string>, which is really useful.
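Such a split() helper built on getline might be sketched as follows. The name split_getline is hypothetical. Note that with this approach a trailing delimiter does not produce a final empty token, and an empty input yields an empty vector.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper wrapping the getline loop shown above.
std::vector<std::string> split_getline(const std::string& text, char delim) {
    std::vector<std::string> parts;
    std::istringstream stream(text);
    std::string piece;
    while (std::getline(stream, piece, delim)) {
        parts.push_back(piece);
    }
    return parts;
}
```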
Use strtok. In my opinion, there isn't a need to build a class around tokenizing unless strtok doesn't provide you with what you need. It might not, but in 15+ years of writing various parsing code in C and C++, I've always used strtok. Here is an example:
char myString[] = "The quick brown fox";
char *p = strtok(myString, " ");
while (p) {
printf ("Token: %s\n", p);
p = strtok(NULL, " ");
}
A few caveats (which might not suit your needs): the string is "destroyed" in the process, meaning that EOS characters are placed inline in the delimiter spots. Correct usage might require you to make a non-const copy of the string. You can also change the list of delimiters mid-parse.
In my own opinion, the above code is far simpler and easier to use than writing a separate class for it. To me, this is one of those functions that the language provides and it does it well and cleanly. It's simply a "C based" solution. It's appropriate, it's easy, and you don't have to write a lot of extra code :-)
You can use streams, iterators, and the copy algorithm to do this fairly directly.
#include <string>
#include <vector>
#include <iostream>
#include <istream>
#include <ostream>
#include <iterator>
#include <sstream>
#include <algorithm>
int main()
{
std::string str = "The quick brown fox";
// construct a stream from the string
std::stringstream strstr(str);
// use stream iterators to copy the stream to the vector as whitespace separated strings
std::istream_iterator<std::string> it(strstr);
std::istream_iterator<std::string> end;
std::vector<std::string> results(it, end);
// send the vector to stdout.
std::ostream_iterator<std::string> oit(std::cout);
std::copy(results.begin(), results.end(), oit);
}
A solution using regex_token_iterators:
#include <iostream>
#include <regex>
#include <string>
#include <vector>
using namespace std;
int main()
{
string str("The quick brown fox");
regex reg("\\s+");
sregex_token_iterator iter(str.begin(), str.end(), reg, -1);
sregex_token_iterator end;
vector<string> vec(iter, end);
for (auto a : vec)
{
cout << a << endl;
}
}
No offense folks, but for such a simple problem, you are making things way too complicated. There are a lot of reasons to use Boost. But for something this simple, it's like hitting a fly with a 20# sledge.
void
split( vector<string> & theStringVector, /* Altered/returned value */
const string & theString,
const string & theDelimiter)
{
UASSERT( theDelimiter.size(), >, 0); // My own ASSERT macro.
size_t start = 0, end = 0;
while ( end != string::npos)
{
end = theString.find( theDelimiter, start);
// If at end, use length=maxLength. Else use length=end-start.
theStringVector.push_back( theString.substr( start,
(end == string::npos) ? string::npos : end - start));
// If at end, use start=maxSize. Else use start=end+delimiter.
start = ( ( end > (string::npos - theDelimiter.size()) )
? string::npos : end + theDelimiter.size());
}
}
For example (for Doug's case),
#define SHOW(I,X) cout << "[" << (I) << "]\t " # X " = \"" << (X) << "\"" << endl
int
main()
{
vector<string> v;
split( v, "A:PEP:909:Inventory Item", ":" );
for (unsigned int i = 0; i < v.size(); i++)
SHOW( i, v[i] );
}
And yes, we could have split() return a new vector rather than passing one in. It's trivial to wrap and overload. But depending on what I'm doing, I often find it better to re-use pre-existing objects rather than always creating new ones. (Just as long as I don't forget to empty the vector in between!)
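The wrap-and-overload idea can be sketched like this. The out-parameter version is a simplified restatement of the function above without the custom assert macro (an empty delimiter simply yields the whole string, which is an assumption of this sketch), and the second overload returns a fresh vector.

```cpp
#include <string>
#include <vector>

// Out-parameter version: appends tokens to `out`, which can be re-used.
void split(std::vector<std::string>& out,
           const std::string& text,
           const std::string& delimiter) {
    if (delimiter.empty()) { // assumed behaviour in place of the assert
        out.push_back(text);
        return;
    }
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type end = text.find(delimiter, start);
        if (end == std::string::npos) {
            out.push_back(text.substr(start)); // last token
            return;
        }
        out.push_back(text.substr(start, end - start));
        start = end + delimiter.size();
    }
}

// Returning overload: forwards to the version above.
std::vector<std::string> split(const std::string& text,
                               const std::string& delimiter) {
    std::vector<std::string> result;
    split(result, text, delimiter);
    return result;
}
```

The two overloads differ in their number of parameters, so a call such as split("A:PEP:909:Inventory Item", ":") selects the returning one.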
Reference: http://www.cplusplus.com/reference/string/string/.
(I was originally writing a response to Doug's question: C++ Strings Modifying and Extracting based on Separators (closed). But since Martin York closed that question with a pointer over here... I'll just generalize my code.)
Boost has a strong split function: boost::algorithm::split.
Sample program:
#include <vector>
#include <boost/algorithm/string.hpp>
int main() {
auto s = "a,b, c ,,e,f,";
std::vector<std::string> fields;
boost::split(fields, s, boost::is_any_of(","));
for (const auto& field : fields)
std::cout << "\"" << field << "\"\n";
return 0;
}
Output:
"a"
"b"
" c "
""
"e"
"f"
""
This is a simple STL-only solution (~5 lines!) using std::string::find and std::string::find_first_not_of that handles repetitions of the delimiter (like spaces or periods for instance), as well as leading and trailing delimiters:
#include <string>
#include <vector>
// DELIMITER is left undefined in the original snippet; a single space is assumed here.
const char DELIMITER = ' ';
void tokenize(std::string str, std::vector<std::string> &token_v){
size_t start = str.find_first_not_of(DELIMITER), end=start;
while (start != std::string::npos){
// Find next occurrence of delimiter
end = str.find(DELIMITER, start);
// Push back the token found into vector
token_v.push_back(str.substr(start, end-start));
// Skip all occurrences of the delimiter to find new start
start = str.find_first_not_of(DELIMITER, end);
}
}
I know you asked for a C++ solution, but you might consider this helpful:
Qt
#include <QString>
...
QString str = "The quick brown fox";
QStringList results = str.split(" ");
The advantage over Boost in this example is that it's a direct one to one mapping to your post's code.
See more at Qt documentation
Here is a sample tokenizer class that might do what you want
//Header file
class Tokenizer
{
public:
static const std::string DELIMITERS;
Tokenizer(const std::string& str);
Tokenizer(const std::string& str, const std::string& delimiters);
bool NextToken();
bool NextToken(const std::string& delimiters);
const std::string GetToken() const;
void Reset();
protected:
size_t m_offset;
const std::string m_string;
std::string m_token;
std::string m_delimiters;
};
//CPP file
const std::string Tokenizer::DELIMITERS(" \t\n\r");
Tokenizer::Tokenizer(const std::string& s) :
m_string(s),
m_offset(0),
m_delimiters(DELIMITERS) {}
Tokenizer::Tokenizer(const std::string& s, const std::string& delimiters) :
m_string(s),
m_offset(0),
m_delimiters(delimiters) {}
bool Tokenizer::NextToken()
{
return NextToken(m_delimiters);
}
bool Tokenizer::NextToken(const std::string& delimiters)
{
size_t i = m_string.find_first_not_of(delimiters, m_offset);
if (std::string::npos == i)
{
m_offset = m_string.length();
return false;
}
size_t j = m_string.find_first_of(delimiters, i);
if (std::string::npos == j)
{
m_token = m_string.substr(i);
m_offset = m_string.length();
return true;
}
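m_token = m_string.substr(i, j - i);
m_offset = j;
return true;
}
// The two members below are declared in the header but not defined in the original;
// these bodies are an assumed minimal implementation.
const std::string Tokenizer::GetToken() const
{
return m_token;
}
void Tokenizer::Reset()
{
m_offset = 0;
}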
Example:
std::vector <std::string> v;
Tokenizer s("split this string", " ");
while (s.NextToken())
{
v.push_back(s.GetToken());
}
pystring is a small library which implements a bunch of Python's string functions, including the split method:
#include <string>
#include <vector>
#include "pystring.h"
std::vector<std::string> chunks;
pystring::split("this string", chunks);
// also can specify a separator
pystring::split("this-string", chunks, "-");
I posted this answer for a similar question.
Don't reinvent the wheel. I've used a number of libraries, and the fastest and most flexible I have come across is the C++ String Toolkit Library.
Here is an example of how to use it that I've posted elsewhere on Stack Overflow.
#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>
const char *whitespace = " \t\r\n\f";
const char *whitespace_and_punctuation = " \t\r\n\f;,=";
int main()
{
{ // normal parsing of a string into a vector of strings
std::string s("Somewhere down the road");
std::vector<std::string> result;
if( strtk::parse( s, whitespace, result ) )
{
for(size_t i = 0; i < result.size(); ++i )
std::cout << result[i] << std::endl;
}
}
{ // parsing a string into a vector of floats with other separators
// besides spaces
std::string s("3.0, 3.14; 4.0");
std::vector<float> values;
if( strtk::parse( s, whitespace_and_punctuation, values ) )
{
for(size_t i = 0; i < values.size(); ++i )
std::cout << values[i] << std::endl;
}
}
{ // parsing a string into specific variables
std::string s("angle = 45; radius = 9.9");
std::string w1, w2;
float v1, v2;
if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )
{
std::cout << "word " << w1 << ", value " << v1 << std::endl;
std::cout << "word " << w2 << ", value " << v2 << std::endl;
}
}
return 0;
}
Adam Pierce's answer provides a hand-spun tokenizer taking in a const char*. It's a bit more problematic to do with iterators because incrementing a string's end iterator is undefined. That said, given string str{ "The quick brown fox" } we can certainly accomplish this:
auto start = find(cbegin(str), cend(str), ' ');
vector<string> tokens{ string(cbegin(str), start) };
while (start != cend(str)) {
const auto finish = find(++start, cend(str), ' ');
tokens.push_back(string(start, finish));
start = finish;
}
If you're looking to abstract complexity by using standard functionality, as On Freund suggests strtok is a simple option:
vector<string> tokens;
for (auto i = strtok(data(str), " "); i != nullptr; i = strtok(nullptr, " ")) tokens.push_back(i);
If you don't have access to C++17 you'll need to substitute data(str) as in this example: http://ideone.com/8kAGoa
Though not demonstrated in the example, strtok need not use the same delimiter for each token. Along with this advantage though, there are several drawbacks:
strtok cannot be used on multiple strings at the same time: either a nullptr must be passed to continue tokenizing the current string, or a new char* to tokenize must be passed. (There are some non-standard implementations which do support this, however, such as strtok_s.)
For the same reason, strtok cannot be used on multiple threads simultaneously. (This may, however, be implementation defined; for example, Visual Studio's implementation is thread safe.)
Calling strtok modifies the string it is operating on, so it cannot be used on const strings, const char*s, or literal strings. To tokenize any of these with strtok, or to operate on a string whose contents need to be preserved, str would have to be copied, and the copy could then be operated on.
C++20 provides us with split_view to tokenize strings in a non-destructive manner: https://topanswers.xyz/cplusplus?q=749#a874
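The copy-then-tokenize workaround for const input could be sketched as follows. The helper name tokenize_copy is hypothetical. The sketch only addresses the const-ness drawback: std::strtok still keeps hidden state, so the other two drawbacks remain.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a copy so the caller's string is untouched.
std::vector<std::string> tokenize_copy(const std::string& text,
                                       const char* delims) {
    // Writable, NUL-terminated copy that strtok is allowed to modify
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok); // copies the token out of the buffer
    }
    return tokens;
}
```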
The previous methods cannot generate a tokenized vector in-place, meaning without abstracting them into a helper function they cannot initialize const vector<string> tokens. That functionality and the ability to accept any white-space delimiter can be harnessed using an istream_iterator. For example given: const string str{ "The quick \tbrown \nfox" } we can do this:
istringstream is{ str };
const vector<string> tokens{ istream_iterator<string>(is), istream_iterator<string>() };
The required construction of an istringstream for this option has a far greater cost than the previous 2 options; however, this cost is typically hidden in the expense of string allocation.
If none of the above options is flexible enough for your tokenization needs, the most flexible option is using a regex_token_iterator. Of course, with this flexibility comes greater expense, but again this is likely hidden in the string allocation cost. Say, for example, we want to tokenize based on non-escaped commas, also eating white-space. Given the following input: const string str{ "The ,qu\\,ick ,\tbrown, fox" } we can do this:
const regex re{ "\\s*((?:[^\\\\,]|\\\\.)*?)\\s*(?:,|$)" };
const vector<string> tokens{ sregex_token_iterator(cbegin(str), cend(str), re, 1), sregex_token_iterator() };
Check this example. It might help you.
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main ()
{
string tmps;
istringstream is ("the delimiter is the space");
// test the extraction itself so that a failed read is never printed
while (is >> tmps) {
cout << tmps << "\n";
}
return 0;
}
If you're using C++ ranges - the full ranges-v3 library, not the limited functionality accepted into C++20 - you could do it this way:
auto results = str | ranges::views::tokenize(" ",1);
... and this is lazily-evaluated. You can alternatively set a vector to this range:
auto results = str | ranges::views::tokenize(" ",1) | ranges::to<std::vector>();
This will take O(m) space and O(n) time if str has n characters making up m words.
See also the library's own tokenization example.
MFC/ATL has a very nice tokenizer. From MSDN:
CAtlString str( "%First Second#Third" );
CAtlString resToken;
int curPos= 0;
resToken= str.Tokenize("% #",curPos);
while (resToken != "")
{
printf("Resulting token: %s\n", resToken);
resToken= str.Tokenize("% #",curPos);
};
Output
Resulting token: First
Resulting token: Second
Resulting token: Third
If you're willing to use C, you can use the strtok function. You should pay attention to multi-threading issues when using it.
For simple stuff I just use the following:
unsigned TokenizeString(const std::string& i_source,
const std::string& i_seperators,
bool i_discard_empty_tokens,
std::vector<std::string>& o_tokens)
{
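// std::string::size_type avoids truncating npos where unsigned is narrower than size_t
std::string::size_type prev_pos = 0;
std::string::size_type pos = 0;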
unsigned number_of_tokens = 0;
o_tokens.clear();
pos = i_source.find_first_of(i_seperators, pos);
while (pos != std::string::npos)
{
std::string token = i_source.substr(prev_pos, pos - prev_pos);
if (!i_discard_empty_tokens || token != "")
{
o_tokens.push_back(i_source.substr(prev_pos, pos - prev_pos));
number_of_tokens++;
}
pos++;
prev_pos = pos;
pos = i_source.find_first_of(i_seperators, pos);
}
if (prev_pos < i_source.length())
{
o_tokens.push_back(i_source.substr(prev_pos));
number_of_tokens++;
}
return number_of_tokens;
}
Cowardly disclaimer: I write real-time data processing software where the data comes in through binary files, sockets, or some API call (I/O cards, cameras). I never use this function for something more complicated or time-critical than reading external configuration files on startup.
You can simply use a regular expression library and solve that using regular expressions.
Use expression (\w+) and the variable in \1 (or $1 depending on the library implementation of regular expressions).
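With the standard library's <regex>, that suggestion might be sketched as below. The function name words is hypothetical; the sketch iterates over all matches of (\w+) and reads capture group 1 from each match.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every (\w+) match in `text`.
std::vector<std::string> words(const std::string& text) {
    static const std::regex word_re(R"((\w+))");
    std::vector<std::string> result;
    for (std::sregex_iterator it(text.begin(), text.end(), word_re), end;
         it != end; ++it) {
        result.push_back((*it)[1].str()); // capture group 1
    }
    return result;
}
```

Note that \w also matches digits and underscores, so punctuation and whitespace act as separators here.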
Many overly complicated suggestions here. Try this simple std::string solution:
using namespace std;
string someText = ...
string::size_type tokenOff = 0, sepOff = tokenOff;
while (sepOff != string::npos)
{
sepOff = someText.find(' ', sepOff);
string::size_type tokenLen = (sepOff == string::npos) ? sepOff : sepOff++ - tokenOff;
string token = someText.substr(tokenOff, tokenLen);
if (!token.empty())
/* do something with token */;
tokenOff = sepOff;
}
I thought that was what the >> operator on string streams was for:
string word; sin >> word;
Here's an approach that allows you control over whether empty tokens are included (like strsep) or excluded (like strtok).
#include <string.h> // for strchr and strlen
#include <string>
#include <vector>
/*
* want_empty_tokens==true : include empty tokens, like strsep()
* want_empty_tokens==false : exclude empty tokens, like strtok()
*/
std::vector<std::string> tokenize(const char* src,
char delim,
bool want_empty_tokens)
{
std::vector<std::string> tokens;
if (src and *src != '\0') // defensive
while( true ) {
const char* d = strchr(src, delim);
size_t len = (d)? d-src : strlen(src);
if (len or want_empty_tokens)
tokens.push_back( std::string(src, len) ); // capture token
if (d) src += len+1; else break;
}
return tokens;
}
It seems odd to me that, with all us speed-conscious nerds here on SO, no one has presented a version that uses a compile-time generated lookup table for the delimiter (example implementation further down). Using a lookup table and iterators should beat std::regex in efficiency. If you don't need to beat regex, just use it: it's standard as of C++11 and super flexible.
Some have suggested regex already but for the noobs here is a packaged example that should do exactly what the OP expects:
std::vector<std::string> split(std::string::const_iterator it, std::string::const_iterator end, std::regex e = std::regex{"\\w+"}){
std::smatch m{};
std::vector<std::string> ret{};
while (std::regex_search (it,end,m,e)) {
ret.emplace_back(m.str());
std::advance(it, m.position() + m.length()); //next start position = match position + match length
}
return ret;
}
std::vector<std::string> split(const std::string &s, std::regex e = std::regex{"\\w+"}){ //comfort version calls flexible version
return split(s.cbegin(), s.cend(), std::move(e));
}
int main ()
{
std::string str {"Some people, excluding those present, have been compile time constants - since puberty."};
auto v = split(str);
for(const auto&s:v){
std::cout << s << std::endl;
}
std::cout << "crazy version:" << std::endl;
v = split(str, std::regex{"[^e]+"}); //using e as delim shows flexibility
for(const auto&s:v){
std::cout << s << std::endl;
}
return 0;
}
If we need to be faster and accept the constraint that all chars must be 8 bits we can make a look up table at compile time using metaprogramming:
template<bool...> struct BoolSequence{}; //just here to hold bools
template<char...> struct CharSequence{}; //just here to hold chars
template<typename T, char C> struct Contains; //generic
template<char First, char... Cs, char Match> //not first specialization
struct Contains<CharSequence<First, Cs...>,Match> :
Contains<CharSequence<Cs...>, Match>{}; //strip first and increase index
template<char First, char... Cs> //is first specialization
struct Contains<CharSequence<First, Cs...>,First>: std::true_type {};
template<char Match> //not found specialization
struct Contains<CharSequence<>,Match>: std::false_type{};
template<int I, typename T, typename U>
struct MakeSequence; //generic
template<int I, bool... Bs, typename U>
struct MakeSequence<I,BoolSequence<Bs...>, U>: //not last
MakeSequence<I-1, BoolSequence<Contains<U,I-1>::value,Bs...>, U>{};
template<bool... Bs, typename U>
struct MakeSequence<0,BoolSequence<Bs...>,U>{ //last
using Type = BoolSequence<Bs...>;
};
template<typename T> struct BoolASCIITable;
template<bool... Bs> struct BoolASCIITable<BoolSequence<Bs...>>{
/* could be made constexpr but not yet supported by MSVC */
static bool isDelim(const char c){
static const bool table[256] = {Bs...};
return table[static_cast<unsigned char>(c)]; // plain char may be negative, which would index out of bounds
}
};
using Delims = CharSequence<'.',',',' ',':','\n'>; //list your custom delimiters here
using Table = BoolASCIITable<typename MakeSequence<256,BoolSequence<>,Delims>::Type>;
With that in place making a getNextToken function is easy:
template<typename T_It>
std::pair<T_It,T_It> getNextToken(T_It begin,T_It end){
begin = std::find_if(begin,end,std::not1(Table{})); //find first non delim or end
auto second = std::find_if(begin,end,Table{}); //find first delim or end
return std::make_pair(begin,second);
}
Using it is also easy:
int main() {
std::string s{"Some people, excluding those present, have been compile time constants - since puberty."};
auto it = std::begin(s);
auto end = std::end(s);
while(it != std::end(s)){
auto token = getNextToken(it,end);
std::cout << std::string(token.first,token.second) << std::endl;
it = token.second;
}
return 0;
}
Here is a live example: http://ideone.com/GKtkLQ
I know this question is already answered but I want to contribute. Maybe my solution is a bit simple but this is what I came up with:
vector<string> get_words(string const& text, string const& separator)
{
vector<string> result;
string tmp = text;
size_t first_pos = 0;
size_t second_pos = tmp.find(separator);
while (second_pos != string::npos)
{
if (first_pos != second_pos)
{
string word = tmp.substr(first_pos, second_pos - first_pos);
result.push_back(word);
}
tmp = tmp.substr(second_pos + separator.length());
second_pos = tmp.find(separator);
}
result.push_back(tmp);
return result;
}
Please comment if there is a better approach to something in my code or if something is wrong.
UPDATE: added generic separator
You can take advantage of boost::make_find_iterator. Something similar to this:
template<typename CH>
inline vector< basic_string<CH> > tokenize(
const basic_string<CH> &Input,
const basic_string<CH> &Delimiter,
bool remove_empty_token
) {
typedef typename basic_string<CH>::const_iterator string_iterator_t;
typedef boost::find_iterator< string_iterator_t > string_find_iterator_t;
vector< basic_string<CH> > Result;
string_iterator_t it = Input.begin();
string_iterator_t it_end = Input.end();
for(string_find_iterator_t i = boost::make_find_iterator(Input, boost::first_finder(Delimiter, boost::is_equal()));
i != string_find_iterator_t();
++i) {
if(remove_empty_token){
if(it != i->begin())
Result.push_back(basic_string<CH>(it,i->begin()));
}
else
Result.push_back(basic_string<CH>(it,i->begin()));
it = i->end();
}
if(it != it_end)
Result.push_back(basic_string<CH>(it,it_end));
return Result;
}
Here's my Swiss® Army Knife of string-tokenizers for splitting up strings by whitespace, accounting for single and double-quote wrapped strings as well as stripping those characters from the results. I used RegexBuddy 4.x to generate most of the code-snippet, but I added custom handling for stripping quotes and a few other things.
#include <string>
#include <locale>
#include <regex>
std::vector<std::wstring> tokenize_string(std::wstring string_to_tokenize) {
std::vector<std::wstring> tokens;
std::wregex re(LR"(("[^"]*"|'[^']*'|[^"' ]+))", std::regex_constants::collate);
std::wsregex_iterator next( string_to_tokenize.begin(),
string_to_tokenize.end(),
re,
std::regex_constants::match_not_null );
std::wsregex_iterator end;
const wchar_t single_quote = L'\'';
const wchar_t double_quote = L'\"';
while ( next != end ) {
std::wsmatch match = *next;
const std::wstring token = match.str( 0 );
next++;
if (token.length() > 2 && (token.front() == double_quote || token.front() == single_quote))
tokens.emplace_back( std::wstring(token.begin()+1, token.begin()+token.length()-1) );
else
tokens.emplace_back(token);
}
return tokens;
}
I wrote a simplified (and maybe slightly more efficient) version of https://stackoverflow.com/a/50247503/3976739 for my own use. I hope it helps.
void StrTokenizer(string& source, const char* delimiter, vector<string>& Tokens)
{
size_t new_index = 0;
size_t old_index = 0;
while (new_index != std::string::npos)
{
new_index = source.find(delimiter, old_index);
Tokens.emplace_back(source.substr(old_index, new_index-old_index));
if (new_index != std::string::npos)
old_index = ++new_index;
}
}
If the maximum length of the input string to be tokenized is known, one can exploit this and implement a very fast version. I am sketching the basic idea below, which was inspired by both strtok() and the "suffix array" data structure described in Jon Bentley's "Programming Pearls", 2nd edition, chapter 15. The C++ class in this case only gives some organization and convenience of use. The implementation shown can easily be extended to remove leading and trailing whitespace characters in the tokens.
Basically, one can replace the separator characters with string-terminating '\0' characters and set pointers to the tokens within the modified string. In the extreme case, when the string consists only of separators, one gets string-length plus 1 empty tokens. It is practical to duplicate the string to be modified.
Header file:
class TextLineSplitter
{
public:
TextLineSplitter( const size_t max_line_len );
~TextLineSplitter();
void SplitLine( const char *line,
const char sep_char = ','
);
inline size_t NumTokens( void ) const
{
return mNumTokens;
}
const char * GetToken( const size_t token_idx ) const
{
assert( token_idx < mNumTokens );
return mTokens[ token_idx ];
}
private:
const size_t mStorageSize;
char *mBuff;
char **mTokens;
size_t mNumTokens;
inline void ResetContent( void )
{
memset( mBuff, 0, mStorageSize );
// mark all items as empty:
memset( mTokens, 0, mStorageSize * sizeof( char* ) );
// reset counter for found items:
mNumTokens = 0L;
}
};
Implementation file:
TextLineSplitter::TextLineSplitter( const size_t max_line_len ):
mStorageSize ( max_line_len + 1L )
{
// allocate memory
mBuff = new char [ mStorageSize ];
mTokens = new char* [ mStorageSize ];
ResetContent();
}
TextLineSplitter::~TextLineSplitter()
{
delete [] mBuff;
delete [] mTokens;
}
void TextLineSplitter::SplitLine( const char *line,
const char sep_char /* = ',' */
)
{
assert( sep_char != '\0' );
ResetContent();
strncpy( mBuff, line, mStorageSize - 1 ); // mStorageSize is max_line_len + 1; the last byte stays '\0'
size_t idx = 0L; // running index for characters
do
{
assert( idx < mStorageSize );
const char chr = line[ idx ]; // retrieve current character
if( mTokens[ mNumTokens ] == NULL )
{
mTokens[ mNumTokens ] = &mBuff[ idx ];
} // if
if( chr == sep_char || chr == '\0' )
{ // item or line finished
// overwrite separator with a 0-terminating character:
mBuff[ idx ] = '\0';
// count-up items:
mNumTokens ++;
} // if
} while( line[ idx++ ] );
}
A scenario of usage would be:
// create an instance capable of splitting strings up to 1000 chars long:
TextLineSplitter spl( 1000 );
spl.SplitLine( "Item1,,Item2,Item3" );
for( size_t i = 0; i < spl.NumTokens(); i++ )
{
printf( "%s\n", spl.GetToken( i ) );
}
output (the second token is empty, so it prints a blank line):
Item1

Item2
Item3

Split and convert from string to char array

How to convert:
string x = "1+2+3";
to:
char y[] = {'1', '2', '3'};
What approach should I do?
The task is to split a string separated by '+'. In the below example, the delimiter ',' is used.
Splitting a string into tokens is a very old task, and many solutions are available. They all have different properties: some are difficult to understand, some are hard to develop, and they differ in complexity, speed, and flexibility.
Alternatives
Handcrafted: many variants, using pointers or iterators; may be hard to develop and error-prone.
Using the old-style std::strtok function. It may be unsafe and should probably no longer be used.
std::getline. The most used implementation, but actually a "misuse" and not so flexible.
Using a dedicated modern function, specifically developed for this purpose (std::sregex_token_iterator). The most flexible, and a good fit for the STL environment and algorithm landscape, but slower.
Please see 4 examples in one piece of code.
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <regex>
#include <algorithm>
#include <iterator>
#include <cstring>
#include <forward_list>
#include <deque>
#include <vector>
#include <type_traits>
using Container = std::vector<std::string>;
std::regex delimiter{ "," };
int main() {
// Some function to print the contents of an STL container
auto print = [](const auto& container) -> void { std::copy(container.begin(), container.end(),
std::ostream_iterator<typename std::decay<decltype(*container.begin())>::type>(std::cout, " ")); std::cout << '\n'; };
// Example 1: Handcrafted -------------------------------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Search for comma, then take the part and add to the result
for (size_t i{ 0U }, startpos{ 0U }; i <= stringToSplit.size(); ++i) {
// So, if there is a comma or the end of the string
if ((stringToSplit[i] == ',') || (i == (stringToSplit.size()))) {
// Copy substring
c.push_back(stringToSplit.substr(startpos, i - startpos));
startpos = i + 1;
}
}
print(c);
}
// Example 2: Using very old strtok function ----------------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Split string into parts in a simple for loop
#pragma warning(suppress : 4996)
for (char* token = std::strtok(const_cast<char*>(stringToSplit.data()), ","); token != nullptr; token = std::strtok(nullptr, ",")) {
c.push_back(token);
}
print(c);
}
// Example 3: Very often used std::getline with additional istringstream ------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c{};
// Put string in an std::istringstream
std::istringstream iss{ stringToSplit };
// Extract string parts in simple for loop
for (std::string part{}; std::getline(iss, part, ','); c.push_back(part))
;
print(c);
}
// Example 4: Most flexible iterator solution ------------------------------------------------
{
// Our string that we want to split
std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
Container c(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {});
//
// Everything done already with range constructor. No additional code needed.
//
print(c);
// Works also with other containers in the same way
std::forward_list<std::string> c2(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {});
print(c2);
// And works with algorithms
std::deque<std::string> c3{};
std::copy(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {}, std::back_inserter(c3));
print(c3);
}
return 0;
}
You can use an std::vector<std::string> instead of char[], that way, it would work with more than one-digit numbers. Try this:
#include <iostream>
#include <vector>
#include <string>
#include <sstream>
int main() {
using namespace std;
std::string str("1+2+3");
std::string buff;
std::stringstream ss(str);
std::vector<std::string> result;
while(getline(ss, buff, '+')){
result.push_back(buff);
}
for(std::string num : result){
std::cout << num << std::endl;
}
}
Here is a coliru link to show it works with numbers having more than one digit.
Here are my steps:
convert the original string into char*
split the obtained char* with the delimiter + by using the function strtok. I store each token into a vector<char>
convert this vector<char> into a C char array char*
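#include <iostream>
#include <string>
#include <string.h> // strcpy, strtok
#include <stdlib.h> // malloc, free
#include <stdio.h> // printf
#include <algorithm> // std::copy
#include <vector>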
using namespace std;
int main()
{
string line = "1+2+3";
std::vector<char> vectChar;
// convert the original string into a char array to allow splitting
// (+1 leaves room for the terminating '\0' that strcpy writes)
char* input= (char*) malloc(sizeof(char)*(line.size()+1));
strcpy(input,line.c_str());
// splitting the string
char *token = strtok(input, "+");
int len=0;
while(token) {
std::cout << *token;
vectChar.push_back(*token);
token = strtok(NULL, "+");
}
// end of splitting step
std::cout << std::endl;
//test display the content of the vect<char>={'1', '2', ...}
for (int i=0; i< vectChar.size(); i++)
{
std::cout << vectChar[i];
}
// Now that the vector contains the needed list of char
// we need to convert it to char array (char*)
// first malloc, with one extra byte for the terminating '\0'
char* buffer = (char*) malloc((vectChar.size()+1)*sizeof(char));
// then convert the vector into char*
std::copy(vectChar.begin(), vectChar.end(), buffer);
buffer[vectChar.size()] = '\0'; // terminate so the loop below knows where to stop
std::cout << std::endl;
//now buffer={'1', '2', ...}
// let us test by displaying
for (char* p = buffer; *p != '\0'; p++)
{
printf("%c", *p);
}
free(buffer);
free(input);
}
You can run/check this code in https://repl.it/#JomaCorpFX/StringSplit#main.cpp
Code
#include <cstdlib> // EXIT_SUCCESS
#include <iostream>
#include <string>
#include <vector>
std::vector<std::string> Split(const std::string &data, const std::string &toFind)
{
std::vector<std::string> v;
if (data.empty() || toFind.empty())
{
v.push_back(data);
return v;
}
size_t ini = 0;
size_t pos;
while ((pos = data.find(toFind, ini)) != std::string::npos)
{
std::string s = data.substr(ini, pos - ini);
if (!s.empty())
{
v.push_back(s);
}
ini = pos + toFind.length();
}
if (ini < data.length())
{
v.push_back(data.substr(ini));
}
return v;
}
int main()
{
std::string x = "1+2+3";
for (auto value : Split(x, "+")) // plain literals: since C++20, u8"" literals are char8_t-based and do not convert to std::string
{
std::cout << "Value: " << value << std::endl;
}
std::cout << "Press enter to continue... ";
std::cin.get();
return EXIT_SUCCESS;
}
Output
Value: 1
Value: 2
Value: 3
Press enter to continue...

Turn std::string into array of char* const*'s

I am writing a command shell in C++ using the POSIX API, and have hit a snag. I am executing via execvp(3), so I somehow need to turn the std::string that contains the command into a suitable array of char* const*'s that can be passed to:
int execvp(const char *file, char *const argv[]);
I have been racking my brain for hours but I can't think of any realistic or sane way to do this. Any help or insight on how I can achieve this conversion would be greatly appreciated. Thank you and have a good day!
edit:
As per request of Chnossos, here is an example:
const char *args[] = {"echo", "Hello,", "world!"};
execvp(args[0], args);
Assuming you have a string that contains more than one argument, you will first have to split the string (a std::vector<std::string> would work to store the separate strings). Then, for each element in the vector, store the .c_str() of that string into a const char* args[MAXARGS] [or a std::vector<const char*> args; and use args.data() if you don't mind using C++11]. Do not forget to store a 0 or nullptr in the last element.
If you use c_str, it is critical that the string you take it from is not a temporary: const char* x = str.substr(11, 33).c_str(); will not give you what you want, because at the end of that statement the temporary string is destroyed and its storage freed.
If you have only one actual argument,
const char* args[2] = { str.c_str(), 0 };
would work.
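The two steps described above (split into owning strings, then collect pointers) can be sketched as follows. This is a sketch assuming plain whitespace splitting with no quoting; split_args and make_argv are hypothetical helper names. It hands out char* rather than const char* because execvp is declared with char *const argv[]:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a command line on whitespace; the returned strings own the data.
std::vector<std::string> split_args(const std::string& command) {
    std::istringstream in(command);
    std::vector<std::string> args;
    for (std::string word; in >> word; ) {
        args.push_back(word);
    }
    return args;
}

// Build a null-terminated pointer array over `args`.
// The pointers stay valid only while `args` is alive and unmodified.
std::vector<char*> make_argv(std::vector<std::string>& args) {
    std::vector<char*> argv;
    argv.reserve(args.size() + 1);
    for (std::string& a : args) {
        argv.push_back(&a[0]);   // writable pointer into the string's own buffer
    }
    argv.push_back(nullptr);     // execvp expects a trailing null pointer
    return argv;
}
```

A caller would keep the std::vector<std::string> in a named variable, check that it is not empty, and then call execvp(argv[0], argv.data()). Keeping the strings in a named variable is what avoids the temporary-lifetime problem mentioned above.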
Exemplary approach:
#include <string>
#include <vector>
#include <cstring> // strdup (POSIX)
#include <cstdio> // printf
using namespace std;
// Stand-in for the real execvp (declared in <unistd.h>), for demonstration only
int execvp(const char *file, char *const argv[]) {
//doing sth
return 0;
}
int main() {
string s = "echo Hello world!";
char* cs = strdup(s.c_str());
char* lastbeg = cs;
vector<char *> collection;
for (char *itcs = cs; *itcs; itcs++) {
if (*itcs == ' ') {
*itcs = 0;
collection.push_back(lastbeg);
lastbeg = itcs + 1;
}
}
collection.push_back(lastbeg);
for (auto x: collection) {
printf("%s\n", x);
}
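collection.push_back(nullptr); // execvp requires a null pointer after the last argument
execvp("abc.txt", &collection[0]);
}
Notice that the memory for cs isn't freed here; in your application you would need to take care of that (for example with free(cs) once the pointers are no longer needed).
The number of arguments can be obtained from collection.size(), minus one for the terminating null pointer.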
I use this:
command_line.hpp:
#pragma once
#include <vector>
#include <string>
namespace wpsc { namespace unittest { namespace mock {
class command_line final
{
public:
explicit command_line(std::vector<std::string> args = {});
explicit command_line(int argc, char const * const * const argv);
int argc() const;
/// @remark altering memory returned by this function results in UB
char** argv() const;
std::string string() const;
private:
std::vector<std::string> args_;
mutable std::vector<char*> c_args_;
};
}}} // wpsc::unittest::mock
command_line.cpp:
#include <wpsc/unittest/mock/command_line.hpp>
#include <algorithm>
#include <sstream>
#include <iterator> // std::back_inserter, std::ostream_iterator
namespace wpsc { namespace unittest { namespace mock {
command_line::command_line(std::vector<std::string> args)
: args_( std::move(args) ), c_args_( )
{
}
command_line::command_line(int argc, char const * const * const argv)
: command_line{ std::vector<std::string>{ argv, argv + argc } }
{
}
int command_line::argc() const
{
return static_cast<int>(args_.size());
}
char ** command_line::argv() const
{
if(args_.empty())
return nullptr;
if(c_args_.size() != args_.size() + 1)
{
c_args_.clear();
using namespace std;
transform(begin(args_), end(args_), back_inserter(c_args_),
[](const std::string& s) { return const_cast<char*>(s.c_str()); }
);
c_args_.push_back(nullptr);
}
return c_args_.data();
}
std::string command_line::string() const
{
using namespace std;
ostringstream buffer;
copy(begin(args_), end(args_), ostream_iterator<std::string>{ buffer, " " });
return buffer.str();
}
}}} // wpsc::unittest::mock
Client code:
int main(int argc, char** argv)
{
wpsc::unittest::mock::command_line cmd1{ argc, argv };
// wpsc::unittest::mock::command_line cmd2{ {"app.exe", "-h"} };
some_app_controller c;
return c.run(cmd1.argc(), cmd1.argv());
}
If the parsing can actually be really complicated, I'd go with something like this:
std::string cmd = "some really complicated command here";
char * const args[] =
{
const_cast<char *>("sh"),
const_cast<char *>("-c"),
const_cast<char *>(cmd.c_str()), // execvp does not modify the argument strings
nullptr
};
execvp(args[0], args);
So the problem is the splitting of the line into individual arguments, and filling the argument vector with the respective pointers?
Assuming you want to split at the whitespace in the line, you replace whitespace in the string with null-bytes (in-place). You can then fill the argument vector with pointers into the string.
You will have to write a single loop to go through the string.
You need to decide what the rules will be for your shell and implement them. That's a significant fraction of the work of making a shell.
You need to write this code, and it's not simple. In a typical shell, echo "Hello world!" has to become { echo, Hello world! }, while echo \"Hello world!\" has to become { echo, "Hello world!" }. And so on.
What will " do in your shell? What will ' do? You need to make these decisions before you code this part.
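To make the point concrete, here is a sketch of a splitter for one possible set of rules. The rules are my assumptions, not those of any particular shell: spaces and tabs separate arguments, double and single quotes group characters and are dropped, and a backslash outside single quotes makes the next character literal. shell_split is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a command line into arguments under these (assumed) rules:
//  - unquoted spaces and tabs separate arguments
//  - "..." and '...' group characters into one argument; the quotes are dropped
//  - outside single quotes, a backslash makes the next character literal
std::vector<std::string> shell_split(const std::string& line) {
    std::vector<std::string> args;
    std::string current;
    bool in_token = false;   // true once the current argument has started
    char quote = '\0';       // active quote character, or '\0' if none

    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (c == '\\' && quote != '\'' && i + 1 < line.size()) {
            current += line[++i];          // escaped character, taken literally
            in_token = true;
        } else if (quote != '\0') {
            if (c == quote) quote = '\0';  // closing quote
            else current += c;
        } else if (c == '"' || c == '\'') {
            quote = c;                     // opening quote; "" yields an empty argument
            in_token = true;
        } else if (c == ' ' || c == '\t') {
            if (in_token) {                // end of an argument
                args.push_back(current);
                current.clear();
                in_token = false;
            }
        } else {
            current += c;
            in_token = true;
        }
    }
    if (in_token) args.push_back(current); // an unterminated quote is not diagnosed
    return args;
}
```

Variable expansion, globbing, pipes, and redirection are deliberately left out; each of them is another rule you would have to decide on.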

How to implode a vector of strings into a string (the elegant way)

I'm looking for the most elegant way to implode a vector of strings into a string. Below is the solution I'm using now:
static std::string& implode(const std::vector<std::string>& elems, char delim, std::string& s)
{
for (std::vector<std::string>::const_iterator ii = elems.begin(); ii != elems.end(); ++ii)
{
s += (*ii);
if ( ii + 1 != elems.end() ) {
s += delim;
}
}
return s;
}
static std::string implode(const std::vector<std::string>& elems, char delim)
{
std::string s;
return implode(elems, delim, s);
}
Are there any others out there?
Use boost::algorithm::join(..):
#include <boost/algorithm/string/join.hpp>
...
std::string joinedString = boost::algorithm::join(elems, delim);
See also this question.
std::vector<std::string> strings;
const char* const delim = ", ";
std::ostringstream imploded;
std::copy(strings.begin(), strings.end(),
std::ostream_iterator<std::string>(imploded, delim));
(include <string>, <vector>, <sstream> and <iterator>)
If you want to have a clean end (no trailing delimiter) have a look here
You should use std::ostringstream rather than std::string to build the output (then you can call its str() method at the end to get a string, so your interface need not change, only the temporary s).
From there, you could change to using std::ostream_iterator, like so:
copy(elems.begin(), elems.end(), ostream_iterator<string>(s, delim));
But this has two problems:
delim now needs to be a const char*, rather than a single char. No big deal.
std::ostream_iterator writes the delimiter after every single element, including the last. So you'd either need to erase the last one at the end, or write your own version of the iterator which doesn't have this annoyance. It'd be worth doing the latter if you have a lot of code that needs things like this; otherwise the whole mess might be best avoided (i.e. use ostringstream but not ostream_iterator).
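A minimal sketch of that last option, using std::ostringstream without std::ostream_iterator. It keeps the question's implode(elems, delim) interface with a single char delimiter; the first flag is my own addition:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Join elems with delim between elements, with no trailing delimiter.
std::string implode(const std::vector<std::string>& elems, char delim) {
    std::ostringstream out;
    bool first = true;
    for (const std::string& e : elems) {
        if (!first) out << delim;   // delimiter goes before every element except the first
        out << e;
        first = false;
    }
    return out.str();
}
```

Tracking "first" with a flag rather than checking whether the output is empty keeps the result correct when the vector starts with empty strings.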
Because I love one-liners (they are very useful for all kinds of weird stuff, as you'll see at the end), here's a solution using std::accumulate and C++11 lambda:
std::accumulate(alist.begin(), alist.end(), std::string(),
[](const std::string& a, const std::string& b) -> std::string {
return a + (a.length() > 0 ? "," : "") + b;
} )
I find this syntax useful with stream operators, where I don't want all kinds of unrelated logic outside the stream operation just to do a simple string join. Consider, for example, this return statement from a method that formats a string using stream operators (assuming using namespace std;):
return (dynamic_cast<ostringstream&>(ostringstream()
<< "List content: " << endl
<< std::accumulate(alist.begin(), alist.end(), std::string(),
[](const std::string& a, const std::string& b) -> std::string {
return a + (a.length() > 0 ? "," : "") + b;
} ) << endl
<< "Maybe some more stuff" << endl
)).str();
Update:
As pointed out by @plexando in the comments, the above code misbehaves when the list starts with empty strings, because the "is first run" check does not notice previous runs that added no characters. It is also odd to run an "is first run" check on every run (i.e. the code is under-optimized).
The solution for both of these problems is easy if we know for a fact that the list has at least one element. OTOH, if we know for a fact that the list does not have at least one element, then we can shorten the run even more.
I think the resulting code isn't as pretty, so I'm adding it here as The Correct Solution, but I think the discussion above still has merit:
alist.empty() ? "" : /* leave early if there are no items in the list */
std::accumulate( /* otherwise, accumulate */
std::next(alist.begin()), alist.end(), /* the range 2nd to after-last */
*alist.begin(), /* and start accumulating with the first item */
[](const auto& a, const auto& b) { return a + "," + b; });
Notes:
For containers that support direct access to the first element, it's probably better to use that for the third argument instead, so alist[0] for vectors.
As per the discussion in the comments and chat, the lambda still does some copying. This can be minimized by using this (less pretty) lambda instead: [](auto&& a, auto&& b) -> auto& { a += ','; a += b; return a; } which (on GCC 10) improves performance by more than 10x. Thanks to @Deduplicator for the suggestion. I'm still trying to figure out what is going on here.
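For reference, here is a self-contained sketch of the append-in-place idea. It is a variant, not the exact lambda from the note: the accumulator is taken by value, which is cheap from C++20 on because std::accumulate moves the accumulator into the callback, while on older standards it still works but copies once per step. join_accumulate is a made-up name.

```cpp
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

// Join with std::accumulate, appending to the accumulator instead of
// building a fresh temporary with operator+ on every step.
std::string join_accumulate(const std::vector<std::string>& alist, char delim) {
    if (alist.empty()) return std::string();
    return std::accumulate(
        std::next(alist.begin()), alist.end(),
        alist.front(),                          // start with a copy of the first element
        [delim](std::string acc, const std::string& s) {
            acc += delim;
            acc += s;
            return acc;                         // returned by move
        });
}
```

Starting from the first element means no "is this the first run" check is needed, so leading empty strings are joined correctly.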
I like to use this one-liner accumulate (no trailing delimiter):
(std::accumulate defined in <numeric>)
std::accumulate(
std::next(elems.begin()),
elems.end(),
elems[0],
[](std::string a, std::string b) {
return a + delimiter + b;
}
);
what about simple stupid solution?
std::string String::join(const std::vector<std::string> &lst, const std::string &delim)
{
std::string ret;
for(const auto &s : lst) {
if(!ret.empty())
ret += delim;
ret += s;
}
return ret;
}
With fmt you can do.
#include <fmt/format.h>
auto s = fmt::format("{}",fmt::join(elems,delim));
But I don't know if join will make it to std::format.
string join(const vector<string>& vec, const char* delim)
{
stringstream res;
copy(vec.begin(), vec.end(), ostream_iterator<string>(res, delim));
return res.str();
}
Especially with bigger collections, you want to avoid having to check whether you're still adding the first element, just to ensure there is no trailing separator.
So for the empty or single-element list, there is no iteration at all.
Empty ranges are trivial: return "".
Single element or multi-element can be handled perfectly by accumulate:
auto join = [](const auto &range, const auto &separator) { // const auto&& would only accept rvalues
if (range.empty()) return std::string();
return std::accumulate(
next(begin(range)), // there is at least 1 element, so OK.
end(range),
range[0], // the initial value
[&separator](auto result, const auto &value) {
return result + separator + value;
});
};
Running sample (requires C++14): http://cpp.sh/8uspd
A version that uses std::accumulate:
#include <numeric>
#include <iostream>
#include <string>
struct infix {
std::string sep;
infix(const std::string& sep) : sep(sep) {}
std::string operator()(const std::string& lhs, const std::string& rhs) {
std::string rz(lhs);
if(!lhs.empty() && !rhs.empty())
rz += sep;
rz += rhs;
return rz;
}
};
int main() {
std::string a[] = { "Hello", "World", "is", "a", "program" };
std::string sum = std::accumulate(a, a+5, std::string(), infix(", "));
std::cout << sum << "\n";
}
While I would normally recommend using Boost as per the top answer, I recognise that in some projects that's not desired.
The STL solutions suggested using std::ostream_iterator will not work as intended - it'll append a delimiter at the end.
There is now a way to do this with modern C++ using std::experimental::ostream_joiner:
std::ostringstream outstream;
std::copy(strings.begin(),
strings.end(),
std::experimental::make_ostream_joiner(outstream, delimiter.c_str()));
return outstream.str();
Here's what I use, simple and flexible
string joinList(vector<string> arr, string delimiter)
{
if (arr.empty()) return "";
string str;
for (auto i : arr)
str += i + delimiter;
str = str.substr(0, str.size() - delimiter.size());
return str;
}
using:
string a = joinList({ "a", "bbb", "c" }, "!##");
output:
a!##bbb!##c
Here is another one that doesn't add the delimiter after the last element:
std::string concat_strings(const std::vector<std::string> &elements,
const std::string &separator)
{
if (!elements.empty())
{
std::stringstream ss;
auto it = elements.cbegin();
while (true)
{
ss << *it++;
if (it != elements.cend())
ss << separator;
else
return ss.str();
}
}
return "";
}
Using part of this answer to another question gives you a joined string, based on a separator and without a trailing comma.
Usage:
std::vector<std::string> input_str = std::vector<std::string>({"a", "b", "c"});
std::string result = string_join(input_str, ",");
printf("%s", result.c_str());
/// a,b,c
Code:
std::string string_join(const std::vector<std::string>& elements, const char* const separator)
{
switch (elements.size())
{
case 0:
return "";
case 1:
return elements[0];
default:
std::ostringstream os;
std::copy(elements.begin(), elements.end() - 1, std::ostream_iterator<std::string>(os, separator));
os << *elements.rbegin();
return os.str();
}
}
Another simple and good solution is using ranges v3. The current version is C++14 or greater, but there are older versions that are C++11 or greater. Unfortunately, C++20 ranges don't have the intersperse function.
The benefits of this approach are:
Elegant
Easily handle empty strings
Handles the last element of the list
Efficiency. Because ranges are lazily evaluated.
Small and useful library
Functions breakdown(Reference):
accumulate = Similar to std::accumulate but arguments are a range and the initial value. There is an optional third argument that is the operator function.
filter = Like std::views::filter in C++20: keeps only the elements that satisfy the predicate.
intersperse = The key function! Intersperses a delimiter between range input elements.
#include <iostream>
#include <string>
#include <vector>
#include <range/v3/numeric/accumulate.hpp>
#include <range/v3/view/filter.hpp>
#include <range/v3/view/intersperse.hpp>
int main()
{
using namespace ranges;
// Can be any std container
std::vector<std::string> a{ "Hello", "", "World", "is", "", "a", "program" };
std::string delimiter{", "};
std::string finalString =
accumulate(a | views::filter([](std::string s){return !s.empty();})
| views::intersperse(delimiter)
, std::string());
std::cout << finalString << std::endl; // Hello, World, is, a, program
}
A possible solution with ternary operator ?:.
std::string join(const std::vector<std::string> & v, const std::string & delimiter = ", ") {
std::string result;
for (size_t i = 0; i < v.size(); ++i) {
result += (i ? delimiter : "") + v[i];
}
return result;
}
join({"2", "4", "5"}) will give you 2, 4, 5.
If you are already using a C++ base library (for commonly used tools), string-processing features are typically included. Besides Boost mentioned above, Abseil provides:
std::vector<std::string> names {"Linus", "Dennis", "Ken"};
std::cout << absl::StrJoin(names, ", ") << std::endl;
Folly provides:
std::vector<std::string> names {"Linus", "Dennis", "Ken"};
std::cout << folly::join(", ", names) << std::endl;
Both give the string "Linus, Dennis, Ken".
Slightly long solution, but doesn't use std::ostringstream, and doesn't require a hack to remove the last delimiter.
http://www.ideone.com/hW1M9
And the code:
struct appender
{
appender(char d, std::string& sd, int ic) : delim(d), dest(sd), count(ic)
{
dest.reserve(2048);
}
void operator()(std::string const& copy)
{
dest.append(copy);
if (--count)
dest.append(1, delim);
}
char delim;
std::string& dest; // a reference member cannot be declared mutable
int count;
};
void implode(const std::vector<std::string>& elems, char delim, std::string& s)
{
std::for_each(elems.begin(), elems.end(), appender(delim, s, elems.size()));
}
This can be solved using boost
#include <boost/range/adaptor/filtered.hpp>
#include <boost/algorithm/string/join.hpp>
#include <boost/algorithm/algorithm.hpp>
std::vector<std::string> win {"Stack", "", "Overflow"};
const std::string Delimitor{","};
const std::string combined_string =
boost::algorithm::join(win |
boost::adaptors::filtered([](const auto &x) {
return x.size() != 0;
}), Delimitor);
Output:
combined_string: "Stack,Overflow"
I'm using the following approach that works fine in C++17. The function starts checking if the given vector is empty, in which case returns an empty string. If that's not the case, it takes the first element from the vector, then iterates from the second one until the end and appends the separator followed by the vector element.
template <typename T>
std::basic_string<T> Join(std::vector<std::basic_string<T>> vValues,
std::basic_string<T> strDelim)
{
std::basic_string<T> strRet;
typename std::vector<std::basic_string<T>>::iterator it(vValues.begin());
if (it != vValues.end()) // The vector is not empty
{
strRet = *it;
while (++it != vValues.end()) strRet += strDelim + *it;
}
return strRet;
}
Usage example:
std::vector<std::string> v1;
std::vector<std::string> v2 { "Hello" };
std::vector<std::string> v3 { "Str1", "Str2" };
std::cout << "(1): " << Join<char>(v1, ",") << std::endl;
std::cout << "(2): " << Join<char>(v2, "; ") << std::endl;
std::cout << "(3): [" << Join<char>(v3, "] [") << "]" << std::endl;
Output:
(1):
(2): Hello
(3): [Str1] [Str2]