I have a set of strings whose fields are separated by #. I want to split them and insert the pieces into a std::unordered_set.
For example:
abc#def#ghi
xyz#mno#pqr
I use boost::split, passing the unordered_set as the destination, but every call gives me a fresh result set. I want to append the next result to the same set.
std::string str1 = "abc#def#ghi";
std::string str2 = "xyz#mno#pqr";
std::unordered_set<std::string> result;
boost::split(result, str1, boost::is_any_of("#"));
boost::split(result, str2, boost::is_any_of("#"));
If I check the result set, I only get xyz, mno and pqr. I want abc, def and ghi to be in it as well. How can I achieve this?
Note: I don't want to use any additional container.
I'd do: (see it Live On Coliru)
#include <sstream>
#include <unordered_set>
#include <iostream>
int main()
{
std::unordered_set<std::string> result;
std::istringstream iss("abc#def#ghi");
std::string tok;
while (std::getline(iss, tok, '#'))
result.insert(tok);
iss.str("xyz#mno#pqr");
iss.clear();
while (std::getline(iss, tok, '#'))
result.insert(tok);
for (auto& s : result)
std::cout << s << "\n";
}
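The two loops above do the same thing, so they can be folded into a small helper. This is only a sketch: split_into is a made-up name, not something from Boost or the standard library, and it assumes a single-character delimiter.

```cpp
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical helper (not part of Boost or the standard library):
// splits `text` on `delim` and inserts each token into `out`,
// leaving whatever `out` already contains untouched.
void split_into(std::unordered_set<std::string>& out,
                const std::string& text, char delim)
{
    std::istringstream iss(text);
    std::string tok;
    while (std::getline(iss, tok, delim))
        out.insert(tok);
}
```

Calling split_into(result, str1, '#') and then split_into(result, str2, '#') leaves all six tokens in result, because insert only ever adds to the set.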
This is because boost::split clears the destination container before writing into it.
I'd use boost::tokenizer for what you want.
#include <boost/tokenizer.hpp>
// ....
typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
boost::char_separator<char> sep("#");
std::string str1 = "abc#def#ghi";
std::string str2 = "xyz#mno#pqr";
std::unordered_set<std::string> result;
tokenizer t1(str1, sep), t2(str2, sep);
std::copy(t1.begin(), t1.end(), std::inserter(result, result.end()) );
std::copy(t2.begin(), t2.end(), std::inserter(result, result.end()) );
I have seen some very popular questions here on StackOverflow about splitting a string in C++, but every time they needed to split the string by the space delimiter. Instead, I want to split a std::string by the ; delimiter.
This code is taken from an answer on StackOverflow, but I don't know how to update it for ; instead of space.
#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
int main() {
using namespace std;
string sentence = "And I feel fine...";
istringstream iss(sentence);
copy(istream_iterator<string>(iss),
istream_iterator<string>(),
ostream_iterator<string>(cout, "\n"));
}
Can you help me?
Here is one of the answers from Split a string in C++? that uses any delimiter.
I use this to split string by a delim. The first puts the results in a pre-constructed vector, the second returns a new vector.
std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
std::stringstream ss(s);
std::string item;
while (std::getline(ss, item, delim)) {
elems.push_back(item);
}
return elems;
}
std::vector<std::string> split(const std::string &s, char delim) {
std::vector<std::string> elems;
split(s, delim, elems);
return elems;
}
Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:
std::vector<std::string> x = split("one:two::three", ':');
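If the empty token is unwanted, the same getline loop can simply drop it. A minimal sketch, assuming a single-character delimiter; split_skip_empty is a hypothetical name, not part of the answer quoted above:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant of split() above: same getline loop,
// but empty tokens are dropped instead of stored.
std::vector<std::string> split_skip_empty(const std::string& s, char delim)
{
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        if (!item.empty())
            elems.push_back(item);
    }
    return elems;
}
```

With this variant, split_skip_empty("one:two::three", ':') yields three items: one, two and three.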
I have a string that I want to store in a vector, with each line going into a separate element:
string a = "N\nT\n";
std::string ss (".V/\n.F/\n.R/\n");
for(int i = 0; i< ss.size(); i++)
{
test1.push_back(ss);
}
I want to store the lines of the string in the vector test1.
Is this the best way?
Your code won't work; it'll store the string ss.size() times in the vector.
You might want to use a string stream to split the string:
std::stringstream stream(ss);
std::string line;
while (std::getline(stream, line)) {
test1.push_back(line);
}
Note that the newline character will be discarded. If you want to keep it, push_back(line + "\n");.
Boost::split will do this for you. See usage details here:
http://www.boost.org/doc/libs/1_49_0/doc/html/string_algo/usage.html#id3184031
If the newline can be discarded then you could use std::copy() with std::istream_iterator (note that it splits on any whitespace, not only on newlines):
#include <iostream>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>
int main()
{
std::string ss(".V/\n.F/\n.R/\n");
std::istringstream in(ss);
std::vector<std::string> test1;
std::copy(std::istream_iterator<std::string>(in),
std::istream_iterator<std::string>(),
std::back_inserter(test1));
std::for_each(test1.begin(),
test1.end(),
[](const std::string& s)
{
std::cout << s << "\n";
});
return 0;
}
Output:
.V/
.F/
.R/
This certainly isn't the best way, because it doesn't work. This just pushes ss.size() instances of the std::string in the vector.
You can use the find and substr methods to partition the string and push them in the array. (not gonna write the actual code though, might be a good exercise).
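For anyone who wants to compare notes after trying the exercise, here is one possible sketch of the find/substr approach. split_lines is a hypothetical name, and the choice to drop the newline and ignore an empty trailing piece is an assumption about what is wanted:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `text` at each '\n' using find and substr.
// The newline itself is not stored; a final piece without a trailing
// newline is kept only if it is non-empty.
std::vector<std::string> split_lines(const std::string& text)
{
    std::vector<std::string> lines;
    std::string::size_type start = 0;
    std::string::size_type nl;
    while ((nl = text.find('\n', start)) != std::string::npos) {
        lines.push_back(text.substr(start, nl - start));
        start = nl + 1;  // continue just past the newline
    }
    if (start < text.size())
        lines.push_back(text.substr(start));
    return lines;
}
```

For the string from the question, split_lines(".V/\n.F/\n.R/\n") gives a vector holding .V/, .F/ and .R/.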
I have a text file with a set of names formatted in the following way:
"MARY","PATRICIA","LINDA","BARBARA","ELIZABETH"
and so on. I want to open the file using ifstream and read the names into a string array (without quotes, commas). I somehow managed to do it by checking the input stream character by character. Is there an easier way to take this formatted input?
EDIT:
I heard that you can use something like
fscanf (f, "\"%[a-zA-Z]\",", str);
in C, but is there such a method for ifstream?
That input should be parsable with std::getline or std::regex_token_iterator (though the latter is shooting sparrows with artillery).
Examples:
Regex
Quick and dirty, yet heavyweight solution (using boost so most compilers eat this)
#include <boost/regex.hpp>
#include <iostream>
int main() {
const std::string s = "\"MARY\",\"PATRICIA\",\"LINDA\",\"BARBARA\",\"ELIZABETH\"";
boost::regex re("\"(.*?)\"");
for (boost::sregex_token_iterator it(s.begin(), s.end(), re, 1), end;
it != end; ++it)
{
std::cout << *it << std::endl;
}
}
Output:
MARY
PATRICIA
LINDA
BARBARA
ELIZABETH
Alternatively, you can use
boost::regex re(",");
for (boost::sregex_token_iterator it(s.begin(), s.end(), re, -1), end;
to let it split along commas (note also the -1) or other regexes.
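If Boost is not available, the same idea should carry over to std::regex from C++11, since std::sregex_token_iterator takes the same arguments. A rough sketch, where quoted_fields is a hypothetical helper name:

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch using std::regex (C++11) instead of boost::regex.
// `quoted_fields` is a hypothetical name; it collects capture group 1,
// i.e. the text between each pair of double quotes.
std::vector<std::string> quoted_fields(const std::string& s)
{
    static const std::regex re("\"(.*?)\"");
    std::vector<std::string> out;
    for (std::sregex_token_iterator it(s.begin(), s.end(), re, 1), end;
         it != end; ++it)
    {
        out.push_back(it->str());
    }
    return out;
}
```

Returning a vector instead of printing is just a design choice to make the result easy to reuse; the loop is otherwise the Boost one with the namespace changed.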
getline
getline solution (whitespace allowed)
#include <sstream>
#include <iostream>
int main() {
std::stringstream ss;
ss.str ("\"MARY\",\"PATRICIA\",\"LINDA\",\"BARBARA\",\"ELIZABETH\"");
std::string curr;
while (std::getline (ss, curr, ',')) {
size_t from = 1 + curr.find_first_of ('"'),
to = curr.find_last_of ('"');
std::cout << curr.substr (from, to-from) << std::endl;
}
}
Output is the same.
getline
getline solution (whitespace not allowed)
The loop becomes almost trivial:
std::string curr;
while (std::getline (ss, curr, ',')) {
std::cout << curr.substr (1, curr.length()-2) << std::endl;
}
homebrew solution
Least wasteful w.r.t. performance (especially when you wouldn't store those strings, but iterators or indices instead)
#include <iostream>
#include <string>
int main() {
const std::string str ("\"MARY\",\"PATRICIA\",\"LINDA\",\"BARBARA\",\"ELIZABETH\"");
size_t i = 0;
while (i != std::string::npos) {
size_t begin = str.find ('"', i) + 1, // one behind initial '"'
end = str.find ('"', begin),
comma = str.find (',', end);
i = comma;
std::cout << str.substr(begin, end-begin) << std::endl;
}
}
As far as I know, there is no tokenizer in the STL. But if you are willing to use Boost, there's a very good tokenizer class there. Other than that, going character by character is your best C++ way of addressing it (unless you are willing to go the C route and use strtok, or its reentrant variant strtok_r, on your raw char * strings).
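On the question of whether streams have anything like the fscanf pattern: since C++14 there is std::quoted in <iomanip>, which extracts a quoted string and strips the quotes. A minimal sketch, assuming the input looks exactly like the sample (double-quoted names separated by single commas); read_quoted_names is a hypothetical name:

```cpp
#include <iomanip>
#include <istream>
#include <string>
#include <vector>

// Hypothetical helper: reads "A","B","C" style input from any std::istream
// (a std::ifstream can be passed the same way). Requires C++14 for std::quoted.
std::vector<std::string> read_quoted_names(std::istream& in)
{
    std::vector<std::string> names;
    std::string name;
    while (in >> std::quoted(name)) {   // strips the surrounding quotes
        names.push_back(name);
        if (in.peek() == ',')           // step over the separating comma
            in.ignore();
    }
    return names;
}
```

Be aware that std::quoted treats the backslash as an escape character by default, which does not matter for plain names but would for arbitrary text.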
A simple tokenizer should do the trick; no need for something heavy-weight like regular expressions. C++ doesn't have a built-in one, but it's easy enough to write. Here's one which I myself stole off the internet so long ago I don't even remember who wrote it, so apologies for the blatant plagiarism:
#include <vector>
#include <string>
std::vector<std::string>
tokenize(const std::string & str, const std::string & delimiters)
{
std::vector<std::string> tokens;
// Skip delimiters at beginning.
std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find first "non-delimiter".
std::string::size_type pos = str.find_first_of(delimiters, lastPos);
while (std::string::npos != pos || std::string::npos != lastPos)
{
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find next "non-delimiter"
pos = str.find_first_of(delimiters, lastPos);
}
return tokens;
}
Usage: std::vector<std::string> words = tokenize(line, ",");
Actually, because I was interested, I worked out how to do this using Boost.Spirit.Qi:
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace boost::spirit::qi;
int main() {
// our test-string
std::string data("\"MARY\",\"PATRICIA\",\"LINDA\",\"BARBARA\"");
// this is where we will store the names
std::vector<std::string> names;
// parse the string
phrase_parse(data.begin(), data.end(),
( lexeme['"' >> +(char_ - '"') >> '"'] % ',' ),
space, names);
// print what we have parsed
std::copy(names.begin(), names.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
To check if an error occurred during parsing, simply store the iterators over the string in variables, and compare them afterwards. If they are equal, the whole string was matched, if not, the begin-iterator will point to the error site.
Possible Duplicate:
How to split a string in C++?
What is the best way to split a string in C++? The string can be assumed to be composed of words separated by ;.
Under our guidelines, C string functions are not allowed, and Boost is not allowed either, because open source is prohibited for security reasons.
What I have right now is:
string str("denmark;sweden;india;us");
The words in str should be stored in a vector as separate strings. How can we achieve this?
Thanks for your input.
I find std::getline() is often the simplest. The optional delimiter parameter means it's not just for reading "lines":
#include <sstream>
#include <iostream>
#include <vector>
using namespace std;
int main() {
vector<string> strings;
istringstream f("denmark;sweden;india;us");
string s;
while (getline(f, s, ';')) {
cout << s << endl;
strings.push_back(s);
}
}
You could use a string stream and read the elements into the vector.
Here are many different examples...
A copy of one of the examples:
std::vector<std::string> split(const std::string& s, char seperator)
{
std::vector<std::string> output;
std::string::size_type prev_pos = 0, pos = 0;
while((pos = s.find(seperator, pos)) != std::string::npos)
{
std::string substring( s.substr(prev_pos, pos-prev_pos) );
output.push_back(substring);
prev_pos = ++pos;
}
output.push_back(s.substr(prev_pos, pos-prev_pos)); // Last word
return output;
}
There are several libraries available solving this problem, but the simplest is probably to use Boost Tokenizer:
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
#include <boost/foreach.hpp>
typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
int main()
{
    std::string str("denmark;sweden;india;us");
    boost::char_separator<char> sep(";");
    tokenizer tokens(str, sep);
    BOOST_FOREACH(std::string const& token, tokens)
    {
        // print the loop variable; there is no tok_iter in this scope
        std::cout << "<" << token << "> " << "\n";
    }
}
I have a string "stack+ovrflow*newyork;" and I have to split it into stack, overflow, newyork.
Any ideas?
First and foremost if available, I would always use boost::tokenizer for this kind of task (see and upvote the great answers below)
Without access to boost, you have a couple of options:
You can use C++ std::strings and parse them using a stringstream and getline (safest way)
std::string str = "stack+overflow*newyork;";
std::istringstream stream(str);
std::string tok1;
std::string tok2;
std::string tok3;
std::getline(stream, tok1, '+');
std::getline(stream, tok2, '*');
std::getline(stream, tok3, ';');
std::cout << tok1 << "," << tok2 << "," << tok3 << std::endl;
Or you can use one of the strtok family of functions (see Naveen's answer for the unicode agnostic version; see xtofls comments below for warnings about thread safety), if you are comfortable with char pointers
char str[30];
strncpy(str, "stack+overflow*newyork;", sizeof str - 1);
str[sizeof str - 1] = '\0';
// strtok overwrites each delimiter with '\0' and returns a pointer to the token
char* result1 = strtok(str, "+");
// pass NULL to continue scanning the same buffer
char* result2 = strtok(NULL, "*");
char* result3 = strtok(NULL, ";");
// output the tokens separated by commas
if (result1 != NULL && result2 != NULL && result3 != NULL)
{
    printf("%s,%s,%s\n", result1, result2, result3);
}
Boost tokenizer
Simple like this:
#include <boost/tokenizer.hpp>
#include <vector>
#include <string>
std::string stringToTokenize= "stack+ovrflow*newyork;";
boost::char_separator<char> sep("+*;");
boost::tokenizer< boost::char_separator<char> > tok(stringToTokenize, sep);
std::vector<std::string> vectorWithTokenizedStrings;
vectorWithTokenizedStrings.assign(tok.begin(), tok.end());
Now vectorWithTokenizedStrings has the tokens you are looking for. Notice the boost::char_separator variable. It holds the separators between the tokens.
See boost tokenizer here.
You can use _tcstok to tokenize the string based on a delimiter.
This site has a string tokenising function that takes a string of characters to use as delimiters and returns a vector of strings.
Simple STL String Tokenizer Function
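The linked function is not reproduced here, but a function of that shape is short enough to sketch. The version below is not the code from that site; tokenize_any is a hypothetical name, and skipping empty tokens is an assumption about the desired behaviour:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch, not the code from the linked site:
// every character in `delims` ends a token; empty tokens are skipped.
std::vector<std::string> tokenize_any(const std::string& text,
                                      const std::string& delims)
{
    std::vector<std::string> tokens;
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        std::string::size_type stop = text.find_first_of(delims, start);
        tokens.push_back(text.substr(start, stop - start));
        start = text.find_first_not_of(delims, stop);
    }
    return tokens;
}
```

For the string in the question, tokenize_any("stack+ovrflow*newyork;", "+*;") returns the three tokens stack, ovrflow and newyork.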
There is another way to split a string using c/c++ :
First define a function to split a string:
#include <string.h>
// pointers to the substrings; assumes there will be no more than 5 fields
char *fields[5];
// str: the string to be split (modified in place, so it must be writable)
// splitter: the separator string
// returns the number of fields found, or 0 on error
int split(char *str, const char *splitter)
{
    if(NULL == str || NULL == splitter || '\0' == *splitter)
    {
        return 0;
    }
    int cnt;
    fields[0] = str;
    // check cnt < 5 before assigning, so fields[5] is never written
    for(cnt = 1; cnt < 5 &&
        (fields[cnt] = strstr(fields[cnt - 1], splitter)) != NULL; cnt++)
    {
        *fields[cnt] = '\0';
        fields[cnt] += strlen(splitter);
    }
    return cnt;
}
Then you can use this function to split a string as follows (the buffer must be writable, so use an array rather than a pointer to a string literal):
char str[] = "stack+ovrflow*newyork;";
split(str, "+");
printf("%s\n", fields[0]); //print "stack"
split(fields[1], "*");
printf("%s\n", fields[0]); //print "ovrflow"
split(fields[1], ";");
printf("%s\n", fields[0]); //print "newyork"
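This approach avoids copying the substrings, but note that it modifies the input buffer and keeps its results in a global array, so it is not reentrant.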