Split a string at a character combination such as `\n` - C++

What is the right way to split a string like below at a specific character-combination into a string vector?
string myString = "This is \n a test. Let's go on. \n Yeah.";
split at "\n" to get this result:
vector<string> myVector = {
"This is ",
" a test. Let's go on. ",
" Yeah."
};
I was using boost algorithm library but now I'd like to achieve this all without using an external library like boost.
#include <boost/algorithm/string/classification.hpp>
#include <boost/algorithm/string/split.hpp>
std::vector<std::string> result;
boost::split(result, "This is \n a test. Let's go on. \n Yeah.",
boost::is_any_of("\n"), boost::token_compress_on);

How about something like this:
#include <iostream>
#include <sstream>
#include <vector>
#include <string>
#include <iterator>
class line : public std::string {};
std::istream &operator>>(std::istream &iss, line &line)
{
std::getline(iss, line, '\n');
return iss;
}
int main()
{
std::istringstream iss("This is \n a test. Let's go on. \n Yeah.");
std::vector<std::string> v(std::istream_iterator<line>{iss}, std::istream_iterator<line>{});
// test
for (auto const &s : v)
std::cout << s << std::endl;
return 0;
}
Basically, define a new string type, line, whose operator>> reads a whole line, and use stream iterators to read the lines straight into the vector's range constructor.
Working demo: https://ideone.com/4qdfY2

Solution 1: Only remove "\n" from the string.
If you only need to remove "\n", you can use the erase-remove idiom.
#include <iostream>
#include <string>
#include <algorithm>
int main()
{
std::string myString = "This is \n a test. Let's go on. \n Yeah.";
myString.erase(std::remove(myString.begin(), myString.end(), '\n'),
myString.end());
std::cout << myString<< std::endl;
}
Output:
This is  a test. Let's go on.  Yeah.
Solution 2: To remove "\n" from the string and save each split at \n to a vector (inefficient).
Replace every occurrence of \n with some other character that doesn't occur in the string (here I have chosen ;). Then parse with the help of std::stringstream and std::getline as follows.
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
#include <sstream>
int main()
{
std::string myString = "This is \n a test. Let's go on. \n Yeah.";
std::replace(myString.begin(), myString.end(), '\n', ';');
std::stringstream ssMyString(myString);
std::string each_split;
std::vector<std::string> vec;
while(std::getline(ssMyString, each_split, ';')) vec.emplace_back(each_split);
for(const auto& it: vec) std::cout << it << "\n";
}
Output:
This is
a test. Let's go on.
Yeah.
Solution 3: To remove "\n" from the string and save each split at \n to a vector.
Loop through the string and use std::string::find to locate each \n (the end position). Push back each substring (std::string::substr) using the start position and the number of characters between the start and end positions. Update the start and end positions each time, so that the search does not start again from the beginning of the input string.
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <cstddef>
int main()
{
std::string myString = "This is \n a test. Let's go on. \n Yeah.";
std::vector<std::string> vec;
std::size_t start_pos = 0;
std::size_t end_pos = 0;
while ((end_pos = myString.find("\n", end_pos)) != std::string::npos)
{
vec.emplace_back(myString.substr(start_pos, end_pos - start_pos));
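start_pos = end_pos + 1; // the next piece starts right after the '\n'
end_pos = start_pos; // resume searching there, so consecutive '\n' are not skipped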
}
vec.emplace_back(myString.substr(start_pos, myString.size() - start_pos)); // last substring
for(const auto& it: vec) std::cout << it << "\n";
}
Output:
This is
a test. Let's go on.
Yeah.
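The loop in Solution 3 can be generalized to a delimiter of any length, which is what "character combination" in the question suggests. The following is only a sketch: split_at is a hypothetical helper name, not something from the answers above, and it assumes empty pieces should be kept (unlike boost::token_compress_on).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `input` at every occurrence of `delimiter`,
// which may be longer than one character. Empty pieces are kept.
std::vector<std::string> split_at(const std::string& input,
                                  const std::string& delimiter)
{
    std::vector<std::string> parts;
    if (delimiter.empty()) {              // an empty delimiter would never advance
        parts.push_back(input);
        return parts;
    }
    std::size_t start = 0;
    std::size_t end = 0;
    while ((end = input.find(delimiter, start)) != std::string::npos) {
        parts.push_back(input.substr(start, end - start));
        start = end + delimiter.size();   // resume right after the delimiter
    }
    parts.push_back(input.substr(start)); // text after the last delimiter
    return parts;
}
```

Calling split_at(myString, "\n") on the string from the question should give the three pieces listed there, with the surrounding spaces preserved.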

Related

Boost split string by Blank Line

Is there a way to use boost::split to split a string when a blank line is encountered?
Here is a snippet of what I mean.
std::stringstream source;
source.str(input_string);
std::string line;
std::getline(source, line, '\0');
std::vector<std::string> token;
boost::split(token, line, boost::is_any_of("what goes here for blank line"));
You can split by double \n\n unless you meant blank line as "a line that may contain other whitespace".
#include <boost/regex.hpp>
#include <boost/algorithm/string_regex.hpp>
#include <boost/algorithm/string/classification.hpp>
#include <sstream>
#include <iostream>
#include <iomanip>
int main() {
std::stringstream source;
// an empty line after "line one", and a line holding only spaces before "bye"
source.str("line one\n\nthat was an empty line, now some whitespace:\n   \nbye");
std::string line(std::istreambuf_iterator<char>(source), {});
std::vector<std::string> tokens;
auto re = boost::regex("\n\n");
boost::split_regex(tokens, line, re);
for (auto token : tokens) {
std::cout << std::quoted(token) << "\n";
}
}
Prints
"line one"
"that was an empty line, now some whitespace:
bye"
Allow whitespace on "empty" lines
Just express it in a regular expression:
auto re = boost::regex(R"(\n\s*\n)");
Now the output is:
"line one"
"that was an empty line, now some whitespace:"
"bye"

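For readers who want to avoid Boost, roughly the same split can be sketched with the standard <regex> header. This is an assumption-laden sketch rather than the answerer's method: split_on_blank_lines is a hypothetical name, and it relies on std::sregex_token_iterator with submatch index -1, which yields the text between matches.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` wherever a blank line occurs.
// A line that holds only whitespace also counts as blank.
std::vector<std::string> split_on_blank_lines(const std::string& text)
{
    static const std::regex blank_line(R"(\n\s*\n)");
    // Submatch index -1 selects the text between matches, not the matches.
    return std::vector<std::string>(
        std::sregex_token_iterator(text.begin(), text.end(), blank_line, -1),
        std::sregex_token_iterator());
}
```

std::regex is often slower than boost::regex, so this is mainly useful when dropping the dependency matters more than speed.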
Ignore spaces in vector C++

I'm trying to split a string into individual words and store them in a vector in C++. I would like to know how to ignore the extra spaces when the user puts more than one space between words.
How would I do that?
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
int main(){
cout<<"Sentence: ";
string sentence;
getline(cin,sentence);
vector<string> my;
int start=0;
unsigned int end=sentence.size();
unsigned int temp=0;
while(temp<end){
int te=sentence.find(" ",start);
temp=te;
my.push_back(sentence.substr(start, temp-start));
start=temp+1;
}
unsigned int i;
for(i=0 ; i<my.size() ; i++){
cout<<my[i]<<endl;
}
return 0;
}
Four things:
When reading input from a stream into a string using the overloaded >> operator, it automatically separates on whitespace, i.e. it reads "words".
There exists an input stream that uses a string as the input: std::istringstream.
You can use iterators with streams, e.g. std::istream_iterator.
std::vector has a constructor taking a pair of iterators.
That means your code could simply be
std::string line;
std::getline(std::cin, line);
std::istringstream istr(line);
// braces avoid the "most vexing parse", which would declare a function named words
std::vector<std::string> words(std::istream_iterator<std::string>{istr},
std::istream_iterator<std::string>{});
After this, the vector words will contain all the "words" from the input line.
You can easily print the "words" using std::ostream_iterator and std::copy:
std::copy(begin(words), end(words),
std::ostream_iterator<std::string>(std::cout, "\n"));
The easiest way is to use a std::istringstream as follows:
std::string sentence;
std::getline(std::cin,sentence);
std::istringstream iss(sentence);
std::vector<std::string> my;
std::string word;
while(iss >> word) {
my.push_back(word);
}
Any whitespace will be ignored and skipped automatically.
You can create the vector directly using the std::istream_iterator which skips white spaces:
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <iterator>
int main() {
std::string str = "Hello World Lorem Ipsum The Quick Brown Fox";
std::istringstream iss(str);
std::vector<std::string> vec {std::istream_iterator<std::string>(iss),
std::istream_iterator<std::string>() };
for (const auto& el : vec) {
std::cout << el << '\n';
}
}
Here is a function which divides given sentence into words.
#include <string>
#include <vector>
#include <sstream>
#include <utility>
std::vector<std::string> divideSentence(const std::string& sentence) {
std::stringstream stream(sentence);
std::vector<std::string> words;
std::string word;
while(stream >> word) {
words.push_back(std::move(word));
}
return words;
}
Reducing double, triple etc. spaces in string is a problem you'll encounter again and again. I've always used the following very simple algorithm:
Pseudocode:
while "  " in string:
string.replace("  ", " ")
After the while loop, you know your string only has single spaces since multiple consecutive spaces were compressed to singles.
Most languages allow you to search for a substring in a string and most languages have the ability to run string.replace() so it's a useful trick.
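In C++ the pseudocode above could look roughly like this. It is a minimal sketch: collapse_spaces is a hypothetical name, and it handles only the space character, not tabs or newlines.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces every pair of spaces with a single space
// until no pair of consecutive spaces remains.
std::string collapse_spaces(std::string s)
{
    std::size_t pos = 0;
    while ((pos = s.find("  ", pos)) != std::string::npos) {
        // Keep pos unchanged: the remaining space may start another pair.
        s.replace(pos, 2, " ");
    }
    return s;
}
```

Because the search resumes at the position of the last replacement instead of at the start, the string is scanned only once overall.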

Extracting multiple strings from a single line in a text file in c++

I have a text file that contains information about books (title, author, genre). Every line looks like this: '[title]' '[author]' '[genre]'. How can I divide such a line into three strings, one each for the title, author and genre?
You can split a string according to any rule for which you can define a regular expression, then use sregex_token_iterator to enumerate all matches in the string. This example saves all matches into a vector.
#include <vector>
#include <iostream>
#include <string>
#include <regex>
std::vector<std::string> get_params(const std::string& sentence)
{
std::regex reg("'([^']*)'"); // one quoted field; group 1 is the text between the quotes
std::vector<std::string> names(
std::sregex_token_iterator(sentence.begin(), sentence.end(), reg, 1), // 1 = keep only group 1
std::sregex_token_iterator());
return names;
}
int main()
{
std::string str = "\'String1\' \'String2\' \'String3\'";
std::vector<std::string> v = get_params(str);
for (auto const& s : v)
std::cout << s << '\n';
}
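If a regular expression feels heavy for three fields, a stream-based alternative can be sketched with std::quoted (C++14), which accepts a custom delimiter. The Book struct and parse_book are hypothetical names introduced for illustration. The sketch assumes the fields themselves contain no apostrophes; it also treats a backslash as an escape character, which is std::quoted's default.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical record type for one line of the file.
struct Book {
    std::string title;
    std::string author;
    std::string genre;
};

// Parses a line such as: 'Dune' 'Frank Herbert' 'Science fiction'
// Returns false if three quoted fields could not be read.
bool parse_book(const std::string& line, Book& book)
{
    std::istringstream in(line);
    // With '\'' as the delimiter, each extraction reads up to the closing
    // quote and keeps the spaces inside the field.
    in >> std::quoted(book.title, '\'')
       >> std::quoted(book.author, '\'')
       >> std::quoted(book.genre, '\'');
    return !in.fail();
}
```

In a real program this would typically be called once per line obtained from std::getline on the input file.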

getline(param1,param2,param3) usage in c++ ,linux

This may be a simple question: I am writing a small C++ program that parses a string using a delimiter, and I want the delimiter to consist of multiple spaces (actually one or more spaces). Is that possible? My sample code is:
#include <stdio.h>
#include <iostream>
#include <vector>
#include <string>
#include <fstream>
#include <stdlib.h>
#include <cstring>
#include <sstream>
using namespace std;
int main()
{
string str="HELLO      THIS  IS   888and777";
char buf[1000];
getline(buf, 1000);
string str(buf);
stringstream stream(buf);
string toStr;
getline(stream, toStr,'      ');//here the delimiter is six spaces
string str1=tostr;
getline(stream, toStr,'  ');//here the delimiter is two spaces
string str2=tostr;
getline(stream, toStr,'   ');//here the delimiter is three spaces
string str3=tostr;
cout<<str1<<"\t"<<str2<<"\t"<<str3<<endl;
return 0;
}
But I can't use a delimiter of multiple characters. Any ideas, please?
I get the following error:
error: invalid conversion from ‘void*’ to ‘char**’
error: cannot convert ‘std::string’ to ‘size_t*’ for argument ‘2’ to ‘__ssize_t getline(char**, size_t*, FILE*)’
The delimiter used by std::getline() is always a single character. Accepting a string would require a non-trivial algorithm to guarantee suitable performance. In addition, a character literal written as 'x' is normally expected to hold a single char; a multi-character literal such as 'ab' has type int and an implementation-defined value, so it cannot serve as a char delimiter.
For the example I think the easiest approach is to simply tokenize the string directly:
#include <tuple>
#include <utility>
#include <string>
#include <iostream>
std::pair<std::string, std::string::size_type>
get_token(std::string const& value, std::string::size_type pos, std::string const& delimiter)
{
if (pos == value.npos) {
return std::make_pair(std::string(), pos);
}
std::string::size_type end(value.find(delimiter, pos));
return end == value.npos
? std::make_pair(value.substr(pos), end)
: std::make_pair(value.substr(pos, end - pos), end + delimiter.size());
}
int main()
{
std::string str("HELLO      THIS  IS   888and777");
std::string str1, str2, str3;
std::string::size_type pos(0);
std::tie(str1, pos) = get_token(str, pos, "      "); // six spaces
std::tie(str2, pos) = get_token(str, pos, "  "); // two spaces
std::tie(str3, pos) = get_token(str, pos, "   "); // three spaces
std::cout << "str1='" << str1 << "' str2='" << str2 << "' str3='" << str3 << "'\n";
}

Splitting std::string and inserting into a std::set

As per request of the fantastic fellas over at the C++ chat lounge, what is a good way to break down a file (which in my case contains a string with roughly 100 lines, and about 10 words in each line) and insert all these words into a std::set?
The easiest way to construct any container from a source that holds a series of that element, is to use the constructor that takes a pair of iterators. Use istream_iterator to iterate over a stream.
#include <set>
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
using namespace std;
int main()
{
//I create an iterator that retrieves `string` objects from `cin`
auto begin = istream_iterator<string>(cin);
//I create an iterator that represents the end of a stream
auto end = istream_iterator<string>();
//and iterate over the file, and copy those elements into my `set`
set<string> myset(begin, end);
//this line copies the elements in the set to `cout`
//I have this to verify that I did it all right
copy(myset.begin(), myset.end(), ostream_iterator<string>(cout, "\n"));
return 0;
}
http://ideone.com/iz1q0
Assuming you've read your file into a string, boost::split will do the trick:
#include <set>
#include <boost/foreach.hpp>
#include <boost/algorithm/string.hpp>
std::string astring = "abc 123 abc 123\ndef 456 def 456"; // your string
std::set<std::string> tokens; // this will receive the words
boost::split(tokens, astring, boost::is_any_of("\n ")); // split on space & newline
// Print the individual words
BOOST_FOREACH(std::string token, tokens){
std::cout << "\n" << token << std::endl;
}
Lists or Vectors can be used instead of a Set if necessary.
Also note this is almost a dupe of:
Split a string in C++?
#include <set>
#include <iostream>
#include <string>
int main()
{
std::string temp, mystring;
std::set<std::string> myset;
while(std::getline(std::cin, temp))
mystring += temp + ' ';
temp = "";
for (size_t i = 0; i < mystring.length(); i++)
{
if (mystring.at(i) == ' ' || mystring.at(i) == '\n' || mystring.at(i) == '\t')
{
if (!temp.empty()) myset.insert(temp); // skip empty tokens produced by repeated delimiters
temp = "";
}
else
{
temp.push_back(mystring.at(i));
}
}
if (!temp.empty()) // the original chain of != joined by || was always true
myset.insert(temp);
for (std::set<std::string>::iterator i = myset.begin(); i != myset.end(); i++)
{
std::cout << *i << std::endl;
}
return 0;
}
Let's start at the top. First off, you need a few variables to work with. temp is just a placeholder for the string while you build it from each character in the string you want to parse. mystring is the string you are looking to split up and myset is where you will be sticking the split strings.
So then we read the file (input through < piping) and insert the contents into mystring.
Now we want to iterate down the length of the string, searching for spaces, newlines, or tabs to split the string up with. If we find one of those characters, then we need to insert the string into the set, and empty our placeholder string, otherwise, we add the character to the placeholder, which will build up the string. Once we finish, we need to add the last string to the set.
Finally, we iterate down the set, and print each string, which is simply for verification, but could be useful otherwise.
Edit: A significant improvement on my code provided by Loki Astari in a comment which I thought should be integrated into the answer:
#include <set>
#include <iostream>
#include <string>
int main()
{
std::set<std::string> myset;
std::string word;
while(std::cin >> word)
{
myset.insert(std::move(word));
}
for(std::set<std::string>::const_iterator it=myset.begin(); it!=myset.end(); ++it)
std::cout << *it << '\n';
}