I'm trying to use the split() function provided in boost/algorithm/string.hpp in the following function:
vector<std::string> splitString(string input, string pivot) { // Pivot: e.g., "##"
    vector<string> splitInput; // Vector where the split pieces are stored
    split(splitInput, input, is_any_of(pivot), token_compress_on); // Split the string
    return splitInput;
}
The following call:
string hello = "Hieafds##addgaeg##adf#h";
vector<string> split = splitString(hello,"##"); //Split the string based on occurrences of "##"
splits the string into "Hieafds", "addgaeg", "adf" and "h". However, I don't want the string to be split on a single #. I think the problem is with is_any_of().
How should the function be modified so that the string is split only on occurrences of "##"?
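You're right: is_any_of() is the cause. It treats every character of its argument as a separate delimiter, so this still splits on a single '#':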
std::string input = "some##text";
std::vector<std::string> output;
split( output, input, is_any_of( "##" ) );
Update
If you want to split only on the exact sequence "##", you can use a regular expression with split_regex (from boost/algorithm/string/regex.hpp):
split_regex( output, input, regex( "##" ) );
Take a look at the documentation example.
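If pulling in a regex library is not an option, the same result can be had with plain std::string::find. This is only a minimal sketch using the standard library, not Boost's implementation; splitOnDelimiter is a hypothetical name chosen for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `input` on every occurrence of the whole
// string `delim` (not on its individual characters).
std::vector<std::string> splitOnDelimiter(const std::string& input,
                                          const std::string& delim) {
    std::vector<std::string> parts;
    if (delim.empty()) {            // an empty delimiter would never advance
        parts.push_back(input);
        return parts;
    }
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = input.find(delim, start)) != std::string::npos) {
        parts.push_back(input.substr(start, pos - start));
        start = pos + delim.size(); // continue after the delimiter
    }
    parts.push_back(input.substr(start)); // remainder after the last delimiter
    return parts;
}
```

With the string from the question, splitOnDelimiter(hello, "##") should leave "adf#h" as a single piece, because a lone '#' never matches the two-character delimiter.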
Related
I could have a string like:
During this time , Bond meets a stunning IRS agent , whom he seduces .
I need to remove the extra spaces before the comma and before the period in my whole string. I tried throwing this into a char vector and only not push_back if the current char was " " and the following char was a "." or "," but it did not work. I know there is a simple way to do it maybe using trim(), find(), or erase() or some kind of regex but I am not the most familiar with regex.
A solution could be (using regex library):
std::string fix_string(const std::string& str) {
    static const std::regex rgx_pattern("\\s+(?=[\\.,])");
    std::string rtn;
    rtn.reserve(str.size());
    std::regex_replace(std::back_insert_iterator<std::string>(rtn),
                       str.cbegin(),
                       str.cend(),
                       rgx_pattern,
                       "");
    return rtn;
}
This function takes a string as input and removes the whitespace that precedes a comma or a period.
Here a demo
In a loop, search for the string " ," and, whenever you find one, replace it with ",":
std::string str = "...";
while( true ) {
    auto pos = str.find( " ," );
    if( pos == std::string::npos )
        break;
    str.replace( pos, 2, "," );
}
Do the same for " .". If you need to handle other whitespace characters such as tabs, use a regex with a suitable character class.
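The two loops can be folded into one helper that takes the set of punctuation characters. A minimal sketch, assuming only plain spaces need removing; removeSpacesBefore is a hypothetical name, not a library function.

```cpp
#include <string>

// Hypothetical helper: removes every space that directly precedes one of the
// characters in `punctuation`. Only ' ' is handled, not tabs or newlines.
std::string removeSpacesBefore(std::string str, const std::string& punctuation) {
    for (char p : punctuation) {
        const std::string target = std::string(" ") + p;
        std::string::size_type pos = 0;
        while ((pos = str.find(target, pos)) != std::string::npos) {
            str.erase(pos, 1);  // drop the space, keep the punctuation mark
            if (pos > 0)
                --pos;          // step back in case another space preceded it
        }
    }
    return str;
}
```

Calling removeSpacesBefore(text, ",.") covers both the comma and the period case in one call.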
I don't know how to use regex in C++, and I'm not sure whether C++ supports PCRE syntax. I'm posting the regex anyway (I can delete this answer if it doesn't work in C++).
You can use this regex:
\s+(?=[,.])
Regex demo
First, there is no need to use a vector of char: you could very well do the same by using an std::string.
Then, your approach can't work because your copy is independent of the position of the space. Unfortunately you have to remove only spaces around the punctuation, and not those between words.
Modifying your code slightly, you could delay copying spaces until you have seen the first non-space character that follows them: if it is not punctuation, copy one space before the character; otherwise copy only the non-space character (thus getting rid of the spaces).
Similarly, once you've copied a punctuation just loop and ignore the following spaces until the first non-space char.
I could have written code. It would have been shorter. But I prefer letting you finish your homework with full understanding of the approach.
S12345>T12345:abcdancd
The string starts with S followed by a few digits, then >, then T followed by a few digits, then : and any string.
@"^S\d+>T\d+:[a-z]+", RegexOptions.IgnoreCase
Instead of regex you can also use simpler means like String.Split method:
string str = "S12345>T12345:abcdancd";
string[] parts = str.Split('>', ':');
foreach (var part in parts)
    Console.WriteLine(part);
Output:
// S12345
// T12345
// abcdancd
std::istream::ignore discards characters until one compares equal to delim. Is there an alternative that works on strings rather than chars, i.e., one that discards strings until one compares equal to a specified string?
The easiest way would be to continuously extract a string until you find the right one:
std::istringstream iss;
std::string str;
std::string pattern = "find me";
while ( iss >> str && str != pattern ) ;
if (!iss) { /* Pattern not found, or a stream error occurred */ }
This assumes that the strings are delimited with whitespace characters, of course.
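If the text to skip is not whitespace-delimited, a character-level variant is possible. The following is only a sketch: ignoreUntil is a hypothetical helper (nothing like it exists in the standard library), and it uses a simple sliding window rather than an optimized search.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: discards characters from `in` up to and including the
// first occurrence of `pattern`. Returns true if the pattern was found,
// false if the stream ended (or failed) first.
bool ignoreUntil(std::istream& in, const std::string& pattern) {
    if (pattern.empty())
        return true;                 // nothing to look for
    std::string window;              // the last pattern.size() characters read
    char c;
    while (in.get(c)) {
        window.push_back(c);
        if (window.size() > pattern.size())
            window.erase(0, 1);      // keep the window at pattern length
        if (window == pattern)
            return true;
    }
    return false;
}
```

After a successful call, the next read from the stream starts right after the pattern, much like std::istream::ignore leaves the stream just past the delimiter.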
Small question about the C++ replace function. I'm parsing text input line by line. Example of the text file:
SF27_34KJ
EEE_30888
KPD324222
4230_333
I need to replace every underscore on each line with a comma. When I try something like this:
mystring.replace(mystring.begin(), mystring.end(), '_', ',');
on a line, instead of "SF27,34KJ" I get 95 ',' characters. What could be wrong?
Use std::replace():
std::replace(mystring.begin(), mystring.end(), '_', ',');
basic_string::replace doesn't do what you think it does.
basic_string::replace(it_a, it_e, ... ) replaces all of the characters between it_a and it_e with whatever you specify, not just those that match something.
There are a hundred ways to do what you're trying to do, but the simplest is probably to use the std::replace from <algorithm>, which does do what you want:
std::replace(mystring.begin(), mystring.end(), '_', ',');
Another method is to use std::transform in conjunction with a functor. This has an advantage over std::replace in that you can perform multiple substitutions in a single pass.
Here is a C++03 functor that would do it:
struct ReplChars
{
    char operator()(char c) const
    {
        if( c == '_' )
            return ',';
        if( c == '*' )
            return '.';
        return c;
    }
};
...and the use of it:
std::transform(mystring.begin(), mystring.end(), mystring.begin(), ReplChars());
In C++11, this can be reduced by using a lambda instead of the functor:
std::transform(mystring.begin(), mystring.end(), mystring.begin(), [](char c)->char
{
    if( c == '_' )
        return ',';
    if( c == '*' )
        return '.';
    return c;
});
Looking at the reference, there is no replace overload that takes two iterators and then two characters. Considering that the ASCII value of '_' is 95, I'm guessing you're hitting this one instead:
string& replace ( iterator i1, iterator i2, size_t n2, char c );
So instead of replacing all instances of '_' with ',', you're replacing the string from begin() to end() with 95 ','s.
See here for how to replace occurrences in a string.
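The difference between the two calls can be checked side by side. A small sketch, assuming an ASCII execution character set (where '_' is 95); the function names are hypothetical and exist only for this comparison.

```cpp
#include <algorithm>
#include <string>

// The call from the question: '_' is converted to the count 95, so the whole
// range is replaced by 95 copies of ',' (ASCII assumption).
std::string viaMemberReplace(std::string s) {
    s.replace(s.begin(), s.end(), '_', ',');
    return s;
}

// std::replace from <algorithm> substitutes element by element instead.
std::string viaStdReplace(std::string s) {
    std::replace(s.begin(), s.end(), '_', ',');
    return s;
}
```

For the input "SF27_34KJ", the first function returns a string of 95 commas, while the second returns "SF27,34KJ".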
This isn't exactly how replace works. Check out the API reference at http://www.cplusplus.com/reference/string/string/replace/: you give iterators for the beginning and end along with a string to copy in, but the other argument is the maximum length of that section.
To get the functionality that you're going for, try finding a substring and replacing it (two calls). That's detailed here: Replace part of a string with another string.
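The find-then-replace approach can be wrapped in a loop so that every occurrence is handled. A minimal sketch; replaceAll is a hypothetical helper name, not a member of std::string.

```cpp
#include <string>

// Hypothetical helper: replaces every occurrence of `from` in `text` with `to`
// by repeating the two calls mentioned above (find, then replace).
std::string replaceAll(std::string text, const std::string& from,
                       const std::string& to) {
    if (from.empty())
        return text;                        // nothing to search for
    std::string::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to); // position/length overload
        pos += to.size();                   // resume after the inserted text
    }
    return text;
}
```

Unlike std::replace, this also works when the search and replacement texts are longer than one character, for example replaceAll(line, "_", ", ").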
I have written this code to split a string containing words separated by many spaces and/or tabs into a string vector containing just the words.
#include <iostream>
#include <vector>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string.hpp>
int main()
{
    using namespace std;
    string str("cONtainS SoMe CApiTaL WORDS");
    vector<string> strVec;
    using boost::is_any_of;
    boost::algorithm::split(strVec, str, is_any_of("\t "));
    vector<string>::iterator i;
    for(i = strVec.begin(); i != strVec.end(); i++)
        cout << *i << endl;
    return 0;
}
I was expecting an output
cONtainS
SoMe
CApiTaL
WORDS
but I am getting output with spaces as elements in strVec, i.e.
cONtainS
SoMe
CApiTaL
WORDS
You need to add a final parameter with the value boost::token_compress_on, as per the documentation:
boost::algorithm::split(strVec,str,is_any_of("\t "),boost::token_compress_on);
It's because your input contains consecutive separators. By default split interprets that to mean they have empty strings between them.
To get the output you expected, you need to specify the optional eCompress parameter, with value token_compress_on.
http://www.boost.org/doc/libs/1_43_0/doc/html/boost/algorithm/split_id667600.html
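To see what the compress option changes without needing Boost, here is a standard-library sketch that mimics the two modes. It is not Boost's implementation; splitAny and its compress flag are hypothetical names used only to illustrate the behaviour described above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `s` on any character in `seps`. With
// compress == false, adjacent separators yield empty strings between them;
// with compress == true, a run of separators counts as a single one.
std::vector<std::string> splitAny(const std::string& s, const std::string& seps,
                                  bool compress) {
    std::vector<std::string> out;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type pos = s.find_first_of(seps, start);
        if (pos == std::string::npos) {
            out.push_back(s.substr(start)); // last token
            break;
        }
        out.push_back(s.substr(start, pos - start));
        start = pos + 1;
        if (compress) {
            // skip the rest of the run of adjacent separators
            const std::string::size_type next = s.find_first_not_of(seps, start);
            start = (next == std::string::npos) ? s.size() : next;
        }
    }
    return out;
}
```

With two spaces between "a" and "b", the uncompressed mode returns "a", "" and "b", whereas the compressed mode returns only "a" and "b".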