Boost regex token iterator: getting input between parentheses - c++

I'm using the following function with Boost's std::tr1::sregex_token_iterator:
int regexMultiple(std::string **s, std::string r)
{
    std::tr1::regex term(r);
    const std::tr1::sregex_token_iterator end;
    int nCountOcurrences = 0;  // the counter must start at zero
    std::string sTemp = **s;
    for (std::tr1::sregex_token_iterator i(sTemp.begin(), sTemp.end(), term); i != end; ++i)
    {
        (*s)[nCountOcurrences] = *i;
        nCountOcurrences++;
    }
    return nCountOcurrences;
}
As you can probably guess, s points to an array of strings (the input is its first element), and r is the regex in question. This function works (strictly speaking, this version might not, because I simplified it from the original; the rest is not relevant to the question).
What I want to know is: given, for example, a regex like "Email: (.*?) Phone:...", is there any way to retrieve only the (.*?) part, or should I take substrings of the result instead?
Otherwise, it returns the whole match: Email: myemail#domain.com Phone: ..
Thanks.

You should use regex_search instead, as Kerrek SB recommended, and read the capture group from the match results: http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/ref/regex_search.html
#include <boost/regex.hpp>
#include <string>

int regexMultiple(std::string **s, std::string r)
{
    // Use boost::regex throughout: boost::regex_search cannot take a std::tr1::regex.
    boost::regex term(r);
    std::string::const_iterator start, end;
    boost::match_results<std::string::const_iterator> what;
    int nCountOcurrences = 0;
    std::string sTemp = **s;
    start = sTemp.begin();
    end = sTemp.end();
    boost::match_flag_type flags = boost::match_default;
    while (boost::regex_search(start, end, what, term, flags))
    {
        (*s)[nCountOcurrences] = what[1].str();  // capture group 1 only, e.g. the (.*?) part
        nCountOcurrences++;
        start = what[0].second;  // continue searching after the whole match
        flags |= boost::match_prev_avail;
        flags |= boost::match_not_bob;
    }
    return nCountOcurrences;
}
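If you would rather keep the token iterator from the question, a possibly simpler route is its submatch argument: the fourth constructor parameter of sregex_token_iterator selects which capture group each step yields (0, the default, is the whole match). Below is a minimal sketch using C++11's std::regex rather than std::tr1; the function name captureAll is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects only capture group 1 of every match. The fourth constructor
// argument of std::sregex_token_iterator (here 1) selects the submatch.
std::vector<std::string> captureAll(const std::string& text, const std::string& pattern)
{
    const std::regex term(pattern);
    const std::sregex_token_iterator end;
    std::vector<std::string> results;
    for (std::sregex_token_iterator i(text.begin(), text.end(), term, 1); i != end; ++i)
        results.push_back(i->str());
    return results;
}
```

For example, captureAll(input, "Email: (.*?) Phone:") should return just the addresses, without the surrounding "Email: " and " Phone:" text.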

Related

C++ regex replace whole word

I'm writing a small game in which I sometimes need to replace a group of characters in a sentence with the player's name.
For example, I could have a sentence like:
"[Player]! Are you okay? A plane crash happened, it's on fire!"
And I need to replace the "[Player]" with some name contained in a std::string.
I've spent about 20 minutes looking through other SO questions and the C++ reference, and I really can't understand how to use regex.
I would like to know how I can replace all instances of the "[Player]" string in a std::string.
Personally I would not use regex for this. A simple search and replace should be enough.
These are (roughly) the functions I use:
#include <iostream>
#include <string>

// change the string in-place
std::string& replace_all_mute(std::string& s,
    const std::string& from, const std::string& to)
{
    // find() returns npos when nothing is left; npos + 1 wraps to 0, which ends the loop
    if(!from.empty())
        for(std::size_t pos = 0; (pos = s.find(from, pos) + 1); pos += to.size())
            s.replace(--pos, from.size(), to);
    return s;
}
// return a copy of the string
std::string replace_all_copy(std::string s,
    const std::string& from, const std::string& to)
{
    return replace_all_mute(s, from, to);
}
int main()
{
    std::string s = "[Player]! Are you okay? A plane crash happened, it's on fire!";
    replace_all_mute(s, "[Player]", "Uncle Bob");
    std::cout << s << '\n';
}
Output:
Uncle Bob! Are you okay? A plane crash happened, it's on fire!
Regex is meant for more complex patterns. Suppose, for example, that instead of matching only [Player], you wanted to match anything between brackets. That would be a good use for regex.
The following example does just that. Unfortunately, std::regex_replace only accepts a fixed format string, so it cannot compute each replacement dynamically; we have to implement the actual replacing ourselves.
#include <iostream>
#include <map>
#include <regex>
#include <string>

int main() {
    // Anything stored here can be replaced in the string.
    std::map<std::string, std::string> vars {
        {"Player1", "Bill"},
        {"Player2", "Ted"}
    };
    // Matches anything between brackets.
    std::regex r(R"(\[([^\]]+?)\])");
    std::string str = "[Player1], [Player1]! Are you okay? [Player2] said that a plane crash happened!";
    // We need to keep track of where we are, or else we would need to search from the start of
    // the string every time, which is very wasteful.
    // std::regex_iterator won't help, because the replacement may be smaller
    // than the match, and it would cause strings like "[Player1][Player1]" to not match properly.
    std::size_t offset = 0;
    std::smatch m;
    // Try to get a match; if there are no more matches, exit.
    while (std::regex_search(str.cbegin() + offset, str.cend(), m, r)) {
        // The interface of std::match_results is terrible. Let's get what we need and
        // place it in appropriately named variables.
        auto var_name = m[1].str();
        std::size_t start = m[0].first - str.cbegin();  // index of the match in str
        std::size_t length = m[0].length();
        auto value = vars[var_name];  // unknown names map to an empty string
        // This does the actual replacement
        str.replace(start, length, value);
        // Track the position as an index: replace() may invalidate iterators into str.
        // The new search will start right at the end of the replacement.
        offset = start + value.size();
    }
    std::cout << str;
}
Output:
Bill, Bill! Are you okay? Ted said that a plane crash happened!
Simply find and replace, e.g. with boost::replace_all():
#include <boost/algorithm/string.hpp>
std::string target("[Player]! Are you okay? A plane crash happened, it's on fire!");
boost::replace_all(target, "[Player]", "NiNite");
As others have mentioned, a plain find and replace is more suitable for this scenario; you could do something like this (note that it replaces only the first occurrence):
std::string name = "Bill";
std::string strToFind = "[Player]";
std::string str = "[Player]! Are you okay? A plane crash happened, it's on fire!";
std::size_t pos = str.find(strToFind);
if (pos != std::string::npos)  // replace() would throw std::out_of_range if given npos
    str.replace(pos, strToFind.length(), name);

Find index of first match using C++ regex

I'm trying to write a split function in C++ using regexes. So far I've come up with this:
vector<string> split(string s, regex r)
{
vector<string> splits;
while (regex_search(s, r))
{
int split_on = // index of regex match
splits.push_back(s.substr(0, split_on));
s = s.substr(split_on + 1);
}
splits.push_back(s);
return splits;
}
What I want to know is how to fill in the commented line.
You'll need just a little more than that; see the comments in the code below. The main trick is to use a match object (here std::smatch, because you're matching on a std::string) to remember where you matched, not just whether you did:
vector<string> split(string s, regex r)
{
vector<string> splits;
smatch m; // <-- need a match object
while (regex_search(s, m, r)) // <-- use it here to get the match
{
int split_on = m.position(); // <-- use the match position
splits.push_back(s.substr(0, split_on));
s = s.substr(split_on + m.length()); // <-- also, skip the whole match
}
if(!s.empty()) {
splits.push_back(s); // and there may be one last token at the end
}
return splits;
}
This can be used like so:
auto v = split("foo1bar2baz345qux", std::regex("[0-9]+"));
and will give you "foo", "bar", "baz", "qux".
std::smatch is a specialization of std::match_results; see the reference documentation of std::match_results for its full interface.
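For comparison, a shorter split is possible with std::sregex_token_iterator and submatch index -1, which yields the text between matches rather than the matches themselves. This is a sketch; split_tokens is a hypothetical name. It should produce the same tokens as the loop above, including an empty first token when the string begins with a match.

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits s on every match of r. Submatch index -1 makes the iterator yield
// the pieces between matches (plus a non-empty trailing piece, if any).
std::vector<std::string> split_tokens(const std::string& s, const std::regex& r)
{
    std::sregex_token_iterator first(s.begin(), s.end(), r, -1), last;
    return std::vector<std::string>(first, last);
}
```

Usage mirrors the example above: split_tokens("foo1bar2baz345qux", std::regex("[0-9]+")).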

Replace whole words in a string list without using external libraries

I want to replace some words without using external libraries.
My first attempt was to make a copy of the string, but that was not efficient, so this is another attempt where I pass the string by reference:
void ReplaceString(std::string &subject, const std::string &search, const std::string &replace)
{
size_t position = 0;
while ((position = subject.find(search, position)) != std::string::npos) //if something messes up --> failure
{
subject.replace(position, search.length(), replace);
position = position + replace.length();
}
}
Because this is not very efficient either, I want to try something else, but I'm stuck. I want a function like replace_stuff(std::string & a); with a single parameter that uses string.replace() and string.find() (looping over the string with a for loop or similar) together with a std::map<std::string,std::string>, which is very convenient for me.
I want to use it for a large number of input words (say, replacing many bad words with harmless ones).
The problem with your question is that the Standard library lacks the necessary components. If you want an efficient implementation, you'd probably need a trie for fast lookups. Writing one as part of the answer would be far too much code.
If you use a std::map or, if C++11 is available in your environment, a std::unordered_map, you will need to use additional information about the input string and the search-replace pairs from the map. You'd then tokenize the string and check whether each token has to be replaced. Using positions pointing into the input string is a good idea since it avoids copying data. Which brings us to:
Efficiency will depend on memory access (reads and writes), so you should not modify the input string. Create the output by starting with an empty string and by appending pieces from the input. Check each part of the input: If it is a word, check if it needs to be replaced or if it is appended to the output unmodified. If it is not part of a word, append it unmodified.
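The approach just described can be sketched as follows. It is a minimal sketch under assumptions: a "word" is taken to be a maximal run of alphanumeric characters, the lookup uses std::unordered_map (C++11), and replace_words is a hypothetical name. The input is only read; the output is built by appending pieces to an initially empty string.

```cpp
#include <cctype>
#include <string>
#include <unordered_map>

// Builds a new string instead of modifying the input in place.
// Assumption: a word is a maximal run of alphanumeric characters.
std::string replace_words(const std::string& input,
                          const std::unordered_map<std::string, std::string>& replacements)
{
    std::string output;
    output.reserve(input.size());
    std::size_t i = 0;
    while (i < input.size())
    {
        if (!std::isalnum(static_cast<unsigned char>(input[i])))
        {
            output += input[i++];  // not part of a word: append unmodified
            continue;
        }
        std::size_t start = i;  // scan to the end of the current word
        while (i < input.size() && std::isalnum(static_cast<unsigned char>(input[i])))
            ++i;
        std::string word = input.substr(start, i - start);
        auto it = replacements.find(word);  // one hash lookup per word
        output += (it != replacements.end()) ? it->second : word;
    }
    return output;
}
```

The cost is one lookup per word of input, independent of how many search-replace pairs the map holds.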
It sounds like you want to replace all the "bad" words in a string with harmless ones, but your current implementation is inefficient because the list of bad words is much larger than the length of your input string (subject). Is this correct?
If so, the following code should make it more efficient. As you can see, I had to pass the map as a parameter, but if your function is going to be part of a class, you don't need to do so.
void ReplaceString(std::string &subject, const std::map<std::string, std::string>& replace_map)
{
    std::size_t startofword = 0;
    while(startofword < subject.size())
    {
        //get next word in string: it ends at the next space or at the end of the string
        std::size_t endofword = subject.find_first_of(" ", startofword);
        if(endofword == std::string::npos)
            endofword = subject.size();
        std::size_t length = endofword - startofword;
        //try to find this word in the map (find() works on a const map, operator[] does not)
        auto it = replace_map.find(subject.substr(startofword, length));
        if(it != replace_map.end())
        {
            //if found, replace the word with a new word
            subject.replace(startofword, length, it->second);
            length = it->second.length();
        }
        //skip past the word and the space that follows it
        startofword += length + 1;
    }
}
I use the following functions; I hope they help:
//=============================================================================
//replaces each occurrence of the phrase in sWhat with sReplacement
std::string& sReplaceAll(std::string& sS, const std::string& sWhat, const std::string& sReplacement)
{
size_t pos = 0, fpos;
while ((fpos = sS.find(sWhat, pos)) != std::string::npos)
{
sS.replace(fpos, sWhat.size(), sReplacement);
pos = fpos + sReplacement.length();
}
return sS;
}
//=============================================================================
// replaces each single char from sCharList that is found within sS with entire sReplacement
std::string& sReplaceChars(std::string& sS, const std::string& sCharList, const std::string& sReplacement)
{
size_t pos=0;
while (pos < sS.length())
{
if (sCharList.find(sS.at(pos),0)!=std::string::npos) //pos is where a charlist-char was found
{
sS.replace(pos, 1, sReplacement);
pos += sReplacement.length()-1;
}
pos++;
}
return sS;
}
You might create a class, say Replacer:
#include <map>
#include <string>
#include <utility>

class Replacer
{
    std::map<std::string, std::string> replacement;
public:
    Replacer()
    {
        // init the map here
        replacement.insert(std::pair<std::string, std::string>("C#", "C++"));
        //...
    }
    void replace_stuff(std::string & a);
};
Then the replace_stuff definition would be very similar to your original ReplaceString (it would use map entries instead of the passed parameters).
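A sketch of what that could look like, assuming C++11: the loop body is the original ReplaceString, run once per map entry. Note that entries are applied one after another, so a replacement text can itself be matched by a later entry.

```cpp
#include <map>
#include <string>

class Replacer
{
    std::map<std::string, std::string> replacement;
public:
    Replacer()
    {
        replacement.emplace("C#", "C++");  // example entry from above
    }
    // Same find/replace loop as the original ReplaceString, once per map entry.
    void replace_stuff(std::string& a)
    {
        for (const auto& entry : replacement)
        {
            std::size_t position = 0;
            while ((position = a.find(entry.first, position)) != std::string::npos)
            {
                a.replace(position, entry.first.length(), entry.second);
                position += entry.second.length();  // skip the inserted text
            }
        }
    }
};
```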

Get atof to continue converting a string to a number after the first invalid char in a string

I'd like to know if there is a way to get atof to continue converting to a number even if there are invalid characters in the way.
For example, let's say I have the string "444-3-3-33".
I want to convert it to a double a=4443333
(and keep the string as it was).
I'm happy to get any suggestions or an alternative approach.
Thanks!
I can't take credit for this solution, though it's a good one; it comes from another SO post. In short, the author recommends imbuing the stream with a locale that treats every non-digit character as whitespace. It might be overkill for your problem, but the idea is easily adaptable: instead of all non-digits, you could treat just "-" as whitespace. Here's his code, not mine; if you like it, please give him the upvote.
#include <algorithm>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Classifies '0'..'9' as digits and every other character as whitespace.
struct digits_only: std::ctype<char>
{
    digits_only(): std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table()
    {
        static std::vector<std::ctype_base::mask>
            rc(std::ctype<char>::table_size, std::ctype_base::space);
        std::fill(&rc['0'], &rc['9'] + 1, std::ctype_base::digit);  // + 1 so '9' is included
        return &rc[0];
    }
};
bool in_range(int lower, int upper, std::string const &input) {
    std::istringstream buffer(input);
    buffer.imbue(std::locale(std::locale(), new digits_only()));
    int n;
    while (buffer>>n)
        if (n < lower || upper < n)
            return false;
    return true;
}
Then just remove the whitespace and pass the string to atof.
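Adapting the idea as suggested, here is a sketch that treats only '-' as whitespace (every other character keeps its classic classification), reads the chunks between dashes, and concatenates them before converting. dash_is_space and dashed_to_double are hypothetical names; std::stod is used instead of atof and throws std::invalid_argument if no digits remain.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant of digits_only: starts from the classic table and
// reclassifies only '-' as whitespace.
struct dash_is_space : std::ctype<char>
{
    dash_is_space() : std::ctype<char>(get_table()) {}
    static const std::ctype_base::mask* get_table()
    {
        static std::vector<std::ctype_base::mask> rc(
            std::ctype<char>::classic_table(),
            std::ctype<char>::classic_table() + std::ctype<char>::table_size);
        rc['-'] = std::ctype_base::space;
        return &rc[0];
    }
};

// Reads the '-'-separated chunks and glues them together, so "444-3-3-33"
// becomes "4443333"; the caller's string is left untouched.
double dashed_to_double(const std::string& input)
{
    std::istringstream buffer(input);
    buffer.imbue(std::locale(std::locale(), new dash_is_space()));
    std::string chunk, joined;
    while (buffer >> chunk)
        joined += chunk;
    return std::stod(joined);
}
```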
Both of the following strip out non-digits for me:
#include <algorithm>
#include <cctype>
#include <istream>
#include <iterator>
#include <string>

bool no_digit(char ch) {return !std::isdigit(static_cast<unsigned char>(ch));}
std::string read_number(const std::string& input)
{
    std::string result;
    std::remove_copy_if( input.begin()
                       , input.end()
                       , std::back_inserter(result)
                       , &no_digit);
    return result;
}
std::string read_number(std::istream& is)
{
    std::string result;
    for(;;) {
        // skip everything that is not a digit
        while(is.good() && !std::isdigit(is.peek()))
            is.get();
        if(!is.good())
            return result;
        result += static_cast<char>(is.get());
    }
}
You can then read the number using a string stream:
std::istringstream iss(read_number("444-3-3-33"));
int i;
if( !(iss>>i) ) throw "something went wrong!";
std::cout << i << '\n';
I would recommend sscanf
[edit]
Upon further review, it seems you'll have to use strstr, as sscanf could have an issue with the embedded '-'.
Further, the strstr documentation should give you a good start on finding (and removing) your '-' chars.
[/edit]
Copy the 'string number' to a local buffer (a std::string), then strip the unwanted chars out of it (compacting the string so no gaps are left, e.g. with std::string::replace), then call atof on its c_str(). Alternatively, you can use C strings, but then this wouldn't really be C++.
Alternatively, write your own version of atof, based on, say, the C standard library source or on basic math.
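The copy-strip-convert steps above, as a minimal sketch. It compacts the string with the erase-remove idiom (std::remove followed by erase) rather than std::string::replace, and atof_ignoring_dashes is a hypothetical name. As with any atof call, an unparseable buffer silently yields 0.0.

```cpp
#include <algorithm>
#include <cstdlib>
#include <string>

// Copies the input, erases every '-', then converts with atof.
double atof_ignoring_dashes(const std::string& number)
{
    std::string buffer = number;  // local copy: the caller's string stays as it was
    buffer.erase(std::remove(buffer.begin(), buffer.end(), '-'), buffer.end());
    return std::atof(buffer.c_str());
}
```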
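The "basic math" route could look like this hypothetical helper, which accumulates digits left to right (value = value * 10 + digit) and skips everything else. It is deliberately minimal: it handles no sign, decimal point, or exponent.

```cpp
#include <string>

// Accumulates decimal digits, silently skipping any character
// that is not '0'..'9'.
double digits_to_double(const std::string& s)
{
    double value = 0.0;
    for (char c : s)
        if (c >= '0' && c <= '9')
            value = value * 10.0 + (c - '0');
    return value;
}
```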

How to get a basic preg_match_all replacement for std::string in C++?

By "basic" I mean that only the operators "+" (one or more of the preceding) and "|" (or) are needed.
Prototype:
preg_match_all(std::string pattern, std::string subject, std::vector<std::string> &matches)
Usage Example:
std::vector<std::string> matches;
std::string pattern, subject;
subject = std::string("Some text with a lot of foo foo and ") + char(255) + " again " + char(255);
pattern = std::string("/") + char(255) + char(255) + "+|foo+/";
preg_match_all(pattern, subject, matches);
The matches should be available afterwards via matches[n]. Does anyone have a hint for doing this without Boost and/or PCRE? If not, how could I implement it with Boost?
This returns a vector of all matches together with the index at which each was found:
std::vector<std::pair<std::string, unsigned int>> RegexPP::MatchAll(std::string pattern, std::string haystack) {
std::vector<std::pair<std::string, unsigned int>> matches;
std::regex p(pattern);
std::sregex_iterator start(haystack.begin(), haystack.end(), p), end;
for(; start != end; start++) {
auto match = *start; // dereference the iterator to get the match_result
matches.push_back(std::pair<std::string, unsigned int>(match.str(), match.position()));
}
return matches;
}
You could roll your own using std::string::find, doing the matching via a functor and pushing the results onto a vector of strings.
The way it's realized in Boost is probably overkill for what you want: you would first need to decompose the expression into lexemes and then compile a state machine for parsing the given regexp.
Look into Boost.Regex: http://www.boost.org/doc/libs/1_41_0/libs/regex/doc/html/index.html