Conditional replace string using boost::regex_replace - c++

I want to simplify the signs in a mathematical expression using regex_replace, here is a sample code:
string entry="6+-3++5";
boost::regex signs("[\-\+]+");
cout<<boost::regex_replace(entry,signs,"?")<<endl;
The output is then 6?3?5. My question is: How can I get the proper result of 6-3+5 with some neat regular expression tools? Thanks a lot.
Tried something else with sregex_iterator and smatch, but still has some problem:
string s="63--17--42+5555";
collect_sign(s);
Output is
63+17--42+5555+42+5555+5555
i.e.
63+(17--42+5555)+(42+5555)+5555
It seems to me that the problem is related to the match.suffix(), Could anybody help please? The collect_sign function basically just iterate through every sign strings, convert it to "-"/"+" if the number of "-" is odd/even, and then stitch together the suffix expression of the signs.
void collect_sign(string& entry)
{
boost::regex signs("[\-\+]+");
string output="";
auto signs_begin = boost::sregex_iterator(entry.begin(), entry.end(), signs);
auto signs_end = boost::sregex_iterator();
for (boost::sregex_iterator it = signs_begin; it != signs_end; ++it)
{
boost::smatch match = *it;
if (it ==signs_begin)
output+=match.prefix().str();
string match_signs = match.str();
int n_minus=count(match_signs.begin(),match_signs.end(),'-');
if (n_minus%2==0)
output+="+";
else
output+="-";
output+=match.suffix();
}
cout<<"simplify to: "<<output<<endl;
}

Use:
[+\-*\/]*([+\-*\/])
Replace:
$1
You can test here

If you just want a mathematical simplification, you can use:
s = boost::regex_replace(s, boost::regex("(?:++|--"), "+", boost::format_all);
s = boost::regex_replace(s, boost::regex("(?:+-|-+"), "-", boost::format_all);

Related

Boost::xpressive regex_search concatenate matches in a single string

I want to concatenate all matches found by regex_search into a single string, and then return it. I tried doing it with std::accumulate, but failed.
Is there a way to return something like std::accumulate(what.begin()+1, what.end(), someFunc)?
I'm not very familiar with functional programming. I know that I can make a for loop that adds strings together, but I want to try doing it otherwise. Thanks in advance!
Here is a pseudo-code snippet that might help you better understand what I want to do.
std::string foo(const std::string& text)
{
using namespace boost::xpressive;
sregex srx = +_d >> as_xpr("_") >> +_d; // some random regex
smatch what;
if (regex_search(filename, what, srx))
{
// Here I want to return a string,
// concatenated from what[1].str() + what[2].str() + ... + what[n].str();
// How do I do this?
// What about what[1].str() + "-" + what[2].str()...?
}
return std::string();
}

Extract the string between two words using RegEx in QT [duplicate]

can anybody help me with this?
I have a string which contains N substrings, delimited by tags and I have to get ALL of the substrings. The string is like
STARTfoo barENDSTARThi there!ENDSTARTstackoverflowrulezEND
I would like to get all the strings between START/END tags, I tried with a couple of regular expressions with no luck:
(START)(.*)(END) gives me ALL the contend between the first and last tag
(START)(\w+)(END) gives me no result
The code is much simple:
QString l_str "STARTfoo barENDSTARThi there!ENDSTARTstackoverflowrulezEND";
QRegExp rx("(START)(\w+)(END)");
QStringList list;
int pos = 0;
while ((pos = rx.indexIn(l_str, pos)) != -1)
{
list << rx.cap(1);
pos += rx.matchedLength();
}
qWarning() << list;
I'd like a resulting list like:
STARTfoo barEND
STARThi there!END
STARTstackoverflowrulezEND
Any help?
Thanks!
Use rx.setMinimal(true) with .* to make it lazy:
QRegExp rx("START.*END");
rx.setMinimal(true);
See the QRegExp::setMinimal docs:
Enables or disables minimal matching. If minimal is false, matching is greedy (maximal) which is the default.

C++: Regex: returns full string and not matched group

for those asking, the {0} allows selection of any one block within the sResult string separated by the | 0 is the first block
it needs to be dynamic for future expansion as that number will be configurable by users
So I am working on a regex to extract 1 portion of a string, however while it matches the results return are not what is expected.
std::string sResult = "MATCH_ME|BUT|NOT|ANYTHNG|ELSE";
std::regex pattern("^(?:[^|]+[|]){0}([^|;]+)");
std::smatch regMatch;
std::regex_search(sResult, regMatch, pattern);
if(regMatch[1].matched)
{
for( int i = 0; i < regMatch.size(); i++)
{
//SUBMATCH 0 = "MATCH_ME|BUT|NOT|ANYTHNG|ELSE"
//SUBMATCH 1 = "BUT|NOT|ANYTHNG|ELSE"
std::ssub_match sm = regMatch[i];
bValid = strcmp(regMatch[i].str().c_str(), pzPoint->_ptrTarget->_pzTag->szOPCItem);
}
}
For some reason I cannot figure out the code to get me just the MATCH_ME back so I can compare it to expected results list on the C++ side.
Anyone have any ideas on where I went wrong here.
It seems you're using regular expressions for what they haven't been designed for. You should first split your string at the delimiter | and apply regular expressions on the resulting tokens if you want to check them for validity.
By the way: The std::regex implementation in libstdc++ seems to be buggy. I just did some tests and found that even simple patterns containing escaped pipe characters like \\| failed to compile throwing a std::regex_error with no further information in the error message (GCC 4.8.1).
The following code example shows how to do what you are after - you compile this, then call it with a single numerical argument to extract that element of the input:
#include <iostream>
#include <cstring>
#include <regex>
int main(int argc, char *argv[]) {
char pat[100];
if (argc > 1) {
sprintf(pat, "^(?:[^|]+[|]){%s}([^|;]+)", argv[1]);
std::string sResult = "MATCH_ME|BUT|NOT|ANYTHNG|ELSE";
std::regex pattern(pat);
std::smatch regMatch;
std::regex_search(sResult, regMatch, pattern);
if(regMatch[1].matched)
{
std::ssub_match sm = regMatch[1];
std::cout << "The match is " << sm << std::endl;
//bValid = strcmp(regMatch[i].str().c_str(), pzPoint->_ptrTarget->_pzTag->szOPCItem);
}
}
return 0;
}
Creating an executable called match, you can then do
>> match 2
The match is NOT
which is what you wanted.
The regex, it turns out, works just fine - although as a matter of preference I would use \| instead of [|] for the first part.
Turns out the problem was on the C side in extracting the match, it had to be done more directly, below is the code that gets me exactly what I wanted out of the string so I can use it later.
std::string sResult = "MATCH_ME|BUT|NOT|ANYTHNG|ELSE";
std::regex pattern("^(?:[^|]+[|]){0}([^|;]+)");
std::smatch regMatch;
std::regex_search(sResult, regMatch, pattern);
if(regMatch[1].matched)
{
std::string theMatchedPortion = regMatch[1];
//the issue was not with the regex but in how I was retrieving the results.
//theMatchedPortion now equals "MATCH_ME" and by changing the number associated
with it I can navigate through the string
}

How do I replace the extensions of filenames using regex in c++?

I would like to replace the file extensions from .nef to .bmp. How do I do it using regex?
My code is something like -
string str("abc.NEF");
regex e("(.*)(\\.)(N|n)(E|e)(F|f)");
string st2 = regex_replace(str, e, "$1");
cout<<regex_match (str,e)<<"REX:"<<st2<<endl;
regex_match (str,e) gets me a hit, but st2 turns out blank. I am not very familiar with regex, but I expected to have something appear in st2. What I am doing wrong?
try this.
it will match .NEF or .nef
string str("abc.NEF");
regex e(".*(\.(NEF)|\.(nef))");
string st2 = regex_replace(str,e,"$1");
$1 will capture .NEF or .nef
check here
Try this
string test = "abc.NEF";
regex reg("\.(nef|NEF)");
test = regex_replace(test, reg, "your_string");
I suggest not to use regex for such a simple task.
Try this function:
#include <string>
#include <algorithm>
std::string Rename(const std::string& name){
std::string newName(name);
static const std::string oldSuffix = "nef";
static const std::string newSuffix = "bmp";
auto dotPos = newName.rfind('.');
if (dotPos == newName.size() - oldSuffix.size() - 1){
auto suffix = newName.substr(dotPos + 1);
std::transform(suffix.begin(), suffix.end(), suffix.begin(), ::tolower);
if (suffix == oldSuffix)
newName.replace(dotPos + 1, std::string::npos, newSuffix);
}
return newName;
}
At first we find a delimiter position, then fetching the whole file extension (suffix), converting it to lower case and compare to oldSuffix.
Of course you can set oldSuffix and newSuffix to be arguments, not static consts.
Here is a test program: http://ideone.com/D09NVL
I think boost offers the simplest and most readable solution with
auto result = boost::algorithm::ireplace_last_copy(input, ".nef", ".bmp");
I think this
string str("abc.NEF");
regex e("(.*)\\.[Nn][Ee][Ff]$");
string st2 = regex_replace(str, e, "$1.bmp");
cout<<regex_match(str, e)<<"REX:"<<st2<<endl;
will work out better for you.

Boost regex not working as expected in my code

I just started using Boost::regex today and am quite a novice in Regular Expressions too. I have been using "The Regulator" and Expresso to test my regex and seem satisfied with what I see there, but transferring that regex to boost, does not seem to do what I want it to do. Any pointers to help me a solution would be most welcome. As a side question are there any tools that would help me test my regex against boost.regex?
using namespace boost;
using namespace std;
vector<string> tokenizer::to_vector_int(const string s)
{
regex re("\\d*");
vector<string> vs;
cmatch matches;
if( regex_match(s.c_str(), matches, re) ) {
MessageBox(NULL, L"Hmmm", L"", MB_OK); // it never gets here
for( unsigned int i = 1 ; i < matches.size() ; ++i ) {
string match(matches[i].first, matches[i].second);
vs.push_back(match);
}
}
return vs;
}
void _uttokenizer::test_to_vector_int()
{
vector<string> __vi = tokenizer::to_vector_int("0<br/>1");
for( int i = 0 ; i < __vi.size() ; ++i ) INFO(__vi[i]);
CPPUNIT_ASSERT_EQUAL(2, (int)__vi.size());//always fails
}
Update (Thanks to Dav for helping me clarify my question):
I was hoping to get a vector with 2 strings in them => "0" and "1". I instead never get a successful regex_match() (regex_match() always returns false) so the vector is always empty.
Thanks '1800 INFORMATION' for your suggestions. The to_vector_int() method now looks like this, but it goes into a never ending loop (I took the code you gave and modified it to make it compilable) and find "0","","","" and so on. It never find the "1".
vector<string> tokenizer::to_vector_int(const string s)
{
regex re("(\\d*)");
vector<string> vs;
cmatch matches;
char * loc = const_cast<char *>(s.c_str());
while( regex_search(loc, matches, re) ) {
vs.push_back(string(matches[0].first, matches[0].second));
loc = const_cast<char *>(matches.suffix().str().c_str());
}
return vs;
}
In all honesty I don't think I have still understood the basics of searching for a pattern and getting the matches. Are there any tutorials with examples that explains this?
The basic problem is that you are using regex_match when you should be using regex_search:
The algorithms regex_search and
regex_match make use of match_results
to report what matched; the difference
between these algorithms is that
regex_match will only find matches
that consume all of the input text,
where as regex_search will search for
a match anywhere within the text being
matched.
From the boost documentation. Change it to use regex_search and it will work.
Also, it looks like you are not capturing the matches. Try changing the regex to this:
regex re("(\\d*)");
Or, maybe you need to be calling regex_search repeatedly:
char *where = s.c_str();
while (regex_search(s.c_str(), matches, re))
{
where = m.suffix().first;
}
This is since you only have one capture in your regex.
Alternatively, change your regex, if you know the basic structure of the data:
regex re("(\\d+).*?(\\d+)");
This would match two numbers within the search string.
Note that the regular expression \d* will match zero or more digits - this includes the empty string "" since this is exactly zero digits. I would change the expression to \d+ which will match 1 or more.