search and replace using regex_replace - c++

I have a string to be searched:
QString sObjectName = "looolok";
A regex_search for ".?o" yields three matches, which I push to a vector, matchedText:
"lo" "oo" "lo"
My replacement text is "o",
so I would expect the string to be changed to
oook
I am using Boost.Xpressive's regex_replace for this operation. This is my code:
std::vector<QString>::iterator it = matchedText.begin();
wsregex regExp;
std::string strOut;
std::string::iterator itStr = strOut.begin(); ;
for( ; it != matchedText.end(); ++it )
{
regExp = wsregex::compile( (*it).toStdWString() );
boost::xpressive::regex_replace( itStr, sObjectName.begin(), sObjectName.end(), regExp, qReplaceBy.toStdString(), regex_constants::format_perl );
}
However, strOut contains ooook.
What am I missing?
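No answer is recorded for this question, so the following is only a sketch of one way to get the expected result. It assumes std::regex in place of Boost.Xpressive (to keep the example self-contained) and uses a hypothetical helper name, replace_matches. A single regex_replace call already visits each non-overlapping match once, so a loop over the collected matches is not needed.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the question): replaces every
// non-overlapping match of `pattern` in `input` in a single pass.
std::string replace_matches(const std::string& input,
                            const std::string& pattern,
                            const std::string& replacement)
{
    const std::regex rx(pattern);  // ECMAScript syntax by default
    return std::regex_replace(input, rx, replacement);
}
```

With the values from the question, replace_matches("looolok", ".?o", "o") returns "oook": the matches "lo", "oo" and "lo" each become "o", and the trailing "k" is copied unchanged.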

Related

C++ regex replace with a callback function

I have a map that stores id to value mapping, an input string can contain a bunch of ids. I need to replace those ids with their corresponding values. For example:
string = "I am in #1 city, it is now #2 time" // (#1 and #2 are ids here)
id_to_val_map = {1 => "New York", 2 => "summer"}
Desired output:
"I am in New York city, it is now summer time"
Is there a way to supply a callback function (one that takes the matched string and returns the string to be used as the replacement)? std::regex_replace doesn't seem to support that.
The alternative is to find all the matches, compute their replacement values, and then perform the actual replacement, which won't be that efficient.
You might do:
const std::map<int, std::string> m = {{1, "New York"}, {2, "summer"}};
std::string s = "I am in #1 city, it is now #2 time";
for (const auto& [id, value] : m) {
s = std::regex_replace(s, std::regex("#" + std::to_string(id)), value);
}
std::cout << s << std::endl;
A homegrown alternative is a while loop around regex_search() that
builds the output string as it goes.
This is essentially what regex_replace() does, in a single pass.
It avoids compiling a separate regex for each map item, reassigning the
string on every iteration ( s = regex_replace() ), and rescanning the same
text on every pass.
Something like this regex
(?s)
( .*? ) # (1)
(?:
\#
( \d+ ) # (2)
| $
)
with this code
typedef std::string::const_iterator SITR;

// std::regex (ECMAScript syntax) has no (?s) flag, so [\s\S] stands in
// for "any character, including newlines"
std::regex rx( "([\\s\\S]*?)(?:#(\\d+)|$)" );
SITR start = oldstr.begin();
SITR end = oldstr.end();
std::smatch m;
std::string newstr;
// stop at the end of the input: the pattern can match the empty string
// there, which would otherwise loop forever
while ( start != end && std::regex_search( start, end, m, rx ) )
{
    newstr.append( m[1].str() );   // text before the id (or the tail)
    if ( m[2].matched )
    {
        // append the map key's value here, do error checking etc.
        int ndx = std::atoi( m[2].str().c_str() );
        newstr.append( mymap[ ndx ] );
    }
    start = m[0].second;           // continue after this match
}
// assign the old string with new string if need be
oldstr = newstr;
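The callback the question asks for can be sketched on top of std::sregex_iterator. The names regex_replace_cb and expand_ids are hypothetical, and leaving unknown ids untouched is an assumption about the desired behaviour rather than something the question specifies.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: replaces every match of `rx` in `input` with the
// string returned by callback(match), scanning the input only once.
template <typename Callback>
std::string regex_replace_cb(const std::string& input,
                             const std::regex& rx,
                             Callback callback)
{
    std::string out;
    auto last = input.cbegin();
    for (std::sregex_iterator it(input.cbegin(), input.cend(), rx), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // text between the previous match and this one
        out += callback(m);            // replacement computed by the caller
        last = m[0].second;
    }
    out.append(last, input.cend());    // text after the last match
    return out;
}

// Expands "#<id>" using the map; ids missing from the map are left as-is
// (an assumption). std::stoi throws if the digits do not fit in an int.
std::string expand_ids(const std::string& input,
                       const std::map<int, std::string>& values)
{
    static const std::regex id_rx("#(\\d+)");
    return regex_replace_cb(input, id_rx, [&](const std::smatch& m) {
        auto found = values.find(std::stoi(m[1].str()));
        return found != values.end() ? found->second : m.str();
    });
}
```

Because the whole id is matched by \d+, "#10" is looked up as 10 and is not mistaken for "#1" followed by "0", which the one-regex-per-map-entry loop cannot guarantee.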

C++ RegExp and placeholders

I'm on C++11 MSVC2013, I need to extract a number from a file name, for example:
string filename = "s 027.wav";
If I were writing this in Perl, Java or Basic, I would use a regular expression; in Perl 5 something like this would do the trick:
$filename =~ /(\d+)/;
and I would have the number "027" in the capture variable $1.
Can I do this in C++ as well? Or can you suggest a different method to extract the number 027 from that string? I also need to convert the resulting numeric string into an integer; I think atoi() is what I need, right?
You can do this in C++ as of C++11, using the classes in the <regex> header. They are quite similar to the regular expressions you've used in other languages. Here's a no-frills example of how you might search for the number in the filename you posted:
const std::string filename = "s 027.wav";
std::regex re = std::regex("[0-9]+");
std::smatch matches;
if (std::regex_search(filename, matches, re)) {
std::cout << matches.size() << " matches." << std::endl;
for (auto &match : matches) {
std::cout << match << std::endl;
}
}
As for converting 027 into a number, you could use atoi (from <cstdlib>) as you mentioned, but this will store the value 27, not 027. If you want to keep the leading 0, you will need to keep it as a string. match above is a sub_match, so extract its string and pass the resulting const char* to atoi:
int value = atoi(match.str().c_str());
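The two steps above (search, then convert) can be combined in one small function. This is a sketch only: extract_number is a hypothetical name, and std::stoi is used instead of atoi so that a value too large for an int raises an exception.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: finds the first run of digits in `text` and stores
// it in `value` as an int. Returns false if there are no digits.
bool extract_number(const std::string& text, int& value)
{
    static const std::regex digits("(\\d+)");  // backslash doubled for the C++ literal
    std::smatch m;
    if (!std::regex_search(text, m, digits))
        return false;
    value = std::stoi(m[1].str());  // m[1] plays the role of Perl's $1; "027" becomes 27
    return true;
}
```

Checking the return value before using the number matters, because a file name without digits leaves `value` unchanged.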
OK, I solved it using std::regex, which for some reason I couldn't get to work properly when modifying the examples I found around the web. It was simpler than I thought. This is the code I wrote:
#include <cstdlib>
#include <regex>
#include <string>
using namespace std;
string FileName = "s 027.wav";
// Holds the result of the search
smatch m;
// The regexp /\d+/ from Perl and Java needs a doubled backslash inside a
// C++ string literal ("\\d+"). This variation instead looks for a run of
// 1 to 3 digits
regex re("[0-9]{1,3}");
// Do the search and use the result only if something matched
if (regex_search(FileName, m, re)) {
    // m[0] is the whole match; m[1], m[2], ... are the capture groups
    // (like $1, $2, $3, etc. in Perl)
    string sMidiNoteNum = m[0];
    // Convert the string to an integer
    int MidiNote = atoi(sMidiNoteNum.c_str());
}
Here is an example using Boost, substitute the proper namespace and it should work.
typedef std::string::const_iterator SITR;
SITR start = str.begin();
SITR end = str.end();
boost::regex NumRx("\\d+");
boost::smatch m;
while ( boost::regex_search ( start, end, m, NumRx ) )
{
int val = atoi( m[0].str().c_str() );
start = m[0].second;
}

How to extract a list of substrings from a string using Qt RegExp

How can I extract a list of substrings from a string using a Qt regular expression? For example, given the input string "qjkfsjkdfn 54df#Sub1#sdkf ++sdf #Sub2#q qfsdf445#Sub3#sdf",
I want to get a list that contains "Sub1", "Sub2" and "Sub3" using the "(#.+#)" RegExp.
You can use the following code:
QRegExp rx("#([^#]+)#"); // the group captures the text between two '#' characters
QString text = "qjkfsjkdfn 54df#Sub1#sdkf ++sdf #Sub2#q qfsdf445#Sub3#sdf";
QStringList list;
int pos = 0;
while ( (pos = rx.indexIn(text, pos)) != -1 ) // indexIn() is the Qt 4+ name; Qt 3 called it search()
{
    list << rx.cap(1);          // keep the text captured in group 1
    pos += rx.matchedLength();  // continue after this match, otherwise the loop never ends
}

Remove characters from std::string from "(" to ")" with erase ?

I want to remove a substring from my string. The string looks something like this:
At(Robot,Room3)
or
SwitchOn(Room2)
or
SwitchOff(Room1)
How can I remove all the characters from the left bracket ( to the right bracket ) when I don't know their indexes?
If you know the string matches the pattern then you can do:
std::string str = "At(Robot,Room3)";
// + 1 so that the closing parenthesis is erased as well
str.erase( str.begin() + str.find_first_of("("),
           str.begin() + str.find_last_of(")") + 1 );
or if you want to be safer
auto begin = str.find_first_of("(");
auto end = str.find_last_of(")");
if (std::string::npos != begin && std::string::npos != end && begin <= end)
    str.erase(begin, end - begin + 1);
else
    std::cerr << "no matching parentheses\n"; // report the error as appropriate
You can also use the standard library <regex>.
std::string str = "At(Robot,Room3)";
str = std::regex_replace(str, std::regex("([^(]*)\\([^)]*\\)(.*)"), "$1$2");
If your compiler and standard library are new enough, then you could use std::regex_replace.
Otherwise, search for the first '(', do a reverse search for the last ')', and use std::string::erase to remove everything in between. Or, if there can be nothing after the closing parenthesis, find the first '(' and use std::string::substr to extract the part you want to keep.
If the trouble you have is actually finding the parentheses, then use std::string::find and/or std::string::rfind.
You have to search for the first '(' and then erase from there up to 'str.length() - 1' (assuming your second bracket is always at the end).
A simple, safe and efficient solution:
std::string str = "At(Robot,Room3)";
size_t const open = str.find('(');
assert(open != std::string::npos && "Could not find opening parenthesis");
size_t const close = str.find(')', open);
assert(close != std::string::npos && "Could not find closing parenthesis");
// close + 1 so that the closing parenthesis is removed too
str.erase(str.begin() + open, str.begin() + close + 1);
Never parse a character more than once, and beware of ill-formed inputs.
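The same find-then-erase approach can be wrapped in a function that leaves ill-formed input untouched instead of asserting. This is a sketch under assumptions: strip_parenthesized is a hypothetical name, and returning the input unchanged when a parenthesis is missing is one possible policy, not something the answers above prescribe.

```cpp
#include <string>

// Hypothetical helper: returns `text` without its first "(...)" group,
// parentheses included. Input lacking either parenthesis is returned as-is.
std::string strip_parenthesized(const std::string& text)
{
    const std::string::size_type open = text.find('(');
    if (open == std::string::npos)
        return text;                           // no opening parenthesis
    const std::string::size_type close = text.find(')', open + 1);
    if (close == std::string::npos)
        return text;                           // no closing parenthesis
    std::string result = text;
    result.erase(open, close - open + 1);      // + 1 removes ')' as well
    return result;
}
```

The second search starts just after the opening parenthesis, so no character is examined twice.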

Remove non-ASCII characters

I have a problem where odd characters (from Word etc) are getting into a field in the database and then when I am showing that field it is showing spurious characters.
Is it possible with a RegEx to remove any non-ASCII characters? Obviously I want people to still be able to use any special characters like !#£$%^&*()_-+= etc just not non-ASCII characters.
If anyone could help that would be great!
Many Thanks!
Updated: This is in CLASSIC ASP.
To do this you will need to build up several regular expressions and run them in a subroutine before inserting your record into the database.
Here is an excerpt from an explanation from 1stclassmedia.
str = str.replace( /\s*FONT-FAMILY:[^;"]*;?/gi, "" ) ;
str = str.replace(/<(\w[^>]*) class=([^ |>]*)([^>]*)/gi, "<$1$3") ;
str = str.replace( /<(\w[^>]*) style="([^\"]*)"([^>]*)/gi, "<$1$3" ) ;
str = str.replace( /\s*style="\s*"/gi, '' ) ;
str = str.replace( /<SPAN\s*[^>]*>\s*&nbsp;\s*<\/SPAN>/gi, '&nbsp;' ) ;
str = str.replace( /<SPAN\s*[^>]*><\/SPAN>/gi, '' ) ;
str = str.replace(/<(\w[^>]*) lang=([^ |>]*)([^>]*)/gi, "<$1$3") ;
str = str.replace( /<SPAN\s*>(.*?)<\/SPAN>/gi, '$1' ) ;
str = str.replace( /<FONT\s*>(.*?)<\/FONT>/gi, '$1' ) ;
//some RegEx code for the picky browsers
var re = new RegExp("(<P)([^>]*>.*?)(<\/P>)","gi") ;
str = str.replace( re, "<div$2</div>" ) ;
var re2 = new RegExp("(<font|<FONT)([^*>]*>.*?)(<\/FONT>|<\/font>)","gi") ;
str = str.replace( re2, "<div$2</div>") ;
str = str.replace( /size|SIZE = ([\d]{1})/g, '' ) ;
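The replacements above clean up Word markup rather than filter by character code. For the narrower question of dropping everything outside ASCII, the idea is to remove every byte above 0x7F (as a regular expression, the character class [^\x00-\x7F]). The sketch below shows that filter in C++, to match the other examples on this page rather than Classic ASP; strip_non_ascii is a hypothetical name, and the input is assumed to be a byte string such as UTF-8. Note that £ is itself outside ASCII and would be removed too.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: removes every byte with a value above 0x7F,
// i.e. everything outside the 7-bit ASCII range.
std::string strip_non_ascii(std::string text)
{
    text.erase(std::remove_if(text.begin(), text.end(),
                              [](char c) {
                                  // cast first: plain char may be signed
                                  return static_cast<unsigned char>(c) > 0x7F;
                              }),
               text.end());
    return text;
}
```

Multi-byte UTF-8 characters are removed whole, because every byte of such a sequence is above 0x7F.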