C++ regex replace with a callback function - c++

I have a map that stores an id-to-value mapping, and an input string that can contain a bunch of ids. I need to replace those ids with their corresponding values. For example:
string = "I am in #1 city, it is now #2 time" // (#1 and #2 are ids here)
id_to_val_map = {1 => "New York", 2 => "summer"}
Desired output:
"I am in New York city, it is now summer time"
Is there a way I can have a callback function (that takes in the matched string and returns the string to be used as the replacement)? std::regex_replace doesn't seem to support that.
The alternative is to find all the matches, then compute their replacement values, and then perform the actual replacement, which won't be that efficient.

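You might do one std::regex_replace pass per map entry:
const std::map<int, std::string> m = {{1, "New York"}, {2, "summer"}};
std::string s = "I am in #1 city, it is now #2 time";
for (const auto& [id, value] : m) {  // structured bindings need C++17
    // \b keeps "#1" from also matching the start of "#10"
    s = std::regex_replace(s, std::regex("#" + std::to_string(id) + "\\b"), value);
}
std::cout << s << std::endl;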

A homegrown way is to use a while loop with regex_search() and build the output string as you go.
This is essentially what regex_replace() does, in a single pass.
There is no need for a separate regex per map item, which both reassigns the string for every item ( s = regex_replace() ) and scans the same text again on every pass.
Something like this regex (shown expanded, with whitespace and comments for readability only)
( [\s\S]*? )     # (1) text before the next id, newlines included
(?:
  \#
  ( \d+ )        # (2) the id
  | $
)
std::regex's default ECMAScript grammar has no (?s) inline flag, so [\s\S] stands in for "any character, including a newline".
with this code
// needs <regex> and <string>; mymap is assumed to be a std::map<int, std::string>
std::regex rx( "([\\s\\S]*?)(?:#(\\d+)|$)" );
std::string::const_iterator start = oldstr.begin();
std::string::const_iterator end = oldstr.end();
std::smatch m;
std::string newstr;
while ( std::regex_search( start, end, m, rx ) )
{
    newstr.append( m[1].str() );       // text in front of the id
    if ( !m[2].matched )
        break;                         // matched via '$' (assumed to mean end of input): done
    // append the map key's value here, do error checking etc.
    int ndx = std::stoi( m[2].str() );
    newstr.append( mymap[ ndx ] );
    start = m[0].second;               // continue right after this match
}
// assign the old string with new string if need be
oldstr = newstr;
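The single-pass idea above generalizes to a small helper that hands every match to a callback. The following is only a minimal sketch, not a standard facility: regex_replace_cb and expand_ids are hypothetical names, and leaving unknown ids untouched is an assumption about the desired behavior.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: replaces every match of rx in input with callback(match).
std::string regex_replace_cb(const std::string& input, const std::regex& rx,
                             const std::function<std::string(const std::smatch&)>& callback)
{
    std::string out;
    auto last = input.cbegin();  // end of the previous match
    for (std::sregex_iterator it(input.cbegin(), input.cend(), rx), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // copy the text between two matches
        out += callback(m);            // let the caller choose the replacement
        last = m[0].second;
    }
    out.append(last, input.cend());    // copy the tail after the last match
    return out;
}

// Hypothetical use for the "#<id>" case from the question.
std::string expand_ids(const std::string& text, const std::map<int, std::string>& ids)
{
    static const std::regex id_rx("#(\\d+)");
    return regex_replace_cb(text, id_rx, [&ids](const std::smatch& m) {
        auto found = ids.find(std::stoi(m[1].str()));
        // Assumption: an id that is not in the map is left as it was.
        return found != ids.end() ? found->second : m[0].str();
    });
}
```

Each character of the input is visited once, and the map is consulted once per id found rather than once per map entry. std::stoi throws for ids that do not fit in an int, so real code may want to guard that.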

Related

search and replace using regex_replace

I have a string to be searched
QString sObjectName = "looolok";
The regex_search for ".?o" results in 3 matched texts which I push to a vector matchedText
"lo" "oo" "lo"
Now my replacement text is "o"
So I would expect the string to be changed to
oook
I am using boost xpressive regex_replace for this operation. This is my code
std::vector<QString>::iterator it = matchedText.begin();
wsregex regExp;
std::string strOut;
std::string::iterator itStr = strOut.begin();
for( ; it != matchedText.end(); ++it )
{
    regExp = wsregex::compile( (*it).toStdWString() );
    boost::xpressive::regex_replace( itStr, sObjectName.begin(), sObjectName.end(), regExp, qReplaceBy.toStdString(), regex_constants::format_perl );
}
However, strOut contains ooook.
What am I missing?
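For comparison, here is a minimal sketch of the expected result using a single replace pass with the original pattern, rather than one pass per matched text. It swaps in std::regex for Boost.Xpressive and std::string for QString, and replace_all_matches is a hypothetical name, so it illustrates the idea rather than fixing the code above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: one pass over the input, replacing every non-overlapping
// match of the pattern with the replacement text.
std::string replace_all_matches(const std::string& input, const std::string& pattern,
                                const std::string& replacement)
{
    return std::regex_replace(input, std::regex(pattern), replacement);
}
```

With the input "looolok", the pattern ".?o" and the replacement "o", the matches "lo", "oo" and "lo" are each replaced once and the trailing "k" is kept, which gives "oook".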

A regex for extracting " ; " or "=" symbols from source code?

For example
int val = 13;
Serial.begin(9600);
val = DigitalWrite(900,HIGH);
I really want to extract special symbols like = and ;.
I've been able to extract symbols that appear adjacent to each other in the code, but I need all occurrences.
I tried [^ "//"A-Za-z\t\n0-9]* and [\;\=\{\}\,]+. Neither worked.
What's wrong?
I had written rules for my scanner like the ones below (they have since been changed):
semicolon [;]([\n]|[^ "//"])
assignment (.)?[=]+
brace ([{]|[}])([\n]|[^ "//"])
roundbarcket ("()")" "
The problem occurred in situations like these:
int val= 13; // it couldn't recognize "=" because "val" and "=" are adjacent. I want to recognize them whether they are adjacent or not
serial.read(); // it couldn't recognize () and ; individually. If I add the semicolon rule and the roundbarcket rule, (); is recognized as a whole.
How can I solve this?
You want to break "DigitalWrite(900,HIGH);" into "DigitalWrite" "(" "900" "," "HIGH" ")" ";". I think looping over each character is the fastest way (the example is C#).
// needs: using System; using System.Collections.Generic;
string text = "val = DigitalWrite(900,HIGH);";
string[] symbols = new string[] { "(", ")", ",", "=", ";" };
List<string> tokens = new List<string>();
string word = "";
for( int i = 0; i < text.Length; i++ )
{
    string letter = text.Substring( i, 1 );
    // look the character up in symbols (not in tokens, which starts out empty)
    bool isSymbol = Array.IndexOf( symbols, letter ) >= 0;
    if( isSymbol || letter.Equals( " " ) )
    {
        if( word.Length > 0 )   // a space or a symbol ends the current word
        {
            tokens.Add( word );
            word = "";
        }
        if( isSymbol )
            tokens.Add( letter );
    }
    else
    {
        word += letter;
        if( i == text.Length - 1 )
            tokens.Add( word );
    }
}
So searching for ";" and "=" is the ultimate goal you want to achieve?
In that case, why don't you just use something like the .find() function?
Or you can split the string by ";" first and search for "=" afterwards.
If you want to grab the text between "=" and ";", try using =([^;]*); or =(.*?);
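The same tokenization can also be sketched in C++ with a single regular expression and std::sregex_iterator. This is only an illustration under stated assumptions: tokenize is a hypothetical name, the token classes (identifiers, digit runs, and the symbols = ; { } ( ) ,) are chosen for this example, and anything else, such as whitespace or '.', is simply skipped.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical tokenizer: returns identifiers, numbers and single-character
// symbols in the order in which they appear in the input.
std::vector<std::string> tokenize(const std::string& code)
{
    // identifier | run of digits | one of the symbols = ; { } ( ) ,
    static const std::regex token_rx(R"rx([A-Za-z_]\w*|\d+|[=;{}(),])rx");
    std::vector<std::string> tokens;
    for (std::sregex_iterator it(code.begin(), code.end(), token_rx), end; it != end; ++it)
        tokens.push_back(it->str());
    return tokens;
}
```

Because every symbol is its own alternative, "val= 13;" and "val = 13;" produce the same tokens, and "();" comes out as three separate tokens.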

C++ regex_match not working

Here is part of my code
bool CSettings::bParseLine ( const char* input )
{
    //_asm INT 3
    std::string line ( input );
    std::size_t position = std::string::npos, comment;
    regex cvarPattern ( "\\.([a-zA-Z_]+)" );
    regex parentPattern ( "^([a-zA-Z0-9_]+)\\." );
    regex cvarValue ( "\\.[a-zA-Z0-9_]+[ ]*=[ ]*(\\d+\\.*\\d*)" );
    std::cmatch matchedParent, matchedCvar;
    if ( line.empty ( ) )
        return false;
    if ( !std::regex_match ( line.c_str ( ), matchedParent, parentPattern ) )
        return false;
    if ( !std::regex_match ( line.c_str ( ), matchedCvar, cvarPattern ) )
        return false;
    ...
}
I use it to split up lines that I read from a file. The lines look like:
foo.bar = 15
baz.asd = 13
ddd.dgh = 66
and I want to extract parts from it - e.g. for 1st line foo.bar = 15, I want to end up with something like:
a = foo
b = bar
c = 15
but right now regex_match always returns false. I tested the patterns in many online regex checkers, and even in Visual Studio, and they work fine there. Do I need a different syntax for C++ regex_match? I'm using Visual Studio 2013 Community.
The problem is that std::regex_match must match the entire string but you are trying to match only part of it.
You need to either use std::regex_search or alter your regular expression to match all three parts at once:
#include <regex>
#include <string>
#include <iostream>
const auto test =
{
      "foo.bar = 15"
    , "baz.asd = 13"
    , "ddd.dgh = 66"
};
int main()
{
    // capture groups: (1) ([^.]+)   (2) ([^\s]+)   (3) (\d+)
    const std::regex r(R"~(([^.]+)\.([^\s]+)[^0-9]+(\d+))~");
    std::cmatch m;
    for(const auto& line: test)
    {
        if(std::regex_match(line, m, r))
        {
            // m.str(0) is the entire matched string
            // m.str(1) is the 1st capture group
            // etc...
            std::cout << "a = " << m.str(1) << '\n';
            std::cout << "b = " << m.str(2) << '\n';
            std::cout << "c = " << m.str(3) << '\n';
            std::cout << '\n';
        }
    }
}
Regular expression: https://regex101.com/r/kB2cX3/2
Output:
a = foo
b = bar
c = 15
a = baz
b = asd
c = 13
a = ddd
b = dgh
c = 66
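The other option mentioned above, keeping the three original patterns and switching to std::regex_search, could look like the following minimal sketch. ParsedLine and parse_line are hypothetical names introduced for this example; the patterns are the ones from the question, written as raw string literals.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical result type for one "parent.cvar = value" line.
struct ParsedLine {
    bool ok;
    std::string parent, cvar, value;
};

ParsedLine parse_line(const std::string& line)
{
    static const std::regex parentPattern(R"rgx(^([a-zA-Z0-9_]+)\.)rgx");
    static const std::regex cvarPattern(R"rgx(\.([a-zA-Z_]+))rgx");
    static const std::regex cvarValue(R"rgx(\.[a-zA-Z0-9_]+[ ]*=[ ]*(\d+\.*\d*))rgx");

    std::smatch parent, cvar, value;
    // regex_search succeeds when the pattern matches anywhere in the line,
    // whereas regex_match requires the pattern to cover the whole line.
    if (!std::regex_search(line, parent, parentPattern) ||
        !std::regex_search(line, cvar, cvarPattern) ||
        !std::regex_search(line, value, cvarValue))
        return {false, "", "", ""};
    return {true, parent[1].str(), cvar[1].str(), value[1].str()};
}
```

For "foo.bar = 15" this yields parent "foo", cvar "bar" and value "15", while std::regex_match with parentPattern alone fails on the same line because the pattern only covers "foo.".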
To keep the focus on the regex patterns, I'd prefer to use raw string literals in C++:
regex cvarPattern ( R"rgx(\.([a-zA-Z_]+))rgx" );
regex parentPattern ( R"rgx(^([a-zA-Z0-9_]+)\.)rgx" );
regex cvarValue ( R"rgx(\.[a-zA-Z0-9_]+[ ]*=[ ]*(\d+\.*\d*))rgx" );
Nothing between the rgx( and )rgx delimiters needs any extra escaping for C++ string literals.
Actually, what you have written in your question resembles the regular expressions I've written here as raw string literals.
You probably simply meant something like
regex cvarPattern ( R"rgx(.([a-zA-Z_]+))rgx" );
regex parentPattern ( R"rgx(^([a-zA-Z0-9_]+).)rgx" );
regex cvarValue ( R"rgx(.[a-zA-Z0-9_]+[ ]*=[ ]*(\d+(\.\d*)?))rgx" );
I didn't dig any deeper, but I don't see why all of those escaped characters are in your regular expression patterns.
As for your question in the comment, you can use alternative sub-pattern groups and check which of them took part in the match in the match results:
regex cvarValue (
R"rgx(.[a-zA-Z0-9_]+[ ]*=[ ]*((\d+)|(\d+\.\d?)|([a-zA-Z]+)){1})rgx" );
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You probably don't need the separate cvarPattern and parentPattern regular expressions just to inspect other (more detailed) parts of the match.

C++ RegExp and placeholders

I'm on C++11 MSVC2013, I need to extract a number from a file name, for example:
string filename = "s 027.wav";
If I were writing code in Perl, Java or Basic, I would use a regular expression and something like this would do the trick in Perl5:
$filename =~ /(\d+)/;
and I would have the number "027" in the capture variable $1.
Can I do this in C++ as well? Or can you suggest a different method to extract the number 027 from that string? Also, I should convert the resulting numerical string into an integral scalar, I think atoi() is what I need, right?
You can do this in C++, as of C++11 with the collection of classes found in regex. It's pretty similar to other regular expressions you've used in other languages. Here's a no-frills example of how you might search for the number in the filename you posted:
const std::string filename = "s 027.wav";
std::regex re = std::regex("[0-9]+");
std::smatch matches;
if (std::regex_search(filename, matches, re)) {
    std::cout << matches.size() << " matches." << std::endl;
    for (auto &match : matches) {
        std::cout << match << std::endl;
    }
}
As far as converting 027 into a number, you could use atoi (from cstdlib) like you mentioned, but this will store the value 27, not 027. If you want to keep the 0 prefix, I believe you will need to keep it as a string. match above is a sub_match, so extract a string from it and pass its const char* to atoi:
int value = atoi(match.str().c_str());
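Put together, extracting both the digit string and its integer value might look like the sketch below. first_number is a hypothetical helper name, and std::stoi is used here in place of atoi because it reports failures by throwing instead of silently returning 0.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: finds the first run of digits in text.
// On success, digits keeps the text as written (e.g. "027") and value holds
// the number (e.g. 27). Returns false when text contains no digits.
bool first_number(const std::string& text, std::string& digits, int& value)
{
    static const std::regex re(R"rx((\d+))rx");  // group 1 plays the role of Perl's $1
    std::smatch m;
    if (!std::regex_search(text, m, re))
        return false;
    digits = m[1].str();
    value = std::stoi(digits);  // base 10, so the leading zero does not mean octal
    return true;
}
```

Keeping both outputs sidesteps the leading-zero question: the string preserves "027" for display, and the int is available for arithmetic.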
Ok, I solved using std::regex which for some reason I couldn't get to work properly when trying to modify the examples I found around the web. It was simpler than I thought. This is the code I wrote:
#include <cstdlib>
#include <regex>
#include <string>
using namespace std;
string FileName = "s 027.wav";
// Holds the result of the search
smatch m;
// The regexp /\d+/ works in Perl and Java but for some reason didn't work here
// (note that in a C++ string literal the backslash has to be doubled: "\\d+").
// With this other variation I look for a run of 1 to 3 characters
// containing only the digits 0 to 9
regex re("[0-9]{1,3}");
// Do the search, and only read 'm' if something was found
if (regex_search (FileName, m, re)) {
    // m[0] is the whole match; m[1], m[2], ... are the capture groups
    // (like $1, $2, etc. in Perl), of which this pattern has none
    string sMidiNoteNum = m[0];
    // This converts the string to an integer number
    int MidiNote = atoi(sMidiNoteNum.c_str());
}
Here is an example using Boost; substitute the proper namespace and it should work with std::regex as well.
typedef std::string::const_iterator SITR;
SITR start = str.begin();
SITR end = str.end();
boost::regex NumRx("\\d+");
boost::smatch m;
while ( boost::regex_search ( start, end, m, NumRx ) )
{
    int val = atoi( m[0].str().c_str() );
    start = m[0].second;   // continue after this match
}

Need to parse a string, having a mask (something like this "%yr-%mh-%dy"), so i get the int values

For example, I have to find a date in the format mentioned in the title (though the order of the %-tags can differ) in a string such as "The date is 2009-August-25." How can I make the program interpret the tags, and what construct is best for storing them together with the information about how to handle each piece of the date string?
First, look into the boost::date_time library. It has an IO system which may be what you want, but I see that it lacks searching.
To do custom date searching you need boost::xpressive. It contains everything you will need. Let's look at my hastily written example. First you should parse your custom pattern, which is easy with Xpressive. Start with the headers you need:
#include <string>
#include <iostream>
#include <map>
#include <boost/xpressive/xpressive_static.hpp>
#include <boost/xpressive/regex_actions.hpp>
//make example shorter but less clear
using namespace boost::xpressive;
Second define map of your special tags:
std::map<std::string, int > number_map;
number_map["%yr"] = 0;
number_map["%mh"] = 1;
number_map["%dy"] = 2;
number_map["%%"] = 3; // escape a %
The next step is to create a regex which parses our pattern with tags and saves the value from the map into the variable tag_id when it finds a tag, or saves -1 otherwise:
int tag_id;
sregex rx=((a1=number_map)|(s1=+~as_xpr('%')))[ref(tag_id)=(a1|-1)];
More information and a description can be found in the Boost.Xpressive documentation.
Now let's parse a pattern:
std::string pattern("%yr-%mh-%dy"); // this will be parsed
sregex_token_iterator begin( pattern.begin(), pattern.end(), rx ), end;
if(begin == end) throw std::runtime_error("The pattern is empty!");
The sregex_token_iterator will iterate over our tokens, and each time it will set the tag_id variable. All we have to do is build a regex from these tokens. We will construct it from the parts of a static regex, one per tag, defined in an array:
sregex regex_group[] = {
    range('1','9') >> repeat<3,3>( _d ),              // 4-digit year
    as_xpr( "January" ) | "February" | "August",      // not every month is listed, for brevity
    repeat<2,2>( range('0','9') )[                    // two-digit day
        check(as<int>(_) >= 1 && as<int>(_) <= 31) ], // only between 1 and 31
    as_xpr( '%' )                                     // match an escaped %
};
Finally, let's start building our special regex. The first match will construct the first part of it. If a tag is matched and tag_id is non-negative, we choose the regex from the array; otherwise the match is probably a delimiter and we construct a regex which matches it:
sregex custom_regex = (tag_id>=0) ? regex_group[tag_id] : as_xpr(begin->str());
Next we will iterate from begin to end and append the next regex each time:
while(++begin != end)
{
    if(tag_id>=0)
    {
        sregex nextregex = custom_regex >> regex_group[tag_id];
        custom_regex = nextregex;
    }
    else
    {
        sregex nextregex = custom_regex >> as_xpr(begin->str());
        custom_regex = nextregex;
    }
}
Now our regex is ready, let's find some dates :-]
std::string input = "The date is 2009-August-25.";
smatch mydate;
if( regex_search( input, mydate, custom_regex ) )
std::cout << "Found " << mydate.str() << "." << std::endl;
The xpressive library is very powerful and fast. It also makes beautiful use of patterns.
If you like this example, let me know in comment or points ;-)
I'd transform the tagged string into a regular expression with a capture for each of the 3 fields and search for it. The complexity of the regular expression will depend on what you want to accept for %yr. You can also use a less strict expression and then check for valid values; depending on the context, this can lead to better error messages ("Invalid month: Augsut" instead of "date not found") or to false positives.
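That second approach can be sketched with std::regex instead of Boost.Xpressive. Everything named here is an assumption made for illustration: Date and parse_with_mask are hypothetical, %yr is taken to mean four digits, %dy two digits, %mh an English month name, and "%%" a literal percent sign.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <vector>

struct Date { int year = 0, month = 0, day = 0; };  // hypothetical result type

// Hypothetical helper: turns the mask into a regex with one capture per tag,
// searches text with it and fills out. Returns false if nothing valid is found.
bool parse_with_mask(const std::string& mask, const std::string& text, Date& out)
{
    static const std::map<std::string, int> months = {
        {"January", 1}, {"February", 2}, {"March", 3}, {"April", 4},
        {"May", 5}, {"June", 6}, {"July", 7}, {"August", 8},
        {"September", 9}, {"October", 10}, {"November", 11}, {"December", 12}};

    std::string pattern;
    std::vector<std::string> order;  // tags in the order of their capture groups
    for (std::size_t i = 0; i < mask.size();) {
        const std::string tag = mask.substr(i, 3);
        if (mask.compare(i, 2, "%%") == 0) { pattern += '%'; i += 2; continue; }
        if (tag == "%yr")      pattern += "(\\d{4})";
        else if (tag == "%mh") pattern += "([A-Za-z]+)";  // deliberately loose, checked below
        else if (tag == "%dy") pattern += "(\\d{2})";
        else {
            // Any other character is a literal; escape regex metacharacters.
            if (std::string("\\^$.|?*+()[]{}").find(mask[i]) != std::string::npos)
                pattern += '\\';
            pattern += mask[i++];
            continue;
        }
        order.push_back(tag);
        i += 3;
    }

    const std::regex rx(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, rx))
        return false;  // "date not found"
    for (std::size_t g = 0; g < order.size(); ++g) {
        const std::string value = m[g + 1].str();
        if (order[g] == "%yr")      out.year = std::stoi(value);
        else if (order[g] == "%dy") out.day = std::stoi(value);
        else {
            const auto it = months.find(value);
            if (it == months.end())
                return false;  // "invalid month", e.g. "Augsut"
            out.month = it->second;
        }
    }
    return true;
}
```

The order vector is what lets the tags appear in any order in the mask: capture group g + 1 always belongs to order[g]. Range checks such as a day between 1 and 31 are left out of this sketch and would go next to the month lookup.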