How can I make this regex replace function efficient? - c++

I am using this function to perform regex replace on std::string:
String regexReplace(String s,String search,String replace,String modifier,int user){
bool case_sensitive=true,global=false;
String replaced_string=s;
if(modifier=="gi" || modifier=="ig"){global=true;case_sensitive=false;}
else if(modifier=="i"){case_sensitive=false;}
else if(modifier=="g"){global=true;}
try {
std::regex re (search);
if(user==1){re=createRegex(search,case_sensitive);}
else if(!case_sensitive){re= Regex (search, REGEX_DEFAULT | ICASE);}
if(global){
replaced_string=std::regex_replace (s,re,replace,std::regex_constants::format_default);
}
else{
replaced_string=std::regex_replace (s,re,replace,NON_RECURSIVE_REGEX_REPLACE);
}
}
catch (std::regex_error& e) {
printErrorLog("Invalid replace string regex: "+search);
Exit(1);
}
return replaced_string;
}
typedefs and #defines used:
typedef std::regex Regex;
typedef std::string String;
#define REGEX_DEFAULT std::regex::ECMAScript
#define ICASE std::regex::icase
#define NON_RECURSIVE_REGEX_REPLACE std::regex_constants::format_first_only
But this function consumes approximately 0.3 seconds on 14x4 consecutive executions:
res=regexReplace(res,"([^\\.]*\\.\\d+?)0+$","$1","i",0);
res=regexReplace(res,"([^\\.]*\\.\\d+?)0+(e.*)$","$1$2","i",0);
res=regexReplace(res,"([^\\.]*)\\.0*$","$1","i",0);
res=regexReplace(res,"([^\\.]*)\\.0*(e.*)$","$1$2","i",0);
Can I make it more efficient to lessen the execution time?
Note:
The createRegex() function is not being called (user=0 by default).

Related

Cross platform file list using wildcard

I'm looking for a cross-platform function that supports wildcard listing of a directory's contents, similar to what FindFirstFile does on Windows.
Is the wildcard pattern accepted on Windows specific to Windows? I want something that supports the FindFirstFile wildcard pattern but also works on Linux.
If C++17 and above:
You can "walk" a directory using a directory iterator, and match walked file names with a regex, like this:
static std::optional<std::string> find_file(const std::string& search_path, const std::regex& regex) {
const std::filesystem::directory_iterator end;
try {
for (std::filesystem::directory_iterator iter{search_path}; iter != end; iter++) {
const std::string file_ext = iter->path().extension().string();
if (std::filesystem::is_regular_file(*iter)) {
if (std::regex_match(file_ext, regex)) {
return (iter->path().string());
}
}
}
}
catch (std::exception&) {}
return std::nullopt;
}
For example, to find the first file that ends in .txt:
auto first_file = find_file("DocumentsDirectory", std::regex("\\.(?:txt)"));
Similarly, if you are interested in more than matching by extension, the function line
const std::string file_ext = iter->path().extension().string();
should be modified to something that captures the part of the filename you are interested in (or the whole path to the file)
This could then be used in a function, which performs the wildcard listing by directory.
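Since the question asks for FindFirstFile-style wildcards rather than regexes, one option is to translate the wildcard into a std::regex first. The following is only a minimal sketch: wildcard_to_regex is a hypothetical helper, not part of the answer above. It assumes that only '*' and '?' are wildcards and that matching should ignore case by default, and it does not reproduce the special cases of the Windows matcher.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: translate a simple wildcard pattern into a std::regex.
// Only '*' and '?' are treated as wildcards; everything else is literal.
inline std::regex wildcard_to_regex(const std::string& wildcard, bool ignore_case = true) {
    // Characters that have a special meaning in ECMAScript regexes.
    static const std::string special = R"re(\^$.|+()[]{})re";
    std::string pattern;
    for (char c : wildcard) {
        if (c == '*') {
            pattern += ".*";   // any run of characters, including none
        } else if (c == '?') {
            pattern += '.';    // exactly one character
        } else if (special.find(c) != std::string::npos) {
            pattern += '\\';   // escape regex metacharacters
            pattern += c;
        } else {
            pattern += c;
        }
    }
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (ignore_case) {
        flags = flags | std::regex::icase;
    }
    return std::regex(pattern, flags);
}

// Small self-check showing the intended behaviour.
inline void wildcard_examples() {
    assert(std::regex_match("notes.TXT", wildcard_to_regex("*.txt")));
    assert(!std::regex_match("notes.txt.bak", wildcard_to_regex("*.txt")));
}
```

The resulting regex is meant to be matched against the whole filename with std::regex_match, so "*.txt" does not accept "notes.txt.bak".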
Here is a recursive variant.
It calls a function object f for each matching file and returns the number of files found.
It descends into subdirectories up to the specified maximum depth.
Note that the search filter is matched against the filename only, not the full path.
The try-catch block is removed so that the caller can catch and handle any errors.
#include <string>
#include <regex>
#include <filesystem>
#include <functional> // std::function
// recursively call a functional *f* for each file that matches the expression
inline int foreach_file(const std::string& search_path, const std::regex& regex, int depth, std::function<void(std::string)> f) {
int n = 0;
const std::filesystem::directory_iterator end;
for (std::filesystem::directory_iterator iter{ search_path }; iter != end; iter++) {
const std::string filename = iter->path().filename().string();
if (std::filesystem::is_regular_file(*iter)) {
if (std::regex_match(filename, regex)) {
n++;
f(iter->path().string());
}
}
else if (std::filesystem::is_directory(*iter) && depth>0) {
n += foreach_file(iter->path().string(), regex, depth - 1, f);
}
}
return n;
}
Example:
void do_something(std::string filename) {
...
}
void do_all_json_that_start_with_z() {
// the regex must match the whole filename; the dot before "json" is escaped
std::regex r(R"(z.*\.json)", std::regex::ECMAScript | std::regex::icase); // ignoring case
foreach_file(R"(C:\MyFiles\)", r, 99, do_something); // max depth 99
}
// a lambda can be used as well
void print_all_json_that_start_with_z() {
int n=0;
foreach_file(
R"(C:\MyFiles\)", // raw string, so the Windows backslashes need no escaping
std::regex(R"(z.*\.json)"),
0, // do not descend into subdirectories
[&n](std::string s) { printf("%d) %s\n", ++n, s.c_str()); });
}

If statement fails with regex comparison

public list[str] deleteBlockComments(list[str] fileLines)
{
bool blockComment = false;
list[str] sourceFile = [];
for(fileLine <- fileLines)
{
fileLine = trim(fileLine);
println(fileLine);
if (/^[\t]*[\/*].*$/ := fileLine)
{
blockComment = true;
}
if (/^[\t]*[*\/].*$/ := fileLine)
{
blockComment = false;
}
println(blockComment);
if(!blockComment)
{
sourceFile = sourceFile + fileLine;
}
}
return sourceFile;
}
For some reason, I am not able to detect /* at the beginning of a string. If I execute this on the command line, it seems to work fine.
Can someone tell me what I am doing wrong? In the picture below you can see the string to be compared above the comparison result (false).
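[\/*] is a character set that matches a forward slash or a star, not one followed by the other. Remove the square brackets and escape the star (an unescaped * after \/ means "zero or more slashes"), and the pattern should start behaving as you expect. The same applies to [*\/] in the second test, which becomes \*\/ when written as a sequence.
While we're at it, let's also get rid of the superfluous square brackets around \t:
^\t*\/\*.*$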
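The difference between a character class and a literal sequence can be checked in isolation. This is a small sketch using C++ std::regex (ECMAScript grammar) rather than Rascal, so the function names are purely illustrative and the syntax of the surrounding program differs. The regex semantics for these two constructs are the same.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Character class: the first character after optional tabs is '/' OR '*'.
inline bool starts_with_slash_or_star(const std::string& line) {
    static const std::regex re(R"(^\t*[/*].*$)");
    return std::regex_match(line, re);
}

// Sequence: after optional tabs, '/' must be immediately followed by '*'.
inline bool starts_block_comment(const std::string& line) {
    static const std::regex re(R"(^\t*/\*.*$)");
    return std::regex_match(line, re);
}

// Small self-check: a continuation line such as "* text" satisfies the
// character class but is not the start of a block comment.
inline void block_comment_examples() {
    assert(starts_with_slash_or_star("* text"));
    assert(!starts_block_comment("* text"));
    assert(starts_block_comment("\t/* text"));
}
```

With the character-class version, a line starting with /* satisfies both the "open" and the "close" test, which is consistent with the flag never staying true.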

Not matching string of regular expression in C programing

The requirement is:
For a string (held in the variable plan) that does not contain the substring "TT", the pattern match should succeed and print the message "Regular expression not having TT" to the console.
I tried [^(TT)], I tried negation,
and I tried ^(?!.*TT).*$
#include <stdio.h>
#include "regex.h"
int main()
{
regex_t exps;
int r1 =-1;
int r2= -1;
char *pattern="\\^(\\?\\!.*TT).*\\$";
char *plan="TEST QBSE US 5USD charge sample conv offer";
r1=regcomp(&exps, pattern, REG_EXTENDED | REG_NOSUB);
if (r1 == 0)
{
printf("Regular expression is parsed sucessfully:%s \n",pattern);
}
else
{
printf("Regular expression parsing failed.\n");
}
r2=regexec(&exps, plan, (size_t)0, NULL, 0);
if (r2 == 0)
{
printf("Regular expression not having TT \n");
}
else
{
printf("Regular expression is not matched.\n");
}
regfree(&exps);
return 0;
}
current output:
Regular expression is parsed sucessfully:\^(\?\!.*TT).*\$
Regular expression is not matched.
You may use
char *pattern="^[^T]*(T($|[^T]+))*$";
Details
^ - start of string
[^T]* - 0 or more chars other than T
(T($|[^T]+))* - 0 or more repetitions of
T - a T char...
($|[^T]+) - ... followed with the end of string ($) or (|) any 1 or more chars other than T ([^T]+)
$ - end of string.
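The breakdown above can be checked against a few inputs. This is a minimal sketch that uses C++ std::regex with its default ECMAScript grammar instead of POSIX regcomp/regexec, which is an assumption of the example. The pattern itself uses no grammar-specific features, and has_no_tt is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true when `text` contains no "TT" substring, using the pattern
// from the answer: ^[^T]*(T($|[^T]+))*$
inline bool has_no_tt(const std::string& text) {
    static const std::regex re("^[^T]*(T($|[^T]+))*$");
    return std::regex_match(text, re);
}

// Small self-check: single T characters are fine, two in a row are not.
inline void no_tt_examples() {
    assert(has_no_tt("TEST QBSE US 5USD charge sample conv offer"));
    assert(!has_no_tt("ATTIC"));
}
```

Every T must be followed either by the end of the string or by at least one non-T character, so two adjacent T characters can never be consumed.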
NOTE: If you need to negate an arbitrary regex in this way (for validation), it is easier to do the negation in code: use the plain pattern char *pattern="TT"; and treat a non-zero regexec return value as success (if (r2 != 0)):
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
int main (void)
{
regex_t exps;
int r1 =-1;
int r2= -1;
char *pattern="TT"; // <-- The regex is simple
char *plan="TEST QBSE US 5USD charge sample conv offer";
r1=regcomp(&exps, pattern, REG_EXTENDED | REG_NOSUB);
if (r1 == 0)
{
printf("Regular expression is parsed sucessfully:%s \n",pattern);
}
else
{
printf("Regular expression parsing failed.\n");
}
r2=regexec(&exps, plan, (size_t)0, NULL, 0);
if (r2 != 0) // <-- Here goes the negation
{
printf("Regular expression not having TT \n");
}
else
{
printf("Regular expression is not matched.\n");
}
regfree(&exps);
return 0;
}

VB.NET replace codeblock in file found by variable

Let's assume I have files in a local Git repository and I need to replace this code
bool CBlock::ReadFromDisk(const CBlockIndex* pindex, bool fReadTransactions)
{
if (!fReadTransactions)
{
*this = pindex->GetBlockHeader();
return true;
}
if (!ReadFromDisk(pindex->nFile, pindex->nBlockPos, fReadTransactions))
return false;
if (GetHash() != pindex->GetBlockHash())
return error("CBlock::ReadFromDisk() : GetHash() doesn't match index");
return true;
}
I'm looking for a way to 'scan' the files and match the initial
bool CBlock::ReadFromDisk(const CBlockIndex* pindex, bool fReadTransactions)
and replace the code between the braces {} entirely.
So the only part I know about the blocks I need to replace is the declaration.
To match everything between ( and ) you can use:
(?<=bool CBlock::ReadFromDisk\().*?(?=\))
Dim ResultString As String
Try
ResultString = Regex.Replace(SubjectString, "(?<=bool CBlock::ReadFromDisk\().*?(?=\))", "replacement_text_here", RegexOptions.Multiline)
Catch ex As ArgumentException
'Syntax error in the regular expression
End Try
To match everything between { and } you can use:
(?<=bool CBlock::ReadFromDisk\(const CBlockIndex\* pindex, bool fReadTransactions\)\s*\{)(?:[^{}]|(?<open>{)|(?<-open>}))+(?(open)(?!))(?=\})
Dim ResultString As String
Try
ResultString = Regex.Replace(SubjectString, "(?<=bool CBlock::ReadFromDisk\(const CBlockIndex\* pindex, bool fReadTransactions\)\s*\{)(?:[^{}]|(?<open>{)|(?<-open>}))+(?(open)(?!))(?=\})", "replacement_text_here")
Catch ex As ArgumentException
'Syntax error in the regular expression
End Try
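The balancing group in the second pattern effectively counts opening and closing braces until the count returns to zero. The same idea can be written as an explicit loop, sketched here in C++ rather than VB.NET because std::regex has neither lookbehind nor balancing groups. replace_body and its parameters are hypothetical names. The sketch simply takes the first '{' after the signature, and, like the regex, it does not account for braces inside string literals or comments.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: replace everything between the braces that follow
// `signature` with `new_body`. Returns std::nullopt if the signature is not
// found or the braces are unbalanced.
inline std::optional<std::string> replace_body(const std::string& source,
                                               const std::string& signature,
                                               const std::string& new_body) {
    const std::size_t sig = source.find(signature);
    if (sig == std::string::npos) return std::nullopt;
    const std::size_t open = source.find('{', sig + signature.size());
    if (open == std::string::npos) return std::nullopt;
    int depth = 0;
    for (std::size_t i = open; i < source.size(); ++i) {
        if (source[i] == '{') {
            ++depth;                       // entering a nested block
        } else if (source[i] == '}' && --depth == 0) {
            // i is the brace that closes the function body
            return source.substr(0, open + 1) + new_body + source.substr(i);
        }
    }
    return std::nullopt;                   // ran out of input before depth hit 0
}

// Small self-check: the nested block inside f() is skipped correctly.
inline void replace_body_example() {
    const std::string src = "int f() { if (x) { y(); } return 1; } int g() { }";
    const auto out = replace_body(src, "int f()", " return 0; ");
    assert(out.has_value());
    assert(*out == "int f() { return 0; } int g() { }");
}
```

Only the declaration has to be known in advance, which matches the constraint stated in the question.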

boost xpressive: wrong match?

I want to match a simple expression with boost, but it behaves strangely... The code below should match and display "a" from the first and second strings:
#include <iostream>
#include <boost/xpressive/xpressive.hpp>
#include "stdio.h"
using namespace boost::xpressive;
void xmatch_action( const char *line ) {
cregex g_re_var;
cmatch what;
g_re_var = cregex::compile( "\\s*var\\s+([\\w]+)\\s*=.*?" );
if (regex_match(line, what, g_re_var )) {
printf("OK\n");
printf(">%s<\n", what[1] );
}
else {
printf("NOK\n");
}
}
int main()
{
xmatch_action("var a = qqq");
xmatch_action(" var a = aaa");
xmatch_action(" var abc ");
}
but my actual output is:
OK
>a = qqq<
OK
>a = aaa<
NOK
and it should be
OK
>a<
OK
>a<
NOK
Instead of printf(), use the << operator to print the sub_match object (what[1]). Alternatively, keep printf() and pass what[1].str().c_str() instead of what[1].
See the docs: sub_match, match_results, regex_match
Remove the square brackets around \w in the regex and use std::cout for printing. Then you will get the result that you want.
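A sub_match is a pair of iterators into the original input, not a null-terminated string, so it has to be converted before it is handed to a %s format. The observed output ">a = qqq<" is consistent with printf reading from the start of the capture to the end of the line, although passing a class object through varargs is not something to rely on. The sketch below uses std::regex instead of Boost.Xpressive so that it compiles without Boost, which is an assumption of the example. std::csub_match has the same iterator-pair design, and captured_name is a hypothetical name.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Returns the variable name captured from a "var NAME = ..." line, if any.
inline std::optional<std::string> captured_name(const char* line) {
    static const std::regex re(R"(\s*var\s+(\w+)\s*=.*)");
    std::cmatch what;
    if (!std::regex_match(line, what, re)) {
        return std::nullopt;
    }
    // what[1] is a std::csub_match: two pointers into `line`.
    // str() copies exactly the captured range into a std::string.
    return what[1].str();
}

// Small self-check using the inputs from the question.
inline void captured_name_examples() {
    assert(captured_name("var a = qqq").value() == "a");
    assert(!captured_name(" var abc ").has_value());
}
```

To print the result with printf, pass captured_name(line)->c_str() after checking that a value is present.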