How to implement /e modifier with PCRE2? - c++

In Perl, we can do this
s/pattern/func($1)/e
Is there any convenient function that does the same thing with PCRE2, like
::pcre2_substitute_with_callback(
re, // the compiled pattern
pcuSubject, ccuSubject, // the subject and its length
PCRE2_SUBSTITUTE_GLOBAL, // the substitute options
matches,
NULL, // the match context
[](PCRE2_SPTR pcuMatched)->PCRE2_SPTR{ // the callback
return "replacement";
},
pcuResult, &ccuResult
);
Thanks.

No, I think that there is no such convenience in pcre2. See the wrapper below though.
However, I believe that the replacement string for the call to pcre2_substitute can be prepared without any particular restrictions. (I cannot test now.) The use of escape character ($) for capturing groups or pattern items is clearly specified but I don't see why one couldn't use that in a function/callback to form the replacement string.
That can then be wrapped in a method with a desired signature.
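As a rough illustration of that wrapper idea, here is a minimal sketch in Python rather than PCRE2. Python's re module stands in for the matching engine, and the names substitute_with_callback and double are hypothetical. It walks the matches, calls a callback for each one, and stitches the pieces together, which is roughly the loop a PCRE2 wrapper would have to run around its own match calls.

```python
import re
from typing import Callable


def substitute_with_callback(pattern: str, subject: str,
                             callback: Callable[[re.Match], str]) -> str:
    """Replace every match of `pattern` in `subject` with callback(match).

    Hypothetical helper for illustration; it is not part of PCRE2.
    """
    pieces = []
    last_end = 0
    for match in re.finditer(pattern, subject):
        pieces.append(subject[last_end:match.start()])  # text between matches
        pieces.append(callback(match))                   # computed replacement
        last_end = match.end()
    pieces.append(subject[last_end:])                    # trailing text
    return "".join(pieces)


def double(match: re.Match) -> str:
    # Plays the role of func($1) in s/pattern/func($1)/e
    return str(int(match.group(1)) * 2)


result = substitute_with_callback(r"(\d+)", "a1 b22 c333", double)
print(result)
```

In Python itself the loop is unnecessary, because re.sub already accepts a function as the replacement; it is spelled out here only to show the steps a wrapper has to perform.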
More documentation is in the pcre2api section "Creating a new string with substitutions".
There is a C++ wrapper JPCRE2. It uses the replace method of RegexReplace for this purpose. However, about half-way through the main page it also informs us that
There's another replace function (jp::RegexReplace::nreplace()) that takes a MatchEvaluator with a callback function. It's required when you have to create the replacement strings dynamically according to some criteria.
The class jp::MatchEvaluator implements several constructor overloads to take different callback functions.
The page continues with a full example for usage of jp::RegexReplace::nreplace().
More detailed examples are offered in a test file in the distribution.

Related

Aspose.Words - Range.Parse(regex, value, FindReplaceOptions) method can't find text specified with regex in paragraph but in table cell it does

I wrote a simple Word document with one paragraph and one table (with one cell) under that paragraph. I'm using Aspose 16.7 and Aspose 22.9 (both versions show the same problem).
When I open that Word document using Aspose, its text looks like this:
\r\r\r<<AC:doc_title:value>>\r<<AC:doc_title:value>>\a\a\rTest\r\r\r
The Replace method doesn't work when it tries to find and replace <<AC:doc_title:value>> in the paragraph, but when I put the same tag in a table cell, the Replace method finds that tag and replaces it with the given text. This is my Replace method call:
node.Range.Replace(new Regex("<<AC:doc_title:value>>"), "Replaced text", new FindReplaceOptions(FindReplaceDirection.Forward));
I tried calling the Replace method with different FindReplaceOptions, but that didn't change anything.
I also tried the Replace method with only two parameters, node.Range.Replace(new Regex("<<AC:doc_title:value>>"), value). With that overload I didn't have any problems and it works fine (but the problem is that the method is obsolete now).
Thank you for your help.
The Range.Replace(Regex, string) overload is not marked as obsolete, so you can use it:
https://reference.aspose.com/words/net/aspose.words/range/replace/#replace_2
This overload internally calls the overload with FindReplaceOptions, so both should work the same:
public int Replace(Regex pattern, string replacement)
{
    return Replace(pattern, replacement, new FindReplaceOptions());
}
If you still have problems, please post the question in the Aspose.Words support forum and attach your input document there for testing.

Regex Notepad++ Function List

I am trying to create a regular expression that will build a function list for a proprietary programming language inside Notepad++ using the functionList.xml file. The expression needs to capture all instances of any Method, Function or Macro following this syntax:
Method Blah1(pParam1, pParam2)
[New] dynamicVar = "string"
[New] dynamicVar2 = 4
[New] result = Bar2(dynamicVar, dynamicVar2)
End Method
Function Bar2(pParam3, pParam2)
Foo3
Return pParam3 && pParam2
End Function
Macro Foo3()
End Macro
So the regexp should capture 3 instances for the above example.
Link to regexr: http://www.regexr.com/3fcd7
functionList.xml
<parser
displayName="MOX"
id ="mox_function"
commentExpr="(?s:/\*.*?\*/)|(?m-s://.*?$)"
>
<function
mainExpr="^(s|Method|Macro|Function)"
>
<functionName>
<nameExpr expr="[A-Za-z_]\w*\s*[=:]|[A-Za-z_]?\w*\s*\(" />
<nameExpr expr="[A-Za-z_]?\w*" />
</functionName>
<className>
<nameExpr expr="([A-Za-z_]\w*\.)*[A-Za-z_]\w*\." />
<nameExpr expr="([A-Za-z_]\w*\.)*[A-Za-z_]\w*" />
</className>
</function>
</parser>
Try this.
(?m)^(Method|Macro).*\(.*\)\n.*\n?^\s?End.*$
https://regex101.com/r/pAkFDU/
I've never been able to use FunctionList. It's been known to have stability issues for a long time and is not even listed, by default, in the plugin manager.
It's highly likely that a too-inclusive expression won't properly capture nested groups (unless FunctionList recurses through matches, or supports recursion in regex). Just something you should be aware of.
Too-inclusive regexes frequently stumble on scenarios like this:
Function foo()
Function refoo()
End Function // 1
End Function // 2
Function solo(stuff,thing)
End Function
Function bar()
sample = Function (x,y)
return x & y
End Function
End Function // 3
They fail at either point 1 or point 3. It can be quite difficult to properly capture the function unless you have recursion available. (Function|Method|Macro)[\s\S]*?End \1 captures too little; take the ? away and it captures too much.
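The lazy-versus-greedy behaviour described above can be checked with a small Python sketch. It uses Python's re module as a stand-in (it says nothing about how FunctionList itself evaluates expressions), and the SAMPLE text is a shortened copy of the scenario above.

```python
import re

SAMPLE = """Function foo()
Function refoo()
End Function // 1
End Function // 2
Function solo(stuff,thing)
End Function
"""

# Lazy version: stops at the first "End Function" it can reach.
LAZY = re.compile(r"(Function|Method|Macro)[\s\S]*?End \1")
# Greedy version: runs on to the last "End Function" in the text.
GREEDY = re.compile(r"(Function|Method|Macro)[\s\S]*End \1")

lazy = LAZY.search(SAMPLE)
greedy = GREEDY.search(SAMPLE)

# What follows each match shows where it stopped.
print(repr(SAMPLE[lazy.end():lazy.end() + 5]))
print(repr(SAMPLE[greedy.end():]))
```

The lazy match ends just before the comment marking point 1, so foo() is cut short; the greedy match swallows solo() as well.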
Truly recursive ((?R)) Regular Expressions (or Balanced Pairs), which aren't widely supported, can properly capture to the right point, but without code to recurse through it, won't flag the nested functions.
(Function|Method|Macro) (\w+)\((.*)\)((?:(?R)|[\s\S]*?)+)End \1 would correctly stop at point 2.
You can use a similar expression with a lookahead: (Function|Method|Macro) (\w+)\((.*)\)(?=(?:(?R)|[\s\S]*?)+End \1) but it's also easy to fool. All it really checks is that End \1 follows somewhere later in the file.
If FunctionList does support recursive regex and doesn't recurse (code-side) through the results (I don't know, the plugin won't work for me), I would look at a simpler expression like below. Honestly, even if it does, you'll get the best performance out of a simple expression. It will work faster and you'll know exactly what it is and is not indicating. Anything else has too much overhead for relatively little gain.
^\s*(Function|Method|Macro) (\w+)\((.+)\)
Output: \2 Args: \3
^\s*(Function|Method|Macro) (\w+)\(\)
Output: \2
The outputs are just examples; the plugin I use lets you customize the output. Others may not. Also, prefixing with ^\s* restricts the matches to functions that are not assigned to variables or commented out.
Or both above can be captured in one expression by changing the first expression's + to a *. It won't tell you if there's a matching End, but it will find the part you're interested in for a document outline.
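A minimal Python sketch of the combined expression (with the + changed to a *), run against the sample from the question. Python's re module is only a stand-in for whatever engine the plugin uses, and the outline function and its output format are hypothetical, modelled on the "Output" lines above.

```python
import re

SAMPLE = """\
Method Blah1(pParam1, pParam2)
[New] dynamicVar = "string"
[New] dynamicVar2 = 4
[New] result = Bar2(dynamicVar, dynamicVar2)
End Method
Function Bar2(pParam3, pParam2)
Foo3
Return pParam3 && pParam2
End Function
Macro Foo3()
End Macro
"""

# One expression for declarations with and without arguments.
DECLARATION = re.compile(r"^\s*(Function|Method|Macro) (\w+)\((.*)\)",
                         re.MULTILINE)


def outline(source):
    """Return one outline entry per declaration found in `source`."""
    entries = []
    for kind, name, args in DECLARATION.findall(source):
        entries.append(f"{name} Args: {args}" if args else name)
    return entries


entries = outline(SAMPLE)
for entry in entries:
    print(entry)
```

The call to Bar2 inside the Method body and the End lines are skipped because those lines do not start with one of the three keywords.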
Personally, I've always had good luck with a similar plugin ambiguously named "SourceCookifier", but it only parses the file as it was last saved/opened. If I add a new function, it's not listed until save/open.
The interface for adding new languages isn't very intuitive, packing a lot of features in a small area, but it works and it works well. Like FunctionList, and probably most other alternatives, you'll need to add rules for "Mox".
I'm not saying you should switch to this particular utility, but FunctionList hasn't been updated in 7 years.

Remove invalid url characters with TRegExpr

Good day. I can't seem to find an example of how to use the TRegExpr component to do a simple replace of invalid characters. For example, I have a string = 'abcdeg3fghijk'; and I want to replace all the characters that are invalid, such as the numeral '3'. How would I do this with TRegExpr? My intention is to learn how to use TRegExpr to build a simple URL cleaner/validator.
procedure TForm1.Button3Click(Sender: TObject);
var
RegExp: TRegExpr;
astr:string;
begin
astr:='h"ttp://ww"w.msn."com~~~';
// I want to clean the string to remove all non valid chars
//this is where I am lost
RegExp:=TRegExpr.Create;
try
RegExp.Expression:=RegExpression;
finally
RegExp.Free;
end;
end;
Judging from the comments and the question edit, you are trying to work out how to perform a replacement using a regex. The function you need is TRegEx.Replace.
There are lots of overloads. The simplest to use are the class functions. For example:
NewValue := TRegEx.Replace(OldValue, '3', '4');
will replace all occurrences of 3 with 4.
Or if you want to use the instance method approach, do it like this:
var
RegEx: TRegEx;
....
RegEx.Create('3');
NewValue := RegEx.Replace(OldValue, '4');
Remember that TRegEx is a record, a value type. There's no Free to call and no need for try/finally. I personally regard Create as very badly named. I would have preferred Initialize if I had been designing the TRegEx type.
Using the instance method approach allows the expression to be compiled and that speeds up performance for repeated matching of the same expression to different input data. I don't know whether that would matter for you. If not then use the class function interface which is simpler to use.
You'll obviously extend this to use a useful regex for your replacement!
The documentation for the PCRE regex flavour that Delphi uses is here: http://www.regular-expressions.info/pcre.html
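For the URL-cleaning goal in the question, a "useful regex" could be a negated character class that matches everything outside an allowed set. The sketch below is in Python rather than Delphi, and the allowed set is an assumption chosen only to fit the question's example string; the same pattern could presumably be passed to TRegEx.Replace with an empty replacement.

```python
import re

# Assumption: only letters, digits and : / . _ - count as valid here.
# A real URL cleaner would need a set based on the URL rules it must follow.
INVALID_CHARS = re.compile(r"[^A-Za-z0-9:/._-]")


def clean_url(text):
    """Remove every character that is not in the allowed set."""
    return INVALID_CHARS.sub("", text)


cleaned = clean_url('h"ttp://ww"w.msn."com~~~')
print(cleaned)
```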

Using Regex to find function containing a specific method or variable

This is my first post on stackoverflow, so please be gentle with me...
I am still learning regex, mostly because I have finally discovered how useful they can be, in part through using Sublime Text 2. So this is Perl regex (I believe).
I have searched on this and other sites but I am now genuinely stuck. Maybe I am trying to do something that can't be done.
I would like to find a regex (pattern) that will let me find the function or method or procedure etc that contains a given variable or method call.
I have tried a number of expressions and they seem to get part of the way but not all the way. Particularly when searching in Javascript I pick up multiple function declarations instead of the one nearest to the call/variable that I am looking for.
for example:
I am looking for the function that calls the method savedata()
I have learnt from this excellent site that I can use (?s) to make . match newlines
function.*(?=(?s).*?savedata\(\))
however, that will find the first instance of the word function and then all the text up to and including savedata()
if there are multiple procedures then it will start at the next function and repeat until it gets to savedata() again
function(?s).*?savedata\(\) does something similar
I have tried asking it to ignore the second function (I believe) by using something like:
function(?s).*?(?:(?!function).*?)*savedata\(\)
But that doesn't work.
I have done some investigation with look forwards and look backwards but either I am doing it wrong (highly possible) or they are not the right thing.
In summary (I guess), how do I go backwards, from a given word to the nearest occurrence of a different word.
At the moment I am using this to search through some javascript files to try and understand the structure/calls etc but ultimately I am hoping to use on c# files and some vb.net files
Many thanks in advance
Thanks for the swift responses and sorry for not adding an example block of code, which I will do now (modified but still sufficient to show the issue).
If I have a simple block of JavaScript like the following:
function a_CellClickHandler(gridName, cellId, button){
    var stuffhappenshere;
    var and here;
    if(something or other){
        if (anothertest) {
            event.returnValue=false;
            event.cancelBubble=true;
            return true;
        }
        else{
            event.returnValue=false;
            event.cancelBubble=true;
            return true;
        }
    }
}
function a_DblClickHandler(gridName, cellId){
    var userRow = rowfromsomewhere;
    var userCell = cellfromsomewhereelse;
    //this will need to save the local data before allowing any inserts to ensure that they are inserted in the correct place
    if (checkforarangeofthings){
        if (differenttest) {
            InsSeqNum = insertnumbervalue;
            InsRowID = arow.getValue()
            blnWasInsert = true;
            blnWasDoubleClick = true;
            SaveData();
        }
    }
}
When I run the regex against this, including the second one that was suggested as working, Sublime Text 2 selects everything from the first function through to SaveData().
I would like to be able to get to just a_DblClickHandler in this case, not both.
Hopefully this code snippet adds some clarity. Sorry for not posting it originally; I had hoped a standard code file would suffice.
This regex will find every Javascript function containing the SaveData method:
(?<=[\r\n])([\t ]*+)function[^\r\n]*+[\r\n]++(?:(?!\1\})[^\r\n]*+[\r\n]++)*?[^\r\n]*?\bSaveData\(\)
It will match all the lines in the function up to, and including, the first line containing the SaveData method.
Caveat:
The source code must have well-formed indentation for this to work, as the regex uses matching indentations to detect the end of functions.
Will not match a function if it starts on the first line of the file.
Explanation:
(?<=[\r\n]) Start at the beginning of a line
([\t ]*+) Capture the indentation of that line in Capture Group 1
function[^\r\n]*+[\r\n]++ Match the rest of the declaration line of the function
(?:(?!\1\})[^\r\n]*+[\r\n]++)*? Match more lines (lazily) which are not the last line of the function, until:
[^\r\n]*?\bSaveData\(\) Match the first line of the function containing the SaveData method call
Note: The *+ and ++ are possessive quantifiers, only used to speed up execution.
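To see the indentation trick at work, here is a small Python sketch of the same pattern. It is an adaptation, not the exact expression above: the possessive quantifiers are written as plain * and +, because Python's re module only accepts them from version 3.11 on. The SOURCE text is a hypothetical, consistently indented input.

```python
import re

# The answer's pattern with *+ and ++ replaced by * and +.
PATTERN = re.compile(
    r"(?<=[\r\n])([\t ]*)function[^\r\n]*[\r\n]+"  # declaration line
    r"(?:(?!\1\})[^\r\n]*[\r\n]+)*?"               # lines before the closing brace
    r"[^\r\n]*?\bSaveData\(\)"                     # first line calling SaveData()
)

# Hypothetical input. The leading newline matters: per the caveat above,
# a function on the very first line of the file would not be matched.
SOURCE = """
function a_CellClickHandler(gridName, cellId, button){
    if (something) {
        return true;
    }
}
function a_DblClickHandler(gridName, cellId){
    if (differenttest) {
        blnWasInsert = true;
        SaveData();
    }
}
"""

match = PATTERN.search(SOURCE)
matched_text = match.group(0) if match else ""
print(matched_text)
```

The first function is rejected because its closing brace, at the same indentation as its declaration, is reached before any SaveData() call, so the match starts at a_DblClickHandler.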
EDIT:
Fixed two minor problems with the regex.
EDIT:
Fixed another minor problem with the regex.

Regex to Find/replace argument pattern in a function-call across all files

I have a large codebase, where we need to make a pattern-change in the argument of a specific function.
i.e. all arguments to a function foo() that have the format something.anotherThing are to be renamed to something_anotherThing
The arguments can be anything but will always be in a str1.str2 format. It is to be done for arguments of this one function only, all other code should remain untouched.
e.g.
foo(a.x) --> foo(a_x)
foo(a4.b6) --> foo(a4_b6)
Is there any way I can achieve this using a regular expression or a tool, so that I can do it in one step for all the files, for one specific function?
If the function had only one argument, it would be easy:
Use a tool that is able to search and replace in multiple files, e.g. TextCrawler.
Then select the regular expression tab and fill in:
RegExp:
(foo\([^)]+)(\.)([^)]+\))
Replace:
$1_$3
This will not work in a single pass if the function has more arguments. But you can click the "Replace" button again and again until it says that no result was found. You will have to do it at most n times, where n is the maximum number of arguments in any call.
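The "replace until nothing is found" procedure can also be scripted. The sketch below uses Python's re module instead of TextCrawler, with the same expression as above; the function name rename_foo_arguments is hypothetical, and applying it across all files would still need a loop over the files, which is not shown.

```python
import re

# Same expression as in the answer: a foo( call, one dot, rest of the call.
CALL_WITH_DOT = re.compile(r"(foo\([^)]+)(\.)([^)]+\))")


def rename_foo_arguments(source):
    """Repeat the replacement until no foo(...) argument contains a dot."""
    while True:
        source, count = CALL_WITH_DOT.subn(r"\g<1>_\g<3>", source)
        if count == 0:
            return source


converted = rename_foo_arguments("foo(a.x); bar(c.d); foo(a4.b6, q.r)")
print(converted)
```

Each pass replaces the last remaining dot in every foo(...) call, so the loop runs once per dot in the longest argument list plus one final pass that finds nothing. Calls to other functions, such as bar(c.d), are left alone.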