Regular expression for duplicating conditions in every if statement - regex

I'm writing a Regular Expression in Notepad++ to duplicate and modify certain if conditions.
For instance:
if (variable1 == "") should become
if (variable1 == "" or len(variable1) == 0)
The key thing I need to match is the variable name up to the =="", so I can duplicate it for the or condition.
I had the following expression:
[A-Za-z0-9()]*?\s*==\s*""
But it fails when there is no whitespace between the If and the parenthesis:
https://regex101.com/r/Beyumi/3
I believe the following lines should cover most cases:
If (Trim(var1) == "" And var2 == "")
/*Then do something*/
ElseIf(var3 == "" And var4 == "" And Trim(var5)=="") Then
/*block of code*/
ElseIf var6 ==""
The expression should be able to match:
Trim(var1) == ""
var2 == ""
var3 == ""
var4 == ""
Trim(var5)==""
var6 ==""
Note: in the case of And statements, I can manually add the necessary parentheses when adding the 'or' condition.

You can use this regex:
((\w+|trim\(\w+\))\s*==\s*"")
and replace it with this:
(\1 or len(\2) == 0)
Notice that I've enabled case-insensitive matching (untick Match case in Notepad++). The (\w+|trim\(\w+\)) part matches either a plain variable with \w+ or trim(variablename) with trim\(\w+\), and captures it in group 2. The \s* on either side of == takes care of optional whitespace. The whole comparison is captured in group 1, so the replacement puts back the current expression, appends or len(\2) == 0, and surrounds everything with parentheses to keep the grouping correct when conditions are joined with And.
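To check the pattern outside Notepad++, here is a minimal Python sketch that applies the same regex and replacement to the sample lines from the question. It assumes Python's re module behaves like Notepad++'s engine for this simple pattern; re.IGNORECASE stands in for unticking Match case, and the function name duplicate_empty_checks is hypothetical.

```python
import re

# Same pattern as above: group 1 = whole comparison, group 2 = variable or Trim(variable)
PATTERN = re.compile(r'((\w+|trim\(\w+\))\s*==\s*"")', re.IGNORECASE)

def duplicate_empty_checks(text: str) -> str:
    # Wrap each comparison in parentheses and append the len() check, reusing both groups
    return PATTERN.sub(r'(\1 or len(\2) == 0)', text)

sample = 'ElseIf(var3 == "" And var4 == "" And Trim(var5)=="") Then'
print(duplicate_empty_checks(sample))
```

Because Trim(...) is listed as its own alternative, a call like Trim(var5) is captured whole, so the added check becomes len(Trim(var5)) == 0 rather than len(var5) == 0.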


Is Groovy regex (slightly broken)?

println "p(cat || cats, n)" ==~ /^p\(.+||.+,sn\)$/
println "" ==~ /^p\(.+||.+,sn\)$/
Why does the second line return true? Is this a bug?
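| is a special character that means "OR" and must be escaped (\|) to match a literal |. Unescaped, your pattern is three alternatives: ^p\(.+, an empty pattern between the two pipes, and .+,sn\)$. The empty alternative matches the empty string, so the second line returns true.
Note also that there is no "s" after the comma in the first string, only a space; you probably meant \s.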
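Python's re module treats an empty alternative the same way, so the behaviour is easy to reproduce. This is a Python sketch, not Groovy: re.fullmatch stands in for ==~, since both require the entire string to match. The "fixed" pattern is an assumption about what was intended (a literal || and \s before n).

```python
import re

broken = r'^p\(.+||.+,sn\)$'     # three alternatives: '^p\(.+', '', '.+,sn\)$'
fixed = r'^p\(.+\|\|.+,\sn\)$'   # literal '||' and \s for the space before n

def full(pattern: str, s: str) -> bool:
    # The whole string must match, like Groovy's ==~
    return re.fullmatch(pattern, s) is not None

print(full(broken, "p(cat || cats, n)"))  # True: first alternative swallows everything
print(full(broken, ""))                   # True: the empty alternative matches
print(full(fixed, "p(cat || cats, n)"))   # True
print(full(fixed, ""))                    # False
```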

VBS script to report AD groups - Regex pattern not working with multiple matches

I'm having trouble getting a regex pattern to accept two expressions.
The "re.pattern" code here works:
If UserChoice = "" Then WScript.Quit 'Detect Cancel
re.Pattern = "[^(a-z)^(0,4,5,6,7,8,9)]"
re.Global = True
re.IgnoreCase = True
if re.test( UserChoice ) then
Exit Do
End if
MsgBox "Please choose either 1, 2 or 3 ", 48, "Invalid Entry"
However, the "regEx.Pattern" code below does not. I want to use it to format the results of a dsquery command that collects group memberships: I don't want any of the information after the first ",", nor do I want the leading CN= that is normally returned when the following dsquery is run:
"dsquery.exe user forestroot -samid "& strInput &" | dsget user -memberof")
The string I want to format would look something like this before formatting:
CN=APP_GROUP_123,OU=Global Groups,OU=Accounts,DC=corp,DC=contoso,DC=biz
This is the result I want:
APP_GROUP_123
Set regEx = New RegExp
regEx.Pattern = "[,.*]["CN=]"
Result = regEx.Replace(StrLine, "")
I'm only able to get the regex to work when used individually, either
regEx.Pattern = ",."
or
regEx.Pattern = "CN="
The code is nested here:
Set InputFile = FSO.OpenTextFile("Temp.txt", 1)
set OutPutFile = FSO.OpenTextFile(StrInput & "-Results.txt", 8, True)
do While InputFile.AtEndOfStream = False
StrLine = InputFile.ReadLine
If inStr(strLine, TaskChoice) then
Set regEx = New RegExp
regEx.Pattern = "[A-Za-z]{2}=(.+?),.*"
Result = regEx.Replace(StrLine, "")
OutputFile.write(Replace(Result,"""","")) & vbCrLf
End if
Loop
This should get you started:
str = "CN=APP_GROUP_123,OU=Global Groups,OU=Accounts,DC=corp,DC=contoso,DC=biz"
Set re = New RegExp
re.pattern = "[A-Za-z]{2}=(.+?),.*"
if re.Test(str) then
set matches = re.Execute(str)
matched_str = "Matched: " & matches(0).SubMatches(0)
Wscript.echo matched_str
else
Wscript.echo "Not a match"
end if
Output: Matched: APP_GROUP_123
The regex you need is [A-Za-z]{2}=(.+?),.*
If the match is successful, group 1 captures whatever the parenthesized part matched. .+? matches any characters non-greedily, up to the first comma; the ? in .+? is what makes it non-greedy. If you omitted it, you would capture everything up to the last comma, the one before DC=biz.
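The difference between .+? and .+ is easy to see by running both against the sample string from the question. This sketch uses Python's re purely as a stand-in for the VBScript engine; for this pattern and input both engines should behave the same.

```python
import re

s = "CN=APP_GROUP_123,OU=Global Groups,OU=Accounts,DC=corp,DC=contoso,DC=biz"

lazy = re.search(r"[A-Za-z]{2}=(.+?),.*", s)   # stops at the first comma
greedy = re.search(r"[A-Za-z]{2}=(.+),.*", s)  # backtracks only as far as the last comma

print(lazy.group(1))    # APP_GROUP_123
print(greedy.group(1))  # APP_GROUP_123,OU=Global Groups,OU=Accounts,DC=corp,DC=contoso
```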
Your regular expression "[,.*]["CN=]" doesn't work for two reasons:
1. It contains an unescaped double quote. Double quotes inside VBScript strings must be escaped by doubling them; otherwise the interpreter reads your expression as the string "[,.*][", followed by an (invalid) variable name CN=] with no operator in between, followed by the beginning of another string (the third double quote).
2. You misunderstand regular expression syntax. Square brackets indicate a character class: [,.*] matches any single comma, period or asterisk, not a comma followed by any number of characters.
What you meant to use was an alternation, which is expressed by a pipe symbol (|), and the beginning of a string is matched by a caret (^):
regEx.Pattern = ",.*|^CN="
That said, in your case a better approach is to use a group and replace the whole string with just the group's match:
regEx.Pattern = "^cn=(.*?),.*"
regEx.IgnoreCase = True
Result = regEx.Replace(strLine, "$1")
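Both replacement approaches can be tried quickly in Python. This is a sketch rather than VBScript: Python writes the group reference as \1 instead of "$1", and re.sub replaces every match by default, which corresponds to setting Global = True on a VBScript RegExp. The count=1 line mimics the VBScript default (Global = False), where only the first match, the leading CN=, would be removed.

```python
import re

line = "CN=APP_GROUP_123,OU=Global Groups,OU=Accounts,DC=corp,DC=contoso,DC=biz"

# Alternation: delete either the leading CN= or everything from the first comma on
via_alternation = re.sub(r",.*|^CN=", "", line)

# Only the first match replaced, like a RegExp with Global = False
first_only = re.sub(r",.*|^CN=", "", line, count=1)

# Group: replace the whole line with group 1 (VBScript's "$1")
via_group = re.sub(r"^cn=(.*?),.*", r"\1", line, flags=re.IGNORECASE)

print(via_alternation)  # APP_GROUP_123
print(first_only)       # APP_GROUP_123,OU=Global Groups,OU=Accounts,DC=corp,DC=contoso,DC=biz
print(via_group)        # APP_GROUP_123
```

The group version needs only one match per line, so it does not depend on the Global setting.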

How to write a regular expression for a text field that accepts all characters except a comma (,) and does not accept whitespace at either end

How do I write a regular expression for a text field that accepts all characters except a comma (,) and does not accept whitespace at either end? I have tried
[^,][\B ]
but it doesn't work.
For example, 'product generic no' should be accepted, but not 'product,generic,no' or ' product generic no '.
I suggest a solution without regular expressions. Since you said you're using JS, the function is in JavaScript:
function isItInvalid(str) {
var last = str.length - 1;
return (last < 2 ||
str[0] == ' ' ||
str[last] == ' ' ||
str.indexOf(',') != -1);
}
EDIT: Made it a bit more readable. It also checks that the string is at least 3 characters long.
Something like the following:
/^[^\s,][^,]*[^\s,]$/
Using a Perl regular expression:
/^[^\s,][^,]*[^\s,]$/
Note that \S on its own would also accept a comma as the first or last character, hence [^\s,]. This works from 2 characters up, but fails in the edge case where the string is a single character. To cover that too:
/^(([^\s,][^,]*[^\s,])|([^\s,]))$/
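The candidate patterns can be checked against the examples from the question. This is a Python sketch, not JavaScript: re.fullmatch plays the role of the ^...$ anchors, and is_valid is a hypothetical name. It also shows that a bare \S at the ends lets a leading comma through, which is why [^\s,] is the safer end-character class.

```python
import re

# First and last characters: neither whitespace nor comma; middle: anything but comma.
# The optional group covers the single-character case.
PATTERN = re.compile(r"[^\s,](?:[^,]*[^\s,])?")

def is_valid(text: str) -> bool:
    return PATTERN.fullmatch(text) is not None

for candidate in ["product generic no", "product,generic,no", " product generic no ", "x", ",x"]:
    print(repr(candidate), is_valid(candidate))

# \S also matches ',', so this variant accepts a leading comma
print(re.fullmatch(r"\S[^,]*\S", ",x") is not None)  # True
```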

use regular expression to find and replace but only every 3 characters for DNA sequence

Is it possible to do a find/replace using regular expressions on a string of DNA such that it only considers three characters (one codon) at a time?
for example I would like the regular expression to see this:
dna="AAACCCTTTGGG"
as this:
AAA CCC TTT GGG
If I use a regular expression right now, for example Regex.Replace(dna,"ACC","AAA"), it finds a match (the ACC that straddles AAA and CCC), but when looking at three characters at a time there should be no match.
Is this possible?
Why use a regex? Try this instead, which is probably more efficient to boot:
using System;
using System.Text;

public string DnaReplaceCodon(string input, string match, string replace) {
if (match.Length != 3 || replace.Length != 3)
throw new ArgumentOutOfRangeException();
var output = new StringBuilder(input.Length);
int i = 0;
// walk the input one codon (3 characters) at a time
while (i + 2 < input.Length) {
if (input[i] == match[0] && input[i+1] == match[1] && input[i+2] == match[2]) {
output.Append(replace);
} else {
// copy the three characters of the codon unchanged
output.Append(input[i]);
output.Append(input[i+1]);
output.Append(input[i+2]);
}
i += 3;
}
// pick up trailing letters (when the length is not a multiple of 3).
while (i < input.Length) output.Append(input[i++]);
return output.ToString();
}
Solution
It is possible to do this with regex. Assuming the input is valid (contains only A, T, G, C):
Regex.Replace(input, @"\G((?:.{3})*?)" + codon, "$1" + replacement);
If the input is not guaranteed to be valid, you can just do a check with the regex ^[ATCG]*$ (allow non-multiple of 3) or ^([ATCG]{3})*$ (sequence must be multiple of 3). It doesn't make sense to operate on invalid input anyway.
Explanation
The construction above works for any codon. For the sake of explanation, let the codon be AAA. The regex will be \G((?:.{3})*?)AAA.
The whole regex actually matches the shortest substring that ends with the codon to be replaced.
\G # Must be at beginning of the string, or where last match left off
((?:.{3})*?) # Match any number of codons, lazily. The text is also captured.
AAA # The codon we want to replace
We make sure matches only start at positions whose index is a multiple of 3 by combining:
\G, which asserts that the match starts where the previous match left off (or at the beginning of the string), and
the fact that the pattern ((?:.{3})*?)AAA can only match a sequence whose length is a multiple of 3.
Due to the lazy quantifier, we can be sure that in each match, the part before the codon to be replaced (matched by ((?:.{3})*?) part) does not contain the codon.
In the replacement, we put back the part before the codon (captured in group 1 and referred to as $1), followed by the replacement codon.
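Python's standard re module does not support \G, but Pattern.match(string, pos) anchors the match at pos, which gives the same effect when it is called in a loop starting from the end of the previous match. The sketch below ports the technique under that assumption; replace_codon is a hypothetical name and, like the answers above, it assumes the input contains only A, C, G and T and the codon is three letters long.

```python
import re

def replace_codon(dna: str, codon: str, replacement: str) -> str:
    if len(codon) != 3 or len(replacement) != 3:
        raise ValueError("codon and replacement must be 3 characters")
    # Same pattern as above without \G: lazily skip whole codons, then the target codon
    pattern = re.compile(r"((?:.{3})*?)" + re.escape(codon))
    out = []
    pos = 0
    while True:
        # Pattern.match anchors at pos, playing the role of \G
        m = pattern.match(dna, pos)
        if m is None:
            break
        out.append(m.group(1) + replacement)  # the "$1" + replacement step
        pos = m.end()
    out.append(dna[pos:])  # everything after the last replaced codon
    return "".join(out)

print(replace_codon("AAACCCTTTGGG", "ACC", "AAA"))  # AAACCCTTTGGG (ACC straddles two codons)
print(replace_codon("AAACCCTTTGGG", "CCC", "AAA"))  # AAAAAATTTGGG
```

Because each search is anchored at the end of the previous match, a failed search ends the loop instead of sliding forward one character, which is exactly what keeps matches aligned to codon boundaries.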
NOTE
As explained in the comments, the following is not a good solution! I'm leaving it in so that others don't make the same mistake.
You can usually find out where a match starts and ends via m.start() and m.end(). If m.start() % 3 == 0 you found a relevant match.

Regex with Non-capturing Group

I am trying to understand Non-capturing groups in Regex.
If I have the following input:
He hit the ball. Then he ran. The crowd was cheering! How did he feel? I felt so energized!
If I want to extract the first word in each sentence, I was trying to use the match pattern:
^(\w+\b.*?)|[\.!\?]\s+(\w+)
That puts the desired output in the submatch.
Match | $1
He | He
. Then | Then
. The | The
! How | How
? I | I
But I was thinking that using non-capturing groups, I should be able to get them back in the match.
I tried:
^(?:\w+\b.*?)|(?:[\.!\?]\s+)(\w+)
and that yielded:
Match | $1
He |
. Then | Then
. The | The
! How | How
? I | I
and
^(?:\w+\b.*?)|(?:[.!\?]\s+)\w+
yielded:
Match
He
. Then
. The
! How
? I
What am I missing?
(I am testing my regex using RegExLib.com, but will then transfer it to VBA).
A simple example against string "foo":
(f)(o+)
Will yield $1 = 'f' and $2 = 'oo';
(?:f)(o+)
Here, $1 = 'oo' because you've explicitly said not to capture the first group, so there is no second capturing group.
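The same "foo" example in Python makes the key point visible: a non-capturing group only stops the text from being stored in a numbered group; it is still part of the overall match (group 0). This is a Python illustration, not VBA, but the semantics of (?:...) are the same.

```python
import re

print(re.match(r"(f)(o+)", "foo").groups())    # ('f', 'oo')
print(re.match(r"(?:f)(o+)", "foo").groups())  # ('oo',)
print(re.match(r"(?:f)(o+)", "foo").group(0))  # foo: the non-capturing part is still matched
```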
For your scenario, this feels about right:
(?:(\w+).*?[\.\?!] {2}?)
Note that the outermost group is a non-capturing group, while the inner group (the first word of the sentence) is capturing.
The following constructs a non-capturing group for the boundary condition, and captures the word after it with a capturing group.
(?:^|[.?!]\s*)(\w+)
It's not clear from your question how you are applying the regex to the text, but your usual "pull out another match until there are no more" loop should work.
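A quick Python sketch of that pattern on the sentence from the question (Python stands in for VBA here): re.findall returns only group 1, while the full matches still include the punctuation and whitespace consumed by the non-capturing group, which is the same thing observed in the question.

```python
import re

text = "He hit the ball. Then he ran. The crowd was cheering! How did he feel? I felt so energized!"
pattern = r"(?:^|[.?!]\s*)(\w+)"

# With one capturing group, findall returns just that group for each match
print(re.findall(pattern, text))  # ['He', 'Then', 'The', 'How', 'I']

# The whole matches still contain the boundary text
print([m.group(0) for m in re.finditer(pattern, text)])  # ['He', '. Then', '. The', '! How', '? I']
```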
This works on your sample and is simple (it relies on only the first word of each sentence being capitalized):
([A-Z])\w*
VBA requires these flag settings:
Global = True 'Match all occurrences not just first
IgnoreCase = False 'First word of each sentence starts with a capital letter
Here's some additional hard-earned info: since your regex has at least one set of parentheses, you can use SubMatches to pull out only the values inside the parentheses and ignore the rest, which is very useful. Here is the debug output of a function I use to get SubMatches, run on your string:
theMatches.Count=5
Match='He'
Submatch Count=1
Submatch='H'
Match='Then'
Submatch Count=1
Submatch='T'
Match='The'
Submatch Count=1
Submatch='T'
Match='How'
Submatch Count=1
Submatch='H'
Match='I'
Submatch Count=1
Submatch='I'
T
Here's the call to my function that returned the above:
sText = "He hit the ball. Then he ran. The crowd was cheering! How did he feel? I felt so energized!"
sRegEx = "([A-Z])\w*"
Debug.Print ExecuteRegexCapture(sText, sRegEx, 2, 0) '3rd match, 1st Submatch
And here's the function:
'Returns Submatch specified by the passed zero-based indices:
'iMatch is which match you want,
'iSubmatch is the index within the match of the parenthesis
'containing the desired results.
Function ExecuteRegexCapture(sStringToSearch, sRegEx, iMatch, iSubmatch)
Dim oRegex As Object, theMatches As Object
Dim i As Long, j As Long, bDebug As Boolean
Set oRegex = New RegExp 'Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
oRegex.Pattern = sRegEx
oRegex.Global = True 'True = find all matches, not just first
oRegex.IgnoreCase = False
oRegex.Multiline = True 'True = ^ and $ also match at line breaks, not only at the very start/end of the string
bDebug = True
ExecuteRegexCapture = ""
Set theMatches = oRegex.Execute(sStringToSearch)
If bDebug Then Debug.Print "theMatches.Count=" & theMatches.Count
For i = 0 To theMatches.Count - 1
If bDebug Then Debug.Print "Match='" & theMatches(i) & "'"
If bDebug Then Debug.Print " Submatch Count=" & theMatches(i).SubMatches.Count
For j = 0 To theMatches(i).SubMatches.Count - 1
If bDebug Then Debug.Print " Submatch='" & theMatches(i).SubMatches(j) & "'"
Next j
Next i
If bDebug Then Debug.Print ""
If iMatch < theMatches.Count Then
If iSubmatch < theMatches(iMatch).SubMatches.Count Then
ExecuteRegexCapture = theMatches(iMatch).SubMatches(iSubmatch)
End If
End If
End Function
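For comparison, roughly the same lookup can be sketched in Python with re.finditer. This is not a drop-in replacement for the VBA function: execute_regex_capture is a hypothetical name, the debug printing is left out, and because Python's group 0 is the whole match, VBA's zero-based SubMatches(k) corresponds to group(k + 1).

```python
import re

def execute_regex_capture(text: str, pattern: str, i_match: int, i_submatch: int) -> str:
    # Zero-based indices like the VBA version; returns "" when either index is out of range
    matches = list(re.finditer(pattern, text, re.MULTILINE))
    if i_match < len(matches):
        m = matches[i_match]
        if i_submatch < len(m.groups()):
            # An unmatched optional group gives None in Python; VBA gives an empty string
            return m.group(i_submatch + 1) or ""
    return ""

s_text = "He hit the ball. Then he ran. The crowd was cheering! How did he feel? I felt so energized!"
print(execute_regex_capture(s_text, r"([A-Z])\w*", 2, 0))  # T (3rd match 'The', 1st submatch)
```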