Capturing two groups out of a string with a regex

I don't know anything about regular expressions and I don't really have the time to study them at the moment.
I have a string like this:
test (22/22/22)
I need to capture the test and the date 22/22/22 in an array.
The test string could also be a multi-word string:
test test(1) tes-t (22/22/22)
It should capture test test(1) tes-t and 22/22/22.
I have no idea how to get started on this. I managed to capture the date string with the parentheses by doing:
(\(.*)
but that really doesn't get me anywhere.
Could someone help me out here and provide an explanation of how I should go about capturing this? I'm kinda lost.
Thanks

To explain the given regular expression: (.*)\(([^)]+)\)
(.*) will match anything and capture it (the parentheses capture whatever their inner expression matches).
\( is an escaped parenthesis. That's what you write when you want to match a literal opening parenthesis.
[^)]+ means any character except a closing parenthesis, one or more times (inside square brackets, the ) does not need to be escaped).
([^)]+) captures what's explained above
\) matches a closing parenthesis
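Because (.*) is greedy, the engine gives characters back only until it reaches the last opening parenthesis that is followed by [^)]+\). With your second example:
test test(1) tes-t (22/22/22)
that parenthesis is the date's, so group 1 is test test(1) tes-t (including the trailing space) and group 2 is 22/22/22. The regex would capture the wrong strings, however, if another parenthesized part came after the date, as in test (22/22/22) foo(1).
I'd recommend thinking about what information you want to capture and how to separate it from the rest of your string. Once that's done, it will be much easier to build an effective regular expression.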
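One way to see how the greedy (.*) decides where group 1 ends is to run the same pattern through Python's re module. This is a minimal sketch; the helper name split_label_and_date and the third sample string are made up for illustration.

```python
import re

# Group 1: everything before the "(" that starts group 2; group 2: the text inside "(...)".
PATTERN = re.compile(r"(.*)\(([^)]+)\)")


def split_label_and_date(text):
    """Hypothetical helper: return (group 1, group 2), or None if there is no match."""
    m = PATTERN.search(text)
    return m.groups() if m else None


for sample in ["test (22/22/22)",
               "test test(1) tes-t (22/22/22)",
               "test (22/22/22) foo(1)"]:
    print(repr(sample), "->", split_label_and_date(sample))
```

Because (.*) first consumes the whole line and then gives characters back, the last "(...)" wins: the first two inputs yield the date, while the made-up third one yields ('test (22/22/22) foo', '1'). Group 1 also keeps the space before the parenthesis.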

Try this:
^(.*)\(([^)]*)\)
See it online on RegExr.
Hovering the mouse over the blue highlighted matches shows the content of the capturing groups.
Explanation
^ Beginning of the line
(.*) Capturing group 1: any character except \n, zero or more times
\( A literal (
([^)]*) Capturing group 2: any character except ), zero or more times
\) A literal )
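To get the two pieces into an array, as the question asks, something like the following Python sketch could work. The function name to_array and the strip() on group 1 (to drop the space before the parenthesis) are my own additions, not part of the answer.

```python
import re

# ^ anchors at the start; group 1 = text before "(", group 2 = text inside "(...)" (may be empty).
PATTERN = re.compile(r"^(.*)\(([^)]*)\)")


def to_array(text):
    """Hypothetical helper: return [label, date], or [] when the pattern does not match."""
    m = PATTERN.match(text)
    if m is None:
        return []
    # strip() removes the space that sits between the label and "(".
    return [m.group(1).strip(), m.group(2)]


print(to_array("test (22/22/22)"))
print(to_array("test test(1) tes-t (22/22/22)"))
```

Since [^)]* allows zero characters, an empty pair of parentheses still matches: to_array("test ()") gives ['test', '']. With [^)]+ instead, that input would not match at all.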

This pattern works on your example input:
(.*)\(([^)]+)\)

Related

Regex: a number vs. a backreference to a capture group

I've been studying regular expressions, and I'm scratching my head on this one. On this page (https://www.regular-expressions.info/conditional.html) I see that, in a conditional regex, a reference to a numbered capturing group is written as just a number. For example,
(a)?b(?(1)c|d)
How does the regex engine know that it isn't supposed to match the number "1" instead of referring back to the 1st capture group? Earlier in the lessons I had learned that a backreference is escaped, such as \1, \2, etc.
As per the regex tutorial you're following:
A special construct (?ifthen|else) allows you to create conditional regular expressions. If the if part evaluates to true, then the regex engine will attempt to match the then part. Otherwise, the else part is attempted instead. The syntax consists of a pair of parentheses. The opening bracket must be followed by a question mark, immediately followed by the if part, immediately followed by the then part. This part can be followed by a vertical bar and the else part. You may omit the else part, and the vertical bar with it.
Alternatively, you can check in the if part whether a capturing group has taken part in the match thus far. Place the number of the capturing group inside parentheses, and use that as the if part.
RegEx Demo of \b(a)?b(?(1)c|d)\b
Note that I have added word boundaries to avoid partially matching strings like abd.
Your second question is this:
What if someone actually wanted to match the literal 1 this way?
Valid input: 1c or d. Invalid input: 1d.
That would be:
\b(1)?(?(1)c|d)\b
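Python's re module also supports the (?(1)yes|no) conditional, so both patterns can be checked directly. In this sketch the sample strings are illustrative, and search() is used so that the word boundaries decide what counts as a whole match.

```python
import re

# If group 1 (an optional "a") took part in the match, require "c"; otherwise require "d".
AB_PATTERN = re.compile(r"\b(a)?b(?(1)c|d)\b")
# Same construct, but the optional group is a literal "1"; "(?(1)" still refers to group 1.
ONE_PATTERN = re.compile(r"\b(1)?(?(1)c|d)\b")


def matches(pattern, text):
    """Return True when pattern matches somewhere in text."""
    return pattern.search(text) is not None


for s in ["abc", "bd", "abd", "bc"]:
    print("AB ", s, matches(AB_PATTERN, s))
for s in ["1c", "d", "1d"]:
    print("ONE", s, matches(ONE_PATTERN, s))
```

The 1 right after "(?(" is read as a group number because of where it appears, while the 1 inside (1) is an ordinary literal character.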

RegEx: Find & Replace snake_case to UpperCamelCase/PascalCase Between Characters

I am using my IDE's Find & Replace (w/ RegEx) feature to find & replace the type parameter of arguments to go from snake_case to PascalCase (AKA UpperCamelCase). There are several files and lines throughout the project that need to be changed, and manually doing so is quite error prone and tedious (plus I am sure I am going to need the essential pattern again for future changes).
For example:
CURRENT: function find_all_by_name_and_status(_i_find_all_by_name_and_statusCriteria find_all_by_name_and_status_criteria) ...
Should be:
DESIRED: function find_all_by_name_and_status(IFindAllByNameAndStatusCriteria find_all_by_name_and_status_criteria) ...
The patterns I am using are the following:
FIND: (?<=\()_(.)(Criteria)*
REPLACE: \U$1\L
The replace pattern will work, as far as I can see, if the 1st found capture group is correct (the letter just after an "_").
The core pattern _(.) finds the correct components to replace; however, it matches other parts of the string as well. So I added a positive lookbehind (?<=\() to start at the opening parenthesis and an ending dummy capture for (Criteria)*. The entire pattern seems to make the core pattern match only once rather than repeatedly. (?R) does not seem to help either.
P.S.
It looks like (Criteria)* does not do anything either, but I figured that is the second problem to address after getting the core pattern to find all matches / repeat.
I feel like I am close to a solution, but not quite there yet. I, of course, could be VERY off base on the solution. Any help would be appreciated.
This expression,
(.*\()|(_)([a-z])([a-z]*)|(Criteria.*)
which is not really the best one, combined with a replacement similar to:
$1\U$3\L$4\E$5
might work here (the \E is only there for demonstration).
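Python's re.sub has no \U, \L or \E in replacement strings, but a replacement function can do the same case change. This minimal sketch reuses the answer's expression on the example line; treat it as a way to check the pattern, not as a substitute for the IDE feature. The names pascal_replacement and convert_line are hypothetical.

```python
import re

# The answer's three alternatives: prefix up to "(", "_" + letters, or "Criteria" and the rest.
PATTERN = re.compile(r"(.*\()|(_)([a-z])([a-z]*)|(Criteria.*)")


def pascal_replacement(m):
    """Emulate $1\\U$3\\L$4\\E$5 with a function, since re.sub has no case-change escapes."""
    if m.group(1) is not None:       # alternative 1: keep everything up to "(" unchanged
        return m.group(1)
    if m.group(5) is not None:       # alternative 3: keep "Criteria" and the rest of the line
        return m.group(5)
    return m.group(3).upper() + m.group(4)  # alternative 2: drop "_", uppercase the next letter


def convert_line(line):
    return PATTERN.sub(pascal_replacement, line)


print(convert_line("function find_all_by_name_and_status(_i_find_all_by_name_and_statusCriteria "
                   "find_all_by_name_and_status_criteria) ..."))
```

The first alternative is what protects the function name: it swallows everything up to the last "(" on the line, so on a line without "(" the second alternative would rewrite every snake_case word.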
This works in Notepad++:
Ctrl+H
Find what: (\(|\G)_(.[^\W_]*)(?=\w+Criteria)
Replace with: $1\u$2
check Match case
check Wrap around
check Regular expression
Replace all
Explanation:
(\(|\G) # group 1, opening parenthesis or restart from the last match position
_ # underscore
(.[^\W_]*) # group 2, any one character followed by 0 or more alphanumeric characters (no underscore)
(?=\w+Criteria) # positive lookahead, make sure we have 1 or more word character and Criteria
Replacement:
$1 # content of group 1
\u$2 # content of group 2 with first character uppercased
Result for given example:
function find_all_by_name_and_status(IFindAllByNameAndStatusCriteria find_all_by_name_and_status_criteria) ...
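Python's re module does not support \G, but Pattern.match(string, pos) only matches at pos, which gives the same "continue exactly where the last match ended" behaviour. The sketch below (the function name pascalize_type is hypothetical) chains the answer's group-2 pattern that way after each "(".

```python
import re

OPENING = re.compile(r"\(")
# The answer's "_" + group 2 + lookahead, without the (\(|\G) alternation.
TOKEN = re.compile(r"_(.[^\W_]*)(?=\w+Criteria)")


def pascalize_type(line):
    """Uppercase the character after each "_" in a run of tokens that starts right after "("."""
    out = []
    pos = 0
    for opening in OPENING.finditer(line):
        if opening.start() < pos:
            continue  # this "(" was already consumed by a token
        out.append(line[pos:opening.end()])
        pos = opening.end()
        # TOKEN.match(line, pos) is anchored at pos, standing in for \G.
        while (m := TOKEN.match(line, pos)) is not None:
            word = m.group(1)
            out.append(word[0].upper() + word[1:])  # like \u$2; the "_" is dropped
            pos = m.end()
    out.append(line[pos:])
    return "".join(out)


example = ("function find_all_by_name_and_status(_i_find_all_by_name_and_statusCriteria "
           "find_all_by_name_and_status_criteria) ...")
print(pascalize_type(example))
```

On the last token, the lookahead forces [^\W_]* to back off, so "status" is emitted as "Statu" and the following "sCriteria" is copied unchanged; the result still reads StatusCriteria. Lines without a trailing ...Criteria type are left alone.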

Regex substring matching on capture group

I have an advanced regex question (unless I am overthinking this).
With my basic knowledge of regex, it is trivial to match a static capture group further down in the string.
P(.): D:\1
Correctly matches
Pb: D:b
Pa: D:a
and (correctly) does not match
Pa: D:b
So far so good. However, what I need is to capture a set of [a-z]+ after the P and match any one of those characters. So these should also match:
Pabc: D:c
Pabc: D:a
Pba: D:b
Pba: D:a
but not
Pabc: D:x
Pba: D:g
I started going down the path of writing separate patterns like so (spaces added around the alternation for clarity):
P(.): D:\1 | P(.)(.): D:(\1|\2) | P(.)(.)(.): D:(\1|\2|\3)
But I cannot make even this clumsy solution work in JavaScript regex.
Is there an elegant, correct way to do this? Can it be done with JavaScript's limited engine?
The following regex will do it:
P.*(.).*: D:\1
.*(.).* will match one or more characters, capturing one of them.
If the captured character matches the character after D:, then the regex matches.
If the captured character doesn't match, backtracking will ensure that it tries again with a different captured character, until all combinations have been tried.
See regex101.com for a running example.
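The pattern relies only on greedy quantifiers, a capturing group and the \1 backreference, which JavaScript's engine supports as well, so it can be checked quickly with Python's re. The sample strings are the ones from the question.

```python
import re

# Greedy .* on both sides lets (.) be tried at every position between "P" and ": D:".
PATTERN = re.compile(r"P.*(.).*: D:\1")

SHOULD_MATCH = ["Pb: D:b", "Pa: D:a", "Pabc: D:c", "Pabc: D:a", "Pba: D:b", "Pba: D:a"]
SHOULD_NOT_MATCH = ["Pa: D:b", "Pabc: D:x", "Pba: D:g"]

for s in SHOULD_MATCH + SHOULD_NOT_MATCH:
    m = PATTERN.search(s)
    print(s, "->", "match" if m else "no match")
```

Because (.) can be any character, one possible tightening (not part of the answer) is P[a-z]*([a-z])[a-z]*: D:\1, which only considers the letters after P.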

Trying to figure out how to capture text between slashes regex

I have a regex
/([/<=][^/]*[/=?])$/g
I'm trying to capture text between the last slashes in a file path
/1/2/test/
but this regex matches "/test/" instead of just test. What am I doing wrong?
You can use lookaround assertions:
(?<=\/)[^\/]*(?=\/[^\/]*$)
or
Use the regex below and then grab the string you want from group 1:
\/([^\/]*)\/[^\/]*$
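Both patterns can be tried in Python; inside a Python string "/" needs no escaping, so \/ becomes /. This is a sketch; the second sample path is made up to show that a segment after the final slash is skipped.

```python
import re

# Lookaround: the match itself is the segment between the last two slashes.
LOOKAROUND = re.compile(r"(?<=/)[^/]*(?=/[^/]*$)")
# Capturing group: the segment is taken from group 1.
GROUPED = re.compile(r"/([^/]*)/[^/]*$")

for path in ["/1/2/test/", "/1/2/test/file.txt"]:
    print(LOOKAROUND.search(path).group(), GROUPED.search(path).group(1))
```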
The easy way
Match:
(1) every character that is not a "/", capturing what was matched by putting it inside parentheses (a capturing group),
(2) followed by "/" and then the end of the string, $.
Code:
([^/]*)/$
Get the text in group 1.
Harder to read; use it only if you want to avoid groups:
Match exactly the same as before, except now we're telling the regex engine not to consume characters when trying to match (2). This is done with a lookahead: (?= ).
Code:
[^/]*(?=/$)
Get what is returned by the match object.
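The two variants, checked in Python on the path from the question (both assume the path ends with "/"):

```python
import re

path = "/1/2/test/"

# (1) the non-slash run is captured in group 1; (2) "/" and the end of the string are consumed.
with_group = re.search(r"([^/]*)/$", path).group(1)

# (2) moved into a lookahead, so the whole match is just the segment.
with_lookahead = re.search(r"[^/]*(?=/$)", path).group()

print(with_group, with_lookahead)
```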
The issue with your code is that your opening and closing slashes are part of your capture group.
Demo
text: /1/2/test/
regex: /\/([^\/]*?)(?=\/)/g
captures a list of three: "1", "2", "test"
The language you're using affects the results. For instance, JavaScript engines that predate ES2018 do not support lookbehind. However, the regex above uses only a lookahead, so it should work as intended. In PHP, when / is the pattern delimiter, every literal / must be escaped (according to regex101.com), which is why the cleaner [/] wasn't used.
If you're only after the last match (i.e., test), you don't need the positive lookahead:
/\/([^\/]*?)\/$/
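Python has no /.../g literal; re.findall returns every match (or group 1 when the pattern has exactly one group), so it plays the role of the g flag in this sketch.

```python
import re

path = "/1/2/test/"

# Every segment: "/" + lazily matched non-slash text, stopping where the next "/" starts.
segments = re.findall(r"/([^/]*?)(?=/)", path)

# Only the last segment: the closing "/" must be the last character.
last = re.search(r"/([^/]*?)/$", path).group(1)

print(segments, last)
```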

How to use LookBehind in this case? I'm lost

That's my string:
myclass.test() and(myclass.mytest() and myclass.test("argument")) or (myclass.mytests(1))
I am trying to capture only the opening parentheses "(" that are not part of a function call.
So I started by capturing the functions (intending to negate this rule afterwards):
\w*\.\w[^(]*\(
That works: I catch only the functions. But when I tried to use the following expression, I did not succeed (why?):
(?<=(\w*\.\w[^(]*\())\(
Notes:
- myclass. never changes
- don't forget the "and("
- (?<=t)\( works fine.
Thanks :)
Temporary Solution
I will continue studying and trying to apply the "lookbehind" to this case, since it seems an interesting approach, but our friend @hwnd suggested a different approach that works in my case:
\((?=myclass)
Thank you, guys.
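A quick Python check of that lookahead on the question's string; it relies on the note that the prefix is always myclass.

```python
import re

text = 'myclass.test() and(myclass.mytest() and myclass.test("argument")) or (myclass.mytests(1))'

# Match "(" only when it is directly followed by "myclass"; the lookahead consumes nothing.
positions = [m.start() for m in re.finditer(r"\((?=myclass)", text)]
print(positions)
```

The two positions are the parentheses after "and" and "or", the same ones the Python answer further down reports.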
I am a bit confused about what is or isn't part of a function here.
To match the myclass.test( parts, you could just use:
[a-zA-Z]+\.[a-zA-Z]+\(
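Both of the following match an opening parenthesis that is not immediately followed by a closing one, which excludes empty calls such as myclass.test(). Note that they still match the parentheses of calls with arguments, such as myclass.test("argument").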
Positive Lookahead
\((?=[^)])
Regular expression:
\( '('
(?= look ahead to see if there is:
[^)] any character except: ')'
) end of look-ahead
Negative Lookahead
\((?!\))
Regular expression:
\( '('
(?! look ahead to see if there is not:
\) ')'
) end of look-ahead
You could possibly even use a Negative Lookbehind here.
(?<!\.)\((?!\))
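To see which parentheses each of these patterns actually selects in the question's string, they can be run side by side in Python. The dictionary labels are just descriptive names.

```python
import re

TEXT = 'myclass.test() and(myclass.mytest() and myclass.test("argument")) or (myclass.mytests(1))'

PATTERNS = {
    "positive lookahead": r"\((?=[^)])",
    "negative lookahead": r"\((?!\))",
    "negative lookbehind + lookahead": r"(?<!\.)\((?!\))",
}

# The myclass.xxx( prefixes, for comparison.
print(re.findall(r"[a-zA-Z]+\.[a-zA-Z]+\(", TEXT))

for name, pattern in PATTERNS.items():
    # Index of every "(" the pattern selects.
    print(name, [m.start() for m in re.finditer(pattern, TEXT)])
```

All three lookaround patterns skip the empty calls at indices 12 and 33, but they also select 52 and 85, the parentheses of myclass.test("argument") and myclass.mytests(1).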
Since you can't use variable-length lookbehind in Python, you will need to do some of the job outside the regex. One possible way is to use an optional group that captures the class.function part if it exists, followed by the opening parenthesis. Then you keep only the parentheses for which the group has no match.
In this case, we check whether the match is one character long (i.e., only the opening parenthesis) and, if so, print its span. You could also print the matching string, which would always be an opening parenthesis =D
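import re
text = 'myclass.test() and(myclass.mytest() and myclass.test("argument")) or (myclass.mytests(1))'
# Group 1 optionally captures a "class.function" prefix; the "(" after it is matched outside the group.
for result in re.finditer(r'(\w+\.\w[^(]*)?\(', text):
    # A one-character match means group 1 did not take part, i.e. a bare "(".
    if result.end() - result.start() == 1:
        print(result.span())
Result:
(18, 19)
(69, 70)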