How to extract parameter definitions using regex? - regex

I am trying to extract parameter definitions from a Jenkins script and can't work out an appropriate regex (I'm working in Dyalog APL, which supports PCRE8).
Here's what the subject looks like:
pipeline {
    agent none
    parameters {
        string(name: 'foo', defaultValue: 'bar')
        string(name: 'goo', defaultValue: 'hoo')
    }
    stages {
        stage('action') {
            steps {
                echo "foo = ${params.foo}"
            }
        }
    }
}
I would like to get the individual parameter definitions captured in group 1 (in other words, I'm looking for a result that reports two matches: string(name: 'foo', defaultValue: 'bar') and string(name: 'goo', defaultValue: 'hoo')), but the matches are either too long or too short (depending on greediness).
My regex:
parameters\s*{(\s*\D*\(.*\)\s*)*} (with dot-matches-newline enabled)
Parameter types may vary, so my best idea was to use \D* for them (any number of non-digits). I suspect this captures more than I intended, but replacing it with \w did not help.
An alternative idea was
parameters\s*{(\s*(\w*)\(([^\)]*)\))*\s*}
which seemed more precise with respect to matching the parameter types and the contents of the parentheses, but surprisingly it returned only goo and skipped foo.
What am I missing?

Using PCRE, you can use this regex in MULTILINE mode (it extracts each parameter's name, i.e. foo and goo):
(?m)(?:^\h*parameters\h*{|(?!^)\G).*\R\h*\w+\(\w+:\h*'\K[^']+
RegEx Details:
(?m): Enable MULTILINE mode
(?:: Start non-capture group
^\h*parameters\h*{: Match a line that starts with parameters {
|: OR
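(?!^)\G: Continue at the position where the previous match ended; (?!^) prevents this branch from matching at the start of a line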
): End non-capture group
.*: Match the rest of the current line
\R: Match a line break
\h*: Match 0 or more horizontal whitespace chars
\w+: Match 1+ word chars
\(: Match (
\w+: Match 1+ word chars
:: Match a :
\h*: Match 0 or more horizontal whitespace chars
': Match a '
\K: Reset the start of the reported match, discarding everything matched so far
[^']+: Match 1+ of any char that is not ' (this is our parameter name)
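On the question's second pattern: a capturing group repeated with * keeps only the text from its last iteration, which is why group 1 ended up holding only the goo definition. If you need the full definitions rather than just the names, a common workaround is to match the parameters { ... } block once and then run a separate global match over its body. The sketch below shows that two-step idea in JavaScript (it is not Dyalog APL and not a port of the \K/\G pattern above, since JavaScript supports neither). The helper name extractParamDefinitions is hypothetical, and it assumes the block contains no } before its closing brace and no nested parentheses inside a definition.

```javascript
// Hypothetical helper: returns each full parameter definition.
// Assumptions: no '}' appears inside the parameters block before its closing
// brace, and no definition contains nested parentheses.
function extractParamDefinitions(script) {
  // Step 1: capture the body of the first "parameters { ... }" block.
  const block = script.match(/parameters\s*\{([^}]*)\}/);
  if (block === null) return [];
  // Step 2: a global match over that body yields one match per definition,
  // instead of one repeated group that only remembers its last iteration.
  return [...block[1].matchAll(/\w+\([^)]*\)/g)].map((m) => m[0]);
}

const jenkinsfile = `pipeline {
  agent none
  parameters {
    string(name: 'foo', defaultValue: 'bar')
    string(name: 'goo', defaultValue: 'hoo')
  }
  stages {
    stage('action') {
      steps {
        echo "foo = \${params.foo}"
      }
    }
  }
}`;

const definitions = extractParamDefinitions(jenkinsfile);
definitions.forEach((d) => console.log(d));
// string(name: 'foo', defaultValue: 'bar')
// string(name: 'goo', defaultValue: 'hoo')

// Just the names (what the PCRE pattern above returns),
// assuming every definition contains name: '...'.
console.log(definitions.map((d) => d.match(/name:\s*'([^']+)'/)[1]).join(", "));
// foo, goo
```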

Related

Do not match if nothing exists between optional parenthesis

I'm attempting to parse group names from /etc/security/login-access.conf. We have a mixed environment of LDAP and AD machines. AD groups are enclosed in parentheses ().
I have the following regex, which works to extract only the group name. The only problem is that there is routinely a 'null' (empty) group, and for it the regex returns a null and the ) character:
Current regex:
/(?<=\+\s:\s[#\(])(.*?)(?=[\)]?\s:)/
Sample /etc/security/login-access.conf:
+ : #ldapgroup1 : ALL
+ : #ldapgroup2 : ALL
+ : (#adgroup1) : ALL
+ : (#adgroup2) : ALL
+ : () : ALL # <---This is the problematic entry.
I'm not sure whether or how to tune the regex to ignore an entry with nothing between the parentheses. Any help is appreciated.
Since your regex engine appears to have capture groups, I would just express your pattern as:
\+ : (\(#\S+\)|#\S+) : \S+
Here I use an alternation to cleanly match either the parenthesized (AD) or the unparenthesized (LDAP) group names.
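A minimal JavaScript sketch of the alternation approach, run against the sample entries from the question (without the trailing comment). It assumes single spaces around each colon, as in the sample. Group 1 still contains the # and any parentheses, so the sketch strips them afterwards; the () entry matches neither branch and is skipped.

```javascript
// Sample entries from /etc/security/login-access.conf in the question.
const conf = [
  "+ : #ldapgroup1 : ALL",
  "+ : #ldapgroup2 : ALL",
  "+ : (#adgroup1) : ALL",
  "+ : (#adgroup2) : ALL",
  "+ : () : ALL",
].join("\n");

// Either "(#name)" or "#name"; "()" satisfies neither branch.
const entryPattern = /\+ : (\(#\S+\)|#\S+) : \S+/g;

// Strip the leading "(#" or "#" and any trailing ")" from group 1.
const groupNames = [...conf.matchAll(entryPattern)].map((m) =>
  m[1].replace(/^\(?#|\)$/g, "")
);

console.log(groupNames.join(", "));
// ldapgroup1, ldapgroup2, adgroup1, adgroup2
```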
It might not be the most efficient, and it's definitely ugly, but it works:
(?<=\+\s:\s#|\(#)([a-zA-Z0-9_-]+)(?=[\)]?\s:)
If you are using Perl (or PCRE), you can use a branch reset group:
\+\h:\h(?|#([\w-]+)|\(#([\w-]+)\))\h:
The pattern matches:
\+\h:\h Match + and a colon between horizontal whitespace chars
(?| Branch reset group
#([\w-]+) Match # and capture 1+ word chars or a hyphen in group 1
| Or
\(#([\w-]+)\) Match (#, capture 1+ word chars or hyphens in group 2 (which is reported as group 1 because of the branch reset group), then match )
)\h: Close the branch reset group, then match a horizontal whitespace char and :
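JavaScript supports neither \h nor branch reset groups, so a rough equivalent keeps two ordinary capturing groups and takes whichever one took part in the match. This is a minimal sketch: \s stands in for \h, and the helper name groupNameFrom is hypothetical.

```javascript
// Two plain groups replace the branch reset group:
// group 1 captures "#name", group 2 captures "(#name)".
const entry = /\+\s:\s(?:#([\w-]+)|\(#([\w-]+)\))\s:/;

// Hypothetical helper: returns the group name, or null when nothing matches.
function groupNameFrom(line) {
  const m = line.match(entry);
  // Only one alternative can match, so the other group is undefined.
  return m === null ? null : m[1] ?? m[2];
}

for (const line of ["+ : #ldapgroup1 : ALL", "+ : (#adgroup1) : ALL", "+ : () : ALL"]) {
  console.log(groupNameFrom(line));
}
// ldapgroup1
// adgroup1
// null
```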

Return data between the first and fifth slash

I need a regex that returns the value between the first and fifth slash in the string below, i.e. the segments from 22E6F953EA6D445C8FB20E9D29A977D7 through 36:
dataCapture/22E6F953EA6D445C8FB20E9D29A977D7/6.20.0-3c1e4b0c459eb93e43eb64fed7447a41fb4d4029/uuid_2b896c17-eb5c-4fd1-ae44-78dcda6c8ee9/36/3D1C3A58A039103375D320E524500A74
So far I've only been able to come up with a regex that returns the first segment (up to the next slash):
\/dataCapture\/(.+?)\/
How do I extend it to include the data up to the fifth slash?
It might not be the cleanest, but it gets the job done:
const regex = /dataCapture\/([a-zA-Z0-9]+\/[a-zA-Z0-9\.\-]+\/[a-zA-Z0-9\.\-\_]+\/[0-9]+)\/.*/;
const value = "dataCapture/22E6F953EA6D445C8FB20E9D29A977D7/6.20.0-3c1e4b0c459eb93e43eb64fed7447a41fb4d4029/uuid_2b896c17-eb5c-4fd1-ae44-78dcda6c8ee9/36/3D1C3A58A039103375D320E524500A74";
console.log(value.match(regex)[1]); // => 22E6F953EA6D445C8FB20E9D29A977D7/6.20.0-3c1e4b0c459eb93e43eb64fed7447a41fb4d4029/uuid_2b896c17-eb5c-4fd1-ae44-78dcda6c8ee9/36
To solve this, you can use the following pattern, which captures the four segments in groups 1 to 4:
^\/dataCapture\/(.+?)\/(.+?)\/(.+?)\/(.+?)\/
I am not familiar with JMeter, but I understand it uses a slight variant of Perl5's regex engine, so I expect matching the following regular expression will extract the string of interest.
(?<=^dataCapture\/)(?:[^\/]*\/){3}[^\/]*(?=\/)
The regex engine performs the following operations.
(?<= : begin positive lookbehind
^ : match beginning of string
dataCapture\/ : match 'dataCapture/'
) : end positive lookbehind
(?:[^\/]*\/) : match 0+ chars other than '/', followed by '/', in a non-capture group
{3} : execute the non-capture group 3 times
[^\/]* : match 0+ chars other than '/'
(?=\/) : positive lookahead asserts that the next char is '/'
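As a quick check, the same lookbehind pattern runs in JavaScript (lookbehind needs an ES2018-capable engine, such as a current Node.js). This is a sketch against the sample path from the question; the split-based line is an alternative without a regex, not part of the answer above.

```javascript
// The sample path from the question.
const path =
  "dataCapture/22E6F953EA6D445C8FB20E9D29A977D7/6.20.0-3c1e4b0c459eb93e43eb64fed7447a41fb4d4029/uuid_2b896c17-eb5c-4fd1-ae44-78dcda6c8ee9/36/3D1C3A58A039103375D320E524500A74";

// Everything after "dataCapture/" up to (not including) the fifth slash.
const viaRegex = path.match(/(?<=^dataCapture\/)(?:[^\/]*\/){3}[^\/]*(?=\/)/)[0];

// Same result without a regex: drop segment 0 and keep segments 1 to 4.
const viaSplit = path.split("/").slice(1, 5).join("/");

console.log(viaRegex);
console.log(viaRegex === viaSplit); // true
```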

How do I make a regular expression that matches text with an open parenthesis only not preceded by a space?

How do I craft a regular expression with a group that includes text with an open parenthesis not preceded by a space, but does not include an open parenthesis preceded by a space (and everything after that)?
Some examples:
Matching: "Yasmani Grandal (1B 1.84)"
Would return: "Yasmani Grandal"
Matching: "J.T. Realmuto"
Would return: "J.T. Realmuto"
Matching: "WillD. Smith(LAD)"
Would return: "WillD. Smith(LAD)"
Matching: "Adley(round/1/2019) Rutschman"
Would return: "Adley(round/1/2019) Rutschman"
Attempted solutions:
(.+)(?:\s\(.*)
This regular expression returns the "Yasmani Grandal" as group 1 when matching "Yasmani Grandal (1B 1.84)", but doesn't match "J.T. Realmuto" because the second (non-matching) group is not optional.
But if I make it optional: (.+)(?:\s\(.*)?
...then group 1 when matching "Yasmani Grandal (1B 1.84)" is "Yasmani Grandal (1B 1.84)".
You may use
^(.*?)(?:\s+\(.*\))?$
Details
^ - start of string
(.*?) - Capturing group 1: any 0 or more chars other than line break chars as few as possible
(?:\s+\(.*\))? - an optional non-capturing group matching 1 or 0 occurrences of
\s+ - 1+ whitespaces
\( - a ( char
.* - any 0 or more chars other than line break chars as many as possible
\) - a ) char
$ - end of string.
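A short JavaScript sketch that applies this pattern to the four examples from the question and prints group 1 for each:

```javascript
const players = [
  "Yasmani Grandal (1B 1.84)",
  "J.T. Realmuto",
  "WillD. Smith(LAD)",
  "Adley(round/1/2019) Rutschman",
];

// Group 1 is lazy, so a trailing " (...)" suffix, when present, is left to
// the optional group; otherwise group 1 has to expand to the whole string.
const namePattern = /^(.*?)(?:\s+\(.*\))?$/;

const names = players.map((p) => p.match(namePattern)[1]);
names.forEach((n) => console.log(n));
// Yasmani Grandal
// J.T. Realmuto
// WillD. Smith(LAD)
// Adley(round/1/2019) Rutschman
```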
You could use the following regular expression to convert matches to empty strings. (I've escaped the leading space merely for readability.)
\ +\((?!.* \)).*
The converted string is presumably what you want, so there seems to be no point in saving it to a capture group. If you need to capture the part of the string that is converted to an empty string, replace .* with (.*).
As this regex contains nothing more exotic than a negative lookahead, it should work with most regex engines.
The regex engine performs the following operations.
\ + : match 1+ spaces
\( : match '('
(?!.* \)) : use a negative lookahead to assert that the remainder of the line does not contain the string ' )'
.* : match 0+ characters other than line terminators
I've assumed you want to remove all spaces preceding the left parenthesis that is preceded by at least one space. If, for example, the string were:
Yasmani Grandal (1B 1.84)
               ^^^^^^^^^^
the part identified by the party hats would be converted to an empty string.
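The same replacement in JavaScript, as a minimal sketch; a plain space stands in for the escaped "\ " (both mean a literal space).

```javascript
const rows = [
  "Yasmani Grandal (1B 1.84)",
  "J.T. Realmuto",
  "WillD. Smith(LAD)",
  "Adley(round/1/2019) Rutschman",
];

// One or more spaces, "(", then the rest of the line, unless the rest
// contains " )" (the negative lookahead).
const trailingGroup = / +\((?!.* \)).*/;

const cleaned = rows.map((r) => r.replace(trailingGroup, ""));
cleaned.forEach((c) => console.log(c));
// Yasmani Grandal
// J.T. Realmuto
// WillD. Smith(LAD)
// Adley(round/1/2019) Rutschman
```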
Can you try this and let me know if this works?
(.+)\s\(.*
public class HelloWorld {
    public static void main(String[] args) {
        String[] names = new String[] {"Yasmani Grandal (1B 1.84)", "J.T. Realmuto", "WillD. Smith(LAD)", "Adley(round/1/2019) Rutschman"};
        for (String name : names) {
            // Greedy (.+) keeps everything before the last whitespace + "(" and drops the rest.
            System.out.println(name.replaceAll("(.+)\\s\\(.*", "$1"));
        }
    }
}
Please note I wrote a minimal expression for this. You can extend it as per your additional requirements. The above code works just fine.

Regex to extract a string whether or not a specific word is present

Hi, I'm a regex noob and I'd like to write a regex that extracts the penultimate segment of the URL if the word "xxxx" is present, or the last segment if "xxxx" is not present.
For example, I could have 2 scenarios:
www.hello.com/aaaa/1adf0023efae456
www.hello.com/aaaa/1adf0023efae456/xxxx
In both cases I want to extract the string 1adf0023efae456.
I've tried something like (?=(\w*xxxx\w*)\/.*\/(.*?)\/|[^\/]+$), but it doesn't work properly.
You can match the digits and assert that what follows is either /xxxx or the end of the string.
\d+(?=/xxxx|$)
If there should be a / before matching the digits, you could use a capturing group and get the value from group 1
/(\d+)(?=/xxxx|$)
/ Match /
(\d+) Capture group 1, match 1+ digits
(?=/xxxx|$) Positive lookahead, assert that what follows is either /xxxx or the end of the string
Edit
If there could also be alphanumeric characters instead of only digits, you could use the character class [a-z0-9]+ with an optional non-capturing group.
/([a-z0-9]+)(?:/xxxx)?$
To match any char except a whitespace char or a forward slash, use [^\s/]+
Using lookarounds, you could assert a / on the left, match 1+ alphanumerics, and assert that what follows is either /xxxx at the end of the string, or the end of a string that does not end with /xxxx
(?<=/)[a-z0-9]+(?=/xxxx$|$(?<!/xxxx))
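Both the capture-group pattern and the lookaround pattern can be tried in JavaScript, where the forward slashes have to be escaped inside a regex literal. A minimal sketch with the two URLs from the question:

```javascript
const urls = [
  "www.hello.com/aaaa/1adf0023efae456",
  "www.hello.com/aaaa/1adf0023efae456/xxxx",
];

// Capture group version: the segment ends up in group 1.
const segmentGroup = /\/([a-z0-9]+)(?:\/xxxx)?$/;
// Lookaround version: the whole match is the segment.
const segmentLookaround = /(?<=\/)[a-z0-9]+(?=\/xxxx$|$(?<!\/xxxx))/;

for (const url of urls) {
  // Prints "1adf0023efae456 1adf0023efae456" for each URL.
  console.log(url.match(segmentGroup)[1], url.match(segmentLookaround)[0]);
}
```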
You could avoid regex altogether:
using System.Linq;

string[] strings =
{
    "www.hello.com/aaaa/1adf0023efae456",
    "www.hello.com/aaaa/1adf0023efae456/xxxx"
};
// Split each URL on '/', then take the penultimate segment if the last one is "xxxx".
var x = strings.Select(s => s.Split('/'))
    .Select(arr => new { upper = arr.GetUpperBound(0), arr })
    .Select(z => z.arr[z.upper] == "xxxx" ? z.arr[z.upper - 1] : z.arr[z.upper]);
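The same split-based logic, sketched in JavaScript for comparison with the regex versions; the function name lastSegment is hypothetical.

```javascript
// Hypothetical helper mirroring the C# above: split on "/" and take the
// penultimate segment when the last one is "xxxx", else the last segment.
function lastSegment(url) {
  const parts = url.split("/");
  const last = parts[parts.length - 1];
  return last === "xxxx" ? parts[parts.length - 2] : last;
}

console.log(lastSegment("www.hello.com/aaaa/1adf0023efae456")); // 1adf0023efae456
console.log(lastSegment("www.hello.com/aaaa/1adf0023efae456/xxxx")); // 1adf0023efae456
```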

How to capture a group at the end of the line in a JS regex?

I'm trying to capture text into 3 groups. I have managed to capture 2 groups, but I'm having an issue with the 3rd one.
This is the text:
<13>Apr 5 16:09:47 node2 Services: 2016-04-05 16:09:46,914 INFO [3] Drivers.KafkaInvoker - KafkaInvoker.SendMessages - After sending itemsCount=1
I'm using the following regex:
(?=- )(.*?)(?= - )|(?=])(.*?)(?= -)
My 3rd group should be : "After sending itemsCount=1"
Any suggestions?
Your original expression is fine; it's just missing a $:
(?=- )(.*?)(?= - |$)|(?=])(.*?)(?= -)
We could also modify it slightly to something like:
(?=-\s+).*?([A-Z].*?)(?=\s+-\s+|$)|(?=]\s+).*?([A-Z].*?)(?=\s+-)
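A minimal JavaScript sketch that runs both patterns over the log line (written as a single line). Each match fills only one of the two groups, so ?? picks whichever one is set.

```javascript
const logLine =
  "<13>Apr 5 16:09:47 node2 Services: 2016-04-05 16:09:46,914 INFO [3] Drivers.KafkaInvoker - KafkaInvoker.SendMessages - After sending itemsCount=1";

// The original pattern with |$ added to the first lookahead.
const fixedPattern = /(?=- )(.*?)(?= - |$)|(?=])(.*?)(?= -)/g;
// The variant that starts each group at the first capital letter.
const capitalPattern = /(?=-\s+).*?([A-Z].*?)(?=\s+-\s+|$)|(?=]\s+).*?([A-Z].*?)(?=\s+-)/g;

// Collect whichever group took part in each match.
const collect = (pattern) => [...logLine.matchAll(pattern)].map((m) => m[1] ?? m[2]);

console.log(collect(fixedPattern));
console.log(collect(capitalPattern));
```

With fixedPattern, the third entry is "- After sending itemsCount=1": the leading - or ] stays in each match because the lookahead in front of it does not consume anything. capitalPattern starts each group at the first capital letter instead.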
You have 2 capturing groups. You don't get a match for the third part because the positive lookahead in the first alternative does not consider the end of the string. You can solve that with an alternation in the lookahead that asserts either ' - ' or the end of the string:
(?=[-\]] )(.*?)(?= - |$)
                     ^^
If those matches are OK, you can simplify the pattern by using a character class such as [-\]] to match either - or ], and drop the alternation and the capturing group, since you then only need the full matches.
Your pattern might then look like this (it also includes the leading hyphen, like the first 2 matches):
(?=[-\]] ).*?(?= - |$)
If this is your string and you want to have 3 capturing groups, you might use:
^.*?\[\d+\]([^-]+)-([^-]+)-\s*([^-]+)$
^ Start of string
.*? Match any char except a newline, non-greedy
\[\d+\] match [ 1+ digits ]
([^-]+)- Capture group 1, match 1+ times not -, then match -
([^-]+)- Capture group 2, match 1+ times not -, then match -
\s* Match 0+ whitespace chars
([^-]+) Capture group 3, match 1+ times not -
$ End of string
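A minimal JavaScript sketch of the three-group pattern on the single-line log entry. Groups 1 and 2 include the spaces around the hyphens, so the sketch trims every group; the variable names thread, className and message are illustrative labels only.

```javascript
const entry =
  "<13>Apr 5 16:09:47 node2 Services: 2016-04-05 16:09:46,914 INFO [3] Drivers.KafkaInvoker - KafkaInvoker.SendMessages - After sending itemsCount=1";

const threeGroups = /^.*?\[\d+\]([^-]+)-([^-]+)-\s*([^-]+)$/;
const logMatch = entry.match(threeGroups);

// Groups 1 and 2 keep the surrounding spaces, so trim all three.
const [thread, className, message] = logMatch.slice(1, 4).map((g) => g.trim());
console.log(thread);    // Drivers.KafkaInvoker
console.log(className); // KafkaInvoker.SendMessages
console.log(message);   // After sending itemsCount=1
```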
For example, to create the object requested in the comments, you could first collect all the matches (match[0]) in an array.
Once you have all the values, assemble the object from the keys and the values.
var output = {};
var regex = /(?=[-\]] ).*?(?= - |$)/g;
var str = `<13>Apr 5 16:09:47 node2 Services: 2016-04-05 16:09:46,914 INFO [3] Drivers.KafkaInvoker - KafkaInvoker.SendMessages - After sending itemsCount=1`;
var match;
var values = [];
var keys = ['Thread', 'Class', 'Message'];
while ((match = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (match.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    values.push(match[0]);
}
keys.forEach((key, index) => output[key] = values[index]);
console.log(output);