Select comma-separated keywords one by one with REGEX

Hello folks, I have a line like this in my file:
> **Keywords** : test, test2, test3
I need to select each keyword individually, and also the whole list, with a regex.
NOTE: There can be more than 3 keywords.
Group 1 : test, test2, test3
Group 2 : test
Group 3 : test2
Group 4 : test3
I tried to write this regex, but it doesn't repeat for all the commas :(
/^(> \*\*Keywords\*\* : ),?([\w]+)/gmi
This is the test environment: https://regex101.com/r/UHLrX1/2
How can I write such a regex?

In JavaScript, you can use this regex with a lookbehind assertion (use it with the g and m flags; it relies on JavaScript's support for variable-length lookbehind):
(?<=(^> \*\*Keywords\*\* : )(?:\w+, )*)(\w+)
RegEx Details:
(?<=: Start positive lookbehind
(^> \*\*Keywords\*\* : ): Match the literal text > **Keywords** : at the start of a line and capture it in group #1
(?:\w+, )*: Followed by 0 or more words, each followed by a comma and a space
): End positive lookbehind
(\w+): Match 1 or more word characters in capture group #2
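Python's re module only accepts fixed-width lookbehinds, so the pattern above raises re.error there. A minimal Python sketch of the same result uses a different technique: capture the whole list after the label (the question's Group 1), then pull each keyword out with findall. It assumes the line format shown in the question; the fourth keyword and the second line are hypothetical.

```python
import re

text = "> **Keywords** : test, test2, test3, test4\nSome other line"

# re.M makes ^ and $ match at line boundaries; the label is matched literally.
m = re.search(r"^> \*\*Keywords\*\* : (.*)$", text, re.M)
whole_list = m.group(1) if m else ""  # 'test, test2, test3, test4'

# Each keyword is a run of word characters; findall returns them all in order,
# however many keywords the line contains.
keywords = re.findall(r"\w+", whole_list)  # ['test', 'test2', 'test3', 'test4']
```

Splitting in a second step avoids having to express "any number of repeated groups" in one pattern, which a single regex cannot return as separate groups anyway.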

EDIT: If you want to capture more than 3 elements, as per the shown samples, you could try the following regex:
^\*\*Keywords\*\*.*?:\s+((?:(?:[^,]*),\s+){1,}(?:.*))$
With your shown samples, please try the following regex.
^\*\*Keywords\*\*.*?:\s+(([^,]*),\s+([^,]*),\s+(.*))$
Explanation: Adding a detailed explanation for the above.
^\*\*Keywords\*\*.*?:\s+ ##From the start of the value, match **Keywords**, then lazily everything up to the colon, followed by 1 or more whitespace characters.
( ##Start the 1st capturing group here.
([^,]*) ##2nd capturing group: match everything up to the next comma.
,\s+ ##Match a comma followed by 1 or more whitespace characters.
([^,]*) ##3rd capturing group: match everything up to the next comma.
,\s+ ##Match a comma followed by 1 or more whitespace characters.
(.*) ##4th capturing group: match everything up to the end of the value.
)$ ##Close the 1st capturing group, then assert the end of the value.
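A quick Python check of both patterns from this answer, assuming the line starts directly with **Keywords** as the ^ anchor in these patterns expects (no leading "> "). The fixed pattern yields the four groups the question asks for; with a hypothetical fourth keyword, group 4 absorbs the rest of the list, while the EDIT pattern returns the whole list in group 1 for two or more keywords.

```python
import re

# Exactly three keywords (groups 2-4) inside the whole-list group 1.
fixed = re.compile(r"^\*\*Keywords\*\*.*?:\s+(([^,]*),\s+([^,]*),\s+(.*))$")
# EDIT pattern: only the whole list, for two or more keywords.
edit = re.compile(r"^\*\*Keywords\*\*.*?:\s+((?:(?:[^,]*),\s+){1,}(?:.*))$")

three = "**Keywords** : test, test2, test3"        # assumed: no leading "> "
four = "**Keywords** : test, test2, test3, test4"  # hypothetical 4th keyword

groups_three = fixed.match(three).groups()
# ('test, test2, test3', 'test', 'test2', 'test3')
last_of_four = fixed.match(four).group(4)  # 'test3, test4'
whole_four = edit.match(four).group(1)     # 'test, test2, test3, test4'
```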

Related

Regex: how to optionally capture a group

I'm trying to make a substring optional.
Here is the source:
Movie TOTO S09 E22 2022 Copyright
I want to optionally capture the substring: S09 E22
What I have tried so far:
/(Movie)(.*)(S\d\d\s*E\d\d)?/gmi
The problem is that it ends up matching S09 E22 2022 Copyright instead of just S09 E22:
Match 1 : 0-33 Movie TOTO S09 E22 2022 Copyright
Group 1 : 0-5 Movie
Group 2: 5-33 TOTO S09 E22 2022 Copyright
Is there any way to fix this issue?
Regards
You get that match because the .* is greedy and will first match until the end of the string.
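Then, because your (S\d\d\s*E\d\d)? is optional, it can match the empty string at the end of the line, so the overall match succeeds there and the engine never backtracks into .* to find S09 E22.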
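A small Python check on the question's sample makes this visible: with the greedy .* the optional group ends up unmatched (None), and simply making the dot lazy is not enough either, because the optional group is then allowed to match nothing right after Movie.

```python
import re

source = "Movie TOTO S09 E22 2022 Copyright"

# re.I mirrors the i flag of the original /gmi attempt.
greedy = re.search(r"(Movie)(.*)(S\d\d\s*E\d\d)?", source, re.I)
lazy = re.search(r"(Movie)(.*?)(S\d\d\s*E\d\d)?", source, re.I)

# Greedy: .* runs to the end, and the optional group is skipped there.
greedy_groups = greedy.groups()  # ('Movie', ' TOTO S09 E22 2022 Copyright', None)

# Lazy: .*? stops as early as possible, and the optional group is skipped at once.
lazy_groups = lazy.groups()      # ('Movie', '', None)
```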
If you don't want partial matches for S09 or E22, the 4 digits for the year are not mandatory, and movie titles can be longer than 1 word, with PCRE you could use:
\b(Movie)\b\h+((?:(?!\h+[SE]\d+\b).)*)(?:\h(S\d+\h+E\d+))?
\b(Movie)\b Capture the word Movie
( Capture group
(?: Non capture group to repeat as a whole part
(?!\h+[SE]\d+\b). Match any character as long as neither the S01 nor the E22 part is directly to the right (where [SE] matches either an S or an E char, and \h matches a horizontal whitespace char)
)* Close the non capture group and optionally repeat it
) Close capture group
(?:\h(S\d+\h+E\d+))? Optionally capture the S01 E22 part (where \d+ matches 1 or more digits)
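Python's re module has no \h escape, so this sketch assumes [ \t] (space or tab) as a stand-in for horizontal whitespace. It runs the PCRE pattern above, adapted that way, on the question's sample and on two hypothetical titles.

```python
import re

# \h from the PCRE version replaced by [ \t], since Python's re has no \h.
pattern = re.compile(
    r"\b(Movie)\b[ \t]+((?:(?![ \t]+[SE]\d+\b).)*)(?:[ \t](S\d+[ \t]+E\d+))?"
)

samples = [
    "Movie TOTO S09 E22 2022 Copyright",
    "Movie The Big Show S01 E02 2020",  # hypothetical multi-word title
    "Movie TOTO 2022 Copyright",        # hypothetical line without S.. E..
]

results = [pattern.search(s).groups() for s in samples]
# [('Movie', 'TOTO', 'S09 E22'),
#  ('Movie', 'The Big Show', 'S01 E02'),
#  ('Movie', 'TOTO 2022 Copyright', None)]
```

The negative lookahead only stops the title at whitespace followed by S or E plus digits, so a word such as Show inside the title is still consumed.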
Another option, with a capture group for the S01 E22 part, or else matching the rest of the line:
\b(Movie)\h+([^S\n]*(?:S(?!\d+\h+E\d+\b)[^S\n]*)*+)(S\d+\h+E\d+)?
With your shown samples and attempts, please try the following regex.
^Movie\s+\S+\s+(S\d{2}\s+E\d{2}(?=\s+\d{4}))
Explanation: Adding a detailed explanation for the regex used above.
^Movie\s+\S+\s+ ##Match the string Movie from the start of the value, followed by whitespace, non-whitespace characters and whitespace.
(S\d{2}\s+E\d{2} ##Create the one and only capturing group, matching:
##S followed by 2 digits, followed by whitespace, followed by E and 2 digits.
(?=\s+\d{4}) ##Positive lookahead making sure the above is followed by whitespace and 4 digits.
) ##Close the capturing group here.
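A quick Python check of this pattern on the question's sample, plus two hypothetical variants, shows what it assumes: a one-word title and a 4-digit year right after the episode part. The episode() helper is hypothetical and only wraps re.search.

```python
import re

pattern = re.compile(r"^Movie\s+\S+\s+(S\d{2}\s+E\d{2}(?=\s+\d{4}))")

def episode(line):
    # Hypothetical helper: return the captured S.. E.. part, or None.
    m = pattern.search(line)
    return m.group(1) if m else None

typical = episode("Movie TOTO S09 E22 2022 Copyright")  # 'S09 E22'
no_year = episode("Movie TOTO S09 E22 Copyright")       # None: no 4-digit year follows
long_title = episode("Movie The Big Show S01 E02 2020") # None: title is more than one word
```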
One idea is to make the dot lazy (.*?) and force it to match up to the end ($) if the other part doesn't exist:
Movie\s*(.*?)\s*(S\d\d\s*E\d\d|$)
(I also added some \s* around the captures so that the surrounding whitespace stays out of them.)
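In Python, the lazy-dot-plus-$ idea gives the title in group 1 and either the episode or an empty string in group 2. The second input below is a hypothetical line without an episode part.

```python
import re

pattern = re.compile(r"Movie\s*(.*?)\s*(S\d\d\s*E\d\d|$)")

with_episode = pattern.search("Movie TOTO S09 E22 2022 Copyright").groups()
# ('TOTO', 'S09 E22')
without_episode = pattern.search("Movie TOTO 2022 Copyright").groups()
# ('TOTO 2022 Copyright', '')
```

Because the $ branch participates in the match, group 2 is an empty string rather than None when no episode is present.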
There are several errors in your regex:
The blank space after Movie is not accounted for.
(.*) is greedy and matches everything after Movie.
Try this (you can test it online at https://regex101.com/):
(Movie\s*)(\w*\s*)(S\d{2}\s*E\d{2}\s*)?((?:\w*\s*)*)

Regexp_Extract - Data Studio extract value after second underscore

I have a simple string separated by underscores, from which I need to pull all the values after a specific underscore using a regular expression with the REGEXP_EXTRACT formula in Google Data Studio.
The strings look like this:
ABC123_DEF456_GHI789-JKL274
The values after the second underscore can be alphanumeric characters or symbols.
I need to pull the values after the second underscore. In the case of the example I gave, it would be:
GHI789-JKL274
Any ideas would be greatly appreciated.
With your shown samples, please try the following regex.
^(?:.*?_){2}([^_]*)
OR
REGEXP_EXTRACT(yourField, "^(?:.*?_){2}([^_]*)")
Explanation: Adding a detailed explanation for the regex used here.
^ ##Match from the start of the value.
(?: ##Open a non-capturing group.
.*?_ ##Lazily match up to and including the next _.
){2} ##Close the non-capturing group and match it exactly 2 times.
( ##Open the one and only capturing group.
[^_]* ##Match everything up to the next _ (or the end of the value).
) ##Close the capturing group.
You need to use
REGEXP_EXTRACT(some_field, "^(?:[^_]*_){2}([^_]*)")
Details:
^ - start of string
(?:[^_]*_){2} - two occurrences of zero or more chars other than _ followed by a _
([^_]*) - Capturing group #1: zero or more chars other than _.
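Python's re is used here only to illustrate the extraction logic; Data Studio evaluates the formula itself, so results there are worth confirming. This minimal sketch uses a hypothetical extract() helper that returns the first capture group, the value the formula is meant to pull. Both answers' patterns give the same result for the sample, and [^_]* stops at a third underscore if there is one (the second input is hypothetical).

```python
import re

lazy_version = r"^(?:.*?_){2}([^_]*)"
negated_version = r"^(?:[^_]*_){2}([^_]*)"

def extract(pattern, value):
    # Hypothetical helper: first capture group of the first match, or None.
    m = re.search(pattern, value)
    return m.group(1) if m else None

sample = "ABC123_DEF456_GHI789-JKL274"
r1 = extract(lazy_version, sample)        # 'GHI789-JKL274'
r2 = extract(negated_version, sample)     # 'GHI789-JKL274'
r3 = extract(negated_version, "A_B_C_D")  # 'C': stops at the third underscore
```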

Regex pattern for matching float followed by some fixed strings

I want a regex pattern that matches the following cases:
0, 1, 0.1, .1, 1g, 0.1g, .1g, 1(g/100ml), .1(g/ml)
If the regex matches, I want to capture only the numerical part (0, 1, 0.1, ...).
I tried the following regex, but it matches too many cases:
((?=\.\d|\d)(?:\d+)?(?:\.?\d*))|((?=\.\d|\d)(?:\d+)?(?:\.?\d*))[a-zA-Z]+?|\([^)]*\)
How can I achieve the above with a single regex pattern?
Edit:
To make the solution more generic:
What single regex would match the following?
Any number (0, 1, 0.1, ...)
Any number followed by g, mg or any other characters (0.1g, .1mg, 100kg)
Any number followed by anything in parentheses: .1(g/100ml), 100(mg/1kg)
And capture just the numerical part.
You could make the pattern a bit more specific: use a capture group for the digits and optionally match what follows, or (updated with the comment of @anubhava) add a word boundary to prevent another partial match.
(\d*\.?\d+)(?:\(g\/\d*ml\)|g?\b)
(\d*\.?\d+) Capture group 1, match optional digits, optional . and 1+ digits
(?: Non capture group for the alternation
\(g\/\d*ml\) Match (g/ optional digits and ml)
| Or
g?\b Match an optional g followed by a word boundary
) Close non capture group
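A minimal Python check of this pattern on the question's samples. Since the pattern has exactly one capture group, re.findall returns only the numeric parts; the hypothetical input 1kg shows the word boundary rejecting a unit other than g.

```python
import re

pattern = re.compile(r"(\d*\.?\d+)(?:\(g\/\d*ml\)|g?\b)")
samples = "0, 1, 0.1, .1, 1g, 0.1g, .1g, 1(g/100ml), .1(g/ml)"

# With a single capture group, findall returns just group 1 of each match.
numbers = pattern.findall(samples)
# ['0', '1', '0.1', '.1', '1', '0.1', '.1', '1', '.1']

# After "1", g? matches nothing and \b fails before "k", so there is no match.
no_match = pattern.findall("1kg")  # []
```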
If the values should be matched within the comma-separated string, you can assert either a , or the end of the string to the right.
(\d*\.?\d+)(?:\(g\/\d*ml\)|g)?(?=,|$)
Edit
A broader pattern that matches either anything between parentheses or optional chars a-zA-Z after the digits:
(\d*\.?\d+)(?:\([^()]*\)|[a-zA-Z]*\b)
(\d*\.?\d+) Capture group 1, match optional digits, optional . and 1+ digits
(?: Non capture group
\([^()]*\) Match from opening till closing parenthesis
| Or
[a-zA-Z]*\b Optionally match chars in the ranges a-zA-Z followed by a word boundary
) Close non capture group
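The broader pattern can be checked the same way in Python against the edited samples from the question:

```python
import re

pattern = re.compile(r"(\d*\.?\d+)(?:\([^()]*\)|[a-zA-Z]*\b)")
samples = "0, 1, 0.1, 0.1g, .1mg, 100kg, .1(g/100ml), 100(mg/1kg)"

# Only group 1 (the number) is returned; units and parentheses are consumed.
numbers = pattern.findall(samples)
# ['0', '1', '0.1', '0.1', '.1', '100', '.1', '100']
```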
EDIT2: With OP's edited samples (to match 0, 1, 0.1 OR 0.1g, .1mg, 100kg OR .1(g/100ml), 100(mg/1kg)), here is the following solution. The explanation is the same as for the very first solution; the only difference is that instead of matching specific strings, the regex now matches any letters.
(\d*\.?\d+)(?:[a-zA-Z]+|\([a-zA-Z]+(?:\/\d*(?:[a-zA-Z]+))?\)|(?:,\s+|$))
EDIT1: As per OP's comments, to match examples like .01c and 100(g/1000L), here is the following regex, a small edit to the 1st solution.
(\d*\.?\d+)(?:g|cc|\(g(?:\/\d*(?:ml|L))?\)|(?:,\s+|$))
With your shown samples, please try the following regex.
(\d*\.?\d+)(?:g|\(g(?:\/\d*ml)?\)|(?:,\s+|$))
Explanation: Adding a detailed explanation for the above.
(\d*\.?\d+) ##Match 0 or more digits, followed by an optional ., followed by 1 or more digits.
(?: ##Start a non-capturing group here.
g| ##Match just g, OR
\(g(?:\/\d*ml)?\)| ##match (g) or (g/<digits>ml), OR
(?:,\s+|$) ##match a comma followed by 1 or more whitespace characters, or the end of the value.
) ##Close the non-capturing group here.
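A quick Python check of this first solution on the question's original samples; the hypothetical input 5mg is included to show that, as suffixes, only g and the (g) or (g/...ml) forms are accepted, while bare numbers must be followed by a comma and whitespace or the end of the value.

```python
import re

pattern = re.compile(r"(\d*\.?\d+)(?:g|\(g(?:\/\d*ml)?\)|(?:,\s+|$))")
samples = "0, 1, 0.1, .1, 1g, 0.1g, .1g, 1(g/100ml), .1(g/ml)"

numbers = pattern.findall(samples)
# ['0', '1', '0.1', '.1', '1', '0.1', '.1', '1', '.1']

# "mg" is none of the allowed suffixes, so nothing is captured.
rejected = pattern.findall("5mg")  # []
```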
Try this:
[\d]?\.?\d+(?:g|(?<p>\()(?(p)g\/(?:\d+)?ml\)))?

Regex to split up string of CPU usage without percentage

Is it possible to get just the first two digits, without the %, in the first group? I am using Telegraf with Grafana.
Example:
5 Secs ( 22.3463%) 60 Secs ( 25.677%) 300 Secs ( 21.3522%)
Result:
22
I found this regex in a similar topic, but it returns the wrong format for me:
^\s*\d+\s+Secs\s*\(\s*(\d+(?:\.\d+)?%)\)\s+\d+\s+Secs\s+\(\s+(\d+(?:\.\d+)?%)\)\s+\d+\s+Secs\s+\(\s+(\d+(?:\.\d+)?%)\)$
You can update your pattern to use a single capturing group by moving the parentheses so that they surround only the digits of the first occurrence.
You can omit the second and third capture groups as you don't need them.
^\s*\d+\s+Secs\s*\(\s*(\d+)(?:\.\d+)?%\)\s+\d+\s+Secs\s+\(\s+\d+(?:\.\d+)?%\)\s+\d+\s+Secs\s+\(\s+\d+(?:\.\d+)?%\)$
(The capturing parentheses now surround only the first \d+.)
Or you might use a named capture group, for example digits:
^\s*\d+\s+Secs\s*\(\s*(?P<digits>\d+)(?:\.\d+)?%\)\s+\d+\s+Secs\s+\(\s+\d+(?:\.\d+)?%\)\s+\d+\s+Secs\s+\(\s+\d+(?:\.\d+)?%\)$
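Python's re uses the same (?P<name>...) syntax for named groups, so the named-group version can be tried directly in Python on the sample line (split across adjacent raw strings only for readability):

```python
import re

line = "5 Secs ( 22.3463%) 60 Secs ( 25.677%) 300 Secs ( 21.3522%)"

# Adjacent raw strings are concatenated into the single pattern shown above.
pattern = re.compile(
    r"^\s*\d+\s+Secs\s*\(\s*(?P<digits>\d+)(?:\.\d+)?%\)"
    r"\s+\d+\s+Secs\s+\(\s+\d+(?:\.\d+)?%\)"
    r"\s+\d+\s+Secs\s+\(\s+\d+(?:\.\d+)?%\)$"
)

m = pattern.search(line)
digits = m.group("digits") if m else None  # '22'
```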
With your shown samples, please try the following regex.
^\d+\s+Secs\s+\(\s+(\d+)(?:\.\d+)?%\)(?:\s+\d+\s+Secs\s+\(\s+\d+(?:\.\d+)?%\))*
Explanation: Adding a detailed explanation for the above.
^\d+\s+Secs\s+\(\s+ ##From the start of the value, match 1 or more digits, whitespace, Secs, whitespace, ( and whitespace.
(\d+) ##Create the 1st and only capturing group, containing the digits.
(?:\.\d+)?%\) ##Optionally match a dot and digits (in a non-capturing group), then match % followed by ).
(?: ##Create a non-capturing group here.
\s+\d+\s+Secs\s+\(\s+\d+ ##Match whitespace, digits, whitespace, Secs, whitespace, (, whitespace and digits.
(?:\.\d+)? ##In a non-capturing group, optionally match a dot followed by digits.
%\) ##Match % followed by ) here.
)* ##Close the outer non-capturing group and match it 0 or more times.
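A small Python check of this answer's approach, written with the % outside the optional decimal group (as in the repeated part of the pattern) so that a hypothetical whole-number reading such as ( 22%) also matches. The first_digits() helper is hypothetical.

```python
import re

pattern = re.compile(
    r"^\d+\s+Secs\s+\(\s+(\d+)(?:\.\d+)?%\)"
    r"(?:\s+\d+\s+Secs\s+\(\s+\d+(?:\.\d+)?%\))*"
)

def first_digits(line):
    # Hypothetical helper: integer part of the first reading, or None.
    m = pattern.search(line)
    return m.group(1) if m else None

with_decimals = first_digits("5 Secs ( 22.3463%) 60 Secs ( 25.677%) 300 Secs ( 21.3522%)")  # '22'
whole_number = first_digits("5 Secs ( 22%) 60 Secs ( 25.677%)")  # '22'
```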
If it's just the 1st occurrence you're after, wouldn't the following work?
/secs\s*\(\s*(\d+)/i
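In Python, the same idea is a single re.search call, with re.I in place of the /i flag; re.findall with the same pattern would instead return the integer part of every reading:

```python
import re

line = "5 Secs ( 22.3463%) 60 Secs ( 25.677%) 300 Secs ( 21.3522%)"

# re.search stops at the first match; re.I mirrors the /i flag.
m = re.search(r"secs\s*\(\s*(\d+)", line, re.I)
first = m.group(1) if m else None  # '22'

# findall returns group 1 of every non-overlapping match.
all_values = re.findall(r"secs\s*\(\s*(\d+)", line, re.I)  # ['22', '25', '21']
```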

Google Forms Regular Expressions (REGEX) comma delimited (CSV)

I have a Google Forms field that contains 1 or more IDs.
Patterns:
The IDs are always 6 numbers.
If only one ID is entered, a comma and a space is NOT required.
If more than one ID is entered, a comma and a space is required.
If more than one ID is entered, the last ID should not have a comma or a space at the end.
Allowed Examples:
a single ID: 123456
multiple ID: 123456, 456789, 987654
Here is my current REGEX (it does not work correctly):
[0-9]{6}[,\s]?([0-9]{6}[,\s])*[0-9]?
What am I doing wrong?
With your shown samples, could you please try the following.
^((?:\d{6})(?:(?:,\s+\d{6}){1,})?)$
Explanation: Adding a detailed explanation of the above regex.
^( ##From the start of the value, create a single capturing group.
(?:\d{6}) ##Match exactly 6 digits in a non-capturing group.
(?: ##Open the outer non-capturing group here.
(?:,\s+\d{6}) ##Inner non-capturing group: a comma, 1 or more whitespace characters, then 6 digits.
{1,})? ##Repeat the inner group 1 or more times, then close the outer group and make it optional.
)$ ##Close the capturing group and make sure it is the end of the value.
You can use
^\d{6}(?:,\s\d{6})*$
^ Start of string
\d{6} Match 6 digits
(?: Non capture group to repeat as a whole
,\s\d{6} Match a , a whitespace char and 6 digits
)* Close group and optionally repeat
$ End of string
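A small Python check of both answers' patterns against the rules in the question, using hypothetical inputs. re.fullmatch is used so that a trailing newline cannot slip past $. Both patterns agree on the listed cases; they differ only when more than one space follows a comma, because the first answer uses \s+ and this one uses \s.

```python
import re

strict = re.compile(r"^\d{6}(?:,\s\d{6})*$")
loose = re.compile(r"^((?:\d{6})(?:(?:,\s+\d{6}){1,})?)$")

def valid(pattern, value):
    # fullmatch: the whole value must match, including any trailing newline.
    return pattern.fullmatch(value) is not None

checks = {  # hypothetical inputs -> expected result
    "123456": True,
    "123456, 456789, 987654": True,
    "123456,456789": False,    # missing space
    "123456, 456789,": False,  # trailing comma
    "12345": False,            # only 5 digits
    "1234567": False,          # 7 digits
}
strict_ok = all(valid(strict, v) == want for v, want in checks.items())  # True
loose_ok = all(valid(loose, v) == want for v, want in checks.items())    # True

two_spaces = "123456,  456789"
strict_two = valid(strict, two_spaces)  # False
loose_two = valid(loose, two_spaces)    # True
```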