Select comma by comma keywords with REGEX

Hello folks, I have a line like this in my file:
> **Keywords** : test, test2, test3
I need to capture the whole list and also each keyword individually with a regex.
NOTE: There can be more than 3 elements.
Group 1 : test, test2, test3
Group 2 : test
Group 3 : test2
Group 4 : test3
I tried to write this regex, but it does not repeat for every comma :(
/^(> \*\*Keywords\*\* : ),?([\w]+)/gmi
This is the test env : https://regex101.com/r/UHLrX1/2
How can I fix this regex?

In Javascript, you may use this regex with a lookbehind assertion:
(?<=(^> \*\*Keywords\*\* : )(?:\w+, )*)(\w+)
RegEx Demo
RegEx Details:
(?<=: Start positive lookbehind
(^> \*\*Keywords\*\* : ): Match > \*\*Keywords\*\* : and capture it in group #1
(?:\w+, )*: Followed by 0 or more words, each followed by a comma and a space
): End positive lookbehind
(\w+): Match 1+ word characters in capture group #2
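
Not every engine accepts a variable-length lookbehind like the one above (Python's built-in re module, for instance, rejects it). Here is a minimal two-step sketch in Python, assuming the line format shown in the question: match the whole list first, then split it on commas. The names LINE_RE and extract_keywords are made up for illustration.

```python
import re

# Captures the whole keyword list on a "> **Keywords** : ..." line
# (line format assumed from the question).
LINE_RE = re.compile(r"^> \*\*Keywords\*\* : (.+)$", re.MULTILINE | re.IGNORECASE)

def extract_keywords(text):
    """Return a (whole_list, [keywords]) pair for every Keywords line in text."""
    results = []
    for match in LINE_RE.finditer(text):
        whole = match.group(1).strip()
        # Split on commas and drop empty pieces such as a trailing comma.
        keywords = [part.strip() for part in whole.split(",") if part.strip()]
        results.append((whole, keywords))
    return results

print(extract_keywords("> **Keywords** : test, test2, test3"))
# [('test, test2, test3', ['test', 'test2', 'test3'])]
```

This swaps the single-regex approach for match-then-split, so it works for any number of keywords without relying on lookbehind support.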

EDIT: In case you want to capture more than 3 elements as per the shown samples, you could try the following regex:
^\*\*Keywords\*\*.*?:\s+((?:(?:[^,]*),\s+){1,}(?:.*))$
Online demo for above regex
With your shown samples, please try following regex.
^\*\*Keywords\*\*.*?:\s+(([^,]*),\s+([^,]*),\s+(.*))$
Online demo for above regex
Explanation: Adding detailed explanation for above.
^\*\*Keywords\*\*.*?:\s+ ##From the start of the value, matching up to the colon followed by 1 or more spaces.
( ##Starting 1st capturing group here.
([^,]*) ##In 2nd capturing group, matching everything up to the next comma.
,\s+ ##Matching a comma followed by 1 or more spaces.
([^,]*) ##In 3rd capturing group, matching everything up to the next comma.
,\s+ ##Matching a comma followed by 1 or more spaces.
(.*) ##In 4th capturing group, matching everything up to the end of the value.
)$ ##Closing 1st capturing group at the end of the value.

Related

Regex : how to optional capture a group

I'm trying to make a substring optional.
Here is the source :
Movie TOTO S09 E22 2022 Copyright
I want to optionally capture the substring : S09 E22
What I have tried so far :
/(Movie)(.*)(S\d\d\s*E\d\d)?/gmi
The problem is that it ends up matching S09 E22 2022 Copyright as part of group 2 instead of capturing just S09 E22 :
Match 1 : 0-33 Movie TOTO S09 E22 2022 Copyright
Group 1 : 0-5 Movie
Group 2: 5-33 TOTO S09 E22 2022 Copyright
Is there any way to fix this issue?
Regards

You get that match because .* is greedy and first matches all the way to the end of the string.
Your (S\d\d\s*E\d\d)? is optional, so it can match nothing there; the overall match already succeeds and the engine has no reason to backtrack.
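
The greedy behaviour described above can be checked quickly in Python. This is only a small sketch using the pattern and the sample line from the question; the variable names are arbitrary.

```python
import re

source = "Movie TOTO S09 E22 2022 Copyright"

# Greedy (.*) consumes the rest of the line, so the optional group matches nothing.
greedy = re.search(r"(Movie)(.*)(S\d\d\s*E\d\d)?", source, re.IGNORECASE)

print(repr(greedy.group(2)))  # ' TOTO S09 E22 2022 Copyright'
print(greedy.group(3))        # None
```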
If you don't want partial matches for S09 or E22, the 4-digit year is not mandatory, and movie titles can be longer than 1 word, with PCRE you could use:
\b(Movie)\b\h+((?:(?!\h+[SE]\d+\b).)*)(?:\h(S\d+\h+E\d+))?
\b(Movie)\b Capture the word Movie
( Capture group
(?: Non capture group to repeat as a whole part
(?!\h+[SE]\d+\b). Match any character if either the S01 or E22 part is not directly to the right (where [SE] matches either a S or E char, and \h matches a horizontal whitespace char)
)* Close the non capture group and optionally repeat it
) Close capture group
(?:\h(S\d+\h+E\d+))? Optionally capture the S01 E22 part (where \d+ matches 1 or more digits)
Regex demo
Another option with a capture group for the S01 E22 part, or else match the rest of the line
\b(Movie)\h+([^S\n]*(?:S(?!\d+\h+E\d+\b)[^S\n]*)*+)(S\d+\h+E\d+)?
Regex demo
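
For readers not on PCRE, the first pattern of this answer can be adapted to Python, whose re module has no \h. The sketch below replaces \h with [ \t], which is my assumption and not part of the original answer. The names PATTERN and parse_title are hypothetical.

```python
import re

# Adaptation of the first PCRE pattern: \h is replaced by [ \t]
# because Python's re module does not support \h.
PATTERN = re.compile(
    r"\b(Movie)\b[ \t]+((?:(?![ \t]+[SE]\d+\b).)*)(?:[ \t](S\d+[ \t]+E\d+))?"
)

def parse_title(line):
    """Return (title, episode_or_None), or None when the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.group(2), match.group(3)

print(parse_title("Movie TOTO S09 E22 2022 Copyright"))  # ('TOTO', 'S09 E22')
print(parse_title("Movie TOTO 2022 Copyright"))          # ('TOTO 2022 Copyright', None)
```

When the S09 E22 part is missing, the title group simply runs to the end of the line and the episode group stays unset.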

With your shown samples and attempts please try following regex.
^Movie\s+\S+\s+(S\d{2}\s+E\d{2}(?=\s+\d{4}))
Here is the Online Demo for used regex.
Explanation: Adding detailed explanation for used regex above.
^Movie\s+\S+\s+ ##Matching string Movie from starting of value followed by spaces non-spaces and spaces.
(S\d{2}\s+E\d{2} ##Creating one and only capturing group where matching:
##S followed by 2 digits followed by spaces followed by E and 2 digits.
(?=\s+\d{4}) ##Making sure by positive lookahead that previous regex is followed by spaces and 4 digits.
) ##Closing capturing group here.

One idea is to make the dot lazy (.*?) and force it to match up to the end ($) if the other part doesn't exist.
Movie\s*(.*?)\s*(S\d\d\s*E\d\d|$)
See this demo at regex101 (I also added \s* around the captures to absorb surrounding spaces)
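
A rough Python check of the lazy pattern above, as a sketch only. The helper name split_title is made up, and turning the empty second group into None is my own choice, not part of the answer.

```python
import re

LAZY = re.compile(r"Movie\s*(.*?)\s*(S\d\d\s*E\d\d|$)")

def split_title(line):
    """Return (title, episode_or_None), or None when 'Movie' is not found."""
    match = LAZY.search(line)
    if match is None:
        return None
    # Group 2 is an empty string when the pattern had to fall back to $.
    return match.group(1), match.group(2) or None

print(split_title("Movie TOTO S09 E22 2022 Copyright"))  # ('TOTO', 'S09 E22')
print(split_title("Movie TOTO 2022 Copyright"))          # ('TOTO 2022 Copyright', None)
```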

There are several errors in your regex:
Blank space after Movie is not considered.
(.*) matches everything after Movie.
Try online at https://regex101.com/
(Movie\s*)(\w*\s*)(S\d{2}\s*E\d{2}\s*)?((?:\w*\s*)*)

Regexp_Extract - Data Studio extract value after second underscore

I have a simple string separated by underscores from which I need to pull all the values after a specific underscore using a regular expression with the REGEXP_EXTRACT formula in Google Data Studio
The strings look like this:
ABC123_DEF456_GHI789-JKL274
Basically the values after the second underscore can be alphanumeric or symbols as well.
I need to pull the values after the second underscore. In the case of the example I gave, it would be:
GHI789-JKL274
Any ideas would be greatly appreciated.

With your shown samples please try following regex.
^(?:.*?_){2}([^_]*)
OR
REGEXP_EXTRACT(yourField, "^(?:.*?_){2}([^_]*)")
Here is the Online Demo for used regex.
Explanation: Adding a detailed explanation for used regex here.
^ ##Matching from starting of the value here.
(?: ##Opening 1 non-capturing group here.
.*?_ ##Using Lazy match to match till next occurrence of _ here.
){2} ##Closing non-capturing group here and matching its 2 occurrences.
( ##Creating 1 and only capturing group here.
[^_]* ##Matching everything before _ here.
) ##Closing capturing group here.

You need to use
REGEXP_EXTRACT(some_field, "^(?:[^_]*_){2}([^_]*)")
See the regex demo.
Details:
^ - start of string
(?:[^_]*_){2} - two occurrences of any zero or more chars other than _ and then a _
([^_]*) - Capturing group #1: zero or more chars other than _.
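
If you want to sanity-check the pattern outside Data Studio, here is a small Python sketch. It only tests the regex locally and says nothing about REGEXP_EXTRACT itself; the function name after_second_underscore is hypothetical.

```python
import re

AFTER_SECOND_UNDERSCORE = re.compile(r"^(?:[^_]*_){2}([^_]*)")

def after_second_underscore(value):
    """Return the text between the second underscore and the next one (or the end)."""
    match = AFTER_SECOND_UNDERSCORE.match(value)
    return match.group(1) if match else None

print(after_second_underscore("ABC123_DEF456_GHI789-JKL274"))  # GHI789-JKL274
```

Note that [^_]* stops at a third underscore if there is one, so a value such as A_B_C_D yields only C.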

Regex pattern for matching float followed by some fixed strings

I want a regex pattern that could match the following cases:
0, 1, 0.1, .1, 1g, 0.1g, .1g, 1(g/100ml), .1(g/ml)
If the regex matches the pattern, I want to capture only the numerical part (0, 1, 0.1, ...)
I tried using the following regex, but it matches too many cases:
((?=\.\d|\d)(?:\d+)?(?:\.?\d*))|((?=\.\d|\d)(?:\d+)?(?:\.?\d*))[a-zA-Z]+?|\([^)]*\)
How to achieve above with single regex pattern?
Edit:
To make the question solution more generic
What would be a single regex that would match below
Any numerical ( 0, 1, 0.1, ...)
Any numerical followed by g, mg any characters (0.1g, .1mg, 100kg)
Any numerical followed by anything in parentheses - .1(g/100ml), 100(mg/1kg)
And just capture the numerical part

You could make the pattern a bit more specific and use a capture group for the digits and optionally match what follows or (Updated with the comment of # anubhava) add a word boundary to prevent another partial match.
(\d*\.?\d+)(?:\(g\/\d*ml\)|g?\b)
(\d*\.?\d+) Capture group 1, match optional digits, optional . and 1+ digits
(?: Non capture group for the alternation
\(g\/\d*ml\) Match (g/ optional digits and ml)
| Or
g?\b Match an optional g followed by a word boundary
) Close non capture group
Regex demo
If the values should match in the comma separated string, you can assert either a , or the end of the string to the right.
(\d*\.?\d+)(?:\(g\/\d*ml\)|g)?(?=,|$)
Regex demo
Edit
A broad pattern to match anything between parenthesis or optional chars a-zA-Z after the digits:
(\d*\.?\d+)(?:\([^()]*\)|[a-zA-Z]*\b)
(\d*\.?\d+) Capture group 1, match optional digits, optional . and 1+ digits
(?: Non capture group
\([^()]*\) Match from opening till closing parenthesis
| Or
[a-zA-Z]*\b Optionally match chars in the ranges a-zA-Z followed by a word boundary
) Close non capture group
Regex demo
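
To try the broad pattern against the samples from the question, something like this Python sketch could be used. Splitting the input into tokens first and using fullmatch on each token is my own assumption about how the values arrive; the names NUMBER_WITH_UNIT and numeric_part are made up.

```python
import re

NUMBER_WITH_UNIT = re.compile(r"(\d*\.?\d+)(?:\([^()]*\)|[a-zA-Z]*\b)")

def numeric_part(token):
    """Return the numeric prefix of a token such as '0.1g', or None if it does not fit."""
    match = NUMBER_WITH_UNIT.fullmatch(token.strip())
    return match.group(1) if match else None

samples = ["0", "1", "0.1", ".1", "1g", "0.1g", ".1mg", "100kg", "1(g/100ml)", ".1(g/ml)"]
print([numeric_part(s) for s in samples])
# ['0', '1', '0.1', '.1', '1', '0.1', '.1', '100', '1', '.1']
```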

EDIT2: With OP's edited samples (to match 0, 1, 0.1 OR 0.1g, .1mg, 100kg OR .1(g/100ml), 100(mg/1kg)), adding the following solution here. The explanation is the same as for the very first solution; the only difference is that instead of matching specific strings, I have changed the regex to match any letters.
(\d*\.?\d+)(?:[a-zA-Z]+|\([a-zA-Z]+(?:\/\d*(?:[a-zA-Z]+))?\)|(?:,\s+|$))
Online Demo for above regex
EDIT1: As per OP's comments to match .01c and 100(g/1000L) kind of examples adding following regex, which is small edit to 1st solution here.
(\d*\.?\d+)(?:g|cc|\(g(?:\/\d*(?:ml|L))?\)|(?:,\s+|$))
Online demo for above regex
With your shown samples, please try following regex here.
(\d*\.?\d+)(?:g|\(g(?:\/\d*ml)?\)|(?:,\s+|$))
Online demo for above regex
Explanation: Adding detailed explanation for above.
(\d*\.?\d+) ##Matching 0 or more digits, followed by an optional dot, followed by 1 or more digits.
(?: ##Starting a non-capturing group here.
g| ##matching only g here OR.
\(g(?:\/\d*ml)?\)| ##Matching (g) OR (g/digits ml) here OR.
(?:,\s+|$) ##Matching comma followed by 1 or more spaces occurrences OR end of value here.
) ##Closing non-capturing group here.

try this:
[\d]?\.?\d+(?:g|(?<p>\()(?(p)g\/(?:\d+)?ml\)))?
Demo

Regex to split up string of CPU usage without percentage

Is it possible to get just the first two digits, without the %, in the first group? I am using Telegraf with Grafana.
Example:
5 Secs ( 22.3463%) 60 Secs ( 25.677%) 300 Secs ( 21.3522%)
Result:
22
I found this regex in a similar topic, but it returns the wrong format for me:
^\s*\d+\s+Secs\s*\(\s*(\d+(?:\.\d+)?%)\)\s+\d+\s+Secs\s+\(\s+(\d+(?:\.\d+)?%)\)\s+\d+\s+Secs\s+\(\s+(\d+(?:\.\d+)?%)\)$

You can update your pattern to use a single capturing group by relocating the parenthesis around the digits only for the first occurrence.
You can omit the second and third capture groups as you don't need them.
^\s*\d+\s+Secs\s*\(\s*(\d+)(?:\.\d+)?%\)\s+\d+\s+Secs\s+\(\s+\d+(?:\.\d+)?%\)\s+\d+\s+Secs\s+\(\s+\d+(?:\.\d+)?%\)$
Regex demo
Or you might use a named capture group, for example digits
^\s*\d+\s+Secs\s*\(\s*(?P<digits>\d+)(?:\.\d+)?%\)\s+\d+\s+Secs\s+\(\s+\d+(?:\.\d+)?%\)\s+\d+\s+Secs\s+\(\s+\d+(?:\.\d+)?%\)$
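
A quick way to check the named-group variant is a Python sketch like the one below. Python happens to accept the same (?P<name>...) syntax; this does not show how Telegraf itself applies the pattern, and the names CPU_LINE and first_cpu_percent are hypothetical.

```python
import re

# Same pattern as above, split over three string literals for readability.
CPU_LINE = re.compile(
    r"^\s*\d+\s+Secs\s*\(\s*(?P<digits>\d+)(?:\.\d+)?%\)"
    r"\s+\d+\s+Secs\s+\(\s+\d+(?:\.\d+)?%\)"
    r"\s+\d+\s+Secs\s+\(\s+\d+(?:\.\d+)?%\)$"
)

def first_cpu_percent(line):
    """Return the integer part of the first percentage, or None if the line does not match."""
    match = CPU_LINE.match(line)
    return int(match.group("digits")) if match else None

print(first_cpu_percent("5 Secs ( 22.3463%) 60 Secs ( 25.677%) 300 Secs ( 21.3522%)"))  # 22
```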

With your shown samples, please try following regex.
^\d+\s+Secs\s+\(\s+(\d+)(?:\.\d+)?%\)(?:\s+\d+\s+Secs\s+\(\s+\d+(?:\.\d+)?%\))*
Online demo for above regex
Explanation: Adding detailed explanation for above.
^\d+\s+Secs\s+\(\s+ ##From starting of value matching digits(1 or more occurrences) followed by space(s) Secs spaces ( spaces.
(\d+) ##Creating 1st and only capturing group where we have digits in it.
(?:\.\d+)?%\) ##In a non-capturing group matching dot digits keeping it optional, followed by % and )
(?: ##Creating a non-capturing group here.
\s+\d+\s+Secs\s+\(\s+\d+ ##matching spaces digits spaces Secs spaces ( spaces digits
(?:\.\d+)? ##In a non-capturing group matching dot digits keeping it optional.
%\) ##matching % followed by ) here.
)* ##Closing very first non-capturing group, and matching its 0 or more occurrences.

If it's just the 1st occurrence you're after, wouldn't the following work?
/secs\s*\(\s*(\d+)/i

Google Forms Regular Expressions (REGEX) comma delimited (CSV)

I have a Google Form field that contains 1 or more IDs.
Patterns:
The IDs are always 6 digits.
If only one ID is entered, a comma and a space are NOT required.
If more than one ID is entered, a comma and a space are required between IDs.
If more than one ID is entered, the last ID should not have a comma or a space at the end.
Allowed Examples:
a single ID: 123456
multiple ID: 123456, 456789, 987654
Here is my current REGEX (does not work correctly)
[0-9]{6}[,\s]?([0-9]{6}[,\s])*[0-9]?
What am I doing wrong?

With your shown samples, could you please try following.
^((?:\d{6})(?:(?:,\s+\d{6}){1,})?)$
Online demo for above regex
Explanation: Adding detailed explanation of above regex.
^( ##Checking from starting of value, creating single capturing group.
(?:\d{6}) ##Checking if there are 6 digits in a non-capturing group here.
(?: ##Creating an outer non-capturing group here.
(?:,\s+\d{6}) ##In an inner non-capturing group, checking for a comma and 1 or more spaces followed by 6 digits.
{1,})? ##Repeating the inner group 1 or more times, then closing the outer non-capturing group and keeping it optional.
)$ ##Closing 1st capturing group here with $ to make sure its end of value.

You can use
^\d{6}(?:,\s\d{6})*$
^ Start of string
\d{6} Match 6 digits
(?: Non capture group to repeat as a whole
,\s\d{6} Match a , a whitespace char and 6 digits
)* Close group and optionally repeat
$ End of string
Regex demo
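
The rules from the question can be checked against this pattern with a short Python sketch. It uses fullmatch instead of the ^ and $ anchors, which is my own adjustment for Python and not something Google Forms needs. The names ID_LIST and is_valid_id_list are hypothetical.

```python
import re

# Six digits, optionally followed by ", " + six digits any number of times.
ID_LIST = re.compile(r"\d{6}(?:,\s\d{6})*")

def is_valid_id_list(value):
    """True when value is one 6-digit ID or several separated by a comma and one whitespace."""
    return ID_LIST.fullmatch(value) is not None

print(is_valid_id_list("123456"))                  # True
print(is_valid_id_list("123456, 456789, 987654"))  # True
print(is_valid_id_list("123456,"))                 # False
```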
