I want a regex pattern that matches the following cases:
0, 1, 0.1, .1, 1g, 0.1g, .1g, 1(g/100ml), .1(g/ml)
If the regex matches, I want to capture only the numerical part (0, 1, 0.1, ...).
I tried the following regex, but it matches too many cases:
((?=\.\d|\d)(?:\d+)?(?:\.?\d*))|((?=\.\d|\d)(?:\d+)?(?:\.?\d*))[a-zA-Z]+?|\([^)]*\)
How can I achieve this with a single regex pattern?
Edit:
To make the question more generic:
What single regex would match the following?
Any numerical value (0, 1, 0.1, ...)
Any numerical value followed by g, mg, or any other letters (0.1g, .1mg, 100kg)
Any numerical value followed by anything in parentheses (.1(g/100ml), 100(mg/1kg))
And capture only the numerical part.
You could make the pattern a bit more specific: use a capture group for the digits, then optionally match what follows, or (updated per the comment from @anubhava) add a word boundary to prevent another partial match.
(\d*\.?\d+)(?:\(g\/\d*ml\)|g?\b)
(\d*\.?\d+) Capture group 1, match optional digits, optional . and 1+ digits
(?: Non capture group for the alternation
\(g\/\d*ml\) Match (g/ optional digits and ml)
| Or
g?\b Match an optional g followed by a word boundary
) Close non capture group
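As a quick check of this pattern, here is a minimal JavaScript sketch. It assumes the values arrive as one comma-separated string; the variable names are illustrative and not from the question. It collects capture group 1 from every match.

```javascript
// Pattern from the answer above; the g flag is added so matchAll can iterate.
const unitPattern = /(\d*\.?\d+)(?:\(g\/\d*ml\)|g?\b)/g;

// Assumption: the sample values are given as a single comma-separated string.
const samples = "0, 1, 0.1, .1, 1g, 0.1g, .1g, 1(g/100ml), .1(g/ml)";

// Keep only capture group 1 (the numerical part) of each match.
const numbers = [...samples.matchAll(unitPattern)].map(m => m[1]);

console.log(numbers);
// [ '0', '1', '0.1', '.1', '1', '0.1', '.1', '1', '.1' ]
```

Because the parenthesised unit is consumed as part of each match, the 100 inside (g/100ml) is not reported as a separate number.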
If the values should match in the comma separated string, you can assert either a , or the end of the string to the right.
(\d*\.?\d+)(?:\(g\/\d*ml\)|g)?(?=,|$)
Edit
A broader pattern that matches anything between parentheses, or optional chars a-zA-Z, after the digits:
(\d*\.?\d+)(?:\([^()]*\)|[a-zA-Z]*\b)
(\d*\.?\d+) Capture group 1, match optional digits, optional . and 1+ digits
(?: Non capture group
\([^()]*\) Match from opening till closing parenthesis
| Or
[a-zA-Z]*\b Optionally match chars in the ranges a-zA-Z followed by a word boundary
) Close non capture group
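A small JavaScript sketch of the broad pattern, applied to one value at a time. The helper name extractNumber is hypothetical, and returning null for a value without a number is an assumption made for this example.

```javascript
// Broad pattern from the answer above.
const broadPattern = /(\d*\.?\d+)(?:\([^()]*\)|[a-zA-Z]*\b)/;

// Hypothetical helper: return the numerical part of a value, or null if there is none.
function extractNumber(value) {
  const match = value.match(broadPattern);
  return match ? match[1] : null;
}

const inputs = ["0", "0.1g", ".1mg", "100kg", ".1(g/100ml)", "100(mg/1kg)", "abc"];
const extracted = inputs.map(extractNumber);

console.log(extracted);
// [ '0', '0.1', '.1', '100', '.1', '100', null ]
```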
EDIT2: With the OP's edited samples (to match 0, 1, 0.1 OR 0.1g, .1mg, 100kg OR .1(g/100ml), 100(mg/1kg)), adding the following solution here. The explanation is the same as for the very first solution; the only difference is that instead of matching specific strings, the regex now matches any letters.
(\d*\.?\d+)(?:[a-zA-Z]+|\([a-zA-Z]+(?:\/\d*(?:[a-zA-Z]+))?\)|(?:,\s+|$))
EDIT1: As per the OP's comments, to match examples like .01c and 100(g/1000L), adding the following regex, which is a small edit to the 1st solution.
(\d*\.?\d+)(?:g|cc|\(g(?:\/\d*(?:ml|L))?\)|(?:,\s+|$))
With your shown samples, please try the following regex.
(\d*\.?\d+)(?:g|\(g(?:\/\d*ml)?\)|(?:,\s+|$))
Explanation: Adding a detailed explanation for the above.
(\d*\.?\d+) ##Matching 0 or more digits, followed by an optional ., followed by 1 or more digits here.
(?: ##Starting a non-capturing group here.
g| ##matching only g here OR.
\(g(?:\/\d*ml)?\)| ##Matching (g) OR (g/digits ml) here OR.
(?:,\s+|$) ##Matching comma followed by 1 or more spaces occurrences OR end of value here.
) ##Closing non-capturing group here.
Try this:
[\d]?\.?\d+(?:g|(?<p>\()(?(p)g\/(?:\d+)?ml\)))?
I have a simple string separated by underscores from which I need to pull all the values after a specific underscore using a regular expression with the REGEXP_EXTRACT formula in Google Data Studio
The strings look like this:
ABC123_DEF456_GHI789-JKL274
Basically the values after the second underscore can be alphanumeric or symbols as well.
I need to pull the values after the second underscore. In the case of the example I gave, it would be:
GHI789-JKL274
Any ideas would be greatly appreciated.
With your shown samples please try following regex.
^(?:.*?_){2}([^_]*)
OR
REGEXP_EXTRACT(yourField, "^(?:.*?_){2}([^_]*)")
Explanation: Adding a detailed explanation for used regex here.
^ ##Matching from starting of the value here.
(?: ##Opening 1 non-capturing group here.
.*?_ ##Using Lazy match to match till next occurrence of _ here.
){2} ##Closing non-capturing group here and matching its 2 occurrences.
( ##Creating 1 and only capturing group here.
[^_]* ##Matching everything before _ here.
) ##Closing capturing group here.
You need to use
REGEXP_EXTRACT(some_field, "^(?:[^_]*_){2}([^_]*)")
Details:
^ - start of string
(?:[^_]*_){2} - two occurrences of any zero or more chars other than _ and then a _
([^_]*) - Capturing group #1: zero or more chars other than _.
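REGEXP_EXTRACT cannot be run outside Data Studio, so the following JavaScript sketch is only a stand-in. It assumes the engine behind REGEXP_EXTRACT treats these constructs the same way, and it checks both the lazy pattern and the negated-class pattern against the sample string. The helper name firstGroup is made up for the example.

```javascript
const sample = "ABC123_DEF456_GHI789-JKL274";

// Lazy-dot variant and negated-character-class variant from the answers above.
const lazyPattern = /^(?:.*?_){2}([^_]*)/;
const negatedPattern = /^(?:[^_]*_){2}([^_]*)/;

// Hypothetical helper: return capture group 1, or null when there is no match.
const firstGroup = (pattern, text) => {
  const match = text.match(pattern);
  return match ? match[1] : null;
};

const viaLazy = firstGroup(lazyPattern, sample);
const viaNegated = firstGroup(negatedPattern, sample);

console.log(viaLazy);    // GHI789-JKL274
console.log(viaNegated); // GHI789-JKL274
```

Both variants return the same group here; the negated class [^_]* simply cannot run past an underscore, so it needs no backtracking to find the second one.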
I am performing a string search where I am looking for the following three strings:
XXX-99-X
XXX-99X
XXX99-X
So far I have:
([A-Z]{3}(-?)[0-9]{2}(-?)[A-Z]{1})
How do I enforce that - has to be present at least once in either of the two possible locations?
You might use an alternation to match either a - on the left with an optional - on the right, or a - on the right only.
Note that you can omit {1} from the pattern.
^[A-Z]{3}(?:-[0-9]{2}-?|[0-9]{2}-)[A-Z]$
^ Start of string
[A-Z]{3} Match 3 chars A-Z
(?: Non capture group
-[0-9]{2}-?|[0-9]{2}- Match either - 2 digits and optional - Or 2 digits and -
) Close non capture group
[A-Z] Match a single char A-Z
$ End of string
Or use a positive lookahead to assert a - at the right
^(?=[^-\r\n]*-)[A-Z]{3}-?[0-9]{2}-?[A-Z]$
^ Start of string
(?=[^-\r\n]*-) Positive lookahead, assert a - at the right
[A-Z]{3}-? Match 3 chars A-Z and optional -
[0-9]{2}-? Match 2 digits and optional -
[A-Z] Match a single char A-Z
$ End of string
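To compare the alternation and the lookahead variants, here is a short JavaScript sketch. The fourth candidate, XXX99X without any hyphen, is added for illustration and is not one of the strings from the question.

```javascript
// The two patterns from this answer.
const alternation = /^[A-Z]{3}(?:-[0-9]{2}-?|[0-9]{2}-)[A-Z]$/;
const lookahead = /^(?=[^-\r\n]*-)[A-Z]{3}-?[0-9]{2}-?[A-Z]$/;

// The three wanted forms plus one hyphen-free string that should be rejected.
const candidates = ["XXX-99-X", "XXX-99X", "XXX99-X", "XXX99X"];

const results = candidates.map(s => ({
  input: s,
  alternation: alternation.test(s),
  lookahead: lookahead.test(s),
}));

for (const r of results) {
  console.log(r.input, r.alternation, r.lookahead);
}
// XXX-99-X true true
// XXX-99X true true
// XXX99-X true true
// XXX99X false false
```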
With your shown samples, please try following.
^[A-Z]{3}(?:-?\d{2}-|-\d{2})[A-Z]+$
Explanation: Adding detailed explanation for above.
^[A-Z]{3} ##Matching if value starts with 3 alphabets here.
(?: ##Starting a non capturing group here.
-?\d{2}- ##Matching -(optional) followed by 2 digits followed by -
|
-\d{2} ##Matching dash followed by 2 digits.
) ##Closing the non-capturing group.
[A-Z]+$ ##Matching 1 or more occurrences of capital letters at the end of value.
I have the following example of numbers, and I need to add a zero after the second period (.).
1.01.1
1.01.2
1.01.3
1.02.1
I would like them to be:
1.01.01
1.01.02
1.01.03
1.02.01
I have the following so far:
Search:
^([^.])(?:[^.]*\.){2}([^.].*)
Substitution:
0\1
but this returns:
01 only.
I need the 1.01. to be captured in a group as well, but now I'm getting confuddled.
Does anyone know what I am missing?
Thanks!!
You may try this regex replacement with 2 capture groups:
Search:
^(\d+\.\d+)\.([1-9])
Replacement:
\1.0\2
RegEx Details:
^: Start
(\d+\.\d+): Match 1+ digits + dot followed by 1+ digits in capture group #1
\.: Match a dot
([1-9]): Match digits 1-9 in capture group #2 (this is to avoid putting 0 before already existing 0)
Replacement: \1.0\2 inserts 0 just before capture group #2
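In JavaScript the same replacement is written with $1 and $2 instead of \1 and \2. A minimal sketch; the last input, 1.02.01, is an extra value added here to show that an already padded entry is left alone.

```javascript
// Search pattern from this answer.
const padPattern = /^(\d+\.\d+)\.([1-9])/;

// The four samples from the question plus one already padded value (added for illustration).
const versions = ["1.01.1", "1.01.2", "1.01.3", "1.02.1", "1.02.01"];

// "$1.0$2" is the JavaScript spelling of the replacement \1.0\2.
const padded = versions.map(v => v.replace(padPattern, "$1.0$2"));

console.log(padded);
// [ '1.01.01', '1.01.02', '1.01.03', '1.02.01', '1.02.01' ]
```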
You could try:
^([^.]*\.){2}\K
Replace with 0.
^ - Start line anchor.
([^.]*\.){2} - Negated character class 0+ times (greedy) followed by a literal dot, matched twice.
\K - Reset starting point of reported match.
EDIT:
Or, if the \K meta escape isn't supported, then see if the following works:
^((?:[^.]*\.){2})
Replace with ${1}0.
^ - Start line anchor.
( - Open 1st capture group;
(?: - Open non-capture group;
[^.]*\. - Negated character class 0+ times (greedy) followed by a literal dot.
){2} - Close non-capture group and match twice.
) - Close capture group.
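JavaScript regular expressions do not support \K, so there the capture-group fallback applies. The sketch below uses a replacer function rather than a replacement string, which is a choice made for this example to sidestep any ambiguity between group 1 followed by a 0 and a group numbered 10. The name insertZero is hypothetical.

```javascript
// Fallback pattern from this answer: capture everything up to and including the second dot.
const prefixPattern = /^((?:[^.]*\.){2})/;

// Hypothetical helper: re-emit the captured prefix and append the 0 after it.
const insertZero = s => s.replace(prefixPattern, (_match, prefix) => prefix + "0");

const fixed = ["1.01.1", "1.01.2", "1.01.3", "1.02.1"].map(insertZero);

console.log(fixed);
// [ '1.01.01', '1.01.02', '1.01.03', '1.02.01' ]
```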
Using your pattern, you can use 2 capture groups and prepend the second group with a zero in the replacement, for example \g<1>0\g<2> or ${1}0${2} or $10$2, depending on the language.
^((?:[^.]*\.){2})([^.])
^ Start of string
((?:[^.]*\.){2}) Capture group 1, match 2 times any char except a dot, then match the dot
([^.]) Capture group 2, match any char except a dot
A more specific pattern could be matching the digits
^(\d+\.\d+\.)(\d)
^ Start of string
(\d+\.\d+\.) Capture group 1, match 2 times 1+ digits and a dot
(\d) Capture group 2, match a digit
For example in JavaScript
const regex = /^(\d+\.\d+\.)(\d)/;
[
"1.01.1",
"1.01.2",
"1.01.3",
"1.02.1",
].forEach(s => console.log(s.replace(regex, "$10$2")));
There will be plenty of solutions for this, but if the pattern holds (i.e. the trailing group is always a single digit), replacing \.(\d)$ with .0\1 would suffice. To merely insert a 0, you don't need to match the whole thing, only enough context to uniquely identify the places targeted. In this case, finding all lines that end in a . followed by a single digit is enough.
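A minimal JavaScript sketch of this shorter approach, written with $1 for the group reference. The input 1.01.10 is an extra value added here to show that a two-digit trailing group is not touched.

```javascript
// Only the final dot and a single trailing digit are matched.
const trailingDigit = /\.(\d)$/;

// Samples from the question plus one two-digit ending (added for illustration).
const entries = ["1.01.1", "1.02.1", "1.01.10"];

const zeroPadded = entries.map(e => e.replace(trailingDigit, ".0$1"));

console.log(zeroPadded);
// [ '1.01.01', '1.02.01', '1.01.10' ]
```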
I am trying to create a Regex with groups that should group 1234.0500- to 1234.05-.
What I have tried is:
^([0-9]+)(\.)([1-9]*)0*(-?)$
but it does not match 1234.0500-. Here is the example https://regex101.com/r/koSZoB/1. The regex should also group
1234.0000
0.9000
to
1234
0.9
In your pattern, this part ([1-9]*)0*(-?)$ matches optional digits 1-9 followed by optional zeroes and then an optional hyphen at the end of the string. It will succeed until the first zero:
0500
^
But the match will fail as it can not match (-?)$
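The failure can be reproduced with a short JavaScript check of the pattern from the question. The variable name is illustrative.

```javascript
// Pattern from the question: group 3 only accepts the digits 1-9.
const questionPattern = /^([0-9]+)(\.)([1-9]*)0*(-?)$/;

const outcomes = ["1234.0500-", "0.9000", "1234.0000"].map(s => questionPattern.test(s));

console.log(outcomes);
// [ false, true, true ]
```

Only the value with a zero in front of a non-zero decimal digit fails, because after the leading zeroes are consumed by 0* nothing in the pattern can match the 5.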
You could use 3 capturing groups and use those in the replacement.
After group 1, you could either match a dot followed by only zeroes, which should be removed, or capture in group 2 everything from the dot up to the last digit 1-9 and drop the trailing zeroes.
^(\d+)(?:\.0+|(\.\d*[1-9])0+)(-?)$
Explanation
^ Start of string
(\d+) Capture group 1, match 1+ digits
(?: Non capture group, match either
\.0+ Match a . and 1+ zeroes
| Or
(\.\d*[1-9])0+ Capture ., 0+ digits followed by a digit 1-9 and match the following 1+ zeroes to be removed
) Close group
(-?) Capture optional -
$ End of string
There is no language tagged, but for example in JavaScript
const pattern = /^(\d+)(?:\.0+|(\.\d*[1-9])0+)(-?)$/;
[
"1234.0500-",
"1234.05500-",
"1234.0550588500-",
"1234.0000",
"0.9000",
"12.1222",
"12.1222-",
].forEach(s => console.log(s.replace(pattern, "$1$2$3")));
The third capture group, ([1-9]*), cannot match zeroes, so the 0 in 05 makes the match fail.
I would suggest allowing all digits in the third capture group and making it non-greedy by adding a ?: ^([0-9]+)(\.)([0-9]*?)0*(-?)$ This makes the group match as few digits as possible, and because the 0* that follows is greedy, it takes the trailing zeroes.
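To see what the groups hold with the non-greedy version, here is a small JavaScript sketch. The helper name groupsOf is made up for the example.

```javascript
// Non-greedy variant suggested above.
const lazyDigits = /^([0-9]+)(\.)([0-9]*?)0*(-?)$/;

// Hypothetical helper: return the four capture groups, or null when there is no match.
const groupsOf = s => {
  const match = s.match(lazyDigits);
  return match ? match.slice(1) : null;
};

const captured = ["1234.0500-", "0.9000", "1234.0000"].map(groupsOf);

console.log(captured);
// [ [ '1234', '.', '05', '-' ], [ '0', '.', '9', '' ], [ '1234', '.', '', '' ] ]
```

Note that for 1234.0000 group 2 still holds the dot while group 3 is empty, so joining the groups back together gives 1234. with a trailing dot; that case needs separate handling if the expected result is 1234.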
I need to get only the names (in bold) from the following string:
author={Trainor, Sarah F and Calef, Monika and Natcher, David and Chapin, F Stuart and McGuire, A David and Huntington, Orville and Duffy, Paul and Rupp, T Scott and DeWilde, La'Ona and Kwart, Mary and others},
Is there a way to exclude the words 'and' and 'others' from the match result?
I have tried lots of things, but nothing works as I expect:
(?<=\{).+?(?<=and\s).+(?=\})
Instead of using omission, you could be better off by implementing rules which expect a specific format in order to match the examples you've provided:
([A-Z]+[A-Za-z]*('[A-Za-z]+)*, [A-Z]? ?[A-Z]+[A-Za-z]*('[A-Za-z]+)*( [A-Z])?)
https://regex101.com/r/9LGqn3/3
You could make use of \G and a capturing group to get you the matches.
The values are in capturing group 1.
(?:author={|\G(?!^))([^\s,]+,(?:\h+[^\s,]+)+)\h+and\h+(?=[^{}]*\})
About the pattern
(?: Non capturing group
author={ Match literally
| Or
\G(?!^) Assert position at the end of previous match, not at the start
) Close non capturing group
( Capture group 1
[^\s,]+, Match not a whitespace char or comma, then match a comma
(?:\h+[^\s,]+)+ Repeat 1+ times matching 1+ horizontal whitespace chars followed by matching any char except a whitespace char and a comma
) Close group 1
\h+and\h+ Match and between 1+ horizontal whitespaces
(?=[^{}]*\}) Assert what is on the right is a closing }
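The pattern above relies on \G and \h, which JavaScript regular expressions do not support. As a different technique, not a translation of that pattern, the following JavaScript sketch pulls out the text between the braces and splits it on the word and. It assumes no name contains and as a separate word and that the placeholder is spelled exactly others.

```javascript
const entry = "author={Trainor, Sarah F and Calef, Monika and Natcher, David and Chapin, F Stuart and McGuire, A David and Huntington, Orville and Duffy, Paul and Rupp, T Scott and DeWilde, La'Ona and Kwart, Mary and others},";

// Step 1: capture everything between the braces of the author field.
const braces = entry.match(/author=\{([^{}]*)\}/);

// Step 2: split on "and" surrounded by whitespace, then drop the "others" placeholder.
const names = braces
  ? braces[1].split(/\s+and\s+/).filter(name => name !== "others")
  : [];

console.log(names.length); // 10
console.log(names[0]);     // Trainor, Sarah F
console.log(names[9]);     // Kwart, Mary
```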