Regex: Non-Capturing Group in Non-Numeric?

I am trying to test that a timestamp (let's use HH:MM:ss as an example) does not have any numeric characters surrounding it. Put differently, I want to check the characters directly before and after my timestamp: a non-numeric character does not need to exist there, but no numeric character may appear directly before or directly after. I do not want to capture this non-numeric character. How can I do this? Should I use "look-arounds" or non-capturing groups?
Fill in the blank + (2[0-3]:[0-5][0-9]:[0-5][0-9]|[0-1][0-9]:[0-5][0-9]:[0-5][0-9]) + Fill in the blank
Thanks!

I would like to check for the presence of a non-numeric character before and after my timestamp. The non-numeric character does not need to exist, but no numeric character should exist directly before nor directly after. I do not want to capture this non-numeric character.
The best way to match such a timestamp is using lookarounds:
(?<!\d)(2[0-3]:[0-5][0-9]:[0-5][0-9]|[0-1][0-9]:[0-5][0-9]:[0-5][0-9])(?!\d)
The (?<!\d) fails a match if there is a digit before the timestamp and (?!\d) fails a match if there is a digit after the timestamp.
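As a quick check of the lookaround version, here is a minimal JavaScript sketch. It assumes an engine with lookbehind support (ES2018 or later), and findTimestamps is a hypothetical helper name, not something from the question.

```javascript
// Timestamp pattern from the answer, wrapped in digit-rejecting lookarounds.
// Assumption: the engine supports lookbehind (ES2018+).
const TIMESTAMP = /(?<!\d)(2[0-3]:[0-5][0-9]:[0-5][0-9]|[0-1][0-9]:[0-5][0-9]:[0-5][0-9])(?!\d)/g;

// Hypothetical helper: returns every timestamp that has no digit
// directly before or after it.
function findTimestamps(text) {
  return Array.from(text.matchAll(TIMESTAMP), (m) => m[1]);
}

console.log(findTimestamps("Some23:56:43text here 10/30/89T11:19:00am"));
// [ '23:56:43', '11:19:00' ]
console.log(findTimestamps("123:56:43"));  // [] (digit directly before)
console.log(findTimestamps("23:56:430"));  // [] (digit directly after)
```

The lookarounds are zero-width, so the neighbouring characters are only inspected and never become part of the match.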
If you use
\D*(2[0-3]:[0-5][0-9]:[0-5][0-9]|[0-1][0-9]:[0-5][0-9]:[0-5][0-9])\D*
(note that (?:...) non-capturing groups do not change this: the patterns inside them still match and consume characters), the characters around the timestamp become part of the match, so you won't get overlapping matches (for example, when one timestamp follows right after another). However, I believe this is a rare scenario, so you can still use your regex and grab the value inside capture group 1.
Also, see my answer on How Negative Lookahead Works. A negative lookbehind works similarly, but with the text before the matching (consuming) pattern.
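In JavaScript engines without lookbehind support, a solution is to consume the preceding character with (?:^|\D) and read the timestamp from capturing group 1:
var re = /(?:^|\D)(2[0-3]:[0-5][0-9]:[0-5][0-9]|[0-1][0-9]:[0-5][0-9]:[0-5][0-9])(?=\D|$)/g;
var text = "Some23:56:43text here Some13:26:45text there and here is a date 10/30/89T11:19:00am";
var m;
// Print each timestamp (capture group 1) on its own line
while ((m = re.exec(text)) !== null) {
  document.body.innerHTML += m[1] + "<br/>";
}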

The regex class for "anything that is not numeric" is:
\D
This is equivalent to:
[^\d]
So you would use:
\D*(2[0-3]:[0-5][0-9]:[0-5][0-9]|[0-1][0-9]:[0-5][0-9]:[0-5][0-9])\D*
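You don't need to surround it with a non-capturing group (?:). Note that \D* also matches zero characters, so on its own this pattern does not reject a digit directly before or after the timestamp.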
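To see how the consuming \D* pattern and the lookaround pattern differ when a digit touches the timestamp, here is a small JavaScript comparison. It is only a sketch: firstTimestamp is a hypothetical helper, and the lookaround variant assumes lookbehind support (ES2018 or later).

```javascript
// Shared timestamp body, taken from the patterns above.
const T = "(2[0-3]:[0-5][0-9]:[0-5][0-9]|[0-1][0-9]:[0-5][0-9]:[0-5][0-9])";

// \D* may match zero characters, so it does not forbid adjacent digits.
const consuming = new RegExp("\\D*" + T + "\\D*");
// Zero-width assertions: fail when a digit touches the timestamp.
const lookaround = new RegExp("(?<!\\d)" + T + "(?!\\d)");

// Hypothetical helper: capture group 1 of the first match, or null.
function firstTimestamp(re, text) {
  const m = re.exec(text);
  return m ? m[1] : null;
}

console.log(firstTimestamp(consuming, "ab12:34:56cd"));   // 12:34:56
console.log(firstTimestamp(lookaround, "ab12:34:56cd"));  // 12:34:56
console.log(firstTimestamp(consuming, "912:34:567"));     // 12:34:56
console.log(firstTimestamp(lookaround, "912:34:567"));    // null
```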

Related

Matching comma after certain phrase

I'm using Atom's regex search and replace feature and not JavaScript code.
I thought this JavaScript-compatible regex would work (I want to match the commas that come right after Or rather):
(?!\b(Or rather)\b),
?! = Negative lookahead
\b = word boundary
(...) = search the words as a whole not character by character
\b = word boundary
, = the actual character.
However, if I remove characters from "Or rather" the regex still matches. I'm confused.
https://regexr.com/4keju

You probably meant to use a positive lookbehind instead of a negative lookahead:
(?<=\b(Or rather)\b),
Regex Demo
You can activate lookbehind in Atom using flags; see the linked thread.

The (?!\b(Or rather)\b), pattern is equivalent to a plain , because the negative lookahead is checked at the position of the comma and always succeeds there: the next character is , and not O.
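This equivalence is easy to check in JavaScript. The sketch below uses a made-up sample string and a hypothetical matchPositions helper; both patterns should report exactly the same comma positions.

```javascript
// The pattern from the question and a plain comma, for comparison.
const withLookahead = /(?!\b(Or rather)\b),/g;
const plainComma = /,/g;

// Hypothetical helper: start index of every match.
function matchPositions(re, text) {
  return Array.from(text.matchAll(re), (m) => m.index);
}

// Made-up sample: one comma after "Or rather", two after other words.
const sample = "Or rather, an image. Well, maybe. Or rathe, not quite.";

// Both calls list the same three comma positions.
console.log(matchPositions(withLookahead, sample));
console.log(matchPositions(plainComma, sample));
```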
To remove commas after Or rather in Atom, use
Find What: \b(Or rather),
Replace With: $1
Make sure you select the .* option to enable regular expressions (the Aa option toggles case sensitivity).
\b(Or rather), matches
\b - a word boundary
(Or rather) - Capturing group 1 that matches and saves the Or rather text in a memory buffer that can be accessed using $1 in the replacement pattern
, - a comma.
JS regex demo:
var s = "Or rather, an image.\nor rather, an image.\nor rather, friends.\nor rather, an image---\nOr rather, another time they.";
console.log(s.replace(/\b(Or rather),/g, '$1'));
// Case insensitive:
console.log(s.replace(/\b(Or rather),/gi, '$1'));

To match any comma after "Or rather" you can simply use
(or rather)(,) and access the comma as the second group using match[2].
An alternative is to make or rather a non-capturing group:
(?:or rather)(,) so that the first group is the comma after "Or rather".
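The difference in group numbering can be sketched in JavaScript as follows. The i flag is my own assumption, added so that the lowercase pattern also matches "Or rather" in the sample line.

```javascript
// Capturing version: the phrase is group 1, the comma is group 2.
const capturing = /(or rather)(,)/i;
// Non-capturing version: the phrase is not numbered, the comma is group 1.
const nonCapturing = /(?:or rather)(,)/i;

const line = "Or rather, an image.";
const a = line.match(capturing);
const b = line.match(nonCapturing);

console.log(a[1]);      // Or rather
console.log(a[2]);      // ,
console.log(b[1]);      // ,
console.log(b.length);  // 2 (full match plus one group)
```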

Extract a substring from value of key-value pair using regex

I have a string in log and I want to mask values based on regex.
For example:
"email":"testEmail#test.com", "phone":"1111111111", "text":"sample text may contain email testEmail#test.com as well"
The regex should mask
email value - both inside the string after "email" and "text"
phone number
Desired output:
"email":"*****", "phone":"*****", "text":"sample text may contain email ***** as well"
What I have been able to do is to mask email and phone individually but not the email id present inside the string after "text".
Regex developed so far:
(?<=\"(?:email|phone)\"[:])(\")([^\"]*)(\")
https://regex101.com/r/UvDIjI/2/

In the first part you are not matching an email address as such; you are matching anything that is not a double quote. You could match the email address inside the text in a similar way, with a negated character class that also excludes the double quote.
One way to do this could be to get the matches using lookarounds and an alternation. Then replace the matches with *****
Note that you don't have to escape the double quote, and the colon can be written without a character class.
(?<="(?:phone|email)":")[^"]+(?=")|[^#"\s]+#[^#"\s]+
Explanation
(?<="(?:phone|email)":") Assert what is on the left is either "phone":" or "email":"
[^"]+(?=") Match not a double quote and make sure that there is one at the end
| Or
[^#"\s]+#[^#"\s]+ Match an email like pattern by making use of a negated character class matching not a double quote or #
See the regex demo
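Applied in JavaScript, the replacement could look like the sketch below. maskLog is a hypothetical helper name, and the pattern needs lookbehind support (ES2018 or later).

```javascript
// Pattern from the answer: the value after "phone":" or "email":",
// or an email-like token anywhere else in the line.
const MASK_RE = /(?<="(?:phone|email)":")[^"]+(?=")|[^#"\s]+#[^#"\s]+/g;

// Hypothetical helper: replaces every match with a fixed mask.
function maskLog(line) {
  return line.replace(MASK_RE, "*****");
}

const log = '"email":"testEmail#test.com", "phone":"1111111111", "text":"sample text may contain email testEmail#test.com as well"';
console.log(maskLog(log));
// "email":"*****", "phone":"*****", "text":"sample text may contain email ***** as well"
```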

Your current RegEx is trying to accomplish too much in a single take. You'd be better off splitting the conditions and dealing with them separately. I'll assume that the input will always follow the structure of your example, no edge cases:
Emails:
\w+#.+?(?="|\s) - In the emails of your example, every character before the # is a word character, so \w+# is enough to match the first half of the email. As for the second half, I used a wildcard (.) with a lazy quantifier (+?) to stop the match as soon as possible and combined it with a positive lookahead that checks for a double quote or whitespace ((?="|\s)), so that it matches the emails inside both the "email" and "text" properties. Lookarounds are zero-length assertions and thus they are not part of the match.
Phone number:
(?<="phone":")\d+ - Here I just use the prefix "phone":" in a lookbehind and then capture only digits \d+.
Combine both conditions and you have your RegEx: \w+#.+?(?="|\s)|(?<="phone":")\d+.
Regex101: https://regex101.com/r/UvDIjI/3

Meta Sequence Word Boundary \b & Alternation |
The targets in the input string are wrapped in either quotes or spaces, both of which are non-word characters. So both "\bemailPattern\b" and space\bemailPattern\bspace match. The alternation lets one pattern do the work of two: search for emailPattern OR phonePattern.
/(\b\w+?#\w+?\.\w+?\b|[0-9]{10})/g;
Open capture group (
Word boundary (a non-word character on the left) \b
One or more word characters, lazy \w+?
Literal #
One or more word characters, lazy \w+?
Escaped literal . \.
One or more word characters, lazy \w+?
Word boundary (a non-word character on the right) \b
OR |
10 consecutive digits [0-9]{10}
Close capture group )
The global flag g continues the search after the first match.
Demo
let str = `"email":"testEmail#test.com", "phone":"1111111111", "text":"sample text may contain email testEmail#test.com as well"`;
const rgx = /(\b\w+?#\w+?\.\w+?\b|[0-9]{10})/g;
let res = str.replace(rgx, '*****');
console.log(res);

regex with match in GREL/OpenRefine

I'm using OpenRefine to parse a column with string values.
I want to find the cells that contain either: offer or discount.
The string value is usually a sentence.
My code below, which uses the match function, is not working.
Using value.contains() is limited to searching for one word only.
value.match(/.*(offer)|(discount)/)

What I can see in the documentation is that the .match function attempts to match the string s in its entirety against the regex pattern p and returns an array of capture groups. The pattern therefore has to account for the whole cell value, not just the word itself.
To match either one of them but not both, you might use a negative lookahead, if that is supported, together with an alternation: one branch makes sure that offer is present and discount is not, and the other branch does the opposite:
(?:(?!.*\bdiscount\b).*\boffer\b.*|(?!.*\boffer).*\bdiscount\b.*)
Regex demo
That will match
(?: Non capturing group
(?!.*\bdiscount\b).*\boffer\b.* Assert that on the right is no discount and match any char and offer
| Or
(?!.*\boffer).*\bdiscount\b.* Or assert the opposite
) Close non capturing group
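The pattern can be tried outside OpenRefine as well. The sketch below is JavaScript, not GREL: the ^ and $ anchors are my own addition to imitate the whole-string matching described above, and hasExactlyOne is a hypothetical helper name.

```javascript
// Pattern from the answer, anchored to imitate whole-string matching.
const eitherNotBoth = /^(?:(?!.*\bdiscount\b).*\boffer\b.*|(?!.*\boffer).*\bdiscount\b.*)$/;

// Hypothetical helper: true when exactly one of the two words occurs.
function hasExactlyOne(sentence) {
  return eitherNotBoth.test(sentence);
}

console.log(hasExactlyOne("Special offer today"));       // true
console.log(hasExactlyOne("Get a 10% discount"));        // true
console.log(hasExactlyOne("An offer with a discount"));  // false
console.log(hasExactlyOne("Nothing special"));           // false
```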

regex - match only quotes surrounding numeric values

So I use a lot of regex to format SQL.
I'm trying to match all quotes surrounding numeric values (INT) so I can remove them.
I use this to match numerics in quotes:
(?<=(['"])\b)(?:(?!\1|\\)[0-9]|\\.)*(?=\1)
Playing with this so far but no luck yet:
'(?=[0-9](?='))*
What I'm trying to express is: look ahead as far as needed, accepting only digits, and match the opening quote only if the next non-digit character is the closing quote.
Can any regex ninjas out there help put me on the right path?
Here's an example string:
'2018-12-09 07:29:00.0000000', 'US', 'MI', 'Detroit', '48206', '505', '68.61.112.245', '0', 'Verizon'
I just want to match the ' around 48206, 505, and 0 so I can strip them.
To be safe, let's assume there are other characters as well that could appear in the test string, i.e. it's not really feasible to say "just match anything that's not a dash, a letter or a dot", etc. Also, the question is language-agnostic, so any applicable language is fine: JavaScript, Python, Java, etc.

You can select all such numbers using this regex,
'(\d+)'
And then replace it with \1 or $1 as per your language.
Demo
This will get rid of all quotes that are surrounding numbers.
Let me know if this works for you.
Also, as an alternative solution, if your regex engine supports ECMAScript 2018, then you can exploit variable length look behind and use this regex to select only quotes that surround a number,
'(?=\d+')|(?<='\d+)'
And replace it with empty string.
Demo
Make sure you check this demo in a browser that supports lookbehind, such as Chrome; at the time of writing, Firefox did not support it.
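For reference, here is how the lookaround variant might be used in JavaScript. This is a sketch that assumes lookbehind support (ES2018 or later); stripNumericQuotes is a hypothetical helper name.

```javascript
// Matches only the quotes themselves: an opening quote followed by
// digits and a quote, or a closing quote preceded by a quote and digits.
const QUOTES_AROUND_NUMBER = /'(?=\d+')|(?<='\d+)'/g;

// Hypothetical helper: removes the quotes around purely numeric values.
function stripNumericQuotes(sql) {
  return sql.replace(QUOTES_AROUND_NUMBER, "");
}

const row = "'2018-12-09 07:29:00.0000000', 'US', 'MI', 'Detroit', '48206', '505', '68.61.112.245', '0', 'Verizon'";
console.log(stripNumericQuotes(row));
// '2018-12-09 07:29:00.0000000', 'US', 'MI', 'Detroit', 48206, 505, '68.61.112.245', 0, 'Verizon'
```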

.split().join() Chain
.split() accepts a RegEx such as this:
/'\b([0-9]+?)\b'/
Literal single straight quote: '
Word boundary marking the beginning of a word/number: \b
Capture group containing a character class that matches any digit: ([0-9]
One or more times, lazily, then close the group: +?)
Word boundary followed by a literal single straight quote: \b'
Since .split() splits at every match in the string, a global flag isn't needed. Because the digits are inside a capture group, .split() keeps them in the resulting array while the quotes are dropped. .join('') is chained to .split() to turn the array back into a string.
Demo
var strA = `'2018-12-09 07:29:00.0000000', 'US', 'MI', 'Detroit', '48206', '505', '68.61.112.245', '0', 'Verizon'`;
var strB = strA.split(/'\b([0-9]+?)\b'/).join('');
console.log(strB);

You could capture a single or a double quote in a capturing group, as in your first regex, then capture the digits in between in group 2, and finally use a backreference to group 1.
In the replacement, use the second capturing group, for example $2.
(['"])(\d+)\1
Explanation
(['"]) Capture ' or " in a capturing group
(\d+) Capture 1+ digits in a group
\1 Backreference to group 1
Regex demo
Result
'2018-12-09 07:29:00.0000000', 'US', 'MI', 'Detroit', 48206, 505, '68.61.112.245', 0, 'Verizon'
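In JavaScript, the backreference approach could be sketched like this. unquoteNumbers is a hypothetical helper name, and the short input string is made up to show that mismatched quotes are left alone.

```javascript
// Group 1: the opening quote, group 2: the digits,
// \1: the same kind of quote again.
const QUOTED_NUMBER = /(['"])(\d+)\1/g;

// Hypothetical helper: keeps only the digits (group 2).
function unquoteNumbers(text) {
  return text.replace(QUOTED_NUMBER, "$2");
}

console.log(unquoteNumbers(`'a', "42", '505', '7"`));
// 'a', 42, 505, '7"
```

Because of the backreference, a value that opens with one kind of quote and closes with the other is not touched.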

Split string in two groups with regex and show only the last group

I have some similar properties. First I need to select the property and second I want to have the value as result of a regular expression.
Data:
BlockSize:4096
TotalBlocks:68822
HighWater:68764
FreeBlocks:2553
RecordBlocks:25378
BIBlocksize:8192
BIClustersize:512
The regular expression to split them in two groups is: (FreeBlocks):(.*$). Now I want only the value (the second group) as the result. I want to use these expressions in a Zabbix key.

According to the documentation, Zabbix uses PCRE. In that case you might use \K to reset the starting point of the reported match, and then match one or more digits with \d+ or use .* to match any character zero or more times.
This will give you a match instead of a capturing group. If you do want the group, you could use parentheses: (\d+)
FreeBlocks:\K\d+$
To match all before the colon you could use a negated character class:
^[^:]+:\K\d+$
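\K is a PCRE feature and is not available in JavaScript regexes. Purely as an illustration of "a match instead of a capturing group", the sketch below swaps in a lookbehind, which gives a similar result in JavaScript (ES2018 or later). It says nothing about how Zabbix itself evaluates the expression, and freeBlocks is a hypothetical helper name.

```javascript
// Sample data from the question, one property per line.
const data = [
  "BlockSize:4096",
  "TotalBlocks:68822",
  "HighWater:68764",
  "FreeBlocks:2553",
  "RecordBlocks:25378",
  "BIBlocksize:8192",
  "BIClustersize:512",
].join("\n");

// Lookbehind instead of \K: the prefix is asserted but is not part of
// the match. The m flag lets ^ and $ work per line.
const FREE_BLOCKS = /(?<=^FreeBlocks:)\d+$/m;

// Hypothetical helper: the whole match is the value, no group needed.
function freeBlocks(text) {
  const m = text.match(FREE_BLOCKS);
  return m ? m[0] : null;
}

console.log(freeBlocks(data));  // 2553
```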
