Remove whitespace from within a regex capturing group

This should be a simple job, but this morning I just can't seem to find the answer I need
Value:
N.123456 7
Current regex
N.(\d{6}\s?\d)
Returns single matching group
123456 7
Want it to return single matching group
1234567
Thanks

You can't get that back as a single capturing group, because a group always captures one contiguous stretch of the input, space included. I think what you are looking for is a non-capturing group, (?: ).
This regex may help. It matches the optional space inside a non-capturing group, so the space stays out of the captured text.
N.(\d{6})(?:\s?)(\d)
It will capture 123456 in group 1 and 7 in group 2.
What you probably want is this, which joins the two groups and returns 1234567:
"N.123456 7".replaceAll("N.(\\d{6})(?:\\s?)(\\d)", "$1$2")

Try this:
(?<=\d)\s+(?=\d+)
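This matches the whitespace that sits between two digits, so replacing every match with an empty string leaves 1234567 for your original group to capture.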
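For anyone who wants to try both suggestions side by side, here is a minimal Python sketch. It is an illustration only: the function names are hypothetical, and the dot after N is escaped on the assumption that a literal period is intended.

```python
import re

VALUE = "N.123456 7"

# Approach 1: capture the digits on either side of the optional space and
# stitch the two groups back together in the replacement.
# Assumption: the dot after N is a literal period, so it is escaped here.
def join_groups(value: str) -> str:
    return re.sub(r"N\.(\d{6})\s?(\d)", r"\1\2", value)

# Approach 2: delete whitespace that has a digit on both sides, then let the
# original single-group pattern capture the now contiguous digits.
def strip_then_capture(value: str):
    cleaned = re.sub(r"(?<=\d)\s+(?=\d)", "", value)
    match = re.search(r"N\.(\d{6}\s?\d)", cleaned)
    return match.group(1) if match else None

print(join_groups(VALUE))
print(strip_then_capture(VALUE))
```

Either way the whitespace is removed by a replacement step; the capturing group itself never skips characters.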

Related

Regex to remove time zone stamp

In Google Sheets, I have time stamps with formats like the following:
5/25/2022 14:13:05
5/25/2022 13:21:07 EDT
5/25/2022 17:07:39 GMT+01:00
I am looking for a regex that will remove everything after the time, so the desired output would be:
5/25/2022 14:13:05
5/25/2022 13:21:07
5/25/2022 17:07:39
I have come up with the following regex after some trial and error, although I am not sure if it is prone to errors: [^0-9:\/' '\n].*
And the function in Google Sheets that I plan to use is REGEXREPLACE().
My goal is to be able to do calculations regardless of one's time zone, however the result will be stamped with the user's local time zone.
Could someone confirm this is correct? Appreciate any feedback I can get!

You can use
=REGEXREPLACE(A1, "^(\S+\s\S+).*", "$1")
=REGEXREPLACE(A1, "^([\d/]+\s[\d:]+).*", "$1")
Details:
^ - start of string
(\S+\s\S+) - Group 1: one or more non-whitespaces, a single whitespace and one or more non-whitespaces
[\d/]+\s[\d:]+ - one or more digits or / chars, a whitespace, one or more digits or colons
.* - any zero or more chars other than line break chars as many as possible.
The $1 is a replacement backreference that refers to the Group 1 value.
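To check the two patterns outside of Sheets, a rough Python equivalent could look like the sketch below. Python's `re` module is not the engine Google Sheets uses, so treat this as an approximation; the names `LOOSE`, `STRICT` and `strip_zone` are made up for the example.

```python
import re

# Assumption: Python's `re` stands in for REGEXREPLACE here. The two engines
# differ, but these patterns only use syntax that both understand.
LOOSE = re.compile(r"^(\S+\s\S+).*")
STRICT = re.compile(r"^([\d/]+\s[\d:]+).*")

def strip_zone(stamp: str, pattern: re.Pattern = LOOSE) -> str:
    """Keep only group 1 (date and time) and drop whatever follows."""
    return pattern.sub(r"\1", stamp)

samples = [
    "5/25/2022 14:13:05",
    "5/25/2022 13:21:07 EDT",
    "5/25/2022 17:07:39 GMT+01:00",
]
for sample in samples:
    print(strip_zone(sample), "|", strip_zone(sample, STRICT))
```

In Python the backreference is written `\1` rather than `$1`.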

Based on the samples you have shown, please try the following regex in REGEXREPLACE. It matches the time stamp format specifically and uses a single capturing group, whose value replaces the whole cell value (as required).
=REGEXREPLACE(A1, "^((?:\d{1,2}\/){2}\d{4}\s+(?:\d{1,2}:){2}\d{1,2}).*", "$1")
Explanation of the regex above:
^( ##Match from the start of the value and open the one and only capturing group.
(?:\d{1,2}\/){2} ##Non-capturing group matching 1 to 2 digits followed by /, repeated 2 times.
\d{4}\s+ ##Match 4 digits followed by 1 or more whitespace characters.
(?:\d{1,2}:){2} ##Non-capturing group matching 1 to 2 digits followed by a colon, repeated 2 times.
\d{1,2} ##Match 1 to 2 digits.
) ##Close the capturing group.
.* ##Match everything up to the end of the line; this part is not captured.
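Since the goal is to do calculations on the stripped values, the stricter pattern can also serve as a validation step before parsing. The sketch below is in Python rather than Sheets, and it assumes month/day/year order as in the samples; `parse_local` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Same structure as the pattern above: M/D/YYYY, whitespace, H:M:S, then
# anything else, which is ignored.
STAMP = re.compile(r"^((?:\d{1,2}/){2}\d{4}\s+(?:\d{1,2}:){2}\d{1,2}).*")

def parse_local(stamp: str) -> datetime:
    """Parse the date and time part, ignoring any trailing zone text."""
    match = STAMP.match(stamp)
    if match is None:
        raise ValueError(f"not a recognised timestamp: {stamp!r}")
    # Assumption: month/day/year order, as in the sample values.
    return datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S")

elapsed = parse_local("5/25/2022 17:07:39 GMT+01:00") - parse_local("5/25/2022 14:13:05")
print(elapsed)
```

Note that this treats every stamp as a naive local time, which is what dropping the zone text implies.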

You can do this without REGEX by simply splitting and adding the first and second index.
=ARRAYFORMULA(
IF(ISBLANK(A2:A),,
INDEX(SPLIT(A2:A," "),0,1)+
INDEX(SPLIT(A2:A," "),0,2)))

Regex Break String Into Groups

I have a string:
ABC/12345.DEF/ZYX.THIS IS THE REST OF THE STRING
I need a regex that will break this into three named groups:
FIRST: 12345
SECOND: ZYX
THIRD: THIS IS THE REST OF THE STRING
This is what I have come up with:
(?=.*\bABC\/(?<FIRST>[\w\d\s,]*)\.\b)(?=.*\bDEF\/(?<SECOND>[\w])\b)(?<THIRD>[\w\W\s]*)
That yields:
FIRST: 12345
SECOND: ZYX
THIRD: ABC/12345.DEF/ZYX.THIS IS THE REST OF THE STRING
Any help will be greatly appreciated.
Thanks

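Your current regex puts the first two groups inside positive lookaheads. Lookaheads do not consume any input, so when the regex reaches the third capture group it is still at the start of the string, and that group (ironically) captures the whole string, including the parts already captured by the first and second groups. Here is an example that doesn't use lookaheads.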
ABC\/(?<FIRST>[^\.\n]+)\.DEF\/(?<SECOND>[^\.\n]+)\.(?<THIRD>.*)
ABC\/ Read ABC/
(?<FIRST>[^\.\n]+) Named capture group that captures all characters not . or newline, up to first occurrence of either.
\.DEF\/ Read .DEF/
(?<SECOND>[^\.\n]+) Same as first capture group.
\. Read .
(?<THIRD>.*) Capture everything else.
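As a quick check of the pattern above, here is a small Python sketch. Python spells named groups `(?P<name>...)` instead of `(?<name>...)`, and the slashes do not need escaping; otherwise the pattern is unchanged.

```python
import re

# Python named-group syntax is (?P<NAME>...); the rest mirrors the pattern
# from the answer.
PATTERN = re.compile(
    r"ABC/(?P<FIRST>[^.\n]+)\.DEF/(?P<SECOND>[^.\n]+)\.(?P<THIRD>.*)"
)

match = PATTERN.search("ABC/12345.DEF/ZYX.THIS IS THE REST OF THE STRING")
groups = match.groupdict() if match else {}
print(groups)
```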

Regex - optional capture group after wildcard

Say I have the following list:
No 1 And Your Bird Can Sing (4)
No 2 Baby, You're a Rich Man (5)
No 3 Blue Jay Way S
No 4 Everybody's Got Something to Hide Except Me and My Monkey (1)
And I want to extract the number, the title and the number of weeks in the parenthesis if it exists.
Works, but the last group is not optional (regstorm):
No (?<no>\d{1,3}) (?<title>.*?) \((?<weeks>\d)\)
Last group optional, only matches number (regstorm):
No (?<no>\d{1,3}) (?<title>.*?)( \((?<weeks>\d)\))?
Combining one pattern with week capture with a pattern without week capture works, but there gotta be a better way:
(No (?<no>\d{1,3}) (?<title>.*) \((?<weeks>\d)\))|(No (?<no>\d{1,3}) (?<title>.*))
I use C# and javascript but I guess this is a general regex question.

Your regex is almost there!
First and most importantly, you should add a $ at the end. This forces (?<title>.*?) to keep matching towards the end of the string. Currently, the lazy (?<title>.*?) matches an empty string and then stops, because it has already reached a point where the rest of the regex matches. Why does the rest of the regex match? Because the optional group can match the empty string. By adding the $, you make the rest of the regex "harder" to match.
Secondly, you forgot to match an open parenthesis \(.
This is what your regex should look like:
No (?<no>\d{1,3}) (?<title>.*?)( \((?<weeks>\d)\))?$
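The effect of the $ is easy to see by running both versions. This is a Python sketch (so the named groups use the `(?P<name>...)` spelling); `extract` is just a helper name for the example.

```python
import re

UNANCHORED = re.compile(r"No (?P<no>\d{1,3}) (?P<title>.*?)( \((?P<weeks>\d)\))?")
ANCHORED = re.compile(r"No (?P<no>\d{1,3}) (?P<title>.*?)( \((?P<weeks>\d)\))?$")

def extract(pattern, line):
    """Return (no, title, weeks), or None when the line does not match."""
    m = pattern.match(line)
    return (m["no"], m["title"], m["weeks"]) if m else None

lines = [
    "No 1 And Your Bird Can Sing (4)",
    "No 3 Blue Jay Way S",
]
for line in lines:
    # Without $, the lazy title stops at once with an empty match.
    print(extract(UNANCHORED, line))
    # With $, the title has to grow until the rest reaches the end of the line.
    print(extract(ANCHORED, line))
```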

You may use this regex with an optional last part:
^No (?<no>\d{1,3}) (?<title>.*?\S)(?: \((?<weeks>\d)\))?$

Another option could be for the title to match either not ( or when it does encounter a ( it should not be followed by a digit and a closing parenthesis.
^No (?<no>\d{1,3}) (?<title>(?:[^(\r\n]+|\((?!\d\)))+)(?:\((?<weeks>\d)\))?
In parts
^No
(?<no>\d{1,3}) Group no and space
(?<title>
(?: Non capturing group
[^(\r\n]+ Match any char except ( or newline
| Or
\((?!\d\)) Match ( if not directly followed by a digit and )
)+ Close group and repeat 1+ times
) Close group title
(?: Non capturing group
\((?<weeks>\d)\) Group weeks between parenthesis
)? Close group and make it optional
If you don't want to have to trim the trailing space from the title, you could exclude it from the title match before the weeks group.
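A Python sketch of this approach is below. It trims the trailing space after matching instead of changing the pattern. The third test line, with an extra parenthesised word in the title, is a made-up input to show why the lookahead is there.

```python
import re

PATTERN = re.compile(
    r"^No (?P<no>\d{1,3}) "
    r"(?P<title>(?:[^(\r\n]+|\((?!\d\)))+)"  # stops before "(digit)"
    r"(?:\((?P<weeks>\d)\))?"
)

def parse(line: str):
    """Return (no, title, weeks); weeks is None when absent."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # The title group keeps the space before "(n)", so trim it here.
    return m["no"], m["title"].rstrip(), m["weeks"]

print(parse("No 1 And Your Bird Can Sing (4)"))
print(parse("No 3 Blue Jay Way S"))
# Hypothetical input: a parenthesis in the title that is not a week count.
print(parse("No 5 Example Title (Live) (3)"))
```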

Regex to capture words and numbers in separate groups

I need two groups - one to extract words, second - numbers. Example:
['| Sofia | 300']
need to extract:
Group 1 - Sofia; Group 2 - 300
My regex attempt:
([a-zA-Z]+[ ]*[a-zA-Z]+)([0-9]+)
I don't understand why this doesn't match. I've been reading for 30 minutes now and maybe I can't phrase my issue correctly, but I can't find a solution. My thinking here is that each set of parentheses holds a group. The regex inside them seems to work fine on its own, but when I try to capture two groups, it fails. Obviously I am missing something important about capturing multiple groups.

It doesn't match because you're not matching the characters between "Sofia" and "300". This would match "Sofia300", but not "Sofia 300" or "Sofia | 300". Try this:
(\w+ *\w+).*?(\d+)
(I'm using \w instead of [a-zA-Z] and \d instead of [0-9] for brevity. Note that \w also matches digits and underscores, so it is slightly broader.)
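Here is the suggested pattern run against the sample in Python, as a minimal sketch. The lazy .*? is what skips over the " | " between the two groups.

```python
import re

# Group 1: the word(s); .*? lazily skips the separator; group 2: the number.
PATTERN = re.compile(r"(\w+ *\w+).*?(\d+)")

match = PATTERN.search("| Sofia | 300")
groups = match.groups() if match else None
print(groups)
```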

The following will give you your groups:
/([a-z]+).*\|\s([0-9]+)/i

How to match a group of value to group 1

Was trying to solve a regex question posted on SO, but got stuck on this.
From this string
Ob=Web technology,OB=Product SPe,OB=Dev profile,OB=Computer Management,oB=Hardware Services,cd=sti,CD=com,cd=ws
The values have to be extracted as below.
Web technology,Product SPe,Dev profile,Computer Management,Hardware Services
I was trying the below regex.
(?=Ob)(?:(\w+=)([\w\s]+,?))+
My assumption was that group 1 would have all the keys and group 2 would have all the values. But all key-value pairs except the last one end up only in group 0.
Is there a way of getting all the values into group 2?

The issue with your regex is that group 1 and group 2 sit inside a repeated non-capturing group. A repeated capturing group keeps only its last iteration, so the whole run of pairs ends up in group 0 while groups 1 and 2 hold just the final pair. The other thing is that the positive lookahead prevented the regex from matching globally.
The regex below gathers the keys into group 1 and the values into group 2 when applied globally.
(\w+)=([\w\s]+)(?=[,\s]+)
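To illustrate the "one pair per match, applied globally" idea, here is a Python sketch. Note that it uses a simplified pair pattern of my own, (\w+)=([^,]+), rather than the regex above, and filters the keys in code; the variable names are only for the example.

```python
import re

SOURCE = (
    "Ob=Web technology,OB=Product SPe,OB=Dev profile,"
    "OB=Computer Management,oB=Hardware Services,cd=sti,CD=com,cd=ws"
)

# Simplified pattern (an assumption, not the answer's regex):
# group 1 is the key, group 2 is everything up to the next comma.
PAIR = re.compile(r"(\w+)=([^,]+)")

# findall returns one (key, value) tuple per match, so nothing is overwritten.
pairs = PAIR.findall(SOURCE)
ob_values = [value for key, value in pairs if key.lower() == "ob"]
result = ",".join(ob_values)
print(result)
```

Matching one pair at a time and collecting the matches avoids the problem of a repeated group keeping only its last iteration.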

,?cd=.*?(?:,|$)|ob=
Try this. Replace with an empty string. See the demo. Do not forget the i flag.
http://regex101.com/r/lZ5mN8/59
or
cd=.*?(?:,|$)|[^=,]+=(.*?)(?=,|$)
Try this. Replace with $1. See the demo.
http://regex101.com/r/lZ5mN8/57

Regex:
(?i)Ob=([^,]+)|(?!.*\bob\b).+
Replacement string:
$1
(?i) Will do a case insensitive match.
Ob=([^,]+) Group index 1 contains all the Ob values.
| OR
(?!.*\bob\b).+ Match any character one or more times but it won't contain \bob\b
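The same regex and replacement can be tried in Python, as sketched below. This assumes Python 3.5 or later, where a group that did not take part in the match is substituted as an empty string; the backreference is written \1 instead of $1.

```python
import re

SOURCE = (
    "Ob=Web technology,OB=Product SPe,OB=Dev profile,"
    "OB=Computer Management,oB=Hardware Services,cd=sti,CD=com,cd=ws"
)

# First alternative: keep the value of each Ob pair via group 1.
# Second alternative: swallow the tail once no standalone "ob" remains ahead;
# group 1 is unset there, so the tail is replaced with nothing.
PATTERN = re.compile(r"(?i)Ob=([^,]+)|(?!.*\bob\b).+")

result = PATTERN.sub(r"\1", SOURCE)
print(result)
```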

This regex should work for you (it relies on the PCRE verbs (*SKIP)(*F), so it needs an engine that supports them):
^(?!Ob=).*(*SKIP)(*F)|(\w+)=(\w+(?=,|$))
You can see that you're getting all keys in group #1 and all values in group #2.
