RegexReplace the nth occurrence of a string of underscores - regex

I'm having trouble getting a REGEXREPLACE working in a Google Sheets formula. I'm aiming to replicate a certain card game which is opposed to humankind. I have a cell containing a string which contains one, two or three occurrences of a series of underscores, e.g.
"_____ is the new _____"
And let's say I want to substitute in the strings "Orange" for the first occurrence, and "Black" for the second occurrence.
I don't know how many underscores will be in each string, it could be one or more, so it seems like a job for regex. I tried SUBSTITUTE and it didn't seem to recognise asterisks. Based on this link, I tried using {1} {2} and {3} to match the first/second/third occurrence, but I'm not doing something right:
=REGEXREPLACE(G16,".*(_*){1}.*",G17)
G16 is: _____ is the new _____.
G17 is: Orange
The output of the formula is: OrangeOrange.
Can anyone help me figure out the correct way to do this?

You may use
=REGEXREPLACE(REGEXREPLACE(G16,"^([^_]*)_+","$1Orange"), "^([^_]*)_+", "$1Black")
|----- First occurrence -----------------|
|----------------- Second occurrence ------------------------------------------|
Details
^ - start of string
([^_]*) - Capturing group 1 ($1 will refer to this group value): 0 or more chars other than an underscore
_+ - 1 or more underscores.
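The chained replacement can also be sketched outside Sheets. The following is a minimal Python illustration of the same pattern, not the Sheets formula itself; the helper name fill_blanks is hypothetical, and the loop stands in for nesting one REGEXREPLACE per word.

```python
import re

def fill_blanks(template, words):
    """Replace successive runs of underscores with the given words, in order."""
    result = template
    for word in words:
        # ^([^_]*) keeps everything before the first remaining run of
        # underscores; _+ is that run, which is swapped for the word.
        result = re.sub(
            r"^([^_]*)_+",
            lambda m, w=word: m.group(1) + w,
            result,
            count=1,
        )
    return result

filled = fill_blanks("_____ is the new _____", ["Orange", "Black"])
print(filled)
```

Because each pass consumes the first run that is still present, the second pass sees what was originally the second run, and so on.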

Related

Using Data Validation in Google Sheets to only allow numbers, commas and spaces

I am creating an input google sheet to accept just numbers, commas & spaces - examples listed below. - At a basic level, I just want to exclude the use of A-Z / a-z.
80092382
800
800,876
98672102,20192210
I would like it to exclude anything like:
Hey 01
987 blue
black 1 white
orange
I am getting stuck early on, where I'm trying to only allow text with only numbers in, or excluding anything with letters in.
I have tried the following lines of code within the RegexMatch formula but it either accepts lines with text, or rejects my numbers.
=RegexMatch(L5,"\d")
- This one rejects the numbers.
=RegexMatch(to_text(L5),"\d")
- This rejects only where there are no numbers in the cell - So 'Hey 01' is accepted.
=RegexMatch(to_text(L5),"^\d")
- Same issue: if the cell starts with a number, as in '987 blue', then it's accepted
I've attempted a few other ways, such as using the NOT function at the beginning & using other regular expressions. If anyone can point me in the right direction then that would be much appreciated.
Test sheet
You can use
=ARRAYFORMULA(IF(REGEXMATCH(TO_TEXT(C2:C9), "^\d+(,\s*\d+)*$"), "Good", "Error"))
The regex matches
^ - start of string
\d+ - one or more digits
(,\s*\d+)* - zero or more repetitions of
, - a comma
\s* - zero or more whitespace characters
\d+ - one or more digits
$ - end of string.
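To check the pattern against the sample values from the question, here is a small Python sketch. It assumes Python's re module rather than Sheets' REGEXMATCH, and the helper name is_valid is made up for illustration.

```python
import re

# Same pattern as in the formula; fullmatch plays the role of ^ and $.
NUMBER_LIST = re.compile(r"\d+(,\s*\d+)*")

def is_valid(cell):
    """Return True when the cell holds only comma-separated runs of digits."""
    return NUMBER_LIST.fullmatch(cell) is not None

samples = ["80092382", "800", "800,876", "98672102,20192210",
           "Hey 01", "987 blue", "black 1 white", "orange"]
for sample in samples:
    print(sample, "->", "Good" if is_valid(sample) else "Error")
```

Note that whitespace is only accepted directly after a comma, so a value such as "800 876" is rejected by this pattern.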
You can test the content with this type of regex:
=arrayformula(if(regexmatch(to_text(A1:A),"([^\d.,])"),"Error",))

Match a particular word along with one word before and one word after, when each of them are separated by a delimiter in postgreSQL

I want to match a particular word along with a leading word and a trailing word using regex. The match should be done on the condition that the left word and right word are different from the one in the middle.
If I have a few lines as below.
1. apple,banana,orange,banana
2. orange,banana,guava
3. banana,orange,banana,orange,apple,guava
4. guava,apple,orange,orange,banana
Required Output
If orange is taken as the middle word, then the output I'm looking for is as below:
1. (banana,orange,banana)
2. (orange,banana)
3. (banana,orange,banana), (banana,orange,apple)
4. (apple,orange,banana)
Currently, I have the following regex
([\w\-\(\) ]+)?,?(orange)+,?([\w\-\(\) ]+)?
which is giving me the following
1. (banana,orange,banana)
2. (orange,banana)
3. (banana,orange,banana), (NULL,orange,apple)
4. (apple,orange,orange)
I have already spent a day on this, please help me with this problem. Thanks!
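The required output can be pinned down with a small non-regex sketch. This is Python rather than PostgreSQL, the helper name neighbours is hypothetical, and treating consecutive repeats of the middle word as a single occurrence is an assumption read off line 4 of the required output.

```python
def neighbours(line, middle):
    """Return each occurrence of `middle` with its left and right neighbours."""
    words = []
    for word in line.split(","):
        # Assumption: consecutive repeats of the middle word count once.
        if word == middle and words and words[-1] == middle:
            continue
        words.append(word)
    groups = []
    for i, word in enumerate(words):
        if word == middle:
            # Slicing simply drops a neighbour that is missing at either end.
            groups.append(tuple(words[max(i - 1, 0):i + 2]))
    return groups

lines = [
    "apple,banana,orange,banana",
    "orange,banana,guava",
    "banana,orange,banana,orange,apple,guava",
    "guava,apple,orange,orange,banana",
]
for line in lines:
    print(neighbours(line, "orange"))
```

Working on the split list rather than on the raw string lets overlapping groups, as in line 3, share a neighbour, which a single left-to-right regex scan does not do on its own.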

Python Regular Expression: No space in between

I have the following string:
"......(some chars) aaa bbb ###8/13/2018 ......(some chars)"
The ### in the string represents some random characters. Its length is unknown and it could be zero (just "aaa bbb 8/13/2018").
My goal is to find the date from the string (8/13/2018) and the starting index of ###.
I currently used the following code:
m = re.search(r'\s.*?([0-9]{1,}/[0-9]{1,}/[0-9]{2,})', str)
m.groups()[0] ## The date
m.start() ## index of ###
But the regex is matching bbb ###8/13/2018 instead of ###8/13/2018
I also tried change the regex to:
r'\s(?!\s).*?[0-9]{1,}/[0-9]{1,}/[0-9]{2,}'
r'\s(?!\s)*?[0-9]{1,}/[0-9]{1,}/[0-9]{2,}'
But neither of them works.
I would appreciate any help or comments. Thank you.
I tend to believe you are looking for:
#*(?:\d{1,2}/){2}\d{2,4} or even \S*(?:\d{1,2}/){2}\d{2,4}
This is simply saying:
\S* start with 0 or more non-space characters.
(?:\d{1,2}/){2} find two groups of \d{1,2}/ but do not capture them, i.e. (?:..) is non-capturing. This will match the month and day part 8/13/. \d{1,2} means at least one digit and at most two digits.
\d{2,4} match the year: at least 2 digits and at most 4 digits.
Using a part of your regex, I think you mean something like this
r'\S*([0-9]+/[0-9]+/[0-9]{2,})'
https://regex101.com/r/dxF4sT/1
To find the starting index, it would be where the match was found.
Note that \S will find all consecutive non-whitespace.
You can change this to other things like [#a-zA-Z] etc..., just add it to the class.
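Putting the pieces together, here is a minimal Python sketch that returns both the date and the starting index of the prefix. The helper name find_date is hypothetical, and the lazy \S*? is a small variation on the patterns above: a greedy \S* could swallow the first digit of a two-digit month.

```python
import re

# Lazy \S*? keeps the prefix as short as possible, so the whole date
# ends up in group 1 even when the month has two digits.
DATE_WITH_PREFIX = re.compile(r"\S*?((?:\d{1,2}/){2}\d{2,4})")

def find_date(text):
    """Return (date, start index of the token containing it), or None."""
    m = DATE_WITH_PREFIX.search(text)
    if m is None:
        return None
    return m.group(1), m.start()

text = "xx aaa bbb ###12/13/2018 yy"
print(find_date(text))
```

When the prefix is empty, the returned index is simply the index of the date itself.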

Regular Expression Extracting Text from a group

I have a filename like this:
0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
I needed to break down the name into groups which are separated by a underscore. Which I did like this:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
So far so good.
Now I need to extract characters from one of the groups. For example, in group 2 I need the first 3 and 8 digits, i.e. the 38 (keep in mind they could be letters too).
So I tried something like this:
(.*?)_([38]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It didn’t work but if I do this:
(.*?)_([PH]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It pulls the PH into a group, but not the 38. So I'm lost at this point.
Any help would be great
Try the below Regex to match any first 3 char/decimal and one decimal
(.?)_([A-Z0-9]{3}[0-9]{1})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
Try the below Regex to match any first 3 char/decimal and one decimal/char
(.?)_([A-Z0-9]{3}[A-Z0-9]{1})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
It will match any 3 letters/digits followed by 1 letter/digit.
If your first two letters are a constant like "PH" then try the below
(.?)_([PH]+[0-9A-Z]{2})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
I am assuming that you are trying to match group 2 starting with numbers. If that is the case, then you have to change the source string to something such as
0296005_383843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
It works, check it out at https://regex101.com/r/zem3vt/1
Using [^_]* performs much better in your case than .*? since it doesn't backtrack. So changing your original regex from:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
to:
([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
reduces the number of steps from 114 to 42 for your given string.
The best method might be to actually split your string on _ and then test the second element to see if it contains 38. Since you haven't specified a language, I can't help to show how in your language, but most languages employ a contains or indexOf method that can be used to determine whether or not a substring exists in a string.
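Since no language was specified, here is what the split approach might look like in Python, purely as an illustration; the variable names are arbitrary.

```python
filename = "0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS."

# Split on the underscore and look only at the second field.
parts = filename.split("_")
second = parts[1]

has_38 = "38" in second        # substring test
position = second.find("38")   # index of the first occurrence, or -1

print(second, has_38, position)
```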
Using regex alone, however, this can be accomplished using the following regular expression.
See regex in use here
Ensuring 38 exists in the second part:
([^_]*)_([^_]*38[^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
Capturing the 38 in the second part:
([^_]*)_([^_]*)(38)([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
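As a quick check of the capturing variant, here is a Python sketch. Python is only an assumption here since the question names no language, and the 16-digit counter is built with zfill so that its length is explicit.

```python
import re

# Assumption: the trailing counter is 183 padded to 16 digits.
SERIAL = "183".zfill(16)
filename = "0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_" + SERIAL + ".PS."

pattern = re.compile(
    r"([^_]*)_([^_]*)(38)([^_]*)_([^_]*)_([^_]*)_([^_]*)"
    r"_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)"
)

m = pattern.match(filename)
if m:
    # Groups 2, 3 and 4 are the text before 38, the 38 itself, and the rest
    # of the second field; group 12 is the 16-digit counter.
    print(m.group(2), m.group(3), m.group(4), m.group(12))
```

Splitting the second field into three groups shifts every later group number by two compared with the original pattern.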

What is wrong with this Regular Expression?

I am a beginner and have some problems with regexp.
Input text is : something idUser=123654; nick="Tom" something
I need to extract the value of idUser -> 123654
I tried this:
//idUser is already 8 digits number
MatchCollection matchsID = Regex.Matches(pk.html, @"\bidUser=(\w{8})\b");
Text = matchsID[1].Value;
but on output I get idUser=123654, and I need only the number.
The second problem is with nick="Tom": how can I get only the text Tom from this expression?
You don't show your output code, where you get the group from your match collection.
Hint: you will need group 1 and not group 0 if you want to have only what is in the parentheses.
.*?idUser=([0-9]+).*?
That regex should work for you :o)
Here's a pattern that should work:
\bidUser=(\d{3,8})\b|\bnick="(\w+)"
Given the input string:
something idUser=123654; nick="Tom" something
This yields 2 matches (as seen on rubular.com):
First match is idUser=123654, group 1 captures 123654
Second match is nick="Tom", group 2 captures Tom
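The same two matches can be reproduced with a short Python sketch. Python's re module is used here only for illustration; in .NET the equivalent values come from Match.Groups.

```python
import re

text = 'something idUser=123654; nick="Tom" something'
pattern = re.compile(r'\bidUser=(\d{3,8})\b|\bnick="(\w+)"')

user_id = None
nick = None
for m in pattern.finditer(text):
    # Only one side of the alternation takes part in each match,
    # so the other group is None.
    if m.group(1) is not None:
        user_id = m.group(1)
    if m.group(2) is not None:
        nick = m.group(2)

print(user_id, nick)
```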
Some variations:
In .NET regex, you can also use named groups for better readability.
If nick always appears after idUser, you can match the two at once instead of using alternation as above.
I've used {3,8} repetition to show how to match at least 3 and at most 8 digits.
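The first two variations can be combined into one sketch. It is written in Python, whose named-group syntax is (?P<name>...) rather than .NET's (?<name>...), and it assumes nick always follows idUser; the group names id and nick are made up for the example.

```python
import re

text = 'something idUser=123654; nick="Tom" something'

# One match covers both fields; .*? skips whatever lies between them.
both = re.compile(r'\bidUser=(?P<id>\d{3,8})\b.*?\bnick="(?P<nick>\w+)"')

m = both.search(text)
if m:
    print(m.group("id"), m.group("nick"))
```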
API links
Match.Groups property
This is how you get what individual groups captured in a match
Use look-around
(?<=idUser=)\d{1,8}(?=(;|$))
To fix the number of digits at 6, use (?<=idUser=)\d{6}(?=($|;))
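With look-around the whole match is already just the number, so no group index is needed. A minimal Python sketch of both patterns, for illustration only:

```python
import re

text = 'something idUser=123654; nick="Tom" something'

# The lookbehind requires idUser= before the digits; the lookahead requires
# a semicolon or the end of the string after them. Neither is consumed.
any_length = re.search(r"(?<=idUser=)\d{1,8}(?=(;|$))", text)
six_digits = re.search(r"(?<=idUser=)\d{6}(?=($|;))", text)

print(any_length.group(0), six_digits.group(0))
```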