Regular expression as optional with maximum limit of 1 - regex

I'm new to regex. Facing some issues while making one expression as optional and if it exists then it should not be repeated. In the below case I want %23 to be optional and if it occurs then it should not be repeated. But in below case it's working for optional but not for repeat case.
It's giving me true even if I put string as:
-113%23%2313113098A%2F--
Could someone suggest how to make it optional and not repetitive. This is my regex:
(%23)?([0-9]|[A-Z]|%2F|-).*$

You can use a negative lookahead to avoid matching repeating instances of %23:
^(?:[0-9]|[A-Z]|%2F|[-%])(?!(?:.*?%23){2}).*$
Breakup:
(?! # start negative lookahead
(?:.*?%23){2} # match 0 or more chars followed by %23, {2} matches 2 repeats
) # end lookahead
RegEx Demo
However if requirement is to avoid consecutive repeats then use:
^(?!.*?(?:%23){2})

Related

Changing existing regex so at least 1 numerical value is required

Said regex: \b(?=(?:[a-z\d]*[A-Z]){3})(?=.*\d)(?=(?:[A-Z\d]*[a-z]){2})[a-zA-Z\d]{5,30}\b
I am trying to add just 1 condition to this regex, that being that 1 number is required in order to match, I have tried inserting the lookahead (?=.*\d) but it did not work, as it matches the "HeLLoWoRlD" portion of <HeLLoWoRlD"123
You may add the (?=[A-Za-z]*\d) lookahead check:
\b(?=[A-Za-z]*\d)(?=(?:[a-z\d]*[A-Z]){3})(?=(?:[A-Z\d]*[a-z]){2})[a-zA-Z\d]{5,30}\b
^^^^^^^^^^^^^^^
See the regex demo.
The (?=[A-Za-z]*\d) lookahead matches a location that is immediately followed with 0 or more ASCII letters and then one digit.

Using regex to determine straight (unordered hand)

A straight in poker is five cards in a row, for example 23456 or 89TJQ. With a "sorted" hand, the regex could be written as:
^(A2345|23456|34567|45678|56789|6789T|789TJ|89TJQ|9TJQK|TJQKA)$
It's a bit verbose but straightforward enough. However, would it be possible to generate a (sensible) regex if the hand was unordered? For example, if the hand was 52634 or JQ89T??
One possible way would be to use a ?=.*<item> lookahead (which would essentially be "unsorted"), for example:
^(?:
(?=.*A)(?=.*2)(?=.*3)(?=.*4)(?=.*5)
|(?=.*2)(?=.*3)(?=.*4)(?=.*5)(?=.*6)
|(?=.*3)(?=.*4)(?=.*5)(?=.*6)(?=.*7)
|(?=.*4)(?=.*5)(?=.*6)(?=.*7)(?=.*8)
|(?=.*5)(?=.*6)(?=.*7)(?=.*8)(?=.*9)
|(?=.*6)(?=.*7)(?=.*8)(?=.*9)(?=.*T)
|(?=.*7)(?=.*8)(?=.*9)(?=.*T)(?=.*J)
|(?=.*8)(?=.*9)(?=.*T)(?=.*J)(?=.*Q)
|(?=.*9)(?=.*T)(?=.*J)(?=.*Q)(?=.*K)
|(?=.*T)(?=.*J)(?=.*Q)(?=.*K)(?=.*A)
)
.{5}$
Are there other / better approaches to finding if a straight exists using regex only?
You can use the following regex:
See regex in use here
(?!.*(.).*\1)(?:[A2345]{5}|[23456]{5}|[34567]{5}|[45678]{5}|[56789]{5}|[6789T]{5}|[789TJ]{5}|[89TJQ]{5}|[9TJQK]{5}|[TJQKA]{5})
This works by first using a negative lookahead to ensure that the string doesn't contain any duplicates (?!.*(.).*\1). Then it matches 5 characters from any of the straight possibilities.
(?!.*(.).*\1)
#^^^ ^ negative lookahead ensuring what follows doesn't match
# ^^ match any character any number of times
# ^^^ capture a character into capture group #1
# ^^ match any character any number of times
# ^^ match the same text as most recently matched by the 1st capture group
Against JQQ89, it works as follows:
- .* matches J
- (.) captures Q
- .* matches nothing
- \1 tries to match Q (and succeeds)
- Negative lookahead has a match, so fail the match.

R- regex extracting a string between a dash and a period

First of all I apologize if this question is too naive or has been repeated earlier. I tried to find it in the forum but I'm posting it as a question because I failed to find an answer.
I have a data frame with column names as follows;
head(rownames(u))
[1] "A17-R-Null-C-3.AT2G41240" "A18-R-Null-C-3.AT2G41240" "B19-R-Null-C-3.AT2G41240"
[4] "B20-R-Null-C-3.AT2G41240" "A21-R-Transgenic-C-3.AT2G41240" "A22-R-Transgenic-C-3.AT2G41240"
What I want is to use regex in R to extract the string in between the first dash and the last period.
Anticipated results are,
[1] "R-Null-C-3" "R-Null-C-3" "R-Null-C-3"
[4] "R-Null-C-3" "R-Transgenic-C-3" "R-Transgenic-C-3"
I tried following with no luck...
gsub("^[^-]*-|.+\\.","\\2", rownames(u))
gsub("^.+-","", rownames(u))
sub("^[^-]*.|\\..","", rownames(u))
Would someone be able to help me with this problem?
Thanks a lot in advance.
Shani.
Here is a solution to be used with gsub:
v <- c("A17-R-Null-C-3.AT2G41240", "A18-R-Null-C-3.AT2G41240", "B19-R-Null-C-3.AT2G41240", "B20-R-Null-C-3.AT2G41240", "A21-R-Transgenic-C-3.AT2G41240", "A22-R-Transgenic-C-3.AT2G41240")
gsub("^[^-]*-([^.]+).*", "\\1", v)
See IDEONE demo
The regex matches:
^[^-]* - zero or more characters other than -
- - a hyphen
([^.]+) - Group 1 matching and capturing one or more characters other than a dot
.* - any characters (even including a newline since perl=T is not used), any number of occurrences up to the end of the string.
This can easily be achieved with the following regex:
-([^.]+)
# look for a dash
# then match everything that is not a dot
# and save it to the first group
See a demo on regex101.com. Outputs are:
R-Null-C-3
R-Null-C-3
R-Null-C-3
R-Null-C-3
R-Transgenic-C-3
R-Transgenic-C-3
Regex
-([^.]+)\\.
Description
- matches the character - literally
1st Capturing group ([^\\.]+)
[^\.]+ match a single character not present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
. matches the character . literally
\\. matches the character . literally
Debuggex Demo
Output
MATCH 1
1. [4-14] `R-Null-C-3`
MATCH 2
1. [29-39] `R-Null-C-3`
MATCH 3
1. [54-64] `R-Null-C-3`
MATCH 4
1. [85-95] `R-Null-C-3`
MATCH 5
1. [110-126] `R-Transgenic-C-3`
MATCH 6
1. [141-157] `R-Transgenic-C-3`
This seems an appropriate case for lookarounds:
library(stringr)
str_extract(v, '(?<=-).*(?=\\.)')
where
(?<= ... ) is a positive lookbehind, i.e. it looks for a - immediately before the next captured group;
.* is any character . repeated 0 or more times *;
(?= ... ) is a positive lookahead, i.e. it looks for a period (escaped as \\.) following what is actually captured.
I used stringr::str_extract above because it's more direct in terms of what you're trying to do. It is possible to do the same thing with sub (or gsub), but the regex has to be uglier:
sub('.*?(?<=-)(.*)(?=\\.).*', '\\1', v, perl = TRUE)
.*? looks for any character . from 0 to as few as possible times *? (lazy evaluation);
the lookbehind (?<=-) is the same as above;
now the part we want .* is put in a captured group (...), which we'll need later;
the lookahead (?=\\.) is the same;
.* captures any character, repeated 0 to as many as possible times (here the end of the string).
The replacement is \\1, which refers to the first captured group from the pattern regex.

A special Regular Expression

I want to have a restriction a string which can accept alphanumeric values and hiphen.
I am providing 3 examples to have a clear idea.
1) AS15JKM-125TR-325AMOR
2) ITEW32-DE432OI
3) 09IURE765EDR
There is no specific pattern, There may b 0 to 3 hiphens in a string.
I just want to restrict it in such a way that it should accept only alphanumeric value and
only Hiphen, no other special character.
plz help me on this.
Option 1: No Lookahead
^(?:[A-Za-z0-9]*-){0,3}[A-Za-z0-9]+$
Note that if you only want uppercase letters, you need to remove a-z
Explanation
The ^ anchor asserts that we are at the beginning of the string
The non-capturing group (?:[A-Za-z0-9]*-) matches zero or more letters or digit, then a hyphen
This is repeated zero to three times, enforcing your limit on hyphens
[A-Za-z0-9]+ matches one or more letters or digit
The $ anchor asserts that we are at the end of the string
Option 2: With Lookahead
This does not present any benefit over the first version, I am just showing it for completion.
^(?=(?:[^-]*-){0,3}[^-]*$)[A-Za-z0-9]+$
Explanation
The lookahead (?=(?:[^-]*-){0,3}[^-]*$) asserts that what follows is
(?:[^-]*-) any number of non-hyphens, followed by a hyphen
{0,3} zero to three times
then [^-]*$ any number of non-hyphens and the end of the string
Option 3: With Negative Lookahead
Courtesy of #Jerry:
^(?!(?:[^-]*-){4})[A-Za-z0-9]+$
Explanation
The negative lookahead (?!(?:[^-]*-){4}) asserts that it is not possible to find a non-hyphen followed by a hyphen four times.
Assuming you do not want to count the hyphens, something like so should work: ^[A-Z0-9 -]+$.
An example of the regex is available here.

regex conditional statement (only parse numbers if string has a certain beginning)

first I used a string that returned my relaystates, so “1.0.0.0.1.1.0.0” would get parsed/grouped with \d+,
then my eight switches used ‘format response’, e.g. {1} to get the state for each switch.
now I need to get the numbers out of this string: “RELAYS.1.0.0.0.1.1.0.0”
\d+ will still get the numbers but I only want to get them IF the string starts with “RELAYS"
can anyone please explain how I could do that?
thnx a million in advance!
Edited icebear (today 00:24)
With a .NET engine, you could use the regex (?<=^RELAYS[\d.]*)\d+. But most regex engines don't support indefinite repetition in a negative lookbehind assertion.
See it live on regexhero.net.
Explanation:
(?<= # Assert that the following can be matched before the current position:
^RELAYS # Start of string, followed by "RELAYS"
[\d.]* # and any number of digits/dots.
) # End of lookbehind assertion
\d+ # Match one or more digits.
With a PCRE engine, you could use (?:^RELAYS\.|\G\.)(\d+) and access group 1 for each match.
See it live on regex101.com.
Explanation:
(?: # Start a non-capturing group that matches...
^RELAYS\. # either the start of the string and "RELAYS."
| # or
\G\. # the position after the previous match, followed by "."
) # End of non-capturing group
(\d+) # Match a number and capture it in group 1