R regular expression repetition ignores upper bound - regex

I am trying to write a regular expression that filters strings like
blah_blah_suffix
where suffix is any string that is 2 to 5 characters long. So I want to accept the strings
blah_blah_aa
blah_blah_abcd
but discard
blah_blah_a
blah_aaa
blah_blah_aaaaaaa
I use grepl in the following way:
samples[grepl("blah_blah_.{2,5}", samples)]
but it ignores the upper bound for the repetition (5). It discards the strings blah_blah_a and blah_aaa, but accepts the string blah_blah_aaaaaaa.
I know there is a way to filter the strings without using a regular expression, but I want to understand how to use grepl correctly.

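You need to anchor the expression to the start and end of the line:
^blah_blah_.{2,5}$
The ^ matches the beginning of the line and $ matches the end of the line. Without them, the pattern only has to match somewhere inside the string, so blah_blah_aaaaaaa is accepted because the first five characters after the prefix already satisfy .{2,5}. See a working example here: Regex101
If you want to anchor the expression to the beginning and end of the whole string (rather than each line of a multi-line string), use \A and \Z instead of ^ and $.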
Anchors Tutorial
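To make the effect of the anchors concrete, here is a minimal sketch. The question uses R's grepl; this sketch uses Python's re.search instead, which, like grepl, reports a match found anywhere in the string. The variable names are illustrative only.

```python
import re

# The sample strings from the question.
samples = [
    "blah_blah_aa",
    "blah_blah_abcd",
    "blah_blah_a",
    "blah_aaa",
    "blah_blah_aaaaaaa",
]

# Unanchored: the pattern only needs to match somewhere inside the string.
unanchored = re.compile(r"blah_blah_.{2,5}")
# Anchored: the pattern must cover the string from start to end.
anchored = re.compile(r"^blah_blah_.{2,5}$")

loose = [s for s in samples if unanchored.search(s)]
strict = [s for s in samples if anchored.search(s)]

print(loose)   # includes blah_blah_aaaaaaa
print(strict)  # only the 2-to-5 character suffixes
```

The unanchored pattern is satisfied by a prefix of blah_blah_aaaaaaa, which is why the upper bound appears to be ignored.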

/^[\w]+_[\w]+_[\w]{2,5}$/
DEMO
Options: dot matches newline; case insensitive; ^ and $ match at line breaks
Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match a single character that is a “word character” (letters, digits, and underscores) «[\w]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “_” literally «_»
Match a single character that is a “word character” (letters, digits, and underscores) «[\w]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “_” literally «_»
Match a single character that is a “word character” (letters, digits, and underscores) «[\w]{2,5}»
Between 2 and 5 times, as many times as possible, giving back as needed (greedy) «{2,5}»
Assert position at the end of a line (at the end of the string or before a line break character) «$»
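As a rough check of this more general pattern, the sketch below runs it over the question's samples using Python's re module (an assumption; the answer does not name a language). Two things are worth noticing: the pattern accepts any word_word_suffix shape, not only the blah_blah_ prefix, and \w itself matches an underscore, so the first two groups can absorb extra underscores. The extra strings foo_bar_xyz and a_b_c_dd are made up for illustration.

```python
import re

# Same pattern as above, without the /.../ delimiters and redundant brackets.
word_fields = re.compile(r"^\w+_\w+_\w{2,5}$")

samples = [
    "blah_blah_aa",
    "blah_blah_abcd",
    "blah_blah_a",
    "blah_aaa",
    "blah_blah_aaaaaaa",
]
kept = [s for s in samples if word_fields.search(s)]

# Hypothetical extra inputs showing how permissive \w+ is.
other_prefix = word_fields.search("foo_bar_xyz") is not None
extra_underscores = word_fields.search("a_b_c_dd") is not None

print(kept, other_prefix, extra_underscores)
```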

Related

Password REGEX with minimum eight characters, small and large letters or letters and at least one number or special character [duplicate]

I'm new to React Native, and I need to implement new password requirements.
The new requirements are lowercase and uppercase letters, or letters and at least one number or special character.
The password must also be at least eight characters long.
Here is my code:
.matches(
/^(?=.*[a-z])(?=.*\d)(?=.*[\W_])[\w\W].+$/,
I think that should work:
((^|, )((?=.*[a-z])|(?=.*\d)|(?=.*[\W_])[\w\W]))+$.
This works. It requires at least one uppercase letter, one lowercase letter, one number, and one special character such as # or $, with a length of at least eight characters.
(?m)^((?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[\\W]).{8,})$
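The (?m) at the beginning enables multi-line mode, so that ^ and $ match at line breaks as well as at the start and end of the whole input. The . does not match a newline by default, so each line is checked on its own.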
From RegexBuddy:
^((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[\W]).{8,})$
Options: ^ and $ match at line breaks
Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match the regular expression below and capture its match into backreference number 1 «((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[\W]).{8,})»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=.*\d)»
Match any single character that is not a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match a single digit 0..9 «\d»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=.*[a-z])»
Match any single character that is not a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match a single character in the range between “a” and “z” «[a-z]»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=.*[A-Z])»
Match any single character that is not a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match a single character in the range between “A” and “Z” «[A-Z]»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=.*[\W])»
Match any single character that is not a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match a single character that is a “non-word character” «[\W]»
Match any single character that is not a line break character «.{8,}»
Between eight and unlimited times, as many times as possible, giving back as needed (greedy) «{8,}»
Assert position at the end of a line (at the end of the string or before a line break character) «$»
It matches
abcDefg1$
1zBA^frmb
1#Basdfadsfadsf
It does not match
abcd123
123abc
abcdEFGH
abcdEFG2
abCDeF1E
1a2bc
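The lists above can be checked mechanically. The following is a minimal sketch in Python (an assumption; the question itself is about React Native). It drops the (?m) prefix and uses re.fullmatch instead, since a single candidate password is checked at a time. The helper name is_valid_password is hypothetical.

```python
import re

# Four lookaheads (digit, lowercase, uppercase, non-word character)
# followed by a minimum length of eight characters.
PASSWORD_RE = re.compile(r"(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*\W).{8,}")


def is_valid_password(candidate: str) -> bool:
    """Return True if the whole candidate satisfies every requirement."""
    return PASSWORD_RE.fullmatch(candidate) is not None


accepted = ["abcDefg1$", "1zBA^frmb", "1#Basdfadsfadsf"]
rejected = ["abcd123", "123abc", "abcdEFGH", "abcdEFG2", "abCDeF1E", "1a2bc"]

for password in accepted + rejected:
    print(password, is_valid_password(password))
```

Note that \W does not match an underscore, so a password whose only non-alphanumeric character is _ is rejected by this pattern.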

regular expression match _ underscore

I have a string like this :
002_part1_part2_______________by_test
and I would like to stop the match at the third underscore character, like this :
002_part1_part2_
How can I do that with a Regular expression ?
Thanks
Create a pattern that matches any character other than _ zero or more times, followed by an underscore. Put that pattern inside a capturing or non-capturing group and make it repeat exactly 3 times by adding the quantifier {3} after the group.
^(?:[^_]*_){3}
DEMO
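A minimal sketch of this approach, assuming Python's re module (the question does not name a language); the variable names are illustrative:

```python
import re

text = "002_part1_part2_______________by_test"

# Three repetitions of "anything but an underscore, then one underscore",
# anchored at the start of the string.
three_fields = re.compile(r"^(?:[^_]*_){3}")

match = three_fields.match(text)
prefix = match.group(0) if match else None
print(prefix)

# A string with fewer than three underscores does not match at all.
too_short = three_fields.match("a_b")
```

Because [^_] can never consume an underscore, the match ends exactly at the third underscore regardless of what follows.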
You can use:
.*\d_
EXPLANATION:
Match any single character that is NOT a line break character (line feed) «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match a single character that is a “digit” (any decimal number in any Unicode script) «\d»
Match the character “_” literally «_»
https://regex101.com/r/uX0qD5/1
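This second pattern relies on greediness and on the shape of the data: it stops after the last digit that is directly followed by an underscore. The sketch below, again assuming Python's re module, compares the greedy and lazy forms. The second input string is a made-up variation to show where the approach breaks down.

```python
import re

text = "002_part1_part2_______________by_test"

# Greedy: .* backtracks from the end to the LAST "digit + underscore".
greedy = re.match(r".*\d_", text).group(0)
# Lazy: .*? stops at the FIRST "digit + underscore".
lazy = re.match(r".*?\d_", text).group(0)

# Hypothetical input with a digit later in the string.
later_digit = re.match(r".*\d_", "002_part1_part2___v1_test").group(0)

print(greedy)
print(lazy)
print(later_digit)
```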

How would I detect superscript for one word if there's no parentheses, but if there are parentheses, for all the contents of them?

I want to detect the two following circumstances, preferably with one regex:
This is a sentence ^that I wrote today.
And:
This is a sentence ^(that I wrote) today.
So basically, if there are parentheses after the caret, I want to match whatever is inside them. Otherwise, I just want to match just the next word.
I'm new to regex. Is this possible without making it too complicated?
\^(\w+|\([\w ]+\))
Options: case insensitive; ^ and $ match at line breaks
Match the character “^” literally «\^»
Match the regular expression below and capture its match into backreference number 1 «(\w+|\([\w ]+\))»
Match either the regular expression below (attempting the next alternative only if this one fails) «\w+»
Match a single character that is a “word character” (letters, digits, etc.) «\w+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Or match regular expression number 2 below (the entire group fails if this one fails to match) «\([\w ]+\)»
Match the character “(” literally «\(»
Match a single character present in the list below «[\w ]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
A word character (letters, digits, etc.) «\w»
The character “ ” « »
Match the character “)” literally «\)»
Created with RegexBuddy
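Here is a small sketch of the pattern applied to both sentences from the question, assuming Python's re module. The function name superscript_targets is hypothetical.

```python
import re

# A literal caret followed by either one word or a parenthesised
# group of words and spaces.
SUPERSCRIPT_RE = re.compile(r"\^(\w+|\([\w ]+\))")


def superscript_targets(text):
    """Return the captured text after each caret."""
    return SUPERSCRIPT_RE.findall(text)


one_word = superscript_targets("This is a sentence ^that I wrote today.")
grouped = superscript_targets("This is a sentence ^(that I wrote) today.")

print(one_word)
print(grouped)
```

In the parenthesised case the capture includes the parentheses themselves; they can be removed afterwards, for example with strip("()").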

Regex for not matching a string

I have a URL:
/ice-cream/stuff/sandwich/banana
I want to write a regular expression that ONLY matches the URL if these conditions are met:
"ice-cream" is in the URL
"sandwich" is in the URL and comes after "ice-cream"
"banana" is NOT in the URL
I tried this:
ice-cream.sandwich.^[(banana)] as well as many others but haven't found the solution.
Any help is appreciated.
Give the regex below a try:
^(?!.*banana.*).*?ice-cream.*?sandwich.*$
OR
^(?!.*banana.*)(?:(?!sandwich).)*ice-cream.*?sandwich.*$
DEMO
Explanation:
^ Asserts that we are at the beginning of the line.
(?!.*banana.*) Negative lookahead that checks whether the line contains the string banana. If it does not, the regex engine carries on matching from the start of the line; otherwise the line is rejected.
(?:(?!sandwich).)* Matches any run of characters that does not contain the string sandwich, so sandwich cannot appear before ice-cream.
ice-cream.*?sandwich.* The string sandwich must come after the string ice-cream.
$ End of the line.
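The three conditions can be exercised with a short sketch. It assumes Python's re module and uses the first of the two patterns above; every URL other than the one from the question is made up for illustration.

```python
import re

# Reject anything containing "banana", then require "ice-cream"
# followed later by "sandwich".
URL_RE = re.compile(r"^(?!.*banana.*).*?ice-cream.*?sandwich.*$")


def url_matches(url: str) -> bool:
    return URL_RE.search(url) is not None


checks = {
    "/ice-cream/stuff/sandwich/banana": url_matches("/ice-cream/stuff/sandwich/banana"),
    "/ice-cream/stuff/sandwich": url_matches("/ice-cream/stuff/sandwich"),
    "/sandwich/stuff/ice-cream": url_matches("/sandwich/stuff/ice-cream"),
    "/banana/ice-cream/sandwich": url_matches("/banana/ice-cream/sandwich"),
    "/ice-cream/stuff": url_matches("/ice-cream/stuff"),
}

for url, ok in checks.items():
    print(url, ok)
```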
Hard to be precise without examples of matches and non-matches, but give this a try:
^(?!.*banana)(?:(?!.*sandwich(?=.*ice-cream))).*ice-cream.*sandwich.*$
Explanation of Regex:
^(?!.*banana)(?:(?!.*sandwich(?=.*ice-cream))).*ice-cream.*sandwich.*$
----------------------------------------------------------------------
Options: Case insensitive; Exact spacing; Dot doesn't match line breaks; ^$ match at line breaks; Default line breaks
Assert position at the beginning of a line «^»
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!.*banana)»
Match any single character that is NOT a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character string “banana” literally «banana»
Match the regular expression below «(?:(?!.*sandwich(?=.*ice-cream)))»
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!.*sandwich(?=.*ice-cream))»
Match any single character that is NOT a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character string “sandwich” literally «sandwich»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=.*ice-cream)»
Match any single character that is NOT a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character string “ice-cream” literally «ice-cream»
Match any single character that is NOT a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character string “ice-cream” literally «ice-cream»
Match any single character that is NOT a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character string “sandwich” literally «sandwich»
Match any single character that is NOT a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert position at the end of a line «$»
Created with RegexBuddy
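Try this one:
.*ice-cream.+sandwich.(?!banana).*
Note that the lookahead here only rejects banana directly after the character that follows sandwich; it does not exclude banana elsewhere in the URL.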

Understanding regex criteria in pattern match

I am trying to determine what the following pattern match criteria allows me to enter:
\s*([\w\.-]+)\s*=\s*('[^']*'|"[^"]*"|[^\s]+)
From my attempt to decipher it (by looking at the regexes I do understand), it seems to say I can start with any character sequence, then I must have a brace followed by alphanumerics, then another sequence followed by braces, one initial single quote, no backslashes, closed by a brace?
Sorry if I have got this completely muddled. Any help is appreciated.
Regards,
Pablo
The square brackets are character classes, and the parens are for grouping. I'm not sure what you mean by "braces".
This basically matches a name=value pair where the name consists of one or more "word", dot or hyphen characters, and the value is either a single-quoted string, a double-quoted string, or a bunch of non-whitespace characters. Single-quoted strings cannot contain a single quote, and double-quoted strings may not contain double quotes (both arguably minor flaws of whatever syntax this is from). There's also arguably some ambiguity, since the last option ("a bunch of non-whitespace characters") could match something starting with a single or double quote.
Also, zero or more whitespaces may appear around the equal sign or at the beginning (that's the \s* bits).
It's looking for strings of text which are basically
<identifier> = <value>
identifier is made up of letters, digits, underscores, '-' and '.'
value can be a single-quoted string, a double-quoted string, or any other sequence of characters (as long as it doesn't contain whitespace).
So it would match lines that look like this:
foo = 1234
bar-bar= "a double-quoted string"
bar.foo-bar ='a single quoted string'
.baz =stackoverflow.com this part is ignored
Some things to note:
There's no way to put a quote inside a quoted string (such as using \" inside "...").
Anything after the quoted string is ignored.
If a quoted string isn't used for value, then everything from the first space onwards is ignored.
Whitespace is optional
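The example lines and notes above can be reproduced with a short sketch, assuming Python's re module (the question does not say which language the pattern comes from). The function name parse_pair is hypothetical.

```python
import re

# The pattern from the question, unchanged.
PAIR_RE = re.compile(r"""\s*([\w\.-]+)\s*=\s*('[^']*'|"[^"]*"|[^\s]+)""")


def parse_pair(line):
    """Return (identifier, value) for a line, or None if it does not match."""
    match = PAIR_RE.match(line)
    return match.groups() if match else None


lines = [
    "foo = 1234",
    'bar-bar= "a double-quoted string"',
    "bar.foo-bar ='a single quoted string'",
    ".baz =stackoverflow.com this part is ignored",
]
parsed = [parse_pair(line) for line in lines]

# An unterminated quote falls through to the third alternative.
unterminated = parse_pair('key = "oops')

for item in parsed:
    print(item)
```

The quotes stay part of the captured value, and an unterminated quote is picked up by the non-whitespace alternative, which is the ambiguity mentioned above.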
RegexBuddy says:
\s*([\w\.-]+)\s*=\s*('[^']*'|"[^"]*"|[^\s]+)
Options: case insensitive
Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.) «\s*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the regular expression below and capture its match into backreference number 1 «([\w\.-]+)»
Match a single character present in the list below «[\w\.-]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
A word character (letters, digits, etc.) «\w»
A . character «\.»
The character “-” «-»
Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.) «\s*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “=” literally «=»
Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.) «\s*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the regular expression below and capture its match into backreference number 2 «('[^']*'|"[^"]*"|[^\s]+)»
Match either the regular expression below (attempting the next alternative only if this one fails) «'[^']*'»
Match the character “'” literally «'»
Match any character that is NOT a “'” «[^']*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “'” literally «'»
Or match regular expression number 2 below (attempting the next alternative only if this one fails) «"[^"]*"»
Match the character “"” literally «"»
Match any character that is NOT a “"” «[^"]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “"” literally «"»
Or match regular expression number 3 below (the entire group fails if this one fails to match) «[^\s]+»
Match a single character that is a “non-whitespace character” «[^\s]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Created with RegexBuddy
Let us break \s*([\w\.-]+)\s*=\s*('[^']*'|\"[^\"]*\"|[^\s]+) apart:
\s*([\w\.-]+)\s*:
\s* means 0 or more whitespace characters
[\w.-]+ means 1 or more of the following characters: A-Za-z0-9_.-
('[^']*'|\"[^\"]*\"|[^\s]+):
Zero or more non-' characters enclosed in ' and '.
Zero or more non-" characters enclosed in " and ".
One or more non-whitespace characters.
So basically, you can mostly ignore the \s*'s in trying to understand the expression, they just handle removing spacing.
Yes, you have got it completely muddled. :P For one thing, there are no braces in that regex; that word usually refers to the curly brackets: {}. That regex only contains square brackets and parentheses (aka round brackets), and they're all regex metacharacters--they aren't meant to match those characters literally. The same goes for most of the other characters.
You might find this site useful. Very good tutorial and reference site for all things regex.