Regex Meaning (Regex Golf) [duplicate] - regex

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 7 years ago.
Regex newbie here, so I was trying this website for fun: https://regex.alf.nu
In particular, I'm concerned about the "Ranges" section here: https://regex.alf.nu/2
I was able to get as far as ^[a-f]+, and couldn't figure out the rest. By accident, I added a $ to get ^[a-f]+$ which was actually the answer.
Trying to wrap my mind around the meaning of this regex. Can someone give the plain English explanation of what's happening here?
It seems to say "a string that starts and ends with one or more of the letters a through f," but that doesn't quite make sense to me. For instance, the word "canjac" seems to satisfy those conditions.
For those who can't see the URL, it's asking me to match these words:
abac
accede
adead
babe
bead
bebed
bedad
bedded
bedead
bedeaf
caba
caffa
dace
dade
daff
dead
deed
deface
faded
faff
feed
But NOT match these:
beam
buoy
canjac
chymia
corah
cupula
griece
hafter
idic
lucy
martyr
matron
messrs
mucose
relose
sonly
tegua
threap
towned
widish
yite

In English it means: match any word that consists only of the letters a through f.

Your pattern, when broken down:
^ asserts position at the start of the string
[a-f]+ matches a single character present in the list below:
    + between one and unlimited times, as many times as possible, giving back as needed
    a-f a single character in the range between a and f (case sensitive)
$ asserts position at the end of the string
You can also see a quick explanation of your patterns on the Regex101 webpage.
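To see why the trailing $ matters, here is a minimal sketch using Python's re module (my choice of language; the site itself only asks for the pattern). The variable names are made up for illustration. Without $, the pattern is satisfied by any leading run of a-f letters, which is why "canjac" and "beam" slip through.

```python
import re

# Anchored at both ends: the whole word must consist of letters a-f.
whole_word = re.compile(r"^[a-f]+$")
# Anchored only at the start: a leading run of a-f letters is enough.
prefix_only = re.compile(r"^[a-f]+")

for word in ["abac", "faded", "beam", "canjac"]:
    # Print whether each pattern finds a match in the word.
    print(word, bool(whole_word.search(word)), bool(prefix_only.search(word)))
```

With only ^, "canjac" matches on its first two letters ("ca"); adding $ forces every character up to the end of the string to be in a-f.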

Related

Regex - How to exclude matches without look-behind? [duplicate]

This question already has answers here:
How to negate specific word in regex? [duplicate]
(12 answers)
Closed 3 years ago.
I'm trying to scan all attributes from a database, searching for specific patterns and ignoring similar ones that I know should not match, but I'm having some problems, as in the example below:
Let's say I'm trying to find Customer Registration Numbers and one of my patterns is this:
.*CRN.*
Then I'm ignoring everything that is not a CRN (like currency and country name) like this:
(CRN)(?!CY|AME)
So far everything is working fine, as lookahead is supported in JavaScript.
The next step is to exclude things like SCRN (screen) for example but look behind (?<!S)(CRN)(?!CY|AME) doesn't work.
Is there any alternative?
Example inputs:
CREDIT_CARD
DISCARD
CARDINALITY
CARDNO
My regex: (?!.*DISCARD.*|.*CARDINALITY.*).*CARD.*
CARDINALITY was removed, but DISCARD is still being considered :(
The regex that you want is:
(?!\b(?:CARDINALITY|DISCARD)\b)(\b\w*CARD\w*\b)
It is important that you are testing the negative lookahead against the entire word and thus we are trying to match (\b\w*CARD\w*\b) rather than just CARD. The problem with the following regex:
(?!(?:CARDINALITY|DISCARD))CARD
is that with the case of DISCARD, when the scan is at the character position where CARD begins, we are past DIS and you would need a negative lookbehind condition to eliminate DISCARD from consideration. But when we are trying to match the complete word as we are in the regex I propose, we are still at the start of the word when we are applying the negative lookahead conditions.
Regex Demo (click on "RUN TESTS")
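For anyone who wants to check this locally, here is a small sketch in Python (the question is about JavaScript, but these particular constructs behave the same way in Python's re module). The helper name kept is made up for illustration. It runs both the asker's pattern and the whole-word pattern against the four example inputs.

```python
import re

inputs = ["CREDIT_CARD", "DISCARD", "CARDINALITY", "CARDNO"]

# The asker's attempt: the lookahead is only checked where the match starts.
original = re.compile(r"(?!.*DISCARD.*|.*CARDINALITY.*).*CARD.*")
# The proposed fix: the lookahead is tested at the start of the whole word.
whole_word = re.compile(r"(?!\b(?:CARDINALITY|DISCARD)\b)(\b\w*CARD\w*\b)")

def kept(pattern):
    """Return the inputs in which the pattern finds a match."""
    return [s for s in inputs if pattern.search(s)]

print(kept(original))    # DISCARD survives: the match simply starts at "ISCARD"
print(kept(whole_word))  # DISCARD and CARDINALITY are both rejected
```

The original pattern fails on DISCARD because the engine is free to start the match one character in, where the lookahead no longer sees the full word. The \b in the second pattern removes that freedom.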

Regex get only sequence of word {word} word {word} word, but not two alike close [duplicate]

This question already has answers here:
Regex to capture: word {word} word
(3 answers)
Closed 3 years ago.
I need to match only sequences of the form word {word} word {word} word (ending with a word, not a {word}), and never two adjacent words (word word) or two adjacent {word} groups ({word} {word}).
I already have this regex: https://regex101.com/r/yI64KQ/13
[RESOLVED]
Thanks to Norbert Incze, in this other question, the final regex is: ([A-zÀ-ú]+([^\S\n]+\{[^}]*\}[^\S\n]+[A-zÀ-ú]+)+)
This worked perfectly! Thanks to everyone who helped me.
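A quick way to sanity-check the final regex is to run it against the shapes described in the question. This is only a sketch in Python: the test strings and the use of fullmatch (requiring the whole string to be one sequence) are my assumptions, not part of the original question, and is_valid is a made-up helper name. One thing to be aware of: the range A-z also covers the six punctuation characters that sit between Z and a in ASCII ([ \ ] ^ _ `).

```python
import re

# [^\S\n]+ means "one or more whitespace characters other than a newline".
sequence = re.compile(r"([A-zÀ-ú]+([^\S\n]+\{[^}]*\}[^\S\n]+[A-zÀ-ú]+)+)")

def is_valid(text):
    """True if the entire string is a word {word} word ... word sequence."""
    return sequence.fullmatch(text) is not None

for text in ["word {word} word",
             "word {word} word {word} word",
             "word word",
             "word {word} {word} word",
             "word {word}"]:
    print(repr(text), is_valid(text))
```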
The group ([^\r\n\t])+ is picking up arbitrary text until the next {, including whitespace and more words and even }. You probably want to remove it.
Inside \{...\}, you accept [^}]*, which means arbitrary characters, while what you probably want is something restrictive like your first definition of the word.
Why the last [...] block? I don't see what it adds given your question.
Cool web site, by the way. To make this answer make sense on its own, this is the RE you had on it when I clicked your link:
([A-zÀ-ú]+(\s+\{[^}]*\}\s+[A-zÀ-ú]+)([^\r\n\t])+)[^\s.:;?!]

Regex not completely matching the desired patterns

I've been trying to write a regex to strictly match the following patterns -
3+, 3 months +, 3+ months, 3m+, 3 years +, 3y+, 3+ years.
I've written a pattern, [0-9]{1,2}\s?(m|months?|y|years?)?\s?\+, which works for most of the cases except 3+ months and 3+ years, where it matches just the 3+ part and not the months or years part. I want to use the matched string elsewhere, so this causes an issue. To accommodate this, I tried adding another group to make the regex look like [0-9]{1,2}\s?(m|months?|y|years?)?\s?\+ (m|months?|y|years?)\s|$, but this also matches 6+ math. Can someone help me with what the issue is here? Also, can this regex be improved?
I can use multiple regexes to cover the different use cases, but I wanted to achieve this using only one regex.
Thanks
UPDATED:
I overlooked the strictness requirement. Something that helps me figure out these problems is breaking the pattern down into a logic tree of sorts, like so:
Only XX months allowed
Only XX years allowed - \d{1,2}
    If ## followed by a +
        It must be followed by \s*(months|years)
    If ## followed by a [my]
        It must be followed by \s*\+
Close it off with $
That gets us on the right track to what we want. Again, it still permits undesired cases, but just revisit that thought exercise, tinker with the conditionals, and try to find common components that restrict the full regex to a neat grouping of stricter patterns.
This should be closer to the strict solution you’re looking for:
\d{1,2}\s*([my]?\+|\+?\s*(months|years)|(months|years)\s*\+?)\s*$
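Here is a rough way to try the stricter pattern in Python against the seven wanted strings from the question plus the 6+ math counterexample. The extra "3 months" input is my own addition to illustrate a case the pattern accepts even though it has no +.

```python
import re

strict = re.compile(
    r"\d{1,2}\s*([my]?\+|\+?\s*(months|years)|(months|years)\s*\+?)\s*$"
)

wanted = ["3+", "3 months +", "3+ months", "3m+", "3 years +", "3y+", "3+ years"]

# Map each test string to whether the pattern finds a match in it.
results = {text: bool(strict.search(text)) for text in wanted + ["6+ math", "3 months"]}

for text, ok in results.items():
    print(repr(text), ok)
```

The trailing $ is what rejects 6+ math: after the +, only whitespace may follow unless months or years comes next.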
===========================
ORIGINAL POST:
Here’s a first pass at a condensed version of what you want:
\d{1,2}[\+my]?\s*(months|years)?\s*\+?
Here’s a breakdown of the approach I took:
(\d{1,2})
^ Accommodate any one- or two-digit number (your approach is fine; \d means any digit 0-9 and saves a few characters)
([\+my]?\s*)
^ The characters following the given number may be m, y, or + followed by any number of spaces.
(months|years)?
^ We’ve accounted for all spaces with the previous piece of regex, so let’s just say there might be months|years at this point.
(\s*\+?)
^ Last potential symbol is a +, but it might have several spaces in front of it.
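To show how permissive this first pass is, here is a small Python sketch. The use of fullmatch and the extra "3 months" input are my assumptions for illustration. Every piece after the digits is optional, so the pattern covers all seven wanted strings but also accepts input with no + at all, and it still finds a match inside 6+ math.

```python
import re

loose = re.compile(r"\d{1,2}[\+my]?\s*(months|years)?\s*\+?")

wanted = ["3+", "3 months +", "3+ months", "3m+", "3 years +", "3y+", "3+ years"]

# Every wanted string is covered in full by the loose pattern.
covers_all = all(loose.fullmatch(text) for text in wanted)

# Because each piece is optional, input with no "+" is accepted too.
accepts_no_plus = loose.fullmatch("3 months") is not None

# An unanchored search still finds a match inside "6+ math".
matches_math = loose.search("6+ math") is not None

print(covers_all, accepts_no_plus, matches_math)
```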
Try the following regex
((?:\d{1,2}\+ (?:months|years))|(?:\d{1,2}\+)|(?:\d{1,2} (?:months|years) \+)|(?:\d{1,2}(?:m|y)\+))
You may use this regex with optional matches:
\d{1,2}\+?(?:\s?(?:months?|years?|[my])\b\s?\+?)?
RegEx Demo
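Since the asker wants to reuse the matched text, here is a sketch of how the optional-group pattern behaves in Python. The helper name extract is hypothetical. The \b after the unit is what stops the m in math from being taken as a month abbreviation, so 6+ math yields just 6+.

```python
import re

optional = re.compile(r"\d{1,2}\+?(?:\s?(?:months?|years?|[my])\b\s?\+?)?")

def extract(text):
    """Return the matched substring, or None if nothing matches."""
    m = optional.search(text)
    return m.group() if m else None

for text in ["3+ months", "3 months +", "3m+", "6+ math"]:
    print(repr(text), "->", repr(extract(text)))
```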

Using just regex is there a way to skip words or chars when using a lookaround? [duplicate]

This question already has answers here:
RegExp exclusion, looking for a word not followed by another
(3 answers)
Closed 4 years ago.
I guess specifically this might be about a negative look ahead.
If I had a sentence in the form of:
this is the WORD i want and I want and this is the PHRASE I DONT WANT
Is there a way to use just regex to match "WORD", but only if "PHRASE" is not present? My initial idea was a negative lookahead, but that only checks the word immediately following. I then tried using (?:\w+(?:\s*[\,\-\'\:\/]\s*|\s+)){0,3} and other similar tricks, but this would match the words in between and not the actual phrase, not to mention the wonkiness of + in lookarounds. Then I thought about using a grouping like [^something], but I didn't know how to do that with full words without a lookaround. I then had the idea to nest lookarounds, which I found out is possible, but that still leaves me with the root of the problem.
Can you skip words in the matching for a lookaround, and if not, how would I go about solving this issue?
Because if I nest using a lookbehind, I still need to skip stuff to get to WORD in order to match it.
Assume the words are arbitrary in the sentence but the key word and the key phrase is something specific.
If I understand your question, you can try:
/(?!.*PHRASE I DONT WANT)WORD/
Demo
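A short Python sketch of how this lookahead behaves. The .* inside the lookahead is what "skips" the words in between. Note that the lookahead only scans forward from WORD, so a phrase that appears before WORD goes unnoticed; the anchored variant below is my own assumption about how one might cover that case, not part of the answer above, and the variable names are made up.

```python
import re

# From the answer: reject WORD if the phrase appears anywhere after it.
after_only = re.compile(r"(?!.*PHRASE I DONT WANT)WORD")
# Assumed variant: check the whole line from its start, then capture WORD.
anywhere = re.compile(r"^(?!.*PHRASE I DONT WANT).*?(WORD)")

with_phrase = "this is the WORD i want and I want and this is the PHRASE I DONT WANT"
without_phrase = "this is the WORD i want"
phrase_first = "the PHRASE I DONT WANT comes before the WORD"

print(after_only.search(with_phrase))     # None: phrase follows WORD
print(after_only.search(without_phrase))  # match object for WORD
print(after_only.search(phrase_first))    # still matches: phrase is behind WORD
print(anywhere.search(phrase_first))      # None: phrase seen from the line start
```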

Regex - How to search for singular or plural version of word [duplicate]

This question already has answers here:
Regex search and replace with optional plural
(4 answers)
Closed 6 years ago.
I'm trying to do what should be a simple Regular Expression, where all I want to do is match the singular portion of a word whether or not it has an s on the end. So if I have the following words
test
tests
EDIT: Further examples. I need this to be possible for many words, not just those two:
movie
movies
page
pages
time
times
For all of them I need to get the word without the s on the end but I can't find a regular expression that will always grab the first bit without the s on the end and work for both cases.
I've tried the following:
([a-zA-Z]+)([s\b]{0,}) - This returns the full word as the first match in both cases
([a-zA-Z]+?)([s\b]{0,}) - This returns 3 different matching groups for both words
([a-zA-Z]+)([s]?) - This returns the full word as the first match in both cases
([a-zA-Z]+)(s\b) - This works for tests but doesn't match test at all
([a-zA-Z]+)(s\b)? - This returns the full word as the first match in both cases
I've been using http://gskinner.com/RegExr/ for trying out the different regex's.
EDIT: This is for a Sublime Text snippet. For those who don't know, a snippet in Sublime Text is a shortcut: I can type, say, the name of my database, hit "run snippet", and it will turn it into something like:
$movies= $this->ci->db->get_where("movies", "");
if ($movies->num_rows()) {
    foreach ($movies->result() AS $movie) {
    }
}
All I need is to turn "movies" into "movie" and auto-insert it into the foreach loop.
Which means I can't just do a find and replace on the text and I only need to take 60 - 70 words into account (it's only running against my own tables, not every word in the english language).
Thanks!
- Tim
Ok I've found a solution:
([a-zA-Z]+?)(s\b|\b)
Works as desired, then you can simply use the first match as the unpluralized version of the word.
Thanks @Jahroy for helping me find it. I added this as an answer for future surfers who just want a solution, but please check out Jahroy's comment for more in-depth information.
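For reference, a minimal Python sketch of how the lazy group behaves on the example words (the snippet itself runs inside Sublime Text, so Python here is just a convenient way to test the pattern; singularize is a made-up helper name). The lazy +? grows the first group one letter at a time until either s\b or \b can match.

```python
import re

singular = re.compile(r"([a-zA-Z]+?)(s\b|\b)")

def singularize(word):
    """Return the first capture group, i.e. the word without a trailing s."""
    m = singular.search(word)
    return m.group(1) if m else word

for word in ["test", "tests", "movies", "pages", "times", "status"]:
    print(word, "->", singularize(word))
```

Be aware that any word ending in s loses it, so a table called status would come out as statu. With only 60 to 70 known table names, that may be acceptable.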
For simple plurals, use this:
test(?=s| |$)
For more complex plurals, you're in trouble using regex. For example, this regex
part(y|i)(?=es | )
will return "party" or "parti", but what you do with that, I'm not sure.
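The two lookahead patterns above can be tried out like this in Python. The sample sentences are made up for illustration. The lookahead checks what follows without including it in the match, which is why the s never ends up in the result.

```python
import re

simple = re.compile(r"test(?=s| |$)")
complex_plural = re.compile(r"part(y|i)(?=es | )")

# The lookahead requires an s, a space, or the end of the string after "test".
print(simple.search("tests").group())    # the s is checked but not consumed
print(simple.search("testing"))          # None: "test" is followed by "i"

# The second pattern returns the stem, which differs between the two forms.
print(complex_plural.search("one party here").group())
print(complex_plural.search("two parties here").group())
```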
Here's how you can do it with vi or sed:
s/\([A-Za-z]\)[sS]$/\1/
That replaces a bunch of letters that end with S with everything but the last letter.
NOTE:
The escape chars (backslashes before the parens) might be different in different contexts.
ALSO:
The \1 (which refers to the first captured group) may also vary depending on context.
ALSO:
This will only work if your word is the only word on the line.
If your table name is one of many words on the line, you could probably replace the $ (which stands for the end of the line) with a wildcard that represents whitespace or a word boundary (these differ based on context).
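The same substitution can be sketched with Python's re.sub, where the parentheses need no backslashes. The two function names are made up for illustration; the second one swaps $ for \b, which is one way to handle a word that is not alone on its line.

```python
import re

def strip_final_s(line):
    """Drop an s or S at the very end of the line, keeping the letter before it."""
    return re.sub(r"([A-Za-z])[sS]$", r"\1", line)

def strip_s_at_word_end(line):
    """Same idea, but at every word boundary instead of only at the line end."""
    return re.sub(r"([A-Za-z])[sS]\b", r"\1", line)

print(strip_final_s("movies"))                  # only word on the line
print(strip_final_s("movies table"))            # unchanged: line does not end in s
print(strip_s_at_word_end("movies and pages"))  # every word ending in s is trimmed
```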