What is the difference between these three regular expressions - regex

What is the main difference between the following three regular expressions?
1) /^[^0-9]+$/
2) /[^0-9]+/
3) m/[^0-9]+/
I am really trying to understand this. Researching online has not helped me much, so I was hoping I could find some help here.

All of them contain [^0-9]+, which matches one or more characters that are not the digits 0 through 9.
The first one, /^[^0-9]+$/, is anchored at the start and end of the string, so it matches only a non-empty string made up entirely of non-digits.
The second one, /[^0-9]+/, is not anchored, so it matches any string that contains at least one non-digit.
The third one, m/[^0-9]+/, is the same as the second but uses the m// match operator explicitly.
For a step-by-step explanation, try the first and second patterns on regex101.com.
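To see the difference concretely, here is a small sketch in Python rather than Perl. That choice is an assumption made purely for illustration: for these two patterns, Python's re.search behaves like Perl's match operator in a boolean test. The names ANCHORED, UNANCHORED and compare are hypothetical.

```python
import re

ANCHORED = re.compile(r"^[^0-9]+$")   # pattern 1: the whole string must be non-digits
UNANCHORED = re.compile(r"[^0-9]+")   # patterns 2 and 3: a run of non-digits anywhere


def compare(s):
    """Return (pattern 1 matches, patterns 2/3 match) for the string s."""
    return (ANCHORED.search(s) is not None, UNANCHORED.search(s) is not None)


for sample in ["abc", "ab3c", "123", ""]:
    print(repr(sample), compare(sample))
```

Only "abc" satisfies pattern 1. "ab3c" fails it because of the digit but still satisfies pattern 2, and neither pattern matches "123" or the empty string.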

There's a difference between a regular expression and the match operator which takes a regular expression as its operand.
You only have two regular expressions there: ^[^0-9]+$ and [^0-9]+. Option 3 uses the same regex as option 2, but writes the match operator in its explicit m// form.
The difference between 1 and 2 is that 1 is anchored at the start and the end of the string, whereas 2 isn't anchored at all.
So 1 says "match the start of the string, followed by one or more non-digits, followed by the end of the string". 2 says "match one or more non-digits anywhere in the string".
Does that help at all?

The pattern [^0-9] is common to these three regexes, and matches any single character that is not a decimal digit.
/^[^0-9]+$/
This anchors the pattern to the beginning and end of the string, and insists that the string consist entirely of one or more non-digit characters.
The circumflex ^ is a zero-width anchor that matches the beginning of the string.
The dollar sign $ is also a zero-width anchor; it matches either at the end of the string or before a newline character if that newline is the last character in the string. So this will match "aaa" and "aaa\n" but not "aa7bb\n".
/[^0-9]+/
This has no anchors, and so will return true if the string contains at least one non-digit character anywhere.
It will match "12x345" and fail to match "12345". Note that a trailing newline counts as a non-digit character, so this pattern will match "123\n".
m/[^0-9]+/
This is identical to #2, but with the m written explicitly. The m is unnecessary if you use the default slashes as delimiters, but it lets you pick a different delimiter, which is convenient when the pattern itself contains slashes, such as a file path.
Using m lets you choose your own delimiter; for example, m{/my/path} instead of /\/my\/path/.
In essence, #1 asks whether the string is wholly composed of non-digit characters, while #2 and #3 are identical and test whether the string contains at least one non-digit character.
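The edge cases above can be checked with a short Python sketch. Python's re is used here as an assumption of convenience; like Perl, its $ also matches just before a final newline. The function names are hypothetical.

```python
import re


def whole_string_non_digits(s):
    # Pattern 1; note that [^0-9] itself matches "\n", since a newline is a non-digit
    return re.search(r"^[^0-9]+$", s) is not None


def contains_non_digit(s):
    # Patterns 2 and 3: succeeds if any non-digit appears anywhere
    return re.search(r"[^0-9]+", s) is not None


print(whole_string_non_digits("aaa"), whole_string_non_digits("aaa\n"),
      whole_string_non_digits("aa7bb\n"))  # True True False
print(contains_non_digit("12x345"), contains_non_digit("12345"),
      contains_non_digit("123\n"))  # True False True
```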

Related

Positive and Negative Lookahead on matching strings with two or more same consecutive characters [duplicate]

I can very easily write a regular expression to match a string that contains 2 consecutive repeated characters:
/(\w)\1/
How do I do the complement of that? I want to match strings that don't have 2 consecutive repeated characters. I've tried variations of the following without success:
/(\w)[^\1]/ ;doesn't work as hoped
/(?!(\w)\1)/ ;looks ahead, but some portion of the string will match
/(\w)(?!\1)/ ;again, some portion of the string will match
I don't want any language/platform specific way to take the negation of a regular expression. I want the straightforward way to do this.
The regex below matches strings that do not contain two consecutive identical word characters:
^(?!.*(\w)\1).*
(?!.*(\w)\1) is a negative lookahead which asserts that the string about to be matched does not contain a doubled character: .*(\w)\1 would find one at the start, in the middle, or at the end. So ^(?!.*(\w)\1) matches every starting position except one followed by a doubled character, and the trailing .* then matches all the characters on that line. Note that this also matches empty strings. If you don't want to match empty lines, change the final .* to .+.
Note that ^(?!(\w)\1) checks for the repeated characters only at the start of a string or line.
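A quick way to try the lookahead answer is the Python sketch below. Python is an assumption made for illustration; its re module supports the same (?!...) and \1 syntax. has_no_doubles is a hypothetical helper name, and the input is assumed to be a single line, since . does not match a newline.

```python
import re

# ^ then a negative lookahead: fail if any word character is immediately repeated
NO_DOUBLES = re.compile(r"^(?!.*(\w)\1).*")


def has_no_doubles(s):
    """True if s contains no two identical word characters in a row."""
    return NO_DOUBLES.match(s) is not None


for word in ["abc", "hello", "a-a", ""]:
    print(repr(word), has_no_doubles(word))
```

As the answer notes, the empty string is accepted; switching the final .* to .+ would reject it.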
Lookahead and lookbehind, collectively called "lookaround", are zero-length assertions just like the start and end of line. They do not consume characters in the string, but only assert whether a match is possible or not. Lookaround allows you to create regular expressions that are impossible to create without them, or that would get very longwinded without them.

Perl code understanding

I am new to Perl and have been trying to understand the code below:
if ( $nextvalue !~ /^.+"[^ ]+ \/cs\/.+\sHTTP\/[1-9]\.[0-9]"|\/\/|\/Images\/fold\/1.jpg|\/busines|\/Type= OPTIONS|\/203.176.111.126/)
Can you please help me understand what the above is meant to do?
The condition is true when $nextvalue does NOT match the following regular expression.
The regular expression matches if the string
either
starts with at least one character,
followed by double quote sign ("),
followed by at least one character that is not a space,
followed by a single space,
followed by the string "/cs/",
followed by at least one character,
followed by whitespace and the string "HTTP/",
followed by one digit from 1 to 9 inclusive,
followed by a dot,
followed by one digit from 0 to 9,
followed by a double quote mark (")
or contains two forward slashes (//)
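or contains the substring "/Images/fold/1.jpg"
or contains the substring "/busines"
or contains the substring "/Type= OPTIONS"
or contains the substring "/203.176.111.126"
(The dots in the last pattern and in "1.jpg" are not escaped, so each of them matches any character, not only a literal dot.)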
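To experiment with the condition outside Perl, here is a rough Python translation. This is an assumption for illustration: the \/ escapes are dropped because a Python pattern has no / delimiter, and perl_condition is a hypothetical name. It returns True exactly when Perl's $nextvalue !~ /.../ would be true.

```python
import re

# The same six alternatives as the Perl regex; ^ applies only to the first one
SKIP = re.compile(
    r'^.+"[^ ]+ /cs/.+\sHTTP/[1-9]\.[0-9]"'
    r"|//"
    r"|/Images/fold/1.jpg"
    r"|/busines"
    r"|/Type= OPTIONS"
    r"|/203.176.111.126"
)


def perl_condition(nextvalue):
    """Mimic `$nextvalue !~ /.../`: True when the regex finds no match."""
    return SKIP.search(nextvalue) is None


print(perl_condition('1.2.3.4 - - "GET /cs/index.html HTTP/1.1" 200'))  # False
print(perl_condition("/images/ok.png"))  # True
print(perl_condition("/203x176x111x126"))  # False: the unescaped dots match any character
```

The sample strings are made up; they are not taken from the original question.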
Whenever I am unsure what some cryptic regular expression does, I turn to Debuggex:
^.+"[^ ]+ \/cs\/.+\sHTTP\/[1-9]\.[0-9]"|\/\/|\/Images\/fold\/1.jpg|\/busines|\/Type= OPTIONS|\/203.176.111.126
Debuggex renders the expression as a railroad diagram: every string that has a substring fitting the description along any of the grey tracks will match your regex. Because your condition uses !~, meaning "does not match", those strings will fail the check.
Debuggex certainly has issues (for example, it shows ^ without explaining it, so you have to know that it means the beginning of the string; the same goes for dots and other metacharacters, and whitespace shows up as underscores), but it helps in understanding the structure of the expression and may give you an idea of what the author had in mind.

Regular expression lets periods (.) in

My regular expression lets in periods for some reason. How can I keep that from happening?
Rules:
4-15 characters
Any alphanumeric characters
Underscore as long as it's not first or last
[A-Za-z][A-Za-z0-9_]{3,14}
I don't want "bad.example" to work.
Edit: changed to 4-15 characters
Your regex matches example as a substring of bad.example. Use anchors to prevent that:
^[A-Za-z][A-Za-z0-9_]{1,12}[A-Za-z]$
Note that (like your regex) this regex also prevents digits from matching in the first and last position - if they should be allowed (as per your specs), just add 0-9 at the end of the character classes.
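Here is a Python sketch of the anchoring fix (Python and the names ORIGINAL, USERNAME and is_valid_username are assumptions for illustration). It follows the edited spec of 4 to 15 characters and, per the note above, allows digits at both ends, so the middle quantifier is {2,13}. It uses re.fullmatch, which requires the entire string to match; this has the same effect as wrapping the pattern in ^ and $, except that it also rejects a trailing newline.

```python
import re

ORIGINAL = re.compile(r"[A-Za-z][A-Za-z0-9_]{3,14}")  # the question's unanchored regex
# 1 alphanumeric + 2..13 letters/digits/underscores + 1 alphanumeric = 4..15 characters
USERNAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_]{2,13}[A-Za-z0-9]")


def is_valid_username(name):
    return USERNAME.fullmatch(name) is not None


print(ORIGINAL.search("bad.example").group())  # example  (a partial match)
print(is_valid_username("bad.example"))        # False
print(is_valid_username("good_name1"))         # True
```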
Try this:
^[A-Za-z][A-Za-z0-9_]{3,14}$
This will match any alphanumeric at the beginning and end. In the middle it will accept from one up to twelve alphanumerics including an underscore:
^[a-zA-Z\d]\w{1,12}[a-zA-Z\d]$
Your regex does not match bad.example as a whole; it matches only the substring example, because your pattern allows 4 to 15 characters anywhere in the string. See here:
http://regex101.com/r/xV4eL5/5
To prevent that, you need to match the whole input rather than allow partial matches: put a ^ start anchor and a $ end anchor around the pattern.
Use
\A[A-Za-z0-9][\w]{1,12}[A-Za-z0-9]\Z
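One caveat worth checking: in Perl and PCRE, \Z (like $) also matches just before a final newline, whereas Python's \Z matches only at the very end of the string (the same as Perl's \z). The Python sketch below shows the difference with the pattern from this answer; the names BODY, dollar_anchored and strictly_anchored are hypothetical.

```python
import re

BODY = r"[A-Za-z0-9]\w{1,12}[A-Za-z0-9]"


def dollar_anchored(s):
    # ^...$ : $ also matches just before a single trailing newline
    return re.search("^" + BODY + "$", s) is not None


def strictly_anchored(s):
    # \A...\Z : in Python, \Z matches only at the very end of the string
    return re.search(r"\A" + BODY + r"\Z", s) is not None


print(dollar_anchored("abcd\n"), strictly_anchored("abcd\n"))  # True False
print(strictly_anchored("abcd"), strictly_anchored("bad.example"))  # True False
```

In Perl itself, \z is the anchor that rejects the trailing newline.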

Regular Expression needed for a Specific ID

I need to create a regular expression that matches an ID that has a specific format. The ID always begins with "OR" followed by 4 digits, then a dash, then another number that can be of any length. Examples of valid matches are:
OR1581-2
OR0057-101
OR0000-5312
OR3450-17371
Thanks!
Try ^OR\d{4}-\d+$.
The ^ matches the beginning of the string or line.
OR is not a special sequence and will match only those two characters in order.
\d matches any digit, and {4} requires the preceding token (the digit) to occur exactly four times.
- is not a special character here and matches only the hyphen.
\d matches any digit again, and + requires the preceding token (the digit) to occur one or more times.
$ matches the end of the string or line.
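The breakdown above can be tried with a short Python sketch (Python and the names ID_PATTERN and is_valid_id are assumptions for illustration). One design note: in Python 3, \d on a str pattern also matches non-ASCII decimal digits such as Arabic-Indic ones, so the sketch passes re.ASCII to restrict \d to 0-9; writing [0-9] would work too.

```python
import re

ID_PATTERN = re.compile(r"^OR\d{4}-\d+$", re.ASCII)  # re.ASCII: \d means [0-9] only


def is_valid_id(s):
    return ID_PATTERN.search(s) is not None


for candidate in ["OR1581-2", "OR0057-101", "OR0000-5312", "OR3450-17371",
                  "OR158-2", "XOR1581-2", "OR1581-"]:
    print(candidate, is_valid_id(candidate))
```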
If you need to find such an ID inside a string that also contains other text, use
\bOR\d{4}-\d+\b
However, if you need to verify that the input is in exactly this format, with no other text around it, go with
^OR\d{4}-\d+$
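For the search case, findall returns every ID embedded in a longer text. A Python sketch (Python, the sample text and the name ID_IN_TEXT are made up for illustration):

```python
import re

# \b on both sides: the ID must not touch other word characters (letters, digits, _)
ID_IN_TEXT = re.compile(r"\bOR\d{4}-\d+\b")

text = "Orders OR1581-2 and OR0057-101 shipped; XOR1234-5 is not an ID."
print(ID_IN_TEXT.findall(text))  # ['OR1581-2', 'OR0057-101']
```

XOR1234-5 is skipped because there is no word boundary between X and O.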

What would be the regex pattern for a set of numbers separated with a comma

The possible values are...
1 (it will always start with a number)
1,2
4,6,10
You can try something like this:
^[0-9]+(,[0-9]+)*$
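A minimal Python check of an anchored version (Python and the names NUMBER_LIST and is_number_list are assumptions for illustration; re.fullmatch plays the role of the ^ and $ anchors):

```python
import re

NUMBER_LIST = re.compile(r"[0-9]+(?:,[0-9]+)*")


def is_number_list(s):
    # fullmatch: the whole string must be digits separated by single commas
    return NUMBER_LIST.fullmatch(s) is not None


for s in ["1", "1,2", "4,6,10", "1,", ",1", "1,,2", "1,2a"]:
    print(repr(s), is_number_list(s))
```

Without an end anchor, a plain search would also accept "1,2a", because it is satisfied by the prefix "1,2".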
This should do it:
(\d+,?)+
This will do:
-?[0-9]+(,-?[0-9]+)*
Or, if you want to be pedantic and disallow numbers starting with 0 (other than 0 itself):
(0|-?[1-9][0-9]*)(,(0|-?[1-9][0-9]*))*
Floating-point numbers are left as an exercise to the reader.
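The pedantic variant can be checked the same way. In this Python sketch (an assumption for illustration, with the hypothetical names ITEM, STRICT_LIST and is_strict_list), the group of further items is repeated with *, so a single number such as 1 is accepted as the question requires:

```python
import re

# One item: 0 itself, or an optional minus sign and a number without a leading zero
ITEM = r"(?:0|-?[1-9][0-9]*)"
STRICT_LIST = re.compile(ITEM + r"(?:," + ITEM + r")*")


def is_strict_list(s):
    return STRICT_LIST.fullmatch(s) is not None


for s in ["1", "4,6,10", "0,-3,25", "01,2", "-0", "1,2,"]:
    print(repr(s), is_strict_list(s))
```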
You'll want
(?<=(?:,|^))\d+(?=(?:$|,))
RegexBuddy explains it as:
Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=(?:,|^))»
   Match the regular expression below «(?:,|^)»
      Match either the regular expression below (attempting the next alternative only if this one fails) «,»
         Match the character "," literally «,»
      Or match regular expression number 2 below (the entire group fails if this one fails to match) «^»
         Assert position at the start of the string «^»
Match a single digit 0..9 «\d+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=(?:$|,))»
   Match the regular expression below «(?:$|,)»
      Match either the regular expression below (attempting the next alternative only if this one fails) «$»
         Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
      Or match regular expression number 2 below (the entire group fails if this one fails to match) «,»
         Match the character "," literally «,»
I would explain it as: "match any run of digits, confirming that it is preceded either by the start of the string or by a comma, and followed either by the end of the string or by a comma".
Using non-capturing groups (?:...) instead of plain (...) avoids saving captures you never use, which can help performance a little.
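Python's re module only accepts fixed-width lookbehinds, and it rejects (?<=(?:,|^)) because its two alternatives have different widths. A sketch of an equivalent pattern for Python moves ^ out of the lookbehind. This rewrite is an assumption on my part; engines with variable-length lookbehind, such as .NET, may accept the original as written. ITEM_IN_LIST is a hypothetical name.

```python
import re

# Either at the start of the string or right after a comma; then digits
# followed by a comma or the end of the string
ITEM_IN_LIST = re.compile(r"(?:^|(?<=,))\d+(?=$|,)")

print(ITEM_IN_LIST.findall("4,6,10"))   # ['4', '6', '10']
print(ITEM_IN_LIST.findall("4,x6,10"))  # ['4', '10']
```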