I am using a regular expression to validate a number followed by a fraction. I found these, and they match what I need. Overall, I want to match one or two digits followed by the fraction. How are these expressions different?
/^[0-9]+(?:[\xbc\xbd\xbe])$/ugm
/^\d+(?:[\xbc\xbd\xbe])$/ugm
/^\w+(?:\w+)$/ugm
I need to match the following:
12½
1¼
11¾
but not match..
111½
11111¼
0¾
To begin with, [0-9] matches exactly the characters 0 to 9 and is not necessarily the same as \d.
\d matches the digits 0-9 and, in Unicode mode, digit characters from other scripts as well.
\w matches any word character (letter, number, or underscore)
Although the given expressions may all match your sample input, the 3rd one will eventually let bad input through.
It will match a string like foobar, which contains no 0-9 characters and no Unicode fractions.
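A minimal sketch of the difference, assuming Python 3's re module rather than the PCRE flavor used in the question (the sample strings are made up for illustration). In Python 3, \d in a str pattern matches any Unicode decimal digit unless re.ASCII is given:

```python
import re

# Hypothetical sample: the Arabic-Indic digits three and four (U+0663, U+0664).
arabic_digits = "\u0663\u0664"

# [0-9] only ever matches the ten ASCII digits.
ascii_class = re.fullmatch(r"[0-9]+", arabic_digits)

# \d is Unicode-aware by default for Python 3 str patterns.
unicode_digit = re.fullmatch(r"\d+", arabic_digits)

# re.ASCII restricts \d to [0-9].
ascii_digit = re.fullmatch(r"\d+", arabic_digits, re.ASCII)

# \w accepts letters too, so the third pattern accepts a string with no digits.
word_only = re.fullmatch(r"\w+(?:\w+)", "foobar")

print(ascii_class is None, unicode_digit is not None,
      ascii_digit is None, word_only is not None)
```

Other engines decide differently when \d becomes Unicode-aware, so it is worth checking the flavor you actually run.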
In a quick benchmark, your 2nd expression was about 16% slower than your first, and it also matches Unicode and other digit characters.
I would stick with your first expression and change it to match one or two digits, the first of which is not zero.
/^[1-9][0-9]?(?:[\xbc\xbd\xbe])$/ugm
or even
/^[1-9][0-9]?(?:[\xbc-\xbe])$/ugm
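As a quick check of the range form against the samples from the question, here is a small sketch in Python (an assumption; the question itself uses PCRE-style delimiters and flags). The helper name is_valid is hypothetical:

```python
import re

# One digit 1-9, an optional second digit 0-9,
# then one of the fractions U+00BC..U+00BE (¼ ½ ¾).
FRACTION = re.compile(r"[1-9][0-9]?[\xbc-\xbe]")


def is_valid(text):
    """Return True when the whole string is 1-2 digits plus a fraction."""
    return FRACTION.fullmatch(text) is not None


should_match = ["12½", "1¼", "11¾"]
should_fail = ["111½", "11111¼", "0¾"]

for sample in should_match + should_fail:
    print(sample, is_valid(sample))
```

fullmatch stands in for the ^ and $ anchors here, so no multiline flag is needed when each candidate is tested on its own.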
Try the following:
^[1-9][0-9]?[\xbc\xbd\xbe]$
[0-9] and \d are equivalent here (they differ only when Unicode matching makes \d accept digits from other scripts). \w matches a "word" character. The expression [1-9] matches a digit which is not zero (since your examples exclude a leading zero).
This unattractively hard-codes for some legacy 8-bit character set; for future compatibility, you should consider switching to Unicode.
You can try
/^[1-9][0-9]?(?:[\xbc\xbd\xbe])$/ugm
Related
I have the following regular expression for capturing positive & negative time offsets.
\b(?<sign>[\-\+]?)(?<hours>2[1-3]|[01][0-9]|[1-9]):(?<minutes>[0-5]\d)\b
It matches fine but the leading sign doesn't appear in the capture group. Am I formatting it wrong?
You can see the effect here https://regex101.com/r/CQxL8q/1/
That is because of the first \b. The \b word boundary does not match between the start of the string (or a newline) and a - or + (i.e. a non-word char).
You need to move the word boundary after the optional sign group:
(?<sign>[-+]?)\b(?<hours>2[1-3]|[01][0-9]|[1-9]):(?<minutes>[0-5][0-9])\b
              ^^
See the regex demo.
Now, since the char following the word boundary is a digit (a word char), the word boundary works as intended: it fails every match where that digit is preceded by another word char.
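The effect of moving the boundary can be sketched in Python, with the caveat that Python spells named groups (?P<name>...) rather than (?<name>...); the variable names and sample strings are made up for illustration:

```python
import re

# Shared tail of both patterns, using the hours fragment from the question.
HOURS_MINUTES = r"(?P<hours>2[1-3]|[01][0-9]|[1-9]):(?P<minutes>[0-5][0-9])\b"

# Word boundary before the sign (the question's version).
original = re.compile(r"\b(?P<sign>[-+]?)" + HOURS_MINUTES)

# Word boundary after the sign (the suggested fix).
moved = re.compile(r"(?P<sign>[-+]?)\b" + HOURS_MINUTES)

for text in ["-13:21", "+05:30", "13:21"]:
    before = original.search(text).group("sign")
    after = moved.search(text).group("sign")
    print(text, repr(before), repr(after))
```

With the boundary in front, the match can only start at the first digit, so the sign group is always empty; with the boundary after the sign, the group captures - or + when present.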
The word boundary anchor (\b) matches at the transition from a word character (letter, digit or underscore) to a non-word character, or vice versa. There is no such transition at the start of -13:21, because the string begins with a non-word character.
The word boundary anchor can stay between the sign and the hours to avoid matching inside expressions that merely look similar to a time (65401:23), but it cannot prevent a match in 654:01:23 or 654-01:23.
As a side note [\-\+] is just a convoluted way to write [-+]. + does not have any special meaning inside a character class, there is no need to escape it. - is a special character inside a character class but not when it is the first or the last character (i.e. [- or -]).
Another remark: you use both [0-9] and \d in your regex. They denote the same thing [1] but, for readability, it is better to stick to one convention. Since the regex already uses other character classes that contain only digits, I would use [0-9] and not \d.
There are also some bugs in the regex fragment for hours: 2[1-3]|[01][0-9]|[1-9] does not match 0 (although it matches 00), and it does not match 20.
Given all the above corrections and improvements, the regex should be:
(?<sign>[-+]?)\b(?<hours>2[0-3]|[01][0-9]|[0-9]):(?<minutes>[0-5][0-9])\b
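A small Python sketch that compares the two hours fragments, again assuming Python's (?P<name>...) group syntax; the build helper and the sample inputs are hypothetical:

```python
import re

BUGGY_HOURS = r"2[1-3]|[01][0-9]|[1-9]"
FIXED_HOURS = r"2[0-3]|[01][0-9]|[0-9]"


def build(hours):
    """Assemble the full offset pattern around an hours fragment."""
    return re.compile(
        r"(?P<sign>[-+]?)\b(?P<hours>" + hours + r"):(?P<minutes>[0-5][0-9])\b"
    )


buggy = build(BUGGY_HOURS)
fixed = build(FIXED_HOURS)

for text in ["0:15", "20:00", "65401:23", "654:01:23"]:
    old = buggy.search(text)
    new = fixed.search(text)
    print(text, old.group(0) if old else None, new.group(0) if new else None)
```

The last two inputs illustrate the limitation mentioned earlier: the boundary rejects 65401:23 but still finds 01:23 inside 654:01:23.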
[1] \d is the same as [0-9] when the Unicode flag is not set. When Unicode is enabled, \d also matches the digits of non-Latin scripts.
I'm not exactly a pro when it comes to regex and I have a PHP script that runs things through this regex:
^[\d\D]{1,}$
What is this supposed to do? It seems to match everything.
\d matches any digit
\D matches any non-digit.
[\d\D] matches any single character that is either a digit or a non-digit, i.e. any character at all.
{1,} requires the preceding [] class to match at least 1 time (with no upper limit).
So it matches any string with at least 1 character in it.
Reference: http://www.regular-expressions.info/reference.html
In short all that regex is doing is this:
^.+$
Which means match one or more of any character (digit or non-digit), with the caveat that . does not match newlines by default.
^[\d\D]{1,}$ will match a string which contains one or more {1,} of any digit \d or non-digit \D character including newline characters.
In contrast ^.+$ will match a string containing one or more of any character except newlines. If the singleline modifier was added to the regex, i.e. /^.+$/s then the . would also match any character including newlines.
[\d\D] is equivalent to using . in singleline mode, although more commonly [\s\S] is used with the same result.
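The newline difference can be sketched in Python, where re.DOTALL plays the role of the /s (singleline) modifier; the sample text is made up:

```python
import re

text = "line one\nline two"  # hypothetical input containing a newline

# [\d\D] accepts every character, including the newline.
any_char_class = re.search(r"^[\d\D]{1,}$", text)

# Without DOTALL, . stops at the newline, so the whole string cannot match.
dot_default = re.search(r"^.+$", text)

# With DOTALL, . behaves like [\d\D] or [\s\S].
dot_all = re.search(r"^.+$", text, re.DOTALL)
space_class = re.search(r"^[\s\S]+$", text)

# At least one character is required, so the empty string fails.
empty = re.search(r"^[\d\D]{1,}$", "")

print(any_char_class is not None, dot_default is None,
      dot_all is not None, space_class is not None, empty is None)
```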
+ is equivalent to {1,}.
The regex will match the whole of any string that contains at least one character of any kind.
You are right. In fact it matches anything that is at least one character long, but in an overcomplicated and pointless way. [\d\D] is equivalent to . (in singleline mode) and {1,} is equivalent to +
What is the meaning of this regex? [a-zA-Z]|\d
I know that [a-zA-Z] means all of the characters a to z and A to Z, but what is the meaning of \d?
\d is a digit character. Your code means "any alphabetic or numeric character". It could more easily be expressed as [A-Za-z0-9].
\d just means a digit character, it is equivalent to [0-9].
Here's a good reference: http://www.regular-expressions.info/reference.html
In most regex flavors, \d means any numeric digit, and is the same as [0-9].
Your regex as a whole means "match either a single letter of the alphabet, or a single digit."
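A short Python sketch of the equivalence between the alternation and the single character class. re.ASCII is passed because Python's \d would otherwise also accept non-ASCII digits; the sample string is an assumption:

```python
import re
import string

alternation = re.compile(r"[a-zA-Z]|\d", re.ASCII)
single_class = re.compile(r"[A-Za-z0-9]")

# Both patterns should agree on every printable ASCII character.
agree = all(
    bool(alternation.fullmatch(ch)) == bool(single_class.fullmatch(ch))
    for ch in string.printable
)

# Each match is one letter or one digit; "_" and "!" are skipped.
matches = alternation.findall("a1_B2!")

print(agree, matches)
```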
\d matches any digit.
\d matches any digit (i.e. 0-9).
See for example regular expression list
\d means digit and is synonymous with [0-9]. As I type this I see this question is answered twice more, and I bet with the same information.
My favorite books on regex are
http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/1565922573
and
http://www.amazon.com/Beginning-Regular-Expressions-Programmer/dp/0764574892/ref=sr_1_1?s=books&ie=UTF8&qid=1305497415&sr=1-1
Regular expressions are such a powerful thing to master.
Depending upon what your goal is, you might be able to replace that with \w which is a "word character" i.e. any letter, digit or the underscore character.
What does \d+ mean in a regular expression?
\d is a digit (a character in the range [0-9]), and + means one or more times. Thus, \d+ means match one or more digits.
For example, the string "42" is matched by the pattern \d+.
You can also find explanations for pieces of regular expressions like this using a tool like Regex101 (online, free) or Regex Coach (downloadable for Windows, free) that will let you enter a regular expression and sample text, then indicate what (if anything) matches the regex. They also try to explain, in words, what the regular expression does.
\d is called a character class and will match digits. It is equal to [0-9].
+ matches 1 or more occurrences of the preceding token.
So \d+ means match 1 or more digits.
\d means 'digit'. + means, '1 or more times'. So \d+ means one or more digit. It will match 12 and 1.
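For instance, a minimal Python sketch (the sample sentence is made up):

```python
import re

text = "Room 12 has 1 window"  # hypothetical sample text

# Each run of one or more consecutive digits is one match.
numbers = re.findall(r"\d+", text)

# The whole string "42" is a single run of digits.
whole = re.fullmatch(r"\d+", "42")

# No digits at all means no match.
no_digits = re.search(r"\d+", "none here")

print(numbers, whole is not None, no_digits is None)
```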
\d is a digit, + is 1 or more, so a sequence of 1 or more digits
^(?!-)[a-z\d\-]{1,100}$
Here's an explanation using regex comment mode, so this expanded form can itself be used as a regex:
(?x) # flag to enable comment mode
^ # start of line/string.
(?!-) # negative lookahead for literal hyphen (-) character, so fails if the next position contains one.
[a-z\d\-] # character class matches a single alpha (a-z), digit (\d) or hyphen (\-).
{1,100} # match the above [class] up to 100 times, at least once.
$ # end of line/string.
In short, it matches up to 100 lowercase alphanumerics or hyphens, but the first character must not be a hyphen.
Could be attempting to validate a serial number, or similar, but it's too general to say for sure.
Not all regex engines support negative lookaheads. If you're trying to figure out what it is doing in order to adapt for an engine without negative lookaheads, you can use:
^[a-z\d][a-z\d-]{0,99}$
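To check that the lookahead-free form behaves the same, here is a sketch in Python, using re.VERBOSE as the counterpart of (?x) comment mode. The sample strings are assumptions chosen to cover the edge cases:

```python
import re

with_lookahead = re.compile(r"""
    ^                    # start of string
    (?!-)                # the next character must not be a hyphen
    [a-z\d\-]{1,100}     # 1 to 100 lowercase letters, digits or hyphens
    $                    # end of string
""", re.VERBOSE)

# Same rule without a lookahead: first character, then 0 to 99 more.
without_lookahead = re.compile(r"^[a-z\d][a-z\d-]{0,99}$")

samples = ["abc-123", "a-", "-abc", "", "ABC", "a" * 100, "a" * 101]

agree = all(
    bool(with_lookahead.search(s)) == bool(without_lookahead.search(s))
    for s in samples
)

accepted = [s for s in samples if without_lookahead.search(s)]
print(agree, [len(s) for s in accepted])
```

Agreement on a handful of samples is only a sanity check, not a proof, but the two forms are built to accept the same strings: the first character class simply omits the hyphen.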
(?!-) == negative lookahead
The pattern reads: start of line, not followed by a -, then 1 to 100 characters, each of which can be a-z, 0-9 or -, followed by the end of the line. Depending on the regex flavor, the \d in the character class may be wrong: in a flavor that does not support shorthand classes inside brackets (POSIX bracket expressions, for example), it is read as a backslash and a literal d, and the d is already covered by a-z, so the digits should be written as 0-9.
A string of letters, digits and dashes. Between 1 and 100 characters. The first character is not a dash.