How to exclude comma (,) in regex? - regex

I have a scenario where I only want to accept the digits 0-9 and the dot (.). For that I used this regex:
[0-9.]$
This regex accepts 0-9 and . (dot for decimal). But when I write something like this
1,1
It also accepts comma (,). How can I avoid this?

Since you are looking for a way to parse numbers (you said the dot is for decimals), you probably don't want the string to start or end with a dot, and it should contain at most one dot. If that is your case, try:
^(\d+\.?\d+|\d)$
where:
\d+ stands for any digit (one or more)
\.? stands for zero or one of literal dot
\d stands for any digit (just one)
You can see it working here
Or maybe you'd like to accept strings that start with a dot, which are commonly read as having 0 as the integer part. In that case you can use ^\d*\.?\d+$.
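As a quick check of the two patterns from this answer, here is a minimal Python sketch. The helper name accepts and the sample strings are illustrative assumptions, not part of the original answer.

```python
import re

# Patterns copied from the answer above.
STRICT = re.compile(r'^(\d+\.?\d+|\d)$')      # no leading or trailing dot
LEADING_DOT_OK = re.compile(r'^\d*\.?\d+$')   # ".5" is allowed

def accepts(pattern, text):
    """Return True if the anchored pattern matches the string."""
    return pattern.search(text) is not None

# Hypothetical sample inputs, including the "1,1" from the question.
samples = ['1', '12', '1.5', '.5', '5.', '1,1', '1.2.3']
for s in samples:
    print(f'{s!r:8} strict={accepts(STRICT, s)} '
          f'leading_dot_ok={accepts(LEADING_DOT_OK, s)}')
```

Both patterns reject 1,1 because the comma is not covered by \d or \. and the anchors force the whole string to be checked.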

This regex [0-9.]$ consists of a character class that matches a digit or a dot at the end of a line $.
If you only want to match a digit or a dot you could add ^ to assert the position at the start of a line:
^[0-9.]$
If you want to match one or more digits, a dot and one or more digits you could use:
^[0-9]+\.[0-9]+$
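To see why the original pattern lets 1,1 through, the three patterns discussed in this answer can be compared side by side. This is a small Python sketch; the helper name matches is an assumption made for the example.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found somewhere in the text."""
    return re.search(pattern, text) is not None

# Only the last character is checked, so the comma is never looked at.
print(matches(r'[0-9.]$', '1,1'))            # True

# Anchored at both ends, but the class matches exactly one character.
print(matches(r'^[0-9.]$', '1,1'))           # False
print(matches(r'^[0-9.]$', '7'))             # True

# One or more digits, a dot, one or more digits.
print(matches(r'^[0-9]+\.[0-9]+$', '1.1'))   # True
print(matches(r'^[0-9]+\.[0-9]+$', '1,1'))   # False
```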

This regex may help you:
/[0-9.]+/g
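It matches one or more characters that are digits (0 to 9) or a dot (.). Note that it is not anchored, so it finds such runs anywhere in the input.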
Explanation:
Match a single character present in the list below [0-9.]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
0-9 a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
. matches the character . literally (case sensitive)
You can test it here
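The pattern above is written as a JavaScript literal with the g flag. A rough Python equivalent, assuming re.findall as the stand-in for global matching, shows what it does with 1,1 and how re.fullmatch can be used when the whole string must consist of digits and dots. The function names are made up for this sketch.

```python
import re

PATTERN = re.compile(r'[0-9.]+')

def runs(text):
    """Return every run of digits/dots found in the text (like the g flag)."""
    return PATTERN.findall(text)

def is_only_digits_and_dots(text):
    """Return True only if the entire string is made of digits and dots."""
    return PATTERN.fullmatch(text) is not None

print(runs('1,1'))                        # ['1', '1']
print(is_only_digits_and_dots('1,1'))     # False
print(is_only_digits_and_dots('1.1'))     # True
```

Used unanchored, the pattern simply skips the comma and matches the digits on either side of it, so it does not reject 1,1 on its own.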

Related

Regex string doesn't contain 2 dots in a row

I'd like to know if this regex expression is correct for checking that a string doesn't start with a dot, doesn't end with a dot and contains at least one dot anywhere but not the start or end:
My issue is that I can't figure out how to check whether there are 2 dots in a row.
/^([^.])+([.])+.*([^.])$/
It seems you need to use
^[^.]+(?:\.[^.]+)+$
See the regex demo
Details:
^ - start of string
[^.]+ - 1+ chars other than a . (so, the first char cannot be .)
(?:\.[^.]+)+ - 1 or more (thus, the dot inside a string is obligatory to appear at least once) sequences of:
\. - a dot
[^.]+ - 1+ chars other than . (the + quantifier requires at least one char other than . after each dot, which makes it impossible to match a string with 2 consecutive dots or a trailing dot)
$ - end of string.
You're close, have a try with:
^[^.]+(?:\.[^.]+){2,}$
It matches strings that have 2 or more dots, none of them at the beginning or at the end.
If you want one or more dot:
^[^.]+(?:\.[^.]+)+$
If you want one or two dots:
^[^.]+(?:\.[^.]+){1,2}$
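The three variants can be compared on a few inputs. This is a minimal Python sketch; the constant names, the helper check and the sample strings are assumptions for illustration.

```python
import re

# Patterns copied from the answers above.
ONE_OR_MORE = re.compile(r'^[^.]+(?:\.[^.]+)+$')
TWO_OR_MORE = re.compile(r'^[^.]+(?:\.[^.]+){2,}$')
ONE_OR_TWO = re.compile(r'^[^.]+(?:\.[^.]+){1,2}$')

def check(pattern, text):
    """Return True if the anchored pattern matches the string."""
    return pattern.search(text) is not None

# Hypothetical sample inputs.
for s in ['a.b', 'a.b.c', 'a.b.c.d', '.a.b', 'a.b.', 'a..b', 'ab']:
    print(f'{s!r:10}',
          check(ONE_OR_MORE, s), check(TWO_OR_MORE, s), check(ONE_OR_TWO, s))
```

A leading dot, a trailing dot and two dots in a row are all rejected by every variant, because each dot must be followed by at least one non-dot character.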

Regex Matching Behaviour Of \w

I noticed some interesting behaviour with some regex work I am doing, and I'd like some insight.
From what I understand, the word character, \w should match the following [a-zA-Z_0-9]
Given this input,
0000000060399301+0000000042456971+0000000
What should this regex
(\d+)\w
Capture?
I would expect it to capture 0000000060399301 but it actually captures 000000006039930
Is there something I am missing? Why is the 1 dropped from the end?
I noticed if I changed the regex to
(\d+\w)
It captures correctly i.e. including the 1
Anyone care to explain? Thanks
You require the regex to match a trailing word character - that would be the 1.
It cannot be another character, because
+ is not a word class character
+ is not a digit
matching is greedy
\d+ - matches one or more digit characters.
\w+ - matches one or more word characters. [A-Za-z\d_]
So with the string 0000000060399301+, \d+ in the regex (\d+)\w first matches all the digits (including the 1 before the +). The next pattern element is \w, which cannot match +, so the regex engine backtracks one character to the left and lets \w match the last digit before the +. The capture group now contains 000000006039930, and the final 1 is matched by \w.
The 1 is being dropped because \w isn't in the capture group.
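The behaviour described in these answers can be reproduced in Python. This is a small sketch using the input from the question; the variable names are arbitrary.

```python
import re

text = '0000000060399301+0000000042456971+0000000'

# \w sits outside the group: \d+ has to give one digit back to it.
outside = re.search(r'(\d+)\w', text)
print(outside.group(0))   # whole match:  0000000060399301
print(outside.group(1))   # group 1 only: 000000006039930

# \w sits inside the group: the digit it matches is captured too.
inside = re.search(r'(\d+\w)', text)
print(inside.group(1))    # 0000000060399301
```

In both cases the overall match is the same 16 characters. Only the portion covered by the parentheses differs.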

regex - and . _

Writing a regex for a-z, A-Z and allowing - and . _ and integers.
For example:
Testing-Server1
Testing.Server
Testing_Server
Tried this, but was unsure how to allow - _ . and integers:
"^[a-z][A-Z]*$"
Simple enough:
^[-a-zA-Z0-9_.]*$
Explanation
^ = Match from the start of the input
[-a-zA-Z0-9_.] = A character class (a list of allowed characters):
- matches the literal '-' character (must be the first or last character in the class)
a-z matches lowercase alpha characters
A-Z matches uppercase alpha characters
0-9 matches the numeric characters
_ matches the literal '_' character
. matches the literal '.' character (unlike outside a character class, where it matches any character)
* = Match 0 to infinite characters (use + to match at least one character)
$ = Match to the end of the string
Alternative
As stranac mentions in his answer, you can replace a-zA-Z0-9_ with \w, but I prefer the more explicit version, as it's more understandable.
Limiting matched characters
As the OP asked in a comment, to limit the allowed number of characters to 15:
^[-a-zA-Z0-9_.]{0,15}$
Where {0,15} means match between 0 and 15 characters (of the character class) only. You can adjust the values as appropriate, for example, to match at least one character, use {1,15}.
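A short Python sketch of the length-limited pattern, run against the example names from the question. The function name is_valid_name and the two rejected inputs are assumptions for illustration.

```python
import re

# Pattern copied from the answer above.
NAME = re.compile(r'^[-a-zA-Z0-9_.]{0,15}$')

def is_valid_name(text):
    """Return True if text has at most 15 characters, all from the class."""
    return NAME.search(text) is not None

for s in ['Testing-Server1', 'Testing.Server', 'Testing_Server',
          'Testing Server', 'Testing-Server12', '']:
    print(f'{s!r:20} {is_valid_name(s)}')
```

Note that the empty string is accepted because the lower bound is 0. Using {1,15} would reject it.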
The other answers are over-complicating things.
\w already matches letters, digits and underscores, so you only need to add dot and minus to those.
This regex does the trick: r'^[\w.-]+$'
A few examples:
>>> re.search(r'^[\w.-]+$', 'Testing-Server1').group()
'Testing-Server1'
>>> re.search(r'^[\w.-]+$', 'Testing.Server').group()
'Testing.Server'
>>> re.search(r'^[\w.-]+$', 'Testing_Server').group()
'Testing_Server'
Instead of parsing all the string to check if it contains only allowed characters, it is faster to search the first character that is not an allowed character, because the search stops once it is found:
if not re.search(r'[^\w.-]', yourstring):
...
If you need to check the max length of the string, you can simply write:
if (len(yourstring) < 16 and not re.search(r'[^\w.-]', yourstring)):
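The two snippets above can be wrapped into one function, sketched below. The function name, the max_len parameter and the use of re.ASCII are assumptions of this example: in Python 3, \w on str patterns also matches non-ASCII letters and digits unless re.ASCII is given.

```python
import re

# Any single character that is NOT a word character, dot or hyphen.
# re.ASCII restricts \w to [a-zA-Z0-9_] (an assumption about the intent).
DISALLOWED = re.compile(r'[^\w.-]', re.ASCII)

def is_valid(name, max_len=15):
    """Return True if name is short enough and has no disallowed character."""
    return len(name) <= max_len and DISALLOWED.search(name) is None

print(is_valid('Testing-Server1'))   # True
print(is_valid('Testing Server'))    # False (space)
print(is_valid('a' * 16))            # False (too long)
```

As written, the empty string passes this check. Add a minimum-length test if that is not wanted.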
The following pattern is enough:
^[a-zA-Z0-9._-]+$
Explanation
a-z a single character in the range between a and z (case sensitive)
A-Z a single character in the range between A and Z (case sensitive)
0-9 a single character in the range between 0 and 9
. matches the character . literally
_ matches the character _ literally
- matches the character - literally
Demo
You can try the following pattern :
[a-zA-Z\d._-]+
DEMO
Try this out:
([a-zA-Z0-9._-]+)
Explanation:
1) a-z a single character in the range between a and z (case sensitive)
2) A-Z a single character in the range between A and Z (case sensitive)
3) 0-9 a single character in the range between 0 and 9
4) ._- a single character in the list ._- literally
It should work; check this: https://regex101.com/r/gZ5xN5/2
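One thing to watch when writing character classes like these: the range A-z (with a lowercase z) is not the same as A-Z. In ASCII it also covers the six characters that sit between Z and a. The Python sketch below lists them and shows the effect on a made-up input; the constant names are assumptions.

```python
import re

# Characters inside the range A..z that are not letters.
extra = [chr(c) for c in range(ord('A'), ord('z') + 1)
         if not chr(c).isalpha()]
print(extra)   # ['[', '\\', ']', '^', '_', '`']

LOOSE = re.compile(r'^[a-zA-z0-9._-]+$')    # A-z: lets punctuation through
STRICT = re.compile(r'^[a-zA-Z0-9._-]+$')   # A-Z: letters only

print(LOOSE.search('Testing^Server') is not None)    # True
print(STRICT.search('Testing^Server') is not None)   # False
```

Similarly, 1-9 leaves out the digit 0, so a name such as Server10 would be rejected by a class that uses 1-9 instead of 0-9.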

Using ?=. in regular expression

I saw the phrase
^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[A-Za-z0-9_##%\*\-]{8,24}$
in a regex that was used as a password-checking mechanism. I have read a few courses about regular expressions, but I never saw the combination ?=. explained.
I want to know how it works. In the example it checks for at least one capital letter, one lowercase letter and one digit. I guess it's something like an "if".
(?=regex_here) is a positive lookahead. It is a zero-width assertion, meaning that it matches a location that is followed by the regex contained within (?= and ). To quote from the linked page:
lookaround actually matches characters, but then gives up the match,
returning only the result: match or no match. That is why they are
called "assertions". They do not consume characters in the string, but
only assert whether a match is possible or not. Lookaround allows you
to create regular expressions that are impossible to create without
them, or that would get very longwinded without them.
The . is not part of the lookahead syntax. It is the ordinary dot, which matches any single character that is not a line terminator.
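To make the "zero-width" idea concrete, here is a minimal Python sketch using the pattern from the question exactly as posted. The sample passwords and the helper name is_acceptable are assumptions for illustration.

```python
import re

# A lookahead on its own matches at position 0 and consumes nothing.
m = re.match(r'(?=.*[0-9])', 'abc1')
print(repr(m.group()), m.end())   # '' 0

# Pattern copied from the question.
PASSWORD = re.compile(
    r'^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[A-Za-z0-9_##%\*\-]{8,24}$')

def is_acceptable(candidate):
    """Return True if all three lookaheads and the final class succeed."""
    return PASSWORD.search(candidate) is not None

print(is_acceptable('Passw0rd'))    # True
print(is_acceptable('password1'))   # False: no capital letter
print(is_acceptable('Passw0rd!'))   # False: '!' is not in the class
```

Each lookahead starts from the same position (the start of the string), so the three conditions behave like independent checks that must all hold before the final character class is applied.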
Although I am new to regex, this is what I understand about the above regex:
1- ?= is a positive lookahead, i.e. it looks ahead from the current position and checks whether a pattern such as [A-Z] can be matched there.
2- .* allows 0 or more characters before the expression you are looking for, i.e. it lets the lookahead scan up to the end of the input string to find a match.
In short, * is a quantifier that means 0 or more. For instance:
If you replaced * with ? in the [A-Z] part, the expression would only return true if the 1st or 2nd character is a capital letter. If you replaced it with +, the expression would return true if any character other than the first is a capital letter.
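The effect of swapping the quantifier can be checked directly. This is a small Python sketch; the constant names, the helper has_match and the sample strings are assumptions for the example.

```python
import re

ANY_POSITION = r'^(?=.*[A-Z])'   # capital letter anywhere
FIRST_OR_2ND = r'^(?=.?[A-Z])'   # capital letter in position 1 or 2
NOT_FIRST = r'^(?=.+[A-Z])'      # capital letter anywhere but position 1

def has_match(pattern, text):
    """Return True if the lookahead succeeds at the start of text."""
    return re.search(pattern, text) is not None

for s in ['Abc', 'aBc', 'abC', 'abc']:
    print(s, has_match(ANY_POSITION, s),
          has_match(FIRST_OR_2ND, s), has_match(NOT_FIRST, s))
```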
^ asserts position at start of the string
Positive Lookahead (?=\D*\d)
Assert that the Regex below matches
\D matches any character that's not a digit (equivalent to [^0-9])
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\d matches a digit (equivalent to [0-9])
Positive Lookahead (?=[^a-z]*[a-z])
Assert that the Regex below matches
Match a single character not present in the list below [^a-z]
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
Match a single character present in the list below [a-z]
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
Positive Lookahead (?=[^A-Z]*[A-Z])
Assert that the Regex below matches
Match a single character not present in the list below [^A-Z]
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
Match a single character present in the list below [A-Z]
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
. matches any character (except for line terminators)
{8,30} matches the previous token between 8 and 30 times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
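The breakdown above does not show the full expression it describes. Pieced together from the listed parts, it would presumably be the pattern used in the Python sketch below; treat that reconstruction, the helper name and the sample inputs as assumptions.

```python
import re

# Reconstructed from the breakdown above (an assumption, not quoted text).
PATTERN = re.compile(
    r'^(?=\D*\d)(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z]).{8,30}$')

def is_acceptable(candidate):
    """Return True if a digit, a lowercase letter and an uppercase letter
    are all present and the length is between 8 and 30."""
    return PATTERN.search(candidate) is not None

print(is_acceptable('Passw0rd'))    # True
print(is_acceptable('Passw0rd!'))   # True: . allows any character
print(is_acceptable('password'))    # False: no digit, no capital
print(is_acceptable('Sh0rt'))       # False: fewer than 8 characters
```

Compared with the (?=.*[A-Z]) style from the question, each lookahead here uses a negated class such as [^A-Z]* in place of .*, which accepts the same strings but generally needs less backtracking.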

what does this regular expression mean?

^(?!-)[a-z\d\-]{1,100}$
Here's an explanation using regex comment mode, so this expanded form can itself be used as a regex:
(?x) # flag to enable comment mode
^ # start of line/string.
(?!-) # negative lookahead for literal hyphen (-) character, so fails if the next position contains one.
[a-z\d\-] # character class matches a single alpha (a-z), digit (\d) or hyphen (\-).
{1,100} # match the above [class] up to 100 times, at least once.
$ # end of line/string.
In short, it's matching up to 100 lowercase alphanumerics or hyphens, but the first character must not be a hyphen.
Could be attempting to validate a serial number, or similar, but it's too general to say for sure.
Not all regex engines support negative lookaheads. If you're trying to figure out what it is doing in order to adapt for an engine without negative lookaheads, you can use:
^[a-z\d][a-z\d-]{0,99}$
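A quick way to convince yourself that the lookahead-free version behaves the same is to run both on a handful of inputs. This is a Python sketch; the constant names and sample strings are assumptions for illustration.

```python
import re

# Both patterns are copied from the answer above.
WITH_LOOKAHEAD = re.compile(r'^(?!-)[a-z\d\-]{1,100}$')
WITHOUT_LOOKAHEAD = re.compile(r'^[a-z\d][a-z\d-]{0,99}$')

samples = ['abc-123', '-abc', 'a', '', 'ABC', 'a' * 100, 'a' * 101]
for s in samples:
    a = WITH_LOOKAHEAD.search(s) is not None
    b = WITHOUT_LOOKAHEAD.search(s) is not None
    print(f'{s[:12]!r:16} lookahead={a} plain={b}')
```

The second pattern gets the same effect by spending the first of the 100 allowed characters on a class that simply leaves the hyphen out.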
(?!-) is a negative lookahead.
The pattern matches the start of a line that is not followed by a -, then 1 to 100 characters that can each be a-z, 0-9 or -, then the end of the line. Depending on the regex flavor, the \d inside the character class may be wrong and may need to be written as 0-9: in a flavor that does not support \d inside a class, it may be read as a literal d, which a-z already covers.
A string of letters, digits and dashes. Between 1 and 100 characters. The first character is not a dash.