I have
/\d+/
Using the string "tom666tom"
It matches the 666. Shouldn't it fail when it hits the first t in tom?
How exactly is the regex engine working here? I know the plus sign means one or more.
It will fail if you tell the regex that the string should start and end with a digit, like so:
/^\d+$/
The ^ marks the start of the string and $ the end.
The pattern searches for one or more (+) digits anywhere in the input string.
You are not telling your expression to match the entire string. If any part of the string contains one or more digits, it will match. Use the ^ (zero-length start of line marker) and $ (zero-length end of line marker) to delimit your regex and indicate that the only thing on the line should be digits: /^\d+$/.
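A minimal Python sketch of the difference, assuming Python's re module behaves like the engine in the question for this simple pattern (the variable names are illustrative only):

```python
import re

text = "tom666tom"

# Unanchored: the engine tries each position until \d+ can match.
unanchored = re.search(r"\d+", text)
print(unanchored.group())  # 666

# Anchored: the whole string must consist of digits, so this fails.
anchored = re.search(r"^\d+$", text)
print(anchored)  # None

# The anchored pattern still accepts an all-digit string.
all_digits = re.search(r"^\d+$", "666")
print(all_digits is not None)  # True
```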
It shouldn't fail when it encounters the first t in "tom": an unanchored
pattern is tried at each position in turn until one matches. A +
matches 1 or more of the preceding token. This is a greedy match, and
will match as many characters as possible before satisfying the next
token.
In your regex /\d+/, the + is placed after \d which matches any digit.
As the definition says, the regex engine is working correctly: it matches the preceding token (\d) as many times as it can.
So it will match digits until it encounters a mismatch.
The preceding token here is \d, and hence the regex engine is working fine.
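To make the greedy behaviour visible, here is a small sketch in Python (an assumption; the question does not name a language). It prints where the match starts and contrasts + with its lazy form +?:

```python
import re

m = re.search(r"\d+", "tom666tom")
# The failed attempts at "t", "o" and "m" just advance the search;
# the match starts at index 3 and greedily takes all three digits.
print(m.start(), m.end(), m.group())  # 3 6 666

# A lazy quantifier stops as soon as one digit has matched.
lazy = re.search(r"\d+?", "tom666tom")
print(lazy.group())  # 6
```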
I am looking for a regular expression to catch a whole word or expression within a sentence that contains dots:
this is an example test.abc.123 for what I am looking for
In this case I want to catch "test.abc.123"
I tried with this regex:
(.*)(\b.+\..++\b)(.*)
(.*) some characters or none
(\b.+\..++\b) a word containing some characters followed by at least one dot that is followed by some characters, and this at least once
(.*) some more characters or none
but it gets me: "abc.123 for what I am looking for"
I see that I got something completely wrong, can anyone enlighten me?
If you need to match part of a string, you don't need to match the entire string (unless some functionality restricts you).
Your regex is too greedy. It also has dots everywhere (.+ is not a good choice most of the time), and it has no precise point at which to start and finish. You only need:
\w+(?:\.+\w+)+
It looks for strings that begin and end with word characters and contain at least one period.
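A quick check of this pattern against the sentence from the question, sketched in Python (an assumption, since the question does not say which engine is in use):

```python
import re

sentence = "this is an example test.abc.123 for what I am looking for"
dotted_word = re.compile(r"\w+(?:\.+\w+)+")

# The group is non-capturing, so findall returns the full matches.
dotted = dotted_word.findall(sentence)
print(dotted)  # ['test.abc.123']

undotted = dotted_word.findall("no dots here")
print(undotted)  # []
```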
This regex pattern matches strings with two or more dots:
.*\..*\..*
"." matches any character except line breaks
"*" repeats the previous token 0 or more times
"\." matches a single dot; the backslash is used for escaping
.* Match any character and continue matching until the next token
test.abc.123
\. Match a single dot
test. abc.123
.* Again, match any character and continue matching until the next token
test.example.com
\. Match a single dot
test.example. com
.* Match any character and continue matching until the next token
test.example.com
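Because every part of this pattern except the two dots is .*, it acts as a test for "contains at least two dots" rather than as a word extractor. A short Python sketch (assuming Python's re module) shows both effects:

```python
import re

two_dots = re.compile(r".*\..*\..*")
sentence = "this is an example test.abc.123 for what I am looking for"

# fullmatch acts as a yes/no test for "contains at least two dots".
has_two = two_dots.fullmatch("test.abc.123") is not None
has_one = two_dots.fullmatch("test.abc") is not None
print(has_two, has_one)  # True False

# On the sentence from the question, the .* parts swallow the whole line.
whole_line = two_dots.search(sentence).group()
print(whole_line == sentence)  # True
```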
Try this pattern: (?=\w+\.{1,})[^ ]+
Details: (?=\w+\.{1,}) is a positive lookahead that locates the start of a word containing at least one dot (.). Matching then starts from that position and continues up to the next space with the pattern [^ ]+.
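The lookahead approach can be tried out with a few lines of Python (an assumed choice of engine). The second call shows one caveat: a word followed directly by a sentence-ending period also satisfies the lookahead.

```python
import re

sentence = "this is an example test.abc.123 for what I am looking for"
pattern = re.compile(r"(?=\w+\.{1,})[^ ]+")

found = pattern.findall(sentence)
print(found)  # ['test.abc.123']

# Caveat: a trailing period counts as "a word followed by a dot".
trailing = pattern.findall("the end.")
print(trailing)  # ['end.']
```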
I need a regex that will match a single lowercase a-z character followed by 5 digits that is either:
at the start of a line
at the end of a line
surrounded by () or []
surrounded by whitespace
So the following results are expected:
a12345 MATCH
(a12345) MATCH
[a12345] MATCH
text a12345 MATCH
aa12345 NO MATCH
At the moment I have this (?<=[])]*)[a-z]{1}[0-9]{5}(?=[])]*) but it is not working for all scenarios, for example it sees aa12345 and a12345a as being matches when I don't want them to.
Can anyone help?
EDIT:
Apologies, I should have mentioned this is for .NET C#.
First of all, you should mention the programming language.
The following solution is for PCRE.
Regex: ((?<=[\[( ])|^)[a-z]\d{5}((?=[\]\) ])|$)
Explanation:
((?<=[\[( ])|^) checks for a preceding bracket, a space, OR the beginning of the line.
[a-z]\d{5} checks for a lowercase letter followed by 5 digits.
((?=[\]\) ])|$) checks for a following bracket, a space, OR the end of the line.
Does this work:
(\[[a-z]\d{5}\])|(\([a-z]\d{5}\))|(\b[a-z]\d{5}\b)
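Both suggested patterns can be run against the expected results from the question. The sketch below uses Python's re module, which is an assumption: the first pattern was written for PCRE and the question targets .NET, so behaviour there should be confirmed separately. The cases dictionary simply restates the question's table plus the a12345a case mentioned in it.

```python
import re

# Pattern from the first answer (written for PCRE).
lookaround = re.compile(r"((?<=[\[( ])|^)[a-z]\d{5}((?=[\]\) ])|$)")
# Pattern from the second answer.
alternation = re.compile(r"(\[[a-z]\d{5}\])|(\([a-z]\d{5}\))|(\b[a-z]\d{5}\b)")

cases = {
    "a12345": True,
    "(a12345)": True,
    "[a12345]": True,
    "text a12345": True,
    "aa12345": False,
    "a12345a": False,
}

results = {}
for text, expected in cases.items():
    for name, pattern in (("lookaround", lookaround), ("alternation", alternation)):
        found = pattern.search(text) is not None
        results[(name, text)] = found
        print(f"{name:12} {text!r:15} found={found} expected={expected}")
```

In this sketch both patterns agree with the expected column for every listed case.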
My regular expression lets in periods for some reason. How can I keep that from happening?
Rules:
4-15 characters
Any alphanumeric characters
Underscore as long as it's not first or last
[A-Za-z][A-Za-z0-9_]{3,14}
I don't want "bad.example" to work.
Edit: changed to 4-15 characters
Your regex matches example as a substring of bad.example. Use anchors to prevent that:
^[A-Za-z][A-Za-z0-9_]{1,12}[A-Za-z]$
Note that (like your regex) this regex also prevents digits from matching in the first and last position - if they should be allowed (as per your specs), just add 0-9 at the end of the character classes.
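The substring effect is easy to reproduce. This is a minimal Python sketch (the question does not name a language, so Python is an assumption) using the question's pattern and the anchored one exactly as written above; the sample inputs other than bad.example are made up for illustration:

```python
import re

unanchored = re.compile(r"[A-Za-z][A-Za-z0-9_]{3,14}")
anchored = re.compile(r"^[A-Za-z][A-Za-z0-9_]{1,12}[A-Za-z]$")

# The unanchored pattern quietly matches a substring.
partial = unanchored.search("bad.example").group()
print(partial)  # example

# With anchors, the period makes the whole input fail.
rejected = anchored.search("bad.example")
print(rejected)  # None

# Underscores are fine in the middle but not at either end.
middle = anchored.search("good_name") is not None
leading = anchored.search("_bad")
trailing = anchored.search("bad_")
print(middle, leading, trailing)  # True None None
```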
^[A-Za-z][A-Za-z0-9_]{3,14}$
Try this.
This will match any alphanumeric at the beginning and end. In the middle it will accept from one up to twelve alphanumerics including an underscore:
^[a-zA-Z\d]\w{1,12}[a-zA-Z\d]$
It does not match bad.example as a whole; it matches only example, because your regex accepts any run of 4 to 15 allowed characters. See here:
http://regex101.com/r/xV4eL5/5
To prevent this, you need to match the whole input rather than make partial matches. Put a ^ start anchor and a $ end anchor around the pattern.
Use
\A[A-Za-z0-9][\w]{1,12}[A-Za-z0-9]\Z
I want a regular expression to match a string that may or may not start with a plus symbol and then contains any number of digits.
These should be matched:
+35423452354554
or
3423564564
This should work
\+?\d+
Matches an optional + followed by one or more digits
EDIT:
Per the OP's request for clarification: 3423kk55 is matched because its first part (3423) matches. To match only a whole string, use this instead:
^\+?\d+$
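A short Python sketch of the two behaviours (Python is an assumption here; the names loose and strict are illustrative):

```python
import re

loose = re.compile(r"\+?\d+")
strict = re.compile(r"^\+?\d+$")

# Without anchors, the leading digits of a bad input still match.
prefix = loose.search("3423kk55").group()
print(prefix)  # 3423

# With anchors, the same input is rejected outright.
bad = strict.search("3423kk55")
print(bad)  # None

# Both examples from the question pass the anchored version.
good = [strict.search(n) is not None for n in ("+35423452354554", "3423564564")]
print(good)  # [True, True]
```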
It'll look something like this:
\+?\d+
The \+ means a literal plus sign, the ? means that the preceding group (the plus sign) can appear 0 or 1 times, \d indicates a digit character, and the final + requires that the preceding group (the digit) appears one or more times.
EDIT: When using regular expressions, bear in mind that there's a difference between find and matches (in Java at least, though most regex implementations have similar methods). find will find a matching substring anywhere in the input string, and matches will try to match the entire string against the pattern, failing if there are extra characters before or after. Ensure you're using the right method, and remember that you can add a ^ to force the beginning of the line and a $ to force the end of the line (making the entire thing look like ^\+?\d+$).
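The same distinction exists outside Java. As an illustration in Python (a swapped-in language, not the one this answer discusses), re.search plays the role of find, re.fullmatch the role of matches, and re.match sits in between by anchoring only at the start:

```python
import re

pattern = re.compile(r"\+?\d+")

# search: like Java's find, a match anywhere in the string is enough.
anywhere = pattern.search("tel: +354234").group()
print(anywhere)  # +354234

# fullmatch: like Java's matches, the whole string must fit the pattern.
whole_bad = pattern.fullmatch("3423kk55")
whole_ok = pattern.fullmatch("+354234").group()
print(whole_bad, whole_ok)  # None +354234

# match: anchored at the start only, so trailing text is tolerated.
start_only = pattern.match("3423kk55").group()
print(start_only)  # 3423
```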
Simple: ^\+?\d+$
Start of line, then 0 or 1 plus signs, followed by at least 1 digit, then end of line
A Perl regular expression for it could be: \+?\d+
I need to extract the last number that is inside a string. I'm trying to do this with regex and negative lookaheads, but it's not working. This is the regex that I have:
\d+(?!\d+)
And these are some strings, just to give you an idea, and what the regex should match:
ARRAY[123] matches 123
ARRAY[123].ITEM[4] matches 4
B:1000 matches 1000
B:1000.10 matches 10
And so on. The regex matches the numbers, but all of them. I don't get why the negative lookahead is not working. Anyone care to explain?
Your regex \d+(?!\d+) says
match any number if it is not immediately followed by a number.
which is incorrect. A number is last if it is not followed (following it anywhere, not just immediately) by any other number.
When translated to regex we have:
(\d+)(?!.*\d)
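The difference between the two lookaheads shows up clearly on the question's own examples. This sketch assumes Python's re module; the samples dictionary just restates the expected results from the question:

```python
import re

samples = {
    "ARRAY[123]": "123",
    "ARRAY[123].ITEM[4]": "4",
    "B:1000": "1000",
    "B:1000.10": "10",
}

original = re.compile(r"\d+(?!\d+)")       # pattern from the question
last_number = re.compile(r"(\d+)(?!.*\d)")  # corrected pattern

# The original pattern finds every number, not just the last one.
all_numbers = original.findall("ARRAY[123].ITEM[4]")
print(all_numbers)  # ['123', '4']

# The corrected pattern only accepts a number with no digit after it.
extracted = {text: last_number.search(text).group(1) for text in samples}
for text, value in extracted.items():
    print(text, "->", value)
```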
I took it this way: you need to make sure the match is close enough to the end of the string; close enough in the sense that only non-digits may intervene. What I suggest is the following:
/(\d+)\D*\z/
\z at the end means that the match must end at the very end of the string.
\D* before that means that an arbitrary number of non-digits can intervene between the match and the end of the string.
(\d+) is the matching part. It is in parentheses so that you can pick it up, as was pointed out by Cameron.
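For anyone trying this in Python rather than Ruby or Perl: Python's re module spells the absolute end-of-string anchor \Z, so the sketch below substitutes \Z for \z. That substitution is the only change to the pattern.

```python
import re

# \Z in Python anchors at the very end of the string.
last_number = re.compile(r"(\d+)\D*\Z")

inputs = ("ARRAY[123]", "ARRAY[123].ITEM[4]", "B:1000", "B:1000.10")
extracted = [last_number.search(text).group(1) for text in inputs]
print(extracted)  # ['123', '4', '1000', '10']
```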
You can use
.*(?:\D|^)(\d+)
to get the last number. This works because the matcher first gobbles up all the characters with .*, then backtracks until it finds a non-digit character (or the start of the string) that is followed by digits, and captures that final group of digits.
Your negative lookahead isn't working because on the string "1 3", for example, the 1 is matched by the \d+, then the space satisfies the negative lookahead (since it's not a sequence of one or more digits). The 3 is never even looked at.
Note that your example regex doesn't have any groups in it, so I'm not sure how you were extracting the number.
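A small Python sketch (assuming Python's re module) of why the (?:\D|^) part matters and of the "1 3" case described above. The naive pattern is a hypothetical variant added only for contrast:

```python
import re

last_number = re.compile(r".*(?:\D|^)(\d+)")
naive = re.compile(r".*(\d+)")  # hypothetical variant without the guard

with_guard = [
    last_number.search(text).group(1)
    for text in ("B:1000", "ARRAY[123].ITEM[4]", "123")
]
print(with_guard)  # ['1000', '4', '123']

# Without (?:\D|^), the greedy .* leaves only one digit for the group.
without_guard = naive.search("B:1000").group(1)
print(without_guard)  # 0

# The question's pattern stops at the first number in "1 3".
first_only = re.search(r"\d+(?!\d+)", "1 3").group()
print(first_only)  # 1
```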
I still had issues with managing the capture groups
(for example, if using Inline Modifiers (?imsxXU)).
This worked for my purposes -
.(?:\D|^)\d(\D)