Regex to match an optional '+' symbol followed by any number of digits

I want a regular expression to match a string that may or may not start with a plus symbol, followed by any number of digits.
Both of these should be matched:
+35423452354554
or
3423564564

This should work
\+?\d+
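Matches an optional + followed by one or more digits. Because the pattern is not anchored, it can match anywhere in the string, not only at the beginning of the line.
EDIT:
At the OP's request for clarification: 3423kk55 is matched because its first part (3423) matches the pattern. To match only a whole string, use this instead: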
^\+?\d+$

It'll look something like this:
\+?\d+
The \+ means a literal plus sign, the ? means that the preceding group (the plus sign) can appear 0 or 1 times, \d indicates a digit character, and the final + requires that the preceding group (the digit) appears one or more times.
EDIT: When using regular expressions, bear in mind that there's a difference between find and matches (in Java at least, though most regex implementations have similar methods). find looks for a matching substring anywhere in the string, while matches tries to match the entire string against the pattern, failing if there are extra characters before or after. Make sure you're using the right method, and remember that you can add ^ to force the match to start at the beginning of the line and $ to force it to end at the end of the line (making the entire thing look like ^\+?\d+$).
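The find/matches distinction exists outside Java too. As a rough sketch in Python (Python is an assumption here; the answers only mention Java and Perl), re.search plays the role of find and re.fullmatch the role of matches. The helper names find_number and is_number are hypothetical.

```python
import re

PATTERN = re.compile(r"\+?\d+")  # optional '+', then one or more digits

def find_number(s):
    """Hypothetical helper, like Java's Matcher.find(): first matching substring or None."""
    m = PATTERN.search(s)
    return m.group() if m else None

def is_number(s):
    """Hypothetical helper, like Java's Matcher.matches(): True only if the whole string matches."""
    return PATTERN.fullmatch(s) is not None

for s in ["+35423452354554", "3423564564", "3423kk55"]:
    print(s, find_number(s), is_number(s))
```

One design note: fullmatch is slightly stricter than ^\+?\d+$, because in Python (as in Perl) $ also matches just before a trailing newline, so re.search(r"^\+?\d+$", "123\n") succeeds while is_number("123\n") is False. Also, in Python 3 \d matches any Unicode decimal digit; pass re.ASCII if only 0-9 should count.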

Simple ^\+?\d+$
Start of line, then 1 or 0 plus signs, followed by at least 1 digit, then end of line.

A Perl regular expression for it could be: \+?\d+

Related

What is the difference between these three regular expressions

What is the main difference between the following three regular expressions?
1) /^[^0-9]+$/
2) /[^0-9]+/
3) m/[^0-9]+/
I am really trying to understand this. Researching online has not helped me much, so I was hoping I could find some help here.
All of them have [^0-9]+, which matches one or more characters that are not the digits 0 through 9.
The first one /^[^0-9]+$/ is anchored at the start and end of the string, so it will match any string that only contains non-digits.
The second one /[^0-9]+/ is not anchored, so it matches any string that contains at least one (or more) non-digits.
The third one m/[^0-9]+/ is the same as the second, but uses the m// match operator explicitly.
For a good explanation, check out regex101.com for the first and second regex.
There's a difference between a regular expression and the match operator which takes a regular expression as its operand.
You only have two regular expressions there - ^[^0-9]+$ and [^0-9]+. Option 3 uses the same regex as option 2, but it uses a different version of the match operator.
The difference between 1 and 2 is that 1 is anchored at the start and the end of the string, whereas 2 isn't anchored at all.
So 1 says "match the start of the string, followed by one or more non-digits, followed by the end of the string". 2 says "match one or more non-digits anywhere in the string".
Does that help at all?
The pattern [^0-9] is common to these three regexes, and will match any single character that is not a decimal digit.
/^[^0-9]+$/
This anchors the pattern to the beginning and end of the string, and so insists that the whole string consists of one or more non-digit characters.
The circumflex ^ is a zero-width anchor that matches the beginning of the string
The dollar sign $ is also a zero-width anchor that will match either at the end of the string, or before a newline character if that newline is the last in the string. So this will match "aaa" and "aaa\n" but not "aa7bb\n"
/[^0-9]+/
This has no anchors, and so will return true if the string contains at least one non-digit character anywhere
It will match "12x345" and fail to match "12345". Note that a trailing newline counts as a non-digit character, so this pattern will match "123\n"
m/[^0-9]+/
This is identical to #2, but with the m placed explicitly. This is unnecessary if you are using the default slashes for delimiters, but it can be convenient to use something different if you are matching a pattern for, say, a file path, which itself contains slashes
Using m lets you choose your own delimiter, for example m{/my/path} instead of /\/my\/path/
In essence, #1 is asking whether the string is wholly composed of non-digit characters, while #2 and #3 are identical, and test whether the string contains at least one non-digit character
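To see the difference concretely, here is a small Python sketch (Python is an assumption; it has no m// operator, so regexes 2 and 3 collapse into one pattern). Python's $ behaves like Perl's here, matching at the end of the string or just before a final newline, so the sample strings from the answer above come out the same way. The function classify is hypothetical.

```python
import re

ANCHORED = re.compile(r"^[^0-9]+$")   # regex 1: the whole string is non-digits
UNANCHORED = re.compile(r"[^0-9]+")   # regexes 2 and 3: a non-digit appears somewhere

def classify(s):
    """Hypothetical helper: (matches regex 1, matches regex 2/3) for the string s."""
    return (ANCHORED.search(s) is not None, UNANCHORED.search(s) is not None)

for s in ["aaa", "aaa\n", "aa7bb\n", "12x345", "12345", "123\n"]:
    print(repr(s), classify(s))
```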

Regex keeps matching repeated numbers when I only want unique number

I would like to match an exact number in a string, but my regex also matches when the number is repeated back to back (for example 2121).
I have the following string:
SomePrefix1201-21,4,52
And I have the following regex to find a match for 21:
SomePrefix[\d]+-[,\d]*21[,$]*
It will match this string fine.
However, it also matches:
SomePrefix1201-2121,4,52
But I only want it to match if it is the exact number.
The number may also appear at the end, so it is not always followed by a comma.
I've been racking my brain like anything
Update
Based on the corrected answer below, I managed to find the exact regex I need, with the addition of a negative lookahead:
SomePrefix[\d]+-([\d]*,)*21(?!\d)[,$]*
The [,\d]* part matches any number of digits and commas in any order. What you probably wanted was ([\d]*,)* so that any preceding digits and commas must end in a comma (not a digit, which would become a part of the number).
SomePrefix[^-]+-(\d+,)*(21,|21$)
Match the prefix, followed by one or more non-dash characters, then a dash, then zero or more comma-terminated digit fields, followed either by 21, (and possibly more material) or just 21 anchored to the end.
If the comma-terminated fields can be empty, then of course use \d* rather than \d+.
In common regex flavors (Perl, PCRE, Java, Python and others), $ inside a character class is just a literal dollar sign, not the end-of-string anchor, so I distributed it out into two alternatives for 21, which reads clearly. The 21 can then be factored out:
(21,|21$) -> 21(,|$)
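A quick way to check the fix is to run the question's original pattern, the factored answer pattern and the OP's lookahead version against a few strings. This Python sketch assumes find-anywhere (re.search) semantics; the helper check_all and the last test string are illustrative additions, not from the question.

```python
import re

ORIGINAL = re.compile(r"SomePrefix[\d]+-[,\d]*21[,$]*")           # from the question
FIXED = re.compile(r"SomePrefix[^-]+-(\d+,)*21(,|$)")              # factored answer pattern
LOOKAHEAD = re.compile(r"SomePrefix[\d]+-([\d]*,)*21(?!\d)[,$]*")  # OP's final version

def check_all(s):
    """Hypothetical helper: which of the three patterns find a match in s."""
    return tuple(p.search(s) is not None for p in (ORIGINAL, FIXED, LOOKAHEAD))

for s in ["SomePrefix1201-21,4,52",    # 21 as its own field
          "SomePrefix1201-2121,4,52",  # 21 repeated back to back
          "SomePrefix1201-4,21",       # 21 at the end
          "SomePrefix1201-4,121"]:     # 21 inside a larger number
    print(s, check_all(s))
```

The original pattern accepts all four because [,\d]* can swallow the digits in front of 21; both corrected patterns accept only the strings where 21 is a complete field.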

regular expression not working as expected with the plus quantifier

I have
/\d+/
Using the string "tom666tom"
It matches the 666. Shouldn't it fail when it hits the first t in tom?
How exactly is the regex engine working here? I know the plus sign means one or more.
It will fail if you tell the regex that the whole string, from start to end, must consist of digits, like so:
/^\d+$/
The ^ matches the start of the string and $ the end.
The pattern searches for one or more digits (+) anywhere in the input string, so it finds 666.
You are not telling your expression to match the entire string. If any part of the string contains one or more digits, it will match. Use the ^ (zero-length start of line marker) and $ (zero-length end of line marker) to delimit your regex and indicate that the only thing on the line should be digits: /^\d+$/.
It shouldn't fail when it encounters the first t in "tom", because + matches 1 or more of the preceding token. This is a greedy match, and will match as many characters as possible before satisfying the next token.
In your regex /\d+/, the + is placed after \d, which matches any digit.
As the definition says, the regex engine is working correctly. An unanchored search tries each starting position in turn: \d fails at t, o and m, so the engine moves on, and at the first 6 it matches \d as many times as it can.
So it matches digits until it encounters a mismatch (the second t), and the result is 666. The preceding token here is \d, and the regex engine is working fine.
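The "try each starting position" behaviour can be imitated by hand. This Python sketch (Python is an assumption, and first_digit_run is a hypothetical helper) attempts an anchored match at each position with Pattern.match(string, pos), which is roughly what an unanchored search does, and compares it with re.search and the anchored ^\d+$.

```python
import re

DIGITS = re.compile(r"\d+")

def first_digit_run(s):
    """Hypothetical helper: imitate an unanchored search by trying each start position."""
    for start in range(len(s) + 1):
        m = DIGITS.match(s, start)  # matches only if digits begin exactly at `start`
        if m:
            return start, m.group()
    return None

s = "tom666tom"
print(first_digit_run(s))        # (3, '666')
print(DIGITS.search(s).group())  # 666, the same result from the built-in search
print(re.search(r"^\d+$", s))    # None: anchored, so the whole string must be digits
```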

Match Regular Expression if string contains exactly N occurrences of a character

I'd like a regular expression to match a string only if it contains a character that occurs a predefined number of times.
For example:
I want to match all strings that contain the character "_" exactly 3 times.
So
"a_b_c_d" would pass
"a_b" would fail
"a_b_c_d_e" would fail
Does someone know a simple regular expression that would satisfy this?
Thank you
For your example, you could do:
\b[a-z]*(_[a-z]*){3}[a-z]*\b
(with an ignore case flag).
It says "match 0 or more letters, followed by '_[a-z]*' exactly three times, followed by 0 or more letters". The \b means "word boundary", i.e. "match a whole word".
Since I've used '*' this will match if there are exactly three "_" in the word regardless of whether it appears at the start or end of the word - you can modify it otherwise.
Also, I've assumed you want to match all words in a string with exactly three "_" in it.
That means the string "a_b a_b_c_d" would say that "a_b_c_d" passed (but "a_b" fails).
If you mean that globally across the entire string you only want three "_" to appear, then use:
^[^_]*(_[^_]*){3}[^_]*$
This anchors the regex at the start of the string and runs to the end, making sure there are exactly three occurrences of "_" in it.
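The difference between the word-level pattern and the anchored whole-string pattern shows up on the mixed example "a_b a_b_c_d". A Python sketch (Python is an assumption; finditer is used because findall would return only the captured group):

```python
import re

# Word-level: each word must contain exactly three underscores.
WORD = re.compile(r"\b[a-z]*(_[a-z]*){3}[a-z]*\b", re.IGNORECASE)
# Whole string: exactly three underscores in total.
WHOLE = re.compile(r"^[^_]*(_[^_]*){3}[^_]*$")

text = "a_b a_b_c_d"
print([m.group(0) for m in WORD.finditer(text)])  # ['a_b_c_d']
print(WHOLE.search(text) is not None)             # False: four underscores overall
print(WHOLE.search("a_b_c_d") is not None)        # True
```

Note that _ counts as a word character for \b, so no word boundary falls next to an underscore; the word-level pattern only works because the words are delimited by spaces or the ends of the string.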
Elaborating on Rado's answer, which is so far the most versatile but could be a pain to write if there are more occurrences to match:
^([^_]*_){3}[^_]*$
It will match entire strings (from the beginning ^ to the end $) in which a pattern occurs exactly 3 times ({3}), that pattern being 0 or more (*) characters other than underscore ([^_]) followed by one underscore (_); the whole is followed by 0 or more characters other than underscore ([^_]*, again).
Of course, one could alternatively group it the other way round, since in our case the pattern is symmetric:
^[^_]*(_[^_]*){3}$
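To avoid writing the pattern by hand for larger counts or other characters, it can be generated. A minimal sketch in Python, assuming a hypothetical helper exactly_n: re.escape guards against characters that are special in a regex or a character class, and fullmatch stands in for ^...$.

```python
import re

def exactly_n(char, n):
    """Hypothetical helper: compile a pattern that fully matches strings
    containing exactly n occurrences of the single character `char`."""
    c = re.escape(char)
    # [^c]*(?:c[^c]*){n}: leading non-c run, then n times "c plus a non-c run"
    return re.compile(rf"[^{c}]*(?:{c}[^{c}]*){{{n}}}")

three_underscores = exactly_n("_", 3)
for s in ["a_b_c_d", "a_b", "a_b_c_d_e", "a__b_adf", "___"]:
    print(s, three_underscores.fullmatch(s) is not None)
```

If a regex is not strictly required, s.count("_") == 3 gives the same answer more directly.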
This should do it:
^[^_]*_[^_]*_[^_]*_[^_]*$
If your examples are the only possibilities (like a_b_c_...), then the others are fine, but I wrote one that will handle some other possibilities, such as:
a__b_adf
a_b_asfdasdfasfdasdfasf_asdfasfd
___
_a_b_b
Etc.
Here's my regex.
\b(_[^_]*|[^_]*_|_){3}\b

Regular expression to match last number in a string

I need to extract the last number that is inside a string. I'm trying to do this with regex and negative lookaheads, but it's not working. This is the regex that I have:
\d+(?!\d+)
And these are some strings, just to give you an idea, and what the regex should match:
ARRAY[123] matches 123
ARRAY[123].ITEM[4] matches 4
B:1000 matches 1000
B:1000.10 matches 10
And so on. The regex matches the numbers, but all of them. I don't understand why the negative lookahead is not working. Can anyone explain?
Your regex \d+(?!\d+) says
match any number if it is not immediately followed by a number.
which is not what you want. A number is the last one if no other number follows it anywhere later in the string, not just immediately after it.
When translated to regex we have:
(\d+)(?!.*\d)
I took it this way: you need to make sure the match is close enough to the end of the string; close enough in the sense that only non-digits may intervene. What I suggest is the following:
/(\d+)\D*\z/
\z at the end matches only at the very end of the string.
\D* before that means that an arbitrary number of non-digits can intervene between the match and the end of the string.
(\d+) is the matching part. It is in parentheses so that you can pick it up, as was pointed out by Cameron.
You can use
.*(?:\D|^)(\d+)
to get the last number; this is because the matcher will gobble up all the characters with .*, then backtrack to the first non-digit character or the start of the string, then match the final group of digits.
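Your negative lookahead isn't working because on the string "1 3", for example, the 1 is matched by \d+, and then the negative lookahead succeeds because the next character, a space, is not a digit. So the first match is 1, and the 3 is never even looked at; with a global search, the 3 is then matched separately as well, which is why you see every number.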
Note that your example regex doesn't have any groups in it, so I'm not sure how you were extracting the number.
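The suggestions in these answers can be compared side by side on the examples from the question. This is a Python sketch (Python is an assumption); Python's \Z means the absolute end of the string, so it stands in for Perl's \z here. The function names are hypothetical.

```python
import re

SAMPLES = {
    "ARRAY[123]": "123",
    "ARRAY[123].ITEM[4]": "4",
    "B:1000": "1000",
    "B:1000.10": "10",
}

def last_number_lookahead(s):
    """Hypothetical helper: digits with no digit anywhere after them."""
    m = re.search(r"(\d+)(?!.*\d)", s)
    return m.group(1) if m else None

def last_number_tail(s):
    """Hypothetical helper: digits followed only by non-digits up to the end."""
    m = re.search(r"(\d+)\D*\Z", s)
    return m.group(1) if m else None

def last_number_greedy(s):
    """Hypothetical helper: greedy .* then backtrack to the final digit run."""
    m = re.match(r".*(?:\D|^)(\d+)", s)
    return m.group(1) if m else None

for s in SAMPLES:
    print(s, last_number_lookahead(s), last_number_tail(s), last_number_greedy(s))
```

Since . does not match a newline by default, the lookahead and greedy versions would need re.DOTALL for multi-line input.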
I still had issues with managing the capture groups (for example, if using inline modifiers (?imsxXU)).
This worked for my purposes:
.(?:\D|^)\d(\D)