Optional characters in a regex - regex

The task is pretty simple, but I've not been able to come up with a good solution yet: a string can contain numbers, dashes and pluses, or only numbers.
^[0-9+-]+$
does most of what I need, except when a user enters garbage like "+-+--+"
I've not had luck with regular lookahead, since the dashes and pluses could potentially be anywhere in the string.
Valid strings:
234654
24-3+-2
-234
25485+
Invalid:
++--+

How about this:
([+-]?\d[+-]?)+
which means "one or more digits, each of which can be preceded or followed by an optional plus or minus".
Here's a Python test script:
import re
TESTS = "234654 24-3+-2 -234 25485+ ++--+".split()
for test in TESTS:
print test, ":", re.match(r'([+-]?\d[+-]?)+', test) is not None
which prints this:
234654 : True
24-3+-2 : True
-234 : True
25485+ : True
++--+ : False

How about:
^[0-9+-]*[0-9][0-9+-]*$
This ensures that there is at least one digit somewhere in the string. (It looks like it might have a lot of backtracking, though. But on the other hand it doesn't have a + or * wrapped inside another + or *, which I don't like either.)

^([+-]*[0-9]+[+-]*)+$
Another solution using a positive look behind assertion ensuring there is at leat one number.
^[0-9+-]+$(?<=[0-9][+-]*)
Or using a positive look ahead assertion.
(?=[+-]*[0-9])^[0-9+-]+

I like the
^(?=.*\d)[\d+-]+$
solution, myself. It says exactly what you need without requiring any head-scratching.

I'd do it like this:
^[-+]*\d[\d+-]*$
Fast is good!

Related

Possible to limit to scope/range of a lookahead

We can check to see if a digit is in a password, for example, by doing something like:
(?=.*\d)
Or if there's a digit and lowercase with:
(?=.*\d)(?=.*[a-z])
This will basically go on "until the end" to check whether there's a letter in the string.
However, I was wondering if it's possible in some sort of generic way to limit the scope of a lookahead. Here's a basic example which I'm hoping will demonstrate the point:
start_of_string;
middle_of_string;
end_of_string;
I want to use a single regular expression to match against start_of_string + middle_of_string + end_of_string.
Is it possible to use a lookahead/lookbehind in the middle_of_string section WITHOUT KNOWING WHAT COMES BEFORE OR AFTER IT? That is, not knowing the size or contents of the preceding/succeeding string component. And limit the scope of the lookahead to only what is contained in that portion of the string?
Let's take one example:
start_of_string = 'start'
middle_of_string = '123'
end_of_string = 'ABC'
Would it be possible to check the contents of each part but limit it's scope like this?
string = 'start123ABC'
# Check to make sure the first part has a letter, the second part has a number and the third part has a capital
((?=.*[a-z]).*) # limit scope to the first part only!!
((?=.*[0-9]).*) # limit scope to only the second part.
((?=.*[A-Z]).*) # limit scope to only the last part.
In other words, can lookaheads/lookbehinds be "chained" with other components of a regex without it screwing up the entire regex?
UPDATE:
Here would be an example, hopefully this is more helpful to the question:
START_OF_STRING = 'abc'
Does 'x' exist in it? (?=.*x) ==> False
END_OF_STRING = 'cdxoy'
Does 'y' exist in it? (?=.*y) ==> True
FULL_STRING = START_OF_STRING + END_OF_STRING
'abcdxoy'
Is it possible to chain the two regexes together in any sort of way to only wok on its 'substring' component?
For example, now (?=.*x) in the first part of the string would return True, but it should not.
`((?=.*x)(?=.*y)).*`
I think the short answer to this is "No, it's not possible.", but am looking to hear from someone who understands this to tell why it is or isn't.
In .NET and javascript you could use a positive lookahead at the start of your string component and a negative lookbehind at the end of it to "constrain" the match. Example:
.*(?=.*arrow)(?<middle>.*)(?<=.*arrow).*
helloarrowxyz
{'middle': 'arrow'}
If in pcre, python, or other you would need to either have a fixed width lookahead to constraint it from going too far forward, such as what Wiktor Stribiżew says above:
.*(?=.{0,5}arrow)(?<middle>.{0,5}).*
Otherwise, it wouldn't be possible to do without either a fixed-width lookahead or a variable width look-behind.

Regexp: Keyword followed by value to extract

I had this question a couple of times before, and I still couldn't find a good answer..
In my current problem, I have a console program output (string) that looks like this:
Number of assemblies processed = 1200
Number of assemblies uninstalled = 1197
Number of failures = 3
Now I want to extract those numbers and to check if there were failures. (That's a gacutil.exe output, btw.) In other words, I want to match any number [0-9]+ in the string that is preceded by 'failures = '.
How would I do that? I want to get the number only. Of course I can match the whole thing like /failures = [0-9]+/ .. and then trim the first characters with length("failures = ") or something like that. The point is, I don't want to do that, it's a lame workaround.
Because it's odd; if my pattern-to-match-but-not-into-output ("failures = ") comes after the thing i want to extract ([0-9]+), there is a way to do it:
pattern(?=expression)
To show the absurdity of this, if the whole file was processed backwards, I could use:
[0-9]+(?= = seruliaf)
... so, is there no forward-way? :T
pattern(?=expression) is a regex positive lookahead and what you are looking for is a regex positive lookbehind that goes like this (?<=expression)pattern but this feature is not supported by all flavors of regex. It depends which language you are using.
more infos at regular-expressions.info for comparison of Lookaround feature scroll down 2/3 on this page.
If your console output does actually look like that throughout, try splitting the string on "=" when the word "failure" is found, then get the last element (or the 2nd element). You did not say what your language is, but any decent language with string splitting capability would do the job. For example
gacutil.exe.... | ruby -F"=" -ane "print $F[-1] if /failure/"

Regular expression any character with dynamic size

I want to use a regular expression that would do the following thing ( i extracted the part where i'm in trouble in order to simplify ):
any character for 1 to 5 first characters, then an "underscore", then some digits, then an "underscore", then some digits or dot.
With a restriction on "underscore" it should give something like that:
^([^_]{1,5})_([\\d]{2,3})_([\\d\\.]*)$
But i want to allow the "_" in the 1-5 first characters in case it still match the end of the regular expression, for example if i had somethink like:
to_to_123_12.56
I think this is linked to an eager problem in the regex engine, nevertheless, i tried to do some lazy stuff like explained here but without sucess.
Any idea ?
I used the following regex and it appeared to work fine for your task. I've simply replaced your initial [^_] with ..
^.{1,5}_\d{2,3}_[\d\.]*$
It's probably best to replace your final * with + too, unless you allow nothing after the final '_'. And note your final part allows multiple '.' (I don't know if that's what you want or not).
For the record, here's a quick Python script I used to verify the regex:
import re
strs = [ "a_12_1",
"abc_12_134",
"abcd_123_1.",
"abcde_12_1",
"a_123_123.456.7890.",
"a_12_1",
"ab_de_12_1",
]
myre = r"^.{1,5}_\d{2,3}_[\d\.]+$"
for str in strs:
m = re.match(myre, str)
if m:
print "Yes:",
if m.group(0) == str:
print "ALL",
else:
print "No:",
print str
Output is:
Yes: ALL a_12_1
Yes: ALL abc_12_134
Yes: ALL abcd_134_1.
Yes: ALL abcde_12_1
Yes: ALL a_123_123.456.7890.
Yes: ALL a_12_1
Yes: ALL ab_de_12_1
^(.{1,5})_(\d{2,3})_([\d.]*)$
works for your example. The result doesn't change whether you use a lazy quantifier or not.
While answering the comment ( writing the lazy expression ), i saw that i did a mistake... if i simply use the folowing classical regex, it works:
^(.{1,5})_([\\d]{2,3})_([\\d\\.]*)$
Thank you.

Regex: How to match a string that is not only numbers

Is it possible to write a regular expression that matches all strings that does not only contain numbers? If we have these strings:
abc
a4c
4bc
ab4
123
It should match the four first, but not the last one. I have tried fiddling around in RegexBuddy with lookaheads and stuff, but I can't seem to figure it out.
(?!^\d+$)^.+$
This says lookahead for lines that do not contain all digits and match the entire line.
Unless I am missing something, I think the most concise regex is...
/\D/
...or in other words, is there a not-digit in the string?
jjnguy had it correct (if slightly redundant) in an earlier revision.
.*?[^0-9].*
#Chad, your regex,
\b.*[a-zA-Z]+.*\b
should probably allow for non letters (eg, punctuation) even though Svish's examples didn't include one. Svish's primary requirement was: not all be digits.
\b.*[^0-9]+.*\b
Then, you don't need the + in there since all you need is to guarantee 1 non-digit is in there (more might be in there as covered by the .* on the ends).
\b.*[^0-9].*\b
Next, you can do away with the \b on either end since these are unnecessary constraints (invoking reference to alphanum and _).
.*[^0-9].*
Finally, note that this last regex shows that the problem can be solved with just the basics, those basics which have existed for decades (eg, no need for the look-ahead feature). In English, the question was logically equivalent to simply asking that 1 counter-example character be found within a string.
We can test this regex in a browser by copying the following into the location bar, replacing the string "6576576i7567" with whatever you want to test.
javascript:alert(new String("6576576i7567").match(".*[^0-9].*"));
/^\d*[a-z][a-z\d]*$/
Or, case insensitive version:
/^\d*[a-z][a-z\d]*$/i
May be a digit at the beginning, then at least one letter, then letters or digits
Try this:
/^.*\D+.*$/
It returns true if there is any simbol, that is not a number. Works fine with all languages.
Since you said "match", not just validate, the following regex will match correctly
\b.*[a-zA-Z]+.*\b
Passing Tests:
abc
a4c
4bc
ab4
1b1
11b
b11
Failing Tests:
123
if you are trying to match worlds that have at least one letter but they are formed by numbers and letters (or just letters), this is what I have used:
(\d*[a-zA-Z]+\d*)+
If we want to restrict valid characters so that string can be made from a limited set of characters, try this:
(?!^\d+$)^[a-zA-Z0-9_-]{3,}$
or
(?!^\d+$)^[\w-]{3,}$
/\w+/:
Matches any letter, number or underscore. any word character
.*[^0-9]{1,}.*
Works fine for us.
We want to use the used answer, but it's not working within YANG model.
And the one I provided here is easy to understand and it's clear:
start and end could be any chars, but, but there must be at least one NON NUMERICAL characters, which is greatest.
I am using /^[0-9]*$/gm in my JavaScript code to see if string is only numbers. If yes then it should fail otherwise it will return the string.
Below is working code snippet with test cases:
function isValidURL(string) {
var res = string.match(/^[0-9]*$/gm);
if (res == null)
return string;
else
return "fail";
};
var testCase1 = "abc";
console.log(isValidURL(testCase1)); // abc
var testCase2 = "a4c";
console.log(isValidURL(testCase2)); // a4c
var testCase3 = "4bc";
console.log(isValidURL(testCase3)); // 4bc
var testCase4 = "ab4";
console.log(isValidURL(testCase4)); // ab4
var testCase5 = "123"; // fail here
console.log(isValidURL(testCase5));
I had to do something similar in MySQL and the following whilst over simplified seems to have worked for me:
where fieldname regexp ^[a-zA-Z0-9]+$
and fieldname NOT REGEXP ^[0-9]+$
This shows all fields that are alphabetical and alphanumeric but any fields that are just numeric are hidden. This seems to work.
example:
name1 - Displayed
name - Displayed
name2 - Displayed
name3 - Displayed
name4 - Displayed
n4ame - Displayed
324234234 - Not Displayed

Regular Expression for username

I need help on regular expression on the condition (4) below:
Begin with a-z
End with a-z0-9
allow 3 special characters like ._-
The characters in (3) must be followed by alphanumeric characters, and it cannot be followed by any characters in (3) themselves.
Not sure how to do this. Any help is appreciated, with the sample and some explanations.
You can try this:
^(?=.{5,10}$)(?!.*[._-]{2})[a-z][a-z0-9._-]*[a-z0-9]$
This uses lookaheads to enforce that username must have between 5 and 10 characters (?=.{5,10}$), and that none of the 3 special characters appear twice in a row (?!.*[._-]{2}), but overall they can appear any number of times (Konrad interprets it differently, in that the 3 special characters can appear up to 3 times).
Here's a test harness in Java:
String[] test = {
"abc",
"abcde",
"acd_e",
"_abcd",
"abcd_",
"a__bc",
"a_.bc",
"a_b.c-d",
"a_b_c_d_e",
"this-is-too-long",
};
for (String s : test) {
System.out.format("%s %B %n", s,
s.matches("^(?=.{5,10}$)(?!.*[._-]{2})[a-z][a-z0-9._-]*[a-z0-9]$")
);
}
This prints:
abc FALSE
abcde TRUE
acd_e TRUE
_abcd FALSE
abcd_ FALSE
a__bc FALSE
a_.bc FALSE
a_b.c-d TRUE
a_b_c_d_e TRUE
this-is-too-long FALSE
See also
regular-expressions.info/Lookarounds, limiting repetitions, and anchors
So basically:
Start with [a-z].
Allow a first serving of [a-z0-9], several times. 1)
Allow
at most one of [._-], followed by
at least one of [a-z0-9]
three times or less.
End with [a-z0-9] (implied in the above).
Which yields:
^[a-z][a-z0-9]*([._-][a-z0-9]+){0,3}$
But beware that this may result in user names with only one character.
1) (posted by #codeka)
try that:
^[a-zA-Z](([\._\-][a-zA-Z0-9])|[a-zA-Z0-9])*[a-z0-9]$
1) ^[a-zA-Z]: beginning
2) (([._-][a-zA-Z0-9])|[a-zA-Z0-9])*: any number of either alphanum, or special char followed by alphanum
3) [a-z0-9]$
Well, because I feel like being ... me. One Regex does not need to rule them all -- and for one of the Nine, see nqueens. (However, in this case there are some nice answers already; I'm just pointing out a slightly different approach.)
function valid(n) {
return (n.length > 3
&& n.match(/^[a-z]/i)
&& n.match(/[a-z0-9]$/i)
&& !n.match(/[._-]{2}/)
}
Now imagine that you only allow one ., _ or - total (perhaps I misread the initial requirements shrug); that's easy to add (and visualize):
&& n.replace(/[^._-]/g, "").length <= 1
And before anyone says "that's less efficient", go profile it in the intended usage. Also note, I didn't give up using regular expressions entirely, they are a wonderful thing.