I need regex to only take numbers in string - regex

I thought I had it with [0-9], but when I ran it, it only took one digit.
The string goes for example:
1 note
1,234 notes
68,000 notes
I want it to take the whole number and leave out the "notes" part, the spaces, and the comma, so that only the full number remains.
The [0-9] would only take the first digit of the string, even when there wasn't a comma.
So how do I take only the number, please?

[0-9] means any one character between 0 and 9. What you are looking for is these characters repeated any number of times, but no other character should be there. The correct way to write this is [0-9]+.
M+, where M is some regex rule, is equivalent to MM*, where * means zero or more occurrences. So M+ means at least one occurrence of whatever M matches.
EDIT: The question now also states that the entire number should be read, but the comma should be excluded from the output. As far as I know, this cannot be done with a single regex match, because the matched text is always a contiguous piece of the input and cannot skip over the comma. A possible solution is to add , to the set of allowed characters and strip the commas from the result afterwards.
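A minimal Python sketch of that two-step approach, assuming the number always starts with a digit and uses commas only as thousands separators. The function name extract_number is hypothetical and not part of the original answer.

```python
import re

def extract_number(text):
    """Return the first number in text as an int, or None if there is none."""
    # A digit followed by any run of digits and commas, e.g. "68,000".
    match = re.search(r"[0-9][0-9,]*", text)
    if match is None:
        return None
    # The regex cannot skip the commas, so remove them in a second step.
    return int(match.group().replace(",", ""))

for line in ["1 note", "1,234 notes", "68,000 notes"]:
    print(extract_number(line))
```

The regex only locates the number; the comma removal happens in ordinary string code.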

Related

Replace a specific word followed by sometimes single digit number or other times two digit numbers

Strings will always end with 'Row' followed by a number. For example,
desk_Row2.txt
desk_Row15.txt
If sorted, desk_Row15.txt will precede desk_Row2.txt.
If it's a single digit number, I want to put a leading 0 in front of it so that, when sorted, it will be:
desk_Row02.txt
desk_Row15.txt
I figured out a long way where I find 'Row' with findstr and '.' and what's between them is a number. Then I can figure out whether str2double(that) is greater than 9. Well, I think this can be done in a matter of one or two sentences.
More generally, I want to learn to create expressions so that I can do the above myself later on. For example, I have no idea what (^|\.)\s*. means.
I was thinking to use regexprep, but I have no idea what the expression should be.

Regular Expression (consecutive 1s and 0s)

Hey I'm supposed to develop a regular expression for a binary string that has no consecutive 0s and no consecutive 1s. However this question is proving quite tricky. I'm not quite sure how to approach it as is.
If anyone could help that'd be great! This is new to me.
You're basically looking for alternating digits, the string:
...01010101010101...
but one that doesn't go infinitely in either direction.
That would be an optional 0 followed by any number of 10 sets followed by an optional 1:
^0?(10)*1?$
The (10)* (group) gives you as many of the alternating digits as you need and the optional edge characters allow you to start/stop with a half-group.
Keep in mind that also allows an empty string which may not be what you want, though you could argue that's still a binary string with no consecutive identical digits. If you need it to have a length of at least one, you can do that with a more complicated "or" regex like:
^(0(10)*1?|1(01)*0?)$
which makes the first digit (either 1 or 0) non-optional and adjusts the following sequences accordingly for the two cases.
But a simpler solution may be better if it's allowed - just ensure it has a length greater than zero before doing the regex check.
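A small Python sketch for trying out the non-empty pattern. It assumes the alternation is wrapped in a single group so that ^ and $ apply to both branches; without that grouping, ^ binds only to the first branch and $ only to the second. The names GROUPED, UNGROUPED and is_alternating are illustrative.

```python
import re

# Grouped: both branches are anchored at the start and the end.
GROUPED = re.compile(r"^(0(10)*1?|1(01)*0?)$")
# Ungrouped: the first branch has no end anchor, the second no start anchor.
UNGROUPED = re.compile(r"^(0(10)*1?)|(1(01)*0?)$")

def is_alternating(bits):
    """True if bits is a non-empty string with no two equal adjacent digits."""
    return GROUPED.search(bits) is not None

for bits in ["0", "1", "0101", "10101", "", "00", "0110"]:
    print(repr(bits), is_alternating(bits))

# The ungrouped form accepts "00" because its first branch matches the leading "0".
print(UNGROUPED.search("00") is not None)
```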

Regex for minimum number of characters

I created this regular expression to validate names:
^[a-zA-Z0-9\s\-\,]+.\*?$
Is there a way to add a minimum number of characters?
I know we can use {x,}, but I cannot make it work.
{x,} should be used instead of + here...
^[a-zA-Z0-9\s,-]{5,}
But this would mean "at least 5 characters at the beginning match those from the character class, and then anything may follow".
If you write it like this (almost your original - just with {5,} instead of +):
^[a-zA-Z0-9\s\-\,]{5,}.\*?$
This means "at least 5 characters in the beginning match those from the character class, and any one character, and then optionally an asterisk, and that should be the end of it".
Use a lookahead at the beginning of the regex to make sure the total number of characters is at least your minimum. For example, if your minimum is 8 characters:
^(?=.{8,})[a-zA-Z0-9\s\-,]+.\*?$
Also, you don't need to escape the comma.
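A hedged Python sketch of the lookahead idea. For simplicity it assumes the trailing .\*? from the original pattern is not needed and keeps only the character class; build_name_pattern is a hypothetical helper, not something from the answers above.

```python
import re

def build_name_pattern(min_len):
    """Compile a pattern that requires at least min_len allowed characters."""
    # The lookahead checks the total length; the character class checks the content.
    return re.compile(r"^(?=.{%d,})[a-zA-Z0-9\s,-]+$" % min_len)

NAME_MIN_8 = build_name_pattern(8)

for candidate in ["John Doe", "Al, Bob", "Anna-Maria, 2nd", "John_Doe!"]:
    print(repr(candidate), NAME_MIN_8.search(candidate) is not None)
```

Placing the hyphen last in the character class avoids having to escape it.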

Regex for a string up to 20 chars long with a comma

I need to define a regex for a string with the following requirements:
Maximum 20 characters
Must be in the form Name,Surname
No numbers and special characters allowed (again, it's a name&surname)
I already tried something like ^[^1-9\?\*\.\?\$\^\_]{1,20}[,][^1-9\?\*\.\?\$\^\_\-]{1,20}$ but, as you can see, it also matches a string 40 characters long.
How can I check for the whole string's maximum length and at the same time impose 1 comma inside of it and obviously not at the borders?
Thank you
Try the regex:
^(?=[^,]+,[^,]+$)[a-zA-Z,]{1,20}$
Explanation:
^ : Start anchor
(?=[^,]+,[^,]+$) : Positive lookahead to ensure string has exactly one comma
surrounded by at least one non-comma character on both sides.
[a-zA-Z,]{1,20} : Ensure entire string is of length max 20 and has only
letters and comma
$ : End anchor
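The pattern explained above can be tried out with a short Python sketch. The test strings and the name is_name_surname are made up for illustration.

```python
import re

# Lookahead: exactly one comma with at least one character on each side.
# Main part: only letters and commas, 1 to 20 characters in total.
NAME_SURNAME = re.compile(r"^(?=[^,]+,[^,]+$)[a-zA-Z,]{1,20}$")

def is_name_surname(text):
    return NAME_SURNAME.search(text) is not None

samples = [
    "John,Smith",           # valid
    "John",                 # no comma
    ",Smith",               # comma at the border
    "John,Smith,Jr",        # two commas
    "A" * 10 + "," + "B" * 9,   # 20 characters in total
    "A" * 10 + "," + "B" * 10,  # 21 characters in total
]
for text in samples:
    print(repr(text), is_name_surname(text))
```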
You can do this using a negative lookahead assertion:
^(?!.{21})[A-Za-z]+,[A-Za-z]+$
The regex contains two parts now, the actual definition, and a statement at the start, saying that from that point, there will not be 21 characters.
So for the definition as stated above, the regex becomes
^(?!.{21})[^1-9\?*\.\?\$\^_\,]+,[^1-9\?*\.\?\$\^_\,]+$
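A brief Python sketch of the length guard, using the letters-only form of the pattern for readability. The name is_name_pair is illustrative.

```python
import re

# (?!.{21}) fails as soon as 21 characters are available from the start,
# so only strings of at most 20 characters can reach the rest of the pattern.
NAME_PAIR = re.compile(r"^(?!.{21})[A-Za-z]+,[A-Za-z]+$")

def is_name_pair(text):
    return NAME_PAIR.search(text) is not None

print(is_name_pair("John,Smith"))
print(is_name_pair("A" * 10 + "," + "B" * 9))   # 20 characters
print(is_name_pair("A" * 10 + "," + "B" * 10))  # 21 characters
```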
The obvious answer would be: Don't ask for name and surname in the same input field.
If you still want to do it: there's no easy way that I know of, but here is a possibility. To see the principle, write X in place of your [^1-9\?\*\.\?\$\^\_\,] (I added the \, since it's kind of important :-)) and list every way of splitting the 20 characters around the comma:
^((X{1},X{1,18})|(X{2},X{1,17})|...|(X{18},X{1}))$
Quite ugly, but should work.
On a different note: You don't capture nearly all special characters with your exclusive range. But it's probably still better than an inclusive range.
As I say, I think stated the way you have it, it's not matchable by a regular expression -- it's a pushdown language.
However, you could always split on ',' and match each substring, then total.
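A minimal Python sketch of the split-and-check approach, assuming that only ASCII letters are allowed in each part. The function name and the max_len parameter are hypothetical.

```python
import re

NAME_PART = re.compile(r"[A-Za-z]+")

def is_valid_full_name(text, max_len=20):
    """Check the form Name,Surname with a total length of at most max_len."""
    parts = text.split(",")
    # Exactly one comma means exactly two parts.
    if len(parts) != 2:
        return False
    # Each part must be non-empty and consist of letters only.
    if not all(NAME_PART.fullmatch(part) for part in parts):
        return False
    # The length limit applies to the whole string, comma included.
    return len(text) <= max_len

for text in ["John,Smith", "John,", "Jo3,Smith", "A" * 10 + "," + "B" * 10]:
    print(repr(text), is_valid_full_name(text))
```

Keeping the length check outside the regex makes each rule easy to read and to change on its own.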
Have you tried your example with the
{1,20}
in the middle removed? That leaves this to try:
^[[^1-9\?\*\.\?\$\^\_],[^1-9\?\*\.\?\$\^\_\-]]{1,20}$
Use:
[[a-zA-Z],[a-zA-Z]]{1,20}

Regex to find any character used more than 3 times in a string but not consecutively

I found all sorts of really close answers already, but not quite.
I need to look at a string, and find any character that is used more than 3 times. Basically to limit a password to disallow "mississippi" as it has more than 3 s's in it. I think it only needs to be characters, but should be unicode. So I guess the (:alpha:) for the character set to match on.
I found (\w)\1+{4,} which finds consecutive characters, like ssss or missssippi but not if they are not consecutive.
Working my way through the other regex questions to see if someone has answered it but there are lots, and no joy yet.
This should do it:
/(.)(.*\1){3}/
It doesn't make any sense to try to combine this with checking for allowable characters. You should first test that all characters are allowable characters and then run this test afterwards. This is why it's OK to use '.' here.
It will be slow though. It would be faster to iterate once over the string and count the characters. Although for your purpose I doubt it makes much difference since the strings are so short.
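The counting alternative mentioned above could look roughly like this in Python. It is a sketch, with overused_characters and limit as illustrative names.

```python
from collections import Counter

def overused_characters(password, limit=3):
    """Return the characters that occur more than limit times, sorted."""
    counts = Counter(password)  # one pass over the string
    return sorted(ch for ch, n in counts.items() if n > limit)

print(overused_characters("mississippi"))
print(overused_characters("banana"))
```

Because Counter works on Python strings, it handles any Unicode character without needing a special character class.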
(\w)(.*\1){2,}
Match a "word character", then 2 copies of "anything, then the first thing again". Thus 3 copies of the first thing, with anything in between.
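The repetition count decides how many occurrences trigger a match: {3} after the first character means four occurrences in total, while {2,} already matches at three. A short Python sketch comparing the two patterns from the answers above (the constant names are illustrative):

```python
import re

# First character plus three more copies: four or more occurrences.
MORE_THAN_THREE = re.compile(r"(.)(.*\1){3}")
# First character plus two or more copies: three or more occurrences.
THREE_OR_MORE = re.compile(r"(\w)(.*\1){2,}")

for word in ["mississippi", "banana", "abcabc"]:
    print(
        word,
        MORE_THAN_THREE.search(word) is not None,
        THREE_OR_MORE.search(word) is not None,
    )
```

For a rule of "more than 3 times", the {3} form is the one that fits; "banana" has three a's and is only flagged by the {2,} form.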
.*(\w).*\1.*\1.*\1.*
This will match on a string which has any number of characters, then a certain character, and the same character repeated three times after that (total of four), with any number of characters (0..n) in between. That's what you want, right?
Test it on e.g. http://www.regexplanet.com/simple/index.html
This regex matches e.g. "mississippi" (>3 s'es) and "twinkle twinkle little star" (> 3 t's)
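The same check can be run locally with a few lines of Python. This is a sketch; has_overused_char is a made-up name.

```python
import re

# Some character, then the same character three more times,
# with any number of characters in between.
FOUR_OR_MORE = re.compile(r".*(\w).*\1.*\1.*\1.*")

def has_overused_char(text):
    return FOUR_OR_MORE.search(text) is not None

for text in ["mississippi", "twinkle twinkle little star", "banana", "hello"]:
    print(repr(text), has_overused_char(text))
```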