Match against 1 hyphen per any number of digit groups

I'm trying to come up with some regex to match against 1 hyphen per any number of digit groups. Letters ([a-z][A-Z]) are not allowed.
123-356-129811231235123-1235612346123451235
/[^\d-]/g
The pattern above works for the string above, but it also lets the following through:
1223--1235---123123-------
I looked at the post "How to match hyphens with Regular Expression?" for an answer, but I didn't find anything close.
@Konrad Rudolph gave a good example.
Regular expression to match 7-12 digits; may contain space or hyphen
This tool is useful for me http://www.gskinner.com/RegExr/

Assuming it can't ever start with a hyphen:
^\d(-\d|\d)*$
broken down:
^ # match beginning of line
\d # match single digit
(-\d|\d)* # match hyphen & digit or just a digit (0 or more times)
$ # match end of line
That makes every hyphen have to have a digit immediately following it. Keep in mind though, that the following are examples of legal patterns:
213-123-12314-234234
1-2-3-4-5-6-7
12234234234
gskinner example
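As a quick check of the pattern above, here is a minimal JavaScript sketch that runs it against the legal examples and a few strings that should be rejected. The helper name isValidDigitGroups and the extra rejected samples are assumptions made for illustration only.

```javascript
// Pattern from the answer: a digit, then any mix of "-digit" or "digit".
const digitGroups = /^\d(-\d|\d)*$/;

// Hypothetical helper name, used only for this illustration.
function isValidDigitGroups(input) {
  return digitGroups.test(input);
}

const samples = [
  '213-123-12314-234234',       // legal
  '1-2-3-4-5-6-7',              // legal
  '12234234234',                // legal, no hyphen at all
  '1223--1235---123123-------', // repeated and trailing hyphens
  '-123',                       // leading hyphen
  '123-',                       // trailing hyphen
];

for (const s of samples) {
  console.log(`${s} -> ${isValidDigitGroups(s) ? 'valid' : 'invalid'}`);
}
```

Because every hyphen must be followed by a digit, doubled and trailing hyphens fail without any lookahead.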

Alternatively:
^(\d+-)+(\d+)$
So it's one or more group(s) of digits followed by hyphen + final group of digits.
Nothing very fancy, but in my tests it matched only when there were hyphen(s) with digits on both sides.
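A small JavaScript sketch of this alternative, under the assumption that at least one hyphen is really required. The variable name hyphenRequired is made up for the example.

```javascript
// One or more "digits-" groups, then a final group of digits.
const hyphenRequired = /^(\d+-)+(\d+)$/;

const candidates = [
  '123-356-129811231235123', // hyphens with digits on both sides
  '12234234234',             // no hyphen at all
  '1223--1235',              // doubled hyphen
];

for (const s of candidates) {
  console.log(`${s} -> ${hyphenRequired.test(s)}`);
}
```

Unlike a pattern that makes the hyphen optional, this one rejects a plain run of digits, so it only fits if a hyphen-less string should count as invalid.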

Related

Regex Help required for User-Agent Matching

Have used an online regex learning site (regexr) and created something that works but with my very limited experience with regex creation, I could do with some help/advice.
In IIS10 logs, there is a list for time, date... but I am only interested in the cs(User-Agent) field.
My Regex:
(scan\-\d+)(?:\w)+\.shadowserver\.org
which matches these:
scan-02.shadowserver.org
scan-15n.shadowserver.org
scan-42o.shadowserver.org
scan-42j.shadowserver.org
scan-42b.shadowserver.org
scan-47m.shadowserver.org
scan-47a.shadowserver.org
scan-47c.shadowserver.org
scan-42a.shadowserver.org
scan-42n.shadowserver.org
scan-42o.shadowserver.org
but what I would like it to do is:
Match a single number with the option of capturing more than one: scan-2 or scan-02 with an optional letter: scan-2j or scan-02f
Append the rest of the User Agent: .shadowserver.org to the regex.
I will then add it to an existing URL Rewrite rule (as a condition) to abort the request.
Any advice/help would be very much appreciated
Tried:
To write a regex for IIS10 to block requests from a certain user-agent
Expected:
It to work on single numbers as well as double/triple numbers with or without a letter.
(scan\-\d+)(?:\w)+\.shadowserver\.org
Input Text:
scan-2.shadowserver.org
scan-02.shadowserver.org
scan-2j.shadowserver.org
scan-02j.shadowserver.org
scan-17w.shadowserver.org
scan-101p.shadowserver.org
UPDATE:
I eventually came up with this:
scan\-[0-9]+[a-z]{0,1}\.shadowserver\.org

This is an explanation of your regex pattern; if you only want the solution, skip to the end.
(scan\-\d+)(?:\w)+
(scan\-\d+) Group 1: matches the word scan followed by a literal -. You escaped the hyphen with a \, but outside a character class an unescaped - is also a literal, so the escape is not needed here. The - is followed by \d+, which means one or more digits from 0-9; there must be at least one digit. Whatever this group matches is saved in the first capturing group.
(?:\w)+ Non-capturing group: \w matches one word character, equivalent to [A-Za-z0-9_], and the plus sign + after the group means the whole group is matched one or more times. Since the group contains only \w, it matches one or more word characters; the non-capturing group is redundant here and \w+ can be used directly.
Taking two examples:
The first example: scan-02.shadowserver.org
(scan\-\d+)(?:\w)+
scan matches the word scan in scan-02 and \- matches the hyphen after it, giving scan-. Next, \d+ (one or more digits) first matches the 02 after scan-, so the value so far is scan-02. Then comes (?:\w)+, which must match at least one word character. It tries to match the period . and fails, because the period is not a word character. Is it over at this point? No: the regex engine backtracks into the previous \d+, which this time matches only the 0 in scan-02, so scan-0 is saved in the first capturing group and (?:\w)+ matches the 2. Why does the engine go back to \d+? Because both \d+ and (?:\w)+ use +, each needs at least one character (one digit and one word character respectively), and the engine tries every way of satisfying both.
The second example: scan-2.shadowserver.org
(scan\-\d+)(?:\w)+
(scan\-\d+) matches scan-2, then (?:\w)+ tries to match the period after scan-2 and fails. This is the important point: \d+ holds only one digit and cannot give any back, so the attempt at this position fails. The engine then retries (scan\-\d+) from the next starting position, the character c in scan-2.shadowserver.org; the s in the pattern fails to match c, and it keeps trying further positions until, in the end, the whole match fails.
Simple solution:
(scan-\d+[a-z]?)\.shadowserver\.org
Explanation
(scan-\d+[a-z]?) Group 1: captures the word scan, followed by a literal -, followed by \d+ (one or more digits), followed by an optional lowercase letter [a-z]?. The ? makes the [a-z] part optional; without it, [a-z] would require exactly one lowercase letter.
See regex demo
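To see the difference between the two patterns, here is a rough JavaScript sketch that runs both against the input text from the question. The variable names are assumptions for the example, and this is not IIS URL Rewrite syntax.

```javascript
// Pattern from the question: needs at least one \w after the digits.
const originalPattern = /(scan\-\d+)(?:\w)+\.shadowserver\.org/;
// Simple solution from the answer: digits, then an optional lowercase letter.
const fixedPattern = /(scan-\d+[a-z]?)\.shadowserver\.org/;

const userAgents = [
  'scan-2.shadowserver.org',
  'scan-02.shadowserver.org',
  'scan-2j.shadowserver.org',
  'scan-02j.shadowserver.org',
  'scan-17w.shadowserver.org',
  'scan-101p.shadowserver.org',
];

for (const ua of userAgents) {
  const before = originalPattern.exec(ua);
  const after = fixedPattern.exec(ua);
  // Group 1 shows how much of the host name each pattern captured.
  console.log(ua, '| original:', before ? before[1] : null,
    '| fixed:', after ? after[1] : null);
}
```

The original pattern returns null for scan-2.shadowserver.org and captures only scan-0 for scan-02.shadowserver.org, which is the backtracking effect described above.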

Match only two combinations and ignore the rest in REGEX - tableau

I have a dozen input ID's and I need to match only two particular patterns while ignoring the rest. I have a column that would flag those valid/invalid if the regex match is true.
Test string:
1.) B-123456
2.) 985463728
My regex should strictly match the above two patterns and ignore the rest. The first test string has the letter B followed by a hyphen and then a few digits, while the second test string is purely numbers. Below is what I tried:
[Bb\d][-\d][0-9]{1,9}
Please help me out with this as I have tried weird combinations and I am missing out on something tiny. My regex includes other combinations as well which should not happen.

You could match either b or B, followed by a - and 6 digits, or match 9 digits, with the whole alternation surrounded by word boundaries:
\b(?:[Bb]-[0-9]{6}|[0-9]{9})\b
Regex demo
If the number of digits can vary, you could make the b or B and the hyphen optional, and either match 1+ digits using [0-9]+ or limit the count with a quantifier such as [0-9]{1,9}
\b(?:[bB]-)?[0-9]+\b
Or use anchors to assert the start ^ and the end $ of the string
^(?:[bB]-)?[0-9]+$
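The difference between the word-boundary version and the anchored version is easy to see with a few inputs. This is a JavaScript sketch rather than Tableau syntax, and the variable names and the extra sample strings are assumptions.

```javascript
// Strict lengths, word boundaries: may match inside a longer string.
const strictId = /\b(?:[Bb]-[0-9]{6}|[0-9]{9})\b/;
// Flexible digit count, anchors: must match the whole string.
const anchoredId = /^(?:[bB]-)?[0-9]+$/;

const ids = ['B-123456', '985463728', 'B-12345', 'AB-123456', 'ref 985463728'];

for (const id of ids) {
  console.log(`${id} | strict: ${strictId.test(id)} | anchored: ${anchoredId.test(id)}`);
}
```

With \b the pattern can still find a valid ID embedded in other text (the last sample), while ^ and $ accept only a column value that is an ID and nothing else.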

Trying to match zero outside the word boundaries

I have patterns like
FQC19515_TCELL001_20190319_165944.pdf
FQC19515_TBNK001_20190319_165944.pdf
I can match word TCELL and TBNK with this RegEX
^(\D+)-(\d+)-(\d+)([A-Z1-9]+)?.*
But if I have patterns like
FLW194640_T20NK022_20190323_131348.pdf
FLW194228_C1920_SOME_DEBRIS_REMOVED.pdf
the above regex returns
T2 and C192 instead of T20NK and C1920 respectively
Is there a general regex that matches N zeros outside of these word boundaries?

Let's consider all 4 examples of your input:
FQC19515_TCELL001_20190319_165944.pdf
FQC19515_TBNK001_20190319_165944.pdf
FLW194640_T20NK022_20190323_131348.pdf
FLW194228_C1920_SOME_DEBRIS_REMOVED.pdf
The first group, between start of line and the first "_" (e.g. FQC19515 in row 1)
consists of:
a non-empty sequence of letters,
a non-empty sequence of digits.
So the regex matching it, including the start of line anchor and a capturing group is:
^([A-Z]+\d+)
You used \D instead of [A-Z], but I think that [A-Z] is
more specific, as it matches only letters and not e.g. "_".
The next source char is _, so the regex can also include _.
And now the more difficult part: the second group to be captured
actually has 2 variants:
a sequence of letters and a sequence of digits (after that there is
a "_"),
a sequence of letters, a sequence of digits and another sequence of
letters (after that there are digits that you want to omit).
So the most intuitive way is to define 2 alternatives, each with
a respective positive lookahead:
alternative 1: [A-Z]+\d+(?=_),
alternative 2: [A-Z]+\d+[A-Z]+(?=\d).
But there is a slightly shorter way. Notice that both alternatives start
with [A-Z]+\d+.
So we can put this fragment first and include only the rest
as a non-capturing group ((?:...)) with 2 alternatives.
All the above should be surrounded with a capturing group:
([A-Z]+\d+(?:(?=_)|[A-Z]+(?=\d)))
So the whole regex can be:
^([A-Z]+\d+)_([A-Z]+\d+(?:(?=_)|[A-Z]+(?=\d)))
with m option ("^" matches also the start of each line).
For a working example see https://regex101.com/r/GDdt10/1
Your regex: ^(\D+)-(\d+) is wrong as after a sequence of non-digits
(\D+) you specified a minus which doesn't occur in your source.
Also the second minus does not correspond to your input.
Edit
To match all your strings, I modified slightly the previous regex.
The changes are limited to the matching group No 2 (after _):
Alternative No 1: [A-Z]{2,}+(?=\d) - two or more letters, after them
there is a digit, to be omitted. It will match TCELL and TBNK.
Alternative No 2: [A-Z]+\d+(?:(?=_)|[A-Z]+(?=\d)) - the previous
content of this group. It will match two remaining cases.
So the whole regex is:
^([A-Z]+\d+)_([A-Z]{2,}+(?=\d)|[A-Z]+\d+(?:(?=_)|[A-Z]+(?=\d)))
For a working example see https://regex101.com/r/GDdt10/2
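As a rough check of the final regex, the JavaScript sketch below applies it to the four file names. One adaptation is assumed: JavaScript does not support possessive quantifiers, so {2,}+ is written as a plain {2,}, which gives the same result for these inputs. The helper name extractSampleCode is hypothetical.

```javascript
// Same structure as the final regex above, with {2,}+ relaxed to {2,}
// because JavaScript's regex engine has no possessive quantifiers.
const sampleCode =
  /^([A-Z]+\d+)_([A-Z]{2,}(?=\d)|[A-Z]+\d+(?:(?=_)|[A-Z]+(?=\d)))/;

// Hypothetical helper: returns capture group 2, or null when nothing matches.
function extractSampleCode(fileName) {
  const m = sampleCode.exec(fileName);
  return m ? m[2] : null;
}

const fileNames = [
  'FQC19515_TCELL001_20190319_165944.pdf',
  'FQC19515_TBNK001_20190319_165944.pdf',
  'FLW194640_T20NK022_20190323_131348.pdf',
  'FLW194228_C1920_SOME_DEBRIS_REMOVED.pdf',
];

for (const name of fileNames) {
  console.log(`${name} -> ${extractSampleCode(name)}`);
}
```

Each name is tested on its own here, so the m option is not needed; it only matters when all lines are matched in one multi-line string.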

As far as I understand, you could use:
^[A-Z]+\d+_\K[A-Z0-9]{5}
Explanation:
^ # beginning of line
[A-Z]+ # 1 or more capitals
\d+_ # 1 or more digits and 1 underscore
\K # forget all we have seen until this position
[A-Z0-9]{5} # 5 capitals or digits
Demo
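JavaScript has no \K, so the sketch below uses a capturing group in its place; that substitution is an assumption of this example, not part of the answer. It shows what a fixed width of 5 returns for the four file names from the question.

```javascript
// \K is replaced by a capturing group around the 5 wanted characters.
const fixedWidth = /^[A-Z]+\d+_([A-Z0-9]{5})/;

// Hypothetical helper name for this illustration.
function firstFive(fileName) {
  const m = fixedWidth.exec(fileName);
  return m ? m[1] : null;
}

const names = [
  'FQC19515_TCELL001_20190319_165944.pdf',
  'FQC19515_TBNK001_20190319_165944.pdf',
  'FLW194640_T20NK022_20190323_131348.pdf',
  'FLW194228_C1920_SOME_DEBRIS_REMOVED.pdf',
];

for (const name of names) {
  console.log(`${name} -> ${firstFive(name)}`);
}
```

Note that the second name yields TBNK0 rather than TBNK, so the fixed width only fits if every wanted code is exactly 5 characters long.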

Match numbers after first character

I'd like to use Regex to determine whether the characters after the first are all numbers.
For example:
A123 would be valid as after A there are only numbers
A12B would be invalid as, after the first character, there is another letter
I essentially want to ignore the first character
I have so far this:
(?<=A)\w*(?=)
but this makes A12B or A1B2C valid, I only want numbers after A.

You could match a non-digit \D, followed by 1+ digits. If that must be the whole string, you could use anchors asserting the start ^ and the end $ of the string.
^\D\d+$
That will match:
^ Start of the string
\D Match not a digit
\d+ Match 1+ digits making sure there are digits
$ End of the string
Regex demo
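A short JavaScript sketch of this pattern, with a few extra inputs (assumed for illustration) that show its edge cases:

```javascript
// One non-digit, then one or more digits, and nothing else.
const nonDigitThenDigits = /^\D\d+$/;

const inputs = ['A123', 'A12B', 'A', '1123', '#42'];

for (const s of inputs) {
  console.log(`${s} -> ${nonDigitThenDigits.test(s) ? 'valid' : 'invalid'}`);
}
```

Two things are worth noticing: a lone A is invalid because \d+ needs at least one digit, and #42 is valid because \D accepts any non-digit, not only letters.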

The best solution I can think of is:
^.\d*$
^ - Start of the line
. - Any character (except line terminators)
\d* - digits, repeated any number of times:
\d - a digit
* - repeated any number of times (including 0 times; if you want at least 1, change it to +)
$ - End of the line
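// Any first character, then only digits until the end of the string.
let regex = /^.\d*$/;
let testStrings = ['A123', 'A12B'];
testStrings.forEach(str => {
  // test() is true only when the whole string fits the anchored pattern.
  console.log(`${str} is ${regex.test(str) ? 'valid' : 'invalid'}`);
});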

Your attempt is very complicated, especially given how simple your goal is.
Succeeding at regexes is all about simplicity.
The first character can be anything, so just go with a dot (.).
The next ones are all digits, so you want \d.
Add a star (*) for unrestricted repetition, or use + if you want at least one.
Finally, you need to anchor your regex at the beginning and at the end, else it would match stuff like A123XXXXX or XXXXA123.
Note that some implementations of match (Python's re.match, for example) already anchor the pattern at the beginning, in which case you can omit the caret.
Final regex:
^.\d*$
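The effect of the anchors can be checked directly. In this JavaScript sketch the unanchored and + variants are added only for comparison, and all variable names are assumptions.

```javascript
const anchored = /^.\d*$/;   // final regex from the answer
const unanchored = /.\d*/;   // same pattern without ^ and $
const atLeastOne = /^.\d+$/; // variant that requires a digit

const tests = ['A123', 'A12B', 'A123XXXXX', 'XXXXA123', 'A'];

for (const s of tests) {
  console.log(
    `${s} | anchored: ${anchored.test(s)}` +
    ` | unanchored: ${unanchored.test(s)}` +
    ` | at least one digit: ${atLeastOne.test(s)}`
  );
}
```

Without anchors every sample passes, because test() only needs to find the pattern somewhere in the string; with anchors only A123 and the lone A pass, and the + variant rejects the lone A as well.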

Maybe
(?<=.{1,1})([0-9]+)(?=\s)
(?<=.{1,1}) - is preceded by one character
([0-9]+) - at least one digit
(?=\s) - has a whitespace after
Add ^ at the beginning - to specify beginning of line
Replace (?=\s) with $ for end of line

^[a-zA-Z][0-9]{3}$
^ - "starting with" (Here it is starting with any letter). Read it as ^[a-zA-Z]
[a-zA-Z] - a-z is any lowercase letter and A-Z any capital letter (you may change this if required)
[0-9] - any digit
{3} - how many digits you want to check. Read it together with the class as [0-9]{3}
$ - end of the string (in this case, the string must end after the 3 digits)
Here you can play around - https://regex101.com/r/mqUHvP/5

A special Regular Expression

I want to restrict a string so that it accepts only alphanumeric values and hyphens.
I am providing 3 examples to give a clear idea.
1) AS15JKM-125TR-325AMOR
2) ITEW32-DE432OI
3) 09IURE765EDR
There is no specific pattern. There may be 0 to 3 hyphens in a string.
I just want to restrict it in such a way that it accepts only alphanumeric values and
hyphens, no other special characters.
Please help me with this.

Option 1: No Lookahead
^(?:[A-Za-z0-9]*-){0,3}[A-Za-z0-9]+$
Note that if you only want uppercase letters, you need to remove a-z
Explanation
The ^ anchor asserts that we are at the beginning of the string
The non-capturing group (?:[A-Za-z0-9]*-) matches zero or more letters or digits, then a hyphen
This is repeated zero to three times, enforcing your limit on hyphens
[A-Za-z0-9]+ matches one or more letters or digits
The $ anchor asserts that we are at the end of the string
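A minimal JavaScript sketch of Option 1 against the three examples from the question, plus a few rejected inputs that are assumed here for illustration:

```javascript
// Up to three "alphanumerics-then-hyphen" groups, then a final alphanumeric run.
const option1 = /^(?:[A-Za-z0-9]*-){0,3}[A-Za-z0-9]+$/;

const values = [
  'AS15JKM-125TR-325AMOR', // 2 hyphens
  'ITEW32-DE432OI',        // 1 hyphen
  '09IURE765EDR',          // no hyphen
  'A-B-C-D-E',             // 4 hyphens, over the limit
  'AB_12',                 // underscore is not allowed
  'ABC-',                  // trailing hyphen
  '-ABC',                  // leading hyphen
];

for (const v of values) {
  console.log(`${v} -> ${option1.test(v)}`);
}
```

Because [A-Za-z0-9]* may match nothing, a leading hyphen such as -ABC is accepted, while a trailing hyphen is not; change * to + inside the group if a leading hyphen should be rejected too.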
Option 2: With Lookahead
This does not present any benefit over the first version, I am just showing it for completeness.
^(?=(?:[^-]*-){0,3}[^-]*$)[A-Za-z0-9-]+$
Explanation
The lookahead (?=(?:[^-]*-){0,3}[^-]*$) asserts that what follows is
(?:[^-]*-) any number of non-hyphens, followed by a hyphen
{0,3} zero to three times
then [^-]*$ any number of non-hyphens and the end of the string
Option 3: With Negative Lookahead
Courtesy of @Jerry:
^(?!(?:[^-]*-){4})[A-Za-z0-9-]+$
Explanation
The negative lookahead (?!(?:[^-]*-){4}) asserts that it is not possible to find any number of non-hyphens followed by a hyphen four times, that is, the string does not contain four or more hyphens.
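The two lookahead options can be sketched in JavaScript as follows. This assumes the final character class in both patterns admits a hyphen ([A-Za-z0-9-]), since otherwise no hyphenated input could match at all; the variable names are made up for the example.

```javascript
// Option 2: positive lookahead counts at most three hyphens.
const option2 = /^(?=(?:[^-]*-){0,3}[^-]*$)[A-Za-z0-9-]+$/;
// Option 3: negative lookahead forbids a fourth hyphen.
const option3 = /^(?!(?:[^-]*-){4})[A-Za-z0-9-]+$/;

const strings = [
  'AS15JKM-125TR-325AMOR', // 2 hyphens
  '09IURE765EDR',          // no hyphen
  'A-B-C-D-E',             // 4 hyphens
  'AB_12',                 // disallowed character
];

for (const s of strings) {
  console.log(`${s} | option 2: ${option2.test(s)} | option 3: ${option3.test(s)}`);
}
```

In both patterns the lookahead only counts hyphens; the character class after it is what restricts the string to letters, digits and hyphens.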

Assuming you do not want to count the hyphens, something like so should work: ^[A-Z0-9 -]+$.
An example of the regex is available here.
