Identifying number sequences with optional punctuation - regex

I am trying to identify account numbers in different formats using a single regex. The following are the different formats I need to detect:
12-34-56-78-9
12-3456-78-9
123-456-789
1-23-45678-9
We need to detect "-" in between the digits of a 9-digit number, but there is no telling where the "-" will appear. As of now, I am creating a separate regex for each format. Is there a simple regex to detect all of the above in a single shot?

Here you go, that's a pretty simple pattern:
^(?:\d-?){8}\d$
It simply means: find a digit (\d), optionally followed by a hyphen (-?), 8 times in a row ({8}), then the last digit (\d). This prevents a hyphen from being the first or last character, and it also prevents two hyphens in a row.
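A quick way to check the pattern against the four formats from the question is a small Python sketch. The helper name `is_account_number` and the extra negative samples are my own, added only for illustration:

```python
import re

# Pattern from the answer: eight "digit + optional hyphen" units, then a final digit.
ACCOUNT_RE = re.compile(r"^(?:\d-?){8}\d$")


def is_account_number(text: str) -> bool:
    """Return True if text is 9 digits with optional single hyphens between them."""
    return ACCOUNT_RE.match(text) is not None


# The four formats from the question, followed by a few strings that should fail.
samples = [
    "12-34-56-78-9", "12-3456-78-9", "123-456-789", "1-23-45678-9",
    "-123456789", "123456789-", "12--3456789", "12-34-56-78",
]
for s in samples:
    print(f"{s!r}: {is_account_number(s)}")
```

Note that a plain `123456789` with no hyphens is accepted as well, since every hyphen is optional.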

Related

How to write a Regex that identifies specific letters plus a minimum amount of numbers

I'm trying to write a regex that can locate IDs in a body of text. The ID starts with "DW" and has a minimum of 5 numbers after that. It will only have numbers and no other characters following that.
Correct Examples
DW40056
DW4000057
Wrong Examples
DW4005
DW405679fg
Use word boundaries around DW followed by 4 digits then one or more digits:
\bDW\d{4}\d+\b
The word boundaries prevent matches with input such as ABCDW12345XYZ etc.
Although you could code the digits part as \d{5,}, which is simpler than \d{4}\d+, not all engines support open-ended quantity ranges. Since you haven’t indicated the language/tool you’re using, this regex is going to work in more situations.
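As a rough illustration, here is how the word-boundary pattern behaves in Python (my choice of language, since the question does not name one). The sample sentence is made up from the examples in the question:

```python
import re

# Word boundaries on both sides, "DW", then at least five digits (4 + one or more).
ID_RE = re.compile(r"\bDW\d{4}\d+\b")

# Hypothetical body of text built from the question's examples.
text = "Valid: DW40056 and DW4000057. Invalid: DW4005, DW405679fg, ABCDW12345XYZ."

found = ID_RE.findall(text)
print(found)

# In Python's re module the shorter \d{5,} form gives the same result.
found_short = re.findall(r"\bDW\d{5,}\b", text)
print(found_short)
```

Only the two valid IDs are returned; the trailing letters in DW405679fg and the surrounding letters in ABCDW12345XYZ defeat the word boundaries.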
Try this pattern: DW\d{5,}$
Explanation:
DW is the two characters the ID starts with.
\d matches a digit, 0-9.
{5,} means \d must appear five or more times.
$ means the end of the string, so the pattern only accepts strings that end with the digits (no other characters after them).
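A small Python sketch of what the `$` anchor implies in practice. The helper `ends_with_id` is a name I made up. Because the pattern is anchored only at the end, it suits checking one candidate string at a time rather than scanning a body of text, and it does not stop letters from appearing before DW:

```python
import re

# "DW", five or more digits, then the end of the string.
END_ANCHORED = re.compile(r"DW\d{5,}$")


def ends_with_id(candidate: str) -> bool:
    """Return True if the candidate ends with DW followed by 5+ digits."""
    return END_ANCHORED.search(candidate) is not None


for candidate in ["DW40056", "DW4000057", "DW4005", "DW405679fg",
                  "see DW40056 for details", "ABCDW12345"]:
    print(f"{candidate!r}: {ends_with_id(candidate)}")
```

The last two samples show the trade-off: an ID in the middle of a sentence is missed, while ABCDW12345 is accepted because nothing constrains what comes before DW.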

match words containing at least one letter and at least one digit

I am new to regex but I have spent the last two days researching and I have also tried many of the similar queries in this and other sites.
I am trying to come up with an expression (POSIX ERE) that will find whole words that contain at least one letter and at least one digit. Specifically, I would like to capture all of these:
B/DIN/37/1
DU/32.Abb.31
P/NA.17
O/DIN/2017/8
22/N.Abb.2
I have tried many things and managed to crash my software a couple of times in the process, but still no go. One of my issues is I don't know how to phrase my string so it will find whole words that meet the criterion, not just a string within a word.
Thank you very much to anyone that can help me out!
I'm assuming words are separated by whitespace. In that case, a word that has a digit and a letter can be separated into two regex expressions:
0 or more non-whitespace characters, followed by a letter, followed by 0 or more non-whitespace characters, followed by a digit, followed by 0 or more non-whitespace characters
The above, but with digit and letter swapped.
The non-whitespace character matches make sure the entire word is captured.
Those translate into the following regexes:
\S*[A-Za-z]\S*[0-9]\S*
\S*[0-9]\S*[A-Za-z]\S*
Combining them yields this final expression:
(\S*[A-Za-z]\S*[0-9]\S*|\S*[0-9]\S*[A-Za-z]\S*)
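To see the combined expression at work, here is a sketch in Python. Python's re module is not POSIX ERE; it happens to support \S, whereas a strict POSIX ERE tool would likely need [^[:space:]] instead. The sample sentence is my own, built from the words in the question plus a few words that should not match:

```python
import re

# Letter-then-digit or digit-then-letter, padded with non-whitespace on all sides
# so that the whole whitespace-delimited word is consumed.
WORD_RE = re.compile(r"\S*[A-Za-z]\S*[0-9]\S*|\S*[0-9]\S*[A-Za-z]\S*")

# Hypothetical input: the five target words plus words with only letters or only digits.
text = "Refs B/DIN/37/1 DU/32.Abb.31 P/NA.17 O/DIN/2017/8 22/N.Abb.2 plain 2017 DIN"

words = WORD_RE.findall(text)
print(words)
```

Since \S also matches punctuation, a trailing comma or period attached to a word would be captured along with it.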

Putting a group within a group [123[a-u]]

I'm having a lot more difficulty than I anticipated in creating a simple regex to match any specific characters, including a range of characters from the alphabet.
I've been playing with regex101 for a while now, but every combination seems to result in no matches.
Example expression:
[\n\r\t\s\(\)-]
Preferred expression:
[[a-z][a-Z]\n\r\t\s\(\)-]
Example input:
(123) 241()-127()()() abc ((((((((
Ideally the expression will capture every character except the digits
I know I could always manually input "abcdefgh".... but there has to be an easier way. I also know there are easier ways to capture numbers only, but there are some special characters and letters which I may eventually need to include as well.
With regex you can match on a range of characters, as in your example above: [a-z] matches any single lowercase letter between a and z. To match more than one character, add a "+" after it; to fix the number of characters, use {n}, where n is the count you want. So [a-z]+ matches one or more letters, and [a-z]{4} matches exactly four consecutive letters between a and z.
You can use partial intervals. For example, [a-j] will match all characters from a to j. So, [a-j]{2} for the string a6b7cd will match only cd. You can also use several intervals within the same character class, like this: [a-j4-6]{4}. This regex will match ab44 but not ab47.
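The examples from these two answers can be tried directly. This is a minimal sketch in Python; the variable names and the first two input strings are mine:

```python
import re

# One or more lowercase letters in a row.
runs = re.findall(r"[a-z]+", "abc 123 defgh")

# The first run of exactly four lowercase letters.
first_four = re.search(r"[a-z]{4}", "abcdefgh").group()

# A partial interval: only "cd" is two consecutive characters from a to j.
pairs = re.findall(r"[a-j]{2}", "a6b7cd")

# Two intervals in one class: letters a-j and digits 4-6.
ab44_ok = re.fullmatch(r"[a-j4-6]{4}", "ab44") is not None
ab47_ok = re.fullmatch(r"[a-j4-6]{4}", "ab47") is not None

print(runs, first_four, pairs, ab44_ok, ab47_ok)
```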
Overlooked a pretty small character. The term I was looking for was "Alternative" apparently.
[\r\t\n]|[a-z] with the missing element being the | character. This will allow it to match anything from the first group, and then continue on to match the second group.
At least that's my conclusion when testing this specific example.
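For comparison, a sketch in Python suggesting that the alternation is not strictly required: ranges can sit inside the same brackets as the individual characters, without a nested pair of brackets. The combined class below is my own variation on the expressions in the question, not something from the original post:

```python
import re

sample = "(123) 241()-127()()() abc (((((((("

# One class: letter ranges sit next to the individual characters.
# The trailing "-" is literal because it is the last item in the class.
single_class = re.compile(r"[a-zA-Z\n\r\t\s()-]")

# The same set of characters expressed as an alternation of two classes.
alternation = re.compile(r"[\n\r\t\s()-]|[a-zA-Z]")

# Removing every match leaves only the digits.
digits_only = single_class.sub("", sample)
print(digits_only)

# Both forms find exactly the same characters in this input.
same_matches = single_class.findall(sample) == alternation.findall(sample)
print(same_matches)

# A reversed range such as a-Z is rejected by Python's re module.
try:
    re.compile(r"[a-Z]")
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False
print(reversed_range_ok)
```

Other engines may treat a-Z or nested brackets differently, so it is worth checking against the flavor selected in regex101.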

Regex to detect filling character length with periods

I'm trying to build some regex that would detect when someone is trying to "fill out" their username with dots.
There are a few other requirements:
username must contain only letters, numbers and dots
username must start and end with a letter or number
but not more than one consecutive dot
minimum of 6 characters (letters and numbers)
e.g.:
a.b.c.d.e.6 is allowed (not caught) because it has 6 characters
a.b.c.d.5 is not (is caught) because it does not have the prerequisite 6 characters
The way that I'm building the regex, if there's a match, the username is rejected.
What I have thus far is:
/[^a-z0-9.]|^\.|\.$|\.{2,}|\S{31,}|^\S{0,5}$/i
This catches:
any characters that aren't letters, numbers, dots
can't start with a dot
can't end with a dot
can't have 2 or more consecutive dots
can't have 31 or more characters
can't have 5 or fewer characters
I've tried dozens of different ways to get that last check in place, but they've all either broken the entire check, included the allowable (a.b.c.d.e.6) or just not worked.
the one that I've come closest with is:
(\.{1}[a-z0-9]{1,}){1,3}\S{1,}$
The problem with this is that it's also catching 123.456 (which should be allowed / not caught)
other examples of character strings that it should catch:
asdf.g
a.sdfg
a.sdf.g
as.df.g
I'm trying to do this using only regex, without having to pre-format it using JS.
Ok, after much experimentation I've actually found the answer. It turns out that finding the non-permitted strings was actually easier (for me anyway):
/^(\w\.?){4}\w$/
same expression expanded:
/^\w\.?\w\.?\w\.?\w\.?\w$/
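This will catch any string made up of exactly 5 alphanumeric characters, optionally interspersed with single dots. Strings of 5 or fewer characters without dots are caught by the ^\w{0,5}$ part of the full regex.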
The full regex that I'm using also catches:
Strings of 31 or more characters (alphanumeric and periods).
Any characters that are not alphanumeric and periods.
Any string starting with a period
Any string ending with a period
Any string that has 2 or more consecutive periods
And a new-comer to the list: Any string that has 8 or more numeric digits without any alpha.
/^(\w\.?){4}\w$|^\w{0,5}$|\w{31,}|[^a-z0-9.]|^\.|\.$|\.{2,}|\d{8,}/i
I've tested this with all the possible combinations that I can think of on regex101 here: https://regex101.com/r/xI7wZ3/1
And it works! (yay)
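For anyone who wants to re-run the checks outside regex101, here is a sketch of the full expression in Python. The original is a JavaScript-style literal with the i flag, which I have translated to re.IGNORECASE. The function name `is_rejected` and the extra sample usernames are my own:

```python
import re

# Full rejection regex from the answer; a match means the username is NOT allowed.
REJECT_RE = re.compile(
    r"^(\w\.?){4}\w$|^\w{0,5}$|\w{31,}|[^a-z0-9.]|^\.|\.$|\.{2,}|\d{8,}",
    re.IGNORECASE,
)


def is_rejected(username: str) -> bool:
    """Return True if any of the rejection rules matches the username."""
    return REJECT_RE.search(username) is not None


should_pass = ["a.b.c.d.e.6", "123.456", "abcdef"]
should_fail = ["a.b.c.d.5", "asdf.g", "a.sdfg", "a.sdf.g", "as.df.g",
               ".abcdef", "abcdef.", "abc..def", "abc_def", "abcde", "12345678"]

for name in should_pass + should_fail:
    print(f"{name!r}: rejected={is_rejected(name)}")

# Observation: a dotted name with fewer than 5 alphanumerics, such as "a.b.c",
# slips through, because the first alternative needs exactly 5 word characters.
print(is_rejected("a.b.c"))
```

If names like a.b.c also need to be caught, the first alternative would have to allow fewer than five word characters; I have not changed the expression here.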

How to optimise this regex to match string (1234-12345-1)

I've got this RegEx example: http://regexr.com?34hihsvn
I'm wondering if there's a more elegant way of writing it, or perhaps a more optimised way?
Here are the rules:
Digits and dashes only.
Must not contain more than 10 digits.
Must have two hyphens.
Must have at least one digit between each hyphen.
Last number must only be one digit.
I'm new to this so would appreciate any hints or tips.
In case the link expires, the text to search is
----------
22-22-1
22-22-22
333-333-1
333-4444-1
4444-4444-1
4444-55555-1
55555-4444-1
666666-7777777-1
88888888-88888888-1
1-1-1
88888888-88888888-22
22-333-
333-22
----------
My regex is: \b((\d{1,4}-\d{1,5})|(\d{1,5}-\d{1,4}))-\d{1}\b
I'm using this site for testing: http://gskinner.com/RegExr/
Thanks for any help,
Nick
Here is a regex I came up with:
(?=\b[\d-]{3,10}-\d\b)\b\d+-\d+-\d\b
This uses a look-ahead to validate the overall length before attempting the match: it looks for between 3 and 10 characters from the class [\d-], followed by a dash and a single digit. After that, the actual match confirms that the string really has the form digits(dash)digits(dash)digit.
From your sample strings this regex matches:
22-22-1
333-333-1
333-4444-1
4444-4444-1
4444-55555-1
55555-4444-1
1-1-1
It also matches the following strings:
22-7777777-1
1-88888888-1
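The list above can be reproduced with a short Python sketch. Each sample line from the question is searched on its own; the variable names are mine:

```python
import re

# Look-ahead checks the overall length, then the body checks the shape.
LOOKAHEAD_RE = re.compile(r"(?=\b[\d-]{3,10}-\d\b)\b\d+-\d+-\d\b")

# Sample lines from the question.
samples = [
    "22-22-1", "22-22-22", "333-333-1", "333-4444-1", "4444-4444-1",
    "4444-55555-1", "55555-4444-1", "666666-7777777-1",
    "88888888-88888888-1", "1-1-1", "88888888-88888888-22",
    "22-333-", "333-22",
]

matched = [s for s in samples if LOOKAHEAD_RE.search(s)]
print(matched)

# The two additional strings mentioned in the answer.
extra = [s for s in ["22-7777777-1", "1-88888888-1"] if LOOKAHEAD_RE.search(s)]
print(extra)
```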
Your regexp only allows a first and second group of digits with a maximum length of 5. Therefore, valid strings like 1-12345678-1 or 123456-1-1 won't be matched.
This regexp works for the given requirements:
\b(?:\d\-\d{1,8}|\d{2}\-\d{1,7}|\d{3}\-\d{1,6}|\d{4}\-\d{1,5}|\d{5}\-\d{1,4}|\d{6}\-\d{1,3}|\d{7}\-\d{1,2}|\d{8}\-\d)\-\d\b
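A sketch in Python comparing this enumerated pattern with the regex from the question, using the two strings the answer mentions plus the question's sample lines. The variable names are my own:

```python
import re

# Enumerated alternatives: the first two groups together never exceed 9 digits.
ENUMERATED_RE = re.compile(
    r"\b(?:\d\-\d{1,8}|\d{2}\-\d{1,7}|\d{3}\-\d{1,6}|\d{4}\-\d{1,5}"
    r"|\d{5}\-\d{1,4}|\d{6}\-\d{1,3}|\d{7}\-\d{1,2}|\d{8}\-\d)\-\d\b"
)

# Regex from the question, which caps each of the first two groups at 5 digits.
ORIGINAL_RE = re.compile(r"\b((\d{1,4}-\d{1,5})|(\d{1,5}-\d{1,4}))-\d{1}\b")

samples = [
    "22-22-1", "22-22-22", "333-333-1", "333-4444-1", "4444-4444-1",
    "4444-55555-1", "55555-4444-1", "666666-7777777-1",
    "88888888-88888888-1", "1-1-1", "88888888-88888888-22",
    "22-333-", "333-22",
]
matched = [s for s in samples if ENUMERATED_RE.search(s)]
print(matched)

# Valid strings that the question's regex misses but the enumerated one accepts.
for s in ["1-12345678-1", "123456-1-1"]:
    print(s, bool(ENUMERATED_RE.search(s)), bool(ORIGINAL_RE.search(s)))
```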
You can use this with the m modifier (switch the multiline mode on):
^\d(?!.{12})\d*-\d+-\d$
or this one without the m modifier:
\b\d(?!.{12})\d*-\d+-\d\b
By design, these two patterns require at least three digits separated by hyphens, so there is no need to put a minimum-length quantifier such as {5,n} anywhere.
The patterns are also built to fail fast:
I chose to start them with a digit \d; this way, each line start or word boundary that is not followed by a digit is discarded immediately. In addition, having consumed exactly one digit, I know how long the rest of the string may be.
Then I test the upper limit of the string length with a negative lookahead that checks whether there is one more character than the maximum allows (if there are 12 characters after this position, the string has at least 13 characters). There is no need for anything more descriptive than the dot metacharacter here; the goal is to test the length quickly.
Finally, I describe the rest of the string without doing anything special. That is probably the slowest part of the pattern, but it doesn't matter, since the overwhelming majority of unwanted positions have already been discarded.
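Both variants can be tried on the question's text with a short Python sketch. re.MULTILINE stands in for the m modifier; the variable names and the final sentence-style sample are my own:

```python
import re

# The search text from the question, one candidate per line.
text = "\n".join([
    "22-22-1", "22-22-22", "333-333-1", "333-4444-1", "4444-4444-1",
    "4444-55555-1", "55555-4444-1", "666666-7777777-1",
    "88888888-88888888-1", "1-1-1", "88888888-88888888-22",
    "22-333-", "333-22",
])

# Line-anchored variant (needs multiline mode so ^ and $ match at each line).
anchored = re.compile(r"^\d(?!.{12})\d*-\d+-\d$", re.MULTILINE)

# Word-boundary variant (no multiline mode needed).
bounded = re.compile(r"\b\d(?!.{12})\d*-\d+-\d\b")

anchored_hits = anchored.findall(text)
bounded_hits = bounded.findall(text)
print(anchored_hits)
print(bounded_hits)

# The dot in the lookahead counts every character up to the end of the line,
# so a candidate followed by more than 11 other characters on the same line is skipped.
in_sentence = bounded.search("id 22-22-1 is the one we want")
print(in_sentence)
```

The last check suggests that the word-boundary variant is best suited to input where each candidate sits on its own line, as in the question's sample text.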