Anyone see anything wrong with my regex for port numbers?

I made a regex for port numbers (before you say this is a bad idea, it's going into a bigger regex for URLs, which is much harder than it sounds).
My coworker said this is really bad and isn't going to catch everything. I disagree.
I believe this thing catches everything from 0 to 65535 and nothing else, and I'm looking for confirmation of this.
Single-line version (for computers):
/(^[0-9]$)|(^[0-9][0-9]$)|(^[0-9][0-9][0-9]$)|(^[0-9][0-9][0-9][0-9]$)|((^[0-5][0-9][0-9][0-9][0-9]$)|(^6[0-4][0-9][0-9][0-9]$)|(^65[0-4][0-9][0-9]$)|(^655[0-2][0-9]$)|(^6553[0-5]$))/
Human readable version:
/(^[0-9]$)| # single digit
(^[0-9][0-9]$)| # two digit
(^[0-9][0-9][0-9]$)| # three digit
(^[0-9][0-9][0-9][0-9]$)| # four digit
((^[0-5][0-9][0-9][0-9][0-9]$)| # five digit (up to 59999)
(^6[0-4][0-9][0-9][0-9]$)| # (up to 64999)
(^65[0-4][0-9][0-9]$)| # (up to 65499)
(^655[0-2][0-9]$)| # (up to 65529)
(^6553[0-5]$))/ # (up to 65535)
Can someone confirm that my understanding is correct (or otherwise)?

You could shorten it considerably:
^0*(?:6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{1,3}|[0-9])$
no need to repeat the anchors every single time
no need for lots of capturing groups
no need to spell out repetitions.
Drop the leading 0* if you don't want to allow leading zeroes.
This regex is also better because it matches the special cases (65535, 65001 etc.) first and thus avoids some backtracking.
Oh, and since you said you want to use this as part of a larger regex for URLs, you should then replace both ^ and $ with \b (word boundary anchors).
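A minimal Python sketch of the \b variant follows. It assumes the port sits right after `host:` in a URL. The names `PORT_CORE`, `URL_WITH_PORT` and `find_port`, and the sample URLs, are illustrative and not from the answer:

```python
import re

# Core alternation from the answer above, without the ^/$ anchors.
PORT_CORE = (
    r"0*(?:6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}"
    r"|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{1,3}|[0-9])"
)

# Hypothetical URL fragment: "://", a host without ':' or '/', then ":port".
# \b on both sides replaces ^ and $ so the port can sit inside a longer string.
URL_WITH_PORT = re.compile(r"://[^/:]+:(\b" + PORT_CORE + r"\b)")


def find_port(url):
    """Return the port string if the URL carries a valid one, else None."""
    m = URL_WITH_PORT.search(url)
    return m.group(1) if m else None


for url in ["http://example.com:8080/index.html",
            "http://example.com:65536/",
            "http://example.com/"]:
    print(url, find_port(url))
```

With \b on both sides, 65536 is rejected outright rather than partially matched as 6553. None of the alternatives can end on a word boundary in the middle of a run of digits.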
Edit: @ceving asked if the repetition of 6553, 655, 65 and 6 is really necessary. The answer is no - you can also use a nested regex instead of having to repeat those leading digits. Let's just consider the section
6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}
This can be rewritten as
6(?:[0-4][0-9]{3}|5(?:[0-4][0-9]{2}|5(?:[0-2][0-9]|3[0-5])))
I would argue that this makes the regex even less readable than it already was. Verbose mode makes the differences a bit clearer. Compare
6553[0-5] |
655[0-2][0-9] |
65[0-4][0-9]{2} |
6[0-4][0-9]{3}
with
6
(?:
    [0-4][0-9]{3}
    |
    5
    (?:
        [0-4][0-9]{2}
        |
        5
        (?:
            [0-2][0-9]
            |
            3[0-5]
        )
    )
)
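The nested form is a hand refactoring, so it is worth confirming that it accepts exactly the same strings as the flat one. Below is a rough Python check. It assumes whole-string matching via `re.fullmatch`. The names `FLAT` and `NESTED` are just labels:

```python
import re

# Flat form from the answer (anchoring is done by fullmatch below).
FLAT = re.compile(
    r"0*(?:6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}"
    r"|[1-5][0-9]{4}|[1-9][0-9]{1,3}|[0-9])"
)
# Same regex with the leading 6, 65, 655 factored out (nested form).
NESTED = re.compile(
    r"0*(?:6(?:[0-4][0-9]{3}|5(?:[0-4][0-9]{2}|5(?:[0-2][0-9]|3[0-5])))"
    r"|[1-5][0-9]{4}|[1-9][0-9]{1,3}|[0-9])"
)

# Every number up to 99999, with and without one leading zero.
candidates = [str(n) for n in range(100000)]
candidates += ["0" + s for s in candidates]

# Collect any string on which the two regexes disagree.
disagreements = [s for s in candidates
                 if bool(FLAT.fullmatch(s)) != bool(NESTED.fullmatch(s))]
print(len(disagreements))  # 0
```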
Some performance measurements: Testing each regex against all numbers from 1 through 99999 shows a minimal, probably irrelevant performance benefit for the nested version:
import timeit
r1 = """import re
regex = re.compile(r"0*(?:6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{1,3}|[0-9])$")"""
r2 = """import re
regex = re.compile(r"0*(?:6(?:[0-4][0-9]{3}|5(?:[0-4][0-9]{2}|5(?:[0-2][0-9]|3[0-5])))|[1-5][0-9]{4}|[1-9][0-9]{1,3}|[0-9])$")"""
stmt = """for i in range(1,100000):
    regex.match(str(i))"""
print(timeit.timeit(setup=r1, stmt=stmt, number=100))
print(timeit.timeit(setup=r2, stmt=stmt, number=100))
Output:
7.7265428834649
7.556472630353351

Personally I would match just a number and then I would check with code that the number is in range.
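Below is a minimal Python sketch of that two-step approach. `parse_port` is a hypothetical helper. Leading zeros (`00080`) are accepted here, which may or may not be what you want:

```python
import re

# Shape check only: one to five ASCII digits (re.ASCII keeps \d to 0-9).
DIGITS = re.compile(r"\d{1,5}", re.ASCII)


def parse_port(text):
    """Hypothetical helper: return the port as an int, or None if invalid."""
    if DIGITS.fullmatch(text) is None:
        return None
    port = int(text)  # the numeric range is checked in code, not in the regex
    return port if port <= 65535 else None


for s in ["8080", "65535", "65536", "00080", "123456", "80a"]:
    print(repr(s), parse_port(s))
```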

Well, it's easy to prove that it will validate any correct port: just generate each valid string and test that it passes. Making sure it doesn't allow anything that it shouldn't is harder though - obviously you can't test absolutely every invalid string. You should definitely test simple cases and anything which you think might pass incorrectly (or which would pass incorrectly with a lesser regex - "65536" being an example).
It will allow some slightly odd port specifications though - such as "0000". Do you want to allow leading zeroes?
You might also want to consider whether you actually need to specify ^ and $ separately for each case, or whether you could write them once around a group: ^((case 1)|(case 2)|...)$. Without the outer group, ^ would bind only to the first case and $ only to the last. Oh, and quantifiers could simplify the "1 to 4 digits" case too: ([0-9]{1,4}) will find between 1 and 4 digits.
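In that spirit, a brute-force check in Python is cheap: test every string from 0 to 99999, plus a few hand-picked edge cases. This assumes Python's `re` treats the asker's pattern the same way as their engine. That should hold for syntax this simple, with one Python-specific caveat noted in the comments:

```python
import re

# The asker's single-line regex, copied without the surrounding slashes.
ASKER = re.compile(
    r"(^[0-9]$)|(^[0-9][0-9]$)|(^[0-9][0-9][0-9]$)|(^[0-9][0-9][0-9][0-9]$)"
    r"|((^[0-5][0-9][0-9][0-9][0-9]$)|(^6[0-4][0-9][0-9][0-9]$)"
    r"|(^65[0-4][0-9][0-9]$)|(^655[0-2][0-9]$)|(^6553[0-5]$))"
)

# Every valid port must pass...
rejected_valid = [n for n in range(65536) if not ASKER.search(str(n))]
# ...and every larger number up to 99999 must fail.
accepted_invalid = [n for n in range(65536, 100000) if ASKER.search(str(n))]
print(rejected_valid, accepted_invalid)  # [] []

# Hand-picked edge cases. "0000" passes as a four-digit case.
# Python's $ also matches just before a trailing newline, so "80\n" passes too.
for s in ["65536", "0000", "-1", "", " 80", "80\n"]:
    print(repr(s), bool(ASKER.search(s)))
```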
(You might want to work on sounding a little less arrogant, by the way. If you're working with other people, communicating in a less aggressive way is likely to do more to improve everyone's day than just proving your regex is correct...)

What's wrong with parsing it into a number and working with integer comparisons (regardless of whether or not this will be part of a "larger" regex)?
If I were to use regex, I would just use:
\d{1,5}
Nope, it doesn't check for "valid" port numbers (neither does yours). But it's much more legible and for practical purposes I'd say it's "good enough."
PS: I'd work on being more humble.

A style note:
Repeating [0-9] over and over again is silly - something like [0-9][0-9][0-9] is much better written as \d{3}.

/^(6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{1,3}|\d)$/
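A common pitfall with this kind of pattern is that alternation has the lowest precedence. In `^A|B|C$`, the `^` attaches only to `A` and the `$` only to `C`, so the middle alternatives can match anywhere. Below is a short Python comparison of that unwrapped form against one with the anchors around a single group. The names `loose` and `strict` are illustrative:

```python
import re

# Anchors written only on the first and last alternatives.
loose = re.compile(r"^(6553[0-5])|(655[0-2]\d)|(65[0-4]\d{2})|(6[0-4]\d{3})"
                   r"|([1-5]\d{4})|([1-9]\d{1,3})|(\d)$")
# Same alternatives inside one group, so ^ and $ apply to all of them.
strict = re.compile(r"^(?:6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}"
                    r"|[1-5]\d{4}|[1-9]\d{1,3}|\d)$")

for s in ["8080", "70000", "port 99"]:
    print(repr(s), bool(loose.search(s)), bool(strict.search(s)))
```

The unwrapped version accepts "70000" (via "7000") and "port 99". The grouped version rejects both.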

Regex has many implementations; which platform are you using? Try the regex below (remove the blanks from the readable version):
^([1-5]?\d{1,4}|6([0-4]\d{3}|5([0-4]\d{2}|5([0-2]\d|3[0-5]))))$
Readable:
^
(
    [1-5]?\d{1,4}
    |
    6(
        [0-4]\d{3}
        |
        5(
            [0-4]\d{2}
            |
            5(
                [0-2]\d
                |
                3[0-5]
            )
        )
    )
)
$

I would use this one:
6(?:[0-4]\d{3}|5(?:[0-4]\d{2}|5(?:[0-2]\d|3[0-5])))|(?:[1-5]\d{0,3}|[6-9]\d{0,2})?\d
The following Perl script tests some numbers:
#! /usr/bin/perl
use strict;
use warnings;
my $port = qr{
    6(?:[0-4]\d{3}|5(?:[0-4]\d{2}|5(?:[0-2]\d|3[0-5])))|(?:[1-5]\d{0,3}|[6-9]\d{0,2})?\d
}x;
sub test {
    my ($label, $regexp, $start, $stop) = @_;
    my $matches = 0;
    my $tests = 0;
    foreach my $n ($start..$stop) {
        $tests++;
        $matches++ if "$n" =~ /^$regexp$/;
        $tests++;
        $matches++ if "0$n" =~ /^$regexp$/;
    }
    print "$label [$start $stop] => $matches matches in $tests tests\n";
}
test "Port", $port, 0, 2**16;
The output is:
Port [0 65536] => 65536 matches in 131074 tests

Related

Using RegEx how do I remove the trailing zeros from a decimal number

I need to write a regex that takes a number and removes any trailing zeros after the decimal point. The language is ActionScript 3, so I would like to write:
var result:String = theStringOfTheNumber.replace( [ the regex ], "" );
So for example:
3.04000 would be 3.04
0.456000 would be 0.456 etc
I've spent some time looking at various regex websites and I'm finding this harder to resolve than I initially thought.
Regex:
^(\d+\.\d*?[1-9])0+$
OR
(\.\d*?[1-9])0+$
Replacement string:
$1
Code:
var result:String = theStringOfTheNumber.replace(/(\.\d*?[1-9])0+$/g, "$1" );
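For comparison, here is the same pattern with Python's `re.sub`, where `\1` plays the role of `$1`. This is a sketch, and `strip_trailing_zeros` is a made-up name. As a later answer points out, a value with only zeros after the point is left unchanged:

```python
import re

# Same pattern as the ActionScript answer; \1 is the kept ".digits" part.
TRAILING_ZEROS = re.compile(r"(\.\d*?[1-9])0+$")


def strip_trailing_zeros(s):
    return TRAILING_ZEROS.sub(r"\1", s)


for s in ["3.04000", "0.456000", "1.5", "45.000"]:
    print(s, "->", strip_trailing_zeros(s))
```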
What worked best for me was
^([\d,]+)$|^([\d,]+)\.0*$|^([\d,]+\.[0-9]*?)0*$
For example,
s.replace(/^([\d,]+)$|^([\d,]+)\.0*$|^([\d,]+\.[0-9]*?)0*$/, "$1$2$3");
This changes
1.10000 => 1.1
1.100100 => 1.1001
1.000 => 1
1 => 1
What about stripping the trailing zeros before a \b boundary, provided there's at least one digit after the "."?
(\.\d+?)0+\b
And replace with what was captured in the first capture group.
$1
See test at regexr.com
(?=.*?\.)(.*?[1-9])(?!.*?\.)(?=0*$)|^.*$
Try this. Grab the capture. See the demo:
http://regex101.com/r/xE6aD0/11
Other answers didn't consider numbers with an all-zero fraction (like 1.000000), or they used a lookbehind (sadly, not supported by the implementation I'm using). So I modified the existing answers.
Match using ^-?\d+(\.\d*[1-9])? - Demo (see matches). This will not work with numbers in text (like sentences).
Replace(with \1 or $1) using (^-?\d+\.\d*[1-9])(0+$)|(\.0+$) - Demo (see substitution). This one will work with numbers in text (like sentences) if you remove the ^ and $.
Both demos with examples.
Side note: replace the \. with the decimal separator you use (, needs no backslash) if you have to, but I would advise against supporting multiple separator formats within such a regex (like (\.|,)). Internal formats normally use one specific separator, like . in 1.135644131 (no need to check for other potential separators), while external formats tend to use both (one for decimals and one for thousands, like 1.123,541,921), which would make your regex unreliable.
Update: I added -? to both regexes to support negative numbers; this is not reflected in the demos.
If your regular expressions engine doesn't support "lookaround" feature then you can use this simple approach:
fn:replace("12300400", "([^0])0*$", "$1")
Result will be: 123004
I know I'm kind of late, but I think this can be solved in a far simpler way.
Either I'm missing something or the other answers overcomplicate it, but I think there is a far more straightforward yet resilient RE:
([0-9]*[.]?([0-9]*[1-9]|[0]?))[0]*
By backreferencing the first group (\1) you can get the number without trailing zeros.
It also works with .XXXXX... and ...XXXXX. type number strings. For example, it will convert .45600 to .456 and 123. to 123. as well.
More importantly, it leaves integer number strings intact (numbers without decimal point). For example, it will convert 12300 to 12300.
Note that if there is a decimal point and there are only zeroes after it, it will leave a single trailing zero. For example, 42.0000 becomes 42.0.
If you want to eliminate the leading zeroes too, then use this RE (just put a [0]* at the start of the former):
[0]*([0-9]*[.]?([0-9]*[1-9]|[0]?))[0]*
I tested a few of the answers above:
^(\d+\.\d*?[1-9])0+$
(\.\d*?[1-9])0+$
(\.\d+?)0+\b
None of them works when all digits after "." are zeroes, as in 45.000 or 450.000.
Modified version to handle that case: (\.\d*?[1-9]|)\.?0+$
You also need to replace with '$1', like:
preg_replace('/(\.\d*?[1-9]|)\.?0+$/', '$1', $value);
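Below is a Python translation of this `preg_replace`, as a sketch. One caveat: the empty alternative also lets the pattern strip zeros from integers that have no decimal point, so `450` becomes `45`. The `"." not in value` guard in `strip_zeros` is an added assumption and is not part of the answer:

```python
import re

# Python translation of the preg_replace pattern above.
MODIFIED = re.compile(r"(\.\d*?[1-9]|)\.?0+$")


def strip_zeros(value):
    # Added guard: without it the pattern would also turn "450" into "45".
    if "." not in value:
        return value
    return MODIFIED.sub(r"\1", value)


for s in ["45.000", "450.000", "3.04000", "1.5", "450"]:
    print(s, "->", strip_zeros(s))

print("unguarded:", MODIFIED.sub(r"\1", "450"))  # unguarded: 45
```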
Try this:
^(?!0*(\.0+)?$)(\d+|\d*\.\d+)$
And read this, it might be helpful:
http://www.regular-expressions.info/numericranges.html
I know it's not what the original question is looking for, but for anyone who wants to format money and only remove two consecutive trailing zeros, like so:
£30.00 => £30
£30.10 => £30.10 (and not £30.1)
30.00€ => 30€
30.10€ => 30.10€
Then you should be able to use the following regular expression. It identifies two trailing zeros that come after a non-digit (such as the decimal point) and are followed by either a non-digit or the end of the string.
([^\d]00)(?=[^\d]|$)
I'm a bit late to the party, but here's my solution:
(((?<=(\.|,)\d*?[1-9])0+$)|(\.|,)0+$)
My regular expression will only match the trailing 0s, making it easy to do a .replaceAll(..) type function.
Breaking it down, part one: ((?<=(\.|,)\d*?[1-9])0+$)
(?<=(\.|,): A positive lookbehind. The number must contain a . or a , (commas are used as the decimal point in some countries). Because it's a lookbehind, it is not included in the matched text, but it must still be present.
\d*?: Matches any number of digits lazily
[1-9]: Matches a single non-zero digit (this will be the last digit before the trailing 0s)
0+$: Matches 1 or more 0s that occur between the last non-zero digit and the line end.
This works great for everything except the case where trailing 0s begin immediately, like in 1.0 or 5.000. The second part fixes this (\.|,)0+$:
(\.|,): Matches a . or a , that will be included in matched text.
0+$ matches 1 or more 0s between the decimal point and the line end.
Examples:
1.0 becomes 1
5.0000 becomes 5
5.02394900022000 becomes 5.02394900022
Is it really necessary to use regex? Why not just check the last digits in your numbers? I am not familiar with ActionScript 3, but in Python I would do something like this:
decinums = ['1.100', '0.0', '1.1', '10']
for d in decinums:
    if '.' in d:  # d.find('.') would return -1 (truthy) when there is no '.'
        while d.endswith('0'):
            d = d[:-1]
        if d.endswith('.'):
            d = d[:-1]
    print(d)
The result will be:
1.1
0
1.1
10
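The same loop can be written more compactly with `str.rstrip`. Below is a sketch; `strip_decimal_zeros` is a hypothetical name. It keeps the explicit check for a decimal point so that integers such as `10` are untouched:

```python
def strip_decimal_zeros(d):
    """Drop trailing zeros after a decimal point, then a dangling '.'."""
    if "." not in d:  # integers such as "10" are left alone
        return d
    return d.rstrip("0").rstrip(".")


for d in ["1.100", "0.0", "1.1", "10"]:
    print(strip_decimal_zeros(d))
```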

regex for n characters or at least m characters

This should be a pretty simple regex question, but I couldn't find any answers anywhere. How would one make a regex that matches either exactly 2 characters or at least 4 characters? Here is my current method of doing it (ignore the regex itself, that's beside the point):
[A-Za-z0_9_]{2}|[A-Za-z0_9_]{4,}
However, this method takes twice the time (and is approximately 0.3s slower for me on a 400 line file), so I was wondering if there was a better way to do it?
Optimize the beginning, and anchor it.
^[A-Za-z0-9_]{2}(?:|[A-Za-z0-9_]{2,})$
(Also, you did say to ignore the regex itself, but I guessed you probably wanted 0-9, not 0_9)
Edit: Hm, I was sure I read that you wanted to match lines. Remove the anchors (^$) if you want to match inside the line as well. If you do match full lines only, anchors will speed you up (well, the front anchor ^ will, at least).
Your solution looks pretty good. As an alternative, you can try something like this:
[A-Za-z0-9_]{2}(?:[A-Za-z0-9_]{2,})?
Btw, I think you want a hyphen instead of an underscore between 0 and 9, don't you?
The solution you present is correct.
If you're trying to optimize the routine, and the number of strings matching 2 or more characters is much smaller than the number that do not, consider accepting all strings of length 2 or greater, then discarding those of length 3. This may boost performance because the regex is checked only once, and the second check need not even be a regular expression; checking a string's length is usually an extremely fast operation.
As always, you really need to run tests on real-world data to verify if this would give you a speed increase.
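Below is a quick Python sketch of that idea. It is checked against the anchored regex from the first answer on every short string over a tiny alphabet. The sketch assumes whole-line matching (`re.fullmatch`) and uses `0-9` rather than the question's `0_9`. `accept` is a hypothetical helper:

```python
import re
from itertools import product

CHARS = "[A-Za-z0-9_]"
# Anchored form from the first answer: 2 characters, or 2 + at least 2 more.
TWO_OR_FOUR_PLUS = re.compile(CHARS + "{2}(?:|" + CHARS + "{2,})")
# Looser single check: 2 or more characters.
TWO_PLUS = re.compile(CHARS + "{2,}")


def accept(s):
    """Hypothetical helper: one regex check, then reject length 3 in code."""
    return TWO_PLUS.fullmatch(s) is not None and len(s) != 3


# Every string of length 0-6 over "a1_-" ('-' is outside the character class).
samples = ["".join(p) for n in range(7) for p in product("a1_-", repeat=n)]
mismatches = [s for s in samples
              if accept(s) != (TWO_OR_FOUR_PLUS.fullmatch(s) is not None)]
print(len(mismatches))  # 0
```

Whether this is actually faster depends on the engine and the data, so the advice above about measuring on real-world input still applies.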
So basically you want to match words of length either 2 or 2+2+N, N>=0
([A-Za-z0-9][A-Za-z0-9](?:[A-Za-z0-9][A-Za-z0-9])*)
working example:
#!/usr/bin/perl
while (<STDIN>)
{
    chomp;
    my @matches = ($_ =~ /([A-Za-z0-9][A-Za-z0-9](?:[A-Za-z0-9][A-Za-z0-9])*)/g);
    for my $m (@matches) {
        print "match: $m\n";
    }
}
input file:
cat in.txt
ab abc bcad a as asdfa
aboioioi i i abc bcad a as asdfa
output:
perl t.pl <in.txt
match: ab
match: ab
match: bcad
match: as
match: asdf
match: aboioioi
match: ab
match: bcad
match: as
match: asdf

RegEx: How can I replace with $n instances of a string?

I'm trying to replace numbers of the form 4.2098234e-3 with 00042098234. I can capture the component parts ok with:
(-?)(\d+)\.(\d+)e-(\d+)
but what I don't know how to do is to repeat the zeros at the start $4 times.
Any ideas?
Thanks in advance,
Ross
Ideally, I'd like to be able to do this with the find/replace feature of TextMate, if that's of any consequence. I appreciate that there are better tools than RegEx for this problem, but it's still an interesting question (to me).
You can't do it purely in regular expressions, because the replace string is just a string with backreferences -- you can't use repetition there.
In most programming languages, you have regex replace with a callback, which would be able to do it. However, it's not something a text editor can do (unless it has some scripting support).
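For example, Python's `re.sub` accepts a function as the replacement, so the zero padding can be computed per match. Below is a sketch using a corrected version of the question's pattern (escaped `.`, and `\d` in place of `$d`). `expand` is a made-up name:

```python
import re

# Sign, integer part, fraction digits, and the (negative) exponent.
SCI = re.compile(r"(-?)(\d+)\.(\d+)e-(\d+)")


def expand(match):
    """Build sign + exponent-many zeros + all mantissa digits (the format asked for)."""
    sign, whole, frac, exp = match.groups()
    return sign + "0" * int(exp) + whole + frac


print(SCI.sub(expand, "value = 4.2098234e-3;"))  # value = 00042098234;
```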
This isn't something that should be done with regex. That said, you can do something like this, but it's not really worth the effort: the regex is complicated, and the capability is limited.
Here's an illustrative example of replacing a digit [0-9] with that many zeroes.
public class Main {
    public static void main(String[] args) {
        // generate the regex and the replacement strings
        String seq = "123456789";
        String regex = seq.replaceAll(".", "(?=[$0-9].*(0)\\$)?") + "\\d";
        String repl = seq.replaceAll(".", "\\$$0");
        // let's see what they look like!!!
        System.out.println(repl); // prints "$1$2$3$4$5$6$7$8$9"
        System.out.println(regex); // prints oh my god just look at the next section!
        // let's see if they work...
        String input = "3 2 0 4 x 11 9";
        System.out.println(
            (input + "0").replaceAll(regex, repl)
        ); // prints "000 00  0000 x 00 000000000"
        // it works!!!
    }
}
The regex is (as seen on ideone.com) (slightly formatted for readability):
(?=[1-9].*(0)$)?
(?=[2-9].*(0)$)?
(?=[3-9].*(0)$)?
(?=[4-9].*(0)$)?
(?=[5-9].*(0)$)?
(?=[6-9].*(0)$)?
(?=[7-9].*(0)$)?
(?=[8-9].*(0)$)?
(?=[9-9].*(0)$)?
\d
But how does it work??
The regex relies on positive lookaheads. It matches \d, but before doing that, it tries to see if it's [1-9]. If so, \1 goes all the way to the end of the input, where a 0 has been appended, to capture that 0. Then the second assertion checks if it's [2-9], and if so, \2 goes all the way to the end of the input to grab 0, and so on.
The technique works, but beyond a cute regex exercise, it probably has no real practicability.
Note also that 11 is replaced with 00. That is, each 1 is replaced with 1 zero. It's probably possible to recognize 11 as a number and put 11 zeroes instead, but it'd only make the regex more convoluted.

regex to match a maximum of 4 spaces

I have a regular expression to match a persons name.
So far I have ^([a-zA-Z\'\s]+)$ but I'd like to add a check to allow a maximum of 4 spaces. How do I amend it to do this?
Edit: what I meant was 4 spaces anywhere in the string.
Don't attempt to regex validate a name. People are allowed to call themselves whatever they like. This can include ANY character. Just because you live somewhere that only uses English doesn't mean that all the people who use your system will have English names. We have even had to make the name field in our system Unicode. It is the only Unicode type in the database.
If you care, we actually split the name at " " and store each name part as a separate record, but we have some very specific requirements that mean this is a good idea.
PS. My step mum has 5 spaces in her name.
^ # Start of string
(?!\S*(?:\s\S*){5}) # Negative look-ahead for five spaces.
([a-zA-Z\'\s]+)$ # Original regex
Or in one line:
^(?!(?:\S*\s){5})([a-zA-Z\'\s]+)$
If there are five or more spaces in the string, five will be matched by the negative lookahead, and the whole match will fail. If there are four or fewer, the original regex will be matched.
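Below is a quick Python check of the one-line version, alongside a plain character-set and space-count test for comparison. `ALLOWED` and `by_count` are illustrative. The count only looks at ordinary spaces, whereas `\s` also matches tabs:

```python
import re
import string

# One-line regex from the answer: allowed characters, at most four spaces.
MAX_FOUR_SPACES = re.compile(r"^(?!(?:\S*\s){5})([a-zA-Z\'\s]+)$")

# Illustrative non-regex cross-check: same character set, spaces counted.
ALLOWED = set(string.ascii_letters + "' ")

for name in ["Mary Ann", "a b c d e", "a b c d e f", "O'Brien", "Anne-Marie"]:
    by_regex = MAX_FOUR_SPACES.match(name) is not None
    by_count = name != "" and set(name) <= ALLOWED and name.count(" ") <= 4
    print(repr(name), by_regex, by_count)
```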
Screw the regex.
Using a regex here seems to be creating a problem for a solution instead of just solving a problem.
This task should be 'easy' for even a novice programmer, and the novel idea of regex has polluted our minds!
1: Get Input
2: Trim White Space
3: If this makes sense, trim out any 'bad' characters.
4: Use the "split" utility provided by your language to break it into words
5: Return the first 5 Words.
ROCKET SCIENCE.
replies
What do you mean, screw the regex? You're obviously a VB programmer.
Regex is the most efficient way to work with strings. Learn them.
No. PHP, toyed a bit with Ruby, now going manically into Perl.
There are some things (like this case) where the regex-based alternative is exponentially more complex, computationally and logically, than the task requires.
I've parsed entire PHP source files with regex; I'm not exactly a novice in their use.
But there are many cases, such as this, where you're employing a logging company to prune your rose bush.
I could do all steps 2 to 5 with regex of course, but they would be simple and atomic regex, with no weird backtracking syntax or potential for recursive searching.
The steps 1 to 5 I list above have a known scope, known range of input, and there's no ambiguity to how it functions. As to your regex, the fact you have to get contributions of others to write something so simple is proving the point.
I see somebody marked my post as offensive, I am somewhat unhappy I can't mark this fact as offensive to me. ;)
Proof Of Pudding:
sub getNames {
    my @args = @_;
    my $text = shift @args;
    my $num  = shift @args;
    # Trim Whitespace from Head/End
    $text =~ s/^\s*//;
    $text =~ s/\s*$//;
    # Trim Bad Characters (??)
    $text =~ s/[^a-zA-Z\'\s]//g;
    # Tokenise By Space
    my @words = split( /\s+/, $text );
    # return 0..n
    return @words[ 0 .. $num - 1 ];
} ## end sub getNames
print join ",", getNames " Hello world this is a good test", 5;
>> Hello,world,this,is,a
If anything about how that works is unclear to anybody, I'll be glad to explain it. Note that I'm still doing it with regexps; in other languages I would have used their native "trim" functions where possible.
Bollocks -->
I first tried this approach. This is your brain on regex. Kids, don't do regex.
This might be a good start
/([^\s]+
    (\s[^\s]+
        (\s[^\s]+
            (\s[^\s]+
                (\s[^\s]+|)
            |)
        |)
    |)
)/
( Linebroken for clarity )
/([^\s]+(\s[^\s]+(\s[^\s]+(\s[^\s]+(\s[^\s]+|)|)|)|))/
( Actual )
I've used [^\s]+ here instead of your A-Z combo for succinctness, but the point here is the nested optional groups
ie:
(Hello( this( is( example))))
(Hello( this( is( example( two)))))
(Hello( this( is( better( example))))) three
(Hello( this( is())))
(Hello( this()))
(Hello())
( Note: this, while being convoluted, has the benefit that it will match each name into its own group )
If you want readable code:
$word = '[^\s]+';
$regex = "/($word(\s$word(\s$word(\s$word(\s$word|)|)|)|)|)/";
( it anchors around the (capture|) mantra of "get this, or get nothing" )
@Sir Psycho: Be careful about your assumptions here. What about hyphenated names? Dotted names (e.g. Brian R. Bondy) and so on?
Here's the answer that you're most likely looking for:
^[a-zA-Z']+(\s[a-zA-Z']+){0,4}$
That says (in English): "From start to finish, match one or more letters, there can also be a space followed by another 'name' up to four times."
BTW: Why do you want them to have apostrophes anywhere in the name?
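This pattern is stricter than the lookahead version above. Every space has to sit between two name parts, so leading, trailing, or doubled spaces are rejected. Below is a quick Python check on a few made-up names:

```python
import re

# Answer's pattern: one name part, then up to four " part" repetitions.
NAME = re.compile(r"^[a-zA-Z']+(\s[a-zA-Z']+){0,4}$")

for s in ["Mary Ann", "a b c d e", "a b c d e f", "Mary  Ann", " Mary"]:
    print(repr(s), bool(NAME.match(s)))
```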
^([a-zA-Z']+\s){0,4}[a-zA-Z']+$
This assumes you want at most 4 spaces inside the string (i.e. you have already trimmed it).
Edit: If you want 4 spaces anywhere I'd recommend not using regex - you'd be better off using a substr_count (or the equivalent in your language).
I also agree with pipTheGeek that there are so many different ways of writing names that you're probably best off trusting the user to get their name right (although I have found that a lot of people don't bother using capital letters on ecommerce checkouts).
Match multiple whitespace followed by two characters at the end of the line.
Related problem ----
From a string, remove trailing 2 characters preceded by multiple white spaces... For example, if the column contains this string -
" 'This is a long string with 2 chars at the end AB "
then, AB should be removed while retaining the sentence.
Solution ----
select 'This is a long string with 2 chars at the end AB' as "C1",
regexp_replace('This is a long string with 2 chars at the end AB',
'[[:space:]]+[a-zA-Z]{2}$') as "C2" from dual;
Output ----
C1
This is a long string with 2 chars at the end AB
C2
This is a long string with 2 chars at the end
Analysis ----
The regular expression matches one or more whitespace characters ([[:space:]]+) followed by exactly two letters ([a-zA-Z]{2}) at the end of the line. Since no replacement string is given, regexp_replace removes the match.
Hope this is useful.

How can I make this regex more compact?

Let's say I have a line of text like this
Small 0.0..20.0 0.00 1.49 25.71 41.05 12.31 0.00 80.56
I want to capture the last six numbers and ignore the Small and the first two groups of numbers.
For this exercise, let's ignore the fact that it might be easier to just do some sort of string-split instead of a regular expression.
I have this regex that works but is kind of horrible looking
^(Small).*?[0-9.]+.*?[0-9.]+.*?([0-9.]+).*?([0-9.]+).*?([0-9.]+).*?([0-9.]+).*?([0-9.]+).*?([0-9.]+)
Is there some way to compact that?
For example, is it possible to combine the check for the last 6 numbers into a single statement that still stores the results as 6 separate group matches?
If you want to keep each match in a separate backreference, you have no choice but to "spell it out" - if you use repetition, you can either catch all six groups "as one" or only the last one, depending on where you put the capturing parentheses. So no, it's not possible to compact the regex and still keep all six individual matches.
A somewhat more efficient (though not beautiful) regex would be:
^Small\s+[0-9.]+\s+[0-9.]+\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)
since it matches the spaces explicitly. Your regex will result in a lot of backtracking. My regex matches in 28 steps, yours in 106.
Just as an aside: In Python, you could simply do a
>>> pieces = "Small 0.0..20.0 0.00 1.49 25.71 41.05 12.31 0.00 80.56".split()[-6:]
>>> print(pieces)
['1.49', '25.71', '41.05', '12.31', '0.00', '80.56']
Here is the shortest I could get:
^Small\s+(?:[\d.]+\s+){2}([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$
It must be long because each capture must be specified explicitly. No need to capture "Small", though. But it is better to be specific (\s instead of .) when you can, and to anchor on both ends.
For usability, you should use string substitution to build regex from composite parts.
$d = "[0-9.]+";
$s = ".*?";
$re = "^(Small)$s$d$s$d$s($d)$s($d)$s($d)$s($d)$s($d)$s($d)";
At least then you can see the structure past the pattern, and changing one part changes them all.
If you want to go further, you could define a short metasyntax and make it even easier to read:
$re = "^(Small)_#D_#D_(#D)_(#D)_(#D)_(#D)_(#D)_(#D)";
$re = str_replace('#D','[0-9.]+',$re);
$re = str_replace('_', '.*?' , $re );
( This way it also makes it trivial to change the definition of what a space token is, or what a digit token is )
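In Python, the same build-from-parts idea might look like the sketch below. It uses \s+ between tokens, as suggested in an earlier answer, instead of .*?. The names `NUM` and `GAP` are illustrative:

```python
import re

# Named building blocks: a number token and the gap between tokens.
NUM = r"[0-9.]+"
GAP = r"\s+"

# "Small", two ignored numbers, then six captured ones.
pattern = (r"^Small" + (GAP + NUM) * 2
           + "".join(GAP + "(" + NUM + ")" for _ in range(6)))
LINE = re.compile(pattern)

line = "Small 0.0..20.0 0.00 1.49 25.71 41.05 12.31 0.00 80.56"
print(LINE.match(line).groups())
# ('1.49', '25.71', '41.05', '12.31', '0.00', '80.56')
```

Changing `NUM` or `GAP` changes every occurrence at once, and each number still gets its own capture group.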