Regular expression for equations, variable number of inside parenthesis - regex

I'm trying to write Regex for the case where I have series of equations, for example:
a = 2 / (1 + exp(-2*n)) - 1
a = 2 / (1 + e) - 1
a = 2 / (3*(1 + exp(-2*n))) - 1
In any case I need to capture content of the outer parenthesis, so 1 + exp(-2*n), 1+e and 3*(1 + exp(-2*n)) respectively.
I can write expression that will catch one of them, like:
\(([\w\W]*?\))\) will perfectly catch 1 + exp(-2*n)
\(([\w\W]*?)\) will catch 1+e
\(([\w\W]*?\))\)\) will catch 3*(1 + exp(-2*n))
But it seems silly to pass three lines of code for something such simple. How can I bundle it? Please take a note that I will be processing text (in loop) line-by-line anyway, so you don't have to bother for securing operator to not greedy take next line.
Edit:
Un-nested brackets are also allowed: a = 2 / (1 + exp(-2*n)) - (2-5)

The commented code below does not use regular expressions, but does parse char arrays in MATLAB and output the terms which contain top-level brackets.
So in your 3 question examples with a single set of nested brackets, it returns the outermost bracketed term.
In the example from your comment where there are two or more (possibly nested) terms within brackets at the "top level", it returns both terms.
The logic is as follows, see the comments for more details
Find the left (opening) and right (closing) brackets
Generate the "nest level" according to how many un-closed brackets there are at each point in the equation char
Find the indicies where the nesting level changes. We're interested in opening brackets where the nest level increases to 1 and closing brackets where it decreases from 1.
Extract the terms from these indices
e = { 'a = 2 / (1 + exp(-2*n)) - 1'
'a = 2 / (1 + e) - 1'
'a = 2 / (3*(1 + exp(-2*n))) - 1'
'a = 2 / (1 + exp(-2*n)) - (2-5)' };
str = cell(size(e)); % preallocate output
for ii = 1:numel(e)
str{ii} = parseBrackets_(e{ii});
end
function str = parseBrackets_( equation )
bracketL = ( equation == '(' ); % indicies of opening brackets
bracketR = ( equation == ')' ); % indicies of closing brackets
str = {}; % intialise empty output
if numel(bracketL) ~= numel(bracketR)
% Validate the input
warning( 'Could not match bracket pairs, count mismatch!' )
return
end
nL = cumsum( bracketL ); % cumulative open bracket count
nR = cumsum( bracketR ); % cumulative close bracket count
nestLevel = nL - nR; % nest level is number of open brackets not closed
nestLevelChanged = diff(nestLevel); % Get the change points in nest level
% get the points where the nest level changed to/from 1
level1L = find( nestLevel == 1 & [true,nestLevelChanged==1] ) + 1;
level1R = find( nestLevel == 1 & [nestLevelChanged==-1,true] );
% Compile cell array of terms within nest level 1 brackets
str = arrayfun( #(x) equation(level1L(x):level1R(x)), 1:numel(level1L), 'uni', 0 );
end
Outputs:
str =
{'1 + exp(-2*n)'}
{'1 + e'}
{'3*(1 + exp(-2*n))'}
{'1 + exp(-2*n)'} {'2-5'}

Related

get integers from string

i have in the database data like this
61/10#61/12,0/12,10/16,0/21,0/12#61/33,0/28#0/34,0/23#0/28
where the part like 10/16(without #) is invalid should not use for the calculation,
but all other has next format min_hr + "/" + min_hrv + "#" + max_hr + "/" + max_hrv
and the issue is get AVG value by next psevdo formula [ summ(all(min_hrv)) + summ(all(max_hrv)) ] / count(all(min_hrv)) + all(max_hrv)), for the axample string result will be ((10 + 12 + 28 + 23) + (12 + 33 + 34 + 28))/8) == 22
What i try is:
SELECT regexp_replace(
'61/10#61/12,0/12,10/16,0/21,0/12#61/33,0/28#0/34,0/23#0/28',
',\d+/\d+,', ',',
'g'
);
to remove invalid data but 10/16 still in the strin, result is:
regexp_replace
--------------------------------------------------
61/10#61/12,10/16,0/12#61/33,0/28#0/34,0/23#0/28
if do good clean the string my plan is split to array some way like this, for max (not full solution, has empty string), has no solution for min:
SELECT
regexp_split_to_array(
regexp_replace(
'61/10#61/12,0/12,0/12#61/33,0/28#0/34,0/23#0/28',
',\d+/\d+,', ',',
'g'
)
,',?\d+/\d+#\d+/'
);
result is:
regexp_split_to_array
-----------------------
{"",12,33,34,28}
and then calculate the data, something like this:
SELECT ((
SELECT sum(tmin.unnest)
FROM
(SELECT unnest('{10,12,28,23}'::int[])) as tmin
)
+
(
SELECT sum(tmax.unnest)
FROM
(SELECT unnest('{12,33,34,28}'::int[])) as tmax
))
/
(SELECT array_length('{12,33,34,28}'::int[], 1) * 2)
may be some one know more simple and right way for such issue?
Use regexp_matches():
select (regexp_matches(
'61/10#61/12,0/12,0/12#61/33,0/28#0/34,0/23#0/28',
'\d+#\d+/(\d+)',
'g'))[1]
regexp_matches
----------------
12
33
34
28
(4 rows)
The whole calculation may look like this:
with my_data(str) as (
values
('61/10#61/12,0/12,10/16,0/21,0/12#61/33,0/28#0/34,0/23#0/28')
),
min_max as (
select
(regexp_matches(str, '(\d+)#\d+', 'g'))[1] as min_hrv,
(regexp_matches(str, '\d+#\d+/(\d+)', 'g'))[1] as max_hrv
from my_data
)
select avg(min_hrv::int+ max_hrv::int) / 2 as result
from min_max;
result
---------------------
22.5000000000000000
(1 row)
The pattern you are looking for should match the digits after #, a streak of digits and a / char. With regexp_matches, you may extract a part of the pattern only if you wrap that part within a pair of parentheses.
The solution is
regexp_matches(your_col, '#\d+/(\d+)', 'g')
Note that g stands for global, meaning that all occurrences found in the string will be returned.
Pattern details
\d+ - 1 or more (+) digits
/ - a /char
(\d+) - Capturing group 1: 1 or more digits
See the regex demo.
You may extract specific bits from your data if you use a single pair of parentheses in different parts of the '(\d+)/(\d+)#(\d+)/(\d+)' regex. To extract min_hr, you'd use '(\d+)/\d+#\d+/\d+'.

How to split expression containing brackets correctly

I am trying to write an expression handler that will correctly split brackets, until today it has worked very well, but I've now encountered a problem I hadn't thought of.
I try to split the expression by the content of brackets first, once these are evaluated I replace the original content with the results and process until there are no brackets remaining.
The expression may contain marcos/variables. Macros are denoted by text wrapped in $macro$.
A typical expression:
($exampleA$ * 3) + ($exampleB$ / 2)
Macros are replaced before the expression is evaluated, the above works fine because the process is as follows:
Split expression by brackets, this results in two expressions:
$exampleA$ * 3
$exampleB$ / 2
Each expression is then evaluated, if exampleA = 3 and exampleB = 6:
$exampleA$ * 3 = 3 * 3 = 9
$exampleB$ / 2 = 6 / 2 = 3
The expression is then rebuilt using the results:
9 + 3
The final expression without any brackets is then evaluated to:
12
This works fine until an expressions with nested brackets is used:
((($exampleA$ * 3) + ($exampleB$ / 2) * 2) - 1)
This breaks completely because the regular expression I'm using:
regex("(?<=\\()[^)]*(?=\\))");
Results in:
($exampleA$ * 3
$exampleB$ / 2
So how can I correctly decode this, I want the above to be broken down to:
$exampleA$ * 3
$exampleB$ / 2
I am not exactly sure what you are trying to do. If you want to match the innermost expressions, wouldn't this help?:
regex("(?<=\\()[^()]*(?=\\))");
By the way, are the parentheses in your example unbalanced on purpose?
Traditional regex cannot handle recursive structures like nested brackets.
Depending on which regex flavor you are using, you may be able to use regex recursion. Otherwise, you will probably need a new method for parsing the groups. I think the traditional way is to represent the expression as a stack: start with an empty stack, push when you find a '(', pop when you find a ')'.
You can't really do this with regex. You really need a recursive method, like this:
using System;
using System.Data;
using System.Xml;
public class Program
{
public static void Main() {
Console.WriteLine(EvaluateExpression("(1 + 2) * 7"));
}
public static int EvaluateExpression(string expression) {
// Recursively evaluate parentheses as sub expressions
var expr = expression.ToLower();
while (expr.Contains("(")) {
// Find first opening bracket
var count = 1;
var pStart = expr.IndexOf("(", StringComparison.InvariantCultureIgnoreCase);
var pos = pStart + 1;
// Find matching closing bracket
while (pos < expr.Length && count > 0) {
if (expr.Substring(pos, 1) == "(") count++;
if (expr.Substring(pos, 1) == ")") count--;
pos++;
}
// Error if no matching closing bracket
if (count > 0) throw new InvalidOperationException("Closing parentheses not found.");
// Divide expression into sub expression
var pre = expr.Substring(0, pStart);
var subexpr = expr.Substring(pStart + 1, pos - pStart - 2);
var post = expr.Substring(pos, expr.Length - pos);
// Recursively evaluate the sub expression
expr = string.Format("{0} {1} {2}", pre, EvaluateExpression(subexpr), post);
}
// Replace this line with you're own logic to evaluate 'expr', a sub expression with any brackets removed.
return (int)new DataTable().Compute(expr, null);
}
}
I'm assuming your using C# here... but you should get the idea and be able to translate it into whatever.
If you use the following regex, you can capture them as group(1). group(0) will have parenthesis included.
"\\(((?:\"\\(|\\)\"|[^()])+)\\)"
Hope it helps!

Regex for parenthesis (JavaScript)

This is the regexp I created so far:
\((.+?)\)
This is my test string: (2+2) + (2+3*(2+3))
The matches I get are:
(2+2)
And
(2+3*(2+3)
I want my matches to be:
(2+2)
And
(2+3*(2+3))
How should I modify my regular expression?
You cannot parse parentesized expressions with regular expression.
There is a mathematical proof that regular expressions can't do this.
Parenthesized expressions are a context-free grammar, and can thus be recognized by pushdown automata (stack-machines).
You can, anyway, define a regular expression that will work on any expression with less than N parentheses, with an arbitrary finite N (even though the expression will get complex).
You just need to acknowledge that your parentheses might contain another arbitrary number of parenteses.
\(([^()]+(\([^)]+\)[^)]*)*)\)
It works like this:
\(([^()]+ matches an open parenthesis, follwed by whatever is not a parenthesis;
(\([^)]+\)[^)]*)* optionally, there may be another group, formed by an open parenthesis, with something inside it, followed by a matching closing parenthesis. Some other non-parenthesis character may follow. This can be repeated an arbitrary amount of times. Anyway, at last, there must be
)\) another closed parenthesis, which matches with the first one.
This should work for nesting depth 2. If you want nesting depth 3, you have to further recurse, allowing each of the groups I described at point (2) to have a nested parenthesized group.
Things will get much easier if you use a stack. Such as:
foundMatches = [];
mStack = [];
start = RegExp("\\(");
mid = RegExp("[^()]*[()]?");
idx = 0;
while ((idx = input.search(start.substr(idx))) != -1) {
mStack.push(idx);
//Start a search
nidx = input.substr(idx + 1).search(mid);
while (nidx != -1 && idx + nidx < input.length) {
idx += nidx;
match = input.substr(idx).match(mid);
match = match[0].substr(-1);
if (match == "(") {
mStack.push(idx);
} else if (mStack.length == 1) {
break;
}
nidx = input.substr(idx + 1).search(mid);
}
//Check the result
if (nidx != -1 && idx + nidx < input.length) {
//idx+nidx is the index of the last ")"
idx += nidx;
//The stack contains the index of the first "("
startIdx = mStack.pop();
foundMatches.push(input.substr(startIdx, idx + 1 - startIdx));
}
idx += 1;
}
How about you parse it yourself using a loop without the help of regex?
Here is one simple way:
You would have to have a variable, say "level", which keeps track of how many open parentheses you have come across so far (initialize it with a 0).
You would also need a string buffer to contain each of your matches ( e.g. (2+2) or (2+3 * (2+3)) ) .
Finally, you would need somewhere you can dump the contents of your buffer into whenever you finish reading a match.
As you read the string character by character, you would increment level by 1 when you come across "(", and decrement by 1 when you come across ")". You would then put the character into the buffer.
When you come across ")" AND the level happens to hit 0, that is when you know you have a match. This is when you would dump the contents of the buffer and continue.
This method assumes that whenever you have a "(" there will always be a corresponding ")" in the input string. This method will handle arbitrary number of parentheses.

How do I visual select a calculation backwards?

I would like to visual select backwards a calculation p.e.
200 + 3 This is my text -300 +2 + (9*3)
|-------------|*
This is text 0,25 + 2.000 + sqrt(15/1.5)
|-------------------------|*
The reason is that I will use it in insert mode.
After writing a calculation I want to select the calculation (using a map) and put the results of the calculation in the text.
What the regex must do is:
- select from the cursor (see * in above example) backwards to the start of the calculation
(including \/-+*:.,^).
- the calculation can start only with log/sqrt/abs/round/ceil/floor/sin/cos/tan or with a positive or negative number
- the calculation can also start at the beginning of the line but it never goes back to
a previous line
I tried in all ways but could not find the correct regex.
I noted that backward searching is different then forward searching.
Can someone help me?
Edit
Forgot to mention that it must include also the '=' if there is one and if the '=' is before the cursor or if there is only space between the cursor and '='.
It must not include other '=' signs.
200 + 3 = 203 -300 +2 + (9*3) =
|-------------------|<SPACES>*
200 + 3 = 203 -300 +2 + (9*3)
|-----------------|<SPACES>*
* = where the cursor is
A regex that comes close in pure vim is
\v\c\s*\zs(\s{-}(((sqrt|log|sin|cos|tan|exp)?\(.{-}\))|(-?[0-9,.]+(e-?[0-9]+)?)|([-+*/%^]+)))+(\s*\=?)?\s*
There are limitations: subexpressions (including function arguments) aren't parsed. You'd need to use a proper grammar parser to do that, and I don't recommend doing that in pure vim1
Operator Mapping
To enable using this a bit like text-objects, use something like this in your $MYVIMRC:
func! DetectExpr(flag)
let regex = '\v\c\s*\zs(\s{-}(((sqrt|log|sin|cos|tan|exp)?\(.{-}\))|(-?[0-9,.]+(e-?[0-9]+)?)|([-+*/%^]+)))+(\s*\=?)?\s*'
return searchpos(regex, a:flag . 'ncW', line('.'))
endf
func! PositionLessThanEqual(a, b)
"echo 'a: ' . string(a:a)
"echo 'b: ' . string(a:b)
if (a:a[0] == a:b[0])
return (a:a[1] <= a:b[1]) ? 1 : 0
else
return (a:a[0] <= a:b[0]) ? 1 : 0
endif
endf
func! SelectExpr(mustthrow)
let cpos = getpos(".")
let cpos = [cpos[1], cpos[2]] " use only [lnum,col] elements
let begin = DetectExpr('b')
if ( ((begin[0] == 0) && (begin[1] == 0))
\ || !PositionLessThanEqual(begin, cpos) )
if (a:mustthrow)
throw "Cursor not inside a valid expression"
else
"echoerr "not satisfied: " . string(begin) . " < " . string(cpos)
endif
return 0
endif
"echo "satisfied: " . string(begin) . " < " . string(cpos)
call setpos('.', [0, begin[0], begin[1], 0])
let end = DetectExpr('e')
if ( ((end[0] == 0) || (end[1] == 0))
\ || !PositionLessThanEqual(cpos, end) )
call setpos('.', [0, cpos[0], cpos[1], 0])
if (a:mustthrow)
throw "Cursor not inside a valid expression"
else
"echoerr "not satisfied: " . string(begin) . " < " . string(cpos) . " < " . string(end)
endif
return 0
endif
"echo "satisfied: " . string(begin) . " < " . string(cpos) . " < " . string(end)
norm! v
call setpos('.', [0, end[0], end[1], 0])
return 1
endf
silent! unmap X
silent! unmap <M-.>
xnoremap <silent>X :<C-u>call SelectExpr(0)<CR>
onoremap <silent>X :<C-u>call SelectExpr(0)<CR>
Now you can operator on the nearest expression around (or after) the cursor position:
vX - [v]isually select e[X]pression
dX - [d]elete current e[X]pression
yX - [y]ank current e[X]pression
"ayX - id. to register a
As a trick, use the following to arrive at the exact ascii art from the OP (using virtualedit for the purpose of the demo):
Insert mode mapping
In response to the chat:
" if you want trailing spaces/equal sign to be eaten:
imap <M-.> <C-o>:let #e=""<CR><C-o>"edX<C-r>=substitute(#e, '^\v(.{-})(\s*\=?)?\s*$', '\=string(eval(submatch(1)))', '')<CR>
" but I'm assuming you wanted them preserved:
imap <M-.> <C-o>:let #e=""<CR><C-o>"edX<C-r>=substitute(#e, '^\v(.{-})(\s*\=?\s*)?$', '\=string(eval(submatch(1))) . submatch(2)', '')<CR>
allows you to hit Alt-. during insert mode and the current expression gets replaced with it's evaluation. The cursor ends up at the end of the result in insert mode.
200 + 3 This is my text -300 +2 + (9*3)
This is text 0.25 + 2.000 + sqrt(15/1.5)
Tested by pressing Alt-. in insert 3 times:
203 This is my text -271
This is text 5.412278
For Fun: ascii art
vXoyoEsc`<jPvXr-r|e.
To easily test it yourself:
:let #q="vXoyo\x1b`<jPvXr-r|e.a*\x1b"
:set virtualedit=all
Now you can #q anywhere and it will ascii-decorate the nearest expression :)
200 + 3 = 203 -300 +2 + (9*3) =
|-------|*
|-------------------|*
200 + 3 = 203 -300 +2 + (9*3)
|-----------------|*
|-------|*
This is text 0,25 + 2.000 + sqrt(15/1.5)
|-------------------------|*
1 consider using Vim's python integration to do such parsing
This seems quite a complicated task after all to achieve with regex, so if you can avoid it in any way, try to do so.
I've created a regex that works for a few examples - give it a try and see if it does the trick:
^(?:[A-Za-z]|\s)+((?:[^A-Za-z]+)?(?:log|sqrt|abs|round|ceil|floor|sin|cos|tan)[^A-Za-z]+)(?:[A-Za-z]|\s)*$
The part that you are interested in should be in the first matching group.
Let me know if you need an explanation.
EDIT:
^ - match the beginning of a line
(?:[A-Za-z]|\s)+ - match everything that's a letter or a space once or more
match and capture the following 3:
((?:[^A-Za-z]+)? - match everything that's NOT a letter (i.e. in your case numbers or operators)
(?:log|sqrt|abs|round|ceil|floor|sin|cos|tan) - match one of your keywords
[^A-Za-z]+) - match everything that's NOT a letter (i.e. in your case numbers or operators)
(?:[A-Za-z]|\s)* - match everything that's a letter or a space zero or more times
$ - match the end of the line

How to match the numeric value in a regular expression?

Okay, this is quite an interesting challenge I have got myself into.
My RegEx takes as input lines like the following:
147.63.23.156/159
94.182.23.55/56
134.56.33.11/12
I need it to output a regular expression that matches the range represented. Let me explain.
For example, if the RegEx receives 147.63.23.156/159, then it needs to output a RegEx that matches the following:
147.63.23.156
147.63.23.157
147.63.23.158
147.63.23.159
How can I do this?
Currently I have:
(\d{1,3}\.\d{1,3}\.\d{1,3}\.)(\d{1,3})/(\d{1,3})
$1 contains the first xxx.xxx.xxx. part
$2 contains the lower range for the number
$3 contains the upper range for the number
Regexes are really not a great way to validate IP addresses, I want to make that clear right up front. It is far, far easier to parse the addresses and do some simple arithmetic to compare them. A couple of less thans and greater thans and you're there.
That said, it seemed like it would be a fun exercise to write a regex generator. I came up with a big mess of Python code to generate these regexes. Before I show the code, here's a sample of the regexes it produces for a couple of IP ranges:
1.2.3.4 to 1.2.3.4 1\.2\.3\.4
147.63.23.156 to 147.63.23.159 147\.63\.23\.15[6-9]
10.7.7.10 to 10.7.7.88 10\.7\.7\.([1-7]\d|8[0-8])
127.0.0.0 to 127.0.1.255 127\.0\.[0-1]\.(\d|[1-9]\d|1\d\d|2([0-4]\d|5[0-5]))
I'll show the code in two parts. First, the part that generates regexes for simple integer ranges. Second, the part that handles full IP addresses.
Matching number ranges
The first step is to figure out how to generate a regex that matches an arbitrary integer range, say 12-28 or 0-255. Here's an example of the regexes my implementation comes up with:
156 to 159 15[6-9]
1 to 100 [1-9]|[1-9]\d|100
0 to 255 \d|[1-9]\d|1\d\d|2([0-4]\d|5[0-5])
And now the code. There are numerous comments inline explaining the logic behind it. Overall it relies on a lot of recursion and special casing to try to keep the regexes lean and mean.
import sys, re
def range_regex(lower, upper):
lower, upper = str(lower), str(upper)
# Different lengths, for instance 1-100. Combine regex(1-9) and
# regex(10-100).
if len(lower) != len(upper):
return '%s|%s' % (
range_regex(lower, '9' * len(lower)),
range_regex(10 ** (len(lower)), upper)
)
ll, lr = lower[0], lower[1:]
ul, ur = upper[0], upper[1:]
# One digit numbers.
if lr == '':
if ll == '0' and ul == '9':
return '\\d'
else:
return '[%s-%s]' % (ll, ul)
# Same first digit, for instance 12-14. Concatenate "1" and regex(2-4).
elif ll == ul:
return ll + sub_range_regex(lr, ur)
# All zeros to all nines, for instance 100-399. Concatenate regex(1-3)
# and the appropriate number of \d's.
elif lr == '0' * len(lr) and ur == '9' * len(ur):
return range_regex(ll, ul) + '\\d' * len(lr)
# All zeros on left, for instance 200-649. Combine regex(200-599) and
# regex(600-649).
elif lr == '0' * len(lr):
return '%s|%s' % (
range_regex(lower, str(int(ul[0]) - 1) + '9' * len(ur)),
range_regex(ul + '0' * len(ur), upper)
)
# All nines on right, for instance 167-499. Combine regex(167-199) and
# regex(200-499).
elif ur == '9' * len(ur):
return '%s|%s' % (
range_regex(lower, ll + '9' * len(lr)),
range_regex(str(int(ll[0]) + 1) + '0' * len(lr), upper)
)
# First digits are one apart, for instance 12-24. Combine regex(12-19)
# and regex(20-24).
elif ord(ul[0]) - ord(ll[0]) == 1:
return '%s%s|%s%s' % (
ll, sub_range_regex(lr, '9' * len(lr)),
ul, sub_range_regex('0' * len(ur), ur)
)
# Far apart, uneven numbers, for instance 15-73. Combine regex(15-19),
# regex(20-69), and regex(70-73).
else:
return '%s|%s|%s' % (
range_regex(lower, ll + '9' * len(lr)),
range_regex(str(int(ll[0]) + 1) + '0' * len(lr),
str(int(ul[0]) - 1) + '9' * len(ur)),
range_regex(ul + '0' * len(ur), upper)
)
# Helper function which adds parentheses when needed to sub-regexes.
# Sub-regexes need parentheses if they have pipes that aren't already
# contained within parentheses. For example, "6|8" needs parentheses
# but "1(6|8)" doesn't.
def sub_range_regex(lower, upper):
orig_regex = range_regex(lower, upper)
old_regex = orig_regex
while True:
new_regex = re.sub(r'\([^()]*\)', '', old_regex)
if new_regex == old_regex:
break
else:
old_regex = new_regex
continue
if '|' in new_regex:
return '(' + orig_regex + ')'
else:
return orig_regex
Matching IP address ranges
With that capability in place, I then wrote a very similar-looking IP range function to work with full IP addresses. The code is very similar to the code above except that we're working in base 256 instead of base 10, and the code throws around lists instead of strings.
import sys, re, socket
def ip_range_regex(lower, upper):
lower = [ord(c) for c in socket.inet_aton(lower)]
upper = [ord(c) for c in socket.inet_aton(upper)]
return ip_array_regex(lower, upper)
def ip_array_regex(lower, upper):
# One octet left.
if len(lower) == 1:
return range_regex(lower[0], upper[0])
# Same first octet.
if lower[0] == upper[0]:
return '%s\.%s' % (lower[0], sub_regex(ip_array_regex(lower[1:], upper[1:])))
# Full subnet.
elif lower[1:] == [0] * len(lower[1:]) and upper[1:] == [255] * len(upper[1:]):
return '%s\.%s' % (
range_regex(lower[0], upper[0]),
sub_regex(ip_array_regex(lower[1:], upper[1:]))
)
# Partial lower subnet.
elif lower[1:] == [0] * len(lower[1:]):
return '%s|%s' % (
ip_array_regex(lower, [upper[0] - 1] + [255] * len(upper[1:])),
ip_array_regex([upper[0]] + [0] * len(upper[1:]), upper)
)
# Partial upper subnet.
elif upper[1:] == [255] * len(upper[1:]):
return '%s|%s' % (
ip_array_regex(lower, [lower[0]] + [255] * len(lower[1:])),
ip_array_regex([lower[0] + 1] + [0] * len(lower[1:]), upper)
)
# First octets just 1 apart.
elif upper[0] - lower[0] == 1:
return '%s|%s' % (
ip_array_regex(lower, [lower[0]] + [255] * len(lower[1:])),
ip_array_regex([upper[0]] + [0] * len(upper[1:]), upper)
)
# First octets more than 1 apart.
else:
return '%s|%s|%s' % (
ip_array_regex(lower, [lower[0]] + [255] * len(lower[1:])),
ip_array_regex([lower[0] + 1] + [0] * len(lower[1:]),
[upper[0] - 1] + [255] * len(upper[1:])),
ip_array_regex([upper[0]] + [0] * len(upper[1:]), upper)
)
If you just need to build them one at at time, this website will do the trick.
If you need code, and don't mind python, this code does it for any arbitrary numeric range.
If it's for Apache... I haven't tried it, but it might work:
RewriteCond %{REMOTE_ADDR} !<147.63.23.156
RewriteCond %{REMOTE_ADDR} !>147.63.23.159
(Two consecutive RewriteConds are joined by a default logical AND)
Just have to be careful with ranges with differing number of digits (e.g. 95-105 should be broken into 95-99 and 100-105, since it is lexicographic ordering).
I absolutely agree with the commenters, a pure-regex solution would be the wrong tool for the job here. Just use the regular expression you already have to extract the prefix, minimum, and maximum values,
$prefix, $minimum, $maximum = match('(\d{1,3}\.\d{1,3}\.\d{1,3}\.)(\d{1,3})/(\d{1,3})', $line).groups()
then test your IP address against ${prefix}(\d+),
$lastgroup = match($prefix + '(\d+)', $addr).groups()[0]
and compare that last group to see if it falls within the proper range,
return int($minimum) <= int($lastgroup) <= int($maximum)
Code examples are pseudocode, of course - convert to your language of choice.
To my knowledge, this can't be done with straight up regex, but would also need some code behind it. For instance, in PHP you could use the following:
function make_range($ip){
$regex = '#(\d{1,3}\.\d{1,3}\.\d{1,3}\.)(\d{1,3})/(\d{1,3})#';
if ( preg_match($regex, $ip, $matches) ){
while($matches[1] <= $matches[2]){
print "{$matches[0]}.{$matches[1]}";
$matches[1]++;
}
} else {
exit('not a supported IP range');
}
}
For this to work with a RewriteCond, I think some black magic would be in order...
How is this going to be used with RewriteCond, anyways? Do you have several servers and want to just quickly make a .htaccess file easily? If so, then just add that function to a bigger script that takes some arguments and burps out a .htaccess file.