Rearranging elements in Python - list

I am new to Python and I can't get this to work. I have a list and I want to take the entries from it and write them to files.
p = ['Eth1/1', 'Eth1/5','Eth2/1', 'Eth2/4','Eth101/1/1', 'Eth101/1/2', 'Eth101/1/3','Eth102/1/1', 'Eth102/1/2', 'Eth102/1/3','Eth103/1/1', 'Eth103/1/2', 'Eth103/1/3','Eth103/1/4','Eth104/1/1', 'Eth104/1/2', 'Eth104/1/3','Eth104/1/4']
What i am trying :
with open("abc1.txt", "w+") as fw1, open("abc2.txt", "w+") as fw2:
for i in p:
if len(i.partition("/")[0]) == 4:
fw1.write('int ' + i + '\n mode\n')
else:
i = 0
while i < len(p):
start = p[i].split('/')
if (start[0] == 'Eth101'):
i += 3
key = start[0]
i += 1
while i < len(p) and p[i].split('/')[0] == key:
i += 1
end = p[i-1].split('/')
fw2.write('confi ' + start[0] + '/' + start[1] + '-' + end[1] + '\n mode\n')
What i am looking for :
abc1.txt should have
int Eth1/1
mode
int Eth1/5
mode
int Eth2/1
mode
int Eth2/4
mode
abc2.txt should have :
int Eth101/1/1-3
mode
int Eth102/1/1-3
mode
int Eth103/1/1-4
mode
int Eth104/1/1-4
mode
So any Eth with one digit before the first "/" (e.g. Eth1/1 or Eth2/2) should go into one file, abc1.txt.
Any Eth with three digits before the first "/" (e.g. Eth101/1/1 or Eth102/1/1) should go into another file, abc2.txt. As these come in ranges, they need to be written like Eth101/1/1-3, Eth102/1/1-3, etc.
Any idea?

I don't think you need a regex here, at all. All your items begin with 'Eth' followed by one or more digits. So you can check the length of the items before first / occurs and then write it to a file.
p = ['Eth1/1', 'Eth1/5','Eth2/1', 'Eth2/4','Eth101/1/1', 'Eth101/1/2', 'Eth101/1/3','Eth102/1/1', 'Eth102/1/2', 'Eth102/1/3','Eth103/1/1', 'Eth103/1/2', 'Eth103/1/3','Eth103/1/4','Eth104/1/1', 'Eth104/1/2', 'Eth104/1/3','Eth104/1/4']
with open("abc1.txt", "w+") as fw1, open("abc2.txt", "w+") as fw2:
    for i in p:
        if len(i.partition("/")[0]) == 4:
            fw1.write('int ' + i + '\n mode\n')
        else:
            fw2.write('int ' + i + '\n mode\n')
I refactored your code a little to bring the with-statement into play. It takes care of closing the files correctly at the end. It is also not necessary to iterate over the sequence twice, so everything is done in one pass.
If the data is not as clean as provided, then you may want to use regexes. Independent of the regex itself, by writing if re.match(r'((Eth\d{1}\/\d{1,2})', "p" ) you test whether the regex matches the literal string "p", not the value of the variable p. This is because you put quotes around p.
So this should work for your example. If you really need a regex, the problem reduces to finding a regex that fits your needs, without any other issues.
As these are in ranges , need to write it like Eth101/1/1-3, Eth102/1/1-3 etc
This is something you can achieve by first computing the string and then writing it to the file. But this is more of a separate question.
UPDATE
It's not that trivial to compute the right network ranges. Here I can present you one approach which doesn't change my code but adds some functionality. The trick here is to get groups of connected networks which aren't interrupted by their numbers. For that I've copied consecutive_groups. You can also do a pip install more-itertools of course to get that functionality. And also I transformed the list to a dict to prepare the magic and then retransformed dict to list again. There are definitely better ways of doing it, but this worked for your input data, at least.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from itertools import groupby
from operator import itemgetter
p = ['Eth1/1', 'Eth1/5', 'Eth2/1', 'Eth2/4', 'Eth101/1/1', 'Eth101/1/2',
     'Eth101/1/3', 'Eth102/1/1', 'Eth102/1/2', 'Eth102/1/3', 'Eth103/1/1',
     'Eth103/1/2', 'Eth103/1/3', 'Eth103/1/4', 'Eth104/1/1', 'Eth104/1/2',
     'Eth104/1/3', 'Eth104/1/4']
def get_network_ranges(networks):
    network_ranges = {}
    result = []
    for network in networks:
        parts = network.rpartition("/")
        network_ranges.setdefault(parts[0], []).append(int(parts[2]))
    for network, ranges in network_ranges.items():
        ranges.sort()
        for group in consecutive_groups(ranges):
            group = list(group)
            if len(group) == 1:
                result.append(network + "/" + str(group[0]))
            else:
                result.append(network + "/" + str(group[0]) + "-" +
                              str(group[-1]))
    result.sort()  # to get ordered results
    return result
def consecutive_groups(iterable, ordering=lambda x: x):
    """taken from more-itertools (latest)"""
    for k, g in groupby(
        enumerate(iterable), key=lambda x: x[0] - ordering(x[1])
    ):
        yield map(itemgetter(1), g)
# only one line added to do the magic
with open("abc1.txt", "w+") as fw1, open("abc2.txt", "w+") as fw2:
    p = get_network_ranges(p)
    for i in p:
        if len(i.partition("/")[0]) == 4:
            fw1.write('int ' + i + '\n mode\n')
        else:
            fw2.write('int ' + i + '\n mode\n')


Find value of a string given a superstring regex

How can I match for a string that is a substring of a given input string, preferable with regex?
Given a value: A789Lfu891MatchMe2ENOTSTH, construct a regex that would match a string where the string is a substring of the given value.
Expected matches:
MatchMe
ENOTST
891
Expected Non Match
foo
A789L<fu891MatchMe2ENOTSTH_extra
extra_A789L<fu891MatchMe2ENOTSTH
extra_A789L<fu891MatchMe2ENOTSTH_extra
It seems easier for me to do the reverse: /\w*MatchMe\w*/, but I can't wrap my head around this problem.
Something like how SQL would do it:
SELECT * FROM my_table mt WHERE 'A789Lfu891MatchMe2ENOTSTH' LIKE '%' || mt.foo || '%';
Prefix suffixes
You could alternate prefix suffixes, i.e. turn the superstring abcd into a pattern like ^(a|(a)?b|((a)?b)?c|(((a)?b)?c)?d)$. For your example, the pattern has 1253 characters (exactly 2000 fewer than @tobias_k's).
Python code to produce the regex, can then be tested with tobias_k's code (try it online):
from itertools import accumulate
t = "A789Lfu891MatchMe2ENOTSTH"
p = '^(' + '|'.join(accumulate(t, '({})?{}'.format)) + ')$'
Suffix prefixes
Suffix prefixes look nicer and match faster: ^(a(b(c(d)?)?)?|b(c(d)?)?|c(d)?|d)$. Sadly the generating code is less elegant.
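A minimal sketch of how the suffix-prefix pattern could be generated with an explicit loop. The function name build_suffix_prefixes is made up for this illustration, and the characters are inserted unescaped, so it assumes the superstring contains no regex metacharacters:

```python
import re

def build_suffix_prefixes(t):
    """Build ^(a(b(c(d)?)?)?|b(c(d)?)?|c(d)?|d)$ for t = 'abcd'.

    Hypothetical helper; assumes t is non-empty and has no regex
    metacharacters (nothing is escaped).
    """
    alternatives = []
    tail = ''
    # Walk from the last character to the first, wrapping what we have
    # so far in an optional group behind the current character.
    for ch in reversed(t):
        tail = ch + ('(' + tail + ')?' if tail else '')
        alternatives.append(tail)
    # The longest alternative (starting at t[0]) comes first.
    return '^(' + '|'.join(reversed(alternatives)) + ')$'

pattern = build_suffix_prefixes("abcd")
print(pattern)
print(bool(re.match(pattern, "bc")), bool(re.match(pattern, "ac")))
```

Each alternative fixes the start position of the substring and lets the optional groups decide where it ends.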
Divide and conquer
For a shorter regex, we can use divide and conquer. For example for the superstring abcdefg, every substring falls into one of three cases:
Contains the middle character (the d). Pattern for that: ((a?b)?c)?d(e(fg?)?)?
Is left of that middle character. So recursively build a regex for the superstring abc: a|a?bc?|c.
Is right of that middle character. So recursively build a regex for the superstring efg: e|e?fg?|g.
And then make an alternation of those three cases:
a|a?bc?|c|((a?b)?c)?d(e(fg?)?)?|e|e?fg?|g
Regex length will be Θ(n log n) instead of our previous Θ(n²).
The result for your superstring example of 25 characters is this regex with 301 characters:
^(A|A?78?|8|((A?7)?8)?9(Lf?)?|Lf?|f|(((((A?7)?8)?9)?L)?f)?u(8(9(1(Ma?)?)?)?)?|89?|9|(8?9)?1(Ma?)?|Ma?|a|(((((((((((A?7)?8)?9)?L)?f)?u)?8)?9)?1)?M)?a)?t(c(h(M(e(2(E(N(O(T(S(TH?)?)?)?)?)?)?)?)?)?)?)?|c|c?hM?|M|((c?h)?M)?e(2E?)?|2E?|E|(((((c?h)?M)?e)?2)?E)?N(O(T(S(TH?)?)?)?)?|OT?|T|(O?T)?S(TH?)?|TH?|H)$
Benchmark
Speed benchmarks don't really make that much sense, as in reality we'd just do a regular substring check (in Python s in t), but let's do one anyway.
Results for matching all substrings of your superstring, using Python 3.9.6 on my PC:
1.09 ms just_all_substrings
25.18 ms prefix_suffixes
3.47 ms suffix_prefixes
3.46 ms divide_and_conquer
And on TIO and their "Python 3.8 (pre-release)" with quite different results:
0.30 ms just_all_substrings
46.90 ms prefix_suffixes
2.24 ms suffix_prefixes
2.95 ms divide_and_conquer
Regex lengths (also printed by the below benchmark code):
3253 characters - just_all_substrings
1253 characters - prefix_suffixes
1253 characters - suffix_prefixes
301 characters - divide_and_conquer
Benchmark code (Try it online!):
from timeit import repeat
import re
from itertools import accumulate
def just_all_substrings(t):
    return "^(" + '|'.join(t[i:k] for i in range(0, len(t))
                                  for k in range(i+1, len(t)+1)) + ")$"
def prefix_suffixes(t):
    return '^(' + '|'.join(accumulate(t, '({})?{}'.format)) + ')$'
def suffix_prefixes(t):
    return '^(' + '|'.join(list(accumulate(t[::-1], '{1}({0})?'.format))[::-1]) + ')$'
def divide_and_conquer(t):
    def suffixes(t):
        # Example: abc => ((a?b)?c)?
        regex = f'{t[0]}?'
        for c in t[1:]:
            regex = f'({regex}{c})?'
        return regex
    def prefixes(t):
        # Example: efg => (e(fg?)?)?
        regex = f'{t[-1]}?'
        for c in t[-2::-1]:
            regex = f'({c}{regex})?'
        return regex
    def superegex(t):
        n = len(t)
        if n == 1:
            return t
        if n == 2:
            return f'{t}?|{t[1]}'
        mid = n // 2
        contain = suffixes(t[:mid]) + t[mid] + prefixes(t[mid+1:])
        left = superegex(t[:mid])
        right = superegex(t[mid+1:])
        return '|'.join([left, contain, right])
    return '^(' + superegex(t) + ')$'
creators = just_all_substrings, prefix_suffixes, suffix_prefixes, divide_and_conquer,
t = "A789Lfu891MatchMe2ENOTSTH"
substrings = [t[start:stop]
              for start in range(len(t))
              for stop in range(start+1, len(t)+1)]
def test(p):
    # compile once, then check that every substring matches
    match = re.compile(p).match
    return all(map(match, substrings))
for creator in creators:
    print(test(creator(t)), creator.__name__)
print()
print('Regex lengths:')
for creator in creators:
    print('%5d characters -' % len(creator(t)), creator.__name__)
print()
for _ in range(3):
    for creator in creators:
        p = creator(t)
        number = 10
        time = min(repeat(lambda: test(p), number=number)) / number
        print('%5.2f ms ' % (time * 1e3), creator.__name__)
    print()
One way to "construct" such a regex would be to build a disjunction of all possible substrings of the original value. Example in Python:
import re
t = "A789Lfu891MatchMe2ENOTSTH"
p = "^(" + '|'.join(t[i:k] for i in range(0, len(t))
for k in range(i+1, len(t)+1)) + ")$"
good = ["MatchMe", "ENOTST", "891"]
bad = ["foo", "A789L<fu891MatchMe2ENOTSTH_extra",
"extra_A789L<fu891MatchMe2ENOTSTH",
"extra_A789L<fu891MatchMe2ENOTSTH_extra"]
assert all(re.match(p, s) is not None for s in good)
assert all(re.match(p, s) is None for s in bad)
For the value "abcd", this would e.g. be "^(a|ab|abc|abcd|b|bc|bcd|c|cd|d)$"; for the given example it's a bit longer, with 3253 characters...

Text processing to get if else type condition from a string

First of all, I am sorry about the weird question heading. Couldn't express it in one line.
So, the problem statement is,
If I am given the following string --
"('James Gosling'/jamesgosling/james gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
I have to parse it as
list1 = ["'James Gosling'", 'jamesgosling', 'jame gosling']
list2 = ["'SUN Microsystem'", 'sunmicrosystem']
list3 = [ list1, list2, keyword]
So that, if I enter James Gosling Sun Microsystem keyword it should tell me that what I have entered is 100% correct
And if I enter J Gosling Sun Microsystem keyword it should say i am only 66.66% correct.
This is what I have tried so far.
import re
def main():
    print("starting")
    sentence = "('James Gosling'/jamesgosling/jame gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
    splited = sentence.split(",")
    number_of_primary_keywords = len(splited)
    #print(number_of_primary_keywords, "primary keywords length")
    number_of_brackets = 0
    inside_quotes = ''
    inside_quotes_1 = ''
    inside_brackets = ''
    for n in range(len(splited)):
        #print(len(re.findall('\w+', splited[n])), "length of splitted")
        inside_brackets = splited[n][splited[n].find("(") + 1: splited[n].find(")")]
        synonyms = inside_brackets.split("/")
        for x in range(len(synonyms)):
            try:
                inside_quotes_1 = synonyms[x][synonyms[x].find("\"") + 1: synonyms[n].find("\"")]
                print(inside_quotes_1)
            except:
                pass
            try:
                inside_quotes = synonyms[x][synonyms[x].find("'") + 1: synonyms[n].find("'")]
                print(inside_quotes)
            except:
                pass
            #print(synonyms[x])
        number_of_brackets += 1
    print(number_of_brackets)
if __name__ == '__main__':
    main()
Output is as follows
'James Gosling
jamesgoslin
jame goslin
'SUN Microsystem
SUN Microsystem
sunmicrosyste
sunmicrosyste
3
As you can see, the last letters of some words are missing.
So, if you read this far, I hope you can help me in getting the expected output
Unfortunately, your code has a logic issue that I could not pin down; it is probably in these lines:
inside_quotes_1 = synonyms[x][synonyms[x].find("\"") + 1: synonyms[n].find("\"")]
inside_quotes = synonyms[x][synonyms[x].find("'") + 1: synonyms[n].find("'")]
which, by the way, you can also write as:
inside_quotes_1 = synonyms[x][synonyms[x].find("\x22") + 1: synonyms[n].find("\x22")]
inside_quotes = synonyms[x][synonyms[x].find("\x27") + 1: synonyms[n].find("\x27")]
Other than that, you seem to want to extract the words along with their indices, which you can do with a basic expression:
(\w+)
Then you need a simple way to locate the indices where the words occur, and associate each word with the desired index.
Example Test
# -*- coding: UTF-8 -*-
import re
string = "('James Gosling'/jamesgosling/james gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
expression = r'(\w+)'
match = re.search(expression, string)
if match:
print("YAAAY! \"" + match.group(1) + "\" is a match 💚💚💚 ")
else:
print('🙀 Sorry! No matches! Something is not right! Call 911 👮')
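For what it is worth, the truncated last letters in the question's output are consistent with two things in the quoted lines: str.find returns -1 when the character is absent, so the slice ends one character early, and the end index is taken from synonyms[n] rather than synonyms[x]. Below is a rough sketch of a different approach that avoids index arithmetic altogether. The names parse_groups and score are invented for this example, and matching by case-insensitive substring is an assumption about what "correct" should mean:

```python
import re

def parse_groups(spec):
    """Return a list of synonym lists, one per comma-separated group.

    Hypothetical helper: a group is either "( a / b / c )" or a bare
    keyword; surrounding quotes are stripped and text is lower-cased.
    """
    groups = []
    for bracketed, bare in re.findall(r"\(([^)]*)\)|([^,()]+)", spec):
        raw = bracketed if bracketed else bare
        synonyms = [s.strip().strip("'\"").lower() for s in raw.split("/")]
        synonyms = [s for s in synonyms if s]
        if synonyms:  # skip whitespace-only fragments between groups
            groups.append(synonyms)
    return groups

def score(spec, text):
    """Percentage of groups for which at least one synonym occurs in text."""
    groups = parse_groups(spec)
    text = text.lower()
    hits = sum(any(s in text for s in group) for group in groups)
    return 100.0 * hits / len(groups)

spec = "('James Gosling'/jamesgosling/james gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
print(parse_groups(spec))
print(score(spec, "James Gosling Sun Microsystem keyword"))
print(score(spec, "J Gosling Sun Microsystem keyword"))
```

With the two inputs from the question this gives 100% and roughly 66.67%, because the second input misses every synonym of the first group.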

There are a order in identifier?

My Question:
I have this exercise ;
If the verb ends in e, drop the e and add ing (if not exception: be, see, flee, knee, etc.)
If the verb ends in ie, change ie to y and add ing
For words consisting of consonant-vowel-consonant, double the final letter before adding ing
By default just add ing
Your task in this exercise is to define a function make_ing_form() which given a verb in infinitive form returns its present participle form. Test your function with words such as lie, see, move and hug. However, you must not expect such simple rules to work for all cases.
My code:
def make_ing_form():
    a = raw_input("Please give a Verb: ")
    if a.endswith("ie"):
        newverb = a[:-2] + "y" + "ing"
    elif a.endswith("e"):
        newverb = a[:3] + "ing"
    elif a[1] in "aeiou":
        newverb = a + a[-1] + "ing"
    else:
        newverb = a + "ing"
    print newverb
make_ing_form()
With this code everything works, but when I change the order of the branches:
def make_ing_form():
    a = raw_input("Please give a verb: ")
    if a.endswith("e"):
        newverb = a[:3] + "ing"
    elif a.endswith("ie"):
        newverb = a[:-2] + "y" + "ing"
    elif a[1] in "aeiou":
        newverb = a + a[-1] + "ing"
    else:
        newverb = a + "ing"
    print newverb
make_ing_form()
the results I get are not in the present participle form. As I understand it from http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#python-has-names, when control moves on to another branch (from if to elif), it "forgets" the if statement. If that is the case, why do I get this result?
Sorry about my English.
In the second code it will never enter the first elif ( elif a.endswith("ie"): ) because if a verb ends in ie (ex. lie) it would enter the if, as lie ends in e. You should have the condition as in the first code. If you have more problems with your first code let me know.
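To make the ordering point concrete, here is a small Python 3 sketch (the question's code is Python 2). It takes the verb as a parameter instead of reading input, which is a change made for this illustration; the exception set and the reading of "consonant-vowel-consonant" as a three-letter word are assumptions based on the exercise text:

```python
VOWELS = "aeiou"
EXCEPTIONS = {"be", "see", "flee", "knee"}  # from the exercise text

def make_ing_form(verb):
    """Return the -ing form using the exercise's simple rules."""
    # The more specific test ("ie") must come before the general one ("e"):
    # an if/elif chain runs only the first branch whose condition is true.
    if verb.endswith("ie"):
        return verb[:-2] + "ying"
    if verb.endswith("e") and verb not in EXCEPTIONS:
        return verb[:-1] + "ing"
    # Assumption: "consonant-vowel-consonant" means a three-letter word.
    if (len(verb) == 3 and verb[0] not in VOWELS
            and verb[1] in VOWELS and verb[2] not in VOWELS):
        return verb + verb[-1] + "ing"
    return verb + "ing"

for word in ("lie", "see", "move", "hug"):
    print(word, "->", make_ing_form(word))
```

If the "e" test were placed first, "lie" would take that branch and the "ie" rule would never be reached.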

regex for detecting subtitle errors

I'm having some issues with subtitles and need a way to detect specific errors. I think regular expressions would help, but I need help figuring this one out. In this example of an SRT-formatted subtitle, line #13 ends at 00:01:10,130 and line #14 begins at 00:01:10,129.
13
00:01:05,549 --> 00:01:10,130
some text here.
14
00:01:10,129 --> 00:01:14,109
some other text here.
Problem is that next line can't begin before current one is over - embedding algorithm doesn't work when that happens. I need to check my SRT files and correct this manually, but looking for this manually in about 20 videos each an hour long just isn't an option. Specially since I need it 'yesterday' (:
Format for SRT subtitles is very specific:
XX
START --> END
TEXT
EMPTY LINE
[line number (digits)][new line character]
[start and end times in 00:00:00,000 format, separated by _space__minusSign__minusSign__greaterThenSign__space_][new line character]
[text - can be any character - letter, digit, punctuation sign.. pretty much anything][new line character]
[new line character]
I need to check whether the END time is greater than the START time of the following subtitle. Help would be appreciated.
PS. I can work with Notepad++, Eclipse (Aptana), python or javascript...
Regular expressions can be used to achieve what you want; that being said, they can't do it on their own. Regular expressions match patterns, not numerical ranges.
If I were you, I would do the following:
Parse the file and place the start and end times in one data structure (call it DS_A) and the text in another (call it DS_B).
Sort DS_A in ascending order. This should guarantee that you will not have overlapping ranges. (This previous SO post should point you in the right direction.)
Iterate over both and write the following to your file: DS_A[i] --> DS_A[i + 1] <newline> DS_B[j], where i is a loop counter for DS_A and j is a loop counter for DS_B.
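If the goal is only to find the offending subtitles, as the question asks, a detection pass might look like the sketch below. It is Python 3, the names to_ms and find_overlaps are invented here, and it assumes well-formed blocks whose second line is exactly "START --> END" with no trailing position data:

```python
import re

TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")

def to_ms(stamp):
    """Convert 'HH:MM:SS,mmm' to milliseconds (assumes a valid timestamp)."""
    h, m, s, ms = map(int, TIME.fullmatch(stamp.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def find_overlaps(srt_text):
    """Return (previous_number, number) pairs where a subtitle starts
    before the previous one has ended."""
    overlaps = []
    prev_end = prev_no = None
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        if len(lines) < 2 or " --> " not in lines[1]:
            continue  # not a subtitle block, skip it
        start, end = (to_ms(x) for x in lines[1].split(" --> "))
        number = lines[0].strip()
        if prev_end is not None and start < prev_end:
            overlaps.append((prev_no, number))
        prev_end, prev_no = end, number
    return overlaps

SAMPLE = """13
00:01:05,549 --> 00:01:10,130
some text here.

14
00:01:10,129 --> 00:01:14,109
some other text here.
"""
print(find_overlaps(SAMPLE))
```

Working in integer milliseconds sidesteps any limit on total duration, since no date or time type is involved.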
I ended up writing a short script to fix this. Here it is:
# -*- coding: utf-8 -*-
from datetime import datetime
import getopt, re, sys
count = 0
def fix_srt(inputfile):
    global count
    parsed_file, errors_file = '', ''
    try:
        with open( inputfile , 'r') as f:
            srt_file = f.read()
            parsed_file, errors_file = parse_srt(srt_file)
    except:
        pass
    finally:
        outputfile1 = ''.join( inputfile.split('.')[:-1] ) + '_fixed.srt'
        outputfile2 = ''.join( inputfile.split('.')[:-1] ) + '_error.srt'
        with open( outputfile1 , 'w') as f:
            f.write(parsed_file)
        with open( outputfile2 , 'w') as f:
            f.write(errors_file)
        print 'Detected %s errors in "%s". Fixed file saved as "%s" (Errors only as "%s").' % ( count, inputfile, outputfile1, outputfile2 )
previous_end_time = datetime.strptime("00:00:00,000", "%H:%M:%S,%f")
def parse_times(times):
    global previous_end_time
    global count
    _error = False
    _times = []
    for time_code in times:
        t = datetime.strptime(time_code, "%H:%M:%S,%f")
        _times.append(t)
    if _times[0] < previous_end_time:
        _times[0] = previous_end_time
        count += 1
        _error = True
    previous_end_time = _times[1]
    _times[0] = _times[0].strftime("%H:%M:%S,%f")[:12]
    _times[1] = _times[1].strftime("%H:%M:%S,%f")[:12]
    return _times, _error
def parse_srt(srt_file):
    parsed_srt = []
    parsed_err = []
    for srt_group in re.sub('\r\n', '\n', srt_file).split('\n\n'):
        lines = srt_group.split('\n')
        if len(lines) >= 3:
            times = lines[1].split(' --> ')
            correct_times, error = parse_times(times)
            if error:
                clean_text = map( lambda x: x.strip(' '), lines[2:] )
                srt_group = lines[0].strip(' ') + '\n' + ' --> '.join( correct_times ) + '\n' + '\n'.join( clean_text )
                parsed_err.append( srt_group )
        parsed_srt.append( srt_group )
    return '\r\n'.join( parsed_srt ), '\r\n'.join( parsed_err )
def main(argv):
    inputfile = None
    try:
        options, arguments = getopt.getopt(argv, "hi:", ["input="])
    except:
        print 'Usage: test.py -i <input file>'
    for o, a in options:
        if o == '-h':
            print 'Usage: test.py -i <input file>'
            sys.exit()
        elif o in ['-i', '--input']:
            inputfile = a
    fix_srt(inputfile)
if __name__ == '__main__':
    main( sys.argv[1:] )
If someone needs it save the code as srtfix.py, for example, and use it from command line:
python srtfix.py -i "my srt subtitle.srt"
I was lazy and used the datetime module to process timecodes, so I'm not sure the script will work for subtitles longer than 24h (: I'm also not sure when milliseconds were added to Python's datetime module; I'm using version 2.7.5, so it's possible the script won't work on earlier versions because of this...

How to match the numeric value in a regular expression?

Okay, this is quite an interesting challenge I have got myself into.
My RegEx takes as input lines like the following:
147.63.23.156/159
94.182.23.55/56
134.56.33.11/12
I need it to output a regular expression that matches the range represented. Let me explain.
For example, if the RegEx receives 147.63.23.156/159, then it needs to output a RegEx that matches the following:
147.63.23.156
147.63.23.157
147.63.23.158
147.63.23.159
How can I do this?
Currently I have:
(\d{1,3}\.\d{1,3}\.\d{1,3}\.)(\d{1,3})/(\d{1,3})
$1 contains the first xxx.xxx.xxx. part
$2 contains the lower range for the number
$3 contains the upper range for the number
Regexes are really not a great way to validate IP addresses, I want to make that clear right up front. It is far, far easier to parse the addresses and do some simple arithmetic to compare them. A couple of less thans and greater thans and you're there.
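As a rough illustration of the "parse and compare" route, here is a Python 3 sketch using the standard ipaddress module. The function name in_range is made up, and the spec format (a full address followed by "/" and the upper value of the last octet) is taken from the question:

```python
import ipaddress

def in_range(addr, spec):
    """Check addr against a spec such as '147.63.23.156/159'.

    Hypothetical helper: the part after '/' is read as the upper bound
    of the last octet, as in the question (not as a CIDR prefix length).
    """
    base, upper_octet = spec.split("/")
    prefix = base.rsplit(".", 1)[0]
    low = ipaddress.IPv4Address(base)
    high = ipaddress.IPv4Address(prefix + "." + upper_octet)
    # IPv4Address objects compare numerically.
    return low <= ipaddress.IPv4Address(addr) <= high

print(in_range("147.63.23.157", "147.63.23.156/159"))
print(in_range("147.63.23.160", "147.63.23.156/159"))
```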
That said, it seemed like it would be a fun exercise to write a regex generator. I came up with a big mess of Python code to generate these regexes. Before I show the code, here's a sample of the regexes it produces for a couple of IP ranges:
1.2.3.4 to 1.2.3.4                 1\.2\.3\.4
147.63.23.156 to 147.63.23.159     147\.63\.23\.15[6-9]
10.7.7.10 to 10.7.7.88             10\.7\.7\.([1-7]\d|8[0-8])
127.0.0.0 to 127.0.1.255           127\.0\.[0-1]\.(\d|[1-9]\d|1\d\d|2([0-4]\d|5[0-5]))
I'll show the code in two parts. First, the part that generates regexes for simple integer ranges. Second, the part that handles full IP addresses.
Matching number ranges
The first step is to figure out how to generate a regex that matches an arbitrary integer range, say 12-28 or 0-255. Here's an example of the regexes my implementation comes up with:
156 to 159     15[6-9]
1 to 100       [1-9]|[1-9]\d|100
0 to 255       \d|[1-9]\d|1\d\d|2([0-4]\d|5[0-5])
And now the code. There are numerous comments inline explaining the logic behind it. Overall it relies on a lot of recursion and special casing to try to keep the regexes lean and mean.
import sys, re
def range_regex(lower, upper):
    lower, upper = str(lower), str(upper)
    # Different lengths, for instance 1-100. Combine regex(1-9) and
    # regex(10-100).
    if len(lower) != len(upper):
        return '%s|%s' % (
            range_regex(lower, '9' * len(lower)),
            range_regex(10 ** (len(lower)), upper)
        )
    ll, lr = lower[0], lower[1:]
    ul, ur = upper[0], upper[1:]
    # One digit numbers.
    if lr == '':
        if ll == '0' and ul == '9':
            return '\\d'
        else:
            return '[%s-%s]' % (ll, ul)
    # Same first digit, for instance 12-14. Concatenate "1" and regex(2-4).
    elif ll == ul:
        return ll + sub_range_regex(lr, ur)
    # All zeros to all nines, for instance 100-399. Concatenate regex(1-3)
    # and the appropriate number of \d's.
    elif lr == '0' * len(lr) and ur == '9' * len(ur):
        return range_regex(ll, ul) + '\\d' * len(lr)
    # All zeros on left, for instance 200-649. Combine regex(200-599) and
    # regex(600-649).
    elif lr == '0' * len(lr):
        return '%s|%s' % (
            range_regex(lower, str(int(ul[0]) - 1) + '9' * len(ur)),
            range_regex(ul + '0' * len(ur), upper)
        )
    # All nines on right, for instance 167-499. Combine regex(167-199) and
    # regex(200-499).
    elif ur == '9' * len(ur):
        return '%s|%s' % (
            range_regex(lower, ll + '9' * len(lr)),
            range_regex(str(int(ll[0]) + 1) + '0' * len(lr), upper)
        )
    # First digits are one apart, for instance 12-24. Combine regex(12-19)
    # and regex(20-24).
    elif ord(ul[0]) - ord(ll[0]) == 1:
        return '%s%s|%s%s' % (
            ll, sub_range_regex(lr, '9' * len(lr)),
            ul, sub_range_regex('0' * len(ur), ur)
        )
    # Far apart, uneven numbers, for instance 15-73. Combine regex(15-19),
    # regex(20-69), and regex(70-73).
    else:
        return '%s|%s|%s' % (
            range_regex(lower, ll + '9' * len(lr)),
            range_regex(str(int(ll[0]) + 1) + '0' * len(lr),
                        str(int(ul[0]) - 1) + '9' * len(ur)),
            range_regex(ul + '0' * len(ur), upper)
        )
# Helper function which adds parentheses when needed to sub-regexes.
# Sub-regexes need parentheses if they have pipes that aren't already
# contained within parentheses. For example, "6|8" needs parentheses
# but "1(6|8)" doesn't.
def sub_range_regex(lower, upper):
    orig_regex = range_regex(lower, upper)
    old_regex = orig_regex
    while True:
        new_regex = re.sub(r'\([^()]*\)', '', old_regex)
        if new_regex == old_regex:
            break
        else:
            old_regex = new_regex
            continue
    if '|' in new_regex:
        return '(' + orig_regex + ')'
    else:
        return orig_regex
Matching IP address ranges
With that capability in place, I then wrote a very similar-looking IP range function to work with full IP addresses. The code is very similar to the code above except that we're working in base 256 instead of base 10, and the code throws around lists instead of strings.
import sys, re, socket
def ip_range_regex(lower, upper):
    # bytearray yields the four octets as integers on both Python 2 and 3
    lower = list(bytearray(socket.inet_aton(lower)))
    upper = list(bytearray(socket.inet_aton(upper)))
    return ip_array_regex(lower, upper)
# NOTE: sub_regex is not defined in this excerpt; presumably it is a
# parenthesizing helper analogous to sub_range_regex above.
def ip_array_regex(lower, upper):
    # One octet left.
    if len(lower) == 1:
        return range_regex(lower[0], upper[0])
    # Same first octet.
    if lower[0] == upper[0]:
        return r'%s\.%s' % (lower[0], sub_regex(ip_array_regex(lower[1:], upper[1:])))
    # Full subnet.
    elif lower[1:] == [0] * len(lower[1:]) and upper[1:] == [255] * len(upper[1:]):
        return r'%s\.%s' % (
            range_regex(lower[0], upper[0]),
            sub_regex(ip_array_regex(lower[1:], upper[1:]))
        )
    # Partial lower subnet.
    elif lower[1:] == [0] * len(lower[1:]):
        return '%s|%s' % (
            ip_array_regex(lower, [upper[0] - 1] + [255] * len(upper[1:])),
            ip_array_regex([upper[0]] + [0] * len(upper[1:]), upper)
        )
    # Partial upper subnet.
    elif upper[1:] == [255] * len(upper[1:]):
        return '%s|%s' % (
            ip_array_regex(lower, [lower[0]] + [255] * len(lower[1:])),
            ip_array_regex([lower[0] + 1] + [0] * len(lower[1:]), upper)
        )
    # First octets just 1 apart.
    elif upper[0] - lower[0] == 1:
        return '%s|%s' % (
            ip_array_regex(lower, [lower[0]] + [255] * len(lower[1:])),
            ip_array_regex([upper[0]] + [0] * len(upper[1:]), upper)
        )
    # First octets more than 1 apart.
    else:
        return '%s|%s|%s' % (
            ip_array_regex(lower, [lower[0]] + [255] * len(lower[1:])),
            ip_array_regex([lower[0] + 1] + [0] * len(lower[1:]),
                           [upper[0] - 1] + [255] * len(upper[1:])),
            ip_array_regex([upper[0]] + [0] * len(upper[1:]), upper)
        )
If you just need to build them one at a time, this website will do the trick.
If you need code, and don't mind python, this code does it for any arbitrary numeric range.
If it's for Apache... I haven't tried it, but it might work:
RewriteCond %{REMOTE_ADDR} !<147.63.23.156
RewriteCond %{REMOTE_ADDR} !>147.63.23.159
(Two consecutive RewriteConds are joined by a default logical AND)
Just have to be careful with ranges with differing number of digits (e.g. 95-105 should be broken into 95-99 and 100-105, since it is lexicographic ordering).
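The splitting step can be sketched in Python as follows. This only illustrates the digit-count issue and is not Apache configuration; the helper name split_by_digit_count is invented, and it assumes non-negative integers with low <= high:

```python
def split_by_digit_count(low, high):
    """Split [low, high] into sub-ranges whose bounds have the same
    number of digits, so that string comparison agrees with numeric order."""
    parts = []
    while len(str(low)) < len(str(high)):
        top = 10 ** len(str(low)) - 1  # largest value with as many digits as low
        parts.append((low, top))
        low = top + 1
    parts.append((low, high))
    return parts

# Lexicographic comparison disagrees with numeric comparison here:
print("105" < "95", 105 < 95)
print(split_by_digit_count(95, 105))
```

Within each resulting sub-range both bounds have equal length, which is the situation in which comparing them as strings is safe.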
I absolutely agree with the commenters, a pure-regex solution would be the wrong tool for the job here. Just use the regular expression you already have to extract the prefix, minimum, and maximum values,
$prefix, $minimum, $maximum = match('(\d{1,3}\.\d{1,3}\.\d{1,3}\.)(\d{1,3})/(\d{1,3})', $line).groups()
then test your IP address against ${prefix}(\d+),
$lastgroup = match($prefix + '(\d+)', $addr).groups()[0]
and compare that last group to see if it falls within the proper range,
return int($minimum) <= int($lastgroup) <= int($maximum)
Code examples are pseudocode, of course - convert to your language of choice.
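One possible Python 3 rendering of the pseudocode above, as a sketch only. The function name address_in_spec is hypothetical, and raising ValueError for an unparsable line is a choice made for this example:

```python
import re

SPEC = re.compile(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.)(\d{1,3})/(\d{1,3})")

def address_in_spec(line, addr):
    """True if addr shares the spec's prefix and its last octet lies
    between the spec's minimum and maximum (inclusive)."""
    spec = SPEC.fullmatch(line.strip())
    if spec is None:
        raise ValueError("not a supported IP range: %r" % line)
    prefix, minimum, maximum = spec.groups()
    # The prefix contains dots, so escape it before reusing it in a pattern.
    last = re.fullmatch(re.escape(prefix) + r"(\d+)", addr.strip())
    if last is None:
        return False
    return int(minimum) <= int(last.group(1)) <= int(maximum)

print(address_in_spec("94.182.23.55/56", "94.182.23.56"))
print(address_in_spec("94.182.23.55/56", "94.182.23.57"))
```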
To my knowledge, this can't be done with straight up regex, but would also need some code behind it. For instance, in PHP you could use the following:
function make_range($ip){
    $regex = '#(\d{1,3}\.\d{1,3}\.\d{1,3}\.)(\d{1,3})/(\d{1,3})#';
    if ( preg_match($regex, $ip, $matches) ){
        // $matches[1] is the prefix (with trailing dot),
        // $matches[2] the lower bound, $matches[3] the upper bound
        for ($n = (int)$matches[2]; $n <= (int)$matches[3]; $n++){
            print "{$matches[1]}{$n}\n";
        }
    } else {
        exit('not a supported IP range');
    }
}
For this to work with a RewriteCond, I think some black magic would be in order...
How is this going to be used with RewriteCond, anyways? Do you have several servers and want to just quickly make a .htaccess file easily? If so, then just add that function to a bigger script that takes some arguments and burps out a .htaccess file.