Regex conditional lookout - regex

My input text file is like
A={5,6},B={2},C={3}
B={2,4}
A={5},B={1},C={3}
A={5},B={2},C={3,4,QWERT},D={TXT}
I would like to match all the lines where A=5, B=2 and C=3. The catch is that if a variable is not mentioned, it can take any value, so that line needs to be matched too.
The above should match lines 1, 2 and 4.
I tried
.*?(?:(?=A)A\{.*?5).*?(?:(?=B)B\{.*?2).*?(?:(?=C)C\{.*?3)
https://regex101.com/r/NN9qk5/1
But it is not working.
I will be using this regex in Python 3.6 code.

If you want to solve it with a regex, you may use
^
(?!.*\bA={(?![^{}]*\b5\b))
(?!.*\bB={(?![^{}]*\b2\b))
(?!.*\bC={(?![^{}]*\b3\b))
.*
See the regex demo
The point is to fail a match if there is a key that contains no given number value inside braces.
E.g. (?!.*\bA={(?![^{}]*\b5\b)) is a negative lookahead that fails the match if, immediately to the right of the current location, there is the following sequence:
- .* - any 0+ chars other than line break chars
- \bA - a whole word A
- ={ - ={ substring
- (?![^{}]*\b5\b) - that is not followed with any 0+ chars other than { and } and then followed with 5 as a whole word.
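A small sketch of how one such lookahead behaves in isolation, using only the A check. The helper name passes_a_check and the extra sample lines are made up for illustration, and the brace is written as \{ here, which should be equivalent to the bare { used above.

```python
import re

# Only the A check from the answer, anchored at the start of the line.
A_CHECK = re.compile(r'^(?!.*\bA=\{(?![^{}]*\b5\b)).*')


def passes_a_check(line):
    """Return True unless the line has an A={...} group that lacks the value 5."""
    return A_CHECK.match(line) is not None


print(passes_a_check("A={5,6},B={2}"))  # A lists 5
print(passes_a_check("B={2,4}"))        # A is absent, so any value is allowed
print(passes_a_check("A={15},B={2}"))   # 15 does not contain 5 as a whole word
```

The first two calls should report a match and the third should not, because \b5\b requires 5 to stand alone between word boundaries.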
Sample usage in Python 3.6:
import re
s = """A={5,6},B={2},C={3}
B={2,4}
A={5},B={1},C={3}
A={5},B={2},C={3,4,QWERT},D={TXT}"""
given = { 'A': '5', 'B': '2', 'C': '3'}
reg_pattern = ''
for key,val in given.items():
    reg_pattern += r"(?!.*\b{}={{(?![^{{}}]*\b{}\b))".format(key,val)
reg = re.compile(reg_pattern)
for line in s.splitlines():
    if reg.match(line):
        print(line)
Output:
A={5,6},B={2},C={3}
B={2,4}
A={5},B={2},C={3,4,QWERT},D={TXT}
Note the use of re.match: this method only searches for a match at the start of the string, so there is no need to add the ^ anchor (which matches the string start).
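To illustrate why the start-of-string restriction matters for this pattern, here is a hedged sketch comparing re.match with re.search on an unanchored lookahead. The sample line is made up, and only the A check is used to keep it short.

```python
import re

# Unanchored lookahead for the A check, in the same shape the loop builds.
check = re.compile(r'(?!.*\bA=\{(?![^{}]*\b5\b))')

line = "A={6},B={2}"  # A is present but does not contain 5

# re.match tries position 0 only, where the lookahead rejects the line.
print(check.match(line))

# re.search also tries later positions; from index 1 onward no "A={"
# lies ahead, so the lookahead passes and an empty match is returned.
print(check.search(line))
```

So if re.search (or re.findall) is used instead, the pattern needs an explicit ^ in front of the lookaheads.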

Related

Nginx Lua regex match first word

I try to convert regex into Lua language, from
([a-zA-Z0-9._-/]+)
to
^%w+?([_-]%w+)
I want to match the first word, including '-' and '_':
mar_paci (toto totot)
toi-re/3.9
pouri marc (sensor)
Phoenix; SAGEM
The result:
mar_paci
toi-re
pouri marc
Phoenix
The code used:
value = string.match(ngx.var.args, "^%w+?([_-]%w+)")
In the ^%w+?([_-]%w+) regex, I added the ? character for an optional string.
You can use
^[%w%s_-]*%w
It matches
^ - start of string
[%w%s_-]* - zero or more alphanumerics, whitespaces, _ or hyphens
%w - an alphanumeric char.
See the Lua demo:
local function extract(text)
    return string.match(text, "^[%w%s_-]*%w")
end
print(extract("mar_paci (toto totot)"))
-- => mar_paci
print(extract("toi-re/3.9"))
-- => toi-re
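For comparison, a rough Python port of the same pattern, run against all four sample inputs from the question. This is only a sketch and not part of the Lua answer: Lua's %w is approximated with the ASCII class [A-Za-z0-9], and the function name first_words is made up.

```python
import re

# Python approximation of the Lua pattern ^[%w%s_-]*%w
PATTERN = re.compile(r'^[A-Za-z0-9\s_-]*[A-Za-z0-9]')


def first_words(text):
    """Return the leading run of alphanumerics, whitespace, _ and -, ending on an alphanumeric."""
    m = PATTERN.match(text)
    return m.group(0) if m else None


for sample in ["mar_paci (toto totot)", "toi-re/3.9",
               "pouri marc (sensor)", "Phoenix; SAGEM"]:
    print(first_words(sample))
```

The final alphanumeric in the pattern is what trims the trailing space before "(" in inputs such as "pouri marc (sensor)".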

Regex trouble capturing everything before last letter or number

I want to capture everything before the last letter or number. I do not want to match any white space, "-", or "#013" after the last letter or number.
This is the regex I currently have, but it seems to match everything:
(?<system_name>.*\w(?:[a-zA-Z]|[0-9]))
Current data:
469869-system
476657-SYSTEM
476657-system
681125-system#013
981765-system#013
687755-system#013
438105-system#013
281055-system#013
485548-SYSTEM
785455-system
489418-system
589568-system
489661-SYSTEM
486328-system - - #015
286728-system - - #015
SYSTEM-433455
system
What I want to match:
469869-system
476657-SYSTEM
476657-system
681125-system
981765-system
687755-system
438105-system
281055-system
485548-SYSTEM
785455-system
489418-system
589568-system
489661-SYSTEM
486328-system
286728-system
SYSTEM-433455
system
You can use:
^[\w-]+
where:
^ # beginning of line
[\w-]+ # character class, 1 or more word character or hyphen
Demo & explanation
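A minimal sketch of the pattern applied to a few of the sample lines, assuming Python's re module (the question does not say which regex engine is in use). The function name extract_system_name is hypothetical.

```python
import re

SYSTEM_NAME = re.compile(r'^[\w-]+')


def extract_system_name(line):
    """Return the leading run of word characters and hyphens, or None."""
    m = SYSTEM_NAME.match(line)
    return m.group(0) if m else None


for line in ["469869-system", "681125-system#013",
             "486328-system - - #015", "SYSTEM-433455", "system"]:
    print(extract_system_name(line))
```

The match stops at the first character that is neither a word character nor a hyphen, which is why "#013" and the trailing " - - #015" are left out.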

Regex for multidimensional input string name to get the last number between square brackets

Can someone help me with my regex?
I always want to match the last integer in square brackets for every string.
product[attribute][1][0][value] - In this case [0]
product[attribute][9871][56][value] - In this case [56]
My work:
/\[[0-9,-]+\]/g
The goal is to increment input name on clone, product[attribute][{attribute_id}][{clone_index}][value].
You may use
var s = "product[attribute][1][0][value]";
console.log(s.replace(
    /(.*\[)(\d+)(?=])/, function($0, $1, $2) {
        return $1 + (Number($2)+1);
    })
)
The regex matches
(.*\[) - Group 1: any 0+ chars other than line break chars as many as possible and then [
(\d+) - Group 2: one or more digits
(?=]) - a ] char must appear immediately to the right of the current location.
Incrementing is done inside the callback method.
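The same idea can be sketched in Python, where re.sub accepts a function as the replacement. This is a port for illustration rather than the answer's own code; the function name increment_last_index is made up, and count=1 mirrors the JavaScript regex having no g flag.

```python
import re


def increment_last_index(name):
    """Add 1 to the last [number] in an input name such as product[attribute][1][0][value]."""
    return re.sub(
        r'(.*\[)(\d+)(?=\])',
        # Group 1 is everything up to the last "[" that precedes digits and "]";
        # group 2 is the number to increment.
        lambda m: m.group(1) + str(int(m.group(2)) + 1),
        name,
        count=1,
    )


print(increment_last_index("product[attribute][1][0][value]"))
print(increment_last_index("product[attribute][9871][56][value]"))
```

Because .* is greedy, the engine backtracks from the end of the string, so the last bracketed integer is the one that gets captured.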

Python - how to add a new line every time a pattern is found in a string?

How can I add a new line every time a pattern from a list of regexes is found in a string?
I am using python 3.6.
I got the following input:
12.13.14 Here is supposed to start a new line.
12.13.15 Here is supposed to start a new line.
Here is some text. It is written in one lines. 12.13. Here is some more text. 2.12.14. Here is even more text.
I wish to have the following output:
12.13.14
Here is supposed to start a new line.
12.13.15
Here is supposed to start a new line.
Here is some text. It is written in one lines.
12.13.
Here is some more text.
2.12.14.
Here is even more text.
My first try produces output identical to the input:
in_file2 = 'work1-T1.txt'
out_file2 = 'work2-T1.txt'
start_rx = re.compile('|'.join(
    ['\d\d\.\d\d\.', '\d\.\d\d\.\d\d','\d\d\.\d\d\.\d\d']))
with open(in_file2,'r', encoding='utf-8') as fin2, open(out_file2, 'w', encoding='utf-8') as fout2:
    text_list = fin2.read().split()
    fin2.seek(0)
    for string in fin2:
        if re.match(start_rx, string):
            string = str.replace(start_rx, '\n\n' + start_rx + '\n')
        fout2.write(string)
My second try raises the error 'TypeError: unsupported operand type(s) for +: '_sre.SRE_Pattern' and 'str'':
in_file2 = 'work1-T1.txt'
out_file2 = 'work2-T1.txt'
start_rx = re.compile('|'.join(
    ['\d\d\.\d\d\.', '\d\.\d\d\.\d\d','\d\d\.\d\d\.\d\d']))
with open(in_file2,"r") as fin2, open(out_file2, 'w') as fout3:
    for line in fin2:
        start = False
        if re.match(start_rx, line):
            start = True
        if start == False:
            print ('do something')
        if start == True:
            line = '\n' + line ## blank space before the position number
            line = line.replace(start_rx, start_rx + '\n')
            fout3.write(line)
First of all, to search and replace with a regex, you need to use re.sub, not str.replace.
Second, when you use re.sub, you can't put the compiled regex object inside the replacement string. Instead, group the parts of the match you want to keep and refer to them with backreferences in the replacement (or, to refer to the whole match, use the \g<0> backreference, which needs no capturing groups).
Third, when you build an unanchored alternation pattern, make sure longer alternatives come first, i.e. start_rx = re.compile('|'.join([r'\d\d\.\d\d\.\d\d', r'\d\.\d\d\.\d\d', r'\d\d\.\d\d\.'])). However, you may write a more precise pattern by hand here.
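A short sketch of the second and third points, using a made-up one-line sample. The variable names shorter_first and longer_first are only for illustration.

```python
import re

sample = "12.13.14 Here"

# Alternatives in the order used in the question (shortest first).
shorter_first = re.compile('|'.join(
    [r'\d\d\.\d\d\.', r'\d\.\d\d\.\d\d', r'\d\d\.\d\d\.\d\d']))
# Same alternatives, longest first.
longer_first = re.compile('|'.join(
    [r'\d\d\.\d\d\.\d\d', r'\d\.\d\d\.\d\d', r'\d\d\.\d\d\.']))

# The first alternative that matches wins, so the short one cuts the number off.
print(shorter_first.findall(sample))
print(longer_first.findall(sample))

# \g<0> in the replacement stands for the whole match; no groups are needed.
print(repr(longer_first.sub(r'\n\n\g<0>\n', sample)))
```

With the shortest alternative first, only "12.13." is matched and the trailing "14" is left behind, which is why the ordering matters.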
Here is how your code can be fixed:
import re
with open(in_file2,'r', encoding='utf-8') as fin2, open(out_file2, 'w', encoding='utf-8') as fout2:
    text = fin2.read()
    fout2.write(re.sub(r'\s*(\d+(?:\.\d+)+\.?)\s*', r'\n\n\1\n', text))
See the Python demo
The pattern is
\s*(\d+(?:\.\d+)+\.?)\s*
See the regex demo
Details
\s* - 0+ whitespaces
(\d+(?:\.\d+)+\.?) - Group 1 (\1 in the replacement pattern):
\d+ - 1+ digits
(?:\.\d+)+ - 1 or more repetitions of . and 1+ digits
\.? - an optional .
\s* - 0+ whitespaces
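To see the pattern at work without touching any files, here is a sketch that applies the same re.sub call to the sample input from the question held in a string. The variable names text and result are only for illustration.

```python
import re

text = (
    "12.13.14 Here is supposed to start a new line.\n"
    "12.13.15 Here is supposed to start a new line.\n"
    "Here is some text. It is written in one lines. 12.13. "
    "Here is some more text. 2.12.14. Here is even more text."
)

# Surrounding whitespace is consumed by \s*, so each number ends up
# on its own line with a blank line before it.
result = re.sub(r'\s*(\d+(?:\.\d+)+\.?)\s*', r'\n\n\1\n', text)
print(result)
```

Note that the output starts with two newline characters because the first number sits at the very beginning of the text; call result.lstrip() if that is unwanted.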
Try this
# text holds the contents read from in_file2, not the file name
text = re.sub(r'(\d+) ', r'\1\n', text)
text = re.sub(r'(\w+)\.', r'\1.\n', text)

RegEx not recognized although it should be

I'm trying to split texts like these:
§1Hello§fman, §0this §8is §2a §blittle §dtest :)
by delimiter "§[a-z|A-Z
My first approach was the following:
^[§]{1}[a-fA-F]|[0-9]$
But pythex.org won't find any occurrences in my example text by using this regex.
Do you know why?
The ^[§]{1}[a-fA-F]|[0-9]$ pattern matches a string starting with § and then having a letter from a-f and A-F ranges, or a digit at the end of the string.
Note that ^ matches the start-of-string position and $ matches the end-of-string position.
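A quick sketch of that behavior on the sample text from the question, contrasted with an unanchored §[a-fA-F0-9]. The variable names are only for illustration.

```python
import re

s = "§1Hello§fman, §0this §8is §2a §blittle §dtest :)"

# First alternative: § at the start followed by a letter a-f/A-F (here it is "1").
# Second alternative: a digit at the very end (here it is ")").
# Neither applies, so nothing is found.
anchored = re.search(r'^[§]{1}[a-fA-F]|[0-9]$', s)
print(anchored)

# Without the anchors, every § followed by a hex char is found.
codes = re.findall(r'§[a-fA-F0-9]', s)
print(codes)
```

The alternation splits the whole pattern in two, so ^ only applies to the left branch and $ only to the right one.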
To extract the words that follow § and a hex char, you may use
re.findall(r'§[A-Fa-f0-9]([^\W\d_]+)', s)
# => ['Hello', 'man', 'this', 'is', 'a', 'little', 'test']
To remove them, you may use re.sub:
re.sub(r'\s*§[A-Fa-f0-9]', ' ', s).strip()
# => Hello man, this is a little test :)
To just get a string of those delimiters you may use
"".join(re.findall(r'§[A-Fa-f0-9]', s))
# => §1§f§0§8§2§b§d
See this Python demo.
Details
§ - a § symbol
[A-Fa-f0-9] - 1 digit or ASCII letter from a-f and A-F ranges (hex char)
([^\W\d_]+) - Group 1 (this value will be extracted by re.findall): one or more letters (to include digits, remove \d)
Your regex uses the anchors ^ and $ to assert the start and the end of the string.
You could update your regex to §[a-fA-F0-9]
Example using split:
import re
s = "§1Hello§fman, §0this §8is §2a §blittle §dtest :)"
result = [r.strip() for r in re.split('[§]+[a-fA-F0-9]', s) if r.strip()]
print(result)
Demo