Regex for number of digits 0 and at least one digit 1

I have a block of text in which I am trying to find lines that contain any number (*) of 0 digits and at least one (+) 1 digit. Example:
1234 xxx 00000000000111000000 00000010000100000000 Some text <-- matches
2345 yyy 00000000000000000000 00000000000000000000 Some text <-- does not match
2345 yyy 00000001000000000000 00000000000000000000 Some text <-- matches
3456 zzz 11111111111111111111 11111111111111111111 Some text <-- matches
How to accomplish this? Thanks!
Tried with negative lookahead but failed:
\s+\d+ +[a-zA-Z]+ +(?![0]{20}) +(?![0]{20}) +([0-9a-zA-Z ]+)

You are not matching any digits 0 or 1 after the assertions.
If the two columns of 0/1 digits must not both consist only of zeroes, you can cover both columns in a single assertion:
+\d+ +[a-zA-Z]+ +(?!0{20} +0{20}\b)[01]{20} +[01]{20} +([0-9a-zA-Z ]+)
See a regex101 demo.
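A minimal Python sketch of how this pattern could be checked against the four sample lines. The leading `^\s*` (used here instead of mandatory leading spaces) and the names PATTERN, lines and matching are assumptions made for the illustration, not part of the answer.

```python
import re

# Adapted from the pattern above. Assumption: leading whitespace is optional,
# so the pattern starts with ^\s* to fit the sample lines as shown.
PATTERN = re.compile(
    r"^\s*\d+ +[a-zA-Z]+ +(?!0{20} +0{20}\b)[01]{20} +[01]{20} +([0-9a-zA-Z ]+)"
)

lines = [
    "1234 xxx 00000000000111000000 00000010000100000000 Some text",
    "2345 yyy 00000000000000000000 00000000000000000000 Some text",
    "2345 yyy 00000001000000000000 00000000000000000000 Some text",
    "3456 zzz 11111111111111111111 11111111111111111111 Some text",
]

# Keep only the lines where the two 0/1 columns are not both all zeroes.
matching = [line for line in lines if PATTERN.match(line)]

for line in matching:
    print(line)
```

With the sample data, the first, third and fourth lines are kept; the all-zero line is rejected by the negative lookahead.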

Here is my shorter version of the regex. It only tests line by line, so you will have to iterate over the per-line matches in your file, as in the code below:
import re
text = '''1234 xxx 00000000000111000000 00000010000100000000 Some text
2345 yyy 00000000000000000000 00000000000000000000 Some text
2345 yyy 00000001000000000000 00000000000000000000 Some text
3456 zzz 11111111111111111111 11111111111111111111 Some text'''
regex = r'^\d+\s+\w+\s+0*1+0*\s+\d+\s+\w+'
matches = re.findall(regex, text, re.MULTILINE)
for match in matches:
    print(match)
For explanation and details, please check regex101 demo

Specify end of regex group

I am trying to create a regular expression which matches multiple groups, so the values between the groups can be extracted. Each group looks identical.
Let's consider the following example; note that the line breaks are intended:
dog 1
wuff
wuff
cat
123
XYZ
dog 1
wuff
wuff
cat
456
ABC
dog 1
wuff
wuff
cat
789
Thus, with the right regular expression I want to get the output:
123
XYZ
456
ABC
789
On regex101.com I tried:
(?s)(?:dog.*cat)
which matches all values between the first occurrence of dog and the last occurrence of cat.
In addition I tried:
(?s)(?:dog.*(cat){1})
which, with my limited knowledge, should match the first occurrence of cat and then end the group, but it does not.
I appreciate any help.
You may use this regex in MULTILINE mode to capture value after dog.*cat matches:
^dog\b(?:.*\n)+?cat\n(.*(?:\n.*)*?)(?=\ndog|\Z)
Your values are present in capture group #1
RegEx Demo
RegEx Details:
^: Match the start of a line
dog\b: Match word dog with a word boundary
(?:.*\n)+?: Match anything followed by a line break. Repeat this 1+ times (lazy)
cat\n: Match cat followed by a newline
(.*(?:\n.*)*?): Capture the multiline values you're interested in into the first capture group.
(?=\ndog|\Z): Lookahead to assert that we have a dog after line break or end of input ahead of the current position
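A short Python sketch of applying this regex, assuming the sample input from the question is held in a string named text (the variable names are illustrative). re.findall returns the contents of capture group #1 for every match.

```python
import re

text = """dog 1
wuff
wuff
cat
123
XYZ
dog 1
wuff
wuff
cat
456
ABC
dog 1
wuff
wuff
cat
789"""

# re.MULTILINE lets ^ match at the start of every line, not only at the start of the string.
pattern = re.compile(r"^dog\b(?:.*\n)+?cat\n(.*(?:\n.*)*?)(?=\ndog|\Z)", re.MULTILINE)

# With exactly one capture group, findall returns a list of that group's values.
values = pattern.findall(text)

for value in values:
    print(value)
```

Each element of values holds the lines between one cat line and the next dog line (or the end of the input).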

Match multiple line text (from 1 to n lines) until certain new line regex

I created regex for matching such pattern:
<some text>
yyyy.MM.dd SOME TEXT decimal decimal
yyy.MM.dd
some sentence
some sentence
some sentence (there can be from 1 to n comment lines; none of them starts with yyyy.MM.dd SOME TEXT decimal decimal)
yyyy.MM.dd SOME TEXT decimal decimal
yyy.MM.dd
some sentence
some sentence
some sentence
...
<some text>
The regex:
((\d{4}\.\d{2}\.\d{2})\s([a-zA-Z\s]{0,})\s(\-{0,1}((\d{1}\,\d{2})|(\d{1,}\ \d{3}\,\d{2})))\s(\-{0,1}((\d{1}\,\d{2})|(\d{1,}\ \d{3}\,\d{2}))\s)(\d{4}\.\d{2}\.\d{2}))
which matches only the first 2 lines. I can't match the multiline sentences up to (but excluding) the next yyyy.MM.dd SOME TEXT decimal decimal line.
This is the test data for matching:
2020.11.01 SOME TEXT -17,30 83 016,86
2020.10.30
Some text that should be
matched 20.01.2020 as
multiline text
until now
2020.11.01 SOME TEXT -27,30 81 016,86
2020.10.30
Some text that should be
matched 20.01.2020 as
multiline text
until now
...
it should match like this:
1.
2020.11.01 SOME TEXT -17,30 83 016,86
2020.10.30
Some text that should be
matched 20.01.2020 as
multiline text
until now
2020.11.01 SOME TEXT -27,30 81 016,86
2020.10.30
Some text that should be
matched 20.01.2020 as
multiline text
until now
For me it matches like this:
1.
2020.11.01 SOME TEXT -17,30 83 016,86
2020.10.30
2020.11.01 SOME TEXT -27,30 81 016,86
2020.10.30
How can I match from 1 to many multiline lines WITHOUT 'yyyy.MM.dd SOME TEXT decimal decimal' on the next line?
For the example data, you can match the first 2 lines with a date like pattern, followed by all the lines that do not start with a datelike pattern.
Note that \d{4}\.\d{2}\.\d{2} does not validate a date itself. To get a more precise match, this page has more detailed examples.
^\d{4}\.\d{2}\.\d{2} .*\r?\n\d{4}\.\d{2}\.\d{2}\b.*(?:\r?\n(?!\d{4}\.\d{2}\.\d{2}\b).*)*
Regex demo
Or, if you first want to match all lines that start with a datelike pattern (in case there are 1 or more), followed by the lines that do not:
^\d{4}\.\d{2}\.\d{2} \S.*(?:\r?\n\d{4}\.\d{2}\.\d{2}\b.*)+(?:\r?\n(?!\d{4}\.\d{2}\.\d{2}\b).*)*
Explanation
^ Start of the string
\d{4}\.\d{2}\.\d{2} \S.* match a datelike pattern followed by a space, at least a non whitespace char (For SOME TEXT in the example) and the rest of the line
(?:\r?\n\d{4}\.\d{2}\.\d{2}\b.*)+ Repeat 1+ times, matching lines that start with a datelike pattern
(?: Non capture group (to repeat as a whole)
\r?\n Match a newline
(?!\d{4}\.\d{2}\.\d{2}\b) Assert not a datelike format directly to the right
.* If the previous assertion is true, match the whole line
)* Optionally repeat all lines that do not start with a datelike pattern (If there should be at least 1 line, change the quantifier to +)
Regex demo
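As a rough illustration, the first pattern can be applied in Python with re.MULTILINE to split the test data into blocks. The names DATE, BLOCK, text and blocks are assumptions made for this sketch; the pattern itself is the one given above, assembled from a shared datelike fragment.

```python
import re

# Datelike fragment reused several times in the pattern (it does not validate real dates).
DATE = r"\d{4}\.\d{2}\.\d{2}"

BLOCK = re.compile(
    rf"^{DATE} .*\r?\n{DATE}\b.*(?:\r?\n(?!{DATE}\b).*)*",
    re.MULTILINE,
)

text = """2020.11.01 SOME TEXT -17,30 83 016,86
2020.10.30
Some text that should be
matched 20.01.2020 as
multiline text
until now
2020.11.01 SOME TEXT -27,30 81 016,86
2020.10.30
Some text that should be
matched 20.01.2020 as
multiline text
until now"""

# No capture groups in the pattern, so findall returns the full text of each match.
blocks = BLOCK.findall(text)

for number, block in enumerate(blocks, start=1):
    print(f"{number}.")
    print(block)
```

The line "matched 20.01.2020 as" stays inside a block because the lookahead only rejects lines that begin with the yyyy.MM.dd shape.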

Capturing two different lines using regex

I want to capture two lines in one variable, like this is my input:
Rose 0 82
ABC 0 0
ABC (Backup) 0 0
ABC XYZ 637 2021
ABC XYZ (Backup) 0 0
ABC EXYZ 0 0
I want to capture the lines which are in bold.
I tried this code:
var = re.search("ABC\s+\d+\s+ .*\n(.*)\nABC XYZ .*",file_name)
but it is giving me output like this:
ABC 0 0
ABC (Backup) 0 0
ABC XYZ 637 2021
and my expected output is this:
ABC 0 0
ABC XYZ 637 2021
Can someone please suggest what modification is needed.
You may use
re.search(r"^(ABC[ \t]+\d+[ \t].*\n).*\n(ABC[ \t]+XYZ[ \t].*)", s, re.MULTILINE)
The regex will find the match you need and capture 2 lines into separate capturing groups. Then, check if there was a match and, if yes, join the two capturing group values.
See the Python demo
import re
s="""Rose 0 82
ABC 0 0
ABC (Backup) 0 0
ABC XYZ 637 2021
ABC XYZ (Backup) 0 0
ABC EXYZ 0 0"""
v = re.search(r"^(ABC[ \t]+\d+[ \t].*\n).*\n(ABC[ \t]+XYZ[ \t].*)", s, re.MULTILINE)
if v:
    print("{}{}".format(v.group(1), v.group(2)))
Output:
ABC 0 0
ABC XYZ 637 2021
Pattern details
^ - start of a line (due to re.MULTILINE)
(ABC[ \t]+\d+[ \t].*\n) - Capturing group 1: ABC, 1+ spaces or tabs, 1+ digits, a space or tab and then the rest of the line with the newline
.*\n - whole next line
(ABC[ \t]+XYZ[ \t].*) - Capturing group 2: ABC, 1+ spaces or tabs, XYZ, a space or tab and then the rest of the line.
You can make use of "^" and "$" to match the start and end of a line.
^\*\*.*\*\*
This will give you 2 matches to iterate through. The matches are the bold lines, identified by the two * at the beginning and end of a line.
If a comment starts with two stars, then you can use this (but it will not split two comments that are on one line).
^[\*]{2}(.*)[\*]{2}
If you want to find any comment of the form **comment**, use this:
[\*]{2}[^\*]+[\*]{2}
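A hypothetical Python sketch of the idea, assuming the bold lines in the input are literally wrapped in ** markers (the sample as shown in the question does not include them, so the text below is an assumed reconstruction). The trailing $ is also an addition for this sketch, so that only whole-line bold entries are captured.

```python
import re

# Assumed input: the two bold lines are marked with ** at both ends.
text = """Rose 0 82
**ABC 0 0**
ABC (Backup) 0 0
**ABC XYZ 637 2021**
ABC XYZ (Backup) 0 0
ABC EXYZ 0 0"""

# Capture what sits between the opening and closing ** on a line.
bold_lines = re.findall(r"^\*\*(.*)\*\*$", text, re.MULTILINE)

print("\n".join(bold_lines))
```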

I am looking for a solution to know if a specific word or digit is last in the line, followed by nothing, not even a space [duplicate]

set vv "abc 123 456 "
regexp {abc[\s][\d]+[\s][\d]+} $vv
1
regexp {abc[\s][\d]+[\s][\d]+(?! )} $vv
1
Should return 0, as the line contains extra space at the end or extra characters.
From a list of lines, I am trying to find out which lines have a space at the end and which do not.
Lines can be of any format; for instance, I need to extract lines 1 and 3 but not 2 and 4.
"abc 123 456"
"abc 123 456 abc 999"
"xyz 123 999"
"xyz 123 999 zzz 222"
You could use a repeating pattern matching a space and digits to make sure that the line ends with digits only:
^abc(?: \d+)+$
Regex demo
Or a bit broader match using word characters \w if the lines can be of any format:
^\w+(?: \w+)+$
Regex demo
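For comparison, here is a small Python sketch of the first pattern (the question itself uses Tcl, so this is only an illustration of the anchoring idea; the names ENDS_WITH_DIGITS, candidates and results are made up for the example). Note that in Python, $ also matches just before a single trailing newline, which does not matter for these inputs.

```python
import re

# ^ and $ force the whole string to be "abc" followed only by space-separated digit runs.
ENDS_WITH_DIGITS = re.compile(r"^abc(?: \d+)+$")

candidates = [
    "abc 123 456",          # ends right after the digits
    "abc 123 456 ",         # trailing space
    "abc 123 456 abc 999",  # extra non-digit word
]

results = {s: bool(ENDS_WITH_DIGITS.search(s)) for s in candidates}

for s, ok in results.items():
    print(repr(s), ok)
```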
Not sure about TCL regex, but I think you have to add an anchor:
abc\s\d+\s\d+$
In short: the end of the line ($) preceded by a word character (\w).
puts [regexp {\w$} $vv]
If all you need is to find out if a line ends with a space or not, use this for a regex:
\s$
The regular expression would be {^abc.*\d$}: a digit followed by the end of the string.
% regexp {^abc.*\d$} $vv
0
The glob pattern would be {abc*[0-9]}
% string match {abc*[0-9]} $vv
0
% string match {abc*[0-9] } $vv
1

Perl, Any match string after a pattern

I'm trying to match if any string exists after a certain pattern.
The pattern is "pattern" 'anything in between' "[after]". case insensitive.
e.g.
pattern 1 [after] ABC
pattern 2 [after] 123 abc DEX
pattern 3 [after]
pattern 12345123 [after]
pattern #ASd#98 #_90sd [after] xyz dec
[after] 4 pattern
So the result I would like to obtain is,
pattern 1 [after] ABC
pattern 2 [after] 123 abc DEX
pattern #ASd#98 #_90sd [after] xyz dec
It begins with "pattern" and ends with "[after]"; anything sandwiched in between is also accepted.
I'm having difficulty combining the [ ] delimiters with the check that a string follows.
The closest I've gotten is
m/pattern/ ../ \[after]/
which ends up matching:
pattern 1 [after] ABC
pattern 2 [after] 123 ABC DEX
pattern 3 [after]
pattern 12345123 [after]
pattern #ASd#98 #_90sd [after] xyz dec
But I don't need the 3rd or 4th line, as they don't contain any digits or characters after "[after]".
Thanks
Here is the code I used to test against your input (which I just cat'ed and piped to the script)
#!/usr/bin/perl
while (<>)
{
    print if (/^pattern.*\[after\]\s*\S+/);
}
So to break it down for you:
/^pattern : match any string that begins with "pattern"
.*\[after\] : match any characters followed by "[after]"
\s*\S+ : match 0 or more whitespace characters followed by one or more non-whitespace character
That should give you enough to work with to tweak it up as you see fit.
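The same check can be sketched in Python for readers who want to try it outside Perl. This is a port, not the answer's own code; re.IGNORECASE is added to reflect the "case insensitive" requirement in the question, and the names HAS_TEXT_AFTER, lines and kept are illustrative.

```python
import re

# Port of /^pattern.*\[after\]\s*\S+/ with case-insensitive matching.
HAS_TEXT_AFTER = re.compile(r"^pattern.*\[after\]\s*\S+", re.IGNORECASE)

lines = [
    "pattern 1 [after] ABC",
    "pattern 2 [after] 123 abc DEX",
    "pattern 3 [after]",
    "pattern 12345123 [after]",
    "pattern #ASd#98 #_90sd [after] xyz dec",
    "[after] 4 pattern",
]

# Keep lines that start with "pattern" and have a non-whitespace character after "[after]".
kept = [line for line in lines if HAS_TEXT_AFTER.search(line)]

for line in kept:
    print(line)
```

Lines 3 and 4 are dropped because nothing but the end of the line follows "[after]", and the last line is dropped because it does not start with "pattern".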
Code:
$str = 'pattern 2 [after] 123 abc DEX';
if ($str =~ m/^pattern\s+(\d+)\s+\[after\]\s+(.+)/) {
    print "$1\t$2\n";
} else {
    print "(no match)\n";
}
Output:
2 123 abc DEX
Test this code here.
Is this what you want:
/pattern [0-9] \[after\](?= .)/s
or
/pattern [0-9] \[after\] ./s