Regex pattern matching in python


I am trying to split this data:
rest = [" hgod eruehf 10 SECTION 1. DATA: find my book 2.11.111 COLUMN: get me tea","111.2 CONTAIN i am good"]
match = re.compile(r'(((\d[.])(\d[.]))+\s(\w[A-Z]+:|\w+))')
out = match.search(rest)
print(out.group(0))
The pattern I found is: a multi-level decimal number (e.g. 1. / 1.1. / 1.21.1) followed by text up to the next multi-level decimal number (e.g. 1. / 1.1. / 1.21.1).
I want to split the data into:
DATA: find my book
2.11.111 COLUMN: get me tea
111.2 CONTAIN i am good
Is there any way to split the text data based on this pattern?

You may get the expected matches using
import re
rest = [" hgod eruehf 10 SECTION 1. DATA: find my book 2.11.111 COLUMN: get me tea","111.2 CONTAIN i am good"]
res = []
for s in rest:
    res.extend(re.findall(r'\d+(?=\.)(?:\.\d+)*.*?(?=\s*\d+(?=\.)(?:\.\d+)*|\Z)', s))
print(res)
# => ['1. DATA: find my book', '2.11.111 COLUMN: get me tea', '111.2 CONTAIN i am good']
The regex is applied to each item in the rest list, and all matches are saved into the res list.
Pattern details
\d+ - 1+ digits
(?=\.) - there must be a . immediately to the right of the current position
(?:\.\d+)* - 0 or more repetitions of a . and then 1+ digits
.*? - 0+ chars other than newline, as few as possible
(?=\s*\d+(?=\.)(?:\.\d+)*|\Z) - a positive lookahead that stops the match right before 0+ whitespaces followed by the next dotted number (1+ digits with a . immediately to the right, then 0 or more repetitions of a . and 1+ digits), or before the end of string
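As an alternative sketch (not the answer's method), the same result can be obtained with re.split: break each string on whitespace that is followed by a dotted number, then keep only the chunks that start with one. The names SPLIT_RE, HEAD_RE and split_sections are hypothetical, and the sketch assumes, like the pattern above, that a section always starts with digits immediately followed by a period.

```python
import re

# Whitespace that is followed by 1+ digits and a "." (the start of the next section).
SPLIT_RE = re.compile(r'\s+(?=\d+\.)')
# A chunk counts as a section only if it starts with 1+ digits and a ".".
HEAD_RE = re.compile(r'\d+\.')

def split_sections(items):
    """Hypothetical helper: split every string in items into numbered sections."""
    out = []
    for s in items:
        for chunk in SPLIT_RE.split(s.strip()):
            # Drops leading text such as "hgod eruehf 10 SECTION".
            if HEAD_RE.match(chunk):
                out.append(chunk)
    return out

rest = [" hgod eruehf 10 SECTION 1. DATA: find my book 2.11.111 COLUMN: get me tea","111.2 CONTAIN i am good"]
print(split_sections(rest))
# => ['1. DATA: find my book', '2.11.111 COLUMN: get me tea', '111.2 CONTAIN i am good']
```

Like the findall pattern, this treats any "digits followed by a period" as a section start, so a value such as "1.5 liters" inside the text would also start a new chunk.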

Related

RegEx to replace entire string with first two values

I'm trying to come up with a regex expression to replace an entire string with just the first two values. Examples:
Entire String: AO SMITH 100108283 4500W/240V SCREW-IN ELEMENT, 11"
First Two Values: AO SMITH
Entire String: BRA14X18HEBU / P11-042 / 310-470NL BRASS 1/4 x 1/8 HEX BUSHING
First Two Values: BRA14X18HEBU / P11-042
Entire String: TWO-HOLE PIPE STRAP 4" 008004EG 72E 4
First Two Values: TWO-HOLE PIPE
The caveat is that I want to preserve special characters such as / and - without counting them as values. My current expression doesn't do that; instead it leaves the new value entirely blank. Only the first example above works.
Here's what I've got so far:
Matching Value:
^(\w+) +(\w+).+$
New Value:
$1 $2

One option is to use a single capture group and use it in the replacement:
^(\w+(?:-\w+)?(?: +\/)? +\w+(?:-\w+)?).+
The pattern matches:
^ - start of string
( - capture group 1
\w+(?:-\w+)? - match 1+ word chars, with an optional part matching a - and 1+ word chars
(?: +\/)? - optionally match 1+ spaces and a /
 +\w+(?:-\w+)? (note the leading space) - match 1+ spaces, then 1+ word chars with an optional part matching a - and 1+ word chars
) - close group 1
.+ - match 1+ of any char (the rest of the line)
If a value can contain more than one hyphen, use * instead of ? after (?:-\w+).
Output
AO SMITH
BRA14X18HEBU / P11-042
TWO-HOLE PIPE
A broader option is to match any non-word chars between the two words:
^(\w+(?:-\w+)*[\W\r\n]+\w+(?:-\w+)*).+
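The replacement above uses the $1 syntax of whatever tool the question targets. If you do the same in Python (an assumption, since the question does not name a language), re.sub refers to group 1 as \1 instead. A minimal sketch applying both patterns from this answer; FIRST_TWO, BROADER and first_two_values are hypothetical names:

```python
import re

# Strict pattern: each value may contain one hyphen; an optional " /" may sit between them.
FIRST_TWO = re.compile(r'^(\w+(?:-\w+)?(?: +/)? +\w+(?:-\w+)?).+')
# Broader pattern: any run of non-word chars between the two values.
BROADER = re.compile(r'^(\w+(?:-\w+)*[\W\r\n]+\w+(?:-\w+)*).+')

def first_two_values(s, pattern=FIRST_TWO):
    # Replace the whole match with capture group 1 (\1 in Python, $1 in the tool above).
    return pattern.sub(r'\1', s)

samples = [
    'AO SMITH 100108283 4500W/240V SCREW-IN ELEMENT, 11"',
    'BRA14X18HEBU / P11-042 / 310-470NL BRASS 1/4 x 1/8 HEX BUSHING',
    'TWO-HOLE PIPE STRAP 4" 008004EG 72E 4',
]
for s in samples:
    print(first_two_values(s))
    print(first_two_values(s, BROADER))
```

If a string does not match at all, re.sub returns it unchanged rather than blank.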

regex for matching latitude, longitudes without any character

I am looking for one regex which strictly allows 2 floating point numbers which are comma separated.
Test cases:
0,0
0.021312311323,0
0,0.012312312312
1.1,0.9836373
Regex that I have tried is
^[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?),\s*[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?)$\D+|\d*\.?\d+
These are latitudes and longitudes, but I just want to allow 2 values in these parameters.
This regex fails for these inputs, which it should reject:
-10a, 10a
10a,10b
I would really appreciate any help and guidance.

Your regex ends with redundant patterns: remove \D+|\d*\.?\d+ after $. Since | has the lowest precedence, your pattern is really two alternatives. The first, ^...$\D+, can never match, because $ asserts the end of string and \D+ then requires at least one more non-digit char. The second, \d*\.?\d+, matches any integer or float anywhere in the string, and that is what matched your unwanted strings.
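To see this in action, here is a small check using Python's re (an assumption; the question does not name an engine, but | precedence works the same way). re.search finds the bare \d*\.?\d+ alternative inside the unwanted inputs:

```python
import re

# The question's original pattern, unchanged.
original = re.compile(
    r'^[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?),\s*[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?)$\D+|\d*\.?\d+'
)

for s in ["-10a, 10a", "10a,10b"]:
    m = original.search(s)
    # The match comes from the second alternative, \d*\.?\d+, not from the lat/lon part.
    print(repr(s), "->", m.group(0))
```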
You can use
^([-+]?(?:[1-8]?\d(?:\.\d+)?|90(?:\.0+)?)),\s*([-+]?(?:180(?:\.0+)?|(?:1[0-7]\d|[1-9]?\d)(?:\.\d+)?))$
Note that I converted some capturing groups into non-capturing ones, so that just two "notional" capturing groups remain in the pattern.
Details
^ - start of string
([-+]?(?:[1-8]?\d(?:\.\d+)?|90(?:\.0+)?)) - Group 1:
[-+]? - an optional - or +
(?:[1-8]?\d(?:\.\d+)?|90(?:\.0+)?) - either a number from 0 to 89 ([1-8]?\d) and then an optional fractional part ((?:\.\d+)?) or 90 and then an optional . followed with one or more 0 chars
,\s* - a comma and 0+ whitespace chars
([-+]?(?:180(?:\.0+)?|(?:1[0-7]\d|[1-9]?\d)(?:\.\d+)?)) - Group 2:
[-+]? - an optional - or +
(?:180(?:\.0+)?|(?:1[0-7]\d|[1-9]?\d)(?:\.\d+)?) - either 180 followed by an optional . and one or more 0 chars, or a number from 0 to 179 and then an optional fractional part
$ - end of string.
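A minimal sketch of the fixed pattern in Python (assuming Python's re; the question does not say which engine it uses). It drops ^ and $ and calls re.fullmatch instead, because in Python $ also matches just before a trailing newline. LAT_LON and parse_lat_lon are hypothetical names:

```python
import re

# The answer's pattern without ^ and $: fullmatch() anchors it at both ends.
# Group 1 = latitude (-90..90), group 2 = longitude (-180..180).
LAT_LON = re.compile(
    r'([-+]?(?:[1-8]?\d(?:\.\d+)?|90(?:\.0+)?)),\s*'
    r'([-+]?(?:180(?:\.0+)?|(?:1[0-7]\d|[1-9]?\d)(?:\.\d+)?))'
)

def parse_lat_lon(s):
    """Hypothetical helper: return (lat, lon) as floats, or None if s is not valid."""
    m = LAT_LON.fullmatch(s)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))

for s in ["0,0", "0.021312311323,0", "0,0.012312312312", "1.1,0.9836373", "-10a, 10a", "10a,10b"]:
    print(repr(s), parse_lat_lon(s))
```

The two capturing groups are what make it easy to return both values directly.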

Your regular expression is almost correct. You should stop at $, which marks the end of the string.
const testCases = [
  "0,0",
  "0.021312311323,0",
  "0,0.012312312312",
  "1.1,0.9836373",
  "-10a, 10a",
  "10a,10b",
];
// No g flag: it is not needed for a yes/no check, and with g, re.test() would keep state in lastIndex.
const re = /^[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?),\s*[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?)$/;
testCases.forEach(tc => {
  if (re.test(tc)) {
    console.log(" VALID : " + tc);
  } else {
    console.log("NOT VALID : " + tc);
  }
});

Regex for parse name with one or more words after double number and before 2 or more spaces

Problem:
How can I create a regex to parse a name like "DISNAY LAND 2.0 GCP" from an Array of lines like this one in Scala:
DE1ALAT0002 32.4756 -86.4393 106.1 ZQ DISNAY LAND 2.0 GCP  23456
//For using in code:
val regex = """(?:[\d\.\d]){2}\s*(?:[\d.\d])\s*(ZQ)\s*([A-Z])""".r // my attempt
val getName = row match {
  case regex(name) => name
  case _ =>
}
What I know for sure:
1) the number of spaces between values varies
2) the value I need, "DISNAY LAND 2.0 GCP", comes after a double number and the letters "ZQ"
3) the words of the name are separated by a single space, and the name may consist of one or more words
4) the name ends with two or more spaces
Sorry if this is a duplicate, but after a long search I did not find the right solution.
Many thanks for your answers.

You may use an .unanchored pattern like
\d\.\d+\s+ZQ\s+(\S+(?:\s\S+)*)
Details
\d\.\d+ - 1 digit, . and then 1+ digits
\s+ - 1+ whitespaces
ZQ - ZQ substring
\s+ - 1+ whitespaces (here, the left-hand side context definition ends, now, starting to capture the value we need to return)
(\S+(?:\s\S+)*) - Capturing group 1:
\S+ - 1 or more non-whitespace chars
(?:\s\S+)* - a non-capturing group that matches 0 or more sequences of a single whitespace (\s) and then 1+ non-whitespace chars (so, up to the double whitespace or end of string).
Scala demo:
val regex = """\d\.\d+\s+ZQ\s+(\S+(?:\s\S+)*)""".r.unanchored
val row = "DE1ALAT0002 32.4756 -86.4393 106.1 ZQ DISNAY LAND 2.0 GCP  23456"
val getName = row match {
  case regex(name) => name
  case _ => "" // no match: return an empty String instead of Unit
}
print(getName)
Output: DISNAY LAND 2.0 GCP
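For comparison, the same pattern can be used from Python (a sketch, not part of the original answer). re.search looks for the match anywhere in the row, which corresponds to Scala's .unanchored. The sample row assumes, as the question states, that the name is followed by two or more spaces; get_name is a hypothetical helper:

```python
import re

# Same pattern as the Scala answer; re.search scans the whole row.
NAME_RE = re.compile(r'\d\.\d+\s+ZQ\s+(\S+(?:\s\S+)*)')

def get_name(row):
    """Hypothetical helper: return the name after '<double> ZQ', or None."""
    m = NAME_RE.search(row)
    return m.group(1) if m else None

# Note the two spaces before 23456: (?:\s\S+)* only crosses single spaces.
row = "DE1ALAT0002 32.4756 -86.4393 106.1 ZQ DISNAY LAND 2.0 GCP  23456"
print(get_name(row))  # => DISNAY LAND 2.0 GCP
```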

Parsing digits and decimals out of string with re

I have a string that looks like this:
'Home Cookie viewed item "yada_yada.mov" (22.4338.241384081)'
I need to parse the last set of numbers, the ones between the last period and the closing paren (in this case, 241384081) out of the string, keeping in mind that there may be one or more sets of parenthesis in the filename "yada_yada.mov."
So far I have this:
mo = re.match(r'.*([0-9])\)$', data1)
...where data1 is the string. But that is only returning the very last digit.
Any help, please?
Thanks!

You may use
(\d[\d.]*)\)$
Details
(\d[\d.]*) - Capturing group 1: a digit followed by 0 or more digits or . chars
\) - a )
$ - end of string.
In Python:
import re
s = 'Home Cookie viewed item "yada_yada.mov" (22.4338.241384081)'
m = re.search(r'(\d[\d.]*)\)$', s)
if m:
    print(m.group(1))  # => 22.4338.241384081
    # print(m.group(1).replace(".", ""))  # => 224338241384081
Alternative patterns:
(\d+(?:\.\d+)*)\)$ # To match digits and then 0 or more repetitions of . + digits
(\d+(?:\.\d+)*)\)\s*$ # To allow any 0+ trailing whitespaces
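The question asks only for the digits after the last period (241384081). One way to get them, sketched here with the answer's second alternative pattern and str.rsplit, is to capture the whole dotted number and keep its last segment. last_number is a hypothetical helper, and the second sample string, with parentheses in the filename, is made up for illustration:

```python
import re

# Digits, then 0+ repetitions of "." + digits, right before the final ")"
# (trailing whitespace allowed).
TRAILING_NUM = re.compile(r'(\d+(?:\.\d+)*)\)\s*$')

def last_number(s):
    """Hypothetical helper: return the digits after the last period inside the final (...)."""
    m = TRAILING_NUM.search(s)
    if not m:
        return None
    # "22.4338.241384081" -> "241384081"
    return m.group(1).rsplit('.', 1)[-1]

print(last_number('Home Cookie viewed item "yada_yada.mov" (22.4338.241384081)'))
print(last_number('Home Cookie viewed item "clip (2).mov" (22.4338.241384081)'))
```

The $ anchor keeps "(2)" in the filename from matching, since it is not at the end of the string.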

Find out if a string in Notepad++ contains a certain number using regex

I have several lines in Notepad++ that contain:
Modules = "3,40,40,40"
Modules = "3,40,40,40,40,40,40"
Modules = "3,15,15,15,15,15,15,15,15"
I want to use a regex to highlight or count only the lines that contain the number 40. How can I do that? This is what I have so far:
(Modules\s=\s.+)

If there are no float/double values between the quotes, use
Modules\s=\s"[^"]*\b40\b[^"]*"
Details
Modules - a substring
\s=\s - a = with one whitespace char on each side
" - a double quote
[^"]* - zero or more chars other than "
\b40\b - 40 as a whole word
[^"]* - zero or more chars other than "
" - a double quote
If 40 must be a complete comma-separated item (delimited by commas or by the quotes at the start/end of the list), use
Modules\s=\s"(?:[^"]*,)?40(?:,[^"]*)?"
The difference here is (?:[^"]*,)?40(?:,[^"]*)?: (?:[^"]*,)? matches an optional sequence of any 0+ chars other than a double quote, followed by a comma, and (?:,[^"]*)? matches an optional sequence of a comma followed by any 0+ chars other than a double quote.
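Outside Notepad++, the same check can be scripted. This sketch runs both patterns with Python's re (an assumption made only for illustration; both patterns use basic syntax that Python supports). The fourth sample line is hypothetical and contains a decimal, which is exactly where the two patterns disagree:

```python
import re

# First pattern: 40 as a whole word anywhere inside the quotes.
WORD_40 = re.compile(r'Modules\s=\s"[^"]*\b40\b[^"]*"')
# Second pattern: 40 as a complete comma-separated item.
ITEM_40 = re.compile(r'Modules\s=\s"(?:[^"]*,)?40(?:,[^"]*)?"')

lines = [
    'Modules = "3,40,40,40"',
    'Modules = "3,40,40,40,40,40,40"',
    'Modules = "3,15,15,15,15,15,15,15,15"',
    'Modules = "3,40.5"',  # hypothetical line with a decimal value
]

word_hits = [line for line in lines if WORD_40.search(line)]
item_hits = [line for line in lines if ITEM_40.search(line)]
# \b40\b also accepts "40.5", because "." is a word boundary.
print(len(word_hits), len(item_hits))
```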

Your question is actually a bit tricky, because you want to match the number 40 in any position in the CSV list of numbers. That is, all of the following three lines should match:
Modules = "40,50,4"
Modules = "30,40"
Modules = "10,40,50"
To handle this, and assuming the CSV lists contain no whitespace or decimals, we can use \b40\b to match 40 as a whole number. Consider this pattern:
Modules = ".*?\b40\b.*?"
