Adding some regex to an existing regular expression pattern - regex

I am not good at regex and need to update the following pattern without impacting the other patterns. Any suggestions? The line starts with 1 to 4 $ signs, and the $ sign is always at the beginning of the line (there may or may not be a space before it).
import re
data = " $$$AKL_M0_90_2K: Two line end vias (VIAG, VIAT and/or"
patt = r'^ (?:ABC *)?([A-Za-z0-9/\._\:]+)\s*: ? '
match = re.findall(patt, data, re.M)
print(match)
Note: data is a multi-line string.
The match should contain this result: "$$$AKL_M0_90_2K"

I suggest the following solution (see IDEONE demo):
import re
data = r" $$$AKL_M0_90_2K: Two line end vias (VIAG, VIAT and/or"
patt = r'^\s*([$]{1,4}[^:]+)'
match = re.findall( patt, data, re.M )
print(match)
re.findall returns a list with just one match. The ^\s*([$]{1,4}[^:]+) regex matches:
^ - start of a line (you use re.M)
\s* - zero or more whitespaces
([$]{1,4}[^:]+) - Group 1 capturing 1 to 4 $ symbols, and then one or more characters other than :.
See the regex demo
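Since the question notes that data is a multi-line string, here is a small Python 3 sketch that runs the suggested pattern over several lines. Only the first sample line comes from the question; the other two are invented to show one matching and one non-matching line:

```python
import re

# Hypothetical multi-line input: line 1 is from the question, lines 2-3 are made up.
data = (
    " $$$AKL_M0_90_2K: Two line end vias (VIAG, VIAT and/or\n"
    "$TOP_LAYER: another rule\n"
    "ABC M1_SPACE: no dollar prefix here"
)

patt = r'^\s*([$]{1,4}[^:]+)'
# re.M makes ^ match at the start of every line, not only at the start of data.
matches = re.findall(patt, data, re.M)
print(matches)  # ['$$$AKL_M0_90_2K', '$TOP_LAYER']
```

The third line is skipped because its first non-space character is not $.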
If you need to keep your own regex, just do one of the following:
Add $ to the character class (demo): ^ (?:ABC *)?([$A-Za-z0-9/._:]+)\s*: ?
Add an alternative to the first non-capturing group and place it at the start of the capturing one (demo): ^ ((?:ABC *|[$]{1,4})?[A-Za-z0-9/._:]+)\s*: ?
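A quick Python 3 check of both modified versions of the original pattern against the sample line from the question (the variable names and the extra " ABC FOO: x" input are only for illustration):

```python
import re

data = " $$$AKL_M0_90_2K: Two line end vias (VIAG, VIAT and/or"

# Variant 1: "$" added to the character class, so "$" may appear anywhere in the name.
patt_class = r'^ (?:ABC *)?([$A-Za-z0-9/._:]+)\s*: ?'
# Variant 2: 1-4 "$" signs as an alternative prefix, now inside the capture group.
patt_alt = r'^ ((?:ABC *|[$]{1,4})?[A-Za-z0-9/._:]+)\s*: ?'

print(re.findall(patt_class, data, re.M))  # ['$$$AKL_M0_90_2K']
print(re.findall(patt_alt, data, re.M))    # ['$$$AKL_M0_90_2K']

# Side effect of variant 2: an "ABC " prefix is now captured as well.
print(re.findall(patt_class, " ABC FOO: x", re.M))  # ['FOO']
print(re.findall(patt_alt, " ABC FOO: x", re.M))    # ['ABC FOO']
```

If the ABC prefix must stay outside the captured text, as in the original pattern, the first variant keeps that behavior.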

Related

Replace N spaces at the beginning of a line with N characters

I am looking for a regex substitution to transform N white spaces at the beginning of a line into N &nbsp;. So this text:
list:
  - first
should become:
list:
&nbsp;&nbsp;- first
I have tried:
str = "list:\n  - first"
str.gsub(/(?<=^) */, "&nbsp;")
which returns:
list:
&nbsp;- first
which is missing one &nbsp;. How can I improve the substitution to get the desired output?
You could make use of the \G anchor and \K to reset the starting point of the reported match.
To match all leading single spaces:
/(?:\R\K|\G) /
(?: Non capture group
\R\K Match a newline and clear the match buffer
| Or
\G Assert the position at the end of the previous match
) Close non capture group and match a space
See a regex demo and a Ruby demo.
To match only the single leading spaces in the example string:
/(?:^.*:\R|\G)\K /
In parts, the pattern matches:
(?: Non capture group
^.*:\R Match a line that ends with : and match a newline
| Or
\G Assert the position at the end of the previous match, or at the start of the string
) Close non capture group
\K Forget what is matched so far and match a space
See a regex demo and a Ruby demo.
Example
re = /(?:^.*:\R|\G)\K /
str = 'list:
  - first'
result = str.gsub(re, '&nbsp;')
puts result
Output
list:
&nbsp;&nbsp;- first
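Python's built-in re module does not support \G or \K, so the Ruby pattern above cannot be reused as is. A rough Python 3 equivalent of the "only the line after a line ending with :" variant captures the header line and the indentation separately and rebuilds them in a replacement function. The second indented line in the sample is invented to show which lines are left alone:

```python
import re

text = "list:\n  - first\n  - second"

# Group 1: a whole line ending in ":" plus its newline.
# Group 2: the run of spaces at the start of the following line.
pattern = re.compile(r'^(.*:\n)( +)', re.M)

def to_nbsp(m):
    # Keep the header line unchanged and emit one "&nbsp;" per leading space.
    return m.group(1) + '&nbsp;' * len(m.group(2))

result = pattern.sub(to_nbsp, text)
print(result)
```

Only "  - first" is converted; "  - second" keeps its spaces because the line before it does not end with ":".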
I would write
"list:\n  - first".gsub(/^ +/) { |s| '&nbsp;' * s.size }
#=> "list:\n&nbsp;&nbsp;- first"
See String#*
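The same idea carries over to Python 3, where re.sub accepts a function as the replacement. This is a translation sketch, not code from the answer:

```python
import re

text = "list:\n  - first"

# ^ +  : a run of spaces at the start of a line (re.M makes ^ work per line).
# The lambda returns one "&nbsp;" for every space in the matched run.
result = re.sub(r'^ +', lambda m: '&nbsp;' * len(m.group()), text, flags=re.M)
print(result)
```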
Use gsub with a callback function:
str = "list:\n  - first"
output = str.gsub(/(?<=^|\n)[ ]+/) { |m| m.gsub(" ", "&nbsp;") }
puts output
This prints:
list:
&nbsp;&nbsp;- first
The pattern (?<=^|\n)[ ]+ matches one or more spaces at the start of a line. Each match is then passed to the block, which replaces each space, one at a time, with &nbsp;.
You can use a short /(?:\G|^) / regex with a plain text replacement pattern:
result = text.gsub(/(?:\G|^) /, '&nbsp;')
See the regex demo. Details:
(?:\G|^) - start of a line or string or the end of the previous match
A literal space after the group - matches one space character.
See a Ruby demo:
str = "list:\n  - first"
result = str.gsub(/(?:\G|^) /, '&nbsp;')
puts result
# =>
# list:
# &nbsp;&nbsp;- first
If you need to match any whitespace, replace the literal space in the pattern with \s. Or use \h if you only need to match horizontal whitespace.

Regex to remove trailing optional garbage

I want to clean strings that may contain garbage at the end. The garbage is always separated by a forward slash /, and if there is no garbage, there is no separator.
Example > expected output
Foo/Bar > Foo
Foobar > Foobar
I tried several versions like this one to extract only the payload, but none of them worked:
(.*)\/.*
(.*)?\/.*
(.*)?\/*.*
And so on. The problem is that I only ever get the first or the second line to match.
What would be the correct expression to extract the wanted information?
Your first and second patterns require a / to be present (and because .* is greedy, the group captures everything before the last /), so they will not match a line without a /, such as Foobar.
The third pattern matches the whole line because \/* matches zero or more forward slashes, so it can match none at all. The greedy capture group therefore matches the whole line, and the trailing .* has nothing left to match.
You could write the pattern with a capture group for 1 or more word characters as the first part, and an optional second part that matches from / to the end of the string.
In the replacement you can use the first capture group.
^(\w+)(?:\/.*)?$
^ Start of string
(\w+) Capture 1+ word characters in group 1
(?:\/.*)? Optionally match / and the rest of the line (this part is dropped by the replacement)
$ End of string
See a regex demo.
There is no language listed, but an example using JavaScript:
const regex = /^(\w+)(?:\/.*)?$/gm; // g: replace on every line, not just the first match
const str = `Foo/Bar
Foobar`;
const result = str.replace(regex, "$1");
console.log(result);
Example using Python
import re
regex = r"^(\w+)(?:\/.*)?$"
test_str = ("Foo/Bar\n"
"Foobar")
result = re.sub(regex, r'\1', test_str, flags=re.MULTILINE)
if result:
    print(result)
Output
Foo
Foobar
Python demo
You can use replace here as:
const cleanString = (str) => str.replace(/\/.*/, "");
console.log(cleanString("Foo/Bar"));
console.log(cleanString("Foobar"));
This task doesn't need the power of regex; you only need to split on the first slash, e.g. in Python:
test_string.split('/', 1)[0]
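Applied to the examples from the question, plus an invented Foo/Bar/Baz input to show that only the first slash matters, the split approach looks like this in Python 3 (clean is a hypothetical helper name):

```python
def clean(s: str) -> str:
    # maxsplit=1: split at most once, at the first "/", and keep the part before it.
    # Without a "/", split returns [s], so the string comes back unchanged.
    return s.split('/', 1)[0]

for s in ["Foo/Bar", "Foobar", "Foo/Bar/Baz"]:
    print(s, ">", clean(s))
```

s.partition('/')[0] gives the same result and also stops at the first slash.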
I think the reason your regex doesn't work is that Foobar has no / to match on. So for regex you need to handle none, one, or many slashes. Again, in Python:
>>> import re
>>> test = ['foobar', 'foo/bar', 'foo/bar/baz']
>>> for s in test:
...     print(re.findall('^(.*?)(?=/|$)', s))
...
['foobar']
['foo']
['foo']
The regex says: from the start of the string, group all characters (non-greedy) until either a slash or the end of the string.
You can try doing a re.split on / and selecting the first element of the resulting list. For example, in Python:
import re
string = "Foo/Bar"
new_string = re.split('/', string)[0]

Regex match the unknown characters with dash between

I'm struggling with the following combination of characters that I'm trying to parse:
I have two types of text:
1. AF-B-W23F4-USLAMC-X99-JLK
2. LS-V-A23DF-SDLL--X22-LSM
I want to get the last two groups of characters, separated by a dash.
From the first, X99-JLK, and from the second, X22-LSM.
I accomplished the second with the following regex: '--(.*-.*)'
How can I parse the first sample, and is there a way to parse both at once with something like an OR operator?
Thanks for any help!
The pattern --(.*-.*) that you tried matches the second example because it contains -- and it matches the first occurrence.
Then it matches until the end of the string and backtracks to find another hyphen.
As .* can match any character (also -) and there are no anchors or boundaries set, this is a very broad match.
If there have to be 2 dashes, you can match the first one and use a capture group for the part containing the second one, using a negated character class [^-].
The character class can also match a newline. If you don't want to match a newline you can use [^-\r\n] or also not matching spaces [^-\s] (as there are none in the example data)
-([^-]+-[^-]+)$
Explanation
- Match -
( Capture group 1
[^-]+-[^-]+ Match the second dash between chars other than -
) Close group 1
$ End of string
See a regex demo
For example using Javascript:
const regex = /-([^-]+-[^-]+)$/;
[
"AF-B-W23F4-USLAMC-X99-JLK",
"LS-V-A23DF-SDLL--X22-LSM"
].forEach(s => {
const m = s.match(regex);
if (m) {
console.log(m[1]);
}
})
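For comparison, a Python 3 sketch of the same pattern, first on each sample string and then on both samples joined into one multi-line string. In the multi-line case \n is excluded from the character class (as the answer suggests with [^-\r\n]) and re.M lets $ match at the end of each line:

```python
import re

samples = [
    "AF-B-W23F4-USLAMC-X99-JLK",
    "LS-V-A23DF-SDLL--X22-LSM",
]

# A dash, then group 1: a dash-free run, a dash, and another dash-free run
# reaching the end of the string, i.e. the last two dash-separated parts.
pattern = re.compile(r'-([^-]+-[^-]+)$')

last_parts = []
for s in samples:
    m = pattern.search(s)
    if m:
        last_parts.append(m.group(1))
print(last_parts)  # ['X99-JLK', 'X22-LSM']

# Multi-line input: exclude "\n" so a match cannot run across lines,
# and use re.M so "$" matches at the end of every line.
text = "\n".join(samples)
per_line = re.findall(r'-([^-\n]+-[^-\n]+)$', text, re.M)
print(per_line)  # ['X99-JLK', 'X22-LSM']
```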
You can try a lookahead to match the last pair before a newline. This relies on every line, including the last one, ending with a newline. JavaScript example:
const str = `
AF-B-W23F4-USLAMC-X99-JLK
LS-V-A23DF-SDLL--X22-LSM
`;
const re = /[^-]*-[^-]*(?=\n)/g;
console.log(str.match(re));

Python regex match across multiple lines

I am trying to match a regex pattern across multiple lines. The pattern begins and ends with a substring, both of which must be at the beginning of a line. I can match across lines, but I can't seem to specify that the end pattern must also be at the beginning of a line.
Example string:
Example=N ; Comment Line One error=
; Comment Line Two.
Desired=
I am trying to match from Example= up to Desired=. This works if error= is not in the string. However, when it is present, the match stops early: Example=N ; Comment Line One error=
config_value = 'Example'
pattern = '^{}=(.*?)([A-Za-z]=)'.format(config_value)
match = re.search(pattern, string, re.M | re.DOTALL)
I also tried:
config_value = 'Example'
pattern = '^{}=(.*?)(^[A-Za-z]=)'.format(config_value)
match = re.search(pattern, string, re.M | re.DOTALL)
You may use
import re

s = "Example=N ; Comment Line One error=\n; Comment Line Two.\nDesired="
config_value = 'Example'
pattern = r'(?sm)^{}=(.*?)(?=[\r\n]+\w+=|\Z)'.format(config_value)
match = re.search(pattern, s)
if match:
    print(match.group(1))
See the Python demo.
Pattern details
(?sm) - re.DOTALL and re.M are on
^ - start of a line
Example= - a substring
(.*?) - Group 1: any 0+ chars, as few as possible
(?=[\r\n]+\w+=|\Z) - a positive lookahead that requires the presence of 1+ CR or LF symbols followed with 1 or more word chars followed with a = sign, or end of the string (\Z).
See the regex demo.
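If the lookup is needed for several keys, the answer's pattern can be wrapped in a small function. This is only a sketch: the name get_value, the sample text, and the use of re.escape (so a key containing regex metacharacters is matched literally) are additions for illustration:

```python
import re

def get_value(text, key):
    # re.escape(key) makes characters such as "." or "+" in the key literal.
    # (?sm): DOTALL lets .*? cross newlines; MULTILINE makes ^ match at line starts.
    # The lazy group stops before a newline run followed by "word=", or at \Z.
    pattern = r'(?sm)^{}=(.*?)(?=[\r\n]+\w+=|\Z)'.format(re.escape(key))
    match = re.search(pattern, text)
    return match.group(1) if match else None

sample = "Example=N ; Comment Line One error=\n; Comment Line Two.\nDesired=yes"
print(repr(get_value(sample, "Example")))
print(repr(get_value(sample, "Desired")))
print(repr(get_value(sample, "Missing")))
```

The Example value keeps its embedded newline and the error= text, because error= is not at the start of a line.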

Select everything before & or everything if there is no &

I want to use regex to split some text.
My text:
Hello&World
Hello
0011&World
0011
Using (.*)(\&.*) only matches 'Hello&World' and '0011&World', and (.*)(\&.*)? ignores the last part.
For the first 2 I want to get 'Hello' and the last 2 I want to get '0011'
Thank you
It seems you need to fetch 0+ chars other than & at the beginning of a string.
Use the following regex:
^[^&]*
See the regex demo.
Details:
^ - start of string
[^&]* - a negated character class matching zero or more (*) chars other than & (to match 1 or more replace * with +).
See the Python demo:
import re
ss = ['Hello&World','Hello','0011&World','0011']
for s in ss:
    print(re.match('[^&]*', s).group())
    # print(re.search('^[^&]*', s).group())
Note that re.match looks for a match only at the start of the string, thus making ^ redundant in the pattern.
Else, if you use re.search, the ^ anchor is necessary to anchor the search at the start of the string.
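The answer applies the pattern to one string at a time. If all four values arrive as one multi-line string, a small variation is needed, sketched here in Python 3: re.M makes ^ match at the start of every line, and \n is added to the negated class because [^&] would otherwise run on into the next line when a line has no &:

```python
import re

text = "Hello&World\nHello\n0011&World\n0011"

# ^ with re.M: start of each line; [^&\n]*: everything up to "&" or the line end.
firsts = re.findall(r'^[^&\n]*', text, re.M)
print(firsts)  # ['Hello', 'Hello', '0011', '0011']
```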