How to match the main subgroup in a regular expression?


Given this string:
"example( other(1), 123, [25]).othermethod(456)"
How can I capture only the arguments of the main functions:
"other(1), 123, [25]" and "456"
I am trying this:
http://regex101.com/r/cR0uS9/2
For an HTML example, given this:
<div>
<div>
<div>12</div>
<div>34</div>
</div>
</div>
<div>56</div>
I want to get:
<div>
<div>12</div>
<div>34</div>
</div>
and 56 as the second match.

Here's a pattern that doesn't use recursion:
\w+\s*\((?P<parameters>(?:(?:(?:[^()]*\([^()]*\))+|[^()]*)(?:,(?!\s*\))|(?=\))))*)\)
Caveats:
It does not support more than two levels of nested parentheses, e.g.
a(b(c()))
Strings containing ( or ) will trip it up, e.g.
a(")")
You'll find the parameters in the group called "parameters".
Demo.
Explanation:
\w+ # function name
\s* # white space
\(
(?P<parameters> # parameters:
(?:
# two possibilities: 1: a simple parameter, like "12", "'hello'", or "3*1+2"
# 2: the parameter contains braces.
# we'll try to consume pairs of braces. If that fails, we'll simply match a parameter.
(?:
(?: # match a pair of braces ()
[^()]*
\(
[^()]*
\)
)+ # consume as many pairs of braces as possible. Make sure there's at least one, though, because we can't go matching nothing.
|
[^()]* # since there are no more (pairs of) braces, simply consume the function's parameters.
)
# next, either consume a "," or assert there's a ")"
(?:
,
(?! # make sure there is another parameter after the comma
\s*
\)
)
|
(?=
\)
)
)
)*
)
\)
P.S.: I haven't managed to come up with an acceptable pattern for the HTML example yet.
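A quick way to try the non-recursive pattern is Python's re module, which accepts its (?P<parameters>...) named-group syntax. The sketch below runs the pattern unchanged (split over two string literals only for width) on the question's string. The only addition is strip(), because the captured text keeps the space that follows "example(" in the input; the caveats above still apply.

```python
import re

# The non-recursive pattern from this answer, unchanged apart from line splitting.
PATTERN = re.compile(
    r"\w+\s*\((?P<parameters>(?:(?:(?:[^()]*\([^()]*\))+|[^()]*)"
    r"(?:,(?!\s*\))|(?=\))))*)\)"
)

text = "example( other(1), 123, [25]).othermethod(456)"

for m in PATTERN.finditer(text):
    # strip() removes the leading space captured after "example("
    print(m.group("parameters").strip())
```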

This pattern uses recursion. Use it with a global find function (one that returns every match).
# '~(?is)(?:([a-z]\w*)\s*\(((?&core)|)\))(?(DEFINE)(?<core>(?>(?&content)|(?:[a-z]\w*\s*\(|\()(?:(?=.)(?&core)|)\))+)(?<content>(?>(?![a-z]\w*\s*\(|[()]).)+))~'
(?xis-)
(?:
( [a-z] \w* ) # (1), Start-Delimiter, Function
\s* \(
( # (2), CORE
(?&core)
|
)
\) # End-Delimiter, close paren
)
# ///////////////////////
# // Subroutines
# // ---------------
(?(DEFINE)
# core
(?<core>
(?>
(?&content)
|
(?: # Start-Delimiter
[a-z] \w* \s* \( # Function
| \( # Or, an open paren
)
(?:
(?= . )
(?&core) # Recurse core
|
)
\) # End-Delimiter, close paren
)+
)
# content
(?<content>
(?>
(?!
[a-z] \w* \s* \(
| [()]
)
.
)+
)
)
Output:
** Grp 0 - ( pos 0 , len 29 )
example( other(1), 123, [25])
** Grp 1 - ( pos 0 , len 7 )
example
** Grp 2 - ( pos 8 , len 20 )
other(1), 123, [25]
** Grp 3 - NULL
** Grp 4 - NULL
-----------------------
** Grp 0 - ( pos 30 , len 16 )
othermethod(456)
** Grp 1 - ( pos 30 , len 11 )
othermethod
** Grp 2 - ( pos 42 , len 3 )
456
** Grp 3 - NULL
** Grp 4 - NULL
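Python's built-in re module does not support (?&name) subroutine calls or (?(DEFINE)...), so the recursive pattern above will not compile there. As a swapped-in technique (not this answer's regex), here is a minimal sketch that finds each name( with re and then counts parenthesis depth by hand; call_arguments is a hypothetical helper name. Like the regex, it knows nothing about quoting, so a ")" inside a string literal will throw it off.

```python
import re

NAME_OPEN = re.compile(r"([A-Za-z]\w*)\s*\(")  # function name followed by "("


def call_arguments(text):
    """Return (name, arguments) for each top-level name(...) call in text."""
    results = []
    pos = 0
    while True:
        m = NAME_OPEN.search(text, pos)
        if not m:
            return results
        depth = 1
        i = m.end()
        # Walk forward until the "(" opened by this call is closed again.
        while i < len(text) and depth:
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
            i += 1
        if depth:  # ran off the end: unbalanced input, stop scanning
            return results
        results.append((m.group(1), text[m.end():i - 1]))
        pos = i  # continue after this call, so nested calls are not reported


print(call_arguments("example( other(1), 123, [25]).othermethod(456)"))
```

The argument strings match group 2 in the output above, including the leading space after "example(".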
For the HTML div example:
# '~(?s)(?:<div>((?&core)|)</div>)(?(DEFINE)(?<core>(?>(?&content)|<div>(?:(?=.)(?&core)|)</div>)+)(?<content>(?>(?!</?div>).)+))~'
(?xs-)
(?:
<div> # Start-Delimiter <div>
( # (1), CORE
(?&core)
|
)
</div> # End-Delimiter </div>
)
# ///////////////////////
# // Subroutines
# // ---------------
(?(DEFINE)
# core
(?<core>
(?>
(?&content)
|
<div> # Start-Delimiter <div>
(?:
(?= . )
(?&core) # Recurse core
|
)
</div> # End-Delimiter </div>
)+
)
# content
(?<content>
(?>
(?! </?div> )
.
)+
)
)
Output:
** Grp 0 - ( pos 0 , len 82 )
<div>
<div>
<div>12</div>
<div>34</div>
</div>
</div>
** Grp 1 - ( pos 5 , len 71 )
<div>
<div>12</div>
<div>34</div>
</div>
** Grp 2 - NULL
** Grp 3 - NULL
---------------------------
** Grp 0 - ( pos 84 , len 13 )
<div>56</div>
** Grp 1 - ( pos 89 , len 2 )
56
** Grp 2 - NULL
** Grp 3 - NULL
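The same depth-counting idea covers the div case when a recursive regex engine is not available (again, Python's stdlib re has none). A minimal sketch, assuming bare <div> and </div> tags with no attributes, as in the question's sample; outer_div_contents is a hypothetical name, and this is not a real HTML parser.

```python
import re


def outer_div_contents(html):
    """Return the inner text of each top-level <div>...</div> in html."""
    results = []
    depth = 0
    start = 0
    for tag in re.finditer(r"</?div>", html):
        if tag.group() == "<div>":
            if depth == 0:
                start = tag.end()  # content begins right after the outermost <div>
            depth += 1
        elif depth:  # a </div> that closes an open <div>
            depth -= 1
            if depth == 0:
                results.append(html[start:tag.start()])
    return results


SAMPLE = "<div>\n<div>\n<div>12</div>\n<div>34</div>\n</div>\n</div>\n<div>56</div>"
for content in outer_div_contents(SAMPLE):
    print(repr(content))
```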

Related

need return value for captured group from last captured string in perl

I have XML files from which I want to capture the init value (the <V> tag) for each parameter. I am copying part of the XML for reference.
I have the port name and the parameter name (from the PARAMETER-REF tag, e.g. MNO) available.
E.g. port name XYZ with parameter name MNO;
port name PQR with parameter names ABC and GHI.
There can be multiple parameters under one port (see PQR below).
<R-PORT-PROTOTYPE UUID="Oac11eff016c6bb667f357a89xOac11f0ad174240e817fa858f00">
<SHORT-NAME>XYZ</SHORT-NAME>
<REQUIRED-COM-SPECS>
<PARAMETER-REQUIRE-COM-SPEC>
<INIT-VALUE>
<APPLICATION-VALUE-SPECIFICATION>
<SHORT-LABEL>Init_Val</SHORT-LABEL>
<CATEGORY>VALUE</CATEGORY>
<SW-VALUE-CONT>
<SW-VALUES-PHYS>
<V>0.071</V>
</SW-VALUES-PHYS>
</SW-VALUE-CONT>
</APPLICATION-VALUE-SPECIFICATION>
</INIT-VALUE>
<PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">/SoftwareTypes/Interfaces/MNO</PARAMETER-REF>
</PARAMETER-REQUIRE-COM-SPEC>
</REQUIRED-COM-SPECS>
</R-PORT-PROTOTYPE>
<R-PORT-PROTOTYPE UUID="Oac11eff016c6bb667f357a89xOac11f0ad174240e817f8f55900">
<SHORT-NAME>PQR</SHORT-NAME>
<REQUIRED-COM-SPECS>
<PARAMETER-REQUIRE-COM-SPEC>
<INIT-VALUE>
<APPLICATION-VALUE-SPECIFICATION>
<SHORT-LABEL>Init_0</SHORT-LABEL>
<CATEGORY>VALUE</CATEGORY>
<SW-VALUE-CONT>
<SW-VALUES-PHYS>
<V>80</V>
</SW-VALUES-PHYS>
</SW-VALUE-CONT>
</APPLICATION-VALUE-SPECIFICATION>
</INIT-VALUE>
<PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">/SoftwareTypes/Interfaces/ABC</PARAMETER-REF>
</PARAMETER-REQUIRE-COM-SPEC>
<PARAMETER-REQUIRE-COM-SPEC>
<INIT-VALUE>
<APPLICATION-VALUE-SPECIFICATION>
<SHORT-LABEL>Int_ghi</SHORT-LABEL>
<CATEGORY>VALUE</CATEGORY>
<SW-VALUE-CONT>
<SW-VALUES-PHYS>
<V>-80</V>
</SW-VALUES-PHYS>
</SW-VALUE-CONT>
</APPLICATION-VALUE-SPECIFICATION>
</INIT-VALUE>
<PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">/SoftwareTypes/Interfaces/GHI</PARAMETER-REF>
</PARAMETER-REQUIRE-COM-SPEC>
</REQUIRED-COM-SPECS>
</R-PORT-PROTOTYPE>
Regex:
if ($test_string =~ /<R-PORT-PROTOTYPE.*?<short-name>Port_name<\/short-name>.*?<V>(.*?)<\/.*?<PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">.*?Parameter_name<\/PARAMETER-REF>/gis) {
    print $1;  # the pattern has only one capture group
}
I need the output 80 if the parameter is ABC, and -80 if the parameter is GHI.

I suggest using XML::LibXML.
Here I've combined two XPath queries to find V nodes:
SHORT-NAME is XYZ and PARAMETER-REF (with DEST == PARAMETER-DATA-PROTOTYPE) contains MNO.
SHORT-NAME is PQR and PARAMETER-REF (with DEST == PARAMETER-DATA-PROTOTYPE) contains ABC or GHI.
Example:
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
my $dom = XML::LibXML->load_xml(location => 'doc.xml');
my $query = q{
    //R-PORT-PROTOTYPE/SHORT-NAME[text()="XYZ"]/..
    //PARAMETER-REF[@DEST="PARAMETER-DATA-PROTOTYPE"][
        contains(text(),'MNO')
    ]/..//V
    |
    //R-PORT-PROTOTYPE/SHORT-NAME[text()="PQR"]/..
    //PARAMETER-REF[@DEST="PARAMETER-DATA-PROTOTYPE"][
        contains(text(),'ABC') or contains(text(),'GHI')
    ]/..//V
};
foreach my $vnode ($dom->findnodes($query)) {
    print $vnode->to_literal() . "\n";
}
Output:
0.071
80
-80
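If Perl is not a requirement, Python's standard library can do a similar lookup. A minimal sketch with xml.etree.ElementTree, assuming the real file has a single root element (the <ROOT> wrapper below is hypothetical) and using a trimmed copy of the PQR port from the question; init_values is a hypothetical helper. ElementTree implements only a subset of XPath, so the port and parameter filtering is done in Python rather than in one query.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's PQR port, wrapped in a hypothetical <ROOT>.
SAMPLE = """<ROOT>
<R-PORT-PROTOTYPE>
  <SHORT-NAME>PQR</SHORT-NAME>
  <REQUIRED-COM-SPECS>
    <PARAMETER-REQUIRE-COM-SPEC>
      <INIT-VALUE><V>80</V></INIT-VALUE>
      <PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">/SoftwareTypes/Interfaces/ABC</PARAMETER-REF>
    </PARAMETER-REQUIRE-COM-SPEC>
    <PARAMETER-REQUIRE-COM-SPEC>
      <INIT-VALUE><V>-80</V></INIT-VALUE>
      <PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">/SoftwareTypes/Interfaces/GHI</PARAMETER-REF>
    </PARAMETER-REQUIRE-COM-SPEC>
  </REQUIRED-COM-SPECS>
</R-PORT-PROTOTYPE>
</ROOT>"""


def init_values(root, port_name, param_names):
    """Map each wanted parameter name of one port to the text of its <V>."""
    values = {}
    for port in root.iter("R-PORT-PROTOTYPE"):
        if port.findtext("SHORT-NAME") != port_name:
            continue
        for spec in port.iter("PARAMETER-REQUIRE-COM-SPEC"):
            ref = spec.find("PARAMETER-REF[@DEST='PARAMETER-DATA-PROTOTYPE']")
            if ref is None or not ref.text:
                continue
            # Compare the last path segment exactly: ".../ABC" -> "ABC"
            name = ref.text.strip().rsplit("/", 1)[-1]
            if name in param_names:
                values[name] = spec.findtext(".//V")
    return values


root = ET.fromstring(SAMPLE)
print(init_values(root, "PQR", {"ABC", "GHI"}))
```

Comparing the last path segment exactly avoids one pitfall of contains(): contains(text(),'ABC') would also match a parameter named, say, ABCD.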

There are two ways to get either or both values:
1 - Linear https://regex101.com/r/NYbvI8/1
# https://regex101.com/r/NYbvI8/1
# if($test_string=~ /<R-PORT-PROTOTYPE.*?<short-name>PQR<\/short-name>(?:.*?<V>(.*?)<\/V>.*?<PARAMETER-REF[ ]DEST="PARAMETER-DATA-PROTOTYPE">.*?MNO1<\/PARAMETER-REF>)?(?:.*?<V>(.*?)<\/V>.*?<PARAMETER-REF[ ]DEST="PARAMETER-DATA-PROTOTYPE">.*?MNO2<\/PARAMETER-REF>)?(?(1)|(?(2)|(?!)))/gis)
<R-PORT-PROTOTYPE .*? <short-name>PQR</short-name>
(?:
.*?
<V>
( .*? ) # (1)
</V>
.*?
<PARAMETER-REF [ ] DEST="PARAMETER-DATA-PROTOTYPE"> .*? MNO1</PARAMETER-REF>
)?
(?:
.*?
<V>
( .*? ) # (2)
</V>
.*?
<PARAMETER-REF [ ] DEST="PARAMETER-DATA-PROTOTYPE"> .*? MNO2</PARAMETER-REF>
)?
(?(1)
| (?(2)
| (?!)
)
)
2 - Out of order https://regex101.com/r/gQJ3cO/1
# https://regex101.com/r/t4M9UB/1
# if($test_string=~ /<R-PORT-PROTOTYPE.*?<short-name>PQR<\/short-name>(?:(?:(?(1)(?!)).*?<V>(.*?)<\/V>.*?<PARAMETER-REF[ ]DEST="PARAMETER-DATA-PROTOTYPE">.*?MNO1<\/PARAMETER-REF>)|(?:(?(2)(?!)).*?<V>(.*?)<\/V>.*?<PARAMETER-REF[ ]DEST="PARAMETER-DATA-PROTOTYPE">.*?MNO2<\/PARAMETER-REF>)){1,2}/gis)
<R-PORT-PROTOTYPE .*? <short-name>PQR</short-name>
(?:
(?:
(?(1) (?!) )
.*?
<V>
( .*? ) # (1)
</V>
.*?
<PARAMETER-REF [ ] DEST="PARAMETER-DATA-PROTOTYPE"> .*? MNO1</PARAMETER-REF>
)
|
(?:
(?(2) (?!) )
.*?
<V>
( .*? ) # (2)
</V>
.*?
<PARAMETER-REF [ ] DEST="PARAMETER-DATA-PROTOTYPE"> .*? MNO2</PARAMETER-REF>
)
){1,2}
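Python's re module supports the (?(1)...) conditionals and the always-failing (?!) used by the linear form, so that version can be checked there. The sketch below assumes the question's parameter names ABC and GHI in place of the answer's MNO1/MNO2 placeholders and runs against a trimmed copy of the PQR port; re.IGNORECASE | re.DOTALL stand in for Perl's /i and /s.

```python
import re

# Linear form of the pattern, with MNO1/MNO2 replaced by ABC/GHI (assumption).
PATTERN = re.compile(
    r'<R-PORT-PROTOTYPE.*?<short-name>PQR</short-name>'
    r'(?:.*?<V>(.*?)</V>.*?<PARAMETER-REF[ ]DEST="PARAMETER-DATA-PROTOTYPE">.*?ABC</PARAMETER-REF>)?'
    r'(?:.*?<V>(.*?)</V>.*?<PARAMETER-REF[ ]DEST="PARAMETER-DATA-PROTOTYPE">.*?GHI</PARAMETER-REF>)?'
    # Succeed if group 1 matched, else if group 2 matched, else fail.
    r'(?(1)|(?(2)|(?!)))',
    re.IGNORECASE | re.DOTALL,
)

# Trimmed copy of the question's PQR port.
SAMPLE = """<R-PORT-PROTOTYPE>
  <SHORT-NAME>PQR</SHORT-NAME>
  <REQUIRED-COM-SPECS>
    <PARAMETER-REQUIRE-COM-SPEC>
      <INIT-VALUE><V>80</V></INIT-VALUE>
      <PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">/SoftwareTypes/Interfaces/ABC</PARAMETER-REF>
    </PARAMETER-REQUIRE-COM-SPEC>
    <PARAMETER-REQUIRE-COM-SPEC>
      <INIT-VALUE><V>-80</V></INIT-VALUE>
      <PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">/SoftwareTypes/Interfaces/GHI</PARAMETER-REF>
    </PARAMETER-REQUIRE-COM-SPEC>
  </REQUIRED-COM-SPECS>
</R-PORT-PROTOTYPE>"""

m = PATTERN.search(SAMPLE)
if m:
    print("ABC:", m.group(1), "GHI:", m.group(2))
```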

How to get specific error message to Data Validation using Regex with python 3

How do I get specific output like the examples below?
Example 1: if the user inputs Alberta, UN, I want the program to print "I'm sorry, UN is an invalid province abbreviation."
I would like the program to display an exact error related to the user's input, instead of a generic "I'm sorry, this is an error" message that doesn't tell the user what they got wrong.
I would really appreciate any help; I have been brainstorming on how to make this work.
# Import
import re
# Program interface
print("====== POSTAL CODE CHECKER PROGRAM ====== ")
print("""
Select from below city/Province and enter it
--------------------------------------------
Alberta, AB
British Columbia, BC
Manitoba, MB
New Brunswick, NB
Newfoundland, NL
Northwest Territories, NT
Nova Scotia, NS
Nunavut, NU
Ontario, ON
Prince Edward Island, PE
Quebec, QC
Saskatchewan, SK
Yukon, YT
""")
# User input 1
province_input = input("Please enter the city/province: ")
pattern = re.compile(r'[ABMNOPQSYabcdefhiklmnorstuvw]| [CBTSE]| [Idasln], [ABMNOPQSY]+[BCLTSUNEK]')
if pattern.match(province_input):
    print("You have successfully inputted {} the right province.".format(province_input))
elif not pattern.match(province_input):
    print("I'm sorry, {} is an invalid province abbreviation".format(province_input))
else:
    print("I'm sorry, your city or province is incorrectly formatted.")

I tried to generalize your question: the code checks whether the first part of the input is a valid city and the second part is a valid state abbreviation, where "valid" means it appears in the corresponding list of valid inputs.
The core of the code is the regex ([A-Z][A-Za-z ]+), ([A-Z]{2}), which captures two groups: the first contains the city and, after a comma and a space, the second contains the state abbreviation (which must consist of exactly two capital letters).
Note that there are 5 possible outputs, depending on which parts are valid.
import re

cities = ["Alberta", "British Columbia", "Manitoba"]
states = ["AB", "BC", "MB"]

province_input = input("Please enter the city/province: ")
regexp = r"([A-Z][A-Za-z ]+), ([A-Z]{2})"

m = re.match(regexp, province_input)
if m:
    city_input = m.group(1)
    state_input = m.group(2)
    if city_input not in cities and state_input not in states:
        print("Both '%s' and '%s' are invalid" % (city_input, state_input))
    elif city_input in cities:
        if state_input in states:
            print("Your input '%s, %s' was valid" % (city_input, state_input))
        else:
            print("'%s' is an invalid province abbreviation" % state_input)
    else:
        print("The city '%s' is invalid" % city_input)
else:
    print("Wrong input format")
I tried to make the code as clear as possible, but please do let me know if anything is unclear.
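The answer above accepts any listed city together with any listed abbreviation. A minimal sketch, assuming you also want the abbreviation to belong to that province, with the pairs copied from the question's menu; PROVINCES, ENTRY_RE and check_province are hypothetical names. Each failure path returns its own message, which gives the "UN is an invalid province abbreviation" style of feedback the question asks for.

```python
import re

# Province names and abbreviations copied from the menu the question prints.
PROVINCES = {
    "Alberta": "AB", "British Columbia": "BC", "Manitoba": "MB",
    "New Brunswick": "NB", "Newfoundland": "NL",
    "Northwest Territories": "NT", "Nova Scotia": "NS", "Nunavut": "NU",
    "Ontario": "ON", "Prince Edward Island": "PE", "Quebec": "QC",
    "Saskatchewan": "SK", "Yukon": "YT",
}

# "Name, XX": a capitalised name (spaces allowed inside it), a comma,
# then exactly two letters; surrounding whitespace is tolerated.
ENTRY_RE = re.compile(r"\s*([A-Z][A-Za-z ]*[A-Za-z])\s*,\s*([A-Za-z]{2})\s*")


def check_province(text):
    """Return a message saying which part of the entry is wrong, if any."""
    m = ENTRY_RE.fullmatch(text)
    if not m:
        return "I'm sorry, your city or province is incorrectly formatted."
    name, abbr = m.group(1), m.group(2).upper()
    if name not in PROVINCES:
        return "I'm sorry, %s is not a valid province." % name
    if abbr != PROVINCES[name]:
        return "I'm sorry, %s is an invalid province abbreviation." % abbr
    return "You have successfully entered %s, %s." % (name, abbr)


for entry in ["Alberta, UN", "Ontario, ON", "Albrta, AB", "Alberta AB"]:
    print(check_province(entry))
```

In the interactive program you would call print(check_province(province_input)) instead of the loop.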

I'm not a Python programmer, but I think if you import regex,
the third-party replacement for the re module, it will give you access to the
branch reset construct (?|...).
Using that, it's trivial to divide City and Province into 3 groups.
So just match with this regex:
[ \t]*(?|(Alberta)(?:[ \t]*,[ \t]*(?:(AB)|(\w+)))?|(British[ \t]+Columbia)(?:[ \t]*,[ \t]*(?:(BC)|(\w+)))?|(Manitoba)(?:[ \t]*,[ \t]*(?:(MB)|(\w+)))?|(New[ \t]+Brunswick)(?:[ \t]*,[ \t]*(?:(NB)|(\w+)))?|(Newfoundland)(?:[ \t]*,[ \t]*(?:(NL)|(\w+)))?|(Northwest[ \t]+Territories)(?:[ \t]*,[ \t]*(?:(NT)|(\w+)))?|(Nova[ \t]+Scotia)(?:[ \t]*,[ \t]*(?:(NS)|(\w+)))?|(Nunavut)(?:[ \t]*,[ \t]*(?:(NU)|(\w+)))?|(Ontario)(?:[ \t]*,[ \t]*(?:(ON)|(\w+)))?|(Prince[ \t]+Edward[ \t]+Island)(?:[ \t]*,[ \t]*(?:(PE)|(\w+)))?|(Quebec)(?:[ \t]*,[ \t]*(?:(QC)|(\w+)))?|(Saskatchewan)(?:[ \t]*,[ \t]*(?:(SK)|(\w+)))?|(Yukon)(?:[ \t]*,[ \t]*(?:(YT)|(\w+)))?|()(\w+)())
Check the groups in this order:
if NO match : Please enter 'City, Province' from the list
else if length $1 equals 0 : '$2' is not a valid City
else if length $3 > 0 : '$3' is not a valid Province
else if length $2 equals 0 : Please enter a Province
else : Thank you, your entry is valid '$1, $2'
Demo: https://regex101.com/r/MrlqEN/1
Expanded
[ \t]*
(?|
( Alberta ) # (1)
(?:
[ \t]* , [ \t]*
(?:
( AB ) # (2)
| ( \w+ ) # (3)
)
)?
| ( British [ \t]+ Columbia ) # (1)
(?:
[ \t]* , [ \t]*
(?:
( BC ) # (2)
| ( \w+ ) # (3)
)
)?
| ( Manitoba ) # (1)
(?:
[ \t]* , [ \t]*
(?:
( MB ) # (2)
| ( \w+ ) # (3)
)
)?
| ( New [ \t]+ Brunswick ) # (1)
(?:
[ \t]* , [ \t]*
(?:
( NB ) # (2)
| ( \w+ ) # (3)
)
)?
| ( Newfoundland ) # (1)
(?:
[ \t]* , [ \t]*
(?:
( NL ) # (2)
| ( \w+ ) # (3)
)
)?
| ( Northwest [ \t]+ Territories ) # (1)
(?:
[ \t]* , [ \t]*
(?:
( NT ) # (2)
| ( \w+ ) # (3)
)
)?
| ( Nova [ \t]+ Scotia ) # (1)
(?:
[ \t]* , [ \t]*
(?:
( NS ) # (2)
| ( \w+ ) # (3)
)
)?
| ( Nunavut ) # (1)
(?:
[ \t]* , [ \t]*
(?:
( NU ) # (2)
| ( \w+ ) # (3)
)
)?
| ( Ontario ) # (1)
(?:
[ \t]* , [ \t]*
(?:
( ON ) # (2)
| ( \w+ ) # (3)
)
)?
| ( Prince [ \t]+ Edward [ \t]+ Island ) # (1)
(?:
[ \t]* , [ \t]*
(?:
( PE ) # (2)
| ( \w+ ) # (3)
)
)?
| ( Quebec ) # (1)
(?:
[ \t]* , [ \t]*
(?:
( QC ) # (2)
| ( \w+ ) # (3)
)
)?
| ( Saskatchewan ) # (1)
(?:
[ \t]* , [ \t]*
(?:
( SK ) # (2)
| ( \w+ ) # (3)
)
)?
| ( Yukon ) # (1)
(?:
[ \t]* , [ \t]*
(?:
( YT ) # (2)
| ( \w+ ) # (3)
)
)?
| ( ) # (1)
( \w+ ) # (2)
( ) # (3)
)
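Python's built-in re module has no branch reset, but the same decision table can be sketched with it by giving every branch its own three groups and then picking out the branch that matched. A sketch, assuming the province list from the question; PROVINCE_BRANCHES and check_entry are hypothetical names, and the messages are the ones listed in the order above.

```python
import re

# (name pattern, abbreviation) pairs from the question's list; [ \t]+ between
# words mirrors the answer's regex.
PROVINCE_BRANCHES = [
    (r"Alberta", "AB"), (r"British[ \t]+Columbia", "BC"), (r"Manitoba", "MB"),
    (r"New[ \t]+Brunswick", "NB"), (r"Newfoundland", "NL"),
    (r"Northwest[ \t]+Territories", "NT"), (r"Nova[ \t]+Scotia", "NS"),
    (r"Nunavut", "NU"), (r"Ontario", "ON"),
    (r"Prince[ \t]+Edward[ \t]+Island", "PE"), (r"Quebec", "QC"),
    (r"Saskatchewan", "SK"), (r"Yukon", "YT"),
]

# Same branches as the answer, minus (?|...): branch k owns groups 3k+1..3k+3.
branches = [
    r"(%s)(?:[ \t]*,[ \t]*(?:(%s)|(\w+)))?" % pair for pair in PROVINCE_BRANCHES
]
branches.append(r"()(\w+)()")  # catch-all: unknown city lands in the middle group
PATTERN = re.compile(r"[ \t]*(?:%s)" % "|".join(branches))


def check_entry(text):
    """Apply the answer's group checks, in the answer's order."""
    m = PATTERN.match(text)
    if not m:
        return "Please enter 'City, Province' from the list"
    groups = m.groups()
    # The matching branch is the one whose first group is not None;
    # collapse its three groups into the answer's $1, $2, $3 ("" if unset).
    k = next(i for i in range(0, len(groups), 3) if groups[i] is not None)
    g1, g2, g3 = (g or "" for g in groups[k:k + 3])
    if not g1:
        return "'%s' is not a valid City" % g2
    if g3:
        return "'%s' is not a valid Province" % g3
    if not g2:
        return "Please enter a Province"
    return "Thank you, your entry is valid '%s, %s'" % (g1, g2)


for entry in ["Alberta, UN", "Alberta, AB", "Alberta", "Paris, FR"]:
    print(check_entry(entry))
```

Building the alternation from a list also avoids hand-writing the thirteen near-identical branches.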

How do named and unnamed PCRE capturing groups interact?

If the regular expression is e.g. ^(?<object>[\-\w]+)/([\-\w]+)$, will one invoke the second capturing group as $2 or as $1? In other words, are anonymous capturing groups absolutely or relatively numbered?

Use $2 to refer to the second capturing group: named groups still get numbers, so the unnamed group here is number 2. Note that I would not call it "anonymous"; "unnamed" suits better here.
See a sample regex demo.
See PCRE docs:
PCRE supports the use of named as well as numbered capturing parentheses. The names are just an additional way of identifying the parentheses, which still acquire numbers.
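Python's re module numbers named groups the same way, so it offers a quick check of the pattern from the question. Python spells a named group (?P<name>...); the sample input "user-profile/edit" is made up for illustration.

```python
import re

# The question's pattern in Python syntax: (?P<object>...) instead of (?<object>...).
PATTERN = re.compile(r"^(?P<object>[\-\w]+)/([\-\w]+)$")

m = PATTERN.match("user-profile/edit")
# The named group still takes number 1, so the unnamed group is number 2.
print(m.group(1), m.group("object"), m.group(2))
print(PATTERN.groups)  # total number of capturing groups, named ones included
```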

In PCRE, capture groups are numbered sequentially, left to right, in the order their opening parentheses appear.
Here is an example where the groups are annotated, indented and numbered (mixed with some conditionals).
# ==============================
# Variations of the same thing
# ==============================
1 ( a )?
2 ( b )?
3 ( c )?
c (?(1)
|
c (?(2)
|
c (?(3) | (*FAIL) )
)
)
# ==============================
4 (
5 ( a )?
6 ( b )?
7 ( c )?
4 )
c (?(2)
|
c (?(3)
|
c (?(4) | (*FAIL) )
)
)
# ==============================
8 (?<A> a )?
9 (?<B> b )?
10 (?<C> c )?
c (?(<A>)
|
c (?(<B>)
|
c (?(<C>) | (*FAIL) )
)
)
# ==============================
11 (?<M>
12 (?<A> a )?
13 (?<B> b )?
14 (?<C> c )?
c (?(<A>)
|
c (?(<B>)
|
c (?(<C>) | (*FAIL) )
)
)
11 )
# ==============================
A branch reset (?|...) numbers groups differently.
Each alternative inside it starts numbering at the same group number,
the next free number at the point where the branch reset starts.
Past the branch reset, numbering resumes at one more than the largest
number used by any single alternative.
Example:
# Super Branch with Conditionals
1 ( a ) # (1)
(?|
x
br 2 ( y ) # (2)
z
(?|
br 3 ( u ) # (3)
4 ( u ) # (4)
c (?(1)
5 ( R ) # (5)
| (?|
br 6 ( x ) # (6)
|
br 6 ( x ) # (6)
c (?(2)
a
|
7 ( b ) # (7)
)
8 ( c ) # (8)
)
)
9 ( u ) # (9)
10 ( u ) # (10)
|
br 3 ( e ) # (3)
4 ( e ) # (4)
5 ( e ) # (5)
|
br 3 ( c ) # (3)
)
11 ( K ) # (11)
|
br 2 ( # (2 start)
p
3 ( # (3 start)
q
(?|
br 4 ( M ) # (4)
5 ( M ) # (5)
6 ( M ) # (6)
7 ( M ) # (7)
(?|
br 8 ( T ) # (8)
9 ( T ) # (9)
10 ( T ) # (10)
|
br 8 ( D ) # (8)
9 ( D ) # (9)
)
12 ( R ) # (12)
13 ( R ) # (13)
|
br 4 ( B ) # (4)
5 ( B ) # (5)
6 ( B ) # (6)
|
br 4 ( v ) # (4)
)
3 ) # (3 end)
r
2 ) # (2 end)
14 ( o ) # (14)
15 ( i ) # (15)
|
br 2 ( t ) # (2)
s
3 ( w ) # (3)
)
16 ( Z ) # (16)
Addendum for .NET counting
There are 2 options for counting .NET captures:
1. Count named capture groups
2. Named groups last
Obviously, without 1 you don't get 2.
Example: Don't count named groups
1 ( # (1 start)
(?'overall'
^
(?= [^&] )
(?:
(?<scheme> [^:/?#]+ )
:
)?
(?:
//
2 ( ) # (2)
(?<authority> [^/?#]* )
)?
(?<path> [^?#]* )
(?:
\?
(?<query> [^#]* )
)?
3 ( ) # (3)
(?:
\#
(?<fragment> .* )
)?
)
1 ) # (1 end)
Example: Count named groups
1 ( # (1 start)
2 (?'overall' # (2 start)
^
(?= [^&] )
(?:
3 (?<scheme> [^:/?#]+ ) # (3)
:
)?
(?:
//
4 ( ) # (4)
5 (?<authority> [^/?#]* ) # (5)
)?
6 (?<path> [^?#]* ) # (6)
(?:
\?
7 (?<query> [^#]* ) # (7)
)?
8 ( ) # (8)
(?:
\#
9 (?<fragment> .* ) # (9)
)?
2 ) # (2 end)
1 ) # (1 end)
Example: Count named groups, and Named groups last
1 ( # (1 start)
4 (?'overall' #_(4 start)
^
(?= [^&] )
(?:
5 (?<scheme> [^:/?#]+ ) #_(5)
:
)?
(?:
//
2 ( ) # (2)
6 (?<authority> [^/?#]* ) #_(6)
)?
7 (?<path> [^?#]* ) #_(7)
(?:
\?
8 (?<query> [^#]* ) #_(8)
)?
3 ( ) # (3)
(?:
\#
9 (?<fragment> .* ) #_(9)
)?
4 ) #_(4 end)
1 ) # (1 end)

Perl nested parentheses expression

How do I use a Perl regex to extract the contents within the outermost parentheses?
text = (-(A + (B - C)))
output = -(A + (B - C))
Thanks

It can be done with this (\(((?:[^()]++|(?1))*)\)) and there are several
ways to do it.
Formatted and tested:
( # (1 start), Recursion code group
\( # Opening (
( # (2 start), Capture, inner core
(?: # Cluster group
[^()]++ # Possessive, not parentheses
| # or,
(?1) # Recurse to group 1
)* # End cluster, do 0 to many times
) # (2 end)
\) # Closing )
) # (1 end)
Output
** Grp 0 - ( pos 4 , len 16 )
(-(A + (B - C)))
** Grp 1 - ( pos 4 , len 16 )
(-(A + (B - C)))
** Grp 2 - ( pos 5 , len 14 )
-(A + (B - C))

I don't see that anything more than this is required:
use strict;
use warnings 'all';
my $text = "(-(A + (B - C)))";
my ($result) = $text =~ / \( (.*) \) /x;
print $result, "\n";
output
-(A + (B - C))
The pattern captures everything from after the first opening parenthesis to before the last closing parenthesis. From your question, I don't think there's a need to check that the string is balanced.
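The difference between the two answers shows up as soon as the text holds more than one top-level group. A small Python sketch comparing them: greedy_inner mirrors the second answer's / \( (.*) \) /x, and balanced_inner stands in for the recursive pattern by counting depth, because Python's re has no (?1) recursion. Both function names are hypothetical, and balanced_inner assumes the parentheses are balanced.

```python
import re


def greedy_inner(text):
    # Same idea as / \( (.*) \) /x: from the first "(" to the last ")".
    m = re.search(r"\((.*)\)", text)
    return m.group(1) if m else None


def balanced_inner(text):
    # Contents of the group opened by the first "(", found by counting depth.
    start = text.find("(")
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return text[start + 1:i]
    return None  # no matching ")"


for sample in ["(-(A + (B - C)))", "(a)(b)"]:
    print(greedy_inner(sample), "|", balanced_inner(sample))
```

Both agree on the question's string; on "(a)(b)" the greedy version spans both groups.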

Powershell parsing regex in text file

I have a text file:
Text.txt
2015-08-31 05:55:54,881 INFO (ClientThread.java:173) - Login successful for user = Test, client = 123.456.789.100:12345
2015-08-31 05:56:51,354 INFO (ClientThread.java:325) - Closing connection 123.456.789.100:12345
I would like output to be:
2015-08-31 05:55:54 Login Test 123.456.789.100
2015-08-31 05:56:51 Closing connection 123.456.789.100
Code:
$files = Get-Content "Text.txt"
$grep = $files | Select-String "serviceClient:" , "Unregistered" |
Where {$_ -match '^(\S+)+\s+([^,]+).*?-\s+(\w+).*?(\S+)$' } |
Foreach {"$($matches[1..4])"} | Write-Host
How can I do it with the current code?

Add another group and make it optional.
^(\S+)+\s+([^,]+).*?-\s+(\w+)(?:.*?=\s+(\w+))?.*?(\S+?)(?::\d+)?$
^
( \S+ )+ # (1)
\s+
( [^,]+ ) # (2)
.*? - \s+
( \w+ ) # (3)
(?:
.*? = \s+
( \w+ ) # (4)
)?
.*?
( \S+? ) # (5)
(?: : \d+ )?
$
Output:
** Grp 0 - ( pos 0 , len 121 )
2015-08-31 05:55:54,881 INFO (ClientThread.java:173) - Login successful for user = Test, client = 123.456.789.100:12345
** Grp 1 - ( pos 0 , len 10 )
2015-08-31
** Grp 2 - ( pos 11 , len 8 )
05:55:54
** Grp 3 - ( pos 57 , len 5 )
Login
** Grp 4 - ( pos 85 , len 4 )
Test
** Grp 5 - ( pos 100 , len 15 )
123.456.789.100
----------------
** Grp 0 - ( pos 123 , len 97 )
2015-08-31 05:56:51,354 INFO (ClientThread.java:325) - Closing connection 123.456.789.100:12345
** Grp 1 - ( pos 123 , len 10 )
2015-08-31
** Grp 2 - ( pos 134 , len 8 )
05:56:51
** Grp 3 - ( pos 180 , len 7 )
Closing
** Grp 4 - NULL
** Grp 5 - ( pos 199 , len 15 )
123.456.789.100
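To see what the pattern captures outside PowerShell, here is a Python re sketch that applies it unchanged to the two sample lines and joins the groups that matched with spaces. Note that group 3 is (\w+), so the second line yields "Closing" rather than the "Closing connection" the question asks for.

```python
import re

# The answer's pattern, used unchanged.
PATTERN = re.compile(
    r"^(\S+)+\s+([^,]+).*?-\s+(\w+)(?:.*?=\s+(\w+))?.*?(\S+?)(?::\d+)?$"
)

LINES = [
    "2015-08-31 05:55:54,881 INFO (ClientThread.java:173) - Login successful for user = Test, client = 123.456.789.100:12345",
    "2015-08-31 05:56:51,354 INFO (ClientThread.java:325) - Closing connection 123.456.789.100:12345",
]

for line in LINES:
    m = PATTERN.match(line)
    if m:
        # Group 4 (the user name) is None on the "Closing connection" line,
        # so keep only the groups that actually matched.
        print(" ".join(g for g in m.groups() if g is not None))
```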
