Regex to extract switches /switch=value

I have a batch file that I need to extract switches from.
The switches are in this format.
/Switch1=Value1 /Switch2="Value 2" /Switch3 /Switch4="C:\Program Files\DIR"
I need to extract Switch=Value, or just Switch when it has no value (e.g. Switch3).
I am a beginner at regex. So far I have tried the expression \/\w+=|\/\w+ but that doesn't give me the value.

It seems you want this:
\/\w+(?:=(?:(["'])(?:(?!\1).)*\1|\S+))?
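As a rough illustration, here is how the pattern above could be applied in Python (the question does not say which language will run the regex, so Python and the helper name extract_switches are assumptions made for this sketch). It collects every match and drops the leading slash.

```python
import re

# Pattern from the answer: /name, optionally followed by =value, where the
# value is either quoted (single or double quotes) or a run of non-whitespace.
SWITCH = re.compile(r'''/\w+(?:=(?:(["'])(?:(?!\1).)*\1|\S+))?''')


def extract_switches(command_line):
    """Return each switch as 'Name=Value' or 'Name', without the leading slash.

    extract_switches is a hypothetical helper name used only for this sketch.
    """
    return [m.group(0)[1:] for m in SWITCH.finditer(command_line)]


line = r'/Switch1=Value1 /Switch2="Value 2" /Switch3 /Switch4="C:\Program Files\DIR"'
for switch in extract_switches(line):
    print(switch)
```

Quoted values keep their quotes here, because the whole match is returned rather than a capture group.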

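There is not much information to go on, but here is something in Perl to get you going. It reads the file (or stdin) without modifying it and prints every switch on each line:
perl -ne 'while (/\/(\w+)(?:=("[^"]*"|\S+))?/g) { print defined $2 ? "$1=$2\n" : "$1\n" }'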

Use a lookbehind for "Switch.=" and a lookahead for the next slash. You will have to trim the matched values afterwards, but you get the values:
(?<=Switch.=).+(?=/)

It can get hairy to parse a command line with switches.
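Something like the below, which uses a branch reset group (?|...) (Perl/PCRE syntax) so that the value lands in group 2 whether or not it is quoted.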
# /([^ =]+)(?:=(?|"((?:[^"\\]*(?:\\.|[^"\\]*)*))"|([^ ]*)))?
/
( [^ =]+ ) # (1)
(?:
=
(?|
"
( # (2 start)
(?:
[^"\\]*
(?:
\\ .
|
[^"\\]*
)*
)
) # (2 end)
"
|
( [^ ]* ) # (2)
)
)?
Output
** Grp 0 - ( pos 0 , len 15 )
/Switch1=Value1
** Grp 1 - ( pos 1 , len 7 )
Switch1
** Grp 2 - ( pos 9 , len 6 )
Value1
-------------------
** Grp 0 - ( pos 16 , len 18 )
/Switch2="Value 2"
** Grp 1 - ( pos 17 , len 7 )
Switch2
** Grp 2 - ( pos 26 , len 7 )
Value 2
-------------------
** Grp 0 - ( pos 35 , len 8 )
/Switch3
** Grp 1 - ( pos 36 , len 7 )
Switch3
** Grp 2 - NULL
-------------------
** Grp 0 - ( pos 44 , len 31 )
/Switch4="C:\Program Files\DIR"
** Grp 1 - ( pos 45 , len 7 )
Switch4
** Grp 2 - ( pos 54 , len 20 )
C:\Program Files\DIR
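Python's re module has no branch reset group, so the regex above cannot be used there verbatim. A minimal sketch of the same idea, assuming Python and a hypothetical helper parse_switches, uses two separate value groups (and a simplified quoted-value body) and picks whichever one matched:

```python
import re

# Group 1: switch name.
# Group 2: quoted value, with the surrounding quotes dropped.
# Group 3: unquoted value.
# At most one of groups 2 and 3 takes part in a match.
SWITCH = re.compile(r'/([^ =]+)(?:=(?:"((?:[^"\\]|\\.)*)"|([^ ]*)))?')


def parse_switches(command_line):
    """Map each switch name to its value, or to None when it has no value.

    parse_switches is a hypothetical helper name used only for this sketch.
    """
    switches = {}
    for m in SWITCH.finditer(command_line):
        name, quoted, bare = m.groups()
        switches[name] = quoted if quoted is not None else bare
    return switches


line = r'/Switch1=Value1 /Switch2="Value 2" /Switch3 /Switch4="C:\Program Files\DIR"'
for name, value in parse_switches(line).items():
    print(name, value)
```

A backslash inside a quoted value is consumed together with the character after it, so a quoted path that ends in a backslash right before the closing quote would not parse with this sketch.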


Iterate through captures with boost::regex

I have a regular expression that captures three fields in an HTML tag using boost::regex:
"\\/\\/(.{1,3}?)\\.wikipedia\\.[a-z]+\\/wiki\\/(.*?)\\s*>(.*?)<"
So, from
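<a href="//de.wikipedia.org/wiki/Porky%E2%80%99s" title="Porky’s – German" lang="de" hreflang="de">Deutsch</a>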
I get
de
Porky%E2%80%99s" title="Porky’s – German" lang="de" hreflang="de"
Deutsch
But I'd like to have {de, Porky%E2%80%99s, Deutsch} instead.
How can I make my regex stop matching the second field as soon as it finds the first whitespace?
I tried
"\\/\\/(.{1,3}?)\\.wikipedia\\.[a-z]+\\/wiki\\/(\\S*?)*>(.*?)<"
so that the second field matches everything but whitespace, but I get this crash report:
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >'
what(): Ran out of stack space trying to match the regular expression.
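The likely culprit is (\\S*?)*: a quantifier wrapped around a group that can itself match the empty string invites runaway backtracking. This might work -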
"//(.{1,3}?)\\.wikipedia\\.[a-z]+/wiki/([^\\s>\"]*).*?>(.*?)<"
I would use this instead -
"//(.{1,3}?)\\.wikipedia\\.[a-z]+/wiki/([^\\s>\"]*)[^>]*>(.*?)<"
Formatted:
//
( .{1,3}? ) # (1)
\.
wikipedia
\.
[a-z]+
/wiki/
( [^\s>"]* ) # (2)
[^>]*
>
( .*? ) # (3)
<
Output:
** Grp 0 - ( pos 9 , len 98 )
//de.wikipedia.org/wiki/Porky%E2%80%99s" title="Porky’s – German" lang="de" hreflang="de">Deutsch<
** Grp 1 - ( pos 11 , len 2 )
de
** Grp 2 - ( pos 33 , len 15 )
Porky%E2%80%99s
** Grp 3 - ( pos 99 , len 7 )
Deutsch
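For comparison, the same pattern can be tried quickly outside boost. The sketch below uses Python's re module rather than boost::regex, and the anchor tag in html is an assumed sample reconstructed from the match output above, not necessarily the asker's exact input.

```python
import re

# Same pattern as the second suggestion; no doubled backslashes are needed
# because this is a Python raw string rather than a C++ string literal.
LINK = re.compile(r'//(.{1,3}?)\.wikipedia\.[a-z]+/wiki/([^\s>"]*)[^>]*>(.*?)<')

# Assumed sample input, rebuilt from the match output shown above.
html = ('<a href="//de.wikipedia.org/wiki/Porky%E2%80%99s" '
        'title="Porky’s – German" lang="de" hreflang="de">Deutsch</a>')

match = LINK.search(html)
if match:
    lang, page, label = match.groups()
    print(lang, page, label)
```

The key change is [^\s>"]* for the second field: a negated character class stops at the first whitespace, quote, or > without any nested quantifier.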

My regular expression won't match and I can't identify why

Here is an example of the text I am trying to match within a scalar:
1 N [51]Gone Girl [52]Fox $37,513,109 - 3,014 - $12,446 $37,513,109 $61 1
2 N [53]Annabelle [54]WB (NL) $37,134,255 - 3,185 - $11,659 $37,134,255 $6.5 1
3 1 [55]The Equalizer [56]Sony $18,750,375 -45.1% 3,236 - $5,794 $64,236,992 $55 2
4 3 [57]The Boxtrolls [58]Focus $11,979,588 -30.7% 3,464 - $3,458 $32,093,796 $60 2
5 2 [59]The Maze Runner [60]Fox $11,634,764 -33.3% 3,605 -33 $3,227 $73,556,159 $34 3
6 N [61]Left Behind (2014) [62]Free $6,300,147 - 1,825 - $3,452 $6,300,147 $16 1
7 4 [63]This is Where I Leave You [64]WB $4,009,345 -41.8% 2,735 -133 $1,466 $29,012,573 $19.8 3
8 5 [65]Dolphin Tale 2 [66]WB $3,422,377 -28.5% 2,790 -586 $1,227 $37,866,130 $36 4
Here is the regular expression I was using that won't seem to match up. Can anyone identify why?
if ($allData =~ /(\d+)\s+(\d+|[N])\s+(\[\d+\])(.+)\s+(\[\d+\])(.+)\s+(\$\.+)\s+(\-|\+\d+\.\d+%|\-\d+\.\d+%)\s+(\d+)\s+(\-\d+|\-|\+\d+)\s+(\$\.+)\s+(\$\.+)\s+(\.+)\s+(\d+)/g)
{
$current[$i] = $1;
$last[$i] = $2;
$title[$i] = $4;
$week[$i] = $7;
$cume[$i] = $12;
printf("%-4s%-4s%-35s%-10s%-10s", $current[$i], $last[$i], $title[$i], $week[$i], $cume[$i]);
if ($last[$i] ne '-'){
$gain = $last[$i] - $current[$i];
}
if ($gain < $bigloss){
$bigloss = $gain;
$losstitle = $title[$i];
}
if ($gain > $biggain){
$biggain = $gain;
$gaintitle = $title[$i];
}
if ($last[$i] eq '-'){
if ($current[$i] < $bigdebut){
$bigdebut = $current[$i];
$bigdebuttitle = $title[$i];
}
if ($current[$i] > $weakdebut){
$weakdebut = $current[$i];
$weakdebuttitle = $title[$i];
}
}
$i++;
}
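This could be the fix. In the original, \.+ matches one or more literal dots rather than any characters, so (\$\.+) can never match an amount like $37,513,109, and (\d+) cannot match a theater count that contains a comma, such as 3,014. Using .+? and [\d,]+ instead -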
# /(\d+)\s+(\d+|[N])\s+(\[\d+\])(.+?)\s+(\[\d+\])(.+?)\s+(\$.+?)\s+(\-|\+\d+\.\d+%|\-\d+\.\d+%)\s+([\d,]+)\s+(\-\d+|\-|\+\d+)\s+(\$.+?)\s+(\$.+?)\s+(.+?)\s+(\d+)/g
( \d+ ) # (1)
\s+
( \d+ | [N] ) # (2)
\s+
( \[ \d+ \] ) # (3)
( .+? ) # (4)
\s+
( \[ \d+ \] ) # (5)
( .+? ) # (6)
\s+
( \$ .+? ) # (7)
\s+
( # (8 start)
\-
| \+ \d+ \. \d+ %
| \- \d+ \. \d+ %
) # (8 end)
\s+
( [\d,]+ ) # (9)
\s+
( \- \d+ | \- | \+ \d+ ) # (10)
\s+
( \$ .+? ) # (11)
\s+
( \$ .+? ) # (12)
\s+
( .+? ) # (13)
\s+
( \d+ ) # (14)
Output sample:
** Grp 0 - ( pos 506 , len 98 )
7 4 [63]This is Where I Leave You [64]WB $4,009,345 -41.8% 2,735 -133 $1,466 $29,012,573 $19.8 3
** Grp 1 - ( pos 506 , len 1 )
7
** Grp 2 - ( pos 508 , len 1 )
4
** Grp 3 - ( pos 510 , len 4 )
[63]
** Grp 4 - ( pos 514 , len 25 )
This is Where I Leave You
** Grp 5 - ( pos 540 , len 4 )
[64]
** Grp 6 - ( pos 544 , len 2 )
WB
** Grp 7 - ( pos 547 , len 10 )
$4,009,345
** Grp 8 - ( pos 558 , len 6 )
-41.8%
** Grp 9 - ( pos 565 , len 5 )
2,735
** Grp 10 - ( pos 571 , len 4 )
-133
** Grp 11 - ( pos 578 , len 6 )
$1,466
** Grp 12 - ( pos 585 , len 11 )
$29,012,573
** Grp 13 - ( pos 597 , len 5 )
$19.8
** Grp 14 - ( pos 603 , len 1 )
3
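To check the corrected expression against more than one row, here is a small sketch in Python (an assumption; the question itself uses Perl). It applies the regex above line by line and keeps the same five groups the question's code stores: 1, 2, 4, 7 and 12.

```python
import re

# The corrected regex from above, split across lines only for readability.
ROW = re.compile(
    r'(\d+)\s+(\d+|[N])\s+(\[\d+\])(.+?)\s+(\[\d+\])(.+?)\s+(\$.+?)\s+'
    r'(\-|\+\d+\.\d+%|\-\d+\.\d+%)\s+([\d,]+)\s+(\-\d+|\-|\+\d+)\s+'
    r'(\$.+?)\s+(\$.+?)\s+(.+?)\s+(\d+)'
)

# Two rows copied from the question's sample data.
all_data = """\
1 N [51]Gone Girl [52]Fox $37,513,109 - 3,014 - $12,446 $37,513,109 $61 1
3 1 [55]The Equalizer [56]Sony $18,750,375 -45.1% 3,236 - $5,794 $64,236,992 $55 2
"""

rows = []
for line in all_data.splitlines():
    m = ROW.search(line)
    if m:
        # Groups 1, 2, 4, 7, 12: the fields the question's code keeps.
        rows.append((m.group(1), m.group(2), m.group(4), m.group(7), m.group(12)))

for row in rows:
    print(row)
```

Matching one line at a time keeps \s+ from running across a line break into the next row.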
Try this regex:
\d\s[A-Z0-9]\s\[\d\d\][A-Z][a-z]+(\s\b\w+\b){0,}\s(\(\d+\)\s)?\[\d\d\][A-Z]+[a-z]*\s(\(\w+\)\s)?\$(\d{1,3},){2}\d{3}\s-\s?\d+[,.]\d+((%\s\d,\d{1,3}\s-\s?\$?\d{1,3}(,\d{1,3}\s)?)|\s-\s\$\d{1,3},\d{1,3}\s)\s?\$\d{1,3},\d{1,3}(,\d{1,3})*\s\$\d{1,3}(,\d{1,3})*(\.\d+)?(\s\$\d+(\.)?\d+)?\s\d
here: http://regexr.com/39m54

Parse Maven Filename

How can I parse a Maven filename into the artifact and version?
The filenames look like this:
test-file-12.2.2-SNAPSHOT.jar
test-lookup-1.0.16.jar
I need to get
test-file
12.2.2-SNAPSHOT
test-lookup
1.0.16
So the artifactId is the text before the first instance of a dash and a number and the version is the text after the first instance of a number up to .jar.
I could probably do it with split and several loops and checks but it feels like there should be a simpler way.
EDIT:
Actually, the regex wasn't as complicated as I thought!
new File("test").eachFile() { file ->
String fileName = file.name[0..file.name.lastIndexOf('.') - 1]
//Split at the first instance of a dash and a number
def split = fileName.split("-[\\d]")
String artifactId = split[0]
String version = fileName.substring(artifactId.length() + 1, fileName.length())
println(artifactId)
println(version)
}
EDIT2: Hmm. It fails on examples such as this:
http://mvnrepository.com/artifact/org.xhtmlrenderer/core-renderer/R8
core-renderer-R8.jar
Basically it's just this: ^(.+?)-(\d.*?)\.jar$
Use it in multi-line mode if there is more than one line.
^
( .+? )
-
( \d .*? )
\. jar
$
Output:
** Grp 0 - ( pos 0 , len 29 )
test-file-12.2.2-SNAPSHOT.jar
** Grp 1 - ( pos 0 , len 9 )
test-file
** Grp 2 - ( pos 10 , len 15 )
12.2.2-SNAPSHOT
--------------------------
** Grp 0 - ( pos 31 , len 22 )
test-lookup-1.0.16.jar
** Grp 1 - ( pos 31 , len 11 )
test-lookup
** Grp 2 - ( pos 43 , len 6 )
1.0.16
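A short sketch of that pattern in use, written in Python rather than the question's Groovy (an assumption, as is the helper name split_jar_name). It also shows that a name whose version does not start with a digit, like the core-renderer-R8.jar case from the question, simply does not match.

```python
import re

# Lazy artifactId, then a dash, then a version that starts with a digit.
JAR = re.compile(r'^(.+?)-(\d.*?)\.jar$')


def split_jar_name(file_name):
    """Return (artifactId, version), or None if the name does not fit.

    split_jar_name is a hypothetical helper name used only for this sketch.
    """
    m = JAR.match(file_name)
    return m.groups() if m else None


for name in ('test-file-12.2.2-SNAPSHOT.jar',
             'test-lookup-1.0.16.jar',
             'core-renderer-R8.jar'):
    print(name, split_jar_name(name))
```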

Regex to extract pattern from text

I have a string that contains a bunch of function calls within it. I need to extract every occurrence of the VariableSet function call. Functions can appear in any order. Here is an example:
parsedExpression = "VariableSet(b, 999)If(a = 0,"Black",SetColor(a,b,c))VariableSet("a" ,1.573) VariableSet( c,-2387)"
I need to find every match that starts with "VariableSet(" and ends with the first close parenthesis that follows it. So, for the example above, I need a list like this:
VariableSet(b, 999)
VariableSet("a" ,1.573)
VariableSet( c,-2387)
I planned to use the code below but I have not been able to determine the correct regex pattern. The best I could come up with is "VariableSet(.*(?i:)\b)" but it does not produce the list above.
Dim matches As MatchCollection = Regex.Matches(parsedExpression, "VariableSet\(.*(?i:\)\b)")
' Loop over matches.
For Each m As Match In matches
' Loop over captures.
For Each c As Capture In m.Captures
Dim varName As String = ""
Dim varValue As String = ""
Dim firstCommaPosition As Integer
'For every VariableSet that was found do the following:
'Parse the captured string to get the variable name and value
varName = c.Value.Replace("VariableSet(", "").Replace(")", "")
firstCommaPosition = varName.IndexOf(",")
varValue = varName.Substring(firstCommaPosition + 1)
varName = varName.Substring(0, firstCommaPosition).Replace("""", "")
'Set the variable
ce.Variables(varName) = ce.Evaluate(varValue)
'Remove this instance of VariableSet() function from parsedExpression
parsedExpression = parsedExpression.Replace(c.Value, "")
Next
Next
I would greatly appreciate it if someone could provide the correct regex pattern.
Maybe this will help you:
Dim strMatch As String = ""
Dim strVar1 As String = ""
Dim strVar2 As String = ""
Dim strExpression As String = "VariableSet(b, 999)If(a = 0,""Black"",SetColor(a,b,c))VariableSet(""a"" ,1.573) VariableSet( c,-2387)"
Dim rx As New RegularExpressions.Regex("VariableSet\((?<V1>.*?),(?<V2>.*?)\)", RegularExpressions.RegexOptions.IgnoreCase)
Dim rxMatch As RegularExpressions.MatchCollection = rx.Matches(strExpression)
For intI As Integer = 0 To rxMatch.Count - 1
strMatch = rxMatch(intI).Value 'VariableSet(b, 999)
strVar1 = rxMatch(intI).Groups("V1").ToString 'b
strVar2 = rxMatch(intI).Groups("V2").ToString ' 999
Next
The pattern VariableSet\([^)]*\) should be a direct replacement for yours.
If you wanted to get fancy, all your code could be done using a single regex.
# VariableSet\((\s*"?\s*([^,")]*?)\s*"?\s*(?:,\s*"?\s*([^,")]*?)\s*"?\s*)?)\)
VariableSet
\( # Open paren
( # (1 start), Inside paren's
\s*
"? \s*
( [^,")]*? ) # (2), Var
\s*
"? \s*
(?:
, # Comma
\s*
"? \s*
( [^,")]*? ) # (3), Value
\s*
"? \s*
)?
) # (1 end)
\) # Close paren
Example input string:
VariableSet(b, 999)
VariableSet("a" ,1.573)
VariableSet( c,-2387)
VariableSet( , 999)
VariableSet( "aadsfasdf")
VariableSet( )
Output matches ( Var / Value ):
** Grp 2 - ( pos 12 , len 1 )
b
** Grp 3 - ( pos 16 , len 3 )
999
----------------
** Grp 2 - ( pos 35 , len 1 )
a
** Grp 3 - ( pos 40 , len 5 )
1.573
----------------
** Grp 2 - ( pos 63 , len 1 )
c
** Grp 3 - ( pos 65 , len 5 )
-2387
----------------
** Grp 2 - ( pos 86 , len 0 ) EMPTY
** Grp 3 - ( pos 88 , len 3 )
999
----------------
** Grp 2 - ( pos 108 , len 9 )
aadsfasdf
** Grp 3 - NULL
----------------
** Grp 2 - ( pos 136 , len 0 ) EMPTY
** Grp 3 - NULL
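The single-regex approach can be sketched as follows. This is Python rather than the question's VB.NET (an assumption), and it only extracts the calls, the name/value pairs and the leftover expression. Evaluating the values, as ce.Evaluate does in the question, is left out.

```python
import re

# The single regex from above: group 2 is the variable name, group 3 the value.
CALL = re.compile(
    r'VariableSet\((\s*"?\s*([^,")]*?)\s*"?\s*(?:,\s*"?\s*([^,")]*?)\s*"?\s*)?)\)'
)

expression = ('VariableSet(b, 999)If(a = 0,"Black",SetColor(a,b,c))'
              'VariableSet("a" ,1.573) VariableSet( c,-2387)')

# Full text of each VariableSet(...) call.
calls = [m.group(0) for m in CALL.finditer(expression)]

# (name, value) pairs, with quotes and surrounding whitespace already stripped.
pairs = [(m.group(2), m.group(3)) for m in CALL.finditer(expression)]

# The expression with every VariableSet(...) call removed.
remainder = CALL.sub('', expression)

print(calls)
print(pairs)
print(remainder)
```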

regular expression behaving weird in django urls

Here is regular expression in urls.py
url(r'^company_data/(?:[A-Za-z]+)/((?:0?[1-9]|[12][0-9]|3[01])(?:0?[1-9]|1[012])(?:20)?[0-9]{2})*/((?:0?[1-9]|[12][0-9]|3[01])(?:0?[1-9]|1[012])(?:20)?[0-9]{2})*$', 'stats.views.second', name='home'),
my views.py
def second(request,comp_name,offset_min,offset_max=None):
I am calling it from the browser like this: /company_data/hello/24092014/25092014
I expect:
comp_name= "hello", offset_min="24092014",offset_max="25092014"
but in reality I get:
comp_name="24092014",offset_max="25092014"
What did I do wrong here?
Thanks in advance!!
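You're missing capture group 1: the company name sits inside a non-capturing group, (?:[A-Za-z]+), so it is never passed to the view. Make it ([A-Za-z]+).
Edit: Also note that groups 2 and 3 should be written as below, with the * inside the capture group, unless I'm reading you wrong and you intend to capture only the last repetition of each number group.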
# '^/?company_data/([A-Za-z]+)/((?:(?:0?[1-9]|[12][0-9]|3[01])(?:0?[1-9]|1[012])(?:20)?[0-9]{2})*)/((?:(?:0?[1-9]|[12][0-9]|3[01])(?:0?[1-9]|1[012])(?:20)?[0-9]{2})*)$'
^
/? company_data /
( [A-Za-z]+ ) # (1)
/
( # (2 start)
(?:
(?: 0? [1-9] | [12] [0-9] | 3 [01] )
(?: 0? [1-9] | 1 [012] )
(?: 20 )?
[0-9]{2}
)*
) # (2 end)
/
( # (3 start)
(?:
(?: 0? [1-9] | [12] [0-9] | 3 [01] )
(?: 0? [1-9] | 1 [012] )
(?: 20 )?
[0-9]{2}
)*
) # (3 end)
$
Output:
** Grp 0 - ( pos 0 , len 37 )
/company_data/hello/24092014/25092014
** Grp 1 - ( pos 14 , len 5 )
hello
** Grp 2 - ( pos 20 , len 8 )
24092014
** Grp 3 - ( pos 29 , len 8 )
25092014
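The difference between the two patterns can be seen with plain Python re, without Django. In this sketch the names DATE, ORIGINAL and FIXED are made up for illustration, and the path is written without its leading slash on the assumption that Django strips it before matching.

```python
import re

# Day, month, and a two- or four-digit year, as in both patterns.
DATE = r'(?:0?[1-9]|[12][0-9]|3[01])(?:0?[1-9]|1[012])(?:20)?[0-9]{2}'

# Question's pattern: company name in a non-capturing group, * outside the date groups.
ORIGINAL = re.compile(
    r'^company_data/(?:[A-Za-z]+)/(' + DATE + r')*/(' + DATE + r')*$'
)

# Answer's pattern: company name captured, * moved inside the date groups.
FIXED = re.compile(
    r'^/?company_data/([A-Za-z]+)/((?:' + DATE + r')*)/((?:' + DATE + r')*)$'
)

path = 'company_data/hello/24092014/25092014'
print(ORIGINAL.match(path).groups())  # two groups: the name is not captured
print(FIXED.match(path).groups())     # three groups: name, then both dates
```

With only two captured groups, the first date is passed as the view's first positional argument, which is why comp_name ends up holding 24092014.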