I have a regular expression to capture three fields in an HTML tag using boost::regex:
"\\/\\/(.{1,3}?)\\.wikipedia\\.[a-z]+\\/wiki\\/(.*?)\\s*>(.*?)<"
So, from
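<a href="//de.wikipedia.org/wiki/Porky%E2%80%99s" title="Porky’s – German" lang="de" hreflang="de">Deutsch</a>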
I get
de
Porky%E2%80%99s" title="Porky’s – German" lang="de" hreflang="de"
Deutsch
But I'd like to have {de, Porky%E2%80%99s, Deutsch} instead.
How can I make my regex stop matching the second field as soon as it finds the first whitespace character?
I tried
"\\/\\/(.{1,3}?)\\.wikipedia\\.[a-z]+\\/wiki\\/(\\S*?)*>(.*?)<"
so that the second field matches everything except whitespace, but I get this crash report:
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >'
what(): Ran out of stack space trying to match the regular expression.
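The nested quantifier in (\\S*?)* is the likely cause of the crash: the inner group can match the empty string, so the backtracking matcher keeps recursing until it runs out of stack. This might work -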
"//(.{1,3}?)\\.wikipedia\\.[a-z]+/wiki/([^\\s>\"]*).*?>(.*?)<"
I would use this instead -
"//(.{1,3}?)\\.wikipedia\\.[a-z]+/wiki/([^\\s>\"]*)[^>]*>(.*?)<"
Formatted:
//
( .{1,3}? ) # (1)
\.
wikipedia
\.
[a-z]+
/wiki/
( [^\s>"]* ) # (2)
[^>]*
>
( .*? ) # (3)
<
Output:
** Grp 0 - ( pos 9 , len 98 )
//de.wikipedia.org/wiki/Porky%E2%80%99s" title="Porky’s – German" lang="de" hreflang="de">Deutsch<
** Grp 1 - ( pos 11 , len 2 )
de
** Grp 2 - ( pos 33 , len 15 )
Porky%E2%80%99s
** Grp 3 - ( pos 99 , len 7 )
Deutsch
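For comparison, here is a minimal sketch of the same pattern in Python's re module rather than boost::regex. The sample anchor tag is an assumption reconstructed from the match output above, and the names LINK_RE and fields are made up for illustration.

```python
import re

# The pattern from the answer above, written as a Python raw string
# (no doubled backslashes needed, unlike the C++ string literal).
LINK_RE = re.compile(r'//(.{1,3}?)\.wikipedia\.[a-z]+/wiki/([^\s>"]*)[^>]*>(.*?)<')

# Sample input: assumed from the match output shown above.
html = ('<a href="//de.wikipedia.org/wiki/Porky%E2%80%99s" '
        'title="Porky’s – German" lang="de" hreflang="de">Deutsch</a>')

match = LINK_RE.search(html)
# Group 2 stops at the first whitespace, '>' or '"', so the title and
# lang attributes are consumed by [^>]* instead of being captured.
fields = match.groups() if match else None
print(fields)
```

The negated class [^\s>"]* cannot overrun the closing quote of the href, which is what keeps the second field short without any nested quantifier.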
I have a batch file that I need to extract switches from.
The switches are in this format.
/Switch1=Value1 /Switch2="Value 2" /Switch3 /Switch4="C:\Program Files\DIR"
I need to extract Switch=Value, or just Switch when it has no value (e.g. Switch3).
I am a beginner with regex. So far I have tried the expression \/\w+=|\/\w+ but that doesn't give me the value.
Seems like you want this,
\/\w+(?:=(?:(["'])(?:(?!\1).)*\1|\S+))?
Not much information, but here is something in perl to get you going:
perl -p -i -e 'print "$1=$3\n" if /\/(\w+)(=((\"[^"]*\")|\S+))?/;'
You can use a lookbehind for "Switch.=" and a lookahead for the next slash. You will have to trim the values afterwards, but you get the values:
(?<=Switch.=).+(?=/)
It can get hairy to parse a command line with switches.
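Something like the below. Note that the (?| ... ) branch reset group needs an engine that supports it, such as Perl or PCRE.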
# /([^ =]+)(?:=(?|"((?:[^"\\]*(?:\\.|[^"\\]*)*))"|([^ ]*)))?
/
( [^ =]+ ) # (1)
(?:
=
(?|
"
( # (2 start)
(?:
[^"\\]*
(?:
\\ .
|
[^"\\]*
)*
)
) # (2 end)
"
|
( [^ ]* ) # (2)
)
)?
Output
** Grp 0 - ( pos 0 , len 15 )
/Switch1=Value1
** Grp 1 - ( pos 1 , len 7 )
Switch1
** Grp 2 - ( pos 9 , len 6 )
Value1
-------------------
** Grp 0 - ( pos 16 , len 18 )
/Switch2="Value 2"
** Grp 1 - ( pos 17 , len 7 )
Switch2
** Grp 2 - ( pos 26 , len 7 )
Value 2
-------------------
** Grp 0 - ( pos 35 , len 8 )
/Switch3
** Grp 1 - ( pos 36 , len 7 )
Switch3
** Grp 2 - NULL
-------------------
** Grp 0 - ( pos 44 , len 31 )
/Switch4="C:\Program Files\DIR"
** Grp 1 - ( pos 45 , len 7 )
Switch4
** Grp 2 - ( pos 54 , len 20 )
C:\Program Files\DIR
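A rough Python equivalent of the approach above, as a sketch only. Python's re module has no branch reset group, so this version uses two separate capture groups for the quoted and unquoted value and picks whichever one matched. It is also simpler than the pattern above in that it does not handle escaped quotes inside a quoted value. The names SWITCH_RE and parse_switches are hypothetical.

```python
import re

# /Name, optionally followed by =value, where value is either
# a double-quoted string (group 2) or a run of non-whitespace (group 3).
SWITCH_RE = re.compile(r'/(\w+)(?:=(?:"([^"]*)"|(\S+)))?')

def parse_switches(line):
    """Return a list of (name, value) pairs; value is None for bare switches."""
    result = []
    for m in SWITCH_RE.finditer(line):
        name = m.group(1)
        # Only one of the two value groups can participate in a match.
        value = m.group(2) if m.group(2) is not None else m.group(3)
        result.append((name, value))
    return result

if __name__ == "__main__":
    sample = r'/Switch1=Value1 /Switch2="Value 2" /Switch3 /Switch4="C:\Program Files\DIR"'
    for name, value in parse_switches(sample):
        print(name if value is None else f"{name}={value}")
```

Because finditer resumes after the end of each match, a slash inside a quoted value is consumed as part of that value and is not mistaken for the start of a new switch.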
Here is an example of the text I am trying to match within a scalar:
1 N [51]Gone Girl [52]Fox $37,513,109 - 3,014 - $12,446 $37,513,109 $61 1
2 N [53]Annabelle [54]WB (NL) $37,134,255 - 3,185 - $11,659 $37,134,255 $6.5 1
3 1 [55]The Equalizer [56]Sony $18,750,375 -45.1% 3,236 - $5,794 $64,236,992 $55 2
4 3 [57]The Boxtrolls [58]Focus $11,979,588 -30.7% 3,464 - $3,458 $32,093,796 $60 2
5 2 [59]The Maze Runner [60]Fox $11,634,764 -33.3% 3,605 -33 $3,227 $73,556,159 $34 3
6 N [61]Left Behind (2014) [62]Free $6,300,147 - 1,825 - $3,452 $6,300,147 $16 1
7 4 [63]This is Where I Leave You [64]WB $4,009,345 -41.8% 2,735 -133 $1,466 $29,012,573 $19.8 3
8 5 [65]Dolphin Tale 2 [66]WB $3,422,377 -28.5% 2,790 -586 $1,227 $37,866,130 $36 4
Here is the regular expression I was using, which doesn't seem to match. Can anyone identify why?
if ($allData =~ /(\d+)\s+(\d+|[N])\s+(\[\d+\])(.+)\s+(\[\d+\])(.+)\s+(\$\.+)\s+(\-|\+\d+\.\d+%|\-\d+\.\d+%)\s+(\d+)\s+(\-\d+|\-|\+\d+)\s+(\$\.+)\s+(\$\.+)\s+(\.+)\s+(\d+)/g)
{
$current[$i] = $1;
$last[$i] = $2;
$title[$i] = $4;
$week[$i] = $7;
$cume[$i] = $12;
printf("%-4s%-4s%-35s%-10s%-10s", $current[$i], $last[$i], $title[$i], $week[$i], $cume[$i]);
if ($last[$i] ne '-'){
$gain = $last[$i] - $current[$i];
}
if ($gain < $bigloss){
$bigloss = $gain;
$losstitle = $title[$i];
}
if ($gain > $biggain){
$biggain = $gain;
$gaintitle = $title[$i];
}
if ($last[$i] eq '-'){
if ($current[$i] < $bigdebut){
$bigdebut = $current[$i];
$bigdebuttitle = $title[$i];
}
if ($current[$i] > $weakdebut){
$weakdebut = $current[$i];
$weakdebuttitle = $title[$i];
}
}
$i++;
}
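In the original, \.+ matches only literal dots, so (\$\.+) can never match an amount like $37,513,109, and (\d+) stops at the comma in 3,014. Unescaping the dots, making them lazy, and allowing commas in the theater count could be the fix -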
# /(\d+)\s+(\d+|[N])\s+(\[\d+\])(.+?)\s+(\[\d+\])(.+?)\s+(\$.+?)\s+(\-|\+\d+\.\d+%|\-\d+\.\d+%)\s+([\d,]+)\s+(\-\d+|\-|\+\d+)\s+(\$.+?)\s+(\$.+?)\s+(.+?)\s+(\d+)/g
( \d+ ) # (1)
\s+
( \d+ | [N] ) # (2)
\s+
( \[ \d+ \] ) # (3)
( .+? ) # (4)
\s+
( \[ \d+ \] ) # (5)
( .+? ) # (6)
\s+
( \$ .+? ) # (7)
\s+
( # (8 start)
\-
| \+ \d+ \. \d+ %
| \- \d+ \. \d+ %
) # (8 end)
\s+
( [\d,]+ ) # (9)
\s+
( \- \d+ | \- | \+ \d+ ) # (10)
\s+
( \$ .+? ) # (11)
\s+
( \$ .+? ) # (12)
\s+
( .+? ) # (13)
\s+
( \d+ ) # (14)
Output sample:
** Grp 0 - ( pos 506 , len 98 )
7 4 [63]This is Where I Leave You [64]WB $4,009,345 -41.8% 2,735 -133 $1,466 $29,012,573 $19.8 3
** Grp 1 - ( pos 506 , len 1 )
7
** Grp 2 - ( pos 508 , len 1 )
4
** Grp 3 - ( pos 510 , len 4 )
[63]
** Grp 4 - ( pos 514 , len 25 )
This is Where I Leave You
** Grp 5 - ( pos 540 , len 4 )
[64]
** Grp 6 - ( pos 544 , len 2 )
WB
** Grp 7 - ( pos 547 , len 10 )
$4,009,345
** Grp 8 - ( pos 558 , len 6 )
-41.8%
** Grp 9 - ( pos 565 , len 5 )
2,735
** Grp 10 - ( pos 571 , len 4 )
-133
** Grp 11 - ( pos 578 , len 6 )
$1,466
** Grp 12 - ( pos 585 , len 11 )
$29,012,573
** Grp 13 - ( pos 597 , len 5 )
$19.8
** Grp 14 - ( pos 603 , len 1 )
3
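The same idea can be sketched in Python, matching one line at a time and using named groups for the five fields the question's code actually uses. The column meanings in the comments (studio, theater count, and so on) are guesses from the sample data, and the names ROW_RE and parse_rows are hypothetical.

```python
import re

ROW_RE = re.compile(
    r'(?P<rank>\d+)\s+(?P<last>\d+|N)\s+'   # this week's rank, last week's rank (N = new)
    r'\[\d+\](?P<title>.+?)\s+'             # [link number] then the title
    r'\[\d+\].+?\s+'                        # [link number] then (assumed) the studio
    r'(?P<week>\$[\d,]+)\s+'                # weekend gross
    r'(?:-|[+-]\d+\.\d+%)\s+'               # percent change, or a bare dash
    r'[\d,]+\s+'                            # (assumed) theater count, commas allowed
    r'(?:-|[+-]\d+)\s+'                     # change in count, or a bare dash
    r'\$[\d,]+\s+'                          # (assumed) per-theater average
    r'(?P<cume>\$[\d,]+)\s+'                # cumulative gross
    r'\S+\s+'                               # (assumed) budget, e.g. $6.5
    r'\d+$'                                 # (assumed) weeks in release
)

def parse_rows(text):
    """Parse each matching line into a dict; non-matching lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if not m:
            continue
        row = m.groupdict()
        # Rank movement as in the question: last week's rank minus this week's.
        row['move'] = None if row['last'] == 'N' else int(row['last']) - int(row['rank'])
        rows.append(row)
    return rows

if __name__ == "__main__":
    sample = ("1 N [51]Gone Girl [52]Fox $37,513,109 - 3,014 - $12,446 $37,513,109 $61 1\n"
              "3 1 [55]The Equalizer [56]Sony $18,750,375 -45.1% 3,236 - $5,794 $64,236,992 $55 2")
    for row in parse_rows(sample):
        print(row['rank'], row['last'], row['title'], row['week'], row['cume'], row['move'])
```

Matching line by line with a trailing $ keeps a lazy .+? from running into the next row, which is a risk when one pattern is applied to the whole scalar at once.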
Try this regex:
\d\s[A-Z0-9]\s\[\d\d\][A-Z][a-z]+(\s\b\w+\b){0,}\s(\(\d+\)\s)?\[\d\d\][A-Z]+[a-z]*\s(\(\w+\)\s)?\$(\d{1,3},){2}\d{3}\s-\s?\d+[,.]\d+((%\s\d,\d{1,3}\s-\s?\$?\d{1,3}(,\d{1,3}\s)?)|\s-\s\$\d{1,3},\d{1,3}\s)\s?\$\d{1,3},\d{1,3}(,\d{1,3})*\s\$\d{1,3}(,\d{1,3})*(\.\d+)?(\s\$\d+(\.)?\d+)?\s\d
here: http://regexr.com/39m54
I need a regular expression to validate passwords with the following requirements
Length : minimal 4 chars, maximum 39 chars
Allowed chars : a-z, A-Z, 0-9, minus, underscore, at-sign and dot
Additional : not-repeating and not-incremental like 'aaaa' or '1234' or 'abcd'
^[a-zA-Z0-9#.-_]{4,39}$
Sure, this can be done in a verbose manner.
After all, it's up to you to define what is sequential.
Just flesh out the rest of this regex:
# ^(?:(a(?!b)|b(?!c)|c(?!d)|d(?!e)|1(?!2)|2(?!3)|3(?!4)|4(?!5)|[#._-])(?!\1)){4,39}$
^
(?:
( # (1 start)
a
(?! b )
| b
(?! c )
| c
(?! d )
| d
(?! e )
# Add the rest of the alphabet here
| 1
(?! 2 )
| 2
(?! 3 )
| 3
(?! 4 )
| 4
(?! 5 )
# Add the rest of the numbers here
| [#._-]
# Add any other sequential symbols here
) # (1 end)
(?! \1 ) # Non-repeating
){4,39}
$
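Rather than typing out the whole alphabet by hand, the alternation can be generated. The following Python sketch builds the same kind of pattern programmatically. It assumes the symbol set from the stated requirements (minus, underscore, at-sign and dot) rather than the # used in the pattern above, and it treats lowercase letters, uppercase letters and digits as three separate sequences. Like the pattern above, it rejects any two adjacent characters that repeat or increment, which is stricter than only rejecting runs such as 'aaaa'. The names build_password_regex and is_valid are hypothetical.

```python
import re
import string

def build_password_regex():
    """Build the 'no repeat, no increment' pattern from the character sequences."""
    parts = []
    for seq in (string.ascii_lowercase, string.ascii_uppercase, string.digits):
        # Each character may not be followed by its successor in the sequence.
        for cur, nxt in zip(seq, seq[1:]):
            parts.append(f"{cur}(?!{nxt})")
        parts.append(seq[-1])  # the last character has no successor
    # Assumed symbol set; the hyphen is last so it is a literal, not a range.
    parts.append(r"[@._-]")
    # (?!\1) forbids the same character twice in a row.
    return re.compile(r"(?:(" + "|".join(parts) + r")(?!\1)){4,39}")

PASSWORD_RE = build_password_regex()

def is_valid(password):
    # fullmatch plays the role of the ^ and $ anchors.
    return PASSWORD_RE.fullmatch(password) is not None

if __name__ == "__main__":
    for candidate in ("aaaa", "1234", "abcd", "a1b2", "x-9_q.z"):
        print(candidate, is_valid(candidate))
```

Placing the hyphen last in the class matters: in a class written as [#.-_], the .-_ part is read as a range from '.' to '_', which admits many characters that were never intended.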
Here is the regex:
ws(s)?://([0-9\.a-zA-Z\-_]+):([\d]+)([/([0-9\.a-zA-Z\-_]+)?
Here is a test pattern:
wss://beta5.max.com:18989/abcde.html
softlion.com likes it:
Test results
Match count: 1
Global matches:
wss://beta5.max.com:18989/abcde.html
Value of each capturing group:
0 1 2 3 4
wss://beta5.max.com:18989/abcde.html s beta5.max.com 18989 /abcde.html
scala does not:
val regex = """ws(s)?://([0-9\.a-zA-Z\-_]+):([\d]+)([/([0-9\.a-zA-Z\-_]+)?""".r
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 58
ws(s)?://([0-9\.a-zA-Z\-_]+):([\d]+)([/([0-9\.a-zA-Z\-_]+)?
My bad, I had an extra [ at the front of the last capturing group.
([/([0-9\.a-zA-Z\-_]+)?
Java supports unions and intersections inside character classes, so a [ inside a class opens a nested class. That is why Java reports an error:
ws
( s )?
://
( [0-9\.a-zA-Z\-_]+ )
:
( [\d]+ )
= ( <-- Unbalanced '('
= [ <-- Unbalanced '['
/
( [0-9\.a-zA-Z\-_]+ )?
With most other engines it's no problem, because the inner [ and ( are just literals inside the character class:
ws
( s )? # (1)
://
( [0-9\.a-zA-Z\-_]+ ) # (2)
:
( [\d]+ ) # (3)
( [/([0-9\.a-zA-Z\-_]+ )? # (4)
So, it's good to see (know) that the original regex is not what you thought it was.
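For reference, here is a sketch of what the pattern was probably meant to be, with the stray [ and ( removed from the last group so that it reads as an optional path. That intent is an assumption based on the test URL above. The example is in Python, and the names WS_URL_RE and parse_ws_url are hypothetical.

```python
import re

# Group 1: optional "s" (secure), group 2: host, group 3: port,
# group 4: optional path. The last group no longer opens a character
# class with "[" before the slash.
WS_URL_RE = re.compile(r'ws(s)?://([0-9.a-zA-Z\-_]+):(\d+)(/[0-9.a-zA-Z\-_]+)?')

def parse_ws_url(url):
    """Return (secure_flag, host, port, path) or None if the URL does not match."""
    m = WS_URL_RE.fullmatch(url)
    return m.groups() if m else None

if __name__ == "__main__":
    print(parse_ws_url("wss://beta5.max.com:18989/abcde.html"))
    print(parse_ws_url("ws://localhost:8080"))
```

With the brackets balanced, the pattern no longer depends on how an engine treats a [ inside a class, so the same text should also be acceptable to java.util.regex and therefore to Scala.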