How to get values inside nested braces using perl - regex

I have a string containing a list of expressions inside parentheses. I want to extract them by splitting the string into an array.
This is what I have tried:
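#!/usr/bin/perl
use strict;
use warnings;
sub main {
    my $string = <STDIN>;
    # Print each top-level balanced group, or each bare token outside parentheses
    while ($string =~ /(\((?:(?1)|[^()]*+)++\))|[^()\s]++/g) {
        print "$&\n";
    }
}
main();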
Input: (+ (+ 4 3) ( - 3 2) 5)
Output should be: (+ (+ 4 3) ( - 3 2) 5)
(+ 4 3)
( - 3 2)
I'm trying to store these in an array and then evaluate them separately, but I'm not sure that's the right approach.
Basically, I'm trying to evaluate the expression as follows:
4+3 = 7, 3-2 = 1, and then 7+1+5 = 13
The final output should be 13.
Can anyone help me with this?

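Use the following expression: /(?=(\((?>[^()]+|(?1))*\)))/g
Because the whole pattern sits inside a lookahead, the match itself is empty and a new attempt starts at every position, so each balanced group, including the nested ones, is captured in $1 (not in $&).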
See it in action here: http://regex101.com/r/eI7iP5
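Extracting the groups is only half of what the question asks for; the stated goal is to evaluate the expression to 13. Storing the groups in an array and evaluating them separately is not required for that. A minimal sketch of a recursive evaluator follows, written in JavaScript rather than Perl. The names tokenize, OPS and evaluate are hypothetical, and the set of four operators is an assumption, since the question only shows + and -.

```javascript
// Split the input into "(", ")" and atom tokens (operators and numbers).
function tokenize(src) {
  return src.match(/[()]|[^()\s]+/g) || [];
}

// Assumed operator set; each operator folds its operands left to right.
const OPS = new Map([
  ["+", (a, b) => a + b],
  ["-", (a, b) => a - b],
  ["*", (a, b) => a * b],
  ["/", (a, b) => a / b],
]);

// A form is either a number or "(" operator form... ")".
function evaluate(src) {
  const tokens = tokenize(src);
  let pos = 0;

  function form() {
    const tok = tokens[pos++];
    if (tok === undefined) throw new Error("unexpected end of input");
    if (tok !== "(") {
      const n = Number(tok);
      if (Number.isNaN(n)) throw new Error("not a number: " + tok);
      return n;
    }
    const op = OPS.get(tokens[pos++]);
    if (!op) throw new Error("unknown operator");
    const args = [];
    // Nested forms are evaluated by the recursive call.
    while (pos < tokens.length && tokens[pos] !== ")") args.push(form());
    if (tokens[pos++] !== ")") throw new Error("missing closing parenthesis");
    if (args.length === 0) throw new Error("operator needs operands");
    return args.reduce(op);
  }

  const value = form();
  if (pos !== tokens.length) throw new Error("trailing input");
  return value;
}

console.log(evaluate("(+ (+ 4 3) ( - 3 2) 5)")); // 13
```

The recursion plays the role that (?1) plays in the regex: each opening parenthesis starts a nested form, which is reduced to a number before the enclosing operator is applied.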

Related

Substitute the markdown italic to html using regex in Perl

To convert the markdown italic text $script into html, I've written this:
my $script = "*so what*";
my $res =~ s/\*(.)\*/$1/g;
print "<em>$1</em>\n";
The expected result is:
<em>so what</em>
but it gives:
<em></em>
How to make it give the expected result?
Problems:
You print the wrong variable.
You switch variable names halfway through.
. won't match more than one character.
You always add one EM element, even if no stars are found.
You always add one EM element, even if multiple pairs of stars are found.
You add the EM element around the entire output, not just the portion in stars.
Fix:
$script =~ s{\*([^*]+)\*}{<em>$1</em>}g;
print "$script\n";
or
my $res = $script =~ s{\*([^*]+)\*}{<em>$1</em>}gr;
print "$res\n";
But that's not all. Even with all the aforementioned problems fixed, your parser still has numerous other bugs. For example, it misapplies italics for all of the following (results shown as HTML):
**Important**
Correct: <strong>Important</strong>
Your code: *<em>Important</em>*
4 * 5 * 6 = 120
Correct: 4 * 5 * 6 = 120
Your code: 4 <em> 5 </em> 6 = 120
4 * 6 = 20 is *wrong*
Correct: 4 * 6 = 20 is <em>wrong</em>
Your code: 4 <em> 6 = 20 is </em>wrong*
`foo *bar* baz`
Correct: <code>foo *bar* baz</code>
Your code: `foo <em>bar</em> baz`
\*I like stars\*
Correct: *I like stars*
Your code: \<em>I like stars\</em>
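Some of these cases can be avoided with a stricter pattern. The sketch below is in JavaScript rather than Perl, and the function name italicizeStrict is hypothetical. It assumes that an italic span must not start or end next to another star, must not start with whitespace, and must not be preceded or closed by a backslash. It is still not a Markdown parser: code spans are not recognised, and escaped stars are left untouched rather than unescaped.

```javascript
// Replace *text* with <em>text</em>, but only when:
//  - the opening star is not preceded by "*" or "\"
//  - the opening star is not followed by "*" or whitespace
//  - the text ends in a character that is not "*", whitespace or "\"
//  - the closing star is not followed by another "*"
function italicizeStrict(text) {
  const pattern = /(?<![*\\])\*(?![*\s])([^*]*[^*\s\\])\*(?!\*)/g;
  return text.replace(pattern, "<em>$1</em>");
}

console.log(italicizeStrict("*so what*"));
console.log(italicizeStrict("4 * 6 = 20 is *wrong*"));
```

For anything beyond simple input, a real Markdown library is the safer choice.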

Using the text replaced in the replace text

I'm using the version 7.3.3 of Notepad++.
I have this list of numbers to 1.000.000.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
I want to add XML tags to these numbers like this:
<SerialNumber>
<SN>1</SN>
</SerialNumber>
<SerialNumber>
<SN>2</SN>
</SerialNumber>
<SerialNumber>
<SN>3</SN>
</SerialNumber>
<SerialNumber>
<SN>4</SN>
</SerialNumber>
So I need a regular expression that finds a number followed by a line ending (\r\n) and reuses the number it found in the replacement text.
Do you know how to do that in Notepad++?
I have tried \d{*}, but it is not a valid regular expression.
Open the Replace dialog (Search > Replace...) and set the fields as follows:
- Find what: (\d+)\r?\n
- Replace with: <SerialNumber>\r\n <SN>$1</SN>\r\n</SerialNumber>\r\n
- Search mode: Regular expression
Then press Replace All. To match the last number as well, add an empty line at the end of your file.
I hope it helps.
If your numbers 1..n are in a string called strData, then:
var nums = strData.split("\r\n").map(function(item) {
    return parseInt(item, 10);
});
var strXmlOut = nums.map(function(n) {
    return "<SerialNumber><SN>" + n + "</SN></SerialNumber>";
}).join("\r\n");
and the single-expression version would be:
var xmlOut = strData.split("\r\n")
    .map(function(item) {return parseInt(item, 10);})
    .map(function(n) {return "<SerialNumber><SN>" + n + "</SN></SerialNumber>";})
    .join("\r\n");
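The find/replace idea from the Notepad++ answer can also be sketched as a single regex replacement in JavaScript. The function name wrapSerialNumbers is hypothetical. The sketch assumes every number sits alone on its own line, and it anchors on line boundaries instead of consuming the line ending, so the last number is handled even without a trailing empty line.

```javascript
// Wrap every line that consists only of digits in the XML tags.
// With the "m" flag, ^ and $ match at line boundaries, and JavaScript
// treats both \r and \n as line terminators, so CRLF input works too.
function wrapSerialNumbers(text) {
  return text.replace(
    /^(\d+)$/gm,
    "<SerialNumber>\r\n  <SN>$1</SN>\r\n</SerialNumber>"
  );
}

console.log(wrapSerialNumbers("1\r\n2\r\n3"));
```

Because the original line endings are never consumed, they stay in place between the generated elements.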

Regular expression for contents of parenthesis in Racket

How can I get contents of parenthesis in Racket? Contents may have more parenthesis. I tried:
(regexp-match #rx"((.*))" "(check)")
But the output has "(check)" three times rather than one:
'("(check)" "(check)" "(check)")
And I want only "check" and not "(check)".
Edit: for nested parenthesis, the inner block should be returned. Hence (a (1 2) c) should return "a (1 2) c".
In a regular expression, parentheses capture rather than match literal characters, so #rx"((.*))" simply captures everything twice. Thus:
(regexp-match #rx"((.*))" "any text")
; ==> ("any text" "any text" "any text")
The first element of the resulting list is the whole match, the second is the outer capturing group, and the third is the group nested inside it. If you want to match literal parentheses, you need to escape them:
(regexp-match #rx"\\((.*)\\)" "any text")
; ==> #f
(regexp-match #rx"\\((.*)\\)" "(a (1 2) c)")
; ==> ("(a (1 2) c)" "a (1 2) c")
Here the first element is the whole match; a match may start at any position in the searched string and, because .* is greedy, it extends as far as possible. The second element is the single capture.
This will fail if the string has additional sets of parentheses, e.g.:
(regexp-match #rx"\\((.*)\\)" "(1 2 3) (a (1 2) c)")
; ==> ("(1 2 3) (a (1 2) c)" "1 2 3) (a (1 2) c")
That's because the expression isn't aware of nesting. For that you need recursive regular expressions like those in Perl, with the (?R) syntax and friends, but Racket doesn't have this (yet?).
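Without recursive patterns, balanced groups can be found by counting parenthesis depth instead of using a regular expression. A minimal sketch of that idea follows, in JavaScript rather than Racket. The function name topLevelGroups is hypothetical, and ignoring unmatched closing parentheses is an assumption.

```javascript
// Return the contents of every top-level parenthesized group,
// keeping nested parentheses inside the returned text.
function topLevelGroups(text) {
  const groups = [];
  let depth = 0;
  let start = -1;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === "(") {
      if (depth === 0) start = i + 1; // content begins after the "("
      depth++;
    } else if (ch === ")") {
      if (depth === 0) continue; // unmatched ")": ignored in this sketch
      depth--;
      if (depth === 0) groups.push(text.slice(start, i));
    }
  }
  return groups;
}

console.log(topLevelGroups("(1 2 3) (a (1 2) c)")); // [ '1 2 3', 'a (1 2) c' ]
```

The same loop translates directly to Racket with a counter and string indices.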

How to use foreach inside subst in Tcl (template iteration)?

Does anyone know if there is a way to include a foreach loop in the subst command, to get a pseudo-template effect?
For example, the following works:
set lim 3
set table sldkfjsl
set sqlpat {
select * from $table limit $lim
}
set sqltext [subst $sqlpat]
But I would like to do something like
set sqlpat {
foreach i {1 2 3} {
select * from ${table}_$i limit $lim;
}
}
set sqltext [subst $sqlpat]
And have it give three separate lines of sql:
select * from sldkfjsl_1 limit 3
select * from sldkfjsl_2 limit 3
select * from sldkfjsl_3 limit 3
Any ideas? Thanks!
(EDIT: my solution, which shows how to build a strfor command that can be used in a subst template, in my case for passing both SQL and gnuplot code to their respective programs):
proc strfor { nms vals str } {
set outstr ""
foreach $nms $vals {
append outstr [subst $str]
}
return $outstr
}
set foostr1 {select $a from table_$b;\n}
set x [strfor {a b} {A 1 B 2 C 3 D 4} $foostr1]
set foostr2 {
blahsd line 1
blahg line 2
[strfor {a b} {A 1 B 2 C 3 D 4} {
forline1 $a $b
forline2 $b $a
}]
blah later
}
puts [subst $foostr2]
The looping commands in Tcl return an empty string rather than an accumulated result, so they are useless in a string which is processed with subst. It is of course possible to write an accumulating looping command as you have done. Another possibility is to use lmap, which does collect the result of each iteration into a list. However, the problem can be solved in an easier way.
set lim 3
set table sldkfjsl
We're going to make a list where every item is an instance of a literal template with variable substitutions. First we create an empty list:
set sqlpats {}
Then we loop for each value in the sequence 1..3. For every iteration we append an instance of the template to the list:
foreach i {1 2 3} {
lappend sqlpats "select * from ${table}_$i limit $lim"
}
(subst isn't necessary here, ordinary variable substitution is sufficient.)
Create a resulting string from the list, with newlines between each item (yep, I was wrong, one more command was needed):
join $sqlpats \n
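For comparison, the same collect-then-join pattern looks like this in a language with a built-in mapping construct. This is a JavaScript sketch, not Tcl; the helper name renderEach is hypothetical, and the variable names mirror the Tcl ones above.

```javascript
// Instantiate the template once per value and join the results with newlines.
function renderEach(values, template) {
  return values.map((value) => template(value)).join("\n");
}

const table = "sldkfjsl";
const lim = 3;
const sqltext = renderEach(
  [1, 2, 3],
  (i) => `select * from ${table}_${i} limit ${lim}`
);

console.log(sqltext);
```

This corresponds to the lmap approach mentioned earlier: the loop yields one string per iteration and the join happens once at the end.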
ETA:
subst is one of those commands which is nice to have, but that I for one almost never use. For most purposes, simpler measures will do. Once in a while though, a convoluted bit of code leaves a string unsubstituted. I pick up subst out of the drawer and zap! That said, the ability to selectively allow or disallow different kinds of substitutions alone makes subst very worthwhile.
Documentation: foreach, join, lappend, lmap, set, subst

Scala Regular Expression Oddity

I have this regular expression:
^(10)(1|0)(.)(.)(.)(.{18})((AB[^|]*)\||(AQ[^|]*)\||(AJ[^|]*)\||(AF[^|]*)\||(CS[^|]*)\||(CR[^|]*)\||(CT[^|]*)\||(CK[^|]*)\||(CV[^|]*)\||(CY[^|]*)\||(DA[^|]*)\||(AO[^|]*)\|)+AY([0-9]*)AZ(.*)$
To give it a bit of organization, there's really 3 parts:
// Part 1
^(10)(1|0)(.)(.)(.)(.{18})
// Part 2
// Optional Elements that begin with two characters and is terminated by a |
// May appear at most once
((AB[^|]*)\||(AQ[^|]*)\||(AJ[^|]*)\||(AF[^|]*)\||(CS[^|]*)\||(CR[^|]*)\||(CT[^|]*)\||(CK[^|]*)\||(CV[^|]*)\||(CY[^|]*)\||(DA[^|]*)\||(AO[^|]*)\|)+
// Part 3
AY([0-9]*)AZ(.*)$
Part 2 is the part that I'm having trouble with but I believe the current regular expression says any of these given elements will appear one or more times. I could have done something like: (AB.*?|) but I don't need the pipe in my group and wasn't quite sure how to express it.
This is my sample input - it's SIP2 if you've seen it before (please disregard checksum, I know it's not valid):
101YNY201406120000091911AOa|ABb|AQc|AJd|CKe|AFf|CSg|CRh|CTi|CVj|CYk|DAl|AY1AZAA71
This is my snippet of Scala code:
val regex = """^(10)(1|0)(.)(.)(.)(.{18})((AB[^|]*)\||(AQ[^|]*)\||(AJ[^|]*)\||(AF[^|]*)\||(CS[^|]*)\||(CR[^|]*)\||(CT[^|]*)\||(CK[^|]*)\||(CV[^|]*)\||(CY[^|]*)\||(DA[^|]*)\||(AO[^|]*)\|)+AY([0-9]*)AZ(.*)$""".r
val msg = "101YNY201406120000091911AOa|ABb|AQc|AJd|CKe|AFf|CSg|CRh|CTi|CVj|CYk|DAl|AY1AZAA71"
regex.findFirstMatchIn(msg) match {
case None => println("No match")
case Some(x) =>
for (i <- 0 to x.groupCount) {
println(i + " " + x.group(i))
}
}
This is my output:
0 101YNY201406120000091911AOa|ABb|AQc|AJd|CKe|AFf|CSg|CRh|CTi|CVj|CYk|DAl|AY1AZAA71
1 10
2 1
3 Y
4 N
5 Y
6 201406120000091911
7 DAl|
8 ABb
9 AQc
10 AJd
11 AFf
12 CSg
13 CRh
14 CTi
15 CKe
16 CVj
17 CYk
18 DAl
19 AOa
20 1
21 AA71
Note the entry that starts with 7. Can anyone explain why that's there?
I'm using Scala 2.10.4, but I believe regular expressions in Scala simply use Java's regular expression engine. I'm certainly open to other suggestions for parsing strings.
EDIT: Based on wingedsubmariner's response, I was able to fix my regular expression:
^(10)(1|0)(.)(.)(.)(.{18})(?:AB([^|]*)\||AQ([^|]*)\||AJ([^|]*)\||AF([^|]*)\||CS([^|]*)\||CR([^|]*)\||CT([^|]*)\||CK([^|]*)\||CV([^|]*)\||CY([^|]*)\||DA([^|]*)\||AO([^|]*)\|)+AY([0-9]*)AZ(.*)$
Basically adding ?: to indicate I was not interested in the group!
You get a matched group for each set of parentheses, the order being the order of the opening parentheses in the regex. Matched group 7 corresponds to the opening parenthesis that begins your "Part 2":
((AB[^|]*)\||(AQ[^|]*)\||(AJ[^|]*)\||(AF[^|]*)\||(CS[^|]*)\||(CR[^|]*)\||(CT[^|]*)\||(CK[^|]*)\||(CV[^|]*)\||(CY[^|]*)\||(DA[^|]*)\||(AO[^|]*)\|)+
^
|
This parenthesis
When a group is repeated, it keeps the value of the last piece of text it matched, which in this case is DAl| because that was the last piece of text to match the "Part 2" expression.
Here is a simpler example that demonstrates the behavior:
val regex = """((A)\||(B)\|)+""".r
val msg = "A|B|A|B|"
regex.findFirstMatchIn(msg) match {
case None => println("No match")
case Some(x) =>
for (i <- 0 to x.groupCount) {
println(i + " " + x.group(i))
}
}
Which produces:
0 A|B|A|B|
1 B|
2 A
3 B
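Since the question is open to other parsing approaches, one option is to capture the whole run of variable fields once and then split it on |, which avoids repeated capture groups altogether. A minimal sketch follows, in JavaScript rather than Scala. The function name parseSip2 and the result property names are hypothetical, and the sketch assumes every variable field starts with a two-letter uppercase code.

```javascript
// Parse the fixed header with one regex, then split the variable fields on "|".
function parseSip2(msg) {
  const m = /^(10)([01])(.)(.)(.)(.{18})((?:[A-Z]{2}[^|]*\|)+)AY([0-9]*)AZ(.*)$/.exec(msg);
  if (m === null) return null;

  // Map each two-letter field code to the text that follows it.
  const fields = {};
  for (const part of m[7].split("|")) {
    if (part === "") continue; // the trailing "|" yields one empty piece
    fields[part.slice(0, 2)] = part.slice(2);
  }
  return {
    flags: [m[2], m[3], m[4], m[5]],
    timestamp: m[6],
    fields: fields,
    sequence: m[8],
    checksum: m[9],
  };
}

const msg = "101YNY201406120000091911AOa|ABb|AQc|AJd|CKe|AFf|CSg|CRh|CTi|CVj|CYk|DAl|AY1AZAA71";
console.log(parseSip2(msg));
```

Unlike the fixed alternation, this accepts any two-letter code and keeps only the last value if a code is repeated, so stricter validation would have to be added separately.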