How to use regex to extract nested patterns - regex

Hi, I'm struggling with some regex.
I've got a string like this:
a:b||c:{d:e||f:g}||h:i
Basically these are name-value pairings. I want to be able to parse out the pairings so I get:
a:b
c:{d:e||f:g}
h:i
Then I can further parse the pairings contained in { } if required.
It is the nesting that is making me scratch my head. Are there any regex experts out there who can give me a hand?
Thanks,
Rob

Arbitrarily nested patterns are not regular. So, no, you can't just use regex to parse this.

Is there any limit on the depth of nesting in your strings? If not, your language is not regular and regular expressions are the wrong tool, as you are already discovering.
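For what it's worth, here is a minimal sketch of the non-regex route in Python: scan the string once, track the brace depth, and only split on || when the depth is zero. The function name split_top_level and the choice of Python are my own assumptions, and malformed input (unbalanced braces) is not handled.

```python
def split_top_level(text, sep="||"):
    """Split text on sep, ignoring separators nested inside { }."""
    parts = []
    depth = 0   # current brace nesting depth
    start = 0   # index where the current part begins
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif depth == 0 and text.startswith(sep, i):
            # Top-level separator: close the current part and skip past it.
            parts.append(text[start:i])
            i += len(sep)
            start = i
            continue
        i += 1
    parts.append(text[start:])
    return parts


def strip_braces(value):
    """Return the inside of a {...} value so it can be split again."""
    if value.startswith("{") and value.endswith("}"):
        return value[1:-1]
    return value
```

Calling split_top_level on the example string gives the three pairings from the question. To go one level deeper, pass the part after the first colon of a pairing through strip_braces and call split_top_level again.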

Related

Regex force group order

I'm new to regex and I have a question.
In this example, https://regex101.com/r/Iak7cF/1/, how do I force
src="wow"
to be in group 1, and
title="toto"
to be in group 2?
I want to capture this kind of text in any order only if it contains:
class="formula"
Am I doing it right?
You'd be better off using an HTML parser.
But if you really want to use regex, you have to use named groups to achieve what you want:
<img(?=[^>]*class="formula")(?=[^>]*(?<src>src="[^"]*"))(?=[^>]*(?<title>title="[^"]*"))[^>]*>
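To see the named-group idea run, here is a small sketch in Python. Note the assumptions: Python spells named groups (?P<name>...) rather than (?<name>...), the helper name extract_attrs is made up for illustration, and [^>]* and [^"]* are used so that a match cannot run past the end of the tag or of an attribute value.

```python
import re

# Each lookahead starts right after "<img", so the attributes can appear
# in any order; the named groups keep src and title apart regardless.
IMG_RE = re.compile(
    r'<img(?=[^>]*class="formula")'
    r'(?=[^>]*(?P<src>src="[^"]*"))'
    r'(?=[^>]*(?P<title>title="[^"]*"))'
    r'[^>]*>'
)


def extract_attrs(html):
    """Return (src, title) for the first matching <img>, or None."""
    match = IMG_RE.search(html)
    if match is None:
        return None
    return match.group("src"), match.group("title")
```

The groups are read by name instead of by number, so it no longer matters which attribute comes first in the tag.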
Regular expressions are very flexible and powerful, but in general, they are not the right tool for parsing XML, HTML, or XHTML. From WinBatch:
Regular Expressions are only good for parsing text that is tightly defined. Since Regular Expressions don't really understand the context of matches, they can be fooled in a big way if the structure of the text changes. In particular, Regular Expressions have difficulty with hierarchy.
PerlMonks has a detailed explanation of why regex is not a good solution for all but the simplest of cases. They summarize it like this:
So I hope it is clear: Please, don't try to parse arbitrary XML/HTML with regexes!

Using a regular expression to insert text in a match

Regular Expressions are incredible. I'm in my regex infancy so help solving the following would be greatly appreciated.
I have to search through a string to match for a P character that's not surrounded by operators, power or negative signs. I then have to insert a multiplication sign. Case examples are:
33+16*55P would become 33+16*55*P
2P would become 2*P
P( 33*sin(45) ) would become P*(33*sin(45))
I have written some regex that I think handles this although I don't know how using regex I can insert a character:
The regex I've written is:
[^\^\+\-\/\*]?P+[^\^\+\-\/\*]
The language where the RegEx will be used is ActionScript 3.
A live example of the regex can be seen at:
http://www.regexr.com/39pkv
I would be massively grateful if someone could show me how to insert a multiplication sign in the middle of the match, i.e. P2 becomes P*2 and 22.5P becomes 22.5*P.
ActionScript 3 has search, match and replace functions that all utilise regular expressions. I'm unsure how I'd use string.replace( expression, replaceText ) in this context.
Many thanks in advance
Welcome to the wonder (and inevitable frustration that will lead to tearing your hair out) that is regular expressions. You should probably read over the documentation on using regular expressions in ActionScript, as well as this similar question.
You'll need to combine RegExp.test() with the String.replace() function. I don't know ActionScript, so I don't know if it will work as is, but based on the documentation linked above, the code below should be a good start for testing and for getting an idea of what your solution might look like. I think @Vall3y is right. To get the replace right, you'd want to first check for anything leading up to a P, then for anything after a P. So two functions are probably easier to get right without getting too fancy with the regex:
private function multiplyBeforeP(str:String):String {
    // A regex literal avoids string-escaping problems; "g" replaces every match.
    // The character before P is required, so "+P" is not turned into "+*P".
    var pattern:RegExp = /([^\^+\-\/*])P/g;
    return str.replace(pattern, "$1*P");
}
private function multiplyAfterP(str:String):String {
    // Insert "*" between P and a following character that is not an operator.
    var pattern:RegExp = /P([^\^+\-\/*])/g;
    return str.replace(pattern, "P*$1");
}
Regex is used to find patterns in strings. It cannot be used to manipulate them. You will need to use ActionScript for that.
Many programming languages have a string.replace method that accepts a regex pattern. Since you have two cases (inserting before and after the P), a simple solution would be to split your regex into two ([^\^\+\-\/\*]?P+ and P+[^\^\+\-\/\*] for example; these might need adjustment) and replace the P in each match with the corresponding string ("*P" and "P*").
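The two-pattern idea can be sketched like this. It is in Python rather than ActionScript, and it uses lookbehind and lookahead so that only the P itself is replaced. The names are mine, and excluding parentheses and whitespace on the relevant side is my own assumption, not something taken from the question.

```python
import re

# P preceded by a character that is not an operator, "(" or whitespace.
BEFORE_P = re.compile(r'(?<=[^\^+\-/*(\s])P')
# P followed by a character that is not an operator, ")" or whitespace.
AFTER_P = re.compile(r'P(?=[^\^+\-/*)\s])')


def insert_multiplication(expr):
    """Insert '*' wherever P is multiplied implicitly."""
    expr = BEFORE_P.sub('*P', expr)   # 2P  -> 2*P
    return AFTER_P.sub('P*', expr)    # P2  -> P*2
```

Because the lookarounds do not consume the neighbouring character, there is no need for $1-style backreferences in the replacement text. Unlike the example in the question, this sketch leaves existing spaces untouched.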

matching table tag by regular expression in php

I need to match a substring in PHP. The substring looks like this:
<table class="tdicerik" id="dgVeriler"
I wrote a regular expression for it, <table\s*\sid=\"dgVeriler\", but it did not work. Where is my problem?
You forgot a dot:
<table\s.*\sid="dgVeriler"
would have worked.
<table\s+.*?\s+id="dgVeriler"
would have been better (making the repetition lazy, matching as little as possible).
<table\s+[^>]*?\s+id="dgVeriler"
would have been better still (making sure that we don't accidentally match outside of the <table> tag).
And not trying to parse HTML with regular expressions, using a parser instead, would probably have been best.
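As a rough illustration of the parser route, here is a sketch using Python's standard html.parser module (Python rather than PHP is my choice for the example; the class and function names are made up). It looks for the first <table> start tag whose id attribute equals the wanted value.

```python
from html.parser import HTMLParser


class TableIdFinder(HTMLParser):
    """Remember the attributes of the first <table> with the wanted id."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found_attrs = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names are lowercased.
        if tag == "table" and self.found_attrs is None:
            attr_map = dict(attrs)
            if attr_map.get("id") == self.wanted_id:
                self.found_attrs = attr_map


def find_table(html, wanted_id):
    """Return the attribute dict of the matching table, or None."""
    finder = TableIdFinder(wanted_id)
    finder.feed(html)
    return finder.found_attrs
```

This does not care about attribute order or about how much whitespace sits between attributes, which is exactly where the hand-written patterns tend to break.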
I don't know what you want to get, but try this:
<table\s*.*id=\"dgVeriler\"

What is a better way to write this regular expression?

I am converting XML children into the element parameters and have a dirty regex script I used in Textmate. I know that dot (.) doesn't search for newlines, so this is how I got it to resolve.
Search
language="(.*)"
(.*)<education>(.*)(\n)?(.*)?(\n)?(.*)?(\n)?(.*)?</education>
(.*)<years>(.*)</years>
(.*)<grade>(.*)</grade>
Replace
grade="$13" language="$1" years="$11">
<education>$3$4$5$6$7$8$9</education>
I know there's a better way to do this. Please help me build my regex skills further.
Use an XML parser; don't use regex to parse XML.
If there are no other tags inside the <education> element, I would change that part to:
<education>([^<>]*)</education>
If possible, I would use the same technique everywhere else you're using .*. In the case of the language attribute, it would take this form:
language="([^"]*)"
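A quick way to convince yourself that the negated character class handles the newline problem: unlike the dot, [^<>] matches newlines too, so no (\n)? padding is needed. The sketch below uses Python instead of TextMate, and the sample XML and function names are invented for illustration.

```python
import re

EDUCATION_RE = re.compile(r'<education>([^<>]*)</education>')
LANGUAGE_RE = re.compile(r'language="([^"]*)"')

# Hypothetical sample; the element content spans two lines.
SAMPLE = '<person language="en">\n<education>BSc\nMSc</education>\n</person>'


def extract_education(xml_text):
    """Return the text inside <education>, newlines included, or None."""
    match = EDUCATION_RE.search(xml_text)
    return match.group(1) if match else None


def extract_language(xml_text):
    """Return the value of the language attribute, or None."""
    match = LANGUAGE_RE.search(xml_text)
    return match.group(1) if match else None
```

Each pattern stops at the first character that cannot belong to the value (a < or a closing quote), so it cannot overshoot the way a greedy .* can.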

Algorithm to get a Regex

Something like this is on my mind: I put one or a few strings in, and the algorithm shows me a matching regex.
Is there an "easy" way to do this, or does something like this already exist?
Edit 1: Yes, I'm trying to find a way to generate regex.
Edit 2: Regulazy is not what I am looking for. The common use for the code I want is to find a correct RegEx; for example, article numbers:
I put in 123456, the regex should be \d{6}
I put in nb-123456, the regex should be \w{2}-\d{6}
If you have Emacs you can use regexp-opt. For example, evaluating:
(regexp-opt (list "my" "list" "of" "some" "strings" "to" "search"))
returns
"list\\|my\\|of\\|s\\(?:earch\\|ome\\|trings\\)\\|to"
Perl can do it: http://www.hakank.org/makeregex/
So does Ruby: http://www.toolbox-mag.de/data/makeregex.html
Note: this is not a perfect solution.
And there is a CLI tool: txt2regex.
There was txt2re, once upon a time...
It sounds like you want an algorithm to generate a regular grammar based on some samples. In a lot of cases, there are many possible grammars for a given set of examples; there can even be infinitely many. Of course, the possibilities can be limited by a second set of required non-matches, which can limit it to zero possibilities if the non-matching strings are too inclusive.
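To make the ambiguity concrete, here is one deliberately naive generator in Python. It picks a single grammar out of the many possible ones by classifying each character as a digit, a word character or a literal, and collapsing runs into counted repetitions. The classification rule and the names are my own assumptions; a different rule would give a different but equally valid regex.

```python
import re
from itertools import groupby

# Characters that need a backslash when used literally in a pattern.
META_CHARS = set('.^$*+?{}[]\\|()')


def char_class(ch):
    """Map one character to the regex token this sketch uses for it."""
    if '0' <= ch <= '9':
        return r'\d'
    if ch.isalpha() or ch == '_':
        return r'\w'
    return '\\' + ch if ch in META_CHARS else ch


def guess_regex(sample):
    """Build one possible regex that matches the shape of sample."""
    parts = []
    for token, run in groupby(sample, key=char_class):
        count = len(list(run))
        parts.append(token if count == 1 else f'{token}{{{count}}}')
    return ''.join(parts)
```

For the two article numbers in the question this produces the patterns the asker expected, but only because the classification rule happens to agree with what they had in mind.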
txt2re does something like this.
How about the following (matches every string)?
.*
I think that Regulazy by Roy Osherove does this to a certain extent, or it may be Regulator. Both are on this page:
http://weblogs.asp.net/rosherove/pages/tools-and-frameworks-by-roy-osherove.aspx
If your input strings are not random but follow some rules, you can use a parser generator (e.g. jflex) to build a regex generator that produces a regex for the given strings.
Look at txt2re.
This site holds a form that takes a sample string and generates a regex pattern that can match the given string.
Then it generates the corresponding script for the following languages: Perl, PHP, Python, Java, Javascript, ColdFusion, C, C++, Ruby, VB, VBScript, J#.net, C#.net, C++.net, VB.net