Regular Expressions: Lookback to only the first occurrence (non-greedy lookback?)

Here's the problem:
XML:
<userPermissions>
<enabled>true</enabled>
<name>ViewPublicReports</name>
</userPermissions>
<userPermissions>
<enabled>true</enabled>
<name>ViewRoles</name>
</userPermissions>
<userPermissions>
<enabled>true</enabled>
<name>ViewSetup</name>
</userPermissions>
What I'm trying to match is:
<userPermissions>
<enabled>true</enabled>
<name>ViewRoles</name>
</userPermissions>
Every pattern I've managed to put together starts matching at the first <userPermissions> block instead of the one I want:
(?<=<userPermissions>)[\s\S]+?ViewRoles[\s\S]*?<\/userPermissions>
I'm not quite sure how to make the backwards match from "ViewRoles" non-greedy.
Thanks in advance for your help.
*Edit: I'm using a tool that deploys metadata between Salesforce instances, which are captured as XML. The tool provides a "find/replace" functionality that uses regex for the "find." I don't have the option of using an XML parser.

This pattern matches that tag:
<userPermissions>(?:(?!</userPermissions>)[\S\s])*?ViewRoles[\S\s]*?</userPermissions>
Formatted:
<userPermissions>
(?:
(?! </userPermissions> )
[\S\s]
)*?
ViewRoles
[\S\s]*?
</userPermissions>
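
To see the difference between the two patterns, here is a minimal Python sketch. It assumes Python's re module behaves like the regex engine in the deployment tool for these constructs, which is not confirmed by the question; the variable names are only illustrative.

```python
import re

# Sample data copied from the question.
xml = """<userPermissions>
<enabled>true</enabled>
<name>ViewPublicReports</name>
</userPermissions>
<userPermissions>
<enabled>true</enabled>
<name>ViewRoles</name>
</userPermissions>
<userPermissions>
<enabled>true</enabled>
<name>ViewSetup</name>
</userPermissions>"""

# Pattern from the question: the lazy [\s\S]+? may still cross block boundaries.
naive = re.compile(r"(?<=<userPermissions>)[\s\S]+?ViewRoles[\s\S]*?<\/userPermissions>")

# Pattern from this answer: each character is consumed only if a closing
# tag does not start at that position, so the match stays inside one block.
tempered = re.compile(
    r"<userPermissions>(?:(?!</userPermissions>)[\S\s])*?ViewRoles[\S\s]*?</userPermissions>"
)

naive_match = naive.search(xml).group(0)
tempered_match = tempered.search(xml).group(0)

print("ViewPublicReports" in naive_match)     # the naive match spills over
print(tempered_match)
```

The regex engine always prefers the leftmost starting position, so laziness alone cannot move the start of the match forward; the negative lookahead is what forces the attempt at the first block to fail.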

As has already been said, the correct way to extract this would be to use an XML parser. However, you can also use the following regex:
(.+\n){2}.+ViewRoles.+\n.+
Which actually matches the following structure:
2 rows without restrictions
a row that includes "ViewRoles"
another row without restrictions
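
A quick check of this line-based pattern, sketched in Python under the assumption that the data is laid out exactly as in the question (one tag per line, "\n" line endings). If the layout changes, this pattern is likely to break.

```python
import re

# Sample data copied from the question.
xml = """<userPermissions>
<enabled>true</enabled>
<name>ViewPublicReports</name>
</userPermissions>
<userPermissions>
<enabled>true</enabled>
<name>ViewRoles</name>
</userPermissions>
<userPermissions>
<enabled>true</enabled>
<name>ViewSetup</name>
</userPermissions>"""

# Two arbitrary lines, a line containing "ViewRoles", then one more line.
# "." does not match "\n" by default, so each ".+" stays on a single line.
line_based = re.compile(r"(.+\n){2}.+ViewRoles.+\n.+")

found = line_based.search(xml).group(0)
print(found)
```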

Related

regex to match link inside xml with last mod

<?xml version='1.0' encoding='UTF-8'?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://google.com/2020/08/this1.html</loc><lastmod>2020-08-06T11:30:55Z</lastmod></url>
<url><loc>https://google.com/2020/08/this2.html</loc><lastmod>2020-08-05T11:30:06Z</lastmod></url>
<url><loc>https://google.com/2020/08/this3.html</loc><lastmod>2020-08-06T11:29:25Z</lastmod></url>
</urlset>
I'm trying to extract from the XML above the links whose lastmod is 2020-08-06.
My regex is https:.+2020-08-05.+<\/url
but it ends up matching everything from the first link to the last.
I want to match only
<url><loc>https://google.com/2020/08/this1.html</loc><lastmod>2020-08-06T11:30:55Z</lastmod></url>
<url><loc>https://google.com/2020/08/this3.html</loc><lastmod>2020-08-06T11:29:25Z</lastmod></url>

/<loc>(.+)<\/loc>.*2020-08-06/g
capturing the group between loc tags
Demo and explanation here:
https://regex101.com/r/HBvG3K/8
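
The same idea can be tried out in Python. This is only a sketch: it assumes each <url> entry sits on its own line, as in the question, and the variable names are made up for the example.

```python
import re

# Sample sitemap entries copied from the question.
sitemap = """<?xml version='1.0' encoding='UTF-8'?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://google.com/2020/08/this1.html</loc><lastmod>2020-08-06T11:30:55Z</lastmod></url>
<url><loc>https://google.com/2020/08/this2.html</loc><lastmod>2020-08-05T11:30:06Z</lastmod></url>
<url><loc>https://google.com/2020/08/this3.html</loc><lastmod>2020-08-06T11:29:25Z</lastmod></url>
</urlset>"""

# "." does not cross newlines by default, so each match stays on one line.
# The capture group holds the text between the <loc> tags.
pattern = re.compile(r"<loc>(.+)</loc>.*2020-08-06")

links = pattern.findall(sitemap)
print(links)
```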

A very easy and stupid regex - see regexr:
.*<lastmod>2020-08-06.*

Regex to find subelement in XML

I am using the Regular Expression search feature in Notepad++ to find matches in a few hundred files.
My goal is to find a parent/child combo in each. I don't care a lot about what specifically is selected (parent and child or just child). I just want to know if the parent contains a specific child.
I want to find a <description> parent element that also has a <description> child element.
Example of what it should find (since one of the sub-elements is a <description>):
<description>
<otherstuff>
</otherstuff>
<something>
</something>
<description>
</description>
<otherstuff>
</otherstuff>
</description>
Example of what it should NOT find:
<description>
<otherstuff>
</otherstuff>
<something>
</something>
<notadescription>
</notadescription>
<otherstuff>
</otherstuff>
</description>
Each may have other children and sub children as well. They both also may be in the same document.
If I search for this:
<description>(.*)<description>(.*)</description>
It selects too much, because it will select another top level when I only want it to select the child for that 2nd piece.

You said you're working with Notepad++, so here is a way to go:
Ctrl+F
Find what: <description>(?:(?!</description).)*<description>(?:(?!<description>).)*</description>
check Match case
check Wrap around
check Regular expression
CHECK . matches newline
Explanation:
<description> # opening tag
(?:(?!</description).)* # tempered greedy token: any characters, as long as no closing tag is crossed
<description> # nested opening tag
(?:(?!<description>).)* # tempered greedy token: any characters, as long as no further opening tag is crossed
</description> # closing tag
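
For anyone who wants to test the pattern outside Notepad++, here is a rough Python equivalent. It assumes that re.DOTALL plays the role of the ". matches newline" checkbox; the two sample strings are modeled on the examples in the question, with the outer element closed properly.

```python
import re

# Modeled on the "should find" example: a <description> nested in a <description>.
has_child = """<description>
<otherstuff>
</otherstuff>
<something>
</something>
<description>
</description>
<otherstuff>
</otherstuff>
</description>"""

# Modeled on the "should NOT find" example: no nested <description>.
no_child = """<description>
<otherstuff>
</otherstuff>
<something>
</something>
<notadescription>
</notadescription>
<otherstuff>
</otherstuff>
</description>"""

pattern = re.compile(
    r"<description>(?:(?!</description).)*<description>(?:(?!<description>).)*</description>",
    re.DOTALL,  # same effect as checking ". matches newline"
)

found_in_first = pattern.search(has_child) is not None
found_in_second = pattern.search(no_child) is not None
print(found_in_first, found_in_second)
```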

You should not use (.*) for the inner part, because it is greedy.
Here is an example of why you shouldn't use it in your case:
<description>
<otherstuff>
</otherstuff>
<description>
<description>hello</description>
</description>
</description>
Suppose that we use <description>(.*)<description>(.*)</description> here.
The match will run on to the last closing tag:
<description>
<description>hello</description>
</description>
</description>
So if you want to match only what is inside the 2nd description, you should use (.*?), which is called non-greedy (or lazy).
Using <description>(.*)<description>(.*?)</description>, the match will end here:
<description>
<description>hello</description> # end of match
# the remaining </description> tags are left out, because (.*?) stops at the first closing tag
So (.*?) stops matching as soon as it finds the first closing tag, whereas (.*) is greedy and looks for the largest match possible.
If you use <description>(.*)<description>(.*?)</description> it will be fine, because it will match only what is inside the sub-description in your case.
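
The difference between the greedy and the lazy quantifier is easiest to see on a short string. The snippet below is a small Python illustration with made-up sample text, not the data from the question.

```python
import re

# Made-up one-line sample with two sibling elements.
text = "<description>a</description><description>b</description>"

# Greedy: (.*) grabs as much as possible, so the group runs to the LAST closing tag.
greedy = re.search(r"<description>(.*)</description>", text).group(1)

# Lazy: (.*?) grabs as little as possible, so the group stops at the FIRST closing tag.
lazy = re.search(r"<description>(.*?)</description>", text).group(1)

print(greedy)
print(lazy)
```

Note that laziness only affects where a group ends; it does not change where the overall match starts.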

I'm guessing that we'd be designing an expression to exclude <notadescription>, such as:
<description>(?!<notadescription>)[\s\S]*<\/description>
which if we would be capturing the description element, we might want a capturing group:
(<description>(?!<notadescription>)[\s\S]*<\/description>)

How to get the index by regular expression in ANT

I have a string with a version such as .v_september (it varies every month). From this I want to take the value after the underscore, limited to its first 3 letters, which means "sep".
With the regex .v_(.*) I get the complete month, but I'm not able to get just the first 3 letters.
Can someone help me achieve this in Apache Ant?
Thanks!

Regex functions on properties are a bit awkward in native Ant (as opposed to working with text within files). Ant-contrib has the propertyregex task, but I try to avoid ant-contrib whenever possible.
Instead, it can be accomplished with the loadfile task and a nested filter:
<property name="version" value=".v_september" />
<loadfile property="version.month.short">
<propertyresource name="version" />
<filterchain>
<tokenfilter>
<replaceregex pattern="\.v_(.{3}).*" replace="\1" />
</tokenfilter>
</filterchain>
</loadfile>
<echo message="${version.month.short}" />
Regarding the regex pattern, note how it needs to end with .*. This is because Ant doesn't have a "match" function that simply returns the content of a capture group. It's just running a replacement, so we need to replace everything in the string that isn't part of the group.
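
The role of the trailing .* can be checked with any regex engine that does replacements. Here is a small Python sketch, where re.sub stands in for Ant's replaceregex filter (an analogy for illustration, not the Ant API):

```python
import re

version = ".v_september"

# With the trailing ".*", the whole string is replaced by the captured group.
with_tail = re.sub(r"\.v_(.{3}).*", r"\1", version)

# Without it, only the ".v_sep" prefix is replaced and "tember" is left behind.
without_tail = re.sub(r"\.v_(.{3})", r"\1", version)

# A plain search, where the engine offers one, can read the group directly.
direct = re.search(r"\.v_(.{3})", version).group(1)

print(with_tail, without_tail, direct)
```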

.* will capture everything; to limit the capture to only three characters, write {3} instead of *. You should also escape the . at the beginning of your regex so that it matches only a literal dot. You can use this regex and read the result from group 1:
\.v_(.{3})

Regex - Multiline extraction

Using the regex below, I'm able to extract the model_name value when nfc_support has value="true" in a few instances. However, I'm unable to get it to match in other instances, as shown below. Any help in getting it to match in both instances would be greatly appreciated.
EX:
<capability name=\"model_name\"[A-Za-z1-9"=();,._/<>\s]*<capability name=\"nfc_support\" value=\"true\"/>
Will work with:
<capability name="model_name" value="T11"/>
<capability name="brand_name" value="Turkcell"/>
<capability name="marketing_name" value="Campaign"/>
</group>
<group id="chips">
<capability name="nfc_support" value="true"/>
</group>
But cannot match this:
<capability name="model_name" value="U8650"/>
<capability name="brand_name" value="Huawei"/>
<capability name="marketing_name" value="Sonic"/>
</group>
<group id="chips">
<capability name="nfc_support" value="true"/>

Your regex will match everything between the first model_name and the last nfc_support = true, because you use the greedy * quantifier. This is a problem if you have multiple occurrences of nfc_support in the same string you are applying the regex to, as it will keep going until it reaches the last <capability name="nfc_support" value="true"/>. A better practice for matching text that may appear multiple times is to use the reluctant (lazy) quantifier *? to avoid matching too much.
Assuming all lines will follow a format of model_name, brand_name, marketing_name, /group, group id, then nfc_support, a regex that enforces this format is:
(?s)<capability name=\"model_name\" value=\"(.*?)\"/>\n<capability name=\"brand_name\" value=\"(.*?)\"/>\n<capability name=\"marketing_name\" value=\"(.*?)\"/>\n</group>\n<group id=\"chips\">\n<capability name=\"nfc_support\" value=\"true\"/>
Apologies in advance if there are typos in this regex, but you get the gist of it...
This regex will store the values of model_name, brand_name, and marketing_name into groups $1, $2, and $3, respectively, only if nfc_support is "true". The (?s) flag enables dotall mode, so that . also matches newlines.
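
As a sketch of the format-enforcing approach, here is a Python version run against both samples from the question. One deliberate change, which is an assumption of this example rather than part of the answer: the values are matched with [^"]* instead of (.*?), so that a value can never run past its closing quote when a device without NFC support appears in the input.

```python
import re

# Both samples from the question, joined into one string.
devices = """<capability name="model_name" value="T11"/>
<capability name="brand_name" value="Turkcell"/>
<capability name="marketing_name" value="Campaign"/>
</group>
<group id="chips">
<capability name="nfc_support" value="true"/>
</group>
<capability name="model_name" value="U8650"/>
<capability name="brand_name" value="Huawei"/>
<capability name="marketing_name" value="Sonic"/>
</group>
<group id="chips">
<capability name="nfc_support" value="true"/>
"""

# One line of the expected layout per pattern fragment.
pattern = re.compile(
    r'<capability name="model_name" value="([^"]*)"/>\n'
    r'<capability name="brand_name" value="([^"]*)"/>\n'
    r'<capability name="marketing_name" value="([^"]*)"/>\n'
    r'</group>\n'
    r'<group id="chips">\n'
    r'<capability name="nfc_support" value="true"/>'
)

# Each result is a (model_name, brand_name, marketing_name) tuple.
nfc_devices = pattern.findall(devices)
print(nfc_devices)
```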

Forgive me if I'm wrong, but it looks like your expression of:
[A-Za-z1-9"=();,._/<>\s]
does not account for a 0 in your character class (showing as 1-9) and should thus be:
[A-Za-z0-9"=();,._/<>\s]
EDIT: This is in regards to your example of a non-match for "model_name" value="U8650"
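
This is easy to confirm in Python. The sketch below runs the original character class and the corrected one against the sample that failed to match; only the 1-9 versus 0-9 range differs between the two patterns.

```python
import re

# The sample from the question that the original regex could not match.
sample = """<capability name="model_name" value="U8650"/>
<capability name="brand_name" value="Huawei"/>
<capability name="marketing_name" value="Sonic"/>
</group>
<group id="chips">
<capability name="nfc_support" value="true"/>
"""

tail = r'<capability name="nfc_support" value="true"/>'

# Original class: digits 1-9 only, so the "0" in "U8650" stops the match.
original = re.compile(r'<capability name="model_name"[A-Za-z1-9"=();,._/<>\s]*' + tail)

# Corrected class: digits 0-9.
fixed = re.compile(r'<capability name="model_name"[A-Za-z0-9"=();,._/<>\s]*' + tail)

original_matches = original.search(sample) is not None
fixed_matches = fixed.search(sample) is not None
print(original_matches, fixed_matches)
```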

How to find a word within text using XSLT 2.0 and REGEX (which doesn't have \b word boundary)?

I am attempting to scan a string of words and look for the presence of a particular word(case insensitive) in an XSLT 2.0 stylesheet using REGEX.
I have a list of words that I wish to iterate over and determine whether or not they exist within a given string.
I want to match on a word anywhere within the given text, but I do not want to match within a word (i.e. A search for foo should not match on "food" and a search for bar should not match on "rebar").
XSLT 2.0 REGEX does not have a word boundary(\b), so I need to replicate it as best I can.

You can use alternation to avoid repetition:
<xsl:if test="matches($prose, concat('(^|\W)', $word, '($|\W)'),'i')">
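
The same boundary emulation can be tried out quickly in Python before wiring it into a stylesheet. This is a sketch only: contains_word is a hypothetical helper name, and the re.escape call is an addition of this example (the XSLT above concatenates $word unescaped).

```python
import re

def contains_word(prose: str, word: str) -> bool:
    """Return True if word occurs in prose as a whole word, ignoring case."""
    # (^|\W) and ($|\W) emulate \b: the word must be preceded and followed
    # by either the string boundary or a non-word character.
    pattern = r"(^|\W)" + re.escape(word) + r"($|\W)"
    return re.search(pattern, prose, flags=re.IGNORECASE) is not None

print(contains_word("all Foo is bar", "foo"))
print(contains_word("I like food", "foo"))
print(contains_word("steel rebar", "bar"))
```

Unlike \b, this pattern consumes the neighbouring non-word characters, which does not matter for a yes/no test but would matter in a replace.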

If your XSLT 2.0 processor is Saxon 9 then you can use Java regular expression syntax (including \b) with the functions matches, tokenize and replace by starting the flags argument with an exclamation mark:
<xsl:value-of select="matches('all foo is bar', '\bfoo\b', '!i')"/>
Michael Kay mentioned that option recently on the XSL mailing list.
