Regex matching the whole XML file

I need a regular expression that, given the following XML, will give me all the products (productos) that have 'Bebidas' as their category (categoria). I have to do this in Sublime Text, so a regular expression is my only option (no dedicated XML parser allowed):
XML file: www.ethgf.com/electricos.xml
I have a problem when I use (?s)<producto>(.+?Bebidas.+?)<\/producto>, because it highlights almost the entire XML (from the first 'producto' tag through the last closing tag).

Since the question is about selecting the whole <product> nodes, you can use the following regex:
(?s)<product>(?:\s*<(\w+)>[^<]*?<\/\1>\s*)*?\s*<category>Drinks<\/category>(?:\s*<(\w+)>[^<]*?<\/\2>\s*)*?\s*<\/product>
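It highlights every <product> node that contains the Drinks category, even if the child elements do not follow a strict order. Because every other child must be a simple <tag>text</tag> element, a match cannot run past the closing </product> of the node it started in.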
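As a rough check outside Sublime Text, here is a Python sketch that runs the answer's pattern against a small, made-up <product> document (the sample XML is an assumption, not the linked file). It also runs the question's naive pattern, translated to the English tag names, to show why that one over-matches: the lazy .+? stops at the first Drinks after a <product>, but that Drinks can belong to a later product. Python's re accepts the (?s) flag and the \1/\2 backreferences used here.

```python
import re

# Hypothetical sample data: three products, two of them in the Drinks category.
XML = """<products>
  <product>
    <name>Cola</name>
    <category>Drinks</category>
    <price>1.20</price>
  </product>
  <product>
    <name>Bread</name>
    <category>Bakery</category>
  </product>
  <product>
    <category>Drinks</category>
    <name>Water</name>
  </product>
</products>"""

# The answer's pattern, split over several lines for readability.
# Each child element must be a simple <tag>text</tag> pair, so a match
# can never run past the closing </product> of its own node.
STRICT = re.compile(
    r"(?s)<product>(?:\s*<(\w+)>[^<]*?<\/\1>\s*)*?"
    r"\s*<category>Drinks<\/category>"
    r"(?:\s*<(\w+)>[^<]*?<\/\2>\s*)*?\s*<\/product>"
)

# The question's naive pattern, translated to the English tag names.
NAIVE = re.compile(r"(?s)<product>(.+?Drinks.+?)</product>")

strict_matches = [m.group(0) for m in STRICT.finditer(XML)]
naive_matches = [m.group(0) for m in NAIVE.finditer(XML)]

# Pull the product name out of each strict match.
names = [re.search(r"<name>([^<]*)</name>", block).group(1) for block in strict_matches]

print(names)                        # ['Cola', 'Water']
print("Bread" in naive_matches[1])  # True: the naive match swallowed the Bakery product
```

The second naive match starts at the Bread product's <product> tag and only stops at the closing tag of the Water product, which is the same over-highlighting described in the question.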

Related

Regex to get a comma within XML tags

I am new to regex. I have XML like this:
<Root xmlns="rooter"><add>This is an example, test</add></Root>, 123, test, 8765
I want to find only the comma that is inside the XML tags.
I have tried
<Root.*\,.*</Root>
and
<Root.*>(\,)
Both return the XML tag, but I want to match only the comma so I can replace it with another character.
I want to do this replacement in the Atom text editor. After the replacement, the text should look like this:
<Root xmlns="rooter"><add>This is an example# test</add></Root>, 123, test, 8765
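The following regex works as long as the text has the same format as shown above:
,(?=[^\/<]*<\/)
It uses a lookahead: a comma matches only if it is followed by characters other than / or < and then a closing tag (</). The commas after </Root> are never followed by </, so they are left alone. For more details on lookarounds, see: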
https://www.regular-expressions.info/lookaround.html
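A quick way to check the lookahead outside the editor is Python's re.sub. The slash escape (\/) is only needed in slash-delimited regex literals, so this sketch drops it. The second call illustrates the format assumption mentioned in the answer: a / between the comma and the closing tag stops the lookahead, so that comma is not replaced.

```python
import re

# A comma followed by text containing no '/' or '<', then the start of a closing tag "</".
COMMA_IN_TAG = re.compile(r",(?=[^/<]*</)")

text = '<Root xmlns="rooter"><add>This is an example, test</add></Root>, 123, test, 8765'
result = COMMA_IN_TAG.sub("#", text)
print(result)
# <Root xmlns="rooter"><add>This is an example# test</add></Root>, 123, test, 8765

# Limitation: a '/' between the comma and the closing tag blocks the lookahead.
print(COMMA_IN_TAG.sub("#", "<a>x, 1/2</a>"))  # <a>x, 1/2</a> (unchanged)
```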

REGEXP_LIKE to match XML tag content that is not a specific string

I'm using REGEXP_LIKE and looking for a regular expression that checks whether the value of a specific tag is not a specific string.
For example:
<person>
<name>John</name>
<age>40</age>
</person>
My goal is to validate that the name tag's value is not John, so REGEXP_LIKE would return true for input XML documents where the name is not John.
Thank you in advance for the help!
A quick and easy way to do this is to simply negate the regex search:
... WHERE NOT REGEXP_LIKE(column_name, '<name>John</name>')
However, as should be mentioned every time a question like this is posted, it's generally a bad idea to parse XML with regex. If you find yourself constructing more complex regex patterns to search this XML data, then you should:
Use an XML parser instead of regular expressions, or
Change how you are storing the data! Make person.age a separate table column; don't bung the entire XML structure into a single place.
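REGEXP_LIKE runs inside the database, so the sketch below only mirrors its logic in Python. The first function corresponds to the negated query above; the second follows the advice to use an XML parser instead, here the standard library's xml.etree.ElementTree. The sample documents are made up; the third one is a case where the two checks disagree, which is one reason the regex approach gets fragile.

```python
import re
import xml.etree.ElementTree as ET

JOHN_TAG = re.compile(r"<name>John</name>")

def not_john_regex(doc: str) -> bool:
    """Same logic as NOT REGEXP_LIKE(column_name, '<name>John</name>')."""
    return JOHN_TAG.search(doc) is None

def not_john_parsed(doc: str) -> bool:
    """Parser-based check: read the <name> child and compare its trimmed text."""
    name = ET.fromstring(doc).findtext("name")
    return name is None or name.strip() != "John"

john = "<person>\n<name>John</name>\n<age>40</age>\n</person>"
mary = "<person>\n<name>Mary</name>\n<age>35</age>\n</person>"
spaced = "<person>\n<name> John </name>\n<age>40</age>\n</person>"

for doc in (john, mary, spaced):
    print(not_john_regex(doc), not_john_parsed(doc))
# False False
# True True
# True False   <- the regex misses John when the tag text has extra whitespace
```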

KML: Inserting a Specific Tag between Two Other Tags Based on a Condition

TL;DR: Insert the Style tag and its contents (see the code blocks) between the name and description tags, but only if the description mentions the phrase "the office". This changes the Google Earth placemark from the default yellow icon to a custom one.
Hi guys, I’m having a bit of trouble figuring this one out…
Using Notepad++, I am editing a Google Earth KML file that has many placemarks following this XML pattern:
<Placemark>
<name>Jim</name>
<description>Jim goes to the office every day</description>
<TimeSpan>
<begin>2016-06-20T12:00:00Z</begin>
<end>2016-06-25T12:00:00Z</end>
</TimeSpan>
<Point>
<coordinates>123412341234,123412341234,1</coordinates>
</Point>
</Placemark>
I would like to find every instance of the phrase “the office”. Wherever that text is found, the code below should be inserted between the name and description tags in a form that Google Earth can read:
<Style id="customstyle">
<IconStyle>
<color>a1ff00ff</color>
<scale>1.5</scale>
<Icon>
<href>http://maps.google.com/mapfiles/kml/shapes/shaded_dot.png</href>
</Icon>
</IconStyle>
</Style>
Doing this would change the placemark from the default one to a custom one.
All of the tutorials I have found so far explain how to add words or phrases to the beginning or end of a line using Notepad++ regex, or how to insert text on the next line using \n.
However, my situation seems different: based on a certain criterion, I want to insert my text above the matching line (more specifically, between the name and description tags).
The end result would look something like this (notice that the added text now sits between the name tag and the description tag):
<Placemark>
<name>Jim</name>
<Style id="customstyle">
<IconStyle>
<color>a1ff00ff</color>
<scale>1.5</scale>
<Icon>
<href>http://maps.google.com/mapfiles/kml/shapes/shaded_dot.png</href>
</Icon>
</IconStyle>
</Style>
<description>Jim goes to the office every day</description>
<TimeSpan>
<begin>2016-06-15T12:00:00Z</begin>
<end>2016-06-20T12:00:00Z</end>
</TimeSpan>
<Point>
<coordinates>2135125,1234523451,12341234</coordinates>
</Point>
</Placemark>
Now all placemarks that match that pattern would use a different placemark icon than the default one (and I would no longer have a headache).
Thanks in advance all.
The regex doesn't really need to match anything before the line.
The replacement just needs to put the new lines in front of the match.
So it's still fairly simple to do.
In Notepad++:
Find What : (\s+)(<description>)(?=.*?the office.*?<\/description>)
Replace with : $1<Style id="customstyle">$1\t<IconStyle>$1\t\t<color>a1ff00ff</color>$1\t\t<scale>1.5</scale>$1\t\t<Icon>$1\t\t\t<href>http://maps.google.com/mapfiles/kml/shapes/shaded_dot.png</href>$1\t\t</Icon>$1\t</IconStyle>$1</Style>$1$2
Search mode : Regular Expression
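To try the same search and replacement outside Notepad++, here is a Python sketch. Python's re.sub does not understand $1 in a replacement string (it uses \1 or \g<1>), so this version builds the inserted text in a small function instead. The two-placemark KML sample is made up for illustration.

```python
import re

# Hypothetical KML snippet: only Jim's description mentions "the office".
KML = """<Placemark>
    <name>Jim</name>
    <description>Jim goes to the office every day</description>
</Placemark>
<Placemark>
    <name>Ann</name>
    <description>Ann works from home</description>
</Placemark>"""

# Same search pattern as the Notepad++ recipe. Without re.DOTALL, '.' does not
# cross line breaks, so the lookahead only inspects the description's own line.
PATTERN = re.compile(r"(\s+)(<description>)(?=.*?the office.*?</description>)")

STYLE_LINES = [
    '<Style id="customstyle">',
    "\t<IconStyle>",
    "\t\t<color>a1ff00ff</color>",
    "\t\t<scale>1.5</scale>",
    "\t\t<Icon>",
    "\t\t\t<href>http://maps.google.com/mapfiles/kml/shapes/shaded_dot.png</href>",
    "\t\t</Icon>",
    "\t</IconStyle>",
    "</Style>",
]

def insert_style(m: re.Match) -> str:
    # Group 1 holds the line break plus indentation before <description>;
    # prefixing every inserted line with it mimics the $1 trick in Notepad++.
    indent = m.group(1)
    return "".join(indent + line for line in STYLE_LINES) + indent + m.group(2)

result = PATTERN.sub(insert_style, KML)
print(result)
```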
Note that the whitespace before the description tag (including the preceding line break) is captured in group 1.
Repeating $1 in the replacement is a trick to insert each new line with the same indentation as the tag.
But you could also just insert the tags without any whitespace:
Find What : (<description>)(?=.*?the office.*?<\/description>)
Replace with : <Style id="customstyle"><IconStyle><color>a1ff00ff</color><scale>1.5</scale><Icon><href>http://maps.google.com/mapfiles/kml/shapes/shaded_dot.png</href></Icon></IconStyle></Style>$1
And then use a plugin like "XML Tools" to "Pretty Print" your XML.

RegEx for mining XML tag content

Fellow Forum Members,
I am using the latest Notepad++. I have 430 separate XML files, and my goal is to make a "dmCode" list for all 430 of them. The dmCode identifies each XML file and looks like the example code shown below. I need help developing a regular expression that will grab the dmCode tag content located between the <dmCode opening tag and the closing /> terminator. I also only need this extraction to apply to dmCode tags that follow the <dmIdent> tag. In other words, any dmCode tag that is not preceded by a <dmIdent> tag should not end up in my Notepad++ search results. Is a regular expression that can pull targeted data like this from many XML files possible?
<dmIdent>
<dmCode assyCode="00" disassyCode="00" disassyCodeVariant="00" infoCode="042" infoCodeVariant="A" itemLocationCode="O" modelIdentCode="SASA" subSubSystemCode="6" subSystemCode="0" systemCode="A03" systemDiffCode="XY"/>
As an alternative, I have been researching using an XPath expression to accomplish the same task. However, I can't find a Notepad++ XPath plugin that lets me extract data from 430 XML files using an XPath expression instead of a regular expression. I would also appreciate it if anyone could provide an example of an XPath expression that performs the same task I'm trying to accomplish with a regular expression.
Any help will be greatly appreciated.
I know there are plugins for XPath, but I don't know of one that lets you search several files. The following XPath would match all attributes of <dmCode> as a child of the root element <dmIdent>:
/dmIdent/dmCode/@*
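For the XPath route, a scripting language can stand in for the missing Notepad++ plugin. This Python sketch uses the standard library's xml.etree.ElementTree (an assumption; the thread does not mention it) to collect what the XPath above describes, the attributes of <dmCode> children of a root <dmIdent>, from a trimmed, hypothetical sample. Running it once per file would cover the 430-file case.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose root element is <dmIdent>, as the XPath above assumes.
DOC = """<dmIdent>
<dmCode assyCode="00" infoCode="042" modelIdentCode="SASA" systemCode="A03"/>
</dmIdent>"""

root = ET.fromstring(DOC)

# ElementTree supports only a subset of XPath and cannot return attribute nodes,
# so select the dmCode children and read each element's attribute dictionary.
codes = [el.attrib for el in root.findall("dmCode")] if root.tag == "dmIdent" else []
print(codes)
```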
I need help developing a regular expression that will grab the dmCode tag content located between the <dmCode opening tag and the closing /> terminator. I also only need this extraction to apply to dmCode tags that follow the <dmIdent> tag.
This will work only for the simplest cases, where:
<dmCode> is the first child of <dmIdent>
There are no comments, CDATA tags, or similar constructs that could make it fail.
(?i)<dmIdent>\s*<dmCode \K[^"/>]*(?>(?:"[^\\"]*(?:\\.[^\\"]*)*"|/(?!>))[^"/>]*)*(?=/>)
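Matches:
(?i)<dmIdent>\s*<dmCode both tags, separated only by whitespace (case-insensitive)
\K resets the start of the reported match, so only the attribute text is highlighted
[^"/>]* any characters except ", / or >
Then an atomic group (?>...) that loops over:
"[^\\"]*(?:\\.[^\\"]*)*" text in quotes (allowing backslash escapes), or
/(?!>) a / not followed by >
each followed again by [^"/>]*
(?=/>) a lookahead requiring that the match is followed by />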
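Here is a Python adaptation of the regex above, for anyone scripting the 430-file extraction instead of using Notepad++. Python's re has no \K, so the attribute text is captured in group 1 instead, and the atomic group becomes a plain non-capturing group (re only supports atomic groups from Python 3.11 on). The scan_folder helper and the sample text are hypothetical.

```python
import re
from pathlib import Path

# Adapted pattern: \K replaced by capture group 1, (?>...) replaced by (?:...).
# Each loop iteration must start with '"' or '/', which [^"/>]* never consumes,
# so dropping the atomic group does not cause runaway backtracking here.
DMCODE = re.compile(
    r'<dmIdent>\s*<dmCode '
    r'([^"/>]*(?:(?:"[^\\"]*(?:\\.[^\\"]*)*"|/(?!>))[^"/>]*)*)'
    r'(?=/>)',
    re.IGNORECASE,
)
ATTR = re.compile(r'(\w+)="([^"]*)"')

def extract_dmcodes(text: str) -> list[dict[str, str]]:
    """Return the attributes of every dmCode that directly follows <dmIdent>."""
    return [dict(ATTR.findall(m.group(1))) for m in DMCODE.finditer(text)]

def scan_folder(folder: str) -> dict[str, list[dict[str, str]]]:
    """Hypothetical helper: run extract_dmcodes over every *.xml file in a folder."""
    return {
        path.name: extract_dmcodes(path.read_text(encoding="utf-8"))
        for path in sorted(Path(folder).glob("*.xml"))
    }

# The second dmCode is not preceded by <dmIdent>, so it is skipped.
sample = """<dmIdent>
<dmCode assyCode="00" infoCode="042" modelIdentCode="SASA" systemCode="A03"/>
</dmIdent>
<dmRef><dmCode modelIdentCode="OTHER" systemCode="B01"/></dmRef>"""

print(extract_dmcodes(sample))
```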

Regular Expressions - match content in XML page

I am new to regular expressions and need to write one that will pull certain data out of an XML page. For instance,
<name>Number of test runs</name>
<value>2</value>
The only number I need to pull is the 2. I want the regex to check the <name> tag so that I don't pull any other numbers on the page. Below is what I have, but it matches all of the content instead of just the 2. Any help would be appreciated.
Current Regular Expression:
/<name>Number of Failed BGPs</name>\n<value>(.+?)/
You said the problem is that it's matching all the content, not just the value (2). But you do need to match all the content to ensure it's the correct <name> tag.
The distinction you want comes from the capturing group, marked by the parentheses.
/<name>Number of Failed BGPs<\/name>\n<value>(.+?)<\/value>/
You want the first captured group, which should be just the value itself. Notice that I also added the </value> tag to the regex. Without it, the lazy quantifier would pick up only the first character of the value (for example, just the 1 of 12).
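Here is the same idea in Python, using the sample tags from the question. The answer's regex uses the asker's "Number of Failed BGPs" label instead; apart from that, the only change is dropping the slash escapes, which Python does not need because its patterns are not delimited by slashes.

```python
import re

page = "<name>Number of test runs</name>\n<value>2</value>"

# The whole pattern must match (so the right <name> is checked),
# but only group 1, inside the parentheses, holds the value.
VALUE = re.compile(r"<name>Number of test runs</name>\n<value>(.+?)</value>")

m = VALUE.search(page)
print(m.group(0))  # the whole match, including both tags
print(m.group(1))  # 2

# Without the closing </value>, the lazy (.+?) stops after a single character.
LAZY_ONLY = re.compile(r"<name>Number of test runs</name>\n<value>(.+?)")
print(LAZY_ONLY.search("<name>Number of test runs</name>\n<value>12</value>").group(1))  # 1
```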