This question already has answers here:
Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
(10 answers)
Closed 5 years ago.
I have an XML file with this data format:
<row Id="9" Body="aaaaaaaaa" Target="123456" />
I want to find and replace the contents of every Body="..." attribute in my XML file with a space. What is the regex for that?
There are many possibilities. Here is one way to remove the content from the Body attribute:
(<row.*Body=").*?("[^>]+>)
This creates two capturing groups for the content before and after the Body attribute. Then, you just use those capturing groups for the replacement:
$1$2
It will transform:
<row Id="9" Body="aaaaaaaaa" Target="123456" />
Into:
<row Id="9" Body="" Target="123456" />
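For anyone applying this from a script, here is a minimal Python sketch of the same replacement. It assumes one row element per line and that Body is not the very last thing before the closing bracket; the function name clear_body is made up for illustration. Python writes the $1$2 replacement as \g<1>\g<2>.

```python
import re

# The pattern from the answer: group 1 is everything up to and including
# Body=", group 2 is the closing quote plus the rest of the tag.
BODY_PATTERN = re.compile(r'(<row.*Body=").*?("[^>]+>)')


def clear_body(text: str) -> str:
    """Empty the Body attribute of each <row ...> line (hypothetical helper)."""
    # \g<1>\g<2> is Python's spelling of the $1$2 replacement.
    return BODY_PATTERN.sub(r'\g<1>\g<2>', text)


print(clear_body('<row Id="9" Body="aaaaaaaaa" Target="123456" />'))
# prints: <row Id="9" Body="" Target="123456" />
```

To put a space in the attribute instead of emptying it, the replacement string would be r'\g<1> \g<2>'.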
This question already has answers here:
Replacing nested quotes in a faulty string inside XML Attribute using Regular Expression
(4 answers)
How to parse invalid (bad / not well-formed) XML?
(4 answers)
Closed 3 years ago.
I have this xml which could have nested double quotes (not escaped) inside attributes:
<test>
<tag1 att1="This has "nested double quotes"">
<tag2 att2="This also has a nested " double quotes"></tag2>
</tag1>
</test>
I need to find a regex which will select all the nested double quotes, in this case
"nested double quotes"
nested " double
and replace them with the &quot; entity. The final xml should be like the following:
<test>
<tag1 att1="This has &quot;nested double quotes&quot;">
<tag2 att2="This also has a nested &quot; double quotes"></tag2>
</tag1>
</test>
Is it possible to achieve this using regex?
This question already has answers here:
RegEx match open tags except XHTML self-contained tags
(35 answers)
Closed 5 years ago.
I have an xml response in the following format
...
<field>
<fieldType>DOCUMENT</fieldType>
...
<fieldId>12345</fieldId>
<id>21345</id>
<isActive>F</isActive>
....
</field>
<field>
<fieldType>FOLDER</fieldType>
...
<fieldId>15345</fieldId>
<id>11345</id>
<isActive>T</isActive>
....
</field>
<field>
<fieldType>DOCUMENT</fieldType>
...
<fieldId>98765</fieldId>
<id>57689</id>
<isActive>T</isActive>
....
</field>
...
There are multiple such values in the xml. I need to extract only the fieldId
1. which is active i.e., T
2. which is of fieldType DOCUMENT.
I tried the following regex,
<fieldType>DOCUMENT</fieldType>.+?<fieldId>(.+?)</fieldId>.+?<isActive>T</isActive>
But this is extracting the 1st occurrence of fieldId, 12345 (even though it is not active) instead of 98765
P.s: I am trying to use this regex
An XML parser is probably a better solution. However, based on your question, this seems to achieve the desired output:
/<fieldType>DOCUMENT<\/fieldType>\n.*\n.*<fieldId>(.*)<\/fieldId>.*\n.*\n.*<isActive>T<\/isActive>/g
Try it here! https://regexr.com/3krn4
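To illustrate the parser route mentioned above, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The question elides the real root element and several child elements, so the fields wrapper and the trimmed sample below are assumptions for the example, as is the function name active_document_ids.

```python
import xml.etree.ElementTree as ET

# Hypothetical <fields> root: the question does not show the real one, and
# the elements hidden behind "..." are simply left out here.
SAMPLE = """<fields>
  <field>
    <fieldType>DOCUMENT</fieldType>
    <fieldId>12345</fieldId>
    <id>21345</id>
    <isActive>F</isActive>
  </field>
  <field>
    <fieldType>FOLDER</fieldType>
    <fieldId>15345</fieldId>
    <id>11345</id>
    <isActive>T</isActive>
  </field>
  <field>
    <fieldType>DOCUMENT</fieldType>
    <fieldId>98765</fieldId>
    <id>57689</id>
    <isActive>T</isActive>
  </field>
</fields>"""


def active_document_ids(xml_text: str) -> list:
    """Return the fieldId of every field that is an active DOCUMENT."""
    root = ET.fromstring(xml_text)
    return [
        field.findtext("fieldId")
        for field in root.iter("field")
        if field.findtext("fieldType") == "DOCUMENT"
        and field.findtext("isActive") == "T"
    ]


print(active_document_ids(SAMPLE))  # ['98765']
```

Because each field element is checked on its own, the conditions cannot leak from one field into the next, which is what happens when the lazy .+? in the original regex runs across field boundaries.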
This question already has an answer here:
Replace or remove mutliple lines of text in oracle stored procedure
(1 answer)
Closed 7 years ago.
Could anyone help me replace or remove a set of lines using a replace or regex replace function in Oracle, and tell me what the string pattern to find and replace would be? I need something that works in an Oracle stored procedure.
Lines to be replaced in the text below:
</properties>
<?xml version="1.0"?>
<properties>
XML string
<COLLECT_PARALLELGRAMMAR1>global.grxml</COLLECT_PARALLELGRAMMAR1>
<COLLECT_INPUTMODES>voice dtmf</COLLECT_INPUTMODES>
<CONF_INPUTMODES>dtmf</CONF_INPUTMODES>
</ROW>
</properties>
<?xml version="1.0"?>
<properties>
<ROW>
<MODULE_NAME>main_menu_phone</MODULE_NAME>
<MODULE_DESCRIPTION>Main Menu for Customers with silver membership</MODULE_DESCRIPTION>
<MODULE_TYPE>phone</MODULE_TYPE>
Use REPLACE to do it.
select replace('<COLLECT_PARALLELGRAMMAR1>global.grxml</COLLECT_PARALLELGRAMMAR1>
<COLLECT_INPUTMODES>voice dtmf</COLLECT_INPUTMODES>
<CONF_INPUTMODES>dtmf</CONF_INPUTMODES>
</ROW>
</properties>
<?xml version="1.0"?>
<properties>
<ROW>
<MODULE_NAME>main_menu_phone</MODULE_NAME>
<MODULE_DESCRIPTION>Main Menu for Customers with silver membership</MODULE_DESCRIPTION>
<MODULE_TYPE>phone</MODULE_TYPE>'
,
'</properties>
<?xml version="1.0"?>
<properties>
'
,'')
from dual
Output
<COLLECT_PARALLELGRAMMAR1>global.grxml</COLLECT_PARALLELGRAMMAR1>
<COLLECT_INPUTMODES>voice dtmf</COLLECT_INPUTMODES>
<CONF_INPUTMODES>dtmf</CONF_INPUTMODES>
</ROW>
<ROW>
<MODULE_NAME>main_menu_phone</MODULE_NAME>
<MODULE_DESCRIPTION>Main Menu for Customers with silver membership</MODULE_DESCRIPTION>
<MODULE_TYPE>phone</MODULE_TYPE>
The INSTR function returns the character position of a substring within a larger string. Having found the location of some text in a string, a natural next step is to extract it with the SUBSTR function.
Or you can use the REPLACE function directly.
This question already has answers here:
RegEx match open tags except XHTML self-contained tags
(35 answers)
Closed 9 years ago.
In my document I have
<Country>US</Country>
<Country>PR</Country>
Between the
<country>
and
</country>
I want to find ANYTHING except for US and PR.
For example
<country>US</country> = ignore
<country>PR</country> = ignore
<country>UP</country> = match found
What I have is
Pattern = "<Country>(.*?[^USPR].*?)</Country>"
but this ignores strings like
<Country>UP</Country>
I am not sure how to write a pattern that allows only two options between the tags: US and PR.
This should work.
<country>(?!(US|PR))(.*?)</country>
Matches the opening <country> tag not followed by US or PR. Then goes on to match anything before the closing </country> tag.
Try this one:
(?<=<Country>(?!US|PR)).*?(?=</Country>)
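As a rough Python sketch of the negative-lookahead idea, here is a variant that also anchors the lookahead on the closing tag, so a value such as USA, which merely starts with US, is still reported. That anchoring is an addition for this example and is not part of either answer above. The tag is written as Country to match the sample document, since the match is case-sensitive unless re.IGNORECASE is passed.

```python
import re

# Reject only the exact values US and PR: the lookahead includes the closing
# tag, so "USA" or "PRX" still count as matches.
OTHER_COUNTRY = re.compile(r'<Country>(?!(?:US|PR)</Country>)(.*?)</Country>')

doc = """<Country>US</Country>
<Country>PR</Country>
<Country>UP</Country>
<Country>USA</Country>"""

print(OTHER_COUNTRY.findall(doc))  # ['UP', 'USA']
```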
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
How to remove single attribute with quotes via RegEx
I am trying to remove the "sfref" attribute from the html code below:
<a sfref="[Libraries]719c25f9-89b3-4a7c-b6d5-e734b0c06ac1" href="../../HPLC.sflb.ashx">Determination</a> <br />
<img sfref="[Libraries]3e60aebb-acac-4806-bd22-f7986f66e7b3" src="../../Note52011.sflb.ashx">Test</a><br />
So far I have come up with this regex, but it is not matching:
(sfref=")([a-zA-Z0-9:;.\s()-\,]*)(")
This is where I am testing, if it helps:
http://regexr.com?2v4h6
Can someone please help me remove the "sfref" attribute?
You really, really, really shouldn't use regex (see the link in @Jack Maney's comment), but if you have to, this should work:
sfref="[^"]*"
This will work for single or double quotes.
sfref=('|").*?\1
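A small Python sketch of the second pattern in use. The leading \s+ is an addition for this example so that the space in front of the attribute is removed along with it; the function name strip_sfref is hypothetical. As noted above, an HTML parser is the safer tool when one is available.

```python
import re

# The backreference \1 makes the closing quote match whichever quote
# character opened the value. The leading \s+ also removes the space
# in front of the attribute.
SFREF = re.compile(r'''\s+sfref=(["']).*?\1''')


def strip_sfref(html: str) -> str:
    """Remove every sfref attribute, single- or double-quoted."""
    return SFREF.sub('', html)


html = ('<a sfref="[Libraries]719c25f9-89b3-4a7c-b6d5-e734b0c06ac1" '
        'href="../../HPLC.sflb.ashx">Determination</a> <br />')
print(strip_sfref(html))
# prints: <a href="../../HPLC.sflb.ashx">Determination</a> <br />
```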