I have multiply nested quotes in an HTML document that look like this:
<div class="quote-container">
<div class="quote-block">
<div class="quote-container">
<div class="quote-block">
</div>
</div>
<div class="quote-container">
<div class="quote-block">
</div>
</div>
<div class="quote-container">
<div class="quote-block">
</div>
</div>
</div>
</div>
I need to find and remove the quotes. I use this expression:
<div class="quote-container">.*<div class="quote-block">.*</div>.*</div>
This works for single quotes. However, it fails on multiply nested quotes (see the example above).
My task is to search for:
<div class="quote-container">.*<div class="quote-block">
plus any string NOT containing
<div
and ending with
.*</div>.*</div>
I tried lookbehind and lookahead assertions like this:
<div class="quote-container">.*<div class="quote-block">.*(?!<div).*</div>.*</div>
but they don't work.
Is there a way to do this? I need a Perl expression I can use in TextPipe (I use it for forum parsing, and later I do text-to-speech conversion).
Thanks in advance.
I think your problem is that you are using the greedy expression .*.
Try replacing every .* with the non-greedy .*?
I would personally solve this by removing quotes repeatedly until there are no quotes left to remove. There's no simple way to handle this in a single regex replace; what you'll need to do is something like this:
pseudo-code:
html = "... from your post ...";
do {
    oldhtml = html
    // remove innermost quotes only: the quote body must not contain another <div
    html = replace(
        '/<div class="quote-container">\s*<div class="quote-block">(?:(?!<div).)*?<\/div>\s*<\/div>/s',
        '',
        html
    )
} while (html != oldhtml)
This will handle quotes nested to any depth.
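The repeat-until-nothing-changes idea above could be sketched in Python roughly as follows. This is only an illustration: the name `strip_quotes` is made up, and the pattern assumes that nothing but whitespace sits between the `quote-container` and `quote-block` tags and between the two closing tags. Each pass removes only innermost quotes (bodies with no further `<div`), so outer quotes disappear on later passes.

```python
import re

# Innermost quote: container + block whose body contains no further "<div".
# Assumption: only whitespace separates the two opening and two closing tags.
INNERMOST_QUOTE = re.compile(
    r'<div class="quote-container">\s*'
    r'<div class="quote-block">'
    r'(?:(?!<div).)*?'          # any text, as long as no "<div" starts here
    r'</div>\s*</div>',
    re.DOTALL,
)


def strip_quotes(html):
    """Remove quotes from the inside out until a pass changes nothing."""
    while True:
        stripped = INNERMOST_QUOTE.sub("", html)
        if stripped == html:
            return stripped
        html = stripped
```

Calling `strip_quotes(page)` on a hypothetical `page` string returns the page with every quote removed, however deeply the quotes were nested.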
Regexes are a poor choice to manipulate nested structures. I would write a specific parser for this problem (a simple stack based parser should suffice).
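A stack-based approach along those lines might look like the sketch below. It is not a full HTML parser: it only tracks `<div>` and `</div>` tags, assumes the markup is reasonably well formed, and the name `remove_quotes` is hypothetical. The stack records, for every open `<div>`, whether it is a quote container, so the matching `</div>` can be identified without guessing.

```python
import re

DIV_TAG = re.compile(r'<div\b[^>]*>|</div\s*>', re.IGNORECASE)
QUOTE_OPEN = re.compile(r'<div\b[^>]*class="quote-container"', re.IGNORECASE)


def remove_quotes(html):
    """Drop every quote-container div, including anything nested inside it."""
    kept = []          # pieces of the document that survive
    stack = []         # one flag per open <div>: True if it is a quote container
    quote_depth = 0    # how many quote containers are currently open
    pos = 0
    for m in DIV_TAG.finditer(html):
        if quote_depth == 0:
            kept.append(html[pos:m.start()])   # text before this tag
        tag = m.group(0)
        if tag.startswith("</"):
            was_quote = stack.pop() if stack else False
            if was_quote:
                quote_depth -= 1               # closing a quote: drop the tag
            elif quote_depth == 0:
                kept.append(tag)               # ordinary close outside quotes
        else:
            is_quote = bool(QUOTE_OPEN.match(tag))
            stack.append(is_quote)
            if is_quote:
                quote_depth += 1               # opening a quote: drop the tag
            elif quote_depth == 0:
                kept.append(tag)               # ordinary open outside quotes
        pos = m.end()
    if quote_depth == 0:
        kept.append(html[pos:])                # trailing text after the last tag
    return "".join(kept)
```

Unlike the regex loop, this makes a single pass over the input and does not care what appears between the container and block tags.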
When I woke up this morning, I didn’t know a stroke of regex. By the time I went to Mass, I’d been able to cobble together this regex to find occurrences of ‘Mph’ in an html document.
(?i)(?<=[\s|\d])mph+
If I run it against the following test data:
<div class="vsMph">
<p>95 Mph</p>
</div>
<div class="vsMph">
<p>95Mph</p>
</div>
It correctly matches:
‘ Mph’ and
‘Mph’
And equally correctly leaves the ‘vsMph’ alone, which is exactly what I want. Eventually, I'm going to use the same technique to match knots, ft, in, km and so on.
I’m executing this expression in Sublime Text 3 using RegReplace. Ultimately, what I hope to do is use this regular expression to find all occurrences of ‘Mph’ preceded by a space or a digit and:
1. Enclose ‘Mph’ in <abbr> tags.
2. Add a space between the digit and the opening <abbr> tag if there was no space between the last digit and 'Mph' originally.
In other words, I want to convert the above test data to:
<div class="vsMph">
<p>95 <abbr title="Miles per hour">Mph</abbr></p>
</div>
<div class="vsMph">
<p>95 <abbr title="Miles per hour">Mph</abbr></p>
</div>
I can get RegReplace to add the <abbr> tags as described in 1. above, but I’ve searched around on Google and I can’t find anything that tells me how to conditionally insert a space in a regex replace.
So I’m wondering. Is it possible in the first place to conditionally add a space in a regex replacement and if so how do I do it, or do I have to search for ‘\sMph’ and ‘\dMph’ and replace them separately?
Regards.
I would suggest using capture groups to match Mph. You could simply search for the following regex:
(\d)(\s)?(Mph)
Then replace using groups
$1 <abbr title="Miles per hour">$3</abbr>
output:
<div class="vsMph">
<p>95 <abbr title="Miles per hour">Mph</abbr></p>
</div>
<div class="vsMph">
<p>95 <abbr title="Miles per hour">Mph</abbr></p>
</div>
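For comparison, here is a minimal Python sketch of the same group-based replacement. The function name `tag_mph` is invented for illustration, and Python's `re` module writes group references as `\1` rather than `$1`. The optional whitespace is matched but not captured, so the replacement always emits exactly one space.

```python
import re

# A digit, an optional whitespace character, then "Mph".
MPH = re.compile(r'(\d)\s?(Mph)')


def tag_mph(html):
    """Wrap 'Mph' after a digit in <abbr>, normalising to a single space."""
    return MPH.sub(r'\1 <abbr title="Miles per hour">\2</abbr>', html)
```

Because the pattern requires a digit before `Mph`, the `vsMph` class name is left untouched.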
I have several divs with similar classes, and I want to use a regular expression to replace every class name that ends in "-type". Here is what I have tried so far: http://regexr.com/3b7q3.
<div class="form-group form-row #styleProperty username-v-type" data-type="#templateName" data-prop="#styleProperty" data-is-validate="#isValidate">
</div>
It's better to do this by parsing the HTML, but you can use this regex instead:
/\b[\w\S]+-type/g
Demo
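One caveat is that, run against the whole sample line, that pattern also matches the `data-type` attribute name. A hedged Python sketch that limits the replacement to the value of the `class` attribute could look like this; the function name `replace_type_classes` and the `new_name` parameter are hypothetical, since the question does not say what the replacement should be.

```python
import re

# The class attribute (but not e.g. data-class) and its quoted value.
CLASS_ATTR = re.compile(r'(?<![\w-])(class=")([^"]*)(")')
# A whitespace-delimited class name ending in "-type".
TYPE_TOKEN = re.compile(r'(?<!\S)\S+-type(?!\S)')


def replace_type_classes(html, new_name):
    """Replace class names ending in '-type' inside class="..." only."""
    def fix_attr(m):
        # Use a function as the replacement so new_name is inserted literally.
        value = TYPE_TOKEN.sub(lambda _: new_name, m.group(2))
        return m.group(1) + value + m.group(3)
    return CLASS_ATTR.sub(fix_attr, html)
```

On the sample div, only `username-v-type` is replaced; `data-type="#templateName"` stays as it is.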
I can't find a working regular expression anywhere to find and replace the text between div tags.
I have this HTML, where I want to select everything between the <div class="info"> and </div> tags and replace it with some other text:
<div class="extraUserInfo">
<p>Hello World! This is a sample text</p>
<javascript>.......blah blah blah etc etc
</div>
and replace it with
My custom text with some codes
<tags> asdasd asdasdasdasdasd</tags>
so it would look like
<div class="extraUserInfo">
My custom text with some codes
<tags> asdasd asdasdasdasdasd</tags>
</div>
Here is a refiddle with all my code. As you can see, I want to replace the whole block of code between the <div> and </div> tags:
http://refiddle.com/1h6j
Hope you get what I mean :)
If there's no nesting, I would just do a plain non-greedy (lazy) match:
(?s)<div class="extraUserInfo">.*?</div>
.*? matches any amount of any character (as few as possible) to meet </div>
The s modifier makes the dot match newlines too.
Edit: Here is a JavaScript version without the s modifier:
/<div class="extraUserInfo">[\s\S]*?<\/div>/g
And replace with new content:
<div class="extraUserInfo">My custom...</div>
See example at regex101; Regex FAQ
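The same lazy match can be sketched in Python, where `re.DOTALL` plays the role of the `s` modifier. The helper name `replace_info` is made up, and, as above, this assumes the div contains no nested `<div>`. Capturing the opening and closing tags lets the replacement keep them and swap only the body.

```python
import re

# Opening tag, lazily matched body, first closing </div>.
INFO_DIV = re.compile(r'(<div class="extraUserInfo">).*?(</div>)', re.DOTALL)


def replace_info(html, new_body):
    """Swap the contents of the extraUserInfo div for new_body."""
    # A function replacement inserts new_body literally (no backslash escapes).
    return INFO_DIV.sub(lambda m: m.group(1) + new_body + m.group(2), html)
```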
In PHPStorm, I need to find/replace some mixed case strings which are used for CSS class names and for the DOM id's. I can't change attributes like onClick and image names need to remain. Here is what I have:
<div class="ThumbContainer" id="Source-Data4-Thumb">
<div class="ThumbTitleArea">
<div class="DataTitleDiv"> GYR Performance <img src="images/someImage.png" onClick="someFunc()" /></div>
</div>
<div class="dataDetailArea">
<div class="DataThumbArea"> Data Source:Client<br>
Last refreshed:12/05/2013 <br>
Records:206<br>
<br>
Used for the following reports<br>
- GYR Performance<br>
</div>
</div>
</div>
Here is what I need:
<div class="thumb_container" id="source_data4_thumb">
<div class="thumb_title_area">
<div class="data_title_div"> GYR Performance <img src="images/someImage.png" onClick="someFunc()" /></div>
</div>
<div class="data_detail_area">
<div class="data_thumb_area"> Data Source:Client<br>
Last refreshed:12/05/2013 <br>
Records:206<br>
<br>
Used for the following reports<br>
- GYR Performance<br>
</div>
</div>
</div>
Notice the dataDetailArea starts with a lowercase.. bleh. This will be a one-time find/replace so it doesn't need to be in PHPStorm. It can be in any online tool even, like http://gskinner.com/RegExr/
The actual backbone template I need to find/replace on is about 3100 lines of code, otherwise I'd provide it all here for you.
Here's what I have so far. It doesn't seem to match the Camel-Case3-Foo:
(class|id|data-[?!=])="\b([A-Za-z][a-z-]*){2,}\b"
This regex should find the locations where underscores should be placed:
((?<=\w)(?=[A-Z])|-)
It would seem to make sense to do a replacement with this to insert the underscores, then convert the string to lower case.
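As a rough Python illustration of that two-step idea (insert underscores, then lowercase), the sketch below applies the boundary regex only to the values of `class` and `id` attributes, so `onClick`, image names, and body text such as "GYR Performance" are left alone. The names `to_snake` and `convert_attrs` are invented for this example.

```python
import re

# class="..." or id="..." (but not e.g. data-id); the value is captured.
ATTR = re.compile(r'(?<![\w-])(class|id)="([^"]*)"')
# Either the empty position before an uppercase letter that follows a
# word character, or a literal hyphen.
BOUNDARY = re.compile(r'(?<=\w)(?=[A-Z])|-')


def to_snake(name):
    """ThumbContainer -> thumb_container, Source-Data4-Thumb -> source_data4_thumb."""
    return BOUNDARY.sub("_", name).lower()


def convert_attrs(html):
    """Rewrite only class and id attribute values to snake_case."""
    return ATTR.sub(lambda m: f'{m.group(1)}="{to_snake(m.group(2))}"', html)
```

Note that a run of capitals would be split letter by letter, so a name like `DOMId` would need special handling; the sample in the question does not contain such names.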
I would search for something like this:
"[a-z0-9_]*\([A-Z]\)
A quote mark with anything following that has lowercase, numeric, or underscore characters.
Anything following that has an uppercase letter.
Make the uppercase letter sub-expression 1.
Replace subexpression 1 with an underscore + the result from a tolower() function.
You will need to apply this multiple times to each line since it will only find one uppercase letter per pass.
I'm trying to use regex to parse a template I have. I want to find the _stop tag for the _start tag. I need to find the specific ones since there can be nested _stop and _start tags.
The regex I'm using is
/{(.*?)_start}.*{(\1_stop)}/s
and throwing that into a preg_match
And the template
<div data-role="collapsible-set" class="mfe_collapsibles" data-theme="c" data-inset="false">
{MakeAppointment_start}
<div id="appointmentHeading" data-action-id="appointmentNext" data-action-text="Next" data-a data-role="collapsible" data-collapsed="true" data-collapsed-icon="arrow-r" data-expanded-icon="arrow-d" data-iconpos="right">
<h3 class="collapsibleMainHeading">New {AppointmentTerm}</h3>
<p>
{AppointmentForm}
</p>
</div>
{MakeAppointment_stop}
{RegisterSection_start}
<div id="registerHeading" class="preRegistration" data-action-id="register" data-action-text="Register" data-role="collapsible" data-collapsed="true" data-collapsed-icon="arrow-r" data-expanded-icon="arrow-d" data-iconpos="right">
<h3 class="collapsibleMainHeading">Register</h3>
<p>
{RegisterForm}
</p>
</div>
{RegisterSection_stop}
<div data-role="collapsible" class="preRegistration" data-collapsed="true" data-collapsed-icon="arrow-r" data-expanded-icon="arrow-d" data-iconpos="right">
<h3 class="collapsibleMainHeading">Login</h3>
<p>
{LoginForm}
</p>
</div>
</div>
</div>
The results are
Array
(
[0] => {MakeAppointment_start}
<div id="appointmentHeading" data-action-id="appointmentNext" data-action-text="Next" data-a data-role="collapsible" data-collapsed="true" data-collapsed-icon="arrow-r" data-expanded-icon="arrow-d" data-iconpos="right">
<h3 class="collapsibleMainHeading">New {AppointmentTerm}</h3>
<p>
{AppointmentForm}
</p>
</div>
{MakeAppointment_stop}
[1] => MakeAppointment
[2] => MakeAppointment_stop
)
Index 0 is correct however 1 and 2 are not. 1 should have the register tags and content and 2 should not exist.
What am I doing wrong here?
Firstly, preg_match only returns one match; use preg_match_all instead. Secondly, the indices 1 and 2 you get are your capturing groups. You can simply ignore them, although your second capturing group is redundant; you could just remove the second pair of parentheses in your regex. Using preg_match_all will yield the full match and all capturing groups for all matches.
I also think you should escape your { and } since they are regex meta-characters. The engine does not choke on them here because PCRE treats a { that does not begin a valid quantifier (such as {2} or {2,5}) as a literal character, but it is better practice to escape them anyway.
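To illustrate both points (collecting all matches and escaping the braces), here is a small Python sketch rather than PHP; `re.finditer` is the rough counterpart of preg_match_all, and the function name `find_sections` is hypothetical. It assumes tag names consist of word characters only. The backreference `\1` ties each `_stop` tag to the name captured from its `_start` tag.

```python
import re

# {Name_start} ... {Name_stop}, with the braces escaped and the body captured.
SECTION = re.compile(r'\{(\w+)_start\}(.*?)\{\1_stop\}', re.DOTALL)


def find_sections(template):
    """Map each top-level section name to the text between its tags."""
    return {m.group(1): m.group(2) for m in SECTION.finditer(template)}
```

Matches do not overlap, so a section nested inside another one is returned only as part of the outer section's body; calling `find_sections` again on that body would expose it.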