Regular Expression, searching code in Dreamweaver - regex

I just need to search some code and find the following, using the find and replace with regular expressions in Dreamweaver:
<div id ="specific_div"> anything can be in here </div>
I have not really tried to tackle regular expressions yet, but I am having trouble figuring out how to tell it that "I don't care what is in between the div tags, but please find me this specific div tag, everything in it and the closing div tag!"

There's a 'Specific Tag' option under the 'Search' list in the 'Find and Replace' window. Why don't you use it? Anyway, I'd probably use something like the following. But it won't select the whole node if there's a nested div, because the match ends at the first closing </div> tag inside that div.
<div\b(.*?)(\bid)\s*=\s*("|')specific_div\3([\w\W]*?)</div>
I didn't close the opening tag because I thought you might have other attributes in that node. Oh and make sure the 'Match Case' option is not selected in the search window.
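As a rough illustration outside Dreamweaver, the same pattern can be tried in Python's re module, which accepts the syntax used here. The two sample strings are made up for the demonstration, and re.IGNORECASE stands in for leaving 'Match Case' unchecked.

```python
import re

# The pattern from the answer above, unchanged.
PATTERN = re.compile(
    r"""<div\b(.*?)(\bid)\s*=\s*("|')specific_div\3([\w\W]*?)</div>""",
    re.IGNORECASE,
)

# Hypothetical sample markup, not taken from the question.
simple = '<div id ="specific_div"> anything <b>can</b> be in here </div><p>after</p>'
nested = '<div id="specific_div"><div>inner</div> tail</div>'

simple_match = PATTERN.search(simple).group(0)
nested_match = PATTERN.search(nested).group(0)

print(simple_match)  # the whole div, including its closing tag
print(nested_match)  # ends at the first </div>, so " tail</div>" is left out
```

The second result shows the limitation mentioned above: with a nested div, the lazy [\w\W]*? stops at the inner closing tag.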

Related

Find content of the tag <Caption>

I want to record a macro for Notepad++ to find several texts which are inside an XML document with some <Caption> tags and a lot of other XML tags. So I want to use regex and need a little help. I think I'm quite close.
example: <Caption>ThetextIwanttofind</Caption>
my regex: <Caption\b[^>]*>(.*?)</Caption>
The problem is the closing Caption-tag. How to rewrite my regex to get the inner text with the closing Caption?
Thx for your help!
<Caption\b[^>]*>(.*?)<Caption> --> works for Caption without a closing tag
One solution would be to make the slash optional with \/? so that both forms match:
<Caption\b[^>]*>(.*?)<\/?Caption>
But it's kind of ugly.
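For comparison, here is a small sketch of both patterns in Python's re module rather than Notepad++. The XML fragment is invented for the example; only the shape of the <Caption> element comes from the question. In this engine the asker's original pattern already captures the inner text in group 1.

```python
import re

# Hypothetical XML fragment; only the <Caption> element shape is from the question.
xml = (
    "<Item><Caption>ThetextIwanttofind</Caption><Name>x</Name>"
    '<Caption lang="en">Second</Caption></Item>'
)

# The asker's original pattern: group 1 is the text between the tags.
captions = re.findall(r"<Caption\b[^>]*>(.*?)</Caption>", xml)

# The variant with an optional slash in the closing tag.
captions_optional_slash = re.findall(r"<Caption\b[^>]*>(.*?)<\/?Caption>", xml)

print(captions)
print(captions_optional_slash)
```

Note that . does not cross line breaks by default, so a caption spanning several lines would need re.DOTALL here, or the ". matches newline" option in Notepad++.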

Notepad++ Regex to remove styling

I need to remove some tags from a whole lot of html pages.
Lately I discovered the option of regex in Notepad++
But.. Even after hours of Googling I don't seem to get it right.
What do I need?
Example:
<p class=MsoNormal style='margin-left:19.85pt;text-indent:-19.85pt'><spanlang=NL style='font-size:11.0pt;font-family:Symbol'>·<span style='font:7.0pt "Times New Roman"'> </span></span><span lang=NL style='font-size:9.0pt;font-family:"Arial","sans-serif"'>zware uitvoering met doorzichtige vulruimte;</span></p>
I need to remove everything about styling, classes and IDs, so that only the clean tags are left without anything else.
Anyone able to help me on this one?
Kind regards
EDIT
Check an entire file via pastebin: http://pastebin.com/0tNwGUWP
I think this pattern will erase all styles in "p" and "span" tags :
((?<=<p)|(?<=<span))[^>]*(?=>)
=> How it works:
((?<=<p)|(?<=<span)) : two lookbehinds that make sure the text we are looking for comes right after <p OR <span
[^>]* : matches any run of characters that are not a > character
(?=>) : a lookahead that makes sure the text we are looking for is followed by a > character
PS: Tested in Notepad++
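A minimal sketch of the same pattern in Python's re module, which accepts these lookbehinds because each one has a fixed width. The sample line is a shortened, made-up variant of the markup in the question.

```python
import re

# The lookbehind/lookahead pattern from the answer above.
STRIP_ATTRS = re.compile(r"((?<=<p)|(?<=<span))[^>]*(?=>)")

sample = (
    "<p class=MsoNormal style='margin-left:19.85pt'>"
    "<span lang=NL style='font-size:9.0pt'>zware uitvoering</span></p>"
)
cleaned = STRIP_ATTRS.sub("", sample)
print(cleaned)

# Caveat: the first lookbehind only checks the two characters "<p",
# so any other tag whose name starts with "p" is rewritten as well.
side_effect = STRIP_ATTRS.sub("", "<pre class=code>x</pre>")
print(side_effect)
```

The second call shows a side effect worth knowing about: a <pre ...> opening tag is turned into <p>, so the pattern is only safe when the file contains no other tags starting with p or span.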
If the sample you provided is representative of what you need to process, then the following quick and dirty solution will work:
Find what: [a-z]+='[^']*'
Replace with:
Find what: [a-z]+=[a-zA-Z]*
Replace with:
You must run the first one first to pick up the quoted style='...' attributes, and then run the second one to pick up the unquoted class=... and lang=... attributes.
There's good reason why other posters are saying don't attempt to parse HTML this way. You'll end up in all sorts of trouble, since regex in general cannot handle all the wonderful weirdness of HTML.
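To see why the order of the two passes matters, here is a small sketch in Python's re module. The sample line is a shortened, made-up variant of the markup in the question.

```python
import re

QUOTED = r"[a-z]+='[^']*'"      # first Find what: style='...'
UNQUOTED = r"[a-z]+=[a-zA-Z]*"  # second Find what: class=MsoNormal, lang=NL

sample = (
    "<p class=MsoNormal style='margin-left:19.85pt;text-indent:-19.85pt'>"
    "<span lang=NL style='font-size:9.0pt'>zware uitvoering</span></p>"
)

# Recommended order: quoted attributes first, then unquoted ones.
pass1 = re.sub(QUOTED, "", sample)
pass2 = re.sub(UNQUOTED, "", pass1)
print(pass2)

# Reversed order: the second pattern eats "style=" and strands the quoted value.
reversed_order = re.sub(QUOTED, "", re.sub(UNQUOTED, "", sample))
print(reversed_order)
```

In the recommended order the attributes disappear but the spaces that separated them stay behind inside the tags. In the reversed order the quoted style values are left stranded, because the unquoted pattern has already removed the style= in front of them.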
My advice is as follows.
As I see in your sample text, you have only "p" and "span" tags that need to be handled, and you apparently want to remove all the styles inside them. In this case, you could consider removing everything inside those tags, leaving them as plain <p> or <span>.
I don't know about Notepad++ but a simple C# program can do this job quickly.
Assuming <spanlang=NL is a typo (it should be <span lang=NL), I'd do:
Find what: (<\w+)[^>]*>
Replace with: $1>
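The same find and replace can be sketched in Python's re module, where the first group is written as \1 in the replacement string. The sample strings are shortened, made-up variants of the markup in the question.

```python
import re

sample = (
    "<p class=MsoNormal style='margin-left:19.85pt'>"
    "<span lang=NL style='font-size:9.0pt'>zware uitvoering</span></p>"
)

# Keep "<" plus the tag name (group 1), drop everything up to the next ">".
cleaned = re.sub(r"(<\w+)[^>]*>", r"\1>", sample)
print(cleaned)

# With the typo left in, \w+ treats "spanlang" as the tag name.
with_typo = re.sub(r"(<\w+)[^>]*>", r"\1>", "<spanlang=NL style='x'>")
print(with_typo)
```

Closing tags are untouched because / is not a word character. The pattern does assume that no attribute value contains a > character.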
If you don't mind doing a little bit of programming: HtmlAgilityPack can easily remove scripts, styles, or whatever else from your XML/HTML.
Example:
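// Requires "using System.Linq;" (for Where/ToList) and the HtmlAgilityPack package.
// `html` is assumed to be a string holding the markup to clean.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
// Copy the matches to a list first so nodes are not removed mid-enumeration.
doc.DocumentNode.Descendants()
    .Where(n => n.Name == "script" || n.Name == "style")
    .ToList()
    .ForEach(n => n.Remove());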
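For readers who would rather not use C#, here is a rough sketch of the parser-based approach using Python's standard html.parser module instead of HtmlAgilityPack. Unlike the snippet above, it strips attributes from every tag, which is what the question asked for. The class name AttributeStripper and the function strip_attributes are made up for this example, and comments and doctype declarations are simply dropped.

```python
from html.parser import HTMLParser


class AttributeStripper(HTMLParser):
    """Re-emit the document with every attribute removed (hypothetical helper)."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(f"<{tag}>")  # attrs are deliberately ignored

    def handle_startendtag(self, tag, attrs):
        self.parts.append(f"<{tag}/>")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_attributes(html):
    parser = AttributeStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


sample = (
    "<p class=MsoNormal style='margin-left:19.85pt'>"
    "<span lang=NL>zware &amp; uitvoering</span></p>"
)
stripped = strip_attributes(sample)
print(stripped)
```

Because the parser understands quoting, attribute values that contain > or = do not confuse it the way they can confuse the regex approaches.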

REGEX Pattern - How do I match upto a certain tag in html

I have some HTML which I want to grab between two tags. However, nested tags exist in the HTML, so looking for </div> wouldn't work, as it would stop at the first nested div.
Basically I want my regex to:
Match some text literally, followed by ANY character up to another literal text string. So my question is how do I get [^<]* to continue matching until it sees the next div.
such as
<div id="test"[^<]*<div id="test2"
Example html
<div id="test" class="whatever">
<div class="wrapper">
<fieldset>Test</fieldset><div class="testclass">some info</div>
</div>
<!-- end test div--></div>
</div>
<div id="test2" class="endFind">
In general, I suspect you want to look at "greedy" vs "lazy" in your regex, assuming that's supported by your platform/language.
For example, <div[^>]*>(.*?)</div> would make $1 match all the text inside a div, but would try to keep it as small as possible. Some people call *? a "lazy star".
But it seems you're looking to find the text within a div that is before the start of the first nested div. That would be something like <div[^>]*>(.*?)<div
Read about greedy vs lazy here and check to make sure that whatever language you're using supports it.
$ php -r '$text="<div>Test<div>foo</div></div>\n"; print preg_replace("/<div[^>]*>(.*?)<div.*/", "\$1", $text);'
Test
$
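The same comparison can be sketched in Python's re module, using the test string from the PHP one-liner above, to make the difference between the greedy and lazy star visible.

```python
import re

text = "<div>Test<div>foo</div></div>"

# Lazy star: stops at the first </div> it can reach.
lazy = re.search(r"<div[^>]*>(.*?)</div>", text).group(1)

# Greedy star: runs to the last </div> in the string.
greedy = re.search(r"<div[^>]*>(.*)</div>", text).group(1)

# Lazy star up to the start of the first nested div.
before_nested = re.search(r"<div[^>]*>(.*?)<div", text).group(1)

print(repr(lazy))           # 'Test<div>foo'
print(repr(greedy))         # 'Test<div>foo</div>'
print(repr(before_nested))  # 'Test'
```

Neither the lazy nor the greedy form pairs the outer opening tag with its own closing tag in general; they only differ in which closing tag they settle for.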
Regex is not capable of parsing HTML. If this is part of an application, you're doing something wrong. If you absolutely have to parse a document, use a html/xml parser.
If you're trying to screen scrape something and don't want to bother with a parser, look for identifying marks in the page you're scraping. For example, maybe the embedded div ends just before the one you want to match, so you could match </div></div> instead.
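One way to read the "identifying marks" idea for the example in the question is to match lazily from the opening of #test up to the opening tag of #test2. This is only a sketch in Python's re module, and it assumes the #test2 div always directly follows the block you want; re.DOTALL lets . cross line breaks.

```python
import re

# The example HTML from the question.
html = """<div id="test" class="whatever">
<div class="wrapper">
<fieldset>Test</fieldset><div class="testclass">some info</div>
</div>
<!-- end test div--></div>
</div>
<div id="test2" class="endFind">"""

# Lazy match from one literal string up to the next literal string.
match = re.search(r'<div id="test"(.*?)<div id="test2"', html, re.DOTALL)
between = match.group(1)
print(between)
```

The captured text still includes the tail of the opening tag and all the nested closing tags, so it needs trimming; it sidesteps nesting rather than understanding it.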
Alternatively, here's a regex that meets your requirements. However, it is very fragile: it will break if, for example, #test's children have children, or the html isn't valid, or I missed something, etc, etc ...
/<div id="test"[^<]*(<([^ >]+).+<\/\2>[^<]*)*<\/div>/

regex replace in dreamweaver

I'm trying to replace tags in my code, but keeping the text inside "as is".
example string:
<p class="negrita">text1</p>
or
<p class="negrita">text2</p>
I need to get those replaced like so:
<h3>text1</h3>
and
<h3>text2</h3>
I'm searching (and matching fine) with this,
<p class="negrita">([^>]*)</p>
but I have no idea on how to keep the text inside, as
<h3>$1</h3>
is not working.
Instead of using regular expressions to do this, use Dreamweaver's Specific Tag search option. While regular expressions are possible in Dreamweaver, it's a bit crippled and sometimes a little buggy.
To do this with Specific Tag, invoke the search using Control-F
Change the search option to "Specific Tag".
Next to that, enter p to search for all <p> tags
Hit the + icon below the box to add an option and select "With Attribute" "Class" "=" from the first three pulldowns and type "negrita" in the fourth.
For action, select "Change Tag" and then select h3.
Run this and you will get <h3 class="negrita">text1</h3>
Then repeat the steps above, searching for all h3 tags with the class negrita, and choose Remove Attribute "class" for the action. Two steps instead of one, I admit, but it will work every time.
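If the files can be processed outside Dreamweaver, the replacement the asker wanted can be sketched in one step with Python's re module. This is an alternative to the Specific Tag steps, not a description of Dreamweaver's behaviour. The inner class is written as [^<]* here (rather than the asker's [^>]*) so that the text cannot run into another tag, and the sample input is made up around the two lines from the question.

```python
import re

html = (
    '<p class="negrita">text1</p>\n'
    '<p class="other">keep</p>\n'
    '<p class="negrita">text2</p>'
)

# Group 1 holds the text between the tags; \1 puts it back inside <h3>.
converted = re.sub(r'<p class="negrita">([^<]*)</p>', r"<h3>\1</h3>", html)
print(converted)
```

Paragraphs with any other class are left alone, and so is any negrita paragraph that contains nested markup, since [^<]* refuses to cross a tag.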

Regular Expression matching nested TAGS

Hello, I'm trying to match multiply nested quote blockquotes and transform them back into BBCode.
This is what I have so far, as far as the regex is concerned.
I converted it back to HTML entities so it can be seen on Stack Overflow.
<div class="quoteheader"><div class="topslice_quote">([\s\S]*?)</div></div><blockquote>([\s\S]*?)(?:</blockquote><div class="quotefooter"><div class="botslice_quote"></div></div>){2,})
I'm trying to match this
<div class="quoteheader"><div class="topslice_quote">Quote</div></div><blockquote>Outside quote is this
<div class="quoteheader"><div class="topslice_quote">Quote</div></div><blockquote>Inner quote is this</blockquote><div class="quotefooter"><div class="botslice_quote"></div></div>
</blockquote><div class="quotefooter"><div class="botslice_quote"></div></div>
to generate this
[quote]Outside quote is
this[quote]Inner quote is
this[/quote][/quote]
I'm using VBScript 5.5 Regular Expressions for this (but this isn't that important).
I really need help with the expression. I've tried using an HTML parser for this, but it turned out to be more difficult than using regex.
I'm just repeating what's said here.
Regular expressions can't match context-free languages, such as nested groups of tags. You can't pair opening tags with their closing tags, so parsing a block (especially a nested one) becomes impossible to do reliably.
You can certainly build a kludge to help, but there will be situations where it won't work.
Well, this is all you need to do with the parser.
Here's the pseudocode. I don't know your parser so this is the best I can offer.
First find the div tag with the quoteheader class. Get the next sibling.
That is the blockquote tag. Let's call this tag theQuote.
Get the first child of theQuote. It will be a html text item. That is the outer quote.
Get the third child of theQuote. It will be another blockquote tag. Let's call this tag theInner.
Get the first child of theInner. It will be a html text item. That is the inner quote.
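The steps above can be sketched with Python's xml.dom.minidom, chosen here only because its node model (text nodes counted as children, nextSibling) lines up with the pseudocode. It is not the asker's parser, and it works on this sample only because the fragment happens to be well-formed XML; the <root> wrapper is made up so the fragment has a single root element.

```python
from xml.dom import minidom

# The sample from the question, joined into one string.
SAMPLE = (
    '<div class="quoteheader"><div class="topslice_quote">Quote</div></div>'
    "<blockquote>Outside quote is this\n"
    '<div class="quoteheader"><div class="topslice_quote">Quote</div></div>'
    "<blockquote>Inner quote is this</blockquote>"
    '<div class="quotefooter"><div class="botslice_quote"></div></div>\n'
    "</blockquote>"
    '<div class="quotefooter"><div class="botslice_quote"></div></div>'
)

# Wrap the fragment in a hypothetical <root> element so it parses as one document.
root = minidom.parseString("<root>" + SAMPLE + "</root>").documentElement

# Find the div with the quoteheader class, then take its next sibling.
header = next(
    node
    for node in root.childNodes
    if node.nodeType == node.ELEMENT_NODE
    and node.getAttribute("class") == "quoteheader"
)
the_quote = header.nextSibling                      # the outer blockquote
outer_text = the_quote.childNodes[0].data.strip()   # first child: text node
the_inner = the_quote.childNodes[2]                 # third child: inner blockquote
inner_text = the_inner.childNodes[0].data.strip()   # first child: text node

bbcode = f"[quote]{outer_text}[quote]{inner_text}[/quote][/quote]"
print(bbcode)
```

For deeper or variable nesting, the same traversal would have to become a recursive walk over the blockquote children; the fixed indexes here only fit the two-level sample.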