How to capture text between <right> and </> tags with regex

I'm trying to build a list of everything between <right> and </> tags.
This is my regex:
<right>([^<\/>]*)<\/>?
I'm currently missing these cases:
when there is only an opening tag: <right>hi
when tags are nested: <right><right>hello</></> (when nested, the inner content should be treated as text, and the next tag processed as usual)
The result should look like this:
case1:
<div style="text-align: right">
hi
</>
case2:
<div style="text-align: right">
<div style="text-align: right">
hello
</>
</>
This is where I'm testing the regex: https://regex101.com/r/ybO9cV/1
Thanks for your help!

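Mostly bad news. To also get the matches that are missing the trailing </>, group the closing tag so the whole of it is optional (in your pattern, <\/>? only makes the final > optional): /<right>([^<\/>]*)(<\/>)?/g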
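A quick way to check the pattern is to run it over the three inputs from the question. This is a minimal Python sketch, assuming Python's re module as a stand-in for the JavaScript-style /.../g literal: finditer plays the role of the g flag, and the \/ escapes are dropped because Python does not need them.

```python
import re

# The answer's pattern: "(</>)?" makes the whole closing tag optional,
# unlike "<\/>?", which only makes the final ">" optional.
PATTERN = re.compile(r"<right>([^</>]*)(</>)?")

for text in ["<right>hi</>", "<right>hi", "<right><right>hello</></>"]:
    # finditer returns every non-overlapping match, like /g in JavaScript.
    contents = [m.group(1) for m in PATTERN.finditer(text)]
    print(repr(text), contents)

# '<right>hi</>' ['hi']
# '<right>hi' ['hi']
# '<right><right>hello</></>' ['', 'hello']
```

The last line shows the nesting problem: the outer <right> produces an empty match because [^</>]* cannot step over the inner tag.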
The nesting requirement cannot be satisfied with a regular expression alone; see the question "Can regular expressions be used to match nested patterns?"
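Since a single classic regex cannot count nesting, the usual workaround is to let a small regex split the input into tokens and keep a depth counter in code. The sketch below is illustrative and not from the answer: the function name outermost_right_blocks is hypothetical, and it assumes that an unclosed <right> runs to the end of the input and that a stray </> outside any block is ignored.

```python
import re

# Opening tag, closing tag, a run of text, or a stray "<".
TOKEN = re.compile(r"<right>|</>|[^<]+|<")


def outermost_right_blocks(text):
    """Return the raw content of each outermost <right>...</> block.

    Inner <right>/</> pairs are kept as text, as the question asks.
    """
    blocks = []
    depth = 0      # how many <right> tags are currently open
    start = None   # index just after the outermost open <right>
    for m in TOKEN.finditer(text):
        tok = m.group()
        if tok == "<right>":
            if depth == 0:
                start = m.end()
            depth += 1
        elif tok == "</>" and depth > 0:
            depth -= 1
            if depth == 0:
                blocks.append(text[start:m.start()])
    if depth > 0:
        # Assumption: an unclosed <right> runs to the end of the input.
        blocks.append(text[start:])
    return blocks


print(outermost_right_blocks("<right><right>hello</></>"))  # ['<right>hello</>']
print(outermost_right_blocks("<right>hi"))                  # ['hi']
```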

Related

Sublime Text regex search for an alphanumeric string not working

I'm trying to replace a common snippet used in hundreds of pages in my project:
<div id="PageTitle"> (Page title as a string) </div>
The title varies on each page. I want to replace it with:
<div class="row">
<div class="col-md-12 col-sm-12">
<h3><?= $pageTitle?></h3>
</div>
</div>
I've tried searching with <div id="PageTitle">/^\w+$/</div>, and <div id="PageTitle">"^[a-zA-Z0-9_]*$"</div> with no luck. Any ideas?
You are almost there; it looks like you got the pattern from somewhere else. ^ and $ are anchors that match the start and end of the input (or of a line), so in the middle of a pattern they stop it from matching; get rid of them. The slashes and quotes you put around them are also treated as literal characters in Sublime's search box.
Next, if your page title will only ever contain word characters (letters, digits and underscores, no spaces), \w is fine; otherwise use . instead.
<div id="PageTitle">\w+<\/div>
For a title containing any character:
<div id="PageTitle">.+?<\/div>
Hope this helps!
Try this one as well; I think it's pretty strict:
<div id="PageTitle">(?:(?!<\/div>).)+<\/div>
Or even:
<div id="PageTitle">[\s\S]*?<\/div>
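To see both points outside the editor, here is a small Python sketch (an assumption, since the question is about Sublime Text's find-and-replace): the anchored pattern finds nothing, while the lazy [\s\S]*? pattern replaces the title block with the markup from the question. The sample page and the helper replace_page_title are hypothetical.

```python
import re

# Anchored version from the question: ^ and $ sit in the middle of the
# pattern, so they can only match at the start/end of the input.
ANCHORED = re.compile(r'<div id="PageTitle">^\w+$</div>')

# Lazy version from the answers: [\s\S]*? matches any characters,
# newlines included, up to the first </div>.
PAGE_TITLE = re.compile(r'<div id="PageTitle">[\s\S]*?</div>')

# Replacement markup from the question.
REPLACEMENT = (
    '<div class="row">\n'
    '<div class="col-md-12 col-sm-12">\n'
    '<h3><?= $pageTitle?></h3>\n'
    '</div>\n'
    '</div>'
)


def replace_page_title(html):
    # A function replacement keeps re.sub from interpreting backslashes
    # in REPLACEMENT as group references.
    return PAGE_TITLE.sub(lambda m: REPLACEMENT, html)


page = '<div id="PageTitle"> About Us </div>\n<p>body</p>'
print(ANCHORED.search(page))  # None
print(replace_page_title(page))
```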

Parse a specific div from raw text using regex?

I'm in a situation that requires parsing raw HTML as a string. Unfortunately this is unavoidable, otherwise I wouldn't post this. I only need a regex that matches the class of a div that has an img tag as a child.
Here is the example markup I'm dealing with:
<div class="summary">
<h3>Example</h3>
<div class="explanation">
<span>This serves as an example for the site.</span>
</div>
<div class="user-details">
mheathershaw<br>
<img src="res/badge522.png"/> <span class="score">522</span>
</div>
<div class="help">
Help
</div>
</div>
The div I'd like to retrieve the class from is the one that contains the image, so the ideal capture from this example is user-details. The only criterion is that the div has <img ... /> as a child.
Anyone able to help? Thanks!
You could try this:
/<div\b[^>]*\bclass="([^"]*)"[^>]*>(?:(?!<\/div>)[\s\S])*?<img\b[^>]*>(?:(?!<\/div>)[\s\S])*?<\/div>/
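Here is the pattern run against the markup from the question, as a Python sketch (Python's re is assumed as the engine; the \/ escapes are dropped because Python doesn't need them). The tempered token (?:(?!</div>)[\s\S])*? consumes any character but refuses to step past a </div>, which is what keeps summary and explanation from matching. It assumes the <img> appears before any closing </div> inside the wanted div.

```python
import re

# Class attribute of a div, then an <img> reached without crossing </div>.
DIV_WITH_IMG = re.compile(
    r'<div\b[^>]*\bclass="([^"]*)"[^>]*>'
    r'(?:(?!</div>)[\s\S])*?<img\b[^>]*>'
    r'(?:(?!</div>)[\s\S])*?</div>'
)

html = """<div class="summary">
<h3>Example</h3>
<div class="explanation">
<span>This serves as an example for the site.</span>
</div>
<div class="user-details">
mheathershaw<br>
<img src="res/badge522.png"/> <span class="score">522</span>
</div>
<div class="help">
Help
</div>
</div>"""

print(DIV_WITH_IMG.search(html).group(1))  # user-details
```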

How can I extract URLs from HTML content with a Ruby regexp?

Let's go straight to an example, since it is not easy to explain:
<li id="l_f6a1ok3n4d4p" class="online"> <div class="link"> random strings - 4 <a style="float:left; display:block; padding-top:3px;" href="http://www.webtrackerplus.com/?page=flowplayerregister&a_aid=&a_bid=&chan=flow"><img border="0" src="/resources/img/fdf.gif"></a> <!-- a class="none" href="#">random strings - 4 site2.com - # - </a --> </div> <div class="params"> <span>Submited: </span>7 June 2015 | <span>Host: </span>site2.com </div> <div class="report"> <a title="" href="javascript:report(3191274,%203,%202164691,%201)" class="alert"></a> <a title="" href="javascript:report(3191274,%203,%202164691,%200)" class="work"></a> <b>100% said work</b> </div> <div class="clear"></div> </li> <li id="l_zsgn82c4b96d" class="online"> <div class="link"> <a href="javascript:show('zsgn82c4b96d','random%20strings%204',%20'site1.com');%20" onclick="visited('zsgn82c4b96d');" style
In the content above, I want to extract the strings "f6a1ok3n4d4p" and "site2.com" from
javascript:show('f6a1ok3n4d4p','random%20strings%204',%20'site2.com')
and then turn them into
http://site2.com/f6a1ok3n4d4p
and do the same for
javascript:show('zsgn82c4b96d','random%20strings%204',%20'site1.com')
so that it becomes
http://site1.com/zsgn82c4b96d
I need this done with a Ruby regex.
This should give you some insight into how to do it:
https://regex101.com/r/wD4oT8/2
javascript:show\(\'(.*?)'.*?\'([^\']*)\'\) captures the first argument as $1 and the last single-quoted argument as $2, so substituting http://$2/$1 gives you the URL you want.
That's the regex part of it; of course, you can adjust the regex as you see fit, for example to accept double quotes as well:
javascript:show\((?:\'|\")(.*?)(?:\'|\").*?(?:\'|\")([^\'\"]*)(?:\'|\")\)
or to allow exactly 3 arguments.
In Ruby, /yourregex/.match(yourstring) returns the first match as a MatchData; yourstring.scan(/yourregex/) returns the captures of every match.
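In Ruby the idiomatic call for every occurrence is String#scan; to keep one language for the examples here, this is the same pattern as a Python sketch using re.findall, which likewise returns one tuple of captures per match. The snippet is a trimmed, hypothetical piece of the question's HTML, and show_calls_to_urls is a hypothetical helper name.

```python
import re

# The answer's pattern: group 1 is the first argument (the id),
# group 2 the last single-quoted argument (the host).
SHOW_CALL = re.compile(r"javascript:show\('(.*?)'.*?'([^']*)'\)")


def show_calls_to_urls(html):
    """Build http://<host>/<id> for every javascript:show(...) call."""
    # findall returns one (group1, group2) tuple per match.
    return [f"http://{host}/{ident}" for ident, host in SHOW_CALL.findall(html)]


snippet = (
    "<a href=\"javascript:show('zsgn82c4b96d','random%20strings%204',%20'site1.com');%20\">"
    " javascript:show('f6a1ok3n4d4p','random%20strings%204',%20'site2.com')"
)
print(show_calls_to_urls(snippet))
# ['http://site1.com/zsgn82c4b96d', 'http://site2.com/f6a1ok3n4d4p']
```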

How to make a non-greedy regex for the following?

I have something like this:
...
<div class="viewport viewport_h" style = "overflow: hidden;" >
<div id="THIS" class="overview overview_h">
<ul>
<li>some txt to be captured</li>
<li>some txt to be captured</li>
<li>some txt to be captured</li>
</ul>
<div>
" some text to be captured"
</div>
</div>
</div>
"some text not to be captured"
</div>
<div class="scrollbar_h">
<div class="track_h"></div>
...
I want to capture everything inside the div with id="THIS". I'm using something like:
#<div class="viewport viewport_h" style = "overflow: hidden;" >\s*<div class="overview overview_h">\s*(?:<ul>)?([\s\d\w<>\/()="-:;‘’!,:]+)(?:</div>)+?#
The final (?:</div>)+? is meant to stop at the next "</div>", but it doesn't work and captures all the following </div> tags. :(
As said in the comments, regex is not the proper way to parse (?:X|H)TML documents.
For your example, one straightforward approach is the following regex:
<div[^>]*id="THIS"[^>]*>(.*?)</div>
With the dot set to match newlines, that will match up to the first closing </div>, giving:
<ul>
<li>some txt to be captured</li>
<li>some txt to be captured</li>
<li>some txt to be captured</li>
</ul>
<div>
" some text to be captured"
</div>
As you can see, that's not the result you want: the first </div> closes the inner <div>, so you need another </div>. To find the right closing tag you have to count the open divs, and how you do that depends on the language you are using.
If you know there is exactly one nested div, you can make the capture run past one extra closing tag (again with the dot matching newlines):
<div[^>]*id="THIS"[^>]*>(.*?</div>.*?)</div>
Now it captures up to the second </div>, but it breaks as soon as the nesting depth changes, so in general it's hard for a regex to find the right end. That is the reason the proper way to parse (?:X|H)TML is to use a (?:X|H)TML parser.
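As an illustration of the parser route, here is a sketch using Python's standard html.parser module (the choice of library is an assumption; any HTML parser will do). The hypothetical InnerHTMLById class counts nested start and end tags with the same name as the target, which is exactly the bookkeeping a single regex cannot do, and returns the raw markup between the target's start tag and its matching end tag.

```python
from html.parser import HTMLParser


class InnerHTMLById(HTMLParser):
    """Collect the raw markup inside the first element whose id matches."""

    def __init__(self, target_id):
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.tag = None      # tag name of the target, once found
        self.depth = 0       # open self.tag elements, counting the target itself
        self.start = None    # index just after the target's start tag
        self.result = None

    def extract(self, text):
        self._text = text
        # Index of each line start, to turn getpos() into an absolute index.
        self._line_starts = [0] + [i + 1 for i, ch in enumerate(text) if ch == "\n"]
        self.feed(text)
        self.close()
        return self.result

    def _index(self):
        line, col = self.getpos()  # start of the tag being handled
        return self._line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        if self.result is not None:
            return
        if self.tag is None:
            if dict(attrs).get("id") == self.target_id:
                self.tag, self.depth = tag, 1
                self.start = self._index() + len(self.get_starttag_text())
        elif tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.result is None and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.result = self._text[self.start:self._index()]


sample = """<div class="viewport viewport_h" style = "overflow: hidden;" >
<div id="THIS" class="overview overview_h">
<ul>
<li>some txt to be captured</li>
</ul>
<div>
" some text to be captured"
</div>
</div>
"some text not to be captured"
</div>"""

print(InnerHTMLById("THIS").extract(sample))
```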

Regular expression for exactly one match

I am using the following regular expression in my code editor (Sublime Text) to search for ASP.NET comments:
<%--.*(\n.*)*--%>
I want this regular expression to stop as soon as the first --%> is found, but it keeps going until the last comment's --%>. I have the idea that I need some kind of flag to make it stop at the first --%>, but I can't figure it out.
Can anyone tell me how I can modify this regex?
UPDATE
I forgot to post some sample markup. Here it is:
<div class="modal-footer">
<%--<button class="btn" data-dismiss="modal">
Close</button>
<button id="btnAddCountry" class="btn btn-primary" data-dismiss="modal">
Save changes</button>--%>
</div>
</div>
<div class="row-fluid">
<div class="span12">
<div class="box paint_hover">
<div class="title">
<h3>Sale Voucher</span>
</h3>
</div>
<div class="content">
<ul id="tabExample1" class="nav nav-tabs">
<li class="active"><a id="lnkAddEditVoucher" href="#AddEditVoucher" data-toggle="tab">Add/Update Sale Voucher</a></li>
<li><a id="lnkViewVouchers" href="#ViewVouchers" data-toggle="tab">Search Sale Voucher</a></li>
<%-- <li><a id="lnkViewParties" href="#ViewParties" data-toggle="tab">Search Parties</a></li>--%>
</ul>
I just want to match the first comment and not the second one.
You need to make the * quantifiers non-greedy. Usually this is done by adding a ? after them, e.g. .*? instead of just .*.
I've also simplified the regex a bit. Sublime Text supports the (?s) modifier at the beginning of the pattern to make the dot match even newlines:
(?s)<%--.*?--%>
If you prefer matching the newline explicitly:
<%--(.|\n)*?--%>
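A quick check of the (?s) version against a trimmed-down version of the question's markup, written in Python (its re module accepts the same (?s) flag and lazy quantifier; treating it as a stand-in for Sublime's engine is an assumption). The greedy pattern from the question is included for comparison.

```python
import re

# Lazy version from the answer: (?s) lets "." match newlines, and ".*?"
# stops at the first "--%>" after the opening "<%--".
LAZY = re.compile(r"(?s)<%--.*?--%>")
# Greedy version from the question, for comparison.
GREEDY = re.compile(r"<%--.*(\n.*)*--%>")

markup = """<div class="modal-footer">
<%--<button class="btn" data-dismiss="modal">
Close</button>--%>
</div>
<%-- <li><a id="lnkViewParties" href="#ViewParties">Search Parties</a></li>--%>
</ul>"""

print(LAZY.search(markup).group())    # only the first comment
print(len(LAZY.findall(markup)))      # 2: each comment matched separately
print(GREEDY.search(markup).group())  # runs on to the last "--%>"
```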
The problem you seem to have is that you use the greedy version of .*, which matches anything (including --%>). Try using <%--.*?(\n.*?)*?--%> instead to make it non-greedy.