Regex to find a class in a list of files - regex

I have a list of files (HTML) that contains classes.
1 <div class="homepage">...</div>
2 <div class="abc homepage">...</div>
3 <div class="abc homepage2">...</div>
4 <div class="abc homepage four five">...</div>
5 <div class="abc homepagenot five">...</div>
6 <div class="abc homepage-not five">...</div>
I'm trying to find them in Visual Studio using regular expressions.
I've been trying to use
class=".*homepage.*"
as the search criteria, but that also returns items 5 and 6.
Essentially I just want items 1, 2 and 4.
What am I missing in the regex?

You could check for a word boundary with \b, a zero-width assertion that matches between a word character and a non-word character (or at the start or end of the string). I also include a negative lookahead for the hyphen, because a hyphen is a non-word character, so \b matches just before it, but you don't want item 6 in your list.
\bhomepage\b(?!-)
Here's the Regex101 page.
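As a quick check of the word-boundary approach, here is a minimal JavaScript sketch that runs the pattern over the six sample lines from the question. The array and variable names are only for illustration; Visual Studio's own search is not involved.

```javascript
// The six sample lines from the question, in the same order (items 1 to 6).
const lines = [
  '<div class="homepage">...</div>',
  '<div class="abc homepage">...</div>',
  '<div class="abc homepage2">...</div>',
  '<div class="abc homepage four five">...</div>',
  '<div class="abc homepagenot five">...</div>',
  '<div class="abc homepage-not five">...</div>',
];

// \b rejects "homepage2" and "homepagenot"; (?!-) rejects "homepage-not".
const pattern = /\bhomepage\b(?!-)/;

// Collect the 1-based item numbers whose line matches.
const matchedItems = lines
  .map((line, index) => (pattern.test(line) ? index + 1 : null))
  .filter((item) => item !== null);

console.log(matchedItems); // [ 1, 2, 4 ]
```

Note that only a trailing hyphen is excluded: a class such as not-homepage would still match, because \b also matches between the hyphen and the h.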

Try class=".*homepage[ "]. This looks for either a space or quote right after homepage.

I assume that you want to find the single word homepage within class="...".
Please see this regex:
class=\"(.+ )*homepage( .+)*\"

Assuming you want the first four divs as matches (this includes item 3, homepage2), this expression will work for you.
homepage(?!not|-)
// Sample markup: the question's six divs plus one extra homepage2 div.
var divs = "<div class='homepage'></div>"
+ "<div class='abc homepage'></div>"
+ "<div class='abc homepage2'></div>"
+ "<div class='abc homepage four five'></div>"
+ "<div class='abc homepagenot five'></div>"
+ "<div class='abc homepage-not five'></div>"
+ "<div class='abc homepage2'></div>";
// The negative lookahead rejects "homepagenot" and "homepage-not".
var matches = divs.match(/homepage(?!not|-)/g);
console.log(matches, matches.length); // five matches: divs 1-4 and 7

Related

How to Match Redundant Lines From Contenteditable Div in Regex

I'm trying to process the html inside a contenteditable div. It might look like:
<div>Hi I'm Jack...</div>
<div><br></div>
<div><br></div>
<div>More text.</div> *<div><br></div>*
*<div><br></div>**<div><br></div>*
*<div><br></div>*
*<div>
<br>
</div>*
What regex would match all trailing <div><br></div> elements but not the ones sandwiched between useful divs containing text, i.e., <div> text (not html) </div>?
I have enclosed all expressions I want to match in asterisks. The asterisks are for reference only and are not part of my string.
Thanks,
Jack
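You can use the pattern below. The lookahead has to see across line breaks, so enable single-line (dot matches newline) mode if your input contains newlines: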
(?:<div>[\n\s]*<br>[\n\s]*<\/div>)(?!.*?<div>[^<]+<\/div>)
You can try it here.
Let me know if this works for all your cases and I will write a detailed explanation of the pattern.
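For illustration, here is a small JavaScript sketch of that pattern applied to the sample from the question (without the asterisks). It assumes the s (dotAll) flag is available so that the lookahead can scan across newlines, and it writes [\n\s] simply as \s, which is equivalent. The variable names are made up for the example.

```javascript
// Sample content of the contenteditable div, as in the question.
const html = [
  "<div>Hi I'm Jack...</div>",
  "<div><br></div>",
  "<div><br></div>",
  "<div>More text.</div> <div><br></div>",
  "<div><br></div><div><br></div>",
  "<div><br></div>",
  "<div>",
  "<br>",
  "</div>",
].join("\n");

// An empty <div><br></div> that is NOT followed, anywhere later,
// by a div containing plain text. The s flag lets . match newlines.
const trailing = /<div>\s*<br>\s*<\/div>(?!.*?<div>[^<]+<\/div>)/gs;

const matches = html.match(trailing);
console.log(matches.length); // 5

// Remove the trailing empty divs and any whitespace left behind.
const cleaned = html.replace(trailing, "").trimEnd();
console.log(cleaned);
```

The two empty divs between the text divs are kept, because the lookahead still finds <div>More text.</div> after them.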

Regex - match every possible char and space

I want to extract data from HTML. The problem is that I can't extract two strings, one at the top and one at the bottom of my pattern.
I want to extract 23423423423 and 1234523453245, but only if the string Allan appears between them:
<h4>###### </h4> said12:49:32
</div>
<a href="javascript:void(0)" onclick="replyAnswer(##########,'GET','');" class="reportLink">
report </a>
</div>
<div class="details">
<p class="content">
Hi there, Allan.
</p>
<div id="AddAnswer1234523453245"></div>
Of course, I can do something like this: Profile\/(\d+).*\s*.*\s*.*\s*.*\s*.*\s*.*\s*.*\s*.*Allan.*\s*.*\s*.*AddAnswer(\d+). But the code is horrible. Is there any way to make it shorter?
I was thinking about:
Profile\/(\d+)(.\sAllan)*AddAnswer(\d+)
or
Profile\/(\d+)(.*Allan\s*)*AddAnswer(\d+)
but neither of them works properly. Do you have any ideas?
You can construct a character class that matches any character, including newlines, by using [\S\s]. Whitespace and non-whitespace characters together cover every character.
Your attempts were reasonably close:
/Profile\/(\d+)[\S\s]*Allan[\S\s]*AddAnswer(\d+)/
This looks for Profile/, the number that comes after it, any characters up to Allan, any characters up to AddAnswer, and the number that comes after it. If you have single-line mode available (/s), you can use dots instead.
/Profile\/(\d+).*Allan.*AddAnswer(\d+)/s
demo
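To make the two variants concrete, here is a minimal JavaScript sketch. The question's excerpt does not show the line containing Profile/, so the first line of the sample input below is a hypothetical stand-in; everything else follows the excerpt.

```javascript
// Hypothetical input: the "Profile/<id>" link is assumed for illustration,
// because the excerpt in the question omits it.
const html = `<a href="/Profile/23423423423">user</a>
<div class="details">
<p class="content">
Hi there, Allan.
</p>
<div id="AddAnswer1234523453245"></div>`;

// Variant 1: [\S\s] matches any character, including newlines.
const withClass = /Profile\/(\d+)[\S\s]*Allan[\S\s]*AddAnswer(\d+)/;

// Variant 2: the s flag lets . match newlines.
const withDotAll = /Profile\/(\d+).*Allan.*AddAnswer(\d+)/s;

const a = html.match(withClass);
const b = html.match(withDotAll);

console.log(a[1], a[2]); // 23423423423 1234523453245
console.log(b[1], b[2]); // 23423423423 1234523453245
```

Both quantifiers are greedy, so on a page with several posts the match can stretch from the first Profile/ to the last AddAnswer; that is worth checking against real input.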
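In some flavors (Ruby, for example) the m modifier makes . match newlines; in PCRE, JavaScript and Python, m only affects ^ and $, and the modifier that makes . match newlines is s. Note that this shorter pattern does not check for Allan.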
/Profile\/(\d+).+AddAnswer(\d+)/m
Better use a parser instead. If you must use regular expressions for whatever reason, you might get along with a tempered greedy solution:
Profile/(\d+) # Profile followed by digits
(?:(?!Allan)[\S\s])+ # any character except when there's Allan ahead
Allan # Allan literally
(?:(?!AddAnswer)[\S\s])+ # same construct as above
AddAnswer(\d+) # AddAnswer, followed by digits
See a demo on regex101.com
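JavaScript has no free-spacing (x) mode, so one way to keep the commented layout is to assemble the tempered pattern from string parts. This is only a sketch: the helper name extractIds and the sample inputs (including the Profile/ link, which the question's excerpt omits) are hypothetical.

```javascript
// Build the tempered greedy pattern piece by piece, one comment per part.
const pattern = new RegExp(
  [
    "Profile/(\\d+)",            // Profile/ followed by digits
    "(?:(?!Allan)[\\S\\s])+",    // any characters, stopping before "Allan"
    "Allan",                     // Allan literally
    "(?:(?!AddAnswer)[\\S\\s])+", // any characters, stopping before "AddAnswer"
    "AddAnswer(\\d+)",           // AddAnswer followed by digits
  ].join("")
);

// Hypothetical helper: returns [profileId, answerId], or null if there is no match.
function extractIds(html) {
  const match = html.match(pattern);
  return match ? [match[1], match[2]] : null;
}

const withAllan =
  '<a href="/Profile/23423423423">user</a>\n' +
  '<p class="content">\nHi there, Allan.\n</p>\n' +
  '<div id="AddAnswer1234523453245"></div>';
const withoutAllan = withAllan.replace("Allan", "Bob");

console.log(extractIds(withAllan));    // [ '23423423423', '1234523453245' ]
console.log(extractIds(withoutAllan)); // null
```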

Get specific string after first occurrence of string: regex in Sublime Text 2 find & replace

include_once($pathToRoot.'header.php');
echo('</div>');
Assume you have variations on the above code across hundreds of files. How do you match the first occurrence of
</div>
after
header.php'
?
In the find field:
(?s)(header\.php'.+?)</div>
In the replace field (if you want to replace </div> with </test>):
$1</test>
I don't know Sublime Text 2, but the regular expression would look like this (note that $ and the dots must be escaped to match literally):
/include_once\(\$pathToRoot\.'header\.php'\);(.*?)(<\/div>)/s
The first group is the text between the include and the closing div, and the second group is the closing div itself.

What Yahoo Pipes regex to use in this case?

Do you have any ideas how to change this link in item.description in Yahoo Pipes
<img src="http://mysite.com/img/pc/image.gif" class="big" style="background-image:url(http://mysite.com/pre_big_crop/pic/pc/gallery/dd/c1/example.jpeg);" alt="" title="">
to this
<img src="http://mysite.com/pre_big_crop/pic/pc/gallery/dd/c1/example.jpeg"/>
using regex.
I don't know what variant of RegEx Pipes uses, so I'll go with the .NET variant and you can adjust for whatever syntax is needed. It should be pretty close.
Search for:
<img[^>]+url\(
([^\)]+)
\)[^>]+>
Replace with:
<img src="$1" />
Join the lines. Line 1 finds an image tag up to the url argument in the CSS style attribute. Line 2 matches the background image URL and captures it. Line 3 matches the rest of the image tag.
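As a sanity check of the joined pattern, here is a short JavaScript sketch applied to the tag from the question. Yahoo Pipes' own regex module is not used here, so treat this only as a test of the expression itself.

```javascript
// The image tag from the question.
const input =
  '<img src="http://mysite.com/img/pc/image.gif" class="big" ' +
  'style="background-image:url(http://mysite.com/pre_big_crop/pic/pc/gallery/dd/c1/example.jpeg);" ' +
  'alt="" title="">';

// The three lines of the search pattern, joined into one expression.
const pattern = /<img[^>]+url\(([^\)]+)\)[^>]+>/;

// $1 is the captured background-image URL.
const output = input.replace(pattern, '<img src="$1" />');

console.log(output);
// <img src="http://mysite.com/pre_big_crop/pic/pc/gallery/dd/c1/example.jpeg" />
```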
Here is an extremely simple regex to accomplish what you're looking for, using Perl-style regexes:
<img.*background-image:url\((.*)\);.*>
Basically, here is the breakdown on how it matches:
It will start by matching the characters "<img"
It then matches any characters, between 0 and unlimited times.
Then it matches the string "background-image:url(
Then it matches any characters, between 0 and unlimited times, which is captured into backreference #1
Then it matches the characters ");"
Then it matches any characters, between 0 and unlimited times.
Then it matches the ">" character.
Note: You should replace the items that match any characters to something more specific, depending on the application that you're using the regex. This is why I've referred to this as "extremely simple".
Then, that gets replaced with:
<img src="$1">
Edit: Didn't see richardtallent's answer, pretty similar application just a different implementation.

Regex to exclude multiple strings

Could use some help with Regex searching with NetBeans 7.01's find function.
I'm trying to exclude multiple strings. Specifically, the target lines:
<div class="table_left">
<div class="table_right">
<div class="table_clear">
I need to match only the third and other Div classes that are not either table_left or table_right.
I've tried:
class="table_(((?!left).*)|((?!right).*))
and
class="table_(left|right){0}
I realized while pasting my first Regex line that I'm matching not right OR not left, which is returning both. What is the proper way to specify two conditions? The and operator?
The joys of searching for words that are also Boolean operators...
Try this pattern:
<div\s+class="(?!table_(left|right))[^"]+"
which wouldn't match:
<div class="table_left">
<div class="table_right">
but would match:
<div class="table_clear">
<div class="foo">
EDIT
The HT wrote:
I need to match only classes that begin with table, but are not right or left
Ah, okay, that would look like:
<div\s+class="table_(?!left|right)[^"]+"
or
<div\s+class="table(?!_left|_right)[^"]+"
as you already found yourself (but I included it in my answer for completeness' sake).
A quick explanation of the pattern <div\s+class="table_(?!left|right)[^"]+":
<div # match '<div'
\s+ # match one or more space chars
class="table_(?!left|right) # match 'class="table_' only if it is not followed by 'left' or 'right'
[^"]+ # match one or more characters other than '"'
" # match a '"'
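The difference between the two lookahead placements can be seen in a small JavaScript sketch. NetBeans uses Java's regex engine, but these particular patterns behave the same way in JavaScript; the array and variable names are made up for the example.

```javascript
// Target lines from the question, plus the extra "foo" example from the answer.
const lines = [
  '<div class="table_left">',
  '<div class="table_right">',
  '<div class="table_clear">',
  '<div class="foo">',
];

// Any class except those starting with table_left or table_right.
const anyButLeftRight = /<div\s+class="(?!table_(left|right))[^"]+"/;

// Only classes starting with table_, except table_left and table_right.
const tableButLeftRight = /<div\s+class="table_(?!left|right)[^"]+"/;

const first = lines.filter((line) => anyButLeftRight.test(line));
const second = lines.filter((line) => tableButLeftRight.test(line));

console.log(first);  // [ '<div class="table_clear">', '<div class="foo">' ]
console.log(second); // [ '<div class="table_clear">' ]
```

The lookahead only inspects the start of the remaining text, so a class such as table_leftover would be rejected as well; if that matters, the alternatives could be anchored to the closing quote.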