RegExp replace all but selected - regex

I'm trying to erase everything except the matched text in a 1,900-line document using Notepad++'s RegExp Find/Replace, so that only the file names remain, which should shorten it to under 1,000 lines. I know the expression that selects the text, (?<=/images/item/)(.*)(?=" a), but I don't know how to make it erase everything that doesn't match. Here's a portion of the document.
In Notepad++, the expression finds and selects abyssal-scepter.gif, aegis-of-the-legion.gif, etc.
<img src="/images/item/abyssal-scepter.gif" alt="LoL Item: Abyssal Scepter"><br> <div id="id_77" class="tier-wrapper drag-items health magic-resist health-regen champ-box float-left ajax-tooltip {t:'Item',i:'77'} classic-and-dominion filter-is-dominion filter-is-classic filter-tier-advanced filter-bonus-aura filter-category-health filter-category-magic-resist filter-category-health-regen ui-draggable ui-draggable-handle">
<img src="/images/item/aegis-of-the-legion.gif" alt="LoL Item: Aegis of the Legion"><br> <div id="id_235" class="tier-wrapper drag-items ability-power movement champ-box float-left ajax-tooltip {t:'Item',i:'235'} filter-tier-advanced filter-bonus-unique-passive filter-category-ability-power filter-category-movement ui-draggable ui-draggable-handle">
<img src="/images/item/aether-wisp.gif" alt="LoL Item: Aether Wisp"><br>
<div class="info">
<div class="champ-name">Aether Wisp</div>
<div class="champ-sub">
<img src="/images/gold.png" alt="Item Cost" style="width:16px; vertical-align:middle;"> 850 / 415
</div>
</div>
</div>
<div id="id_21" class="tier-wrapper drag-items ability-power champ-box float-left ajax-tooltip {t:'Item',i:'21'} classic-and-dominion filter-is-dominion filter-is-classic filter-tier-basic filter-category-ability-power ui-draggable ui-draggable-handle">
<img src="/images/item/amplifying-tome.gif" alt="LoL Item: Amplifying Tome"><br>
<div class="info">
<div class="champ-name">Amplifying Tome</div>
<div class="champ-sub">
I'm not familiar with RegExp, so to summarize, I need it to look like this at the end of it.
abyssal-scepter.gif
aegis-of-the-legion.gif
aether-wisp.gif
amplifying-tome.gif
Thank you for your time

A Notepad++ solution:
Find what : .*?/images/item/(.*?)"|.*
Replace with : $1\n
Search mode : Regular expression (with ". matches newline" checked)
The result will have an extra linefeed at the end.
But that shouldn't pose a problem I suppose.
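If a script is an option, the same capture idea can be sketched outside Notepad++. This is only an illustration in Python, not part of the Notepad++ approach above; the sample text is a shortened stand-in for the real file, and the variable names are my own.

```python
import re

# Shortened stand-in for the document in the question.
html = '''<img src="/images/item/abyssal-scepter.gif" alt="LoL Item: Abyssal Scepter"><br>
<img src="/images/gold.png" alt="Item Cost">
<img src="/images/item/aegis-of-the-legion.gif" alt="LoL Item: Aegis of the Legion"><br>'''

# Capture everything between "/images/item/" and the closing quote.
file_names = re.findall(r'/images/item/([^"]+)"', html)

# One file name per line, with no trailing blank line.
print("\n".join(file_names))
```

Because only the captured group is kept, there is nothing left over to delete afterwards.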

Maybe this can help, or maybe not, since you dropped the JavaScript tag from your original post.
<script type="text/javascript">
var thestring = "<img src=\"/images/item/aegis-of-the-legion.gif\" alt=\"LoL Item: Aegis of the Legion\"><br>";
var thestring2 = "<img src=\"/images/otherstuff/aegis-of-the-legion.gif\" alt=\"LoL Item: Aegis of the Legion\"><br>";
function ParseIt(incomingstring) {
// [^"]* stops at the first closing quote, so the capture cannot run on into later attributes
var pattern = /"\/images\/item\/([^"]*)" /;
var match = pattern.exec(incomingstring);
// exec returns null when nothing matches; fall back to an empty string
return match ? match[1] : "";
}
</script>
Calling ParseIt(thestring) returns "aegis-of-the-legion.gif"
Calling ParseIt(thestring2) returns ""

Since you are doing this in NP++, this works for me. In cases like this, where speed and results matter more than specific technique, I usually run several regexes. First, search for > and replace it with >\n, which puts each tag on its own line for simpler processing. Then replacing ^>*<.*?".*?/?([\w\d\-_]+\.\w{2,4})?".*>.*$ with $1 will extract all the filenames from the tags, removing the unneeded text. Next, to clear the tags that didn't have a filename in them, replace <.*> with an empty string. Finally, use Edit > Line Operations > Remove Empty Lines, and you'll have the result you're looking for. It's not a 100% regex solution, but this is a one-time action that you just need a simple result from.

Related

How to find and replace in a regex code

I am trying to find and replace in a regex code
<div class="gallery-image-container">
<div jstcache="1116"
class="gallery-image-high-res loaded"
style="width: 396px;
height: 264px;
background-image: url(&quot;https://lh5.googleusercontent.com/p/AF1QipMcTfMPZj_d5iip9WKtN2SQB9Je5U4rRB0nT_t8=s396-k-no&quot;);
background-size: 396px 264px;"
jsan="7.gallery-image-high-res,7.loaded,5.width,5.height,5.background-image,5.background-size">
</div>
</div>
In the code above I used this
(https:\/\/[^&]*)
to extract this URL
https://lh5.googleusercontent.com/p/AF1QipMcTfMPZj_d5iip9WKtN2SQB9Je5U4rRB0nT_t8=s396-k-no
I used the regex s\d{3} to get s396.
Now I want to replace s396 with s1000 in the URL.
I am stuck and don't know how to go about it.
Is there any way all of this can be done in just one regex rather than several?
I would suggest using an HTML parser, but I understand sometimes that is not possible. Here is a little example in python.
import re
# Sample markup from the question; the quotes around the URL are HTML-escaped as &quot;
data = '''
<div class="gallery-image-container">
<div jstcache="1116"
class="gallery-image-high-res loaded"
style="width: 396px;
height: 264px;
background-image: url(&quot;https://lh5.googleusercontent.com/p/AF1QipMcTfMPZj_d5iip9WKtN2SQB9Je5U4rRB0nT_t8=s396-k-no&quot;);
background-size: 396px 264px;"
jsan="7.gallery-image-high-res,7.loaded,5.width,5.height,5.background-image,5.background-size">
</div>
</div>
'''
# Grab the URL: everything from http(s):// up to the next "&"
match = re.search(r"(https?://[^&]+)", data)
url = match.group(1)
# Swap the three-digit size token (s396) for s1000
url = re.sub(r"s\d{3}", "s1000", url)
print(url)
The key part is the regex
(https?://[^&]+)
It uses a negated character class: look for http with an optional s, followed by ://, and then every character that is not an &. You can use this site to play around with regexes:
https://regex101.com/r/b0APFA/1
I'm sure you could do a clever 1 liner nested regex to find and replace all at once, but it's going to be easier to troubleshoot if you have a few lines.
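For what it's worth, here is one way the single-expression version might look. Treat it as a sketch under assumptions: it assumes the size token always follows an "=" in the URL, and the character class stops at either & or a double quote so it works whether or not the quotes in the page source are HTML-escaped. The variable names are mine.

```python
import re

# Fragment of the style attribute from the question.
data = ('background-image: url(&quot;https://lh5.googleusercontent.com/p/'
        'AF1QipMcTfMPZj_d5iip9WKtN2SQB9Je5U4rRB0nT_t8=s396-k-no&quot;);')

# Group 1: the URL up to and including "=s"; \d{3}: the size to swap out;
# group 2: whatever is left of the URL before the next & or quote.
match = re.search(r'(https?://[^&"]+?=s)\d{3}([^&"]*)', data)

# expand() rebuilds the URL from the groups with 1000 in place of the digits.
url = match.expand(r'\g<1>1000\g<2>') if match else None
print(url)
```

It finds the URL and rewrites the size in one pass, at the cost of a pattern that is harder to read than the two-step version.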

Match only specific url via regex

I want to match only this specific url
https://www.facebook.com/princessaustine.alcantara.3/about?lst=100002159119314%3A100022260619396%3A1507039852
Here's the source code
<div class="hidden_elem"><code id="u_0_17"><!-- <div class="fbTimelineTopSectionBase _6-d _529n"><div class="_5h60" id="pagelet_above_header_timeline" data-referrer="pagelet_above_header_timeline"></div><div id="above_header_timeline_placeholder"></div><div class="fbTimelineSection fbTimelineTopSection"><div id="fbProfileCover"><div class="cover" id="u_0_13"><a class="coverWrap coverImage" data-referrerid="100022260619396" href="https://www.facebook.com/photo.php?fbid=118243868927633&set=a.117907638961256.1073741827.100022260619396&type=3" rel="theater" ajaxify="https://www.facebook.com/photo.php?fbid=118243868927633&set=a.117907638961256.1073741827.100022260619396&type=3&size=1440%2C1080&source=10&player_origin=profile&referrer_profile_id=100022260619396" data-ploi="https://scontent.fmnl4-1.fna.fbcdn.net/v/t31.0-8/22136852_118243868927633_2950847275004458372_o.jpg?oh=fbcb3c8abc2023b35a5a36fb2989d850&oe=5A821DA8" title="Cover Photo" id="u_0_12" data-cropped="1"><img class="coverPhotoImg photo img" src="https://scontent.fmnl4-1.fna.fbcdn.net/v/t31.0-8/c0.81.851.315/p851x315/22136852_118243868927633_2950847275004458372_o.jpg?oh=7d0222f3c38b31acb33a7b1ffba2ac9e&oe=5A797385" style="top:0px;width:100%" data-fbid="118243868927633" alt="Cover Photo, Image may contain: 1 person, sitting" /><div class="coverBorder"></div><img class="coverChangeThrobber img" src="https://static.xx.fbcdn.net/rsrc.php/v3/yk/r/LOOn0JtHNzb.gif" alt="" width="16" height="16" /></a><div class="_2nlj _2xc6"><h1 class="_2nlv"><a class="_2nlw" href="https://www.facebook.com/princessaustine.alcantara.3"><span id="fb-timeline-cover-name" data-testid="profile_name_in_profile_page">Princess Austine Alcantara</span></a><span class="_2nly"></span></h1></div></div><div id="fbTimelineHeadline" class="clearfix"><div class="_50zj"><div class="actions _70j"><div class="_5h60 actionsDropdown" id="pagelet_timeline_profile_actions" data-referrer="pagelet_timeline_profile_actions"></div></div></div><div 
class="_70k"><ul class="_6_7 clearfix" data-referrer="timeline_light_nav_top" id="u_0_14"><li><a class="_6-6 _6-7" href="https://www.facebook.com/princessaustine.alcantara.3?lst=100002159119314%3A100022260619396%3A1507039852" data-tab-key="timeline">Timeline<span class="_513x"></span></a></li><li><a class="_6-6" href="https://www.facebook.com/princessaustine.alcantara.3/about?lst=100002159119314%3A100022260619396%3A1507039852" data-tab-key="about">About<span class="_513x"></span></a></li><li><a class="_6-6" href="https://www.facebook.com/princessaustine.alcantara.3/friends?lst=100002159119314%3A100022260619396%3A1507039852&source_ref=pb_friends_tl" data-tab-key="friends">Friends<span class="_gs6"><span id="u_0_10">7 Mutual</span></span><span class="_513x"></span></a></li><li><a class="_6-6" href="https://www.facebook.com/princessaustine.alcantara.3/photos?lst=100002159119314%3A100022260619396%3A1507039852&source_ref=pb_friends_tl" data-tab-key="photos">Photos<span class="_513x"></span></a></li><li><div class="_6a uiPopover _6-6 _9rx" id="u_0_15"><a class="_9ry _p" href="#" aria-haspopup="true" aria-expanded="false" rel="toggle" role="button" id="u_0_16">More<i class="_bxy img sp_AWfL8SqGWNa sx_41c408"></i></a></div></li></ul></div><div class="name"><div class="photoContainer"><div><a class="profilePicThumb" href="https://www.facebook.com/photo.php?fbid=116140922471261&set=a.116141002471253.1073741826.100022260619396&type=3&source=11&referrer_profile_id=100022260619396" rel="theater" id="u_0_11"><img class="profilePic img" alt="Princess Austine Alcantara's Profile Photo, Image may contain: 1 person, smiling, closeup" src="https://scontent.fmnl4-1.fna.fbcdn.net/v/t1.0-1/c0.0.160.160/p160x160/22050231_116140922471261_8103110572544919612_n.jpg?oh=d942ae339c7c9dc7c8add2e3dd34f6c4&oe=5A413CB6" /></a></div><meta 
content="https://scontent.fmnl4-1.fna.fbcdn.net/v/t1.0-1/p50x50/22050231_116140922471261_8103110572544919612_n.jpg?oh=e43d8f6e5cfb1387f1a5d864b7947225&oe=5A3CC115" itemprop="image" /></div></div></div></div></div>
I tried to use the regex below, but it also matches other items inside. How can I match only that specific url? Thanks
The class is dynamic.
(?i)(?<=a class=".+" href=").*?(?=" data-tab-key="about)
If you want to match the href, you can use [^"]+ inside of href; this way your regex will not capture more than what you need, as it will be stopped by the closing ".
You can then create something like href="([^"]*?)" data-tab-key="about".
I'd suggest avoiding using regex to match html though.
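As a rough illustration of the capture-group idea, here it is in Python. The markup is a trimmed copy of two of the links from the question, and the variable names are my own; a real page may differ, so take this as a sketch rather than a guaranteed fix.

```python
import re

# Two of the <a> tags from the question, trimmed to the relevant attributes.
html = (
    '<li><a class="_6-6 _6-7" href="https://www.facebook.com/princessaustine.alcantara.3'
    '?lst=100002159119314%3A100022260619396%3A1507039852" data-tab-key="timeline">Timeline</a></li>'
    '<li><a class="_6-6" href="https://www.facebook.com/princessaustine.alcantara.3/about'
    '?lst=100002159119314%3A100022260619396%3A1507039852" data-tab-key="about">About</a></li>'
)

# [^"]* cannot cross a quote, so the capture is confined to a single href value,
# and the trailing data-tab-key="about" selects the right link.
match = re.search(r'href="([^"]*)" data-tab-key="about"', html)
about_url = match.group(1) if match else None
print(about_url)
```

No look-behind is needed here, because the wanted text is taken from group 1 rather than from the whole match.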
Try..
(?i)a class=".+" href="\K.*?(?=" data-tab-key="about)
I believe you are struggling to get a variable length look behind to work, which is
(?<=a class=".+" href=")
.+ in the above is not valid because it introduces variable length into a look-behind. Most regex engines (PCRE and Python's re, for example) do not support this, though a few, such as .NET, do.
That said, to emulate a variable-length look-behind one can use the \K escape, which resets the starting point of the match to the current position (thereby dropping everything consumed so far from the final match).
Demo regex is here.

Selenium+Python, about find_element_by_xpath h3

Is there any way to identify the "Connect" button via the string "Test Engine 0728" and then click it, using find_element_by_xpath or any other method in a Python + Selenium environment? Thanks a lot!
<html>
<head>
<body>
<div class="page" id="main-page">
<div class="controls" id="Engines">
<div class="devices" id="Devices-List">
<h3 class="device-name">Test Engine 0728 </h3>
</div>
<button>Connect</button>
...
This xpath should work for you:
driver.find_element_by_xpath("//h3[contains(text(),'Test Engine 0728')]/../../button[contains(text(),'Connect')]").click()
There are certainly multiple ways to find the button.
One option would be to start your xpath expression with the div with id Engines, check that it contains the h3 tag with Test Engine 0728 text in the div with Devices-List id. Then, get the button by Connect text:
button = driver.find_element_by_xpath('//div[@id="Engines" and div[@id="Devices-List"]/h3[contains(., "Test Engine 0728")]]/button[. = "Connect"]')
button.click()
Or, another option would be to find the div with Devices-List id, check for the h3 tag's text inside and get the following button sibling:
//div[@id="Devices-List" and h3[contains(., "Test Engine 0728")]]/following-sibling::button
This one also should work:
connectButtonClick = driver.find_element_by_xpath("//div[@class='controls'][@id='Engines'][contains(., 'Test Engine 0728')]//button[text()='Connect']").click()

Regex replace text between quotes and a little more

I'm using a new framework, OctoberCMS, which has a system for linking pages with Twig. This is an example of how it works:
About Us
I have a huge table with loads of links inside the cells and need to change the link format to one that complies with October's.
Here's an example link in the table cells:
Lecture Notes
This is the format that it has to follow
<a href=
"/assets/<...long_path_to_file...>"> </a>
TO
<a href=
"{{ 'assets/<...long_path_to_file...>' |theme }}"> </a>
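This regex should work for you:
match:
/href=(["'])\/?(.*?)\1/ig
and replace with
'href="{{ \'$2\'|theme }}"'
Here (["']) captures the opening quote and \1 requires the same closing quote, \/? drops the leading slash, and the lazy (.*?) stops at the first closing quote instead of running on to the last quote on the line.
In javascript this would look like:
var a = '<a href="/assets/notes.pdf">Lecture Notes</a>'; // sample anchor markup; the path is made up
a = a.replace(/href=(["'])\/?(.*?)\1/ig, 'href="{{ \'$2\'|theme }}"');
in php (PCRE has no g flag; preg_replace already replaces every match):
$htmlString = '<a href="/assets/notes.pdf">Lecture Notes</a>'; // sample markup; the path is made up
$newHtml = preg_replace('/href=(["\'])\/?(.*?)\1/i', 'href="{{ \'$2\'|theme }}"', $htmlString);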
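If you would rather convert the whole table in one offline pass, the same substitution can be sketched in Python. The function name and the sample path are hypothetical, and the sketch assumes every href in the input should be wrapped, which may not hold if the table also contains external links.

```python
import re

def to_theme_link(html: str) -> str:
    """Wrap each quoted href value in the |theme Twig filter."""
    # (["']) remembers the opening quote, /? drops a leading slash,
    # (.*?) lazily captures the path, \1 requires the same closing quote.
    return re.sub(
        r'''href=(["'])/?(.*?)\1''',
        r'''href="{{ '\2'|theme }}"''',
        html,
        flags=re.IGNORECASE,
    )

# Made-up table cell for illustration.
cell = '<td><a href="/assets/docs/notes.pdf">Lecture Notes</a></td>'
converted = to_theme_link(cell)
print(converted)
```

Running it on a copy of the table first and comparing the two versions is probably the safest way to check that nothing unexpected was rewritten.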

Matching text that is not html tags with regular expression

So I am trying to create a regular expression that matches text inside different kinds of html tags. It should match the bold text in both of these cases:
<div class="username_container">
<div class="popupmenu memberaction">
<a rel="nofollow" class="username offline " href="http://URL/surfergal.html" title="Surfergal is offline"><strong><!-- google_ad_section_start(weight=ignore) -->**Surfergal**<!-- google_ad_section_end --></strong></a>
</div>
<div class="username_container">
<span class="username guest"><b><a>**Advertisement**</a></b></span>
</div>
I have tried with the following regular expression without any result:
/<div class="username_container">.*?((?<=^|>)[^><]+?(?=<|$)).*?<\/div>/is
This is my first time posting here on stackoverflow so if I am doing something incredibly stupid I can only apologize.
Using regex to parse html is.. hard. See the links in the comments to your question.
What do you plan to do with these matches? Here's a quick jQuery script that logs the results to the console:
var a = [];
$('strong, b').each(function(){
a.push($(this).html());
});
console.log(a);
results:
["<!-- google_ad_section_start(weight=ignore) -->**Surfergal**<!-- google_ad_section_end -->", "<a>**Advertisement**</a>"]
http://jsfiddle.net/Mk7xf/
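If only the bare text is wanted, without the comments and inner tags that .html() returns, a parser-based sketch along these lines may be closer to the goal. It uses Python's standard html.parser; the class name is my own, and the sample markup is a tidied, well-formed version of the snippet in the question.

```python
from html.parser import HTMLParser

class UsernameParser(HTMLParser):
    """Collect the visible text inside each <div class="username_container">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # <div> nesting depth inside a container; 0 means outside
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1
        elif "username_container" in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # HTML comments are routed to handle_comment, so only real text arrives here.
        text = data.strip()
        if self.depth and text:
            self.names.append(text)

html = '''
<div class="username_container">
<div class="popupmenu memberaction">
<a class="username offline " href="http://URL/surfergal.html"><strong><!-- google_ad_section_start(weight=ignore) -->Surfergal<!-- google_ad_section_end --></strong></a>
</div>
</div>
<div class="username_container">
<span class="username guest"><b><a>Advertisement</a></b></span>
</div>
'''

parser = UsernameParser()
parser.feed(html)
print(parser.names)
```

The parser tracks nesting itself, so it does not matter whether the name sits inside strong, b, or any other tag.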