Regex/Notepad++ to strip everything but links from an HTML page?

I have a long page of HTML code. Sprinkled through the code are various links of the form
<a href="whatever.com">Whatever</a>
What regex do I need for Search|Replace in Notepad++ to strip out the rest of the HTML and leave just an isolated list of the links, like this:
whatever.com
whichever.com
whoever.com

Use the following, with Search Mode set to "Regular expression":
Find:
^[^"]+.([^"]+).*
Replace:
$1
To filter the title, also strip the remaining tags:
Find:
</?[^>]*.
Replace:
(leave empty)
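For anyone who would rather script this than run it in Notepad++, here is a minimal Python sketch of the same idea. It assumes every link is an <a> tag with a double-quoted href, which is the shape the Find pattern above relies on. extract_links and the sample page are made up for illustration, and the sketch uses re.findall on the href rather than the search/replace above.

```python
import re

# Assumption: links look like <a ... href="target">text</a> with a double-quoted href.
HREF_PATTERN = re.compile(r'<a\b[^>]*\bhref="([^"]+)"', re.IGNORECASE)


def extract_links(html):
    """Return every double-quoted href target found in <a> tags, in document order."""
    return HREF_PATTERN.findall(html)


# Hypothetical sample page, only here to exercise the function.
page = """<html><body>
<h1>Title</h1>
<p><a href="whatever.com">Whatever</a> and <a href="whichever.com">Whichever</a></p>
<a class="x" href="whoever.com">Whoever</a>
</body></html>"""

# One link target per line, everything else discarded.
print("\n".join(extract_links(page)))
```

Joining the returned list with newlines gives the isolated listing the question asks for.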

Related

How to find a specific text between HTML table tags with RegEx

I want a RegEx that finds a specific text between HTML table tags.
I have: This is a test text <tr><td>text inside table</td></tr> and I want the RegEx to return just the second 'text', because it is inside the table.
I have tried <tr>(text)<\/tr> but it returns nothing.
It needs to be done with a RegEx; it cannot be done with an HTML parser.

Your <tr>(text)<\/tr> matches only <tr>text</tr>, but in your input there is other text around it inside the row.
So you need <tr>.*?(text).*?<\/tr> instead.
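To see the difference, the suggested pattern can be tried in Python. This is a rough sketch: find_in_row is a hypothetical helper, and re.DOTALL is my addition so that rows spanning several lines also match.

```python
import re


def find_in_row(html, word):
    """Return the first match of `word` that sits inside a <tr>...</tr> block, or None."""
    # Same shape as <tr>.*?(text).*?</tr>, with the search word escaped.
    pattern = re.compile(r"<tr>.*?(" + re.escape(word) + r").*?</tr>", re.DOTALL)
    return pattern.search(html)


sample = "This is a test text <tr><td>text inside table</td></tr>"
match = find_in_row(sample, "text")

# Group 1 is the occurrence inside the row, not the one before the <tr>.
print(match.group(1), match.start(1) > sample.index("<tr>"))
```

The lazy .*? on both sides is what lets the pattern skip over the <td> and the rest of the cell content.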

Regex to convert BBCode link to HTML link

I am using TinyMCE 4.4, in which the content source is either HTML or BBCode. A user can insert a link in the BBCode view and convert it to HTML.
For example, given the BBCode link [url href=http://test.com]test[/url], I need a regex that converts it to the corresponding HTML link.
The lines below are in the TinyMCE BBCode plugin, but they do not seem to work.
rep(/\[url=([^\]]+)\](.*?)\[\/url\]/gi, "<a href=\"$1\">$2</a>");
rep(/\[url\](.*?)\[\/url\]/gi, "<a href=\"$1\">$1</a>");
Ideally, the regex should convert the BBCode link above to <a href="http://test.com">test</a>, but it should also handle complex URLs (with query string params).
Any thoughts on how I can do this?

The problem is that your BBCode is wrong. It should use url= on its own, without href=:
[url=http://test.com]test[/url]
Then the regexes should do their job correctly.
The regex /\[url=([^\]]+)\](.*?)\[\/url\]/gi, "<a href=\"$1\">$2</a>" applies to the case [url=http://test.com]some plain text[/url]
The regex /\[url\](.*?)\[\/url\]/gi, "<a href=\"$1\">$1</a>" is used for the case [url]http://test.com[/url]
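As a quick check outside TinyMCE, the same two patterns can be sketched in Python. bbcode_links_to_html is a hypothetical helper, not part of the plugin, and it does no escaping or URL validation.

```python
import re

# Python equivalents of the two plugin patterns (the /gi flags become re.IGNORECASE + sub).
URL_WITH_TEXT = re.compile(r"\[url=([^\]]+)\](.*?)\[/url\]", re.IGNORECASE)
BARE_URL = re.compile(r"\[url\](.*?)\[/url\]", re.IGNORECASE)


def bbcode_links_to_html(text):
    """Convert [url=...]text[/url] and [url]...[/url] to <a href> links."""
    text = URL_WITH_TEXT.sub(r'<a href="\1">\2</a>', text)
    text = BARE_URL.sub(r'<a href="\1">\1</a>', text)
    return text


print(bbcode_links_to_html("[url=http://test.com]test[/url]"))
print(bbcode_links_to_html("[url=http://test.com/?a=1&b=2]query[/url]"))
# The href= form from the question is not matched and comes back unchanged.
print(bbcode_links_to_html("[url href=http://test.com]test[/url]"))
```

Because the URL group is [^\]]+, query strings pass through untouched; only a literal ] inside the URL would cut it short.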

Regex to convert URLs to HTML <a href> hyperlinks in Notepad++?

I have a list of URLs in a text file that I am trying to convert to HTML, but I'm failing miserably.
My URLs are in this format:
http://mydomain.com/here-are-my-links.html
Does anybody know of a regex search/replace I can run in Notepad++ to change my URL list to this format:
<a href="http://mydomain.com/here-are-my-links.html">here are my links</a>

Use the regex
(http://mydomain.com/(.*?)\.html)
and replace it with
<a href="\1">\2</a>
If you want to change each - in the link text into a space, you can search for
-(?=[^<>]*?</a>)
and replace it with a single space.
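The two steps can also be chained in a script. A minimal Python sketch, assuming the same mydomain.com URLs as in the question (urls_to_links is a made-up name, and the dots in the domain are escaped here so that they only match literal dots):

```python
import re

# Step 1: whole URL in group 1, the page slug between the domain and .html in group 2.
URL_PATTERN = re.compile(r"(http://mydomain\.com/(.*?)\.html)")
# Step 2: a hyphen that is followed, without crossing a < or >, by the closing </a>.
HYPHEN_IN_LINK_TEXT = re.compile(r"-(?=[^<>]*?</a>)")


def urls_to_links(text):
    """Wrap each URL in an <a href> tag and turn hyphens in the link text into spaces."""
    linked = URL_PATTERN.sub(r'<a href="\1">\2</a>', text)
    return HYPHEN_IN_LINK_TEXT.sub(" ", linked)


print(urls_to_links("http://mydomain.com/here-are-my-links.html"))
```

The lookahead is what keeps the hyphens inside the href intact: from there the closing </a> can only be reached by crossing the > of the opening tag, which [^<>] does not allow.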

Get link text and href and wrap it in other html tags

How can something like this:
<b><a class='visit' href='LINK'>LINK'S NAME</a></b>
be turned into this:
<tr><td>LINK'S NAME<a href="LINK">constant text</a></td></tr>

You should never use regex for HTML manipulation unless you have a good reason to do so.
Regex for the match:
/<b>\s*<a\s+class='visit'\s+href='([^']*)'\s*>([^<]+)<\/a>\s*<\/b>/
Replacement:
"<tr><td>$2<a href=\"$1\">constant text</a></td></tr>"

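If you do go the regex route, the match and replacement above can be tried out in Python like this. It is only a sketch: rewrap is a hypothetical name, Python writes the groups as \1 and \2 instead of $1 and $2, and the pattern only handles the exact single-quoted markup from the question.

```python
import re

# Group 1 is the href value, group 2 is the link's visible name.
LINK_PATTERN = re.compile(
    r"<b>\s*<a\s+class='visit'\s+href='([^']*)'\s*>([^<]+)</a>\s*</b>"
)


def rewrap(html):
    """Move the link name into a table cell, followed by a link with constant text."""
    return LINK_PATTERN.sub(
        r'<tr><td>\2<a href="\1">constant text</a></td></tr>', html
    )


print(rewrap("<b><a class='visit' href='LINK'>LINK'S NAME</a></b>"))
```
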
Remove the href attribute from all anchor tags using RegEx

I am trying to remove the href attribute from all anchor tags using RegEx, for a print page.
However, I still need the text inside the anchors.

Couldn't you just style your links using CSS? You could create two separate styles: one for "normal" web visits, and one for print. In the print-style, you don't visually enhance the links.

Never parse markup with regular expressions. BAD!
http://htmlparsing.icenine.ca
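If a parser is an option, a rough Python sketch using the standard library's html.parser could look like this. HrefStripper and strip_hrefs are names made up for this example, and it is deliberately incomplete: comments and doctype declarations are dropped, and self-closing anchors are passed through untouched.

```python
from html import escape
from html.parser import HTMLParser


class HrefStripper(HTMLParser):
    """Re-emits the markup it is fed, leaving href out of <a> start tags."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            # Non-anchor tags are copied through exactly as they appeared.
            self.out.append(self.get_starttag_text())
            return
        kept = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
            if name != "href"
        )
        self.out.append(f"<a{kept}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_hrefs(html):
    """Return html with the href attribute removed from every <a> start tag."""
    parser = HrefStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(strip_hrefs('<p>See <a href="http://example.com" class="x">this page</a>.</p>'))
```

The anchor text and any other attributes survive; only href is left out when the start tag is rebuilt.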

We Keep Coding
