Adding regex strings in vb.net - regex

I have a string and I can find the following
Kbps
Duration
Mb
Song Title
Website
http://abmp3.com/
I can't seem to find the URL. I used Expresso to build the regex and tested it against the page source, but when I add href="(.*.mp3)" to the end of the pattern it no longer finds anything. The Kbps, duration, and Mb values are all on the same line. The song title is on a different line, and so is the URL.
My question is how would you add the href="(.*.mp3)" to the end of the regex string?
Regex Code
":6px;"">(.* Kbps)<br>(.*)<br> (.* Mb)</div></td>\D+\S+<strong>(.*) mp3"
Need to add this to the end
href="(.*.mp3)"
Thanks in advance!

Looking at the website, it appears this would work for you:
href=\".*\.mp3\"
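Since the URL sits on a different line from the other fields, and . does not match a newline by default, simply appending the href part will likely fail. One way around that is to bridge the gap with [\s\S]*? before the href. A minimal sketch in Python (the question is about VB.NET, where the same pattern syntax should carry over, but that is an assumption). The sample markup and the example.com URL are hypothetical, loosely modelled on the fields in the question:

```python
import re

# Hypothetical markup; the real page source may differ.
html = '''<td><div style="padding:6px;">128 Kbps<br>3:45<br> 3.4 Mb</div></td>
<td><strong>Some Song mp3</strong></td>
<td><a href="http://example.com/files/some-song.mp3">Download</a></td>'''

pattern = re.compile(
    r':6px;">(.* Kbps)<br>(.*)<br> (.* Mb)</div></td>'  # bitrate, duration, size
    r'\D+\S+<strong>(.*) mp3'                            # song title
    r'[\s\S]*?'                                          # skip ahead, across lines
    r'href="([^"]*\.mp3)"'                               # the mp3 URL
)

match = pattern.search(html)
fields = match.groups() if match else ()
print(fields)
```

Using [^"]* inside the href group keeps the capture from running past the closing quote, which a greedy .* could do if several links share a line.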

Related

How to extract file name from URL?

I have file names in a URL and want to strip out the preceding URL and filepath as well as the version that appears after the ?
Sample URL
Trying to use RegEx to pull, CaptialForecasting_Datasheet.pdf
The REGEXP_EXTRACT in Google Data Studio seems unique. Tried the suggestion but kept getting "could not parse" error. I was able to strip out the first part of the url with the following. Event Label is where I store URL of downloaded PDF.
The URL:
https://www.dudesolutions.com/Portals/0/Documents/HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033
REGEXP_EXTRACT( Event Label , 'Documents/([^&]+)' )
The result:
HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033
Now I am trying to work out how to drop everything from the ? onward, where the version data is, so as to extract just the Filename.pdf.
You could try:
[^\/]+(?=\?[^\/]*$)
This will match CaptialForecasting_Datasheet.pdf even if there is a question mark in the path. For example, the regex will succeed in both of these cases:
https://www.dudesolutions.com/somepath/CaptialForecasting_Datasheet.pdf?ver
https://www.dudesolutions.com/somepath?/CaptialForecasting_Datasheet.pdf?ver
Assuming that the name appears right after the last / and ends with the ?, the regular expression below will leave the name in group 1 where you can get it with \1 or whatever the tool that you are using supports.
.*\/(.*)\?
It basically says: take everything between the last / that precedes a ? and the final ?, and put it in group 1. For a URL with a single ?, that is exactly the file name.
Another regular expression that only matches the file name that you want but is more complex is:
(?<=\/)[^\/]*(?=\?)
It matches a run of non-/ characters, [^\/]*, that is immediately preceded by /, (?<=\/), and immediately followed by ?, (?=\?). The first parenthesized expression is a positive lookbehind, and the second is a positive lookahead.
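To compare the capture-group and lookaround approaches side by side, here is a small sketch in Python, run against the URL from the question. Python's re module is used only for illustration; tools built on RE2 may reject the lookaround form, so the capture-group form is probably the safer one to port.

```python
import re

url = ("https://www.dudesolutions.com/Portals/0/Documents/"
       "HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033")

# Capture-group form: the file name ends up in group 1.
by_group = re.search(r".*/(.*)\?", url).group(1)

# Lookaround form: the whole match is the file name.
by_lookaround = re.search(r"(?<=/)[^/]*(?=\?)", url).group(0)

print(by_group)
print(by_lookaround)
```

Both calls should yield HC_Brochure_Digital.pdf for this URL.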
This REGEXP_EXTRACT formula captures the characters a-zA-Z0-9_. between / and ?
REGEXP_EXTRACT(Event Label, "/([\\w\\.]+)\\?")
Google Data Studio Report to demonstrate.
Please try the following regex
[A-Za-z_]*\.pdf
I have tried it online at https://regexr.com/.
Please note that this only works for .pdf files
The following regex extracts a file name with the .pdf extension:
(?:[^\/][\d\w\.]+)(?<=(?:\.pdf))
You can add more extensions like this:
(?:[^\/][\d\w\.]+)(?<=(?:\.pdf)|(?:\.jpg))
Demo

Regex: Get subtext from a string

I have a list of text lines. Each line contains a title and a URL as follows:
product-title-7134 http://domain.com/page-1
another-product-title-822 http://domain.com/page-218
etc.
Using only .NET regex, please help me extract the url from each line.
I understand it can be done by looking at the string from the end until the http is met and output that part but I don't know the exact regex formula for that. Any help is much appreciated.
I would do that with this regex:
http://(\S+)
Then take the first group of each match. Note that group 1 holds the URL without the http:// prefix; the full URL is the whole match.
This regex will match all https:// and http:// links:
(http|https)(://\S+)
You can test this in the .NET regex tester: http://regexstorm.net/tester
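As a quick illustration of the same idea outside .NET, here is a minimal Python sketch over the two sample lines from the question. It uses https?, which is equivalent to (http|https), and reads the whole match rather than a group so the scheme stays attached:

```python
import re

lines = [
    "product-title-7134 http://domain.com/page-1",
    "another-product-title-822 http://domain.com/page-218",
]

# A URL here is the scheme followed by everything up to the next whitespace.
url_pattern = re.compile(r"https?://\S+")

urls = []
for line in lines:
    m = url_pattern.search(line)
    if m:
        urls.append(m.group(0))

print(urls)
```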

Google Analytics Regex Advanced filter to include and exclude keyword and filetype

Okay, so I need to create an advanced filter in Google Analytics that includes "breast" but DOES NOT include "before", "after", or "blog" in the URL. I also want to filter out .jpg file extensions.
Here are example URLs that I want the filter to return:
http://www.doctortaylor.com/breast-lift-surgery/
http://www.doctortaylor.com/breast-augmentation-pasadena-and-los-angeles-area/
I want to filter out any urls that are before and after photo pages, and any actual .jpg file urls.
I'm a regex beginner, but this is pretty advanced. Any help would be greatly appreciated!!
This regular expression does fairly well:
^(?!before|after|blog)*((?!before|after|blog).)*breast(?!before|after|blog|\.jpg)*((?!before|after|blog|\.jpg).)*$
UPDATED: I have updated the expression to capture all scenarios, even characters that begin or end the string. This regular expression excludes all words that you list in your description and correctly identifies the word breast.
MATCHES
http://www.doctortaylor.com/breast-lift-surgery/
http://www.doctortaylor.com/breast-augmentation-pasadena-and-los-angeles-area/
DOES NOT MATCH
http://www.doctortaylor.com/breast-lift-surgeryblog/
http://www.doctortaylor.com/breast-lift-surgery.jpg/
http://blog.doctortaylor.com/breast-lift-surgery/
http://www.doctortaylor.com/after-breast-lift-surgery/
This regular expression emulates inverse matching with negative lookaheads: each character is consumed only if none of the excluded words starts at that position.
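To check the logic against the sample URLs, here is a sketch in Python. It uses a simplified form of the expression (the repeated bare lookaheads are dropped, since repeating a zero-width assertion adds nothing). Google Analytics' own regex engine may not accept lookaheads at all, so treat this as a way to verify the intent rather than as a drop-in filter.

```python
import re

# Every character before and after "breast" must not start an excluded word.
allowed = re.compile(
    r"^(?:(?!before|after|blog|\.jpg).)*breast(?:(?!before|after|blog|\.jpg).)*$"
)

urls = [
    "http://www.doctortaylor.com/breast-lift-surgery/",
    "http://www.doctortaylor.com/breast-augmentation-pasadena-and-los-angeles-area/",
    "http://www.doctortaylor.com/breast-lift-surgeryblog/",
    "http://www.doctortaylor.com/breast-lift-surgery.jpg/",
    "http://blog.doctortaylor.com/breast-lift-surgery/",
    "http://www.doctortaylor.com/after-breast-lift-surgery/",
]

kept = [u for u in urls if allowed.search(u)]
print(kept)
```

Only the first two URLs should survive the filter.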

Notepad++: Multiple line search & multiple line replace

I'm having a battle with a regex. (MOBI creation)
I have two files: one with XML, the other an HTML table of contents.
The important parts of the XML:
<navPoint id="_NeedsHTMLid" playOrder="40">
<navLabel><text>Needs anchor text from link.)</text></navLabel>
...
The HTML TOC, of course, looks like:
schema.org Article Mark-up
======
Hours and hours... worked with Textpad forever. Saw remarks here, now I'm using NotePad++... some of the regex results are different (NOT that I had it working anyway.) #_[\b(\w\b] was returning the ID: now? Not so much!
Does anyone know how to yank both the ID and the anchor text out of these? I'd be so grateful.
You can use this to get the id and the anchor text at the same time:
_(\w+)\b|([a-zA-Z\s.]+[)]+)
#_[\b(\w\b] is not a valid regex. Try _([^"]+)\b.
Edited: try [^"] in place of \w.
If you want to match the ids and the text, go to Search > Find menu (shortcut CTRL+F) and do the following:
Find what:
id="([a-zA-Z0-9\-\:\_\.]+)"|<text>(.+?)<\/text>
Select radio button "Regular Expression"
Then press Find All in Current Document
You can test it with your example at regex101.
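If it helps to see what the two alternatives capture, here is a small Python sketch run on the XML fragment from the question. Group 1 is filled when the id branch matches and group 2 when the text branch matches. Notepad++ uses a different regex engine, so this only illustrates the pattern's logic:

```python
import re

xml = '''<navPoint id="_NeedsHTMLid" playOrder="40">
<navLabel><text>Needs anchor text from link.)</text></navLabel>'''

pattern = re.compile(r'id="([a-zA-Z0-9\-:_.]+)"|<text>(.+?)</text>')

ids = []
texts = []
for m in pattern.finditer(xml):
    if m.group(1) is not None:
        ids.append(m.group(1))    # matched the id="..." branch
    else:
        texts.append(m.group(2))  # matched the <text>...</text> branch

print(ids)
print(texts)
```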
Here's a StackOverflow post about valid id names.
I didn't provide a search-and-replace solution, since you didn't mention anything about a replacement.

Yahoo Pipes - Remove all word or tag before match

For example, I have a feed with an item title like this:
Some text is better than one text http://t.co/blablabla #hashtag
I want to get only the URL using a regex, like this:
http://t.co/blablabla
How do I do that?
(Sorry, I used Google Translate to write this question.)
Thanks for any answers.
Here's a random one I found with a simple Google search:
(http|ftp|https)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,#?^=%&:/~+-]*[\w#?^=%&/~+-])?
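To see a pattern of this shape pull the URL out of the sample title, here is a minimal Python sketch. The pattern below is a tidied variant (hyphens moved to the end of each character class, the dot between host labels escaped, groups made non-capturing), so treat it as an illustration rather than a complete URL matcher:

```python
import re

title = "Some text is better than one text http://t.co/blablabla #hashtag"

url_pattern = re.compile(
    r"(?:http|ftp|https)://"                          # scheme
    r"[\w-]+(?:\.[\w-]+)+"                            # host with at least one dot
    r"(?:[\w.,#?^=%&:/~+-]*[\w#?^=%&/~+-])?"          # optional path and query
)

m = url_pattern.search(title)
url = m.group(0) if m else None
print(url)
```

The match stops at the space before #hashtag, because whitespace is not in either character class.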