Regex specific Param from Uri - regex

Simply put, I pull the href prop of a link and need to replace it with a new link when it is clicked. The new link needs one parameter from the original link (a claim link that opens a new window and claims a task for a user).
So far I have a working solution. What I'd like is some help refining my regex a little.
For links like:
/crm/v2/claimTask?email=example#gmail.com&id=1372365392-1UsIvb-0002qr-Sz
I use:
$(this).prop("href").match(/(email|order|phone|num)=\s*?(.+)&/)[0].replace(/&/, '')
And get:
email=example#gmail.com
What I'd like to do is remove .replace(/&/, '') and have the regex stop at the & symbol to begin with, but I'm unsure how to do this. Any ideas?
Further examples:
/crm/v2/claimTask?order=123456&id=137236456452-1UweRRwvb-00456jr-Sz
/crm/v2/claimTask?phone=6665554444&id=175655392-4WERTe4-097qt-Da
/crm/v2/claimTask?num=6665554444&id=1372234392-9sfaWa-12374ip-eW
/crm/v2/claimTask?email=email#test.net&id=133453465392-k0wS24S-36735qr-rt
Using:
$(this).prop("href").match(/(email|order|phone|num)=\s*?(.+)&/)
Would yield:
order=123456&
phone=6665554444&
num=6665554444&
email=email#test.net&

Try this:
$(this).prop("href").match(/((email|order|phone|num)=\s*?(.+))&/)[1] //"email=email#test.net"
$(this).prop("href").match(/((email|order|phone|num)=\s*?(.+))&/)[3] //"email#test.net"
The above just puts the part without the & into a capture group. You could also use a positive lookahead:
$(this).prop("href").match(/(email|order|phone|num)=\s*?(.+)(?=&)/) //["email=email#test.net", "email", "email#test.net"]

Just use a lookahead:
(email|order|phone|num)=\s*?(.+)(?=&)
It will not "eat" the ampersand.
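As a rough sketch of the lookahead suggestion, here it is run against two of the sample hrefs from the question as plain strings (no jQuery). The second pattern, with a negated character class, is not from the answers; it is an assumed alternative that stops at the first & by construction, which may be safer if a link ever carries more than two parameters.

```javascript
// Sample hrefs copied from the question.
const hrefs = [
  "/crm/v2/claimTask?order=123456&id=137236456452-1UweRRwvb-00456jr-Sz",
  "/crm/v2/claimTask?email=email#test.net&id=133453465392-k0wS24S-36735qr-rt",
];

// Lookahead version from the answers: an "&" must follow, but it is not consumed.
const withLookahead = /(email|order|phone|num)=\s*?(.+)(?=&)/;

// Assumed alternative (not from the thread): [^&]+ can never run past an "&".
const withNegatedClass = /(email|order|phone|num)=\s*([^&]+)/;

// Element [0] of the match array is the whole "name=value" pair.
const lookaheadResults = hrefs.map((href) => href.match(withLookahead)[0]);
const negatedResults = hrefs.map((href) => href.match(withNegatedClass)[0]);

console.log(lookaheadResults);
console.log(negatedResults);
```

Note that (.+) is greedy, so with the lookahead version a link containing several & characters would capture up to the last one; the negated class avoids that.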

Related

Regex Optional Conditional Exact Match?

I have a Regex that looks like this:
(?<Number>\d{3})-?(?<Hand>R?L?)[-\s]?(?<Description>.*?)?(?<ShnOpp>SHN|OPP)?$
With some sample data:
104-RL-BLAH BLA SHN
104-RL FOO OPP
102-RL-BAR WL74
102-BAR WL74
102-R-BAR WL74 SHN
102-R-BAR WL74 OPP
So, the named group Hand can either contain RL|R|L|{Blank}.
But, if and only if, Hand="RL" do I want to match ShnOpp with SHN|OPP, otherwise just leave it as part of the description. So, can I do a literal IF condition within my regex?
Either my Googling skills failed me or maybe you just can't do it, but I'd love to be proved wrong.
Here's a link to a working sample: https://regex101.com/r/wGghbV/2
You can't use a conditional to check that a certain group captured one exact text. However, it is possible to use a conditional here by adding a new group that only matches RL, like:
(?<Number>\d{3})-?(?<Hand>(?<RL>RL)|[RL]?)[ \-]?(?<Description>.*?)[ \-]?(?(RL)(?<ShnOpp>SHN|OPP)?)$
Your updated sample: https://regex101.com/r/wGghbV/3
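JavaScript regexes have no (?(RL)...) conditional, so the pattern above cannot be used there as written. A possible workaround, sketched below, is to spell out the two cases as separate alternation branches. The group names HandRL, DescRL, HandOther and DescOther and the parseRow helper are made up for this sketch; only Number and ShnOpp come from the original pattern.

```javascript
// Branch 1: Hand is exactly "RL", so a trailing SHN/OPP may be split off.
// Branch 2: Hand is R, L or blank, so everything else stays in the description.
// HandRL, DescRL, HandOther and DescOther are hypothetical names for this sketch.
const rowPattern =
  /^(?<Number>\d{3})-?(?:(?<HandRL>RL)[ -]?(?<DescRL>.*?)[ -]?(?<ShnOpp>SHN|OPP)?|(?<HandOther>[RL]?)[ -]?(?<DescOther>.*?))$/;

// Hypothetical helper that merges the per-branch groups back into one record.
function parseRow(row) {
  const m = row.match(rowPattern);
  if (!m) return null;
  const g = m.groups;
  const tookRlBranch = g.HandRL !== undefined;
  return {
    number: g.Number,
    hand: tookRlBranch ? g.HandRL : g.HandOther,
    description: tookRlBranch ? g.DescRL : g.DescOther,
    shnOpp: g.ShnOpp !== undefined ? g.ShnOpp : null,
  };
}

// Sample rows copied from the question.
console.log(parseRow("104-RL-BLAH BLA SHN"));
console.log(parseRow("102-R-BAR WL74 SHN"));
console.log(parseRow("102-BAR WL74"));
```

Only rows whose Hand is RL get a separate shnOpp value; for the others, SHN or OPP remains part of the description.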

How to extract file name from URL?

I have file names in a URL and want to strip out the preceding URL and filepath as well as the version that appears after the ?
Sample URL
I'm trying to use a regex to pull out CaptialForecasting_Datasheet.pdf.
The REGEXP_EXTRACT in Google Data Studio seems unique. Tried the suggestion but kept getting "could not parse" error. I was able to strip out the first part of the url with the following. Event Label is where I store URL of downloaded PDF.
The URL:
https://www.dudesolutions.com/Portals/0/Documents/HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033
REGEXP_EXTRACT( Event Label , 'Documents/([^&]+)' )
The result:
HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033
Now I'm trying to work out how to drop everything from the ? onward, where the version data is, so as to extract just the Filename.pdf.
You could try:
[^\/]+(?=\?[^\/]*$)
This will match CaptialForecasting_Datasheet.pdf even if there is a question mark in the path. For example, the regex will succeed in both of these cases:
https://www.dudesolutions.com/somepath/CaptialForecasting_Datasheet.pdf?ver
https://www.dudesolutions.com/somepath?/CaptialForecasting_Datasheet.pdf?ver
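To check that claim, here is a small sketch that runs the pattern over the two URLs above in JavaScript. The variable names are illustrative only, and this says nothing about whether REGEXP_EXTRACT in Data Studio accepts the same syntax.

```javascript
// Pattern from the answer: a run of non-slash characters that is followed by
// "?" and then by no further slash up to the end of the string.
const fileNameBeforeQuery = /[^\/]+(?=\?[^\/]*$)/;

// The two URLs quoted in the answer.
const urls = [
  "https://www.dudesolutions.com/somepath/CaptialForecasting_Datasheet.pdf?ver",
  "https://www.dudesolutions.com/somepath?/CaptialForecasting_Datasheet.pdf?ver",
];

const fileNames = urls.map((url) => {
  const m = url.match(fileNameBeforeQuery);
  return m ? m[0] : null; // null when the URL has no query string after the last "/"
});

console.log(fileNames);
```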
Assuming that the name appears right after the last / and ends with the ?, the regular expression below will leave the name in group 1 where you can get it with \1 or whatever the tool that you are using supports.
.*\/(.*)\?
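It basically says: take everything between the last / that still has a ? after it and the final ?, and put it in group 1. Both quantifiers are greedy, so if several ? characters follow that slash, the capture runs up to the last one.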
Another regular expression that only matches the file name that you want but is more complex is:
(?<=\/)[^\/]*(?=\?)
It matches a run of non-/ characters, [^\/]*, that is immediately preceded by /, (?<=\/), and immediately followed by ?, (?=\?). The first parenthesized expression is a positive lookbehind, and the second is a positive lookahead.
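The two patterns from this answer can be compared side by side. The sketch below uses JavaScript and the URL from the question; lookbehind needs a reasonably recent engine, and whether a given tool (Data Studio included) supports it is not something this example establishes.

```javascript
// URL copied from the question.
const sampleUrl =
  "https://www.dudesolutions.com/Portals/0/Documents/HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033";

// Capture-group version: the file name ends up in group 1.
const viaGroup = sampleUrl.match(/.*\/(.*)\?/);

// Lookaround version: the whole match is the file name.
const viaLookaround = sampleUrl.match(/(?<=\/)[^\/]*(?=\?)/);

const nameFromGroup = viaGroup ? viaGroup[1] : null;
const nameFromLookaround = viaLookaround ? viaLookaround[0] : null;

console.log(nameFromGroup, nameFromLookaround);
```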
This REGEXP_EXTRACT formula captures the characters a-zA-Z0-9_. between / and ?
REGEXP_EXTRACT(Event Label, "/([\\w\\.]+)\\?")
Google Data Studio Report to demonstrate.
Please try the following regex:
[A-Za-z_]*\.pdf
I have tried it online at https://regexr.com/.
Please note that this only works for .pdf files, and only for names made of letters and underscores.
The following regex will extract a file name with a .pdf extension:
(?:[^\/][\d\w\.]+)(?<=(?:\.pdf))
You can add more extensions like this:
(?:[^\/][\d\w\.]+)(?<=(?:\.pdf)|(?:\.jpg))
Demo

URL regex group catching

Hello, I'm trying to find a regex that would catch the terms in a URL.
For example, given:
https://stackoverflow.com, it would catch "stackoverflow"
and given https://stackoverflow.com/questions/ask, it would catch "stackoverflow", "questions", "ask" and any potential terms in between the slash character after the domain name.
So far I have come up with the following regex, but a repeated capturing group only keeps its last match, so it cannot collect every term:
https?:\/\/(?:www\.)?([\da-z-]*)(?:[\.a-z]*)(?:\/([\da-z]*)\/?)+
Do you have any way to resolve that issue? That would be great.
I tested Michal M's answer; it appears not to handle "www.", so I updated it:
/(?:\/(?:w{3}\.)?)\K([\w]+)/i
Edit: Since it's not important to match the "www.", I placed it inside a non-capturing group so it won't be captured. By the way, I also added the case-insensitive modifier so "WWW." would be okay too.
Try this one:
(?:(\/))\K(\w+)
Tested in Notepad++.
You may try using two separate regexes: one for the hostname part and another for the terms in the path part. Then combine them with alternation and do a global search:
https?:\/\/(?:\w+\.)*(\w+)\.\w+ # this would capture hostname "term"
|
\/(\w+) # this would capture path "terms"
(Note: requires /x modifier.)
Demo: https://regex101.com/r/nA8jT9/2
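A minimal JavaScript sketch of the same idea follows. JavaScript has no /x modifier, so the two alternatives are written on one line; the urlTerms helper name is made up for illustration.

```javascript
// First alternative captures the hostname "term" in group 1,
// second alternative captures each path "term" in group 2.
const termPattern = /https?:\/\/(?:\w+\.)*(\w+)\.\w+|\/(\w+)/g;

// Hypothetical helper: collect whichever group matched on each global hit.
function urlTerms(url) {
  return Array.from(url.matchAll(termPattern), (m) =>
    m[1] !== undefined ? m[1] : m[2]
  );
}

console.log(urlTerms("https://stackoverflow.com"));
console.log(urlTerms("https://stackoverflow.com/questions/ask"));
console.log(urlTerms("https://www.stackoverflow.com/questions"));
```

Using a global search sidesteps the limitation in the question's pattern, where a repeated capturing group keeps only its final repetition.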
Thanks, I managed to rearrange it so that it works with the "www":
(?:\/(?:www\.)?)\K([\w\d]+)

extract email address from Notepad++ using regex

I am trying to extract email addresses from notepad++ using RegEx.
I tried like this
Find and Replace
Find: (\b[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b)
Replace : .\1
I am losing the email addresses instead of the surrounding text. I need to remove all the text and keep only the email addresses in the file. How can I do that?
Abilash Perumandla
hi Gunpreet, kindly share your thoughts to Abi#TEKperfekt.com
Pratap Aneel
15d
Pratap Aneel
please share your thoughts to Pratap.kumar#rsrit.com
naveen kumar
15d
naveen kumar
You need to match and capture the email with a (...) subpattern (so, you do that right), but you need to just match everything else (and that part is missing).
Use
Find what: (\b[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b)|.
Replace with: $1
Then, you might want to use Edit -> Blank Operations -> Remove Unnecessary Blank and EOL menu option.
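The same "capture the address, match and discard every other character" idea can be tried outside Notepad++. The sketch below is JavaScript, not Notepad++ behavior; it uses a few lines of the sample text exactly as shown above (including the # inside the addresses), and the final filter only loosely stands in for the blank-line cleanup step.

```javascript
// A few lines of the sample text from the question, kept verbatim.
const text = [
  "Abilash Perumandla",
  "hi Gunpreet, kindly share your thoughts to Abi#TEKperfekt.com",
  "Pratap Aneel",
  "15d",
  "please share your thoughts to Pratap.kumar#rsrit.com",
].join("\n");

// Group 1 captures an address; the "." alternative consumes any other
// character (except newlines), which is then replaced with nothing.
const addressOrAnyChar = /(\b[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b)|./g;

// "$1" is empty whenever the "." alternative matched.
const onlyAddresses = text.replace(addressOrAnyChar, "$1");

// Rough equivalent of removing the blank lines that are left behind.
const addresses = onlyAddresses.split("\n").filter((line) => line !== "");

console.log(addresses);
```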

Regex in Notepad++ to move contents of an element to an attribute value

I'm trying to solve a regex riddle. Let's say I have rows of hrefs looking like this:
anchor1.in
an3.php
setup.exe
What I want the regex (or any other solution) to do is to take the href title and copy it over to the actual URL with a forward slash in front of it.
A successful result would become:
anchor1.in
an3.php
setup.exe
If you can solve this please explain how you did it.
You can use the following to match:
(<a\s+href=")(.*?)(">)(.*?)(<\/a>)
And replace with:
\1\2/\4\3\4\5
See DEMO and Explanation
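For readers who want to see the group shuffling in action, here is a JavaScript sketch of the same match and replacement. The input rows are an assumption: the anchor markup is not visible in the question as shown here, so the http://example.com base URL is purely hypothetical. JavaScript writes backreferences in the replacement as $1 rather than \1.

```javascript
// Hypothetical input rows; only the link texts come from the question.
const rows = [
  '<a href="http://example.com">anchor1.in</a>',
  '<a href="http://example.com">an3.php</a>',
  '<a href="http://example.com">setup.exe</a>',
];

// Groups: 1 = opening tag up to the quote, 2 = current href value,
//         3 = closing quote and ">", 4 = link text, 5 = closing tag.
const anchorPattern = /(<a\s+href=")(.*?)(">)(.*?)(<\/a>)/g;

// Rebuild the tag with "/" plus the link text appended to the href value.
const rewritten = rows.map((row) => row.replace(anchorPattern, "$1$2/$4$3$4$5"));

console.log(rewritten.join("\n"));
```

Group 4 appears twice in the replacement: once appended to the href value and once in its original position as the link text.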