Specific word followed by full stop followed by another group RegEx - regex

Hi I'm having a little trouble with a regular expression.
I tested:
([Version]+)\.([.0-9A-Za-z]+)
With:
/Downloads/Documents/Access MDB - DEV Version.2.1.4.zip
This worked in RegexHero, and my groups seemed fine (sort of).
However, when I search through HTML source code, it returns things like:
e.axd
How would I get 2 groups:
Version.
2.1.4.zip
Or even one group?
Version.2.1.4.zip
I'm puzzling over this; regular expressions aren't my strong suit.

To get Version.2.1.4.zip in one group, capture everything after the last space:
^.* (.*)$

Using Avinash's answer as a baseline, I arrived at this solution:
(version.*\.zip)
This matches what I needed:
version.2.1.4.zip
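A note on why the original pattern returned e.axd: [Version] is a character class, so [Version]+ matches any run of the letters V, e, r, s, i, o, n rather than the literal word. Below is a minimal Python sketch comparing it with a literal-word pattern that yields the two requested groups. The sample HTML string is an assumption (the question only shows the e.axd result), and the names loose and strict are made up for illustration.

```python
import re

# Original pattern: [Version] is a character class, not the word "Version".
loose = re.compile(r"([Version]+)\.([.0-9A-Za-z]+)")
# Literal word plus the full stop in group 1, the rest in group 2.
strict = re.compile(r"(Version\.)([.0-9A-Za-z]+)")

path = "/Downloads/Documents/Access MDB - DEV Version.2.1.4.zip"
html = 'src="/WebResource.axd"'  # assumed sample, not taken from the question

loose_hit = loose.search(html)
strict_hit = strict.search(html)
groups = strict.search(path).groups()

print(loose_hit.group(0))  # e.axd
print(strict_hit)          # None
print(groups)              # ('Version.', '2.1.4.zip')
```

Wrapping both groups in one outer group, ((Version\.)([.0-9A-Za-z]+)), would give the single-group form as well.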

Related

301 redirection, matching URLs through regex. Matching dashes

I'm trying to match urls for a migration, however I can't seem to have a regex which matches it.
I've tried different expressions and used regex checkers to determine where exactly it's broken, but it's not clear to me.
This is my regex:
https:\/\/blog\.xyz\.ca\/EN\/post\/201[0-9]\/[0-9][0-9]\/[0-9][0-9]\/*\).aspx
I'm trying to match these kinds of urls (hundreds)
https://blog.xyz.ca/EN/post/2019/05/14/how-test-higher-education-test-can-test-more-test-students-and-test-sdf-the-test.aspx
https://blog.xyz.ca/EN/post/2019/05/14/how-test.aspx
https://blog.xyz.ca/EN/post/2019/05/14/how-test-higher-the-test.aspx
And remap them to something like this
https://blog.xyz.ca/2017/12/21/test-how-the-testaspx
I thought that I could match the dash section using the wildcard, but it seems to not be working and none of the generators are giving me a clear warning. I've tried https://regexr.com/ and https://www.regextester.com/
If I understand the problem correctly, a simple expression that captures the URL components needed for the redirect rules is a reasonable starting point:
(.+\.ca)\/EN\/post(\/[0-9]{4}\/[0-9]{2}\/[0-9]{2})(\/.+)\.aspx
The constraints can be tightened or relaxed as needed; I'm guessing that no stricter validation is required.
or:
(.+\.ca)\/EN\/post(\/[0-9]{4}\/[0-9]{2}\/[0-9]{2})(\/.+)(\.aspx)
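To illustrate how the captured groups could feed a rewrite, here is a small Python sketch. The target layout (host, then the date path, then the slug, with /EN/post and .aspx dropped) is an assumption based on the example target URL in the question, and the function name remap is hypothetical. As an aside, the original attempt most likely fails because \/*\) means "zero or more slashes followed by a literal )", which never occurs in these URLs.

```python
import re

# Same expression as above; "/" needs no escaping in Python.
PATTERN = re.compile(
    r"(.+\.ca)/EN/post(/[0-9]{4}/[0-9]{2}/[0-9]{2})(/.+)\.aspx"
)

def remap(url):
    """Return the rewritten URL, or the input unchanged if it does not match."""
    # \1 = scheme and host, \2 = /yyyy/mm/dd, \3 = /slug-with-dashes
    return PATTERN.sub(r"\1\2\3", url)

old = "https://blog.xyz.ca/EN/post/2019/05/14/how-test-higher-the-test.aspx"
print(remap(old))  # https://blog.xyz.ca/2019/05/14/how-test-higher-the-test
```

In a real redirect rule the same three groups would be referenced with whatever backreference syntax the server uses.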

Reg Ex Help to find instances of ".xl" between brackets

I'm awful with RegEx and am in a time crunch. I'm trying to come up with a rule that will pull out instances of text captured between brackets that also include the phrase ".xl" in them.
Example String:
C:\Users[chris.xlm]\Desktop[Test1.xlsx]Sheet1'![$C$4]
What would get captured from the expression would be:
1. chris.xlm
2. Test1.xlsx
The pattern:
\[([^]]+?\.xl.*?)\]
should accomplish what you need.
The pattern captures the whole bracketed text, both before and after .xl, whenever .xl appears inside the brackets, so the full extension is included.
Revised thanks to C Perkin's comment.
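For a quick check, this Python sketch runs the pattern against the example string from the question. Only the variable names are made up here; the pattern and the input are taken verbatim from above.

```python
import re

# [^]]+? keeps the match inside one pair of brackets before ".xl" is found.
pattern = re.compile(r"\[([^]]+?\.xl.*?)\]")

text = r"C:\Users[chris.xlm]\Desktop[Test1.xlsx]Sheet1'![$C$4]"
matches = pattern.findall(text)

print(matches)  # ['chris.xlm', 'Test1.xlsx']
```

The last bracket pair, [$C$4], is skipped because it contains no .xl.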

regex to find domain without those instances being part of subdomain.domain

I'm new to regex. I need to find instances of example.com in an .SQL file in Notepad++ without those instances being part of subdomain.example.com.
From this answer, I've tried using ^((?!subdomain))\.example\.com$, but this does not work.
I tested this in Notepad++ and at https://regex101.com/r/kS1nQ4/1, but it doesn't work.
Help appreciated.
A simple
^example\.com$
with the g, m, and i flags will work for you.
https://regex101.com/r/sJ5fE9/1
If the matching should be done somewhere in the middle of the string you can use negative look behind to check that there is no dot before:
(?<!\.)example\.com
https://regex101.com/r/sJ5fE9/2
Without access to example text, it's a bit hard to guess what you really need, but the regular expression
(^|\s)example\.com\>
will find example.com where it is preceded by nothing or by whitespace, and followed by a word boundary. (You could still get a false match on example.com.pk because the period is a word boundary. Provide better examples in your question if you want better answers.)
If you specifically want to use a lookaround: the negative lookahead you used specifies, as the name implies, what the regex should not match at this point. So (?!subdomain\.)example trivially always matches, because the text at that position is example, not subdomain., and the negative lookahead can never fail.
You might be better served by a lookbehind:
(?<!subdomain\.)example\.com
Demo: https://regex101.com/r/kS1nQ4/3
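The difference between the lookahead and the two lookbehinds can be seen in a short Python sketch. The sample text and the helper name starts are assumptions for illustration; Python's re module requires fixed-width lookbehinds, which both of these are.

```python
import re

# Assumed sample text containing a bare domain and two prefixed ones.
text = "example.com subdomain.example.com www.example.com"

def starts(pattern):
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

# The lookahead checks the text at the match position itself, which is
# always "example", so it never rejects anything.
lookahead = starts(r"(?!subdomain\.)example\.com")
# The lookbehinds check what comes immediately before the match.
not_subdomain = starts(r"(?<!subdomain\.)example\.com")
no_dot_before = starts(r"(?<!\.)example\.com")

print(lookahead)      # [0, 22, 38]
print(not_subdomain)  # [0, 38]
print(no_dot_before)  # [0]
```

Note that (?<!subdomain\.) still accepts www.example.com, while (?<!\.) rejects any dotted prefix.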
Here's a solution that takes into account the protocols/prefixes,
/^(www\.)?(http:\/\/www\.)?(https:\/\/www\.)?example\.com$/

RegEx to find all possible relative links to a specific file - also capture link text

Yes, there are hundreds of [regex] [html] topics on SO, but the first 30 I've checked don't help me with my problem.
I've got 745 total links (all relative, and they have to stay relative) to a file in my site. I need to find all these links and append data before and after them. I also need to capture and use the link text.
I've tried several expressions and the regex below is the closest I can get, but it's not good enough - it keeps finding a few instances of some other href to a different file and captures the content all the way to the </a> of the file I actually care about.
<a href="((.)*?)?myFile.html((.)*?)?>((.)*?)?</a>
In the above, I need to capture the relative path to the file and any anchors that might be present, as well as the actual link text.
What regex should I be using?
It shouldn't matter, but I'm using Adobe Dreamweaver to perform the search.
The following regex should work for what you need:
<a href="([^"]*?a\.fparameters\.html)(#[^"]+?)?".*?>(.*?)<
It will work even if you have URLs like:
JOBMAXNODECOUNT
that do not have #xxxx.
A few examples:
For JOBMAXNODECOUNT you will get:
Group 1: a.fparameters.html
Group 2: #jobmaxnodecount
Group 3: JOBMAXNODECOUNT
For mjobctl -m to modify the job after it has been submitted. See the RSVSEARCHALGO you will get only one match:
Group 1: a.fparameters.html
Group 2: #rsvsearchalgo
Group 3: RSVSEARCHALGO
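The group breakdown above can be reproduced with a short Python sketch. The HTML snippet is an assumed reconstruction (the original markup did not survive in this post), including the unrelated other.html link and the class attribute; only the pattern is taken from the answer.

```python
import re

link = re.compile(
    r'<a href="([^"]*?a\.fparameters\.html)(#[^"]+?)?".*?>(.*?)<'
)

# Assumed sample markup, not taken from the original page.
html = (
    '<a href="other.html">Other page</a> and '
    '<a href="a.fparameters.html#jobmaxnodecount">JOBMAXNODECOUNT</a> and '
    '<a href="../a.fparameters.html" class="ref">Parameters</a>'
)

# Each tuple is (relative path, anchor or None, link text).
found = [m.groups() for m in link.finditer(html)]
for path, anchor, text in found:
    print(path, anchor, text)
```

Because [^"]*? cannot cross a closing quote, the link to other.html is skipped instead of being swallowed into the next match, which was the problem with the original ((.)*?)? version.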
Try this regex: (updated)
href="([^"]*?)myFile\.html#?([^"]*).*?>(.*?)<\/a>
Explained demo here: http://regex101.com/r/lA6vB7
First, never do this: (.)* ...or this: (?:.)*
The first one consumes one character at a time and captures it in a group, each time overwriting the previously captured character. The second one avoids most of that overhead by using a non-capturing group, but it's still only matching one character at a time inside that group; why bother? All it's doing is cluttering up the regex.
Adding the ? to make it non-greedy, e.g. (.)*?, doesn't make it worse, but it doesn't help, either. And sticking that inside another group and making the group optional, i.e. ((.)*?)?, is a recipe for catastrophic backtracking. But performance considerations aside, when I see a capturing group with a quantifier attached, it almost always turns out to be a mistake on the author's part.
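The overwriting behaviour is easy to see in a minimal Python sketch (the variable names are made up for illustration):

```python
import re

# Quantifier outside the group: the group is re-captured on every
# iteration, so only the last character survives.
repeated_group = re.match(r"(.)*", "abc")
# Quantifier inside the group: the whole run is captured once.
whole_group = re.match(r"(.*)", "abc")

print(repeated_group.group(0))  # abc
print(repeated_group.group(1))  # c
print(whole_group.group(1))     # abc
```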
As for your question, my solution turns out to be almost identical to Oscar's:
([^<>]*)

Regular expression with negative look aheads

I am trying to construct a regular expression to remove links from content unless they meet one of two conditions.
<a.*?href=[""'](http[s]?:\/\/(.*?)\.link\.com)?\/(?!m\/).*?<\/a>
This will match any link to link.com that does not have m/ at the end of the domain section. I want to change this slightly so it doesn't match URLs that are links to PDF files, regardless of whether they have m/ in the URL. I came up with:
<a.*?href=["'](http[s]?:\/\/(.*?)\.brodies\.com)?\/(?!m\/).*?\.(?!pdf)["'].*?<\/a>
Which is ooh so very close, except now it will only match if the URL has a "." at the end. I can see why it's doing it, but I can't seem to make the "." optional, as this causes the non-greedy pattern before the "." to keep going until it hits the ["']
Any help would be good to help solve this.
Thanks
Paul
You probably want to use (?<!\.pdf)["'] instead of \.(?!pdf)["'].
But note that this expression has several issues; the best way to solve them is to use a proper HTML parser.
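A minimal Python sketch of the difference between the two endings. The href-only pattern and the two sample links are simplifications assumed for illustration, not the full expression from the question.

```python
import re

# Ending from the question: a dot, then (not "pdf"), then a quote.
# The quote must follow the dot directly, so ordinary URLs never match.
original_tail = re.compile(r'''\.(?!pdf)["']''')
# Suggested ending: a quote that is not preceded by ".pdf".
href = re.compile(r'''href=["']([^"']*)(?<!\.pdf)["']''')

page = '<a href="/about/team.html">Team</a>'    # assumed sample
pdf = '<a href="/files/report.pdf">Report</a>'  # assumed sample

tail_on_page = original_tail.search(page)
page_hit = href.search(page)
pdf_hit = href.search(pdf)

print(tail_on_page)       # None
print(page_hit.group(1))  # /about/team.html
print(pdf_hit)            # None
```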
First, see "RegEx match open tags except XHTML self-contained tags".
That said (since it probably will not deter you), here is a slightly better-constrained version of what you're trying to do, with the caveat that this is still not good enough!
<a[^>]+?href\s*=\s*["'](https?:\/\/[^"']*?\.link\.com)?\/(?!m\/)[^"']*?\.(?!pdf)[^"']*?["'][^>]*?>.*?<\/a>
You can see a running example of this regex at: http://rubular.com/r/obkKrKpB8B.
Your problem was actually just that you were looking for a quote character immediately after the dot, here: \.(?!pdf)["'].