Specific word followed by full stop followed by another group RegEx

Hi, I'm having a little trouble with a regular expression.
I tested:
([Version]+)\.([.0-9A-Za-z]+)
With:
/Downloads/Documents/Access MDB - DEV Version.2.1.4.zip
This worked in RegexHero; my groups seemed fine (sort of).
However, when I search through HTML source code it returns things like:
e.axd
How would I get 2 groups:
Version.
2.1.4.zip
Or even one group?
Version.2.1.4.zip
I'm puzzling over this, regular expressions aren't my strong suit.

To get Version.2.1.4.zip in one group:
^.* (.*)$

Using Avinash's answer as a baseline, I found this solution:
(version.*\.zip)
This matches what I needed:
version.2.1.4.zip
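
For anyone wondering why the original pattern returned e.axd: [Version]+ is a character class (any run of the letters V, e, r, s, i, o, n), not the literal word. Below is a minimal Python sketch of that behaviour and of two possible fixes. The HTML snippet and the variable names are made up for illustration. Note that (version.*\.zip) only matches the capitalised file name from the question when matching is case-insensitive.

```python
import re

path = "/Downloads/Documents/Access MDB - DEV Version.2.1.4.zip"
html = '<script src="/WebResource.axd"></script>'  # hypothetical HTML snippet

# [Version]+ is a character class: one or more of V, e, r, s, i, o, n.
char_class = re.compile(r"([Version]+)\.([.0-9A-Za-z]+)")
false_hit = char_class.search(html).group(0)

# Matching the literal word (with its dot) gives the two groups asked for.
literal = re.compile(r"(Version\.)([.0-9A-Za-z]+)")
two_groups = literal.search(path).groups()
no_hit = literal.search(html)  # None: the word "Version" is not in the HTML

# The asker's final pattern; IGNORECASE lets "version" match "Version".
one_group = re.search(r"(version.*\.zip)", path, re.IGNORECASE).group(1)

print(false_hit, two_groups, no_hit, one_group)
```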

Related

301 redirection, matching URLs through regex. Matching dashes

I'm trying to match urls for a migration, however I can't seem to have a regex which matches it.
I've tried different expressions and used regex checkers to determine where exactly it's broken, but it's not clear to me.
This is my regex:
https:\/\/blog\.xyz\.ca\/EN\/post\/201[0-9]\/[0-9][0-9]\/[0-9][0-9]\/*\).aspx
I'm trying to match these kinds of urls (hundreds)
https://blog.xyz.ca/EN/post/2019/05/14/how-test-higher-education-test-can-test-more-test-students-and-test-sdf-the-test.aspx
https://blog.xyz.ca/EN/post/2019/05/14/how-test.aspx
https://blog.xyz.ca/EN/post/2019/05/14/how-test-higher-the-test.aspx
And remap them to something like this
https://blog.xyz.ca/2017/12/21/test-how-the-testaspx
I thought that I could match the dash section using the wildcard, but it seems to not be working and none of the generators are giving me a clear warning. I've tried https://regexr.com/ and https://www.regextester.com/

If I understand the problem correctly, a simple expression that captures the desired URL components should be enough to build the redirect rules. We can likely start with:
(.+\.ca)\/EN\/post(\/[0-9]{4}\/[0-9]{2}\/[0-9]{2})(\/.+)\.aspx
If necessary, we can add or relax constraints; I'm guessing that no further validation is required.
or:
(.+\.ca)\/EN\/post(\/[0-9]{4}\/[0-9]{2}\/[0-9]{2})(\/.+)(\.aspx)
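
As a rough Python sketch of how the captured groups could drive the rewrite: the function name remap and the target layout (host + date + slug, keeping the .aspx suffix) are assumptions, since the question only shows an approximate target URL. Adjust the replacement string to whatever the redirect rules actually need.

```python
import re

# Pattern from the answer: host, date path, slug, then the .aspx suffix.
OLD_URL = re.compile(r"(.+\.ca)/EN/post(/[0-9]{4}/[0-9]{2}/[0-9]{2})(/.+)\.aspx")


def remap(url: str) -> str:
    """Drop the /EN/post segment; URLs that do not match are returned unchanged."""
    # Assumed target layout: <host><date><slug>.aspx
    return OLD_URL.sub(r"\1\2\3.aspx", url)


print(remap("https://blog.xyz.ca/EN/post/2019/05/14/how-test.aspx"))
```

The dashes need no special handling here, because .+ matches them like any other character.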

Reg Ex Help to find instances of ".xl" between brackets

I'm awful with RegEx and am in a time crunch. I'm trying to come up with a rule that will pull out instances of text captured between brackets that also include the phrase ".xl" in them.
Example String:
C:\Users[chris.xlm]\Desktop[Test1.xlsx]Sheet1'![$C$4]
What would get captured from the expression would be:
1. chris.xlm
2. Test1.xlsx

The pattern:
\[([^]]+?\.xl.*?)\]
should accomplish what you need.
The pattern captures the full text between a pair of brackets whenever that text contains .xl, including the complete extension.
Revised thanks to C Perkin's comment.
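
A quick way to check the pattern against the example string, sketched in Python (the variable names are mine). The class is written [^\]] here, which is equivalent to [^]] but unambiguous across regex flavours.

```python
import re

text = r"C:\Users[chris.xlm]\Desktop[Test1.xlsx]Sheet1'![$C$4]"

# Capture bracketed text that contains ".xl"; [^\]] keeps the first part
# of the match inside a single pair of brackets.
pattern = re.compile(r"\[([^\]]+?\.xl.*?)\]")
matches = pattern.findall(text)

print(matches)
```

The last bracket pair, [$C$4], is skipped because it contains no .xl.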

regex to find domain without those instances being part of subdomain.domain

I'm new to regex. I need to find instances of example.com in an .SQL file in Notepad++ without those instances being part of subdomain.example.com.
From this answer, I've tried using ^((?!subdomain))\.example\.com$, but this does not work.
I tested this in Notepad++ and at https://regex101.com/r/kS1nQ4/1 but it doesn't work.
Help appreciated.

Simple
^example\.com$
with g,m,i switches will work for you.
https://regex101.com/r/sJ5fE9/1
If the matching should be done somewhere in the middle of the string you can use negative look behind to check that there is no dot before:
(?<!\.)example\.com
https://regex101.com/r/sJ5fE9/2
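
A small Python sketch of the lookbehind variant, run against a made-up line of SQL (the sample data and names are assumptions, not taken from the question):

```python
import re

# Hypothetical line from an .SQL dump.
sql = "INSERT INTO sites VALUES ('example.com'), ('subdomain.example.com');"

# Plain pattern: also hits the tail of subdomain.example.com.
plain_hits = re.findall(r"example\.com", sql)

# Negative lookbehind: reject any hit that has a dot right before it.
bare_hits = re.findall(r"(?<!\.)example\.com", sql)

print(len(plain_hits), len(bare_hits))
```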

Without access to example text, it's a bit hard to guess what you really need, but the regular expression
(^|\s)example\.com\>
will find example.com where it is preceded by nothing or by whitespace, and followed by a word boundary. (You could still get a false match on example.com.pk because there is a word boundary before the period. Provide better examples in your question if you want better answers.)
If you specifically want to use a lookaround, the negative lookahead you used (as the name implies) specifies what the regex should not match at this point. So (?!subdomain\.)example trivially always matches, because the text example is never subdomain. and the negative lookahead can therefore never fail.
You might be better served by a lookbehind:
(?<!subdomain\.)example\.com
Demo: https://regex101.com/r/kS1nQ4/3
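
To make the difference concrete, here is a short Python comparison of the two lookarounds on a made-up sample string. The lookahead version rejects nothing, while the lookbehind version skips only the subdomain. occurrence (so www.example.com still matches, which may or may not be what you want).

```python
import re

text = "example.com subdomain.example.com www.example.com"  # hypothetical sample

# Lookahead at the match start: "example" is never "subdomain.", so no hit is rejected.
lookahead_hits = re.findall(r"(?!subdomain\.)example\.com", text)

# Lookbehind: checks the text immediately before the match instead.
lookbehind_starts = [m.start() for m in re.finditer(r"(?<!subdomain\.)example\.com", text)]

print(len(lookahead_hits), lookbehind_starts)
```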

Here's a solution that takes into account the protocols/prefixes:
/^(www\.)?(http:\/\/www\.)?(https:\/\/www\.)?example\.com$/

RegEx to find all possible relative links to a specific file - also capture link text

Yes, there's hundreds of [regex] [html] topics on SO, but the first 30 I've checked don't help me with my problem.
I've got 745 total links (all relative, and they have to stay relative) to a file in my site. I need to find all these links and append data before and after them. I also need to capture and use the link text.
I've tried several expressions and the regex below is the closest I can get, but it's not good enough - it keeps finding a few instances of some other href to a different file and captures the content all the way to the </a> of the file I actually care about.
<a href="((.)*?)?myFile.html((.)*?)?>((.)*?)?</a>
In the above, I need to capture the relative path to the file and any anchors that might be present, as well as the actual link text.
What regex should I be using?
It shouldn't matter, but I'm using Adobe Dreamweaver to perform the search.

The following regex should work for what you need:
<a href="([^"]*?a\.fparameters\.html)(#[^"]+?)?".*?>(.*?)<
It will work even if you have links such as:
JOBMAXNODECOUNT
whose URLs do not have a #xxxx anchor.
A few examples:
For JOBMAXNODECOUNT you will get:
Group 1: a.fparameters.html
Group 2: #jobmaxnodecount
Group 3: JOBMAXNODECOUNT
For "mjobctl -m to modify the job after it has been submitted. See the RSVSEARCHALGO" you will get only one match:
Group 1: a.fparameters.html
Group 2: #rsvsearchalgo
Group 3: RSVSEARCHALGO

Try this regex: (updated)
href="([^"]*?)myFile\.html#?([^"]*).*?>(.*?)<\/a>
Explained demo here: http://regex101.com/r/lA6vB7
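
For reference, a minimal Python sketch of what the three groups contain. The HTML sample is invented, and Dreamweaver's regex engine may differ in details, so treat this as an illustration rather than a guarantee.

```python
import re

# Groups: 1 = relative path before the file, 2 = anchor (without #), 3 = link text.
LINK = re.compile(r'href="([^"]*?)myFile\.html#?([^"]*).*?>(.*?)</a>')

# Hypothetical markup: one unrelated link followed by two links to myFile.html.
html = (
    '<a href="other.html">Other</a> '
    '<a href="../docs/myFile.html#intro" class="nav">Intro</a> '
    '<a href="myFile.html">Home</a>'
)

links = LINK.findall(html)
print(links)
```

Because [^"]*? cannot cross the closing quote of the href value, the link to other.html is skipped instead of being swallowed into the next match, which was the problem with the original expression.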

First, never do this: (.)* ...or this: (?:.)*
The first one consumes one character at a time and captures it in a group, each time overwriting the previously captured character. The second one avoids most of that overhead by using a non-capturing group, but it's still only matching one character at a time inside that group; why bother? All it's doing is cluttering up the regex.
Adding the ? to make it non-greedy, e.g. (.)*?, doesn't make it worse, but it doesn't help, either. And sticking that inside another group and making the group optional, i.e. ((.)*?)?, is a recipe for catastrophic backtracking. But performance considerations aside, when I see a capturing group with a quantifier attached, it almost always turns out to be a mistake on the author's part.
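
The overwriting is easy to see in a short Python session (the sample string is arbitrary):

```python
import re

text = "abc"

# Quantifier outside the group: the group is re-captured on every iteration,
# so only the last character survives.
per_char = re.match(r"(.)*", text)
last_only = per_char.group(1)

# Quantifier inside the group: the group captures the whole run.
whole = re.match(r"(.*)", text)
everything = whole.group(1)

print(last_only, everything)
```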
As for your question, my solution turns out to be almost identical to Oscar's:
([^<>]*)

Regular expression with negative look aheads

I am trying to construct a regular expression to remove links from content unless they meet one of two conditions.
<a.*?href=[""'](http[s]?:\/\/(.*?)\.link\.com)?\/(?!m\/).*?<\/a>
This will match any link to link.com that does not have m/ directly after the domain section. I want to change this slightly so it doesn't match URLs that link to PDF files, regardless of whether they have m/ in the URL. I came up with:
<a.*?href=["'](http[s]?:\/\/(.*?)\.link\.com)?\/(?!m\/).*?\.(?!pdf)["'].*?<\/a>
Which is oh so very close, except now it will only match if the URL has a "." at the end. I can see why it's doing it, but I can't seem to make the "." optional, as this causes the non-greedy pattern before the "." to keep going until it hits the ["'].
Any help would be good to help solve this.
Thanks
Paul

You probably want to use (?<!\.pdf)["'] instead of \.(?!pdf)["'].
But note that this expression has several issues; the best way to solve them is to use a proper HTML parser.
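
A hedged Python sketch of the lookbehind idea. To keep the match inside the href value I have swapped the .*? runs for [^"'] and [^>] classes, which is my own adjustment rather than part of the suggestion above; the sample links and the function name are invented. It is still a regex over HTML, so the parser caveat applies.

```python
import re

# (?!m/) rejects paths starting with m/; (?<!\.pdf) rejects hrefs ending in .pdf.
LINK = re.compile(
    r'''<a[^>]+?href\s*=\s*["'](https?://[^"']*?\.link\.com)?/(?!m/)'''
    r'''[^"']*(?<!\.pdf)["'][^>]*>.*?</a>'''
)


def is_removable(link: str) -> bool:
    """True if the link matches, i.e. it is neither under m/ nor a PDF."""
    return LINK.search(link) is not None


print(is_removable('<a href="http://www.link.com/page.html">Page</a>'))
```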

First, see "RegEx match open tags except XHTML self-contained tags".
That said (since it probably will not deter you), here is a slightly better-constrained version of what you're trying to do, with the caveat that this is still not good enough!
<a[^>]+?href\s*=\s*["'](https?:\/\/[^"']*?\.link\.com)?\/(?!m\/)[^"']*?\.(?!pdf)[^"']*?["'][^>]*?>.*?<\/a>
You can see a running example of this regex at: http://rubular.com/r/obkKrKpB8B.
Your problem was actually just that you were looking for a quote character immediately after the dot, here: \.(?!pdf)["'].
