Regex to pull value from middle of file path - regex

I am trying to figure out how to pull the following string out of a folder path... I want to pull COMPANY_NAME from the below folder path. Is there a way to use REGEX to pull string between 2nd and 3rd backslash?
Example:
\10.20.3.23\S$\COMPANY_NAME\Main_5e08a942f39a430db0b081736a3f1881\C_VOL-b002.spf

Try this (PCRE syntax): (?(DEFINE)(?<urlPart>[^\\\s]+))\\\\(?&urlPart)\\(?&urlPart)\\\K(?&urlPart)
It matches only the part of the path you are after. Things to note:
The path must begin with two backslashes, as a UNC path does
The path does not need to start at the beginning of the string (if you require this, add ^ after the DEFINE group)
It will match multiple paths in the same string
It will match even if there is no file name
Whitespace inside a path segment will invalidate the match
If you were wondering, it uses a subroutine definition, (?(DEFINE)...) called via (?&urlPart), to reuse part of the regex, and \K to drop everything before the third segment from the reported match.
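Python's built-in re module supports neither (?(DEFINE)...) nor \K, so here is a rough equivalent that uses an ordinary capture group instead. It is only a sketch: the names SEGMENT, THIRD_SEGMENT and third_segment are made up for illustration, and the sample path is assumed to start with two backslashes (a UNC path), which is what the pattern above expects.

```python
import re

# One path segment: anything except a backslash or whitespace.
SEGMENT = r"[^\\\s]+"

# Two leading backslashes, two segments, then capture the third segment.
THIRD_SEGMENT = re.compile(
    r"\\\\" + SEGMENT + r"\\" + SEGMENT + r"\\(" + SEGMENT + r")"
)


def third_segment(text):
    """Return the third segment of the first UNC-style path in text, or None."""
    match = THIRD_SEGMENT.search(text)
    return match.group(1) if match else None


# Assumed input: the path from the question with two leading backslashes.
path = r"\\10.20.3.23\S$\COMPANY_NAME\Main_5e08a942f39a430db0b081736a3f1881\C_VOL-b002.spf"
print(third_segment(path))
```

The capture group plays the role of \K here: the whole match still includes the host and share, but group 1 holds only the folder name.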

Related

Optional grouping with Regex in Eclipse

I have a file with many different file path locations. Some of them have multiple directory depth and some don't. What I need to do is prepend a directory /WEB_ROOT/ to all file path locations in the file.
For example
index.jsp -> /WEB_ROOT/index.jsp
/instructor/assigned_appts.jsp -> /WEB_ROOT/instructor/assigned_appts.jsp
I have tried this one ([\/_]?[A-Za-z]*).jsp to try and capture the optional _ and / values but this doesn't match properly.
/instructor/assigned_appts.jsp only matches _appts.jsp
I have tried this as well ([\/_]?[A-Za-z])*.jsp which properly matches all expected file paths but when I replace I only get the last letter instead of the full group
So a replace with /WEB_ROOT/$1.jsp gives the following
index.jsp -> /WEB_ROOT/x.jsp
/instructor/assigned_appts.jsp -> /WEB_ROOT/s.jsp
Help please!
You can match the whole line, and as [\/_]? is optional, make sure that you match at least a single char A-Za-z before the .jsp
If you want to replace with group 1 like /WEB_ROOT/$1 you can also capture the .jsp
(.*[A-Za-z]\.jsp)
Not sure if this is supported in Eclipse, but you might also just take the whole match and use $0 instead of group 1
.*[A-Za-z]\.jsp
If .jsp is at the end of the string, you can append an anchor .*[A-Za-z]\.jsp$
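To see why the quantified group returns only the last letter, and how capturing the whole path fixes it, here is a small Python sketch. Python writes back-references as \1 rather than $1. Moving the optional leading slash outside the group is an addition of this sketch, not part of the answer above; it avoids a double slash after /WEB_ROOT when the path already starts with one.

```python
import re

paths = ["index.jsp", "/instructor/assigned_appts.jsp"]

# A quantified group keeps only its last iteration, so \1 is a single letter.
repeated_group = re.compile(r"([\/_]?[A-Za-z])*.jsp")
broken = [repeated_group.sub(r"/WEB_ROOT/\1.jsp", p) for p in paths]

# Capture the whole path once instead. The optional leading slash sits
# outside the group (an assumption of this sketch) to avoid "//".
whole_path = re.compile(r"^/?(.*[A-Za-z]\.jsp)$")
fixed = [whole_path.sub(r"/WEB_ROOT/\1", p) for p in paths]

print(broken)
print(fixed)
```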

Visual studio code find replace another part of the string used in the find

I am using Visual Studio Code to find and replace part of a string.
The string will always contain the string "sitemap" without the quotes, but I want to remove index.html
Some examples of what I need replaced:
front/index.htmltemplate.xsl
to
front/template.xsl
com/index.htmlwp-sitemap
to
com/wp-sitemap
Some of my attempts in the VS Code regex search box include
sitemap[^"']*index.html
and
sitemap.*?(?=index.html)
but neither is identifying all of the strings that need replacing
This might do the trick for you: (index\.html)(?=.+(sitemap))
Explanation:
() = Group multiple tokens together to create a capture group
(index\.html) = Create a capture group around the target string you want to replace
(?=.+(sitemap)) = Create a capture group for sitemap and require one or more characters of any kind between index.html and "sitemap".
?= means this is a "positive lookahead": it checks that the text after the main expression matches without including that text in the result. In this case it checks for sitemap and any characters before it, so you just get index.html.
https://regexr.com/6iipm
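As a quick check of the lookahead outside the editor, here is a minimal Python sketch. The helper name strip_index is hypothetical, and the capture groups are dropped because the replacement text is empty.

```python
import re

# Remove "index.html" only when "sitemap" appears later on the same line.
# .+ requires at least one character between the two, as in the answer.
INDEX_BEFORE_SITEMAP = re.compile(r"index\.html(?=.+sitemap)")


def strip_index(line):
    """Delete every index.html that is followed, later on, by sitemap."""
    return INDEX_BEFORE_SITEMAP.sub("", line)


print(strip_index("com/index.htmlwp-sitemap"))
print(strip_index("front/index.htmltemplate.xsl"))
```

The second call leaves its input unchanged, because the fragment shown in the question contains no "sitemap"; the pattern only fires when that word appears further along the same line. Likewise, index.html immediately followed by sitemap is not matched unless .+ is relaxed to .*.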

Regex Adding a URL path except the current one I'm at

I'm trying to add something along the lines of this regex logic.
For Input:
reading/
reading/123
reading/456
reading/789
I want the regex to match only
reading/123
reading/456
reading/789
Excluding reading/.
I've tried reading\/* but that doesn't work because it includes reading/
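Match at least one digit after the slash, for example reading\/\d+, so that the bare reading/ is excluded. You must escape your backslashes in Hugo, so write \\/\\d+ there.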
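The matching logic itself can be checked outside Hugo. This Python sketch assumes the IDs are purely numeric, as in the sample input; in a Python raw string the forward slash needs no escaping and the backslash is written once.

```python
import re

# Require at least one digit after "reading/", which excludes the bare "reading/".
READING_ID = re.compile(r"reading/\d+")

candidates = ["reading/", "reading/123", "reading/456", "reading/789"]
matched = [c for c in candidates if READING_ID.fullmatch(c)]

print(matched)
```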

Matching redirect on url end, ignoring the substring

I'm currently trying to redirect from an old website to the new one.
The domain has changed and the subpath has changed, but the end is always the same, so I am trying to create a regex that will ignore the subpath, and only match with the ending, no matter what the combination might be.
Example:
http://shop.kmsport.dk/team-sport/bolde/fodbolde
https://kmsport.dk/collections/fodbolde
http://shop.kmsport.dk/fodbolde/fodbold-udstryr/anforerbind-325
These 3 URLs all contain the word "fodbolde", but I only want to match the first two, since they both end in "/fodbolde", ignoring the subpath in the process.
So far I've been able to match up the ends with this:
\/([a-zA-Z]*)*+$
How do I create something to account for the different subpaths?
P.S. It's a massive sporting goods store, so it would be nice not to have to create a unique redirect for every possible combination -.-
If you are only interested in the last part just go with
url.rsplit('/', 1)[-1]
Your current regex does not take /fodbolde into account. If that has to be at the end, you could use $ to assert the end of the string, like /fodbolde$
One possibility is to match the start of the string with ^https?:// and optionally match shop. with (?:shop\.)? followed by kmsport\.dk/
Then use a repeating pattern that matches one or more non-slash characters followed by a forward slash, zero or more times, (?:[^/]+/)* and at the end of the string match fodbolde with fodbolde$
^https?://(?:shop\.)?kmsport\.dk/(?:[^/]+/)*fodbolde$
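Here is a short Python sketch that runs the full pattern against the three URLs from the question, alongside the rsplit approach mentioned earlier. The variable names are illustrative only.

```python
import re

# Any subpath on kmsport.dk or shop.kmsport.dk, ending in the segment "fodbolde".
FODBOLDE_URL = re.compile(r"^https?://(?:shop\.)?kmsport\.dk/(?:[^/]+/)*fodbolde$")

urls = [
    "http://shop.kmsport.dk/team-sport/bolde/fodbolde",
    "https://kmsport.dk/collections/fodbolde",
    "http://shop.kmsport.dk/fodbolde/fodbold-udstryr/anforerbind-325",
]

# True where the URL ends in /fodbolde, whatever the subpath before it.
matches = [bool(FODBOLDE_URL.search(u)) for u in urls]

# The regex-free alternative: take whatever follows the last slash.
last_parts = [u.rsplit("/", 1)[-1] for u in urls]

print(matches)
print(last_parts)
```

Comparing the last path segment, rather than hard-coding fodbolde, would generalise to other category names, which may matter for a store with many of them.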

Match url with uppercase letters except if it contains a filename like .jpg,.css,.js etc

I need a Regular Expression that can match url with uppercase letters but do not match if it contains a filename like .jpg,.css,.js etc
I want to redirect all uppercase url to lowercase but only when it is not pointing to a file resource.
Try using a regex visualizer like regexpal.com.
Here's an example of a regular expression that approximates what you're trying to do:
\w+\.(?:com|net)(?:/[A-Z]+){1,}[/]?(?:\.jpg|\.png|\.JPG|\.PNG){0}$
\w+\.(?:com|net) matches a domain of the form word.com or word.net. (You'll need to add other domains or improve this if you want to match subdomains as well.)
(?:/[A-Z]+){1,}[/]? matches all-caps directories like /FOO/BAR/ with an optional trailing slash.
(?:\.jpg|\.png|\.JPG|\.PNG){0}$ matches exactly zero of the extensions listed, so it documents the intent rather than enforcing it; file names are actually rejected because [A-Z]+ cannot match a dot. You'll obviously need to add to this list of extensions.
But perhaps rethink your routing; it's better form to keep all assets in devoted directories on your server, so that you can simply pass any request to mysite.com/assets/ along unchanged while handling other URLs.
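A different way to express the same rule is a negative lookahead, which does reject paths ending in a listed extension. This is a swapped-in technique, not the regex explained above. The sketch below is in Python, the extension list is a hypothetical placeholder, and needs_lowercase_redirect is an illustrative name; it works on the path part of the URL only.

```python
import re

# Hypothetical extension list; extend it to cover the file types you serve.
# (?i:...) makes only the extension check case-insensitive (Python 3.6+),
# while [A-Z] still requires a genuinely uppercase letter somewhere in the path.
NEEDS_LOWERCASE = re.compile(r"^(?!.*\.(?i:jpe?g|png|css|js)$).*[A-Z]")


def needs_lowercase_redirect(path):
    """True if path has an uppercase letter and does not end in a listed extension."""
    return NEEDS_LOWERCASE.search(path) is not None


for sample in ["/Foo/Bar", "/foo/bar", "/Images/Logo.JPG", "/css/Site.css"]:
    print(sample, needs_lowercase_redirect(sample))
```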