Optional grouping with Regex in Eclipse

I have a file with many different file path locations. Some of them are several directories deep and some are not. What I need to do is prepend a /WEB_ROOT/ directory to every file path location in the file.
For example:
index.jsp -> /WEB_ROOT/index.jsp
/instructor/assigned_appts.jsp -> /WEB_ROOT/instructor/assigned_appts.jsp
I have tried ([\/_]?[A-Za-z]*).jsp to capture the optional _ and / characters, but it doesn't match properly:
/instructor/assigned_appts.jsp only matches _appts.jsp
I have also tried ([\/_]?[A-Za-z])*.jsp, which matches all of the expected file paths, but when I replace I only get the last letter instead of the full group.
So a replace with /WEB_ROOT/$1.jsp gives the following:
index.jsp -> /WEB_ROOT/x.jsp
/instructor/assigned_appts.jsp -> /WEB_ROOT/s.jsp
Help please!
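The behaviour described above is standard for a quantified capturing group: when the group itself is repeated with *, each iteration overwrites what it captured, so $1 ends up holding only the last iteration. A minimal Python sketch reproduces it (Python's re stands in for Eclipse's Java regex engine here, and the function name prepend_with_repeated_group is hypothetical):

```python
import re

# The * sits outside the group, so group 1 is overwritten on every
# repetition and only the final iteration (a single letter) is kept.
REPEATED_GROUP = re.compile(r"([\/_]?[A-Za-z])*.jsp")


def prepend_with_repeated_group(path: str) -> str:
    # Python spells the $1 group reference as \1 in the replacement string.
    return REPEATED_GROUP.sub(r"/WEB_ROOT/\1.jsp", path)


for p in ["index.jsp", "/instructor/assigned_appts.jsp"]:
    print(p, "->", prepend_with_repeated_group(p))
```

This prints /WEB_ROOT/x.jsp and /WEB_ROOT/s.jsp, the same results as in the question.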

You can match the whole line. Because [\/_]? is optional, make sure that you match at least a single A-Za-z character before the .jsp.
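If you want to replace with group 1, as in /WEB_ROOT/$1, you can also capture the .jsp. Match an optional leading / outside the group, so that paths which already start with / do not end up with a doubled slash:
/?(.*[A-Za-z]\.jsp)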
Not sure if it is supported in Eclipse, but you might also just use the whole match, $0, instead of group 1 (since $0 includes any leading /, this only suits paths that do not start with /):
.*[A-Za-z]\.jsp
If .jsp is at the end of the string, you can append an anchor .*[A-Za-z]\.jsp$
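Applied to a whole file's text, the answer's idea can be sketched in Python as below. This is a sketch under stated assumptions: Python's re stands in for Eclipse's engine (where the replacement would be written /WEB_ROOT/$1), the file is assumed to hold one path per line, and an optional leading / is consumed outside the group so that /instructor/... does not become /WEB_ROOT//instructor/.... The function name add_web_root is hypothetical.

```python
import re

# ^ and $ match at every line boundary because of re.MULTILINE.
# "/?" swallows an existing leading slash; group 1 keeps the rest of the
# path, which must end in a letter followed by ".jsp".
JSP_LINE = re.compile(r"^/?(.*[A-Za-z]\.jsp)$", re.MULTILINE)


def add_web_root(text: str) -> str:
    """Prefix every line that is a .jsp path with /WEB_ROOT/."""
    return JSP_LINE.sub(r"/WEB_ROOT/\1", text)


sample = "index.jsp\n/instructor/assigned_appts.jsp\n"
print(add_web_root(sample))
```

Running it a second time would prefix the paths again, so apply it once per file.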

Related

Visual studio code find replace another part of the string used in the find

I am using Visual Studio Code's find and replace to change a different part of the string from the one I search for.
The string will always contain "sitemap" (without the quotes), but I want to remove index.html.
Some examples of what i need replaced:
front/index.htmltemplate.xsl
to
front/template.xsl
com/index.htmlwp-sitemap
to
com/wp-sitemap
Some of my attempts in the VS Code regex search box include
sitemap[^"']*index.html
and
sitemap.*?(?=index.html)
but neither identifies all of the strings that need replacing.

This might do the trick for you: (index\.html)(?=.+(sitemap))
Explanation:
() = Group multiple tokens together to create a capture group
(index\.html) = Create a capture group around the target string you want to replace
(?=.+(sitemap)) = A lookahead requiring one or more characters of any kind after index.html, followed by "sitemap"; the inner (sitemap) is a second capture group.
?= marks a "positive lookahead": it checks that the group can be matched after the main expression without including it in the result. Here it checks that sitemap (and any characters before it) follows, but leaves them out of the match, so you just get index.html.
https://regexr.com/6iipm
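To check the lookahead outside the editor, here is a minimal Python sketch that runs the answer's pattern through re.sub with an empty replacement. Python's re is only a stand-in for VS Code's search engine, and the function name strip_index_html is hypothetical.

```python
import re

# Group 1 is the literal "index.html"; the lookahead only asserts that
# "sitemap" appears later on the line (after at least one character),
# so "sitemap" is not consumed and survives the replacement.
INDEX_BEFORE_SITEMAP = re.compile(r"(index\.html)(?=.+(sitemap))")


def strip_index_html(s: str) -> str:
    return INDEX_BEFORE_SITEMAP.sub("", s)


for s in ["com/index.htmlwp-sitemap", "com/index.html"]:
    print(s, "->", strip_index_html(s))
```

Because .+ needs at least one character, index.html immediately followed by sitemap (as in index.htmlsitemap) is not matched; using .* instead would allow that case.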

Regex To Exclude First Part of String

New at this, so thanks in advance for the help.
I'm looking to write a regex that will match the end of the string but not the beginning, and in some cases the string is only one character long.
Here are the sample strings and I'm trying to match only the items shown, otherwise there is no match.
/en-ca/brand/atf-type-f/ # should match /brand/atf-type-f/
/ # no match
/en-ca # no match
/en-ca/ # no match
/es-xl # no match
/en-gb # no match
/ru-kz/ # no match
/knowledge-centre/sds # should match /knowledge-centre/sds
/en-us/brand/purity-fg # should match /brand/purity-fg
I'm using the regex engine in Google Analytics, and I'm looking to output the page path without the country ID and the language ID.

Figured this out.
Using the Advanced Filter within GA I:
1) Used regex with ^(/..-..)?(/)?(.*)
2) Used Output To -> Constructor to assemble the groups I wanted. Each () in the GA output constructor is numbered, so $A1 picks up the first group, and so on. Returning just $A3 gave me the path. I had to add the / back at the beginning, so the output expression became /$A3
Hope this helps someone else.
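The filter can be emulated in Python to see how each sample path is rewritten. A minimal sketch, assuming GA's $A3 corresponds to group 3 of the same pattern; the function name ga_output is hypothetical.

```python
import re

# ^(/..-..)?  optional "/xx-yy" locale prefix (any two chars, a dash, any two chars)
# (/)?        optional slash after it
# (.*)        the rest of the path, referenced as $A3 in GA
GA_FILTER = re.compile(r"^(/..-..)?(/)?(.*)")


def ga_output(path: str) -> str:
    m = GA_FILTER.match(path)  # every group is optional or can be empty, so this always matches
    return "/" + m.group(3)    # mirrors the "/$A3" output constructor


for p in ["/en-ca/brand/atf-type-f/", "/knowledge-centre/sds", "/en-us/brand/purity-fg", "/en-ca", "/"]:
    print(p, "->", ga_output(p))
```

Locale-only paths such as /en-ca are not rejected by this filter; they are rewritten to /, which differs from the "no match" listed in the question but may be acceptable for reporting.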

Regex to pull value from middle of file path

I am trying to figure out how to pull a string out of a folder path: I want to pull COMPANY_NAME from the folder path below. Is there a way to use a regex to pull the string between the 2nd and 3rd backslash?
Example:
\\10.20.3.23\S$\COMPANY_NAME\Main_5e08a942f39a430db0b081736a3f1881\C_VOL-b002.spf

Try this:
(?(DEFINE)(?<urlPart>[^\\\s]+))\\\\(?&urlPart)\\(?&urlPart)\\\K(?&urlPart)
It will match the desired part of the URL you are after. Things to note:
The url does not need to start at the beginning of the string (if you require this add ^ after the define group)
It will match many urls in the same string
It will match even if there is no file name
White space will invalidate the match
In case you are wondering, it uses subroutine definitions to reuse parts of the regex.
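Python's built-in re module does not support (?(DEFINE)...), (?&name) subroutine calls or \K, so this sketch swaps them for a plainer technique: the urlPart subpattern is repeated inline and the wanted segment is read from a capturing group. It assumes the UNC form with two leading backslashes, and the function name third_segment is hypothetical.

```python
import re
from typing import Optional

# The answer's urlPart: one or more characters that are neither a backslash
# nor whitespace. Repeated inline because re has no subroutine calls.
PART = r"[^\\\s]+"

# Two leading backslashes, server, backslash, share, backslash, then capture
# the next segment (the answer used \K to drop everything before it).
SEGMENT = re.compile(r"\\\\" + PART + r"\\" + PART + r"\\(" + PART + r")")


def third_segment(path: str) -> Optional[str]:
    m = SEGMENT.search(path)
    return m.group(1) if m else None


path = r"\\10.20.3.23\S$\COMPANY_NAME\Main_5e08a942f39a430db0b081736a3f1881\C_VOL-b002.spf"
print(third_segment(path))
```

If you need every match in a longer string, SEGMENT.findall(path) returns the captured segments as a list.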

Regex Adding a URL path except the current one I'm at

I'm trying to write regex logic along these lines.
For input:
reading/
reading/123
reading/456
reading/789
I want the regex to match only
reading/123
reading/456
reading/789
Excluding reading/.
I've tried reading\/*, but that doesn't work because it also matches reading/ on its own.

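You must escape your backslashes in Hugo: write \\/\\d+, which reaches the regex engine as \/\d+ (a / followed by one or more digits), so a bare reading/ with nothing after the slash does not match.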
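Hugo evaluates regexes with Go's regexp package, but the pattern logic is easy to check in Python. In this sketch the reading prefix and the use of a full match are assumptions added to mirror the question's inputs; the key part from the answer is \d+, which requires at least one digit after the slash.

```python
import re

# "\d+" requires one or more digits after the slash, so a bare "reading/"
# cannot match.
READING_ID = re.compile(r"reading/\d+")

inputs = ["reading/", "reading/123", "reading/456", "reading/789"]
matches = [s for s in inputs if READING_ID.fullmatch(s)]
print(matches)
```

In a Hugo template, the same pattern would be written with doubled backslashes inside a double-quoted string, as the answer says.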

REGEX that leaves one if more than one is present

I have to filter paths that can look like:
some_path//rest
some_path/rest
some_path\\\\rest
some_path\rest
I need to replace some_path//rest with FILTER
some_path/rest// I want FILTER/
some_path/rest\\ I want FILTER\
some_path/rest I want FILTER
some_path/rest/ I want FILTER/
some_path/rest\ I want FILTER\
I am using some_path[\\\\\\\/]+rest to match the middle; if I use the same pattern at the end, it consumes all of the path separators.
I do not know in advance whether the separators will be / or \\, and they can be mixed within a single path:
some_path/rest\some_more//and/more\\\\more

Consider using back references. Keep in mind that Python's interactive output shows each \ escaped with a second \. This example seems to do what you are looking for:
>>> import re
>>> for test in ('some_path/rest//', 'some_path/rest\\\\', 'some_path/rest', 'some_path/rest/', 'some_path/rest\\'):
...     re.sub(r"some_path[\\/]+rest([\\/]?)\1*", r"FILTER\1", test)
...
'FILTER/'
'FILTER\\'
'FILTER'
'FILTER/'
'FILTER\\'
>>>
The \1 is a back reference to the preceding () group. In the search pattern, \1* matches any number of further copies of that same separator; in the replacement, \1 adds it back just once.

You can do it with a simpler replacement term (no back reference in it) by using a lookahead.
Use this regex to search:
some_path[\\/]+rest(?:([\\/])(?=\1))?
and replace the match with just 'FILTER':
re.sub(r"some_path[\\/]+rest(?:([\\/])(?=\1))?", 'FILTER', path)
This works by matching (i.e. consuming) the trailing slash only when it is doubled.
To allow for paths with no trailing slash, the trailing-slash part is made optional by wrapping it in (?:...)?. That wrapper is non-capturing, so the back reference is \1 rather than the harder-to-read \2.
Note that you don't need quite so many backslashes in your regex.
Here's some test code:
import re
for path in ('some_path/rest//', 'some_path/rest\\\\', 'some_path/rest', 'some_path/rest/', 'some_path/rest\\'):
    print(path + ' -> ' + re.sub(r"some_path[\\/]+rest(?:([\\/])(?=\1))?", 'FILTER', path))
Output:
some_path/rest// -> FILTER/
some_path/rest\\ -> FILTER\
some_path/rest -> FILTER
some_path/rest/ -> FILTER/
some_path/rest\ -> FILTER\
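The two answers behave differently once a separator is repeated more than twice: \1* in the back-reference version absorbs any number of repeats, while the lookahead version consumes only one extra separator. A short comparison sketch follows; the triple-slash input is an assumption, not one of the question's cases, and the function names filter_backref and filter_lookahead are hypothetical.

```python
import re

# First answer: capture one trailing separator, then swallow any repeats of it.
BACKREF = re.compile(r"some_path[\\/]+rest([\\/]?)\1*")
# Second answer: consume a trailing separator only if the same one follows.
LOOKAHEAD = re.compile(r"some_path[\\/]+rest(?:([\\/])(?=\1))?")


def filter_backref(path: str) -> str:
    return BACKREF.sub(r"FILTER\1", path)


def filter_lookahead(path: str) -> str:
    return LOOKAHEAD.sub("FILTER", path)


for path in ["some_path/rest//", "some_path/rest///"]:
    print(repr(path), repr(filter_backref(path)), repr(filter_lookahead(path)))
```

For some_path/rest/// the back-reference version gives FILTER/ while the lookahead version leaves FILTER//, so the first approach is the safer choice if longer runs of separators can occur.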
