Regex to extract a particular string from a path

I want to extract a particular string from a path.
For example, I have to extract the 4th dot-separated value from the file name, which is "lm" in the examples below.
Examples:
/apps/java/logs/abc.defgh.ijk.lm.nopqrst.uvw.xyz.log
/apps2/java/logs/abc.defgh.ijk.lm.log
This extracts the full file name:
.*\/(?<name>.*).log

You can use
.*\/(?:[^.\/]*\.){3}(?<value>[^.\/]*)[^\/]*$
Or, if .log must be the extension:
.*\/(?:[^.\/]*\.){3}(?<value>[^.\/]*)[^\/]*\.log$
See the regex demo. Details:
.* - any zero or more chars other than line break chars, as many as possible
\/ - a / char
(?:[^.\/]*\.){3} - three occurrences of zero or more chars other than . and / as many as possible and a dot
(?<value>[^.\/]*) - Group "value": zero or more chars other than . and / as many as possible
[^\/]* - zero or more chars other than /
\.log - a .log substring
$ - end of string.
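As a rough illustration, here is how the .log variant of the pattern could be applied in Python. Python is an assumption (the question does not name a language), and it spells named groups as (?P<value>...) rather than (?<value>...); the helper name fourth_field is hypothetical.

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
# "/" needs no escaping in a Python regex.
FOURTH_FIELD = re.compile(r".*/(?:[^./]*\.){3}(?P<value>[^./]*)[^/]*\.log$")

def fourth_field(path):
    """Return the 4th dot-separated field of the file name, or None."""
    m = FOURTH_FIELD.match(path)
    return m.group("value") if m else None

print(fourth_field("/apps/java/logs/abc.defgh.ijk.lm.nopqrst.uvw.xyz.log"))  # lm
print(fourth_field("/apps2/java/logs/abc.defgh.ijk.lm.log"))                 # lm
```

A file name with fewer than four dot-separated fields before .log yields None, because the {3} group cannot be satisfied.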

You can also try
\/(?:\w+\.){3}(\w+)
Or
\/(?:\w+\.){3}(\w+).*\.log
Where:
\/ - Match a "/" character
(?:\w+\.){3} - Match 3 occurrences of one or more word characters followed by a dot, e.g. abc.defgh.ijk.
(\w+) - Capture a string of word characters (letters, digits, or underscore). This will contain the target value, e.g. "lm"
.*\.log - Optional. Match any characters up to a trailing .log, e.g. .nopqrst.uvw.xyz.log
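A minimal sketch of the \w-based pattern in Python (the language and the helper name fourth_word_field are assumptions). Since the pattern is unanchored, it is used with search; since \w does not match a hyphen, a file name with a hyphen in its first three fields is not matched.

```python
import re

# Unanchored: search() finds the first "/" followed by three "word." fields.
WORD_FIELDS = re.compile(r"/(?:\w+\.){3}(\w+)")

def fourth_word_field(path):
    """Return the 4th field if the first four fields are word characters only."""
    m = WORD_FIELDS.search(path)
    return m.group(1) if m else None

print(fourth_word_field("/apps/java/logs/abc.defgh.ijk.lm.nopqrst.uvw.xyz.log"))  # lm
print(fourth_word_field("/apps2/java/logs/abc.defgh.ijk.lm.log"))                 # lm
print(fourth_word_field("/logs/a-b.c.d.e.log"))                                   # None
```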

Related

Regex Pattern that has to include something after /

Using regex, I want to match any URL that includes /it-jobs/ but only if something follows the final /.
To be a match, the URL must have /it-jobs/ plus characters after the trailing /; otherwise it should not match. Please refer to the examples below.
Example: www.website.com/it-jobs/ - is not a match
www.website.com/it-jobs/java-developer - is a match
www.website.com/it-jobs/php - is a match
www.website.com/it-jobs/angular-developer - is a match
You can use
/it-jobs/[^/\s]+$
To match the whole string, add .* at the pattern start:
.*/it-jobs/[^/\s]+$
See the regex demo.
Details:
.* - zero or more chars other than line break chars as many as possible
/it-jobs/ - a literal string
[^/\s]+ - any one or more chars other than / and whitespaces
$ - end of string.
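A short sketch of how this check might look in Python, using the sample URLs from the question. The language and the function name is_job_url are assumptions for illustration.

```python
import re

# The unanchored form is enough when using search().
IT_JOB = re.compile(r"/it-jobs/[^/\s]+$")

def is_job_url(url):
    """True if the URL has at least one character after the final /it-jobs/."""
    return IT_JOB.search(url) is not None

for url in (
    "www.website.com/it-jobs/",
    "www.website.com/it-jobs/java-developer",
    "www.website.com/it-jobs/php",
    "www.website.com/it-jobs/angular-developer",
):
    print(url, is_job_url(url))
```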

Regex to pick a value from url

I am having difficulty building a regex that can extract a value from a URL. The condition is to get the value between the last "/" and ".html". Please help.
Sample URL1 - https://www.example.com/fgf/sdf/sdf/as/dwe/we/bingo.html - The value I want to extract is bingo
Sample URL2 - www.example.com/we/b345g.html - The value I want to extract is b345g
I tried to build a regex and was able to get "bingo.html" and "b345g.html" using [^\/]+$, but I was not able to remove or skip ".html".
Here you are:
\/([^\/]+?)(?>\..+)?$
Explanation:
\/ - literal character '/'
([^\/]+?) - first group: at least one character that is not a '/', matched lazily (as few characters as possible)
[^\/] - any character that is not a '/'
+ - at least one occurrence
? - lazy (non-greedy) modifier: expand the match only as far as needed
(?>\..+)? - second, optional group: '.' followed by any characters (like '.html' or '.exe' or '.png')
?> - atomic, non-capturing group (its content is excluded from the captured result)
\. - literal character '.'
. - any character (except line terminators)
+ - at least one occurrence
? - optionality (note that this one is outside the parentheses)
$ - end of the string
If you also want to exclude query strings, you can expand it like this:
\/([^\/]+?)(?>\..+)?(?>\?.*)?$
If you also need to remove the protocol part of the url you can use this:
(?<!\/)\/([^\/]+?)(?>\..+)?(?>\?.*)?$
Here (?<!\/) is a negative lookbehind: it checks that there is no '/' immediately before the start of the match
You are only matching using [^\/]+$ but not differentiating between the part before and after the dot.
To make that different, you could use for example a capture group to get the part after the last slash and before the first dot.
\S*\/([^\/\s.]+)\.[^\/\s]+$
\S*\/ Match optional non-whitespace chars up to the last occurrence of /
([^\/\s.]+) Capture group 1: match 1+ times any char except /, a whitespace char, or .
\. Match a dot
[^\/\s]+ Match 1+ times any char except / or a whitespace char
$ End of string
See a regex demo.
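The capture-group approach can be sketched in Python as follows (the language and the helper name page_name are assumptions). Group 1 holds the part after the last slash and before the first dot.

```python
import re

PAGE_NAME = re.compile(r"\S*/([^/\s.]+)\.[^/\s]+$")

def page_name(url):
    """Return the last path segment without its extension, or None."""
    m = PAGE_NAME.search(url)
    return m.group(1) if m else None

print(page_name("https://www.example.com/fgf/sdf/sdf/as/dwe/we/bingo.html"))  # bingo
print(page_name("www.example.com/we/b345g.html"))                             # b345g
print(page_name("www.example.com/we/bingo"))                                  # None
```

Because the capture stops at the first dot, a multi-part extension such as archive.tar.gz would yield archive.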

Partial path match regex

I'm trying to develop a regex that partially matches a certain branch of a path. Not before and not deeper than that level. For example:
/path - no match
/path/subpath - no match
/path/subpath/XYZ-123/subpath - no match
/path/subpath/XYZ-123 - match
So far, I have the following
^\/path\/subpath\/.*$
However, this obviously also matches for /path/subpath/XYZ-123/subpath which I would like to exclude as well. Note that the path contains characters, numbers and special characters.
You may use
^\/path\/subpath\/[^\/]*$
See the regex demo.
The regex will match
^ - start of a string
\/path\/subpath\/ - /path/subpath/ string
[^\/]* - 0 or more chars other than /
$ - end of string.
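For illustration, the four sample paths from the question can be checked in Python like this (the language and the function name is_direct_child are assumptions):

```python
import re

# Exactly one level below /path/subpath/, with no further "/".
ONE_LEVEL = re.compile(r"^/path/subpath/[^/]*$")

def is_direct_child(path):
    return ONE_LEVEL.match(path) is not None

for path in (
    "/path",
    "/path/subpath",
    "/path/subpath/XYZ-123/subpath",
    "/path/subpath/XYZ-123",
):
    print(path, is_direct_child(path))
```

Note that [^/]* also accepts an empty last segment, so /path/subpath/ matches; if that is unwanted, [^/]+ would require at least one character.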

regex match URL path only with specific chars?

I am looking for a regex in PHP that matches a simple URL path containing only specific characters and nothing more.
My regex doesn't work exactly (the 'gm' flags are only for testing; the working code should not use 'g', to be more exact):
/^\/[A-Za-z0-9-]+\/?[A-Za-z0-9-]+\/?[A-Za-z0-9-]+\/?[A-Za-z0-9-]+\/?$/gm
URL path Examples with comment:
#match: YES
/
/trip-001
/trip-001/
/trip-001/summer-2019
/trip-001/summer-2019/
/trip-001/summer-2019/ibiza-001/
/trip-001/summer-2019/ibiza-001/PICT-001
#match: NO
//
trip-001
trip-001/
trip-001/summer-2019
trip-001/summer-2019/
trip-001/summer-2019/ibiza-001/
trip-001/summer-2019/ibiza-001/PICT-001
//trip-001
trip-001//
//trip-001/summer-2019
//trip-001//summer-2019
trip-001//summer-2019
//trip-001/summer-2019/
//trip-001//summer-2019//
trip-001//summer-2019/
trip-001/summer-2019//
trip-001/summer-2019/
trip-001/summer-2019/ibiza-001/
//trip-001/summer-2019/ibiza-001/
//trip-001//summer-2019/ibiza-001/
//trip-001/summer-2019//ibiza-001/
//trip-001/summer-2019/ibiza-001//
trip-001/summer-2019/ibiza-001//
trip-001/summer-2019/ibiza-001/
trip-001/summer-2019/ibiza-001/PICT-001
//trip-001/summer-2019/ibiza-001/PICT-001
# and similar
/trip-001/summer-2019/ibiza-001/PICT-001/
/trip-001/summer-2019/ibiza-001/whatever-987/PICT001
/trip-001/summer-2019/ibiza-001/whatever-987/PICT001/
trip-001/summer-2019/ibiza-001/PICT-001/
trip-001/summer-2019/ibiza-001/whatever-987/PICT001
trip-001/summer-2019/ibiza-001/whatever-987/PICT001/
I have no idea whether it works with {n}.
Only this character set is allowed: A-Z a-z 0-9 - / and nothing more. Please do not use \d for digits.
It's for a !preg_match() in PHP.
EDIT: A leading slash is required. Double (or more) slashes are not allowed. A trailing slash is optional.
It appears the URL should only be valid if there are fewer than 5 slashes.
You may adjust your pattern as
^(?!(?:[^\/]*\/){5})(?:(?:\/[A-Za-z0-9-]+){1,4}\/?|\/)$
See regex demo
Details
^ - start of string
(?!(?:[^\/]*\/){5}) - a negative lookahead that fails the match if there are 5 or more / chars in the string
(?: - start of the non-capturing group:
(?:\/[A-Za-z0-9-]+){1,4}\/? - 1 to 4 occurrences of a / and 1+ ASCII alphanumeric or - chars and then an optional / char
| - or
\/ - a single / char in the string
) - end of the non-capturing group
$ - end of string.
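The question targets PHP's preg_match(); the sketch below uses Python instead, purely to check the pattern against a few of the sample paths. The function name is_valid_path is hypothetical.

```python
import re

# Same pattern; "/" needs no escaping in a Python regex.
URL_PATH = re.compile(r"^(?!(?:[^/]*/){5})(?:(?:/[A-Za-z0-9-]+){1,4}/?|/)$")

def is_valid_path(path):
    return URL_PATH.match(path) is not None

samples = [
    "/",                                          # valid
    "/trip-001",                                  # valid
    "/trip-001/summer-2019/ibiza-001/",           # valid: 4 slashes
    "/trip-001/summer-2019/ibiza-001/PICT-001",   # valid: 4 slashes
    "//",                                         # invalid: double slash
    "trip-001",                                   # invalid: no leading slash
    "/trip-001//summer-2019",                     # invalid: double slash
    "/trip-001/summer-2019/ibiza-001/PICT-001/",  # invalid: 5 slashes
]
for s in samples:
    print(s, is_valid_path(s))
```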

How to extract letter and number sequences from a string in Velocity Template?

I have a string like 'LL101-D10'. In Velocity, I want to extract the part before the hyphen, starting from the first digit.
E.g. - "LL101-D10" , "LLL101DL-D10"
output - 101 , 101DL
To extract the string before the hyphen, I did the following:
#set ($index = $String.indexOf('-'))
#set ($val1= $String.substring(0, $index))
But how can I extract the other part in Velocity? Any help would be appreciated.
You may use a replace operation using the following regex:
^[^0-9-]*([0-9][^-]*).*
and replace with the $1 placeholder referring to the contents captured in Group 1.
See the regex demo
Details
^ - start of string
[^0-9-]* - 0+ chars other than digits and -
([0-9][^-]*) - Group 1: a digit and then 0 or more chars other than -
.* - the rest of the string (excluding line breaks; if the string contains line breaks, add (?s) before ^)
Use it like
#set ($val1= $String.replaceFirst("^[^0-9-]*([0-9][^-]*).*", "$1"))
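The same replace-with-Group-1 idea can be tried outside Velocity. Below is a rough Python analogue (Python and the helper name digits_to_hyphen are assumptions; Python writes the group reference as \1 where Java's replaceFirst uses $1).

```python
import re

PATTERN = re.compile(r"^[^0-9-]*([0-9][^-]*).*")

def digits_to_hyphen(text):
    """Keep the part from the first digit up to (not including) the hyphen."""
    # count=1 mirrors replaceFirst: only the first match is replaced.
    return PATTERN.sub(r"\1", text, count=1)

print(digits_to_hyphen("LL101-D10"))     # 101
print(digits_to_hyphen("LLL101DL-D10"))  # 101DL
```

If there is no digit before the hyphen, the pattern does not match and the input comes back unchanged, so a caller may want to handle that case separately.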