RegEx - Match a String Between Last '\' and Second '_' - regex

I am trying to extract part of a filename out of a file path so that I can use it in the filename of a modified file. I'm having a little trouble trying to get RegEx to give me the part of the filename that I need, though. Here is the file path that I'm working with:
X:\\folder1\\folder2\\folder3\\folder4\\folder5\\Wherever-Place_2555025_Monthly-Report_202209150000.csv
Within this path, the drive name, the number of folders, the number of dashes in "Wherever-Place", and the information after the second underscore in the filename may vary. The important part is that I need to extract the following information:
Wherever-Place_2555025
from the path. Basically, I need to match everything between the last backslash and the second underscore. I can come up with the following RegEx to match everything after the last backslash:
[^\\]+$
And, if I run the output of that first RegEx through this next RegEx, I can get a match that includes the beginning of the string through the last character before the second underscore:
[^_]+_[^_]+
But, that also gives me another match that starts after the second underscore and goes through the end of the filename. This is not desirable - I need a single match, but I can't figure out how to get it to stop after it finds one match. I'd also really like to do all of this in one single RegEx, if that is possible. My RegEx has never been that good, and on top of that what I had is rusty...
Any help would be much appreciated.

If lookarounds are supported, you may use:
(?<=\\)[^\\_]*_[^\\_]*(?=_[^\\]*$)
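As a rough illustration, here is a minimal Python sketch of the lookaround pattern above. The question names no language, so Python's re module, the helper name extract_prefix, and the sample path (written with single backslashes) are all assumptions made for the example.

```python
import re

# Lookbehind for a backslash, then two chunks free of "\" and "_" joined by
# one "_", then a lookahead requiring "_" and no further backslash to the end.
PATTERN = re.compile(r'(?<=\\)[^\\_]*_[^\\_]*(?=_[^\\]*$)')

def extract_prefix(path):
    """Return the text between the last backslash and the second underscore,
    or None when the path does not have that shape."""
    match = PATTERN.search(path)
    return match.group(0) if match else None

# Hypothetical sample path, modelled on the one in the question.
sample = r'X:\folder1\folder2\Wherever-Place_2555025_Monthly-Report_202209150000.csv'
print(extract_prefix(sample))  # Wherever-Place_2555025
```

Because the lookahead forbids any later backslash, folder names that happen to contain underscores are skipped.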

For this requirement:
"Basically, I need to match everything between the last backslash and the second underscore."
You can use a capture group:
.*\\\\([^\s_]+_[^\s_]+)
The pattern matches:
.*\\\\ Match the last occurrence of \\
( Capture group 1
[^\s_]+_[^\s_]+ Match 1+ chars other than a whitespace char or _, then match the first _ and again match 1+ chars other than a whitespace char or _
) Close group 1
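The capture-group approach might look like this in Python. This is only a sketch: the language is an assumption, and the sample path is written with doubled backslashes exactly as displayed in the question, since \\\\ in the pattern matches two literal backslashes.

```python
import re

# ".*" is greedy, so "\\\\" (two literal backslashes) lands on the LAST pair;
# group 1 then takes two chunks free of whitespace and "_", joined by one "_".
PATTERN = re.compile(r'.*\\\\([^\s_]+_[^\s_]+)')

# Sample path with doubled backslashes, as shown in the question.
sample = r'X:\\folder1\\folder2\\Wherever-Place_2555025_Monthly-Report_202209150000.csv'

match = PATTERN.match(sample)
prefix = match.group(1) if match else None
print(prefix)  # Wherever-Place_2555025
```

If the real paths use single backslashes, the \\\\ in the pattern would need to become \\.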
Or, if lookarounds are supported, to get a match only:
(?<=\\)[^\s_\\]+_[^\s_]+(?![^\\]*\\)
The pattern matches:
(?<=\\) Positive lookbehind, assert \ to the left
[^\s_\\]+_[^\s_]+ Match 1+ chars other than a whitespace char, _ or \, then match the first _ and again match 1+ chars other than a whitespace char or _
(?![^\\]*\\) Negative lookahead, assert that no \ occurs anywhere to the right

Related

Create regex that matches at least one non letter character or end of string

I am looking for regex to match following set:
/VIDEO_PRE_MINE
/VIDEO_PRE
/VIDEO_PRE/
/VIDEO_PRE/SOMETHING
And I want to exclude expressions like these:
/VIDEO_PRESOMETHING
/VIDEO_PREsomething/something
In other words, '_PRE' must not be followed by a letter, but it may be followed by the end of the string.
Here are the regexes that I tried:
1. ^\/[^\/]*_PRE[^a-z|A-Z]
2. ^\/[^\/]*_PRE[^a-z|A-Z]?$
However I didn't manage to cover all use cases from sets with those regex.
I would really appreciate any help with this.
Thanks
For your example data, you could add an optional group (?:[_/].*)? that matches either _ or / followed by any char except a newline 0+ times, up to the end of the string $:
^/[^/]*_PRE(?:[_/].*)?$
^ Start of string
/[^/]* Match /, then 0+ times any char except /
_PRE Match literally
(?: Non capturing group
[_/].* Match either _ or / followed by 0+ times any char except a newline
)? Close non capturing group and make it optional
$ End of string
Note that the forward slashes are not escaped. Depending on the language or delimiters you might have to escape them.
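To check the pattern against the sample sets from the question, a small Python sketch could look like this. The language and the helper name accepts are assumptions; in Python the forward slashes need no escaping.

```python
import re

PATTERN = re.compile(r'^/[^/]*_PRE(?:[_/].*)?$')

# The two sets listed in the question.
should_match = ['/VIDEO_PRE_MINE', '/VIDEO_PRE', '/VIDEO_PRE/', '/VIDEO_PRE/SOMETHING']
should_not_match = ['/VIDEO_PRESOMETHING', '/VIDEO_PREsomething/something']

def accepts(text):
    """True when the whole string fits the pattern."""
    return PATTERN.search(text) is not None

for text in should_match + should_not_match:
    print(text, accepts(text))
```

Note that the optional group only allows _ or / after _PRE, so a digit directly after _PRE would be rejected as well.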
My guess is that we might want to have some right boundaries, such as
^\/VIDEO_PRE(?:\b\/?|\/[^\/\s]+\/?|_[^\/\s]+\/?)$
in specified form, and in general form:
^\/[^_]+_PRE(?:\b\/?|\/[^\/\s]+\/?|_[^\/\s]+\/?)$
which might work. The expression is explained in the top right panel of regex101.com, where you can test, modify, or simplify it and watch how it matches against sample inputs.

Match all characters after the last instance of a string in regex

I am looking to capture all characters after the last instance of a string in regex.
The delimiter string (the one whose last instance we're searching after) is, sans quotes, " - ", or \b\s\-\s\b: a whitespace character, a -, and another whitespace character, with a word boundary on each side.
Test string as follows:
One Thing - Two Things - Three Things - Four Things
Desired match:
Four Things
This regex only matches everything after the first instance of the string:
(?<=\b\s\-\s\b)(.*)$
(Returns, sans quotes: "Two Things - Three Things - Four Things")
Whereas this matches everything after the last single character -:
[^\-]+$
(Returns, sans quotes: " Four Things")
Thoughts?
Try a positive lookbehind for the delimiter, followed by a negated character class that excludes -, anchored at the end of the string:
(?<=- )[^-]+$
https://regex101.com/r/sMX9FC/1
I think you could get your match without using lookarounds.
You could match any char except a newline from the start of the string followed by matching your pattern. That will match the last instance.
Then capture in a group matching 0+ times any char except a newline until the end of the string.
^.*\b\s\-\s\b(.*)$
^ Start of string
.* Match any char except a newline 0+ times (greedy)
\b\s\-\s\b Match your pattern
(.*) Capture in group 1 matching 0+ times any char except a newline
$ End of string
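The greedy-prefix idea above can be tried out with a short Python sketch. The language is an assumption, as the question lists none; the test string is the one from the question.

```python
import re

# The leading ".*" is greedy, so the delimiter is matched at its LAST occurrence
# and group 1 holds whatever follows it.
PATTERN = re.compile(r'^.*\b\s\-\s\b(.*)$')

text = 'One Thing - Two Things - Three Things - Four Things'
match = PATTERN.match(text)
tail = match.group(1) if match else None
print(tail)  # Four Things
```

When the delimiter is absent the pattern does not match at all, so the None check matters.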
There is no tool or programming language listed, but if \K is supported to forget what was matched so far, you might also use:
^.*\b\s\-\s\b\K.*$
This captures everything after the last - (skipping any whitespace that follows it) up to the end of the string:
-\s*([^-]+)$
It's the simplest regex I could think of.
.*(?<=\b\s\-\s\b)(.*)$, that is, putting a .* before your current regex, should achieve what you're after, since .* is greedy by default.
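To see why the leading .* changes the result, here is a minimal Python comparison of the original lookbehind regex and the prefixed version. Python is an assumption; its re module accepts this lookbehind because it has a fixed width.

```python
import re

text = 'One Thing - Two Things - Three Things - Four Things'

# Lookbehind alone: the leftmost possible match wins, which starts right after
# the FIRST " - ".
first = re.search(r'(?<=\b\s\-\s\b)(.*)$', text).group(1)

# With a greedy ".*" in front, the engine backtracks from the end of the string,
# so the lookbehind is first satisfied right after the LAST " - ".
last = re.search(r'.*(?<=\b\s\-\s\b)(.*)$', text).group(1)

print(first)  # Two Things - Three Things - Four Things
print(last)   # Four Things
```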

How to create proper regular expression to find last character which I want to?

I need to create regex to find last underscore in string like 012344_2.0224.71_3 or 012354_5.00123.AR_3.335_8
I wanted to find the last part with the expression [^.]+$ and then find the underscore in the found element, but I cannot get it to work.
I hope you can help me :)
Just use a negated character class [^_] that will match everything except an underscore (this helps to ensure no other underscores are found afterwards), followed by the end-of-string anchor $
Pattern would look as such:
(_)[^_]*$
The final underscore _ is in a capturing group, so the submatch (group 1) is what you want returned; that is the part you would replace.
See it live: Regex101
Notice the green highlighted portion on Regex101, this is your submatch and is what would be replaced.
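Replacing only the captured submatch could be sketched in Python as follows. The language, the helper name replace_last_underscore, and the "-" replacement are assumptions chosen for illustration.

```python
import re

PATTERN = re.compile(r'(_)[^_]*$')

def replace_last_underscore(text, replacement):
    """Swap only group 1 (the final underscore), leaving the rest untouched."""
    match = PATTERN.search(text)
    if match is None:
        return text  # no underscore at all
    start, end = match.span(1)  # span of the captured underscore only
    return text[:start] + replacement + text[end:]

print(replace_last_underscore('012344_2.0224.71_3', '-'))         # 012344_2.0224.71-3
print(replace_last_underscore('012354_5.00123.AR_3.335_8', '-'))  # 012354_5.00123.AR_3.335-8
```

The span of group 1 is used because the full match also includes the characters after the underscore.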
The simplest solution I can imagine is using .*\K_, however not all regex flavours support \K.
If not, another idea would be to use _(?=[^_]*$)
You have a demo of the first and second option.
Explanation:
.*\K_: Matches any characters up to an underscore. Since the * quantifier is greedy, it will match up to the last underscore. Then \K discards what was matched so far, and then we match the underscore.
_(?=[^_]*$): Matches an underscore followed only by non-underscore characters up to the end of the line
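Python's built-in re module does not support \K, so a sketch in that language (an assumption, as the question names none) would rely on the lookahead form. The helper name and the "-" replacement are hypothetical.

```python
import re

# An underscore followed only by non-underscore characters up to the end.
LAST_UNDERSCORE = re.compile(r'_(?=[^_]*$)')

def last_underscore_index(text):
    """Index of the last underscore, or -1 when there is none."""
    match = LAST_UNDERSCORE.search(text)
    return match.start() if match else -1

samples = ['012344_2.0224.71_3', '012354_5.00123.AR_3.335_8']
for s in samples:
    # The match is the underscore alone, so sub() replaces just that character.
    print(last_underscore_index(s), LAST_UNDERSCORE.sub('-', s))
```

Since the lookahead consumes nothing, the reported match is exactly one character long.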
If you want nothing but the "net" (i.e., nothing matched except the last underscore), use positive lookahead to check that no more underscores are in the string:
/_(?=[^_]*$)/gm
The pattern [^.]+$ matches any char except a dot 1+ times and then asserts the end of the string. That will give you the matches 71_3 and 335_8.
What you want to match is an underscore when there are no more underscores following.
One way to do that is a negative lookahead (?!.*_), if supported, which asserts that what is to the right is not any number of characters followed by an underscore:
_(?!.*_)
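A possible use of the negative-lookahead form, sketched in Python (the language and the helper name split_at_last_underscore are assumptions), is splitting a string around its last underscore:

```python
import re

# An underscore with no other underscore anywhere later on the same line.
NO_LATER_UNDERSCORE = re.compile(r'_(?!.*_)')

def split_at_last_underscore(text):
    """Return (before, after) around the last underscore; after is '' if none."""
    match = NO_LATER_UNDERSCORE.search(text)
    if match is None:
        return text, ''
    return text[:match.start()], text[match.end():]

print(split_at_last_underscore('012344_2.0224.71_3'))
print(split_at_last_underscore('012354_5.00123.AR_3.335_8'))
```

Because . does not match a newline by default, the check applies per line in multi-line input.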

Regex to match characters to the right of a colon

I'm stuck on a regex. I'm trying to match words in any language to the right of a colon without matching the colon itself.
The basic rule:
For a line to be valid, it must not begin with or contain any characters outside of [a-z0-9_] until after :.
Any characters to the right of : should match as long as the line begins with the set of characters defined above.
For instance, given a string such as these:
this string should not match
bob_1:Hi. I'm Bob. I speak русский and this string should match
alice:Hi Bob. I speak 한국어 and this string should also match
http://example.com - would prefer to not match URLs
This string:should not match because no spaces or capital letters are allowed left of the colon
Only 2 of the 5 strings above need to match. And only to the right of the colon.
Hi. I'm Bob. I speak русский and this string should match
Hi Bob. I speak 한국어 and this string should also match
I'm currently using (^[a-z0-9_]+(?=:)) to match characters to the left of :. I just can't seem to reverse the logic.
The closest I have at the moment is (?!(?!:)).+. This seems to match everything to right of the colon as well as the colon itself. I just can't figure out how to not include : in the match.
Can one of you regex wizards help me out? If anything is unclear please let me know.
Short regex pattern (case insensitive):
^\w+:(\w.*)
\w - matches any word character (equal to [a-zA-Z0-9_])
https://regex101.com/r/MZhqSL/6
As you marked pcre, here's the pattern you need (only to the right of the colon):
^\w+:\K\w.*
\K - resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match
https://regex101.com/r/E1yHVY/1
You can use this regex:
^[a-z0-9_]+:\K(?!//).*
RegEx Breakup:
^: Start
[a-z0-9_]+: Match 1+ of [a-z0-9_] characters
:: Match a colon
\K: Reset matched info so far
(?!//): Negative lookahead to disallow // right after colon to avoid matching potential URLs
.*: Match anything until end
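The question is tagged pcre, but the same breakup can be checked in Python as a rough sketch. Python's built-in re module has no \K, so a capture group stands in for it here; that substitution and the use of re.MULTILINE over one block of text are assumptions of this example.

```python
import re

# Same pieces as the breakup above, with (.*) replacing "\K ... .*".
PATTERN = re.compile(r'^[a-z0-9_]+:(?!//)(.*)', re.MULTILINE)

# The five sample lines from the question.
lines = """this string should not match
bob_1:Hi. I'm Bob. I speak русский and this string should match
alice:Hi Bob. I speak 한국어 and this string should also match
http://example.com - would prefer to not match URLs
This string:should not match because no spaces or capital letters are allowed left of the colon"""

# With a single group, findall() returns only the captured text.
messages = PATTERN.findall(lines)
for message in messages:
    print(message)
```

Only the bob_1 and alice lines yield a result; the URL is rejected by the (?!//) lookahead.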
You can use the regex: ^.*?:(.*)$
^.*?: - from the beginning of the line, any characters (non-greedy) up to and including the first colon
(.*)$ - a capturing group for anything that follows, up to the end of the line

Regular expression to extract file name from perforce path

A perforce depot path is of the following format:
//depot/solution/project/file.cs#232
How can I extract just the "file.cs"? I have tried the following.
[^//]*$
I'm not sure how to eliminate the "#232" part. Could anyone help?
This will find file names even if they don't have a # after them.
(\w+\.\w+)[^/]*$
Explanation:
(\w+\.\w+)
This matches the file name itself; \w is a word character (same as [a-zA-Z0-9_]). So it's 1+ word characters, a full stop (. on its own matches any character, so you need \. to match an actual .), then 1+ more word characters.
[^/]*
Matches 0+ characters that are not /. All the word characters will already have been consumed by the \w+ before it (because it is evaluated first and + matches as much as it can), so in your example this matches the #232.
$
Matches the end of the line, which is needed so that a.directory doesn't get matched in /a.directory/file.txt
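As a quick check of the explanation, here is a minimal Python sketch of this pattern (the language and the helper name file_name are assumptions; the question does not name a tool).

```python
import re

PATTERN = re.compile(r'(\w+\.\w+)[^/]*$')

def file_name(depot_path):
    """Return group 1, the name.extension part of the last path segment."""
    match = PATTERN.search(depot_path)
    return match.group(1) if match else None

print(file_name('//depot/solution/project/file.cs#232'))  # file.cs
print(file_name('//depot/solution/project/file.cs'))      # file.cs
print(file_name('/a.directory/file.txt'))                 # file.txt
```

One limitation worth knowing: \w does not include -, so a name such as my-file.cs would come back as file.cs.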
You can use this regex:
/\/([^\/#]*)#/
And use matched group #1 for your value file.cs
Assuming you're using PCRE, you can use the pattern:
'[^/]*(?=#)'
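The two shorter patterns can be compared side by side in a Python sketch. Python is an assumption here; the delimiters (/.../ and the quotes) are dropped because Python takes the pattern as a plain string, and the helper names are hypothetical.

```python
import re

def name_by_group(depot_path):
    """Capture-group variant: text between the last '/' and the '#'."""
    match = re.search(r'/([^/#]*)#', depot_path)
    return match.group(1) if match else None

def name_by_lookahead(depot_path):
    """Lookahead variant: slash-free text that is directly followed by '#'."""
    match = re.search(r'[^/]*(?=#)', depot_path)
    return match.group(0) if match else None

path = '//depot/solution/project/file.cs#232'
print(name_by_group(path))      # file.cs
print(name_by_lookahead(path))  # file.cs
```

Both variants depend on the # being present; for a path without a revision suffix they find nothing.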