Regular expression to extract file name from Perforce path - regex

A Perforce depot path has the following format:
//depot/solution/project/file.cs#232
How can I extract just "file.cs"? I have tried the following:
[^//]*$
But I'm not sure how to eliminate the "#232" part. Could anyone help?

This will find file names even if they don't have a # after them.
(\w+\.\w+)[^/]*$
Explanation:
(\w+\.\w+)
This matches the file name itself. \w is a word character (roughly [a-zA-Z0-9_]; some flavors also count Unicode letters and digits). So it matches 1+ word characters, a full stop (. on its own matches any character, so you need \. to match an actual .), then 1+ more word characters.
[^/]*
Matches 0+ characters that are not /. All the word characters have already been consumed by the \w+ before it (because that part is evaluated first and + matches as much as it can), so in your example this part matches only the #232.
$
Matches the end of the line. This is needed so that a.directory isn't matched in /a.directory/file.txt.
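A quick way to try this pattern is Python's re module. The question names no language, so Python is an assumption here. The sketch runs the pattern against the path from the question and against the /a.directory/file.txt case from the explanation. The helper name file_name is hypothetical.

```python
import re

# Pattern from the answer: group 1 captures the file name, and [^/]*$
# swallows an optional "#revision" suffix up to the end of the string.
FILE_NAME = re.compile(r"(\w+\.\w+)[^/]*$")

def file_name(path):
    """Hypothetical helper: return the file name part of a path, or None."""
    m = FILE_NAME.search(path)
    return m.group(1) if m else None

print(file_name("//depot/solution/project/file.cs#232"))  # file.cs
print(file_name("/a.directory/file.txt"))                 # file.txt
```

Because the revision part is matched by [^/]*, the same call also works for a path with no # at all.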

You can use this regex:
/\/([^\/#]*)#/
Then use capture group 1 for your value, file.cs.
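Here is a sketch of the same idea in Python's re. The choice of language is an assumption. The /.../ delimiters in the answer are JavaScript-style and are dropped here, and / needs no escaping in Python.

```python
import re

path = "//depot/solution/project/file.cs#232"

# /  then any run of characters that are neither / nor #, captured, then #
m = re.search(r"/([^/#]*)#", path)
name = m.group(1) if m else None
print(name)  # file.cs
```

This pattern requires the #. A path without a revision suffix produces no match.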

Assuming you're using PCRE, you can use the pattern:
'[^/]*(?=#)'
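Lookaheads are not specific to PCRE. Python's re supports them too, so the pattern can be checked there. Python is only an assumption for illustration. The match itself is the file name, because (?=#) tests for the # without consuming it.

```python
import re

# Match a run of non-/ characters that is immediately followed by "#".
pattern = re.compile(r"[^/]*(?=#)")

m = pattern.search("//depot/solution/project/file.cs#232")
print(m.group() if m else None)  # file.cs
```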

Related

RegEx - Match a String Between Last '\' and Second '_'

I am trying to extract part of a filename out of a file path so that I can use it in the filename of a modified file. I'm having a little trouble trying to get RegEx to give me the part of the filename that I need, though. Here is the file path that I'm working with:
X:\\folder1\\folder2\\folder3\\folder4\\folder5\\Wherever-Place_2555025_Monthly-Report_202209150000.csv
Within this path, the drive name, the number of folders, the number of dashes in "Wherever-Place", and the information after the second underscore in the filename may vary. The important part is that I need to extract the following information:
Wherever-Place_2555025
from the path. Basically, I need to match everything between the last backslash and the second underscore. I can come up with the following RegEx to match everything after the last backslash:
[^\\]+$
And, if I run the output of that first RegEx through this next RegEx, I can get a match that includes the beginning of the string through the last character before the second underscore:
[^_]+_[^_]+
But that also gives me a second match that starts after the second underscore and goes through the end of the filename. That is not what I want: I need a single match, but I can't figure out how to make it stop after the first one. I'd also really like to do all of this in a single RegEx, if possible. My RegEx has never been that good, and on top of that, what I had is rusty...
Any help would be much appreciated.
If lookarounds are supported, you may use:
(?<=\\)[^\\_]*_[^\\_]*(?=_[^\\]*$)
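Here is a sketch of the lookaround pattern in Python's re, which supports fixed-width lookbehind. Python is an assumption, because the question names no language. The path is written with single backslashes, as it would appear on disk. The doubled backslashes in the question may just be string-literal escaping, and the pattern also copes with them.

```python
import re

path = r"X:\folder1\folder2\folder3\folder4\folder5\Wherever-Place_2555025_Monthly-Report_202209150000.csv"

# (?<=\\)          preceded by a backslash
# [^\\_]*_[^\\_]*  two chunks free of \ and _, joined by the first _
# (?=_[^\\]*$)     followed by _ and no further backslash up to the end
pattern = re.compile(r"(?<=\\)[^\\_]*_[^\\_]*(?=_[^\\]*$)")

m = pattern.search(path)
print(m.group() if m else None)  # Wherever-Place_2555025
```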
For this match:
Basically, I need to match everything between the last backslash and
the second underscore.
You can use a capture group:
.*\\\\([^\s_]+_[^\s_]+)
The pattern matches:
.*\\\\ Match the last occurrence of \\
( Capture group 1
[^\s_]+_[^\s_]+ Match 1+ chars other than whitespace and _, then match the first _ and again match 1+ chars other than whitespace and _
) Close group 1
Or if supported with lookarounds and a match only:
(?<=\\)[^\s_\\]+_[^\s_]+(?![^\\]*\\)
The pattern matches:
(?<=\\) Positive lookbehind, assert \ to the left
[^\s_\\]+_[^\s_]+ Match 1+ chars other than whitespace, _ and \, then match the first _ and again match 1+ chars other than whitespace and _
(?![^\\]*\\) Negative lookahead, assert not \ to the right
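Both patterns from this answer can be checked in Python's re. Python is an assumption for illustration. In the capture-group pattern, \\\\ matches two backslash characters. The sketch therefore assumes the path text literally contains doubled backslashes, as written in the question.

```python
import re

# Raw string: the text itself contains doubled backslashes.
doubled = r"X:\\folder1\\folder2\\folder3\\folder4\\folder5\\Wherever-Place_2555025_Monthly-Report_202209150000.csv"

# Greedy .* runs to the end, then backtracks to the last \\ pair.
capture = re.compile(r".*\\\\([^\s_]+_[^\s_]+)")
m1 = capture.search(doubled)
print(m1.group(1))  # Wherever-Place_2555025

# Match-only version: lookbehind for \, and a negative lookahead so that
# no further backslash follows the match.
match_only = re.compile(r"(?<=\\)[^\s_\\]+_[^\s_]+(?![^\\]*\\)")
m2 = match_only.search(doubled)
print(m2.group())  # Wherever-Place_2555025
```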

Regular Expression to match file extensions depending on the drive letter

I have this regex, which detects files with specific extensions:
([a-zA-Z0-9\s_\\.\-\(\):])+(.cmd|.exe|.bat)$
but I would like to change it so that it never applies to c:\. The goal is to detect these file types only on secondary or external drives.
Example
D:\test.bat match
c:\test.bat does not match
Thank you
In the pattern that you tried, you have to escape the dot to match it literally, and you don't have to escape the dot or the parentheses inside the character class.
Note that \s could also match a newline.
For the listed examples, you can use a negative lookahead, if supported, to rule out c:\ or C:\
Without the capture groups, to get a match only:
^(?![cC]:\\)[a-zA-Z0-9\s_\\.():-]+\.(?:cmd|exe|bat)$
^ Start of string
(?![cC]:\\) Negative lookahead to assert what is directly to the right is not c:\ or C:\
[a-zA-Z0-9\s_\\.():-]+ Match 1+ times any of the listed in the character class
\.(?:cmd|exe|bat) Match a dot, and 1 of the alternatives
$ End of string
Or with the capture groups:
^(?![cC]:\\)([a-zA-Z0-9\s_\\.():-]+)(\.(?:cmd|exe|bat))$
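A sketch of both variants in Python's re (assumed language). Each input is a single path, so ^ and $ anchor the whole string.

```python
import re

# Match-only version from the answer.
no_c_drive = re.compile(r"^(?![cC]:\\)[a-zA-Z0-9\s_\\.():-]+\.(?:cmd|exe|bat)$")
# Capture-group version: group 1 is the path without extension, group 2 the extension.
no_c_drive_groups = re.compile(r"^(?![cC]:\\)([a-zA-Z0-9\s_\\.():-]+)(\.(?:cmd|exe|bat))$")

for path in [r"D:\test.bat", r"c:\test.bat", r"C:\test.bat"]:
    # Only the D: path matches; the lookahead rejects c:\ and C:\.
    print(path, "match" if no_c_drive.search(path) else "no match")

m = no_c_drive_groups.search(r"D:\test.bat")
print(m.group(1), m.group(2))  # D:\test .bat
```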
Assuming every path is on a separate line, based on the $ you included in your pattern, here's a very simple solution you can build upon:
^[^cC].*(cmd|exe|bat)$
Explanation:
^ matches the beginning of a line.
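[^cC] matches any single character except c or C.
.* matches any character except line terminators, zero or more times.
(cmd|exe|bat) matches your extensions. The dot before the extension is not required: .* simply consumes it when it is there, so a name without a dot, such as D:\backupexe, would also match. Use \.(cmd|exe|bat) if the dot must be present.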
$ matches end of line.
TL;DR: you forgot to match the beginning of your lines.
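Here is a sketch of the simple pattern applied to several paths at once, one per line, as the answer assumes. Python's re is an assumption here. re.M makes ^ and $ work per line. finditer with group(0) returns the whole matching lines, whereas findall would return only the captured extension.

```python
import re

text = "D:\\test.bat\nc:\\test.bat\nE:\\tools\\run.cmd"

simple = re.compile(r"^[^cC].*(cmd|exe|bat)$", re.M)

print([m.group(0) for m in simple.finditer(text)])
# ['D:\\test.bat', 'E:\\tools\\run.cmd']
```

As noted above, the pattern does not require a dot, so a line such as D:\backupexe would be reported as well.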

Match a part of a string using regex

I have a string and would like to match a part of it.
The string is:
Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152To: <sip:4168755400#1.1.1.238>
I want to match PAI: <sip:4168755400#
The whitespace can also be a word, so I would like to use .*, but if I do, it matches most of the string.
This is what I'm matching if I use the whitespace instead of .*:
(PAI: <sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
And this is what I'm trying to achieve with .*, but it should only match PAI: <sip:4168755400#
(PAI:.*<sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
I tried lookarounds but they failed.
Any ideas?
Thanks
Instead of matching a single space, you can use a character class that matches either a space or a word character, and repeat it 1 or more times so that at least one occurrence is matched.
Note that you don't have to escape the spaces, and in both places you can use an optional character class matching either a space or a hyphen: [ -]?
If you only want the match, you can omit the 2 capturing groups.
(PAI:[ \w]+<sip:)((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#
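Here is a sketch that runs this pattern against the string from the question, using Python's re (assumed language). [ \w]+ can span spaces and words between "PAI:" and "<sip:", but it cannot cross a "<". The match therefore stays inside the PAI header.

```python
import re

s = ("Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>"
     "From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152"
     "To: <sip:4168755400#1.1.1.238>")

pai = re.compile(r"(PAI:[ \w]+<sip:)((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#")

m = pai.search(s)
print(m.group(0))  # PAI: <sip:4168755400#
print(m.group(2))  # 4168755400
```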
The regex could look like this:
PAI:.*?(<sip:.*?#)
Explanation:
PAI:.*? matches PAI:, then anything (.*); the ? makes the quantifier lazy, so it matches as few characters as possible before the next part of the expression can match.
(<sip:.*?#) capturing group that we want the result.
<sip:.*?# matches <sip:, then as few characters as possible (.*?) up to the first #.
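Here is a sketch of the lazy pattern on the same string, in Python's re (assumed language). Both .*? quantifiers stop at the earliest point where the rest of the pattern can match. The match therefore ends at the first # after "PAI:".

```python
import re

s = ("Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>"
     "From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152"
     "To: <sip:4168755400#1.1.1.238>")

m = re.search(r"PAI:.*?(<sip:.*?#)", s)
print(m.group(0))  # PAI: <sip:4168755400#
print(m.group(1))  # <sip:4168755400#
```

Unlike the previous answer, this pattern does not check the digits. Anything between <sip: and the first # is accepted.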

Regex - Finding fullstops (periods) that aren't followed by a space

I'm trying to create a simple Grammar correction tool.
I want to create a regular expression that finds fullstops (".") that are not followed by a space, so I can replace them with a fullstop and a space.
For example: This is a sentence.This is another sentence.
Only the first fullstop in the above example should be matched in the expression.
I've tried /\.[^\s]/g but it returns an additional character after the matched fullstop. I would like to match only the fullstop.
How can I do this?
The negated character class [^\s] in the pattern expects a match (any character except a whitespace character), that is why you have the additional character.
If you want to match the dot only, you could use a negative lookahead to assert that what is on the right is not a whitespace char or the end of the string:
\.(?!\s|$)
To match a dot that is not followed by a whitespace char other than a newline (so a dot at the end of a line is still matched):
\.(?![^\S\r\n])
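Here is a sketch of both lookaheads in Python's re (assumed language), including the replacement the question asks for. The lookahead only tests the next character, so re.sub replaces just the dot.

```python
import re

text = "This is a sentence.This is another sentence."

# Dot not followed by whitespace or the end of the string.
missing_space = re.compile(r"\.(?!\s|$)")
print([m.start() for m in missing_space.finditer(text)])  # [18]
print(missing_space.sub(". ", text))  # This is a sentence. This is another sentence.

# Variant: whitespace other than \r and \n does not count, so a dot
# before a line break or at the very end is matched as well.
horizontal = re.compile(r"\.(?![^\S\r\n])")
print(len(horizontal.findall("First line.\nSecond line. Third.")))  # 2
```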
You can look for all dots using:
(\.)
This will match all dots in the examples below:
This is a sentence.This is another sentence.
i am looking. for dots. . ...
You can add |$ to check for the end of the line and, with a little tweak, you get a regex that matches all dots that are neither followed by a space nor at the end of a line:
(\.(?!\ |$))
Note that there is a literal space here. To exclude any whitespace character instead, you can use the POSIX class [[:space:]] where your flavor supports it (PCRE does, for example):
(\.(?![[:space:]]|$))
If it doesn't, check the regex reference for the language you use.
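Python's re is one flavor without POSIX classes. The sketch below uses it as an assumed example and substitutes \s there as the closest equivalent. It runs both versions on the answer's second sample line.

```python
import re

sample = "i am looking. for dots. . ..."

# Literal-space version from the answer.
space_only = re.compile(r"(\.(?!\ |$))")
print([m.start() for m in space_only.finditer(sample)])  # [26, 27]

# Python's re has no [[:space:]]; \s excludes any whitespace character.
any_ws = re.compile(r"(\.(?!\s|$))")
print(len(any_ws.findall(sample)))  # 2
```

Only the first two dots of the final "..." are reported. The dots followed by a space and the dot at the end of the line are skipped.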

Regex search for characters like "/", "<" and ">"

What should the regex pattern be if my text contains characters like "\ / > <" and I want to find them? I ask because regex treats "/" as part of the search pattern and not as an individual character.
For example, I want to find Super Kings from the string <span>Super Kings</span>, using VB 2010.
Thanks!
Just try this:
\bYour_Keyword_to_find\b
\b is used in RegEx to match a word boundary.
[EDIT]
You might be looking for this:
(?<=<span>)([^<>]+?)(?=</span>)
Explanation:
<!--
(?<=<span>)([^<>]+?)(?=</span>)
Options: case insensitive; ^ and $ match at line breaks
Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=<span>)»
Match the characters “<span>” literally «<span>»
Match the regular expression below and capture its match into backreference number 1 «([^<>]+?)»
Match a single character NOT present in the list “<>” «[^<>]+?»
Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=</span>)»
Match the characters “</span>” literally «</span>»
-->
[/EDIT]
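The question targets VB 2010. As an assumption, the sketch below uses Python's re instead, which also supports this fixed-width lookbehind. re.I mirrors the "case insensitive" option listed in the explanation.

```python
import re

html = "<span>Super Kings</span>"

# Lookbehind and lookahead keep the tags out of the match;
# [^<>]+? cannot run into another tag.
inner = re.compile(r"(?<=<span>)([^<>]+?)(?=</span>)", re.I)

m = inner.search(html)
print(m.group(1))  # Super Kings
```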
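If / is used as the pattern delimiter (as in JavaScript or Perl regex literals), you must escape it with \. Elsewhere, escaping it is unnecessary but generally harmless.
For instance, try any of these:
<span>(.*)<\/span>
<span>([^<]*)<\/span>
<span>(.*?)<\/span>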
Read more from:
http://www.regular-expressions.info/characters.html
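The three patterns behave the same on a single span. They differ when a line contains more than one. The sketch below uses Python's re as an assumed example; it accepts \/ even though it does not require it. The greedy (.*) runs to the last </span>, while the negated class and the lazy quantifier stop at the first one.

```python
import re

html = "<span>Super Kings</span> and <span>Other</span>"

greedy = re.compile(r"<span>(.*)<\/span>")
negated = re.compile(r"<span>([^<]*)<\/span>")
lazy = re.compile(r"<span>(.*?)<\/span>")

print(greedy.findall(html))   # ['Super Kings</span> and <span>Other']
print(negated.findall(html))  # ['Super Kings', 'Other']
print(lazy.findall(html))     # ['Super Kings', 'Other']
```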