How to stop matching if the condition is satisfied? - regex

The goal is to remove path segments (delimited by '/') that consist of a single letter, and, once such a segment appears, to remove everything to its right as well.
For example:
/modadisi/v/list -> /modadisi
/i/m/videos/tnt -> null
New examples:
/abcd/abcd/abcd/a/abcd -> /abcd/abcd/abcd
/abcd -> /abcd
/abcd/abcd/abcd -> /abcd/abcd/abcd
The current regex I use is
\/[a-zA-Z]{2,}
This matches every segment of two or more letters, so /modadisi/v/list becomes /modadisi/list. Is it possible to modify the regex to scan from left to right and stop as soon as the condition is met?

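Based on your new examples, just anchor the pattern to the start of the string using ^, and put the pattern inside a group that repeats. The full pattern would be ^(\/[a-zA-Z]{2,})*. Because the repeated group has to match contiguously from the start, the match stops right before the first segment that has fewer than two letters.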
For the inputs:
/modadisi/v/list
/i/m/videos/tnt
/abcd/abcd/abcd/a/abcd
/abcd
/abcd/abcd/abcd
it produces:
/modadisi
{nothing}
/abcd/abcd/abcd
/abcd
/abcd/abcd/abcd
If any of this isn't right, let me know and I will adjust the pattern.
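A quick way to check the pattern against the examples is a short script. The sketch below uses Python's re module, which is not part of the original question; the names PREFIX and strip_after_short_segment are made up for illustration, and the group is written as non-capturing, which does not change what is matched.

```python
import re

# Anchored, repeated group: the match must be contiguous from the start,
# so it ends right before the first segment with fewer than two letters.
PREFIX = re.compile(r"^(?:/[a-zA-Z]{2,})*")


def strip_after_short_segment(path):
    """Return the leading run of '/segment' parts that have 2+ letters."""
    # The pattern can match the empty string, so match() never returns None.
    return PREFIX.match(path).group(0)


examples = [
    "/modadisi/v/list",
    "/i/m/videos/tnt",
    "/abcd/abcd/abcd/a/abcd",
    "/abcd",
    "/abcd/abcd/abcd",
]
for p in examples:
    print(repr(p), "->", repr(strip_after_short_segment(p)))
```

An empty result corresponds to the "null" case in the question. Note that the pattern does not look at what follows the letters, so a segment such as /ab1 would be cut to /ab; if that matters, a lookahead like (?=/|$) inside the group is one possible way to tighten it.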

Related

Regex to allow special only between alpha numeric

I need a regex that validates a string only if it starts and ends with a letter or a digit, while allowing the special characters shown below in between:
/*
hello -> pass
what -> pass
how ##are you -> pass
how are you? -> pass
hi5kjjv -> pass
8ask -> pass
yyyy. -> fail
! dff -> fail
NoSpeci###alcharacters -> pass
Q54445566.00 -> pass
q!Q1 -> pass
!Q1 -> fail
q!! -> fail
#NO -> fail
0.2Version -> pass
-0.2Version -> fail
*/
The regex below works for the cases above, but it requires at least two valid characters:
^[A-Za-z0-9]+[ A-Za-z0-9_#./#&+$%_=;?\'\,\!-]*[A-Za-z0-9]+
It fails for single-character input:
a -> failed but valid
1 -> failed but valid
I tried replacing [A-Za-z0-9]+ with [A-Za-z0-9]*, but then special characters were accepted at the start (#john), which is wrong.
You may use this regex:
^[A-Za-z0-9](?:[! #$%;&'+,./=?#\w-]*[A-Za-z0-9?])?$
Or, if your regex flavor supports atomic groups, use this slightly more efficient version:
^[A-Za-z0-9](?>[! #$%;&'+,./=?#\w-]*[A-Za-z0-9?])?$
RegEx Details:
^: Start
[A-Za-z0-9]: Match an alphanumeric character
(?>[! #$%;&'+,./=?#\w-]*[A-Za-z0-9?])?: Optional atomic group that matches 0 or more of the characters listed in [...], followed by an alphanumeric character or ?
$: End
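The listed cases can be checked mechanically. The following is a minimal sketch in Python (an assumption, since the question does not name a language); it uses the non-atomic variant because atomic groups are only available in Python's re from version 3.11. The names VALID, is_valid and CASES are hypothetical.

```python
import re

# Non-atomic variant of the answer's regex.
VALID = re.compile(r"^[A-Za-z0-9](?:[! #$%;&'+,./=?#\w-]*[A-Za-z0-9?])?$")


def is_valid(s):
    # fullmatch() avoids Python's '$' also matching before a trailing newline.
    return VALID.fullmatch(s) is not None


# Expected results taken from the question (True = pass, False = fail).
CASES = {
    "hello": True, "what": True, "how ##are you": True,
    "how are you?": True, "hi5kjjv": True, "8ask": True,
    "yyyy.": False, "! dff": False, "NoSpeci###alcharacters": True,
    "Q54445566.00": True, "q!Q1": True, "!Q1": False, "q!!": False,
    "#NO": False, "0.2Version": True, "-0.2Version": False,
    "a": True, "1": True, "#john": False,
}

mismatches = [s for s, expected in CASES.items() if is_valid(s) != expected]
print("mismatches:", mismatches)
```

The optional group is what lets a single character such as a or 1 pass, while the mandatory first character class still rejects input like #john.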

Replace regex match by arbitrary function of match itself

I'm trying to write a function of type Text -> (Text -> Text) -> Text that replaces occurrences of a regular expression in a piece of text by something else that is a function of what the regular expression has matched. There is subRegex from Text.Regex, but this only allows replacing a match with some fixed replacement string, whereas I would like the replacement to be an arbitrary function of the match. Is there a package that already implements something like that?
You can use matchRegexAll
matchRegexAll
  :: Regex   -- ^ The regular expression
  -> String  -- ^ The string to match against
  -> Maybe ( String, String, String, [String] )
             -- ^ Returns: 'Nothing' if the match failed, or:
             --
             -- > Just ( everything before match,
             -- >        portion matched,
             -- >        everything after the match,
             -- >        subexpression matches )
For example:
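import Text.Regex (Regex, matchRegexAll)
-- Replace the first match of rx in input by f applied to the matched text.
subFirst :: Regex -> String -> (String -> String) -> String
subFirst rx input f = case matchRegexAll rx input of
  Nothing -> input                                      -- no match: input unchanged
  Just (pre, match, post, _) -> pre <> f match <> post  -- rewrite only the match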
If you want to do this for all matches rather than just the first, you can call this function recursively on the remainder post (left as an exercise).
For a different approach, it looks like the text-regex-replace package might be of use to you. It works directly on Text rather than String, and it appears to support arbitrary replacement functions (although the usage seems a bit obscure).
If you’re willing to write your pattern matching function as a parser instead of a regular expression, then the function Replace.Megaparsec.streamEdit with the match combinator has the signature you’re looking for.
Here’s a usage example in the README

Regex for returning multiple values between strings separated by new line

I'm using PowerShell to read output from an executable and need to parse the output into an array. I've tried regex101 and I'm getting close, but I'm not able to return everything.
Identity type: group
Group type: Generic
Project scope: PartsUnlimited
Display name: [PartsUnlimited]\Contributors
Description: {description}
5 member(s):
[?] test
[A] [PartsUnlimited]\PartsUnlimited-1
[A] [PartsUnlimited]\PartsUnlimited-2
[?] test2
[A] [PartsUnlimited]\PartsUnlimited 3
Member of 3 group(s):
e [A] [org]\Project Collection Valid Users
[A] [PartsUnlimited]\Endpoint Creators
e [A] [PartsUnlimited]\Project Valid Users
I need an array returned containing:
test
[PartsUnlimited]\PartsUnlimited-1
[PartsUnlimited]\PartsUnlimited-2
test2
[PartsUnlimited]\PartsUnlimited 3
At first I tried:
$pattern = "(?<=\[A|\?\])(.*)"
$matches = ([Regex]$pattern).Matches(($output -join "`n")).Value
But that will return also the "Member of 3 group(s):" section which I don't want.
I also can only get the first value under 5 member(s) with (?<=member\(s\):\n).*?\n ([?] test).
No matches are returned when I add in a positive lookahead: (?<=member\(s\):\n).*?\n(?=Member).
I feel like I'm getting close, just not sure how to handle multiple \n and get strings in between strings if that's needed.
You could do it in two steps (not sure if \G is supported in PowerShell).
The first step would be to separate the block in question with
^\d+\s+member.+[\r\n]
(?:.+[\r\n])+
With the multiline and verbose flags, see a demo on regex101.com.
On this block we then need to perform another expression such as
^\s+\[[^][]+\]\s+(.+)
Again with the multiline flag enabled, see another demo on regex101.com.
The expressions explained:
^\d+\s+member.+[\r\n]   # start of the line (^), digits,
                        # spaces, "member", anything else + newline
(?:.+[\r\n])+           # match any consecutive line that is not empty
The second would be
^\s+                    # start of the line, whitespace
\[[^][]+\]\s+           # [...] (anything allowed within the brackets),
                        # whitespace
(.+)                    # capture the rest of the line into group 1
If \G was supported, you could do it in one go:
(?:
    \G(?!\A)
    |
    ^\d+\s+member.+[\r\n]
)
^\s+\[[^][]*\]\s+
(.+)
[\r\n]
See a demo for the latter on regex101.com as well.
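The two-step idea can be tried outside regex101 as well. Below is a rough sketch in Python rather than PowerShell (chosen here only for illustration). It assumes that member lines are indented and that the two sections are separated by a blank line; the listing in the question shows neither, so treat both as assumptions about the real output. The names OUTPUT, BLOCK, ENTRY and member_names are hypothetical, and the bracket class is written with escapes, which is equivalent to [^][].

```python
import re

# Sample output. ASSUMPTIONS: member lines are indented, and a blank line
# separates the "member(s)" section from the "Member of" section.
OUTPUT = r"""Identity type: group
Display name: [PartsUnlimited]\Contributors
5 member(s):
  [?] test
  [A] [PartsUnlimited]\PartsUnlimited-1
  [A] [PartsUnlimited]\PartsUnlimited-2
  [?] test2
  [A] [PartsUnlimited]\PartsUnlimited 3

Member of 3 group(s):
e [A] [org]\Project Collection Valid Users
  [A] [PartsUnlimited]\Endpoint Creators
e [A] [PartsUnlimited]\Project Valid Users
"""

# Step 1: the "N member(s):" header plus every following non-empty line.
BLOCK = re.compile(r"^\d+\s+member.+[\r\n](?:.+[\r\n])+", re.MULTILINE)
# Step 2: indented "[x]" prefix, then capture the rest of the line.
ENTRY = re.compile(r"^\s+\[[^\]\[]+\]\s+(.+)", re.MULTILINE)


def member_names(text):
    block = BLOCK.search(text)
    if block is None:
        return []
    return ENTRY.findall(block.group(0))


members = member_names(OUTPUT)
for name in members:
    print(name)
```

Without the blank line, step 1 would run on into the "Member of" section, because (?:.+[\r\n])+ accepts any non-empty line; in that case the first expression would need a different stopping condition.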

Regular expression for specific file mask

I want two regex patterns that check file names against a specific file mask. The rules I have in mind are listed below.
Pattern 1:
Check that the left side of _ has 7 digits.
Check that the right side of _ is numeric.
Check that the specified extension is present.
The input will look like this: 1234567_1.jpg
Pattern 2:
Check that there are 10 digits to the left of a space character.
Check that there are 4 digits to the right of the space character.
Check that the right side of _ is numeric.
Check that the specified extension is present.
The input will look like this: 1234567891 1234_1.png
As stated above, this is to be used to check for a specific file mask.
I have been playing around with ideas like ^[0-9][0-9].jpg$
and ^[0-9] [0-9][0-9].jpg$; these are my first tries.
I apologize for not providing my attempts earlier.
I suggest combining patterns with | (or):
// requires: using System.IO; using System.Text.RegularExpressions;
string pattern = string.Join("|",
    @"(^[0-9]{7}_[0-9]+\.jpg$)",             // 1st possibility
    @"(^[0-9]{10} [0-9]{4}_[0-9]+\.png$)");  // 2nd one
....
string fileName = @"c:\myFiles\1234567_1.jpg";
// RegexOptions.IgnoreCase - let's accept ".JPG" or ".Jpg" files
if (Regex.IsMatch(Path.GetFileName(fileName), pattern, RegexOptions.IgnoreCase)) {
    ...
}
Let's explain the second pattern: (^[0-9]{10} [0-9]{4}_[0-9]+\.png$)
^ - anchor (string start)
[0-9]{10} - 10 digits - 0-9
(space) - single space
[0-9]{4} - 4 digits
_ - single underscore
[0-9]+ - one or more digits
\.png - .png (. is escaped)
$ - anchor (string end)
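For a quick check of the combined mask outside C#, here is a small sketch in Python (an assumption made only for illustration; the thread itself uses C#). FILE_MASK and matches_mask are hypothetical names, and PureWindowsPath stands in for Path.GetFileName so that the Windows-style path from the example is handled on any platform.

```python
import re
from pathlib import PureWindowsPath

# The two anchored alternatives, joined with | as in the answer.
FILE_MASK = re.compile(
    r"^[0-9]{7}_[0-9]+\.jpg$"              # 1st possibility
    r"|^[0-9]{10} [0-9]{4}_[0-9]+\.png$",  # 2nd one
    re.IGNORECASE,                         # accept ".JPG", ".Jpg", ...
)


def matches_mask(path):
    """Check only the file name part of the path against the mask."""
    name = PureWindowsPath(path).name
    return FILE_MASK.fullmatch(name) is not None


for candidate in [
    r"c:\myFiles\1234567_1.jpg",
    "1234567891 1234_1.png",
    "1234567_1.JPG",
    "1234567_1xjpg",   # rejected: the dot is escaped, so it must be a literal "."
    "1234567_.jpg",    # rejected: at least one digit is required after "_"
]:
    print(candidate, "->", matches_mask(candidate))
```

As written, each mask is tied to its own extension, so 1234567_1.png is rejected; replacing jpg or png with a group such as (?:jpg|png) would loosen that.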
This should work for the first regex:
^\d{7}_\d+\.(jpg|png)$
This should work for the second regex:
^\d{10}\s\d{4}_\d+\.(jpg|png)$
If you want to use them together, combine them like this:
^(\d{7}_\d+\.(jpg|png)|\d{10}\s\d{4}_\d+\.(jpg|png))$
In the group (jpg|png) you can add further extensions by separating them with | (or).
You can check if it works for you at: https://regex101.com/
Cheers!

regex to match what is not matched

I have a regex to search through just under 2 million product numbers: -([A-Za-z0-9]{1,5})$. It matches the MFG code (the last few characters after the last dash); for example, G4F,XB-RJG4 SJG2G-TRMH would match -TRMH. This was supposed to match every string on my list; however, I am a couple thousand short. This probably means that some were formatted wrong.
What could I do to match a string that doesn't end in -XXXXX, -XXXX, -XXX, or -XX, or in other words, match what is not matched?
Just two steps:
1. In the search dialog, tick "Bookmark line".
2. After the search is done, click "Search -> Bookmark -> Inverse Bookmark".
Alternatively, in step 2, use "Search -> Bookmark -> Remove bookmarked lines"; afterwards, only the lines that didn't match the regular expression remain.
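Outside an editor, the same inversion can be scripted. The following is a minimal sketch in Python, assuming the intended quantifier is {1,5} and that there is one product number per line with no trailing whitespace; MFG_SUFFIX, NO_SUFFIX and unmatched are names made up for illustration. It shows two equivalent options: keep the lines where the original search fails, or use a single pattern with a negative lookahead.

```python
import re

# The original pattern: a dash followed by 1 to 5 alphanumerics at the end.
MFG_SUFFIX = re.compile(r"-[A-Za-z0-9]{1,5}$")
# Inverted form: match a whole line only if it does NOT end in such a suffix.
NO_SUFFIX = re.compile(r"^(?!.*-[A-Za-z0-9]{1,5}$).*$")


def unmatched(lines):
    """Return the lines on which the MFG-code pattern finds nothing."""
    return [line for line in lines if not MFG_SUFFIX.search(line)]


sample = [
    "G4F,XB-RJG4 SJG2G-TRMH",  # ends in -TRMH: matched
    "ABC-TOOLONG",             # 7 characters after the dash: not matched
    "NODASH",                  # no dash at all: not matched
    "X-1",                     # ends in -1: matched
    "PART-",                   # nothing after the dash: not matched
]

print(unmatched(sample))
print([line for line in sample if NO_SUFFIX.match(line)])
```

Both print the same three malformed entries, which is usually easier to review than the two million lines that did match.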