characters between two delimiters - regex

I'm trying to put together a regex that returns the string between _ and _$ (where $ is the end of the string).
input:
abc_def_ghi_
desired regex outcome:
def_ghi
I've tried quite a few combinations, such as this one:
((([^_]*){1})[^_]*)_$
Any help appreciated.
Note: the regex above returns abc_def, and not the desired def_ghi.

So it's everything between the first _ and the final _ (both excluded)?
Then try
(?<=_).*(?=_$)
(hoping you're not using JavaScript)
Explanation:
(?<=_) # Assert that the previous character is a _
.* # Match any number of characters...
(?=_$) # ... until right before the final, string-ending _
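As a quick check of the lookaround approach, here is a minimal Python sketch (Python's `re` module supports fixed-width lookbehind). The variable names are only for illustration.

```python
import re

# Lookarounds assert context without consuming it, so the match itself
# contains neither the leading nor the trailing underscore.
pattern = re.compile(r"(?<=_).*(?=_$)")

match = pattern.search("abc_def_ghi_")
result = match.group(0) if match else None
print(result)
```

The greedy `.*` first runs to the end of the string and then backtracks one character so that `(?=_$)` can succeed, which is why the inner underscore stays in the result.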

You could try to use the greediness of operators to your advantage:
^.*?_(.*)_$
This matches everything from the start (non-greedily) up to the first underscore, then captures everything from there up to the final underscore, which must be followed by the end of the string. The result is in the first capture group.
^ Beginning of string
.*? Any number of characters (zero or more), non-greedy
_ Anchor-tag, literal underscore
(.*) Any number of characters, greedy
_ Anchor-tag, literal underscore
$ End of string
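The capture-group variant can be tried the same way. This is a small Python sketch, assuming the value of interest is read from group 1:

```python
import re

# Lazy prefix up to the first underscore, then a greedy capture that is
# forced to give back the final underscore before the end of the string.
pattern = re.compile(r"^.*?_(.*)_$")

m = pattern.match("abc_def_ghi_")
captured = m.group(1) if m else None
print(captured)
```

Unlike the lookaround version, this one needs no lookbehind support, so it should also work in engines that lack it.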

I was searching for this within a larger log entry:
"threat_name":"PUP.Optional.Wajam"
The format enclosed the field name in double quotes then a colon then the value in double quotes.
Here's what I ended up with to avoid punctuation breaking the regex:
threat_name["][:]["](?P<signature>.*?)["]
(from regex101.com)
threat_name matches the characters threat_name literally (case sensitive)
["] match a single character present in the list below
" a single character in the list " literally (case sensitive)
[:] match a single character present in the list below
: the literal character :
["] match a single character present in the list below
" a single character in the list " literally (case sensitive)
(?P<signature>.*?) Named capturing group signature
.*? matches any character (except newline)
Quantifier: *? Between zero and unlimited times, as few times as possible,
expanding as needed [lazy]
["] match a single character present in the list below
" a single character in the list " literally (case sensitive)

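The `(?P<signature>...)` named-group syntax is what Python's `re` module uses, so the pattern above can be sketched there directly. The surrounding log text in `log_entry` is a made-up example; only the `threat_name` field comes from the post.

```python
import re

pattern = re.compile(r'threat_name["][:]["](?P<signature>.*?)["]')

# Hypothetical log line; everything except the threat_name field is invented.
log_entry = 'scan finished "threat_name":"PUP.Optional.Wajam" status ok'

m = pattern.search(log_entry)
signature = m.group("signature") if m else None
print(signature)
```

The lazy `.*?` stops at the first closing quote, so dots and other punctuation inside the value do not end the match early.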
Related

Regex to identify a specific pattern

I am writing a regex to find a specific pattern in my string. I have to check whether the string satisfies all of the following criteria:
The name should start with either "P" or "Q" or "R"
Following the first character the string should match either "XYZ" or "ABCD"
If the XYZ is present then the 8th character should either be "H" or "D", if "ABCD" is present the 9th character should be either "H" or "D".
String could be:
PXYZ****H***** -> Should be true
QABCD****H***** -> Should be true
AXYG****Z***** -> Should be false
RABCD****H=D***** -> Should be true
I have tried ([P|Q|R])\w+ to check how the string starts, but I'm not sure how to combine the other criteria.
Use
^[PQR](XYZ|ABCD)....[HD].*
See regex proof.
EXPLANATION
^ asserts position at start of a line
Match a single character present in the list below [PQR]
PQR matches a single character in the list PQR (case sensitive)
1st Capturing Group (XYZ|ABCD)
1st Alternative XYZ
XYZ matches the characters XYZ literally (case sensitive)
2nd Alternative ABCD
ABCD matches the characters ABCD literally (case sensitive)
. matches any character (except for line terminators)
. matches any character (except for line terminators)
. matches any character (except for line terminators)
. matches any character (except for line terminators)
Match a single character present in the list below [HD]
HD matches a single character in the list HD (case sensitive)
. matches any character (except for line terminators)
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
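To see how this pattern behaves on the four sample strings from the question, here is a short Python sketch. It only checks whether a match exists; the dictionary name `results` is illustrative.

```python
import re

pattern = re.compile(r"^[PQR](XYZ|ABCD)....[HD].*")

samples = [
    "PXYZ****H*****",
    "QABCD****H*****",
    "AXYG****Z*****",
    "RABCD****H=D*****",
]

# Map each sample to True/False depending on whether the pattern matches.
results = {s: bool(pattern.match(s)) for s in samples}
for s, ok in results.items():
    print(f"{s!r:24} -> {ok}")
```

The four dots mirror the four placeholder characters shown in the samples between the prefix and the H/D position.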
What is specific about this regex is that:
starts with PQR
continues with XYZ or ABCD
has an H or D five chars before the end
Here's my attempt:
'^[PQR](XYZ|ABCD).*[HD].{5}$'
Does it work for you?

Regex - All before an underscore, and all between second underscore and the last period?

How do I get everything before the first underscore, and everything between the last underscore and the period in the file extension?
So far, I have everything before the first underscore, not sure what to do after that.
.+?(?=_)
EXAMPLES:
111111_SMITH, JIM_END TLD 6-01-20 THR LEWISHS.pdf
222222_JONES, MIKE_G URS TO 7.25 2-28-19 SA COOPSHS.pdf
DESIRED RESULTS:
111111_END TLD 6-01-20 THR LEWISHS
222222_G URS TO 7.25 2-28-19 SA COOPSHS
You can match the following regular expression that contains no capture groups.
^[^_]*|(?!.*_).*(?=\.)
Demo
This expression can be broken down as follows.
^ # match the beginning of the string
[^_]* # match zero or more characters other than an underscore
| # or
(?! # begin negative lookahead
.*_ # match zero or more characters followed by an underscore
) # end negative lookahead
.* # match zero or more characters greedily
(?= # begin positive lookahead
\. # match a period
) # end positive lookahead
.*_ means to match zero or more characters greedily, followed by an underscore. To match greedily (the default) means to match as many characters as possible. Here that includes all underscores (if there are any) before the last one. Similarly, .* followed by (?=\.) means to match zero or more characters, possibly including periods, up to the last period.
Had I written .*?_ (incorrectly) it would match zero or more characters lazily, followed by an underscore. That means it would match as few characters as possible before matching an underscore; that is, it would match zero or more characters up to, but not including, the first underscore.
If instead of capturing the two parts of the string of interest you wanted to remove the two parts of the string you don't want (as suggested by the desired results of your example), you could substitute matches of the following regular expression with empty strings.
_.*_|\.[^.]*$
Demo
This regular expression reads, "Match an underscore followed by zero or more characters followed by an underscore, or match a period followed by zero or more characters that are not periods, followed by the end of the string".
You could use 2 capture groups:
^([^_\n]+_).*\b([^\s_]*_.*)(?=\.)
^ Start of string
([^_\n]+_) Capture group 1, match any char except _ or a newline followed by matching a _
.*\b Match the rest of the line and match a word boundary
([^\s_]*_.*) Capture group 2, optionally match any char except _ or a whitespace char, then match _ and the rest of the line
(?=\.) Positive lookahead, assert a . to the right
See a regex demo.
Another option could be using a non greedy version to get to the first _ and make sure that there are no following underscores and then match the last dot:
^([^_\n]+_).*?(\S*_[^_\n]+)\.[^.\n]+$
See another regex demo.
Looks like you're very close. You could eliminate the names between the underscores by finding this
(_.+?_)
and replacing the returned value with a single underscore.
I am assuming that you did not intend your second result to include the name MIKE.
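A minimal Python sketch of this replace-based idea follows. The first step is the substitution suggested above; the second step, stripping the file extension with `\.[^.]*$`, is borrowed from one of the other answers and is an assumption about what the final result should look like. The helper name `shorten` is hypothetical.

```python
import re

def shorten(filename: str) -> str:
    # Step 1: collapse "_<name>_" into a single underscore (lazy match).
    without_name = re.sub(r"_.+?_", "_", filename)
    # Step 2: drop the final ".ext" (period plus non-period chars at the end).
    return re.sub(r"\.[^.]*$", "", without_name)

examples = [
    "111111_SMITH, JIM_END TLD 6-01-20 THR LEWISHS.pdf",
    "222222_JONES, MIKE_G URS TO 7.25 2-28-19 SA COOPSHS.pdf",
]
shortened = [shorten(name) for name in examples]
for name in shortened:
    print(name)
```

Because `[^.]*` cannot cross a period, the `.25` inside the second file name is left alone and only `.pdf` is removed.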

How does conditional formatting work with regex? I.e., if there is a backslash present, ensure it is followed by one of the following { bnrt'' }

I am new to regex so I do not have all the terminology down, but I am trying to write a regex for string literals. All characters are allowed including some escape characters. Additionally, the string should begin and end with a quotation mark.
I have tried
(^\")(?=.*\\)(b|n|r|t).*($")
I think I am using the ? incorrectly. My thought process is: if there is a backslash, it should be followed by one of those characters, then any other remaining characters in the string literal.
Is there a way to create conditional formatting where, if one character is present, it must be followed by one character from a list? I.e., if the backslash is present, it must be followed by one of these: bnrt'"\
You might exclude the backslash and " from matching, and only match the backslash when it is preceding any of [bnrt] using a character class.
^"[^"\\]*(?:\\[bnrt][^"\\]*)*"$
The pattern matches
^ Start of string
" Match " at the start
[^"\\]* Match 0+ times any char except " and \
(?: Non capture group
\\[bnrt] Match \ followed by one of b, n, r, t
[^"\\]* Match 0+ times any char except " and \
)* Close non capture group
" Match " at the end
$ End of string
Regex demo
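The pattern can be exercised in Python as a rough check. The sample strings below are invented for illustration; raw strings are used so that each `\` in a sample is a literal backslash character.

```python
import re

pattern = re.compile(r'^"[^"\\]*(?:\\[bnrt][^"\\]*)*"$')

# Hypothetical candidates: (text, whether we expect it to be accepted).
candidates = [
    (r'"hello\nworld"', True),   # backslash followed by n is allowed
    (r'"plain text"', True),     # no backslash at all
    (r'"bad\qescape"', False),   # backslash followed by q is rejected
    (r'"unterminated', False),   # missing closing quote
]

outcomes = [(text, bool(pattern.match(text))) for text, _ in candidates]
for text, ok in outcomes:
    print(text, "->", ok)
```

The "condition" is expressed structurally: a backslash can only be consumed by the `\\[bnrt]` branch, so any other character after it makes the whole match fail.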

regex underscore delimited pattern matching

Hi, I am struggling to get the regex right for some pattern matching.
I basically want to use regex to match the following pattern.
[anyCharacters]_[anyCharacters]_[anyCharacters]_[anyCharacters]_[1or2]
for example, the below string should match to the above pattern.
AA_B_D_ test-adf123_1
I tried the regex below, but it doesn't work:
^[.]+_[.]+_[.]+_[.]+_(1|2)
. matches any character (once) _ included
.* matches any character (largest sequence) (_ included)
[.]+ matches only . character (at least one) (largest sequence)
[^_]+ matches any character except _ (at least one) (largest sequence)
.*? matches any character (shortest sequence)
You probably need one of the last two:
^[^_]+_[^_]+_[^_]+_[^_]+_(1|2)
or
^(.*?_){4}[12]
The problem with .*? is that it can backtrack, so it also matches
one_two_three_four_five_1
The shortest is
^([^_]+_){4}[12]
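The difference between these variants can be sketched in Python. The names `dots_only`, `lazy` and `strict` are just labels for the three patterns discussed above.

```python
import re

dots_only = re.compile(r"^[.]+_[.]+_[.]+_[.]+_(1|2)")  # the original attempt
lazy = re.compile(r"^(.*?_){4}[12]")
strict = re.compile(r"^([^_]+_){4}[12]")

four_fields = "AA_B_D_ test-adf123_1"
five_fields = "one_two_three_four_five_1"

checks = {
    "dots_only/four": bool(dots_only.match(four_fields)),
    "lazy/four": bool(lazy.match(four_fields)),
    "strict/four": bool(strict.match(four_fields)),
    "lazy/five": bool(lazy.match(five_fields)),      # backtracking lets .*? span an underscore
    "strict/five": bool(strict.match(five_fields)),  # [^_]+ can never cross an underscore
}
for name, ok in checks.items():
    print(name, ok)
```

Inside a character class the dot loses its special meaning, which is why `[.]+` only accepts literal dots.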
Try
^(.+_)+(1|2)$
If you want to specify the number of occurrences:
^(.+_){4}(1|2)$
Use a [^_] negated character class rather than [.] that only matches a dot symbol:
^[^_]+_[^_]+_[^_]+_[^_]+_[12]
If the pattern must match the whole string, add $:
^[^_]+_[^_]+_[^_]+_[^_]+_[12]$
Also, you may shorten it a bit with a limiting quantifier:
^[^_]+(?:_[^_]+){3}_[12]$
See the regex demo.
Note that [12] is a better way to match single chars; it will match 1 or 2. A grouping construct like (...) (or (?:...), a non-capturing variant) should be used when matching multicharacter values.
Pattern details:
^ - start of string
[^_]+ - 1 or more chars other than _
(?:_[^_]+){3} - 3 occurrences of:
_ - an underscore
[^_]+ - 1 or more chars other than _
_ - an underscore
[12] - 1 or 2
$ - end of string.
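A small Python sketch of the anchored, shortened pattern. Apart from the string from the question, the test inputs are made up to show the rejected cases.

```python
import re

pattern = re.compile(r"^[^_]+(?:_[^_]+){3}_[12]$")

inputs = [
    "AA_B_D_ test-adf123_1",  # from the question: four fields, then 1
    "AA_B_D_1",               # hypothetical: only three fields before the digit
    "AA_B_D_E_3",             # hypothetical: last field is not 1 or 2
    "AA_B_D_E_2_extra",       # hypothetical: trailing text after the digit
]
verdicts = [bool(pattern.match(s)) for s in inputs]
for s, ok in zip(inputs, verdicts):
    print(f"{s!r} -> {ok}")
```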

regular expression match _ underscore

I have a string like this :
002_part1_part2_______________by_test
and I would like to stop the match at the second underscore character, like this :
002_part1_part2_
How can I do that with a Regular expression ?
Thanks
Create a pattern that matches any character other than _, zero or more times, followed by an underscore. Put that pattern inside a capturing or non-capturing group and make it repeat exactly 3 times by adding the quantifier {3} after the group.
^(?:[^_]*_){3}
DEMO
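Applied to the string from the question, this can be sketched in Python as follows (the variable name `prefix` is illustrative):

```python
import re

text = "002_part1_part2_______________by_test"

# Three repetitions of "non-underscore run plus one underscore".
m = re.match(r"^(?:[^_]*_){3}", text)
prefix = m.group(0) if m else None
print(prefix)
```

Note that the desired output in the question ends at the third underscore of the string, which is why the quantifier is {3}.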
You can use:
.*\d_
EXPLANATION:
Match any single character that is NOT a line break character (line feed) «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match a single character that is a “digit” (any decimal number in any Unicode script) «\d»
Match the character “_” literally «_»
https://regex101.com/r/uX0qD5/1
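A brief Python sketch of this second pattern. It relies on a digit sitting directly before the underscore where the match should stop, so it is tied to the shape of this particular input; the second test string is invented to show that.

```python
import re

pattern = re.compile(r".*\d_")

text = "002_part1_part2_______________by_test"
m = pattern.search(text)
matched = m.group(0) if m else None
print(matched)

# Hypothetical input without any digit: the pattern finds nothing.
no_digit = pattern.search("a_b_c___x")
print(no_digit)
```

The greedy `.*` backtracks from the end of the string to the last position where a digit is immediately followed by an underscore.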