Match all except specific group - regex

I have a test string repo-2019-12-31-14-30-11.gz and I want to exclude 2019-12-31-14-30-11.gz from that string and match everything else. Digits with date and hour can be different. String at the beginning of text can be any word, can contain digits, dashes or underscores. Constant characters are:
dash between repo name and date
.gz at end of text
I tried the following regex:
^.*(?!-\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}.gz$)
but it always matches whole text

The pattern that you tried ^.*(?!-\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}.gz$) always matches the whole text because .* will first match until the end of the string. Then, at the end of the string, it will assert that what is directly on the right is not the date-like pattern.
That assertion will succeed, as nothing follows the end of the string, so no backtracking is needed.
You could use a capturing group with a character class matching word characters or a hyphen and use that in the replacement:
^([\w-]+)-\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}\.gz$
Regex demo
If the beginning can not start with a hyphen and can not contain consecutive hyphens, you could repeat matching a hyphen and word characters in a grouping structure \w+(?:-\w+)*
^(\w+(?:-\w+)*)-\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}\.gz$
Regex demo
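As a minimal sketch of the two points above, the following Python snippet runs the original attempt and the capturing-group pattern against the sample string. The helper name repo_name and the extra test filename are made up for illustration; only the patterns come from the question and answer.

```python
import re
from typing import Optional

# The original attempt: .* consumes everything, and the negative lookahead
# is then checked at the end of the string, where it trivially succeeds.
ATTEMPT = re.compile(r"^.*(?!-\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}.gz$)")

# The pattern from the answer: group 1 holds everything before the date part.
PATTERN = re.compile(r"^([\w-]+)-\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}\.gz$")


def repo_name(filename: str) -> Optional[str]:
    """Return the part before the timestamp, or None if the name does not fit."""
    m = PATTERN.match(filename)
    return m.group(1) if m else None


sample = "repo-2019-12-31-14-30-11.gz"
whole_match = ATTEMPT.match(sample).group(0)  # the entire input
name = repo_name(sample)                      # only the repo part
```

The greedy [\w-]+ first runs across the whole name and then backtracks until the timestamp and .gz can match, which is why names that themselves contain dashes still work.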

Related

Regex match last word in string ending in

I want to regex match the last word in a string where the string ends in ... The match should be the word preceding the ...
Example: "Do not match this. This sentence ends in the last word..."
The match would be word. This gets close: \b\s+([^.]*). However, I don't know how to make it work with only matching ... at the end.
This should NOT match: "Do not match this. This sentence ends in the last word."
If you use \s+ it means there must be at least a single whitespace char preceding so in that case it will not match word... only.
If you want to use the negated character class, you could also use
([^\s.]+)\.{3}$
( Capture group 1
[^\s.]+ Match 1+ times any char except a whitespace char or dot
) Close group
\.{3} Match 3 dots
$ End of string
Regex demo
You can anchor your regex to the end with $. To match a literal period you will need to escape it as it otherwise is a meta-character:
(\S+)\.\.\.$
\S matches everything but space-like characters. What exactly it matches depends on your regex flavor, but usually it excludes spaces, tabs, newlines and a set of Unicode spaces.
You can play around with it here:
https://regex101.com/r/xKOYa4/1
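Both suggested patterns can be tried quickly in Python. This is only a sketch; the function name last_word_before_ellipsis is an assumption made for the example.

```python
import re

# Negated character class: the word may contain neither whitespace nor dots.
LAST_WORD = re.compile(r"([^\s.]+)\.{3}$")

# \S variant: the word may itself contain dots.
LAST_WORD_NONSPACE = re.compile(r"(\S+)\.\.\.$")


def last_word_before_ellipsis(text):
    """Return the word directly before a trailing '...', or None."""
    m = LAST_WORD.search(text)
    return m.group(1) if m else None


hit = last_word_before_ellipsis(
    "Do not match this. This sentence ends in the last word..."
)
miss = last_word_before_ellipsis(
    "Do not match this. This sentence ends in the last word."
)
```

Because neither pattern requires leading whitespace, a string that consists only of word... is matched as well.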

Regex to find the last word in string -Javascript flavor

I am close but not quite there. I am trying to match the last word to pull out the last name.
My Regex:
Insured Name:\W*(?<insured_last_name>.*)
Text that I am searching:
Insured Name:
FRED & ETHYL MERTZ
Sample here...
https://regex101.com/r/McdMcq/3
You can match Insured Name: until the end of the line. Then match a newline and optional following whitespace chars.
Then at the line where you want to get the last word, first match until the end of the line, then backtrack until the last space, and capture 1+ non whitespace chars in group insured_last_name
\bInsured Name:.*\r?\n\s*.* (?<insured_last_name>\S+)
In parts
\bInsured Name: Match literally
.*\r?\n\s* Match the rest of the line, a newline and 0+ whitespace chars
.* Match the rest of the line and match the last space
(?<insured_last_name>\S+) Match 1+ non whitespace chars in group insured_last_name
Regex demo
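The same pattern can be checked outside regex101. The sketch below uses Python rather than JavaScript, so the named group is written (?P<insured_last_name>...), which is the spelling Python's re module expects; the rest of the pattern is unchanged. The function name insured_last_name is an assumption for the example.

```python
import re

# Python spells named groups (?P<name>...); JavaScript uses (?<name>...).
PATTERN = re.compile(
    r"\bInsured Name:.*\r?\n\s*.* (?P<insured_last_name>\S+)"
)


def insured_last_name(text):
    """Return the last word on the line following 'Insured Name:', or None."""
    m = PATTERN.search(text)
    return m.group("insured_last_name") if m else None


last = insured_last_name("Insured Name:\nFRED & ETHYL MERTZ\n")
```

The \s* after the newline also absorbs blank lines, so the name may sit one or more lines below the label.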
You can simply use /\w+$/gm
Demo: https://regex101.com/r/McdMcq/4
Explanation:
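\w: Match a word character (a letter, digit, or underscore)
+: At least one
$: And then the end of the line (with the m flag, $ matches at the end of each line, not only at the end of the string)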
If there are multiple rows and potentially garbage data in between I would recommend you to remove the 2 newlines (\n\n) and then do a Positive Lookbehind looking for "Name". Demo: https://regex101.com/r/McdMcq/5
If you need to store the result in a capture group, simply enclose \w+$ in parentheses with a group name (i.e. (?<insured_last_name>\w+$)) in either of the two regexes.
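To see what the m flag does to \w+$, here is a small Python sketch (re.MULTILINE plays the role of the m flag, and findall the role of g). The second input string is invented to show a caveat: every line that ends in a word character produces a match, not only the name line.

```python
import re

text = "Insured Name:\n\nFRED & ETHYL MERTZ"

# With MULTILINE, $ matches at the end of every line.
# "Insured Name:" ends in ':' and therefore yields no match.
matches = re.findall(r"\w+$", text, flags=re.MULTILINE)

# Named-group variant; Python spells the group (?P<name>...).
named = re.search(r"(?P<insured_last_name>\w+$)", text, flags=re.MULTILINE)

# Caveat: any other line ending in a word character matches too.
noisy = re.findall(r"\w+$", "Policy 123\nFRED MERTZ", flags=re.MULTILINE)
```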
You may need to define your data set a little more, but you can try
Insured Name:\n+.*(?<insured_last_name>\b.+)
Example
It starts at "Insured Name:", then any empty lines, then will read the following line until the final word boundary (excluding the EOL); anything after that is in your named group.

regular expression to get the start and end matches of a string

I have a string of words. I want to get a word which begins and ends with 3 backticks ```. How do I use regular expressions to accomplish this in Flutter? I have tried this (^```.*\.```$)\w+ but it's not working on a sentence like Hello there, ```friend```, how are you doing?
The pattern you tried (^```.*\.```$)\w+ uses anchors to assert the start ^ and the end $ of the string, and between them it matches triple backticks, any chars except a newline, a literal dot, and triple backticks again.
After that it tries to match 1+ word characters, which can not match because nothing can follow the end of the string.
You could use a capturing group and match 1+ word characters in between
```(\w+)```
Regex demo
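The question is about Dart/Flutter, but the pattern itself is not flavor specific, so a quick check in Python is shown here as a sketch. TICKS and words_in_backticks are names invented for the example; TICKS simply stands for three literal backticks.

```python
import re

TICKS = "`" * 3  # three literal backticks

# Same as the suggested pattern: backticks, a captured word, backticks.
FENCED_WORD = re.compile(TICKS + r"(\w+)" + TICKS)

# The original attempt: anchored to the whole string, then \w+ after $.
ATTEMPT = re.compile(r"(^" + TICKS + r".*\." + TICKS + r"$)\w+")


def words_in_backticks(text):
    """Return every word that is wrapped in triple backticks."""
    return FENCED_WORD.findall(text)


sentence = "Hello there, " + TICKS + "friend" + TICKS + ", how are you doing?"
found = words_in_backticks(sentence)
attempt_result = ATTEMPT.search(sentence)
```

Since \w+ does not match spaces or punctuation, a phrase such as two words wrapped in backticks would need a different group, for example one that matches any characters lazily.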

REGEX: Select all text between last underscore and dot

I'm having trouble retrieving specific information of a string.
The string is as follows:
20190502_PO_TEST.pdf
This includes the .pdf part. I need to retrieve the part between the last underscore (_) and the dot (.) leaving me with TEST
I've tried this:
[^_]+$
This however, returns:
TEST.pdf
I've also tried this:
_(.+)\.
This returns:
PO_TEST
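This pattern [^_]+$ matches one or more characters other than an underscore up to the end of the string, so it also matches the . and the extension.
In the pattern _(.+). the dot has to be escaped to match it literally, like _(.+)\. (see demo), and then your match will be in the first capturing group. Note that the _ matches at the first underscore, which is why the group contains PO_TEST.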
What you also might use:
^.*_\K[^.]+
^.*_ Match from the start of the string until the last underscore
\K Forget what was matched
[^.]+ Match 1+ times any char except a dot
Regex demo
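\K is not available in every flavor; Python's built-in re module, for one, does not support it. A capture group gives the same result there. The sketch below swaps \K for a group (that substitution is mine, not part of the answer) and also reproduces the two attempts from the question.

```python
import re

name = "20190502_PO_TEST.pdf"

# Attempt 1: everything after the last underscore, extension included.
attempt_1 = re.search(r"[^_]+$", name).group(0)

# Attempt 2: _ matches at the FIRST underscore, so the group is too long.
attempt_2 = re.search(r"_(.+)\.", name).group(1)

# Capture-group equivalent of ^.*_\K[^.]+ :
# the greedy .* backtracks to the last underscore, group 1 stops at the dot.
between = re.search(r"^.*_([^.]+)", name).group(1)
```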

Why is this regex selecting this text

I am using the regex
(.*)\d.txt
on the expression
MyFile23.txt
Now the online tester says that using the above regex the mentioned string would be allowed (selected). My understanding is that it should not be allowed because there are two numeric digits, 2 and 3, while the above regex has only one numeric digit in it, i.e. \d. It should have been \d+. My current expression reads: zero or more of any character, followed by one numeric digit, followed by .txt. My question is: why is the above string passing the regex?
This regex (.*)\d.txt will still match MyFile23.txt because of .* which will match 0 or more of any character (including a digit).
So for the given input: MyFile23.txt here is the breakup:
.* # matches MyFile2
\d # matches 3
. # matches a dot (though it can match anything here due to unescaped dot)
txt # will match literal txt
To make sure it only matches MyFile2.txt you can use:
^\D*\d\.txt$
Where ^ and $ are anchors to match start and end. \D* will match 0 or more non-digit.
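The breakup above can be confirmed with a short Python sketch. The names LOOSE and STRICT, and the input MyFile2Xtxt, are made up for the example.

```python
import re

LOOSE = re.compile(r"(.*)\d.txt")       # pattern from the question
STRICT = re.compile(r"^\D*\d\.txt$")    # suggested fix

# .* is greedy and may include digits, so it swallows "MyFile2"
# and leaves only the final "3" for \d.
loose_group = LOOSE.search("MyFile23.txt").group(1)

# The unescaped dot accepts any character in place of the period.
loose_any_char = LOOSE.search("MyFile2Xtxt")

# The strict pattern allows exactly one digit and a literal period.
strict_two_digits = STRICT.search("MyFile23.txt")
strict_one_digit = STRICT.search("MyFile2.txt")
```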
The pattern you have has one group (.*) which, using your example, would match MyFile2
because the . allows any character.
Furthermore the . in the pattern after this group is not escaped which will result in allowing another character of any kind.
To avoid this use:
(\D*)\d+\.txt
the group (\D*) would now match all non digit characters.
Here is the explanation, your "MyFile23.txt" matches the regex pattern:
A literal period . should always be escaped as \. else it will match "any character".
And finally, (.*) matches the string from the beginning up to, but not including, the last digit (MyFile2). Have a look at the "MATCH INFORMATION" area on the right at this page.
So, I'd suggest the following fix:
^\D*\d\.txt$ = beginning of a line/string, non-digit character, any number of repetitions, a digit, a literal period, a literal txt, and the end of the string/line (depending on the m switch, which depends on the input string, whether you have a list of words on separate lines, or just a separate file name).
Here is a working example.