Regex match all text except trailing parenthesis with number

This is probably easy, but I can't seem to grasp regex properly.
I need to match the characters of a string from the start up to a trailing parenthesis with a digit inside (if one exists). If the parenthesis is followed by more text, the entire string should match.
Test string (abc) = match "Test string (abc)"
Test string (abc) test = match "Test string (abc) test"
Test string (1) = match "Test string"
Test string (1) Test = match "Test string (1) Test"
I have this, but it doesn't care what's inside the parentheses, so it only matches "Test string" no matter what.
^[^\(\d\)]+
Can anyone help me out? Thanks a lot!
EDITED: Added extra test string (#4) to my question and Wictor's regex in the comment matches this as well:
^.*?(?=\s*\(\d+\)$|$)

If you are extracting with -match, you may use
^.*?(?=\s*\(\d+\)|$)
Details
^ - start of string
.*? - any 0+ chars other than newline, as few as possible
(?=\s*\(\d+\)|$) - a positive lookahead that matches a location immediately followed with 0+ whitespaces, (, 1+ digits, ) or end of string.
Note you may try a replacing approach with
$line -replace '\s*\(\d+\).*'
where \s*\(\d+\).* matches 0+ whitespaces, (, 1+ digits, ) and then the whole rest of the line with .*.
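To compare the two lookahead patterns on the four test strings from the question, here is a minimal sketch. It uses Python's re module purely for illustration (the answer itself targets PowerShell's -match), and the helper name leading_match is made up. The pattern without $ in the first alternative stops before the first parenthesized number anywhere in the string, while the variant from the question's edit only stops when that parenthesis ends the string.

```python
import re

# Pattern from the answer: stops before the first "(digits)" anywhere.
ANYWHERE = re.compile(r'^.*?(?=\s*\(\d+\)|$)')
# Pattern from the question's edit: "(digits)" must end the string.
AT_END = re.compile(r'^.*?(?=\s*\(\d+\)$|$)')

def leading_match(pattern, text):
    """Return the text matched from the start of the string."""
    return pattern.match(text).group(0)

for s in ["Test string (abc)", "Test string (abc) test",
          "Test string (1)", "Test string (1) Test"]:
    print(repr(s), "->", repr(leading_match(ANYWHERE, s)),
          "|", repr(leading_match(AT_END, s)))
```

Only the last test string should differ between the two patterns, which is the case the extra $ is there for.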

Related

Regex that doesn't recognise a pattern

I want to make a regex that recognizes some patterns and rejects others.
_*[a-zA-Z][a-zA-Z0-9_][^-]*.*(?<!_)
Samples of patterns that I want to recognize:
a100__version_2
_a100__version2
And samples of patterns that I don't want to recognize:
100__version_2
a100__version2_
_100__version_2
a100--version-2
The regex works for all of them except this one:
a100--version-2
So I don't want to match the dashes.
I tried _*[a-zA-Z][a-zA-Z0-9_][^-]*.*(?<!_), so the problem seems to be at [^-].
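In the original pattern, the trailing .* can match any characters, dashes included, which is why a100--version-2 is still accepted. You could drop the .* and anchor the pattern, but [^-]* can also match newlines and spaces.
To exclude newlines and spaces as well, and to require at least 2 characters: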
^_*[a-zA-Z][a-zA-Z0-9_][^-\s]*$(?<!_)
Or, to match only word characters with a minimum length of one character, follow the letter with \w* (zero or more word characters):
^_*[a-zA-Z]\w*$(?<!_)
^ Start of string
_* Match optional underscores
[a-zA-Z] Match a single char a-zA-Z
\w* Match optional word chars (Or [a-zA-Z0-9_]*)
$ End of string
(?<!_) Assert not _ to the left at the end of the string
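As a quick check, the \w-based pattern can be run against the samples from the question. This is only a sketch in Python (the question does not name a flavor), and the function name is_valid is hypothetical. re.ASCII is passed so that \w stays equivalent to [a-zA-Z0-9_], as the explanation assumes.

```python
import re

# re.ASCII keeps \w limited to [a-zA-Z0-9_].
IDENT = re.compile(r'^_*[a-zA-Z]\w*$(?<!_)', re.ASCII)

def is_valid(name):
    """True if the whole string matches the pattern."""
    return IDENT.search(name) is not None

should_match = ["a100__version_2", "_a100__version2"]
should_fail = ["100__version_2", "a100__version2_",
               "_100__version_2", "a100--version-2"]

for name in should_match + should_fail:
    print(name, "->", is_valid(name))
```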

Regular expression matching and remove spaces

How can I get the address using a regex?
Address 123 Mayor Street, LAG Branch ABC
I used (?<=Address(\s))(.*(?=\s)), but the match includes the spaces after "Address". I'm trying to find an expression that extracts the address without those spaces. (There are a couple of spaces after "Address" before "123".)
Thanks!
The pattern (?<=Address(\s))(.*(?=\s)) that you tried asserts Address followed by a single whitespace char to the left, and then matches the rest of the line asserting a whitespace char to the right.
For the example data, the match will end right before the last whitespace char in the string, and it will also contain any further whitespace chars after the first one that follows Address.
One option to match the bold parts in the question is to use a capture group.
\bAddress\s+([^,]+,\s*\S+)
The pattern matches:
\bAddress\s+ Match Address followed by 1+ whitespace chars
( Capture group 1
[^,]+, Match 1+ occurrences of any char except , and then match ,
\s*\S+ Match optional whitespace chars followed by 1+ non whitespace chars
) Close group 1
.NET regex demo (Click on the Table tab to see the value for group 1)
Note that \s and [^,] can also match a newline
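A minimal sketch of the capture-group approach, written in Python only for illustration (the demos here are .NET). The function name extract_address is made up, and the sample line uses several spaces after "Address" as described in the question.

```python
import re

ADDRESS = re.compile(r'\bAddress\s+([^,]+,\s*\S+)')

def extract_address(line):
    """Return capture group 1, or None when the line does not match."""
    m = ADDRESS.search(line)
    return m.group(1) if m else None

print(extract_address("Address   123 Mayor Street, LAG Branch ABC"))
# 123 Mayor Street, LAG
```

Because \s+ sits outside the group, the leading spaces are consumed but never become part of the captured value.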
A variant with a positive lookbehind to get the value as the whole match (this needs a flavor that allows quantifiers inside a lookbehind, such as .NET):
(?<=\bAddress\s+)[^,\s][^,]+,\s*\S+
.NET Regex demo

Match everything until upcase word

I want to capture a word placed before another one which is fully capitalized
Mister Foo BAR is here # => "Foo"
Miss Bar-Barz FOO loves cats # => "Bar-Barz"
I've been trying the following regex: (Mister|Miss)\s([[:alpha:]\s\-]+)(?=\s[A-Z]+), but sometimes it includes the rest of the sentence. For example, it'll return Bar-Barz FOO loves cats instead of Bar-Barz.
How can I say, using RegExp, "match every word until the uppercase word"?
To clarify the usage of negative lookahead, can we say it "captures until the specified sub-pattern matches, but does not include it in the match data"?
As a non-native English speaker, apologies if my question isn't perfectly formulated. Thanks in advance.
Match 1+ word chars, optionally followed by repetitions of - and 1+ word chars, so that the match cannot consist only of hyphens or end with a hyphen.
Then assert a whitespace char followed by 1+ uppercase chars and a word boundary to the right.
\w+(?:-\w+)*(?=\s[A-Z]+\b)
Explanation
\w+ Match 1+ word char
(?:-\w+)* Optionally repeat matching - and 1+ word chars
(?=\s[A-Z]+\b) Positive lookahead, assert that what is directly to the right is a whitespace char, then 1+ uppercase chars A-Z followed by a word boundary
If there can not be any newlines between the words, you can use [^\S\r\n] instead of \s
\w+(?:-\w+)*(?=[^\S\r\n]+[A-Z]+\b)
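Here is a small sketch of that pattern applied to the two sentences from the question. Python is used just as an example since no language was specified, and the function name word_before_caps is hypothetical.

```python
import re

NAME = re.compile(r'\w+(?:-\w+)*(?=[^\S\r\n]+[A-Z]+\b)')

def word_before_caps(text):
    """Return the first word that is followed by an all-uppercase word."""
    m = NAME.search(text)
    return m.group(0) if m else None

print(word_before_caps("Mister Foo BAR is here"))        # Foo
print(word_before_caps("Miss Bar-Barz FOO loves cats"))  # Bar-Barz
```

"Mister" and "Miss" are skipped because the word after them starts with an uppercase letter but continues in lowercase, so the \b after [A-Z]+ fails there.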
You may use this regex with a lookahead:
\b\S+(?=[ \t]+[A-Z]+\b)
RegEx Description:
\b: Word boundary
\S+: Match 1+ non-whitespace characters
(?=[ \t]+[A-Z]+\b): Positive lookahead that asserts we have 1+ spaces or tabs and then a word containing only capital letters
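One practical difference between \S+ and a \w-based pattern is how they treat punctuation inside the name. The sketch below (Python, chosen only for illustration) runs both on the question's sentences plus one made-up sample containing an apostrophe; first_match is a hypothetical helper.

```python
import re

# Pattern from this answer: any run of non-whitespace before an all-caps word.
NON_SPACE = re.compile(r'\b\S+(?=[ \t]+[A-Z]+\b)')
# \w-based pattern from the earlier answer, shown for comparison.
WORD_CHARS = re.compile(r'\w+(?:-\w+)*(?=\s[A-Z]+\b)')

def first_match(pattern, text):
    m = pattern.search(text)
    return m.group(0) if m else None

for s in ["Mister Foo BAR is here",
          "Miss Bar-Barz FOO loves cats",
          "Mister d'Arcy BAR is here"]:  # made-up sample with an apostrophe
    print(first_match(NON_SPACE, s), "|", first_match(WORD_CHARS, s))
```

On the apostrophe sample the \S+ version keeps the whole token, while the \w version only returns the part after the apostrophe.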
You don't say what language you're working in, but the following works for me. The idea is to stop when the parser hits a sequence of uppercase letters/hyphens.
JS example:
let ptn = /(Mister|Miss)\s[\w\-]+(?=\s[A-Z\-]+)/;
"Mister Foo BAR is here".match(ptn); //["Mister Foo", "Mister"]
"Miss Bar-Barz FOO loves cats".match(ptn); //["Miss Bar-Barz", "Miss"]

Regex match last word in string ending in

I want to regex match the last word in a string where the string ends in ... The match should be the word preceding the ...
Example: "Do not match this. This sentence ends in the last word..."
The match would be word. This gets close: \b\s+([^.]*). However, I don't know how to make it work with only matching ... at the end.
This should NOT match: "Do not match this. This sentence ends in the last word."
If you use \s+, there must be at least one whitespace char before the word, so the pattern will not match a string that consists only of word...
If you want to use the negated character class, you could also use
([^\s.]+)\.{3}$
( Capture group 1
[^\s.]+ Match 1+ times any char except a whitespace char or dot
) Close group
\.{3} Match 3 dots
$ End of string
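A short sketch of this pattern in Python (used only as an example flavor; the function name last_word is made up), checked against both sentences from the question:

```python
import re

LAST_WORD = re.compile(r'([^\s.]+)\.{3}$')

def last_word(text):
    """Return the word right before a trailing '...', or None."""
    m = LAST_WORD.search(text)
    return m.group(1) if m else None

print(last_word("Do not match this. This sentence ends in the last word..."))  # word
print(last_word("Do not match this. This sentence ends in the last word."))    # None
```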
You can anchor your regex to the end with $. To match a literal period you will need to escape it as it otherwise is a meta-character:
(\S+)\.\.\.$
\S matches everything but space-like characters. What exactly it matches depends on your regex flavor, but it usually excludes spaces, tabs, newlines and a set of Unicode spaces.
You can play around with it here:
https://regex101.com/r/xKOYa4/1
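Note that \S+ also accepts dots, so the two patterns can differ when the last token itself contains a dot. A small Python sketch with a made-up sample string (captured is a hypothetical helper):

```python
import re

NON_SPACE = re.compile(r'(\S+)\.\.\.$')       # token may contain dots
NO_DOTS = re.compile(r'([^\s.]+)\.{3}$')      # token may not contain dots

def captured(pattern, text):
    m = pattern.search(text)
    return m.group(1) if m else None

sample = "Saved as report.txt..."  # made-up input
print(captured(NON_SPACE, sample))  # report.txt
print(captured(NO_DOTS, sample))    # txt
```

Which behavior is preferable depends on whether tokens such as file names should count as one word.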

Regex for all illegal filename characters before filetype extension

I'm looking for a regex that replaces every illegal filename character (such as (, ), space or .) before the file extension (like .jpg) with a -
I have:
[^a-zA-Z0-9_-]+
This matches every illegal filename char, but it also affects the file extension.
And
.*(?=.)
matches everything until the last occurrence of .
How do I combine these?
One of my evil file names is
(800x800-png)MGC1000-03EPTD-021_RAL7035-5010.tif.png
After the regex replace it should look like
-800x800-png-MGC1000-03EPTD-021_RAL7035-5010-tif.png
The regex should work in LibreOffice / Excel search and replace.
Thanks for your help!
You could use your negated character class [^a-zA-Z0-9_-]+ and use a positive lookahead to assert that the string ends with a dot and 1+ word characters.
In the replacement use a hyphen -
[^a-zA-Z0-9_-]+(?=.*\.\w+$)
As per comment from #Stein you might shorten it to:
[^\w-]+(?=.*\.\w+$)
Explanation
[^a-zA-Z0-9_-]+ Match 1+ times any character that is not in the character class
(?= Positive lookahead, assert what is on the right is
.*\.\w+ Match any character 0+ times, then a dot and 1+ word chars
$ Assert the end of the string
) Close positive lookahead
If the extension itself could have special characters, then you might update \w+$ to [^.\s]+$ like:
[^\w-]+(?=.*\.[^.\s]+$)
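To check the replacement on the file name from the question, here is a minimal sketch. It uses Python's re.sub only for illustration (the question targets LibreOffice / Excel search and replace, whose regex support may differ), and the function name sanitize is hypothetical. re.ASCII keeps \w equal to [a-zA-Z0-9_], so the short form behaves like the longer character class.

```python
import re

# Runs of illegal chars that still have a ".extension" somewhere to their right.
ILLEGAL_BEFORE_EXT = re.compile(r'[^\w-]+(?=.*\.[^.\s]+$)', re.ASCII)

def sanitize(filename):
    """Replace each run of illegal characters before the extension with '-'."""
    return ILLEGAL_BEFORE_EXT.sub('-', filename)

print(sanitize("(800x800-png)MGC1000-03EPTD-021_RAL7035-5010.tif.png"))
# -800x800-png-MGC1000-03EPTD-021_RAL7035-5010-tif.png
```

The final dot is left alone because the lookahead needs another dot to its right, and there is none after the last one.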