Regex: How to capture one set of parentheses, but not the next

I have the following data.
Nike.com (Nike) (Apparel)
Adidas.com (Adidas) (Footwear)
Under Armour (Accessories)
Lululemon (Apparel)
I am trying to capture the company name, but not the type of product. Specifically, I want to capture
Nike.com (Nike)
Adidas.com (Adidas)
Under Armour
Lululemon
Using this RegEx:
(.+? \(.+?\))
I get the following:
Nike.com (Nike)
Adidas.com (Adidas)
Under Armour (Accessories)
Lululemon (Apparel)
This works for Nike and Adidas, but it doesn't work for Under Armour or Lululemon. The type of product will always be at the end of the line. I've tried the following with no success:
(.+? \(.+?\)(?!Accessories|Apparel|Footwear))
(.+? \(.+?\)(?!.*Accessories|.*Apparel|.*Footwear).*)

You seem to want everything up to the parenthesized substring at the end of the string.
You may use
^(.+?) *\([^()]+\)$
See the regex demo
Details
^ - start of string
(.+?) - Group 1: any one or more chars other than line break chars, as few as possible
" *" - zero or more spaces (a literal space followed by *)
\( - a ( char
[^()]+ - 1+ chars other than ( and )
\) - a ) char
$ - end of string.
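To check the pattern against the sample data from the question, here is a minimal Python sketch. It assumes Python's re module handles this pattern the same way as the engine in the demo; the names PATTERN, lines, and company_names are made up for illustration.

```python
import re

# Lazily capture everything before the final "(...)" group on the line.
PATTERN = re.compile(r"^(.+?) *\([^()]+\)$")

lines = [
    "Nike.com (Nike) (Apparel)",
    "Adidas.com (Adidas) (Footwear)",
    "Under Armour (Accessories)",
    "Lululemon (Apparel)",
]

company_names = []
for line in lines:
    m = PATTERN.match(line)
    if m:
        company_names.append(m.group(1))

print(company_names)
# ['Nike.com (Nike)', 'Adidas.com (Adidas)', 'Under Armour', 'Lululemon']
```

Because [^()]+ cannot cross a parenthesis and $ pins the match to the end of the line, only the last parenthesized group is dropped.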

Related

Problem with regular expression with 2 capture group, one is optional

I'm struggling to write the correct regex to match the data below. I want to capture the "Focus+Terminal" and its optional parameter "NYET". How can I re-write my incorrect regex?
user:\/\/(.*)(?:=(.*+))?
I also tried and failed:
user:\/\/(.*)=?(?:(.*+))?
Sample Data
* user://Focus+Terminal=NYET
* user://Focus+Terminal
You can use
user:\/\/(.*?)(?:=(.*))?$
See the regex demo.
Details:
user:\/\/ - a user:// string
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
(?:=(.*))? - an optional non-capturing group that matches a = and then captures into Group 2 any zero or more chars other than line break chars as many as possible
$ - end of string.
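As a quick check of the optional second group, here is a small Python sketch. The helper name parse_user_url is hypothetical, and the forward slashes are left unescaped because Python's re does not need them escaped.

```python
import re

# Group 1 is lazy; the optional "=..." part fills group 2 when present.
PATTERN = re.compile(r"user://(.*?)(?:=(.*))?$")

def parse_user_url(text):
    """Return (name, parameter); parameter is None when there is no '='."""
    m = PATTERN.search(text)
    return m.groups() if m else None

print(parse_user_url("user://Focus+Terminal=NYET"))  # ('Focus+Terminal', 'NYET')
print(parse_user_url("user://Focus+Terminal"))       # ('Focus+Terminal', None)
```

The $ anchor is what forces the lazy (.*?) to keep expanding; without it, group 1 would match the empty string.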
As an alternative, you might use a negated character class for the first capture group, one that excludes the equals sign and the newline.
user:\/\/([^=\n]*)(?:=(.*))?
Explanation
user:\/\/ Match user://
([^=\n]*) Capture group 1, match zero or more chars other than = or a newline
(?:=(.*))? Optionally match = and capture the rest of the line in group 2
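Since this variant needs no $ anchor, it can also be applied to several entries in one string. A minimal Python sketch, assuming the two sample entries are separated by a newline (the variable names are illustrative):

```python
import re

# The negated class stops group 1 at the first "=" or newline.
PATTERN = re.compile(r"user://([^=\n]*)(?:=(.*))?")

text = "user://Focus+Terminal=NYET\nuser://Focus+Terminal"

# groups() reports None for the optional group when it did not take part.
pairs = [m.groups() for m in PATTERN.finditer(text)]
print(pairs)
# [('Focus+Terminal', 'NYET'), ('Focus+Terminal', None)]
```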
Regex demo

Match string between delimiters, but ignore matches with specific substring

I have to parse all the text inside parentheses, but skip any group that contains "GST"
e.g.:
(AUSTRALIAN RED CROSS – ATHERTON)
(Total GST for this Invoice $1,104.96)
today for a quote (07) 55394226 − admin.nerang#waste.com.au − this applies to your Nerang services.
expected parsed value:
AUSTRALIAN RED CROSS – ATHERTON
I am trying:
^\(((?!GST).)*$
But it's only matching the value and not grouping it correctly.
https://regex101.com/r/HndrUv/1
What would be the correct regex for the same?
This regex should work to get the expected string:
^\((?!.*GST)(.*)\)$
It first uses a negative lookahead to check that the rest of the line does not match .*GST, that is, GST does not appear anywhere after the current position. If the check passes, it then captures the entire text.
(?!.*GST)(.*)
All of that is then wrapped in \( and \) so the literal parentheses stay outside the capturing group.
\((?!.*GST)(.*)\)
Finally, add the start-of-line (^) and end-of-line ($) anchors to get the result.
^\((?!.*GST)(.*)\)$
The expected value is saved in the first capture group (.*).
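A short Python sketch of this pattern on the three sample lines. It assumes the lines arrive as one multi-line string, so re.MULTILINE is used to let ^ and $ match at each line; the variable names are illustrative.

```python
import re

# ^ and $ match per line with MULTILINE; "." never crosses a newline,
# so the lookahead only inspects the current line.
PATTERN = re.compile(r"^\((?!.*GST)(.*)\)$", re.MULTILINE)

text = "\n".join([
    "(AUSTRALIAN RED CROSS – ATHERTON)",
    "(Total GST for this Invoice $1,104.96)",
    "today for a quote (07) 55394226 − admin.nerang#waste.com.au − this applies to your Nerang services.",
])

matches = PATTERN.findall(text)
print(matches)
# ['AUSTRALIAN RED CROSS – ATHERTON']
```

The third line is skipped because it does not start with an opening parenthesis.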
You can use
^\((?![^()]*\bGST\b)([^()]*)\)$
See the regex demo. Details:
^ - start of string
\( - a ( char
(?![^()]*\bGST\b) - a negative lookahead that fails the match if, immediately to the right of the current location, there are zero or more chars other than ) and ( and then GST as a whole word (remove \bs if you do not need whole word matching)
([^()]*) - Group 1: any zero or more chars other than ) and (
\) - a ) char
$ - end of string
Bonus:
If substrings in longer texts need to be matched, too, you need to remove ^ and $ anchors in the above regex.
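To illustrate the unanchored variant, here is a minimal Python sketch over the same sample text (variable names are illustrative). Note that it also picks up the (07) inside the third line, which may or may not be what you want.

```python
import re

# Same pattern with the ^ and $ anchors removed.
PATTERN = re.compile(r"\((?![^()]*\bGST\b)([^()]*)\)")

text = "\n".join([
    "(AUSTRALIAN RED CROSS – ATHERTON)",
    "(Total GST for this Invoice $1,104.96)",
    "today for a quote (07) 55394226 − admin.nerang#waste.com.au − this applies to your Nerang services.",
])

matches = PATTERN.findall(text)
print(matches)
# ['AUSTRALIAN RED CROSS – ATHERTON', '07']
```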

RegEx - Return pattern to the right of a text string for URL

I'm looking to return the URL string to the right of a specific set of text using RegEx:
URL:
www.websitename/countrycode/websitename/contact/thank-you/whitepaper/countrycode/whitepapername.pdf
What I would like to just return:
/whitepapername.pdf
I've tried using ^\w+"countrycode"(\w.*) but the match won't recognize countrycode.
In Google Data Studio, I want to create a new field to remove the beginning of the URL using the REGEXP_REPLACE function.
Ideally using:
REGEXP_REPLACE(Page,......)
The REGEXP_REPLACE function below does the trick: the second group (.*) captures all the characters after the last countrycode, and Page represents the respective field:
REGEXP_REPLACE(Page, ".*(countrycode)(.*)$", "\\2")
Alternatively, adapting the RegEx by The fourth bird to Google Data Studio:
REGEXP_REPLACE(Page, "^.*/countrycode(/[^/]+\\.\\w+)$", "\\1")
You could use a capturing group and replace the whole match with group 1. You could match /countrycode literally, or use a pattern for two a-z chars, an underscore, and two more a-z chars, like /[a-z]{2}_[a-z]{2}
In the replacement, use group 1: \\1
^.*/countrycode(/[^/]+\.\w+)$
Regex demo
Or using a country code pattern from the comments:
^.*/[a-z]{2}_[a-z]{2}(/[^/]+\.\w+)$
Regex demo
The second pattern in parts
^ Start of string
.*/ Match as much as possible, then a forward slash (backtracking to the last slash that lets the rest of the pattern match)
[a-z]{2}_[a-z]{2} Match the country code part: two chars a-z, an underscore, and two more chars a-z
( Capture group 1
/[^/]+ Match a forward slash, then match 1+ occurrences of any char except / using a negated character class
\.\w+ Match a dot and 1+ word chars
) Close group
$ End of string
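The two patterns can be tried outside Data Studio with a small Python sketch. This assumes Python's re module, which is a different engine from the one Data Studio uses, and the en_us URL is a made-up example, not taken from the question.

```python
import re

LITERAL = re.compile(r"^.*/countrycode(/[^/]+\.\w+)$")
CODE = re.compile(r"^.*/[a-z]{2}_[a-z]{2}(/[^/]+\.\w+)$")

url = ("www.websitename/countrycode/websitename/contact/thank-you/"
       "whitepaper/countrycode/whitepapername.pdf")
# Hypothetical URL with a real-looking country code, for illustration only.
url_with_code = "www.example.com/en_us/contact/thank-you/whitepaper/en_us/guide.pdf"

# The whole string matches, so replacing it with group 1 keeps only the tail.
tail = LITERAL.sub(r"\1", url)
tail_with_code = CODE.sub(r"\1", url_with_code)

print(tail)            # /whitepapername.pdf
print(tail_with_code)  # /guide.pdf
```

If a URL does not match, sub returns it unchanged, so non-matching pages keep their full path.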

RegEx string to find two strings and delete the rest of the text in the file including lines that don't contain the strings [duplicate]

I need to find certain words in a text file and delete everything else, using Notepad++.
I want to use RegEx to find variations on thban..... The word always has at most 5 chars after it (see the dots).
My search string hits the last line, but it matches the whole line. I just want the word preserved.
Once this works, I also want to keep the words containing C3.....
The rest of the text file can be deleted.
It should also be case-insensitive.
(?!thban\w+).*\r?\n?
THBANES900 and C3950 bla bla
THBAN
..THBANES901.. C3850 bla bla
THBANMP900
**..thbanes900..**
This should result in
THBANES900 C3950
THBAN
THBANES901 C3850
THBANMP900
thbanes900
Maybe just capture those words of interest instead of replacing everything else? In Notepad++ search for pattern:
^.*\b(thban\S{0,5})(?:.*(\sC3\w+))?.*$|.+
See the Online Demo
^ - Start string anchor.
.*\b - Any character other than newline zero or more times up to a word boundary.
( - Open 1st capture group.
thban\S{0,5} - Match "thban" and zero to 5 non-whitespace chars.
) - Close 1st capture group.
(?: - Open non-capturing group.
.* - Any character other than newline zero or more times.
( - Open 2nd capture group.
\sC3\w+ - A whitespace character, then "C3" and one or more word characters.
) - Close 2nd capture group.
)? - Close non-capturing group and make it optional.
.* - Any character other than newline zero or more times.
$ - End string anchor.
| - Alternation (OR).
.+ - Any character other than newline once or more.
Replace with:
$1$2
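The same find-and-replace can be sketched in Python to see what it produces. This assumes Python's re module rather than the engine Notepad++ uses, with re.IGNORECASE and re.MULTILINE standing in for the dialog settings; the "unrelated line" entry is made up to show what the .+ branch does.

```python
import re

PATTERN = re.compile(
    r"^.*\b(thban\S{0,5})(?:.*(\sC3\w+))?.*$|.+",
    re.IGNORECASE | re.MULTILINE,
)

text = "\n".join([
    "THBANES900 and C3950 bla bla",
    "THBAN",
    "unrelated line",  # hypothetical line without any keyword
    "..THBANES901.. C3850 bla bla",
    "THBANMP900",
    "**..thbanes900..**",
])

# Groups that did not take part are replaced with an empty string.
replaced = PATTERN.sub(r"\1\2", text)

# Lines matched only by the ".+" branch are now empty; drop them.
kept = [line for line in replaced.splitlines() if line]
print(kept)
# ['THBANES900 C3950', 'THBAN', 'THBANES901 C3850', 'THBANMP900', 'thbanes900']
```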
After this, you may end up with empty lines, which you can swiftly remove using the built-in option (in the English UI: Edit > Line Operations > Remove Empty Lines).
For case-insensitive matching, make sure the "Match case" checkbox in the Replace dialog is not ticked.
You may use
Find What: (?|\b(thban\S{0,5})|\s(C3\w+))|(?s:.)
Replace With: (?1$1\n:)
Details
(?| - start of a branch reset group:
\b(thban\S{0,5}) - Group 1: a word boundary, then thban and any 0 to 5 non-whitespace chars
| - or
\s(C3\w+) - a whitespace char, and then Group 1: C3 and one or more word chars
) - end of the branch reset group
| - or
(?s:.) - any one char (including line break chars)
The replacement is
(?1 - if Group 1 matched,
$1\n - Group 1 value with a newline
: - else, replace with empty string
) - end of the conditional replacement pattern
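Python's re module supports neither the branch reset group (?|...) nor the conditional replacement syntax, so the following is only a rough equivalent, not the Notepad++ expression itself. It uses two ordinary groups plus a replacement function; the name keep_or_drop is hypothetical.

```python
import re

# Two plain groups replace the branch reset group; "." with DOTALL
# plays the role of (?s:.) and consumes every other character.
PATTERN = re.compile(
    r"\b(thban\S{0,5})|\s(C3\w+)|.",
    re.IGNORECASE | re.DOTALL,
)

def keep_or_drop(m):
    """Keep a captured keyword on its own line; drop anything else."""
    word = m.group(1) or m.group(2)
    return word + "\n" if word else ""

text = "\n".join([
    "THBANES900 and C3950 bla bla",
    "THBAN",
    "..THBANES901.. C3850 bla bla",
    "THBANMP900",
    "**..thbanes900..**",
])

result = PATTERN.sub(keep_or_drop, text)
print(result.splitlines())
# ['THBANES900', 'C3950', 'THBAN', 'THBANES901', 'C3850', 'THBANMP900', 'thbanes900']
```

As in the Notepad++ version, each kept word ends up on its own line, so the C3 values are not joined to the thban word that preceded them.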

Regular expression for substitute a string with another

I have these two lines of text that I want to manipulate using a regular expression substitution:
Obj.FieldNameA = Reader.GetEnumFromInt32<ClassName>(QueryGenerator,nameof(Obj.));
Obj.FieldNameB=Reader.GetTrimmedStringOrNull(QueryGenerator,nameof(Obj.));
The first Obj. on each line is followed by a field name; in this case the names are FieldNameA and FieldNameB.
I want to append that name to the second Obj. found on the same line, so the text should become:
Obj.FieldNameA = Reader.GetEnumFromInt32<ClassName>(QueryGenerator,nameof(Obj.FieldNameA));
Obj.FieldNameB=Reader.GetTrimmedStringOrNull(QueryGenerator,nameof(Obj.FieldNameB));
I have tested this very simple (and wrong) regex:
Obj\.(\w*).*\n
With $1 as the substitution
But I don't know how to use substitution...
Sample code here
Some Notes:
After FieldNameA there is always an equal sign that could be preceded or followed by a space.
Before the second Obj. there could be any character, including < ( etc...
Could this be achieved?
You may use
Find: (Obj\.(\w+).*\(Obj\.)\)
Replace: $1$2)
See the regex demo.
You may also add ^ to the start of the regex to match only at the start of a line/string.
Details
^ - start of string (only if you add the optional ^)
(Obj\.(\w+).*\(Obj\.) - Group 1 ($1 in the replacement):
Obj\. - Obj. text
(\w+) - Group 2 ($2): 1 or more word chars
.* - any 0+ chars other than line break chars, as many as possible (you may use .*? to stop at the nearest following (Obj.) instead of the last one on the line; your current input has only two Obj. per line, with the second one close to the end of the line, so .* will work better)
\(Obj\. - (Obj. text
\) - a ) char.
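For reference, the same find and replace can be sketched in Python. The replacement uses \g<1> and \g<2> where the answer uses $1 and $2, and the helper name fill_nameof is hypothetical.

```python
import re

# Group 1 spans from the first "Obj." up to and including "(Obj.";
# group 2 is the field name that follows the first "Obj.".
PATTERN = re.compile(r"(Obj\.(\w+).*\(Obj\.)\)")

def fill_nameof(line):
    """Copy the field name after the first Obj. into the empty nameof(Obj.)."""
    return PATTERN.sub(r"\g<1>\g<2>)", line)

lines = [
    "Obj.FieldNameA = Reader.GetEnumFromInt32<ClassName>(QueryGenerator,nameof(Obj.));",
    "Obj.FieldNameB=Reader.GetTrimmedStringOrNull(QueryGenerator,nameof(Obj.));",
]

for line in lines:
    print(fill_nameof(line))
```

Lines that do not contain an empty (Obj.) are returned unchanged, so the substitution is safe to run over a whole file.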