Regexp, that ignores only first capture group

Regexp, that ignores only first capture group - regex

We have tab spaced list of "key=value" pairs.
How we can split it, using regexp?
Case key=value must be transformed into value. Case key=value=value2 must be transformed into value=value2.
https://regex101.com/r/dR5dT0/1 - I've started solution like this, but can't find beautiful way to remove only "key=" part from text.
UPD BTW, do you know cool crash courses on regular expressions?

You can just use
=(\S*)
See regex demo
Since the list is already formatted, the = in the pattern will always be the name/value delimiter.
The \S matches any non-whitespace character.
The * is a quantifier meaning that the \S should occur zero or more times (\S* matches zero or more non-whitespace characters).

You can use this regex for matching:
/\w+=(\S+)/
and grab captured group #1
RegEx Demo

Related

Deleted everything before the dot

How can I use regex in notepad++ to make a query like this:
I have a list with subdomains containing three words such as
web1.com
test.web2.com
www.test.web3.com
I want to filter so that only three words remain and something like this comes out:
web1.com
test.web2.com
test.web3.com
I was able to delete so that only the domain remains, but this is not what I want
^(?:.+\.)?([^.\r\n]+\.[^.\r\n]+)$

An idea to match until the endpart starts and capture that.
^.*?\.([\w-]+\.[\w-]+\.[\w-]+)$
Replace with $1 (what was captured by the first group)
.*? matches lazily any amount of any characters (besides newline)
[\w-]+ char-class matches one or more word characters and hyphen
See this demo at regex101 (more explanation on the right side)
In Notepad++ be sure to have unchecked: [ ] dot matches newline

Another take at it using a positive lookahead to assert the 3 "words" to the right, allowing for non whitespace chars excluding a dot using [^\s.]
In the replacement use an empty string.
^\S+?\.(?=[^\s.]+\.[^\s.]+\.[^\s.]+$)
See a regex demo.

Regex for getting string between varying prefix & suffix [duplicate]

Simple regex question..
I have a very basic expression built to pull text out between two words:
BEGN: (.*?)DETAIL:
Which works fine when both words exist, but on some occasions there is no "DETAIL:" so in those cases I just want to capture to the end of the text. Is that possible with a single expression, or do I need a conditional statement of some type?

The simplest is to use a group with a $ (end-of-string anchor) alternation:
BEGN: (.*?)(?:DETAIL:|$)
BEGN: (.*?)(?=DETAIL:|$)
(?<=BEGN: ).*?(?=DETAIL:|$)
See the regex demo.
The (?:DETAIL:|$) is a non-capturing group that matches DETAIL: or end of string. The other two cases are similar, just the left- and right-hand delimiters are put into non-cosuming lookarounds so that the text they match could be omitted from the match value.
There are alternative solutions.
If the trailing delimiter can be absent, use a tempered greedy token or an unrolled one:
BEGN: ((?:(?!DETAIL:).)*)
See a regex demo
The (?:(?!DETAIL:).)* matches any text up to the first DETAIL:. You may add a word boundary \b before D so as to only match DETAIL that is a whole word.
If the text can be spanning across multiple lines, do not forget a DOTALL modifier. If you use an unrolled version, the DOTALL modifier is not needed:
BEGN: ([^D]*(?:D(?!ETAIL:)[^D]*)*)
See another demo

Matching comma after certain phrase

I'm using Atom's regex search and replace feature and not JavaScript code.
I thought this JavaScript-compatible regex would work (I want to match the commas that have Or rather behind it):
(?!\b(Or rather)\b),
?! = Negative lookahead
\b = word boundary
(...) = search the words as a whole not character by character
\b = word boundary
, = the actual character.
However, if I remove characters from "Or rather" the regex still matches. I'm confused.
https://regexr.com/4keju

You probably meant to use positive lookbehind instead of negative lookbehind
(?<=\b(Or rather)\b),
Regex Demo
You can activate lookbehind in atom using flags, Read this thread

The (?!\b(Or rather)\b), pattern is equal to , as the negative lookahead always returns true since , is not equal to O.
To remove commas after Or rather in Atom, use
Find What: \b(Or rather),
Replace With: $1
Make sure you select the .* option to enable regular expressions (and the Aa is for case sensitivity swapping).
\b(Or rather), matches
\b - a word boundary
(Or rather) - Capturing group 1 that matches and saves the Or rather text in a memory buffer that can be accessed using $1 in the replacement pattern
, - a comma.
JS regex demo:
var s = "Or rather, an image.\nor rather, an image.\nor rather, friends.\nor rather, an image---\nOr rather, another time they.";
console.log(s.replace(/\b(Or rather),/g, '$1'));
// Case insensitive:
console.log(s.replace(/\b(Or rather),/gi, '$1'));

To Match any comma after "Or rather" you can simply use
(or rather)(,) and access the second group using match[2]
Or an alternative would be to use or rather as a non capturing group
(?:or rather)(,) so the first group would be commas after "Or rather"

Regex to match and replace a character in a pattern

I would like to replace a character "?" with "fi" in a string.
I could write a generic str replace for this. But I want to replace the "?" only if it appears in between two A-Za-z character and avoid the rest
Eg., "Okay?" should be "Okay?" and not "Okayfi"
but
Modi?es should be Modifies since it has ? in middle
What have I tried?
sentence = re.sub(r"(\?)\b", "fi", sentence)
Please see here.
https://regexr.com/3nvk3
Seems to work fine in regexr. but doesnt work well in code. Am I doing something wrong?

The best approach here is to find the original text with the ﬁ ligature and read it in with proper encoding.
Otherwise, you will have to use some workarounds.
You may use (?<=[a-zA-Z]) / (?=[A-Za-z]) lookarounds:
sentence = re.sub(r"(?<=[a-zA-Z])\?(?=[a-zA-Z])", "fi", sentence)
See the regex demo. The (?<=[a-zA-Z]) positive lookbehind matches a position immediately after an ASCII letter, and (?!=[A-Za-z]) positive lookahead matches a position immediately before an ASCII letter.
Or, you may also use a capturing group with backreferences:
sentence = re.sub(r"([a-zA-Z])\?([a-zA-Z])", r"\1fi\2", sentence)
See another regex demo. Note that \1 references the value captured with the first ([a-zA-Z]) group and \2 references the value captured into Group 2 (([a-zA-Z])).

Capturing uppercase words in text with regex

I'm trying to find words that are in uppercase in a given piece of text. The words must be one after the other to be considered and they must be at least 4 of them.
I have a "almost" working code but it captures much more: [A-Z]*(?: +[A-Z]*){4,}. The capture group also includes spaces at the start or the end of those words (like a boundary).
I have a playground if you want to test it out: https://regex101.com/r/BmXHFP/2
Is there a way to make the regex in example capture only the words in the first sentence? The language I'm using is Go and it has no look-behind/ahead.

In your regex, you just need to change the second * for a +:
[A-Z]*(?: +[A-Z]+){4,}
Explanation
While using (?: +[A-Z]*), you are matchin "a space followed by 0+ letters". So you are matching spaces. When replacing the * by a +, you matches spaces if there are uppercase after.
Demo on regex101

Replace the *s by +s, and your regex only matches the words in the first sentence.
.* also matches the empty string. Looking at you regex and ignoring both [A-Z]*, all that remains is a sequence of spaces. Using + makes sure that there is at least one uppercase char between every now and then.

You had to mark at least 1 upper case as [A-Z]*(?: +[A-Z]+){4,} see updated regex.
A better Regex will allow non spaces as [A-Z]*(?: *[A-Z]+){4,}.see better regex
* After will indicate to allow at least upper case even without spaces.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Regexp, that ignores only first capture group - regex

You can use this regex for matching: /\w+=(\S+)/ and grab captured group #1 RegEx Demo

Related

Deleted everything before the dot

Regex for getting string between varying prefix & suffix [duplicate]

Matching comma after certain phrase

Regex to match and replace a character in a pattern

Capturing uppercase words in text with regex

Categories

Resources