We have a tab-separated list of "key=value" pairs.
How can we split it using a regexp?
A pair key=value must be transformed into value, and a pair key=value=value2 must be transformed into value=value2.
https://regex101.com/r/dR5dT0/1 - I've started a solution like this, but can't find a clean way to remove only the "key=" part from the text.
UPD: By the way, do you know any good crash courses on regular expressions?
You can just use
=(\S*)
See regex demo
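Since the list is already formatted, the = in the pattern always matches the first = of each token, which is the name/value delimiter; any later = characters are consumed by \S* as part of the value.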
The \S matches any non-whitespace character.
The * is a quantifier meaning that the \S should occur zero or more times (\S* matches zero or more non-whitespace characters).
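Here is a minimal Python check of this pattern. The question does not name a language, so Python's re module and the sample line below are assumptions. Because the pattern has a single capturing group, re.findall returns only the captured values:

```python
import re

# Assumed sample: three tab-separated pairs, the last one with an empty value.
line = "a=1\tb=x=y\tc="

# "=" matches the first "=" of each token; (\S*) captures the rest of the
# token, including any further "=" characters, and stops at the tab.
values = re.findall(r"=(\S*)", line)
print(values)  # ['1', 'x=y', '']
```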
You can use this regex for matching:
/\w+=(\S+)/
and grab captured group #1
RegEx Demo
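For comparison, here is a Python sketch of the \w+=(\S+) approach on the same assumed sample line, reading the value from group 1 of each match. Because \S+ needs at least one character, a pair with an empty value such as c= is skipped instead of being returned as an empty string:

```python
import re

line = "a=1\tb=x=y\tc="  # same assumed sample

# \w+ consumes the key, "=" the delimiter, and (\S+) a non-empty value.
values = [m.group(1) for m in re.finditer(r"\w+=(\S+)", line)]
print(values)  # ['1', 'x=y']
```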
How can I use a regex in Notepad++ to do the following?
I have a list of domains and subdomains such as
web1.com
test.web2.com
www.test.web3.com
I want to filter them so that at most the last three words remain, producing something like this:
web1.com
test.web2.com
test.web3.com
I was able to strip everything so that only the domain remains, but this is not what I want:
^(?:.+\.)?([^.\r\n]+\.[^.\r\n]+)$
One idea is to match up to where the final three-word part starts, and capture that part.
^.*?\.([\w-]+\.[\w-]+\.[\w-]+)$
Replace with $1 (what was captured by the first group)
.*? lazily matches any number of any characters (except newline)
[\w-]+ is a character class that matches one or more word characters or hyphens
See this demo at regex101 (more explanation on the right side)
In Notepad++, make sure ". matches newline" is unchecked.
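The same replacement can be tried outside Notepad++ in Python, which is used here only for illustration. re.M makes ^ and $ work per line, "." already excludes \n by default (like leaving ". matches newline" unchecked), and Python writes the backreference as \1 instead of $1:

```python
import re

domains = "web1.com\ntest.web2.com\nwww.test.web3.com"

# re.M: ^ and $ match at every line; "." never matches "\n" without re.S.
pattern = r"^.*?\.([\w-]+\.[\w-]+\.[\w-]+)$"
result = re.sub(pattern, r"\1", domains, flags=re.M)
print(result)
# web1.com
# test.web2.com
# test.web3.com
```

Lines with three or fewer words do not match at all, so they are left unchanged.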
Another take uses a positive lookahead to assert that exactly three "words" follow to the right, where a word is a run of non-whitespace characters other than a dot, matched with [^\s.].
In the replacement use an empty string.
^\S+?\.(?=[^\s.]+\.[^\s.]+\.[^\s.]+$)
See a regex demo.
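Here is a Python sketch of the lookahead variant. The sample data is assumed, and the last line is made up to show a longer name. Only the prefix before the final three words is matched, so replacing it with an empty string keeps the rest:

```python
import re

domains = "web1.com\ntest.web2.com\nwww.test.web3.com\na.b.c.d.e"

# Match the shortest prefix ending in "." that is followed by exactly
# three dot-separated words up to the end of the line, and delete it.
pattern = r"^\S+?\.(?=[^\s.]+\.[^\s.]+\.[^\s.]+$)"
result = re.sub(pattern, "", domains, flags=re.M)
print(result.splitlines())
# ['web1.com', 'test.web2.com', 'test.web3.com', 'c.d.e']
```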
Simple regex question.
I have a very basic expression built to pull text out between two words:
BEGN: (.*?)DETAIL:
Which works fine when both words exist, but on some occasions there is no "DETAIL:" so in those cases I just want to capture to the end of the text. Is that possible with a single expression, or do I need a conditional statement of some type?
The simplest approach is to use a group that alternates the delimiter with $ (the end-of-string anchor):
BEGN: (.*?)(?:DETAIL:|$)
BEGN: (.*?)(?=DETAIL:|$)
(?<=BEGN: ).*?(?=DETAIL:|$)
See the regex demo.
The (?:DETAIL:|$) is a non-capturing group that matches DETAIL: or the end of the string. The other two patterns work the same way, except that the right-hand delimiter (and, in the last one, the left-hand delimiter too) is placed in a non-consuming lookaround, so the text it matches is left out of the match value.
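Here is a small Python check that all three patterns return the same text with and without the trailing DETAIL:. The sample strings and the extract helper are hypothetical. Pattern.groups tells whether a pattern has a capturing group to read:

```python
import re

PATTERNS = [
    r"BEGN: (.*?)(?:DETAIL:|$)",     # group 1 holds the text
    r"BEGN: (.*?)(?=DETAIL:|$)",     # right delimiter is not consumed
    r"(?<=BEGN: ).*?(?=DETAIL:|$)",  # the whole match is the text
]

def extract(pattern, text):
    """Hypothetical helper: return the text between BEGN: and DETAIL:/end."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # Use group 1 when the pattern has a capturing group, else the whole match.
    return m.group(1) if m.re.groups else m.group(0)

samples = ["BEGN: some text DETAIL: more", "BEGN: no detail here"]
for p in PATTERNS:
    print([extract(p, s) for s in samples])  # ['some text ', 'no detail here']
```

Note that the captured text keeps the space before DETAIL:; strip it afterwards if needed.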
There are alternative solutions.
If the trailing delimiter can be absent, use a tempered greedy token or an unrolled one:
BEGN: ((?:(?!DETAIL:).)*)
See a regex demo
The (?:(?!DETAIL:).)* matches any text up to the first DETAIL:. You may add a word boundary \b before D so as to only match DETAIL that is a whole word.
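Here is a Python sketch of the tempered greedy token, including the \b variant mentioned above. The sample strings are assumed. The last two lines show the difference: without \b, the match stops before a DETAIL: that is glued to x; with \b, that occurrence is not treated as a delimiter:

```python
import re

plain = re.compile(r"BEGN: ((?:(?!DETAIL:).)*)")
# Hypothetical whole-word variant: \b placed before the D of DETAIL.
whole_word = re.compile(r"BEGN: ((?:(?!\bDETAIL:).)*)")

print(repr(plain.search("BEGN: abc DETAIL: xyz").group(1)))  # 'abc '
print(repr(plain.search("BEGN: abc").group(1)))              # 'abc'
print(repr(plain.search("BEGN: xDETAIL: y").group(1)))       # 'x'
print(repr(whole_word.search("BEGN: xDETAIL: y").group(1)))  # 'xDETAIL: y'
```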
If the text can span multiple lines, do not forget the DOTALL modifier (so that . also matches line breaks). With the unrolled version, no DOTALL modifier is needed, because [^D] already matches line breaks:
BEGN: ([^D]*(?:D(?!ETAIL:)[^D]*)*)
See another demo
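Here is a Python sketch on an assumed multi-line sample that also contains an unrelated capital D. The unrolled pattern and the tempered pattern compiled with re.S (Python's DOTALL) capture the same text. The tempered pattern without re.S stops at the first line break:

```python
import re

# Assumed multi-line sample; "Dog" checks that a non-delimiter D is kept.
text = "BEGN: Dog food\nmore\nDETAIL: rest"

unrolled = re.compile(r"BEGN: ([^D]*(?:D(?!ETAIL:)[^D]*)*)")
tempered_s = re.compile(r"BEGN: ((?:(?!DETAIL:).)*)", re.S)
tempered = re.compile(r"BEGN: ((?:(?!DETAIL:).)*)")

print(repr(unrolled.search(text).group(1)))    # 'Dog food\nmore\n'
print(repr(tempered_s.search(text).group(1)))  # 'Dog food\nmore\n'
print(repr(tempered.search(text).group(1)))    # 'Dog food'
```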
I'm using Atom's regex search and replace feature and not JavaScript code.
I thought this JavaScript-compatible regex would work (I want to match the commas that have "Or rather" before them):
(?!\b(Or rather)\b),
?! = Negative lookahead
\b = word boundary
(...) = search for the words as a whole, not character by character
\b = word boundary
, = the actual character.
However, if I remove characters from "Or rather" the regex still matches. I'm confused.
https://regexr.com/4keju
You probably meant to use a positive lookbehind instead of a negative lookahead:
(?<=\b(Or rather)\b),
Regex Demo
You can enable lookbehind in Atom using flags; read this thread.
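The (?!\b(Or rather)\b), pattern is equivalent to a plain , because the negative lookahead is checked at the same position where , must match. The character there has to be a comma, so it can never be the O of "Or rather", and the lookahead always succeeds.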
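This can be checked in Python, which is used here only for illustration. The sample sentence is assumed, and the lookbehind works in Python because "Or rather" has a fixed width. finditer reports the index of each matched comma. The question's negative-lookahead pattern matches every comma, while a positive lookbehind matches only the comma right after "Or rather":

```python
import re

s = "Or rather, an image, or a sketch."  # assumed sample

# The question's pattern: the lookahead is tested where "," must match,
# so it never sees "Or rather" and every comma is matched.
negative_lookahead = [m.start() for m in re.finditer(r"(?!\b(Or rather)\b),", s)]
# Positive lookbehind: only a comma directly preceded by "Or rather".
positive_lookbehind = [m.start() for m in re.finditer(r"(?<=\bOr rather),", s)]

print(negative_lookahead)   # [9, 19]
print(positive_lookbehind)  # [9]
```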
To remove commas after Or rather in Atom, use
Find What: \b(Or rather),
Replace With: $1
Make sure you select the .* option to enable regular expressions (the Aa option toggles case sensitivity).
\b(Or rather), matches
\b - a word boundary
(Or rather) - Capturing group 1 that matches and saves the Or rather text in a memory buffer that can be accessed using $1 in the replacement pattern
, - a comma.
JS regex demo:
var s = "Or rather, an image.\nor rather, an image.\nor rather, friends.\nor rather, an image---\nOr rather, another time they.";
console.log(s.replace(/\b(Or rather),/g, '$1'));
// Case insensitive:
console.log(s.replace(/\b(Or rather),/gi, '$1'));
To match any comma after "Or rather", you can simply use
(or rather)(,) with a case-insensitive flag and access the comma through the second group, match[2].
Alternatively, make "or rather" a non-capturing group:
(?:or rather)(,) so that the first group holds the comma after "Or rather".
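Here is a Python sketch of both group layouts. The sample text is assumed. Python's m.group(n) plays the role of match[n], and re.I lets "or rather" also match "Or rather":

```python
import re

s = "Or rather, an image.\nor rather, friends."  # assumed sample

# Two groups: the comma is group 2; re.I makes the match case-insensitive.
m = re.search(r"(or rather)(,)", s, re.I)
print(m.group(2), m.start(2))  # , 9

# Non-capturing prefix: the comma becomes group 1.
positions = [match.start(1) for match in re.finditer(r"(?:or rather)(,)", s, re.I)]
print(positions)  # [9, 30]
```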
I would like to replace a character "?" with "fi" in a string.
I could write a generic str replace for this, but I want to replace the "?" only when it appears between two A-Za-z characters, and leave the rest alone.
For example, "Okay?" should stay "Okay?" and not become "Okayfi",
but
"Modi?es" should become "Modifies", since the ? is in the middle of the word.
What I have tried:
sentence = re.sub(r"(\?)\b", "fi", sentence)
Please see here.
https://regexr.com/3nvk3
It seems to work fine in regexr, but it doesn't work well in code. Am I doing something wrong?
The best approach here is to find the original text with the fi ligature and read it in with proper encoding.
Otherwise, you will have to use some workarounds.
You may use (?<=[a-zA-Z]) / (?=[A-Za-z]) lookarounds:
sentence = re.sub(r"(?<=[a-zA-Z])\?(?=[a-zA-Z])", "fi", sentence)
See the regex demo. The (?<=[a-zA-Z]) positive lookbehind matches a position immediately after an ASCII letter, and the (?=[A-Za-z]) positive lookahead matches a position immediately before an ASCII letter.
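Here is a minimal, self-contained version of the lookaround approach wrapped in a function. The function name fix_ligature is hypothetical:

```python
import re

def fix_ligature(sentence: str) -> str:
    # Hypothetical helper: replace "?" only when it sits between two ASCII letters.
    return re.sub(r"(?<=[a-zA-Z])\?(?=[a-zA-Z])", "fi", sentence)

print(fix_ligature("Okay? Modi?es"))  # Okay? Modifies
```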
Or, you may use capturing groups with backreferences:
sentence = re.sub(r"([a-zA-Z])\?([a-zA-Z])", r"\1fi\2", sentence)
See another regex demo. Note that \1 references the value captured with the first ([a-zA-Z]) group and \2 references the value captured into Group 2 (([a-zA-Z])).
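The two approaches differ when two ? characters share a letter, because the capturing version consumes the letters around each ?. Here is a contrived Python example; the input a?b?c is assumed and does not come from the question:

```python
import re

text = "a?b?c"  # contrived input: both "?" characters share the letter "b"

lookarounds = re.sub(r"(?<=[a-zA-Z])\?(?=[a-zA-Z])", "fi", text)
groups = re.sub(r"([a-zA-Z])\?([a-zA-Z])", r"\1fi\2", text)

print(lookarounds)  # afibfic
print(groups)       # afib?c (the "b" was consumed by the first match)
```

For ordinary words such as Modi?es, both versions give the same result.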
I'm trying to find words that are in uppercase in a given piece of text. The words must be consecutive to be considered, and there must be at least 4 of them.
I have "almost" working code, but it captures much more: [A-Z]*(?: +[A-Z]*){4,}. The match also includes spaces at the start or the end of those words (like a boundary).
I have a playground if you want to test it out: https://regex101.com/r/BmXHFP/2
Is there a way to make the regex in the example capture only the words in the first sentence? The language I'm using is Go, and its regex engine has no lookbehind/lookahead.
In your regex, you just need to change the second * to a +:
[A-Z]*(?: +[A-Z]+){4,}
Explanation
With (?: +[A-Z]*), you are matching "one or more spaces followed by zero or more letters", so a run of spaces on its own satisfies the group. When you replace the * with a +, spaces are matched only if uppercase letters follow them.
Demo on regex101
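Here is a Python sketch comparing the patterns. Python is used for consistency with the other examples. Every construct here, including the \b word boundary, is also accepted by Go's RE2 syntax, which lacks only the lookarounds. The sample strings are assumed.

Watch out for an off-by-one: the leading [A-Z]* is followed by {4,} more words, so the fixed pattern needs five words, or four with a leading space. A hypothetical variant anchored with \b that requires at least four whole words is also shown:

```python
import re

original = r"[A-Z]*(?: +[A-Z]*){4,}"
fixed = r"[A-Z]*(?: +[A-Z]+){4,}"
# Hypothetical variant: whole words via \b, a first word plus 3 or more.
at_least_four = r"\b[A-Z]+(?: +[A-Z]+){3,}\b"

spaces = "lower     case"  # five spaces, no uppercase words
five = "ONE TWO THREE FOUR FIVE and more"
four = "FOUR CAPS ONLY HERE"
four_after_x = "x FOUR CAPS ONLY HERE"

print(re.findall(original, spaces))             # ['     '] (spaces only)
print(re.findall(fixed, spaces))                # []
print(re.findall(fixed, five))                  # ['ONE TWO THREE FOUR FIVE']
print(re.findall(fixed, four))                  # []
print(re.findall(fixed, four_after_x))          # [' FOUR CAPS ONLY HERE']
print(re.findall(at_least_four, four))          # ['FOUR CAPS ONLY HERE']
print(re.findall(at_least_four, four_after_x))  # ['FOUR CAPS ONLY HERE']
```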
Replace the *s with +s, and your regex only matches the words in the first sentence.
[A-Z]* also matches the empty string. Looking at your regex and ignoring both [A-Z]*, all that remains is a sequence of spaces. Using + makes sure that there is at least one uppercase character between the spaces.
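You need to require at least one uppercase letter after the spaces: [A-Z]*(?: +[A-Z]+){4,}; see the updated regex.
A more permissive regex also allows uppercase runs that are not separated by spaces: [A-Z]*(?: *[A-Z]+){4,}; see the better regex.
The * after the space lets the uppercase letters follow even without any spaces. Note that this also means a single uppercase word of four or more letters (e.g. "NASA") matches on its own, since [A-Z]+ can then split it into four pieces.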
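Here is a quick Python check of a side effect of making the spaces optional; the inputs are assumed. [A-Z]+ can then split a single word into several pieces, so a lone four-letter uppercase word already satisfies {4,}, while a three-letter one does not. The version with required spaces rejects both:

```python
import re

optional_spaces = r"[A-Z]*(?: *[A-Z]+){4,}"
required_spaces = r"[A-Z]*(?: +[A-Z]+){4,}"

print(re.findall(optional_spaces, "NASA"))  # ['NASA']
print(re.findall(optional_spaces, "CAT"))   # []
print(re.findall(required_spaces, "NASA"))  # []
```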