Notepad++ : Replace all the words starting with * - regex

I want to remove the leading * from every word that starts with *.
For example:
*finish :- finish (* removed)
a*finish :- a*finish (not removed)
What regular expression would work in Notepad++?
I tried ^* but it says invalid regular expression.
Similarly, ^[\\*] doesn't work either.
For normal characters it works.

You can use ^(\s+)?\*.+

Replace each match of this regex:
(?:(?<=\s)|(?<=^))\*
with a blank string.
Explanation:
(?<=\s) - positive lookbehind to make sure that the current position is preceded by a whitespace character
| - OR
(?<=^) - positive lookbehind to make sure that the current position is at the start of the line
\* - if either of the above conditions is satisfied, match the *
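As a minimal sketch of the same lookbehind idea outside Notepad++, the JavaScript below applies the pattern with the g and m flags so that ^ also matches at the start of every line. The variable names and the sample text are assumptions made for illustration, and lookbehind needs a reasonably recent JavaScript engine.

```javascript
// Sample input (assumed for illustration): starred words at line start,
// after whitespace, and one star in the middle of a word.
const input = "*finish a*finish *start\n*second line";

// g: replace every match; m: let ^ match at the start of each line.
const leadingStar = /(?:(?<=\s)|(?<=^))\*/gm;

const output = input.replace(leadingStar, "");
console.log(output); // "finish a*finish start\nsecond line"
```

The * inside a*finish survives because it is preceded by a letter, not by whitespace or a line start.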

I think you can use this:
Press Ctrl+H
Fill in Find what: (^|\s)\*(.+?)(\s|$)
Fill in Replace with: \1\2\3
Explanation:
(^|\s) => Group 1: start of line -^- or any white-space character -\s-
\* => * character
(.+?) => Group 2: one or more characters, matched lazily (as few as possible) until the rest of the pattern can match
(\s|$) => Group 3: any white-space character -\s- or end of line -$-
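The same replacement can be sketched in JavaScript, where the backreferences are written $1$2$3 instead of \1\2\3. The sample strings are assumptions; the second one illustrates a limitation of this approach: because Group 3 consumes the whitespace after a word, that whitespace is no longer available as Group 1 for a starred word that follows immediately.

```javascript
const starredWord = /(^|\s)\*(.+?)(\s|$)/gm;

// Keep the three groups and drop only the * between Group 1 and Group 2.
const stripStars = (text) => text.replace(starredWord, "$1$2$3");

const single = stripStars("*finish a*finish");
const adjacent = stripStars("*one *two");

console.log(single);   // "finish a*finish"
console.log(adjacent); // "one *two" (the second * is left in place)
```

Running the replacement a second time, or using a lookbehind that does not consume the whitespace, would catch the remaining starred word.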

Match string between delimiters, but ignore matches with specific substring

I have to parse all the text inside parentheses, but not the text that contains "GST".
e.g:
(AUSTRALIAN RED CROSS – ATHERTON)
(Total GST for this Invoice $1,104.96)
today for a quote (07) 55394226 − admin.nerang#waste.com.au − this applies to your Nerang services.
expected parsed value:
AUSTRALIAN RED CROSS – ATHERTON
I am trying:
^\(((?!GST).)*$
But it's only matching the value and not grouping correctly.
https://regex101.com/r/HndrUv/1
What would be the correct regex for the same?
This regex should work to get the expected string:
^\((?!.*GST)(.*)\)$
It first checks that the text does not match the regular expression .*GST. If that check passes, it then captures the entire text.
(?!.*GST)(.*)
All of that is then surrounded by \( and \) to keep the parentheses out of the capturing group.
\((?!.*GST)(.*)\)
Finally, add the BOL and EOL anchors and you get the result.
^\((?!.*GST)(.*)\)$
The expected value is saved in the first capture group (.*).
You can use
^\((?![^()]*\bGST\b)([^()]*)\)$
Details:
^ - start of string
\( - a ( char
(?![^()]*\bGST\b) - a negative lookahead that fails the match if, immediately to the right of the current location, there are zero or more chars other than ) and ( and then GST as a whole word (remove \bs if you do not need whole word matching)
([^()]*) - Group 1: any zero or more chars other than ) and (
\) - a ) char
$ - end of string
Bonus:
If substrings in longer texts need to be matched, too, you need to remove ^ and $ anchors in the above regex.
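A minimal sketch of this pattern in JavaScript, run against the three sample lines from the question. The m flag is an assumption added here so that ^ and $ anchor to each line of a multi-line string; the variable names are hypothetical.

```javascript
const text = [
  "(AUSTRALIAN RED CROSS – ATHERTON)",
  "(Total GST for this Invoice $1,104.96)",
  "today for a quote (07) 55394226 − admin.nerang#waste.com.au − this applies to your Nerang services.",
].join("\n");

// g: find every qualifying line; m: ^ and $ anchor to each line.
const pattern = /^\((?![^()]*\bGST\b)([^()]*)\)$/gm;

// Group 1 holds the text between the parentheses.
const parsed = [...text.matchAll(pattern)].map((m) => m[1]);
console.log(parsed);
```

Only the first line survives: the second is rejected by the lookahead, and the third does not start with a parenthesis, so the anchored pattern never applies to it.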

How to find the first occurrence of sub-strings not ended with specified characters

I want to select the first occurrence of a letters-only string that is not followed by any of the characters ".", ":" and ";"
For example:
"float a bbc 10" --> "float"
"float.h" --> null
"float:: namespace" --> "namespace"
"float;" --> null
I came up with the regex \G([A-z]+)(?![:;\.]) but it only drops the last character before the banned characters, while I need it to skip the whole string before the banned characters.
You may use
/(?<!\S)[A-Za-z]++(?![:;.])/
To get the first match only, make sure not to use the g modifier.
The main trick here is to use a possessive ++ quantifier to match all consecutive letters and check for :, ; or . only once, right after the last of the matched letters.
Pattern details
(?<!\S) - either whitespace or start of string should immediately precede the current location
[A-Za-z]++ - 1+ letters matched possessively allowing no backtracking into the pattern
(?![:;.]) - a negative lookahead that fails the match if there is a ;, : or . immediately to the right of the current location.
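JavaScript regular expressions have no possessive quantifiers, so the pattern above cannot be used there as written. As a sketch of a swapped-in technique (not the answer's own pattern), adding the letters themselves to the negative lookahead gives the same effect: the engine cannot give back a letter to dodge the check, because the next character would then be a letter. The helper name firstWord is hypothetical.

```javascript
// No g flag, so only the first match is returned.
const wordNotBeforePunct = /(?<!\S)[A-Za-z]+(?![A-Za-z:;.])/;

// Returns the first qualifying word, or null when there is none.
function firstWord(text) {
  const m = text.match(wordNotBeforePunct);
  return m ? m[0] : null;
}

console.log(firstWord("float a bbc 10"));    // "float"
console.log(firstWord("float.h"));           // null
console.log(firstWord("float:: namespace")); // "namespace"
console.log(firstWord("float;"));            // null
```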

A regular expression for matching a group followed by a specific character

So I need to match the following:
1.2.
3.4.5.
5.6.7.10
((\d+)\.(\d+)\.((\d+)\.)*) works fine for the very first line, but the problem is that there can be one line or several.
\n only appears if there is more than one line.
As a string, I get it like this: "1.2.\n3.4.5.\n1.2."
So my issue is: if there is only one line, there need not be a \n at the end, but if there is more than one line, every line except the very last must end with \n.
Here is the pattern I suggest:
^\d+(?:\.\d+)*\.?(?:\n\d+(?:\.\d+)*\.?)*$
Here is a brief explanation of the pattern:
^ from the start of the string
\d+ match a number
(?:\.\d+)* followed by dot, and another number, zero or more times
\.? followed by an optional trailing dot
(?:\n followed by a newline
\d+(?:\.\d+)*\.?)* and another dotted number sequence, zero or more times
$ end of the string
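A short JavaScript check of this pattern, as a sketch. The helper name isValidBlock and the two negative examples are assumptions for illustration; without the m flag, ^ and $ anchor to the whole string, so a stray trailing newline is rejected.

```javascript
const block = /^\d+(?:\.\d+)*\.?(?:\n\d+(?:\.\d+)*\.?)*$/;

// True when every line is a dotted number sequence and the lines are
// separated (not terminated) by a single \n.
const isValidBlock = (text) => block.test(text);

console.log(isValidBlock("1.2."));               // true
console.log(isValidBlock("1.2.\n3.4.5.\n1.2.")); // true
console.log(isValidBlock("1.2.\n"));             // false
console.log(isValidBlock("1.2.\nthis 1.2."));    // false
```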
You might check if there is a newline at the end using a positive lookahead (?=.*\n):
(?=.*\n)(\d+)\.(\d+)\.((\d+)\.)*
Edit
You could use an alternation to either match when on the next line there is the same pattern following, or match the pattern when not followed by a newline.
^(?:\d+\.\d+\.(?:\d+\.)*(?=.*\n\d+\.\d+\.)|\d+\.\d+\.(?:\d+\.)*(?!.*\n))
^ Start of string
(?: Non capturing group
\d+\.\d+\. Match 1+ digits and a dot, twice
(?:\d+\.)* Repeat 0+ times matching 1+ digits and a dot
(?=.*\n\d+\.\d+\.) Positive lookahead, assert that what follows is a newline and then the start of the same pattern
| Or
\d+\.\d+\. Match 1+ digits and a dot, twice
(?:\d+\.)* Repeat 0+ times matching 1+ digits and a dot
(?!.*\n) Negative lookahead, assert that no newline follows
) Close non capturing group
(\d+\.*)+\n* will match the text you provided. If you need to make sure the final line also ends with a . then (\d+\.)+\n* will work.
Most programming languages offer the m flag, which is the multiline modifier. Enabling it lets $ match at the end of each line as well as at the end of the string.
The solution below only appends $ to your current regex and sets the m flag. How the flag is set may vary depending on your programming language.
var text = "1.2.\n3.4.5.\n1.2.\n12.34.56.78.123.\nthis 1.2. shouldn't hit",
    regex = /((\d+)\.(\d+)\.((\d+)\.)*)$/gm,
    match;
while ((match = regex.exec(text)) !== null) {
    console.log(match);
}
You could simplify the regex to /(\d+\.){2,}$/gm, then split the full match based on the dot character to get all the different numbers. I've given a JavaScript example below, but getting a substring and splitting a string are pretty basic operations in most languages.
var text = "1.2.\n3.4.5.\n1.2.\n12.34.56.78.123.\nthis 1.2. shouldn't hit",
    regex = /(\d+\.){2,}$/gm;
/* Slice is used to drop the dot at the end, otherwise resulting in
 * an empty string on split.
 *
 * "1.2.3.".split(".") //=> ["1", "2", "3", ""]
 * "1.2.3.".slice(0, -1) //=> "1.2.3"
 * "1.2.3".split(".") //=> ["1", "2", "3"]
 */
console.log(
    text.match(regex)
        .map(match => match.slice(0, -1).split("."))
);
For more info about regex flags/modifiers have a look at: Regular Expression Reference: Mode Modifiers

Notepad++ remove all non regex'd text

I have a large list of URLs, each containing a unique numeric string that falls between a / and a ?. I would like to remove all other text in Notepad++, leaving only these strings. For example,
www.website.com/dsw/fv3n24nv1e4121v/123456789012?fwe=32432fdwe23f3 would end up as only 123456789012
I have figured out that the regex \b\d{12}\b will get me the 12 digits; now I just need to remove everything on either side of them. I have had a look and found some posts that suggest replacing with \t$1, $1\n, $1, and /1; however, all of these do the exact opposite of what I want and just remove the 12-digit string.
You can use this regex and replace it with empty string,
^[^ ]*\/|\?[^ ]*$
Demo
Explanation:
^[^ ]*\/ --> Matches anything except a space, from the start of the string up to the last / it can find
\?[^ ]*$ --> Similarly, this matches anything except a space, starting from ? until the end of the input.
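A sketch of the same two-sided trim in JavaScript. One assumption worth flagging: [^ ] also matches line breaks, so for a multi-line list this example splits the text and applies the pattern to one URL at a time. The second URL is made up for illustration.

```javascript
const urls = [
  "www.website.com/dsw/fv3n24nv1e4121v/123456789012?fwe=32432fdwe23f3",
  "www.website.com/abc/999999999999?x=1", // hypothetical second entry
].join("\n");

// First alternative removes everything up to the last "/",
// second alternative removes everything from "?" to the end.
const outside = /^[^ ]*\/|\?[^ ]*$/g;

const ids = urls.split("\n").map((line) => line.replace(outside, ""));
console.log(ids); // ["123456789012", "999999999999"]
```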
Ctrl+H
Find what: ^.*/([^?\r\n]+).*$
Replace with: $1
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
.* # 0 or more any character but newline
/ # a slash
([^?\r\n]+) # group 1, 1 or more any character that is not ? or line break
.* # 0 or more any character but newline
$ # end of line
Result for given example:
123456789012
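For comparison, a JavaScript sketch of this find-and-replace, using the pattern as spelled out in the explanation (with \r\n excluded from group 1). The g and m flags stand in for Replace all and for line-by-line anchoring; the second URL is a made-up example.

```javascript
const list =
  "www.website.com/dsw/fv3n24nv1e4121v/123456789012?fwe=32432fdwe23f3\n" +
  "www.website.com/abc/999999999999?x=1"; // hypothetical second entry

// . does not match line breaks in JavaScript by default,
// which mirrors leaving ". matches newline" unchecked.
const wholeLine = /^.*\/([^?\r\n]+).*$/gm;

// Replace each whole line with the contents of group 1.
const result = list.replace(wholeLine, "$1");
console.log(result); // "123456789012\n999999999999"
```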

Regex that matches every nth occurrence of a character

I have found solutions for finding the nth occurrence, but could not find anything about finding every nth occurrence.
I have a string such as "key1~value1~key2~value2~key3~value3~".
What is the regex that will match every second occurrence of the ~?
key1~value1~key2~value2~key3~value3~
I am trying to create a custom Pattern Analyzer for Elasticsearch; that is, the regex should match the token separators instead of the tokens.
You may use
~(?=(?:[^~]*~[^~]*~)*[^~]*$)
The pattern matches:
~ - a tilde that is followed by...
(?=(?:[^~]*~[^~]*~)*[^~]*$) - a positive lookahead requiring zero or more repetitions of the sequence "0+ non-tildes, ~, 0+ non-tildes, ~", and then 0+ non-tildes up to the end of the string. This check makes sure that an even number of tildes remains between the matched tilde and the end of the string.
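A quick JavaScript sketch of this lookahead on the string from the question. Using the pattern as a separator in split is only a stand-in for how a tokenizer might use it, not Elasticsearch configuration; the trailing empty string appears because the last ~ is itself a separator.

```javascript
const input = "key1~value1~key2~value2~key3~value3~";

// Matches a ~ that has an even number of ~ between it and the end.
const everySecondTilde = /~(?=(?:[^~]*~[^~]*~)*[^~]*$)/g;

const positions = [...input.matchAll(everySecondTilde)].map((m) => m.index);
const tokens = input.split(everySecondTilde);

console.log(positions); // [11, 23, 35]
console.log(tokens);    // ["key1~value1", "key2~value2", "key3~value3", ""]
```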
You need to ensure that there is not an even number of ~ before it:
(?<!^([^~]*~[^~]*~)*[^~]*)~
How it works:
(?<!^([^~]*~[^~]*~)*[^~]*)~  Our regex.
                          ~  Matches a tilde (~).
(?<!                     )   Assert that before it is not:
    ^                        the beginning
     (            )*         followed by zero or more times:
      [^~]*~[^~]*~           two tildes, no matter what comes within
                    [^~]*    followed by non-tildes.
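Counting from the start rather than from the end matters when the total number of tildes is odd. The JavaScript sketch below contrasts this lookbehind with the lookahead pattern given earlier in the thread, on a made-up three-tilde string. It assumes an engine with variable-length lookbehind support, such as V8 in Node.js.

```javascript
const sample = "a~b~c~"; // hypothetical input with an odd number of tildes

const countFromStart = /(?<!^([^~]*~[^~]*~)*[^~]*)~/g;
const countFromEnd = /~(?=(?:[^~]*~[^~]*~)*[^~]*$)/g;

// Collect the index of every match of the given pattern in the sample.
const indexes = (re) => [...sample.matchAll(re)].map((m) => m.index);

console.log(indexes(countFromStart)); // [3]    -> only the 2nd tilde
console.log(indexes(countFromEnd));   // [1, 5] -> the 1st and 3rd tildes
```

On the question's own string, which has an even number of tildes, both patterns select the same positions.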
Take the first group of each non-overlapping match of ~.*?(~). Try: http://regexr.com/3dc15.