Notepad ++ Regular Expressions

Notepad ++ Replacing Multiple Words
Okay, so here's what I need to know. Currently I am searching for multiple words at once; here's some sample data:
(\bACCESS\b)|(\bAccs\b)|(\bALLEY\b)|(\bAlly\b)|(\bALLEYWAY\b)
What I want to do is add a ":" to the end of every word that is found, like this:
41 dwadadad Rd:
93 awdawdadawd Terrace:
4/100 awdadawdwad St:
32 awdawdawdawd Ave:
59 awdawdawd Street: Ferny Grove
Is there a regular expression for only getting the end of the matched word?

I suggest using an alternation list with just two word boundaries - at the start and end of the pattern, and just one group:
\b(?:Rd|Terrace|St|Ave|Street)\b
And replace with $0: (the $0 backreference refers to the whole match, so if the pattern matched Rd, then Rd is inserted in the resulting string, followed by the colon).
Note that we can use just 2 \b because they enclose the non-capturing alternation group (?:...), and are thus applied to each alternative. This shortens the regex and speeds it up.
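For readers who want to try the idea outside Notepad++, here is a minimal Python sketch of the same find/replace. The function name add_colon is made up for illustration, and \g<0> is Python's spelling of the whole-match backreference that Notepad++ writes as $0.

```python
import re

# One non-capturing alternation group with a single \b on each side.
STREET_TYPES = re.compile(r"\b(?:Rd|Terrace|St|Ave|Street)\b")

def add_colon(line: str) -> str:
    """Append ':' to every matched street type (hypothetical helper)."""
    # \g<0> inserts the whole match, like $0 in Notepad++.
    return STREET_TYPES.sub(r"\g<0>:", line)

print(add_colon("59 awdawdawd Street Ferny Grove"))
# → 59 awdawdawd Street: Ferny Grove
```

Because of the trailing \b, St does not match inside Street, so the order of the alternatives does not matter here.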

All you have to do is change your regex to:
((\bACCESS\b)|(\bAccs\b)|(\bALLEY\b)|(\bAlly\b)|(\bALLEYWAY\b))
And then replace with: \1:
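As a quick check of this variant, the sketch below runs the wrapped pattern through Python's re module, where \1 also refers to the outer group. The helper name tag_words and the test strings are assumptions for illustration only.

```python
import re

# The original alternation wrapped in one extra outer group (group 1).
WORDS = re.compile(
    r"((\bACCESS\b)|(\bAccs\b)|(\bALLEY\b)|(\bAlly\b)|(\bALLEYWAY\b))"
)

def tag_words(text: str) -> str:
    """Append ':' to each matched word via the outer group (hypothetical helper)."""
    return WORDS.sub(r"\1:", text)

print(tag_words("ACCESS Accs ALLEYWAY"))
# → ACCESS: Accs: ALLEYWAY:
```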

Related

Regex for matching 4 words within 20 words distance in a text

I'm trying to create a regular expression which can match 4 words in a text within a distance of 20 words; the words should also appear in order.
For 2 words within a 10-word range I could use the pattern below:
"\b(?:word1\W+(?:\w+\W+){0,10}?word2)"
But I am unable to extend it to match 4 words, so I am looking for suggestions.
Example:
sample,regular, validation, transformation - These 4 words should be present within 20 words in the below text
This is a sample regular expression text. Regular expression is used for string validation, parsing and transformation.The term "Regular expression" is typically abbreviated as "RegEx" or "regex".
Thanks in advance.

I am assuming you want up to 20 words between the first and the last word, which makes 22 words in total.
What you could do is check that the first and last words are within the desired distance, and use a lookahead (?=...) after the first word to check for the two middle words in the desired order. Negative lookaheads (?!...) keep that check from skipping past the last word.
/\b(sample)\W+ # first word (group 1)
(?= # look ahead
(?:(?!(?4))\w+\W+)*?(regular)\W+ # for word 2 (group 2)
(?:(?!(?4))\w+\W+)*?(validation) # for word 3 (group 3)
) # eof lookahead
(?:\w+\W+){0,20}?(transformation\b) # last word (group 4)
/ix
In this pcre demo at regex101 I put some capture groups for highlighting the matched words and used a pattern reference (?4) to the last word pattern for making the regex shorter.
Used the flags i (case insensitive), x (free spacing/comment mode)
You can improve performance by dropping the capturing groups and the reference. Just dropping the reference would also make the pattern more compatible with other regex flavors like JavaScript.
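To illustrate the reference-free variant, here is a sketch in Python, whose built-in re module has no (?4)-style pattern references as far as I know. Each (?4) is simply written out as transformation\b and the capture groups are dropped. The function name has_four_words is an assumption for illustration.

```python
import re

# Same structure as the pattern above, with (?4) expanded by hand.
FOUR_WORDS = re.compile(
    r"""
    \bsample\W+                                     # first word
    (?=                                             # look ahead
      (?:(?!transformation\b)\w+\W+)*?regular\W+    # for word 2
      (?:(?!transformation\b)\w+\W+)*?validation    # for word 3
    )                                               # end of lookahead
    (?:\w+\W+){0,20}?transformation\b               # last word
    """,
    re.IGNORECASE | re.VERBOSE,
)

def has_four_words(text: str) -> bool:
    """True if the four words occur in order within the word limit."""
    return FOUR_WORDS.search(text) is not None

text = ("This is a sample regular expression text. Regular expression is "
        "used for string validation, parsing and transformation.")
print(has_four_words(text))
# → True
```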
If you're using PHP and just want to check whether the 4 words are in order within the desired range, you could also use a simpler regex and count the words with str_word_count().
$pattern = '/\bsample\b.*?\bregular\b.*?\bvalidation\b.*?\btransformation\b/is';
if(preg_match($pattern, $str, $out) && (str_word_count($out[0]) <= 22))
{ /* do something */ }
See this php demo at eval.in
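The same two-step idea can be sketched in Python. This is only an approximation of the PHP snippet: counting runs of \w+ stands in for str_word_count(), which treats apostrophes, hyphens and digits differently, and the function name in_order_within is made up for illustration.

```python
import re

# Simple "in order" check; re.DOTALL plays the role of PHP's s flag.
ORDERED = re.compile(
    r"\bsample\b.*?\bregular\b.*?\bvalidation\b.*?\btransformation\b",
    re.IGNORECASE | re.DOTALL,
)

def in_order_within(text: str, max_words: int = 22) -> bool:
    """Check order with a regex, then count the words in the matched span."""
    m = ORDERED.search(text)
    if m is None:
        return False
    # Rough stand-in for str_word_count(): count runs of word characters.
    return len(re.findall(r"\w+", m.group(0))) <= max_words
```

Like the PHP version, this only inspects the first match, so a text with several occurrences of sample may need a loop over all matches.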

Regular Expression to Anonymize Names

I am using Notepad++ and the Find and Replace pattern with regular expressions to alter usernames such that only the first and last character of the screen name is shown, separated by exactly four asterisks (*). For example, "albobz" would become "a****z".
Usernames are listed directly after the cue "screen_name: " and I know I can find all the usernames using the regular expression:
screen_name:\s([^\s]+)
However, this expression won't store the first or last letter and I am not sure how to do it.
Here is a sample line:
February 3, 2018 screen_name: FR33Q location: Europe verified: false lang: en

Method 1
You have to work with \G meta-character. In N++ using \G is kinda tricky.
Regex to find:
(?>(screen_name:\s+\S)|\G(?!^))\S(?=\S)
Breakdown:
(?> Construct a non-capturing group (atomic)
( Beginning of first capturing group
screen_name:\s+\S Match up to first letter of name
) End of first CG
| Or
\G(?!^) Continue from previous match
) End of NCG
\S Match a non-whitespace character
(?=\S) Up to last but one character
Replace with:
\1*
Live demo
Method 2
The above solution substitutes each inner character with a *, so the length remains intact. If you want to insert exactly four *s regardless of length, you would search for:
(screen_name:\s+\S)(\S*)(\S)
and replace with: \1****\3
Live demo
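Both behaviours can be tried in Python with the sketch below. Python's built-in re module does not support \G as far as I know, so the length-preserving variant uses a replacement function instead of the Method 1 pattern; that is a swapped-in technique, not the regex above. The function names are assumptions for illustration.

```python
import re

def mask_fixed(line: str) -> str:
    """Method 2: keep first and last character, always insert four asterisks."""
    return re.sub(r"(screen_name:\s+\S)(\S*)(\S)", r"\1****\3", line)

def mask_keep_length(line: str) -> str:
    """Same result as Method 1 (one * per inner character), via a callback."""
    def repl(m: re.Match) -> str:
        prefix, name = m.group(1), m.group(2)
        if len(name) < 3:          # nothing between first and last character
            return m.group(0)
        return prefix + name[0] + "*" * (len(name) - 2) + name[-1]
    return re.sub(r"(screen_name:\s+)(\S+)", repl, line)

line = "February 3, 2018 screen_name: FR33Q location: Europe verified: false lang: en"
print(mask_fixed(line))
# → February 3, 2018 screen_name: F****Q location: Europe verified: false lang: en
```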

Select digits on the end of line

I need to replace only digits at the end of line with semicolon ; using RegEx in Notepad++.
Before:
ddd 66 ffff 5
d 44 dds 55
After:
ddd 66 ffff;
d 44 dds;
I'm trying to find digits at the end of lines with expression
($)(\d+)
but Notepad++ can't find anything by use of this expression. How to achieve this?

Find:
\s\d+$
Replace:
;
\d+ will match one or more digits. $ will match the end of the line; it is a zero-width assertion, so it consumes no characters (don't worry, the end of the line will not be replaced in a find/replace operation). And so \d+$ will match one or more digits immediately followed by the end of the line.
I included \s (a single whitespace character) because it looks like you want to replace the space preceding the digits as well.
Note that you will need to use "Replace All" for this to work the way you want, because each regex match covers one instance only.

Try this find/replace:
find:
^(.*) \d+$
replace:
\1;
The find regex above matches anything up to and excluding a final space followed by at least one digit. If the end pattern for a given line is not a space followed by one or more digits, the regex should not match. The replacement is the capture group (the part in parentheses), which is everything up to but excluding the final space and number.
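Both suggestions can be compared side by side in Python. This is a sketch under the assumption that re.MULTILINE reproduces Notepad++'s per-line meaning of ^ and $; the function names are made up. It also hints at why ($)(\d+) finds nothing: the digits would have to come after the end of the line.

```python
import re

def strip_trailing_digits(text: str) -> str:
    """First answer: replace ' <digits>' at the end of each line with ';'."""
    return re.sub(r"\s\d+$", ";", text, flags=re.MULTILINE)

def strip_trailing_digits_group(text: str) -> str:
    """Second answer: keep group 1 (everything before the final space)."""
    return re.sub(r"^(.*) \d+$", r"\1;", text, flags=re.MULTILINE)

before = "ddd 66 ffff 5\nd 44 dds 55"
print(strip_trailing_digits(before))
# → ddd 66 ffff;
# → d 44 dds;
```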

Regexp: find out if value that repeats several times

I have strings:
TH 8H 5C QS TC
9S 4S JS KS JS
I want the second one to be picked up by a regexp. Please help me construct the necessary expression.
What I have tried so far is S{5}, but of course it looks for five consecutive S characters.
Could I avoid specifying which character I am looking for? I need 5 repetitions of any character. Could it be something like .{5}?
Thanks in advance!

If you have standalone strings, use
^\wS(?: \wS){4}$
See the regex demo
If these strings appear inside a larger text, replace the ^ and $ anchors with word boundaries \b:
\b\wS(?: \wS){4}\b
See another demo
Note that \w matches any alphanumeric or underscore character. If there can be any non-whitespace character, use \S instead:
\b\SS(?: \SS){4}\b
One more demo
\SS will match a non-whitespace character followed by an S, and (?: \SS){4} will match 4 more such sequences (thus, there will be 5 two-character sequences with S at the end of each).
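A short Python check of the first pattern follows. The second pattern is not from the answer: it is an assumed variation that addresses the "any character" part of the question by capturing the first suit and requiring it again through the backreference \1. The constant names are hypothetical.

```python
import re

# From the answer: five cards, each ending in S.
ALL_S = re.compile(r"^\wS(?: \wS){4}$")

# Assumed variation: capture the first suit, then require it four more times.
SAME_SUIT = re.compile(r"^\w(\w)(?: \w\1){4}$")

hands = ["TH 8H 5C QS TC", "9S 4S JS KS JS"]
print([bool(ALL_S.search(h)) for h in hands])
# → [False, True]
```

With SAME_SUIT, a hand such as 2H 3H 4H 5H 6H would also match, since the suit letter is no longer fixed to S.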

Regex that only matches on odd/even indices

Is there a regex that matches a string only when it starts on an odd or an even index? My use case is a hex string in which I want to replace certain "bytes".
Now, when trying to match 20 (space), 20 in "7209" would be matched as well even though it consists of the bytes 72 and 09. I am restricted to the regex implementation of Notepad++ in this case, so I'm not able to check the match index as e.g. in Java.
My sample input looks like:
324F8D8A20561205231920
I set up a testing page here. The regex should only match the first and the last occurrence of 20, since the one in the middle starts at an odd index.

You can use the following regex to match 20 at even positions inside a hex string:
20(?=(?:[\da-fA-F]{2})*$)
See demo
I assume the string has no spaces in this case.
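The lookahead can be checked against the sample input with a small Python sketch. The replacement value XX and the function name are placeholders, and the approach assumes the whole string has an even number of hex digits so that counting pairs back from the end lines up with byte boundaries.

```python
import re

# Match "20" only when an even number of hex digits follows up to the end.
EVEN_20 = re.compile(r"20(?=(?:[\da-fA-F]{2})*$)")

def replace_space_bytes(hex_string: str, replacement: str = "XX") -> str:
    """Replace byte-aligned 20 values; 'XX' is a placeholder replacement."""
    return EVEN_20.sub(replacement, hex_string)

print(replace_space_bytes("324F8D8A20561205231920"))
# → 324F8D8AXX5612052319XX
```

The 20 inside 1205 is left alone because an odd number of digits follows it.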
In case you have spaces between the values (or any other symbols), this could be an alternative (with $1XX-like replacement string):
((?:.{2})*?)20
See another demo

This seems to work for evens:
rx <- "^(.{2})*(20)"
strings <- c("7209","2079","9720")
grepl(rx,strings) # [1] FALSE TRUE TRUE

Not sure what Notepad++ uses for its regex engine; it's been a while since I used it. This works in JavaScript...
/^(?:..)*?(20)/
...
/^ # start regex
(?: # non capturing group
.. # any character (two times)
)*? # close group, and repeat zero or more times, un-greedily
(20) # capture `20` in group
/ # end regex
