notepad++ remove text between two string using regular expression - regex

I want to remove the text between two strings using a regular expression in Notepad++. Here is my full string:
[insertedOn]) VALUES (1, N'1F9ACCD2-3B60-49CF-830B-42B4C99F6072',
I want the final string to look like this:
[insertedOn]) VALUES (N'1F9ACCD2-3B60-49CF-830B-42B4C99F6072',
Here I removed 1, from the string. The numbers 1, 2, 3 and so on are in incremental order.
I tried a lot of expressions, but none of them worked. Here is one of them: (VALUES ()(?s)(.*)(, N')
How can I remove this?

You may use
(VALUES \().*?,\s*(N')
and replace with $1$2. If the part of the string to be removed can contain line breaks, enable the ". matches newline" option. If N and VALUES must be matched only when in ALL CAPS, make sure the Match case option is checked.
Pattern details
(VALUES \() - Group 1 (later referred with $1 from the replacement pattern): a literal substring VALUES (
.*? - any 0+ chars, as few as possible, up to the leftmost occurrence of the subsequent subpatterns
,\s* - a comma and 0+ whitespaces (use \h instead of \s to only match horizontal whitespace chars)
(N') - Group 2 (later referred with $2 from the replacement pattern): a literal substring N'.
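As a rough sketch outside Notepad++, the same search and replace can be tried with Python's re module, which writes the group references as \1 and \2 rather than $1 and $2. The variable names here are illustrative only.

```python
import re

# Sample row taken from the question.
row = "[insertedOn]) VALUES (1, N'1F9ACCD2-3B60-49CF-830B-42B4C99F6072',"

# Group 1 keeps "VALUES (", the lazy .*? consumes the number,
# and group 2 keeps the "N'" prefix of the GUID literal.
pattern = re.compile(r"(VALUES \().*?,\s*(N')")

cleaned = pattern.sub(r"\1\2", row)
print(cleaned)
```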

You should first escape the literal ( after VALUES: \(
Once that is done, .* in your regex, combined with the s (DOTALL) flag, makes the engine greedily match up to the end of the input string and then backtrack until it reaches the last occurrence of , N' in the input, which produces unexpected matches.
To improve your regex you should 1) make .* ungreedy, 2) remove (?s), 3) escape (:
(VALUES \().*?, (N')
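The difference between the greedy and the ungreedy version can be checked with a small Python sketch. The two sample rows are made up for illustration, and re.DOTALL stands in for the (?s) flag.

```python
import re

# Two hypothetical rows, each containing more than one ", N'" sequence.
rows = "VALUES (1, N'a', N'b')\nVALUES (2, N'c', N'd')"

# Greedy .* with DOTALL: runs to the end of the input, then backtracks
# to the LAST ", N'" in the whole text, swallowing everything in between.
greedy = re.sub(r"(VALUES \().*, (N')", r"\1\2", rows, flags=re.DOTALL)

# Lazy .*? without DOTALL: stops at the first ", N'" on each line.
lazy = re.sub(r"(VALUES \().*?, (N')", r"\1\2", rows)

print(greedy)
print(lazy)
```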
To be more precise in matching you'd better search for:
VALUES \(\K\d+, *(?=N')
and replace with nothing.
Breakdown:
VALUES \( Match VALUES ( literally
\K Reset the start of the reported match
\d+, * Match digits preceding a comma and optional spaces
(?=N') Followed by N'
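Python's built-in re module does not support \K, so the sketch below swaps in a fixed-width lookbehind to get the same effect: only the digits, the comma and the optional spaces are part of the match, and they are replaced with nothing. This is a substitute technique, not the exact Notepad++ pattern.

```python
import re

row = "[insertedOn]) VALUES (1, N'1F9ACCD2-3B60-49CF-830B-42B4C99F6072',"

# (?<=VALUES \() asserts that "VALUES (" precedes the match without consuming it,
# which plays the role of "VALUES \(\K" in the Notepad++ pattern.
digits_only = re.compile(r"(?<=VALUES \()\d+, *(?=N')")

stripped = digits_only.sub("", row)
print(stripped)
```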

Related

Using regex to duplicate a selection and replacing some characters

Probably a terrible title.
I am trying to take the following:
Joe Dane
Bob Sagget
Whitney Houston
Some
Other
Test
And trying to produce:
JOE_DANE("Joe Dane"),
BOB_SAGGET("Bob Sagget"),
WHITNEY_HOUSTON("Whitney Houston"),
SOME("Some"),
OTHER("Other"),
TEST("Test"),
I'm using Notepad++ and am close but not good enough at regex to figure out the remaining expression. So far, this is what I have:
Find what: (^.*)
Replace with: \1 \(\"\1\"\),
Produces: Joe Dane("Joe Dane"),
I've tried replacing with: \U$1 \(\"\1\"\), but this also converts the second instance of \1 to upper case. It also does not replace the whitespace with an underscore _.
This can be done in a single step.
If you don't have more than 2 words in a line:
Ctrl+H
Find what: ^(\S+)(?: (\S+))?$
Replace with: \U$1(?2_$2)\E\("$0"\),
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
^ # beginning of line
(\S+) # group 1, 1 or more non space
(?: (\S+))? # non capture group, a space, group 2, 1 or more non space, optional
$
Replacement:
\U # uppercased
$1 # group 1
(?2_$2) # if group 2 exists, add an underscore before it
\E # end uppercase
\("$0"\), # the whole match with parens and quote
If you have more than 2 words (up to 5), use:
Find ^(\S+)(?: (\S+))?(?: (\S+))?(?: (\S+))?(?: (\S+))?
Replace: \U$1(?2_$2)(?3_$3)(?4_$4)(?5_$5)\E\("$0"\),
If you have more than five words, add as many (?: (\S+))? as needed.
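Outside Notepad++, neither \U...\E nor the conditional (?2_$2) replacement is available in Python's re module. A minimal sketch of the same single-step idea uses a replacement function instead, which also removes the limit on the number of words. The function name to_enum_entry is made up for this example.

```python
import re

def to_enum_entry(match):
    """Build NAME_IN_CAPS("Original Name"), from one matched line."""
    original = match.group(0)
    constant = "_".join(word.upper() for word in original.split(" "))
    return constant + '("' + original + '"),'

names = "Joe Dane\nBob Sagget\nWhitney Houston\nSome"

# One or more space-separated words filling a whole line (re.M makes ^ and $ per-line).
result = re.sub(r"^\S+(?: \S+)*$", to_enum_entry, names, flags=re.M)
print(result)
```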
You might do it in 2 steps, first matching any char 1 or more times from the start of the string.
Find what
^.+
For the first replacement you can use \E to end the activation of \U and use the full match $0
Replace with
\U$0\E\("$0"\),
For the second step, to replace the spaces with underscores, you could skip over the text between parentheses, and match spaces between uppercase chars.
Find what
\(".*?"\)(*SKIP)(*F)|[A-Z]+\K\h+(?=[A-Z])
\(".*?"\) Match from (" till ")
(*SKIP)(*F)| Skip this part of the match
[A-Z]+\K Match uppercase chars and use \K to clear the current match buffer (forget what has been matched so far)
\h+(?=[A-Z]) Match 1+ horizontal whitespace chars and assert an uppercase char to the right
Replace with _
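The two steps can also be sketched in Python. Its re module has no (*SKIP)(*F), \K or \h, so this version substitutes a common workaround: the quoted part is matched by the first alternative and returned unchanged, while a lookbehind and [ \t] replace \K and \h. The function name two_step is hypothetical.

```python
import re

def two_step(text):
    # Step 1: upper-case each line and append the original in ("...").
    step1 = re.sub(
        r"^.+",
        lambda m: m.group(0).upper() + '("' + m.group(0) + '"),',
        text,
        flags=re.M,
    )
    # Step 2: either consume a ("...") block untouched, or replace
    # horizontal whitespace that sits between two uppercase letters.
    return re.sub(
        r'\(".*?"\)|(?<=[A-Z])[ \t]+(?=[A-Z])',
        lambda m: m.group(0) if m.group(0).startswith("(") else "_",
        step1,
    )

print(two_step("Joe Dane\nSome"))
```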

Regular expression to replace content parentheses and their contents

I am looking for a regular expression that will replace parentheses and the string within them if the string contains anything that is not a digit.
The string can be any combination of characters including numbers, letters, spaces etc.
For example:
(3) will not be replaced
(1234) will not be replaced
(some letters) will be replaced
(some letters, spaces - and numbers 123) will be replaced
So far I have a regex that will replace any parentheses and its content
str = str.replaceAll("\\(.*?\\)","");
I am not good with the syntax of replaceAll, so I am just going to write the way you have written it. But I think I can help you with the regex.
Try this Regex:
\((?=[^)]*[a-zA-Z ])[^)]+?\)
OR an even better one:
\((?!\d+\))[^)]+?\)
Explanation (for 1st Regex)
\( - matches the opening parenthesis
(?=[^)]*[a-zA-Z ]) - Positive Lookahead - checks for 0 or more characters which are not ) followed by a space or a letter
[^)]+? - Matches 1 or more characters which are not )
\) - Finally matches the closing parenthesis
Explanation (for 2nd Regex)
\( - matches the opening parenthesis
(?!\d+\)) - Negative Lookahead - matches only those strings in which the characters between the opening and the closing parenthesis are not ALL digits
[^)]+? - Matches 1 or more characters which are not )
\) - Finally matches the closing parenthesis
Now, you can try your replace statement as follows (in a Java string literal each backslash has to be doubled):
str = str.replaceAll("\\((?=[^)]*[a-zA-Z ])[^)]+?\\)","");
OR
str = str.replaceAll("\\((?!\\d+\\))[^)]+?\\)","");
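To compare the two patterns quickly, here is a small sketch in Python rather than Java (the regex syntax used is the same in both). The sample string and the variable names are made up; the last case shows where the two patterns disagree, since the first one only looks for a letter or a space.

```python
import re

# First pattern: requires a letter or a space somewhere inside the parentheses.
letters_or_space = re.compile(r"\((?=[^)]*[a-zA-Z ])[^)]+?\)")
# Second pattern: rejects only groups made up entirely of digits.
not_all_digits = re.compile(r"\((?!\d+\))[^)]+?\)")

sample = "keep (3) and (1234), drop (some letters) and (some letters, spaces - and numbers 123)."

print(letters_or_space.sub("", sample))
print(not_all_digits.sub("", sample))

# "(12-3)" has no letter or space, so only the second pattern removes it.
print(letters_or_space.sub("", "x(12-3)y"))
print(not_all_digits.sub("", "x(12-3)y"))
```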

Regex to match numbers followed by a specific character

I am so sorry, I know this is a simple question, which is not appropriate here, but I am terrible in regex.
I use preg_match with a pattern of the form (numbers A) to match the following substrings and replace them as shown:
2A -> <i>2A</i>
100 A -> <i>100 A</i>
84.55A -> <i>84.55A</i>
92.1 A -> <i>92.1 A</i>
The numbers can be separated from the character or not
The numbers can be decimal
The letter should not be the beginning of a word (not matching 4 All;
in fact, A should be followed by a space, a period or a line break)
My problem is how to apply OR conditions for a character that may or may not be present, so that there is a single match to be replaced with
$str = preg_replace($pattern, '<i>$1</i>', $str);
I can suggest
'~\b(?<![\d.])\d*\.?\d+\s*A\b~'
Replace with '<i>$0</i>', where $0 is the backreference to the whole match.
Details:
\b - leading word boundary
(?<![\d.]) - a negative lookbehind that fails the match if there is a dot or digit before the current location (NOTE: this is added to avoid matching 33.333.4444 A like strings, just remove if not necessary)
\d*\.?\d+ - a usual simplified float/int value regex (0+ digits, an optional . and 1+ digits) (NOTE: if you need a more sophisticated regex for this, see Matching Floating Point Numbers with a Regular Expression)
\s* - 0+ whitespaces
A\b - a whole word A (here, \b is a trailing word boundary).
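The pattern can be checked against the examples from the question with a short Python sketch (Python is used here instead of PHP; the regex itself is unchanged, and \g<0> is Python's spelling of the whole-match backreference). The function name wrap_values is hypothetical.

```python
import re

VALUE_PATTERN = re.compile(r"\b(?<![\d.])\d*\.?\d+\s*A\b")

def wrap_values(text):
    # \g<0> inserts the whole match, like $0 in the PHP replacement.
    return VALUE_PATTERN.sub(r"<i>\g<0></i>", text)

for sample in ["2A", "100 A", "84.55A", "92.1 A", "4 All", "33.333.4444 A"]:
    print(sample, "->", wrap_values(sample))
```

The last two samples are left untouched: in 4 All the A starts a longer word, and 33.333.4444 A is rejected by the lookbehind.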

reg expression to truncate a string from last dot

I have the following string and I want to strip the last part, starting from the last dot. Could you please advise? I am new to regular expressions.
[abc].[def].[ghi]
Thanks,
mc
The regexp you need is:
(.*?)(?:\.[^.]*)?$
The regexp piece by piece:
( # start of the first capturing sub-pattern
.* # matches any character, any number of times (zero or more)
? # make the previous quantifier (`*`) not greedy
) # end of the first sub-pattern
(?: # start of the second sub-pattern; it doesn't capture the matching string
\. # matches a dot (.)
[^.]* # matches anything but a dot (.), any number of times (zero or more)
) # end of the second sub-pattern
? # the previous sub-expression (the non-capturing sub-pattern) is optional
$ # matches the end of the string
How it works:
The first part (.*?) matches and captures everything up to the last dot. The question mark (?) makes the zero-or-more quantifier (*) not greedy. It is greedy by default and, because the second sub-expression has to be optional (read below), its greediness would make it match the entire string.
The ?: specifier at the start of the second sub-pattern makes it non-capturing. The sub-string it matches is not stored and it's not available for further use.
The second sub-pattern contains \.[^.]* and matches a dot (.) followed by zero or more characters, none of which can be a dot. It doesn't match anything if the input string doesn't contain a dot, and that would make the entire regexp fail to match. This is why it is marked as optional by following it with a question mark (?).
Most tools that work with regexp provide a way to get and use the captured strings using $n or \n as placeholders in the replacement string. n above is the number of the capturing pattern, counting by its open parenthesis (. Since we have only one capturing sub-pattern, the substring it matches should be available either as $1 or \1 (or both, or using a different syntax).
You can play with this regexp on regex101.com.
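As one possible way to use the captured group, here is a minimal Python sketch. It assumes single-line input (the . in the pattern does not cross line breaks), and the function name strip_last_part is made up for the example.

```python
import re

LAST_DOT = re.compile(r"(.*?)(?:\.[^.]*)?$")

def strip_last_part(name):
    # Group 1 holds everything before the last dot,
    # or the whole string when there is no dot at all.
    return LAST_DOT.match(name).group(1)

print(strip_last_part("[abc].[def].[ghi]"))
print(strip_last_part("nodot"))
```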

Why is this regex selecting this text

I am using the regex
(.*)\d.txt
on the expression
MyFile23.txt
Now the online tester says that with the above regex the mentioned string would be allowed (selected). My understanding is that it should not be allowed, because there are two numeric digits, 2 and 3, while the regex has only one numeric digit in it, i.e. \d. It would have to be \d+. As I read it, my current expression says: zero or more of any character, followed by one numeric digit, followed by .txt. My question is: why does the above string pass the regex?
This regex (.*)\d.txt will still match MyFile23.txt because of .* which will match 0 or more of any character (including a digit).
So for the given input: MyFile23.txt here is the breakup:
.* # matches MyFile2
\d # matches 3
. # matches a dot (though it can match anything here due to unescaped dot)
txt # will match literal txt
To make sure it only matches MyFile2.txt you can use:
^\D*\d\.txt$
Where ^ and $ are anchors to match start and end. \D* will match 0 or more non-digit.
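The behaviour described above can be reproduced with a short Python sketch; the file name MyFile2_txt is an invented example that shows the effect of the unescaped dot.

```python
import re

# Original pattern: .* is free to absorb the first digit.
loose = re.search(r"(.*)\d.txt", "MyFile23.txt")
print(loose.group(1))

# The unescaped dot accepts any character in place of the period.
print(bool(re.search(r"(.*)\d.txt", "MyFile2_txt")))

# Anchored pattern with \D* and an escaped dot: exactly one digit is allowed.
strict = re.compile(r"^\D*\d\.txt$")
print(bool(strict.search("MyFile23.txt")))
print(bool(strict.search("MyFile2.txt")))
```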
The pattern you have has one group, (.*), which in your example matches MyFile2
because the . allows any character, including a digit.
Furthermore, the . in the pattern after this group is not escaped, so it allows one more character of any kind.
To avoid this use:
(\D*)\d+\.txt
the group (\D*) would now match all non digit characters.
Here is why your "MyFile23.txt" matches the regex pattern:
The unescaped . matches "any character"; a literal period should always be escaped as \.
And finally, (.*) matches everything from the beginning of the string up to the last digit (MyFile2).
So, I'd suggest the following fix:
^\D*\d\.txt$ = beginning of a line/string, non-digit character, any number of repetitions, a digit, a literal period, a literal txt, and the end of the string/line (depending on the m switch, which depends on the input string, whether you have a list of words on separate lines, or just a separate file name).