Notepad++ regular expression - replace a part of regex - regex

I am fixing a corrupted DB export to a txt file, and I am new to regular expressions.
My corrupted lines can be found using this Notepad++ regular expression:
\r\n[^"]
(find line breaks followed by any character that is not " )
I need to delete these \r\n but preserve the character following them (in my data these are digits).
Desired data:
"USERNAME"|"Text1"|"Text2"|"Spreadsheet" (CR)(LF)
"USERNAME"|"Text1"|"Text2"|"Spreadsheet" (CR)(LF)
Corrupted data:
"USERNAME"|"Text1"|"Text2line1 - #3.50 (CR)(LF)
1 x text2line2 - #5.40 (CR)(LF)
2 x text2line3 #6.75 (CR)(LF)
|"Spreadsheet" (CR)(LF)
Therefore this does not work:
FIND: \r\n[^"]
REPLACE: [^"]
Because this way I would get rid of the "1" and "2" at the beginning of the new line.
I will be grateful for your help :)

Make a minor change to the expression so that it reads \r\n([^"]) (notice the extra ( and )). This will place the match in a regex group.
Then, simply replace that by \1, which is the regex group you are matching in the expression above.
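As a rough illustration outside Notepad++, the same capture-group replacement can be sketched with Python's re module. The sample string is an assumption modelled on the corrupted data in the question (trailing spaces omitted), and the variable names are hypothetical.

```python
import re

# Hypothetical sample modelled on the question's corrupted export (CRLF line endings).
corrupted = (
    '"USERNAME"|"Text1"|"Text2line1 - #3.50\r\n'
    '1 x text2line2 - #5.40\r\n'
    '2 x text2line3 #6.75\r\n'
    '|"Spreadsheet"\r\n'
)

# Group 1 captures the character that follows the line break;
# \1 in the replacement writes that character back, so only \r\n is dropped.
repaired = re.sub(r'\r\n([^"])', r'\1', corrupted)
print(repaired)
```

The final line break survives because no character follows it, so [^"] has nothing to match there.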

You could use a positive lookahead:
Find what: \R(?=[^"])
Replace with: NOTHING
\R stands for any kind of linebreak.
(?=[^"]) is a zero-width assertion that requires the character after the linebreak to be something other than a quote. Because a lookahead consumes nothing, that character is left untouched.
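The lookahead idea can be tried in Python as well. This is only a sketch: Python's re module has no \R, so \r?\n is used here as a stand-in for CRLF or LF line breaks, and the sample data is made up for illustration.

```python
import re

# Hypothetical export: the first record is broken over three lines, the second is intact.
export = '"A"|"line1\r\n1 x line2\r\n|"B"\r\n"C"|"D"\r\n'

# Remove a line break only when the next character exists and is not a quote.
# The lookahead checks that character without consuming it.
joined = re.sub(r'\r?\n(?=[^"])', '', export)
print(joined)
```

Line breaks that precede a quote, and the one at the very end of the text, are left in place.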

Related

Using Regex selecting text match everything after a word or patterns (similar topic but text is not fix patterns except 1 character)

I am trying to use regex in Notepad++ to select everything after v+(number|character)*, but the selection should exclude the v+(num|char)* part itself.
e.g. master\_\move_consolidate_archives_html_to_move_base_v2kjkj_(2021_01_19_11h43m59s-fi_m_dt xx-) - Copy (2).bat"
I am expecting
_(2021_01_19_11h43m59s-fi_m_dt xx-) - Copy (2).bat"
So far I can use this pattern: (?i)(v\d[0-9a-z]*)
to select v2kjkj,
but I can't get it to work with a lookbehind (?<=xxxx).
I also tried an if-then-else conditional, but with no luck; I still don't understand it well enough to use it.
The issue is that the "v" part follows different patterns, so I can't hard-code a specific string:
v2
v23
v2kjkj
v2343434
Test string:
mmaster\_\move_consolidate_archives_html_to_move_base_v2_16_.bat"
master\_\move_consolidate_archiv es_html_to_move_base_v23_17_.bat"
master\_\move_consolidate_archives_html_to_move_base_v2_17_(2021_01_19_12h37m19s-fi_m_dt xx-).bat"
master\_\move_consolidate_archives_html_to_move_base_v2_(2021_01_19_11h43m59s-fi_m_dt xx-) - CopyCopy.bat"
master\_\move_consolidate_archives_html_to_move_base_v2kjkj_(2021_01_19_11h43m59s-fi_m_dt xx-) - Copy (2).bat"
master\_\move_consolidate_archives_html_to_move_base_v2343434_(2021_01_19_11h43m59s-fi_m_dt xx-) - Copy (3).bat"
I have been reading and searching for a day, but I can't apply anything I have seen so far.
The closest ones I found were:
Regexp match everything after a word
Getting the text that follows after the regex match
I welcome any comments.
Ctrl+H
Find what: v\d[0-9a-z]*\K.*$
Replace with: LEAVE EMPTY
UNCHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
v # a "v"
\d # a digit
[0-9a-z]* # 0 or more alphanum
\K # forget all we have seen until this position
.* # 0 or more any character but newline
$ # end of line
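For comparison, here is a small Python sketch of the same pattern. Python's re module does not support \K, so a capture group stands in for it; that substitution is mine, not part of the answer. The file name is one of the test strings from the question.

```python
import re

name = r'master\_\move_consolidate_archives_html_to_move_base_v2kjkj_(2021_01_19_11h43m59s-fi_m_dt xx-) - Copy (2).bat"'

# "v", a digit, then any run of letters/digits; group 1 is everything after that.
match = re.search(r'v\d[0-9a-z]*(.*)$', name, flags=re.IGNORECASE)
tail = match.group(1) if match else ''
print(tail)

# The Replace All step (deleting the tail) corresponds to keeping only the version tag.
trimmed = re.sub(r'(v\d[0-9a-z]*).*$', r'\1', name, flags=re.IGNORECASE)
print(trimmed)
```

Here tail is the part the question wants to select, and trimmed is what remains on the line after replacing the match with nothing.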

Multi-line search & replace for Beginning and End of Each Line in Notepad++

To start off, I want to be able to do 2 things:
1st Thing:
To extract foo_abc (and similarly every other line, for example, goo_zxy, and doo_fgh), I needed to remove some text appended BEFORE foo_abc, and AFTER foo_abc.
For example:
TEXTBEFOREfoo_abcTEXTAFTER
TEXTBEFOREgoo_zxyTEXTAFTER
TEXTBEFOREdoo_fghTEXTAFTER
to obtain:
foo_abc
goo_zxy
doo_fgh
2nd Thing:
I now need to append different text before and after foo_abc again.
Like so:
TextAfoo_abcTextB
So what I've done is:
Find: ^
Replace: TextA
Find: $
Replace: TextB
Which works well, but I have to perform a find&replace TWICE which is not very efficient. To avoid that, I found this: Multiple word search and replace in notepad++
And applied it like so:
Find: (^)|($)
Replace: (?1TextA)(?2TextB)
But it doesn't work out too well.
AND, as mentioned, I need this to work for EACH and every line:
For example:
foo_abc
goo_zxy
doo_fgh
I need to insert TextA at the beginning for each of those lines, and TextB at the end of each line, like so:
TextAfoo_abcTextB
TextAgoo_zxyTextB
TextAdoo_fghTextB
Can this be done? (Yes, I actually need to do this to over 10000 lines, not just 3 and wanting an efficient way to do so).
Have I missed a quicker way to do all of this? Perhaps by performing a search and replace above in '1st Thing' on the TEXTBEFORE and TEXTAFTER, with TextA and TextB, respectively, in one-go?
Many thanks.
EDIT: Yes, they are literal strings. Yes, they do contain special characters because they represent parts of a URL.
There are two scenarios: 1) you want to replace TEXTBEFORE or TEXTAFTER regardless of whether the other one is present on the line, 2) both TEXTBEFORE and TEXTAFTER must be present.
Scenario 1
You may use a single search and replace operation for this:
Find What: ^(TEXTBEFORE)|TEXTAFTER$
Replace With: (?{1}TextA:TextB)
NOTE: If the TEXTBEFORE and TEXTAFTER contain special chars, you may use
Find What: ^(\QTEXTBEFORE\E)|\QTEXTAFTER\E$
Details:
^(TEXTBEFORE) - match and capture into Group 1 TEXTBEFORE at the start of a line
| - or
TEXTAFTER$ - match TEXTAFTER at the end of a line.
Replacement pattern:
(?{1} - if Group 1 is matched, then
TextA - return TextA
: - else
TextB - replace with TextB
) - end of the conditional replacement pattern.
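Scenario 1 can be sketched in Python for anyone who wants to test the logic outside Notepad++. Python has no (?{1}...:...) replacement syntax, so a callback function plays that role here, and re.escape stands in for \Q...\E. The sample lines come from the question; the variable and function names are hypothetical.

```python
import re

before, after = "TEXTBEFORE", "TEXTAFTER"

# re.escape treats both strings literally, like \Q...\E in the Notepad++ pattern.
pattern = re.compile(
    rf'^({re.escape(before)})|{re.escape(after)}$',
    re.MULTILINE,
)

def swap(match):
    # Group 1 participates only when the TEXTBEFORE branch matched.
    return "TextA" if match.group(1) is not None else "TextB"

lines = (
    "TEXTBEFOREfoo_abcTEXTAFTER\n"
    "TEXTBEFOREgoo_zxyTEXTAFTER\n"
    "TEXTBEFOREdoo_fghTEXTAFTER"
)
result = pattern.sub(swap, lines)
print(result)
```

Each line is handled by two independent matches, one at its start and one at its end, which is why either marker is replaced even when the other is missing.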
Scenario 2
If you need to match lines starting with some text and ending with another, use
Find What: ^TEXTBEFORE(.*?)TEXTAFTER$
Replace With: TextA$1TextB
Details:
^ - start of a line
TEXTBEFORE - some text here
(.*?) - Group 1 (that can be referred to with $1 backreference from the replacement pattern) matching any 0+ chars other than line break chars
TEXTAFTER - some text at the...
$ - end of line.
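A minimal Python sketch of Scenario 2 follows. The second and third sample lines are assumptions added to show that a line missing either marker is left alone; \g<1> is Python's spelling of the $1 backreference.

```python
import re

# Both markers must be present on the same line for the pattern to match.
pattern = re.compile(r'^TEXTBEFORE(.*?)TEXTAFTER$', re.MULTILINE)

lines = (
    "TEXTBEFOREfoo_abcTEXTAFTER\n"
    "TEXTBEFOREgoo_zxy\n"          # no TEXTAFTER: should stay unchanged
    "doo_fghTEXTAFTER"             # no TEXTBEFORE: should stay unchanged
)

# Group 1 (the text between the markers) is re-inserted between the new strings.
result = pattern.sub(r'TextA\g<1>TextB', lines)
print(result)
```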
Try:
TEXTBEFORE(.+?)TEXTAFTER
replace with
TextA$1TextB
If you need to find whole line:
^TEXTBEFORE(.+?)TEXTAFTER$
Replace is the same as before.

replace after "word" character by character in notepad ++?

I have a STRING
"wordride plain fire "
I have tried to replace with Regular Expressions:
Find what: (?>(word)|\G(?<!^))\K\S
Replace with: $1$2$0
In Notepad++ it does not change the text, but it works on regex101 (https://regex101.com/r/aI6gE1/2), where it replaces the characters after word one at a time, as follows:
First replace: wordwordide plain fire
Second replace: wordwordwordde plain fire
Third replace: wordwordwordworde plain fire
Fourth replace: wordwordwordwordword plain fire
Fifth replace: wordwordwordwordwordwordplain fire
Sixth replace: wordwordwordwordwordwordwordlain fire
Can you help me find the error, or give me a workaround in Notepad++ for this purpose: replacing the string after "word" character by character, using a group that is not part of the current match?
Please help me.
The answer is yes, it is possible to do with Notepad++ BUT only with the help of a PythonScript plug-in.
Get the plugin ready, and create the following script:
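import re
# Group 1: "word" at the start of a line; Group 2: the rest of the line
regex = r"^(word)(.+)"
def process_match(match):
    # Keep Group 1, then emit Group 1 once for every character in Group 2
    return "{0}{1}".format(match.group(1), "".join([match.group(1) for x in list(match.group(2))]))
# editor is the object PythonScript provides for the current document
editor.rereplace(regex, process_match)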
The ^(word)(.+) pattern will match a line with word at its start into Group 1 and all the rest of the line into Group 2.
The "{0}{1}".format(match.group(1), "".join([match.group(1) for x in list(match.group(2))])) expression pastes the Group 1 value into the result first (the first argument to format), and then "".join([match.group(1) for x in list(match.group(2))]) replaces each character in Group 2 with the value of Group 1.
This text:
word1
word1 2
wordride plain fire
will turn into:
NOTE: You can control how many chars after word are replaced with word by adjusting (modifying) the (.+) pattern.
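The callback logic can be checked without the plug-in. The sketch below uses the standard re.sub in place of PythonScript's editor.rereplace, which is an assumption made purely for testing; behaviour inside Notepad++ may differ in details such as line-ending handling. The callback is written with string repetition, which is equivalent to joining one copy of Group 1 per character of Group 2.

```python
import re

def process_match(match):
    # One copy of "word" for the prefix, plus one per character that followed it.
    return match.group(1) + match.group(1) * len(match.group(2))

text = "word1\nword1 2\nwordride plain fire"

# re.MULTILINE makes ^ match at the start of every line.
result = re.sub(r"^(word)(.+)", process_match, text, flags=re.MULTILINE)
print(result)
```

In this emulation every character after the leading word, spaces included, becomes one more copy of word.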
It's hard to understand exactly what you want to do but the following is working based on your examples:
Find: ^((word)+).
Replace with: $1$2
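To see how this find/replace pair behaves when Replace is clicked repeatedly, here is a hedged Python emulation. Each call replaces only the first match, mimicking one click; the helper name replace_next is hypothetical.

```python
import re

def replace_next(text):
    # Group 1: the run of "word" repeats at line start; group 2: the last "word".
    # The single character after the run is swapped for one more "word".
    return re.sub(r'^((word)+).', r'\1\2', text, count=1)

current = "wordride plain fire"
steps = []
for _ in range(3):
    current = replace_next(current)
    steps.append(current)

for step in steps:
    print(step)
```

The three results correspond to the first, second, and third replacements listed in the question.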

NotePad++ Currency RegEx with Optional Replace

My Search pattern: \"(\$)(\d{0,3}?)\,?(\d{1,3}?)\,?(\d{0,3})\s?\"
Matches all of these:
"$1"
"$10"
"$100"
"$1,000"
"$10,000 "
"$100,000"
"$1,000,000 "
"$10,000,000"
"$100,000,000"
I know I don't really need to search for under the thousands place, but am including those for possible future application.
My problem: I need to replace all of the commas with the HTML escape sequence &#44;, but only if there is a comma present in the search result.
This replace pattern $1$2&#44;$3&#44;$4 gives the incorrect result, and I'm just not seeing the right pattern to use for my replacement.
$&#44;1&#44;
$&#44;1&#44;0
$&#44;1&#44;00
$&#44;1&#44;000
$&#44;10&#44;000
$&#44;100&#44;000
$1&#44;000&#44;000
$10&#44;000&#44;000
$100&#44;000&#44;000
This is the result I am attempting to get:
$1
$10
$100
$1&#44;000
$10&#44;000
$100&#44;000
$1&#44;000&#44;000
$10&#44;000&#44;000
$100&#44;000&#44;000
No Quotes and no extra space after the last digit.
I'm not married to having to find the 1's through 100's, but it is preferable.
Any ideas on how to do optional replace in NotePad++?
Use a regex Search and Replace: Replace (\d),(\d) with \1&#44;\2. Check Regular expression, then click Replace or Replace All.
For some unknown reason, the RE of Sebastian from the comments above did not work with notepad++ 6.8.6 (find worked fine, but not replace). So instead of using look around, we capture the surrounding digits into \1 and \2 for reuse in the replacement.
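Both variants mentioned here, capturing the surrounding digits and using lookarounds, can be compared in a short Python sketch. It assumes the HTML escape meant is &#44; (the numeric entity for a comma). The final strip step, which removes the quotes and trailing space the question wants gone, is my addition and not part of the answer.

```python
import re

samples = ['"$100"', '"$1,000"', '"$1,000,000 "']

# Variant 1: capture the digit on each side and write both back around the entity.
captured = [re.sub(r'(\d),(\d)', r'\1&#44;\2', s) for s in samples]

# Variant 2: lookarounds assert the digits without consuming them.
lookaround = [re.sub(r'(?<=\d),(?=\d)', '&#44;', s) for s in samples]

# Assumed extra step: drop the surrounding quotes and any trailing space.
cleaned = [s.strip('" ') for s in lookaround]
print(cleaned)
```

Values without a comma pass through unchanged, which gives the optional behaviour the question asks for.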
Try the following regex:
(?<=\d),(?=\d)
After running a test on your dataset (replacing each match with &#44;), I got this result:
"$1"
"$10"
"$100"
"$1&#44;000"
"$10&#44;000 "
"$100&#44;000"
"$1&#44;000&#44;000 "
"$10&#44;000&#44;000"
"$100&#44;000&#44;000"

Matching all occurrences of a html element attribute in notepad++ regex

I have a file which has hundreds of links like this:
<h3>aspnet</h3>
Ex 1
Ex 2
Ex 3
So I want to remove all the attributes of the form
icon="..."
from all the lines. I went through the official Notepad++ regex wiki and have come up with this after several trials:
icon=\"[^\.]+\"
The problem with this is, it is selecting past the second double quote and stopping at the next occurring double quote. To illustrate, this will select the following content:
icon="data:image/png;base64,...jbvebich4sec9zgth1sfue1cdt...">EX 1</a> <a href="
If I modify the above regex to,
icon=\"[^\.]+\">
Then it is almost perfect, but it is also selecting the >:
icon="data:image/png;base64,...jbvebich4sec9zgth1sfue1cdt...">
The regex I am looking for would select like this:
icon="data:image/png;base64,...jbvebich4sec9zgth1sfue1cdt..."
I also tried the following, but it doesn't match anything at all
icon=\"[^\.]+\"$
Just match anything but a quote, followed by a quote:
icon="[^"]+"
Just tested with notepad++ 6.2.2 and confirmed that this matches correctly as written.
Broken down:
icon="
This is fairly obvious, match the literal text icon=".
[^"]+
This means to match any character that is not a ". Adding the + after it means "one or more times."
Finally we match another literal ".
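The difference between the two character classes can be reproduced in Python. The HTML snippet below is a made-up stand-in for the question's links (the real markup was not preserved), and the \s* in front of the final pattern is my addition so that the space before the attribute is removed too.

```python
import re

# Hypothetical links; the example.com URLs and base64 payloads are placeholders.
html = (
    '<a href="https://example.com/1" icon="data:image/png;base64,AAAA">Ex 1</a> '
    '<a href="https://example.com/2" icon="data:image/png;base64,BBBB">Ex 2</a>'
)

# [^.]+ only stops at a dot, so it runs past the closing quote into the next tag.
too_greedy = re.search(r'icon="[^.]+"', html).group(0)

# [^"]+ cannot cross a quote, so the match ends at the attribute's own closing quote.
exact = re.search(r'icon="[^"]+"', html).group(0)

# Remove every icon attribute, along with the whitespace in front of it.
cleaned = re.sub(r'\s*icon="[^"]+"', '', html)
print(cleaned)
```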
I am not a Notepad++ user, so I don't know how Notepad++ handles regex, but can you try replacing
icon=\"[^>]* with an empty string?
Try this solution; I just checked that it works as you wanted.
The way to achieve your goal:
Find what: (icon.*")|.*?
Replace with: $1