Make changes to the first line for each file in Notepad++ - regex

I have 50 files which have a blank first line and column headers surrounded in double quotes on the second line. I want to delete the first line and remove double quotes " from the second line for every file.
Can both these changes be done in 1 regular expression or do I need to use two different expressions?
Note: I am unable to print the first line as blank in sample data as this website is not allowing me. The \n is just to denote an empty line.
Also, the second line is different in all 50 files, so I cannot use a simple find and replace. I need to use a regular expression.
Sample data.
\n
"PRODUCTID","ATTRIBUTENAME_VALUE","STATE"
"00300678116042","NOT_APPLICABLE","CONFIRMED"
"00041260363603","NOT_APPLICABLE","CONFIRMED"
Expected output
PRODUCTID,ATTRIBUTENAME_VALUE,STATE
"00300678116042","NOT_APPLICABLE","CONFIRMED"
"00041260363603","NOT_APPLICABLE","CONFIRMED"

I think this should work as a single Find in Files replacement:
Find what: ^\r\n"(.*?)","(.*?)","(.*?)"
Replace with: \1,\2,\3

You can try something like this:
(?:\G(?!^)|^\R)"([^"\n]*)
and replace it with $1.
pattern details:
(?:
\G # contiguous to the previous match
(?!^) # not at the start of the line
# (to prevent \G from matching at the start of the string)
| # OR
^\R # start of a line followed by a newline (an empty line)
)
"
([^"\n]*) # capture group 1: all that is not a quote or a newline
# (to reach the next quote)
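If you would rather script the change for all 50 files, here is a minimal Python sketch of the same transformation. It is not the \G pattern above (Python's built-in re module does not support \G or \R), so it anchors on the start of the text with \A and strips the quotes from the captured header line instead. The function name fix_header is made up for illustration.

```python
import re

def fix_header(text: str) -> str:
    """Drop a leading blank line and strip double quotes from the header line."""
    # \A anchors at the very start of the text; \r?\n is the empty first line.
    # Group 1 is the header line (everything up to the next line break).
    return re.sub(
        r'\A\r?\n([^\r\n]*)',
        lambda m: m.group(1).replace('"', ''),
        text,
        count=1,
    )

sample = (
    '\n'
    '"PRODUCTID","ATTRIBUTENAME_VALUE","STATE"\n'
    '"00300678116042","NOT_APPLICABLE","CONFIRMED"\n'
)
print(fix_header(sample))
```

Only the header loses its quotes; the data rows are left alone because the substitution runs once, at the start of the text. A file that does not begin with a blank line is returned unchanged.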

Related

(Notepad++) How do I use Regex to remove everything in a file before a word, but NOT including the word?

I have a file that has lots of lines and I have one line that starts with "1020":
990.1.1={
holder=1000083706 #Dowelani
}
1020.1.1={
holder=1000083707 #Mutsutshudzi
}
1050.1.1={
holder=1000083708 #Khathu
}
I want to remove every line above that line starting with 1020, but I want to keep the 1020 line.
I have been trying .*1020, and this removes everything before the line containing "1020", but it also removes the 1020. How can I modify the code to keep the line I search for but also remove every line above it?
Rather than discarding the part of the string up to the target string, it's easier to simply match the line that begins with the target string and all subsequent lines. You can do that with the regular expression
^1020\..*
with the multiline and single line (or dotall) flags set.
The multiline flag causes ^ to match the beginning of a line, rather than the beginning of the string, and the single line flag causes . to match every character, including line terminators. (Without that flag set, . matches all characters other than line terminators.)
If you only want to keep the (first) line that begins with the target string, do not set the single-line flag and return the first match (using re.search()).
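Since re.search() is mentioned, here is a short Python sketch of both variants, using the sample from the question. The helper names from_target and first_target_line are hypothetical.

```python
import re

text = (
    "990.1.1={\n"
    "holder=1000083706 #Dowelani\n"
    "}\n"
    "1020.1.1={\n"
    "holder=1000083707 #Mutsutshudzi\n"
    "}\n"
    "1050.1.1={\n"
    "holder=1000083708 #Khathu\n"
    "}\n"
)

def from_target(s: str) -> str:
    # MULTILINE: ^ matches at the start of every line.
    # DOTALL: . also matches line breaks, so .* runs to the end of the text.
    m = re.search(r'^1020\..*', s, re.MULTILINE | re.DOTALL)
    return m.group(0) if m else s

def first_target_line(s: str) -> str:
    # Without DOTALL, .* stops at the end of the matching line.
    m = re.search(r'^1020\..*', s, re.MULTILINE)
    return m.group(0) if m else ""

print(from_target(text))
print(first_target_line(text))
```

When nothing matches, from_target returns the input unchanged and first_target_line returns an empty string; that fallback is a choice made for this sketch, not part of the original answer.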
You can use
Find What: (?s)^.*?\R(?=1020\.)
Replace With: empty string
Details:
(?s) - a dot now matches newlines, too
^ - start of a line
.*? - any zero or more chars, as few as possible
\R - a line break sequence
(?=1020\.) - a positive lookahead that matches a location in the string that is immediately followed by 1020. (including the literal dot).
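The same lookahead idea can be sketched in Python, with two substitutions because Python's built-in re module has no \R: the line break is spelled out as (?:\r\n|\n|\r), and \A is used to anchor at the start of the text. The function name drop_before and its target parameter are assumptions made for this example.

```python
import re

def drop_before(text: str, target: str = "1020.") -> str:
    """Remove everything before the first line that starts with `target`."""
    # .*? (with DOTALL) lazily consumes text from the start, stopping at the
    # first line break that is immediately followed by the target string.
    pattern = r'\A.*?(?:\r\n|\n|\r)(?=' + re.escape(target) + r')'
    return re.sub(pattern, '', text, count=1, flags=re.DOTALL)

sample = "990.1.1={\n}\n1020.1.1={\n}\n1050.1.1={\n}\n"
print(drop_before(sample))
```

Because the lookahead consumes nothing, the 1020 line itself survives the replacement. If the target line is already the first line, nothing matches and the text is returned as is.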

Replace Certain Line Breaks with Equivalent of Pressing delete key on Keyboard NotePad++ Regex

I'm using Notepad++ Find and Replace, and I have a regex that looks for [^|]\r, which will find the end of the line that starts with 8778.
8778|44523|0||TENNESSEE|ADMINISTRATION||ROLL 169 BATCH 8|1947-09-22|0|OnBase
See Also 15990TT|
I want to basically merge that line with the one below it, so it becomes this:
8778|44523|0||TENNESSEE|ADMINISTRATION||ROLL 169 BATCH 8|1947-09-22|0|OnBase See Also 15990TT|
I've tried replacing with a blank space, but it's grabbing the last character on that line (an e in this case) and replacing that with a space, so it's making it
8778|44523|0||TENNESSEE|ADMINISTRATION||ROLL 169 BATCH 8|1947-09-22|0|OnBas
See Also 15990TT|
Is there any way to make it essentially merge the two lines?
\r only matches a carriage return symbol; to match a line break, you need \R, which matches any line break sequence.
To keep a part of a pattern after replacement, capture that part with parentheses, and then use a backreference to that group.
So you may use
([^|\r])\R
Replace with $1, or with $1 followed by a space if you need to append a space.
Details
([^|\r]) - Capturing group 1 ($1 is the backreference that refers to the group value from the replacement pattern): any char other than | and CR
\R - any line break char sequence, LF, CR or CRLF.
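To see the capture-and-backreference idea outside Notepad++, here is a minimal Python sketch. Python's built-in re module does not support \R, so the line break is written as (?:\r\n|\n|\r), and \n is added to the negated class so that blank lines are left alone. The function name merge_wrapped and the joiner parameter are made up for illustration.

```python
import re

def merge_wrapped(text: str, joiner: str = " ") -> str:
    """Join a line to the next one when it does not end with a pipe."""
    # Group 1 keeps the last character of the line; only the line break
    # that follows it is replaced by `joiner`.
    return re.sub(
        r'([^|\r\n])(?:\r\n|\n|\r)',
        lambda m: m.group(1) + joiner,
        text,
    )

sample = (
    "8778|44523|0||TENNESSEE|ADMINISTRATION||ROLL 169 BATCH 8|1947-09-22|0|OnBase\r\n"
    "See Also 15990TT|\r\n"
)
print(merge_wrapped(sample))
```

The trailing e of OnBase is preserved because it is captured and written back; lines that already end with | keep their line break.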
The issue is you're using [^|] to match anything that's not a pipe character before the carriage return, which, on replacement, will remove that character (hence why you're losing an e).
If it's imperative that you match only carriage returns that follow non-pipe characters, capture the preceding character ([^|])\r$ and then put it back in the replacement using $1.
You're also missing a \n in your regex, which is why the replacement isn't concatenating the two lines. So your search should be ([^|])\r\n$ and your replace should be $1.
Find
(\r\n)+
For "Replace" - don't put anything in (not even a space)

Regex end of nth line

How can I use regex in sublime to target the end of every third line, so that I can insert a semicolon.
I know I can target/wrap every third line like this:
(.*\n){3}
And target the end of each line like this: $
But how can I target the END of every THIRD line so that I can insert a semicolon?
You shouldn't match the third newline character. Try the following regex:
^.*(?:\R.*){2}\K
In the above regex, \R matches any kind of newline sequence, \K resets the start of the reported match, and ^ matches at the start of each line by default in Sublime Text (so there is no need for (?m)).
Put the cursor at the beginning of the file content, then search for the given regex and replace with ;.
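If you need the same edit in a script, here is a rough Python equivalent. Python's built-in re module supports neither \K nor \R, so this sketch captures the three-line block and writes it back followed by the semicolon, and it assumes \n line endings. The function name add_semicolons is hypothetical.

```python
import re

def add_semicolons(text: str) -> str:
    """Append ';' to the end of every third line (assumes \\n line endings)."""
    # Group 1 is one line plus the next two lines, without the third line's
    # own line break, so the ';' lands at the end of line 3, 6, 9, ...
    return re.sub(r'^(.*(?:\n.*){2})', r'\1;', text, flags=re.MULTILINE)

sample = "a\nb\nc\nd\ne\nf\ng"
print(add_semicolons(sample))
```

A trailing group of fewer than three lines is left untouched, since the pattern needs two line breaks inside each match.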

\1 not defined in the RE

In my script, I'm passing in a Markdown file and, using sed, I'm trying to find lines that do not start with one or more # and are not empty, and then surround those lines with <p></p> tags.
My reasoning:
^[^#]+ At beginning of line, find lines that do not begin with 1 or more #
.\+ Then find lines that contain one or more character (aka not empty lines)
Then replace the matched line with <p>\1</p>, where \1 represents the matched line.
However, I'm getting "\1 not defined in the RE". Is my reasoning above correct and how do I fix this error?
BODY=$(sed -E 's/^[^#]+.\+/<p>\1</p>/g' "$1")
Backslash followed by a number is replaced with the match for the Nth capture group in the regexp, but your regexp has no capture groups.
If you want to replace the entire match, use &:
BODY=$(sed -E 's%^[^#].*%<p>&</p>%' "$1")
You don't need to use .+ to find non-empty lines -- the fact that it has a character at the beginning that doesn't match # means it's not empty. And you don't need + after [^#] -- all you care about is that the first character isn't #. You also don't need the g modifier when the regexp matches the entire line -- that's only needed to replace multiple matches per line.
And since your replacement string contains /, you need to either escape it or change the delimiter to some other character.
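For comparison, the whole-match idea (& in sed) looks like this in Python, where \g<0> refers to the entire match. One difference worth noting: sed works line by line, while this sketch runs over a multi-line string, so \n is added to the negated class to keep empty lines from matching. The function name wrap_paragraphs is made up for this example.

```python
import re

def wrap_paragraphs(markdown: str) -> str:
    """Wrap each non-empty line that does not start with '#' in <p> tags."""
    # [^#\n] requires a first character that is neither '#' nor a line break,
    # so heading lines and empty lines are skipped. \g<0> is the whole match.
    return re.sub(r'^[^#\n].*', r'<p>\g<0></p>', markdown, flags=re.MULTILINE)

sample = "# Title\n\nHello world\n## Sub\nMore text\n"
print(wrap_paragraphs(sample))
```

No capture group is needed here, which is the same reason the sed version can use & instead of \1.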

In Notepad++ replace all lines except ones that match to given expression, using regex

I saw some answers here that might help me if I combine them, but I can't seem to figure out how to do it properly.
Let's assume we have the following text file:
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaa
[a]
aaaaaaaaaaaaaaaaaaaaaaaaaaaaa[h]
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
[a]
aaaaaaaaaaaaaaaaaaaaaaaa[h]
aaaaaaaaaaaaaaaaaaaaaaaaaaa
Where:
"a" means literally any character (or set of characters), including special symbols, unicode characters etc.
"h" is a fixed latin character
brackets mean brackets
blank line is a blank line
Then:
How do I keep only lines with [h] at the end, replacing everything else with blank lines? (meaning the carriage return remains)
How do I keep the same lines but also remove [h]?
aaaaaaaaaaaaaaaaaaaaaaaaaaaaa[h]
aaaaaaaaaaaaaaaaaaaaaaaa[h]
As the title says, I guess what I need can also be described as: replace any line except the lines that match the given expression.
Find what:
^.*$(?<!\[h\])
Replace with nothing. Make sure to uncheck . matches newline.
How does it work?
^ # matches the beginning of a line (after the line break)
.* # matches as many non-line-break characters as possible (an entire line)
$ # matches the end of a line (before the line break)
(?<! # a negative lookbehind, if its contents match left of the current
# position, it causes the pattern to fail
\[h\] # match [h] literally
) # end of lookbehind
Note that lookarounds are not part of the match. So ^.*$ simply makes sure that you are matching entire lines, neither parts of them nor several at once. The lookbehind then ensures that the matched line does not end with [h].
You can then remove the [h] with an additional step:
Find what: \[h\]$
Replace with nothing.
EDIT: Due to the fact that the regex engine traverses the file from beginning to end and the fact that matches can never overlap, you can actually put both patterns into one:
^.*$(?<!\[h\])|\[h\]$
By the time an [h] at the end of the line is removed, the engine will not look at that line again, so you're only left with the lines that used to have an [h] at the end.
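The combined pattern also happens to work unchanged with Python's re module when the MULTILINE flag is set, which makes it easy to try on a small sample. This is only a sketch: the function name keep_marked is hypothetical, and the sample uses \n line endings.

```python
import re

def keep_marked(text: str) -> str:
    """Blank out lines not ending in [h]; strip the [h] from lines that do."""
    # Alternative 1 matches a whole line whose last three characters are
    # not "[h]". Alternative 2 matches just the trailing "[h]" marker.
    return re.sub(r'^.*$(?<!\[h\])|\[h\]$', '', text, flags=re.MULTILINE)

sample = "xx\nabc[h]\n\n[a]\nyy[h]\nzz"
print(repr(keep_marked(sample)))
```

The line breaks are never part of a match, so the number of lines stays the same and the removed lines become blank lines.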