I have a dataset:
1.
Name1
Name2
Name3
2.
Name1
Name2.
Name3
and so on.
Using regex, I want the output to be:
Name1,Name2,Name3
Name1,Name2.,Name3
I'm trying to import this into a Google Sheet, so I need a comma-delimited file. I believe the steps are to replace each number followed by a period with \n and then add a comma after each column name. Note that some fields (e.g. Name2.) also end with a period, so I'm having issues with \d+[.]
Ctrl+H
Find what: ^(.+)\R(.+)\R(.+)$
Replace with: $1,$2,$3
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
(.+) # group 1, 1 or more any character but newline
\R # any kind of linebreak
(.+) # group 2, 1 or more any character but newline
\R # any kind of linebreak
(.+) # group 3, 1 or more any character but newline
$
Replacement:
$1 # content of group 1
, # comma
$2 # content of group 2
, # comma
$3 # content of group 3
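If the file can be processed outside Notepad++, the same reshaping could be sketched in Python. This is only a sketch under assumptions: each record starts with a marker line made of digits and a period (such as "1.") and is followed by exactly three field lines. The name `records_to_csv` is made up for illustration. Unlike the search pattern above, it skips the marker lines explicitly.

```python
import re

def records_to_csv(text, fields_per_record=3):
    """Join each group of field lines into one comma-separated row.

    Assumption: a line made only of digits plus a period (e.g. "1.")
    starts a new record and is not itself a field.
    """
    marker = re.compile(r"\d+\.")
    # Keep non-empty lines that are not record markers.
    fields = [line for line in text.splitlines()
              if line and not marker.fullmatch(line)]
    # Chunk the remaining lines into rows of fields_per_record.
    rows = [fields[i:i + fields_per_record]
            for i in range(0, len(fields), fields_per_record)]
    return "\n".join(",".join(row) for row in rows)

sample = "1.\nName1\nName2\nName3\n2.\nName1\nName2.\nName3\n"
print(records_to_csv(sample))
```

A field such as "Name2." is kept because it does not consist solely of digits and a period. Fields that contain commas would need quoting, which this sketch does not attempt.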
I have some tex files with \section{text} and \subsection{text}, etc. And I want to convert them to # text and ## text in markdown files using regular expressions in Notepad++. How can I achieve that?
Open the replace window with Ctrl + H. Check the radio button "Regular Expression" and search for:
\\section\{([^}]*)}
And replace with:
# \1
For subsections:
\\subsection\{([^}]*)}
## \1
What we're doing:
\\ is an escaped backslash matching the literal backslash of your expression
{ needs to be escaped as well, otherwise it would be recognized as a quantifier, hence \{
([^}]*) is a group made of 0 or more characters that are NOT }
\1 is a reference to the first and only group of our regular expression
It can be done in a single pass:
Ctrl+H
Find what: \\(sub)?section{([^}]*)}
Replace with: (?1#)# $2
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
\\ # a backslash, has to be escaped
(sub)? # group 1, literally "sub", optional
section{ # literally
([^}]*) # group 2, 0 or more any character that is not "}"
} # "}" character
Replacement:
(?1#) # conditional replace, if group 1 exists, print a "#"
# $2 # "#", a space and content of group 2
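For comparison, here is a rough Python equivalent of the single-pass replacement. Python's `re` module has no conditional replacement like `(?1#)`, so this sketch swaps in a replacement function instead. The name `tex_headings_to_markdown` is hypothetical.

```python
import re

# Group 1 is the optional "sub", group 2 is the heading text.
HEADING = re.compile(r"\\(sub)?section\{([^}]*)\}")

def tex_headings_to_markdown(text):
    def repl(match):
        # Two hashes when "sub" was matched, otherwise one.
        hashes = "##" if match.group(1) else "#"
        return f"{hashes} {match.group(2)}"
    return HEADING.sub(repl, text)

print(tex_headings_to_markdown("\\section{Intro}\n\\subsection{Scope}"))
```

Like the Notepad++ pattern, this handles only \section and \subsection. Deeper levels such as \subsubsection are left untouched.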
I am trying to find in the Notepad++ strings like this:
'',
And convert them into this:
'',
I've made a regular expression to crop the string beginning with cards/ and ending with </a>:
(cards/)([^\s]{1,50})(([\s\.\?\!\-\,])(\w{1,50}))+(\.mp3"></a>)
Or an alternative approach:
(cards/)([^\s]{1,50})([\s\.\?\!\-\,]{0,})([^\s]{1,50})
Both work fine for search, but I can't get the replacement.
The problem is that the number of words in a sentence may vary.
And I can't get the ID of sub-expressions in the double parentheses.
The following format of replacement: \1\2\3... doesn't work, as I can't get the correct ID of the sub-expressions in the double parentheses.
I tried to google the topic, but couldn't find anything. Any advice, link or best of all a full replacement expression will be very much appreciated.
This will replace all spaces after /cards/ with a hyphen and lowercase the filename.
Ctrl+H
Find what: (?:href="/mp3files/cards/|\G)\K(?!\.mp3)(\S+)(?:\h+|(\.mp3))
Replace with: \L$1(?2$2:-)
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
(?: # non capture group
href="/mp3files/cards/ # literally
| # OR
\G # restart from last match position
) # end group
\K # forget all we have seen until this position
(?!\.mp3) # negative lookahead, make sure we don't have ".mp3" at this position
(\S+) # group 1, 1 or more non spaces
(?: # non capture group
\h+ # 1 or more horizontal spaces
| # OR
(\.mp3) # group 2, literally ".mp3"
) # end group
Replacement:
\L$1 # lowercase content of group 1
(?2 # if group 2 exists (the extension .mp3)
$2 # use it
: # else
- # put a hyphen
) # endif
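The same idea can be sketched in Python, where `\G`, `\K` and `\L` are not available, so a replacement function does the lowercasing and hyphenation. The sample link below is hypothetical, since the original strings did not survive in the question. The `href="/mp3files/cards/` prefix is taken from the search pattern above, and `slugify_card_links` is an invented name.

```python
import re

# prefix, file name (anything up to the first ".mp3"), extension
CARD_LINK = re.compile(r'(href="/mp3files/cards/)([^"]*?)(\.mp3)')

def slugify_card_links(html):
    def repl(match):
        prefix, name, ext = match.groups()
        # Collapse each run of spaces/tabs into one hyphen, then lowercase.
        slug = re.sub(r"[ \t]+", "-", name).lower()
        return prefix + slug + ext
    return CARD_LINK.sub(repl, html)

# Hypothetical input line, for illustration only.
sample = '<a href="/mp3files/cards/How are You today.mp3"></a>'
print(slugify_card_links(sample))
```

Because the whole file name is captured at once, the number of words in it does not matter.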
I have a snippet of text from EDI X12. I am trying to find lines where a BBQ segment is followed by another BBQ segment. I want to replace all BBQ segments in the second line with BBB
Orig text
HI*BBR<0Y6D0Z1<D8<20190816~
HI*BBQ<05BC0ZZ<D8<20190806*BBQ<05BB0ZZ<D8<20190729*BBQ<06UM07Z<D8<20190729~
HI*BBQ<0JBL0ZZ<D8<20190809*BBQ<0J9N0ZZ<D8<20190816*BBQ<0KBS0ZZ<D8<20190816~
HI*BI<71<RD8<20190716-20190722~
Needs to become
HI*BBR<0Y6D0Z1<D8<20190816~
HI*BBQ<05BC0ZZ<D8<20190806*BBQ<05BB0ZZ<D8<20190729*BBQ<06UM07Z<D8<20190729~
HI*BBB<0JBL0ZZ<D8<20190809*BBB<0J9N0ZZ<D8<20190816*BBB<0KBS0ZZ<D8<20190816~
HI*BI<71<RD8<20190716-20190722~
This targets what I am looking for in capturing group 3, but how to replace BBQ with BBB within that group?
(^HI\*BBQ.+?~\r\n)(^HI\*)(BBQ.+?~\r\n)
Thanks for any ideas!
Ctrl+H
Find what: (?:^HI\*BBQ\b.+?~\RHI\*BB|\G(?!^).*?\bBB)\KQ\b
Replace with: B
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
(?: # non capture group
^ # beginning of line
HI\*BBQ\b # literally, followed by a word boundary
.+? # 1 or more any character but newline, not greedy
~ # a tilde
\R # any kind of linebreak
HI\*BB # literally
| # OR
\G # restart from last match position
(?!^) # not at the beginning of line
.*?\bBB # 0 or more any character but newline, not greedy, followed by a word boundary and BB
) # end group
\K # forget all we have seen until this position
Q\b # the letter Q, followed by a word boundary
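If scripting is an option, a line-by-line Python sketch may be easier to follow than the `\G` construct. It assumes that pairs do not overlap: once a second BBQ line has been rewritten, it is not reused as the first line of another pair. The function name is hypothetical.

```python
import re

def relabel_second_bbq_line(text):
    """In each pair of consecutive HI*BBQ lines, turn BBQ into BBB on the second."""
    is_bbq = re.compile(r"HI\*BBQ\b").match
    lines = text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        out.append(lines[i])
        if is_bbq(lines[i]) and i + 1 < len(lines) and is_bbq(lines[i + 1]):
            # Rewrite every BBQ qualifier on the following line, then skip it.
            out.append(re.sub(r"\bBBQ\b", "BBB", lines[i + 1]))
            i += 2
        else:
            i += 1
    return "\n".join(out)

sample = "\n".join([
    "HI*BBR<0Y6D0Z1<D8<20190816~",
    "HI*BBQ<05BC0ZZ<D8<20190806*BBQ<05BB0ZZ<D8<20190729*BBQ<06UM07Z<D8<20190729~",
    "HI*BBQ<0JBL0ZZ<D8<20190809*BBQ<0J9N0ZZ<D8<20190816*BBQ<0KBS0ZZ<D8<20190816~",
    "HI*BI<71<RD8<20190716-20190722~",
])
print(relabel_second_bbq_line(sample))
```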
My file has 4000k lines and I need to reformat it, so I am trying Notepad++ (or awk). The structure of every line is
acc|GENBANK|ABJ91977.1|GENBANK|DQ876324|pol protein Tabulator[Human immunodeficiency virus 1]TabulatorTLWQRPFVTIKVGGQLKEALLDTGADDTVLEEIELPGRWKPKMIGGIGGFIKVRQYDQIXVEICGHKAIGTVLVGPTPVNVIGRNLMTQIGCTLN
The characters between the 4th vertical bar | and the first [ are variable in length. I am only looking for tips or where to focus so I can do it myself. I tried printing with awk, but because one part is variable in length, I got inconsistent results. I can't select by columns either.
I would like to obtain a file with this structure
acc|GENBANK|ABJ91977.1|GENBANK|DQ876324,acc|GENBANK|ABJ91977.1|GENBANK|DQ876324,pol protein
and another file with this structure
acc|GENBANK|ABJ91977.1|GENBANK|DQ876324TabulatorTLWQRPFVTIKVGGQLKEALLDTGADDTVLEEIELPGRWKPKMIGGIGGFIKVRQYDQIXVEICGHKAIGTVLVGPTPVNVIGRNLMTQIGCTLN
Tab characters are written out above as the word Tabulator.
Here is a way to do it for the first file.
Ctrl+H
Find what: (^[^|]+(?:\|[^|]+){4})\|(.+?)\h+\[.+$
Replace with: $1,$1,$2
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
( # group 1
^ # beginning of line
[^|]+ # 1 or more non pipe
(?: # start non capture group
\| # a pipe
[^|]+ # 1 or more non pipe
){4} # end group, must appear 4 times
) # end group 1
\| # a pipe
(.+?) # group 2, 1 or more any character but newline, not greedy
\h+ # 1 or more horizontal spaces (space or tabulation)
\[ # 1 opening square bracket
.+ # 1 or more any character but newline
$ # end of line
Replacement:
$1 # content of group 1
, # a comma
$1 # content of group 1
, # a comma
$2 # content of group 2
Result for given example:
acc|GENBANK|ABJ91977.1|GENBANK|DQ876324,acc|GENBANK|ABJ91977.1|GENBANK|DQ876324,pol protein
For the second file:
Ctrl+H
Find what: (^[^|]+(?:\|[^|]+){4})\|.+?\h+\[.+?\](.+)$
Replace with: $1$2
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
( # group 1
^ # beginning of line
[^|]+ # 1 or more non pipe
(?: # start non capture group
\| # a pipe
[^|]+ # 1 or more non pipe
){4} # end group, must appear 4 times
) # end group 1
\| # a pipe
.+? # 1 or more any character but newline, not greedy
\h+ # 1 or more horizontal spaces (space or tabulation)
\[ # 1 opening square bracket
.+? # 1 or more any character but newline, not greedy
\] # a closing square bracket
(.+) # group 2, 1 or more any character but newline
$ # end of line
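Since awk was also mentioned, both outputs could alternatively be produced in one pass with a small script. The Python sketch below combines the two patterns into one, with `[ \t]` standing in for `\h`. It assumes the placeholder Tabulator is a real tab character in the file, and the sample sequence is shortened. `split_record` is an illustrative name only.

```python
import re

# group 1: the five pipe-separated ID fields
# group 2: the description before the bracketed organism
# group 3: everything after the closing bracket (tab + sequence)
LINE = re.compile(r"^([^|]+(?:\|[^|]+){4})\|(.+?)[ \t]+\[.+?\](.+)$")

def split_record(line):
    """Return (row for the first file, row for the second file), or None."""
    m = LINE.match(line)
    if m is None:
        return None
    ident, description, rest = m.groups()
    return f"{ident},{ident},{description}", ident + rest

# Shortened sample; "\t" replaces the word Tabulator from the question.
sample = ("acc|GENBANK|ABJ91977.1|GENBANK|DQ876324|pol protein\t"
          "[Human immunodeficiency virus 1]\tTLWQRPFVTIKVGG")
first, second = split_record(sample)
print(first)
print(second)
```

For a file with millions of lines, the function could be applied while iterating over the input file line by line, writing each result to its own output file.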
Using a regex in Notepad++ I am trying to replace 53 characters on a line with spaces:
Find: (^RS.{192})(.{53})(.{265})
Replace: \1(\x20){53}\3
It's replacing group \2 with " {53}" but what I want is 53 spaces.
How do you do this?
Replacement terms are not regular expressions, except that they may use back references.
Just code 53 literal spaces between \1 and \3:
Replace: \1                                                     \3
A bit tedious, but it works.
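A space is matched by \s in the search pattern, so 53 of them can be matched with \s{53}.
Note that this quantifier syntax is not expanded in the replacement field.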
Assuming there is always RS and 192 characters before and 265 after:
Ctrl+H
Find what: (?:^RS.{192}|\G)\K.(?=.{265,}$)
Replace with: (one space character)
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
(?: # start non capture group
^ # beginning of line
RS # literally RS
.{192} # 192 any character
| # OR
\G # restart from last match position
) # end group
\K # forget all we've seen until this position
. # 1 any character
(?= # positive lookahead, zero-length assertion to make sure we have after:
.{265,} # at least 265 any characters
$ # end of line
) # end lookahead
Replacement:
(one space character) # the character to insert
Given a shorter line to illustrate:
RSabcdefghijklmnopqrstuvwxyz
Result for given example:
RSabcdefghij      qrstuvwxyz
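Outside Notepad++, the repetition count can be computed instead of typed. The Python sketch below builds the pattern from the field widths and pads with `" " * width` in a replacement function. The name `blank_field` and its parameters are made up for illustration, with defaults taken from the question (RS, 192, 53, 265).

```python
import re

def blank_field(line, prefix="RS", before=192, width=53, after=265):
    """Replace `width` characters after prefix + `before` characters with spaces."""
    pattern = re.compile(
        rf"^({re.escape(prefix)}.{{{before}}}).{{{width}}}(.{{{after}}})"
    )
    # Keep the text on both sides, blank out the middle field.
    return pattern.sub(lambda m: m.group(1) + " " * width + m.group(2), line)

# Shorter line for illustration: 10 characters kept, 6 blanked, 10 kept.
print(blank_field("RSabcdefghijklmnopqrstuvwxyz", before=10, width=6, after=10))
```

Lines that do not start with the prefix, or that are too short, come back unchanged.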