Regex replace in capture group


I have a snippet of text from EDI X12. I am trying to find lines where a BBQ segment is followed by another BBQ segment, and I want to replace all BBQ segments in the second line with BBB.
Original text:
HI*BBR<0Y6D0Z1<D8<20190816~
HI*BBQ<05BC0ZZ<D8<20190806*BBQ<05BB0ZZ<D8<20190729*BBQ<06UM07Z<D8<20190729~
HI*BBQ<0JBL0ZZ<D8<20190809*BBQ<0J9N0ZZ<D8<20190816*BBQ<0KBS0ZZ<D8<20190816~
HI*BI<71<RD8<20190716-20190722~
Needs to become:
HI*BBR<0Y6D0Z1<D8<20190816~
HI*BBQ<05BC0ZZ<D8<20190806*BBQ<05BB0ZZ<D8<20190729*BBQ<06UM07Z<D8<20190729~
HI*BBB<0JBL0ZZ<D8<20190809*BBB<0J9N0ZZ<D8<20190816*BBB<0KBS0ZZ<D8<20190816~
HI*BI<71<RD8<20190716-20190722~
This captures what I am looking for in capturing group 3, but how do I replace BBQ with BBB within that group?
(^HI\*BBQ.+?~\r\n)(^HI\*)(BBQ.+?~\r\n)
Thanks for any ideas!

Ctrl+H
Find what: (?:^HI\*BBQ\b.+?~\RHI\*BB|\G(?!^).*?\bBB)\KQ\b
Replace with: B
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
(?: # non capture group
^ # beginning of line
HI\*BBQ\b # literally HI*BBQ, followed by a word boundary
.+? # 1 or more any character but newline, not greedy
~ # a tilde
\R # any kind of linebreak
HI\*BB # literally
| # OR
\G # restart from last match position
(?!^) # not at the beginning of line
.*?\bBB # 0 or more any character but newline, not greedy, followed by a word boundary and BB
) # end group
\K # forget all we have seen until this position
Q\b # the letter Q, followed by a word boundary
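If the same edit has to happen outside Notepad++, the logic can also be done line by line in a script. Below is a minimal sketch in Python (an assumption: the question only uses Notepad++), with a hypothetical function name `mark_repeated_bbq`. It assumes "followed by" means the line directly before, checked against the original text; for runs of three or more HI*BBQ lines it rewrites every line after the first, which may differ from what the regex above does.

```python
import re


def mark_repeated_bbq(text: str) -> str:
    """Rename BBQ to BBB on an HI*BBQ line that directly follows another HI*BBQ line.

    Hypothetical helper. Each line is compared with the ORIGINAL previous line.
    """
    out = []
    prev_is_bbq = False
    for line in text.splitlines(keepends=True):  # keeps \r\n line endings intact
        is_bbq = line.startswith("HI*BBQ")
        if is_bbq and prev_is_bbq:
            # \b on both sides so only the whole segment code BBQ is renamed
            line = re.sub(r"\bBBQ\b", "BBB", line)
        out.append(line)
        prev_is_bbq = is_bbq
    return "".join(out)


if __name__ == "__main__":
    sample = (
        "HI*BBR<0Y6D0Z1<D8<20190816~\r\n"
        "HI*BBQ<05BC0ZZ<D8<20190806*BBQ<05BB0ZZ<D8<20190729*BBQ<06UM07Z<D8<20190729~\r\n"
        "HI*BBQ<0JBL0ZZ<D8<20190809*BBQ<0J9N0ZZ<D8<20190816*BBQ<0KBS0ZZ<D8<20190816~\r\n"
        "HI*BI<71<RD8<20190716-20190722~\r\n"
    )
    print(mark_repeated_bbq(sample), end="")
```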

Related

Notepad++ all instances of character under a condition

This is my text:
BROKEN This is a "sentence".
This sentence is an actual normal sentence.
I want to remove the quotation marks from every line that contains the word BROKEN.
I thought this would be simple, but I couldn't do it.
My regex:
(?=BROKEN)"
Could I get some help?

If you also want to remove double quotes that appear before the word BROKEN, you can skip every line that does not contain the word and match the quotes on all remaining lines.
Find what:
^(?!.*\bBROKEN\b).*\R?(*SKIP)(*F)|"
Replace with: (leave empty)
Explanation:
^ Start of line
(?!.*\bBROKEN\b) Negative lookahead, assert that the word BROKEN does not occur
.*\R?(*SKIP)(*F) Match the whole line including an optional newline and skip the match
| Or
" Match a double quote
See a regex101 demo.
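Python's built-in `re` module has no `(*SKIP)(*F)` backtracking verbs, so if you need the same result in a script, the "skip lines without BROKEN" part is easier to do in code than in the pattern. A minimal sketch under that assumption (the function name is hypothetical):

```python
import re

BROKEN = re.compile(r"\bBROKEN\b")  # the whole word BROKEN, not e.g. UNBROKEN


def strip_quotes_on_broken_lines(text: str) -> str:
    """Remove every double quote on lines containing BROKEN; keep other lines as-is."""
    out = []
    for line in text.splitlines(keepends=True):
        if BROKEN.search(line):
            line = line.replace('"', "")  # quotes before AND after the word are removed
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = 'BROKEN This is a "sentence".\nThis sentence is an actual normal sentence.\n'
    print(strip_quotes_on_broken_lines(sample), end="")
```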

Ctrl+H
Find what: (?:^.*?\bBROKEN\b|\G(?!^))[^"\r\n]*\K"
Replace with: LEAVE EMPTY
TICK Match case
TICK Wrap around
SELECT Regular expression
UNTICK . matches newline
Replace all
Explanation:
(?: # non capture group
^ # beginning of line
.*? # 0 or more any character but newline, not greedy
\bBROKEN\b # the word BROKEN, between word boundaries
| # OR
\G # restart from last match position
(?!^) # not at the beginning of line
) # end group
[^"\r\n]* # 0 or more any character that is not a quote or linebreak
\K # forget all we have seen until this position
" # quote
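This second pattern only removes quotes that come after the word BROKEN. A rough Python equivalent (an assumption, not part of the answer; `strip_quotes_after_broken` is a hypothetical name) splits each line at the end of the first BROKEN and cleans only the tail:

```python
import re

BROKEN = re.compile(r"\bBROKEN\b")


def strip_quotes_after_broken(line: str) -> str:
    """Remove double quotes that appear after the first whole word BROKEN on a line."""
    m = BROKEN.search(line)
    if m is None:
        return line  # no BROKEN on this line: leave it untouched
    head, tail = line[: m.end()], line[m.end():]
    return head + tail.replace('"', "")


if __name__ == "__main__":
    text = 'a "b" BROKEN "c"\nBROKEN This is a "sentence".\n'
    print("\n".join(strip_quotes_after_broken(l) for l in text.splitlines()))
```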

Notepad++ regexp ID of the sub-expressions in the double parentheses

In Notepad++, I am trying to find strings like this:
'',
And convert them into this:
'',
I've made a regular expression that matches the string beginning with cards/ and ending with </a>:
(cards/)([^\s]{1,50})(([\s\.\?\!\-\,])(\w{1,50}))+(\.mp3"></a>)
Or an alternative approach:
(cards/)([^\s]{1,50})([\s\.\?\!\-\,]{0,})([^\s]{1,50})
Both work fine for searching, but I can't get the replacement to work.
The problem is that the number of words in a sentence may vary, so a replacement such as \1\2\3... doesn't work: I can't get the correct ID of the sub-expressions inside the double parentheses.
I tried to google the topic but couldn't find anything. Any advice, link or, best of all, a full replacement expression would be very much appreciated.

This replaces every run of spaces after /cards/ with a hyphen and lowercases the file name.
Ctrl+H
Find what: (?:href="/mp3files/cards/|\G)\K(?!\.mp3)(\S+)(?:\h+|(\.mp3))
Replace with: \L$1(?2$2:-)
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
(?: # non capture group
href="/mp3files/cards/ # literally
| # OR
\G # restart from last match position
) # end group
\K # forget all we have seen until this position
(?!\.mp3) # negative lookahead, make sure ".mp3" does not follow this position
(\S+) # group 1, 1 or more non spaces
(?: # non capture group
\h+ # 1 or more horizontal spaces
| # OR
(\.mp3) # group 2, literally ".mp3"
) # end group
Replacement:
\L$1 # lowercase content of group 1
(?2 # if group 2 exists (the extension .mp3)
$2 # use it
: # else
- # put a hyphen
) # endif
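In a script, the same idea works with a replacement function, which receives the whole match, so the varying number of words doesn't need one capture group per word. A minimal Python sketch; the question's sample strings were lost, so the input format `<a href="/mp3files/cards/...mp3"></a>` is assumed from the answer's pattern, and the names `CARD_LINK` and `fix_card_links` are hypothetical:

```python
import re

# Assumed link format: <a href="/mp3files/cards/Some Words.mp3"></a>
CARD_LINK = re.compile(r'(href="/mp3files/cards/)([^"]*?)(\.mp3)')


def _fix_name(m: re.Match) -> str:
    # Turn each run of spaces/tabs in the file name into one hyphen, then lowercase it.
    name = re.sub(r"[ \t]+", "-", m.group(2)).lower()
    return m.group(1) + name + m.group(3)


def fix_card_links(text: str) -> str:
    """Hyphenate and lowercase every file name after /mp3files/cards/."""
    return CARD_LINK.sub(_fix_name, text)


if __name__ == "__main__":
    print(fix_card_links('<a href="/mp3files/cards/Hello World Again.mp3"></a>'))
```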

I'm trying to delete everything but the email but I get the opposite

I searched other questions here, but none of the solutions seem to work for me.
I'm using Notepad++ and I'm trying to remove everything but the email addresses from an email list where everything is on the same line.
Example:
| RONNAN FERREIRA | RENANRFCRON#GMAIL.COM 17933 | RONNE YAN CANAVARRO DE ASSIS |
This regex seems to select every email perfectly:
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
When I put $1 or \1 in "Replace with", it just deletes all the emails; I want it to do the opposite.

Ctrl+H
Find what: \G.*?(\S+#\S+)(?:(?=.*#)|.*$)
Replace with: $1 <-- there is a space after $1; you may use any character you want as the delimiter
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
\G # restart from last match position
.*? # 0 or more any character, not greedy
(\S+#\S+) # group 1, an email
(?: # non capture group
(?=.*#) # positive lookahead, make sure there is another # later on the line
| # OR
.*$ # 0 or more any character until end of line
) # end group
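Outside Notepad++, it is usually simpler to collect the matches than to delete everything around them. A minimal Python sketch, assuming the same loose definition of an email as the answers here (non-space text around a `#`; the question writes `#` in place of `@`). Python's `re` has no `\h`, so `\s` stands in for it, and the name `keep_only_emails` is hypothetical:

```python
import re

# Loose "email": non-whitespace, non-# text on both sides of a single #.
EMAIL_LIKE = re.compile(r"[^\s#]+#[^\s#]+")


def keep_only_emails(line: str, sep: str = " ") -> str:
    """Return only the email-like tokens of one line, joined by sep."""
    return sep.join(EMAIL_LIKE.findall(line))


if __name__ == "__main__":
    line = "| RONNAN FERREIRA | RENANRFCRON#GMAIL.COM 17933 | RONNE YAN CANAVARRO DE ASSIS |"
    print(keep_only_emails(line))
```

Because the tokens are joined rather than left in place, no trailing delimiter is left at the end of the line.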

Another option might be to either capture an email format in a capturing group, or match all parts separated by 1 or more horizontal whitespace chars that do not contain an #.
To match only the spaces after an email, except after the last one, you could make use of a conditional, which has the form (?(?=regex)then|else).
In the replacement use $1.
([^\h#]+#[^\h#]+(?(?=[^#\r\n]*#)\h+))|\h*(?<!\S)[^\h#]+(?:\h+[^\h#]+)*(?!\S)\h*
In parts:
( Capture group 1
[^\h#]+#[^\h#]+ Match an e-mail like format
(?(?=[^#\r\n]*#)\h+) Conditional, match 1+ horizontal whitespace chars if there is another email
) Close group
| Or
\h* Match 0+ horizontal whitespace chars
(?<!\S) Assert what is on the left is not a non whitespace char
[^\h#]+(?:\h+[^\h#]+)* Repeat matching the parts that don't have an #
(?!\S) Assert what is on the right is not a non whitespace char
\h* Match 0+ horizontal whitespace chars
Regex demo
A test for the example content with multiple email addresses:
| RONNAN FERREIRA | RENANRFCRON#GMAIL.COM 17933 | RONNE YAN CANAVARRO DE ASSIS | RENANRFCRON#GMAIL.COM 17933 |

How can I move a column with variable length in between one vertical bar "|" and "["?

My file has 4000k lines and I need to reformat it, so I am trying Notepad++ (or awk). The structure of every line is:
acc|GENBANK|ABJ91977.1|GENBANK|DQ876324|pol protein Tabulator[Human immunodeficiency virus 1]TabulatorTLWQRPFVTIKVGGQLKEALLDTGADDTVLEEIELPGRWKPKMIGGIGGFIKVRQYDQIXVEICGHKAIGTVLVGPTPVNVIGRNLMTQIGCTLN
The characters between the last vertical bar | and the first [ vary in length. I am only looking for tips, or where to focus, so I can do it myself. I tried printing the fields with awk, but because one part varies in length, I got inconsistent results. I can't select by columns either.
I would like to obtain a file with this structure:
acc|GENBANK|ABJ91977.1|GENBANK|DQ876324,acc|GENBANK|ABJ91977.1|GENBANK|DQ876324,pol protein
and another file with this structure:
acc|GENBANK|ABJ91977.1|GENBANK|DQ876324TabulatorTLWQRPFVTIKVGGQLKEALLDTGADDTVLEEIELPGRWKPKMIGGIGGFIKVRQYDQIXVEICGHKAIGTVLVGPTPVNVIGRNLMTQIGCTLN
Tabs are shown as the word Tabulator.

Here is a way to do it for the first file:
Ctrl+H
Find what: (^[^|]+(?:\|[^|]+){4})\|(.+?)\h+\[.+$
Replace with: $1,$1,$2
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
( # group 1
^ # beginning of line
[^|]+ # 1 or more non pipe
(?: # start non capture group
\| # a pipe
[^|]+ # 1 or more non pipe
){4} # end group, must appear 4 times
) # end group 1
\| # a pipe
(.+?) # group 2, 1 or more any character but newline, not greedy
\h+ # 1 or more horizontal spaces (space or tabulation)
\[ # an opening square bracket
.+ # 1 or more any character but newline
$ # end of line
Replacement:
$1 # content of group 1
, # a comma
$1 # content of group 1
, # a comma
$2 # content of group 2
Result for given example:
acc|GENBANK|ABJ91977.1|GENBANK|DQ876324,acc|GENBANK|ABJ91977.1|GENBANK|DQ876324,pol protein
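For a file of this size, the same pattern can also be applied line by line from a script; Python's `re` supports everything in it except `\h`. A minimal sketch (Python and the name `to_file1` are assumptions; `[ \t]` replaces `\h`):

```python
import re

# Group 1: the first five |-separated fields. Group 2: the description before
# the whitespace and the first "[". [ \t] stands in for \h.
FILE1 = re.compile(r"^([^|]+(?:\|[^|]+){4})\|(.+?)[ \t]+\[.+$")


def to_file1(line: str) -> str:
    """Build 'fields,fields,description'; lines that don't fit are returned unchanged."""
    return FILE1.sub(r"\1,\1,\2", line)


if __name__ == "__main__":
    line = ("acc|GENBANK|ABJ91977.1|GENBANK|DQ876324|pol protein\t"
            "[Human immunodeficiency virus 1]\t"
            "TLWQRPFVTIKVGGQLKEALLDTGADDTVLEEIELPGRWKPKMIGG"
            "IGGFIKVRQYDQIXVEICGHKAIGTVLVGPTPVNVIGRNLMTQIGCTLN")
    print(to_file1(line))
```

To stream the 4000k-line file, call it once per line, for example `out.write(to_file1(line.rstrip("\n")) + "\n")` inside `for line in f:`.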
For the second file:
Ctrl+H
Find what: (^[^|]+(?:\|[^|]+){4})\|.+?\h+\[.+?\](.+)$
Replace with: $1$2
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
( # group 1
^ # beginning of line
[^|]+ # 1 or more non pipe
(?: # start non capture group
\| # a pipe
[^|]+ # 1 or more non pipe
){4} # end group, must appear 4 times
) # end group 1
\| # a pipe
.+? # 1 or more any character but newline, not greedy
\h+ # 1 or more horizontal spaces (space or tabulation)
\[ # an opening square bracket
.+? # 1 or more any character but newline, not greedy
\] # a closing square bracket
(.+) # group 2, 1 or more any character but newline
$ # end of line
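The second pattern translates to Python's `re` in the same way. A minimal sketch, again with `[ \t]` standing in for `\h` and a hypothetical name `to_file2`:

```python
import re

# Group 1: the first five |-separated fields. After the bracketed organism name,
# group 2 keeps everything else (the tab and the sequence).
FILE2 = re.compile(r"^([^|]+(?:\|[^|]+){4})\|.+?[ \t]+\[.+?\](.+)$")


def to_file2(line: str) -> str:
    """Drop the description and [organism]; lines that don't fit are returned unchanged."""
    return FILE2.sub(r"\1\2", line)


if __name__ == "__main__":
    line = ("acc|GENBANK|ABJ91977.1|GENBANK|DQ876324|pol protein\t"
            "[Human immunodeficiency virus 1]\t"
            "TLWQRPFVTIKVGGQLKEALLDTGADDTVLEEIELPGRWKPKMIGG"
            "IGGFIKVRQYDQIXVEICGHKAIGTVLVGPTPVNVIGRNLMTQIGCTLN")
    print(to_file2(line))
```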

Notepad++ replace with spaces

Using a regex in Notepad++, I am trying to replace 53 characters on a line with spaces:
Find: (^RS.{192})(.{53})(.{265})
Replace: \1(\x20){53}\3
It's replacing group \2 with " {53}" but what I want is 53 spaces.
How do you do this?

The replacement is not a regular expression, although it may contain back references.
Just type 53 literal spaces:
Replace: \1, then 53 literal spaces, then \3
A bit tedious, but it works.
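If a script is an option, a replacement function can build the run of spaces instead of typing it; Python's `re` replacement strings can't repeat text either. A minimal Python sketch (an assumption, not part of the answer) reusing the question's layout: `RS`, 192 characters, the 53 to blank, then at least 265 more. The name `blank_53` is hypothetical:

```python
import re

# Group 1: RS plus 192 characters. Group 2: the 53 characters to blank out.
# The lookahead requires at least 265 more characters, as in the question's pattern.
BLANK_SPAN = re.compile(r"^(RS.{192})(.{53})(?=.{265})")


def blank_53(line: str) -> str:
    """Replace the 53 characters after RS + 192 with spaces; other lines are unchanged."""
    return BLANK_SPAN.sub(lambda m: m.group(1) + " " * len(m.group(2)), line)


if __name__ == "__main__":
    line = "RS" + "a" * 192 + "b" * 53 + "c" * 265
    print(repr(blank_53(line)[190:250]))
```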

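In the search pattern, \s matches a space (or any other whitespace character),
which means you can use \s{53} to match 53 of them; it does not insert spaces when used in the replacement.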

Assuming there are ALWAYS RS and 192 characters before, and 265 characters after:
Ctrl+H
Find what: (?:^RS.{192}|\G)\K.(?=.{265,}$)
Replace with: (a single space)
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
(?: # start non capture group
^ # beginning of line
RS # literally RS
.{192} # exactly 192 characters, any but newline
| # OR
\G # restart from last match position
) # end group
\K # forget all we've seen until this position
. # 1 any character
(?= # positive lookahead, zero-length assertion to make sure we have after:
.{265,} # at least 265 any characters
$ # end of line
) # end lookahead
Replacement:
(a space) # the character to insert
Illustrated with a shorter line (and correspondingly smaller counts):
RSabcdefghijklmnopqrstuvwxyz
Result for given example:
RSabcdefghij qrstuvwxyz
