This is my text
BROKEN This is a "sentence".
This sentence is an actual normal sentence.
I wish to replace/filter the quotation marks out of every line that has the word BROKEN in it
I thought this would be simple but I couldn't do it
my regex
(?=BROKEN)"
could I get some help?
If you also want to match double quotes before the word BROKEN, you can skip the whole line that does not contain the word.
Find what:
^(?!.*\bBROKEN\b).*\R?(*SKIP)(*F)|"
Replace with: (leave empty)
Explanation
^ Start of string
(?!.*\bBROKEN\b) Negative lookahead, assert that the word BROKEN does not occur
.*\R?(*SKIP)(*F) Match the whole line including an optional newline and skip the match
| Or
" Match a double quote
See a regex101 demo.
Before
After
Ctrl+H
Find what: (?:^.*?\bBROKEN\b|\G(?!^))[^"\r\n]*\K"
Replace with: LEAVE EMPTY
TICK Match case
TICK Wrap around
SELECT Regular expression
UNTICK . matches newline
Replace all
Explanation:
(?: # non capture group
^ # beginning of line
.*? # 0 or more any character but newline
\bBROKEN\b # literally
| # OR
\G # restart from last match position
(?!^) # not at the beginning of line
) # end group
[^"\r\n]* # 0 or more any character that is not a quote or linebreak
\K # forget all we have seen until this position
" # quote
Screenshot (before):
Screenshot (after):
Related
All.
I have some data with some improper line breaks. I would like to search and replace any CR LF that is not followed by an 8 digit number and a pipe.
For example:
12345678|Text|Text CRLF
123.4567|Text|Text CRLF
Text|4567890|Text
This text above should change to:
12345678|Text|Text 123.4567|Text|Text Text|4567890|Text
I have tried the following:
\r\n([^[0-9]{8}\|])
Any help is very appreciated.
Ctrl+H
Find what: \R(?!\d{8}\|)
Replace with: LEAVE EMPTY
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
\R # any kind of linebreak, you can use \r\n if you want to replace ONLY \r\n
(?! # negative lookahead, make we haven't after:
\d{8} # 8 digit
\| # a pipe
) # endd lookahead
Screenshot (before):
Screenshot (after):
Probably a terrible title.
I am trying to take the following:
Joe Dane
Bob Sagget
Whitney Houston
Some
Other
Test
And trying to produce:
JOE_DANE("Joe Dane"),
BOB_SAGGET("Bob Sagget"),
WHITNEY_HOUSTON("Whitney Houston"),
SOME("Some"),
OTHER("Other"),
TEST("Test"),
I'm using Notepad++ and am close but not good enough at regex to figure out the remaining expression. So far, this is what I have:
Find what: (^.*)
Replace with: \1 \(\"\1\"\),
Produces: Joe Dane("Joe Dane"),
I've tried replacing with: \U$1 \(\"\1\"\), but this also impacts the second instance of \1 with upper case. It also does not replace the whitespace with an underscore _.
This can be done in a single step.
If you don't have more than 2 words in a line:
Ctrl+H
Find what: ^(\S+)(?: (\S+))?$
Replace with: \U$1(?2_$2)\E\("$0"\),
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
^ # beginning of line
(\S+) # group 1, 1 or more non space
(?: (\S+))? # non capture group, a space, group 2, 1 or more non space, optional
$
Replacement:
\U # uppercased
$1 # group 1
(?2_$2) # if group 2 exists, add and underscore before
\E # end uppercase
\("$0"\), # the whole match with parens and quote
Screenshot (after):
If you have more than 2 words (up to 5), use:
Find ^(\S+)(?: (\S+))?(?: (\S+))?(?: (\S+))?(?: (\S+))?
Replace: \U$1(?2_$2)(?3_$3)(?4_$4)(?5_$5)\E\("$0"\),
I you have more thans five word, add as many (?: (\S+))? as needed.
You might do it in 2 steps, first matching any char 1+ more times from the start of the string.
Find what
^.+
For the first replacement you can use \E to end the activation of \U and use the full match $0
Replace with
\U$0\E\("$0"\),
For the second step, to replace the spaces with underscores, you could skip over the text between parenthesis, and match spaces between uppercase chars.
Find what
\(".*?"\)(*SKIP)(*F)|[A-Z]+\K\h+(?=[A-Z])
\(".*?"\) Match from (" till ")
(*SKIP)(*F)| Skip this part of the match
[A-Z]+\K Match uppercase chars and use \K to clear the current match buffer (forget what is matches do far)
\h+(?=[A-Z]) Match 1+ horizontal whitespace chars and assert an uppercase char to the right
Replace with _
I have a snippet of text from EDI X12. I am trying to find lines where a BBQ segment is followed by another BBQ segment. I want to replace all BBQ segments in the second line with BBB
Orig text
HI*BBR<0Y6D0Z1<D8<20190816~
HI*BBQ<05BC0ZZ<D8<20190806*BBQ<05BB0ZZ<D8<20190729*BBQ<06UM07Z<D8<20190729~
HI*BBQ<0JBL0ZZ<D8<20190809*BBQ<0J9N0ZZ<D8<20190816*BBQ<0KBS0ZZ<D8<20190816~
HI*BI<71<RD8<20190716-20190722~
Needs to become
HI*BBR<0Y6D0Z1<D8<20190816~
HI*BBQ<05BC0ZZ<D8<20190806*BBQ<05BB0ZZ<D8<20190729*BBQ<06UM07Z<D8<20190729~
HI*BBB<0JBL0ZZ<D8<20190809*BBB<0J9N0ZZ<D8<20190816*BBB<0KBS0ZZ<D8<20190816~
HI*BI<71<RD8<20190716-20190722~
This targets what I am looking for in capturing group 3, but how to replace BBQ with BBB within that group?
(^HI\*BBQ.+?~\r\n)(^HI\*)(BBQ.+?~\r\n)
Thanks for any ideas!
Ctrl+H
Find what: (?:^HI\*BBQ\b.+?~\RHI\*BB|\G(?!^).*?\bBB)\KQ\b
Replace with: B
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
(?: # non capture group
^ # begining of line
HI\*BBQ # literally
.+? # 1 or more any character but newline
~ # a tilde
\R # any kind of linebreak
HI\*BB # literally
| # OR
\G # restart from last match position
(?!^) # not at the beginning of line
.*?BB # 0 or more any character but newline, not greedy, followed by BB
) # end group
\K # forget all we have seen until this position
Q # the letter Q
Screen capture (before):
Screen capture (after):
I'm trying to clean up some assembly code and I'd like to convert the spaces between the instruction and argument to tabs. However, I'd like to avoid inadvertently converting the spaces between the words in the comments after the semicolon.
So here is an example some lines of code:
label: bcf INTCON,2 ; comment comment and more comment.
btfss PORTA,2
The closest I've come is (?<=^).+(?=;). This not only matches EVERYTHING between the beginning of the line and the semicolon, but it includes all semicolons except for the very last semicolon. Imagine lines of codes with comments that was commented out. It also doesn't take into consideration line without comments.
How do I do this?
Maybe,
^([^:\r\n]+:)\s*([^\r\n]+?)(?:$|\s{2,})(;.*)?$
and a replacement of,
$1 $2 $3
might be OK to start with.
Demo
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.
Ctrl+H
Find what: ^(\w+:)\h+|^\h+
Replace with: (?1$1\t:\t\t)
check Wrap around
check Regular expression
Replace all
Explanation:
^ # beginning of line
(\w+:) # group 1, 1 or more word characters followed by colon
\h+ # 1 or more horizontal spaces
| # OR
^ # beginning of line
\h+ # 1 or more horizontal spaces
Replacement:
(?1 # if group 1 exists, then
$1\t # content of group 1 and a tab
: # else
\t\t # 2 tabs
) # end conditional replace
Screen capture:
If you want to change the space between bcf and INTCON,2 to 2 tabs, you might match the 2 "words" and make sure that they don't start with a ;
^(?:\S+:)?\h+(?!;)\S+\K\h+(?=[^\s;])
^ Start of string
(?:\S+:)? Optionally match 1+ non whitespace chars and :
\h+(?!;) Match 1+ horizontal whitespace chars, then assert what is on the right is not a ;
\S+\K Match 1+ non whitespace chars, forget what was matched
\h+ Match 1+ horizontal whitespace chars (this match will be replaced)
(?=[^\s;]) Assert what is on the right is not a whitespace char or ;
In the replacement use 2 tabs \t\t
Regex demo
Edit
If you want to find the first space between non whitespace chars, you might use
^.*?\S\K (?=\S)
Using a regex in Notepad++ I am trying to replace 53 characters on a line with spaces:
Find: (^RS.{192})(.{53})(.{265})
Replace: \1(\x20){53}\3
It's replacing group \2 with " {53}" but what I want is 53 spaces.
How do you do this?
Replacement terms are not regex expressions, except they may use back references.
Just code 53 literal spaces:
Replace: \1 \3
A bit tedious, but it works.
space is \s
which means you need to use \s{53}
Assuming there is ALLWAYS RS and 192 characters before and 265 after
Ctrl+H
Find what: (?:^RS.{192}|\G)\K.(?=.{265,}$)
Replace with: # a space
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
(?: # start non capture group
^ # beginning of line
RS # literally RS
.{192} # 192 any character
| # R
\G # restart from last match position
) # end group
\K # forget all we've seen until this position
. # 1 any character
(?= # positive lookahead, zero-length assertion to make sure we have after:
.{265,} # at least 256 any characters
$ # end of line
) # en lookahead
Replacement:
% # the character to insert
Given shorter line to illusrate:
RSabcdefghijklmnopqrstuvwxyz
Result for given example:
RSabcdefghij qrstuvwxyz
Screen shot: