I'm trying to clean up some assembly code and I'd like to convert the spaces between the instruction and argument to tabs. However, I'd like to avoid inadvertently converting the spaces between the words in the comments after the semicolon.
Here are some example lines of code:
label: bcf INTCON,2 ; comment comment and more comment.
btfss PORTA,2
The closest I've come is (?<=^).+(?=;). This not only matches EVERYTHING between the beginning of the line and the semicolon, but also includes every semicolon except the very last one (imagine lines of code with comments that were themselves commented out). It also doesn't take lines without comments into consideration.
How do I do this?
Maybe,
^([^:\r\n]+:)\s*([^\r\n]+?)(?:$|\s{2,})(;.*)?$
and a replacement of,
$1 $2 $3
might be OK to start with.
Ctrl+H
Find what: ^(\w+:)\h+|^\h+
Replace with: (?1$1\t:\t\t)
check Wrap around
check Regular expression
Replace all
Explanation:
^ # beginning of line
(\w+:) # group 1, 1 or more word characters followed by colon
\h+ # 1 or more horizontal spaces
| # OR
^ # beginning of line
\h+ # 1 or more horizontal spaces
Replacement:
(?1 # if group 1 exists, then
$1\t # content of group 1 and a tab
: # else
\t\t # 2 tabs
) # end conditional replace
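For anyone doing this outside Notepad++, the conditional replacement can be approximated in Python. This is only a sketch under a few assumptions: Python's re module has neither \h nor the (?1...:...) replacement syntax, so [ \t] and a callback stand in for them, and the function name retab_line_start is made up for illustration.

```python
import re

# Same alternation as the Notepad++ pattern, with [ \t] standing in for \h.
LINE_START = re.compile(r'^(\w+:)[ \t]+|^[ \t]+', re.MULTILINE)

def retab_line_start(text):
    """Label lines get 'label:' plus one tab; unlabeled lines get two tabs."""
    def repl(match):
        if match.group(1) is not None:   # group 1 exists: a label was matched
            return match.group(1) + '\t'
        return '\t\t'                    # no label: only leading whitespace
    return LINE_START.sub(repl, text)
```

The callback plays the role of the if/else in the replacement string: it checks whether group 1 took part in the match before deciding what to emit.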
If you want to change the space between bcf and INTCON,2 to 2 tabs, you might match the 2 "words" and make sure that they don't start with a ;
^(?:\S+:)?\h+(?!;)\S+\K\h+(?=[^\s;])
^ Start of string
(?:\S+:)? Optionally match 1+ non whitespace chars and :
\h+(?!;) Match 1+ horizontal whitespace chars, then assert what is on the right is not a ;
\S+\K Match 1+ non whitespace chars, forget what was matched
\h+ Match 1+ horizontal whitespace chars (this match will be replaced)
(?=[^\s;]) Assert what is on the right is not a whitespace char or ;
In the replacement use 2 tabs \t\t
Edit
If you want to find the first space between non whitespace chars, you might use
^.*?\S\K (?=\S)
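If you need the same replacement in a script, here is a rough Python sketch of the first pattern. Python's re module supports neither \K nor \h, so this version assumes a capture group for the part to keep and [ \t] for horizontal whitespace; the name tab_before_operand is hypothetical.

```python
import re

# Group 1 replaces \K: it captures the text that must be kept unchanged.
# [ \t] stands in for \h, which Python's re module does not provide.
OPERAND_GAP = re.compile(
    r'^((?:\S+:)?[ \t]+(?!;)\S+)[ \t]+(?=[^\s;])',
    re.MULTILINE,
)

def tab_before_operand(text):
    """Replace the gap between instruction and operand with two tabs."""
    return OPERAND_GAP.sub(lambda m: m.group(1) + '\t\t', text)
```

Like the original pattern, it expects whitespace before the instruction, so comment-only lines and instructions without an operand are left alone.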
This is my text
BROKEN This is a "sentence".
This sentence is an actual normal sentence.
I wish to remove the quotation marks from every line that contains the word BROKEN.
I thought this would be simple, but I couldn't do it.
My regex:
(?=BROKEN)"
Could I get some help?
If you also want to match double quotes before the word BROKEN, you can skip the whole line that does not contain the word.
Find what:
^(?!.*\bBROKEN\b).*\R?(*SKIP)(*F)|"
Replace with: (leave empty)
Explanation
^ Start of string
(?!.*\bBROKEN\b) Negative lookahead, assert that the word BROKEN does not occur
.*\R?(*SKIP)(*F) Match the whole line including an optional newline and skip the match
| Or
" Match a double quote
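Outside Notepad++ the same result can be had without (*SKIP)(*F), which Python's re module does not support. The sketch below swaps in a different technique: it matches each whole line containing BROKEN and strips the quotes inside a callback. The function name is made up for illustration.

```python
import re

# Any whole line that contains the word BROKEN.
BROKEN_LINE = re.compile(r'^.*\bBROKEN\b.*$', re.MULTILINE)

def strip_quotes_on_broken_lines(text):
    """Remove double quotes only on lines that contain the word BROKEN."""
    return BROKEN_LINE.sub(lambda m: m.group(0).replace('"', ''), text)
```

Quotes before and after the word are both removed, because the callback sees the entire line.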
Ctrl+H
Find what: (?:^.*?\bBROKEN\b|\G(?!^))[^"\r\n]*\K"
Replace with: LEAVE EMPTY
TICK Match case
TICK Wrap around
SELECT Regular expression
UNTICK . matches newline
Replace all
Explanation:
(?: # non capture group
^ # beginning of line
.*? # 0 or more any character but newline
\bBROKEN\b # literally
| # OR
\G # restart from last match position
(?!^) # not at the beginning of line
) # end group
[^"\r\n]* # 0 or more any character that is not a quote or linebreak
\K # forget all we have seen until this position
" # quote
I have strings that look like some text - other text and I need to delete everything before and including the hyphen - and the space after it.
But due to typos I might have:
some text -other text or some text- other text or some text-other text or double spaces instead of single spaces
I am using RegEx ^.*\s+\-\s+ and this works for some text - other text with single or multiple spaces before and after the -
But for the other possibilities, where the whitespace is missing, I have used two ors, so I have ^.*\s+\-\s+|.*\-\s|.*\-
Is there a more concise pattern that does not use multiple ors for this?
Thank you for any help on this
https://regex101.com/r/TNU7i6/1
Instead of using an alternation with 3 patterns, you might use a pattern to match all except the -, then match the - and optional whitespace chars.
^[^-]*-\s*
If there should be a non whitespace char following, and a lookahead is supported:
^[^-]*-\s*(?=\S)
^ Start of string
[^-]*- Match 0+ times any char except -, then match -
\s* Match optional whitespace chars
(?=\S) Positive lookahead, assert a non whitespace char to the right
Note that \s and the negated character class [^-] can also match a newline.
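As a quick check, the first pattern can be tried against all the spacing variants from the question. This is a minimal Python sketch; the helper name drop_prefix is just for illustration.

```python
import re

# Everything up to and including the first hyphen, plus trailing whitespace.
PREFIX = re.compile(r'^[^-]*-\s*')

def drop_prefix(s):
    """Delete the text before the first hyphen, the hyphen, and spaces after it."""
    return PREFIX.sub('', s, count=1)

samples = [
    "some text - other text",
    "some text -other text",
    "some text- other text",
    "some text-other text",
    "some text  -  other text",
]
results = [drop_prefix(s) for s in samples]
```

A string without any hyphen is returned unchanged, since the pattern requires a literal -.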
1st solution: With your shown samples, please try the following.
^.*?\s+\S+\s?-\s*(.*)$
OR
^.*?\s+\S+\s*-\s*(.*)$
2nd solution: You could also use \K to forget the part of the match seen so far; in that case try:
^.*?\s+\S+\s?-\s*\K.*$
OR
^.*?\s+\S+\s*-\s*\K.*$
1st solution explanation:
^.*?\s+ ##From the start of the value, match lazily up to the 1st occurrence of space(s).
\S+\s? ##Match 1 or more non-space characters followed by an optional space.
-\s* ##Match - followed by optional spaces.
(.*)$ ##Capture everything up to the end of the value in group 1.
2nd solution explanation:
^.*?\s+ ##From the start of the value, match lazily up to the 1st occurrence of space(s).
\S+\s? ##Match 1 or more non-space characters followed by an optional space.
-\s*\K ##Match - followed by 0 or more spaces; \K then discards everything matched so far, so that only the wanted text remains in the match.
.*$ ##Match the rest of the value, which is all that is kept after \K.
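In engines without \K (Python's re module is one), the capture-group form of the 1st solution does the same job. A small sketch, assuming the \s* variant of the pattern and a hypothetical helper name:

```python
import re

# Group 1 holds the text after the hyphen and any spaces around it.
AFTER_HYPHEN = re.compile(r'^.*?\s+\S+\s*-\s*(.*)$')

def text_after_hyphen(s):
    """Return the captured text, or the input unchanged when nothing matches."""
    m = AFTER_HYPHEN.match(s)
    return m.group(1) if m else s
```

Note that this pattern needs at least one whitespace character somewhere before the hyphen, so a value like a-b would come back unchanged.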
Probably a terrible title.
I am trying to take the following:
Joe Dane
Bob Sagget
Whitney Houston
Some
Other
Test
And trying to produce:
JOE_DANE("Joe Dane"),
BOB_SAGGET("Bob Sagget"),
WHITNEY_HOUSTON("Whitney Houston"),
SOME("Some"),
OTHER("Other"),
TEST("Test"),
I'm using Notepad++ and am close but not good enough at regex to figure out the remaining expression. So far, this is what I have:
Find what: (^.*)
Replace with: \1 \(\"\1\"\),
Produces: Joe Dane("Joe Dane"),
I've tried replacing with: \U$1 \(\"\1\"\), but this also impacts the second instance of \1 with upper case. It also does not replace the whitespace with an underscore _.
This can be done in a single step.
If you don't have more than 2 words in a line:
Ctrl+H
Find what: ^(\S+)(?: (\S+))?$
Replace with: \U$1(?2_$2)\E\("$0"\),
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
^ # beginning of line
(\S+) # group 1, 1 or more non space
(?: (\S+))? # non capture group, a space, group 2, 1 or more non space, optional
$
Replacement:
\U # uppercased
$1 # group 1
(?2_$2) # if group 2 exists, add an underscore before it
\E # end uppercase
\("$0"\), # the whole match with parens and quote
If you have more than 2 words (up to 5), use:
Find ^(\S+)(?: (\S+))?(?: (\S+))?(?: (\S+))?(?: (\S+))?
Replace: \U$1(?2_$2)(?3_$3)(?4_$4)(?5_$5)\E\("$0"\),
If you have more than five words, add as many (?: (\S+))? groups as needed, with a matching (?N_$N) in the replacement for each.
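The same single-step idea can be sketched in Python, where a callback takes the place of \U...\E and the (?2_$2) conditionals. This assumes the five-word pattern with a trailing $ added, as in the two-word version; to_enum_entries is a made-up name.

```python
import re

# Up to five space-separated words per line; unused groups are None.
NAME = re.compile(
    r'^(\S+)(?: (\S+))?(?: (\S+))?(?: (\S+))?(?: (\S+))?$',
    re.MULTILINE,
)

def to_enum_entries(text):
    """Turn each name line into UPPER_SNAKE("Original Name"),"""
    def repl(m):
        words = [w for w in m.groups() if w is not None]  # skip missing groups
        return '_'.join(words).upper() + '("' + m.group(0) + '"),'
    return NAME.sub(repl, text)
```

Filtering out the None groups is the counterpart of the conditional replacement: an underscore is only added for words that actually matched.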
You might do it in 2 steps, first matching any char 1 or more times from the start of the string.
Find what
^.+
For the first replacement you can use \E to end the activation of \U and use the full match $0
Replace with
\U$0\E\("$0"\),
For the second step, to replace the spaces with underscores, you could skip over the text between parentheses, and match spaces between uppercase chars.
Find what
\(".*?"\)(*SKIP)(*F)|[A-Z]+\K\h+(?=[A-Z])
\(".*?"\) Match from (" till ")
(*SKIP)(*F)| Skip this part of the match
[A-Z]+\K Match uppercase chars and use \K to clear the current match buffer (forget what is matched so far)
\h+(?=[A-Z]) Match 1+ horizontal whitespace chars and assert an uppercase char to the right
Replace with _
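For comparison, here is how the two steps might look in Python, which has no (*SKIP)(*F), \K or \h. The sketch assumes the common workaround of matching the part to skip in an alternation and returning it untouched from a callback, with a lookbehind in place of \K; the function name is hypothetical.

```python
import re

def to_enum_entries_two_step(text):
    # Step 1: prepend the uppercased line to ("original line"),
    step1 = re.sub(
        r'^.+',
        lambda m: m.group(0).upper() + '("' + m.group(0) + '"),',
        text,
        flags=re.MULTILINE,
    )
    # Step 2: the first alternative consumes ("...") and is put back as-is;
    # the second matches whitespace between two uppercase letters.
    return re.sub(
        r'\(".*?"\)|(?<=[A-Z])[ \t]+(?=[A-Z])',
        lambda m: m.group(0) if m.group(0).startswith('(') else '_',
        step1,
    )
```

Because the quoted part is consumed by the first alternative, the spaces inside it are never offered to the second one.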
Given the following ; delimited string
a;; z
toy;d;hh
toy
;b;;jj
z;
d;23
d;23td
;;io;
b y;b;12
z
a;b;bb;;;34
z
and this regex
^(?!(?:(a|d))(?:;|$)).*(\s*\z|$)\R*
I want to match the full lines whose 1st column is not a or d and remove them by substituting with an empty string, to get this:
a;; z
d;23
d;23td
a;b;bb;;;34
Please see the demo
In the Substitution panel, there is a 5th empty line, which needs to be removed.
I have used this \s*\z in the past for this purpose. As implemented here, it does not seem to work.
Any help is appreciated
I think the reason your regex won't remove the last newline, is that it is part of the end of the last part that you want to keep, so without matching it you can't remove it.
So I rewrote the regex to match the line you want to keep, but also to include everything above and below the match that is not another match.
The key difference is using a conditional to only match the newline of the group you want to keep if it is followed by another match.
regex (linebreaks for readability):
((?!(a|d)).*(\s*\z|$)\R*)*
(^(a|d).*(?(?=\R*(.*\s*\R+)*(a|d))\R))
((?!(a|d)).*(\s*\z|$)\R*)*
replace with $4 -->
a;; z
d;23
d;23td
a;b;bb;;;34
For readability I removed some of the non-capturing and string separator logic you had; if they are necessary you can add them back in.
Logic breakdown of the parts:
(?(?=\R*(.*\s*\R+)*(a|d))\R) is the conditional: it only matches the newline \R if (?) it is followed by (?=) any non-matching lines (.*\s*\R+)* that end in a newline, followed by (a|d).
The middle part (^(a|d).*(?(?=\R*(.*\s*\R+)*(a|d))\R)) containing this ends up as the replacing group $4. It matches lines starting with (a|d), and every such line except the last also matches the newline at its end.
The beginning and end of the regex ((?!(a|d)).*(\s*\z|$)\R*)* are exactly the same, and match all the unneeded stuff so that it gets removed.
You could match what you want to remove, and capture in a group what you want to keep.
To prevent removing the newline sequences between capturing groups, you could use an if clause (? to only match 0+ unicode newline sequences when there is no more line following that starts with [ad];
In the replacement use group 1 $1
^(?:(?![ad];).*\R*)*|^([ad];.*(?:\R[ad];.*)*)(?(?![\s\S]*\R[ad];)\R*)
Explanation
^ Start of line
(?: Non capture group
(?![ad];) If the line does not start with a or d followed by ;
.*\R* Match the whole line and 0+ times a unicode newline sequence
)* Close group and repeat 0+ times to match all consecutive lines
| Or
^ Start of line
( Capture group 1
[ad];.* Match a or d followed by ; and the rest of the line
(?: Non capture group
\R[ad];.* Match newline, a or d followed by ; and the rest of the line
)* Close group and repeat 0+ times to match all consecutive lines
) Close group 1
(? If clause, only match a unicode newline sequence if the [ad]; pattern does not occur anymore
(?! Negative lookahead, assert what follows is not
[\s\S]*\R[ad]; Match the [ad]; pattern
) Close lookahead.
\R* If the assertion is true, Match 0+ unicode newline sequences
) Close if clause
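If a script is an option, the trailing-newline problem can be sidestepped entirely. The Python sketch below deliberately swaps in a different technique (filtering lines and re-joining them) because Python's re module supports neither \R nor lookahead conditionals; keep_a_or_d_lines is a made-up name.

```python
import re

# First column is exactly a or d: the letter is followed by ; or ends the line.
KEEP = re.compile(r'[ad](?:;|$)')

def keep_a_or_d_lines(text):
    """Keep only the lines whose first column is a or d, with no trailing newline."""
    kept = [line for line in text.splitlines() if KEEP.match(line)]
    return '\n'.join(kept)

sample = (
    "a;; z\ntoy;d;hh\ntoy\n;b;;jj\nz;\nd;23\nd;23td\n"
    ";;io;\nb y;b;12\nz\na;b;bb;;;34\nz\n"
)
result = keep_a_or_d_lines(sample)
```

Joining the kept lines with '\n' means there is never a newline after the last kept line, which is what the conditional in the regex is working to achieve.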
I would like to know how to capture text only if the beginning of a line matches a certain string, but I don't want to capture the beginning string itself.
For example, if I have the text:
BEGIN_TAG: Text To Capture
WRONG_TAG: Text Not to Capture
I want to capture:
Text To Capture
from the line that begins with BEGIN_TAG:, not from the line that begins with WRONG_TAG:
I know how to select the line that begins with the desired text: ^BEGIN_TAG:\W?(.*)
but this also selects the text "BEGIN_TAG:". I don't want that; I only want the text after "BEGIN_TAG:".
I am using PCRE regex.
Instead of a positive lookbehind that does not allow unknown width patterns, you may use a match reset operator \K:
^BEGIN_TAG:\W?\K.*
Details:
^ - in Sublime, start of a line
BEGIN_TAG: - a string of literal chars
\W? - 1 or 0 non-word chars
\K - the match reset operator that discards all text matched so far
.* - any 0+ chars other than linebreak characters (the rest of the line) that are the only chars that will be kept in the matched text.
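Where \K is not available (Python's re module, for instance), a capture group gives the same text. A minimal sketch, using the sample from the question and a hypothetical function name:

```python
import re

# Group 1 is the rest of the line after the tag and one optional non-word char.
TAGGED = re.compile(r'^BEGIN_TAG:\W?(.*)', re.MULTILINE)

def tagged_text(text):
    """Return the text after BEGIN_TAG: for every line that starts with it."""
    return TAGGED.findall(text)

sample = "BEGIN_TAG: Text To Capture\nWRONG_TAG: Text Not to Capture"
captured = tagged_text(sample)
```

With a single group in the pattern, findall returns just the group contents, so the tag itself never shows up in the result.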
You can use lookbehind. Then, the text in the lookbehind group isn't part of the whole match. You can see it as an anchor like \b, ^, etc.
You then get:
(?<=^BEGIN_TAG:\W)(\w.*)$
Explained:
(?<= # Positive lookbehind group
^ # Start of line / string
BEGIN_TAG: # Literal
\W # A non-word character ([^a-zA-Z0-9_])
)
( # First and only matching group (probably not needed)
\w # A word character ([a-zA-Z0-9_])
.* # Any character, any number of times
)
$ # End of line / string
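The lookbehind version also carries over to Python, whose re module only accepts fixed-width lookbehinds; this one qualifies because ^ is zero-width and the rest has a fixed length. A short sketch without the capture group, with a made-up function name:

```python
import re

# The lookbehind checks the tag but keeps it out of the match itself.
AFTER_TAG = re.compile(r'(?<=^BEGIN_TAG:\W)\w.*$', re.MULTILINE)

def text_after_tag(text):
    """Return the whole match (no group needed) for each BEGIN_TAG: line."""
    return [m.group(0) for m in AFTER_TAG.finditer(text)]

sample = "BEGIN_TAG: Text To Capture\nWRONG_TAG: Text Not to Capture"
found = text_after_tag(sample)
```

Unlike the \W? in the question's pattern, the \W here is mandatory, so exactly one non-word character must sit between the colon and the text.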