Using regex to duplicate a selection and replace some characters

Probably a terrible title.
I am trying to take the following:
Joe Dane
Bob Sagget
Whitney Houston
Some
Other
Test
And trying to produce:
JOE_DANE("Joe Dane"),
BOB_SAGGET("Bob Sagget"),
WHITNEY_HOUSTON("Whitney Houston"),
SOME("Some"),
OTHER("Other"),
TEST("Test"),
I'm using Notepad++ and am close but not good enough at regex to figure out the remaining expression. So far, this is what I have:
Find what: (^.*)
Replace with: \1 \(\"\1\"\),
Produces: Joe Dane("Joe Dane"),
I've tried replacing with: \U$1 \(\"\1\"\), but this also uppercases the second instance of \1. It also does not replace the space with an underscore (_).

This can be done in a single step.
If you don't have more than 2 words in a line:
Ctrl+H
Find what: ^(\S+)(?: (\S+))?$
Replace with: \U$1(?2_$2)\E\("$0"\),
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
^ # beginning of line
(\S+) # group 1, 1 or more non space
(?: (\S+))? # non capture group, a space, group 2, 1 or more non space, optional
$ # end of line
Replacement:
\U # uppercased
$1 # group 1
(?2_$2) # if group 2 exists, add an underscore and group 2
\E # end uppercase
\("$0"\), # the whole match, in quotes and parentheses, followed by a comma
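Notepad++ handles \U, \E and the (?2...) conditional directly in the replacement string. The sketch below shows the same logic outside the editor. It assumes Python 3 and its standard re module. Because re supports neither \U...\E nor conditional replacement strings, a replacement function does that part. The names to_enum_entry and convert are illustrative only.

```python
import re

# Same search pattern as above: one word, optionally a space and a second word.
PATTERN = re.compile(r"^(\S+)(?: (\S+))?$", re.MULTILINE)


def to_enum_entry(m: "re.Match[str]") -> str:
    # \U$1: group 1, uppercased
    name = m.group(1).upper()
    # (?2_$2): only when group 2 took part in the match, add "_" plus group 2
    if m.group(2) is not None:
        name += "_" + m.group(2).upper()
    # \E\("$0"\),: the whole line, unchanged, in quotes and parentheses, plus a comma
    return f'{name}("{m.group(0)}"),'


def convert(text: str) -> str:
    return PATTERN.sub(to_enum_entry, text)


if __name__ == "__main__":
    print(convert("Joe Dane\nBob Sagget\nWhitney Houston\nSome\nOther\nTest"))
```

Lines with three or more words do not match this pattern and are left unchanged. The Notepad++ version has the same limitation.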
If you have more than 2 words (up to 5), use:
Find ^(\S+)(?: (\S+))?(?: (\S+))?(?: (\S+))?(?: (\S+))?
Replace: \U$1(?2_$2)(?3_$3)(?4_$4)(?5_$5)\E\("$0"\),
If you have more than five words, add as many (?: (\S+))? groups as needed, plus a matching (?N_$N) for each one in the replacement.
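Writing the optional groups out by hand gets tedious, so here is a hedged Python 3 sketch that generates them for a chosen maximum. The helpers build_pattern and convert are hypothetical. Unlike the 5-word pattern above, this sketch ends the pattern with $ (as the 2-word version does). Lines with more than max_words words are therefore skipped rather than partially converted.

```python
import re


def build_pattern(max_words: int) -> "re.Pattern[str]":
    # One required word, then (max_words - 1) optional " word" groups,
    # the same shape as ^(\S+)(?: (\S+))?(?: (\S+))?... above.
    # Assumption: anchored with $ so over-long lines are left alone.
    optional = r"(?: (\S+))?" * (max_words - 1)
    return re.compile(r"^(\S+)" + optional + r"$", re.MULTILINE)


def convert(text: str, max_words: int = 5) -> str:
    pattern = build_pattern(max_words)

    def repl(m: "re.Match[str]") -> str:
        # Keep only the groups that participated, like (?2_$2)(?3_$3)...
        words = [w for w in m.groups() if w is not None]
        return "_".join(words).upper() + f'("{m.group(0)}"),'

    return pattern.sub(repl, text)
```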

You might do it in 2 steps. First, match any character 1 or more times from the start of the line.
Find what
^.+
For the first replacement, use \U to uppercase the full match $0, end the uppercasing with \E, and then insert $0 again unchanged:
Replace with
\U$0\E\("$0"\),
For the second step, to replace the spaces with underscores, you could skip over the text between the parentheses and match only spaces between uppercase chars.
Find what
\(".*?"\)(*SKIP)(*F)|[A-Z]+\K\h+(?=[A-Z])
\(".*?"\) Match from (" till ")
(*SKIP)(*F)| Skip this part of the match so it is never replaced
[A-Z]+\K Match uppercase chars and use \K to clear the current match buffer (forget what was matched so far)
\h+(?=[A-Z]) Match 1+ horizontal whitespace chars and assert an uppercase char to the right
Replace with _
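Here is a rough Python 3 equivalent of the two steps, assuming the standard re module. Python's re has no (*SKIP)(*F), \K or \h. The sketch therefore uses the usual substitute for the skip idiom: it matches the protected "(...)" text in its own alternative and returns it unchanged. A (?<=[A-Z]) look-behind replaces [A-Z]+\K, and [ \t] replaces \h. The function names step1 and step2 are illustrative.

```python
import re


def step1(text: str) -> str:
    # Find ^.+ / Replace \U$0\E\("$0"\),
    return re.sub(r"^.+", lambda m: f'{m.group(0).upper()}("{m.group(0)}"),',
                  text, flags=re.MULTILINE)


def step2(text: str) -> str:
    # Alternative 1 captures ("...") and gives it back unchanged (the "skip").
    # Alternative 2 matches spaces/tabs between two uppercase letters.
    pattern = r'(\(".*?"\))|(?<=[A-Z])[ \t]+(?=[A-Z])'
    return re.sub(pattern, lambda m: m.group(1) or "_", text)


if __name__ == "__main__":
    print(step2(step1("Joe Dane\nWhitney Houston")))
```

The skip matters for input such as NASA Base. There, the space inside the quoted copy also sits between two capitals, and only the first alternative stops it from becoming an underscore.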

Related

How to extract all the strings between 2 patterns using regex Notepad++?

Extract all the strings between 2 patterns:
Input:
test.output0 testx.output1 output3 testds.output2(\t)
Output:
output0 output1 output3 output2
Note: (\t) is the tab character.
You may try:
\.\w+$
Explanation of the above regex:
\. - Matches . literally. If you do not want the . to be part of the match, use the look-behind (?<=\.) instead, or simply remove the \..
\w+ - Matches a word character [A-Za-z0-9_] 1 or more times.
$ - Represents end of the line.
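The following is a quick Python 3 check of the look-behind variant. It assumes each entry sits on its own line, because the question's exact layout is unclear. It also shows why the answer was later extended: output3 has no dot, so this pattern does not find it.

```python
import re

# Assumption: one entry per line. (?<=\.) keeps the dot out of the result.
PATTERN = re.compile(r"(?<=\.)\w+$", re.MULTILINE)

sample = "test.output0\ntestx.output1\noutput3\ntestds.output2"
print(PATTERN.findall(sample))  # ['output0', 'output1', 'output2']
```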
Edit (after the OP's second edit):
Based on your latest edit, this might be helpful:
.*?\.?(\w+)(?=\t)
Explanation:
.*? - Lazily matches any characters except a newline.
\.? - Matches . literally zero or one time.
(\w+) - Represents a capturing group matching the word-characters one or more times.
(?=\t) - Represents a positive look-ahead matching tab.
$1 - In the replacement, use "$1 " (the captured group followed by a space) to separate the outputs as you wanted. If you want to keep the tab instead, use $1\t.
Try matching on the following pattern:
Find: (?<![^.\s])\w+(?!\S)
Here is an explanation of the above pattern:
(?<![^.\s]) assert that what precedes is either a dot, whitespace, or the start of the input
\w+ match a word
(?!\S) assert that what follows is either whitespace or the end of the input
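To see this pattern pick out all four names, here is a small Python 3 run with the standard re module. It assumes the entries in the question are separated by spaces, with a tab at the very end.

```python
import re

PATTERN = re.compile(r"(?<![^.\s])\w+(?!\S)")

# Assumed layout: space-separated entries, trailing tab.
sample = "test.output0 testx.output1 output3 testds.output2\t"
print(PATTERN.findall(sample))  # ['output0', 'output1', 'output3', 'output2']
```

The single-character negative look-behind is fixed width, so Python accepts it as written.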

Using Notepad++ regex, match all spaces between specific characters

I'm trying to clean up some assembly code and I'd like to convert the spaces between the instruction and argument to tabs. However, I'd like to avoid inadvertently converting the spaces between the words in the comments after the semicolon.
Here are some example lines of code:
label: bcf INTCON,2 ; comment comment and more comment.
btfss PORTA,2
The closest I've come is (?<=^).+(?=;). This matches EVERYTHING between the beginning of the line and the semicolon, but because .+ is greedy it also swallows every semicolon except the last one. Imagine lines of code whose comments were themselves commented out. It also doesn't handle lines without comments.
How do I do this?
Maybe,
^([^:\r\n]+:)\s*([^\r\n]+?)(?:$|\s{2,})(;.*)?$
and a replacement of,
$1 $2 $3
might be OK to start with.
Ctrl+H
Find what: ^(\w+:)\h+|^\h+
Replace with: (?1$1\t:\t\t)
check Wrap around
check Regular expression
Replace all
Explanation:
^ # beginning of line
(\w+:) # group 1, 1 or more word characters followed by colon
\h+ # 1 or more horizontal spaces
| # OR
^ # beginning of line
\h+ # 1 or more horizontal spaces
Replacement:
(?1 # if group 1 exists, then
$1\t # content of group 1 and a tab
: # else
\t\t # 2 tabs
) # end conditional replace
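Here is a hedged Python 3 sketch of the same conditional replacement. Python's re has no (?1...:...) syntax in replacement strings, so a function checks whether group 1 matched. \h is written as [ \t]. The sample lines are assumptions, with btfss indented by spaces, and tabify is an illustrative name.

```python
import re

# Find ^(\w+:)\h+|^\h+  (\h written as [ \t])
PATTERN = re.compile(r"^(\w+:)[ \t]+|^[ \t]+", re.MULTILINE)


def repl(m: "re.Match[str]") -> str:
    # (?1$1\t:\t\t): the label plus one tab if group 1 matched, otherwise two tabs
    return (m.group(1) + "\t") if m.group(1) is not None else "\t\t"


def tabify(text: str) -> str:
    return PATTERN.sub(repl, text)
```

Only whitespace at the start of a line is touched, so the spaces inside comments stay as they are.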
If you want to change the space between bcf and INTCON,2 to 2 tabs, you might match the 2 "words" and make sure that they don't start with a ;
^(?:\S+:)?\h+(?!;)\S+\K\h+(?=[^\s;])
^ Start of string
(?:\S+:)? Optionally match 1+ non whitespace chars and :
\h+(?!;) Match 1+ horizontal whitespace chars, then assert what is on the right is not a ;
\S+\K Match 1+ non whitespace chars, forget what was matched
\h+ Match 1+ horizontal whitespace chars (this match will be replaced)
(?=[^\s;]) Assert what is on the right is not a whitespace char or ;
In the replacement use 2 tabs \t\t
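Python's re has no \K. A common workaround is to capture everything before the gap in group 1 and write it back with \1. The sketch below assumes Python 3 and uses [ \t] for \h. The function name tabify is illustrative.

```python
import re

# Group 1 holds what \K would have "forgotten"; only the gap after it is replaced.
PATTERN = re.compile(r"^((?:\S+:)?[ \t]+(?!;)\S+)[ \t]+(?=[^\s;])", re.MULTILINE)


def tabify(text: str) -> str:
    # re.sub processes \t in the replacement string, so this inserts two tabs.
    return PATTERN.sub(r"\1\t\t", text)
```

Comment-only lines such as "    ; just a comment" never match, because (?!;) rejects them.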
Edit
If you want to find the first space between non whitespace chars, you might use
^.*?\S\K (?=\S)

REGEX input validation

I am trying to put together a regex to validate the following format:
"XXX/XXX","XXX/XXX","XXX/XXX"
where each X can be a letter, a digit, a dash or an underscore. What I have so far is
"(.*?)(\/)(.*?)"(?:,|$)/g
but it does not seem to work.
Update: there could be any number of "XXX/XXX" strings, comma-separated, not just 3.
You can try the following regex:
"([\w-]+)\/([\w-]+)"
Edit: regex explained:
([\w-]+) inside the square brackets, \w matches any word character (equivalent to [a-zA-Z0-9_]), and "-" adds a literal hyphen to the allowed characters.
"+" says we want one or more symbols from the previous block: [\w-]
\/ matches the symbol "/" directly. Inside a /.../ regex literal (as in the question) it must be escaped, which is why it is preceded by "\".
([\w-]+) works exactly like the first group, since the two halves accept the same characters.
() - these parentheses mark capturing groups, which you can use later in your code to get the values they match.
Example:
Full match: "1X-/-XX"
Group 1: 1X-
Group 2: -XX
If this doesn't do the trick, let me know in the comments.
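This short Python 3 run shows the two capturing groups at work. It assumes the standard re module, which accepts \/ as a plain "/".

```python
import re

PAIR = re.compile(r'"([\w-]+)\/([\w-]+)"')

m = PAIR.search('"1X-/-XX"')
print(m.group(0), m.group(1), m.group(2))  # "1X-/-XX" 1X- -XX

# findall returns one (group 1, group 2) tuple per quoted pair.
print(PAIR.findall('"AB1/C_D","x-y/z"'))  # [('AB1', 'C_D'), ('x-y', 'z')]
```

search and findall only look for pairs somewhere in the string. On their own, they do not reject extra characters around or between the pairs, so validating the whole input needs an anchored pattern.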
This will do the job:
"[-\w]+/[-\w]+"(?:,"[-\w]+/[-\w]+")*
Explanation:
" # quote
[-\w]+ # 1 or more hyphen or word character [a-zA-Z0-9_]
/ # a slash
[-\w]+ # 1 or more hyphen or word character [a-zA-Z0-9_]
" # quote
(?: # non capture group
, # a comma
" # quote
[-\w]+ # 1 or more hyphen or word character [a-zA-Z0-9_]
/ # a slash
[-\w]+ # 1 or more hyphen or word character [a-zA-Z0-9_]
" # quote
)* # end group, may appear 0 or more times
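The pattern has no ^ or $ of its own. To use it as a validator, this Python 3 sketch applies it with fullmatch(), so the entire input must be the comma-separated list. It also passes re.ASCII, because Python's \w otherwise matches non-ASCII letters as well. is_valid is an illustrative name.

```python
import re

# fullmatch() anchors the pattern at both ends of the input.
# re.ASCII limits \w to [A-Za-z0-9_].
PAIR_LIST = re.compile(r'"[-\w]+/[-\w]+"(?:,"[-\w]+/[-\w]+")*', re.ASCII)


def is_valid(s: str) -> bool:
    return PAIR_LIST.fullmatch(s) is not None
```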
Here, we would be starting with a simple expression with quantifiers:
("[A-Za-z0-9_-]+\/[A-Za-z0-9_-]+")(,|$)
where we collect the allowed characters (letters, digits, _ and -) in a character class, followed by a slash and the same class again, and at the end we match either a comma or the end of the string.

Regex: remove all except first character and last number

I know that ^. matches the first character and (\d+)(?!.*\d) matches the last number. I've tried using | between these and have been trying to find code for the second character, but with no success.
This is in R.
Take for example:
'ABCD some random words and spaces 1234' should output 'A4' when I do
sub([regex here], "", 'ABCD some random words and spaces 1234')
If you used ^.|(\d+)(?!.*\d), sub would only match and remove the first char. With gsub (and no backreferences in the replacement pattern), it would remove both the first char and the last 1+ digits.
You can use
sub("^(.).*(\\d).*$", "\\1\\2", "ABCD some random words and spaces 1234")
This TRE regex pattern matches:
^ - start of string
(.) - Group 1 capturing any char
.* - 0+ any chars as many as possible up to the last...
(\\d) - Group 2 capturing a digit
.* - the rest of the string
$ - end of string.
The \\1\\2 replacement pattern re-inserts the values captured with Group 1 and Group 2 back to the result.
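For comparison, here is the same pattern in Python 3's re module. For this simple pattern it should behave like the TRE engine behind R's sub(). The name first_char_last_digit is illustrative.

```python
import re


def first_char_last_digit(s: str) -> str:
    # The greedy .* before (\d) backtracks only as far as the last digit.
    # If the string has no digit, nothing matches and s comes back unchanged.
    return re.sub(r"^(.).*(\d).*$", r"\1\2", s)
```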

Why does (?:\s)\w{2}(?:\s) match the surrounding spaces as well, instead of only the 2-letter substring?

I am trying to make a regex that matches all substrings of 2 characters or fewer with a space on either side. What did I do wrong?
For example, I want it to match %2 or q, but not 123, and not the spaces.
Update: \b\w{2}\b would do it if it also matched one-letter substrings and did not ignore special characters like - or #.
You should use
(^|\s)\S{1,2}(?=\s)
Since you cannot use a look-behind, use a capture group instead. When you replace text, you can then restore the captured part with $1.
Regex breakdown:
(^|\s) - Group 1 - either a start of string or a whitespace
\S{1,2} - 1 or 2 non-whitespace characters
(?=\s) - check if after 1 or 2 non-whitespace characters we have a whitespace. If not, fail.
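Python 3's re also rejects a variable-width look-behind such as (?<=^|\s), so the capture-group approach applies there too. In this sketch, short_tokens strips the whitespace held in group 1 from each match. SHORT2 is a hypothetical variant that also captures the token in group 2. It shows \1 restoring the space (or nothing, at the start) in a replacement.

```python
import re

SHORT = re.compile(r"(^|\s)\S{1,2}(?=\s)")

# Hypothetical variant: the short token itself is captured in group 2.
SHORT2 = re.compile(r"(^|\s)(\S{1,2})(?=\s)")


def short_tokens(text: str) -> list:
    # Remove the leading whitespace (group 1) from each full match.
    return [m.group(0)[len(m.group(1)):] for m in SHORT.finditer(text)]


def bracket_short_tokens(text: str) -> str:
    # \1 puts the captured space back; \2 is the 1-2 character token.
    return SHORT2.sub(r"\1[\2]", text)
```

As in the answer, a token at the very end of the input needs a trailing whitespace character to match.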