Split a comma-separated list onto separate lines (Notepad++ / regex)

I have a few files, and each file contains some text with a description and a list of tags.
I would like to manipulate the tags in each file with Notepad++ and regular expressions.
I could easily replace the commas with \r\n, but that would also affect the description part, which contains commas that I want to keep intact. I only need to manipulate the tag part.
Plus, the number of tags is not always the same (sometimes there are 4, sometimes more; it varies).
Original input text:
Description: blah, blah, blah, slsls,
tag:
- hello, bye, Thanks, etc, Notepad
Desired output text:
Description: blah, blah, blah, slsls,
tag:
- hello
- bye
- thanks
- etc
- notepad
Any idea how I could achieve this? Thanks very much.

Ctrl+H
Find what: (^tag:\s+|\G)[,-]\h*(\w+)
Replace with: $1\t- $2\n
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
( # start group 1
^ # beginning of line
tag: # literally
\s+ # 1 or more whitespace characters (including the line break)
| # OR
\G # restart from last match position
) # end group
[,-] # comma or hyphen
\h* # 0 or more horizontal spaces
(\w+) # group 2, 1 or more word characters (you can use [^\s,] instead)
Replacement:
$1 # content of group 1
\t # a tabulation
- # a hyphen followed by a space
$2 # content of group 2
\n # linefeed
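For comparison, here is the same transformation as a Python sketch. Python's `re` module has no `\G` anchor, so instead of chaining one match per tag it captures the whole tag line once and splits it in a callback. It assumes `\n` line breaks and that the tags sit on the single line right after `tag:`, starting with `- `; the names `TAG_BLOCK` and `split_tags` are hypothetical. Unlike the regex above it writes `- tag` without the leading tab (as in the desired output), and like the regex it keeps the original capitalization.

```python
import re

# Group 1: the 'tag:' header line with its line break; group 2: the tag list
# after the leading '- '. MULTILINE lets ^ and $ match at every line.
TAG_BLOCK = re.compile(r"^(tag:[ \t]*\n)-[ \t]*(.*)$", re.MULTILINE)


def split_tags(text: str) -> str:
    """Put each comma-separated tag after a 'tag:' line on its own '- ' line."""
    def expand(m):
        # Split only the captured tag line, so commas in the description survive.
        tags = [t.strip() for t in m.group(2).split(",") if t.strip()]
        return m.group(1) + "\n".join("- " + t for t in tags)

    return TAG_BLOCK.sub(expand, text)


sample = (
    "Description: blah, blah, blah, slsls,\n"
    "tag:\n"
    "- hello, bye, Thanks, etc, Notepad\n"
)
print(split_tags(sample))
```

The number of tags does not matter, because the split happens on the captured text rather than in the pattern.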


Regex to disregard partial matches across lines / matching too much

I have three lines of tab-separated values:
SELL 2022-06-28 12:42:27 39.42 0.29 11.43180000 0.00003582
BUY 2022-06-28 12:27:22 39.30 0.10 3.93000000 0.00001233
_____2022-06-28 12:27:22 39.30 0.19 7.46700000 0.00002342
The first two have 'SELL' or 'BUY' as their first value, but the third one does not, hence the leading tab that I marked as ______.
I would like to capture the following using Regex:
My expression ^(BUY|SELL).+?\r\n\t does not work as it gets me this:
I do know why it outputs this; adding a lazy marker '?' obviously won't help. I can't get lookarounds to work either, if they are even the right tool. I need something like 'match only \r\n\t or \r\n(?:^\t) at the end of each line'.
The final goal is to make the three lines look like this, so I will need to replace the match using capturing groups:
Can anyone point me to the right direction?
Ctrl+H
Find what: ^(BUY|SELL).+\R\K\t
Replace with: $1\t
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
(BUY|SELL) # group 1, BUY or SELL
.+ # 1 or more any character but newline
\R # any kind of linebreak
\K # forget all we have seen until this position
\t # a tabulation
Replacement:
$1 # content of group 1
\t # a tabulation
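A Python sketch of the same idea, assuming literal TAB separators and `\n` or `\r\n` line breaks; the sample rows are shortened from the question's data, and `PREV_SIDE` and `fill_side` are hypothetical names. Python's `re` has no `\K`, so the previous BUY/SELL line is matched and written back unchanged, while the lookahead `(?=\t)` only checks that the next line starts with a TAB.

```python
import re

# Group 1: a whole BUY/SELL line including its line break; group 2: BUY or SELL.
# The lookahead (?=\t) requires the next line to start with a TAB without
# consuming it, so that TAB stays in place after the inserted word.
PREV_SIDE = re.compile(r"^((BUY|SELL)\b.*\r?\n)(?=\t)", re.MULTILINE)


def fill_side(text: str) -> str:
    """Copy BUY/SELL onto a TAB-led line that directly follows a BUY/SELL line."""
    return PREV_SIDE.sub(lambda m: m.group(1) + m.group(2), text)


rows = (
    "SELL\t2022-06-28 12:42:27\t39.42\n"
    "BUY\t2022-06-28 12:27:22\t39.30\n"
    "\t2022-06-28 12:27:22\t39.30\n"
)
print(fill_side(rows))
```

As with the Notepad++ pattern, only the line directly below a BUY/SELL line is filled; a second consecutive TAB-led line would need another pass.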
You can use the following regex ((BUY|SELL)[^\n]+\n)\s+ and replace with \1\2\t.
Regex Match Explanation:
((BUY|SELL)[^\n]+\n): Group 1
(BUY|SELL): Group 2
BUY: sequence of characters "BUY"
|: or
SELL: sequence of characters "SELL"
[^\n]+: one or more characters other than newline
\n: newline character
\s+: one or more whitespace characters (this consumes the leading tab of the next line)
Regex Replace Explanation:
\1: Reference to Group 1
\2: Reference to Group 2
\t: a tab, restoring the separator consumed by \s+
Check the demo here. Tested on Notepad++ in a private environment too.
Note: Make sure to check the "Regular expression" checkbox.

Regex to generate dynamic sql

I want to generate dynamic SQL in Notepad++ based on some rules. These rules cover everything, so no SQL knowledge is needed. They are the following:
Dynamic sql must have each single quote escaped by another single quote ( 'hello' becomes ''hello'')
Each line should begin with "+#lin"
If a line contains only whitespace, nothing should follow the "+#lin", regardless of the rules below
Replace each \t directly following "+#lin" with "+#tab"
Add " +' " after the #lin/#tab sequence
Add a single quote at the end of the line
So, as an example, this input:
select 1,'hello'
from --two tabs exist after from
table1
should become:
+#lin+'select 1,''hello'''
+#lin+'from --two tabs exist after from'
+#lin
+#lin+#tab+'table1'
What I have for now is the following 4 steps:
Replace each single quote with two single quotes to cover rule 1
Replace ^(\t*)(.*)$ with \+#lin\1\+'\2' to cover rules 2,5,6
Replace \t with \+#tab to cover rule 4
Replace (\+#tab)*\+''$ with nothing to cover rule 3
Notice that this mostly works, except for the third replacement, which replaces all tabs and not only the ones at the beginning. I tried (?<=^\t*)\t with no success; it matches nothing.
I'm looking for a solution which satisfies the rules in as few replacement steps as possible.
After replacing each single quote with two single quotes, you can do the rest in a single step.
Not very elegant for processing multiple TABs (it handles at most five leading TABs), but it works.
Ctrl+H
Find what: ^(?:(\t)(\t)?(\t)?(\t)?(\t)?(\S.*)|\h*|(.+))$
Replace with: +#lin(?1+#tab+(?2#tab+)(?3#tab+)(?4#tab+)(?5#tab+)'$6')(?7+'$7')
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
(?: # non capture group
(\t) # group 1, tabulation
(\t)? # group 2, tabulation, optional
(\t)? # group 3, tabulation, optional
(\t)? # group 4, tabulation, optional
(\t)? # group 5, tabulation, optional
(\S.*) # group 6, a non-space character followed by 0 or more any character but newline
| # OR
\h* # 0 or more horizontal spaces
| # OR
(.+) # group 7, 1 or more any character but newline
) # end group
$ # end of line
Replacement:
+#lin # literally
(?1 # if group 1 exists
+#tab+ # add this
(?2#tab+) # if group 2 exists, add a second #tab+
(?3#tab+) # same for group 3
(?4#tab+) # same for group 4
(?5#tab+) # same for group 5
'$6' # content of group 6 with single quotes
) # endif
(?7 # if group 7 exists
+ # plus sign
'$7' # content of group 7 with single quotes
) # endif
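Outside Notepad++, the rules from the question can be applied line by line without regex conditionals. A Python sketch, assuming ordinary line breaks in the input; `to_dynamic_sql` is a hypothetical name. Unlike the one-step regex, it also doubles the single quotes itself and has no limit on the number of leading TABs.

```python
def to_dynamic_sql(text: str) -> str:
    """Apply the six rules from the question to every line of `text`."""
    out = []
    for line in text.splitlines():
        if not line.strip():
            # Rule 3: a whitespace-only line becomes just '+#lin'.
            out.append("+#lin")
            continue
        body = line.lstrip("\t")
        tab_count = len(line) - len(body)  # only TABs at the very start count
        body = body.replace("'", "''")     # Rule 1: escape single quotes
        # Rules 2, 4, 5, 6: '+#lin', one '+#tab' per leading TAB, then +'...'
        out.append("+#lin" + "+#tab" * tab_count + "+'" + body + "'")
    return "\n".join(out)


sql = "select 1,'hello'\nfrom --two tabs exist after from\n\t\t\n\ttable1"
print(to_dynamic_sql(sql))
```

Because `lstrip("\t")` removes only TABs from the start of the line, TABs inside the text stay untouched, which is the part the question's third replacement got wrong.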
You can use three substitutions here. Without additional assumptions, it is not really possible to reduce the number of steps, since you need to replace at the same positions.
Step 1: Replace each single quote with two single quotes (' with ''). No regex is needed here, but you can leave the regex checkbox on.
Step 2: Add +#lin+ at the start of the line and only wrap its contents with ' if there is any non-whitespace char on the line (while keeping all TABs before the first '):
Find What: ^(\t*+)(\h*\S)?+(.*)
Replace With: +#lin+$1(?2'$2$3':)
Details:
^ - start of a line
(\t*+) - Group 1 ($1): zero or more TABs
(\h*\S)?+ - Group 2 ($2): an optional sequence of any zero or more horizontal whitespace chars and then a non-whitespace char
(.*) - Group 3 ($3): the rest of the line
+#lin+$1(?2'$2$3':) - replaces the match with +#lin+ followed by the Group 1 value (i.e. the TABs found) and then, only if Group 2 matched, a single quote, the Group 2 and Group 3 values, and a closing single quote
Step 3: Replace each TAB after +#lin+ with #tab+:
Find What: (\G(?!^)|^\+#lin\+)\t
Replace With: $1#tab+
Details:
(\G(?!^)|^\+#lin\+) - Group 1: either
\G(?!^) - the end of the previous match, but not at the start of a line
| - or
^\+#lin\+ - start of a line and +#lin+ string
\t - a TAB char.
The replacement is the concatenation of Group 1 value and #tab+ string.
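Python's `re` has no `\G`, so the one-TAB-per-match chain of step 3 cannot be copied directly. A sketch of an equivalent, assuming step 2 has already run: capture the whole run of TABs after `+#lin+` at once and emit one `#tab+` per TAB in a callback (`LEADING_TABS` and `expand_leading_tabs` are hypothetical names).

```python
import re

# Group 1: the '+#lin+' prefix at the start of a line; group 2: the TABs after it.
LEADING_TABS = re.compile(r"^(\+#lin\+)(\t+)", re.MULTILINE)


def expand_leading_tabs(text: str) -> str:
    """Replace each TAB directly after '+#lin+' with '#tab+' (step 3)."""
    return LEADING_TABS.sub(lambda m: m.group(1) + "#tab+" * len(m.group(2)), text)


print(expand_leading_tabs("+#lin+\t'table1'\n+#lin+'select'"))
```

TABs later in the line are never part of the match, so they are left alone, just as with the `\G(?!^)` chain.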
See this regex online demo.

Notepad++ and regex - how to title case string between two particular strings?

I have hundreds of bib references in a file, and they have the following syntax:
@article{tabata1999precise,
title={Precise synthesis of monosubstituted polyacetylenes using Rh complex catalysts.
Control of solid structure and $\pi$-conjugation length},
author={Tabata, Masayoshi and Sone, Takeyuchi and Sadahiro, Yoshikazu},
journal={Macromolecular chemistry and physics},
volume={200},
number={2},
pages={265--282},
year={1999},
publisher={Wiley Online Library}
}
I would like to title case (aka Proper Case) the journal name in Notepad++ using regular expression. For example, from Macromolecular chemistry and physics to Macromolecular Chemistry and Physics.
I am able to find all instances using:
(?<=journal\=\{).*?(?=\})
but I am unable to change the case via Edit > Convert Case to. Apparently it does not work on Find All results, and I have to go one by one.
Next, I tried recording and running a macro but Notepad++ just hangs indefinitely when I try to run it (option to run until the end of the file).
So my question is: does anyone know the replace regex syntax I could use to change the case? Ideally, I would also like to use "|" exclusions for particular words such as " of ", " an ", " the ", etc. I tried to play with some of the examples provided here, but I was not able to integrate it into my look-aheads.
Thank you in advance, I'd appreciate any help.
This works for any number of words:
Ctrl+H
Find what: (?:journal={|\G)\K(?:(\w{4,})|(\w+))(\h*)
Replace with: \u$1\E$2$3
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
(?: # non capture group
journal={ # literally
| # OR
\G # restart from last match position
) # end group
\K # forget all we have seen until this position
(?: # non capture group
(\w{4,}) # group 1, a word with 4 or more characters
| # OR
(\w+) # group 2, a word of any length
) # end group
(\h*) # group 3, 0 or more horizontal spaces
Replacement:
\u # uppercase the first letter of what follows
$1 # content of group 1
\E # end the case conversion
$2 # content of group 2
$3 # content of group 3
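The same 4-character rule can be sketched in Python, where `re.sub` accepts a function as the replacement. This version works on the whole `journal={...}` value at once, so, unlike the `\G` chain (which stops at the first character that is neither a word character nor horizontal whitespace), it also continues past punctuation such as `-` or `:`. The names `JOURNAL`, `LONG_WORD` and `title_case_journal` are hypothetical.

```python
import re

JOURNAL = re.compile(r"(journal=\{)([^}]*)(\})")
LONG_WORD = re.compile(r"\w{4,}")  # same 4-character threshold as group 1 above


def title_case_journal(text: str) -> str:
    """Uppercase the first letter of every 4+ character word inside journal={...}."""
    def cap(word):
        w = word.group(0)
        return w[0].upper() + w[1:]

    def fix(m):
        # Only the text between the braces is touched; other fields are kept.
        return m.group(1) + LONG_WORD.sub(cap, m.group(2)) + m.group(3)

    return JOURNAL.sub(fix, text)


print(title_case_journal("journal={Macromolecular chemistry and physics},"))
```

Short words such as "and", "of" or "the" are left as they are, which covers the exclusions asked for in the question without listing them.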
If the format is always in the form:
journal={Macromolecular chemistry and physics},
i.e. journal followed by 4 words, then use the following:
Find: journal={(\w+)\s*(\w+)\s*(\w+)\s*(\w+)
Replace with: journal={\u\1 \u\2 \l\3 \u\4
You can modify that if you have more words to replace by adding more \u\x, where x is the position of the word.
Hopefully this gives you an idea to move toward a better solution.
\u converts the next letter to uppercase (used for all the other words)
\l converts the next letter to lowercase (used for the word "and")
\1 inserts the text captured by the 1st () group
\2 inserts the text captured by the 2nd () group
\3 inserts the text captured by the 3rd () group
\4 inserts the text captured by the 4th () group

Regex finding all commas between two words

I am trying to clean up a large .csv file that contains many comma-separated words, parts of which I need to consolidate. So I have a subsection where I want to change all the commas to slashes. Let's say my file contains this text:
Foo,bar,spam,eggs,extra,parts,spoon,eggs,sudo,test,example,blah,pool
I want to select all commas between the unique words bar and blah. The idea is to then replace the commas with slashes (using find and replace), such that I get this result:
Foo,bar,spam/eggs/extra/parts/spoon/eggs/sudo/test/example,blah,pool
As per @EganWolf's input:
How do I include words in the search but exclude them from the selection (for the unique words) and how do I then match only the commas between the words?
Thus far I have only managed to select all the text between the unique words including them:
bar,.*,blah, bar:*, *,blah, (bar:.+?,blah)*,*\2
I experimented with negative lookahead but can't get any search results from my statements.
Using Notepad++, you can do:
Ctrl+H
Find what: (?:\bbar,|\G(?!^))\K([^,]*),(?=.+\bblah\b)
Replace with: $1/
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
(?: # start non capture group
\bbar, # word boundary then bar then a comma
| # OR
\G # restart from last match position
(?!^) # negative lookahead, make sure we are not at the beginning of a line
) # end group
\K # forget all we've seen until this position
([^,]*) # group 1, 0 or more non comma
, # a comma
(?= # positive lookahead
.+ # 1 or more any character but newline
\bblah\b # word boundary, blah, word boundary
) # end lookahead
Result for given example:
Foo,bar,spam/eggs/extra/parts/spoon/eggs/sudo/test/example,blah,pool
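Python's `re` supports neither `\G` nor `\K`, so a direct port has to work differently. A minimal sketch: match the whole stretch between `bar,` and `,blah` once (lazily, stopping at the first `,blah`) and replace the commas inside it in a callback. The function name `slash_between` and its default keywords are illustrative.

```python
import re


def slash_between(line: str, start: str = "bar", end: str = "blah") -> str:
    """Turn the commas between the `start` field and the `end` field into '/'."""
    pattern = re.compile(
        r"(\b" + re.escape(start) + r",)"   # group 1: 'bar,' (kept as is)
        r"(.*?)"                             # group 2: the fields in between
        r"(,\b" + re.escape(end) + r"\b)"    # group 3: ',blah' (kept as is)
    )

    def to_slashes(m):
        return m.group(1) + m.group(2).replace(",", "/") + m.group(3)

    return pattern.sub(to_slashes, line, count=1)


print(slash_between("Foo,bar,spam,eggs,extra,parts,spoon,eggs,sudo,test,example,blah,pool"))
```

Keeping `,blah` inside group 3 is what preserves the comma before `blah`, matching the expected result above.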
The following regex will capture the minimally required text to access the commas you want:
(?<=bar,)(.*?(,))*(?=.*?,blah)
See Regex Demo.
If you want to replace the commas, you will need to replace everything in capture group 2. Capture group 0 has your entire match.
An alternative approach would be to split your string by comma to create an array of words. Then join words between bar and blah using / and append the other words joined by ,.
Here is a PowerShell example of split and join:
$a = "Foo,bar,spam,eggs,extra,parts,spoon,eggs,sudo,test,example,blah,pool"
$split = $a -split ","
$slashBegin = $split.indexof("bar")+1
$commaEnd = $split.indexof("blah")-1
$str1 = $split[0..($slashbegin-1)] -join ","
$str2 = $split[($slashbegin)..$commaend] -join "/"
$str3 = $split[($commaend+1)..$split.count] -join ","
@($str1,$str2,$str3) -join ","
Foo,bar,spam/eggs/extra/parts/spoon/eggs/sudo/test/example,blah,pool
This could easily be made into a function with your entire line and keywords as inputs.
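As suggested, the split/join idea wraps naturally into a function that takes the line and the two keywords. A Python sketch (`join_between` is a hypothetical name); it assumes both keywords appear as complete fields with `start` before `end`, and `list.index` raises `ValueError` if either is missing.

```python
def join_between(line: str, start: str, end: str) -> str:
    """Join the fields strictly between `start` and `end` with '/', the rest with ','."""
    words = line.split(",")
    i = words.index(start)  # ValueError if `start` is not a field
    j = words.index(end)    # ValueError if `end` is not a field
    head = words[: i + 1]   # fields up to and including `start`
    # Skip the middle entirely when the keywords are adjacent, to avoid ',,'.
    middle = ["/".join(words[i + 1 : j])] if j > i + 1 else []
    tail = words[j:]        # `end` and everything after it
    return ",".join(head + middle + tail)


line = "Foo,bar,spam,eggs,extra,parts,spoon,eggs,sudo,test,example,blah,pool"
print(join_between(line, "bar", "blah"))
```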

Multiple search & replace - Notepad ++ (regex)

I have a list of words, for example:
Good -> Bad
Sky -> Blue
Gray -> Black
etc...
What is the best way to do find & replace in Notepad++?
I tried this:
FIND: (Good)|(Sky)|(Gray)
Replace: (?1Bad)(?2Blue)(?3Black)
but it doesn't work :(
Any ideas or suggestions?
There is, however, a workaround if you add this line at the end of your text (it must be the last line, so don't press Enter after it):
#Good:Bad#Sky:Blue#Gray:Black#
and if you use this pattern:
(Good|Sky|Gray)(?=(?:.*\R)++#(?>[^#]+#)*?\1:([^#]+))|\R.++(?!\R)
with this replacement:
$2
pattern details:
(Good|Sky|Gray) # this part captures the word in group 1
(?= # then we reach the last line in a lookahead
(?:.*\R)++ # match all the lines until the last line
#(?>[^#]+#)*? # advance until the matching key is found
\1 # the key itself (backreference to capture group 1)
: ([^#]+) # capture the replacement in group 2
) # close the lookahead
| # OR
\R.++(?!\R) # match the last line (to remove it)
Note: to make the pattern more efficient, you can put it in a non-capturing group and add a lookahead at the beginning with all the possible first characters, to quickly discard useless positions in the string:
(?=[GS\r\n])(?:\b(Good|Sky|Gray)\b(?=(?:.*\R)++#(?>[^#]+#)*?\1:([^#]+))|\R.++(?!\R))
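The lookup line emulates a dictionary inside a single pattern. In a scripting language you can use a real dictionary instead: build one alternation of all the keys and look each match up in a callback. A Python sketch; the mapping is the word list from the question, and `REPLACEMENTS` and `replace_many` are hypothetical names.

```python
import re

# Word list from the question: key -> replacement.
REPLACEMENTS = {"Good": "Bad", "Sky": "Blue", "Gray": "Black"}


def replace_many(text: str, mapping: dict) -> str:
    """Replace every whole-word occurrence of a key of `mapping` in one pass."""
    # \b on both sides restricts matches to whole words (so 'Skyline' is kept).
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(0)], text)


print(replace_many("Good Sky, Gray day", REPLACEMENTS))
```

Because all words are replaced in a single pass, a replacement value is never replaced again, which is the same guarantee the single-regex approach gives.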