Select underlined text from RTF using regex

I want to select the next piece of underlined text. The RTF of a RichTextBox contains the following code for underlined text:
\ul\i0 hello friend\ulnone\i
In the box itself, the text simply appears underlined. What I want is that, on a button click, the RichTextBox selects the next piece of underlined text. An example piece of text:
hello [friend your] house [looks] amazing.
Imagine the words within square brackets are underlined. When I first click button1, "friend your" should be selected, and on the next click "looks" should be selected: the selection keeps moving forward to the next underlined run. I know this can be done using regex, but I can't work out the logic.
Any help will be appreciated. Thanks a lot :D

The regex would be
Dim pattern As String = "\\ul\\i0\s*((?:(?!\\ulnone\\i).)+)\\ulnone\\i"
Explanation
\\ul\\i0 # the sequence "\ul\i0"
\s* # any number of white space
( # begin group 1:
(?: # non-capturing group:
(?! # negative look-ahead ("not followed by..."):
\\ulnone\\i # the sequence "\ulnone\i"
) # end negative look-ahead
. # match next character (it is underlined)
)+ # end non-capturing group, repeat
) # end group 1 (it will contain all underlined characters)
\\ulnone\\i # the sequence "\ulnone\i"
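To step from one underlined run to the next, the pattern can be applied repeatedly, each time starting at the end of the previous match. The following is a minimal sketch in Python rather than VB.NET. The function name next_underlined and the sample RTF string are assumptions made for illustration, and real RichTextBox RTF may carry other control words around the text.

```python
import re

# Same pattern as above, written as a Python raw string.
UNDERLINED = re.compile(r"\\ul\\i0\s*((?:(?!\\ulnone\\i).)+)\\ulnone\\i")

def next_underlined(rtf, start=0):
    """Return (text, end) for the first underlined run at or after start, or None."""
    m = UNDERLINED.search(rtf, start)
    if m is None:
        return None
    return m.group(1), m.end()

# Hypothetical RTF fragment for "hello [friend your] house [looks] amazing."
sample = r"hello \ul\i0 friend your\ulnone\i  house \ul\i0 looks\ulnone\i  amazing."

first = next_underlined(sample)             # first click
second = next_underlined(sample, first[1])  # second click
third = next_underlined(sample, second[1])  # no underlined run left
```

Each call passes the previous end offset as the new start, which mirrors the button-click behaviour. Mapping offsets in the RTF string back to selection positions in the control is not covered by this sketch.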

Related

Notepad++ and regex - how to title case string between two particular strings?

I have hundreds of bib references in a file, and they have the following syntax:
@article{tabata1999precise,
title={Precise synthesis of monosubstituted polyacetylenes using Rh complex catalysts.
Control of solid structure and $\pi$-conjugation length},
author={Tabata, Masayoshi and Sone, Takeyuchi and Sadahiro, Yoshikazu},
journal={Macromolecular chemistry and physics},
volume={200},
number={2},
pages={265--282},
year={1999},
publisher={Wiley Online Library}
}
I would like to title case (aka Proper Case) the journal name in Notepad++ using regular expression. For example, from Macromolecular chemistry and physics to Macromolecular Chemistry and Physics.
I am able to find all instances using:
(?<=journal\=\{).*?(?=\})
but I am unable to change the case via Edit > Convert Case to. Apparently it doesn't work on find all and I have to go one by one.
Next, I tried recording and running a macro but Notepad++ just hangs indefinitely when I try to run it (option to run until the end of the file).
So my question is: does anyone know the replace regex syntax I could use to change the case? Ideally, I would also like to use "|" exclusions for particular words such as " of ", " an ", " the ", etc. I tried to play with some of the examples provided here, but I was not able to integrate it into my look-aheads.
Thank you in advance, I'd appreciate any help.
This works for any number of words:
Ctrl+H
Find what: (?:journal={|\G)\K(?:(\w{4,})|(\w+))(\h*)
Replace with: \u$1\E$2$3
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
(?: # non capture group
journal={ # literally
| # OR
\G # restart from last match position
) # end group
\K # forget all we have seen until this position
(?: # non capture group
(\w{4,}) # group 1, a word with 4 or more characters
| # OR
(\w+) # group 2, a word of any length
) # end group
(\h*) # group 3, 0 or more horizontal spaces
Replacement:
\u # uppercase the first letter of what follows
$1 # content of group 1
\E # end the case conversion
$2 # content of group 2
$3 # content of group 3
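Outside Notepad++, the same rule (capitalize words of four or more characters, leave shorter ones alone) can be sketched with a replacement callback. Python's re module has no \G, \K, or \u, so this is a different technique from the pattern above, not a translation of it. The function name title_case_journal is an assumption, and the sketch processes the whole journal={...} field rather than stopping at the first non-word character.

```python
import re

def title_case_journal(bibtex):
    """Capitalize words of 4+ characters inside every journal={...} field."""
    def fix_field(field):
        # Uppercase the first letter of each word with at least four
        # characters; shorter words such as "and" are left untouched.
        words = re.sub(r"\w{4,}",
                       lambda w: w.group(0)[0].upper() + w.group(0)[1:],
                       field.group(2))
        return field.group(1) + words + field.group(3)

    return re.sub(r"(journal=\{)([^}]*)(\})", fix_field, bibtex)

entry = "journal={Macromolecular chemistry and physics},\nvolume={200},"
result = title_case_journal(entry)
```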
If the format is always in the form:
journal={Macromolecular chemistry and physics},
i.e. journal followed by 4 words, then use the following:
Find: journal={(\w+)\s*(\w+)\s*(\w+)\s*(\w+)
Replace with: journal={\u\1 \u\2 \l\3 \u\4
You can modify that if you have more words to replace by adding more \u\x, where x is the position of the word.
Hope it helps to give you an idea to move forward for a better solution.
\u translates the next letter to uppercase (used for all other words)
\l translates the next letter to lowercase (used for the word "and")
\1 replaces the 1st captured () search group
\2 replaces the 2nd captured () search group
\3 replaces the 3rd captured () search group

Split comma separated list on separate line (notepad++ / regex)

I have a few files where each file has some text which has a description and list of tags.
I would like to manipulate the tags in the text with notepad++ and regular expressions in each file.
I could easily replace the commas with \r\n, but that would also affect the description part, which contains commas too, and I want to keep that intact. I only need to manipulate the tag part.
Plus, there is not always the same amount of tags (sometimes there are 4, sometimes more, it varies).
Original input text:
Description: blah, blah, blah, slsls,
tag:
- hello, bye, Thanks, etc, Notepad
Desired output text:
Description: blah, blah, blah, slsls,
tag:
- hello
- bye
- thanks
- etc
- notepad
Any idea how I could achieve this? thanks much
Ctrl+H
Find what: (^tag:\s+|\G)[,-]\h*(\w+)
Replace with: $1\t- $2\n
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
( # start group 1
^ # beginning of line
tag: # literally
\s+ # 1 or more whitespace characters (including the line break)
| # OR
\G # restart from last match position
) # end group
[,-] # comma or hyphen
\h* # 0 or more horizontal spaces
(\w+) # group 2, 1 or more word characters (you can use [^\s,]+ instead)
Replacement:
$1 # content of group 1
\t # a tabulation
- # a hyphen followed by a space
$2 # content of group 2
\n # linefeed
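For batch processing outside the editor, the same transformation can be sketched in Python. The re module has no \G or \h, so this version captures the whole tag line and splits it in a callback. That is a swapped-in technique, and the function name split_tags is an assumption. Like the replacement above, it leaves the case of the tags unchanged.

```python
import re

def split_tags(text):
    """Put each comma-separated tag after 'tag:' on its own '- ' line."""
    def expand(m):
        tags = [t.strip() for t in m.group(2).split(",") if t.strip()]
        return m.group(1) + "".join(f"\t- {t}\n" for t in tags)

    # Group 1: the 'tag:' line and its line break; group 2: the tag list.
    return re.sub(r"(^tag:\s+)-[ \t]*([^\n]*)\n?", expand, text, flags=re.M)

source = "Description: blah, blah, blah, slsls,\ntag:\n- hello, bye, Thanks, etc, Notepad"
result = split_tags(source)
```

The commas in the description line are untouched because the pattern only matches the line that follows tag:.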

How to select parts of text from markdown with regexp?

I have the following text:
#Header
my header text
##SubHeader
my sub header text
###Sub3Header
my sub 3 text
#Header2
my header2 text
I need to select the text from "#Header" to "#Header2".
I tried to write a regexp (http://regexr.com/3ffva), but it does not match what I need.
^#[^#\n]+([\W\w]*?)^#[^#\n]+
Basic idea: find first level-1 heading, find any text until... second level-1 heading.
^#[^#\n]+ first level-1 heading
^ start of line (because of multi-line flag)
[^#\n]+ Any character that isn't # or a newline character. Repeat 1 or more times.
([\W\w]*?) any text until next matching part
^#[^#\n]+ second level-1 heading (see above)
Flags: multiline.
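As a quick check, the pattern can be run against the sample text in Python with the multiline flag set. This is only a sketch: the variable names are illustrative, and the sample is the question's text with newlines made explicit.

```python
import re

# The pattern from above; re.MULTILINE makes ^ match at every line start.
PATTERN = re.compile(r"^#[^#\n]+([\W\w]*?)^#[^#\n]+", re.MULTILINE)

text = (
    "#Header\n"
    "my header text\n"
    "##SubHeader\n"
    "my sub header text\n"
    "###Sub3Header\n"
    "my sub 3 text\n"
    "#Header2\n"
    "my header2 text\n"
)

m = PATTERN.search(text)
section = m.group(0)  # from "#Header" through the "#Header2" heading line
body = m.group(1)     # everything between the two level-1 headings
```

The level-2 and level-3 headings are skipped as terminators because [^#\n]+ requires the character after the first # to be something other than #.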
A variant that uses a look-ahead to stop just before the next heading, so that each match covers one heading and its body:
1- without the multi-line flag
(^|\n)#([^#]+?)\n([^]+?)(?=\n#[^#]|$)
Demo without multi-line flag
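Description:
Group 1 captures the start of the string or the newline that precedes a single #, which is where a new heading starts.
Group 2 captures the heading title.
Group 3 captures everything up to the next heading or the end of the string. Note that [^] (any character, including newlines) is JavaScript syntax; in other flavors use [\s\S].
The trailing look-ahead is not a capturing group; it only checks for a new heading or the end of the text.
2- with the multi-line flag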
^#([^#]+?)\n([^]+?)(?=^#[^#])
Demo with Multi-line flag
Description:
First, append #-- to the end of the text so that the last heading can also be matched by this regex.
^ anchors the match at the start of a line, followed by a single # and a title that contains no #. Group 1 captures the heading title before \n.
Group 2 captures the text up to the start of the next heading, i.e. a line that begins with exactly one #.
Depending on your regex flavor you can use:
(^#{1}.+)(.*\n)*
As shown here: http://regexr.com/3fg08
Alternately, you can use Vim's very magic mode:
\v(^#{1}.+)(.*\n)*(^#{1}\w+)

Multiple search & replace - Notepad ++ (regex)

I have a list of words, for example:
Good -> Bad
Sky -> Blue
Gray -> Black
etc...
What is the best way to do find & replace in Notepad++?
I tried this:
FIND: (Good)|(Sky)|(Gray)
Replace: (?1Bad)(?2Blue)(?3Black)
but it doesn't work :(
any idea? or suggestions ?
There is, however, a workaround: add this new line at the end of your text (it must be the last line, so don't press Enter after it):
#Good:Bad#Sky:Blue#Gray:Black#
and use this pattern:
(Good|Sky|Gray)(?=(?:.*\R)++#(?>[^#]+#)*?\1:([^#]+))|\R.++(?!\R)
with this replacement:
$2
Pattern details:
(Good|Sky|Gray) # this part captures the word in group 1
(?= # then we reach the last line in a lookahead
(?:.*\R)++ # match all the lines until the last line
#(?>[^#]+#)*? # advance until the right key is found
\1 # the key (backreference to capture group 1)
: ([^#]+) # capture the replacement in group 2
) # close the lookahead
| # OR
\R.++(?!\R) # match the last line (to remove it)
Note: to make the pattern more efficient, you can put it in a non-capturing group and add a lookahead at the beginning with all the possible first characters, to quickly discard useless positions in the string:
(?=[GS\r\n])(?:\b(Good|Sky|Gray)\b(?=(?:.*\R)++#(?>[^#]+#)*?\1:([^#]+))|\R.++(?!\R))
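If the replacement can be scripted rather than done in the editor, a lookup table with a replacement callback avoids the helper line entirely. The following Python sketch is a different technique from the Notepad++ pattern above. The function name replace_many and the sample sentence are assumptions; the mapping is the one from the question.

```python
import re

def replace_many(text, mapping):
    """Replace every whole-word key of mapping with its value in one pass."""
    # Longest keys first so a longer key wins over a key that is its prefix.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(0)], text)

mapping = {"Good": "Bad", "Sky": "Blue", "Gray": "Black"}
result = replace_many("Good Sky, Gray day. Goodness!", mapping)
```

Because the text is scanned once, a replacement value is never replaced again, and the \b anchors keep "Goodness" from being touched.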

How to build a regex to detect words not adjacent to or enclosed in braces

I am stuck on a problem with an application. I have the following lines of text:
1) hi {my|your|his} name is {stacker|monster|overflow}
2) hi {my|your|his} job can be to {stacker|monster|overflow}
3) hi {my|your|his} car {stacker|monster|overflow}
What I want:
On the click of a button, select and replace those words that do not have { or } just before or after them. Lines 1 and 3 contain no such word; line 2 contains "can" and "be".
I used substrings to check for { or }, but it does not work. Is there a regex that can find such words?
Thanks and happy new year. Quite sweet that you guys are helping on new year day :)
(?<![{}]\s*|\{[^{}]*)\b\w+\b(?!\s*[{}]|[^{}]*\})
does this.
Explanation:
(?<! # Assert that we can't match this before the current position:
[{}]\s* # Any directly adjacent brace (plus optional whitespace)
| # or
\{[^{}]* # an opening brace before any other brace.
)
\b\w+\b # Match an entire word
(?! # Assert that we can't match this after the current position:
\s*[{}] # Any directly adjacent brace (plus optional whitespace)
| # or
[^{}]*\} # a closing brace before any other brace.
)
Caveat: This fails if braces can be nested.
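The pattern above relies on variable-length look-behind, which .NET supports but Python's re module does not. Where that feature is unavailable, a similar result can be sketched by tokenizing the line and checking each word's neighbours. This is a swapped-in technique, not the regex above; the function name free_words is an assumption, and the sketch expects balanced, non-nested braces.

```python
import re

# A token is a whole {...} group, a word, or any other non-space character.
TOKEN = re.compile(r"\{[^{}]*\}|\w+|[^\w\s]")

def free_words(line):
    """Return words that are neither inside braces nor next to a brace."""
    tokens = TOKEN.findall(line)

    def is_brace(i):
        # True when the token at index i exists and is a brace group or stray brace.
        return 0 <= i < len(tokens) and tokens[i][0] in "{}"

    return [
        tok for i, tok in enumerate(tokens)
        if re.match(r"\w", tok) and not is_brace(i - 1) and not is_brace(i + 1)
    ]

line1 = free_words("hi {my|your|his} name is {stacker|monster|overflow}")
line2 = free_words("hi {my|your|his} job can be to {stacker|monster|overflow}")
line3 = free_words("hi {my|your|his} car {stacker|monster|overflow}")
```

Words inside a {...} group never appear as separate tokens, so only words outside the braces are considered.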