How to extract all the strings between 2 patterns using regex Notepad++?

Extract all the strings between 2 patterns:
Input:
test.output0 testx.output1 output3 testds.output2(\t)
Output:
output0 output1 output3 output2
Note: (\t) is the tab character.

You may try:
\.\w+$
Explanation of the above regex:
\. - Matches . literally. If you do not want the . to be included in the match, use the lookbehind (?<=\.) instead, or simply remove the \. from the pattern.
\w+ - Matches a word character [A-Za-z0-9_] one or more times.
$ - Represents end of the line.
EDIT 2 by OP:
According to your latest edit; this might be helpful.
.*?\.?(\w+)(?=\t)
Explanation:
.*? - Matches any characters other than a newline, lazily.
\.? - Matches . literally, zero or one time.
(\w+) - A capturing group matching one or more word characters.
(?=\t) - A positive lookahead asserting that a tab follows.
$1 - In the replacement, $1 is the captured group, followed by a space to separate the output as you wanted. To restore the tab instead, use the replacement $1\t.
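Outside Notepad++, the capturing-group idea above can be tried in Python. This is only a sketch: the sample line assumes the fields are separated by tab characters, which the question does not state explicitly.

```python
import re

# Pattern from the answer above: skip lazily up to an optional dot,
# capture the word that is directly followed by a tab.
PATTERN = re.compile(r".*?\.?(\w+)(?=\t)")

# Hypothetical sample line; \t is the tab character (assumed separator).
line = "test.output0\ttestx.output1\toutput3\ttestds.output2\t"

# With one capturing group, findall returns the captured text of each match.
captured = PATTERN.findall(line)
print(" ".join(captured))  # output0 output1 output3 output2
```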

Try matching on the following pattern:
Find: (?<![^.\s])\w+(?!\S)
Here is an explanation of the above pattern:
(?<![^.\s]) assert that what precedes is either a dot, whitespace, or the start of the input
\w+ match a word
(?!\S) assert that what follows is either whitespace or the end of the input
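The same lookaround pattern also works in Python's re module, since the lookbehind has a fixed width. A minimal sketch, using a sample line modelled on the question (space-separated, with a trailing tab):

```python
import re

# A word preceded by a dot, whitespace, or the start of the input,
# and followed by whitespace or the end of the input.
PATTERN = re.compile(r"(?<![^.\s])\w+(?!\S)")

# Sample line modelled on the question; the trailing \t is the tab character.
line = "test.output0 testx.output1 output3 testds.output2\t"

matches = PATTERN.findall(line)
print(" ".join(matches))  # output0 output1 output3 output2
```

Because nothing but the wanted words is matched, no replacement step is needed here; the matches can be joined directly.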

Related

Testing a single sentence with an optional period

I'm trying to write a regex that tests a single sentence. The sentence can contain any content and should either end in a period with nothing following that period, or have no period or other ending punctuation.
I started with this: .*?\.$ and it worked fine testing for a sentence ending in a period. But if I mark the period as optional .*?\.?$ then a sentence can have any ending including a period and text after that period.
To be clear, these should pass the test:
He jumped over the fence.
He jumped over the fence
And this should not pass the test:
He jumped over the fence. She jumped over it too.

Try:
^(?:[^.]+\.|[^.]+)$
^ - start of the string
(?:[^.]+\.|[^.]+) - match either [^.]+\. (one or more non-. characters followed by a .) or [^.]+ (one or more non-. characters) in a non-capturing group.
$ - end of the string
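To check the pattern against the three sentences from the question, a small Python sketch could look like this (the variable names are just for illustration):

```python
import re

# Either non-dot characters ending in one dot, or non-dot characters only.
SENTENCE = re.compile(r"^(?:[^.]+\.|[^.]+)$")

samples = [
    "He jumped over the fence.",
    "He jumped over the fence",
    "He jumped over the fence. She jumped over it too.",
]

# Map each sample to whether the whole string matches.
results = {s: bool(SENTENCE.match(s)) for s in samples}
for sentence, ok in results.items():
    print(f"{ok!s:5} {sentence}")
```

The first two sentences match and the third does not, because a second dot can never be consumed by [^.]+.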

This pattern .*?\.$ can match the whole line He jumped over the fence. She jumped over it too. because the . can also match a literal dot.
If you don't want to cross newlines, and you want to allow a dot inside the sentence (for example 1.2m) when the line has to end with a dot, or otherwise match only characters other than ending punctuation:
If a lookahead assertion is supported:
^(?:[^\.\n]*(?:\.(?![^\S\n])[^\.\n]*)*\.|[^!?.\n]+)$
Explanation
^ Start of string
(?: Non capture group
[^\.\n]* Match optional chars other than a dot or a newline
(?:\.(?![^\S\n])[^\.\n]*)* Optionally repeat matching a dot not directly followed by a whitespace char (other than a newline), then optional chars other than a dot or a newline
\. Match a dot
| Or
[^!?.\n]+ Match 1+ times any char except for ! ? . or a newline (Or add more ending punctuation chars)
) Close the non capture group
$ End of string
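The lookahead version can be exercised in Python as well. The sample sentences below are illustrative; only the two "fence" sentences come from the question.

```python
import re

# First alternative: text that may contain inner dots (not followed by a
# space) and must end with a dot. Second alternative: text without any
# ending punctuation at all.
PATTERN = re.compile(r"^(?:[^\.\n]*(?:\.(?![^\S\n])[^\.\n]*)*\.|[^!?.\n]+)$")

def is_single_sentence(text):
    """Return True when the whole string matches the pattern."""
    return PATTERN.match(text) is not None

checks = {
    "It is 1.2m long.": is_single_sentence("It is 1.2m long."),
    "He jumped over the fence": is_single_sentence("He jumped over the fence"),
    "He jumped over the fence. She jumped over it too.":
        is_single_sentence("He jumped over the fence. She jumped over it too."),
    "Really?": is_single_sentence("Really?"),
}
print(checks)
```

The inner dot in 1.2m is accepted because it is directly followed by a non-space character, while a dot followed by a space ends the first alternative early and the match fails.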

You can use a regex like this:
.*?[^.]$
The optional quantifier (?) means that the regex matches whether or not the symbol is present in the string.
[^.]$ - means that the last character must not be a dot, which excludes a dot at the end of the sentence.

Using regex to duplicate a selection and replacing some characters

Probably a terrible title.
I am trying to take the following:
Joe Dane
Bob Sagget
Whitney Houston
Some
Other
Test
And trying to produce:
JOE_DANE("Joe Dane"),
BOB_SAGGET("Bob Sagget"),
WHITNEY_HOUSTON("Whitney Houston"),
SOME("Some"),
OTHER("Other"),
TEST("Test"),
I'm using Notepad++ and am close but not good enough at regex to figure out the remaining expression. So far, this is what I have:
Find what: (^.*)
Replace with: \1 \(\"\1\"\),
Produces: Joe Dane("Joe Dane"),
I've tried replacing with: \U$1 \(\"\1\"\), but this also impacts the second instance of \1 with upper case. It also does not replace the whitespace with an underscore _.

This can be done in a single step.
If you don't have more than 2 words in a line:
Ctrl+H
Find what: ^(\S+)(?: (\S+))?$
Replace with: \U$1(?2_$2)\E\("$0"\),
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
^ # beginning of line
(\S+) # group 1, 1 or more non space
(?: (\S+))? # non capture group, a space, group 2, 1 or more non space, optional
$
Replacement:
\U # uppercased
$1 # group 1
(?2_$2) # if group 2 exists, add an underscore before it
\E # end uppercase
\("$0"\), # the whole match with parens and quote
If you have more than 2 words (up to 5), use:
Find what: ^(\S+)(?: (\S+))?(?: (\S+))?(?: (\S+))?(?: (\S+))?
Replace with: \U$1(?2_$2)(?3_$3)(?4_$4)(?5_$5)\E\("$0"\),
If you have more than five words, add as many (?: (\S+))? groups as needed, with a matching (?N_$N) in the replacement.
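If the list can be processed outside Notepad++, the word count stops mattering. Python's re module has no \U...\E case operators or conditional replacements, so the sketch below swaps in a plain function instead of the replacement string above; to_enum_line is a hypothetical helper name.

```python
import re

def to_enum_line(name):
    """Turn 'Joe Dane' into 'JOE_DANE("Joe Dane"),' for any number of words."""
    original = name.strip()
    # Replace each run of whitespace with one underscore, then uppercase.
    constant = re.sub(r"\s+", "_", original).upper()
    return f'{constant}("{original}"),'

names = ["Joe Dane", "Bob Sagget", "Whitney Houston", "Some"]
lines = [to_enum_line(n) for n in names]
print("\n".join(lines))
```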

You might do it in 2 steps, first matching any char 1 or more times from the start of the string.
Find what
^.+
For the first replacement you can use \E to end the activation of \U and use the full match $0
Replace with
\U$0\E\("$0"\),
For the second step, to replace the spaces with underscores, you could skip over the text between parentheses, and match spaces between uppercase chars.
Find what
\(".*?"\)(*SKIP)(*F)|[A-Z]+\K\h+(?=[A-Z])
\(".*?"\) Match from (" till ")
(*SKIP)(*F)| Skip this part of the match
[A-Z]+\K Match uppercase chars and use \K to clear the current match buffer (forget what was matched so far)
\h+(?=[A-Z]) Match 1+ horizontal whitespace chars and assert an uppercase char to the right
Replace with _

RegEx match underscore characters in strings not starting with #

I'm trying to write a RegEx that matches all underscore characters but should not match ones in strings starting with a # character.
What I've gotten until now is a RegEx with a negative lookbehind which only ignores the first underscore in strings starting with a # character: /(?<!#)_/gi.
Playground with test data: https://regex101.com/r/Hd8IeX/1/

You can use:
^#.*(*SKIP)(*F)|_
You just had to use the right flags: with multi-line input, ^ needs the multiline flag (m) to match at the start of every line.
^ - Start string anchor.
# - Literally match "#".
.* - Match anything other than newline (greedy).
(*SKIP)(*F) - Discard what was matched up to that point and fail, so the engine does not retry inside the skipped text.
| - Or:
_ - Match an underscore.
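The (*SKIP)(*F) verbs are PCRE features and are not available in Python's re module. A common substitute, sketched here as an assumption rather than as the answer's method, is to match the unwanted lines in one alternative and capture the wanted underscores in a group, then keep only matches where the group took part. The sample text is made up for illustration.

```python
import re

# Either a whole line starting with '#' (matched but ignored) or an
# underscore captured in group 1. MULTILINE lets ^ match at every line start.
PATTERN = re.compile(r"^#.*|(_)", re.MULTILINE)

def underscore_positions(text):
    """Return the indexes of underscores outside lines that start with '#'."""
    return [m.start(1) for m in PATTERN.finditer(text) if m.group(1) is not None]

# Hypothetical test data, not taken from the linked playground.
text = "foo_bar\n#skip_this_one\nbaz_qux_"
positions = underscore_positions(text)
print(positions)  # [3, 26, 30]
```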

Based on the comment, could this work:
^#.*|[^\w]#.*(*SKIP)(*F)|_

REGEX input validation

I am trying to put together REGEX expression to validate the following format:
"XXX/XXX","XXX/XXX","XXX/XXX"
where X could be either a letter, a number, a dash, or an underscore. What I have so far is
"(.*?)(\/)(.*?)"(?:,|$)/g
but it does not seem to work.
Update: there could be any number of "XXX/XXX" strings, comma-separated, not just 3

You can try the following regex:
"([\w-]+)\/([\w-]+)"
Edit: regex explained:
([\w-]+) - Inside the square brackets, \w matches any word character (equal to [a-zA-Z0-9_]), and the - after it adds a literal hyphen to the set.
+ - One or more characters from the preceding set [\w-].
\/ - Matches the / character literally. It is escaped with \ because / is the pattern delimiter in some regex flavors.
([\w-]+) - Exactly like the first point, since the two parts are identical.
() - The parentheses mark a capturing group, which you can use later in your code to get the value it matched.
Example:
Full match: "1X-/-XX"
Group 1: 1X-
Group 2: -XX

This will do the job:
"[-\w]+/[-\w]+"(?:,"[-\w]+/[-\w]+")*
Explanation:
" # quote
[-\w]+ # 1 or more hyphen or word character [a-zA-Z0-9_]
/ # a slash
[-\w]+ # 1 or more hyphen or word character [a-zA-Z0-9_]
" # quote
(?: # non capture group
, # a comma
" # quote
[-\w]+ # 1 or more hyphen or word character [a-zA-Z0-9_]
/ # a slash
[-\w]+ # 1 or more hyphen or word character [a-zA-Z0-9_]
" # quote
)* # end group, may appear 0 or more times
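For validation, the pattern has to cover the whole input rather than just find a match somewhere in it. A Python sketch under that assumption uses re.fullmatch; the re.ASCII flag is added so that \w means exactly [a-zA-Z0-9_] as in the explanation, and is_valid is a hypothetical helper name.

```python
import re

# One quoted XXX/XXX item, then any number of comma-separated repeats.
PAIR = r'"[-\w]+/[-\w]+"'
VALID = re.compile(rf"{PAIR}(?:,{PAIR})*", re.ASCII)

def is_valid(value):
    """Return True when the entire string is a comma-separated list of pairs."""
    return VALID.fullmatch(value) is not None

print(is_valid('"abc/def","a-1/b_2","XXX/XXX"'))  # True
print(is_valid('"abc/def",'))                      # False (trailing comma)
print(is_valid('"ab c/def"'))                      # False (space not allowed)
```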

Here, we could start with a simple expression with quantifiers:
("[A-Za-z0-9_-]+\/[A-Za-z0-9_-]+")(,|$)
where we collect the allowed characters in a character class on each side of the slash, and at the end match either a comma or the end of the string.

How to exclude non-numeric character in regex

I have a string that goes like this:
Section 78(1) of the blabla
This is my regex:
\b\s(?!\b(\d{1,3}|\d{1,2}[a-zA-Z]|\d{5,})\b)\b\S*
The expected output is: of the blabla
This regex works, but it does not exclude "of" because of the (). Can anyone help me? Thank you.

Try this pattern: .+\d\)?
Explanation:
.+ - match any character one or more times
\d - match a digit
\)? - match ) zero or one time
Because of the greediness of +, it will match up to the last digit; if that digit is in brackets, the following bracket is matched too. The pattern therefore matches the leading part (Section 78(1)), so removing the match leaves the rest of the string.
Alternatively use \d+(?:\(\d+\))?(.+)
Then the desired output is in the first capturing group.
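As a quick check of the alternative pattern, here is a small Python sketch. Note that group 1 starts with the space after the closing bracket, so the sketch strips it; that stripping step is an addition, not part of the answer.

```python
import re

text = "Section 78(1) of the blabla"

# Digits, an optional bracketed number, then capture the rest in group 1.
match = re.search(r"\d+(?:\(\d+\))?(.+)", text)

# Group 1 is " of the blabla" (leading space included), so strip it.
remainder = match.group(1).strip() if match else ""
print(remainder)  # of the blabla
```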

It seems all you need to change is to remove the \b before \S* and replace the \S* with .+ or .* (if the match can be an empty string).
\s(?!\b(?:\d{1,3}|\d{1,2}[a-zA-Z]|\d{5,})\b)(.+)
Grab the Group 1 value. Note that I turned the first group matching digits in the negative lookahead into a non-capturing group to avoid clutter in the resulting match list.
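VB.NET demo:
Imports System
Imports System.Text.RegularExpressions
Module Demo
    Sub Main()
        ' Group 1 holds everything after the first whitespace that is not followed by a section number.
        Dim r As New Regex("\s(?!\b(?:\d{1,3}|\d{1,2}[a-zA-Z]|\d{5,})\b)(.+)")
        Dim s As String = "Section 78(1) of the blabla"
        For Each m As Match In r.Matches(s)
            Console.WriteLine(m.Groups(1).Value)
        Next
    End Sub
End Module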
Result: of the blabla.
