find search item plus 4 lines before and after - regex

I am using Notepad++ and would like to find the context in which a particular string occurs.
So the search string is 0wh.*0subj and I would like to find this search item plus the 4 lines immediately before and after it.
For example, where xxx stands for whatever is on a line, the search result should be:
xxx
xxx
xxx
xxx
0wh.*0subj
xxx
xxx
xxx
xxx
I have tried using \n\r but it's not working. Any assistance would be greatly appreciated.
Regards

This will work in Notepad++ (tested):
(?m)(^[^\r\n]*\R+){4}0wh\.\*0subj[^\r\n]*\R+(^[^\r\n]*\R+){4}
On the screenshot, note that the 555 line is not selected. It is just the current line.
Explain Regex
(?m) # set flags for this block (with ^ and $
# matching start and end of line) (case-
# sensitive) (with . not matching \n)
# (matching whitespace and # normally)
( # group and capture to \1 (4 times):
^ # the beginning of a "line"
[^\r\n]* # any character except: '\r' (carriage
# return), '\n' (newline) (0 or more times
# (matching the most amount possible))
\R+ # any line break sequence, e.g. \r\n, \n or \r
# (1 or more times (matching the most amount
# possible))
){4} # end of \1 (NOTE: because you are using a
# quantifier on this capture, only the LAST
# repetition of the captured pattern will be
# stored in \1)
0wh # '0wh'
\. # '.'
\* # '*'
0subj # '0subj'
[^\r\n]* # any character except: '\r' (carriage
# return), '\n' (newline) (0 or more times
# (matching the most amount possible))
\R+ # any line break sequence, e.g. \r\n, \n or \r
# (1 or more times (matching the most amount
# possible))
( # group and capture to \2 (4 times):
^ # the beginning of a "line"
[^\r\n]* # any character except: '\r' (carriage
# return), '\n' (newline) (0 or more times
# (matching the most amount possible))
\R+ # any line break sequence, e.g. \r\n, \n or \r
# (1 or more times (matching the most amount
# possible))
){4} # end of \2 (NOTE: because you are using a
# quantifier on this capture, only the LAST
# repetition of the captured pattern will be
# stored in \2)
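
For readers who want to try the idea outside Notepad++, here is a minimal Python sketch of the same pattern. Python's re module does not support \R, so the sketch substitutes an explicit alternation for it. The sample text and the names NL, LINE and PATTERN are made up for illustration.

```python
import re

# Python's re has no \R, so the line-break alternatives are spelled out
# (an assumption of this sketch; Notepad++ accepts \R directly).
NL = r"(?:\r\n|\n|\r)+"
LINE = r"^[^\r\n]*" + NL  # one whole line plus the line break(s) after it

PATTERN = re.compile(
    "(?:" + LINE + "){4}"            # the 4 lines before
    + r"0wh\.\*0subj[^\r\n]*" + NL   # the line that starts with the literal text
    + "(?:" + LINE + "){4}",         # the 4 lines after
    re.MULTILINE,
)

# Made-up sample: six filler lines on each side of the target line.
sample = "\n".join(
    ["a1", "a2", "a3", "a4", "a5", "a6",
     "0wh.*0subj",
     "b1", "b2", "b3", "b4", "b5", "b6"]
) + "\n"

match = PATTERN.search(sample)
context = match.group(0).splitlines()
print(context)
# ['a3', 'a4', 'a5', 'a6', '0wh.*0subj', 'b1', 'b2', 'b3', 'b4']
```

As in the Notepad++ pattern, the search text is matched literally and must sit at the start of its line.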

Related

How to capture multiple sequence of numbers as repeated groups?

I have a URL that contains multiple sequences of numbers I want to capture them all in groups suppose I have the following
https://www.example.com//first/part/54323?key=value
or
https://www.example.com/first/12345/second/part/part2/5432?key=value
I tried to use something like that but it only matches one sequence of numbers
(.*\/)([0-9]{4,})(\/.*|$|)
I want to have multiple groups represent different sections if numbers sequence is included
1st group will be "example.com/first"
2nd group "12345"
3rd group "second/part"
4th group "5432"
5th group "?key=value"
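The initial .* is greedy, meaning it tries to match as much as possible. It matches everything up to the last slash that is followed by at least four digits: "https://www.example.com/first/12345/second/part/part2/". You can modify this behavior by replacing the initial .* with .*?, which matches as little as possible. It does not stop at the first slashes ("https:/"), because there are no digits after those; it stops at the first slash that is followed by digits, "https://www.example.com/first/". Either way you still get only one number sequence, which is also not what you want.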
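
To see what the greedy and the lazy variant each capture, here is a small Python sketch that runs both against the second URL from the question. Only the ? after .* differs between the two patterns; the variable names are illustrative.

```python
import re

url = "https://www.example.com/first/12345/second/part/part2/5432?key=value"

# Original pattern from the question: the leading .* is greedy.
greedy = re.match(r"(.*\/)([0-9]{4,})(\/.*|$|)", url)
# Same pattern with a lazy leading .*?
lazy = re.match(r"(.*?\/)([0-9]{4,})(\/.*|$|)", url)

print(greedy.groups())
# ('https://www.example.com/first/12345/second/part/part2/', '5432', '')
print(lazy.groups())
# ('https://www.example.com/first/', '12345', '/second/part/part2/5432?key=value')
```

In both cases only one number sequence ends up in its own group, which is why a single pass with this pattern cannot return every sequence.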
But really we need to back up and ask some questions about your pattern. Apparently, you have a preamble you are not interested in, an indefinite number of sequences of 'character string, followed by slash, followed by number string' and then there is the "everything after there are no more slash digit patterns".
The key question is whether the number of char/char/digits combos is indefinite or limited to a definite number like the two pairs in your example. To get the regex parser to return an unbounded number of string-number pairs, you are going to want to turn on the /g (Global) switch so regex will return all matches. That is a problem with the parts of your URL at the beginning and end, which do not fit your pattern.
I recommend first using a regular expression to divide your URL into three parts, preamble, path, remaining data. Then you can pass the path string to a second regular expression to parse the pairs - it will be much simpler.
If you do it that way your first expression could be:
^[a-z+.-]+?:\/\/(?:www\.)?([^?#]+)(.*)$
The first part skips over everything through the optional www. and does not capture it because you are not interested in that part. The second part captures everything up to any query or fragment (delimited by ? and #, respectively) and places it in the first capture group. The last part captures the rest of the URL into the second capture group. In your example that is ?key=value.
Now take your first capture group, which contains the host and the path, and pass it to a second regex with the global flag set (so it processes all pairs repeatedly). This second regex will be:
(.*?)\/([0-9]{4,})\/?
For each match of this string, the parsed values and numbers will be in capture groups 1 & 2.
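
A minimal Python sketch of this two-step approach follows. It assumes the first pattern is written with a non-capturing (?:www\.)? and a greedy path group, and it uses re.findall in place of the global flag. The names SPLIT, PAIR and parse are hypothetical.

```python
import re

# Step 1: split the URL into host+path (group 1) and query/fragment (group 2).
# Assumption of this sketch: non-capturing www group, greedy path group.
SPLIT = re.compile(r"^[a-z+.-]+:\/\/(?:www\.)?([^?#]+)(.*)$")
# Step 2: one "text / number" pair per match.
PAIR = re.compile(r"(.*?)\/([0-9]{4,})\/?")


def parse(url):
    """Return (list of (text, number) pairs, remainder), or None if no match."""
    m = SPLIT.match(url)
    if m is None:
        return None
    path, rest = m.groups()
    # findall plays the role of the global flag: it returns every pair.
    return PAIR.findall(path), rest


print(parse("https://www.example.com/first/12345/second/part/part2/5432?key=value"))
# ([('example.com/first', '12345'), ('second/part/part2', '5432')], '?key=value')
print(parse("https://www.example.com//first/part/54323?key=value"))
# ([('example.com//first/part', '54323')], '?key=value')
```

Because the pairs come back as a list, the same code handles one, two, or more number sequences without changing either pattern.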
It sounds very straight-forward:
https?:\/\/(?:www\.)?(.*?)\/(\d+)\/(.*?)\/(\d+)(?:\?(.*))?
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
http 'http'
--------------------------------------------------------------------------------
s? 's' (optional (matching the most amount
possible))
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
www 'www'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of \4
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
\? '?'
--------------------------------------------------------------------------------
( group and capture to \5:
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \5
--------------------------------------------------------------------------------
)? end of grouping
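
As a quick check of the single-pattern answer, here is a Python sketch that applies it to both URLs from the question. The name URL_RE is illustrative. The pattern expects exactly two number sequences, so the first URL, which has only one, does not match.

```python
import re

URL_RE = re.compile(
    r"https?:\/\/(?:www\.)?(.*?)\/(\d+)\/(.*?)\/(\d+)(?:\?(.*))?"
)

two_numbers = "https://www.example.com/first/12345/second/part/part2/5432?key=value"
one_number = "https://www.example.com//first/part/54323?key=value"

m = URL_RE.search(two_numbers)
print(m.groups())
# ('example.com/first', '12345', 'second/part/part2', '5432', 'key=value')

# Only one digit run in this URL, so the fixed two-number pattern fails.
print(URL_RE.search(one_number))
# None
```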

regexp: multiline, non-greedy match until optional string

Using Go's regexp, I'm trying to extract a predefined set of ordered key-value (multiline) pairs whose last element may be optional from a raw text, e.g.,
Key1:
SomeValue1
MoreValue1
Key2:
SomeValue2
MoreValue2
OptionalKey3:
SomeValue3
MoreValue3
(here, I want to extract all the values as named groups)
If I use the default greedy pattern (?s:Key1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?), it never sees OptionalKey3 and matches the rest of the text as Key2.
If I use the non-greedy pattern (?s:Key1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?), it doesn't even see SomeValue2 and stops immediately: https://regex101.com/r/QE2g3o/1
Is there a way to optionally match OptionalKey3 while also able to capture all the other ones?
Use
(?s)\AKey1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?\z
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
(?s) set flags for this block (with . matching
\n) (case-sensitive) (with ^ and $
matching normally) (matching whitespace
and # normally)
--------------------------------------------------------------------------------
\A the beginning of the string
--------------------------------------------------------------------------------
Key1: 'Key1:'
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
(?P<Key1> group and capture to "Key1":
--------------------------------------------------------------------------------
.* any character (0 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of "Key1"
--------------------------------------------------------------------------------
Key2: 'Key2:'
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
(?P<Key2> group and capture to "Key2":
--------------------------------------------------------------------------------
.*? any character (0 or more times (matching
the least amount possible))
--------------------------------------------------------------------------------
) end of "Key2"
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
OptionalKey3: 'OptionalKey3:'
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
(?P<OptionalKey3> group and capture to "OptionalKey3":
--------------------------------------------------------------------------------
.* any character (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of "OptionalKey3"
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
\z the end of the string
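
The effect of the anchors can be checked with a short sketch. It is written in Python rather than Go, so the end-of-string anchor is spelled \Z (Python's re equivalent of \z); everything else is the pattern from the answer. The sample strings and the name PATTERN are illustrative.

```python
import re

# Same pattern as above, with Python's \Z standing in for \z.
PATTERN = re.compile(
    r"(?s)\AKey1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)"
    r"(?:OptionalKey3:\n(?P<OptionalKey3>.*))?\Z"
)

with_optional = (
    "Key1:\nSomeValue1\nMoreValue1\n"
    "Key2:\nSomeValue2\nMoreValue2\n"
    "OptionalKey3:\nSomeValue3\nMoreValue3\n"
)
without_optional = "Key1:\nSomeValue1\nMoreValue1\nKey2:\nSomeValue2\nMoreValue2\n"

full = PATTERN.match(with_optional).groupdict()
short = PATTERN.match(without_optional).groupdict()

print(full["Key2"], full["OptionalKey3"], sep="|")
print(short["Key2"], short["OptionalKey3"], sep="|")
```

The end anchor is what forces the lazy Key2 group to keep growing: it can only stop where OptionalKey3 begins or where the input ends, so it no longer stops immediately.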

need regex command to extract full filename which has 2 dot(.)

I am having an issue extracting a full filename that contains 2 dots (.).
The command below works, but I need an alternative solution without an asterisk in the regex. Can anyone help me with an alternative regex that extracts the full filename without an asterisk?
(ABC_A.*\.)+.*
Here are filenames I am trying to match:
ABC_A_CommunityRollover_Autocreate_Community.12345678-1.out
ABC_A_CommunityRollover_Autocreate_Community.88345678-1.out
ABC_A_CommunityRollover_Autocreate_Community.99945678-1.out
TL;DR:
^ABC_A(?:[^.]+\.){2}[^.]+$
Live Demo.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
ABC_A 'ABC_A'
--------------------------------------------------------------------------------
(?: group, but do not capture (2 times):
--------------------------------------------------------------------------------
[^.]+ any character except: '.' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
){2} end of grouping
--------------------------------------------------------------------------------
[^.]+ any character except: '.' (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
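
Here is a small Python sketch that tries the pattern on the three filenames from the question plus two made-up names that should be rejected. The last two entries in the list and the name FILENAME are assumptions for illustration.

```python
import re

FILENAME = re.compile(r"^ABC_A(?:[^.]+\.){2}[^.]+$")

names = [
    "ABC_A_CommunityRollover_Autocreate_Community.12345678-1.out",
    "ABC_A_CommunityRollover_Autocreate_Community.88345678-1.out",
    "ABC_A_CommunityRollover_Autocreate_Community.99945678-1.out",
    "ABC_A_OnlyOneDot.out",            # hypothetical: only one dot
    "XYZ_A_Community.12345678-1.out",  # hypothetical: wrong prefix
]

matched = [name for name in names if FILENAME.match(name)]
print(len(matched))
# 3
```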

Regex Help, anti-query replacement

How do I remove any lines that have 3 or fewer slashes, but retain bigger links?
A. http://two/three/four
B. http://two/three
C. http://two
A would stay; nothing else would.
Thanks
Search: (?m)^(?:[^/]*/){0,3}[^/]*$
Replace: ""
On the demo, see how only the lines with 3 or fewer slashes are matched. These are the ones to nix.
Explain Regex
(?m) # set flags for this block (with ^ and $
# matching start and end of line) (case-
# sensitive) (with . not matching \n)
# (matching whitespace and # normally)
^ # the beginning of a "line"
(?: # group, but do not capture (between 0 and 3
# times (matching the most amount
# possible)):
[^/]* # any character except: '/' (0 or more
# times (matching the most amount
# possible))
/ # '/'
){0,3} # end of grouping
[^/]* # any character except: '/' (0 or more times
# (matching the most amount possible))
$ # before an optional \n, and the end of a
# "line"
sed
You can use following sed command to do that, assuming your lines are in foo.txt:
sed -n '/\(.*\/\)\{4,\}/p' foo.txt
The -n option suppresses sed's automatic printing of every line; lines matching the pattern between the /s are still printed thanks to the p command at the end of the sed expression.
The pattern is: at least 4 occurrences of /, each one potentially preceded by any other string.
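
The same filter can also be sketched in Python, reusing the search pattern from the first answer on one line at a time. A line is dropped when the whole line matches "at most 3 slashes". The names FEW_SLASHES and keep_long_links are hypothetical.

```python
import re

# Whole-line match for "0 to 3 slashes", taken from the search pattern above.
FEW_SLASHES = re.compile(r"(?:[^/]*/){0,3}[^/]*")


def keep_long_links(text):
    """Keep only the lines that contain 4 or more slashes."""
    kept = [
        line for line in text.splitlines()
        if not FEW_SLASHES.fullmatch(line)
    ]
    return "\n".join(kept)


sample = "A. http://two/three/four\nB. http://two/three\nC. http://two"
print(keep_long_links(sample))
# A. http://two/three/four
```

Filtering line by line avoids the empty lines that a plain search-and-replace with an empty string would leave behind.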

regex: practical example with ms modifier

Is there a practical example that uses the "ms" modifiers? And when should I use them?
For example:
$data =~ /regex/ms
Thanks
Here is some sample text.
Begin 111
Match this
and This
End
Begin 222
Match this one too
End
Don't match this: Begin 333
Some stuff
End
This regex uses the s and m modifiers to match each Begin...End block while capturing the digits to Group 1:
(?sm)^Begin (\d+).*?End
(See the demo to examine the matches and captures.)
The s is important because we want the . in .*? to match characters on multiple lines. In s mode, the . can match newline characters, so it grabs characters over several lines.
The m is important because we only want the Begin to match at the beginning of the line (and the ^ allows us to do that when m is set). For instance, we don't want to match a Begin...End block in the middle of a line.
Explain Regex
(?ms) # set flags for this block (with ^ and $
# matching start and end of line) (with .
# matching \n) (case-sensitive) (matching
# whitespace and # normally)
^ # the beginning of a "line"
Begin # 'Begin '
( # group and capture to \1:
\d+ # digits (0-9) (1 or more times (matching
# the most amount possible))
) # end of \1
.*? # any character (0 or more times (matching
# the least amount possible))
End # 'End'
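
To see why both modifiers are needed, here is a Python sketch that runs the pattern on the sample text with both flags, with s only, and with m only. Python spells the inline flags the same way; the variable names are illustrative.

```python
import re

text = """Here is some sample text.
Begin 111
Match this
and This
End
Begin 222
Match this one too
End
Don't match this: Begin 333
Some stuff
End
"""

both = re.findall(r"(?sm)^Begin (\d+).*?End", text)
only_s = re.findall(r"(?s)^Begin (\d+).*?End", text)  # ^ matches only at the very start
only_m = re.findall(r"(?m)^Begin (\d+).*?End", text)  # . cannot cross line breaks

print(both)    # ['111', '222']
print(only_s)  # []
print(only_m)  # []
```

With s alone, ^ is tied to the start of the whole text, which begins with "Here". With m alone, .*? cannot reach an End that sits on a later line.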