Select set of numbers between 2 words [duplicate] - regex

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 3 years ago.
I have the following output:
WordA
1
2
3
4
WordB
5
6
7
8
WordC
9
10
11
12
WordA
13
14
15
16
WordB
I need to grab the numbers between the two words: WordA and WordB
I tried (?<=WordA ).*(?= WordB), but it grabs ALL the numbers, including the ones between WordB and the second WordA that I don't want. I only want the numbers inside each WordA ... WordB pair, which are 1 2 3 4 and 13 14 15 16.
Any ideas?

First use this regex:
WordA(\s+\d+)+\s+WordB
This matches each block together with its surrounding WordA and WordB.
Then extract the digits from each match with this regex:
\d+
The first regex gives you:
WordA
1
2
3
4
WordB
WordA
13
14
15
16
WordB
The second regex then gives you:
1
2
3
4
13
14
15
16
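The two passes above can be sketched in Python. This is only an illustration, assuming the sample output from the question is held in a string; the variable names are made up for the example, and the inner group is written as non-capturing so that the whole block is returned.

```python
import re

# Sample output from the question, one token per line.
text = "\n".join(
    ["WordA", "1", "2", "3", "4", "WordB", "5", "6", "7", "8",
     "WordC", "9", "10", "11", "12", "WordA", "13", "14", "15", "16", "WordB"]
)

# Pass 1: grab every WordA ... WordB block that contains only numbers.
blocks = [m.group(0) for m in re.finditer(r"WordA(?:\s+\d+)+\s+WordB", text)]

# Pass 2: pull the digits out of each block.
numbers = [re.findall(r"\d+", block) for block in blocks]

print(numbers)  # [['1', '2', '3', '4'], ['13', '14', '15', '16']]
```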

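Try a lazy quantifier so the match stops at the first WordB: (?<=WordA).*?(?=WordB)
Because the numbers are on separate lines, the dot must also be allowed to match newlines (the "dot matches newline" / single-line option of your regex engine).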
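A minimal Python sketch of the lazy version, assuming the same sample text as in the question. In Python the "dot matches newline" behaviour is switched on with re.DOTALL; other engines use a different flag name.

```python
import re

# Sample output from the question, one token per line.
text = "\n".join(
    ["WordA", "1", "2", "3", "4", "WordB", "5", "6", "7", "8",
     "WordC", "9", "10", "11", "12", "WordA", "13", "14", "15", "16", "WordB"]
)

# .*? is lazy, so each match ends at the nearest WordB instead of the last one.
pattern = re.compile(r"(?<=WordA).*?(?=WordB)", re.DOTALL)

# Each match is the raw text between the words; split() drops the line breaks.
groups = [match.split() for match in pattern.findall(text)]

print(groups)  # [['1', '2', '3', '4'], ['13', '14', '15', '16']]
```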

Related

Find and KEEP all DUPLICATE lines (instead of unique lines) in a text file

I want to identify and keep DUPLICATE, TRIPLICATE, etc. lines, i.e., all lines that occur more than once, in Notepad++. In other words, how can I delete only the unique lines?
For example, here are seven (7) separate lists and the desired true duplicate lines of each list (shown as 7 columns; regard each column as an individual list or file). The lists are shown side by side only to save space. In real life, each of the 7 lists occurs alone, independently from the others, in a separate file.
list1 list2 list3 list4 list5 list6 list7
1 0 0 0 0 0 0
2 1 1 1 1 1 1
3 2 2 2 2 2 2
4 3 3 3 3 3 3
4 4 4 4 4 4 4
4 4 4 4 4 4 4
5 4 4 4 4 4 4
6 5 5 5 5 5 5
7 5 5 5 5 5 5
8 6 6 6 6 6 6
9 6 6 6 6 6 6
abc 7 7 7 7 7 7
abd 8 8 8 8 8 8
abd 9 9 9 9 9 9
abe <CR> 9 9 9 9
<CR> 99 99
<CR>
[Lines of multiple occurrence in the above lists:]
4 4 4 4 4 4 4
4 4 4 4 4 4 4
4 4 4 4 4 4 4
abd 5 5 5 5 5 5
abd 5 5 5 5 5 5
6 6 6 6 6 6
6 6 6 6 6 6
9 9 9 9
9 9 9 9
There are many solutions to eliminate duplicates (e.g., TextFX; notepad++ delete duplicate and original lines to keep unique lines), but I cannot find a solution that keeps only the duplicates.
((.*)\R(\2\R)+)*\K.+\R
@Lars Fischer: This script works nearly OK, except that the last entry of the (presorted) list needs to be a unique line followed by an empty line (<CR>). One (suboptimal) workaround is to insert an artificial (helper) unique line (e.g., zzz) followed by an empty line as the last two lines.
(END OF QUESTION)
UPDATE 3: This question is reposted per the Stack Overflow "ask a new question" instruction. (@AdrianHHH, @B. Desai, @Paolo Forgia, @greg-449, and @Erik von Asmuth drew the incorrect conclusion that this question is a duplicate of notepad++ delete duplicate and original lines to keep unique lines. This question is definitely not a duplicate of the one @AdrianHHH et al. quote.)
UPDATE 2: @AdrianHHH This question is no broader (in fact, one can hardly be more specific) and no less researched than other Notepad++ questions, including https://stackoverflow.com/questions/29303148, which @AdrianHHH et al. wrongly cite as the same question.
UPDATE:
@AdrianHHH, @B. Desai, @Paolo Forgia, @greg-449, @Erik von Asmuth
This question is different from:
https://stackoverflow.com/questions/29303148
because Q 29303148 (i) does not ask how to identify and keep only the lines of multiple occurrence, and (ii) has no solution for that in its answers. Q 29303148 asks "...I just need the unique lines."
Here is a solution based on regular expressions and bookmarks. It works for a sorted file (i.e., each duplicated line is immediately followed by its duplicates):
Open the Mark dialog (Search -> Mark...)
Click Clear all marks on the right
Check Bookmark line
Check Wrap around
Find what: ((.*)\R(\2\R?)+)*\K.*
Check Regular expression and uncheck . matches newline
Click Mark All
Click Close
Search -> Bookmark -> Remove Bookmarked Lines
Explanation
The regular expression is made up of three parts:
((.*)\R(\2\R?)+)* : this is an optional block of duplicates consisting of one or more line blocks
the outer ( ... )* matches zero or more such blocks of duplicated lines (if, in your example, the three 4s were followed by two 5s, this is what lets us match a sequence of duplicate blocks)
(.*)\R(\2\R?)+ : \2 references the content of (.*), so this matches one line followed by all of its duplicates
the second \R is an optional (due to the ?) line break. Thus it is possible to match a duplicate in the last line of the file even if that line does not end with a line break
If there is a block of duplicated lines after the cursor position from which you start, this will match it.
Next, \K discards what we have matched so far (the duplicates) and "puts the cursor" before the first unique line
.* matches the next (unique) line, which gets bookmarked
Using Mark All, we bookmark all such unique lines, so that we can remove them using the entry from the Search -> Bookmark menu.
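For checking the result outside Notepad++, the same idea (keep every line that belongs to a run of identical adjacent lines, drop the rest) can be sketched in Python. This is not the regex above, since Python's re module does not support \K or \R; it is a plain grouping approach, and the function name is made up for the example. Like the Notepad++ recipe, it assumes the input is already sorted.

```python
from itertools import groupby


def keep_repeated_lines(lines):
    """Return only the lines that sit in a run of two or more identical adjacent lines."""
    kept = []
    for _, run in groupby(lines):
        run = list(run)
        if len(run) > 1:  # a unique line forms a run of length 1 and is dropped
            kept.extend(run)
    return kept


# list1 from the question (already sorted).
list1 = ["1", "2", "3", "4", "4", "4", "5", "6", "7", "8", "9",
         "abc", "abd", "abd", "abe"]

print(keep_repeated_lines(list1))  # ['4', '4', '4', 'abd', 'abd']
```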

SAS_Specific Functions

I would like to ask my question using an example. My data set is:
Group Value
1 10
1 8
1 12
2 13
2 11
2 7
I want to add two columns to this data set. The first new column should contain the maximum of Value within each group, and the second should contain the minimum of Value within each group. So the result should look like this:
Group Value Max Min
1 10 12 8
1 8 12 8
1 12 12 8
2 13 13 7
2 11 13 7
2 7 13 7
12 - because there are 3 numbers (10,8,12) in group number 1 and 12 is maximum among these values.
13 - because there are 3 numbers (13,11,7) in group number 2 and 13 is maximum among these values.
8 - because there are 3 numbers (10,8,12) in group number 1 and 8 is minimum among these values.
7 - because there are 3 numbers (13,11,7) in group number 2 and 7 is minimum among these values.
I hope I have explained it clearly.
Many thanks in advance.
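Try PROC SQL. Selecting the detail columns next to the aggregates makes SAS remerge the group statistics onto every row:
proc sql;
  /* max and min are computed per group and remerged onto each original row */
  select *, max(value) as max, min(value) as min
    from have
    group by group;
quit;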
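To make the expected result concrete, here is a small sketch of the same per-group computation in plain Python (not SAS). The function name is hypothetical; the rows are the sample data from the question.

```python
# (Group, Value) rows from the question.
rows = [(1, 10), (1, 8), (1, 12), (2, 13), (2, 11), (2, 7)]


def add_group_max_min(rows):
    """Append the group's maximum and minimum value to every row."""
    stats = {}  # group -> (max so far, min so far)
    for group, value in rows:
        hi, lo = stats.get(group, (value, value))
        stats[group] = (max(hi, value), min(lo, value))
    # Second pass: attach the finished statistics to each original row.
    return [(group, value, *stats[group]) for group, value in rows]


for row in add_group_max_min(rows):
    print(row)
```

The first pass collects the statistics and the second pass attaches them, which mirrors the "compute per group, then merge back" behaviour of the query.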

regex pattern for numbers 1 to 11 with space separation [duplicate]

This question already has an answer here:
regex number range 1-17
(1 answer)
Closed 7 years ago.
I need a regex pattern that identifies space-delimited number input, where each number must be in the range 1 to 11.
It should match the following:
1 2 3
1 11 4 5 6
1 4 3 9
11 4 5
It should fail to match the following:
12 2 3
1 34 5555
23 3445 566 676544
dds 434 fv 434
dssd s ds sd
I came up with
^([0]?\d|1[0-1])(([, ]([0]?\d|1[0-1]))*)$
But this also matches when I provide
1 0 6 7
This is not a duplicate question, and I have explained it well enough. Please read the question properly, and if someone still thinks it is a duplicate, tell me why instead of just marking it as one.
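(?:([1-9][01]?)\s+)+ comes close, but it also accepts 20, 21, 30, ... 91 and requires whitespace after the last number. An anchored version that accepts only 1 to 11:
^(?:1[01]|[1-9])(?:\s+(?:1[01]|[1-9]))*$
Explanation:
^ start of the input
(?:1[01]|[1-9]) either 10 or 11, or a single digit from 1 to 9 (so 0 is rejected)
(?:\s+(?:1[01]|[1-9]))* zero or more further numbers, each preceded by whitespace
$ end of the input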
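A quick way to check a candidate pattern is to run it against the accept and reject lists from the question. The sketch below does that in Python with an anchored pattern that allows only 1 to 9, 10, and 11, separated by whitespace; the pattern and names are one possible choice for illustration, not the only valid one.

```python
import re

# 10 or 11, or a single digit 1-9; further numbers must be preceded by whitespace.
RANGE_1_TO_11 = re.compile(r"^(?:1[01]|[1-9])(?:\s+(?:1[01]|[1-9]))*$")

should_match = ["1 2 3", "1 11 4 5 6", "1 4 3 9", "11 4 5"]
should_fail = ["12 2 3", "1 34 5555", "23 3445 566 676544",
               "dds 434 fv 434", "dssd s ds sd", "1 0 6 7"]

for line in should_match + should_fail:
    verdict = "match" if RANGE_1_TO_11.match(line) else "no match"
    print(f"{line!r}: {verdict}")
```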

Regex "easy" one -> post deleted afterwards [duplicate]

This question already has answers here:
Regex look back format condition
(2 answers)
Closed 8 years ago.
Input:
Hello fr. 2 699:- 2 fr. 599:- 3 fr. 899:- 4 fr. 3 899:- 5 fr. 1 499:- 6 fr. 999:-.
Output:
599 899 999
Where do I put "Hello " in this pattern:
/(?<=fr\.\s)(\d{3})/
I am testing at http://rubular.com/
If you require 'Hello' in the regular expression, you can apply the positive lookbehind to both 'Hello' and 'fr.':
(?<=(Hello)|(fr\.\s))\d{3}
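For reference, the lookbehind from the question can be tried in Python on the sample input, as sketched below. Note one assumption about portability: Python's re module only accepts fixed-width lookbehinds, so the alternation between 'Hello' (5 characters) and 'fr. ' (4 characters) shown above is rejected there, and the sketch uses the single fixed-width lookbehind instead.

```python
import re

text = ("Hello fr. 2 699:- 2 fr. 599:- 3 fr. 899:- "
        "4 fr. 3 899:- 5 fr. 1 499:- 6 fr. 999:-.")

# Three digits that come directly after "fr. ".
# Prices such as "2 699" fail because the space breaks the run of three digits.
prices = re.findall(r"(?<=fr\.\s)\d{3}", text)

print(" ".join(prices))  # 599 899 999
```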

all possible inputs in Binary Search Tree

Given a binary search tree, print all possible input sequences that will form the same binary search tree.
One simple example:
2
1 3
we need to print
2 1 3
2 3 1
10
5 13
3 6 11 15
10 5 13 3 6 11 15
10 13 5 3 6 11 15
...
I tried solving this by reading the tree breadth-first and shuffling it. But that misses possible inputs like
10 5 3 6 13 11 15
Do I need to use DFS here? I am writing it in C++.
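One way to enumerate the sequences is recursive: the root must come first, and after it any interleaving of a valid sequence for the left subtree with a valid sequence for the right subtree works, as long as each subtree's own order is preserved. The sketch below is in Python rather than C++, with a tree represented as nested tuples; that representation and the function names are assumptions made for the example.

```python
# A tree is either None or a tuple (value, left_subtree, right_subtree).

def weave(a, b):
    """Yield every interleaving of lists a and b that keeps each list's internal order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in weave(a[1:], b):
        yield [a[0]] + rest
    for rest in weave(a, b[1:]):
        yield [b[0]] + rest


def insertion_orders(tree):
    """Return every insertion sequence that rebuilds exactly this binary search tree."""
    if tree is None:
        return [[]]
    value, left, right = tree
    orders = []
    for left_order in insertion_orders(left):
        for right_order in insertion_orders(right):
            for mixed in weave(left_order, right_order):
                orders.append([value] + mixed)  # the root is always inserted first
    return orders


small = (2, (1, None, None), (3, None, None))
big = (10,
       (5, (3, None, None), (6, None, None)),
       (13, (11, None, None), (15, None, None)))

print(insertion_orders(small))       # [[2, 1, 3], [2, 3, 1]]
print(len(insertion_orders(big)))    # 80
```

The recursion visits the tree depth-first, but the output is not a DFS or BFS order as such; sequences like 10 5 3 6 13 11 15 and 10 5 13 3 6 11 15 both appear because they are different interleavings of the two subtrees.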