Regex to check consecutive number with spaces - regex

I am trying to find some Regular Expression that will identify a string:
Question 1
With more than 4 consecutive numbers followed by white space:
e.g.
123456
something 1 2 3 4 5 and 1 2 3 4 5
12 34 5
1234
Question 2
With more than 4 consecutive number words with white space:
e.g.
one two three four
one two three four five
onetwothreefour
Question 3
Is there any smart way to do this for other languages too?
Thanks

If you only want numbers between 0 and 9, these should do:
Q1: (?:[0-9]\s*){4,}
Q2: (?:(?:zero|one|two|three|four|five|six|seven|eight|nine)\s*){4,}

Can't think of anything smart now, but for example:
1 *2 *3 *4 *(5 *(6 *)?)?
1 *2 *3 *4 *(5 *(6 *)?)?|one *two *three *four *(five *(six *)?)?

Related

Using awk, how do I match pattern and variants?

I've been struggling with this for a while in regex testers but what came up as a correct regex pattern actually failed. I've got a large file, tab delimited, with numerous types of data. I want to print a specific column, with the characters XYZ, and it's subsequent values.
In the specific column I'm interested in I have values like:
XYZ
ABCDE
XYZ/WORDS
XYZ/ABCDE
ABFE
XYZ
regex tester that was successful was something like:
XYZ(.....)*
It obviously fails when implemented as:
awk '{if ($1=="XYZ(......)*") print$0}'
What regex character do I use to denote that I want everything after the backslash(/), including the original pattern (XYZ)?
Specifically, I want to be able to capture all instances of XYZ, and print the other columns that go along with them (hence the print$0). Specifically, capture these values:
XYZ
XYZ/WORDS
XYZ/ABCDE
Thank you
Setup: (assuming actual data file does not include blank lines)
$ cat x
XYZ 1 2 3 4
ABCDE 1 2 3 4
XYZ/WORDS 1 2 3 4
XYZ/ABCDE 1 2 3 4
ABFE 1 2 3 4
XYZ 1 2 3 4
If you merely want to print all rows where the first field starts with XYZ:
$ awk '$1 ~ /^XYZ/' x
XYZ 1 2 3 4
XYZ/WORDS 1 2 3 4
XYZ/ABCDE 1 2 3 4
XYZ 1 2 3 4
If this doesn't provide the expected results then please update the question with more details (to include a more representative set of input data and the expected output).

Find and KEEP all DUPLICATE lines (instead of unique lines) in a text file

I am aiming to identify and keep DUPLICATE, TRIPLICATE, etc. lines, i.e., all lines that occur more than once in Notepad++? In other words, how can I delete all unique lines only?
For example, here are seven (7) separate lists and the desired true duplicate lines of each lists (shown as 7 columns, regard each column as an individual list or file!). (The lists here are shown side by side only to save space, in real life, each of the 7 lists occurs alone and independently from the others and are separate files!)
list1 list2 list3 list4 list5 list6 list7
1 0 0 0 0 0 0
2 1 1 1 1 1 1
3 2 2 2 2 2 2
4 3 3 3 3 3 3
4 4 4 4 4 4 4
4 4 4 4 4 4 4
5 4 4 4 4 4 4
6 5 5 5 5 5 5
7 5 5 5 5 5 5
8 6 6 6 6 6 6
9 6 6 6 6 6 6
abc 7 7 7 7 7 7
abd 8 8 8 8 8 8
abd 9 9 9 9 9 9
abe <CR> 9 9 9 9
<CR> 99 99
<CR>
[Lines of multiple occurence of above lists:]
4 4 4 4 4 4 4
4 4 4 4 4 4 4
4 4 4 4 4 4 4
abd 5 5 5 5 5 5
abd 5 5 5 5 5 5
6 6 6 6 6 6
6 6 6 6 6 6
9 9 9 9
9 9 9 9
There are many solutions to eliminate duplicates (e.g., TextFX; notepad++ delete duplicate and original lines to keep unique lines), I can not find solutions to keep duplicates only.
((.*)\R(\2\R)+)*\K.+\R
#Lars Fischer: This script works nearly OK, except the last entry of the (presorted) list needs to be unique line followed by a <CR> empty line. One (suboptimal) workaround is to insert an artificial (helper) unique line (e.g., zzz) followed by an empty line <CR> as the last two lines.
(END OF QUESTION)
UPDATE 3: This question is reposted per stackoverflow "ask a new question" instruction. (#AdrianHHH, #B. Desai, #Paolo Forgia, #greg-449, #Erik von Asmuth draw the incorrect conclusion that this question is a duplicate of notepad++ delete duplicate and original lines to keep unique lines. This question is definitely not a duplicate of the one #AdrianHHH et al quotes.
UPDATE 2: #AdrianHHH This question is not less "broad" (in fact, one can hardly be more specific) or less researched than other Notepad++ questions, including the one https://stackoverflow.com/questions/29303148 cited (wrongly) by #AdrianHHH et al. as the same question.
UPDATE:
#AdrianHHH, #B. Desai, #Paolo Forgia, #greg-449, #Erik von Asmuth
This questions is different from:
https://stackoverflow.com/questions/29303148
beacuse Q 29303148 is (i) neither asking how to identify and keep only the lines of multiple occurrence, (ii) neither there is a solution provided in the answers for that. Q 29303148 asks "...I just need the unique lines."
Here is a solution based on regular Expressions and bookmarks, it works for a sorted file (i.e. each duplicated line is followed by its duplicates):
Open the Mark Dialog (Search -> Mark ....)
click Clear all Marks on the right
check Bookmark line
check Wrap aound
Find What: ((.*)\R(\2\R?)+)*\K.*
Check regular expression and uncheck . matches newline
Mark All
Click Close
Search -> Bookmark -> Remove Bookmarked Lines
Explanation
The regular expression is made up of three parts:
((.*)\R(\2\R?)+)* : this is an optional block of duplicates consisting of one ore more line blocks
the outher ( ... )* matches zero or more such blocks of duplicated lines (if in your example the three 4 would be followed by two 5 we will need a concept of sequences of duplicate blocks)
(.*)\R(\2\R?)+: \2 references the content of (.*): this are all duplicates of one line
the second \R is an optional ( due to the ?) linebreak. Thus it is possible to match a duplicate in the last line of the file if that line does not end with a linebreak
If there is a block of duplicated lines after the cursor position from which you start, this will match it.
now \K discards what we have matched so far (the duplicates) and "puts the cursor" before the first unique line
.* matches the next (unique) line and bookmarks it
Using Mark All we bookmark all such unique lines, so that we can remove them using the Entry from the Search -> Bookmark menu.

How can I use regex to capture this specfic set of ages?

I have a set of age data, like below;
1
2
3
4
5
6
7
8
9
10
1,1
1,2
1,3
2,12
11,13,15
7,8,12
12,15
14,16,17
15,6
13,11,10,2
And so on... I am trying to use Regex in to target a 'mixed' range of childrens ages. The logic requires at least a combination of 2 childen (so requires one of the lines with a comma), with at least one aged under 10 (min is 1), and at least one aged equal or greater to 10 (max 17).
My expected results from the above would be to return these lines below, and nothing else;
2,12
7,8,12
15,6
13,11,10,2
Any advice would be appreciated on how to resolve? Thanks in advance, I am continuing to try to correct.
You can use this regex to meet your requirements:
^(?=.*\b[1-9]\b)(?=.*\b1[0-7]\b)[0-9]+(?:,[0-9]+)+$
RegEx Demo
There are 2 lookaheads to assert 2 numbers one between 1-9 and another between 10-17
([1-9]) matches a number that should be between 1 and 9
1[0-7] matches a number that should be between 10 and 17
[0-9]+(?:,[0-9]+)+ in the regex is for matching 1 or more comma separated numbers in the middle.
You can do it with
\b\d,1[0-7]\b
provided the ages always are sorted (youngest to oldest).
If the age of 0 isn't allowed, change to
\b[1-9],1[0-7]\b
It checks for a single digit followed by a comma and one followed by a single digit in the range 0-7.
See it here at regex101.

regex pattern for number 1 to 11 with space speration [duplicate]

This question already has an answer here:
regex number range 1-17
(1 answer)
Closed 7 years ago.
Need an regex patter to identify an number input with space delimiter, and the number range should be from 1 to 11.
e.g. It should detect following
1 2 3
1 11 4 5 6
1 4 3 9
11 4 5
e.g It should fail in detecting
12 2 3
1 34 5555
23 3445 566 676544
dds 434 fv 434
dssd s ds sd
I came up with
^([0]?\d|1[0-1])(([, ]([0]?\d|1[0-1]))*)$
But this also detect when I provide
1 0 6 7
This is not an duplicate question, and I have explained it well enough. Please read question properly and if still someone thinks it's an duplicate question then tell me why it's a duplicate one instead of just marking an duplicate.
(?:([1-9][01]?)\s+)+ should do it.
Explanation:
(?: non capturing group
([1-9][01] two digits, first between 1 and 9, second optional
)\s+) spaces
)+ repeat the whole

List a number's digits in J

I use the programming language: J.
I want to put all of the digit of a number in a list.
From:
12345
to:
1 2 3 4 5
What can I do?
The way I'd write this is
10&#.^:_1
which we can see in use with this sentence:
(10&#.^:_1) 123456789
1 2 3 4 5 6 7 8 9
That program relies on the reshaping built in to Base. It uses the (built-in) obverse of Base as a synonym for Antibase.
I found the answer:
intToList =: (".#;"0#":)
Another approach:
intToList =: 3 : '((>. 10 ^. y)#10) #: y'
This doesn't convert to string and back, which can be potentially costly, but counts the digits with a base-10 log, then uses anti-base (#:) to get each digit.
EDIT:
Better, safer version based on Dan Bron's comment:
intToList =: 3 : '10 #.^:_1 y'