Regex to find repeating numbers between other numbers

I have the following array and need two Regex filters that I want to use in PowerShell.
000111
010101
220220
123456
Filter 1: strings in which the digit 0 occurs three or more times.
I expect the following values after filtering
000111
010101
Filter 2: strings in which any digit occurs three or more times.
I should only see these numbers.
000111
010101
220220
With 0{3,} I can only match consecutive digits, so I get only the number
000111
Is it possible to find repeating numbers that are between other numbers?

Since you insist on seeing the solution in regex, look at this: '(\d).*\1.*\1'
I think this is comprehensible without further explanation, isn't it?
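For readers who want the pattern spelled out, here is a minimal Python sketch (not the PowerShell the question asks about) of how the backreference works: the group captures one digit, and each \1 requires that same digit to appear again later, with anything in between. The variable names are illustrative only.

```python
import re

values = ["000111", "010101", "220220", "123456"]

# Filter 1: the digit 0 appears at least three times, not necessarily adjacent.
three_zeros = re.compile(r"(0).*\1.*\1")
# Filter 2: any single digit appears at least three times.
three_same = re.compile(r"(\d).*\1.*\1")

filter1 = [v for v in values if three_zeros.search(v)]
filter2 = [v for v in values if three_same.search(v)]

print(filter1)
print(filter2)
```

In PowerShell the same patterns should work with the -match operator, since both engines support numbered backreferences.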

Armali's helpful answer is short and to the point (use '(0).*\1.*\1' for filter 1), and definitely the best solution for the problem at hand, given that you only need to know in the abstract if a given string has 3 or more zeros / same digits.
The solutions below may be of interest if you need to know the specific count of 0s / digits, which, as far as I know, cannot be handled by a regex alone.
Occurrence-counting variant of filter 1:
@(
  '000111'
  '010101'
  '220220'
  '123456'
).ForEach({
  # Remove every character that is not '0'; only the zeros remain.
  $zerosOnly = $_ -replace '[^0]'
  [pscustomobject] @{
    InputString  = $_
    CountOfZeros = $zerosOnly.Length
  }
})
That is, each string in the input array (enumerated via the intrinsic ForEach() method), has all chars. that aren't '0' ([^0]) removed via the regex-based -replace operator. The length of the resulting string is therefore equivalent to the count of zeros.
Output:
InputString CountOfZeros
----------- ------------
000111                 3
010101                 3
220220                 2
123456                 0
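The same strip-and-measure idea can be sketched in Python for comparison. This is only an illustration of the technique described above, with made-up variable names, and it adds the "three or more" threshold from filter 1 on top of the counts.

```python
import re

values = ["000111", "010101", "220220", "123456"]

# Strip everything that is not '0'; the length of what is left is the zero count.
zero_counts = {v: len(re.sub(r"[^0]", "", v)) for v in values}

# Apply the "three or more" threshold from filter 1 to the counts.
at_least_three = [v for v, n in zero_counts.items() if n >= 3]

print(zero_counts)
print(at_least_three)
```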
Occurrence-counting variant of filter 2:
@(
  '000111'
  '010101'
  '220220'
  '123456'
).ForEach({
  $outputObject = [pscustomobject] @{ InputString = $_; DigitCounts = [ordered] @{} }
  # Group the characters; each group's .Name is the character, .Count its occurrence count.
  ([char[]] $_ | Group-Object).ForEach({
    $outputObject.DigitCounts[$_.Name] = $_.Count
  })
  $outputObject
})
That is, each input string is grouped by its characters using Group-Object, whose output objects reflect the character at hand in the .Name property and the number of members of the group (i.e. the occurrence count for that character) in the .Count property. An ordered hashtable is used to report character-occurrence-count pairs.
Output:
InputString DigitCounts
----------- -----------
000111      {[0, 3], [1, 3]}
010101      {[0, 3], [1, 3]}
220220      {[0, 2], [2, 4]}
123456      {[1, 1], [2, 1], [3, 1], [4, 1]…}
E.g., {[0, 2], [2, 4]} in the output above means that the char. '0' occurs 2 times, and '2' 4 times in input string '220220'.
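As a rough cross-language comparison, the per-character grouping can be sketched in Python with collections.Counter, which plays the role that Group-Object plays above. The names below are illustrative, and the final list applies filter 2's "three or more" threshold to the counts.

```python
from collections import Counter

values = ["000111", "010101", "220220", "123456"]

# Map each input string to a {character: occurrence count} mapping.
digit_counts = {v: Counter(v) for v in values}

# Filter 2 expressed via counts: keep strings where some digit occurs 3+ times.
has_triple = [v for v, counts in digit_counts.items() if max(counts.values()) >= 3]

print(dict(digit_counts["220220"]))
print(has_triple)
```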

Related

How can I get a number between two words with REGEXP_SUBSTR?

I would like to retrieve a sequence of numbers in between two strings. The text may have other numbers and I only want to get the sequence in between 'item ' and ' n' (first occurrence). Also, the length of the sequence can vary.
The following is what I have tried:
SELECT REGEXP_SUBSTR(clob_text, 'item ([0-9]+?) n') AS my_number FROM my_table WHERE something = something;
However it returns the value "item 123456789 n", and I want only the number value.
I have also tried the regex '\item ([0-9]+?) \n', which returns the same, and '(?=item )([0-9]+?)(?= n)' and '\item /([0-9]+?)/ \n', which return nothing.
Finally, I tried nesting the expressions, which worked but is not ideal:
SELECT REGEXP_SUBSTR(REGEXP_SUBSTR(clob_text, '\item ([0-9]+?) \n'), '[0-9]+') FROM ...
How can I remove these unwanted characters so the result would be only "123456789" with one expression only?
Example input:
'Somdasdas dasd sdaisdjas asod dasdhjs 1564, dasdohndsdias sdasdasdasdasds,
ddissd ksdnas skid as 5645 sdnaslndas, ndsadn ndasknd dnsd: sdas 5465 asdasd
dnaskldnas ojsd (dasdksdas) asdklhasdas dsd. isdjasdsdpoojs asdasdasdasdsad
46564 iasdonsoi sdjosd kjlsdk kkpnasd item 12345879 não-existente da lista 14
sdasdnsd jdspka 2564 sadasds.'
Expected output:
'12345879'

Here is the regular expression for your problem:
REGEXP_SUBSTR('<your string>', '\item ([0-9]*?) \n', 1, 1, null, 1)
Usage:
*? Matches zero or more occurrences of the preceding pattern, non-greedily (as few as possible).
( ) Used to group expressions as a subexpression.
[0-9] Matches any digit.
Query with actual data and output:
SELECT
REGEXP_SUBSTR('Somdasdas dasd sdaisdjas asod dasdhjs 1564, dasdohndsdias sdasdasdasdasds, ddissd ksdnas skid as 5645 sdnaslndas, ndsadn ndasknd dnsd: sdas 5465 asdasd dnaskldnas ojsd (dasdksdas) asdklhasdas dsd. isdjasdsdpoojs asdasdasdasdsad 46564 iasdonsoi sdjosd kjlsdk kkpnasd item 12345879 não-existente da lista 14 sdasdnsd jdspka 2564 sadasds'
, '\item ([0-9]*?) \n', 1, 1, null, 1) as MY_STRING
FROM
DUAL;
Output:
MY_STRIN
--------
12345879
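The key point is the last argument, which asks for the first capture group instead of the whole match. A minimal Python sketch of the same idea follows; the sample text is a shortened version of the question's input, and the pattern drops the backslashes, which is an assumption about what the pattern is meant to express.

```python
import re

# Shortened sample: other numbers surround the one that follows "item ".
text = ("sdas 5465 asdasd 46564 iasdonsoi kkpnasd "
        "item 12345879 não-existente da lista 14 sdasdnsd jdspka 2564 sadasds.")

# The parentheses capture only the digits between "item " and " n".
match = re.search(r"item ([0-9]+) n", text)

# group(0) would be the whole match; group(1) is just the captured number.
my_number = match.group(1) if match else None

print(my_number)
```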

When dealing with regular expressions, I have a tendency to try and make things way too complicated. In this case, the simple solution is
REGEXP_SUBSTR('item 123456789 n', '[0-9]+')

Assuming that you are always looking for a single number embedded between 2 strings
regexp_substr('item 123456789 n','[^0-9]+([0-9]+)[^0-9]+',1,1,null,1)
-- [^0-9]+   anything other than a digit, occurring one or more times
-- ([0-9]+)  one or more digits: the first match (captured group)

How get all numerical values from a list except numbers relating to certain Strings

I want to get all the numbers from my String except for the numbers that are related to the String pattern 'SPN'
def layoutStr = '1 ABC, 2 DEF, 3 SPN, 4 GHI'
def splitted = layoutStr.split(',')
*.trim() // remove white space from all the entries (note *)
*.dropWhile { it ==~ /[^0-9 ]/ } // drop until you hit a char that isn't a letter or a space in the list
.findAll { it[0] != 'SPN' } // if a group starts with SPN, drop it
assert splitted == [1, 2, 4]
This doesn't seem to do what I expect it to do, I think I am missing the re-collecting step

You can use findResults which only collects elements that aren't null, so you can use it to filter AND transform at the same time:
def layoutStr = '1 ABC, 2 DEF, 3 SPN, 4 GHI'
def splitted = layoutStr.split(',')
*.trim() // remove white space from all the entries (note *)
*.split(/\s+/) // Split all the entries on whitespace
.findResults { it[1] == 'SPN' ? null : it[0] as Integer }
assert splitted == [1, 2, 4]
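For comparison, the same filter-and-transform step can be sketched in Python with a single list comprehension, which does in one expression what findResults does with its null check. This is only an illustration with made-up names, not Groovy.

```python
layout_str = "1 ABC, 2 DEF, 3 SPN, 4 GHI"

# Split into entries, then split each entry on whitespace into [number, label].
entries = [entry.split() for entry in layout_str.split(",")]

# Keep the number (as an int) only when the label is not 'SPN'.
numbers = [int(number) for number, label in entries if label != "SPN"]

print(numbers)
```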

Split line based on regex in Julia

I'm interested in splitting a line using a regular expression in Julia. My input is a corpus in Blei's LDA-C format consisting of docId wordID:wordCNT. For example, a document with five words is represented as follows:
186 0:1 12:1 15:2 3:1 4:1
I'm looking for a way to aggregate words and their counts into separate arrays, i.e. my desired output:
words = [0, 12, 15, 3, 4]
counts = [1, 1, 2, 1, 1]
I've tried using m = match(r"(\d+):(\d+)",line). However, it only finds the first pair 0:1. I'm looking for something similar to Python's re.compile(r'[ :]').split(line). How would I split a line based on regex in Julia?

There's no need to use regex here; Julia's split function allows using multiple characters to define where the splits should occur:
julia> v = split(line, [':',' '])
11-element Array{SubString{String},1}:
"186"
"0"
"1"
"12"
"1"
"15"
"2"
"3"
"1"
"4"
"1"
julia> words = v[2:2:end]
5-element Array{SubString{String},1}:
"0"
"12"
"15"
"3"
"4"
julia> counts = v[3:2:end]
5-element Array{SubString{String},1}:
"1"
"1"
"2"
"1"
"1"

I discovered the eachmatch method that returns an iterator over the regex matches. An alternative solution is to iterate over each match:
words, counts = Int64[], Int64[]
for m in eachmatch(r"(\d+):(\d+)", line)
wd, cnt = m.captures
push!(words, parse(Int64, wd))
push!(counts, parse(Int64, cnt))
end
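Since the question compares against Python, here is a hedged Python counterpart of the eachmatch loop: re.findall returns every (word, count) pair in one pass, and the leading document ID is skipped because it is not followed by a colon.

```python
import re

line = "186 0:1 12:1 15:2 3:1 4:1"

# Each match yields a (wordID, wordCNT) tuple of strings.
pairs = re.findall(r"(\d+):(\d+)", line)

# Convert the captured strings to integers, as parse(Int64, ...) does in Julia.
words = [int(word) for word, _ in pairs]
counts = [int(count) for _, count in pairs]

print(words)
print(counts)
```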

As Matt B. mentions, there's no need for a Regex here as the Julia lib split() can use an array of chars.
However - when there is a need for Regex - the same split() function just works, similar to what others suggest here:
line = "186 0:1 12:1 15:2 3:1 4:1"
s = split(line, r":| ")
words = s[2:2:end]
counts = s[3:2:end]
I've recently had to do exactly that in some Unicode processing code (where the split chars were "combined characters", and thus not something that can fit in Julia 'single quotes'), meaning:
split_chars = ["bunch","of","random","delims"]
line = "line_with_these_delims_in_the_middle"
r_split = Regex( join(split_chars, "|") )
split( line, r_split )
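The join-into-alternation trick carries over to other regex engines. A small Python sketch under the same sample data is shown here; the re.escape call is an addition not present in the Julia snippet, included on the assumption that real delimiters may contain regex metacharacters.

```python
import re

split_chars = ["bunch", "of", "random", "delims"]
line = "line_with_these_delims_in_the_middle"

# Escape each delimiter, then join them into one alternation: a|b|c|d.
pattern = re.compile("|".join(re.escape(delim) for delim in split_chars))

parts = pattern.split(line)

print(parts)
```

Only "delims" occurs in the sample line, so the result has two parts.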

Vim Sublist operations

I'm trying to create a script that counts how often each distinct character occurs in a selection.
e.g.
a = 4 (the character "a" is 4 times in the selection)
b = 2
e = 10
\ = 2
etc.
To obtain this, I created a list with sublist like this:
[['a', 1], ['b', 1], ['e', 1], ['\', 1]] --> etc
(a = the character // 1 = the number of times the character is found in the text)
What I don't know is:
how to search in a sublist? e.g. can I search if there is an "e" or "\" in the list?
when there is a match of "e", how can I add "1" to the number after the "e"?
[['e', 1]] --> [['e', 2]]
and how can I search in a sublist with a regex and print the result with an echo command,
e.g. search [a-f] and obtain this output:
a = 1
b = 1
e = 2
c, d, f are not found in the list and have to be skipped.
Btw... does anyone know where I can find good documentation about sublists?
(I can't find much information about sublists in the Vim docs.)

If I understand your problem correctly, the right data structure is a Dictionary mapping the character to the number of occurrences, not a list.
let occurrences = { 'a': 1, 'b': 1, 'e': 1, '\': 1 }
You can check for containment via has_key(occurrences, 'a'), and increment via let occurrences['a'] += 1. To print the results use
for char in keys(occurrences)
echo char occurrences[char] "times"
endfor
And you can use the powerful map() and filter() functions on the Dictionary. For example, to only include characters a-f:
echo filter(copy(occurrences), 'v:key =~# "[a-f]"')
Read more at :help Dictionary.
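To show the data-structure choice outside of Vim script, here is a minimal Python sketch of the same approach: a character-to-count mapping, followed by a regex filter on the keys. The sample selection string is hypothetical, chosen to reproduce the counts in the question's example.

```python
import re
from collections import Counter

# Hypothetical selection: 'a', 'b', 'e', a backslash, and a second 'e'.
selection = "abe\\e"

# Map each character to its number of occurrences.
occurrences = Counter(selection)

# Keep only the characters a-f, mirroring the filter() call on v:key.
a_to_f = {char: count for char, count in occurrences.items()
          if re.fullmatch(r"[a-f]", char)}

for char, count in a_to_f.items():
    print(char, "=", count)
```

Characters that never occur (c, d, f here) are simply absent from the mapping, so nothing has to be skipped explicitly.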

Regular expression to match three different strings

I need to write a regular expression that matches with 3 slightly different strings and extracts values out of them
Strings are as follows (excluding quotes)
1. "Beds: 3, Baths: 3"
2. "Beds: 3 - Sleeps 10, Baths: 3"
3. "Beds: 3 - 10, Baths: 3"
Values to extract, for each string:
1. 3, 0 , 3
2. 3, 10, 3
3. 3, 10, 3
I have written something like
$pattern = '/Beds: ([0-9]+).*-[ Sleeps]* ([0-9]+).* Baths: ([\.0-9]+)/';
It matches with string 2 and 3, but not with string 1.

Just extract the digits from non-digits.
\D*(\d+)\D*(\d+)?\D*(\d+)

Beds: ([0-9]+)(?:(?:.*-[ Sleeps]* ([0-9]+))|).* Baths: ([\.0-9]+)

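#!/usr/bin/perl
use strict;
use warnings;

# Read the listings file; stop with a message if it cannot be opened.
open (my $rentals, '<', 'tmp.dat') or die "Cannot open tmp.dat: $!";
while (<$rentals>){
    if (my ($beds, $sleeps, $baths) = $_ =~ m/^Beds:\s+(\d+)(?:\s+-)?\s*(?:Sleeps\s+)?(\d+)?,\s+Baths:\s+(\d+)$/){
        # The "sleeps" group is optional, so fall back to a placeholder.
        $sleeps = $sleeps ? $sleeps : "No information provided";
        print "$.:\n\tBeds:\t$beds\n\tSleeps:\t$sleeps\n\tBaths:\t$baths\n\n";
    }
    else{
        print "record $. did not match the regex:\n\t|$_|";
    }
}
close($rentals);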

check this:
'/Beds:\s(\d)[\s,][\s-].*?(\d, |)Baths:\s(\d)/'
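The common thread in these answers is making the middle part optional. A minimal Python sketch of that idea follows; the pattern and the extract helper are illustrative and not taken from any answer above, and the sketch assumes whole-number bath counts, whereas the question's pattern also allows a decimal point.

```python
import re

# The whole " - [Sleeps ]N" part is wrapped in an optional non-capturing group.
pattern = re.compile(r"Beds: (\d+)(?: - (?:Sleeps )?(\d+))?, Baths: (\d+)")

def extract(text):
    """Return (beds, sleeps, baths) as ints, with sleeps = 0 when absent."""
    match = pattern.fullmatch(text)
    if match is None:
        return None
    beds, sleeps, baths = match.groups()
    # An optional group that did not participate is None, so default it to 0.
    return int(beds), int(sleeps or 0), int(baths)

samples = [
    "Beds: 3, Baths: 3",
    "Beds: 3 - Sleeps 10, Baths: 3",
    "Beds: 3 - 10, Baths: 3",
]
results = [extract(sample) for sample in samples]

print(results)
```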
