Regex to find repeating numbers between other numbers - regex

I have the following array and need two Regex filters that I want to use in PowerShell.
000111
010101
220220
123456
Filter 1: values in which the digit 0 occurs three or more times.
I expect the following values after filtering
000111
010101
Filter 2: values in which any single digit occurs three or more times.
I should only see these numbers.
000111
010101
220220
With 0{3,} I can only match consecutive zeros, so I only get the value
000111
Is it possible to find repeating numbers that are between other numbers?

Since you insist to see the solution in regex, look at this: '(\d).*\1.*\1'
I think this is comprehensible without further explanation, isn't it?
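For readers who do want a little explanation: the pattern captures one digit and then requires the same digit twice more via the backreference \1, with .* allowing any characters in between. The sketch below applies the same two patterns in Python rather than PowerShell, purely as an illustration; the variable names are made up for this example.

```python
import re

values = ["000111", "010101", "220220", "123456"]

# Filter 1: the digit 0 appears at least three times, anywhere in the string.
three_zeros = re.compile(r"(0).*\1.*\1")
# Filter 2: any single digit appears at least three times.
three_same = re.compile(r"(\d).*\1.*\1")

filter1 = [v for v in values if three_zeros.search(v)]
filter2 = [v for v in values if three_same.search(v)]

print(filter1)  # ['000111', '010101']
print(filter2)  # ['000111', '010101', '220220']
```

In PowerShell the equivalent test would be the -match operator with the same pattern strings.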

Armali's helpful answer is short and to the point (use '(0).*\1.*\1' for filter 1), and definitely the best solution for the problem at hand, given that you only need to know in the abstract if a given string has 3 or more zeros / same digits.
The solutions below may be of interest if you need to know the specific count of 0s / digits, which, as far as I know, cannot be handled by a regex alone.
Occurrence-counting variant of filter 1:
@(
  '000111'
  '010101'
  '220220'
  '123456'
).ForEach({
  # Remove every character that is not '0'; what remains is one '0' per occurrence.
  $zerosOnly = $_ -replace '[^0]'
  [pscustomobject] @{
    InputString  = $_
    CountOfZeros = $zerosOnly.Length
  }
})
That is, each string in the input array (enumerated via the intrinsic ForEach() method), has all chars. that aren't '0' ([^0]) removed via the regex-based -replace operator. The length of the resulting string is therefore equivalent to the count of zeros.
Output:
InputString CountOfZeros
----------- ------------
000111                 3
010101                 3
220220                 2
123456                 0
Occurrence-counting variant of filter 2
@(
  '000111'
  '010101'
  '220220'
  '123456'
).ForEach({
  $outputObject = [pscustomobject] @{ InputString = $_; DigitCounts = [ordered] @{} }
  # Group the string's characters; each group's .Count is that character's occurrence count.
  ([char[]] $_ | Group-Object).ForEach({
    $outputObject.DigitCounts[$_.Name] = $_.Count
  })
  $outputObject
})
That is, each input string is grouped by its characters using Group-Object, whose output objects reflect the character at hand in the .Name property and the number of members of the group (i.e. the occurrence count for that character) in the .Count property. An ordered hashtable is used to report character-occurrence-count pairs.
Output:
InputString DigitCounts
----------- -----------
000111 {[0, 3], [1, 3]}
010101 {[0, 3], [1, 3]}
220220 {[0, 2], [2, 4]}
123456 {[1, 1], [2, 1], [3, 1], [4, 1]…}
E.g., {[0, 2], [2, 4]} in the output above means that the char. '0' occurs 2 times, and '2' 4 times in input string '220220'.
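Once the per-digit counts are available, both original filters reduce to simple comparisons on those counts. The following is a minimal sketch of that idea in Python (not PowerShell), using collections.Counter in place of Group-Object; the variable names are assumptions made for this example.

```python
from collections import Counter

values = ["000111", "010101", "220220", "123456"]

# Map each input string to its per-character occurrence counts.
digit_counts = {v: Counter(v) for v in values}

# Filter 1: at least three zeros (a Counter returns 0 for a missing key).
filter1 = [v for v in values if digit_counts[v]["0"] >= 3]
# Filter 2: at least one digit occurs three or more times.
filter2 = [v for v in values if max(digit_counts[v].values()) >= 3]

print(dict(digit_counts["220220"]))  # {'2': 4, '0': 2}
print(filter1)  # ['000111', '010101']
print(filter2)  # ['000111', '010101', '220220']
```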

Related

How can I get a number between two words with REGEXP_SUBSTR?

I would like to retrieve a sequence of numbers in between two strings. The text may have other numbers and I only want to get the sequence in between 'item ' and ' n' (first occurrence). Also, the length of the sequence can vary.
The following is what I have tried:
SELECT REGEXP_SUBSTR(clob_text, 'item ([0-9]+?) n') AS my_number FROM my_table WHERE something = something;
However it returns the value "item 123456789 n", and I want only the number value.
I have also tried the regex '\item ([0-9]+?) \n' which returns the same, and '(?=item )([0-9]+?)(?= n)' and '\item /([0-9]+?)/ \n', that returns nothing.
Finally, I tried nesting the expressions, which worked but is not ideal:
SELECT REGEXP_SUBSTR(REGEXP_SUBSTR(clob_text, '\item ([0-9]+?) \n'), '[0-9]+') FROM ...
How can I remove these unwanted characters so the result would be only "123456789" with one expression only?
Example input:
'Somdasdas dasd sdaisdjas asod dasdhjs 1564, dasdohndsdias sdasdasdasdasds,
ddissd ksdnas skid as 5645 sdnaslndas, ndsadn ndasknd dnsd: sdas 5465 asdasd
dnaskldnas ojsd (dasdksdas) asdklhasdas dsd. isdjasdsdpoojs asdasdasdasdsad
46564 iasdonsoi sdjosd kjlsdk kkpnasd item 12345879 não-existente da lista 14
sdasdnsd jdspka 2564 sadasds.'
Expected output:
'12345879'
Here is the regular expression for your problem:
REGEXP_SUBSTR('<your string>', '\item ([0-9]*?) \n', 1, 1, null, 1)
Usage:
*? Matches zero or more occurrences of the preceding pattern (non-greedy).
( ) Used to group expressions as a subexpression.
[0-9] Matches any digit.
Query with actual data and output:
SELECT
REGEXP_SUBSTR('Somdasdas dasd sdaisdjas asod dasdhjs 1564, dasdohndsdias sdasdasdasdasds, ddissd ksdnas skid as 5645 sdnaslndas, ndsadn ndasknd dnsd: sdas 5465 asdasd dnaskldnas ojsd (dasdksdas) asdklhasdas dsd. isdjasdsdpoojs asdasdasdasdsad 46564 iasdonsoi sdjosd kjlsdk kkpnasd item 12345879 não-existente da lista 14 sdasdnsd jdspka 2564 sadasds'
, '\item ([0-9]*?) \n', 1, 1, null, 1) as MY_STRING
FROM
DUAL;
Output:
MY_STRIN
--------
12345879
Cheers!!
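The key point in the query above is the last argument of REGEXP_SUBSTR, which asks for the first parenthesised subexpression instead of the whole match. The same idea, returning a capture group rather than the full match, can be sketched in Python as follows. This is only an illustration of the concept, not Oracle code, and the sample text is a shortened, hypothetical version of the input from the question.

```python
import re

# Shortened sample text (an assumption for this sketch); it contains other numbers too.
text = "dasdhjs 1564, sdas 5465 asdasd kkpnasd item 12345879 não-existente da lista 14"

# The parentheses mark the part to return; 'item ' and ' n' only anchor the match.
match = re.search(r"item ([0-9]+) n", text)
my_number = match.group(1) if match else None

print(my_number)  # 12345879
```

Because the anchors are part of the pattern but outside the group, the other numbers in the text (1564, 5465, 14) are ignored.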
When dealing with regular expressions, I have a tendency to try and make things way too complicated. In this case, the simple solution is
REGEXP_SUBSTR('item 123456789 n', '[0-9]+')
Assuming that you are always looking for a single number embedded between 2 strings
regexp_substr('item 123456789 n','[^0-9]+([0-9]+)[^0-9]+',1,1,null,1)
-- [^0-9]+  : anything other than a digit, occurring one or more times
-- ([0-9]+) : one or more digits (the captured subexpression)
-- final 1  : return the first subexpression match

How get all numerical values from a list except numbers relating to certain Strings

I want to get all the numbers from my String except for the numbers that are related to the String pattern 'SPN'
def layoutStr = '1 ABC, 2 DEF, 3 SPN, 4 GHI'
def splitted = layoutStr.split(',')
*.trim() // remove white space from all the entries (note *)
*.dropWhile { it ==~ /[^0-9 ]/ } // drop until you hit a char that isn't a letter or a space in the list
.findAll { it[0] != 'SPN' } // if a group starts with SPN, drop it
assert splitted == [1, 2, 4]
This doesn't seem to do what I expect it to do, I think I am missing the re-collecting step
You can use findResults which only collects elements that aren't null, so you can use it to filter AND transform at the same time:
def layoutStr = '1 ABC, 2 DEF, 3 SPN, 4 GHI'
def splitted = layoutStr.split(',')
*.trim() // remove white space from all the entries (note *)
*.split(/\s+/) // Split all the entries on whitespace
.findResults { it[1] == 'SPN' ? null : it[0] as Integer }
assert splitted == [1, 2, 4]
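The filter-and-transform-in-one-pass idea behind findResults is not specific to Groovy. As a rough comparison, here is the same logic sketched in Python with a list comprehension; the names are invented for this example, and it assumes every entry has exactly a number and a label.

```python
layout_str = "1 ABC, 2 DEF, 3 SPN, 4 GHI"

# Split on commas, split each entry on whitespace into (number, label),
# drop the entries labelled 'SPN', and convert the rest to integers.
numbers = [
    int(number)
    for number, label in (entry.split() for entry in layout_str.split(","))
    if label != "SPN"
]

print(numbers)  # [1, 2, 4]
```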

Split line based on regex in Julia

I'm interested in splitting a line using a regular expression in Julia. My input is a corpus in Blei's LDA-C format consisting of docId wordID:wordCNT. For example, a document with five words is represented as follows:
186 0:1 12:1 15:2 3:1 4:1
I'm looking for a way to aggregate words and their counts into separate arrays, i.e. my desired output:
words = [0, 12, 15, 3, 4]
counts = [1, 1, 2, 1, 1]
I've tried using m = match(r"(\d+):(\d+)",line). However, it only finds the first pair 0:1. I'm looking for something similar to Python's re.compile(r'[ :]').split(line). How would I split a line based on regex in Julia?
There's no need to use regex here; Julia's split function allows using multiple characters to define where the splits should occur:
julia> v = split(line, [':',' '])
11-element Array{SubString{String},1}:
"186"
"0"
"1"
"12"
"1"
"15"
"2"
"3"
"1"
"4"
"1"
julia> words = v[2:2:end]
5-element Array{SubString{String},1}:
"0"
"12"
"15"
"3"
"4"
julia> counts = v[3:2:end]
5-element Array{SubString{String},1}:
"1"
"1"
"2"
"1"
"1"
I discovered the eachmatch method that returns an iterator over the regex matches. An alternative solution is to iterate over each match:
words, counts = Int64[], Int64[]
for m in eachmatch(r"(\d+):(\d+)", line)
    wd, cnt = m.captures
    push!(words, parse(Int64, wd))
    push!(counts, parse(Int64, cnt))
end
As Matt B. mentions, there's no need for a Regex here as the Julia lib split() can use an array of chars.
However - when there is a need for Regex - the same split() function just works, similar to what others suggest here:
line = "186 0:1 12:1 15:2 3:1 4:1"
s = split(line, r":| ")
words = s[2:2:end]
counts = s[3:2:end]
I've recently had to do exactly that in some Unicode processing code (where the split characters were "combined characters", and thus not something that can fit in Julia's single-quoted character literals), meaning:
split_chars = ["bunch","of","random","delims"]
line = "line_with_these_delims_in_the_middle"
r_split = Regex( join(split_chars, "|") )
split( line, r_split )
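Since the question compares against Python's re.split, here is a small Python sketch of both ideas above: splitting the LDA-C line on a character class, and building a split pattern from a list of multi-character delimiters. The delimiter list and the sample string are hypothetical. One difference from the Julia snippet is that the delimiters are escaped before joining; without escaping, a delimiter containing a regex metacharacter such as | or . would change the meaning of the pattern.

```python
import re

line = "186 0:1 12:1 15:2 3:1 4:1"

# Split on either a space or a colon, then de-interleave the tokens.
tokens = re.split(r"[ :]", line)
doc_id = int(tokens[0])
words = [int(t) for t in tokens[1::2]]
counts = [int(t) for t in tokens[2::2]]

print(words)   # [0, 12, 15, 3, 4]
print(counts)  # [1, 1, 2, 1, 1]

# Multi-character delimiters (hypothetical): escape each one, then join with '|'.
delims = ["<->", "||"]
pattern = "|".join(re.escape(d) for d in delims)
parts = re.split(pattern, "a<->b||c")

print(parts)  # ['a', 'b', 'c']
```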

Vim Sublist operations

I'm trying to create a script that counts how often each character occurs in a selection.
e.g.
a = 4 (the character "a" occurs 4 times in the selection)
b = 2
e = 10
\ = 2
etc.
To obtain this, I created a list with sublist like this:
[['a', 1], ['b', 1], ['e', 1], ['\', 1]] --> etc
(a = the character // 1 = the number of times the character is found in the text)
What I don't know is:
how do I search in a sublist? e.g. can I check whether there is an "e" or "\" in the list?
when "e" is found, how can I add 1 to the number after the "e"?
[['e', 1]] --> [['e', 2]]
and how can I search in a sublist with a regex and print the result with an echo command?
e.g. search for [a-f] and obtain this output:
a = 1
b = 1
e = 2
c, d and f are not found in the list and have to be skipped.
Btw...does anyone know where I can find a good documentation about sublists?
(I can't find much information about sublists in the vim docs).
If I understand your problem correctly, the right data structure is a Dictionary mapping the character to the number of occurrences, not a list.
let occurrences = { 'a': 1, 'b': 1, 'e': 1, '\': 1 }
You can check for containment via has_key(occurrences, 'a'), and increment via let occurrences['a'] += 1. To print the results, use
for char in keys(occurrences)
    echo char occurrences[char] "times"
endfor
And you can use the powerful map() and filter() functions on the Dictionary. For example, to only include characters a-f:
echo filter(copy(occurrences), 'v:key =~# "[a-f]"')
Read more at :help Dictionary.
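To make the dictionary approach concrete outside of Vim script, the sketch below shows the same three steps in Python: count characters into a mapping, keep only the keys matching [a-f], and print them. The selection string is a made-up example, chosen so that the result matches the output the question asks for.

```python
import re
from collections import Counter

# Hypothetical selection text: a, b, e, a backslash, and e again.
selection = "abe\\e"

# Count every character; a missing character simply has no entry.
occurrences = Counter(selection)

# Keep only the characters that match the regex [a-f].
in_range = {ch: n for ch, n in occurrences.items() if re.fullmatch(r"[a-f]", ch)}

for ch in sorted(in_range):
    print(f"{ch} = {in_range[ch]}")
# a = 1
# b = 1
# e = 2
```

Characters such as c, d and f never appear as keys, so they are skipped without any extra check.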

Regular expression to match three different strings

I need to write a regular expression that matches with 3 slightly different strings and extracts values out of them
Strings are as follows (excluding quotes)
1. "Beds: 3, Baths: 3"
2. "Beds: 3 - Sleeps 10, Baths: 3"
3. "Beds: 3 - 10, Baths: 3"
Values to extract for each string:
1. 3, 0 , 3
2. 3, 10, 3
3. 3, 10, 3
I have written something like
$pattern = '/Beds: ([0-9]+).*-[ Sleeps]* ([0-9]+).* Baths: ([\.0-9]+)/';
It matches with string 2 and 3, but not with string 1.
Just extract the digits from non-digits.
\D*(\d+)\D*(\d+)?\D*(\d+)
Beds: ([0-9]+)(?:(?:.*-[ Sleeps]* ([0-9]+))|).* Baths: ([\.0-9]+)
#!/usr/bin/perl
use strict;
use warnings;

open(my $rentals, '<', 'tmp.dat') or die "Cannot open tmp.dat: $!";
while (<$rentals>) {
    if (my ($beds, $sleeps, $baths) = $_ =~ m/^Beds:\s+(\d+)(?:\s+-)?\s*(?:Sleeps\s+)?(\d+)?,\s+Baths:\s+(\d+)$/) {
        # The middle group is optional, so it is undef when no "Sleeps" value is present.
        $sleeps = defined $sleeps ? $sleeps : "No information provided";
        print "$.:\n\tBeds:\t$beds\n\tSleeps:\t$sleeps\n\tBaths:\t$baths\n\n";
    }
    else {
        print "record $. did not match the regex:\n\t|$_|";
    }
}
close($rentals);
check this:
'/Beds:\s(\d)[\s,][\s-].*?(\d, |)Baths:\s(\d)/'
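The common thread in the answers above is to make the middle "Sleeps" part optional. Here is a minimal Python sketch of that approach, checked against the three sample strings from the question. The pattern is a simplified assumption of my own, not one of the patterns posted above: it expects whole numbers only and exactly the spacing shown in the samples, and it reports 0 when the optional value is missing.

```python
import re

# The whole " - [Sleeps ]N" part is wrapped in an optional non-capturing group.
pattern = re.compile(r"Beds: (\d+)(?: - (?:Sleeps )?(\d+))?, Baths: (\d+)")

samples = [
    "Beds: 3, Baths: 3",
    "Beds: 3 - Sleeps 10, Baths: 3",
    "Beds: 3 - 10, Baths: 3",
]


def extract(text):
    """Return (beds, sleeps, baths) as integers, or None if the text does not match."""
    m = pattern.fullmatch(text)
    if m is None:
        return None
    # An unmatched optional group is replaced by the default "0".
    beds, sleeps, baths = m.groups(default="0")
    return int(beds), int(sleeps), int(baths)


results = [extract(s) for s in samples]
print(results)  # [(3, 0, 3), (3, 10, 3), (3, 10, 3)]
```

The same pattern string should also work with PHP's preg_match once it is wrapped in delimiters, although the unmatched group is reported differently there.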