Regular expression to match three different strings - regex

I need to write a regular expression that matches three slightly different strings and extracts values from them.
The strings are as follows (excluding quotes):
1. "Beds: 3, Baths: 3"
2. "Beds: 3 - Sleeps 10, Baths: 3"
3. "Beds: 3 - 10, Baths: 3"
The values to extract are, for
1. 3, 0, 3
2. 3, 10, 3
3. 3, 10, 3
I have written something like
$pattern = '/Beds: ([0-9]+).*-[ Sleeps]* ([0-9]+).* Baths: ([\.0-9]+)/';
It matches strings 2 and 3, but not string 1.

Just extract the runs of digits and skip the non-digits between them.
\D*(\d+)\D*(\d+)?\D*(\d+)
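As a rough illustration of this digits-only approach, here is a minimal Python sketch. Python is used only for illustration (the question itself uses PHP), and the helper name `extract` is made up for this example. It assumes a missing middle number should be reported as 0, as the question asks.

```python
import re

# Same pattern as above: three digit runs, the middle one optional.
PATTERN = re.compile(r"\D*(\d+)\D*(\d+)?\D*(\d+)")


def extract(text):
    """Return (beds, sleeps, baths) as ints, or None if nothing matches."""
    match = PATTERN.match(text)
    if match is None:
        return None
    beds, sleeps, baths = match.groups()
    # The optional middle group is None when the string has only two numbers.
    return int(beds), int(sleeps) if sleeps else 0, int(baths)


for sample in ("Beds: 3, Baths: 3",
               "Beds: 3 - Sleeps 10, Baths: 3",
               "Beds: 3 - 10, Baths: 3"):
    print(sample, "->", extract(sample))
```

Because the pattern ignores all non-digit text, it would also accept unrelated strings that happen to contain two or three numbers; anchoring on the literal labels is stricter.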

Beds: ([0-9]+)(?:(?:.*-[ Sleeps]* ([0-9]+))|).* Baths: ([\.0-9]+)

#!/usr/bin/perl
use strict;
use warnings;
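open(my $rentals, '<', 'tmp.dat') or die "Cannot open tmp.dat: $!";
while (<$rentals>) {
    # Capture beds, the optional sleeps value, and baths.
    if (my ($beds, $sleeps, $baths) = $_ =~ m/^Beds:\s+(\d+)(?:\s+-)?\s*(?:Sleeps\s+)?(\d+)?,\s+Baths:\s+(\d+)$/) {
        # $sleeps is undef when the record has no sleeps value.
        $sleeps = defined $sleeps ? $sleeps : "No information provided";
        print "$.:\n\tBeds:\t$beds\n\tSleeps:\t$sleeps\n\tBaths:\t$baths\n\n";
    }
    else {
        print "record $. did not match the regex:\n\t|$_|";
    }
}
close($rentals);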

check this:
'/Beds:\s(\d)[\s,][\s-].*?(\d, |)Baths:\s(\d)/'

Related

Regex to find repeating numbers between other numbers

I have the following array and need two Regex filters that I want to use in PowerShell.
000111
010101
220220
123456
Filter 1: the number 0 that occurs equal or more than three times.
I expect the following values after filtering
000111
010101
Filter 2: all numbers that occur equal or more than three times.
I should only see these numbers.
000111
010101
220220
With 0{3,} I can only recognize numbers in sequence, so I get only the number
000111
Is it possible to find repeating numbers that are between other numbers?
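Since you insist on seeing the solution as a regex, look at this: '(\d).*\1.*\1'
The group captures one digit, and each \1 requires that same digit to appear again later in the string, so the pattern matches any string in which some digit occurs at least three times.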
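To make the backreference idea concrete, here is a minimal sketch in Python rather than PowerShell (the regex syntax is the same for this pattern). The variable names are made up for this example.

```python
import re

values = ["000111", "010101", "220220", "123456"]

# Filter 1: the digit 0 occurs at least three times, anywhere in the string.
zero_three_times = re.compile(r"(0).*\1.*\1")
# Filter 2: any single digit occurs at least three times.
any_digit_three_times = re.compile(r"(\d).*\1.*\1")

filter1 = [v for v in values if zero_three_times.search(v)]
filter2 = [v for v in values if any_digit_three_times.search(v)]

print(filter1)
print(filter2)
```

In PowerShell the same patterns could presumably be applied with the -match operator, which filters an array when the left-hand side is a collection.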
Armali's helpful answer is short and to the point (use '(0).*\1.*\1' for filter 1), and definitely the best solution for the problem at hand, given that you only need to know in the abstract if a given string has 3 or more zeros / same digits.
The solutions below may be of interest if you need to know the specific count of 0s / digits, which, as far as I know, cannot be handled by a regex alone.
Occurrence-counting variant of filter 1:
@(
  '000111'
  '010101'
  '220220'
  '123456'
).ForEach({
  $zerosOnly = $_ -replace '[^0]'
  [pscustomobject] @{
    InputString  = $_
    CountOfZeros = $zerosOnly.Length
  }
})
That is, each string in the input array (enumerated via the intrinsic ForEach() method), has all chars. that aren't '0' ([^0]) removed via the regex-based -replace operator. The length of the resulting string is therefore equivalent to the count of zeros.
Output:
InputString CountOfZeros
----------- ------------
000111                 3
010101                 3
220220                 2
123456                 0
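The same remove-then-measure trick can be sketched in Python for comparison; this is only an illustration of the technique, and the function name `count_zeros` is an assumption of this example.

```python
import re


def count_zeros(text):
    # Delete every character that is not '0'; what remains is only zeros,
    # so its length is the number of zeros in the input.
    return len(re.sub(r"[^0]", "", text))


for value in ("000111", "010101", "220220", "123456"):
    print(value, count_zeros(value))
```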
Occurrence-counting variant of filter 2
@(
  '000111'
  '010101'
  '220220'
  '123456'
).ForEach({
  $outputObject = [pscustomobject] @{ InputString = $_; DigitCounts = [ordered] @{} }
  ([char[]] $_ | Group-Object).ForEach({
    $outputObject.DigitCounts[$_.Name] = $_.Count
  })
  $outputObject
})
That is, each input string is grouped by its characters using Group-Object, whose output objects reflect the character at hand in the .Name property and the number of members of the group (i.e. the occurrence count for that character) in the .Count property. An ordered hashtable is used to report character-occurrence-count pairs.
Output:
InputString DigitCounts
----------- -----------
000111      {[0, 3], [1, 3]}
010101      {[0, 3], [1, 3]}
220220      {[0, 2], [2, 4]}
123456      {[1, 1], [2, 1], [3, 1], [4, 1]…}
E.g., {[0, 2], [2, 4]} in the output above means that the char. '0' occurs 2 times, and '2' 4 times in input string '220220'.
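For readers more familiar with Python, a comparable per-character count can be sketched with collections.Counter. This is an illustrative equivalent, not a translation of the PowerShell code; `digit_counts` is a made-up helper name.

```python
from collections import Counter


def digit_counts(text):
    # Count each character, then sort by character so the result reads
    # in the same order as the grouped output above.
    return dict(sorted(Counter(text).items()))


for value in ("000111", "010101", "220220", "123456"):
    print(value, digit_counts(value))
```

Filter 2 from the question then amounts to checking whether any count is 3 or more, e.g. `any(n >= 3 for n in digit_counts(value).values())`.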

Filter a string using regular expression

I tried the following code. However, the result is not what I want.
$strLine = "100.11 Q9"
$sortString = StringRegExp ($strLine,'([0-9\.]{1,7})', $STR_REGEXPARRAYMATCH)
MsgBox(0, "", $sortString[0],2)
The output shows 100.11, but I want 100.11 9. How could I display it this way using a regular expression?
$sPattern = "([0-9\.]+)\sQ(\d+)"
$strLine = "100.11 Q9"
$sortString = StringRegExpReplace($strLine, $sPattern, '\1 \2')
MsgBox(0, "$sortString", $sortString, 2)
$strLine = "100.11 Q9"
$sortString = StringRegExp($strLine, $sPattern, 3); array of global matches.
For $i1 = 0 To UBound($sortString) -1
MsgBox(0, "$sortString[" & $i1 & "]", $sortString[$i1], 2)
Next
The pattern captures 2 groups: 100.11 and 9.
It first matches the group of digits and dots until it reaches
\s, which matches the space. It then matches the Q. The 2nd group
matches the remaining digits.
StringRegExpReplace replaces the whole string with the 1st and 2nd groups
separated by a space.
StringRegExp returns the 2 groups as 2 array elements.
Choose whichever of the 2 approaches above you prefer.
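The two approaches can be compared side by side in a short Python sketch (Python is used here only to illustrate the pattern; it is not AutoIt code, and the variable names are assumptions of this example).

```python
import re

# Same idea as $sPattern above: a number with dots, a space, a Q, then digits.
PATTERN = re.compile(r"([0-9.]+)\sQ(\d+)")
line = "100.11 Q9"

# Approach 1: rewrite the string, keeping only the two groups.
replaced = PATTERN.sub(r"\1 \2", line)

# Approach 2: pull the two groups out as separate values.
groups = PATTERN.search(line).groups()

print(replaced)
print(groups)
```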

VIM padding with appropriate number of ",0" to get CSV file

I have a file containing numbers like
1, 2, 3
4, 5
6, 7, 8, 9,10,11
12,13,14,15,16
...
I want to create a CSV file by padding each line such that there are 6 values separated by 5 commas, so I need to add to each line an appropriate number of ",0". It shall look like
1, 2, 3, 0, 0, 0
4, 5, 0, 0, 0, 0
6, 7, 8, 9,10,11
12,13,14,15,16, 0
...
How would I do this with VIM?
Can I count the number of "," in a line with regular expressions and add the correct number of ",0" to each line with the substitute s command?
You can achieve that by typing this command:
:g/^/ s/^.*$/&,0,0,0,0,0,0/ | normal! 6f,D
You can add six zeros in all lines first, irrespective of how many numbers they have and then, you can delete everything from sixth comma till end in every line.
To insert them,
:1,$ normal! A,0,0,0,0,0,0
To delete from sixth comma till end,
:1,$normal! ^6f,D
^ moves to the first non-blank character in the line (which is a number here)
6f, moves to the sixth comma
D deletes from the cursor to the end of the line
Example:
Original
1,2
3,6,7,0,0,0
4,5,6
11,12,13
After adding six zeroes,
1,2,0,0,0,0,0,0
3,6,7,0,0,0,0,0,0,0,0,0
4,5,6,0,0,0,0,0,0
11,12,13,0,0,0,0,0,0
After removing from the sixth comma to the end of the line,
1,2,0,0,0,0
3,6,7,0,0,0
4,5,6,0,0,0
11,12,13,0,0,0
With perl:
perl -lpe '$_ .= ",0" x (5 - tr/,//)' file.txt
With awk:
awk -v FS=, -v OFS=, '{ for(i = NF+1; i <= 6; i++) $i = 0 } 1' file.txt
With sed:
sed ':b /^\([^,]*,\)\{5\}/ b; { s/$/,0/; b b }' file.txt
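The counting logic behind these one-liners (append one ",0" for every comma missing from the five required) can be sketched in Python as follows. The function name `pad_line` and the `fields` parameter are made up for this illustration.

```python
def pad_line(line, fields=6):
    # A line with `fields` values contains `fields - 1` commas;
    # append one ",0" for each comma that is missing.
    missing = (fields - 1) - line.count(",")
    return line + ",0" * max(missing, 0)


for line in ("1, 2, 3", "4, 5", "6, 7, 8, 9,10,11", "12,13,14,15,16"):
    print(pad_line(line))
```

Lines that already have six or more values are returned unchanged, and the appended zeros are not space-padded, so the column alignment from the question is not reproduced.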
As far as how to do this from inside Vim, you can also pipe text through external programs and it will replace the input with the output. That's an easy way to leverage sorting, deduping, grep-based filtering, etc, or some of Sato's suggestions. So, if you have a script called standardize_commas.py, try selecting your block with visual line mode (shift+v then select), and then typing something like :! python /tmp/standardize_commas.py. It should prepend a little bit to that string indicating that the command will run on the currently selected lines.
FYI, this was my /tmp/standardize_commas.py script:
import sys

max_width = 0
rows = []
# Read every line from stdin and remember the widest row.
for line in sys.stdin:
    line = line.strip()
    existing_vals = line.split(",")
    rows.append(existing_vals)
    max_width = max(max_width, len(existing_vals))
# Pad each row with "0" fields up to the widest row, then print it.
for row in rows:
    zeros_needed = max_width - len(row)
    full_values = row + ["0"] * zeros_needed
    print(",".join(full_values))

How to print a Perl character class?

I was in a code review this morning and came across a bit of code that was wrong, but I couldn't tell why.
$line =~ /^[1-C]/;
This line was supposed to match a hex character between 1 and C, but I assume it does not do that. The question is not what it should match, but what does it actually match? Can I print out all the characters in a character class? Something like the following?
say join(', ', [1-C]);
Alas,
# Examples:
say join(', ', 1..9);
say join(', ', 'A'..'C');
say join(', ', 1..'C');
# Output
Argument "C" isn't numeric in range (or flop) at X:\developers\PERL\Test.pl line 33.
1, 2, 3, 4, 5, 6, 7, 8, 9
A, B, C
It matches every code point from U+0031 ("1") to U+0043 ("C").
The simple answer is to use
map chr, ord("1")..ord("C")
instead of
"1".."C"
as you can see in the following demonstration:
$ perl -Mcharnames=:full -E'
say sprintf " %s U+%05X %s", chr($_), $_, charnames::viacode($_)
for ord("1")..ord("C");
'
1 U+00031 DIGIT ONE
2 U+00032 DIGIT TWO
3 U+00033 DIGIT THREE
4 U+00034 DIGIT FOUR
5 U+00035 DIGIT FIVE
6 U+00036 DIGIT SIX
7 U+00037 DIGIT SEVEN
8 U+00038 DIGIT EIGHT
9 U+00039 DIGIT NINE
: U+0003A COLON
; U+0003B SEMICOLON
< U+0003C LESS-THAN SIGN
= U+0003D EQUALS SIGN
> U+0003E GREATER-THAN SIGN
? U+0003F QUESTION MARK
@ U+00040 COMMERCIAL AT
A U+00041 LATIN CAPITAL LETTER A
B U+00042 LATIN CAPITAL LETTER B
C U+00043 LATIN CAPITAL LETTER C
If you have Unicode::Tussle installed, you can get the same output from the following shell command:
unichars -au '[1-C]'
You might be interested in wasting time browsing the Unicode code charts. (This particular range is covered by "Basic Latin (ASCII)".)
This is a simple program to test the range of that regex:
use strict;
use warnings;
use Test::More qw(no_plan);
for(my $i=ord('1'); $i<=ord('C'); $i++ ) {
my $char = chr($i);
ok $char =~ /^[1-C]/, "match: $char";
}
It generates this result:
ok 1 - match: 1
ok 2 - match: 2
ok 3 - match: 3
ok 4 - match: 4
ok 5 - match: 5
ok 6 - match: 6
ok 7 - match: 7
ok 8 - match: 8
ok 9 - match: 9
ok 10 - match: :
ok 11 - match: ;
ok 12 - match: <
ok 13 - match: =
ok 14 - match: >
ok 15 - match: ?
ok 16 - match: @
ok 17 - match: A
ok 18 - match: B
ok 19 - match: C
1..19
[1-9A-C] is the character class that matches a hex digit between 1 and C.
[one char-another char] matches all the characters between the two characters (inclusive) in the Unicode table.
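The difference between the two classes can also be checked with a short Python sketch. Python's `re` module treats a range in a character class as a span of code points too, so the comparison carries over; the variable names are assumptions of this example.

```python
import re
import unicodedata

# Every code point covered by the range 1-C.
chars = [chr(cp) for cp in range(ord("1"), ord("C") + 1)]

loose = re.compile(r"[1-C]")      # the class from the code review
strict = re.compile(r"[1-9A-C]")  # digits 1-9 and letters A-C only

for ch in chars:
    label = "hex digit" if strict.fullmatch(ch) else "NOT a hex digit"
    print(f"{ch} U+{ord(ch):04X} {unicodedata.name(ch)}: {label}")

# Characters that [1-C] accepts but [1-9A-C] rejects.
extras = [ch for ch in chars if loose.fullmatch(ch) and not strict.fullmatch(ch)]
print(extras)
```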

regex for position matching with OR condition

Newbie to regex and looking for help in creating regexp to seek out following:
The data items consists of six character strings as shown in example below
1) "100100"
2) "110011"
3) "010000"
4) "110011"
5) "111111"
6) "000111"
Need to use regexp to find data with say
1 in the 1st position OR 1 in the 4th position: Items 1, 2, 4, 5 and 6 should be matched
1 in 2nd position: Items 2, 4 and 5 should be matched
1 in 5th and 6th position: Items 2, 4, 5 and 6 should be matched
Given your samples, these will work:
1 in the 1st position OR 1 in the 4th position: Items 1, 2, 4, 5 and 6 should be matched
1.....|...1..
1 in 2nd position: Items 2, 4 and 5 should be matched
.1....
1 in 5th and 6th position: Items 2, 4, 5 and 6 should be matched
....11
Or if you want to match any of these rules, combine them with the | (or) operator.
Example:
http://regexpal.com/?flags=g&regex=(1.....%7C...1...%7C.1....%7C....11)&input=100100%0A%0A110011%0A%0A010000%0A%0A110011%0A%0A111111%0A%0A000111
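A quick way to try positional patterns like these against the sample data is the following Python sketch. It assumes every input is exactly six characters long, so the second alternative of the first rule is written as `...1..` (three characters, a 1, two characters). The names `rules` and `matching_items` are made up for this example.

```python
import re

items = ["100100", "110011", "010000", "110011", "111111", "000111"]

rules = {
    "1st or 4th position": r"1.....|...1..",
    "2nd position": r".1....",
    "5th and 6th position": r"....11",
}


def matching_items(pattern):
    # fullmatch requires the whole six-character string to fit the pattern.
    regex = re.compile(pattern)
    return [number for number, text in enumerate(items, start=1)
            if regex.fullmatch(text)]


for name, pattern in rules.items():
    print(name, matching_items(pattern))
```

Note that item 3 ("010000") has a 1 in the 2nd position, so it is reported by the second rule as well.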
If it is always strings with only 1s and 0s, you should treat them as binary numbers and use logical operators to find the matches.
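A minimal sketch of that binary approach, in Python, might look like this. It assumes the 1st position is the most significant bit of a six-bit number; the mask constants and helper names are assumptions of this example.

```python
items = ["100100", "110011", "010000", "110011", "111111", "000111"]

# Masks written in the same left-to-right order as the strings.
FIRST_OR_FOURTH = 0b100100
SECOND = 0b010000
FIFTH_AND_SIXTH = 0b000011


def has_any(bits, mask):
    # OR condition: at least one of the masked positions is 1.
    return (int(bits, 2) & mask) != 0


def has_all(bits, mask):
    # AND condition: every masked position is 1.
    return (int(bits, 2) & mask) == mask


def select(test, mask):
    return [number for number, bits in enumerate(items, start=1)
            if test(bits, mask)]


print(select(has_any, FIRST_OR_FOURTH))
print(select(has_all, SECOND))
print(select(has_all, FIFTH_AND_SIXTH))
```

The OR rule uses `has_any` and the AND rule uses `has_all`; for a single-bit mask the two are equivalent.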
Try this regex
([1][0-1]{2}[1][0-1]{2})|([0-1][1][0-1]{4})|([0-1]{4}[1]{2})
Find the explanation and demo here http://www.regex101.com/r/vD9jE7
Here's an example. Change dots with zeros if necessary. /^(11..|.1.1)11$/
^ # beginning of string
( # either
11.. # 11 and any 2 char
| # or
.1.1 # any char, 1, any char, 1
)
11
$ # end of string