match regexp against variable string in perl/sed/awk

I have this expression:
XX h, YY min, ZZ s,
XX, YY and ZZ can each be 1 or 2 digits. Also, "XX h," or "XX h, YY min," may be absent. Can anyone recommend a Perl or sed expression to extract XX, YY and ZZ?
I've tried some capture-group regexps with no luck.
Thanks.
EDIT:
example1: 12 h, 23 min, 2 s,
output1: 12 23 2
example2: 3 min, 59 s,
output2: 3 59

echo "12 h, 3 min, 56 s," | tr -cd "0-9 "
Output:
12 3 56
echo "12 h, 3 min, 56 s," | tr "," "\n" | awk '/h/ {print $1}'
12
echo "12 h, 3 min, 56 s," | tr "," "\n" | awk '/min/ {print $1}'
3
echo "12 h, 3 min, 56 s," | tr "," "\n" | awk '/s/ {print $1}'
56

Let's talk about Perl regex. Let's assume you need to be able to extract the following substrings:
12 h, 54 min, 11 s, # you have a trailing comma in your example
1 h, 54 min, 11 s,
54 min, 11 s,
4 min, 11 s,
55 s,
and so on. We will need some building blocks:
\d: any digit
?: when appended to something (a character, a meta-character like \d or a group in brackets), make it optional
( ): brackets for grouping and extracting values into $1, $2, etc.
(?: ): brackets for grouping without extracting
The seconds part will be \d\d? s,.
After adding minutes that can be optional, we'll get (?:\d\d? min, )?\d\d? s,.
After adding hours (also optional), we'll get (?:(?:\d\d? h, )?\d\d? min, )?\d\d? s,.
Now we'll put brackets around all this stuff to capture the match into $1, and we finally get the regex:
/((?:(?:\d\d? h, )?\d\d? min, )?\d\d? s,)/
Oh, and is the trailing comma also optional? Just add ? after it.
If you need the values for h, min, and s, put each \d\d? into a pair of brackets and check $2, $3 and $4:
/((?:(?:(\d\d?) h, )?(\d\d?) min, )?(\d\d?) s,)/
This is not the easiest possible regex for this task but I just wanted to show how you can build them starting from something very simple and then adding more complex things to it.
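The same build-up can be sketched in Python (chosen here only for illustration; the answer itself is about Perl). The names `pattern` and `extract` are hypothetical. The sketch assumes that the separating space belongs inside each optional part, so input without hours still matches from its first digit, and that the trailing comma is optional.

```python
import re

# Building blocks, smallest first; each one captures its number.
seconds = r"(\d\d?) s,?"    # trailing comma made optional
minutes = r"(\d\d?) min, "
hours = r"(\d\d?) h, "

# Hours are optional inside the (also optional) minutes part.
pattern = re.compile(rf"(?:(?:{hours})?{minutes})?{seconds}")


def extract(text):
    """Return the numbers that are present, in h/min/s order."""
    m = pattern.search(text)
    if m is None:
        return []
    # Groups for missing parts are None and are skipped.
    return [int(g) for g in m.groups() if g is not None]


print(extract("12 h, 23 min, 2 s,"))
print(extract("3 min, 59 s,"))
```

Groups that did not take part in the match come back as `None`, which is how the missing hours or minutes are filtered out.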

Try this (Perl):
my @matches = "1 h, 30 min, 15 s" =~ /(\d{1,2}) [hms]/g;
Or a bit stricter:
my @matches = "1 h, 30 min, 15 s" =~ /(\d{1,2}) (?:h|min|s)/g;
if (scalar @matches == 3) {
    my ($h, $mi, $s) = @matches;
    print "$h : $mi : $s\n";
}

Related

How to match variable length number range 0 to $n using perl regex?

I need to match a numeric range from 0 to a number $n where $n can be any random number from 1 - 40.
For example,
if $n = 16, I need to strictly match only the numeric range from 0-16.
I tried m/([0-9]|[1-3][0-9]|40)/ but that is matching all 0-40. Is there a way to use regex to match from 0 to $n ?
The code snippet is attached for context.
$n = getNumber(); #getNumber() returns a random number from 1 to 40.
$answer = getAnswer(); #getAnswer() returns a user input.
#Check whether user enters an integer between 0 and $n
if ($answer =~ m/regex/){
print("Answer is an integer within specified range!\n");
}
I know I can probably do something like
if($answer >= 0 && $answer <=$n)
But I am just wondering if there is a regex way of doing it?

I wouldn't pull out the following trick if there's another reasonable way to solve the problem. There is, for instance, Matching Numeric Ranges with a Regular Expression.
The (?(...)true|false) construct is like a regex conditional operator, and you can use one of the regex verbs, (*FAIL), to always fail a subpattern.
For the condition itself, you can use a (?{...}) code block:
use v5.10;  # for say
my $pattern = qr/
    \b                          # anchor somehow
    (\d++)                      # non-backtracking and greedy
    (?(?{ $1 > 42 })(*FAIL))    # fail when the number is above the limit
/x;
my @numbers = map { int( rand(100) ) } 0 .. 10;
say "@numbers";
foreach my $n ( @numbers ) {
    next unless $n =~ $pattern;
    say "Matched $n";
}
Here's a run:
74 69 24 15 23 26 62 18 18 43 80
Matched 24
Matched 15
Matched 23
Matched 26
Matched 18
Matched 18
This is handy when the condition is complex.
I only think about this because it's an encouraged feature in Raku (and I have several examples in Learning Perl 6). Here's some Raku code in the same form, and the pattern syntax is significantly different:
#!raku
my $numbers = map { 100.rand.Int }, 0 .. 20;
say $numbers;
for @$numbers -> $n {
    next unless $n ~~ / (<|w> \d+: <?{ $/ <= 42 }>) /;
    say $n
}
The result is the same:
(67 43 31 41 89 14 52 71 48 64 5 21 6 31 44 27 39 94 78 15 39)
31
41
14
5
21
6
31
27
39
15
39

You can dynamically create the pattern. I've used a non-capture group (?:) here to keep the start and end of string anchors outside the list of |-ed numbers.
my $n = int rand 40;
my $answer = 42;
my $pattern = join '|', 0 .. $n;
if ($answer =~ m/^(?:$pattern)$/) {
print "Answer is an integer within specified range";
}
Please keep in mind that for your purpose this makes little sense.
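As a rough illustration of the generated-alternation idea, here is a Python sketch (Python is an assumption of this example; the answer uses Perl). The helper names `range_pattern` and `in_range` are hypothetical. `fullmatch` plays the role of the ^ and $ anchors around the non-capture group.

```python
import re


def range_pattern(n):
    """Compile a regex whose alternatives are the integers 0..n."""
    alternatives = "|".join(str(i) for i in range(n + 1))
    return re.compile(rf"(?:{alternatives})")


def in_range(answer, n):
    """True if the string `answer` is one of the integers 0..n."""
    # fullmatch anchors the alternation at both ends of the string.
    return range_pattern(n).fullmatch(answer) is not None


print(in_range("16", 16))
print(in_range("17", 16))
```

Because only the canonical spellings are listed, inputs such as "016" or "1.5" are rejected, which a plain numeric comparison would have to check separately.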

What is the regular expression for a total 10 digit number with a decimal precision of 1 or 2?

I am trying to write a regex that satisfies the following for a number of at most 10 digits in total.
Tried this so far :
^(\d){0,8}(\.){0,1}(\d){0,2}$
It works fine but fails if I give the following :
123456789.0
Valid example:
1234567890 (total 10 digits)
1234567.1 (total 8 digits)
12345678.10 (total 10 digits)
123456789.1 (total 10 digits)
Invalid example :
12345678901 (11 characters)

Here is a way to go:
^(?:\d{1,10}|(?=\d+\.\d\d?$)[\d.]{3,11})$
Explanation:
^ : beginning of string
(?: : start non capture group
\d{1,10} : 1 up to 10 digits
| : OR
(?= : start look ahead
\d+\.\d\d?$ : 1 or more digits then a dot then 1 or 2 digits
) : end lookahead
[\d.]{3,11} : only digits or dots are allowed, with a length from 3 up to 11
) : end group
$ : end of string
In action:
#!/usr/bin/perl
use Modern::Perl;
my $re = qr~^(?:\d{1,10}|(?=\d+\.\d\d?$)[\d.]{3,11})$~;
while(<DATA>) {
chomp;
say (/$re/ ? "OK: $_" : "KO: $_");
}
__DATA__
1
123
1.2
1234567890
1234567.1
12345678.10
123456789.1
12345678901
1.2.3
Output:
OK: 1
OK: 123
OK: 1.2
OK: 1234567890
OK: 1234567.1
OK: 12345678.10
OK: 123456789.1
KO: 12345678901
KO: 1.2.3
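The lookahead trick carries over to other regex engines. A minimal Python sketch, assuming ASCII digits only (hence `re.ASCII`) and using the hypothetical names `NUMBER` and `is_valid`: the lookahead guarantees exactly one dot followed by 1 or 2 digits, so a length of 3 to 11 characters means 2 to 10 digits in total.

```python
import re

# Either 1-10 plain digits, or a dotted number whose overall length
# (digits plus one dot) is at most 11 characters.
NUMBER = re.compile(r"(?:\d{1,10}|(?=\d+\.\d\d?$)[\d.]{3,11})", re.ASCII)


def is_valid(text):
    """True if `text` has at most 10 digits and at most 2 decimals."""
    return NUMBER.fullmatch(text) is not None


for sample in ["1234567890", "12345678.10", "123456789.12", "1.2.3"]:
    print(sample, is_valid(sample))
```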

The solution using String.prototype.match() and RegExp.prototype.test() functions:
var isValid = function (num) {
return /^\d+(\.\d+)?$/.test(num) && String(num).match(/\d/g).length <= 10;
};
console.log(isValid(1234567890));
console.log(isValid(12345678.10));
console.log(isValid(12345678901));
console.log(isValid('123d3457'));

You can build your pattern in two steps:
First step
With 8 integer digits there is room for 1 or 2 decimal digits, both optional:
\d{8}\.?\d?\d? Here the . and both digits are optional
Second step
With 9 integer digits there is room for only 1 more digit:
\d{9}\.?\d? Here the . and the digit are optional
Then you can combine these two rules with the alternation operator |:
^(\d{8}\.?\d?\d?|\d{9}\.?\d?)$
Now this regex only matches numbers of 8 to 10 digits, with an optional precision of 1 or 2 digits.
It never matches fewer than 8 digits. The trick is to change \d{8} in the first step to \d{1,8}; then it matches from 1 to 9999999999, plus 1 or 2 digits of precision.
What you want:
^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$
echo 1 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
1
echo 9999999999 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
9999999999
echo 1.1 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
1.1
echo 1.12 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
1.12
echo 1234567.1 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
1234567.1
echo 1234567.12 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
1234567.12
echo 99999999.9 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
99999999.9
echo 99999999.99 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
99999999.99
These do not match:
echo 1.111 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
echo 1234567.111 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
echo 123456781.11 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
echo 1234567891.1 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
echo 123456789101 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'

Convert HH:MM into decimal hours

I am trying to convert some timestamps from a text file in the format HH:MM into a number format (for example, 12:30 -> 12,5) [1] using a Perl regex for easier processing in future.
I am quite new to this topic, so I am struggling with the MM part and I don't know how to convert it. Currently I have something like this:
while ( <FILE> ) {
$line = $_;
$line =~ s/([0[0-9]|1[0-9]|2[0-3]):([0-5][0-9])/$2,$1/g;
print $line;
}
[1] In my locale, the comma , is used as the decimal separator; read it as a . here. So 12,5 means 12 and a half, or 12.5.

I would not use a regular expression for converting. It can be done with pretty simple math. Parse out the times using your search pattern, and then pass it through something like this.
sub to_decimal {
my $time = shift;
my ($hours, $minutes) = split /:/, $time;
my $decimal = sprintf '%.02d', ($minutes / 60) * 100 ;
return join ',', $hours, $decimal;
}
If you run it in a loop like this:
for (qw(00 01 05 10 15 20 25 30 35 40 45 50 55 58 59)) {
say "$_ => " . to_decimal("12:$_");
}
You get:
00 => 12,00
01 => 12,01
05 => 12,08
10 => 12,16
15 => 12,25
20 => 12,33
25 => 12,41
30 => 12,50
35 => 12,58
40 => 12,66
45 => 12,75
50 => 12,83
55 => 12,91
58 => 12,96
59 => 12,98
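The arithmetic itself is just hours plus minutes divided by 60. A small Python sketch of that calculation follows (Python is an assumption of this example, and the `places` and `sep` parameters are hypothetical). Unlike the truncating '%.02d' format above, which prints 12,16 for 12:10, this version rounds to the nearest hundredth.

```python
def to_decimal(hhmm, places=2, sep=","):
    """Convert an 'HH:MM' string to decimal hours, e.g. '12:30' -> '12,50'."""
    hours, minutes = (int(part) for part in hhmm.split(":"))
    value = hours + minutes / 60
    # Format with a fixed number of decimals, then swap in the separator.
    return f"{value:.{places}f}".replace(".", sep)


for stamp in ["00:00", "05:17", "12:10", "12:30", "23:59"]:
    print(stamp, "=>", to_decimal(stamp))
```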

perl -ple 's|(\d\d):(\d\d)|{$2/60 + $1}|eg'
Your locale should take care of the comma, I think.

This will achieve what you need. It uses an executable substitution to replace the time string with an expression in terms of the hour and minute values. tr/./,/r is used to convert all dots to commas.
use strict;
use warnings 'all';
while ( <DATA> ) {
s{ ( 0[0-9] | 1[0-9] | 2[0-3] ) : ( [0-5][0-9] ) }{
sprintf('%.2f', $1 + $2 / 60) =~ tr/./,/r
}gex;
print;
}
__DATA__
00:00
05:17
12:30
15:59
23:59
output
0,00
5,28
12,50
15,98
23,98

You only have to adjust the substitution to make it work:
$line =~ s/(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9])/"$1," . substr( int($2)\/60, 2)/eg;
The e modifier causes the replacement to be evaluated as code, so you can write the intended result as a formula over the capture groups. Note that the substr call strips the leading "0." from the string representation of the fraction.
If you need to limit yourself to a given number of fraction digits, format the result of the division using sprintf:
$line =~ s/(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9])/"$1," . substr( sprintf('%.2f', int($2)\/60), 2)/eg;

You could use egrep and awk:
$ echo 12:30 | egrep -o '(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9])' | awk -F":" '{printf "%s\n", $1+$2/60}'
12.5

Assuming your LC_NUMERIC is set correctly:
while (<FILE>) {
use locale ':not_characters';
my $line = $_;
$line =~ s!\b([01][0-9]|2[0-3]):([0-5][0-9])\b!$1 + $2/60!eg;
print $line;
}

Remove spaces between words only not between numbers

I have a string consisting of words, special characters (*, |, ( etc.) and numbers (floating point). I want to remove the white space only between words and special characters. Spaces between numbers should not be removed. How can I do it in Perl?
E.g.:
Rama 1 * 2.34 * ( L - 0.45 ) XYZ 10 20.05 30.06 40 P > 25.
It should be after conversion:
Rama1*2.34*(L-0.45)XYZ 10 20.05 30.06 40 P>25.

(?<!\d)\h+|\h+(?!\d)
You can use lookarounds here. See the demo:
https://regex101.com/r/uF4oY4/62
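To see what this lookaround pattern does on the sample string, here is a Python sketch. Python's `re` module has no `\h`, so `[ \t]` stands in for horizontal white space; that substitution and the names `SPACES` and `squeeze` are assumptions of this example. A run of spaces survives only when it has a digit on both sides.

```python
import re

# Remove spaces that are not preceded by a digit, or not followed by one.
SPACES = re.compile(r"(?<!\d)[ \t]+|[ \t]+(?!\d)")


def squeeze(text):
    """Drop every space run unless it sits between two digits."""
    return SPACES.sub("", text)


sample = "Rama 1 * 2.34 * ( L - 0.45 ) XYZ 10 20.05 30.06 40 P > 25."
print(squeeze(sample))
# prints: Rama1*2.34*(L-0.45)XYZ10 20.05 30.06 40P>25.
```

Note that this also joins "XYZ 10" and "40 P", since each of those spaces touches a letter; the requested output keeps them, so the rule would need refining for that case.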

You may use the below lookaround based regex.
perl -pe 's/\s+(?=\D)|(?<=\D)\s+//g' file
Example:
$ echo 'Rama 1 * 2.34 * ( L - 0.45 ) XYZ 10 20.05 30.06 40 P > 25.' | perl -pe 's/\s+(?=\D)|(?<=\D)\s+//g'
Rama1*2.34*(L-0.45)XYZ10 20.05 30.06 40P>25.
or
$ echo 'Rama 1 * 2.34 * ( L - 0.45 ) XYZ 10 20.05 30.06 40 P > 25.' | perl -pe 's/(?<=[^\s\w])\s+|\s+(?=[^\w\s])//g'
Rama 1*2.34*(L-0.45)XYZ 10 20.05 30.06 40 P>25.

Separate string of digits into 3 columns using awk/sed

I have a string of digits in rows as below:
6390212345678912011012112121003574820069121409100000065471234567810
6390219876543212011012112221203526930428968109100000065478765432196
That I need to split into 6 columns as below:
639021234567891,201101211212100,3574820069121409,1000000,654712345678,10
639021987654321,201101211222120,3526930428968109,1000000,654787654321,96
Conditions:
Field 1 = 15 Char
Field 2 = 15 Char
Field 3 = 15 or 16 Char
Field 4 = 7 Char
Field 5 = 12 Char
Field 6 = 2 Char
Final Output:
639021234567891,3574820069121409,654712345678
639021987654321,3526930428968109,654787654321

It's not clear how to detect whether field 3 should have 15 or 16 chars. But as a draft for the first 3 fields you could use something like this:
echo 63902910069758520110121121210035748200670169758510 |
awk '{ printf("%s,%s,%s\n",substr($1,1,15),substr($1,16,15),substr($1,31,15)); }'

Or with sed:
echo $NUM | sed -r 's/^([0-9]{15})([0-9]{15})([0-9]{15,16}) ...$/\1,\2,\3, .../'
This will use 15 or 16 for the length of field 3 based on the length of the whole string.

If you're using gawk:
gawk -v f3w=16 'BEGIN {OFS=","; FIELDWIDTHS="15 15 " f3w " 7 12 2"} {print $1, $3, $5}'
Do you know ahead of time what the width of Field 3 should be? Do you need it to be programmatically determined? How? Based on the total length of the line? Does it change line-by-line?
Edit:
If you don't have gawk, then this is a similar approach:
awk -v f3w=16 'BEGIN {OFS=","; FIELDWIDTHS="15 15 " f3w " 7 12 2"; n=split(FIELDWIDTHS,fw," ")} { p=1; r=$0; for (i=1;i<=n;i++) { $i=substr(r,p,fw[i]); p += fw[i]}; print $1,$3,$5}'
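One possible answer to the width question is to derive the width of field 3 from the line length. The Python sketch below does that, under the assumption (not confirmed by the question) that every other field has exactly the stated width, so the leftover characters belong to field 3. The function names are hypothetical.

```python
def split_record(line):
    """Split a digit string into the six fixed-width fields."""
    # Assumption: fields 1, 2, 4, 5, 6 are always 15, 15, 7, 12, 2 wide,
    # so whatever remains of the line length is the width of field 3.
    f3w = len(line) - (15 + 15 + 7 + 12 + 2)
    if f3w not in (15, 16):
        raise ValueError(f"unexpected line length {len(line)}")
    fields, pos = [], 0
    for width in (15, 15, f3w, 7, 12, 2):
        fields.append(line[pos:pos + width])
        pos += width
    return fields


def final_output(line):
    """Keep fields 1, 3 and 5, joined by commas."""
    fields = split_record(line)
    return ",".join((fields[0], fields[2], fields[4]))


record = "".join(["639021234567891", "201101211212100",
                  "3574820069121409", "1000000", "654712345678", "10"])
print(final_output(record))
```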
