Regex help for a string

I have a string
2045111780&&-3&5&-7
I want a regex to give me groups as:
2045111780
&&-
3
... and then next groups as
3
&
5
... and so on.
I came up with (\d+)(&&?-?)? but that gives me groups as:
2045111780
&&-
... and then next groups as
3
&
... and so on.
Note that I need the delim ( regex: &&?-? )
Thanks.
update1: changed the groups output.

I don't think it's possible to share a match between groups (the -3 in your example). So I recommend processing in two steps: split the string, then take consecutive pairs from the resulting array. For example, using Perl:
$a = "2045111780&&-3&5&-7";
@pairs = split /&+/, $a;
# at this point you get $pairs[0] = '2045111780', $pairs[1] = '-3', ...
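Note that splitting on /&+/ throws the delimiter away, and the question asks to keep it. Here is a rough Python sketch of the same split-then-pair idea that retains the delimiter by putting the question's delimiter regex (&&?-?) in a capturing group. The variable names parts and triples are mine, not from the answer above.

```python
import re

s = "2045111780&&-3&5&-7"

# Splitting on a capturing group keeps the delimiters in the result list,
# so parts alternates number, delimiter, number, ...
parts = re.split(r"(&&?-?)", s)

# Step through the list two items at a time so that each number is shared
# between neighbouring (left, delimiter, right) triples.
triples = [tuple(parts[i:i + 3]) for i in range(0, len(parts) - 2, 2)]

for left, delim, right in triples:
    print(left, delim, right)
```

With this delimiter the minus sign ends up in the delimiter rather than on the number, which is what the question's desired output shows.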

How about (-?\d+|&+). It will match numbers with an optional minus and sequences of &s.
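To see what that pattern tokenizes the input into, here is a small Python sketch (my own illustration, assuming the same sample string as the question):

```python
import re

s = "2045111780&&-3&5&-7"

# Each token is either a number with an optional leading minus,
# or a run of one or more ampersands.
tokens = re.findall(r"-?\d+|&+", s)
print(tokens)
```

Here the minus sign stays attached to the number instead of the delimiter, so you would still have to pair the tokens up yourself.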

If I understand correctly, you want to have overlapping matches.
You could use a regex like (-?\d+)(&&?)(-?\d+) and match it repeatedly until it fails, each time removing the beginning of the given string up to the start of the third group.

You could do it in perl like this:
$ perl -ne 'while (/(-?\d+)(&&?)(-?\d+)/g) { print $1, " ", $2, " ", $3, "\n"; pos() -= length($3); }'
2045111780&&-3&5&-7 # this is the input
2045111780 && -3
-3 & 5
5 & -7
But that's very ugly. The split approach by Miguel Prz is much, much cleaner.
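For comparison, the same "match, then back up to the start of the third group" loop can be sketched in Python. This is only an illustration of the technique; the function name overlapping_triples is made up for the example.

```python
import re

PATTERN = re.compile(r"(-?\d+)(&&?)(-?\d+)")

def overlapping_triples(s):
    """Collect (left, delimiter, right) matches, reusing each right-hand number."""
    results = []
    pos = 0
    while True:
        m = PATTERN.search(s, pos)
        if m is None:
            break
        results.append(m.groups())
        # Restart at the beginning of group 3 so it can become the next group 1.
        pos = m.start(3)
    return results

for triple in overlapping_triples("2045111780&&-3&5&-7"):
    print(*triple)
```

Because groups 1 and 2 are never empty, the restart position always moves forward, so the loop terminates.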

Perl Regex get last digits in string before \

Do you know of a way to combine these 2 regular expressions?
Or any other way to just get the last 6 digits before the last \.
The end result that I want is 100144 from the string:
\\XXX\Extract_ReduceSize\MonitoringExport\dev\files\100144\
Here are some things I have tried
(.{1})$
Gets rid of the trailing \ of the string resulting in
\\XXX\Extract_ReduceSize\MonitoringExport\dev\files\100144
.*\\
Gets rid of everything before the last \ resulting in 100144
The software I am using requires just one line, so I cannot perform 2 calls.
Since you want the last one, would ([^\\]*)\\$ be appropriate? This matches as many non-backslash characters as possible before the final backslash. Alternatively, if you don't want to extract the first group, you can use a lookahead: ([^\\]+)(?=\\$).
This code shows two different solutions. Hope it helps:
use strict;
use warnings;
my $example = '\\XXX\Extract_ReduceSize\MonitoringExport\dev\files\100144\\';
# Method 1: split the string by the \ character. This gives us an array,
# and then, select the last element of that array [-1]
my $number = (split /\\/, $example)[-1];
print $number, "\n"; # <-- prints: 100144
# Method 2: use a regex. Anchor the match at the end of the string ($),
# and capture the number part (\d+) in $1
if( $example =~ m!(\d+)\\$! ) {
    print $1, "\n"; # <-- prints: 100144
}
This works to extract the last segment of digits:
(\d+)(?=\\$)
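If it helps to try the single-expression approach outside Perl, here is a minimal Python sketch using a capture group anchored at the end of the string. The variable names are mine, and the path is the example from the question.

```python
import re

# A raw string cannot end in a backslash, so the trailing one is appended.
path = r"\\XXX\Extract_ReduceSize\MonitoringExport\dev\files\100144" + "\\"

# Digits immediately followed by the final backslash of the string.
m = re.search(r"(\d+)\\$", path)
last_digits = m.group(1) if m else None
print(last_digits)
```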

Arithmetic Calculation in Perl Substitute Pattern Matching

Using just one Perl substitute regular expression statement (s///), how can we write below:
Every success match contains just a string of Alphabetic characters A..Z. We need to substitute the match string with a substitution that will be the sum of character index (in alphabetical order) of every character in the match string.
Note: For A, character index would be 1, for B, 2 ... and for Z would be 26.
Please see example below:
success match: ABCDMNA
substitution result: 38
Note:
1 + 2 + 3 + 4 + 13 + 14 + 1 = 38;
since
A = 1, B = 2, C = 3, D = 4, M = 13, N = 14 and A = 1.
I will post this as an answer, I guess, though credit for the idea should go to abiessu, who presented it in his answer.
perl -ple'1 while s/(\d*)([A-Z])/$1+ord($2)-64/e'
Since this is clearly homework and/or of academic interest, I will post the explanation in spoiler tags.
- We match an optional number (\d*), followed by a letter ([A-Z]). The number is the running sum, and the letter is what we need to add to the sum.
- By using the /e modifier, we can do the math, which is add the captured number to the ord() value of the captured letter, minus 64. The sum is returned and inserted instead of the number and the letter.
- We use a while loop to rinse and repeat until all letters have been replaced, and all that is left is a number. We use a while loop instead of the /g modifier to reset the match to the start of the string.
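The three steps above can be sketched in Python as well, purely to illustrate the "running sum at the front of the string" idea. The function name letter_sum and the helper fold are hypothetical, and count=1 stands in for restarting the match from the beginning on each pass.

```python
import re

def letter_sum(text):
    """Fold letters into a running sum, one substitution at a time."""
    pattern = re.compile(r"(\d*)([A-Z])")

    def fold(m):
        running = int(m.group(1) or 0)      # empty on the first pass
        return str(running + ord(m.group(2)) - 64)

    while True:
        # Replace only the first match, then rescan from the beginning.
        text, n = pattern.subn(fold, text, count=1)
        if n == 0:
            return text

print(letter_sum("ABCDMNA"))
```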
Just split, translate, and sum:
use strict;
use warnings;
use List::Util qw(sum);
my $string = 'ABCDMNA';
my $sum = sum map {ord($_) - ord('A') + 1} split //, $string;
print $sum, "\n";
Outputs:
38
Can you use the /e modifier in the substitution?
$s = "ABCDMNA";
$s =~ s/(.)/$S += ord($1) - ord "@"; 1 + pos $s == length $s ? $S : ""/ge;
print "$s\n"
Consider the following matching scenario:
my $text = "ABCDMNA";
my $val = $text =~ s!(\d)*([A-Z])!($1+ord($2)-ord('A')+1)!gr;
(Without having tested it...) This should repeatedly go through the string, replacing one character at a time with its ordinal value added to the current sum which has been placed at the beginning. Once there are no more characters the copy (/r) is placed in $val which should contain the translated value.
Or a short alternative:
echo ABCDMNA | perl -nlE 'm/(.)(?{$s+=-64+ord$1})(?!)/;say$s'
or, more readably:
$s = "ABCDMNA";
$s =~ m/(.)(?{ $sum += ord($1) - ord('A')+1 })(?!)/;
print "$sum\n";
prints
38
Explanation:
The pattern /.(?!)/ tries to match any character that is not followed by the empty regex.
Because an empty regex matches everywhere, the "not followed by" condition is never true, so (?!) always fails.
Therefore the regex engine moves on to the next character and tries the match again.
This repeats until the whole string is exhausted.
Because we want to capture the character, we use a capture group: /(.)(?!)/
The (?{...}) block runs Perl code that adds the value of the character captured in $1 to the sum.
When the regex has exhausted the string (and failed), the final say $s prints the sum.
From perlre:
(?{ code })
This zero-width assertion executes any embedded Perl code. It always
succeeds, and its return value is set as $^R .
WARNING: Using this feature safely requires that you understand its
limitations. Code executed that has side effects may not perform
identically from version to version due to the effect of future
optimisations in the regex engine. For more information on this, see
Embedded Code Execution Frequency.

Convert string with preg_replace in PHP

I have this string
$string = "some words and then #1.7 1.7 1_7 and 1-7";
and I would like #1.7, 1.7, 1_7 and 1-7 each to be replaced by S1E07.
Of course, "1.7" is just an example; it could just as well be "3.15".
I managed to create the regular expression that would match the above 4 variants
/\#\d{1,2}\.\d{1,2}|\d{1,2}_\d{1,2}|\d{1,2}-\d{1,2}|\d{1,2}\.\d{1,2}/
but I cannot figure out how to use preg_replace (or something similar?) to actually replace the matches so they end up like S1E07
You need to use preg_replace_callback if you want to pad the number with a leading 0 when it is less than 10.
$string = "some words and then #1.7 1.7 1_7 and 1-7";
$string = preg_replace_callback('/#?(\d+)[._-](\d+)/', function($matches) {
return 'S'.$matches[1].'E'.($matches[2] < 10 ? '0'.$matches[2] : $matches[2]);
}, $string);
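The callback idea is not specific to PHP. As a rough sketch of the same technique in Python (the names EPISODE and to_episode_tag are invented for this example), the replacement function receives the match and builds the padded string:

```python
import re

EPISODE = re.compile(r"#?(\d{1,2})[._-](\d{1,2})")

def to_episode_tag(text):
    """Rewrite 1.7, #1.7, 1_7 and 1-7 style tokens as S1E07."""
    def repl(m):
        season, episode = int(m.group(1)), int(m.group(2))
        # :02d pads the episode number to two digits.
        return f"S{season}E{episode:02d}"
    return EPISODE.sub(repl, text)

print(to_episode_tag("some words and then #1.7 1.7 1_7 and 1-7"))
```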
You could use this simple string replace:
preg_replace('/#?\b(\d{1,2})[-._](\d{1,2})\b/', 'S${1}E${2}', $string);
But it would not yield zero-padded numbers for the episode number:
// some words and then S1E7 S1E7 S1E7 and S1E7
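You would have to use the evaluation modifier (/e was deprecated in PHP 5.5 and removed in PHP 7, so on current versions use preg_replace_callback instead):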
preg_replace('/#?\b(\d{1,2})[-._](\d{1,2})\b/e', '"S".str_pad($1, 2, "0", STR_PAD_LEFT)."E".str_pad($2, 2, "0", STR_PAD_LEFT)', $string);
...and use str_pad to add the zeroes.
// some words and then S01E07 S01E07 S01E07 and S01E07
If you don't want the season number to be padded you can just take out the first str_pad call.
I believe this will do what you want it to...
/\#?([0-9]+)[._-]([0-9]+)/
In other words...
\#? - can start with the #
([0-9]+) - capture at least one digit
[._-] - look for one ., _ or -
([0-9]+) - capture at least one digit
And then you can use this to replace...
S$1E$2
Which will put out S then the first captured group, then E then the second captured group
You need to put brackets around the parts you want to reuse, i.e. capture them. Then you can access those values in the replacement string with $1 for the first group (or ${1}, which is needed when the reference is immediately followed by a literal digit), $2 for the second one, and so on.
The problem here is that bracketing every alternative in your expression would leave you with $1 through $8, so I would rewrite the expression into something like this:
/#?(\d{1,2})[._-](\d{1,2})/
and replace with
S${1}E${2}
I tested it on writecodeonline.com:
$string = "some words and then #1.7 1.7 1_7 and 1-7";
$result = preg_replace('/#?(\d{1,2})[._-](\d{1,2})/', 'S${1}E${2}', $string);

Is there a way to evaluate the number of times a Perl regular expression has matched?

I've been poring over perldoc perlre as well as the Regular Expressions Cookbook and related questions on Stack Overflow and I can't seem to find what appears to be a very useful expression: how do I know the number of the current match?
There are expressions for the last closed group match ($^N), contents of match 3 (\g{3} if I understood the docs correctly), $', $& and $`. But there doesn't seem to be a variable I can use that simply tells me what the number of the current match is.
Is it really missing? If so, is there any explained technical reason why it is a hard thing to implement, or am I just not reading the perldoc carefully enough?
Please note that I'm interested in a built-in variable, NOT workarounds like using (${$count++}).
For context, I'm trying to build a regular expression that would match only some instances of a match (e.g. match all occurrences of character "E" but do NOT match occurrences 3, 7 and 10 where 3, 7 and 10 are simply numbers in an array). I ran into this when trying to construct a more idiomatic answer to this SO question.
I want to avoid evaluating regexes as strings to actually insert 3, 7 and 10 into the regex itself.
I'm completely ignoring the actual utility or wisdom of using this for the other question.
I thought @- or @+ might do what you want since they hold the offsets of the numbered matches, but it looks like the regex engine already knows what the last index will be:
use v5.14;
use Data::Printer;
$_ = 'abc123abc345abc765abc987abc123';
my @matches = m/
    ([0-9]+)
    (?{
        print 'Matched \$' . $#+ . " group with $^N\n";
        say p(@+);
    })
    .*?
    ([0-9]+)
    (?{
        print 'Matched \$' . $#+ . " group with $^N\n";
        say p(@+);
    })
/x;
say "Matches: @matches";
This gives strings that show the last index as 2 even though it hasn't matched $2 yet.
Matched \$2 group with 123
[
[0] 6,
[1] 6,
[2] undef
]
Matched \$2 group with 345
[
[0] 12,
[1] 6,
[2] 12
]
Matches: 123 345
Notice that the first time around, $+[2] is undef, so that one hasn't been filled in yet. You might be able to do something with that, but I think that's probably getting away from the spirit of your question. If you were really fancy, you could create a tied scalar that has the value of the last defined index in @+, I guess.
I played around with this for a bit. Again, I know that this is not really what you are looking for, but I don't think that exists in the way you want it.
I had two thoughts. First, with a split using separator retention mode, you get the interstitial bits as the odd numbered elements in the output list. With the list from the split, you count which match you are on and put it back together how you like:
use v5.14;
$_ = 'ab1cdef2gh3ij4k5lmn6op7qr8stu9vw10xyz';
my @bits = split /(\d+)/; # separator retention mode
my @skips = qw(3 7 10);
my $s;
while( my( $index, $value ) = each @bits ) {
    # shift indices to match number ( index = 2 n - 1 )
    if( $index % 2 and ! ( ( $index + 1 )/2 ~~ @skips ) ) {
        $s .= '^';
    }
    else {
        $s .= $value;
    }
}
I get:
ab^cdef^gh3ij^k^lmn^op7qr^stu^vw10xyz
I thought I really liked my split answer until I had the second thought. Does state work inside a substitution? It appears that it does:
use v5.14;
$_ = 'ab1cdef2gh3ij4k5lmn6op7qr8stu9vw10xyz';
my @skips = qw(3 7 10);
s/(\d+)/
    state $n = 0;
    $n++;
    $n ~~ @skips ? $1 : '$'
/eg;
say;
This gives me:
ab$cdef$gh3ij$k$lmn$op7qr$stu$vw10xyz
I don't think you can get much simpler than that, even if that magic variable existed.
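For readers outside Perl, the "count matches inside the replacement" trick translates fairly directly. This is just a sketch of the technique, with a counter held in a closure instead of a state variable; replace_except and its parameters are names I made up.

```python
import re
from itertools import count

def replace_except(text, skips, replacement="$"):
    """Replace every run of digits except the match numbers listed in skips."""
    counter = count(1)          # match numbers start at 1
    skip_set = set(skips)

    def repl(m):
        n = next(counter)
        # Keep the original text for skipped matches, replace the rest.
        return m.group(0) if n in skip_set else replacement

    return re.sub(r"\d+", repl, text)

print(replace_except("ab1cdef2gh3ij4k5lmn6op7qr8stu9vw10xyz", [3, 7, 10]))
```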
I had a third thought which I didn't try. I wonder if state works inside a code assertion. It might, but then I'd have to figure out how to use one of those to make a match fail, which really means it has to skip over the bit that might have matched. That seems really complicated, which is probably what Borodin was pressuring you to show even in pseudocode.

how to extract a single digit in a number using regexp

set phoneNumber 1234567890
This number is a single string of digits. I want to divide it into 123 456 7890 using regexp. Is that possible without using the split function?
The following snippet:
regexp {(\d{3})(\d{3})(\d{4})} "8144658695" -> areacode first second
puts "($areacode) $first-$second"
Prints (as seen on ideone.com):
(814) 465-8695
This uses capturing groups in the pattern and subMatchVar... for Tcl regexp
References
http://www.hume.com/html84/mann/regexp.html
regular-expressions.info/Brackets for Capturing
On the pattern
The regex pattern is:
(\d{3})(\d{3})(\d{4})
\_____/\_____/\_____/
   1      2      3
It has 3 capturing groups (…). The \d is a shorthand for the digit character class. The {3} in this context means "exactly 3 repetitions of".
References
regular-expressions.info/Repetition, Character Class
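The same three-group pattern works unchanged in other regex flavours. A minimal Python sketch, using the number from the question (the function name split_phone is hypothetical, and it assumes the input is exactly ten digits):

```python
import re

def split_phone(number):
    """Return (first three, next three, last four) digits, or None if no match."""
    m = re.fullmatch(r"(\d{3})(\d{3})(\d{4})", number)
    return m.groups() if m else None

parts = split_phone("1234567890")
print(" ".join(parts))
```

Using fullmatch rather than search means a string with extra or missing digits is rejected instead of being partially matched.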
my($number) = "8144658695";
$number =~ m/(\d\d\d)(\d\d\d)(\d\d\d\d)/;
my $num1 = $1;
my $num2 = $2;
my $num3 = $3;
print $num1 . "\n";
print $num2 . "\n";
print $num3 . "\n";
This is written for Perl and works assuming the number is in the exact format you specified. Hope this helps.
This site might help you with regex
http://www.troubleshooters.com/codecorn/littperl/perlreg.htm