Regex equivalent - regex

What is the regex equivalent of $string =~ /[^x]/ when x is replaced by a multi-character string such as xyz? That is, how do I match text that doesn't contain xyz?
I eventually want to match
$string = "beginning string xyz remaining string which doesn't contain xyz";
using
$string =~ /(<pattern>)xyz(<pattern>)xyz/
so that
$1 = 'beginning string '
$2 = " remaining string which doesn't contain "

(?:(?!STRING).)* is to STRING as [^CHAR]* is to CHAR.
(Actually, far more than just strings can be used in this fashion. For example, you can use STRING1|STRING2 just as well as STRING.)
$string =~ /
( (?:(?!xyz).)* )
xyz
( (?:(?!xyz).)* )
xyz
/sx
If this pattern matches at all, it matches at position zero, so let's anchor it with ^ to prevent needless retries at every later position when the match fails.
$string =~ /
^
( (?:(?!xyz).)* )
xyz
( (?:(?!xyz).)* )
xyz
/sx
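A minimal runnable sketch of the anchored pattern applied to the string from the question. The wrapper sub split_on_xyz is a hypothetical name used only for illustration; in list context the match returns the two captures, or an empty list if there are fewer than two occurrences of xyz.

```perl
use strict;
use warnings;

# Hypothetical helper: text before the first "xyz", and text between the
# first and second "xyz". Returns an empty list when the match fails.
sub split_on_xyz {
    my ($s) = @_;
    return $s =~ /
        ^
        ( (?:(?!xyz).)* )   # any run of characters that never starts "xyz"
        xyz
        ( (?:(?!xyz).)* )
        xyz
    /sx;
}

my ($first, $second) =
    split_on_xyz("beginning string xyz remaining string which doesn't contain xyz");
print "1: '$first'\n";
print "2: '$second'\n";
```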

In your particular case, a non-greedy .* will work. That is:
(.*?)xyz(.*?)xyz
will give you what you're looking for, as shown in http://rubular.com/r/RtaMG6ZvWK
However, as pointed out in the comment from @ikegami below, this is a fragile approach. And it turns out there is a "string" counterpart to the character-based [^...] construct, as shown in @ikegami's answer https://stackoverflow.com/a/20367916/1008891
You can see this in rubular at http://rubular.com/r/zsO1F0nkXu
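To illustrate why the non-greedy version is fragile, here is a small hypothetical test (the input string is made up, not taken from either answer). Once an end anchor is added, .*? is free to stretch across an xyz while backtracking, whereas the tempered (?:(?!xyz).)* version refuses to.

```perl
use strict;
use warnings;

# Hypothetical input with three occurrences of "xyz".
my $s = 'axyzbxyzcxyz';

# Non-greedy: backtracking lets the second field grow past an "xyz".
my @lazy = $s =~ /^(.*?)xyz(.*?)xyz$/s;

# Tempered: neither field may contain "xyz", so the anchored match fails.
my @tempered = $s =~ /^((?:(?!xyz).)*)xyz((?:(?!xyz).)*)xyz$/s;

print @lazy     ? "lazy:     '$lazy[0]' '$lazy[1]'\n"         : "lazy:     no match\n";
print @tempered ? "tempered: '$tempered[0]' '$tempered[1]'\n" : "tempered: no match\n";
```

The non-greedy pattern reports a second field of 'bxyzc', which contains xyz, while the tempered pattern correctly reports no match.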

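while (<>) {
    # ([^x]|x(?!yz))+ : one or more characters, each either not an x,
    # or an x that is not followed by yz (i.e. never the start of "xyz").
    if (/(([^x]|x(?!yz))+)xyz(([^x]|x(?!yz))+)xyz/) {
        printf("'%s' '%s'\n", $1, $3);   # $2 and $4 hold only the last repeated character
    }
}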

Related

How to verify if a variable value contains a character and ends with a number using Perl

I am trying to check whether a variable contains the character "C" and ends with a number, in the minor version. I have:
my $str1 = "1.0.99.10C9";
my $str2 = "1.0.99.10C10";
my $str3 = "1.0.999.101C9";
my $str4 = "1.0.995.511";
my $str5 = "1.0.995.AC";
I would like a regex that prints a message if the variable has a C in the 4th field and ends with a number, so for str1, str2 and str3 it should print "matches". I have tried the regexes below, but none of them work. Can you help me correct them?
my $str1 = "1.0.99.10C9";
if ( $str1 =~ /\D+\d+$/ ) {
print "Candy match1\n";
}
if ( $str1 =~ /\D+C\d+$/ ) {
print "Candy match2\n";
}
if ($str1 =~ /\D+"C"+\d+$/) {
print "candy match3";
}
if ($str1 =~ /\D+[Cc]+\d+$/) {
print "candy match4";
}
if ($str1 =~ /\D+\\C\d+$/) {
print "candy match5";
}
if ($str1 =~ /C[^.]*\d$/) {
print "matches\n";
}
C matches the letter C.
[^.]* matches any number of characters other than a dot (.). This ensures that the match cannot span multiple fields of the version number; it only matches within the last field.
\d matches a digit.
$ matches at the end of the string, so the digit has to be the last character.
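A small sketch running that regex over the five sample strings from the question. The helper name is_c_build is hypothetical. Note that the pattern only checks the last field; it does not verify that there are exactly four fields.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the last dot-separated field contains a C
# and the whole string ends in a digit.
sub is_c_build {
    my ($version) = @_;
    return $version =~ /C[^.]*\d$/ ? 1 : 0;
}

for my $v ('1.0.99.10C9', '1.0.99.10C10', '1.0.999.101C9', '1.0.995.511', '1.0.995.AC') {
    printf "%-14s %s\n", $v, is_c_build($v) ? 'matches' : 'no match';
}
```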
I found it really helpful to use https://www.regextester.com/109925 to test and analyse my regex strings.
Let me know if this regex works for you:
((.*\.){3}(.*C\d+))
Following your format, this regex assumes three dots with characters between them, and then checks that the rest of the string after the third dot contains a C followed by a digit.
EDIT:
If you want to make sure the string ends in a digit, and don't want it to match longer strings that merely contain the pattern, anchor it:
^((.*\.){3}(.*C\d+))$
(\d+ rather than \d{1} is needed so that multi-digit endings such as 1.0.99.10C10 still match once the pattern is anchored.)
Let's look at what the regex should look like:
start{digit}.{digit}.{2-3 digits}.{2-3 digits}C{1-2 digits}end
very very strict qr/^1\.0\.9{2,3}\.101?C\d+\z/ - must start with 1.0.99[9]?.
very strict qr/^1\.0\.\d{2,3}\.\d{2,3}C\d{1,2}\z/ - must start with 1.0.
strict qr/^\d\.\d\.\d{2,3}\.\d{2,3}C\d{1,2}\z/
relaxed qr/^\d\.\d\.\d+\.\d+C\d+\z/
very relaxed qr/\.\d+C\d+\z/
use strict;
use warnings;
use feature 'say';
my @data = qw/1.0.99.10C9 1.0.99.10C10 1.0.999.101C9 1.0.995.511 1.0.995.AC/;
#my $re = qr/^\d\.\d\.\d+\.\d+C\d+\z/;
my $re = qr/^\d\.\d\.\d{2,3}\.\d{2,3}C\d+\z/;
say '--- Input Data ---';
say for @data;
say '--- Matching -----';
for( @data ) {
say 'match ' . $_ if /$re/;
}
Output
--- Input Data ---
1.0.99.10C9
1.0.99.10C10
1.0.999.101C9
1.0.995.511
1.0.995.AC
--- Matching -----
match 1.0.99.10C9
match 1.0.99.10C10
match 1.0.999.101C9

Matching a variable in a string in Perl from the end

I want to match a variable character in a given string, but from the end.
Any ideas on how to do this?
for example:
sub removeCharFromEnd {
my $string = shift;
my $char = shift;
if($string =~ m/$char/){ # I want to match the char, searching from the end; $ doesn't work
print "success";
}
}
Thank you for your assistance.
There is no regex modifier that would force the Perl regex engine to scan the string from right to left, so the most convenient way to achieve this is with a negative lookahead:
m/$char(?!.*$char)/
The (?!.*$char) negative lookahead requires that no further $char appears after any number of characters other than line-break characters; if one is found, the match fails at that position. (Use the s modifier if you are running the regex against multiline input.)
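A minimal sketch of the lookahead approach that reports where the last occurrence is. The sub name last_pos is hypothetical, and wrapping $char in \Q...\E (quotemeta) is an added assumption so that characters such as . or | are matched literally. $-[0] holds the offset at which the successful match started.

```perl
use strict;
use warnings;

# Hypothetical helper: offset of the last occurrence of $char, or -1.
sub last_pos {
    my ($string, $char) = @_;
    # Match $char only where no later $char follows; /s lets . cross newlines.
    return $string =~ /\Q$char\E(?!.*\Q$char\E)/s ? $-[0] : -1;
}

print last_pos('abcabc', 'b'), "\n";
print last_pos('a.b.c', '.'), "\n";
print last_pos('abc', 'x'), "\n";
```

For a plain literal, the built-in rindex gives the same answer and can serve as a cross-check.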
The regex engine works from left to right.
You can use the natural greediness of quantifiers to reach the end of the string and find the last char with the backtracking mechanism:
if($string =~ m/.*\K$char/s) { ...
\K resets the reported start of the match to the current position, so only $char ends up in the match result.
Other ways:
you can also reverse the string and use your previous pattern.
you can search all occurrences and take the last item in the list
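A short sketch of those two alternatives, assuming $char is a single character (a longer literal would also have to be reversed before searching the reversed string). Both report the offset of the last b in the sample string.

```perl
use strict;
use warnings;

my ($string, $char) = ('abcabc', 'b');

# 1. Reverse: the first match in the reversed string is the last match in the
#    original; convert the offset back (valid for a one-character $char).
my $rev = reverse $string;
my $from_end = $rev =~ /\Q$char\E/ ? length($string) - 1 - $-[0] : -1;

# 2. Collect the offset of every match and keep the last one.
my @offsets;
while ($string =~ /\Q$char\E/g) {
    push @offsets, $-[0];
}
my $last = @offsets ? $offsets[-1] : -1;

print "$from_end $last\n";
```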
I'm having trouble understanding what you want. Your subroutine is called removeCharFromEnd, so perhaps you want to remove $char from $string if it appears at the end of the string.
You can do that like this:
sub removeCharFromEnd {
my ( $string, $char ) = @_;
if ( $string =~ s/$char\z// ) {
print "success";
}
$string;
}
Or perhaps you want to remove the last occurrence of $char wherever it is. You can do that with
s/.*\K$char//
The subroutine I have written returns the modified string, so you would have to assign the result to a variable to save it. You can write
my $s = 'abc';
$s = removeCharFromEnd($s, 'c');
say $s;
output
ab
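If you just want to modify the string in place, then you should operate on $_[0], which is an alias for the caller's variable:
$_[0] =~ s/$char\z//
using whichever substitution you choose (with $char still taken from the second argument). Then you can do this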
my $s = 'abc';
removeCharFromEnd($s, 'c');
say $s;
This produces the same output
To get Perl to search from the end of a string, reverse the string.
sub removeCharFromEnd {
my $string = reverse shift @_;
my $char = quotemeta reverse shift @_;
$string =~ s/$char//;
$string = reverse $string;
return $string;
}
print removeCharFromEnd(qw( abcabc b )), "\n";
print removeCharFromEnd(qw( abcdefabcdef c )), "\n";
print removeCharFromEnd(qw( !"/$%?&*!"/$%?&* $ )), "\n";

Regular expressions to match protected separated values

I'd like a regular expression that matches comma-separated values, where some protected values may themselves contain the separator character.
For instance:
"A,B,{C,D,E},F"
would give:
"A"
"B"
"{C,D,E}"
"F"
Please note the protected values can be nested, as follows:
"A,B,{C,D,{E,F}},G"
would give:
"A"
"B"
"{C,D,{E,F}}"
"G"
I already coded that feature by iterating over the characters, as follows:
sub Parse
{
my @item;
my $curly;
my $string;
foreach(split //)
{
$_ eq "{" and ++$curly;
$_ eq "}" and --$curly;
if(!$curly && /[,:]/)
{
push @item, $string;
undef $string;
next;
}
$string .= $_;
}
push @item, $string;
return @item;
}
But it would definitely be much nicer with a regex.
A regex that supports nesting would look as follows:
my @items;
push #items, $1 while
/
(?: ^ | \G , )
(
(?: [^,{}]+
| (
\{
(?: [^{}]
| (?2)
)*
\}
)
| # Empty
)
)
/xg;
$ perl -E'$_ = shift; ... say for @items;' 'A,B,{C,D,{E,F}},G'
A
B
{C,D,{E,F}}
G
Assumes valid input since it can't extract and validate at the same time. (Well, not without making things really messy.)
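For reference, here is the same regex wrapped in a complete script. The sub name split_protected is hypothetical; the pattern itself is unchanged apart from comments. Recursion with (?2) needs Perl 5.10 or later, and, as noted above, the input is assumed to be valid.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical wrapper: split on commas that are not inside {...}, where
# brace groups may nest.
sub split_protected {
    my ($str) = @_;
    my @items;
    push @items, $1 while $str =~ /
        (?: ^ | \G , )            # start of string, or the comma right after the previous item
        (
            (?: [^,{}]+           # a plain field
            | (                   # or a brace group; this is capture group 2
                \{
                (?: [^{}]
                |   (?2)          # recurse into a nested brace group
                )*
                \}
              )
            |                     # or an empty field
            )
        )
    /xg;
    return @items;
}

say for split_protected('A,B,{C,D,{E,F}},G');
```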
Improved from nhahtdh's answer.
$_ = "A,B,{C,D,E},F";
while ( m/(\{.*?\}|((?<=^)|(?<=,)).(?=,|$))/g ) {
print "[$&]\n";
}
Improved it again. Please look at this one!
$_ = "A,B,{C,D,{E,F}},G";
while ( m/(\{.*\}|((?<=^)|(?<=,)).(?=,|$))/g ) {
print "$&\n";
}
It will get:
A
B
{C,D,{E,F}}
G
$a = "A,B,{C,D,E},F";
while ($a =~ s/(\{[\{\}\w,]+\}|\w)//) {
push (@res, $1);
}
print "\@res: @res\n"
Result:
@res: A B {C,D,E} F
Explanation: we try to match either the protected block \{[\{\}\w,]+\} or just a single character \w successively in a loop, deleting it from the original string whenever there is a match. Every time there is a match, we store it ($1) in the array, et voilà!
Here is a regex in bash:
chronos@localhost / $ echo "A,B,{C,D,E},F" | grep -oE "(\{[^\}]*\}|[A-Z])"
A
B
{C,D,E}
F
Try this regex. Use the regex to match and extract the token.
/(\{.*?\}|(?<=,|^).*?(?=,|$))/
I have not tested this code in Perl.
This relies on an assumption about how the regex engine works: that it tries the first alternative \{.*?\} before the second (Perl does try alternatives from left to right). I also assume that there are no nested curly brackets and no unbalanced curly brackets.
$s = "A,B,{C,D,E},F";
@t = split /,(?=.*{)|,(?!.*})/, $s;
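A quick check of that split, with the braces escaped here as an added precaution (the wrapper name split_lookahead is hypothetical). It splits at a comma if a { still follows somewhere, or if no } follows anywhere. That works for the flat example, but with nested braces the commas inside the outer group are followed by a later {, so it splits there too.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the lookahead-based split above.
sub split_lookahead {
    my ($s) = @_;
    return split /,(?=.*\{)|,(?!.*\})/, $s;
}

print join('|', split_lookahead('A,B,{C,D,E},F')), "\n";
print join('|', split_lookahead('A,B,{C,D,{E,F}},G')), "\n";
```

The first line prints A|B|{C,D,E}|F, as intended; the second prints A|B|{C|D|{E,F}}|G, so this approach does not handle the nested case from the question.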

Removing delimiters from a date/time string

I want to take this
2010-12-21 20:00:00
and make it look like this:
20101221200000
This is the last thing I tried:
#!/usr/bin/perl -w
use strict;
my ($teststring) = '2010-12-21 20:00:00';
my $result = " ";
print "$teststring\n";
$teststring =~ "/(d\{4\})(d\{3\})(d\{3\})(d\{3\})(d\{3\})(d\{3\})/$result";
{
print "$_\n";
print "$result\n";
print "$teststring\n";
}
And it produced this:
nathan@debian:~/Desktop$ ./ptest
2010-12-21 20:00:00
Use of uninitialized value $_ in concatenation (.) or string at ./ptest line 8.
2010-12-21 20:00:00
nathan@debian:~/Desktop$
-Thanks
First, here is the problem with your code:
$teststring =~ "/(d\{4\})(d\{3\})(d\{3\})(d\{3\})(d\{3\})(d\{3\})/$result";
You want to use =~ with the substitution operator s///. That is, the right hand side should not be a plain string, but s/pattern/replacement/.
In the pattern part, \d denotes a digit (your pattern has a bare d, which matches only the letter d). However, \d includes all sorts of characters in the Unicode digit class, so it is safer to use the character class [0-9] if that's what you want to match. [0-9]{4} matches exactly four characters in the range 0 through 9. Note that you should not escape the curly brackets { and }: escaped, they match literal { and } characters.
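To make the \d versus [0-9] point concrete, here is a small check using U+0663 ARABIC-INDIC DIGIT THREE, a Unicode decimal digit chosen purely as an example. The /a modifier (Perl 5.14 and later) restricts \d to ASCII digits.

```perl
use strict;
use warnings;

# U+0663 ARABIC-INDIC DIGIT THREE: a decimal digit outside ASCII.
my $arabic_three = "\x{0663}";

print "\\d      : ", ($arabic_three =~ /^\d$/    ? "match" : "no match"), "\n";
print "[0-9]   : ", ($arabic_three =~ /^[0-9]$/ ? "match" : "no match"), "\n";
print "\\d (/a) : ", ($arabic_three =~ /^\d$/a   ? "match" : "no match"), "\n";
```

Only the plain \d line reports a match.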
The parentheses ( and ) define capture groups. In the replacement part, you want to keep the stuff you captured, and ignore the stuff you did not.
In addition, I am assuming these timestamps occur in other input, and you do not want to accidentally replace stuff you did not mean to (by blindly removing all non-digits).
Below, I use the /x modifier for the s/// operator so I can format the pattern more clearly using white-space.
#!/usr/bin/perl
use strict; use warnings;
while ( <DATA> ) {
s{
^
([0-9]{4})-     # year
([0-9]{2})-     # month
([0-9]{2})[ ]   # day, then a literal space
([0-9]{2}):     # hour
([0-9]{2}):     # minute
([0-9]{2})      # second
}{$1$2$3$4$5$6}x;
print;
}
__DATA__
2010-12-21 20:00:00
Alternatively, named capture groups (introduced in Perl 5.10) can make the whole thing slightly more readable:
#!/usr/bin/perl
use 5.010;
while ( <DATA> ) {
s{
^
(?<year> [0-9]{4} ) -
(?<month> [0-9]{2} ) -
(?<day> [0-9]{2} ) [ ]
(?<hour> [0-9]{2} ) :
(?<min> [0-9]{2} ) :
(?<sec> [0-9]{2} )
}
{
local $" = '';                          # join the slice below with nothing
"@+{qw(year month day hour min sec)}"   # hash slice of %+, the named captures
}ex;
print;
}
__DATA__
2010-12-21 20:00:00
Use a regular expression to replace all non-digits ([^\d] or [\D]) with the empty string:
$ perl -e '$_ = "2010-12-21 20:00:00"; s/[\D]//g; print $_;'
20101221200000
Can't you just remove anything that's not a digit?
s/[^\d]//g
That substitution also works as-is in Perl: $teststring =~ s/[^\d]//g;
($result = $teststring) =~ y/0-9//cd;
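A self-contained version of that one-liner. y/// is a synonym for tr///: the /c flag complements the list 0-9 and /d deletes every character that matches, so everything except ASCII digits is removed. The parenthesised assignment copies $teststring into $result first, so the original is left intact.

```perl
use strict;
use warnings;

my $teststring = '2010-12-21 20:00:00';

# Copy, then delete every character that is not 0-9 from the copy.
(my $result = $teststring) =~ y/0-9//cd;

print "$teststring\n";
print "$result\n";
```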

How can I match a pipe character followed by whitespace and another pipe?

I am trying to find all matches in a string that begins with | |.
I have tried: if ($line =~ m/^\\\|\s\\\|/) which didn't work.
Any ideas?
You are escaping one time too many: \\\| is an escaped backslash followed by an escaped pipe, so it matches a literal backslash and then a literal pipe.
print "YES!" if ($line =~ m/^\|\s\|/);
Pipe character should be escaped with a single backslash in a Perl regex. (Perl regexes are a bit different from POSIX regexes. If you're using this in, say, grep, things would be a bit different.) If you're specifically looking for a space between them, then use an unescaped space. They're perfectly acceptable in a Perl regex. Here's a brief test program:
my @lines = <DATA>;
for (@lines) {
print if /^\| \|/;
}
__DATA__
| | Good - space
|| Bad - no space
|	| Bad - tab
 | | Bad - beginning space
Bad - no bars
If it's a literal string you're searching for, you don't need a regular expression.
my $search_for = '| |';
my $search_in = whatever();
if ( substr( $search_in, 0, length $search_for ) eq $search_for ) {
print "found '$search_for' at start of string.\n";
}
Or it might be clearer to do this:
my $search_for = '| |';
my $search_in = whatever();
if ( 0 == index( $search_in, $search_for ) ) {
print "found '$search_for' at start of string.\n";
}
You might also want to look at quotemeta when you want to use a literal in a regexp.
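A brief sketch of quotemeta with the literal from this question; the sample lines are made up. quotemeta backslash-escapes every non-word character, so | and the space are matched literally; writing /^\Q$search_for\E/ inside the pattern has the same effect.

```perl
use strict;
use warnings;

my $search_for = '| |';

# quotemeta turns '| |' into '\|\ \|', safe to interpolate into a pattern.
my $literal = quotemeta $search_for;

for my $line ('| | Good', '|| Bad', 'x | | Bad') {
    print "$line\n" if $line =~ /^$literal/;
}
```

Only the first sample line is printed.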
Remove the ^ and the double backslashes. The ^ anchors the match to the beginning of the string. Since you're looking for all matches in one string, that's probably not what you want.
m/\|\s\|/
What about:
m/^\|\s*\|/
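To compare the anchored variants suggested in these answers, here is a small test over hypothetical sample lines (similar to the __DATA__ lines in an earlier answer). A literal space accepts only a space, \s accepts exactly one whitespace character such as a tab, and \s* also accepts two adjacent pipes.

```perl
use strict;
use warnings;

# Hypothetical sample lines.
my @lines = ("| | space", "|\t| tab", "|| none", " | | leading space");

my @patterns = (
    [ 'one space'      => qr/^\| \|/   ],
    [ 'one whitespace' => qr/^\|\s\|/  ],
    [ 'any whitespace' => qr/^\|\s*\|/ ],
);

for my $p (@patterns) {
    my ($name, $re) = @$p;
    my @hits = grep { $_ =~ $re } @lines;
    printf "%-15s %d of %d lines\n", $name, scalar @hits, scalar @lines;
}
```

None of the anchored patterns match the line with a leading space.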