Replace a given substring before the digits - regex

In a string like /p20 (can be any number) i want to replace the /p into /pag- but keep the number /pag-20
This is what i tried:
preg_replace('/\/p+[0-9]/', '/pag-', $string);
but the result is /pag-0

Use a capturing group and a backreference:
$string = "/p-20";
echo preg_replace('~/p-([0-9])~', '/pag-$1', $string);
^^^^^^^ ^^
Here, /p- matches a literal substring and ([0-9]) matches and captures any 1 digit into Group 1 that can be referred to with $1 backreference from the replacement pattern.
Alternatively, you may use a lookahead based solution:
preg_replace('~/p-(?=[0-9])~', '/pag-', $string);
See the PHP demo
Here, no backreference is necessary as the (?=[0-9]) positive lookahead does not consume text it matches, i.e. it does not add the text to the match value.

Related

regular expression

How do I write a regular expression to get that returns only the letters and numbers without the asterisks in between ?
You could use a regex replacement here:
my $var = 'RMRIV43069411**2115.82';
$var =~ s/^.*?\D(\d+(?:\.\d+)*)$/$1/g;
print "$var"; // 2115.82
The idea is to capture the final number in the string, and then replace with only that captured quantity.
Here is an explanation of the pattern:
^ from the start of the input
.*? consume all content up until
\D the first non digit character, which is followed by
(\d+(?:\.\d+)*) match AND capture: a number, with optional decimal component,
occurring before
$ the end of the input
Then, we place with just this captured number, which is available in $1.

Non-Capturing and Capturing Groups - The right way

I'm trying to match an array of elements preceeded by a specific string in a line of text. For Example, match all pets in the text below:
fruits:apple,banana;pets:cat,dog,bird;colors:green,blue
/(?:pets:)(\w+[,|;])+/g**
Using the given regex I only could match the last word "bird"
Can anybody help me to understand the right way of using Non-Capturing and Capturing Groups?
Thanks!
First, let's talk about capturing and non-capturing group:
(?:...) non-capturing version, you're looking for this values, but don't need it
() capturing version, you want this values! You're searching for it
So:
(?:pets:) you searching for "pets" but don't want to capture it, after that point, you WANT to capture (if I've understood):
So try (?:pets:)([a-zA-Z,]+); ... You're searching for "pets:" (but don't want it !) and stop at the first ";" (and don't want it too).
Result is :
Match 1 : cat,dog,bird
A better solution exists with 1 match == 1 pet.
Since you want to have each pet in a separate match and you are using PCRE \G is, as suggested by Wiktor, a decent option:
(?:pets:)|\G(?!^)(\w+)(?:[,;]|$)
Explanation:
1st Alternative (?:pets:) to find the start of the pattern
2nd Alternative \G(?!^)(\w+)(?:[,;]|$)
\G asserts position at the end of the previous match or the start of the string for the first match
Negative Lookahead (?!^) to assert that the Regex does not match at the start of the string
(\w+) to matches the pets
Non-capturing group (?:[,;]|$) used as a delimiter (matches a single character in the list ,; (case sensitive) or $ asserts position at the end of the string
Perl Code Sample:
use strict;
use Data::Dumper;
my $str = 'fruits:apple,banana;pets:cat,dog,bird;colors:green,blue';
my $regex = qr/(?:pets:)|\G(?!^)(\w+)(?:[,;]|$)/mp;
my #result = ();
while ( $str =~ /$regex/g ) {
if ($1 ne '') {
#print "$1\n";
push #result, $1;
}
}
print Dumper(\#result);

Match a string between multiple whitespaces

Hello can anyone help me with a regex to match a string between multiple whitespaces
My string may look like this :
This is just Nicolas-764 sdh and his sister
I want to match Nicolas-764 sdh
So far I wrote this but it matches all the string after the first whitespaces
if ($string =~ m/(just) {5,}(.*) {5,}/) {
print "$1\n";
print "$2\n";
}
I want to create a hash that will have as key just and as value Nicolas-764 sdh.
I don't want to just match a string between multiple spaces. I need to use just too
You're suffering from greedy matching .*.
You simply need to change to non-greedy matching using .*?.
use strict;
use warnings;
my $string = 'This is just Nicolas-764 sdh and his sister';
if ($string =~ m/just\s{5,}(.*?)\s{5,}/) {
print "$1\n";
}
Outputs:
Nicolas-764 sdh
Your code would be,
if ($string =~ m/^.*?just {5,}(\S+)\s+(\S+) {5,}.*$/) {
print "$1\n";
print "$2\n";
}
First group contains Nicolas-764 and the second group contains sdh
DEMO
Or
You could try the below regex also,
^.*?just {5,}(\S+(?:\s\S+)*?) {5,}.*$
Explanation:
^ Asserst that we are at the start of the line.
.*?just This would match upto the first just string. ? after * does a non-greedy match.
{5,} Matches 5 or more spaces.
() Capturing groups.
\S+ One or more non-space characters.
(?:) Non-capturing groups. It won't capture anything. Just matching would be done.
(?:\s\S+)*? Matches a space followed by one or more non-space characters. And the whole would occur zero or more times.
{5,} Matches 5 or more spaces.
.* Matches any character zero or more times.
$ Asserts that we are at the end of the line.

Regular expression Capture and Backrefence

Here's the string I'm searching.
T+4ACCGT+12CAAGTACTACCGT+12CAAGTACTACCGT+4ACCGA+6CTACCGT+12CAAGTACTACCGT+12CAAGTACTACCG
I want to capture the digits behind the number for X digits (X being the previous number) I also want to capture the complete string.
ie the capture should return:
+4ACCG
+12AAGTACTACCGT
etc.
and :
ACCG
AAGTACTACCGT
etc.
Here's the regex I'm using:
(\+(\d+)([ATGCatgcnN]){\2});
and I'm using $1 and $3 for the captures.
What am I missing ?
You can not use a backreference in a quantifier. \1 is a instruction to match what $1 contains, so {\1} is not a valid quantifier. But why do you need to match the exact number? Just match the letters (because the next part starts again with a +).
So try:
(\+\d+([ATGCatgcnN]+));
and find the complete match in $1 and the letters in $2
Another problem in your regex is that your quantifier is outside your third capturing group. That way only the last letter would be in the capturing group. Place the quantifier inside the group to capture the whole sequence.
You can also remove the upper or lower case letters from your class by using the i modifier to match case independent:
/(\+\d+([ATGCN]+))/gi
This loop works because the \G assertion tells the regex engine to begin the search after the last match , (digit(s)), in the string.
$_ = 'T+4ACCGT+12CAAGTACTACCGT+12CAAGTACTACCGT+4ACCGA+6CTACCGT+12CAAGTACTACCGT+12CAAGTACTACCG';
while (/(\d+)/g) {
my $dig = $1;
/\G([TAGCN]{$dig})/i;
say $1;
}
The results are
ACCG
CAAGTACTACCG
CAAGTACTACCG
ACCG
CTACCG
CAAGTACTACCG
CAAGTACTACCG
I think this is correct but not sure :-|
Update: Added the \G assertion which tells the regex to begin immediately after the last matched number.
my #sequences = split(/\+/, $string);
for my $seq (#sequences) {
my($bases) = $seq =~ /([^\d]+)/;
}

Insertion with Regex to format a date (Perl)

Suppose I have a string 04032010.
I want it to be 04/03/2010. How would I insert the slashes with a regex?
To do this with a regex, try the following:
my $var = "04032010";
$var =~ s{ (\d{2}) (\d{2}) (\d{4}) }{$1/$2/$3}x;
print $var;
The \d means match single digit. And {n} means the preceding matched character n times. Combined you get \d{2} to match two digits or \d{4} to match four digits. By surrounding each set in parenthesis the match will be stored in a variable, $1, $2, $3 ... etc.
Some of the prior answers used a . to match, this is not a good thing because it'll match any character. The one we've built here is much more strict in what it'll accept.
You'll notice I used extra spacing in the regex, I used the x modifier to tell the engine to ignore whitespace in my regex. It can be quite helpful to make the regex a bit more readable.
Compare s{(\d{2})(\d{2})(\d{4})}{$1/$2/$3}x; vs s{ (\d{2}) (\d{2}) (\d{4}) }{$1/$2/$3}x;
Well, a regular expression just matches, but you can try something like this:
s/(..)(..)(..)/$1/$2/$3/
#!/usr/bin/perl
$var = "04032010";
$var =~ s/(..)(..)(....)/$1\/$2\/$3/;
print $var, "\n";
Works for me:
$ perl perltest
04/03/2010
I always prefer to use a different delimiter if / is involved so I would go for
s| (\d\d) (\d\d) |$1/$2/|x ;