how to replace a string with a dynamic string - regex

Case 1.
I have a string of letters like fthhdtrhththjgyhjdtygbh. Using a regex, I want to change it to ftxxxxxxxxxxxxxxxxxxxxx, i.e., keep the first two letters and replace the rest with x.
After a lot of googling, I achieved this:
s/^(\w\w)(\w+)/$1 . "x" x length($2)/e;
Case 2.
I have a string of letters like sdsABCDEABCDEABCDEABCDEABCDEsdf. Using a regex, I want to change it to sdsABCDExyxyxyABCDEsdf, i.e., keep the first and last ABCDE and replace each ABCDE in the middle with xy.
I achieved this:
s/ABCDE((ABCDE)+)ABCDE/$len = length($1)\/5; ABCDE."xy"x $len . ABCDE/e;
Problem: I am not happy with my solutions to the two cases above. Is there a better or neater solution?
Constraint: Only one regex may be used per case.
Sorry for the poor English in the title and the body of the question; English isn't my first language. Please ask in the comments if anything is not clear.

Task 1: Simplify the password hider regex
Use a Positive Lookbehind Assertion to replace all word characters preceded by two other word characters. This removes the need for the /e Modifier:
my $str = 'fthhdtrhththjgyhjdtygbh';
$str =~ s/(?<=\w{2})\w/x/g;
print $str;
Outputs:
ftxxxxxxxxxxxxxxxxxxxxx
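For comparison, the same lookbehind idea can be sketched in Python. This is not part of the original answer, and the helper name mask_after_prefix is made up for illustration. Python's re module only accepts fixed-width lookbehinds, which \w{2} satisfies.

```python
import re

def mask_after_prefix(text: str) -> str:
    """Replace every word character that has two word characters right before it."""
    # (?<=\w{2}) is a fixed-width lookbehind; it checks the two preceding
    # characters without consuming them, so they are never replaced.
    return re.sub(r"(?<=\w{2})\w", "x", text)

masked = mask_after_prefix("fthhdtrhththjgyhjdtygbh")
print(masked)
```

The lookbehind is evaluated against the original string, so earlier replacements do not affect later matches.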
Task 2: Translate inner repeated pattern regex
Use both a Positive Lookbehind and Lookahead Assertion to replace all ABCDE that are bookended by the same string:
my $str = 'sdsABCDEABCDEABCDEABCDEABCDEsdf';
$str =~ s/(?<=(ABCDE))\1(?=\1)/xy/g;
print $str, "\n";
Output:
sdsABCDExyxyxyABCDEsdf
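A rough Python equivalent of the lookaround approach, as a sketch only. The function name replace_inner and its parameters are hypothetical, and the unit is spelled out three times in the pattern instead of using a backreference into the lookbehind.

```python
import re

def replace_inner(text: str, unit: str = "ABCDE", filler: str = "xy") -> str:
    """Replace each `unit` that has another `unit` directly before and after it."""
    u = re.escape(unit)
    # Both lookarounds consume nothing, so the first and last copies of
    # `unit` are only checked, never replaced.
    return re.sub(f"(?<={u}){u}(?={u})", filler, text)

result = replace_inner("sdsABCDEABCDEABCDEABCDEABCDEsdf")
print(result)
```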

One regex, with less redundancy, using \1 to refer to the first captured group:
s|(ABCDE)\K (\1+) (?=\1)| "xy" x (length($2)/length($1)) |xe;
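The computed-replacement idea behind the /e modifier can be approximated in Python with a callback passed to re.sub. This is a sketch under assumptions: the name collapse_middle is invented here, and because Python's re has no \K, the first unit is captured and written back instead.

```python
import re

def collapse_middle(text: str, unit: str = "ABCDE", filler: str = "xy") -> str:
    """Keep the first and last `unit` of a run and replace each inner one."""
    u = re.escape(unit)
    # Group 1: the first unit (kept). Group 2: the run of inner units.
    # The lookahead leaves the last unit unconsumed.
    pattern = re.compile(f"({u})((?:{u})+)(?={u})")

    def repl(m):
        # One filler per inner unit, computed from the length of the run.
        count = len(m.group(2)) // len(unit)
        return m.group(1) + filler * count

    return pattern.sub(repl, text)

collapsed = collapse_middle("sdsABCDEABCDEABCDEABCDEABCDEsdf")
print(collapsed)
```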

Related

Matching string between first and last parentheses

I need to pick up all texts between the first and last parentheses but am having a hard time with regex.
This is what I have so far, but I'm stuck and don't know how to proceed further.
/(\w+)\((.*?)\)\s/g)
But it stops at the first ")" that it sees.
Sample:
(me)
(mine)
((me) and (you))
Desired output is
me
mine
(me) and (you)
Your code is almost correct; it would work if you did not add the ? to the regex. For example (I have also removed a couple of things):
/\w+\((.*)\)/
Since you want to capture all the text between the first and last parentheses, you shouldn't use a non-greedy quantifier. You can use this regex, which uses lookarounds and the greedy .* to capture everything between the first ( and the last ).
(?<=\().*(?=\))
EDIT: Another alternative solution
The same data can also be extracted with the following regex, which uses no lookahead or lookbehind. Some regex flavors do not support lookarounds, so this may be useful in those situations.
^\((.*)\)$
Here ^\( matches the opening parenthesis at the start of the line, then (.*) greedily consumes the text into the first capture group and stops only at the last ) before the end of the line.
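The greedy capture can be tried out with a short Python sketch. The helper name between_outer_parens is hypothetical, and the pattern is left unanchored here so it also works when text surrounds the parentheses.

```python
import re

def between_outer_parens(line: str):
    """Return the text between the first '(' and the last ')', or None."""
    # Greedy .* runs to the end of the line, then backtracks to the last ')'.
    m = re.search(r"\((.*)\)", line)
    return m.group(1) if m else None

samples = ["(me)", "(mine)", "((me) and (you))"]
extracted = [between_outer_parens(s) for s in samples]
print(extracted)
```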
Here's a non-regex solution. Since you want the absolute first and last instances of fixed substrings, index and rindex find the right positions that you can feed to substr:
#!/usr/bin/perl
use v5.10;
while( <DATA> ) {
chomp;
my $start = 1 + index $_, '(';
my $end = rindex $_, ')';
my $s = substr $_, $start, ($end - $start);
say "Read: $_";
say "Extracted: $s";
}
__END__
(me)
(mine)
((me) and (you))
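The same index/rindex idea translates to Python's str.find and str.rfind. The following is only a sketch; the function name and its default arguments are assumptions made for illustration.

```python
def between_first_and_last(line: str, open_ch: str = "(", close_ch: str = ")"):
    """Slice out the text between the first open_ch and the last close_ch."""
    start = line.find(open_ch)    # first '(' or -1 if absent
    end = line.rfind(close_ch)    # last ')' or -1 if absent
    if start == -1 or end <= start:
        return None
    return line[start + 1:end]

found = [between_first_and_last(s) for s in ["(me)", "(mine)", "((me) and (you))"]]
print(found)
```

Unlike the Perl version, this sketch returns None when either delimiter is missing instead of extracting a partial string.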
A non-regex way with chop() and reverse()
$string='((me) and (you))';
chop($string);
$string = reverse($string);
chop($string);
$string = reverse($string);
print $string;
Output:
(me) and (you)
DEMO: http://tpcg.io/MhaLed

Pattern match any alphanumeric text before - in the string

I am trying to get every alphanumeric word in a string that comes directly before a hyphen (-), for example:
earth-green, random-stuff, coffee-stuff, another-tag
I want to match earth, random, coffee and another.
I tried the following regex:
(\w*[^][\-])
However, it matches earth- random- coffee- another-
This DEMO shows the situation. There you may notice that earth- random- coffee- another- are highlighted, while I don't want to include the hyphen in the highlighting.
This is a good use case for a positive lookahead.
You can use a regex like this:
(\w+)(?=-)
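As a quick check of the lookahead, here is a small Python sketch using the sample string from the question (the variable names are illustrative only):

```python
import re

tags = "earth-green, random-stuff, coffee-stuff, another-tag"
# \w+ must be followed by '-', but the lookahead keeps '-' out of the match.
before_hyphen = re.findall(r"\w+(?=-)", tags)
print(before_hyphen)
```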
On the other hand, the problem in your regex was that you were putting the hyphen and ^ within the capturing group:
(\w*[^][\-])
^---^---- Here (btw... you don't need [^])
You could use this one instead:
(\w+)-
You can just add a word boundary and - to bookmark what you want:
\b(\w+)-
>>> import re
>>> x = 'earth-green, random-stuff, coffee-stuff, another-tag'
>>> re.compile(r'(\w+)-\w+').findall(x)
['earth', 'random', 'coffee', 'another']
There are a lot of good examples with diverse use cases in the regex HOWTOs.
You can match it like this.
my $string = "earth-green, random-stuff, coffee-stuff, another-tag";
while ($string =~ m/[\w]*-/g)
{
my $temp = $&;
$temp =~s/-//;
print "$temp\n";
}
Hope this helps.

regular expression to match strings with decimals

I'm trying to create a regex which will do the following:
Name description: "QUARTERLY PATCH FOR XAQE (JUL 2013 - 11.2.0.3.20) : (125546467)"
Val version : 11.2.0.3.4
In order to output:
"Name, 11.2.0.3.20"
"Val, 11.2.0.3.4"
I have created the following regex: /^([\w]+).*([\d\.\d]+).*/, but it is only matching the last number in the 2nd group, i.e. in 11.2.0.3.4 it will only match 4. Could anyone help?
Also, there could be more than the two lines given above, so it needs to account for arbitrary lines where the version number could be anywhere in the line.
You can use a one-liner for this as well:
perl -lne '/(\w+).*?(\d+(\.\d+)+)/; print "$1, $2"' <filename>
__END__
Name, 11.2.0.3.20
Val, 11.2.0.3.4
If you are only planning for the output and not doing any processing over the captured groups, then this will do:
$str =~ s/([\n\r]|^)(Name|Val).*?(\d+(\.\d+)+).*/$1"$2, $3"/g;
Your problem is that .* is greedy and will consume as much as it can while the pattern still matches. One solution is to make it lazy: .*?
Also [\d\.\d]+ means match one of \d, \. and \d, so it's the same as [\d.]+ which isn't what you want since it would match "2013" in the first line. \d+(\.\d+)+ is more suitable.
After those 2 changes you have:
^([\w]+).*?(\d+(\.\d+)+).*
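To see the corrected pattern at work, here is a minimal Python sketch run against the two sample lines from the question. The constant name VERSION_LINE is made up, and a non-capturing group is used for the repeated part so the version lands in group 2.

```python
import re

# Lazy .*? stops at the first place where a dotted number can start.
VERSION_LINE = re.compile(r"^(\w+).*?(\d+(?:\.\d+)+)")

lines = [
    'Name description: "QUARTERLY PATCH FOR XAQE (JUL 2013 - 11.2.0.3.20) : (125546467)"',
    "Val version : 11.2.0.3.4",
]
pairs = []
for line in lines:
    m = VERSION_LINE.search(line)
    if m:
        pairs.append(f"{m.group(1)}, {m.group(2)}")
print(pairs)
```

Note that 2013 is skipped because \d+(?:\.\d+)+ requires at least one dot followed by digits.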

Perl search and replace the last character occurrence

I have what I thought would be an easy problem to solve but I am not able to find the answer to this.
How can I find and replace the last occurrence of a character in a string?
I have a string: GE1/0/1 and I would like it to be: GE1/0:1 <- This can be variable length so no substrings please.
Clarification:
I am looking to replace the last / with a : no matter what comes before or after it.
use strict;
use warnings;
my $a = 'GE1/0/1';
(my $b = $a) =~ s{(.*)/}{$1:}xms;
print "$b\n";
I use the greedy behaviour of .*
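The same greedy trick can be sketched in Python. The helper name replace_last_slash is hypothetical, and re.DOTALL stands in for Perl's /s so that . also matches newlines.

```python
import re

def replace_last_slash(text: str) -> str:
    # Greedy (.*) grabs as much as it can, so the '/' that follows it in the
    # pattern can only be the last slash in the string.
    return re.sub(r"(.*)/", r"\1:", text, count=1, flags=re.DOTALL)

port = replace_last_slash("GE1/0/1")
print(port)
```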
Perhaps I have not understood the problem with variable length, but I would do the following:
You can match what you want with the regex :
(.+)/
So, this Perl script
my $text = 'GE1/0/1';
$text =~ s|(.+)/|$1:|;
print 'Result : '.$text;
will output :
Result : GE1/0:1
Because the '+' quantifier is greedy by default, (.+) consumes as much as possible, so the / that follows it in the pattern can only match the last slash character.
Hope this is what you were asking.
This finds a slash and looks ahead to make sure there are no more slashes after it:
Raw regex:
/(?=[^/]*$)
I think the code would look something like this, but perl isn't my language:
$string =~ s!/(?=[^/]*$)!\:!g;
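For what it's worth, the lookahead version can be checked with a few lines of Python (a sketch only; the function name is invented for this example):

```python
import re

def replace_last_slash_lookahead(text: str) -> str:
    # Match a '/' that is followed only by non-slash characters up to the end.
    return re.sub(r"/(?=[^/]*$)", ":", text)

converted = replace_last_slash_lookahead("GE1/0/1")
print(converted)
```

Only one slash can satisfy the lookahead, so no count limit is needed.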
"last occurrence in a string" is slightly ambiguous. The way I see it, you can mean either:
"Foo: 123, yada: GE1/0/1, Bar: null"
Meaning the last occurrence in the "word" GE1/0/1, or:
"GE1/0/1"
As a complete string.
In the latter case, it is a rather simple matter, you only have to decide how specific you can be in your regex.
$str =~ s{/(\d+)$}{:$1};
Is perfectly fine, assuming the last character(s) can only be digits.
In the former case, which I don't think you are referring to, but I'll include anyway, you'd need to be much more specific:
$str =~ s{(\byada:\s+\w+/\w+)/(\w+\b)}{$1:$2};

How to return the first five digits using Regular Expressions

How do I return the first 5 digits of a string of characters in Regular Expressions?
For example, if I have the following text as input:
15203 Main Street
Apartment 3 63110
How can I return just "15203".
I am using C#.
This isn't really the kind of problem that's ideally solved by a single-regex approach -- the regex language just isn't especially meant for it. Assuming you're writing code in a real language (and not some ill-conceived embedded use of regex), you could do perhaps (examples in perl)
# Capture all the digits into an array
my @digits = $str =~ /(\d)/g;
# Then take the first five and put them back into a string
my $first_five_digits = join "", @digits[0..4];
or
# Copy the string, removing all non-digits
(my $digits = $str) =~ tr/0-9//cd;
# And cut off all but the first five
$first_five_digits = substr $digits, 0, 5;
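The first approach (collect the digits, then join the first five) might look like this in Python. This is an illustrative sketch using the sample input from the question, not the answerer's code.

```python
import re

address = "15203 Main Street\nApartment 3 63110"
# Collect every digit in order, then join the first five back into a string.
first_five = "".join(re.findall(r"\d", address)[:5])
print(first_five)
```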
If for some reason you really are stuck doing a single match, and you have access to the capture buffers and a way to put them back together, then wdebeaum's suggestion works just fine, but I have a hard time imagining a situation where you can do all that, but don't have access to other language facilities :)
It would depend on your regex flavor and coding language (C#, Perl, etc.), but in C# you'd do something like
string rX = @"\D+";
input = Regex.Replace(input, rX, "");
return input.Substring(0, 5);
Note: I'm not sure about that regex (others here may have a better one), but basically, since a regex by itself only matches patterns and doesn't "replace" anything, you'd match any non-digit characters, replace each match with your language's version of the empty string (string.Empty or "" in C#), and then grab the first 5 characters of the resulting string.
You could capture each digit separately and put them together afterwards, e.g. in Perl:
$str =~ /(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)/;
$digits = $1 . $2 . $3 . $4 . $5;
I don't think a regular expression is the best tool for what you want.
Regular expressions are to match patterns... the pattern you are looking for is "a(ny) digit"
Your logic external to the pattern is "five matches".
Thus, you either want to loop over the first five digit matches, or capture five digits and merge them together.
But look at that Perl example -- that's not one pattern -- it's one pattern repeated five times.
Can you do this via a regular expression? Just like parsing XML -- you probably could, but it's not the right tool.
Not sure this is best solved by regular expressions since they are used for string matching and usually not for string manipulation (in my experience).
However, you could make a call to:
strInput = Regex.Replace(strInput, @"\D+", "");
to remove all non-digit characters and then just return the first 5 characters.
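The strip-then-truncate approach is easy to try in Python as well. The function name first_five_digits is an assumption for this sketch.

```python
import re

def first_five_digits(text: str) -> str:
    # Drop every run of non-digit characters, leaving only the digits.
    digits_only = re.sub(r"\D+", "", text)
    # Slicing never raises, even when fewer than five digits are present.
    return digits_only[:5]

zip_like = first_five_digits("15203 Main Street\nApartment 3 63110")
print(zip_like)
```

Note that this takes the first five digits wherever they occur, so "Apt 3, 63110" would give 36311 and not 63110.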
If you are wanting just a straight regex expression which does all this for you I am not sure it exists without using the regex class in a similar way as above.
A different approach -
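# Copy over
$temp = $str;
# Remove all non-digits (/g is needed to remove every one, not just the first)
$temp =~ s/\D//g;
# Get the first 5 digits, exactly; the parentheses capture them into $1.
$temp =~ /(\d{5})/;
# Grab the match. ASSUMES that there will be a match.
$first_digits = $1;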
$result =~ s/^(\d{5}).*/$1/s;
This matches text that starts with exactly five digits (\d{5}) followed by any amount of anything (.*) and replaces it all with $1, which is whatever the parentheses captured, i.e. the first five digits. The /s modifier lets . match newlines, so text on later lines is removed too.
If you want the first 5 characters of any kind:
$result =~ s/^(.{5}).*/$1/s;
Use whatever programming language you are working in to evaluate this, e.g.:
regex.replace(text, "^(.{5}).*", "$1");
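As one concrete instance of the generic call above, here is a Python sketch. It uses the sample input from the question and passes re.DOTALL so that .* also swallows the second line; those choices are assumptions, not part of the original answer.

```python
import re

text = "15203 Main Street\nApartment 3 63110"
# Keep the five leading digits and discard everything after them.
# re.DOTALL lets .* run across the newline so the whole input is replaced.
leading = re.sub(r"^(\d{5}).*", r"\1", text, flags=re.DOTALL)
print(leading)
```

If the text does not start with five digits, the pattern does not match and the input is returned unchanged.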