Cycle through regex and replace found instance - regex

I currently have the following:
# Pre-append "$" to variable names.
# ['"](?:[^'"]*?(?:\\")*)*["'] Matches strings within double or single quotes.
# (*SKIP)(*F) Causes the preceding pattern to fail. Tries to match the pattern on the right side of the | operator using the remaining strings.
my $temp = $entire_line;
while ($temp =~ /['"](?:[^'"]*?(?:\\")*)*["'](*SKIP)(*F)|([A-Za-z0-9_]+)/g){
my $variable_name = $1;
$entire_line =~ s/$variable_name/\$$variable_name/;
}
Given $entire_line = ((factor0 + factor1) * factor2) + factor0
I would like my output to be:
(($factor0 + $factor1) * $factor2) + $factor0
However, I'm getting:
(($$factor0 + $factor1) * $factor2) + factor0
I know this is happening because it is finding the first instance of factor0 twice. Is there a good way to prevent this from happening and replace the instance that is being found?
Also do I need to use the $temp variable?
Thanks for your help.

(\w+)
Use this. Replace with $$1.
See demo.
http://regex101.com/r/qC9cH4/17

The long regex is not finding the first factor0 twice. It's the simple regex in the substitution that does. In order to get that to work, you need to make sure it doesn't find the ones that start with a $.
$entire_line =~ s/([^\$])$variable_name/$1\$$variable_name/;
You can just use $entire_line with that solution and get rid of $temp, but it's very confusing in general. If this is production code, I suggest you add comments to the code and also to the regex by using the /x flag. Your future self will thank you later.
Check your regex here: http://regex101.com/r/vX0aJ9/1
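One way to sidestep the double replacement altogether is to search and replace in a single pass, so each match is rewritten exactly where it was found. The following is a minimal sketch of that idea in Python rather than Perl. The names TOKEN and add_sigils are hypothetical, and the quoted-string pattern is a simplified stand-in for the one in the question.

```python
import re

# Alternative 1 ("quoted"): a single- or double-quoted string, backslash escapes allowed.
# Alternative 2 ("name"): a run of word characters, using the same class as the question.
TOKEN = re.compile(
    r"""(?P<quoted>(['"])(?:\\.|(?!\2).)*\2)|(?P<name>[A-Za-z0-9_]+)"""
)


def add_sigils(line):
    """Prefix '$' to every name outside quoted strings, in one pass."""
    def replace(match):
        if match.group("name") is None:
            # A quoted string matched: return it unchanged.
            return match.group(0)
        return "$" + match.group("name")

    return TOKEN.sub(replace, line)


print(add_sigils("((factor0 + factor1) * factor2) + factor0"))
```

Because the replacement happens at the position of each match, a name that occurs twice is handled twice, and no second copy of the line is needed. In Perl the same single-pass idea should be expressible as one s///g over $entire_line.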

Related

Make a regular expression in perl to grep value work on a string with different endings

I have this code in perl where I want to extract the value of 'EUR_AF', in this case '0.39'.
Sometimes 'EUR_AF' ends with ';', sometimes it doesn't.
Alternatively, 'EUR_AF' may end with '=0' instead of '=0.39;' or '=0.39'.
How do I make the code handle that? Can't seem to find it online...I could of course wrap everything in an almost endless if-elsif-else statement, but that seems overkill.
Example text:
AVGPOST=0.9092;AN=2184;RSQ=0.5988;ERATE=0.0081;AC=144;VT=SNP;THETA=0.0045;AA=A;SNPSOURCE=LOWCOV;LDAF=0.0959;AF=0.07;ASN_AF=0.05;AMR_AF=0.10;AFR_AF=0.11;EUR_AF=0.039
Code: $INFO =~ m/\;EUR\_AF\=(.*?)(;)/
I did find that: $INFO =~ m/\;EUR\_AF\=(.*?0)/ handles the cases of EUR_AF=0, but how to handle alternative scenarios efficiently?
Extract one value:
my ($eur_af) = $s =~ /(?:^|;)EUR_AF=([^;]*)/;
my ($eur_af) = ";$s" =~ /;EUR_AF=([^;]*)/;
Extract all values:
my %rec = split(/[=;]/, $s);
my $eur_af = $rec{EUR_AF};
This regex should work for you: (?<=EUR_AF=)\d+(\.\d+)?
It means
(?<=EUR_AF=) - match only at a position preceded by EUR_AF=
\d+(\.\d+)? - one or more digits, optionally followed by a decimal point and more digits
EDIT: I originally wanted the whole regex to return the correct result, not only the capture group. If you want the correct capture group, edit it to (?<=EUR_AF=)(\d+(?:\.\d+)?)
I have found the answer. The code:
$INFO =~ m/(?:^|;)EUR_AF=([^;]*)/
seems to handle the cases where EUR_AF=0 and EUR_AF=0.39, ending with or without ;. The captured value in $1 will be 0 or 0.39.
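The two approaches above (match one field with [^;]*, or split the whole record into key/value pairs) can be sketched as follows. This is a Python illustration, not the Perl from the answers; the function names eur_af and parse_info are hypothetical, and the sample strings are shortened versions of the example text.

```python
import re


def eur_af(info):
    """Return the EUR_AF value as a string, or None if the field is absent."""
    # (?:^|;) anchors the key at the start of the record or right after a ';'.
    # [^;]* stops at the next ';' or at the end of the string, whichever comes first.
    match = re.search(r"(?:^|;)EUR_AF=([^;]*)", info)
    return match.group(1) if match else None


def parse_info(info):
    """Split a 'KEY=VALUE;KEY=VALUE' record into a dict, skipping empty fields."""
    return dict(field.split("=", 1) for field in info.split(";") if "=" in field)


print(eur_af("AF=0.07;EUR_AF=0.39;ASN_AF=0.05"))
print(eur_af("AF=0.07;EUR_AF=0"))
print(parse_info("AF=0.07;EUR_AF=0.39;")["EUR_AF"])
```

Using [^;]* instead of .*? followed by a literal ; is what makes the trailing semicolon optional: the capture ends at the delimiter if there is one and at the end of the string otherwise.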

Powershell regex replace only first hit

I'm trying to use a regular expression to replace the first character after a single hit, while using PowerShell.
No matter how I try, I can't seem to make it work. Here's what I'm talking about:
Code:
$info = 'AB/F/*ZXCVBN/MTF/ ---'
$regex = [REGEX]'/*'
$regex.Replace($info,"/C",1)
$regex
Output:
/CAB/F/*ZXCVBN/MTF/ ---
I'm simply trying to replace the /F in the expression with /C, but it fails every time.
I'm using /* since I don't really know what character I will find after the first /, but that's what I want to replace at the end of the day.
I'm pretty sure this will be pretty simple but, as you can see, I'm just not familiar enough with regular expressions.
Ok, rather than just a comment I guess I'll add an answer. You can use a negative lookbehind to make sure that there are no /'s before what you are matching, so it will only match the first one. Also, as Noah stated the * is not a wildcard, . is. This will match any / plus 1 character that does not have another / anywhere before it in the string:
"(?<!/.*)/."
So in context to your code, it would look like this:
$info = 'AB/F/*ZXCVBN/MTF/ ---'
$regex = [REGEX]"(?<!/.*)/."
$regex.Replace($info,"/C",1)
Those lines will output:
AB/C/*ZXCVBN/MTF/ ---
Edit: RegEx broken down at RegEx101: http://regex101.com/r/tI7oN1/1
$info = 'AB/F/*ZXCVBN/MTF/ ---'
$regex = [REGEX]'^([^/]*)/[a-zA-Z]'
$regex.Replace($info,'$1/C',1)  # single quotes so PowerShell does not expand $1 as a variable
$regex
^([^/]*) - this looks for anything but slashes at the beginning, captured in a group
/[a-zA-Z] - then a slash followed by a letter
The replacement puts back whatever was matched by the first group, and adds /C
$info = 'AB/F/*ZXCVBN/MTF/ ---'
$regex = [REGEX]'/.'
$regex.Replace($info,"/C",1)
Or simply:
$info -replace '^(.*?)/.','$1/C'
If you don't want to use the regex, you can accomplish the same thing with -split and -join:
$info = 'AB/F/*ZXCVBN/MTF/ ---'
$info -split '/.',2 -join '/C'
AB/C/*ZXCVBN/MTF/ ---
The ,2 will stop the split after the first match (2 elements). Then re-join the elements with /C.
You are misunderstanding * in regex. It is not a wildcard character.
The star in regex means match zero or more of the preceding item in the expression.
The wildcard in regex is actually a period, which matches any single character.
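The difference between /* and /. can be seen in a short experiment. The sketch below uses Python's re module instead of the .NET regex class from the question, so the calls differ, but the star and dot mean the same thing in both; the variable names are hypothetical.

```python
import re

info = "AB/F/*ZXCVBN/MTF/ ---"

# '/*' means "zero or more slashes", so it matches the empty string at
# position 0 and the replacement is inserted at the very start.
star_result = re.sub(r"/*", "/C", info, count=1)

# '/.' means "a slash followed by any one character"; count=1 limits the
# replacement to the first such pair.
dot_result = re.sub(r"/.", "/C", info, count=1)

# Same result without a replacement call: split once on the first '/.'
# and join the two pieces with the new text.
split_result = "/C".join(re.split(r"/.", info, maxsplit=1))

print(star_result)
print(dot_result)
print(split_result)
```

The first result reproduces the unexpected output from the question, while the other two replace only the first slash-plus-character pair.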

Pattern matching in Perl

I am doing a pattern match for the names below:
ABCD123_HH1
ABCD123_HH1_K
Now, my code to grep the above names is below:
($name, $kind) = $dirname =~ /ABCD(\d+)\w*_([\w\d]+)/;
The problem I am facing is that $dirname can hold either pattern, ABCD123_HH1 or ABCD123_HH1_K. However, my variable $kind is not set correctly for ABCD123_HH1_K. It works for the ABCD123_HH1 pattern.
I appreciate your time. Could you please tell me what can be done to capture the pattern with _K?
You need to add the _K part to the end of your regex and make it optional with ?:
/ABCD(\d+)_([\w\d]+(_K)?)/
I also erased the \w*, which is useless and keeps you from correctly getting the HH1_K.
You should check for zero or more occurrences of _K.
* in Perl's regexp means zero or more times
+ means one or more times.
Hence in your regexp, append (_K)*. The greedy \w* before the underscore also has to go; otherwise it consumes _HH1 and leaves only K for the capture group.
Finally, your regexp should be this:
/ABCD(\d+)_([\w\d]+(_K)*)/
\w includes letters, numbers as well as underscores.
So you can use something as simple as this:
/ABCD\w+/
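To see why the greedy \w* in the question's pattern matters, the two variants can be compared side by side. This is a Python sketch, not the original Perl; ORIGINAL and FIXED are hypothetical names, and the behaviour of \w, \d, * and + shown here is the same in both languages.

```python
import re

# Pattern from the question: \w* is greedy and \w also matches '_',
# so it swallows "_HH1" and only the last segment reaches the second group.
ORIGINAL = re.compile(r"ABCD(\d+)\w*_([\w\d]+)")

# Without \w*, the second group starts right after the first underscore.
# \w already covers digits and '_', so \w+ is enough for "HH1_K".
FIXED = re.compile(r"ABCD(\d+)_(\w+)")

for dirname in ("ABCD123_HH1", "ABCD123_HH1_K"):
    print(dirname, ORIGINAL.search(dirname).groups(), FIXED.search(dirname).groups())
```

With the original pattern the second name yields K as the kind, while the pattern without \w* yields HH1_K.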

remove up to _ in perl using regex?

How would I go about removing all characters before a "_" in perl? So if I had a string that was "124312412_hithere" it would replace the string as just "hithere". I imagine there is a very simple way to do this using regex, but I am still new to dealing with that so I need help here.
Remove all characters up to and including "_":
s/^[^_]*_//;
Remove all characters before "_":
s/^[^_]*(?=_)//;
Remove all characters before "_" (assuming the presence of a "_"):
s/^[^_]*//;
This is a bit more verbose than it needs to be, but it is probably more valuable for you to see what's going on:
my $astring = "124312412_hithere";
my $find = "^[^_]*_";
my $replace = "_";
$astring =~ s/$find/$replace/;
print $astring;
Also, there's a bit of conflicting requirements in your question. If you just want hithere (without the leading _), then change it to:
$astring =~ s/$find//;
I know it's slightly different than what was asked, but in cases like this (where you KNOW the character you are looking for exists in the string) I prefer to use split:
$str = '124312412_hithere';
$str = (split (/_/, $str, 2))[1];
Here I am splitting the string into parts, using the '_' as a delimiter, but to a maximum of 2 parts. Then, I am assigning the second part back to $str.
There's still a regex in this solution (the /_/) but I think this is a much simpler solution to read and understand than regexes full of character classes, conditional matches, etc.
You can try this:
$_ = "124312412_hithere";
s/^[^_]*_//;
print $_; # hithere
Note that this will also remove the _ (as I infer from your sample output). If you want to keep the _ (it is unclear from your first statement which you want), you would need to use a look-ahead as in @ikegami's answer.
Also, just to make it a little more clear, any substitution and matching in regex is applied by default on $_. So, you don't need to bind it to $_ explicitly. That is implied.
So, s/^[^_]*_//; is essentially the same as $_ =~ s/^[^_]*_//;, but the latter is not really required.
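The variants discussed in this thread differ mainly in whether the underscore itself is removed and in what happens when the string has no underscore at all. A small Python sketch of the same patterns makes that visible; the function names are hypothetical, and the partition-based version is a Python stand-in for the Perl split with a limit of 2.

```python
import re


def strip_through_underscore(s):
    """Remove everything up to and including the first '_'."""
    return re.sub(r"^[^_]*_", "", s, count=1)


def strip_before_underscore(s):
    """Remove everything before the first '_' but keep the '_' (look-ahead)."""
    return re.sub(r"^[^_]*(?=_)", "", s, count=1)


def strip_with_partition(s):
    """Regex-free variant: keep what follows the first '_', if there is one."""
    head, sep, tail = s.partition("_")
    return tail if sep else s


for text in ("124312412_hithere", "a_b_c", "nounderscore"):
    print(text, strip_through_underscore(text),
          strip_before_underscore(text), strip_with_partition(text))
```

All three leave a string without an underscore unchanged, which is the case that the plain ^[^_]* pattern does not handle safely.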

Regular expression using powershell

Here is the scenario: I have the lines mentioned below, and I want to extract only the middle part between the two dots.
"scvmm.new.resources" --> This after a regular expression match should return only "new"
"sc.new1.rerces" --> This after a regular expression match should return only "new1"
My basic requirement is to extract whatever is between the two dots; anything can come before and after them.
(.*).<required code>.(.*)
Could anyone please help me out?
You can do that without using regex. Split the string on '.' and grab the middle element:
PS> "scvmm.new.resources".Split('.')[1]
new
Or this
'scvmm.new.resources' -replace '.*\.(.*)\..*', '$1'
Like this:
([regex]::Match("scvmm.new1.resources", '(?<=\.)([^\.]*)(?=\.)' )).value
You don't actually need regular expressions for such a trivial substring extraction. Like Shay's Split('.') one can use IndexOf() for similar effect like so,
$s = "scvmm.new.resources"
$l = $s.IndexOf(".")+1
$r = $s.IndexOf(".", $l)
$s.Substring($l, $r-$l) # Prints new
$s = "sc.new1.rerces"
$l = $s.IndexOf(".")+1
$r = $s.IndexOf(".", $l)
$s.Substring($l, $r-$l) # Prints new1
This looks for the first occurrence of a dot. Then it looks for the first occurrence of a dot after the first hit. Then it extracts the characters between the two locations. This is useful in, say, scenarios in which the separation characters are not the same (though the Split() way would work in many cases too).
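The three strategies from this thread (split on the dot, a look-behind/look-ahead regex, and index arithmetic) can be compared in one place. The sketch below is in Python rather than PowerShell, with hypothetical function names; str.split, re.search and str.index play the roles of Split(), [regex]::Match and IndexOf().

```python
import re


def middle_by_split(s):
    """Split on '.' and take the second element."""
    return s.split(".")[1]


def middle_by_regex(s):
    """Match the text that sits between two dots, or return None."""
    match = re.search(r"(?<=\.)[^.]*(?=\.)", s)
    return match.group(0) if match else None


def middle_by_index(s):
    """Find the first dot, then the next dot after it, and slice between them."""
    left = s.index(".") + 1
    right = s.index(".", left)
    return s[left:right]


for text in ("scvmm.new.resources", "sc.new1.rerces"):
    print(middle_by_split(text), middle_by_regex(text), middle_by_index(text))
```

The index-based version is the easiest to adapt when the two delimiters differ, since each lookup can search for a different character.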