Regex match and replace word delimited by certain characters - regex

I need a little help with regex to match and replace
<comma|dash|fullstop|questionmark|exclamation mark|space|start-of-string>WORD<comma|dash|fullstop|space|end-of-string>
I need to find a specific WORD (case insensitive) which
is preceded by: comma or dash or fullstop or question mark or exclamation mark or space or start-of-string
and is followed by: comma or dash or fullstop or question mark or exclamation mark or space or end-of-string
test string:
MATCH me, yes please,MATCH me but dontMATCHme!MATCH me and of course MATCH, and finally MATCH
I want to REPLACE all matches with another string in PHP, so I probably need to use preg_replace or something similar?

Try this
$input = "MATCH me, yes please,MATCH me but dontMATCHme!MATCH me and of course MATCH, and finally MATCH";
echo($input."<br/>");
$result = preg_replace("/
(?:^ # Match the start of the string
|(?<=[-,.?! ])) # OR look if there is one of these chars before
match # The searched word
(?=([-,.?! ]|$)) # look that one of those chars or the end of the line is following
/imx", # Case independent, multiline and extended
"WORD", $input);
echo($result);

This is not doing exactly what you asked for, but possibly fulfills your actual requirements better (which I'm guessing to be "Replace MATCH with WORD only if MATCH is an entire word, and not part of a different word"):
$input = 'MATCH me, yes please,MATCH me but dontMATCHme!MATCH me and of course MATCH, and finally MATCH';
$result = preg_replace('/\bMATCH\b/i', "WORD", $input);
\b is a word boundary anchor: it matches only at the start or end of a run of word characters (letters, digits and underscore).
Result:
WORD me, yes please,WORD me but dontMATCHme!WORD me and of course WORD, and finally WORD
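To see where \b and the explicit delimiter list agree and where they diverge, here is a minimal sketch using Python's re module rather than the PHP in the answers. The variable names are made up for illustration, and the "(MATCH)" input is an extra test case that is not in the question.

```python
import re

text = ("MATCH me, yes please,MATCH me but dontMATCHme!MATCH me "
        "and of course MATCH, and finally MATCH")

# Word-boundary version: any non-word character counts as a delimiter.
by_boundary = re.sub(r"\bMATCH\b", "WORD", text, flags=re.IGNORECASE)

# Explicit version: only the delimiters listed in the question count.
explicit = re.compile(r"(?:^|(?<=[-,.?! ]))MATCH(?=[-,.?! ]|$)", re.IGNORECASE)
by_delimiters = explicit.sub("WORD", text)

print(by_boundary)
print(by_boundary == by_delimiters)             # identical on the test string
print(re.sub(r"\bMATCH\b", "WORD", "(MATCH)"))  # parentheses are boundaries too
print(explicit.sub("WORD", "(MATCH)"))          # parentheses are not in the list
```

On the question's test string both approaches give the same result; they only differ when the word is next to a character that is neither a word character nor one of the listed delimiters.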

Here is an implementation in PHP. Note that it replaces every word (each run of \w characters) with "WORD", not just the target word.
<?php
$msg = "MATCH me, yes please,MATCH me but dontMATCHme!MATCH me and of course MATCH, and finally MATCH";
echo($msg."<br/><br/>");
$msg = preg_replace("/(\w)+/", "WORD", $msg);
echo($msg."<br/>");
?>

Related

My regexp has anorexia

I'm trying to get multiple key/value pairs from a string where the key is on the left of an = character and the value is on the right. So the following code
$line = <<END;
names='bob,jane, Alexander the Great' colors = "red,green" test= %results
END
my %hash = ($line =~ m/(\w+)\s*=\s*(.+?)/g);
for (keys %hash) { print "$_: $hash{$_}\n"; }
Should output
names: 'bob,jane, Alexander the Great'
colors: "red,green"
test: %results
But my regexp is just returning the first character of the value like
names: '
colors: "
and so on. If I change the second match to (.+) then it matches the whole line after the first =. Can someone fix this regexp?
That happens because .+? is non-greedy: it stops as soon as it has a match, and since no pattern follows it, a single character is enough. Give it something to stop at:
my %hash = ($line =~ m/(\w+)\s*=\s*(.+?)(?=\h+\w+\h*=|$)/gm);
(?=\h+\w+\h*=|$) is a positive lookahead, which asserts that the match must be followed by
\h+ one or more horizontal spaces.
\w+ one or more word characters.
\h* zero or more horizontal spaces.
= equal symbol.
| OR
$ End of the line anchor.
.+? says match one or more non-newline characters, preferring as few as possible.
You want .+ which matches one or more non-newline characters, preferring as many as possible.
Then it looks like you also need to stop at a matching quote, so
/(\w+)\s*=\s*('.+?'|".+?"|.+)/g
Though if spaces aren't allowed in unquoted values, you want \S+ instead of .+
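The difference between the lazy pattern from the question and the quoted-alternatives pattern can be sketched as follows. This uses Python's re module instead of Perl, assumes unquoted values contain no spaces (so \S+ is the fallback), and the names lazy and fixed are only illustrative.

```python
import re

line = "names='bob,jane, Alexander the Great' colors = \"red,green\" test= %results"

# Lazy (.+?) with nothing after it is satisfied by a single character.
lazy = dict(re.findall(r"(\w+)\s*=\s*(.+?)", line))

# Quoted alternatives first, then a run of non-space characters as the fallback.
fixed = dict(re.findall(r"""(\w+)\s*=\s*('.+?'|".+?"|\S+)""", line))

for key, value in fixed.items():
    print(f"{key}: {value}")
```

With the lazy pattern each value is cut down to its first character, which is the symptom described in the question; the alternation keeps each quoted value whole.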

What is the regex to match exactly an alphanumeric 16 character string?

Here is a regex string I need to use, but I want it to match only a string of exactly 16 alphanumeric characters, not 16 characters within a longer string.
[A-Z]{6}[0-9]{2}[A-E,H,L,M,P,R-T][0-9]{2}[A-Z0-9]{5}
It matches PLDTLL47S04L424T and MRTMTT25D09F205Z perfectly. But what I don't want it to match is a run like FEEECC32E1246530 in the middle of this long string:
FA4127E57FE52E49BC1FEEECC32E1246530EE1C#BL2PRD9301MB014.024d.mgd.msft.net
Thanks in advance!
You didn't say which regex flavor you're using, but the issue is that you're missing start and end anchors.
Add ^ and $ to your regex as such:
^[A-Z]{6}[0-9]{2}[A-E,H,L,M,P,R-T][0-9]{2}[A-Z0-9]{5}$
^ means match at the start of a string, or the point after any newline in multiline mode.
$ means the opposite: the end of a string, or the point before the newline in multiline mode.
In addition to my predecessors:
assuming that you want to match if and only if the line starts with something that matches your pattern, both anchor ^ and word boundary \b will do.
Ending the pattern with the anchor $ and/or \b is, however, NOT correct under that assumption, because the line only has to start with something that matches.
See some example code:
#!/usr/bin/perl -w
my @tests = qw/
AAAAAA00A00AAAAA49BC1FEEECC32E1246530EE1C#BL2PRD9301MB014.024d.mgd.msft.net
0AAAAAA00A00AAAAA49BC1FEEECC32E1246530EE1C#BL2PRD9301MB014.024d.mgd.msft.net
/;
foreach my $test (@tests) {
    # Anchored at the start only; the capture holds the 16-character prefix.
    if ( $test =~ /^([A-Z]{6}[0-9]{2}[A-EHLMPR-T][0-9]{2}[A-Z0-9]{5})/ ) {
        print "$1 matches\n";
    } else {
        print "NO MATCH\n";
    }
}
generates output:
marc:tmp marc$ perl test.pl
AAAAAA00A00AAAAA matches
NO MATCH
if you change the pattern to
if ( $test =~ /^([A-Z]{6}[0-9]{2}[A-EHLMPR-T][0-9]{2}[A-Z0-9]{5}$)/ ) {
the result is:
marc:tmp marc$ perl test.pl
NO MATCH
NO MATCH
You can use boundary matchers to match the beginnings and ends of lines, strings, words or other things. What is available depends on your flavour of regex. The start and end of string/input matchers are pretty universal.
^[A-Z]{6}[0-9]{2}[A-E,H,L,M,P,R-T][0-9]{2}[A-Z0-9]{5}$
Again, depending on the flavour of regex you are using, you can also use POSIX character classes to match alphanumerics with \p{Alpha} and \p{Digit}. This will simplify your regex a bit.
You should use ^ and $ to bound the regex
You can use word boundaries \b for this purpose:
\b[A-Z]{6}[0-9]{2}[A-E,H,L,M,P,R-T][0-9]{2}[A-Z0-9]{5}\b
^                                                     ^
Edit: Word boundaries and not start ^ and end $ anchors because I am assuming you just want to avoid matches as a substring and your patterns are more like your sample string but with spaces
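The three options discussed in these answers (no anchors, ^ and $, and \b) can be compared directly on the long string from the question. This is a minimal sketch in Python's re module; CORE and the other names are hypothetical, and the character class is written without commas, as in the Perl example above, since a comma inside [...] matches a literal comma.

```python
import re

CORE = r"[A-Z]{6}[0-9]{2}[A-EHLMPR-T][0-9]{2}[A-Z0-9]{5}"
long_string = ("FA4127E57FE52E49BC1FEEECC32E1246530EE1C"
               "#BL2PRD9301MB014.024d.mgd.msft.net")

# No anchors: the pattern is free to match inside the longer string.
unanchored = re.search(CORE, long_string)

# ^ and $: the whole string must be exactly the 16-character code.
anchored = re.search("^" + CORE + "$", long_string)

# \b on both sides: the code must not touch other word characters.
bounded = re.search(r"\b" + CORE + r"\b", long_string)

print(unanchored.group(0) if unanchored else None)
print(anchored, bounded)
print(bool(re.search("^" + CORE + "$", "PLDTLL47S04L424T")))
```

Which of the two fixes is right depends on the input: ^ and $ suit a field that holds only the code, while \b suits a code embedded in text and separated by spaces or punctuation.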
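You may try this regex: ^(?=.*[0-9])(?=.*[a-zA-Z])[a-zA-Z0-9]{16}$
The lookaheads require at least one digit and at least one letter; [a-zA-Z0-9]{16} between ^ and $ then fixes the length at exactly 16 characters.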

Need help in matching regexp

I am having a string say
my $str = "FILLER-1-1,EQPT:MN,EQPT_MISSING,NSA,04-30,15-07-13,NEND,NA";
I want to match a pattern say
my $pattern = "FILLER-1-1";
I am using the below regexp
$reg = $str =~ /$pattern/;
This is working fine
Now the problem is it is also matching if our string is
FILLER-1-10/FILLER-1-11/FILLER-1-12 so on ...
I dont want to match this. Also I don't want my regexp to be like
$reg = $str =~ /$pattern\W+/;
That works for the issue above, but a \W character may or may not follow: some strings have one and others do not. So I need a regexp that matches only FILLER-1-1, without relying on \W+, and that does not match FILLER-1-10.
Note: If somebody is doing -(minus) rating to my question, please let me know what's wrong in the code. It will be appreciable if the person write the comment too
Since \w matches [a-zA-Z0-9_], you can use the zero-width assertion \b, which marks a change in \w state (called a "word boundary", hence the "b" shortcut):
/FILLER-1-1\b/
This means that there needs to be a character that differs from the previous word state - a word state change.
It will match
FILLER-1-1.
FILLER-1-1&
FILLER-1-1,
It will not match
FILLER-1-1a
FILLER-1-16
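As a rough illustration of the \b approach when the search term comes from a variable, here is a sketch in Python's re module rather than Perl. re.escape plays the role that \Q...\E would play in Perl, and the sample strings other than the one from the question are invented for the example.

```python
import re

target = "FILLER-1-1"  # search term, as in the question
pattern = re.compile(re.escape(target) + r"\b")

samples = [
    "FILLER-1-1,EQPT:MN,EQPT_MISSING,NSA,04-30,15-07-13,NEND,NA",
    "FILLER-1-10,EQPT:MN",  # invented: a digit follows, so no boundary
    "FILLER-1-1",           # invented: end of string counts as a boundary
    "FILLER-1-1a",          # invented: a letter follows, so no boundary
]

results = {s: bool(pattern.search(s)) for s in samples}
for sample, matched in results.items():
    print(f"{sample!r}: {matched}")
```

Because \b is zero-width, it does not require a following character at all, which addresses the concern that \W "may come or not come".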
If you want to match FILLER at the start of the input (line) followed by two numbers, this simple regex should work:
/^FILLER-\d+-\d+/
^ matches the beginning of the input
\d matches any digit ([0-9])
+ matches at least one, but can match any number
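Use the ? quantifier like so:
/FILLER-\d-\d\W?/
The \W? means a non-word character zero or one time. Note that because \W? is optional, this pattern still matches the start of FILLER-1-10.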

Problems with perl regex

I need a Perl regex to match A.CC3 on a line beginning with something, followed by anything, then my "A.CC3", and then anything...
I am surprised this (text =~ /^\W+\CC.*\A\.CC\[3].*/) is not working
Thanks
\A is an escape sequence that denotes the beginning of the string, much like the ^ at the start of your regex. Remove the backslash to make it match a literal A.
Edit: You also seem to have \C in there. You should only use backslash to escape meta characters such as period ., or to create escape sequences, such as \Q .. \E.
At its simplest, a regex to match A.CC3 would be
$text =~ /A\.CC3/
That's all you need. This will match any string with A.CC3 in it. In the comments you mention the string you are matching is this:
my $text = "//%CC Unused Static Globals, A.CC3, Halstead Progam Volume";
You might want to avoid partial matches, in which case you can use word boundary \b
$text =~ /\bA\.CC3\b/
You might require that a line begins with //%
$text =~ m#^//%.*\bA\.CC3\b#
Of course, only you know which parts of the string should be matched and in what way. "Something followed by anything followed by A.CC3 followed by anything" really just needs the first simple regex.
It doesn't seem like you're trying to capture anything. If that's the case, and all you need to do is find lines that contain A.CC3 then you can simply do
if ( index( $str, 'A.CC3' ) >= 0 ) # Found it...
No need for a regex.
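The options from this answer (plain substring test, escaped dot, word boundaries, required line prefix) can be sketched side by side. This is Python rather than Perl, the in operator stands in for Perl's index, and the inputs "AXCC3" and "AA.CC30" are invented to show the failure cases.

```python
import re

text = "//%CC Unused Static Globals, A.CC3, Halstead Progam Volume"

# Plain substring test: no regex needed if nothing is captured.
contains = "A.CC3" in text

# An unescaped dot matches any character, so escape it for a literal ".".
loose = bool(re.search(r"A.CC3", "AXCC3"))
literal = bool(re.search(r"A\.CC3", "AXCC3"))

# Word boundaries reject partial matches such as "AA.CC30".
whole = bool(re.search(r"\bA\.CC3\b", text))
partial = bool(re.search(r"\bA\.CC3\b", "AA.CC30"))

# Additionally require the line to begin with //%.
prefixed = bool(re.search(r"^//%.*\bA\.CC3\b", text))

print(contains, loose, literal, whole, partial, prefixed)
```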
Try to give this a shot:
^.*?A\.CC.*$
That will match anything until it reaches A, then a literal ., followed by CC, then anything until end of string.
It depends what you want to match. If you want to pull back the whole line in which the A.CC3 pattern occurs then something like this should work:
^.*A\.CC3.*$

Ignoring Whitespace with Regex(perl)

I am using Perl Regular expressions.
How would I go about ignoring whitespace and still test whether a string matches?
For example.
$var = " hello "; # I want $var to ignore whitespace and still match
if($var =~ m/hello/)
{
}
What you have there should match just fine. The regex matches any occurrence of the pattern hello, so as long as it sees "hello" somewhere in $var, it will match.
On the other hand, if you want to be strict about what you ignore, you should anchor your string from start to end
if($var =~ m/^\s*hello\s*$/) {
}
and if you have multiple words in your pattern
if($var =~ m/^\s*hello\s+world\s*$/) {
}
\s* matches 0 or more whitespace, \s+ matches 1 or more white space. ^ matches the beginning of a line, and $ matches the end of a line.
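The contrast between an unanchored search and the strict, anchored patterns can be sketched like this. It uses Python's re module in place of Perl; the variable names and the inputs " hello there " and "hello   world\n" are invented for illustration.

```python
import re

var = "  hello  "

# Unanchored: succeeds as long as "hello" appears anywhere in the string.
found_anywhere = re.search(r"hello", var) is not None

# Anchored: only optional whitespace may surround the word.
strict = re.search(r"^\s*hello\s*$", var) is not None
strict_extra_word = re.search(r"^\s*hello\s*$", " hello there ") is not None

# Two words separated by at least one whitespace character.
two_words = re.search(r"^\s*hello\s+world\s*$", "hello   world\n") is not None

print(found_anywhere, strict, strict_extra_word, two_words)
```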
As others have said, Perl matches anywhere in the string, not the whole string. I found this confusing when I first started and I still get caught out. I try to teach myself to think about whether I need to look at the start of the line, the whole string, etc.
Another useful tip is use \b. This looks for word breaks so /\bbook\b/ matches
"book. "
"book "
"-book"
but not
"booking"
"ebook"
This is slightly off-topic, but if you want to collapse every run of whitespace in your string into a single space before passing it to the if, you can do this:
s/[\h\v]+/ /g;
/^\shello\s$/