What is the regex to match exactly an alphanumeric 16 character string? - regex

Here is a regex I need to use, but I only want it to match a string of exactly 16 alphanumeric characters, not 16 characters within a longer string.
[A-Z]{6}[0-9]{2}[A-E,H,L,M,P,R-T][0-9]{2}[A-Z0-9]{5}
It matches PLDTLL47S04L424T and MRTMTT25D09F205Z perfectly. What I don't want it to match is something like the FEEECC32E1246530 in the middle of this long string:
FA4127E57FE52E49BC1FEEECC32E1246530EE1C#BL2PRD9301MB014.024d.mgd.msft.net
Thanks in advance!

You didn't say which regex flavor you're using, but the issue is that you're missing start and end anchors.
Add ^ and $ to your regex as such:
^[A-Z]{6}[0-9]{2}[A-E,H,L,M,P,R-T][0-9]{2}[A-Z0-9]{5}$
^ means match at the start of a string, or the point after any newline in multiline mode.
$ means the opposite: the end of a string, or the point before the newline in multiline mode.
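To see the effect of the anchors, here is a minimal sketch in Python (the `re` module is used only for illustration, since the question doesn't name a flavor; the variable names are made up for the example). It also drops the commas from the character class, because inside `[...]` a comma is a literal comma, not a separator.

```python
import re

# Core pattern from the question, with the literal commas removed from the class.
CORE = r"[A-Z]{6}[0-9]{2}[A-EHLMPR-T][0-9]{2}[A-Z0-9]{5}"

unanchored = re.compile(CORE)
anchored = re.compile(r"^" + CORE + r"$")

good = "PLDTLL47S04L424T"
long_line = "FA4127E57FE52E49BC1FEEECC32E1246530EE1C#BL2PRD9301MB014.024d.mgd.msft.net"

# Without anchors, the pattern is found inside the longer string.
found_inside = unanchored.search(long_line)

# With anchors, only a string that consists of exactly one code matches.
matches_good = anchored.search(good) is not None
matches_long = anchored.search(long_line) is not None

print(found_inside.group(0), matches_good, matches_long)
```

The unanchored search picks out FEEECC32E1246530 from the long string, while the anchored version accepts the 16-character sample and rejects the long one.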

In addition to the previous answers:
Assuming you want a match if and only if the line starts with something that matches your pattern, either the anchor ^ or a word boundary \b at the start will do.
Under that same assumption, however, ending the pattern with the anchor $ and/or \b is NOT correct, because the matching part may be followed by more characters on the same line.
See some example code:
#!/usr/bin/perl -w
my @tests = qw/
AAAAAA00A00AAAAA49BC1FEEECC32E1246530EE1C#BL2PRD9301MB014.024d.mgd.msft.net
0AAAAAA00A00AAAAA49BC1FEEECC32E1246530EE1C#BL2PRD9301MB014.024d.mgd.msft.net
/;
foreach my $test (@tests) {
    # Anchored at the start only: the match may be followed by more text.
    if ( $test =~ /^([A-Z]{6}[0-9]{2}[A-EHLMPR-T][0-9]{2}[A-Z0-9]{5})/ ) {
        print "$1 matches\n";
    } else {
        print "NO MATCH\n";
    }
}
generates output:
marc:tmp marc$ perl test.pl
AAAAAA00A00AAAAA matches
NO MATCH
If you change the condition to the following (note the added $):
if ( $test =~ /^([A-Z]{6}[0-9]{2}[A-EHLMPR-T][0-9]{2}[A-Z0-9]{5}$)/ ) {
the result is:
marc:tmp marc$ perl test.pl
NO MATCH
NO MATCH

You can use boundary matchers to match the beginning and end of lines, strings, words, or other things. What is available depends on your flavour of regex, but the start- and end-of-input matchers are pretty universal.
^[A-Z]{6}[0-9]{2}[A-E,H,L,M,P,R-T][0-9]{2}[A-Z0-9]{5}$
Again depending on the flavour of regex you are using, you can also use POSIX character classes such as \p{Alpha} and \p{Digit} to match letters and digits. This will simplify your regex a bit.

You should use ^ and $ to bound the regex

You can use word boundaries \b for this purpose:
\b[A-Z]{6}[0-9]{2}[A-E,H,L,M,P,R-T][0-9]{2}[A-Z0-9]{5}\b
^^                                                    ^^
Edit: I suggest word boundaries rather than the start ^ and end $ anchors because I am assuming you only want to avoid matching a substring of a longer token, and that your real input looks like your sample strings but separated by spaces.
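A small Python sketch of the word-boundary approach, under the assumption that the codes appear in running text separated by spaces or punctuation. The sample text is invented for illustration, and the commas are left out of the character class because they would match literal commas.

```python
import re

CORE = r"[A-Z]{6}[0-9]{2}[A-EHLMPR-T][0-9]{2}[A-Z0-9]{5}"
bounded = re.compile(r"\b" + CORE + r"\b")

# Hypothetical input: two standalone codes plus one long token that
# contains a code-like substring in the middle.
text = ("codes: PLDTLL47S04L424T and MRTMTT25D09F205Z, "
        "plus FA4127E57FE52E49BC1FEEECC32E1246530EE1C")

# \b requires a word/non-word transition on both sides, so the
# substring inside the long token is skipped.
codes = bounded.findall(text)
print(codes)
```

Only the two standalone codes are returned; the candidate inside the long token is surrounded by word characters on both sides, so neither \b can match there.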

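You may try this regex: ^(?=.*[0-9])(?=.*[a-zA-Z])[a-zA-Z0-9]{16}$
The {16} has to apply to the character class itself; a quantified group such as ([a-zA-Z0-9]+){16} would also accept longer strings.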
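As a rough check of the lookahead idea, here is a Python sketch that requires at least one digit, at least one letter, and exactly 16 alphanumeric characters. The helper name `is_mixed16` is hypothetical. Note that this is looser than the position-specific pattern in the question.

```python
import re

# Two lookaheads check "contains a digit" and "contains a letter";
# the {16} on the character class fixes the total length.
mixed16 = re.compile(r"^(?=.*[0-9])(?=.*[a-zA-Z])[a-zA-Z0-9]{16}$")

def is_mixed16(s: str) -> bool:
    """Return True if s is exactly 16 alphanumerics with a letter and a digit."""
    return mixed16.match(s) is not None

print(is_mixed16("PLDTLL47S04L424T"))   # 16 chars, letters and digits
print(is_mixed16("PLDTLL47S04L424TX"))  # 17 chars
```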

Related

Multibyte String replacement (Only if completely String) [duplicate]

I'm kind of stumped in a situation where I need to match a whole string against a regular expression, rather than finding whether the pattern exists somewhere in the string.
Suppose I have the regular expression
/\\^^\\w+\\$^/
What I want is for the code to run through various strings, compare each one with the regular expression, and perform some task if the string starts and ends with a ^.
Examples
^hello world^ is a match
my ^hello world^ should not be a match
The PHP function preg_match matches both of them.
Any clues?
Anchor the ends.
/^...$/
Here is a way to do the job:
$strs = array('^hello world^', 'my ^hello world^');
foreach($strs as $str) {
echo $str, preg_match('/^\^.*\^$/', $str) ? "\tmatch\n" : "\tdoesn't match\n";
}
Output:
^hello world^ match
my ^hello world^ doesn't match
Actually, ^\^\w+\^$ will not match "^hello world^" because you have two words there; the regex is only looking for a single word enclosed by "^"s.
What you are looking for is: ^\^.*\^$
This will match "^^", "^hello world^", "^a very long string of characters^", etc. while not matching "hello ^world^".
You can use the regex:
^\^[\w\s]+\^$
^ is a regex meta-character which is used as start anchor. To match a literal ^ you need to escape it as \^.
So we have:
^ : Start anchor
\^: A literal ^
[\w\s]+ : space separated words.
\^: A literal ^
$ : End anchor.
Another pattern is ^\^[^\^]*\^$ if you want to match "^hello world^" but not "hello ^world^", or \^[^\^]*\^ if you want to match "^hello world^" and also the ^world^ in "hello ^world^".
For Will: ^\^.*\^$ also matches "^hello^wo^rld^", which I don't think is correct.
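The difference between the two anchored patterns discussed above can be checked with a short Python sketch (Python is used only for illustration; the same patterns work with PHP's preg_match). The sample list is taken from the strings mentioned in the answers.

```python
import re

greedy = re.compile(r"^\^.*\^$")       # anything between the outer carets
strict = re.compile(r"^\^[^\^]*\^$")   # no further caret allowed in between

samples = ["^hello world^", "my ^hello world^", "^hello^wo^rld^"]

# Map each sample to (matches greedy pattern, matches strict pattern).
results = {s: (bool(greedy.match(s)), bool(strict.match(s))) for s in samples}

for s, (g, t) in results.items():
    print(f"{s!r}: greedy={g} strict={t}")
```

Both patterns reject the string with leading text; only the negated character class also rejects the string with carets in the middle.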
Try
/^\^\s*(\w+\s*)+\^$/

Problems with perl regex

I need a Perl regex to match A.CC3 on a line beginning with something, followed by anything, then my "A.CC3", and then anything...
I am surprised this (text =~ /^\W+\CC.*\A\.CC\[3].*/) is not working
Thanks
\A is an escape sequence that matches at the beginning of the string, much like the ^ at the start of your regex. Remove the backslash to make it match a literal A.
Edit: You also seem to have \C in there. You should only use backslash to escape meta characters such as period ., or to create escape sequences, such as \Q .. \E.
At its simplest, a regex to match A.CC3 would be
$text =~ /A\.CC3/
That's all you need. This will match any string with A.CC3 in it. In the comments you mention the string you are matching is this:
my $text = "//%CC Unused Static Globals, A.CC3, Halstead Progam Volume";
You might want to avoid partial matches, in which case you can use word boundary \b
$text =~ /\bA\.CC3\b/
You might require that a line begins with //%
$text =~ m#^//%.*\bA\.CC3\b#
Of course, only you know which parts of the string should be matched and in what way. "Something followed by anything followed by A.CC3 followed by anything" really just needs the first simple regex.
It doesn't seem like you're trying to capture anything. If that's the case, and all you need to do is find lines that contain A.CC3 then you can simply do
if ( index( $str, 'A.CC3' ) >= 0 ) # Found it...
No need for a regex.
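For comparison, the same three checks (plain substring, word-bounded regex, line prefix) sketched in Python, where the `in` operator plays the role of Perl's index. The last line shows why the dot must be escaped; the string "AxCC3" is an invented counterexample.

```python
import re

text = "//%CC Unused Static Globals, A.CC3, Halstead Progam Volume"

# Plain substring test, no regex involved.
has_literal = "A.CC3" in text

# Regex with an escaped dot and word boundaries to avoid partial matches.
has_word = re.search(r"\bA\.CC3\b", text) is not None

# Additionally require that the line begins with //%.
has_prefix = re.search(r"^//%.*\bA\.CC3\b", text) is not None

# An unescaped dot matches any character, so this also accepts "AxCC3".
unescaped_dot = re.search(r"A.CC3", "AxCC3") is not None

print(has_literal, has_word, has_prefix, unescaped_dot)
```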
Try to give this a shot:
^.*?A\.CC.*$
That will match anything until it reaches A, then a literal ., followed by CC, then anything until end of string.
It depends what you want to match. If you want to pull back the whole line in which the A.CC3 pattern occurs then something like this should work:
^.*A\.CC3.*$

Replace specific capture group instead of entire regex in Perl

I've got a regular expression with capture groups that matches what I want in a broader context. I then take capture group $1 and use it for my needs. That's easy.
But how to use capture groups with s/// when I just want to replace the content of $1, not the entire regex, with my replacement?
For instance, if I do:
$str =~ s/prefix (something) suffix/42/
prefix and suffix are removed. Instead, I would like something to be replaced by 42, while keeping prefix and suffix intact.
As I understand, you can use look-ahead or look-behind that don't consume characters. Or save data in groups and only remove what you are looking for. Examples:
With look-ahead:
s/your_text(?=ahead_text)//;
Grouping data:
s/(your_text)(ahead_text)/$2/;
If you only need to replace one capture, then using @LAST_MATCH_START and @LAST_MATCH_END (with use English; see perldoc perlvar) together with substr might be a viable choice:
use English qw(-no_match_vars);
$your_string =~ m/aaa (bbb) ccc/;
substr $your_string, $LAST_MATCH_START[1], $LAST_MATCH_END[1] - $LAST_MATCH_START[1], "new content";
# replaces "bbb" with "new content"
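The same offset-based idea can be sketched in Python, where `m.start(1)` and `m.end(1)` play the role of the start and end offsets of the first capture group. The helper `replace_group` is a hypothetical name for this example, not a library function.

```python
import re

def replace_group(pattern: str, text: str, replacement: str, group: int = 1) -> str:
    """Replace only the text captured by `group`, keeping the rest of the match."""
    m = re.search(pattern, text)
    # No match, or the group did not participate: return the text unchanged.
    if m is None or m.start(group) == -1:
        return text
    return text[:m.start(group)] + replacement + text[m.end(group):]

print(replace_group(r"prefix (something) suffix", "a prefix something suffix b", "42"))
```

This prints "a prefix 42 suffix b": the prefix and suffix take part in the match but are left untouched.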
This is an old question, but I found the approach below easier for changing lines that start with >something to >something_else. It is good for renaming the headers of FASTA sequences:
while ($filelines=~ />(.*)\s/g){
unless ($1 =~ /else/i){
$filelines =~ s/($1)/$1\_else/;
}
}
I use something like this:
s/(?<=prefix)(group)(?=suffix)/$1 =~ s|text|rep|gr/e;
Example:
In the following text I want to normalize the whitespace but only after ::=:
some text := a b c d e ;
Which can be achieved with:
s/(?<=::=)(.*)/$1 =~ s|\s+| |gr/e
Results with:
some text := a b c d e ;
Explanation:
(?<=::=): Look-behind assertion to match ::=
(.*): Everything after ::=
$1 =~ s|\s+| |gr: With the captured group normalize whitespace. Note the r modifier which makes sure not to attempt to modify $1 which is read-only. Use a different sub delimiter (|) to not terminate the replacement expression.
/e: Treat the replacement text as a perl expression.
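A rough Python equivalent of this nested substitution, for readers without Perl's /e and /r: `re.sub` accepts a function as the replacement, and that function can run a second substitution on the captured group. The input line is an invented example with runs of spaces after `::=`.

```python
import re

# Hypothetical input with irregular spacing on both sides of "::=".
line = "some   text ::=  a   b    c d   e ;"

# The look-behind keeps "::=" out of the match; the callback collapses
# whitespace runs only in the captured remainder of the line.
result = re.sub(
    r"(?<=::=)(.*)",
    lambda m: re.sub(r"\s+", " ", m.group(1)),
    line,
)

print(result)
```

Everything before `::=` keeps its original spacing, and only the part after it is normalized.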
Use lookaround assertions. Quoting the documentation:
Lookaround assertions are zero-width patterns which match a specific pattern without including it in $&. Positive assertions match when their subpattern matches, negative assertions match when their subpattern fails. Lookbehind matches text up to the current match position, lookahead matches text following the current match position.
If the beginning of the string has a fixed length, you can thus do:
s/(?<=prefix)(your capture)(?=suffix)/$1/
However, ?<= does not work for variable length patterns (starting from Perl 5.30, it accepts variable length patterns whose length is smaller than 255 characters, which enables the use of |, but still prevents the use of *). The work-around is to use \K instead of (?<=):
s/.*prefix\K(your capture)(?=suffix)/$1/

Ignoring Whitespace with Regex(perl)

I am using Perl regular expressions.
How would I go about ignoring whitespace and still test whether a string matches?
For example:
$var = " hello "; # I want $var to ignore whitespace and still match
if($var =~ m/hello/)
{
}
What you have there should match just fine. The regex will match any occurrence of the pattern hello, so as long as it sees "hello" somewhere in $var, it will match.
On the other hand, if you want to be strict about what you ignore, you should anchor your pattern from start to end:
if($var =~ m/^\s*hello\s*$/) {
}
and if you have multiple words in your pattern
if($var =~ m/^\s*hello\s+world\s*$/) {
}
\s* matches 0 or more whitespace, \s+ matches 1 or more white space. ^ matches the beginning of a line, and $ matches the end of a line.
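The difference between the unanchored and the anchored test can be sketched in Python as follows. `re.fullmatch` stands in for wrapping the pattern in ^ and $; the helper name `is_hello` is made up for the example.

```python
import re

def is_hello(s: str) -> bool:
    """True if s is 'hello' with optional surrounding whitespace and nothing else."""
    return re.fullmatch(r"\s*hello\s*", s) is not None

# The unanchored search finds "hello" anywhere in the string.
loose = re.search(r"hello", "say hello") is not None

print(is_hello("  hello  "), is_hello("say hello"), loose)
```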
As others have said, Perl matches anywhere in the string, not the whole string. I found this confusing when I first started, and I still get caught out. I try to train myself to think about whether I need to match at the start of the line, the whole string, and so on.
Another useful tip is to use \b. This looks for word boundaries, so /\bbook\b/ matches
"book. "
"book "
"-book"
but not
"booking"
"ebook"
This regex is a little unrelated, but it is useful if you want to collapse each run of whitespace in your string into a single space before passing it to the if:
s/[\h\v]+/ /g;
/^\shello\s$/

regex to check string is certain length

I am trying to write a regex to match pairs of cards (AA, KK, QQ ... 22) and I have the regex ([AKQJT2-9])\1. The problem I have is that this regex will match AA as well as AAbc etc. Is there a way to write the regex such that I can specify I want to match ([AKQJT2-9])\1 and only that (i.e. no more characters after).
Enclose the regex in ^ and $:
^([AKQJT2-9])\1$
^ is the "start-of-string" anchor, and $ is the "end-of-string" anchor. If your regex flavor supports it, \A and \Z might be an even better choice since ^ and $ can also match start/end of a line in a multiline string, depending on your regex engine and configuration.
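How much the choice of anchor matters depends on the engine, so here is a hedged Python sketch of the cases mentioned above. One caveat: in Python, \Z matches only at the very end of the string, whereas in Perl \Z also allows a trailing newline and \z is the strict form.

```python
import re

PAIR = r"([AKQJT2-9])\1"
text = "AA\nbc"

# In multiline mode, ^ and $ match at line boundaries, so the first line qualifies.
line_anchored = re.search(r"^" + PAIR + r"$", text, re.MULTILINE) is not None

# \A and \Z ignore the multiline flag and refer to the whole string.
string_anchored = re.search(r"\A" + PAIR + r"\Z", text, re.MULTILINE) is not None

# Even without multiline mode, $ also matches just before a final newline.
trailing_newline = re.search(r"^" + PAIR + r"$", "AA\n") is not None

# fullmatch requires the pattern to cover the entire string.
whole = re.fullmatch(PAIR, "AA\n") is not None

print(line_anchored, string_anchored, trailing_newline, whole)
```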
You mean, like this ?
^([AKQJT2-9])\1$
It will only match if the string is "AA", "KK", …
If you want to capture both characters, but not the rest of the string, you'll have to use another pair of parentheses:
($match, $unused) = $string =~ /(([AKQJT2-9])\2)/; # in Perl