Multibyte String replacement (Only if completely String) [duplicate] - regex

I'm stumped by a situation where I need to match a whole string against a regular expression, rather than checking whether the pattern occurs somewhere in the string.
Suppose I have the regular expression
/\\^^\\w+\\$^/
What I want is for the code to run through various strings, compare each one with the regular expression, and perform some task only if the string starts and ends with a ^.
Examples
^hello world^ is a match
my ^hello world^ should not be a match
The PHP function preg_match matches both of these strings.
Any clues?

Anchor the ends.
/^...$/

Here is a way to do the job:
$strs = array('^hello world^', 'my ^hello world^');
foreach ($strs as $str) {
    echo $str, preg_match('/^\^.*\^$/', $str) ? "\tmatch\n" : "\tdoesn't match\n";
}
Output:
^hello world^ match
my ^hello world^ doesn't match

Actually, ^\^\w+\^$ will not match "^hello world^" because you have two words there; the regex is only looking for a single word enclosed by "^"s.
What you are looking for is: ^\^.*\^$
This will match "^^", "^hello world^", "^a very long string of characters^", etc. while not matching "hello ^world^".
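For comparison, here is a minimal Python sketch of the same whole-string check. Python is used only for illustration (the question is about PHP's preg_match), and the helper name is_caret_wrapped is hypothetical.

```python
import re

# Same idea as ^\^.*\^$ : a literal ^, anything, then a literal ^.
CARET_WRAPPED = re.compile(r"\^.*\^")

def is_caret_wrapped(s: str) -> bool:
    # fullmatch() only succeeds if the pattern spans the entire string,
    # so no explicit ^ and $ anchors are needed here.
    return CARET_WRAPPED.fullmatch(s) is not None

for s in ["^hello world^", "my ^hello world^", "^^"]:
    print(repr(s), is_caret_wrapped(s))
```

Using search() instead of fullmatch() would report a match for both sample strings, which is the behaviour the question complains about.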

You can use the regex:
^\^[\w\s]+\^$
^ is a regex meta-character which is used as start anchor. To match a literal ^ you need to escape it as \^.
So we have:
^ : Start anchor
\^: A literal ^
[\w\s]+ : One or more word or whitespace characters (space-separated words).
\^: A literal ^
$ : End anchor.

Another pattern is ^\^[^\^]*\^$ if you want to match "^hello world^" but not "hello ^world^". Use \^[^\^]*\^ (without the anchors) if you want to match "^hello world^" and also the ^world^ part of "hello ^world^".
For Will: ^\^.*\^$ also matches "^hello^wo^rld^", which I think isn't correct.
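The difference between the two patterns can be checked with a short Python sketch (the variable names greedy and strict are only illustrative):

```python
import re

greedy = re.compile(r"\^.*\^")       # allows further carets inside
strict = re.compile(r"\^[^\^]*\^")   # forbids carets between the two ends

s = "^hello^wo^rld^"
print(bool(greedy.fullmatch(s)))   # whole-string match with .*
print(bool(strict.fullmatch(s)))   # whole-string match with [^\^]*

# Without anchors, the strict pattern finds the wrapped part of a longer string.
print(strict.findall("hello ^world^"))
```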

Try
/^\^\s*(\w+\s*)+\^$/

Related

Regex: match a string at start or after some special characters

I'm using Java Pattern class to find a string "keyword" which is at the beginning of the string or after a character that is in a list of characters. For example, the list of characters is ' ' and '<', then:
match:
"keyword..."
"...<keyword..."
"... keyword..."
not match:
"...akeyword..."
I've tried all these:
"[^ <]keyword"
"[ <^]keyword"
"[\\^ <]keyword" (note: in a Java/C# string literal the backslash needs to be escaped)
This question is similar to "Match only at string start or after whitespace", but with only basic regex skills I can't adapt it to this problem. I've tried:
"(?<!\\S<)keyword"
"(?<!([\\S<]))keyword"
This seems to be a very basic problem, so there may be a very easy and clear way.
This should work: (^|[< ])keyword
(...|...) contains ^ and [< ], stating that the keyword must either be at the start of the string or come after a < or a space.
You could use an alternation | in a non capturing group (?:^|[ <]) to assert either the start of the string ^ or match a space or < in a character class and use a capturing group for keyword.
(?:^|[ <])(keyword)\b
Or you could use a positive lookbehind (?<=...) and match only keyword
(?<=^|[< ])keyword\b
(^keyword |[< ^]keyword)
Write in the square brackets the character you need.
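The alternation and lookbehind approaches above can be tried out with a small Python sketch. One assumption to flag: Python's built-in re module requires fixed-width lookbehind, so (?<=^|[< ]) is expected to be rejected there. The sketch uses the equivalent negative lookbehind (?<![^ <]) instead, which reads "not preceded by anything other than a space or <".

```python
import re

# Alternation: start of string, or a space / '<' right before the keyword.
alt = re.compile(r"(?:^|[ <])(keyword)\b")
# Negative lookbehind: matches only "keyword" itself, no leading character.
look = re.compile(r"(?<![^ <])keyword\b")

samples = ["keyword...", "...<keyword...", "... keyword...", "...akeyword..."]
for s in samples:
    print(repr(s), bool(alt.search(s)), bool(look.search(s)))
```

The alternation version consumes the preceding space or <, so the keyword is read from group 1. The lookbehind version matches the keyword alone.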

Replace unmatched regex by given input

I have a string like below.
update comment for line OBC-1234:Message is this
I wanted to match OBC-1234:Message is this out of the above string.
Regex I used is \w*-\d+:(\w+\s?)+
The tool I work with has only one function, which replaces the text matched by a regex with some input parameter.
That means it will first match the regex from the string and will replace it by given input.
But my requirement is to replace the unmatched string by the given input.
The output should be like below
update comment for line input
I know it can be done through negation, but I don't know how to use it for a bigger string. Please help.
You can do:
/(.*?)(?:[A-Z\d-]+:[\w ]+)$/\1New Addition/
                   ^ words and ' ' to end of line
                  ^ literal :
         ^ character class for OBC-1234 pattern
      ^ Non capturing group
 ^ Capture to the LH of description
If the OBC-1234 is more concrete, you can do:
/(.*?)(?:[A-Z]+-\d+:[\w ]+)$/\1New Addition/
to be more specific.
Use:
Find: ^(.*?)\w*-\d+:\w+(?:\s+\w+)*
Replace: $1NEW STRING
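As a rough illustration of the find/replace pair above, here is the same substitution in Python. The asker's tool is unknown, so this only demonstrates the regex itself; "input" stands in for the tool's input parameter.

```python
import re

line = "update comment for line OBC-1234:Message is this"

# Group 1 lazily captures everything before the ticket reference,
# so the replacement can put it back and swap only the rest.
pattern = re.compile(r"^(.*?)\w*-\d+:\w+(?:\s+\w+)*")
result = pattern.sub(r"\g<1>input", line)
print(result)
```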

What is the regex to match exactly an alphanumeric 16 character string?

Here is a regex string I need to use but I only want it to match exactly 16 alphanumeric characters not the 16 within a longer string.
[A-Z]{6}[0-9]{2}[A-E,H,L,M,P,R-T][0-9]{2}[A-Z0-9]{5}
It matches PLDTLL47S04L424T and MRTMTT25D09F205Z perfectly. But what I don't want it to match is a substring in the middle of a long string like this:
FA4127E57FE52E49BC1FEEECC32E1246530EE1C#BL2PRD9301MB014.024d.mgd.msft.net
Thanks in advance!
You didn't say which regex flavor you're using, but the issue is that you're missing start and end anchors.
Add ^ and $ to your regex as such:
^[A-Z]{6}[0-9]{2}[A-E,H,L,M,P,R-T][0-9]{2}[A-Z0-9]{5}$
^ means match at the start of a string, or the point after any newline in multiline mode.
$ means the opposite: the end of a string, or the point before the newline in multiline mode.
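A short Python sketch of the effect of the anchors, using the strings from the question. One labeled assumption: the commas inside the character class are dropped here, since inside [...] they would match a literal comma.

```python
import re

# The question's pattern, with the commas removed from the character class.
CODE = r"[A-Z]{6}[0-9]{2}[A-EHLMPR-T][0-9]{2}[A-Z0-9]{5}"
long_s = "FA4127E57FE52E49BC1FEEECC32E1246530EE1C#BL2PRD9301MB014.024d.mgd.msft.net"

unanchored = re.search(CODE, long_s)            # may match a substring
anchored = re.search("^" + CODE + "$", long_s)  # must span the whole string

print(unanchored.group() if unanchored else None)
print(anchored)
for good in ("PLDTLL47S04L424T", "MRTMTT25D09F205Z"):
    print(good, bool(re.search("^" + CODE + "$", good)))
```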
In addition to my predecessors:
Assuming that you want a match if and only if the line starts with something that matches your pattern, either the anchor ^ or the word boundary \b at the start will do.
Under that assumption, however, ending the pattern with the anchor $ and/or \b is NOT correct, because the line continues after the part that matches.
See some example code:
#!/usr/bin/perl -w
my @tests = qw/
AAAAAA00A00AAAAA49BC1FEEECC32E1246530EE1C#BL2PRD9301MB014.024d.mgd.msft.net
0AAAAAA00A00AAAAA49BC1FEEECC32E1246530EE1C#BL2PRD9301MB014.024d.mgd.msft.net
/;
foreach my $test (@tests) {
    if ( $test =~ /^([A-Z]{6}[0-9]{2}[A-EHLMPR-T][0-9]{2}[A-Z0-9]{5})/ ) {
        print "$1 matches\n";
    } else {
        print "NO MATCH\n";
    }
}
generates output:
marc:tmp marc$ perl test.pl
AAAAAA00A00AAAAA matches
NO MATCH
if you change the pattern to
if ( $test =~ /^([A-Z]{6}[0-9]{2}[A-EHLMPR-T][0-9]{2}[A-Z0-9]{5}$)/ ) {
the result is:
marc:tmp marc$ perl test.pl
NO MATCH
NO MATCH
You can use boundary matchers to match the beginnings and ends of lines, strings, words, or other things. What is available depends on your flavour of regex. The start- and end-of-string/input matchers are pretty universal.
^[A-Z]{6}[0-9]{2}[A-E,H,L,M,P,R-T][0-9]{2}[A-Z0-9]{5}$
Again depending on the flavour of regex you are using, you can also use POSIX character classes to match alphanumerics, such as \p{Alpha} and \p{Digit}. This will simplify your regex a bit.
You should use ^ and $ to bound the regex
You can use word boundaries \b for this purpose:
\b[A-Z]{6}[0-9]{2}[A-E,H,L,M,P,R-T][0-9]{2}[A-Z0-9]{5}\b
^                                                     ^
Edit: I used word boundaries rather than the start ^ and end $ anchors because I am assuming you just want to avoid matches inside a longer token, and that your real input looks like your sample string but with spaces around the codes.
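A minimal Python sketch of the word-boundary variant. The sentence in text is a made-up example, and the commas are again left out of the character class.

```python
import re

CODE = r"[A-Z]{6}[0-9]{2}[A-EHLMPR-T][0-9]{2}[A-Z0-9]{5}"
bounded = re.compile(r"\b" + CODE + r"\b")

text = "id PLDTLL47S04L424T ok"  # hypothetical input with spaces around the code
long_s = "FA4127E57FE52E49BC1FEEECC32E1246530EE1C#BL2PRD9301MB014.024d.mgd.msft.net"

print(bounded.findall(text))   # the code surrounded by spaces is found
print(bounded.search(long_s))  # nothing inside the long token qualifies
```

Unlike ^ and $, \b still allows other text on the same line, as long as the code is not glued to neighbouring letters, digits, or underscores.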
You may try this regex, which requires exactly 16 alphanumeric characters with at least one digit and one letter: ^(?=.*[0-9])(?=.*[a-zA-Z])[a-zA-Z0-9]{16}$

Ignoring Whitespace with Regex(perl)

I am using Perl Regular expressions.
How would I go about ignoring whitespace and still test whether a string matches?
For example:
$var = " hello "; # I want $var to ignore whitespace and still match
if($var =~ m/hello/)
{
}
What you have there should match just fine. The regex matches any occurrence of the pattern hello, so as long as it sees "hello" somewhere in $var, it will match.
On the other hand, if you want to be strict about what you ignore, you should anchor your string from start to end
if($var =~ m/^\s*hello\s*$/) {
}
and if you have multiple words in your pattern
if($var =~ m/^\s*hello\s+world\s*$/) {
}
\s* matches zero or more whitespace characters, and \s+ matches one or more. Without the /m modifier, ^ matches the beginning of the string, and $ matches the end of the string (or just before a trailing newline).
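The same distinction between "found somewhere" and "the whole string, ignoring surrounding whitespace" can be sketched in Python (used here only for illustration; the thread is about Perl):

```python
import re

var = "   hello   "

# Unanchored: succeeds if "hello" occurs anywhere in the string.
found_anywhere = bool(re.search(r"hello", var))

# Whole string: only whitespace may surround the word.
strict = re.compile(r"\s*hello\s*")
two_words = re.compile(r"\s*hello\s+world\s*")

print(found_anywhere)
print(bool(strict.fullmatch(var)))
print(bool(strict.fullmatch("  hello there ")))
print(bool(two_words.fullmatch(" hello   world ")))
```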
As others have said, Perl matches anywhere in the string, not the whole string. I found this confusing when I first started, and I still get caught out. I try to teach myself to think about whether I need to look at the start of the line, the whole string, and so on.
Another useful tip is use \b. This looks for word breaks so /\bbook\b/ matches
"book. "
"book "
"-book"
but not
"booking"
"ebook"
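The \b examples above behave the same way in Python's re module, as this small sketch shows:

```python
import re

word = re.compile(r"\bbook\b")

samples = ["book. ", "book ", "-book", "booking", "ebook"]
for s in samples:
    # \b sits between a word character and a non-word character (or a string edge).
    print(repr(s), bool(word.search(s)))
```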
This regex is slightly off-topic, but it is useful if you want to collapse every run of whitespace in your string into a single space before passing it through the if:
s/[\h\v]+/ /g;
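A rough Python counterpart of that substitution, assuming \s is an acceptable stand-in for Perl's [\h\v] (the helper name collapse_whitespace is hypothetical):

```python
import re

def collapse_whitespace(s: str) -> str:
    # Replace each run of whitespace (spaces, tabs, newlines) with one space.
    return re.sub(r"\s+", " ", s)

print(repr(collapse_whitespace("  hello \t\n world  ")))
```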
/^\shello\s$/

regex to check string is certain length

I am trying to write a regex to match pairs of cards (AA, KK, QQ ... 22) and I have the regex ([AKQJT2-9])\1. The problem I have is that this regex will match AA as well as AAbc etc. Is there a way to write the regex such that I can specify I want to match ([AKQJT2-9])\1 and only that (i.e. no more characters after).
Enclose the regex in ^ and $:
^([AKQJT2-9])\1$
^ is the "start-of-string" anchor, and $ is the "end-of-string" anchor. If your regex flavor supports it, \A and \Z might be an even better choice since ^ and $ can also match start/end of a line in a multiline string, depending on your regex engine and configuration.
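A small Python sketch of the point about anchors and trailing newlines. In Python, re.fullmatch plays the role of strict start- and end-of-string anchoring, while $ also matches just before a newline at the very end of the string.

```python
import re

PAIR = re.compile(r"([AKQJT2-9])\1")

print(bool(PAIR.search("AAbc")))     # unanchored: finds AA inside AAbc
print(bool(PAIR.fullmatch("AAbc")))  # whole string required
print(bool(PAIR.fullmatch("AA")))

# $ tolerates one trailing newline; fullmatch does not.
with_dollar = re.search(r"^([AKQJT2-9])\1$", "KK\n")
print(bool(with_dollar), bool(PAIR.fullmatch("KK\n")))
```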
You mean, like this ?
^([AKQJT2-9])\1$
It will only match if the string is "AA", "KK", …
If you want to capture both characters, but not the rest of the string, you'll have to use another pair of parentheses:
my ($match, $unused) = $string =~ /(([AKQJT2-9])\2)/; # in Perl
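The nested-group idea, sketched in Python for comparison: the outer group holds the whole pair, and the backreference has to point at the inner group, which is now number 2.

```python
import re

# Group 1 = the pair, group 2 = the single rank that must repeat.
m = re.fullmatch(r"(([AKQJT2-9])\2)", "QQ")

pair = m.group(1) if m else None
rank = m.group(2) if m else None
print(pair, rank)
```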