matching two chars with multiple lines in between - regex

I am new to regex and I am using Perl.
I have the following tag:
<CFSC>cfsc_service=TRUE
SEC=1
licenses=10
expires=20170511
</CFSC>
I want to match anything between <CFSC> and </CFSC> tags.
I tried /<CFSC>.*?\n.*?\n.*?\n.*?\n<\/CFSC>/
and /<CFSC>(.*)<\/CFSC>/ but had no luck.

You need the /s (single-line) modifier so that . also matches line breaks. From perlre:
Treat string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.
See this example.
use strict;
use warnings;

my $foo = qq{<CFSC>cfsc_service=TRUE
SEC=1
licenses=10
expires=20170511
</CFSC>};

# With /s, . also matches the newlines between the tags
print $1 if $foo =~ m{<CFSC>(.*)</CFSC>}s;
Because the pattern contains a /, you also need to use a different delimiter than / (such as m{...}) or escape the slash as \/.

Try
/<CFSC>(.*)<\/CFSC>/s
The final s makes . match newline characters (\n = 0x0a), which it usually doesn't match:
Treat string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.
(from http://perldoc.perl.org/perlre.html#Modifiers)
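A minimal sketch of what /s changes, using a shortened, hypothetical version of the tag (the variable names are illustrative only). The same pattern fails without /s and captures the whole body with it:

```perl
use strict;
use warnings;

# Hypothetical, shortened version of the tag from the question
my $text = "<CFSC>cfsc_service=TRUE\nSEC=1\n</CFSC>";

# Without /s, . stops at "\n", so the pattern can never reach </CFSC>
my $without_s = $text =~ /<CFSC>(.*)<\/CFSC>/ ? 1 : 0;

# With /s, . also matches "\n", so the whole body is captured
my ($body) = $text =~ /<CFSC>(.*)<\/CFSC>/s;

print "without /s: ", ($without_s ? "match" : "no match"), "\n";
print "with /s: [$body]\n";
```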

Try this:
$foo =~ m/<CFSC>((?:(?!<\/CFSC>).)*)<\/CFSC>/gs;
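Modifiers:
g - match globally (all occurrences, not just the first)
s - let . match newlines as well
i - case-insensitive matching (not needed here, so it isn't used)
The \ in <\/CFSC> is an escape, not a modifier: it stops the / from ending the pattern early.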
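The (?:(?!<\/CFSC>).)* part is a "tempered" dot: it matches one character at a time, but only where </CFSC> does not begin, so a capture can never run past its own closing tag. That matters once a string holds several blocks and /g is used in list context. A small sketch with a hypothetical two-block string, contrasted with a greedy (.*):

```perl
use strict;
use warnings;

# Hypothetical input holding two <CFSC> blocks
my $text = "<CFSC>a=1\nb=2\n</CFSC>\nnoise\n<CFSC>c=3\n</CFSC>\n";

# Tempered dot: each capture stops at its own </CFSC>;
# /g in list context returns one capture per block
my @blocks = $text =~ m/<CFSC>((?:(?!<\/CFSC>).)*)<\/CFSC>/gs;

# Greedy (.*) with /s runs from the first <CFSC> to the LAST </CFSC>
my @greedy = $text =~ m/<CFSC>(.*)<\/CFSC>/gs;

printf "%d blocks, %d greedy match\n", scalar @blocks, scalar @greedy;
```

A non-greedy (.*?) with /s would also stop at the nearest </CFSC>; the tempered form just makes that boundary explicit.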

Related

How do I replace a word with a new line and a word using regex with an empty string in Powershell?

Below is some sample content. I need to delete every use database line and the go that follows it. I'm using PowerShell, with PowerShell ISE as my editor:
use database_instance
go
if condition
You need to match the newline and also any whitespace after it:
/use database_\w+\n\s*\w+/g
$sql = @"
use database_instance
go
if condition
"@
$sql -ireplace 'use\s+\w+_\w+\s*(?:\r?\n)+\s*go' , ''
How this works:
Using -ireplace makes the regex case-insensitive.
Find the word use followed by one or more whitespace \s+ followed by one or more word characters \w+, then an underscore _.
One or more word characters \w+, followed by 0 or more whitespace (just in case)
A non-capturing group (?:...), since we don't need the captured text; it just groups a newline that accounts for both Windows (CRLF) and Unix (LF) line endings. It consists of an optional CR followed by an LF, and the group is matched one or more times.
Followed by 0 or more whitespace \s* then the word go.
Replace it with nothing!
This does leave an empty line behind, but that shouldn't be much of an issue, since the SQL parser won't care.
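PowerShell's -ireplace uses .NET regular expressions, but this particular pattern only uses constructs that behave the same way in Perl. As a rough cross-check, here is a Perl sketch of the same substitution, with /i standing in for -ireplace and /g for the replace-everything default of PowerShell's -replace. The input string is hypothetical (upper-case keywords and a Windows \r\n line ending):

```perl
use strict;
use warnings;

# Hypothetical input: upper-case keywords and a Windows (\r\n) line ending
my $sql = "USE database_instance\r\nGO\nif condition\n";

# Same pattern as the -ireplace call; /i = case-insensitive,
# /g = replace every occurrence
(my $cleaned = $sql) =~ s/use\s+\w+_\w+\s*(?:\r?\n)+\s*go//gi;

print $cleaned;
```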
Note
In your comments you said you tried:
$out -replace "/use database_\w+\n\w+/g"
Be aware that PowerShell does not use /regexhere/ syntax. The forward slashes are treated as literal characters, and so are the flags you appended. The replace is global by default, so you don't need g anyway.

Wildcard at the beginning of a line in Perl

How do I use a wildcard at the beginning of a line?
For example, I want to replace abc with def.
This is what my file looks like:
abc
abc
abc
hg abc
I want abc to be replaced only in the first 3 lines. How do I do it?
$_ =~ s/['\s'] * abc ['\s'] * /def/g;
What condition should I put before the first space?
Thanks
What about:
s/(^ *)abc/$1def/g
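(^ *) -> zero or more spaces at the start of the line, captured so that $1 can put them back
This strictly replaces abc with def and leaves the leading spaces alone.
Also note I've used a real space and not \s because you said "beginning of first space"; \s matches more characters than just the space (tabs and newlines, for example).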
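A quick way to check this answer is to run the substitution over each line. The sample lines below are hypothetical: the question's indentation may not have survived, so leading spaces are assumed on two of them.

```perl
use strict;
use warnings;

# Hypothetical sample lines (leading spaces assumed)
my @lines = ('abc', '  abc', ' abc', 'hg abc');

for (@lines) {
    # (^ *) captures the leading spaces; $1 puts them back in front of def
    s/(^ *)abc/$1def/;
}

print "$_\n" for @lines;
```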
You are making a couple of mistakes in your regex:
$_ =~ s/['\s'] * abc ['\s'] * /def/g;
You don't need /g (global, match as many times as possible) if you only want to replace from the beginning of the string (since that can only match once).
Inside a character class bracket almost every character is literal; the exceptions are ], the backslash (which still introduces escapes such as \s), and - and ^ in certain positions. So ['\s'] means "match whitespace or an apostrophe '" (the second ' is just a duplicate).
Spaces inside the regex are interpreted literally, unless the /x modifier is used (which it is not).
Quantifiers apply to whatever immediately precedes them, so \s* means "zero or more whitespace", but \s * means "exactly one whitespace, followed by zero or more spaces". Again, unless /x is used.
You do not need to supply $_ =~, since that is the variable any regex uses unless otherwise specified.
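Each of the points above can be checked with a few one-line matches; the sample strings are made up for illustration:

```perl
use strict;
use warnings;

# /g is pointless with ^: without /m, ^ only matches at the start of the string
my $count = () = "abc abc" =~ /^abc/g;

# ['\s'] is a class of whitespace OR apostrophe, so a leading ' matches
my $apostrophe_hit = ("'abc" =~ /^['\s']abc/) ? 1 : 0;

# Without /x the spaces are literal: "\s *" needs one whitespace character,
# then zero or more spaces, then a literal space before abc, so "abc" fails
my $literal_spaces = ("abc" =~ /^\s * abc/) ? 1 : 0;

# With /x, unescaped whitespace in the pattern is ignored (this is /^\s*abc/)
my $with_x = ("  abc" =~ /^ \s* abc /x) ? 1 : 0;

print "$count $apostrophe_hit $literal_spaces $with_x\n";   # 1 1 0 1
```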
If you want to replace abc, and only abc when it is the first non-whitespace in a line, you can do this:
s/^\s*\Kabc/def/
An alternative to the \K (keep) escape is to capture the whitespace and put it back:
s/^(\s*)abc/$1def/
If you want to keep the whitespace following the target string abc, you do not need to do anything. If you want it removed, just add \s* at the end:
s/^\s*\Kabc\s*/def/
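To see that the \K form and the capture-and-restore form behave the same, and what the extra trailing \s* changes, here is a sketch over a few hypothetical lines (\K needs Perl 5.10 or later):

```perl
use strict;
use warnings;

my @lines = ("  abc  x", "abc x", "hg abc");   # hypothetical sample lines

my (@keep, @capture, @trim);
for my $line (@lines) {
    # \K: whatever was matched before it (the leading whitespace) is kept
    (my $k = $line) =~ s/^\s*\Kabc/def/;
    # Capture-and-restore alternative
    (my $c = $line) =~ s/^(\s*)abc/$1def/;
    # Trailing \s* also removes the whitespace after abc
    (my $t = $line) =~ s/^\s*\Kabc\s*/def/;
    push @keep, $k;
    push @capture, $c;
    push @trim, $t;
}

print join("|", @keep), "\n", join("|", @trim), "\n";
```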
Also note that this is simply a way to condense logic into one statement. You can also achieve the same by using very simple building blocks:
if (/^\s*abc/) { # if abc is the first non-whitespace
s/abc/def/; # ...substitute it
}
Since the substitution only happens once (when the /g modifier is not used), and only the first match is affected, this will reliably replace that abc with def.
Try this:
$_ =~ s/^['\s'] * abc ['\s'] * /def/g;
If you need to match from the start of a line, use ^.
Also, I am not sure why you have ' and spaces in your regex. This should also work for you:
$_ =~ s/^[\s]*abc[\s]*/def/g;
Use the ^ anchor, and remove the unnecessary apostrophes, spaces and [ ]:
$_ =~ s/^\s*abc/def/g
If you want to keep those spaces that were before the "abc":
$_ =~ s/^(\s*)abc/$1def/g

Regex to remove whatever comes in front of "\" using PowerShell

I need help with a regex that removes a "\" and whatever comes before it.
The input is "vmvalidate\administrator"
and the output should be just "administrator".
$result = $subject -creplace '^[^\\]*\\', ''
This removes any non-backslash characters at the start of the string, together with the backslash that follows them.
Explanation:
^ # Start of string
[^\\]* # Match zero or more non-backslash characters
\\ # Match a backslash
This means that if there is more than one backslash in the string, only the first one (and the text leading up to it) will be removed. If you want to remove everything until the last backslash, use
$result = $subject -creplace '(?s)^.*\\', ''
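The difference between the two patterns only shows up when the value contains more than one backslash. A Perl sketch of both (the patterns themselves are plain regex, so they should behave the same under PowerShell's .NET engine; the second input is hypothetical):

```perl
use strict;
use warnings;

my $subject = 'vmvalidate\\administrator';   # one backslash in the value
my $nested  = 'domain\\sub\\user';           # hypothetical: two backslashes

# ^[^\\]*\\ : non-backslashes at the start, then the FIRST backslash
(my $user  = $subject) =~ s/^[^\\]*\\//;
(my $first = $nested)  =~ s/^[^\\]*\\//;

# (?s)^.*\\ : greedy .* (newlines included) backtracks to the LAST backslash
(my $last = $nested) =~ s/(?s)^.*\\//;

print "$user\n$first\n$last\n";
```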
There's no need to use a regex; try the Split method:
$string.Split('\')[-1]
"vmvalidate\administrator" -replace "^.*?\\"
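^ - from the beginning of the string
.* - any number of any characters
? - makes the preceding quantifier lazy (as few characters as possible)
\\ - a literal backslash, escaped with the escape character \
All together it means "replace everything from the beginning of the string up to and including the first backslash"; since no replacement string is given, -replace replaces it with nothing.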
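Because .*? is lazy, this pattern also stops at the first backslash, whereas $string.Split('\')[-1] keeps only the text after the last one. A Perl sketch of both behaviours, using Perl's split as a rough analogue of .NET's String.Split (not the same API), on the original input and a hypothetical two-backslash value:

```perl
use strict;
use warnings;

my $subject = 'vmvalidate\\administrator';
my $nested  = 'domain\\sub\\user';   # hypothetical: two backslashes

# ^.*?\\ : lazy, so it stops at the FIRST backslash
(my $lazy_one = $subject) =~ s/^.*?\\//;
(my $lazy_two = $nested)  =~ s/^.*?\\//;

# Split on backslash and keep the last field (text after the LAST backslash)
my $split_last = (split /\\/, $nested)[-1];

print "$lazy_one\n$lazy_two\n$split_last\n";
```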
This is the way I used to do things before I learned about regex or splitting.
"vmvalidate\administrator".SubString("vmvalidate\administrator".IndexOf('\')+1)

perl regex - quantifier * not greedy enough to pick up the newline at end of string

Isn't the * quantifier greedy? Shouldn't \s* match zero or more occurrences of whitespace, and therefore match everything up to the end of the input string?
#!/usr/bin/perl
use strict;
use warnings;
my $input="Name : www.devserver.com\n";
$input=~s/\w+.:\s*//; # shouldn't \s* match everything up to the \n at the end?
print $input;
Please help me understand this behaviour.
\s* will match only a string consisting entirely of characters of the same class (namely, whitespace).
In your case, www.devserver.com sits between the whitespace after the colon and the trailing newline, so \s* stops at the first w and never reaches the \n.
You may have meant to use the . class instead of \s:
$input=~s/\w+.:.*//;
This also wouldn't touch the trailing newline! According to perlre:
To simplify multi-line substitutions, the "." character never matches a newline unless you use the /s modifier, which in effect tells Perl to pretend the string is a single line--even if it isn't.
So, wrapping it up: the behavior you are expecting can be reproduced with the following substitution:
$input=~s/\w+.:.*//s;
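Running the three substitutions side by side makes the difference visible; this sketch just repackages the code from the question and answer into one script:

```perl
use strict;
use warnings;

my $input = "Name : www.devserver.com\n";

# \s* stops at the first non-whitespace character (the "w")
(my $ws = $input) =~ s/\w+.:\s*//;

# .* runs to the end of the line but, without /s, not past the "\n"
(my $dot = $input) =~ s/\w+.:.*//;

# With /s, . also matches "\n", so everything from "Name" onward is removed
(my $dot_s = $input) =~ s/\w+.:.*//s;

print "[$ws] [$dot] [$dot_s]\n";
```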

Perl: substitute any non-alphanumeric character with "_"

In Perl I want to replace any character that is not [A-Z] (case-insensitive) or [0-9] with "_", but only if this non-alphanumeric character occurs between two alphanumeric characters. I do not want to touch non-alphanumerics at the beginning or end of the string.
I know enough regex to replace them all, just not how to replace only the ones in the middle of the string.
s/(\p{Alnum})\P{Alnum}(\p{Alnum})/${1}_${2}/g;
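Of course, that would hurt your chances with "#A#B%C": each match consumes the alphanumerics on both sides, so once A#B has matched, B can no longer act as the left neighbour of %. Look-arounds avoid that, because they check the neighbours without consuming them: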
s/(?<=\p{Alnum})\P{Alnum}(?=\p{Alnum})/_/g;
That way you isolate it to just the non "alnum" character.
Or you could use the \K ("keep") escape to get the same thing done:
s/\p{Alnum}\K\P{Alnum}(?=\p{Alnum})/_/g;
EDIT based on input:
To leave newlines alone, exclude \n from the class:
s/\p{Alnum}\K[^\p{Alnum}\n](?=\p{Alnum})/_/g;
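A sketch comparing the three forms on the "#A#B%C" example, plus the newline-preserving variant on a hypothetical multi-line string:

```perl
use strict;
use warnings;

my $input = "#A#B%C";

# Capturing both neighbours consumes them, so B can't serve two matches
(my $captured = $input) =~ s/(\p{Alnum})\P{Alnum}(\p{Alnum})/${1}_${2}/g;

# Look-arounds only assert the neighbours, so every middle separator is replaced
(my $lookaround = $input) =~ s/(?<=\p{Alnum})\P{Alnum}(?=\p{Alnum})/_/g;

# \K drops the leading alnum from the match; same result as the look-behind
(my $keep = $input) =~ s/\p{Alnum}\K\P{Alnum}(?=\p{Alnum})/_/g;

# Hypothetical multi-line input: \P{Alnum} also replaces the "\n",
# while [^\p{Alnum}\n] leaves it alone
my $multi = "A\nB-C";
(my $eats_nl  = $multi) =~ s/\p{Alnum}\K\P{Alnum}(?=\p{Alnum})/_/g;
(my $keeps_nl = $multi) =~ s/\p{Alnum}\K[^\p{Alnum}\n](?=\p{Alnum})/_/g;

print "$captured\n$lookaround\n$keep\n";
```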
Try this:
my $str = 'a-2=c+a()_';
$str =~ s/(?<=[A-Z0-9])[^A-Z0-9](?=[A-Z0-9])/_/gi;
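The look-arounds in this pattern don't capture anything, so there is no \1 or \2 to refer to; the replacement should simply be _. A corrected sketch on the same input (note that this ASCII-based class is narrower than \p{Alnum}, which also covers non-ASCII letters and digits):

```perl
use strict;
use warnings;

my $str = 'a-2=c+a()_';

# Look-arounds don't capture, so the replacement is just "_".
# /i lets [A-Z0-9] cover lowercase letters too; /g replaces every match.
$str =~ s/(?<=[A-Z0-9])[^A-Z0-9](?=[A-Z0-9])/_/gi;

print "$str\n";   # a_2_c_a()_
```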