Perl: substitute any non-alphanumeric character with "_" - regex

In Perl I want to replace any character that is not [A-Z] (case-insensitively) or [0-9] with "_", but only if this non-alphanumeric character occurs between two alphanumeric characters. I do not want to touch non-alphanumerics at the beginning or end of the string.
I know enough regex to replace them, just not how to replace only the ones in the middle of the string.

s/(\p{Alnum})\P{Alnum}(\p{Alnum})/${1}_${2}/g;
Of course, that would hurt your chances with "#A#B%C", because each match consumes the alphanumeric characters on both sides, so you might use look-arounds:
s/(?<=\p{Alnum})\P{Alnum}(?=\p{Alnum})/_/g;
That way you isolate it to just the non "alnum" character.
Or you could use the \K ("keep") escape as well and get the same thing done.
s/\p{Alnum}\K\P{Alnum}(?=\p{Alnum})/_/g;
EDIT based on input:
To not eat a newline, you could do the following:
s/\p{Alnum}\K[^\p{Alnum}\n](?=\p{Alnum})/_/g;
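To make the difference between the three variants above concrete, here is a small sketch that runs each of them on the "#A#B%C" string from this answer. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $input = '#A#B%C';

# Consuming version: each match swallows the alphanumerics on both sides,
# so the "B" used by the first match cannot start a second match.
(my $consuming = $input) =~ s/(\p{Alnum})\P{Alnum}(\p{Alnum})/${1}_${2}/g;

# Look-around version: only the non-alphanumeric character is consumed.
(my $lookaround = $input) =~ s/(?<=\p{Alnum})\P{Alnum}(?=\p{Alnum})/_/g;

# \K version: the text matched before \K is kept out of the replacement.
(my $keep = $input) =~ s/\p{Alnum}\K\P{Alnum}(?=\p{Alnum})/_/g;

print "$consuming\n";    # #A_B%C
print "$lookaround\n";   # #A_B_C
print "$keep\n";         # #A_B_C
```

In all three cases the leading "#" is left alone, because nothing alphanumeric precedes it.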

Try this:
my $str = 'a-2=c+a()_';
$str =~ s/(?<=[A-Z0-9])[^A-Z0-9](?=[A-Z0-9])/_/gi;

matching two chars with multiple lines in between

I am new to regex and I am using Perl.
I have below tag:
<CFSC>cfsc_service=TRUE
SEC=1
licenses=10
expires=20170511
</CFSC>
I want to match anything between <CFSC> and </CFSC> tags.
I tried /<CFSC>.*?\n.*?\n.*?\n.*?\n<\/CFSC>/
and /<CFSC>(.*)<\/CFSC>/ but had no luck.
You need the /s (single line) modifier to make the regex engine include line breaks in what . matches.
Treat string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.
See this example.
my $foo = qq{<CFSC>cfsc_service=TRUE
SEC=1
licenses=10
expires=20170511
</CFSC>};
$foo =~ m{<CFSC>(.*)</CFSC>}s;
print $1;
You also need to use a different delimiter than /, or escape it.
Try
/<CFSC>(.*)<\/CFSC>/s
The final s makes the . match newline chars (\n = 0x0a), which it usually doesn't match:
Treat string as single line. That is, change "." to match any
character whatsoever, even a newline, which normally it would not
match.
from http://perldoc.perl.org/perlre.html#Modifiers
Try this:
$foo =~ m/<CFSC>((?:(?!<\/CFSC>).)*)<\/CFSC>/gs;
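Modifiers:
g - match globally (find all occurrences)
s - let . match newlines as well
i - case-insensitive matching (not needed here)
\ - not a modifier; it escapes the following character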
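The effect of /s described in the answers above can be checked with a short sketch. It reuses the <CFSC> block from the question; the step that splits the captured body into key/value pairs and the %setting hash are hypothetical additions, not part of the original answers.

```perl
use strict;
use warnings;

my $foo = "<CFSC>cfsc_service=TRUE\nSEC=1\nlicenses=10\nexpires=20170511\n</CFSC>";

# Without /s the dot stops at the first newline, so this match fails.
my $without_s = $foo =~ m{<CFSC>(.*)</CFSC>} ? 1 : 0;

# With /s the dot also matches newlines; braces as delimiters avoid escaping "/".
my ($body) = $foo =~ m{<CFSC>(.*?)</CFSC>}s;

# Hypothetical follow-up: turn the captured lines into a hash.
my %setting = map { my ($k, $v) = split /=/, $_, 2; ($k => $v) }
              grep { /=/ } split /\n/, $body;

print "matched without /s: $without_s\n";   # matched without /s: 0
print "licenses: $setting{licenses}\n";     # licenses: 10
```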

Problems with perl regex

I need a Perl regex to match A.CC3 on a line beginning with something, followed by anything, then my 'A.CC3', and then anything...
I am surprised this (text =~ /^\W+\CC.*\A\.CC\[3].*/) is not working
Thanks
\A is an escape sequence that anchors the match at the beginning of the string, much like the ^ at the beginning of your regex. Remove the backslash to make it match a literal A.
Edit: You also seem to have \C in there. You should only use backslash to escape meta characters such as period ., or to create escape sequences, such as \Q .. \E.
At its simplest, a regex to match A.CC3 would be
$text =~ /A\.CC3/
That's all you need. This will match any string with A.CC3 in it. In the comments you mention the string you are matching is this:
my $text = "//%CC Unused Static Globals, A.CC3, Halstead Progam Volume";
You might want to avoid partial matches, in which case you can use word boundary \b
$text =~ /\bA\.CC3\b/
You might require that a line begins with //%
$text =~ m#^//%.*\bA\.CC3\b#
Of course, only you know which parts of the string should be matched and in what way. "Something followed by anything followed by A.CC3 followed by anything" really just needs the first simple regex.
It doesn't seem like you're trying to capture anything. If that's the case, and all you need to do is find lines that contain A.CC3 then you can simply do
if ( index( $str, 'A.CC3' ) >= 0 ) # Found it...
No need for a regex.
Try to give this a shot:
^.*?A\.CC.*$
That will match anything until it reaches A, then a literal ., followed by CC, then anything until end of string.
It depends what you want to match. If you want to pull back the whole line in which the A.CC3 pattern occurs then something like this should work:
^.*A\.CC3.*$
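The following sketch compares the patterns suggested in these answers. The first test line is the string from the question; the other two lines are made up here purely to show what an unescaped dot and a missing word boundary let through.

```perl
use strict;
use warnings;

my @lines = (
    '//%CC Unused Static Globals, A.CC3, Halstead Progam Volume',
    '//%CC something else, AXCC3',          # invented: no literal dot
    '//%CC BA.CC30 only a partial match',   # invented: embedded in a longer token
);

my @unescaped = grep { /A.CC3/ } @lines;         # dot matches any character
my @escaped   = grep { /A\.CC3/ } @lines;        # dot must be a literal "."
my @bounded   = grep { /\bA\.CC3\b/ } @lines;    # and must stand alone as a word
my @by_index  = grep { index($_, 'A.CC3') >= 0 } @lines;   # plain substring search

printf "unescaped: %d, escaped: %d, bounded: %d, index: %d\n",
    scalar @unescaped, scalar @escaped, scalar @bounded, scalar @by_index;
# unescaped: 3, escaped: 2, bounded: 1, index: 2
```

index() behaves like the escaped regex without boundaries, so it is only a drop-in replacement when partial matches are acceptable.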

Substitution with \s does not work as expected

I wrote a regex to collapse runs of more than one space in a string. The code is simple:
my $string = 'A string has more than 1 space';
$string =~ s/\s+/\s/g;
But the result is something bad: 'Asstringshassmoresthans1sspace'. It replaces every single space with an 's' character.
A workaround is to use ' ' instead of \s for the substitution. So the regex becomes:
$string =~ s/\s+/ /g;
Why doesn't the regex with \s work?
\s is only a metacharacter in a regular expression (and it matches more than just a space, for example tabs, linebreak and form feed characters), not in a replacement string. Use a simple space (as you already did) if you want to replace all whitespace by a single space:
$string =~ s/\s+/ /g;
If you only want to affect actual space characters, use
$string =~ s/ {2,}/ /g;
(no need to replace single spaces with themselves).
The answer to your question is that \s is a character class, not a literal character. Just as \w represents alphanumeric characters, it cannot be used to print an alphanumeric character (except w, which it will print, but that's beside the point).
What I would do, if I wanted to preserve the type of whitespace matched, would be:
s/\s\K\s*//g
The \K (keep) escape sequence will keep the initial whitespace character from being removed, but all subsequent whitespace will be removed. If you do not care about preserving the type of whitespace, the solution already given by Tim is the way to go, i.e.:
s/\s+/ /g
\s stands for matching any whitespace. For ASCII input it is roughly equivalent to this (since Perl 5.18 it also matches the vertical tab):
[\ \t\r\n\f]
When you replace with $string =~ s/\s+/\s/g;, you are replacing one or more whitespace characters with the letter s. Here's a link for reference: http://perldoc.perl.org/perlrequick.html
Why doesn't the regex with \s work?
Your regex with \s does work. What doesn't work is your replacement string. And, of course, as others have pointed out, it shouldn't.
People get confused about the substitution operator (s/.../.../). Often I find people think of the whole operator as "a regex". But it's not, it's an operator that takes two arguments (or operands).
The first operand (between the first and second delimiters) is interpreted as a regex. The second operand (between the second and third delimiters) is interpreted as a double-quoted string (of course, the /e option changes that slightly).
So a substitution operation looks like this:
s/REGEX/REPLACEMENT STRING/
The regex recognises special characters like ^ and + and \s. The replacement string doesn't.
If people stopped misunderstanding how the substitution operator is made up, they might stop expecting regex features to work outside of regular expressions :-)
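As a rough illustration of the "regex on the left, double-quoted string on the right" point, the sketch below runs the substitutions discussed in this thread on one sample string. The sample string is an assumption: the question's original spacing is not preserved here, so runs of spaces and a tab were added for the demonstration.

```perl
use strict;
use warnings;

my $string = "A  string\thas   more than 1 space";

my ($literal_s, $single_space, $keep_first);
{
    # In the replacement, \s is an unknown string escape and becomes plain "s".
    no warnings 'misc';   # silences "Unrecognized escape \s passed through"
    ($literal_s = $string) =~ s/\s+/\s/g;
}

# A literal space in the replacement collapses every whitespace run to " ".
($single_space = $string) =~ s/\s+/ /g;

# \K keeps the first whitespace character of each run, whatever its type.
($keep_first = $string) =~ s/\s\K\s*//g;

print "$literal_s\n";      # Asstringshassmoresthans1sspace
print "$single_space\n";   # A string has more than 1 space
```

Note that $keep_first still contains the tab between "string" and "has", because that tab was the first character of its run.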

Add an optional white space between characters in a string

I have a string, for example:
string SampleString = "F456-G12345-9090-GHI"
I need to add an optional white space between all characters in the above string.
The above string needs to match the same string, which may or may not have white space between each character. The other string will be like
string samplestring1 = "F456-G12345- 9090 -GHI"
Thanks
Padma
I'm not positive that I understand what you'll be matching. If you're looking for a specific string, then the easiest way is probably to strip all whitespace from the string (replace it with '') and then do the match.
In perl I'd do:
$string =~ s/\s//g;
while ($string =~ m/F456-G12345-9090-GHI/g) {
# Do something
}
If you're looking for multiple strings, and not just a specific one, you might just want to add \s as a potential match [\w\s-]+
However, if you're going to be matching against a specific string, I'd just toss the whitespace whole cloth first rather than performing an expensive regex checking for (and discarding) any whitespace found before checking the string.
You will probably have to add \s* between each pair of characters (or use another character class for whitespace).
\s*F\s*4\s*5\s*6\s*-\s*G\s*1\s*2\s*3\s*4\s*5\s*-\s*9\s*0\s*9\s*0\s*-\s*G\s*H\s*I\s*
Or, depending on your regex dialect, you might be able to pass an option to ignore whitespace in the source text, but it would depend on which regex library you're using.
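Writing the \s* pattern out by hand is error-prone, so one option is to generate it. The sketch below is a minimal version of both approaches from the answers above (generating the interleaved pattern and stripping whitespace first); $target, $candidate and the other variable names are hypothetical.

```perl
use strict;
use warnings;

my $target    = 'F456-G12345-9090-GHI';
my $candidate = 'F456-G12345- 9090 -GHI';

# Approach 1: put \s* between every character of the target.
# quotemeta() escapes characters that would otherwise be special in a regex.
my $pattern = join '\s*', map { quotemeta($_) } split //, $target;
my $re      = qr/^\s*$pattern\s*$/;

my $matches_spaced = $candidate =~ $re ? 1 : 0;
my $matches_other  = 'F456-G12345-9091-GHI' =~ $re ? 1 : 0;

# Approach 2: strip whitespace from a copy, then compare as plain strings.
(my $stripped = $candidate) =~ s/\s+//g;
my $equal_after_strip = $stripped eq $target ? 1 : 0;

print "$matches_spaced $matches_other $equal_after_strip\n";   # 1 0 1
```

The second approach avoids the regex entirely for the comparison, which is usually simpler when only one known string has to be recognised.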

Adding a space character to my regex

I would like some help in getting this regex to accept the space character.
The following regex works ^a|a$|a but this one doesn't ^tips to|tips to$|tips to.
Space is just as-is in a regex (you just put the space character, that should work). Alternatively you can use \s special character. For example, in Perl:
my $test = "Hello world";
if ($test =~ m/ /)
{
print("Has space\n");
}
Also if you can specify more what you want to use the regex for, we might be able to help better.
Try escaping just the last space (the regex engine will then "see" that "tips to" is one block, at least for the last OR)
^tips to|tips to$|tips\ to
or, to be on the safe side, group what you're searching for
^(tips to)|(tips to)$|(tips to)
[EDIT 1]
so here's the solution the OP is using:
^"tips to"|"tips to"$|"tips to"
The regular expression that matches 1 space character is 1 space character.
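The question does not say which regex engine or flags are in use, so the following is only a guess at one way a literal space can stop working: in Perl, the /x (extended) modifier makes the engine ignore unescaped whitespace in the pattern. The sample text is invented for the sketch.

```perl
use strict;
use warnings;

my $text = 'some tips to share';

# Default mode: a space in the pattern matches a space in the text.
my $plain = $text =~ /^tips to|tips to$|tips to/ ? 1 : 0;

# With /x, unescaped whitespace in the pattern is ignored,
# so this effectively searches for "tipsto".
my $extended = $text =~ /^tips to|tips to$|tips to/x ? 1 : 0;

# Under /x an escaped space (or \s) is still significant.
my $escaped = $text =~ /^tips\ to|tips\ to$|tips\ to/x ? 1 : 0;

print "$plain $extended $escaped\n";   # 1 0 1
```

If the original engine has a comparable "ignore pattern whitespace" option switched on, escaping the space or using \s would be the corresponding fix.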