Adding a space character to my regex

I would like some help getting this regex to accept the space character.
The regex ^a|a$|a works, but ^tips to|tips to$|tips to does not.

A space is matched as-is in a regex: just type the space character and it should work. Alternatively, you can use the \s escape, which matches any whitespace character. For example, in Perl:
my $test = "Hello world";
if ($test =~ m/ /)
{
    print("Has space\n");
}
Also, if you can say more about what you want to use the regex for, we might be able to help better.

Try escaping just the last space (the regex engine will then "see" that "tips to" is one block, at least for the last alternative):
^tips to|tips to$|tips\ to
Or, to be on the safe side, group what you're searching for:
^(tips to)|(tips to)$|(tips to)
[EDIT 1]
So here's the solution the OP is using:
^"tips to"|"tips to"$|"tips to"

The regular expression that matches 1 space character is 1 space character.
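To illustrate the point that a literal space needs no special treatment, here is a minimal Perl sketch. The helper name has_tips_to and the sample inputs are made up for illustration; only the pattern comes from the question.

```perl
use strict;
use warnings;

# The pattern from the question, with its literal spaces left unescaped.
my $re = qr/^tips to|tips to$|tips to/;

# Hypothetical helper: returns 1 when the pattern matches anywhere in $s.
sub has_tips_to {
    my ($s) = @_;
    return $s =~ $re ? 1 : 0;
}

# Made-up sample inputs, for illustration only.
for my $s ('tips to stay warm', 'some tips to', 'tipsto', 'tips  to') {
    printf "%-20s %s\n", "'$s'", has_tips_to($s) ? 'match' : 'no match';
}
```

One case where an unescaped space does stop matching is a pattern compiled with the /x modifier, which ignores whitespace in the pattern; there the space has to be written as "\ ", "[ ]" or \s.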

Related

Need help in matching regexp

I have a string, say
my $str = "FILLER-1-1,EQPT:MN,EQPT_MISSING,NSA,04-30,15-07-13,NEND,NA";
I want to match a pattern, say
my $pattern = "FILLER-1-1";
I am using the regexp below:
$reg = $str =~ /$pattern/;
This works fine.
The problem is that it also matches when the string is
FILLER-1-10/FILLER-1-11/FILLER-1-12 and so on ...
I don't want to match those. I also don't want my regexp to be like
$reg = $str =~ /$pattern\W+/;
This one avoids the issue above, but \W may or may not be present: in some strings it follows the pattern, in others it does not. So I need a regexp that matches only FILLER-1-1 without using \W+, and that does not match FILLER-1-10.
Note: if somebody downvotes my question, please let me know what's wrong with the code. I would appreciate it if that person left a comment too.
As \w matches [a-zA-Z0-9_], you can use the zero-width assertion \b, which denotes a change in \w state (called a "word boundary", hence the "b" shortcut):
/FILLER-1-1\b/
This means that whatever follows the pattern must differ in word state from the character before it (a word-state change); the end of the string counts as well.
It will match
FILLER-1-1.
FILLER-1-1&
FILLER-1-1,
It will not match
FILLER-1-1a
FILLER-1-16
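The word-boundary behaviour can be checked with a short Perl sketch. The helper name matches_exactly is hypothetical, and the second sample string is a made-up variation of the one in the question; \Q...\E is added here only so that the interpolated pattern is taken literally.

```perl
use strict;
use warnings;

my $pattern = 'FILLER-1-1';

# Hypothetical helper: 1 if $str contains $pattern followed by a word boundary.
# \Q...\E quotes any regex metacharacters in the interpolated pattern.
sub matches_exactly {
    my ($str) = @_;
    return $str =~ /\Q$pattern\E\b/ ? 1 : 0;
}

my @samples = (
    'FILLER-1-1,EQPT:MN,EQPT_MISSING,NSA,04-30,15-07-13,NEND,NA',
    'FILLER-1-10,EQPT:MN,EQPT_MISSING,NSA,04-30,15-07-13,NEND,NA',
    'FILLER-1-1',    # nothing after the pattern: end of string is a boundary too
);

for my $s (@samples) {
    print matches_exactly($s) ? "match:    $s\n" : "no match: $s\n";
}
```

The first and third samples match; the second does not, because the 0 after FILLER-1-1 is a word character and so there is no boundary at that position.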
If you want to match FILLER at the start of the input (line) followed by two numbers, this simple regex should work:
/^FILLER-\d+-\d+/
^ matches the beginning of the input
\d matches any digit ([0-9])
+ matches at least one, but can match any number
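Use the ? quantifier like so:
/FILLER-\d-\d\W?/
The \W? means a non-word character, zero or one time. Because the \W is optional, though, this pattern still matches the leading part of FILLER-1-10.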

Problems with perl regex

I need a Perl regex to match A.CC3 on a line beginning with something, followed by anything, then my "A.CC3", and then anything...
I am surprised this (text =~ /^\W+\CC.*\A\.CC\[3].*/) is not working.
Thanks
\A is an escape sequence that denotes the beginning of the string, much like the ^ at the beginning of your regex. Remove the backslash to make it match a literal A.
Edit: You also seem to have \C in there. You should only use a backslash to escape metacharacters such as the period ., or to create escape sequences such as \Q .. \E.
At its simplest, a regex to match A.CC3 would be
$text =~ /A\.CC3/
That's all you need. This will match any string with A.CC3 in it. In the comments you mention the string you are matching is this:
my $text = "//%CC Unused Static Globals, A.CC3, Halstead Progam Volume";
You might want to avoid partial matches, in which case you can use word boundary \b
$text =~ /\bA\.CC3\b/
You might require that a line begins with //%
$text =~ m#^//%.*\bA\.CC3\b#
Of course, only you know which parts of the string should be matched and in what way. "Something followed by anything followed by A.CC3 followed by anything" really just needs the first simple regex.
It doesn't seem like you're trying to capture anything. If that's the case, and all you need to do is find lines that contain A.CC3 then you can simply do
if ( index( $str, 'A.CC3' ) >= 0 ) {
    # Found it...
}
No need for a regex.
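As a quick check of the variants suggested so far, here is a small Perl sketch that runs each of them against the sample line from the question. The last test uses a made-up string, 'AxCC3', to show why the period needs escaping.

```perl
use strict;
use warnings;

# Sample line quoted in the answer above.
my $text = "//%CC Unused Static Globals, A.CC3, Halstead Progam Volume";

print "plain regex matches\n"        if $text =~ /A\.CC3/;
print "word-bounded regex matches\n" if $text =~ /\bA\.CC3\b/;
print "anchored regex matches\n"     if $text =~ m#^//%.*\bA\.CC3\b#;
print "index() finds it\n"           if index($text, 'A.CC3') >= 0;

# An unescaped period matches any character, so it also accepts 'AxCC3'.
print "unescaped dot matches AxCC3\n" if 'AxCC3' =~ /A.CC3/;
print "escaped dot rejects AxCC3\n"   if 'AxCC3' !~ /A\.CC3/;
```

All six lines are printed for this input. The index() form does a plain substring search, so it behaves like the first regex and does not check word boundaries.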
Give this a shot:
^.*?A\.CC.*$
That will match anything until it reaches A, then a literal ., followed by CC, then anything until end of string.
It depends what you want to match. If you want to pull back the whole line in which the A.CC3 pattern occurs then something like this should work:
^.*A\.CC3.*$

Regex to get all character to the right of first space?

I am trying to craft a regular expression that will match all characters after (but not including) the first space in a string.
Input text:
foo bar bacon
Desired match:
bar bacon
The closest thing I've found so far is:
\s(.*)
However, this matches the first space in addition to "bar bacon", which is undesirable. Any help is appreciated.
You can use a positive lookbehind:
(?<=\s).*
Although it looks like you've already put a capturing group around .* in your current regex, so you could just try grabbing that.
I'd prefer to use [[:blank:]] for it, as it doesn't match newlines, just in case we're targeting multi-line input. It's also compatible with engines that don't support \s.
(?<=[[:blank:]]).*
You don't need look behind.
my $str = 'now is the time';
# Non-greedily match up to the first space, and then get everything after in a group.
$str =~ /^.*? +(.+)/;
my $right_of_space = $1; # Keep what is in the group in parens
print "[$right_of_space]\n";
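You can also try this:
(?s)(?<=\S*\s+).*
or
(?s)\S*\s+(.*)
where group 1 has your match.
With (?s), . also matches newlines. The first form needs an engine that supports variable-length lookbehind, such as .NET; Perl rejects it.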
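The approaches above can be compared side by side in Perl. This is only a sketch using the input from the question; the variable names are made up, and the split() line is an extra regex-free alternative that none of the answers mention.

```perl
use strict;
use warnings;

my $input = 'foo bar bacon';

# Lookbehind: the match itself begins right after the first whitespace character.
my $via_lookbehind = $input =~ /(?<=\s).*/ ? $& : undef;

# Capture group: consume everything up to and including the first run of
# whitespace, and keep the rest in group 1.
my ($via_capture) = $input =~ /^\S*\s+(.*)/s;

# Regex-free alternative (an addition, not from the answers): split into at
# most two fields on whitespace and keep the second one.
my (undef, $via_split) = split ' ', $input, 2;

print "[$via_lookbehind]\n";
print "[$via_capture]\n";
print "[$via_split]\n";
```

For this input all three print [bar bacon]. They can differ on other inputs: for example, with several spaces after the first word the lookbehind form keeps the extra spaces, while the other two skip them.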

Substitution with \s does not work as expected

I wrote a regex to collapse runs of more than one space in a string. The code is simple:
my $string = 'A string has more than 1 space';
$string =~ s/\s+/\s/g;
But the result is bad: 'Asstringshassmoresthans1sspace'. It replaces every space with an 's' character.
A workaround is to use ' ' instead of \s in the replacement, so the substitution becomes:
$string =~ s/\s+/ /g;
Why doesn't the regex with \s work?
\s is only a metacharacter in a regular expression (and it matches more than just a space, for example tabs, linebreak and form feed characters), not in a replacement string. Use a simple space (as you already did) if you want to replace all whitespace by a single space:
$string =~ s/\s+/ /g;
If you only want to affect actual space characters, use
$string =~ s/ {2,}/ /g;
(no need to replace single spaces with themselves).
The answer to your question is that \s is a character class, not a literal character. Just as \w represents alphanumeric characters, it cannot be used to print an alphanumeric character (except w, which it will print, but that's beside the point).
What I would do, if I wanted to preserve the type of whitespace matched, would be:
s/\s\K\s*//g
The \K (keep) escape sequence will keep the initial whitespace character from being removed, but all subsequent whitespace will be removed. If you do not care about preserving the type of whitespace, the solution already given by Tim is the way to go, i.e.:
s/\s+/ /g
\s stands for matching any whitespace. It's equivalent to this:
[\ \t\r\n\f]
When you replace with $string =~ s/\s+/\s/g;, you are replacing one or more whitespace characters with the letter s. Here's a link for reference: http://perldoc.perl.org/perlrequick.html
Why doesn't the regex with \s work?
Your regex with \s does work. What doesn't work is your replacement string. And, of course, as others have pointed out, it shouldn't.
People get confused about the substitution operator (s/.../.../). Often I find people think of the whole operator as "a regex". But it's not, it's an operator that takes two arguments (or operands).
The first operand (between the first and second delimiters) is interpreted as a regex. The second operand (between the second and third delimiters) is interpreted as a double-quoted string (of course, the /e option changes that slightly).
So a substitution operation looks like this:
s/REGEX/REPLACEMENT STRING/
The regex recognises special characters like ^ and + and \s. The replacement string doesn't.
If people stopped misunderstanding how the substitution operator is made up, they might stop expecting regex features to work outside of regular expressions :-)
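The difference between the two operands can be seen in a short Perl sketch. It assumes Perl 5.14 or later for the /r modifier (which returns the result instead of modifying the variable), and the sample string with doubled spaces and a tab is made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample with runs of spaces and a tab.
my $string = "A  string\t has   more than 1 space";

# In the replacement, "\s" is not a string escape, so it is just the letter s.
my $broken;
{
    no warnings 'misc';    # meant to silence the "Unrecognized escape \s" warning
    $broken = $string =~ s/\s+/\s/gr;
}

# A literal space in the replacement collapses every run to one space.
my $collapsed = $string =~ s/\s+/ /gr;

# String escapes such as \t do work in the replacement.
my $tabbed = $string =~ s/\s+/\t/gr;

# \K keeps the first whitespace character of each run and drops the rest.
my $kept = $string =~ s/\s\K\s*//gr;

print "$broken\n";       # Asstringshassmoresthans1sspace
print "$collapsed\n";    # A string has more than 1 space
```

Here $tabbed ends up with a single tab between words, and $kept preserves the tab after "string" because it was the first whitespace character of that run.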

Perl: substitute any non-alphanumeric character with "_"

In Perl I want to replace any character that is not [A-Z] (case-insensitively) or [0-9] with "_", but only if that non-alphanumeric character occurs between two alphanumeric characters. I do not want to touch non-alphanumerics at the beginning or end of the string.
I know enough regex to replace them, but not enough to replace only the ones in the middle of the string.
s/(\p{Alnum})\P{Alnum}(\p{Alnum})/${1}_${2}/g;
Of course, that would hurt your chances with "#A#B%C", so you might use look-arounds:
s/(?<=\p{Alnum})\P{Alnum}(?=\p{Alnum})/_/g;
That way you isolate it to just the non "alnum" character.
Or you could use the \K ("keep") escape as well and get the same thing done.
s/\p{Alnum}\K\P{Alnum}(?=\p{Alnum})/_/g;
EDIT based on input:
To not eat a newline, you could do the following:
s/\p{Alnum}\K[^\p{Alnum}\n](?=\p{Alnum})/_/g;
Try this (look-arounds do not capture, so the replacement is just the underscore):
my $str = 'a-2=c+a()_';
$str =~ s/(?<=[A-Z0-9])[^A-Z0-9](?=[A-Z0-9])/_/gi;
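To see why the capturing version falls short on inputs like "#A#B%C", here is a small Perl sketch comparing it with the look-around version. The sub names with_captures and with_lookarounds are made up for illustration; the patterns are the ones from the first answer.

```perl
use strict;
use warnings;

# Capturing version: each match consumes both neighbouring alphanumerics.
sub with_captures {
    my ($s) = @_;
    $s =~ s/(\p{Alnum})\P{Alnum}(\p{Alnum})/${1}_${2}/g;
    return $s;
}

# Look-around version: only the non-alphanumeric character is consumed.
sub with_lookarounds {
    my ($s) = @_;
    $s =~ s/(?<=\p{Alnum})\P{Alnum}(?=\p{Alnum})/_/g;
    return $s;
}

print with_captures('#A#B%C'),        "\n";    # #A_B%C
print with_lookarounds('#A#B%C'),     "\n";    # #A_B_C
print with_lookarounds('a-2=c+a()_'), "\n";    # a_2_c_a()_
```

In the capturing version the B is consumed by the first match, so the following % no longer has an alphanumeric available on its left and is skipped. The look-arounds only test the neighbours without consuming them, so every qualifying character is replaced.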