Split a one-line regex into a multiline regexp in Perl

I have trouble splitting my regex across multiple lines. I want my regex to match this line:
* Code "l;k""dfsakd;.*[])_lkaDald"
So I created this regex, which works:
my $firstRegexpr = qr/^\s*\*\s*Code\s+\"(?<Code>((\")*[^\"]+)+)\"/x;
But now I want to split it across multiple lines like this (and have it match the same thing):
my $firstRegexpr = qr/^\s*\*\s*Code\s+\"
(?<Code>((\")*[^\"]+)+)\"/x;
I read about this form, but I have trouble using it:
/
^\s*\*\s*Code\s+\"
(?<Code>((\")*[^\"]+)+)\"
/x
My last question is about preventing variable interpolation in a Perl regex:
my $firstRegexpr = qr/^\s*\*\s*Code\s+\"(?<Code>((\")*[^\"$]+)+)\"\$/x;
Here $] is interpolated as a variable in the regex. How do I get Perl to treat the $ as a literal character instead?
Thanks a lot for your time and please provide explicit example.

The x flag simply tells Perl to ignore whitespace in the pattern (it also allows # comments).
So literal space characters no longer match, and you have to use \s or similar instead.
So you can write:
if ( m/
^
\d+\s+
fish:\w+\s+
$
/x ) {
print "Matched\n";
}
You can test regular expressions with various websites but one example is https://regex101.com/
So to take your example: https://regex101.com/r/eG5jY8/1
But how is yours not working?
This matches:
my $string = q{* Code "l;k""dfsakd;.*[])_lkaDald"};
my $firstRegexpr = qr/^\s*
\*
\s*
Code\s+
\"
(?<Code>((\")*[^\"]+)+)
\"
/x;
print "Compiled_Regex: $firstRegexpr\n";
print "Matched\n" if ( $string =~ m/$firstRegexpr/ );
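As for $] being interpolated: escape the dollar sign with a backslash (\$). Note that \Q...\E on its own does not help here, because the quoting is applied after variables have been interpolated.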
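For comparison, here is a minimal sketch of the same multi-line layout in Python, where re.VERBOSE plays the role of Perl's /x. The names CODE_LINE, line and code are only illustrative, and Python spells the named group (?P<Code>...). I also made the inner group non-capturing, which is my own simplification.

```python
import re

# In verbose mode, whitespace and "#" comments inside the pattern are ignored,
# so a literal space has to be written as \s (or escaped).
CODE_LINE = re.compile(r'''
    ^\s*\*\s*Code\s+      # leading "* Code"
    "                     # opening quote
    (?P<Code>             # named capture
        (?: "* [^"]+ )+   # optional quotes, then non-quote characters, repeated
    )
    "                     # closing quote
''', re.VERBOSE)

line = '* Code "l;k""dfsakd;.*[])_lkaDald"'
match = CODE_LINE.search(line)
code = match.group('Code') if match else None
print(code)
```

Python does not interpolate variables into pattern strings, so the $] problem does not arise there.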

Related

Extract first word after specific word

I'm having difficulty writing a Perl program to extract the word following a certain word.
For example:
Today i'm not going anywhere except to office.
I want the word after anywhere, so the output should be except.
I have tried this
my $words = "Today i'm not going anywhere except to office.";
my $w_after = ( $words =~ /anywhere (\S+)/ );
but it seems this is wrong.
Very close:
my ($w_after) = ($words =~ /anywhere\s+(\S+)/);
   ^        ^                       ^^^
   +--------+                        |
     Note 1                       Note 2
Note 1: =~ returns a list of captured items, so the assignment target needs to be a list.
Note 2: allow one or more blanks after anywhere
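The same capture-group idea, sketched in Python for comparison. The helper name word_after is made up for this example; it returns None when the keyword is not found.

```python
import re

def word_after(text, keyword):
    """Return the first whitespace-delimited token after keyword, or None."""
    # re.escape keeps any metacharacters in the keyword literal.
    m = re.search(re.escape(keyword) + r'\s+(\S+)', text)
    return m.group(1) if m else None

words = "Today i'm not going anywhere except to office."
print(word_after(words, "anywhere"))
```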
In Perl v5.22 and later, you can use \b{wb} to get better results for natural language. The pattern could be
/anywhere\b{wb}.+?\b{wb}(.+?\b{wb})/
"wb" stands for word break, and it will account for words that have apostrophes in them, like "I'll", that plain \b doesn't.
.+?\b{wb}
matches the shortest non-empty sequence of characters that don't have a word break in them. The first one matches the span of spaces in your sentence; and the second one matches "except". It is enclosed in parentheses, so upon completion $1 contains "except".
\b{wb} is documented most fully in perlrebackslash
First, you have to put parentheses around the left-hand side of the = operator to force list context for the regexp evaluation. See m// and // in the perlop documentation.[1] You can also put parentheses around the =~ binding operator to improve readability, but that is not necessary because =~ has fairly high precedence.
Use the \w word character class:
my ($w_after) = ($words =~ / \b anywhere \W+ (\w+) \b /x);
Note that I'm using x so whitespace in the regexp is ignored. I also use the \b word boundary to anchor the regexp correctly.
[1]: I write my ($w_after) just for convenience, because you can write my ($a, $b, $c, @rest) as the equivalent of (my $a, my $b, my $c, my @rest), but you can also control the scope of your variables individually, like (my $a, our $UGLY_GLOBAL, local $_, @_).
This regex should match:
my ($expect) = ($words=~m/anywhere\s+([^\s]+)\s+/);
[^\s]+ captures the word between two runs of whitespace.
Thanks.
If you want to also take into consideration the punctuation marks, like in:
my $words = "Today i'm not going anywhere; except to office.";
Then try this:
my ($w_after) = ($words =~ /anywhere[[:punct:]\s]+(\S+)/);

Regular Expression to find $0.00

Need to count the number of "$0.00" in a string. I'm using:
my $zeroDollarCount = ("\Q$menu\E" =~ tr/\$0\.00//);
but it doesn't work. The issue is the $ sign is throwing the regex off. It works if I just want to count the number of $, but fails to find $0.00.
How is this a duplicate? Your solution does not address dollar sign which is an issue for me.
You are using the transliteration operator tr///. That doesn't have anything to do with a pattern. You need the match operator m// instead. And because you want it to find all occurrences of the pattern, use the /g modifier.
my $count = () = $menu =~ m/\$0\.00/g;
If we run this program, the output is 2.
use strict;
use warnings;
my $menu = '$0.00 and $0.00';
my $count = () = $menu =~ m/\$0\.00/g;
print $count;
Now let's take a look at what is going on. First, the pattern of the match.
/\$0\.00/
This is fairly straight-forward. There is a literal $, which we need to escape with a backslash \. The zero is followed by a literal dot ., which again we need to escape, because like the $ it has special meanings in regular expressions.
my $count = () = $menu =~ m/\$0\.00/g;
This whole line looks weird. We can break it up into a few lines to make it more readable.
my @matches = ( $menu =~ m/\$0\.00/g );
my $count = scalar @matches;
We need the /g switch on the regular expression match to make it match all occurrences. In list context, the match operation returns all matches (which will be the string "$0.00" a number of times). Because we want the count, we then force that into scalar context, which gives us the number of elements. That can be shortened to one line by the idiom shown above.
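As a cross-check, the same "collect all matches, then count them" logic can be sketched in Python. The variable names mirror the Perl above; nothing else is assumed.

```python
import re

menu = '$0.00 and $0.00'

# findall returns every non-overlapping match, like m//g in list context.
matches = re.findall(r'\$0\.00', menu)
count = len(matches)
print(count)
```

For a fixed string like this one, menu.count('$0.00') gives the same number without a regex.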

Copy matched pattern of line at the end of it

I want to match a pattern in text and then append it at the end of the line. In the case below I want to match the numbers and paste them at the end of the line. When there are two matches, I want them comma separated.
Basically, I am looking for how to use the matched portion as a variable.
I am looking to do it in Bash.
abc 123=
agdaf456ad
dfaf879:
abc123xyz12:
To
abc 123=123
agdaf456ad456
dfaf879:879
abc123xyz12:123,12
Something like
(\d+)(.*)$
And replace with
$1$2$1
Example
$replace = preg_replace("/(\d+)(.*)$/", "$1$2$1", "abc 123=");
echo $replace;
=> abc 123=123
To get all sequences of digits in a given string, you can use a mere \d+ regex, and then just implode the obtained result array and append it to the input string:
$str = "abc123xyz12:";
preg_match_all('/\d+/', $str, $m);
$append = implode(",", $m[0]);
echo $str . $append;
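The same find-all-and-join approach as a Python sketch, run over the four sample lines from the question. The function name append_numbers is only illustrative.

```python
import re

def append_numbers(line):
    """Append every run of digits found in line, comma separated."""
    numbers = re.findall(r'\d+', line)
    return line + ','.join(numbers)

for line in ['abc 123=', 'agdaf456ad', 'dfaf879:', 'abc123xyz12:']:
    print(append_numbers(line))
```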

perl Regex replace for specific string length

I am using Perl to do some prototyping.
I need an expression to replace e with [ee] if the string is exactly 2 chars long and ends with "e".
le -> l [ee]
me -> m [ee]
elle -> elle : no change
I cannot test the length of the string, I need one expression to do the whole job.
I tried:
`s/(?=^.{0,2}\z).*e\z%/[ee]/g` but this is replacing the whole string
`s/^[c|d|j|l|m|n|s|t]e$/[ee]/g` same result (I listed the possible letters that could precede my "e")
`^(?<=[c|d|j|l|m|n|s|t])e$/[ee]/g` but I have no match, not sure I can use ^ on a positive look behind
EDIT
Guys you're amazing, hours of search on the web and here I get answers minutes after I posted.
I tried all your solutions and they are working perfectly directly in my script, i.e. this one:
my $test2="le";
$test2=~ s/^(\S)e$/\1\[ee\]/g;
print "test2:".$test2."\n";
-> test2:l[ee]
But I am loading these regex from a text file (using Perl for proto, the idea is to reuse it with any language implementing regex):
In the text file I store for example (I used % to split the line between match and replace):
^(\S)e$% \1\[ee\]
and then I parse and apply all regex like that:
my $test="le";
while (my $row = <$fh>) {
chomp $row;
if( $row =~ /%/){
my @reg = split /%/, $row;
#if no replacement, put empty string
if($#reg == 0){
push(@reg,"");
}
print "reg found, reg:".$reg[0].", replace:".$reg[1]."\n";
push @regs, [ @reg ];
}
}
print "orgine:".$test."\n";
for my $i (0 .. $#regs){
my $p=$regs[$i][0];
my $r=$regs[$i][1];
$test=~ s/$p/$r/g;
}
print "final:".$test."\n";
This technique is working well with my other regex, but not yet when I have a $1 or \1 in the replace... here is what I am obtaining:
final:\1\ee\
PS: you answered to initial question, should I open another post ?
Something like s/(?i)^([a-z])e$/$1\[ee\]/
Why aren't you using a capture group to do the replacement?
`s/^([cdjlmnst])e$/$1 [ee]/g`
If those are the characters you need and if it is indeed one word to a line with no whitespace before it or after it, then this will work.
Here's another option depending on what you are looking for. It will match a two-character string consisting of one a-z character followed by one 'e' on its own line, with possible whitespace before or after. It will replace this with the single a-z character followed by ' [ee]'.
`s/^\s*([a-z])e\s*$/$1 [ee]/`
^(\S)e$
Try this. Replace with $1 [ee]. See demo.
https://regex101.com/r/hR7tH4/28
I'd do something like this
$word =~ s/^(\w)e$/$1\[ee\]/;
You can use the following regex, which matches 2 characters, and then replace the match with $1\[$2$2\]:
^([a-zA-Z])([a-zA-Z])$
Demo :
$my_string =~ s/^([a-zA-Z])([a-zA-Z])$/$1\[$2$2\]/;
See demo https://regex101.com/r/iD9oN4/1
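On the follow-up about rules stored in a text file as pattern%replacement: here is a minimal Python sketch of that workflow, since the asker wants to reuse the rules in other languages. Python's re.sub expands \1 in a replacement string even when that string is read at run time. The names parse_rule, apply_rules and rules are invented for this example, and the rule is written as \1[ee] because brackets need no escaping in a Python replacement string.

```python
import re

def parse_rule(row):
    """Split 'pattern%replacement' into a compiled regex and a replacement.

    A row with nothing after the '%' gets an empty replacement.
    """
    pattern, _, replacement = row.partition('%')
    return re.compile(pattern), replacement

def apply_rules(text, rows):
    """Apply every 'pattern%replacement' row to text, in order."""
    for row in rows:
        if '%' not in row:
            continue  # not a rule line
        regex, replacement = parse_rule(row)
        # The replacement is a template, so \1 refers to the first group.
        text = regex.sub(replacement, text)
    return text

rules = [r'^(\S)e$%\1[ee]']  # stands in for the lines read from the file
print(apply_rules('le', rules))
print(apply_rules('elle', rules))
```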

Attach a newline to every sentences

I was wondering how to turn a paragraph into bulleted sentences.
before:
sentence1. sentence2. sentence3. sentence4. sentence5. sentence6. sentence7.
after:
sentence1.
sentence2.
sentence3.
sentence4.
sentence5.
Since all the other answers so far show how to do it in various programming languages and you have tagged the question with Vim, here's how to do it in Vim:
:%s/\.\(\s\+\|$\)/.\r\r/g
I've used two carriage returns to match the output format you showed in the question. There are a number of alternative regular expression forms you could use:
" Using a look-behind
:%s/\.\@<=\( \|$\)/\r\r/g
" Using 'very magic' to reduce the number of backslashes
:%s/\v\.( |$)/.\r\r/g
" Slightly different formation: this will also break if there
" are no spaces after the full-stop (period).
:%s/\.\s*$\?/.\r\r/g
and probably many others.
A non-regexp way of doing it would be:
:let s = getline('.')
:let lineparts = split(s, '\.\@<=\s*')
:call append('.', lineparts)
:delete
See:
:help pattern.txt
:help change.txt
:help \@<=
:help :substitute
:help getline()
:help append()
:help split()
:help :d
You can use a regex
/\.( |$)/g
That will match the end of the sentence, then you can add newlines.
Or you can use some split function with . (dot space) and . (dot), then join with newlines.
Just replace every end-of-sentence space, /(?<=\.) /, with two newline characters, /\n\n/. The syntax will of course depend on the language you are using.
Using Perl:
perl -e "$_ = <>; s/\.\s*/.\n/g; print"
Longer, somewhat more readable version:
my $input = 'foo. bar. baz.';
$input =~ s/
\. # A literal '.'
\s* # Followed by 0 or more space characters
/.\n/gx; # g for all occurrences, x to allow comments and whitespace in regex
print $input;
Using Python:
import re
text = 'foo. bar. baz.'
print(re.sub(r'\.\s*', '.\n', text))
An example using Ruby:
ruby-1.9.2 > a = "sentence1. sentence2. sentence3. and array.split(). the end."
=> "sentence1. sentence2. sentence3. and array.split(). the end."
ruby-1.9.2 > puts a.gsub(/\.(\s+|$)/, ".\n\n")
sentence1.
sentence2.
sentence3.
and array.split().
the end.
It goes like, for every . followed by (1 whitespace character or more, or followed by end of line), replace it with just . and two newline characters.
using awk
$ awk '{$1=$1}1' OFS="\n" file
sentence1.
sentence2.
sentence3.
sentence4.
sentence5.
sentence6.
sentence7
In PHP:
<?php
$input = "sentence. sentence. sentence.";
$output = preg_replace("/(.*?)\\.[\\s]+/", "$1\n", $input);
?>
Also, regular expressions are a blast, but not necessary for this problem. You can also try:
<?php
$input = "sentence. sentence. sentence.";
$arr = explode('.', $input);
foreach ($arr as $k => $v) $arr[$k] = trim($v);
$output = implode("\n", $arr);
?>
I figured out how to do this in RegExr
Search String is
(\-=?\s+)
--
Replace String is
\n\n
This is the generated information for the current regex
RegExp: /(\-=?\s+)/g
pattern: (\-=?\s+)
flags: g
capturing groups: 1
group 1: (\-=?\s+)
This will find every - in the sentence below and replace it with two newlines
Sentence 1- Sentence 2- Sentence 3- Sentence 4- Sentence 5-
The end result is
Sentence 1
Sentence 2
Sentence 3
Sentence 4
Sentence 5
I have a really simple, naive solution using a capturing regex.
:%s/\([.!?]\)\s*/\1\r\r/g
The main drawback is that this won't handle ellipses or multiple punctuation marks.
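One way around that limitation, sketched in Python: capture a whole run of terminators so that "..." and "?!" stay together. The function name split_sentences is made up, and this is still naive (it would also break after the dot in "3.14" or "e.g.").

```python
import re

def split_sentences(paragraph):
    # A run of ., ! or ? plus any trailing whitespace is one boundary;
    # the punctuation is kept and followed by a blank line.
    return re.sub(r'([.!?]+)\s*', r'\1\n\n', paragraph).rstrip('\n')

print(split_sentences('Wait... what?! Fine. Done.'))
```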