Copy matched pattern of line at the end of it - regex

I want to match a pattern in each line of text and append the match to the end of that line. In the example below I want to match the numbers and paste them at the end of the line. If a line contains two matches, they should be comma-separated.
Basically, I am looking for a way to use the matched portion as a variable.
I would like to do this in Bash.
abc 123=
agdaf456ad
dfaf879:
abc123xyz12:
To
abc 123=123
agdaf456ad456
dfaf879:879
abc123xyz12:123,12

Something like
(\d+)(.*)$
And replace with
$1$2$1
Regex Demo
Example
$replace = preg_replace("/(\d+)(.*)$/", "$1$2$1", "abc 123=");
echo $replace;
=> abc 123=123

To get all sequences of digits in a given string, you can use a mere \d+ regex, and then just implode the obtained result array and append it to the input string:
$str = "abc123xyz12:";
preg_match_all('/\d+/', $str, $m);
$append = implode(",", $m[0]);
echo $str . $append;
See demo
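The same "find every run of digits, join with commas, append" idea can be sketched in Python as well. This is only an illustration under the assumption that each input line is handled on its own; the helper name append_matches is made up for this example and does not come from the answers above.

```python
import re

def append_matches(line):
    # Collect every run of digits in the line, in order of appearance
    numbers = re.findall(r"\d+", line)
    # Append them to the original line, comma-separated
    return line + ",".join(numbers)

for line in ["abc 123=", "agdaf456ad", "dfaf879:", "abc123xyz12:"]:
    print(append_matches(line))
```

A line without any digits comes back unchanged, because joining an empty list gives an empty string.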

Perl regex exclude optional word from match

I have a strings and need to extract only icnnumbers/numbers from them.
icnnumber:9876AB54321_IN
number:987654321FR
icnnumber:987654321YQ
I need to extract below data from above example.
9876AB54321
987654321FR
987654321YQ
Here is my regex, but it only works for the first line of data.
(icnnumber|number):(\w+)(?:_IN)
How can I write an expression that matches all three lines of data?
Given your strings to extract are only upper case and numeric, why use \w when that also matches _?
How about just matching:
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
# print only when the line matched, so a stale $1 is never printed
print "$1\n" if m/number:([A-Z0-9]+)/;
}
__DATA__
icnnumber:9876AB54321_IN
number:987654321FR
icnnumber:987654321YQ
Another alternative to get only the values as a match using \K to reset the match buffer
\b(?:icn)?number:\K[^\W_]+
Regex demo | Perl demo
For example
my $str = 'icnnumber:9876AB54321_IN
number:987654321FR
icnnumber:987654321YQ';
while($str =~ /\b(?:icn)?number:\K[^\W_]+/g ) {
print $& . "\n";
}
Output
9876AB54321
987654321FR
987654321YQ
You may replace \w (that matches letters, digits and underscores) with [^\W_] that is almost the same, but does not match underscores:
(icnnumber|number):([^\W_]+)
See the regex demo.
If you want to make sure icnnumber and number are matched as whole words, you may add a word boundary at the start:
\b(icnnumber|number):([^\W_]+)
^^
You may even refactor the pattern a bit in order not to repeat number using an optional non-capturing group, see below:
\b((?:icn)?number):([^\W_]+)
   ^^^^^^^^
Pattern details
\b - a word boundary (immediately to the left, there must be the start of the string or a char other than a letter, digit or _)
((?:icn)?number) - Group 1: an optional sequence of icn substring and then number substring
: - a : char
([^\W_]+) - Group 2: one or more letters or digits.
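For comparison, the [^\W_]+ idea carries over to other regex engines. The following is a minimal Python sketch, assuming the three sample lines from the question are held in one string; the variable names are illustrative only.

```python
import re

text = """icnnumber:9876AB54321_IN
number:987654321FR
icnnumber:987654321YQ"""

# [^\W_]+ matches letters and digits but stops at the underscore,
# so the trailing _IN is never part of the captured value
values = re.findall(r"\b(?:icn)?number:([^\W_]+)", text)
print(values)
```

Because the pattern has exactly one capturing group, re.findall returns just the captured values rather than the whole matches.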
Just another suggestion maybe, but if your strings are always valid, you may consider just to split on a character class and pull the second index from the resulting array:
my $string = "number:987654321FR";
my @part = (split /[:_]/, $string)[1];
print @part;
Or for the whole array of strings:
my @Array = ("icnnumber:9876AB54321_IN", "number:987654321FR", "icnnumber:987654321YQ");
foreach (@Array)
{
my $el = (split /[:_]/, $_)[1];
print "$el\n"
}
Results in:
9876AB54321
987654321FR
987654321YQ
The regular expression can treat 'icn' as optional; the part of interest is the 11 characters after the colon.
my $re = qr/(icn)?number:(.{11})/;
Test code snippet
use strict;
use warnings;
use feature 'say';
my $re = qr/(icn)?number:(.{11})/;
while(<DATA>) {
say $2 if /$re/;
}
__DATA__
icnnumber:9876AB54321_IN
number:987654321FR
icnnumber:987654321YQ
Output
9876AB54321
987654321FR
987654321YQ
You already have good answers here, but here is another way to solve it.
Get the whole string:
my $str = do { local $/; <DATA> }; #print $str;
You can capture everything up to a _ or a word boundary with the first group:
my @arrs = ($str=~m/number\:((?:(?!\_).)*)(?:\b|\_)/ig);
(or)
You can exclude non-word characters (\W) and _ in the first group and collect the matches in an array:
my @arrs = ($str=~m/number\:([^\W\_]+)(?:\_|\b)/ig);
Print the output:
print join "\n", @arrs;
__DATA__
icnnumber:9876AB54321_IN
number:987654321FR
icnnumber:987654321YQ

perl Regex replace for specific string length

I am using Perl to do some prototyping.
I need an expression to replace e with [ee] if the string is exactly 2 characters long and ends with "e".
le -> l [ee]
me -> m [ee]
elle -> elle : no change
I cannot test the length of the string, I need one expression to do the whole job.
I tried:
`s/(?=^.{0,2}\z).*e\z%/[ee]/g` but this is replacing the whole string
`s/^[c|d|j|l|m|n|s|t]e$/[ee]/g` same result (I listed the possible letters that could precede my "e")
`^(?<=[c|d|j|l|m|n|s|t])e$/[ee]/g` but I have no match, not sure I can use ^ on a positive look behind
EDIT
Guys you're amazing, hours of search on the web and here I get answers minutes after I posted.
I tried all your solutions and they are working perfectly directly in my script, i.e. this one:
my $test2="le";
$test2=~ s/^(\S)e$/\1\[ee\]/g;
print "test2:".$test2."\n";
-> test2:l[ee]
But I am loading these regex from a text file (using Perl for proto, the idea is to reuse it with any language implementing regex):
In the text file I store for example (I used % to split the line between match and replace):
^(\S)e$% \1\[ee\]
and then I parse and apply all regex like that:
my $test="le";
while (my $row = <$fh>) {
chomp $row;
if( $row =~ /%/){
my @reg = split /%/, $row;
#if no replacement, put empty string
if($#reg == 0){
push(@reg,"");
}
print "reg found, reg:".$reg[0].", replace:".$reg[1]."\n";
push @regs, [ @reg ];
}
}
print "orgine:".$test."\n";
for my $i (0 .. $#regs){
my $p=$regs[$i][0];
my $r=$regs[$i][1];
$test=~ s/$p/$r/g;
}
print "final:".$test."\n";
This technique is working well with my other regex, but not yet when I have a $1 or \1 in the replace... here is what I am obtaining:
final:\1\ee\
PS: you answered to initial question, should I open another post ?
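A note on the follow-up: in Perl's s/$p/$r/ the contents of $r are inserted as plain text and are not scanned again for \1 or $1, which appears to be why the backreference shows up literally in the output above. Since the stated goal is to reuse the rule file from any language, here is a hedged Python sketch of the same "pattern%replacement" loader. In Python, re.sub does interpret \1 in a replacement string that comes from a variable. The function names and the sample rules are assumptions made up for this illustration, and the replacement is written as \1[ee] without escaped brackets.

```python
import re

def load_rules(lines):
    """Parse 'pattern%replacement' lines into (compiled pattern, replacement) pairs."""
    rules = []
    for line in lines:
        line = line.rstrip("\n")
        if "%" not in line:
            continue  # not a rule line
        # Split on the first % only; a missing replacement becomes ""
        pattern, _, replacement = line.partition("%")
        rules.append((re.compile(pattern), replacement))
    return rules

def apply_rules(text, rules):
    """Apply every rule in order; \\1 in a replacement refers to group 1."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text

rules = load_rules([r"^(\S)e$%\1[ee]", "x+%"])
print(apply_rules("le", rules))    # l[ee]
print(apply_rules("elle", rules))  # elle
```

The second sample rule has an empty replacement, which mirrors the "if no replacement, put empty string" branch of the Perl loader.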
Something like s/^([a-z])e$/$1\[ee\]/i
Why aren't you using a capture group to do the replacement?
`s/^([cdjlmnst])e$/$1 [ee]/g`
If those are the characters you need and if it is indeed one word to a line with no whitespace before it or after it, then this will work. (Inside a character class | is a literal character, so the letters are simply listed.)
Here's another option depending on what you are looking for. It will match a two character string consisting of one a-z character followed by one 'e' on its own line with possible whitespace before or after. It will replace this with the single a-z character followed by ' [ee]'
`s/^\s*([a-z])e\s*$/$1 [ee]/`
^(\S)e$
Try this. Replace by $1 [ee]. See demo.
https://regex101.com/r/hR7tH4/28
I'd do something like this
$word =~ s/^(\w)(e)$/$1\[$2$2\]/;
You can use the following regex, which matches any two letters, and then replace it with $1\[$2$2\]:
^([a-zA-Z])([a-zA-Z])$
Demo:
$my_string =~ s/^([a-zA-Z])([a-zA-Z])$/$1\[$2$2\]/;
See demo https://regex101.com/r/iD9oN4/1

split one line regex in a multiline regexp in perl

I am having trouble splitting my regex across multiple lines. I want my regex to match the following line:
* Code "l;k""dfsakd;.*[])_lkaDald"
So I created this regex, which works:
my $firstRegexpr = qr/^\s*\*\s*Code\s+\"(?<Code>((\")*[^\"]+)+)\"/x;
But now I want to split it over multiple lines like this (and I want it to match the same thing!):
my $firstRegexpr = qr/^\s*\*\s*Code\s+\"
(?<Code>((\")*[^\"]+)+)\"/x;
I read about this, but I have trouble using it:
/
^\s*\*\s*Code\s+\"
(?<Code>((\")*[^\"]+)+)\"
/x
My last question is about preventing variable interpolation in a Perl regex:
my $firstRegexpr = qr/^\s*\*\s*Code\s+\"(?<Code>((\")*[^\"$]+)+)\"\$/x;
The characters $] are interpreted as a variable in the regex. How can I stop them from being treated as a variable?
Thanks a lot for your time, and please provide an explicit example.
What the x flag does is very simple: it says 'ignore whitespace in the pattern'.
So a literal space in the pattern no longer matches a space character, and you have to use \s or similar instead.
So you can write:
if ( m/
^
\d+\s+
fish:\w+\s+
$
/x ) {
print "Matched\n";
}
You can test regular expressions with various websites but one example is https://regex101.com/
So to take your example: https://regex101.com/r/eG5jY8/1
But how is yours not working?
This matches:
my $string = q{* Code "l;k""dfsakd;.*[])_lkaDald"};
my $firstRegexpr = qr/^\s*
\*
\s*
Code\s+
\"
(?<Code>((\")*[^\"]+)+)
\"
/x;
print "Compiled_Regex: $firstRegexpr\n";
print "Matched\n" if ( $string =~ m/$firstRegexpr/ );
And as for not having $] interpolated, there are two options: either use \ to escape it, or use \Q\E.
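The same layout trick exists outside Perl. As a rough sketch, Python's re.VERBOSE flag behaves like /x, so the pattern from the question can be spread over several lines with comments. The name CODE_LINE is invented for this example, the inner groups are written as non-capturing, and Python spells the named group (?P<Code>...).

```python
import re

# re.VERBOSE is Python's counterpart of Perl's /x: whitespace and
# comments inside the pattern are ignored.
CODE_LINE = re.compile(r'''
    ^\s* \* \s*        # leading "*" with optional whitespace around it
    Code \s+           # the keyword, then at least one whitespace char
    "                  # opening quote
    (?P<Code>          # named group
        (?:"*[^"]+)+   # optional quotes followed by non-quote chars
    )
    "                  # closing quote
''', re.VERBOSE)

line = '* Code "l;k""dfsakd;.*[])_lkaDald"'
match = CODE_LINE.search(line)
print(match.group("Code") if match else "no match")
```

Python has no variable interpolation inside patterns, so the $] issue from the question does not come up in this sketch.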

Finding the N th Occurrence of a Match line

I have a list (a multiline text string). The order of the items can vary, and so can the number of lines:
Ardei
Mere
Pere
Ardei
Castraveti
I want to find the 2nd occurrence of a line that contains 'Ardei' and replace the item name with another name, and, separately in another regex, find the 1st occurrence of 'Ardei' and replace the name with something else (Perl).
Let's say you want to replace the 2nd "Ardei" with "XYZ". You could do that like this (PCRE syntax):
^(?s)(.*?Ardei.*?)Ardei
and replace it with:
$1XYZ
The $1 contains everything that is captured in (.*?Ardei.*?) and the (?s) will cause the . to match really every character (also line break chars).
A little demo:
#!/usr/bin/perl -w
my $text = 'Ardei
Mere
Pere
Ardei
Castraveti
Ardei';
$text =~ s/^(?s)(.*?Ardei.*?)Ardei/$1XYZ/;
# or just: $text =~ s/^(.*?Ardei.*?)Ardei/$1XYZ/s;
print $text;
will print:
Ardei
Mere
Pere
XYZ
Castraveti
Ardei
Ardei[\W\w]*?(Ardei)
will match exactly the second "Ardei" by its \1, so you can use it to replace exactly the second instance.
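If the occurrence number has to be a parameter, a counting callback is one way to generalise this. The following Python sketch is an assumption-laden illustration, not the method from the answers above: replace_nth is a hypothetical helper that swaps only the n-th match and leaves the others untouched.

```python
import re

def replace_nth(text, word, n, replacement):
    """Replace only the n-th occurrence (1-based) of word in text."""
    count = 0

    def swap(match):
        nonlocal count
        count += 1
        # Only the n-th match is replaced; all others are kept as they are
        return replacement if count == n else match.group(0)

    return re.sub(re.escape(word), swap, text)

text = "Ardei\nMere\nPere\nArdei\nCastraveti\nArdei"
print(replace_nth(text, "Ardei", 2, "XYZ"))
```

Calling it with n=1 handles the "first occurrence" case from the question with the same function.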

Attach a newline to every sentences

I was wondering how to turn a paragraph into bulleted sentences.
before:
sentence1. sentence2. sentence3. sentence4. sentence5. sentence6. sentence7.
after:
sentence1.
sentence2.
sentence3.
sentence4.
sentence5.
Since all the other answers so far show how to do it various programming languages and you have tagged the question with Vim, here's how to do it in Vim:
:%s/\.\(\s\+\|$\)/.\r\r/g
I've used two carriage returns to match the output format you showed in the question. There are a number of alternative regular expression forms you could use:
" Using a look-behind
:%s/\.\@<=\( \|$\)/\r\r/g
" Using 'very magic' to reduce the number of backslashes
:%s/\v\.( |$)/.\r\r/g
" Slightly different formation: this will also break if there
" are no spaces after the full-stop (period).
:%s/\.\s*$\?/.\r\r/g
and probably many others.
A non-regexp way of doing it would be:
:let s = getline('.')
:let lineparts = split(s, '\.\@<=\s*')
:call append('.', lineparts)
:delete
See:
:help pattern.txt
:help change.txt
:help \@<=
:help :substitute
:help getline()
:help append()
:help split()
:help :d
You can use a regex
/\.( |$)/g
That will match the end of the sentence, then you can add newlines.
Or you can use a split function on ". " (dot space) and "." (dot), then join with newlines.
Just replace every space that follows a period, /(?<=\.) /, with two newline characters, /\n\n/. The syntax would of course depend on the language you are using.
Using Perl:
perl -e '$_ = <>; s/\.\s*/.\n/g; print'
Longer, somewhat more readable version:
my $input = 'foo. bar. baz.';
$input =~ s/
\. # A literal '.'
\s* # Followed by 0 or more space characters
/.\n/gx; # g for all occurrences, x to allow comments and whitespace in regex
print $input;
Using Python:
import re
text = 'foo. bar. baz.'
print(re.sub(r'\.\s*', '.\n', text))
An example using Ruby:
ruby-1.9.2 > a = "sentence1. sentence2. sentence3. and array.split(). the end."
=> "sentence1. sentence2. sentence3. and array.split(). the end."
ruby-1.9.2 > puts a.gsub(/\.(\s+|$)/, ".\n\n")
sentence1.
sentence2.
sentence3.
and array.split().
the end.
That is, every . followed by one or more whitespace characters, or by the end of the line, is replaced with just a . and two newline characters.
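Using awk (this splits on whitespace, so it only works when no sentence contains a space, as in the example):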
$ awk '{$1=$1}1' OFS="\n" file
sentence1.
sentence2.
sentence3.
sentence4.
sentence5.
sentence6.
sentence7.
In PHP:
<?php
$input = "sentence. sentence. sentence.";
$output = preg_replace("/(.*?)\\.[\\s]+/", "$1.\n", $input);
?>
Also, regular expressions are a blast, but not necessary for this problem. You can also try:
<?php
$input = "sentence. sentence. sentence.";
$arr = explode('.', $input);
foreach ($arr as $k => $v) $arr[$k] = trim($v);
$output = implode("\n", $arr);
?>
I figured out how to do this in RegExr
Search String is
(\-=?\s+)
--
Replace String is
\n\n
This is the generated information for the current regex
RegExp: /(\-=?\s+)/g
pattern: (\-=?\s+)
flags: g
capturing groups: 1
group 1: (\-=?\s+)
This will find every - in the sentence below and replace it with two newlines
Sentence 1- Sentence 2- Sentence 3- Sentence 4- Sentence 5-
The end result is
Sentence 1
Sentence 2
Sentence 3
Sentence 4
Sentence 5
I have a really simple, naive solution using a capturing group.
:%s/\([.!?]\)\s*/\1\r\r/g
The main drawback is that this won't handle ellipses or multiple punctuation marks.
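One possible way around that drawback is to match a whole run of end-of-sentence punctuation at once. Here is a small Python sketch of that idea; the function name is invented for the example, and it assumes that every run of ., ! or ? followed by whitespace really ends a sentence.

```python
import re

def one_sentence_per_paragraph(text):
    # [.!?]+ keeps runs such as "..." or "?!" together, so each run
    # is followed by a single pair of newlines
    return re.sub(r"([.!?]+)\s+", r"\1\n\n", text.strip())

print(one_sentence_per_paragraph("Wait... What?! Yes. Done."))
```

Abbreviations such as "e.g. " would still be split, so this remains a naive approach.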