Delete everything except what matches a regex pattern - regex

I have a string and I want to replace everything but the pattern.
Right now I know what I want to do is
$line =~ s/[A-Z]{4}[0-9]{4}//g;
but inverted so that it replaces everything with nothing except the pattern.

It is a bit unclear what you are asking, but you may wish to try something like the following, which captures the pattern and then replaces the line with the capture:
#!/usr/bin/env perl
use warnings;
use strict;
my #lines = (
'HELLO WORLD',
'HELLO ABCD1234 WORLD',
'HELLOABCD1234WORLD',
'H E L LO ABCD1234 WORLD',
);
my $re_match = qr([A-Z]{4}[0-9]{4});
for my $line (#lines) {
print "$line => ";
if ($line =~ $re_match) {
$line =~ s|^.*($re_match).*$|$1|;
print $line . "\n";
} else {
print "does not match pattern $re_match \n";
}
}
Output
HELLO WORLD => does not match pattern (?^:[A-Z]{4}[0-9]{4})
HELLO ABCD1234 WORLD => ABCD1234
HELLOABCD1234WORLD => ABCD1234
H E L LO ABCD1234 WORLD => ABCD1234

perl -E '$_="xxABCD1234xxABCD1234xx"; #m = $_ =~ /[A-Z]{4}[0-9]{4}/g; #m and $_ = join "", #m; say'
Output:
ABCD1234ABCD1234

Related

perl match single occurence pattern in string

I have a list of names and I want to look for names containing two given letters asigned using variables.
$one = "A";
$two = "O";
Please note that I want those letters to be present anywhere in the checked names, so that I can get outputs like this:
Jason
Damon
Amo
Noma
Boam
...
But each letter must only be present once per name, meaning that this wouldn't work.
Alamo
I've tried this bit of code but it doesn't work.
foreach my $name (#list) {
if ($name =~ /$one/) {
if ($name =~ /$two/) {
print $name;
}}
else {next}; }
How about this?
for my $name (#list) {
my $ones = () = $name =~ /$one/gi;
my $twos = () = $name =~ /$two/gi;
if ($ones == 1 && $twos == 1) {
print $name;
}
}
#!/usr/bin/env perl
#
# test.pl is the name of this script
use warnings;
use strict;
my %char = map {$_ => 1} grep {/[a-z]/} map {lc($_)} split //, join '', #ARGV;
my #chars = sort keys %char; # the different characters appearing in the command line arguments
while (my $line = <STDIN>)
{
grep {$_ <=> 1} map {scalar(() = $line =~ /$_/ig )} #chars
or print $line;
}
Now:
echo hello world | test.pl fw will print nothing (w occurs exactly once in hello world, but f does not)
echo hello world | test.pl hw will print a line consisting of hello world (both h and w occur exactly once).
One way to get it all into a single regex is to use an expression within the regex pattern to search for the other letter (a or o) based on which one was found first:
#!/usr/bin/env perl
use 5.010; use strict; use warnings;
while(<DATA>){
chomp;
say if m/^
[^ao]* # anything but a or o
([ao]) # an 'a' or 'o'
[^ao]* # anything but a or o
(??{($1 and lc($1) eq 'a') ? 'o' : 'a'}) # the other 'a' or 'o'
[^ao]* $/xi; # anything but a or o
}
__DATA__
Abdi
Chelal
Jucal
Husham
Gallio
Pileser
Tekoa
Kenaz
Raphah
See the perlre section on Extended Expressions for more info.
This is my solution. You don't make it clear whether there will always be just two single-character strings to match but I have assumed that there may be more
Unfortunately the classical way of counting occurrences of a character -- tr/// -- doesn't interpolate variables into its searchlist and doesn't have a case-independent modifier /i. But the pattern-match operator m// does, so that is what I have used
I thoroughly dislike the so-called goatse operator, but there isn't a neater way that I know of that allows you to count the number of times a global regex pattern matches
I could have used a grep for the inner loop, but I went for a regular for loop and a next with a label as I believe it's more readable this way
use strict;
use warnings;
use v5.10.1;
use autodie;
my #list = do {
open my $fh, '<', 'names.txt';
<$fh>;
};
chomp #list;
my ($one, $two) = qw/ A O /;
NAME:
for my $name ( #list ) {
for ( $one, $two) {
my $count = () = $name =~ /$_/gi;
next NAME unless $count == 1;
}
say $name;
}
output
Gallio
Tekoa
Achbor
Clopas
This is the input that I used
Abdi
Chelal
Jucal
Husham
Gallio
Pileser
Tekoa
Kenaz
Raphah
Tiras
Jehudi
Bildad
Shemidah
Meshillemoth
Tabeel
Achbor
Jesus
Osee
Elnaam
Rephah
Asaiah
Er
Clopas
Penuel
Shema
Marsena
Jaare
Joseph
Shamariah
Levi
Aphses

match using regex in perl

HI I am trying to exract some data from a text file in perl. My file looks like this
Name:John
FirstName:Smith
Name:Alice
FirstName:Meyers
....
I want my string to look like John Smith and Alice Meyers
I tried something like this but I'm stuck and I don't know how to continue
while (<INPUT>) {
if (/^[Name]/) {
$match =~ /(:)(.*?)(\n) /
$string = $string.$2;
}
if (/^[FirstName]/) {
$match =~ /(:)(.*?)(\n)/
$string = $string.$2;
}
}
What I try to do is that when I match Name or FirstName to copy to content between : and \n but I get confused which is $1 and $2
This will put you first and last names in a hash:
use strict;
use warnings;
use Data::Dumper;
open my $in, '<', 'in.txt';
my (%data, $names, $firstname);
while(<$in>){
chomp;
($names) = /Name:(.*)/ if /^Name/;
($firstname) = /FirstName:(.*)/ if /^FirstName/;
$data{$names} = $firstname;
}
print Dumper \%data;
Through perl one-liner,
$ perl -0777 -pe 's/(?m).*?Name:([^\n]*)\nFirstName:([^\n]*).*/\1 \2/g' file
John Smith
Alice Meyers
while (<INPUT>) {
/^([A-Za-z])+\:\s*(.*)$/;
if ($1 eq 'Name') {
$surname = $2;
} elsif ($1 eq 'FirstName') {
$completeName = $2 . " " . $surname;
} else {
/* Error */
}
}
You might want to add some error handling, e.g. make sure that a Name is always followed by a FirstName and so on.
$1 $2 $3 .. $N , it's the capture result of () inside regex.
If you do something like that , you cant avoid using $1 like variables.
my ($matched1,$matched2) = $text =~ /(.*):(.*)/
my $names = [];
my $name = '';
while(my $row = <>){
$row =~ /:(.*)/;
$name = $name.' '.$1;
push(#$names,$name) if $name =~ / /;
$name = '' if $name =~ / /;
}
`while(<>){
}
`
open (FH,'abc.txt');
my(%hash,#array);
map{$_=~s/.*?://g;chomp($_);push(#array,$_)} <FH>;
%hash=#array;
print Dumper \%hash;

Find text enclosed by # and replace the inside

The problem:
Find pieces of text in a file enclosed by # and replace the inside
Input:
#abc# abc #ABC#
cba #cba CBA#
Deisred output:
абц abc АБЦ
cba цба ЦБА
I have the following:
#!/usr/bin/perl
use strict;
use warnings;
use Encode;
my $output;
open FILE,"<", 'test.txt';
while (<FILE>) {
chomp(my #chars = split(//, $_));
for (#chars) {
my #char;
$_ =~ s/a/chr(0x430)/eg;
$_ =~ s/b/chr(0x431)/eg;
$_ =~ s/c/chr(0x446)/eg;
$_ =~ s/d/chr(0x434)/eg;
$_ =~ s/e/chr(0x435)/eg;
$_ =~ s/A/chr(0x410)/eg;
$_ =~ s/B/chr(0x411)/eg;
$_ =~ s/C/chr(0x426)/eg;
push #char, $_;
$output = join "", #char;
print encode("utf-8",$output);}
print "\n";
}
close FILE;
But I'm stuck on how to process further
Thanks for help in advance!
Kluther
Here my solution. (you will fixed it, yes. It is prototype)
for (my $data = <DATA>){
$data=~s/[#]([\s\w]+)[#]/func($1)/ge;
print $data;
# while($data=~m/[#]([\s\w]+)[#]/g){
# print "marked: ",$1,"\n";
# print "position:", pos();
# }
# print "not marked: ";
}
sub func{
#do your magic here ;)
return "<< #_ >>";
}
__DATA__
#abc# abc #ABC# cba #cba CBA#
What happens here?
First, I read data. You can do it yourself.
for (my $data = <DATA>){...}
Next, I need to search your pattern and replace it.
What should I do?
Use substition operator: s/pattern/replace/
But in interesting form:
s/pattern/func($1)/ge
Key g mean Global Search
Key e mean Evaluate
So, I think, that you need to write your own func function ;)
Maybe better to use transliteration operator: tr/listOfSymbolsToBeReplaced/listOfSymbolsThatBePlacedInstead/
With minimal changes to your algorithm you need to keep track of whether you are inside the #marks or not. so add something like this
my $bConvert = 0;
chomp(my #chars = split(//, $_));
for (#chars) {
my $char = $_;
if (/#/) {
$bConvert = ($bConvert + 1) % 2;
next;
}
elsif ($bConvert) {
$char =~ s/a/chr(0x430)/eg;
$char =~ s/b/chr(0x431)/eg;
$char =~ s/c/chr(0x446)/eg;
$char =~ s/d/chr(0x434)/eg;
$char =~ s/e/chr(0x435)/eg;
$char =~ s/A/chr(0x410)/eg;
$char =~ s/B/chr(0x411)/eg;
$char =~ s/C/chr(0x426)/eg;
}
print encode("utf-8",$char);
}
Try this after $output is processed.
$output =~ s/\#//g;
my #split_output = split(//, $output);
$output = "";
my $len = scalar(#split_output) ;
while ($len--) {
$output .= shift(#split_output);
}
print $output;
It can be done with a single regex and no splitting of the string:
use strict;
use warnings;
use Encode;
my %chars = (
a => chr(0x430),
b => chr(0x431),
c => chr(0x446),
d => chr(0x434),
e => chr(0x435),
A => chr(0x410),
B => chr(0x411),
C => chr(0x426),
);
my $regex = '(' . join ('|', keys %chars) . ')';
while (<DATA>) {
1 while ($_ =~ s|\#(?!\s)[^#]*?\K$regex(?=[^#]*(?!\s)\#)|$chars{$1}|eg);
print encode("utf-8",$_);
}
It does require repeated runs of the regex due to the overlapping nature of the matches.

How to replace a variable with another variable in PERL?

I am trying to replace all words from a text except some that I have in an array. Here's my code:
my $text = "This is a text!And that's some-more text,text!";
while ($text =~ m/([\w']+)/g) {
next if $1 ~~ #ignore_words;
my $search = $1;
my $replace = uc $search;
$text =~ s/$search/$replace/e;
}
However, the program doesn't work. Basically I am trying to make all words uppercase but skip the ones in #ignore_words. I know it's a problem with the variables being used in the regular expression, but I can't figure the problem out.
#!/usr/bin/perl
my $text = "This is a text!And that's some-more text,text!";
my #ignorearr=qw(is some);
my %h1=map{$_ => 1}#ignorearr;
$text=~s/([\w']+)/($h1{$1})?$1:uc($1)/ge;
print $text;
On running this,
THIS is A TEXT!AND THAT'S some-MORE TEXT,TEXT!
You can figure the problem out of your code if instead of applying an expression to the same control variable of a while loop, just let s/../../eg do it globally for you:
my $text = "This is a text!And that's some-more text,text!";
my #ignore_words = qw{ is more };
$text =~ s/([\w']+)/$1 ~~ #ignore_words ? $1 : uc($1)/eg;
print $text;
And on running:
THIS is A TEXT!AND THAT'S SOME-more TEXT,TEXT!

How can I count the amount of spaces at the start of a string in Perl?

How can I count the amount of spaces at the start of a string in Perl?
I now have:
$temp = rtrim($line[0]);
$count = ($temp =~ tr/^ //);
But that gives me the count of all spaces.
$str =~ /^(\s*)/;
my $count = length( $1 );
If you just want actual spaces (instead of whitespace), then that would be:
$str =~ /^( *)/;
Edit: The reason why tr doesn't work is it's not a regular expression operator. What you're doing with $count = ( $temp =~ tr/^ // ); is replacing all instances of ^ and with itself (see comment below by cjm), then counting up how many replacements you've done. tr doesn't see ^ as "hey this is the beginning of the string pseudo-character" it sees it as "hey this is a ^".
You can get the offset of a match using #-. If you search for a non-whitespace character, this will be the number of whitespace characters at the start of the string:
#!/usr/bin/perl
use strict;
use warnings;
for my $s ("foo bar", " foo bar", " foo bar", " ") {
my $count = $s =~ /\S/ ? $-[0] : length $s;
print "'$s' has $count whitespace characters at its start\n";
}
Or, even better, use #+ to find the end of the whitespace:
#!/usr/bin/perl
use strict;
use warnings;
for my $s ("foo bar", " foo bar", " foo bar", " ") {
$s =~ /^\s*/;
print "$+[0] '$s'\n";
}
Here's a script that does this for every line of stdin. The relevant snippet of code is the first in the body of the loop.
#!/usr/bin/perl
while ($x = <>) {
$s = length(($x =~ m/^( +)/)[0]);
print $s, ":", $x, "\n";
}
tr/// is not a regex operator. However, you can use s///:
use strict; use warnings;
my $t = (my $s = " \t\n sdklsdjfkl");
my $n = 0;
++$n while $s =~ s{^\s}{};
print "$n \\s characters were removed from \$s\n";
$n = ( $t =~ s{^(\s*)}{} ) && length $1;
print "$n \\s characters were removed from \$t\n";
Since the regexp matcher returns the parenthesed matches when called in a list context, CanSpice's answer can be written in a single statement:
$count = length( ($line[0] =~ /^( *)/)[0] );
This prints amount of white space
echo " hello" |perl -lane 's/^(\s+)(.*)+$/length($1)/e; print'
3