I have some string which contain
$str = a > b;
$str1 = c < d;
$str2 = e = f;
What is the best way to do if i want to reverse the operator for example > become < , > become < , = become !=. The only way i know is to doing matching and get the a and b and using join function to join a and b with opposite operator. Any better way to do it?
Assuming your strings are really like this:
$str = 'a > b';
$str1 = 'c < d';
$str2 = 'e = f';
For one-character operators you can use the transliteration operator tr///, which is cheaper than regular expressions.
$str =~ tr/<>+-/><-+/;
For turning = into !=, you will need to fall back to a substitution using s///.
You can use a lookup hash for each negation, and turn the lookup keys into a pattern that you can use for the lookup.
use strict;
use warnings;
my %op = (
'<' => '>',
'>' => '<',
'=' => '!=',
'!=' => '=',
);
my $pattern = sprintf '(%s)', join '|', map quotemeta, keys %op;
my #strings = ('a > b', 'a < b', 'e = f', 'g != h');
foreach my $str (#strings) {
print $str;
$str =~ s/$pattern/$op{$1}/e;
print " --> $str\n";
}
__END__
a > b --> a < b
a < b --> a > b
e = f --> e != f
g != h --> g = h
The quotemeta makes sure there are no regular expression meta chars in the pattern. That's not important for the example chars I showed, but it might be in your full use case.
This might be a good time to read perlre.
Related
my $str = "<SampleElement oldattribs=\"sa1 sa2 sa3\">";
$str =~ s#<SampleElement[^>]*oldattribs="([^"]*)"#
my $fulcnt=$&;
my $afids=$1;
my #affs = ();
if($afids =~ m/\s+/) {
#affs = split /\s/, $afids;
my $jnafs = join ",", map { $_=~s/[a-z]*//i, } #affs;
($fulcnt." newattribs=\"$jnafs\"");
}
else {
($fulcnt);
}
#eg;
My Output:
<SampleElement oldattribs="sa1 sa2 sa3" newattribs="1,1,1">
Expected Output:
<SampleElement oldattribs="sa1 sa2 sa3" newattribs="1,2,3">
Someone could point out me where I am doing wrong. Thanks in advance.
Where you're going wrong is earlier than you think - you're parsing XML using regular expressions. XML is contextual, and regex isn't, so it's NEVER going to be better than a dirty hack.
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig -> parse ( \*DATA );
my $sample_elt = $twig -> get_xpath('//SampleElement',0);
my #old_att = split ( ' ', $sample_elt -> att('oldattribs') );
$sample_elt -> set_att('newattribs', join " ", map { /(\d+)/ } #old_att);
$twig -> set_pretty_print ( 'indented_a' );
$twig -> print;
__DATA__
<XML>
<SampleElement oldattribs="sa1 sa2 sa3">
</SampleElement>
</XML>
But to answer the core of your problem - you're misusing map as an iterator here.
map { $_=~s/[a-z]*//i, } #affs;
Because what that is doing is iterating all the elements in #affs and modifying those... but map is just returning the result of the expression - which is 1 because it worked.
If you want to change #affs you'd:
s/[a-z]*//i for #affs;
But if you didn't want to, then the easy answer is to use the r regex flag:
map { s/[a-z]*//ir } #affs;
Or as I've done in my example:
map { /(\d+)/ } #affs;
Which regex matches and captures the numeric part of the string, but as a result the 'captured' text is what's returned.
Here is a simple way to build shown output from the input $str.
Note: The input is in single quotes, not double. Then the \" isn't a problem in the regex.
my $str = '<SampleElement oldattribs=\"sa1 sa2 sa3\">';
# Pull 'sa1 sa2 sa3' string out of it
my ($attrs) = $str =~ /=\\"([^\\]+)/; # " # (turn off bad syntax highlight)
# Build '1,2,3' string from it
my $indices = join ',', map { /(\d+)/ } split ' ', $attrs;
# Extract content between < > so to add to it, put it back together
my ($content) = $str =~ /<(.*)>/;
my $outout = '<' . $content . " newattribs=\"$indices\"" . '>';
This gives the required output.
Some of these can be combined into single statements, if you are into that. For example
my $indices =
join ',', map { /(\d+)/ } split ' ', ($str =~ /"([^\\]+)/)[0]; # "
$str =~ s/<(.*)>/<$1 newattribs=\"$indices\">/;
All of this can be rolled into one regex, but it becomes just unwieldy and hard to maintain.
Above all – this appears to be XML or such ... please don't do it by hand, unless there is literally just a snippet or two. There are excellent parsers.
Found solution on this by searching map function:
my $str = "<SampleElement oldattribs=\"sa1 sa2 sa3\">";
$str=~s#<SampleElement[^>]*oldattribs="([^"]*)"#my $fulcnt=$&; my $afids=$1;
my #affs = ();
if($afids=~m/\s+/)
{
#affs = split /\s/, $afids;
my #newas = join ",", map { (my $foo = $_) =~ s/[a-z]*//i; $foo; } #affs ;
($fulcnt." newattribs=\"#newas\"");
}
else
{
($fulcnt);
}
#eg;
I have updated the below line on my code:
my #newas = join ",", map { (my $foo = $_) =~ s/[a-z]*//i; $foo; } #affs ;
Instead of
my $jnafs = join ",", map { $_=~s/[a-z]*//i, } #affs;
Its working thanks for all.
Lets say I have a file learning.txt with the following info :
A *
B &
C (
D )
How can I take a user-input string abc and return *&(
There is this very efficient (O(N+M)) solution in Perl
my %replace = ( A => '*', B => '&', C => '(', D => ')' );
my $re = join '|', map quotemeta, keys %replace;
$re = qr/($re)/;
# and somewhere else in the scope with $re and %replace
s/$re/$replace{$1}/g;
And but for the case insensitive it's a little bit more complicated
use feature qw(fc); # since v5.16 use lc otherwise
my %replace = ( A => '*', B => '&', C => '(', D => ')' );
my $re = join '|', map quotemeta, keys %replace;
$re = qr/($re)/i;
my %replace_fc;
#replace_fc{ map fc, keys %replace } = values %replace;
# and somewhere else in the scope with $re and %replace_fc
s/$re/$replace_fc{fc $1}/g;
Just feed %replace from your file like this
while (<>) {
my ($key, $val) = split;
$replace{$key} = $val;
}
Use /^(A|B|C)\\b\\s*(.+)/m regex (that has a multiline option) and then concatenate second groups.
See example of this regex output here.
I have a string that may contain an arbitrary number of single-letters separated by spaces. I am looking for a regex (in Perl) that will remove spaces between all (unknown number) of single letters.
For example:
ab c d should become ab cd
a bcd e f gh should become a bcd ef gh
a b c should become abc
and
abc d should be unchanged (because there are no single letters followed by or preceded by a single space).
Thanks for any ideas.
Your description doesn't really match your examples. It looks to me like you want to remove any space that is (1) preceded by a letter which is not itself preceded by a letter, and (2) followed by a letter which is not itself followed by a letter. Those conditions can be expressed precisely as nested lookarounds:
/(?<=(?<!\pL)\pL) (?=\pL(?!\pL))/
tested:
use strict;
use warnings;
use Test::Simple tests => 4;
sub clean {
(my $x = shift) =~ s/(?<=(?<!\pL)\pL) (?=\pL(?!\pL))//g;
$x;
}
ok(clean('ab c d') eq 'ab cd');
ok(clean('a bcd e f gh') eq 'a bcd ef gh');
ok(clean('a b c') eq 'abc');
ok(clean('ab c d') eq 'ab cd');
output:
1..4
ok 1
ok 2
ok 3
ok 4
I'm assuming you really meant one space character (U+0020); if you want to match any whitespace, you might want to replace the space with \s+.
You can do this with lookahead and lookbehind assertions, as described in perldoc perlre:
use strict;
use warnings;
use Test::More;
is(tran('ab c d'), 'ab cd');
is(tran('a bcd e f gh'), 'a bcd ef gh');
is(tran('a b c'), 'abc');
is(tran('abc d'), 'abc d');
sub tran
{
my $input = shift;
(my $output = $input) =~ s/(?<![[:lower:]])([[:lower:]]) (?=[[:lower:]])/$1/g;
return $output;
}
done_testing;
Note the current code fails on the second test case, as the output is:
ok 1
not ok 2
# Failed test at test.pl line 7.
# got: 'abcd efgh'
# expected: 'a bcd ef gh'
ok 3
ok 4
1..4
# Looks like you failed 1 test of 4.
I left it like this as your second and third examples seem to contradict each other as to how leading single characters should be handled. However, this framework should be enough to allow you to experiment with different lookaheads and lookbehinds to get the exact results you are looking for.
This piece of code
#!/usr/bin/perl
use strict;
my #strings = ('a b c', 'ab c d', 'a bcd e f gh', 'abc d');
foreach my $string (#strings) {
print "$string --> ";
$string =~ s/\b(\w)\s+(?=\w\b)/$1/g; # the only line that actually matters
print "$string\n";
}
prints this:
a b c --> abc
ab c d --> ab cd
a bcd e f gh --> a bcd ef gh
abc d --> abc d
I think/hope this is what you're looking for.
This should do the trick:
my $str = ...;
$str =~ s/ \b(\w) \s+ (\w)\b /$1$2/gx;
That removes the space between all single nonspace characters. Feel free to replace \S with a more restrictive character class if needed. There also may be some edge cases related to punctuation characters that you need to deal with, but I can't guess that from the info you have provided.
As Ether helpfully points out, this fails on one case. Here is a version that should work (though not quite as clean as the first):
s/ \b(\w) ( (?:\s+ \w\b)+ ) /$1 . join '', split m|\s+|, $2/gex;
I liked Ether's test based approach (imitation is the sincerest form of flattery and all):
use warnings;
use strict;
use Test::Magic tests => 4;
sub clean {
(my $x = shift) =~ s{\b(\w) ((?: \s+ (\w)\b)+)}
{$1 . join '', split m|\s+|, $2}gex;
$x
}
test 'space removal',
is clean('ab c d') eq 'ab cd',
is clean('a bcd e f gh') eq 'a bcd ef gh',
is clean('a b c') eq 'abc',
is clean('abc d') eq 'abc d';
returns:
1..4
ok 1 - space removal 1
ok 2 - space removal 2
ok 3 - space removal 3
ok 4 - space removal 4
It's not a regex but since I am lazy by nature I would it do this way.
#!/usr/bin/env perl
use warnings;
use 5.012;
my #strings = ('a b c', 'ab c d', 'a bcd e f gh', 'abc d');
for my $string ( #strings ) {
my #s; my $t = '';
for my $el ( split /\s+/, $string ) {
if ( length $el > 1 ) {
push #s, $t if $t;
$t = '';
push #s, $el;
} else { $t .= $el; }
}
push #s, $t if $t;
say "#s";
}
OK, my way is the slowest:
no_regex 130619/s -- -60% -61% -63%
Alan_Moore 323328/s 148% -- -4% -8%
Eric_Storm 336748/s 158% 4% -- -5%
canavanin 352654/s 170% 9% 5% --
I didn't include Ether's code because ( as he has tested ) it returns different results.
Now I have the slowest and the fastest.
#!/usr/bin/perl
use 5.012;
use warnings;
use Benchmark qw(cmpthese);
my #strings = ('a b c', 'ab c d', 'a bcd e f gh', 'abc d');
cmpthese( 0, {
Eric_Storm => sub{ for my $string (#strings) { $string =~ s{\b(\w) ((?: \s+ (\w)\b)+)}{$1 . join '', split m|\s+|, $2}gex; } },
canavanin => sub{ for my $string (#strings) { $string =~ s/\b(\w)\s+(?=\w\b)/$1/g; } },
Alan_Moore => sub{ for my $string (#strings) { $string =~ s/(?<=(?<!\pL)\pL) (?=\pL(?!\pL))//g; } },
keep_uni => sub{ for my $string (#strings) { $string =~ s/\PL\pL\K (?=\pL(?!\pL))//g; } },
keep_asc => sub{ for my $string (#strings) { $string =~ s/[^a-zA-Z][a-zA-Z]\K (?=[a-zA-Z](?![a-zA-Z]))//g; } },
no_regex => sub{ for my $string (#strings) { my #s; my $t = '';
for my $el (split /\s+/, $string) {if (length $el > 1) { push #s, $t if $t; $t = ''; push #s, $el; } else { $t .= $el; } }
push #s, $t if $t;
#say "#s";
} },
});
.
Rate no_regex Alan_Moore Eric_Storm canavanin keep_uni keep_asc
no_regex 98682/s -- -64% -65% -66% -81% -87%
Alan_Moore 274019/s 178% -- -3% -6% -48% -63%
Eric_Storm 282855/s 187% 3% -- -3% -46% -62%
canavanin 291585/s 195% 6% 3% -- -45% -60%
keep_uni 528014/s 435% 93% 87% 81% -- -28%
keep_asc 735254/s 645% 168% 160% 152% 39% --
This will do the job.
(?<=\b\w)\s(?=\w\b)
Hi I have written simple javascript to do this it's simple and you can convert into any language.
function compressSingleSpace(source){
let words = source.split(" ");
let finalWords = [];
let tempWord = "";
for(let i=0;i<words.length;i++){
if(tempWord!='' && words[i].length>1){
finalWords.push(tempWord);
tempWord = '';
}
if(words[i].length>1){
finalWords.push(words[i]);
}else{
tempWord += words[i];
}
}
if(tempWord!=''){
finalWords.push(tempWord);
}
source = finalWords.join(" ");
return source;
}
function convertInput(){
let str = document.getElementById("inputWords").value;
document.getElementById("firstInput").innerHTML = str;
let compressed = compressSingleSpace(str);
document.getElementById("finalOutput").innerHTML = compressed;
}
label{
font-size:20px;
margin:10px;
}
input{
margin:10px;
font-size:15px;
padding:10px;
}
input[type="button"]{
cursor:pointer;
background: #ccc;
}
#firstInput{
color:red;
font-size:20px;
margin:10px;
}
#finalOutput{
color:green;
font-size:20px;
margin:10px;
}
<label for="inputWords">Enter your input and press Convert</label><br>
<input id="inputWords" value="check this site p e t z l o v e r . c o m thanks">
<input type="button" onclick="convertInput(this.value)" value="Convert" >
<div id="firstInput">check this site p e t z l o v e r . c o m thanks</div>
<div id="finalOutput">check this site petzlover.com thanks</div>
I have the following Perl script counting the number of Fs and Ts in a string:
my $str = "GGGFFEEIIEETTGGG";
my $ft_count = 0;
$ft_count++ while($str =~ m/[FT]/g);
print "$ft_count\n";
Is there a more concise way to get the count (in other words, to combine line 2 and 3)?
my $ft_count = $str =~ tr/FT//;
See perlop.
If the REPLACEMENTLIST is empty, the
SEARCHLIST is replicated. This latter is useful for counting
characters in a class …
$cnt = $sky =~ tr/*/*/; # count the stars in $sky
$cnt = tr/0-9//; # count the digits in $_
Here's a benchmark:
use strict; use warnings;
use Benchmark qw( cmpthese );
my ($x, $y) = ("GGGFFEEIIEETTGGG" x 1000) x 2;
cmpthese -5, {
'tr' => sub {
my $cnt = $x =~ tr/FT//;
},
'm' => sub {
my $cnt = ()= $y =~ m/[FT]/g;
},
};
Rate tr m
Rate m tr
m 108/s -- -99%
tr 8118/s 7440% --
With ActiveState Perl 5.10.1.1006 on 32 Windows XP.
The difference seems to be starker with
C:\Temp> c:\opt\strawberry-5.12.1\perl\bin\perl.exe t.pl
Rate m tr
m 88.8/s -- -100%
tr 25507/s 28631% --
When the "m" operator has the /g flag AND is executed in list context, it returns a list of matching substrings. So another way to do this would be:
my #ft_matches = $str =~ m/[FT]/g;
my $ft_count = #ft_matches; # count elements of array
But that's still two lines. Another weirder trick that can make it shorter:
my $ft_count = () = $str =~ m/[FT]/g;
The "() =" forces the "m" to be in list context. Assigning a list with N elements to a list of zero variables doesn't actually do anything. But then when this assignment expression is used in a scalar context ($ft_count = ...), the right "=" operator returns the number of elements from its right-hand side - exactly what you want.
This is incredibly weird when first encountered, but the "=()=" idiom is a useful Perl trick to know, for "evaluate in list context, then get size of list".
Note: I have no data on which of these are more efficient when dealing with large strings. In fact, I suspect your original code might be best in that case.
Yes, you can use the CountOf secret operator:
my $ft_count = ()= $str =~ m/[FT]/g;
You can combine line 2, 3 and 4 into one like so:
my $str = "GGGFFEEIIEETTGGG";
print $str =~ s/[FT]//g; #Output 4;
I need some Perl regular expression help. The following snippet of code:
use strict;
use warnings;
my $str = "In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L";
my $word = "plus";
my #results = ();
1 while $str =~ s/(.{2}\b$word\b.{2})/push(#results,"$1\n")/e;
print #results;
Produces the following output:
A plus B
D plus E
2 plus F
H plus I
4 plus J
5 plus K
What I want to see is this, where a character already matched can appear in a new match in a different context:
A plus B
D plus E
E plus F
H plus I
I plus J
J plus K
How do I change the regular expression to get this result? Thanks --- Dan
General advice: Don't use s/// when you want m//. Be specific in what you match.
The answer is pos:
#!/usr/bin/perl -l
use strict;
use warnings;
my $str = 'In this example, ' . 'A plus B equals C, ' .
'D plus E plus F equals G ' .
'and H plus I plus J plus K equals L';
my $word = "plus";
my #results;
while ( $str =~ /([A-Z] $word [A-Z])/g ) {
push #results, $1;
pos($str) -= 1;
}
print "'$_'" for #results;
Output:
C:\Temp> b
'A plus B'
'D plus E'
'E plus F'
'H plus I'
'I plus J'
'J plus K'
You can use a m//g instead of s/// and assign to the pos function to rewind the match location before the second term:
use strict;
use warnings;
my $str = 'In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L';
my $word = 'plus';
my #results;
while ($str =~ /(.{2}\b$word\b(.{2}))/g) {
push #results, "$1\n";
pos $str -= length $2;
}
print #results;
Another option is to use a lookahead:
use strict;
use warnings;
my $str = "In this example, A plus B equals C, D plus E "
. "plus F equals G and H plus I plus J plus K equals L";
my $word = "plus";
my $chars = 2;
my #results = ();
push #results, $1
while $str =~ /(?=((.{0,$chars}?\b$word\b).{0,$chars}))\2/g;
print "'$_'\n" for #results;
Within the lookahead, capturing group 1 matches the word along with a variable number of leading and trailing context characters, up to whatever maximum you've set. When the lookahead finishes, the backreference \2 matches "for real" whatever was captured by group 2, which is the same as group 1 except that it stops at the end of the word. That sets pos where you want it, without requiring you to calculate how many characters you actually matched after the word.
Given the "Full Disclosure" comment (but assuming .{0,35}, not .{35}), I'd do
use List::Util qw/max min/;
my $context = 35;
while ( $str =~ /\b$word\b/g ) {
my $pre = substr( $str, max(0, $-[0] - $context), min( $-[0], $context ) );
my $post = substr( $str, $+[0], $context );
my $match = substr( $str, $-[0], $+[0] - $-[0] );
$pre =~ s/.*\n//s;
$post =~ s/\n.*//s;
push #results, "$pre$match$post";
}
print for #results;
You'd skip the substitutions if you really meant (?s:.{0,35}).
Here's one way to do it:
use strict;
use warnings;
my $str = "In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L";
my $word = "plus";
my #results = ();
my $i = 0;
while (substr($str, $i) =~ /(.{2}\b$word\b.{2})/) {
push #results, "$1\n";
$i += $-[0] + 1;
}
print #results;
It's not terribly Perl-ish, but it works and it doesn't use too many obscure regular expression tricks. However, you might have to look up the function of the special variable #- in perlvar.
don't have to use regex. basically, just split up the string, use a loop to go over each items, check for "plus" , then get the word from before and after.
my $str = "In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L";
#s = split /\s+/,$str;
for($i=0;$i<=scalar #s;$i++){
if ( "$s[$i]" eq "plus" ){
print "$s[$i-1] plus $s[$i+1]\n";
}
}