How to find and replace a regex in perl

I have some text and want to insert a space wherever a word character is directly followed by an opening parenthesis (\w\(), so that myword( becomes myword (. How can I do that with s///, or is there another way?

Try this
$s = "myword( word2(";
$s =~s/(\w+)(\()/$1 $2/g;
print $s;
Following @ikegami's comment: the \w+ in the regex above backtracks needlessly, and there is no need to capture the ( because it is a known literal. So I changed the regex accordingly.
New regex:
$s =~s/(\w)\(/$1 (/g;

Here is a way:
my $str = "myword(";
$str =~ s/(\w+)(\()/$1 $2/;
print $str, "\n";
Output:
myword (

Use look ahead:
$ perl -pe 's/(\w)(?=\()/$1 /' <<< 'word('
word (
Or look ahead together with look behind:
$ perl -pe 's/(?<=\w)(?=\()/ /' <<< 'word('
word (

s/\w\K\(/ (/g # 5.10+
or
s/(\w)\(/$1 (/g
or
s/(?<=\w)\(/ (/g
The first is much faster than the other two, but all are faster than the other correct solutions provided. (Not sure which is the fastest of the second and third.)
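To check that the three forms really are interchangeable, here is a minimal sketch that wraps each one in a sub and runs them on the same input. The sub names are made up for illustration, and the /r modifier assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Three equivalent ways to insert a space between a word character and "(".
# The sub names are illustrative only; /r (5.14+) returns the modified copy.
sub with_keep       { return $_[0] =~ s/\w\K\(/ (/gr }      # \K needs 5.10+
sub with_capture    { return $_[0] =~ s/(\w)\(/$1 (/gr }
sub with_lookbehind { return $_[0] =~ s/(?<=\w)\(/ (/gr }

my $input = 'myword( word2( f(g(x))';
print with_keep($input),       "\n";
print with_capture($input),    "\n";
print with_lookbehind($input), "\n";
```

All three print the same line. A ( that is not preceded by a word character, such as one at the start of the string, is left alone.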

Another way is to use \K, which drops everything matched so far from the text that gets replaced:
#!/usr/bin/perl
use strict;
use warnings;
my $string = q{myword( myword2(};
$string=~s/\w\K\(/ (/g;
print $string,"\n";

Related

How to change a pattern like XX1/XXSomething/XX1/Something to XXSomething/XX1/Something in perl

I'm having a file in which some lines have some patterns like
M1/XX2/XX1 XX2/XX1/XX2/WCLKB XX2/XX1/XX2/P001
M1/XX4/XX5 XX4/XX5/XX4/WCLKB XX4/XX5/XX4/P001
Here in some patterns XX2 is repeating. I need to change the above line to
M1/XX2/XX1 XX1/XX2/WCLKB XX1/XX2/P001
M1/XX4/XX5 XX5/XX4/WCLKB XX5/XX4/P001
These XX can vary XX[0..9]
The code is in Perl.
I tried using some regex but was confused.
open(FILE,$FilePath);
@linesInFile = <FILE>;
close(FILE);
foreach $item(@linesInFile){
if(grep(/^XX?\/XX.\/XX)
#I dont know how to complete this
}

If you're looking specifically for XXn/XXm/XXn/ (where n is the same number both times), you can use backreferences:
s{(XX[0-9]+/)(XX[0-9]+/\1)}{$2}g
Here \1 refers back to and matches the same string as the first capturing group, (XX[0-9]+/).
Live demo:
#!/usr/bin/perl
use strict;
use warnings;
while (my $line = readline DATA) {
$line =~ s{(XX[0-9]+/)(XX[0-9]+/\1)}{$2}g;
print $line;
}
__DATA__
M1/XX2/XX1 XX2/XX1/XX2/WCLKB XX2/XX1/XX2/P001
M1/XX4/XX5 XX4/XX5/XX4/WCLKB XX4/XX5/XX4/P001
Output:
M1/XX2/XX1 XX1/XX2/WCLKB XX1/XX2/P001
M1/XX4/XX5 XX5/XX4/WCLKB XX5/XX4/P001

If it's ok to blindly remove the first part:
while (<>) {
s{ \K[^\s/]+/}{}g;
print;
}
As a one-liner:
perl -pe's{ \K[^\s/]+/}{}g'
If you want to make sure it matches the pattern you specified:
while (<>) {
s{(?<!\S)(XX\d)/(?=XX[^\s/]+/\1/\S)}{}ag;
print;
}
As a one-liner:
perl -pe's{(?<!\S)(XX\d)/(?=XX[^\s/]+/\1/\S)}{}ag'
The key is \1, which means "match what the first capture captured".
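The stricter pattern is dense, so here is the same substitution spelled out with /x and one comment per piece. This is only a sketch: the sub name drop_repeated_prefix is hypothetical, and the /a flag assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: drop a leading XXn/ from fields shaped like XXn/XXm/XXn/...
sub drop_repeated_prefix {
    my ($line) = @_;
    $line =~ s{
        (?<!\S)         # start of a whitespace-separated field
        (XX\d)/         # first component, captured as \1, plus its slash
        (?=             # only if what follows is ...
            XX[^\s/]+/  #   a second XX component,
            \1/         #   then the first component again,
            \S          #   then at least one more character
        )
    }{}xag;             # /a restricts \d to ASCII digits
    return $line;
}

print drop_repeated_prefix("M1/XX2/XX1 XX2/XX1/XX2/WCLKB XX2/XX1/XX2/P001\n");
```

Because the lookahead consumes nothing, only the leading XXn/ is removed, and a field whose third component differs from the first is left unchanged.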

Based on your description (each XX is followed by a single digit, XX[0-9]), the following perl command should do the trick:
Input:
$ cat input
M1/XX2/XX1 XX2/XX1/XX2/WCLKB XX2/XX1/XX2/P001
M1/XX4/XX5 XX4/XX5/XX4/WCLKB XX4/XX5/XX4/P001
Command:
perl -pe 's#\bXX(\d)/XX(\d)/XX\1#XX$2/XX$1#g' input
Output:
M1/XX2/XX1 XX1/XX2/WCLKB XX1/XX2/P001
M1/XX4/XX5 XX5/XX4/WCLKB XX5/XX4/P001

using the command line and regex to determine words that start sentences

I have the text:
This is a test. This is only a test! If there were an emergency, then Information would be provided for you.
I want to be able to determine which words start sentences. What I have now is:
$ cat <FILE> | perl -pe 's/[\s.?!]/\n/g;'
This just replaces whitespace and sentence punctuation with newlines, giving me:
This
is
a
test
This
is
only
a
test
If
there
were
an
emergency,
then
Information
would
be
provided
for
you
From here I could somehow extract the words that have either nothing above them (start of file) or a blank space, but I am unsure of exactly how to do this.

If you have a Perl of at least version 5.22.1 (or 5.22.0 and this case is not affected by the bug described here), then you can use the sentence boundaries in your regular expression.
use feature 'say';
foreach my $sentence (m/\b{sb}(\w+)/g) {
say $sentence;
}
Or, as a one-liner:
perl -nE 'say for /\b{sb}(\w+)/g'
If called with your example text, the output is:
This
This
If
It uses \b{sb}, which is the sentence boundary. You can read a tutorial at brian d foy's blog about it. The \b{} construct is called a Unicode boundary and is described in perlrebackslash.

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
local $/;
# \s* (not \s+) so the first word of the text, which has no leading space, also matches
my @words = <DATA> =~ m/(?:^|[.?!]+)\s*(\w+)/g;
print Dumper \@words;
__DATA__
This is a test. This is only a test! If there were an emergency, then Information would be provided for you.
So as a command line:
perl -ne 'print "$_\n" for m/(?:^|[.?!])\s*(\w+)/g' somefile

You can use this GNU grep command to extract the first word after each period, ! or ?:
grep -oP '(?:^|[.?!])\s*\K[A-Z][a-z]+' file
This
This
If
Though I must caution you may get false results for cases like Mr. Smith.
Regex Breakup:
(?:^|[.?!]) - match start or DOT or ! or ?
\s* - match 0 or more whitespaces
\K - match reset to forget matched data
[A-Z][a-z]+ - match a word starting with an upper case letter
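The same pattern can be tried from Perl. The sketch below uses a capture group in place of \K so that a list-context match returns the words directly. The sub name sentence_starts is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the capitalised word that opens each sentence.
sub sentence_starts {
    my ($text) = @_;
    # In list context, m//g returns the contents of the capture group per match.
    return $text =~ /(?:^|[.?!])\s*([A-Z][a-z]+)/g;
}

my $text = 'This is a test. This is only a test! If there were an emergency, '
         . 'then Information would be provided for you.';
print "$_\n" for sentence_starts($text);
```

For the sample text this yields This, This and If. The caveat above still applies: 'Mr. Smith left.' returns both Mr and Smith. Single-letter openers such as "A" or "I" are skipped, because [a-z]+ needs at least one lowercase letter.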

Perl replace delimiters

I have CSV text like
1,2,3,{4,5,6,7,8},9,10,100
I want to replace the delimiter of fields between {}. The text should look like:
1,2,3,{4|5|6|7|8},9,10,100
I tried perl -0777 -pe 's/\{.*?,\}/|/g'
but nothing happens. What should I do instead?

This will do as you ask. It replaces all commas that are followed by a sequence of characters that are not braces { }, and then a closing brace
use strict;
use warnings;
use 5.010;
my $s = '1,2,3,{4,5,6,7,8},9,10,100';
$s =~ s/,(?=[^{}]*\})/|/g;
say $s;
output
1,2,3,{4|5|6|7|8},9,10,100

You can use the following regex with $1$2| replacement string:
(\{\s*|(?<!^)\G)(\d+),(?=[,0-9]*\})
Output:
1,2,3,{4|5|6|7|8},9,10,100
Sample code:
#!/usr/bin/perl
$txt = "1,2,3,{4,5,6,7,8},9,10,100";
$txt =~ s/(\{\s*|(?<!^)\G)(\d+),(?=[,0-9]*\})/$1$2|/g;
print $txt;

Here's a command line version for Perl 5.14 and greater.
perl -pe 's/([{][\d,]+[}])/$1 =~ s~,~|~gr/ge'
The /e modifier evaluates the replacement as a Perl expression instead of treating it as an interpolated string. Here the expression takes the first capture ($1) and runs a nested substitution on it with /r, which returns the modified copy rather than changing the value in place. That avoids the error you would get from trying to modify the read-only $1.
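On a Perl older than 5.14, where /r is not available, the same idea can be sketched by handing the capture to a small helper that works on a copy. Both sub names below are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string with commas turned into pipes.
sub swap_commas {
    my ($group) = @_;      # copying from @_ means $1 itself is never modified
    $group =~ tr/,/|/;
    return $group;
}

# Hypothetical helper: change the delimiter only inside {...} groups.
sub pipe_inside_braces {
    my ($text) = @_;
    # /e runs the replacement as Perl code for every match; /g repeats it.
    $text =~ s/(\{[\d,]+\})/swap_commas($1)/ge;
    return $text;
}

print pipe_inside_braces('1,2,3,{4,5,6,7,8},9,10,100'), "\n";
```

Commas outside the braces are never seen by swap_commas, so they stay as they are, and several brace groups on one line are each handled in turn.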

You can try this:
$st = "1,2,3,{4,5,6,7,8},9,10,100";
if ( $st=~/\{(.*)\}/ ) {
$tr = $1;
$tr =~ s/,/|/g;
$st =~ s/\{.*\}/{$tr}/;   # \{.*\} matches the whole braced group; \{*\} only matched the closing brace
print "$st \n"
}
Output:
1,2,3,{4|5|6|7|8},9,10,100

perl regex to remove dashes

I have some files I am processing, and I would like to remove the dashes from the non date fields.
I came up with s/([^0-9]+)-([^0-9]+)/$1 $2/g but that only works if there is one dash only in the string, or I should say it will only remove one dash.
So let's say I have:
2014-05-01
this-and
this-and-that
this-and-that-and-that-too
2015-01-01
What regex would I use to produce
2014-05-01
this and
this and that
this and that and that too
2015-01-01

Don't do it with one regex. There is no requirement that a single regex must contain all of your code's logic.
Use one regex to see if it's a date, and then a second one to do your transformation. It will be much clearer to the reader (that's you, in the future) if you split it up into two.
#!/usr/bin/perl
use warnings;
use strict;
while ( my $str = <DATA>) {
chomp $str;
my $old = $str;
if ( $str !~ /^\d{4}-\d{2}-\d{2}$/ ) { # First regex to see if it's a date
$str =~ s/-/ /g; # Second regex to do the transformation
}
print "$old\n$str\n\n";
}
__DATA__
2014-05-01
this-and
this-and-that
this-and-that-and-that-too
2015-01-01
Running that gives you:
2014-05-01
2014-05-01
this-and
this and
this-and-that
this and that
this-and-that-and-that-too
this and that and that too
2015-01-01
2015-01-01

Using look around :
$ perl -pe 's/
(?<!\d) # a negative look-behind with a digit: \d
- # a dash, literal
(?!\d) # a negative look-ahead with a digit: \d
/ /gx' file
OUTPUT
2014-05-01
this and
this and that
this and that and that too
2015-01-01
Lookarounds are assertions that check, in this case, that there is no digit on either side of the -. A lookaround does not capture anything; it only tests a condition at the current position. It's a good tool to have at hand.
See:
http://www.perlmonks.org/?node_id=518444
http://www.regular-expressions.info/lookaround.html
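As a quick way to try the lookaround version on individual fields, here is a small sketch. The sub name undash and the extra route-66 sample are assumptions added for illustration, and /r needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: replace dashes that have no digit on either side.
sub undash {
    my ($field) = @_;
    return $field =~ s/(?<!\d)-(?!\d)/ /gr;
}

print undash($_), "\n"
    for qw(2014-05-01 this-and this-and-that-and-that-too route-66);
```

Note the last sample: a dash with a digit on only one side, as in route-66, is also kept, because both assertions must hold for the dash to be replaced.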

Lose the + - it's catching the string up until the last -, including any previous - characters:
s/([^0-9]|^)-+([^0-9]|$)/$1 $2/g;
Example: https://ideone.com/r2CI7v

As long as your program receives each field separately in the $_ variable, all you need is
tr/-/ / if /[^-\d]/
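A minimal sketch of how that one-liner might be applied to a list of fields. The sub name undash_fields is hypothetical, and the fields are assumed to be already split.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the one-liner's logic to a list of fields.
sub undash_fields {
    my @fields = @_;              # work on copies of the arguments
    for (@fields) {               # each field is aliased to $_
        tr/-/ / if /[^-\d]/;      # skip fields made only of digits and dashes
    }
    return @fields;
}

print join(',', undash_fields(qw(2014-05-01 this-and-that 2015-01-01))), "\n";
```

The test /[^-\d]/ is true as soon as a field contains anything other than digits and dashes, so any field with a letter in it has all of its dashes replaced.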

This should do it
$line =~ s/(\D)-/$1 /g;

As I explained in a comment, you really need to use Text::CSV to split each record into fields before you edit the data. That's because data that contain whitespace need to be enclosed in double quotes, so a field like this-and-that will start out without spaces, but needs them added when the hyphens are translated to spaces.
This program shows a simple example that uses your own data.
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({eol => $/});
while (my $row = $csv->getline(\*DATA)) {
for (@$row) {
tr/-/ / unless /^\d\d\d\d-\d\d-\d\d$/;
}
$csv->print (\*STDOUT, $row);
}
__DATA__
2014-05-01,this-and-that,this-and-that,this-and-that-and-that-too,2015-01-01
output
2014-05-01,"this and that","this and that","this and that and that too",2015-01-01

Deleting the data before character match

How can I delete the characters before "/", including the "/", in a string using Perl or sed?
For instance, this:
ad9a91/FFFF0000
would turn into
FFFF0000

Sed solution
sed 's|[^/]*/||' file
Will remove everything up to and including the first /
or
sed 's|.*/||' file
Will remove everything up to and including the last / .
I added both as the question was not entirely clear on what the format of the string would be every time.
Awk
awk -F/ '{$0=$NF}1' file
This replaces the entire line with whatever is after the last /
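The difference between the first-slash and last-slash forms only shows up when a string contains more than one /. Here is a rough Perl sketch of both. The sub names and the a/b/c sample are made up for illustration, and /r assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helpers mirroring the two sed commands.
sub after_first_slash { return $_[0] =~ s{^[^/]*/}{}r }   # like sed 's|[^/]*/||'
sub after_last_slash  { return $_[0] =~ s{^.*/}{}r }      # like sed 's|.*/||'

for my $s ('ad9a91/FFFF0000', 'a/b/c') {
    printf "%-16s first: %-9s last: %s\n",
        $s, after_first_slash($s), after_last_slash($s);
}
```

For ad9a91/FFFF0000 both return FFFF0000. For a/b/c the first form leaves b/c and the second leaves c. A string with no / at all comes back unchanged.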

You can use substitution,
my $str = "ad9a91/FFFF0000";
$str =~ s|^.+?/||;
or regex capture,
$str = $1 if $str =~ m|/(.+)|s;

Use substitution for this:
my $string = "ad9a91/FFFF0000";
$string =~ s|.+/||;   # an unescaped . matches any character; \. would only match literal dots

my $string = qq(sjdflksdjfsdj ad9a91/FFFF0000 slodjfsdf s);
$string =~ s{\b(\s).*?\b/}{$1}ig;
print $string;exit;
You can also try this when the value is embedded in a longer string and separated from the rest by whitespace.
