Deleting the data before character match - regex

How can I delete the characters before "/", including the "/", in a string using Perl or sed?
For instance, this:
ad9a91/FFFF0000
would turn into
FFFF0000

Sed solution
sed 's|[^/]*/||' file
Will remove everything up to and including the first /
or
sed 's|.*/||' file
Will remove everything up to and including the last / .
I added both as the question was not entirely clear on what the format of the string would be every time.
Awk
awk -F/ '{$0=$NF}1' file
This replaces the entire line with whatever is after the last /
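To see where the two sed variants differ, here is a small sketch run against an input with more than one slash. The sample string a/b/c and the function names strip_to_first and strip_to_last are made up for illustration; they are not part of the question.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the two sed commands from the answer.

# Remove everything up to and including the FIRST slash.
strip_to_first() {
  printf '%s\n' "$1" | sed 's|[^/]*/||'
}

# Remove everything up to and including the LAST slash (.* is greedy).
strip_to_last() {
  printf '%s\n' "$1" | sed 's|.*/||'
}

strip_to_first 'ad9a91/FFFF0000'   # FFFF0000
strip_to_last  'ad9a91/FFFF0000'   # FFFF0000
strip_to_first 'a/b/c'             # b/c
strip_to_last  'a/b/c'             # c
```

With a single slash in the input, as in the question, both commands give the same result; they only diverge once a second slash appears.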

You can use substitution,
my $str = "ad9a91/FFFF0000";
$str =~ s|^.+?/||;
or regex capture,
$str = $1 if $str =~ m|/(.+)|s;

Use substitution for this:
my $string = "ad9a91/FFFF0000";
$string =~ s|.+/||;   # .+ is greedy, so this strips through the last /

my $string = qq(sjdflksdjfsdj ad9a91/FFFF0000 slodjfsdf s);
$string =~ s{\b(\s).*?\b/}{$1}ig;
print $string;
This also works when the value sits inside a longer, whitespace-separated string.

Related

regular expression that matches any word that starts with pre and ends in al

The following regular expression gives me proper results when tried in Notepad++ editor but when tried with the below perl program I get wrong results. Right answer and explanation please.
The link to file I used for testing my pattern is as follows:
(http://sainikhil.me/stackoverflow/dictionaryWords.txt)
Regular expression: ^Pre(.*)al(\s*)$
Perl program:
use strict;
use warnings;
sub print_matches {
my $pattern = "^Pre(.*)al(\s*)\$";
my $file = shift;
open my $fp, $file;
while(my $line = <$fp>) {
if($line =~ m/$pattern/) {
print $line;
}
}
}
print_matches @ARGV;
A few thoughts:
You should not escape the dollar sign
The capturing group around the whitespaces is useless
Same for the capturing group around the dot .
which leads to:
^Pre.*al\s*$
If you don't want input like precious final to match (because of the whitespace in the middle), change the regex to:
^Pre\S*al\s*$
Included in your code:
while(my $line = <$fp>) {
if($line =~ /^Pre\S*al\s*$/m) {
print $line;
}
}
You're getting messed up by assigning the pattern to a variable before using it as a regex and putting it in a double-quoted string when you do so.
This is why you need to escape the $, because, in a double-quoted string, a bare $ indicates that you want to interpolate the value of a variable. (e.g., my $str = "foo$bar";)
The reason this is causing you a problem is because the backslash in \s is treated as escaping the s - which gives you just plain s:
$ perl -E 'say "^Pre(.*)al(\s*)\$";'
^Pre(.*)al(s*)$
As a result, when you go to execute the regex, it's looking for zero or more s characters rather than zero or more whitespace characters.
The most direct fix for this would be to escape the backslash:
$ perl -E 'say "^Pre(.*)al(\\s*)\$";'
^Pre(.*)al(\s*)$
A better fix would be to use single quotes instead of double quotes and not escape the $:
$ perl -E "say '^Pre(.*)al(\s*)$';"
^Pre(.*)al(\s*)$
The best fix would be to use the qr (quote regex) operator instead of single or double quotes, although that makes it a little less human-readable if you print it out later to verify the content of the regex (which I assume to be why you're putting it into a variable in the first place):
$ perl -E "say qr/^Pre(.*)al(\s*)$/;"
(?^u:^Pre(.*)al(\s*)$)
Or, of course, just don't put it into a variable at all and do your matching with
if($line =~ m/^Pre(.*)al(\s*)$/) ...
Try removing trailing newline character(s):
while(my $line = <$fp>) {
$line =~ s/[\r\n]+$//s;
And, to match only words that begin with Pre and end with al, try this regular expression:
/^Pre\w*al$/
(\w matches a word character, i.e. a letter, digit, or underscore, rather than any character)
And, if you want to match both Pre and pre, do a case-insensitive match:
/^Pre\w*al$/i

How to find and replace a regex in perl

I have a text and want to replace all \w\(, for example myword( to the same with a space, so it should be myword (. How to do that with s///? or is there another way to do that?
Try this
$s = "myword( word2(";
$s =~s/(\w+)(\()/$1 $2/g;
print $s;
Following @ikegami's comment: the \w+ in my regex above backtracks needlessly, and there is no need to capture the (, since it is a known literal. So I changed my regex accordingly:
New RegEx
$s =~s/(\w)\(/$1 (/g;
Here is a way:
my $str = "myword(";
$str =~ s/(\w+)(\()/$1 $2/;
print $str, "\n";
Output:
myword (
Use look ahead:
$ perl -pe 's/(\w)(?=\()/$1 /' <<< 'word('
word (
Or look ahead together with look behind:
$ perl -pe 's/(?<=\w)(?=\()/ /' <<< 'word('
word (
s/\w\K\(/ (/g # 5.10+
or
s/(\w)\(/$1 (/g
or
s/(?<=\w)\(/ (/g
The first is much faster than the other two, but all are faster than the other correct solutions provided. (Not sure which is the fastest of the second and third.)
Another way is to use \K, which keeps whatever was matched before it out of the text being replaced:
#!/usr/bin/perl
use strict;
use warnings;
my $string = q{myword( myword2(};
$string=~s/\w\K\(/ (/g;
print $string,"\n";

Perl replace delimiters

I have CSV text like
1,2,3,{4,5,6,7,8},9,10,100
I want to replace the delimiter of fields between {}. The text should look like:
1,2,3,{4|5|6|7|8},9,10,100
I tried perl -0777 -pe 's/\{.*?,\}/|/g'
but nothing happens. What should I do instead?
This will do as you ask. It replaces every comma that is followed by a run of non-brace characters and then a closing brace:
use strict;
use warnings;
use 5.010;
my $s = '1,2,3,{4,5,6,7,8},9,10,100';
$s =~ s/,(?=[^{}]*\})/|/g;
say $s;
output
1,2,3,{4|5|6|7|8},9,10,100
You can use the following regex with $1$2| replacement string:
(\{\s*|(?<!^)\G)(\d+),(?=[,0-9]*\})
Output:
1,2,3,{4|5|6|7|8},9,10,100
Sample code:
#!/usr/bin/perl
$txt = "1,2,3,{4,5,6,7,8},9,10,100";
$txt =~ s/(\{\s*|(?<!^)\G)(\d+),(?=[,0-9]*\})/$1$2|/g;
print $txt;
Here's a command line version for Perl 5.14 and greater.
perl -pe 's/([{][\d,]+[}])/$1 =~ s~,~|~gr/ge'
The /e modifier evaluates the replacement as a Perl expression rather than as an interpolated string. The expression takes the first capture ($1) and runs a substitution on it with the /r modifier, which returns the modified copy instead of changing $1 in place ($1 is read-only, so modifying it directly would be an error).
You can try this:
$st = "1,2,3,{4,5,6,7,8},9,10,100";
if ( $st=~/\{(.*)\}/ ) {
$tr = $1;
$tr =~ s/,/|/g;
$st =~ s/\{.*\}/{$tr}/;
print "$st\n";
}
Output:
1,2,3,{4|5|6|7|8},9,10,100

How to replace stuff in a Perl regex

I have a string $text and want to modify it with a regex. The string contains multiple sections like <NAME>John</NAME>.
I want to search for those sections, which I would normally do with something like
$text =~ m/<NAME>(.*?)<\/NAME>/g
but then make sure that there are no leading and trailing blanks and no leading non-word characters, which I would normally ensure with something like
$temp =~ s/^\s+|\s+$//g; # trim leading and trailing whitespaces
$temp =~ s/^\W*//g; # remove all leading non-word chars
Now my question is: How do I actually make this happen? Is it possible to use a s/// regex instead of the m//?
This is possible in a single substitution, but it's unnecessarily complex. I suggest you do a two-tier substitution using an executable replacement.
my $text = '<NAME> %^John^%
</NAME>';
$text =~ s{ (?<=<NAME>) ([^<>]*) (?=</NAME>) }{
(my $new = $1) =~ s/\A\s+|\s+\z//g;
$new =~ s/\A\W+//;
$new;
}egx;
print $text;
output
<NAME>John^%</NAME>
This is even simpler if you have Perl 5.14 or later and use the non-destructive substitution mode (the /r modifier).
$text =~ s{ (?<=<NAME>) ([^<>]*) (?=</NAME>) }{ $1 =~ s/\A\s+|\s+\z//gr =~ s/\A\W+//r }exg;
If I understand correctly, what you want to do is merely "clean up" the text inside the tag (insofar as it's possible to "parse" XML using regular expressions). This should do the trick:
$text =~ s/(<NAME>)\s*\W*(.*?)\s*(<\/NAME>)/$1$2$3/sgi;

Perl : How to replace a _[0-9] with a comma in perl or any language

I have a file with the following pattern
'21pro_ABCD_EDG_10800_48052_2 0.0'
How do i replace the _[0-9] with a ,(comma)
so that i can get the output as
21pro_ABCD_EDG,10800,48052,2, 0.0
To replace the _[0-9] with a , you can do this:
$s =~ s/_([0-9])/,$1/g;
#the same without capturing groups
$s =~ s/_(?=[0-9])/,/g;
Edit:
To get the extra comma after the 2 you can do this:
#This puts a , before all whitespace.
$s =~ s/_(?=[0-9])|(?=\s)/,/g;
#This one puts a , between [0-9] and any whitespace
$s =~ s/_(?=[0-9])|(?<=[0-9])(?=\s)/,/g;
The sed approach would be something like the following:
rupert@hake:~$ echo '21pro_ABCD_EDG_10800_48052_2 0.0' | sed 's/_\([0-9]\)/,\1/g'
21pro_ABCD_EDG,10800,48052,2 0.0
Using the expression mentioned by jacob, here is the code snippet to perform the substitution for a large file
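#!/usr/local/bin/perl
use strict;
use warnings;
# 'test' is the input file name used in this example.
open(my $fh, '<', 'test') or die "Cannot open test: $!";
while (my $s = <$fh>) {
chomp $s;
# Turn _<digit> into ,<digit> and add a comma between a digit and whitespace.
$s =~ s/_(?=[0-9])|(?<=[0-9])(?=\s)/,/g;
# Remove all remaining whitespace.
$s =~ s/\s//g;
print "$s\n";
}
close($fh);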