Regex gurus,
Here is the line I want to parse with a regex:
@ERR030882.2595 HWI-BRUNOP16X_0001:3:1:6649:5175#0/1
I want to obtain the following:
@ERR030882.2595 HWI-BRUNOP16X_0001:3:1:6649:5175#0
I have written the following regex on rubular.com:
(@.* *.)(!?(\/.))
My idea is to use negation to remove /1 with (!?(\/.)). However, this matches the entire line:
@ERR030882.2595 HWI-BRUNOP16X_0001:3:1:6649:5175#0/1
Why is (?!thisismystring) not removing /1? I googled the fire out of this, but everything I found seemed to suggest things similar to what I am already trying. I deeply appreciate your help.
I think what you are trying to write is /(\@.* .*)(?=\/\d)/ (you need to escape the at sign @ to prevent Perl from treating it as an array). Note that (!? is not look-ahead syntax, it is just a group that starts with an optional exclamation mark. You also need a positive look-ahead rather than a negative one, because you want to match everything up to the point where the following characters are a slash followed by a digit.
Here is a program that demonstrates.
use strict;
use warnings;
use 5.010;
my $s = '@ERR030882.2595 HWI-BRUNOP16X_0001:3:1:6649:5175#0/1';
$s =~ /(\@.* .*)(?=\/\d)/;
print $1, "\n";
But you would be much better off copying the whole string and removing the slash and everything after it, like this:
use strict;
use warnings;
my $s = '@ERR030882.2595 HWI-BRUNOP16X_0001:3:1:6649:5175#0/1';
(my $fixed = $s) =~ s{/\d+$}{};
print $fixed, "\n";
output
@ERR030882.2595 HWI-BRUNOP16X_0001:3:1:6649:5175#0
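As a side note on why a look-ahead never "removes" anything: look-arounds are zero-width, so they only test what comes next and leave it out of the match. Below is a minimal sketch contrasting capture-with-look-ahead against plain substitution. The sub names are made up for illustration, and the sample is just the tail of the read ID from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: capture everything before a trailing "/<digits>".
sub capture_before_suffix {
    my ($line) = @_;
    # The look-ahead asserts that "/<digits>" ends the string,
    # but does not include it in the captured text.
    return $line =~ m{^(.*)(?=/\d+\z)} ? $1 : $line;
}

# Hypothetical helper: delete a trailing "/<digits>" instead.
sub strip_suffix {
    my ($line) = @_;
    return $line =~ s{/\d+\z}{}r;    # /r returns a modified copy (Perl 5.14+)
}

my $read = 'HWI-BRUNOP16X_0001:3:1:6649:5175#0/1';
print capture_before_suffix($read), "\n";
print strip_suffix($read), "\n";
```

Both calls give the same result for this input. The substitution is usually easier to read, and it leaves lines without a suffix untouched.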
I wrote a Perl snippet that strips http:// and www. from the front of a domain name read from the console:
#!/usr/bin/perl
use strict;
print "Enter the domain name to be queried:\n";
my $input_domain = <>;
chomp ($input_domain);
my $inter_domain = $input_domain =~ s/http:\/\///r;
my $domain = $inter_domain =~ s/www\.//r;
print $domain."\n";
When http://domain-name.tld or http://www.domain-name.tld or even www.domain-name.tld is entered, this code returns domain-name.tld.
The question I have is, can the same be achieved using a Perl one-liner that combines both the search and replace lines into one?
If you make both the http:// and the www. optional, but look for both of them, then it will remove either one or both. The only difference from the original code is that it will change www.http://domain-name.tld to http://domain-name.tld, which I don't think is a disadvantage.
It seems odd to ask for a one-liner that modifies user input, so I've written this sample that processes four different strings from the DATA file handle. Also note that it's much tidier to use different delimiters for the substitution, to avoid having to escape the slashes.
use strict;
use warnings;
while ( <DATA> ) {
s|^(?:http://)?(?:www\.)?||;
print;
}
__END__
http://www.domain-name.tld
http://domain-name.tld
www.domain-name.tld
domain-name.tld
output
domain-name.tld
domain-name.tld
domain-name.tld
domain-name.tld
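Combine the two patterns with alternation, and add /g so that both are removed when both are present:
my $domain = $input_domain =~ s/(http:\/\/)|(www\.)//gr;
This removes http:// and/or www. wherever they occur in the string.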
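For comparison, here is a small sketch of the anchored variant wrapped in a sub (the name strip_prefix and the fifth sample input are made up for illustration). Anchoring with ^ means only a leading prefix is touched, so a www. later in the name survives.

```perl
use strict;
use warnings;

# Hypothetical helper: strip an optional "http://" and an optional "www."
# from the start of the string only.
sub strip_prefix {
    my ($name) = @_;
    return $name =~ s{^(?:http://)?(?:www\.)?}{}r;    # /r needs Perl 5.14+
}

for my $input (qw(
    http://www.domain-name.tld
    http://domain-name.tld
    www.domain-name.tld
    domain-name.tld
    www.my-www.example.tld
)) {
    printf "%-28s -> %s\n", $input, strip_prefix($input);
}
```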
I am trying to search for a substring and replace the whole string if the substring is found. In the example below, someVal could be any value and is unknown to me.
How can I search for someServer.com and replace the whole string $oldUrl with $newUrl?
I can do it on the whole string just fine:
$directory = "/var/tftpboot";
my $oldUrl = "someVal.someServer.com";
my $newUrl = "someNewVal.someNewServer.com";
opendir( DIR, $directory ) or die $!;
while ( my $files = readdir(DIR) ) {
next unless ( $files =~ m/\.cfg$/ );
open my $in, "<", "$directory/$files";
open my $out, ">", "$directory/temp.txt";
while (<$in>) {
s/.*$oldUrl.*/$newUrl/;
print $out $_;
}
rename "$directory/temp.txt", "$directory/$files";
}
Your script will delete much of your content because you are surrounding the match with .*. This will match any character except newline, as many times as it can, from start to end of each line, and replace it.
The functionality that you are after already exists in Perl in the form of the -p and -i command line switches, so it would be a good idea to use them rather than writing your own version that works exactly the same way. You do not need a one-liner to use the in-place edit. You can do this:
perl -pi script.pl *.cfg
The script should contain the name definitions and substitutions, and any error checking you need.
my $old = "someVal.someServer.com";
my $new = "someNewVal.someNewServer.com";
s/\Q$old\E/$new/g;
This is the simplest possible solution, when running with the -pi switches, as I showed above. The \Q ... \E is the quotemeta escape, which escapes meta characters in your string (highly recommended).
You might want to prevent partial matches. If you are matching foo.bar, you may not want to match foo.bar.baz, or snafoo.bar. To prevent partial matching, you can put in anchors of different kinds.
(?<!\S) -- do not allow any non-whitespace before match
\b -- match word boundary
Word boundary would be suitable if you want to replace server1.foo.bar in the above example, but not snafoo.bar. Otherwise use whitespace boundary. The reason we do a double negation with a negative lookaround assertion and negated character class is to allow beginning and end of line matches.
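To make the difference between the two kinds of boundary concrete, here is a small sketch that runs both on the example names from above. The sub names and the NEW placeholder are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: replace $old only when it is delimited by
# whitespace or by the start/end of the string.
sub replace_ws {
    my ($text, $old) = @_;
    return $text =~ s/(?<!\S)\Q$old\E(?!\S)/NEW/gr;
}

# Hypothetical helper: replace $old when it sits between word boundaries.
sub replace_word {
    my ($text, $old) = @_;
    return $text =~ s/\b\Q$old\E\b/NEW/gr;
}

for my $text ('foo.bar', 'x foo.bar y', 'snafoo.bar', 'server1.foo.bar', 'foo.bar.baz') {
    printf "%-16s ws: %-16s word: %s\n",
        $text, replace_ws($text, 'foo.bar'), replace_word($text, 'foo.bar');
}
```

Note that \b also lets foo.bar.baz through, because there is a word boundary between the r and the following dot. If that matters, the whitespace version is the stricter choice.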
So, to sum up, I would do:
use strict;
use warnings;
my $old = "someVal.someServer.com";
my $new = "someNewVal.someNewServer.com";
s/(?<!\S)\Q$old\E(?!\S)/$new/g;
And run it with
perl -pi script.pl *.cfg
If you want to try it out beforehand (highly recommended!), just remove the -i switch, which will make the script print to standard output (your terminal) instead. You can then run a diff on the files to inspect the difference. E.g.:
$ perl -p script.pl test.cfg > test_replaced.cfg
$ diff test.cfg test_replaced.cfg
You will have to decide whether word boundary is more desirable, in which case you replace the lookaround assertions with \b.
Always use
use strict;
use warnings;
Even in small scripts like this. It will save you time and headaches.
If you want to match and replace any subdomain, then you should devise a specific regular expression to match them.
\b(?i:(?!-)[a-z0-9-]+\.)*someServer\.com
The following is a rewrite of your script using more Modern Perl techniques, including Path::Class to handle file and directory operations in a cross-platform way and $INPLACE_EDIT ($^I) to automatically handle the editing of a file.
use strict;
use warnings;
use autodie;
use Path::Class;
my $dir = dir("/var/tftpboot");
while (my $file = $dir->next) {
next unless $file =~ m/\.cfg$/;
local @ARGV = "$file";
local $^I = '.bak';
while (<>) {
s/\b(?i:(?!-)[a-z0-9-]+\.)*someServer\.com\b/someNewVal.someNewServer.com/;
print;
}
#unlink "$file$^I"; # Optionally delete backup
}
Watch for the Dot-Star: it matches everything that surrounds the old URL, so the only thing remaining on the line will be the new URL:
s/.*$oldUrl.*/$newUrl/;
Better:
s/$oldUrl/$newUrl/;
Also, you might need to close the output file before you try to rename it.
If the old URL contains special characters (dots, asterisks, dollar signs...) you might need to use \Q$oldUrl to suppress their special meaning in the regex pattern.
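A quick sketch of what can go wrong without \Q. The host names and sub names here are made up for illustration. An unquoted dot matches any character, so an unrelated string can match the old URL.

```perl
use strict;
use warnings;

# Hypothetical helper: interpolate $old as a regex (dots are wildcards).
sub matches_as_regex {
    my ($line, $old) = @_;
    return $line =~ /$old/ ? 1 : 0;
}

# Hypothetical helper: interpolate $old literally (dots are just dots).
sub matches_literally {
    my ($line, $old) = @_;
    return $line =~ /\Q$old\E/ ? 1 : 0;
}

my $old = 'a.example.com';
for my $line ('host=a.example.com', 'host=aXexampleYcom') {
    printf "%-20s regex: %d  literal: %d\n",
        $line, matches_as_regex($line, $old), matches_literally($line, $old);
}
```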
I just want to remove the double quotes (") around the string "dropDownStorePrepare(this,\'hello\')".
I tried it this way, but it is not working.
#!/usr/bin/perl
use strict;
use warnings;
my $str = '"store":"dropDownStorePrepare(this,\'hello\')","name":"Rama Rao"';
$str =~ s/"dropDownStorePrepare(.*)"/dropDownStorePrepare$1/ig;
print $str;
The double quotes at the beginning and end of dropDownStorePrepare(...) should be removed, and the rest of the double quotes should remain.
Note: the dropDownStorePrepare function should accept any number of parameters.
Can somebody help me, please?
The immediate problem you've got is that the .* is matching too much. Try:
$str =~ s/"dropDownStorePrepare(.*?)"/dropDownStorePrepare$1/ig;
Though it looks like you're trying to parse JSON. Maybe you should look for a module to do that for you…
Try a non-greedy regex:
$str =~ s/"dropDownStorePrepare(.*?)"/dropDownStorePrepare$1/ig;
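To see the difference side by side, here is a small sketch that applies the greedy and the non-greedy pattern to the string from the question. The sub names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: greedy version, .* runs to the LAST double quote.
sub unquote_greedy {
    my ($s) = @_;
    return $s =~ s/"dropDownStorePrepare(.*)"/dropDownStorePrepare$1/ir;
}

# Hypothetical helper: non-greedy version, .*? stops at the FIRST
# double quote after the function name.
sub unquote_lazy {
    my ($s) = @_;
    return $s =~ s/"dropDownStorePrepare(.*?)"/dropDownStorePrepare$1/ir;
}

my $str = q{"store":"dropDownStorePrepare(this,'hello')","name":"Rama Rao"};
print unquote_greedy($str), "\n";
print unquote_lazy($str), "\n";
```

The non-greedy version still assumes that no argument contains a double quote. If that can happen, a real JSON parser is the safer route.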
I have a string as below:
$str = "/dir1/dir2/dir3/file.txt"
I want to remove the /file.txt from this string,
so that $str becomes:
$str = "/dir1/dir2/dir3"
I am using the following regex. But it is replacing everything.
$str =~ s/\/.*\.txt//;
How can I make the regex look for the last '/' instead of the first?
What is the correct regular expression for this?
Please note that file.txt is not fixed name. It can be anything like file1.txt, file2.txt, etc.
If you want to get the path from that string, you can use File::Basename. It is a core module since Perl version 5.
perl -MFile::Basename -le '$str = "/dir2/dir3/file.txt"; print dirname($str);'
In script form:
use strict;
use warnings; # always use these
use File::Basename;
my $str = "/dir1/dir2/dir3/file.txt";
print dirname($str);
Your regex does not work because it is not anchored, and .* is greedy, so it matches as much as it can, starting from the first slash / it encounters. A working regex would look something like this:
$str =~ s#/[^/]*?\.txt$##;
Note the use of a non-greedy quantifier *?, which will match smallest possible string. Also note that I use another delimiter for the substitution to avoid the "leaning toothpick syndrome", e.g. s/\/\/\///.
A very simple regex: s/\/[^\/]*$//
In this regex
m/(.*)\/[^\/]*$/
the first submatch is the path you are looking for.
EDIT:
If you are looking for a substitution, user1215106's solution is the way to go:
s/\/[^\/]*$//
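Here is a short sketch that puts the last-slash substitution next to dirname from File::Basename, so the two can be compared on a few inputs. The sub name and the extra sample paths are made up for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper: drop the final "/name" component with a regex.
sub strip_last_component {
    my ($path) = @_;
    # [^/]* cannot cross a slash, so only the last component is removed.
    return $path =~ s{/[^/]*\z}{}r;    # /r needs Perl 5.14+
}

for my $path ('/dir1/dir2/dir3/file.txt', '/dir1/file2.txt', 'file.txt') {
    printf "%-26s regex: %-18s dirname: %s\n",
        $path, strip_last_component($path), dirname($path);
}
```

The two approaches agree on ordinary absolute paths but can differ at the edges, for example on a bare file name with no slash, which the regex leaves unchanged.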
I've got two questions about Regexp::Common qw/URI/ and regexes in Perl.
I use Regexp::Common qw/URI/ to find URIs in strings and delete them, but I get an error when a URI is between parentheses.
For example: (http://www.example.com)
The error is caused by the ')': when it tries to parse the URI, the app crashes. So I've thought of two fixes:
Write a simple regex (or so I thought) that inserts a whitespace before each ) character
Use a function in Regexp::Common qw/URI/ that implements a fix
In my code I've tried to implement the regex, but the app freezes. The code that I've tried is this:
use strict;
use Regexp::Common qw/URI/;
my $str = "Hello!!, I love (http://www.example.com)";
while ($str =~ m/\)/){
$str =~ s/\)/ \)/;
}
my ($uri) = $str =~ /$RE{URI}{-keep}/;
print "$uri\n";
print $str;
The output that I want is: (http://www.example.com )
I'm not sure, but I think that the problem is in $str =~ s/\)/ \)/;
BTW, I've got a question about Regexp::Common qw/URI/. I've got two types of string:
ablalbalblalblalbal http://www.example.com
asfasdfasdf http://www.example.com aasdfasdfasdf
I want to remove the URI if it is the last component (and save it). If it is not, I want to save it without removing it from the text.
You don't have to first test for a match to be able to use the s/// operator correctly: If the string does not match the search pattern, it will not do anything.
#!/usr/bin/perl
use strict; use warnings;
my $str = "Hello!!, I love (GOOGLE)";
$str =~ s/\)/ )/g;
print "$str\n";
The general problem of detecting URLs correctly in text is error-prone. See for example Jeff's thoughts on this.
my $str = "Hello!!, I love (GOOGLE)";
while ($str =~ m/\)/){
$str =~ s/\)/ )/;
}
Your program goes into an infinite loop at this point. To see why, try printing the value of $str each time round the loop.
my $str = "Hello!!, I love (GOOGLE)";
while ($str =~ m/\)/){
$str =~ s/\)/ )/;
print $str, "\n";
}
The first time it prints "Hello!!, I love (GOOGLE )". The while loop condition is then evaluated again. Your string still matches your regular expression (it still contains a closing parenthesis), so the replacement is run again, and this time it prints out "Hello!!, I love (GOOGLE  )" with two spaces.
And so it goes on. Each time round the loop another space is added, but each time you still have a closing parenthesis, so another substitution is run.
The simplest solution I can see is to only match the closing parenthesis if it is preceded by a non-whitespace character (using \S).
my $str = "Hello!!, I love (GOOGLE)";
while ($str =~ m/\S\)/){
$str =~ s/\)/ )/;
print $str, "\n";
}
In this case the loop is only executed once.
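The loop can also be dropped entirely. Here is a minimal sketch, assuming every closing parenthesis that directly follows a non-whitespace character should get one space in front of it (the sub name is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: put one space before each ")" that directly
# follows a non-whitespace character.
sub pad_closing_parens {
    my ($text) = @_;
    # The look-behind checks the previous character without consuming it,
    # and /g handles every ")" in a single pass.
    return $text =~ s/(?<=\S)\)/ )/gr;    # /r needs Perl 5.14+
}

my $str = "Hello!!, I love (GOOGLE)";
print pad_closing_parens($str), "\n";
# Applying it a second time changes nothing further.
print pad_closing_parens(pad_closing_parens($str)), "\n";
```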
Why not just include the parentheses in the search? If the URLs will always be bracketed, then something like this:
#!/usr/bin/perl
use warnings;
use strict;
use Regexp::Common qw/URI/;
my $str = "Hello!!, I love (http://www.google.com)";
my ($uri) = $str =~ / \( ( $RE{URI} ) \) /x;
print "$uri\n";
The regex from Regexp::Common can be used as part of a longer regex; it doesn't have to be used on its own. Also, I've used the 'x' modifier on the regex to allow whitespace, so you can see more clearly what is going on. The brackets with the backslashes are treated as characters to match; those without define what is to be captured (presumably like the {-keep} - I've not used that before).
You could also make the brackets optional, with something like:
/ (?: \( ( $RE{URI} ) \) | ( $RE{URI} ) ) /x
although that would result in two match variables, one undefined - so something like following would be needed:
my $uri = $1 || $2 || die "Didn't match a URL!";
There's probably a better way to do this, and also if you're not bothered about matching parentheses then you could simply make the brackets optional (via a '?') in the first regex...
To answer your second question about only matching URLs at the end of the line - have a look at Regex 'anchors' which can force a match against the beginning or end of a line: ^ and $ (or \A and \Z if you prefer). e.g. matching a URL at the end of a line only:
/$RE{URI}\Z/
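Putting the anchor idea together with the "remove it only if it is last, but always save it" requirement, here is a rough sketch. To keep it runnable without non-core modules it uses a deliberately crude stand-in pattern instead of $RE{URI}; that pattern and the sub name are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (text, url). The URL is cut out of the
# text only when it is the last thing in the string.
sub split_trailing_url {
    my ($text) = @_;
    # Crude stand-in for $RE{URI}; swap in the real pattern as needed.
    my $url_re = qr{https?://\S+};

    # Case 1: URL at the very end, so remove it and hand it back.
    if ($text =~ s/\s*($url_re)\s*\z//) {
        return ($text, $1);
    }
    # Case 2: URL elsewhere (or absent), so leave the text alone.
    my ($url) = $text =~ /($url_re)/;
    return ($text, $url);
}

for my $line ('ablalbalblalblalbal http://www.example.com',
              'asfasdfasdf http://www.example.com aasdfasdfasdf') {
    my ($rest, $url) = split_trailing_url($line);
    print "text: [$rest]  url: [$url]\n";
}
```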