I searched Stack Overflow and other sites for a simple regex solution in Perl.
$str = q(//////);#
Say I have six or seven slashes, or other characters such as q(aaaaa).
I want to split them into pairs, like ['//','//'].
I tried @my_split = split ( /\/\/,$str); but it didn't work.
Is it possible with a regex?
Reason for this question is, say I have this domain name:
$site_name = q(http://www.yahoo.com/blah1/blah2.txt);
I wanted to split on a single slash to get the domain name, but I couldn't do it.
I tried
split( '/'{1,1}, $site_name); # didn't work. I expected it to split on one slash rather than two.
Thanks.
The question is rather unclear.
To break a string into pairs of consecutive characters:
my @pairs = $string =~ /(..)/g;
Or, to split a string on a double slash:
my @parts = split /\/\//, $string;
The separator pattern, in /.../, is an actual regex, so we need to escape each / inside it.
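One caveat is worth seeing in action: split treats // as a separator, so a string made up only of slashes yields no fields at all, whereas the global match returns the pairs themselves. A minimal sketch follows; chunk_pairs is a hypothetical helper name, and making the second character optional (so that an odd trailing character is kept) is an assumption about what the asker wants.

```perl
use strict;
use warnings;

# Hypothetical helper: break a string into chunks of up to two characters.
# The optional second character keeps a leftover single character
# when the string has odd length.
sub chunk_pairs {
    my ($string) = @_;
    return $string =~ /(..?)/gs;
}

my @even = chunk_pairs('//////');    # three chunks of '//'
my @odd  = chunk_pairs('aaaaa');     # 'aa', 'aa', 'a'

# split sees only separators here; every field is empty,
# and trailing empty fields are dropped, so nothing is returned.
my @none = split /\/\//, '//////';

print scalar(@even), " ", scalar(@odd), " ", scalar(@none), "\n";
```

If the separators should be kept while still splitting, a capturing group in the split pattern is another option, but the global match is usually simpler for fixed-size chunks.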
But then you say you want to parse a URI?
Use a module, please. For example, there is URI:
use warnings;
use strict;
use feature 'say';
use URI;
my $string = q(http://www.yahoo.com/blah1/blah2.txt);
my $uri = URI->new($string);
say "Scheme: ", $uri->scheme;
say "Path: ", $uri->path;
say "Host: ", $uri->host;
# there's more, see docs
and then there's URI::Split
use URI::Split qw(uri_split uri_join);
my ($scheme, $auth, $path, $query, $frag) = uri_split($uri);
A number of other modules or frameworks, which you may already be using, nicely handle URIs.
Here's a quick way to split the full URL into its components:
my $u = q(http://www.yahoo.com/blah1/blah2.txt);
my ($protocol, $server, $path) = split(/:\/\/([^\/]+)/, $u);
print "($protocol, $server, $path)\n";
h/t @Mike
The following piece of code does the trick:
use strict;
use warnings;
use Data::Dumper;
my %url;
while( <DATA> ) {
chomp;
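# Accept any scheme (http, https, ftp, ...) and skip lines that do not match,
# so stale captures from a previous line are never reused.
m|^(\w+)://([\w.-]+)/(.+)/([^/]+)$| or next;
@url{qw(proto dn path file)} = ($1,$2,$3,$4);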
print Dumper(\%url);
}
__DATA__
http://www.yahoo.com/blah1/blah2.txt
http://www.google.com/dir1/dir2/dir3/file.ext
ftp://www.server.com/dir1/dir2/file.ext
https://www.inter.net/dir/file.ext
So it seems you simply want to get the domain name:
my $url = q(http://www.yahoo.com/blah1/blah2.txt);
my @vars = split /\//, $url;
print $vars[2];
results:
www.yahoo.com
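Index 2 works because the empty string between the two slashes of http:// becomes a field of its own. A small sketch that makes the fields explicit; host_from_url is a hypothetical name, and it assumes the input always starts with a scheme followed by ://.

```perl
use strict;
use warnings;

# Hypothetical helper: return the host part of a URL by splitting on '/'.
# Assumes the input always begins with "scheme://".
sub host_from_url {
    my ($url) = @_;
    # Fields: 'http:', '' (between the two slashes), host, rest of the path.
    # The limit of 4 stops split from cutting up the path needlessly.
    my (undef, undef, $host) = split m{/}, $url, 4;
    return $host;
}

print host_from_url('http://www.yahoo.com/blah1/blah2.txt'), "\n";
```

For anything beyond a quick script, a URI-parsing module remains the safer choice, since this sketch does not handle ports, credentials, or URLs without a scheme.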
I'm working with legacy data which is usually in the format:
QID RESPONSE
However on some occasions the response contains multiple values of different types:
01320 2,35,6,"warm"
I have tried using
my @dataRowAsList = split('\t', $_);
my $questionID = $dataRowAsList[0];
my $response = substr($dataRowAsList[1],0,-2);
my @thisResponse = split(',', $response);
on relevant cases to split the output into question and response, and then each response into its component parts.
However I've just discovered this type of case:
01320 2,35,6,"warm,windy"
The comma in quotes is not escaped
Is there a neat way to parse this into its components?
2
35
6
"warm,windy"
Quick example of Text::CSV usage with reading from a string:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw/say/;
use Text::CSV;
my $str = q/01320 2,35,6,"warm,windy"/;
my $csv = Text::CSV->new({auto_diag => 2});
my @fields = split " ", $str, 2;
say '$fields[0] is ', $fields[0];
say '$fields[1] is ', $fields[1];
say 'Parsed out $fields[1] is:';
$csv->parse($fields[1]);
say for $csv->fields;
Running this will produce:
$fields[0] is 01320
$fields[1] is 2,35,6,"warm,windy"
Parsed out $fields[1] is:
2
35
6
warm,windy
This is a non-core module, so you'll have to install it with your favorite CPAN client or your OS's package manager. If doing so doesn't automatically also install Text::CSV_XS, you'll probably want to do so as well to get an optimized implementation that Text::CSV will automatically use if present.
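If installing a module is not an option, the core Text::ParseWords module can split on commas while respecting double quotes. This is a swapped-in alternative, not the Text::CSV approach above, and it is less thorough about CSV edge cases. A minimal sketch, with parse_row as a hypothetical helper name:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: split a "QID<whitespace>RESPONSE" row into the
# question ID followed by the comma-separated response parts.
sub parse_row {
    my ($row) = @_;
    # Limit of 2 keeps any whitespace inside the response intact.
    my ($qid, $response) = split ' ', $row, 2;
    # Second argument 0 strips the surrounding quotes; pass 1 to keep them.
    my @parts = parse_line(',', 0, $response);
    return ($qid, @parts);
}

my ($qid, @parts) = parse_row(q/01320 2,35,6,"warm,windy"/);
print "$qid\n";
print "$_\n" for @parts;
```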
In your case I would use a regex and pick out the capture groups I need. Here is an example; I hope it helps:
use warnings;
use strict;
my $string = '01320 2,35,6,"warm,windy"';
if ($string =~ /^(\d+)\s+(\d+),(\d+),(\d+),(\S+)$/) {
print "$1\n$2\n$3\n$4\n$5\n\n";
}
I wrote a perl snippet that strips http:// and www from the front of a domain name input from the console
#!/usr/bin/perl
use strict;
print "Enter the domain name to be queried:\n";
my $input_domain = <>;
chomp ($input_domain);
my $inter_domain = $input_domain =~ s/http:\/\///r;
my $domain = $inter_domain =~ s/www.//r;
print $domain."\n";
When http://domain-name.tld or http://www.domain-name.tld or even www.domain-name.tld is entered, this code returns domain-name.tld.
The question I have is, can the same be achieved using a Perl one-liner that combines both the search and replace lines into one?
If you make both the http:// and the www. optional but look for both of them, then it will remove either one or both. The only difference from the original code is that it will change www.http://domain-name.tld to http://domain-name.tld, which I don't think is a disadvantage.
It seems odd to ask for a one-liner that modifies user input, so I've written this sample that processes four different strings from the DATA file handle. Also note that it's much tidier to use different delimiters for the substitution to avoid having to escape the slashes.
use strict;
use warnings;
while ( <DATA> ) {
s|^(?:http://)?(?:www\.)?||;
print;
}
__END__
http://www.domain-name.tld
http://domain-name.tld
www.domain-name.tld
domain-name.tld
output
domain-name.tld
domain-name.tld
domain-name.tld
domain-name.tld
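If the goal is a single expression rather than two chained substitutions, the same pattern can be used with the /r flag (available from Perl 5.14), which returns the modified copy instead of changing the original. A sketch, with strip_prefix as a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: remove an optional leading "http://" and an
# optional "www." in one substitution, returning the modified copy.
sub strip_prefix {
    my ($domain) = @_;
    return $domain =~ s{^(?:http://)?(?:www\.)?}{}r;
}

print strip_prefix($_), "\n"
    for qw(http://www.domain-name.tld http://domain-name.tld www.domain-name.tld);
```

Because the pattern is anchored with ^, a www. that appears later in the name is left alone.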
Combine the two patterns with alternation: (http:\/\/)|(www\.)
s/(http:\/\/)|(www\.)//gr;
With the /g flag this removes http:// and/or www.; without it, only the first match would be removed.
I need to do a pattern match with two variables: one contains the string and the other contains the regex pattern.
I tried the following program:
#!/usr/bin/perl
my $name = "sathish.java";
my $other = '*.java';
if ( $name =~ m/$other/ )
{
print "sathish";
}
Kindly help me see what I am missing.
Thanks
Sathishkumar
@Shmuel's answer suits your needs, but if you are looking for a common way of extracting the filename from a complete path name, you can use File::Basename:
use strict;
use warnings;
use File::Basename;
my ($name, $path, $suffix) = fileparse("/example/path/test.java", qw/.java/);
print "name: $name\n";
print "path: $path\n";
print "suffix: $suffix\n";
it prints:
name: test
path: /example/path/
suffix: .java
'*.java' is not a valid regex. You probably want to use this code:
my $other = '\.java$';
if ($name =~ m/$other/) {
You can use the following style, which is more appropriate for your need:
$other = "*.java";
if ($name =~m/^$other/){}
--SJ
I like Shmuel's answer, but I'm guessing you probably want to capture the first part of the match into a variable as well?
If so, use
my $other = '\.java$';
if ($name =~ m/(\D*)$other/) {
print $1;
# prints "sathish"
}
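If the pattern really does arrive as a shell-style wildcard such as '*.java', one option is to translate it into a regex before matching. A minimal sketch, assuming only * and ? need to be supported; glob_to_regex is a hypothetical helper, not a built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a simple wildcard pattern into a compiled regex.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = quotemeta $glob;   # escape every regex metacharacter
    $re =~ s/\\\*/.*/g;         # an escaped * becomes "any run of characters"
    $re =~ s/\\\?/./g;          # an escaped ? becomes "any single character"
    return qr/\A$re\z/;         # anchor so the whole name must match
}

my $name  = 'sathish.java';
my $other = glob_to_regex('*.java');
print "sathish\n" if $name =~ $other;
```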
What I want to do is remove everything after and including the first "/" to appear after a ".".
so: http://linux.pacific.net.au/primary.xml.gz
would become: http://linux.pacific.net.au
How do I do this using regex? The system I'm running on can't use URI tool.
$url = 'http://linux.pacific.net.au/primary.xml.gz';
($domain) = $url =~ m!(https?://[^:/]+)!;
print $domain;
output:
http://linux.pacific.net.au
And this is the official regular expression (from RFC 3986, Appendix B) that can be used to decode a URI:
my($scheme, $authority, $path, $query, $fragment) =
$uri =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
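Applied to the question, only the first two captures are needed. A sketch using the leading portion of that expression; scheme_and_host is a hypothetical name, and returning nothing when either part is missing is my own choice rather than anything the expression requires.

```perl
use strict;
use warnings;

# Hypothetical helper: return "scheme://authority" for a URI, or nothing
# if the URI lacks either part.
sub scheme_and_host {
    my ($uri) = @_;
    # Only the scheme and authority groups of the full expression are used.
    my ($scheme, $authority) = $uri =~ m|^(?:([^:/?#]+):)?(?://([^/?#]*))?|;
    return unless defined($scheme) && defined($authority);
    return "$scheme://$authority";
}

print scheme_and_host('http://linux.pacific.net.au/primary.xml.gz'), "\n";
```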
I suggest you use URI::Split, which will separate a standard URL into its constituent parts for you and rejoin them. You want the first two parts: the scheme and the host.
use strict;
use warnings;
use URI::Split qw/ uri_split uri_join /;
my $scheme_host = do {
my (@parts) = uri_split 'http://linux.pacific.net.au/primary.xml.gz';
uri_join @parts[0,1];
};
print $scheme_host;
output
http://linux.pacific.net.au
Update
If your comment "The system I'm running on can't use URI tool" means you can't install modules, then here is a regular expression solution.
You say you want to "remove everything after and including the first "/" to appear after a "."", so /^.*?\./ finds the first dot, and m|[^/]+| finds everything after it up to the next slash.
The output is identical to that of the preceding code.
use strict;
use warnings;
my $url = 'http://linux.pacific.net.au/primary.xml.gz';
my ($scheme_host) = $url =~ m|^( .*?\. [^/]+ )|x;
print $scheme_host;
"The system I'm running on can't use URI tool."
I really recommend doing whatever you can to fix that problem first. If you're not able to use CPAN modules then you'll be missing out on a lot of the power of Perl and your Perl programming life will be far more frustrating than it needs to be.
I'm still useless when it comes to creating regex patterns. This one has got to be really easy.
Given the string: /home/users/cheeseconqueso/perl.pl
I want to place the string ./ right in front of the very last / in the original string to produce: /home/users/cheeseconqueso/./perl.pl
Thanks in advance - I'm sure this simple example will help me for a lot of other stupid stuff that shouldn't be up here!
Here's my solution based on what I was thinking of when I left the comment to your question:
use strict;
use warnings;
my $str = '/home/users/cheeseconqueso/perl.pl';
my @arr = split('/',$str);
my $newstr = join('/', @arr[0..(@arr-2)], '.', $arr[-1]);
Edit: If you're really keen on using a regex, this is the simplest one I've found:
$str =~ s|(.*/)(.*)|$1./$2|;
It takes advantage of the greediness of the initial * in the first group to take every character up to the last /, then matches everything else in the second group. It's a little easier to read with the | delimiters so you avoid leaning toothpick syndrome.
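To see why the greediness matters, compare the greedy group with a non-greedy one. This is a small sketch with hypothetical helper names; it uses the /r flag (Perl 5.14 or later) so the input is left unchanged.

```perl
use strict;
use warnings;

# Greedy (.*/) grabs as much as it can, so it ends at the LAST slash.
sub dot_before_last_component {
    my ($path) = @_;
    return $path =~ s|(.*/)(.*)|$1./$2|r;
}

# Non-greedy (.*?/) stops at the FIRST slash, which is not what is wanted.
sub dot_after_first_slash {
    my ($path) = @_;
    return $path =~ s|(.*?/)(.*)|$1./$2|r;
}

my $path = '/home/users/cheeseconqueso/perl.pl';
print dot_before_last_component($path), "\n";
print dot_after_first_slash($path), "\n";
```

The first call gives /home/users/cheeseconqueso/./perl.pl, while the second puts the ./ right after the leading slash.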
my $variable = "/home/users/cheeseconqueso/perl.pl";
$variable =~ s/(.*\/)([^\/]+)$/$1\.\/$2/;
This uses File::Basename rather than a regex; you can use this module (it ships with Perl) if you find the regex hard to write.
use File::Basename;
$path = '/home/users/cheeseconqueso/perl.pl';
$newpath = dirname($path) . '/./' . basename($path);
You actually don't need a regex, you can just split the string:
my $str='/home/users/cheeseconqueso/perl.pl';
#Array: ('','home','users','cheeseconqueso','perl.pl')
my @arr=split(/\//,$str);
my $output=join('/',@arr[0..($#arr-1)]); # '/home/users/cheeseconqueso'
$output.='/./' . $arr[$#arr]; # Appends '/./' and the last element of @arr.
s/(.*)\/(.*)/$1\/.\/$2/
As noted by CanSpice, split works fine here too.
Try this:
my $str = "/home/users/cheeseconqueso/perl.pl";
if($str =~ /^(.*)(\/)(\w+\.\w+)$/) {
$str = $1.'/.'.$2.$3;
}
Regex? We don't need no stinkin' regex.
l-value substr() to the rescue!
#!/usr/bin/perl
use warnings;
use strict;
$_ = "/home/users/cheeseconqueso/perl.pl\n";
substr($_, rindex($_, '/'), 0) = '/.';
print;