perl regex replace only part of string - regex

I need to write a perl regex to convert
site.company.com => dc=site,dc=company,dc=com
Unfortunately I am not able to remove the trailing "," using the regex I came up with below. I could of course remove the trailing "," in the next statement, but I would prefer that to be handled as part of the regex.
$data="site.company.com";
$data =~ s/([^.]+)\.?/dc=$1,/g;
print $data;
The code above prints:
dc=site,dc=company,dc=com,
Thanks in advance.

When handling urls it may be a good idea to use a module such as URI. However, I do not think it applies in this case.
This task is most easily solved with a split and join, I think:
my $url = "site.company.com";
my $string = join ",", # join the parts with comma
map "dc=$_", # add the dc= to each part
split /\./, $url; # split into parts

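$data =~ s/\./,dc=/g; # turn each dot into ",dc="
$data =~ s/^/dc=/; # then prefix the first part (a bare s/// would act on $_, not $data)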
tested below:
> echo "site.company.com" | perl -pe 's/\./,dc=/g&&s/^/dc=/g'
dc=site,dc=company,dc=com

Try doing this :
my $x = "site.company.com";
my @a = split /\./, $x;
s/^/dc=/ for @a; # prefix each part in place
print join ",", @a;

just put like this,
$data="site.company.com";
$data =~ s/([^.]+)\.?/,dc=$1/g; # put the comma in front of each part instead of behind it
$data =~ s/^,//; # then drop the single leading comma
print $data;

I'm going to try the /ge route:
$data =~ s{^|(\.)}{
( defined $1 ? ',' : '' ) . 'dc=' # $1 is undef when ^ matched, so test it explicitly
}ge;
e = evaluate replacement as Perl code.
So, it says given the start of the string, or a dot, make the following replacement. If it captured a period, then emit a ','. Regardless of this result, insert 'dc='.
Note, that I like to use a brace style of delimiter on all my evaluated replacements.
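A minimal, self-contained sketch of the /ge approach described above. The helper name to_dc is made up for illustration, and the explicit defined test is only there to keep the code quiet under use warnings.

```perl
use strict;
use warnings;

# Hypothetical helper name; wraps the /ge substitution.
sub to_dc {
    my ($host) = @_;
    # Match either the start of the string or a literal dot.
    # $1 is defined only when the dot alternative matched.
    $host =~ s{^|(\.)}{ ( defined $1 ? ',' : '' ) . 'dc=' }ge;
    return $host;
}

print to_dc('site.company.com'), "\n";   # dc=site,dc=company,dc=com
```

Because the comma is only emitted in front of a captured dot, there is never a trailing comma to clean up afterwards.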

Related

Perl regex substitution

I need to match exactly "", but not ,"",
for example in this string
"abc","123","def","","asd","876"",345
I need to substitute the "", following the "876" but leave the "" after "def" alone.
The regex I have right now is
$line =~ s/[^,]"",/",/g
However this substitutes the 6 from "876".
Use a group replacement:
$line =~ s/([^,])"",/$1",/g
Or a lookbehind:
$line =~ s/(?<!,)"",/",/g
Having said that, "" is a CSV quoted quote, it can appear inside a string. For example, this is valid: """abc""". To avoid breaking that, also exclude " from the lookbehind:
$line =~ s/(?<![,"])"",/",/g
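As a quick check of the lookbehind version, here is a small sketch run against the sample line from the question. The helper name fix_stray_quote is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a stray `"",` into `",` unless it is preceded
# by a comma (a genuinely empty field) or by another quote (an escaped quote).
sub fix_stray_quote {
    my ($line) = @_;
    $line =~ s/(?<![,"])"",/",/g;
    return $line;
}

print fix_stray_quote('"abc","123","def","","asd","876"",345'), "\n";
# "abc","123","def","","asd","876",345
```

The empty field after "def" is left alone because its `"",` is preceded by a comma, while the one after 876 is preceded by a digit and gets fixed.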
You are trying to fix a broken CSV file.
Doing it with a pattern match is not the best solution.
You should try fixing it with the Text::CSV_XS module and its allow_loose_quotes => 1 option.
This converts the broken column to an empty column and double-quotes the last column. Not quite sure if that is what you're looking for, but there you go.
use strict ;
my $line = '"abc","123","def","","asd","876"",345' ;
$line =~ s/\,{0,1}\"\"\,/\,\"\"\,/g ;
my @foo = split(/\,/,$line) ;
$foo[$#foo] =~ s/^|$/\"/g ;
$line = join ",", @foo ;
print "\n$line\n" ;

Perl how do you assign a variable to a regex match result

How do you create a $scalar from the result of a regex match?
Is there any way that once the script has matched the regex that it can be assigned to a variable so it can be used later on, outside of the block.
IE. If $regex_result = blah blah then do something.
I understand that I should make the regex as non-greedy as possible.
#!/usr/bin/perl
use strict;
use warnings;
# use diagnostics;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Outlook';
my @Qmail;
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\\s\*owner \#/";
my $outlook = Win32::OLE->new('Outlook.Application')
or warn "Failed Opening Outlook.";
my $namespace = $outlook->GetNamespace("MAPI");
my $folder = $namespace->Folders("test")->Folders("Inbox");
my $items = $folder->Items;
foreach my $msg ( $items->in ) {
if ( $msg->{Subject} =~ m/^(.*test alert) / ) {
my $name = $1;
print " processing Email for $name \n";
push @Qmail, $msg->{Body};
}
}
for(@Qmail) {
next unless /$regex|^\s*description/i;
print; # prints what i want ie lines that start with owner and description
}
print $sentence; # prints ^\\s\*offense \ # not lines that start with owner.
One way is to verify a match occurred.
use strict;
use warnings;
my $str = "hello what world";
my $match = 'no match found';
my $what = 'no what found';
if ( $str =~ /hello (what) world/ )
{
$match = $&;
$what = $1;
}
print '$match = ', $match, "\n";
print '$what = ', $what, "\n";
Use the Perl variables below to meet your requirements:
$` = The string preceding whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already.
$& = The string matched by the last pattern match.
$' = The string following whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already.
For example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi
The match of a regex is stored in special variables (as well as some more readable variables if you specify the regex to do so and use the /p flag).
For the whole last match you're looking at the $MATCH variable (available with use English; $& for short). This is covered in the manual page perlvar.
So say you wanted to store your last for loop's matches in an array called @matches, you could write the loop (and for some reason I think you meant it to be a foreach loop) as:
my @matches = ();
foreach (@Qmail) {
next unless /$regex|^\s*description/i;
push @matches, $&; # $& is spelled $MATCH under "use English"
print;
}
I think you have a problem in your code. I'm not sure of the original intention but looking at these lines:
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\\s\*owner \#/";
I'll step through that as:
Assign $regex to the string ^\s*owner #.
Assign $sentence the result of matching $regex against the regular expression /^\s*owner #/, slashes included (which won't match; if it did, $sentence would be 1, but since it didn't, it's false).
I think. I'm actually not exactly certain what that line will do or was meant to do.
I'm not quite sure what part of the match you want: the captures, or something else. I've written Regexp::Result which you can use to grab all the captures etc. on a successful match, and Regexp::Flow to grab multiple results (including success statuses). If you just want numbered captures, you can also use Data::Munge
You can do the following:
use feature 'say'; # say needs Perl 5.10+ and this feature enabled
my $str ="hello world";
my ($hello, $world) = $str =~ /(hello)|(what)/;
say "[$_]" for($hello,$world);
As you see $hello contains "hello".
If you have an older perl on your system like me (perl 5.18 or earlier) and you use $`, $& and $' as in codequestor's answer above, it will slow down your program.
Instead, you can use your regex pattern with the modifier /p, and then check these 3 variables: ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} for your matching results.
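A minimal sketch of the /p approach, assuming Perl 5.10 or later. The helper name around and its literal-substring argument are made up for illustration; the values are copied out inside the if block, so they stay usable after the match scope is left.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text before, inside and after the first
# occurrence of a literal substring, or an empty list when there is no match.
sub around {
    my ($str, $needle) = @_;
    # /p asks Perl to keep the data behind the ${^...} variables.
    if ( $str =~ /\Q$needle\E/p ) {
        return ( ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} );
    }
    return;
}

my ($pre, $match, $post) = around('abcdefghi', 'def');
print "$pre:$match:$post\n";   # abc:def:ghi
```

Assigning the results to ordinary lexicals this way is also the simplest answer to the original question of how to use a match result later on, outside of the block.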

need regexp to help me extract data from within double-quotes

I have sought the answer to this question here in stackoverflow but can't get acceptable results. (Sorry!)
I have a data file that looks like this:
share "SHARE1" "/path/to/some/share" umask=022 maxusr=4294967295 netbios=SOMECIFSHOST
share "SHARE2" "/path/to/a/different/share with spaces in the dir name" umask=022 maxusr=4294967295 netbios=ANOTHERCIFSHOST
... from which I need to extract the values inside double-quotes. In other words, I'd like to get something like this:
share,SHARE1,/path/to/some/share/,umask=022,maxusr=4294967295,netbios=SOMECIFSHOST
share,SHARE2,/path/to/a/different/share with spaces in the dir name,umask=022,maxusr=4294967295,netbios=ANOTHERCIFSHOST
The tricky part I've found is in trying to extract the data inside quotes. Suggestions made here have not worked for me, so I'm guessing I'm just doing it wrong. I also need to extract BOTH values from each line's double-quoted strings, not just the first one; I figure the remaining stuff could easily be parsed by splitting on whitespace.
In case it's relevant, I'm running this on a RHEL box and I need to pull it out with a regexp using Perl.
Thx!
One option is to treat your data as a CSV file and use Text::CSV_XS to parse it, setting the separator character to a space:
use strict;
use warnings;
use Text::CSV_XS;
my $csv = Text::CSV_XS->new( { binary => 1, sep_char => ' ' } )
or die "Cannot use CSV: " . Text::CSV_XS->error_diag();
open my $fh, "<:encoding(utf8)", "data.txt" or die "data.txt: $!";
while ( my $row = $csv->getline($fh) ) {
print join ',', @$row;
print "\n";
}
$csv->eof or $csv->error_diag();
close $fh;
Output on your dataset:
share,SHARE1,/path/to/some/share,umask=022,maxusr=4294967295,netbios=SOMECIFSHOST
share,SHARE2,/path/to/a/different/share with spaces in the dir name,umask=022,maxusr=4294967295,netbios=ANOTHERCIFSHOST
Hope this helps!
You can do this:
if literal quotes inside quotes are escaped with a backslash: share "SHA \" RE1" ...
$str =~ s/(?|"((?>[^"\\]++|\\{2}|\\.)*)"|()) /$1,/gs;
if literal quotes are escaped with another quote: share "SHA "" RE1" ...
$str =~ s/(?|"((?>[^"]++|"")*)"|()) /$1,/g;
if you are absolutely sure that there is no escaped quote between quotes in all your data:
$str =~ s/(?|"([^"]*)"|()) /$1,/g;
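To see what the simplest of the three variants does, here is a small sketch applied to one of the sample lines. The helper name quoted_to_csv is hypothetical, and it assumes there are no escaped quotes inside the quoted fields. The (?|...) branch reset needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the double quotes and turn the separating
# spaces into commas, leaving spaces inside quoted fields untouched.
sub quoted_to_csv {
    my ($line) = @_;
    # (?| ... ) is a branch reset, so both alternatives fill $1:
    # either the contents of a quoted field, or an empty string.
    # Either way, the space that follows is replaced by a comma.
    $line =~ s/(?|"([^"]*)"|()) /$1,/g;
    return $line;
}

print quoted_to_csv('share "SHARE2" "/path/to/a/different/share with spaces in the dir name" umask=022 maxusr=4294967295 netbios=ANOTHERCIFSHOST'), "\n";
# share,SHARE2,/path/to/a/different/share with spaces in the dir name,umask=022,maxusr=4294967295,netbios=ANOTHERCIFSHOST
```

Note that a quoted field at the very end of a line (with no space after it) would keep its quotes, since the pattern always requires a trailing space.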
Try this.
[^\" ]*
It selects every char but the quotation marks and the spaces.
my $str = 'share "SHARE1" "/path/to/some/share" umask=022 maxusr=4294967295 netbios=SOMECIFSHOST';
$str =~ s/"?\s*"\s*/,/g;
print $str;
This regex replaces like below:
"space" = ,
"space = ,
space" = ,
"" = ,
Not sure if I understand the question: you say one thing in the text but the example says something different. Anyway, try this:
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
chomp;
my @matches = $_ =~ /"(.*?)"/g;
print "@matches\n";
}
__DATA__
share "SHARE1" "/path/to/some/share" umask=022 maxusr=4294967295 netbios=SOMECIFSHOST
share "SHARE2" "/path/to/a/different/share with spaces in the dir name" umask=022 maxusr=4294967295 netbios=ANOTHERCIFSHOST
output:
$ ./p.pl
SHARE1 /path/to/some/share
SHARE2 /path/to/a/different/share with spaces in the dir name
#!/usr/bin/env perl
while(<>){
my @a = split /\s+\"|\"\s+/ , $_; # split on any spaces + ", or any " + spaces
for my $item ( @a ) {
if ( $item =~ /\"/ ) { # if there's a quote, remove
$item =~ s/\"//g;
} elsif ( $item !~ /\"/ ){ # else just replace spaces with comma
$item =~ s/\s+/,/g;
}
}
print join(",", @a);
print "\n";
}
output:
share,SHARE1,/path/to/some/share,umask=022,maxusr=4294967295,netbios=SOMECIFSHOST,
share,SHARE2,/path/to/a/different/share with spaces in the dir name,umask=022,maxusr=4294967295,netbios=ANOTHERCIFSHOST,
Leave it to you to remove the last comma :)

Find and replace any or both patterns in a string

I have list of urls. I need to strip off the protocol from it.
Some may have only http:// in it some may have www in it or some both.
I have written the code for it as:
my #list = qw'http://de.yahoo.com http://mail.example.org http://www.aol.com';
foreach(@list)
{
my $string = $_;
$string =~ s/http:\/\///;
$string =~ s/www.//;
print $string,"\n";
}
It works fine but is there a better way to write it in one line?
This should do the trick:
my #list = qw(http://de.yahoo.com http://mail.example.org http://www.aol.com);
foreach(@list)
{
my $string = $_;
$string =~ s/^(?:http:\/\/)?(?:www\.)?//;
print $string,"\n";
}
For future reference, http://www.regextester.com/ is your friend :)
** Edit ** Modified to use ikegami's suggestion of (?:...) as it should be more efficient when the values captured are not needed.
I guess you may want:
s!^(http://)?(www\.)?!!;
A few points:
use s!a!b! instead of s/a/b/; this saves the \/\/ escaping.
use ^; this ensures http:// is only matched at the start of the string.
As a single line:
print join("\n", map { s!^(http://)?(www\.)?!!r } @list); # /r (Perl 5.14+) returns the changed copy; without it map would collect the substitution counts
Yes:
s{http://(.*)www.|www.(.*)http://|http://|www.}{$1$2}g;
But you probably meant to do:
s{^http://}{};
s{^www\.}{};
which can be combined into:
s{^(?:http://)?(?:www\.)?}{};
http://www.foo.bar/www.html?http://xxx => foo.bar/www.html?http://xxx
http://foo.bar/www.html => foo.bar/www.html
www.foo.bar/www.html => foo.bar/www.html
foo.bar/www.html => foo.bar/www.html
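For completeness, a small sketch that runs the combined substitution over the list from the question. The helper name strip_prefix is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a leading "http://" and/or "www." and
# nothing else. Both parts are optional and anchored at the start.
sub strip_prefix {
    my ($url) = @_;
    $url =~ s{^(?:http://)?(?:www\.)?}{};
    return $url;
}

my @list = qw(http://de.yahoo.com http://mail.example.org http://www.aol.com);
print strip_prefix($_), "\n" for @list;
# de.yahoo.com
# mail.example.org
# aol.com
```

Working on a copy inside the helper leaves the original list untouched, which is what the my $string = $_ line in the question was doing by hand.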

How do I use Perl to intersperse characters between consecutive matches with a regex substitution?

The following lines of comma-separated values contains several consecutive empty fields:
$rawData =
"2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n"
I want to replace these empty fields with 'N/A' values, which is why I decided to do it via a regex substitution.
I tried this first of all:
$rawData =~ s/,([,\n])/,N\/A$1/g; # RELABEL UNAVAILABLE DATA AS 'N/A'
which returned
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear\n
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,,N/A,\n
Not what I wanted. The problem occurs when more than two consecutive commas occur. The regex gobbles up two commas at a time, so it starts at the third comma rather than the second when it rescans the string.
I thought this could be something to do with lookahead vs. lookbehind assertions, so I tried the following regex out:
$rawData =~ s/(?<=,)([,\n])|,([,\n])$/,N\/A$1/g; # RELABEL UNAVAILABLE DATA AS 'N/A'
which resulted in:
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,N/A,Clear\n
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,N/A,,N/A,,N/A,,N/A\n
That didn't work either. It just shifted the comma-pairings by one.
I know that washing this string through the same regex twice will do it, but that seems crude. Surely, there must be a way to get a single regex substitution to do the job. Any suggestions?
The final string should look like this:
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,N/A,Clear\n
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,N/A,N/A,N/A\n
EDIT: Note that you could open a filehandle to the data string and let readline deal with line endings:
#!/usr/bin/perl
use strict; use warnings;
use autodie;
my $str = <<EO_DATA;
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,
EO_DATA
open my $str_h, '<', \$str;
while(my $row = <$str_h>) {
chomp $row;
print join(',',
map { length $_ ? $_ : 'N/A'} split /,/, $row, -1
), "\n";
}
Output:
E:\Home> t.pl
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,N/A,N/A,N/A
You can also use:
pos $str -= 1 while $str =~ s{,(,|\n)}{,N/A$1}g;
Explanation: When s/// finds a ,, and replaces it with ,N/A, it has already moved to the character after the last comma. So, it will miss some consecutive commas if you only use
$str =~ s{,(,|\n)}{,N/A$1}g;
Therefore, I used a loop to move pos $str back by a character after each successful substitution.
Now, as @ysth shows:
$str =~ s!,(?=[,\n])!,N/A!g;
would make the while unnecessary.
I couldn't quite make out what you were trying to do in your lookbehind example, but I suspect you are suffering from a precedence error there, and that everything after the lookbehind should be enclosed in a (?: ... ) so the | doesn't avoid doing the lookbehind.
Starting from scratch, what you are trying to do sounds pretty simple: place N/A after a comma if it is followed by another comma or a newline:
s!,(?=[,\n])!,N/A!g;
Example:
my $rawData = "2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n";
use Data::Dumper;
$Data::Dumper::Useqq = $Data::Dumper::Terse = 1;
print Dumper($rawData);
$rawData =~ s!,(?=[,\n])!,N/A!g;
print Dumper($rawData);
Output:
"2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n"
"2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear\n2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,N/A,N/A,N/A\n"
You could search for
(?<=,)(?=,|$)
and replace that with N/A.
This regex matches the (empty) space between two commas or between a comma and end of line.
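A minimal sketch of this zero-width approach in Perl. The /m flag is an assumption added here: with the whole multi-line string in one scalar, it is what lets $ match before every newline rather than only at the very end. The helper name fill_empty is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: insert N/A at every empty position that sits right
# after a comma and right before another comma or the end of a line.
sub fill_empty {
    my ($text) = @_;
    # Both assertions are zero-width, so no comma is consumed and
    # consecutive empty fields are all found in a single pass.
    $text =~ s{(?<=,)(?=,|$)}{N/A}gm;
    return $text;
}

my $rawData = "2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n"
            . "2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n";
print fill_empty($rawData);
# 2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear
# 2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,N/A,N/A,N/A
```

An empty first field (a line that starts with a comma) is not covered, because the lookbehind requires a comma before the empty position.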
The quick and dirty hack version:
my $rawData = "2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n";
while ($rawData =~ s/,,/,N\/A,/g) {};
print $rawData;
Not the fastest code, but the shortest. It should loop at most twice.
Not a regex, but not too complicated either:
$string = join ",", map{$_ eq "" ? "N/A" : $_} split (/,/, $string,-1);
The ,-1 is needed at the end to force split to include any empty fields at the end of the string.