Regex (or bash), get pipes between quotes (perl) - regex

Update: Please keep in mind that regex is my only option.
Update 2: Actually, I can use a bash-based solution as well.
I'm trying to replace the pipes (there can be more than one) that are between double quotes with commas, using a Perl regex.
Example
continuer|"First, Name"|123|12412|10/21/2020|"3|7"||Yes|No|No|
Expected output (3 and 7 are separated by a comma)
continuer|"First, Name"|123|12412|10/21/2020|"3,7"||Yes|No|No|
There may be more digits; it may not be just the two in \d\|\d. It could be "3|7|2", and the correct output for that one has to be "3,7,2". I've tried the following:
cat <filename> | perl -pi -e 's/"\d+\|[\|\d]+/\d+,[\|\d]+/g'
but it just puts the actual string of d+ etc...
I'd really appreciate your help. ty

If it must be a regex here is a simpler one
perl -wpe's/("[^"]+")/ $1 =~ s{\|}{,}gr /eg' file
Not bullet-proof but it should work for the shown use case.†
Explanation: with the /e modifier, the replacement side is evaluated as code. There, a second substitution runs on $1 under /r, so that the original ($1) is left unchanged; the $N variables are read-only, so we can't modify $1 and thus couldn't run a "normal" s/// on it. With /r the changed string is returned, or the original if there were no changes. Just as ordered.
Once it's tested well enough add -i to change the input file "in-place" if wanted.
I must add, I see no reason that at least this part of the job can't be done using a CSV parser...
Thanks to ikegami for an improved version
perl -wpe's/"[^"]+"/ $& =~ tr{|}{,}r /eg' file
It's simpler, with no need to capture, and tr is faster.
† Tested with strings like in the question, extended only as far as this
con|"F, N"|12|10/21|"3|7"||Yes|"2||4|12"|"a|b"|No|""|end|

I'd use a CSV parser, not regular expressions:
#!/usr/bin/env perl
use warnings;
use strict;
use Text::CSV_XS;
my $csv = Text::CSV_XS->new({ binary => 1, sep_char => "|"});
while (my $row = $csv->getline(*ARGV)) {
    @$row = map { tr/|/,/r } @$row;
    $csv->say(*STDOUT, $row);
}
example:
$ perl demo.pl input.txt
continuer|"First, Name"|123|12412|10/21/2020|3,7||Yes|No|No|
More verbose, but also more robust and a lot easier to understand.

If you cannot install modules, Text::ParseWords is a core module you can try. It can split a string and handle quoted delimiters.
use Text::ParseWords;
my $q = q(continuer|"First, Name"|123|12412|10/21/2020|"3|7"||Yes|No|No|);
print join "|", map { tr/|/,/; $_ } quotewords('\|', 1, $q);
As a one-liner, it would be:
perl -MText::ParseWords -pe'$_ = join "|", map { tr/|/,/; $_ } quotewords(q(\|), 1, $_);' yourfile.txt

You said "Update 2: Actually, I can use a bash based solution as well." While this script isn't bash, you could call it from bash (or any other shell), which I assume is what you really mean by "bash based". This will work using any awk in any shell on every Unix box:
$ awk 'BEGIN{FS=OFS="\""} {for (i=2; i<=NF; i+=2) gsub(/\|/,",",$i)} 1' file
continuer|"First, Name"|123|12412|10/21/2020|"3,7"||Yes|No|No|
Imagine yourself having to debug or enhance the clear, simple loop above vs. the regexp incantation you posted in your answer:
's/(?:(?<=")|\G(?!^))(\s*[^"|\s]+(?:\s+[^"|\s]+)*)\s*\|\s*(?=[^"]*")/$1,/g'
Remember: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
I'm sure you could do what I'm doing with awk above natively in perl instead if you're trying to modify a perl script to add this functionality.

I'd use Text::CSV_XS.
perl -MText::CSV_XS=csv -e'
    csv
        in       => \*ARGV,
        sep_char => "|",
        on_in    => sub { tr/|/,/ for @{ $_[1] } };
'
You can provide the file name as an argument or provide the data via STDIN.

This is working right now
's/(?:(?<=")|\G(?!^))(\s*[^"|\s]+(?:\s+[^"|\s]+)*)\s*\|\s*(?=[^"]*")/$1,/g'
Credit goes to my boss at work
Thanks everyone for looking.
I hope some of you realize that some projects require certain ways and complicating an already very complicated pre existing structure is not always an option at work. I knew there would be a one liner for this, do not hate because you did not like that.

Related

Perl regex replacement when replacement is numeric

This is probably something really silly, and I apologize if that is the case. I don't know exactly what to search for, and I haven't had any luck with the searches I've run over the past half hour or so. Anyway...
So I want to automate making a simple change to xml with perl as part of a build process. This is the change I'm making, it's part of a config file called mapred-site.xml
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
- <value>1024</value>
+ <value>4096</value>
</property>
I've got a perl regex replacement that does exactly what I need it to do, until I change this FOO to 4096
cat mapred-site.xml | perl -p0e "s/(yarn.app.mapreduce.am.resource.mb<\/name>\s*?<value>)....(<\/value>)/\\1FOO\\2/s"
Guessing that the problem is that there are numbers directly next to the \\1 referring to the first portion, and it's pulling them in and trying to do \\14096 or similar, but I haven't been able to come up with a solution.
I apologize if the command itself is sloppy/inefficient, I'm still just getting started with these commands.
Using \1, \2 etc. on the right side of a regex is about a million years old anyway; the recommended way is to use $1, $2, etc. And if you use those you can use braces to separate the variable name from any neighboring stuff, like ${1}FOO${2} (or, just as well, ${1}4096${2}).
Here is a less fragile/more maintainable way to do it, using Mojo::DOM:
cat mapred-site.xml | perl -CSD -0777 -MMojo::DOM -pe '$_ = Mojo::DOM->new->xml(1)->parse($_); $_->find("property > name")->first(sub { $_->text eq "yarn.app.mapreduce.am.resource.mb" })->following("value")->first->content(4096)'
As a more readable script:
use strict;
use warnings;
use Mojo::DOM;
use open ':std', ':encoding(UTF-8)';
my $dom = do { local $/; Mojo::DOM->new->xml(1)->parse(readline \*STDIN) };
my $name = $dom->find('property > name')
->first(sub { $_->text eq 'yarn.app.mapreduce.am.resource.mb' });
$name->following('value')->first->content(4096);
print $dom->to_string;

Perl regexp substitution of an array

I would like to use Perl script A to generate and replace an array in Perl script B.
Script B originally contains something like:
my @old_array = (value1, value2, etc);
Script A contains something like:
for ( $a = 0; $a < $nr_values; $a++ ) {
    $list .= "$new_values[$a], ";
}
`perl -pi -e 's/^my \@old_array.*/my \@new_array \= \( $list \)\;/g' script_B.pl;`
However, when I run Perl script A:
The substitution occurs to all of the my declared variables
The array @ symbol and name are not changed: only the updated values
Please advise how to properly substitute arrays using Perl?
I found a proper solution using triple backslashes:
perl -pi -e 's/^my \\\@temps.*/my \\\@temps \= \( NEW \)\;/g' modify.pl;
This solves both my regex find and replace issues.
For those attempting to help with substantive comments, thank you! For those insisting that I was not providing enough information, or that my issue was caused by unlisted code, I found your responses to be highly unprofessional and not focused on the question being asked.

Adding quotes to a CSV using perl

I've got a CSV that looks as follows:
A,01,ALPHA
00,D,CHARLIE
E,F,02
This is the desired file after transformation:
"A",01,"ALPHA"
00,"D","CHARLIE"
"E","F",02
As you can see, the fields that are entirely numeric are left unquoted, whilst the alpha (or alphanumeric ones) are quoted.
What would be a sensible way to go about this in Perl ?
Already commented below, but I've tried stuff like
perl -pe 's/(\w+)/"$1"/g'
And that doesn't work because \w obviously picks up the numerics.
I recommend not reinventing the wheel, but rather to use an already existing module, as zdim recommends. Here is your example using Text::CSV_XS
test.pl
#!/usr/bin/env perl
use warnings;
use strict;
use Text::CSV_XS;
use Scalar::Util qw( looks_like_number );
my $csv = Text::CSV_XS->new();
while (my $row = $csv->getline(*STDIN)) {
    my @quoted_row = map { looks_like_number($_) ? $_ : '"'. $_ .'"' } @$row;
    print join(',',@quoted_row) . "\n";
}
Output
cat input | perl test.pl
"A",01,"ALPHA"
00,"D","CHARLIE"
"E","F",02
Another one-liner, input file modified to add a line with alphanumeric fields
$ cat ip.csv
A,01,ALPHA
00,D,CHARLIE
E,F,02
23,AB12,53C
$ perl -F, -lane 's/.*[^0-9].*/"$&"/ foreach(@F); print join ",", @F' ip.csv
"A",01,"ALPHA"
00,"D","CHARLIE"
"E","F",02
23,"AB12","53C"
To modify OP's attempt:
$ perl -pe 's/(^|,)\K\d+(?=,|$)(*SKIP)(*F)|\w+/"$&"/g' ip.csv
"A",01,"ALPHA"
00,"D","CHARLIE"
"E","F",02
23,"AB12","53C"
(^|,)\K\d+(?=,|$)(*SKIP)(*F) this will skip the fields with digits alone and the alternate pattern \w+ will get replaced
It seems that you are after a one-liner. Here is a basic one
perl -lpe '$_ = join ",", map /^\d+$/ ? $_ : "\"$_\"", split ",";' input.csv
This splits each line on , and passes the resulting list to map. There each element is tested for being digits-only with /^\d+$/ and is either passed through untouched or wrapped in " otherwise. Then map's return is joined by ,.
The -l switch strips the newline on input and restores it on output. This is needed because otherwise the newline would be part of the last field and end up inside its quotes. The result is assigned back to $_ so that -p can print it, with no need for an explicit print.
The code is very easily used in a script, if you don't insist on a one-liner.
Processing of CSV files is far better done by modules, for example Text::CSV.

Perl regexp substitution - multiple matches

Friends,
need some help with substitution regex.
I have a string
;;;;;;;;;;;;;
and I need to replace it by
;\N;\N;\N;\N;\N;\N;\N;\N;\N;\N;\N;\N;
I tried
s/;;/;\\N;/g
but it gives me
;\N;;\N;;\N;;\N;;\N;;\N;;
tried to fiddle with lookahead and lookbehind, but can't get it solved.
I wouldn't use a regex for this, and instead make use of split:
#!/usr/bin/env perl
use strict;
use warnings;
my $str = ';;;;;;;;;;;;;';
print join ( '\N', split ( //, $str ) );
Splitting on nulls, to get each character, and making use of the fact that join puts delimiters between characters. (So not before first, and not after last).
This gives:
;\N;\N;\N;\N;\N;\N;\N;\N;\N;\N;\N;\N;
Which I think matches your desired output?
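As a one-liner, this would be:
perl -lne 'print join ( q{\N}, split // )'
Note - we need single quotes rather than double around the \N so it doesn't get interpolated; q{\N} is used because the shell already owns the ' characters. The -l switch chomps each line, so the newline isn't treated as one more character to join.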
If you need to handle variable content (e.g. not just ; ) you can add grep or map into the mix - I'd need some sample data to give you a useful answer there though.
I use this for infile edit, the regexp suits me better
Following on from that - perl is quite clever. It allows you to do in place editing (if that's what you're referring to) without needing to stick with regular expressions.
Traditionally you might do
perl -i.bak -p -e 's/something/somethingelse/g' somefile
What this does is expand that into a loop:
LINE: while (defined($_ = <ARGV>)) {
    s/something/somethingelse/g;
}
continue {
    die "-p destination: $!\n" unless print $_;
}
I.e. what it's actually doing is:
opening the file
iterating it by lines
transforming the line
printing the new line
And with -i that print is redirected to the new file name.
You don't have to restrict yourself to -p though - anything that generates output will work in this way - although bear in mind if it doesn't 'pass through' any lines that it doesn't modify (as a regular expression transform does) it'll lose data.
But you can definitely do:
perl -i.bak -lne 'print join ( q{\N}, split // )'
and edit in place, but it'll trip over lines that aren't just ;;;;; as in your example.
So to avoid those:
perl -i.bak -lne 'if (m/;;;;/) { print join ( q{\N}, split // ) } else { print }'
Or perhaps more succinctly:
perl -i.bak -lpe '$_ = join ( q{\N}, split // ) if m/;;;/'
Since a character can't be matched twice in the same pass, your approach doesn't work: each ;; match consumes both semicolons, so the second one can't start the next match. To solve the problem, check for the following ; with a lookahead instead, so that the second ; isn't part of the match:
s/;(?=;)/;\\N/g

How to remove the last 6 digits from the filename in Perl using regex

I need your help in creating a regex to delete the hh:mm:ss bits from the file name.
I have the file name in format of:
abcd_efgh_ijkl_mnop_20140720151617.txt
And I want to rename it to:
abcd_efgh_ijkl_mnop_20140720.txt
Before moving it to the server. The Perl code I am using doesn't work. I cannot use substr or the rename function due to script requirements.
$file_name = @file_array;
$file_name =~s/$\s+\d{8}(.*)/$1/;
Please help me in creating the correct regex to do the same.
Instead of focusing on what you don't want, specify a regex that states what you DO want.
In this case, you specifically want to keep the first 8 digits of numbers and truncate the rest:
use strict;
use warnings;
while (<DATA>) {
s/\d{8}\K\d+//;
print;
}
__DATA__
abcd_efgh_ijkl_mnop_20140720151617.txt
Outputs:
abcd_efgh_ijkl_mnop_20140720.txt
Or, if \K (which acts like a variable-length lookbehind and was added in perl 5.10) is not an option because you're working with a particularly ancient version of perl, a capture group can achieve the same result: s/(\d{8})\d+/$1/;
Try this:
$filename="abcd_efgh_ijkl_mnop_20140720151617.txt";
$filename=~s/\d{6}\.txt$/.txt/;
You could try the below perl command,
$ echo 'abcd_efgh_ijkl_mnop_20140720151617.txt' | perl -pe 's/^(.*).{6}(\..*)$/\1\2/g'
abcd_efgh_ijkl_mnop_20140720.txt
So it would be,
$file_name =~s/^(.*).{6}(\..*)$/$1$2/g;
my $in = 'abcd_efgh_ijkl_mnop_20140720151617.txt';
print "$in\n";
my ($new) = $in =~ /(.*2014\d{4})/;
print "$new\n";
Try a non-greedy match:
$file_name = 'abcd_efgh_ijkl_mnop_20140720151617.txt';
$file_name =~s/(.*?)\d{6}\.txt/$1.txt/;
print $file_name;