String to split a complicated string in Perl [duplicate] - regex

This question already has answers here:
How can I parse quoted CSV in Perl with a regex?
(7 answers)
Closed 7 years ago.
I have a string that looks something like this:
'som,ething', another, 'thin\'g', 'her,e', gggh*
I am trying to split it on the commas that are NOT inside the elements, like this:
'som,ething'
another
'thin\'g'
'her,e'
gggh*
I am using parse_line(q{,}, 1, $string) but it seems to fail when the string has single quotes in it. Is there something I'm missing?

#!/usr/bin/perl
use strict;
use warnings;
my $string = q{'som,ething', another, 'thin'g', 'her,e', gggh*};
my @splitted = split(/,(?=\s+)/, $string);
print $_."\n" foreach @splitted;
Output:
'som,ething'
another
'thin'g'
'her,e'
gggh*

It looks like you're trying to parse comma-separated values. The answer is to use Text::CSV_XS since that handles the various weird cases you're likely to find in the data. See How can I parse quoted CSV in Perl with a regex?
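As a minimal sketch of that suggestion, Text::CSV_XS could be configured as follows. The option values are assumptions about the data and are not specified in the answer: single quotes as the quote character, a backslash to escape a quote inside a quoted item, and optional spaces after the commas.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Assumed format: ',' separator, single-quoted fields, backslash escapes.
my $csv = Text::CSV_XS->new({
    sep_char         => ',',
    quote_char       => "'",
    escape_char      => "\\",
    allow_whitespace => 1,    # ignore spaces around the separators
    binary           => 1,
}) or die "Cannot use Text::CSV_XS: " . Text::CSV_XS->error_diag();

# Sample string, with the embedded quote escaped by a backslash.
my $string = q{'som,ething', another, 'thin\'g', 'her,e', gggh*};

my @fields;
if ($csv->parse($string)) {
    @fields = $csv->fields();
    print "$_\n" for @fields;
} else {
    warn "Parse failed: " . $csv->error_diag() . "\n";
}
```

Note that the parser returns the field contents, so the surrounding quotes are not part of the result.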

split is not the right tool here. If you are sure your string is well formed, a global match is simpler. For example:
my $line = "'som,ething', another , 'thin\\'g', 'her,e' , gggh*";
my @list = $line =~ /\s*('[^\\']*(?:\\.[^\\']*)*+'|[^,]+(?<=\S))\s*/g;
print join("|", @list);
(the (?<=\S) is only there to trim trailing whitespace from unquoted items)

Related

perl regex extracted from one file to use on a variable [duplicate]

This question already has answers here:
How can I use a variable in the replacement side of the Perl substitution operator?
(9 answers)
Closed 3 years ago.
I have a file containing a search string and a replacement string on a single line.
I read that file, use split to separate the search and replacement strings,
and apply them to a variable.
File:
(.*) pre_$1
Perl Code:
$str = "a";
$line = <FILEHANDLE>; # Read the file above; the line contains: (.*) pre_$1
my ($ss,$rs) = split /\s/,$line;
$str =~ s/$ss/$rs/ee;
This does not seem to work.
I looked online, and the closest result suggests wrapping the replacement string in both single and double quotes,
i.e.:
$rs = '"pre_$1"';
This works if it is in the script, but if I read it from the file I don't see any replacement.
Can someone point out what I am doing wrong here?
Thanks.
s//$rs/ee expects $rs to contain valid Perl code. pre_$1 is not valid Perl code. It's a very bad idea to expect the user to provide Perl code anyway.
Solution:
use String::Substitution qw( gsub_modify );
gsub_modify($str, $ss, $rs);
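If pulling in a module is not an option, the same effect can be sketched in core Perl by expanding the $1-style placeholders by hand instead of evaluating the replacement as code. The helper name replace_with_template is hypothetical. This sketch only handles $N placeholders (not ${N}) and replaces the first match only.

```perl
use strict;
use warnings;

# Hypothetical helper: apply $pattern to $str and replace the first match
# with $template, in which $1, $2, ... stand for the capture groups.
sub replace_with_template {
    my ($str, $pattern, $template) = @_;
    return $str unless $str =~ /$pattern/;

    # Remember where the whole match starts and ends.
    my ($start, $end) = ($-[0], $+[0]);

    # Copy the capture groups now: $groups[1] is $1, $groups[2] is $2, ...
    my @groups;
    for my $n (0 .. $#+) {
        $groups[$n] = defined $-[$n] ? substr($str, $-[$n], $+[$n] - $-[$n]) : '';
    }

    # Expand the placeholders as plain text; nothing is eval'ed.
    (my $replacement = $template) =~ s/\$(\d+)/defined $groups[$1] ? $groups[$1] : ''/ge;

    substr($str, $start, $end - $start, $replacement);
    return $str;
}

# Same content as the line read from the file in the question.
my ($ss, $rs) = split /\s/, "(.*) pre_\$1\n";
print replace_with_template("a", $ss, $rs), "\n";   # prints "pre_a"
```

Because the replacement text is never executed, a malformed or hostile line in the file can at worst produce an odd string, not run code.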

RegEx match all word characters, with umlauts from different languages [duplicate]

This question already has answers here:
Why do Perl string operations on Unicode characters add garbage to the string?
(7 answers)
Closed 3 years ago.
I want to check whether a person's name is valid.
It should accept Latin letters, including ones with umlauts and accents (e.g. öäüÖÄÜé).
Unfortunately, nothing I've tried works.
According to several sources (following some links),
https://www.regular-expressions.info/unicode.html
Regex for word characters in any language
\p{L} should work, but it doesn't work for me.
Do I have to use a library for this?
use strict;
use warnings;
my $test = "testString";
print $1 if ($test =~ m/^(\p{L}+)$/); #testString
$test = "testStringö";
print $1 if ($test =~ m/^(\p{L}+)$/); #no print msg
$test = "testéString";
print $1 if ($test =~ m/^(\p{L}+)$/); #no print msg
You need to tell Perl that the source code of your file is in UTF-8. Add
use utf8;
after
use strict;
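Put together, a minimal sketch could look like this. It assumes the script file itself is saved as UTF-8. The binmode line is an addition that is only there so the non-ASCII characters print correctly, and is_valid_name is a hypothetical helper.

```perl
use strict;
use warnings;
use utf8;                               # the source code below is UTF-8
binmode(STDOUT, ':encoding(UTF-8)');    # encode output as UTF-8 (assumes a UTF-8 terminal)

# Hypothetical helper: true if the name consists only of letters.
sub is_valid_name {
    my ($name) = @_;
    return $name =~ m/^\p{L}+$/ ? 1 : 0;
}

for my $test ("testString", "testStringö", "testéString", "test123") {
    printf "%s: %s\n", $test, is_valid_name($test) ? "valid" : "invalid";
}
```

Without use utf8, Perl treats the two bytes that encode ö as two separate characters, which is why \p{L}+ fails to match in the question.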

Perl print matched regex string [duplicate]

This question already has answers here:
How can I extract the matches from the Perl match operator into variables?
(6 answers)
Closed 5 years ago.
The following Perl code only gives back true or false (1 or 0):
#!/usr/bin/perl
use strict;
use warnings;
my $string;
$string ="interface Ethernet1/20
shutdown";
my $test = $string =~ m/^.+$(?=\s+shutdown)/mg;
print "'$test'\n";
I get back a 1.
But how can I get back the matched string 'interface Ethernet1/20' ?
Thanks for any help!
Simply give it list context:
my ($test) = $string =~ m/^.+$(?=\s+shutdown)/mg;
The concept of evaluation context (list vs scalar) is fundamental to Perl programming, so it may be time to review some tutorials and/or a reference manual.
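As a small sketch of the difference, the same match can be evaluated in both contexts. The variant with a capture group is an addition for comparison and is not part of the answer above.

```perl
use strict;
use warnings;

my $string = "interface Ethernet1/20\nshutdown";

# Scalar context: the match only reports success or failure.
my $found = $string =~ m/^.+$(?=\s+shutdown)/m;

# List context with /g and no capture groups: returns the matched strings.
my ($line) = $string =~ m/^.+$(?=\s+shutdown)/mg;

# List context with a capture group: returns the captured text.
my ($captured) = $string =~ m/^(.+)$(?=\s+shutdown)/m;

print "scalar:  '$found'\n";
print "list:    '$line'\n";
print "capture: '$captured'\n";
```

The parentheses around the variable on the left-hand side are what impose list context on the match.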

How can I use regex to remove /1 or /2?

Regex gurus,
Here is the following line of code I want to parse with regex:
@ERR030882.2595 HWI-BRUNOP16X_0001:3:1:6649:5175#0/1
I want to obtain the following:
@ERR030882.2595 HWI-BRUNOP16X_0001:3:1:6649:5175#0
I have written the following regex on rubular.com:
(@.* *.)(!?(\/.))
My idea is to use negation to remove /1 with (!?(\/.)). However, this returns the entire line:
@ERR030882.2595 HWI-BRUNOP16X_0001:3:1:6649:5175#0/1
Why is (?!thisismystring) not removing /1? I googled the fire out of this, but the results seem to suggest things similar to what I am already trying. I deeply appreciate your help.
I think what you are trying to write is /(\@.* .*)(?=\/\d)/ (you need to escape the at sign @ to prevent Perl from treating it as an array). You need a positive look-ahead, not a negative one, because you want to match everything up to the point where the following characters are a slash followed by a digit.
Here is a program that demonstrates.
use strict;
use warnings;
use 5.010;
my $s = '@ERR030882.2595 HWI-BRUNOP16X_0001:3:1:6649:5175#0/1';
$s =~ /(\@.* .*)(?=\/.)/;
print $1, "\n";
But you would be much better off copying the whole string and removing the slash and everything after it, like this
use strict;
use warnings;
my $s = '@ERR030882.2595 HWI-BRUNOP16X_0001:3:1:6649:5175#0/1';
(my $fixed = $s) =~ s{/\d+$}{};
print $fixed, "\n";
output
@ERR030882.2595 HWI-BRUNOP16X_0001:3:1:6649:5175#0
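The copy-and-substitute approach can be wrapped in a small function and applied to several headers. This is only a sketch: strip_read_suffix is a hypothetical name, and the second and third headers are made-up variations of the one in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the header without a trailing "/<digits>".
sub strip_read_suffix {
    my ($header) = @_;
    (my $fixed = $header) =~ s{/\d+$}{};
    return $fixed;
}

my @headers = (
    '@ERR030882.2595 HWI-BRUNOP16X_0001:3:1:6649:5175#0/1',
    '@ERR030882.2595 HWI-BRUNOP16X_0001:3:1:6649:5175#0/2',   # illustrative
    '@ERR030882.2595 HWI-BRUNOP16X_0001:3:1:6649:5175#0',     # no suffix: left unchanged
);
print strip_read_suffix($_), "\n" for @headers;
```

Anchoring the pattern with $ means a slash elsewhere in the line is left alone.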

Perl regex store matches in array

I have a file with strings in each row as follows
"229269_2,190594_2,94552_2,266076_2,269628_2,165328_2,99319_2,263339_2,263300_2,99315_2,271509_2,2714",A,1
the next line could look like
84545,X,2
I'm trying to parse this text in Perl. Note: quotes are present when the first field holds several items, but not when it holds only one item.
I would like to parse each item into an array. I tried the following regex
@fields = ($_ =~ /(\d+\_\d+),*/g);
but it misses the last item, 2714. How do I capture that edge case? Any help appreciated. Thanks in advance.
It looks like you have a CSV File, so use an actual CSV parser for it like Text::CSV.
After you parse the columns, you can separate your first field into the array:
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new ( { binary => 1 } ) # should set binary attribute.
    or die "Cannot use CSV: ".Text::CSV->error_diag ();
my $line = qq{"229269_2,190594_2,94552_2,266076_2,269628_2,165328_2,99319_2,263339_2,263300_2,99315_2,271509_2,2714",A,1};
if ($csv->parse($line)) {
    my @columns = $csv->fields();
    # The first column is itself a comma-separated list.
    my @nums = split ',', $columns[0];
    print "@nums\n";
}
Outputs:
229269_2 190594_2 94552_2 266076_2 269628_2 165328_2 99319_2 263339_2 263300_2 99315_2 271509_2 2714
Why not a regex?
Yes, of course it's possible to use a regex for practically anything. But what you need to understand is that this will make your code extremely fragile and difficult to maintain.
Even if you want to use a regular expression, you should STILL do this in two steps. First separate the initial column(s) of your CSV, and then process the specific column that you're worried about.
Because you're just working with the first column, you could use code like the following:
use strict;
use warnings;
my $line = qq{"229269_2,190594_2,94552_2,266076_2,269628_2,165328_2,99319_2,263339_2,263300_2,99315_2,271509_2,2714",A,1};
# First column: either a quoted string or everything up to the first comma.
if ($line =~ /^"(.*?)"|^([^,]*)/) {
    my $column0 = $1 // $2;
    my @nums = split ',', $column0;
    print "@nums\n";
}
The above happens to accomplish the same thing as the previous code. However, it has one big flaw: it's not nearly as obvious to the maintaining programmer what's going on.
Whenever a new coder, or even yourself in 6 months, views the first set of code, it is extremely obvious what format your data is in. You're working with a CSV file, and the first column is a list separated by commas. The second code also works, but the new maintainer must actually read the regex and figure out what's going on to understand both what format the data is in, and whether the code is actually doing it correctly.
Anyway, do whatever you will, but I strongly advise you to use an actual CSV Parser for parsing csv files.
If all you want is all but the last two fields...
my $string = qq("229269_2,190594_2,94552_2,266076_2,269628_2,165328_2,99319_2,263339_2,263300_2,99315_2,271509_2,2714",A,1);
$string =~ s/"//g; # delete the quotes
my @f = split (/,/, $string); # split on the comma
pop @f; pop @f; # jettison the last two columns
# @f contains what you're looking for