split first from rest of list using regex substitution - regex

I need to split a list between its first item and the rest of its items using regex substitution only.
The lists of items are input as strings using '##' as a separator, e.g.:
''
'one'
'one##two'
'one##two##three'
'one##two words##three'
My Perl attempt doesn't really work:
my $sampleText = 'one##two words##three';
my $first = $sampleText;
my $rest = $sampleText;
$first =~ s/(.+?)(##.*)?/$1/g;
$rest =~ s/(.?+)(##)?(.*)/$3/g;
print "sampleText = '$sampleText', first = '$first', rest = '$rest'\n";
sampleText = 'one##two words##three', first = 'one', rest = 'ne##two words##three'
Please note the constraints:
the separator is a multi-character string
only regex substitutions are allowed (1)
I could "chain" regex substitutions if necessary
The expected end result is two strings: the first element, and the initial string with the first element cut off (2)
the list may have from 0 to n items, each being any string not containing the separator.
(1) I work with this rather large Perl system where at some point lists of items are processed using provided operations. One of them is a regex substitution. None of the others one are applicable. Solving the problem using full Perl code is easy, but that would mean modifying the system, which is not an option as this time.
(2) the context is the Unimarc bibliographic format, where authors of a publication are to be split into the standard Unimarc fields 700$a for the first author, and 701$a for any remaining authors.

I assume point (1) means you cannot use the split builtin? It would be easy using splits optional third parameter which lets you specify the maximum number of items.
my( $first, $rest ) = split( '##', $sampleText, 2 );
But if it has to be regex replace then your is almost right, but using .+? wont work when there's no sperators (because it will just take the first character You can fix this by anchoring the end. Instead something like:
my $sampleText = 'one##two words##three';
my $first = $sampleText;
my $rest = $sampleText;
$first =~ s/(.+?)(|##(.*))$/$1/g;
$rest =~ s/(.+?)(|##(.*))$/$3/g;
print "sampleText = '$sampleText', first = '$first', rest = '$rest'\n";

Whatever is the matter with :
my ( $first, $rest ) = split /##/, $sampleText, 2;
?

try
my ($first, $rest) = /(.+?)\#\#(.*)/;
// (or, m//) is matching; you don't need to use s/// for substitution. It returns the matches (here, to $first, $rest), or you can capture them later using $1, $2, &c.

You have reversed the quantifiers ? and + in the second regex, it should be:
$rest =~ s/(.+?)(##)?(.*)/$3/g;
___^^
or more concise:
$rest =~ s/.+?##(.*)/$1/;

I'd must match; not substitute:
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
chomp;
m{([^#]*?)##(.*)} and print "[$1][$2]\n";
}
__DATA__
''
'one'
'one##two'
'one##two##three'
'one##two words##three'

Related

Replace only the second occurance of string in a line in perl regex

I have a string like "ven|ven|vett|vejj|ven|ven". Treat each "|" delimiter for each column.
By splitting the string with "|" saving all the columns in array and reading each column into $str
So, I'm trying to do this as
$string =~ s/$str/venky/g if $str =~ /ven/i; # it will do globally.
Which not met the requirement.
On-demand basis, I need to replace string at the particular number of occurrence of the string.
For example, I've a request to change 2nd occurrence of "ven" to venky.
Then how can I met this requirement simply? Is it some-thing like
$string =~ s/ven/venky/2;
As of my knowledge we have 'o' for replace once and 'g' for globally. I'm struggling for the solution to get the replacement at particular occurrence. And I should not use pos() to get the position, because string keeps on change. It becomes difficult to trace it every-time. That's my intention.
Please help me on this regard.
There is no flag that you can add to the regex that will do this.
The easiest way would be to split and loop. However, if you insist to use one regex, it is doable:
/^(?:[^v]|v[^e]|ve[^n])*ven(?:[^v]|v[^e]|ve[^n])*\Kven/
If you want to replace the Nth occurrence instead of the second, you can do:
/^(?:(?:[^v]|v[^e]|ve[^n])*ven){N-1}(?:[^v]|v[^e]|ve[^n])*\Kven/
The general idea:
(?:[^v]|v[^e]|ve[^n])* - matches any string that isn't part of ven
\K is a cool matcher that drops everything matched so far, so you can sort of use it as a lookbehind with variable length
Currently you're replacing every instance of'ven' with 'venky' if your string contains a match for ven, which of course it does.
What I assume you're trying to do is to substitute 'ven' for 'venky' within your string if it's the second element:
my $string = 'ven|ven|vett|vejj|ven|ven';
my #elements = split(/\|/, $string);
my $count;
foreach (#elements){
$count++;
s/$_/venky/g if /ven/i and $count == 2;
}
print join('|', #elements);
print "\n";
Your approach was already pretty good. What you described makes sense, but I think you are having trouble implementing it.
I created a function to do the work. It takes 4 arguments:
$string is the string we want to work on
$n is the nth occurance you want to replace
$needle is the thing you want to replace – thing needle in a haystack
Note that right now we allow to pass stuff that might contain regular expressions. So you would have to use quotemeta on it or match with /\Q$needle\E/
$replacement is the replacement for the $needle
The idea is to split up the string, then check each element if it matches the pattern ($needle) and keep track of how many have matched. If the nth one is reached, replace it and stop processing. Then put the string back together.
use strict;
use warnings;
use feature 'say';
say replace_nth_occurance("ven|ven|vett|vejj|ven|ven", 2, 'ven', 'venky');
sub replace_nth_occurance {
my ($string, $n, $needle, $replacement) = #_;
# take the string appart
my #elements = split /\|/, $string;
my $count = 0; # keep track of ...
foreach my $e (#elements) {
$count++ if $e =~ m/$needle/; # ... how many matches we've found
if ($count == $n) {
$e =~ s/$needle/$replacement/; # replace
last; # and stop processing
}
}
# put it back into the pipe-separated format
return join '|', #elements;
}
Output:
ven|venky|vett|vejj|ven|ven
To replace the n'th occurrence of "ven" to "venky":
my $n = 3;
my $test = "seven given ravens";
$test =~ s/ven/--$n == 0 ? "venky" : $&/eg;
This uses the ability with the /e flag to specify the substitution part as an expression.

Perl how do you assign a varanble to a regex match result

How do you create a $scalar from the result of a regex match?
Is there any way that once the script has matched the regex that it can be assigned to a variable so it can be used later on, outside of the block.
IE. If $regex_result = blah blah then do something.
I understand that I should make the regex as non-greedy as possible.
#!/usr/bin/perl
use strict;
use warnings;
# use diagnostics;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Outlook';
my #Qmail;
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\\s\*owner \#/";
my $outlook = Win32::OLE->new('Outlook.Application')
or warn "Failed Opening Outlook.";
my $namespace = $outlook->GetNamespace("MAPI");
my $folder = $namespace->Folders("test")->Folders("Inbox");
my $items = $folder->Items;
foreach my $msg ( $items->in ) {
if ( $msg->{Subject} =~ m/^(.*test alert) / ) {
my $name = $1;
print " processing Email for $name \n";
push #Qmail, $msg->{Body};
}
}
for(#Qmail) {
next unless /$regex|^\s*description/i;
print; # prints what i want ie lines that start with owner and description
}
print $sentence; # prints ^\\s\*offense \ # not lines that start with owner.
One way is to verify a match occurred.
use strict;
use warnings;
my $str = "hello what world";
my $match = 'no match found';
my $what = 'no what found';
if ( $str =~ /hello (what) world/ )
{
$match = $&;
$what = $1;
}
print '$match = ', $match, "\n";
print '$what = ', $what, "\n";
Use Below Perl variables to meet your requirements -
$` = The string preceding whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already.
$& = Contains the string matched by the last pattern match
$' = The string following whatever was matched by the last pattern match, not counting patterns matched in nested blockes that have been exited already. For example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi
The match of a regex is stored in special variables (as well as some more readable variables if you specify the regex to do so and use the /p flag).
For the whole last match you're looking at the $MATCH (or $& for short) variable. This is covered in the manual page perlvar.
So say you wanted to store your last for loop's matches in an array called #matches, you could write the loop (and for some reason I think you meant it to be a foreach loop) as:
my #matches = ();
foreach (#Qmail) {
next unless /$regex|^\s*description/i;
push #matches_in_qmail $MATCH
print;
}
I think you have a problem in your code. I'm not sure of the original intention but looking at these lines:
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\s*owner #/";
I'll step through that as:
Assign $regexto the string ^\s*owner #.
Assign $sentence to value of running a match within $regex with the regular expression /^s*owner $/ (which won't match, if it did $sentence will be 1 but since it didn't it's false).
I think. I'm actually not exactly certain what that line will do or was meant to do.
I'm not quite sure what part of the match you want: the captures, or something else. I've written Regexp::Result which you can use to grab all the captures etc. on a successful match, and Regexp::Flow to grab multiple results (including success statuses). If you just want numbered captures, you can also use Data::Munge
You can do the following:
my $str ="hello world";
my ($hello, $world) = $str =~ /(hello)|(what)/;
say "[$_]" for($hello,$world);
As you see $hello contains "hello".
If you have older perl on your system like me, perl 5.18 or earlier, and you use $ $& $' like codequestor's answer above, it will slow down your program.
Instead, you can use your regex pattern with the modifier /p, and then check these 3 variables: ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} for your matching results.

How do I tokenise a word given tokens that are subsumed incompletely in the word?

I understand how to use regex in Perl in the following way:
$str =~ s/expression/replacement/g;
I understand that if any part of the expression is enclosed in parentheses, it can be used and captured in the replacement part, like this:
$str =~ s/(a)/($1)dosomething/;
But is there a way to capture the ($1) above outside of the regex expression?
I have a full word which is a string of consonants, e.g. bEdmA, its vowelized version baEodamaA (where a and o are vowels), as well its split up form of two tokens, separated by space, bEd maA. I want to just pick up the vowelized form of the tokens from the full word, like so: beEoda, maA. I'm trying to capture the token within the full word expression, so I have:
$unvowelizedword = "bEdmA";
$tokens[0] = "bEd", $tokens[1] = "mA";
$vowelizedword = "baEodamA";
foreach $t(#tokens) {
#find the token within the full word, and capture its vowels
}
I'm trying to do something like this:
$vowelizedword = m/($t)/;
This is completely wrong for two reasons: the token $t is not present in exactly its own form, such as bEd, but something like m/b.E.d/ would be more relevant. Also, how do I capture it in a variable outside the regular expression?
The real question is: how can I capture the vowelized sequences baEoda and maA, given the tokens bEd, mA from the full word beEodamaA?
Edit
I realized from all the answers that I missed out two important details.
Vowels are optional. So if the tokens are : "Al" and "ywm", and the fully vowelized word is "Alyawmi", then the output tokens would be "Al" and "yawmi".
I only mentioned two vowels, but there are more, including symbols made up of two characters, like '~a'. The full list (although I don't think I need to mention it here) is:
#vowels = ('a', 'i', 'u', 'o', '~', '~a', '~i', '~u', 'N', 'F', 'K', '~N', '~K');
The following seems to do what you want:
#!/usr/bin/env perl
use warnings;
use strict;
my #tokens = ('bEd', 'mA');
my $vowelizedword = "beEodamaA";
my #regex = map { join('.?', split //) . '.?' } #tokens;
my $regex = join('|', #regex);
$regex = qr/($regex)/;
while (my ($matched) = $vowelizedword =~ $regex) {
$vowelizedword =~ s{$regex}{};
print "matched $matched\n";
}
Update as per your updated question (vowels are optional). It works from the end of the string so you'll have to gather the tokens into an array and print them in reverse:
#!/usr/bin/env perl
use warnings;
use strict;
my #tokens = ('bEd', 'mA', 'Al', 'ywm');
my $vowelizedword = "beEodamaA Alyawmi"; # Caveat: Without the space it won't work.
my #regex = map { join('.?', split //) . '.?$' } #tokens;
my $regex = join('|', #regex);
$regex = qr/($regex)/;
while (my ($matched) = $vowelizedword =~ $regex) {
$vowelizedword =~ s{$regex}{};
print "matched $matched\n";
}
Use the m// operator in so-called "list context", as this:
my #tokens = ($input =~ m/capturing_regex_here/modifiershere);
ETA: From what I understand now, what you were trying to say is that you want to match an optional vowel after each character of the tokens.
With this, you can tweak the $vowels variable to only contain the letters you seek. Optionally, you may also just use . to capture any character.
use strict;
use warnings;
use Data::Dumper;
my #tokens = ("bEd", "mA");
my $full = "baEodamA";
my $vowels = "[aeiouy]";
my #matches;
for my $rx (#tokens) {
$rx =~ s/.\K/$vowels?/g;
if ($full =~ /$rx/) {
push #matches, $full =~ /$rx/g;
}
}
print Dumper \#matches;
Output:
$VAR1 = [
'baEoda',
'mA'
];
Note that
... $full =~ /$rx/g;
does not require capturing groups in the regex.
I suspect that there is an easier way to do whatever you're trying to accomplish. The trick is not to make the regex generation code so tricky that you forget what it's actually doing.
I can only begin to guess at your task, but from your single example, it looks like you want to check that the two subtokens are in the larger token, ignoring certain characters. I'm going to guess that those sub tokens have to be in order and can't have anything else between them besides those vowel characters.
To match the tokens, I can use the \G anchor with the /g global flag in scalar context. This anchors the match to the character one after the end of the last match for the same scalar. This way allows me to have separate patterns for each sub token. This is much easier to manage since I only need to change the list of values in #subtokens.
Once you go through each of the pairs and find which ones match all the patterns, I can extract the original string from the pair.
use v5.14;
my $vowels = '[ao]*';
my #subtokens = qw(bEd mA);
# prepare the subtoken regular expressions
my #patterns = map {
my $s = join "$vowels", map quotemeta, (split( // ), '');
qr/$s/;
} #subtokens;
my #tokens = qw( baEodamA mAabaEod baEoda mAbaEoda );
my #grand_matches;
TOKEN: foreach my $token ( #tokens ) {
say "-------\nMatching $token..........";
my #matches;
PATTERN: foreach my $pattern ( #patterns ) {
say "Position is ", pos($token) // 0;
# scalar context /g and \G
next TOKEN unless $token =~ /\G($pattern)/g;
push #matches, $1;
say "Matched with $pattern";
}
push #grand_matches, [ $token, \#matches ];
}
# Now report the original
foreach my $tuple ( #grand_matches ) {
say "$tuple->[0] has both fragments: #{$tuple->[1]}";
}
Now, here's the nice thing about this structure. I've probably guessed wrong about your task. If I have, it's easy to fix without changing the setup. Let's say that the subtokens don't have to be in order. That's an easy change to the pattern I created. I just get rid of the
\G anchor and the /g flag;
next TOKEN unless $token =~ /($pattern)/;
Or, suppose that the tokens have to be in order, but other things may be between them. I can insert a .*? to match that stuff, effectively skipping over it:
next TOKEN unless $token =~ /\G.*?($pattern)/g;
It would be much nicer if I could manage all of this from the map where I create the patterns, but the /g flag isn't a pattern flag. It has to go with the operator.
I find it much easier to manage changing requirements when I don't wrap everything in a single regular expression.
Assuming the tokens need to appear in order and without anything (other than a vowel) between them:
my #tokens = ( "bEd", "mA" );
my $vowelizedword = "baEodamaA";
my $vowels = '[ao]';
my (#vowelized_sequences) = $vowelizedword =~ ( '^' . join( '', map "(" . join( $vowels, split( //, $_ ) ) . "(?:$vowels)?)", #tokens ) . '\\z' );
print for #vowelized_sequences;

How can I store regex captures in an array in Perl?

Is it possible to store all matches for a regular expression into an array?
I know I can use ($1,...,$n) = m/expr/g;, but it seems as though that can only be used if you know the number of matches you are looking for. I have tried my #array = m/expr/g;, but that doesn't seem to work.
If you're doing a global match (/g) then the regex in list context will return all of the captured matches. Simply do:
my #matches = ( $str =~ /pa(tt)ern/g )
This command for example:
perl -le '#m = ( "foo12gfd2bgbg654" =~ /(\d+)/g ); print for #m'
Gives the output:
12
2
654
Sometimes you need to get all matches globally, like PHP's preg_match_all does. If it's your case, then you can write something like:
# a dummy example
my $subject = 'Philip Fry Bender Rodriguez Turanga Leela';
my #matches;
push #matches, [$1, $2] while $subject =~ /(\w+) (\w+)/g;
use Data::Dumper;
print Dumper(\#matches);
It prints
$VAR1 = [
[
'Philip',
'Fry'
],
[
'Bender',
'Rodriguez'
],
[
'Turanga',
'Leela'
]
];
See the manual entry for perldoc perlop under "Matching in List Context":
If the /g option is not used, m// in list context returns a list consisting of the
subexpressions matched by the parentheses in the pattern, i.e., ($1 , $2 , $3 ...)
The /g modifier specifies global pattern matching--that is, matching as many times as
possible within the string. How it behaves depends on the context. In list context, it
returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
You can simply grab all the matches by assigning to an array, or otherwise performing the evaluation in list context:
my #matches = ($string =~ m/word/g);
I think this is a self-explanatory example. Note /g modifier in the first regex:
$string = "one two three four";
#res = $string =~ m/(\w+)/g;
print Dumper(#res); # #res = ("one", "two", "three", "four")
#res = $string =~ m/(\w+) (\w+)/;
print Dumper(#res); # #res = ("one", "two")
Remember, you need to make sure the lvalue is in the list context, which means you have to surround scalar values with parenthesis:
($one, $two) = $string =~ m/(\w+) (\w+)/;
Is it possible to store all matches for a regular expression into an array?
Yes, in Perl 5.25.7, the variable #{^CAPTURE} was added, which holds "the contents of the capture buffers, if any, of the last successful pattern match". This means it contains ($1, $2, ...) even if the number of capture groups is unknown.
Before Perl 5.25.7 (since 5.6.0) you could build the same array using #- and #+ as suggested by #Jaques in his answer. You would have to do something like this:
my #capture = ();
for (my $i = 1; $i < #+; $i++) {
push #capture, substr $subject, $-[$i], $+[$i] - $-[$i];
}
I am surprised this is not already mentioned here, but perl documentation provides with the standard variable #+. To quote from the documentation:
This array holds the offsets of the beginnings of the last successful submatches in the currently active dynamic scope.
So, to get the value caught in first capture, one would write:
print substr( $str, $-[1], $+[1] - $-[1] ), "\n"; # equivalent to $1
As a side note, there is also the standard variable %- which is very nifty, because it not only contains named captures, but also allows for duplicate names to be stored in an array.
Using the example provided in the documentation:
/(?<A>1)(?<B>2)(?<A>3)(?<B>4)/
would yield an hash with entries such as:
$-{A}[0] : '1'
$-{A}[1] : '3'
$-{B}[0] : '2'
$-{B}[1] : '4'
Note that if you know the number of capturing groups you need per match, you can use this simple approach, which I present as an example (of 2 capturing groups.)
Suppose you have some 'data' like
my $mess = <<'IS_YOURS';
Richard Rich
April May
Harmony Ha\rm
Winter Win
Faith Hope
William Will
Aurora Dawn
Joy
IS_YOURS
With the following regex
my $oven = qr'^(\w+)\h+(\w+)$'ma; # skip the /a modifier if using perl < 5.14
I can capture all 12 (6 pairs, not 8...Harmony escaped and Joy is missing) in the #box below.
my #box = $mess =~ m[$oven]g;
If I want to "hash out" the details of the box I could just do:
my %hash = #box;
Or I just could have just skipped the box entirely,
my %hash = $mess =~ m[$oven]g;
Note that %hash contains the following. Order is lost and dupe keys (if any had existed) are squashed:
(
'April' => 'May',
'Richard' => 'Rich',
'Winter' => 'Win',
'William' => 'Will',
'Faith' => 'Hope',
'Aurora' => 'Dawn'
);

How do I use Perl to intersperse characters between consecutive matches with a regex substitution?

The following lines of comma-separated values contains several consecutive empty fields:
$rawData =
"2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n"
I want to replace these empty fields with 'N/A' values, which is why I decided to do it via a regex substitution.
I tried this first of all:
$rawdata =~ s/,([,\n])/,N\/A/g; # RELABEL UNAVAILABLE DATA AS 'N/A'
which returned
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear\n
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,,N/A,\n
Not what I wanted. The problem occurs when more than two consecutive commas occur. The regex gobbles up two commas at a time, so it starts at the third comma rather than the second when it rescans the string.
I thought this could be something to do with lookahead vs. lookback assertions, so I tried the following regex out:
$rawdata =~ s/(?<=,)([,\n])|,([,\n])$/,N\/A$1/g; # RELABEL UNAVAILABLE DATA AS 'N/A'
which resulted in:
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,N/A,Clear\n
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,N/A,,N/A,,N/A,,N/A\n
That didn't work either. It just shifted the comma-pairings by one.
I know that washing this string through the same regex twice will do it, but that seems crude. Surely, there must be a way to get a single regex substitution to do the job. Any suggestions?
The final string should look like this:
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,N/A,Clear\n
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,N/A,,N/A,N/A,N/A,N/A,N/A\n
EDIT: Note that you could open a filehandle to the data string and let readline deal with line endings:
#!/usr/bin/perl
use strict; use warnings;
use autodie;
my $str = <<EO_DATA;
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,
EO_DATA
open my $str_h, '<', \$str;
while(my $row = <$str_h>) {
chomp $row;
print join(',',
map { length $_ ? $_ : 'N/A'} split /,/, $row, -1
), "\n";
}
Output:
E:\Home> t.pl
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,N/A,N/A,N/A
You can also use:
pos $str -= 1 while $str =~ s{,(,|\n)}{,N/A$1}g;
Explanation: When s/// finds a ,, and replaces it with ,N/A, it has already moved to the character after the last comma. So, it will miss some consecutive commas if you only use
$str =~ s{,(,|\n)}{,N/A$1}g;
Therefore, I used a loop to move pos $str back by a character after each successful substitution.
Now, as #ysth shows:
$str =~ s!,(?=[,\n])!,N/A!g;
would make the while unnecessary.
I couldn't quite make out what you were trying to do in your lookbehind example, but I suspect you are suffering from a precedence error there, and that everything after the lookbehind should be enclosed in a (?: ... ) so the | doesn't avoid doing the lookbehind.
Starting from scratch, what you are trying to do sounds pretty simple: place N/A after a comma if it is followed by another comma or a newline:
s!,(?=[,\n])!,N/A!g;
Example:
my $rawData = "2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n";
use Data::Dumper;
$Data::Dumper::Useqq = $Data::Dumper::Terse = 1;
print Dumper($rawData);
$rawData =~ s!,(?=[,\n])!,N/A!g;
print Dumper($rawData);
Output:
"2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n"
"2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear\n2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,N/A,N/A,N/A\n"
You could search for
(?<=,)(?=,|$)
and replace that with N/A.
This regex matches the (empty) space between two commas or between a comma and end of line.
The quick and dirty hack version:
my $rawData = "2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n";
while ($rawData =~ s/,,/,N\/A,/g) {};
print $rawData;
Not the fastest code, but the shortest. It should loop through at max twice.
Not a regex, but not too complicated either:
$string = join ",", map{$_ eq "" ? "N/A" : $_} split (/,/, $string,-1);
The ,-1 is needed at the end to force split to include any empty fields at the end of the string.