Perl regex substitution not working with global modifier

I have code that looks like the following:
s/(["\'])(?:\\?+.)*?\1/(my $x = $&) =~ s|^(["\'])(.*src=)([\'"])\/|$1$2$3$1.\\$baseUrl.$1\/|g;$x/ge
Ignoring the last bit (and only leaving the part where the problems occur) the code becomes:
s/(["\'])(?:\\?+.)*?\1/replace-text-here/g
I have tried using both, but I still get the same problem, which is that even though I am using the g modifier, this regex only matches and replaces the first occurrence. If this is a Perl bug, I don't know, but I was using a regex that matches everything between two quotes, and also handles escaped quotes, and I was following this blog post. In my eyes, that regex should match everything between the two quotes, then replace it, then try and find another instance of this pattern, because of the g modifier.
For a bit of background information, I am not using any version declarations, and strict and warnings are turned on, yet no warnings have shown up. My script reads an entire file into a scalar (including newlines) then the regex operates directly on that scalar. It does seem to work on each line individually - just not multiple times on one line. Perl version 5.14.2, running on Cygwin 64-bit. It could be that Cygwin (or the Perl port) is messing something up, but I doubt it.
I also tried another example from that blog post, with atomic groups and possessive quantifiers replaced with equivalent code but without those features, but this problem still plagued me.
Examples:
<?php echo ($watched_dir->getExistsFlag())?"":"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?>
Should become (with the shortened regex):
<?php echo ($watched_dir->getExistsFlag())?replace-text-here:replace-text-here?>
Yet it only becomes:
<?php echo ($watched_dir->getExistsFlag())?replace-text-here:"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?>
<?php echo ($sub->getTarget() != "")?"target=\"".$sub->getTarget()."\"":""; ?>
Should become:
<?php echo ($sub->getTarget() != replace-text-here)?replace-text-here.$sub->getTarget().replace-text-here:replace-text-here; ?>
And as above, only the first occurrence is changed.
(And yes, I do realise that this will spark into some sort of - don't use regex for parsing HTML/PHP. But in this case I think that regex is more appropriate, as I am not looking for context, I am looking for a string (anything within quotes) and performing an operation on that string - which is regex.)
And just a note - these regexes are running in an eval function, and the actual regex is encoded in a single quoted string (which is why the single quotes are escaped). I will try any presented solution directly though to rule out my bad programming.
EDIT: As requested, a short script that presents the problems:
#!/usr/bin/perl -w
use strict;
my $data = "this is the first line, where nothing much happens
but on the second line \"we suddenly have some double quotes\"
and on the third line there are 'single quotes'
but the fourth line has \"double quotes\" AND 'single quotes', but also another \"double quote\"
the fifth line has the interesting one - \"double quoted string 'with embedded singles' AND \\\"escaped doubles\\\"\"
and the sixth is just to say - we need a new line at the end to simulate a properly structured file
";
my $regex = 's/(["\'])(?:\\?+.)*?\1/replaced!/g';
my $regex2 = 's/([\'"]).*?\1/replaced2!/g';
print $data."\n";
$_ = $data; # to make the regex operate on $_, as per the original script
eval($regex);
print $_."\n";
$_ = $data;
eval($regex2);
print $_; # just an example of an eval, but without the fancy possessive quantifiers
This produces the following output for me:
this is the first line, where nothing much happens
but on the second line "we suddenly have some double quotes"
and on the third line there are 'single quotes'
but the fourth line has "double quotes" AND 'single quotes', but also another "double quote"
the fifth line has the interesting one - "double quoted string 'with embedded singles' AND \"escaped doubles\""
and the sixth is just to say - we need a new line at the end to simulate a properly structured file
this is the first line, where nothing much happens
but on the second line "we suddenly have some double quotes"
and on the third line there are 'single quotes'
but the fourth line has "double quotes" AND 'single quotes', but also another "double quote"
the fifth line has the interesting one - "double quoted string 'with embedded singles' AND \"escaped doubles\replaced!
and the sixth is just to say - we need a new line at the end to simulate a properly structured file
this is the first line, where nothing much happens
but on the second line replaced2!
and on the third line there are replaced2!
but the fourth line has replaced2! AND replaced2!, but also another replaced2!
the fifth line has the interesting one - replaced2!escaped doubles\replaced2!
and the sixth is just to say - we need a new line at the end to simulate a properly structured file

Even within single-quotes, \\ gets processed as \, so this:
my $regex = 's/(["\'])(?:\\?+.)*?\1/replaced!/g';
sets $regex to this:
s/(["'])(?:\?+.)*?\1/replaced!/g
which requires each character in the quoted-string to be preceded by one or more literal question-marks (\?+). Since you don't have lots of question-marks, this effectively means that you're requiring the string to be empty, either "" or ''.
The minimal fix is to add more backslashes:
my $regex = 's/(["\'])(?:\\\\?+.)*?\\1/replaced!/g';
but you really might want to rethink your approach. Do you really need to save the whole regex-replacement command as a string and run it via eval?
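As a rough sketch of what avoiding the string eval could look like: the pattern can be compiled with qr//, where no second level of backslash processing applies, and then used in an ordinary s///g. The sample text and the variable names ($quoted, $text, $out) are made up for illustration; they are not from the question's script.

```perl
use strict;
use warnings;

# Pattern compiled with qr//: \\ here is one literal backslash for the regex,
# with no single-quoted-string layer in between to eat half of them.
my $quoted = qr/(["'])(?:\\?+.)*?\1/s;

# Hypothetical sample input with double, single, and escaped quotes.
my $text = q{a "first" and 'second' and "with \"escaped\" quotes" end};

# Copy, then substitute every quoted string in the copy.
(my $out = $text) =~ s/$quoted/replaced!/g;

print "$out\n";   # a replaced! and replaced! and replaced! end
```

If the replacement itself has to be configurable, it can be kept as a code reference and called from s///e, which still avoids evaluating a string.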

Update: this:
my $regex = 's/(["\'])(?:\\?+.)*?\1/replaced!/g';
should be:
my $regex = 's/(["\'])(?:\\\\?+.)*?\1/replaced!/g';
since those single quotes there in the assignment turn \\ into \ and you want the regex to end up with \\.
Please boil your problem down to a short script that demonstrates the problem (including input, bad output, eval and all). Taking what you do show and trying it:
use strict;
use warnings;
my $input = <<'END';
<?php echo ($watched_dir->getExistsFlag())?"":"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?>
END
(my $output = $input) =~ s/(["\'])(?:\\?+.)*?\1/replace-text-here/g;
print $input,"becomes\n",$output;
produces for me:
<?php echo ($watched_dir->getExistsFlag())?"":"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?>
becomes
<?php echo ($watched_dir->getExistsFlag())?replace-text-here:replace-text-here?>
as I would expect. What does it do for you?

Related

Perl regex - print only modified line (like sed -n 's///p')

I have a command that outputs text in the following format:
misc1=poiuyt
var1=qwerty
var2=asdfgh
var3=zxcvbn
misc2=lkjhgf
etc. I need to get the values for var1, var2, and var3 into variables in a perl script.
If I were writing a shell script, I'd do this:
OUTPUT=$(command | grep '^var')
VAR1=$(echo "${OUTPUT}" | sed -ne 's/^var1=\(.*\)$/\1/p')
VAR2=$(echo "${OUTPUT}" | sed -ne 's/^var2=\(.*\)$/\1/p')
VAR3=$(echo "${OUTPUT}" | sed -ne 's/^var3=\(.*\)$/\1/p')
That populates OUTPUT with the basic content that I want (so I don't have to run the original command multiple times), and then I can pull out each value using sed: VAR1='qwerty', etc.
I've worked with perl in the past, but I'm pretty rusty. Here's the best I've been able to come up with:
my $output = `command | grep '^var'`;
(my $var1 = $output) =~ s/\bvar1=(.*)\b/$1/m;
print $var1
This correctly matches and references the value for var1, but it also returns the unmatched lines, so $var1 equals this:
qwerty
var2=asdfgh
var3=zxcvbn
With sed I'm able to tell it to print only the modified lines. Is there a way to do something similar in perl? I can't find the equivalent of sed's p modifier in perl.
Conversely, is there a better way to extract those substrings from each line? I'm sure I could match each line and split the contents or something like that, but was trying to stick with regex since that's how I'd typically solve this outside of perl.
Appreciate any guidance. I'm sure I'm missing something relatively simple.

One way
my @values = map { /\bvar(?:1|2|3)\s*=\s*(.*)/ ? $1 : () } qx(command);
The qx operator ("backticks") returns a list of all lines of output when used in list context, here imposed by map. (In a scalar context it returns all output in a string, possibly multiline.) Then map extracts wanted values: the ternary operator in it returns the capture, or an empty list when there is no match (so filtering out such lines). Please adjust the regex as suitable.
Or one can break this up, taking all output, then filtering needed lines, then parsing them. That allows for more nuanced, staged processing. And then there are libraries for managing external commands that make more involved work much nicer.
A comment on the Perl attempt shown in the question
Since the backticks expression is assigned to a scalar, it is in scalar context and thus returns all output in one string, here multiline. Then the following regex, which replaces var1=(.*) with $1, leaves the next two lines since . does not match a newline so .* stops at the first newline character.
So you'd need to amend that regex to match all the rest so to replace it all with the capture $1. But then for other variables the pattern would have to be different. Or, could replace the input string with all three var-values, but then you'd have a string with those three values in it.
So altogether: using the substitution here (s///) isn't suitable -- just use matching, m//.
Since in list context the match operator also returns all matches another way is
my @values = qx(command) =~ /\bvar(?:1|2|3)\s*=\s*(.*)/g;
Now being bound to a regex, qx is in scalar context and so it returns a (here multiline) string, which is then matched by regex. With /g modifier the pattern keeps being matched through that string, capturing all wanted values (and nothing else). The fact that . doesn't match a newline so .* stops at the first newline character is now useful.
Again, please adjust the regex as suitable to your real problem.
Another need came up, to capture both the actual names of variables and their values. Then add capturing parens around names, and assign to a hash
my %val = map { /\b(var(?:1|2|3))\s*=\s*(.*)/ ? ($1, $2) : () } qx(command);
or
my %val = qx(command) =~ /\b(var(?:1|2|3))\s*=\s*(.*)/g;
Now the map returns a pair of var-name + value for each matching line of output from command, and a list of such pairs can be assigned to a hash. The same goes for subsequent matches (under /g) in the second case.
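A self-contained way to try the hash version, as a sketch only: the heredoc below stands in for qx(command) and holds the sample output from the question, and the pattern is anchored to line starts with /m, which is a small variation on the regex above.

```perl
use strict;
use warnings;

# Stand-in for qx(command): the sample output shown in the question.
my $output = <<'END';
misc1=poiuyt
var1=qwerty
var2=asdfgh
var3=zxcvbn
misc2=lkjhgf
END

# In list context a /g match returns all captures, here (name, value) pairs,
# which assign straight into a hash.
my %val = $output =~ /^(var[123])\s*=\s*(.*)$/mg;

print "$_ => $val{$_}\n" for sort keys %val;
```

This prints one "name => value" line for each of var1, var2 and var3, and nothing for the misc lines.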

In scalar context, s/// and s///g return whether they found a match or not. So you can use
print $s if $s =~ s///;
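Spelled out per line, that idiom behaves much like sed -n 's///p'. A minimal sketch, with a hard-coded list of lines standing in for the command output (the @lines and @printed names are only for illustration):

```perl
use strict;
use warnings;

# Hypothetical input lines; in real use these would come from the command.
my @lines = ("misc1=poiuyt\n", "var1=qwerty\n", "var2=asdfgh\n");

my @printed;
for my $line (@lines) {
    # s/// is true only if it changed the line, so only modified lines are kept.
    push @printed, $line if $line =~ s/^var1=(.*)$/$1/;
}

print @printed;   # qwerty
```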

Perl multiline regex in windows

I'm stuck with this scenario, I have this regex
*Input added here for clarity:
181221533;MG;3;1476729;<vars> <vint> <name>mtest</name> <storedPrecedure>f_sc_mtest</SP> <base>M_data</base> <dataType>I</dataType> <timeMS>17</timeMS> <ttidr>abc</ttidr> <base>S</base> <valor>0</valor> </vint> </vars>;889;6;85;112;01/01/2019;29/05/2019 17:17:48
182652972;MG;6314429;740484;<vars> <vint> <name>mtest</name> <sP>f_sc_mtest</sP> <base>sscy</base> <dataType>I</dataType> <timeMS>16</timeMS> <ttidr>abc</Idtype> <base>S</base> <valor>4</valor> </vint></vars>;-1;8;57217;57228;01/01/2019;06/06/2019 22:20:48
182652984;ModeloSP;6314429;740484;<vars> <vint> <name>tc_p_act</name> <sP>rndom_name</sP> <base>sscyo</base> <dataType>I</dataType> <timeMS>0</timeMS> <Idtype>XYZ</Idtype> <base>O</base> </vint>
</vars>;0;;0;41;01/01/2019;06/06/2019 22:31:22
182652988;ModeloSP;6314429;740484;<vars> <vint> <name>tc_p_act</name> <sP>rndom_name</sP> <base>sscyo</base> <dataType>I</dataType> <timeProcess>1</timeProcess> <Idtype>XYZ</Idtype> <base>O</base> </vint>
</vars>;0;;0;85;01/01/2019;06/06/2019 22:37:36
And I want to implement this regex in perl with multiline support because as you can see in the sample, there are line breaks in records and this regex searches for 'incomplete' lines (and the extra line) and fixes them (one record/line should end with a datetime)
this is what I'm attempting with perl:
perl.exe -0777 -i -pe "s/(?m)^(.*)(>)([\n]+)(<)(.*)([\n]+)(\s*)$/$1$2 $4$5/igs" "sample.txt"
And doesn't seem to work, I keep getting the same text file. I'm using perl inside a portable GIT installation (v5.34.0)
Is there something I'm missing?
edit: This is how the output should look like:
181221533;MG;3;1476729;<vars> <vint> <name>mtest</name> <storedPrecedure>f_sc_mtest</SP> <base>M_data</base> <dataType>I</dataType> <timeMS>17</timeMS> <ttidr>abc</ttidr> <base>S</base> <valor>0</valor> </vint> </vars>;889;6;85;112;01/01/2019;29/05/2019 17:17:48
182652972;MG;6314429;740484;<vars> <vint> <name>mtest</name> <sP>f_sc_mtest</sP> <base>sscy</base> <dataType>I</dataType> <timeMS>16</timeMS> <ttidr>abc</Idtype> <base>S</base> <valor>4</valor> </vint></vars>;-1;8;57217;57228;01/01/2019;06/06/2019 22:20:48
182652984;ModeloSP;6314429;740484;<vars> <vint> <name>tc_p_act</name> <sP>rndom_name</sP> <base>sscyo</base> <dataType>I</dataType> <timeMS>0</timeMS> <Idtype>XYZ</Idtype> <base>O</base> </vint> </vars>;0;;0;41;01/01/2019;06/06/2019 22:31:22
182652988;ModeloSP;6314429;740484;<vars> <vint> <name>tc_p_act</name> <sP>rndom_name</sP> <base>sscyo</base> <dataType>I</dataType> <timeProcess>1</timeProcess> <Idtype>XYZ</Idtype> <base>O</base> </vint> </vars>;0;;0;85;01/01/2019;06/06/2019 22:37:36

Capture the whole record and replace all newlines in it by a space, using another regex inside the replacement part (courtesy of /e modifier). Then replace all multiple newlines by a single one
perl.exe -0777 -wpe'
s{ (?:^|\R)\K (\d{9}; .*? \s+\d\d:\d\d:\d\d) }{$1 =~ s/\n+/ /r}segx; s{\n+}{\n}g
' file.txt
I consider a "record" to be: [0-9]{9}; on line/file beginning, then all up to and including a timestamp after spaces. The details for beginning and end of record should protect against accidental matching of possible unexpected patterns inside those tags.
This is cumbersome but it captures the record correctly I hope, even if some details change.
Apparently the above fails on Windows as it stands, while it is confirmed to work on Linux (the only system I can try it on right now).
The issue must be in newlines -- so try replacing \n in matches with \R or \r\n. In particular in the regex embedded in the replacement part. Or, to be safe and perhaps portable, replace \n with (\r?\n) (so the carriage return character is optional, need not be there in order to match).
So either
s{ (?:^|\R)\K (\d{9}; .*? \s+\d\d:\d\d:\d\d) }{$1 =~ s/\R+/ /r}segx; s{\R+}{\r\n}g
or
s{ (?:^|\R)\K(\d{9};.*?\s+\d\d:\d\d:\d\d) }{$1 =~ s/(\r\n)+/ /r}segx; s{(\r\n)+}{\r\n}g
But \R should match it on Windows, so you should be able to use \R for matching and \r\n when needed in replacements. See it under Misc in perlbackslash
Better yet, if it works, is to use PerlIO layers. Normally a Windows build of Perl adds the :crlf layer by default but that seems not to be the case here.
In a one-liner try:
perl.exe -0777 -Mopen=:std,IO,:crlf -wpe'...'
Or, use the "one-liner" as a normal program, without file-processing switches, and set this up via open pragma and open a file manually
perl -wE'use open IO => ":crlf"; $_ = do { local $/; <> }; s{...}{...}; say' file
With layers set like this (in either way) use the regex with \n.
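To see the effect of \R without touching a file, here is a small sketch on an in-memory string with Windows line endings. The data is a shortened, made-up version of the sample, and the two substitutions are a simplified variant (join a break that precedes '<', then squeeze breaks), not the record-based regex above.

```perl
use strict;
use warnings;

# Shortened, hypothetical records with CRLF line endings and a stray blank line.
my $str = "182652984;</vint>\r\n</vars>;0;41\r\n\r\n182652988;</vint>\r\n</vars>;0;85\r\n";

# \R matches \r\n as one unit (and a lone \n too), so this works for either style.
$str =~ s/\R(?=<)/ /g;    # join a continuation line onto its record

# Collapse any run of line breaks into a single \n.
$str =~ s/\R+/\n/g;

print $str;
```

The result has one record per line, each ending in a plain \n; write \r\n in the second replacement if the file should keep Windows line endings.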

If the issue is having newlines in the wrong place, either multiple newlines in a row, or before a <, you may get away with something simple like this:
use strict;
use warnings;
my $str = do { local $/; <DATA> };
$str =~ s/\n(?=[<\n])//g;
print $str;
__DATA__
181221533;<valor>0</valor></vars>;889;6;85;112;01/01/2019;29/05/2019 17:17:48
182652972;</vars>;-1;8;57217;57228;01/01/2019;06/06/2019 22:20:48
182652984;</vint>
</vars>;0;;0;41;01/01/2019;06/06/2019 22:31:22
182652988; </vint>
</vars>;0;;0;85;01/01/2019;06/06/2019 22:37:36
(I shortened the input to make it readable)
Output:
181221533;<valor>0</valor></vars>;889;6;85;112;01/01/2019;29/05/2019 17:17:48
182652972;</vars>;-1;8;57217;57228;01/01/2019;06/06/2019 22:20:48
182652984;</vint></vars>;0;;0;41;01/01/2019;06/06/2019 22:31:22
182652988; </vint></vars>;0;;0;85;01/01/2019;06/06/2019 22:37:36

This seems to produce the wanted output:
perl.exe -0777 -pe "s: *\n(?=</): :g;s/\n+/\n/g"
The first substitution replaces whitespace followed by a newline before </ by four spaces.
The second substitution replaces multiple newlines by a single one. You can also replace it by a transliteration: tr/\n//s, the /s "squeezes" the newlines.
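A quick check of the squeeze form, as a sketch on a made-up string: with an empty replacement list tr/// maps each character to itself, and /s then collapses runs of the same character.

```perl
use strict;
use warnings;

my $s = "a\n\n\nb\n\nc\n";

# Copy $s, then squeeze every run of newlines in the copy down to one.
(my $t = $s) =~ tr/\n//s;

print $t;   # three lines: a, b, c
```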

Edit within multi-line sed match

I have a very large file, containing the following blocks of lines throughout:
start :234
modify 123 directory1/directory2/file.txt
delete directory3/file2.txt
modify 899 directory4/file3.txt
Each block starts with the pattern "start : #" and ends with a blank line. Within the block, every line starts with "modify # " or "delete ".
I need to modify the path in each line, specifically appending a directory to the front. I would just use a general regex to cover the entire file for "modify #" or "delete ", but due to the enormous amount of other data in that file, there will likely be other matches to this somewhat vague pattern. So I need to use multi-line matching to find the entire block, and then perform edits within that block. This will likely result in >10,000 modifications in a single pass, so I'm also trying to keep the execution down to less than 30 minutes.
My current attempt is a sed one-liner:
sed '/^start :[0-9]\+$/ { :a /^[modify|delete] .*$/ { N; ba }; s/modify [0-9]\+ /&Appended_DIR\//g; s/delete /&Appended_DIR\//g }' file_to_edit
Which is intended to find the "start" line, loop while the lines either start with a "modify" or a "delete," and then apply the sed replacements.
However, when I execute this command, no changes are made, and the output is the same as the original file.
Is there an issue with the command I have formed? Would this be easier/more efficient to do in perl? Any help would be greatly appreciated, and I will clarify where I can.

I think you would be better off with perl
Specifically because you can work 'per record' by setting $/ - if your records are delimited by blank lines, setting it to \n\n.
Something like this:
#!/usr/bin/env perl
use strict;
use warnings;
local $/ = "\n\n";
while (<>) {
#multi-lines of text one at a time here.
if (m/^start :\d+/) {
s/(modify \d+)/$1 Appended_DIR\//g;
s/(delete) /$1 Appended_DIR\//g;
}
print;
}
Each iteration of the loop will pick out a blank line delimited chunk, check if it starts with a pattern, and if it does, apply some transforms.
It'll take data from STDIN via a pipe, or myscript.pl somefile.
Output is to STDOUT and you can redirect that in the normal way.
Your limiting factor on processing files in this way are typically:
Data transfer from disk
pattern complexity
The more complex a pattern, and especially if it has variable matching going on, the more backtracking the regex engine has to do, which can get expensive. Your transforms are simple, so packaging them doesn't make very much difference, and your limiting factor will be likely disk IO.
(If you want to do an in place edit, you can with this approach)
If - as noted - you can't rely on a record separator, then what you can use instead is Perl's range operator (other answers already do this, I'm just expanding it out a bit):
#!/usr/bin/env perl
use strict;
use warnings;
while (<>) {
if ( /^start :/ .. /^$/ ) {
s/(modify \d+)/$1 Appended_DIR\//g;
s/(delete) /$1 Appended_DIR\//g;
}
print;
}
We don't change $/ any more, and so it remains on its default of 'each line'. What we add though is a range operator that tests "am I currently within these two regular expressions" that's toggled true when you hit a "start" and false when you hit a blank line (assuming that's where you would want to stop?).
It applies the pattern transformation if this condition is true, and it ... ignores and carries on printing if it is not.
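The toggling can be watched on a few in-memory lines. This is only a sketch: the @lines data is invented, with one matching line before the block and one after it to show that they are left alone, and the two substitutions are folded into one for brevity.

```perl
use strict;
use warnings;

# Hypothetical input: lines 1 and 6 look like block lines but sit outside a block.
my @lines = (
    "modify 1 outside/file.txt\n",
    "start :234\n",
    "modify 123 directory1/file.txt\n",
    "delete directory3/file2.txt\n",
    "\n",
    "delete outside/other.txt\n",
);

for (@lines) {
    # True from a "start :N" line through the next empty line, inclusive.
    if ( /^start :\d+$/ .. /^$/ ) {
        s{^(modify \d+ |delete )}{${1}Appended_DIR/};
    }
    print;
}
```

Only the modify and delete lines between "start :234" and the blank line gain the Appended_DIR/ prefix.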

sed's pattern ranges are your friend here:
sed -r '/^start :[0-9]+$/,/^$/ s/^(delete |modify [0-9]+ )/&prepended_dir\//' filename
The core of this trick is /^start :[0-9]+$/,/^$/, which is to be read as a condition under which the s command that follows it is executed. The condition is true if sed currently finds itself in a range of lines of which the first matches the opening pattern ^start :[0-9]+$ and the last matches the closing pattern ^$ (an empty line). -r is for extended regex syntax (-E for old BSD seds), which makes the regex more pleasant to write.

I would also suggest using perl. Although I would try to keep it in one-liner form:
perl -i -pe 'if ( /^start :/ .. /^$/){s/(modify [0-9]+ )/$1Append_DIR\//;s/(delete )/$1Append_DIR\//; }' file_to_edit
Or you can use redirection of stdout:
perl -pe 'if ( /^start :/ .. /^$/){s/(modify [0-9]+ )/$1Append_DIR\//;s/(delete )/$1Append_DIR\//; }' file_to_edit > new_file

with gnu sed (with BRE syntax):
sed '/^start :[0-9][0-9]*$/{:a;n;/./{s/^\(modify [0-9][0-9]* \|delete \)/\1NewDir\//;ba}}' file.txt
The approach here is not to store the whole block and to proceed to the replacements. Here, when the start of the block is found the next line is loaded in pattern space, if the line is not empty, replacements are performed and the next line is loaded, etc. until the end of the block.
Note: gnu sed has the alternation feature | available, it may not be the case for some other sed versions.
a way with awk:
awk '/^start :[0-9]+$/,/^$/{if ($1=="modify"){$3="newdirMod/"$3;} else if ($1=="delete"){$2="newdirDel/"$2};}{print}' file.txt

This is very simple in Perl, and probably much faster than the sed equivalent
This one-line program inserts Appended_DIR/ after any occurrence of modify 999 or delete at the start of a line. It uses the range operator to restrict those changes to blocks of text starting with start :999 and ending with a line containing no printable characters
perl -pe"s<^(?:modify\s+\d+|delete)\s+\K><Appended_DIR/> if /^start\s+:\d+$/ .. not /\S/" file_to_edit

Good grief. sed is for simple substitutions on individual lines, that is all. Once you start using constructs other than s, g, and p (with -n) you are using the wrong tool. Just use awk:
awk '
/^start :[0-9]+$/ { inBlock=1 }
inBlock { sub(/^(modify [0-9]+|delete) /,"&Appended_DIR/") }
/^$/ { inBlock=0 }
{ print }
' file
start :234
modify 123 Appended_DIR/directory1/directory2/file.txt
delete Appended_DIR/directory3/file2.txt
modify 899 Appended_DIR/directory4/file3.txt
There are various ways you can do the above in awk but I wrote it in the above style for clarity over brevity since I assume you aren't familiar with awk but should have no trouble following that since it reuses your own sed script's regexps and replacement text.

Why do I get "-bash: syntax error near unexpected token `('" when I run my Perl one-liner?

This is driving me insane. Here's my dilemma, I have a file in which I need to make a match. Usually I use Perl and it works like a charm but in this case I am writing a shell script and for some reason it is throwing errors.
Here is what I am trying to match:
loop_loopStorage_rev='latest.integration'
I need to match loop and latest.integration.
This is my regex:
^(?!\#)(loop_.+rev).*[\'|\"](.*)[\'|\"]$
When I use this in a Perl script, $1 and $2 give me the appropriate output. If I do this:
perl -nle "print qq{$1 => $2} while /^(?!#)(loop_.+rev).+?[\'|\"](.+?)[\'|\"]$/g" non-hadoop.env
I get the error:
-bash: syntax error near unexpected token `('
I believe it has something to do with the beginning part of my regex. So my real question is would there be an easier solution using sed, egrep or awk? If so, does any one know where to begin?

Use single quotes around your arguments to prevent special processing of $, \, etc. If you need to include a single quote within, the generic solution is to use '\''. In this particular case, however, we can avoid trying to include a ' by using the equivalent \x27 in the regex pattern instead.
perl -nle'
print "$1 => $2"
while /^(?!#)(loop_.+rev).+?[\x27\"|](.+?)[\x27\"|]$/g;
' non-hadoop.env
[I added some line breaks for readability. You can actually leave them in if you want to, but you don't need to.]
Note that there are some problems with your regex pattern.
(?!\#)(loop_.+rev) is the same as (loop_.+rev) since l isn't #, so (?!\#) isn't doing whatever you think it's doing.
[\'|\"] matches ', " and |, but I think you only meant it to match ' and ". If so, you want to use [\'\"], which can be simplified to ['"].
Don't use the non-greedy modifier (? after +, *, etc). It's used for optimization, not for excluding characters. In fact, the second ? in your pattern has absolutely no effect, so it's not doing what you think it's doing.
Fixed?
perl -nle'
print "$1 => $2"
while /^(loop_.+rev).+[\x27"]([^\x27"]*)[\x27"]$/g;
' non-hadoop.env
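The fixed pattern can also be checked in a small script, with no shell quoting involved. A sketch, assuming the single sample line from the question as input; \x27 is kept so the same pattern can be pasted back into a single-quoted one-liner.

```perl
use strict;
use warnings;

# The sample line from the question.
my $line = "loop_loopStorage_rev='latest.integration'";

my ($name, $rev);
if ( $line =~ /^(loop_.+rev).+[\x27"]([^\x27"]*)[\x27"]$/ ) {
    ($name, $rev) = ($1, $2);
    print "$name => $rev\n";   # loop_loopStorage_rev => latest.integration
}
```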

Double quotes cause Bash to replace variable references like $1 and $2 with the values of these shell variables. Use single quotes around your Perl script to avoid this (or quote every dollar sign, backtick, etc in the script).
However, you cannot escape single quotes inside single quotes easily; a common workaround in Perl strings is to use the character code \x27 instead. If you need single-quoted Perl strings, use the generalized single-quoting operator q{...}.
If you need to interpolate a shell variable name, a common trick is to use "see-saw" quoting. The string 'str'"in"'g' in the shell is equal to 'string' after quote removal; you can similarly use adjacent single-quoted and double-quoted strings to build your script ... although it does tend to get rather unreadable.
perl -nle 'print "Instance -> $1\nRevision -> $2"
while /^(?!#)('"$NAME"'_.+rev).+[\x27"]([^\x27"]*)[\x27"]$/g' non-hadoop.en
(Notice that the options -nle are not part of the script; the script is the quoted argument to the -e option. In fact perl '-nle script ...' coincidentally works, but it is decidedly unidiomatic, to the point of confusing.)

I ended up figuring it out thanks to all of your help. Thanks again. Here is my final command:
perl -nle 'print "$1 $2" while /^($ENV{NAME}_.+rev).+\x27(.+)\x27/g;' $ENVFILE

Perl: Grabbing the nth and mth delimited words from each line in a file

Because of the more tedious way of adding hosts to be monitored in Nagios (it requires defining a host object, as opposed to the previous program which only required the IP and hostname), I figured it'd be best to automate this, and it'd be a great time to learn Perl, because all I know at the moment is C/C++ and Java.
The file I read from looks like this:
xxx.xxx.xxx.xxx hostname #comments. i.dont. care. about
All I want are the first 2 bunches of characters. These are obviously space delimited, but for the sake of generality, it might as well be anything. To make it more general, why not the first and third, or fourth and tenth? Surely there must be some regex action involved, but I'll leave that tag off for the moment, just in case.

The one-liner is great, if you're not writing more Perl to handle the result.
More generally though, in the context of a larger Perl program, you would either write a custom regular expression, for example:
if($line =~ m/(\S+)\s+(\S+)/) {
$ip = $1;
$hostname = $2;
}
... or you would use the split operator.
my @arr = split(/ /, $line);
$ip = $arr[0];
$hostname = $arr[1];
Either way, add logic to check for invalid input.

Let's turn this into code golf! Based on David's excellent answer, here's mine:
perl -ane 'print "@F[0,1]\n";'
Edit: A real golf submission would look more like this (shaving off five strokes):
perl -ape '$_="@F[0,1]
"'
but that's less readable for this question's purposes. :-P

Here's a general solution (if we step away from code-golfing a bit).
#!/usr/bin/perl -n
chop; # strip newline (in case next line doesn't strip it)
s/#.*//; # strip comments
next unless /\S/; # don't process line if it has nothing (left)
@fields = (split)[0,1]; # split line, and get wanted fields
print join(' ', @fields), "\n";
Normally split splits by whitespace. If that's not what you want (e.g., parsing /etc/passwd), you can pass a delimiter as a regex:
@fields = (split /:/)[0,2,4..6];
Of course, if you're parsing colon-delimited files, chances are also good that such files don't have comments and you don't have to strip them.

A simple one-liner is
perl -nae 'print "$F[0] $F[1]\n";'
you can change the delimiter with -F

David Nehme said:
perl -nae 'print "$F[0] $F[1]\n";'
which uses the -a switch. I had to look that one up:
-a turns on autosplit mode when used with a -n or -p. An implicit split
command to the @F array is done as the first thing inside the implicit
while loop produced by the -n or -p.
you learn something every day. -n causes each line to be passed to
LINE:
while (<>) {
... # your program goes here
}
And finally -e is a way to directly enter a single line of a program. You can have more than one -e. Most of this was a rip of the perlrun(1) manpage.

Since ray asked, I thought I'd rewrite my whole program without using Perl's implicitness (except the use of <ARGV>; that's hard to write out by hand). This will probably make Python people happier (braces notwithstanding :-P):
while (my $line = <ARGV>) {
chop $line;
$line =~ s/#.*//;
next unless $line =~ /\S/;
@fields = (split ' ', $line)[0,1];
print join(' ', @fields), "\n";
}
Is there anything I missed? Hopefully not. The ARGV filehandle is special. It causes each named file on the command line to be read, unless none are specified, in which case it reads standard input.
Edit: Oh, I forgot. split ' ' is magical too, unlike split / /. The latter just matches a space. The former matches any amount of any whitespace. This magical behaviour is used by default if no pattern is specified for split. (Some would say, but what about /\s+/? ' ' and /\s+/ are similar, except for how whitespace at the beginning of a line is treated. So ' ' really is magical.)
The moral of the story is, Perl is great if you like lots of magical behaviour. If you don't have a bar of it, use Python. :-P
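The three split variants can be compared side by side. A small sketch on a made-up line with leading and repeated spaces (the address and hostname are placeholders):

```perl
use strict;
use warnings;

# Hypothetical line: two leading spaces, three spaces between the first two fields.
my $line = "  10.0.0.1   host1 #comment";

my @magic = split ' ',   $line;   # leading whitespace skipped, runs collapsed
my @space = split / /,   $line;   # every single space is a separator
my @ws    = split /\s+/, $line;   # runs collapsed, but a leading empty field remains

printf "%-6s %d fields, first is '%s'\n", @$_
    for [ "' '",   scalar @magic, $magic[0] ],
        [ "/ /",   scalar @space, $space[0] ],
        [ '/\s+/', scalar @ws,    $ws[0] ];
```

Here split ' ' gives 3 fields starting with the address, split / / gives 7 (with empty strings for the extra spaces), and split /\s+/ gives 4 because of the empty leading field.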

To Find Nth to Mth Character In Line No. L --- Example For Finding Label
@echo off
REM Next line = Set command value to a file OR Just Choose Your File By Skipping The Line
vol E: > %temp%\justtmp.txt
REM Vol E: = Find Volume Lable Of Drive E
REM Next Line to choose line line no. +0 = line no. 1
for /f "usebackq delims=" %%a in (`more +0 %temp%\justtmp.txt`) DO (set findstringline=%%a& goto :nextstep)
:nextstep
REM Next line reads a substring: ~22,40 skips the first 22 characters and takes up to the next 40
set result=%findstringline:~22,40%
echo %result%
pause
exit /b
Save as find label.cmd
The Result Will Be Your Drive E Label
Enjoy
