Print how many substitutions took place in a Perl s///g? - regex

I searched the web, found the similarly titled question "How many substitutions took place in a Perl s///g?", and tried to use it to print the number, but without success.
My initial code was
perl -0777 -i.original -pe 's-\r\n-\n-igs' test.txt
When I tried
perl -0777 -i.original -pe "$c=s-\r\n-\n-igs;say qq'$c'" test.txt
I got nothing: no output and no replacements. When I tried
perl -0777 -i.original -pe '$c=s-\r\n-\n-igs;print qq($c\n)' test.txt
(a print like the ones in other one-liners I have used before), I got an empty string on standard output, but 454847 was added at the beginning of the file (along with the proper replacements).
I understand =~ is not needed in my case (What does =~ do in Perl?), so what is wrong with my code? How do I print the number of replacements made?

Because of -i, the default output handle isn't STDOUT but the output file. To print to STDOUT, you'll need to do so explicitly.
If using the cmd shell,
perl -0777pe"CORE::say STDOUT s/\r//g" -i.original test.txt
If using sh or similar,
perl -0777pe'CORE::say STDOUT s/\r//g' -i.original test.txt
Notes:
s/// is more idiomatic than s---
/i is useless here.
/s is useless here.
There's no point in checking for Line Feeds. s/\r//g will work just as well as s/\r\n/\n/g.
There's no point in using a variable for the count, especially if using say instead of print.
One must use CORE::say instead of say for backwards compatibility reasons unless use feature qw( say ); or equivalent is used.
Neither s/\r\n/\n/g nor s/\r//g will work with a Windows build of Perl, and there's no way to do what you want when using -i on a Windows build of Perl. However, you are using a unix build of Perl (since MSYS is a unix emulation environment), so it's not an issue.

Related

Match multiline pattern in bash using Perl on macOS

On macOS, using built-in bash, I need to match two words on two consecutive lines within a file, say myKey and myValue.
Example file:
<dict>
<key>myKey</key>
<string>myValue</string>
</dict>
I already have a working command for substituting a value in such a pair using perl:
perl -i -p0e 's/(<key>myKey<\/key>\s*\n\s*<string>).+(<\/string>)/$1newValue$2/' -- "$filepath"
Question is, how do I simply find whether the file contains that key/value pair, without substituting anything, or, more to the point, just get to know, whether any substitution was made?
EDIT:
Within replacement pattern: \1 -> $1.
Added clarification to the question.
For the basic question you only need to change the substitution operator to the match operator, and print conditionally on whether it matches or not. This can also be done with substitution.
However, since this is in a bash script you can also exit from the perl program (one-liner) with a code that indicates whether there was a match/substitution; then the script can check $?.
To only check whether a pattern is in a file
perl -0777 -nE'say "yes" if /pattern/' -- "$file"
The -0777 switch, which "slurps" the whole file (into $_), is safer than -0, which uses the null byte as the record separator. Also, here you don't want -i (change file in place) and want -n (loop over records) instead of -p (which also prints each record). I use -E instead of -e to enable (all) features, for say. See all this in perlrun.
Inside a shell script you can use the truthy/falsy return of the match operator in exit
perl -0777 -nE'exit(/pattern/)' -- "$file"
# now check $? in shell
where you can now programmatically check whether the pattern was found in the file.
Finally, to run the original substitution and be able to check whether any were made
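perl -i -0777 -pe'$n += s/pattern/replacement/; END { $? = $n }' -- "$file"
# now check $? in shell
where now the exit code, so $? in the shell, is 1 if a substitution was made and 0 otherwise (with /g it would be the number of substitutions, modulo 256). The count is handed back from an END block rather than passed to exit inside the loop, because exiting inside the -p loop would end the program before the modified contents are printed back to the file.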
Keep in mind that this does abuse the basic success/failure logic of return codes.
See perlretut for a regex tutorial.

Perl & Regex within Windows CMD Line

Is there any way to accomplish matching + storing all in one cmd line? So instead of saving the matches to an array: i.e.
($matches) = $filecontents =~ m/.../g
...the matches would save to a *.txt file? I have been experimenting for a couple of days now, and believe that I am close to a solution. But a few nuances of Perl and Windows CMD Prompt are preventing me from accomplishing this task. Here's what I most recently tried:
% perl -p -i.bak -e "m/(?<=")(\d\.\d+)(?=")/g" filename.extension
I am a beginner with the CMD line, and I am running Windows 7 (soon to be switching over to Linux). Obviously I need to specify a file to which I can save my matches. The trouble is, this is where my knowledge drops off. Could someone give me a hand with this? Any comments are appreciated. Thank you!
If I understand correctly, you want to pull out all of the matches from an entire file, and write those results to a separate file.
This will work if the below results are what you're after. I don't have a Windows box to test on, but this should work (you might have to use double quotes on the outside of the one-liner and escape the ones inside, but I'm not sure).
This one-liner iterates without printing (-n) the 'file.txt' file, and prints a match combined with a newline if there is one into the 'results.txt' file via command-line redirection:
perl -ne 'print "$_\n" for m/(?<=")(\d\.\d+)(?=")/g' file.txt > results.txt
Input file:
$ cat file.txt
one
two "9.162"
three one "6.3"
five one six
Output file:
$ cat results.txt
9.162
6.3

perl regex works in script but not when executed on the command line

I'm trying to manipulate multiline CoffeeScript comments in a file using perl. This is my regex:
^\t*###[\S\s]*?^\t*###
When I run this in a script where data is the file data, it does what I expect and replaces all multiline comments with "foo":
$data =~ s{^\t*###[\S\s]*?^\t*###}{"foo"}gme;
However, when I run this on the command line the file is unchanged:
perl -pi -e 's{^\t*###[\S\s]*?^\t*###}{"foo"}gme' file.coffee
I've used similar commands with different regular expressions and without the 'm' option and they all work. Is it the m option that's causing the issue? I'm sure it's something simple.
In the implicit loops set by -n and -p it can be useful to define the values of $/ and $\. Using the -00 option puts Perl in paragraph mode and the special value 0777 puts Perl into file slurp mode.
perl -0777 -i -pe 's{^\t*###[\S\s]*?^\t*###}{"foo"}gme' file.coffee
The perl documentation for the -n/-p option states:
assume "while (<>) { ... }" loop around program
This means that each time the -e expression is executed, $_ is one line of the input file. Your s/// expression expects to operate on the entire file at once, so it won't work in this mode.

Create directory based on part of filename

First of all, I'm not a programmer — just trying to learn the basics of shell scripting and trying out some stuff.
I'm trying to create a function for my bash script that creates a directory based on a version number in the filename of a file the user has chosen in a list.
Here's the function:
lav_mappe () {
shopt -s failglob
echo "[--- Choose zip file, or x to exit ---]"
echo ""
echo ""
select zip in $SRC/*.zip
do
[[ $REPLY == x ]] && . $HJEM/build
[[ -z $zip ]] && echo "Invalid choice" && continue
echo
grep ^[0-9]{1}\.[0-9]{1,2}\.[0-9]{1,2}$ $zip; mkdir -p $MODS/out/${ver}
done
}
I've tried messing around with some other commands too:
for ver in $zip; do
grep "^[0-9]{1}\.[0-9]{1,2}\.[0-9]{1,2}$" $zip; mkdir -p $MODS/out/${ver}
done
And also find | grep — but I'm doing it wrong :(
But it ends up saying "no match" for my regex pattern.
I'm trying to take the filename the user has selected, then grep it for the version number (ALWAYS x.xx.x somewhere in the filename), and finally create a directory with just that.
Could someone give me some pointers what the command chain should look like? I'm very unsure about the structure of the function, so any help is appreciated.
EDIT:
Ok, this is what the complete function looks like now: (Please note, the sed(1) commands apart from the directory creation were not written by me, just implemented in my code.)
Pastebin (Long code.)
I've got news for you. You are writing a Bash script, you are a programmer!
Your Regular Expression (RE) is of the "wrong" type. Vanilla grep uses a form known as "Basic Regular Expressions" (BRE), but your RE is in the form of an Extended Regular Expression (ERE). BREs are used by vanilla grep, vi, more, etc. EREs are used by just about everything else: awk, Perl, Python, Java, .Net, etc. A second problem is that you are looking for that pattern in the file's contents, not in the filename!
There is an egrep command, or you can use grep -E, so:
echo $zip|grep -E '^[0-9]\.[0-9]{1,2}\.[0-9]{1,2}$'
(note that single quotes are safer than double). By the way, you use ^ at the front and $ at the end, which means the filename ONLY consists of a version number, yet you say the version number is "somewhere in the filename". You don't need the {1} quantifier, that is implied.
BUT, you don't appear to be capturing the version number either.
You could use sed (we also need the -E):
ver=$(echo "$zip" | sed -E 's/.*([0-9]\.[0-9]{1,2}\.[0-9]{1,2}).*/\1/')
The \1 on the right means "replace everything (that's why we have the .* at front and back) with what was matched in the parentheses group".
That's a bit clunky, I know.
Now we can do the mkdir (there is no merit in putting everything on one line, and it makes the code harder to maintain):
mkdir -p "$MODS/out/$ver"
${ver} is unnecessary in this case, but it is a good idea to enclose path names in double quotes in case any of the components have embedded white-space.
So, good effort for a "non-programmer", particularly in generating that RE.
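Putting the pieces above together, a minimal sketch might look like this. The filename, the temporary output directory standing in for $MODS/out, and the use of sed's -n with the p flag (so that a filename without a version yields an empty string instead of the whole name) are my assumptions, not part of the original function.

```shell
# Hypothetical selected file; only its basename is inspected.
zip="/tmp/src/mymod-1.23.4-final.zip"
name=$(basename "$zip")

# -n plus the trailing p prints only when the substitution succeeded.
ver=$(printf '%s\n' "$name" | sed -nE 's/.*([0-9]\.[0-9]{1,2}\.[0-9]{1,2}).*/\1/p')

# Stand-in for "$MODS/out" so the sketch does not touch real directories.
out=$(mktemp -d)
if [ -n "$ver" ]; then
    mkdir -p "$out/$ver"
    echo "created $out/$ver"
else
    echo "no version found in $name"
fi
```

Checking for an empty $ver before the mkdir avoids creating a directory named after the whole zip file when the pattern does not match.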
Now for Lesson 2
Be careful about using this solution in a general loop. Your question specifically uses select, so we cannot predict which files will be used. But what if we wanted to do this for every file?
Using the solution above in a for or while loop would be inefficient. Calling external processes inside a loop is always bad. There is nothing we can do about the mkdir without using a different language like Perl or Python. But sed, by its nature, is iterative, and we should use that feature.
One alternative would be to use shell pattern matching instead of sed. This particular pattern would not be impossible in the shell, but it would be difficult and raise other questions. So let's stick with sed.
A problem we have is that echo output places a space between each field. That gives us a couple of issues. sed delimits each record with a newline "\n", so echo on its own won't do here. We could replace each space with a new-line, but that would be an issue if there were spaces inside a filename. We could do some trickery with IFS and globbing, but that leads to unnecessary complications. So instead we will fall back to good old ls. Normally we would not want to use ls, shell globbing is more efficient, but here we are using the feature that it will place a new-line after each filename (when used redirected through a pipe).
while IFS= read -r ver
do
mkdir "$ver"
done < <(ls "$SRC"/*.zip | sed -E 's/.*([0-9]\.[0-9]{1,2}\.[0-9]{1,2}).*/\1/')
Here I am using process substitution, and this loop will only call ls and sed once. BUT, it calls the mkdir program n times.
Lesson 3
Sorry, but that's still inefficient. We are creating a child process for each iteration. Creating a directory needs only one kernel API call, yet we are creating a whole process just for that? Let's use a more sophisticated language like Perl:
#!/usr/bin/perl
use warnings;
use strict;
my $SRC = '.';
for my $file (glob("$SRC/*.zip"))
{
$file =~ s/.*([0-9]{1}\.[0-9]{1,2}\.[0-9]{1,2}).*/$1/;
mkdir $file or die "Unable to create $file; $!";
}
You might like to note that your RE has made it through to here! But now we have more control, and no child processes (mkdir in Perl is a built-in, as is glob).
In conclusion, for small numbers of files, the sed loop above will be fine. It is simple, and shell based. Calling Perl just for this from a script will probably be slower since perl is quite large. But shell scripts which create child processes inside loops are not scalable. Perl is.

Perl regex: replace all backslashes with double-backslashes

Within a set of large files, I need to replace all occurrences of "\" with "\\". I'd like to use Perl for this purpose. Right now, I have the following:
perl -spi.bak -e '/s/\\/\\\\/gm' inputFile
This command was suggested to me, but it results in no change to inputFile (except an updated timestamp). Thinking that the problem might be that the "\"s were not surrounded by blanks, I tried
perl -spi.bak -e '/s/.\\./\\\\/gm' inputFile
Again, this had no effect on the file. Finally, I thought I might be missing a semicolon, so I tried:
perl -spi.bak -e '/s/.\\./\\\\/gm;' inputFile
This also has no effect. I know that my file contains "\"s, for example in the following line:
("C:\WINDOWS\system32\iac25_32.ax","Indeo audio)
I'm not sure whether there is a problem with the regex, or if something is wrong with the way I'm invoking Perl. I have a basic understanding of regexes, but I'm an absolute beginner when it comes to Perl.
Is there anything obviously wrong here? One thing I notice is that the command returns quite quickly, despite the fact that inputFile is ~10MB in size.
The hard part with handling backslashes in command lines is knowing how many processes are going to manipulate the command line - and what their quoting rules are.
On Unix, under any shell, the first command line you show would work once the stray leading / before the s is removed.
You appear to be on Windows, and there, you have the DOS command 'shell' to deal with.
I would put the replacement into a file and pass that to Perl:
#!/bin/perl -spi.bak
s/\\/\\\\/g;
That should do the trick - save as 'subber.pl' and then run:
perl subber.pl file1 ...
How about this? It should replace every \ with two \s.
s/\\/\\\\/g
perl -pi -e 's/\\/\\\\/g' inputfile
will replace all of them in one file
this
s/\\/\\\\/g
works for me
You've got a renegade / in the front of the substitution flag at the beginning of the regex
don't use
.\\.
otherwise it will trash whatever's before and after the \ in the file
perl -spi.bak -e 's/\\/\\\\/gm;' inputFile
maybe?
Why did you type that leading /?
"You appear to be on Windows, and there, you have the DOS command 'shell' to deal with."
Hopefully I am not splitting hairs, but Windows hasn't come with DOS for a long time now. I think ME (circa 1999/2000) was the last version that still came with DOS or was built on DOS. The "command" shell is controlled by cmd.exe since XP (a sort of DOS simulation), but the effects of running a perl one-liner in a command shell might still be the same as running it in a DOS shell. Maybe someone can verify that.