How to pass a variable line number in sed substitute command - regex

I am trying to do a sed operation like this
sed -i '100s/abc/xyz/' filename.txt
I want the 100 to come from a variable, say $vars, in a Perl script. So, I am trying this:
system("sed -i "${vars}s/abc/xyz/" filename.txt").
This is throwing some error.
Again when I am doing like this putting system command in single quotes:
system('sed -i "${vars}s/abc/xyz/" filename.txt')
this is substituting wrongly. What can be done?

A better and safer approach is to use the LIST variant of system, because it avoids unsafe shell command-line parsing. The command, sed in your case, receives its command-line arguments unaltered, with no need to quote them.
NOTE: I added -MO=Deparse just to illustrate what the one-liner compiles to.
NOTE: I added -e to be on the safe side as you have -i on the command line which expects a parameter.
$ perl -MO=Deparse -e 'system(qw{sed -i -e}, "${vars}s/abc/xyz/", qw{filename.txt})'
system(('sed', '-i', '-e'), "${vars}s/abc/xyz/", 'filename.txt');
-e syntax OK
Of course in reality it would be easier just to do the processing in Perl itself instead of calling sed...
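For comparison, here is how the same substitution looks when it is run directly from a shell rather than through Perl's system. This is only a minimal sketch, assuming a POSIX shell with mktemp available; the names line, tmpfile and result are illustrative, and -i is left out so the result goes to standard output instead of modifying a file.

```shell
#!/bin/sh
# Hypothetical demo data: four identical lines in a temporary file.
tmpfile=$(mktemp)
printf 'abc\nabc\nabc\nabc\n' > "$tmpfile"

line=3   # the line number sed should address

# Double quotes let the shell expand ${line} before sed sees the script.
# The braces stop the variable name from running into the following "s".
result=$(sed "${line}s/abc/xyz/" "$tmpfile")

printf '%s\n' "$result"
rm -f "$tmpfile"
```

Only the third line is changed; the other lines pass through untouched.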

Shelling out to sed from within perl is a road to unnecessary pain. You're introducing additional quoting and variable expansion layers, and that's at best making your code less clear, and at worst introducing bugs accidentally.
Why not just do it in native Perl, which is considerably more effective? Perl even allows you to do in-place editing if you want.
But it's as simple as:
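open ( my $input, '<', 'filename.txt' ) or die "Cannot open filename.txt: $!";
open ( my $output, '>', 'filename.txt.new' ) or die "Cannot open filename.txt.new: $!";
select $output;    # print now writes to the new file by default
while ( <$input> ) {
    # $. is the current input line number
    if ( $. == $vars ) {
        s/abc/xyz/;
    }
    print;
}
select STDOUT;
close $output or die "Cannot close filename.txt.new: $!";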
Or if you're really keen on the in-place edit, you can look into setting $^I:
Perl in place editing within a script (rather than one liner)
But I'd suggest that simply renaming the file after you're done is just as easy.

Related

How to update path in Perl script in multiple files

I am working on creating some training material where I am using Perl. One of the things I want to do is have the scripts be set up for the student correctly, regardless of where they extract the compressed files. I am working on a Windows batch file that will copy the Perl templates to the working location and then update the path in the copies of the Perl template files to the correct location. The Perl templates have this as the first line:
#!_BASE_software/perl/bin/perl.exe
The batch file looks like this:
SET TRAINING=%~dp0
copy %TRAINING%\template\*.pl %TRAINING%work
%TRAINING%software\perl\bin\perl -pi.bak -e 's/_BASE_/%TRAINING%/g' %TRAINING%work\*.pl
I have a few problems with this:
Perl doesn't seem to like the wildcard in the filename
It turns out that %TRAINING% expands into a string with backslashes, which need to be converted into forward slashes and escaped within the regex.
How do I fix this?
First of all, Windows doesn't use the shebang line, so I'm not sure why you're doing any of this work in the first place.
Perl will read the shebang line and look for options if perl is found in the path, even on Windows, but that means that #!perl is sufficient if you want to pass options via the shebang line (e.g. #!perl -n).
Now, it's possible that you use Cygwin, MSYS or some other unix emulation instead of Windows to run the program, but you are placing a Windows path in the shebang line (C:...) rather than a unix path, so that doesn't make sense either.
There are three additional problems with the attempt:
cmd uses double-quotes for quoting.
cmd doesn't perform wildcard expansion like sh, so it's up to your program to do it.
You are trying to generate Perl code from cmd. Ouch.
If we go ahead, we get:
"%TRAINING%software\perl\bin\perl" -MFile::DosGlob=glob -pe"BEGIN { @ARGV = map glob, @ARGV; $base = $ENV{TRAINING} =~ s{\\}{/}rg } s/_BASE_/$base/g" -i.bak -- %TRAINING%work\*.pl
If we add line breaks for readability, we get the following (that cmd won't accept):
"%TRAINING%software\perl\bin\perl"
-MFile::DosGlob=glob
-pe"
BEGIN {
@ARGV = map glob, @ARGV;
$base = $ENV{TRAINING} =~ s{\\}{/}rg
}
s/_BASE_/$base/g
"
-i.bak -- %TRAINING%work\*.pl
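The two transformations inside that one-liner (turning backslashes into forward slashes, then substituting the result for _BASE_) can be tried out in isolation. The following is a rough sketch in POSIX shell rather than cmd, assuming a made-up value for %TRAINING%; the variable names base, converted, template and shebang are illustrative only.

```shell
#!/bin/sh
# Hypothetical value standing in for %TRAINING%; single quotes keep the
# backslashes literal.
base='C:\training\'

# Same effect as s{\\}{/}rg: every backslash becomes a forward slash.
converted=$(printf '%s' "$base" | tr '\\' '/')

# The template's first line, as given in the question.
template='#!_BASE_software/perl/bin/perl.exe'

# Using | as the delimiter avoids having to escape the slashes in the path.
shebang=$(printf '%s\n' "$template" | sed "s|_BASE_|$converted|g")

printf '%s\n' "$shebang"
```

Converting the slashes before the substitution is what removes the need to escape the path inside the regex.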

How to run list of perl regex from file in terminal

I'm fairly new to the whole coding game, and am very grateful for every answer!
I am working on a directory with many .txt files in them and have a file with looong list of regex like "perl -p -i -e 's/\n\n/\n/g' *.xml" they all work if I copy them to terminal. But is there a possibility to run them straight from the file?
I tried ./unicode.sh but that resulted in:
No such file or directory.
Any ideas?
Thank you so much!
Here's a (mostly) equivalent Perl script to the oneliner perl -p -i -e 's/\n\n/\n/g' *.xml (one main difference being that this has strict and warnings enabled, which is strongly recommended), which you could expand upon by putting more code to modify the current line in the body of the while loop.
#!/usr/bin/env perl
use warnings;
use strict;
if (!@ARGV) { # if no files on command line
@ARGV = glob('*.xml'); # get a default list of files
}
local $^I = ''; # enable inplace editing (like perl -i)
while (<>) { # read each line of each file into $_
s/\n\n/\n/g; # modify $_ with a regex
# more regexes here...
print; # write the line $_ back out
}
You can save this script in a file such as process.pl, and then run it with perl process.pl, or do chmod u+x process.pl and then run it via ./process.pl.
On the other hand, you really shouldn't modify XML files with regular expressions, there are lots of Perl modules to do XML processing - I wrote about that some more here. Also, in the example you showed, s/\n\n/\n/g actually won't have any effect, since when reading files line-by-line, no string will contain two \n's (you can change how Perl reads files, but I don't see any mention of that in the question).
Edit: You've named the script in your example unicode.sh - if you're processing Unicode files, then Perl has very powerful features to help with that, although the code won't necessarily end up as nice and short as what I've shown above. You'll have to tell us some more about what you're doing, and show some example input and output, to get suggestions about that. See also e.g. perlunitut.
If you got "No such file or directory", the likely problem is that you forgot to make unicode.sh executable, as in chmod +x unicode.sh, assuming that's a script that you wrote.
Of course, the normal way to run multiple Perl commands is to put them in a Perl script that you write, something like runme.pl.
That said, yes, everything will work from the terminal; you just need to be careful about the escaping that bash performs.
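To illustrate running a list of commands straight from a file, here is a small sketch. It assumes a POSIX shell with mktemp, and it swaps in two simple sed commands for the perl one-liners so that it only depends on standard tools; the file names sample.txt, step1.txt, step2.txt and commands.sh are made up for the demonstration.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
printf 'a b\n' > "$workdir/sample.txt"

# One command per line, exactly as they would be typed at the prompt.
cat > "$workdir/commands.sh" <<'EOF'
sed 's/a/x/' sample.txt > step1.txt
sed 's/b/y/' step1.txt > step2.txt
EOF

# Passing the file to sh needs neither a shebang line nor the executable bit.
# The subshell keeps the cd from affecting the caller.
(cd "$workdir" && sh commands.sh)

result=$(cat "$workdir/step2.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Running the file as sh commands.sh (or bash commands.sh) avoids chmod entirely, because the file is read as input by the shell rather than executed directly.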

Execute command from variable

As a follow-up to this question, Awk doesn't match all match all my entries, I am now trying to write a script to run this on different machines. In the script, I want to run /usr/xpg4/bin/awk if it exists, and otherwise the regular awk.
I can't do just a simple if else because my script is too complex - I want to do something user friendly and it has some options.
So I record the proper awk in a variable like this :
command='awk '"'"'match($0,/^[[:alpha:]_][[:alnum:]_]*\**[[:space:]]+[[:alpha:]_][[:alnum:]_]*[[:space:]]*\([^)]*\)/) { print substr($0,RSTART,RLENGTH) ";\n" }'"'";
after which I try to execute it
code=$($command $file);
I get this error :
awk: command line:1: 'match($0,/^[[:alpha:]_][[:alnum:]_]*\**[[:space:]]+[[:alpha:]_][[:alnum:]_]*[[:space:]]*\([^)]*\)/)
awk: command line:1: ^ bad character « ' » in expression
It doesn't mean anything if I take them off...
Roughly, don't do it like that.
AWK=/usr/xpg4/bin/awk
if [ ! -x "$AWK" ]
then AWK="awk"
fi
Then you can use:
code=$("$AWK" '…your awk script…' "$file")
Or you can put your script into a file, script.awk, and use:
code=$("$AWK" -f script.awk "$file")
We can debate the merits of the double quotes around the use of "$AWK"; there are pros and cons.
If you need different awk scripts for the different sub-species of (Solaris?) awk, then you can create different script.awk files and still use the common notation with -f script.awk to execute the scripts.
And there's no obligation to use the name script.awk; it is just illustrative. Indeed, if you create it on the fly, you should ensure it is uniquely named (e.g. by adding $$, the current process ID, into the name). Beware of security issues. I'm not sure if Solaris comes with a mktemp command to create a temporary file securely.
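Putting the pieces together, the awk selection and the on-the-fly script file could look roughly like this. It is a sketch only: it assumes mktemp is available (which, as noted, may not hold everywhere), and the awk program is a simplified stand-in for the one in the question, not the original.

```shell
#!/bin/sh
# Prefer the XPG4 awk when it exists and is executable; otherwise use
# whatever awk is found on PATH.
AWK=/usr/xpg4/bin/awk
if [ ! -x "$AWK" ]
then AWK="awk"
fi

# Create a uniquely named temporary file for the awk program.
script=$(mktemp)
cat > "$script" <<'EOF'
match($0, /[0-9]+/) { print substr($0, RSTART, RLENGTH) }
EOF

# The program lives in a file, so no shell quoting is needed around it.
code=$(printf 'abc 123 def\nno digits here\n' | "$AWK" -f "$script")

printf '%s\n' "$code"
rm -f "$script"
```

Because the awk program is never stored in a shell variable, the nested-quote problem from the question does not arise.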

Extracting group from regex in shell script using grep

I want to extract the output of a command run through shell script in a variable but I am not able to do it. I am using grep command for the same. Please help me in getting the desired output in a variable.
x=$(pwd)
pw=$(grep '\(.*\)/bin' $x)
echo "extracted is:"
echo $pw
The output of the pwd command is /root/abc/bin/ and I want only the /root/abc part of it. Thanks in advance.
Use dirname to get the path and not the last segment of the path.
You can use:
x=$(pwd)
pw=$(dirname "$x")
echo "$pw"
Or simply:
pw=$(dirname "$(pwd)")
echo "$pw"
All of what you're doing can be done in a single echo:
echo "${PWD%/*}"
The $PWD variable holds the current directory, and %/* removes the last / and everything after it.
For your case it will output: /root/abc
The second (and any subsequent) argument to grep is the name of a file to search, not a string to perform matching against.
Furthermore, grep prints the matching line or (with -o) the matching string, not whatever the parentheses captured. For that, you want a different tool.
Minimally fixing your code would be
x=$(pwd)
pw=$(printf '%s\n' "$x" | sed 's%\(.*\)/bin.*%\1%')
(If you only care about Bash, not other shells, you could do sed ... <<<"$x" without the explicit pipe; the syntax is also somewhat more satisfying.)
But of course, the shell has basic string manipulation functions built in.
pw=${x%/bin*}
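The approaches suggested in these answers can be compared side by side. This is a minimal sketch with the path hard-coded instead of taken from pwd, so that it gives the same result wherever it is run; the variable names via_sed, via_expansion and via_dirname are illustrative.

```shell
#!/bin/sh
# Hard-coded stand-in for the output of pwd.
x=/root/abc/bin

# sed: capture everything before "/bin" and print only the capture.
via_sed=$(printf '%s\n' "$x" | sed 's%\(.*\)/bin.*%\1%')

# Parameter expansion: strip the shortest suffix matching "/bin*".
via_expansion=${x%/bin*}

# dirname: drop the last path component, whatever it is called.
via_dirname=$(dirname "$x")

printf '%s\n' "$via_sed" "$via_expansion" "$via_dirname"
```

All three agree for this input. They would differ for a path that does not end in /bin, since dirname always removes the last component while the other two only act when /bin is present.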

How can I remove text at beginning of a file using a regex?

I have a bunch of files that contain a semi-standard header. That is, the look of it is very similar but the text changes somewhat.
I want to remove this header from all of the files.
From looking at the files, I know that what I want to remove is encapsulated between similar words.
So, for instance, I have:
Foo bar...some text here...
more text
Foo bar...I want to keep everything after this point
I tried this command in perl:
perl -pi -e "s/\A.*?Foo.bar*?Foo.bar//simxg" 00ws110.txt
But it doesn't work. I'm not a regex expert but hoping someone knows how to basically remove a chunk of text from the beginning of a file based on a text match and not the number of characters...
By default, ARGV (aka <> which is used behind-the-scenes by -p) only reads a single line at a time.
Workarounds:
Unset $/, which tells Perl to read a whole file at a time.
perl -pi -e "BEGIN{undef$/}s/\A.*?Foo.bar*?Foo.bar//simxg" 00ws110.txt
BEGIN is necessary to have that code run before the first read is done.
Use -0, which sets $/ = "\0".
perl -pi -0 -e "s/\A.*?Foo.bar*?Foo.bar//simxg" 00ws110.txt
Take advantage of the flip-flop operator.
perl -ni -e "print unless 1 ... /^Foo.bar/" 00ws110.txt
This will skip printing starting from line 1 to /^Foo.bar/.
If your header stretches across more than one line you must tell perl how much to read. If the files are small in comparison to memory you may want to just slurp the whole file into memory:
perl -0777pi.orig -e 's/your regex/your replace/s' file1 file2 file3
The -0777 option sets perl to slurp mode, so $_ will hold one whole file each time through the loop. Also, always remember to set the backup extension. If you don't, you may find that you have wiped out your data accidentally and have no way to get it back. See perldoc perlrun for more information.
Given information from the comments, it looks like you are trying to strip all of the annoying stuff from the front of a Project Gutenberg ebook. If you understand all of the copyright issues involved, you should be able to get rid of the front matter like this:
perl -ni.orig -e 'print unless 1 .. /^\*END/' 00ws110.txt
The Project Gutenberg header ends with
*END*THE SMALL PRINT! FOR PUBLIC DOMAIN ETEXTS*Ver.04.29.93*END*
A safer regex would take into account the *END* at the end of the line as well, but I am lazy.
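The same "skip from line 1 through the end-of-header marker" idea can be tried without touching a real file. The sketch below swaps in sed's address range for Perl's flip-flop operator; it assumes a POSIX sed and uses invented header and body lines around the real marker line quoted above.

```shell
#!/bin/sh
# Made-up sample: two header lines, the end-of-header marker, then the body.
input=$(printf '%s\n' \
  'Header line' \
  'More small print' \
  '*END*THE SMALL PRINT! FOR PUBLIC DOMAIN ETEXTS*Ver.04.29.93*END*' \
  'Chapter 1' \
  'Text')

# Delete from line 1 through the first later line that starts with "*END".
body=$(printf '%s\n' "$input" | sed '1,/^\*END/d')

printf '%s\n' "$body"
```

Like the Perl version, this streams the input line by line, so it does not need to hold the whole file in memory.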
I might be misinterpreting what you're asking for, but it looks to me as simple as:
perl -ni -e 'print unless 1..($. > 1 && /^Foo bar/)'
Here you go! This replaces the first line of the file:
use Tie::File;
tie my @array,"Tie::File","path_to_file" or die("can't tie the file");
$array[0] =~s/text_i_want_to_replace/replacement_text/gi;
untie @array;
You can operate on the array and you will see the modifications in the file. You can delete elements from the array and it will erase the corresponding lines from the file. Applying a substitution to an element will substitute text in that line.
If you want to delete the first two lines, and keep something from the third, you can do something like this :
# tie the @array before this
shift @array;
shift @array;
$array[0]=~s/foo bar\.\.\.//gi;
# untie the @array
and this will do exactly what you need!