Match multiline pattern in bash using Perl on macOS

On macOS, using built-in bash, I need to match two words on two consecutive lines within a file, say myKey and myValue.
Example file:
<dict>
<key>myKey</key>
<string>myValue</string>
</dict>
I already have a working command for substituting a value in such a pair using perl:
perl -i -p0e 's/(<key>myKey<\/key>\s*\n\s*<string>).+(<\/string>)/$1newValue$2/' -- "$filepath"
The question is: how do I simply find out whether the file contains that key/value pair without substituting anything, or, more to the point, how do I find out whether any substitution was made?
EDIT:
Within replacement pattern: \1 -> $1.
Added clarification to the question.

For the basic question, you only need to change the substitution operator to the match operator and print conditionally on whether it matches. The same works with the substitution operator itself, since s/// also returns whether it made a replacement.
However, since this is in a bash script you can also exit from the perl program (one-liner) with a code that indicates whether there was a match/substitution; then the script can check $?.
To only check whether a pattern is in a file
perl -0777 -nE'say "yes" if /pattern/' -- "$file"
The -0777 switch, which "slurps" the whole file into $_, is safer than -0, which uses the null byte as the record separator. Also, here you don't want -i (change the file in place), and you want -n (loop over records) instead of -p (which also prints each record). I use -E instead of -e to enable all optional features, such as say. See perlrun for all of this.
Inside a shell script you can use the truthy/falsy return of the match operator in exit
perl -0777 -nE'exit(/pattern/)' -- "$file"
# now check $? in shell
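where you can now programmatically check whether the pattern was found in the file: exit(/pattern/) exits with 1 if the pattern matched and with 0 if it did not, since a failed match returns an empty (false) value.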
Finally, to run the original substitution and be able to check whether any were made
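perl -i -0777 -pe'$n = s/pattern/replacement/; END { $? = $n ? 1 : 0 }' -- "$file"
# now check $? in shell
where now the exit code, so $? in the shell, is the number of substitutions made (0 or 1, since without /g at most one is made on the slurped file). Calling exit directly inside the -p loop, as in exit(s/pattern/replacement/), would skip the implicit print, so the modified text would never be written back to the file; setting $? in an END block lets the loop finish, and the file be written, first.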
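Keep in mind that this inverts the basic success/failure logic of return codes, where 0 means success: here a nonzero status means that the pattern was found, or that a substitution was made.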
See perlretut for a regex tutorial.

Related

how to change pattern in file's line

I have a file with one line:
22:50133-MM:MM1,52-MM:MM2;23:254940-MM:MM1,63-MM:MM2;24:15574-MM:MM1,65-MM:MM2;
I need to find the part 24:15574-MM in this line and then replace the number 15574 with another one. The number can be of any length.
I want to use bash for it, but I have no idea how to do it.
How can I do it? Please help.
Since you want to use bash for it, here is an attempt using only its native features: the regex match operator =~ (supported from bash 3.0 onwards).
Assuming your file has only a single line in it, you can do the following steps.
The commands below can be run directly on the command line, or
wrapped up in a shell script with the bash shebang (#!/bin/bash).
First, read the file contents into a variable for the regex match using $(<file), which stores the entire file contents in the variable:
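fileContent=$(<file)
# Capture the digits between "24:" and "-MM"
if [[ $fileContent =~ 24:([[:digit:]]+)-MM ]]; then
    replacement="${BASH_REMATCH[1]}"
    replaceValue=5555
    # Replace the whole "24:<number>-MM" token, so the same digits elsewhere in the line are left alone
    printf "%s\n" "${fileContent/24:$replacement-MM/24:$replaceValue-MM}"
fi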
For your input file, the commands produce the following result:
22:50133-MM:MM1,52-MM:MM2;23:254940-MM:MM1,63-MM:MM2;24:5555-MM:MM1,65-MM:MM2;
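The steps above print the result but do not write it back to the file. Here is a sketch that wraps them in a function and rewrites the file, still assuming a single-line file and a purely numeric key. replace_field_number and its arguments are hypothetical names. It also requires the key to start the line or follow a ";", so that a key of 4 could not match inside 24:.

```shell
#!/usr/bin/env bash
# replace_field_number FILE KEY NEW  (hypothetical helper)
# Rewrites "KEY:<digits>-MM" to "KEY:NEW-MM" in a single-line FILE using only
# bash's =~ operator and parameter expansion. Returns 1 if KEY is not found.
replace_field_number() {
    local file=$1 key=$2 new=$3 content old repl
    content=$(<"$file")
    # KEY must start the line or follow a ";" so that key 4 cannot match "24:"
    local re="(^|;)${key}:([0-9]+)-MM"
    [[ $content =~ $re ]] || return 1
    old=${BASH_REMATCH[0]}                       # e.g. ";24:15574-MM"
    repl="${BASH_REMATCH[1]}${key}:${new}-MM"    # e.g. ";24:5555-MM"
    content=${content/"$old"/$repl}              # quoted pattern is matched literally
    printf '%s\n' "$content" > "$file"
}
```

Usage: replace_field_number file 24 5555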
This can easily be done with sed and its -i option:
new_number=11111
sed -i "s/24:15574-MM/24:$new_number-MM/" /tmp/test.txt
/tmp/test.txt - replace with your actual file path
new_number - a variable holding the replacement number
The command above only matches the literal number 15574. To replace whatever number follows 24: (the question says it can be of any length), use a regular expression with the -E option (extended regular expressions) enabled:
sed -i -E "s/24:[0-9]+-MM/24:$new_number-MM/" /tmp/test.txt
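GNU sed and the BSD sed shipped with macOS disagree on -i: BSD sed requires a backup-suffix argument (sed -i '' ...), a form GNU sed does not accept. A portable sketch sidesteps -i by writing to a temporary file. set_number is a hypothetical name, and like the sed command above it replaces only the first 24:<number>-MM on each line.

```shell
#!/usr/bin/env bash
# set_number FILE NEW_NUMBER  (hypothetical helper)
# Portable alternative to sed -i: works with GNU sed and BSD/macOS sed alike.
set_number() {
    local file=$1 new_number=$2 tmp
    tmp=$(mktemp) || return 1
    # [0-9]+ matches a number of any length, not just 15574
    sed -E "s/24:[0-9]+-MM/24:${new_number}-MM/" "$file" > "$tmp" &&
        cat "$tmp" > "$file"    # overwrite the original, keeping its permissions
    local status=$?
    rm -f "$tmp"
    return "$status"
}
```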

difference between 'i' and 'I' in sed

I thought i and I both mean ignorecase in sed, e.g.
$ echo "abcABC"|sed -e 's/a/j/gi'
jbcjBC
$ echo "abcABC"|sed -e 's/a/j/gI'
jbcjBC
However, it looks like this only works for substitution:
$ echo "abcABC"|sed -e '/a/id' # <--
d
abcABC
$ echo "abcABC"|sed -e '/a/Id'
$
It's really confusing.
Where can I find full reference of the meaning of regular expression for sed?
i and I are indeed flags to the s command; they are not generally applicable to all uses of regular expressions in sed. The GNU man page is oddly silent on which flags s accepts (or even the fact that s accepts flags), so you'll have to look in the info page (run info sed).
Other uses of regular expressions are governed by the function in which they are used.
In your other examples, i is the actual sed insert function, applied to lines that match the regular expression a: /a/id inserts the text "d" before each such line. I, on the other hand, is not a function at all. In GNU sed, an I written right after an address regex is a modifier that makes that address match case-insensitively, so /a/Id applies d (delete) to every line containing a or A, which is why the output disappears.
The FreeBSD sed man page, in the section describing the flags of the s (substitute) command, says only:
i or I    Match the regular expression in a case-insensitive way
Thus, the following are identical:
s/a/j/gi
s/a/j/gI
But that's only using i as a modifier to the s command. In your second example, you're using i as a command. The man page in this case states:
[1addr]i\
text Write text to the standard output.
and at least in FreeBSD's sed, there is no I (capital-I) command. So your sed script /a/id would (1) match lines containing an a and, if found, (2) print the text "d", which is what you saw.
And since I is not a command, I would have expected an error, but my results match yours -- /a/Id appears to eliminate output.
Note that options, commands, and the completeness of the documentation may differ depending on the variant of sed you are using.
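To see the behaviors side by side, here is a small sketch. The first two commands assume GNU sed; the third uses only a POSIX bracket expression, so it should work in any sed.

```shell
#!/usr/bin/env bash
# Case-insensitive matching in sed, three ways.

# 1. I (or i) as a flag of the s command (GNU sed)
echo "abcABC" | sed 's/a/j/gI'                   # prints jbcjBC

# 2. I as a modifier on an address regex, combined with d (GNU sed)
printf '%s\n' abc ABC xyz | sed '/a/Id'          # prints xyz

# 3. Portable address: list both cases in a bracket expression
printf '%s\n' abc ABC xyz | sed '/[aA]/d'        # prints xyz
```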

Replace more than 150000 character with sed

I want to replace this LONG string with sed
I got the string from grep and stored it in the variable var.
Here is my grep command and my var variable:
var=$(grep -P -o "[^:]//.{0,}" /home/lazuardi/project/assets/static/admin/bootstrap3/css/bootstrap.css.map | grep -P -o "//.{0,}")
Here is the output from grep: string
Then I try to replace it with a sed command:
sed -i "s|$var||g" /home/lazuardi/project/assets/static/admin/bootstrap3/css/bootstrap.css.map
But it gives me the error bash: /bin/sed: Argument list too long
How can I replace it?
NB: that string is 183544 characters long, all on one line.
What are you actually trying to accomplish here? sed is line-oriented, so you cannot replace a multi-line string (not even if you replace literal newlines with \n. Well, there are ways to write a sed script which effectively replaces a sequence of lines, but it gets tortured quickly.)
bash$ var=$(head -n 2 /etc/mtab)
bash$ sed "s|$var||" /etc/mtab
sed: -e expression #1, char 25: unterminated `s' command
bash$ sed "s|${var//$'\n'/\\n}||" /etc/mtab | diff -u /etc/mtab -
bash$ # (didn't replace anything, so no output)
As a workaround, what you probably want could be approached by replacing the newlines in $var with \| (or possibly just |, depending on your sed dialect) similarly to what was demonstrated above, but you'd still be bumping into the ARG_MAX limit and have a bunch of other pesky wrinkles to iron out, so let's not go there.
However, what you are attempting can be magnificently completed by sed itself, all on its own. You don't need a list of the strings; after all, sed too can handle regular expressions (and nothing in the regex you are using actually requires Perl extensions, so the -P option is by and large superfluous).
sed -i 's%\([^:]\)//.*%\1%' file
There is a minor caveat -- if there are strings which occur both with and without : in front, your original command would have replaced them all (if it had worked), whereas this one will only replace the occurrences which do not have a colon in front. That means comments at beginning of line will not be touched -- if you want them removed too, just add a line anchor as an alternative; sed -i 's%\(^\|[^:]\)//.*%\1%' file
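A quick way to see the colon exception in action is to run the anchored variant on a few sample lines. This sketch wraps it in a hypothetical function that writes to standard output instead of using -i, and, like the command itself, relies on GNU sed's \| alternation in a basic regular expression.

```shell
#!/usr/bin/env bash
# strip_line_comments [FILE...]  (hypothetical wrapper, GNU sed)
# Removes "//..." to end of line unless the // directly follows a ":",
# so URLs such as http://... are left intact.
strip_line_comments() {
    sed 's%\(^\|[^:]\)//.*%\1%' "$@"
}
```

For example, "a { color: red; } // note" becomes "a { color: red; } " (with the trailing space kept), a line consisting only of a comment becomes empty, and "url(http://x/y.png)" is untouched.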
If you want the comments in var for other reasons, the grep can be cleaned up significantly, too. (Obviously, you'd run this before performing the replacement.)
var=$(grep -P -o '[^:]\K//.*' file)
(The \K extension is one which genuinely requires -P. And of course, the common, clear, standard, readable, portable, obvious, simple way to write {0,} is *.)
On most systems these days, the value of ARG_MAX is big enough to handle 150k without problems, but it is important to note that while the limit is called ARG_MAX and the error message indicates that the command line is too long, the real limit is the sum of the sizes of the arguments and all (exported) environment variables. Also, Linux imposes a limit of 128k (131,072 bytes) on a single argument string, and your sed expression, at more than 183,000 characters, exceeds it. Exceeding any of these limits triggers an error return of E2BIG, which is printed as "Argument list too long".
In any case, bash built-ins are exempt from the limit, so you should be able to feed the command into sed as a command file:
echo "s|$var||g" | sed -f - -i /home/lazuardi/project/assets/static/admin/bootstrap3/css/bootstrap.css.map
That may not help you much, though. Your variable is full of regex metacharacters, so it will not match the string itself. You'll need to escape them before you can use the value as a regular expression.
There's probably a cleaner way to do that edit, though.
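One way to do the escaping, sketched under two assumptions: the text to remove is a single line, and the sed in use treats a backslash before an ordinary character as that literal character (GNU sed does). delete_literal is a hypothetical helper that backslash-escapes the BRE metacharacters \ . * ^ $ [ plus the / delimiter, then hands the resulting script to sed through a pipe with -f /dev/stdin, so the long string never becomes a command-line argument.

```shell
#!/usr/bin/env bash
# delete_literal STRING FILE  (hypothetical helper; prints the result to stdout)
# Removes every occurrence of the single-line STRING from FILE, treating it
# as plain text rather than as a regular expression.
delete_literal() {
    local literal=$1 file=$2 escaped
    # Escape \ . * ^ $ [ and the / delimiter used below.
    # A bare ] is already literal once every [ is escaped.
    escaped=$(printf '%s\n' "$literal" | sed 's|[\.*^$/[]|\\&|g')
    # printf is a bash builtin, so the (possibly huge) script is never passed
    # through execve() and ARG_MAX does not apply; sed reads it from the pipe.
    printf 's/%s//g\n' "$escaped" | sed -f /dev/stdin "$file"
}
```

Redirect the output to a new file (or to a temporary file that you then copy back) to make the change permanent.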

Sed dynamic backreference replacement

I am trying to use sed for transforming wikitext into latex code. I am almost done, but I would like to automate the generation of the labels of the figures like this:
[[Image(mypicture.png)]]
... into:
\includegraphics{mypicture.png}\label{img-1}
I would like to keep using sed for this. The regex and bash code I am currently using is the following:
__tex_includegraphics="\\\\includegraphics[width=0.95\\\\textwidth]{$__images_dir\/"
__tex_figure_pre="\\\\begin{figure}[H]\\\\centering$__tex_includegraphics"
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"\
... but I cannot make that counter to be increased. Any ideas?
From a more general perspective, my question is the following: can I use a backreference in sed to create a replacement that is different for each match? That is, each time sed matches the pattern, can I use \1 as the input of a function and use the result of that function as the replacement?
I know it is a tricky question and I might have to use AWK for this. However, if somebody has a solution, I would appreciate his or her help.
This might work for you (GNU sed):
sed -r ':a;/'"$PATTERN"'/{x;/./s/.*/echo $((&+1))/e;/./!s/^/1/;x;G;s/'"$PATTERN"'(.*)\n(.*)/'"$PRE"'\2'"$POST"'\1/;ba}' file
This looks for a PATTERN contained in a shell variable and, if it is not present, prints the current line. If the pattern is present, it increments (or primes) the counter in the hold space and appends that counter to the current line. The pattern is then replaced using the shell variables PRE and POST and the counter. Lastly, the current line is checked for further occurrences of the pattern, and the procedure is repeated if necessary.
You could read the file line-by-line using shell features, and use a separate sed command for each line. Something like
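__images_counter=1
while IFS= read -r line; do
    # Rebuild the suffix on every pass so it picks up the current counter value
    __tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
    printf '%s\n' "$line" | sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"
    # Only count lines that actually contained an image
    [[ $line == *'[[Image('* ]] && __images_counter=$((__images_counter + 1))
done < input_file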
(This won't work if there are multiple matches in a line, though.)
For the second part, my best idea is to run sed or grep to find what is being matched, and then run sed again with the value of the function of the matched text substituted into the command.
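For the general question (a different replacement for each match), bash's own =~ operator and BASH_REMATCH can stand in for sed, because the shell can run arbitrary code between matches. A minimal sketch under these assumptions: labels are numbered in order of appearance across the whole input, the rest of each line is kept, and number_images is a hypothetical name. Unlike the loop above, it handles several images on one line.

```shell
#!/usr/bin/env bash
# number_images < input  (hypothetical helper)
# Rewrites each [[Image(file)]] as \includegraphics{file}\label{img-N},
# with N counting every match in the input, including several per line.
number_images() {
    local re='\[\[Image\(([^)]*)\)]]'
    local n=0 line out match
    while IFS= read -r line || [[ -n $line ]]; do
        out=
        # Consume the line match by match, left to right
        while [[ $line =~ $re ]]; do
            match=${BASH_REMATCH[0]}
            n=$((n + 1))
            # Text before the match, then the numbered replacement
            out+="${line%%"$match"*}\\includegraphics{${BASH_REMATCH[1]}}\\label{img-$n}"
            line=${line#*"$match"}    # continue after the match
        done
        printf '%s\n' "$out$line"
    done
}
```

Usage: number_images < page.wiki > page.tex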

Create directory based on part of filename

First of all, I'm not a programmer — just trying to learn the basics of shell scripting and trying out some stuff.
I'm trying to create a function for my bash script that creates a directory based on a version number in the filename of a file the user has chosen in a list.
Here's the function:
lav_mappe () {
shopt -s failglob
echo "[--- Choose zip file, or x to exit ---]"
echo ""
echo ""
select zip in $SRC/*.zip
do
[[ $REPLY == x ]] && . $HJEM/build
[[ -z $zip ]] && echo "Invalid choice" && continue
echo
grep ^[0-9]{1}\.[0-9]{1,2}\.[0-9]{1,2}$ $zip; mkdir -p $MODS/out/${ver}
done
}
I've tried messing around with some other commands too:
for ver in $zip; do
grep "^[0-9]{1}\.[0-9]{1,2}\.[0-9]{1,2}$" $zip; mkdir -p $MODS/out/${ver}
done
And also find | grep — but I'm doing it wrong :(
But it ends up saying "no match" for my regex pattern.
I'm trying to take the filename the user has selected, grep it for the version number (ALWAYS x.xx.x somewhere in the filename), and finally create a directory with just that.
Could someone give me some pointers what the command chain should look like? I'm very unsure about the structure of the function, so any help is appreciated.
EDIT:
OK, this is what the complete function looks like now (please note that the sed(1) commands other than the directory creation were not written by me; I just implemented them in my code):
Pastebin (Long code.)
I've got news for you. You are writing a Bash script, you are a programmer!
Your Regular Expression (RE) is of the "wrong" type. Vanilla grep uses a form known as "Basic Regular Expressions" (BRE), but your RE is written as an Extended Regular Expression (ERE). BREs are used by vanilla grep, vi, more, etc. EREs (or close relatives of them) are used by just about everything else: awk, Perl, Python, Java, .Net, etc. The bigger problem, though, is that you are looking for that pattern in the file's contents, not in the filename!
There is an egrep command, or you can use grep -E, so:
echo "$zip" | grep -E '^[0-9]\.[0-9]{1,2}\.[0-9]{1,2}$'
(note that single quotes are safer than double). By the way, you use ^ at the front and $ at the end, which means the filename ONLY consists of a version number, yet you say the version number is "somewhere in the filename". You don't need the {1} quantifier, that is implied.
BUT, you don't appear to be capturing the version number either.
You could use sed (we also need the -E):
ver=$(echo "$zip" | sed -E 's/.*([0-9]\.[0-9]{1,2}\.[0-9]{1,2}).*/\1/')
The \1 on the right means "replace everything (that's why we have the .* at front and back) with what was matched in the parentheses group".
That's a bit clunky, I know.
Now we can do the mkdir (there is no merit in putting everything on one line, and it makes the code harder to maintain):
mkdir -p "$MODS/out/$ver"
${ver} is unnecessary in this case, but it is a good idea to enclose path names in double quotes in case any of the components have embedded white-space.
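If you'd rather not start a sed process just to extract the version, bash's own =~ operator (bash 3.0 and later) can capture it into BASH_REMATCH. A minimal sketch; make_version_dir is a hypothetical name, and the directory part of the path is stripped first so that digits in the path cannot match.

```shell
#!/usr/bin/env bash
# make_version_dir ZIPFILE OUTDIR  (hypothetical helper)
# Creates OUTDIR/<x.xx.x>, taking the version from ZIPFILE's name.
make_version_dir() {
    local zip=$1 outdir=$2
    # Not anchored: the version may appear anywhere in the file name
    local re='([0-9]\.[0-9]{1,2}\.[0-9]{1,2})'
    if [[ ${zip##*/} =~ $re ]]; then
        mkdir -p "$outdir/${BASH_REMATCH[1]}"
    else
        printf 'No version number in %s\n' "$zip" >&2
        return 1
    fi
}
```

Inside the select loop this would be called as make_version_dir "$zip" "$MODS/out".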
So, good effort for a "non-programmer", particularly in generating that RE.
Now for Lesson 2
Be careful about using this solution in a general loop. Your question specifically uses select, so we cannot predict which files will be used. But what if we wanted to do this for every file?
Using the solution above in a for or while loop would be inefficient, because calling an external process on every iteration is expensive. There is nothing we can do about the mkdir without using a different language like Perl or Python. But sed is iterative by its nature, and we should use that feature.
One alternative would be to use shell pattern matching instead of sed. This particular pattern would not be impossible in the shell, but it would be difficult and raise other questions. So let's stick with sed.
A problem we have is that echo output places a space between each field. That gives us a couple of issues. sed delimits each record with a newline "\n", so echo on its own won't do here. We could replace each space with a new-line, but that would be an issue if there were spaces inside a filename. We could do some trickery with IFS and globbing, but that leads to unnecessary complications. So instead we will fall back to good old ls. Normally we would not want to use ls, shell globbing is more efficient, but here we are using the feature that it will place a new-line after each filename (when used redirected through a pipe).
while IFS= read -r ver
do
    mkdir -p "$ver"
done < <(ls "$SRC"/*.zip | sed -nE 's/.*([0-9]{1}\.[0-9]{1,2}\.[0-9]{1,2}).*/\1/p')
Here I am using process substitution, and this loop will only call ls and sed once. BUT, it calls the mkdir program n times.
Lesson 3
Sorry, but that's still inefficient. We are creating a child process on each iteration, yet creating a directory needs only one kernel API call. Why start a whole process just for that? Let's use a more sophisticated language like Perl:
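#!/usr/bin/perl
use warnings;
use strict;
my $SRC = '.';
for my $file (glob("$SRC/*.zip"))
{
    # Skip zip files whose names contain no version number
    next unless $file =~ /.*([0-9]{1}\.[0-9]{1,2}\.[0-9]{1,2})/;
    my $ver = $1;
    next if -d $ver;    # already created, e.g. by another zip with the same version
    mkdir $ver or die "Unable to create $ver: $!";
}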
You might like to note that your RE has made it through to here! But now we have more control, and no child processes (mkdir in Perl is a built-in, as is glob).
In conclusion, for small numbers of files, the sed loop above will be fine. It is simple, and shell based. Calling Perl just for this from a script will probably be slower since perl is quite large. But shell scripts which create child processes inside loops are not scalable. Perl is.
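A middle ground that stays in bash is to collect all the version directories first and hand them to a single mkdir -p call, so only one child process is created no matter how many zip files there are. A sketch, with make_all_version_dirs as a hypothetical name; it uses only features available in bash 3.2, so it should also run with the bash shipped on macOS.

```shell
#!/usr/bin/env bash
# make_all_version_dirs SRC OUTDIR  (hypothetical helper)
# Creates OUTDIR/<x.xx.x> for every SRC/*.zip whose name contains a version.
make_all_version_dirs() {
    local src=$1 outdir=$2 zip
    local re='([0-9]\.[0-9]{1,2}\.[0-9]{1,2})'
    local -a targets=()
    for zip in "$src"/*.zip; do
        [[ -e $zip ]] || continue    # no .zip files: the glob stayed unexpanded
        [[ ${zip##*/} =~ $re ]] && targets+=("$outdir/${BASH_REMATCH[1]}")
    done
    # One mkdir process for all directories; -p tolerates duplicates
    # and directories that already exist
    if (( ${#targets[@]} > 0 )); then
        mkdir -p -- "${targets[@]}"
    fi
}
```

Usage: make_all_version_dirs "$SRC" "$MODS/out"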