difference between 'i' and 'I' in sed - regex

I thought i and I both mean ignorecase in sed, e.g.
$ echo "abcABC"|sed -e 's/a/j/gi'
jbcjBC
$ echo "abcABC"|sed -e 's/a/j/gI'
jbcjBC
However, looks like it's only for substitution:
$ echo "abcABC"|sed -e '/a/id' # <--
d
abcABC
$ echo "abcABC"|sed -e '/a/Id'
$
It's really confusing.
Where can I find full reference of the meaning of regular expression for sed?

i and I are indeed flags to the s command; they are not generally applicable to all uses of regular expressions in sed. The GNU man page is oddly silent on which flags s accepts (or even the fact that s accepts flags), so you'll have to look in the info page (run info sed).
Other uses of regular expressions are governed by the function in which they are used.
In your other examples, i and I are the actual sed functions applied to lines that match the regular expression a; i means to insert text. As far as I can tell, I is an unrecognized function and so ignored, leaving d as the function, deleting the line. (My interpretation of I may be wrong.)

The sed man page in FreeBSD in the section describing options to the s (substitute) command, says only:
i or I Match the regular expression in a case-insensitive
way
Thus, the following are identical:
s/a/j/gi
s/a/j/gI
But that's only using i as a modifier to the s command. In your second example, you're using i as a command. The man page in this case states:
[1addr]i\
text Write text to the standard output.
and at least in FreeBSD's sed, there is no I (capital-I) command. So your sed script /a/id would (1) match lines containing an a, and if found (2) print the text "d". Which is what you saw.
And since I is not a command, I would have expected an error, but my results match yours -- /a/Id appears to eliminate output.
Note that commands, commands, and completeness of documentation may differ depending on the variant of sed you are using.

Related

sed is failing when regex pattern is a group

My sed command works when I try to replace a single pattern.
sed -i '' 's/^\(feature\)/replacement/g' change.md
It doesn’t do anything when the regex is a group.
sed -i '' 's/^\(feature\|bug\|security\)/replacement/g' change.md
Does anyone know what I'm doing wrong? I tried a number of variations, but none seem to work.
My file:
## Features
feature:foo
feature: baz
## Bugs
bug: bar
bug: KK
On macos sed refers to re_format(7) man page to describe regular expressions. Without -E, sed uses "basic regular expressions". The re_format man page calls these "obsolete REs".
This sentence seems to answer the question:
Obsolete (“basic”) regular expressions differ in several respects. ‘|’ is an ordinary character and there is no equivalent for its functionality.
(emphasis mine)
It appears sed -E 's/^(feature|bug|security)/replacement/g' is your easiest option.
Or perl -i -pe 's/^(feature|bug|security)/replacement/g' file

Match multiline pattern in bash using Perl on macOS

On macOS, using built-in bash, I need to match two words on two consecutive lines within a file, say myKey and myValue.
Example file:
<dict>
<key>myKey</key>
<string>myValue</string>
</dict>
I already have a working command for substituting a value in such a pair using perl:
perl -i -p0e 's/(<key>myKey<\/key>\s*\n\s*<string>).+(<\/string>)/$1newValue$2/' -- "$filepath"
Question is, how do I simply find whether the file contains that key/value pair, without substituting anything, or, more to the point, just get to know, whether any substitution was made?
EDIT:
Within replacement pattern: \1 -> $1.
Added clarification to the question.
For the basic question you only need to change the substitution operator to the match operator, and print conditionally on whether it matches or not. This can also be done with substitution.
However, since this is in a bash script you can also exit from the perl program (one-liner) with a code that indicates whether there was a match/substitution; then the script can check $?.
To only check whether a pattern is in a file
perl -0777 -nE'say "yes" if /pattern/' -- "$file"
The -0777, that "slurps" the whole file (into $_), is safer than -0 which uses the null byte as records separator. Also, here you don't want -i (change file in place) and want -n (loop over records) instead of -p (also prints each). I use -E instead of -e to enable (all) features, for say. See all this in perlrun.
Inside a shell script you can use the truthy/falsy return of the match operator in exit
perl -0777 -nE'exit(/pattern/)' -- "$file"
# now check $? in shell
where you can now programatically check whether the pattern was found in the file.
Finally, to run the original substitution and be able to check whether any were made
perl -i -0777 -pe'exit(s/pattern/replacement/)' -- "$file"
# now check $? in shell
where now the exit code, so $? in the shell, is the number of substitutions made.
Keep in mind that this does abuse the basic success/failure logic of return codes.
See perlretut for a regex tutorial.

sed regexp, number reformatting: how to escape for bash

I have a working (in macOS app Patterns) RegExp that reformats GeoJSON MultiPolygon coordinates, but don't know how to escape it for sed.
The file I'm working on is over 90 Mb in size, so bash terminal looks like the ideal place and sed the perfect tool for the job.
Search Text Example:
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
Desired outcome:
[[[37.9017735,69.400367955],[37.90098431,69.400425761],[37.90004869,69.400489545],[37.89915455,69.400578128],[37.89840665,69.400660744],[37.89747072,69.400762152],[37.89628639,69.400905283],[37.89545822,69.401014028],[37.89479369,69.401113128],[37.89414564,69.401195094],[37.89362565,69.401281229],[37.89276089,69.401414764],[37.89196611,69.401540312],[37.891721,69.401587053],[37.89137614,69.401634443],[37.89136515,69.401635893],[37.89114453,69.401663531],
My current RegExp:
((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))
and reformatting:
$1\.$2$4,$6.$7$9
The command should be something along these lines:
sed -i -e 's/ The RegExp escaped /$1\.$2$4,$6.$7$9/g' large_file.geojson
But what should be escaped in the RegExp to make it work?
My attempts always complain of being unbalanced.
I'm sorry if this has already been answered elsewhere, but I couldn't find even after extensive searching.
Edit: 2017-01-07: I didn't make it clear that the file contains properties other than just the GPS-points. One of the other example values picked from GeoJSON Feature properties is "35.642.1.001_001", which should be left unchanged. The braces check in my original regex is there for this reason.
That regex is not legal in sed; since it uses Perl syntax, my recommendation would be to use perl instead. The regular expression works exactly as-is, and even the command line is almost the same; you just need to add the -p option to get perl to operate in filter mode (which sed does by default). I would also recommend adding an argument suffix to the -i option (whether using sed or perl), so that you have a backup of the original file in case something goes horribly wrong. As for quoting, all you need to do is put the substitution command in single quotation marks:
perl -p -i.bak -e \
's/((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))/$1\.$2$4,$6.$7$9/g' \
large_file.geojson
If your data is just like you showed, you needn't worry about the brackets. You may use a POSIX ERE enabled with -E (or -r in some other distributions) like this:
sed -i -E 's/([0-9]{2})([0-9]*)\.([0-9]+)/\1.\2\3/g' large_file.geojson
Or a POSIX BRE:
sed -i 's/\([0-9]\{2\}\)\([0-9]*\)\.\([0-9]\+\)/\1.\2\3/g' large_file.geojson
See an online demo.
You may see how this regex works here (just a demo, not proof).
Note that in POSIX BRE you need to escape { and } in limiting / range quantifiers and ( and ) in grouping constructs, and the + quantifier, else they denote literal symbols. In POSIX ERE, you do not need to escape the special chars to make them special, this POSIX flavor is closer to the modern regexes.
Also, you need to use \n notation inside the replacement pattern, not $n.
A simple sed will do it:
$ echo "$var"
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
$ echo "$var" | sed 's/\([0-9]\{3\}\)\./.\1/g'
[[[379.017735,6940.0367955],[379.0098431,6940.0425761],[379.0004869,6940.0489545],[378.9915455,6940.0578128],[378.9840665,6940.0660744],[378.9747072,6940.0762152],[378.9628639,6940.0905283],[378.9545822,6940.1014028],[378.9479369,6940.1113128],[378.9414564,6940.1195094],[378.9362565,6940.1281229],[378.9276089,6940.1414764],[378.9196611,6940.1540312],[378.91721,6940.1587053],[378.9137614,6940.1634443],[378.9136515,6940.1635893],[378.9114453,6940.1663531],

Bash variable search and replace instead of sed

See Code Review
See Github Project
I need to parse out instances of +word+ line by line (replace +word+ with blank). I'm currently using the following (working) sed regex:
newLine=$(echo "$line" | sed "s/+[a-Z]\++//g")
This violates "SC2001" according to "ShellCheck" validation;
SC2001: See if you can use ${variable//search/replace} instead.
I've attempted several variations without success (The string "+word+" remains in the output):
newLine=$(line//+[a-Z]+/)
newLine=$(line/+[a-Z]+//)
newLine=$(line/+[a-Z]\++/)
newLine=${line//+[a-Z]+/}
and more..
I've heard that in some cases sed is necessary, but I would like to use Bash's built in find and replace if possible.
The substitution in parameter expansion doesn't use regular expressions, but patterns. To get closer to regular expressions, you can turn on extended patterns:
shopt -s extglob
new_line=${line//++([a-Z])+}

Replace more than 150000 character with sed

I want to replace this LONG string with sed
And I got the string from grep which I store it into variable var
Here is my grep command and my var variable :
var=$(grep -P -o "[^:]//.{0,}" /home/lazuardi/project/assets/static/admin/bootstrap3/css/bootstrap.css.map | grep -P -o "//.{0,})
Here is the output from grep : string
Then I try to replace it with sed command
sed -i "s|$var||g" /home/lazuardi/project/assets/static/admin/bootstrap3/css/bootstrap.css.map
But it give me output bash: /bin/sed: Argument list too long
How can I replace it?
NB : That string has 183544 character in one line.
What are you actually trying to accomplish here? sed is line-oriented, so you cannot replace a multi-line string (not even if you replace literal newlines with \n .... Well, there are ways to write a sed script which effectively replaces a sequence of lines, but it gets tortured quickly).
bash$ var=$(head -n 2 /etc/mtab)
bash$ sed "s|$var||" /etc/mtab
sed: -e expression #1, char 25: unterminated `s' command
bash$ sed "s|${var//$'\n'/\\n}||" /etc/mtab | diff -u /etc/mtab -
bash$ # (didn't replace anything, so no output)
As a workaround, what you probably want could be approached by replacing the newlines in $var with \| (or possibly just |, depending on your sed dialect) similarly to what was demonstrated above, but you'd still be bumping into the ARG_MAX limit and have a bunch of other pesky wrinkles to iron out, so let's not go there.
However, what you are attempting can be magnificently completed by sed itself, all on its own. You don't need a list of the strings; after all, sed too can handle regular expressions (and nothing in the regex you are using actually requires Perl extensions, so the -P option is by and large superfluous).
sed -i 's%\([^:]\)//.*%\1%' file
There is a minor caveat -- if there are strings which occur both with and without : in front, your original command would have replaced them all (if it had worked), whereas this one will only replace the occurrences which do not have a colon in front. That means comments at beginning of line will not be touched -- if you want them removed too, just add a line anchor as an alternative; sed -i 's%\(^\|[^:]\)//.*%\1%' file
If you want the comments in var for other reasons, the grep can be cleaned up significantly, too. (Obviously, you'd run this before performing the replacement.)
var=$(grep -P -o '[^:]\K//.*' file)
(The \K extension is one which genuinely requires -P. And of course, the common, clear, standard, readable, portable, obvious, simple way to write {0,} is *.)
On most systems these days, the value of ARG_MAX is big enough to handle 150k without problems, but it is important to note that while the limit is called ARG_MAX and the error message indicates that the command line is too long, the real limit is the sum of the sizes of the arguments and all (exported) environment variables. Also, Linux imposes a limit of 128k (131,072 bytes) for a single argument string. Exceeding any of these limits triggers an error return of E2BIG, which is printed as "Argument list too long".
In any case, bash built-ins are exempt from the limit, so you should be able to feed the command into sed as a command file:
echo "s|$var||g" | sed -f - -i /home/lazuardi/project/assets/static/admin/bootstrap3/css/bootstrap.css.map
That may not help you much, though. Your variable is full of regex metacharacters, so it will not match the string itself. You'll need to clean it up in order to be able to use it as a regular expression.
There's probably a cleaner way to do that edit, though.