How to comment an include line using sed [duplicate] - regex

I am using sed in a shell script to edit filesystem path names. Suppose I want to replace
/foo/bar
with
/baz/qux
However, sed's s/// command uses the forward slash / as the delimiter. If I do that, I see an error message emitted, like:
▶ sed 's//foo/bar//baz/qux//' FILE
sed: 1: "s//foo/bar//baz/qux//": bad flag in substitute command: 'b'
Similarly, sometimes I want to select line ranges, such as the lines between a pattern foo/bar and baz/qux. Again, I can't do this:
▶ sed '/foo/bar/,/baz/qux/d' FILE
sed: 1: "/foo/bar/,/baz/qux/d": undefined label 'ar/,/baz/qux/d'
What can I do?

You can use an alternative delimiter for a search pattern (an address) by putting a backslash before its first occurrence:
sed '\,some/path,d'
And just use it as is for the s command:
sed 's,some/path,other/path,'
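As a quick check of both forms, here is a minimal sketch using | as the delimiter. The sample lines and variable names are made up for illustration; only the sed syntax comes from the answer above.

```shell
# Sample input, invented for this example.
input=$(printf '%s\n' 'keep 1' 'start foo/bar' 'inside' 'end baz/qux' 'keep 2')

# s command with "|" as the delimiter: the slashes need no escaping.
replaced=$(printf '%s\n' 'path=/foo/bar/x' | sed 's|/foo/bar|/baz/qux|')

# Address range with "|" as the delimiter: each address starts with "\|".
ranged=$(printf '%s\n' "$input" | sed '\|foo/bar|,\|baz/qux|d')

printf '%s\n' "$replaced" "$ranged"
```

The first command rewrites the path, and the second deletes everything from the foo/bar line through the baz/qux line.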
You probably want to protect other metacharacters, though; this is a good place to use Perl and quotemeta, or equivalents in other scripting languages.
From man sed:
/regexp/
Match lines matching the regular expression regexp.
\cregexpc
Match lines matching the regular expression regexp. The c may be any character other than backslash or newline.
s/regular expression/replacement/flags
Substitute the replacement string for the first instance of the regular expression in the pattern space. Any character other than backslash or newline can be used instead of a slash to delimit the RE and the replacement. Within the RE and the replacement, the RE delimiter itself can be used as a literal character if it is preceded by a backslash.

Perhaps the closest to a standard, the POSIX/IEEE Open Group Base Specification says:
[2addr] s/BRE/replacement/flags
Substitute the replacement string for instances of the BRE in the
pattern space. Any character other than backslash or newline can
be used instead of a slash to delimit the BRE and the replacement.
Within the BRE and the replacement, the BRE delimiter itself can be
used as a literal character if it is preceded by a backslash.

When there is a slash / in the original string or the replacement string, we need to escape it using \. The following command works on Ubuntu 16.04 (sed 4.2.2).
sed 's/\/foo\/bar/\/baz\/qux/' file
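When the path comes from a variable, the backslashes can be added automatically. The sketch below is one possible way to do that for a basic regular expression with / as the delimiter. The helper names escape_bre and escape_repl and the sample paths are hypothetical, and input containing newlines is not handled.

```shell
# escape_bre (hypothetical helper): backslash-escape the characters that are
# special in a basic regular expression (\ . * [ ^ $) plus the "/" delimiter.
escape_bre() {
    printf '%s\n' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

# escape_repl (hypothetical helper): on the replacement side only
# "\", "&" and the delimiter are special.
escape_repl() {
    printf '%s\n' "$1" | sed 's|[&/\]|\\&|g'
}

from=$(escape_bre '/foo/bar.d')
to=$(escape_repl '/baz/qux&co')

# The escaped "." no longer matches the "x" in /foo/barxd.
result=$(printf '%s\n' 'cd /foo/bar.d; cd /foo/barxd' | sed "s/$from/$to/g")
printf '%s\n' "$from" "$result"
```

Both helpers use | as their own delimiter so that / can appear inside the bracket expression without being escaped.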

Related

How can I express this regex with sed?

I have this regex that I would like to use with sed. I would like to use sed, since I want to batch process a few thousand files and my editor does not like that
Find: "some_string":"ab[\s\S\n]+"other_string_
Replace: "some_string":"removed text"other_string_
Find basically matches everything between some_string and other_string, including special chars like , ; - or _ and replaces it with a warning that text was removed.
I was thinking about combining the character classes [[:space:]] and [[:alnum:]], which did not work.
With the FreeBSD sed that ships with macOS, you can use
sed -i '' -e '1h;2,$H;$!d;g' -e 's/"some_string":"ab.*"other_string_/"some_string":"removed text"other_string_/g' file
The 1h;2,$H;$!d;g part reads the whole file into memory so that all line breaks are exposed to the regex, and then "some_string":"ab.*"other_string_ matches text from "some_string":"ab till the last occurrence of "other_string_ and replaces with the RHS text.
FreeBSD sed requires an argument after -i (the backup suffix); passing '' edits the file in place without keeping a backup.
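To see the slurp idiom work without touching any file, here is a small sketch that feeds three lines through a pipe instead of using -i. The sample text is invented; the sed script is the one from the answer, minus the g flag, which is not needed for a single match.

```shell
# Three-line sample; the marker strings mirror the question, the rest is made up.
input=$(printf '%s\n' 'x "some_string":"abc' 'middle line' 'end"other_string_ y')

# 1h;2,$H;$!d;g collects every line in the hold space and, on the last line,
# copies the whole text back so that .* can run across the line breaks.
result=$(printf '%s\n' "$input" |
    sed -e '1h;2,$H;$!d;g' \
        -e 's/"some_string":"ab.*"other_string_/"some_string":"removed text"other_string_/')

printf '%s\n' "$result"
```

The three input lines collapse into a single output line, because the line breaks were part of the text matched by .* and were replaced along with it.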
By the way, if you decide to use perl, you can use the -0777 option to slurp the whole file, together with the s modifier (which makes . match any char, including line break chars), and use something like
perl -i -0777 -pe 's/"some_string":"\Kab.*(?="other_string_)/removed text/gs' file
Here,
"some_string":" - matches literal text
\K - omits the text matched so far from the current match memory buffer
ab - matches ab
.* - any zero or more chars as many as possible
OR .*? - any zero or more chars as few as possible
(?="other_string_) - a positive lookahead (that matches the text but does not append to the match value) making sure there is "other_string_ immediately on the right.

How to use zgrep and regular expression?

I'm trying to search inside a .gz file, and I found out that I should use zcat / zgrep. After a bit of research, I still can't figure out how to use a regex with zgrep.
I tried zgrep '[\s\S]{10,}' a.gz, but nothing comes out even though the file contains strings of at least 10 characters.
So how can I use zgrep to display strings of at least 10 characters?
You should not use \S and \s in a POSIX BRE regex bracket expression as [\S\s] bracket expression matches either \, S or s. Use . instead of [\s\S] to match any char with a POSIX BRE/ERE regex.
Also, in a BRE pattern, {10,} must be written as \{10,\} as otherwise, when unescaped, {10,} matches a literal {10,} string.
Use
zgrep '.\{10,\}' a.gz
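A self-contained way to try this, assuming gzip and zgrep are installed: the sketch builds a throwaway a.gz in a temporary directory and runs both the BRE form above and the equivalent ERE form. The -E variant is an addition for comparison (zgrep hands its options to grep); the sample lines are made up.

```shell
# Build a small compressed file in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'short' 'exactly10!' 'a much longer line' | gzip > "$tmp/a.gz"

# BRE: the interval braces must be escaped.
bre=$(zgrep '.\{10,\}' "$tmp/a.gz")

# ERE: with -E the braces are written without backslashes.
ere=$(zgrep -E '.{10,}' "$tmp/a.gz")

rm -rf "$tmp"
printf '%s\n' "$bre"
```

Both commands should print only the lines that are at least 10 characters long.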

Recursively wrapping a regular expression with given text

For a given path, I wish to wrap a given regular expression in all files in that path or that path's sub-directories with some given text using standard Linux shell commands.
More specifically, wrap all my syslog commands with an assert command such as syslog(LOG_INFO,json_encode($obj)); becomes assert(syslog(LOG_INFO,json_encode($obj)));.
I thought the following might work, but received sed: -e expression #1, char 47: Invalid preceding regular expression error.
sed -i -E "s/(?<=syslog\()(.*)(?=\);)/assert(syslog(\1));/" /path/to/somewhere
BACKUP INFO IN RESPONSE TO Wiktor Stribiżew's ANSWER
I've never used sed before. Please confirm my understanding of your answer:
sed -i "s/syslog(\(.*\));/assert(syslog(\1));/g" /path/to/somewhere
-i edits files in place. One could leave it out at first to see on the screen what will be changed.
s substitute text
The three /'s surrounding the pattern and replacement (i.e. /pattern/replacement/) are delimiters and can be any single character, not just /.
syslog(\(.*\)); The pattern with one placeholder. Uses escaped parentheses.
assert(syslog(\1)); The replacement using escaped 1 (or 2, 3, etc) for replacement sub-strings.
g Replace all and not just the first match.
Would sed -i "s/syslog(.*);/assert(&);/g" /path/to/somewhere work as well?
sed patterns do not support lookarounds like (?<=...) and (?=...).
You may use a capturing group/replacement backreference:
sed -i "s/syslog(\(.*\));/assert(syslog(\1));/g" /path/to/somewhere
The pattern is of BRE POSIX flavor (no -E option is passed), so to define a capturing group you need to use escaped parentheses, and unescaped ones will match literal parentheses.
Details
syslog( - syslog( substring
\(.*\) - Group 1: any 0+ chars as many as possible
); - a ); substring
The replacement is assert(syslog(\1));, that is, the match is replaced with assert(syslog(, the contents of Group 1, and then ));.
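The substitution can be tried on a single line without editing any file. This sketch also runs the & variant from the question, and a version of it with the ; left out of the pattern, to show how they differ. The variable names are made up, and the -E form is an addition for comparison.

```shell
line='syslog(LOG_INFO,json_encode($obj));'

# BRE with a capture group, as in the answer.
with_group=$(printf '%s\n' "$line" | sed 's/syslog(\(.*\));/assert(syslog(\1));/')

# Same thing in ERE: the roles of ( and \( are swapped.
with_ere=$(printf '%s\n' "$line" | sed -E 's/syslog\((.*)\);/assert(syslog(\1));/')

# The & variant from the question: & includes the matched ";", so it is doubled.
amp_question=$(printf '%s\n' "$line" | sed 's/syslog(.*);/assert(&);/')

# Leaving ";" out of the pattern makes & work.
amp_fixed=$(printf '%s\n' "$line" | sed 's/syslog(.*)/assert(&)/')

printf '%s\n' "$with_group" "$with_ere" "$amp_question" "$amp_fixed"
```

Because .* is greedy, all of these assume one syslog call per line.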
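If you need Perl-compatible regex constructs, you can use Perl. Note that the lookarounds from the question leave syslog( and ); outside the match, so a replacement that writes them again would duplicate them; a plain capturing group does the job here:
perl -i -pe 's/syslog\((.*)\);/assert(syslog($1));/' /path/to/somewhere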
Regardless of this specific solution I switched to single quotes on the assumption that you are on a Unix-ish platform. Backslashes inside double quotes are pesky (sometimes you need to double them, sometimes not).
Perl prefers $1 over \1 in the replacement pattern, though the latter will also technically work.
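Neither command descends into sub-directories by itself when given a directory. One common approach, sketched here under the assumption that the files of interest end in .php, is to let find hand the files to sed. The scratch directory stands in for /path/to/somewhere, and -i.bak (in-place edit with a backup suffix) is used because that spelling is accepted by both GNU and BSD sed.

```shell
# Scratch tree standing in for /path/to/somewhere (made up for this example).
root=$(mktemp -d)
mkdir -p "$root/sub"
printf '%s\n' 'syslog(LOG_INFO,"a");' > "$root/top.php"
printf '%s\n' 'syslog(LOG_ERR,"b");' > "$root/sub/deep.php"

# Apply the substitution to every matching file below $root.
find "$root" -type f -name '*.php' \
    -exec sed -i.bak 's/syslog(\(.*\));/assert(syslog(\1));/' {} +

top=$(cat "$root/top.php")
deep=$(cat "$root/sub/deep.php")
rm -rf "$root"
printf '%s\n' "$top" "$deep"
```

Running the command a second time would wrap the calls again, so it is worth keeping the .bak files until the result has been checked.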

when to escape special character in shell

guys:
It is hard for me to judge when to escape special characters in the shell, and which characters should be escaped. For example:
sed '/[0-9]\{3\}/d' filename.txt
In the command above, why should we escape { while leaving [ unchanged? I think they are both special characters.
Can you help me with this?
/br
ruan
The general answer is that you need to escape characters that have special meaning when you want to treat them as literal characters, not for their special meaning. The rules for what characters have special meaning vary from program to program.
Your specific question involves characters that have special meaning to sed; single quotes prevent any enclosed characters from being interpreted by bash.
In this case, whether { and } need a backslash depends on which regex flavor sed is using. First, consider this command:
sed '/[0-9]{3}/d' filename.txt
If you are using sed in a mode that treats both [ and { specially (extended regular expressions, e.g. sed -E), this command says to delete any line which contains a sequence of 3 digits. The [0-9] is not a literal 5-character string; it's a regular expression that matches any single digit. The {3} isn't a literal 3-character string; it's a modifier that matches exactly 3 of the preceding regular expression. Lines like the following will be matched:
593
3296
but not
34a7
because there aren't 3 digits in a row.
Now, consider your command:
sed '/[0-9]\{3\}/d' filename.txt
The [0-9] is still a regular expression that matches a single digit. But now the braces are escaped. In sed's default mode (basic regular expressions), \{3\} is the repetition modifier, so this command behaves exactly as described above and deletes lines such as 593 and 3296. Only in extended mode does the backslash make the braces literal, so sed -E '/[0-9]\{3\}/d' treats them as the literal characters {, 3, and } and matches lines like the following:
0{3}
1{3}
5{3}
but not lines like
346
because there are no braces.
This difference in behavior is specific to sed.
In its default mode sed supports only basic regular expressions, so { is matched literally unless escaped, as you noticed.
sed '/[0-9]\{3\}/d'
In extended regex mode neither [ nor { needs escaping:
sed -r '/[0-9]{3}/d'
Or on OS X:
sed -E '/[0-9]{3}/d'
[ and ] delimit a character class in both basic and extended regex modes (even the shell's glob patterns support them).
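The four combinations can be compared side by side. This is a small sketch with invented sample lines; it assumes a sed that accepts -E, which both GNU and BSD sed do.

```shell
input=$(printf '%s\n' '593' '34a7' '0{3}')

# Basic mode: \{3\} is the interval, {3} is literal text.
bre_escaped=$(printf '%s\n' "$input" | sed '/[0-9]\{3\}/d')
bre_plain=$(printf '%s\n' "$input" | sed '/[0-9]{3}/d')

# Extended mode: {3} is the interval, \{3\} is literal text.
ere_plain=$(printf '%s\n' "$input" | sed -E '/[0-9]{3}/d')
ere_escaped=$(printf '%s\n' "$input" | sed -E '/[0-9]\{3\}/d')

printf 'BRE escaped:\n%s\nBRE plain:\n%s\n' "$bre_escaped" "$bre_plain"
```

The interval forms delete 593, the literal forms delete 0{3}, and 34a7 survives in every case.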
I think your question pertains to special characters in regular expressions. Check this out:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03
It mainly depends on the sed version (POSIX-compliant or extended behavior), and then you need to adapt for the shell because, as you say, some modification occurs before sed receives the script. The best examples are the choice of single or double quotes at the shell level, and \( versus ( at the sed level.
so:
define the pattern (reg ex) you want
adapt for the sed version/option you are using
adapt for shell interpretation
As an exercise, try writing the sed substitution that replaces a literal \{ with the literal text &/$IFS (the characters themselves, not the value of IFS), with the sed script enclosed in double quotes, in Bash/KSH with POSIX or GNU sed.
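One possible answer to this exercise, worked through the three steps listed above, is sketched here. The sample input a\{b is made up. The single-quoted form shows what sed needs to receive, and the double-quoted form adds the extra escaping the shell consumes.

```shell
# What sed must see (basic regex):  s/\\{/\&\/$IFS/
#   \\  matches a literal backslash, { is literal in a BRE,
#   \&  is a literal "&", \/ a literal "/", and $IFS is plain text to sed.
single=$(printf '%s\n' 'a\{b' | sed 's/\\{/\&\/$IFS/')

# Inside double quotes the shell turns \\ into \ and \$ into $,
# so those two need one more level of backslashes; \& and \/ pass through.
double=$(printf '%s\n' 'a\{b' | sed "s/\\\\{/\&\/\$IFS/")

printf '%s\n' "$single" "$double"
```

Both commands should produce the same line, which is why single quotes are usually the easier choice for sed scripts.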

Regular expression to match beginning and end of a line?

Could anyone tell me a regex that matches the beginning or end of a line? e.g. if I used sed 's/[regex]/"/g' filehere the output would be each line in quotes? I tried [\^$] and [\^\n] but neither of them seemed to work. I'm probably missing something obvious, I'm new to these
Try:
sed -e 's/^/"/' -e 's/$/"/' file
Adding quotes to the start and end of every line is simply:
sed 's/.*/"&"/g'
The RE you were trying to come up with to match the start or end of each line, though, is:
sed -r 's/^|$/"/g'
It's an ERE (enabled by -r), so it will work with GNU sed but not older seds.
matthias's response is perfectly adequate, but you could also use a backreference to do this. if you're learning regular expressions, they are a handy thing to know.
here's how that would be done using a backreference:
sed 's/\(^.*$\)/"\1"/g' file
at the heart of that regex is ^.*$, which means match anything (.*) surrounded by the start of the line (^) and the end of the line ($), which effectively means that it will match the whole line every time.
putting that term inside parentheses creates a capture group that we can refer back to later on (in the replace pattern). but for sed to realize that you mean to create a group instead of matching literal parentheses, you have to escape them with backslashes. thus, we end up with \(^.*$\) as our search pattern.
the replace pattern is simply a double quote followed by \1, which is our backreference (refers back to the first pattern match enclosed in parentheses, hence the 1). then add your last double quote to end up with "\1".
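For comparison, the approaches from these answers can be run on the same input. This is a small sketch with made-up sample lines; the group version drops the ^ and $ anchors, since .* already spans the whole line.

```shell
input=$(printf '%s\n' 'first line' 'second')

# Two separate substitutions, one per anchor.
two_subs=$(printf '%s\n' "$input" | sed -e 's/^/"/' -e 's/$/"/')

# Whole-line match reused through &.
whole=$(printf '%s\n' "$input" | sed 's/.*/"&"/')

# Whole-line match reused through a capture group and \1.
grouped=$(printf '%s\n' "$input" | sed 's/\(.*\)/"\1"/')

printf '%s\n' "$two_subs"
```

All three should wrap every line in double quotes.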