use sed to replace "file={{bla-bla}}" with "file={bla-bla}" - regex

My BibTeX file is corrupted in the sense that I need to replace
file = {{name:/path/to/file.pdf:application/pdf}},
with file = {name:/path/to/file.pdf:application/pdf}, that is, remove the first pair of curly brackets.
All the strings I am interested in start with file = {{.
My first attempt is
echo "file = {{name:/path/to/file.pdf:application/pdf}}," | sed 's/file = {{/file = {/g;s/}}/}/g'
The problem with this one is that it also alters lines like
title = {{ blablabla }}, which I don't want it to do.
How does one write a REGEX with something like s/file = {{EVERYTHING-IN-BETWEEN/file = {KEEP-WHAT-WAS-THERE}/g ?
p.s. if it's not possible with sed, any other unix commands are welcome.
p.p.s. I am on OS-X, sed here is apparently different to GNU, so some answers below do not work for me, unfortunately.

You can do the following:
sed 's/\(file = \){\({[^}]*}\)}/\1\2/g'
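As a quick check, here is a small sketch that runs the command above on two made-up lines (the sample text is only an illustration): the file line should lose its outer braces while the title line stays as it is.

```shell
# Hypothetical sample input: one line to fix, one line that must stay untouched.
sample='file = {{name:/path/to/file.pdf:application/pdf}},
title = {{ blablabla }},'

# Keep "file = " (\1) and the inner {...} group (\2); drop the outer braces.
fixed=$(printf '%s\n' "$sample" | sed 's/\(file = \){\({[^}]*}\)}/\1\2/g')
printf '%s\n' "$fixed"
```

This uses only basic regular expressions, so it should not depend on GNU extensions.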

This is probably wrong for your situation, but this will change any double brace to a single brace:
sed 's/\([{}]\)\1/\1/g' <<END
{{
}}
file={{bar-blah}}
{}
}{
END
{
}
file={bar-blah}
{}
}{
The search part \([{}]\)\1 finds a single open or close brace followed by what was just captured. The replacement part is the single captured character.

Assuming both opening and closing braces are on the same line and no other pairs of braces exist on that same line then this should do what you want:
sed '/file *= *{{/{s/{{/{/; s/}}/}/}' file
That's:
/file *= *{{/ - match lines that contain file = {{ (with optional spaces around the =)
{ - start a group of commands
s/{{/{/ - replace {{ with { once
s/}}/}/ - replace }} with } once
} - end a group of commands
If OS X sed cannot handle that command, and this version without the command grouping does not work either:
sed '/file *= *{{/s/}}/}/; /file *= *{{/s/{{/{/'
then this, hopefully, should:
sed -e '/file *= *{{/s/}}/}/' -e '/file *= *{{/s/{{/{/'
or, to steal from glenn jackman's answer a bit:
sed -e '/file *= *{{/s/\([{}]\)\1/\1/g'
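If the one-line grouped form is rejected, a layout with each command on its own line may be worth trying, since BSD sed tends to want a newline or ; before the closing }. The sketch below wraps that in a function; the name fix_file_braces and the sample text are made up for illustration.

```shell
# Hypothetical helper: same commands as above, one per line inside the group.
fix_file_braces() {
  sed '/file *= *{{/{
s/{{/{/
s/}}/}/
}' "$@"
}

# Made-up sample: only the file line should change.
bib='file = {{name:/path/to/file.pdf:application/pdf}},
title = {{ blablabla }},'

fixed=$(printf '%s\n' "$bib" | fix_file_braces)
printf '%s\n' "$fixed"
```

Called without arguments it reads standard input; called as fix_file_braces refs.bib it reads the named file instead.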

With GNU sed:
echo "file={{bla-bla}}" | sed 's/\(file\s*=\s*\){\s*{\s*\([^}]*\)}\s*}\s*/\1{\2}/'
file={bla-bla}
\s matches any whitespace characters that may be present.
This assumes that '}' does not occur inside the inner string.
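Since \s is a GNU extension, a sed without it (such as the one on OS X) could use the POSIX character class [[:space:]] instead. This is only a sketch of the same substitution with that swap made, under the same assumption that '}' does not appear in the inner string.

```shell
# Same substitution as above, with \s replaced by the POSIX class [[:space:]].
out=$(echo "file={{bla-bla}}" |
  sed 's/\(file[[:space:]]*=[[:space:]]*\){[[:space:]]*{[[:space:]]*\([^}]*\)}[[:space:]]*}[[:space:]]*/\1{\2}/')
printf '%s\n' "$out"
```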

Related

Adding a line using sed

Can't seem to find the right way to do this, despite checking my regex in a reg checker.
Given a text file containing, amongst others, this entry:
zone "example.net" {
type master;
file "/etc/bind/zones/db.example.net";
allow-transfer { x.x.x.x;y.y.y.y; };
also-notify { x.x.x.x;y.y.y.y; };
};
I want to add lines after the also-notify line, for that domain specifically.
So using this sed command string:
sed '/"example\.net".*?also-notify.*?};/a\nxxxxxxx/s' named.conf.local
I thought this would add 'xxxxxxx' after the line, but it doesn't. What am I doing wrong?
With POSIX sed, you can use the a for append command with an escaped literal new line:
$ sed '/^[[:blank:]]*also-notify/ a\
NEW LINE' file
With GNU sed, a is slightly more natural since the new line is assumed:
$ gsed '/^[[:blank:]]*also-notify/ a NEW LINE' file
The issue with the sed in your example is twofold.
The first is that a sed regex is matched against one line at a time, so a multi-line pattern such as "example\.net".*?also-notify.*? cannot work; that is more of a Perl-style match. You would need a range address to pick the starting point, as in:
$ sed '/"example\.net/,/also-notify/{
/^[[:blank:]]*also-notify/ a\
NEW LINE
}' file
The second issue is the \n in the appended text. With POSIX sed, \n is not supported in the appended text. With GNU sed, the newline is assumed, so a \n immediately after the a is out of context and is interpreted as an escaped literal n. You can use \n with GNU sed after the first character of the text, but not immediately after the a. In POSIX sed, leading spaces of the appended line are always stripped.
The following awk may also help:
awk -v new_lines="new_line here" '{print} /also-notify/{print new_lines}' Input_file
If you want to edit Input_file itself, append > temp_file && mv temp_file Input_file to the command above. Here new_lines is an awk variable; you could also print the new lines directly instead of passing them in through a variable.
You're pretty close already. Just use a range (/pattern/,/pattern/{ #commands }) to select the text you want to operate on, and then use /pattern/a\ ... to add the line you want.
/"example\.net"/,/also-notify/{
/also-notify/a\
\ this is the text I want to add.
}
sed trims leading space on text to be appended. Adding a backslash \ at the start of the line prevents this.
In Bash, this would look like something like:
sed -e '/"example\.net"/,/also-notify/{
/also-notify/a\
\ this is the text I want to add.
}' named.conf.local
Also note that sed uses an older dialect of regular expressions that doesn't support non-greedy quantifiers like *?.
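To see that the range really limits the append to one zone, here is a sketch against a made-up two-zone file (the zone contents and the text NEW LINE are placeholders, not taken from a real configuration).

```shell
# Hypothetical config with two zones; only example.net should get the new line.
conf=$(mktemp)
cat > "$conf" <<'EOF'
zone "example.org" {
also-notify { x.x.x.x; };
};
zone "example.net" {
also-notify { x.x.x.x; };
};
EOF

# The range starts at the example.net zone line and ends at its also-notify line.
result=$(sed '/"example\.net"/,/also-notify/{
/also-notify/a\
NEW LINE
}' "$conf")

printf '%s\n' "$result"
rm -f "$conf"
```

The also-notify line of example.org lies outside the range, so nothing is appended there.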

How can I use sed to find a line starting with AAA but NOT end with BBB

I'm trying to create a script to append oracleserver to /etc/hosts as an alias of localhost. Which means I need to:
Locate the line that matches ^127.0.0.1 but does NOT match oracleserver$
Then, append oracleserver to this line
I know the best practice is probably to use a negative lookahead. However, sed does not support lookaround (see "What's wrong with my lookahead regex in GNU sed?"). Can anyone suggest a possible solution?
sed -i '/oracleserver$/! s/^127\.0\.0\.1.*$/& oracleserver/' filename
/oracleserver$/! - on lines not ending with oracleserver
^127\.0\.0\.1.*$ - replace the whole line if it is starting with 127.0.0.1
& oracleserver - with the line plus a space separator ' ' (required) and oracleserver after that
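A dry run on a throwaway copy is a reasonable way to check this before touching /etc/hosts. The sketch below drops -i and uses made-up host entries; note also that a bare -i is the GNU spelling, while BSD sed expects an explicit suffix argument (for example -i '').

```shell
# Hypothetical hosts file: one line to extend, one already done, one unrelated.
hosts=$(mktemp)
printf '%s\n' \
  '127.0.0.1 localhost' \
  '127.0.0.1 localhost oracleserver' \
  '10.0.0.5 db' > "$hosts"

# Without -i, sed only prints the result, leaving the file unchanged.
updated=$(sed '/oracleserver$/! s/^127\.0\.0\.1.*$/& oracleserver/' "$hosts")
printf '%s\n' "$updated"
rm -f "$hosts"
```

Because lines that already end in oracleserver are skipped, running the command a second time should not append the alias again.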
Just use awk with && to combine the two conditions:
awk '/^127\.0\.0\.1/ && !/oracleserver$/ { $0 = $0 " oracleserver" } 1' file
This appends the string when the first pattern is matched but the second one isn't. The 1 at the end is always true, so awk prints each line (the default action is { print }).
I wouldn't use sed but instead perl:
Locate the line that matches ^127.0.0.1 but does NOT match oracleserver$
perl -pe 'if ( m/^127\.0\.0\.1/ and not m/oracleserver$/ ) { s/$/ oracleserver/ }'
Should do the trick. You can add -i.bak to inplace edit too.

Sed not reading multiline input?

I have some text as below:
path => ["/home/Desktop/**/auditd.log",
"/home/Desktop/**/rsyslog*.log",
"/home/Desktop/**/snmpd.log",
"/home/Desktop/**/kernel.log",
"/home//Desktop/**/ntpd.log",
"/home/Desktop/**/mysql*.log",
"/home/Desktop/**/sshd.log",
"/hme/Desktop/**/su.log",
"/home/Desktop/**/run-parts(.log"
]
I want to extract the values inside [ ], so I am doing:
sed -n 's/.*\[\(.*\)\]/\1/p'
Sed is not returning anything.
If I do sed -n 's/.*\[\(.*\)log/\1/p' it properly returns the string between [ and log:
"/home/Desktop/**/auditd.",
So it's able to search within the line.
How to make this work??
EDIT:
I created a file with content:
path => [asd,masd,dasd
sdalsd,ad
asdlmas;ldasd
]
When I do grep -o '\[.*\]' it does not work, but grep -o '\[.*' returns the first line, [asd,masd,dasd. So it works for a single line but not across multiple lines.
Try this:
$ grep -Eo '".*",?' file
OUTPUT:
"/home/Desktop/**/auditd.log",
"/home/Desktop/**/rsyslog*.log",
"/home/Desktop/**/snmpd.log",
"/home/Desktop/**/kernel.log",
"/home//Desktop/**/ntpd.log",
"/home/Desktop/**/mysql*.log",
"/home/Desktop/**/sshd.log",
"/hme/Desktop/**/su.log",
"/home/Desktop/**/run-parts(.log"
-o makes grep print only the matching part (-E enables extended regular expressions, which ? needs)
" is a literal double quote
.* is anything
" is the closing double quote
, is a literal comma
? means 0 or 1 occurrence
Well, I was a bit too slow, but I think the question of how to apply sed substitutions to a whole file as a block rather than on a line-by-line basis merits a general answer, so I'll leave one here.
In your specific case, you could use this pattern:
sed -n 'H; $ { x; s/.*\[\(.*\)\].*/\1/; p; }' foo.txt
The general trick is
sed -n 'H; $ { x; s/pattern/replacement/flags; p; }' file
What this means is: Every line that comes in is appended to the hold buffer (with the H command), and at the end of the file ($), when the whole file is in the hold buffer, the stuff between the brackets is executed. In that block, the hold buffer is swapped with the pattern space (x), then the substitution is done, and what remains in the pattern space is printed (p).
EDIT: One caveat of this simple form is that it doesn't work properly if your pattern needs to match at the beginning of the file: because H appends a newline followed by the pattern space, the hold buffer ends up starting with an empty line. If this is important, the slightly more complicated form
sed -n '1 h; 1 !H; $ { x; s/pattern/replacement/flags; p; }' file
fixes it. This will use h instead of H for the first line, which overwrites the hold buffer with the pattern space rather than appending the pattern space to the hold buffer.
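Here is the second form applied to the small test file from the question, as a sketch. It relies on . also matching the newlines embedded in the pattern space, which is how GNU sed behaves; other implementations are assumed, not verified, to do the same.

```shell
# Sample taken from the question's EDIT section.
sample='path => [asd,masd,dasd
sdalsd,ad
asdlmas;ldasd
]'

# Collect every line in the hold buffer, then substitute once at the last line.
inside=$(printf '%s\n' "$sample" |
  sed -n '1h; 1!H; $ { x; s/.*\[\(.*\)\].*/\1/; p; }')
printf '%s\n' "$inside"
```

The captured text ends with the newline that preceded ]; the command substitution strips it here, but it would show up as a trailing empty line when printing directly.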
You can do it with awk by replacing .*[ or ] or white spaces with nothing:
$ awk '{gsub(/.*\[|\]| /, ""); print}' filename
"/home/Desktop/**/auditd.log",
"/home/Desktop/**/rsyslog*.log",
"/home/Desktop/**/snmpd.log",
"/home/Desktop/**/kernel.log",
"/home//Desktop/**/ntpd.log",
"/home/Desktop/**/mysql*.log",
"/home/Desktop/**/sshd.log",
"/hme/Desktop/**/su.log",
"/home/Desktop/**/run-parts(.log"
Judging from your sample, it could also be done by treating only the first and last lines:
sed '1s/^[^[]*\[//;$d' YourFile

sed - Include newline in pattern

I am still a noob to shell scripts but am trying hard. Below, is a partially working shell script which is supposed to remove all JS from *.htm documents by matching tags and deleting their enclosed content. E.g. <script src="">, <script></script> and <script type="text/javascript">
find $1 -name "*.htm" > ./patterns
for p in $(cat ./patterns)
do
sed -e "s/<script.*[.>]//g" $p #> tmp.htm ; mv tmp.htm $p
done
The problem with this script is that, because sed reads its input line by line, it does not work as expected when a tag spans several lines. Running it on:
<script>
//Foo
</script>
will remove the first script tag but leave the //Foo line and the closing tag in place, which I don't want.
Is there a way to match new-line characters in my regular expression? Or if sed is not appropriate, is there anything else I can use?
Assuming that you have <script> tags on different lines, e.g. something like:
foo
bar
<script type="text/javascript">
some JS
</script>
foo
the following should work:
sed '/<script/,/<\/script>/d' inputfile
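A quick sketch of the range delete on the sample above, stored in a shell variable for illustration. One caveat worth knowing: sed only looks for the end of a range on the lines after the one that opened it, so a <script ...></script> pair sitting on a single line would start a range that runs on to the next closing tag.

```shell
# Sample page from the answer above.
page='foo
bar
<script type="text/javascript">
some JS
</script>
foo'

# Delete every line from an opening <script through the closing </script>.
clean=$(printf '%s\n' "$page" | sed '/<script/,/<\/script>/d')
printf '%s\n' "$clean"
```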
This awk script looks for the opening <script*> tag, sets the in_script variable, and skips to the next line. When the closing </script*> tag is found, the variable is reset to zero. The final rule prints a line only if in_script is zero.
awk '/<script.*>/ { in_script=1; next }
/<\/script.*>/ { if (in_script) in_script=0; next }
{ if (!in_script) print; } ' "$1"
As you mentioned, the issue is that sed processes input line by line.
The simplest workaround is therefore to make the input a single line, e.g. replacing newlines with a character which you are confident doesn't exist in your input.
One would be tempted to use tr :
… |tr '\n' '_'|sed 's~<script>.*</script>~~g'|tr '_' '\n'
However "currently tr fully supports only single-byte characters", and to be safe you probably want to use some improbable character like ˇ, for which tr is of no help.
Fortunately, the same thing can be achieved with sed, using branching.
Back to our <script>…</script> example, this does work and would be (according to the previous link) cross-platform:
… |sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ˇ/g' -e 's~<script>.*</script>~~g' -e 's/ˇ/\n/g'
Or in a more condensed form if you use GNU sed and don't need cross-platform compatibility :
… |sed ':a;N;$!ba;s/\n/ˇ/g;s~<script>.*</script>~~g;s/ˇ/\n/g'
Please refer to the linked answer under "using branching" for details about the branching part (:a;N;$!ba;). The remaining part is straightforward :
s/\n/ˇ/g replaces all newlines with ˇ ;
s~<script>.*</script>~~g removes what needs to be removed. Beware that it needs hardening for real use: as written, it deletes everything between the first <script> and the last </script>. Also note that I used ~ instead of / as the delimiter to avoid escaping the slash in </script>; just about any single-byte character would do, except a few reserved ones like \ ;
s/ˇ/\n/g re-adds the newlines.
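A smaller variant of the same loop, as a sketch: once the whole input sits in the pattern space, . in GNU sed also matches the embedded newlines, so the placeholder step may not be needed there. Whether other sed implementations behave identically is an assumption I have not verified, and the sample input is made up.

```shell
# Hypothetical input with a script element spread over three lines.
html='keep 1
<script>
//Foo
</script>
keep 2'

# :a / N / $!ba read the whole input into the pattern space before substituting.
stripped=$(printf '%s\n' "$html" |
  sed -e ':a' -e 'N' -e '$!ba' -e 's~<script>.*</script>~~g')
printf '%s\n' "$stripped"
```

The greedy .* still spans from the first <script> to the last </script>, so the caveat above applies unchanged.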

Search and replace patterns on multiple line

I have a pattern like
Fixed pattern
text which can change(world)
I want to replace this with
Fixed pattern
text which can change(hello world)
What I am trying to use
cat myfile | sed -e "s#\(Fixed Pattern$A_Z_a_z*\(\)#\1 hello#g > newfile
UPDATE:
The above word world is also a variable and will change
Basically add hello after the first parenthesis encountered after the expression.
Thanks in advance.
Assuming your goal is to add 'hello ' after the first opening parenthesis on the line that follows each 'Fixed pattern' line, here is a solution that should work:
sed -e '/^Fixed pattern$/!b' -e 'n' -e 's/(/(hello /' myfile
Here is an explanation of each portion:
/^Fixed pattern$/!b   # skip all of the following commands if 'Fixed pattern' doesn't match
n                     # if 'Fixed pattern' did match, read the next line
s/(/(hello /          # replace '(' with '(hello '
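For a concrete run, the sketch below feeds the command three lines; the third line is a made-up addition to show that parentheses elsewhere in the file are left alone.

```shell
# First two lines come from the question; the third is a hypothetical extra.
input='Fixed pattern
text which can change(world)
other line (untouched)'

# Only the line right after "Fixed pattern" gets the substitution.
out=$(printf '%s\n' "$input" |
  sed -e '/^Fixed pattern$/!b' -e 'n' -e 's/(/(hello /')
printf '%s\n' "$out"
```

Since the substitution keys on the parenthesis rather than on the word world, it keeps working when that word changes.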
To do this with sed, use n:
sed '/Fixed pattern/{n; s/world/hello world/}' myfile
You may need to be more careful, but this should work for most situations. Whenever sed sees the Fixed pattern (you may want to use line anchors ^ and $), it will read the next line and then apply the substitution to it.