Remove everything between pairs of braces with sed - regex

I've got a string that looks like this:
[%{%B%F{blue}%}master %{%F{red}%}*%{%f%k%b%}%{%f%k%b%K{black}%B%F{green}%}]
I want to remove the substrings matching %{...}, which may or may not contain further substrings of the same order.
I should get: [master *] as the final output. My progress so far:
gsed -E 's/%\{[^\}]*\}//g'
which gives:
echo '[%{%B%F{blue}%}master %{%F{red}%}*%{%f%k%b%}%{%f%k%b%K{black}%B%F{green}%}]' | gsed -E 's/%\{[^\}]*\}//g'
[%}master %}*%B%F{green}%}]
So, this works fine for %{...} sections which do not contain %{...}. It fails for strings like %{%B%F{blue}%} (it returns %}).
What I want to do is parse the string until I find the matching }, then remove everything up to that point, rather than removing everything between %{ and the first } I encounter. I'm not sure how to do this.
I'm fully aware that there are probably multiple ways to do this; I'd prefer an answer regarding the way specified in the question if it is possible, but any ideas are more than welcome.

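This might work for you (GNU sed). It first turns every %{ into a plain {, then deletes the innermost {...} groups (those with no braces inside), and ta branches back to the :a label for another pass as long as a substitution was made: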
echo '[%{%B%F{blue}%}master %{%F{red}%}*%{%f%k%b%}%{%f%k%b%K{black}%B%F{green}%}]' |
sed 's/%{/{/g;:a;s/{[^{}]*}//g;ta'
[master *]
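The one-liner above relies on GNU sed accepting a semicolon after the :a label. The question runs gsed, which suggests the system sed is BSD sed (as on OS X), where a label name runs to the end of the line. A minimal sketch of the same loop that should work with both, assuming only that each -e argument is joined with a newline, puts the label and the branch in their own -e arguments:

```shell
#!/bin/sh
# Sample input from the question.
input='[%{%B%F{blue}%}master %{%F{red}%}*%{%f%k%b%}%{%f%k%b%K{black}%B%F{green}%}]'

# 1) s/%{/{/g turns every %{ into a plain {.
# 2) :a is a label; s/{[^{}]*}//g deletes every innermost {...} group.
# 3) ta jumps back to :a if a substitution was made since the last t,
#    so outer groups are removed once their inner groups are gone.
result=$(printf '%s\n' "$input" |
  sed -e 's/%{/{/g' -e ':a' -e 's/{[^{}]*}//g' -e 'ta')

printf '%s\n' "$result"   # prints: [master *]
```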

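Work from the inside out and repeat. sed has no non-greedy .*?, and it exits with status 0 whether or not a substitution happened, so rather than testing $?, compare the text before and after each pass. With the string in $s, turn each %{ into a plain { once, then delete innermost {...} groups until a pass changes nothing:
s=$(printf '%s\n' "$s" | sed 's/%{/{/g')
while :; do
t=$(printf '%s\n' "$s" | sed 's/{[^{}]*}//g')
[ "$t" = "$s" ] && break
s=$t
done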

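Try this (it handles one level of {...} nested inside each %{...}; the literal braces outside bracket expressions are escaped so that -E does not read them as the start of a {n,m} interval):
sed -E 's/%\{([^{}]*(\{[^}]*\})*[^{}]*)*\}//g'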

Related

Linux: search and replace a pattern's case within a string

Been struggling to figure out a way to do this. Basically I need to change the case of anything enclosed in {} from lower to upper within a string representing a uri (and also strip out the braces but I can use sed to do that)
E.g
/logs/{server_id}/path/{os_id}
To
/logs/SERVER_ID/path/OS_ID
The case of the rest of the string must be preserved in lower which is what has been beating me. Looked at combos of sed,awk,tr with regex so far. Any help appreciated.
sed "s/{\([^{}]*\)}/\U\1/g"
This works by matching all text enclosed within {} and replacing it with its uppercase version (\U is a GNU sed extension, so this needs GNU sed).
echo "/logs/{server_id}/path/{os_id}" | sed "s/{\([^{}]*\)}/\U\1/g"
Gives /logs/SERVER_ID/path/OS_ID as the result.
echo "/logs/{server_id}/path/{os_id}" \
| sed 's#{\([^{}][^{}]*\)}#\U\1#;s#{\([^{}][^{}]*\)}#\U\1#'
output
/logs/SERVER_ID/path/OS_ID
The part of the solution you seem to have missed is the 'capture groups' available in sed, i.e. \(regex\). This is then referenced by \1. You could have anywhere from 1-9 capture groups if you're a real masochist ;-)
Also note that I just repeat the same command twice: once the first {...} pair has been converted to its uppercase version (without the surrounding {}), only the remaining {...} targets will match. So as written it handles at most two pairs.
There is probably a less verbose syntax for [^{}][^{}]*, but this will work with just about any sed going back to the 80s. I seem to recall that some seds don't support the \U directive, but on the systems I have access to, this works.
Does that help?
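To make the capture-group idea concrete, here is a small sketch using only POSIX sed features (the abc-123 input is made up for illustration): each \(...\) remembers what it matched, and \1, \2 insert those pieces into the replacement. The second command applies the same idea to the question's URI but only strips the braces, which still works where \U is not supported:

```shell
#!/bin/sh
# \([a-z]*\) captures the letters as \1, \([0-9]*\) captures the digits as \2.
swapped=$(echo 'abc-123' | sed 's/\([a-z]*\)-\([0-9]*\)/\2-\1/')
printf '%s\n' "$swapped"    # prints: 123-abc

# Keep only what was inside each {...}; no case change, so no \U needed.
stripped=$(echo '/logs/{server_id}/path/{os_id}' | sed 's/{\([^{}]*\)}/\1/g')
printf '%s\n' "$stripped"   # prints: /logs/server_id/path/os_id
```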
$ awk '{
while(match($0,/{[^}]+}/))
$0=substr($0,1,RSTART-1) toupper(substr($0,RSTART+1,RLENGTH-2)) substr($0,RSTART+RLENGTH)
}1' file
/logs/SERVER_ID/path/OS_ID
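The awk answer above relies on match() setting RSTART (the position where the match starts) and RLENGTH (its length). Here is a commented sketch of the same loop; spelling the braces as [{] and [}] is a deliberate change from the answer, an assumption made so that awks which support {n,m} intervals cannot mistake { for one:

```shell
#!/bin/sh
result=$(echo '/logs/{server_id}/path/{os_id}' | awk '{
  # Keep going while the line still contains a {...} group.
  while (match($0, /[{][^}]+[}]/)) {
    pre  = substr($0, 1, RSTART - 1)                    # text before the {
    mid  = toupper(substr($0, RSTART + 1, RLENGTH - 2)) # inside, braces dropped
    post = substr($0, RSTART + RLENGTH)                 # text after the }
    $0 = pre mid post
  }
  print
}')
printf '%s\n' "$result"   # prints: /logs/SERVER_ID/path/OS_ID
```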
This one handles an arbitrary number and format of braces (ORS and OFS are set to empty strings, so nothing is written where the braces were removed):
echo "/logs/{server_id}/path/{os_id}/{foo}" | awk -v RS='{' -v FS='}' -v ORS='' -v OFS='' '!/}/ { print } /}/ { $1 = toupper($1); print }'
Output:
/logs/SERVER_ID/path/OS_ID/FOO

sed backreferences returning their numerical index rather than their value

Weird problem here that I don't seem to see repeated anywhere else, so posting here. Thanks in advance.
I have the following multiline sed code that is printing further sed and copy commands into a script (yep, using a script to insert code into a script). The code looks like this:
sed -i -r '/(rpub )([$][a-zA-Z0-9])/i\
sed -i '\''/#PBS -N/d'\'' \1\
cp \1 '"$filevariable"'' $masterscript
which is supposed to do the following:
1.) Open the master script
2.) Navigate to each instance of rpub $[a-zA-Z0-9] in the script
3.) Insert the second line (sed) and third line (cp) as lines before the rpub instance, using \1 as a backreference of the matched $[a-zA-Z0-9] from step 1.
This works great; all lines print well enough in relation to each other. However, all of my \1 references are appearing explicitly, minus their backslashes. So all of my \1's are appearing as 1.
I know my pattern match specifications are working correctly, as they nail all instances of rpub $[a-zA-Z0-9] well enough, but I guess I'm just not understanding the use of backreferences. Anyone see what is going on here?
Thanks.
EDIT 1
Special thanks to Ed Morton below, implemented the following, which gets me 99% closer, but I still can't close the gap with unexpected behavior:
awk -v fv="$filevariable" '
match($0, /rpub( [$][[:alnum:]])/, a)
{
print "sed -i '\''/#PBS -N/d'\''", a[1]
}
1' "$masterscript" > tmpfile && mv tmpfile "$masterscript"
Note: I removed one of the multiline print statements, as it isn't important here. But, as I said, though this gets me much closer I am still having an issue where the printed lines appear between every line in the masterscript; it is as if the matching function is considering every line to be a match. This is my fault, as I should probably have specified that I'd like the following to occur:
stuff here
stuff here
rpub $name
stuff here
rpub $othername
stuff here
would become:
stuff here
stuff here
inserted line $name
rpub $name
stuff here
inserted line $othername
rpub $othername
Any help would be greatly appreciated. Thanks!
It LOOKS like what you're trying to do could be written simply in awk as:
awk -i inplace -v fv="$filevariable" '
match($0,/rpub ([$][[:alnum:]]+)/,a) {
print "sed -i \"/#PBS -N/d\"", a[1]
print "cp", a[1], fv
}
1' "$masterscript"
but without sample input and expected output it's just a guess.
The above uses GNU awk for inplace editing and the 3rd arg for match().
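The every-line output described in EDIT 1 is most likely caused by the line break between the match(...) pattern and its {. awk treats a newline there as the end of the rule, so the pattern becomes a rule of its own (default action: print the line) and the {...} block becomes a second rule with no pattern, which runs for every line. A small sketch with made-up input and a simplified /rpub/ pattern showing both layouts:

```shell
#!/bin/sh
input='stuff here
rpub $name
stuff here'

# Brace on its own line: two separate rules. The pattern-only rule prints
# the matching line; the pattern-less {...} rule runs on every line.
split=$(printf '%s\n' "$input" | awk '
/rpub/
{ print "INSERTED" }
')

# Brace on the same line as the pattern: one rule, run only on matches.
joined=$(printf '%s\n' "$input" | awk '
/rpub/ { print "INSERTED" }
')

printf '%s\n' "$split"    # INSERTED three times, plus the rpub line once
printf '%s\n' "$joined"   # INSERTED once
```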
If you want a backreference to work, the regular expression for it should be enclosed in parentheses. Your second line is a second invocation of sed; nothing is saved from the first line.
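Why the \1 came out as 1: backreferences are only expanded in the replacement part of an s command, while text added with i\ is inserted as it is (with the backslash dropped). A minimal sketch of one way to get the question's expected output with a single s command, assuming GNU sed (\n in the replacement is a GNU extension; BSD sed needs a backslash followed by a literal newline instead):

```shell
#!/bin/sh
input='stuff here
rpub $name
stuff here'

# \1 captures "$name"; & is the whole matched text ("rpub $name"), so the
# matching line becomes: new line, newline, original line.
result=$(printf '%s\n' "$input" |
  sed -E 's/^rpub (\$[[:alnum:]]+)/inserted line \1\n&/')

printf '%s\n' "$result"
```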

BASH : How to extract text between specific curly braces and let the other be

I'm writing files in LaTeX and I'm looking for a way to automatically delete some color in my documents. When there is text within a \textcolor{add|update}{...}, it should stay, and everything within a \textcolor{delete}{...} should be deleted. Of course the \textcolor should be deleted as well.
I just can't find a regex that will match. It just takes the first and the last brace of the line without checking that they belong to the \textcolor.
This is my code :
for i in "$@"
do
sed -i 's/\\textcolor{update}{\(.*\)\}/\1/g' $i
sed -i 's/\\textcolor{add}{\(.*\)\}/\1/g' $i
sed -i 's/\\textcolor{delete}{\(.*\)\}//g' $i
done
For example, if I have this :
This doesn't change and \textcolor{update}{this is my modified
\textbf{text} !!} \vspace{0.3}
I get this :
This doesn't change and this is my modified \textbf{text} !!}
\vspace{0.3
I should get this result :
This doesn't change and this is my modified \textbf{text} !!
\vspace{0.3}
Furthermore, I also would like to be able to get that result :
This doesn't change and \textcolor{delete}{this is my deleted
\textbf{text} !!} \vspace{0.3}
gives
This doesn't change and \vspace{0.3}
So far I have tried to avoid this issue with a 200-line script, but I'm pretty sure there is an easier way, with sed for example.
Thanks a lot !
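I'm not sure how to do this in sed, and I know that classic regexes alone can't handle arbitrarily nested braces, but this perl snippet, which uses a recursive pattern to match balanced braces, might do the trick (update and add keep their contents, delete drops them):
perl -0e '$_ = <>; s/\\textcolor\{(?:update|add)\}(\{((?:[^{}]++|(?1))*)\})/$2/g; s/\\textcolor\{delete\}(\{(?:[^{}]++|(?1))*\})//g; print'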
This might work for you (GNU sed), provided the colored text contains no nested {...} and does not span lines:
sed -i 's/\\textcolor{\(update\|add\)}{\([^}]*\)\}/\2/g;s/\\textcolor{delete}{\([^}]*\)\}//g' file

Regular expression help - what's wrong?

I would like to ask for help with my regex. I need to extract the very last part from each URL. I marked it as 'to_extract' within the example below.
I want to know what's wrong with the following regex when used with sed:
sed 's/^[ht|f]tp.*\///' file.txt
Sample content of file.txt:
http://a/b/c/to_extract
ftp://a/b/c/to_extract
...
I am getting only correct results for the ftp links, not for the http.
Thanks in advance for your explanation on this.
i.
Change [ht|f] to (ht|f), that would give better results.
[abc] means "one character which is a, b or c".
[ht|f] means "one character which is h, t, | or f", not at all what you want.
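A quick way to see this for yourself (the h|t-x test string is made up): grep -o prints every match on its own line, so a bracket expression visibly matches single characters, and it also shows why only the ftp line worked:

```shell
#!/bin/sh
# [ht|f] matches ONE character; inside brackets | is just another literal.
chars=$(printf '%s\n' 'h|t-x' | grep -o '[ht|f]')
printf '%s\n' "$chars"    # h, | and t, one per line

# ^[ht|f]tp needs one of h, t, |, f followed by "tp": "ftp" fits,
# but "http" has "tt" after the h, so that line never matches.
count=$(printf '%s\n' 'http://a/b/c/to_extract' 'ftp://a/b/c/to_extract' |
  grep -c '^[ht|f]tp')
printf '%s\n' "$count"    # 1
```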
sed uses basic regular expressions by default, so (ht|f) is only understood as an alternation once extended regular expressions are enabled: use -r with GNU sed (-E works with both GNU and BSD sed):
sed -r 's/^(ht|f)tp.*\///' file.txt
If you just want to extract the last part of the url and don't want anything else, you probably want
sed -rn 's/^(ht|f)tp.*\///p' file.txt
How about using basename:
basename http://a/b/c/to_extract
to_extract
You can simply achieve what you want with a for loop:
#!/bin/bash
myarr=( $(cat ooo) )
for i in "${myarr[@]}"; do
basename "$i"
done
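A variation on the loop above that avoids both the array and one basename process per URL: read the file line by line and let the shell remove everything up to the last / with the ${var##*/} parameter expansion. This is a different technique from the answer, sketched in POSIX sh; the temporary file stands in for the question's file.txt:

```shell
#!/bin/sh
# Demo input matching the question's sample file.txt.
file=$(mktemp)
printf '%s\n' 'http://a/b/c/to_extract' 'ftp://a/b/c/to_extract' > "$file"

# IFS= and -r keep each line exactly as written; ${url##*/} deletes the
# longest prefix ending in "/", leaving only the last path component.
result=$(while IFS= read -r url; do
  printf '%s\n' "${url##*/}"
done < "$file")

rm -f "$file"
printf '%s\n' "$result"   # to_extract, twice
```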

Using sed to remove all console.log from javascript file

I'm trying to remove all my console.log, console.dir etc. from my JS file before minifying it with YUI (on osx).
The regex I got for the console statements looks like this:
console.(log|debug|info|warn|error|assert|dir|dirxml|trace|group|groupEnd|time|timeEnd|profile|profileEnd|count)\((.*)\);?
and it works if I test it with the RegExr.
But it won't work with sed.
What do I have to change to get this working?
sed 's/___???___//g' <$RESULT >$RESULT_STRIPPED
update
After getting the first answer I tried
sed 's/console.log(.*)\;//g' <test.js >result.js
and this works, but when I add an OR
sed 's/console.\(log\|dir\)(.*)\;//g' <test.js >result.js
it doesn't replace the "logs".
Your original expression looks fine. You just need to pass the -E flag to sed, for extended regular expressions:
sed -E 's/console.(log|debug|info|...|count)\((.*)\);?//g'
The difference between these types of regular expressions is explained in man re_format.
To be honest I have never read that page, but instead simply tack on an -E when things don't work as expected. =)
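A small sketch of the difference, assuming a sed that accepts -E (GNU sed and the BSD sed on OS X both do). Without -E the parentheses and | are ordinary characters, so the pattern silently matches nothing; with -E they group and alternate. The input line is made up, and [^)]* is used instead of .* so the match stops at the first ):

```shell
#!/bin/sh
line='a(); console.dir(x); b();'

# Basic regex (no -E): ( ) and | are literal, while \( \) would form a
# group, so this only matches the literal text "console.(log|dir)".
bre=$(printf '%s\n' "$line" | sed 's/console\.(log|dir)//g')

# Extended regex (-E): ( ) group, | separates alternatives, and
# \( \) match literal parentheses.
ere=$(printf '%s\n' "$line" | sed -E 's/console\.(log|dir)\([^)]*\);//g')

printf '%s\n' "$bre"   # unchanged: a(); console.dir(x); b();
printf '%s\n' "$ere"   # a();  b();  (two spaces where the call was)
```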
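In sed's default basic regex syntax you must escape ( (for grouping) and | (for alternation). Note that \| and \? are GNU extensions, so this works with GNU sed but not with the BSD sed on OS X. E.g.: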
sed 's/console.\(log\|debug\|info\|warn\|error\|assert\|dir\|dirxml\|trace\|group\|groupEnd\|time\|timeEnd\|profile\|profileEnd\|count\)(.*);\?//g'
UPDATE example:
$ sed 's/console.\(log\|debug\|info\|warn\|error\|assert\|dir\|dirxml\|trace\|group\|groupEnd\|time\|timeEnd\|profile\|profileEnd\|count\)(.*);\?//g'
console.log # <- input line, no match, so it is printed unchanged on the next line
console.log
console.log() # <- input line, matches, no printing
console.log(blabla); # <- input line, matches, no printing
console.log(blabla) # <- input line, matches, no printing
console.debug(); # <- input line, matches, no printing
console.debug(BAZINGA) # <- input line, matches, no printing
DATA console.info(ditto); DATA2 # <- input line, matches, printing of expected data
DATA DATA2
HTH
I also want to remove all the console.log calls,
and I am trying to do it with Python,
but the regex doesn't work for me.
I wrote it like this:
var re=/^console.log(.*);?$/;
but it matches the whole of the following string:
'console.log(23);alert(234dsf);'
Does it work with
"s/console.(log|debug|info|...|count)((.*));?//g"?
I try this:
sed -E 's/console.(log|debug|info)( ?| +)\([^;]*\);//g'
Here's my implementation
for i in $(find ./dir -name "*.js")
do
sed -E 's/console\.(log|warn|error|assert..timeEnd)\((.*)\);?//g' $i > ${i}.copy && mv ${i}.copy $i
done
took the sed thing from github
I was feeling lazy and hoping to find a script to copy & paste. Alas there wasn't one, so for the lazy like me, here is mine. It goes in a file named something like 'minify.sh' in the same directory as the files to minify. It will overwrite the original file and it needs to be executable.
#!/bin/bash
for f in *.js
do
# -E -i.bak works with both GNU and BSD (OS X) sed; -Ei fails on BSD sed
sed -E -i.bak 's/console.(log|debug|info)\((.*)\);?//g' "$f"
yui-compressor "$f" -o "$f"
done
I'd just like to add here that I was running into issues with namespaced console.logs such as window.console.log. Also Tweenmax.js has some interesting uses of console.log in some parts such as
window.console&&console.log(t)
So I used this
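sed -i.bak 's/[^&a-zA-Z0-9.]console.log(/\/\//g' js/combined.js
The regex effectively says: replace every console.log( that is not preceded by &, an alphanumeric character, or . with a // comment, which uglify later takes out. (The character before console.log is replaced as well, and a console.log at the very start of a line is not matched, because [^...] must match one character.)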
Rodrigocorsi's works with nested parentheses. I added a ? after the ; because yuicompressor was omitting some semicolons.
It is probable that the reason this is not working is that you are not 'limiting' the regex to exclude a closing parenthesis ()) from the method parameters.
Try this regular expression:
console\.(log|trace|error)\(([^)]+)\);
Remember to include the rest of your method names in the capture group.
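A small sketch of that difference, using the line 'console.log(23);alert(234dsf);' quoted earlier in this thread and assuming a sed with -E: the greedy (.*) runs to the last ) on the line, while [^)]* stops at the first one. [^)]* is used instead of [^)]+ so that an empty call like console.log() is also removed. As another answer notes, nested parentheses still defeat this approach:

```shell
#!/bin/sh
line='console.log(23);alert(234dsf);'

# Greedy: .* runs to the LAST ")" on the line, so alert(...) is removed too.
greedy=$(printf '%s\n' "$line" | sed -E 's/console\.(log|trace|error)\((.*)\);?//g')

# Limited: [^)]* cannot cross a ")", so only console.log(23); is removed.
limited=$(printf '%s\n' "$line" | sed -E 's/console\.(log|trace|error)\(([^)]*)\);?//g')

# Nested parentheses: the match ends at the first ")", leaving ");" behind.
nested=$(printf '%s\n' 'console.log(f(x));' | sed -E 's/console\.(log|trace|error)\(([^)]*)\);?//g')

printf '%s\n' "$greedy"    # an empty line
printf '%s\n' "$limited"   # alert(234dsf);
printf '%s\n' "$nested"    # );
```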