How to conditionally remove characters and preserve a text in between? - regex

How could sed or another POSIX command be used to remove the braces but only when we encounter "codeBlock":{"_id":{"varying24characters"}. There may be multiple matches with this condition in the line and I want to avoid removing the braces on something that looks similar like the smoreBlock.
Input (a single line)
test,"codeBlock":{"_id":{"4c9d4e1fe2c101000138eb4b"},morestuff,"smoreBlock":{"_id":{"6c9d4e1fe2c101000138eb4b"},hey,stuff,test,"codeBlock":{"_id":{"7c9d4e1fe7c101111138eb4b"},otherstuff
Desired output
test,"codeBlock":{"_id":"4c9d4e1fe2c101000138eb4b",morestuff,"smoreBlock":{"_id":{"6c9d4e1fe2c101000138eb4b"},hey,stuff,test,"codeBlock":{"_id":"7c9d4e1fe7c101111138eb4b",otherstuff
I've been banging my head reading about sed backreferences and can't even get close to what I'm looking for. Unfortunately this is not homework. I could write a small program to brute force through it but I know there has got to be a way for sed, awk, or perl to handle this. Planning to run this on a RHEL7 or CENTOS7 host.

Think of it the other way around: match the wanted and unwanted parts together, but keep the wanted parts in capturing groups. You can then replace the whole match with just those groups.
sed 's/\("codeBlock":{"_id":\){\("[0-9a-f]\{24\}"\)}/\1\2/g' file
Or, if you have GNU sed:
sed -E 's/("codeBlock":\{"_id":)\{("[0-9a-f]{24}")\}/\1\2/g' file
both yield:
test,"codeBlock":{"_id":"4c9d4e1fe2c101000138eb4b",morestuff,"smoreBlock":{"_id":{"6c9d4e1fe2c101000138eb4b"},hey,stuff,test,"codeBlock":{"_id":"7c9d4e1fe7c101111138eb4b",otherstuff
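As a quick check of the capture-group approach, here is a minimal sketch that runs the portable (BRE) command on a shortened copy of the sample line. The final `"codeBlock"` entry with the short id `abc` is a made-up extra case, added only to show that ids which are not 24 hex characters are left alone.

```shell
# Shortened sample line; the trailing "abc" entry is hypothetical test data.
input='test,"codeBlock":{"_id":{"4c9d4e1fe2c101000138eb4b"},x,"smoreBlock":{"_id":{"6c9d4e1fe2c101000138eb4b"},y,"codeBlock":{"_id":{"abc"},z'

# Group 1 keeps the "codeBlock":{"_id": prefix, group 2 keeps the quoted id;
# the braces around the id sit outside both groups and are dropped.
result=$(printf '%s\n' "$input" |
  sed 's/\("codeBlock":{"_id":\){\("[0-9a-f]\{24\}"\)}/\1\2/g')

printf '%s\n' "$result"
```

Only the first entry changes: the smoreBlock entry does not match the literal prefix, and the short id does not satisfy the 24-character interval.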

Related

Is there an alternative to negative look ahead in sed

In sed I would like to be able to match /js/ but not /js/m. I cannot use /js/[^m] because that would match /js/ plus whatever character comes after it. Negative lookahead does not work in sed; otherwise I would have used /js/(?!m) and called it a day. Is there a way to achieve this with sed that would work for most similar situations, where you want a section of text that is not followed by another section of text?
Is there a better tool for what I am trying to do than sed? Possibly one that allows look ahead. awk seems a bit too much with its own language.
Well you could just do this:
$ echo 'I would like to be able to match /js/ but not /js/m' |
sed 's:#:#A:g; s:/js/m:#B:g; s:/js/:<&>:g; s:#B:/js/m:g; s:#A:#:g'
I would like to be able to match </js/> but not /js/m
You didn't say what you wanted to do with /js/ when you found it so I just put <> around it. That will work on all UNIX systems, unlike a perl solution since perl isn't guaranteed to be available and you're not guaranteed to be allowed to install it.
The approach I use above is a common idiom in sed, awk, etc. to create strings that can't be present in the input. It doesn't matter what character you use for # as long as it's not present in the string or regexp you're really interested in, which in the above is /js/. s/#/#A/g ensures that every occurrence of # in the input is followed by A. So now when I do s/foobar/#B/g I have replaced every occurrence of foobar with #B and I KNOW that every #B represents foobar because all other #s are followed by A. So now I can do s/foo/whatever/ without tripping over foo appearing within foobar. Then I just unwind the initial substitutions with s/#B/foobar/g; s/#A/#/g.
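The foo/foobar wording in the paragraph above can be tried directly. This is only an illustrative sketch: the input string is invented, and it deliberately contains a literal `#` and a literal `#B` to show that both survive the round trip.

```shell
# Hypothetical input containing the placeholder character itself.
input='foo #1 foobar foo#B'

# 1. s/#/#A/g       every real # becomes #A, so a bare #B cannot occur
# 2. s/foobar/#B/g  hide the text we must not touch
# 3. s/foo/<&>/g    the actual edit, now safe from foobar
# 4. s/#B/foobar/g  restore the hidden text
# 5. s/#A/#/g       restore the original # characters
result=$(printf '%s\n' "$input" |
  sed 's/#/#A/g; s/foobar/#B/g; s/foo/<&>/g; s/#B/foobar/g; s/#A/#/g')

printf '%s\n' "$result"
```

The literal `#B` in the input becomes `#AB` after step 1, which is why step 4 does not mistake it for a hidden foobar.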
In this case though since you aren't using multi-line hold-spaces you can do it more simply with:
sed 's:/js/m:\n:g; s:/js/:<&>:g; s:\n:/js/m:g'
since there can't be newlines in a newline-separated string. The above will only work in seds that support use of \n to represent a newline (e.g. GNU sed) but for portability to all seds it should be:
sed 's:/js/m:\
:g; s:/js/:<&>:g; s:\
:/js/m:g'

Use Sed (or Perl or Ruby) to replace patterns that span across lines

It's common to use Sed (or Perl or Ruby) to replace things in a file:
sed -E -i.bak 's/\s+\{/ {/g' some.code
In this example I want to remove line breaks before curly braces in code, since they are not part of my programming 'dialect' (for any language I use) and make reading the code less natural and smooth. But the basic problem is how to match a pattern that spans lines, rather than a pattern that is within any given line of the file.
Existing SO questions that appear to be similar did not specifically address how to span across lines in general, but instead gave solutions to the specific problem the user was trying to solve, sometimes through workarounds instead.
I poked around in the Sed man pages and couldn't find any switches to do what I'm describing. Perhaps I'm just not looking with the right keywords, though.
Using sed:
sed -r ':a;N;$!ba;s/\s+\{/ {/g' some.code
sed -n -r --unbuffered '$ !{N;s/\n[[:blank:]]*\{/ {/;P;D;};$ p' some.code
This streaming version works on huge files, since it never holds more than two lines at a time. For small and medium files (at least up to several thousand lines), I imagine @BMW's code is more efficient: it runs a single substitution over the whole file, whereas this one runs a substitution for each line it loads.
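To see the whole-file approach on a small input, here is a sketch assuming GNU sed (for `-E` and for semicolons after the `:a` label). The four-line C fragment is invented test data, and `[[:space:]]` is used in place of `\s` purely as a spelling choice.

```shell
# Hypothetical input: an opening brace on its own line.
code=$(printf 'int main()\n{\n    return 0;\n}\n')

# :a;N;$!ba appends every remaining line to the pattern space, so the
# substitution sees the newlines and can join "\n{" onto the previous line.
result=$(printf '%s\n' "$code" | sed -E ':a;N;$!ba;s/[[:space:]]+\{/ {/g')

printf '%s\n' "$result"
```

Because the entire file ends up in the pattern space, memory use grows with file size, which is the trade-off against the streaming variant.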

Sed dynamic backreference replacement

I am trying to use sed for transforming wikitext into latex code. I am almost done, but I would like to automate the generation of the labels of the figures like this:
[[Image(mypicture.png)]]
... into:
\includegraphics{mypicture.png}\label{img-1}
I would like to keep using sed for this. The current regex and bash code I am using is the following:
__tex_includegraphics="\\\\includegraphics[width=0.95\\\\textwidth]{$__images_dir\/"
__tex_figure_pre="\\\\begin{figure}[H]\\\\centering$__tex_includegraphics"
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"\
... but I cannot get that counter to increase. Any ideas?
From a more general perspective, my question is the following: can I use a backreference in sed to create a replacement that is different for each match? That is, each time sed matches the pattern, can I use \1 as the input of a function and use the result of this function as the replacement?
I know it is a tricky question and I might have to use AWK for this. However, if somebody has a solution, I would appreciate his or her help.
This might work for you (GNU sed):
sed -r ':a;/'"$PATTERN"'/{x;/./s/.*/echo $((&+1))/e;/./!s/^/1/;x;G;s/'"$PATTERN"'(.*)\n(.*)/'"$PRE"'\2'"$POST"'\1/;ba}' file
This looks for PATTERN, which is held in a shell variable; if it is not present, the current line is simply printed. If the pattern is present, the counter in the hold space is incremented (or primed to 1 on first use) and then appended to the current line. The pattern is then replaced using the shell variables PRE and POST and the counter. Lastly, the current line is checked for further occurrences of the pattern and the procedure is repeated if necessary.
You could read the file line-by-line using shell features, and use a separate sed command for each line. Something like
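while IFS= read -r line; do
  # Rebuild the suffix on every pass so it picks up the current counter value.
  __tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
  printf '%s\n' "$line" | sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"
  # Advance the counter only when this line actually contained an image tag.
  case $line in *'[[Image('*) __images_counter=$((__images_counter + 1)) ;; esac
done < input_file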
(This won't work if there are multiple matches in a line, though.)
For the second part, my best idea is to run sed or grep to find what is being matched, and then run sed again with the value of the function of the matched text substituted into the command.
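Since the question mentions awk as a fallback, here is a hedged sketch of a per-match counter in awk rather than sed. It is not the method from either answer above. It emits the simple `\includegraphics{...}\label{img-N}` form from the question, keeps the text around each match, and handles several matches on one line. The three input lines and the variable names `out`, `name` and `n` are made up for illustration.

```shell
result=$(printf '%s\n' 'a [[Image(one.png)]]' 'text' '[[Image(two.png)]]' |
  awk '{
    out = ""
    # Consume the line one match at a time, numbering each match.
    while (match($0, /\[\[Image\([^)]*\)\]\]/)) {
      # "[[Image(" is 8 characters and ")]]" is 3, so strip 11 in total.
      name = substr($0, RSTART + 8, RLENGTH - 11)
      n++
      out = out substr($0, 1, RSTART - 1) "\\includegraphics{" name "}\\label{img-" n "}"
      $0 = substr($0, RSTART + RLENGTH)
    }
    print out $0
  }')

printf '%s\n' "$result"
```

The counter `n` lives for the whole awk run, so numbering continues across lines without any hold-space bookkeeping.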

sed add text around regex

I would like to be able to go:
sed "s/^\(\w+\)$/leftside\1rightside/"
and have the group matched by \(\w+\) appear in between 'leftside' and 'rightside'.
But it seems like I have to pipe it twice, one for the left of the text, another time for the right. If anyone knows a way to do it in one pass, I'd appreciate it.
It's probably not working because the regex is wrong. As written, text is added at the beginning and end of the line only if the whole line consists of word characters (assuming your version of sed supports the \w notation). You also didn't escape the +, which you need to do unless you use the -r option.
Try starting with sed "s/^\(.*\)$/leftside\1rightside/" or just sed "s/.*/leftside&rightside/" and working from that.
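A small sketch of the single-pass wrap, using `&` for the whole match. It spells the word-character class as `[[:alnum:]_]` and repeats it instead of using `\w` and `+`, which is an assumption made here to stay within portable BRE syntax. The two input lines are invented.

```shell
# Only lines made up entirely of word characters are wrapped.
result=$(printf '%s\n' 'word' 'two words' |
  sed 's/^[[:alnum:]_][[:alnum:]_]*$/leftside&rightside/')

printf '%s\n' "$result"
```

The second line contains a space, so the anchored pattern does not match and the line passes through unchanged.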

Regular expression with sed

I'm having a hard time selecting text from a file using a regular expression. I'm trying to replace a specific piece of text in a file that is full of lines like this.
/home/user/test2/data/train/train38.wav /home/user/test2/data/train/train38.mfc
I'm trying to replace the bolded text. The problem is that I don't know how to select only the bolded text, since I need to use .wav in my regexp and the filename and the location of the file are also going to be different.
Hope you can help
Best regards,
Jökull
This assumes that what you want to replace is the string between the last two slashes in the first path.
sed 's|\([^/]*/\)[^/]*\(/[^/]* .*\)|\1FOO\2|' filename
produces:
/home/user/test2/data/FOO/train38.wav /home/user/test2/data/train/train38.mfc
sed processes lines one at a time, so you can omit the global option and it will only change the first 'train' on each line
sed 's/train/FOO/' testdat
vs
sed 's/train/FOO/g' testdat
which is a global replace
This is quite a bit more readable and less error-prone than some of the other possibilities, but of course there are applications which will not simplify quite as readily.
sed 's;\(\(/[^/]\+\)*\)/train\(\(/[^/]\+\)*\)\.wav;\1/FOO\3.wav;'
You can do it like this
sed -e 's/\<train\>/plane/g'
The \< tells sed to match the beginning of a word and the \> tells it to match the end of a word.
The g at the end means global so it performs the match and replace on the entire line and does not stop after the first successful match as it would normally do without g.
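The difference between these variants is easiest to see side by side. This sketch uses a shortened, hypothetical version of the sample line; the `\<` and `\>` word anchors are assumed to be available, as they are in GNU sed.

```shell
line='/data/train/train38.wav /data/train/train38.mfc'

# No g flag: only the first "train" on the line is replaced.
first=$(printf '%s\n' "$line" | sed 's/train/FOO/')

# g flag: every "train" is replaced, including the one inside "train38".
all=$(printf '%s\n' "$line" | sed 's/train/FOO/g')

# Word anchors: only "train" as a whole word, so "train38" is left alone.
word=$(printf '%s\n' "$line" | sed 's/\<train\>/plane/g')

printf '%s\n' "$first" "$all" "$word"
```

The digit in `train38` counts as a word character, so `\>` cannot match after `train` there, and the filename survives in the third variant.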