sed search and replace between string and last occurrence of character - regex

I currently have a bunch of .md5sum files with an md5sum hash value and its corresponding file name with a full absolute path. I'd like to modify these files from absolute paths to relative ones. I think I have it pretty close.
> cat example.md5sum
197f76c53d2918764cfa6463b7221dec /example/path/to/file/example.null
> cat example.md5sum | sed 's/( ).*\// \.\//'
197f76c53d2918764cfa6463b7221dec /example/path/to/file/example.null
Throwing the regex ( ).*\/ into notepad++ returns /example/path/to/file/ which is what I want. Moving it over to sed does not produce the same match.
The end goal here as mentioned previously is the following:
197f76c53d2918764cfa6463b7221dec ./example.null

Looks like a job for sed.
sed -i.bak 's:/.*/:./:' file ...
The -i option tells sed to modify files "in-place" rather than sending the results to stdout. With the substitute command, you can use alternate delimiters -- in this case, I've used a colon, since the text you're matching and using as replacement includes slashes. Makes things easier to read. (Your original attempt didn't match because sed's default BRE syntax treats ( ) as literal parentheses, unlike the PCRE-style engine Notepad++ uses.)
I haven't bothered to match the whitespace before the path, because an md5sum file has a pretty predictable format.
Back up your input files before experimenting.
Note that this is shell agnostic -- you can run it in tcsh or bash or anything else that is able to launch sed with options.
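A quick end-to-end run of the suggested command, using the file name from the question:

```shell
# Create a sample .md5sum file matching the question's format
printf '197f76c53d2918764cfa6463b7221dec /example/path/to/file/example.null\n' > example.md5sum

# Greedy match: everything from the first '/' to the last '/' becomes './'
sed -i.bak 's:/.*/:./:' example.md5sum

cat example.md5sum
# 197f76c53d2918764cfa6463b7221dec ./example.null
```

The original file is preserved as example.md5sum.bak.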

Related

Sed - How to read a file line by line and go the path mentioned in the file then replace string?

I am on a new project where I need to add some strings to all the exported API names.
Someone hinted this can be done with simple sed commands.
What is really needed, for example:
In my project say 100 files and many files have something like the below pattern
in file1 it's mentioned at some line: export(xyz);
in file2 it's mentioned at some line: export (abc);
What is needed here is to replace the
xyz with xyz_temp and
abc with abc_temp.
Now the problem is these APIs are in different folders and different files.
Fortunately, I got to know that we can redirect the result of the cscope tool to a file with the matching patterns,
so I redirected the result of a search for the "export" string into a file. Say I have exported the cscope result as export_api.txt, as below:
/path1/file1.txt export(xyz);
/path2/file2.txt export(abc);
Now, I am not sure how to use sed to automate:
Reading this export_api.txt
Reading each line
Replacing the strings as above.
Any direction would be highly appreciated.
Thanks in advance.
If you have a list of files which need to be changed and your replacement only needs to append _temp, then this can be accomplished with a single sed call:
sed -i 's/export(\(abc\|xyz\));/export(\1_temp);/' files...
-i will modify the files in-place, overwriting them.
If you don't care what you are going to replace, but want to append a suffix to all export expressions, match any identifier. Here is one such example:
export(\([^)]*\))
Depending on your expressions and valid identifier names, you might want to or need to change this to one of:
export(\(.*\))
export(\([_a-zA-Z][_a-zA-Z0-9]*\))
export(\([_a-zA-Z"'][_a-zA-Z0-9"']*\))
export(\([_a-zA-Z]*\))
…
Another option would be to match only lines containing "export(" and then replace the closing parenthesis (given that your input lines contain the token ");" only once):
sed -i '/export(/s/);/_temp);/' files...
# or reusing the complete match:
sed -i '/export(/s/);/_temp&/' files...
This avoids the backreference and makes the regular expression simpler, because it can now be of fixed size.
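A minimal check of the fixed-string variant on a made-up file (the file name and contents are assumptions):

```shell
# Hypothetical source file containing one export expression
printf 'int a;\nexport(xyz);\n' > file1.c

# On lines containing "export(", replace ");" with the suffix plus the match
sed -i '/export(/s/);/_temp&/' file1.c

grep export file1.c
# export(xyz_temp);
```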
You can use the read builtin to parse the line in your export_api.txt file, then call sed on each file. Pattern match the export snippet to choose the correct sed invocation. The way read is invoked here assumes that your path and snippet are delimited by IFS and that path does not contain any whitespace or separators:
while read -r path snippet; do
    case "$snippet" in
        *abc*) sed -i 's/export(abc);/export(abc_temp);/' "$path" ;;
        *xyz*) sed -i 's/export(xyz);/export(xyz_temp);/' "$path" ;;
    esac
done < export_api.txt
NOTE: this will change/overwrite any of your files. Your files might be left in a broken state.
PS I wonder why you cannot use your IDE to search/replace those occurrences?

Sed doesn't replace a pattern that is understood by gedit

I need to delete some content that is followed by 5 hyphens (that are in separate line) from 1000 files. Basically it looks like this:
SOME CONTENT
-----
SOME CONTENT TO BE DELETED WITH 5 HYPHENS ABOVE
I've tried to do that with this solution, but it didn't work for me:
this command — sed '/-----/,$ d' *.txt -i — can't be used because some of these texts have lines with more than 5 hyphens;
this command — sed '/^-----$/,$ d' *.txt -i — left all the files unchanged.
So I figured out that it might be something about the "^" and "$" characters, but I am both a sed and regex newbie, to be honest, and I don't know what the problem is.
I've also found out that this regex — ^-{5}$(\s|\S)*$ — is good for capturing only those blocks which start with exactly 5 hyphens, but putting it into a sed command has no effect (both the hyphens and the text after them stay where they were).
There's something I don't understand about sed, probably, because when I use the above expression with gedit's Find&Replace, it works flawlessly. But I don't want to open, change and save 1000 files manually.
I am asking this question again, in a way, because the given solution (the above link) didn't help me.
The first command I posted (sed /-----/,$ d' *.txt -i) also resulted in deleting the full content of some files, for instance a file that had the 5 hyphens followed by a line containing a single space (and no more text) at the bottom:
SOME CONTENT
-----
single space
EDIT:
Yes, I forgot about ' here, but in the Terminal I used these commands with it.
Yes, these files end with \n or \r. Is there a solution for it?
I think you want this:
sed '/^-\{5\}/,$ d' *.txt -i
Note that { and } need escaping.
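For example, on a sample file (contents are an assumption) where a block follows the five hyphens:

```shell
printf 'SOME CONTENT\n-----\nSOME CONTENT TO BE DELETED\n' > sample.txt

# Delete from the line starting with five hyphens through the end of the file
sed -i '/^-\{5\}/,$ d' sample.txt

cat sample.txt
# SOME CONTENT
```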
$ sed -n '/^-----/q;p' file
SOME CONTENT
or
$ sed -nE '/^-{5}/q;p' file
SOME CONTENT
Are you just trying to delete from ----- on its own line (which may end with \r) to the end of the file? That'd be:
awk '/^-----\r?$/{exit} {print}' file
The above will work using all awks in all shells in all UNIX systems.
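To sanity-check this against a file whose hyphen line ends in \r (CRLF) — file name and contents are assumptions:

```shell
# Sample file with a CRLF ending on the hyphen line
printf 'SOME CONTENT\n-----\r\nDELETE ME\n' > crlf.txt

# Stop (without printing) at a line of exactly five hyphens, optionally CR-terminated
awk '/^-----\r?$/{exit} {print}' crlf.txt
# SOME CONTENT
```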

Use of grep + sed based on a pattern file?

Here's the problem: I have ~35k files that might or might not contain one or more of the strings in a list of 300 lines, each containing a regex.
If I grep -rnwl 'C:\out\' --include=*.txt -E --file='comp.log' I see there are a few thousand files that contain a match.
Now how do I get sed to delete each line in these files containing the strings in the comp.log used before?
Edit: comp.log contains a simple regex on each line, but for the most part each string to be matched is unique.
This is an example of how it is structured:
server[0-9]\/files\/bobba fett.stw
[a-z]+ mochaccino
[2-9] CheeseCakes
...
etc. Silly examples aside, it goes to show each line is unique save for a few variations, so it shouldn't affect what I really want: see if any of these lines match the lines in the file being worked on. It's no different from 's/pattern/replacement/' except that I want to use the patterns in the file instead of inline.
Ok, here's an update (S.O. gets impatient if I don't declare the question answered after a few days).
After MUCH fiddling with the #Kenavoz/#Fischer approach, I found a totally different solution, but first things first.
Creating a modified pattern list for sed to work with does work,
as does #werkritter's approach of dropping sed altogether (this one I find the... err... "least convoluted" way around the problem).
I couldn't make #Mklement's answer work under Windows/Cygwin (it did work under Ubuntu, so... not sure what that means. Figures.)
What ended up solving the problem in a more long-term, reusable form was a wonderful program pointed out by a colleague, called PowerGrep. It really blows every other option out of the water. Unfortunately it's Windows-only AND it's not free. (Not even advertising here; the thing is not cheap, but it does solve the problem.)
So, considering #werkritter's reply was not a "proper" answer and I can't choose both #Lars Fischer's and #Kenavoz's answers as a solution (they complement each other), I am awarding #Kenavoz the tick mark for being first.
Final thoughts: I was hoping for a simpler, universal and free solution, but apparently there is not one.
You can try this:
sed -f <(sed 's/^/\//g;s/$/\/d/g' comp.log) file > outputfile
Each regex in comp.log is formatted into a sed address with a d command: /regex/d. This command deletes lines matching the pattern.
This inner sed is sent as a file (via process substitution) to the -f option of the outer sed applied to file.
To delete just the string matching the patterns (not the whole line):
sed -f <(sed 's/^/s\//g;s/$/\/\/g/g' comp.log) file > outputfile
Update :
The command output is redirected to outputfile.
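A small demonstration with one pattern in the style of the question (file names and contents are assumptions; note that patterns using ERE-only constructs such as + would additionally need -E/-r on the outer sed):

```shell
# Pattern file and input file (hypothetical)
printf '[2-9] CheeseCakes\n' > comp.log
printf 'keep me\n3 CheeseCakes\nalso keep\n' > file

# Inner sed turns each pattern into "/pattern/d"; outer sed runs those commands
sed -f <(sed 's/^/\//g;s/$/\/d/g' comp.log) file > outputfile

cat outputfile
# keep me
# also keep
```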
Some ideas, but not a complete solution, as it requires some adapting to your script (not shown in the question).
I would convert comp.log into a sed script containing the necessary deletes:
sed -r 's+(.*)+/\1/ d;+' comp.log > comp.sed
That would make your example comp.sed look like:
/server[0-9]\/files\/bobba fett.stw/ d;
/[a-z]+ mochaccino/ d;
/[2-9] CheeseCakes/ d;
then I would apply the comp.sed script to each file reported by grep (With your -rnwl that would require some filtering to get the filename.):
sed -i.bak -f comp.sed $AFileReportedByGrep
If you have GNU sed, you can use -i for in-place replacement creating a .bak backup; otherwise pipe to a temporary file.
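Put together on sample data (file names and contents are assumptions; note the -r when applying comp.sed, since the patterns are EREs):

```shell
printf '[a-z]+ mochaccino\n[2-9] CheeseCakes\n' > comp.log

# Turn each pattern line into a "/pattern/ d;" command
sed -r 's+(.*)+/\1/ d;+' comp.log > comp.sed

printf 'keep\ngrande mochaccino\n3 CheeseCakes\n' > afile
sed -r -i.bak -f comp.sed afile

cat afile
# keep
```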
Both Kenavoz's answer and Lars Fischer's answer use the same ingenious approach:
transform the list of input regexes into a list of sed match-and-delete commands, passed as a file acting as the script to sed via -f.
To complement these answers with a single command that puts it all together, assuming you have GNU sed and your shell is bash, ksh, or zsh (to support <(...)):
find 'c:/out' -name '*.txt' -exec sed -i -r -f <(sed 's#.*#/\\<&\\>/d#' comp.log) {} +
find 'c:/out' -name '*.txt' matches all *.txt files in the subtree of directory c:/out
-exec ... + passes as many matching files as will fit on a single command line to the specified command, typically resulting only in a single invocation.
sed -i updates the input files in-place (conceptually speaking - there are caveats); append a suffix (e.g., -i.bak) to save backups of the original files with that suffix.
sed -r activates support for extended regular expressions, which is what the input regexes are.
sed -f reads the script to execute from the specified filename, which in this case, as explained in Kenavoz's answer, uses a process substitution (<(...)) to make the enclosed sed command's output act like a [transient] file.
The s/// sed command - which uses alternative delimiter # to facilitate use of literal / - encloses each line from comp.log in /\<...\>/d to yield the desired deletion command; enclosing the input regex in \<...\> ensures matching as a word, as grep -w does.
This is the primary reason why GNU sed is required, because neither POSIX EREs (extended regular expressions) nor BSD/OSX sed support \< and \>.
However, you could make it work with BSD/OSX sed by replacing -r with -E, and \< / \> with [[:<:]] / [[:>:]].
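To see just what the inner sed emits for the word-boundary version (the sample pattern is an assumption):

```shell
printf 'mochaccino\n' > comp.log

# Wrap each pattern line in /\<...\>/d
sed 's#.*#/\\<&\\>/d#' comp.log
# /\<mochaccino\>/d
```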

Simplest, Safe Method for Trimming File Paths

I have a script that does a lot of file processing, and it's good enough to receive its paths using null characters as separators, for safety.
However, it processes all paths as absolute (saves some headaches), but these are a bit unwieldy for output purposes, so I'd like to remove a chunk of the path from my output. Plenty of options spring to mind, but the difficulty is in using them in a way that's safe for any arbitrary path I might encounter, which is where things get trickier.
Here's a quick example:
#!/bin/sh
TARGET="$1"
find "$TARGET" -print0 | while IFS= read -rd '' path; do
# Process path for output here
path_str="$path"
echo "$path_str"
done
So in the above script I want to take path and remove TARGET from it, in the most compatible way possible (e.g. nothing bash-specific). It needs to remove only from the start of the string, i.e. /foo/bar becomes bar, /foo/bar/foo becomes bar/foo, and /bar/foo remains /bar/foo. It should also cope with any possible characters in a file name, including characters that some file systems support such as tildes, colons, etc., as well as pesky inverted quotation characters.
I've hacked together some messy solutions using sed by first escaping any characters that might break my regular expression, but this is a very messy way of doing things, so I'm hoping there are simpler methods out there. In case there aren't, here's my solution so far:
SAFE_CHARS='s:\([[/.*]\):\\\1:g'
target_safe=$(printf '%s' "$TARGET" | sed "$SAFE_CHARS")
path_str=$(printf '%s' "$path" | sed "s/^$target_safe//g")
There's probably a few characters missing that I should be escaping in addition to those ones, and apologies for any typos.
To remove a prefix from a string,
$ TARGET=/foo/
$ path=/foo/bar
$ echo "${path#$TARGET}"
bar
The # operator for parameter expansion is part of the POSIX standard and will work in any POSIX-compliant shell.
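For instance (quoting the pattern inside the expansion guards against glob characters in $TARGET; that inner quoting is an addition to the answer):

```shell
TARGET=/foo/
path=/foo/bar/foo
printf '%s\n' "${path#"$TARGET"}"
# bar/foo

path=/bar/foo
printf '%s\n' "${path#"$TARGET"}"
# /bar/foo   (prefix not present, string unchanged)
```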
You can try this simple find:
export TARGET="$1"
find "$TARGET" -exec bash -c 'sed "s|^$TARGET\/||" <<< "$1"' - '{}' \;

Sed dynamic backreference replacement

I am trying to use sed for transforming wikitext into latex code. I am almost done, but I would like to automate the generation of the labels of the figures like this:
[[Image(mypicture.png)]]
... into:
\includegraphics{mypicture.png}\label{img-1}
For this I would like to keep using sed. The current regex and bash code I am using are the following:
__tex_includegraphics="\\\\includegraphics[width=0.95\\\\textwidth]{$__images_dir\/"
__tex_figure_pre="\\\\begin{figure}[H]\\\\centering$__tex_includegraphics"
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"\
... but I cannot make that counter increase. Any ideas?
From a more general perspective, my question is the following: can I use a backreference in sed to create a replacement that is different for each of the matches? That is, each time sed matches the pattern, can I use \1 as the input of a function and use the result of this function as the replacement?
I know it is a tricky question and I might have to use awk for this. However, if somebody has a solution, I would appreciate their help.
This might work for you (GNU sed):
sed -r ':a;/'"$PATTERN"'/{x;/./s/.*/echo $((&+1))/e;/./!s/^/1/;x;G;s/'"$PATTERN"'(.*)\n(.*)/'"$PRE"'\2'"$POST"'\1/;ba}' file
This looks for a PATTERN contained in a shell variable and, if it is not present, prints the current line. If the pattern is present it increments (or initializes) the counter in the hold space and then appends said counter to the current line. The pattern is then replaced using the shell variables PRE and POST and the counter. Lastly the current line is checked for further cases of the pattern and the procedure repeated if necessary.
You could read the file line by line using shell features and run a separate sed command for each line, rebuilding the replacement string on each iteration (so the counter actually takes effect) and incrementing the counter only on matching lines. Something like
exec 0<input_file
while IFS= read -r line; do
    __tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
    printf '%s\n' "$line" | sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"
    case $line in *'[[Image('*) __images_counter=$(expr $__images_counter + 1) ;; esac
done
(This won't work if there are multiple matches in a line, though.)
For the second part, my best idea is to run sed or grep to find what is being matched, and then run sed again with the value of the function of the matched text substituted into the command.
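Since the question allows for awk: a sketch of the per-match counter done in awk instead of sed (the file name and the plain \includegraphics output form are assumptions; the figure-environment wrapper from the question could be substituted in the same place):

```shell
printf '[[Image(a.png)]]\n[[Image(b.png)]]\n' > wiki.txt

# Replace each [[Image(FILE)]] with \includegraphics{FILE}\label{img-N},
# incrementing N once per match
awk '{
    while (match($0, /\[\[Image\([^)]*\)\]\]/)) {
        n++
        file = substr($0, RSTART + 8, RLENGTH - 11)   # strip "[[Image(" and ")]]"
        $0 = substr($0, 1, RSTART - 1) "\\includegraphics{" file "}\\label{img-" n "}" substr($0, RSTART + RLENGTH)
    }
    print
}' wiki.txt > latex.out

cat latex.out
# \includegraphics{a.png}\label{img-1}
# \includegraphics{b.png}\label{img-2}
```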