Sed doesn't replace a pattern that is understood by gedit - regex

I need to delete some content that is followed by 5 hyphens (that are in separate line) from 1000 files. Basically it looks like this:
SOME CONTENT
-----
SOME CONTENT TO BE DELETED WITH 5 HYPHENS ABOVE
I've tried to do that with this solution, but it didn't work for me:
this command — sed '/-----/,$ d' *.txt -i — can't be used because some of these texts have lines with more than 5 hyphens;
this command — sed '/^-----$/,$ d' *.txt -i — resulted in having all the files unchanged).
So I figured out that it might be something about "^" and "$" characters, but I am both sed and RegEx newbie, to be honest, and I don't know, what's the problem.
I've also found out that this RegEx — ^-{5}$(\s|\S)*$ — is good for capturing only these blocks which start exactly with 5 hyphens, but putting it into sed command gives no effect (both hyphens and text after them stay, where they were).
There's something I don't understand about sed probably, because when I use the above expression with gedit's Find&Replace, it works flawlessly. But I don't want to open, change and save 1000 files manually.
I am asking this question kinda again, because the given solution (the above link) didn't help me.
The first command I've posted (sed /-----/,$ d' *.txt -i) also resulted in deleting full content of some files, for instance a file that had 5 hyphens, new line with a single space (and no more text) at the bottom of it:
SOME CONTENT
-----
single space
EDIT:
Yes, I forgot about ' here, but in the Terminal I used these commands with it.
Yes, these files end with \n or \r. Is there a solution for it?

I think you want this:
sed '/^-\{5\}/,$ d' *.txt -i
Note that { and } need escaping.

$ sed '/^-----/p;q' file
SOME CONTENT
or
$ sed -E '/^-{5}/p;q' file
SOME CONTENT

Are you just trying to delete from ----- on it's own line (which may end with \r) to the end of the file? That'd be:
awk '{print} /^-----\r?/{exit}' file
The above will work using all awks in all shells in all UNIX systems.

Related

why using "$$" (two dollar sign) instead of just "$" (single dollar sign) for blank line removal in Makefile? [duplicate]

This question already has answers here:
What is the meaning of a double dollar sign in bash/Makefile?
(3 answers)
Closed 3 years ago.
I met some old Makefile script which was designed to delete blank lines using the sed command. The code itself is pretty easy and straightforward.
sed -i -e '/^[ \t]*$$/ d'
I know the ^[ \t]*$ part just indicates that a line starts with space or tab then repeated zero or more times until the end of the line. I didn't quite understand why there is an additional "$" sign at the end of the regular expression.
I also tried using only single $ sign. It seems that the same effects can be achieved.
sed -i -e '/^[ \t]*$/ d'
Then what is the purpose of using the double-dollar sign in this case?
-------------------------Additional comments--------------------
It's my fault that I didn't mention that it comes from a Makefile. I naively thought it would be the same thing no matter it is inside or outside a Makefile. The command is like this:
RM_BLANK: org_file
#cpp org_file | sed -e 's/ */ /g' > file
#sed -i -e '/^[ \t]*$$/ d' file
org_file is the file that contains a lot of blanks lines.
It (I mean with $$) behaves exactly as hek2mgl's answer below has predicted when used outside a Makefile if the sed command is performed directly on org_file. It only deletes lines that end with a $ and leaves the empty lines without $ intact. But when used in a Makefile environment, it simply deletes blank lines that don't have a $ at the end of line. I think it might have to do with the Makefile convention. Would someone help with this puzzle?
This is not a safety mechanism, it's just part of the regular expression. In basic posix regular expressions, which sed is using, the $ has no special meaning, except of when being used at the end of the pattern. Meaning the expression matches lines which optionally contain tabs and which end with a literal $. If you remove the second $, the sed command would remove lines which don't end with a $ as well.
http://man7.org/linux/man-pages/man7/regex.7.html

sed and regular expression: unexpected replacement pattern

I am trying to use a small bash script using sed to append a string, but I do not seem to be able to make it work.
I want to append a string to another string pattern:
Strings in input file:
Xabc
Xdef
Desired output:
XabcZ
XdefZ
Here is the script:
#!/bin/bash
instring="$2"
sed -r "s/${instring}/${instring}Z/g" $1
Where $1 is the file name and $2 is the string pattern I am looking for
Then I run the script:
bash script.test.sh test.txt X
output:
XZabc
XZdef
As expected.
but if I use regular expressions:
bash script.test.sh test.txt X...
All I get is:
X...Z
X...Z
So obviously it is not reading it correctly in the replacement part of the command. Smae thing if I use X[a-z09] (but there may be "_" in my strings, I want to include those as well). I had a look at several previous similar topics, but I do not seem able to implement any of the solutions correctly (bear with a newbie...). Thank you for your kind help.
EDIT: After receiving the answers from Glenn Jackman (accepted solution) and RavinderSingh13, I would like to clarify two important points for whoever is having a similar issue:
1) Glenn Jackman solution did not work because I needed to convert the text file from DOS to Unix. I did it with dos2unix , but for some reason did not work (maybe forgot to overwrite the output to the old file?). I later did it using sed -i 's/\r$//' test.txt ; that solved the issue, and Glenn's solution now works. having a dos-formatted text file has been the source of many trouble, for me at least.
2) I probably did not make clear that I only wanted to target specific lines in the input files; my example only has target strings, but the actual file has strings that I do not want to edit. That was probably the misunderstanding occurred with RavinderSingh13's script, which actually works, but targets every single line.
Hope this can help future readers. Thank you, Stackers, you saved the day once again :)
What you have (sed -r "s/${instring}/${instring}Z/g" $1) uses the variable as a pattern on the left-hand side and as plain text on the right-hand side.
What you want to do is:
sed -r "s/${instring}/&Z/g" $1
# ....................^
where the & marker is replaced by whatever text the pattern matched. In the documentation for The s Command:
[T]he replacement can contain unescaped & characters which reference the whole matched portion of the pattern space.
EDIT: In case you need to pass a regex to script then following may help, where my previous solution was only appending a character to last of the line.
cat script.ksh
value="$2"
sed "s/$value/&Z/" "$1"
After running the script:
./script.ksh X.*
XabcZ
XdefZ
After seeing OP's comment to match everything which starts from either small letter or capital letter run script in following style then.
./script.ksh [A-Za-z]+*
Could you please try following and let me know if this helps you.
cat script.ksh
value="$2"
sed "s/$/$value/" "$1"
After running script I am getting following output on terminal too.
./script.ksh Input_file Z
XabcZ
XdefZ
You could use sed -i option in above code in case you want to save output into Input_file itself too.

Using sed with regex to replace text on OSX and Linux

I am trying to replace some strings inside a file with sed using Regular Expressions. To complicate the matter, this is being done inside a Makefile script that needs to work on both osx and linux.
Specifically, within file.tex I want to replace
\subimport{chapters/}{xxx}
with
\subimport{chapters/}{xxx-yyy}
(xxx and yyy are just example text.)
Note, xxx could contain any letters, numbers, and _ (underscore) but really the regex can simply match anything inside the brackets. Sometimes there is some whitespace at the beginning of the line before \subimport....
The design of the string being searched for requires a lot of escaping (when searched for with regex) and I am guessing somewhere therein lies my error.
Here's what I've tried so far:
sed -i'.bak' -e 's/\\subimport\{chapters\/\}\{xxx\}/\\subimport\{chapters\/\}\{xxx-yyy\}/g' file.tex
# the -i'.bak' is required so SED works on OSX and Linux
rm -f file.tex.bak # because of this, we have to delete the .bak files after
This results in an error of RE error: invalid repetition count(s) when I build my Makefile that contains this script.
I thought part of my problem was that the -E option for sed was not available in the osx version of sed. It turns out, when using the -E option, fewer things should be escaped (see comments on my question).
POSIX-ly:
sed 's#^\(\\subimport{chapters/}{[[:alnum:]_]\+\)}$#\1-yyy}#'
# is used as the parameter separator for sed's s (Substitution)
\(\\subimport{chapters/}{[[:alnum:]_]\+\) is the captured group, containing everything required upto last }, preceeded by one or more alphabetics, digits, and underscore
In the replacement, the first captured group is followed by the required string, closed by a }
Example:
$ sed 's#^\(\\subimport{chapters/}{[[:alnum:]_]\+\)}$#\1-yyy}#' <<<'\subimport{chapters/}{foobar9}'
\subimport{chapters/}{foobar9-yyy}
$ sed 's#^\(\\subimport{chapters/}{[[:alnum:]_]\+\)}$#\1-yyy}#' <<<'\subimport{chapters/}{spamegg923}'
\subimport{chapters/}{spamegg923-yyy}
Here's is the version that ended up working for me.
sed -i.bak -E 's#^([[:blank:]]*\\subimport{chapters/}{[[:alnum:]_]+)}$#\1-yyy}#' file.tex
rm -f file.tex.bak
Much thanks go to #heemayl. Their answer is the better written one, it simply required some tweaking to get a version that worked for me.

Use of grep + sed based on a pattern file?

Here's the problem: i have ~35k files that might or might not contain one or more of the strings in a list of 300 lines containing a regex each
if I grep -rnwl 'C:\out\' --include=*.txt -E --file='comp.log' i see there are a few thousands of files that contain a match.
now how do i get sed to delete each line in these files containing the strings in comp.log used before?
edit: comp.log contains a simple regex in each line, but for the most part each string to be matched is unique
this is is an example of how it is structured:
server[0-9]\/files\/bobba fett.stw
[a-z]+ mochaccino
[2-9] CheeseCakes
...
etc. silly examples aside, it goes to show each line is unique save for a few variations so it shouldn't affect what i really want: see if any of these lines match the lines in the file being worked on. it's no different than 's/pattern/replacement/' except that i want to use the patterns in the file instead of inline.
Ok here's an update (S.O. gets inpatient if i don't declare the question answered after a few days)
after MUCH fiddling with the #Kenavoz/#Fischer approach, i found a totally different solution, but first things first.
creating a modified pattern list for sed to work with does work.
as well as #werkritter's approach of dropping sed altogether. (this one i find the most... err... "least convoluted" way around the problem).
I couldn't make #Mklement's answer work under windows/cygwin (it did work on under ubuntu, so...not sure what that means. figures.)
What ended up solving the problem in a more... long term, reusable form was a wonderful program pointed out by a colleage called PowerGrep. it really blows every other option out of the water. unfortunately it's windows only AND it's not free. (not even advertising here, the thing is not cheap, but it does solve the problem).
so considering #werkiter's reply was not a "proper" answer and i can't just choose both #Lars Fischer and #Kenavoz's answer as a solution (they complement each other), i am awarding #Kenavoz the tickmark for being first.
final thoughts: i was hoping for a simpler, universal and free solution but apparently there is not.
You can try this :
sed -f <(sed 's/^/\//g;s/$/\/d/g' comp.log) file > outputfile
All regex in comp.log are formatted to a sed address with a d command : /regex/d. This command deletes lines matching the patterns.
This internal sed is sent as a file (with process substitition) to the -f option of the external sed applied to file.
To delete just string matching the patterns (not all line) :
sed -f <(sed 's/^/s\//g;s/$/\/\/g/g' comp.log) file > outputfile
Update :
The command output is redirected to outputfile.
Some ideas but not a complete solution, as it requires some adopting to your script (not shown in the question).
I would convert comp.log into a sed script containing the necessary deletes:
cat comp.log | sed -r "s+(.*)+/\1/ d;+" > comp.sed`
That would make your example comp.sed look like:
/server[0-9]\/files\/bobba fett.stw/ d;
/[a-z]+ mochaccino/ d;
/[2-9] CheeseCakes/ d;
then I would apply the comp.sed script to each file reported by grep (With your -rnwl that would require some filtering to get the filename.):
sed -i.bak -f comp.sed $AFileReportedByGrep
If you have gnu sed, you can use -i inplace replacement creating a .bak backup, otherwise use piping to a temporary file
Both Kenavoz's answer and Lars Fischer's answer use the same ingenious approach:
transform the list of input regexes into a list of sed match-and-delete commands, passed as a file acting as the script to sed via -f.
To complement these answers with a single command that puts it all together, assuming you have GNU sed and your shell is bash, ksh, or zsh (to support <(...)):
find 'c:/out' -name '*.txt' -exec sed -i -r -f <(sed 's#.*#/\\<&\\>/d#' comp.log) {} +
find 'c:/out' -name '*.txt' matches all *.txt files in the subtree of dir. c:/out
-exec ... + passes as many matching files as will fit on a single command line to the specified command, typically resulting only in a single invocation.
sed -i updates the input files in-place (conceptually speaking - there are caveats); append a suffix (e.g., -i.bak) to save backups of the original files with that suffix.
sed -r activates support for extended regular expressions, which is what the input regexes are.
sed -f reads the script to execute from the specified filename, which in this case, as explained in Kenavoz's answer, uses a process substitution (<(...)) to make the enclosed sed command's output act like a [transient] file.
The s/// sed command - which uses alternative delimiter # to facilitate use of literal / - encloses each line from comp.log in /\<...\>/d to yield the desired deletion command; the enclosing of the input regex in \<...\>ensures matching as a word, as grep -w does.
This is the primary reason why GNU sed is required, because neither POSIX EREs (extended regular expressions) nor BSD/OSX sed support \< and \>.
However, you could make it work with BSD/OSX sed by replacing -r with -E, and \< / \> with [[:<:]] / [[:>:]]

Append a string to another with Regex tools

For the life of me, I cannot find out how to do this very simple procedure. I want to:
read a file, which consists only of number on each line
append characters before and after the number in the file.
For example, in the contents of the file:
1
2
3
would turn into:
file1.txt
file2.txt
file3.txt
I've tried this:
sed 's/[0-9]+/{file&.txt}/' file_name.txt
but nothing happened. I see snippets online that say use {0} or {/1} but I am having a hard time finding an explanation of what this means.
The end goal of this is to have xargs copy all the filenames in this text to another directory. I am sure there is probably another way to accomplish this without the text file I am doing here. If anyone has a simpler answer to that end goal, that would be nice to hear, although I also want to figure out how to use sed! Thanks.
Yes, this does work:
$cat file
1
2
3
$sed 's/.*/File&.txt/' file
File1.txt
File2.txt
File3.txt
In sed, + is not a special character, and literally means the + character. You need to escape it with backslash:
sed 's/[0-9]\+/file&.txt/' file_name.txt
Alternatively you can use the -r option, which adds several special characters (including +):
sed -r 's/[0-9]+/file&.txt/' file_name.txt
Not sure why you need sed for your end goal (or xargs for that matter). You can simply do:
while read -r name; do
cp "File${name}.txt" /path/to/copy
done < file_name.txt