sed and regular expression: unexpected replacement pattern

I am trying to use a small bash script using sed to append a string, but I do not seem to be able to make it work.
I want to append a string to another string pattern:
Strings in input file:
Xabc
Xdef
Desired output:
XabcZ
XdefZ
Here is the script:
#!/bin/bash
instring="$2"
sed -r "s/${instring}/${instring}Z/g" $1
Where $1 is the file name and $2 is the string pattern I am looking for
Then I run the script:
bash script.test.sh test.txt X
output:
XZabc
XZdef
As expected.
but if I use regular expressions:
bash script.test.sh test.txt X...
All I get is:
X...Z
X...Z
So obviously it is not reading it correctly in the replacement part of the command. Same thing if I use X[a-z09] (but there may be "_" in my strings, and I want to include those as well). I had a look at several previous similar topics, but I do not seem to be able to implement any of the solutions correctly (bear with a newbie...). Thank you for your kind help.
EDIT: After receiving the answers from Glenn Jackman (accepted solution) and RavinderSingh13, I would like to clarify two important points for whoever is having a similar issue:
1) Glenn Jackman's solution did not work at first because I needed to convert the text file from DOS to Unix line endings. I tried dos2unix, but for some reason it did not work (maybe I forgot to write the output back to the original file?). I later did it using sed -i 's/\r$//' test.txt ; that solved the issue, and Glenn's solution now works. Having a DOS-formatted text file has been the source of a lot of trouble, for me at least.
2) I probably did not make clear that I only wanted to target specific lines in the input files; my example only has target strings, but the actual file has strings that I do not want to edit. That was probably the source of the misunderstanding with RavinderSingh13's script, which actually works, but targets every single line.
Hope this can help future readers. Thank you, Stackers, you saved the day once again :)

What you have (sed -r "s/${instring}/${instring}Z/g" $1) uses the variable as a pattern on the left-hand side and as plain text on the right-hand side.
What you want to do is:
sed -r "s/${instring}/&Z/g" $1
# ....................^
where the & marker is replaced by whatever text the pattern matched. In the documentation for The s Command:
[T]he replacement can contain unescaped & characters which reference the whole matched portion of the pattern space.
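To see the difference between the two forms, here is a minimal sketch. The function name append_to_match and the sample file are made up for illustration, and the sketch assumes that neither the pattern nor the suffix contains a /, & or backslash that would need extra escaping inside the sed expression.

```shell
# Hypothetical helper: append a suffix to every match of an extended regex.
# Usage: append_to_match FILE PATTERN SUFFIX
append_to_match() {
  # & in the replacement stands for whatever text the pattern matched,
  # so the regex itself is never copied into the output.
  sed -E "s/$2/&$3/g" "$1"
}

# Sample input, including one line that should stay untouched.
sample=$(mktemp)
printf '%s\n' Xabc Xdef_1 other > "$sample"

append_to_match "$sample" 'X[a-z0-9_]*' Z
```

With the sample above this should print XabcZ, Xdef_1Z and other. Quoting the pattern on the command line matters: an unquoted X* or X[a-z] may be expanded by the shell against file names before the script ever sees it.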

EDIT: In case you need to pass a regex to the script, the following may help; my previous solution only appended a character to the end of each line.
cat script.ksh
value="$2"
sed "s/$value/&Z/" "$1"
After running the script:
./script.ksh Input_file 'X.*'
XabcZ
XdefZ
After seeing the OP's comment about matching everything that starts with either a lowercase or an uppercase letter, run the script in the following style instead.
./script.ksh Input_file '[A-Za-z]+*'
Could you please try the following and let me know if this helps you.
cat script.ksh
value="$2"
sed "s/$/$value/" "$1"
After running the script I get the following output on the terminal:
./script.ksh Input_file Z
XabcZ
XdefZ
You could use sed's -i option in the above code if you want to save the output into Input_file itself.


Sed doesn't replace a pattern that is understood by gedit

I need to delete some content that is followed by 5 hyphens (that are in separate line) from 1000 files. Basically it looks like this:
SOME CONTENT
-----
SOME CONTENT TO BE DELETED WITH 5 HYPHENS ABOVE
I've tried to do that with this solution, but it didn't work for me:
this command, sed '/-----/,$ d' *.txt -i, can't be used because some of these texts have lines with more than 5 hyphens;
this command, sed '/^-----$/,$ d' *.txt -i, left all the files unchanged.
So I figured out that it might be something about the "^" and "$" characters, but to be honest I am a newbie at both sed and RegEx, and I don't know what the problem is.
I've also found out that this RegEx — ^-{5}$(\s|\S)*$ — is good for capturing only these blocks which start exactly with 5 hyphens, but putting it into sed command gives no effect (both hyphens and text after them stay, where they were).
There's something I don't understand about sed probably, because when I use the above expression with gedit's Find&Replace, it works flawlessly. But I don't want to open, change and save 1000 files manually.
I am asking this question kinda again, because the given solution (the above link) didn't help me.
The first command I've posted (sed /-----/,$ d' *.txt -i) also resulted in deleting full content of some files, for instance a file that had 5 hyphens, new line with a single space (and no more text) at the bottom of it:
SOME CONTENT
-----
single space
EDIT:
Yes, I forgot about ' here, but in the Terminal I used these commands with it.
Yes, these files end with \n or \r. Is there a solution for it?
I think you want this:
sed '/^-\{5\}/,$ d' *.txt -i
Note that { and } need escaping.
$ sed '/^-----/p;q' file
SOME CONTENT
or
$ sed -E '/^-{5}/p;q' file
SOME CONTENT
Are you just trying to delete from ----- on its own line (which may end with \r) to the end of the file? That'd be:
awk '/^-----\r?$/{exit} {print}' file
The above will work using all awks in all shells in all UNIX systems.
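A minimal sketch of applying that idea to many files in place, assuming the goal is to drop the five-hyphen line and everything after it. The function name truncate_at_rule is made up for illustration, and it would be sensible to try it on copies of the files first.

```shell
# Hypothetical helper: cut each given file at the first line that is exactly
# five hyphens (optionally followed by a carriage return).
truncate_at_rule() {
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    # exit before printing, so the hyphen line itself is dropped as well;
    # a line with more than five hyphens does not match and is kept
    if awk '/^-----\r?$/ { exit } { print }' "$f" > "$tmp"; then
      cat "$tmp" > "$f"   # overwrite in place, keeping the file's permissions
    fi
    rm -f "$tmp"
  done
}
```

It could then be called as truncate_at_rule *.txt. The optional \r is what makes the anchored pattern match in files with DOS line endings, which is a likely reason the '/^-----$/' version in the question left the files unchanged.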

Linux script to parse each line, check the regex and modify the line

I'm trying to write a linux bash script that takes in input a csv file with lines written in the following format (something can be blank):
something,something,,number,something,something,something,something,something,something,,,
something,something.something,,number,something,something,something,something,something,something,,,
and I need the following output format (if a line contains a . it has to split the two substrings into substring1,substring2 and remove one , character; otherwise do nothing)
something,something,,number,something,something,something,something,something,something,,,
something,something,something,number,something,something,something,something,something,something,,,
I tried to parse each line of the file and check whether it matches a regex, but the command starts a never-ending loop (I don't know why), and moreover I don't know how to split the substring to get substring1,substring2 as output
for f in /filepath/filename.csv
do
while read p; do
if [[$p == .\..]] ; then echo $p; fi
done <$f
done
Thanks in advance!
I can't provide you with a working code at the moment but a piece of quick advice:
1. Try with tool called sed
2. Learn about "capture groups" for regex to get info on how to divide the text based on expressions.
To separate strings AWK will be useful
echo "Hello.world" | awk -F"." '{print "STR1="$1", STR2="$2 }'
Hope it will help.
As your task is more about transforming independent lines of text than about parsing the fields of a CSV file, sed is indeed the right tool.
Learning to use sed properly, even for the most basic tasks, is synonymous with learning regular expressions. The following sed invocation transforms your input sample into your expected output:
sed 's/\.\([^,]*\),/,\1/g' input.csv >output.csv
In the above example, s/// is the replacement command.
From the manpage:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful,
replace that portion matched with replacement. [...]
Explaining the regexp and replacement of the above command is probably out of the scope for the question, so I'll finish my answer here... Hope it helps!
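For readers who do want the pattern unpacked, here is a small annotated sketch of the same substitution. The function name split_dotted_field is made up for illustration; the sed expression is the one from the answer above.

```shell
# Hypothetical wrapper around the substitution shown above.
split_dotted_field() {
  # \.          a literal dot
  # \([^,]*\)   capture group 1: everything up to the next comma
  # ,           the comma that follows (the separator of the empty field)
  # ,\1         replacement: a comma, then the captured text; the dot
  #             and the trailing comma are dropped
  sed 's/\.\([^,]*\),/,\1/g' "$@"
}

printf '%s\n' \
  'something,something,,number,something' \
  'something,something.something,,number,something' |
  split_dotted_field
```

The first line has no dot and passes through unchanged; in the second, .something, becomes ,something. Note that the g flag applies this to every dot on a line, so a field such as a decimal number would be split as well.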
Ok, i managed to use regexp, but the following command seems not working again:
sed '\([^,]*\),\([^,]*\)\.\([^,]*\),,\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),/\1,\2,\3,\4,\5,\6,\7,\8,\9,\10,\11,\12,'
sed: -e expression #1, char 125: unknown command: `\'

Sed dynamic backreference replacement

I am trying to use sed for transforming wikitext into latex code. I am almost done, but I would like to automate the generation of the labels of the figures like this:
[[Image(mypicture.png)]]
... into:
\includegraphics{mypicture.png}\label{img-1}
I would like to keep using sed for this. The current regex and bash code I am using is the following:
__tex_includegraphics="\\\\includegraphics[width=0.95\\\\textwidth]{$__images_dir\/"
__tex_figure_pre="\\\\begin{figure}[H]\\\\centering$__tex_includegraphics"
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"\
... but I cannot get that counter to increase. Any ideas?
From a more general perspective, my question would be the following: can I use a backreference in sed to create a replacement that is different for each match? That is, each time sed matches the pattern, can I use \1 as the input of a function and use the result of this function as the replacement?
I know it is a tricky question and I might have to use AWK for this. However, if somebody has a solution, I would appreciate his or her help.
This might work for you (GNU sed):
sed -r ':a;/'"$PATTERN"'/{x;/./s/.*/echo $((&+1))/e;/./!s/^/1/;x;G;s/'"$PATTERN"'(.*)\n(.*)/'"$PRE"'\2'"$POST"'\1/;ba}' file
This looks for a PATTERN contained in a shell variable and, if it is not present, prints the current line unchanged. If the pattern is present, it increments (or initialises) the counter in the hold space and then appends that counter to the current line. The pattern is then replaced using the shell variables PRE and POST and the counter. Lastly, the current line is checked for further occurrences of the pattern and the procedure is repeated if necessary.
You could read the file line-by-line using shell features, and use a separate sed command for each line. Something like
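while IFS= read -r line; do
# Rebuild the suffix on every pass so it picks up the current counter value
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
printf '%s\n' "$line" | sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"
# Advance the counter only when the line actually contained an image
case $line in *'[[Image('*) __images_counter=$((__images_counter + 1)) ;; esac
done <input_file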
(This won't work if there are multiple matches in a line, though.)
For the second part, my best idea is to run sed or grep to find what is being matched, and then run sed again with the value of the function of the matched text substituted into the command.
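Since the question mentions AWK as a fallback, here is a minimal sketch of that route. It swaps sed for awk, which can keep a running counter and handle several matches per line. The function name number_images is made up for illustration, and the output is the simplified \includegraphics{...}\label{img-N} form from the question rather than the full figure environment.

```shell
# Hypothetical helper: replace each [[Image(name)]] with an \includegraphics
# line whose label number increases with every match.
number_images() {
  awk '
    {
      out = ""
      # Consume the line left to right, one match at a time.
      while (match($0, /\[\[Image\([^)]*\)\]\]/)) {
        # "[[Image(" is 8 characters and ")]]" is 3, so the name sits between.
        name = substr($0, RSTART + 8, RLENGTH - 11)
        out = out substr($0, 1, RSTART - 1) \
              "\\includegraphics{" name "}\\label{img-" (++n) "}"
        $0 = substr($0, RSTART + RLENGTH)
      }
      print out $0
    }
  ' "$@"
}
```

It reads the named files, or standard input when none are given, for example number_images page.wiki > page.tex (both file names are placeholders).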

How can I get regex/sed to recognize the line the search string is on?

I'm brand new to regex. I am trying to write a script to comment out lines in a file, so that when we retire a network computer we can remove it from our administrative files (rdist, etc) without having to comment them out by hand. What I have so far is
#!/bin/bash
echo $*
NAMES=$*
FILES="/foo/testfile1
/foo/testfile2"
for name in $NAMES
do
sed -i "s/${name}/#&/g" $FILES
done
exit 0
This works when the testfiles have the target string appear at the beginning of the line, but not if the string is somewhere in the middle. How can I tell sed or regex to insert a hash at the beginning of the line that the string is found on?
(I've been reading my way through a bunch of tutorials online, but the closest thing to what I want seems to be the caret ^. What I'm getting from the explanation is that in multiline mode, it only returns instances of the string that are located at the beginning of the line.)
I'm working on RedHat 5.5, using gedit 2.8.1 as my text editor and sed 4.1.2.
Thank you in advance for your help!
The script below will take in the passed in arguments and look for them as whole words. i.e if an argument is foo then blah foo bar will be commented out but blahfoo bar will not. I also added a bit of code so that if a line matches multiple arguments, you will still only get one # at the beginning of the line.
#!/bin/bash
FILES="./test1 ./test2"
for name; do
sed -i "/\<$name\>/s/^#*/#/" $FILES
done
Stealing the structure from SiegeX and simplifying the sed program:
#!/bin/bash
FILES="./test1 ./test2"
for name; do
sed -i "s/^.*$name.*$/#&/" $FILES
done
The idea is that rather than using a pattern to select and then an s to edit, you use the s pattern to do both - recognise a complete line that contains the target name, and replace it with a commented-out version.
You could do this more elegantly by merging the names into one big regular expression; that would let sed make one pass rather than N. That's easy enough that I leave it as an exercise to the reader ...
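One possible take on that exercise, as a sketch only. The function name comment_out_names is made up for illustration, the names are assumed to contain no regex metacharacters, and \< and \> are the GNU sed word-boundary extensions already used earlier in this thread. It writes to standard output rather than editing in place.

```shell
# Hypothetical helper: comment out every line that mentions any of the names.
# Usage: comment_out_names FILE NAME...
comment_out_names() {
  target=$1
  shift
  # Join the remaining arguments with | to build an alternation: host1|host2
  alternation=$(printf '%s\n' "$@" | paste -sd '|' -)
  # One pass over the file; s/^#*/#/ leaves exactly one # at the start,
  # even on lines that were already commented out.
  sed -E "/\<($alternation)\>/s/^#*/#/" "$target"
}
```

For example, comment_out_names ./test1 host1 host2 prints ./test1 with every line containing host1 or host2 as a whole word commented out; adding -i to the sed call would apply the change to the file itself.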

Perl regex: replace all backslashes with double-backslashes

Within a set of large files, I need to replace all occurrences of "\" with "\\". I'd like to use Perl for this purpose. Right now, I have the following:
perl -spi.bak -e '/s/\\/\\\\/gm' inputFile
This command was suggested to me, but it results in no change to inputFile (except an updated timestamp). Thinking that the problem might be that the "\"s were not surrounded by blanks, I tried
perl -spi.bak -e '/s/.\\./\\\\/gm' inputFile
Again, this had no effect on the file. Finally, I thought I might be missing a semicolon, so I tried:
perl -spi.bak -e '/s/.\\./\\\\/gm;' inputFile
This also has no effect. I know that my file contains "\"s, for example in the following line:
("C:\WINDOWS\system32\iac25_32.ax","Indeo audio)
I'm not sure whether there is a problem with the regex, or if something is wrong with the way I'm invoking Perl. I have a basic understanding of regexes, but I'm an absolute beginner when it comes to Perl.
Is there anything obviously wrong here? One thing I notice is that the command returns quite quickly, despite the fact that inputFile is ~10MB in size.
The hard part with handling backslashes in command lines is knowing how many processes are going to manipulate the command line - and what their quoting rules are.
On Unix, under any shell, the first command line you show would work.
You appear to be on Windows, and there, you have the DOS command 'shell' to deal with.
I would put the replacement into a file and pass that to Perl:
#!/bin/perl -spi.bak
s/\\/\\\\/g;
That should do the trick - save as 'subber.pl' and then run:
perl subber.pl file1 ...
How about this? It should replace every \ with two \s.
s/\\/\\\\/g
perl -pi -e 's/\\/\\\\/g' inputfile
will replace all of them in one file
this
s/\\/\\\\/g
works for me
You've got a stray / in front of the s at the beginning of the substitution.
don't use
.\\.
otherwise it will trash whatever's before and after the \ in the file
perl -spi.bak -e 's/.\\./\\\\/gm;' inputFile
maybe?
Why did you type that leading /?
"You appear to be on Windows, and there, you have the DOS command 'shell' to deal with."
Hopefully I am not splitting hairs, but Windows hasn't come with DOS for a long time now. I think ME (circa 1999/2000) was the last version that still came with DOS or was built on DOS. The "command" shell has been provided by cmd.exe since XP (a sort of DOS simulation), but the effects of running a Perl one-liner in a command shell might still be the same as running it in a DOS shell. Maybe someone can verify that.