Remove all hyperlinks in a text file, linux scripting - regex

I am very new in scripting, but I want to learn it.
What I have to do is to remove all occurrences of something like http://* from a text file. I want to do it with sed command and regular expressions.
Here is what I have come up to so far:
sed 's/http:\/\/.*/ /' < input.txt > output.txt
This code replaces all the hyperlinks with a space. But the problem is that it also removes the rest of the line.
How can I fix this problem? I have tried adding space, "http://.* " or end of word "http://.*\>" or other tricks that I found in the internet, but they didn't work.
And is there a better way to do so instead of using sed?

Sed is a fine way to do this. Try changing your regex to s!http://[^[:space:]]*! !g.

Related

sed and regular expression: unexpected replacement pattern

I am trying to use a small bash script using sed to append a string, but I do not seem to be able to make it work.
I want to append a string to another string pattern:
Strings in input file:
Xabc
Xdef
Desired output:
XabcZ
XdefZ
Here is the script:
#!/bin/bash
instring="$2"
sed -r "s/${instring}/${instring}Z/g" $1
Where $1 is the file name and $2 is the string pattern I am looking for
Then I run the script:
bash script.test.sh test.txt X
output:
XZabc
XZdef
As expected.
but if I use regular expressions:
bash script.test.sh test.txt X...
All I get is:
X...Z
X...Z
So obviously it is not reading it correctly in the replacement part of the command. Smae thing if I use X[a-z09] (but there may be "_" in my strings, I want to include those as well). I had a look at several previous similar topics, but I do not seem able to implement any of the solutions correctly (bear with a newbie...). Thank you for your kind help.
EDIT: After receiving the answers from Glenn Jackman (accepted solution) and RavinderSingh13, I would like to clarify two important points for whoever is having a similar issue:
1) Glenn Jackman solution did not work because I needed to convert the text file from DOS to Unix. I did it with dos2unix , but for some reason did not work (maybe forgot to overwrite the output to the old file?). I later did it using sed -i 's/\r$//' test.txt ; that solved the issue, and Glenn's solution now works. having a dos-formatted text file has been the source of many trouble, for me at least.
2) I probably did not make clear that I only wanted to target specific lines in the input files; my example only has target strings, but the actual file has strings that I do not want to edit. That was probably the misunderstanding occurred with RavinderSingh13's script, which actually works, but targets every single line.
Hope this can help future readers. Thank you, Stackers, you saved the day once again :)
What you have (sed -r "s/${instring}/${instring}Z/g" $1) uses the variable as a pattern on the left-hand side and as plain text on the right-hand side.
What you want to do is:
sed -r "s/${instring}/&Z/g" $1
# ....................^
where the & marker is replaced by whatever text the pattern matched. In the documentation for The s Command:
[T]he replacement can contain unescaped & characters which reference the whole matched portion of the pattern space.
EDIT: In case you need to pass a regex to script then following may help, where my previous solution was only appending a character to last of the line.
cat script.ksh
value="$2"
sed "s/$value/&Z/" "$1"
After running the script:
./script.ksh X.*
XabcZ
XdefZ
After seeing OP's comment to match everything which starts from either small letter or capital letter run script in following style then.
./script.ksh [A-Za-z]+*
Could you please try following and let me know if this helps you.
cat script.ksh
value="$2"
sed "s/$/$value/" "$1"
After running script I am getting following output on terminal too.
./script.ksh Input_file Z
XabcZ
XdefZ
You could use sed -i option in above code in case you want to save output into Input_file itself too.

Bash sed how to mask brackets

I want to search for 'start') in the file /etc/init.d/fhem, and write code which I read from a textfile to that file after the statement above. At the moment I get a message that I have to close the bracket from 'start'). I think I have to mask it properly, but so far no luck with trying that. May someone give me the missing link?
CocConf=$(<COC.txt)#Reading Cod from File to insert in other file
sed -r "\'start\')/a $CocConf" /etc/init.d/fhem #Inserting said Code
You're missing the / before the regular expression. And there's no need to escape single quotes inside double quotes. But when you use extended regexps, you need to escape parentheses. The a command also requires a backslash after it, and the text to be added must be on the next line.
sed -r "/start\)/a\
$CocConf" /etc/init.d/fhem

Sed dynamic backreference replacement

I am trying to use sed for transforming wikitext into latex code. I am almost done, but I would like to automate the generation of the labels of the figures like this:
[[Image(mypicture.png)]]
... into:
\includegraphics{mypicture.png}\label{img-1}
For what I would like to keep using sed. The current regex and bash code I am using is the following:
__tex_includegraphics="\\\\includegraphics[width=0.95\\\\textwidth]{$__images_dir\/"
__tex_figure_pre="\\\\begin{figure}[H]\\\\centering$__tex_includegraphics"
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"\
... but I cannot make that counter to be increased. Any ideas?
Within a more general perspective, my question would be the following: can I use a backreference in sed for creating a replacement that is different for each of the matches of sed? This is, each time sed matches the pattern, can I use \1 as the input of a function and use the result of this function as the replacement?
I know it is a tricky question and I might have to use AWK for this. However, if somebody has a solution, I would appreciate his or her help.
This might work for you (GNU sed):
sed -r ':a;/'"$PATTERN"'/{x;/./s/.*/echo $((&+1))/e;/./!s/^/1/;x;G;s/'"$PATTERN"'(.*)\n(.*)/'"$PRE"'\2'"$POST"'\1/;ba}' file
This looks for a PATTERN contained in a shell variable and if not presents prints the current line. If the pattern is present it increments or primes the counter in the hold space and then appends said counter to the current line. The pattern is then replaced using the shell variables PRE and POST and counter. Lastly the current line is checked for further cases of the pattern and the procedure repeated if necessary.
You could read the file line-by-line using shell features, and use a separate sed command for each line. Something like
exec 0<input_file
while read line; do
echo $line | sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"
__images_counter=$(expr $__images_counter + 1)
done
(This won't work if there are multiple matches in a line, though.)
For the second part, my best idea is to run sed or grep to find what is being matched, and then run sed again with the value of the function of the matched text substituted into the command.

Remove all spaces before a specific symbol/character in a text file using notepad++

Okay I'm trying to set up a set of tag lists for a translation script but have encountered a problem I need to remove the spaces in one of the languages using regex
the lines are set up like "JP JP":"ENG ENG" and I would like it to be "JPJP":"ENG ENG"
I'm new when it comes to regex so I'm out of ideas of what to try
Thanks!
Try this, this expression uses positive lookahead
\s+(?=\w+":")
You can use Unix utility sed
sed -i 's/old_name/new_name/g' your-file
In your example this should be:
sed -i 's/"JP JP":"ENG ENG"/"JPJP":"ENGENG"/g' your-file

sed add text around regex

I would like to be able to go:
sed "s/^\(\w+\)$/leftside\1rightside/"
and have the group matched by (\w+\) appear in between 'leftside' and 'rightside'.
But it seems like I have to pipe it twice, one for the left of the text, another time for the right. If anyone knows a way to do it in one pass, I'd appreciate it.
The reason it's not working is that you probably specify the wrong regex. In your case, text will be added in the end and beginning of the line only if it consists only of word characters (given that your version of sed supports the \w notation). Also you didn't escape the + which you should do if not using the -r option.
Try starting with sed "s/^\(.*\)$/leftside\1rightside/" or just sed "s/.*/leftside&rightside/" and working from that.