global transformation in sed between two tokens - regex

My problem seems simple, but I found it surprisingly difficult to express as a sed command.
I need to 'underscorize' names between certain tokens in a file: if a line contains the tokens : and =, I need to convert 'multi word name' into 'multi_word_name' between those two tokens.
I know it is fairly easy to do in two steps (match everything between the tokens, then globally replace spaces with underscores), but I can't find a way to retain the unmatched part of the line so it can be written back to the file.

This might work for you (GNU sed, tr and bash):
sed 's/:[^=]*=/$(tr " " "_" <<<"&")/g;s/.*/echo "&"/e' file
or just using sed:
sed ':a;/:[^_=]*=/!b;s//\n&\n/;h;s/.*\n\(.*\)\n.*/\1/;y/ /_/;H;g;s/\n.*\n\(.*\)\n\(.*\)/\2\1/;ta' file

You could do it with two invocations of sed:
echo 'pre with space :multi word name=post with space' \
| sed 's/[:=]/\n&/g' | sed '/^:/s/ /_/g' | tr -d '\n'
To make this work on a file, you would do something like this (in bash):
while read; do
echo "$REPLY" | sed 's/[:=]/\n&/g' | sed '/^:/s/ /_/g' | tr -d '\n'
echo
done < infile
Although awk would be a more suitable tool for the task (this version assumes each line contains exactly one : followed by one =):
awk -F '[:=]' '{ gsub(" ", "_", $2); print $1 ":" $2 "=" $3 }' infile
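A single awk pass can also cope with lines that contain several :...= pairs while leaving the rest of the line untouched. The following is only a minimal sketch under that assumption; the function name underscorize is hypothetical, and it treats every span from a : to the next = as a name.

```shell
# Hypothetical helper: replace spaces with underscores inside every ":...=" span.
# Reads the named files, or standard input when no file is given.
underscorize() {
    awk '{
        out = ""
        rest = $0
        # match() sets RSTART/RLENGTH to the leftmost ":...=" span in rest
        while (match(rest, /:[^=]*=/)) {
            seg = substr(rest, RSTART, RLENGTH)
            gsub(/ /, "_", seg)                        # only the span is modified
            out = out substr(rest, 1, RSTART - 1) seg  # keep the text before the span as-is
            rest = substr(rest, RSTART + RLENGTH)      # continue after the span
        }
        print out rest
    }' "$@"
}
```

Usage would be underscorize infile, or piping a line into underscorize with no arguments.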

Related

shell multiline selection from word to character

.textexpandrc
[yoro] よろしくお願いします。
[ohayo] おはようございます。
元気ですか?
[otsu] お疲れさまでします。
Looking for
$ KEY=ohayo; awk "???" ~/.textexpandrc
おはようございます。
元気ですか?
awk or sed is fine, but I'd like to avoid using a mix of awk/sed/perl/tr/cut etc because I'm under the impression that awk is robust enough to handle this on its own.
The best I could find on my own was
$ KEY=ohayo; awk "/\[${KEY}/,/\[otsu/" ~/.textexpandrc | sed "s/\[${KEY}\] //" | grep -v otsu
おはようございます。
元気ですか?
But that requires knowing the next key in advance (not impossible, but ugly). Strangely, if I ask awk to search up to the next square bracket, it fails to select multiple lines:
$ KEY=ohayo; awk "/\[${KEY}/,/\[/" ~/.textexpandrc
[ohayo] おはようございます。
I am currently using a single-line parser solution, as follows:
#!/usr/bin/env bash
CONFIG=${HOME}/.textexpandrc
ALL_KEYS=$(sed 's/\].*/]/' ${CONFIG} | tr -d '[]')
KEY=$(echo $ALL_KEYS | rofi -sep ' ' -dmenu -p "autocomplete")
grep "\[${KEY}\]" $CONFIG | sed "s/\[${KEY}\] //" | xsel -ib # ← HERE
xdotool key ctrl+shift+v
If you set RS and FS to split on [ and ], this works quite well:
awk 'BEGIN { RS = "["; FS = "] " } $1 == key { printf "%s", $2 }' key=ohayo tmp.txt
You pass in the key you're searching for with key=... on the command line instead of interpolating a shell variable. This makes it much easier to write the awk script within single quotes.
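An alternative that keeps the default record separator is to track the current key line by line. This is a hedged sketch, not the answerer's method: the function name expand_key is hypothetical, and it assumes every entry starts with [key] followed by a space at the beginning of a line.

```shell
# Hypothetical helper: print the (possibly multi-line) text stored under a key.
# $1 = key, $2 = config file in the .textexpandrc format shown above.
expand_key() {
    awk -v key="$1" '
        /^\[[^]]*\] / {
            # a new entry starts: remember its key and strip the "[key] " prefix
            current = substr($0, 2, index($0, "]") - 2)
            sub(/^\[[^]]*\] /, "")
        }
        current == key { print }   # exact comparison, so "o" does not match "ohayo"
    ' "$2"
}
```

With the sample file, expand_key ohayo ~/.textexpandrc would print the two lines stored under [ohayo], without needing to know the next key.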

How to remove special characters like a single quote from a string?

I tried using sed, but it did not work.
Basically, I have a string like this.
Input:
'http://www.google.com/photos'
Required output:
http://www.google.com
I tried sed, but I could not escape the ' character.
What I did was:
sed 's/\'//' | sed 's/photos//'
The sed for photos worked, but the one for ' did not.
Please suggest a solution.
Escaping ' in sed is possible via a workaround:
sed 's/'"'"'//g'
# |^^^+--- bash string with the single quote inside
# | '--- return to sed string
# '------- leave sed string and go to bash
But for this job you should use tr:
tr -d "'"
Perl substitutions use almost the same syntax as sed. Perl is installed by default on most systems and behaves the same way on all of them, which helps portability:
$ echo "'http://www.google.com/photos'" |perl -pe "s#\'##g;s#(.*//.*/)(.*$)#\1#g"
http://www.google.com/
Mind that this solution keeps only the scheme and domain name, discarding everything that follows http://www.google.com/
If you want to do it with sed, you can use sed "s/'//g", as advised by Wiktor Stribiżew in the comments.
PS: I sometimes refer to special characters by their ASCII hex code, as listed in man ascii; for ' that is \x27.
So for sed you can do it:
$ echo "'http://www.google.com/photos'" |sed -r "s#'##g; s#(.*//.*/)(.*$)#\1#g;"
http://www.google.com/
# sed 's#\x27##g' will also remove the single quote, using its hex ASCII code (GNU sed).
$ echo "'http://www.google.com/photos'" |sed -r "s#'##g; s#(.*//.*)(/.*$)#\1#g;"
http://www.google.com #Without the last slash
If your string is stored in a variable, you can achieve above operations with pure bash, without the need of external tools like sed or perl like this:
$ a="'http://www.google.com/photos'" && a="${a:1:-1}" && echo "$a"
http://www.google.com/photos
# This removes 1st and last char of the variable , whatever this char is.
$ a="'http://www.google.com/photos'" && a="${a:1:-1}" && echo "${a%/*}"
http://www.google.com
#This deletes every char from the end of the string up to the first found slash /.
#If you need the last slash you can just add it to the echo manually like echo "${a%/*}/" -->http://www.google.com/
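The substring form ${a:1:-1} removes the first and last character unconditionally, and it is a bash feature. As a hedged alternative, the sketch below uses only POSIX parameter expansion and strips a single quote only when one is actually present; the function name strip_quotes_and_tail is hypothetical.

```shell
# Hypothetical helper: drop one surrounding single quote on each side (if present),
# then remove the last "/component" of the string.
strip_quotes_and_tail() {
    s=$1
    s=${s#\'}                  # remove a leading single quote, if any
    s=${s%\'}                  # remove a trailing single quote, if any
    printf '%s\n' "${s%/*}"    # shortest suffix match: cut from the last slash
}
```

Calling strip_quotes_and_tail "'http://www.google.com/photos'" gives http://www.google.com, and the result is the same when the input has no quotes around it.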
It's unclear whether the ' characters are actually around your string, but this should take care of it:
str="'http://www.google.com/photos'"
echo "$str" | sed s/\'//g | sed 's/\/photos//g'
Combined:
echo "$str" | sed -e "s/'//g" -e 's/\/photos//g'
Using tr:
echo "$str" | sed -e "s/\/photos//g" | tr -d \'
Result:
http://www.google.com
If the single quotes are not around your string it should work regardless.

Which characters to escape to match these in find regex expression in Bourne shell?

I am writing a small Bourne shell script that loads a string from a conf file. After some awk tricks, this string is used in a find command, as in the following example:
original string:
rx='~ #'
find command:
find -regex "^.*~$\|^.*#$"
EDIT: the original string comes from a conf file, so the problem arises when the string contains special characters such as "*" or ".". Example:
original string (with characters to escape):
rx='~ # $*'
EDIT2: I am trying to match any file whose name ends with one of the words in rx (separated by spaces). If rx="st ar", I want to match "test" and "bar". But if a word contains characters such as * or $, my regex doesn't work properly. So I would like to know which characters I have to escape to make it work.
Thanks! :)
As I understand it, you want to split your string on spaces, and match any substring from that split.
The irc.freenode.org #bash channel has a factoid providing a function for performing quoting, used below with some minor tweaks for POSIX compatibility:
requote() { printf '%s\n' "$1" | sed 's/[^^]/[&]/g; s/\^/\\^/g'; }
input_string='hello# cruel*world how~are~you'
output_string=$(printf '%s\n' "$input_string" | tr ' ' '\n' | {
    out_s=''
    while read -r line; do
        if [ -n "$out_s" ]; then
            out_s="${out_s}|$(requote "$line")"
        else
            out_s="$(requote "$line")"
        fi
    done
    printf '%s\n' "$out_s"
})
# ( ) and | are operators only in extended syntax; GNU find defaults to Emacs syntax
find . -regextype posix-extended -regex ".*(${output_string}).*"
OK, thanks to Charles Duffy, I understand that the right method is to wrap every character in "[]" to make it safe in a regex, except for '^', which is escaped as '\^'. Here is what I did, based on Mr. Duffy's answer.
So, I have an initial string and I want to match any of the words in it.
Initial string (Emacs temporary-file suffixes, plus examples for this trick):
rx=' ~ # oo ^ '
First, I trim the string like this:
rx=$(printf '%s\n' "$rx" | awk '{$1=$1};1')
==> rx='~ # oo ^'
Second, I apply Duffy's sed trick, with some changes for my case. I use $(...) rather than backticks, because backticks would turn \\ into \ before sed ever sees it:
rx=$(printf '%s\n' "$rx" | sed 's/[[:blank:]]/ /g; s/[^^ ]/[&]/g; s/\^/\\^/g')
==> rx='[~] [#] [o][o] \^'
Third, I apply a small awk command to build the regex:
rx=$(printf '%s\n' "$rx" | awk '{ gsub(" ", "$\\|^.*", $0); print "^.*"$0"$" }')
==> rx='^.*[~]$\|^.*[#]$\|^.*[o][o]$\|^.*\^$'
Finally, I just exec my find command like this:
find -regex "$rx"
et voilà !
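The three steps above can be folded into one function. This is a sketch under assumptions, not the poster's exact code: the name build_suffix_regex is hypothetical, and the joining step uses plain string concatenation in awk instead of a gsub replacement, to sidestep the backslash rules of replacement strings.

```shell
# Hypothetical helper: turn a space-separated word list into a regex that matches
# any path ending in one of the words (Emacs syntax, the GNU find default).
build_suffix_regex() {
    printf '%s\n' "$1" |
        # wrap every character except ^ and blanks in [ ], then escape ^ as \^
        sed 's/[^^[:blank:]]/[&]/g; s/\^/\\^/g' |
        awk '{
            re = ""
            # default field splitting also trims leading and trailing blanks
            for (i = 1; i <= NF; i++)
                re = re (i > 1 ? "\\|" : "") "^.*" $i "$"
            print re
        }'
}
```

It would then be used as find -regex "$(build_suffix_regex "$rx")".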
BTW, in practice I do it all in one pipeline:
rx=$(printf '%s\n' "$rx" | awk '{$1=$1};1' | sed 's/[[:blank:]]/ /g; s/[^^ ]/[&]/g; s/\^/\\^/g' | awk '{ gsub(" ", "$\\|^.*", $0); print "^.*"$0"$" }')

sed to remove single and double quotes at the same time

I am trying to remove single quotes and double quotes from a file. Can I do it in a single sed command?
I am trying :
sed 's/\"//g;s/\'//g' txt file
but get this error
`' ' is unmatched.
Please help.
Another possibility would be to use tr:
tr -d \'\" file
You cannot escape a single quote inside a pair of single quotes in the shell. Escaping double quotes inside double quotes is allowed, though. The following sed command should work:
sed "s/['\"]//g" file
Try this one instead :
sed -e 's|["'\'']||g' txt
To remove single quotes, simply use double quotes for the regex in sed:
sed -e "s/'//g" ./new_file.csv
You can use the commands below:
sed "s/'//g" file.txt > newfile.txt
sed 's/\"//g' newfile.txt > Required_file.txt
Required_file.txt is the final output.
I solved it (on CentOS 7) by removing the surrounding quotes altogether, like this:
sed -i s/\'//g file;sed -i s/\"//g file
Well, here's what I came up with.
First, I used Perl's ord() to find the character codes of the single and double quotes, and then used the $(...) syntax to pass them into an unquoted sed expression. I used XX and yy instead of empty strings. Obviously this is far from optimal (the two substitutions could probably be combined into one expression), so I encourage you to experiment with it.
There are several techniques for avoiding quoting problems; you can also put the sed expression in a separate file so that the shell does not interpret it. The ord() / chr() trick is also useful when dealing with single unreadable characters in output, e.g. UTF strings on a non-UTF console.
dtpwmbp:~ pwadas$ echo '"' | perl -pe 'print ord($_) . "\n";'
34
"
dtpwmbp:~ pwadas$ echo "'" | perl -pe 'print ord($_) . "\n";'
39
'
dtpwmbp:~ pwadas$ echo \'\"
'"
dtpwmbp:~ pwadas$ echo \'\" | sed -e s/$(perl -e 'print chr(34) . "\n"')/XX/g | sed -e s/$(perl -e 'print chr(39) . "\n"')/yy/g
yyXX
dtpwmbp:~ pwadas$
EDIT (note that this time both characters are replaced with the same string, "yy"). There might be shell utilities for translating characters to character codes and back, i.e. it should be possible to do this without Perl or another language interpreter.
dtpwmbp:~ pwadas$ echo \'\" | sed -e s/[`perl -e 'print chr(34) . chr(39)'`]/yy/g
yyyy
dtpwmbp:~ pwadas$
and here's yet another way in shell, perhaps even simpler
dtpwmbp:~ pwadas$ X="'"; Y='"' ; echo $X$Y; echo $X$Y | sed -e "s/$X/aa/g;s/$Y/bb/g"
'"
aabb
dtpwmbp:~ pwadas$
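On the remark above that this should be possible without Perl: the standard printf utility can go both ways. The sketch below is one hedged way to do it; the names sq, dq, char_code and strip_both_quotes are hypothetical.

```shell
# Character -> code: a leading quote in a numeric printf argument yields
# the code of the character that follows it.
char_code() {
    printf '%d\n' "'$1"
}

# Code -> character: octal escapes in the printf format string.
sq=$(printf '\047')   # single quote (octal 047 = decimal 39)
dq=$(printf '\042')   # double quote (octal 042 = decimal 34)

# Remove both kinds of quotes with one bracket expression.
strip_both_quotes() {
    sed "s/[$sq$dq]//g" "$@"
}
```

Because the quote characters live in variables, the sed expression itself contains no literal quotes that the shell could misread.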

Substitute a regex pattern using awk

I am trying to write a regex expression to replace one or more '+' symbols present in a file with a space. I tried the following:
echo This++++this+++is+not++done | awk '{ sub(/\++/, " "); print }'
This this+++is+not++done
Expected:
This this is not done
Any ideas why this did not work?
Use gsub which does global substitution:
echo This++++this+++is+not++done | awk '{gsub(/\++/," ");}1'
The sub function replaces only the first match; to replace all matches, use gsub.
Or the tr command:
echo This++++this+++is+not++done | tr -s '+' ' '
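The idiomatic awk solution is to let field splitting do the work: treat each run of + as the input field separator and rebuild the record with the output separator (the bracket form [+]+ avoids relying on how a given awk parses a leading +):
$ echo This++++this+++is+not++done | awk -F'[+]+' '{$1=$1}1'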
This this is not done
Try this
echo "This++++this+++is+not++done" | sed -re 's/(\+)+/ /g'
You could use sed too.
echo This++++this+++is+not++done | sed -e 's/+\{1,\}/ /g'
This matches one or more + and replaces it with a space.
For this case I recommend sed, which is powerful for substitution and has a short syntax.
Solution sed:
echo This++++this+++is+not++done | sed -En 's/\++/ /gp'
Result:
This this is not done
For awk:
You must use the gsub function for global substitution (more than one substitution per line).
The syntax:
gsub(regexp, replacement [, target])
If the third parameter is omitted, $0 is the target.
The target must be a variable, field, or array element. gsub modifies the target in place, overwriting each match with the replacement.
Solution awk:
echo This++++this+++is+not++done | awk 'gsub(/\++/," ")'
Result:
This this is not done
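To illustrate the optional third argument and the return value of gsub described above, here is a small sketch; the function name collapse_plus_in_field is hypothetical. It restricts the substitution to one field and reports how many substitutions were made.

```shell
# Hypothetical helper: collapse runs of "+" in field number $1 only,
# and prefix each line with the number of substitutions gsub performed.
collapse_plus_in_field() {
    awk -v n="$1" '{
        count = gsub(/[+]+/, " ", $n)   # third argument limits gsub to field n
        print count ": " $0
    }'
}
```

For example, printf '%s\n' 'a++b c+++d' | collapse_plus_in_field 2 prints 1: a++b c d, leaving the first field untouched.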
echo "This++++this+++is+not++done" | sed 's/++*/ /g'
If you have Node.js on your computer, you can do it by installing rexreplace
npm install -g rexreplace
and then run
rexreplace '\++' ' ' myfile.txt
Or, if you have several files in a directory named data, you can do
rexreplace '\++' ' ' data/*.txt