sed command to replace multiple spaces into single spaces - regex

I tried to replace multiple spaces in a file to single space using sed.
But it splits each and every character like below.
Please let me know what the problem is ...
$ cat test.txt
iiHi Hello Hi
this is loga
$
$ cat test.txt | tr [A-Z] [a-z]|sed -e "s/ */ /g"
i i h i h e l l o h i
t h i s i s l o g a

Your sed command does the wrong thing because it's matching on "zero or more spaces" which of course happens between each pair of characters! Instead of s/ */ /g you want s/ */ /g or s/ +/ /g.

Using tr, the -s option will squeeze consecutive chars to a single one:
tr -s '[:space:]' < test.txt
iiHi Hello Hi
this is loga
To downcase as well: tr -s '[:space:]' < test.txt | tr '[:upper:]' '[:lower:]'

sed 's/ \+/ /g' test.txt | tr [A-Z] [a-z]
or
sed 's/\s\+/ /g' test.txt | tr [A-Z] [a-z]
Good grief that was terse, because * matches zero or more it inserted a space after every character, you want + which matches one or more. Also I switched the order because in doing so you don't have to cat the file.

You can use awk to solve this:
awk '{$0=tolower($0);$1=$1}1' test.txt
iihi hello hi
this is loga

Maybe you can match the following regex for multiple spaces:
'\s+'
and replace with just one space as follows:
' '

Related

How to match a regex 1 to 3 times in a sed command?

Problem
I want to get any text that consists of 1 to three digits followed by a % but without the % using sed.
What I tried
So i guess the following regex should match the right pattern : [0-9]{1,3}%.
Then i can use this sed command to catch the three digits and only print them :
sed -nE 's/.*([0-9]{1,3})%.*/\1/p'
Example
However when i run it, it shows :
$ echo "100%" | sed -nE 's/.*([0-9]{1,3})%.*/\1/p'
0
instead of
100
Obviously, there's something wrong with my sed command and i think the problem comes from here :
[0-9]{1,3}
which apparently doesn't do what i want it to do.
edit:
Solution
The .* at the start of sed -nE 's/.*([0-9]{1,3})%.*/\1/p' "ate" the two first digits.
The right way to write it, according to Wicktor's answer, is :
sed -nE 's/(.*[^0-9])?([0-9]{1,3})%.*/\2/p'
The .* grabs all digits leaving just the last of the three digits in 100%.
Use
sed -nE 's/(.*[^0-9])?([0-9]{1,3})%.*/\2/p'
Details
(.*[^0-9])? - (Group 1) an optional sequence of any 0 or more chars up to the non-digit char including it
([0-9]{1,3}) - (Group 2) one to three digits
% - a % char
.* - the rest of the string.
The match is replaced with Group 2 contents, and that is the only value printed since n suppresses the default line output.
It will be easier to use a cut + grep option:
echo "abc 100%" | cut -d% -f1 | grep -oE '[0-9]{1,3}'
100
echo "100%" | cut -d% -f1 | grep -oE '[0-9]{1,3}'
100
Or else you may use this awk:
echo "100%" | awk 'match($0, /[0-9]{1,3}%/){print substr($0, RSTART, RLENGTH-1)}'
100
Or else if you have gnu grep then use -P (PCRE) option:
echo "abc 100%" | ggrep -oP '[0-9]{1,3}(?=%)'
100
This might work for you (GNU sed):
sed -En 's/.*\<([0-9]{1,3})%.*/\1/p' file
This is a filtering exercise, so use the -n option.
Use a back reference to capture 1 to 3 digits, followed by % and print the result if successful.
N.B. The \< ensures the digits start on a word boundary, \b could also be used. The -E option is employed to reduce the number of back slashes which would normally be necessary to quote (,),{ and } metacharacters.

sed Back-references used to replace

there is a string a_b_c_d. I want to replace _ with - in the string between a_ and _d. Below is processing.
echo "a_b_c_d" | sed -E 's/(.+)_(.+)_(.+)/\1`s/_/-/g \2`\3/g'
But it does not work. how can I reuse the \2 to replace its content?
Perl allows to use code in replacement section with e modifier
$ echo 'a_b_c_d' | perl -pe 's/a_\K.*(?=_d)/$&=~tr|_|-|r/e'
a_b-c_d
$ echo 'x_a_b_c_y' | perl -pe 's/x_\K.*(?=_y)/$&=~tr|_|-|r/e'
x_a-b-c_y
$&=~tr|_|-|r here $& is the matched portion, and tr is applied on that to replace _ to -
a_\K this will match a_ but won't be part of matched portion
(?=_d) positive lookahead to match _d but won't be part of matched portion
With sed (tested on GNU sed 4.2.2, not sure of syntax for other versions)
$ echo 'a_b_c_d' | sed -E ':a s/(a_.*)_(.*_d)/\1-\2/; ta'
a_b-c_d
$ echo 'x_a_b_c_y' | sed -E ':a s/(x_.*)_(.*_y)/\1-\2/; ta'
x_a-b-c_y
:a label a
s/(a_.*)_(.*_d)/\1-\2/ substitute one _ with - between a_ and _d
ta go to label a as long as the substitution succeeds
gnu sed:
$ sed -r 's/_/-/g;s/(^[^-]+)-/\1_/;s/-([^-]+$)/_\1/' <<<'x_a_b_c_y'
x_a-b-c_y
The idea is, replacing all _ by -, then restoring the ones you want to keep.
update
if the fields separated by _ contains -, we can make use ge of gnu sed:
sed -r 's/(^[^_]+_)(.*)(_[^_]+$)/echo "\1"$(echo "\2"\|sed "s|_|-|g")"\3"/ge'
For example we want ----_f-o-o_b-a-r_---- to be ----_f-o-o-b-a-r_----:
sed -r 's/(^[^_]+_)(.*)(_[^_]+$)/echo "\1"$(echo "\2"\|sed "s|_|-|g")"\3"/ge' <<<'----_f-o-o_b-a-r_----'
----_f-o-o-b-a-r_----
Following Kent's suggestion, and if you do not need a general solution, this works:
$ echo 'a_b_c+d_x' | tr '_' '-' | sed -E 's/^([a-z]+)-(.+)-([a-z]+)$/\1_\2_\3/g'
$ a_b-c+d_x
The character classes should be adjusted to match the leading and trailing parts of your input string. Fails, of course, if a or x contain the '-' character.

Grepping for overlapping pattern matches

This is what I'm running
grep -o ',[tcb],' <<< "r,t,c,q,c b,b,"
The output is
,t,
,b,
But I want to get
,t,
,c,
,b,
(I do not want the b without a preceding , or the c without a trailing , to be matched)
Because ,[tcb], should be found in 'r",t,"c,q b,b,' 'r,t",c,"q b,b,' and 'r,t,c,q b",b,"'
But it seems that when the , is included in the first pattern match then grep does not look for this in the second instance of the pattern match
Is there a way around this or is grep not meant to do this
You can use awk instead of grep for this with record separator as comma:
awk -v RS=, '/^[tcb]$/{print RS $0 RS}' <<< "r,t,c,q,c b,b,"
,t,
,c,
,b,
You can use grep with a Perl RE, which allows non-capturing look-behind and look-ahead patterns to extract letters surrounded by commas. You can then restore the separators just as you need them as by:
grep -o -P '(?<=,)[tcb](?=,)' <<< "r,t,c,q,c b,b,"|while read c; do echo ",$c,"; done
The awk solution is nice. I have another with sed+grep:
echo "r,t,c,q,c b,b," | sed "s/,/,,/g" | grep -o ',[tcb],'
,t,
,c,
,b,

How to remove " character within the double quotes on ubuntu?

I have a file with the condition like this :
"one","two","three"" four","five"
So I want to remove the quotes mark within the double quotes, so the output be like this :
"one","two","three four ","five"
How can I do that with awk function and regular expression on ubuntu? Thanks...
You can simply look for "" and replace it by an empty string.
Like:
sed -i 's/""//' *.txt
For example:
echo '"one","two","three"" four","five"' | sed 's/""//'
"one","two","three four","five"
sed is the right tool for this.
$ echo '"one","two","three"" four","five"' | sed 's/\([^,]\)"\+\([^,]\)/\1\2/g'
"one","two","three four","five"
The above regex captures the character (character not of a comma) which exits before and after to one or more double quotes. So this would match the double quotes which exists at the center.
OR
$ echo '"one","two","three"" four","five"' | sed -r 's/([^,])"+([^,])/\1\2/g'
"one","two","three four","five"
[^,] matches any character but not of a comma.
([^,]) matched character was captured into group 1. It's like aa temporary storage area.
"+ one or more +
([^,]) captures the following character which won't be a comma.
\1\2 all the matched chars are replaced with the characters stored inside group index 1 and the group index 2.
Update:
$ echo '"one","two","three" vg " "gfh" four","five"' | sed -r 's/([^,])"+([^,])/\1\2/g;s/([^,])"+([^,])/\1\2/g'
"one","two","three vg gfh four","five"
Using awk you can do:
s="one","two","three"" four","five"'
awk 'BEGIN{FS=OFS=","} {for (i=1; i<=NF; i++) gsub(/""/, "", $i)} 1' <<< "$s"
"one","two","three four","five"

having a regex replacing across lines, retain the newlines?

I'd like to have a substitute or print style command with a regex working across lines. And lines retained.
$ echo -e 'a\nb\nc\nd\ne\nf\ng' | tr -d '\n' | grep -or 'b.*f'
bcdef
or
$ echo -e 'a\nb\nc\nd\ne\nf\ng' | tr -d '\n' | sed -r 's|b(.*)f|y\1z|'
aycdezg
i'd like to use grep or sed because i'd like to know what people would've done before awk or perl ..
would they not have? was .* not available? had they no other equivalent?
to possibly modify some input with a regex that spans across lines, and print it to stdout or output to a file, retaining the lines.
This should do what you're looking for:
$ echo -e 'a\nb\nc\nd\ne\nf\ng' | sed ':a;$s/b\([^f]*\)f/y\1z/;N;ba'
a
y
c
d
e
z
g
It accumulates all the lines then does the replacement. It looks for the first "f". If you want it to look for the last "f", change [^f] to ..
Note that this may make use of features added to sed after AWK or Perl became available (AWK has been around a looong time).
Edit:
To do a multi-line grep requires only a little modification:
$ echo -e 'a\nb\nc\nd\ne\nf\ng' | sed ':a;$s/^[^b]*\(b[^f]*f\)[^f]*$/\1/;N;ba'
b
c
d
e
f
sed can match across newlines through the use of its N command. For example, the following sed command will replace bar followed a newline followed by foo with ###:
$ echo -e "foo\nbar\nbaz\nqux" | sed 'N;s/bar\nbaz/###/;P;D'
foo
###
qux
The N command will append the next input line to the current pattern space separated by an embedded newline (\n)
The P command will print the current pattern space up to and including the first embedded newline.
The D command will delete up to and including the first embedded newline in the pattern space. It will also start next cycle but skip reading from the input if there is still data in the pattern space.
Through the use of these 3 commands, you can essentially do any sort of s command replacement looking across N-lines.
Edit
If your question is how can I remove the need for tr in the two examples above and just use sed then here you go:
$ echo -e 'a\nb\nc\nd\ne\nf\ng' | sed ':a;N;$!ba;s/\n//g;y/ag/yz/'
ybcdefz
Proven tools to the rescue.
echo -e "foo\nbar\nbaz\nqux" | perl -lpe 'BEGIN{$/=""}s/foo\nbar/###/'