Not able to write a proper regular expression to remove multiple spaces - regex

$ echo "Anirudh   Tomer" | sed 's/ +/ /g'
Anirudh   Tomer
I was expecting it to collapse the 3 spaces between Anirudh and Tomer and give me "Anirudh Tomer" as the result.
I am a beginner.
Thanks in advance for the help.

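You need to enable sed's extended regexp support with the -r flag (-E on BSD sed, and also accepted by GNU sed).
echo "Anirudh   Tomer" | sed -r 's/ +/ /g'
In extended regular expressions, the ?, + and | metacharacters must not be escaped (see wikipedia). In basic regular expressions, which sed uses by default, an unescaped + is a literal plus sign, so ' +' only matches a space followed by a '+'. The * metacharacter works in both modes because it belongs to the basic regular expressions.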
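A minimal sketch of the difference, assuming a sed that accepts -E (current GNU and BSD versions do). The function names squeeze_ere and squeeze_bre are hypothetical helpers used only for illustration:

```shell
# ERE: + means "one or more of the preceding item".
squeeze_ere() { sed -E 's/ +/ /g'; }

# POSIX BRE has no + quantifier; "one space, then zero or more spaces"
# expresses the same thing with *.
squeeze_bre() { sed 's/  */ /g'; }

printf 'Anirudh   Tomer\n' | squeeze_ere
printf 'Anirudh   Tomer\n' | squeeze_bre
```

Both calls should print the name with a single space. The BRE spelling is handy when the script has to run on a sed whose extended-regexp flag is unknown.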

As in Vim's regex syntax, you need to escape the + quantifier with a backslash (in sed, \+ is a GNU extension to basic regular expressions):
sed 's/ \+/ /g'

echo "Anirudh   Tomer" | tr -s ' '

sed Back-references used to replace

There is a string a_b_c_d. I want to replace _ with - in the part of the string between a_ and _d. Here is my attempt:
echo "a_b_c_d" | sed -E 's/(.+)_(.+)_(.+)/\1`s/_/-/g \2`\3/g'
But it does not work. How can I reuse \2 and run a replacement on its content?
Perl allows code in the replacement section with the e modifier:
$ echo 'a_b_c_d' | perl -pe 's/a_\K.*(?=_d)/$&=~tr|_|-|r/e'
a_b-c_d
$ echo 'x_a_b_c_y' | perl -pe 's/x_\K.*(?=_y)/$&=~tr|_|-|r/e'
x_a-b-c_y
$&=~tr|_|-|r: $& is the matched portion, and tr is applied to it to replace _ with - (the r flag makes tr return the modified copy)
a_\K matches a_ but keeps it out of the matched portion
(?=_d) is a positive lookahead that requires _d without making it part of the matched portion
With sed (tested on GNU sed 4.2.2, not sure of syntax for other versions)
$ echo 'a_b_c_d' | sed -E ':a s/(a_.*)_(.*_d)/\1-\2/; ta'
a_b-c_d
$ echo 'x_a_b_c_y' | sed -E ':a s/(x_.*)_(.*_y)/\1-\2/; ta'
x_a-b-c_y
:a label a
s/(a_.*)_(.*_d)/\1-\2/ substitute one _ with - between a_ and _d
ta go to label a as long as the substitution succeeds
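The loop above can be wrapped in a small function. This is only a sketch: dash_between is a hypothetical helper name, and it assumes the first and last fields contain no regex metacharacters and no "/". The script is split over several lines because some sed implementations require a newline after a label.

```shell
# dash_between FIRST LAST
# Reads lines on stdin and turns every _ strictly between FIRST_ and _LAST
# into -, one underscore per loop iteration.
dash_between() {
    sed -E ":a
s/(${1}_.*)_(.*_${2})/\\1-\\2/
ta"
}

echo 'a_b_c_d' | dash_between a d
echo 'x_a_b_c_y' | dash_between x y
```

The two calls correspond to the two examples in the answer and should give the same results as the one-liners.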
GNU sed:
$ sed -r 's/_/-/g;s/(^[^-]+)-/\1_/;s/-([^-]+$)/_\1/' <<<'x_a_b_c_y'
x_a-b-c_y
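The idea is to replace every _ with -, then restore the ones you want to keep.
Update:
If the fields separated by _ can themselves contain -, we can make use of the g and e flags of GNU sed (e runs the result of the substitution as a shell command, so only use it on trusted input):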
sed -r 's/(^[^_]+_)(.*)(_[^_]+$)/echo "\1"$(echo "\2"\|sed "s|_|-|g")"\3"/ge'
For example we want ----_f-o-o_b-a-r_---- to be ----_f-o-o-b-a-r_----:
sed -r 's/(^[^_]+_)(.*)(_[^_]+$)/echo "\1"$(echo "\2"\|sed "s|_|-|g")"\3"/ge' <<<'----_f-o-o_b-a-r_----'
----_f-o-o-b-a-r_----
Following Kent's suggestion, and if you do not need a general solution, this works:
$ echo 'a_b_c+d_x' | tr '_' '-' | sed -E 's/^([a-z]+)-(.+)-([a-z]+)$/\1_\2_\3/g'
a_b-c+d_x
The character classes should be adjusted to match the leading and trailing parts of your input string. Fails, of course, if a or x contain the '-' character.

What's wrong with the regex below

What is wrong with the regex below on Unix?
echo AB345678 | sed -n '/^\([a-zA-Z]\{2\}[0-9]\{6\}|[0-9]\{8\}\)$/p'
echo 12345678 | sed -n '/^\([a-zA-Z]\{2\}[0-9]\{6\}|[0-9]\{8\}\)$/p'
I am not getting any output :(
Why doesn't the string I echoed match my regex?
What's wrong with my regex?
In GNU sed's BRE syntax, the alternation operator must be written as an escaped pipe \| (just like the grouping operators \( and \)); an unescaped | matches a literal pipe character:
echo "AB345678" | sed -n '/^\([a-zA-Z]\{2\}[0-9]\{6\}\|[0-9]\{8\}\)$/p'
                                                     ^^
See an online demo.
In a more complicated expression, you can add -r to the sed options instead of escaping the special characters.
From sed manual:
-r, --regexp-extended
use extended regular expressions in the script.
Answer:
echo AB345678 | sed -nr '/^([a-zA-Z]{2}[0-9]{6}|[0-9]{8})$/p'
                                               ^
echo 12345678 | sed -nr '/^([a-zA-Z]{2}[0-9]{6}|[0-9]{8})$/p'
                                               ^
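For comparison, here is a sketch of the same check written two ways. The function names match_id_ere and match_id_bre are hypothetical, and -E is assumed to be available (it is the spelling that both GNU and BSD sed accept for extended regexps).

```shell
# ERE: |, ( ) and { } are operators without backslashes.
match_id_ere() {
    printf '%s\n' "$1" | sed -nE '/^([a-zA-Z]{2}[0-9]{6}|[0-9]{8})$/p'
}

# POSIX BRE defines no alternation operator, so a portable spelling
# tests each alternative in its own expression. The two patterns cannot
# both match the same line, so nothing is printed twice.
match_id_bre() {
    printf '%s\n' "$1" |
        sed -n -e '/^[a-zA-Z]\{2\}[0-9]\{6\}$/p' -e '/^[0-9]\{8\}$/p'
}

match_id_ere AB345678
match_id_bre 12345678
match_id_ere A2345678   # prints nothing: one letter, seven digits
```

Each function echoes its argument back only when it has the expected shape, so an empty result means "no match".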

How to ignore word delimiters in sed

So I have a bash script which is working perfectly except for one issue with sed.
full=$(echo $full | sed -e 's/\b'$first'\b/ /' -e 's/  / /g')
This would work great, except there are instances where the variable $first is preceded immediately by a period rather than a blank space. In those instances, I do not want the word removed.
Example:
full="apple.orange orange.banana apple.banana banana";first="banana"
full=$(echo $full | sed -e 's/\b'$first'\b/ /' -e 's/  / /g')
echo $first $full;
I want to only remove the whole word banana, and not make any change to orange.banana or apple.banana, so how can I get sed to ignore the dot as a delimiter?
You want "banana" preceded by beginning-of-string or a blank, and followed by a blank or end-of-string:
$ sed -r 's/(^|[[:blank:]])'"$first"'([[:blank:]]|$)/ /g' <<< "$full"
apple.orange orange.banana apple.banana
Note the use of the -r option (for BSD sed, use -E), which enables extended regular expressions and lets us omit a lot of backslashes.
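The same idea as a reusable function, as a rough sketch. remove_word is a hypothetical helper; it assumes the word contains no regex metacharacters and no "/", and it additionally trims the leading or trailing blank that the substitution can leave behind.

```shell
# remove_word WORD TEXT
# Removes WORD only where it is delimited by blanks or the string edges,
# so "orange.banana" is left alone when WORD is "banana".
remove_word() {
    printf '%s\n' "$2" |
        sed -E -e "s/(^|[[:blank:]])${1}([[:blank:]]|\$)/ /g" \
               -e 's/^[[:blank:]]+//' -e 's/[[:blank:]]+$//'
}

full="apple.orange orange.banana apple.banana banana"
first="banana"
remove_word "$first" "$full"
```

One limitation worth knowing: each match consumes the blank on both sides, so when the word occurs twice in a row ("banana banana") only the first occurrence is removed in a single pass.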

Linux SED RegEx replace, but keep wildcards

If I have a string that contains this somewhere (Foo could be anything):
<tag>Foo</tag>
How would I, using SED and RegEx, replace it with this:
[tag]Foo[/tag]
My failed attempt:
echo "<tag>Foo</tag>" | sed "s/<tag>\(.*\)<\\/tag>/[tag]\1[\\/tag]"
Your s command is missing the terminating /:
$ echo "<tag>Foo</tag>" | sed "s/<tag>\(.*\)<\\/tag>/[tag]\1[\\/tag]/"
[tag]Foo[/tag]
With this you can replace any tag, so the command does not have to be specific to one tag name.
$ echo "<tag>Foo</tag>" | sed "s/[^<]*<\([^>]*\)>\([^<]*\)<\([^>]*\)>/[\1]\2[\3]/"
Hope this helps.
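A shorter variant of the tag-independent idea, offered as a sketch rather than a drop-in replacement: it rewrites each angle-bracket tag on its own instead of matching an opening and closing pair. to_brackets is a hypothetical helper name, and -E is assumed to be supported.

```shell
# Rewrites <name> as [name] and </name> as [/name].
# Using # as the s-command delimiter means the / in closing tags
# needs no escaping.
to_brackets() { sed -E 's#<(/?[^<>]+)>#[\1]#g'; }

echo '<tag>Foo</tag>' | to_brackets
echo 'a <b>x</b> and <i>y</i>' | to_brackets
```

Because it treats anything between < and > as a tag, it also converts tags with attributes and leaves all text outside the tags untouched. It is not an HTML parser.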

Why does sed -e 's/[ +-]?[0-9]*\.[0-9]*//g' not work?

I want to remove all floating point numbers from a string using sed. Therefore I use
sed -e 's/[ +-]?[0-9]*\.[0-9]*//g'
But it does not work:
echo 1.2456 | sed -e 's/[ +-]?[0-9]*\.[0-9]*//g'
gives 1.2456. If I remove the [ +-]? block, it works for positive numbers.
You need to escape the question mark:
echo 1.2456 | sed -e 's/[ +-]\?[0-9]*\.[0-9]*//g'
The ? quantifier belongs to extended regular expressions. Call sed with the -r option to enable them.
Either escape the ? or use sed -r; then it should work.
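The two options can be sketched side by side. The function names are hypothetical, -E is assumed to be available, and the BRE version uses the POSIX interval \{0,1\} in place of the GNU-only \? escape.

```shell
# ERE spelling: ? is a quantifier without a backslash.
strip_floats_ere() { sed -E 's/[ +-]?[0-9]*\.[0-9]*//g'; }

# BRE spelling: \{0,1\} means "zero or one of the preceding item".
strip_floats_bre() { sed 's/[ +-]\{0,1\}[0-9]*\.[0-9]*//g'; }

echo 'x=1.2456' | strip_floats_ere
echo 'x=-0.5' | strip_floats_bre
```

Both calls should leave only "x=". Note that the pattern from the question makes the digits on both sides of the point optional, so it also removes a lone "." from the input.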
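This version is more portable: \? doesn't work on all systems, so it uses * instead (which accepts any number of leading sign characters). Inside a bracket expression, + and - need no backslash as long as - comes last.
echo 1.2456 | sed -e 's/[ +-]*[0-9]*\.[0-9]*//g'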