Linux sed - Delete words that do not start with a specific character - regex

How can I remove words that do not start with a specific character using sed?
Sample:
echo "--foo imhere -abc anotherone" | sed ...
The result must be:
"--foo -abc"

echo "--foo imhere -abc anotherone" |\
sed -e 's/^/ /g' -e 's/ [^-][^ ]*//g' -e 's/^ *//g'
The first and last -e commands are needed only if the first word may also have to be removed.

gnu sed with -r:
kent$ echo "--foo imhere -abc anotherone" | sed -r 's/^|\s[^-]\S*//g'
--foo -abc
However, I prefer awk for this; it is more straightforward:
awk '{for(i=1;i<=NF;i++)$i=($i~/^-/?$i:"")}7'
output:
--foo -abc
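Clearing a field in awk leaves its separators behind, so the rebuilt line can contain doubled or trailing blanks. A minimal sketch that collects only the wanted words instead, assuming whitespace-separated words; the function name keep_dash_words is hypothetical and not part of the answers above:

```shell
# keep_dash_words: hypothetical helper, prints only the words that start with "-"
keep_dash_words() {
  printf '%s\n' "$1" | awk '{
    out = ""
    for (i = 1; i <= NF; i++)
      if ($i ~ /^-/)                      # keep words beginning with a dash
        out = (out == "" ? $i : out " " $i)
    print out
  }'
}

keep_dash_words "--foo imhere -abc anotherone"   # prints: --foo -abc
```

Joining the kept words with a single space avoids a separate clean-up pass for leftover blanks.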

You can use ssed to enable PCRE regex and then you can use this one:
(?<!-)\b\w+
echo "--foo imhere -abc anotherone" | ssed -R 's/(?<!-)\b\w+//g'

Related

how to replace continuous pattern in text

I have text like 1|2|3||| and am trying to replace each || with |0|. My command is the following:
echo '1|2|3|||' | sed -e 's/||/|0|/g'
but the result is 1|2|3|0||; the pattern is only replaced once.
Could someone help me improve the command? Thanks.
Just do it twice:
l_replace='s#||#|0|#g'
echo '1|2|3||||||||4||5|||' | sed -e "$l_replace;$l_replace"
Using any sed or any awk in any shell on every Unix box:
$ echo '1|2|3|||' | sed -e 's/||/|0|/g; s/||/|0|/g'
1|2|3|0|0|
$ echo '1|2|3|||' | awk '{while(gsub(/\|\|/,"|0|"));}1'
1|2|3|0|0|
This might work for you (GNU sed):
sed 's/||/|0|/g;s//|0|/g' file
or:
sed ':a;s/||/|0|/g;ta' file
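The replacement needs to be applied more than once because the closing | of one match is also the opening | of the next empty field, and sed does not rescan text it has already replaced within a single s///g pass.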
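The loop form can also be written with separate -e expressions, which avoids relying on GNU's semicolon handling after a label. A minimal sketch; the function name fill_empty is hypothetical:

```shell
# fill_empty: hypothetical helper that puts a 0 into every empty |-delimited field
fill_empty() {
  printf '%s\n' "$1" | sed -e ':a' -e 's/||/|0|/g' -e 'ta'
}

fill_empty '1|2|3|||'    # prints: 1|2|3|0|0|
fill_empty '1||||4'      # prints: 1|0|0|0|4
```

The t command jumps back to the label only when the preceding substitution changed something, so the loop stops as soon as no || remains.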

unexpected result by cutting the last column with sed

echo '60 test' | sed -r 's/(.*)\s+[^\s]+$/\1/'
result:
60 test
The last column is not cut, but it works fine with
echo '60 home' | sed -r 's/(.*)\s+[^\s]+$/\1/'
result:
60
Why?
Inside a bracket expression \s is not a shorthand for whitespace: [^\s]+ means "one or more characters that are neither a backslash nor s". test contains an s while home does not, so the latter matches the regexp and the former doesn't.
You should have used either of these instead to match non-space:
$ echo '60 test' | sed -r 's/(.*)\s+\S+$/\1/'
60
$ echo '60 test' | sed -r 's/(.*)\s+[^[:space:]]+$/\1/'
60
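The difference between the two bracket expressions can be checked in isolation. A small sketch that uses sed only as a line filter, with the two sample words from the question as input:

```shell
# Lines made up only of characters that are neither a backslash nor "s"
printf '%s\n' test home | sed -n '/^[^\s]\{1,\}$/p'
# prints: home

# Lines made up only of non-whitespace characters (POSIX character class)
printf '%s\n' test home | sed -n '/^[^[:space:]]\{1,\}$/p'
# prints: test and home, one per line
```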
As @potong suggested in a comment, to remove the last column with sed all you really need is:
sed -E 's/\s+\S+$//'
I switched from -r to -E because -r is GNU sed only, while -E works in both GNU and OSX/BSD sed, so it's generally the better option. However, OSX/BSD sed won't recognize \s or \S, so switching from -r to -E doesn't really make the script more portable in this case. You'd have to use this instead:
sed -E 's/[[:space:]]+[^[:space:]]+$//'
and then to be completely portable to all POSIX seds it'd be:
sed 's/[[:space:]]\{1,\}[^[:space:]]\{1,\}$//'
or this would behave the same if there's always 2 or more fields:
sed 's/[[:space:]]*[^[:space:]]*$//'
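A minimal sketch of the portable form wrapped in a function, with the match anchored to the end of the line by $ so that only the last field is removed; the name drop_last_field is hypothetical:

```shell
# drop_last_field: hypothetical helper, removes the last whitespace-separated field
drop_last_field() {
  printf '%s\n' "$1" | sed 's/[[:space:]]\{1,\}[^[:space:]]\{1,\}$//'
}

drop_last_field '60 test'    # prints: 60
drop_last_field 'a b   c'    # prints: a b
drop_last_field 'single'     # prints: single (nothing to remove)
```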
If you are just printing the first part of your string before the space without doing any other modification, you can simply use cut
echo '60 test' | cut -d' ' -f1
60
where you define your delimiter (-d) and the field (-f) you want to select.
No need to go for a complex solution using sed and doing some replacement operations.
With awk you can also print the first field:
echo '60 test' | awk '{print $1}'
60
or via grep in perl mode to have the \s taken into account
echo '60 test' | grep -oP '^.*?(?=\s)'
60

Sed replace asterisk symbols

I am trying to replace a series of asterisk symbols in a text file with -999.9 using sed. However, I can't figure out how to properly escape the wildcard symbol.
e.g.
$ echo "2006.0,1.0,************,-5.0" | sed 's/************/-999.9/g'
sed: 1: "s/************/-999.9/g": RE error: repetition-operator operand invalid
Doesn't work. And
$ echo "2006.0,1.0,************,-5.0" | sed 's/[************]/-999.9/g'
2006.0,1.0,-999.9-999.9-999.9-999.9-999.9-999.9-999.9-999.9-999.9-999.9-999.9-999.9,-5.0
puts a -999.9 for every * which isn't what I intended either.
Thanks!
Use this:
echo "2006.0,1.0,************,-5.0" | sed 's/[*]\+/-999.9/g'
Test:
$ echo "2006.0,1.0,************,-5.0" | sed 's/[*]\+/-999.9/g'
2006.0,1.0,-999.9,-5.0
Any of these (and more) is a regexp that will modify that line as you want:
$ echo "2006.0,1.0,************,-5.0" | sed 's/\*\**/-999.9/g'
2006.0,1.0,-999.9,-5.0
$ echo "2006.0,1.0,************,-5.0" | sed 's/\*\+/-999.9/g'
2006.0,1.0,-999.9,-5.0
$ echo "2006.0,1.0,************,-5.0" | sed -r 's/\*+/-999.9/g'
2006.0,1.0,-999.9,-5.0
$ echo "2006.0,1.0,************,-5.0" | sed 's/\*\{12\}/-999.9/g'
2006.0,1.0,-999.9,-5.0
$ echo "2006.0,1.0,************,-5.0" | sed -r 's/\*{12}/-999.9/g'
2006.0,1.0,-999.9,-5.0
$ echo "2006.0,1.0,************,-5.0" | sed 's/\*\{1,\}/-999.9/g'
2006.0,1.0,-999.9,-5.0
$ echo "2006.0,1.0,************,-5.0" | sed -r 's/\*{1,}/-999.9/g'
2006.0,1.0,-999.9,-5.0
sed operates on regular expressions, not strings, so you need to learn regular expression syntax if you're going to use sed. In particular, learn the difference between BREs (which sed uses by default), EREs (which some seds can be told to use instead, with -E or -r), and PCREs (which sed never uses but some other tools and "regexp checkers" do). The first solution above is a BRE that will work on all seds on all platforms; the \{12\} and \{1,\} forms are also POSIX BREs, while \+ is a GNU extension and the -r variants need a sed with ERE support. Google is your friend.
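The BRE/ERE distinction can be seen side by side. A minimal sketch, assuming a sed that accepts -E for EREs (recent GNU and BSD seds do); the function names are hypothetical:

```shell
# BRE: a bare * after an atom is a quantifier, so match a literal \* followed by zero or more \*
mask_stars_bre() {
  printf '%s\n' "$1" | sed 's/\*\**/-999.9/g'
}

# ERE (assumes -E is supported): + means "one or more" without a backslash
mask_stars_ere() {
  printf '%s\n' "$1" | sed -E 's/\*+/-999.9/g'
}

mask_stars_bre "2006.0,1.0,************,-5.0"   # prints: 2006.0,1.0,-999.9,-5.0
mask_stars_ere "2006.0,1.0,************,-5.0"   # prints: 2006.0,1.0,-999.9,-5.0
```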
* is a regex symbol that needs to be escaped.
You can even use BASH string replacement:
s="2006.0,1.0,************,-5.0"
echo "${s/\**,/-999.9,}"
2006.0,1.0,-999.9,-5.0
Using sed:
sed 's/\*\+/-999.9/g' <<< "$s"
2006.0,1.0,-999.9,-5.0
Yes, * is a special metacharacter that repeats the previous token zero or more times. Escape it in order to match literal * characters.
sed 's/\*\*\*\*\*\*\*\*\*\*\*\*/-999.9/g'
When this possibility was introduced into gawk I have no idea!
gawk -F, '{sub(/************/,"-999.9",$3)}1' OFS=, file
2006.0,1.0,-999.9,-5.0

Bash shave a first and/or last character from string, but only if it is a certain character

In bash I need to shave a first and/or last character from string, but only if it is a certain character.
If I have | I need
/foo/bar/hah/ => foo/bar/hah
foo/bar/hah => foo/bar/hah
You can downvote me for not listing everything I've tried. But the fact is I've tried at least 35 different sed strings and bash character tricks, many of which were from Stack Overflow. I simply cannot get this to happen.
What's the problem with the simple one?
sed "s/^\///;s/\/$//"
Output is
foo/bar/hah
foo/bar/hah
In pure bash :
$ var=/foo/bar/hah/
$ var=${var%/}
$ echo ${var#/}
foo/bar/hah
$
Check bash parameter expansion
or with sed :
$ sed -r 's#(^/|/$)##g' file
How about simply this:
echo "$x" | sed -e 's:^/::' -e 's:/$::'
Further to @sputnick's answer and from this answer, here's a function that would do it:
STR="/foo/bar/etc/"
STRB="foo/bar/etc"
function trimslashes {
    local s="$1"   # work on a local copy so the caller's STR is not overwritten
    s=${s#/}       # drop one leading slash, if present
    s=${s%/}       # drop one trailing slash, if present
    echo "$s"
}
trimslashes "$STR"
trimslashes "$STRB"
# foo/bar/etc
# foo/bar/etc
echo '/foo/bar/hah/' | sed 's#^/##' | sed 's#/$##'
Assuming the / character is the only one you're trying to remove from the start of the string, sed -E 's_^[/](.*)_\1_' should do the job:
$ echo "$var1"; echo "$var2"
/foo/bar/hah
foo/bar/hah
$ echo "$var1" | sed -E 's_^[/](.*)_\1_'
foo/bar/hah
$ echo "$var2" | sed -E 's_^[/](.*)_\1_'
foo/bar/hah
If you also need to strip other characters at the start of the line, add them to the [/] class. For example, to strip a leading / or -, it would be sed -E 's_^[/-](.*)_\1_'
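The same character class can be applied to both ends of the string. A minimal sketch, assuming / and - are the characters to strip and that at most one is removed from each end; the function name trim_edges is hypothetical:

```shell
# trim_edges: hypothetical helper, strips one leading and one trailing "/" or "-"
trim_edges() {
  # "-" is placed last inside the brackets so it is taken literally, not as a range
  printf '%s\n' "$1" | sed -e 's_^[/-]__' -e 's_[/-]$__'
}

trim_edges '/foo/bar/hah/'   # prints: foo/bar/hah
trim_edges 'foo/bar/hah'     # prints: foo/bar/hah
trim_edges '-foo/bar'        # prints: foo/bar
```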
Here is an awk version:
echo "/foo/bar/hah/" | awk '{gsub(/^\/|\/$/,"")}1'
foo/bar/hah

(GNU)Sed: how to replace any character from nth character to nth+10?

I need to replace the 10th to 20th characters in a string that looks like this:
123456789012345678901234567890
So far I've tried:
a)
Works for the 10th character ONLY:
echo "123456789012345678901234567890" | sed 's/./X/10'
b)
Doesn't work on the range:
echo "123456789012345678901234567890" | sed 's/./X/10,20'
echo "123456789012345678901234567890" | sed 's/./X/10\,20'
echo "123456789012345678901234567890" | sed 's/./X/\{10,20\}'
echo "123456789012345678901234567890" | sed 's/./X/\{10\,20\}'
Does not work and I get error
unknown option to `s'
So the question is: how do I make this work?
echo "123456789012345678901234567890" | sed 's/./X/10,20'
Try:
$ sed -r "s/^(.{9})(.{11})/\1XXXXXXXXXXX/" <<< 123456789012345678901234567890
123456789XXXXXXXXXXX1234567890
It is a complex sed problem; the only solution I could find is this:
$ sed 's/^\(.\{9\}\)\(.\{11\}\)/\1XXXXXXXXXXX/' <<< 123456789012345678901234567890
123456789XXXXXXXXXXX1234567890
With awk it looks nicer:
$ awk 'BEGIN{FS=OFS=""} {for (i=10;i<=20;i++) $i="X"} {print}' <<< 123456789012345678901234567890
123456789XXXXXXXXXXX1234567890
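Replacing positions 10 to 20 inclusive covers 11 characters, which is easy to miscount when the Xs are typed by hand. A minimal sketch that takes the positions as parameters and builds the mask with substr, so it does not depend on splitting the line into single-character fields; the function name mask_range is hypothetical:

```shell
# mask_range: hypothetical helper
#   $1 = input string, $2 = first position (1-based), $3 = last position (inclusive)
mask_range() {
  printf '%s\n' "$1" | awk -v s="$2" -v e="$3" '{
    mid = substr($0, s, e - s + 1)    # the slice to mask
    gsub(/./, "X", mid)               # turn every character of the slice into X
    print substr($0, 1, s - 1) mid substr($0, e + 1)
  }'
}

mask_range 123456789012345678901234567890 10 20
# prints: 123456789XXXXXXXXXXX1234567890
```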
You can do it with bash parameter substitution like this:
#!/bin/bash
s="123456789012345678901234567890"
l=${s:0:9} # Extract left part (characters 1-9)
m=${s:9:11} # Extract middle part (characters 10-20; offsets are 0-based)
r=${s:20} # Extract right part (characters 21 onwards)
# Diddle with middle part to your heart's content and re-assemble "$l$m$r" when done
m=$(sed 's/./X/g' <<<"$m")
echo "$l$m$r"
See here for more explanation and examples.
Or, you can do this:
1. Transform the row of letters into a column so each is on its own line.
2. Apply your edits to LINES 10 through 20 (as opposed to characters 10 through 20).
3. Transform the column of letters back into a row (by deleting linefeeds).
This is shown in the one-liner below:
$ echo "123456789012345678901234567890" | sed "s/\(.\)/\1\n/g" | sed "10,20s/./X/" | tr -d "\n"
I know that it looks ugly, but:
echo "123456789012345678901234567890" | \
sed 's/^\(.\{9\}\).\{11\}\(.*\)/\1XXXXXXXXXXX\2/'
Without typing out multiple Xs in the sed command:
sed -r 's/^(.{9})(.{11})(.*)$/\1\n\2\n\3/' | sed '2s/./X/g' | tr -d '\n'
To replace the 10th to 20th characters, inclusive, try:
echo 123456789012345678901234567890 | sed 's/\(.\{9\}\).\{11\}/\1XXXXXXXXXXX/'
123456789XXXXXXXXXXX1234567890
With GNU sed, you can use the -r switch to remove most of the backslashes:
echo 123456789012345678901234567890 | sed -r 's/(.{9}).{11}/\1XXXXXXXXXXX/'
Or the naive approach also works here:
echo 123456789012345678901234567890 | sed 's/\(.........\).........../\1XXXXXXXXXXX/'
This might work for you (GNU sed):
sed ':a;/.\{9\}X\{11\}/!s/\(.\{9\}X*\)./\1X/;ta' file
or with a bit of syntactic sugar:
sed -r ':a;/.{9}X{11}/!s/(.{9}X*)./\1X/;ta' file