sed - replace all strings that begin with - replace

Hi I want to find all strings anywhere in a file that begin with the letters rs i.e.
rs12345 100
rs54321 200
300 rs13579
and delete all strings that begin with the criteria so that I get:
100
200
300
i.e. replace the string with nothing. I am not bothered about leading whitespace before final output as I deal with that later. I have tried sed 's/rs*//g' however this gives:
12345 100
54321 200
i.e. only removes the rs.
How do I edit my sed command to delete the entire string? Thanks

You can replace from starting rs up to a space by an empty string, so that the rsXXX gets removed.
sed 's/^rs[^ ]*//' file
This assumes the rs is at the beginning of the line. If rs can appear anywhere in the line, use:
sed 's/\brs[^ ]*//' file
The \b matches a word boundary, so strings like hellorshello do not match. Add the g flag if a line can contain more than one such string.
Test
$ cat a
rs12345 100
rs54321 200
hellooo 300
$ sed 's/^rs[^ ]*//' a
100
200
hellooo 300
Note that I am not dealing with the whitespace, since you mention you handle it later on. If you do need it removed, you can use sed 's/^rs[^ ]* \+//' file.
rs anywhere:
$ cat a
rs12345 100
rs54321 200
hellooo 300
and rs12345 100
thisrsisatest 24
$ sed 's/\brs[^ ]*//' a
100
200
hellooo 300
and 100
thisrsisatest 24
Note that your current approach wasn't working because rs* means: r followed by zero or more s characters.
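A minimal sketch of that difference, run on the three sample lines from the question. The variable names are only for illustration, and \b is a GNU sed extension.

```shell
# Sample input from the question.
input='rs12345 100
rs54321 200
300 rs13579'

# rs* is "r, then zero or more s": only those letters are removed.
too_little=$(printf '%s\n' "$input" | sed 's/rs*//g')

# rs[^ ]* is "rs, then every character up to the next space".
# \b (word boundary) is a GNU sed extension; g handles several hits per line.
whole_word=$(printf '%s\n' "$input" | sed 's/\brs[^ ]*//g')

printf '%s\n' "$too_little" '--' "$whole_word"
```

The second result still carries the leftover spaces, which the question says are handled later.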

Treat a word as a pattern that starts at a word boundary \b and consists of word characters \w (it also ends at a \b, but we do not need that here).
The command that removes words starting with rs is then sed -r 's/\brs\w+//g' file
$ cat file
rs12345 100
rs54321 200
ars1234 000
$ sed -r 's/\brs\w+//g' file
100
200
ars1234 000
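The two answers differ slightly in what they treat as the end of the string. A small sketch on a made-up line (the input is an assumption, not from the question; -E is used here in place of -r, and \b and \w are GNU extensions):

```shell
# Hypothetical input with punctuation inside a token and a bare "rs".
line='rs123,45 100 rs 200'

# \w+ stops at the first non-word character and needs at least one
# character after "rs", so ",45" and the bare "rs" both survive.
by_word=$(printf '%s\n' "$line" | sed -E 's/\brs\w+//g')

# [^ ]* runs to the next space and also accepts nothing after "rs".
by_space=$(printf '%s\n' "$line" | sed -E 's/\brs[^ ]*//g')

printf '%s\n' "$by_word" "$by_space"
```

Which one is right depends on whether tokens are delimited by spaces or by word characters in your data.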

Related

Using grep, how to match beginning of line with pattern from stdin

I have a one-liner that prints out a series of numbers:
124
132
186
I am then piping this output into grep to match these numbers to the beginning of lines in another file but sometimes the second number in the line matches one of the patterns and I get an incorrect match like so:
$ get_id_command | grep -f - users.list
124 => 3456, Charles Charmichael, ccharmichael
132 => 2498, Sarah Walker, swalker
186 => 8934, John Casey, jcasey
240 => 1245, Morgan Grimes, mgrimes
What options do I need for grep to only match patterns at the beginning of the line? I would really like to keep this as a one-liner.
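Prepend a circumflex to each pattern and it will work, because a circumflex anchors the pattern to the start of the line. The patterns are the numbers coming from get_id_command, so modify that stream rather than users.list, e.g.
$ get_id_command | sed 's/^/^/' | grep -f - users.list
This stays a one-liner and leaves users.list untouched. To keep 124 from also matching a line that starts with 1240, append the separator to each pattern as well:
$ get_id_command | sed 's/.*/^& /' | grep -f - users.list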
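A self-contained sketch of anchoring the patterns that arrive on stdin, using the sample data from the question. The get_id_command function here is a hypothetical stand-in for the real command, and the temporary file stands in for users.list.

```shell
# Hypothetical stand-in for get_id_command from the question.
get_id_command() { printf '%s\n' 124 132 186; }

# Temporary copy of the sample users.list.
users=$(mktemp)
cat > "$users" <<'EOF'
124 => 3456, Charles Charmichael, ccharmichael
132 => 2498, Sarah Walker, swalker
186 => 8934, John Casey, jcasey
240 => 1245, Morgan Grimes, mgrimes
EOF

# Turn each ID into the pattern "^ID " so it can only match at line start.
matches=$(get_id_command | sed 's/.*/^& /' | grep -f - "$users")
printf '%s\n' "$matches"

rm -f "$users"
```

The line for Morgan Grimes no longer appears, because 124 only occurs inside 1245 and not at the start of that line.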

Bash/sed: delete everything from text file except match(es)

I have a text file which I need to extract a match from in a bash script. There might be more than one match and everything else is supposed to be discarded.
Sample snippet of input.txt file content:
PART TWO OF TWO PARTS-
E RESNO 56/20 56/30 54/40 52/50 TUDEP
EAST LVLS NIL
WEST LVLS 310 320 330 340 350 360 370 380 390
EUR RTS WEST NIL
NAR NIL-
REMARKS.
1.TMI IS 142 AND OPERATORS ARE REMINDED TO INCLUDE THE
TMI NUMBER AS PART OF THE OCEANIC CLEARANCE READ BACK.
2.ADS-C AND CPDLC MANDATED OTS ARE AS FOLLOWS
TRACK A 350 360 370 380 390
TRACK B 350 360 370 380 390
I try to match for 142 from the line
1.TMI IS 142 AND OPERATORS ARE REMINDED TO INCLUDE THE
The match is always a number (one to three digits, may have leading zeroes) and always preceded by TMI IS.
My experiments so far led to nothing: I tried .*TMI IS ([0-9]+).* with the following sed command in my bash script
sed -n 's/.*TMI IS \([0-9]+\).*/\1/g' input.txt > output.txt
but only got an empty output.txt.
My script runs in GNU Bash-4.2. Where do I make my mistake? I ran out of ideas so your input is highly appreciated!
Thanks,
Chris
Two points to make your sed approach work:
The + quantifier must be escaped (\+) in sed basic regular expressions; unescaped, it is a literal plus sign.
To print the matched pattern, use the p flag of the s command:
sed -n 's/.*TMI IS \([0-9]\+\).*/\1/gp' input.txt
142
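To restrict the match to lines in your current format, where TMI IS directly follows a run of non-space characters at the start of the line (such as 1.), use: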
sed -n 's/^\S\+TMI IS \([0-9]\+\).*/\1/gp' input.txt
With GNU grep:
$ grep -oP 'TMI IS \K([0-9]*)' input.txt
142
You could also do this using perl as an alternative to the above:
$ perl -nle 'print $1 if /TMI IS (\d+)/;' < input.txt
142
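To make the quantifier point concrete, here is a small sketch that runs three spellings of the pattern against the sample line. The variable names are illustrative only.

```shell
line='1.TMI IS 142 AND OPERATORS ARE REMINDED TO INCLUDE THE'

# BRE with an unescaped +: the + is a literal plus sign, so nothing matches
# and, with -n, nothing is printed.
literal_plus=$(printf '%s\n' "$line" | sed -n 's/.*TMI IS \([0-9]+\).*/\1/p')

# BRE with a portable spelling of "one or more digits".
bre=$(printf '%s\n' "$line" | sed -n 's/.*TMI IS \([0-9][0-9]*\).*/\1/p')

# ERE via -E: + is a quantifier and groups need no backslashes.
ere=$(printf '%s\n' "$line" | sed -n -E 's/.*TMI IS ([0-9]+).*/\1/p')

printf 'literal_plus=[%s] bre=[%s] ere=[%s]\n' "$literal_plus" "$bre" "$ere"
```

The first variant is the one from the question and explains the empty output.txt.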

splitting bash string by delimiter (last line with delimiter) into array

I'm having a hard time splitting a string like this:
444,555,text with, separator
into this:
444
555
text with, separator
i.e. into a 3-element array (last element may contain comma)
I tried sed but I end up having 4 elements due to the last comma.
Any ideas?
Thanks,
With bash and array:
s='444,555,text with, separator'
IFS=, read -r a b c <<< "$s"
array=("$a" "$b" "$c")
declare -p array
Output:
declare -a array='([0]="444" [1]="555" [2]="text with, separator")'
sed can replace only the N-th match of the regexp (i.e. the N-th occurrence within a line) by giving the s command a numeric flag:
str="444,555,text with, separator"
sed 's/,/\n/1; s/,/\n/1' <<< "$str"
The output:
444
555
text with, separator
In s/,/\n/1, the trailing 1 is a numeric flag that selects the first occurrence of , to be replaced with \n (a newline in GNU sed).
The following gives the same result, because s replaces the first match by default:
sed 's/,/\n/; s/,/\n/' <<< "$str"
Two consecutive substitutions produce 3 lines (chunks).
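A short sketch of how the numeric flag picks an occurrence. It uses | as the replacement instead of \n so the effect is visible on one line; the variable names are illustrative.

```shell
str='444,555,text with, separator'

# No flag: the first comma is replaced.
first=$(printf '%s\n' "$str" | sed 's/,/|/')

# Flag 2: only the second comma is replaced.
second=$(printf '%s\n' "$str" | sed 's/,/|/2')

# Two substitutions in a row: after the first one runs, the "first" comma
# is the one that used to be second, so the last comma is left alone.
both=$(printf '%s\n' "$str" | sed 's/,/|/1; s/,/|/1')

printf '%s\n' "$first" "$second" "$both"
```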
echo "444,555,text with, separator" | sed "s/\([0-9]*\),\([0-9]*\),\(.*\)/\1\n\2\n\3/"
Output:
444
555
text with, separator

egrep command for lines that have one or more instance of 1234 but no other numbers?

So I'm fairly new to regular expressions and I'm wondering how this would be implemented as an egrep command.
I basically want to look for lines in a file that have one or more instances of "1234", but no other numbers. (non-digit characters are allowed).
Examples:
1234 - valid
12341234 - valid
12345 - invalid (since 5 is there)
You can use grep to extract the lines that contain 1234, then replace 1234 with something that doesn't appear in the input, then remove lines that still contain any digits, and replace the special string back by 1234:
< input-file grep 1234 \
| sed 's/1234/\x1/g' \
| grep -v '[0-9]' \
| sed 's/\x1/1234/g'
So, we want to select lines that have 1234 one or more times but no other digits:
grep -E '^([^[:digit:]]*1234)+[^[:digit:]]*$' file
How it works
The regex begins with ^ and ends with $. That means that it must match the whole line.
Inside the regex are two parts:
([^[:digit:]]*1234)+ matches one or more 1234 with no other digits.
[^[:digit:]]* matches any non-digits that follows the last 1234.
Traditionally, one would use [0-9] to match digits. In some locales that range is not guaranteed to match only the ASCII digits, so we use the POSIX class [:digit:] instead.
Example
Let's use this test file:
$ cat file
this 1234 is valid
12341234 valid
not valid 12345
not 2 valid 1234 line
no numbers so not valid
Here is the result:
$ grep -E '^([^[:digit:]]*1234)+[^[:digit:]]*$' file
this 1234 is valid
12341234 valid
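A few extra edge cases, as a sketch. The only_1234 wrapper and the test lines are made up for illustration; the regex is the one shown above.

```shell
# Hypothetical wrapper around the grep from this answer; reads lines on stdin.
only_1234() {
  grep -E '^([^[:digit:]]*1234)+[^[:digit:]]*$'
}

# Two lines that should pass, followed by four that should not:
#   123 4     the digits are split, so there is no 1234 block
#   41234     a stray 4 precedes the block
#   12341     a stray 1 follows the block
#   no digits there is no 1234 at all
kept=$(printf '%s\n' 'a1234b1234c' '1234 1234' '123 4' '41234' '12341' 'no digits' | only_1234)

printf '%s\n' "$kept"
```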
If you want no other digit after your 1234 block:
egrep '\<(1234)+(\>|[^0-9])' *
       --        --            --> word delimiters
          ----                 --> the word you're looking for
                    ------     --> non digit characters
               -               --> one or more times
If you want only "words" made up by the "1234" block, then you can egrep this:
egrep '\<(1234)+\>' *
       --       --    --> word delimiters
          ----        --> the word you're looking for
               -      --> one or more times.

N-Insert In Sed

How would I replace the first 150 characters of every line with spaces using sed. I don't want to bring out the big guns (Python, Perl, etc) and it seems like sed is pretty powerful in itself (some people have written the dc calculator in it for example). A related problem is inserting 150 spaces in front of every line.
s/^.\{50\}/?????/
That matches the first 50 characters but then what? I can't do output
s/^.\{50\}/ \{50\}/
One could use readline to simulate 150 presses (alt+150, ) but that's lame.
Use the hold space.
/^.\{150\}/ {;# simply echo lines shorter than 150 chars
h;# make a backup
s/^.\{150\}\(.*\)/\1/;# strip 150 characters
x;# swap backup in and remainder out
s/^\(.\{150\}\).*/\1/;# only keep 150 characters
s/./ /g;# replace every character with a space
G;# append newline + remainder to spaces
s/\n//;# strip pesky newline
};
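The same hold-space idea wrapped in a function so it can be tried with a small width, as a rough sketch. The function name and the width parameter are assumptions added for illustration; the sed commands are the ones from the script above, one per line.

```shell
# Hypothetical helper: blank the first $1 characters of every stdin line
# that is at least $1 characters long; shorter lines pass through unchanged.
blank_prefix() {
  n=$1
  sed "/^.\{$n\}/ {
    h
    s/^.\{$n\}//
    x
    s/^\(.\{$n\}\).*/\1/
    s/./ /g
    G
    s/\n//
  }"
}

printf '%s\n' '1234567890abc' 'short' | blank_prefix 10
```

With a width of 10 the first line becomes ten spaces followed by abc, and the second line is echoed as is.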
Alternately, you could code a loop (somewhat shorter code):
s/^.\{150\}/x/;# strip 150 characters & mark beginning of line
t l
# if first substitution matched, then loop
d; # else delete this (short) line and go to next input line; use b instead of d to print short lines unchanged
:l
# begin loop
s/^/ /;# insert a space
/^ \{150\}x/!t l
# if line doesn't begin with 150 spaces, loop
s/x//;# strip beginning marker
However, I don't think you can use a loop without sed -f, or finding some other way to escape newlines. Label names seem to run to the end of the line, right through a ;.
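One way around the label problem, as a sketch: pass the script as a single quoted argument with real newlines, so each label ends at its line break and no script file is needed. The function name and the width of 5 are assumptions for the demo; the commands otherwise follow the loop above, including d for short lines.

```shell
# Hypothetical helper: the loop script from above with a width of 5,
# given to sed as one multi-line argument instead of a file.
blank5_loop() {
  sed '
# strip 5 characters and mark the beginning of the remainder
s/^.\{5\}/x/
t l
# line was shorter than 5 characters: delete it, as in the script above
d
:l
# insert one space per iteration until 5 spaces precede the marker
s/^/ /
/^ \{5\}x/!t l
# strip the marker
s/x//
'
}

printf '%s\n' 'abcdefgh' 'abc' | blank5_loop
```

Only the first line is printed, as five spaces followed by fgh; the short line is dropped by d.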
Basically you want to do this:
sed -r "s/^.{150}/$(giveme150spaces)/"
The question is just what's the most elegant way to programmatically get 150 spaces. A variety of techniques are discussed here: Bash Hacker's Wiki. The simplest is to use printf '%150s', like this:
sed -r "s/^.{150}/$(printf '%150s')/"
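The printf idea can be parameterized. This is a sketch under the assumption that the width is passed as the first argument; the helper names are made up. It also covers the related problem from the question of inserting spaces in front of every line.

```shell
# Hypothetical helper: print $1 spaces (an empty string padded to width $1).
spaces() { printf "%${1}s" ''; }

# Replace the first $1 characters of each stdin line with spaces.
# Lines shorter than $1 do not match and pass through unchanged.
blank_first() { sed -E "s/^.{$1}/$(spaces "$1")/"; }

# Insert $1 spaces in front of every stdin line.
indent_by() { sed "s/^/$(spaces "$1")/"; }

printf '%s\n' 'abcdefghij' 'ab' | blank_first 3
printf '%s\n' 'ab' | indent_by 2
```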
The logic is to iterate over the file and, for each line, print 150 spaces followed by the rest of the line from character 151 to the end, using a substring. For example, for the first 10 characters:
$ more file
1234567890abc
0987654321def
$ awk '{printf "%10s%s\n", " ",substr($0,11)}' file
          abc
          def
This is much simpler than crafting a regex.
Bash:
# IFS= keeps leading and trailing whitespace of each line intact
while IFS= read -r line
do
    printf "%10s%s\n" " " "${line:10}"
done <"file"
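The awk version can take the width as a variable. A minimal sketch, assuming a helper name and parameter that are not in the original answer:

```shell
# Hypothetical helper: print $1 spaces, then each stdin line from
# character $1 + 1 onward.
blank_awk() {
  awk -v n="$1" '{
    # build a string of n spaces by padding an empty string to width n
    pad = sprintf("%" n "s", "")
    print pad substr($0, n + 1)
  }'
}

printf '%s\n' '1234567890abc' 'ab' | blank_awk 10
```

Unlike the sed substitutions, this always emits the full run of spaces, so a line shorter than the width comes out as spaces only.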
This might work for you:
sed ':a;/^ \{150\}/!{s/[^ ]/ /;ta}' file
This will replace the first 150 characters of a line with spaces. However, if the line is shorter than 150 characters, it replaces the whole line with spaces but retains the original length.
This solution replaces the first 150 characters with spaces or increases the line to 150 spaces long:
sed ':a;/^ \{150\}/!{s/[^ ]/ /;ta;s/^/ /;ba}' file
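And this replaces the first 150 characters of a line with spaces but leaves shorter lines untouched (note the ; between b and :a, so that b branches to the end of the script instead of looking for a label):
sed '/.\{150\}/!b;:a;/^ \{150\}/!{s/[^ ]/ /;ta}' file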
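To compare how the three loops treat short lines, here is a sketch with the width as a parameter. The function names are made up, and the commands are split across -e options, an assumption made so that each label ends cleanly at the end of its own expression.

```shell
# Variant 1: short lines are blanked completely but keep their length.
blank_keep_len() {
  sed -e ':a' -e "/^ \{$1\}/!{" -e 's/[^ ]/ /' -e 'ta' -e '}'
}

# Variant 2: short lines are blanked and then padded to the full width.
blank_and_pad() {
  sed -e ':a' -e "/^ \{$1\}/!{" -e 's/[^ ]/ /' -e 'ta' -e 's/^/ /' -e 'ba' -e '}'
}

# Variant 3: lines shorter than the width pass through unchanged.
blank_long_only() {
  sed -e "/.\{$1\}/!b" -e ':a' -e "/^ \{$1\}/!{" -e 's/[^ ]/ /' -e 'ta' -e '}'
}

for f in blank_keep_len blank_and_pad blank_long_only; do
  printf '%s\n' 'abcdef' 'ab' | "$f" 3 | sed 's/.*/[&]/'
done
```

The brackets added by the last sed only make the trailing spaces visible in the output.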