sed: delete everything that starts with $p, but not just $p


I'm trying to find a sed command that will delete every instance of a word that begins with another word, but not the word itself. So if I have
aardvark
aardvarky
aardvarkiest
I want to delete aardvarky and aardvarkiest, but not aardvark.
I tried
sed -n "/^$p.*/ d"
hoping to do some kind of regex that meant starting with $p and then some characters *, but it didn't seem to work.

This deletes all lines that start with $p and have at least one more character:
$ sed "/^$p./d" file
aardvark
To change the file in place, use the -i option. With GNU sed:
sed -i "/^$p./d" file
With BSD (OSX) sed:
sed -i "" "/^$p./d" file
Discussion
Consider:
sed -n "/^$p.*/ d"
This command will print nothing: -n means print nothing unless explicitly asked to and there is no command with an explicit print (p).
Further, * means zero or more of the preceding character. Thus, $p.* matches $p also.
We could use:
$ sed "/^$p.\+/d" file
aardvark
\+ means one or more of the preceding character. However, the \+ is not useful because any line that matches ^$p.\+ also matches the simpler ^$p. (and vice versa).
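The difference between the two patterns can be checked on the question's sample words. This is a small sketch; the temporary file and the variable names with_star and with_dot exist only for this illustration.

```shell
# Sample data from the question, written to a throwaway file.
p=aardvark
tmp=$(mktemp)
printf '%s\n' aardvark aardvarky aardvarkiest > "$tmp"

# ".*" also matches zero characters, so the bare word is deleted too.
with_star=$(sed "/^$p.*/d" "$tmp")

# "." requires one more character, so the bare word survives.
with_dot=$(sed "/^$p./d" "$tmp")

printf 'with .*: [%s]\n' "$with_star"
printf 'with . : [%s]\n' "$with_dot"
rm -f "$tmp"
```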
Warning
The use of shell variables in sed commands is potentially dangerous. As an example, the following writes a file to the current directory:
p=$'a/w hi.there\n/'; sed "/^$p.\+/d" file
A shell variable should not be used in a sed command unless the shell variable is created by code that is trusted.
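One common way to reduce the risk is to escape the regex metacharacters and the / delimiter before interpolating the variable. The sketch below assumes a single-line value and basic regular expressions; escape_bre is a hypothetical helper name, and it does not protect against values that contain newlines.

```shell
# Hypothetical helper: backslash-escape BRE metacharacters and the "/" delimiter.
# Assumes the value fits on one line.
escape_bre() {
  printf '%s\n' "$1" | sed 's/[][\.*^$/]/\\&/g'
}

p='a.b'
tmp=$(mktemp)
printf '%s\n' 'a.b' 'a.bc' 'axbc' > "$tmp"

# After escaping, the dot in $p matches only a literal dot, so "axbc" is kept.
result=$(sed "/^$(escape_bre "$p")./d" "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Without the escaping step the unescaped dot in a.b would match any character, and axbc would be deleted as well.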

Use grep as below to keep all lines except those containing the whole word aardvark:
grep -v -w 'aardvark' file
If you want to delete everything except aardvark:
grep -w aardvark file

This might work for you (GNU sed):
sed -i '/^'"$word"'\B/d' file
This deletes any line that begins with $word but does not end on a word boundary.

Related

removing unmatched lines with SED

I'm trying to remove everything but 3 separate lines with specific matching pattern and leave just the 3 lines I want
Here is my code;
sed -n '/matching pattern/matching pattern/matching pattern/p' > file.txt

If you have multiple commands on the same line, you need to separate the commands by a ;:
sed -n '/matching pattern/p;/matching pattern2/p;/matching pattern3/p' file
Alternatively you can put them onto separate lines:
sed -n '/matching pattern/p
/matching pattern2/p
/matching pattern3/p' file
Beside that, you can also use regex alternation:
sed -rn '/(pattern|pattern2|pattern3)/p' file
or (better) use grep:
grep -E '(pattern|pattern2|pattern3)' file
However, this might get messy if the patterns get longer and more complicated.
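When a single alternation becomes hard to read, grep also accepts each pattern through its own -e option. A minimal sketch with made-up sample lines and patterns:

```shell
# Made-up sample data; only lines matching one of the three patterns are kept.
tmp=$(mktemp)
printf '%s\n' 'alpha 1' 'beta 2' 'gamma 3' 'delta 4' > "$tmp"

# One -e per pattern; a line is printed if any pattern matches.
kept=$(grep -e 'alpha' -e 'gamma' -e 'delta' "$tmp")
printf '%s\n' "$kept"
rm -f "$tmp"
```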

awk to the rescue!
awk '/pattern1/ || /pattern2/ || /pattern3/' filename
I think it's cleaner than alternatives.

Sed with Deletion
There's always more than one way to do this sort of thing, but one useful sed programming pattern is using alternation with deletion. For example:
# BSD sed
sed -E '/root|daemon|nobody/!d' /etc/passwd
# GNU sed
sed -r '/root|daemon|nobody/!d' /etc/passwd
This makes it possible to express ideas like "delete everything except for the listed terms." Even when expressions are functionally equivalent, it can be helpful to use a construct that most closely matches the idea you're trying to convey.
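To try the idea without touching /etc/passwd, here is a sketch on invented passwd-style lines. It assumes a sed that accepts -E (recent GNU and BSD versions do).

```shell
# Invented passwd-style entries, for illustration only.
tmp=$(mktemp)
printf '%s\n' 'root:x:0:0' 'alice:x:1000:1000' 'daemon:x:1:1' \
  'bob:x:1001:1001' 'nobody:x:65534:65534' > "$tmp"

# "!d" deletes every line that does NOT match the alternation.
kept=$(sed -E '/root|daemon|nobody/!d' "$tmp")
printf '%s\n' "$kept"
rm -f "$tmp"
```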

This might work for you (GNU sed):
sed '/pattern1/b;/pattern2/b;/pattern3/b;d' file
The normal flow of sed is to print what remains in the pattern space after processing. Therefore if the required pattern is in the pattern space let sed do its thing otherwise delete the line.
N.B. the b command is like a goto and if it has no following identifier, it means break out of any further sed commands and print (or not print if the -n option is in action) the contents of the pattern space.
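The same branch-or-delete flow can be written with one -e per command, which avoids relying on ; after b being accepted. A sketch with invented input lines:

```shell
# Lines that match a pattern branch to the end of the script and are printed;
# every other line reaches the final "d" and is deleted.
kept=$(printf '%s\n' 'keep pattern1' 'drop me' 'keep pattern2' \
  'drop me too' 'keep pattern3' |
  sed -e '/pattern1/b' -e '/pattern2/b' -e '/pattern3/b' -e 'd')
printf '%s\n' "$kept"
```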

If I understood you correctly:
sed -n '/\(pattern1\|pattern2\|pattern3\)/p' file > newfile

replace \n\t pattern in a file

ok I have a recordset that is pipe delimited
I am checking the number of delimiters on each line as they have started including | in the data (and we cannot change the incoming file)
while using a great awk to parse out the bad records into a bad file for processing we discovered that some data has a new line character (\n) (followed by a tab (\t) )
I have tried sed to replace \n\t with just \t but it always either changes the \n\t with \r\n or replaces all the \n (file is \r\n for line end)
yes to answer some questions below...
files can be large 200+ mb
the line feed is in the data spuriously (not every row.. but enough to be a pain)
I have tried
sed ':a;N;$!ba;s/\n\t/\t/g' Clicks.txt >test2.txt
sed 's/\n\t/\t/g' Clicks.txt >test1.txt
sample record
12345|876|testdata\n
\t\t\t\tsome text|6209\r\n
would like
12345|876|testdata\t\t\t\tsome text|6209\r\n
please help!!!
NOTE must be in KSH (MKS KSH to be specific)
i don't care if it is sed or not.. just need to correct the issue...
several of the solutions below work on small data or do part of the job...
as an aside i have started playing with removing all linefeeds and then replacing the carriage return with carriage return linefeed.. but can't quite get that to work either
I have tried tr but since it is single char it only does part of the issue
tr -d '\n' < test.txt
leaves me with a \r ended file....
need to get it to \r\n (and no dos2unix or unix2dos exists on this system)

If the input file is small (and you therefore don't mind processing it twice), you can use
cat input.txt | tr -d "\n" | sed 's/\r/\r\n/g'
Edit:
As I should have known by now, you can avoid using cat almost everywhere.
I had reviewed my old answers in SO for UUOC, and carefully checked for a possible filename in the tr usage. As Ed pointed out in his comment, cat can be avoided here as well:
The command above can be improved by
tr -d "\n" < input.txt | sed 's/\r/\r\n/g'

It's unclear what you are trying to do but given this input file:
$ cat -v file
12345|876|testdata
some text|6209^M
Is this what you're trying to do:
$ gawk 'BEGIN{RS=ORS="\r\n"} {gsub(/\n/,"")} 1' file | cat -v
12345|876|testdata some text|6209^M
The above uses GNU awk for multi-char RS. Alternatively with any awk:
$ awk '{rec = rec $0} /\r$/{print rec; rec=""}' file | cat -v
12345|876|testdata some text|6209^M
The cat -vs above are just there to show where the \rs (^Ms) are.
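The portable awk variant can be exercised on a record shaped like the one in the question. In this sketch the second record (777|8|ok) is invented to show that intact rows pass through unchanged.

```shell
# Input: one record split by a spurious line feed, followed by one intact record.
# awk keeps appending lines to rec until a line ends in a carriage return.
joined=$(printf '12345|876|testdata\n\t\t\t\tsome text|6209\r\n777|8|ok\r\n' |
  awk '{rec = rec $0} /\r$/ {print rec; rec = ""}')

# od -c makes the tabs and carriage returns visible.
printf '%s\n' "$joined" | od -c
```

Because awk holds only the current record, memory use stays small even for large files.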

Note that the solution below reads the input file as a whole into memory, which won't work for large files.
Generally, Ed Morton's awk solution is better.
Here's a POSIX-compliant sed solution:
tab=$(printf '\t')
sed -e ':a' -e '$!{N;ba' -e '}' -e "s/\n${tab}/${tab}/g" Clicks.txt
Keys to making this POSIX-compliant:
POSIX sed doesn't recognize \t as an escape sequence, so a literal tab - via variable $tab, created with tab=$(printf '\t') - must be used in the script.
POSIX sed - or at least BSD sed - requires label names (such as :a and the a in ba above) - whether implied or explicit - to be terminated with an actual newline, or, alternatively, terminated implicitly by continuing the script in the next -e option, which is the approach chosen here.
-e ':a' -e '$!{N;ba' -e '}' is an established Sed idiom that simply "slurps" the entire input file (uses a loop to read all lines into its buffer first). This is the prerequisite for enabling subsequent string substitution across input lines.
Note how the option-argument for the last -e option is a double-quoted string so that the references to shell variable $tab are expanded to actual tabs before Sed sees them. By contrast, \n is the one escape sequence recognized by POSIX sed itself (in the regex part, not the replacement-string part).
Alternatively, if your shell supports ANSI C-quoted strings ($'...'), you can use them directly to produce the desired control characters:
sed -e ':a' -e '$!{N;ba' -e '}' -e $'s/\\n\t/\\t/g' Clicks.txt
Note how the option-argument for the last -e option is an ANSI C-quoted string, and how literal \n (which is the one escape sequence that is recognized by POSIX Sed) must then be represented as \\n. By contrast, $'...' expands \t to an actual tab before Sed sees it.
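The slurp-then-substitute command can be tried on a small inline sample instead of Clicks.txt. In this sketch the input lines are invented and only newline-plus-tab pairs are joined.

```shell
# Literal tab for the sed script, as described above.
tab=$(printf '\t')

# Read the whole input into the pattern space, then drop each newline
# that is immediately followed by a tab.
joined=$(printf 'a|b|test\n\tsome text|6\r\nnext|row\r\n' |
  sed -e ':a' -e '$!{N;ba' -e '}' -e "s/\n${tab}/${tab}/g")

printf '%s\n' "$joined" | od -c
```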

Thanks everyone for all your suggestions.. After looking at all the answers.. None quite did the trick... After some thought... I came up with
tr -d '\n' <Clicks.txt | tr '\r' '\n' | sed 's/\n/\r\n/g' >test.txt
Delete all newlines
translate all Carriage return to newline
Sed replace all newline with Carriage return line feed
This works in seconds on a 32mb file.
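The last stage depends on how the local sed treats \n and \r. In implementations where the pattern space never contains the line's trailing newline (GNU sed behaves this way), s/\n/\r\n/g has nothing to match. A sketch of a variant that appends a literal carriage return at the end of each line instead, assuming carriage returns occur only at record ends:

```shell
# Literal carriage return, since \r inside a sed script is not portable.
cr=$(printf '\r')

# 1. delete every line feed, 2. turn each carriage return into a line feed,
# 3. append a carriage return to the end of every resulting line.
fixed=$(printf '12345|876|testdata\n\t\t\t\tsome text|6209\r\n777|8|ok\r\n' |
  tr -d '\n' |
  tr '\r' '\n' |
  sed "s/\$/${cr}/")

printf '%s\n' "$fixed" | od -c
```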

A sed command to swap first and last character of each line

I want to write a one liner sed command to swap first and last character of every line of file. The below shown command is not working
sed 's/\(.\)\(.+\)\(.\)/\3\2\1/' input.txt
I even tried adding start of line and end of line characters
sed 's/^\(.\)\(.+\)\(.\)$/\3\2\1/' input.txt
It doesn't seem to match anything in the file.

sed -E 's/(.)(.+)(.)/\3\2\1/' input.txt

You need to escape the +,
sed 's/^\(.\)\(.\+\)\(.\)$/\3\2\1/' input.txt

If you like to try some other, here is a gnu awk version
awk '{a=$1;$1=$NF;$NF=a}1' FS= OFS= input.txt
This sets a to the first character, then sets first to last and last to a
It needs gnu awk, since setting FS to nothing is not in standard awk

This works portably:
echo abcd | sed 's/^\(.\)\(.*\)\(.\)$/\3\2\1/'
You can use .* here. It prints
dbca
It also works with a two-character line such as ad:
echo ad | sed 's/^\(.\)\(.*\)\(.\)$/\3\2\1/'
prints
da
The .+ isn't supported by every sed; for example, it didn't work on OS X. Therefore I recommend using .* or simulating .+ with ..*, like
echo ad | sed 's/^\(.\)\(..*\)\(.\)$/\3\2\1/'
prints
ad #not swapped

echo 'are' | sed 's/\(.\)\(.*\)\(.\)/\3\2\1/'
There is no need for ^ or $ because sed takes the longest possible match by default (so the whole line).
Use * instead of + because with + a line needs at least 3 characters to match, whereas a 2-character line should still have its first and last characters swapped.
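The behaviour of the .* form on lines of different lengths can be checked in one go. A short sketch with three invented input lines:

```shell
# Four characters, two characters, and a single character.
# The single-character line cannot match (two distinct positions are needed),
# so it passes through unchanged.
swapped=$(printf '%s\n' abcd ad a | sed 's/^\(.\)\(.*\)\(.\)$/\3\2\1/')
printf '%s\n' "$swapped"
```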

Removing lines from a file that don't match a pattern using sed

I want to remove all the lines from a file that don't have the form:
something.something,something,something
For example if the file was the following:
A sentence, some words
ABCD.CP3,GHD,HDID
Hello. How are you?
A.B,C,D
dbibb.yes,whoami,words
I would be left with:
ABCD.CP3,GHD,HDID
A.B,C,D
dbibb.yes,whoami,words
I have tried to branch to the end of the sed script if I match the pattern I don't want to delete but continue and delete the line if it doesn't match:
cp $file{,.tmp}
sed "/^.+\..+,.+,.+$/b; /.+/d" "$file.tmp" > $file
rm "$file.tmp"
but this doesn't seem to have any effect at all.
I suppose I could read the file line by line, check if matches the pattern, and output it to a file if it does, but I'd like to do it using sed or similar.

You can use grep successfully:
grep -E '^[^.]+\.[^,]+,[^,]+,[^,]+$' file > temp
mv temp file
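Run against the sample lines from the question, the grep pattern keeps exactly the three wanted lines. A sketch using a temporary file in place of the real one:

```shell
# The five sample lines from the question.
tmp=$(mktemp)
printf '%s\n' 'A sentence, some words' 'ABCD.CP3,GHD,HDID' \
  'Hello. How are you?' 'A.B,C,D' 'dbibb.yes,whoami,words' > "$tmp"

# something.something,something,something: one dot, then exactly two commas.
kept=$(grep -E '^[^.]+\.[^,]+,[^,]+,[^,]+$' "$tmp")
printf '%s\n' "$kept"
rm -f "$tmp"
```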

grep -E '^[^.]+\.[^,]+(,[^,]+){2}$' file

Instead of deleting the lines that don't satisfy the pattern, you could print the lines that match this something.something,something,something pattern.
Through sed,
$ sed -n '/^[^.]*\.[^,]*,[^,]*,[^,.]*$/p' file
ABCD.CP3,GHD,HDID
A.B,C,D
dbibb.yes,whoami,words
Use the in-place edit option -i[suffix] to save the changes.
sed -ni.bak '/^[^.]*\.[^,]*,[^,]*,[^,.]*$/p' file
Note: -i[suffix] makes a backup if a suffix is provided.
Through awk,
$ awk '/^[^.]*\.[^,]*,[^,]*,[^,.]*$/{print}' file
ABCD.CP3,GHD,HDID
A.B,C,D
dbibb.yes,whoami,words

sed - Commenting a line matching a specific string AND that is not already commented out

I have the following test file
AAA
BBB
CCC
Using the following sed I can comment out the BBB line.
# sed -e '/BBB/s/^/#/g' -i file
I'd like to only comment out the line if it does not already have a # at the beginning.
# sed -e '/^#/! /BBB/s/^/#/g' file
sed: -e expression #1, char 7: unknown command: `/'
Any ideas how I can achieve this?

Assuming you don't have any lines with multiple #s this would work:
sed -e '/BBB/ s/^#*/#/' -i file
Note: you don't need /g since you are doing at most one substitution per line.
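Because ^#* also consumes any # already present, running the command twice gives the same result as running it once. A sketch on the question's test lines, using pipes instead of -i so no file is modified:

```shell
# First pass comments out the BBB line.
once=$(printf '%s\n' AAA BBB CCC | sed -e '/BBB/ s/^#*/#/')

# Second pass replaces the existing "#" with a single "#", so nothing changes.
twice=$(printf '%s\n' "$once" | sed -e '/BBB/ s/^#*/#/')

printf '%s\n' "$twice"
```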

Another solution with the & special character which references the whole matched portion of the pattern space. It's a bit simpler/cleaner than capturing and referencing a regexp group.
sed -i 's/^[^#]*BBB/#&/' file

I find this solution to work the best.
sed -i '/^[^#]/ s/\(^.*BBB.*$\)/#\ \1/' file
It doesn't matter how many "#" symbols there are, it will never add another one. If the line you're searching for does not start with a "#", it will add one to the beginning of the line, followed by a space.
If you don't want the space after the "#"
sed -i '/^[^#]/ s/\(^.*BBB.*$\)/#\1/' file

Assuming the BBB is at the beginning of a line, I ended up using an even simpler expression:
sed -e '/^BBB/s/^/#/' -i file
One more note for the future me. Do not overlook the -i. Because this won't work: sed -e "..." same_file > same_file.

sed -i '/![^#]/ s/\(^.*BBB.*$\)/#\ \1/' file
This doesn't work for me with the keyword *.sudo; nothing gets commented at all...
Only the syntax below works:
sed -e '/sudo/ s/^#*/#/' file

Actually, you don't need the exclamation sign (!), as the caret inside the square brackets already negates the set. Anchor the bracket expression with ^ so that it tests the first character of the line; an unanchored /[^#]/ matches any line that contains at least one character other than #. This example worked for me:
sed -i '/^[^#]/ s/\(^.*BBB.*$\)/#\ \1/' file

Comment out every line containing "BBB" if it isn't commented out yet.
sed -i '/BBB/s/^#\?/#/' file

If BBB is at the beginning of the line:
sed 's/^BBB/#&/' -i file
If BBB is in the middle of the line:
sed 's/^[^#]*BBB/#&/' -i file

I'd usually supply sed with -i.bak to backup the file prior to making changes to the original copy:
sed -i.bak '/BBB/ s/^#*/#/' file
This way when done, I have both file and file.bak and I can decide to delete file.bak only after I'm confident.

If you want to comment out not only exact matches for 'BBB' but also lines that have 'BBB' somewhere in the middle, you can go with following solution:
sed -E '/^([^#].*)?BBB/ s/^/#/'
This won't change any strings that are already commented out.
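A quick check of that claim on a few invented lines, assuming a sed that accepts -E:

```shell
# Uncommented lines containing BBB get a "#"; commented ones and
# lines without BBB are left alone.
result=$(printf '%s\n' 'BBB' '#BBB' 'xx BBB' '# xx BBB' 'AAA' |
  sed -E '/^([^#].*)?BBB/ s/^/#/')
printf '%s\n' "$result"
```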
