SED command to delete empty lines till the first occurrence of sentence - regex

My input file will be
[emptyline]
[emptyline]
aaa
bbb
[emptyline]
cc
dd
Here [emptyline] indicates blanklines.
And I need an SED command to change this into
aaa
bbb
[emptyline]
cc
dd
That is, I need to delete only the blank lines at the top of the file.
I need a sed command specifically, since I will use it in a bash script.
Additional info: this is on Mac OS X.

You can do it with branching in sed:
sed '/^ *$/d; :a; n; ba' file
A more efficient solution would be to use a range expression, see user2719058's answer for how to do this.
It is even more efficient if you can reduce the need for sed, see gniourf_gniourf's answer for alternatives.
This can be expressed in awk elegantly like this:
awk 'NF {f=1} f' file
Output in both cases:
aaa
bbb
[emptyline]
cc
dd
Explanation
Both alternatives work by looking for the first non-empty line.
With sed the pattern /^ *$/d will delete all empty lines in the beginning of the file. What follows is a loop that prints the rest of the file.
awk will update NF for every line, when the line is empty NF is zero. This is exploited for setting the print-flag (f).
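BSD sed (as shipped with macOS) generally expects a label to end at a newline or at the end of an -e fragment, so the semicolon-separated one-liner may need to be split up there. A minimal sketch, assuming a POSIX sh; the function names strip_sed and strip_awk are made up for illustration:

```shell
# Hypothetical helpers: both read stdin and drop only the leading blank lines.
strip_sed() {
  # Same delete-then-loop logic, one command per -e so the label ends cleanly.
  sed -e '/^ *$/d' -e ':a' -e 'n' -e 'ba'
}
strip_awk() {
  # NF is 0 on empty lines; f switches on at the first non-empty line.
  awk 'NF {f=1} f'
}

printf '\n\naaa\nbbb\n\ncc\ndd\n' | strip_sed
printf '\n\naaa\nbbb\n\ncc\ndd\n' | strip_awk
```

Both calls should keep the blank line between bbb and cc, since only the lines before the first non-empty one are dropped.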

If the lines are really empty (no whitespace), I would suggest
sed -n '/./,$p', otherwise sed -n $'/[^ \t]/,$p'. (The $'..' syntax makes bash expand the \t, so you don't need a sed that understands it.)
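A small sketch of the same range idea using the POSIX [:space:] class instead of a literal tab, so no $'..' quoting is needed. The helper name strip_leading_ws_lines is hypothetical:

```shell
# Hypothetical helper: print from the first line that contains a
# non-whitespace character through the end of input.
strip_leading_ws_lines() {
  sed -n '/[^[:space:]]/,$p'
}

# Leading lines hold a space, a tab, and nothing; a later " " line is kept.
printf ' \n\t\n\naaa\n \nbbb\n' | strip_leading_ws_lines
```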

One funny possibility:
{ sed -n '/./{p;q}' && cat; } < file
And it's really efficient too! (try to benchmark it against the other methods). If you might have some spaces in your first lines, you could do:
{ sed -n '/[^[:space:]]/{p;q}' && cat; } < file
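sed prints nothing until it reads a non-empty line; at that point it prints the line and quits. Then cat outputs the rest of the file; since there is no more sed filtering, the data flows much faster through cat! Note that this relies on sed leaving the file offset right after the line it printed, so it needs seekable input such as a regular file, not a pipe.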
The same with grep:
{ grep -v -m 1 '^$' && cat; } < file
or discarding leading lines with possible spaces:
{ grep -v -m 1 '^[[:space:]]*$' && cat; } < file

A simple one is: sed '1,/^$/d' file
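It deletes from line 1 through the first blank line found after line 1. For the sample input, which has exactly two leading blank lines, that gives the desired output and preserves the later blank line; with any other number of leading blank lines it will not.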

Here is another way of deleting all blank lines at file start using pure BASH way without involving any external utility like awk/sed:
[[ "$(<file)" =~ ^[[:space:]]+(.*)$ ]] && echo "${BASH_REMATCH[1]}"
aaa
bbb
[emptyline]
cc
dd

sed -n "H;$ {x;s/^\n*//p;}"
This deletes all leading \n and also handles the case where the first line is not empty (1,/^$/ does not work in that case).
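To see the difference on input that does not start with a blank line, here is a rough comparison. The sample function and its content are made up for illustration:

```shell
# Hypothetical input whose first line is NOT blank.
sample() { printf 'aaa\n\nbbb\n'; }

# The range 1,/^$/ always starts at line 1, so "aaa" is deleted as well.
sample | sed '1,/^$/d'

# Collect everything in the hold space, then strip leading newlines once at EOF.
sample | sed -n 'H;${x;s/^\n*//p;}'
```

The first command should print only bbb, while the second leaves the input intact. The trade-off is that the second form holds the whole file in memory.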

Related

How can I print 2 lines if the second line contains the same match as the first line?

Let's say I have a file with several million lines, organized like this:
#1:N:0:ABC
XYZ
#1:N:0:ABC
ABC
I am trying to write a one-line grep/sed/awk matching function that returns both lines if the string after the last colon on the first line (ABC in this example) is found in the second line.
When I try to use grep -A1 -P and pipe the matches with a match like '(?<=:)[A-Z]{3}', I get stuck. I think my creativity is failing me here.
With awk
$ awk -F: 'NF==1 && $0 ~ s{print p ORS $0} {s=$NF; p=$0}' ip.txt
#1:N:0:ABC
ABC
-F: use : as delimiter, makes it easy to get last column
s=$NF; p=$0 save last column value and entire line for printing later
NF==1 if line doesn't contain :
$0 ~ s if line contains the last column data saved previously
if search data can contain regex meta characters, use index($0,s) instead to search literally
note that this code assumes input file having line containing : followed by line which doesn't have :
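A short sketch of the index() variant mentioned above, next to the regex version, to show where they differ. The function names and the A.C test header are invented for the example:

```shell
# Hypothetical helpers; both read header/sequence pairs on stdin.
match_regex() {
  awk -F: 'NF==1 && $0 ~ s {print p ORS $0} {s=$NF; p=$0}'
}
match_literal() {
  # index() returns 0 when s does not occur literally in $0.
  awk -F: 'NF==1 && index($0, s) {print p ORS $0} {s=$NF; p=$0}'
}

# "A.C" used as a regex matches "ABC"; as a literal string it does not.
printf '#1:N:0:A.C\nABC\n' | match_regex
printf '#1:N:0:A.C\nABC\n' | match_literal
```

Like the original, this assumes every line containing : is followed by a line without one.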
With GNU sed (might work with other versions too, syntax might differ though)
$ sed -nE '/:/{N; /.*:(.*)\n.*\1/p}' ip.txt
#1:N:0:ABC
ABC
/:/ if line contains :
N add next line to pattern space
/.*:(.*)\n.*\1/ capture string after last : and check if it is present in next line
again, this assumes input like shown in question.. this won't work for cases like
#1:N:0:ABC
#1:N:0:XYZ
XYZ
This might work for you (GNU sed):
sed -n 'N;/.*:\(.*\)\n.*\1/p;D' file
Use grep-like option -n to explicitly print lines. Read two lines into the pattern space and print both if they meet the requirements. Always delete the first and repeat.
If your actual Input_file is the same as the example shown, then the following may help you too.
awk -v FS="[: \n]" -v RS="" '$(NF-1)==$NF' Input_file
EDIT: Adding 1 more solution as per Sundeep suggestion too here.
awk -v FS='[:\n]' -v RS= 'index($NF, $(NF-1))' Input_file

Bash - how to put each line within quotation

I want to put each line within quotation marks, such as:
abcdefg
hijklmn
opqrst
convert to:
"abcdefg"
"hijklmn"
"opqrst"
How to do this in Bash shell script?
Using awk
awk '{ print "\""$0"\""}' inputfile
Using pure bash
# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r FOO; do
printf '"%s"\n' "$FOO"
done < inputfile
where inputfile would be a file containing the lines without quotes.
If your file has empty lines, awk is definitely the way to go:
awk 'NF { print "\""$0"\""}' inputfile
NF tells awk to only execute the print command when the Number of Fields is more than zero (line is not empty).
I use the following command:
xargs -I{lin} echo \"{lin}\" < your_filename
xargs takes standard input (redirected from your file), passes one line at a time to the {lin} placeholder, and then executes the command that follows, in this case an echo with escaped double quotes.
With GNU xargs you can use the -i option (deprecated in favor of -I) to omit the name of the placeholder, like this:
xargs -i echo \"{}\" < your_filename
In both cases, your IFS must be at default value or with '\n' at least.
This sed should work for ignoring empty lines as well:
sed -i.bak 's/^..*$/"&"/' inFile
or
sed 's/^.\{1,\}$/"&"/' inFile
Use sed:
sed -e 's/^\|$/"/g' file
More effort needed if the file contains empty lines.
I think sed and awk are the best solutions, but if you want to use just the shell, here is a small script for you.
#!/bin/bash
chr="\""
file="file.txt"
cp "$file" "$file._backup"
while IFS= read -r line
do
echo "${chr}$line${chr}"
done < "$file" > newfile
mv newfile "$file"
paste -d\" /dev/null your-file /dev/null
(not the nicest looking, but probably the fastest)
Now, if the input may contain quotes, you may need to escape them with backslashes (and then escape backslashes as well) like:
sed 's/["\]/\\&/g; s/.*/"&"/' your-file
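A quick sketch of that escaping command on two sample lines, one containing quotes and one containing a backslash. The helper name quote_lines and the sample strings are made up:

```shell
# Hypothetical helper: backslash-escape " and \ first, then wrap each line in quotes.
quote_lines() {
  sed 's/["\]/\\&/g; s/.*/"&"/'
}

printf '%s\n' 'say "hi"' 'a\b' | quote_lines
```

The order matters: escaping has to run before the wrapping quotes are added, otherwise those would be escaped too. Empty lines come out as "".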
This answer worked for me in mac terminal.
$ awk '{ printf "\"%s\",\n", $0 }' your_file_name
It should be noted that the text in double quotes and commas was printed out in terminal, the file itself was unaffected.
I used sed with two expressions to replace start and end of line, since in my particular use case I wanted to place HTML tags around only lines that contained particular words.
So I searched for the lines containing words contained in the bla variable within the text file inputfile and replaced the beginning with <P> and the end with </P> (well, actually I did some longer HTML tagging in the real thing, but this will serve fine as an example).
Similar to:
$ bla=foo
$ sed -e "/${bla}/s#^#<P>#" -e "/${bla}/s#\$#</P>#" inputfile
<P>foo</P>
bar
$

delete characters in lines starting with an unique pattern

I have a file consisting of many entries that look like this:
>1761420406686363113470.1
CAAGATTCTGAGATAATCGCGGTTTAAAGTTTCAAATTTGTTTCGGCCGATTCGAAGTCA
i.e. a header line starting with > and many lines of sequence, followed by a header line.
I am trying to write a sed script that goes to only the lines that start with > (not the sequences lines) and deletes all but the first 10 numbers.
There are a lot of similar questions to this, but I can't figure it out. I've been trying variations on this code:
sed 's/^>..........*/^>........../' input.fasta
but clearly am not doing it right..
This might work for you (GNU sed):
sed -r 's/^(>.{10}).*/\1/p;d' file
This deletes all but those lines that are substituted, if you want to retain the sequence lines:
sed -r 's/^(>.{10}).*/\1/' file
should fit the bill.
You have to capture the first 10 characters in parentheses:
sed -e 's/^\(>..........\).*/\1/'
Which can be shortened to
sed -e 's/^\(>.\{10\}\).*/\1/'
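As a quick check of the interval form, here is a sketch that runs it on a header line and a sequence line. The helper name trim_headers is hypothetical and the sequence is shortened:

```shell
# Hypothetical helper: on lines starting with ">", keep ">" plus the next
# 10 characters; lines that do not match are printed unchanged.
trim_headers() {
  sed 's/^\(>.\{10\}\).*/\1/'
}

printf '%s\n' '>1761420406686363113470.1' 'CAAGATTCTGAGATAATCGCGG' | trim_headers
```

Headers shorter than 11 characters do not match the pattern and pass through as they are.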
as an alternative to sed, use cut
$ echo ">1761420406686363113470.1" | cut -c1-11
>1761420406
To operate only on lines starting with >, wrap it in a bash while loop
$ while IFS= read -r line; do if [[ $line == \>* ]]; then cut -c1-11 <<< "$line"; else printf '%s\n' "$line"; fi; done < input
>1761420406
CAAGATTCTGAGATAATCGCGGTTTAAAGTTTCAAATTTGTTTCGGCCGATTCGAAGTCA
or using awk (substr positions start at 1; a start of 0 gives different results across awk implementations):
$ awk '{if ($0 ~ /^>/){print substr($0,1,11)}else{print}}' input
>1761420406
CAAGATTCTGAGATAATCGCGGTTTAAAGTTTCAAATTTGTTTCGGCCGATTCGAAGTCA
Since good sed answers are already posted, here is a GNU awk solution.
gawk '/^>/{print gensub(/(.{11}).*/,"\\1","G",$1);next }1' inputFile

Using sed to find and replace within matched substrings

I'd like to use sed to process a property file such as:
java.home=/usr/bin/java
groovy-home=/usr/lib/groovy
workspace.home=/build/me/my-workspace
I'd like to replace the .'s and -'s with _'s but only up to the ='s token. The output would be
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
I've tried various approaches including using addresses but I keep failing. Does anybody know how to do this?
What about...
$ echo foo.bar=/bla/bla-bla | sed -e 's/\([^-.]*\)[-.]\([^-.]*=.*\)/\1_\2/'
foo_bar=/bla/bla-bla
This won't work for the case where you have more than one dot or dash on the left, though. I'll have to think about it further.
awk makes life easier in this case:
awk -F= -vOFS="=" '{gsub(/[.-]/,"_",$1)}1' file
here you go:
kent$ echo "java.home=/usr/bin/java
groovy-home=/usr/lib/groovy
workspace.home=/build/me/my-workspace"|awk -F= -vOFS="=" '{gsub(/[.-]/,"_",$1)}1'
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
if you really want to do with sed (gnu sed)
sed -r 's/([^=]*)(.*)/echo -n \1 \|sed -r "s:[-.]:_:g"; echo -n \2/ge' file
same example:
kent$ echo "java.home=/usr/bin/java
groovy-home=/usr/lib/groovy
workspace.home=/build/me/my-workspace"|sed -r 's/([^=]*)(.*)/echo -n \1 \|sed -r "s:[-.]:_:g"; echo -n \2/ge'
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
In this case I would use AWK instead of sed:
awk -F"=" '{gsub("\\.|-","_",$1); print $1"="$2;}' file.properties
Output:
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
This might work for you (GNU sed):
sed -r 's/=/\n&/;h;y/-./__/;G;s/\n.*\n//' file
"You wait ages for a bus..."
This works with any number of dots and hyphens in the line and does not require GNU sed:
sed 'h; s/.*=//; x; s/=.*//; s/[.-]/_/g; G; s/\n/=/' < data
Here's how:
h: save a copy of the line in the hold space
s: throw away everything up to and including the equal sign in the pattern space
x: swap the pattern and hold
s: blow away everything from the = onward in the pattern
s: replaces dots and hyphens with underscores
G: join the pattern and hold with a newline
s: replace that newline with an equal to glue it all back together
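The steps above can be tried on a key with several dots and hyphens. This is only a sketch: the helper name normalize_keys and the sample property are made up, and it assumes the value itself contains no = sign:

```shell
# Hypothetical helper wrapping the hold-space one-liner.
# Assumption: exactly one "=" per line (values contain no "=").
normalize_keys() {
  sed 'h; s/.*=//; x; s/=.*//; s/[.-]/_/g; G; s/\n/=/'
}

# Dots and hyphens in the value must survive untouched.
printf '%s\n' 'a.b-c.d=/opt/my-app/v1.2' | normalize_keys
```

Because the key and the value are edited separately, the g flag can be used freely on the key without touching anything after the =.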
Other way using sed
sed -re 's/(.*)([.-])(.*)=(.*)/\1_\3=\4/g' temp.txt
Output
java_home=/usr/bin/java
groovy_home=/usr/lib/groovy
workspace_home=/build/me/my-workspace
If there is more than one . or - on the left-hand side, use this:
sed -re ':a; s/^([^.-]+)([\.-])(.*)=/\1_\3=/1;t a' temp.txt

What actually the meaning of "-n" in sed?

According to http://linux.about.com/od/commands/l/blcmdl1_sed.htm
suppress automatic printing of pattern
space
I've tested with and without -n, and sed produces the same result.
I don't understand what "space" means here.
Sed has two places to store text: pattern space and hold space. Pattern space is where each line is put to be processed by sed commands; hold space is an auxiliary place to put some text you may want to use later. You probably will use only pattern space.
Before sed goes to process a line, it is put in the pattern space. Then, sed applies all commands (such as s///) to the pattern space and, by default, prints the resulting text from the pattern space. Let us suppose we have a file myfile with a line like:
The quick brown fox jumps over the lazy dog.
We run the following command:
sed 's/fox/coati/;s/dog/dingo/' myfile
Sed will apply s/fox/coati/ and then s/dog/dingo/ for each line of the file - in this case, the only one we showed above. When it occurs, it will put the line in the pattern space, which will have the following content:
The quick brown fox jumps over the lazy dog.
Then, sed will run the first command. After sed runs the command s/fox/coati/, the content of the pattern space will be:
The quick brown coati jumps over the lazy dog.
Then sed will apply the second command, s/dog/dingo/. After that, the content of the pattern space will be:
The quick brown coati jumps over the lazy dingo.
Note that this only happens in memory; nothing has been printed yet.
After all commands have been applied to the current line, sed will by default take the content of the pattern space and print it to standard output. However, when you give -n as an option to sed, you ask sed not to execute this last step, except if it is explicitly requested. So, if you run
sed -n 's/fox/coati/;s/dog/dingo/' myfile
nothing will be printed.
But how could you explicitly request sed to print the pattern space? Well, you can use the p command. When sed finds this command, it will print the content of the pattern space immediately. For example, in the command below we request sed to print the content of the pattern space just after the first command:
sed -n 's/fox/coati/;p;s/dog/dingo/' myfile
The result will be
$ sed -n 's/fox/coati/;p;s/dog/dingo/' myfile
The quick brown coati jumps over the lazy dog.
Note that only fox is replaced. It happens because the second command was not executed before the pattern space was printed. If we want to print the pattern space after both commands, we just put p after the second one:
sed -n 's/fox/coati/;s/dog/dingo/;p' myfile
Another option, if you are using the s/// command, is to pass the p flag to s///:
sed -n 's/fox/coati/;s/dog/dingo/p' myfile
In this case, the line will only be printed if the flagged replacement was executed. It may be very useful!
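The cases described above can be compared side by side. A minimal sketch, feeding the sample sentence through a pipe instead of myfile; the second input line in the last command is invented to show a line that lacks "dog":

```shell
line='The quick brown fox jumps over the lazy dog.'

# Autoprint on: one line of output, both substitutions applied.
printf '%s\n' "$line" | sed 's/fox/coati/;s/dog/dingo/'

# -n with no p anywhere: no output at all.
printf '%s\n' "$line" | sed -n 's/fox/coati/;s/dog/dingo/'

# -n with the p flag on the second s: a line is printed only if "dog" was replaced,
# so the made-up second line below is silently dropped.
printf '%s\n' "$line" 'A fox with no canine nearby.' | sed -n 's/fox/coati/;s/dog/dingo/p'
```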
Just try a sed do-nothing:
sed '' file
and
sed -n '' file
The first will print the whole file, but the second will NOT print anything.
This puts sed into quiet mode, where sed will suppress all output except for when explicitly stated by a p command:
-n
--quiet
--silent
By default, sed will print out the pattern space at
the end of each cycle through the script. These
options disable this automatic printing, and sed
will only produce output when explicitly told to
via the p command.
An example of this would be if you wanted to use sed to simulate the actions of grep:
$ echo -e "a\nb\nc" | sed -n '/[ab]/ p'
a
b
without the -n you would get an occurrence of c (and two occurrences of a and b)
$ echo "a b c d" | sed "s/a/apple/"
apple b c d
The pattern space is printed implicitly.
$ echo "a b c d" | sed -n "s/a/apple/"
No output.
$ echo "a b c d" | sed -n "s/a/apple/p"
apple b c d
Explicitly print the pattern space.