How to delete lines before a match, preserving it?

I have the following script to remove all lines before a line that matches a word:
str='
1
2
3
banana
4
5
6
banana
8
9
10
'
echo "$str" | awk -v pattern=banana '
print_it {print}
$0 ~ pattern {print_it = 1}
'
It returns:
4
5
6
banana
8
9
10
But I want to include the first match too. This is the desired output:
banana
4
5
6
banana
8
9
10
How could I do this? Is there a better way with another command?
I've also tried sed '0,/^banana$/d', but it seems to only work with files, and I want to use it with a variable.
And how could I get all lines before a match using awk? I mean, with banana in the regex, this would be the output:
1
2
3

This awk should do it (/banana/ sets the flag f, and the bare f pattern then prints every line from the first match onward, including the matching line itself):
echo "$str" | awk '/banana/ {f=1} f'
banana
4
5
6
banana
8
9
10

sed -n '/^banana$/,$p'
Should do what you want. -n tells sed to print nothing by default, and the p command prints every addressed line, so only the lines from the first match to the end are output. This works on a stream. It differs slightly from the awk solution in that it requires the entire line to be exactly 'banana' (I'm just copying your sed example), whereas your awk pattern only requires 'banana' to appear somewhere in the line. I'm not sure what you mean by "use it with a variable": if you mean you want the string 'banana' in a variable, you can easily do sed -n "/$variable/,\$p" (note the double quotes and the escaped $), or sed -n "/^$variable\$/,\$p", or sed -n "/^$variable"'$/,$p'. You can also echo "$str" | sed -n '/banana/,$p', just like you do with awk.
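For example, with the $str from the question, a quick sketch of the variable form:
variable=banana
echo "$str" | sed -n "/^$variable\$/,\$p"
banana
4
5
6
banana
8
9
10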

Just swap the order of the two rules in the awk:
echo "$str" | awk -v pattern=banana '
$0 ~ pattern {print_it = 1}   # if the line matches, switch the flag on
print_it {print}              # if the flag is on, print the line
'
The print_it flag is set as soon as the pattern is found. From that moment on (including that line), lines are printed whenever the flag is on. In your version the print rule ran before the flag was set, so the first matching line itself was not printed.

cat in.txt | awk "/banana/,0"
In case you don't want to preserve the matched line, you can use
cat in.txt | sed "0,/banana/d"
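Both forms also work on the variable from the question instead of a file (note that the 0,/regex/ address is a GNU sed extension):
echo "$str" | awk '/banana/,0'
echo "$str" | sed '0,/banana/d'
The awk range prints from the first banana line onward and keeps it; the sed deletes everything up to and including that first match.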

Inserting a "," in a particular position of a text

(I've included the exact text and the command I executed, so it may look a bit messy.)
I have a .TXT file that looks like:
11111111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111111
And the outcome I am looking for is:
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
The command I have tried is:
sed -i 's/\(.\{14\}\)\(.\{7\}\)\(.\{2\}\)\(.\{1\}\)\(.\{3\}\)\(.\{13\}\)\(.\{1\}\)\(.\{8\}\)\(.\{16\}\)\(.\{3\}\)/\1,\2,\3,\4,\5,\6,\7,\8,\9,\10,/' SOME.TXT
And the outcome I got was:
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,1111111111111110,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,1111111111111110,111
I have literally no idea why these 0s suddenly popped up, or why the ',' doesn't appear in the position I specified, even though it worked halfway.
Is this a bug in sed or something?
It prints 0 in the output because sed back-references only go up to \9, so \10 is interpreted as \1 followed by a literal 0.
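A tiny illustration of that parsing:
echo 'ab' | sed 's/\(a\)\(b\)/\10/'
a0
The replacement is back-reference \1 ("a") followed by a literal 0, not a (nonexistent) group 10.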
You can solve it easily using the FIELDWIDTHS feature of gnu-awk (the $1 = $1 assignment forces awk to rebuild the record with OFS, i.e. commas, between the fixed-width fields, and the trailing 1 prints it):
awk -v OFS=, 'BEGIN { FIELDWIDTHS = "14 7 2 1 3 13 1 8 16 3 *" } {$1 = $1} 1' file
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
Just as an academic exercise, here is a working sed that solves this using 2 substitutions:
sed -E 's/(.{14})(.{7})(.{2})(.)(.{3})(.{13})(.)(.{8})(.+)/\1,\2,\3,\4,\5,\6,\7,\8,\9/; s/(.+,.{16})(.{3})(.*)/\1,\2,\3/' file
sed can't reference capture groups > 9, Perl can:
perl -i -pe 's/(.{14})(.{7})(.{2})(.)(.{3})(.{13})(.)(.{8})(.{16})(.{3})/$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,/' SOME.TXT
If you insist on using sed, you can do something like:
sed 's/./&,/68;s/./&,/65;s/./&,/49;s/./&,/41;s/./&,/40;s/./&,/27;s/./&,/24;s/./&,/23;s/./&,/21;s/./&,/14' test.txt
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
11111111111111,1111111,11,1,111,1111111111111,1,11111111,1111111111111111,111,111
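Note that those s/./&,/N substitutions run from the highest position down to the lowest: each inserted comma shifts the characters after it one place to the right, so applying them left to right would put the later commas in the wrong places. A small sketch of the effect on a toy string:
printf '123456\n' | sed 's/./&,/4;s/./&,/2'
12,34,56
printf '123456\n' | sed 's/./&,/2;s/./&,/4'
12,3,456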

Grep everything before a specific character [duplicate]

I have a file, my_file.
The contents of the file look like this:
4: something
5: something
7: another thing
I want to print out the following:
4
5
7
Basically I want to get all the numbers before the character :
Here is what I tried:
grep -i "^[0-9]+(?=(:)" my_file
This returned nothing. How can I change this command to make it work?
This is a use-case for awk:
$ awk -F":" '{print $1}' < inputfile
because you're using : as a field delimiter.
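On the sample file from the question this gives:
$ awk -F":" '{print $1}' < my_file
4
5
7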
Try this:
grep -Eo "^[0-9]+" my_file # you can use either -E (extended) or -P (Perl-compatible) regular expressions
-o prints only the matching part of each line.
We also need a regex flavour in which + is a quantifier; both of the following will work:
-E extended regular expressions
-P Perl-compatible regular expressions
Breakdown:
^ anchors the match at the start of the line
[0-9] matches a digit
+ matches 1 or more of the preceding [0-9]
Output:
4
5
7
Using grep
grep -oE '^[0-9]+:' my_file | tr -d ':'
Using sed
sed 's#:.*$##g' my_file
Demo:
$ cat test.txt
4: something
5: something
7: another thing
$ sed 's#:.*$##g' test.txt
4
5
7
$ grep -oE '^[0-9]+:' test.txt | tr -d ':'
4
5
7

using sed to insert whitespaces between a number and word

I have a series of files that use fixed-width delimiting instead of comma-separated delimiting. They all look like this:
2015/09/29 659027 RIH619 25 105.80IN921186
2015/09/29 659027 RIH619 25 105.80IN921186
2015/09/29 659027 RIH619 25 105.80IN921186
2015/09/29 659027 RIH619 25 105.80IN921186
I would like to replace all the spaces with commas. I have a piece of code that accomplishes this:
sed -r 's/^\s+//;s/\s+/,/g'
After running the code I get this result:
2015/09/29,659027,RIH619,25,105.80IN921186
2015/09/29,659027,RIH619,25,105.80IN921186
2015/09/29,659027,RIH619,25,105.80IN921186
2015/09/29,659027,RIH619,25,105.80IN921186
My problem is that the files I get don't have a space between the amount and the reference. My output needs to look like this:
2015/09/29,659027,RIH619,25,105.80,IN921186
2015/09/29,659027,RIH619,25,105.80,IN921186
2015/09/29,659027,RIH619,25,105.80,IN921186
2015/09/29,659027,RIH619,25,105.80,IN921186
What I tried is:
sed -r 's/^\s+//;s/\.\d\d\D+/\.\d\d,\D/;s/\s+/,/g'
But it didn't seem to do anything
With tr and sed:
tr ' ' ',' <file | sed -r 's/(\.[0-9]{2})/\1,/'
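A quick check on one of the sample lines (assuming single spaces between the fixed-width fields, as in the sample):
echo "2015/09/29 659027 RIH619 25 105.80IN921186" | tr ' ' ',' | sed -r 's/(\.[0-9]{2})/\1,/'
2015/09/29,659027,RIH619,25,105.80,IN921186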
You can use this single sed for both:
sed -r 's/[[:blank:]]+/,/g; s/([[:digit:]])([[:alpha:]])/\1,\2/g' file
2015/09/29,659027,RIH619,25,105.80,IN921186
2015/09/29,659027,RIH619,25,105.80,IN921186
2015/09/29,659027,RIH619,25,105.80,IN921186
2015/09/29,659027,RIH619,25,105.80,IN921186
([[:digit:]]) matches a digit and captures it in group#1
([[:alpha:]]) matches a letter and captures it in group#2
\1,\2 places a comma between the two groups.
GNU awk has fixed field-width support (FIELDWIDTHS) that is good for this sort of thing:
$ echo "2015/09/29 659027 RIH619 25 105.80IN921186" |
awk 'BEGIN { FIELDWIDTHS="10 1 6 1 6 1 2 1 6 8"; OFS="," }{ print $1,$3,$5,$7,$9,$10 }'
2015/09/29,659027,RIH619,25,105.80,IN921186
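The same thing run over the whole file rather than a single echoed line (a sketch; FIELDWIDTHS requires GNU awk, and file stands for your actual input file):
awk 'BEGIN { FIELDWIDTHS="10 1 6 1 6 1 2 1 6 8"; OFS="," }{ print $1,$3,$5,$7,$9,$10 }' file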

Awk to skip the blank lines

The output of my script is tab-delimited, produced with awk like this:
awk -v variable=$bashvariable '{print variable"\t single\t" $0"\t double"}' myinfile.c
The awk command is run in a while loop which updates the variable value and the file myinfile.c for every cycle.
I am getting the expected results with this command.
But if myinfile.c contains a blank line (which it can), the command prints a line with no relevant information. Can I tell awk to ignore blank lines?
I know it can be done by removing the blank lines from myinfile.c before passing it to awk.
I know the sed and tr ways, but I want awk to handle it within the command above, not as a separate or piped solution such as:
sed '/^$/d' myinfile.c
tr -s "\n" < myinfile.c
Thanks in advance for your suggestions and replies.
There are two approaches you can try to filter out the blank lines:
awk 'NF' data.txt
and
awk 'length' data.txt
Just put these at the start of your command, i.e.,
awk -v variable=$bashvariable 'NF { print variable ... }' myinfile
or
awk -v variable=$bashvariable 'length { print variable ... }' myinfile
Both of these act as gatekeepers/if-statements.
The first approach works by only printing lines where the number of fields (NF) is not zero (i.e., greater than zero).
The second method looks at the line length and only acts if the length is not zero (i.e., greater than zero).
You can pick the approach that is most suitable for your data/needs.
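Folded into the command from the question, a sketch using the NF form (the variable is quoted here so its value survives word splitting):
awk -v variable="$bashvariable" 'NF {print variable"\t single\t" $0"\t double"}' myinfile.c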
You could just add
/^\s*$/ {next;}
to the front of your script. That will match blank lines and skip the remaining awk rules for them. Putting it all together:
awk -v variable=$bashvariable '/^\s*$/ {next;} {print variable"\t single\t" $0"\t double"}' myinfile.c
Maybe you could try this:
awk -v variable=$bashvariable '$0{print variable"\t single\t" $0"\t double"}' myinfile.c
Try this:
awk -v variable=$bashvariable '/^.+$/{print variable"\t single\t" $0"\t double"}' myinfile.c
I haven't seen this solution yet, so: awk '!/^\s*$/{print $1}' will run the block for every line that is not blank (i.e., not empty or whitespace-only).
The \s metacharacter is not available in all awk implementations, but you can write !/^[ \t]*$/ instead.
https://www.gnu.org/software/gawk/manual/gawk.html
\s Matches any space character as defined by the current locale. Think of it as shorthand for ‘[[:space:]]’.
Based on Levon's answer, you may just add | awk 'length { print $1 }' to the end of the command.
So change
awk -v variable=$bashvariable '{ whatever }' myinfile.c
to
awk -v variable=$bashvariable '{ whatever }' myinfile.c | awk 'length { print $1 }'
In case this doesn't work, use | awk 'NF { print $1 }' instead.
Another awk way to trim out only the truly zero-length lines, while keeping the ones that contain only spaces and tabs, is this:
awk 8 RS=
(With RS set to the empty string, awk runs in paragraph mode: empty lines act as record separators and disappear, while whitespace-only lines stay inside a record; the 8 is just an always-true pattern, like 1, so every record is printed.)
Given this input:
1 abc
2 def
3
4 3591952
5
6 93253
just doing awk NF trims out both line 3 (zero length) and line 5 (spaces and tabs):
1 abc
2 def
3 3591952
4 93253
but the RS= approach keeps line 5 for you:
1 abc
2 def
3 3591952
4
5 93253
Note: lines containing only \013 (\v, VT), \014 (\f, FF) or \015 (\r, CR) are not skipped by awk NF with the default FS = " ", even though those characters also belong to POSIX [[:space:]].
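A quick way to check that note (cat -v is only there to make the vertical tab visible as ^K):
printf 'a\n\v\nb\n' | awk NF | cat -v
a
^K
b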

complex text replace from the command line

A simplified example of what I want to do:
I have a file: input.txt which looks like
a 2 4 b
a 3 8 b
c 9 4 d
a 3 4 8 b
and a script: add.sh which takes command-line parameters and returns their sum
I want to search input.txt for all instances of the pattern 'a (.*) b' where I pass the (.*) part as a command line parameter to add.sh.
For example, I want to do something like sed 's/a \(.*\) b/a {add.sh \1} b/g' input.txt
(that of course doesn't work).
So the output should look like
a 6 b
a 11 b
c 9 4 d
a 15 b
What would be the easiest way to do this?
Thanks
perl -pe 's/a (.*) b/"a ".`add.sh $1`." b"/eg' input.txt
Just make sure that add.sh doesn't output a newline.
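For reference, a minimal add.sh along the lines the question describes (summing its command-line parameters) could look like this; printf avoids the trailing newline mentioned above:
#!/bin/sh
# add.sh: print the sum of all command-line arguments, without a trailing newline
sum=0
for n in "$@"; do
    sum=$((sum + n))
done
printf '%s' "$sum"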
And if perl isn't an option, you could script it something like this:
grep -e '^a .* b$' input.txt | sed -e 's/a \(.*\) b/\1/g' | while read LINE; do ./add.sh $LINE; done
I realize the above doesn't solve your problem; I just focused on your sed expression.
However, if you are keen on solving this problem using another shell script, it would probably look something like this:
cat input.txt | while read -r LINE; do
    # spaces in the regex must be escaped (or the regex kept in a variable),
    # otherwise [[ ... =~ ... ]] is a syntax error
    if [[ "$LINE" =~ ^a\ (.*)\ b$ ]]; then
        echo -n "a "
        ./add.sh ${BASH_REMATCH[1]}   # assumes add.sh prints the sum without a trailing newline
        echo " b"
    else
        echo "$LINE"
    fi
done
If add.sh is:
#!/bin/sh
arg1=$1
nums=$2
shift 2
for i in $nums
do
sum=$((sum+i))
done
echo "$arg1 $sum $#"
then you could do:
sed 's/^\([^ ]* \)\(.*\)\( [^ ]*\)$/\1\"\2\"\3/' input.txt | xargs -L 1 ./add.sh
which would add the numbers on every line. To add them only for lines that start with "a" and end with "b" use this:
sed 's/^a \(.*\) b$/a \"\1\" b/' input.txt | xargs -L 1 ./add.sh
The "c 9 4 d" line is still processed by add.sh but the sed command doesn't add any quotes, so the script sees only "9" as $2 and so the sum is only done once with the result as "9". The "4" is seen as part of the remainder of $#.