I have a file, my_file.
The contents of the file look like this:
4: something
5: something
7: another thing
I want to print out the following:
4
5
7
Basically I want to get all the numbers before the character :
Here is what I tried:
grep -i "^[0-9]+(?=(:)" my_file
This returned nothing. How can I change this command to make it work?
This is a use-case for awk:
$ awk -F":" '{print $1}' < inputfile
because you're using : as a field delimiter.
Try this:
grep -Eo "^[0-9]+" my_file # you can use either -E (extended) or -P (Perl-compatible) regular expressions
-o prints only the matching part of each line.
We also need to tell grep which regex syntax to use, because a bare + is not a repetition operator in grep's default basic syntax.
Both of the following will work:
-E extended regular expressions
-P Perl-compatible regular expressions
Breakdown:
^ signifies the start of the line
[0-9] match a digit
+ match 1 or more from [0-9]
Output:
4
5
7
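As a side note on why the attempt in the question printed nothing: without -E or -P, grep treats + and ( as literal characters, the parentheses are unbalanced, and a lookahead such as (?=:) exists only in Perl-compatible syntax. Below is a minimal sketch of both working forms. The helper name sample and the variable names are made up for illustration, and the -P line assumes a grep built with PCRE support.

```shell
# Sample input from the question (hypothetical helper).
sample() {
  printf '%s\n' '4: something' '5: something' '7: another thing'
}

# Extended regex: -o prints only the matched leading digits.
digits_ere=$(sample | grep -Eo '^[0-9]+')

# PCRE lookahead: leading digits only when a colon follows.
# Needs a grep with -P support; the fallback leaves the variable
# empty where -P is unavailable.
digits_pcre=$(sample | grep -oP '^[0-9]+(?=:)' 2>/dev/null || true)

printf '%s\n' "$digits_ere"
```

The -E form is the more portable of the two; the lookahead form is only worth it when the colon must be present for a line to count.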
Using grep
grep -oE '^[0-9]+:' my_file | tr -d ':'
Using sed
sed 's#:.*$##g' my_file
Demo:
$ cat test.txt
4: something
5: something
7: another thing
$ sed 's#:.*$##g' test.txt
4
5
7
$ grep -oE '^[0-9]+:' test.txt | tr -d ':'
4
5
7
I have a text file and I want to count the occurrences of each match of a regex using grep.
I have a text file like:
# file.txt
72=JABBA123
72=JABBA123
72=THE5555
72=THE5555
72=THE5555
72=HUTT66
I want to count using grep like:
grep -c -Op "72=(\w+\d+)" file.txt
Then the result should look like:
JABBA123 2
THE5555 3
HUTT66 1
With GNU grep:
grep -Po "[[:alpha:]]+[[:digit:]]+" file | uniq -c
or
grep -Po '=\K.*' file | uniq -c
\K: discards everything matched before it, so only the text matched after \K is printed
Output:
2 JABBA123
3 THE5555
1 HUTT66
uniq -c only counts adjacent duplicates, so if identical values are not already grouped together in the file, insert a sort before uniq -c.
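A minimal sketch of the pipeline with the sort in place, on input where the values are deliberately interleaved. It swaps grep -P for a plain sed substitution so that it does not depend on GNU grep; the helper name sample and the final awk step, which flips the columns into the "name count" order asked for in the question, are additions for illustration.

```shell
# Interleaved sample data (hypothetical helper): equal values are not adjacent.
sample() {
  printf '%s\n' 72=JABBA123 72=THE5555 72=JABBA123 72=THE5555 72=HUTT66 72=THE5555
}

# 1. sed strips everything up to and including the first "=".
# 2. sort groups equal values so that uniq -c can count them.
# 3. awk swaps "count value" into "value count".
counts=$(sample | sed 's/^[^=]*=//' | LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }')

printf '%s\n' "$counts"
```

Without the sort step, uniq -c would report each run of adjacent lines separately, so the same value could show up several times with partial counts.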
Say I have this file data.txt:
a=0,b=3,c=5
a=2,b=0,c=4
a=3,b=6,c=7
I want to use grep to extract 2 columns corresponding to the values of a and c:
0 5
2 4
3 7
I know how to extract each column separately:
grep -oP 'a=\K([0-9]+)' data.txt
0
2
3
And:
grep -oP 'c=\K([0-9]+)' data.txt
5
4
7
But I can't figure how to extract the two groups. I tried the following, which didn't work:
grep -oP 'a=\K([0-9]+),.+c=\K([0-9]+)' data.txt
5
4
7
I am also curious about grep being able to do so. \K "removes" the previous content that is stored, so you cannot use it twice in the same expression: it will just show the last group. Hence, it should be done differently.
In the meantime, I would use sed:
sed -r 's/^a=([0-9]+).*c=([0-9]+)$/\1 \2/' file
it catches the digits after a= and c=, whenever this happens on lines starting with a= and not containing anything else after c=digits.
For your input, it returns:
0 5
2 4
3 7
You could try the grep command below. But note that grep displays each match on a separate line, so you won't get the format you mentioned in the question.
$ grep -oP 'a=\K([0-9]+)|c=\K([0-9]+)' file
0
5
2
4
3
7
To get the format you mentioned, you need to pipe the output of grep to paste or a similar command.
$ grep -oP 'a=\K([0-9]+)|c=\K([0-9]+)' file | paste -d' ' - -
0 5
2 4
3 7
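Use this:
awk -F'[=,]' '{print $2" "$6}' data.txt
I am using = and , as field separators, so the value of a is field 2 and the value of c is field 6. The separator is quoted so that the shell does not try to expand the brackets as a glob.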
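The fixed field numbers above only hold while a, b and c always appear in the same order. A minimal sketch of a variant that looks the values up by key name instead, assuming every line consists solely of key=value pairs separated by commas; the helper sample and the array name value are made up for illustration.

```shell
# Sample data from the question (hypothetical helper).
sample() {
  printf '%s\n' 'a=0,b=3,c=5' 'a=2,b=0,c=4' 'a=3,b=6,c=7'
}

pairs=$(sample | awk -F'[=,]' '{
  split("", value)                      # clear values left over from the previous line
  for (i = 1; i < NF; i += 2)           # odd fields are keys, even fields are values
    value[$i] = $(i + 1)
  print value["a"], value["c"]
}')

printf '%s\n' "$pairs"
```

If a key is missing on some line, the corresponding column is printed as an empty string rather than a stale value from an earlier line.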
I have the following script to remove all lines before a line which matches with a word:
str='
1
2
3
banana
4
5
6
banana
8
9
10
'
echo "$str" | awk -v pattern=banana '
print_it {print}
$0 ~ pattern {print_it = 1}
'
It returns:
4
5
6
banana
8
9
10
But I want to include the first match too. This is the desired output:
banana
4
5
6
banana
8
9
10
How could I do this? Do you have any better idea with another command?
I've also tried sed '0,/^banana$/d', but it seems to only work with files, and I want to use it with a variable.
And how could I get all lines before a match using awk?
I mean. With banana in the regex this would be the output:
1
2
3
This awk should do:
echo "$str" | awk '/banana/ {f=1} f'
banana
4
5
6
banana
8
9
10
sed -n '/^banana$/,$p'
Should do what you want. -n instructs sed to print nothing by default, and the p command specifies that all addressed lines should be printed. This will work on a stream, and is different than the awk solution since this requires the entire line to match 'banana' exactly whereas your awk solution merely requires 'banana' to be in the string, but I'm copying your sed example. Not sure what you mean by "use it with a variable". If you mean that you want the string 'banana' to be in a variable, you can easily do sed -n "/$variable/,\$p" (note the double quotes and the escaped $) or sed -n "/^$variable\$/,\$p" or sed -n "/^$variable"'$/,$p'. You can also echo "$str" | sed -n '/banana/,$p' just like you do with awk.
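A small runnable sketch of the variable form described above. The variable names and the sample input are made up for illustration, and it assumes the pattern contains neither a slash nor regex metacharacters, since the shell pastes it into the sed script verbatim.

```shell
# Hypothetical variable holding the marker line.
pattern=banana

# Double quotes let the shell expand $pattern; each \$ keeps a literal $
# for sed (end-of-line anchor in the regex, last-line address in the range).
from_match=$(printf '%s\n' 1 2 banana 3 banana 4 | sed -n "/^$pattern\$/,\$p")

printf '%s\n' "$from_match"
```

Because sed reads standard input here, the same command works on the output of echo "$str" just as well as on a file.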
Just invert the commands in the awk:
echo "$str" | awk -v pattern=banana '
$0 ~ pattern {print_it = 1} # if the line matches, activate the flag
print_it {print} # if the flag is active, print the line
'
The print_it flag is activated when the pattern is found. From that moment on (including that line), lines are printed while the flag is on. Previously the print happened before the check, so the first matching line was skipped.
cat in.txt | awk "/banana/,0"
If you don't want to keep the matched line, you can use the following (the 0,/regex/ address form is a GNU sed extension):
cat in.txt | sed "0,/banana/d"
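The second part of the question, printing only the lines before the first match, can be sketched the same way by leaving awk as soon as the pattern shows up. The helper sample and the variable names are made up for illustration; the input is a shortened version of the one in the question.

```shell
# Shortened sample input (hypothetical helper).
sample() {
  printf '%s\n' 1 2 3 banana 4 5 banana 6
}

# Lines before the first match: stop reading once the pattern appears.
before=$(sample | awk -v pattern=banana '$0 ~ pattern { exit } { print }')

# Lines from the first match onward, including the matching line.
from=$(sample | awk -v pattern=banana '$0 ~ pattern { found = 1 } found')

printf '%s\n' "$before"
```

Both commands read standard input, so they work on echo "$str" as well as on a file.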
Say the input is:
">"1aaa
2
3
4
">"5bbb
6
7
">"8ccc
9
">"10ddd
11
12
I want this output (per example for the matching pattern "bbb"):
">"5bbb
6
7
I had tried with grep:
grep -A 2 -B 0 "bbb" file.txt > results.txt
This works. However, the number of lines between ">"5bbb and ">"8ccc is variable. Does anyone know how to achieve that using Unix command line tools?
With awk you could simply use a flag like so:
$ awk '/^">"/{f=0}/bbb/{f=1}f' file
">"5bbb
6
7
You could also parametrize the pattern like so:
$ awk '/^">"/{f=0}$0~pat{f=1}f' pat='aaa' file
">"1aaa
2
3
4
Explanation:
/^">"/ # Regular expression that matches lines starting ">"
{f=0} # If the regex matched unset the print flag
/bbb/ # Regular expression to match the pattern bbb
{f=1} # If the regex matched set the print flag
f # If the print flag is set then print the line
Something like this should do it:
sed -ne '/bbb/,/^"/ { /bbb/p; /^[^"]/p; }' file.txt
That is:
for the range of lines between matching /bbb/ and /^"/
if the line matches /bbb/ print it
if the line doesn't start with " print it
otherwise nothing else is printed
This might work for you (GNU sed):
sed '/^"/h;G;/\n.*bbb/P;d' file
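The sed one-liner above is terse, so here is the same script spread over several lines with comments, as a sketch of how it appears to work: the most recent header line is kept in the hold space and every line is tested against it. The helper sample and the variable name block are made up for illustration.

```shell
# Sample input from the question (hypothetical helper).
sample() {
cat <<'EOF'
">"1aaa
2
3
4
">"5bbb
6
7
">"8ccc
9
">"10ddd
11
12
EOF
}

block=$(sample | sed '
# Header line: remember it in the hold space.
/^"/h
# Append the remembered header to the current line, after a newline.
G
# If the remembered header contains bbb, print the current line only.
/\n.*bbb/P
# Drop the pattern space so nothing else is printed.
d
')

printf '%s\n' "$block"
```

The P command prints only up to the first newline, which is why the appended header copy never shows up in the output.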
A simplified example of what I want to do:
I have a file: input.txt which looks like
a 2 4 b
a 3 8 b
c 9 4 d
a 3 4 8 b
and a script: add.sh which takes command-line parameters and returns their sum
I want to search input.txt for all instances of the pattern 'a (.*) b' where I pass the (.*) part as a command line parameter to add.sh.
For example, I want to do something like sed 's/a \(.*\) b/a {add.sh \1} b/g' input.txt
(that of course doesn't work).
So the output should look like
a 6 b
a 11 b
c 9 4 d
a 15 b
What would be the easiest way to do this?
Thanks
perl -pe 's/a (.*) b/"a ".`add.sh $1`." b"/eg' input.txt
Just make sure that add.sh doesn't output a newline.
And if perl isn't an option, you could script it something like this:
grep -e '^a .* b$' input.txt | sed -e 's/a \(.*\) b/\1/g' | while read LINE; do ./add.sh $LINE; done
I realized the above doesn't solve your problem, I just focused on your sed expression.
However, if you are keen on solving this problem using another shell script, it would probably look something like this:
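re='^a (.*) b$' # kept in a variable because unquoted spaces are a syntax error inside [[ =~ ]]
while IFS= read -r LINE; do
  if [[ "$LINE" =~ $re ]]; then
    # The capture is left unquoted so that each number becomes a separate argument.
    # Command substitution strips the trailing newline printed by add.sh.
    echo "a $(add.sh ${BASH_REMATCH[1]}) b"
  else
    echo "$LINE"
  fi
done < input.txt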
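To try the loop without bash or a real add.sh, here is a sketch in plain POSIX sh. The function add is a hypothetical stand-in that assumes add.sh simply prints the sum of its arguments, and rewrite is likewise a made-up name; the pattern match uses case and parameter expansion instead of BASH_REMATCH.

```shell
# Hypothetical stand-in for add.sh: prints the sum of its arguments.
add() {
  sum=0
  for n in "$@"; do
    sum=$((sum + n))
  done
  echo "$sum"
}

# Rewrites lines of the form "a <numbers> b" read from standard input.
rewrite() {
  while IFS= read -r line; do
    case $line in
      'a '*' b')
        mid=${line#a }   # drop the leading "a "
        mid=${mid% b}    # drop the trailing " b"
        # Unquoted on purpose: each number becomes a separate argument.
        echo "a $(add $mid) b"
        ;;
      *)
        echo "$line"
        ;;
    esac
  done
}

result=$(printf '%s\n' 'a 2 4 b' 'a 3 8 b' 'c 9 4 d' 'a 3 4 8 b' | rewrite)
printf '%s\n' "$result"
```

Replacing the call to add with ./add.sh gives the behaviour asked for in the question, as long as the script prints a single line.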
If add.sh is:
#!/bin/sh
arg1=$1
nums=$2
shift 2
for i in $nums
do
sum=$((sum+i))
done
echo "$arg1 $sum $*"
then you could do:
sed 's/^\([^ ]* \)\(.*\)\( [^ ]*\)$/\1\"\2\"\3/' input.txt | xargs -L 1 ./add.sh
which would add the numbers on every line. To add them only for lines that start with "a" and end with "b" use this:
sed 's/^a \(.*\) b$/a \"\1\" b/' input.txt | xargs -L 1 ./add.sh
The "c 9 4 d" line is still processed by add.sh, but the sed command doesn't add any quotes to it, so the script sees only "9" as $2 and the loop runs once, giving a sum of "9". The "4" is treated as part of the remaining arguments and is printed back unchanged.