Grep the first line which contains a date - regex

I'm trying to fetch the first line in a log file that contains a date.
Here is an example of the log file:
SOME
LOG
2021-1-1 21:50:19.0|LOG|DESC1
2021-1-4 21:50:19.0|LOG|DESC2
2021-1-5 21:50:19.0|LOG|DESC3
2021-1-5 21:50:19.0|LOG|DESC4
In this context I need to get the following line:
2021-1-1 21:50:19.0|LOG|DESC1
Another log file example:
SOME
LOG
21-1-3 21:50:19.0|LOG|DESC1
21-1-3 21:50:19.0|LOG|DESC2
21-1-4 21:50:19.0|LOG|DESC3
21-1-5 21:50:19.0|LOG|DESC4
Here I need to fetch:
21-1-3 21:50:19.0|LOG|DESC1
So far I have tried the following commands:
cat /path/to/file | grep "$(date +"%Y-%m-%d")" | tail -1
cat /path/to/file | grep "$(date +"%-Y-%-m-%-d")" | tail -1
cat /path/to/file | grep -E "[0-9]+-[0-9]+-[0-9]" | tail -1

If you are OK with awk, you could try the following. It prints the first line matching the regex and then exits, which is faster because it does not read the rest of Input_file.
awk '
/^[0-9]{2}([0-9]{2})?-[0-9]{1,2}-[0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+/{
print
exit
}' Input_file

Using sed, being not too concerned about exactly how many digits are present:
sed -En '/^[0-9]+-[0-9]+-[0-9]+ [0-9]+:[0-9]+:[0-9]+[.][0-9]+[|]/ {p; q}' file

$ grep -m1 '^[0-9]' file1
2021-1-1 21:50:19.0|LOG|DESC1
$ grep -m1 '^[0-9]' file2
21-1-3 21:50:19.0|LOG|DESC1
If that's not all you need then edit your question to provide more truly representative sample input/output.

A simple grep with -m 1 (to exit after finding first match):
grep -m1 -E '^([0-9]+-){2}[0-9]+ ([0-9]{2}:){2}[0-9]+\.[0-9]+' file1
2021-1-1 21:50:19.0|LOG|DESC1
grep -m1 -E '^([0-9]+-){2}[0-9]+ ([0-9]{2}:){2}[0-9]+\.[0-9]+' file2
21-1-3 21:50:19.0|LOG|DESC1

This sed works with either GNU or POSIX sed:
sed -nE '/^[[:digit:]]{2,4}-[[:digit:]]{1,2}-[[:digit:]]{1,2}/{p;q;}' file
But awk, with the same ERE, is probably better:
awk '/^[[:digit:]]{2,4}-[[:digit:]]{1,2}-[[:digit:]]{1,2}/{print; exit}' file
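A minimal sketch, assuming GNU or BSD grep (the -m option is not part of POSIX), of why the third attempt in the question returns the last dated line while -m1 or head -n 1 returns the first. The temporary file and the variable names are made up for the demonstration.

```shell
# Build a throwaway copy of the first sample log (hypothetical temp file).
log=$(mktemp)
printf '%s\n' \
  'SOME' \
  'LOG' \
  '2021-1-1 21:50:19.0|LOG|DESC1' \
  '2021-1-4 21:50:19.0|LOG|DESC2' \
  '2021-1-5 21:50:19.0|LOG|DESC3' > "$log"

# ERE: 2-4 digit year, 1-2 digit month and day, then a space.
pattern='^[0-9]{2,4}-[0-9]{1,2}-[0-9]{1,2} '

first=$(grep -m1 -E "$pattern" "$log")             # stops at the first match
portable=$(grep -E "$pattern" "$log" | head -n 1)  # same result without -m
last=$(grep -E "$pattern" "$log" | tail -n 1)      # what piping into tail -1 returns

printf 'first: %s\nlast:  %s\n' "$first" "$last"
rm -f "$log"
```

Swapping tail for head (or adding -m1) is the only change the original pipeline needs; the cat is also unnecessary because grep accepts the file name directly.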

Related

Sed command to search by regex in file

I need to get a version number from a file. My version file looks like this:
#define MINOR_VERSION_NUMBER 1
I tried this sed command:
VERSION_MINOR=`sed -i -e 'MINOR_VERSION_NUMBER\s+\([0-9]+\).*/\1/p' $WORKSPACE/project/common/version.h`
but I get an error:
sed: -e expression #1, char 2: extra characters after command

The "address" that selects matching lines needs to be enclosed in /.../ (or \X...X for any X).
sed -ne '/MINOR_VERSION_NUMBER/{ s/[^0-9]*\([0-9][0-9]*\).*/\1/;p; }' version.h
Don't use -i: it changes the file in place and doesn't output anything.
The more common way would be to use awk to find the line and extract the wanted column:
awk '/MINOR_VERSION_NUMBER/{print $3}' version.h

Using grep:
grep MINOR_VERSION_NUMBER version.h | grep -o '[0-9]*$'
Demo:
$ echo "#define MINOR_VERSION_NUMBER 1" | grep -o '[0-9]*$'
1
$ echo "#define MINOR_VERSION_NUMBER 1123" | grep -o '[0-9]*$'
1123
$

Here is a correction of your attempt. Change your line:
VERSION_MINOR=`sed -i -e 'MINOR_VERSION_NUMBER\s+\([0-9]+\).*/\1/p' $WORKSPACE/project/common/version.h`
into:
VERSION_MINOR=`sed -n -e '/^#define\s\+MINOR_VERSION_NUMBER\s\+\([0-9]\+\).*/ s//\1/p' $WORKSPACE/project/common/version.h`
This can be made more readable with GNU sed's -r option:
VERSION_MINOR=`sed -n -r -e '/^#define\s+MINOR_VERSION_NUMBER\s+([0-9]+).*/ s//\1/p' $WORKSPACE/project/common/version.h`
As stated by choroba, awk would be more suited than sed for this kind of processing (see his answer).
However, here is another solution using bash's read builtin, together with GNU grep:
read x x VERSION_MINOR x < <(grep -F -w -m1 MINOR_VERSION_NUMBER $WORKSPACE/project/common/version.h)
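Both \s and \+ are GNU extensions, so the corrected commands above may not work with other sed implementations. A possible portable sketch, using only POSIX basic regular expression syntax; the temporary header file and the MAJOR line are invented for the demonstration.

```shell
# Hypothetical stand-in for version.h.
header=$(mktemp)
printf '%s\n' \
  '#define MAJOR_VERSION_NUMBER 2' \
  '#define MINOR_VERSION_NUMBER 17' > "$header"

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded. [[:space:]]\{1,\} is the POSIX spelling of \s\+.
VERSION_MINOR=$(sed -n \
  's/^#define[[:space:]]\{1,\}MINOR_VERSION_NUMBER[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' \
  "$header")

echo "$VERSION_MINOR"
rm -f "$header"
```

Using $(...) rather than backticks avoids an extra level of backslash handling inside the command substitution.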

VERSION_MINOR=$(echo "#define MINOR_VERSION_NUMBER 1" | tr -s ' ' | cut -d' ' -f3)

sed & regex expression

I'm trying to add the string 'chr' to the start of the lines where it is missing. This is only needed on lines that do not start with '#'.
At first I used grep and sed as follows, but I want the command to overwrite the original file.
grep -v "^#" 5b110660bf55f80059c0ef52.vcf | grep -v 'chr' | sed 's/^/chr/g'
So, to edit the file in place, I wrote this:
sed -i -E '/^#.*$|^chr.*$/ s/^/chr/' 5b110660bf55f80059c0ef52.vcf
This is the content of the vcf file.
##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="#ref plus strand,#ref minus strand, #alt plus strand, #alt minus strand">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 24430-0009S21_GM17-12140
1 955597 95692 G T 1382 PASS VARTYPE=1;BGN=0.00134309;ARL=150;DER=53;DEA=55;QR=40;QA=39;PBP=1091;PBM=300;TYPE=SNP;DBXREF=dbSNP:rs115173026,g1000:0.2825,esp5400:0.2755,ExAC:0.2290,clinvar:rs115173026,CLNSIG:2,CLNREVSTAT:mult,CLNSIGLAB:Benign;SGVEP=AGRN|+|NM_198576|1|c.45G>T|p.:(p.Pro15Pro)|synonymous GT:DP:AD:DP4 0/1:125:64,61:50,14,48,13
chr1 957898 82729935 G T 1214 off_target VARTYPE=1;BGN=0.00113362;ARL=149;DER=50;DEA=55;QR=38;QA=40;PBP=245;PBM=978;NVF=0.53;TYPE=SNP;DBXREF=dbSNP:rs2799064,g1000:0.3285;SGVEP=AGRN|+|NM_198576|2|c.463+56G>T|.|intronic GT:DP:AD:DP4 0/1:98:47,51:9,38,10,41

If I understand your expected result correctly, try:
sed -ri '/^(#|chr)/! s/^/chr/' file

Your question isn't clear and you didn't provide the expected output so we can't test a potential solution but if all you want is to add chr to the start of lines where it's not already present and which don't start with # then that's just:
awk '!/^(#|chr)/{$0="chr" $0} 1' file
To overwrite the original file using GNU awk would be:
awk -i inplace '!/^(#|chr)/{$0="chr" $0} 1' file
and with any awk:
awk '!/^(#|chr)/{$0="chr" $0} 1' file > tmp && mv tmp file
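A small sketch of the same negated-address idea with sed, run against a cut-down copy of the VCF data. The shortened records and the temporary file are assumptions for the demonstration. It writes to a temporary file and renames it instead of using -i, since the syntax of -i differs between GNU and BSD sed.

```shell
# Hypothetical, shortened VCF: two header lines, one record without
# the chr prefix and one that already has it.
vcf=$(mktemp)
printf '%s\n' \
  '##FORMAT=<ID=DP4>' \
  '#CHROM POS ID REF ALT' \
  '1 955597 95692 G T' \
  'chr1 957898 82729935 G T' > "$vcf"

# The ! negates the address: substitute only on lines that do NOT
# start with # or chr. (The question's command lacked the !, so it
# edited exactly the lines that should have been left alone.)
sed -E '/^(#|chr)/!s/^/chr/' "$vcf" > "$vcf.tmp" && mv "$vcf.tmp" "$vcf"

result=$(cat "$vcf")
printf '%s\n' "$result"
rm -f "$vcf"
```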

This can be done with a single sed invocation. The script itself is something like the following.
If you have an input of format
$ echo -e '#\n#\n123chr456\n789chr123\nabc'
#
#
123chr456
789chr123
abc
then to prepend chr to non-commented chrless lines is done as
$ echo -e '#\n#\n123chr456\n789chr123\nabc' | sed '/^#/ {p
d
}
/chr/ {p
d
}
s/^/chr/'
which prints
#
#
123chr456
789chr123
chrabc
(Note the multiline sed script.)
Now you only need to run this script on a file in-place (-i in modern sed versions.)

How do I grep for all words that contain two consecutive e’s and also contain two y’s

I want to find the set of words that contain two consecutive e’s and also contain two y’s.
So far I have got to /eeyy/

Alternation with ERE:
$ echo evyyree | grep -E '.*ee.*yy|.*yy.*ee'
evyyree
$ echo eveeryy | grep -E '.*ee.*yy|.*yy.*ee'
eveeryy
If the match needs to be in the same word, you can do:
$ echo "eee yyyy" | grep -E 'ee[^[:space:]]*yy|yy[^[:space:]]*ee' # no match
$ echo "eeeyyyy" | grep -E 'ee[^[:space:]]*yy|yy[^[:space:]]*ee'
eeeyyyy
Then only that word:
$ echo 'eeeyy heelo' | grep -Eo 'ee[^[:space:]]*yy|yy[^[:space:]]*ee'
eeeyy

Pipe it:
$ echo eennmmyy | grep ee | grep yy
eennmmyy

awk approach to match all words that contain both ee and yy:
s="eennmmyy heello thees-whyy someyy"
echo $s | awk '{for(i=1;i<=NF;i++) if($i~/ee/ && $i~/yy/) print $i}'
The output:
eennmmyy
thees-whyy

The only sensible and extensible way to do this is with awk:
awk '/ee/&&/yy/' file
Imagine trying to do it the grep way if you also had to find zz. Here's awk:
awk '/ee/&&/yy/&&/zz/' file
and here's grep:
grep -E 'ee.*yy.*zz|ee.*zz.*yy|yy.*ee.*zz|yy.*zz.*ee|zz.*yy.*ee|zz.*ee.*yy' file
Now add a 4th additional string to search for and see what that looks like!
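As a point of comparison, chaining one grep per required string also grows by one stage per string rather than by one alternative per ordering. A rough sketch that first splits the input into one word per line; the sample string (including the made-up word eezzyy) is only for illustration.

```shell
s='eennmmyy heello thees-whyy someyy eezzyy'

# One word per line, then one grep stage per required substring.
two=$(printf '%s\n' "$s" | tr -s ' ' '\n' | grep ee | grep yy)
three=$(printf '%s\n' "$s" | tr -s ' ' '\n' | grep ee | grep yy | grep zz)

printf 'ee and yy:\n%s\n' "$two"
printf 'ee, yy and zz:\n%s\n' "$three"
```

The trade-off is one extra process per string, whereas the awk version tests every condition in a single process.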

Extract numbers with a regex and grep

I have a file which contains:
abc:12345
def:56323
I want to extract the numbers with grep:
grep -o "[0-9]"
but it does not give the result:
12345
56323
Thanks for any help.

You are missing a quantifier: [0-9] matches a single digit, so -o prints each digit on its own line. Try [0-9]*:
$ grep -o "[0-9]*" file
12345
56323
Note that for this particular case, you can also make use of other tools:
while IFS=: read -r text number
do
    echo "$number"
done < file
Or cut, sed or awk:
cut -d: -f2 file
sed 's/^[^:]*://' file
awk -F: '{print $2}' file
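A short sketch of the difference the quantifier makes, assuming a grep that supports -o (GNU and BSD grep do; it is not in POSIX). The temporary file and variable names are hypothetical.

```shell
data=$(mktemp)
printf '%s\n' 'abc:12345' 'def:56323' > "$data"

# Without a quantifier every digit is a separate match, so -o emits
# one line per digit; count those lines.
digit_lines=$(grep -o '[0-9]' "$data" | wc -l | tr -d ' ')

# With + (ERE) each run of digits is a single match.
numbers=$(grep -oE '[0-9]+' "$data")

echo "lines without quantifier: $digit_lines"
printf '%s\n' "$numbers"
rm -f "$data"
```

Using + instead of * avoids relying on how a given grep treats empty matches.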

What is the Unix command to display all lines of a file with two certain strings

Basically, I have a file that I want to search and display only the lines that have the strings 'abc' and 'vhg'. What is the Unix command for this?

You can use grep for it:
grep abc file.txt | grep vhg
OR
you can use awk:
awk '/abc/ && /vhg/' file.txt
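One more way with a single grep (the pattern must be quoted so the shell does not expand it, and both orders must be listed because the strings can appear in either order):
grep -E 'abc.*vhg|vhg.*abc' file.txt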

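Use the grep command. Note that \| is alternation in GNU grep, so this form matches lines containing any of the words, not only lines containing all of them: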
grep 'word1\|word2\|word3' /path/to/file
Example:
grep 'abc\|vhg' filename

Since a sed solution has not yet been given:
sed -n '/abc/{ /vhg/p; }'
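A small sketch contrasting the approaches above on a made-up file, to show which ones select lines containing both strings and which select lines containing either. The file contents and variable names are assumptions for the demonstration.

```shell
f=$(mktemp)
printf '%s\n' \
  'abc only' \
  'vhg only' \
  'abc then vhg' \
  'vhg then abc' \
  'neither' > "$f"

both=$(grep abc "$f" | grep vhg)        # AND: both strings, any order
sed_both=$(sed -n '/abc/{ /vhg/p; }' "$f")  # same AND, in one sed process
either=$(grep -E 'abc|vhg' "$f")        # OR: at least one of the strings
ordered=$(grep 'abc.*vhg' "$f")         # abc must come before vhg

printf 'both:\n%s\n\neither:\n%s\n\nordered:\n%s\n' "$both" "$either" "$ordered"
rm -f "$f"
```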
