Extract numbers with a regex and grep

I have a file which contains:
abc:12345
def:56323
I want to extract the numbers with grep:
grep -o "[0-9]" file
but it does not give the result I want:
12345
56323
Thanks for any help.

[0-9] matches a single digit, so grep -o prints every digit on its own line. Maybe you missed the quantifier, as in [0-9]*:
$ grep -o "[0-9]*" file
12345
56323
Note that for this particular case, you can also make use of other tools:
while IFS=: read text number
do
echo "$number"
done < file
Or cut, sed or awk:
cut -d: -f2 file
sed 's/^[^:]*://' file
awk -F: '{print $2}' file
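To see the difference between matching one digit and matching a run of digits, here is a minimal sketch. The temporary file and the variable names single and whole are only there for illustration; the data is the sample from the question.

```shell
# Temporary sample file mirroring the question's data.
tmp=$(mktemp)
printf 'abc:12345\ndef:56323\n' > "$tmp"

# One digit per match: -o prints each digit on its own line (ten lines here).
single=$(grep -o '[0-9]' "$tmp")

# One or more digits per match: -o prints each whole number.
whole=$(grep -oE '[0-9]+' "$tmp")

printf '%s\n' "$whole"
rm -f "$tmp"
```

Using + instead of * also avoids relying on how grep treats empty matches.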

Related

Grep first line which contain a date

I'm trying to fetch the first line in a log file which contain a date.
Here is an example of the log file :
SOME
LOG
2021-1-1 21:50:19.0|LOG|DESC1
2021-1-4 21:50:19.0|LOG|DESC2
2021-1-5 21:50:19.0|LOG|DESC3
2021-1-5 21:50:19.0|LOG|DESC4
In this context I need to get the following line:
2021-1-1 21:50:19.0|LOG|DESC1
An other log file example :
SOME
LOG
21-1-3 21:50:19.0|LOG|DESC1
21-1-3 21:50:19.0|LOG|DESC2
21-1-4 21:50:19.0|LOG|DESC3
21-1-5 21:50:19.0|LOG|DESC4
I need to fetch :
21-1-3 21:50:19.0|LOG|DESC1
At the moment I tried the following command :
cat /path/to/file | grep "$(date +"%Y-%m-%d")" | tail -1
cat /path/to/file | grep "$(date +"%-Y-%-m-%-d")" | tail -1
cat /path/to/file | grep -E "[0-9]+-[0-9]+-[0-9]" | tail -1
If you are OK with awk, you could try the following. It prints the first line that matches the regex and then exits, which is faster because it does not read the rest of Input_file.
awk '
/^[0-9]{2}([0-9]{2})?-[0-9]{1,2}-[0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+/{
print
exit
}' Input_file
Using sed, being not too concerned about exactly how many digits are present:
sed -En '/^[0-9]+-[0-9]+-[0-9]+ [0-9]+:[0-9]+:[0-9]+[.][0-9]+[|]/ {p; q}' file
$ grep -m1 '^[0-9]' file1
2021-1-1 21:50:19.0|LOG|DESC1
$ grep -m1 '^[0-9]' file2
21-1-3 21:50:19.0|LOG|DESC1
If that's not all you need then edit your question to provide more truly representative sample input/output.
A simple grep with -m 1 (to exit after finding first match):
grep -m1 -E '^([0-9]+-){2}[0-9]+ ([0-9]{2}:){2}[0-9]+\.[0-9]+' file1
2021-1-1 21:50:19.0|LOG|DESC1
grep -m1 -E '^([0-9]+-){2}[0-9]+ ([0-9]{2}:){2}[0-9]+\.[0-9]+' file2
21-1-3 21:50:19.0|LOG|DESC1
This sed works with either GNU or POSIX sed:
sed -nE '/^[[:digit:]]{2,4}-[[:digit:]]{1,2}-[[:digit:]]{1,2}/{p;q;}' file
But awk, with the same ERE, is probably better:
awk '/^[[:digit:]]{2,4}-[[:digit:]]{1,2}-[[:digit:]]{1,2}/{print; exit}' file
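Note that the attempts in the question end in tail -1, which returns the last matching line rather than the first. A minimal portable sketch, for systems where grep has no -m option, is to pipe into head -n 1 instead. The function name first_dated_line and the temporary file are made up for this example, and the pattern only assumes "digits-digits-digits" followed by a space at the start of the line.

```shell
# Print the first line that starts with something shaped like a date.
first_dated_line() {
    grep -E '^[0-9]+-[0-9]+-[0-9]+ ' "$1" | head -n 1
}

# Sample log modelled on the second example in the question.
log=$(mktemp)
printf '%s\n' 'SOME' 'LOG' \
    '21-1-3 21:50:19.0|LOG|DESC1' \
    '21-1-3 21:50:19.0|LOG|DESC2' \
    '21-1-5 21:50:19.0|LOG|DESC4' > "$log"

first=$(first_dated_line "$log")
printf '%s\n' "$first"
rm -f "$log"
```

Unlike grep -m1 or the sed and awk versions, grep here may read more of the file before head closes the pipe, so this is the simpler but not necessarily the faster option.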

Grep with regex expression

I need the content between the fourth and fifth "|" on all lines starting with FHEAD. The goal is to apply the regular expression in grep to read files.
I have this expression that returns all content between "|"
(?<=\|)(.*?)(?=\|)
The goal in the example below would be to return
1047
8401-
FHEAD|1|PRMPC|20200217103050|1047|S
TMBPE|FHEAD|2|MOD
FHEAD|3|8401|230008|8401-|8401-Dcto|8401-Dcto 10FHEAD|1|235211|20190206000001|20190402235959|2||1||8||
TPGRP|4|240184
TGLIST|5|235213||||FHEAD
TLITM|6|101029605
TLITM|7|FHEAD101052978
Can someone help me?
Thanks in advance
To print the content of the fifth field (non-empty) on lines starting with FHEAD:
awk -F'|' '$1=="FHEAD" && $5!=""{print $5}' file
awk -F '|' '$5=="1047" || $5=="8401-"{ print $0 }' inputfile.txt
The above will find "1047" or "8401-" in the fifth column of the input file "inputfile.txt".
grep -E "\|1047\||\|8401-\|" inputfile.txt
The above will do the same with grep (but this will not be restricted to column 5).
EDIT:
I must have missed the 'starting with FHEAD'....
awk -F\| '/^FHEAD/{ print $5 }' inputfile.txt
or with grep
grep -e '^FHEAD|\(.[^|]*|\)\{3\}\(.[^|]*\)' -o inputfile.txt | grep '.[^|]*|*' -o | grep -v '|$'
Or a combination of grep and cut:
grep -e '^FHEAD' inputfile.txt | cut -d'|' -f 5
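As a quick check of the field-based approach, here is a small sketch run against records from the question. The temporary file and the variable name fifth are illustrative, and the long third record is trimmed to its first six fields. Comparing $1 with "FHEAD" rather than matching /^FHEAD/ is an assumption about what "starting with FHEAD" means: it skips a record whose first field merely begins with FHEAD.

```shell
# Sample records taken from the question (third record trimmed).
sample=$(mktemp)
printf '%s\n' \
    'FHEAD|1|PRMPC|20200217103050|1047|S' \
    'TMBPE|FHEAD|2|MOD' \
    'FHEAD|3|8401|230008|8401-|8401-Dcto' \
    'TLITM|7|FHEAD101052978' > "$sample"

# Keep only records whose first field is exactly FHEAD, then print field 5.
fifth=$(awk -F'|' '$1 == "FHEAD" && $5 != "" { print $5 }' "$sample")

printf '%s\n' "$fifth"
rm -f "$sample"
```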

How do I grep for all words that contain two consecutive e’s, and also contains two y’s

I want to find the set of words that contain two consecutive e’s and also contain two y’s.
So far I got to /eeyy/
Alternation with ERE:
$ echo evyyree | grep -E '.*ee.*yy|.*yy.*ee'
evyyree
$ echo eveeryy | grep -E '.*ee.*yy|.*yy.*ee'
eveeryy
If the match needs to be in the same word, you can do:
$ echo "eee yyyy" | grep -E 'ee[^[:space:]]*yy|yy[^[:space:]]*ee' # no match
$ echo "eeeyyyy" | grep -E 'ee[^[:space:]]*yy|yy[^[:space:]]*ee'
eeeyyyy
Then only that word:
$ echo 'eeeyy heelo' | grep -Eo 'ee[^[:space:]]*yy|yy[^[:space:]]*ee'
eeeyy
Pipe it:
$ echo eennmmyy | grep ee | grep yy
eennmmyy
awk approach to match all words that contain both ee and yy:
s="eennmmyy heello thees-whyy someyy"
echo $s | awk '{for(i=1;i<=NF;i++) if($i~/ee/ && $i~/yy/) print $i}'
The output:
eennmmyy
thees-whyy
The only sensible and extensible way to do this is with awk:
awk '/ee/&&/yy/' file
Imagine trying to do it the grep way if you also had to find zz. Here's awk:
awk '/ee/&&/yy/&&/zz/' file
and here's grep:
grep -E 'ee.*yy.*zz|ee.*zz.*yy|yy.*ee.*zz|yy.*zz.*ee|zz.*yy.*ee|zz.*ee.*yy' file
Now add a 4th additional string to search for and see what that looks like!
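If awk is not an option, the "pipe it" idea can also be extended to any number of strings without writing out every ordering. The following is only a sketch: contains_all is a made-up helper name, it treats its arguments as fixed strings (grep -F), and like awk '/ee/&&/yy/' it tests whole lines, not individual words.

```shell
# Read lines on stdin and keep those containing every argument as a substring.
contains_all() {
    if [ "$#" -eq 0 ]; then
        cat            # no patterns left: pass the remaining lines through
        return
    fi
    pattern=$1
    shift
    # Filter on the first pattern, then recurse on the remaining ones.
    grep -F -- "$pattern" | contains_all "$@"
}

printf '%s\n' eennmmyy heello someyy zzeeyy | contains_all ee yy zz
```

Each extra string costs one more grep process, so for large inputs the single awk call is likely the cheaper choice.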

awk: getting a certain value

Identifiable: fasdf/=egbalid=/more.garble/XY=foo.bar.baz
I have a line that can be uniquely identified with /Identifiable/. What I'd like to have is the value of XY (in this case foo.bar.baz). How can I get that in awk?
You could use grep for this purpose.
grep -oP '^(?=.*\bIdentifiable\b).*\bXY=\K[\w.]+' file
Example:
$ echo 'Identifiable: fasdf/=egbalid=/more.garble/XY=foo.bar.baz' | grep -oP '^(?=.*\bIdentifiable\b).*\bXY=\K[\w.]+'
foo.bar.baz
Assuming the XY value is always at the end of the line (it's always hard to guess when just 1 line of sample input is posted):
$ awk -F= '/Identifiable/{print $NF}' file
foo.bar.baz
Here is one way:
echo "Identifiable: fasdf/=egbalid=/more.garble/XY=foo.bar.baz" | awk -F"XY=" '{print $2}'
foo.bar.baz
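If XY is not guaranteed to be the last key on the line, a sketch with awk's match() and substr() can pull out just that value. This assumes the value ends at the next "/" or at the end of the line, which is a guess from the single sample line; xy_value is a hypothetical helper name. It also takes the leftmost "XY=" on the line, so a key such as "AXY=" appearing earlier would be picked up instead.

```shell
# Print the value following "XY=" on lines that contain "Identifiable".
xy_value() {
    awk '/Identifiable/ && match($0, "XY=[^/]*") {
        # RSTART/RLENGTH cover "XY=value"; skip the three characters of "XY=".
        print substr($0, RSTART + 3, RLENGTH - 3)
    }'
}

last=$(printf '%s\n' 'Identifiable: fasdf/=egbalid=/more.garble/XY=foo.bar.baz' | xy_value)
middle=$(printf '%s\n' 'Identifiable: fasdf/XY=foo.bar.baz/more=garble' | xy_value)

printf '%s\n' "$last" "$middle"
```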

How to use sed to identify a string in brackets?

I want to find the string that is placed within the brackets. How do I use sed to pull the string?
# cat /sys/block/sdb/queue/scheduler
noop anticipatory deadline [cfq]
I'm not getting the exact result
# cat /sys/block/sdb/queue/scheduler | sed 's/\[*\]//'
noop anticipatory deadline [cfq
I'm expecting an output
cfq
It can be easier with grep, which also works when the position of the bracketed text changes:
$ grep -Po '(?<=\[)[^]]*' file
cfq
This is look-behind: whenever you find a string [, start fetching all the characters up to a ].
See another example:
$ cat a
noop anticipatory deadline [cfq]
hello this [is something] we want to [enclose] yeah
$ grep -Po '(?<=\[)[^]]*' a
cfq
is something
enclose
You can also use awk for this, in case it is always in the same position:
$ awk -F'[][]' '{print $2}' file
cfq
This sets the field separators to [ and ] and prints the second field.
And with sed:
$ sed 's/[^[]*\[\([^]]*\).*/\1/g' file
cfq
It is a bit messy, but basically it looks for the block of text in between [] and prints it back.
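As for why the attempt in the question fails: in s/\[*\]// the * applies to \[, so the pattern means "zero or more [ followed by ]", and on this input it only ever matches the closing bracket. The sketch below shows that next to a capture-group version. The helper name bracketed and the variables broken and fixed are made up for this example; because .* is greedy, this version returns the last bracketed string if a line has several.

```shell
# Print the text inside the last [...] pair on each line that has one.
bracketed() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

line='noop anticipatory deadline [cfq]'

# The original attempt only strips the closing bracket.
broken=$(printf '%s\n' "$line" | sed 's/\[*\]//')

# The capture-group version keeps only what is between the brackets.
fixed=$(printf '%s\n' "$line" | bracketed)

printf '%s\n' "$broken" "$fixed"
```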
I found one possible solution-
cut -d "[" -f2 | cut -d "]" -f1
so the exact solution is
# cat /sys/block/sdb/queue/scheduler | cut -d "[" -f2 | cut -d "]" -f1
Another potential solution is awk:
s='noop anticipatory deadline [cfq]'
awk -F'[][]' '{print $2}' <<< "$s"
cfq
Another way with GNU grep:
grep -Po "\[\K[^]]*" file
With pure bash (it relies on [[ =~ ]] and BASH_REMATCH):
while read line; do [[ "$line" =~ \[([^]]*)\] ]] && echo "${BASH_REMATCH[1]}"; done < file
Another awk
echo 'noop anticipatory deadline [cfq]' | awk '{gsub(/.*\[|\].*/,x)}8'
cfq
perl -lne 'print $1 if(/\[([^\]]*)\]/)'