Matching decimal number in grep - regex

I have a file that has the line:
Time 97.7518 seconds
I want to get the decimal time. Why is the following simple grep command not working?
grep -Ei "\d+\.\d+" Nasa-1024-256.txt

POSIX ERE, the flavor selected by the -E option, does not support \d, so use the [0-9] bracket expression instead. You also need the -o option to print only the match rather than the whole line:
grep -Eo "[0-9]+\.[0-9]+" Nasa-1024-256.txt
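As a quick check, the sketch below runs the same pattern on the sample line from the question, fed through printf instead of the file. The variable names line and seconds are illustrative and not part of the original post.

```shell
# Sample line copied from the question; variable names are illustrative.
line='Time 97.7518 seconds'

# -E selects ERE, -o prints only the matched text.
seconds=$(printf '%s\n' "$line" | grep -Eo '[0-9]+\.[0-9]+')

printf '%s\n' "$seconds"
```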

Related

How to use square brackets in grep for MINGW64?

Currently, I have a following regex. It should match a string that I am echoing:
echo "TBGFSGFI22800_D_REP_D_RISIKOEINHEIT" | grep -E 'TBGFSGFI\d\d\d\d\d[A-Za-z_]{1,100}'
It works as expected in OsX on my Mac and in Notepad++, but in Bash for windows (MINGW64) I get an empty string. How can I use the grep with flags, or how should I rewrite the regex to match the pattern?
My grep version is 3.1. Bash: 4.4.23(1)
Thanks for help in advance!
You are using a POSIX ERE regex with the -E option, and that flavor does not support the \d construct. You also need the -o option to actually extract the matches.
Note that you do not need to repeat \d five times; you can use a range quantifier, \d{5}.
You can use
echo "TBGFSGFI22800_D_REP_D_RISIKOEINHEIT" | grep -Po "TBGFSGFI\d{5}[A-Za-z_]{1,100}"
Where
-P means the regex is of a PCRE flavor
-o extracts matches only
TBGFSGFI\d{5}[A-Za-z_]{1,100} - a regex that matches TBGFSGFI, then any five digits and then 1-100 ASCII letters or _.
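If a particular grep build lacks -P support, the same match can most likely be written in plain ERE by replacing \d with [0-9]. A minimal sketch using the string from the question (the variable names input and match are made up for illustration):

```shell
# String copied from the question; variable names are illustrative.
input='TBGFSGFI22800_D_REP_D_RISIKOEINHEIT'

# ERE version: [0-9]{5} stands in for \d{5}, so -P is not required.
match=$(printf '%s\n' "$input" | grep -Eo 'TBGFSGFI[0-9]{5}[A-Za-z_]{1,100}')

printf '%s\n' "$match"
```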

grep regex for return numerical values in between string

I'm running the Linux command
ls /etc/systemd/system | grep -o -E "[0-9]+"
which should return just numerical values. The problem is that it also returns unwanted numbers from other parts of the names. I want only the numerical values between - and .service, so for test-blah4-1321.service it should return just 1321. What am I missing here?
example
$ ls /etc/systemd/system
test.service test-blah4-1321.service test-blah2.service test-blah5-1387.service test-blah3-1521.service
GNU grep has the -P option for perl-style regexes, and the -o option to print only what matches the pattern. These can be combined using look-around assertions (described under Extended Patterns in the perlre manpage) to remove part of the grep pattern from what is determined to have matched for the purposes of -o.
Applied to your example this would be:
echo test-blah4-1321.service | grep -oP '(?<=-)\d+(?=\.service)'
When I need lookahead or lookbehind tests, I usually switch to a Perl one-liner. This should do the trick (it prints only when the line actually matches):
echo test-blah4-1321.service | perl -ne 'print "$1\n" if /(?<=-)(\d+)(?=\.service)/'
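Where neither grep -P nor Perl is at hand, a capture group in sed is one possible alternative. The sketch below is an assumption-laden illustration, not part of the original answers: it feeds the file names from the question through printf rather than ls, and the variable names are hypothetical.

```shell
# File names copied from the question's example listing.
names='test.service
test-blah4-1321.service
test-blah2.service
test-blah5-1387.service
test-blah3-1521.service'

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, i.e. names ending in -<digits>.service.
ids=$(printf '%s\n' "$names" | sed -nE 's/.*-([0-9]+)\.service$/\1/p')

printf '%s\n' "$ids"
```

Names without a purely numeric part between the last - and .service, such as test-blah2.service, produce no output.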

Grep and Egrep options

When I use grep -ow it affects the regex so I'm wondering what the regex would be without these options
I know that:
-o prints only the part of the line that matches the pattern
-w selects only matches that form whole words
I'd like to convert
egrep -ow '[1-9][0-9][0-9]+' text
to something like
egrep '[1-9][0-9][0-9]+' text
but this regex gives the wrong results without the options.
You need to add word boundaries.
egrep -o '\b[1-9][0-9][0-9]+\b' file
Since egrep is deprecated, it's better to use grep with the -E option:
grep -Eo '\b[1-9][0-9][0-9]+\b' file
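To see the effect of the boundaries, the following sketch compares -w, explicit \b anchors, and the bare pattern on a made-up sample line. The sample text and variable names are assumptions for illustration, and \b is a GNU extension to ERE, so other grep implementations may behave differently.

```shell
# Made-up sample: x1234 contains digits glued to a letter.
sample='7 42 100 x1234 2500 abc'

# Whole-word matching via the -w option.
with_w=$(printf '%s\n' "$sample" | grep -Eow '[1-9][0-9][0-9]+')

# Whole-word matching via explicit word boundaries.
with_b=$(printf '%s\n' "$sample" | grep -Eo '\b[1-9][0-9][0-9]+\b')

# No boundaries: the digits inside x1234 are matched as well.
no_anchor=$(printf '%s\n' "$sample" | grep -Eo '[1-9][0-9][0-9]+')

printf 'with -w:\n%s\nwith \\b:\n%s\nunanchored:\n%s\n' "$with_w" "$with_b" "$no_anchor"
```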

Match specific length words, anchored, without doing magic math

Let's say I wanted to find all 12-letter words in /usr/share/dict/words that started with c and ended with er. Off the top of my head, a workable pattern could look something like:
grep -E '^c.{9}er$' /usr/share/dict/words
It finds:
cabinetmaker
calcographer
calligrapher
campanologer
campylometer
...
But that .{9} bothers me. It feels too magical, subtracting the total length of all the anchor characters from the number defined in the original constraint.
Is there any way to rewrite this regex so it doesn't require doing this calculation up front, allowing a literal 12 to be used directly in the pattern?
You can use the -x option, which selects only matches that span the whole line, and then filter the result with an anchored pattern:
grep -xE '.{12}' /usr/share/dict/words | grep '^c.*er$'
Or use the -P option, which interprets the pattern as a Perl-compatible regular expression, together with a lookahead assertion:
grep -P '^(?=.{12}$)c.*er$' /usr/share/dict/words
You can use awk as an alternative and avoid this calculation:
awk -v len=12 'length($1)==len && $1 ~ /^c.*er$/' file
I don't know grep so well, but some more advanced NFA RegEx implementations provide you with lookaheads and lookbehinds. If you can figure out any means to make those available for you, you could write:
^(?=c).{12}(?<=er)$
Maybe as a Perl one-liner like this? (Single quotes keep the shell from touching the $ in the pattern.)
perl -ne 'print if /^(?=c).{12}(?<=er)$/' /usr/share/dict/words
One approach with GNU sed:
$ sed -nr '/^.{12}$/{/^c.*er$/p}' words
With BSD sed (Mac OS) it would be:
$ sed -nE '/^.{12}$/{/^c.*er$/p;}' words
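To try the two-stage idea without a dictionary file, the sketch below runs it on a small inline word list and keeps the literal 12 in a shell variable. The word list and the names len, words, and result are assumptions made for illustration only.

```shell
len=12   # the total word length, written once and used as-is

# Inline sample list (illustrative): only two entries satisfy all three constraints.
words='cabinetmaker
calligrapher
carpenter
photographer
stockbrokers'

# Stage 1: keep lines that are exactly $len characters long (-x anchors the whole line).
# Stage 2: keep those that start with c and end with er.
result=$(printf '%s\n' "$words" | grep -xE ".{$len}" | grep -E '^c.*er$')

printf '%s\n' "$result"
```

Note that stockbrokers has twelve letters and contains c...er in the middle, which is why the second pattern needs the ^ and $ anchors.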

Bash (grep) regex performing unexpectedly

I have a text file, which contains a date in the form of dd/mm/yyyy (e.g 20/12/2012).
I am trying to use grep to parse the date and show it in the terminal, and it is successful,
until I meet a certain case:
These are my test cases:
grep -E "\d*" returns 20/12/2012
grep -E "\d*/" returns 20/12/2012
grep -E "\d*/\d*" returns 20/12/2012
grep -E "\d*/\d*/" returns nothing
grep -E "\d+" also returns nothing
Could someone explain to me why I get this unexpected behavior?
EDIT: I get the same behavior if I substitute the " (weak quotes) for ' (strong quotes).
The syntax you used (\d) is not recognised by grep's POSIX extended regular expressions (ERE); this has nothing to do with Bash itself.
Use grep -P instead which uses Perl regex (PCRE). For example:
grep -P "\d+/\d+/\d+" input.txt
grep -P "\d{2}/\d{2}/\d{4}" input.txt # more restrictive
Or, to stick with extended regex, use [0-9] in place of \d:
grep -E "[0-9]+/[0-9]+/[0-9]+" input.txt
grep -E "[0-9]{2}/[0-9]{2}/[0-9]{4}" input.txt # more restrictive
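A minimal sketch of the ERE variant, combined with -o so that only the date is printed. The surrounding text in the sample line and the variable names are invented for illustration; only the date itself comes from the question.

```shell
# Invented sample line containing the date from the question.
line='Report generated on 20/12/2012 at noon'

# The restrictive ERE pattern, with -o to print just the matched date.
date_found=$(printf '%s\n' "$line" | grep -Eo '[0-9]{2}/[0-9]{2}/[0-9]{4}')

printf '%s\n' "$date_found"
```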
You could also use -P instead of -E, which lets grep use PCRE syntax:
grep -P "\d+/\d+" file
This works too.
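grep and egrep/grep -E don't recognize \d as a digit class. Your first three patterns appear to work only because the asterisk makes \d optional: it matches zero times, so the line is selected without \d ever finding a digit. (GNU grep has typically treated \d as a literal d, which would explain why "\d+" finds nothing in a line that contains no d.)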
Use [0-9] or [[:digit:]].
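As a rough illustration, either spelling can be combined with -o to pull the individual numbers out of the date. The variable names below are hypothetical.

```shell
# Date taken from the question.
date_line='20/12/2012'

# Both forms are valid in ERE and should yield the same three matches.
with_range=$(printf '%s\n' "$date_line" | grep -Eo '[0-9]+')
with_class=$(printf '%s\n' "$date_line" | grep -Eo '[[:digit:]]+')

printf '%s\n' "$with_class"
```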
To help troubleshoot cases like this, the -o flag can be helpful as it shows only the matched portion of the line. With your original expressions:
grep -Eo "\d*" returns nothing - a clue that \d isn't doing what you thought it was.
grep -Eo "\d*/" returns / (twice) - confirmation that \d isn't matching while the slashes are.
As noted by others, the -P flag solves the issue by recognizing "\d", but to clarify Explosion Pills' answer, you could also use -E as follows:
grep -Eo "[[:digit:]]*/[[:digit:]]*/" returns 20/12/
EDIT: Per a comment by #shawn-chin (thanks!), --color can be used similarly to highlight the portions of the line that are matched while still showing the entire line:
grep -E --color "[[:digit:]]*/[[:digit:]]*/" returns 20/12/2012 (can't do color here, but the bold "20/12/" portion would be in color)