How can I use grep to match 3 numbers in a file? My file looks like this:
123
122
222
333443
fdsfs5454353
dsfsfjsk4654641
Note that some of the lines contain trailing spaces. I want to only match three digit numbers. I tried:
grep -E [0-9]{3} test.txt
grep -E '\<[0-9]{3}\>' test.txt
grep '^[0-9][0-9]*' test|awk '{if(length($0) == 3) print $0}'
or if you have whitespace:
sed 's/[ \t]*$//' test|grep '^[0-9][0-9]*'|awk '{if(length($0) == 3) print $0}'
(thanks @shellter)
Use Extended Regular Expressions with Bounds
I asked if you meant numbers with exactly three digits, or each three-digit match in a string. You replied that you wanted only lines that contained exactly three digits.
Extended grep provides an easy solution for this. Consider the following:
$ grep -E '^[0-9]{3}\b' /tmp/corpus
123
122
222
This uses a bound (also known as a range) to look for exactly three digits at the start of each line, followed by a word boundary. The word boundary will match trailing space or the end of line, ensuring that you get the proper match in either case.
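Since \b is a GNU extension, the same intent can also be written with POSIX character classes only. The following is a minimal sketch, assuming input like the sample in the question with trailing spaces on some lines; the function name three_digit_lines is made up for illustration:

```shell
# Print lines that consist of exactly three digits, optionally followed by
# trailing whitespace. Uses only POSIX ERE features (no \b, no \d).
three_digit_lines() {
    grep -E '^[0-9]{3}[[:space:]]*$' "$@"
}

# Sample input modelled on the question; the "122" line has trailing spaces.
sample=$(printf '123\n122  \n222\n333443\nfdsfs5454353\ndsfsfjsk4654641\n')
result=$(printf '%s\n' "$sample" | three_digit_lines)
printf '%s\n' "$result"
```

Unlike the \b version, this also rejects a line such as "123 456", because everything after the three digits must be whitespace.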
Related
I have a file.txt looking like this:
abe
abbe
cde
45a678
ae
cababb
12345
And after running command egrep [[:digit:]] file.txt
it shows two results: "45a678" and "12345". I don't understand why it shows the first result (I thought the regex would only show lines consisting of numbers).
You are searching for any digit in line. You should constrain it from beginning (^) to the end ($) of the line and find at least one digit in between (+).
egrep '^[[:digit:]]+$' file.txt
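In a regex, [[:digit:]] matches a single digit anywhere; it does not check the whole line.
To test the whole line, use ^ for the beginning of the line and $ for the end.
As a result,
grep -E '^[0-9]+$' file.txt
will match only those lines that consist entirely of digits. (\d is not part of extended regular expressions; with GNU grep built with PCRE support, grep -P '^\d+$' file.txt also works.)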
Your regex [[:digit:]] selects every line that contains a digit anywhere, so 45a678 matches. Use ^[[:digit:]]*$ to match only lines made up entirely of digits (note that * also matches empty lines; use + to require at least one digit):
$ egrep '^[[:digit:]]*$' file1.txt
12345
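The difference between the unanchored and the anchored pattern can be checked side by side. This is a small sketch, assuming the sample file from the question is recreated in a temporary file; the variable names are only illustrative:

```shell
# Recreate the sample file from the question in a temporary location.
tmp=$(mktemp)
printf 'abe\nabbe\ncde\n45a678\nae\ncababb\n12345\n' > "$tmp"

# Unanchored: any line that contains at least one digit.
any_digit=$(grep -E '[[:digit:]]' "$tmp")

# Anchored: only lines made up entirely of digits (at least one).
only_digits=$(grep -E '^[[:digit:]]+$' "$tmp")

printf 'unanchored:\n%s\nanchored:\n%s\n' "$any_digit" "$only_digits"
rm -f "$tmp"
```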
I have a file where some lines have numbers mixed with characters, some have only characters, and some have only numbers. I would like to select the lines with only numbers. I tried egrep '[^[:alpha:]]' filename but I also get lines with characters. Any idea?
AQ
Feb 9, 1999
11:45
45
And I want only
45
The regex needs to check that everything on the line is numeric. So a ^ and $ around the expression is needed to match from the start to the end of each line. Also the match will need to be explicitly for digits, rather than non-alpha.
E.g.
egrep '^[[:digit:]]+$' filename
This worked well against the example in the question.
I would exclude any line that contains any non-digit character:
grep -v '[^[:digit:]]' file
# ........| negates the character class
with awk
only lines with digits and nothing else
$ awk '/^[0-9]+$/' file
45
or, exclude any line that contains a non-digit character
$ awk '!/[^0-9]/' file
45
To match lines containing only numbers, use either "whole line mode" with -x:
grep -xE '[[:digit:]]+' file
or add the line start/end anchors to the regular expression:
grep -E '^[[:digit:]]+$' file
Note that you can replace the character class [:digit:] with the range 0-9 if you are only concerned with matching the ASCII characters from 0 to 9:
grep -xE '[0-9]+' file
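One practical difference between the answers above is how they treat empty lines: a pattern that requires at least one digit drops them, while the inverted match with -v keeps them because an empty line contains no non-digit character. A minimal sketch, assuming the sample from the question with one empty line added for illustration:

```shell
# Sample from the question plus one empty line (the empty line is an addition).
input=$(printf 'AQ\nFeb 9, 1999\n11:45\n\n45')

# Whole-line match: requires at least one digit, so the empty line is dropped.
with_x=$(printf '%s\n' "$input" | grep -xE '[[:digit:]]+')

# Inverted match: drops lines containing a non-digit, so the empty line survives.
with_v=$(printf '%s\n' "$input" | grep -v '[^[:digit:]]')

printf 'grep -x lines: %s\n' "$(printf '%s\n' "$with_x" | wc -l | tr -d ' ')"
printf 'grep -v lines: %s\n' "$(printf '%s\n' "$with_v" | wc -l | tr -d ' ')"
```

If empty lines should not be selected, the -x or anchored forms with + are the safer choice.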
I know a colon (:) should be literal, so I'm not clear why my grep matches all lines. Here's a file called "test":
cat test
123|4444
4546|4444
666666|5678
7777777|7890675::1
I need to match the line with ::1. Of course, the real case is more complicated, so I can't simply search for "::1". I tried many iterations, like
grep -E '^[0-9]|[0-9]:' test
grep -E '^[0-9]|[0-9]::1' test
But they return all lines:
123|4444
4546|4444
666666|5678
7777777|7890675::1
I am expecting to match just the last line. Any idea why that is?
This is GNU/Linux bash.
The pipe needs to be escaped and you need to allow repeated digits:
grep -E '^[0-9]+\|[0-9]+:' test
Otherwise ^[0-9] is all that needs to match for a line to be retained by the grep.
Given:
$ echo "$txt"
123|4444
4546|4444
666666|5678
7777777|7890675::1
Use repetition (+ means 'one or more') and character classes:
$ echo "$txt" | grep -E '^[[:digit:]]+[|][[:digit:]]+[:]+'
7777777|7890675::1
Since | is a regex meta character, it has to be either escaped (\|) or in a character class.
There are two issues:
The regex [0-9] matches any single digit. Since you have multiple digits, you need to replace those parts with [0-9]+, which matches one or more digits. If you want to allow an empty sequence with no digits, replace the + with a *, which means “zero or more”.
The pipe character | means alternation ("or") in regex. What you provided will match either a digit at the start of the line, or a digit followed by a colon. Since every line has at least one of those, you match every line. To get a literal | character, you can use either [|] or \|; the second option is usually preferred in most styles.
Applying both of these, you get ^[0-9]+\|[0-9]+::1.
Another approach is to use a tool like awk that can process the fields of each line, and match lines where the 2nd field ends with "::1"
awk -F'|' '$2 ~ /::1$/' test
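The effect of the unescaped pipe can be seen by counting matches. This is a sketch, assuming the "test" file from the question is recreated in a temporary file; the variable names are illustrative only:

```shell
# Recreate the sample file from the question.
tmp=$(mktemp)
printf '123|4444\n4546|4444\n666666|5678\n7777777|7890675::1\n' > "$tmp"

# Unescaped |: "digit at line start" OR "digit followed by ::1", so every
# line qualifies through the first alternative.
unescaped_count=$(grep -cE '^[0-9]|[0-9]::1' "$tmp")

# Escaped \|: digits, a literal pipe, digits, then ::1.
escaped_match=$(grep -E '^[0-9]+\|[0-9]+::1' "$tmp")

printf 'unescaped pattern matches %s lines\n' "$unescaped_count"
printf 'escaped pattern matches: %s\n' "$escaped_match"
rm -f "$tmp"
```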
I need to grep files for lines containing only lowercase letters and spaces. Both conditions must be met at least once and no other characters are allowed.
I know how to grep only for lowercase or only for space but I don't know how to join those two conditions in one regexp/command.
I have only this right now:
egrep "[[:space:]]" $DIR/$file | egrep -vq "[[:upper:]]"
which of course will display lines with digits and/or special characters as well which is not what I want.
Thanks.
This is what you require
The -x matches whole lines
The first expression matches lines composed entirely of spaces and lower case letters.
The second expression matches lines that have both a space and a lower case letter.
egrep -x '[[:lower:] ]*' "$DIR/$file" | egrep '( [[:lower:]])|([[:lower:]] )'
awk may be better to express such conditions:
awk '/^[ a-z]+$/ && /[a-z]/ && / /' file
That is, it checks that a line:
consists only of spaces and lowercase letters.
contains at least one lowercase letter.
contains at least one space.
Test
$ cat a
hello this is something simple
but SUDDENLY not
wah
wa ah
$ awk '/^[ a-z]+$/ && /[a-z]/ && / /' a
hello this is something simple
wa ah
First grep all lines that consist only of lowercase characters and whitespace, then keep those that contain at least one whitespace character and at least one lowercase letter.
egrep -x '[[:lower:][:space:]]+' "$DIR/$file" | egrep '[[:space:]]' | egrep '[[:lower:]]'
The [:space:] meta class also matches for tabs, and can be replaced with a plain space if desired.
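The three conditions (only lowercase letters and spaces, at least one letter, at least one space) can also be written as three chained filters, which makes each requirement visible. A minimal sketch, assuming plain spaces rather than tabs; the function name lower_and_space and the extra sample lines are made up for illustration:

```shell
# Keep lines made only of lowercase letters and spaces that contain
# at least one of each.
lower_and_space() {
    grep -x '[[:lower:] ]*' "$@" | grep '[[:lower:]]' | grep ' '
}

# Sample from the awk answer, plus a line with digits and a spaces-only line.
sample=$(printf 'hello this is something simple\nbut SUDDENLY not\nwah\nwa ah\nabc 123\n   ')
result=$(printf '%s\n' "$sample" | lower_and_space)
printf '%s\n' "$result"
```

The spaces-only line passes the first filter but is removed by the second, and "wah" is removed by the third.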
I have this:
echo 12345 | grep -o '[[:digit:]]\{1,4\}'
Which gives this:
1234
5
I understand what's happening. How do I stop grep from trying to continue matching after 1 successful match?
How do I get only
1234
Do you want grep to stop matching, or do you only care about the first match? You could use head if the latter is true...
`grep stuff | head -n 1`
grep is a line-based utility, so the -m 1 flag tells it to stop after the first matching line, not after the first match within that line. Combined with head -n 1, which keeps only the first match that -o prints, this works well in practice.
You need to do the grouping: \(...\) followed by the exact number of occurrence: \{<n>\} to do the job:
maci:~ san$ echo 12345 | grep -o '\([[:digit:]]\)\{4\}'
1234
Hope it helps. Cheers!!
Use sed instead of grep:
echo 12345 | sed -n '/^\([0-9]\{1,4\}\).*/s//\1/p'
This matches up to 4 digits at the beginning of the line, followed by anything, keeps just the digits, and prints them. The -n prevents lines from being printed otherwise. If the digit string might also appear mid-line, then you need a slightly more complex command.
In fact, ideally you'll use a sed with PCRE regular expressions since you really need a non-greedy match. However, up to a reasonable approximation, you can use: (A semi-solution to a considerably more complex problem...now removed!)
Since you want the first string of up to 4 digits on the line, simply use sed to remove any non-digits and then print the digit string:
echo abc12345 | sed -n '/^[^0-9]*\([0-9]\{1,4\}\).*/s//\1/p'
This matches a string of non-digits followed by 1-4 digits followed by anything, keeps just the digits, and prints them.
If – as in your example – your numeric expression will appear at the beginning of the string you're starting with, you could just add a start-of-line anchor ^:
echo 12345 | grep -o '^\([[:digit:]]\)\{1,4\}'
Depending on which exact digits you want, an end-of-line anchor $ might help also.
grep manpage says on this topic (see chapter 'regular expressions'):
(…)
{n,}
The preceding item is matched n or more times.
{n,m}
The preceding item is matched at least n times, but not more than m times.
(…)
So the answer should be:
echo 12345 | grep -o '[[:digit:]]\{4\}'
I just tested it on cygwin terminal (2018) and it worked!
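For the case where only the first run of up to four digits is wanted, the grep -o and head suggestions from this thread can be combined. This is a sketch, assuming a grep that supports -o (GNU and BSD grep do; it is not required by POSIX); the function name first_digits is hypothetical:

```shell
# Print only the first run of up to four digits found on standard input.
# grep -o emits every match on its own line; head keeps just the first.
first_digits() {
    grep -o '[[:digit:]]\{1,4\}' | head -n 1
}

first=$(echo 12345 | first_digits)
printf '%s\n' "$first"
```

Unlike the fixed \{4\} version, this still returns a result when the input has fewer than four digits.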