I have a file in which some lines contain numbers mixed with other characters, some contain only letters, and some contain only numbers. I would like to select the lines that contain only numbers. I tried egrep '[^[:alpha:]]' filename, but I also get lines with other characters. Any idea?
AQ
Feb 9, 1999
11:45
45
And I want only:
45
The regex needs to check that everything on the line is numeric, so ^ and $ anchors around the expression are needed to match from the start to the end of each line. Also, the match needs to be explicitly for digits rather than for non-alphabetic characters.
E.g.
egrep '^[[:digit:]]+$' filename
This worked well against the example in the question.
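To see why the anchors matter, the sketch below runs both the question's original pattern and the anchored one over the sample lines. The function names are hypothetical, chosen for illustration, and grep -E is used in place of egrep, which recent GNU grep versions warn is obsolescent.

```shell
# Hypothetical helper names, for illustration only.

# Original attempt: matches any line containing at least one non-letter,
# so "Feb 9, 1999" and "11:45" slip through.
non_alpha_lines() {
  grep -E '[^[:alpha:]]'
}

# Anchored version: the whole line, from ^ to $, must be one or more digits.
only_digit_lines() {
  grep -E '^[[:digit:]]+$'
}

sample='AQ
Feb 9, 1999
11:45
45'

printf '%s\n' "$sample" | non_alpha_lines   # Feb 9, 1999 / 11:45 / 45
printf '%s\n' "$sample" | only_digit_lines  # 45
```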
I would exclude any line that contains any non-digit character:
grep -v '[^[:digit:]]' file
# the leading ^ inside [...] negates the character class
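One subtlety with inverting a negated class: an empty line contains no non-digit character, so grep -v keeps it. If blank lines can occur in the input, a second -e pattern can exclude them too. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper name, for illustration only.
# -v inverts the match: a line is printed only if it matches NEITHER pattern.
#   '[^[:digit:]]' rejects lines with any non-digit character
#   '^$'           rejects empty lines, which the first pattern lets through
digits_only_no_blank() {
  grep -v -e '[^[:digit:]]' -e '^$'
}

printf '45\n\nab1\n7\n' | digits_only_no_blank   # prints 45 and 7
```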
With awk:
Only lines with digits and nothing else:
$ awk '/^[0-9]+$/' file
45
Or, exclude any line that contains a non-digit character:
$ awk '!/[^0-9]/' file
45
To match lines containing only numbers, use either "whole line mode" with -x:
grep -xE '[[:digit:]]+' file
or add the line start/end anchors to the regular expression:
grep -E '^[[:digit:]]+$' file
Note that you can replace the character class [:digit:] with the range 0-9 if you are only concerned with matching the ASCII characters from 0 to 9:
grep -xE '[0-9]+' file
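The -x option and explicit ^...$ anchors should select exactly the same lines. A quick comparison on a small hypothetical input, where only "45" and "007" consist solely of digits:

```shell
# Hypothetical sample input, for illustration only.
input='AQ
45
4 5
007
12a'

# Whole-line mode: -x makes the pattern match only if it covers the entire line.
printf '%s\n' "$input" | grep -xE '[[:digit:]]+'

# Explicit anchors: same effect, written into the regex itself.
printf '%s\n' "$input" | grep -E '^[0-9]+$'
```

Both commands print 45 and 007; "4 5" and "12a" are rejected because the digits do not span the whole line.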
Related
I need to filter out all words in a file that have duplicate characters.
I've been stuck for a couple of days trying to figure this out.
This is what I've got so far to find a 5-letter string, but I still get words with duplicate letters showing up...
Any help would be appreciated.
cat /file | grep -Eow '\w{5}' | grep -v '\(.\)(.\)\1' | sort -u
# Input file
$ cat file
aabc
123
1233
# Filter out repeating characters
$ grep -Ev "(.)\1" file
123
# Show only lines with repeating characters
$ grep -E "(.)\1" file
aabc
1233
This previous question explains how to do this using regex: Regex to find repeating numbers
Detailed explanations:
grep -E - use extended regular expressions to match lines
grep -v - invert the match, so non-matching lines are displayed
. - regex: match any single character
( ) - regex: capture group
\1 - backreference: match the same character the group captured, i.e. a character immediately followed by itself
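Note that (.)\1 only detects a character immediately followed by itself, so a word like "abcda" passes. The sketch below, offered as an extension rather than the answer's exact method, puts .* between the group and the backreference to catch repeats anywhere in the word, and plugs it into the question's 5-letter pipeline. The function name is hypothetical, and [[:alnum:]]{5} stands in for the question's \w{5}. Backreferences in grep -E are a GNU extension.

```shell
# Hypothetical helper name, for illustration only.
# Keep 5-character words in which no character occurs more than once.
#   (.)\1   only catches a character immediately followed by itself ("hello")
#   (.).*\1 also catches repeats with other characters in between ("abcda")
unique_char_words() {
  grep -Eow '[[:alnum:]]{5}' | grep -Ev '(.).*\1' | sort -u
}

printf 'apple bread hello\nworld abcda bread\n' | unique_char_words   # bread, world
```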
Also, https://regex101.com/ is an easy way to construct a regex for your purpose. Create a few test cases and check that it works as you write the regex.
I am trying to chain multiple grep patterns to find a number within a grepped string.
I have a text file like this:
This is the first sample line 1
this is the second sample line
another line
total lines: 3 tot
I am trying to find a way to get just the number of total lines, so the output here should be "3".
Here are the things I've tried:
grep "total lines: [0-9]" myfile.txt
grep "total lines" myfile.txt | grep "[0-9]"
You could use sed:
sed -En 's/^total lines: ([0-9]+).*/\1/p' myfile.txt
-E extended regular expressions
-n suppress automatic printing
Match ^total lines: ([0-9]+).* (capture the number)
\1 replace the whole line with the captured number
p print the result
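Wrapped in a small function (the name is hypothetical), the sed command above can be checked against the question's sample:

```shell
# Hypothetical helper name, for illustration only.
# Print only the number captured from a line starting with "total lines: ".
# -E: extended regex, -n: no automatic printing, p: print when s/// succeeded.
# With no file arguments, sed reads standard input.
total_lines() {
  sed -En 's/^total lines: ([0-9]+).*/\1/p' "$@"
}

printf 'This is the first sample line 1\nthis is the second sample line\nanother line\ntotal lines: 3 tot\n' | total_lines   # 3
```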
1st solution: Using GNU grep, try the following. The -o option prints only the matched text, and -P enables PCRE (Perl-compatible) regexes. The regex matches ^total lines: at the start of a line, then \K discards everything matched so far (so it is left out of the output), followed by one or more digits; a positive lookahead makes sure the digits are followed by whitespace and tot.
grep -oP '^total lines: \K[0-9]+(?=\s+tot)' Input_file
2nd solution: With your shown samples, try the following awk command. This can be done in a single awk call: it looks for lines containing total lines: and prints the second-to-last field of that line.
awk '/total lines: /{print $(NF-1)}' Input_file
3rd solution: Using awk's match function. It matches total lines: [0-9]+ tot, extracts the matched text with substr, and then deletes every non-digit character from it.
awk 'match($0,/total lines: [0-9]+ tot/){val=substr($0,RSTART,RLENGTH);gsub(/[^0-9]+/,"",val);print val}' Input_file
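A small harness to check that the 2nd and 3rd solutions agree on the question's sample; the function names are hypothetical wrappers around the awk programs above:

```shell
# Hypothetical wrapper names, for illustration only.

# 2nd solution: on a line containing "total lines: ", print the
# second-to-last whitespace-separated field.
total_nf() {
  awk '/total lines: /{print $(NF-1)}' "$@"
}

# 3rd solution: match() sets RSTART/RLENGTH; substr() copies the matched
# text and gsub() deletes every non-digit from that copy.
total_match() {
  awk 'match($0,/total lines: [0-9]+ tot/){val=substr($0,RSTART,RLENGTH);gsub(/[^0-9]+/,"",val);print val}' "$@"
}

sample='This is the first sample line 1
this is the second sample line
another line
total lines: 3 tot'

printf '%s\n' "$sample" | total_nf      # 3
printf '%s\n' "$sample" | total_match   # 3
```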
Do you have to use grep?
$ wc -l < myfile.txt
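A quick check of the difference between piping the file name into wc -l and redirecting the file itself. The helper name and the throwaway file are hypothetical. Note that counting lines this way includes every line in the file, including the total lines line itself.

```shell
# Hypothetical helper name, for illustration only.
# Count the lines of a file's CONTENT; tr strips the padding some wc versions add.
count_lines() {
  wc -l < "$1" | tr -d ' '
}

tmp=$(mktemp)
printf 'line one\nline two\nline three\n' > "$tmp"

echo "$tmp" | wc -l   # 1: echo prints the file NAME, a single line
count_lines "$tmp"    # 3: the lines inside the file

rm -f "$tmp"
```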
If you mean that the file has a line in it formatted as
total lines: 3 tot
Then refer to https://unix.stackexchange.com/questions/13466/can-grep-output-only-specified-groupings-that-match and use something like:
grep -Po 'total lines: \K\d+' myfile.txt
Notes:
Perl regex is not my forte, so the \K\d+ part might not work.
This may be doable without -P, but I cannot test from this Windows computer.
regex101.com helped me test the above line, so it may work.
The problem with relying on the pattern of the last line and applying grep/sed to find it is that if any other line in the file contains the same pattern, you will have to apply additional logic to filter it out.
For example, consider the input file below:
line001
total lines: 883 tot
This is the first sample line 1
this is the second sample line
another line
total lines: 883 tot
Assuming your file format is constant (i.e. the second-to-last line is blank and the last line contains the total count), instead of using any pattern-matching commands you can directly count the number of rows with the awk command below:
awk 'END { print NR - 2 }' myfile.txt
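A sketch of the same idea as a function (hypothetical name), run on a sample that follows the stated layout: three data lines, a blank line, then the total line. In the END block, NR holds the total number of lines read.

```shell
# Hypothetical helper name, for illustration only.
# Assumes the second-to-last line is blank and the last line holds the total,
# so the data rows are all lines except those two.
count_data_rows() {
  awk 'END { print NR - 2 }' "$@"
}

printf 'This is the first sample line 1\nthis is the second sample line\nanother line\n\ntotal lines: 3 tot\n' | count_data_rows   # 3
```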
You can use the following awk command to get the third field on a line that starts with total lines: and stop processing the file further:
awk '/^total lines:/{print $3; exit}' file
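Because of exit, only the first matching line is printed, which also sidesteps the duplicated-total problem described in the previous answer. A check with that answer's sample, wrapped in a hypothetical function:

```shell
# Hypothetical helper name, for illustration only.
# $3 is the third whitespace-separated field ("883"); exit stops reading
# after the first matching line, so a repeated total line is ignored.
first_total() {
  awk '/^total lines:/{print $3; exit}' "$@"
}

sample='line001
total lines: 883 tot
This is the first sample line 1
another line
total lines: 883 tot'

printf '%s\n' "$sample" | first_total   # 883 (printed once)
```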
You can use the following GNU grep commands:
# Extract a non-whitespace chunk after a certain pattern
grep -oP '^total lines:\s*\K\S+' file
# Extract a number after a pattern
grep -oP '^total lines:\s*\K\d+(?:\.\d+)?' file
Details:
^ - start of string
total lines: - a literal string
\s* - zero or more whitespace characters
\K - match reset operator discarding all text matched so far
\S+ - one or more non-whitespace chars
\d+(?:\.\d+)? - one or more digits and then an optional sequence of . and one or more digits.
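Where grep -P is not available, a POSIX sed expression can do a similar extraction. This is a swapped-in technique, not the answer's command: [[:space:]]* stands in for \s*, a capture group replaces \K, and (\.[0-9]+)? is the optional fractional part. The function name is hypothetical.

```shell
# Hypothetical helper name, for illustration only.
# POSIX ERE counterpart of '^total lines:\s*\K\d+(?:\.\d+)?'.
extract_total() {
  sed -En 's/^total lines:[[:space:]]*([0-9]+(\.[0-9]+)?).*/\1/p' "$@"
}

printf 'total lines: 3 tot\ntotal lines:12.5\nother line\n' | extract_total   # 3, then 12.5
```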
I have a file.txt looking like this:
abe
abbe
cde
45a678
ae
cababb
12345
After running the command egrep [[:digit:]] file.txt
it shows two results: "45a678" and "12345". I don't understand why it shows the first result (I thought that the regex would only show lines with numbers).
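You are searching for a digit anywhere in the line. You should anchor the pattern from the beginning (^) to the end ($) of the line and require at least one digit in between (+). Quote the pattern so the shell does not treat the brackets as a filename glob:
egrep '^[[:digit:]]+$' file.txt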
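In a regex, [[:digit:]] matches a single digit anywhere in the line; it does not check the whole line.
To match the whole line, use ^ for the beginning of the line and $ for the end.
As a result,
grep -P '^\d+$' file.txt
will only match lines made up entirely of digits. (\d is Perl regex syntax, so it needs GNU grep's -P option; egrep does not understand it.)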
Your regex [[:digit:]] matches any line that contains a digit, so 45a678 matches. Use ^[[:digit:]]+$ to match only lines made entirely of digits (with * instead of +, empty lines would match too):
$ egrep '^[[:digit:]]+$' file.txt
12345
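The difference between + and * in the anchored pattern shows up on empty lines. A sketch using the question's sample plus one hypothetical empty line; the function name is also hypothetical:

```shell
# Hypothetical helper name, for illustration only.
# + requires at least one digit, so an empty line is rejected.
only_digits() {
  grep -E '^[[:digit:]]+$' "$@"
}

lines='abe
abbe
cde
45a678

ae
cababb
12345'

printf '%s\n' "$lines" | only_digits                 # 12345
printf '%s\n' "$lines" | grep -cE '^[[:digit:]]*$'   # 2: the empty line and 12345
```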
How can I use grep to match three-digit numbers in a file? My file looks like this:
123
122
222
333443
fdsfs5454353
dsfsfjsk4654641
Note that some of the lines contain trailing spaces. I want to only match three digit numbers. I tried:
grep -E [0-9]{3} test.txt
grep -E '\<[0-9]{3}\>' test.txt
grep '^[0-9][0-9]*$' test.txt | awk '{if(length($0) == 3) print $0}'
or, if lines have trailing whitespace:
sed 's/[[:blank:]]*$//' test.txt | grep '^[0-9][0-9]*$' | awk '{if(length($0) == 3) print $0}'
(thanks @shellter)
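As an alternative to the three-command pipeline, a single grep can do the same job. This is a swapped-in sketch, not the answer's method: -x anchors the pattern to the whole line, {3} requires exactly three digits, and [[:blank:]]* tolerates trailing spaces or tabs. The function name is hypothetical.

```shell
# Hypothetical helper name, for illustration only.
# Lines made of exactly three digits, optionally followed by trailing blanks.
three_digit_lines() {
  grep -Ex '[0-9]{3}[[:blank:]]*' "$@"
}

printf '123\n122  \n222\n333443\nfdsfs5454353\ndsfsfjsk4654641\n' | three_digit_lines   # 123, 122, 222
```

Unlike the sed step, this prints matching lines unchanged, trailing blanks included.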
Use Extended Regular Expressions with Bounds
I asked if you meant numbers with exactly three digits, or each three-digit match in a string. You replied that you wanted only lines that contained exactly three digits.
Extended grep provides an easy solution for this. Consider the following:
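$ egrep '^[0-9]{3}\b' /tmp/corpus
123
122
222
This uses a bound (also known as a range) to look for exactly three digits at the start of each line, followed by a word boundary. The word boundary will match trailing space or the end of line, ensuring that you get the proper match in either case. Note that \b is a GNU grep extension, and Perl-style \d is not supported in extended regular expressions, so the digits are written as the bracket expression [0-9].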
I need to display all lines in file.txt containing the character "鱼", but only those where "鱼" is immediately preceded by a-z, A-Z, a space, or a line break.
I tried using grep, like this:
grep "[a-zA-Z\s\n]鱼" file.txt
The regular expression [a-zA-Z\s\n] does not appear to work. How can I search for this character, when appearing after a-z, A-Z, a space, or a line break?
If you want to match a space with grep, use a space:
grep "[a-zA-Z ]鱼" file.txt
If you want to match any whitespace, you can use the POSIX standard character class:
grep "[a-zA-Z[:space:]]鱼" file.txt
("Any whitespace" is space, newline, carriage return, form feed, tab and vertical tab. If you just want to match space and tab, you can use [:blank:].)
You might also want to use a standard class for letters. Unless you are in the POSIX or "C" locale, the meanings of character ranges like A-Z are unpredictable.
grep "[[:alpha:][:space:]]鱼" file.txt
grep works line by line, so it will never see a newline. But using an "extended" pattern, you can also match at the beginning of the line:
egrep "(^|[[:alpha:][:space:]])鱼" file.txt
(You can use grep -E instead of egrep if you prefer. But you need one or the other for the above regular expression to work.)
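A check of the alternation pattern on a few hypothetical lines; the function name is also hypothetical. "1鱼" is rejected because a digit is neither a letter nor whitespace.

```shell
# Hypothetical helper name, for illustration only.
# Print lines where 鱼 is at the start of the line or right after a letter
# or whitespace character; grep -E is needed for the (^|...) alternation.
after_alpha_or_space() {
  grep -E '(^|[[:alpha:][:space:]])鱼' "$@"
}

printf 'a鱼\n鱼x\n1鱼\nx 鱼\n' | after_alpha_or_space   # a鱼, 鱼x, x 鱼
```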
This version of grep's man page does not mention \s:
$ man grep | grep '\\s'
But awk does
$ man awk | grep '\\s'
\s Matches any whitespace character.
So perhaps use
awk '/[a-zA-Z\s\n]鱼/' file.txt
Use awk:
awk '/[A-Za-z \t]鱼/ || (NR > 1 && /^鱼/)' file
This prints a line if 鱼 comes right after one of [A-Za-z \t], or if 鱼 is at the beginning of any line except the first: NR > 1 && /^鱼/.
If you just want 鱼 to be at the beginning of a line or preceded by [A-Za-z \t], you can simply do this:
awk '/(^|[A-Za-z \t])鱼/' file
Or
grep -E '(^|[A-Za-z[:blank:]])鱼' file
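The awk form can be checked as follows. In awk regex literals, \t inside a bracket expression is read as a tab character. The function name is hypothetical.

```shell
# Hypothetical helper name, for illustration only.
# Accept 鱼 at the start of a line or right after an ASCII letter, space or tab.
awk_after_letter_or_blank() {
  awk '/(^|[A-Za-z \t])鱼/' "$@"
}

printf 'a鱼\n\t鱼\n1鱼\n鱼\n' | awk_after_letter_or_blank   # a鱼, <tab>鱼, 鱼
```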
Try this one:
^[a-zA-Z \n]{1,}鱼
{1,} ensures that 鱼 is preceded by at least one of these characters.
What is more, I suggest using awk in this particular case.