I am looking to grep for positive or negative integers only: no decimals, and no other variations that include a number.
I have a file, testpart1.txt, which contains:
This is a test for Part 1
Lets write -1324
Amount: $42.27
Numbers:
-345,64
067
Phone numbers:
(111)222-2424
The output should be:
This is a test for Part 1
Lets write -1324
067
I am new to bash and cannot figure out how to exclude numbers that contain a separator character such as '.' or ','. I can match every line containing a number with the following command, but I am stuck after that:
egrep '[0-9]' testpart1.txt
This gives me the opposite of what I want:
grep '[0-9]\.[0-9]' testpart1.txt
You may use this grep:
grep -E '(^|[[:blank:]])[+-]?[0-9]+([[:blank:]]|$)' file
This is a test for Part 1
Lets write -1324
067
Details:
-E: Enables extended regex matching
(^|[[:blank:]]): Match line start or a space or a tab character
[+-]?: Match optional + or -
[0-9]+: Match 1 or more digits
([[:blank:]]|$): Match line end or a space or a tab character
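A minimal, self-contained check of the answer's pattern, assuming a POSIX shell with grep available. It recreates the sample file from the question in a temporary file and runs the same grep -E command on it. The helper name int_lines is hypothetical and only used for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print lines containing a standalone integer, i.e. an
# optionally signed run of digits bounded by blanks or the line start/end.
int_lines() {
  grep -E '(^|[[:blank:]])[+-]?[0-9]+([[:blank:]]|$)' "$@"
}

# Recreate the sample input from the question. The quoted 'EOF' stops the
# shell from expanding $42 inside the here-document.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
This is a test for Part 1
Lets write -1324
Amount: $42.27
Numbers:
-345,64
067
Phone numbers:
(111)222-2424
EOF

int_lines "$tmp"
rm -f "$tmp"
```

Because the boundaries are only blanks and line ends, a number touching punctuation (for example "Total: 5.") is rejected as well; widen the boundary groups if that matters for your data.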
I'd like to delete every line that contains either of these patterns:
CE1(2or8 # CE1(number 2 or 8
CE2(-1-17-2or8 # CE2(any number from -1 to 17, a dash, number 2 or 8
along with the 6 lines before each match and the 1 line after it.
grep -B6 -A1 'CE1([28]\|CE2([-1-17]-[28]' file
This attempt seems to match my pattern (does it do exactly what I described?), but I was thinking of using grep's invert-match option (-v) to remove those lines from my file instead. Is that possible? It does not seem to work.
Not a complete answer, but some explanations:
A character class matches only one character. Inside a character class, a hyphen that is not literal (it is literal when it comes first, last, or immediately after ^) defines a range of characters, not a range of numbers. In Perl-style engines an escaped hyphen \- is also literal, but in grep's POSIX syntax a backslash inside brackets is itself a literal character. (Experiment with an ASCII table at hand to see how ranges work.)
[-1-17] matches a single character, which can be:
a literal hyphen (because at the beginning)
a character in the range 1-1 (so 1)
the character 7
To match an integer between -1 and 17, you need:
\(-1\|1[0-7]\|[0-9]\)
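A quick way to see the difference, assuming GNU or BSD grep. The first helper shows that the bracket expression [-1-17] only ever matches one character; the second uses the alternation above, written in extended (ERE) syntax with grep -E, to accept only CE2( followed by a number from -1 to 17, a dash, and 2 or 8. The helper names and sample lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: [-1-17] is ONE character, either '-', '1' or '7'.
# With -x the whole line must match, so only one-character lines can pass.
char_class_demo() {
  grep -x '[-1-17]'
}

# Hypothetical helper: CE2( followed by -1, 0-9 or 10-17, a dash, then 2 or 8.
# In ERE, '(' groups, so the literal parenthesis is written \(.
ce2_lines() {
  grep -E 'CE2\((-1|1[0-7]|[0-9])-[28]'
}

printf '%s\n' 7 5 17 - | char_class_demo     # prints 7 and -
printf '%s\n' 'CE2(-1-2' 'CE2(17-8' 'CE2(18-8' 'CE2(5-3' | ce2_lines
# prints CE2(-1-2 and CE2(17-8
```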
The simplest and most robust approach, IMHO, is two passes: the first identifies the lines to be skipped and the second prints everything else. It is robust because it still works when the skipped range contains other lines that match the regexp, or when the range runs past the start or end of the input file:
$ cat file
a 1
b 2
c 3
d 4
e 5
f 6
g 7
h 8
i 9
$ awk -v b=3 -v a=1 'NR==FNR{if (/f/) for (i=NR-b;i<=NR+a;i++) skip[i]; next} !(FNR in skip)' file file
a 1
b 2
h 8
i 9
Just change /f/ to /<your regexp of choice>/ and set the b(efore) and a(fter) values as you like.
As for your particular regexp, you didn't provide any sample input and expected output for us to test against, but I THINK what you want might be:
awk -v b=6 -v a=1 'NR==FNR{if (/CE(1\(|2\((-1|[0-9]|1[0-7])-)[28]/) for (i=NR-b;i<=NR+a;i++) skip[i]; next} !(FNR in skip)' file file
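Since the goal is to remove those lines from the file itself, one option is to write the two-pass result to a temporary file and move it over the original only if awk succeeded. This is a sketch assuming a POSIX awk; the helper name drop_context and the temporary-file step are my additions, not part of the answer. The regex escapes the literal ( after CE1 and CE2, as in the question's grep pattern.

```shell
#!/bin/sh
# Hypothetical helper: delete every line matching the CE1(/CE2( pattern, plus
# the 6 lines before it and the 1 line after it, rewriting FILE in place.
# Usage: drop_context FILE
drop_context() {
  awk -v b=6 -v a=1 '
    # Pass 1 (NR == FNR): record the numbers of the lines to skip.
    NR == FNR {
      if (/CE(1\(|2\((-1|1[0-7]|[0-9])-)[28]/)
        for (i = NR - b; i <= NR + a; i++) skip[i]
      next
    }
    # Pass 2: print only lines whose number was not recorded.
    !(FNR in skip)
  ' "$1" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

demo=$(mktemp)
printf '%s\n' keep1 x2 x3 x4 x5 x6 x7 'CE1(2' x9 keep10 > "$demo"
drop_context "$demo"
cat "$demo"    # prints keep1 and keep10
rm -f "$demo"
```

Keep a backup of the real file until you have checked the result.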
Why does the following literal string
1998-${year}
match this grep command?
grep "[0-9 ]*-[ 0-9]*" filename.txt
What I need is a regex that matches strings containing either a year range or a single year, such as:
sdkfmslf 1998-2008
asdassdadsa 1998 - 2008
mkklml mklsmdf 2006
but NOT this one:
asdsad a s 1998-${year}
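* means "match zero or more", so [0-9 ]*-[ 0-9]* matches any line that contains a hyphen. You want + which means "one or more"; in grep, + is only a quantifier in extended syntax, so use -E:
grep -E "[0-9 ]+-[0-9]+" filename.txt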
Try [0-9]{4}(\s*-\s*[0-9]{4})? (with grep -E; \s is a GNU extension). This matches a 4-digit number, optionally followed by (optional whitespace)-(optional whitespace) and another 4-digit number.
Your string "asdsad a s 1998-${year}" would still match, though, since it contains a single 4-digit value.
I don't like answering my own question, but none of the above worked. Here is what I found by experimenting. I'm sure there could be more elegant solutions, but here is a working version:
grep "[0-9][0-9][0-9][0-9][ ]*[\-]*[ ]*[0-9]*" filename.txt
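Note that everything after the first four digits in the pattern above is optional (each part ends in *), so it matches any line containing four consecutive digits, including the 1998-${year} line. The sketch below does reject that line, assuming (as in all the samples) that the year or year range ends the line and is preceded by a blank or the start of the line. The helper name year_lines is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print lines that end in a 4-digit year or a year
# range such as "1998-2008" or "1998 - 2008".
year_lines() {
  grep -E '(^|[[:blank:]])[0-9]{4}([[:blank:]]*-[[:blank:]]*[0-9]{4})?$' "$@"
}

# Sample lines from the question; the last one must not be printed.
printf '%s\n' \
  'sdkfmslf 1998-2008' \
  'asdassdadsa 1998 - 2008' \
  'mkklml mklsmdf 2006' \
  'asdsad a s 1998-${year}' | year_lines
```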
I have a file with many lines including a string like this: blah blah num=12345; blah blah
I would like to find lines where the number after the equals sign is greater than 1, with no upper limit. (I do not expect a number to ever start with zero.)
I started with this expression, which matches any number whose first digit is 2 through 9; it works fine and I understand it.
grep 'num=[2-9][0-9]*;'
I thought this next expression would return any number that starts with a 1 and has two or more digits, but instead I get nothing back:
grep 'num=1[0-9]+;'
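I thought the above meant: match num=1, then match a digit 0-9 one or more times, then match ;. Where am I going wrong?
In grep's default basic regex syntax, + is a literal character, so you must escape the + quantifier (\+ is a GNU extension; grep -E 'num=1[0-9]+;' is the portable alternative):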
grep 'num=1[0-9]\+;'
For your actual problem (all numbers greater than 1, if I understand correctly), you can use:
grep 'num=\([2-9]\|1[0-9]\)[0-9]*;'
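Because \+ and \| in basic regex syntax are GNU extensions, a more portable spelling of the same pattern uses extended syntax, where +, | and grouping parentheses need no backslash. This assumes a grep that supports -E (GNU and BSD grep both do); the helper name big_nums and the sample lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: lines where the number after "num=" is greater than 1
# (assumes no leading zeros, as stated in the question).
big_nums() {
  grep -E 'num=([2-9]|1[0-9])[0-9]*;' "$@"
}

printf '%s\n' 'a num=0; b' 'a num=1; b' 'a num=2; b' 'a num=10; b' 'a num=12345; b' |
  big_nums
# prints the num=2, num=10 and num=12345 lines
```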
I have this very large dictionary file with 1 word on each line, and I would like to trim it down.
What I would like to do is keep only 3-6 letter words that are not proper nouns, so the filter has to work like this:
if the word is less than 3 letters, delete it
if the word is more than 6 letters, delete it
if the word has a capital letter, delete it
if the word has a single quote or space, delete it.
I used this:
cat Downloads/en-US/en-US.dic | egrep '[a-z]{3,6}' > Downloads/3-6.txt
but the output is incorrect. It outputs the words with more than 3 characters all right, but that's as far as I've got.
So how do I do this in the Mac terminal? There must be a way to do this, right?
The following command will select only words that consist of exactly three to six lowercase a-z letters:
egrep '^[a-z]{3,6}$' /usr/share/dict/words > filtered.txt
Replace /usr/share/dict/words with your input file, and filtered.txt with a name for your output file. I just verified that this works on my Mac. Hope this helps!
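The anchors ^ and $ are what fix the original command: without them, '[a-z]{3,6}' matches any word that merely contains 3 lowercase letters in a row. One caveat, assuming the input really is a Hunspell-style .dic file such as en-US.dic: those files start with a word count and may append affix flags after a slash (for example cat/S), which the anchored pattern would reject along with the word. The sketch below strips the flags first and forces the C locale so that [a-z] means only ASCII lowercase letters; the helper name short_words is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keep words of 3-6 lowercase ASCII letters only.
# cut drops any "/FLAGS" suffix (lines without "/" pass through unchanged);
# the word-count line is all digits, so the regex filters it out.
short_words() {
  cut -d/ -f1 "$@" | LC_ALL=C grep -E '^[a-z]{3,6}$'
}

printf '%s\n' 7 ab cat/S Paris/M "don't" orange bananas/S 'ice cream' | short_words
# prints cat and orange
```

For the question's file this would be: short_words Downloads/en-US/en-US.dic > Downloads/3-6.txt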
Use grep and write a regex rule to match the lines you want to keep. You can get info on grep by typing man grep in the terminal.
Here's a sample string that I want to run a regex on:
101-nocola_conte_-_fuoco_fatuo_(koop_remix)
The first digit in "101" is the disc number and the next 2 digits are the track numbers. How do I match the track numbers and ignore the disc number (first digit)?
Something like
/^\d(\d\d)/
would match one digit at the start of the string, then capture the following two digits.
Do you mean that you don't mind what the disc number is, but you want to match, say, track number 01?
In Perl you would match it like so: "^[0-9]01.*"
or, more simply, "^.01.*", which means you don't even mind if the first character is not a digit.
^\d(\d\d)
You may need a \ in front of each parenthesis, depending on the environment you run the regex in (vi(1), for example).
Which programming language? In the shell, something with egrep will do the job:
echo '101-nocola_conte_-_fuoco_fatuo_(koop_remix)' | egrep -o '^[0-9]{3}' | egrep -o '[0-9]{2}$'
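If you prefer a single command that uses a capture group like the /^\d(\d\d)/ answer, sed can do it. This sketch assumes a sed that supports -E (GNU and BSD sed both do) and uses [0-9] because \d is not part of POSIX regex syntax; the helper name track_number is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the two track digits that follow the one-digit
# disc number at the start of the name; print nothing if there is no match.
track_number() {
  printf '%s\n' "$1" | sed -E -n 's/^[0-9]([0-9]{2}).*/\1/p'
}

track_number '101-nocola_conte_-_fuoco_fatuo_(koop_remix)'    # prints 01
```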