Regex to match lines with coordinates ending in zero

given the following:
1803 1004 -4.2
1807 1005 3.3
1809 1006 -8.9
1800 1007 -3.7
1805 1008 9.1
1808 1009 -4.3
1800 1000 3.2
I'd like a regex that matches only lines whose first two coordinates both end in zero, so that only this line is returned:
1800 1000 3.2
I only want lines where both of the first two numbers end in zero. The lines may also contain large amounts of whitespace at the start or between the numbers.
I've tried various combinations of '\s*\d+0\z*\d+0*' and '\d+0\s\d+0*' with no result.
I'm using this in combination with grep.

I recommend using grep's -E option:
$ grep -E '^ *([0-9]*0) +([0-9]*0) +.*$' dataFile
Result:
1800 1000 3.2
In action: https://regex101.com/r/h4on2q/1
Additionally, about -E (from man grep):
-E, --extended-regexp
Interpret pattern as an extended regular expression (i.e. force grep to behave as egrep).
Basic vs Extended Regular Expressions:
https://www.gnu.org/software/grep/manual/html_node/Basic-vs-Extended.html

Give this a try (with grep, \d needs PCRE mode, i.e. grep -P): ^\s*\d+0\s+\d+0\s+.*$
In action: https://regex101.com/r/t0hhDL/2
It's not clear from your question whether the data you're working with is all one big string, or these are multiple lines being returned. I assumed the latter with the answer above, but the pattern will need to be slightly different if that's not the case.
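A minimal sketch of the same idea that also tolerates tabs, assuming the whitespace may be any mix of spaces and tabs. The sample data is adapted from the question; the extra indentation, the tab-separated line, and the two-column line are made up for illustration. It uses the POSIX class [[:space:]] so it works with plain grep -E, without PCRE.

```shell
# Sample data adapted from the question; the odd whitespace and the
# two-column line "1810 1020" are assumptions added for illustration.
data=$(printf '1803 1004 -4.2\n  1807   1005 3.3\n1800 1007 -3.7\n\t1800\t1000   3.2\n1810 1020\n')

# First two fields must each end in 0; the second field must be followed
# by whitespace or the end of the line so that 1007 is not accepted.
result=$(printf '%s\n' "$data" |
  grep -E '^[[:space:]]*[0-9]*0[[:space:]]+[0-9]*0([[:space:]]|$)')

printf '%s\n' "$result"
```

The trailing ([[:space:]]|$) is what stops the second number from matching on an inner zero, and it also accepts lines that have only two columns.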

Related

Match lines containing even numbers with grep

I am doing a series of questions regarding grep and I have gotten stuck on trying to match lines containing even numbers in any way (so, it should match 'hello22 23', '8', '2222 2999 1', 'hello2hello9', etc.)
The problem is that, while I managed to match all of those cases, I cannot find a way to match cases in which the line either contains exclusively an even number or it's the last occurrence before an EOL ('22', 'hello8', anything that ends with a number which should match).
So far, this is what I'm using:
grep -P '((.)*[02468][^0-9](.)*)'
The above matches anything followed by an even number with no numbers whatsoever after it, followed by anything else.
I have tried playing with the '$' regex which should match it, with no effect. Could it be maybe that grep isn't detecting my EOLs properly?
I think I understand what you're after: you want to skip lines that may contain even digits but whose numbers are all odd. Examples include 3, 23, a23, 23a, 3a49. You want to match lines that have at least one even number: 2, 22, 32, a32, 32a, 45a5bb44, etc.
The pattern grep -P '[02468](?=\D|$)' ensures at least one even digit is present that's followed by EOL or a non-digit using a lookahead and should fit your requirements.
$ cat test.txt
3
23
a23
23a
3a49
2
22
32
a32
32a
45a5bb44
$ grep -P '[02468](?=\D|$)' test.txt
2
22
32
a32
32a
45a5bb44
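If PCRE support (-P) is not available, the lookahead can be replaced by an alternation, since grep only needs to decide whether the line matches. The following is a sketch under that assumption, reusing the test lines above; the variable names are illustrative.

```shell
# Same test lines as in test.txt above.
lines=$(printf '%s\n' 3 23 a23 23a 3a49 2 22 32 a32 32a 45a5bb44)

# An even digit followed by a non-digit or by the end of the line.
# Unlike the lookahead version, this consumes the following character,
# which does not matter when only whole lines are selected.
evens=$(printf '%s\n' "$lines" | grep -E '[02468]([^0-9]|$)')

printf '%s\n' "$evens"
```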

How to use zgrep to display all words of a x size from a wordlist?

I want to display all the words from my wordlist that start with a w and are 9 letters long. Yesterday I learnt a bit more about how to use zgrep, so I came up with:
zgrep '\(^w\)\(^.........$\)' a.gz
But this doesn't work, and I think it's because I don't know how to do an AND between the two conditions. I found that it should be (?=expr)(?=expr), but I can't figure out how to build my command with it.
So how can I build my command using the (?=expr) ?
for example if I have a wordlist like this:
Washington
Sausage
Walalalalalaaaa --> shouldn't match
Wwwwwwwww --> should match
You may use
zgrep '^w[[:alpha:]]\{8\}$' a.gz
The POSIX BRE pattern will match a string that
^w - starts with w
[[:alpha:]]\{8\} - then has eight letters
$ - followed by the end of string marker.
See also section 9.3, Basic Regular Expressions, of the POSIX specification.
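A small sketch of the command in action, assuming a throwaway gzip file built from the example words plus one made-up lowercase entry (wonderful). Note that the pattern as written is case-sensitive, so the capitalised example Wwwwwwwww only matches once -i is added.

```shell
# Build a temporary compressed wordlist; "wonderful" is a made-up entry
# added so that the case-sensitive pattern has something to match.
tmp=$(mktemp)
printf '%s\n' Washington Sausage Walalalalalaaaa Wwwwwwwww wonderful |
  gzip -c > "$tmp"

# Case-sensitive: lowercase w followed by exactly eight letters.
exact=$(zgrep '^w[[:alpha:]]\{8\}$' "$tmp")

# Case-insensitive: also accepts words starting with a capital W.
anycase=$(zgrep -i '^w[[:alpha:]]\{8\}$' "$tmp")

rm -f "$tmp"
printf 'exact:\n%s\nanycase:\n%s\n' "$exact" "$anycase"
```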

Regular expression for finding tag numbers in a list of cats

I am trying to match the tag numbers in a list of cats:
Abyssinian 987
Burmese a1a
Dragon Li 2B
987 Cat
cat 987 Toyger
cat A1A Siamese
1
The tag numbers for the list of cats would be:
987
a1a
2B
987
987
A1A
1
I've tried using the regular expression:
\b[0-9a-zA-Z]{1,3}\b
The problem is that it will match "cat" and "Li" (in Dragon Li). It should only match the tag number.
The requirements for a tag number are:
1-3 characters, and it must contain at least one digit (0-9)
It can appear at any place in the string
As a side note, I am using Postgres regular expressions, which I think use POSIX regular expressions. (http://www.postgresql.org/docs/9.3/static/functions-string.html)
This works in PostgreSQL:
SELECT substring(cat FROM '\m(?=\w{0,2}\d)\w{1,3}\M') AS tag
FROM cat;
\m and \M .. beginning and end of a word.
(?=\w{0,2}\d) .. positive lookahead: a digit must occur within the first three word characters
\w{1,3} .. 1-3 word characters
Assuming there is a single match in every string, substring() (without the "global" switch 'g') is better for the job than regexp_matches(), which would return an array (even for a single match).
substring() is also a bit faster.
You can use this regex:
\b(?=\w*?\d)\w{1,3}\b
Test: Using grep -P:
grep -oP '\b(?=\w*?\d)\w{1,3}\b' file
987
a1a
2B
987
987
A1A
1

Matching only 5xx using regex

I want to find all numbers between 500 and 599. I'm very new to regex, and I came up with this:
5[0-9][0-9]+
This works fine, matching 566, 577 and 500. But it also matches 6578, which I don't want.
Edit:
Here is my file contents:
asd 554
sad
sads
dsa
456
sa
d
dsa
asda
d500
521
519 asdasd
524 asdasdsdsadsdasd sadsadsadasdsd asdsa dsa dsadsad sad asdas dsa sad sad asds a 543
As many suggested, I tried:
grep "^5[0-9]{2}$" test
which isn't finding any numbers at all!
How do I put a constraint on this?
If you want to match 5xx only on a line, and not when 5xx occurs as a part of x5xx,
^5\d{2}$
\d = Digit
^ = beginning of line
$ = end of line
EDIT:
Based on additional details in the question, you have a variable number of spaces at the beginning of the line, so you want the following instead:
(^|\s)5\d{2}(\s|$)
This matches 5xx only when it is delimited by whitespace or by the start or end of the line.
With grep the easiest way is to use -w to only match whole words:
grep --color=always -w "5[0-9][0-9]" test
Remove the + sign:
5[0-9][0-9]
This matches a "5" followed by exactly two digits. Without anchors or -w it still matches the 578 inside 6578, though.
You have to describe a bit more precisely what you want to happen with, e.g., 6578. If you want 578 in the output (because after "6" there is a sequence of characters matching your format 5xx) you can simply do
grep -o "5[0-9][0-9]"
Note that unlike other answers, the -o flag emits multiple numbers from a single line if needed.
If, on the other hand, you want to match words of format 5xx, you can add -w flag, too:
grep -o -w "5[0-9][0-9]"
For more complex matching rules, use the -E flag (extended regular expressions), possibly with a more elaborate regex.
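As a sketch of why the attempt in the question finds nothing: without -E, grep uses basic regular expressions, where an unescaped { is an ordinary character, so {2} is not read as a repetition count. The file below is a shortened copy of the question's data without the leading spaces, which is an assumption; the variable names are illustrative.

```shell
# Shortened sample of the file from the question (no leading spaces assumed).
f=$(mktemp)
printf '%s\n' 'asd 554' sad 456 d500 521 '519 asdasd' '524 asdasd 543' > "$f"

# Basic RE: {2} is taken literally, so no line matches (count is 0).
bre_count=$(grep -c '^5[0-9]{2}$' "$f" || true)

# Extended RE: {2} is an interval, so lines consisting only of 5xx match.
ere_lines=$(grep -E '^5[0-9]{2}$' "$f")

# Whole-word matches anywhere on a line, one per output line.
words=$(grep -o -w '5[0-9][0-9]' "$f")

rm -f "$f"
printf 'BRE count: %s\nERE lines: %s\nwords:\n%s\n' "$bre_count" "$ere_lines" "$words"
```

In basic mode the same interval can be written with escaped braces, as in '^5[0-9]\{2\}$'.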

Is there a regex to replace leading zeroes (except the last one) and colons in nn:nn:nn.nn in VIM?

In Vim, I have opened a file with basically the following structure:
3677137 00:01:47.04
666239 00:12:57.86
4346 00:00:01.77
418 00:00:00.82
6564 00:00:01.34
1800 00:00:23.93
29208 00:14:23.32
That is: a number, followed by a tab (it could also be other whitespace; I don't believe it matters), followed by an expression that indicates some amount of elapsed time in HH:MM:SS.cs format (cs standing for centiseconds).
Now, I'd like to replace leading zeroes and colons and have found the following regexp to do exactly this:
:%s/\s\@<=[0:]\+//
resulting in
3677137 1:47.04
666239 12:57.86
4346 1.77
418 .82
6564 1.34
1800 23.93
29208 14:23.32
This is not bad and I could probably live with that. However, if there were an easy regex to have at least one figure in front of the . I'd probably be even more happy. That is, if the fourth line read
418 0.82
instead of
418 .82
So, is there a regexp that does that?
I would suggest:
%s/\s\zs[0:]*\ze\d//
I tried it on your example and it seems to do what you want.
Not the most elegant, but this works (\zs keeps the separating whitespace out of the match, so it is not deleted):
:%s/\s\zs00:0\?0\?:\?0\?//g
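Outside Vim, the same idea can be sketched with sed. This is a swapped-in technique, not the Vim command itself: sed has no \zs or \ze, so capture groups keep the whitespace and the first surviving digit instead. The sample lines come from the question, assuming a single space as the separator.

```shell
# Four sample lines from the question (single-space separator assumed).
input=$(printf '%s\n' \
  '3677137 00:01:47.04' \
  '666239 00:12:57.86' \
  '4346 00:00:01.77' \
  '418 00:00:00.82')

# Group 1 keeps the whitespace, [0:]* swallows leading zeroes and colons,
# group 2 keeps the digit that must remain in front of the rest.
trimmed=$(printf '%s\n' "$input" |
  sed -E 's/([[:space:]])[0:]*([0-9])/\1\2/')

printf '%s\n' "$trimmed"
```

Because a digit is required after the stripped run, a time such as 00:00:00.82 keeps one zero and becomes 0.82 rather than .82.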