Regular expressions in shell script


I am trying to find a line in a log file using a regular expression. The command below gives the expected output. Platform: Solaris, Shell: Bash
grep 14:[00-29]
Output: Apr 02 14:07:35 [192.168.162.117.113.169]
But the command below gives no output:
grep 14:[00-29]:[00-59].
Am I missing something?

[00-29] matches only the characters 0, 1, 2 and 9.
[00-59] matches only the characters 0, 1, 2, 3, 4, 5 and 9.
The [] construct creates a character class, not a numeric range.
You might want grep -E 14:[0-2][0-9].
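A quick way to see the difference is to count matching lines on a few samples. This is only a sketch: the three log lines are made up for illustration (the third has an impossible minute value on purpose), and grep -c counts the lines that match.

```shell
# Hypothetical sample log lines; the third has an invalid minute field (95).
log='Apr 02 14:07:35 first
Apr 02 14:37:35 second
Apr 02 14:95:00 third'

# [00-29] is ONE character out of 0, 1, 2, 9, so 14:0 and 14:9 both match.
class_hits=$(printf '%s\n' "$log" | grep -c '14:[00-29]')

# Two bracket expressions give a real two-digit 00-29 check.
range_hits=$(printf '%s\n' "$log" | grep -c '14:[0-2][0-9]')

echo "character class: $class_hits, two-digit range: $range_hits"
```

The first pattern reports 2 lines, including the bogus 14:95, while the second reports only the 14:07 line.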

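Your regex is not doing what you think. What it actually says is: get me 14:[ONE character that is 0, in 0-2, or 9]:[ONE character that is 0, in 0-5, or 9]. If single-digit fields should match as well, group each alternation and use grep -E: 14:([0-9]|[0-2][0-9]):([0-9]|[0-5][0-9]). Without the grouping, | would split the whole pattern into unrelated alternatives.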

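You need a different regex. For example, this works:
grep "14:[0-2][0-9]:[0-5][0-9]" file
The four bracket expressions mean, from left to right:
[0-2]  first digit of the minutes, from 0 to 2
[0-9]  second digit of the minutes, any digit
[0-5]  first digit of the seconds, from 0 to 5
[0-9]  second digit of the seconds, any digit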
See the output:
$ grep "14:[0-2][0-9]:[0-5][0-9]" file
Apr 02 14:07:35 [192.168.162.117.113.169]
The first command matched, but only by coincidence:
$ grep 14:[00-29] file
Apr 02 14:07:35 [192.168.162.117.113.169]
          ^^
          |+-- the 7 is not part of the match
          +--- [00-29] matches only this 0
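To see exactly which part of the line each pattern consumes, grep -o prints only the matched text. A sketch, assuming a grep that supports -o (GNU and BSD grep do; the stock Solaris grep may not):

```shell
line='Apr 02 14:07:35 [192.168.162.117.113.169]'

# The broken pattern only consumes "14:" plus a single character.
partial=$(printf '%s\n' "$line" | grep -o '14:[00-29]')

# The corrected pattern consumes the whole timestamp.
full=$(printf '%s\n' "$line" | grep -o '14:[0-2][0-9]:[0-5][0-9]')

printf '%s\n%s\n' "$partial" "$full"
```

This also accounts for the blank output of 14:[00-29]:[00-59]: after 14:0 the next character would have to be a colon, but it is 7.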

Both of these regexes:
14:[00-29]
and
14:[00-29]:[00-59]
are incorrect; neither matches a two-digit range such as 00 to 29.
For the range 00 to 59 you can use:
\b(0[0-9]|[1-5][0-9])\b
And for the range 00 to 29:
\b(0[0-9]|[12][0-9])\b
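One way to gain confidence in a range pattern is to try it against every two-digit value. The loop below is a sketch that counts how many of 00 through 99 the 00 to 29 pattern accepts. It uses grep -x for a whole-line match instead of \b, because \b is an extension that not every grep provides.

```shell
accepted=0
i=0
while [ "$i" -le 99 ]; do
  value=$(printf '%02d' "$i")
  # -x: the whole line must match; -q: no output, exit status only.
  if printf '%s\n' "$value" | grep -Eqx '(0[0-9]|[12][0-9])'; then
    accepted=$((accepted + 1))
  fi
  i=$((i + 1))
done

echo "$accepted"
```

A result of 30 means exactly the values 00 to 29 were accepted.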

Related

Regular expression operator {} in linux bash

I'm having some problems with the {} operator. In the following examples, I'm trying to find the rows with 1, 2, and 2 or more occurrences of the word mint, but I get a response only if I search for 1 occurrence of mint, even though there are more than 1 per row.
The input I am processing is a listing like this, obtained with the ls -l command:
-rw-r--r-- 1 mint mint 26 Dec 20 21:11 example.txt
-rw-r--r-- 1 mint mint 26 Dec 20 21:11 another.example
-rw-r--r-- 1 mint mint 19 Dec 20 15:11 something.else
-rw-r--r-- 1 mint mint 1 Dec 20 01:23 filemint
-rw-r--r-- 1 mint mint 26 Dec 20 21:11 mint
With ls -l | grep -E 'mint{1}' I find all the rows above, and I expected to find nothing (should be all the rows with 1 occurrence of mint).
With ls -l | grep -E 'mint{2}' I find nothing, and I expected to find the first 3 rows above (should be all the rows with 2 occurrences of mint).
With ls -l | grep -E 'mint{2,}' I expected to find all the rows above, and again I found nothing (should be all the rows with at least 2 occurrences of mint).
Am I missing something on how {} works?

Firstly, a "quantifier" in a regular expression refers to the "token" immediately before it, which by default is a single character. So mint{2} is looking for the character t twice - it is equivalent to m{1}i{1}n{1}t{2}, or mintt.
To search for a sequence of characters a number of times, you need to group that sequence, using parentheses. So (mint){2} would search for the sequence mint twice in a row, as in mintmint.
Secondly, in your input, there are additional characters in between the occurrences of mint; the regular expression needs to specify that those are allowed.
The simplest way to do that is using the pattern .*, which means "anything, zero or more times". That gives you (mint.*){2} which will match "mint followed by anything, twice".
Finally, given the input "mint mint", the pattern (mint.*){1} will match - it doesn't care that some of the "extra" characters also spell "mint", it just knows that the required parts are there. In fact, {1} is always redundant, and (mint.*){1} matches exactly the same things that just mint matches. In general, regular expressions are good at asserting what is there, and not at asserting what is not there.
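The quantifier behaviour described so far can be checked on a handful of made-up words. This is a small sketch, with test strings invented for the purpose:

```shell
# Made-up test words, one per line.
words='mint
mintt
mintmint
mint mint'

# {2} applies to the t only: matches "mintt".
t_twice=$(printf '%s\n' "$words" | grep -E 'mint{2}')

# Grouping makes {2} apply to the whole word: matches "mintmint".
word_twice=$(printf '%s\n' "$words" | grep -E '(mint){2}')

# Allowing anything in between also accepts "mint mint".
spaced=$(printf '%s\n' "$words" | grep -E '(mint.*){2}')

printf '%s\n---\n%s\n---\n%s\n' "$t_twice" "$word_twice" "$spaced"
```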
Some regular expression flavours have "lookahead assertions" which can process negative assertions like "not followed by mint", but grep -E does not. What it does have is a switch, -v, which inverts the whole command - it shows all lines except the ones matched by the regular expression. A simple approach to say "no more than 1 instance of mint" is therefore to run grep twice - once normally, and once with -v:
# At least once, but not twice -> exactly once
ls -l | grep -E 'mint' | grep -v -E '(mint.*){2}'
# At least twice, but not three times -> exactly twice
ls -l | grep -E '(mint.*){2}' | grep -v -E '(mint.*){3}'
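As a check, here is the "exactly twice" pipeline run against the listing from the question. It is a sketch: the listing is stored in a variable rather than produced by ls -l, and the final awk step, which keeps only the file name, is an addition for readability.

```shell
# The listing from the question, stored in a variable instead of calling ls -l.
listing='-rw-r--r-- 1 mint mint 26 Dec 20 21:11 example.txt
-rw-r--r-- 1 mint mint 26 Dec 20 21:11 another.example
-rw-r--r-- 1 mint mint 19 Dec 20 15:11 something.else
-rw-r--r-- 1 mint mint 1 Dec 20 01:23 filemint
-rw-r--r-- 1 mint mint 26 Dec 20 21:11 mint'

# At least twice, but not three times; awk then keeps only the file name.
exactly_twice=$(printf '%s\n' "$listing" |
  grep -E '(mint.*){2}' |
  grep -v -E '(mint.*){3}' |
  awk '{ print $NF }')

printf '%s\n' "$exactly_twice"
```

The rows for filemint and mint drop out because the file name contributes a third occurrence.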

Replace regex with captured group ONLY

I'm trying to understand why the following does not give me what I think (or want :)) should be returned:
sed -r 's/^(.*?)(Some text)?(.*)$/\2/' list_of_values
or Perl:
perl -lpe 's/^(.*?)(Some text)?(.*)$/$2/' list_of_values
So I want my result to be just the Some text, otherwise (meaning if there was nothing captured in $2) then it should just be EMPTY.
I did notice that with perl it does work if Some text is at the start of the line/string (which baffles me...). (Also noticed that removing ^ and $ has no effect)
Basically, I'm trying to get what grep would return with the --only-matching option as discussed here. Only I want/need to use sub/replace in the regex.
EDITED (added sample data)
Sample input:
$ cat -n list_of_values
1 Black
2 Blue
3 Brown
4 Dial Color
5 Fabric
6 Leather and Some text after that ....
7 Pearl Color
8 Stainless Steel
9 White
10 White Mother-of-Pearl Some text stuff
Desired output:
$ perl -ple '$_ = /(Some text)/ ? $1 : ""' list_of_values | cat -n
1
2
3
4
5
6 Some text
7
8
9
10 Some text

First of all, this shows how to duplicate grep -o using Perl.
You're asking why
foo Some text bar
012345678901234567
results in just an empty string instead of
Some text
Well,
At position 0, ^ matches 0 characters.
At position 0, (.*?) matches 0 characters.
At position 0, (Some text)? matches 0 characters.
At position 0, (.*) matches 17 characters.
At position 17, $ matches 0 characters.
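Match succeeds on this first attempt, so the engine never backtracks to let (.*?) consume more characters and (Some text)? capture anything.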
You could use
s{^ .*? (?: (Some[ ]text) .* | $ )}{ $1 // "" }exs;
or
s{^ .*? (?: (Some[ ]text) .* | $ )}{$1}xs; # Warns if warnings are on.
Far simpler:
$_ = /(Some text)/ ? $1 : "";
I question your use of -p. Are you sure you want a line of output for each line of input? It seems to me you'd rather have
perl -nle'print $1 if /(Some text)/'
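For the sed side of the question: POSIX regular expressions have no lazy quantifiers, so .*? cannot behave in sed the way it does in Perl. One possible workaround, sketched below, is to substitute when the text is present and blank the line otherwise, using sed's t command to skip the second substitution after a successful first one. The sample values are a subset of the list in the question.

```shell
values='Black
Leather and Some text after that ....
Pearl Color
White Mother-of-Pearl Some text stuff'

# 1st -e: keep only the captured text if it occurs on the line.
# 2nd -e: t branches to the end of the script when that substitution succeeded.
# 3rd -e: otherwise blank the line.
result=$(printf '%s\n' "$values" |
  sed -E -e 's/.*(Some text).*/\1/' -e t -e 's/.*//')

printf '%s\n' "$result" | cat -n
```

Every input line still produces one output line, which is either Some text or empty.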

Removing Leading 0 and applying Regex to Sed

I have several file names, for ease I've put them in a file as follows:
01.action1.txt
04action2.txt
12.action6.txt
2.action3.txt
020.action9.txt
10action4.txt
15action7.txt
021action10.txt
11.action5.txt
18.action8.txt
As you can see, the formats aren't consistent. What I'm trying to do is extract the leading numbers from these file names: 1, 4, 12, 2, 20, etc.
I have the following regex:
(\.)?action\d{1,}.txt
which successfully matches .action[number].txt, but I also need to match the leading 0 and use the whole thing in a sed substitution that replaces the match with nothing, so that I'm left with only the leading numbers. I'm having trouble matching the leading 0 and applying the whole thing in sed.
Thanks

With GNU sed:
sed -r 's/0*([0-9]*).*/\1/' file
Output:
1
4
12
2
20
10
15
21
11
18
See: The Stack Overflow Regular Expressions FAQ
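A slightly stricter variant, offered as a sketch: anchoring the pattern and requiring at least one digit in the group keeps a 0 when the number itself is zero, where the pattern above would print an empty line. -E is accepted by both GNU and BSD sed. The name 0.action0.txt is a hypothetical edge case, not one of the original files.

```shell
names='01.action1.txt
020.action9.txt
10action4.txt
0.action0.txt'

# ^0* drops leading zeros; [0-9]+ must still capture at least one digit.
numbers=$(printf '%s\n' "$names" | sed -E 's/^0*([0-9]+).*/\1/')

printf '%s\n' "$numbers"
```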

I don't know if the below awk is helpful but it works as well:
awk '{print $1 + 0}' file
1
4
12
2
20
10
15
21
11
18

Retrieving digits from multiple file names using regex

Given files:
aaabbcc.43.311b.file
ddeeff.x51.311b.file
ffg.1.311b.file
hh.ii.jj.x26.311b.file
ll.m.311.311b.file
How would I get the numbers within the file name but not 311b? So I would like to get 43, 51, 1, 26 and 311.

You can do it with grep:
grep -o '[0-9]\+\b' test.text

sed 's#[^0-9]\+\([0-9]\+\).*#\1#' INPUTFILE
This gives the desired output for the example lines. On each input line it finds the first group of digits that follows at least one non-digit character and prints only that group. Note that \+ is a GNU sed extension.
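The same idea can be written with extended regular expressions, which avoids most of the backslashes. A sketch using the five file names from the question as input; as with the command above, a name that begins with a digit would not be handled correctly, because at least one non-digit is required first.

```shell
files='aaabbcc.43.311b.file
ddeeff.x51.311b.file
ffg.1.311b.file
hh.ii.jj.x26.311b.file
ll.m.311.311b.file'

# Skip leading non-digits, capture the first run of digits, drop the rest.
first_numbers=$(printf '%s\n' "$files" | sed -E 's#[^0-9]+([0-9]+).*#\1#')

printf '%s\n' "$first_numbers"
```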

% ls
aaabbcc.43.311b.file ddeeff.x51.311b.file ffg.1.311b.file hh.ii.jj.x26.311b.file ll.m.311.311b.file
% ls|grep -o -P '\d+(?=\.311b\.file)'
43
51
1
26
311
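-P (Perl-compatible regexes) is specific to GNU grep and is not always compiled in. Where it is missing, a capture group in sed can stand in for the lookahead. This is a sketch of that substitution, not the method used above, and it assumes every name ends in .311b.file.

```shell
files='aaabbcc.43.311b.file
ddeeff.x51.311b.file
ffg.1.311b.file
hh.ii.jj.x26.311b.file
ll.m.311.311b.file'

# [^0-9] stops the greedy .* from swallowing the leading digits of the number.
# -n with the p flag prints only lines where the substitution happened.
before_suffix=$(printf '%s\n' "$files" |
  sed -n -E 's/.*[^0-9]([0-9]+)\.311b\.file$/\1/p')

printf '%s\n' "$before_suffix"
```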

RegEx to extract numeric value immediately before a search character /string found

What would be the best regex to extract the number (digits only) immediately before a search string?
ABC Y C S 1 $ 46CC MAN 25/ 31
I need to extract 25 in this case, but the number is not of fixed length. Any help?

'\d+(?=/)'
should work. See this test with grep:
kent$ echo "ABC Y C S 1 $ 46CC MAN 25/ 31 "|grep -Po '\d+(?=/)'
25
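Without PCRE lookahead, a similar result can be had by matching the digits together with the slash and then deleting the slash. A sketch, assuming a grep that supports -o:

```shell
input='ABC Y C S 1 $ 46CC MAN 25/ 31'

# Match "digits immediately followed by /", then strip the trailing slash.
number=$(printf '%s\n' "$input" | grep -oE '[0-9]+/' | tr -d '/')

echo "$number"
```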

Perl regex:
while ($subject =~ m!\d+(?=.*/)!g) {
    print "$&\n";   # $& holds the matched text
}
Output:
1
46
25
So basically it keeps matching as long as a / exists somewhere later in the string.
