Print last match of a sed regex

I have the following:
cat /tmp/cluster_concurrentnodedump.out.20140501.103855 | sed -n '/Starting inject/s/.*[Ii]nject \([0-9]*\).*/\1/p'
Which gives a list of
0
1
2
..
How can I print only the last match with this sed?
Thanks.

Store the substitution results in the hold buffer then print it at the end:
sed -ne '
/Starting inject/ {
# do the substitution
s/.*[Ii]nject \([0-9]*\).*/\1/
# instead of printing, copy the results to the hold buffer
h
}
$ { # at the end of the file:
# copy the hold buffer back to the pattern buffer
x
# print the pattern buffer
p
}
' /tmp/cluster_concurrentnodedump.out.20140501.103855
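To see the hold-buffer approach on a small input, here is a minimal sketch. The sample log lines and the function name last_inject are made up for illustration; only the sed script itself comes from the answer above.

```shell
# Hypothetical helper: print the number from the last "Starting inject" line.
last_inject() {
  sed -n '
    /Starting inject/ {
      s/.*[Ii]nject \([0-9]*\).*/\1/
      h
    }
    $ {
      x
      p
    }
  ' "$@"
}

# Made-up sample input standing in for the real dump file.
printf '%s\n' \
  'Starting inject 0 now' \
  'unrelated line' \
  'Starting inject 1 now' \
  'Starting inject 2 now' \
  'trailing noise' | last_inject
# prints 2
```

If no line matches at all, the hold buffer is still empty at the end, so this prints an empty line.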

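Use tac to print the file in reverse (first line last) and exit after the first match:
tac /tmp/cluster_concurrentnodedump.out.20140501.103855 | sed -n '/Starting inject/{s/.*[Ii]nject \([0-9]*\).*/\1/p;q}'
The important part is the q grouped with the substitution inside braces, so that sed quits only once a line has matched:
sed -n '/.../{s/.../p;q}'
A bare ;q after the substitution would apply to every line, so sed would quit after the first line it reads, whether or not that line matched.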
Example
Print last number:
$ cat a
1
2
3
4
5
6
7
8
9
$ tac a | sed -n 's/\([0-9]\)/\1/p;q'
9
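The example above works because every line of the file matches. A minimal sketch of the case where the last line of the file does not match, with q grouped with the match in braces; the sample lines and the function names sample and last_match are hypothetical, and tac is assumed to be the GNU coreutils tool:

```shell
# Made-up sample: the last line does NOT contain a match.
sample() {
  printf '%s\n' 'Starting inject 0' 'Starting inject 1' 'done'
}

# Reverse the input, then print and quit on the first matching line only.
last_match() {
  tac | sed -n '/Starting inject/{
    s/.*[Ii]nject \([0-9]*\).*/\1/p
    q
  }'
}

sample | last_match   # prints 1
```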

Awk if-statement to count the number of characters (wc -m) coming from a pipe

I have been scratching my head over this issue and can't understand what is wrong with my one-liner below.
Given that
echo "5" | wc -m
2
and that
echo "55" | wc -m
3
I tried to add a zero in front of all numbers below 10 with an awk if-statement as follows:
echo "5" | awk '{ if ( wc -m $0 -eq 2 ) print 0$1 ; else print $1 }'
05
which is "correct"; however, with 2-digit numbers I get the same zero in front.
echo "55" | awk '{ if ( wc -m $0 -eq 2 ) print 0$1 ; else print $1 }'
055
How come? I assumed this would return only 55 instead of 055. I now understand I'm constructing the if-statement wrong.
What, then, is the right way (if one exists) to ask awk to evaluate whether whatever comes from the | has 2 characters, as one would do with wc -m?
I'm not interested in the optimal way to add leading zeros in the command line (there are enough duplicates of that).
Thanks!
I suggest using printf:
printf "%02d\n" "$(echo 55 | wc -m)"
03
printf "%02d\n" "$(echo 123456789 | wc -m)"
10
Note: printf is available as a bash builtin. It mainly follows the conventions of the C function printf(). Check:
help printf # For the bash builtin in particular
man 3 printf # For the C function
Facts:
In AWK strings or variables are concatenated just by placing them side by side.
For example: awk '{b="v" ; print "a" b}'
In AWK undefined variables are equal to an empty string or 0.
For example: awk '{print a "b", -a}'
In AWK non-empty strings are true inside if.
For example: awk '{ if ("a") print 1 }'
wc -m $0 -eq 2 is parsed as follows (binary - has higher precedence than string concatenation):
wc -m $0 -eq 2
( wc - m ) ( $0 - eq ) 2
wc - m : wc and m are undefined variables, each converted to integer 0; the subtraction gives 0 - 0 = 0, converted to string "0"
$0 - eq : $0 is the input line, so string "5" is converted to integer 5; eq is an undefined variable, converted to integer 0; the subtraction gives 5 - 0 = 5, converted to string "5"
2 : integer value 2, converted to string "2"
The three parts are then joined by string concatenation, which results in the string "052".
The result of wc -m $0 -eq 2 is string 052 (see awk '{ print wc -m $0 -eq 2 }' <<<'5'). Because the string is not empty, if is always true.
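The parse can be checked directly by printing the expression instead of testing it. A small sketch; the wrapper name show_expr is hypothetical:

```shell
# Print what awk computes for the expression that was used as the if condition.
show_expr() {
  awk '{ print wc -m $0 -eq 2 }'
}

echo 5  | show_expr   # prints 052
echo 55 | show_expr   # prints 0552
```

Both results are non-empty strings, which is why the if branch is taken for either input.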
It should return only 55 instead of 055
No, it should not.
Am I constructing the if statement wrong?
No, the if statement has valid AWK syntax. Your expectations of how it works do not match how it really works.
To actually make it work (not that you would want to):
echo 5 | awk '
{
cmd = "echo " $1 " | wc -m"
cmd | getline len
close(cmd) # close the pipe so the command runs again for later lines
if (len == 2)
print "0"$1
else
print $1
}'
But why bother, when you can use this instead:
echo 5 | awk 'length($1) == 1 { $1 = "0"$1 } 1'
Or even simpler with the various printf solutions seen in the other answers.
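For the literal question of how to get the wc -m count inside awk: the built-in length() is the closest counterpart, with one difference. wc -m also counts the trailing newline that echo adds, while length($0) does not. A minimal sketch, where the function name pad_if_single is hypothetical:

```shell
pad_if_single() {
  # length($0) + 1 mirrors the count that `echo ... | wc -m` reports,
  # because awk strips the trailing newline from each record.
  awk '{ if (length($0) + 1 == 2) print "0" $0; else print $0 }'
}

echo 5  | pad_if_single   # prints 05
echo 55 | pad_if_single   # prints 55
```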

SED - reverse matched pattern

I have the following task:
match all lines which end with a number and then reverse these numbers
example:
romarana:qwerty12543
asdewfpwk:asdqwe312
asdj:asbd
asdewfpwk:strwtwe129
ooasodo:asbdjahj
should be:
romarana:qwerty34521
asdj:asbd
asdewfpwk:asdqwe213
asdewfpwk:strwtwe921
ooasodo:asbdjahj
What I tried:
sed -r "/[0-9]$/s/[0-9]{1,}$/$(rev <<< &)/" test.txt
NOTE: you can ignore lines that don't end with the number for now.
NOTE: You can use awk,grep or any other tool
With perl
$ perl -pe 's/\d+$/reverse $&/e' ip.txt
romarana:qwerty34521
asdewfpwk:asdqwe213
asdj:asbd
asdewfpwk:strwtwe921
ooasodo:asbdjahj
The e modifier allows Perl code in the replacement section. $& contains the matched portion.
This can also be done with sed alone, by inserting a separator character (let's take the number sign) before the number and then repeatedly moving the line's last digit before the separator:
sed 's/\([0-9]*\)$/#\1/;:b;s/#\([0-9]*\)\([0-9]\)$/\2#\1/;tb;s/#$//'
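To make the loop easier to follow, the same script can be written one command per line and run on sample input. This is only a sketch: the function name reverse_trailing_digits is hypothetical, and it assumes the input lines do not already end in a number sign.

```shell
reverse_trailing_digits() {
  # 1. put # in front of the trailing digits (or at the end if there are none)
  # 2. loop: move the last digit of the line to just before the #
  # 3. remove the # once no digits remain after it
  sed '
s/\([0-9]*\)$/#\1/
:b
s/#\([0-9]*\)\([0-9]\)$/\2#\1/
tb
s/#$//
' "$@"
}

printf '%s\n' 'romarana:qwerty12543' 'asdj:asbd' | reverse_trailing_digits
# romarana:qwerty34521
# asdj:asbd
```

For the first line the intermediate states are qwerty#12543, qwerty3#1254, qwerty34#125, and so on until qwerty34521#.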
You can do this with an awk command, as in the following shell script:
#!/usr/bin/env sh
( echo romarana:qwerty12543
echo asdewfpwk:asdqwe312
echo asdj:asbd
echo asdewfpwk:strwtwe129
echo ooasodo:asbdjahj ) | awk '
/[0-9]+$/ { # Lines ending in digits.
num = txt = $0 # Divide into text and num.
gsub("[0-9]+$", "", txt)
num = substr(num, length(txt)+1)
revnum = "" # Build reversed number bit.
while (num != "") {
revnum = substr(num, 1, 1)""revnum
num = substr(num, 2)
}
print txt""revnum" (from "$0")" # Output text, reversed num.
next
}
{ print } # Not digits at end.
'
It's pretty verbose, and could probably be reduced, but it does the job (you can get rid of the from output, that's just there so you can see it's working):
pax:~> ./testprog.sh
romarana:qwerty34521 (from romarana:qwerty12543)
asdewfpwk:asdqwe213 (from asdewfpwk:asdqwe312)
asdj:asbd
asdewfpwk:strwtwe921 (from asdewfpwk:strwtwe129)
ooasodo:asbdjahj
With GNU awk, you could try the following.
awk '
match($0,/[0-9]+$/,a){
num=split(a[0],arr,"")
for(i=num;i>0;i--){
val=val arr[i]
}
print substr($0,1,RSTART-1) val
val=""
next
}
1
' Input_file
Output will be as follows.
romarana:qwerty34521
asdewfpwk:asdqwe213
asdj:asbd
asdewfpwk:strwtwe921
ooasodo:asbdjahj
With GNU awk for the 3rd arg to match() and null FS splitting $0 into chars:
$ awk -v FS= 'match($0,/(.*[^0-9])([0-9]+)$/,a) {
$0=a[2]; for (i=NF;i>=1;i--) a[1]=a[1] $i; $0=a[1]
} 1' file
romarana:qwerty34521
asdewfpwk:asdqwe213
asdj:asbd
asdewfpwk:strwtwe921
ooasodo:asbdjahj

How can I group unknown (but repeated) words to create an index?

I have to create a shellscript that indexes a book (text file) by taking any words that are encapsulated in angled brackets (<>) and making an index file out of that. I have two questions that hopefully you can help me with!
The first is how to identify the words in the text that are encapsulated within angled brackets.
I found a similar question that asked for words inside square brackets and tried to adapt its code, but I am getting an error.
grep -on \\<.*> index.txt
The original code was the same but with square brackets instead of the angled brackets and now I am receiving an error saying:
line 5: .*: ambiguous redirect
This has been answered
I also now need to take my index and reformat it like so, from:
1:big
3:big
9:big
2:but
4:sun
6:sun
7:sun
8:sun
Into:
big: 1 3 9
but: 2
sun: 4 6 7 8
I know that I can flip the columns with an awk command like:
awk -F':' 'BEGIN{OFS=":";} {print $2,$1;}' index.txt
But am not sure how to group the same words into a single line.
Thanks!
You could try the following (if the sort order matters, append sort to the command).
awk '
BEGIN{
FS=":"
}
{
name[$2]=($2 in name?name[$2] OFS:"")$1
}
END{
for(key in name){
print key": "name[key]
}
}
' Input_file
Explanation: Adding detailed explanation for above code.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
FS=":" ##Setting field separator as : here.
}
{
name[$2]=($2 in name?name[$2] OFS:"")$1 ##Creating array named name with index of $2 and value of $1 which is keep appending to its same index value.
}
END{ ##Starting END block of this code here.
for(key in name){ ##Traversing through name array here.
print key": "name[key] ##Printing key colon and array name value with index key
}
}
' Input_file ##Mentioning Input_file name here.
If you want to extract multiple occurrences of substrings in between angle brackets with GNU grep, you may consider a PCRE regex based solution like
grep -oPn '<\K[^<>]+(?=>)' index.txt
The PCRE engine is enabled with the -P option and the pattern matches:
< - an open angle bracket
\K - a match reset operator that discards all text matched so far
[^<>]+ - 1 or more (due to the + quantifier) occurrences of any char but < and > (see the [^<>] bracket expression)
(?=>) - a positive lookahead that requires (but does not consume) a > char immediately to the right of the current location.
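Where grep -P is not available, a similar result can be sketched without PCRE by matching the brackets as well and stripping them afterwards. This is a swapped-in technique, not the pattern above; it assumes a grep that supports -o (GNU and BSD grep both do), and the function name bracketed_words is hypothetical:

```shell
# List "line:word" for each <word> found in the given file or on stdin.
bracketed_words() {
  grep -on '<[^<>][^<>]*>' "$@" | sed 's/[<>]//g'
}

printf '%s\n' 'Wee, <sleeket>, cowran' 'no markers here' 'need na start <awa>' |
  bracketed_words
# 1:sleeket
# 3:awa
```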
Something like this might be what you need, it outputs the paragraph number, line number within the paragraph, and character position within the line for every occurrence of each target word:
$ cat book.txt
Wee, <sleeket>, cowran, tim’rous beastie,
O, what a panic’s in <thy> breastie!
Thou need na start <awa> sae hasty,
Wi’ bickerin brattle!
I wad be laith to rin an’ chase <thee>
Wi’ murd’ring pattle!
I’m <truly> sorry Man’s dominion
Has broken Nature’s social union,
An’ justifies that ill opinion,
Which makes <thee> startle,
At me, <thy> poor, earth-born companion,
An’ fellow-mortal!
.
$ cat tst.awk
BEGIN { RS=""; FS="\n"; OFS="\t" }
{
for (lineNr=1; lineNr<=NF; lineNr++) {
line = $lineNr
idx = 1
while ( match( substr(line,idx), /<[^<>]+>/ ) ) {
word = substr(line,idx+RSTART,RLENGTH-2)
locs[word] = (word in locs ? locs[word] OFS : "") NR ":" lineNr ":" idx + RSTART
idx += (RSTART + RLENGTH - 1)
}
}
}
END {
for (word in locs) {
print word, locs[word]
}
}
.
$ awk -f tst.awk book.txt | sort
awa 1:3:21
sleeket 1:1:7
thee 1:5:34 2:4:24
thy 1:2:23 2:5:9
truly 2:1:6
Sample input courtesy of Rabbie Burns
GNU datamash is a handy tool for working on groups of columnar data (Plus some sed to massage its output into the right format):
$ grep -oPn '<\K[^<>]+(?=>)' index.txt | datamash -st: -g2 collapse 1 | sed 's/:/: /; s/,/ /g'
big: 1 3 9
but: 2
sun: 4 6 7 8
To transform
index.txt
1:big
3:big
9:big
2:but
4:sun
6:sun
7:sun
8:sun
into:
big: 1 3 9
but: 2
sun: 4 6 7 8
you can try this AWK program:
awk -F: '{ if (entries[$2]) {entries[$2] = entries[$2] " " $1} else {entries[$2] = $2 ": " $1} }
END { for (entry in entries) print entries[entry] }' index.txt | sort
Shorter version of the same suggested by RavinderSingh13:
awk -F: '
{ entries[$2] = ($2 in entries ? entries[$2] " " $1 : $2 ": " $1) }
END { for (entry in entries) print entries[entry] }' index.txt | sort

What is the regular expression for a total 10 digit number with a decimal precision of 1 or 2?

I am trying to write a regex that satisfies the following for a number of at most 10 digits in total.
Tried this so far :
^(\d){0,8}(\.){0,1}(\d){0,2}$
It works fine but fails if I give the following :
123456789.0
Valid example:
1234567890 (total 10 digits)
1234567.1 (total 8 digits)
12345678.10 (total 10 digits)
123456789.1 (total 10 digits)
Invalid example :
12345678901 (11 characters)
Here is a way to go:
^(?:\d{1,10}|(?=\d+\.\d\d?$)[\d.]{3,11})$
Explanation:
^ : beginning of string
(?: : start non capture group
\d{1,10} : 1 up to 10 digits
| : OR
(?= : start look ahead
\d+\.\d\d?$ : 1 or more digits then a dot then 1 or 2 digits
) : end lookahead
[\d.]{3,11} : only digits or dots are allowed, with a length from 3 up to 11
) : end group
$ : end of string
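The lookahead only caps the combined length, so the same rule can be spelled out without it: up to 10 digits, or up to 9 digits plus 1 decimal, or up to 8 digits plus 2 decimals. Here is a sketch of that lookahead-free variant as a POSIX extended regex for grep -E. The rewritten pattern and the function name is_valid_number are my own derivation, not part of the answer above, and [0-9] only covers ASCII digits:

```shell
is_valid_number() {
  # Succeeds (exit status 0) when the argument is an accepted number.
  printf '%s\n' "$1" |
    grep -Eq '^([0-9]{1,10}|[0-9]{1,9}\.[0-9]|[0-9]{1,8}\.[0-9]{2})$'
}

for n in 1 1.2 1234567890 12345678.10 123456789.1 12345678901 1.2.3; do
  if is_valid_number "$n"; then echo "OK: $n"; else echo "KO: $n"; fi
done
```

The loop reports OK for the first five values and KO for 12345678901 and 1.2.3.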
In action:
#!/usr/bin/perl
use Modern::Perl;
my $re = qr~^(?:\d{1,10}|(?=\d+\.\d\d?$)[\d.]{3,11})$~;
while(<DATA>) {
chomp;
say (/$re/ ? "OK: $_" : "KO: $_");
}
__DATA__
1
123
1.2
1234567890
1234567.1
12345678.10
123456789.1
12345678901
1.2.3
Output:
OK: 1
OK: 123
OK: 1.2
OK: 1234567890
OK: 1234567.1
OK: 12345678.10
OK: 123456789.1
KO: 12345678901
KO: 1.2.3
The solution using String.prototype.match() and RegExp.prototype.test() functions:
var isValid = function (num) {
return /^\d+(\.\d+)?$/.test(num) && String(num).match(/\d/g).length <= 10;
};
console.log(isValid(1234567890));
console.log(isValid(12345678.10));
console.log(isValid(12345678901));
console.log(isValid('123d3457'));
You can break your pattern into two steps:
First step
Exactly 8 digits, followed by an optional dot and up to 2 optional decimal digits
\d{8}\.?\d?\d? Here the . and both decimal digits are optional
Second step
Exactly 9 digits, followed by an optional dot and at most 1 optional decimal digit
\d{9}\.?\d? Here the . and the decimal digit are optional
Then you can combine these two rules with the alternation operator |
^(\d{8}\.?\d?\d?|\d{9}\.?\d?)$
At this point the regex only matches numbers with 8 to 10 digits in total, with 1 or 2 decimal places.
It never matches fewer than 8 digits. The trick is to change the first step's \d{8} to \d{1,8}; then it matches from 1 to 9999999999, plus 1 or 2 decimal places.
What you want:
^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$
echo 1 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
1
echo 9999999999 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
9999999999
echo 1.1 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
1.1
echo 1.12 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
1.12
echo 1234567.1 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
1234567.1
echo 1234567.12 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
1234567.12
echo 99999999.9 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
99999999.9
echo 99999999.99 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
99999999.99
No match (these print nothing):
echo 1.111 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
echo 1234567.111 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
echo 123456781.11 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
echo 1234567891.1 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'
echo 123456789101 | perl -lne '/^(\d{1,8}\.?\d?\d?|\d{9}\.?\d?)$/ && print $&'

sed print from match until other match NOT inclusive

I want to print all lines from a match up to a second match, not including that second match.
What I have so far does everything and does too much, in that it prints the second match as well.
Specifically, let's say I want to print everything starting on a line containing 'test', up to, but not including, the first line starting with a number or an open bracket '['.
This goes some way, but not all the way:
sed -n '/test/,/^[0-9]\|^\[/p' file
It is much easier to do this via awk:
awk '/test/{p=1} /^([0-9]|\[)/{p=0} p' file
Using awk:
awk 'p && /^[0-9]|^\[/ { exit }; /test/{ p = 1 } p' file
Example:
$ cat temp.txt
4
1
2
3
4
5
$ awk 'p && /4/ { exit }; /2|1/{ p = 1 } p' temp.txt
1
2
3
Notice how it skipped 4 when /2|1/ wasn't found yet.
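The same command with the patterns from the question, run on a made-up sample, might look like the sketch below. The input lines and the function name print_until are hypothetical:

```shell
print_until() {
  # Once printing has started (p set), stop at the first line that begins
  # with a digit or "[" without printing it.
  awk 'p && /^[0-9]|^\[/ { exit }; /test/ { p = 1 } p' "$@"
}

printf '%s\n' 'before' 'a test line' 'kept' '[stop]' 'after' | print_until
# a test line
# kept
```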
sed -n '/test/,/^[0-9[]/ {
/test/ {
h;b
}
x;p
$ {
x
/^[^0-9[]/ p
}
}' YourFile
This should work, but it is not elegant.