Sed regular expression to get the data between rx: and [space] - regex

I have this expression, which returns everything after rx:
local RX=`sed -e 's#.*rx:\(\)#\1#' <<< "${LINE}"`
The variable LINE has as content:
4: uart:PL011 rev3 mmio:0xC006D000 irq:26 tx:435 rx:0 RTS|DTR
What I want is to return the rx value, in this case 0.
Right now, it returns everything after "rx:".
How should I do this?

You may extract all digits after rx: using
RX=`sed -e 's#.*rx:\([0-9]*\).*#\1#' <<< "${LINE}"`
I added [0-9]* between \(\) to match 0 or more digits and also a .* pattern at the end of the regex to consume the rest of the line, so that in the output, you could have just the value captured in Group 1.
To match any chars other than whitespace replace [0-9] with [^[:space:]] or [^[:blank:]], or even [^ ].
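As a minimal sketch of the two variants described above, the following applies the digits-only pattern and the [^[:space:]] pattern to the sample line. The variable names rx_digits and mmio are illustrative only, and printf is used in place of the here-string so that the snippet also runs in a plain POSIX shell.

```shell
line='4: uart:PL011 rev3 mmio:0xC006D000 irq:26 tx:435 rx:0 RTS|DTR'

# Digits only: capture the run of digits right after "rx:".
rx_digits=$(printf '%s\n' "$line" | sed -e 's#.*rx:\([0-9]*\).*#\1#')

# Any non-whitespace: needed for values such as 0xC006D000,
# where [0-9]* would stop at the "x".
mmio=$(printf '%s\n' "$line" | sed -e 's#.*mmio:\([^[:space:]]*\).*#\1#')

echo "$rx_digits"
echo "$mmio"
```

Note that if the tag is missing, the s command does not match and sed prints the line unchanged, so a caller may want to check for that case.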

A better approach:
$ awk -v tag='rx' '{for (i=1;i<=NF;i++){ split($i,t,/:/); f[t[1]]=t[2] } print f[tag]}' <<<"$line"
0
$ awk -v tag='mmio' '{for (i=1;i<=NF;i++){ split($i,t,/:/); f[t[1]]=t[2] } print f[tag]}' <<<"$line"
0xC006D000
$ awk -v tag='uart' '{for (i=1;i<=NF;i++){ split($i,t,/:/); f[t[1]]=t[2] } print f[tag]}' <<<"$line"
PL011
With that, you can print a single value or as many values as you like.
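The same idea can be wrapped in a small shell function so the tag is passed as an argument. This is only a sketch: get_field is a hypothetical helper name, and it assumes fields are whitespace-separated key:value pairs as in the sample line.

```shell
# get_field TAG LINE: print the value that follows "TAG:" in LINE (hypothetical helper).
get_field() {
  printf '%s\n' "$2" | awk -v tag="$1" '{
    for (i = 1; i <= NF; i++) {
      # Only fields of the form key:value are recorded.
      if (split($i, t, ":") >= 2) f[t[1]] = t[2]
    }
    print f[tag]
  }'
}

line='4: uart:PL011 rev3 mmio:0xC006D000 irq:26 tx:435 rx:0 RTS|DTR'
irq=$(get_field irq "$line")
tx=$(get_field tx "$line")
echo "irq=$irq tx=$tx"
```

A tag that does not occur in the line yields an empty string rather than an error.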

You can change to this:
local RX=`sed -e 's#.*rx:\([^ ]*\).*#\1#' <<< "${LINE}"`
But in this case, if you can use GNU grep then it's better:
local RX=`grep -oP 'rx:\K[^ ]*' <<< "${LINE}"`
Both capture whatever follows rx: up to the next space.
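If spawning sed or grep is not desirable, the same "after rx: and before the space" extraction can be sketched with plain parameter expansion. This is an alternative not mentioned in the answers above; the names rest and rx are illustrative.

```shell
line='4: uart:PL011 rev3 mmio:0xC006D000 irq:26 tx:435 rx:0 RTS|DTR'

rest=${line#*rx:}   # drop everything up to and including the first "rx:"
rx=${rest%% *}      # drop everything from the first space onward
echo "$rx"
```

This assumes the line really contains rx:. If it does not, the first expansion leaves the line unchanged and the result is simply its first word.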

You could also try the following.
awk '
match($0,/rx[^ ]*/){
val=substr($0,RSTART,RLENGTH)
sub(/.*:/,"",val)
print val
val=""
}
' Input_file

Related

Grep value between strings with regex

$ acpi
Battery 0: Charging, 18%, 01:37:09 until charged
How to grep the battery level value without percentage character (18)?
This should do it but I'm getting an empty result:
acpi | grep -e '(?<=, )(.*)(?=%)'
Your regex is correct, but it will only work with the experimental -P (Perl-compatible regex) option of GNU grep. You will also need -o to show only the matching text.
Correct command would be:
grep -oP '(?<=, )\d+(?=%)'
However, if you don't have gnu grep then you can also use sed like this:
sed -nE 's/.*, ([0-9]+)%.*/\1/p' file
18
You could also try the following, written and tested at https://ideone.com/nzSGKs
your_command | awk 'match($0,/Charging, [0-9]+%/){print substr($0,RSTART+10,RLENGTH-11)}'
Explanation: a detailed breakdown of the command above, for explanation purposes only.
your_command | ##Running OP command and passing its output to awk as standard input here.
awk ' ##Starting awk program from here.
match($0,/Charging, [0-9]+%/){ ##Using match function to match regex Charging, [0-9]+% in line here.
print substr($0,RSTART+10,RLENGTH-11) ##Printing the substring that starts at the 11th character of the match (skipping the 10 characters of "Charging, ") with length RLENGTH-11 (dropping that prefix and the trailing %).
}'
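To make the offsets concrete, here is a small sketch that prints RSTART and RLENGTH next to the extracted value for the sample acpi line. The variables s and out are illustrative; the awk program itself is the one from the answer.

```shell
s='Battery 0: Charging, 18%, 01:37:09 until charged'

out=$(printf '%s\n' "$s" | awk 'match($0, /Charging, [0-9]+%/) {
  # RSTART is where the match begins, RLENGTH is how long it is.
  print RSTART, RLENGTH, substr($0, RSTART + 10, RLENGTH - 11)
}')
echo "$out"
```

For this line the match "Charging, 18%" starts at character 12 and is 13 characters long, so the command prints 12 13 18.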
Using awk:
awk -F"," '{print $2+0}'
Using GNU sed:
sed -rn 's/.*\, *([0-9]+)\%\,.*/\1/p'
You can use sed:
$ acpi | sed -nE 's/.*Charging, ([[:digit:]]*)%.*/\1/p'
18
Or, if Charging is not always in the string, you can look for the ,:
$ acpi | sed -nE 's/[^,]*, ([[:digit:]]*)%.*/\1/p'
Using bash:
s='Battery 0: Charging, 18%, 01:37:09 until charged'
res="${s#*, }"
res="${res%%%*}"
echo "$res"
Result: 18.
res="${s#*, }" removes the text from the beginning up to and including the first comma+space, and "${res%%%*}" removes everything from the first occurrence of % to the end of the string.
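The difference between the %% and % operators only shows when the string contains more than one % sign. A short sketch with a made-up string (not acpi output) illustrates it; the % in the pattern is escaped so it cannot be confused with the operator.

```shell
res='18%, 50% done'

longest=${res%%\%*}   # %% removes the longest suffix matching "%*"
shortest=${res%\%*}   # a single % removes the shortest suffix matching "%*"

echo "$longest"
echo "$shortest"
```

Here longest is 18 and shortest is 18%, 50, which is why the answer uses %% to cut at the first percent sign.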

Extract Number from Constant Output in Bash

I have a script that produces this kind of output stream in an infinite loop:
m 17:24:34|ethminer Speed 377.61 Mh/s gpu/0 29.01 gpu/1 29.91 gpu/2 30.21 gpu/3 28.71 gpu/4 28.11 gpu/5 27.96 gpu/6 28.71 gpu/7 29.01 gpu/8 28.48 gpu/9 28.86 gpu/10 29.91 gpu/11 29.08 gpu/12 29.68 [A1484+5:R0+0:F0] Time: 04:19
I want to extract the integer after "Speed", which is 377 in this case. Here is what I have so far, assuming the string is stored in a variable named string:
$string | grep -oP '(?<=Speed).*'
I got
377.61 Mh/s gpu/0 29.01 gpu/1 29.91 gpu/2 30.21 gpu/3 28.71 gpu/4 28.11 gpu/5 27.96 gpu/6 28.71 gpu/7 29.01 gpu/8 28.48 gpu/9 28.86 gpu/10 29.91
I want to get rid of the trailing string by executing:
$string | grep -oP '(?<=Speed).*' | grep -o -E '[1-9][0-9][0-9]*'
but that regular expression is wrong, it doesn't come out with anything. How can I fix this?
regards
You may use
grep -Po 'Speed\s*\K\d+'
Or, to also get the fractional part if it is necessary
grep -Po 'Speed\s*\K\d+(\.\d+)?'
Details
Speed - a literal substring
\s* - 0+ whitespaces
\K - a match reset operator (discarding all text matched so far from the match value)
\d+ - 1+ digits
(\.\d+)? - an optional sequence of a . and 1+ digits
If the output is always like that (i.e. no extra lines in between), a simple cut -d' ' -f6 will do the job.
awk 'match($0,"Speed [0-9]+.?[0-9]*"){print substr($0,RSTART+6,RLENGTH-6)}'
sed '/Speed/s/.*Speed \([^ ]*\).*/\1/'
and if each line is always the same way formatted, you can do:
awk '{print $6}' file
This assumes that every line always has the word Speed in column 5 and that you want to print column 6.
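When the column position cannot be relied on, a sketch along these lines searches for the Speed field and prints the integer part of the field after it. The sample line is shortened for readability, and the variable names are illustrative.

```shell
line='m 17:24:34|ethminer Speed 377.61 Mh/s gpu/0 29.01 gpu/1 29.91'

speed=$(printf '%s\n' "$line" | awk '{
  for (i = 1; i < NF; i++)
    if ($i == "Speed") { print int($(i + 1)); exit }
}')
echo "$speed"
```

int() truncates 377.61 to 377. The exit stops after the first matching line, so it should be dropped when every line of a continuous stream has to be processed.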
You could try the following, assuming that your Input_file looks like the samples shown.
awk '{sub(/.*Speed /,"");sub(/ .*/,"")} 1' Input_file
In case you want to save output into Input_file itself then try following.
awk '{sub(/.*Speed /,"");sub(/ .*/,"")} 1' Input_file > temp_file && mv temp_file Input_file
Explanation: Adding explanation too here.
awk ' ##awk script starts from here.
{
sub(/.*Speed /,"") ##Using sub to replace everything from the start of the line up to and including "Speed " with an empty string in the current line.
sub(/ .*/,"") ##Using sub to replace everything from the first remaining space to the end of the current line with an empty string.
}
1 ##Mentioning 1 will print edited/non-edited lines in Input_file.
' Input_file ##Mentioning Input_file name here.
sed works too.
$: echo $string | sed -En '/ Speed /{ s/.* Speed ([0-9]+).*/\1/; p; }'
377

Find all text between $...$ delimiters using bash script

I have a text file, and I'm trying to get an array of the strings contained between $..$ delimiters (LaTeX formulas) using a bash script. My current code doesn't work; the result is empty:
#!/bin/bash
array=($(grep -o '\$([^\$]*)\$' test.txt))
echo ${array[@]}
I tested this regex here, it finds the matches. I use the following test string:
b5f1e7$bfc2439c621353$d1ce0$629f$b8b5
Expected result is
bfc2439c621353 629f
But echo returns empty. Although if I use '[0-9]\+' it works:
5 1 7 2439 621353 1 0 629 8 5
What am I doing wrong?
How about:
grep -o '\$[^$]*\$' test.txt | tr -d '$'
This is basically performing your original grep (but without the brackets, which were causing it to not match), then removing the first/last characters from each match.
You may use awk with input field separator as $:
s='b5f1e7$bfc2439c621353$d1ce0$629f$b8b5'
awk -F '$' '{for (i=2; i<=NF; i+=2) print $i}' <<< "$s"
Note that this awk command doesn't validate input. If you want awk to allow for only valid inputs then you may use this gnu awk command with FPAT:
awk -v FPAT='\\$[^$]*\\$' '{for (i=1; i<=NF; i++) {gsub(/\$/, "", $i); print $i}}' <<< "$s"
bfc2439c621353
629f
What about this?
grep -Eo '\$[^$]+\$' a.txt | sed 's/\$//g'
I'm using sed to replace the $.
Try escaping your braces:
tst> grep -o '\$\([^\$]*\)\$' test.txt
$bfc2439c621353$
$629f$
of course, you then have to strip out the $ signs (-o prints the entire match). You can try sed instead:
tst> sed 's/[^\$]*\$\([^\$]*\)\$[^\$]*/\1\n/g' test.txt
bfc2439c621353
629f
Why is your expected output given b5f1e7$bfc2439c621353$d1ce0$629f$b8b5 the two elements bfc2439c621353 629f rather than the three elements bfc2439c621353 d1ce0 629f?
Here's a single grep command to extract those:
$ grep -Po '\$\K[^\$]*(?=\$)' <<<'b5f1e7$bfc2439c621353$d1ce0$629f$b8b5'
bfc2439c621353
d1ce0
629f
(This requires GNU grep as compiled with libpcre for -P)
This uses \$\K (similar in effect here to (?<=\$)) to look behind at the first $ and (?=\$) to look ahead to the next $. Since the lookahead does not consume the closing $, that $ remains available to open the next match, and therefore d1ce0 is found as well.
Here's a single POSIX sed command to extract those:
$ sed 's/^[^$]*\$//; s/\$[^$]*$//; s/\$/\n/g' \
<<<'b5f1e7$bfc2439c621353$d1ce0$629f$b8b5'
bfc2439c621353
d1ce0
629f
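Apart from the \n in the replacement text, this does not use any GNU notation. POSIX leaves \n in an s replacement unspecified, so some non-GNU seds (BSD sed on OS X, for example) may print a literal n instead of a newline; there, write the newline as a backslash followed by an actual line break. The command removes the leading and trailing portions that aren't wanted, then replaces each $ with a newline.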
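The two readings of the expected output discussed above (two elements versus three) can be put side by side with a small awk sketch. It splits on $ using the bracket expression [$] so the character is taken literally; the variable names paired and interior are illustrative.

```shell
s='b5f1e7$bfc2439c621353$d1ce0$629f$b8b5'

# Treat the $ signs as open/close pairs: every second field.
paired=$(printf '%s\n' "$s" | awk -F '[$]' '{
  out = ""; sep = ""
  for (i = 2; i <= NF; i += 2) { out = out sep $i; sep = " " }
  print out
}')

# Treat every $ as both a closer and an opener: all interior fields.
interior=$(printf '%s\n' "$s" | awk -F '[$]' '{
  out = ""; sep = ""
  for (i = 2; i < NF; i++) { out = out sep $i; sep = " " }
  print out
}')

echo "$paired"
echo "$interior"
```

The first form matches the output the question asks for, and the second matches what the lookaround and sed commands print.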
Using bash regex:
var="b5f1e7\$bfc2439c621353\$d1ce0\$629f\$b8b5" # string to var
while [[ $var =~ ([^$]*\$)([^$]*)\$(.*) ]] # matching
do
echo -n "${BASH_REMATCH[2]} " # 2nd element has the match
var="${BASH_REMATCH[3]}" # 3rd is the rest of the string
done
echo # trailing newline
bfc2439c621353 629f

How to print matched regex pattern using awk?

Using awk, I need to find a word in a file that matches a regex pattern.
I only want to print the word matched with the pattern.
So if in the line, I have:
xxx yyy zzz
And pattern:
/yyy/
I want to only get:
yyy
EDIT:
thanks to kurumi i managed to write something like this:
awk '{
for(i=1; i<=NF; i++) {
tmp=match($i, /[0-9]..?.?[^A-Za-z0-9]/)
if(tmp) {
print $i
}
}
}' $1
and this is what i needed :) thanks a lot!
This is the very basic
awk '/pattern/{ print $0 }' file
This asks awk to search for pattern using //, then print the matching line, which awk calls a record and denotes by $0. At least read up on the documentation.
If you only want to print the matched word:
awk '{for(i=1;i<=NF;i++){ if($i=="yyy"){print $i} } }' file
It sounds like you are trying to emulate GNU's grep -o behaviour. This will do that providing you only want the first match on each line:
awk 'match($0, /regex/) {
print substr($0, RSTART, RLENGTH)
}
' file
Here's an example, using GNU's awk implementation (gawk):
awk 'match($0, /a.t/) {
print substr($0, RSTART, RLENGTH)
}
' /usr/share/dict/words | head
act
act
act
act
aft
ant
apt
art
art
art
Read about match, substr, RSTART and RLENGTH in the awk manual.
After that you may wish to extend this to deal with multiple matches on the same line.
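One way to extend this to several matches per line, sketched here with only POSIX awk features, is to keep calling match() on the part of the line that follows the previous match. The sample input and the pattern /[a-z]at/ are made up for illustration.

```shell
matches=$(printf '%s\n' 'cat hat bat' | awk '{
  rest = $0
  # Each iteration prints one match, then continues after it.
  while (match(rest, /[a-z]at/)) {
    print substr(rest, RSTART, RLENGTH)
    rest = substr(rest, RSTART + RLENGTH)
  }
}')
echo "$matches"
```

The loop relies on the pattern never matching the empty string. A pattern that can match nothing (such as /a*/) would need an extra guard to avoid looping forever.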
gawk can get the matching part of every line using this as action:
{ if (match($0,/your regexp/,m)) print m[0] }
match(string, regexp [, array])
If array is present, it is cleared,
and then the zeroth element of array is set to the entire portion of
string matched by regexp. If regexp contains parentheses, the
integer-indexed elements of array are set to contain the portion of
string matching the corresponding parenthesized subexpression.
http://www.gnu.org/software/gawk/manual/gawk.html#String-Functions
If Perl is an option, you can try this:
perl -lne 'print $1 if /(regex)/' file
To implement case-insensitive matching, add the i modifier
perl -lne 'print $1 if /(regex)/i' file
To print everything AFTER the match:
perl -lne 'if ($found){print} else{if (/regex(.*)/){print $1; $found++}}' textfile
To print the match and everything after the match:
perl -lne 'if ($found){print} else{if (/(regex.*)/){print $1; $found++}}' textfile
If you are only interested in the last line of input and you expect to find only one match (for example a part of the summary line of a shell command), you can also try this very compact code, adopted from How to print regexp matches using `awk`?:
$ echo "xxx yyy zzz" | awk '{match($0,"yyy",a)}END{print a[0]}'
yyy
Or the more complex version with a partial result:
$ echo "xxx=a yyy=b zzz=c" | awk '{match($0,"yyy=([^ ]+)",a)}END{print a[1]}'
b
Warning: the awk match() function with three arguments only exists in gawk, not in mawk
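For awk implementations without the three-argument match(), the partial result can still be obtained with the two-argument form plus substr(). A minimal sketch, assuming the key is the literal yyy= (4 characters):

```shell
val=$(printf '%s\n' 'xxx=a yyy=b zzz=c' |
  awk 'match($0, /yyy=[^ ]+/) {
    # Skip the 4 characters of the literal prefix yyy= and keep the rest.
    print substr($0, RSTART + 4, RLENGTH - 4)
  }')
echo "$val"
```

Unlike the END-based one-liner, this prints once per matching line, so it is not limited to the last line of input.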
Here is another nice solution using a lookbehind regex in grep instead of awk. This solution places fewer requirements on your installation:
$ echo "xxx=a yyy=b zzz=c" | grep -Po '(?<=yyy=)[^ ]+'
b
Off topic: this can also be done with grep. Posting it here in case anyone is looking for a grep solution.
echo 'xxx yyy zzze ' | grep -oE 'yyy'
Using sed can also be elegant in this situation. Example (replace line with matched group "yyy" from line):
$ cat testfile
xxx yyy zzz
yyy xxx zzz
$ cat testfile | sed -r 's#^.*(yyy).*$#\1#g'
yyy
yyy
Relevant manual page: https://www.gnu.org/software/sed/manual/sed.html#Back_002dreferences-and-Subexpressions
If you know what column the text/pattern you're looking for (e.g. "yyy") is in, you can just check that specific column to see if it matches, and print it.
For example, given a file with the following contents, (called asdf.txt)
xxx yyy zzz
to only print the second column if it matches the pattern "yyy", you could do something like this:
awk '$2 ~ /yyy/ {print $2}' asdf.txt
Note that this will also match basically any line where the second column has a "yyy" in it, like these:
xxx yyyz zzz
xxx zyyyz
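If those partial matches are not wanted, anchoring the pattern (or comparing with ==) restricts the test to the whole column. A short sketch using the three example lines from above:

```shell
out=$(printf '%s\n' 'xxx yyy zzz' 'xxx yyyz zzz' 'xxx zyyyz' |
  awk '$2 ~ /^yyy$/ { print $2 }')
echo "$out"
```

Only the first line passes the anchored test, so a single yyy is printed.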
echo "abc123def" | awk '
function MATCH(haystack, needle, ltrim, rtrim)
{
if(ltrim == 0 && !length(ltrim))
ltrim = 0;
if(rtrim == 0 && !length(rtrim))
rtrim = 0;
return substr(haystack, match(haystack, needle) + ltrim, RLENGTH - ltrim - rtrim);
}
{
print $0 " - " MATCH($0, "123"); # 123
print $0 " - " MATCH($0, "[0-9]*d", 0, 1); # 123
print $0 " - " MATCH($0, "1234"); # Nothing printed
}'

AWK: Access captured group from line pattern

If I have an awk command
pattern { ... }
and pattern uses a capturing group, how can I access the string so captured in the block?
With gawk, you can use the match function to capture parenthesized groups.
gawk 'match($0, pattern, ary) {print ary[1]}'
example:
echo "abcdef" | gawk 'match($0, /b(.*)e/, a) {print a[1]}'
outputs cd.
Note the specific use of gawk which implements the feature in question.
For a portable alternative you can achieve similar results with match() and substr.
example:
echo "abcdef" | awk 'match($0, /b[^e]*/) {print substr($0, RSTART+1, RLENGTH-1)}'
outputs cd.
That was a stroll down memory lane...
I replaced awk by perl a long time ago.
Apparently the AWK regular expression engine does not capture its groups.
you might consider using something like :
perl -n -e'/test(\d+)/ && print $1'
the -n flag causes perl to loop over every line like awk does.
This is something I need all the time so I created a bash function for it. It's based on glenn jackman's answer.
Definition
Add this to your .bash_profile etc.
function regex { gawk 'match($0,/'$1'/, ary) {print ary['${2:-'0'}']}'; }
Usage
Capture regex for each line in file
$ cat filename | regex '.*'
Capture 1st regex capture group for each line in file
$ cat filename | regex '(.*)' 1
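A variation on this idea, sketched here for awks without the gawk array extension, passes the pattern through -v instead of splicing it into the program text. first_match is a hypothetical name, and this version can only print the whole match, not a numbered group.

```shell
# first_match REGEX: print the first match of REGEX on each input line (hypothetical helper).
first_match() {
  awk -v re="$1" 'match($0, re) { print substr($0, RSTART, RLENGTH) }'
}

out=$(printf '%s\n' 'request took 35ms' 'no timing here' | first_match '[0-9]+ms')
echo "$out"
```

Passing the pattern with -v avoids quoting problems when it contains characters that are special to awk program text, but awk processes backslash escapes in -v values, so backslashes in the pattern have to be doubled.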
You can use GNU awk:
$ cat hta
RewriteCond %{HTTP_HOST} !^www\.mysite\.net$
RewriteRule (.*) http://www.mysite.net/$1 [R=301,L]
$ gawk 'match($0, /.*(http.*?)\$/, m) { print m[1]; }' < hta
http://www.mysite.net/
NOTE: gensub is a gawk extension and is not POSIX-compliant, so the approach below still requires gawk.
You can simulate capturing without the array argument of match() too. It's not intuitive, though:
step 1. Use gensub to surround each match with a character that doesn't appear in your string.
step 2. Use split on that character.
step 3. Every other element of the resulting array is a captured match.
$ echo 'ab cb ad' | awk '{ split(gensub(/a./,SUBSEP"&"SUBSEP,"g",$0),cap,SUBSEP); print cap[2]"|" cap[4] ; }'
ab|ad
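Since gensub is specific to gawk, the same three steps can be sketched with the POSIX gsub function, which modifies a copy of the line in place instead of returning a new string. The variable names line and caps are illustrative.

```shell
caps=$(printf '%s\n' 'ab cb ad' | awk '{
  line = $0
  # Step 1: wrap every match in SUBSEP; & stands for the matched text.
  gsub(/a./, SUBSEP "&" SUBSEP, line)
  # Step 2: split on SUBSEP. Step 3: even-numbered elements are the matches.
  split(line, cap, SUBSEP)
  print cap[2] "|" cap[4]
}')
echo "$caps"
```

As with the gensub version, this assumes the SUBSEP character does not occur in the input.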
I struggled a bit with coming up with a bash function that wraps Peter Tillemans' answer but here's what I came up with:
function regex
{
perl -n -e "/$1/ && printf \"%s\n\", "'$1'
}
I found this worked better than opsb's awk-based bash function for the following regular expression argument, because I do not want the "ms" to be printed.
'([0-9]*)ms$'
I think gawk's match()-to-array only captures the first instance of the capture group.
If there are multiple things you'd like to capture and then perform complex operations on, perhaps try:
gawk 'BEGIN { S = SUBSEP
} {
nx=split(gensub(/(..(..)..(..))/,
"\\1"(S)"\\2"(S)"\\3", "g", $0),
arr, S)
for(x=1; x<=nx; x++) {
print arr[x] # placeholder: perform your own operations on arr[x] here
} }'
This way you aren't constrained by either gensub(), which limits the complexity of your modifications, or by match().
by pure trial-and-error, one caveat i've noted about gawk in unicode mode : for a valid unicode string 뀇꿬 with the 6 octal codes listed below :
Scenario 1 : matching an individual byte works, but match() reports the multi-byte RSTART of 1 instead of a byte-level answer of 2. It also gives no indication of whether \207 is the 1st continuation byte or the second one, since RLENGTH will always be 1 here.
$ gawk 'BEGIN{ print match("\353\200\207\352\277\254", "\207") }'
1
Scenario 2 : Match also works against unicode-invalid patterns like this
$ gawk 'BEGIN{ match("\353\200\207\352\277\254", "\207\352");
print RSTART, RLENGTH }'
1 2
Scenario 3 : you can check for existence of a pattern against a unicode-illegal string (\300 \xC0 is UTF8-invalid for all possible byte pairings)
$ gawk 'BEGIN{ print ("\300\353\200\207\352\277\254" ~ /\200/) }'
1
Scenarios 4/5/6 : the error message shows up for either (a) match() with a unicode-invalid string, or (b) index() when either argument is unicode-invalid/incomplete.
$ gawk 'BEGIN{ match("\300\353\200\207\352\277\254", "\207\352"); print RSTART, RLENGTH }'
gawk: cmd. line:1: warning: Invalid multibyte data detected. There may be a mismatch between your data and your locale.
2 2
$ gawk 'BEGIN{ print index("\353\200\207\352\277\254", "\352") }'
gawk: cmd. line:1: warning: Invalid multibyte data detected. There may be a mismatch between your data and your locale.
0
$ gawk 'BEGIN{ print index("\353\200\207\352\277\254", "\200") }'
gawk: cmd. line:1: warning: Invalid multibyte data detected. There may be a mismatch between your data and your locale.
0