Why isn't Mac sed matching what I expect?

echo 'iPhone 12 Pro Max (5EF5105C-7EED-4017-979C-A6185E927B84) (Booted)' | sed -En 's,(\w+-\w+-\w+-\w+-\w+),\1,p'
I'm using extended regex (-E, which is -r in GNU sed) and -n to print only matched/replaced lines. Assuming my regex101 test is correct,
I expect 5EF5105C-7EED-4017-979C-A6185E927B84 in the output, but I get nothing.

If you're just trying to get the serial number out from inside the parens, and you're not actually modifying anything, then use grep:
$ echo 'iPhone 12 Pro Max (5EF5105C-7EED-4017-979C-A6185E927B84) (Booted)' \
| grep -E '\w+-\w+-\w+-\w+-\w+' -o
5EF5105C-7EED-4017-979C-A6185E927B84
-o tells grep "Just output what matched, not the entire line".
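If you do want to stay with sed, note that even where \w is understood, s,(X),\1,p only replaces the match with itself, so the whole line would be printed. A minimal sketch that should behave the same on BSD and GNU sed uses the POSIX class [[:alnum:]] in place of \w and matches the rest of the line too, so that only the captured group survives (the five dash-separated groups are assumed from the example line):

```shell
# Capture the parenthesized UUID (5 dash-separated alphanumeric groups);
# ".*" on both sides consumes the rest of the line, so only \1 is printed.
# In ERE, \( and \) are literal parentheses; ( ) without backslashes group.
echo 'iPhone 12 Pro Max (5EF5105C-7EED-4017-979C-A6185E927B84) (Booted)' \
  | sed -En 's/.*\(([[:alnum:]]+(-[[:alnum:]]+){4})\).*/\1/p'
# prints 5EF5105C-7EED-4017-979C-A6185E927B84
```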

Related

Match only first occurrence of digit

After a few hours of frustrating searching, I can't figure this out.
I'm piping input to grep, and what I want to get is the first occurrence of any digit.
Example:
nmcli --version
nmcli tool, version 1.1.93
Pipe to grep with regex
nmcli --version |grep -o '[[:digit:]]'
Output:
1
1
9
3
What I want:
1
Yes, there is a way to do that with another pipe, but is there a "pure" single-regex way to do it?
With GNU grep:
nmcli --version | grep -Po ' \K[[:digit:]]'
Output:
1
See: Support of \K in regex
Although you want to avoid another process, it seems simplest just to add head to your existing command (quote the bracket expression so the shell can't glob it against file names):
grep -o '[[:digit:]]' | head -n1
echo "nmcli tool, version 1.1.93" |sed "s/[^0-9]//g" |cut -c1
1
echo "nmcli tool, version 1.1.93" |grep -o '[0-9]' |head -1
1
This can be seen as a stream-editing task: reduce that one line to its first digit. A basic regex with a back-reference achieves it:
$ echo "junk 1.2.3.4" | sed -e 's/.* \([0-9]\).*/\1/'
1
Traditionally, grep is best for searching for files and lines that match a pattern. This is why the grep solution requires Perl regex: Perl regex has features that, in combination with -o, let grep step outside its usual role and be used in a way it wasn't really intended for, namely to match X but output only a substring of X. The solution is terse, but not portable to grep implementations that lack PCRE.
Use [0-9] to match ASCII digits, by the way. The purpose of [[:digit:]] is to bring in locale-specific behavior: to be able to match digits other than just the ASCII 0x30 through 0x39.
It's fairly safe to say that nmcli isn't going to print its --version using, say, Devanagari numerals, like १.२.३.४.
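The sed answer above relies on a greedy ".* ", so it actually picks the digit after the last space-digit pair, which happens to be the first digit for this input. A sketch that takes the first digit anywhere on the line skips leading non-digits instead; the second command (with made-up sample input) also stops at the first line that contains a digit, similar to the awk answer's exit:

```shell
# First digit on each line that has one: skip leading non-digits, capture one digit
echo "nmcli tool, version 1.1.93" | sed -n 's/[^0-9]*\([0-9]\).*/\1/p'
# prints 1

# First digit of the whole input: on the first line containing a digit,
# print that digit and quit (the sample lines are made up)
printf 'no digits here\nversion 4.5\nversion 6.7\n' \
  | sed -n '/[0-9]/{s/[^0-9]*\([0-9]\).*/\1/p;q;}'
# prints 4
```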
You could use standard awk instead:
nmcli --version | awk 'match($0, /[[:digit:]]/) {print substr($0, RSTART, RLENGTH); exit}'
For example:
$ seq 11111 33333 | awk 'match($0, /[[:digit:]]/) {print substr($0, RSTART, RLENGTH); exit}'
1

Replace string with another string based on backreference with sed

I'm trying to replace a predefined placeholder %c#, where # can be any number, with another string. The catch is that the other string must be truncated to # characters.
Ideally, this set of commands would work:
FORMAT="%c10"
LAST_COMMIT="5189e42b14797b1e36ffb7fc5657c7eea08f1c0f"
echo $FORMAT | sed "s/%c\([0-9]\+\)/${LAST_COMMIT:0:\1}/g"
but clearly there is a syntax error at the \1. You can replace it with a number to see the output I'm trying to get.
I'm open to using a program other than sed to achieve this, but ideally it should be one that is native to most Linux installations.
Thanks!
This is my idea.
echo ${LAST_COMMIT} | head -c $(echo ${FORMAT} | sed -e 's/%c//')
Get the number with sed, then take that many leading characters with head.
EDIT1
This might be better.
echo ${LAST_COMMIT} | head -c $(echo ${FORMAT} | sed -e 's/%c\([0-9]\+\)/\1/')
EDIT2
I made a script because the one-liner is too hard to understand. Please try this.
$ cat sample.sh
#!/bin/bash
FORMAT="%b-%t-%c10-%c5"
LAST_COMMIT="5189e42b14797b1e36ffb7fc5657c7eea08f1c0f"
## List the numbers that follow each %c
lengths=$(echo "${FORMAT}" | sed -e "s/%[^c]//g" -e "s/-//g" -e "s/%c/ /g")
## Substitute each %cXX with the first XX characters of LAST_COMMIT
for n in ${lengths}
do
  to_str=${LAST_COMMIT:0:${n}}
  FORMAT=$(echo "${FORMAT}" | sed "s/%c${n}/${to_str}/")
done
## Print result
echo "${FORMAT}"
This is the result.
$ ./sample.sh
%b-%t-5189e42b14-5189e
Here is the same thing as a one-line command (same logic, but long and hard to read):
for n in $(echo "${FORMAT}" | sed -e "s/%[^c]//g" -e "s/-//g" -e "s/%c/ /g"); do to_str=${LAST_COMMIT:0:${n}}; FORMAT=$(echo "${FORMAT}" | sed "s/%c${n}/${to_str}/"); done; echo "${FORMAT}"
The value of $LAST_COMMIT gets interpolated before sed runs, so there is no backreference to refer back to yet. There is an /e extension in GNU sed which would support something like this, but I would simply use a slightly more capable tool.
perl -e '$fmt = shift; $fmt =~ s/%c(\d+)/%.$1s/g; printf("$fmt\n", @ARGV)' '%c10' "$LAST_COMMIT"
Of course, if you can let go of your own ad-hoc format string specifier, and switch to a printf-compatible format string altogether, just use the printf shell command straight off.
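With a printf-style format, the %.Ns precision does the truncation directly, with no sed or Perl involved. For example, with the question's LAST_COMMIT:

```shell
LAST_COMMIT="5189e42b14797b1e36ffb7fc5657c7eea08f1c0f"
# %.10s prints at most 10 characters of its string argument
printf '%.10s\n' "$LAST_COMMIT"
# prints 5189e42b14

# Several truncations in one format string, one argument each
printf '%.10s-%.5s\n' "$LAST_COMMIT" "$LAST_COMMIT"
# prints 5189e42b14-5189e
```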
length=$(echo $FORMAT | sed "s/%c\([0-9]\+\)/\1/g")
echo "${LAST_COMMIT:0:$length}"
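That last answer reads a single number, so it only works when FORMAT is exactly %cN. A bash-only sketch that expands every %cN in a longer format string uses the =~ operator and BASH_REMATCH, so the captured number is available to the shell before the substring is taken. The function name expand_commit_format is hypothetical, other placeholders such as %b and %t are left alone, and it assumes the hash itself contains no "%c" followed by a digit (true for hex hashes):

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace each %cN in a format string with the first N
# characters of a commit hash, using only bash built-ins.
expand_commit_format() {
  local fmt=$1 commit=$2 n
  local re='%c([0-9]+)'
  # [[ =~ ]] finds the leftmost %cN; BASH_REMATCH[0] is the whole token,
  # BASH_REMATCH[1] is just the digits
  while [[ $fmt =~ $re ]]; do
    n=$((10#${BASH_REMATCH[1]}))   # force base 10 so "08" is not read as octal
    # Quoted pattern and replacement are taken literally; only the first
    # occurrence of this token is replaced on each pass
    fmt=${fmt/"${BASH_REMATCH[0]}"/"${commit:0:n}"}
  done
  printf '%s\n' "$fmt"
}

expand_commit_format "%b-%t-%c10-%c5" "5189e42b14797b1e36ffb7fc5657c7eea08f1c0f"
# prints %b-%t-5189e42b14-5189e
```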

grep with extended regex over multiple lines

I'm trying to match a pattern across multiple lines. I would like to ensure that the line I'm looking for ends in \r\n and that some specific text comes after it at some point. The two problems I've had are that I often get unmatched-parenthesis errors in groupings, or I get a positive match when there is none. Here are two simple examples.
echo -e -n "ab\r\ncd" | grep -U -c -z -E $'(\r\n)+.*TEST'
grep: Unmatched ( or \(
What exactly is unmatched there? I don't get it.
echo -e -n "ab\r\ncd" | grep -U -c -z -E $'\r\n.*TEST'
1
There is no TEST in the string, so why does this return a count of 1 for matches?
I'm using grep (GNU grep) 2.16 on Ubuntu 14. Thanks
Instead of -E, you can use -P for PCRE support in GNU grep, which lets you use more advanced regex like this:
echo -ne "ab\r\ncd" | grep -UczP '\r\n.*TEST'
0
echo -ne "ab\r\ncd" | grep -UczP '\r\n.*cd'
1
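This works because the pattern is in ordinary single quotes, so grep receives the escapes \r and \n as text and PCRE interprets them. With $'...', the shell turns \r\n into a real carriage return and newline before grep sees the pattern, and grep treats a newline inside a pattern as a separator between two patterns. So (\r\n)+.*TEST arrives as "(" plus a carriage return and ")+.*TEST", hence the unmatched parenthesis, and \r\n.*TEST arrives as a lone carriage return and ".*TEST"; the carriage return matches the input by itself, hence the count of 1.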
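A quick way to see grep's newline-splitting of patterns for yourself (the input line is made up, and bash is needed for the $'...' quoting used in the question):

```shell
#!/usr/bin/env bash
# $'a\nb' reaches grep as two patterns, "a" and "b". The line contains "b"
# but no "a", and it still counts as a match.
printf 'only b here\n' | grep -c $'a\nb'
# prints 1

# Equivalent to giving the two patterns separately
printf 'only b here\n' | grep -c -e a -e b
# prints 1
```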

Pcregrep using matching groups

I want to use a simple bash syntax to grep numbers from a range. E.g. from the phrase
range "7.2-55.0"
I want to save start=7.2 and end=55.0.
Because I know some perl regex (pcre), I tried:
echo 'range "7.2-55.0"' | pcregrep -o '^range \"(\S+)\"'
echo 'range "7.2-55.0"' | pcregrep -o '^range \"([0-9.-]+)\"'
which isn't working: the output is the whole line. So what am I doing wrong? And is it possible to save two matching groups with pcregrep?
While searching the web I found, e.g., pcregrep -o1, but I seem to have a different version of the tool, because only the plain -o option is accepted (GNU bash 3.2).
You can do like this with awk
start=$(echo 'range "7.2-55.0"' | awk -F'["-]' '/range/ {print $2}')
end=$(echo 'range "7.2-55.0"' | awk -F'["-]' '/range/ {print $3}')
echo $start
7.2
echo $end
55.0
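The awk answer runs the pipeline twice. A sketch that fills both variables from a single awk call (same field separator as above) lets read do the splitting; it assumes neither bound is negative, because a minus sign would also be treated as a field separator:

```shell
line='range "7.2-55.0"'
# awk prints "7.2 55.0"; read splits that on whitespace into start and end
read -r start end <<EOF
$(printf '%s\n' "$line" | awk -F'["-]' '/range/ {print $2, $3}')
EOF
echo "$start"   # prints 7.2
echo "$end"     # prints 55.0
```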

grep: group capturing

I have the following string:
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
and I need to get the value of "scheme_version", which is 1234 in this example.
I have tried
grep -Eo "\"scheme_version\":(\w*)"
however it returns
"scheme_version":1234
How can I do it? I know I can add a sed call, but I would prefer to do it with a single grep.
You'll need to use a lookbehind assertion so that the key isn't included in the match:
grep -Po '(?<=scheme_version":)[0-9]+'
This might work for you:
echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' |
sed -n 's/.*"scheme_version":\([^}]*\)}/\1/p'
1234
Sorry it's not grep, so disregard this solution if you like.
Or stick with grep and add:
grep -Eo "\"scheme_version\":(\w*)"| cut -d: -f2
I would recommend that you use jq for the job. jq is a command-line JSON processor.
$ cat tmp
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}
$ cat tmp | jq .scheme_version
1234
As an alternative to the positive lookbehind method suggested by SiegeX, you can reset the match starting point to directly after scheme_version": with the \K escape sequence. E.g.,
$ grep -Po 'scheme_version":\K[0-9]+'
This resets the start of the reported match to the point just after scheme_version":, and tends to perform far better than the positive lookbehind. Comparing the two on regex101 shows that the reset-match-start method takes 37 steps and 1ms, while the positive lookbehind method takes 194 steps and 21ms.
You can compare the performance yourself on regex101 and you can read more about resetting the match starting point in the PCRE documentation.
To avoid relying on grep's PCRE feature, which is available in GNU grep but not in the BSD version, another option is ripgrep, e.g.:
$ rg -o 'scheme_version.?:(\d+)' -r '$1' <file.json
1234
-r (--replace) replaces each match in the output; capture group indices (e.g., $5) and names (e.g., $foo) are supported in the replacement.
Another example, using Python's json.tool module, which can validate and pretty-print JSON:
$ python -mjson.tool file.json | rg -o 'scheme_version[^\d]+(\d+)' -r '$1'
1234
Related: Can grep output only specified groupings that match?
You can do this:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | awk -F ':' '{print $4}' | tr -d '}'
Improving on potong's answer, which only works for "scheme_version", you can use this expression for any key:
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_id":["]*\([^(",})]*\)[",}].*/\1/p'
scheme_version
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_rev":["]*\([^(",})]*\)[",}].*/\1/p'
4-cad1842a7646b4497066e09c3788e724
$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"scheme_version":["]*\([^(",})]*\)[",}].*/\1/p'
1234
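The three commands above differ only in the key name, so they can be wrapped in a small function. This is a sketch under the same assumptions as that answer (a flat, single-line JSON object whose values contain no quotes, commas or braces); json_field is a hypothetical name, the bracket expression is trimmed to quote, comma and brace, and the key is inserted into the regex unescaped. For anything more complex, jq is the safer choice:

```shell
# Hypothetical helper: print the value of KEY from a flat, one-line JSON
# object read on stdin. Not a JSON parser; KEY must not contain regex
# metacharacters or slashes.
json_field() {
  # After shell quoting, sed sees: s/.*"KEY":"*\([^",}]*\)[",}].*/\1/p
  sed -n "s/.*\"$1\":\"*\([^\",}]*\)[\",}].*/\1/p"
}

json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'
printf '%s\n' "$json" | json_field scheme_version   # prints 1234
printf '%s\n' "$json" | json_field _rev             # prints 4-cad1842a7646b4497066e09c3788e724
```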