sed: print only matching group - regex

I want to grab the last two numbers (one int, one float; followed by optional whitespace) and print only them.
Example:
foo bar <foo> bla 1 2 3.4
Should print:
2 3.4
So far, I have the following:
sed -n 's/\([0-9][0-9]*[\ \t][0-9.]*[\ \t]*$\)/replacement/p'
will give me
foo bar <foo> bla 1 replacement
However, if I try to replace it with group 1, the whole line is printed.
sed -n 's/\([0-9][0-9]*[\ \t][0-9.]*[\ \t]*$\)/\1/p'
How can I print only the section of the line that matches the regex in the group?

Match the whole line, so add a .* at the beginning of your regex. This causes the entire line to be replaced with the contents of the group
echo "foo bar <foo> bla 1 2 3.4" |
sed -n 's/.*\([0-9][0-9]*[\ \t][0-9.]*[ \t]*$\)/\1/p'
2 3.4

grep is the right tool for extracting.
using your example and your regex:
kent$ echo 'foo bar <foo> bla 1 2 3.4'|grep -o '[0-9][0-9]*[\ \t][0-9.]*[\ \t]*$'
2 3.4

And for yet another option, I'd go with awk!
echo "foo bar <foo> bla 1 2 3.4" | awk '{ print $(NF-1), $NF; }'
This will split the input (I'm using STDIN here, but your input could easily be a file) on spaces, and then print out the last-but-one field, and then the last field. The $NF variables hold the number of fields found after exploding on spaces.
The benefit of this is that it doesn't matter if what precedes the last two fields changes, as long as you only ever want the last two it'll continue to work.

The cut command is designed for this exact situation. It will "cut" on any delimiter and then you can specify which chunks should be output.
For instance:
echo "foo bar <foo> bla 1 2 3.4" | cut -d " " -f 6-7
Will result in output of:
2 3.4
-d sets the delimiter
-f selects the range of 'fields' to output, in this case, it's the 6th through 7th chunks of the original string. You can also specify the range as a list, such as 6,7.

I agree with #kent that this is well suited for grep -o. If you need to extract a group within a pattern, you can do it with a 2nd grep.
# To extract \1 from /xx([0-9]+)yy/
$ echo "aa678bb xx123yy xx4yy aa42 aa9bb" | grep -Eo 'xx[0-9]+yy' | grep -Eo '[0-9]+'
123
4
# To extract \1 from /a([0-9]+)b/
$ echo "aa678bb xx123yy xx4yy aa42 aa9bb" | grep -Eo 'a[0-9]+b' | grep -Eo '[0-9]+'
678
9
I generally cringe when I see 2 calls to grep/sed/awk piped together, but it's not always wrong. While we should exercise our skills of doing things efficiently, "A foolish consistency is the hobgoblin of little minds", and "Real artists ship".

Related

Regex pattern for quoted numbers and commas

I'm trying to find the correct regex to search a file for double quoted numbers separated by a comma. For example I'm trying to find "27,422,734" and then replace it in a text editor to correct the comma to be every 4 numbers so the end result would be "2742,2734"
I've tried a few examples I found on SO but none are helping me with this scenario like
"[^"]+"
'\d+'
while the above do find matches, I don't know how to deal with the commas and how what to replace that with.
Thanks for any help!
I found an even shorter solution (works with gnu-sed):
colonmv () {
echo $# | sed 's/,//g' | sed -r ':a;s/\B[0-9]{4}\>/,&/;ta'
}
But attention, the first sed command eats every comma, not just between digits, so improve it or filter your input before.
The second command uses the :a trick.
Read 4 digits, followed by a non digit (>) replace with the same plus comma, when a replacement took place, jump back from ta to :a and repeat.
Now, let's see colonmv in the wild:
colonmv '"A 3-grouped, pretty long number: 5,127,422,734 and an ungrouped one 5678905567789065778"'
"A 3-grouped pretty long number: 51,2742,2734 and an ungrouped one 567,8905,5677,8906,5778"
There might be better way of doing but I propose the following approach:
INPUT:
$ cat to_transform.txt
abc "27,422,734" def"27,422,734" def
ltu "123,734" abc "345,678,123,734" vtu
xtz "345,678,123,734" vtu "345,678,123,734"
u "1" a
"123"
iu"abc"a "123,734"
CMD:
$ paste -d' ' <(grep -oP '(?<=")(:?\d+,\d+)+(?=")' to_transform.txt) <(grep -oP '(?<=")(:?\d+,\d+)+(?=")' to_transform.txt | sed -e 's/,//g;:loop s/\([0-9]\{4\}\)\($\|,\)/\2,\1/g; s/,,/,/g; /\([0-9]\{5\}\)/b loop') | awk '{cmd="sed -i 0,/"$1"/s/" $1 "/" $2 "/ to_transform.txt"; system(cmd)}'
OUTPUT:
$ cat to_transform.txt
abc "2742,2734" def"2742,2734" def
ltu "12,3734" abc "3456,7812,3734" vtu
xtz "3456,7812,3734" vtu "3456,7812,3734"
u "1" a
"123"
iu"abc"a "12,3734"
CODE DETAILS AND EXPLANATIONS:
<(grep -oP '(?<=")(:?\d+,\d+)+(?=")' to_transform.txt) will extract each number to be processed from the input file, the regex used here use lookbehind/lookahead to enforce the surrounded by quotes condition, (:?\d+,\d+)+ is used to extract the numbers like 27,422,734.
the sed command will getting the output from the grep command will then do the following operations:
SED DETAILS:
s/,//g #remove all , in the number
:loop #create a label to loop
s/\([0-9]\{4\}\)\($\|,\)/\2,\1/g #add a coma after every chain of 4 characters starting by the end of the string/or from the latest coma added
s/,,/,/g #remove duplicate comas added by the previous step if any
/\([0-9]\{5\}\)/b loop #if there are at least 5 digits present successively in the string loop and continue the processing.
Temporary output after the paste operation:
27,422,734 2742,2734
27,422,734 2742,2734
123,734 12,3734
345,678,123,734 3456,7812,3734
345,678,123,734 3456,7812,3734
345,678,123,734 3456,7812,3734
123,734 12,3734
Last but not least the awk command will read this file and run some sed command to replace every element of the first column by the corresponding value in the second command: awk '{cmd="sed -i 0,/"$1"/s/" $1 "/" $2 "/ to_transform.txt"; system(cmd)}'.
Precondition: Your input conforms to "[0-9,]*" and is a "#,###"-format correct number.
#!/bin/bash
colonmv () {
echo $1 | sed -r 's/,([0-9]{3})+/\1/g;' | \
rev | sed -r 's/[^0-9]?([0-9]{4})/\1,/g;s/,"$/"/;s/.*/"&/' | rev
}
colonmv '"734"'
colonmv '"2,734"'
colonmv '"22,734"'
colonmv '"422,734"'
colonmv '"7,422,734"'
colonmv '"27,422,734"'
colonmv '"127,422,734"'
colonmv '"5,127,422,734"'
Test:
colonmv.sh
"734""
"2734"
"2,2734"
"42,2734"
"742,2734"
"2742,2734"
"1,2742,2734"
"51,2742,2734"

non matching groups in grep regex not working

I would like to extract 1, 10, and 100 from:
1 one -args 123
10 ten -args 123
100 one hundred -args 123
However this regex returns 100:
echo -e " 1 one\n 10 ten\n100 one hundred" | grep -Po '^(?=[ ]*)\d+(?=.*)'
100
Not ignoring the preceding spaces returns the numbers (but of course with undesired spaces):
echo -e " 1 one\n 10 ten\n100 one hundred" | grep -Po '^[ ]*\d+(?=.*)'
1
10
100
Have I misunderstood non capturing regex groups in grep / Perl (grep version 2.2, Perl as the -P flag should use its regex) or is this a bug? I notice the release notes for 2.6 says "This release fixes an unexpectedly large number of flaws, from outright bugs (surprisingly many, considering this is "grep")".
If someone with 2.6 could try these examples that would be valuable to determine if this is a bug (in 2.2) or intended behaviour.
The issue is what is considered a 'match' by grep. In the absence of telling grep part of the total match is not what you want, it prints everything up to the end of the match regardless of matching groups.
Given:
$ echo "$txt"
1 one -args 123
10 ten -args 123
100 one hundred -args 123
You can get just the first column of digits without leading spaces several ways.
With GNU grep:
$ echo "$txt" | grep -Po '^[ ]*\K\d+'
1
10
100
Here \K is equivalent to a look behind assertion that resets the match text of the match to be what comes after. The left hand, before the \K, is required to match, but is not included in match text printed by grep.
Demo
awk:
$ echo "$txt" | awk '/^[ ]*[0-9]+/{print $1}'
sed:
$ echo "$txt" | sed 's/^[ ]*\([0-9]*\).*/\1/'
Perl:
$ echo "$txt" | perl -lne 'print $1 if /^[ ]*\K(\d+)/'
And then if you want the matches on a single line, run through xargs:
$ echo "$txt" | grep -Po '^[ ]*\K(\d+)' | xargs
1 10 100
Or, if you are using awk or Perl, just change the way it is printed to not include a carriage return.
You can delete the unwanted spaces this way :
echo -e " 1 one\n 10 ten\n100 one hundred" | grep -Po '^[ ]*(\d+)' | tr -d ' '
As for your question of why it is not working, it is not a bug, it is working as intended, you just misinterpreted how it should work.
If we focus on this ^(?=[ ]*)\d+:
The (?=[ ]*) part is a lookahead assertion. So it means that the regex engine tries to check if the ^ is followed by zero or more spaces. But the assertion itself is not part of the match, so in reality this code means :
- Match a ^ that is followed by 0 or more spaces
- After this ^, match one or more digits
So your code will only match when a digit is the first character of the line. The lookahead won't help you on your use case.
I think the anchor messes with the lookahead, which could be a lookbehind, but they can't be ambiguous (I always run into that one). So the following would work:
echo -e " 1 one\n 10 ten\n100 one hundred" | grep -Po '(?=[ ]*)\d+(?=.*)'
As for a better tool, I would use awk as it is suited to any column driven data. So if you were running it off of ps you could do something like:
ps | awk '/stuff you want to look for here/{print $1}'
awk will take care of all the white space by default

How to match and keep the first number in a line using sed?

Question
Let's say I have one line of text with a number placed somewhere (it could be at the beginning, in the middle or at the end of the line).
How to match and keep the first number found in a line using sed?
Minimal example
Here is my attempt (following this page of a tutorial on regular expressions) and the output for different positions of the number:
$echo "SomeText 123SomeText" | sed 's:.*\([0-9][0-9]*\).*:\1:'
3
$echo "123SomeText" | sed 's:.*\([0-9][0-9]*\).*:\1:'
3
$echo "SomeText 123" | sed 's:.*\([0-9][0-9]*\).*:\1:'
3
As you can only the last digit is kept in the process whereas the desired output should be 123...
Using sed:
echo "SomeText 123SomeText 456" | sed -r 's/^[^0-9]*([0-9]+).*$/\1/'
123
You can also do this in gnu awk:
echo "SomeText 123SomeText 456" | awk '{print gensub(/^[^0-9]*([0-9]+).*$/, "\\1", $0)}'
123
To complement the sed solutions, here's an awk alternative (assuming that the goal is to extract the 1st number on each line, if any (i.e., ignore lines without any numbers)):
awk -F'[^0-9]*' '/[0-9]/ { print ($1 != "" ? $1 : $2) }'
-F'[^0-9]*' defines any sequence of non-digit chars. (including the empty string) as the field separator; awk automatically breaks each input line into fields based on that separator, with $1 representing the first field, $2 the second, and so on.
/[0-9]/ is a pattern (condition) that ensures that output is only produced for lines that contain at least one digit, via its associated action (the {...} block) - in other words: lines containing NO number at all are ignored.
{ print ($1!="" ? $1 : $2) } prints the 1st field, if nonempty, otherwise the 2nd one; rationale: if the line starts with a number, the 1st field will contain the 1st number on the line (because the line starts with a field rather than a separator; otherwise, it is the 2nd field that contains the 1st number (because the line starts with a separator).
You can also use grep, which is ideally suited to this task. sed is a Stream EDitor, which is only going to indirectly give you what you want. With grep, you only have to specify the part of the line you want.
$ cat file.txt
SomeText 123SomeText
123SomeText
SomeText 123
$ grep -o '[0-9]\+' file.txt
123
123
123
grep -o prints only the matching parts of a line, each on a separate line. The pattern is simple: one or more digits.
If your version of grep is compatible with the -P switch, you can use Perl-style regular expressions and make the command even shorter:
$ grep -Po '\d+' file.txt
123
123
123
Again, this matches one or more digits.
Using grep is a lot simpler and has the advantage that if the line doesn't match, nothing is printed:
$ echo "no number" | grep -Po '\d+' # no output
$ echo "yes 123number" | grep -Po '\d+'
123
edit
As pointed out in the comments, one possible problem is that this won't only print the first matching number on the line. If the line contains more than one number, they will all be printed. As far as I'm aware, this can't be done using grep -o.
In that case, I'd go with perl:
perl -lne 'print $1 if /.*?(\d+).*/'
This uses lazy matching (the question mark) so only non-digit characters are consumed by the .* at the start of the pattern. The $1 is a back reference, like \1 in sed. If there are more than one number on the line, this only prints the first. If there aren't any at all, it doesn't print anything:
$ echo "no number" | perl -ne 'print "$1\n" if /.*?(\d+).*/'
$ echo "yes123number456" | perl -lne 'print $1 if /.*?(\d+).*/'
123
If for some reason you still really want to use sed, you can do this:
sed -n 's/^[^0-9]*\([0-9]\{1,\}\).*$/\1/p'
unlike the other answers, this is compatible with all version of sed and will only print lines that contain a match.
Try this sed command,
$echo "SomeText 123SomeText" | sed -r '/[^0-9]*([0-9][0-9]*)[^0-9]*/ s//\1 /g'
123
Another example,
$ echo "SomeText 123SomeText 456" | sed -r '/[^0-9]*([0-9][0-9]*)[^0-9]*/ s//\1 /g'
123 456
It prints all the numbers in a file and the captured numbers are separated by spaces while printing.

Using regex to extract a substring while excluding a certain phrase

Say for the string:
test.1234.mp4
I would like to extract the numbers
1234
without extracting the 4 in mp4
What would the regex be for this?
The numbers aren't always in the second position and can be in different positions and might not always be four digits. I would like to extract the number without extracting the 4 in mp4 essentially.
More examples:
test.abc.1234.mp4
test.456.abc.mp4
test.aaa.bbb.c.111.mp4
test.e666.123.mp4
Essentially only the numbers would be extracted. Hence, for the last example, 666 from e666 would not be extracte and only 123.
To extract I have been using
echo "example.123.mp4" | grep -o "REGEX"
Edit: test456 was meant to be test.456
The accepted answer will fail on "test.e666.123.mp4" (print 666).
This should work
$ cat | perl -ne '/\.(\d+)\./; print "$1\n"'
test.abc.1234.mp4
test.456.abc.mp4
test.aaa.bbb.c.111.mp4
test.e666.123.mp4
1234
456
111
123
Note that this will only print the first group of numbers, if we have test.123.456.mp4 only 123 will be printed.
The idea is to match a dot followed by numbers which we are interested in (parentheses to save the match), followed by another dot. This means that it will fail on 123.mp4.
To fix this you could have:
$ cat | perl -ne '/(^|\.)(\d+)\./; print "$2\n"'
test.abc.1234.mp4
test.456.abc.mp4
test.aaa.bbb.c.111.mp4
test.e666.123.mp4
781.test.mp4
1234
456
111
123
781
First match is either beginning of line (^) or a dot, followed by numbers and a dot. We use $2 here since $1 is either beginning of a line or a dot.
cut can make it:
$ echo "test.1234.mp4" | cut -d. -f2
1234
where
cut -d'.' -f2
delimiter 2nd field
If you provide more examples we can improve the output. With the current code you would extract any something in blablabla.something.blablabla.
Update: from your question update we can do this:
grep -o '\.[0-9]*\.' | sed 's/\.//g'
test:
$ echo "test.abc.1234.mp4
test456.abc.mp4
test.aaa.bbb.c.111.mp4
test.e666.123.mp4" | grep -o '\.[0-9]*\.' | sed 's/\.//g'
1234
111
123
grep -Po "(?<=\.)\d+(?=\.)"
echo "test.1234.mp4" | perl -lpe 's/[^.\d]+\d*//g;s/\D*(\d+).*/$1/'
or:
echo "1321.test.mp4" | perl -lpe 's/.*(?:^|\.)(\d+)\..*/$1/'
p is to print by default so that we don't need explicit print.
e says we have an expression, not a script file
l puts the newline
These will also work if you have a number at the first part of the name.
perl -F'\.' -lane 'print "$F[scalar(#F)-2]" if(/\d+\.mp4$/)' your_file
tested:
> perl -F'\.' -lane 'print "$F[scalar(#F)-2]" if(/\d+\.mp4$/)' temp
1234
111
123
$ cat file
test.abc.1234.mp4
test.456.abc.mp4
test.aaa.bbb.c.111.mp4
test.e666.123.mp4
$ sed 's/.*\.\([0-9][0-9]*\)\..*/\1/' file
1234
456
111
123

How can I output only captured groups with sed?

Is there a way to tell sed to output only captured groups?
For example, given the input:
This is a sample 123 text and some 987 numbers
And pattern:
/([\d]+)/
Could I get only 123 and 987 output in the way formatted by back references?
The key to getting this to work is to tell sed to exclude what you don't want to be output as well as specifying what you do want. This technique depends on knowing how many matches you're looking for. The grep command below works for an unspecified number of matches.
string='This is a sample 123 text and some 987 numbers'
echo "$string" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'
This says:
don't default to printing each line (-n)
exclude zero or more non-digits
include one or more digits
exclude one or more non-digits
include one or more digits
exclude zero or more non-digits
print the substitution (p) (on one line)
In general, in sed you capture groups using parentheses and output what you capture using a back reference:
echo "foobarbaz" | sed 's/^foo\(.*\)baz$/\1/'
will output "bar". If you use -r (-E for OS X) for extended regex, you don't need to escape the parentheses:
echo "foobarbaz" | sed -r 's/^foo(.*)baz$/\1/'
There can be up to 9 capture groups and their back references. The back references are numbered in the order the groups appear, but they can be used in any order and can be repeated:
echo "foobarbaz" | sed -r 's/^foo(.*)b(.)z$/\2 \1 \2/'
outputs "a bar a".
If you have GNU grep:
echo "$string" | grep -Po '\d+'
It may also work in BSD, including OS X:
echo "$string" | grep -Eo '\d+'
These commands will match any number of digit sequences. The output will be on multiple lines.
or variations such as:
echo "$string" | grep -Po '(?<=\D )(\d+)'
The -P option enables Perl Compatible Regular Expressions. See man 3 pcrepattern or man 3 pcresyntax.
Sed has up to nine remembered patterns but you need to use escaped parentheses to remember portions of the regular expression.
See here for examples and more detail
you can use grep
grep -Eow "[0-9]+" file
run(s) of digits
This answer works with any count of digit groups. Example:
$ echo 'Num123that456are7899900contained0018166intext' \
| sed -En 's/[^0-9]*([0-9]{1,})[^0-9]*/\1 /gp'
123 456 7899900 0018166
Expanded answer.
Is there any way to tell sed to output only captured groups?
Yes. replace all text by the capture group:
$ echo 'Number 123 inside text' \
| sed 's/[^0-9]*\([0-9]\{1,\}\)[^0-9]*/\1/'
123
s/[^0-9]* # several non-digits
\([0-9]\{1,\}\) # followed by one or more digits
[^0-9]* # and followed by more non-digits.
/\1/ # gets replaced only by the digits.
Or with extended syntax (less backquotes and allow the use of +):
$ echo 'Number 123 in text' \
| sed -E 's/[^0-9]*([0-9]+)[^0-9]*/\1/'
123
To avoid printing the original text when there is no number, use:
$ echo 'Number xxx in text' \
| sed -En 's/[^0-9]*([0-9]+)[^0-9]*/\1/p'
(-n) Do not print the input by default.
(/p) print only if a replacement was done.
And to match several numbers (and also print them):
$ echo 'N 123 in 456 text' \
| sed -En 's/[^0-9]*([0-9]+)[^0-9]*/\1 /gp'
123 456
That works for any count of digit runs:
$ str='Test Num(s) 123 456 7899900 contained as0018166df in text'
$ echo "$str" \
| sed -En 's/[^0-9]*([0-9]{1,})[^0-9]*/\1 /gp'
123 456 7899900 0018166
Which is very similar to the grep command:
$ str='Test Num(s) 123 456 7899900 contained as0018166df in text'
$ echo "$str" | grep -Po '\d+'
123
456
7899900
0018166
About \d
and pattern: /([\d]+)/
Sed does not recognize the '\d' (shortcut) syntax. The ascii equivalent used above [0-9] is not exactly equivalent. The only alternative solution is to use a character class: '[[:digit:]]`.
The selected answer use such "character classes" to build a solution:
$ str='This is a sample 123 text and some 987 numbers'
$ echo "$str" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'
That solution only works for (exactly) two runs of digits.
Of course, as the answer is being executed inside the shell, we can define a couple of variables to make such answer shorter:
$ str='This is a sample 123 text and some 987 numbers'
$ d=[[:digit:]] D=[^[:digit:]]
$ echo "$str" | sed -rn "s/$D*($d+)$D+($d+)$D*/\1 \2/p"
But, as has been already explained, using a s/…/…/gp command is better:
$ str='This is 75577 a sam33ple 123 text and some 987 numbers'
$ d=[[:digit:]] D=[^[:digit:]]
$ echo "$str" | sed -rn "s/$D*($d+)$D*/\1 /gp"
75577 33 123 987
That will cover both repeated runs of digits and writing a short(er) command.
Give up and use Perl
Since sed does not cut it, let's just throw the towel and use Perl, at least it is LSB while grep GNU extensions are not :-)
Print the entire matching part, no matching groups or lookbehind needed:
cat <<EOS | perl -lane 'print m/\d+/g'
a1 b2
a34 b56
EOS
Output:
12
3456
Single match per line, often structured data fields:
cat <<EOS | perl -lape 's/.*?a(\d+).*/$1/g'
a1 b2
a34 b56
EOS
Output:
1
34
With lookbehind:
cat <<EOS | perl -lane 'print m/(?<=a)(\d+)/'
a1 b2
a34 b56
EOS
Multiple fields:
cat <<EOS | perl -lape 's/.*?a(\d+).*?b(\d+).*/$1 $2/g'
a1 c0 b2 c0
a34 c0 b56 c0
EOS
Output:
1 2
34 56
Multiple matches per line, often unstructured data:
cat <<EOS | perl -lape 's/.*?a(\d+)|.*/$1 /g'
a1 b2
a34 b56 a78 b90
EOS
Output:
1
34 78
With lookbehind:
cat EOS<< | perl -lane 'print m/(?<=a)(\d+)/g'
a1 b2
a34 b56 a78 b90
EOS
Output:
1
3478
I believe the pattern given in the question was by way of example only, and the goal was to match any pattern.
If you have a sed with the GNU extension allowing insertion of a newline in the pattern space, one suggestion is:
> set string = "This is a sample 123 text and some 987 numbers"
>
> set pattern = "[0-9][0-9]*"
> echo $string | sed "s/$pattern/\n&\n/g" | sed -n "/$pattern/p"
123
987
> set pattern = "[a-z][a-z]*"
> echo $string | sed "s/$pattern/\n&\n/g" | sed -n "/$pattern/p"
his
is
a
sample
text
and
some
numbers
These examples are with tcsh (yes, I know its the wrong shell) with CYGWIN. (Edit: For bash, remove set, and the spaces around =.)
Try
sed -n -e "/[0-9]/s/^[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\).*$/\1 \2 \3 \4 \5 \6 \7 \8 \9/p"
I got this under cygwin:
$ (echo "asdf"; \
echo "1234"; \
echo "asdf1234adsf1234asdf"; \
echo "1m2m3m4m5m6m7m8m9m0m1m2m3m4m5m6m7m8m9") | \
sed -n -e "/[0-9]/s/^[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\).*$/\1 \2 \3 \4 \5 \6 \7 \8 \9/p"
1234
1234 1234
1 2 3 4 5 6 7 8 9
$
You need include whole line to print group, which you're doing at the second command but you don't need to group the first wildcard. This will work as well:
echo "/home/me/myfile-99" | sed -r 's/.*myfile-(.*)$/\1/'
It's not what the OP asked for (capturing groups) but you can extract the numbers using:
S='This is a sample 123 text and some 987 numbers'
echo "$S" | sed 's/ /\n/g' | sed -r '/([0-9]+)/ !d'
Gives the following:
123
987
I want to give a simpler example on "output only captured groups with sed"
I have /home/me/myfile-99 and wish to output the serial number of the file: 99
My first try, which didn't work was:
echo "/home/me/myfile-99" | sed -r 's/myfile-(.*)$/\1/'
# output: /home/me/99
To make this work, we need to capture the unwanted portion in capture group as well:
echo "/home/me/myfile-99" | sed -r 's/^(.*)myfile-(.*)$/\2/'
# output: 99
*) Note that sed doesn't have \d
You can use ripgrep, which also seems to be a sed replacement for simple substitutions, like this
rg '(\d+)' -or '$1'
where ripgrep uses -o or --only matching and -r or --replace to output only the first capture group with $1 (quoted to be avoid intepretation as a variable by the shell) two times due to two matches.