How to retrieve a captured group regex in busybox sed

I'm making a script to boot a TS-7400 ARM SBC and I want it to be able to read some arguments and optional kernel parameters passed via a configuration file found on an SD card. I called my config file syscfg.conf and it is organized using KEYWORD=value pairs, but since kernel arguments themselves can have the same syntax, I thought of delimiting values like this:
CMDLINE_ARGS="elevator=noop scheduler=noop"
While testing in regular bash, I was able to isolate the kernel command line arguments using either one of these methods:
$ grep CMDLINE_ARGS syscfg.conf | sed 's/CMDLINE_ARGS="\(.*\)"/\1/'
elevator=noop scheduler=noop
$ grep CMDLINE_ARGS syscfg.conf | cut -d'"' -f2
elevator=noop scheduler=noop
$ awk -F'"' '/CMDLINE_ARGS/ {print $2}' syscfg.conf
elevator=noop scheduler=noop
but when it runs on TS-LINUX, which is a busybox-based stripped down Linux used to boot a custom kernel or application, it doesn't work like in regular bash. While the awk command doesn't even exist, the cut version worked fine, but the sed one returns this:
CMDLINE_ARGS="elevator=noop scheduler=noop"
Why is busybox's sed implementation behaving like this? Instead of returning the whole string, I expected it to output only the "\1" group of any characters between the " delimiters - the "(.*)" regex. Is there any way we can make it work like in bash?

sed '/CMDLINE_ARGS=/ {s/CMDLINE_ARGS=.//;s/.$//;}' syscfg.conf
Try this in case the quotes or the pipe are the problem (I suspect shell substitution in one of those parts).
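For what it's worth, here is a minimal sketch of a single-sed variant that drops the grep and uses only POSIX basic-regex features (-n plus the p flag). The thread does not establish why the original command failed on TS-LINUX, so treat this as an alternative to try rather than a confirmed fix. The temporary file and the ROOTDEV line are made up for illustration; only the CMDLINE_ARGS line comes from the question.

```shell
# Build a throwaway config file (ROOTDEV is a hypothetical extra entry).
cfg=$(mktemp)
printf '%s\n' \
  'ROOTDEV=/dev/sdcard0' \
  'CMDLINE_ARGS="elevator=noop scheduler=noop"' > "$cfg"

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, so no separate grep is needed.
cmdline_args=$(sed -n 's/^CMDLINE_ARGS="\(.*\)"$/\1/p' "$cfg")
printf '%s\n' "$cmdline_args"

rm -f "$cfg"
```

Note that the sed script is in single quotes, so the shell passes the backslashes and double quotes through untouched.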

Related

How to extract value from shell and regex

I have a string "12G 39G 24% /dev" . I have to extract the value '24'. I have used the below regex
grep '[0-9][0-9]%' -o
But I am getting output as 24%. I want only 24 as output and don't want '%' character. How to modify the regex script to extract only 24 as value?

One option would be to just grep again for the digits:
grep -o '[0-9][0-9]%' | grep -o '[0-9][0-9]'
However, if you want to accomplish this with a single regex, you can use the following:
grep -Po '[0-9]{2}(?=%)'
Note the -P option in this case; vanilla grep doesn't seem to support the (?=%) "look-around" part.

The most common way not to capture something is using look-around assertions:
Use it like this
grep -oP '[0-9][0-9](?=%)'
It's worth noting that GNU grep supports the -P option to enable Perl-compatible regex syntax; however, it is not included with OS X. On Linux, it will be available by default. A workaround would be to use ack instead.
But I'd still recommend using GNU grep on OS X by default. It can be installed on OS X using Homebrew with the command brew install grep
Also, see How to match, but not capture, part of a regex?

You can use sed as an alternative:
sed -rn 's/(^.*)([[:digit:]]{2})(%.*$)/\2/p' <<< "12G 39G 24% /dev"
Enable extended regular expressions with -r or -E and then split the line into 3 sections delimited by parentheses. Substitute the whole line with the second section only and print it.
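A sketch of the same idea in POSIX basic-regex syntax, without -r/-E or a here-string, in case those are not available (for example in a stripped-down shell). Unlike the two-digit pattern above, it accepts any number of digits before the %, but it assumes the number is preceded by a space. The function name pct_of is made up for illustration.

```shell
# Print the digits that sit between a space and a % sign.
pct_of() {
    printf '%s\n' "$1" | sed -n 's/.* \([0-9][0-9]*\)%.*/\1/p'
}

pct=$(pct_of "12G 39G 24% /dev")
printf '%s\n' "$pct"
```

Because of -n and the p flag, a line without a percentage produces no output at all instead of being echoed back unchanged.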

Use awk:
awk '{print $3+0}'
The value you seek is in the third field, and adding a zero coerces the string to a number, so % is removed.

Get text between two patterns with egrep and awk

I'm trying to parse a command's help file to grab all the arguments the command accepts.
Here is some text from the help file:
* --digest:
Set the digest for fingerprinting (defaults to the digest used when
signing the cert). Valid values depends on your openssl and openssl ruby
extension version.
* --debug:
Enable full debugging.
* --help:
Print this help message
* --verbose:
Enable verbosity.
* --version:
Print the puppet version number
I want to just grab --argument and nothing else.
I almost got it with this command, but it's still including the ":" which I want to exclude:
puppet cert --help | egrep '^* --(.*):$' | awk '{print $2}'
--all:
--allow-dns-alt-names:
--digest:
--debug:
--help:
--verbose:
--version:
Why is '^* --(.*):$' including the ":"? Shouldn't it be matching everything between '^* --' and ':$'?

shouldn't it be matching everything between ^* -- and :$ ?
Actually, no. You're capturing a group, but it won't print just the group. I suggest using the -P flag to use Perl regex, and look arounds. In your case, this might be enough:
$ cert --help | grep -Po '^\* \K--\w+'
Note that I also used the -o option, to print only the matched content, not the whole line. This eliminates the usage of awk.
A more complete line based on your initial thoughts and more look arounds:
$ cert --help | grep -Po '^\* \K--.*(?=:)'
Edit: as noted in the comments and the fine answer by mklement0, this requires GNU grep. You can, however, do the same with Perl itself, which is probably already installed on your system.
$ cert --help | perl -nle 'print $1 if /^\* (--\w+)/'
This works like a line of code inside a loop, which is set up automatically by the -nle flags: -n for the input loop, -l for automatic line-ending handling, and -e to supply the line of code.
The line of Perl code prints the first captured group if the line matches the regex. So it combines ideas from your original solution too.
For a complete POSIX compliant answer, check the answer by mklement0 here in this page.

To provide a POSIX-compliant alternative to sidyll's elegant GNU grep answer (which also explains why the OP's approach didn't work):
Update: Avinash Raj points out in a comment that sed is an option, which indeed allows for a POSIX-compliant single-tool solution: sed allows us to match entire lines of interest and replace them with the contents of a capture group (the part of the line of interest):
puppet cert --help | sed -n 's/^\* \(--.*\):$/\1/p'
Note that since sed is used without the - nonstandard - -r / -E option, a basic regular expression must be used, where ( and ) must be \-escaped to act as capture-group delimiters.
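To try this without puppet installed, the same sed command can be fed the help excerpt from the question. A small sketch; the variable names help_excerpt and opts are arbitrary.

```shell
# Excerpt of the help text quoted in the question.
help_excerpt='* --digest:
Set the digest for fingerprinting (defaults to the digest used when
signing the cert).
* --debug:
Enable full debugging.
* --help:
Print this help message'

# Keep only lines of the form "* --name:" and replace each with the
# captured "--name" part.
opts=$(printf '%s\n' "$help_excerpt" | sed -n 's/^\* \(--.*\):$/\1/p')
printf '%s\n' "$opts"
```

The description lines do not match the pattern, so with -n they are simply dropped.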
Original answer:
puppet cert --help | egrep '^\* --.+:$' | awk -F '\\* |:' '{print $2}'
Note:
^* was replaced with ^\* so as to ensure that * is matched as a literal, and (.*) was replaced with .+, because (a) there is nothing to be gained by a capture group here, and (b) it's fair to assume that at least one letter follows the --.
-F '\\* |:' uses either literal *<space> or : as the field separator, which ensures that only the --... token (the second field) is printed.

Get all Commands without arguments from history (with Regex)

I have just started with learning shell commands and how to script in bash.
Now I like to solve the mentioned task in the title.
What I get from history command (without line numbers):
ls [options/arguments] | grep [options/arguments]
find [...] -exec sed [...]
du [...]; awk [...] file
And how my output should look like:
ls
grep
find
sed
du
awk
I already found a solution, but it doesn't really satisfy me. So far I declared three arrays and used the readarray -t < <(...) command twice: once to save the content of my history and once, in combination with compgen -ac, to get all commands which I can possibly run. Then I compared the contents of both with loops, and saved the command every time it matched a line in the "history" array. A lot of effort for a simple exercise, I guess.
Another solution I thought of is to do it with regex pattern matching.
A command usually starts at the beginning of the line, after a pipe, after an -exec, or after a semicolon. And maybe in more places that I just don't know about yet.
So I need a regex which gives me only the next word after it matched one of these conditions. That's the command I've found and it seems to work:
grep -oP '(?<=|\s/)\w+'
Here it uses the pipe | as a condition. But I need to insert the others too. So I have put the pattern in double quotes, created an array with all conditions and tried it as recommend:
grep -oP "(?<=$condition\s/)\w+"
But no matter how I insert the variable, it fails. To keep it short, I couldn't figure out how the command works, especially not the regex part.
So, how can I solve it using regular expressions? Or with a better approach than mine?
Thank you in advance! :-)

This is simple and works quite well
history -w /dev/stdout | cut -f1 -d ' '

You can use this awk with fc command:
awk '{print $1}' <(fc -nl)
find
mkdir
find
touch
tty
printf
find
ls
fc -nl lists entries from history without the line numbers.
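Both answers only print the first word of each history line. A rough sketch of also picking up commands after a pipe or a semicolon, using sample lines instead of the real history so that it can be run non-interactively. It is deliberately naive: it also splits on | or ; inside quoted strings, and it does not handle -exec or &&. The sample lines and variable names are made up for illustration.

```shell
# Stand-in for history output (no line numbers).
sample_history='ls -la /tmp | grep foo
du -sh .; awk -f report.awk file'

# Turn every | and ; into a newline, then print the first word of each
# non-empty piece (NF is zero for blank pieces, so they are skipped).
cmds=$(printf '%s\n' "$sample_history" | tr '|;' '\n\n' | awk 'NF {print $1}')
printf '%s\n' "$cmds"
```

To run it against the real history in an interactive bash, the printf could be replaced by fc -nl or history -w /dev/stdout from the answers above.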

Sed substitution not doing what I want and think it should do

I am trying to use sed to get some info that is encoded within the path of a file which is passed as a parameter to my script (Bourne sh, if it matters).
From this example path, I'd like the result to be 8
PATH=/foo/bar/baz/1-1.8/sing/song
I first got the regex close by using sed as grep:
echo $PATH | sed -n -e "/^.*\/1-1\.\([0-9][0-9]*\).*/p"
This properly recognized the string, so I edited it to make a substitution out of it:
echo $PATH | sed -n -e "s/^.*\/1-1\.\([0-9][0-9]*\).*/\1/"
But this doesn't produce any output. I know I'm just not seeing something simple, but would really appreciate any ideas about what I'm doing wrong or about other ways to debug sed regular expressions.
(edit)
In the example path the components other than the numerical one can contain numbers similar to the numeric path component that I listed, but not quite the same. I'm trying to exactly match the component that is 1-1.<some-number> and see what some-number is.
It is also possible to have an input string that the regular expression should not match and should produce no output.

The -n option to sed suppresses normal output, and since your second command doesn't have a p flag on the substitution, nothing is output. Get rid of the -n or stick a p back on the end.
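A short sketch of the difference, using the path from the question. The variable is called path_arg here (an arbitrary name) to avoid overwriting the shell's own PATH.

```shell
path_arg=/foo/bar/baz/1-1.8/sing/song

# -n without the p flag: the substitution happens, but nothing is printed.
without_p=$(printf '%s\n' "$path_arg" | sed -n 's/^.*\/1-1\.\([0-9][0-9]*\).*/\1/')

# -n with the p flag: only lines where the substitution succeeded are printed.
with_p=$(printf '%s\n' "$path_arg" | sed -n 's/^.*\/1-1\.\([0-9][0-9]*\).*/\1/p')

printf 'without p: [%s]\nwith p:    [%s]\n' "$without_p" "$with_p"
```

Keeping -n together with p also covers the requirement that a non-matching input should produce no output; without -n, sed would echo such a line back unchanged.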

It looks like you're trying to get the 8 from the 1-1.8 (where 8 is any sequence of numerics), yes? If so, I would just use:
echo /foo/bar/baz/1-1.8/sing/song | sed -e "s/.*\/1-1\.//" -e "s/[^0-9].*//"
No doubt you could get it working with one sed "instruction" (-e) but sometimes it's easier just to break it down.
The first strips out everything from the start up to and including 1-1., the second strips from the first non-numeric after that to the end.
$ echo /foo/bar/baz/1-1.8/sing/song | sed -e "s/.*\/1-1\.//" -e "s/[^0-9].*//"
8
$ echo /foo/bar/baz/1-1.752/sing/song | sed -e "s/.*\/1-1\.//" -e "s/[^0-9].*//"
752
And, as an aside, this is actually how I debug sed regular expressions. I put simple ones in independent instructions (or independent part of a pipeline for other filtering commands) so I can see what each does.
Following your edit, this also works:
$ echo /foo/bar/baz/1-1.962/sing/song | sed -e "s/.*\/1-1\.\([0-9][0-9]*\).*/\1/"
962
As to your comment:
In the example path the components other than the numerical one can contain numbers similar to the numeric path component that I listed, but not quite the same. I'm trying to exactly match the component that is 1-1.<some-number> and see what some-number is.
The two-part sed command I gave you should work with numerics anywhere in the string (as long as there's no 1-1. after the one you're interested in). That's because it actually deletes up to the specific 1-1. string and thereafter from the first non-numeric. If you have some examples that don't work as expected, toss them into the question as an update and I'll adjust the answer.

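You can shorten your command by using \+ (one or more) instead of repeating the bracket expression with * (zero or more). Note that \+ is a GNU extension to basic regular expressions, and that with -n the substitution needs the p flag to print anything:
sed -n -e "s/^.*\/1-1\.\([0-9]\+\).*/\1/p"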

Don't use PATH as your variable name. It clashes with the PATH environment variable.
echo $path|sed -e's/.*1-1\.//;s/\/.*//'

You needn't delimit your patterns with / (s/a/b/g); you may choose almost any character, so if you're dealing with paths, # is more useful than /:
echo /foo/1-1.962/sing | sed -e "s#.*/1-1\.\([0-9]\+\).*#\1#"

How do I find broken NMEA log sentences with grep?

My GPS logger occasionally leaves "unfinished" lines at the end of the log files. I think they're only at the end, but I want to check all lines just in case.
A sample complete sentence looks like:
$GPRMC,005727.000,A,3751.9418,S,14502.2569,E,0.00,339.17,210808,,,A*76
The line should start with a $ sign, and end with an * and a two character hex checksum. I don't care if the checksum is correct, just that it's present. It also needs to ignore "ADVER" sentences which don't have the checksum and are at the start of every file.
The following Python code might work:
import re
from path import path
nmea = re.compile(r"^\$.+\*[0-9A-F]{2}$")
for log in path("gpslogs").files("*.log"):
    for line in log.lines():
        if not nmea.match(line) and not "ADVER" in line:
            print "%s\n\t%s\n" % (log, line)
Is there a way to do that with grep or awk or something simple? I haven't really figured out how to get grep to do what I want.
Update: Thanks @Motti and @Paul, I was able to get the following to do almost what I wanted, but had to use single quotes and remove the trailing $ before it would work:
grep -nvE '^\$.*\*[0-9A-F]{2}' *.log | grep -v ADVER | grep -v ADPMB
Two further questions arise: how can I make it ignore blank lines? And can I combine the last two greps?

The minimum of testing shows that this should do it:
grep -Ev "^\$.*\*[0-9A-Fa-f]{2}$" a.txt | grep -v ADVER
-E use extended regexp
-v Show lines that do not match
^ starts with
.* anything
\* an asterisk
[0-9A-Fa-f] hexadecimal digit
{2} exactly two of the previous
$ end of line
| grep -v ADVER weed out the ADVER lines
HTH, Motti.

@Motti's answer doesn't ignore ADVER lines, but you can easily pipe the results of that grep to another:
grep -Ev "^\$.*\*[0-9A-Fa-f]{2}$" a.txt |grep -v ADVER

@Tom (rephrased): I had to remove the trailing $ for it to work
Removing the $ means that the line may end with something else (e.g. the following will be accepted):
$GPRMC,005727.000,A,3751.9418,S,14502.2569,E,0.00,339.17,210808,,,A*76xxx
@Tom: And can I combine the last two greps?
grep -Ev "ADVER|ADPMB"

@Motti: Combining the greps isn't working; it's having no effect.
I understand that without the trailing $ something else may follow the checksum and still match, but it didn't work at all with it, so I had no choice...
GNU grep 2.5.3 and GNU bash 3.2.39(1) if that makes any difference.
And it looks like the log files are using DOS line-breaks (CR+LF). Does grep need a switch to handle that properly?

@Tom
GNU grep 2.5.3 and GNU bash 3.2.39(1) if that makes any difference.
And it looks like the log files are using DOS line-breaks (CR+LF). Does grep need a switch to handle that properly?
I'm using grep (GNU grep) 2.4.2 on Windows (for shame!) and it works for me (and DOS line-breaks are naturally accepted). I don't really have access to other OSs at the moment, so I'm sorry but I won't be able to help you any further :o(
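On the CR+LF question: with DOS line endings every line ends in a carriage return, so on Linux a trailing $ placed right after the checksum cannot match unless the CR is removed first. That would be consistent with the asker having to drop the $, although the thread does not confirm it. Below is a minimal sketch that strips the CRs with tr, keeps the pattern in single quotes so the shell leaves \$ alone, and folds the ADVER, ADPMB and blank-line exclusions into one grep. The log contents (including the ADVER line's format) are made up for illustration, apart from the complete GPRMC sentence from the question.

```shell
# Build a small CR+LF sample log: one ADVER line, one complete sentence,
# one blank line, one truncated sentence.
log=$(mktemp)
printf '%s\r\n' \
  '$ADVER,3080,2.2' \
  '$GPRMC,005727.000,A,3751.9418,S,14502.2569,E,0.00,339.17,210808,,,A*76' \
  '' \
  '$GPRMC,005728.000,A,3751' > "$log"

# Strip carriage returns, then drop every line that is a complete
# sentence, mentions ADVER or ADPMB, or is empty. What remains is broken.
broken=$(tr -d '\r' < "$log" |
  grep -vE '^\$.*\*[0-9A-Fa-f]{2}$|ADVER|ADPMB|^$')
printf '%s\n' "$broken"

rm -f "$log"
```

Because the log is read through a pipe, grep no longer knows the file name; for several files the pipeline would have to be run once per file, for example in a for loop.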
