Catch specific string using regex

I have multiple boards. Inside my bash script, I want to catch my root filesystem name using regex. When I do a cat /proc/cmdline, I have this:
BOOT_IMAGE=/vmlinuz-5.15.0-57-generic root=/dev/mapper/vgubuntu-root ro quiet splash vt.handoff=7
I just want to select /dev/mapper/vgubuntu-root
So far I have managed to catch root=/dev/mapper/vgubuntu-root using this command
\broot=[^ ]+

You can use your regex in sed with a capture group:
sed -E 's~.* root=([^ ]+).*~\1~' /proc/cmdline
/dev/mapper/vgubuntu-root
Another option is to use awk (this should work in any awk):
awk 'match($0, /root=[^ ]+/) {
print substr($0, RSTART+5, RLENGTH-5)
}' /proc/cmdline
# if your string is always 2nd field then a simpler one
awk '{sub(/^[^=]+=/, "", $2); print $2}' /proc/cmdline

1st solution: With your shown samples, please try the following GNU awk code.
awk -v RS='[[:space:]]+root=[^[:space:]]+' '
RT && split(RT,arr,"="){
print arr[2]
}
' Input_file
2nd solution: With GNU grep, use the -oP options: -P enables PCRE and -o prints only the match. In the regex ^.*?[[:space:]]root=\K\S+, \K discards everything matched up to and including root=, so only the value that follows is printed.
grep -oP '^.*?[[:space:]]root=\K\S+' Input_file
3rd solution: If your Input_file always looks like the shown samples, try this simple awk, which uses a space or root= as the field separator.
awk -F' |root=' '{print $3}' Input_file

If the second field has the value, using awk you can split and check for root
awk '
{
n=split($2,a,"=")
if (n==2 && a[1]=="root"){
print a[2]
}
}
' file
Output
/dev/mapper/vgubuntu-root
Or using GNU-awk with a capture group
awk 'match($0, /(^|\s)root=(\S+)/, a) {print a[2]}' file

Since you are using Linux, you can use a GNU grep:
grep -oP '\broot=\K\S+'
where -o prints only the matched text, and -P sets the regex engine to PCRE. Details:
\b - word boundary
root= - a fixed string
\K - match reset operator discarding the text matched so far
\S+ - one or more non-whitespace chars.
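If GNU grep with -P is not available on a board, the same "whole word starting with root=, then everything up to the next whitespace" idea can be sketched in plain POSIX shell. This is only a sketch: the function name root_from_cmdline is made up for illustration, and the sample string stands in for the contents of /proc/cmdline.

```shell
# Hypothetical helper (not from the answers above): print the value of the
# root= parameter found in the kernel command line passed as $1.
root_from_cmdline() {
    set -f                          # no globbing while we word-split $1
    for word in $1; do              # split on whitespace, like \S+ boundaries
        case $word in
            root=*)                 # only words that START with root=
                printf '%s\n' "${word#root=}"
                set +f
                return 0
                ;;
        esac
    done
    set +f
    return 1                        # no root= parameter found
}

cmdline='BOOT_IMAGE=/vmlinuz-5.15.0-57-generic root=/dev/mapper/vgubuntu-root ro quiet splash vt.handoff=7'
root_from_cmdline "$cmdline"
```

In a real script the call would be something like root_from_cmdline "$(cat /proc/cmdline)". Because the pattern is anchored at the start of each word, a parameter such as nfsroot= is not mistaken for root=.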

another awk solution, using good ole' FS / OFS :
-- no PCRE, capture groups, match(), g/sub(), or substr() needed
echo 'BOOT_IMAGE=/vmlinuz-5.15.0-57-generic root=/dev/mapper/vgubuntu-root ro quiet splash vt.handoff=7' |
mawk NF=NF FS='^[^=]+=[^=]+=| [^/]+$' OFS=
/dev/mapper/vgubuntu-root
if you're very very certain the structure has root=, then :
gawk NF=NF FS='^.+root=| .+$' OFS=
/dev/mapper/vgubuntu-root
if you like doing it the RS way instead :
nawk '$!NF = $NF' FS== RS=' [^/]+\n'
/dev/mapper/vgubuntu-root

Related

sed - get only text without extension

How do I remove the extension in this SED statement?
Through
sed 's/.* - //'
File content
2021-04-21_#fluffyban_6953588770591509765.mp4 - Filename.mp4
Actual
Filename.mp4
Desired
Filename

With your shown samples only. This could be done with simple codes in awk,sed and perl as follows.
1st solution: Using sed, perform simple substitutions and you will get desired output.
sed 's/.*- //;s/\.mp4$//' Input_file
2nd solution: Using awk is even simpler: set a custom field separator and print the second-to-last field.
awk -F'- |[.]mp4' '{print $(NF-1)}' Input_file
3rd solution: Using substitution method in awk to get the required value as per OP's requirement.
awk '{gsub(/.*- |\.mp4$/,"")} 1' Input_file
4th solution: With perl one liner we could grab the appropriate needed value by setting field separators as dash spaces and .mp4 as follows:
perl -a -F'-\s+|\.mp4' -ne 'print "$F[$#F-1]\n";' Input_file

The Bash way (which works in most similar shells such as zsh, sh, ksh) is:
fn="2021-04-21_#fluffyban_6953588770591509765.mp4 - Filename.mp4"
base=${fn%.*}
ext=${fn#$base.}
echo "$base"
echo "$ext"
Prints:
2021-04-21_#fluffyban_6953588770591509765.mp4 - Filename
mp4

You can use
#!/bin/bash
s='2021-04-21_#fluffyban_6953588770591509765.mp4 - Filename.mp4'
sed -n 's/.* - \([^.]*\).*/\1/p' <<< "$s"
# => Filename
Details:
-n - suppress default line output
s/ - substitute found pattern
.* - \([^.]*\).* - any text, space, -, space, then any zero or more chars other than a dot captured into Group 1, and then any text
/\1/ - replace found matches with Group 1 value
p - print the result of the substitution.
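The same two steps (drop everything through the last " - ", then keep the text before the first dot) can also be sketched with shell parameter expansion instead of sed. The function name title_without_ext is hypothetical and only used for illustration.

```shell
# Hypothetical helper: mirror the sed pattern '.* - \([^.]*\).*' with
# parameter expansion only.
title_without_ext() {
    name=${1##* - }     # longest prefix ending in " - " is removed (greedy .* - )
    name=${name%%.*}    # everything from the first dot onward is removed ([^.]*)
    printf '%s\n' "$name"
}

title_without_ext '2021-04-21_#fluffyban_6953588770591509765.mp4 - Filename.mp4'
# prints: Filename
```

Like the sed version, this stops at the first dot after the separator, so a title such as "My.File.mp4" would be shortened to "My".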

Using gnu awk you can also use a capture group to get the filename
awk 'match($0, /.* - ([^.]+)\.mp4$/, a) {print a[1]}' file
Regex explanation
.* - Match the last occurrence of -
( Capture group 1 (Referred to by a[1] in the awk example)
[^.]+ Match 1+ times any char except a dot
) Close group 1
\.mp4$ Match .mp4 at the end of the string
Awk explanation
awk '
match($0, /.* - ([^.]+)\.mp4$/, a) { # Test if the line using $0 matches the pattern
print a[1] # Print the value of group 1
}
' file

Yet another awk:
awk '{sub(/\.[^.]+$/, ""); print $NF}' file
Filename

gawk/mawk/mawk2 'BEGIN { FS = "( \- |[.][^. ]+$)"
} NF > 2 { print $(NF-1) }'
no substr(), index(), match(), or sub() needed. If you're VERY certain " - " can only occur once, then
awk 'BEGIN { FS = "(^.* \- |[.][^. ]+$)"; OFS = "" } --NF'

Grep value between strings with regex

$ acpi
Battery 0: Charging, 18%, 01:37:09 until charged
How to grep the battery level value without percentage character (18)?
This should do it but I'm getting an empty result:
acpi | grep -e '(?<=, )(.*)(?=%)'

Your regex is correct, but it only works with the experimental -P (Perl-compatible regex) option of GNU grep. You also need -o to show only the matching text.
Correct command would be:
grep -oP '(?<=, )\d+(?=%)'
However, if you don't have gnu grep then you can also use sed like this:
sed -nE 's/.*, ([0-9]+)%.*/\1/p' file
18

Please try the following, written and tested at https://ideone.com/nzSGKs
your_command | awk 'match($0,/Charging, [0-9]+%/){print substr($0,RSTART+10,RLENGTH-11)}'
Explanation of the above command:
your_command | ##Run the OP's command and pass its output to awk as standard input.
awk ' ##Start the awk program.
match($0,/Charging, [0-9]+%/){ ##Use match() to find the regex Charging, [0-9]+% in the current line.
print substr($0,RSTART+10,RLENGTH-11) ##Print the substring that starts 10 characters into the match (skipping "Charging, ") and is RLENGTH-11 characters long (so the trailing % is dropped too).
}'

Using awk:
awk -F"," '{print $2+0}'
Using GNU sed:
sed -rn 's/.*\, *([0-9]+)\%\,.*/\1/p'

You can use sed:
$ acpi | sed -nE 's/.*Charging, ([[:digit:]]*)%.*/\1/p'
18
Or, if Charging is not always in the string, you can look for the ,:
$ acpi | sed -nE 's/[^,]*, ([[:digit:]]*)%.*/\1/p'

Using bash:
s='Battery 0: Charging, 18%, 01:37:09 until charged'
res="${s#*, }"
res="${res%%%*}"
echo "$res"
Result: 18.
res="${s#*, }" removes text from the beginning through the first comma+space, and "${res%%%*}" removes everything from the first occurrence of % to the end of the string.
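The two expansions can be wrapped in a small function that also checks that what is left is really a number, which may help when the acpi output has an unexpected shape. This is a sketch; battery_percent is a made-up name, and the sample strings stand in for real acpi output.

```shell
# Hypothetical helper: extract the battery percentage from one line of
# acpi-style output passed as $1; fail if no plain number is found.
battery_percent() {
    pct=${1#*, }        # drop everything through the first ", "
    pct=${pct%%\%*}     # drop everything from the first % onward
    case $pct in
        ''|*[!0-9]*) return 1 ;;    # empty or contains a non-digit
    esac
    printf '%s\n' "$pct"
}

battery_percent 'Battery 0: Charging, 18%, 01:37:09 until charged'
# prints: 18
```

In a script this would typically be called as battery_percent "$(acpi)", with the return status used to detect lines that do not contain a percentage.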

sed regex cut string after match

I tested a regex on http://regexr.com/ and it works like expected.
How can I run this by using sed?
/^.*?OU=([^,]*)/g
The test string looks like:
mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test,OU=Tese Sites,DC=Test,DC=local;test.local
And the output is:
mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test
So it should cut the string before the second OU= starts.
Thanks

sed is not the best tool for this case when you have to deal with text that contains "columns" and can be split. Here are two possibilities, one with sed and the other with awk:
s="mario.test;Mario Test;Mario;Test;123;+001122334455,CN=Mario Test,OU=AT-Linz,OU=Tese Sites,DC=Test,DC=local;test.local"
echo $s | sed 's/OU=/й/' | sed 's/\([^й]*\)й\([^,]*\).*/\1OU=\2/'
echo $s | awk -F",OU=" '{print $1 ",OU=" $2}'
The awk solution splits with ,OU= substring and then joins the first and second column with the separator (since it is hardcoded, it is easy to put it back).
sed uses 2 passes: 1) add a non-used char (must be a control char, here, a Cyrillic letter is used for better "visibility") to mark the border of our match, 2) match all we do not need and match and capture what we need to keep with the help of capturing groups and backreferences.
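For comparison, the "keep everything up to the first OU= value" logic can be sketched without a marker character by using shell parameter expansion. The function name cut_after_first_ou is hypothetical, and the input is the test string from the question.

```shell
# Hypothetical helper: keep the text up to and including the first OU=value,
# dropping everything from the following comma onward.
cut_after_first_ou() {
    case $1 in
        *OU=*) ;;                               # has at least one OU=
        *) printf '%s\n' "$1"; return 0 ;;      # nothing to cut
    esac
    before=${1%%OU=*}       # text before the first OU=
    after=${1#*OU=}         # text after the first OU=
    printf '%sOU=%s\n' "$before" "${after%%,*}"
}

cut_after_first_ou 'mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test,OU=Tese Sites,DC=Test,DC=local;test.local'
# prints: mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test
```

This assumes the OU value itself never contains a comma, which is the same assumption the [^,]* in the original regex makes.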

Your question isn't clear but from reading your comments, are either of these what you're looking for?
$ awk -F, '{print $1 FS $2}' file
mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test
$ awk -F'CN=[^,]+,OU=|,' '{print $1 $2}' file
mario.test;Mario Test;Mario;Test;123;+001122334455;AT-Test

extract all values for specific key from space delimited text file

have a text file in the format
1=23 2=44 15=17:31:37.640 5=abc 15=17:31:37.641 4=23 15=17:31:37.643 15=17:31:37.643
I need a regex to extract all the values for key 15 for a multiline text file
output should be
17:31:37.640 17:31:37.641 17:31:37.643 17:31:37.643
Sorry, I should have stated that the values I'm trying to extract are timestamps in the form 17:31:37.643

You can use GNU grep to extract the substrings.
grep -Po '\b15=\K\S+' | tr '\n' ' '
-P option interprets the pattern as a Perl regular expression.
-o option shows only the matching part that matches the pattern.
\K throws away everything that it has matched up to that point.
Output
17:31:37.640 17:31:37.641 17:31:37.643 17:31:37.643

You can use sed:
sed 's/15=\([^ ]*\)/\1/g;s/[0-9]\+=[^ ]\+ //g' input.file
I wrote the following before the OP added the expected output; it works too, but prints each value on its own line.
If you have GNU grep, you can use a lookbehind assertion, available in Perl-compatible regex mode:
grep -oP '(?<=15=)[^ ]*' <<< '1=23 2=44 15=xyz 5=abc 15=yyy 4=23 15=omnet 15=that'
Output:
xyz
yyy
omnet
that

Using awk:
awk -F'=' -v RS=' ' -v ORS=' ' '$1==15 { print $2 }' file
xyz yyy omnet that
Set the Input and Output Record Separator to space and Input Field Separator to =. Test the condition of column1 to be 15. If that is true, print the second column.
As suggested by Ed Morton in the comments, this would leave a trailing blank char or even an absent newline. If that's a concern, you can use the following, which relies on GNU awk for multi-char RS.
gawk -F'=' -v RS='[[:space:]]+' '$1==15{ printf "%s%s", (c++?OFS:""), $2 } END{print ""}' file
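Where GNU awk is not available, the same "split on whitespace, keep the values whose key matches, join with single spaces" logic can be sketched in POSIX shell. The function name values_for_key is an assumption made for this example; it reads the text on standard input and produces no trailing blank.

```shell
# Hypothetical helper: print all values for key $1 found in key=value words
# on stdin, joined by single spaces and followed by one newline.
values_for_key() {
    key=$1
    out=
    # "|| [ -n "$line" ]" also handles a last line without a trailing newline
    while read -r line || [ -n "$line" ]; do
        set -f                          # no globbing while word-splitting
        for word in $line; do
            case $word in
                "$key"=*) out="${out:+$out }${word#*=}" ;;
            esac
        done
        set +f
    done
    printf '%s\n' "$out"
}

printf '%s\n' '1=23 2=44 15=17:31:37.640 5=abc 15=17:31:37.641 4=23 15=17:31:37.643 15=17:31:37.643' |
    values_for_key 15
# prints: 17:31:37.640 17:31:37.641 17:31:37.643 17:31:37.643
```

For a file, the call would be values_for_key 15 < file. The key is matched against the whole text before the = sign, so 5=abc is not picked up when asking for key 15.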

Remove everything after 2nd occurrence in a string in unix

I would like to remove everything after the 2nd occurrence of a particular
pattern in a string. What is the best way to do it in Unix? What is most elegant and simple method to achieve this; sed, awk or just unix commands like cut?
My input would be
After-u-math-how-however
Output should be
After-u
Everything after the 2nd - should be stripped out. The regex should also match
zero occurrences of the pattern, so zero or one occurrence should be ignored and
from the 2nd occurrence everything should be removed.
So if the input is as follows
After
Output should be
After

Something like this would do it.
echo "After-u-math-how-however" | cut -f1,2 -d'-'
This will split up (cut) the string into fields, using a dash (-) as the delimiter. Once the string has been split into fields, cut will print the 1st and 2nd fields.
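A quick way to check the cut approach against the zero-, one- and many-dash cases from the question is sketched below. It relies on the standard cut behaviour that lines containing no delimiter are passed through unchanged unless -s is given. The wrapper name first_two_fields is made up for illustration.

```shell
# Hypothetical wrapper: keep at most the first two dash-separated fields of
# every line read from stdin.
first_two_fields() {
    cut -d'-' -f1,2
}

printf '%s\n' 'After-u-math-how-however' 'After-u' 'After' | first_two_fields
# prints:
# After-u
# After-u
# After
```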

This might work for you (GNU sed):
sed 's/-[^-]*//2g' file

You could use the following regex to select what you want:
^[^-]*-\?[^-]*
For example:
echo "After-u-math-how-however" | grep -o "^[^-]*-\?[^-]*"
Results:
After-u

@EvanPurkisher's cut -f1,2 -d'-' solution is IMHO the best one but since you asked about sed and awk:
With GNU sed for -r
$ echo "After-u-math-how-however" | sed -r 's/([^-]+-[^-]*).*/\1/'
After-u
With GNU awk for gensub():
$ echo "After-u-math-how-however" | awk '{$0=gensub(/([^-]+-[^-]*).*/,"\\1","")}1'
After-u
Can be done with non-GNU sed using \( and *, and with non-GNU awk using match() and substr() if necessary.

awk -F - '{print $1 (NF>1? FS $2 : "")}' <<<'After-u-math-how-however'
Split the line into fields based on field separator - (option spec. -F -) - accessible as special variable FS inside the awk program.
Always print the 1st field (print $1), followed by:
If there's more than 1 field (NF>1), append FS (i.e., -) and the 2nd field ($2)
Otherwise: append "", i.e.: effectively only print the 1st field (which in itself may be empty, if the input is empty).

This can be done in pure bash (which means no fork, no external process). Read into an array split on '-', then slice the array:
$ IFS=-
$ read -ra val <<< After-u-math-how-however
$ echo "${val[*]}"
After-u-math-how-however
$ echo "${val[*]:0:2}"
After-u

awk '$0 = $2 ? $1 FS $2 : $1' FS=-
Result
After-u
After

This will do it in awk:
echo "After" | awk -F "-" '{printf "%s",$1; for (i=2; i<=2 && i<=NF; i++) printf "-%s",$i; print ""}'
