sed regex cut string after match - regex

I tested a regex on http://regexr.com/ and it works as expected.
How can I run this by using sed?
/^.*?OU=([^,]*)/g
The test string looks like:
mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test,OU=Tese Sites,DC=Test,DC=local;test.local
And the output is:
mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test
So it should cut the string before the second OU= starts.
Thanks

sed is not the best tool for this case when you have to deal with text that contains "columns" and can be split. Here are two possibilities, one with sed and the other with awk:
s="mario.test;Mario Test;Mario;Test;123;+001122334455,CN=Mario Test,OU=AT-Linz,OU=Tese Sites,DC=Test,DC=local;test.local"
echo "$s" | sed 's/OU=/й/' | sed 's/\([^й]*\)й\([^,]*\).*/\1OU=\2/'
echo "$s" | awk -F",OU=" '{print $1 ",OU=" $2}'
See the online demo
The awk solution splits with ,OU= substring and then joins the first and second column with the separator (since it is hardcoded, it is easy to put it back).
sed uses 2 passes: 1) insert a character that does not occur in the input (ideally a control character; here a Cyrillic letter is used for better "visibility") to mark the border of our match, 2) match everything we do not need, and match and capture what we need to keep with the help of capturing groups and backreferences.
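A single-pass variant may also be possible. sed has no lazy .*? quantifier, but a regex match always starts at the leftmost possible position, so matching from the first OU= and discarding everything after its value should give the same result. A minimal sketch, assuming the test string from the question; the function name cut_after_first_ou is made up for illustration:

```shell
# Test string from the question.
line='mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test,OU=Tese Sites,DC=Test,DC=local;test.local'

# Hypothetical helper: keep everything up to and including the first OU= value.
# The match starts at the first (leftmost) OU=, [^,]* captures its value,
# and the trailing .* swallows the rest of the line.
cut_after_first_ou() {
  sed 's/\(OU=[^,]*\).*/\1/'
}

result=$(printf '%s\n' "$line" | cut_after_first_ou)
printf '%s\n' "$result"
```

The text before the first OU= is not part of the match, so sed leaves it untouched.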

Your question isn't clear but from reading your comments, are either of these what you're looking for?
$ awk -F, '{print $1 FS $2}' file
mario.test;Mario Test;Mario;Test;123;+001122334455;CN=Mario Test,OU=AT-Test
$ awk -F'CN=[^,]+,OU=|,' '{print $1 $2}' file
mario.test;Mario Test;Mario;Test;123;+001122334455;AT-Test

Related

Catch specific string using regex

I have multiple boards. Inside my bash script, I want to catch my root filesystem name using regex. When I do a cat /proc/cmdline, I have this:
BOOT_IMAGE=/vmlinuz-5.15.0-57-generic root=/dev/mapper/vgubuntu-root ro quiet splash vt.handoff=7
I just want to select /dev/mapper/vgubuntu-root
So far I have managed to catch root=/dev/mapper/vgubuntu-root using this command
\broot=[^ ]+
You can use your regex in sed with a capture group:
sed -E 's~.* root=([^ ]+).*~\1~' /proc/cmdline
/dev/mapper/vgubuntu-root
Another option is to use awk (this should work in any awk):
awk 'match($0, /root=[^ ]+/) {
print substr($0, RSTART+5, RLENGTH-5)
}' /proc/cmdline
# if your string is always 2nd field then a simpler one
awk '{sub(/^[^=]+=/, "", $2); print $2}' /proc/cmdline
1st solution: With your shown samples, please try the following GNU awk code.
awk -v RS='[[:space:]]+root=[^[:space:]]+' '
RT && split(RT,arr,"="){
print arr[2]
}
' Input_file
2nd solution: With GNU grep, you could try the following. The -oP options enable PCRE and print only the matched part. In the regex ^.*?[[:space:]]root=\K\S+, \K discards everything matched up to and including root=, so only the value that follows is printed.
grep -oP '^.*?[[:space:]]root=\K\S+' Input_file
3rd solution: If your Input_file always looks like the shown samples, try this simple awk, which relies on field separators.
awk -F' |root=' '{print $3}' Input_file
If the second field has the value, using awk you can split and check for root
awk '
{
n=split($2,a,"=")
if (n==2 && a[1]=="root"){
print a[2]
}
}
' file
Output
/dev/mapper/vgubuntu-root
Or using GNU-awk with a capture group
awk 'match($0, /(^|\s)root=(\S+)/, a) {print a[2]}' file
Since you are using Linux, you can use a GNU grep:
grep -oP '\broot=\K\S+'
where o allows match output, and P sets the regex engine to PCRE. See the online demo. Details:
\b - word boundary
root= - a fixed string
\K - match reset operator discarding the text matched so far
\S+ - one or more non-whitespace chars.
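Where grep lacks -P, the same extraction can be sketched with tr and sed only. This is an alternative technique, not the PCRE approach described above: it puts each space-separated token on its own line and prints only the token that starts with root=, with that prefix removed. The variable names cmdline and root_dev are assumptions for illustration:

```shell
# Sample kernel command line from the question.
cmdline='BOOT_IMAGE=/vmlinuz-5.15.0-57-generic root=/dev/mapper/vgubuntu-root ro quiet splash vt.handoff=7'

# One token per line, then strip the root= prefix and print only that line.
root_dev=$(printf '%s\n' "$cmdline" | tr ' ' '\n' | sed -n 's/^root=//p')
printf '%s\n' "$root_dev"
```

Anchoring on ^root= plays the role of the word boundary: a token such as xroot=... would not be picked up.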
another awk solution, using good ole' FS / OFS :
-- no PCRE, capture groups, match(), g/sub(), or substr() needed
echo 'BOOT_IMAGE=/vmlinuz-5.15.0-57-generic root=/dev/mapper/vgubuntu-root ro quiet splash vt.handoff=7' |
mawk NF=NF FS='^[^=]+=[^=]+=| [^/]+$' OFS=
/dev/mapper/vgubuntu-root
if you're very very certain the structure has root=, then :
gawk NF=NF FS='^.+root=| .+$' OFS=
/dev/mapper/vgubuntu-root
if you like doing it the RS way instead :
nawk '$!NF = $NF' FS== RS=' [^/]+\n'
/dev/mapper/vgubuntu-root

Sed version extract

I am trying to extract the version number from a string. I am unable to find the exact regex to find what I need.
For eg -
1012-EPS-Test-OF-Something-1.3
I need sed to only extract 1.3 from the above line.
I have tried quite a few things so far, such as the following, but it is clearly not working:
sed 's/[^0-9.0-9]*//')
With your shown samples, the easiest way could be to pipe the value of the shell variable into awk, set the field separator to -, and print the last field.
echo "$string" | awk -F'-' '{print $NF}'
2nd solution: In case the last field (with - as the field delimiter) can contain something besides the version number, use awk's match function.
echo "$var" |
awk -F'-' 'match($NF,/[0-9]+(\.[0-9]+)*/){print substr($NF,RSTART,RLENGTH)}'
3rd solution: Using GNU grep, try the following. The regex matches everything up to the last -, then \K discards what has been matched so far, so only the version number matched after it is printed.
echo "$var" | grep -oP '.*-\K\d+(\.\d+)*'
This should work in any grep:
s='1012-EPS-Test-OF-Something-1.3'
grep -Eo '[0-9]+(\.[0-9]+)+' <<< "$s"
1.3
This might work for you (GNU sed):
sed -n 's/.*[^0-9.]//p' file
The regexp is greedy and swallows the whole line .* then steps back a character at a time till the first match of [^0-9.], removes the front portion and prints the remainder.
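A small sketch of that behaviour, assuming the sample string from the question plus one made-up input (2.0) that contains no character outside [0-9.]; the function name extract_version is hypothetical:

```shell
# Hypothetical wrapper around the sed command above.
extract_version() {
  sed -n 's/.*[^0-9.]//p'
}

# The greedy .* backs off to the last character that is not a digit or dot
# (the final -), so only the trailing version remains.
v1=$(printf '%s\n' '1012-EPS-Test-OF-Something-1.3' | extract_version)

# No character outside [0-9.] means no substitution, and -n with p prints nothing.
v2=$(printf '%s\n' '2.0' | extract_version)

printf 'v1=[%s] v2=[%s]\n' "$v1" "$v2"
# prints: v1=[1.3] v2=[]
```

Dropping -n and the p flag would instead pass unmatched lines through unchanged, which may or may not be what you want.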
You can use string manipulation to get the last part after -:
s='1012-EPS-Test-OF-Something-1.3'
s="${s##*-}"
See this online demo:
#!/bin/bash
s='1012-EPS-Test-OF-Something-1.3'
s="${s##*-}"
echo "$s"
# => 1.3
See 10.1. Manipulating Strings:
${string##substring}
    Deletes longest match of $substring from front of $string.
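To make the "longest match" wording concrete, here is a short sketch comparing ## with its shortest-match counterpart # and with the suffix form %. The variable names are assumptions for illustration; all three expansions are standard POSIX shell:

```shell
s='1012-EPS-Test-OF-Something-1.3'

longest_front=${s##*-}   # remove the longest prefix matching *-
shortest_front=${s#*-}   # remove the shortest prefix matching *-
shortest_back=${s%-*}    # remove the shortest suffix matching -*

printf '%s\n' "$longest_front" "$shortest_front" "$shortest_back"
# prints:
# 1.3
# EPS-Test-OF-Something-1.3
# 1012-EPS-Test-OF-Something
```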

Finding and replacing the last space at or before nth character works with sed but not awk, what am I doing wrong?

I have a string in a test.csv file like this:
here is my string
when I use sed it works just as I expect:
cat test.csv | sed -r 's/^(.{1,9}) /\1,/g'
here is,my string
Then when I use awk it doesn't work and I'm not sure why:
cat test.csv | awk '{gsub(/^(.{1,9}) /,","); print}'
,my string
I need to use awk because once I get this figured out I will be selecting only one column to split into two columns with the added comma. I'm using extended regex with sed, "-r" and was wondering how or if it's supported with awk, but I don't know if that really is the problem or not.
awk does not support back references in gsub. If you are on GNU awk, then gensub can be used to do what you need.
echo "here is my string" | awk '{print gensub(/^(.{1,9}) /,"\\1,","G")}'
here is,my string
Note the use of double \ inside the quoted replacement part. You can read more about gensub here.
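If gensub is not available, the same replacement can probably be done without any backreference, by locating the last space within the first 10 characters and rebuilding the line with substr. This is a sketch of a different technique, not the gensub approach; the function name split_at_last_space and the limit variable are made up for illustration:

```shell
# Hypothetical helper: replace the last space at or before character 10 with a comma.
split_at_last_space() {
  awk -v limit=10 '{
    head = substr($0, 1, limit)
    pos = 0
    # Scan backwards; stop before position 1 so at least one character precedes the space,
    # mirroring the .{1,9} part of the original regex.
    for (i = length(head); i > 1; i--) {
      if (substr(head, i, 1) == " ") { pos = i; break }
    }
    if (pos) {
      print substr($0, 1, pos - 1) "," substr($0, pos + 1)
    } else {
      print
    }
  }'
}

out1=$(printf '%s\n' 'here is my string' | split_at_last_space)
out2=$(printf '%s\n' 'nospaceshere at all' | split_at_last_space)
printf '%s\n' "$out1" "$out2"
# prints:
# here is,my string
# nospaceshere at all
```

To apply this to a single column, the same logic could be run on $3 (or whichever field) instead of $0.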

Remove everything after 2nd occurrence in a string in unix

I would like to remove everything after the 2nd occurrence of a particular
pattern in a string. What is the best way to do it in Unix? What is most elegant and simple method to achieve this; sed, awk or just unix commands like cut?
My input would be
After-u-math-how-however
Output should be
After-u
Everything after the 2nd - should be stripped out. The regex should also match
zero occurrences of the pattern, so zero or one occurrence should be ignored and
from the 2nd occurrence everything should be removed.
So if the input is as follows
After
Output should be
After
Something like this would do it.
echo "After-u-math-how-however" | cut -f1,2 -d'-'
This will split up (cut) the string into fields, using a dash (-) as the delimiter. Once the string has been split into fields, cut will print the 1st and 2nd fields.
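A quick check of the edge cases from the question. By default cut passes through lines that contain no delimiter at all, so zero or one occurrence of - is left alone. The function name trim_after_second is an assumption for illustration:

```shell
# Hypothetical wrapper around the cut command above.
trim_after_second() {
  cut -d- -f1,2
}

out=$(printf '%s\n' 'After-u-math-how-however' 'After-u' 'After' | trim_after_second)
printf '%s\n' "$out"
# prints:
# After-u
# After-u
# After
```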
This might work for you (GNU sed):
sed 's/-[^-]*//2g' file
You could use the following regex to select what you want:
^[^-]*-\?[^-]*
For example:
echo "After-u-math-how-however" | grep -o "^[^-]*-\?[^-]*"
Results:
After-u
@EvanPurkisher's cut -f1,2 -d'-' solution is IMHO the best one, but since you asked about sed and awk:
With GNU sed for -r
$ echo "After-u-math-how-however" | sed -r 's/([^-]+-[^-]*).*/\1/'
After-u
With GNU awk for gensub():
$ echo "After-u-math-how-however" | awk '{$0=gensub(/([^-]+-[^-]*).*/,"\\1","")}1'
After-u
Can be done with non-GNU sed using \( and *, and with non-GNU awk using match() and substr() if necessary.
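A rough sketch of what those portable variants might look like. Note that the regex here is a slight variation of the one above: it is anchored with ^ and uses [^-]* instead of [^-]+ so that it stays within basic regular expressions. The function names are hypothetical:

```shell
# POSIX sed: basic regex with \( \) grouping, no -r needed.
posix_sed() {
  sed 's/^\([^-]*-[^-]*\).*/\1/'
}

# POSIX awk: match() sets RSTART/RLENGTH, substr() keeps only the matched part.
# Lines without a - do not match and are printed unchanged.
posix_awk() {
  awk 'match($0, /^[^-]*-[^-]*/) { $0 = substr($0, RSTART, RLENGTH) } { print }'
}

sed_out=$(printf '%s\n' 'After-u-math-how-however' 'After' | posix_sed)
awk_out=$(printf '%s\n' 'After-u-math-how-however' 'After' | posix_awk)
printf '%s\n' "$sed_out" "$awk_out"
```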
awk -F - '{print $1 (NF>1? FS $2 : "")}' <<<'After-u-math-how-however'
Split the line into fields based on field separator - (option spec. -F -) - accessible as special variable FS inside the awk program.
Always print the 1st field (print $1), followed by:
If there's more than 1 field (NF>1), append FS (i.e., -) and the 2nd field ($2)
Otherwise: append "", i.e.: effectively only print the 1st field (which in itself may be empty, if the input is empty).
This can be done in pure bash (which means no fork, no external process). Read into an array split on '-', then slice the array:
$ IFS=-
$ read -ra val <<< After-u-math-how-however
$ echo "${val[*]}"
After-u-math-how-however
$ echo "${val[*]:0:2}"
After-u
awk '$0 = $2 ? $1 FS $2 : $1' FS=-
Result
After-u
After
This will do it in awk:
echo "After" | awk -F "-" '{printf "%s",$1; for (i=2; i<=2 && i<=NF; i++) printf "-%s",$i; print ""}'

How to seek forward and replace selected characters with sed

Can I use sed to replace selected characters, for example H => X, 1 => 2, but first seek forward so that characters in the first groups are not replaced?
Sample data:
"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";
How it should be after sed:
"Hello World";"Number 1 is there";"tX2s-Xas,2,XXunKnownData";
What I have tried:
Nothing really, I would try but everything I know about sed expressions seems to be wrong.
Ok, I have tried to capture ([^;]+) and "skip" (get em back using ´\1\2´...) first groups separated by ;, this is working fine but then comes problem, if I use capturing I need to select whole group and if I don't use capturing I'll lose data.
This is possible with sed, but is kinda tedious. To do the translation of field number $FIELD you can use the following:
sed 's/\(\([^;]*;\)\{'$((FIELD-1))'\}\)\([^;]*;\)/\1\n\3\n/;h;s/[^\n]*\n\([^\n]*\).*/\1/;y/H1/X2/;G;s/\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\)/\2\1\4/'
Or, reducing the number of brackets with GNU sed:
sed -r 's/(([^;]*;){'$((FIELD-1))'})([^;]*;)/\1\n\3\n/;h;s/[^\n]*\n([^\n]*).*/\1/;y/H1/X2/;G;s/([^\n]*)\n([^\n]*)\n([^\n]*)\n([^\n]*)/\2\1\4/'
Example:
$ FIELD=3
$ echo '"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";' | sed -r 's/(([^;]*;){'$((FIELD-1))'})([^;]*;)/\1\n\3\n/;h;s/[^\n]*\n([^\n]*).*/\1/;y/H1/X2/;G;s/([^\n]*)\n([^\n]*)\n([^\n]*)\n([^\n]*)/\2\1\4/'
"Hello World";"Number 1 is there";"tX2s-Xas,2,XXunKnownData";
$ FIELD=2
$ echo '"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";' | sed -r 's/(([^;]*;){'$((FIELD-1))'})([^;]*;)/\1\n\3\n/;h;s/[^\n]*\n([^\n]*).*/\1/;y/H1/X2/;G;s/([^\n]*)\n([^\n]*)\n([^\n]*)\n([^\n]*)/\2\1\4/'
"Hello World";"Number 2 is there";"tH1s-Has,1,HHunKnownData";
There may be a simpler way that I didn't think of, though.
If awk is ok for you:
awk -F";" '{gsub("H","X",$3);gsub("1","2",$3);}1' OFS=";" file
Using -F, the file is split with semi-colon as delimiter, and hence now the 3rd field ($3) is of our interest. The gsub function substitutes all occurrences of H with X in the 3rd field, and again 1 with 2.
1 is to print every line.
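The field number does not have to be hardcoded. A minimal sketch that passes it in as an awk variable, assuming the sample line from the question; the function name translate_field and the variable n are made up for illustration:

```shell
# Hypothetical helper: apply H => X and 1 => 2 to field number $1 only.
translate_field() {
  awk -F';' -v OFS=';' -v n="$1" '{ gsub(/H/, "X", $n); gsub(/1/, "2", $n) } 1'
}

line='"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";'

third=$(printf '%s\n' "$line" | translate_field 3)
second=$(printf '%s\n' "$line" | translate_field 2)
printf '%s\n' "$third" "$second"
# prints:
# "Hello World";"Number 1 is there";"tX2s-Xas,2,XXunKnownData";
# "Hello World";"Number 2 is there";"tH1s-Has,1,HHunKnownData";
```

Setting OFS matters here: modifying a field makes awk rebuild the record, and without OFS=";" the fields would be rejoined with spaces.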
[UPDATE]
(I just realized that it could be shorter. Perl has an auto-split mode):
$F[2] =~ s/H/X/g; $F[2] =~ s/1/2/g; $_=join(";",@F)
Perl is not known for being particularly readable, but in this case I suspect the best you can get with sed might not be as clear as with Perl:
echo '"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";' |
perl -F';' -ape '$F[2] =~ s/H/X/g; $F[2] =~ s/1/2/g; $_=join(";",@F)'
Taking apart the Perl code:
# your groups are in @F, accessed as $F[$i]
$F[2] =~ s/H/X/g; # Do whatever you want with your chosen (Nth) group.
$F[2] =~ s/1/2/g;
$_ = join(";", @F) # Put them back together.
perl -pe is like sed. (sort of.)
and perl -F';' -ape means use auto-splitting (-a) and set the field separator to ';'. Then your groups are accessible via $F[i] - so it works slightly like awk, too.
So it would also work like perl -F';' -ape '/*your code*/' < inputfile
I know you asked for a sed solution - I often find myself switching to Perl (though I do still like sed) for one-liners.
awk -F";" -v OFS=";" '{gsub("H","X",$3);gsub("1","2",$3);}1' Your_file
This might work for you (GNU sed):
sed 's/H/X/2g;s/1/2/2g' file
This changes all but the first occurrence of H or 1 to X or 2 respectively
If it's by fields separated by ;'s, use:
sed 's/H[^;]*;/&\n/;h;y/H/X/;H;g;s/\n.*\n//;s/1[^;]*;/&\n/;h;y/1/2/;H;g;s/\n.*\n//' file
This can be mutated to cater for many values, so:
echo -e "H=X\n1=2"|
sed -r 's|(.*)=(.*)|s/\1[^;]*;/\&\\n/;h;y/\1/\2/;H;g;s/\\n.*\\n//|' |
sed -f - file