Grep matches only of multiple separated strings - regex

I have a file with lines containing this format:
fieldA=value1, fieldB=value2, fieldC=value3, fieldD=value4, fieldE=value5
I am interested in fieldA, fieldB and fieldD. However, fieldC may or may not be present, so I cannot use something like:
grep "field" * | awk -F"," '{print $1, $2, $4}'
My end goal is to have output like this, all in one line:
fieldA=value1, fieldB=value2, fieldD=value4
I tried using grep -o -E, but it outputs the fields on separate lines, and the association between fields from the same line is lost.
grep -o -E "field1_=\w*|field2_=\w*|field3_=\w*"

If you know the field names of A, B and D, grep and xargs can do the job (awk or sed could certainly do it too):
grep -Po "fieldA=[^,]*|fieldB=[^,]*|fieldD=[^,]*" file|xargs -n3
That gives you:
fieldA=value1 fieldB=value2 fieldD=value4
If you want the commas in the output:
grep -Po "fieldA=[^,]*,|fieldB=[^,]*,|fieldD=[^,]*" file|xargs -n3
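xargs -n3 just groups every three matches, regardless of which line they came from, so if one line lacks one of the fields the grouping shifts and values from different lines end up together. A minimal line-by-line awk sketch (the helper name pick_fields and the hard-coded list of wanted names are illustrative assumptions) keeps the wanted fields of each line together:

```shell
# pick_fields (hypothetical helper): keep only fieldA, fieldB and fieldD from
# each line, in their original order, so each output line still corresponds
# to exactly one input line even when some field is missing.
pick_fields() {
  awk -F', *' '
    BEGIN { want["fieldA"] = want["fieldB"] = want["fieldD"] = 1 }
    {
      out = ""
      for (i = 1; i <= NF; i++) {
        name = $i
        sub(/=.*/, "", name)          # field name = text before the first "="
        if (name in want)
          out = (out == "") ? $i : out ", " $i
      }
      print out
    }'
}

printf '%s\n' \
  'fieldA=value1, fieldB=value2, fieldC=value3, fieldD=value4, fieldE=value5' \
  'fieldA=value1, fieldB=value2, fieldD=value4, fieldE=value5' | pick_fields
```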

Is a sed solution acceptable? This one assumes fieldA and fieldB are always the first two space-separated items on each line:
sed 's/^\([^ ]* [^ ]*\).*\(fieldD=[^,]*\).*/\1 \2/' filename
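Another option, assuming the unwanted fields are exactly fieldC and fieldE and that items are separated by ", ", is to delete what you don't want instead of capturing what you do; an expression whose field is missing on a line simply changes nothing. drop_fields is a hypothetical helper name:

```shell
# drop_fields (hypothetical helper): delete the ", fieldC=..." and
# ", fieldE=..." items; if a line has no fieldC, the first expression
# matches nothing and the line passes through that step unchanged.
drop_fields() {
  sed -e 's/, fieldC=[^,]*//' -e 's/, fieldE=[^,]*//'
}

printf '%s\n' \
  'fieldA=value1, fieldB=value2, fieldC=value3, fieldD=value4, fieldE=value5' \
  'fieldA=value1, fieldB=value2, fieldD=value4, fieldE=value5' | drop_fields
```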

Related

grep to match filename in full path

I want to "extract" the file name from a full path string using grep.
For example:
In /etc/network/interfaces I want to match "interfaces".
In /home/user/Documents/report.pdf I want to match "report.pdf".
Basically I want the opposite of:
$ ls /etc/network/interfaces | grep "^.*/"
I tried:
$ ls -p /etc/network/interfaces | grep "/.*$"
But this doesn't match only from the last slash (/) through any characters (.*) to the end ($): since slashes are characters too, it matches from the first slash, i.e. the whole path.
Does anyone know a way to match only the last part, i.e. from the last slash until the end?
awk, printing the last /-separated field:
% awk -F/ '{print $NF}' <<<'/etc/network/interfaces'
interfaces
% awk -F/ '{print $NF}' <<<'/home/user/Documents/report.pdf'
report.pdf
grep, getting the portion after last /:
% grep -o '[^/]\+$' <<<'/etc/network/interfaces'
interfaces
% grep -o '[^/]\+$' <<<'/home/user/Documents/report.pdf'
report.pdf
sed, replacing everything up to the last / with nothing:
% sed 's_.*/__' <<<'/etc/network/interfaces'
interfaces
% sed 's_.*/__' <<<'/home/user/Documents/report.pdf'
report.pdf
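The shell can also do this without grep, awk or sed. A small sketch: the POSIX parameter expansion ${path##*/} strips everything up to and including the last slash, and basename is the standard utility for the same job (the variable name path is just for illustration):

```shell
path=/home/user/Documents/report.pdf

# ${path##*/} removes the longest prefix matching "*/",
# i.e. everything up to and including the last slash.
printf '%s\n' "${path##*/}"

# basename is the standard utility for the same job.
basename "$path"
```

One difference to keep in mind: for a path with a trailing slash, such as /etc/network/, basename prints network, while ${path##*/} expands to an empty string.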
How about simply matching anything that is not a /? Also, to extract only the match, you need grep's -o flag.
ls -p /etc/network/interfaces | grep -o '[^/]*$'
You can also use awk directly, for example:
ls -l /etc/networks | awk '{print $NF}' | awk -F "/" '{print $NF}'
It should be possible to merge these two awk invocations into one as well.
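One way to merge them, sketched under the assumption that the path is the last whitespace-separated field of each line (so filenames with spaces are not handled): split that field on / inside a single awk with split(), which returns the number of pieces. last_path_part is a hypothetical helper name, and the sample line is made up so the example runs anywhere:

```shell
# last_path_part (hypothetical helper): print the text after the last "/"
# of the last whitespace-separated field; split() returns the piece count.
last_path_part() {
  awk '{ n = split($NF, parts, "/"); print parts[n] }'
}

# Real use would be:  ls -l /etc/networks | last_path_part
# Here a made-up `ls -l`-style line is used instead.
printf '%s\n' '-rw-r--r-- 1 root root 91 Jan  1  2020 /etc/networks' | last_path_part
```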

Regex to match an IP address between a colon and a slash with grep

The lines in the file I want to search look like this:
log:192.1.1.128/50098
log:192.1.1.11/22
...
Now I tried the following RegEx but none of them worked:
grep -oE "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b" file
grep -oE "\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}\b"
grep -oE "\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
For this simple example you can do it without an IP regex, using awk to split each line on : and /:
awk -F":|/" '{print $2}' file
192.1.1.128
192.1.1.11
To check that the IP contains three dots (i.e. four parts):
awk -F":|/" '{n=split($2,a,".");if (n==4) print $2}' file
192.1.1.128
192.1.1.11
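Counting the parts only checks the shape, so 999.1.1.1 would still pass. A slightly stricter sketch, still not a complete IPv4 validator (for example, it accepts leading zeros such as 01), also requires each part to be a decimal number from 0 to 255; valid_ips is a hypothetical helper name:

```shell
# valid_ips (hypothetical helper): print the text between ":" and "/" only if
# it has four dot-separated parts, each a decimal number in 0..255.
valid_ips() {
  awk -F':|/' '{
    n = split($2, a, ".")
    ok = (n == 4)
    for (i = 1; ok && i <= n; i++)
      if (a[i] !~ /^[0-9]+$/ || a[i] + 0 > 255) ok = 0
    if (ok) print $2
  }'
}

printf '%s\n' 'log:192.1.1.128/50098' 'log:192.1.1.11/22' 'log:300.1.1.1/80' | valid_ips
```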
You could also use grep with Perl-compatible regexes (-P):
$ grep -oP '.*?:\K[^/]*(?=/)' file
192.1.1.128
192.1.1.11
grep's extended regular expressions (-E) don't support \d; use [0-9] (or [[:digit:]]) instead:
$ grep -oE "\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b" file
192.1.1.128
192.1.1.11
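\b only checks word boundaries, so this pattern also accepts out-of-range addresses such as 999.1.1.1. If you want grep itself to check the 0-255 range and the colon/slash context, one possible sketch builds the octet alternation once in a shell variable and deletes the delimiters afterwards with tr (octet and extract_ips are illustrative names):

```shell
# One octet: 250-255 | 200-249 | 100-199 | 0-99 (no leading zeros).
octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'

# extract_ips (hypothetical helper): match ":a.b.c.d/" as a whole,
# then delete the ":" and "/" delimiters.
extract_ips() {
  grep -oE ":$octet(\.$octet){3}/" | tr -d ':/'
}

printf '%s\n' 'log:192.1.1.128/50098' 'log:192.1.1.11/22' 'log:999.1.1.1/80' | extract_ips
```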

Remove everything after the 2nd occurrence in a string in Unix

I would like to remove everything after the 2nd occurrence of a particular pattern in a string. What is the best way to do it in Unix? What is the most elegant and simple method: sed, awk, or just Unix commands like cut?
My input would be
After-u-math-how-however
Output should be
After-u
Everything from the 2nd - onward should be stripped out. The solution should also handle zero occurrences of the pattern: with zero or one occurrence the string is left unchanged, and from the 2nd occurrence on everything is removed.
So if the input is as follows
After
Output should be
After
Something like this would do it.
echo "After-u-math-how-however" | cut -f1,2 -d'-'
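This will split up (cut) the string into fields, using a dash (-) as the delimiter. Once the string has been split into fields, cut will print the 1st and 2nd fields. A line containing no - at all is printed unchanged (unless -s is given), which covers the "After" case.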
This might work for you (GNU sed):
sed 's/-[^-]*//2g' file
You could use the following regex to select what you want:
^[^-]*-\?[^-]*
For example:
echo "After-u-math-how-however" | grep -o "^[^-]*-\?[^-]*"
Result:
After-u
@EvanPurkisher's cut -f1,2 -d'-' solution is IMHO the best one, but since you asked about sed and awk:
With GNU sed (for -r):
$ echo "After-u-math-how-however" | sed -r 's/([^-]+-[^-]*).*/\1/'
After-u
With GNU awk (for gensub()):
$ echo "After-u-math-how-however" | awk '{$0=gensub(/([^-]+-[^-]*).*/,"\\1",1)}1'
After-u
It can also be done with non-GNU sed, using \( \) and * instead of -r and +, and with non-GNU awk, using match() and substr(), if necessary.
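A sketch of such POSIX versions, anchored at the start of the line so that everything from the 2nd - onward is removed (the helper names are hypothetical). The sed one uses \( \) and * in a basic regex; the awk one uses match(), which sets RLENGTH, and substr():

```shell
# Both helpers (hypothetical names) print each line up to, but not including,
# its 2nd "-"; lines with fewer than two dashes are printed unchanged.

# POSIX BRE: \( \) groups the part to keep; no -r and no + needed.
upto_second_dash_sed() {
  sed 's/^\([^-]*-[^-]*\).*/\1/'
}

# POSIX awk: the match is anchored at ^, so RSTART is 1 and RLENGTH is the
# length of the prefix to keep; the trailing 1 prints every line.
upto_second_dash_awk() {
  awk 'match($0, /^[^-]*-[^-]*/) { $0 = substr($0, 1, RLENGTH) } 1'
}

printf '%s\n' After-u-math-how-however After | upto_second_dash_sed
printf '%s\n' After-u-math-how-however After | upto_second_dash_awk
```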
awk -F - '{print $1 (NF>1? FS $2 : "")}' <<<'After-u-math-how-however'
Split the line into fields using - as the field separator (specified with -F -), which is available as the special variable FS inside the awk program.
Always print the 1st field (print $1), followed by:
If there's more than 1 field (NF>1), append FS (i.e., -) and the 2nd field ($2)
Otherwise, append "" (the empty string), i.e. effectively print only the 1st field (which may itself be empty if the input is empty).
This can be done in pure bash (which means no fork, no external process). Read into an array split on '-', then slice the array:
$ IFS=-
$ read -ra val <<< After-u-math-how-however
$ echo "${val[*]}"
After-u-math-how-however
$ echo "${val[*]:0:2}"
After-u
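Note that the bare IFS=- above stays in effect for the rest of the shell session and changes how later unquoted expansions are split. If you'd rather not touch IFS at all, POSIX parameter expansion also avoids any external process; a sketch with a hypothetical helper keep_two_fields:

```shell
# keep_two_fields (hypothetical helper): print the argument up to, but not
# including, its 2nd "-"; a string with no "-" is printed unchanged.
keep_two_fields() {
  _s=$1
  case $_s in
    *-*)
      _first=${_s%%-*}                              # before the 1st "-"
      _rest=${_s#*-}                                # after the 1st "-"
      printf '%s-%s\n' "$_first" "${_rest%%-*}"     # stop at the 2nd "-"
      ;;
    *)
      printf '%s\n' "$_s"
      ;;
  esac
}

keep_two_fields After-u-math-how-however
keep_two_fields After
```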
awk '{$0 = (NF > 1 ? $1 FS $2 : $1)} 1' FS=- file
Result
After-u
After
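This will do it in awk (the i <= NF test keeps a line with no - from getting a stray trailing -):
echo "After" | awk -F "-" '{printf "%s", $1; for (i = 2; i <= 2 && i <= NF; i++) printf "-%s", $i; print ""}'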

Removing text between "|" delimiter and "," delimiter using shell script

I have a large multi-line file pulled from a database. Its fields are delimited by commas, and if a field has multiple values, the values are separated by "|".
example input:
name,title,email1|email2|email3,phone,address
In a shell script I need to remove "|email2|email3"
example output:
name,title,email1,phone,address
I need to do this for each line in the file.
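Try sed (the transcript below is from BSD/macOS sed; in GNU sed's basic regexes \| means alternation, so there use a plain | instead):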
sed "s/\|[^,]*//g"
Result:
$ echo "name,title,email1|email2|email3,phone,address" | sed "s/\|[^,]*//g"
name,title,email1,phone,address
Using sed:
sed -i 's/|[^,]*//g' filename
Note that in most regex flavors | is a special character that specifies alternation, so a literal | has to be written as \|. sed's default basic regular expressions work the other way round: a plain | is literal, and \| (a GNU extension) means alternation, unless extended regexes are enabled with -E (or -r in GNU sed).
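To see the difference in practice, here are a basic-regex, an extended-regex and a bracket-expression version of the same substitution (a sketch; -E is accepted by both GNU and BSD sed). All three print name,title,email1,phone,address:

```shell
line='name,title,email1|email2|email3,phone,address'

# Basic regex (the default): a plain | is an ordinary character.
printf '%s\n' "$line" | sed 's/|[^,]*//g'

# Extended regex (-E): a plain | would mean alternation, so escape it.
printf '%s\n' "$line" | sed -E 's/\|[^,]*//g'

# A bracket expression matches a literal | in either flavor.
printf '%s\n' "$line" | sed 's/[|][^,]*//g'
```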
Use sed with the in-place option (-i.bak also keeps a backup copy as inFile.bak):
sed -i.bak 's/|[^|,]*//g' inFile
Live Demo: http://ideone.com/zKUVhl
This answer splits the input into fields and outputs the ones you want.
awk -F'[|,]' -v OFS=, '{print $1, $2, $3, $(NF-1), $NF}' file
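The awk command above assumes the multi-valued field is the 3rd one and that phone and address are always the last two fields. If any field might hold |-separated values, a more general sketch trims every comma-separated field at its first |; like the question, it assumes no field contains an embedded comma, and first_values is a hypothetical helper name:

```shell
# first_values (hypothetical helper): for each comma-separated field,
# drop everything from the first "|" onward; awk rebuilds $0 with OFS.
first_values() {
  awk 'BEGIN { FS = OFS = "," }
       { for (i = 1; i <= NF; i++) sub(/[|].*/, "", $i); print }'
}

printf '%s\n' 'name,title,email1|email2|email3,phone,address' | first_values
```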

Get a Specific Match Within a String

I'm trying to match the contents of a string that contains quoted sequences using a shell script. So far I've got this:
et="\"He\" \"llo\""
echo $et | sed -e '/\"(.*?)\"/g'
Which returns this:
"He" "llo"
But I don't want the quote marks to appear in the result. Also, how can I echo only the first, second, third, etc. match?
sed -e 's/"\([^"]*\)"/\1/g' removes the quotes around each balanced pair of " quotes. To show only the first, second, etc. match with sed, you probably have to use a separate capture group for each one and print the one you want:
$ echo '"1" "2" "3"' | sed -e 's/"\([^"]*\)" "\([^"]*\)" "\([^"]*\)"/\2/g'
2
Provided that what is wanted is only the text between the first pair of quotes, here is a solution with perl:
echo "$et" | perl -ne '/"([^"]+)"/ and print "$1\n";'
This will also handle quotes within quotes if they are preceded by a backslash:
echo "$et" | perl -ne '/"([^"\\]*(?:\\.[^"\\]*)*)"/ and print "$1\n";'
This is much simpler with awk since you can specify the double-quote to be the field separator.
$ et='"He" "llo"'
$ awk -F'"' '{print $2}' <<<$et
He
$ awk -F'"' '{print $4}' <<<$et
llo
Note: this also scales to more strings; the quoted strings are always the even-numbered fields, i.e. $2, $4, $6, etc.
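Since the k-th quoted string is field 2k, picking one by number or printing all of them is a small step. A sketch with hypothetical helper names nth_quoted and all_quoted; the loop stops before the last field so that an unclosed trailing quote is ignored:

```shell
# nth_quoted K (hypothetical helper): print the K-th "..." string on each
# line, i.e. field 2*K when the double quote is the field separator.
nth_quoted() {
  awk -F'"' -v k="$1" '{ print $(2 * k) }'
}

# all_quoted (hypothetical helper): print every quoted string, one per line.
all_quoted() {
  awk -F'"' '{ for (i = 2; i < NF; i += 2) print $i }'
}

et='"He" "llo"'
printf '%s\n' "$et" | nth_quoted 2
printf '%s\n' "$et" | all_quoted
```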
With GNU awk (the three-argument form of match() is a gawk extension), you can also do something like this:
$ echo "\"He\" \"llo\"" | awk ' { match($0,/([A-Za-z]+)[" ]+([A-Za-z]+)/,a); print a[1]","a[2]} '
He,llo
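For awks without the three-argument match(), a portable alternative is to print each quoted string on its own line with grep -o (supported by GNU and BSD grep), strip the quotes with tr, and pick one by line number with sed -n. Note that matches are then counted across the whole input, not per input line:

```shell
et='"He" "llo"'

# grep -o prints each "..." match on its own line; tr deletes the quotes.
printf '%s\n' "$et" | grep -o '"[^"]*"' | tr -d '"'

# sed -n 2p keeps only the 2nd match.
printf '%s\n' "$et" | grep -o '"[^"]*"' | tr -d '"' | sed -n 2p
```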