grep to match filename in full path - regex

I want to "extract" the file name in a full path string using grep.
For example:
In /etc/network/interfaces I want to match "interfaces".
In /home/user/Documents/report.pdf I want to match "report.pdf".
Basically I want the opposite of:
$ ls /etc/network/interfaces | grep "^.*/"
I tried:
$ ls -p /etc/network/interfaces | grep "/.*$"
But the match does not start at the last slash (/). Since .* matches slashes as well, the pattern matches from the first slash to the end ($), which is the whole path.
Does anyone know a way to match only the last part? Something like "from the last slash until the end".
Thank you,

awk, getting the / separated last field:
% awk -F/ '{print $NF}' <<<'/etc/network/interfaces'
interfaces
% awk -F/ '{print $NF}' <<<'/home/user/Documents/report.pdf'
report.pdf
grep, getting the portion after last /:
% grep -o '[^/]\+$' <<<'/etc/network/interfaces'
interfaces
% grep -o '[^/]\+$' <<<'/home/user/Documents/report.pdf'
report.pdf
sed, replacing everything up to and including the last / with nothing:
% sed 's_.*/__' <<<'/etc/network/interfaces'
interfaces
% sed 's_.*/__' <<<'/home/user/Documents/report.pdf'
report.pdf

How about simply matching on not /? Also, for extraction, you need the -o flag to grep.
ls -p /etc/network/interfaces | grep -o '[^/]*$'
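A small sketch of why this pattern works where "/.*$" does not. The function name last_component is made up for illustration, and -o is assumed to be available (it is in GNU and BSD grep):

```shell
# Hypothetical helper: print the part of a path after the last slash.
last_component() {
    # [^/]*$ can only match a run of non-slash characters that reaches the
    # end of the line, so the match cannot start before the last slash.
    printf '%s\n' "$1" | grep -o '[^/]*$'
}

last_component /etc/network/interfaces
last_component /home/user/Documents/report.pdf
```

Inside a script, the parameter expansion ${path##*/} gives the same result without starting a pipeline.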

As suggested in the first comment, you can use awk directly, something like:
ls -l /etc/networks | awk '{print $NF}' | awk -F "/" '{print $NF}'
It should be possible to merge these two awk calls into one as well.
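One possible way to do it in a single awk call, as a sketch: split the last whitespace-separated field on slashes and print the final piece. The function name last_field_basename and the sample line are made up for illustration:

```shell
# Hypothetical helper: reduce each input line to the file name found in its
# last whitespace-separated field (as in `ls -l` output).
last_field_basename() {
    awk '{
        n = split($NF, parts, "/")   # split the last field on slashes
        print parts[n]               # the final piece is the file name
    }'
}

# Made-up sample line in the shape of `ls -l` output.
printf '%s\n' '-rw-r--r-- 1 root root 91 Jan  1 00:00 /etc/networks' | last_field_basename
```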

Related

Access the word in the file with grep

I have a conf file and I use grep to read data from it, but the output is not very useful to me as it is.
How can I get just the value for a search term?
I am using:
grep "export:" /etc/VDdatas.conf
Print:
export: HelloWorld
I want: (without "export: ")
HelloWorld
How can I do that?
If you're using GNU grep you can use PCRE and a lookbehind:
grep -P -o '(?<=export: ).*' /etc/VDdatas.conf
The -o option means to print only the part of the line that matches the regexp, and using a lookbehind for the export: prefix makes it not part of the match.
You can also use sed or awk:
sed -n '/export:/s/^export: //p' /etc/VDdatas.conf
awk '/export:/ {print $2}' /etc/VDdatas.conf
I suggest you pipe the match to awk.
grep "export:" /etc/VDdatas.conf | awk -F ' ' '{print $2}'
This will print the second word in the output (after splitting the line on spaces).
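Printing $2 only returns the first word of the value. If the value may contain spaces, a sketch like the following keeps everything after the "key: " prefix. The name conf_value and the sample lines are made up for illustration; the file in the question would be passed on standard input:

```shell
# Hypothetical helper: print the text after "KEY: " for lines starting with it.
# Reads the config from standard input; $1 is the key without the colon.
conf_value() {
    awk -v key="$1" '
        # index(...) == 1 means the line starts with "key: "
        index($0, key ": ") == 1 { print substr($0, length(key) + 3) }
    '
}

# Made-up sample input in the format shown in the question.
printf '%s\n' 'import: Other' 'export: Hello World' | conf_value export
```

For the real file this would be called as: conf_value export < /etc/VDdatas.conf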

grep extract simple url - without scheme

I need to extract a URL from a file. I've started with:
grep -E -o 'ftp://\S*' $filename
I know that this particular URL starts with the ftp scheme and ends with a whitespace character (space or newline).
I receive something like:
ftp://dir/some_file.ext
But I need just the path (/dir/some_file.ext), without the scheme (the ftp:// part).
Can I do it with the first regexp? Do I have to use a second one?
I cannot use anything other than grep/egrep.
If your grep supports -P (PCRE flag) then you can use:
grep -oP 'ftp:/\K/\S*' $filename
/dir/some_file.ext
If for some reason you don't have grep -P available, then pipe into another grep:
grep -oE 'ftp://\S*' file | grep -oE '/[^/].*'
/dir/some_file.ext
GNU awk (needed because the record separator RS has more than one character) may also do it:
awk -v RS="ftp:/" 'NR>1 {print $1}' file
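The two-grep pipeline can be wrapped up as below. This is a sketch: the function name ftp_paths and the sample line are made up, and [^[:space:]] is used in place of \S on the assumption that \S may not be supported by every grep:

```shell
# Hypothetical helper: print the path part of each ftp:// URL read from stdin.
ftp_paths() {
    # 1st grep: isolate each URL (ftp:// up to the next whitespace).
    # 2nd grep: keep everything from the first slash that is followed by a
    #           non-slash character, which skips the two slashes of "ftp://".
    grep -oE 'ftp://[^[:space:]]*' | grep -oE '/[^/].*'
}

# Made-up sample line containing the URL from the question.
printf '%s\n' 'fetch ftp://dir/some_file.ext today' | ftp_paths
```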

Remove everything after 2nd occurrence in a string in unix

I would like to remove everything after the 2nd occurrence of a particular
pattern in a string. What is the best way to do it in Unix? What is the most elegant and simple method to achieve this: sed, awk, or just Unix commands like cut?
My input would be
After-u-math-how-however
Output should be
After-u
Everything after the 2nd - should be stripped out. The regex should also match
zero occurrences of the pattern, so zero or one occurrence should be ignored and
from the 2nd occurrence everything should be removed.
So if the input is as follows
After
Output should be
After
Something like this would do it.
echo "After-u-math-how-however" | cut -f1,2 -d'-'
This will split up (cut) the string into fields, using a dash (-) as the delimiter. Once the string has been split into fields, cut will print the 1st and 2nd fields.
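A quick check of the cases from the question, as a sketch (the function name first_two is made up). Without the -s option, cut passes lines that contain no delimiter through unchanged, which is what makes the zero-occurrence case work:

```shell
# Hypothetical helper: keep only the first two dash-separated fields.
first_two() {
    cut -d- -f1,2
}

# Many, one, and zero occurrences of the delimiter.
printf '%s\n' 'After-u-math-how-however' 'After-u' 'After' | first_two
```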
This might work for you (GNU sed):
sed 's/-[^-]*//2g' file
You could use the following regex to select what you want:
^[^-]*-\?[^-]*
For example:
echo "After-u-math-how-however" | grep -o "^[^-]*-\?[^-]*"
Results:
After-u
@EvanPurkisher's cut -f1,2 -d'-' solution is IMHO the best one, but since you asked about sed and awk:
With GNU sed for -r:
$ echo "After-u-math-how-however" | sed -r 's/([^-]+-[^-]*).*/\1/'
After-u
With GNU awk for gensub():
$ echo "After-u-math-how-however" | awk '{$0=gensub(/([^-]+-[^-]*).*/,"\\1","")}1'
After-u
Can be done with non-GNU sed using \( and *, and with non-GNU awk using match() and substr() if necessary.
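For the non-GNU sed case, a sketch along those lines could look like this. The function name is made up, and the pattern is anchored with ^ and uses * for the first run as well, which is a slight variation on the -r version above:

```shell
# Hypothetical helper: the same idea in basic regular expressions only.
first_two_sed() {
    # \( ... \) captures "anything up to the first dash, the dash, and the
    # following run of non-dashes"; the trailing .* is dropped.
    # Lines without a dash do not match and are printed unchanged.
    sed 's/^\([^-]*-[^-]*\).*/\1/'
}

printf '%s\n' 'After-u-math-how-however' 'After' | first_two_sed
```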
awk -F - '{print $1 (NF>1? FS $2 : "")}' <<<'After-u-math-how-however'
Split the line into fields based on field separator - (option spec. -F -) - accessible as special variable FS inside the awk program.
Always print the 1st field (print $1), followed by:
If there's more than 1 field (NF>1), append FS (i.e., -) and the 2nd field ($2)
Otherwise: append "", i.e.: effectively only print the 1st field (which in itself may be empty, if the input is empty).
This can be done in pure bash (which means no fork, no external process). Read into an array split on '-', then slice the array:
$ IFS=-
$ read -ra val <<< After-u-math-how-however
$ echo "${val[*]}"
After-u-math-how-however
$ echo "${val[*]:0:2}"
After-u
awk '$0 = $2 ? $1 FS $2 : $1' FS=-
Result
After-u
After
This will do it in awk:
echo "After" | awk -F "-" '{printf "%s",$1; for (i=2; i<=NF && i<=2; i++) printf "-%s",$i; print ""}'

Grep matches only of multiple separated strings

I have a file with lines containing this format:
fieldA=value1, fieldB=value2, fieldC=value3, fieldD=value4, fieldE=value5
I am interested in fieldA, fieldB, fieldD. However, fieldC may or may not be present, therefore I cannot use something like:
grep "field" * | awk -F"," '{print $1, $2, $4}'
My end goal is to have output like this, all in one line:
fieldA=value1, fieldB=value2, fieldD=value4
I tried using grep -E, but it outputs those fields in different lines, and the association between the fields breaks.
grep -o -E "field1_=\w*|field2_=\w*|field3_=\w*"
If you know the field names (A, B, D), grep and xargs can do the job (awk or sed could certainly do it too):
grep -Po "fieldA=[^,]*|fieldB=[^,]*|fieldD=[^,]*" file|xargs -n3
That gives you:
fieldA=value1 fieldB=value2 fieldD=value4
If you want the commas in the output:
grep -Po "fieldA=[^,]*,|fieldB=[^,]*,|fieldD=[^,]*" file|xargs -n3
Is a sed solution acceptable?
sed 's/^\([^ ]* [^ ]*\).*\(fieldD=[^,]*\).*/\1 \2/' filename
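Since fieldC may or may not be present, selecting fields by name rather than by position keeps each output line tied to its input line. A sketch in awk, with the function name pick_fields and the second sample line made up for illustration:

```shell
# Hypothetical helper: keep only fieldA, fieldB and fieldD from each line,
# whatever position they appear in, and join them with ", ".
pick_fields() {
    awk -F', *' '{
        out = ""
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")            # kv[1] is the field name
            if (kv[1] == "fieldA" || kv[1] == "fieldB" || kv[1] == "fieldD")
                out = out (out == "" ? "" : ", ") $i
        }
        if (out != "") print out          # skip lines with none of the fields
    }'
}

printf '%s\n' \
    'fieldA=value1, fieldB=value2, fieldC=value3, fieldD=value4, fieldE=value5' \
    'fieldA=a, fieldB=b, fieldD=d, fieldE=e' | pick_fields
```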

grep regex to pull out a string between two known strings

I have a string of text in a file that I am parsing. I have almost got it, but I am not sure what I am missing.
The basic expression I am using is:
cat cred.txt | grep -m 1 -o '&CD=[^&]*'
I am getting a result of
&CD=u8AA-RaF-97gc_SdZ0J74gc_SdZ0J196gc_SdZ0J211
I do not want the &CD= part in the resulting string. How would I do that?
The string I am parsing from is:
webpage.asp?UserName=username&CD=u8AA-RaF-97gc_SdZ0J74gc_SdZ0J196gc_SdZ0J211&Country=USA
If your grep knows Perl regex:
grep -m 1 -oP '(?<=&CD=)[^&]*' cred.txt
If not:
sed '1s/.*&CD=\([^&]*\).*/\1/' cred.txt
Many ways to skin this cat.
Extend your pipe:
grep -o 'CD=[^&]*' cred.txt | cut -d= -f2
Or do a replacement in sed:
sed -r 's/.*[&?]CD=([^&]*).*/\1/' cred.txt
Or get really fancy and parse the actual QUERY_STRING in awk:
awk -F'?' '{ split($2, a, "&"); for(i in a){split(a[i], kv, "="); out[kv[1]]=kv[2];} print out["CD"];}' cred.txt
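To mirror the -m 1 behaviour of the original grep without needing -P or -r, a sed sketch can print the first CD value and then quit. The function name first_cd is made up for illustration:

```shell
# Hypothetical helper: print the value of the first CD= parameter on stdin.
first_cd() {
    sed -n '/[&?]CD=/{
        s/.*[&?]CD=\([^&]*\).*/\1/p
        q
    }'
}

# The string from the question.
printf '%s\n' 'webpage.asp?UserName=username&CD=u8AA-RaF-97gc_SdZ0J74gc_SdZ0J196gc_SdZ0J211&Country=USA' | first_cd
```

For the file in the question this would be called as: first_cd < cred.txt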