Extracting a part of String using grep/sed - regex

I have a file on Linux with entries similar to these:
dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM
dn: CN=Review users,OU=groups,DC=HDFCSLDM,DC=COM
I would like to extract only the CN value, up to the first comma, and write it to another file. For the entries above, the output should be:
> HP_NetworkSupport
> Review users
What would be the command for doing this?

This is one way, using a lookbehind:
grep -Po '(?<=CN=)[^,]*' file > new_file
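It matches all text after CN= (which is not included in the output) up to the next comma. [^,]* matches any run of characters that are not a comma. -o prints only the matched part and -P enables Perl-compatible regular expressions, which GNU grep supports.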
Test
$ grep -Po '(?<=CN=)[^,]*' file
HP_NetworkSupport
Review users

Using awk
awk -F"=|," '{print $2}' file
HP_NetworkSupport
Review users
or
awk -F'[=,]' '{print $2}' file
HP_NetworkSupport
Review users
Set the field separator to , or =, then print the second field.
To handle a field that contains a comma, you should use an LDAP parser, but this should work:
cat file
dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM
dn: CN="Review, users",OU=groups,DC=HDFCSLDM,DC=COM
awk -F"CN=|,OU" '{print $2}' file
HP_NetworkSupport
"Review, users"
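The awk command above leaves the double quotes around a quoted value in place. As a minimal sketch, assuming the CN value is always followed by ,OU= and that any quoting uses plain double quotes (both are assumptions about this particular file, not guarantees of the LDIF format), gsub can strip them. The temporary file and the variable names are illustrative only:

```shell
# Sample input copied from the answer above.
input=$(mktemp)
cat > "$input" <<'EOF'
dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM
dn: CN="Review, users",OU=groups,DC=HDFCSLDM,DC=COM
EOF

# Split on "CN=" or ",OU=", copy the second field, and remove any double quotes.
quoted_cn=$(awk -F'CN=|,OU=' '{ v = $2; gsub(/"/, "", v); print v }' "$input")
printf '%s\n' "$quoted_cn"
rm -f "$input"
```

This still breaks if a DN has no OU component, which is why a real LDAP parser remains the safer choice.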

Using sed:
$ sed -r 's/.*CN=([^,]*),.*/\1/' inputfile
HP_NetworkSupport
Review users
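With the sed command above, a line that does not contain CN= fails to match and is printed unchanged. A possible variant, sketched here with POSIX basic regular expressions so that it does not depend on -r, prints only the lines where the substitution succeeded. The temporary file, the extra objectClass line, and the variable names are made up for illustration:

```shell
# Sample input from the question plus one hypothetical line without "CN=".
input=$(mktemp)
cat > "$input" <<'EOF'
dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM
objectClass: group
dn: CN=Review users,OU=groups,DC=HDFCSLDM,DC=COM
EOF

# -n suppresses automatic printing; the p flag prints a line only
# when the substitution actually matched.
cn_values=$(sed -n 's/.*CN=\([^,]*\),.*/\1/p' "$input")
printf '%s\n' "$cn_values"
rm -f "$input"
```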

perl -lne 'print $1 if(/CN=([^\,]*),/)' your_file
Tested below:
> cat temp
dn: CN=HP_NetworkSupport,OU=groups,DC=HDFCSLDM,DC=COM
dn: CN=Review users,OU=groups,DC=HDFCSLDM,DC=COM
> perl -lne 'print $1 if(/CN=([^\,]*),/)' temp
HP_NetworkSupport
Review users
>

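Pipe it through this command (sed has no non-greedy quantifiers such as +?, so a negated character class is used to stop at the first comma):
sed -E 's/.*CN=([^,]*),OU=.*/\1/'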

Related

Grep password from .my.cnf

I am trying to grep the (write access) password from .my.cnf, but I haven't managed it yet.
The .my.cnf looks like this:
# longer
# comment text
[clientreadonly]
password=pass1 # comment
port=3306
user=test_ro
socket=/var/lib/mysql/mysql.sock
[client]
password=pass2 # comment
port=3306
user=test
socket=/var/lib/mysql/mysql.sock
and I want to grep pass2. The code shouldn't be too verbose, of course. I ended up with
grep 'password=' ~/.my.cnf | sed -e 's/password=//'
but that leaves the # comment after pass2, and I don't want to match the whole comment text literally (it is long in the original, and replacing it verbatim would be silly). So I need a regex that extracts pass2 only.
The main goal is to grep the password so I can easily use it on a shell command line.
$ awk -F'[= ]' '/^password=/ && p !~ /clientreadonly/{print $2} {p=$0}' ~/.my.cnf
pass2
-F'[= ]': use a space or = as the field separator
/^password=/ && p !~ /clientreadonly/: match if the line starts with password= and the previous line doesn't contain clientreadonly
print $2: print the second field
p=$0: save the current line in p, so that it is available as the previous line when the next line is read
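The previous-line check above works only while password= directly follows the section header. A minimal sketch of a variant that remembers the current section instead is shown below. It assumes section headers start a line with [ and that the password contains no space or = character. The temporary file and the variable names are illustrative:

```shell
# Sample configuration copied from the question.
cnf=$(mktemp)
cat > "$cnf" <<'EOF'
# longer
# comment text
[clientreadonly]
password=pass1 # comment
port=3306
user=test_ro
socket=/var/lib/mysql/mysql.sock
[client]
password=pass2 # comment
port=3306
user=test
socket=/var/lib/mysql/mysql.sock
EOF

# Remember the most recent [section] header and print the password
# only while inside [client]; exit after the first hit.
client_password=$(awk -F'[= ]' '
  /^\[/ { section = $0; next }
  section == "[client]" && /^password=/ { print $2; exit }
' "$cnf")
printf '%s\n' "$client_password"
rm -f "$cnf"
```

Because the section name is compared as a whole string, [clientreadonly] does not match [client], and the order of the keys inside a section no longer matters.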
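You can modify your command slightly, like this (the second expression strips any trailing comment, whatever its text):
grep 'password=' ~/.my.cnf | sed -e 's/password=//' -e 's/ *#.*//'
or another way:
grep 'password=' ~/.my.cnf | cut -d' ' -f1 | cut -d'=' -f2
Note that both commands print the password of every section (pass1 and pass2 here), not only the one under [client].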
Using perl:
perl -00 -ane '/\[client\].password=(\S+)/s && print $1' < ~/.my.cnf
Output:
pass2

sed or awk to capture part of url

I am not very experienced with regular expressions and sed/awk scripting.
I have urls that are similar to the following torrent url:
http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
I would like a sed or awk script to extract the text after title=, i.e.
from the example above just get:
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
A simple approach with awk: use the = as the field separator:
awk -F"=" '{print $2}'
Thus:
echo "http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent" | awk -F"=" '{print $2}'
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
Just remove everything before the title=: sed 's/.*title=//'
$ echo "http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent" | sed 's/.*title=//'
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
Let's say:
s='http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent'
Pure BASH solution:
echo "${s/*title=}"
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
OR using grep -P:
echo "$s"|grep -oP 'title=\K.*'
[kickass.to]against.the.ropes.by.carly.fall.epub.torrent
Using sed (there is no need to mention title in the regexp for your example):
sed 's/.*=//'
Another solution uses cut, another standard Unix tool:
cut -d= -f2
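The commands above rely on title= being the only parameter in the URL. As a rough sketch for URLs that may carry further parameters, the following uses only POSIX parameter expansion. The function name extract_title and the extra &ref=example parameter are hypothetical, and the pattern would also match a parameter whose name merely ends in title:

```shell
# Hypothetical helper: print the value of the title parameter of a URL.
extract_title() {
  value=${1##*title=}   # remove everything up to and including the last "title="
  value=${value%%&*}    # remove a following "&other=param" part, if any
  printf '%s\n' "$value"
}

url='http://torcache.net/torrent/D7249CD9AF321C8578B3A7007ABBDD63B0475EEB.torrent?title=[kickass.to]against.the.ropes.by.carly.fall.epub.torrent'

extract_title "$url"
# Same URL with a made-up trailing parameter; the title is unchanged.
extract_title "${url}&ref=example"
```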

I have a file and I need to extract a particular string that follows 'LN:' on the second line

Please refer to the file contents below.
#HD VN:1.0 SO:unsorted
#SQ SN:Chr1 LN:30427680
#PG ID:bowtie2 PN:bowtie2 VN:2.1.0
How can I extract just the number 30427680 using awk or any other Unix command?
Using sed
sed -n 's/.*LN://p' < input.txt
This will erase everything up until LN:, and print what's left, and only if a substitution did take place.
Using awk
awk -v FS=: '/LN:/ { print $3; }' < input.txt
This will match lines that contain LN:, use : as field separator, and print the 3rd column.
Using grep
grep -o '[0-9]\{3,\}' < input.txt
This will match sequences of 3 or more digits, and print only the matched pattern thanks to the -o.
Depending on other cases not included in your question, you might have to make the patterns more strict.
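One stricter variant, sketched under the assumption that the value is always on the second line and consists of digits only, restricts sed to that line and captures just the digits. The temporary file and the variable names are illustrative:

```shell
# Sample header copied from the question.
input=$(mktemp)
cat > "$input" <<'EOF'
#HD VN:1.0 SO:unsorted
#SQ SN:Chr1 LN:30427680
#PG ID:bowtie2 PN:bowtie2 VN:2.1.0
EOF

# The address "2" limits the substitution to line 2; the capture group
# keeps only the digits that follow "LN:".
ln_value=$(sed -n '2s/.*LN:\([0-9][0-9]*\).*/\1/p' "$input")
printf '%s\n' "$ln_value"
rm -f "$input"
```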
Using grep:
grep -oP 'LN:\K.*' filename
Just use grep:
grep -o 30427680 file
-o, --only-matching
Prints only the matching part of the lines.
Using perl (-l appends a newline to each printed match):
perl -lne 'print $& if /LN:\K.*/' filename
or
perl -lne 'print $1 if /LN:(.*)/' filename
Another awk
awk -F"LN:" 'NF>1 {print $2}' file

What is the Unix command to display all lines of a file with two certain strings

Basically, I have a file that I want to search and display only the lines that have the strings 'abc' and 'vhg'. What is the Unix command for this?
You can use grep for it:
grep abc file.txt | grep vhg
OR
you can use awk:
awk '/abc/ && /vhg/' file.txt
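One more way with grep (this only matches lines where abc appears before vhg; the pattern is quoted so the shell does not expand it):
grep 'abc.*vhg' file.txt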
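Use the grep command with alternation:
grep 'word1\|word2\|word3' /path/to/file
Example:
grep 'abc\|vhg' filename
Note that \| means "or" (a GNU extension in basic regular expressions), so this prints lines that contain either string, not only lines that contain both.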
Since a sed solution has not yet been given:
sed -n '/abc/{ /vhg/p; }' file.txt
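The answers above differ in what they match, which a small test makes visible. The sketch below uses a made-up five-line sample file and illustrative variable names to compare "both strings in any order", "either string", and "abc before vhg":

```shell
# Hypothetical sample covering every combination.
sample=$(mktemp)
cat > "$sample" <<'EOF'
abc only
vhg only
abc then vhg
vhg then abc
neither
EOF

# Both strings, in any order: chain two greps.
both=$(grep 'abc' "$sample" | grep 'vhg')
# Either string: multiple -e patterns act as "or".
either=$(grep -e 'abc' -e 'vhg' "$sample")
# Both strings, but only with abc before vhg.
ordered=$(grep 'abc.*vhg' "$sample")

printf 'both:\n%s\n\neither:\n%s\n\nordered:\n%s\n' "$both" "$either" "$ordered"
rm -f "$sample"
```

Only the chained grep (or the equivalent awk and sed forms) answers the question as asked.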

How to search for a pattern in a file from the Linux CLI?

I've got a log file with lines like:
07:44:24||||234.234.234.234|123.123.123.123|www.website.pl/some,site.html|a:0:{}
How do I obtain only www.website.pl/some,site.html from all lines?
Can this be done with "sed" or another command?
Cut also supports delimiter and field(s) selection.
$ cut -d\| -f7
07:44:24||||234.234.234.234|123.123.123.123|www.website.pl/some,site.html|a:0:{}
www.website.pl/some,site.html
Yes, with awk.
Simply process your file with
awk -F '|' '{print $7}'
A little transcript on your example line:
$ echo '07:44:24||||234.234.234.234|123.123.123.123|www.website.pl/some,site.html|a:0:{}' | awk -F '|' '{print $7}'
www.website.pl/some,site.html
CAVEAT: This assumes there are no other pipes in your file except those used as delimiters.
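A small guard can make the awk approach a little more tolerant of malformed lines. The sketch below, which assumes a well-formed line has at least seven pipe-separated fields, skips shorter lines instead of printing an empty result. The temporary file, the truncated second line, and the variable names are made up for illustration:

```shell
# One line from the question plus a hypothetical malformed line.
log=$(mktemp)
cat > "$log" <<'EOF'
07:44:24||||234.234.234.234|123.123.123.123|www.website.pl/some,site.html|a:0:{}
truncated line|with|too few fields
EOF

# NF is the number of fields; print field 7 only when it exists.
sites=$(awk -F'|' 'NF >= 7 { print $7 }' "$log")
printf '%s\n' "$sites"
rm -f "$log"
```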
This might work for you:
echo '07:44:24||||234.234.234.234|123.123.123.123|www.website.pl/some,site.html|a:0:{}'|
sed 's/^\(\([^|]*\)|\)\{7\}.*/\2/'
www.website.pl/some,site.html
Or if the sites all begin www:
echo '07:44:24||||234.234.234.234|123.123.123.123|www.website.pl/some,site.html|a:0:{}'|
sed 's/.*\(www[^|]*\).*/\1/'
www.website.pl/some,site.html