How to remove/strip double or single quote from a string? - regex

I have a file with some lines like these:
ENVIRONMENT="myenv"
ENV_DOMAIN='mydomain.net'
LOGIN_KEY=mykey.pem
I want to extract the parts after the = but without the surrounding quotes. I tried with gsub like this:
awk -F= '!/^(#|$)/ && /^ENVIRONMENT=/ {gsub(/"|'/, "", $2); print $2}'
This fails with the error -bash: syntax error near unexpected token ')'. It works fine when matching a single character, /"/ or /'/, but not when I try to match either one. What am I doing wrong?

If you just want to remove the punctuation, you can do it as below:
# remove all punctuation
awk -F= '{print $2}' n.dat | tr -d '[:punct:]'
# only remove single and double quotes
awk -F= '{print $2}' n.dat | tr -d \'\"
Explanation:
tr -d \'\" deletes every single and double quote.
tr -d '[:punct:]' deletes every character in the punctuation class, which also removes the dot in mykey.pem.
Sample output from the 2nd command above (quotes removed):
myenv
mydomain.net
mykey.pem

The problem is not with awk, but with bash. The single quote inside the gsub closes the opening quote, so bash tries to run awk with the arguments !/^...gsub(/"|/,, ,, $2 and then hits an unmatched close paren. Try replacing the single quote with '"'"' (bash then terminates the string, appends a double-quoted single quote, and reopens the string).
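As a minimal sketch of that quoting trick, the function below wraps the corrected awk call. The name strip_quotes_value and the exact match on the key are assumptions made for illustration, not part of the original command.

```shell
# Hypothetical helper: print the value of one KEY=value line with all quotes removed.
# $1 = key name, $2 = file to read
strip_quotes_value() {
    # The sequence '"'"' closes the single-quoted string, adds a double-quoted
    # single quote, and reopens the string, so awk receives the regex /"|'/.
    awk -F= -v key="$1" '$1 == key { gsub(/"|'"'"'/, "", $2); print $2 }' "$2"
}
```

Called as strip_quotes_value ENVIRONMENT file, this should print myenv for the sample file. Because the field separator is =, a value that itself contains = would be cut short.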

Is awk really a requirement? If not, why don't you use a simple sed command:
sed -rn -e "s/^[^#]+='(.*)'$/\1/p" \
-e "s/^[^#]+=\"(.*)\"$/\1/p" \
-e "s/^[^#]+=(.*)/\1/p" data
This might seem over-engineered, but it handles embedded quotes properly:
sh$ cat data
ENVIRONMENT="myenv"
ENV_DOMAIN='mydomain.net'
LOGIN_KEY=mykey.pem
PASSWD="good ol'passwd"
sh$ sed -rn -e "s/^[^#]+='(.*)'/\1/p" -e "s/^[^#]+=\"(.*)\"/\1/p" -e "s/^[^#]+=(.*)/\1/p" data
myenv
mydomain.net
mykey.pem
good ol'passwd
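If no external tool is wanted at all, the same idea (strip only a matching pair of surrounding quotes and keep embedded ones) can be sketched with POSIX parameter expansion. The function names unquote and print_values are hypothetical, and the sketch assumes one KEY=value pair per line.

```shell
# Strip one pair of matching surrounding quotes from a value, if present.
unquote() {
    v=$1
    case $v in
        \"*\") v=${v#\"}; v=${v%\"} ;;   # value wrapped in double quotes
        \'*\') v=${v#\'}; v=${v%\'} ;;   # value wrapped in single quotes
    esac
    printf '%s\n' "$v"
}

# Print the unquoted value of every KEY=value line in file $1,
# skipping blank lines and lines that start with #.
print_values() {
    while IFS= read -r line; do
        case $line in
            ''|'#'*) continue ;;
            *=*) unquote "${line#*=}" ;;
        esac
    done < "$1"
}
```

Everything after the first = is treated as the value, so a quote inside the value, as in the PASSWD line, is left alone.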

You can use awk like this:
awk -F "=['\"]?|['\"]" '{print $2}' file
myenv
mydomain.net
mykey.pem

This will work with your awk:
awk -F= '!/^(#|$)/ && /^ENVIRONMENT=/ {gsub(/"/,"",$2);gsub(q,"",$2); print $2}' q=\' file
The single quote inside the expression is what causes the problem. Put it in a variable and it will work.

I did the following:
awk -F"=\"|='|'|\"|=" '{print $2}' file
myenv
mydomain.net
mykey.pem
This tells awk to use any of =", =', ', " or = as the field separator.

This is because the awk program must be enclosed in single quotes when it is given on the command line, so a single quote inside the script ends the program text early. There are tricks for using single quotes inside the program. See Shell-Quoting Issues in the GNU Awk Manual.
One trick is to save the match string as a variable:
awk -F\= -v s="'|\"" '{gsub(s, "", $2); print $2}' file
Output:
myenv
mydomain.net
mykey.pem
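Another way to keep the literal single quote out of the program text, not mentioned in the answers above, is the octal escape \047, which awk regular expressions accept. A small sketch, with strip_all_quotes as a made-up name:

```shell
# Remove every single and double quote from the second =-separated field.
# Reads the files given as arguments, or standard input if there are none.
strip_all_quotes() {
    # \047 is the octal code for ', so the program contains no literal single quote.
    awk -F= '{ gsub(/["\047]/, "", $2); print $2 }' "$@"
}
```

This keeps the whole program inside one pair of single quotes, at the cost of a slightly less readable regex.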

Related

shell multiline selection from word to character

.textexpandrc
[yoro] よろしくお願いします。
[ohayo] おはようございます。
元気ですか?
[otsu] お疲れさまでします。
Looking for
$ KEY=ohayo; awk "???" ~/.textexpandrc
おはようございます。
元気ですか?
awk or sed is fine, but I'd like to avoid using a mix of awk/sed/perl/tr/cut etc because I'm under the impression that awk is robust enough to handle this on its own.
The best I could find on my own was
$ KEY=ohayo; awk "/\[${KEY}/,/\[otsu/" ~/.textexpandrc | sed "s/\[${KEY}\] //" | grep -v otsu
おはようございます。
元気ですか?
But I need to know the next key in advance (not impossible but ugly). Strangely, when I ask awk to search up to the next square bracket, it fails to select more than one line:
$ KEY=ohayo; awk "/\[${KEY}/,/\[/" ~/.textexpandrc
[ohayo] おはようございます。
I am currently using a single-line parser solution, as follows:
#!/usr/bin/env bash
CONFIG=${HOME}/.textexpandrc
ALL_KEYS=$(sed 's/\].*/]/' ${CONFIG} | tr -d '[]')
KEY=$(echo $ALL_KEYS | rofi -sep ' ' -dmenu -p "autocomplete")
grep "\[${KEY}\]" $CONFIG | sed "s/\[${KEY}\] //" | xsel -ib # ← HERE
xdotool key ctrl+shift+v
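If you set the RS and FS variables to match [ and ], this works quite well:
awk 'BEGIN { RS = "["; FS = "] " } $1 == key { printf "%s", $2 }' key=ohayo tmp.txt
Each record is then one entry, $1 is the key and $2 is the text, including any continuation lines. Using printf instead of print avoids a doubled newline, because $2 already ends with one. You pass in the key you're searching for with key=... on the command line instead of interpolating a shell variable into the script. This makes it much easier to write the awk script within single quotes.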
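A portable alternative, sketched under the assumption that every entry starts with [key] at the beginning of a line, is to track whether the current line belongs to the requested entry. It also shows why the range attempt /\[KEY/,/\[/ stops after one line: awk tests the end pattern against the same line that opened the range, and that line already contains a [. The function name expand_key is made up for this example.

```shell
# Print the text stored under one key, including continuation lines.
# $1 = key, $2 = config file
expand_key() {
    awk -v key="$1" '
        /^\[[^]]*\]/ {                                   # a line that starts a new entry
            end = index($0, "]")                         # position of the closing bracket
            printing = (substr($0, 2, end - 2) == key)   # is this the requested entry?
            if (printing) {
                line = substr($0, end + 1)               # text after the [key] tag
                sub(/^ /, "", line)                      # drop the separating space
                print line
            }
            next
        }
        printing                                         # continuation line of that entry
    ' "$2"
}
```

Usage would look like expand_key ohayo ~/.textexpandrc; the next key does not need to be known in advance.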

Access the word in the file with grep

I have a conf file and I use grep to read data from it, but that is not very convenient for me.
How can I get just the value for a search term?
I am using:
grep "export:" /etc/VDdatas.conf
Print:
export: HelloWorld
I want: (without "export: ")
HelloWorld
How can I do that?
If you're using GNU grep you can use PCRE and a lookbehind:
grep -P -o '(?<=export: ).*' /etc/VDdatas.conf
The -o option means to print only the part of the line that matches the regexp, and using a lookbehind for the "export: " prefix (including the space) keeps the prefix out of the match.
You can also use sed or awk:
sed -n 's/^export: //p' /etc/VDdatas.conf
awk '/export:/ {print $2}' /etc/VDdatas.conf
I suggest you pipe the match to awk.
grep "export:" /etc/VDdatas.conf | awk -F ' ' '{print $2}'
This will print the second word in the output (after splitting the line on spaces).
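Note that print $2 returns only the first word of the value. If values may contain spaces, a sketch like the following strips the prefix instead; conf_value is a hypothetical name and the "key: value" layout is assumed from the single example line.

```shell
# Print everything after "key:" for lines that start with that key.
# $1 = key without the colon, $2 = file
conf_value() {
    awk -v key="$1" '
        index($0, key ":") == 1 {              # line begins with "key:"
            v = substr($0, length(key) + 2)    # text after the colon
            sub(/^[ \t]+/, "", v)              # drop leading blanks
            print v
        }
    ' "$2"
}
```

For the example, conf_value export /etc/VDdatas.conf should print HelloWorld.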

Remove everything after 2nd occurrence in a string in unix

I would like to remove everything after the 2nd occurrence of a particular
pattern in a string. What is the best way to do it in Unix? What is the most elegant and simple method to achieve this: sed, awk, or just Unix commands like cut?
My input would be
After-u-math-how-however
Output should be
After-u
Everything after the 2nd - should be stripped out. The regex should also match
zero occurrences of the pattern, so zero or one occurrence should be ignored and
from the 2nd occurrence everything should be removed.
So if the input is as follows
After
Output should be
After
Something like this would do it.
echo "After-u-math-how-however" | cut -f1,2 -d'-'
This will split up (cut) the string into fields, using a dash (-) as the delimiter. Once the string has been split into fields, cut will print the 1st and 2nd fields.
This might work for you (GNU sed):
sed 's/-[^-]*//2g' file
You could use the following regex to select what you want:
^[^-]*-\?[^-]*
For example:
echo "After-u-math-how-however" | grep -o "^[^-]*-\?[^-]*"
Results:
After-u
@EvanPurkisher's cut -f1,2 -d'-' solution is IMHO the best one, but since you asked about sed and awk:
With GNU sed for -r:
$ echo "After-u-math-how-however" | sed -r 's/([^-]+-[^-]*).*/\1/'
After-u
With GNU awk for gensub():
$ echo "After-u-math-how-however" | awk '{$0=gensub(/([^-]+-[^-]*).*/,"\\1","")}1'
After-u
Can be done with non-GNU sed using \( and *, and with non-GNU awk using match() and substr() if necessary.
awk -F - '{print $1 (NF>1? FS $2 : "")}' <<<'After-u-math-how-however'
Split the line into fields based on field separator - (option spec. -F -) - accessible as special variable FS inside the awk program.
Always print the 1st field (print $1), followed by:
If there's more than 1 field (NF>1), append FS (i.e., -) and the 2nd field ($2)
Otherwise: append "", i.e.: effectively only print the 1st field (which in itself may be empty, if the input is empty).
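The same logic generalizes to any delimiter and any number of leading fields. The sketch below is an illustration only; keep_first_fields and its argument order are assumptions, and the delimiter is assumed to be a single character.

```shell
# Keep only the first N delimiter-separated fields of each input line.
# $1 = single-character delimiter, $2 = number of fields to keep
keep_first_fields() {
    awk -F"$1" -v n="$2" '{
        out = $1
        # Append fields 2..n, but never more fields than the line has.
        for (i = 2; i <= n && i <= NF; i++) out = out FS $i
        print out
    }'
}
```

For example, echo After-u-math-how-however | keep_first_fields - 2 prints After-u, and a line without any delimiter is passed through unchanged.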
This can be done in pure bash (which means no fork, no external process). Read into an array split on '-', then slice the array:
$ IFS=-
$ read -ra val <<< After-u-math-how-however
$ echo "${val[*]}"
After-u-math-how-however
$ echo "${val[*]:0:2}"
After-u
awk '$0 = $2 ? $1 FS $2 : $1' FS=-
Result
After-u
After
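This will do it in awk (the i<=NF test avoids a trailing - when the input has no second field, and the final print adds the newline):
echo "After" | awk -F "-" '{printf "%s",$1; for (i=2; i<=2 && i<=NF; i++) printf "-%s",$i; print ""}'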
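For a single string held in a shell variable, POSIX parameter expansion is enough and needs neither an array nor a changed IFS. This is a sketch with a hypothetical function name, first_two, and a hard-coded - delimiter.

```shell
# Print the argument cut after its second dash-separated field.
first_two() {
    s=$1
    case $s in
        *-*-*)                          # at least two dashes: there is something to cut
            head=${s%%-*}               # text before the first dash
            rest=${s#*-}                # text after the first dash
            printf '%s-%s\n' "$head" "${rest%%-*}"
            ;;
        *)  printf '%s\n' "$s" ;;       # zero or one dash: leave unchanged
    esac
}
```

first_two After-u-math-how-however prints After-u, and first_two After prints After. The variables s, head and rest are global here, since plain POSIX sh has no local.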

How to search pattern in a file by Linux CLI?

I've got log file with lines like:
07:44:24||||234.234.234.234|123.123.123.123|www.website.pl/some,site.html|a:0:{}
How do I obtain only www.website.pl/some,site.html from all lines?
Can this be done with "sed" or other command?
Cut also supports delimiter and field(s) selection.
$ cut -d\| -f7
07:44:24||||234.234.234.234|123.123.123.123|www.website.pl/some,site.html|a:0:{}
www.website.pl/some,site.html
Yes, with awk.
Simply process your file with
awk -F '|' '{print $7}'
A little transcript on your example line:
$ echo '07:44:24||||234.234.234.234|123.123.123.123|www.website.pl/some,site.html|a:0:{}' | awk -F '|' '{print $7}'
www.website.pl/some,site.html
CAVEAT: This assumes there are no other pipes in your file except those used as delimiters.
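If the number of leading fields can vary but the URL is always the second-to-last field, counting from the end is an option. That layout is only an assumption based on the single sample line, and url_field is a made-up name.

```shell
# Print the second-to-last |-separated field of each line.
# Reads the files given as arguments, or standard input if there are none.
url_field() {
    # Empty fields between consecutive pipes still count toward NF.
    awk -F'|' 'NF >= 2 { print $(NF - 1) }' "$@"
}
```

On the sample line NF is 8, so $(NF - 1) is the same field as $7.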
This might work for you:
echo '07:44:24||||234.234.234.234|123.123.123.123|www.website.pl/some,site.html|a:0:{}'|
sed 's/^\(\([^|]*\)|\)\{7\}.*/\2/'
www.website.pl/some,site.html
Or if the sites all begin www:
echo '07:44:24||||234.234.234.234|123.123.123.123|www.website.pl/some,site.html|a:0:{}'|
sed 's/.*\(www[^|]*\).*/\1/'
www.website.pl/some,site.html

Get An Specified Match Under a String

I'm trying to match the contents of a string that contains sequences of quotes using a shell script. The furthest I have got so far is this:
et="\"He\" \"llo\""
echo $et | sed -e '/\"(.*?)\"/g'
Which returns this:
"He" "llo"
But I don't want the quote marks to appear on the result, also how can I echo only the first, or the second, or the third, etc. match?
sed -e 's/"\([^"]*\)"/\1/g' will remove quotes around balanced " quotes. To only show the first, second match etc with sed you probably have to make different capture groups.
$ echo '"1" "2" "3"' | sed -e 's/"\([^"]*\)" "\([^"]*\)" "\([^"]*\)"/\2/g'
2
$
Provided that what is wanted is only the text between the first pair of quotes, here is a solution with perl (the capture group keeps the quotes themselves out of the output):
echo $et | perl -ne '/"([^"]+)"/ and print "$1\n";'
This variant also handles quotes within quotes if they are preceded by a backslash:
echo $et | perl -ne '/"([^"\\]*(?:\\.[^"\\]*)*)"/ and print "$1\n";'
This is much simpler with awk since you can specify the double-quote to be the field separator.
$ et='"He" "llo"'
$ awk -F'"' '{print $2}' <<<$et
He
$ awk -F'"' '{print $4}' <<<$et
llo
Note: This also scales. The quoted strings land in the even-numbered fields, i.e. $2, $4, $6, etc.
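Building on the even-numbered fields, the two helpers below print either every quoted string or only the n-th one. The names quoted_strings and nth_quoted are hypothetical, and the sketch assumes balanced double quotes with no escaped quotes inside them.

```shell
# Print every double-quoted string on standard input, one per line.
quoted_strings() {
    awk -F'"' '{ for (i = 2; i <= NF; i += 2) print $i }'
}

# Print only the n-th double-quoted string of each input line.
# $1 = 1-based index of the wanted match
nth_quoted() {
    awk -F'"' -v n="$1" '{ print $(2 * n) }'
}
```

With et='"He" "llo"', printf '%s\n' "$et" | nth_quoted 2 prints llo. Asking for a match that does not exist prints an empty line.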
You can also do something like this (the three-argument form of match() requires GNU awk):
[srikanth@myhost ~]$ echo "\"He\" \"llo\"" | awk ' { match($0,/([A-Za-z]+)[" ]+([A-Za-z]+)/,a); print a[1]","a[2]} '
He,llo