can not use "(" in awk command - regex

I tried the following awk command on my Linux box, and it throws an error.
awk -F "VALUES(" '{print $2}'
Error :
awk: fatal: Unmatched ( or \(: /VALUES(/
I also tried escaping it with a backslash, but that doesn't work either.
awk -F "VALUES\(" '{print $2}'
Error :
awk: warning: escape sequence `\(' treated as plain `('
awk: fatal: Unmatched ( or \(: /VALUES(/
Please let me know how to include ( in awk search string.

If the value of -F is longer than one character, it is treated as a regex, so you need to do one of the following.
Use a regex character class:
kent$ echo "a foo( b"|awk -F"foo[(]" '{print $1,$2}'
a b
Or escape the (. If you really want to do that, you need:
kent$ echo "a foo( b"|awk -F"foo\\\\(" '{print $1,$2}'
a b
or
kent$ echo "a foo( b"|awk -F'foo\\(' '{print $1,$2}'
a b

You're using Linux, so you're likely using gawk. From the gawk man page on my system:
-F fs
--field-separator fs
Use fs for the input field separator (the value of the FS prede-
fined variable).
and:
Fields
As each input record is read, gawk splits the record into fields, using
the value of the FS variable as the field separator. If FS is a single
character, fields are separated by that character. If FS is the null
string, then each individual character becomes a separate field. Otherwise, FS is expected to be a full regular expression.
Since you've got multiple characters in your -F, it's being interpreted as a regular expression. And you have an opening bracket without a closing bracket.
You can solve this either by escaping the bracket (the backslash itself must be doubled so that one survives awk's string parsing, e.g. 'VALUES\\(' in single quotes), or by putting the bracket in a bracket expression ("VALUES[(]"). I recommend the latter, because escaping things with backslashes can be ugly and unpredictable, and the bracket expression will help remind you that this is a regex, not a string, when you re-read this script a few months from now.
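As a minimal sketch of the two working forms discussed above (the sample line and the variable names are made up for illustration), both of the following split on a literal VALUES( and should behave the same in any POSIX awk:

```shell
# Hypothetical sample input; the real data would come from a file.
line='INSERT INTO t VALUES(1, 2)'

# Bracket expression: no backslashes to count.
by_class=$(printf '%s\n' "$line" | awk -F 'VALUES[(]' '{print $2}')

# Escaped parenthesis: inside single quotes the shell passes both
# backslashes through, and awk's string parsing consumes one of them.
by_escape=$(printf '%s\n' "$line" | awk -F 'VALUES\\(' '{print $2}')

printf '%s\n' "$by_class" "$by_escape"
```

Both variables should end up holding the text that follows VALUES( on the line.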

Related

Using protected wildcard character in awk field separator doesn't work

I have a file that contains paragraphs separated by lines of *(any amount). When I use egrep with the regex of '^\*+$' it works as intended, only displaying the lines that contain only stars.
However, when I use the same expression in awk -F or awk FS, it doesn't work and just prints out the whole document, excluding the lines of stars.
Commands that I tried so far:
awk -F'^\*+$' '{print $1, $2}' msgs
awk -F'/^\*+$/' '{print $1, $2}' msgs
awk 'BEGIN{ FS="/^\*+$/" } ; { print $1,$2 }' msgs
Printing the first field always prints out the whole document, using the first version it excludes the lines with the stars, other versions include everything from the file.
Example input:
Par1 test teststsdsfsfdsf
fdsfdsfdsftesyt
fdsfdsfdsf fddsteste345sdfs
***
Par2 dsadawe232343a5edsfe
43s4esfsd s45s45e4t rfgsd45
***
Par3 dsadasd
fasfasf53sdf sfdsf s45 sdfs
dfsf dsf
***
Par4 dasdasda r3ar d afa fs
ds fgdsfgsdfaser ar53d f
***
Par 5 dasdawr3r35a
fsada35awfds46 s46 sdfsds5 34sdf
***
Expected output for print $1:
Par1 test teststsdsfsfdsf fdsfdsfdsftesyt fdsfdsfdsf fddsteste345sdfs
EDIT: Added example input and expected output
Strings used as regexps in awk are parsed twice:
to turn them into a regexp, and
to use them as a regexp.
So if you want to use a string as a regexp (including any time you assign a Field Separator or Record Separator as a regexp) then you need to double any escapes as each iteration of parsing will consume one of them. See https://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps for details.
Good (a literal/constant regexp):
$ echo 'a(b)c' | awk '$0 ~ /\(b)/'
a(b)c
Bad (a poorly-written dynamic/computed regexp):
$ echo 'a(b)c' | awk '$0 ~ "\(b)"'
awk: cmd. line:1: warning: escape sequence `\(' treated as plain `('
a(b)c
Good (a well-written dynamic/computed regexp):
$ echo 'a(b)c' | awk '$0 ~ "\\(b)"'
a(b)c
but IMHO if you're having to double escapes to make a char literal then it's clearer to use a bracket expression instead:
$ echo 'a(b)c' | awk '$0 ~ "[(]b)"'
a(b)c
Also, ^ in a regexp means "start of string", which is only matched at the start of all the input, just like $ would only be matched at the end of all of the input. ^ does not mean "start of line" as some documents/scripts may lead you to believe. It only appears to mean that in grep and sed because they are line-oriented and so usually the script is being compared to one line at a time, but awk isn't line-oriented, it's record-oriented, and so the input being compared to the regexp isn't necessarily just a line (the same is true in sed if you read multiple lines into its hold space).
So to match a line of *s as a Record Separator (RS) assuming you're using gawk or some other awk that can treat a multi-char RS as a regexp, you'd have to write this regexp:
(^|\n)[*]+(\n|$)
but be aware that also matches the newlines before the first and after the last *s on the target lines so you need to handle that appropriately in your code.
It seems like this is what you're really trying to do:
$ awk -v RS='(^|\n)[*]+(\n|$)' 'NR==1{$1=$1; print}' file
Par1 test teststsdsfsfdsf fdsfdsfdsftesyt fdsfdsfdsf fddsteste345sdfs
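If your awk cannot treat a multi-character RS as a regexp, a rough alternative is to keep the default line-by-line records and collect lines until a line of stars is seen. Because each record is then a single line, ^ and $ do anchor to that line. This is only a sketch: the function name join_paragraphs is made up, and it joins lines with a single space rather than re-splitting fields the way $1=$1 does.

```shell
# Hypothetical helper: reads text on stdin, prints one line per paragraph,
# where paragraphs are separated by lines consisting only of '*'.
join_paragraphs() {
    awk '
        /^[*]+$/ {                      # separator line: flush the paragraph
            if (buf != "") print buf
            buf = ""
            next
        }
        { buf = (buf == "" ? $0 : buf " " $0) }   # append the line
        END { if (buf != "") print buf }          # last paragraph, if any
    '
}

printf '%s\n' 'Par1 a' 'b c' '***' 'Par2 d' '*****' | join_paragraphs
```

To get only the first paragraph, as in the question, the output could be piped through head -n 1.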

How to grep value from a php array

I have simple php array in a php file. Here is the content :
<?php
$arr = array(
'fookey' => 'foovalue',
'barkey' => 'barvalue'
);
How can I fetch value foovalue using grep command ?
I have tried :
cat file.php | grep 'fookey=>'
Or
cat file.php | grep 'fookey=>*'
but always return the full line.
Your grep command shouldn't have worked if you ran it exactly the way you posted it here.
But if you are getting that line from grep some other way,
pipe the output you got from grep to
awk -F"'" '{print $4}'
I tested it this way on my pc:
echo "'fookey' => 'foovalue'" | awk -F"'" '{print $4}'
grep 'fookey=>' doesn't return any matches because this regex is not matched. Your example shows a record with single quotes around fookey and a space before the =>.
Also, you want to lose the useless use of cat.
Because your regex contains literal single quotes, we instead use double quotes to protect the regex from the shell.
grep "'fookey' =>" file.php
If your goal is to extract the value inside single quotes after the => the simple standard solution is to use sed instead of grep. On a matching line, replace the surrounding text with nothing before printing the line.
sed "/.*'fookey' => '/!d;s///;s/'.*//" file.php
In some more detail,
/.*'fookey' => '/!d skips any lines which do not match this regex;
s/// replaces the matched regex (which is implied when you pass in an empty regex) with nothing;
s/'.*// replaces everything after the remaining single quote with nothing;
and then sed prints the resulting line (because that's what it always does)
If you get "event not found" errors, you want to set +H or (in the very unlikely event that you really want to use Csh history expansion) figure out how to escape the !; see also echo "#!" fails -- "event not found"
Other than that, we are lucky that the script doesn't contain any characters which are special within double quotes; generally speaking, single quotes are much safer because they really preserve the text between them verbatim, whereas double quotes in the shell are weaker (you have to separately escape any dollar signs, backquotes, or backslashes).
This should do:
awk -F "'" '$2~/fookey/ {print $4}' file
or in your case
awk -F "'" '$2~/secret/ {print $4}' file
It searches for all lines whose second field contains fookey (or secret) and then prints the fourth field, which holds your password.
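To see why the value lands in $4, it can help to print every field that awk produces when it splits on the single quote. A small sketch using the sample line from the question (the variable names are only for illustration):

```shell
line="'fookey' => 'foovalue',"

# Show each field awk produces when splitting on the single quote.
printf '%s\n' "$line" |
    awk -F "'" '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'

# Matching the key exactly (rather than with a regexp) avoids also
# matching keys that merely contain "fookey".
value=$(printf '%s\n' "$line" | awk -F "'" '$2 == "fookey" { print $4 }')
printf '%s\n' "$value"
```

The first field is empty because the line starts with the separator, so the key is field 2 and the value is field 4.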
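To fetch a value from the array, why not do it in PHP instead of using grep? Indexing by key returns the value; array_search goes the other way and returns the key for a given value:
<?php
$arr = array(
'fookey' => 'foovalue',
'barkey' => 'barvalue'
);
echo $arr['fookey'], "\n";                 // prints foovalue
echo array_search("foovalue", $arr), "\n"; // prints fookey
?>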
You can use cut in combination with grep to get what you need.
cat file.php | grep 'fookey' | cut -c18-25
cut extracts a substring. In -cN-M, N and M are the starting and ending character positions (1-based, inclusive), so the right numbers depend on how far the line is indented.

AWK: get file name from LS

I have a list of file names (name plus extension) and I want to extract the name only without the extension.
I'm using
ls -l | awk '{print $9}'
to list the file names and then
ls -l | awk '{print $9}' | awk /(.+?)(\.[^.]*$|$)/'{print $1}'
But I get an error on escaping the (:
-bash: syntax error near unexpected token `('
The regex (.+?)(\.[^.]*$|$) to isolate the name has a capture group and I think it is correct, but I don't get why it is not working within awk syntax.
My list of files is like this ABCDEF.ext in the root folder.
Your specific error is caused by the fact that your awk command is incorrectly quoted. The single quotes should go around the whole command, not just the { action } block.
However, you cannot use capture groups like that in awk. $1 refers to the first field, as defined by the input field separator (which in this case is the default: one or more "blank" characters). It has nothing to do with the parentheses in your regex.
Furthermore, you shouldn't start from ls -l to process your files. I think that in this case your best bet would be to use a shell loop:
for file in *; do
printf '%s\n' "${file%.*}"
done
This uses the shell's built-in capability to expand * to the list of everything in the current directory and removes the .* from the end of each name using a standard parameter expansion.
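A short sketch of how that parameter expansion behaves on a few names (the helper name strip_ext and the sample names other than ABCDEF.ext are made up for illustration):

```shell
# Hypothetical helper: print a name with its last extension removed.
strip_ext() {
    printf '%s\n' "${1%.*}"
}

strip_ext 'ABCDEF.ext'
strip_ext 'archive.tar.gz'
strip_ext 'README'
```

${1%.*} removes the shortest suffix matching .*, so archive.tar.gz becomes archive.tar and a name without a dot is left unchanged. A name such as .bashrc would be reduced to an empty string, so hidden files may need separate handling.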
If you really really want to use awk for some reason, and all your files have the same extension .ext, then I guess you could do something like this:
printf '%s\0' * | awk -v RS='\0' '{ sub(/\.ext$/, "") } 1'
This prints all the paths in the current directory, and uses awk to remove the suffix. Each path is followed by a null byte \0 - this is the safe way to pass lists of paths, which in principle could contain any other character.
Slightly less robust but probably fine in most cases would be to trust that no filenames contain a newline, and use \n to separate the list:
printf '%s\n' * | awk '{ sub(/\.ext$/, "") } 1'
Note that the standard tool for simple substitutions like this one would be sed:
printf '%s\n' * | sed 's/\.ext$//'
(.+?) is a PCRE construct. awk uses EREs, not PCREs. Also you have the opening script delimiter ' in the middle of the script AFTER the condition instead of where it belongs, before the start of the script.
The syntax for any command (awk, sed, grep, whatever) is command 'script', so this should be awk 'condition{action}', not awk condition'{action}'.
But, in any case, as mentioned by @Aaron in the comments, don't parse the output of ls; see http://mywiki.wooledge.org/ParsingLs
Try this.
ls -l | awk '{ s=""; for (i=9;i<=NF;i++) { s = s" "$i }; sub(/\.[^.]+$/,"",s); print s}'
Notes:
Parsing ls -l output is fragile.
It doesn't check what the items are (files? directories?), so it strips extensions everywhere.
Read the other answers :D
If the extension is always the same pattern try a sed replacement:
ls -l | awk '{print $9}' | sed 's/\.ext$//'

Bash: regular expressions within backticks

I have a file called "align_summary.txt" which looks like this:
Left reads:
Input : 26410324
Mapped : 21366875 (80.9% of input)
of these: 451504 ( 2.1%) have multiple alignments (4372 have >20)
...more text....
... and several more lines of text....
I want to pull out the % of multiple alignments among all left aligned reads (in this case it's 2.1) in bash shell.
If I use this:
pcregrep -M "Left reads.\n..+.\n.\s+Mapped.+.\n.\s+of these" align_summary.txt | awk -F"\\\( " '{print $2}' | awk -F"%" '{print $1}' | sed -n 4p
It promptly gives me the output: 2.1
However, if I enclose the same expression in backticks like this:
leftmultiple=`pcregrep -M "Left reads.\n..+.\n.\s+Mapped.+.\n.\s+of these" align_summary.txt | awk -F"\\\( " '{print $2}' | awk -F"%" '{print $1}' | sed -n 4p`
I receive an error:
awk: syntax error in regular expression ( at
input record number 1, file
source line number 1
As I understand it, enclosing this expression in backticks affects the interpretation of the regular expression that includes "(" symbol, despite the fact that it is escaped by backslashes.
Why does this happen and how to avoid this error?
I would be grateful for any input and suggestions.
Many thanks,
Just use awk:
leftmultiple=$(awk '/these:.*multiple/{sub(" ","",$2);print $2}' FS='[(%]' align_summary.txt )
Always use $(...) instead of backticks but more importantly, just use awk alone:
$ leftmultiple=$( gawk -v RS='^$' 'match($0,/Left reads.\s*\n\s+.+\n\s+Mapped.+.\n.\s+of these[^(]+[(]\s*([^)%]+)/,a) { print a[1] }' align_summary.txt )
$ echo "$leftmultiple"
2.1
The above uses GNU awk 4.* and assumes you do need the complicated regexp that you were using to avoid false matches elsewhere in your input file. If that's not the case then the script can of course get much simpler.
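As for why the backtick version fails: inside backticks the shell removes one level of backslashes (a backslash followed by another backslash, a $ or a backtick is consumed) before the command is run, so awk receives a different -F value than it does at the prompt. A small sketch that prints what each form passes along (the variable names are made up):

```shell
# The same argument text, captured through both substitution styles.
with_dollar=$(printf '%s' "\\\( ")
with_backticks=`printf '%s' "\\\( "`

printf 'dollar-paren: [%s]\n' "$with_dollar"
printf 'backticks:    [%s]\n' "$with_backticks"
```

With $(...), awk gets two backslashes before the parenthesis, which its string parsing reduces to the regexp \( followed by a space. With backticks it gets only one backslash, which is reduced to a bare ( and then fails as an unmatched parenthesis.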

How to remove/strip double or single quote from a string?

I have a file with some lines like these:
ENVIRONMENT="myenv"
ENV_DOMAIN='mydomain.net'
LOGIN_KEY=mykey.pem
I want to extract the parts after the = but without the surrounding quotes. I tried with gsub like this:
awk -F= '!/^(#|$)/ && /^ENVIRONMENT=/ {gsub(/"|'/, "", $2); print $2}'
Which ends up with -bash: syntax error near unexpected token ')' error. It works just fine for single matching: /"/ or /'/ but doesn't work when I try match either one. What am I doing wrong?
If you are just trying to remove the punctuation then you can do it as below:
# remove all punctuation (note that this also removes the dots in the values)
awk -F= '{print $2}' n.dat | tr -d '[:punct:]'
# only remove single and double quotes
awk -F= '{print $2}' n.dat | tr -d \''"\'
Explanation:
tr -d \''"\' deletes any single quotes and double quotes (and, as written, backslashes too).
tr -d '[:punct:]' deletes all characters in the punctuation class.
Sample output as below from 2nd command above (without quotes):
myenv
mydomain.net
mykey.pem
The problem is not with awk, but with bash. The single quote inside the gsub is closing the open quote so that bash is trying to parse the command awk with arguments !/^...gsub(/"|/,, ,, $2 and then an unmatched close paren. Try replacing the single quote with '"'"' (so that bash will properly terminate the string, then apply a single quote, then reopen another string.)
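Applied to the command from the question, that replacement might look like the following sketch. The sample line is fed on stdin here only to keep the example self-contained, and the variable name is made up:

```shell
# '"'"' = close the single-quoted string, add a double-quoted ', reopen.
env_name=$(printf '%s\n' 'ENVIRONMENT="myenv"' |
    awk -F= '!/^(#|$)/ && /^ENVIRONMENT=/ {gsub(/"|'"'"'/, "", $2); print $2}')
printf '%s\n' "$env_name"
```

After the shell has joined the three quoted pieces, awk sees the regexp /"|'/ exactly as it was written in the question.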
Is awk really a requirement? If not, why don't you use a simple sed command:
sed -rn -e "s/^[^#]+='(.*)'$/\1/p" \
-e "s/^[^#]+=\"(.*)\"$/\1/p" \
-e "s/^[^#]+=(.*)/\1/p" data
This might seem over-engineered, but it works properly with embedded quotes:
sh$ cat data
ENVIRONMENT="myenv"
ENV_DOMAIN='mydomain.net'
LOGIN_KEY=mykey.pem
PASSWD="good ol'passwd"
sh$ sed -rn -e "s/^[^#]+='(.*)'/\1/p" -e "s/^[^#]+=\"(.*)\"/\1/p" -e "s/^[^#]+=(.*)/\1/p" data
myenv
mydomain.net
mykey.pem
good ol'passwd
You can use awk like this:
awk -F "=['\"]?|['\"]" '{print $2}' file
myenv
mydomain.net
mykey.pem
This will work with your awk
awk -F= '!/^(#|$)/ && /^ENVIRONMENT=/ {gsub(/"/,"",$2);gsub(q,"",$2); print $2}' q=\' file
It is the single quote in the expression that creates the problem. Put it in a variable and it will work.
I did the following:
awk -F"=\"|='|'|\"|=" '{print $2}' file
myenv
mydomain.net
mykey.pem
This tells awk to use any of =", =', ', " or = as the field separator.
This is because the awk program is normally enclosed in single quotes when run as a command-line program, so a single quote inside the script trips it up. There are tricks for using single quotes inside the program; see Shell-Quoting Issues in the GNU Awk Manual.
One trick is to save the match string as a variable:
awk -F\= -v s="'|\"" '{gsub(s, "", $2); print $2}' file
Output:
myenv
mydomain.net
mykey.pem
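A further option, sketched here as an assumption rather than something taken from the answers above, is to write the single quote as the octal escape \047 inside a bracket expression, so that no literal single quote appears in the script at all. The function name strip_quotes is hypothetical:

```shell
# Hypothetical helper: print the value after the first '=' on each
# non-comment line of stdin, with single and double quotes removed.
strip_quotes() {
    awk '!/^(#|$)/ {
        value = substr($0, index($0, "=") + 1)   # keep any later "=" intact
        gsub(/["\047]/, "", value)               # \047 is the single quote
        print value
    }'
}

printf '%s\n' 'ENVIRONMENT="myenv"' "ENV_DOMAIN='mydomain.net'" 'LOGIN_KEY=mykey.pem' |
    strip_quotes
```

Note that, like the gsub-based answers, this removes every quote character in the value, including embedded ones.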