Shell dynamic variable as condition inside AWK - if-statement

I have a sh (POSIX) script where I'm trying to choose a host from a list of hosts (SSH). This list has some useful info alongside the host names, like free memory available, users already connected... I want the user of my script to choose which characteristics the script should take into consideration when choosing a suitable host from said list.
As these characteristics are optional, my idea was to build a conditional statement in a shell variable (just text) based on the options the user has chosen, and then pass this variable to AWK to extract the suitable hosts and choose one of them randomly. At the moment, said statement or FILTER could look like this:
FILTER='$6 <=MAXUSERS && $7 <=MAXUSERS'
FILTER='$6 <=MAXUSERS && $7 <=MAXUSERS && $9 >=10'
FILTER='$6 <=MAXUSERS && $7 <=MAXUSERS && $9 >=10 && $11 >=8'
($X are just positional parameters representing characteristics of the host; I know their positions, and they should not change as long as the HTML source doesn't change either.) I build those statements with this piece of code:
setUpFilter(){
    FILTER='$6 <=MAXUSERS && $7 <=MAXUSERS'
    if [ -n "$CPUNUM" ]; then
        FILTER="${FILTER} && \$9 >=$CPUNUM"
    fi
    if [ -n "$FREEMEM" ]; then
        FILTER="${FILTER} && \$12 >=$FREEMEM"
    fi
}
Once the statement is built, I use this:
filterHosts(){
    setUpFilter
    for HOST in $(echo "$AVAILABLEHOSTS"); do # $AVAILABLEHOSTS is a list with only the host names
        HOSTDATA=$(echo "$REPORTCLEAN" | egrep -A 11 "$HOST") # $REPORTCLEAN is a list of hosts and their data extracted from an HTML file
        FILTEREDHOSTS=${FILTEREDHOSTS}$(echo $HOSTDATA | awk -v FILTER="$FILTER" -v MAXUSERS=5 '{
            if (FILTER)
                print $1 "\\n"
        }')
    done
}
However, this does not work as expected: the "filter" seems to do nothing, and all the hosts end up stored in $FILTEREDHOSTS as if the condition were true in every iteration (and I'm not sure why). However, when I manually set the condition inside the AWK script, the "filter" works as expected:
filterHosts(){
    setUpFilter
    for HOST in $(echo "$AVAILABLEHOSTS"); do # $AVAILABLEHOSTS is a list with only the host names
        HOSTDATA=$(echo "$REPORTCLEAN" | egrep -A 11 "$HOST") # $REPORTCLEAN is a list of hosts and their data extracted from an HTML file
        FILTEREDHOSTS=${FILTEREDHOSTS}$(echo $HOSTDATA | awk -v MAXUSERS=5 '{
            if ($6 <= MAXUSERS && $7 <= MAXUSERS && $9 >= 10)
                print $1 "\\n"
        }')
    done
}
$HOSTDATA looks like this, by the way: HOSTNAME PARAM2 PARAM3 ... PARAM12. I want to retrieve $1 (the host name) based on some of the params matching some values.
I know that this second piece of code works because I checked the list manually after executing it.
I've tried making AWK print FILTER, and apparently the statement is correct, since it prints just like the first examples I wrote above, with nothing expanded and nothing unexpected, literally like the examples.
I'm not that good with AWK, so I have no idea what could be happening; I'm not even sure if this is an issue with AWK at all. On the other hand, I'm wondering if this is the best approach for what I'm trying to accomplish.
Thanks in advance.
UPDATE
As some users suggested, I moved the 'filter' construction into the awk script itself, so I ended up with this:
filterHosts(){
    for HOST in $(echo "$AVAILABLEHOSTS"); do
        HOSTDATA=$(echo "$REPORTCLEAN" | egrep -A 11 "$HOST")
        FILTEREDHOSTS=${FILTEREDHOSTS}$(echo $HOSTDATA | awk -v cpunum="$CPUNUM" -v freemem="$FREEMEM" -v MAXUSERS="$1" '{
            filter = $6 <= MAXUSERS && $7 <= MAXUSERS
            if (length(cpunum) != 0)
                filter = filter && $9 >= cpunum
            if (length(freemem) != 0)
                filter = filter && $12 >= freemem
            if (filter)
                print $1 "\\n"
        }')
    done
}
Apparently this works with no issues at first glance, so I consider this issue 'semi-solved': I'm not sure why the previous code did not work, but I have a different piece of code that does.
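A note for readers on why the first version matched everything: awk never evaluates the contents of a variable as code. FILTER arrives as plain string data, and in a condition a non-empty, non-numeric string is simply true, so "if (FILTER)" succeeds on every line. A minimal demonstration:

$ echo '1 2' | awk -v FILTER='$1 > 100' '{ if (FILTER) print "matched" }'
matched

If the condition really must be assembled dynamically, one option is to splice it into the awk program text itself, since only program text is parsed as code (a sketch reusing setUpFilter from above; note this interpolates shell data into code, so the values must be trusted):

setUpFilter
echo "$HOSTDATA" | awk -v MAXUSERS=5 '{ if ('"$FILTER"') print $1 }'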

Extracting CGI query parameter values in bash [duplicate]

This question already has answers here: How to parse $QUERY_STRING from a bash CGI script? (16 answers). Closed 3 years ago.
All right, folks, you may have seen this infamous quirk to get hold of those values:
query=`echo $QUERY_STRING | sed "s/=/='/g; s/&/';/g; s/$/'/"`
eval $query
If the query string is host=example.com&port=80 it works just fine and you get the values in bash variables host and port.
However, you may know that a cleverly crafted query string will cause an arbitrary command to be executed on the server side.
I'm looking for a secure replacement or an alternative not using eval. After some research I dug up these alternatives:
read host port <<< $(echo "$QUERY_STRING" | tr '=&' ' ' | cut -d ' ' -f 2,4)
echo $host
echo $port
and
if [[ $QUERY_STRING =~ ^host=([^&]*)\&port=(.*)$ ]]
then
echo ${BASH_REMATCH[1]}
echo ${BASH_REMATCH[2]}
else
echo no match, sorry
fi
Unfortunately these two alternatives only work if the parameters come in the order host, port; but they could come in the opposite order. There could also be more than 2 parameters, and any order is possible and allowed. So how do you propose to get the values into the appropriate bash variables? Can the above methods be amended? Remember that with n parameters there are n! possible orders: with 2 parameters there are only 2, but with 3 parameters there are already 3! = 6.
I returned to the first method. Can it be made safe to run eval? Can you transform $QUERY_STRING with sed in a way that makes it safe to do eval $query?
EDIT: Note that this question differs from the other one referred to and is not a duplicate. The emphasis here is on using eval in a safe way. That is not answered in the other thread.
This method is safe. It does not eval or execute the QUERY_STRING. It uses string manipulation to break up the string into pieces:
QUERY_STRING='host=example.com&port=80'
declare -a pairs
IFS='&' read -ra pairs <<<"$QUERY_STRING"
declare -A values
for pair in "${pairs[@]}"; do
    IFS='=' read -r key value <<<"$pair"
    values["$key"]="$value"
done
echo do something with "${values[host]}" and "${values[port]}"
URL "percent decoding" left as an exercise.
You must avoid executing strings at all times when they come from untrusted sources. Therefore I would strongly suggest never using eval in Bash to do something with a string.
To be really safe, I think I would echo the string into a file, use grep to retrieve parts of the string, and remove the file afterwards. Always use a directory outside the web root.
#! /bin/bash
MYFILE=$(mktemp)
QUERY_STRING='host=example.com&port=80&host=lepmaxe.moc&port=80'
echo "${QUERY_STRING}" > ${MYFILE}
TMP_ARR=($(grep -Eo '(host|port)[^&]*' ${MYFILE}))
[ ${#TMP_ARR[@]} -gt 0 ] || exit 1
[ $((${#TMP_ARR[@]} % 2)) -eq 0 ] || exit 1
declare -A ARRAY
for ((i = 0; i < ${#TMP_ARR[@]}; i += 2)); do
    tmp=$(echo ${TMP_ARR[@]:$((i)):2})
    port=$(echo $tmp | sed -r 's/.*port=([^ ]*).*/\1/')
    host=$(echo $tmp | sed -r 's/.*host=([^ ]*).*/\1/')
    ARRAY[$host]=$port
done
for i in ${!ARRAY[@]}; do
    echo "$i = ${ARRAY[$i]}"
done
rm ${MYFILE}
exit 0
This produces:
lepmaxe.moc = 80
example.com = 80

Numeric expression in if condition of awk

Pretty new to AWK programming. I have a file1 with entries like:
15>000000513609200>000000513609200>B>I>0011>>238/PLMN/000100>File Ef141109.txt>0100-75607-16156-14 09-11-2014
15>000000513609200>000000513609200>B>I>0011>Danske Politi>238/PLMN/000200>>0100-75607-16156-14 09-11-2014
15>000050354428060>000050354428060>B>I>0011>Danske Politi>238/PLMN/000200>>4100-75607-01302-14 31-10-2014
I want to write an awk script where, if the 2nd field subtracted from the 3rd field is 0, it prints field 2. Else, if the difference is > 0, it prints all the intermediate numbers, incremented by 1, starting from the 2nd field and ending at the 3rd field. There will be no scenario where the 3rd field is less than the 2nd, so I'm ignoring that condition.
I was doing something as:
awk 'NR > 2 { print p } { p = $0 }' file1 | awk -F">" '{if ($($3 - $2) == 0) print $2; else l = $($3 - $2); for(i=0;i<l;i++) print $2++; }'
(( Someone told me awk is close to C in terms of syntax ))
But from the output it looks to me like the string-to-numeric or numeric-to-string conversions are not taking place at the right place at the right time. Shouldn't that be taken care of by AWK automatically?
The OUTPUT that I get:
513609200
513609201
513609200
Which is not quite as expected. One evident issue is that it's ignoring the leading 0s.
Kindly help me modify the AWK script to get the desired result.
NOTE:
awk 'NR > 2 { print p } { p = $0 }' file1 is just to remove the 1st and last entry in my original file1. So the part that needs to be fixed is:
awk -F">" '{if ($($3 - $2) == 0) print $2; else l = $($3 - $2); for(i=0;i<l;i++) print $2++; }'
In awk, think of $ as an operator that retrieves the value of the numbered field ($0 being a special case):
$1 is the value of field 1
$NF is the value of the field given in the NF variable
So, $($3 - $2) will try to get the value of the field number given by the expression ($3 - $2).
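For instance, a throwaway one-liner to illustrate $ applied to an expression:

$ echo 'a b c d' | awk '{ n = 2; print $(n + 1) }'
c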
You need fewer $ signs
awk -F">" '{
if ($3 == $2)
print $2
else {
v=$2
while (v < $3)
print v++
}
}'
Normally this would work, but your numbers are beyond awk's integer bounds, so you need another solution to handle them. I'm posting this to prompt other solutions and to better illustrate your specification.
$ awk -F'>' '{for(i=$2;i<=$3;i++) print i}' file
note that this will skip the rows that you say are impossible to happen
A small scale example
$ cat file_0
x>1000>1000>etc
x>2000>2003>etc
x>3000>2999>etc
$ awk -F'>' '{for(i=$2;i<=$3;i++) print i}' file_0
1000
2000
2001
2002
2003
Apparently, newer versions of gawk have a --bignum option for arbitrary-precision integers; if you have a compatible version, that may solve your problem, but I don't have access to one to verify.
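If you do have such a build (gawk compiled with MPFR/GMP support), it should presumably be just the same one-liner with -M, the short form of --bignum, added:

gawk -M -F'>' '{for(i=$2;i<=$3;i++) print i}' file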
For anyone who does not have ready access to gawk with bigint support, it may be simpler to consider other options if some kind of "big integer" support is required. Since ruby has an awk-like mode of operation, let's consider ruby here.
To get started, there are just four things to remember:
invoke ruby with the -n and -a options (-n for the awk-like loop; -a for automatic parsing of lines into fields ($F[i]));
awk's $n becomes $F[n-1];
explicit conversion of numeric strings to integers is required;
To specify the lines to be executed on the command line, use the '-e TEXT' option.
Thus a direct translation of:
awk -F'>' '{for(i=$2;i<=$3;i++) print i}' file
would be:
ruby -an -F'>' -e '($F[1].to_i .. $F[2].to_i).each {|i| puts i }' file
To guard against empty lines, the following script would be slightly better:
($F[1].to_i .. $F[2].to_i).each {|i| puts i } if $F.length > 2
This could be called as above, or if the script is in a file (say script.rb) using the incantation:
ruby -an -F'>' script.rb file
Given the OP input data, the output is:
513609200
513609200
50354428060
The left-padding can be accomplished in several ways -- see for example this SO page.
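For instance, still in ruby, each number could be padded back to the width of the original field (a sketch, without the empty-line guard):

ruby -an -F'>' -e 'w = $F[1].length; ($F[1].to_i .. $F[2].to_i).each {|i| puts i.to_s.rjust(w, "0") }' file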

Should I use AWK or SED to remove commas between quotation marks from a CSV file? (BASH)

I have a bunch of daily printer logs in CSV format, and I'm writing a script to keep track of how much paper is being used and save the info to a database, but I've come across a small problem.
Essentially, some of the document names in the logs include commas (which are all enclosed within double quotes), and since it's a comma-separated format, my code is messing up and pushing everything one column to the right for certain records.
From what I've been reading, it seems like the best way to go about fixing this would be using awk or sed, but I'm unsure which is the best option for my situation, and how exactly I'm supposed to implement it.
Here's a sample of my input data:
2015-03-23 08:50:22,Jogn.Doe,1,1,Ineo 4000p,"MicrosoftWordDocument1",COMSYRWS14,A4,PCL6,,,NOT DUPLEX,GRAYSCALE,35kb,
And here's what I have so far:
#!/bin/bash
#Get today's file name
yearprefix="20"
currentdate=$(date +"%m-%d-%y");
year=${currentdate:6};
year="$yearprefix$year"
month=${currentdate:0:2};
day=${currentdate:3:2};
filename="papercut-print-log-$year-$month-$day.csv"
echo "The filename is: $filename"
# Remove commas in between quotes.
#Loop through CSV file
OLDIFS=$IFS
IFS=,
[ ! -f "$filename" ] && { echo "Input file $filename not found"; exit 99; }
while read time user pages copies printer document client size pcl blank1 blank2 duplex greyscale filesize blank3
do
    #Remove headers
    if [ "$user" != "" ] && [ "$user" != "User" ]
    then
        #Remove any file name with an apostrophe
        if [[ "$document" =~ "'" ]];
        then
            document="REDACTED"; # Lazy. Need to figure out a proper solution later.
        fi
        echo "$time"
        #Save results to database
        mysql -u username -p -h localhost -e "USE printerusage; INSERT INTO printerlogs (time, username, pages, copies, printer, document, client, size, pcl, duplex, greyscale, filesize) VALUES ('$time', '$user', '$pages', '$copies', '$printer', '$document', '$client', '$size', '$pcl', '$duplex', '$greyscale', '$filesize');"
    fi
done < "$filename"
IFS=$OLDIFS
Which option is more suitable for this task? Will I have to create a second temporary file to get this done?
Thanks in advance!
As I wrote in another answer:
Rather than interfere with what is evidently source data, i.e. the stuff inside the quotes, you might consider replacing the field-separator commas (with say |) instead:
s/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g
And then splitting on | (assuming none of your data has | in it).
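Note that the (?=...) lookahead makes this a Perl-compatible expression rather than something plain sed can run; a hypothetical invocation (printerlog.csv is a placeholder file name):

perl -pe 's/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g' printerlog.csv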
Is it possible to write a regular expression that matches a particular pattern and then does a replace with a part of the pattern
There is probably an easier way using sed alone, but this should work. Loop over the file; for each line, match the quoted strings with grep -o, then replace the commas within them (with spaces, or whatever it is you would like to use to get rid of the commas; if you want to preserve the data, you can use a non-printable character and explode it back to commas afterward).
i=1 && IFS=$(echo -en "\n\b") && for a in $(< test.txt); do
    var="${a}"
    for b in $(sed -n ${i}p test.txt | grep -o '"[^"]*"'); do
        repl="$(sed "s/,/ /g" <<< "${b}")"
        var="$(sed "s#${b}#${repl}#" <<< "${var}")"
    done
    let i+=1
    echo "${var}"
done
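For what it's worth, if GNU awk 4.0 or later is available, its FPAT variable defines fields by content instead of by separator, which handles quoted CSV fields much more directly; a sketch (with the sample line above, $6 is the quoted document name, and printerlog.csv is a placeholder):

gawk -v FPAT='([^,]*)|("[^"]*")' '{ print $6 }' printerlog.csv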

Parsing links from html with gawk

I'm trying to take Google's HTML and parse out the links. I use curl to obtain the HTML, then pass it to gawk. From gawk I use the match() function, and it works, but it only returns a small number of links, maybe 10 at most. If I test my regex on regex101.com it returns 51 links using the g global modifier. How can I do this in gawk to obtain all the links (relative and absolute)?
#!/bin/bash
html=$(curl -L "http://google.com")
echo "${html}" | gawk '
BEGIN {
RS=" "
IGNORECASE=1
}
{
match($0, /href=\"([^\"]*)/, array);
if (length(array[1]) > 0) {
print array[1];
}
}'
Instead of awk you can also use grep -oP:
curl -sL "http://google.com" | grep -iPo 'href="\K[^"]+'
However, this is also fetching only 31 links for me. This may vary with your browser, because google.com serves a different page for different locations/signed-in users.
match() only finds the leftmost match, so you need to update the line each time and match again.
Try
curl -sL "http://google.com" | gawk '{while(match($0, /href=\"([^\"]+)/, array)){
$0=substr($0,RSTART+RLENGTH);print array[1]}}'
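Another gawk-specific trick, for reference, is to make href=" itself the record separator, so every record after the first begins with a link target (a sketch; gawk accepts a multi-character RS):

curl -sL "http://google.com" | gawk -v RS='href="' 'NR > 1 { sub(/".*/, ""); print }'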

How to use regex to match ASTERISK in awk

I'm still pretty new to regular expressions and just started learning to use awk. What I am trying to accomplish is writing a ksh script to read in lines from text, and for every line that matches the following:
*RECORD 0000001 [some_serial_#]
to replace $2 (i.e. 0000001) with a different number. So essentially the script reads in a batch record dump, replaces the record number with date+record#, and writes to a separate file.
So this is what I'm thinking the format should be:
awk 'match($0,"/*RECORD")!=0{$2="$DATE-n++"; print $0} match($0,"/*RECORD")==0{print $0}' $BATCH > $OUTPUT
but obviously "/*RECORD" is not going to work, and I'm not sure if changing $2 and then writing out the whole line is the correct way to do this. So I am in need of some serious enlightenment.
So you want your example line to look like
*RECORD $DATE-n++ [some_serial_#]
after awk's done with it?
awk '{ if (match($0, "*RECORD") != 0) { $2="$DATE-n++"; }; print }' $BATCH > $OUTPUT
Based on your update, it looks like you instead expect $DATE to be an environment variable which is used in the awk expression and n is a variable in the awk script that keeps count of how many records matched the pattern. Given that, this may look more like what you want.
$ cat script.awk
BEGIN { n=0 }
{
    if (match($0, "\\*RECORD") != 0) {  # "\\*" in a string yields the regex \*, a literal asterisk
        n++;
        $2 = (ENVIRON["DATE"] "-" n);
    }
    print;
}
$ awk -f script.awk $BATCH > $OUTPUT
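A hypothetical run, with DATE exported in the environment and a made-up serial number:

$ export DATE=20141109
$ echo '*RECORD 0000001 SER123' | awk -f script.awk
*RECORD 20141109-1 SER123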
Alternatively, use string equality instead of a regex match.
D=$(date +%Y%m%d)
awk -v date="$D" '
{
    for (i = 1; i <= NF; i++) {
        if ($i == "*RECORD") {
            $(i+1) = date "00002"
            break # break after finding one record; otherwise, remove the break
        }
    }
}1' file