Extracting CGI query parameter values in bash [duplicate] - regex

This question already has answers here:
How to parse $QUERY_STRING from a bash CGI script?
(16 answers)
Closed 3 years ago.
All right, folks, you may have seen this infamous quirk to get hold of those values:
query=`echo $QUERY_STRING | sed "s/=/='/g; s/&/';/g; s/$/'/"`
eval $query
If the query string is host=example.com&port=80 it works just fine and you get the values in bash variables host and port.
However, you may know that a cleverly crafted query string will cause an arbitrary command to be executed on the server side.
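To make the risk concrete, here is a minimal sketch that only prints the string the one-liner above would hand to eval, without running it. The function name unsafe_parse and the sample payload are illustrative assumptions, not part of the original snippet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the same string as the sed one-liner,
# but prints it instead of passing it to eval.
unsafe_parse() {
    local query
    query=$(printf '%s\n' "$1" | sed "s/=/='/g; s/&/';/g; s/$/'/")
    printf 'would eval: %s\n' "$query"
}

# A value containing a single quote closes the quoting early,
# so "echo INJECTED" would run as a separate command under eval.
unsafe_parse "host=';echo INJECTED;x="
```

The call prints would eval: host='';echo INJECTED;x='' which shows three separate shell commands where only assignments were intended.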
I'm looking for a secure replacement or an alternative not using eval. After some research I dug up these alternatives:
read host port <<< $(echo "$QUERY_STRING" | tr '=&' ' ' | cut -d ' ' -f 2,4)
echo $host
echo $port
and
if [[ $QUERY_STRING =~ ^host=([^&]*)\&port=(.*)$ ]]
then
echo ${BASH_REMATCH[1]}
echo ${BASH_REMATCH[2]}
else
echo no match, sorry
fi
Unfortunately these two alternatives only work if the pars come in the order host,port. But they could come in the opposite order.
There could also be more than 2 pars, and any order is possible and allowed. So how do you propose to get the values into the
appropriate bash vars? Can the above methods be amended? Remember that with n pars there are n! possible orders. With 2 pars
there are only 2, but with 3 pars there are already 3! = 6.
I returned to the first method. Can it be made safe to run eval? Can you transform $QUERY_STRING with sed in a way that
makes it safe to do eval $query ?
EDIT: Note that this question differs from the other one referred to and is not a duplicate. The emphasis here is on using eval in a safe way. That is not answered in the other thread.

This method is safe. It does not eval or execute the QUERY_STRING. It uses string manipulation to break up the string into pieces:
QUERY_STRING='host=example.com&port=80'
declare -a pairs
IFS='&' read -ra pairs <<<"$QUERY_STRING"
declare -A values
for pair in "${pairs[@]}"; do
IFS='=' read -r key value <<<"$pair"
values["$key"]="$value"
done
echo do something with "${values[host]}" and "${values[port]}"
URL "percent decoding" left as an exercise.
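One possible way to do the percent decoding is sketched below. It assumes bash 4 or later (for associative arrays); the function names urldecode and parse_query are made up for this illustration, and the decoder relies on printf's %b format, so treat it as a starting point rather than a hardened implementation.

```shell
#!/usr/bin/env bash
# Decode one URL-encoded component: '+' becomes a space, %XX becomes a byte.
urldecode() {
    local s=${1//+/ }
    s=${s//\\/\\\\}            # keep literal backslashes intact for printf %b
    printf '%b' "${s//%/\\x}"  # turn %41 into \x41 and let printf expand it
}

# Split a query string into the caller-visible associative array "values".
parse_query() {
    local pair key value
    local -a pairs
    IFS='&' read -ra pairs <<< "$1"
    for pair in "${pairs[@]}"; do
        IFS='=' read -r key value <<< "$pair"
        [[ -n $key ]] || continue          # skip empty keys
        values["$(urldecode "$key")"]=$(urldecode "$value")
    done
}

declare -A values
parse_query 'port=80&host=example.com&note=a%20b%2Bc+d'
printf '%s\n' "${values[host]}" "${values[port]}" "${values[note]}"
```

Because the pairs are looked up by key, the order of the parameters in the query string does not matter. Nothing from the query string is ever executed.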

You must never execute strings that come from untrusted sources. I would therefore strongly advise against using eval in Bash on any such string.
To be really safe, I think I would echo the string into a file, use grep to retrieve parts of the string, and remove the file afterwards. Always use a directory outside the web root.
#! /bin/bash
MYFILE=$(mktemp)
QUERY_STRING='host=example.com&port=80&host=lepmaxe.moc&port=80'
echo "${QUERY_STRING}" > "${MYFILE}"
TMP_ARR=($(grep -Eo '(host|port)[^&]*' "${MYFILE}"))
# Require a non-empty, even number of matches (host/port pairs)
[ ${#TMP_ARR[@]} -gt 0 ] || exit 1
[ $((${#TMP_ARR[@]} % 2)) -eq 0 ] || exit 1
declare -A ARRAY;
for ((i = 0; i < ${#TMP_ARR[@]}; i+=2)); do
    tmp=$(echo ${TMP_ARR[@]:$((i)):2})
    port=$(echo $tmp | sed -r 's/.*port=([^ ]*).*/\1/')
    host=$(echo $tmp | sed -r 's/.*host=([^ ]*).*/\1/')
    ARRAY[$host]=$port
done
for i in "${!ARRAY[@]}"; do
    echo "$i = ${ARRAY[$i]}"
done
rm "${MYFILE}"
exit 0
This produces:
lepmaxe.moc = 80
example.com = 80

Related

Regular Expression to search for a number between two

I am not very familiar with Regular Expressions.
I have a requirement to extract all lines that match an 8 digit number between any two given numbers (for example 20200628 and 20200630) using regular expression. The boundary numbers are not fixed, but need to be parameterized.
In case you are wondering, this number is a timestamp, and I am trying to extract information between two dates.
HHHHH,E.164,20200626113247
HHHHH,E.164,20200627070835
HHHHH,E.164,20200628125855
HHHHH,E.164,20200629053139
HHHHH,E.164,20200630125855
HHHHH,E.164,20200630125856
HHHHH,E.164,20200626122856
HHHHH,E.164,20200627041046
HHHHH,E.164,20200628125856
HHHHH,E.164,20200630115849
HHHHH,E.164,20200629204531
HHHHH,E.164,20200630125857
HHHHH,E.164,20200630125857
HHHHH,E.164,20200626083628
HHHHH,E.164,20200627070439
HHHHH,E.164,20200627125857
HHHHH,E.164,20200628231003
HHHHH,E.164,20200629122857
HHHHH,E.164,20200630122237
HHHHH,E.164,20200630122351
HHHHH,E.164,20200630122858
HHHHH,E.164,20200630122857
HHHHH,E.164,20200630084722
Assuming the above data is stored in a file named data.txt, the idea is to sort it on the 3rd column delimited by the comma (i.e. sort -nk3), and then pass the sorted output through this perl filter, as demonstrated by this find_dates.sh script:
#!/bin/bash
[ $# -ne 3 ] && echo "Expects 3 args: YYYYmmdd start, YYYYmmdd end, and data filename" && exit
DATE1=$1
DATE2=$2
FILE=$3
echo "$DATE1" | perl -ne 'exit 1 unless /^\d{8}$/'
[ $? -ne 0 ] && echo "ERROR: First date is invalid - $DATE1" && exit
echo "$DATE2" | perl -ne 'exit 1 unless /^\d{8}$/'
[ $? -ne 0 ] && echo "ERROR: Second date is invalid - $DATE2" && exit
[ ! -r "$FILE" ] && echo "ERROR: File not found - $FILE" && exit
sort -t, -nk3 "$FILE" | perl -ne '
BEGIN { $date1 = shift; $date2 = shift }
print if /164,$date1/ .. /164,$date2/;
print if /164,$date2/;
' "$DATE1" "$DATE2" | sort -u
Running the command find_dates.sh 20200627 20200629 data.txt will produce the result:
HHHHH,E.164,20200627041046
HHHHH,E.164,20200627070439
HHHHH,E.164,20200627070835
HHHHH,E.164,20200627125857
HHHHH,E.164,20200628125855
HHHHH,E.164,20200628125856
HHHHH,E.164,20200628231003
HHHHH,E.164,20200629053139
HHHHH,E.164,20200629122857
HHHHH,E.164,20200629204531
For the example you gave, between 20200628 and 20200630, you may try:
\b202006(?:2[89]|30)
I might be tempted to make the general comment that regex is not very suitable for finding numerical ranges (whereas application programming languages are). However, in the case of parsing a text log file, regex is what would be easily available.
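As a sketch of the non-regex route with parameterized bounds, the timestamps can be compared numerically with awk. The function name filter_by_date is hypothetical, and the code assumes the layout shown in the question: comma-separated lines whose third field begins with an 8-digit YYYYmmdd date.

```shell
#!/usr/bin/env bash
# Print lines whose third comma-separated field starts with a date
# (YYYYmmdd) inside the inclusive range [start, end].
filter_by_date() {
    local start=$1 end=$2 file=$3
    awk -F, -v lo="$start" -v hi="$end" '
        { day = substr($3, 1, 8) + 0 }      # first 8 digits, forced numeric
        day >= lo + 0 && day <= hi + 0      # default action: print the line
    ' "$file"
}
```

A call such as filter_by_date 20200628 20200630 data.txt keeps the input order and needs neither sorting nor a line that matches the boundary dates exactly.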

Substring removal in bash

I'm currently trying to get into bash regular expressions to change multiple filenames at the same time. Here are the file names:
a_001_D_xy_S37_L003_R1_001.txt
a_001_D_xy_S37_L003_R2_001.txt
a_002_D_xy_S37_L006_R1_001.txt
a_002_D_xy_S37_L006_R2_001.txt
a_003_D_xy_S23_L003_R1_001.txt
a_003_D_xy_S23_L003_R2_001.txt
I want this as my result:
a_002_D_xy_R1.txt
a_002_D_xy_R2.txt
...
I only want to change those with *001.txt at the end. First I want to remove the _S.._L00. in the filenames and the 001 in the end. I split this procedure in two parts:
for file in *001.txt;
do
echo ${file#_S.._L..6}
done
This loop already does not work. As a second alternative I tried:
for file in *001.fastq.gz;
do
echo ${file/_S.._L00./}
done
but the filenames are again unchanged. (I just use echo here to see the results. If it works I will replace it with mv ${file} ${regularexpression})
Thanks for help!
Considering that you need lots of different fields it is possibly better to just split the filename and then reconstruct it as you wish.
I suggest using an array built by splitting the original filename with _. Then you just reconstruct the new name by using the fields that you wish.
for file in *001.txt; do
    echo "FILE: $file"
    IFS='_' read -r -a fileFields <<< "$file"
    echo "FILE FIELDS: "
    for index in "${!fileFields[@]}"; do
        echo "- $index ${fileFields[index]}"
    done
    fileName="${fileFields[0]}_${fileFields[1]}_${fileFields[2]}_${fileFields[3]}_${fileFields[-2]}.txt"
    echo "NEW FILE NAME: $fileName"
    # mv "$file" "$fileName"
done
The echo commands are just for debugging; you can remove them all once you understand the code.
However, if you really need to split the string using BASH expressions you can check this post:
Extracting part of a string to a variable in bash or take a look at this BASH cheat sheet.
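The attempts in the question fail because bash parameter expansion uses glob patterns, not regular expressions: a dot matches only a literal dot, ? matches any single character, and ${file#pattern} only strips a match at the very start of the value. A minimal sketch of the same idea with glob patterns follows; the helper name new_name is an assumption made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the new name from the old one.
new_name() {
    local name=$1
    name=${name/_S??_L00?/}                # drop the _Sxx_L00x chunk ("?" = any one character)
    printf '%s\n' "${name%_001.txt}.txt"   # strip the trailing _001.txt, re-add .txt
}

for file in *_001.txt; do
    [ -e "$file" ] || continue             # the glob matched nothing
    echo mv -- "$file" "$(new_name "$file")"   # remove "echo" to actually rename
done
```

For a_001_D_xy_S37_L003_R1_001.txt the helper yields a_001_D_xy_R1.txt.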
Try wrapping this in a function. Looping over the glob directly avoids counting the files first and re-listing the directory while files are being renamed:
functionRename(){
    for file in *_001.txt
    do
        # drop _S??_L00? (assumes the 19-character prefix shown in the question)
        new="${file%_S??_*}${file#???????????????????}"
        # drop the trailing _001 and restore the extension
        mv "${file}" "${new%_001*}.txt"
    done
}
functionRename

Changing file extensions with sed [duplicate]

This question already has answers here:
How to use sed to change file extensions?
(7 answers)
Closed 5 years ago.
If the arguments are files, I want to change their extensions to .file.
That's what I got:
#!/bin/bash
while [ $# -gt 0 ]
do
if [ -f $1 ]
then
sed -i -e "s/\(.*\)\(\.\)\(.*\)/\1\2file" $1
fi
shift
done
The script runs, but it doesn't do anything. Another problem: if a file has no extension, my sed command will not work, right? Please help.
sed is for manipulating the contents of files, not the filename itself.
Option 1, taken from this answer by John Smith:
filename="file.ext1"
mv "${filename}" "${filename/%ext1/ext2}"
Option 2, taken from this answer by chooban:
rename 's/\.ext/\.newext/' ./*.ext
Option 3, taken from this answer by David W.:
find . -name "*.ext1" -print0 | while IFS= read -r -d '' file
do
    mv "$file" "${file%.*}.ext2"
done
and more is here.
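The ${file%.*} expansion from Option 3 can also be wrapped in a small function that handles names without any extension, which the question asks about. This is only a sketch: the name with_ext is hypothetical, and a dot-file such as .bashrc would be treated as having an empty base name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print PATH with its extension replaced by EXT,
# or with .EXT appended when the file name has no extension.
with_ext() {
    local path=$1 ext=$2 dir base
    base=${path##*/}          # file name without the directory part
    dir=${path%"$base"}       # directory part, including the trailing slash
    # ${base%.*} removes the shortest trailing ".something", if there is one
    printf '%s%s.%s\n' "$dir" "${base%.*}" "$ext"
}

for f in "$@"; do
    [ -f "$f" ] && echo mv -- "$f" "$(with_ext "$f" file)"   # remove "echo" to rename
done
```

Working on the base name only keeps a dot in a directory name, as in dir.d/README, from being mistaken for an extension.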
UPDATE (a comment asked what % and {} do):
"${variable}other_chars": the braces delimit the variable name when you expand it inside a string. ${variable%.*} takes the value of the variable and strips the shortest match of the pattern .* from the end of it. For example, if variable is filename.txt, "${variable%.*}" returns just filename.
Using a shell function to wrap a sed evaluate (e) command:
mvext ()
{
ext="$1";
while shift && [ "$1" ]; do
sed 's/.*/mv -iv "&" "&/
s/\(.*\)\.[^.]*$/\1/
s/.*/&\.'"${ext}"'"/e' <<< "$1";
done
}
Tests, given files bah and boo, and the extension should be .file, which is then changed to .buzz:
mvext file bah boo
mvext buzz b*.file
Output:
'bah' -> 'bah.file'
'boo' -> 'boo.file'
'bah.file' -> 'bah.buzz'
'boo.file' -> 'boo.buzz'
How it works:
The first arg is the file extension, which is stored in $ext.
The while loop parses each file name separately, since a name might include escaped spaces and whatnot. If the filenames were certain not to contain such escaped spaces, the while loop could probably be avoided.
sed reads standard input, provided by a bash here string <<< "$1".
The sed code changes each name foo.bar (or even just plain foo) to the string "mv -iv foo.bar foo.file", then runs that string with the evaluate command. The -iv options show what has been moved and prompt before an existing file would be overwritten.

Bash: Replace array value with curl result

I have a text file named raw.txt with something like the following:
T DOTTY CRONO 52/50 53/40 54/30 55/20 RESNO NETKI
U CYMON DENDU 51/50 52/40 53/30 54/20 DOGAL BEXET
V YQX KOBEV 50/50 51/40 52/30 53/20 MALOT GISTI
W VIXUN LOGSU 49/50 50/40 51/30 52/20 LIMRI XETBO
X YYT NOVEP 48/50 49/40 50/30 51/20 DINIM ELSOX
Y DOVEY 42/60 44/50 47/40 49/30 50/20 SOMAX ATSUR
Z SOORY 43/50 46/40 48/30 49/20 BEDRA NERTU
A DINIM 51/20 52/30 50/40 47/50 RONPO COLOR
B SOMAX 50/20 51/30 49/40 46/50 URTAK BANCS
C BEDRA 49/20 50/30 48/40 45/50 VODOR RAFIN
D ETIKI 48/15 48/20 49/30 47/40 44/50 BOBTU JAROM
E 46/40 43/50 42/60 DOVEY
F 45/40 42/50 41/60 JOBOC
G 43/40 41/50 40/60 SLATN
I'm reading it into an array:
while read line; do
set $line
IFS=' ' read -a array <<< "$line"
done < raw.txt
I'm trying to replace all occurrences of [A-Z]{5} with a curl result where the match of [A-Z]{5} is fed as a variable into the curl call.
First match to be replaced would be DOTTY. The call looks similar to curl -s http://example.com/api_call/DOTTY and the result is something like -55.5833 50.6333 which should replace DOTTY in the array.
I was so far unable to correctly match the desired string and feed the match into curl.
Your help is greatly appreciated.
All the best,
Chris
EDIT:
Solution
Working solution based on @Kevin's extensive answer and @Floris's hint about a possible carriage return in the curl result. This was indeed the case. Thank you! Combined with some tinkering on my side, I now got it to work.
#!/bin/bash
while read line; do
set $line
IFS=' ' read -a array <<< "$line"
i=0
for str in ${array[@]}; do
if [[ "$str" =~ [A-Z]{5} ]]; then
curl_tmp=$(curl -s http://example.com/api_call/$str)
# cut off line break
curl=${curl_tmp/$'\r'}
# insert at given index
declare array[$i]="$curl"
fi
let i++
done
# write to file
for index in "${array[@]}"; do
echo $index
done >> $WORK_DIR/nats.txt
done < raw.txt
I didn't change anything about your script except add the matching part, since it seems that's what you're needing help on:
#!/bin/bash
while read line; do
set $line
IFS=' ' read -a array <<< "$line"
for str in ${array[@]}; do
if [[ "$str" =~ [A-Z]{5} ]]; then
echo curl "http://example.com/api_call/$str"
fi
done
done < raw.txt
EDIT: added in the url example you provided with the variable in the URI. You can do whatever you need with the fetched output by changing it to do_something "$(curl ...)"
EDIT2: Since you're wanting to maintain the bash array you create from each line, how about this:
I'm not great at bash when it comes to arrays, so I expect someone to call me out on it, but this should work.
I've left some echos there so you can see what it's doing. The shift commands are to push the array index from the current location when the regex matches. The tmp variable to hold your curl output could probably be improved, but this should get you started, I hope.
removed temporarily to avoid confusion
EDIT3: Oops the above didn't actually work. My mistake. Let me try again here.
EDIT4:
#!/bin/bash
while read line; do
set $line
IFS=' ' read -a array <<< "$line"
i=0
# echo ${array[@]} below is just so you can see it before processing. You can remove this
echo "Array before processing: ${array[@]}"
for str in ${array[@]}; do
if [[ "$str" =~ [A-Z]{5} ]]; then
# replace the echo command below with your curl command
# ie - curl="$(curl http://example.com/api_call/$str)"
curl="$(echo 1234 -1234)"
if [[ "$flag" = "1" ]]; then
array=( ${adjustedArray[@]} )
push=$(( $push + 2 ));
let i++
else
push=1
fi
adjustedArray=( ${array[@]:0:$i} ${curl[@]} ${array[@]:$(( $i + $push)):${#array[@]}} )
#echo "DEBUG adjustedArray in loop: ${adjustedArray[@]}"
flag=1;
fi
let i++
done
unset flag
echo "final: ${adjustedArray[@]}"
# do further processing here
done < raw.txt
I know there's a smarter way to do this than the above, but we're getting into areas in bash where I'm not really suited to give advice. The above should work, but I'm hoping someone can do better.
Hope it helps, anyway
ps - You should probably not use a shell script for this unless you really need to. Perl, php, or python would make the code simple and readable
Since I misread the first time:
How about just using sed?
sed "s/\([A-Z]\{5\}\)/$(echo curl http:\\/\\/example.com\\/api_call\\/\\1)/g" /tmp/raw.txt
Try that, then try removing the echo. I'm not 100% on this since I can't run it on the real domain
EDIT: And just so I'm clear, the echo is just there so you can see what it will do with the echo removed
create a file cmatch:
#!/bin/bash
while read line
do
echo $line
a=`echo $line | egrep -o '\b[A-Z]{5}\b'`
for v in $a
do
echo "doing curl to replace $v in $line"
r=`curl -s http://example.com/api_call/$v`
r1=`echo $r | xargs echo`
line=`echo $line | sed 's/'$v'/'$r1'/'`
done
done
then call it with
chmod 755 cmatch
./cmatch < inputfile.txt > outputfile.txt
It will do what you asked
Notes:
the \b before and after the [A-Z]{5} ensures that ABCDEFG (which is not a five letter word) will not match.
using egrep -o produces an array of matches
I loop over this array to allow the replacement of multiple matches in a line
I update the line for each match found using the result of the curl call
to keep code clean, I assign the result of the curl to an intermediate variable
edit Just saw the comments about arrays. I suggest to take the output of this script and convert it to an array if you want to do further manipulation...
more edits If your curl command returns a multi-line string (which would explain the error you see), you can use the new line I introduced in the script to remove the newlines (essentially stringing all the arguments together):
echo $r | xargs echo
calls echo with one line at a time as argument, and without the carriage returns. It's a fun way of getting rid of carriage returns.
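The pieces discussed in this thread (matching whole five-letter tokens, calling the API, stripping the carriage return) can be combined into one function, sketched below. The lookup function is a hypothetical stand-in that returns a fixed value so the example runs offline; in a real script it would wrap the curl call from the question. The name replace_tokens is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for: curl -s "http://example.com/api_call/$1"
# It returns a fixed answer with a trailing carriage return, as observed above.
lookup() {
    printf '%s\r\n' "-55.5833 50.6333"
}

# Replace every word that consists of exactly five capital letters.
replace_tokens() {
    local line=$1 word result
    local -a words out=()
    read -ra words <<< "$line"
    for word in "${words[@]}"; do
        if [[ $word =~ ^[A-Z]{5}$ ]]; then   # anchored: whole word only
            result=$(lookup "$word")
            result=${result//$'\r'/}         # strip carriage returns
            out+=("$result")
        else
            out+=("$word")
        fi
    done
    printf '%s\n' "${out[*]}"
}

replace_tokens 'V YQX KOBEV 50/50'
```

Anchoring the pattern with ^ and $ keeps longer words from matching, which serves the same purpose as the \b boundaries in the egrep version.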
#!/bin/bash
while read line;do
set -- $line
echo "second parm is $2"
echo "do your curl here"
done < afile.txt

Getting the index of the substring on solaris

How can I find the index of a substring which matches a regular expression on solaris10?
Assuming that what you want is to find the location of the first match of a wildcard in a string using bash, the following bash function returns just that, or empty if the wildcard doesn't match:
function match_index()
{
local pattern=$1
local string=$2
local result=${string/${pattern}*/}
[ ${#result} = ${#string} ] || echo ${#result}
}
For example:
$ echo $(match_index "a[0-9][0-9]" "This is a a123 test")
10
If you want to allow full-blown regular expressions instead of just wildcards, replace the "local result=" line with
local result=$(echo "$string" | sed 's/'"$pattern"'.*$//')
but then you're exposed to the usual shell quoting issues.
The go-to options for me are bash, awk and perl. I'm not sure what you're trying to do, but any of the three would likely work well. For example:
f=somestring
string=$(expr match "$f" '.*\(expression\).*')
echo $string
You tagged the question as bash, so I'm going to assume you're asking how to do this in a bash script. Unfortunately, the built-in regular expression matching doesn't save string indices. However, if you're asking this in order to extract the match substring, you're in luck:
# The regex must be unquoted; a quoted right-hand side is matched literally
if [[ $var =~ $regex ]]; then
    n=${#BASH_REMATCH[*]}
    i=0
    while [[ $i -lt $n ]]
    do
        echo "capture[$i]: ${BASH_REMATCH[$i]}"
        let i++
    done
fi
This snippet will output in turn all of the submatches. The first one (index 0) will be the entire match.
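Although bash does not report the position directly, it can often be derived from the matched text: strip everything from the first occurrence of the match onward and measure what is left. The sketch below uses a hypothetical function name, regex_offset, and assumes a pattern without anchors or word-boundary assertions; with those, the first occurrence of the matched text is not necessarily where the regex matched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the 0-based offset of the first regex match,
# or return 1 when the pattern does not match.
regex_offset() {
    local string=$1 regex=$2 prefix
    [[ $string =~ $regex ]] || return 1
    # Remove the matched text and everything after its first occurrence
    prefix=${string%%"${BASH_REMATCH[0]}"*}
    printf '%s\n' "${#prefix}"
}

regex_offset 'This is a a123 test' 'a[0-9]+'
```

The inner quotes around ${BASH_REMATCH[0]} make bash treat the matched text literally instead of as a glob pattern.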
You might like your awk options better, though. There's a function match which gives you the index you want. Documentation can be found here. It'll also store the length of the match in RLENGTH, if you need that. To implement this in a bash script, you could do something like:
match_index=$(echo "$var_to_search" | \
awk '{
where = match($0, '"$regex_to_find"')
if (where)
print where
else
print -1
}')
There are a lot of ways to deal with passing the variables in to awk. This combination of piping output and directly embedding one into the awk one-liner is fairly common. You can also give awk variable values with the -v option (see man awk).
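A sketch of the -v variant follows, with a made-up function name, regex_index. It passes the pattern as an awk variable, so no shell text is spliced into the awk program. Note that awk processes backslash escapes in -v values, so a pattern containing backslashes needs them doubled, and that awk positions are 1-based.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the 1-based index of the first match of an
# awk regex in the first line of STRING, or -1 if there is no match.
regex_index() {
    local string=$1 regex=$2
    printf '%s\n' "$string" | awk -v re="$regex" '
        NR == 1 {
            where = match($0, re)   # dynamic regex: the string in re is the pattern
            print (where ? where : -1)
        }'
}

regex_index 'This is a a123 test' 'a[0-9][0-9]'
```

After a successful match, RLENGTH holds the length of the match, so substr($0, where, RLENGTH) would give the matched text.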
Obviously you can modify this to get the length, the match string, whatever it is you need. You can capture multiple things into an array variable if necessary:
match_data=($( ... awk '{ ... print where,RLENGTH,match_string ... }'))
If you use bash 4.x, you can source oobash, a string library written in bash in an OO style:
http://sourceforge.net/projects/oobash/
String is the constructor function:
String a abcda
a.indexOf a
0
a.lastIndexOf a
4
a.indexOf da
3
There are many more "methods" for working with strings in your scripts:
-base64Decode -base64Encode -capitalize -center
-charAt -concat -contains -count
-endsWith -equals -equalsIgnoreCase -reverse
-hashCode -indexOf -isAlnum -isAlpha
-isAscii -isDigit -isEmpty -isHexDigit
-isLowerCase -isSpace -isPrintable -isUpperCase
-isVisible -lastIndexOf -length -matches
-replaceAll -replaceFirst -startsWith -substring
-swapCase -toLowerCase -toString -toUpperCase
-trim -zfill