I have this Bash script:
#!/bin/bash
rawurldecode() {
# This is perhaps a risky gambit, but since all escape characters must be
# encoded, we can replace %NN with \xNN and pass the lot to printf '%b', which
# will decode the hex escapes for us
printf -v REPLY '%b' "${1//%/\\x}" # You can either set a return variable (FASTER)
echo "${REPLY}" #+or echo the result (EASIER)... or both... :p
}
echo -e "Content-type: video/x-matroska\n"
arr=(${QUERY_STRING//=/ })
ffmpeg -i "$(rawurldecode ${arr[1]})" -acodec copy -vcodec copy -map 0:0 -map 0:2 -f matroska - 2>/dev/null &
pid=$!
trap "kill $pid" SIGTERM SIGPIPE
wait
I want to change it so it can handle multiple parameters in the query string like this:
param1=value1&param2=value2&param3=value3
Currently the arr split is based on =, so it can only handle one parameter. I am not sure how to change this pattern so that I get arr[1] = value1, arr[2] = value2, etc.
Ideally I need it to be an associative array like: arr['param1'] = value1 but I am not sure if this is possible in Bash.
Solutions in other languages (PHP, Perl, Python) are acceptable as long as the behaviour of the script remains the same (i.e. it needs to take the query string, output the header plus the ffmpeg output on stdout, and be able to kill the process it spawned when the client disconnects).
Any suggestions how to sanitize this input are also welcome.
You can just change the line:
arr=(${QUERY_STRING//=/ })
with:
arr=(${QUERY_STRING//[=&]/ })
Then you can get your values in the odd indexes.
Example
$ QUERY_STRING='param1=value1&param2=value2&param3=value3'
$ arr=(${QUERY_STRING//[=&]/ })
$ echo ${arr[1]}
value1
$ echo ${arr[3]}
value2
$ echo ${arr[5]}
value3
Reading your question again, I see you want the values in subsequent indices. You can do that with the extglob shell option as follows:
shopt -s extglob # with this you enable 'extglob'
arr=(${QUERY_STRING//?(&)+([^&=])=/ })
Explanation:
?(&) -> matches zero or one occurrence of &
+([^&=])= -> matches one or more characters that are neither & nor =, followed by an =
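If you want the associative array mentioned in the question, here is a minimal sketch, assuming Bash 4+ for declare -A and reusing the rawurldecode function from the question:

declare -A params
IFS='&' read -ra pairs <<< "$QUERY_STRING"
for pair in "${pairs[@]}"; do
    key=${pair%%=*}      # everything before the first =
    value=${pair#*=}     # everything after the first =
    params[$key]=$(rawurldecode "$value")
done
# now "${params[param1]}" expands to value1

As for sanitizing, at a minimum you could whitelist the characters you expect in each value (for example [[ $value =~ ^[A-Za-z0-9._/-]+$ ]]) before handing it to ffmpeg; the character class here is only an illustrative assumption.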
I have some text with a password which may contain special characters (like /, *, ., [], () and others that may be used in regular expressions). How do I remove that password from the text using the Korn shell, or maybe sed or awk? I need an answer that is compatible with Linux and IBM AIX.
For example, the password is "123*(/abc" (it is contained in a variable).
The text (it is also contained in a variable) may look like below:
"connect user/123*(/abc#connection_string"
As a result I want to obtain following text:
"connect user/#connection_string"
I tried to use tr -d but received a wrong result:
l_pwd='1234/\#1234'
l_txt='connect target user/1234/\#1234#connection'
print $l_txt | tr -d $l_pwd
connect target user\connection
tr -d removes all characters in l_pwd from l_txt that's why the result is so strange.
Try this:
l_pwd="1234/\#1234";
escaped_pwd=$(printf '%s\n' "$l_pwd" | sed -e 's/[]\/$*.^[]/\\&/g')
l_txt="connect target user/1234/\#1234#connection";
echo "$l_txt" | sed "s/$escaped_pwd//g";
It prints connect target user/#connection in bash at least.
A big caveat is that this does not work on newlines, and maybe more; check out where I got this solution from.
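For reference, here is what the escaping step produces for the example password (a quick check):

$ printf '%s\n' "$escaped_pwd"
1234\/\\#1234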
With ksh 93u+
Here's some parameter expansion madness:
$ echo "${l_txt//${l_pwd//[\/\\]/?}/}"
connect target user/#connection
That takes the password variable, replaces forward and back slashes with the ? wildcard, and then uses the result as a pattern to remove from the connection string.
A more robust version:
x=${l_pwd//[^[:alnum:]]/\\\0}
typeset -p x # => x='1234\/\\\#1234'
conn=${l_txt%%${x}*}${l_txt#*${x}}
typeset -p conn # => conn='connect target user/#connection'
I'm trying to remove whitespace in file names by replacing it with underscores.
Input:
echo "File Name1.xml File Name3 report.xml" | sed 's/[[:space:]]/__/g'
However, the output is:
File__Name1.xml__File__Name3__report.xml
Desired output
File__Name1.xml File__Name3__report.xml
You named awk in the title of the question, didn't you?
$ echo "File Name1.xml File Name3 report.xml" | \
> awk -F'.xml *' '{for(i=1;i<=NF;i++){gsub(" ","_",$i); printf i<NF?$i ".xml ":"\n" }}'
File_Name1.xml File_Name3_report.xml
$
-F'.xml *' instructs awk to split on a regex, the requested extension plus 0 or more spaces
the loop for(i=1;i<=NF;i++) is executed for all the fields into which the input line(s) are split; note that the last field is empty (it is what follows the last extension), but we are going to take that into account...
the body of the loop
gsub(" ","_", $i) substitutes all the occurrences of space to underscores in the current field, as indexed by the loop variable i
printf i<NF?$i ".xml ":"\n" outputs different things: if i<NF it's a regular field, so we append the extension and a space; otherwise i equals NF, so we just terminate the output line with a newline.
It's not perfect, it appends a space after the last filename. I hope that's good enough...
Addendum
I'd like to address:
the little buglet of the last space...
some of the issues reported by Ed Morton
generalize the extension provided to awk
To reach these goals, I've decided to wrap the scriptlet in a shell function which, since it changes spaces into underscores, is named s2u:
$ s2u () { awk -F'\.'$1' *' -v ext=".$1" '{
> NF--;for(i=1;i<=NF;i++){gsub(" ","_",$i);printf "%s",$i ext (i<NF?" ":"\n")}}'
> }
$ echo "File Name1.xml File Name3 report.xml" | s2u xml
File_Name1.xml File_Name3_report.xml
$
It's a bit different (better?) because instead of special-casing the printing of the last field it special-cases the delimiter appended to each field, but the idea of splitting on the extension remains.
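The generalized extension parameter works the same way, e.g. with some hypothetical .pdf names (GNU awk, at least, is fine with decrementing NF):

$ echo "My Doc.pdf Another File.pdf" | s2u pdf
My_Doc.pdf Another_File.pdf
$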
This seems like a good start if the filenames aren't delimited:
((?:\S.*?)?\.\w{1,})\b
( // start of captured group
(?: // non-captured group
\S.*? // a non-white-space character, then 0 or more any character
)? // 0 or 1 times
\. // a dot
\w{1,} // 1 or more word characters
) // end of captured group
\b // a word boundary
You'll have to look up how a PCRE pattern converts to a shell pattern. Alternatively, it can be run from a Python/Perl/PHP script.
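For instance, with a grep built with PCRE support (the -P flag of GNU grep), the pattern can be used as-is; a rough sketch that prints one fixed-up name per line instead of rejoining them:

$ echo "File Name1.xml File Name3 report.xml" | grep -oP '((?:\S.*?)?\.\w{1,})\b' | sed 's/ /__/g'
File__Name1.xml
File__Name3__report.xml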
Assuming you are asking how to rename files, and not how to remove spaces in a list of file names being used for some other purpose, here are the long way and the short way. The long way uses sed. The short way uses rename. If you are not trying to rename files, your question is quite unclear and should be revised.
If the goal is to simply get a list of xml file names and change them with sed, the bottom example is how to do that.
directory contents:
ls -w 2
bob is over there.xml
fred is here.xml
greg is there.xml
cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0; i<${#a_glob[@]}; i++)); do
echo "${a_glob[i]}";
done
shopt -u nullglob
# output
bob is over there.xml
fred is here.xml
greg is there.xml
# then rename them
cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0; i<${#a_glob[@]}; i++)); do
# I prefer 'rename' for such things
# rename 's/[[:space:]]/_/g' "${a_glob[i]}";
# but sed works, can't see any reason to use it for this purpose though
mv "${a_glob[i]}" "$(sed 's/[[:space:]]/_/g' <<< "${a_glob[i]}")";
done
shopt -u nullglob
result:
ls -w 2
bob_is_over_there.xml
fred_is_here.xml
greg_is_there.xml
Globbing is what you want here because of the spaces in the names.
However, this is really a complicated solution, when actually all you need to do is:
cd [your space containing directory]
rename 's/[[:space:]]/_/g' *.xml
and that's it, you're done.
If, on the other hand, you are trying to create a list of file names, you'd certainly want the globbing method, which, if you just modify the statement, will do what you want there too; that is, just use sed to change the output file name.
If your goal is to change the filenames for output purposes, and not rename the actual files:
cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0; i<${#a_glob[@]}; i++)); do
echo "${a_glob[i]}" | sed 's/[[:space:]]/_/g';
done
shopt -u nullglob
# output:
bob_is_over_there.xml
fred_is_here.xml
greg_is_there.xml
You could use rename:
rename --nows *.xml
This will replace all spaces in the names of the .xml files in the current folder with _.
Some versions come without the --nows option, in which case you can use a search and replace:
rename 's/[[:space:]]/__/g' *.xml
You can also use --dry-run if you want to just print the new filenames without actually renaming anything.
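For example, assuming a rename variant that understands the --dry-run flag mentioned above:

rename --dry-run 's/[[:space:]]/__/g' *.xml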
I want to make a shell script which gets two parameters from the command line: the first should be an existing file, the second the new file which will contain the result. From the first file, I want to select the lowercase words, then sort them and write the result to the second file. The grep command is obviously not good; how should I change it to get the result?
#!/bin/bash
file1=$1
file2=$2
if [ ! -f $file1]
then
echo "this file doesn't exist or is not a file
break
else
grep '/[a-z]*/' $file1 | sort > $file2
You can change the grep command like this:
grep -o '\<[[:lower:]][[:lower:]]*\>' "$file1" | sort -u > "$file2"
The -o is an output control switch that forces grep to print each match on its own line.
\< is a left word boundary and \> a right word boundary. (this way the word Site doesn't return ite)
[[:lower:]][[:lower:]]* ensures there's at least one lower case letter.
(The use of [[:lower:]] instead of the range [a-z] is preferable because with some locales, letters may be ordered alphabetically regardless of case: aBbCcDd...YyZz)
Notice: I added the -u switch to the sort command to remove duplicate entries; if you don't want this behaviour, remove it.
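A quick check of the word-boundary behaviour described above:

$ echo 'Site site' | grep -o '\<[[:lower:]][[:lower:]]*\>'
site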
I'm in a hurry so I won't rewrite what I pointed out in a comment, but here is your code with all these problems fixed:
#!/bin/bash
file1=$1
file2=$2
if [ ! -f $file1 ]
then
echo "this file doesn't exist or is not a file"
else
grep '[a-z]*' $file1 | sort > $file2
fi
ShellCheck gives one more tip which you should definitely apply; I'll let you check it out.
It would also be a good practice to exit with a non-zero code when the script can't execute its task, that is in your case when the file isn't found.
Using awk and sort. First, the test file:
$ cat file
This is a test.
This is another one.
Code:
$ awk -v RS="[ .\n]+" '/^[[:lower:]]+$/' file | sort
a
another
is
is
one
test
I'm using space, newline and period as record separators to make each word its own record, and printing only the words that consist of lower case letters.
Your shell code could use some fixing up.
#!/bin/bash
file1=$1
file2=$2
if [ ! -f "$file1" ]; then # need space before ]; quote expansions
# send error messages to stderr instead of stdout
# include program and file name in message
printf >&2 '%s: file "%s" does not exist or is not a file\n' "$0" "$file1"
# exit with nonzero code when something goes wrong
exit 1
fi
# -w to get only whole words
# -o to print out each match on a separate line
grep -wo '[a-z][a-z]*' "$file1" | sort > "$file2"
As written that will include multiple copies of the same word if it occurs multiple times in the file; change to sort -u if you don't want that.
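A quick demonstration with a throwaway sample file:

$ printf 'This is a Test file.\nAnother line here.\n' > sample.txt
$ grep -wo '[a-z][a-z]*' sample.txt | sort
a
file
here
is
line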
I have a bunch of daily printer logs in CSV format, and I'm writing a script to keep track of how much paper is being used and save the info to a database, but I've come across a small problem.
Essentially, some of the document names in the logs include commas (those names are enclosed within double quotes), and since the file is comma-separated, my code is messing up and pushing everything one column to the right for those records.
From what I've been reading, it seems like the best way to go about fixing this would be using awk or sed, but I'm unsure which is the best option for my situation, and how exactly I'm supposed to implement it.
Here's a sample of my input data:
2015-03-23 08:50:22,Jogn.Doe,1,1,Ineo 4000p,"MicrosoftWordDocument1",COMSYRWS14,A4,PCL6,,,NOT DUPLEX,GRAYSCALE,35kb,
And here's what I have so far:
#!/bin/bash
#Get today's file name
yearprefix="20"
currentdate=$(date +"%m-%d-%y");
year=${currentdate:6};
year="$yearprefix$year"
month=${currentdate:0:2};
day=${currentdate:3:2};
filename="papercut-print-log-$year-$month-$day.csv"
echo "The filename is: $filename"
# Remove commas in between quotes.
#Loop through CSV file
OLDIFS=$IFS
IFS=,
[ ! -f "$filename" ] && { echo "Input file $filename not found"; exit 99; }
while read time user pages copies printer document client size pcl blank1 blank2 duplex greyscale filesize blank3
do
#Remove headers
if [ "$user" != "" ] && [ "$user" != "User" ]
then
#Remove any file name with an apostrophe
if [[ "$document" =~ "'" ]];
then
document="REDACTED"; # Lazy. Need to figure out a proper solution later.
fi
echo "$time"
#Save results to database
mysql -u username -p -h localhost -e "USE printerusage; INSERT INTO printerlogs (time, username, pages, copies, printer, document, client, size, pcl, duplex, greyscale, filesize) VALUES ('$time', '$user', '$pages', '$copies', '$printer', '$document', '$client', '$size', '$pcl', '$duplex', '$greyscale', '$filesize');"
fi
done < $filename
IFS=$OLDIFS
Which option is more suitable for this task? Will I have to create a second temporary file to get this done?
Thanks in advance!
As I wrote in another answer:
Rather than interfere with what is evidently source data, i.e. the stuff inside the quotes, you might consider replacing the field-separator commas (with say |) instead:
s/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g
And then splitting on | (assuming none of your data has | in it).
(From "Is it possible to write a regular expression that matches a particular pattern and then does a replace with a part of the pattern?")
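As a rough sketch of how that could plug into your existing loop (the substitution is Perl-compatible regex, so perl -pe is one way to apply it; treat this as an untested outline rather than a drop-in fix):

# rewrite separator commas as '|' so commas inside quotes survive,
# then split on '|' instead of ','
perl -pe 's/,([^,"]*|"[^"]*")(?=(,|$))/|$1/g' "$filename" |
while IFS='|' read -r time user pages copies printer document client size pcl blank1 blank2 duplex greyscale filesize blank3
do
    printf '%s\n' "$time"   # ...insert into the database as before
done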
There is probably an easier way using sed alone, but this should work. Loop over the file; for each line, match the quoted fields with grep -o, then replace the commas inside them with spaces (or whatever you would like to use to get rid of the commas; if you want to preserve the data, you can use a non-printable character and explode it back to commas afterward).
i=1 && IFS=$(echo -en "\n\b") && for a in $(< test.txt); do   # set IFS to newline so each line is one word
    var="${a}"
    # find every double-quoted field on the current line
    for b in $(sed -n ${i}p test.txt | grep -o '"[^"]*"'); do
        repl="$(sed "s/,/ /g" <<< "${b}")"           # the quoted field with its commas turned into spaces
        var="$(sed "s#${b}#${repl}#" <<< "${var}")"  # splice the fixed field back into the line
    done
    let i+=1
    echo "${var}"
done
I need to use egrep to obtain an entry in an index file.
In order to find the entry, I use the following command:
egrep "^$var_name" index
$var_name is the variable read from a var list file:
while read var_name; do
egrep "^$var_name" index
done < list
One of the possible keys usually comes in this format:
$ERROR['SOME_VAR']
My index file is in the form:
$ERROR['SOME_VAR'] --> n
Where n is the line where the variable is found.
The problem is that $var_name is automatically escaped when read. When I enable the debug mode, I get the following command being executed:
+ egrep '^$ERRORS['\''SELECT_COUNTRY'\'']' index
The command above doesn't work, because egrep will try to interpret the pattern.
If I don't use the extended version, using grep or fgrep, the command will work only if I remove the ^ anchor:
grep -F "$var_name" index # this actually works
The problem is that I need to ensure that the match is made at the beginning of the line.
Ideas?
set -x shows the command being executed in shell notation.
The backslashes you see do not become part of the argument, they're just printed by set -x to show the executed command in a copypastable format.
Your problem is not too much escaping, but too little: $ in regex means "end of line", so ^$ERROR will never match anything. Similarly, [ ] is a character range, and will not match literal square brackets.
The correct regex to match your pattern would be ^\$ERROR\['SOME_VAR'], equivalent to the shell argument in egrep "^\\\$ERROR\['SOME_VAR']".
Your options to fix this are:
If you expect to be able to use regex in your input file, you need to include regex escapes like above, so that your patterns are valid.
If you expect to be able to use arbitrary, literal strings, use a tool that can match flexibly and literally. This requires jumping through some hoops, since UNIX tools for legacy reasons are very sloppy.
Here's one with awk:
while IFS= read -r line
do
export line
gawk 'BEGIN{var=ENVIRON["line"];} substr($0, 1, length(var)) == var' index
done < list
It passes the string in through the environment (because -v mangles backslash escapes) and then matches it literally against the start of each input line.
Here's an example invocation:
$ cat script
while IFS= read -r line
do
export line
gawk 'BEGIN{var=ENVIRON["line"];} substr($0, 1, length(var)) == var' index
done < list
$ cat list
$ERRORS['SOME_VAR']
\E and \Q
'"'%##%*'
$ cat index
hello world
$ERRORS['SOME_VAR'] = 'foo';
\E and \Q are valid strings
'"'%##%*' too
etc
$ bash script
$ERRORS['SOME_VAR'] = 'foo';
\E and \Q are valid strings
'"'%##%*' too
You can use printf "%q":
while read -r var_name; do
egrep "^$(printf "%q\n" "$var_name")" index
done < list
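To see what %q actually produces for the problematic key (output from a recent bash; the exact quoting style can vary between versions):

$ printf '%q\n' "\$ERRORS['SOME_VAR']"
\$ERRORS\[\'SOME_VAR\'\]

Note that %q targets shell re-input rather than regex escaping, so treat this as a pragmatic trick rather than a guarantee.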
Update: with a grep that supports PCRE (e.g. GNU grep's -P, since \Q...\E is a PCRE construct rather than POSIX ERE), you can also do:
while read -r var_name; do
grep -P "^\Q$var_name\E" index
done < list
Here \Q and \E make the string in between literal, removing the special meaning of all regex symbols.
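A quick check against the index file from the awk example above (assuming GNU grep built with PCRE support):

$ var_name="\$ERRORS['SOME_VAR']"
$ grep -P "^\Q$var_name\E" index
$ERRORS['SOME_VAR'] = 'foo';

One caveat: because the variable is pasted between \Q and \E, a value that itself contains \E (like the second entry in the list file) will cut the quoting short.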