Replace all string regex matches with environment variables in bash

I am trying to figure out how to replace each regex match in a file with the value of the environment variable named by a substring of the matched string. Each matched string may also carry a default value, separated from the name by a colon. What I mean is: if I have a file called myFile.properties that contains:
varWithDefault=${envVar1:default1}
varSetNoDefault=${envVar2}
varEmptyValue=${emptyEnvVar}
varEmptyValueWithDefault=${emptyEnvVar:3}
varNotSetWithDefault=${notSetEnvVar:I have : a colon}
# required but no match will prob be an error message
varNotSetNoDefault=${notSetEnvVar}
And I set the following environment variables:
export envVar1=value1
export envVar2=value2
export emptyEnvVar=""
then the file should update to be:
varWithDefault=value1
varSetNoDefault=value2
varEmptyValue=
varEmptyValueWithDefault=
varNotSetWithDefault=I have : a colon
# required but no match will prob be an error message
varNotSetNoDefault=${notSetEnvVar}
I have something that for the most part works... but it doesn't handle an environment variable that is set to an empty string, nor does it handle defaults properly.
# get any variables we would like to replace in the myFile.properties file
# after the colon I say "anything except for a closing bracket" so that it will allow for colons in the value
varStrings=$( grep -Po '\${([_a-zA-Z][_a-zA-Z0-9]*)(:[^}]*)*}' myFile.properties )
# loop through the variables found and replace them with the value of the corresponding environment variable
# varString is a value that looks like: "${my_variable:my_default}"
for varString in ${varStrings[@]} ; do
    # ideally grab these values from the matched regex, but I can't seem to figure out how to do that
    # propName would be: "my_variable"
    # defaultValue would be: "my_default"
    propName=$varString[0]
    defaultValue=$varString[1]
    # this technically gets the values, but I would also need to remove the "${}"
    propName="$( cut -d ':' -f 1- <<< "$varString" )"
    defaultValue="$( cut -d ':' -f 2- <<< "$varString" )"
    # $varString will be a String in the format '${my_variable:my_default}' so I need to strip the default chunk from it before doing this
    # but... to get the environment variable value if there was no default:
    envValue=`eval echo $varString`
    # if there is a matching environment variable, do the replacement; otherwise, spit out a warning
    # the -z will also fail the if check if the value is an empty string, which I don't want. I only want it to go to the else clause if it was not set. Not sure how to do that.
    if [ ! -z "$envValue" ]; then
        echo "Replacing $varString with environment variable value '$envValue'"
        sed -i "s|$varString|$envValue|g" myFile.properties
    else
        # set the default value
        if [[ noDefaultValueGiven ]] ; then
            echo "Warning: No environment variable defined for $envVarName. String not replaced."
        else
            echo "Warning: No environment variable '$envVarName' defined. Using default value '$defaultValue'."
            sed -i "s|$varString|$defaultValue|g" myFile.properties
        fi
    fi
done
The two big issues I'm having with this are:
1. How to loop through each regex match and have access to both regex groups (the sections surrounded by parentheses)
2. How to check whether an environment variable exists based on a string representation of its name (i.e. check whether "${my_variable}" is set, not just whether it is empty)
Does anyone know how this should be done? I'm used to Spring Boot doing this for me.

You can do it with pure Bash regex:
re='^([^=]+=)\$\{([_a-zA-Z][_a-zA-Z0-9]*)(:([^}]*))?\}'
while IFS= read -r line; do
    if [[ $line =~ $re ]]; then                 # match regex
        #declare -p BASH_REMATCH
        var="${BASH_REMATCH[2]}"                # var name in group #2
        if [[ -n ${!var+set} ]]; then           # if var is set in env
            line="${BASH_REMATCH[1]}${!var}"    # use var value
        elif [[ -n ${BASH_REMATCH[4]} ]]; then  # if default value is set
            line="${BASH_REMATCH[1]}${BASH_REMATCH[4]}"  # use default string
        fi
    fi
    echo "$line"  # print each line
done < file.properties
Output:
varWithDefault=value1
varSetNoDefault=value2
varEmptyValue=
varEmptyValueWithDefault=
varNotSetWithDefault=I have : a colon
# required but no match will prob be an error message
varNotSetNoDefault=${notSetEnvVar}
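The loop above prints to stdout; to actually update the file, the output can be redirected to a temporary file and moved back over the original. A self-contained sketch of that variant (the sample property names and values here are made up for illustration):

```shell
#!/usr/bin/env bash
# Sketch: the same substitution loop, but writing the result back to the
# file via a temp file. Sample file contents are illustrative.
printf 'a=${envVar1:default1}\nb=${notSet:fallback}\nc=${notSet2}\n' > file.properties
export envVar1=value1

re='^([^=]+=)\$\{([_a-zA-Z][_a-zA-Z0-9]*)(:([^}]*))?\}'
tmp=$(mktemp)
while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
        var="${BASH_REMATCH[2]}"
        if [[ -n ${!var+set} ]]; then            # env var is set (possibly empty)
            line="${BASH_REMATCH[1]}${!var}"
        elif [[ -n ${BASH_REMATCH[4]} ]]; then   # unset, but a default was given
            line="${BASH_REMATCH[1]}${BASH_REMATCH[4]}"
        fi
    fi
    printf '%s\n' "$line"
done < file.properties > "$tmp" && mv "$tmp" file.properties

cat file.properties
```

The ${!var+set} expansion is what distinguishes "unset" from "set to empty", which is exactly the check issue 2 in the question asks for.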

Related

test that argument looks like a date

I'm modifying a script used to test a report writer. The report writer takes optional --from and --to flags to specify the start and end dates. I'd like to modify the script function that starts up the report writer so that its date arguments are also optional.
Sadly, there are already optional arguments to the function, so I'm trying to test whether an argument is in the right format for a date (we use nn/nn/nnnn).
So, I'm echoing the candidate string and checking with grep whether it is in the correct format. Except it doesn't work.
Here is an extract from the function
# If the next argument looks like a date, consume it and use it to define
# the report start date
looksLikeDate=$(echo $1 | grep -e '[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]')
echo from -
echo \$1: \"$1\"
echo looksLikeDate: \"$looksLikeDate\"
if [ -n $looksLikeDate ]
then
    echo "-n: true"
    FROMFLAG="--from $1"
    shift 1
else
    echo "-n : false"
    FROMFLAG=""
fi
# If the next argument looks like a date, consume it and use it to define
# the report end date
looksLikeDate=$(echo $1 | grep -e '[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]')
echo to -
echo \$1: \"$1\"
echo looksLikeDate: \"$looksLikeDate\"
if [ -n $looksLikeDate ]
then
    echo "-n: true"
    TOFLAG="--to $1"
    shift 1
else
    echo "-n: false"
    TOFLAG=""
fi
...and here is the output with dates...
from -
$1: "09/02/2018"
looksLikeDate: "09/02/2018"
-n: true
to -
$1: "09/02/2018"
looksLikeDate: "09/02/2018"
-n: true
...and without...
from -
$1: ""
looksLikeDate: ""
-n: true
to -
$1: ""
looksLikeDate: ""
-n: true
...what have I missed? I'd expect that since looksLikeDate is demonstrably empty, [ -n $looksLikeDate ] would return false and the code would go down the else path of the if statement.
Update:
Since posting, it occurs to me that the easiest thing is to not to look at the arguments in the function and get callers to pass the --from and --to with the arguments so that I can simply pass $* to the report writer as is done for the existing optional arguments.
Thank you very much for reading; I'm still curious as to why the posted code doesn't work.
That's because you're not quoting your variables! Use more quotes!
So here's what's happening: when Bash sees
[ -n $looksLikeDate ]
it performs parameter expansion, glob expansion, quote removal, etc., and finally sees this (I put one token on each line):
[
-n
]
and you see that the $looksLikeDate part is missing because the parameter $looksLikeDate expands to the empty string before the quote removal step. Then Bash executes the builtin [, and with the closing ], this is equivalent to the following command:
test -n
Now looking at the reference manual for the test builtin, you'll read:
1 argument
The expression is true if, and only if, the argument is not null.
And here, the argument is -n, hence not null, hence the expression is true.
So remember:
Use more quotes! Quote all your variable expansions!
This specific line should look like:
[ -n "$looksLikeDate" ]
Another possibility is to use the [[ keyword:
[[ -n $looksLikeDate ]]
But anyway, quote all your expansions!
Also, you don't need the external tool grep, you can use Bash's internal regex engine or, better yet:
if [[ $1 = [[:digit:]][[:digit:]]/[[:digit:]][[:digit:]]/[[:digit:]][[:digit:]][[:digit:]][[:digit:]] ]]; then
which is a bit long, so use a variable:
date_pattern="[[:digit:]][[:digit:]]/[[:digit:]][[:digit:]]/[[:digit:]][[:digit:]][[:digit:]][[:digit:]]"
if [[ $1 = $date_pattern ]]; then
(and here you mustn't quote the right hand side $date_pattern).
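If you do want a real regex rather than a glob, bash's =~ operator with an anchored ERE works too; a small sketch (the function name is mine):

```shell
#!/usr/bin/env bash
# Sketch: validate nn/nn/nnnn with bash's =~ operator; the ^...$ anchors
# ensure the whole argument matches, not just a substring of it.
looks_like_date() {
    local re='^[0-9]{2}/[0-9]{2}/[0-9]{4}$'
    [[ $1 =~ $re ]]
}

looks_like_date "09/02/2018" && echo "valid"
looks_like_date "9/2/2018"   || echo "invalid"
```

Keeping the regex in a variable matters here: quoting the pattern directly on the right-hand side of =~ would make bash match it literally.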

Get multiple values in an xml file

<!-- someotherline -->
<add name="core" connectionString="user id=value1;password=value2;Data Source=datasource1.comapany.com;Database=databasename_compny" />
I need to grab the values of user id, password, Data Source, and Database. Not all lines are in the same format. My desired result would be (username=value1, password=value2, DataSource=datasource1.comapany.com, Database=databasename_compny).
This regex seems a little too complicated for me. Please explain your answer if possible.
I realised it's better to loop through each line. Here is the code I wrote so far:
while read p || [[ -n $p ]]; do
    #echo $p
    if [[ $p =~ .*connectionString.* ]]; then
        echo $p
    fi
done <a.config
Now inside the if I have to grab the values.
For this solution I am considering:
Some lines can contain no data
No semi-colon ; is inside the data itself (nor field names)
No equal sign = is inside the data itself (nor field names)
A possible solution for your problem would be:
#!/bin/bash
while read p || [[ -n $p ]]; do
    # 1. Only keep what is between the quotes after connectionString=
    filteredLine=`echo $p | sed -n -e 's/^.*connectionString="\(.\+\)".*$/\1/p'`;
    # 2. Ignore empty lines (that do not contain the expected data)
    if [ -z "$filteredLine" ]; then
        continue;
    fi;
    # 3. Split each field onto its own line
    oneFieldByLine=`echo $filteredLine | sed -e 's/;/\r\n/g'`;
    # 4. For each field
    while IFS= read -r field; do
        # extract field name + field value
        fieldName=`echo $field | sed 's/=.*$//'`;
        fieldValue=`echo $field | sed 's/^[^=]*=//' | sed 's/[\r\n]//'`;
        # do stuff with it
        echo "'$fieldName' => '$fieldValue'";
    done < <(printf '%s\n' "$oneFieldByLine")
done <a.xml
Explanations
General sed replacement syntax :
sed 's/a/b/' replaces whatever matches the regex a with the content of b
Step 1
The -n argument tells sed not to print anything unless explicitly told to (via the p flag). In this case it is useful for ignoring the lines that don't match.
^.* - anything at the beginning of the line
connectionString=" - literally connectionString="
\(.\+\)" - a capturing group that stores everything up to the closing quote "
.*$ - anything until the end of the line
\1 tells sed to replace the whole match with only the capturing group (which contains only the data between the quotes)
p tells sed to print out the replacement
Step 3
Replacing ; with \r\n is equivalent to splitting on the semicolons, because bash can then loop over the line breaks
Step 4 - field name
Replaces literal = and the rest of the line with nothing (it removes it)
Step 4 - field value
Replaces everything from the beginning of the line up to and including the first equal sign with nothing ([^=] matches any character except =, so [^=]* covers everything before the =).
Another sed command removes the line breaks by replacing them with nothing.
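The same splitting can also be done without sed, using a bash regex plus IFS; a sketch (the sample line and field values below are made up):

```shell
#!/usr/bin/env bash
# Sketch: pure-bash alternative -- extract the quoted connectionString
# value with =~, then split it on ';' using IFS.
line='<add name="core" connectionString="user id=value1;password=value2;Data Source=ds1.company.com" />'

if [[ $line =~ connectionString=\"([^\"]+)\" ]]; then
    IFS=';' read -ra fields <<< "${BASH_REMATCH[1]}"
    for field in "${fields[@]}"; do
        name=${field%%=*}    # everything before the first '='
        value=${field#*=}    # everything after the first '='
        echo "'$name' => '$value'"
    done
fi
```

This avoids spawning a sed process per field, at the cost of the same assumptions as above: no ';' or '=' inside the data itself.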

How to capture the beginning of a filename using a regex in Bash?

I have a script named edit_file_names.sh and a number of files in a directory, each containing a ? in its name. I want to use the script to shorten the file names by cutting them off right before the ?. For example, these would be my current filenames:
test.file.1?twagdsfdsfdg
test.file.2?
test.file.3?.?
And these would be my desired filenames after running the script:
test.file.1
test.file.2
test.file.3
However, I can't seem to capture the beginning of the filenames in my regex to use in renaming the files. Here is my current script:
#!/bin/bash
cd test_file_name_edit/
regex="(^[^\?]*)"
for filename in *; do
    $filename =~ $regex
    echo ${BASH_REMATCH[1]}
done
At this point I'm just attempting to print the beginning of each filename so that I know I'm capturing the correct string; however, I get the following errors:
./edit_file_names.sh: line 7: test.file.1?twagdsfdsfdg: command not found
./edit_file_names.sh: line 7: test.file.2?: command not found
./edit_file_names.sh: line 7: test.file.3?.?: command not found
How can I fix my code to successfully capture the beginnings of these filenames?
Regex as such may not be the best tool for this job. Instead, I'd suggest using bash parameter expansion. For example:
#!/bin/bash
files=(test.file.1?twagdsfdsfdg test.file.2? test.file.3?.?)
for f in "${files[@]}"; do
    echo "${f} shortens to ${f%%\?*}"
done
which prints
test.file.1?twagdsfdsfdg shortens to test.file.1
test.file.2? shortens to test.file.2
test.file.3?.? shortens to test.file.3
Here, ${f%%\?*} expands f and trims the longest suffix that matches a ? followed by any characters (the ? must be escaped since it's a wildcard character).
You're missing the test construct [[ ]]:
for filename in *; do
    [[ $filename =~ $regex ]] && echo "${BASH_REMATCH[1]}"
done
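To go from printing the trimmed names to actually renaming the files, the parameter expansion can feed mv directly; a sketch that sets up its own sample files in a temp directory so nothing real is touched:

```shell
#!/usr/bin/env bash
# Sketch: rename each file to everything before the first '?'.
# Sample files (taken from the question) are created in a temp dir.
dir=$(mktemp -d)
touch "$dir/test.file.1?twagdsfdsfdg" "$dir/test.file.2?" "$dir/test.file.3?.?"

for f in "$dir"/*; do
    new=${f%%\?*}                   # trim from the first '?' onward
    [ "$f" = "$new" ] && continue   # no '?' in the name: skip
    mv -- "$f" "$new"
done

ls "$dir"
```

The continue guard avoids mv complaining when a name contains no '?' at all.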

How to find and replace special chars within a string in zsh

I'm trying to build a quick scp (secure copy) helper function. When I run the command it works with a single file OR with the entire directory, but as soon as I put a /* after the local_repo it returns zsh: no matches found: hackingedu/*.
If I run the command as scp hackingedu\/\* hackingedu it works properly. I think I'm on the right track, but can't get it to work.
contains() {
    string="$1"
    substring="$2"
    if test "${string#*$substring}" != "$string"
    then
        # echo '$substring is in $string'
        return 1  # $substring is in $string
    else
        # echo '$substring is not in $string'
        return 0  # $substring is not in $string
    fi
}
# Quickly scp files in Workspace to Remote
function scp() {
    local_repo="$1"
    remote_repo="$2"
    # find all the `*` and replace with `/*`
    if [ contains $local_repo '*' ]; then
        # replace all instances of * with \* <- HOW TO DO
    fi
    command scp -r $LOCAL_REPOS/$local_repo $ALEX_SERVER_UNAME@$ALEX_SERVER_PORT:$ALEX_REMOTE_ROOT_PATH/$remote_repo
    # Description: $1: Local Repo | $2: Remote Repo
    # Define ex: scpp local/path/to/file/or/directory/* remote/path/to/file/or/directory/*
    # Live ex: scpp alexcory/index.php alexcory/index.php
    # Live ex: scpp alexcory/* alexcory/*
    #
    # This saves you from having long commands that look like this:
    # scp -r ~/Google\ Drive/server/Dev/git\ repositories/hackingedu/* alexander@alexander.com:/home2/alexander/public_html/hackingedu/beta
}
Command trying to execute: scp -r ~/Google\ Drive/server/Dev/git\ repositories/hackingedu/* alexander@alexander.com:/home2/alexander/public_html/hackingedu/beta
Any ideas on how to find and replace an *? If there's a better way to do this please do tell! :)
If you know how to do this in bash I would like your input as well!
References:
How do you tell if a string contains another string in Unix shell scripting?
ZSH Find command replacement
ZSH Find command replacement 2
Using wildcards in commands with zsh
You can either prefix your scp call using noglob (which will turn off globbing for that command, e.g. noglob ls *) or use
autoload -U url-quote-magic
zle -N self-insert url-quote-magic
zstyle -e :urlglobber url-other-schema '[[ $words[1] == scp ]] && reply=("*") || reply=(http https ftp)'
the above should make zsh auto quote * when you use scp.
[...]
BTW, in any case, you should learn that you can easily quote special characters using ${(q)variable_name}, e.g.
% foo='*&$%normal_chars'
% echo $foo
*&$%normal_chars
% echo ${(q)foo}
\*\&\$%normal_chars
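Since the question also asks about bash: bash's closest analogue to zsh's ${(q)...} flag is printf %q, a sketch:

```shell
#!/usr/bin/env bash
# Sketch: printf %q escapes glob and shell metacharacters, much like
# zsh's ${(q)...} parameter expansion flag.
foo='*&$%normal_chars'
printf -v quoted '%q' "$foo"   # store the escaped form in $quoted
echo "$quoted"                 # → \*\&\$%normal_chars
```

The -v option writes the result into a variable instead of printing it, which is handy when building up a command string.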

Multiple multi-line regex matches in Bash

I'm trying to do some fairly simple string parsing in bash script.
Basically, I have a file that is comprised of multiple multi-line fields. Each field is surrounded by a known header and footer.
I want to extract each field separately into an array or similar, like this
>FILE=`cat file`
>REGEX="######[\s\S]+?#####"
>
>if [[$FILE =~ $REGEX ]] then
> echo $BASH_REMATCH
>fi
FILE:
######################################
this is field one
######
######################################
this is field two
they can be any number of lines
######
Now I'm pretty sure the problem is that bash doesn't match newlines with the "."
I can match this with "pcregrep -M", but of course the whole file is going to match. Can I get one match at a time from pcregrep?
I'm not opposed to using some inline perl or similar.
If you have gawk:
awk 'BEGIN{ RS="##*#" }
NF{
    gsub("\n"," ")  # remove this if you want to retain newlines
    print "-->"$0
    # put to array
    arr[++d]=$0
}' file
output
$ ./shell.sh
--> this is field one
--> this is field two they can be any number of lines
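To land those records in a real bash array (the question asks for "an array or similar"), the awk output can be read with readarray; a self-contained sketch that recreates the sample file:

```shell
#!/usr/bin/env bash
# Sketch: one flattened record per awk output line, collected into a
# bash array via readarray and process substitution. Requires gawk for
# the regex record separator RS.
cat > file <<'EOF'
######################################
this is field one
######
######################################
this is field two
they can be any number of lines
######
EOF

readarray -t fields < <(
    awk 'BEGIN{ RS="##*#" }
         NF{ gsub("\n"," "); gsub(/^ +| +$/,""); print }' file
)

printf 'field: %s\n' "${fields[@]}"
```

The extra gsub trims the leading/trailing spaces left behind when the newlines around each record are flattened.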
The TXR language performs whole-document multi-line matching, binds variables, and (with the -B "dump bindings" option) emits properly escaped shell variable assignments that can be eval-ed. Arrays are supported.
The # character is special so it has to be doubled up to match literally.
$ cat fields.txr
#(collect)
#########################################
# (collect)
#field
# (until)
#########
# (end)
# (cat field)## <- catenate the fields together with a space separator by default
#(end)
$ txr -B fields.txr data
field[0]="this is field one"
field[1]="this is field two they can be any number of lines"
$ eval $(txr -B fields.txr data)
$ echo ${field[0]}
this is field one
$ echo ${field[1]}
this is field two they can be any number of lines
The #field syntax matches an entire line. These are collected into a list since it is inside a #(collect), and the lists are collected into lists-of-lists because that is nested inside another #(collect). The inner #(cat field) however, reduces the inner lists to a single string, so we end up with a list of strings.
This is "classic TXR": how it was originally designed and used, sparked by the idea:
Why don't we make here-documents work backwards and do parsing from reams of text into variables?
This implicit emission of matched variables by default, in the shell syntax by default, continues to be a supported behavior even though the language has grown much more powerful, so there is less of a need to integrate with shell scripts.
I would build something around awk. Here is a first proof of concept:
awk '
BEGIN{ f=0; fi="" }
/^######################################$/{ f=1 }
/^######$/{ f=0; print"Field:"fi; fi="" }
{ if(f==2)fi=fi"-"$0; if(f==1)f++ }
' file
begin="######################################"
end="######"
i=0
flag=0
while read -r line
do
    case $line in
        $begin)
            flag=1;;
        $end)
            ((i++))
            flag=0;;
        *)
            if [[ $flag == 1 ]]
            then
                array[i]+="$line"$'\n' # retain the newline
            fi;;
    esac
done < datafile
If you want to keep the marker lines in the array elements, move the assignment statement (with its flag test) to the top of the while loop before the case.
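For completeness, here is the same while/case approach as a self-contained sketch, with a sample data file and a final loop that prints what was collected (i is bumped with arithmetic expansion so the sketch also behaves under set -e):

```shell
#!/usr/bin/env bash
# Sketch: the while/case field collector from above, made runnable with
# a generated sample file; each field ends up in one array element.
cat > datafile <<'EOF'
######################################
this is field one
######
######################################
this is field two
they can be any number of lines
######
EOF

begin="######################################"
end="######"
i=0
flag=0
while read -r line; do
    case $line in
        "$begin") flag=1 ;;
        "$end")   i=$((i+1)); flag=0 ;;
        *)  if [[ $flag == 1 ]]; then
                array[i]+="$line"$'\n'   # retain the newline
            fi ;;
    esac
done < datafile

for ((n=0; n<i; n++)); do
    printf 'field %d:\n%s' "$n" "${array[n]}"
done
```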