POSIX sh: find and replace with function


In JavaScript, you can do:
someComplexProcessing = (wholeMatch, group1, group2, index, mystr)=> replacement...
mystr.replace(/some.* regex(with) multiple (capture groups)/g, someComplexProcessing)
eg.
const renderTemplate = (str, env)=> str.replace(/{{(.*?)}}/g, (_, name)=> env[name])
renderTemplate('{{salut}} {{name}}!', {salut: 'Hi', name: 'Leo'}) // "Hi Leo!"
What is the best POSIX compatible, generic, variant?
- reusability # eg. a function taking regex, processingFunction, and input, etc - that I could put in my .shellrc/source lib.sh or similar and reuse
- multiline # eg. if "uppercase everything between {{ and }}", `a {{b\nc}}` -> `a B\nC`
- no escape gotchas # eg. it shouldn't break if input, replacement, or regex contains special characters
- POSIX compatible # eg. running it under `docker run --rm -it alpine sh`, etc
- using regex # eg. perl regex seems like the most prominent one, please note differences from it if other is used
meriting:
- no/less dependencies # eg. as portable as possible
- multiple capture groups
- performance
- security # related to no escape gotchas, eg. ok with untrusted input
I've found a couple of solutions for bash, and a few portable solutions for edge cases, but none that does it all anywhere near as simply as JS's .replace. Ultimately, I want to program without thinking too much about implementation details and gotchas, and without pulling in hundreds of MB (mostly into an Alpine container, but also on Ubuntu and OSX), so I am trying to build up a library of portable, POSIX-compatible snippets, functions, and patterns.

An inefficient solution follows. It partially escapes the input (it assumes the input contains no \r) but does not escape the regex arguments, and it supports only one capture group (middle). It is portable, though: it uses only tr, sed, printf, and the -z empty-string test. (The sed parts could possibly be changed to something generally Perl-regex compatible.)
lib.sh:
#!/usr/bin/env sh
multiline_substitute_with_fn () {
sub_start="$1"; shift; fn_name="$1"; shift; sub_end="$1"; shift; left="$(cat)";
# uppercase () { cat | tr 'a-z' 'A-Z'; }; echo 'Hello [there]!' | multiline_substitute_with_fn '\[' uppercase '\]'
# make single-line, sanitize input against _SUB(START|END)_, a\ra {{echo "b\rb"}} c {{echo d}} e
left="$(echo "$left" | tr '\n' '\r' | sed 's/_SUB/_ASUB/g')"
while [ ! -z "$left" ]; do
left="$(echo "$left" | sed "s/$sub_start/_SUBSTART_/")" # a\ra _SUBSTART_echo "b\rb"}} c {{echo d}} e
printf '%s' "$(echo "$left" | sed 's/_SUBSTART_.*//' | sed 's/_ASUB/_SUB/g' | tr '\r' '\n')" # a\na
lefttmp="$(echo "$left" | sed 's/.*_SUBSTART_//' | sed "s/$sub_end/_SUBEND_/")" # echo "b\rb"_SUBEND_ c {{echo d}} e
if [ "$lefttmp" = "$left" ]; then left=''; break; fi
left="$lefttmp"
middle="$(echo "$left" | sed 's/_SUBEND_.*//' | tr '\r' '\n')" # echo "b\nb"
[ ! -z "$middle" ] && printf '%s' "$(echo "$middle" | $fn_name | sed 's/_ASUB/_SUB/g')" # b\nb
left="$(echo "$left" | sed 's/.*_SUBEND_//')" # c {{echo d}} e
done
}
usage:
cat file | multiline_substitute_with_fn 'start regex' processingFunction 'end regex'
eg. usage:
#!/usr/bin/env sh
. ./lib.sh # load lib
uppercase () { cat | tr 'a-z' 'A-Z'; };
echo 'Hello [there]!' | multiline_substitute_with_fn '\[' uppercase '\]'
# -> Hello THERE!
eval_template () { # not "safe" in terms of eval
# echo 'a\na {{echo "b\nb"}} c {{echo d}} e' | eval_template # -> 'a\na b\nb c d e'
# hello=hi; echo '{{=$hello}} there' | eval_template # -> {{echo "$hello"}} there -> 'hi there'
fn () {
middle="$(cat)"
case "$middle" in =*) middle="echo \"${middle#=}\"" ;; *);; esac # '=$a' -> 'echo "$a"'
eval "$middle"
}
cat | multiline_substitute_with_fn '{{' fn '}}'
}
eval_template <<-EOF
a
a {{echo "b
b"}} c {{echo d}} e
EOF
# -> a
# a b
# b c d e
echo '{{=$salut}} {{=$name}}!' > my.template
salut=Hi; name="Leo Name";
cat my.template | eval_template
# Hi Leo Name!
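For the renderTemplate case specifically, a smaller sketch is possible with POSIX awk alone. This is only an illustration under stated assumptions, not a general answer to the question: render_template is a hypothetical name, placeholders must not span lines, and the values are read from exported environment variables via ENVIRON. Because the replacement is concatenated rather than passed through sub(), characters such as & or \ in the values are not reinterpreted.

```shell
#!/bin/sh
# Hypothetical helper (not from the question): replace each {{name}} on a line
# with the value of the exported environment variable called name.
render_template() {
  awk '
  {
    out = ""
    # [{] and [}] are bracket expressions, so the braces are always literal
    while (match($0, /[{][{][^}]*[}][}]/)) {
      name = substr($0, RSTART + 2, RLENGTH - 4)      # text between {{ and }}
      out = out substr($0, 1, RSTART - 1) ENVIRON[name]
      $0 = substr($0, RSTART + RLENGTH)               # continue after the match
    }
    print out $0
  }'
}

export salut=Hi name='Leo & Co'
printf '%s\n' '{{salut}} {{name}}!' | render_template
# -> Hi Leo & Co!
```

The same match()/RSTART/RLENGTH loop could call any awk expression in place of the ENVIRON lookup; calling back into a shell function per match would need a different design.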

Related

Grep a result from Hive output log

I have an output from Hive. I stored that output in a variable called match.
I am isolating the line I need from the log using the command below.
echo $(echo $match | grep "COUNT_TOTAL_MATCH")
0: jdbc:hive2://hiveaddress> . . . . . . . . . . . . . . . . . . . . . . .> +--------------------+-------+--+ | stats | _c1 | +--------------------+-------+--+ | COUNT_TOTAL_MATCH | 1000 | +--------------------+-------+--+ 0: jdbc:hive2://hiveaddress> 0: jdbc:hive2://hiveaddress>
How do I grab the 1000 value knowing it could be any other number?

You can treat | (space pipe space) as the field delimiter and print the sixth field, like this:
awk -F ' \\| ' '{ print $6 }'
Notice that the pipe has to be escaped twice.
Side note:
echo $(echo $match | grep "COUNT_TOTAL_MATCH")
can be rewritten as
grep 'COUNT_TOTAL_MATCH' <<< "$match"
No echo, no pipes, and no word splitting in $match. echo "$(command)" is always the same as just command. (Notice that quoting makes a difference, though.)
This means that you can combine your grep and awk commands into this:
awk -F ' \\| ' '/COUNT_TOTAL_MATCH/ { print $6 }' <<< "$match"
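If the shell has no here-strings (plain POSIX sh), the same awk command can be fed with printf. The sketch below uses a shortened stand-in for the flattened Hive output; the variable name count is only illustrative.

```shell
#!/bin/sh
# Shortened stand-in for the flattened Hive output shown in the question
match='0: jdbc:hive2://hiveaddress> +---+---+ | stats | _c1 | +---+---+ | COUNT_TOTAL_MATCH | 1000 | +---+---+'

# Fields are separated by " | "; the number is the sixth field
count=$(printf '%s\n' "$match" | awk -F ' \\| ' '/COUNT_TOTAL_MATCH/ { print $6 }')
printf '%s\n' "$count"
# -> 1000
```

Note that the field number depends on the whole output being on one line, as it is after the unquoted echo in the question.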

try
grep -oP 'COUNT_TOTAL_MATCH\h*\|\h*\K\d+'
\h*\|\h* optional space/tab followed by | followed by optional space/tab
\K resets the start of the reported match, so it acts like a variable-length lookbehind: COUNT_TOTAL_MATCH\h*\|\h* must match but is left out of the output
\d+ get digits
From man grep
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output
line.
-P, --perl-regexp
Interpret the pattern as a Perl-compatible regular expression (PCRE). This is highly experimental and
grep -P may warn of unimplemented features.
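Since grep -P is a GNU extension, a portable alternative is sketched here with sed and a basic regular expression (where | is a literal character). The sample line and the variable names are assumptions made for the illustration.

```shell
#!/bin/sh
# One table row as it would appear in the Hive output
line='| COUNT_TOTAL_MATCH | 1000 |'

# Capture the digits that follow COUNT_TOTAL_MATCH, a pipe, and optional blanks
count=$(printf '%s\n' "$line" |
  sed -n 's/.*COUNT_TOTAL_MATCH[[:space:]]*|[[:space:]]*\([0-9][0-9]*\).*/\1/p')
printf '%s\n' "$count"
# -> 1000
```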

How to use sed to replace every match according to each match?

$ echo 'a,b,c,d=1' | sed '__MAGIC_HERE__'
a=1,b=1,c=1,d=1
$ echo 'a,b,c,d=2' | sed '__MAGIC_HERE__'
a=2,b=2,c=2,d=2
Can sed cast this spell?
EDIT
I have to use sed twice to achieve this
s='a,b,c,d=2'
v=`echo $s | sed -rn 's/.*([0-9]+)/\1/p'`
echo $s | sed "s/=.*//" | sed -rn "s/([a-z])/\1=$v/gp"
OR
s='a,b,c,d=2'
echo $s | sed -rn 's/.*([0-9]+)/\1/p' | { read v;echo $s | sed "s/=.*//" | sed -rn "s/([a-z])/\1=$v/gp"; }
EDIT
The real use case is below and involves multiline content. Thanks to #hek2mgl, the awk version is much easier.
EDIT
My usecase
export LS_COLORS='no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:ex=01;32'
exts="
tar|tgz|arj|taz|lzh|zip|z|Z|gz|bz2|deb|rpm|jar=01;31
jpg|jpeg|gif|bmp|pbm|pgm|ppm|tga|xbm|xpm|tif|tiff|png=01;34
mov|fli|gl|dl|xcf|xwd|ogg|mp3|wav=01;35
flv|mkv|mp4|mpg|mpeg|avi=01;36
"
# SED Version
read -rd '' exts < <(
for i in $(echo $exts)
do
echo $i | sed -rn 's/.*=(.*)/\1/p' | { read v; echo $i | sed "s/=.*//" | sed -rn "s/([^|]+)\|?/:\*.\1=$v/gp"; }
done | tr -d '\n'
)
export LS_COLORS="$LS_COLORS$exts"
# AWK Version
read -r -d '' exts < <( echo $exts | xargs -n1 | awk -F= '{gsub(/\|/,"="$2":*.")}$2' | tr "\n" ":" )
export LS_COLORS="$LS_COLORS:*.$exts"
unset exts
EDIT
Final sed version
read -r -d '' exts < <( echo $exts | xargs -n1 | sed -r 's/\|/\n/g;:a;s/\n(.*(=.*))/\2:*.\1/;ta' | sed "s/^/*./g" | tr "\n" ":" )
export LS_COLORS="$LS_COLORS:$exts"

This might work for you (GNU sed):
sed -r 's/,/\n/g;:a;s/\n(.*(=.*))/\2,\1/;ta' file
Convert the separators to newlines (a unique character not found in the file) and then replace each occurrence of the newline by the required string and the original separator.

I would use awk:
awk -F= '{gsub(/,/,"="$2",")}1'
-F= splits the input line by =, which lets us access the number in field two, $2. gsub() replaces all occurrences of , with =$2,. The 1 at the end is an awk idiom: it simply prints the (modified) line.
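As a quick check of that explanation, the one-liner can be wrapped in a function and run on two lines. The name expand_pairs is hypothetical, and print is spelled out in place of the trailing 1. One caveat to keep in mind: a value containing & or \ would be interpreted specially in the gsub() replacement.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk one-liner from this answer
expand_pairs() {
  awk -F= '{ gsub(/,/, "=" $2 ","); print }'
}

result=$(printf '%s\n' 'a,b,c,d=1' 'x,y=22' | expand_pairs)
printf '%s\n' "$result"
# -> a=1,b=1,c=1,d=1
# -> x=22,y=22
```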

Perl can...
echo 'a,b,c,d=1' | perl -ne 'chomp; my ($val) = m|=(\d+)|; s|\=.*||; print join(",", map {"$_=$val"} split/,/) . "\n";'
a=1,b=1,c=1,d=1
Explained
perl -ne # Loop over input and run command
chomp; # Remove trailing newline
my ($val) = m|=(\d+)|; # Find numeric value after '='
s|\=.*||; # Remove everything starting with '='
split /,/ # Split input on ',' => ( a, b, c, d )
map {"$_=$val" } # Create strings ( "a=1", "b=1", ... ) from results of split
join(",",...) # Join the results of previous map with ','
print .... "\n" # Print it all out with a newline at the end.

I hope you're not seriously going to use that mush of read/echo/xargs/sed/sed/tr in your code. Just use one small, simple awk script:
$ cat tst.sh
exts="
tar|tgz|arj|taz|lzh|zip|z|Z|gz|bz2|deb|rpm|jar=01;31
jpg|jpeg|gif|bmp|pbm|pgm|ppm|tga|xbm|xpm|tif|tiff|png=01;34
mov|fli|gl|dl|xcf|xwd|ogg|mp3|wav=01;35
flv|mkv|mp4|mpg|mpeg|avi=01;36
"
exts=$( awk -F'=' '
NF {
gsub(/\||$/, "="$2":", $1)
out = out $1
}
END {
sub(":$", "", out)
print out
}
' <<<"$exts" )
echo "$exts"
$ ./tst.sh
tar=01;31:tgz=01;31:arj=01;31:taz=01;31:lzh=01;31:zip=01;31:z=01;31:Z=01;31:gz=01;31:bz2=01;31:deb=01;31:rpm=01;31:jar=01;31:jpg=01;34:jpeg=01;34:gif=01;34:bmp=01;34:pbm=01;34:pgm=01;34:ppm=01;34:tga=01;34:xbm=01;34:xpm=01;34:tif=01;34:tiff=01;34:png=01;34:mov=01;35:fli=01;35:gl=01;35:dl=01;35:xcf=01;35:xwd=01;35:ogg=01;35:mp3=01;35:wav=01;35:flv=01;36:mkv=01;36:mp4=01;36:mpg=01;36:mpeg=01;36:avi=01;36

Perl, another Perl alternative...
d=1:
echo 'a,b,c,d=1' | perl -pe '($a)=/(\d+)$/; s/,/=$a,/g;'
a=1,b=1,c=1,d=1
d=2:
echo 'a,b,c,d=2' | perl -pe '($a)=/(\d+)$/; s/,/=$a,/g;'
a=2,b=2,c=2,d=2
Explanations:
perl -e # perl one-liner switch
perl -ne # puts an implicit loop for each line of input
perl -pe # as 'perl -ne', but adds an implicit print at the end of each iteration
($a)=/(\d+)$/; # catch the number in d=1 or d=2, assign variable $a
s/,/=$a,/g; # substitute each ',' with '=1,' if $a=1

Regex w/grep against tnsnames.ora

I am trying to print out the contents of a TNS entry from the tnsnames.ora file to make sure it is correct from an Oracle RAC environment.
So if I do something like:
grep -A 4 "mydb.mydomain.com" $ORACLE_HOME/network/admin/tnsnames.ora
I will get back:
mydb.mydomain.com =
(DESCRIPTION =
(ADDRESS =
(PROTOCOL = TCP)(HOST = myhost.mydomain.com)(PORT = 1521))
  (CONNECT_DATA =(SERVER = DEDICATED)(SERVICE_NAME=mydb)))
Which is what I want. Now I have an environment variable being set for the JDBC connection string by an external program when the shell script gets called like:
export DB_URL='#myhost.mydomain.com:1521/mydb'
So I need to get TNS alias mydb.mydomain.com out of the above string. I'm not sure how to do multiple matches and reorder the matches with regex and need some help.
echo "$DB_URL" | grep -oE '#.+:'
I assume will get the
#myhost.mydomain.com:
but I'm looking for
mydb.mydomain.com
So I'm stuck at this part. How do I get the TNS alias and then pipe/combine it with the initial grep to display the text for the TNS entry?
Thanks
update:
#mklement0 #Walter A - I tried your ways but they are not exactly what I was looking for.
echo "#myhost.mydomain.com:1521/mydb" | grep -Po "#\K[^:]*"
echo "#myhost.mydomain.com:1521/mydb" | sed 's/.*#\(.*\):.*/\1/'
echo "#myhost.mydomain.com:1521/mydb" | cut -d"#" -f2 | cut -d":" -f1
echo "#myhost.mydomain.com:1521/mydb" | tr "#:" "\t" | cut -f2
echo "#myhost.mydomain.com:1521/mydb" | awk -F'[#:]' '{ print $2 }'
All these methods get me back: myhost.mydomain.com
What I am looking for is actually: mydb.mydomain.com

Note:
- For brevity, the commands below use bash/ksh/zsh here-string syntax to send strings to stdin (<<<"$var"). If your shell doesn't support this, use printf %s "$var" | ... instead.
The following awk command will extract the desired string (mydb.mydomain.com) from $DB_URL (#myhost.mydomain.com:1521/mydb):
awk -F '[#:/]' '{ sub("^[^.]+", "", $2); print $4 $2 }' <<<"$DB_URL"
-F '[#:/]' tells awk to split the input into fields by either # or : or /. With your input, this means that the parts of interest are in the second field ($2) and the fourth field ($4). The sub() call removes the first .-delimited component from $2, and the print call pieces together the result.
To put it all together:
domain=$(awk -F '[#:/]' '{ sub("^[^.]+", "", $2); print $4 $2 }' <<<"$DB_URL")
grep -F -A 4 "$domain" "$ORACLE_HOME/network/admin/tnsnames.ora"
You don't strictly need intermediate variable $domain, but I've added it for clarity.
Note how -F was added to grep to specify that the search term should be treated as a literal, so that characters such as . aren't treated as regex metacharacters.
Alternatively, for more robust matching, use a regex that is anchored to the start of the line with ^, and \-escape the . chars (using shell parameter expansion) to ensure their treatment as literals:
grep -A 4 "^${domain//./\.}" "$ORACLE_HOME/network/admin/tnsnames.ora"
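For shells without here-strings or ${var//...} (for example dash or busybox sh), the same result can be sketched with POSIX parameter expansion plus one sed call for the escaping. The variable names rest, host, service, tns_alias and pattern are illustrative only.

```shell
#!/bin/sh
DB_URL='#myhost.mydomain.com:1521/mydb'

rest=${DB_URL#*'#'}               # myhost.mydomain.com:1521/mydb
host=${rest%%:*}                  # myhost.mydomain.com
service=${DB_URL##*/}             # mydb
tns_alias="$service.${host#*.}"   # mydb.mydomain.com

# Escape the dots so they are literal in a regex anchored with ^
pattern=$(printf '%s\n' "$tns_alias" | sed 's/[.]/\\./g')

printf '%s\n' "$tns_alias" "^$pattern"
# -> mydb.mydomain.com
# -> ^mydb\.mydomain\.com
```

The anchored pattern could then be passed to grep in place of "^${domain//./\.}".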

You can get a part of a string with
# Only GNU-grep
echo "#myhost.mydomain.com:1521/mydb" | grep -Po "#\K[^:]*"
# or
echo "#myhost.mydomain.com:1521/mydb" | sed 's/.*#\(.*\):.*/\1/'
# or
echo "#myhost.mydomain.com:1521/mydb" | cut -d"#" -f2 | cut -d":" -f1
# or, when the string already is in a var
echo "${DB_URL#*#}" | cut -d":" -f1
# or using a temp var
tmpvar="${DB_URL#*#}"
echo "${tmpvar%:*}"
I had skipped the alternative awk, that was given by #mklement0 already:
echo "#myhost.mydomain.com:1521/mydb" | awk -F'[#:]' '{ print $2 }'
The awk solution is straightforward. If you want to use the same approach without awk, you can do something like
echo "#myhost.mydomain.com:1521/mydb" | tr "#:" "\t" | cut -f2
or the ugly
echo "#myhost.mydomain.com:1521/mydb" | (IFS='#:' read -r _ url _; echo "$url")
What is happening here?
With the new IFS in effect, read splits the input on # and :, and I take the second word. The first word and the remaining words are caught in the dummy variable _ (you could have named them dummyvar1 and dummyvar2). The pipe | creates a subprocess, so you need the ( ) to keep reading and displaying the variable url in the same process.
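Because the value is read inside a subprocess, it has to be printed there and captured with command substitution if it is needed later in the script. A minimal sketch, using the hypothetical names lead, host and rest in place of the _ placeholders:

```shell
#!/bin/sh
# read and printf share one process inside { }; $( ) captures what printf writes
host=$(printf '%s\n' '#myhost.mydomain.com:1521/mydb' |
  { IFS='#:' read -r lead host rest; printf '%s\n' "$host"; })

printf '%s\n' "$host"
# -> myhost.mydomain.com
```

The IFS assignment is a prefix to read only, so the script's own IFS is left unchanged.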

what is regular expression to get the data after _

I have a file name like 2015_q1_cricket_international.txt
How can I get the parts after each underscore (_)?
My final output should be 2015internationalcricket

Using awk
Let's create a shell variable with your file name:
$ fname=2015_q1_cricket_international.txt
Now, let's extract the parts that you want:
$ echo "$fname" | awk -F'[_.]' '{print $1 $4 $3}'
2015internationalcricket
How it works:
-F'[_.]' tells awk to split the input anywhere it sees either a _ or a .
print $1 $4 $3 tells awk to print the parts that you asked for
Using shell
$ echo "$fname" | { IFS='_.' read a b c d e; echo "$a$d$c"; }
2015internationalcricket
Using sed
$ echo "$fname" | sed -E 's/^([^_.]*)_([^_.]*)_([^_.]*)_([^_.]*).*/\1\4\3/'
2015internationalcricket
Capturing to a shell variable
If we want to put the new string in a shell variable, we use command substitution:
var=$(echo "$fname" | awk -F'[_.]' '{print $1 $4 $3}')
var=$(echo "$fname" | { IFS='_.' read a b c d e; echo "$a$d$c"; })
var=$(echo "$fname" | sed -E 's/^([^_.]*)_([^_.]*)_([^_.]*)_([^_.]*).*/\1\4\3/')
If the shell is bash, we can do this more directly:
IFS='_.' read a b c d e <<<"$fname"
var="$a$d$c"
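Where here-strings are not available, the fields can also be split without a pipeline by letting the shell word-split on IFS. This is a sketch under assumptions: reorder_name is a hypothetical name, and the function body runs in a subshell so the changes to IFS, globbing and the positional parameters do not leak out.

```shell
#!/bin/sh
# Hypothetical helper: split $1 on _ and . and print fields 1, 4 and 3
reorder_name() (
  IFS='_.'
  set -f            # no filename expansion while splitting
  set -- $1         # unquoted on purpose: split on IFS
  printf '%s\n' "$1$4$3"
)

fname=2015_q1_cricket_international.txt
var=$(reorder_name "$fname")
printf '%s\n' "$var"
# -> 2015internationalcricket
```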

.*_([^_]*)_.* gets «cricket» as \1

You can use String.Split('_') and get an array of results, or you can use the regular expression _[A-Za-z0-9]*, which matches each underscore together with the alphanumeric characters that follow it. For this file name it matches three times.
All the results are returned in an Array.

Which characters to escape to match these in find regex expression in Bourne shell?

I am writing a little Bourne shell script which loads a conf file containing a string. This string is used in find (after some awk tricks), as in the following example:
original string:
rx='~ #'
find command:
find -regex "^.*~$\|^.*#$"
EDIT: the original string is in a conf file, so the problem arises when the string contains special characters such as "*" or ".". Example:
original string (with characters to escape):
rx='~ # $*'
EDIT2: I am trying to match any file whose name ends with one of the words in rx (separated by spaces). If rx="st ar", I want to match "test" and "bar". But if a word contains characters such as * or $, my regex doesn't work properly. So I want to know which characters I have to escape to make it work.
Thanks! :)

As I understand it, you want to split your string on spaces, and match any substring from that split.
The irc.freenode.org #bash channel has a factoid providing a function for performing quoting, used below with some minor tweaks for POSIX compatibility:
requote() { printf '%s\n' "$1" | sed 's/[^^]/[&]/g; s/\^/\\^/g'; }
input_string='hello# cruel*world how~are~you'
output_string=$(printf '%s\n' "$input_string" | tr ' ' '\n' | {
out_s=''
while read -r line; do
if [ -n "$out_s" ]; then
out_s="${out_s}|$(requote "$line")"
else
out_s="$(requote "$line")"
fi
done
printf '%s\n' "$out_s"
})
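# GNU find: the default (emacs) regex syntax treats ( ) | as literals, so select ERE explicitly
find . -regextype posix-extended -regex ".*(${output_string}).*"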
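To see what the quoting function produces, it can be run on a couple of sample words. The sketch below repeats requote from above so that it is self-contained; the sample inputs and variable names are made up for the illustration.

```shell
#!/bin/sh
# Same function as above: wrap every character except ^ in [ ], and escape ^ as \^
requote() { printf '%s\n' "$1" | sed 's/[^^]/[&]/g; s/\^/\\^/g'; }

quoted_star=$(requote 'a*b')
quoted_caret=$(requote 'x^y')
printf '%s\n' "$quoted_star" "$quoted_caret"
# -> [a][*][b]
# -> [x]\^[y]

# The quoted form matches only the literal text, not "aab"
hits=$(printf '%s\n' 'a*b' 'aab' | grep -c "^${quoted_star}\$")
printf '%s\n' "$hits"
# -> 1
```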

OK, thanks to Charles Duffy, I understand that the right method is to wrap each character in "[]" to make it safe in a regex, except for '^', which is escaped as '\^'. Here's what I did, based on Mr. Duffy's answer.
So, I have an initial string and I want to match any of the words in this string.
Initial string (Emacs temp-file suffixes, plus extra words to illustrate the trick):
rx=' ~ # oo ^ '
First, I trim the string like this:
rx=`printf '%s\n' "$rx" | awk '{$1=$1};1'`
==> rx='~ # oo ^'
Second, I do the sed trick of Duffy with some change to apply in my case:
rx=$(printf '%s\n' "$rx" | sed 's/[[:blank:]]/ /g; s/[^^ ]/[&]/g; s/\^/\\^/g')
==> rx='[~] [#] [o][o] \^'
Third, I apply a little awk command to make a regex:
rx=$(printf '%s\n' "$rx" | awk '{ gsub(" ", "$\\|^.*", $0); print "^.*"$0"$" }')
==> rx='^.*[~]$\|^.*[#]$\|^.*[o][o]$\|^.*\^$'
Finally, I just exec my find command like this:
find -regex "$rx"
et voilà !
BTW, I'm doing all of this in one pipeline:
rx=$(printf '%s\n' "$rx" | awk '{$1=$1};1' | sed 's/[[:blank:]]/ /g; s/[^^ ]/[&]/g; s/\^/\\^/g' | awk '{ gsub(" ", "$\\|^.*", $0); print "^.*"$0"$" }')
