How to parse a string into variables (regex)

I want to read each line of a file whose lines look like redhat-ubi-ubi7-7.8, where vendor=redhat, product=ubi, image_name=ubi7, tag=7.8, and parse these fields so that I can do something like:
while read -r line;
do
vendor=sed/awk
product=sed/awk
image_name=sed/awk
version=sed/awk
echo "Copying $image_name:$version into registry..."
skopeo copy \
docker-archive:/opt/app-root/src/ironbank-images/"$line" \
docker://"$REGISTRY_DOMAIN"/"$vendor"/"$product"/"$image_name":"$version" \
--dest-creds="$REGISTRY_USERNAME":"$REGISTRY_PASSWORD" \
--dest-tls-verify=false
done < "$SYNC_IMAGES"
How can I split this string to get the desired result for my use case?

A combination of read's multi-variable feature and bash's IFS would do the trick:
while IFS=- read -r vendor product image_name version;
do
echo "Copying $image_name:$version into registry..."
skopeo copy \
docker-archive:/opt/app-root/src/ironbank-images/"${vendor}-${product}-${image_name}-${version}" \
docker://"$REGISTRY_DOMAIN"/"$vendor"/"$product"/"$image_name":"$version" \
--dest-creds="$REGISTRY_USERNAME":"$REGISTRY_PASSWORD" \
--dest-tls-verify=false
done < "$SYNC_IMAGES"
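To see what the split does on its own, here is a minimal sketch, separate from the skopeo loop. The function name split_image and the second sample name with an extra hyphen are made up for illustration. It shows that read assigns one hyphen-separated field per variable and that the last variable receives whatever is left over, hyphens included.

```bash
#!/usr/bin/env bash
# split_image is a hypothetical helper, not part of the original answer.
# IFS=- applies only to this one read command.
split_image() {
    local vendor product image_name version
    IFS=- read -r vendor product image_name version <<< "$1"
    printf 'vendor=%s product=%s image_name=%s version=%s\n' \
        "$vendor" "$product" "$image_name" "$version"
}

split_image 'redhat-ubi-ubi7-7.8'
# The last variable takes the remainder of the line, so an extra
# hyphen (made-up example) ends up in version:
split_image 'redhat-ubi-ubi7-7.8-beta'
```

So this approach assumes that only the last field may contain a hyphen.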

Just use awk with - as the field separator.
awk -F- -v domain="$REGISTRY_DOMAIN" -v user="$REGISTRY_USERNAME" -v pw="$REGISTRY_PASSWORD" '
{ vendor = $1; product = $2; image_name = $3; version = $4;
printf("echo \"Copying %s:%s into registry\"\n", image_name, version);
printf("skopeo copy docker-archive:/opt/app-root/src/ironbank-images/\"%s\" docker://\"%s\"/\"%s\"/\"%s\"/\"%s\":\"%s\" --dest-creds=\"%s\":\"%s\" --dest-tls-verify=false\n",
$0, domain, vendor, product, image_name, version, user, pw)
}' < "$SYNC_IMAGES" | bash

Just in case you want to use parameter expansion (P.E.). The echo in front of skopeo makes this a dry run; remove it to actually run the copy.
while read -r string; do
vendor=${string%%-*} version=${string##*-} image_name=${string%-*}
product=${image_name#*-} product=${product%-*} image_name=${image_name##*-}
echo "Copying $image_name:$version into registry..."
echo skopeo copy \
docker-archive:/opt/app-root/src/ironbank-images/"${vendor}-${product}-${image_name}-${version}" \
docker://"${REGISTRY_DOMAIN}"/"$vendor"/"$product"/"$image_name":"$version" \
--dest-creds="${REGISTRY_USERNAME}":"${REGISTRY_PASSWORD}" \
--dest-tls-verify=false
done < "$SYNC_IMAGES"
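The expansions are easier to follow one step at a time. The sketch below traces them on the sample string from the question; the function name parse_image and the intermediate variable rest are mine, added only to make each step visible.

```bash
#!/usr/bin/env bash
# parse_image and rest are hypothetical names used for illustration.
parse_image() {
    local string=$1 vendor version rest image_name product
    vendor=${string%%-*}     # remove longest suffix matching "-*"  -> redhat
    version=${string##*-}    # remove longest prefix matching "*-"  -> 7.8
    rest=${string%-*}        # remove shortest suffix "-*"          -> redhat-ubi-ubi7
    image_name=${rest##*-}   # keep what follows the last hyphen    -> ubi7
    product=${rest#*-}       # remove shortest prefix "*-"          -> ubi-ubi7
    product=${product%-*}    # remove shortest suffix "-*"          -> ubi
    printf '%s %s %s %s\n' "$vendor" "$product" "$image_name" "$version"
}

parse_image 'redhat-ubi-ubi7-7.8'
```

A single # or % removes the shortest match and a doubled ## or %% removes the longest, which is what lets the first and last fields be cut off before the middle ones.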

How to count the number of s3 folders inside given path?

I searched for a solution but had no luck, so I am hoping to find one here. I have some migrated files in S3, and there is now a requirement to count the folders under a given path. Say I have the files below.
If I run aws s3 ls s3://my-bucket/foo1 --recursive >> file_op.txt
then "cat file_op.txt" shows something like this:
my-bucket/foo1/foo2/foo3/foo4/foo5/foo6/foo7/file1.txt
my-bucket/foo1/foo2/foo3/foo4/foo5/foo6/foo7/file2.txt
my-bucket/foo1/foo2/foo3/foo4/foo5/foo6/file1.pdf
my-bucket/foo1/foo2/foo3/foo4/foo6/file2.txt
my-bucket/foo1/foo2/foo3/file3.txt
my-bucket/foo1/foo8/file1.txt
my-bucket/foo1/foo9/foo10/file4.csv
I stored the output in a file and found the number of files with wc -l,
but I couldn't work out how to count the folders in the path.
I need the output as below:
number of files : 7
number of folders : 9
EDIT 1:
Corrected the expected number of folders.
(Excluding my-bucket and foo1)
(foo6 is in foo5 and foo4 directories)
Below is my code where I'm failing in calculating the count of directories:
#!/bin/bash
if [[ "$#" -ne 1 ]] ; then
echo "Usage: $0 \"s3 folder path\" <eg. \"my-bucket/foo1\"> "
exit 1
else
start=$SECONDS
input=$1
input_code=$(echo $input | awk -F'/' '{print $1 "_" $3}')
#input_length=$(echo $input | awk -F'/' '{print NF}' )
s3bucket=$(echo $input | awk -F'/' '{print $1}')
db_name=$(echo $input | awk -F'/' '{print $3}')
pathfinder=$(echo $input | awk 'BEGIN{FS=OFS="/"} {first = $1; $1=""; print}'|sed 's#^/##g'|sed 's#$#/#g')
myn=$(whoami)
cdt=$(date +%Y%m%d%H%M%S)
filename=$0_${myn}_${cdt}_${input_code}
folders=${filename}_folders
dcountfile=${filename}_dir_cnt
aws s3 ls s3://${input} --recursive | awk '{print $4}' > $filename
cat $filename |awk -F"$pathfinder" '{print $2}'| awk 'BEGIN{FS=OFS="/"}{NF--; print}'| sort -n | uniq > $folders
#grep -oP '(?<="$input_code" ).*'
fcount=`cat ${filename} | wc -l`
awk 'BEGIN{FS="/"}
{ if (NF > maxNF)
{
for (i = maxNF + 1; i <= NF; i++)
count[i] = 1;
maxNF = NF;
}
for (i = 1; i <= NF; i++)
{
if (col[i] != "" && $i != col[i])
count[i]++;
col[i] = $i;
}
}
END {
for (i = 1; i <= maxNF; i++)
print count[i];
}' $folders > $dcountfile
dcount=$(cat $dcountfile | xargs | awk '{for(i=t=0;i<NF;) t+=$++i; $0=t}1' )
printf "Bucket name : \e[1;31m $s3bucket \e[0m\n" | tee -a ${filename}.out
printf "DB name : \e[1;31m $db_name \e[0m\n" | tee -a ${filename}.out
printf "Given folder path : \e[1;31m $input \e[0m\n" | tee -a ${filename}.out
printf "The number of folders in the given directory are\e[1;31m $dcount \e[0m\n" | tee -a ${filename}.out
printf "The number of files in the given directory are\e[1;31m $fcount \e[0m\n" | tee -a ${filename}.out
end=$SECONDS
elapsed=$((end - start))
printf '\n*** Script completed in %d:%02d:%02d - Elapsed %d:%02d:%02d ***\n' \
$((end / 3600)) $((end / 60 % 60)) $((end % 60)) \
$((elapsed / 3600)) $((elapsed / 60 % 60)) $((elapsed % 60)) | tee -a ${filename}.out
exit 0
fi
Your question is not clear.
If we count unique relative folder paths in the list provided, there are 12:
my-bucket/foo1/foo2/foo3/foo4/foo5/foo6/foo7
my-bucket/foo1/foo2/foo3/foo4/foo5/foo6
my-bucket/foo1/foo2/foo3/foo4/foo6
my-bucket/foo1/foo2/foo3/foo4/foo5
my-bucket/foo1/foo2/foo3/foo4
my-bucket/foo1/foo2/foo3
my-bucket/foo1/foo2
my-bucket/foo1/foo8
my-bucket/foo1/foo9/foo10
my-bucket/foo1/foo9
my-bucket/foo1
my-bucket
The awk script to count this is:
BEGIN {FS = "/";} # set the field separator to "/"
{ # for each input line
cumulativePath = OFS = ""; # reset cumulativePath and OFS (Output Field Separator) to ""
for (i = 1; i < NF; i++) { # loop over all folders up to the file name
if (i > 1) OFS = FS; # set OFS to "/" from the second path component on
cumulativePath = cumulativePath OFS $i; # append the current field to cumulativePath
dirs[cumulativePath] = 0; # insert cumulativePath into the associative array dirs
}
}
END {
print NR " " length(dirs); # print the record count and the length of the associative array dirs
}
If we count unique folder names there are 11:
my-bucket
foo1
foo2
foo3
foo4
foo5
foo6
foo7
foo8
foo9
foo10
The awk script to count this is:
awk -F'/' '{for(i=1;i<NF;i++)dirs[$i]=1;}END{print NR " " length(dirs)}' input.txt
You have clarified that you wanted to count the unique names, ignoring the top two levels (my-bucket and foo1) and the last level (the file name).
perl -F/ -lane'
++$f;
++$d{ $F[$_] } for 2 .. $#F - 1;
END {
print "Number of files: ".( $f // 0 );
print "Number of dirs: ".( keys(%d) // 0 );
}
'
Output:
Number of files: 7
Number of dirs: 9
See also: Specifying file to process to Perl one-liner
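For comparison, the same count can be sketched in awk. This is my own rewrite under the clarified requirement (skip the top two levels and the file name), not something taken from the answers; the function name count_dirs and the variable names are assumptions. It uses an explicit counter rather than length() on an array, which not every awk accepts.

```bash
#!/usr/bin/env bash
# count_dirs is a hypothetical wrapper; it reads the named files or stdin.
count_dirs() {
    awk -F/ '
        {
            # fields 1 and 2 are the bucket and the top folder, field NF is the file
            for (i = 3; i < NF; i++)
                if (!($i in seen)) { seen[$i]; ndirs++ }
        }
        END {
            printf "Number of files: %d\nNumber of dirs: %d\n", NR, ndirs
        }
    ' "$@"
}

count_dirs <<'EOF'
my-bucket/foo1/foo2/foo3/foo4/foo5/foo6/foo7/file1.txt
my-bucket/foo1/foo2/foo3/foo4/foo5/foo6/foo7/file2.txt
my-bucket/foo1/foo2/foo3/foo4/foo5/foo6/file1.pdf
my-bucket/foo1/foo2/foo3/foo4/foo6/file2.txt
my-bucket/foo1/foo2/foo3/file3.txt
my-bucket/foo1/foo8/file1.txt
my-bucket/foo1/foo9/foo10/file4.csv
EOF
```

On the sample listing this reports 7 files and 9 dirs, because foo6 is counted once even though it appears under both foo4 and foo5.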
If you don't mind using a pipe and calling awk twice, then it's rather clean:
mawk 'BEGIN {OFS=ORS;FS="/";_^=_}_+_<NF && --NF~($_="")' file \
\
| mawk 'NF {_[$__]} END { print length(_) }'

Tokenize and capture with sed

Suppose we have a string like
"dir1|file1|dir2|file2"
and would like to turn it into
"-f dir1/file1 -f dir2/file2"
Is there an elegant way to do this with sed or awk for a general case of n > 2?
My attempt was to try
echo "dir1|file1|dir2|file2" | sed 's/\(\([^|]\)|\)*/-f \2\/\4 -f \6\/\8/'
An awk solution:
awk -F'|' '{ for (i=1;i<=NF;i+=2) printf "-f %s/%s%s", $i, $(i+1), ((i==NF-1) ? "\n" : " ") }' \
<<<"dir1|file1|dir2|file2"
-F'|' splits the input into fields by |
for (i=1;i<=NF;i+=2) loops over the field indices in increments of 2
printf "-f %s/%s%s", $i, $(i+1), ((i==NF-1) ? "\n" : " ") prints pairs of consecutive fields joined with / and prefixed with -f<space>
((i==NF-1) ? "\n" : " ") terminates each field-pair either with a space, if more fields follow, or a \n to terminate the overall output.
In a comment, the OP suggests a shorter variation, which may be of interest if you don't need/want the output to be \n-terminated:
awk -F'|' '{ for (i=1;i<=NF;++i) printf "%s", (i%2 ? " -f " $i : "/" $i ) }' \
<<<"dir1|file1|dir2|file2"
This might work for you (GNU sed):
sed 's/\([^|]*\)|\([^|]*\)|\?/-f \1\/\2 /g;s/ $//' file
This will work for dir1|file1|dir2|file2|dirn|filen type strings
The regexp defines two back references (\1 and \2, used in the replacement part of the substitution command s/pattern/replacement/). The first captures a run of non-| characters and is followed by a |; the second captures a run of non-| characters and is followed by an optional |. On the first application of the substitution, dir1 becomes \1 and file1 becomes \2; because of the g flag, the substitution is then repeated for every following pair. The replacement prepends -f, turns the first | into / and the second | into a space. The trailing space left at the end of the line is not needed and is removed by the second substitution command.
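A quick way to check the substitution on a longer input is sketched below. I have rewritten the pattern with -E (extended regular expressions) so that ? does not need the GNU-only \? form; that rewrite and the function name pairs_with_sed are my assumptions, not part of the answer above.

```bash
#!/usr/bin/env bash
# pairs_with_sed is a hypothetical wrapper around the two substitutions.
pairs_with_sed() {
    # In ERE, \| is a literal pipe and ( ) group without backslashes.
    sed -E 's/([^|]*)\|([^|]*)\|?/-f \1\/\2 /g; s/ $//' <<< "$1"
}

pairs_with_sed 'dir1|file1|dir2|file2|dir3|file3'
```

Each pass of the g flag consumes one dir|file pair plus the | that follows it, if there is one, so the input needs an even number of fields.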
$ awk -v RS='|' 'NR%2{p=$0;next} {printf " -f %s/%s", p, $0}' <<< 'dir1|file1|dir2|file2'
-f dir1/file1 -f dir2/file2
A gnu-awk solution:
s="dir1|file1|dir2|file2"
awk 'BEGIN{ FPAT="[^|]+\\|[^|]+" } {
for (i=1; i<=NF; i++) {
sub(/\|/, "/", $i);
if (i>1)
printf " ";
printf "-f %s", $i
};
print ""
}' <<< "$s"
-f dir1/file1 -f dir2/file2
FPAT is used to grab each dir|file pair (e.g. dir1|file1) into a single field.
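If the result is going to be passed to another command anyway, the pairing can also be done in bash itself. This is a sketch of an alternative to the sed and awk answers, not a restatement of them; the function name pairs_to_args and the global array args are hypothetical.

```bash
#!/usr/bin/env bash
# pairs_to_args fills the global array args (both names are made up here).
pairs_to_args() {
    local -a parts
    local i
    args=()
    IFS='|' read -r -a parts <<< "$1"     # split the string on |
    for (( i = 0; i + 1 < ${#parts[@]}; i += 2 )); do
        args+=( -f "${parts[i]}/${parts[i+1]}" )
    done
}

pairs_to_args 'dir1|file1|dir2|file2'
printf '%s\n' "${args[*]}"
```

Keeping the options in an array means a later call such as some_command "${args[@]}" (some_command being a placeholder) passes each -f and each path as its own argument, even if a name contains spaces.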

Escaping special characters with sed

I have a script to generate char arrays from strings:
#!/bin/bash
while [ -n "$1" ]
do
echo -n "{" && echo -n "$1" | sed -r "s/((\\\\x[0-9a-fA-F]+)|(\\\\[0-7]{1,3})|(\\\\?.))/'\1',/g" && echo "0}"
shift
done
It works great as is:
$ wchar 'test\n' 'test\\n' 'test\123' 'test\1234' 'test\x12345'
{'t','e','s','t','\n',0}
{'t','e','s','t','\\','n',0}
{'t','e','s','t','\123',0}
{'t','e','s','t','\123','4',0}
{'t','e','s','t','\x12345',0}
But because sed processes its input one line at a time, it doesn't handle actual newlines:
$ wchar 'test
> test'
{'t','e','s','t',
't','e','s','t',0}
How can I replace special characters (Tabs, newlines etc) with their escaped versions so that the output would be like so:
$ wchar 'test
> test'
{'t','e','s','t','\n','t','e','s','t',0}
Edit: Some ideas that almost work:
echo -n "{" && echo -n "$1" | sed -r ":a;N;;s/\\n/\\\\n/;$!ba;s/((\\\\x[0-9a-fA-F]+)|(\\\\[0-7]{1,3})|(\\\\?.))/'\1',/g" && echo "0}"
Produces:
$ wchar 'test\n\\n\1234\x1234abg
test
test'
{test\n\\n\1234\x1234abg\ntest\ntest0}
While removing the !:
echo -n "{" && echo -n "$1" | sed -r ":a;N;;s/\\n/\\\\n/;$ba;s/((\\\\x[0-9a-fA-F]+)|(\\\\[0-7]{1,3})|(\\\\?.))/'\1',/g" && echo "0}"
Produces:
$ wchar 'test\n\\n\1234\x1234abg
test
test'
{'t','e','s','t','\n','\\','n','\123','4','\x1234ab','g','\n','t','e','s','t',
test0}
This is close...
The first isn't performing the final replacement, and the second isn't correctly adding the last line.
You can pre-filter before passing to sed. Perl will do:
$ set -- 'test1
> test2'
$ echo -n "$1" | perl -0777 -pe 's/\n/\\n/g'
test1\ntest2
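The same pre-filtering can probably be done without leaving bash, using pattern substitution in parameter expansion. This is a sketch under that assumption; the function name escape_control_chars is mine, and it only handles newlines and tabs.

```bash
#!/usr/bin/env bash
# escape_control_chars is a hypothetical helper, not from the answer above.
escape_control_chars() {
    local s=$1 nl=$'\n' tab=$'\t' esc_nl='\n' esc_tab='\t'
    s=${s//"$nl"/"$esc_nl"}      # real newline -> backslash followed by n
    s=${s//"$tab"/"$esc_tab"}    # real tab     -> backslash followed by t
    printf '%s' "$s"
}

escape_control_chars 'test1
test2'
echo
```

In the script from the question, echo -n "$1" would then become escape_control_chars "$1", with the sed command after the pipe left unchanged.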
This is a rather convoluted solution, but it might work for your needs. It requires GNU awk 4.1:
#!/usr/bin/awk -f
@include "join"
@include "ord"
BEGIN {
RS = "\\\\(n|x..)"
FS = ""
}
{
for (z=1; z<=NF; z++)
y[++x] = ord($z)<0x20 ? sprintf("\\x%02x",ord($z)) : $z
y[++x] = RT
}
END {
y[++x] = "\\0"
for (w in y)
y[w] = "'" y[w] "'"
printf "{%s}", join(y, 1, x, ",")
}
Result
$ cat file
a
b\nc\x0a
$ ./foo.awk file
{'a','\x0a','b','\n','c','\x0a','\0'}

Filtering some text from a line using sed on Linux

I have the following content in a file:
NAME=ALARMCARDSLOT137 TYPE=2 CLASS=116 SYSPORT=2629 STATE=U ALARM=M APPL=" " CRMPLINK=CHASSIS131 DYNDATA="GL:1,15 ADMN:1 OPER:2 USAG:2 STBY:0 AVAL:0 PROC:0 UKNN:0 INH:0 ALM:20063;1406718801,"
I just want to filter out the NAME, SYSPORT and ALM fields using sed.
Try the sed command below to filter out the NAME, SYSPORT and ALM fields:
$ sed 's/.*\(NAME=[^ ]*\).*\(SYSPORT=[^ ]*\).*\(ALM:[^;]*\).*/\1 \2 \3/g' file
NAME=ALARMCARDSLOT137 SYSPORT=2629 ALM:20063
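The same three captures can be sketched with bash's own regex matching, which avoids the escaped parentheses. The function name extract_fields is hypothetical, and the pattern simply mirrors the sed expression above.

```bash
#!/usr/bin/env bash
# extract_fields is a made-up helper; it prints nothing for lines that do not match.
extract_fields() {
    local re='NAME=([^ ]*).*SYSPORT=([^ ]*).*ALM:([^;]*)'
    if [[ $1 =~ $re ]]; then
        printf 'NAME=%s SYSPORT=%s ALM:%s\n' \
            "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    fi
}

extract_fields 'NAME=ALARMCARDSLOT137 TYPE=2 CLASS=116 SYSPORT=2629 STATE=U ALARM=M APPL=" " CRMPLINK=CHASSIS131 DYNDATA="GL:1,15 ADMN:1 OPER:2 USAG:2 STBY:0 AVAL:0 PROC:0 UKNN:0 INH:0 ALM:20063;1406718801,"'
```

To process a whole file, wrap the call in a loop such as while IFS= read -r line; do extract_fields "$line"; done < file. Unlike the sed command, this sketch skips lines that do not contain all three fields instead of printing them unchanged.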
Why not use grep?
grep -oE 'NAME=\S*|SYSPORT=\S*|ALM:[^;]*'
Test with your text:
kent$ echo 'NAME=ALARMCARDSLOT137 TYPE=2 CLASS=116 SYSPORT=2629 STATE=U ALARM=M APPL=" " CRMPLINK=CHASSIS131 DYNDATA="GL:1,15 ADMN:1 OPER:2 USAG:2 STBY:0 AVAL:0 PROC:0 UKNN:0 INH:0 ALM:20063;1406718801,"'|grep -oE 'NAME=\S*|SYSPORT=\S*|ALM:[^;]*'
NAME=ALARMCARDSLOT137
SYSPORT=2629
ALM:20063
Here is another awk solution:
awk -F" |;" -v RS=" " '/NAME|SYSPORT|ALM/ {print $1}'
NAME=ALARMCARDSLOT137
SYSPORT=2629
ALM:20063
Whenever there are name=value pairs in input files, I find it best to first create an array mapping the names to the values and then operating on the array using the names of the fields you care about. For example:
$ cat tst.awk
function bldN2Varrs( i, fldarr, fldnr, subarr, subnr, tmp ) {
for (i=2;i<=NF;i+=2) { gsub(/ /,RS,$i) }
split($0,fldarr,/[[:blank:]]+/)
for (fldnr in fldarr) {
split(fldarr[fldnr],tmp,/=/)
gsub(RS," ",tmp[2])
gsub(/^"|"$/,"",tmp[2])
name2value[tmp[1]] = tmp[2]
split(tmp[2],subarr,/ /)
for (subnr in subarr) {
split(subarr[subnr],tmp,/:/)
subName2value[tmp[1]] = tmp[2]
}
}
}
function prt( fld, subfld ) {
if (subfld) print fld "/" subfld "=" subName2value[subfld]
else print fld "=" name2value[fld]
}
BEGIN { FS=OFS="\"" }
{
bldN2Varrs()
prt("NAME")
prt("SYSPORT")
prt("DYNDATA","ALM")
}
$ awk -f tst.awk file
NAME=ALARMCARDSLOT137
SYSPORT=2629
DYNDATA/ALM=20063;1406718801,
And if 20063;1406718801, isn't the desired value for the ALM field and you just want some subsection of it, simply tweak the array construction function to suit your criteria.

How can I check the balance of ASCII images using bash?

I have some large ASCII images that I want to check are symmetrical. Say I have the following file:
***^^^MMM
*^**^^MMM
**^^^^^MMMMM
The first line is what I want: the sections are separate and each has the same number of characters (it doesn't have to be 3 of each every time, though). The next two lines are not what I want. I want to count the number of *'s in a row, and then make sure the same number of ^'s and M's follow them. I'm trying to get some symmetry on each line, so these would be good:
**^^MM
**********^^^^^^^^^^MMMMMMMMMM
****^^^^MMMM
*^M
etc.
How can I scan through a file and maybe grep the problem lines?
I tried a few loops with cat ASCIIfile | sed 's/\^//g' | sed 's/M//g' | wc -c, assigning the output to a variable and then comparing the count to the other char counts, but obviously this doesn't take order into account, so lines like *^*^*M^MM were passing the check.
Using perl:
perl -ne ' { $l=$_; chomp; ($v)=/^((.)\2*)/; $t=length($v); \
s/M{$t}//;s/\^{$t}//;s/\*{$t}//; \
print $l if length } ' input_file
Using bash/sed:
while IFS= read -r line; do
m=$(echo "$line" | sed 's/[^M]*\([M][M]*\)[^M]*/\1/' | wc -c)
s=$(echo "$line" | sed 's/[^*]*\([*][*]*\)[^*]*/\1/' | wc -c)
n=$(echo "$line" | sed 's/[^\^]*\([\^][\^]*\)[^\^]*/\1/' | wc -c)
if [[ $m -ne $s || $m -ne $n ]]; then
echo "- $line $m::$s::$n"
else
echo "+ $line $m::$s::$n"
fi
done < input_file
Pure Bash:
#!/bin/bash
for string in '***^^^MMM' '**^^MM' '****^^MMMM' '*^*M^'
do
flag=true
sym=true
patt=''
prevlen=${#string}
for c in '*' '^' 'M'
do
patt+="*\\$c"
sub="${string##$patt}"
sublen="${#sub}"
if $flag
then
flag=false
((compare = prevlen - sublen ))
else
if (( prevlen - sublen != compare ))
then
printf '%s\n' "$string is NOT symmetrical"
sym=false
break
fi
fi
prevlen=$sublen
done
if $sym
then
printf '%s\n' "$string IS symmetrical"
fi
done
To read from a file, change the first for loop to while read -r string and add < filename after the last done on the same line.
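A shorter variant of the same idea, sketched with bash's regex matching: one anchored pattern enforces the order (a run of *, then ^, then M, and nothing else), and the lengths of the three captured runs are then compared. The function name is_balanced is hypothetical.

```bash
#!/usr/bin/env bash
# is_balanced is a made-up helper: it succeeds only if the line is a run
# of *, then a run of ^, then a run of M, all of the same length.
is_balanced() {
    local re='^(\*+)(\^+)(M+)$'
    [[ $1 =~ $re ]] || return 1
    (( ${#BASH_REMATCH[1]} == ${#BASH_REMATCH[2]} && ${#BASH_REMATCH[2]} == ${#BASH_REMATCH[3]} ))
}

for s in '***^^^MMM' '*^**^^MMM' '**^^^^^MMMMM' '*^M'; do
    if is_balanced "$s"; then
        echo "ok  $s"
    else
        echo "BAD $s"
    fi
done
```

Because the pattern is anchored at both ends, interleaved lines such as *^*^*M^MM fail the match outright instead of being counted.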