I'm working on a management script for Docker containers. Right now the user has to configure certain variables before using it. Often, these variables are already defined in the Dockerfile so the default should be to read those values.
I'm having some trouble with the array format used in these Dockerfiles. If I have the volume definition VOLUME ["/root/", "/var/log/"], the script should be able to figure out /root/ and /var/log/, but I haven't been able to accomplish this yet.
So far I have been able to get "/root/", ", " and "/var/log/" out of the file using grep VOLUME Dockerfile | cut -c 8- | grep -o -P '(?<=").+?(?=")', but this still includes the ", ", which should be left out.
Does anyone have suggestions about how to parse this properly?
awk to the rescue!
$ echo VOLUME ["/root/", "/var/log/"] |
awk -F'[ ,\\[\\]]' '/VOLUME/{for(i=3;i<=NF;i+=2) print $i}'
/root/
/var/log/
By setting the delimiters you can extract all the fields.
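If you point this at the Dockerfile itself, the quotes are still there (the shell only strips them in the echo test above), so adding the quote character to the delimiter set keeps it out of the fields. A sketch along the same lines:
awk -F'[][ ,"]+' '/^VOLUME/{for (i = 2; i <= NF; i++) if ($i != "") print $i}' Dockerfile
The + collapses runs of delimiters, and the empty-field test skips the trailing empty field left by the closing ].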
Backstory
I am trying to create a script that updates a "device" through the device's CLI, but the CLI doesn't accept any form of command following the establishment of an SSH connection.
For this reason I have started using screen to log the output from the device, and I'm then attempting to filter the log for relevant info so I can pass commands back to the remote device by stuffing them into screen's buffer. (Kind of a ramshackle way of doing it, but it's all I can think of.)
Issue
I need to use some combination of grep and sed or awk to filter out one of two outputs I'm looking for inside screenlog.2: "SN12345678" (pattern '\w[a-zA-Z]\d{6-10}') and "finished". I've got regex patterns for both of these, but I cannot seem to get the right output and assign it to a variable.
.screenrc (relevant excerpt)
screen -t script 0 ./script
screen -t local 1 bash
screen -t remote 2 bash
screen -t Shell 3 bash
./script
screen -p 2 -X log on # turns on logging in window 2
screen -p 3 -X stuff 'tail -Fn 0 screenlog.2 | #SOMESED function that I cannot figure out'
screen -p 2 -X stuff 'ssh -o "UserKnownHostsFile /dev/null" -o "StrictHostKeyChecking=no" admin@192.168.0.1^M' && echo "Stuffed ssh login -> window 2"
sleep 2 # wait for ssh connection
screen -p 2 -X stuff admin^M && echo "stuffed pw"
sleep 4 # wait for auth
screen -p 2 -X stuff "copy sw ftp://ftpuser:admin#192.168.0.2/dev_uimage-4_4_5-26222^M" && echo "initiated flash"
screen -p 2 -X stuff "copy license ftp://ftpuser:admin#192.168.0.2/$(result of sed from screenlog.2).lic^M" && echo "uploading license"
Sorry if this is a bit long-winded; I've been racking my brain for the last few days trying to get this to work.
Thank you for your time!
Answer
Regular Expression
Looking at the example regex you provided, I'm going to assume SN can't just be hardcoded and that the first character could be an uppercase letter, a lowercase letter, or a digit, and the second character an uppercase or lowercase letter, so I think you are looking for:
grep -Eo '[[:alnum:]][[:alpha:]][[:digit:]]{6,10}' # Works regardless of the computer's locale settings
# OR
egrep -o '[[:alnum:]][[:alpha:]][[:digit:]]{6,10}' # Works regardless of the computer's locale settings
# OR
grep -Eo '[0-9A-Za-z][A-Za-z][0-9]{6,10}'
# OR
egrep -o '[0-9A-Za-z][A-Za-z][0-9]{6,10}'
These are exact conversions of your regular expression (they include _ as a possibility for the first character):
grep -Eo '[[:alnum:]_][[:alpha:]][[:digit:]]{6,10}' # Works regardless of the computer's locale settings
# OR
grep -Eo '[0-9A-Za-z_][A-Za-z][0-9]{6,10}'
# OR (non-extended regular expressions)
grep -o '[[:alnum:]_][[:alpha:]][[:digit:]]\{6,10\}'
grep -o '[0-9A-Za-z_][A-Za-z][0-9]\{6,10\}'
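A quick sanity check against the example serial number from your question:
$ echo SN12345678 | grep -Eo '[[:alnum:]][[:alpha:]][[:digit:]]{6,10}'
SN12345678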
Reuse the Match
I don't know how you would assign the output to a variable, but I would just write it to a file and delete the file afterwards (assuming the "script" and "Shell" windows have the same pwd [present working directory]):
. . .
screen -p 3 -X stuff 'tail -Fn1 screenlog.2 | grep -Eo "[[:alnum:]][[:alpha:]][[:digit:]]{6,10}" >> SerialNumberOrID^M'
. . .
screen -p 2 -X stuff "copy license ftp://ftpuser:admin#192.168.0.2/$(cat SerialNumberOrID).lic^M" && echo "uploading license"
rm -f SerialNumberOrID
Explanation
Regular Expression
I'm fairly confident that POSIX grep, sed, and awk don't support \d at all, and \w is only a GNU extension; those are Perl-style escapes. You can pass -E to grep and sed to make them use extended regular expressions (which will save you from having to do as much escaping).
Command Changes
Writing the match to a file seemed like the best way to reuse it. Using >> ensures that we append to the file, so grep will only add the matching expression to the file and won't overwrite it with an empty file. This is why it's necessary to delete the file at the end of your script (so that it won't mess up the next run, and so you don't have unnecessary files lying around). In the license upload command, we use cat to output the contents of the file in-line. I also changed the tail command to tail -Fn1 because I'm pretty sure you need at least 1 line for it to feed anything into grep.
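If you would rather end up with the value in a shell variable inside ./script, a command substitution over the same log is one option. A rough sketch, assuming screenlog.2 sits in the script's working directory and using serial purely as an illustrative name:
# read the most recent serial-number-looking match from the log (hypothetical variable name)
serial=$(grep -Eo '[[:alnum:]][[:alpha:]][[:digit:]]{6,10}' screenlog.2 | tail -n 1)
screen -p 2 -X stuff "copy license ftp://ftpuser:admin@192.168.0.2/${serial}.lic^M" && echo "uploading license"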
Resources
https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions
https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions
grep, sed, and awk man pages
I would like to be able to perform regular expression-type searches on Mercurial changesets and display results using log.
I've come up with the following function, which seems to work but has a number of possible bugs (e.g. $1 is matched anywhere within the lines of text containing the word changeset).
function hgs { hg log `hg log | grep changeset | grep "$1" \
| sed 's/changeset: *//g' | sed 's/:.*$//g' | \
awk '{print " -r " $0}'`; }
export -f hgs
Am I trying to recreate something here that already exists as a well-tested solution elsewhere?
It pretty much looks like a combination of hg grep, revsets, and templated output could help you (check hg help revsets, hg help templates, hg help grep, and possibly also hg help fileset).
E.g. to find all changes to config.lib or where the commit message contains 'pkgconfig' which were made after 2010:
hg log -r"(file('config.lib') or desc('pkgconfig')) and date('>2010')"
revsets are very powerful. You can also sort, limit to a certain number of changesets, combine different requirements...
The --template argument to hg log can be used to format the output in any pattern you desire.
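For example, combining the revset above with a template gives a compact one-line-per-changeset listing (a sketch using standard template keywords):
hg log -r "(file('config.lib') or desc('pkgconfig')) and date('>2010')" \
    --template "{rev}:{node|short} {date|shortdate} {desc|firstline}\n"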
I am very new to shell scripting and trying to learn the "sed" command functionality.
I have a file called configurations.txt with some variables defined in it, each initialised to a string value.
I am trying to replace strings in a file called values.txt, which is present in another directory, with the values of the variables defined in configurations.txt.
Data present in configurations.txt:-
mem="cpu.memory=4G"
proc="cpu.processor=Intel"
Data present in the values.txt (present in /home/cpu/script):-
cpu.memory=1G
cpu.processor=Dell
I am trying to make a shell script called repl.sh. I don't have a lot of code in it for now, but here is what I've got:
#!/bin/bash
source /home/configurations.txt
sed <need some help here>
Expected output: after an appropriate regex is applied and I run the script with sh repl.sh, values.txt must contain the following data:
cpu.memory=4G
cpu.processor=Intel
(These were originally 1G and Dell.)
Would highly appreciate some quick help. Thanks
This question lacks any attempt at a general approach and reads like "help me do something concrete, please", so it's unlikely that anyone will provide a full solution to the problem.
What you should do is try to split this task into a number of small pieces.
1) Iterate over configurations.txt and get the values from each line. To do that you need to extract X and Y from a string of the form value="X=Y".
This regex could be helpful here: ([^=]+)=\"([^=]+)=([^=]+)\". It contains 3 capturing groups, split on the = and " delimiters. For example,
>> sed -r 's/([^=]+)=\"([^=]+)=([^=]+)\"/\1/' configurations.txt
mem
proc
>> sed -r 's/([^=]+)=\"([^=]+)=([^=]+)\"/\2/' configurations.txt
cpu.memory
cpu.processor
>> sed -r 's/([^=]+)=\"([^=]+)=([^=]+)\"/\3/' configurations.txt
4G
Intel
2) For each X and Y, find X=Z in values.txt and substitute it with X=Y.
For example, let's change cpu.memory value in values.txt with 4G:
>> X=cpu.memory; Y=4G; sed -r "s/(${X}=).*/\1${Y}/" values.txt
cpu.memory=4G
cpu.processor=Dell
Use the -i flag to make the changes in place.
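Putting the two steps together, a rough sketch of repl.sh built from exactly these pieces could look like this (key and val are just illustrative names; the paths are the ones from your question):
#!/bin/bash
# For every line of configurations.txt, pull out the key (group 2) and the new
# value (group 3), then substitute the matching line in values.txt in place.
while IFS= read -r line; do
    key=$(sed -r 's/([^=]+)=\"([^=]+)=([^=]+)\"/\2/' <<< "$line")   # e.g. cpu.memory
    val=$(sed -r 's/([^=]+)=\"([^=]+)=([^=]+)\"/\3/' <<< "$line")   # e.g. 4G
    sed -r -i "s/(${key}=).*/\1${val}/" /home/cpu/script/values.txt
done < /home/configurations.txt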
Here is an awk based answer:
$ cat config.txt
cpu.memory=4G
cpu.processor=Intel
$ cat values.txt
cpu.memory=1G
cpu.processor=Dell
cpu.speed=4GHz
$ awk -F= 'FNR==NR{a[$1]=$2; next;}; {if($1 in a){$2=a[$1]}}1' OFS== config.txt values.txt
cpu.memory=4G
cpu.processor=Intel
cpu.speed=4GHz
Explanation: first read config.txt and save it in memory (the array a). Then read values.txt; if a particular key was defined in config.txt, use the saved value from memory (config.txt) instead of the old one.
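Note that awk writes the merged result to standard output rather than editing values.txt itself; to keep the result, redirect to a temporary file and move it back (or use GNU awk's -i inplace, if that's available). For example:
awk -F= 'FNR==NR{a[$1]=$2; next;}; {if($1 in a){$2=a[$1]}}1' OFS== config.txt values.txt > values.txt.new &&
    mv values.txt.new values.txt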
This should be a basic question for a lot of people, but I am a biologist with no programming background, so please excuse my question.
What I am trying to do is rename about 100,000 gzipped data files whose existing names are just a code (example: XG453834.fasta.gz). I'd like to rename them to something easily readable and parseable by me (example: Xanthomonas_galactus_str_453.fasta.gz).
I've tried to use sed, rename, and mmv, to no avail. If I use any of those commands as a one-off they work fine; it's only when I try to incorporate variables into a shell script that I run into problems. I'm not getting any errors, just no names are changed, so I suspect it's an I/O error.
Here's what my script looks like:
#! /bin/bash
# change a bunch of file names
file=names.txt
while IFS=' ' read -r r1 r2;
do
mmv ''$r1'.fasta.gz' ''$r2'.fasta.gz'
# or I tried many versions of: sed -i 's/"$r1"/"$r2"/' *.gz
# and I tried many versions of: rename -i 's/$r1/$r2/' *.gz
done < "$file"
...and here are the first lines of my txt file, with a single space delimiter:
cat names.txt
#find #replace
code1 name1
code2 name2
code3 name3
I know I can do this with python or perl, but since I'm stuck here working on this particular script I want to find a simple solution to fixing this bash script and figure out what I am doing wrong. Thanks so much for any help possible.
Also, I tried to cat the names file (see the comment from Ashoka Lella below) and then use awk to move/rename. Some of the files have variable names (but they will always start with the code), so I am looking for a find & replace option to just replace the "code" with the "name" and preserve the rest of the file name.
I suspect I am not escaping the variable within the single quotes of the Perl expression, but I have pored over a lot of manuals and I can't find the way to do this.
If you're absolutely sure that the filenames don't contain spaces or tabs, you can try the following:
xargs -n2 < names.txt echo mv
This is a DRY run (it will only print what it would do); if you're satisfied with the result, remove the echo ...
If you want to check for the existence of the target, use
xargs -n2 < names.txt echo mv -i
If you want to NEVER allow overwriting of the target, use
xargs -n2 < names.txt echo mv -n
Again, remove the echo if you're satisfied.
I don't think that you need to be using mmv; a simple mv will do. Also, there's no need to specify the IFS, the default will work for you:
while read -r src dest; do mv "$src" "$dest"; done < names.txt
I have double-quoted the variable names as it is generally considered good practice, but in this case a space in either of the filenames will still result in read not splitting the line as you expect.
You can put an echo before the mv inside the loop to ensure that the correct command will be executed.
Note that in your file names.txt, the .fasta.gz suffix is already included, so you shouldn't be adding it inside the loop as well. Perhaps that was your problem?
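If some of the filenames carry extra text after the code, as mentioned in the question, a variation that swaps only the leading code and keeps the rest of the name could look like this; a rough sketch, assuming each code is the literal prefix of the files to be renamed:
#!/bin/bash
# Rename every file that starts with a code listed in names.txt,
# keeping whatever follows the code (e.g. .fasta.gz or any extra text).
while read -r code name; do
    case "$code" in "#"*|"") continue ;; esac   # skip the "#find #replace" header and blank lines
    for f in "$code"*; do
        [ -e "$f" ] || continue                 # no file starts with this code
        mv -n "$f" "${name}${f#"$code"}"        # replace the leading code, keep the rest
    done
done < names.txt
As with the other suggestions, putting an echo in front of mv first is a cheap way to preview the renames.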
This should rename all files in column1 to column2 of names.txt, provided they are in the same folder as names.txt.
cat names.txt | awk '{print "mv "$1" "$2}' | sh
Background: We're using a tape library and the backup software NetWorker to back up data here. The client that's installed is fairly basic, and when we need to restore more than one target directory we create a script that simply calls X client instances in the background via a script with X of the following lines:
recover -c client-srv -t "Mon Dec 10 08:00:00" -s barckup-srv -d /dest/dir/ -f -a /src/dir &
The trouble is that different partitions/directories backed up from the same machine at the same time might be spread across several different tapes, and some of those tapes may have been removed from the library between the backup and restore.
Up until recently, the only ways people here have been finding out which tapes are needed were to either wait for the library to complain that it doesn't have a particular tape, or to set up a fake restore in a crappy old desktop GUI client and hit a particular menu option. The first option is super bad when the tape turns out to be off-site and takes a day to get back, and the second is tedious and time-consuming.
Actual Question: I've written a "meta"-script that reads the script that we've already created with the commands above, feeds it into the interactive CLI client, and gets it to spit out what tapes are required, and if they're actually in the library. To do this, the script uses the following regular expressions to pull out necessary info:
# pull out a list of the -a targets
restore_targets="`sed 's/^.* -a \([^ ]*\) .*$/\1/' $rec_script`"
# pull out a list of -c clients
restore_clients="`sed 's/^.* -c \([^ ]*\) .*$/\1/' $rec_script`"
numclients=`echo $restore_clients | uniq | wc -l`
# pull out a list of -t dates
restore_dates="`sed 's/^.* -t \"\([^\"]*\)\" .*$/\1/' $rec_script`"
numdates=`echo $restore_dates | uniq | wc -l`
I am not terribly familiar with using s/\(x\)/\1/ types of regexes, to the point that I don't remember the name, but is this the best way of accomplishing what I am doing? The commands work, but I'm wondering if I'm using the .* needlessly.
\1 refers to the first capturing group. If you replace foo(.*?) with \1 and feed in foobar, the resulting text becomes bar, as \1 points to the text captured by the first capturing group.
As for your question, it might be safer and easier to parse the arguments using Python (or another high-level scripting language):
>>> import shlex
>>> shlex.split('recover -c client-srv -t "Mon Dec 10 08:00:00" -s barckup-srv -d /dest/dir/ -f -a /src/dir &')
['recover', '-c', 'client-srv', '-t', 'Mon Dec 10 08:00:00', '-s', 'barckup-srv', '-d', '/dest/dir/', '-f', '-a', '/src/dir', '&']
Now, this is much easier to work with. The quotes are gone and all of the components of the command are nicely split up into a list.
If you want this to be completely foolproof, you could use argparse and implement your own parser for this command line pretty easily. This will enable you to easily get the info, but it might be overkill for your situation.
As for your actual question, you can dissect the regex:
^.* -t "([^\"]*)" .*$
This regex captures -t "foo \" bar", while a non-greedy version would stop at -t "foo \".
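As a quick check, here is the -t extraction from your script applied to the example recover line (run directly in the shell rather than on $rec_script):
$ line='recover -c client-srv -t "Mon Dec 10 08:00:00" -s barckup-srv -d /dest/dir/ -f -a /src/dir &'
$ echo "$line" | sed 's/^.* -t "\([^"]*\)" .*$/\1/'
Mon Dec 10 08:00:00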