boost::program_options positional options - c++

I have a positional option (a file name), and I want it to be the very last option. The user can pass in a bunch of stuff on the command line, and also use -F for the file name. However, I want the user also to have the ability to place the file name at the end.
For example:
./program --var 3 /path/to/file
The code I currently have implemented allows the caller to place the file name wherever in the command line. Is there any way to force the positional arguments to always come after the "regular" ones?
Here's how I set up the positional argument:
pos_opts_desc.add("filename", -1);
And to parse the command line:
store(
command_line_parser(argc, argv).options(opts_desc).positional(pos_opts_desc).run(),
opts_var_map);
Thanks in advance for the help.
Edited to add:
I'm perfectly OK with -F being specified anywhere in the command line. However, if the setting was done via the positional option, I want to ensure that the positional option is at the very end.

The run() member function gives you back an instance of type parsed_options. The simple usage is to never actually look at this object and pass it directly into store(), as in your example:
po::store(
po::command_line_parser(argc, argv).options(opts_desc).positional(pos_opts_desc).run(),
opts_var_map);
But we can hold onto it and examine its contents:
auto parsed = po::command_line_parser(argc, argv)
.options(opts_desc)
.positional(pos_opts_desc)
.run();
po::store(parsed, opts_var_map);
The parsed_options class has a member, options, that holds all the options in the order they were parsed (unlike the variables map, which is ordered by option name, since it's a std::map). So you can look up the "filename" option and check its position_key member. We want either position_key == -1 (which means it was provided with -F), or position_key == 0 with the option being the last element in the options list (it was a positional argument and it was the last argument):
auto it = std::find_if(parsed.options.begin(),
parsed.options.end(),
[](po::option const& o) {
return o.string_key == "filename";
});
if (it == parsed.options.end()) {
// fail: missing filename
}
if (it->position_key != -1 && it != std::prev(parsed.options.end())) {
// fail: filename was positional but wasn't last
}

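variables_map is, as the name suggests, a std::map, which allows us to use regular STL functions on it. Keep in mind that a std::map iterates in key order, not in command-line order, so the check below compares against the last key rather than the last argument: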
if ( vm.count("filename") ) {
if ( vm.find("filename") != std::prev(vm.rbegin()).base() ) {
std::cout << "filename must go at the end.";
}
}
Test cases:
g++ -std=c++14 -O2 -Wall -pedantic -pthread main.cpp -lboost_system -lboost_program_options \
&& echo -n "Case 1 " && ./a.out asdf --foo=12 && echo \
&& echo -n "Case 2 " && ./a.out --foo=12 asdf && echo \
&& echo -n "Case 3 " && ./a.out asdf && echo \
&& echo -n "Case 4 " && ./a.out --foo=12 && echo \
&& echo -n "Case 5 " && ./a.out && echo \
&& echo -n "Case 6 " && ./a.out --foo=12 asdf asdf
Result:
Case 1 filename must go at the end.
Case 2
Case 3
Case 4
Case 5
Case 6 option '--filename' cannot be specified more than once

Is there a way to use a variable to define the range in the awk match() function 'match($0,/r{0,var}/)'?

I am processing text files with thousands of records per file. Each record is made up of two lines: a header that starts with >, followed by a line with a long string of the characters -AGTCNR. The two lines make a complete record.
Here is what a sample file looks like:
>ACML500-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_-2
----TAAGATTTTGACTTCTTCCCCCATCATCAAGAAGAATTGT-------NNNN
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
--------NNNTCCCTTTAATACTAGGAGCCCCTGACATAGCCTTTCCTAAATAAT-----
>ASILO303-17|Dip|gs-Par|sp-Par vid|subsp-NA|co
-----TAAGATTCTGATTACTCCCCCCCTCTCTAACTCTTCTTCTTCTATAGTAGATG
>ASILO326-17|Dip|gs-Goe|sp-Goe par|subsp-NA|c
TAAGATTTTGATTATTACCCCCTTCATTAACCAGGAACAGGATGA------
>CLT100-09|Lep|gs-Col|sp-Col elg|subsp-NA|co-Buru
AACATTATATTTGGAANNN-------GATCAGGAATAGTCGGAACTTCTCTGAA------
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTATAATTGGAGGATTTGGAAAACCTTTAATATT----CCGAAT
>STBOD057-09|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
ATCTAATATTGCACATAGAGGAACCTCNGTATTTTTTCTCTCCATCT------TTAG
>TBBUT582-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----CCCCCTCATTAACATTACTAAGTTGAAAATGGAGCAGGAACAGGATGA
>TBBUT583-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
TAAGATTTTGACTCATTAA--NNAGTNNNNNNNNNNNNNNNAATGGAGCAGGAACAGGATGA
>AFBTB001-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGCTCCATCC-------------TAGAAAGAGGGG---------GGGTGA
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTAGGAAATTGATTAGTACCTTTAATATT----CCGAAT---
>AFBTB003-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGATTTTGACTTCTGC------CATGAGAAAGA-------------AGGGTGA
>AFBTB002-09|Cole|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
-------TCTTCTGCTCAT-------GGGGCAGGAACAGGG----------TGA
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
NNNNNNNNNNNTCCCTTTAATACTAGGAGCCCCTTTCCT----TAAATAAT-----
With the following code I can search the second line (the sequence) of each record and extract the records that have at most a given number of -, N or n characters at the beginning of the line (set with the start_Ns variable) and at the end of the line (set with the end_Ns variable):
start_Ns=10
end_Ns=10
awk -v start_N=$start_Ns -v end_N=$end_Ns ' /^>/ {
hdr=$0; next }; match($0,/^[-Nn]*/) && RLENGTH<=start_N &&
match($0,/[-Nn]*$/) && RLENGTH<=end_N {
print hdr; print }' infile.aln > without_shortseqs.aln
Now I need to search for -, N or n characters in the region between the leading and trailing runs of the second line of every record, and filter out records with more than a given maximum number of these characters. The code below does it, but I need to use a variable that I can easily reset:
start_Ns=10
end_Ns=10
awk -v start_N=10 -v end_N=10 ' /^>/ {
hdr=$0; next }; match($0,/^[-Nn]*/) && RLENGTH<=start_N &&
match($0,/[-Nn]*$/) && RLENGTH<=end_N && match($0,/N{0,11}/) {
print hdr; print }' infile.aln > without_shortseqs_mids.aln
As for a variable, I tried the following but it failed:
awk -v start_N=10 -v mid_N=11 -v end_N=10 ' /^>/ {
hdr=$0; next }; match($0,/^[-Nn]*/) && RLENGTH<=start_N &&
match($0,/N{0,mid_N}/) && match($0,/[-Nn]*$/) && RLENGTH<=end_N {
print hdr; print }' infile.aln > without_shortseqs_mids.aln
Expected results:
>ASILO303-17|Dip|gs-Par|sp-Par vid|subsp-NA|co
-----TAAGATTCTGATTACTCCCCCCCTCTCTAACTCTTCTTCTTCTATAGTAGATG
>ASILO326-17|Dip|gs-Goe|sp-Goe par|subsp-NA|c
TAAGATTTTGATTATTACCCCCTTCATTAACCAGGAACAGGATGA------
>CLT100-09|Lep|gs-Col|sp-Col elg|subsp-NA|co-Buru
AACATTATATTTGGAANNN-------GATCAGGAATAGTCGGAACTTCTCTGAA------
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTATAATTGGAGGATTTGGAAAACCTTTAATATT----CCGAAT
>STBOD057-09|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
ATCTAATATTGCACATAGAGGAACCTCNGTATTTTTTCTCTCCATCT------TTAG
>TBBUT582-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----CCCCCTCATTAACATTACTAAGTTGAAAATGGAGCAGGAACAGGATGA
>AFBTB001-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGCTCCATCC-------------TAGAAAGAGGGG---------GGGTGA
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTAGGAAATTGATTAGTACCTTTAATATT----CCGAAT---
>AFBTB003-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGATTTTGACTTCTGC------CATGAGAAAGA-------------AGGGTGA
I would suggest the following logic in order not to overcomplicate things.
Search the beginning part, remove it from the string
Search the end part, remove it from the string
Search the middle part in the remainder:
awk -v start_N=10 -v mid_N=11 -v end_N=10 '
/^>/{hdr=$0; next}
{ seq=$0 }
match(seq,/^[-Nn]*/) && RLENGTH > start_N { next }
{ seq=substr(seq,RSTART+RLENGTH) }
match(seq,/[-Nn]*$/) && RLENGTH > end_N { next }
{ seq=substr(seq,1,RSTART-1) }
{ while (match(seq,/[-Nn]+/)) {
if(RLENGTH>mid_N) next
seq=substr(seq,RSTART+RLENGTH)
}
}
{ print hdr; print $0 }' file
An alternative is to build extended regular expressions with interval expressions (repetition counts) from the variables:
awk -v start_N=10 -v mid_N=11 -v end_N=10 '
(FNR==1) { ere_start = "^[-Nn]{" (start_N+1) ",}"
ere_end = "[-Nn]{" (end_N+1) ",}$"
ere_mid = "[^-Nn][-Nn]{" (mid_N+1) ",}[^-Nn]" }
/^>/{hdr=$0; next}
{ seq=$0 }
match(seq,ere_start) { next }
match(seq,ere_end) { next }
match(seq,ere_mid) { next }
{ print hdr; print $0 }' file
You can use a string as the second argument to match(), and then Awk's regular string concatenation works fine.
awk -v start_N=10 -v mid_N=11 -v end_N=10 ' /^>/ {
hdr=$0; next }
match($0,/^[-Nn]*/) && RLENGTH<=start_N &&
match($0,"N{0," mid_N "}") &&
match($0,/[-Nn]*$/) && RLENGTH<=end_N {
print hdr; print }'
Just to spell this out a bit, if you use /regex/ then the text between the slashes is immediately interpreted as a regular expression, but if you use "regex" or a piece of code which evaluates to a string, the regular Awk string-handling functions are processed first, and only then is the resulting string interpreted as a regular expression.
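To see the difference in isolation, here is a minimal sketch. The helper name leading_run and its arguments are made up for illustration. It assembles the regex from a string at run time and prints the length of the leading run, which is the quantity the RLENGTH<=start_N test above relies on:

```shell
#!/bin/bash
# leading_run CHARS STRING (hypothetical helper): print the length of the
# leading run of characters from CHARS at the start of STRING.
leading_run() {
    awk -v chars="$1" -v s="$2" 'BEGIN {
        re = "^[" chars "]*"   # a string: concatenated first, then used as a regex
        match(s, re)           # matches at position 1, possibly with length 0
        print RLENGTH
    }'
}

leading_run '-Nn' '----TAAGATT'   # prints 4
leading_run '-Nn' 'TAAGATT---'    # prints 0
```

Because the pattern is an ordinary string here, the character set could just as well come from a shell variable or a config file.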
Thanks for your question. In my humble opinion, you should rephrase your question a bit and make sure your objective is 100% clear to all potential readers of this thread.
As for using a variable inside a construct where awk doesn't allow one, there is a standard trick that applies to whichever scripting tool you use (e.g. sed, or even something more complex in perl or Python): interrupt the awk script by closing the single-quoted string, and insert a variable expansion that is performed by the shell, not by awk. Here, you would define mid_N in Bash and then use "${mid_N}" in the middle of your awk script, with a closing single quote immediately before it and a reopening single quote immediately after it. Like so:
mid_N=11
awk -v start_N=10 -v end_N=10 ' /^>/ {
hdr=$0; next }; match($0,/^[-Nn]*/) && RLENGTH<=start_N &&
match($0,/N{0,'"${mid_N}"'}/) && match($0,/[-Nn]*$/) && RLENGTH<=end_N {
print hdr; print }' infile.aln > without_shortseqs_mids.aln
That's a minimal-edit solution to the specific issue you mentioned below your "As for a variable i tried the following but failed:"
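If it is unclear what awk ends up seeing with this quoting, the sketch below prints the spliced text instead of running it. The helper name build_cond is hypothetical and only wraps the same quote-break-quote construct:

```shell
#!/bin/bash
# build_cond N (hypothetical helper): print the awk condition exactly as awk
# would receive it after the shell has spliced N into the script text.
build_cond() {
    printf '%s\n' 'match($0,/N{0,'"$1"'}/)'
}

build_cond 11   # prints match($0,/N{0,11}/)
```

Since the value becomes part of the program text, anything in it (a slash, a quote) is parsed as awk code, so this is only safe for values you control. Passing the value with -v keeps data and code separate.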

sending output of a c++ program into a variable bash

I'm trying to write a system that grades C++ code against pre-written examples that I have ready. The code is very simple, like the following:
#include <iostream>
using namespace std;
int main()
{
int a;
cin >> a;
if (a > 100)
cout << "Big";
else
cout << "Small";
return 0;
}
So I want to test and grade this program with bash: declare a score variable and echo it at the end. Here's what I wrote (the parts I need help with are in double quotes):
#!/bin/bash
g++ new.cpp -o new
test1=101
test2=78
score=0
if [ "Result of executing ./new with $test1 variable as input"=="Big" ]
then
(( score += 50 ))
fi
if [ "Result of executing ./new with $test2 variable as input"=="Small" ]
then
(( score += 50 ))
fi
echo $score
Also, I'm still very new to bash, so if you can tell me a simpler way to run the examples (such as a loop), I'd love to hear it.
Thanks!
Your program reads its input from standard input (cin), so feed each test value to it on stdin and capture the output with command substitution:
#!/bin/bash
g++ new.cpp -o new
test1=101
test2=78
score=0
if [ "$(./new <<< "$test1")" == "Big" ]; then
(( score += 50 ))
fi
if [ "$(./new <<< "$test2")" == "Small" ]; then
(( score += 50 ))
fi
echo "$score"
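Since the question also asks about loops, here is a minimal sketch of a loop-based grader. The names grade and classify, and the "input:expected" case format, are assumptions made for illustration. classify is only a stand-in so the sketch runs without compiling anything; with the real program you would pass ./new instead:

```shell
#!/bin/bash
# Stand-in for the compiled ./new binary (hypothetical, for illustration only):
# reads one integer from stdin and prints Big or Small.
classify() {
    local a
    read -r a
    if (( a > 100 )); then printf 'Big'; else printf 'Small'; fi
}

# grade CMD (hypothetical helper): feed each test input to CMD on stdin,
# compare the output with the expected answer, and print the total score.
grade() {
    local cmd=$1 score=0 tc input expected
    local cases=("101:Big" "78:Small")   # "input:expected" pairs, 50 points each
    for tc in "${cases[@]}"; do
        input=${tc%%:*}
        expected=${tc#*:}
        if [ "$("$cmd" <<< "$input")" = "$expected" ]; then
            (( score += 50 ))
        fi
    done
    echo "$score"
}

grade classify   # prints 100; for the real program use: grade ./new
```

Adding a test case then means adding one entry to the cases array rather than another if block.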

Bash script: require each option parameter to occur at most once

I am writing a Unix shell script that takes its options as a parameter, as shown:
./command -pers
The allowed options are p, e, r and s. They can appear in any order and each is optional. For example, these are correct: ./command -e, ./command -sr, ./command -pes, but these are incorrect: ./command -pep, ./command -ssr. Repeating an option is not allowed, and at least one option is required.
I have tried a regular expression for this, but it still allows repetition. Please tell me what is wrong with the expression:
[[ $1 =~ ^-[p{0,1}r{0,1}s{0,1}e{0,1}]{1,4}$ ]] || { echo "$MSG_INCORRECT_USAGE"; }
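On why the expression allows repeats: everything between [ and ] is a single bracket expression, so the {0,1} parts are not repetition counts. They are just the literal characters {, 0, comma, 1 and } added to the set. The pattern therefore means "one to four characters, each of which may be any of p, r, s, e or those literals", and -pep matches. A regex alone is awkward for "no letter twice", so here is a minimal sketch that checks the shape with a regex and the repeats with plain string tests. The helper name valid_opts is made up for illustration:

```shell
#!/bin/bash
# valid_opts ARG (hypothetical helper): succeed if ARG is "-" followed by
# one to four of the letters p, e, r, s with no letter repeated.
valid_opts() {
    local arg=$1 letter rest
    [[ $arg =~ ^-[pers]{1,4}$ ]] || return 1
    rest=${arg#-}
    while [[ -n $rest ]]; do
        letter=${rest:0:1}
        rest=${rest:1}
        # A letter that still occurs in the remainder is a repeat.
        [[ $rest == *"$letter"* ]] && return 1
    done
    return 0
}

valid_opts -pes && echo "-pes accepted"
valid_opts -pep || echo "-pep rejected"
```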
You can use a script opt.sh like this to avoid processing each passed option more than once:
#!/bin/bash
while getopts "pers" opt; do
[[ -n ${!opt} ]] && { echo "Error: $opt already processed"; exit 1; } || declare $opt=1
case $opt in
p) echo "processing p!" ;;
e) echo "processing e!" ;;
r) echo "processing r!" ;;
s) echo "processing s!" ;;
\?) echo "Invalid option: -$OPTARG" ;;
esac
done
Testing:
bash ./opt.sh -srr
processing s!
processing r!
Error: r already processed
bash ./opt.sh -pep
processing p!
processing e!
Error: p already processed
bash ./opt.sh -pers
processing p!
processing e!
processing r!
processing s!
#!/bin/bash
while getopts "pers" OPTION; do
echo $OPTION
done
Results:
$ bash test.sh -pers
p
e
r
s
Replace echo $OPTION with a case statement and report errors if an option appears twice. Example:
#!/bin/bash
unset OPT_P OPT_E OPT_R OPT_S
while getopts "pers" OPTION; do
case $OPTION in
p)
if [ $OPT_P ]; then
echo "-p appeared twice"
exit 64
else
OPT_P="true"
fi
;;
#... and so on ...
\?)
echo "Unrecognized option $OPTION"
exit 64
;;
esac
done
arg0=$(basename $0 .sh)
error() { echo "$arg0: $*" >&2; exit 1; }
usage() { echo "Usage: $arg0 [-pers]" >&2; exit 1; }
p_flag=
e_flag=
r_flag=
s_flag=
while getopts pers arg
do
case "$arg" in
(e) [ -z "$e_flag" ] || error "-e flag repeated"
e_flag=1;;
(p) [ -z "$p_flag" ] || error "-p flag repeated"
p_flag=1;;
(r) [ -z "$r_flag" ] || error "-r flag repeated"
r_flag=1;;
(s) [ -z "$s_flag" ] || error "-s flag repeated"
s_flag=1;;
(*) usage;;
esac
done
shift $(($OPTIND - 1))
[ -z "$e_flag$p_flag$r_flag$s_flag" ] && error "You must specify one of -e, -p, -r, -s"
[ $# = 0 ] || error "You may not specify any non-option arguments"
…process the code for each option that was set…
If you need to process the options in the sequence they're given, then you need to record the order in which they arrive. Use an array:
flags=()
before the loop, and
flags+=("$arg")
after the esac. The code can then handle each element of the flags array in sequence.
for flag in "${flags[@]}"
do
…processing for the particular flag…
done
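Putting the pieces together, a minimal self-contained sketch might look like the following. The function name parse_flags and the single seen string used for bookkeeping are assumptions for illustration, not part of the answers above:

```shell
#!/bin/bash
# parse_flags ARGS... (hypothetical helper): print the accepted flags in the
# order given; fail if a flag repeats, is unknown, or none is given.
parse_flags() {
    local OPTIND=1 arg seen="" flags=()
    while getopts pers arg 2>/dev/null; do
        case $arg in
            [pers])
                [[ $seen == *"$arg"* ]] && { echo "-$arg flag repeated" >&2; return 1; }
                seen+=$arg
                flags+=("$arg")     # remember the arrival order
                ;;
            *) return 1 ;;          # getopts reported an unknown option
        esac
    done
    (( ${#flags[@]} > 0 )) || return 1
    echo "${flags[*]}"
}

parse_flags -sr -p    # prints s r p
parse_flags -pep || echo "rejected"
```

Declaring OPTIND local lets the function be called more than once in the same script without resetting it by hand.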

Matching blocks with conditions

I am in need of some regexp guru help.
I am trying to make a small config system for a home project, but it seems that I need a bit more regexp code than my regexp skills can come up with.
I need to be able to extract some info inside blocks based on conditions and actions. For example:
action1 [condition1 condition2 !condition3] {
Line 1
Line 2
Line 3
}
The conditions are stored in simple variables, separated by spaces. I use these variables to create the regexp that extracts the block info from the file. Most of this is working fine, except that I have no idea how to do the "not matching" part, which basically means that a "word" must not be present in the condition variable.
VAR1="condition1 condition2"
VAR2="condition1 condition2 condition3"
When matched against the above, it should match VAR1 but not VAR2.
This is what I have so far
PARAMS="con1 con2 con3"
INPUT_PARAMS="[^!]\\?\\<$(echo $PARAMS | sed 's/ /\\>\\|[^!]\\?\\</g')\\>"
sed -n "/^$ACTION[ \t]*\(\[\($INPUT_PARAMS\)*\]\)\?[ \t]*{/,/}$/p" default.cfg | sed '/^[^{]\+{/d' | sed '/}/d'
Not sure how pretty this is, but it does work, except for not-matching.
EDIT:
Okay I will try to elaborate a bit.
Let's say that I have the below text/config file
action1 [con1 con2 con3] {
Line A
Line B
}
action2 [con1 con2 !con3] {
Line C
}
action3 [con1 con2] {
Line D
}
action4 {
Line E
}
and I have the following conditions to match against
ARG1="con1 con2 con3"
ARG2="con1 con2"
ARG3="con1"
ARG4="con1 con4"
# Matching against ARG1 should print Line A, B, D and E
# Matching against ARG2 should print Line C, D and E
# Matching against ARG3 should print Line E
# Matching against ARG4 should print Line E
Below is a Java-like example of action2 using a normal conditional check. It gives a better idea of what I am trying to do:
if (ARG2.contains("con1") && ARG2.contains("con2") && !ARG2.contains("con3")) {
// Print all lines in this block
}
The logic of how you're selecting which records to print lines from isn't clear to me so here's how to create sets of positive and negative conditions using awk:
$ cat tst.awk
BEGIN{
RS = ""; FS = "\n"
# create the set of the positive conditions in the "conds" variable.
n = split(conds,tmp," ")
for (i=1; i<=n; i++)
wanted[tmp[i]]
}
{
# create sets of the positive and negative conditions
# present in the first line of the current record.
delete negPresent # use split("",negPresent) in non-gawk
delete posPresent
n = split($1,tmp,/[][ {]+/)
for (i=2; i<n; i++) {
cond = tmp[i]
sub(/^!/,"",cond) ? negPresent[cond] : posPresent[cond]
}
allPosInWanted = 1
for (cond in posPresent)
if ( !(cond in wanted) )
allPosInWanted = 0
someNegInWanted = 0
for (cond in negPresent)
if (cond in wanted)
someNegInWanted = 1
if (allPosInWanted && !someNegInWanted)
for (i=2;i<NF;i++)
print $i
}
$ awk -v conds='con1 con2 con3' -f tst.awk file
Line A
Line B
Line D
Line E
$
$ awk -v conds='con1 con2' -f tst.awk file
Line C
Line D
Line E
$
$ awk -v conds='con1' -f tst.awk file
Line E
$
$ awk -v conds='con1 con4' -f tst.awk file
Line E
$
and now you just have to code whatever logic you like in that final block where the printing is being done to compare the conditions in each of the sets.
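For comparison, the selection rule that the four runs above exercise (every plain condition of a block must be in the wanted set, and no negated one may be) can be sketched in plain bash. The helper name block_matches is hypothetical, and the sketch only decides whether a block applies. It does not parse the config file:

```shell
#!/bin/bash
# block_matches WANTED BLOCK_CONDS (hypothetical helper): succeed if every
# plain condition in BLOCK_CONDS is in WANTED and no "!"-negated one is.
block_matches() {
    local wanted=" $1 " cond
    for cond in $2; do
        if [[ $cond == '!'* ]]; then
            # Negated condition: fail if it is present in WANTED.
            [[ $wanted == *" ${cond:1} "* ]] && return 1
        else
            # Plain condition: fail if it is missing from WANTED.
            [[ $wanted == *" $cond "* ]] || return 1
        fi
    done
    return 0
}

block_matches "con1 con2" "con1 con2 !con3" && echo "action2 applies"
block_matches "con1 con2 con3" "con1 con2 !con3" || echo "action2 skipped"
```

A block without a condition list, like action4, passes an empty second argument and always matches.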

Run file until the output matches regular expression

I would like to write a bash expression that would run the file "a.out" until the output of the file is equal to "b\na" where "\n" is a newline.
Here you go:
#!/bin/bash
./a.out | while read -r x && read -r y
do
[[ $x == 'b' && $y == 'a' ]] && break
echo "$x $y"
done
Tested this in bash on Ubuntu 13.04.
This might also help you quantify your results:
let ab=0 ba=0
for (( i=0; i<1000; ++i )); do
case "$(./a.out)" in
$'a\nb') let ab+=1;;
$'b\na') let ba+=1;;
esac
done
echo "a\\nb: $ab times; b\\na: $ba times"
Tested on Ubuntu 13.04
pcregrep -M matches the multi-line pattern b\na, and the -m 1 flag makes grep exit on the first match:
until ./a.out | pcregrep -M 'b\na' | grep -m 1 a; do :; done
This should work:
./a.out | while read line; do
    [[ $s == 1 && $line == 'a' ]] && break
    s=0 
    [[ $line == 'b' ]] && s=1 
done 
Overkill way:
mkfifo myfifo
./a.out > myfifo &
pp=$!
while read line; do
[[ $s == 1 && $line == 'a' ]] && break
s=0
[[ $line == 'b' ]] && s=1
done < myfifo
kill $pp
rm myfifo
Something like
./a.out | sed '/^b$/!b;:l;n;/^b$/bl;/^a$/q'
Translation: if the current input line does not match ^b$ (beginning of line, b, end of line) start over with the next input line; otherwise, fetch the next input line; as long as we get another ^b$, keep reading, otherwise, if it matches ^a$, stop reading and quit.
:l declares a label so we have somewhere to go back to in the while loop. b without an explicit label branches to the end of the script (which then starts over with the next input line).
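Reading the question literally (rerun the program until one complete run prints b, then a), a plain-bash sketch needs no extra tools. The names run_until_match and fake_aout are made up for illustration. fake_aout is only a stand-in for ./a.out that prints the lines in the wrong order on its first two runs:

```shell
#!/bin/bash
# run_until_match CMD [ARGS...] (hypothetical helper): rerun CMD until one
# complete run prints exactly "b", a newline, "a"; then print the run count.
run_until_match() {
    local runs=1
    until [[ $("$@") == $'b\na' ]]; do
        (( runs++ ))
    done
    echo "$runs"
}

# Hypothetical stand-in for ./a.out: a counter file makes the first two
# runs print "a" then "b", and later runs print "b" then "a".
counter=$(mktemp)
echo 0 > "$counter"
fake_aout() {
    local n
    n=$(<"$counter")
    echo $(( n + 1 )) > "$counter"
    if (( n < 2 )); then printf 'a\nb\n'; else printf 'b\na\n'; fi
}

run_until_match fake_aout   # prints 3; for the real program: run_until_match ./a.out
rm -f "$counter"
```

Command substitution strips the trailing newline, which is why the comparison is against $'b\na' without a final newline.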