I am trying to find a way of parsing sequences of related arguments, preferably using argparse.
For example:
command --global-arg --subgroup1 --arg1 --arg2 --subgroup2 --arg1 --arg3 --subgroup3 --arg4 --subcommand1 --arg1 --arg3
where --global-arg applies to the whole command, but each --subgroupN argument has sub-arguments that apply only to it (and may have the same name, such as --arg1 and --arg3 above), and where some sub-arguments are optional, so the number of sub-arguments is not constant. However, I know that each --subgroupN sub-argument set is complete either by the presence of another --subgroupN or the end of the argument list (I am not fussed if global arguments cannot appear at the end, although I imagine that is possible as long as they don't clash with sub-argument names).
The --subgroupN elements are essentially sub-commands, but I do not appear to be able to use the sub-parser ability of argparse as it slurps any following --subgroupN entries as well (and therefore barfs with unexpected arguments).
(An example of this style of argument list is used by xmlstarlet)
Are there any suggestions beyond writing my own parser? I assume I can at least leverage something out of argparse if that is the only option...
Examples
The examples below were an attempt to find a way to parse an argument structure along the following lines:
(a --name <name>|b --name <name>)+
In the first example I hoped to have --a and --b introduce a set of arguments that would be processed by a subparser.
I was hoping to get something out perhaps along the lines of
Namespace(a=Namespace(name="dummya"), b=Namespace(name="dummyb"))
subparser example fails
import argparse
parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers()
parser_a = subparsers.add_parser("a")
parser_b = subparsers.add_parser("b")
parser_a.add_argument("--name")
parser_b.add_argument("--name")
parser.parse_args(["a", "--name", "dummy"])
> Namespace(name='dummy') (Good)
parser.parse_args(["b", "--name", "dummyb", "a", "--name", "dummya"])
> error: unrecognized arguments: a (BAD)
mutually exclusive group fails
parser = argparse.ArgumentParser()
g = parser.add_mutually_exclusive_group()
g1 = g.add_mutually_exclusive_group()
g1.add_argument("--name")
g2 = g.add_mutually_exclusive_group()
g2.add_argument("--name")
> ArgumentError: argument --name: conflicting option string(s): --name (BAD)
(I wasn't really expecting this to work, it was an attempt to see if I could have repetition of grouped arguments.)
Other than the subparser mechanism, argparse is not designed to handle groups of arguments. Other than the nargs grouping, it handles the arguments in the order that they appear in the argv list.
As I mentioned in the comments, there have been earlier questions on this, which can probably be found by searching for words like "multiple". One way or another, they all try to work around the basic order-independent design of argparse.
https://stackoverflow.com/search?q=user%3A901925+[argparse]+multiple
I think the most straightforward solution is to process the sys.argv list beforehand, breaking it into groups, and then pass those sublists to one or more parsers:
parse [command --global-arg],
parse [--subgroup1 --arg1 --arg2],
parse [--subgroup2 --arg1 --arg3],
parse [--subgroup3 --arg4],
parse [--subcommand1 --arg1 --arg3]
In fact the only alternative is to use that subparser 'slurp everything else' behavior to get a remainder of arguments that can be parsed again. Use parse_known_args to return a list of unknown arguments (parse_args raises an error if that list is not empty).
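The pre-splitting idea can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the helper split_groups, the GROUP_FLAGS set, and the choice of store_true for every option are assumptions made for the example, not something argparse provides.

```python
import argparse

# Hypothetical group flags; each one starts a new sub-argument group.
GROUP_FLAGS = {"--subgroup1", "--subgroup2"}


def split_groups(argv, group_flags):
    """Split argv into a leading global chunk and one chunk per group flag."""
    global_args, groups = [], []
    current = global_args
    for token in argv:
        if token in group_flags:
            # A group flag closes the previous chunk and opens a new one.
            current = [token]
            groups.append(current)
        else:
            current.append(token)
    return global_args, groups


global_parser = argparse.ArgumentParser()
global_parser.add_argument("--global-arg", action="store_true")

# One parser is reused for every group here; separate parsers would work too.
group_parser = argparse.ArgumentParser()
group_parser.add_argument("--arg1", action="store_true")
group_parser.add_argument("--arg2", action="store_true")
group_parser.add_argument("--arg3", action="store_true")

argv = ["--global-arg", "--subgroup1", "--arg1", "--arg2",
        "--subgroup2", "--arg1", "--arg3"]
global_args, groups = split_groups(argv, GROUP_FLAGS)

global_ns = global_parser.parse_args(global_args)
# The first token of each chunk is the group flag itself; parse the rest.
parsed = [(chunk[0], group_parser.parse_args(chunk[1:])) for chunk in groups]
```

Because each chunk is parsed on its own, the same option name (such as --arg1) can appear in several groups without clashing. Global arguments must come before the first group flag in this sketch.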
Using hpaulj's reply above, I came up with the following:
import argparse

args = [
    "--a", "--name", "dummya",
    "--b", "--name", "dummyb",
    "--a", "--name", "another_a", "--opt"
]
parser_globals = argparse.ArgumentParser()
parser_globals.add_argument("--test")
parser_a = argparse.ArgumentParser()
parser_a.add_argument("--name")
parser_a.add_argument("--opt", action="store_true")
parser_b = argparse.ArgumentParser()
parser_b.add_argument("--name")
command_parsers = {
    "--a": parser_a,
    "--b": parser_b
}
# Parse the global options first; everything unrecognised is left in `rest`.
(the_namespace, rest) = parser_globals.parse_known_args(args)
subcommand_dict = vars(the_namespace)
# Walk `rest` from the end, collecting tokens until a group flag is reached.
subcommand = []
val = rest.pop() if rest else None
while val:
    if val in command_parsers:
        the_args = command_parsers[val].parse_args(subcommand)
        if val in subcommand_dict:
            # A repeated group flag turns the stored value into a list.
            if not isinstance(subcommand_dict[val], list):
                subcommand_dict[val] = [subcommand_dict[val]]
            subcommand_dict[val].append(the_args)
        else:
            subcommand_dict[val] = the_args
        subcommand = []
    else:
        subcommand.insert(0, val)
    val = None if not rest else rest.pop()
I end up with:
Namespace(
    --a=[
        Namespace(
            name='another_a',
            opt=True
        ),
        Namespace(
            name='dummya',
            opt=False
        )
    ],
    --b=Namespace(
        name='dummyb'
    ),
    test=None
)
which seems to serve my purposes.
Related
I am working on creating a SageMaker pipeline. In the evaluation step, I would like to pass an argument to my preprocess.py script.
There are a few examples online of how to do so (a sample is below), but they all use static values. I want to pass a Workflow parameter (a string in this case) to the script.
I tried multiple approaches to no avail, and I even opened a GitHub issue but have received no response so far.
The linked GitHub issue details all the approaches I've taken so far, but it all boils down to the fact that a workflow parameter is only evaluated at runtime.
I would like to know if what I want to do is possible or not.
Option1: Typical approach: Passing Static values
sklearn_processor.run(
    code="preprocess.py",
    inputs=[
        ProcessingInput(source='my_package/', destination='/opt/ml/processing/input/code/my_package/')
    ],
    outputs=[
        ProcessingOutput(output_name="test_transform_data",
                         source='/opt/ml/processing/output/test_transform',
                         destination=out_path),
    ],
    arguments=["--time-slot-minutes", "30min"]
)
source for the sample code: How to pass region to the SKLearnProcessor - botocore.exceptions.NoRegionError: You must specify a region
Option2: My approach: Passing Workflow Parameter
step_args = myprocessor.run(
    inputs=[
        ProcessingInput(source=s3_full_address, destination="/opt/ml/processing/input"),
    ],
    outputs=[
        ProcessingOutput(output_name="raw", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
    ],
    code="generate_train_test_data.py",
    arguments=["--s3_prefix", s3_prefix]
)
Where s3_prefix is a workflow argument defined as s3_prefix = ParameterString(name="InputPrefix", default_value="myprefix")
To pass a workflow argument to your script, use the job_arguments option.
1. Step definition
Update your step definition to add the job_arguments argument:
ProcessingStep(
    name="step-name",
    processor=my_processor,
    job_arguments=[
        "--my_argument", my_argument
    ],
    ...
    code="myscript.py"
)
2. Reading the argument
In your script (myscript.py in this example), read the argument as follows:
import argparse

def parse_args():
    parser = argparse.ArgumentParser()
    # hyperparameters sent by the client are passed as command-line arguments to the script
    parser.add_argument('--my_argument', type=str)
    return parser.parse_known_args()

args, _ = parse_args()
my_argument = args.my_argument
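To check the script's argument handling locally, without running a pipeline, the same parser can be fed an explicit token list. This is only a sketch: the argv parameter and the "--injected-flag" token are assumptions added for illustration. It shows that parse_known_args returns the parsed namespace together with any tokens it did not recognise, instead of exiting with an error.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--my_argument", type=str)
    # parse_known_args returns (namespace, leftover_tokens) rather than
    # failing when it meets flags the parser does not define.
    return parser.parse_known_args(argv)


# "--injected-flag" is a hypothetical extra argument used only for this test.
args, unknown = parse_args(["--my_argument", "myprefix", "--injected-flag", "1"])
print(args.my_argument, unknown)
```

Passing argv=None (the default) makes argparse read sys.argv[1:], so the function behaves as in the script above when called without arguments.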
(Note: this employs ESRI arcpy.Describe)
I have an empty dictionary, say it's called file_dict.
I have two lists:
1. The first is the list of file types I'll use as keys, called typeList.
2. The second is a list of files in a folder, called fileList.
I am able to:
Get typeList into the dictionary as keys.
file_dict.keys()
[u'Layer', u'DbaseTable', u'ShapeFile', u'File', u'TextFile', u'RasterDataset']
I need help with:
Using comparisons that check the following: (pseudocoded)
FOR each file in fileList:
    CHECK the file type
    ''' using arcpy.Describe -- I have a variable already called desc - it is how I got typeList '''
    IF file is a particular type (say shapefile):
        INSERT that value from fileList into a list within the appropriate typeList KEY in file_dict
    ENDIF
ENDFOR
My desired output for file_dict would be:
>>> file_dict
{
    u'Layer': ['abd.lyr', '123.lyr'],
    u'DbaseTable': ['ABD.dbf'],
    u'ShapeFile': ['abc.shp', '123.shp'],
    u'File': ['123.xml'],
    u'TextFile': ['ABC.txt', '123.txt'],
    u'RasterDataset': ['ABC.jpg', '123.TIF']
}
Note: I would like to avoid zipping. (I get it's easier but...)
If you want to do it with a simple Python script, this should help:
# Input
file_list = ['abd.lyr', '123.lyr', 'ABD.dbf', 'abc.shp', '123.shp', '123.xml',
             'ABC.jpg', '123.TIF', 'ABC.txt', '123.txt']
# Main code
file_dict = {}  # dict declaration
case = {
    'lyr': "Layer",
    'dbf': "DbaseTable",
    'shp': "ShapeFile",
    'xml': "File",
    'txt': "TextFile",
    'jpg': "RasterDataset",
    'TIF': "RasterDataset",
}  # Case declaration for easy assignment
for i in file_list:
    # Append each file to the list for its type; setdefault creates the list on first use.
    file_dict.setdefault(case[i.split(".")[-1]], []).append(i)
print(file_dict)
# Output
# {'Layer': ['abd.lyr', '123.lyr'], 'DbaseTable': ['ABD.dbf'], 'ShapeFile': ['abc.shp', '123.shp'], 'File': ['123.xml'], 'RasterDataset': ['ABC.jpg', '123.TIF'], 'TextFile': ['ABC.txt', '123.txt']}
I hope this helps and counts!
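The lookup above is case-sensitive, so a file such as 'ABC.JPG' or '123.tif' would raise a KeyError. A slightly more tolerant variant might look like the sketch below. The names EXTENSION_TYPES and group_by_type, and the fallback type "File" for unknown extensions, are assumptions made for illustration. It classifies by file extension only and does not use arcpy.Describe.

```python
import os
from collections import defaultdict

# Lowercased extensions mapped to the type names used as dictionary keys.
EXTENSION_TYPES = {
    ".lyr": "Layer",
    ".dbf": "DbaseTable",
    ".shp": "ShapeFile",
    ".xml": "File",
    ".txt": "TextFile",
    ".jpg": "RasterDataset",
    ".tif": "RasterDataset",
}


def group_by_type(file_list, default="File"):
    """Group file names by type, matching extensions case-insensitively."""
    grouped = defaultdict(list)
    for name in file_list:
        extension = os.path.splitext(name)[1].lower()
        # Unknown extensions fall back to `default` (an assumed choice).
        grouped[EXTENSION_TYPES.get(extension, default)].append(name)
    return dict(grouped)


print(group_by_type(["abc.shp", "123.TIF", "ABC.jpg", "notes.md"]))
```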
Custom Keyword written in python 2.7:
#keyword("Update ${filename} with ${properties}")
def set_multiple_test_properties(self, filename, properties):
    for each in values.split(","):
        each = each.replace(" ", "")
        key, value = each.split("=")
        self.set_test_properties(filename, key, value)
When we send the parameters on a single line as shown below, it works as expected:
Update sample.txt with "test.update=11,timeout=20,delay.seconds=10,maxUntouchedTime=10"
But when we split the above line across several lines (for better readability), it does not work.
Update sample.txt with "test.update = 11,
timeout=20,
delay.seconds=10,
maxUntouchedTime=10"
Any clue on this please?
I am not sure whether it will work, but please try it like this:
Update sample.txt with "test.update = 11,
... timeout=20,
... delay.seconds=10,
... maxUntouchedTime=10"
Your approach does not work because the second line is treated as a call to a keyword (called "timeout=20,"), the third as another one, and so on. The three dots don't work because they act as "cell separators", i.e. delimiters between arguments.
If you are going for readability, you can use the Catenate keyword (it's in the BuiltIn library):
    ${props}=    Catenate    SEPARATOR=${SPACE}
    ...    test.update = 11,
    ...    timeout=20,
    ...    delay.seconds=10,
    ...    maxUntouchedTime=10
Then call your keyword with that variable:
    Update sample.txt with "${props}"
By the way: a) I think the keyword name declared in the decorator has no double quotes, so when it is called as above the quotes will be treated as part of the argument's value; b) there seems to be an error in the Python method: the argument is named "properties" while the loop iterates over "values"; and c) you might want to consider using named varargs (**kwargs in Python, &{kwargs} in RF syntax) for this purpose (sorry, off-topic, but I couldn't resist :)
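On the Python side, the splitting logic can also be made tolerant of spaces and line breaks around each pair. The following is a minimal sketch, not the original keyword: parse_properties is a hypothetical standalone helper, and it returns a dict instead of calling set_test_properties.

```python
def parse_properties(properties):
    """Parse 'key=value' pairs separated by commas, ignoring surrounding whitespace."""
    pairs = {}
    for item in properties.split(","):
        item = item.strip()  # drops spaces and newlines around each pair
        if not item:
            continue  # tolerate a trailing comma
        key, value = item.split("=", 1)
        pairs[key.strip()] = value.strip()
    return pairs


print(parse_properties("test.update = 11,\n timeout=20,\n delay.seconds=10"))
```

Inside the keyword, each resulting key and value could then be passed to self.set_test_properties(filename, key, value).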
I have a list of countries that I need to convert into a standardized format (iso3c). Some have long names, others have 2 or 3 digit codes, and others do not display the whole country name, like "Africa" instead of "South Africa". I've done some research and decided to use the countrycode package in R. However, when I tried to use "regex", R doesn't seem to recognize it. I'm getting the error below:
> countrycode(data,"regex","iso3c", warn = TRUE)
Error in countrycode(data, "regex", "iso3c", :
Origin code not supported
Is there another option I should use?
Thanks!
You can view the README for the countrycode package here https://github.com/vincentarelbundock/countrycode, or you can pull up the help file in R by entering this into your R console ?countrycode::countrycode.
"regex" is not a valid 'origin' value (2nd argument in the countrycode() function). You must use one of "cowc", "cown", "eurostat", "fao", "fips105", "imf", "ioc", "iso2c", "iso3c", "iso3n", "p4_ccode", "p4_scode", "un", "wb", "wb_api2c", "wb_api3c", "wvs", "country.name", "country.name.de" (using latest version 0.19).
If you use either of the following 'origin' values, regex matching will be performed automatically: "country.name" or "country.name.de"
If you're using a custom dictionary with the new (as of version 0.19) custom_dict argument, you must set the origin_regex argument to TRUE for regex matching to occur.
In your example, this should do what you want:
countrycode(data, origin = "country.name", destination = "iso3c", warn = TRUE)
So I'm doing something wrong in this python script, but it's becoming convoluted and I'm losing sight of what I'm doing wrong.
I want a script to go through a file, find all the function definitions, and then pull out the name, return type, and parameters of the function, and output a "doxygen" style comment like this:
/******************************************************************************/
/*!
\brief
Main function for the file
\return
The exit code for the program
*/
/******************************************************************************/
But I'm doing something wrong with the regular expression in trying to parse the parameters... Here is the script so far:
import re
import sys

f = open(sys.argv[1])
functions = []
for line in f:
    match = re.search(r'([\w]+)\s+([\S]+)\(([\w+\s+\w+])+\)',line)
    if line.find("\\fn") < 0:
        if match:
            returntype = match.group(1)
            funcname = match.group(2)
            print '/********************************************************************'
            print " \\fn " + match.group()
            print ''
            print ' \\brief'
            print ' Function description for ' + funcname
            print ''
            if len(match.groups()) > 2:
                params = []
                count = len(match.groups()) - 2
                while count > 0:
                    matchingstring = match.group(count + 2)
                    if matchingstring.find("void") < 0:
                        params.append(matchingstring)
                    count -= 1
                for parameter in params:
                    print " \\param " + parameter
                    print ' Description of ' + parameter
                    print ''
            print ' \\return'
            print ' ' + returntype
            print '********************************************************************/'
            print ''
Any help would be appreciated. Thanks
The grammar of C++ is far too complex to be handled by simple
regular expressions. You'll need at least a minimal parser.
I've found that for restricted cases, where I'm not concerned
with C++ in general, but only my own style, I can often get away
with a flex based tokenizer and a simple state machine. This
will fail in many cases of legal C++: for starters, of
course, if someone uses the preprocessor to modify the syntax;
but also because < can have different meanings, depending on
whether what precedes it names a template or not. But it's often
adequate for a specific job.
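For such a restricted case, something like the sketch below may be enough. Note that it uses a single regular expression instead of a flex tokenizer, so it is a simpler stand-in for the approach described above. It assumes one-line signatures with no templates, macros, function pointers, or parentheses inside default arguments, and it will produce false positives on lines such as "x = foo(1);". The names SIGNATURE, RULE, and doxygen_comment are illustrative.

```python
import re

# Assumption: one-line signatures only, e.g. "int main(int argc, char **argv)".
SIGNATURE = re.compile(r'^\s*(.+?)\s*\b(\w+)\s*\(([^()]*)\)\s*[{;]?\s*$')
RULE = '/' + '*' * 78 + '/'


def doxygen_comment(line):
    """Return a doxygen-style comment for a simple signature, else None."""
    match = SIGNATURE.match(line)
    if match is None or match.group(1).strip() in ('return', 'else'):
        return None
    return_type, name, raw_params = (g.strip() for g in match.groups())
    out = [RULE, '/*!', '  \\brief', '    Function description for ' + name, '']
    for param in raw_params.split(','):
        identifiers = re.findall(r'\w+', param)
        if not identifiers or param.strip() == 'void':
            continue
        # The parameter name is taken to be the last identifier in the chunk.
        param_name = identifiers[-1]
        out += ['  \\param ' + param_name, '    Description of ' + param_name, '']
    if return_type != 'void':
        out += ['  \\return', '    ' + return_type, '']
    out += ['*/', RULE]
    return '\n'.join(out)


print(doxygen_comment('int main(int argc, char **argv)'))
```

Splitting the parameter list in ordinary code, rather than trying to capture each parameter with a repeated regex group, avoids the problem that a repeated group only keeps its last match.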
I've used a PEG parser with great success when trying to do simple format parsing. pyPeg is a very simple implementation of such a parser written in Python.
Example Python code for C++ function parser:
EDIT: Address template parameters. Tested with input from SK-logic and output is correct.
import pyPEG
from pyPEG import parseLine
import re
def symbol(): return re.compile(r"[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ&*][\w:]+")
def type(): return symbol
def functionName(): return symbol
def templatedType(): return symbol, "<", -1, [templatedType, symbol, ","], ">"
def parameter(): return [templatedType, type], symbol
def template(): return "<", -1, [symbol, template], ">"
def function(): return [type, templatedType], functionName, -1, template, "(", -1, [",", parameter], ")" # -1 -> zero or more repetitions.
sourceCode = "std::string foobar(std::vector<int> &A, std::map<std::string, std::vector<std::string> > &B)"
results = parseLine(sourceCode, function(), [], packrat=True)
When this is executed results is:
([(u'type', [(u'symbol', 'std::string')]), (u'functionName', [(u'symbol', 'foobar')]), (u'parameter', [(u'templatedType', [(u'symbol', 'std::vector'), (u'symbol', 'int')]), (u'symbol', '&A')]), (u'parameter', [(u'templatedType', [(u'symbol', 'std::map'), (u'symbol', 'std::string'), (u'templatedType', [(u'symbol', 'std::vector'), (u'symbol', 'std::string')])]), (u'symbol', '&B')])], '')
C++ cannot really be parsed by a (sane) regular expression: they are a nightmare as soon as nesting is concerned.
There is another concern too: determining when to parse and when not to. A function may be declared:
at file scope
in a namespace
in a class
And the last two can be nested to arbitrary depth.
I would propose using Clang here. It's a real C++ front-end with a full-featured parser, and there are:
a C API, with (notably) an API to the Indexing Library
Python bindings on top of the C API
The C API and Python bindings are far from fully exposing the underlying C++ model, but for a task as simple as listing functions it should be enough.
That said, I would question the usefulness of the project: if the documentation can be generated by a simple parser, then it is redundant with the code. And redundancy is at best useless and at worst dangerous: it introduces the potential threat of desynchronization...
If the function is tricky enough that its use requires documentation, then a developer who knows its limitations and so on has to write this documentation.