"Conditional" parsing of command-line arguments - c++

Say I have an executable (running on Mac, Windows, and Linux):
a.out [-a] [-b] [-r -i <file> -o <file> -t <double> -n <int> ]
where an argument in [ ] means that it is optional. However, if the last argument -r is set then -i,-o,-t, and -n have to be supplied, too.
There are lots of good C++ libraries out there to parse command-line arguments, e.g. gflags (http://code.google.com/p/gflags/), tclap (http://tclap.sourceforge.net/), simpleopt (http://code.jellycan.com/simpleopt/), boost.program_options (http://www.boost.org/doc/libs/1_52_0/doc/html/program_options.html), etc. But I wondered if there is one that lets me encode these conditional relationships between arguments directly, without manually coding error handling like
if ( argR.isSet() && ( ! argI.isSet() || ! argO.isSet() || ... ) ) ...
and manually setting up the --help.
The library tclap lets you XOR arguments, e.g. either -a or -b is allowed but not both. So, in that terminology, an AND for arguments would be nice.
Does anybody know a versatile, light-weight, and cross-platform library that can do that?

You could make two passes over the arguments: if -r is among the options, reset the parser and start over with the new mandatory options added.
You could also look into how the TCLAP XorHandler works, and create your own AndHandler.
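A minimal, library-independent sketch of the AND-handler idea, working on raw argument strings. The function name missing_dependents is hypothetical and not part of TCLAP; it also naively treats any token equal to a flag name as that flag, without knowing which tokens are option values.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical "AND group": if `trigger` appears among `args`, every flag in
// `required` must appear too. Returns the required flags that are missing,
// in the order they were listed.
std::vector<std::string> missing_dependents(const std::vector<std::string>& args,
                                            const std::string& trigger,
                                            const std::vector<std::string>& required) {
    std::vector<std::string> missing;
    auto has = [&](const std::string& flag) {
        return std::find(args.begin(), args.end(), flag) != args.end();
    };
    if (!has(trigger)) return missing;  // trigger absent: nothing else is required
    for (const auto& flag : required) {
        if (!has(flag)) missing.push_back(flag);
    }
    return missing;
}
```

The returned list can feed one error message per missing flag, which is roughly what a dedicated AndHandler would have to produce.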

You could change the argument syntax so that -r takes four values in a row.
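A rough sketch of that syntax change, assuming the fixed order -r <infile> <outfile> <double> <int>. RunArgs and parse_r are illustrative names, not from any library; because the four values are positional, there is no separate dependency left to check.

```cpp
#include <cstring>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical result type for "-r <infile> <outfile> <double> <int>".
struct RunArgs {
    std::string in;
    std::string out;
    double t = 0.0;
    int n = 0;
};

// Looks for -r in argv and reads the four values that must follow it, in order.
// Returns std::nullopt if -r is absent. Throws std::runtime_error if fewer than
// four values follow; std::stod/std::stoi throw std::invalid_argument or
// std::out_of_range on malformed numbers.
std::optional<RunArgs> parse_r(int argc, const char* const argv[]) {
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "-r") != 0) continue;
        if (i + 4 >= argc) {
            throw std::runtime_error("-r expects: <infile> <outfile> <double> <int>");
        }
        RunArgs r;
        r.in = argv[i + 1];
        r.out = argv[i + 2];
        r.t = std::stod(argv[i + 3]);
        r.n = std::stoi(argv[i + 4]);
        return r;
    }
    return std::nullopt;
}
```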

I have part of a TCLAP snippet lying around that covers the error-handling portion you're looking for, although it doesn't do exactly what you want:
#include <string>
#include "tclap/CmdLine.h"

namespace TCLAP {

class RequiredDependentArgException : public ArgException {
public:
    /**
     * Constructor.
     * \param parentArg - The argument whose presence makes requiredArg mandatory.
     * \param requiredArg - The required dependent argument that is missing.
     */
    RequiredDependentArgException(
        const TCLAP::Arg& parentArg,
        const TCLAP::Arg& requiredArg)
        : ArgException(
              std::string("Required argument ") +
              requiredArg.toString() +
              std::string(" missing when the ") +
              parentArg.toString() +
              std::string(" flag is specified."),
              requiredArg.toString())
    { }
};

} // namespace TCLAP
And then make use of the new exception after TCLAP::CmdLine::parse has been called:
if (someArg.isSet() && !conditionallyRequiredArg.isSet()) {
throw(TCLAP::RequiredDependentArgException(someArg, conditionallyRequiredArg));
}
I remember looking into extending TCLAP with an additional class that would handle this logic, but then I realized the only thing I was actually looking for was nice error reporting, because the logic wasn't entirely straightforward and couldn't be easily condensed (at least, not in a way that was useful to the next poor guy who came along). A contrived scenario dissuaded me from pursuing it further, something to the effect of, "if A is true, B must be set, but C can't be set if D has value N." Expressing such things in native C++ is the way to go, especially when it comes time to do very strict argument checks at CLI arg parse time.
For truly pathological cases and requirements, create a state machine using something like Boost.MSM (Meta State Machine). HTH.

Do you want to parse a command line? You can use SimpleOpt. Download it from:
https://code.google.com/archive/p/simpleopt/downloads
test:
int _tmain(int argc, TCHAR * argv[])
where argv can be, for example: 1.txt 2.txt *.cpp

Related

Using zsh file globbing in another application

Zsh has amazing file globbing. I want to use it in another application. I dug around the zsh code a bit and found the function zglob: https://github.com/zsh-users/zsh/blob/c96606cc0617b85d3bf0784d0bf1ecd71e44cbd7/Src/glob.c#L1152-L1158
This looks like what I want to be using, but there are a few complications. The first two arguments that zglob takes are internal zsh types so it's not immediately clear how to construct a LinkList and a LinkNode given a char* which represents the string that I'd like to glob.
Ideally, I would be able to write a function with the following signature:
char** do_the_glob(char* path)
So, given a char* (or std::string) which represents the path pattern, can I get a char** (or std::vector) with the paths which the pattern resolves to?
Also, if I build the zsh code, I get a ton of libraries:
$ pwd
~/git/zsh
$ find . -name "*.so"
./Src/Builtins/rlimits.so
./Src/Builtins/sched.so
./Src/Modules/attr.so
./Src/Modules/cap.so
./Src/Modules/clone.so
./Src/Modules/curses.so
./Src/Modules/datetime.so
./Src/Modules/db_gdbm.so
./Src/Modules/example.so
./Src/Modules/files.so
./Src/Modules/langinfo.so
./Src/Modules/mapfile.so
./Src/Modules/mathfunc.so
./Src/Modules/newuser.so
./Src/Modules/parameter.so
./Src/Modules/regex.so
./Src/Modules/socket.so
./Src/Modules/stat.so
./Src/Modules/system.so
./Src/Modules/tcp.so
./Src/Modules/termcap.so
./Src/Modules/terminfo.so
./Src/Modules/zftp.so
./Src/Modules/zprof.so
./Src/Modules/zpty.so
./Src/Modules/zselect.so
./Src/Modules/zutil.so
./Src/Zle/compctl.so
./Src/Zle/complete.so
./Src/Zle/complist.so
./Src/Zle/computil.so
./Src/Zle/deltochar.so
./Src/Zle/zle.so
./Src/Zle/zleparameter.so
It's not entirely clear which libraries include which symbols, if at all. Any help from someone who's familiar with this codebase would be super awesome.
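If the goal is just a do_the_glob function and zsh's extended syntax isn't essential, one alternative (not zglob, and not zsh code at all) is POSIX glob(3) from glob.h. This sketch supports only the basic *, ?, and [...] wildcards, with none of zsh's recursive **/ matching or glob qualifiers; it returns a std::vector rather than a char** so that ownership is handled automatically.

```cpp
#include <glob.h>
#include <cstddef>
#include <string>
#include <vector>

// Expands a wildcard pattern with POSIX glob(3) and returns the matching paths
// (glob sorts them by default). NOTE: this is not zsh's zglob(); only basic
// *, ?, and [...] patterns are supported.
std::vector<std::string> do_the_glob(const std::string& pattern) {
    std::vector<std::string> paths;
    glob_t results{};  // zero-initialized so globfree() is safe even if nothing matched
    if (glob(pattern.c_str(), 0, nullptr, &results) == 0) {
        for (std::size_t i = 0; i < results.gl_pathc; ++i) {
            paths.emplace_back(results.gl_pathv[i]);
        }
    }
    globfree(&results);
    return paths;
}
```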

How to find unreferenced classes in a codebase

We're in a period of development where a lot of code is being created that may be short-lived: it's effectively scaffolding which at some point gets replaced with something else, but it often continues to exist and is forgotten about.
Are there any good techniques for finding the classes in a codebase that aren't used? Obviously there will be many false positives (e.g. library classes: you might not be using all the standard containers, but you want to know they're there), but if they were listed by directory, it might be easier to see them at a glance.
I could write a script that greps for every class XXX and then searches again for all its uses, but it would have to omit results from the .cpp file where the class's methods are defined. This would also be incredibly slow: O(N^2) in the number of classes in the codebase.
Code coverage tools aren't really an option here, as this has a GUI that can't have all its functions easily invoked programmatically.
Platforms are Visual Studio 2013 or Xcode/clang
EDIT: I don't believe this to be a duplicate of the dead code question. Although there is an overlap, identifying dead or unreachable code isn't quite the same as finding unreferenced classes.
If you're on linux, then you can use g++ to help you with this.
I'm going to assume that only when an instance of the class is created will we consider it as being used. Therefore, rather than looking just for the name of the class you could look for calls to the constructors.
struct A
{
A () { }
};
struct B
{
B () { }
};
struct C
{
C () { }
};
void bar ()
{
C c;
}
int main ()
{
B b;
}
On Linux at least, running nm on the binary shows the following mangled names:
00000000004005bc T _Z3barv
00000000004005ee W _ZN1BC1Ev
00000000004005ee W _ZN1BC2Ev
00000000004005f8 W _ZN1CC1Ev
00000000004005f8 W _ZN1CC2Ev
Immediately we can tell that none of the constructors for 'A' are called.
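The same check can be scripted. This sketch scans nm output for constructor symbols and collects the class names. It assumes the simple Itanium C++ ABI form _ZN<length><Name>C1... or C2... seen above, so it only handles classes that are not nested in a namespace and are not templates; for real code, demangling with c++filt would be more robust. The function name classes_with_constructors is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>

// Collects class names whose constructors appear in `nm` output.
// Assumption: simple symbols of the form _ZN<len><Name>C1... or _ZN<len><Name>C2...
std::set<std::string> classes_with_constructors(const std::string& nm_output) {
    std::set<std::string> names;
    std::istringstream lines(nm_output);
    std::string line;
    while (std::getline(lines, line)) {
        // nm prints "<address> <type> <symbol>"; the symbol is the last field.
        std::string symbol = line.substr(line.find_last_of(' ') + 1);
        if (symbol.rfind("_ZN", 0) != 0) continue;  // not a nested-name symbol
        std::size_t pos = 3;
        std::size_t len = 0;
        while (pos < symbol.size() && std::isdigit(static_cast<unsigned char>(symbol[pos]))) {
            len = len * 10 + static_cast<std::size_t>(symbol[pos] - '0');
            ++pos;
        }
        if (len == 0 || pos + len + 2 > symbol.size()) continue;
        // C1 = complete-object constructor, C2 = base-object constructor.
        if (symbol[pos + len] == 'C' &&
            (symbol[pos + len + 1] == '1' || symbol[pos + len + 1] == '2')) {
            names.insert(symbol.substr(pos, len));
        }
    }
    return names;
}
```

Any class declared in the sources but absent from the returned set (A, in the example above) is a candidate for removal.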
Using slightly modified information from this SO answer we can also get g++ to remove function call graphs that are not used:
Which results in:
00000000004005ba W _ZN1BC1Ev
00000000004005ba W _ZN1BC2Ev
So, on linux at least, you can tell that neither A nor C is required in the final executable.
I've come up with a simple shell script that will at least help focus attention on the classes that are referenced the least. I've assumed that if a class isn't used, its name will still appear in one or two files (the declaration in the header and the definition in the .cpp file). The script uses ctags to find class declarations in a source directory. Then, for each class, it does a recursive grep to find all the files that mention the class (note: you can specify different class and usage directories), and finally it writes the file counts and class names to a file and displays them in numerical order. You can then review all the entries that had only 1 or 2 mentions.
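#!/bin/bash
# Usage: find_unreferenced_classes.sh <classdir> <usagedir>
CLASSDIR=${1:-}
USAGEDIR=${2:-}
if [ -z "${CLASSDIR}" ] || [ -z "${USAGEDIR}" ]; then
    echo "Usage: find_unreferenced_classes.sh <classdir> <usagedir>"
    exit 1
fi
# List the class names declared under CLASSDIR (ctags -x prints the tag name first).
ctags --recurse=yes --languages=c++ --c++-kinds=c -x "$CLASSDIR" | awk '{print $1}' | sort -u > tags
rm -f counts
while read -r class; do
    # -w matches whole words only, so class "A" is not counted in every file containing "A".
    count=$(grep -rlw --include='*.h' --include='*.cpp' -- "$class" "$USAGEDIR" | wc -l)
    echo "$count $class" >> counts
done < tags
sort -n counts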
Sample output:
1 SomeUnusedClassDefinedInHeader
2 SomeUnusedClassDefinedAndDeclaredInHAndCppFile
10 SomeClassUsedLots

What does execve() do?

What exactly does execve() do? I've tried looking at the documentation (http://linux.die.net/man/2/execve) but given that I'm very new to linux and this sort of programming it doesn't make a lot of sense. What I want to do is be able to execute this command:
nc -l -p someport -e /bin/sh
Can I do something like the following (where someport is a number such as 4444)
char *command[2];
command[0] = "nc -l -p someport -e /bin/sh"
execve(command[0], name, NULL);
execve asks the operating system to start executing a different program in the current process.
Chances are pretty decent that you want execvp or execlp instead: you haven't mentioned anything about wanting to provide the environment for the new program, but from the looks of things you probably do want the PATH searched to find the executable you're using.
Correct usage is
#include <unistd.h>

extern char **environ;  /* POSIX: the current process's environment */
char * const command[] = {"nc", "-l", "-p", "porthere", "-e", "/bin/sh", NULL};
execve("/usr/bin/nc", command, environ);  /* returns only on failure */
You must use a full pathname, not a short name such as "nc" (more precisely: no PATH search is done, so the pathname must name an actual existing file), and you must split the arguments into separate strings beforehand. You also need to pass the environment along somehow, either via the extern environ declared in the snippet above or as obtained from the third parameter of main(). environ is the variable POSIX specifies, while the three-argument main() is a common extension; it may also be more painful to pass around as needed.
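Because execve replaces the calling program, a caller that wants to keep running usually forks first. A minimal sketch of that pattern using execvp, which does the PATH search mentioned above; run_command is a hypothetical helper, and using exit status 127 for a failed exec is a shell convention adopted here, not something execvp returns.

```cpp
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs args[0] (looked up via PATH) with the given arguments in a child process
// and waits for it. Returns the child's exit status (127 if the exec failed),
// or -1 if args is empty, fork/waitpid failed, or the child did not exit normally.
int run_command(const std::vector<std::string>& args) {
    if (args.empty()) return -1;
    // execvp needs a NULL-terminated array of char*; it does not modify the strings.
    std::vector<char*> argv;
    for (const auto& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return -1;  // fork failed
    if (pid == 0) {
        execvp(argv[0], argv.data());  // only returns on failure
        _exit(127);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the question's case this would be run_command({"nc", "-l", "-p", "4444", "-e", "/bin/sh"}), with each argument as its own string.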

C++: How to escape user input for safe system calls?

On a Linux platform, I have C++ code that goes like this:
// ...
std::string myDir;
myDir = argv[1]; // myDir is initialized using user input from the command line.
std::string command;
command = "mkdir " + myDir;
if (system(command.c_str()) != 0) {
return 1;
}
// continue....
Is passing user input to a system() call safe at all?
Should the user input be escaped / sanitized?
How?
How could the above code be exploited for malicious purposes?
Thanks.
Just don't use system. Prefer execl.
execl("/bin/mkdir", "mkdir", myDir.c_str(), (char *)0);
That way, myDir is always passed as a single argument to mkdir, and the shell isn't involved. Note that you need to fork if you use this method.
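A sketch of the fork-then-execl approach for this example, assuming mkdir lives at /bin/mkdir; mkdir_via_exec is a hypothetical name. The "--" argument is an addition that keeps a directory name starting with '-' from being read as an option.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Creates `dir` by running /bin/mkdir directly (no shell), so characters such as
// ';' or spaces in `dir` are never interpreted. Returns mkdir's exit status,
// 127 if the exec failed, or -1 if fork/waitpid failed.
int mkdir_via_exec(const std::string& dir) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // "--" ends option parsing, so a name like "-p" is treated as a directory.
        execl("/bin/mkdir", "mkdir", "--", dir.c_str(), static_cast<char*>(nullptr));
        _exit(127);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A name like "somedir ; rm -rf /" simply becomes a directory with that odd name.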
But if this is not just an example, you should use the mkdir C function:
mkdir(myDir.c_str(), someMode);
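A small wrapper around the POSIX mkdir() call that reports failures via errno; make_directory and the default mode 0755 are illustrative choices, not from the answer. No shell and no child process are involved, so the directory name is never interpreted.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Creates a directory with mkdir(2). Returns an empty string on success,
// otherwise the strerror() text for the errno value mkdir set.
std::string make_directory(const std::string& dir, mode_t mode = 0755) {
    if (mkdir(dir.c_str(), mode) == 0) return "";
    return std::strerror(errno);
}
```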
Using system() call with command line parameters without sanitizing the input can be highly insecure.
For example, a user could pass the following as the directory name:
somedir ; rm -rf /
To prevent this, use a mixture of the following:
use getopt to ensure your input is sanitized
sanitize the input
use execl instead of system to execute the command
The best option would be to use all three.
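For the "sanitize the input" item, one common approach, assuming the command is run by a POSIX sh as system() does, is to wrap the value in single quotes and write any embedded single quote as '\''. shell_quote is a hypothetical helper name; the result could then be concatenated into a command such as "mkdir -- " + shell_quote(myDir). Note that a std::string containing a null character would still be cut short by c_str().

```cpp
#include <string>

// Quotes `s` for a POSIX shell. Inside single quotes nothing is special except
// the quote itself, which is written as '\'' (close quote, escaped quote, reopen).
std::string shell_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') {
            out += "'\\''";
        } else {
            out += c;
        }
    }
    out += "'";
    return out;
}
```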
Further to Matthew's answer, don't spawn a shell process unless you absolutely need it. If you use a fork/execl combination, individual parameters are never parsed, so they don't need to be escaped. Beware of null characters, however, which will still terminate a parameter prematurely (in some cases this is not a security problem).
I assume mkdir is just an example, as mkdir can trivially be called from C++ much more easily than these subprocess suggestions.
Reviving this ancient question as I ran into the same problem and the top answers, based on fork() + execl(), weren't working for me. (They create a separate process, whereas I wanted to use async to launch the command in a thread and have the system call stay in-process to share state more easily.) So I'll give an alternative solution.
It's not usually safe to pass user input as-is, especially if the utility is designed to be run with sudo. To sanitize it, instead of composing the string to be executed yourself, pass the input through an environment variable: when the shell expands a double-quoted "${VAR}", the value becomes a single word and is not re-parsed for ;, |, or other shell syntax.
For your example:
// ...
std::string myDir;
myDir = argv[1]; // myDir is initialized using user input from the command line.
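setenv("MY_DIR", myDir.c_str(), 1);  // setenv takes a C string, not a std::string
// Double quotes keep the value a single word; "--" stops a leading '-' being read as an option.
if (system("mkdir -- \"${MY_DIR}\"") != 0) {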
return 1;
}
// continue....

Implementing "app.exe -instruction file" notation in C++

I have a project for my Data Structures class, which is a file compressor that works using Binary Trees and other stuff. We are required to "zip" and "unzip" any given file by using the following instructions in the command line:
For compressing: compressor.exe -zip file.whatever
For uncompressing: compressor.exe -unzip file.zip
We are programming in C++. I use the IDE Code::Blocks and compile using GCC in Windows.
My question is: How do you even implement that??!! How can you make your .exe receive those parameters in command line, and then execute them the way you want?
Also, anything special to have in mind if I want that implementation to compile in Linux?
Thanks for your help
You may want to look in your programming text for the signature of the main function, your program's entry point. That's where you'll be able to pull in those command line parameters.
I don't want to be more detailed than that because this is apparently a key point of the assignment, and if I ever find myself working with you, I'll expect you to be able to figure this sort of stuff out on your own once you've received an appropriate nudge. :)
Good luck!
As I recall, the Single UNIX Specification / POSIX defines getopt in unistd.h to handle the parsing of arguments for you. While this is a C function, it should also work in C++.
GNU glibc provides this as well as getopt_long (in getopt.h), which supports GNU-style --long options.
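A sketch of how the -zip/-unzip syntax from the question could be handled with getopt_long_only, a GNU extension declared in getopt.h that, unlike plain getopt, accepts multi-letter options written with a single dash. It assumes glibc (resetting optind to 0 to force a full rescan is a glibc convention), and parse_instruction is a hypothetical name.

```cpp
#include <getopt.h>
#include <string>

// Parses "-zip <file>" or "-unzip <file>". Returns 'z' for -zip, 'u' for -unzip,
// or '?' for anything else; the file name is stored in `file`.
int parse_instruction(int argc, char* argv[], std::string& file) {
    static const option long_opts[] = {
        {"zip", required_argument, nullptr, 'z'},
        {"unzip", required_argument, nullptr, 'u'},
        {nullptr, 0, nullptr, 0},
    };
    optind = 0;  // glibc: 0 reinitializes getopt, so this can be called repeatedly
    opterr = 0;  // report errors ourselves instead of letting getopt print them
    int mode = '?';
    int c;
    while ((c = getopt_long_only(argc, argv, "", long_opts, nullptr)) != -1) {
        if (c != 'z' && c != 'u') return '?';  // unknown option or missing file name
        mode = c;
        file = optarg;
    }
    return mode;
}
```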
I did it, I gotz it!!
I now have a basic understanding on how to use the argc and argv[ ] parameters on the main() function (I always wondered what they were good for...). For example, if I put in the command line:
compressor.exe -unzip file.zip
Then:
argc is initialized to 3 (the number of arguments on the line, including the program name)
argv[0] == "compressor.exe" (name of app.)
argv[1] == "-unzip"
argv[2] == "file.zip"
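Putting that together, a minimal sketch of the dispatch; parse_mode and Mode are illustrative names, and the actual compression code is left out. The same code should compile unchanged with GCC on Linux, since int main(int argc, char* argv[]) is standard C++.

```cpp
#include <cstring>
#include <string>

// Hypothetical modes for "compressor -zip file" / "compressor -unzip file".
enum class Mode { Zip, Unzip, Invalid };

// Expects exactly: argv[0] = program, argv[1] = instruction, argv[2] = file.
Mode parse_mode(int argc, const char* const argv[], std::string& file) {
    if (argc != 3) return Mode::Invalid;
    file = argv[2];
    if (std::strcmp(argv[1], "-zip") == 0) return Mode::Zip;
    if (std::strcmp(argv[1], "-unzip") == 0) return Mode::Unzip;
    return Mode::Invalid;
}
```

In main you would call parse_mode(argc, argv, file) and branch on the result, printing a usage message for Mode::Invalid.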
Greg (not 'Creg', sorry =P) and Bemrose, thank you guys for your help!! ^^