Why check if (*argv == NULL)? [duplicate] - c++

In the data structures class that I am currently taking, we have been tasked with writing a web crawler in C++. To give us a head start, the professor provided us with a program to get the source from a given URL and a simple HTML parser to strip the tags out. The main function for this program accepts arguments and so uses argc/argv. The code used to check for the arguments is as follows:
// Process the arguments
if (!strcmp(option, "-h"))
{
    // do stuff...
}
else if (!strcmp(option, ""))
{
    // do stuff...
}
else if (!strcmp(option, "-t"))
{
    // do stuff...
}
else if (!strcmp(option, "-a"))
{
    // do stuff...
}
if ( *argv == NULL )
{
    exit(1);
}
Where "option" has been populated with the switch in argv[1], and argv[2] and higher hold the remaining arguments. The first block I understand just fine: if the switch equals the string, do whatever that switch requires. I'm wondering what the purpose of the last if block is, though.
It could be that my C++ is somewhat rusty, but I seem to recall *argv being equivalent to argv[0], basically meaning it is checking to make sure arguments exist. Except I was under the impression that argv[0] always (at least in most implementations) contained the name of the program being run. It occurs to me that argv[0] could be null if argc is equal to 0, but searching around on Google I couldn't find a single post determining whether or not that is even possible.
And so I turn to you. What exactly is that final if block checking?
EDIT: I've gone with the reasoning provided in the comments of the selected answer, that it may be possible to intentionally cause argv[0] to become NULL, or that it may otherwise become NULL based on a platform-specific implementation of main.

3.6.1/2:
If argc is non-zero those arguments shall be provided in argv[0] through ... and argv[0] shall be the pointer to the initial character of a NTMBS that represents the name used to invoke the program or "". The value of argc shall be nonnegative. The value of argv[argc] shall be 0.
Emphasis mine. argc is only guaranteed non-negative, not non-zero.
This is at entry to main. It's also possible that //do stuff modifies the value of argv, or the contents of the array it points to. It's not entirely unheard of for option-handling code to shift values off argv as it processes them. The test for *argv == NULL may therefore be testing whether or not there are any command-line arguments left, after the options have been removed or skipped over. You'd have to look at the rest of the code.
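As a minimal sketch of that "shift values off argv" style, assuming switches are simply arguments that start with '-': the helper name has_remaining_args and its parameters are hypothetical and are not taken from the professor's code.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: consumes leading "-x" style switches by advancing argv,
// then reports whether any non-option arguments remain.
// Assumes argv is terminated by a null pointer, as it is on entry to main.
bool has_remaining_args(char** argv, std::vector<std::string>& switches) {
    if (*argv != nullptr) ++argv;  // skip the program name, if there is one
    while (*argv != nullptr && (*argv)[0] == '-' && (*argv)[1] != '\0') {
        switches.push_back(*argv);  // record the switch
        ++argv;                     // shift it off the front
    }
    // Same condition as "if (*argv == NULL) exit(1);", inverted.
    return *argv != nullptr;
}
```

With a loop like this, a trailing *argv == NULL check means "the options were consumed and nothing is left to work on", which may be what the original check was for.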

argc gives you the number of command-line arguments passed. You shouldn't need to check the contents of argv to see whether there are enough arguments.
if (argc <= 1) { // The first arg will be the executable name
    // print usage
}

Remembering just how portable C is, it might not always be running on a standard platform like Windows or Unix. Perhaps it's some micro-code inside your washing machine running on some cheap, hacked environment. As such, it's good practice to make sure a pointer isn't null before dereferencing it, which might have led to the question.
Even so, you're correct. *argv is the same as argv[0], and argv is supposed to be initialized by the environment, if it's provided.

Just speculation, but what if your professor is referring to this?
while (*++argv != NULL)
    printf("%s\n", *argv);

Related

C++ sizeof(environ), the value isn't what I expected [duplicate]

In the process of learning C, I'm trying to write a program that accepts the name of one of your environment variables as input and outputs its value.
The question is: is there any way to know the length of envp? I mean, how many entries does envp have? I'm aware that it is a char**, an array of strings, and finding the size of an array in C is problematic already. What can I do to find the size of envp?
Please just provide direction, not the concrete answer (or code).
It's terminated by a NULL pointer. You have to count it if you want to know the length.
The fact that argv[argc] == NULL should give you a clue.
You should look into getenv(). It's more portable than manipulating envp, because environments like Plan 9 implement the environment differently while preserving the behavior of this function.
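For readers who do want to see the two directions side by side, here is a small sketch. The function names count_entries and env_or are made up for illustration; only std::getenv is a standard library call.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Counts the entries of a null-terminated vector such as envp or argv
// by walking it until the terminating null pointer.
std::size_t count_entries(const char* const* vec) {
    std::size_t n = 0;
    while (vec[n] != nullptr) ++n;
    return n;
}

// Looks up a single variable through std::getenv and returns
// the fallback when the variable is not set.
std::string env_or(const char* name, const std::string& fallback) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : fallback;
}
```

If all the program needs is the value of one named variable, the second function avoids touching envp at all.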

How are the argc and argv values passed to main() set up?

I want to better understand what's going on under the hood with the command line arguments when a C or C++ program is launched. I know, of course, that argc and argv, when passed to main(), represent the argument count and argument vector, respectively.
What I'm trying to figure out is how the compiler knows to interpret int argc as the number of arguments passed from the command line. If I write a simple function that attempts to mimic main() (e.g. int testfunc(int argc, char* argv[])), and pass in a string, the compiler complains, "Expected 'int' but argument is of type char*" as I would expect. How is this interpreted differently when command line arguments are passed to main()?
In common C implementations, main is not the first routine called when your process starts. Usually, it is some special entry point like _start that is provided by the C library built into your program when you link it. The code at this special entry point examines the command line information that is passed to it (in some way outside of C, particular to the operating system) and constructs an argument list for main. After that and other work, it calls main.
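A rough illustration of that idea, written as ordinary C++ rather than real startup code: fake_start and program_main are hypothetical names, and splitting on whitespace is an assumption made for brevity (real systems handle quoting and receive the data from the operating system in a platform-specific way).

```cpp
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the program's main (hypothetical name). Returns argc if
// the vector is terminated by a null pointer as expected, otherwise -1.
int program_main(int argc, char* argv[]) {
    return argv[argc] == nullptr ? argc : -1;
}

// Hypothetical startup routine: splits a raw command line on whitespace,
// builds a null-terminated argument vector, and then calls program_main.
int fake_start(const std::string& command_line) {
    std::istringstream in(command_line);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);

    std::vector<char*> argv;
    for (std::string& w : words) argv.push_back(&w[0]);
    argv.push_back(nullptr);  // argv[argc] must be a null pointer

    return program_main(static_cast<int>(words.size()), argv.data());
}
```

The compiler never "interprets" the command line; by the time main runs, something like fake_start has already turned it into an int and a char** that match main's parameter types.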
You don't pass the argc value yourself (from the command line, for example); it is supplied by your environment (runtime), just like the exact content of argv.[Note below]
To elaborate, C11, chapter §5.1.2.2.1, (indicators mine)
The value of argc shall be nonnegative.
argv[argc] shall be a null pointer.
If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup. The intent is to supply to the program information determined prior to program startup from elsewhere in the hosted environment. [Note start]If the host environment is not capable of supplying strings with letters in both uppercase and lowercase, the implementation shall ensure that the strings are received in lowercase.[Note end]

How much memory is allocated for argv[]? [duplicate]

I know that the command-line arguments are character arrays and that they are stored on the stack. But I want to know the actual memory allocation for each argument. For example, suppose I pass the directory name "/tmp" as a command-line argument. This will be stored in argv[1]. But when I tested it, the program was allowed to change argv[1] to "/tmp/log/" (a longer string). How is this possible?
To answer your question, the total maximum size available to argument strings and the passed environment can be obtained with:
getconf ARG_MAX
from the command line, or with the sysconf equivalent from C (see http://pubs.opengroup.org/onlinepubs/009695399/basedefs/limits.h.html for more information).
(On my Linux box, the limit is 2097152).
Your example happens to work because the arguments and the environment are, in practice, stored contiguously, so appending to a string overwrites what comes after it (the following arguments, or the environment).
That's why it's a bad idea to try to expand the argv strings like that. If you want to modify them, either edit them in place or shrink them; trying to expand them is asking for trouble.
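If a longer string is what you need, one safe approach is to copy the argument into storage you own and extend the copy. A minimal sketch, where with_suffix is a hypothetical helper name:

```cpp
#include <string>

// Builds a longer string from a copy of the argument, leaving the
// storage that argv points to untouched.
std::string with_suffix(const char* arg, const std::string& suffix) {
    std::string copy(arg);  // the copy owns its own, growable storage
    copy += suffix;         // safe: std::string reallocates as needed
    return copy;
}
```

Calling with_suffix(argv[1], "/log/") on "/tmp" yields "/tmp/log/" without writing a single byte past the end of the original argument.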
On Linux, parameters are populated by create_elf_tables. For this specific platform at least, you are correct that the values are stored on the stack.
Linux only uses exactly as much memory as is necessary to store arguments and (initial) environment variables on the stack; if you try to use more than what is already there, you're overwriting something else (or crashing).
The standard states that argc, argv, and the strings argv points to can be modified:
177: The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.
Modifiable does not mean growable, though: each string has only the storage it was given, so assigning or replacing contents must fit within that space.
Standard text:
http://c0x.coding-guidelines.com/5.1.2.2.1.html

getopt_long() function with custom argc and argv

I am having trouble using getopt_long() function with custom argc and argv.
I receive my arguments in a string instead of the real command-line args. A new_argc and new_argv are then built from this string to be used with getopt_long(). But getopt_long() fails on the very first call: it returns EOF, and optarg is NULL.
string is "-c 10.30.99.41"
new_argc = 3
new_argv[0]=>./prog_name
new_argv[1]=>-c
new_argv[2]=>10.30.99.41
getopt_long works OK for me if I pass command line args. So my short and long options are correct. But if I pass the new_argc and new_argv it fails.
I am sure my short and long options are right and that the argv is NULL terminated. I apologize that I can't post more code here.
I doubt if getopt_long can be used with a custom argc and argv. I suspect it works only with a real argc and argv because it must be referencing some other code in libc related to argc,argv. Any comments?
option = getopt_long( new_argc, new_argv, short_options, long_options, NULL );
EDIT:
"The variable optind is the index of the next element to be processed in argv. The system initializes this value to 1. The caller can reset it to 1 to restart scanning of the same argv, or when scanning a new argument vector."
So, yes. You can use getopt_long to scan the same arguments again, or to scan another argument list. However, if someone has called getopt_long previously, you have to set the global optind variable back to 1.
Remember that the argv in main() holds argc entries followed by a NULL terminator; that is, argv[argc] == NULL. So you likely have to make sure the last element in your own new_argv is a NULL pointer.
(Note, please show all the relevant code when posting, it's hard to guess what the error is, e.g. showing what short_options, long_option is, how you actually build your new_argv, variable declarations etc.)
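Since the question's own option tables were not posted, the following is only a sketch under assumptions: the long option name "connect" and the helper find_connect_option are invented for illustration, and getopt_long itself is a GNU/BSD extension declared in getopt.h.

```cpp
#include <getopt.h>  // getopt_long, struct option, optind, optarg
#include <string>
#include <vector>

// Hypothetical helper: returns the value given to -c / --connect,
// or an empty string if the option is absent.
std::string find_connect_option(std::vector<std::string> args) {
    // Build a custom argv that ends with a null pointer.
    std::vector<char*> new_argv;
    for (std::string& a : args) new_argv.push_back(&a[0]);
    new_argv.push_back(nullptr);
    int new_argc = static_cast<int>(args.size());

    static const struct option long_options[] = {
        {"connect", required_argument, nullptr, 'c'},
        {nullptr, 0, nullptr, 0}
    };

    optind = 1;  // restart scanning in case getopt was used earlier
    std::string result;
    int opt;
    while ((opt = getopt_long(new_argc, new_argv.data(), "c:",
                              long_options, nullptr)) != -1) {
        if (opt == 'c') result = optarg;
    }
    return result;
}
```

The two details the answer points at are both visible here: optind is reset before scanning, and the hand-built vector ends with a null pointer.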

What type of input check can be performed against binary data in C++?

let's say I have a function like this in C++, which I wish to publish to third parties. I want to make it so that the user will know what happened, should he/she feeds invalid data in and the library crashes.
Let's say that, if it helps, I can change the interface as well.
int doStuff(unsigned char *in_someData, int in_data_length);
Apart from application specific input validation (e.g. see if the binary begins with a known identifier etc.), what can be done? E.g. can I let the user know, if he/she passes in in_someData that has only 1 byte of data but passes in 512 as in_data_length?
Note: I already asked a similar question here, but let me ask from another angle..
It cannot be checked whether the parameter in_data_length passed to the function has the correct value. If this were possible, the parameter would be redundant and thus needless.
But a vector from the standard template library solves this:
int doStuff(const std::vector<unsigned char>& in_someData);
So, there is no possibility of a "NULL buffer" or an invalid data length parameter.
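A short sketch of that interface in use. The body (summing the bytes) is only a placeholder assumption, since the question does not say what doStuff actually computes.

```cpp
#include <vector>

// The vector carries its own length, so the function cannot be handed
// a null buffer or a length that disagrees with the data.
int doStuff(const std::vector<unsigned char>& in_someData) {
    int sum = 0;  // placeholder computation: add up all bytes
    for (unsigned char byte : in_someData) sum += byte;
    return sum;
}
```

A caller holding a raw buffer can still use it by constructing the vector from a pointer range, for example std::vector<unsigned char>(buf, buf + len), at which point the length mistake (if any) is made visibly on the caller's side.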
If you knew how many bytes were passed in in_someData, why would you need in_data_length at all?
Actually, you can only check in_someData for NULL and in_data_length for positive value. Then return some error code if needed. If a user passed some garbage to your function, this problem is obviously not yours.
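Those two checks could look roughly like this. The name doStuffChecked and the error-code values are hypothetical choices for illustration, not part of the question's interface.

```cpp
// Hypothetical status codes returned to the caller.
enum DoStuffStatus {
    DOSTUFF_OK = 0,
    DOSTUFF_NULL_BUFFER = -1,
    DOSTUFF_BAD_LENGTH = -2
};

// Validates only what can be validated: the pointer is non-null and
// the length is positive. Whether the length matches the real buffer
// cannot be checked from inside the function.
int doStuffChecked(unsigned char* in_someData, int in_data_length) {
    if (in_someData == nullptr) return DOSTUFF_NULL_BUFFER;
    if (in_data_length <= 0) return DOSTUFF_BAD_LENGTH;
    // ... application-specific processing would go here ...
    return DOSTUFF_OK;
}
```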
In C++, the magic word you're looking for is "exception". That gives you a method to tell the caller something went wrong. You'll end up with code something like
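#include <stdexcept>

// "throw(std::invalid_argument)" is the C++ spelling of the exception
// specification. It was deprecated in C++11 and removed in C++17, so
// omit it when compiling as C++17 or later.
int
doStuff(unsigned char * inSomeData, int inDataLength) throw(std::invalid_argument) {
    // do a test
    if (inDataLength == 0)
        throw std::invalid_argument("Length can't be 0"); // throw by value, not with new
    // only gets here if it passed the test
    // do other good stuff
    int theResult = 0; // placeholder for the real result
    return theResult;
}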
Now, there's another problem with your specific example, because there's no universal way in C or C++ to tell how long an array of primitives really is. It's all just bits, with inSomeData being the address of the first bits. Strings are a special case, because there's a general convention that a zero byte ends a string, but you can't depend on that for binary data -- a zero byte is just a zero byte.
Update
This has currently picked up some downvotes, apparently by people misled by the comment that exception specifications had been deprecated. As I noted in a comment below, this isn't actually true -- while the specification will be deprecated in C++11, it's still part of the language now, so unless questioner is a time traveler writing in 2014, the throws clause is still the correct way to write it in C++.
Also note that the original questioner says "I want to make it so that the user will know what happened, should he/she feeds [sic] invalid data in and the library crashes." Thus the question is not just what can I do to validate the input data (answer: not much unless you know more about the inputs than was stated), but then how do I tell the caller they screwed up? And the answer to that is "use the exception mechanism" which has certainly not been deprecated.
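To show the "tell the caller" half of that argument, here is a sketch of both sides of the exception mechanism. The names doStuffOrThrow and tryDoStuff, the messages, and the placeholder return value are all assumptions made for the example.

```cpp
#include <stdexcept>
#include <string>

// Library side: reject inputs that are detectably invalid.
int doStuffOrThrow(const unsigned char* data, int length) {
    if (data == nullptr) throw std::invalid_argument("data must not be null");
    if (length <= 0) throw std::invalid_argument("length must be positive");
    return length;  // placeholder result
}

// Caller side: the exception says what went wrong instead of the
// library crashing somewhere deeper in its processing.
std::string tryDoStuff(const unsigned char* data, int length) {
    try {
        doStuffOrThrow(data, length);
        return "ok";
    } catch (const std::invalid_argument& e) {
        return std::string("invalid input: ") + e.what();
    }
}
```

This version carries no exception specification, so it compiles under any C++11 or later standard; the throw and catch themselves are unaffected by that deprecation.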